Jerry Xiong

Archived Posts

[2025-06-12] RoPE visualization
[2025-04-26] mlp visualization (prototype)
[2025-04-17] disentangling raw and effective sample efficiency
[2025-04-17] why categorical regression stabilizes rl
[2024-12-13] on meta-rl, again
[2024-11-25] training for infinite contexts via gradient bootstrapping
[2024-11-09] kv caching with diffusion forcing
[2024-11-06] two steps to AGI
[2024-10-12] beyond next token prediction: data as interaction
[2024-07-30] byol learning dynamics are simple and well understood
[2024-07-29] discrete diffusion was always going to suck
[2024-06-15] runtime patching of a nn.Module
[2024-06-15] ssl cond indep is easy and fun
[2024-06-13] jepa == byol
[2024-06-13] synthetic data (we live in a society)
[2024-06-09] build software for ai
[2024-02-14] a perspective on simsiam