Archived Posts
- [2025-06-12] RoPE visualization
- [2025-04-26] mlp visualization (prototype)
- [2025-04-17] disentangling raw and effective sample efficiency
- [2025-04-17] why categorical regression stabilizes rl
- [2024-12-13] on meta-rl, again
- [2024-11-25] training for infinite contexts via gradient bootstrapping
- [2024-11-09] kv caching with diffusion forcing
- [2024-11-06] two steps to AGI
- [2024-10-12] beyond next token prediction: data as interaction
- [2024-07-30] byol learning dynamics are simple and well understood
- [2024-07-29] discrete diffusion was always going to suck
- [2024-06-15] runtime patching of a nn.Module
- [2024-06-15] ssl cond indep is easy and fun
- [2024-06-13] jepa == byol
- [2024-06-13] synthetic data (we live in a society)
- [2024-06-09] build software for ai
- [2024-02-14] a perspective on simsiam