Knowledge gains its value when shared.
Recent Posts
DiT training has three sources of shape dynamism that cause torch.compile to recompile every step. We eliminated all three and got stable compiled training on a consumer GPU.
Flash Attention 4 doesn't support consumer Blackwell GPUs yet. We fixed three critical bugs and got it running on the RTX 5060 Ti.
A C++ critique video as a lens into vibe coding and the myth of total code comprehension.
Two precision-oriented features for the LoRA training pipeline: lora_fp32_accumulation and attn_softmax_scale.
Personal opinion on the paper 'Epiplexity'
How to build native Windows desktop applications that integrate with the Claude Code CLI using a pure Rust backend.
Personal opinion on AI slop
An exploration of the challenges in long context language model research.
A personal take on the role of persona in the agentic LLM paradigm.