Selected NeurIPS 2025 papers across novel architectures, latent analysis, challenges to traditional beliefs, diffusion/SSM, and simple methods.
Reviews from ICLR 2026 selected papers — Why low-precision transformer training fails (Flash Attention analysis), and more.
Standout 2025 arXiv preprints — Predictable Scale (optimal hyperparameter scaling laws for LLM pretraining), and more.
A collection of memorable figures from papers — honesty/helpfulness in LLMs, zero training loss, mechanistic explanations, and more.
Notable papers from industry labs — DeepSeek-OCR (contexts as optical compression) and more.
Reviews from ICLR 2025 orals — Unlearning-based Neural Interpretations and more.
ICML 2025 oral and spotlighted papers, organized by theme — new interpretations, consolidated metrics, observation-driven approaches.
Reviewing the counterintuitive relationship between memorization and generalization in deep networks — from "Understanding deep learning requires rethinking generalization" (ICLR 2017) onward.
Survey of architectures and training strategies that extend LLM context — efficient transformers, SSMs, and benchmarks.
Trends in deep learning that may be artifacts of evaluation choices — emergent abilities and how metric design can manufacture or erase them.
NeurIPS 2024 oral session papers grouped by theme — rethinking motivations, interpretability for LLMs, novel architectures, applications.
Selected arXiv preprints worth reading from 2024 and earlier.
Reviews from COLM 2024 and ICLR 2024 outstanding papers — TOFU unlearning, fair long-sequence comparisons, and more.
Reviews of ICML 2024 oral and spotlighted papers, with trend context and motivations laid out before each paper.
From Neural ODE to differential-equation-style formulations of LLM training and inference.
Older papers worth revisiting — both for industry impact and for the problem-solving strategies behind them. Starting with the Neural Tangent Kernel.
Short notes on papers that don't fit elsewhere — starting with Similarity of Neural Network Representations Revisited.
Reviews of NeurIPS 2023 oral and spotlight papers focused on large language models.