# CS336: Language Modeling from Scratch
Stanford's CS336, taught by Percy Liang and Tatsunori Hashimoto, has students build every component of a language model from the ground up — tokenization, transformer architecture, custom GPU kernels in Triton, distributed training, scaling laws, and the full post-training alignment stack. No pretrained weights, no shortcuts: it's an operating-systems-from-scratch philosophy applied directly to modern LLMs. The course is a 5-unit commitment described by its own instructors as "very implementation-heavy," targeting students who already know PyTorch and GPU memory hierarchies.




