Transformer Language Model — From Scratch (In Progress)
Implemented a multi-layer, decoder-only transformer in PyTorch from first principles, including embeddings, multi-head self-attention, feedforward blocks, and layer normalization. Built a full training pipeline covering batching, optimization, evaluation, and autoregressive text generation. Ongoing experiments examine training dynamics and generalization behavior.
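The components named above can be sketched as a single decoder block. This is a minimal, hypothetical illustration (pre-norm layout, arbitrary `d_model`/`n_heads`), not the project's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (decoder-only)."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q, K, V projection
        self.proj = nn.Linear(d_model, d_model)      # output projection

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each to (B, n_heads, T, head_dim).
        split = lambda t: t.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        # Scaled dot-product attention with an upper-triangular causal mask.
        att = (q @ k.transpose(-2, -1)) / (C // self.n_heads) ** 0.5
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        att = F.softmax(att.masked_fill(mask, float("-inf")), dim=-1)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)

class Block(nn.Module):
    """Pre-norm transformer block: attention and feedforward, each with a residual."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = CausalSelfAttention(d_model, n_heads)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # standard 4x expansion
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # residual around attention
        x = x + self.mlp(self.ln2(x))   # residual around feedforward
        return x
```

Stacking several such blocks between an embedding layer and an output head, then sampling one token at a time, gives the autoregressive generation loop the description refers to.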