Step 2: Reproduce Large Language Model (from scratch)
Let’s Continue with the Second Step
At this point, you've built a strong foundation in generative deep learning. Now it’s time to dive into how models like ChatGPT are actually built.
Table of Contents
Build a Large Language Model (From Scratch)
This book is an excellent next step. Written by Sebastian Raschka, it guides you through the full process of developing a GPT-like Large Language Model (LLM) from scratch.
You'll Learn How To
Understand the architecture of LLMs
Implement your own transformer model
Pretrain it on a text corpus
Finetune it for specific tasks
The book is highly practical and comes with a comprehensive GitHub repository full of hands-on examples built in PyTorch. Don’t worry if you’re new to PyTorch, the book even includes an appendix to help you get started with it.
Outline of the Book
Ch 1: Understanding Large Language Models
Ch 2: Working with Text Data
Ch 3: Coding Attention Mechanisms
Ch 4: Implementing a GPT Model from Scratch
Ch 5: Pretraining on Unlabeled Data
Ch 6: Finetuning for Text Classification
Ch 7: Finetuning to Follow Instructions
Unlike the first book where you could skip around, the chapters in this book are sequentially dependent, each one builds directly on the previous. So, it's best to read the book from Chapter 1 all the way through to Chapter 7, without skipping.
What You'll Achieve
Once you finish this book, you'll not only understand how LLMs like ChatGPT work, but you'll also have built your own simplified version.
And with that, you’re ready for Step 3: Reinforcement Learning from Human Feedback (RLHF), which is important for LLM reasoning.
Want to Go Deeper?
This book focuses primarily on practical implementation. If you’re also interested in the mathematical theory behind LLMs (which not everyone is), there’s a helpful preprint on arXiv: Foundations of Large Language Models that complements the material.
Very Useful Alternative Resource
If you prefer video over textbooks, one of the best resources for learning about LLMs comes from one of the most prominent researchers in the field, Andrej Karpathy. His video series is a must-watch due to the sheer depth and clarity of the explanations. Not only does he teach in an easy-to-understand manner, but he also codes everything step by step.
Deep Dive into LLMs like ChatGPT In this video, Karpathy introduces Large Language Models (LLMs), explains how they work, and covers some of the most recent updates. This is the first video you should watch to get a high-level understanding of the world of LLMs.
Let's build GPT: from scratch, in code, spelled out Here, he focuses on building GPT, specifically GPT-2, from scratch, walking through the entire process in code.
Let's build the GPT Tokenizer This video explains how tokenization works, the process of converting text into tensors or vectors so it can be processed by a model. Karpathy builds a tokenizer from scratch using Byte Pair Encoding (BPE).
Let's reproduce GPT-2 (124M) This is an advanced follow-up to the earlier videos. It not only builds GPT-2 but also reproduces its results using real-world optimization and training strategies.
Last updated