Step 2: Large Language Model (from scratch)
At this point, you've built a strong foundation in generative deep learning. Now it's time to dive into how models like ChatGPT are actually built.
Step 2: Build a Large Language Model (From Scratch)
This book is an excellent next step. Written by Sebastian Raschka, it guides you through the full process of developing a GPT-like Large Language Model (LLM) from scratch.
You'll learn how to:
Understand the architecture of LLMs
Implement your own transformer model (a minimal attention sketch follows this list)
Pretrain it on a text corpus
Finetune it for specific tasks
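To make this concrete, here's a minimal sketch of the kind of component you'll implement: single-head causal self-attention in PyTorch. This is an illustrative simplification, not the book's code; the class and variable names are my own.

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    """Minimal single-head causal self-attention, the core building
    block of a GPT-style model (illustrative, not the book's code)."""
    def __init__(self, embed_dim: int, context_len: int):
        super().__init__()
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        # Mask out future positions so each token attends only to itself and the past.
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / d**0.5          # (b, t, t) attention scores
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return weights @ v                                 # weighted sum of value vectors

x = torch.randn(1, 8, 32)                                 # 8 tokens, 32-dim embeddings
attn = CausalSelfAttention(embed_dim=32, context_len=128)
print(attn(x).shape)                                      # torch.Size([1, 8, 32])
```

A real GPT stacks many multi-head versions of this layer with feed-forward blocks, layer norms, and residual connections; the book walks you through all of it step by step.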
The book is highly practical and comes with a comprehensive GitHub repository full of hands-on examples built in PyTorch. Don't worry if you're new to PyTorch: the book includes an appendix to help you get started with it.
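If you want a quick taste of what that appendix covers, here's the kind of thing PyTorch's autograd does for you (a toy example of my own, not from the book):

```python
import torch

# Tensors with requires_grad=True are tracked by autograd.
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)
loss = (w * x - 5.0) ** 2      # a tiny squared-error "loss"
loss.backward()                # autograd computes d(loss)/dw for us
print(w.grad)                  # tensor(6.), i.e. 2 * (2*3 - 5) * 3
```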
Here's the outline of the book (a short pretraining sketch follows it):
Ch 1: Understanding Large Language Models
Ch 2: Working with Text Data
Ch 3: Coding Attention Mechanisms
Ch 4: Implementing a GPT Model from Scratch
Ch 5: Pretraining on Unlabeled Data
Ch 6: Finetuning for Text Classification
Ch 7: Finetuning to Follow Instructions
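To give a flavor of Chapter 5's core idea: pretraining a GPT-style model is plain next-token prediction, i.e. cross-entropy between the model's logits and the input sequence shifted by one position. Here's a hedged sketch of that loss computation, with random tensors standing in for a real model and corpus:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins: in practice the logits come from your GPT model and the
# token ids from batches drawn out of a tokenized text corpus.
vocab_size, batch, seq_len = 50257, 2, 16
token_ids = torch.randint(0, vocab_size, (batch, seq_len + 1))

inputs = token_ids[:, :-1]     # tokens the model sees
targets = token_ids[:, 1:]     # the same sequence shifted left by one

logits = torch.randn(batch, seq_len, vocab_size)  # stand-in for model(inputs)

# Cross-entropy over the vocabulary at every position: the pretraining loss.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())
```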
Unlike the first book, where you could skip around, the chapters in this book are sequentially dependent: each one builds directly on the previous. So it's best to read the book from Chapter 1 all the way through to Chapter 7, without skipping.
Once you finish this book, you'll not only understand how LLMs like ChatGPT work, but you'll also have built your own simplified version. And with that, you’re ready for Step 3: Reinforcement Learning from Human Feedback (RLHF), which is important for LLM reasoning.
Want to go deeper? This book focuses primarily on practical implementation. If you're also interested in the mathematical theory behind LLMs (which not everyone is), there's a helpful preprint on arXiv titled 'Foundations of Large Language Models' that complements the material.