Reshare: Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20

lqdev 👽 05/28/2024

https://github.com/karpathy/llm.c/discussions/481

...the TLDR is that we're training a 12-layer GPT-2 (124M), from scratch, on 10B tokens of FineWeb, with max sequence length of 1024 tokens.

The 124M model is the smallest model in the GPT-2 series released by OpenAI in 2019, and is actually quite accessible today, even for the GPU poor. With llm.c, which is quite efficient at up to ~60% model flops utilization, reproducing this model on one 8X A100 80GB SXM node takes ~90 minutes. For example, on Lambda this node goes for ~$14/hr, so the total cost of reproducing this model today is about $20. You can train the model with a single GPU too, it would just take proportionally longer (e.g. ~4-24 hours depending on the GPU).
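As a rough sanity check on the quoted numbers, the cost and utilization figures can be reproduced with a few lines of arithmetic. This is only a sketch: the 6 × parameters × tokens estimate of training FLOPs is a common approximation that ignores attention cost, and the 312 TFLOPS peak (A100 dense BF16) is my assumption, not something stated in the post. The function and constant names are illustrative.

```python
# Back-of-the-envelope check of the figures quoted above.
# Values marked "quote" come from the excerpt; others are assumptions.

NODE_PRICE_PER_HOUR = 14.0   # USD, "~$14/hr" (quote)
TRAIN_MINUTES = 90           # "~90 minutes" (quote)
NUM_GPUS = 8                 # one 8x A100 node (quote)
PARAMS = 124e6               # GPT-2 124M (quote)
TOKENS = 10e9                # 10B tokens of FineWeb (quote)
A100_PEAK_FLOPS = 312e12     # assumption: A100 dense BF16 peak, FLOP/s


def training_cost(price_per_hour: float, minutes: float) -> float:
    """Cost in USD of renting the node for the given number of minutes."""
    return price_per_hour * minutes / 60.0


def approx_mfu(params: float, tokens: float, seconds: float,
               num_gpus: int, peak_flops: float) -> float:
    """Approximate model FLOPs utilization.

    Uses the 6 * params * tokens rule of thumb for total training
    FLOPs (an approximation that leaves out attention FLOPs).
    """
    total_flops = 6 * params * tokens
    return total_flops / (seconds * num_gpus * peak_flops)


seconds = TRAIN_MINUTES * 60
cost = training_cost(NODE_PRICE_PER_HOUR, TRAIN_MINUTES)
tokens_per_sec = TOKENS / seconds
mfu = approx_mfu(PARAMS, TOKENS, seconds, NUM_GPUS, A100_PEAK_FLOPS)

print(f"cost: ${cost:.2f}")
print(f"throughput: {tokens_per_sec / 1e6:.2f}M tokens/s")
print(f"approx MFU: {mfu:.0%}")
```

Under these assumptions the cost comes out to $21, in line with the "about $20" in the quote, and the estimated utilization lands a little under the quoted ~60%, which is plausible given that the rule of thumb undercounts FLOPs.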

Permalink: /feed/repro-gpt-2-llm-c-90-min-20-dollars-karpathy/

Tags: #ai #llm #gpt #gpt2 #llmc #c #slm
