Given that we’ve successfully trained pretty strong multimodal language models at Reka, many people have been particularly curious about the experiences of building infrastructure and training large language & multimodal models from scratch from a completely clean slate.

I complain a lot about external (outside Google) infrastructure and code on my social media, leading people to really be curious about what are the things I miss and what I hate/love in the wilderness. So here’s a post (finally). This blogpost sheds light on the challenges and lessons learned.

Figuring out things in the wilderness was an interesting experience. It was unfortunately not painless. Compute scarcity and also unreliable compute providers made things significantly harder than expected but we’re glad we pulled through with brute technical strength.

All in all, this is only a small part of the story of how we started a company, raised some money, bought some chips and matched Gemini pro/GPT 3.5 and outperformed many others in less than a year having to build everything from scratch.

Send me a message or webmention
Back to feed