97% Cheaper, Faster, Better, Correct AI — with Varun Mohan of Codeium
Description
OpenAI just rocked the AI world yet again yesterday: alongside releasing the long-awaited ChatGPT API, they priced it at $2 per million tokens generated, 90% cheaper than the text-davinci-003 pricing of the “GPT3.5” family. Their blogpost on how they did it is vague: “Through a series of system-wide optimizations, we’ve achieved 90% cost reduction for ChatGPT since December; we’re now passing through those savings to API users.”
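The 90% figure follows directly from the list prices; a quick sanity check (per-1K-token rates assumed from OpenAI's published pricing at the time):

```python
# Assumed list prices, USD per 1K tokens (not stated in this post's text)
davinci_price_per_1k = 0.02   # text-davinci-003
chatgpt_price_per_1k = 0.002  # gpt-3.5-turbo (ChatGPT API)

per_million = chatgpt_price_per_1k * 1000          # scale to 1M tokens
reduction = 1 - chatgpt_price_per_1k / davinci_price_per_1k

print(f"${per_million:.2f} per million tokens, {reduction:.0%} cheaper")
# → $2.00 per million tokens, 90% cheaper
```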
We were fortunate enough to record Episode 2 of our podcast with someone who routinely delivers 90%+ improvements for his customers, and who has in fact started productizing his infra skills with Codeium, the rapidly growing free-forever Copilot alternative (see What Building “Copilot for X” Really Takes). Varun Mohan is CEO of Exafunction/Codeium, and he indulged us in diving deep into AI infrastructure, compute-optimal training vs inference tradeoffs, and why he loves suffering.
Recorded in-person at the beautiful StudioPod studios in San Francisco.
Full transcript is below the fold.
Timestamps
* 00:00: Intro to Varun and Exafunction
* 03:06: GPU Efficiency, Model Flop Utilization, Dynamic Multiplexing
* 05:30: Should companies own their ML infrastructure?
* 07:00: The two kinds of LLM Applications
* 08:30: Codeium
* 14:50: “Our growth is 4-5% day over day”
* 16:30: Latency, Quality, and Correctability
* 20:30: Acceleration mode vs Exploration mode
* 22:00: Copilot for X - Harvey AI’s deal with Allen & Overy
* 25:00: Scaling Laws (Chinchilla)
* 28:45: “The compute-optimal model might not be easy to serve”
* 30:00: Smaller models
* 32:30: DeepMind RETRO can retrieve external information
* 34:30: Implications for embedding databases
* 37:10: LLMOps - Eval, Data Cleaning
* 39:45: Testing/User feedback
* 41:00: “Users Is All You Need”
* 42:45: General Intelligence + Domain Specific Dataset
* 43:15: The God Nvidia computer
* 46:00: Lightning round
Show notes
* Blogpost: Are GPUs Worth it for ML
* Codeium
* Eleuther’s The Pile and The Stack
* What Building “Copilot for X” Really Takes
* Copilot for X
* Harvey, Copilot for Law - deal with Allen & Overy
* Scaling Laws
* Training Compute-Optimal Large Language Models (Chinchilla paper, arXiv)
* chinchilla's wild implications (LessWrong)