97% Cheaper, Faster, Better, Correct AI — with Varun Mohan of Codeium

Published 3 years ago
Description

OpenAI just rocked the AI world yet again yesterday — while releasing the long-awaited ChatGPT API, they also priced it at $2 per million tokens generated, which is 90% cheaper than the text-davinci-003 pricing of the “GPT-3.5” family. Their blogpost on how they did it is vague: “Through a series of system-wide optimizations, we’ve achieved 90% cost reduction for ChatGPT since December; we’re now passing through those savings to API users.”
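The arithmetic behind that headline number is worth a quick sanity check. A minimal sketch, assuming the public list prices at the time ($0.02 per 1K tokens for text-davinci-003, $0.002 per 1K for the ChatGPT API):

```python
# Assumed list prices at launch (per 1K tokens), scaled to per-million.
davinci_per_million = 0.02 * 1000   # text-davinci-003: $20 per 1M tokens
chatgpt_per_million = 0.002 * 1000  # ChatGPT API: $2 per 1M tokens

# Fractional price reduction relative to text-davinci-003.
reduction = 1 - chatgpt_per_million / davinci_per_million
print(f"${chatgpt_per_million:.0f}/1M tokens, {reduction:.0%} cheaper")
# -> $2/1M tokens, 90% cheaper
```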

We were fortunate enough to record Episode 2 of our podcast with someone who routinely creates 90%+ improvements for their customers, and who has in fact started productizing those infra skills with Codeium, the rapidly growing free-forever Copilot alternative (see What Building “Copilot for X” Really Takes). Varun Mohan is CEO of Exafunction/Codeium, and he indulged us in diving deep into AI infrastructure, compute-optimal training vs inference tradeoffs, and why he loves suffering.

Recorded in-person at the beautiful StudioPod studios in San Francisco.

Full transcript is below the fold.

Timestamps

* 00:00: Intro to Varun and Exafunction

* 03:06: GPU Efficiency, Model Flop Utilization, Dynamic Multiplexing

* 05:30: Should companies own their ML infrastructure?

* 07:00: The two kinds of LLM Applications

* 08:30: Codeium

* 14:50: “Our growth is 4-5% day over day”

* 16:30: Latency, Quality, and Correctability

* 20:30: Acceleration mode vs Exploration mode

* 22:00: Copilot for X - Harvey AI’s deal with Allen & Overy

* 25:00: Scaling Laws (Chinchilla)

* 28:45: “The compute-optimal model might not be easy to serve”

* 30:00: Smaller models

* 32:30: DeepMind RETRO can retrieve external information

* 34:30: Implications for embedding databases

* 37:10: LLMOps - Eval, Data Cleaning

* 39:45: Testing/User feedback

* 41:00: “Users Is All You Need”

* 42:45: General Intelligence + Domain Specific Dataset

* 43:15: The God Nvidia computer

* 46:00: Lightning round

Show notes

* Varun Mohan Linkedin

* Exafunction

* Blogpost: Are GPUs Worth it for ML

* Codeium

* Copilot statistics

* Eleuther’s The Pile and The Stack

* What Building “Copilot for X” Really Takes

* Copilot for X

* Harvey, Copilot for Law - deal with Allen & Overy

* Scaling Laws

* Training Compute-Optimal Large Language Models - arXiv (Chinchilla paper)

* chinchilla's wild implications (LessWrong)

* UL2 20B: An
