97% Cheaper, Faster, Better, Correct AI — with Varun Mohan of Codeium
Description
OpenAI just rocked the AI world yet again yesterday: alongside releasing the long-awaited ChatGPT API, they priced it at $2 per million tokens generated, 90% cheaper than the text-davinci-003 pricing of the “GPT3.5” family. Their blogpost on how they did it is vague: “Through a series of system-wide optimizations, we’ve achieved 90% cost reduction for ChatGPT since December; we’re now passing through those savings to API users.”
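The 90% figure follows directly from the list prices; a quick sanity check (per-1K-token rates assumed from OpenAI's published pricing at the time):

```python
# Assumed list prices, USD per 1K tokens (not stated in this post's text)
davinci_price_per_1k = 0.02   # text-davinci-003
chatgpt_price_per_1k = 0.002  # gpt-3.5-turbo (ChatGPT API)

per_million = chatgpt_price_per_1k * 1000          # scale to 1M tokens
reduction = 1 - chatgpt_price_per_1k / davinci_price_per_1k

print(f"${per_million:.2f} per million tokens, {reduction:.0%} cheaper")
# → $2.00 per million tokens, 90% cheaper
```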
We were fortunate enough to record Episode 2 of our podcast with someone who routinely delivers 90%+ improvements for his customers, and who has in fact started productizing his infra skills with Codeium, the rapidly growing free-forever Copilot alternative (see What Building “Copilot for X” Really Takes). Varun Mohan is CEO of Exafunction/Codeium, and he indulged us in diving deep into AI infrastructure, compute-optimal training vs inference tradeoffs, and why he loves suffering.
Recorded in-person at the beautiful StudioPod studios in San Francisco.
Full transcript is below the fold.
Timestamps
* 00:00: Intro to Varun and Exafunction
* 03:06: GPU Efficiency, Model Flop Utilization, Dynamic Multiplexing
* 05:30: Should companies own their ML infrastructure?
* 07:00: The two kinds of LLM Applications
* 08:30: Codeium
* 14:50: “Our growth is 4-5% day over day”
* 16:30: Latency, Quality, and Correctability
* 20:30: Acceleration mode vs Exploration mode
* 22:00: Copilot for X - Harvey AI’s deal with Allen & Overy
* 25:00: Scaling Laws (Chinchilla)
* 28:45: “The compute-optimal model might not be easy to serve”
* 30:00: Smaller models
* 32:30: DeepMind RETRO can retrieve external information
* 34:30: Implications for embedding databases
* 37:10: LLMOps - Eval, Data Cleaning
* 39:45: Testing/User feedback
* 41:00: “Users Is All You Need”
* 42:45: General Intelligence + Domain Specific Dataset
* 43:15: The God Nvidia computer
* 46:00: Lightning round
Show notes
* Blogpost: Are GPUs Worth it for ML
* Codeium
* Eleuther’s The Pile and The Stack
* What Building “Copilot for X” Really Takes
* Copilot for X
* Harvey, Copilot for Law - deal with Allen & Overy
* Scaling Laws
* Training Compute-Optimal Large Language Models (Chinchilla paper, arXiv)
* chinchilla's wild implications (LessWrong)