Podcast Episodes

KV Cache Explained

In this episode, we dive into the intriguing mechanics behind why chat experiences with models like GPT often start slow but then rapidly pick up speed. The key? The KV cache. This essential but unde…

Published on 10 months, 2 weeks ago

The Shrek Sampler: How Entropy-Based Sampling is Revolutionizing LLMs

In this byte-sized podcast, Harrison Chu, Director of Engineering at Arize, breaks down the Shrek Sampler.

This innovative Entropy-Based Sampling technique, nicknamed the 'Shrek Sampler', is transform…

Published on 10 months, 3 weeks ago

Google's NotebookLM and the Future of AI-Generated Audio

This week, Aman Khan and Harrison Chu explore NotebookLM’s unique features, including its ability to generate realistic-sounding podcast episodes from text (but this podcast is very real!). They dive…

Published on 10 months, 3 weeks ago

Exploring OpenAI's o1-preview and o1-mini

OpenAI recently released o1-preview, which it claims outperforms GPT-4o on a number of benchmarks. These models are designed to think more before answering and handle complex tasks better than t…

Published on 11 months, 1 week ago

Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning

A recent announcement on X boasted a tuned model with pretty outstanding performance, and claimed these results were achieved through Reflection Tuning. However, people were unable to reproduce the r…

Published on 11 months, 2 weeks ago

Composable Interventions for Language Models

This week, we're excited to be joined by Kyle O'Brien, Applied Scientist at Microsoft, to discuss his most recent paper, Composable Interventions for Language Models. Kyle and his team present a new …

Published on 11 months, 3 weeks ago

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

This week’s paper presents a comprehensive study of the performance of various LLMs acting as judges. The researchers leverage TriviaQA as a benchmark for assessing objective knowledge reasoning of L…

Published on 1 year ago

Breaking Down Meta's Llama 3 Herd of Models

Meta just released Llama 3.1 405B; according to them, it’s “the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerabi…

Published on 1 year, 1 month ago

DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines

Chaining language model (LM) calls as composable modules is fueling a new way of programming, but ensuring LMs adhere to important constraints requires heuristic “prompt engineering.”

The paper this …

Published on 1 year, 1 month ago

RAFT: Adapting Language Model to Domain Specific RAG

Where adapting LLMs to specialized domains is essential (e.g., recent news, enterprise private documents), we discuss a paper that asks how we adapt pre-trained LLMs for RAG in specialized domains. S…

Published on 1 year, 2 months ago

© 2025 Developer Service
