Episode Details
ThursdAI - May 9 - AlphaFold 3, im-a-good-gpt2-chatbot, Open Devin SOTA on SWE-Bench, DeepSeek V2 super cheap + interview with OpenUI creator & more AI news
Description
Hey! (show notes and links are a bit below)
This has been a great week in AI, though it does feel a bit like the "quiet before the storm", with Google I/O on Tuesday next week (which I'll be covering from the ground in Shoreline!) and rumors that OpenAI is not going to let Google have all the spotlight!
Early this week, 2 new models showed up on LMsys, im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot, and we've now confirmed that they are from OpenAI. Folks have been testing them with logic puzzles and role play and have been saying great things, so maybe that's what we'll get from OpenAI soon?
Also on the show today, we had a BUNCH of guests, and as you know, I love chatting with the folks who make the news. We were honored to host Xingyao Wang and Graham Neubig, core maintainers of Open Devin (which just broke SOTA on SWE-Bench this week!), and then friends of the pod Tanishq Abraham and Parmita Mishra dove deep into AlphaFold 3 from Google (both are medical / bio experts).
Also this week, OpenUI from Chris Van Pelt (Co-founder & CIO at Weights & Biases) has been blowing up, taking the #1 GitHub trending spot, and I had the pleasure of inviting Chris to chat about it on the show!
Let's delve into this (yes, this is I, Alex the human, using Delve as a joke, don't get triggered)
TL;DR of all topics covered (trying something new, my raw notes with all the links and bullet points are at the end of the newsletter)
* Open Source LLMs
  * OpenDevin getting SOTA on SWE-Bench with 21% (X, Blog)
  * DeepSeek V2 - 236B (21B Active) MoE (X, Try It)
  * Weights & Biases OpenUI blows past 11K stars (X, GitHub, Try It)
  * Llama-3 120B Chonker Merge from Maxime Labonne (X, HF)
  * Alignment Lab open sources Buzz - 31M rows training dataset (X, HF)
  * xLSTM - new transformer alternative (X, Paper, Critique)
* Benchmarks & Eval updates
  * Llama-3 still in 6th place (LMsys analysis)
  * Reka Core gets awesome 7th place and Qwen-Max breaks top 10 (X)
  * No upsets in LLM leaderboard
* Big CO LLMs + APIs
  * Google DeepMind announces AlphaFold-3 (Paper, Announcement)
  * OpenAI publishes their Model Spec (Spec)
  * OpenAI tests 2 models on LMsys (im-also-a-good-gpt2-chatbot & im-a-good-gpt2-chatbot)
  * OpenAI joins Coalition for Conten