Episode Details
ThursdAI - May 2nd - New GPT2? Copilot Workspace, Evals and Vibes from Reka, LLama3 1M context (+ Nous finetune) & more AI news
Description
Hey 👋 Look, it May or May not be the first AI newsletter you get in May, but it's for sure going to be a very information-dense one. We had an amazing conversation on the live recording today, and over 1K folks joined to listen to the first May updates from ThursdAI.
As you May know by now, I just love giving the stage to the folks who create the actual news I get to cover from week to week, and this week we again had 2 of those conversations.
First we chatted with Piotr Padlewski from Reka, an author of the new Vibe-Eval paper & dataset, which they published this week. We've had Yi and Max from Reka on the show before, but it was Piotr's first time, and he was super knowledgeable and really fun to chat with.
Specifically, as we at Weights & Biases launch a new product called Weave (which you should check out at https://wandb.me/weave), I'm getting a LOT more interested in Evaluations and LLM scoring. In fact, we started the whole show today with a full segment on Evals and Vibe checks, and covered a new paper from Scale about overfitting.
The second deep dive was with my friend Idan Gazit, from GitHub Next, about the new iteration of GitHub Copilot, called Copilot Workspace. It was a great one, and you should definitely give that one a listen as well.
TL;DR of all topics covered + show notes
* Scores and Evals
* No notable changes, LLama-3 is still #6 on LMsys
* gpt2-chat came and went (in depth chan writeup)
* Scale checked for Data Contamination on GSM8K using GSM-1K (Announcement, Paper)
* Vibe-Eval from Reka - a set of multimodal evals (Announcement, Paper, HF dataset)
* Open Source LLMs
* Gradient releases 1M context window LLama-3 finetune (X)
* MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4 (X, HF)
* Nous Research - Hermes 2 Pro - LLama 3 8B (X, HF)
* AI Town is running on Macs thanks to Pinokio (X)
* LMStudio releases their CLI - LMS (X, Github)
* Big CO LLMs + APIs
* GitHub releases Copilot Workspace (Announcement)
* AI21 - releases Jamba Instruct w/ 256K context (Announcement)
* Google shows Med-Gemini with some great results (Announcement)
* Claude releases iOS app and Team accounts (X)
* This week's Buzz