Episode Details

📅 ThursdAI - Sunday special on Merging with Maxime LaBonne

Published 2 years, 2 months ago

Description

Hey everyone, we have an exciting interview today with Maxime Labonne.

Maxime is a senior Machine Learning Scientist at JPMorgan, the author of Hands on GNNs book and his own ML Blog, creator of LazyMergeKit (which we cover on the pod) and holds a PHD in Artificial Intelligence from the Institut Polytechnique de Paris.

Maxime has been mentioned on ThursdAI a couple of times before, as he released the first Phi mixture-of-experts, and has previously finetuned OpenHermes using DPO techniques which resulted in NeuralChat7B

For the past couple of months, following AI on X, it was hard not to see Maxime's efforts show up on the timeline, and one of the main reasons I invited Maxime to chat was the release of NeuralBeagle7B, which at the time of writing was the top performing 7B model on the LLM leaderboard, and was specifically a merge of a few models.

Model merging

Model merging has been around for a while but recently has been heating up, and Maxime has a lot to do with that, as he recently checked, and his wrapper on top of MergeKit by Charles Goddard (which is the library that put model merging into the mainstream) called LazyMergeKit was in charge of >50% of the merged models on HuggingFace hub leaderboard.

Maxime also authored a model merging blogpost on Hugging Face and wrote quite a few articles and shared code that helped others to put merged models out.

Modern day Alchemy

This blogpost is a great resource on what model merging actually does, so I won't go into depth of what the algorithms are, please refer to that if you want a deep dive, but in a nutshell, model merging is a technique to apply algorithms to the weights of a few models, even a few instances of the same model (like Mistral7B) and create a new model, that often performs better than the previous ones, without additional training!

Since this is algorithmic, it doesn't require beefy GPUs burning power to keep training or finetuning, and since the barrier of entry is very low, we get some cool and crazy results as you'll see below.

Yeah, quite crazy as it sounds, this method can also create models of non standard sizes, like 10B or 120B models, since it's slicing pieces of other models and stitching them together in new ways.

If you recall, we had a deep dive with Jon Durbin who released Bagel, and Jon specifically mentioned that he created Bagel (based on everything everywhere all at once) as a good base for merges, that will include all the prompt formats, you can read and listen to that episode here

This merge frenzy, made HuggingFace change the leaderboard, and add a checkbox that hides model merges, because they are flooding the leaderboard, and often, and require much smaller effort than actually pre-training or even finetuning a model

And quite often the top of the leaderboard was overrun with model merges like in this example of Bagel and it's merges by CloudYu (which are not the top ones but still in the top 10 as I write this)

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new

Episode Details

📅 ThursdAI - Sunday special on Merging with Maxime LaBonne

Description

Listen Now

Love PodBriefly?