ThursdAI Aug 24 - Seamless Voice Model, LLaMa Code, GPT3.5 FineTune API & IDEFICS vision model from HF
Hey everyone, this week has been incredible (isn’t every week?). As I’m writing this, I had to pause and go check out the breaking news about LLaMa Code, which was literally released on ThursdAI while I was writing the summary! I think Meta deserves their own section in this ThursdAI update 👏
A few reminders before we dive in: we now have a website (thursdai.news) which will have all the links to Apple, Spotify, and full recordings with transcripts, and will soon have a calendar you can join to never miss a live space!
This whole thing wouldn’t have been possible without Yam, Nisten, Xenova, VB, Far El, LDJ and other expert speakers from different modalities who join and share their expertise from week to week, and there’s a convenient way to follow all of them now!
TL;DR of all topics covered
* Voice
  * SeamlessM4T model from Meta (demo)
* Open Source LLM
  * LLaMa2 - code from Meta
* Vision
  * IDEFICS - a multimodal text + image model from Hugging Face
* AI Art & Diffusion
  * 1 year of Stable Diffusion 🎂
  * IdeoGram
* Big Co LLMs + API updates
* AI Tools & Things
  * Cursor IDE
Voice
SeamlessM4T - a multilingual, multitasking, multimodal voice model.
To me, the absolute most mind-blowing news of this week was Meta open sourcing (not fully, not commercially licensed) SeamlessM4T.
This is a multilingual model that takes speech (and/or text) and can generate the following:
* Text
* Speech
* Translated Text
* Translated Speech
In a single model! For comparison’s sake, it takes me a whole pipeline with Whisper and other translators in targum.video, not to mention much bigger models, and I don’t even generate speech!
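To make the "single model" point concrete, here is a tiny sketch of how the input and output combinations above line up with task labels. It does not call the model at all. The labels (ASR, S2TT, S2ST, T2TT, T2ST) are the abbreviations Meta uses for SeamlessM4T as far as I understand them, while `TASKS`, `task_for` and the language codes are hypothetical names I made up for illustration.

```python
from typing import Optional

# Hypothetical lookup table, not part of any SeamlessM4T API.
# Key: (input modality, output modality, is the target language different?)
TASKS = {
    ("speech", "text", False): "ASR",    # transcription in the source language
    ("speech", "text", True): "S2TT",    # speech-to-text translation
    ("speech", "speech", True): "S2ST",  # speech-to-speech translation
    ("text", "text", True): "T2TT",      # text-to-text translation
    ("text", "speech", True): "T2ST",    # text-to-speech translation
}


def task_for(src_modality: str, tgt_modality: str,
             src_lang: str, tgt_lang: str) -> Optional[str]:
    """Return the task label for a request, or None if the table has no entry."""
    translate = src_lang != tgt_lang
    return TASKS.get((src_modality, tgt_modality, translate))


if __name__ == "__main__":
    # Illustrative language codes only.
    print(task_for("speech", "text", "eng", "eng"))
    print(task_for("speech", "speech", "eng", "fra"))
```

In a pipeline setup like mine, each of those labels would be a separate model or service to wire together; here they are just different requests to one model.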
This incredible news got me giddy and excited so fast. It simplifies and unifies so much of what I do into one model, makes it faster and opens up additional capabilities. But more than that, I strongly believe in the vision that Language Barriers should not exist, and that’s why I built Targum.
Meta apparently also believes in this vision, and gave us an incredible new power unlock that understands 100 languages and does so multilingually without effort.
Language barri