ThursdAI Aug 24 - Seamless Voice Model, LLaMa Code, GPT3.5 FineTune API & IDEFICS vision model from HF

Published 2 years, 7 months ago
Description

Hey everyone, this week has been incredible (isn’t every week?). As I was writing this, I had to pause and go check out the breaking news about LLaMa Code, which was literally released on ThursdAI while I was writing the summary! I think Meta deserves their own section in this ThursdAI update 👏

A few reminders before we dive in: we now have a website (thursdai.news), which will have all the links to Apple, Spotify, and full recordings with transcripts, and will soon have a calendar you can join to never miss a live space! This whole thing wouldn’t have been possible without Yam, Nisten, Xenova, VB, Far El, LDJ and other expert speakers from different modalities who join and share their expertise from week to week, and there’s a convenient way to follow all of them now!

TL;DR of all topics covered

* Voice
  * SeamlessM4T model from Meta (demo)
* Open Source LLM
  * LLaMa2 - code from Meta
* Vision
  * IDEFICS - A multimodal text + image model from Hugging Face
* AI Art & Diffusion
  * 1 year of Stable Diffusion 🎂
  * IdeoGram
* Big Co LLMs + API updates
  * GPT 3.5 Finetuning API
* AI Tools & Things
  * Cursor IDE

Voice

SeamlessM4T - A multilingual, multitasking, multimodal voice model.

To me, the absolute most mind-blowing news of this week was Meta open sourcing (not fully, not commercially licensed) SeamlessM4T.

This is a multilingual model that takes speech (and/or text) and can generate the following:

* Text

* Speech

* Translated Text

* Translated Speech

In a single model! For comparison’s sake, it takes me a whole pipeline with Whisper and other translators in targum.video, not to mention much bigger models, and I don’t even generate speech!

This incredible news got me giddy and excited so fast, not only because it simplifies and unifies so much of what I do into one model, makes it faster, and opens up additional capabilities, but also because I strongly believe in the vision that language barriers should not exist, and that’s why I built Targum.

Meta apparently also believes in this vision, and gave us an incredible new power unlock that understands 100 languages and does so multilingually without effort.

Language barri
