Segment Anything Model and the Hard Problems of Computer Vision — with Joseph Nelson of Roboflow

Published 2 years, 11 months ago
Description

2023 is the year of Multimodal AI, and Latent Space is going multimodal too!

* This podcast comes with a video demo at the 1hr mark and it’s a good excuse to launch our YouTube - please subscribe!

* We are also holding two events in San Francisco — the first AI | UX meetup next week (already full; we’ll send a recap here on the newsletter) and Latent Space Liftoff Day on May 4th (signup here; but get in touch if you have a high profile launch you’d like to make).

* We also joined the Chroma/OpenAI ChatGPT Plugins Hackathon last week where we won the Turing and Replit awards and met some of you in person!

This post featured on Hacker News.

Of the five human senses, I’d put sight at the very top. Yet weirdly, when it comes to AI, Computer Vision has felt left out of the recent wave compared to image generation, text reasoning, and even audio transcription. We got our first taste of it with the OCR capabilities demo in the GPT-4 Developer Livestream, but GPT-4’s vision capability has yet to be released.

Meta AI leapfrogged OpenAI and everyone else by fully open-sourcing their Segment Anything Model (SAM) last week, complete with paper, model, weights, data (6x more images and 400x more masks than OpenImages), and a very slick demo website. This is a marked change from their previous LLaMA release, which was not commercially licensed. The response has been ecstatic:

SAM was the talk of the town at the ChatGPT Plugins Hackathon and I was fortunate enough to book Joseph Nelson who was frantically integrating SAM into Roboflow this past weekend. As a passionate instructor, hacker, and founder, Joseph is possibly the single best person in the world to bring the rest of us up to speed on the state of Computer Vision and the implications of SAM. I was already a fan of him from his previous pod with (hopefully future guest) Beyang Liu of Sourcegraph, so this served as a personal catchup as well.

Enjoy! And let us know what other news, models, and guests you’d like us to discuss!

- swyx

Recorded in-person at the beautiful StudioPod studios in San Francisco.

Full transcript is below the fold.

Show Notes

* Joseph’s links: Twitter, Linkedin, Personal
