ARC-AGI-3 leaderboard shock & Search as Code for agents - AI News (Jun 3, 2026)

Published 2 weeks, 5 days ago
Description
Please support this podcast by checking out our sponsors:
- Consensus: AI for Research. Get a free month - https://get.consensus.app/automated_daily
- SurveyMonkey: Using AI to surface insights faster and reduce manual analysis time - https://get.surveymonkey.com/tad
- KrispCall: Agentic Cloud Telephony - https://try.krispcall.com/tad


Support The Automated Daily directly:
Buy me a coffee: https://buymeacoffee.com/theautomateddaily

Today's topics:

ARC-AGI-3 leaderboard shock - A claim on X says “Opus 4.8” jumped up the ARC Prize leaderboard on the ARC-AGI-3 benchmark, beating “GPT-5.5” by a large margin while remaining far from human efficiency, which raises fresh questions about generalization progress.

Search as Code for agents - Perplexity’s “Search as Code” pitches agent-built Python retrieval pipelines in sandboxes using low-level primitives, aiming to cut tokens, reduce context noise, and scale wide research tasks.

AI compute funding and capex - Alphabet is reportedly raising up to $80B via stock sales to expand AI compute capacity, underscoring how the AI arms race is pushing Big Tech into massive capex and dilution-sensitive financing.

Export controls tighten on GPUs - The US Commerce Department updated guidance to block Chinese AI firms from buying Nvidia and AMD frontier chips through overseas subsidiaries, closing a key export-control loophole.

NVIDIA pushes open world models - NVIDIA announced Cosmos 3 as an open multimodal “world foundation model” for robotics and physical AI, highlighting a push toward simulation, synthetic data, and interoperable world-model ecosystems.

Open-weights models heat up - NVIDIA’s Nemotron 3 Ultra and the new open-weight Mellum 2 model show the open ecosystem accelerating on both intelligence and efficiency, intensifying competition across US and Chinese labs.

Agents move into Microsoft 365 - Microsoft’s Scout is an always-on Microsoft 365 agent tied to governed identity, signaling a shift from chat copilots to background automation—and putting security and control in the spotlight.

AI tutoring beats law professors - A Stanford Law School-led study found professors often preferred AI-generated responses to common student questions, suggesting AI tutoring may already be competitive in nuanced, reasoning-heavy domains.

Production LLM ops gets messy - Datadog’s report, based on LLM telemetry from 1,000+ orgs, says teams are running multi-model fleets and agent workflows in production—while accumulating “LLM tech debt” and observability gaps.

AI policy, cyber, and society - Trump’s scaled-back AI cybersecurity executive order proposes voluntary pre-release review and a vulnerability clearinghouse, while US communities increasingly push back on data centers as a proxy fight over AI.

Model welfare and alignment tradeoffs - A critique of Anthropic’s Claude Opus 4.8 argues ‘fixing’ behavior can cause new failure modes, challenging self-report welfare evaluations and highlighting alignment tradeoffs like honesty versus affect.

AI and mental health risks - AXA’s global survey reports worsening mental health and widespread AI use for advi
