Proactive Agents for the Web [Devi Parikh] – 756

Today, we’re joined by Devi Parikh, co-founder and co-CEO of Yutori, to discuss browser use models and a future where we interact with the web through proactive, autonomous agents. We explore the technical challenges of creating reliable web agents, the advantages of visually grounded models that operate on screenshots rather than the browser’s more brittle Document Object Model (DOM), and why this counterintuitive choice has proven far more robust and generalizable for handling complex web interfaces. Devi also shares insights into Yutori’s training pipeline, which has evolved from supervised fine-tuning to include rejection sampling and reinforcement learning. Finally, we discuss how Yutori’s “Scouts” agents orchestrate multiple tools and sub-agents to handle complex queries, the importance of background, “ambient” operation for these systems, and what the path looks like from simple monitoring to full task automation on the web.

🗒️ For the full list of resources for this episode, visit the show notes page: https://twimlai.com/go/756.

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confirmation=1

🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: https://twitter.com/twimlai
Follow us on LinkedIn: https://www.linkedin.com/company/twimlai/
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/

📖 CHAPTERS
===============================
00:00 – Introduction
04:05 – Yutori
09:22 – AI browsers
13:15 – Scouts
15:43 – Complexities and architecture
20:20 – MCP servers
21:27 – Model
24:17 – Post-training VLM
29:40 – Capabilities
33:20 – Model training
38:01 – RL fine-tuning
39:13 – What’s next for Scouts
41:00 – Use cases
43:29 – Ingestion architecture and tool integration
45:35 – Multi-agent
47:17 – Taking advantage of redundancy in cron jobs
49:50 – Predictions

🔗 LINKS & RESOURCES
===============================
Yutori – https://yutori.com/
Human-AI Collaboration for Creativity with Devi Parikh – 399 – https://twimlai.com/podcast/twimlai/human-ai-collaboration-creativity-devi-parikh/
Why Agents Are Stupid & What We Can Do About It with Dan Jeffries – 713 – https://twimlai.com/podcast/twimlai/why-agents-are-stupid-what-we-can-do-about-it/
Building Maps and Spatial Awareness in Blind AI Agents with Dhruv Batra – 629 – https://twimlai.com/podcast/twimlai/learning-maps-and-spatial-awareness-in-blind-ai-agents/

📸 Camera: https://amzn.to/3TQ3zsg
🎙️ Microphone: https://amzn.to/3t5zXeV
🚦Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5
