Why Bigger Context Windows Don’t Solve the RAG Problem

As context windows grow into the millions of tokens, many AI practitioners are questioning whether retrieval-augmented generation (RAG) is still necessary. If modern models can ingest entire libraries of documents, why bother with retrieval at all?

In this episode, Alex Bowcut, Head of Engineering at Sphere, explains why the answer depends on the application. Sphere uses AI to automate global tax compliance—an environment where getting the answer right isn’t enough. Every conclusion must be backed by the correct legal citation, and every decision must withstand expert review.

We explore how Sphere built TRAM (Tax Review and Assessment Model), a production AI system that combines retrieval, reasoning models, legal review workflows, reinforcement learning, and deterministic systems to help tax experts move nearly two orders of magnitude faster while maintaining accuracy.

Along the way, we discuss why RAG remains critical in high-stakes domains and how Sphere processes legal and regulatory documents from jurisdictions around the world. We also cover retrieval architectures, semantic chunking, dense versus sparse retrieval, expert feedback loops, and the challenges of building AI systems that people can actually trust.

🗒️ Full show notes: https://twimlai.com/go/769.

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confirmation=1

📖 CHAPTERS
===============================
00:00 Intro: Is RAG Obsolete?
01:24 Meet Sphere and TRAM
02:02 Why Tax Content Is Hard
03:54 How AI Supercharges Tax Experts
05:14 Alex's Background Story
07:03 Messy Legal Data Ingestion
08:58 How TRAM Review Works
11:19 What Triggers Updates
15:13 Legal Review Not Labeling
16:08 Chunking and Indexing Law
21:21 Dense Versus Sparse Search
25:07 Taxonomy Driven Queries
27:55 RAG Is Dead Debate
29:55 Citations and Traceability
31:22 RFT for Accuracy Gains
34:50 Evals and Model Drift
37:47 LLM Reranking and Expansion
40:28 Chasing Nines Accuracy
42:39 Context Windows Impact
44:46 Costs and Latency Reality
45:40 Future Roadmap for TRAM
48:57 Personal AI Workflow Tools
50:30 Closing

🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: https://twitter.com/twimlai
Follow us on LinkedIn: https://www.linkedin.com/company/twimlai/
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/
