
Today, we’re joined by Aditi Raghunathan, assistant professor at Carnegie Mellon University, to discuss the limitations of LLMs and how we can build more adaptable and creative models. We dig into her ICML 2025 Outstanding Paper Award winner, “Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction,” which examines why LLMs struggle to generate truly novel ideas. We explore the "Roll the dice" approach, which encourages structured exploration by injecting randomness at the start of generation, and the "Look before you leap" concept, which trains models to take "leaps of thought" using alternative objectives that yield more diverse and structured outputs. We also discuss Aditi’s work on the counterintuitive phenomenon of "catastrophic overtraining," in which pre-training on more data improves benchmark performance but degrades a model’s ability to be fine-tuned for new tasks, and her lab’s research on building more controllable and reliable models, including "memorization sinks," an architectural approach that isolates memorized information and enables its targeted unlearning.
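For listeners who want a concrete feel for the "Roll the dice" idea, here is a minimal, hypothetical sketch of seed conditioning: randomness is injected once, at the start of generation, via a random prefix, and decoding is otherwise greedy, rather than relying on per-token temperature sampling. The model name, seed-token format, and generate_with_seed helper are illustrative assumptions for a Hugging Face-style causal LM, not code from the paper.

import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model; the paper's training setup differs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate_with_seed(prompt, num_samples=3, seed_space=1000):
    # Hypothetical helper: draw a fresh seed prefix per sample ("roll the dice"),
    # then decode greedily so all diversity comes from the seed, not sampling noise.
    outputs = []
    for _ in range(num_samples):
        seed_tag = f"<seed_{random.randrange(seed_space)}> "
        inputs = tokenizer(seed_tag + prompt, return_tensors="pt")
        with torch.no_grad():
            ids = model.generate(**inputs, max_new_tokens=40, do_sample=False)
        outputs.append(tokenizer.decode(ids[0], skip_special_tokens=True))
    return outputs

print(generate_with_seed("Invent a new word and define it:"))

Note that an off-the-shelf model will largely ignore the seed tag; diverse outputs require a model trained (or fine-tuned) with such seed prefixes. The sketch only illustrates the interface: randomize once up front, then decode deterministically.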
🗒️ For the full list of resources for this episode, visit the show notes page: https://twimlai.com/go/747.
🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confirmation=1
🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: https://twitter.com/twimlai
Follow us on LinkedIn: https://www.linkedin.com/company/twimlai/
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/
📖 CHAPTERS
===============================
00:00 – Introduction
04:30 – Gap between benchmark performance and real-world user experience
06:19 – Fine-tuning and model adaptability
10:16 – Token to parameter ratio
14:38 – Overtrained Language Models Are Harder to Fine-Tune paper
16:17 – Base model selection
17:55 – Unlearning
22:04 – Memorization Sinks: Isolating Memorization during LLM Training paper
29:05 – Role of memory in LLMs
30:53 – Going beyond the creative limits of next-token prediction paper
34:49 – Creativity
37:12 – Exploration
38:20 – How creativity differs in LLMs
44:22 – The "Look before you leap" approach
46:36 – The "Roll the dice" approach
52:43 – Compatibility with RL training
54:00 – Future directions
🔗 LINKS & RESOURCES
===============================
Aditi Raghunathan’s Group @ ICML 2025 – https://www.cs.cmu.edu/~aditirag/icml2025.html
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction – https://arxiv.org/pdf/2504.15266
Overtrained Language Models Are Harder to Fine-Tune – https://arxiv.org/pdf/2503.19206
Memorization Sinks: Isolating Memorization during LLM Training – https://arxiv.org/pdf/2507.09937
Exploring the “Biology” of LLMs with Circuit Tracing with Emmanuel Ameisen – #727 – https://twimlai.com/podcast/twimlai/exploring-the-biology-of-llms-with-circuit-tracing/
📸 Camera: https://amzn.to/3TQ3zsg
🎙️Microphone: https://amzn.to/3t5zXeV
🚦Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5