🚀
Headlines & Launches
GPT-5.5 Instant (8 minute read)
OpenAI released GPT5.5 Instant, updating its default ChatGPT model with improved factual accuracy, reduced hallucinations, and stronger personalization based on user context.
Meta plans advanced 'agentic' AI assistant for users (2 minute read)
Meta is building a highly personalized AI assistant that will be able to carry out everyday tasks. The digital assistant will be powered by the company's new Muse Spark AI model. It can connect several hardware and software tools and learn from data with less human intervention than a chatbot. Meta is targeting a launch before the fourth quarter of this year.
🧠
Deep Dives & Analysis
In search of wasted bits: how much information do LLM weights carry? (11 minute read)
A lot of LLM inference is transferring data from one place to another and then computing on it when it's there. The most frustrating bottleneck in the system is when compute units sit idle because the data bus feeding them isn't fast enough. The solution is to transform memory into compute. Quantization is a nice trick, but it doesn't actually trade memory for compute - it transfers half as much data to a place to do twice as much computation.
Computer use is 45x More Expensive Than Structured APIs (7 minute read)
Vision agents are the default for operating web apps that don't expose APIs. Most teams default to vision agents because the alternative, writing an MCP or REST surface, is too expensive to build. The cost of the vision approach is treated as a fixed price. Current vision agents require detailed prompts to succeed in tasks, and they are still prone to making mistakes. Better vision models reduce error rates, but they do not reduce the number of screenshots required to reach the relevant data, each of which is worth thousands of input tokens.
👨💻
Engineering & Research
How to Scale Your Model (14 minute read)
This book discusses the science of scaling language models. It covers how TPUs and GPUs work, how they communicate with each other, how LLMs run on real hardware, and how to parallelize models during training and inference so they run efficiently at massive scale. The book answers questions about how expensive training a model should be, how much memory is needed to serve models, and more.
Accelerating Gemma 4: faster inference with multi-token prediction drafters (4 minute read)
Gemma 4 models reduce latency bottlenecks and achieve improved responsiveness for developers by using Multi-Token Prediction drafters. These drafters deliver up to a 3x speedup without any degradation in output quality or reasoning logic due to a specialized speculative decoding architecture. Speculative decoding decouples token generation from verification. It utilizes idle compute to 'predict' several future tokens at once with the drafter in less time than it takes for the target model to process just one token. The target model then verifies all of these suggested tokens in parallel.
AI2 Released MolmoAct 2 (9 minute read)
MolmoAct 2 is an upgraded action reasoning model that improves real-world robot task performance and is paired with a large open bimanual manipulation dataset.
Gemini API File Search is now multimodal: build efficient, verifiable RAG (3 minute read)
Multimodal support, custom metadata filtering, and page-level citations are now available in the Gemini API File Search tool. The features can help developers bring structure to unstructured data for efficient, verifiable RAG. Users' RAG systems can now natively process and better organize text and visual data. The File Search tool handles the heavy infrastructure so users can focus on building products.
⚡️
Quick Links
Agents for financial services (12 minute read)
Anthropic has released 10 ready-to-run templates for the most time-consuming work in financial services, including building pitchbooks, screening KYC files, and closing the books at month-end.
Become a curator for TLDR AI (3-5 hrs/week)
TLDR is looking for an engineer/researcher at a major AI lab or startup to help write for 1M+ subscribers. Our curators have been invited to Google I/O and OpenAI DevDay, scouted for Tier 1 VCs, and get early access to unreleased TLDR products.
Learn more.