Blog

Intelligent Systems
Design Series

From LLM fundamentals to inference infrastructure — a practical guide to building and serving production AI systems.

Learning Paths

Reading roadmap

LLM Foundations

Follow the 5 posts in this series. Start with chapter 1 or jump to any chapter.

01 Foundations
What Large Language Models Actually Do
You type a sentence into an AI application. Seconds later, it returns several paragraphs of fluent, well-organized text that reads like it was written by a knowledgeable human. That experience is now routine. What is not routine — and what matters if you plan to build anything on top of these sys...
Mar 20, 2026
02 Foundations
Prompts, Context Windows, and How You Talk to an LLM
In the previous post, we sent a single line to an LLM — "Plan a trip to Helsinki" — and got back an itinerary full of specific-sounding details: restaurant names, transit directions, day-trip logistics. It was fluent and plausible, but several of those details turned out to be wrong. The model wa...
Mar 20, 2026
03 Foundations
Why LLMs Need Help — Hallucinations, Grounding, and the Case for Systems
Large language models produce fluent, confident text. That confidence is the problem. A model can sound authoritative about a property listing that no longer exists, a tax rate that changed last quarter, or a school rating from three years ago. It has no mechanism to check. It was not designed to...
Mar 20, 2026
04 Foundations
AI Assistants, AI Agents, and Everything In Between
A useful AI system is defined less by whether it is called an assistant or an agent than by how much control it has over the next step.
Mar 20, 2026
05 Foundations
AI-Powered Customer Support — From Chatbot to Intelligent System
Customer support is a useful capstone example because one message can require retrieval, tool use, memory, routing, and approval boundaries at the same time.
Mar 20, 2026

Reading roadmap

Building AI Systems

Follow the 7 posts in this series. Start with chapter 1 or jump to any chapter.

Reading roadmap

Document & Multimodal Intelligence

Follow the 3 posts in this series. Start with chapter 1 or jump to any chapter.

Reading roadmap

LLM Inference Infrastructure

Follow the 6 posts in this series. Start with chapter 1 or jump to any chapter.

LLM Foundations

What Large Language Models Actually Do

Prompts, Context Windows, and How You Talk to an LLM

Why LLMs Need Help — Hallucinations, Grounding, and the Case for Systems

AI Assistants, AI Agents, and Everything In Between

AI-Powered Customer Support — From Chatbot to Intelligent System

Building AI Systems

From Models to Compound AI Systems

Reliable LLM Pipelines and Control Logic

Grounding with RAG: How AI Systems Retrieve Evidence Before They Answer

Memory, State, and Knowledge: Stop Calling Everything "Memory"

Assistants, Workflows, and Agents: Designing for the Right Level of Autonomy

Agent Loops in Practice: ReAct, Tools, and Failure Modes

When RAG Is Not Enough: CAG, Hybrid Retrieval, and Working Memory

Document & Multimodal Intelligence

Document Intelligence Beyond OCR: Layout, Tables, and Evidence Reconstruction

Multimodal Evidence Systems: VLMs, Figure Grounding, and Cross-Modal Retrieval

Building the Travel Copilot: End-to-End Architecture, Approval Gates, and Auditability

LLM Inference Infrastructure

What Happens After You Call the API

Continuous Batching: Serving Many Requests on One GPU

Paged KV Cache: GPU Memory Management for LLM Serving

Prefill-Decode Disaggregation: Splitting the Two Stages of Inference

Prefix-Aware Routing: Cache-Conscious Request Distribution

MoE Sharding: Parallelism Strategies for Mixture-of-Experts Models