| Management number | 220024476 |
|---|---|
| Release Date | 2026/05/03 |
| List Price | US$12.00 |
| Model Number | 220024476 |
Are you tired of shipping LLM “features” that look amazing in demos… and fall apart in production?

You’re not imagining it. LLMs don’t fail because you “picked the wrong prompt.” They fail because the system around them is under-engineered.

One week your RAG prototype answers perfectly. The next week it hallucinates with confidence, ignores instructions after a few turns, and blows through your token budget so hard that your CFO starts asking questions.

And the worst part? You can’t even tell why it failed, because you don’t have the right mental model of how LLMs actually behave, how inference really works, and where reliability leaks into chaos.

If you’re building real products, this is the gap that burns time, credibility, and money.

The good news? You can design LLM systems that are predictable enough to ship: without mystique, without “vibes-based” iteration, and without betting your roadmap on luck.

That’s exactly what this book does.

This is not a fluffy overview. It’s a systems-first, engineering-grade playbook that starts from the fundamentals (tokens, tokenization, embeddings, transformers, attention, inference) and takes you all the way to what matters in the real world: RAG, tool use, guardrails, and evals. Along the way you get step-by-step walkthroughs and an actively maintained GitHub repo you can clone and adapt.

Here is a mere fraction of what you will learn:

- How to think in tokens instead of words, so you can predict cost, latency, and failure risk before you deploy
- How context windows actually break, so you stop losing instructions and silently shipping degraded behavior
- How tokenization choices distort numbers, code, and multilingual text, so your system stays stable across real inputs
- How embeddings create semantic geometry, so retrieval returns the right evidence instead of “nearby nonsense”
- How the compute cost of attention scales with long context, so you avoid slow, expensive designs that don’t buy reliability
- How inference really works through prefill, decode, and the KV cache, so you diagnose latency instead of guessing
- How sampling settings change determinism, so you can choose reproducibility when it matters and creativity when it pays
- How pretraining rewards pattern completion, not truth, so you design guardrails that target the real failure modes
- How data pipelines create both capability and liability, so you reduce contamination, toxicity, and unexpected regressions
- How fine-tuning shapes behavior, so you get the outputs you want without training in new bugs
- How preference optimization works in practice, so you stop treating alignment like magic and start treating it like engineering
- How to choose between RAG, prompt engineering, and fine-tuning, so you ship faster with fewer moving parts

This isn’t theory for theory’s sake. This is the missing “systems layer” that turns an LLM into a reliable product capability.

If you’ve ever said, “I can make it work… but I can’t make it dependable,” then this book was written for you.

So here’s the question: Will you keep shipping fragile demos dressed up as features… or will you start building LLM systems that can survive real users, real traffic, and real constraints?

Click “Buy Now” and start designing LLM systems you can actually ship.
| ISBN13 | 979-8249550684 |
|---|---|
| Language | English |
| Publisher | Independently published |
| Dimensions | 8.5 x 0.78 x 11 inches |
| Item Weight | 2.18 pounds |
| Print length | 345 pages |
| Publication date | February 23, 2026 |