DeepSeek-V4: when inference is faster and training goes through verification
Published on 4/26/2026
•
Engineering
The AI model race has entered a stage where architectural breakthroughs happen less often and engineering ones more often. DeepSeek-V4, judging by the LMSYS blog, bets not on size but on a combination: fast inference on SGLang plus verified RL (reinforcement learning) via Miles. For us this is less news about a new model than a signal: the bottleneck is no longer model quality but the cost of running it and confidence in the output.
What sets V4 apart from its predecessors is a two-phase pipeline. In the first phase, SGLang optimizes execution: dynamic batching, fused kernels, efficient KV-cache management. In the second, Miles uses verifiers instead of conventional RLHF: the model doesn't just learn from human preferences; it checks its answers against formal criteria. This reduces hallucinations without requiring huge labeled datasets.
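The verifier idea is easy to sketch as a toy reward function. To be clear, this is our own minimal illustration of the concept, not the Miles API: `Task`, `check`, and `verified_reward` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # formal criterion for a correct answer

def verified_reward(task: Task, model_output: str) -> float:
    """Return 1.0 iff the output passes the formal check.

    Unlike RLHF, no human preference labels are involved:
    the criterion itself is the supervision signal.
    """
    try:
        return 1.0 if task.check(model_output) else 0.0
    except Exception:
        return 0.0  # a crashing verifier counts as a failed answer

# A math task whose correctness is trivially checkable.
task = Task(prompt="What is 17 * 23?",
            check=lambda out: out.strip() == "391")
print(verified_reward(task, "391"))  # 1.0
print(verified_reward(task, "390"))  # 0.0
```

The key property: the reward is deterministic and reproducible, which is exactly what makes RL on top of it cheaper to scale than assessor-driven preference labeling.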
Why this matters for production
In our experience, most problems with LLMs in real projects stem not from the model being "undertrained" but from it being expensive to keep in production and hard to quality-control. DeepSeek-V4 attacks both problems at once: SGLang promises a real throughput boost on the same hardware (a claimed 2–3x speedup on typical workloads), while verified RL answers the question of how to keep the model from lying without an army of assessors.
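The dynamic-batching part of that throughput claim is easy to build intuition for with a toy cost model. The sketch below assumes one decoded token per scheduler step and ignores prefill and kernel effects; the numbers are illustrative, not SGLang measurements.

```python
import heapq

def static_batch_steps(lengths: list[int], batch_size: int) -> int:
    """Static batching: the whole batch waits for its longest request."""
    return sum(max(lengths[i:i + batch_size])
               for i in range(0, len(lengths), batch_size))

def continuous_batch_steps(lengths: list[int], batch_size: int) -> int:
    """Continuous batching: a finished request immediately frees its slot."""
    slots = [0] * batch_size  # step at which each slot becomes free
    heapq.heapify(slots)
    for n in lengths:
        heapq.heappush(slots, heapq.heappop(slots) + n)
    return max(slots)

lengths = [50, 5, 50, 10]  # decode lengths of four queued requests
print(static_batch_steps(lengths, 2))      # 100 steps
print(continuous_batch_steps(lengths, 2))  # 60 steps
```

Short requests no longer wait out the long ones in their batch, which is where much of the real-world speedup on mixed workloads comes from.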
We wouldn't put V4 into a client's production right now: the model is too fresh, and the Miles documentation is still sparse. But the direction resonates with us: instead of "feed it more data," it's "add formal verification." This is especially relevant for tasks where the cost of error is high: financial reporting, medical diagnosis, legal document generation.
Where it might not work
Verifiers are good when the criterion can be formalized. For creative tasks (writing emails, generating ad copy, brainstorming content ideas), verification won't help and may even reduce response diversity. Miles, judging by its description, is tailored to tasks with a clear structure: code, math, logical chains. For "soft" domains, RLHF with human feedback remains more effective.
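The code case shows why the distinction matters: a generated program can be executed against tests, while no analogous oracle exists for an ad headline. Here is our own minimal sketch of such a check; note that a real verifier would run candidates in a sandbox with timeouts and resource limits, never a bare `exec`.

```python
def passes_tests(candidate_code: str, tests: list[tuple]) -> bool:
    """Formal criterion for generated code: does it pass the unit tests?

    By convention in this sketch the candidate must define a function
    named `solve`. WARNING: bare exec() is for illustration only;
    production verifiers isolate candidates in a sandbox.
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)
        solve = namespace["solve"]
        return all(solve(*args) == expected for args, expected in tests)
    except Exception:
        return False

good = "def solve(x):\n    return 2 * x"
bad = "def solve(x):\n    return x + 1"
tests = [((3,), 6), ((0,), 0)]
print(passes_tests(good, tests))  # True
print(passes_tests(bad, tests))   # False
```

Try writing the equivalent `passes_tests` for "generate a catchy subject line" and the asymmetry becomes obvious.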
Another risk is vendor lock-in. SGLang and Miles are open-source projects, but their integration with DeepSeek-V4 is optimized specifically for this model. Moving the same pipeline to another model will require adapting the verifiers and re-tuning inference. In practice this means that if you choose V4, swapping it for Llama-5 or GPT-5 won't be a matter of just replacing the model; you'll be rewriting part of the pipeline.
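One way to soften that lock-in is to hide the model behind a thin interface so that a swap touches a single adapter rather than the verifiers. A sketch of the shape we mean, with all names hypothetical:

```python
from typing import Callable, Protocol

class TextModel(Protocol):
    """The only model surface the pipeline is allowed to depend on."""
    def generate(self, prompt: str) -> str: ...

def answer_with_check(model: TextModel, prompt: str,
                      check: Callable[[str], bool]) -> tuple[str, bool]:
    """Pipeline logic stays model-agnostic: generate, then verify."""
    output = model.generate(prompt)
    return output, check(output)

class FakeModel:
    """Stand-in backend; real adapters would wrap V4, Llama, etc."""
    def generate(self, prompt: str) -> str:
        return "391"

out, ok = answer_with_check(FakeModel(), "What is 17 * 23?",
                            lambda s: s.strip() == "391")
print(out, ok)  # 391 True
```

This doesn't make inference tuning portable, but it does keep the verifier layer, the part you invest the most domain knowledge in, from being welded to one vendor.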
DeepSeek-V4 is not a revolution but an evolution in the right direction: lower compute cost, more determinism. For teams already building AI products, it's a reason to look at verified RL as a risk-reduction tool. For those just starting out, it's another argument not to chase model size but to design the pipeline with quality control in mind.
