1.6 Trillion Parameters on Huawei Chips: What DeepSeek V4's Release Actually Means
Published on 4/27/2026
•
Engineering
When a model with 1.6 trillion parameters runs on chips that weren't originally designed for that class of tasks, it's less a demonstration of technological superiority and more an engineering compromise. DeepSeek has released a preview of V4 — its largest model yet — on Huawei accelerators, and this decision says more about the market situation than any datasheet numbers. The question isn't how much smarter V4 is than its predecessors, but what price you pay for hardware independence.
Our team has often faced situations where hardware choice is dictated not by performance but by availability. If you're working with AI infrastructure in a region where NVIDIA is sanctioned or costs three times as much, you will inevitably look for alternatives. DeepSeek apparently went down that path, with a caveat: Huawei chips likely required serious software-level optimization, from custom runtime kernels for Huawei's CANN stack (its rough analogue of CUDA) to rewriting attention layers for the specific tensor-block architecture. This isn't something you can replicate over a weekend.
Performance as a Function of Constraints
1.6 trillion parameters is an impressive number, but in practice, serving a model that size on available hardware is either slow or expensive. Even on H100s, it requires clusters and distributed inference; on Huawei chips, even more so. We'd guess DeepSeek either trained V4 as a mixture-of-experts with sparse activation or applied aggressive pruning; otherwise, latency would be unacceptable for real-world use. The original news from Tom's Hardware mentions this is only a preview, leaving room for further trade-offs.
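To see why sparse activation matters here, a toy calculation helps. Every number below (expert count, top-k, dense fraction) is our assumption for illustration only; DeepSeek has not published V4's architecture:

```python
# Toy mixture-of-experts arithmetic. All constants are hypothetical,
# chosen only to show the shape of the trade-off.
TOTAL_PARAMS = 1.6e12   # headline parameter count from the announcement
NUM_EXPERTS = 256       # assumed number of experts
TOP_K = 8               # assumed experts activated per token
SHARED_FRACTION = 0.1   # assumed share of dense (always-active) parameters

expert_params = TOTAL_PARAMS * (1 - SHARED_FRACTION)
active_params = (TOTAL_PARAMS * SHARED_FRACTION
                 + expert_params * TOP_K / NUM_EXPERTS)

print(f"active parameters per token: {active_params / 1e9:.0f}B")
# → active parameters per token: 205B
```

Under these made-up assumptions, each token touches roughly 205B of the 1.6T parameters. That kind of reduction in per-token compute is what makes inference on weaker accelerators conceivable at all.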
Political Context vs Engineering Reality
US allegations of intellectual property theft are noise that doesn't change physics. Even if DeepSeek had access to others' work, running a model on non-standard hardware requires so much in-house engineering that copying becomes only a small part of the story. A much more interesting question is whether DeepSeek can maintain this model in production when every optimization for Huawei is tied to a specific chip revision and library version. In our experience, swapping one GPU vendor for another takes at least three months of adapting the production pipeline, even if the model is already trained.
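One standard way to cap that migration cost is to hide vendor-specific kernels behind a narrow backend interface, so model code never touches them directly. A minimal sketch; the names here are illustrative and not taken from any real DeepSeek or Huawei codebase:

```python
from abc import ABC, abstractmethod

class AcceleratorBackend(ABC):
    """Narrow interface that isolates vendor-specific kernels."""

    @abstractmethod
    def matmul(self, a, b):
        """Vendor matrix multiply (cuBLAS on NVIDIA, CANN ops on Ascend, ...)."""

class CPUBackend(AcceleratorBackend):
    # Pure-Python reference implementation; a CUDA or Ascend backend
    # would implement the same interface with vendor kernels.
    def matmul(self, a, b):
        return [[sum(x * y for x, y in zip(row, col))
                 for col in zip(*b)] for row in a]

def run_layer(backend: AcceleratorBackend, x, w):
    # Model code depends only on the interface, so swapping vendors
    # means writing one new backend, not rewriting the pipeline.
    return backend.matmul(x, w)

print(run_layer(CPUBackend(), [[1, 2]], [[3], [4]]))  # → [[11]]
```

The design choice is the narrow surface: the fewer vendor calls leak past the interface, the closer "three months of adaptation" gets to "implement one class."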
While some argue about politics, others count latency per token. And here's the main lesson for those choosing AI infrastructure: if you bet on proprietary hardware with closed software, you take on vendor lock-in that may be tighter than any political ban.
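For a sense of what "counting latency per token" looks like, here is a back-of-envelope decode estimate assuming memory-bandwidth-bound inference. Every constant is an assumption for illustration, not a measured figure for any real chip:

```python
# Back-of-envelope decode latency, assuming the common case where token
# generation is bound by weight-streaming memory bandwidth.
ACTIVE_PARAMS = 2.05e11   # assumed active parameters per token (MoE guess)
BYTES_PER_PARAM = 1       # assumed FP8 weights
BANDWIDTH = 1.6e12        # assumed bytes/s of HBM per accelerator
NUM_DEVICES = 16          # assumed sharding across devices

# Each generated token must stream every active weight from memory once.
latency_s = ACTIVE_PARAMS * BYTES_PER_PARAM / (BANDWIDTH * NUM_DEVICES)
print(f"~{latency_s * 1000:.1f} ms/token")  # → ~8.0 ms/token
```

Halve the per-chip bandwidth, as a weaker accelerator might, and the figure doubles; that is the price of hardware independence, expressed in milliseconds rather than politics.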
