1.6 Trillion Parameters on Huawei Chips: What DeepSeek V4's Release Actually Means
Published on 4/27/2026
Engineering
When a model with 1.6 trillion parameters runs on chips that weren't originally designed for that class of tasks, it's less a demonstration of technological superiority and more an engineering compromise. DeepSeek has released a preview of V4 — its largest model yet — on Huawei accelerators, and this decision says more about the market situation than any datasheet numbers. The question isn't how much smarter V4 is than its predecessors, but what price you pay for hardware independence.
Our team has often faced situations where hardware choice is dictated not by performance but by availability. If you work with AI infrastructure in a region where NVIDIA hardware is blocked by sanctions or costs three times as much, you will inevitably look for alternatives. DeepSeek apparently went down that path, with one caveat: Huawei chips likely required serious software-level optimization, from custom kernels written for Huawei's CUDA-like runtime to attention layers rewritten for the chip's specific matrix-compute blocks. This isn't something you can replicate over a weekend.
Performance as a Function of Constraints
1.6 trillion parameters is an impressive number, but in practice, inference on a model this size will be either slow or expensive on any available hardware. Even on H100s, the weights do not fit on a single server, so the model needs a cluster and distributed inference. On Huawei chips, even more so. Our guess is that DeepSeek applied aggressive pruning or a mixture-of-experts design with sparse activation; otherwise, latency would be unacceptable for real-world use. The original Tom's Hardware report notes that this is only a preview, which leaves room for further trade-offs.
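A back-of-the-envelope sketch of that reasoning, under loudly stated assumptions. The 1.6 trillion figure comes from the release; the 80 GB per accelerator matches an H100 and is only a stand-in for whatever Huawei part is actually used; the expert layout (`expert_share`, `num_experts`, `top_k`) is purely hypothetical and not DeepSeek's published configuration. The estimate counts weights only and ignores KV cache, activations, and communication buffers.

```python
import math

TOTAL_PARAMS = 1.6e12   # parameter count quoted for V4
HBM_GB = 80             # assumption: H100-class memory per accelerator


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_accelerators(num_params: float, bytes_per_param: float,
                     hbm_gb: float = HBM_GB) -> int:
    """Lower bound on devices needed just to hold the weights."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / hbm_gb)


def active_params(num_params: float, expert_share: float,
                  num_experts: int, top_k: int) -> float:
    """Parameters touched per token in a top-k mixture-of-experts model.

    Shared (non-expert) weights are always used; of the expert weights,
    only top_k out of num_experts are activated for a given token.
    """
    shared = num_params * (1.0 - expert_share)
    experts = num_params * expert_share * top_k / num_experts
    return shared + experts


if __name__ == "__main__":
    for label, nbytes in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
        n = min_accelerators(TOTAL_PARAMS, nbytes)
        print(f"{label}: {gb:,.0f} GB of weights, at least {n} devices")

    # Hypothetical layout: 95% of weights in 256 experts, 8 active per token.
    act = active_params(TOTAL_PARAMS, expert_share=0.95,
                        num_experts=256, top_k=8)
    print(f"active per token: {act / 1e9:.1f}B of {TOTAL_PARAMS / 1e9:.0f}B")
```

At 16 bits per weight this comes to 3,200 GB, or at least 40 such devices before any KV cache. Sparse activation cuts the compute per token, but every expert typically still has to be resident somewhere in the cluster, which is why the memory bill stays large even when latency becomes tolerable.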
Political Context vs Engineering Reality
US allegations of intellectual property theft are noise that doesn't change physics. Even if DeepSeek had access to others' work, running a model on non-standard hardware requires so much in-house engineering that copying becomes only a small part of the story. A much more interesting question is whether DeepSeek can maintain this model in production when every optimization for Huawei is tied to a specific chip revision and library version. In our experience, swapping one GPU vendor for another takes at least three months of adapting the production pipeline, even if the model is already trained.
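One way to make that coupling visible is a startup guard that refuses to load tuned kernels when the environment differs from the one they were built for. The following is a minimal sketch, not anything DeepSeek or Huawei is known to ship: `BuildTarget`, `mismatches`, and the revision and version strings are all hypothetical placeholders.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BuildTarget:
    """Environment a set of tuned kernels was built and validated against."""
    chip_revision: str     # hypothetical label, e.g. a hardware stepping
    library_version: str   # hypothetical vendor runtime/library version


def mismatches(pinned: BuildTarget, detected: BuildTarget) -> list[str]:
    """Return one message per differing field; an empty list means a match."""
    problems = []
    for f in fields(BuildTarget):
        want = getattr(pinned, f.name)
        got = getattr(detected, f.name)
        if want != got:
            problems.append(f"{f.name}: built for {want!r}, found {got!r}")
    return problems


# Placeholder values for illustration only.
pinned = BuildTarget(chip_revision="rev-B", library_version="8.0.1")
detected = BuildTarget(chip_revision="rev-C", library_version="8.0.1")

for problem in mismatches(pinned, detected):
    print("refusing to load tuned kernels:", problem)
```

A check like this does not remove the lock-in; it only turns a silent performance or correctness regression after a hardware or library update into an explicit failure that someone has to sign off on.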
While some argue about politics, others count latency per token. And here's the main lesson for those choosing AI infrastructure: if you bet on proprietary hardware with closed software, you take on vendor lock-in that may be tighter than any political ban.
