This is how DeepSeek built a powerful AI with less money
DeepSeek's AI model performs better than those of industry giants such as OpenAI and Meta. How did the Chinese start-up manage to do this?
Published on February 13, 2025

DeepSeek, a Chinese start-up, built a powerful AI model with only 2,000 GPU chips, where other tech companies use 16,000. By employing clever techniques such as a Mixture of Experts architecture, the company cut costs to a fraction of the usual amounts. The breakthrough caused swings in the U.S. stock market and redefined what is achievable in AI development under U.S. chip constraints. As its AI system surprises the world, DeepSeek is establishing itself as a major technology player by squeezing maximum efficiency out of limited resources. The innovative design of the open-source DeepSeek-V3 model, with 671 billion parameters, even gives it an edge over its competitors. Will DeepSeek set the new standard in AI technology despite hardware limitations?
Unprecedented efficiency in hardware usage
DeepSeek's performance is remarkable: with only 2,000 GPU chips, the company achieves what others do with 16,000. This translates into an investment of about $6 million in computing power, only a tenth of what Meta spent on comparable AI technology. That efficiency was achieved despite strict U.S. export restrictions on advanced AI chips, which forced DeepSeek to work with NVIDIA's H800 GPUs, a slimmed-down version of the more powerful H100. To overcome this hardware limitation, DeepSeek developed an innovative approach: instead of relying solely on NVIDIA's standard CUDA programming layer, it communicated with the hardware directly via PTX (Parallel Thread Execution), NVIDIA's low-level instruction set.
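DeepSeek's actual GPU kernels are not public, but a common way to drop below the CUDA C++ layer is to embed raw PTX instructions in a kernel via inline assembly. The sketch below is a minimal illustration of that technique, assuming a hypothetical scaled_add kernel whose fused multiply-add is hand-written in PTX; it is not DeepSeek's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: the fused multiply-add is written as a raw PTX
// instruction instead of letting the CUDA compiler choose one.
__global__ void scaled_add(float* out, const float* a, const float* b,
                           float k, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float r;
        // Inline PTX: r = a[i] * k + b[i], rounded to nearest even
        asm volatile("fma.rn.f32 %0, %1, %2, %3;"
                     : "=f"(r)
                     : "f"(a[i]), "f"(k), "f"(b[i]));
        out[i] = r;
    }
}

int main() {
    const int n = 4;
    float ha[n] = {1, 2, 3, 4}, hb[n] = {10, 10, 10, 10}, ho[n];
    float *da, *db, *dout;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dout, n * sizeof(float));
    cudaMemcpy(da, ha, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(float), cudaMemcpyHostToDevice);
    scaled_add<<<1, n>>>(dout, da, db, 2.0f, n);
    cudaMemcpy(ho, dout, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) std::printf("%.1f ", ho[i]);  // 12.0 14.0 16.0 18.0
    std::printf("\n");
    cudaFree(da); cudaFree(db); cudaFree(dout);
}
```

This level of control lets engineers dictate exactly which instructions the GPU executes, which is one way, for example, to squeeze more out of bandwidth-limited chips such as the H800.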

Revolutionary architecture with MoE
The heart of DeepSeek's innovation lies in its Mixture of Experts (MoE) architecture. Their V3 model, released on Dec. 26, 2024, contains 671 billion parameters, but crucially, not all of them are active at once. The system activates only the necessary model components for each input, drastically reducing computational waste. This approach is combined with DeepSeekMLA (Multi-head Latent Attention), a technique that reduces memory usage by compressing the attention mechanism's key-value cache down to the essential information.
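DeepSeek's router implementation is not public, but the core top-k gating idea behind MoE can be sketched in a few lines of host-side C++ (compilable with nvcc). Everything below is hypothetical and simplified: eight stub experts instead of V3's hundreds, and a toy scalar token instead of a hidden-state vector.

```cuda
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

// Hypothetical sketch of top-k expert routing: a gating function scores
// all experts, but only the k best actually run per token, so most of
// the model's parameters stay idle on any given step.
constexpr int kNumExperts = 8;  // illustrative; real MoE layers use far more
constexpr int kTopK = 2;        // experts activated per token

// Each "expert" is stubbed as a cheap function of the token value.
float run_expert(int expert_id, float token) {
    return token * (1.0f + 0.1f * expert_id);  // placeholder computation
}

float moe_forward(float token, const float (&gate_scores)[kNumExperts]) {
    // Rank experts by gate score and keep only the top-k indices.
    std::vector<int> idx(kNumExperts);
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + kTopK, idx.end(),
                      [&](int a, int b) { return gate_scores[a] > gate_scores[b]; });

    // Softmax over the selected scores so the expert outputs blend to 1.
    float denom = 0.0f, out = 0.0f;
    for (int i = 0; i < kTopK; ++i) denom += std::exp(gate_scores[idx[i]]);
    for (int i = 0; i < kTopK; ++i) {
        float w = std::exp(gate_scores[idx[i]]) / denom;
        out += w * run_expert(idx[i], token);  // only k experts execute
    }
    return out;
}

int main() {
    float scores[kNumExperts] = {0.1f, 2.0f, 0.3f, 1.5f, 0.0f, 0.2f, 0.4f, 0.1f};
    std::printf("%f\n", moe_forward(1.0f, scores));
}
```

The key property is in the final loop: only kTopK of the kNumExperts expert functions ever run per token, which is how a 671-billion-parameter model can activate just a fraction of its weights on each step.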
Market impact
The announcement of DeepSeek's AI system in January 2025 caused a significant reaction in financial markets. NVIDIA's share price fell immediately on DeepSeek's breakthroughs and on growing concerns about competition from other chip makers such as AMD, Intel and Chinese manufacturers. DeepSeek's pricing remains competitive at a rate of $0.27 per million tokens as of Feb. 8, 2025, which the company says still represents the best value for money in the market. These developments are leading to a reassessment of the feasibility of AI development in China, especially in light of the AI Action Summit held in Paris on Feb. 10-11, 2025.
