DeepSeek has gained widespread attention.
The Chinese AI lab DeepSeek broke into the mainstream consciousness this week after its chatbot app climbed to the top of the Apple App Store charts (and Google Play as well). Because DeepSeek’s models were reportedly trained using compute-efficient techniques, Wall Street analysts and technologists are questioning whether the U.S. can maintain its lead in the AI race and whether demand for AI chips will hold up.
DeepSeek’s sudden rise to international fame has raised questions about where the company came from and how it grew so quickly.
The Origins of DeepSeek
DeepSeek is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that utilizes AI to inform its trading decisions.
AI enthusiast Liang Wenfeng, who co-founded High-Flyer in 2015, reportedly began exploring trading while a student at Zhejiang University. Liang launched High-Flyer Capital Management as a hedge fund in 2019, focused on developing and deploying AI algorithms.
In 2023, High-Flyer established DeepSeek as a lab dedicated to researching AI tools separate from its financial business. With High-Flyer as one of its investors, the lab spun off into its own company, also called DeepSeek.
From its inception, DeepSeek built its own data center clusters for model training. However, like other AI companies in China, DeepSeek has been affected by U.S. export bans on hardware, forcing the company to use Nvidia H800 chips, a less powerful version of the H100 chip available to U.S. companies, to train one of its more recent models.
DeepSeek’s technical team skews young. The company reportedly recruits doctoral AI researchers aggressively from top Chinese universities, and it also hires people without a computer science background to help its technology understand a wider range of subjects, according to The New York Times.
DeepSeek’s Strong Models
DeepSeek unveiled its first set of models, including DeepSeek Coder, DeepSeek LLM, and DeepSeek Chat, in November 2023. However, it wasn’t until the release of its next-gen DeepSeek-V2 family of models the following spring that the AI industry began to take notice.
DeepSeek-V2, a general-purpose text- and image-analyzing system, performed well in various AI benchmarks and was significantly cheaper to run than comparable models at the time. This forced DeepSeek’s domestic competition, including ByteDance and Alibaba, to cut the usage prices for some of their models and make others completely free.
The launch of DeepSeek-V3 in December 2024 further raised the company’s profile.
According to DeepSeek’s internal benchmark testing, DeepSeek-V3 outperforms both downloadable, openly available models like Meta’s Llama and “closed” models that can only be accessed through an API, like OpenAI’s GPT-4o.
DeepSeek’s R1 “reasoning” model, released in January, is equally impressive, with the company claiming it performs as well as OpenAI’s o1 model on key benchmarks.
As a reasoning model, R1 effectively fact-checks itself, helping it avoid some of the pitfalls that normally trip up models. Reasoning models take longer to arrive at answers, but they tend to be more reliable in domains such as physics, math, and other sciences.
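To give a sense of what that looks like for developers, here is a minimal sketch of querying a reasoning model through an OpenAI-compatible chat API and reading its reasoning trace separately from its final answer. The endpoint, model name, and reasoning_content field are assumptions based on DeepSeek’s published API conventions and should be verified against the current documentation.

```python
# Minimal sketch: query a reasoning model via an OpenAI-compatible API and
# inspect its reasoning trace separately from the final answer.
# Assumptions (verify against DeepSeek's docs): the base_url, the model name,
# and the reasoning_content field on the returned message.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",                # placeholder
    base_url="https://api.deepseek.com",   # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",             # assumed reasoning-model name
    messages=[{"role": "user", "content": "Is 2**31 - 1 a prime number?"}],
)

message = response.choices[0].message
# The intermediate chain of thought, if the API exposes it:
print(getattr(message, "reasoning_content", None))
# The final, user-facing answer:
print(message.content)
```

The extra tokens spent on that intermediate reasoning are what make these models slower, and typically more expensive per query, than conventional chat models.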
However, there is a downside to R1, DeepSeek V3, and DeepSeek’s other models. As Chinese-developed AI, they are subject to benchmarking by China’s internet regulator to ensure that their responses “embody core socialist values.” For instance, DeepSeek’s chatbot app, which utilizes R1, won’t answer questions about Tiananmen Square or Taiwan’s autonomy.
A Disruptive Approach
DeepSeek’s business model is not clearly defined. The company prices its products and services well below market value and gives others away for free. Furthermore, DeepSeek is not taking investor money, despite significant VC interest.
According to DeepSeek, efficiency breakthroughs have enabled the company to maintain extreme cost competitiveness. However, some experts dispute the figures supplied by the company.
Developers have taken to DeepSeek’s models, which are available under permissive licenses that allow for commercial use. Clem Delangue, the CEO of Hugging Face, one of the platforms hosting DeepSeek’s models, reports that developers on Hugging Face have created over 500 “derivative” models of R1, which have racked up 2.5 million downloads combined.
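Because the weights are openly licensed, running one of these models locally is straightforward. The sketch below uses Hugging Face’s transformers library; the repo id is only an example of a smaller distilled R1 checkpoint, so substitute whichever DeepSeek model or derivative fits your hardware and license requirements.

```python
# Minimal sketch: run an openly licensed DeepSeek checkpoint locally with
# Hugging Face transformers. The repo id below is an example, not a recommendation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # example repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick a sensible precision for the hardware
    device_map="auto",    # requires the accelerate package
)

messages = [{"role": "user", "content": "Summarize why mixture-of-experts models can be cheap to run."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```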
DeepSeek’s success against larger and more established rivals has been described as everything from “upending AI” to “over-hyped.” The company’s rise was at least partially responsible for Nvidia’s stock price dropping 18% in January, and it elicited a public response from OpenAI CEO Sam Altman.
Microsoft announced that DeepSeek is available on its Azure AI Foundry service, a platform that brings together AI services for enterprises under a single banner. During Meta’s first-quarter earnings call, CEO Mark Zuckerberg said that spending on AI infrastructure would continue to be a “strategic advantage” for Meta. And in March, OpenAI called DeepSeek “state-subsidized” and “state-controlled,” recommending that the U.S. government consider banning models from the company.