DeepSeek has gone viral, drawing worldwide attention in a remarkably short period.
This week, Chinese AI lab DeepSeek burst into the global consciousness after its chatbot app climbed to the top of both the Apple App Store and Google Play charts. The company’s AI models, which were trained using compute-efficient techniques, have led Wall Street analysts and technologists to question whether the United States can maintain its lead in the AI race and whether demand for AI chips will hold up.
But where did DeepSeek come from, and how did it rise to international fame so quickly?
The Origins of DeepSeek: A Trader’s Story
DeepSeek is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions.
High-Flyer was co-founded in 2015 by AI enthusiast Liang Wenfeng. Liang, who began exploring trading while a student at Zhejiang University, established High-Flyer Capital Management as a hedge fund in 2019, focused on developing and deploying AI algorithms to drive its investment strategies.
In 2023, High-Flyer started DeepSeek as a lab dedicated to researching AI tools, separate from its financial business. With High-Flyer as one of its key investors, the lab later spun off into its own company, also named DeepSeek.
From its inception, DeepSeek built its own data center clusters for model training. However, like other AI companies in China, DeepSeek has been affected by U.S. export bans on hardware, which forced the company to train its models on less powerful chips such as the Nvidia H800.
DeepSeek’s technical team is notably young. The company aggressively recruits AI researchers with doctorates from top Chinese universities. According to The New York Times, DeepSeek also hires people without computer science backgrounds to help its technology better understand a wide range of subjects.
The Strength of DeepSeek’s Models
DeepSeek unveiled its initial set of models, including DeepSeek Coder, DeepSeek LLM, and DeepSeek Chat, in November 2023. However, it was not until the release of its next-generation DeepSeek-V2 family of models in the following spring that the AI industry began to take notice of the company’s remarkable capabilities.
DeepSeek-V2, a general-purpose text- and image-analyzing system, performed well on a range of AI benchmarks and was significantly cheaper to run than comparable models at the time. This forced DeepSeek’s domestic competitors, including ByteDance and Alibaba, to cut usage prices for some of their models and make others completely free.
The launch of DeepSeek-V3 in December 2024 further solidified DeepSeek’s reputation as a leader in AI innovation.
According to DeepSeek’s internal benchmark testing, DeepSeek V3 outperforms both downloadable, openly available models like Meta’s Llama and “closed” models that can only be accessed through an API, like OpenAI’s GPT-4o.
Equally impressive is DeepSeek’s R1 “reasoning” model, released in January, which the company claims performs as well as OpenAI’s o1 model on key benchmarks.
As a reasoning model, R1 effectively fact-checks itself, which helps it avoid some of the pitfalls that commonly trip up other models. Reasoning models take longer to arrive at solutions, typically seconds to minutes longer, but they tend to be more reliable in domains such as physics, science, and math.
However, there is a downside to R1, DeepSeek V3, and DeepSeek’s other models. As Chinese-developed AI, they are subject to benchmarking by China’s internet regulator to ensure that their responses “embody core socialist values.” For instance, in DeepSeek’s chatbot app, R1 won’t answer questions about Tiananmen Square or Taiwan’s autonomy.
A Disruptive Approach to AI
DeepSeek’s business model, if it can be called that, is not entirely clear. The company prices some of its products and services well below market value and gives others away for free.
According to DeepSeek, efficiency breakthroughs have enabled the company to maintain extreme cost competitiveness. However, some experts dispute the figures supplied by the company, raising questions about the sustainability of its business model.
Whatever the underlying economics, developers have embraced DeepSeek’s models, which are available under permissive licenses that allow commercial use. Clem Delangue, the CEO of Hugging Face, one of the platforms hosting DeepSeek’s models, noted that developers on Hugging Face have created over 500 “derivative” models of R1, which have racked up 2.5 million downloads combined.
DeepSeek’s success against larger and more established rivals has been called everything from “upending AI” to “over-hyped.” The company’s achievements were at least partially responsible for an 18% drop in Nvidia’s stock price and for eliciting a public response from OpenAI CEO Sam Altman.
Microsoft announced that DeepSeek is available on its Azure AI Foundry service, a platform that brings together AI services for enterprises under a single banner. When asked about DeepSeek’s impact on Meta’s AI spending during its first-quarter earnings call, CEO Mark Zuckerberg said that spending on AI infrastructure will continue to be a “strategic advantage” for Meta.
At the same time, some companies are blocking DeepSeek, as are some countries and government bodies, citing concerns over data risks tied to China and the potential for foreign influence. New York state has also banned DeepSeek from government devices.
What the future holds for DeepSeek is uncertain. Improved models are likely, but the U.S. government appears to be growing increasingly wary of what it perceives as harmful foreign influence, which could limit the company’s ability to operate and expand globally.
This story was originally published on January 28, 2025, and will be updated continuously with more information as it becomes available.