The models (V3 and R1) launched by DeepSeek rival OpenAI's ChatGPT and Meta's Llama 3.1.
DeepSeek is a privately held Chinese start-up founded in Hangzhou, China, in July 2023 by Liang Wenfeng, a graduate of Zhejiang University, one of China's top universities. Liang is also the co-founder of the quantitative hedge fund High-Flyer, through which he funded the start-up. Liang has about $8 billion in assets.
In China, the start-up is known for recruiting talented young A.I. researchers from top universities, promising high salaries and an opportunity to work on cutting-edge research projects.
DeepSeek's team primarily comprises young, talented graduates from top Chinese universities, fostering a culture of innovation and a deep understanding of the Chinese language and culture. Notably, the company's hiring practices prioritize technical abilities over traditional work experience, resulting in a team of highly skilled individuals with a fresh perspective on AI development.
Liang's fund announced in March 2023 on its official WeChat account that it was "starting again", going beyond trading to concentrate resources on creating a "new and independent research group, to explore the essence of AGI" (Artificial General Intelligence). DeepSeek was founded a few months later, in July 2023.
It is unclear how much High-Flyer has invested in DeepSeek. However, High-Flyer has an office in the same building as DeepSeek and owns patents related to chip clusters used to train AI models.
This unique funding model has allowed DeepSeek to pursue ambitious AI projects without the pressure of external investors, enabling it to prioritize long-term research and development.
Its goal is to build A.I. technologies along the lines of OpenAI’s ChatGPT chatbot or Google’s Gemini.
By 2021, Liang's fund had acquired thousands of computer chips from the U.S. chipmaker Nvidia, which are a fundamental part of any effort to create powerful A.I. systems.
By 2022, the fund had amassed a cluster of 10,000 high-performance A100 graphics processor chips made by California-based Nvidia, which are used to build and run AI systems, according to a post on the Chinese social media platform WeChat.
The U.S. restricted sales of those chips to China soon after.
DeepSeek released its first large language model in 2023 and has since released several more. Large language models are the technology behind chatbots like ChatGPT and Gemini.
On January 10, 2025, it released its first free chatbot app, which was based on a new model called DeepSeek-V3.
DeepSeek has said its recent models were built with Nvidia’s lower-performing H800 chips, which are not banned in China, sending a message that the fanciest hardware might not be needed for cutting-edge AI research.
The company has attracted attention in global AI circles after writing in a paper in December 2024 that the training of DeepSeek-V3 required less than $6 million worth of computing power from Nvidia H800 chips.
DeepSeek claimed that its new AI model was on par with similar models from US companies such as ChatGPT maker OpenAI, and that it was more cost-effective in its use of expensive Nvidia chips to train the system on troves of data.
But that was not the end of the claims. DeepSeek published another research paper on January 20, 2025, the same day as President Donald Trump’s inauguration, and that paper set in motion the panic that followed.
The paper described another DeepSeek AI model, called R1, that showed advanced “reasoning” skills, such as the ability to rethink its approach to a math problem, and that was significantly cheaper than a similar model sold by OpenAI called o1.
The start-up says its AI models, DeepSeek-V3 and DeepSeek-R1, are on par with or better than the most advanced models in the United States from OpenAI, the company behind ChatGPT, and from Facebook parent company Meta, at a fraction of the cost.
Outages hit its website amid a spike in interest.
DeepSeek's journey began with the release of DeepSeek Coder in November 2023, an open-source model designed for coding tasks.
This was followed by DeepSeek LLM, a 67-billion-parameter model aimed at competing with other large language models.
DeepSeek-V2, launched in May 2024, gained significant attention for its strong performance and low cost, triggering a price war in the Chinese AI model market.
This disruptive pricing strategy forced major Chinese tech giants, such as ByteDance, Tencent, Baidu, and Alibaba, to lower their AI model prices to remain competitive.
DeepSeek-V2 was succeeded by DeepSeek-Coder-V2, a more advanced model with 236 billion parameters.
It is designed for complex coding challenges and features a high context length of up to 128K tokens.
This model is available through a cost-effective API (Application Programming Interface), priced at $0.14 per million input tokens and $0.28 per million output tokens.
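To make those per-token prices concrete, here is a minimal sketch of the arithmetic. The function name and the sample workload are hypothetical and chosen only for illustration; the two prices are the ones quoted above, and actual billing may differ.

```python
# Prices quoted in the text, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = 0.14
OUTPUT_PRICE_PER_MILLION = 0.28


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 2 million input tokens and 500,000 output tokens.
    print(f"${estimate_cost(2_000_000, 500_000):.2f}")  # prints $0.42
```

At these rates, a workload of that size would cost well under one dollar.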
The company's latest models, DeepSeek-V3 and DeepSeek-R1, have further solidified its position as a disruptive force.
DeepSeek-V3, a 671-billion-parameter model, performs strongly on various benchmarks while requiring significantly fewer resources than its peers.
DeepSeek-R1, released in January 2025, focuses on reasoning tasks and challenges OpenAI's o1 model with its advanced capabilities.
DeepSeek also offers a range of distilled models, known as DeepSeek-R1-Distill, which are based on popular open-weight models like Llama and Qwen, fine-tuned on synthetic data generated by R1.
These distilled models provide varying levels of performance and efficiency, catering to different computational needs and hardware configurations.