DeepSeek would already be the king of AI chatbots if install count were the only measure of genius. Its developers pulled this off with less powerful NVIDIA GPUs (H800s), leading investors to question how indispensable the US chipmaker really is to AI.
On the other side of the spectrum, OpenAI is accusing DeepSeek of model distillation, while its own "closed-source" models were built by consuming publicly available data without seeking explicit permission from its authors. (That's corporate irony at its best.)
Amid this chaos, we have a freemium AI model you can use on the web, on smartphones, and via an API. In addition, it's open source, which means businesses can tailor DeepSeek the way they see fit.
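For developers, DeepSeek advertises its API as OpenAI-compatible, so the standard openai Python client should work against it. Here's a minimal sketch; the base URL and model names are taken from DeepSeek's public docs at the time of writing, so verify them before relying on this:

```python
# Minimal sketch of calling DeepSeek's (OpenAI-compatible) API.
# Base URL and model names are assumptions from DeepSeek's docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",   # placeholder
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",             # "deepseek-reasoner" for R1
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```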
But since China has almost three times as many smartphone users as the entire US population, we shouldn't be fooled into treating download numbers as a meaningful performance metric. Instead, we should put DeepSeek under our own scanner and evaluate its worth.
What is DeepSeek?
DeepSeek is a Chinese AI startup that has developed a chat interface to its LLM.
Its latest, DeepSeek R1, is an AI utility similar to the popular superbots (ChatGPT, Claude, Gemini, etc.) you might have already used on your smartphone or PC.
Interestingly, DeepSeek has achieved this feat without the big fat purse of its US peers. It claims to have spent only $6 million on AI training, compared to the over $100 million OpenAI reportedly burned on GPT-4.
Keeping the financials aside, DeepSeek puts up a good fight against OpenAI's GPT-4o, Meta's Llama-3.1, and Anthropic's Claude-3.5 in various domains, including English, coding, mathematics, and Chinese.
Taking note of this hype and its potential, Microsoft and Amazon have made DeepSeek available on their cloud platforms. Even Sam Altman, OpenAI's CEO, couldn't resist praising DeepSeek for its low-cost AI spectacle.
What makes DeepSeek Special?
AI bots are nothing new in the post-ChatGPT era. Most major internet companies are developing custom LLMs, and the technology is gradually becoming a commodity.
So why even talk about DeepSeek? And what's the social media buzz all about?
Setting aside China's cyber clout, DeepSeek pulled it off without outside investor money and (apparently) without distilling bigger AI models. This has led US companies to second-guess their workflows and question whether they really need all that money to get things working.
However, it's important to understand DeepSeek's financials and a few things core to its work culture before assuming anything. We need to realize that it wasn't built in a high-cost-of-living American or European country. From human resources and infrastructure to electricity and most computing equipment, everything is far cheaper in China than in the US.
Besides, DeepSeek's core research team doesn't consist of highly paid, experienced (or rusty!) professionals but of eager-to-prove, academically brilliant recent PhD graduates from eminent Chinese universities.
This makes DeepSeek an intense group of individuals united against US sanctions on exports of top-tier NVIDIA chips to China. Finally, the freedom to research and experiment helped the world acknowledge that there is more to invention than mere resources.
DeepSeek Development Timeline
I have briefly documented major DeepSeek developments in the following sections. The events appear in reverse chronological order, starting with DeepSeek R1 and going back to its founding parent, High-Flyer.
DeepSeek R1 - January 2025
R1 is DeepSeek's reasoning-optimized AI model, a domain-specific upgrade over its predecessor, V3. These models follow a chain-of-thought process, similar to OpenAI's o1 lineup of reasoning bots.
DeepSeek's self-analysis reveals R1 performs on par with OpenAI's o1-1217 and exceeds o1-mini on AIME 2024, MATH-500, Codeforces, MMLU, GPQA Diamond, and more.
In addition to the R1 models, DeepSeek also released distilled models, using open-source models from Alibaba (Qwen) and Meta (Llama) as the base.
The distilled models also displayed impressive performance, with DeepSeek-R1-Distill-Qwen-1.5B surpassing GPT-4o and Claude-3.5-Sonnet on math benchmarks.
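DeepSeek's distilled models were reportedly produced by fine-tuning the Qwen and Llama bases on R1-generated reasoning data. For intuition about distillation in general, here's a minimal sketch of the classic soft-label objective, where a small "student" learns to match a large "teacher's" output distribution. The numbers are made up, and this is the textbook recipe rather than DeepSeek's exact one:

```python
import numpy as np

# Toy sketch of the knowledge-distillation loss: the student minimizes
# the KL divergence to the teacher's (temperature-softened) distribution.
def softmax(z, T=1.0):
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

teacher_logits = np.array([2.0, 1.0, 0.1])   # from the big model
student_logits = np.array([1.5, 0.8, 0.3])   # from the small model
T = 2.0                                       # temperature softens the targets

p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)

# KL(teacher || student): the quantity distillation training drives down.
kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
print(f"distillation loss: {kl:.4f}")
```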
But more than the performance, it was the $5.576 million training cost that made headlines, since DeepSeek's US counterparts were breaking the bank. In addition, DeepSeek achieved this feat with significantly less powerful H800 GPUs, triggering a historic single-day 17% wipeout in NVIDIA's market value, the largest one-day loss for any US-based company.
DeepSeek V3 - December 2024
Building on V2's MoE excellence, DeepSeek came up with V3 in December 2024. It activates 37B parameters per token out of 671B total. V3 was trained on 14.8 trillion tokens and further enhanced with supervised fine-tuning and reinforcement learning.
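To see what "activating" a fraction of parameters means, here's a minimal sketch of top-k mixture-of-experts routing, the mechanism behind this kind of sparse activation. All dimensions and weights below are toy values, not DeepSeek's actual architecture:

```python
import numpy as np

# Toy top-k MoE layer: a gate picks a few experts per token, so compute
# (and the active parameter count) scales with top_k, not n_experts.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

W_gate = rng.normal(size=(d_model, n_experts)) * 0.02
experts = [rng.normal(size=(d_model, d_model)) * 0.02 for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token through its top-k experts only."""
    logits = x @ W_gate
    top = np.argsort(logits)[-top_k:]                          # chosen experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen
    # Only top_k of the n_experts matrices are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)   # (64,)
```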
DeepSeek V3 strongly rivaled current open-source and closed-source LLMs on various benchmarks, including MMLU, LongBench, HumanEval, and MATH.
DeepSeek Coder, LLM, and MoE
DeepSeek released its Coder family of LLMs in November 2023, in sizes ranging from 1.3B, 5.7B, and 6.7B to 33B parameters. These were pre-trained on 2 trillion tokens of code-specific data comprising source code (87%), code-related English natural language (10%), and non-code Chinese natural language (3%).
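For illustration, here's how a training pipeline might sample documents according to that mix. The pools and sampler below are hypothetical stand-ins, not DeepSeek's actual tooling:

```python
import random

# Toy weighted sampler reflecting the stated corpus mix.
MIX = {"code": 0.87, "code_english": 0.10, "chinese": 0.03}
POOLS = {
    "code": ["def add(a, b): return a + b"],
    "code_english": ["This function adds two numbers."],
    "chinese": ["这个函数将两个数字相加。"],
}

def sample_document(rng: random.Random) -> str:
    source = rng.choices(list(MIX), weights=list(MIX.values()))[0]
    return rng.choice(POOLS[source])

rng = random.Random(42)
print([sample_document(rng) for _ in range(3)])
```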
DeepSeek Coder has a 16K context window, similar to that of GPT-3.5 Turbo.
Internal evaluations state DeepSeek Coder surpassed the existing open and closed source models, including Codex and GPT-3.5.
While "Coder" is task-specific, DeepSeek LLM, released in the second half of November 2023, was a more general-purpose AI chatbot. It has two variants (67B & 7B parameters) and was trained on 2 trillion tokens in English and Chinese.
Developers adopted strategies like data deduplication and two-stage supervised fine-tuning to keep the training data diverse and the responses from becoming repetitive.
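DeepSeek hasn't published this pipeline in detail, but a hash-based exact-match filter is one common way deduplication is implemented in practice. A minimal sketch:

```python
import hashlib

# Toy exact deduplication: keep the first occurrence of each
# whitespace/case-normalized document.
def deduplicate(docs: list[str]) -> list[str]:
    seen: set[str] = set()
    unique = []
    for doc in docs:
        # Normalize so trivial variants collapse to the same hash key.
        key = hashlib.sha256(" ".join(doc.lower().split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

corpus = ["Hello  world", "hello world", "Goodbye world"]
print(deduplicate(corpus))   # ['Hello  world', 'Goodbye world']
```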
DeepSeek LLM outpaced Meta's Llama 2 in coding, math, reasoning, and Chinese comprehension. Its 67B model also did better than GPT-3.5 in Chinese in DeepSeek's internal testing.
DeepSeek MoE (Mixture of Experts) launched closely following the LLM's debut. Its distinguishing feature was sub-networks ("experts") that handle domain-specific input tokens, the aim being comparable results with less compute investment. The strategy worked: DeepSeek MoE performed on par with Llama 2 7B while using just 40% of the compute.
DeepSeek continued this computing efficiency with the launch of V2 in May 2024. Trained on 8.1 trillion tokens from multiple sources, V2 has 236 billion parameters, with only 21B activated per input token.
There were a few more technical enhancements, such as sparse computation, multi-head latent attention, and load balancing, that led to greater efficiency in model training and inference.
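Multi-head latent attention is the most distinctive of these. The core idea, as I understand it, is to cache one small latent vector per token instead of full per-head keys and values, reconstructing K and V on the fly. Here's a minimal numpy sketch with illustrative dimensions, not DeepSeek's actual implementation:

```python
import numpy as np

# Toy latent KV cache: store a compressed per-token latent and
# re-expand it into per-head keys/values at attention time.
d_model, d_latent, n_heads, d_head = 1024, 128, 8, 64
rng = np.random.default_rng(0)

W_dkv = rng.normal(size=(d_model, d_latent)) * 0.02           # shared down-projection
W_uk = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02   # up-projection for keys
W_uv = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02   # up-projection for values

h = rng.normal(size=(16, d_model))   # hidden states for 16 cached tokens

# Instead of caching full per-head K and V (16 x 8 x 64 floats each),
# cache only the 128-dim latent per token:
kv_cache = h @ W_dkv                 # shape (16, 128)

# Reconstruct K and V from the latent cache when attention runs:
K = (kv_cache @ W_uk).reshape(16, n_heads, d_head)
V = (kv_cache @ W_uv).reshape(16, n_heads, d_head)

full = 2 * 16 * n_heads * d_head     # floats a plain KV cache would hold
latent = kv_cache.size               # floats the latent cache holds
print(f"cache size: {latent} vs {full} floats ({latent / full:.0%})")
```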
While DeepSeek was doing well for the price point, it wasn't until DeepSeek V3 (December 2024) and DeepSeek R1 (January 2025) that the AI fraternity took serious notice of this small, independent AI lab based out of Hangzhou, Zhejiang, China.
DeepSeek's Birth - 2023
High-Flyer spun off DeepSeek in April 2023. It was the hedge fund's first AI venture that wasn't about stock trading; instead, the focus was large language models (LLMs) and artificial general intelligence (AGI).
DeepSeek is believed to have stockpiled 10,000 high-end NVIDIA A100s before US sanctions came into effect. Still, those weren't enough, and the developers had to turn to software optimizations, particularly tweaking the model architecture to suit less powerful NVIDIA GPUs. In addition, DeepSeek switched to Huawei's Ascend 910C chips for running inference.
All these efforts paid off with its first family of AI models, launched in November 2023.
High-Flyer - DeepSeek's Parent
Liang Wenfeng and two of his batchmates from Zhejiang University founded High-Flyer (a Chinese hedge fund company) in 2016.
Building on their past experience, the trio established two subsidiaries offering 450 investment products before pivoting to machine-learning-based trading.
In 2019, High-Flyer instituted its AI division, which went on to build two supercomputers, Fire-Flyer I and II, through 2021.
DeepSeek Hands-on
The simplest way to use DeepSeek is via its website and (iOS and Android) apps. I logged into its website and started a few chats to see it firsthand.

My sole objective was to check if there is more to DeepSeek than mere social media eyewash. Having used LLMs since GPT-3, I believe I can tell whether an LLM is worth its salt.
So I started with it, and the first thing I noticed was, obviously, the user interface. It's clean but follows a template design. You can enable web search and select DeepThink R1 for advanced reasoning queries. It also supports uploading images, though only for text extraction.
Coming to the real-world experience, I expected something more refined. Take a look at the response to this simple reasoning puzzle:
"In a family, there is are husband-wife, two sons and two daughters. All the ladies were invited to a dinner. Both sons went out to play. The husband did not return from the office. Who was at home?"
That was a good 985 words with the standard non-reasoning model. Even with reasoning turned on, it processed for 37 seconds before displaying 569 words of thought process, with the actual reply summarized at the bottom.

But since it's been compared with ChatGPT all over the internet, it felt stupid NOT to see if OpenAI's bot went through the same degree of hyper-analysis. I switched to ChatGPT (free), and here's how it responded to the same query:

Anthropic's Claude did great, too.

But over-analysis isn't DeepSeek's only problem.
It straightaway rejects all prompts about Xi Jinping.

It's programmed to avoid all conversations about the Chinese president and China-sensitive issues. I asked a few more things, but it firmly refused to respond.
On the other hand, I tried asking ChatGPT anti-US things, and it was more liberal, politely rejecting my premise with elaborate answers. That felt more human than a bot saying NO to everything its masters don't want us to discuss.
However, not everyone is interested in things China wants to censor. Fair enough! Still, there were a few too many instances where DeepSeek's server was simply unresponsive.

And even the "server is busy" response appears only after a 5-10 second wait. Frankly, a typical user would hardly stay beyond 2-3 seconds unless DeepSeek had something novel to offer.
Unfortunately, there was none in this case.
It's just the beginning
There is a ton to improve with DeepSeek.
The guardrails are too narrow. Its server becomes unresponsive at times. The performance didn't feel that helpful to me as an end user. And DeepSeek's privacy policy is another issue.
On the upside, the cost can be a significant advantage for Chinese developers. But still, I doubt people will switch to DeepSeek just for the price.
Competition is breakneck, and quality, which DeepSeek currently struggles with, is the one constant that makes people stay.