Reuters reports that Chinese AI company DeepSeek is accelerating the release of its new R2 model, potentially launching before its originally planned May debut. The R2 model is expected to significantly enhance reasoning capabilities and compete directly with OpenAI’s o3 model while maintaining DeepSeek’s commitment to open-source development.
Enhanced Reasoning Through Reinforcement Learning
According to internal documents, DeepSeek R2 will be built on the company's V3 base model, leveraging its existing infrastructure and computing capabilities. DeepSeek researcher Daya Guo had said in early February that reinforcement learning (RL) was still in its early stages but would see "significant progress" this year.
The R1 research paper noted that the next version would see substantial improvements as more reinforcement learning data becomes available. The paper highlighted that as RL training data increases, models not only improve at complex reasoning tasks but also naturally develop emergent behaviors such as reflection and exploring alternative approaches.
Open-Source Strategy vs. Closed Models
Industry analysts note that DeepSeek's approach stands in contrast to that of its major competitors. OpenAI has decided not to release the complete o3 model, and GPT-4.5 is reportedly set to be its last standalone base model before a more integrated, closed "GPT-5" approach. DeepSeek, by contrast, plans to continue its open-source strategy.
This distinction is significant as companies like OpenAI and Anthropic increasingly treat both base and reasoning models as “raw materials” rather than “final products,” keeping their most advanced systems proprietary. DeepSeek, meanwhile, is committing to open-sourcing not just the models themselves, but also the “recipes” for creating them.
Infrastructure and Technical Innovations
DeepSeek's competitive advantage stems from significant investments in research and computing infrastructure by its backer, the quantitative hedge fund High-Flyer, which spent approximately 1.2 billion yuan ($166 million) between 2020 and 2021 to build two AI supercomputing clusters equipped with about 10,000 NVIDIA A100 chips.
Former DeepSeek employees attribute the company’s success to founder Liang Wenfeng’s focus on cost-effective AI architectures. The company utilizes techniques like Mixture of Experts (MoE) and Multi-head Latent Attention (MLA) to significantly reduce computational costs.
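The article does not detail DeepSeek's implementation, but the core cost-saving idea behind Mixture of Experts is that each token is routed to only a small subset of "expert" feed-forward networks, so most parameters sit idle on any given token. A minimal NumPy sketch of top-k expert routing (all dimensions, weights, and the two-layer expert shape are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
d_model, d_ff = 16, 64
n_experts, top_k = 8, 2

# Each expert is a small two-layer MLP: (d_model x d_ff) then (d_ff x d_model).
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(n_experts)
]
# The router scores every expert for every token.
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x):
    """Route each token to its top_k experts; only those experts run."""
    logits = x @ router                              # (tokens, n_experts)
    chosen = np.argsort(logits, axis=-1)[:, -top_k:] # indices of selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, chosen[t]]
        weights = np.exp(sel - sel.max())
        weights /= weights.sum()                     # softmax over selected experts
        for w, e in zip(weights, chosen[t]):
            w_in, w_out = experts[e]
            h = np.maximum(x[t] @ w_in, 0.0)         # ReLU hidden activation
            out[t] += w * (h @ w_out)
    return out

tokens = rng.standard_normal((4, d_model))
y = moe_layer(tokens)
# Each token activates only top_k of n_experts, i.e. 2/8 of the FFN compute here.
```

The same principle scales up in production MoE models: total parameter count grows with the number of experts, while per-token compute grows only with `top_k`, which is how sparse routing cuts inference and training cost.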
According to Vijayasimha Alilughatta, COO of Indian tech service provider Zensar, “DeepSeek R2’s release could be a turning point for the AI industry,” with its cost-effective approach potentially “inspiring global enterprises to accelerate their own efforts and break the monopoly of industry giants.”
The company reportedly maintains a flat management structure that fosters collaboration, with Liang described as “low-key and introverted” but deeply engaged with technical details alongside younger staff members.
Industry observers suggest that beyond R2, the upcoming V4 base model might incorporate multimodal capabilities and establish a new performance ceiling for reasoning models, potentially targeting capabilities similar to what OpenAI is developing for “GPT-5.”