DeepSeek teams up with Tsinghua University to develop self-improving AI models
07 Apr 2025, 02:31 pm

(April 7): DeepSeek is working with Tsinghua University on techniques to reduce the amount of training its artificial intelligence (AI) models need, in an effort to lower operational costs.

The Chinese start-up, which roiled markets with its low-cost reasoning model that emerged in January, collaborated with researchers from the Beijing institution on a paper detailing a novel approach to reinforcement learning to make models more efficient.

The new method aims to help AI models better adhere to human preferences by offering rewards for more accurate and understandable responses, the researchers wrote. Reinforcement learning has proven effective at speeding up AI tasks in narrow applications and domains, but extending it to more general applications has proven difficult. That is the problem DeepSeek's team is trying to solve with an approach it calls self-principled critique tuning. The strategy outperformed existing methods and models on various benchmarks, and the resulting models delivered better performance while using fewer computing resources, according to the paper.
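
The article describes the approach only at a high level. As a rough, purely illustrative idea of what reward modelling with generated principles and critiques might look like, the toy Python sketch below scores candidate responses against per-prompt criteria and picks the one a reinforcement-learning loop would reinforce. All function names and scoring heuristics are hypothetical stand-ins, not DeepSeek's actual implementation.

```python
# Toy sketch of principle-based reward modelling (illustrative only,
# not DeepSeek's SPCT code; all names and heuristics are hypothetical).

def generate_principles(prompt: str) -> list[str]:
    """Stand-in for a model step that proposes evaluation criteria for a prompt."""
    return ["accuracy", "clarity"]

def critique(response: str, principle: str) -> float:
    """Stand-in for a model step that critiques a response against one principle,
    returning a score in [0, 1]. Here: a trivial heuristic for demonstration."""
    if principle == "clarity":
        return min(1.0, 50.0 / max(len(response.split()), 1))
    return 1.0 if "because" in response.lower() else 0.5

def reward(prompt: str, response: str) -> float:
    """Aggregate per-principle critique scores into a single reward signal,
    which a reinforcement-learning loop would use to prefer better responses."""
    principles = generate_principles(prompt)
    return sum(critique(response, p) for p in principles) / len(principles)

if __name__ == "__main__":
    prompt = "Why is the sky blue?"
    candidates = [
        "Because shorter wavelengths scatter more in the atmosphere.",
        "It just is.",
    ]
    # The higher-reward response is the one training would reinforce.
    best = max(candidates, key=lambda r: reward(prompt, r))
    print(best)
```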

DeepSeek is calling these new models DeepSeek-GRM, short for "generalist reward modelling", and will release them on an open-source basis, the company said. Other AI developers, including Chinese tech giant Alibaba Group Holding Ltd and San Francisco-based OpenAI, are also pushing into a new frontier of improving reasoning and self-refinement capabilities while an AI model is performing tasks in real time.

Menlo Park, California-based Meta Platforms Inc released its latest family of AI models, Llama 4, over the weekend, marking them as its first to use the mixture-of-experts (MoE) architecture. DeepSeek's models rely significantly on MoE to make more efficient use of computing resources, and Meta benchmarked its new release against the Hangzhou-based start-up's models. DeepSeek has not specified when it might release its next flagship model.
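
For readers unfamiliar with the term, the core MoE idea is that a lightweight "gate" routes each input to a small subset of specialist sub-networks, so only a fraction of the model's parameters do work on any given token. The minimal Python sketch below is a made-up illustration of that routing pattern, not DeepSeek's or Meta's architecture; every name and number in it is invented for the example.

```python
# Minimal illustration of mixture-of-experts routing (toy example only).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k experts picked by the gate and mix their outputs."""
    # Gate scores: one score per expert (here a simple dot product with x).
    scores = [sum(w * xi for w, xi in zip(wvec, x)) for wvec in gate_weights]
    probs = softmax(scores)
    # Only the top_k experts run; the rest are skipped, which saves compute.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return [
        sum(probs[i] / norm * experts[i](x)[d] for i in top)
        for d in range(len(x))
    ]

if __name__ == "__main__":
    # Four toy "experts", each just scaling the input differently.
    experts = [lambda x, s=s: [s * xi for xi in x] for s in (0.5, 1.0, 1.5, 2.0)]
    gate_weights = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.0, 0.3]]
    print(moe_forward([1.0, 2.0], experts, gate_weights, top_k=2))
```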

