DeepSeek AI's Low-Cost Models Suspected to Use OpenAI Data, Sparking Online Irony

by Aria, Apr 08, 2025

The emergence of DeepSeek AI, a Chinese developer of AI models, has sparked controversy and concern within the U.S. tech industry. The suspicion that DeepSeek may have used OpenAI's data to train its own models has drawn a sharp reaction from industry leaders and political figures alike. Donald Trump called DeepSeek a "wake-up call" for the U.S. tech sector after Nvidia's stock price fell 16.86%, wiping $600 billion off its market value, the largest single-day loss in Wall Street history. Other tech giants, including Microsoft, Meta Platforms, and Google's parent company Alphabet, fell between 2.1% and 4.2%, while Dell Technologies, a key manufacturer of AI servers, dropped 8.7%.

DeepSeek presents its R1 model, built on the open-source DeepSeek-V3, as a cost-effective alternative to Western AI models like ChatGPT. R1 reportedly requires significantly less computing power and was trained for just $6 million. The claim, while disputed by some, has raised questions about the massive investments U.S. tech companies are making in AI and has unsettled investors. DeepSeek's rapid rise to the top of the U.S. free app download charts underscores its growing influence and the public's interest in its capabilities.

In response to these developments, OpenAI and Microsoft are investigating whether DeepSeek used OpenAI's API to collect outputs from OpenAI's models and train its own models on them, a practice known as distillation. Distillation trains a new, typically smaller model on the outputs of a larger, more capable one, and OpenAI's terms of service prohibit using its outputs to develop competing models. OpenAI has emphasized its commitment to protecting its intellectual property and is collaborating with the U.S. government to safeguard its advanced models from such practices by competitors and adversaries.
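
For readers curious how distillation works mechanically, the following is a minimal Python sketch of the classic form of the technique, in which a student model is penalized for diverging from a teacher model's softened output probabilities. It is illustrative only: the score lists are made-up values, the function names are hypothetical, and nothing here reflects how DeepSeek or OpenAI actually train their models. Distillation through a text-only API would typically train on generated responses rather than on raw scores like these.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores over three candidate next tokens (assumed values).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # ranks the tokens in the opposite order

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

A student whose scores track the teacher's produces a loss near zero, while one that ranks the tokens differently produces a larger loss. Training the student amounts to adjusting it so that this value shrinks.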

David Sacks, President Trump's AI czar, said there is evidence that DeepSeek distilled knowledge from OpenAI's models, and he expects leading U.S. AI companies to take steps to prevent such distillation in the future.

The irony of OpenAI's situation has not gone unnoticed. Critics point out that OpenAI itself has been accused of using copyrighted material from the internet to train ChatGPT. In January 2024, OpenAI admitted that training large language models without copyrighted material was "impossible," arguing that limiting training data to public domain works would not meet modern needs. This stance has fueled ongoing debates about the use of copyrighted materials in AI training, highlighted by lawsuits against OpenAI and Microsoft from The New York Times and from a group of 17 authors, including George R. R. Martin, over alleged "unlawful use" of their works. OpenAI has defended its practices as "fair use."

The legal landscape surrounding AI and copyright continues to evolve. In a notable ruling in August 2023, U.S. District Judge Beryl Howell affirmed the U.S. Copyright Office's stance that AI-generated art cannot be copyrighted, emphasizing the necessity of human creativity in copyright protection.

DeepSeek is accused of using distillation to train a competing model on the outputs of OpenAI's model. Image credit: Andrey Rudakov/Bloomberg via Getty Images.