ChatGPT Maker Suspects China’s Dirt Cheap DeepSeek AI Models Were Built Using OpenAI Data — and the Irony Is Not Lost on the Internet

Feb 27, 2025

OpenAI suspects that China's DeepSeek AI models, which are significantly cheaper than their Western counterparts, may have been trained using OpenAI data. The emergence of DeepSeek, and of its R1 model in particular, sent the share prices of major AI-related companies sharply lower, with Nvidia suffering the largest single-day loss of market value in Wall Street history. DeepSeek says R1 is built on its open-source DeepSeek-V3 model, which it claims requires far less computing power than Western rivals and cost an estimated $6 million to train.

DeepSeek's low costs have raised questions about the massive sums American tech companies are pouring into AI, and have unsettled investors. The popularity of DeepSeek's app, which quickly topped US download charts, further fueled these anxieties. OpenAI and Microsoft are investigating whether DeepSeek violated OpenAI's terms of service by using OpenAI's API for "distillation," a technique in which a smaller model is trained on the outputs of a larger one. OpenAI said it is aware of attempts by Chinese and other companies to replicate leading US AI models, and stated its commitment to protecting its intellectual property.

David Sacks, Donald Trump's AI advisor, echoed OpenAI's suspicions, suggesting that DeepSeek had distilled knowledge from OpenAI's models without authorization. He expects leading AI companies to take measures to prevent such distillation in the future.

The situation carries an obvious irony: OpenAI itself has been accused of using copyrighted material without authorization to develop ChatGPT. Critics on social media have been quick to call this hypocrisy, pointing to OpenAI's earlier assertion that creating AI tools like ChatGPT without copyrighted material is "impossible." OpenAI has defended its practices, arguing that training large language models requires extensive use of copyrighted material and that this use constitutes "fair use." That claim is being challenged in copyright infringement lawsuits filed by the New York Times and by a group of 17 authors. The legal landscape surrounding AI training data and copyright remains highly contested, particularly in light of a 2018 US Copyright Office ruling that AI-generated art is not eligible for copyright protection.

DeepSeek is accused of using distillation to train its competing model on OpenAI's models. Image credit: Andrey Rudakov/Bloomberg via Getty Images.
