OpenAI Directed to Produce 20M ChatGPT Logs in Ongoing Copyright Dispute

OpenAI has been ordered by a federal magistrate judge to provide approximately 20 million de-identified chat logs to The New York Times and other plaintiffs as part of a continuing copyright lawsuit. The order, issued in New York, rejects OpenAI’s attempt to block the production of chat records and requires the company to release the data under a protective framework. The case centers on allegations that OpenAI trained its models on copyrighted news content without permission, a claim brought by The New York Times in December 2023. Judge Ona T. Wang stated, “while the court recognizes that the privacy considerations of OpenAI’s users are sincere,” those concerns “are only one factor in the proportionality analysis, and cannot predominate where there is clear relevance and minimal burden.”

The court determined that the chat logs are necessary to assess whether ChatGPT outputs replicated the Times’ material. The ruling adds to the pressure on OpenAI as separate disputes highlight rising concerns around training data, content licensing, and user privacy. Over the past year, the court directed the company to retain a wide range of user data, including chats that users may have deleted. OpenAI later argued that the recent order was “clearly erroneous” and “disproportionate,” stating that compliance would require revealing millions of private conversations. The dispute reflects broader legal challenges facing AI developers, as authors, news organizations, and other creators seek clarity on how copyright laws apply to modern AI systems.
