Authors Sue OpenAI for Alleged Copyright Violations, Testing the Boundaries of AI and Intellectual Property

In a landmark lawsuit, authors Mona Awad and Paul Tremblay have filed a class action complaint against OpenAI, the company behind ChatGPT, alleging copyright violations. The authors claim that their copyrighted publications were improperly "ingested" and "used to train" ChatGPT, citing as evidence the chatbot's ability to generate "very accurate summaries" of their novels. The case raises important questions about the legal boundaries of generative AI and the protection of intellectual property rights.

Awad and Tremblay's lawyers argue that OpenAI is profiting unfairly from "stolen writing and ideas" and seek damages on behalf of all US-based authors whose works were utilized to train ChatGPT. They contend that organizations like OpenAI cannot disregard copyright laws, which provide substantial legal protection for authors. However, proving monetary loss for the authors may be challenging, as ChatGPT's functionality relies on a vast range of internet content, including book discussions, and may perform similarly without direct training on specific books.


The lawsuit also brings attention to OpenAI's increasing secrecy regarding its training data. OpenAI has previously hinted at the use of its "internet-based book corpora" (Books2) for training ChatGPT. The plaintiffs' lawyers suggest that this collection, purportedly sourced from shadow libraries such as Library Genesis (LibGen) and Z-Library, which facilitate widespread book torrenting, could be the basis of the alleged copyright infringement.

The outcome of this case may hinge on whether courts view the use of copyrighted material in AI training as "fair use" or as unauthorized copying. The UK legal system has no equivalent "fair use" doctrine, relying instead on narrower "fair dealing" exceptions, which highlights the potential for different outcomes in different jurisdictions. Scholars and experts point out that AI policies, currently fragmented and inconsistent across jurisdictions, need to adapt to rapidly evolving technological developments.

The publishing industry has been grappling with how to protect authors against AI technology since ChatGPT's introduction in November 2022. The Society of Authors (SoA) recently issued guidelines for authors to safeguard their work against AI models. The SoA and the Authors' Licensing and Collecting Society (ALCS) express support for the authors' lawsuit, as they have long been concerned about the extensive use of authors' work for training large language models.

Furthermore, the lawyers representing Awad and Tremblay emphasize that AI developers should adhere to copyright law and train only on properly licensed data. They note the irony that tools marketed as "artificial intelligence" rely entirely on human data and creativity, and conclude that this case will be significant in shaping the future of AI, intellectual property, and the rights of human creators.

The case also underscores the need for consistent and adaptive AI policies across jurisdictions to protect the rights and interests of authors and creators in the digital age.