The Fine Line of Fair Use: AI and Copyright Concerns

February 24, 2023

Generative AI has made its mark on the tech industry thanks to its ability to assist with millions of daily tasks, both personal and commercial, across industries such as marketing, advertising, communication, education, and healthcare.

According to data from PitchBook, the generative AI space raised a staggering $1.3 billion in VC funding by November 2022, marking a 15% increase from the previous year. The number of VC deals in the field almost doubled from 48 to 84 between 2020 and 2021, while deal value rose by nearly 400%.

However, the use of generative AI for commercial purposes raises various ethical and legal questions, one of which is the issue of copyright: who owns the rights to the content generated by an AI model? What are the implications for the original creators of the data used to train the model?

This is relatively new and uncharted territory. The future of commercial generative AI will likely be shaped by the court cases that address these questions, and their impact will be felt quickly as the technology continues to advance.

In this article, we give a brief overview of what generative AI is and of its major implications for copyright.

Definition and explanation of Generative AI

Generative AI refers to a category of artificial intelligence (AI) technologies that use machine learning algorithms to automatically generate new data, such as images, sounds, and text. The process involves training an AI model on a large dataset of examples in a particular domain and then using that model to generate new, similar examples.

Generative AI is based on deep learning. One well-known approach is the generative adversarial network (GAN); others include diffusion models, which underlie image tools such as Stable Diffusion, and large language models. A GAN consists of two neural networks, a generator and a discriminator, that are trained against each other. The generator creates new content, such as images, text, or audio, modeled on the data it has been trained on. The discriminator evaluates the generated data, judging whether it looks real or generated, and this judgment serves as feedback to the generator. The generator uses the feedback to improve its outputs until the discriminator can no longer reliably tell them apart from real data.

This interplay between the generator and discriminator allows generative AI to create new and unique outputs, going beyond traditional AI that is limited to recognizing patterns and making predictions. With the development of generative AI, new possibilities for creating images, text, audio, and other forms of content have emerged.
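The generator and discriminator feedback loop described above can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions, not how production systems are built: the "data" are single numbers drawn around a mean of 4, both networks are reduced to two parameters each, and every name (`train_toy_gan`, `a`, `b`, `w`, `c`) is hypothetical. Real generative models use deep neural networks and a framework such as PyTorch or TensorFlow.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 200, batch: int = 64, lr: float = 0.05,
                  real_mean: float = 4.0, seed: int = 0) -> dict:
    """Train a one-dimensional toy GAN with plain gradient descent.

    Generator:      G(z) = a * z + b, with noise z ~ N(0, 1)
    Discriminator:  D(x) = sigmoid(w * x + c), the estimated P(x is real)
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters (arbitrary starting values)
    w, c = 0.0, 0.0   # discriminator parameters

    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for xr, xf in zip(real, fake):
            d_real = sigmoid(w * xr + c)
            d_fake = sigmoid(w * xf + c)
            grad_w += -(1.0 - d_real) * xr + d_fake * xf
            grad_c += -(1.0 - d_real) + d_fake
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step: minimise -log D(fake), using the updated
        # discriminator's gradient as the feedback signal.
        grad_a = grad_b = 0.0
        for z in noise:
            d_fake = sigmoid(w * (a * z + b) + c)
            grad_a += -(1.0 - d_fake) * w * z
            grad_b += -(1.0 - d_fake) * w
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch

    return {"a": a, "b": b, "w": w, "c": c}


if __name__ == "__main__":
    print(train_toy_gan())
```

After the first step the discriminator has learned that larger values look more "real" (`w` becomes positive), and the generator responds by shifting its outputs upward (`b` increases). Note that unregularized GANs of this kind can oscillate around the real data rather than settle, which is one reason practical training relies on additional stabilization techniques.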

Who owns the content copyright?

The ownership of content generated by an AI model is a complex and evolving issue, with both legal and ethical implications for the original creators of the data used to train the model.

When an AI model is trained on a dataset, it is learning from and making use of the information contained in that dataset. As such, it can be argued that the creators of the dataset have some claim to the AI-generated content. However, it can also be argued that the AI model is an original creation in its own right and that the rights to its output should be owned by the creators of the AI.

On the one hand, the use of copyrighted material for training AI models might be considered fair use, particularly when it serves purposes such as education, criticism, news reporting, or research. On the other hand, this reasoning does not necessarily extend to the content the model generates. To put it differently, using someone else’s data to train a generative AI model may be permissible, while using the output generated by the model could still infringe copyright.

In fact, the ownership question becomes even more complex when the AI model is used in a commercial setting. In this case, the ownership of works generated by the AI may depend on the terms of the agreement between the parties involved, such as the creators of the dataset and the owners of the AI model. In some cases, the rights to the content generated by an AI model may be protected by intellectual property laws, such as copyright or trademark law.

Recent lawsuits in the generative AI space

As generative AI becomes increasingly widespread, several lawsuits have emerged, with major corporations facing allegations of copyright infringement. A class action lawsuit has been filed against Microsoft, GitHub, and OpenAI, accusing them of violating copyright law by enabling Copilot, a generative AI system trained on vast amounts of publicly available code, to reproduce licensed code snippets without proper attribution.

In addition, two companies that offer popular AI art tools, Stability AI and Midjourney, are being sued for violating the rights of millions of artists by using web-collected images to train their AI systems. Recently, Getty Images, a provider of stock images, sued Stability AI for using a considerable number of images from its site without authorization to train its art generation tool, Stable Diffusion.

Getty Images CEO Craig Peters told The Verge in an interview: “We don’t believe this specific deployment of Stability’s commercial offering is covered by fair dealing in the UK or fair use in the US. The company made no outreach to Getty Images to utilize our or our contributors’ material, so we’re taking an action to protect our and our contributors’ intellectual property rights.”

The rise of generative AI technology has sparked controversy over the legality of training AI systems on pre-existing works. From the AI firms’ perspective, such practices are covered by laws like the US fair use doctrine, but numerous rights holders dispute this claim and believe it constitutes a breach of their copyright. Opinions among legal experts vary, and it remains to be seen how the courts will ultimately decide the matter.

Implications for Original Creators of the Data

Given this recent landscape, the implications of the uncertainty for original creators are significant. Without clear and consistent legal protection, they may have difficulty asserting their rights over the content generated by the AI, and may be at risk of losing control over their work and having it used without their permission.

Overall, the issue of ownership of content generated by AI models is complex and requires a nuanced understanding of both the legal and ethical implications of AI technology. As the use of AI continues to grow, it will be important for the industry to establish clear and consistent guidelines for the ownership of AI-generated content so that creators’ rights are protected.

The outcome of a fair use defense for AI-generated works depends largely on whether they are considered transformative, meaning they use copyrighted works in a manner that is distinct from the original. In the landmark 2021 Supreme Court case of Google v. Oracle, the Court found that reusing existing material to build something new can be transformative: Google’s copying of parts of the Java SE API to create its Android operating system was deemed fair use.

Globally, there is a trend toward more lenient usage of publicly available content, regardless of its copyright status. For instance, the UK is proposing to amend its law to allow text and data mining for any purpose, empowering businesses and commercial entities at the expense of rights holders. On the other hand, some experts argue that using copyrighted data to train AI systems should incur fines or penalties on intellectual property or privacy grounds.

As this is still a developing area, there are no definite answers. Hence, it’s crucial for companies to assess the terms of use of each commercial generative AI system before proceeding. Midjourney, for example, grants different rights to paid and unpaid users, while OpenAI’s DALL-E assigns rights to generated art but warns users about “similar content” and emphasizes due diligence to avoid infringement.

Final Thoughts

Generative AI has revolutionized the tech industry, but with its great potential comes great responsibility. The question of who owns the rights to the content generated by an AI model has raised legal and ethical discussions. While opinions among legal experts vary, the challenge lies in striking a balance between protecting the rights of creators and enabling the advancements of AI technology.

So, what will the future of generative AI and intellectual property rights look like? The answer will likely depend on a range of factors, such as legal precedent, technological advancements, and societal values. As the technology continues to advance, these questions will only become more urgent.
