Copyright and Generative AI under the AI Act

Definition
Foundation Models have key functional characteristics of a general-purpose model, in particular the generality and the capability to competently perform a wide range of distinct tasks (rec. 97). Such generality is assumed for a model with at least a billion parameters and trained with a large amount of data using self-supervision at scale (rec. 98). Large generative models allow for flexible generation of content such as in the form of text, audio, images or video.
Documentation
Providers of Generative AI should adopt transparency measures, namely drawing up and keeping up to date documentation and providing information to downstream providers. The minimum content of that documentation is set out in Annex XII of the AI Act.
In particular, they must include a general description of the general-purpose AI model including (a) the tasks that the model is intended to perform and the type and nature of AI systems into which it can be integrated, (b) the acceptable use policies applicable, (c) the date of release and methods of distribution, (d) how the model interacts, or can be used to interact, with hardware or software that is not part of the model itself, where applicable, (e) the versions of relevant software related to the use of the general-purpose AI model, where applicable, (f) the architecture and number of parameters, (g) the modality (e.g. text, image) and format of inputs and outputs, (h) the licence for the model.
In addition, the documentation must include a description of the elements of the model and of the process for its development, including: (a) the technical means (e.g. instructions for use, infrastructure, tools) required for the general-purpose AI model to be integrated into AI systems, (b) the modality (e.g. text, image) and format of the inputs and outputs and their maximum size (e.g. context window length), (c) information on the data used for training, testing and validation, where applicable, including the type and provenance of data and curation methodologies.
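As an illustration only, the documentation items listed above could be tracked internally as a simple record that flags which entries are still empty. The sketch below is in Python; the class name, field names and the `missing_items` helper are hypothetical and are not prescribed by the AI Act or Annex XII. Items that the text marks "where applicable" are treated as optional, which is an assumption of this sketch.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DownstreamDocumentation:
    """Hypothetical record mirroring the documentation items listed above."""
    intended_tasks: Optional[str] = None
    acceptable_use_policies: Optional[str] = None
    release_date_and_distribution: Optional[str] = None
    hardware_software_interaction: Optional[str] = None  # where applicable
    relevant_software_versions: Optional[str] = None     # where applicable
    architecture_and_parameters: Optional[str] = None
    modality_and_io_format: Optional[str] = None
    licence: Optional[str] = None
    technical_means_for_integration: Optional[str] = None
    max_input_output_size: Optional[str] = None
    training_data_information: Optional[str] = None      # where applicable


# Assumption of this sketch: "where applicable" items may legitimately be empty.
WHERE_APPLICABLE = {
    "hardware_software_interaction",
    "relevant_software_versions",
    "training_data_information",
}


def missing_items(doc: DownstreamDocumentation) -> List[str]:
    """Return the names of non-optional items that have not been filled in."""
    return [
        f.name
        for f in fields(doc)
        if f.name not in WHERE_APPLICABLE and not getattr(doc, f.name)
    ]


draft = DownstreamDocumentation(intended_tasks="text generation", licence="Apache-2.0")
print(missing_items(draft))
```

A checklist of this kind only shows which entries are empty; whether a filled-in entry is legally sufficient remains a matter for legal review.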
Obligations
According to the AI Act in force, the following are some of the obligations of providers of foundation models:
(a) providers of Generative AI in the EU market should ensure compliance with EU law on copyright and related rights and, in particular, identify and respect the reservation of rights expressed by right holders under the European Union Copyright Directive (EUCD), irrespective of where the copyright-relevant training activities have occurred (recs 21 and 22) and to the extent that the output produced by those systems is intended to be used in the EU;
(b) providers must adopt a policy to respect EU copyright law, which includes identifying and respecting the reservations of rights expressed under Art. 4 EUCD (Art. 53, recs 106 and 107); and
(c) providers will be obliged to draft and make publicly available a detailed summary of the content used for training their general-purpose AI model (GPAIM); this summary should be based on a template provided by the AI Office.
These obligations, under certain conditions, do not apply to providers of AI models that are made accessible to the public under a free and open licence, who are instead encouraged to implement widely adopted documentation practices, such as model cards and data sheets (recs 89, 102, 103 and 104).
Art. 53(1) of the AI Act provides that providers of GPAIM shall: (a) [..] (b) draw up, keep up-to-date and make available information and documentation to providers of AI systems who intend to integrate the GPAIM into their AI systems. Without prejudice to the need to observe and protect IP rights and confidential business information or trade secrets in accordance with EU and national law, the information and documentation shall: (i) enable providers of AI systems to have a good understanding of the capabilities and limitations of the GPAIM and to comply with their obligations pursuant to this Regulation; and (ii) contain, at a minimum, the elements set out in Annex XII; and (c) put in place a policy to comply with EU law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Art. 4(3) of Directive (EU) 2019/790.
Code of Practice on GPAI
The Code of Practice on General-Purpose Artificial Intelligence (GPAI), drafted by the AI Office, aims to help companies comply with the EU’s AI Act. It includes transparency and copyright-related rules as well as risk assessment and mitigation measures, and is now expected to be published in May 2025 at the earliest.
The template released by the AI Office (2nd draft code of practice) requires providers of GPAI to disclose detailed data usage from pre-training to fine-tuning. Providers are required: (1) to break down data types (text, audio, etc.); (2) to list major sources (open web, datasets); and (3) to explain copyright compliance (as per the EUCD). The key recommendations for copyright compliance (p.14 of the code) are: (1) to adopt an internal compliance policy and assign responsibilities; (2) to carry out reasonable copyright due diligence, including asking content licensors about their compliance with opt-outs; (3) to avoid overfitting; (4) to require downstream parties to use measures to prevent outputs that are identical or recognisably similar to protected works; (5) to respect robots.txt files; (6) to implement widely used standards and tools to detect opt-outs; (7) to collaborate on standards-setting for interoperable, machine-readable opt-outs; and (8) not to crawl piracy websites.
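By way of illustration, the recommendation to respect robots.txt files can be sketched with Python's standard `urllib.robotparser` module. This is a minimal sketch under stated assumptions: the crawler name `ExampleTrainingBot`, the sample robots.txt content and the `may_crawl` helper are hypothetical, and a robots.txt check is only one possible way of detecting a machine-readable opt-out. Whether a given signal amounts to a valid reservation of rights under Art. 4(3) of Directive (EU) 2019/790 is a legal question that code cannot settle.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a site might publish it.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The hypothetical training crawler is excluded from the whole site.
print(may_crawl(EXAMPLE_ROBOTS_TXT, "ExampleTrainingBot", "https://example.org/articles/1"))
# Other crawlers are only excluded from /private/.
print(may_crawl(EXAMPLE_ROBOTS_TXT, "OtherBot", "https://example.org/articles/1"))
```

In practice a crawler would fetch the live robots.txt of each host before requesting any page and would record the result, so that compliance with the opt-out can be demonstrated later.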
Final Remarks
The Working Group of the EU GPAI Code of Practice on copyright has hinted at an intermediate solution to the question of the AI Act’s extraterritoriality provision, namely that the EU AI Act applies when the model provider scraped websites hosted on servers located in the EU. If the content is hosted and access-controlled in the EU, it seems justified to oblige a GPAIM provider to identify and comply with the respective access-control laws, even if the subsequent training takes place in a third country.
It should be emphasised that the reproduction of works by an AI model constitutes a copyright-relevant reproduction and in addition that making them available on the EU market may infringe the right of making available to the public. The training of such models is not a case of TDM but of copyright infringement, since parts of the training data can be memorised in whole or in part by current generative models and can therefore be generated again with suitable prompts by the end-users. According to rec. 105 ‘..any use of copyright protected content requires the authorization of the rightsholder, unless relevant copyright exceptions and limitations apply’.
Lastly, it is noted that compliance with the transparency obligations of the AI Act should not be interpreted as indicating that the use of the AI system or output is lawful (rec. 137).
Marios D. Sioufas
Deputy Managing Partner
LL.M. in Intellectual Property Law – Queen Mary University of London