The Hamburg Regional Court (LG) has ruled (judgment of 27.09.2024, Ref. 310 O 227/23) that the downloading of a copyright-protected photograph by the provider of a data set for artificial intelligence training falls, in the present case, under the text and data mining (“TDM”) limitation for scientific research pursuant to Section 60d of the German Copyright Act (UrhG). The training of an AI itself was not the subject of the ruling.
The defendant is the non-profit research network Laion (short for “Large-Scale Artificial Intelligence Open Network”). Among other things, it provides a data set for the training of AI models publicly and free of charge. The set contains almost 6 billion links to publicly accessible images, each with a description of the image content. The defendant had downloaded the images linked in a pre-existing data set, used software to check whether the respective description was correct, and enriched the images with metadata before publishing the data set. One of the images affected had been made available on the Internet by a picture agency and bore the agency’s watermark. The agency’s website contained an objection to scraping.
Against this background, the Regional Court assessed the admissibility under copyright law as follows:
No merely transient or incidental reproduction
Section 44a UrhG did not apply. It exempts a reproduction that is transient or incidental, is an integral and essential part of a technical process, serves only transmission in a network or the lawful use of a work, and has no independent economic significance:
- The reproduction was not transient, because any deletion was not user-independent but resulted only from corresponding programming by the provider; moreover, the defendant had said nothing about the storage period.
- It was also not incidental, because the images were downloaded specifically for analysis, i.e. deliberately and actively procured before the analysis.
Under Swiss copyright law, the situation should be assessed in the same way. Art. 24a CopA exempts temporary reproduction under the same conditions as Section 44a UrhG. The reproduction of copyrighted works in a data set for the purpose of training an AI model would hardly be covered by this (see our FAQ on the AI Act, question 59).
Application of the TDM limitation for scientific research
The German Copyright Act regulates the exemption of reproduction for “Text and Data Mining” (TDM) in two provisions:
- Section 60d UrhG permits TDM by research organizations and other institutions that conduct non-commercial scientific research.
- Section 44b UrhG contains a general limitation for TDM that also applies outside non-commercial research, but is subject to a reservation of use by the rights holder for publicly accessible works (and carries a deletion obligation that does not apply under Section 60d UrhG).
Unlike Section 44b UrhG, Section 60d UrhG applies. The reproduction took place in the course of TDM. TDM is the automated analysis of digital works in order to obtain information, in particular about patterns, trends and correlations. That was the case here: the reproduction served to identify “correlations”, namely those between image content and image description.
In this case, Laion’s TDM was also carried out for the purposes of scientific research:
Because the methodical and systematic “pursuit” of new knowledge suffices, the concept of scientific research is not to be understood so narrowly that it covers only the work steps directly associated with the acquisition of knowledge. It is sufficient that the work step in question is aimed at a (later) gain in knowledge […]. In particular, the concept of scientific research does not presuppose any subsequent research success.
This could also cover the training of an AI:
Although the creation of the data set as such may not yet be associated with a gain in knowledge, it is a fundamental work step with the aim of using the data set for the purpose of gaining knowledge at a later date. It can be affirmed that such an objective also existed in the present case. It is sufficient that the data set was, undisputedly, published free of charge and thus (also) made available to researchers in the field of artificial neural networks.
It was therefore not relevant whether the development of the defendant’s own AI models constituted the defendant’s own research:
Whether the data set […] will also be used by commercial companies for training or for the further development of their AI systems is irrelevant in any event, because research by commercial companies is still research, even if it is not privileged as such under Sections 60c et seq. UrhG.
However, the privilege under Section 60d UrhG applies only to non-commercial research. This requirement was met in the present case because the defendant made the data set publicly available free of charge.
Swiss copyright law provides for TDM only in the context of science, in Art. 24d CopA:
1 For the purposes of scientific research, it is permissible to reproduce a work if the reproduction is due to the use of a technical process and there is lawful access to the works to be reproduced.
2 Reproductions made in the context of this Article may be retained for archiving and backup purposes after completion of the scientific research.
3 This Article shall not apply to the reproduction of computer programs.
It seems natural to assess the situation here in the same way as the Hamburg Regional Court did. The concept of research is no narrower; on the contrary, it also includes commercial research (which in Germany is exempted only under Section 44b UrhG). It can hardly be argued that every training of an AI is covered (not every training is likely to be aimed at gaining knowledge), but in the particular circumstances of the present case, Art. 24d CopA is also likely to apply.
Probably no application of the general TDM limitation
By contrast, the general TDM limitation of Section 44b UrhG probably does not apply. Its general requirements are admittedly fulfilled (obiter dicta):
The Regional Court doubts the view, “occasionally advocated”, that the TDM limitation covers only the exploitation of “information hidden in the data” and not also the use of “the content of the intellectual creation” (obiter, since Section 60d UrhG already applies):
- This distinction is justified in the literature by the fact that the training of an AI ultimately serves to generate new image content with the AI, which is why Section 44b UrhG should be teleologically reduced accordingly. However, this intention and the success of the training have not yet been established, as the Regional Court notes.
- In addition, it follows from Art. 53 para. 1 lit. c of the AI Act that the TDM limitation under European law can at least cover training (GPAIM providers must, among other things, have a policy to comply with Union copyright law, which also covers a reservation of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790, i.e. the TDM limitation), and Section 44b UrhG implements that provision.
The InfoSoc Directive, i.e. the Directive on the harmonization of certain aspects of copyright in the information society, also had to be taken into account:
- Its Art. 5(5) permits the application of the TDM limitation only in special cases in which the normal exploitation of the work is not impaired and the legitimate interests of the rights holder are not unreasonably prejudiced.
- This is also the case here. In particular, the possibility of competition from AI-generated content does not suffice to conclude otherwise, if only because future, not yet foreseeable developments would not allow a legally secure distinction between permissible and impermissible uses.
Finally, the downloaded works were also lawfully accessible, as required by Section 44b UrhG. What was downloaded was not the original image, which is offered only under license, but a watermarked preview image posted for advertising purposes.
However, the application of Section 44b UrhG is likely to fail because of an effective reservation of use (here too obiter):
- The reservation of use had been declared by the picture agency, which was authorized to do so as the rights holder, and the plaintiff as the rights holder should be able to invoke it.
- The reservation was formulated clearly enough. The fact that it concerned all published works does not contradict this.
- It was probably also machine-readable. “Machine-readable” should be interpreted as “machine-understandable”. In the opinion of the Regional Court, a reservation written in natural language should also suffice, because such reservations can at least be read by machines using suitable AI (the Regional Court again refers to Art. 53 para. 1 lit. c of the AI Act, according to which a GPAIM provider’s policy for complying with copyright law also covers the “identification of and compliance with a […] reservation of rights”, “also through the latest technologies”). However, the Regional Court points out that it is probably departing from the majority opinion here. Ultimately, the question can be left open.