In a judgment dated November 11, 2025 (Ref. 42 O 14139/24), the Munich Regional Court (LG) held that training texts reproducibly available in a model (here: ChatGPT 4 and 4o from OpenAI) (“memorization”) constitute reproduction within the meaning of Section 16 of the German Copyright Act (UrhG). It is sufficient that the training texts are reproducibly contained in the model:
The plaintiff claims that the chatbot generates reproductions of the training data to a considerable extent. This so-called memorization of content within models leads to regurgitation, i.e. to the production of output that explicitly reproduces certain training inputs […]
201 a. The Chamber is convinced that the texts at issue are […] contained in the model.
202 aa. It is known from information technology research that training data can be contained in models and can be extracted as outputs, which is referred to as memorization […]. Such memorization occurs when the parameters, unspecific before training, do not merely extract information from the training data set, but a complete transfer of the training data can be found in the parameters fixed after training.
203 The repeated occurrence of a training data item in the training set is assumed to be the cause of memorization, which occurs mainly in large models […].
204 The memorization of training data can be verified using various methods. If the training data is known, memorization can be established by comparing the training data with outputs generated using simple prompts and sufficient text length. Otherwise, the measures entropy and perplexity are used to examine the certainty with which a model reproduces an output – in the case of trained and memorized content, the certainty is high […]. Contrary to the defendant’s submission, simple prompts are not a condition for generating the training data as outputs, but merely serve to prove memorization. […]
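The perplexity measure the court refers to can be sketched as follows. This is a minimal illustration, not part of the judgment; the per-token probability values are hypothetical, chosen only to show that text a model predicts with high certainty (a candidate for memorized training data) scores a lower perplexity than text it predicts with low certainty:

```python
import math

def perplexity(token_probs):
    """Perplexity of a text given the model's per-token probabilities:
    exp of the negative mean log-likelihood. Low perplexity means the
    model reproduces the text with high certainty - a signal used to
    detect memorized training data when the data itself is unknown."""
    n = len(token_probs)
    log_likelihood = sum(math.log(p) for p in token_probs)
    return math.exp(-log_likelihood / n)

# Hypothetical per-token probabilities for two continuations:
memorized = [0.95, 0.90, 0.98, 0.97]  # model is very certain
novel     = [0.10, 0.05, 0.20, 0.15]  # model is uncertain

print(perplexity(memorized) < perplexity(novel))  # → True
```

Memorized content thus stands out statistically even without access to the training set, which is why the court treats simple prompts as evidence of memorization rather than as a precondition for it.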
205. Memorization can already be established here by comparing the song lyrics with the outputs. The use of the disputed song lyrics as training data is undisputed. According to Annex K 2, the song lyrics in dispute are clearly recognizable in the submitted outputs produced by the very simple prompts “What are the lyrics of [song title]”, “Who wrote the lyrics”, “What is the chorus of [song title]”, “Please also tell me the first verse”, and “Please also tell me the second verse”.
The fact that texts have been fed in as training data and are reproduced in response to queries constitutes prima facie evidence that the texts are stored in the model in reproduced form. Furthermore, reproduction does not require a work to be rendered identically; rendering a work in modified form also suffices. The technical details are likewise irrelevant:
For the purposes of reproduction under copyright law, it can remain open how memorization works in detail. It is irrelevant whether one speaks of storing or copying the training data or, as the defendants put it, whether the model reflects in its parameters what it has learned from the entire training data set, namely relationships and patterns of all words or tokens that represent the diversity of human language and its contexts. What is decisive is that the song lyrics that served as training data are reproducibly contained in the model and thus embodied.
The court then held the text and data mining (TDM) exception (Section 44b UrhG) inapplicable:
Language models such as the models in dispute generally fall within the scope of the text and data mining limitation. The provisions cover the reproductions necessary for compiling the data corpus in phase 1 (see above), but not further reproductions in the model in phase 2. If, as in the present case, information is not only extracted from training data in phase 2, but works are also reproduced, this does not constitute text and data mining. Even if the limitation provisions generally apply to the training of models, reproductions in the model are not reproductions covered by the limitation, as they do not serve merely to prepare the text and data mining.
The judgment is commented on by Mathias Lejeune.