The English data protection authority ICO has published a new version of its comprehensive foundational paper on "Big data, artificial intelligence, machine learning and data protection". On one of the central questions in this context, anonymisation (and thus also the question of when data qualify as personal data), the paper does not take a conclusive position, but a fairly nuanced one:
"Some commentators have pointed to examples of where it has apparently been possible to identify individuals in anonymised datasets, and so concluded that anonymisation is becoming increasingly ineffective in the world of big data. On the other hand, Cavoukian and Castro have found shortcomings in the main studies on which this view is based. A recent MIT study looked at records of three months of credit card transactions for 1.1 million people and claimed that, using the dates and locations of four purchases, it was possible to identify 90 percent of the people in the dataset. However, Khaled El Emam has pointed out that, while the researchers were able to identify unique patterns of spending, they did not actually identify any individuals. He also suggested that in practice access to a dataset such as this would be controlled and also that the anonymisation techniques applied to the dataset were not particularly sophisticated and could have been improved.
It may not be possible to establish with absolute certainty that an individual cannot be identified from a particular dataset, taken together with other data that may exist elsewhere. The issue is not about eliminating the risk of re-identification altogether, but whether it can be mitigated so it is no longer significant. Organisations should focus on mitigating the risks to the point where the chance of re-identification is extremely remote. The range of datasets available and the power of big data analytics make this more difficult, and the risk should not be underestimated. But that does not make anonymisation impossible or ineffective."
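The distinction the quoted passage draws is worth making concrete: the MIT study measured *uniqueness* (how many people are pinned down by a handful of their transaction points within the dataset), which is not the same as actually naming an individual. The following toy sketch in Python illustrates that uniqueness test. The data, names, and helper functions are entirely hypothetical and have nothing to do with the actual MIT dataset; the sketch only shows the shape of the calculation.

```python
import itertools

# Hypothetical toy data: each person is a set of (date, shop) transaction
# points. These values are invented for illustration only.
transactions = {
    "A": [("d1", "s1"), ("d2", "s2"), ("d3", "s3"), ("d4", "s4")],
    "B": [("d1", "s1"), ("d2", "s2"), ("d5", "s5"), ("d6", "s6")],
    "C": [("d1", "s1"), ("d7", "s7"), ("d8", "s8"), ("d9", "s9")],
}

def is_unique(person, points, data):
    """True if `points` (a set of (date, shop) pairs) matches only `person`."""
    matches = [p for p, txs in data.items() if points <= set(txs)]
    return matches == [person]

def fraction_unique(data, k):
    """Fraction of people who have at least one k-subset of their
    transaction points that nobody else in the dataset shares,
    i.e. who are 'unique' given k points of side knowledge."""
    unique = 0
    for person, txs in data.items():
        if any(is_unique(person, set(combo), data)
               for combo in itertools.combinations(txs, k)):
            unique += 1
    return unique / len(data)

print(fraction_unique(transactions, 4))  # with this toy data: 1.0
```

A result of 1.0 here means every person's pattern is unique given four points, but, as El Emam notes, that by itself identifies no one: linking a unique pattern to a named individual still requires outside knowledge of that person's purchases.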