The UK data protection supervisory authority, the ICO, has published a new version of its extensive foundational paper on "Big data, artificial intelligence, machine learning and data protection". On one of the key questions in this context, anonymisation (and with it the question of when personal data is present), the paper does not take a definitive position, but its treatment is quite nuanced:

“Some commentators have pointed to examples where it has apparently been possible to identify individuals in anonymised datasets, and so concluded that anonymisation is becoming increasingly ineffective in the world of big data. On the other hand, Cavoukian and Castro have found shortcomings in the main studies on which this view is based. A recent MIT study looked at records of three months of credit card transactions for 1.1 million people and claimed that, using the dates and locations of four purchases, it was possible to identify 90 percent of the people in the dataset. However, Khalid El Emam has pointed out that, while the researchers were able to identify unique patterns of spending, they did not actually identify any individuals. He also suggested that in practice access to a dataset such as this would be controlled and also that the anonymisation techniques applied to the dataset were not particularly sophisticated and could have been improved.

It may not be possible to establish with absolute certainty that an individual cannot be identified from a particular dataset, taken together with other data that may exist elsewhere. The issue is not about eliminating the risk of re-identification altogether, but whether it can be mitigated so it is no longer significant. Organisations should focus on mitigating the risks to the point where the chance of re-identification is extremely remote. The range of datasets available and the power of big data analytics make this more difficult, and the risk should not be underestimated. But that does not make anonymisation impossible or ineffective.”

Posted by David Vasella

Dr. David Vasella is an attorney (Rechtsanwalt) at FRORIEP. He specialises in IT, data protection and intellectual property law and is a lecturer at the University of Zurich. He is the founder of swissblawg.