ICO: Big data, artificial intelligence, machine learning and data protection

The UK data protection authority, the ICO, has published a new version of its comprehensive foundational paper on “Big data, artificial intelligence, machine learning and data protection”. On one of the central questions in this area, anonymisation (and with it the question of when data constitute personal data), the paper does not take a conclusive position, but its treatment is quite nuanced:

“Some commentators have pointed to examples of where it has apparently been possible to identify individuals in anonymised datasets, and so concluded that anonymisation is becoming increasingly ineffective in the world of big data. On the other hand, Cavoukian and Castro have found shortcomings in the main studies on which this view is based. A recent MIT study looked at records of three months of credit card transactions for 1.1 million people and claimed that, using the dates and locations of four purchases, it was possible to identify 90 percent of the people in the dataset. However, Khalid El Emam has pointed out that, while the researchers were able to identify unique patterns of spending, they did not actually identify any individuals. He also suggested that in practice access to a dataset such as this would be controlled and also that the anonymisation techniques applied to the dataset were not particularly sophisticated and could have been improved.

It may not be possible to establish with absolute certainty that an individual cannot be identified from a particular dataset, taken together with other data that may exist elsewhere. The issue is not about eliminating the risk of re-identification altogether, but whether it can be mitigated so it is no longer significant. Organisations should focus on mitigating the risks to the point where the chance of re-identification is extremely remote. The range of datasets available and the power of big data analytics make this more difficult, and the risk should not be underestimated. But that does not make anonymisation impossible or ineffective.”
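To make El Emam's distinction between a unique spending pattern and an identified individual more tangible: a figure like the 90 percent in the MIT study describes what is often called “unicity”, roughly the share of people in a dataset whose trace is the only one matching a handful of known data points. The following is a hypothetical Python sketch under simplifying assumptions of our own (toy records of the form (pseudonymous ID, (day, shop)), an illustrative function named `unicity`, one random draw of k known points per person); it is not the study's actual methodology. Note that the code only ever handles pseudonymous IDs: a “unique” result means that exactly one trace fits, not that anyone knows whose trace it is.

```python
import random
from collections import defaultdict


def unicity(transactions, k, seed=0):
    """Share of pseudonymous users whose trace is the only one containing
    k points drawn at random from that same trace (hypothetical sketch)."""
    rng = random.Random(seed)
    # Group the (day, shop) points of each pseudonymous ID into one trace.
    traces = defaultdict(set)
    for user, point in transactions:
        traces[user].add(point)

    unique = total = 0
    for user, points in traces.items():
        if len(points) < k:
            continue  # trace too short to draw k points from; left out
        total += 1
        # Simulate an observer who knows k of this user's purchases.
        known = set(rng.sample(sorted(points), k))
        matches = [u for u, p in traces.items() if known <= p]
        if matches == [user]:
            unique += 1  # exactly one trace fits: unique, but still unnamed
    return unique / total if total else 0.0


# Hypothetical toy data: (pseudonymous ID, (day, shop)).
toy = [
    ("u1", (1, "bakery")), ("u1", (2, "cafe")), ("u1", (3, "bookshop")),
    ("u2", (1, "bakery")), ("u2", (2, "cafe")), ("u2", (4, "pharmacy")),
    ("u3", (1, "bakery")), ("u3", (5, "cinema")),
]
print(unicity(toy, k=2))
```

Even a result of 1.0 would only show that every trace is distinguishable within the dataset; linking a trace to a named person would still require additional data from elsewhere, which is exactly where controlled access and better anonymisation techniques come into play.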
