ICO: Big data, artificial intelligence, machine learning and data protection

The UK data protection supervisory authority, the ICO, has published a new version of its comprehensive background paper “Big data, artificial intelligence, machine learning and data protection”. The paper does not settle one of the main questions in this context, namely anonymization (and thus also the question of when data counts as personal data), but its treatment is quite nuanced:

“Some commentators have pointed to examples of where it has apparently been possible to identify individuals in anonymised datasets, and so concluded that anonymization is becoming increasingly ineffective in the world of big data. On the other hand, Cavoukian and Castro have found shortcomings in the main studies on which this view is based. A recent MIT study looked at records of three months of credit card transactions for 1.1 million people and claimed that, using the dates and locations of four purchases, it was possible to identify 90 percent of the people in the dataset. However, Khalid El Emam has pointed out that, while the researchers were able to identify unique patterns of spending, they did not actually identify any individuals. He also suggested that in practice access to a dataset such as this would be controlled and also that the anonymization techniques applied to the dataset were not particularly sophisticated and could have been improved.

It may not be possible to establish with absolute certainty that an individual cannot be identified from a particular dataset, taken together with other data that may exist elsewhere. The issue is not about eliminating the risk of re-identification altogether, but whether it can be mitigated so it is no longer significant. Organizations should focus on mitigating the risks to the point where the chance of re-identification is extremely remote. The range of datasets available and the power of big data analytics make this more difficult, and the risk should not be underestimated. But that does not make anonymization impossible or ineffective.”
