Take-Aways (AI)
  • Twelve data protection authorities publish a joint statement on automated data scraping and address large technology companies as its chief suspects.
  • Publicly accessible data is also subject to data protection obligations, both for those who collect it and for those who publish it; Swiss law provides only a limited exemption.
  • Scraped data carries risks such as identity theft, surveillance through aggregation, facilitated facial recognition, as well as abusive access by authorities and spam.
  • Operators should take technical and organizational protective measures; private individuals protect themselves by reading data protection notices and by cautious sharing.

An eclectic group of twelve data protection authorities – the FDPIC and authorities from Australia, Canada, the UK, Hong Kong, Norway, New Zealand, Colombia, Jersey, Morocco, Argentina and Mexico – has published a joint statement on data scraping, i.e. the automated extraction of data from websites. This is happening increasingly often, and the suspects are Alphabet (for YouTube), ByteDance (for TikTok), Meta (for Instagram, Facebook and Threads), Microsoft (for LinkedIn), Sina (for Weibo) and X Corp. (for Twitter, now "X"), who were served with the statement.

The statement is accordingly generic. In essence, it says that both the company that obtains data from the Internet and the one that publishes it have data protection obligations, even if the data are in fact public. Under Swiss law, this is true insofar as the exemption for processing public data is of limited scope and often overestimated.

In doing so, the authorities point to certain risks. Scraped data – a German equivalent of the term is arguably lacking – can be used for attacks and identity theft, and aggregating it creates the risk of surveillance – e.g., through facilitated facial recognition – and of access by authorities interested in such data pools, including for political or intelligence purposes. Spam is a further risk.

Anyone who publishes data should therefore protect it from scraping. This may include technical restrictions on frequent or suspicious access, authorization measures such as captchas, and organizational measures such as warnings addressed to scrapers. If the applicable law treats scraping as a security breach – which under the DPA presupposes that security measures have been taken – notification may be required.

Private individuals can also protect themselves, for example by reading the privacy statements of website operators (another reason to read privacy statements!), and above all by sharing less.