FINMA today published its Regulatory Notice 08/2024 – Governance and risk management in the use of artificial intelligence (PDF).

FINMA had already formulated its expectations for dealing with AI in various places, both in the banking and insurance sectors. On this basis, it has carried out supervisory reviews, including on-site inspections. The supervisory communication is the result of these reviews and essentially summarizes:

  • which risks FINMA sees,
  • which challenges it has encountered in its supervision (including on-site inspections), and
  • which measures it has observed and reviewed.

Overall, FINMA has observed that

  • the use of AI in the financial market is increasing,
  • the associated risks are often difficult to assess and
  • the governance and risk management structures of financial institutions are usually still being developed.

In essence, the supervisory communication states what is already known (though not in these words):

  • When new technology is adopted, the driver is not the 2nd line of defense but the 1st line. It understands the technology, although sometimes only its possible applications, and sometimes not even that.
  • The organization as a whole often neither understands the associated risks nor has the necessary governance in place. Both the understanding and the internal structures develop much more slowly than the technology.
  • In addition, there is often blind trust in the quality of purchased services, coupled with a lack of choice due to market concentration.
  • At the same time, the internal effort required for auditing, monitoring and controlling performance is underestimated.

Against this backdrop, the supervisory communication addresses both AI-related challenges and general or typical deficiencies in internal structures. Its content can be summarized as follows (put somewhat pointedly; FINMA itself does not paint quite so black a picture):

Risks

FINMA sees the following AI-related risks in particular:

  • Operational risks, in particular model risks (e.g. lack of robustness, correctness, bias, lack of stability and explainability)
  • IT and cyber risks
  • increasing dependence on third parties, in particular hardware, model and cloud providers
  • Legal and reputational risks
  • Assignment of responsibility, complicated by the “autonomous and difficult to explain actions” of AI systems and “scattered responsibilities for AI applications”

Governance

  • Problem:
    • Focus too much on data protection risks and too little on model risks
    • the development of AI applications is often decentralized – this leads to less consistent standards, blurring of responsibility and overlooking of risks
    • Purchased services: it is not always understood whether they contain AI, what data and methods are used and whether sufficient due diligence exists
  • Expectations:
    • Supervised entities with “many or significant applications” have AI governance
    • There is a central inventory with risk classification and measures
    • Responsibilities and accountabilities for the development, implementation, monitoring and use of AI are defined
    • Specifications for model tests and supporting system controls, documentation standards and “broad training measures” exist
    • Outsourcing: additional tests and controls are implemented; there are contractual clauses that regulate responsibilities and liability issues; the necessary skills and experience of the service provider are checked

Inventory and risk classification

  • Problem: difficulty in ensuring the completeness of the inventory, due among other things to a narrow definition of AI, decentralized use and inconsistent criteria for inventorying applications that are particularly important because of their significance or the associated risks
  • Expectations:
    • “AI” is defined broadly enough so that “classic applications” with similar risks are also covered
    • AI inventories are complete and contain a risk classification of AI applications (a minimal sketch of such a register follows below)
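
FINMA does not prescribe any particular format for such a register. Purely as an illustration of what a minimal central inventory with a risk classification could look like, here is a hedged Python sketch; the field names, risk tiers and example entry are assumptions for the example, not FINMA requirements.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional

class RiskClass(Enum):
    # Illustrative tiers only; FINMA does not mandate a specific scale.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

@dataclass
class AIApplication:
    """One entry in a central AI inventory (illustrative fields)."""
    name: str
    owner: str                      # accountable unit or person
    purpose: str
    uses_third_party_model: bool    # e.g. external model or cloud provider
    risk_class: RiskClass
    mitigations: list[str] = field(default_factory=list)
    last_review: Optional[date] = None

# A deliberately broad notion of "AI", so that classic rule- or
# statistics-based applications with similar risks are also captured.
inventory: list[AIApplication] = [
    AIApplication(
        name="credit-scoring-model",
        owner="Retail Credit",
        purpose="Pre-screening of loan applications",
        uses_third_party_model=False,
        risk_class=RiskClass.HIGH,
        mitigations=["quarterly bias review", "human sign-off above CHF 100k"],
        last_review=date(2024, 11, 1),
    ),
]

# Simple completeness/consistency checks over the register.
unreviewed = [a.name for a in inventory if a.last_review is None]
high_risk = [a.name for a in inventory if a.risk_class is RiskClass.HIGH]
print("High-risk applications:", high_risk)
print("Entries without a documented review:", unreviewed)
```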

Data quality

  • Problem:
    • Specifications and controls for data quality are missing
    • Data can be incorrect, inconsistent, incomplete, unrepresentative, outdated or biased (and with learning systems, data quality is often more important than model selection)
    • Purchased solutions: training data is often not known and may not be suitable
  • Expectations:
    • Internal directives with specifications for ensuring data quality (a few illustrative checks are sketched below)
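
What such specifications look like in practice is left to the institutions. As a hedged illustration only, the following pandas sketch shows a few of the dimensions FINMA mentions (completeness, plausibility, timeliness, representativeness); the column names and thresholds are invented for the example.

```python
import pandas as pd

def check_data_quality(df: pd.DataFrame) -> dict:
    """Run a handful of illustrative data-quality checks on a training set.

    The thresholds and column names below are assumptions for the example,
    not regulatory values.
    """
    report = {}

    # Completeness: share of missing values per column.
    report["missing_share"] = df.isna().mean().to_dict()

    # Plausibility: values outside an expected range (here: ages 18-120).
    if "age" in df.columns:
        report["implausible_age_rows"] = int(((df["age"] < 18) | (df["age"] > 120)).sum())

    # Timeliness: how old is the most recent record?
    if "record_date" in df.columns:
        latest = pd.to_datetime(df["record_date"]).max()
        report["days_since_latest_record"] = int((pd.Timestamp.today() - latest).days)

    # Representativeness (very rough): class balance of the target variable.
    if "default_flag" in df.columns:
        report["target_balance"] = df["default_flag"].value_counts(normalize=True).to_dict()

    return report

# Example usage with a toy data set.
sample = pd.DataFrame({
    "age": [34, 51, 17, None],
    "record_date": ["2024-10-01", "2024-11-15", "2023-01-02", "2024-12-01"],
    "default_flag": [0, 0, 1, 0],
})
print(check_data_quality(sample))
```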

Tests and ongoing monitoring

  • Problem: weaknesses in the selection of performance indicators, in testing and in ongoing monitoring
  • Expectations:
    • Tests to ensure data quality and the functionality of the AI applications are planned (including checks for accuracy, robustness and stability as well as bias, if necessary)
    • Experts provide questions and expectations
    • Performance indicators for the suitability of an AI application are defined, e.g. threshold values or other validation methods for assessing the correctness and quality of the outputs
    • Changes in input data are monitored (“data drift”; a minimal monitoring sketch follows this list)
    • If an output is ignored or changed by users, this is monitored as an indication of possible vulnerabilities
    • Supervised institutions consider in advance how to recognize and handle exceptions
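
FINMA does not say how data drift or performance thresholds should be measured. As one common, hedged illustration, the sketch below compares the current input distribution with the training distribution via a population stability index, flags breaches of an assumed alerting threshold, and counts outputs overridden by users.

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference (training) sample and current inputs.

    A common rule of thumb treats PSI > 0.2 as material drift; that
    threshold is an industry convention, not a FINMA requirement.
    """
    edges = np.histogram_bin_edges(expected, bins=bins)
    exp_counts, _ = np.histogram(expected, bins=edges)
    act_counts, _ = np.histogram(actual, bins=edges)
    # Convert to proportions and avoid division by / log of zero.
    exp_frac = np.clip(exp_counts / exp_counts.sum(), 1e-6, None)
    act_frac = np.clip(act_counts / act_counts.sum(), 1e-6, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

# Illustrative monitoring run on synthetic data.
rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
live_feature = rng.normal(loc=0.4, scale=1.1, size=5_000)   # shifted live inputs

psi = population_stability_index(training_feature, live_feature)
if psi > 0.2:                       # assumed alerting threshold
    print(f"Data drift alert: PSI={psi:.3f}")

# Overridden outputs as a vulnerability indicator: track how often users
# ignore or change the model's recommendation.
decisions = [("approve", "approve"), ("reject", "approve"), ("approve", "approve")]
override_rate = sum(model != human for model, human in decisions) / len(decisions)
print(f"Share of model outputs overridden by users: {override_rate:.0%}")
```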

Documentation

  • Problem:
    • There are no central specifications for documentation
    • Existing documentation is not sufficiently detailed and recipient-oriented
  • Expectations:
    • Essential applications: the documentation addresses the purpose of the applications, data selection and preparation, model selection, performance measures, assumptions and limitations, testing and controls, and fallback solutions (a structured example follows this list)
    • Data selection: data sources and data quality checks can be explained (incl. integrity, correctness, appropriateness, relevance, bias and stability)
    • Robustness, reliability and traceability of the application are ensured
    • Applications are appropriately categorized in a risk category (with corresponding justification and review)
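
Purely as an illustration of how the documentation elements listed above could be captured in a structured, recipient-oriented way, here is a hedged sketch; the fields mirror the points FINMA mentions, but the concrete schema and the example values are assumptions.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ModelDocumentation:
    """Structured documentation record for an essential AI application.

    Field names follow the elements mentioned in the supervisory
    communication; the schema itself is illustrative.
    """
    purpose: str
    data_selection_and_preparation: str
    model_selection: str
    performance_measures: dict[str, float]
    assumptions_and_limitations: list[str]
    testing_and_controls: list[str]
    fallback_solution: str
    risk_category: str
    risk_category_justification: str

doc = ModelDocumentation(
    purpose="Transaction monitoring alert triage (toy example)",
    data_selection_and_preparation="12 months of labelled alerts; duplicates and test accounts removed",
    model_selection="Gradient-boosted trees chosen over logistic regression after benchmarking",
    performance_measures={"auc": 0.91, "false_negative_rate": 0.03},
    assumptions_and_limitations=["labels reflect past analyst decisions", "low coverage of new products"],
    testing_and_controls=["pre-release back-test", "monthly stability check", "bias review"],
    fallback_solution="Revert to rule-based triage if monthly AUC drops below 0.80",
    risk_category="high",
    risk_category_justification="Customer-impacting decisions and regulatory reporting relevance",
)

print(json.dumps(asdict(doc), indent=2))
```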

Explainability

  • Problem: results cannot be traced, explained or reproduced and thus cannot be assessed
  • Expectations:
    • Plausibility and robustness of the results can be assessed when decisions are made vis-à-vis investors, clients, employees, the supervisory authority or the audit firm
    • Among other things, the drivers of the applications or their behavior under different conditions are understood (one possible technique is sketched below)
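
FINMA leaves open how the “drivers” of an application should be identified. One common technique (among many) is permutation importance; the hedged sketch below implements it by hand on a toy model to show the idea, not to suggest this is the method FINMA expects.

```python
import numpy as np

def permutation_importance(predict, X: np.ndarray, y: np.ndarray,
                           n_repeats: int = 10, seed: int = 0) -> list[float]:
    """Drop in accuracy when each feature is shuffled: a rough driver ranking."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = []
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances.append(float(np.mean(drops)))
    return importances

# Toy set-up: feature 0 carries the signal, feature 1 is pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(1_000, 2))
y = (X[:, 0] > 0).astype(int)

def toy_model(data: np.ndarray) -> np.ndarray:
    return (data[:, 0] > 0).astype(int)

print(permutation_importance(toy_model, X, y))   # feature 0 ranks far above feature 1
```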

Independent review

  • Problem:
    • There is no clear demarcation between the development of AI applications and their independent testing
    • Few supervised institutions carry out an independent review of the entire model development process
  • Expectations:
    • For “essential applications”, an independent review is carried out that includes an objective, experienced and unbiased opinion on the appropriateness and reliability of a procedure for a particular application
    • The results of the review are taken into account during development

Expectations (summary)

Overall, FINMA’s expectations can be summarized as follows:

  • AI governance:
    • Responsibilities and accountabilities (AKV) for the development, implementation, monitoring and use of AI are defined.
    • Internal directives with specifications for ensuring data quality
    • Essential applications: documentation addresses the purpose of the applications, data selection and preparation, model selection, performance measures, assumptions and limitations, testing and controls, and fallback solutions.
  • Central inventory with risk classification and measures:
    • “AI” is defined broadly enough
    • Completeness of the inventory with risk classification (with verification and justification)
  • Data and model quality:
    • Supervised institutions consider in advance how to recognize and handle exceptions.
    • Specifications for model tests and supporting system controls, documentation standards and “broad training measures” exist.
    • Tests to ensure data quality and the functionality of the AI applications are planned (including checks for accuracy, robustness and stability as well as bias, if necessary).
    • Experts provide questions and expectations.
    • Performance indicators for the suitability of an AI application are defined, e.g. threshold values or other validation methods for assessing the correctness and quality of the outputs.
    • Changes in input data are monitored (“data drift”).
    • Data selection: data sources and data quality checks can be explained (incl. integrity, correctness, appropriateness, relevance, bias and stability).
    • Robustness, reliability and traceability of the application are ensured.
    • If an output is ignored or modified by users, this is monitored as an indication of possible vulnerabilities.
  • Explainability:
    • Decisions vis-à-vis investors, clients, employees, the supervisory authority or the audit firm: plausibility and robustness can be assessed
    • Drivers of the applications or their behavior are understood
  • Outsourcing: additional tests and controls are implemented; there are contractual clauses that regulate responsibilities and liability issues; the necessary skills and experience of the service provider are checked.
  • Essential applications:
    • Independent review, including an objective, experienced and unbiased opinion on the appropriateness and reliability for a specific application
    • Results are taken into account during development