Title: Safety Critical Integrity Assurance in Large Datasets

Author(s): Ali Hessami, Graham Sutherland

Publication Event: Proceedings of the Twenty-eighth Safety-Critical Systems Symposium, York, UK

Publication Date: 2020-02-11

Resource URL: https://scsc.uk/r1185.pdf

Abstract:

Historically, data such as standing data, configuration data, and other data types has had to be proven correct before use in a safety-critical environment. This has usually been achieved by rigorous manual or automated checking and system testing before first use, which is feasible because the data sets involved are relatively small. However, a “safety by compliance” strategy for data does not adequately address the sources of error that lead to accidents. Because AI depends on the availability of huge quantities of data, such approaches become increasingly impractical at scale. Three problems must therefore be overcome: first, ensuring that large data sets contain sufficiently granular detail to correlate with events associated with identified accident potential or other rare events, and validating them using appropriate principles; second, assessing whether related but diversely sourced data sets could be cross-validated by identifying and quantifying the probability of encountering missing features in the data; and finally, providing assurance that any capacity of an AI-driven function to incorrectly extrapolate from data within the existing data set is minimised. This paper explores possible approaches to these problems in greater detail.
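One idea raised in the abstract — quantifying the probability of encountering missing features when cross-validating diversely sourced data sets — can be sketched in a few lines. The function names, the sample records, and the assumption that missingness is independent between sources are all illustrative; they are not taken from the paper:

```python
from typing import Any, Dict, List


def missing_rates(records: List[Dict[str, Any]], features: List[str]) -> Dict[str, float]:
    """Fraction of records in which each feature is absent or None."""
    n = len(records)
    return {
        f: sum(1 for r in records if r.get(f) is None) / n
        for f in features
    }


def joint_missing_probability(rates_a: Dict[str, float],
                              rates_b: Dict[str, float]) -> Dict[str, float]:
    """Probability that a feature is missing in both sources at once.

    Assumes missingness is independent between the two sources -- a
    simplifying assumption for illustration, not a claim from the paper.
    """
    return {f: rates_a[f] * rates_b[f] for f in rates_a}


# Hypothetical records from two diversely sourced data sets.
source_a = [{"speed": 10.0, "position": 1.0},
            {"speed": None, "position": 2.0},
            {"speed": 12.0, "position": None},
            {"speed": 11.0, "position": 4.0}]
source_b = [{"speed": 9.5, "position": 1.1},
            {"speed": None, "position": 2.2},
            {"speed": None, "position": 3.3},
            {"speed": 10.8, "position": 4.4}]

features = ["speed", "position"]
rates_a = missing_rates(source_a, features)   # {'speed': 0.25, 'position': 0.25}
rates_b = missing_rates(source_b, features)   # {'speed': 0.5, 'position': 0.0}
joint = joint_missing_probability(rates_a, rates_b)
# A feature with non-zero joint missing probability cannot be fully
# recovered by cross-validation against the other source alone.
```

Under these assumptions, a low joint missing probability suggests the second source can plug gaps in the first; a high one flags a feature for which neither source offers coverage.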