The SCSC publishes a range of documents:
The club publishes its newsletter Safety Systems three times a year in February, June and October. The newsletter is distributed to paid-up members and can be made available in electronic form for inclusion on corporate members' intranet sites.
The proceedings of the annual symposium, held each February since 1993, are published in book form. Since 2013, copies have been available for purchase from Amazon.
The club publishes the Safety-critical Systems eJournal (ISSN 2754-1118) containing high-quality, peer-reviewed articles on the subject of systems safety.
If you are interested in being an author or a reviewer please see the Call for Papers.
All publications are available for current SCSC members to download free of charge (please log in first); recent books are also available as 'print on demand' from Amazon at reasonable cost.
Our world is subject to dramatic change: the war in Ukraine, climate change, the threat of new infectious diseases and a new monarch in the UK. Life is having to adjust to this new normal of ‘shock’ changes. The new applications in system safety are no less profound: air taxis, offshore grids, remote air traffic control centres, UK space launches, virtual hospital wards, battery-electric trains, self-driving vehicles ... the list is extensive and growing.
System safety practices and approaches must adapt to deal with these new applications and new technologies, especially in areas related to Artificial Intelligence and Machine Learning: object recognition and autonomous decision making, so crucial to new air, marine and road vehicles.
Most of us can no longer understand the complexity inside the systems that ensure our safety, and we cannot sensibly be relied upon to take over if those systems fail suddenly. Hence systems need both to be fail-safe, with multiple levels of resilience, and to be able to explain their decisions. Justification is everything.
Our horizons are expanding: everything we do now should be considered for impact on the environment, carbon release, wider human health and personal well-being.
Free and open-source software (FOSS) powers a huge range of critical systems and services, including internet infrastructure, cloud services and core financial systems. FOSS also dominates the realm of software development tooling, especially for DevOps and AI/ML applications. In the development of safety-critical systems, however, the use of FOSS has historically been treated as problematic, because open-source projects do not typically apply the formal software engineering processes (e.g. quality management, requirements specification and formal design) expected by safety standards.
FOSS can be certified as part of safety-critical products by establishing the ‘missing’ process evidence for the specific application. However, this purely ‘remedial’ approach to consuming FOSS can be counter-productive, especially if the ‘remedies’ are not shared by the product developers. It may also defeat one of the key benefits of the open-source model, which has already been demonstrated in the security domain: transparency as the basis of trust and continuous improvement. Furthermore, the use of increasingly complex, software-intensive systems in safety applications makes ‘traditional’ approaches to assurance increasingly difficult. Precisely specified behaviour for all execution paths may not be achievable, and much of the software involved in such systems (or in the toolchains used to construct and test them) may be pre-existing and generic, with complex or loosely-controlled supply chains for its dependencies.
We propose a new approach to consuming pre-existing software (both FOSS and proprietary) as part of safety-critical products and processes, using System Theoretic Process Analysis (STPA) to model the roles and responsibilities of software components in a given system, and to specify the system’s safety goals: how components contribute to them, how they may be violated, and what constraints must be met in order to prevent or mitigate such hazards. Using such a model as a system specification can provide a basis for verifying an integrated system, including fault injection scenarios to validate the effectiveness of verification measures and safety mechanisms, as well as providing detailed safety requirements for individual components. It may also be used to analyse the risks involved in consuming specific components — including those arising from their software development processes — and to specify how these risks may be addressed or mitigated. Open publication of such models for generic or anonymised systems could also enable organisations to collaborate in refining and validating the hazards considered and mitigations adopted for complex safety applications, such as those required for advanced automated driving solutions. This may provide a way to deliver the benefits of transparency to safety problems.
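To make the proposal above concrete, an STPA-style model of component roles, responsibilities and safety constraints could be represented as machine-readable data and queried for traceability. The schema below is our own hypothetical sketch (the component names, constraint IDs and hazard labels are invented for illustration), not a published STPA tooling format:

```python
# A hypothetical, minimal representation of an STPA-style system model:
# components with their responsibilities and control actions, plus safety
# constraints whose violation corresponds to an identified hazard.
model = {
    "components": {
        "brake_controller": {
            "responsibilities": ["issue brake commands"],
            "control_actions": ["apply_brakes"],
        },
        "perception": {
            "responsibilities": ["report obstacle distance"],
            "control_actions": [],
        },
    },
    "constraints": [
        # Unsafe control action: 'apply_brakes' not provided when needed.
        {"id": "SC-1",
         "text": "apply_brakes must be issued when obstacle_distance < 10 m",
         "hazard": "H-1: vehicle fails to stop for obstacle"},
    ],
}

def constraints_for_hazard(model: dict, hazard_id: str) -> list:
    """Trace from a hazard back to the constraints that prevent or mitigate it."""
    return [c["id"] for c in model["constraints"]
            if c["hazard"].startswith(hazard_id)]

# Such queries give the traceability the abstract describes: from safety
# goals and hazards down to detailed requirements on individual components.
assert constraints_for_hazard(model, "H-1") == ["SC-1"]
```

Publishing models in a shared, structured form like this is what would let organisations collaborate on refining the hazards and mitigations for a generic system.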
The development of the railways in Great Britain at the start of the 19th century was a time of great innovation in engineering, including infrastructure and rail vehicles under steam motive power. Unfortunately, it was also a time of some terrible accidents, including the fatality at the opening of the Liverpool and Manchester Railway in 1830 when Stephenson’s Rocket killed William Huskisson, and the Clayton Tunnel rail crash in 1861 where confusion over signalling led to 23 fatalities when one train ran into another in the tunnel. But, most importantly, lessons were learned from these dreadful events, leading to improved designs for locomotives and rolling stock, safer methods of working and operation, and new regulations and legislation. We are now at the dawn of autonomous road transport which offers the prospect of major benefits, not least in terms of a reduction in the number of road accidents and consequential fatalities. However, autonomous vehicles incorporate critical new technologies, such as machine learning, which could interact with unforeseen scenarios and manually driven traffic in unpredictable ways. This does mean that some residual accidents are inevitable. This paper considers a selection of railway accidents from the steam age and interprets them for the ‘autonomous age’, reading across to show how they can still be relevant and instructive.
Every safety-critical industry has its own legislation and standards which safety engineers have to consider in their demonstration that a system is safe. Often these standards overlap and conflict and they inevitably become out of date as technology and understanding advance. In addition, there are always trade-offs to consider between safety, environment, performance (and sometimes security). Choosing an approach for a project involves selecting standards and resolving these conflicts in a way which is acceptable to each of the parties whose approval the project needs to secure. This paper examines some of the challenges, drawing on the authors’ experience across multiple industry sectors. The paper then derives some principles for making suitable, informed choices at the outset of a project and for monitoring those choices as the project proceeds.
Autonomous systems are becoming more and more prevalent within industry. However, it is no easy feat to ensure their safety. Current safety approaches struggle to deal with the unprecedented levels of complexity introduced by these new autonomous systems. More recently, System-Theoretic Process Analysis (STPA) was introduced to help solve some of these problems. However, real-world examples or practical reviews and guidance are still hard to find. To bridge this knowledge gap, this paper takes a critical look at STPA using the results of two case studies: an autonomous and a collaborative system. We present a loss and hazard list for autonomous mobile systems, alongside a more systematic method to structure certain steps within the analysis. Additionally, we reflect on the challenges that had to be overcome and highlight the differences between applying STPA to new systems as opposed to existing systems. We highlight the importance of using the correct language/vocabulary and discuss how to build confidence in the results achieved by performing STPA.
To help AV companies to deploy safe, trustworthy autonomy, Edge Case Research proposes a new product – nLoop, which supports a new working model for building autonomous systems where the entire organization speaks the language of safety and measures progress continuously. nLoop’s live safety cases, requirements tracing, hazard tracking, and test coordination help teams achieve the goal of building AVs that are safe enough to deploy. At its core, nLoop supports the specification and management of structured safety cases. The validity status of the claims within nLoop safety cases may be determined by evaluating defined Safety Performance Indicators (SPIs), which, according to UL 4600, are metrics for assessing the system's safety performance. SPIs in nLoop are continuously evaluated based on data coming from safety evidence providers (e.g., databases, issue tracking systems, external verification and validation tools) connected to the safety case via dynamic links, which allow the safety case to be aware of newly generated evidence during both design time and run time. Evaluating the SPIs enables evaluation of the claims within the safety case and, implicitly, of the entire safety case. Consequently, safety cases in nLoop are ‘live’, keeping track of the current safety performance of the overall system, given the available safety-relevant evidence. The validity status of the claims in the safety case may be used as feedback for system developers. Given the invalidation of a safety claim, system developers and operators may update the system so that the desired system safety performance is (re)established. System updates usually imply the generation of new safety evidence, which, via safety evidence providers, is used again for evaluating SPIs, thus creating a feedback loop between the safety case and the activities executed by system developers and operators.
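The SPI-to-claim feedback loop described above can be sketched conceptually. Everything here (`SPI`, `Claim`, the threshold semantics) is an illustrative assumption of ours, not the actual nLoop API:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class SPI:
    # A Safety Performance Indicator: a metric with a pass threshold,
    # pulled on demand from an evidence provider via a 'dynamic link'.
    name: str
    metric: Callable[[], float]
    threshold: float

    def holds(self) -> bool:
        return self.metric() <= self.threshold

@dataclass
class Claim:
    # A safety-case claim is valid only while all its supporting SPIs hold.
    text: str
    spis: List[SPI] = field(default_factory=list)

    def valid(self) -> bool:
        return all(spi.holds() for spi in self.spis)

# Example: new evidence (e.g. from an issue tracker) invalidates a claim.
open_critical_bugs = [0]
spi = SPI("open critical bugs", lambda: open_critical_bugs[0], threshold=0.0)
claim = Claim("Perception failures are adequately managed", [spi])

assert claim.valid()        # all evidence currently in order
open_critical_bugs[0] = 3   # new evidence arrives via the dynamic link
assert not claim.valid()    # the 'live' safety case flags the claim
```

Re-running the evaluation whenever evidence changes is what keeps such a safety case 'live': an invalidated claim becomes the feedback that prompts developers to update the system, which in turn generates fresh evidence.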
On 24th March 1999 a transport truck caught fire while driving through the Mont Blanc tunnel between Italy and France. Other vehicles travelling through the tunnel became trapped and fire crews were unable to reach the transport truck. The fire burned for 53 hours and reached temperatures of 1,000°C, producing toxic smoke. Authorities compounded the problem by pumping air from the Italian side, feeding the fire and forcing poisonous black smoke through the length of the tunnel. A total of 39 people were killed. In the aftermath, major changes were made to the tunnel to improve its safety. This paper analyses the accident from a Services perspective, examining which critical services were in use (fire alert service, tunnel control service, ventilation service, etc.), and which contributed in some way. The Safety Critical Systems Club (SCSC) Service Assurance Guidance v3.0 is used to guide the analysis and provide structure to the work, producing a service hierarchy map, criticality levels and identification of assurance needs. The improvements put in place after the accident are assessed to see their effect on the service assurance involved.
Liability for defective systems arises in both criminal and civil law. There are no significant differences in, or exceptions to, the liability attaching to defective software within systems. Civil liability sets a higher bar than criminal liability: to put it another way, if a manufacturer produces a product which is safe enough not to attract civil liability if it goes wrong, that manufacturer will certainly have no criminal liability if the product is nevertheless defective. Dai Davis, who is both a Chartered Engineer and a Solicitor, will explain precisely what liability attaches to defective software and, more importantly, what a manufacturer or software author must do to ensure that a product (or the code) is “safe” from a legal perspective. He will explain in detail the civil legal wrong of Product Liability as well as looking briefly at the criminal law offences relevant to unsafe products. Finally, he will examine the issue of personal liability as opposed to corporate liability.
Systems composed of many interacting components are complex. This complexity varies with scale and with the number and type of active members. Complex system behaviour is intrinsically challenging to model due to the dependencies, competitions, relationships, or other interactions between a system's parts or between the system and its environment. Historically, participants were users; in future systems, software agents will likely undertake these roles. Other properties might arise from agent relationships, such as nonlinearity, emergence, spontaneous order, adaptation, and feedback loops. This emergence of technological agents raises the question: how should we mitigate the potential safety effects of agent failures? How can we assure ourselves that an agent that takes an active role in a system is adequately safe? What if there are multiple interacting agents? This paper considers the use of ‘Engineered Agents’ and how they might change the activities undertaken by Systems and Safety Engineering. It addresses the nature of single-point and common-cause failures, and asks how these failure modes might be addressed and assured for agent-based systems. Finally, the paper addresses architectural matters and indicates how those issues contribute to safety assurance and its associated approvals in technological agent-based systems.
Digital Twins (DT) are abstract, data-dependent and data-driven models. They are used to model the state of a physical component or system. This facilitates failure warnings and allows continuous improvement activities to be undertaken. As a result, DTs can be used as part of operational safety management. However, DTs also provide a tantalising opportunity to model whole systems before implementation, establishing a baseline model from which to identify safety and safety management issues relating to system realisation and operational safety management. They facilitate comparison between this model and reality as system realisation progresses. This study considers the use of DT in this role. Can the use of DTs mitigate issues that often delay the introduction to service? Can they be used early in the lifecycle to propose and check the credibility of ‘approval-in-principle’ documentation to ensure issues are discovered early rather than late in the project lifecycle? Benefits range from the application of change controls, impact assessment and the identification of interfaces and dependencies. This work sets out an approach to Systems and Safety Engineering using ‘Engineered DTs’ as part of safety assurance of current and future phases of a safety-critical project.
The aviation literature gives relatively little guidance to practitioners about the specifics of architecting systems for safety, particularly the impact of architecture on allocating safety requirements, or the relative ease of system assurance resulting from system or subsystem level architectural choices. As an exemplar, this paper considers common architectural patterns used within traditional aviation systems and explores their safety and safety assurance implications when applied in the context of integrating artificial intelligence (AI) and machine learning (ML) based functionality. Considering safety as an architectural property, we discuss both the allocation of safety requirements and the architectural trade-offs involved early in the design lifecycle. This approach could be extended to other assured properties, similar to safety, such as security. We conclude with a discussion of the safety considerations that emerge in the context of candidate architectural patterns that have been proposed in the recent literature for enabling autonomous capabilities by integrating AI and ML. A recommendation is made for the generation of a property-driven architectural pattern catalogue.
The use of drones has been hailed for many years as the next multi-billion dollar industry and technological disruptor. While swarms of drones do not yet blacken our skies, there is evidence that the use of drones is starting to gather pace; Royal Mail post and Covid-19 medical deliveries, the use of drones in the war in Ukraine, and even the aerial lightshow at the Queen's Platinum Jubilee Party show that drones are becoming more prevalent in the public consciousness. However, the main barrier to much more prolific drone use is the regulatory challenge of flying drones Beyond Visual Line of Sight (BVLOS). BVLOS is something of a "Holy Grail" for drone operations, as it unlocks an immense number of novel business opportunities and means to radically improve the efficiency and safety of existing operations. The UK is leading the way in BVLOS operations, with much continued government funding, but although regulations and means of compliance are starting to emerge for aspects of general BVLOS operations, the path to routine commercial operations remains challenging. The risks of BVLOS flights are currently managed by regulators on a case-by-case basis, and those operations that have been approved are generally flying in remote geographies away from populated areas, or in segregated airspace specially provisioned for drone flights. This paper explores the challenges and practical realities for BVLOS operations in higher-risk locations, such as urban areas and in controlled and unsegregated airspace, and assesses how stakeholders across the entire ecosystem, from operators through to regulators, are progressing towards making these types of operation more routinely certifiable.
Ask a member of the public to think about ‘Nuclear Safety’ and it’s highly likely a number of emotive words and images will spring to mind – events like Three Mile Island, Chernobyl and Fukushima have touched thousands, tainted the industry’s reputation and contributed to years of slow progression of the technology. With nuclear power offering a viable solution to many of the planet’s most pressing concerns such as climate change, fossil fuel dependence and energy security, it’s time to change the narrative a little and think about how the industry has addressed the challenges raised by these significant events. In this presentation, Tom will provide an insight into how the nuclear industry rigorously assesses and implements changes based on operational experience, not only to physical designs but also to the way plants are organised and operated. Several fascinating case studies will be explored and explained from a technical perspective, as well as looking at how human factors influence safety.
Are safety assurance standards actually software engineering artefacts, part of the decomposition of organisational goals into software requirements and designs? Loosely speaking, aren’t they just software that is executed by an organisation rather than a computer? And if so, can we use software engineering methods to improve them? Software safety standards have a vital role in delivering safe products, services and systems. In critical systems, software failures can lead to significant loss of life, so it is especially important that such standards are well understood by their users. Yet they are often verbose, lengthy documents written by committees; hard for the uninitiated to immediately digest and understand, and awkward to implement as written. This implies that the review process for such standards is not entirely effective. Building on the author’s MSc research at the University of Oxford, this paper examines how techniques from the domain of software engineering and allied fields can be used to improve the review of standards, potentially leading to better safety standards and safer systems. It presents a selection of potential techniques, evaluates the results of applying them to Def Stan 00-055 (the Ministry of Defence’s Requirements for Safety of Programmable Elements in Defence Systems), shows how they can be helpful, and discusses the practicalities of applying them to the review of new and existing standards.
IEC 63187 is the new functional safety framework being developed by the International Electrotechnical Commission for the defence sector. In this sector, applications are typically complex systems, elements of which may themselves be both technically complex and managerially complex systems in their own right: developed by different suppliers, to different standards, and at different stages in their product lifecycles. Defence systems are also subject to dynamic changes of risk, depending on the context of their deployment. Existing safety standards are not well adapted to this level of complexity. They tend to be aimed at single organisations rather than complex hierarchies, and to focus on the failures of system elements, rather than important emergent properties of the overall system. The new international standard in development, IEC 63187, tackles these problems using modern systems engineering principles. It applies the ISO/IEC 15288 life cycle processes to supplement IEC 61508 and other safety standards, proposing an approach that allows requirements to be tailored to the risk and managed across multiple system layers. This framework is designed to be open, for compatibility with different national approaches to assurance and risk acceptance, and with different traditional standards for realisation of individual system elements. This paper discusses the motivation, principles and approach of IEC 63187 and gives an update of the progress of the drafting of the document through the standardization process.
A key goal of the System-Theoretic Process Analysis (STPA) hazard analysis technique is the identification of loss scenarios – causal factors that could potentially lead to an accident. We propose an approach that aims to assist engineers in identifying potential loss scenarios that are associated with flawed assumptions about a system’s intended operational environment. Our approach combines aspects of STPA with formal modelling and simulation. Currently we are at a proof-of-concept stage and illustrate the approach using a case study based upon a simple car door locking system. In terms of the formal modelling, we use Extended Logic Programming (ELP) and on the simulation side, we use the CARLA simulator for autonomous driving. We make use of the problem frames approach to requirements engineering to bridge between the informal aspects of STPA and our formal modelling.
This paper articulates a methodology for exploring systemic change: Making the Water Visible. The methodology was developed as the author attempted to make sense of the Grenfell Tower Fire, in which 72 people lost their lives, and was used as the structure for her 2021 book ‘Catastrophe and Systemic Change: Learning from the Grenfell Tower Fire and Other Disasters’. The approach is codified in the hope that others will further develop it to explore complex challenges and help us learn. It is as much a story of a complex journey through despair, grief and sense-making as a method for making the water visible.
There is a great drive and incentive in industry to increase the level of automation in heavy-duty mobile machinery, but further progress is slowed by a lack of regulations and division of legal responsibilities, on top of the limitations of system capabilities in terms of reliability, maintainability, performance, and available technologies. At higher levels of automation, the operator is no longer in full control of the machine, and the machine itself becomes the controller. The newly emerging requirements for safety are not covered by existing standards, leading to difficulties for manufacturers in embedding a justifiable level of safety into their machinery. In this paper, we first provide a survey of relevant recent research efforts towards safer highly automated and autonomous systems. We then discuss the conformance process and emerging limitations of existing EU machine safety regulations in relation to an increase of automation in heavy-duty mobile machinery. Guided by a clarifying example, we then identify six topics in existing EU machine safety regulations limiting the conformance of machinery: a) run-time failures, b) algorithmic failures, c) convoluted architectural design patterns, d) data-driven intended behaviour, e) quality integration and f) formal verification limitations. We assert that future compliance of highly automated and autonomous heavy-duty mobile machinery can only be reached by overcoming the aforementioned limitations.
The authors have previously articulated the need to think beyond safety to encompass ethical and environmental (sustainability) concerns, and to address these concerns through the medium of argumentation. However, the scope of concerns is very large and there are other challenges such as the need to make trade-offs between incommensurable concerns. The paper outlines an approach to these challenges through suitably framing the argument and illustrates the approach by considering alternative concept designs for an autonomous mobility service.
When an autonomous system is deployed into a specific environment there may be new safety risks introduced. These could include risks due to staff interacting with the new system in unsafe ways (e.g. getting too close), risks to infrastructure (e.g. collisions with maintenance equipment), and also risks to the environment (e.g. due to increased traffic flows). Hence changes must be made to the local Safety Management System (SMS) governing how the system is deployed, operated, maintained and disposed of within its operating context. This includes how the operators, maintainers, emergency services and accident investigators have to work to new practices and develop new skills. They may also require new approaches, tools and techniques to do their jobs. It is also noted that many autonomous systems (for example aerial drones or self-driving shuttles) may come with a generic product-based safety justification, comprising a safety case and operational information (e.g. manuals) that may need tailoring or adapting to each deployment environment. This adaptation may be done, in part, via the SMS. This paper focusses on these deployment and adaptation issues, highlighting changes to working processes and practices.
The language we use when discussing risk plays a major role in how that risk is perceived. In this paper we present a discussion of risk-based language, illustrating how the word, sentence and structure choices we make can serve to obscure or exaggerate certain risks. We introduce the concept of a system’s narrative, which describes the context and history of this system, and explore how different narrative techniques can provide different perspectives on the risk posed by the system. Finally, we discuss metaphor, simile and language creativity, and whether there is a place for these in safety-critical analysis.
Automated Lane Keeping Systems (ALKS) are the first commercially available systems designed for passenger vehicles that will enable the driver to safely hand over control to the vehicle. This is made possible through certification to UNECE Regulation 157, the first Type Approval Regulation for automated vehicles. Combined with the adoption of UNECE Regulation 155 on Cyber Security the first steps have been taken towards safe and secure deployment of automated vehicles. Looking at these two topics we can see how vehicle regulators are rising to the challenge of certifying automated vehicles that will enable widescale commercial deployment.
Securing the operation of an out-of-context component can be extremely challenging. This is due to a number of reasons, not least that the context of use for the system component has a huge impact on the exposure of the system to specific risks. Many standards across multiple sectors focus on the role of system integrators and Tier 1 suppliers. But how should that security argument flow down to Tier 2 suppliers and below? Can Tier 2 suppliers be “intelligent suppliers”, providing security assurances that feed into hierarchical or modular assurance cases? In this paper we address these questions, illustrating the challenges and proposed solutions using the use case of a safety-assured, automotive operating system. The automotive sector was selected because the cybersecurity standard (ISO 21434) demands that the security argument extend beyond safety-related security issues. An operating system is a highly versatile component that can have multiple contexts. The paper concludes that there are activities that lower-tier suppliers can do to support the integrated security/safety argument. These activities are then highlighted as potential requirements for system integrators, Tier 2 suppliers and below.
This talk examines the problems of unused, hidden or legacy data in software systems. For software that has evolved over many years, the purpose and intent behind the data items (e.g. constants in source files, initialisation data, and system configuration values) may be lost in the mists of time, with little or no supporting documentation. Sometimes this data is used within the system, sometimes not, and sometimes it should be used when it is not, and not used when it should be. This talk categorises types of Dead Data, Hidden Data and Change Data and explains how they are identified, named and managed. The naming and metaphorical imagery used is considered relevant, as it helps with identification and awareness in the minds of the developers charged with maintaining complex legacy software systems. There is an analogy with Dead and Deactivated code, a concept used in several standards and guidance including DO-178C.
This IEEE Standard provides guidance on the assessment and application of techniques and measures that can help reduce the risks associated with the interfering effects of electromagnetic disturbances on digital electronic systems, especially safety- or mission-related systems. When competently selected and applied, a set of such techniques and measures can provide the part of the evidence relevant to EMI required for justifying functional safety decisions and for compliance with functional safety standards (including all applicable parts of IEC 61508 Ed.2:2010 or functional safety standards that are based on IEC 61508). They can also provide part of the evidence relevant to EMI for medical/healthcare systems for which risks are managed in accordance with ISO 14971:2007. This standard supports the adoption of adequate electromagnetic resilience engineering practices throughout the functional safety lifecycle by offering further guidance and practical advice on the application of risk management activities, including the techniques and measures set out in IEC 61000-1-2:2016. While it is primarily intended to be used by those who have responsibilities for functional safety, the methodologies, techniques, and measures it describes can also be used for the reduction of other kinds of risks in any systems that employ electronic technology, such as security risks and non-safety-related risks (for example, risks to the operation of commercial IT systems).
The safety element out of context (SEooC) is described by ISO 26262 part 10. It addresses safety-related elements that are not developed in the context of a particular vehicle, but rather with assumptions that have to be validated before integration into the final system. It aims to reduce certification cost through modularization and reuse of element certification evidence. Within agile product development that uses continuous integration and continuous deployment (CI/CD), a complete safety case is needed for every release instead of just at the start of production (SOP), so effective approaches to managing safety cases are needed to fit into the CI framework. In this paper, we provide a practical approach that facilitates implementing safety-critical applications as fragments of safety elements out of context and automating the merging of modular safety case fragments to build the product line safety case. The approach shows a reduction in development costs. We design it to achieve fully automated SEooC integration and verification in modern CI/CD frameworks: an automated flow starting from integration in the CI process, passing through verification of the assumptions and configurations, and ending with generation of the SEooC safety case. We build the approach into an embedded testing framework to verify the SEooC integration constraints and ensure the SEooC integrator follows the assumptions stated in the safety contract. Finally, the proposed approach leads to continuous automated generation of proof of compliance with the safety contract assumptions.
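A minimal sketch of the kind of automated assumption check such a CI gate might perform follows. The assumption names and configuration format are invented for illustration; a real SEooC safety contract would be far richer than this key/value form:

```python
# Verify that an integrator's configuration satisfies the assumptions
# recorded in a SEooC safety contract, as a CI gate might do.
contract_assumptions = {
    "max_task_period_ms": 10,   # element assumes it is scheduled at least this often
    "watchdog_enabled": True,   # element assumes an external watchdog exists
}

integration_config = {
    "max_task_period_ms": 5,
    "watchdog_enabled": True,
}

def verify_assumptions(assumptions: dict, config: dict) -> list:
    """Return a list of violated assumptions (empty means the gate passes)."""
    violations = []
    for key, expected in assumptions.items():
        actual = config.get(key)
        if isinstance(expected, bool):
            if actual is not expected:
                violations.append(f"{key}: expected {expected}, got {actual}")
        elif actual is None or actual > expected:
            violations.append(f"{key}: limit {expected}, got {actual}")
    return violations

violations = verify_assumptions(contract_assumptions, integration_config)
assert violations == []   # CI passes; a non-empty list would fail the build
```

Running such a check on every merge is what would turn the verification of assumptions into continuous, automated proof of compliance with the safety contract, rather than a one-off activity at SOP.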
When we design systems, it is usual to scrutinise them for any safety issues of concern and to ensure the critical components are sufficiently reliable, or to add other systems designed to protect against failures. Such safety-critical systems in effect add defences against system vulnerability to known scenarios, or “design safety cases”. But these additions inevitably make the systems more complex and their control more challenging. Understanding how they behave requires a system-wide model, one that allows the observation of the possible non-linear, non-predetermined interactions and interdependencies between subsystems, especially safety-critical ones, which can give rise to unforeseen, emergent or resonant behaviours. These are often the cause of unexpected and unplanned disturbances in normal operations, which are normally worked around but can occasionally get out of hand and result in significant incidents. Currently the only complex system modelling approach which allows the systematic identification of such resonances is Hollnagel’s Functional Resonance Analysis Method (FRAM). This allows us not only to pick up safety issues, but also to design in and evaluate functions to learn from normal operations and to continuously improve the operability of the systems. The real bonus is that, through this learning, we can use the resulting memory, or database, to discern trends in patterns of behaviour, which could enable the anticipation of emerging problems and allow responses to be modified proactively. This adds an extra dimension: not just passive, reactive (imagined?) safety, but proactive operational resilience for actual complex sociotechnical systems in the real world. This paper sets out to illustrate this approach by looking at the safety-critical systems in the Macondo Well Blowout accident.