Data validation in the railways

The Genesis

Metro de Malaga with Alstom Urbalis CBTC

Verifying railways systems covers many aspects and requires a large number of cross-verifications, performed by a wide range of actors including the designer of the system, the company in charge of its exploitation, the certification body, etc. Even if complete automation is not possible, any automatic verification is welcome as it helps to improve the overall level of confidence.

Indeed a railways system is a collection of highly dependent sub-system specification and these dependencies need to be checked. They may be based on railways signalling rules (that are specific to every country or even every company in a single country), on rolling stock features (constant or variable train size or configuration) and exploitation conditions.

In France, AQL RATP laboratory initiated the development of a generic tool to verify trackside data for the metro line 1 in Paris that is being automated. This tool, based on the PredicateB predicate evaluator, is able to parse data (XML, csv or text-based formats), load rules and verify that data complies with rules. Initially tested on line 13 configuration data, the tool has been able to check 400 definitions and 125 rules in 5 minutes.

Example of data definition and property

Figure 1: Example of data definition and property

In Fig 1, we see on the left a data (called E_a_trainDynamicDeparture_minimum_speed) that associates to a train (refered to by an integer index) its minimum speed (a floating point value). It is declared as a total function, indexes and minimum speed being reachable in an excel file (A7 containing the first index and AM7 the first minimum speed). On the right, a named property is described in natural and in mathematical languages. This property may refer to data previously defined.

However the PredicateB tool is just a calculator able to manipulate B/Event-B mathematical language predicates: it is not able to find all possible values for any non-deterministic substitution or to find all counter-examples. Moreover the way the errors are displayed may lead to difficult analysis when the faulty predicate is complex.

A false property and its graphical representation

Figure 2: A false property and its graphical representation

During the DEPLOY project, the University of Düsseldorf and Siemens Transportation Systems have elaborated a new approach, based on the ProB model checker to dramatically reduce validation duration from about six months to some minutes. Data is extracted from ADA source code and properties come from B models. In the case of the San Juan project, 79 files with a total of 23,000 lines of B are parsed to extract 226 properties and 147 assertions. The verification took 1017 seconds and led to the discovery of 4 false formulas. ProB was then experimented with great success on several projects: Roissy Charles de Gaule airport shuttle, Barcelona line 9, San Paulo line 4, Paris line 1 and Algiers line 1. At that occasion, ProB was slightly improved in order to deal with large scale problems and well validated in order to ease its acceptance by a certification body. However analysing false properties remains difficult. In Fig 2, a failed invariant is listed on the left (the one that is rewritten as false) while the counterexample is shown on the right (the values used for the data that lead to the breaking of the invariant).

 Data Table Validation Tool

Alstom Transport Information Solutions decided to experiment a new approach by reusing successful features of previous experiments.

A new tool, DTVT, is defined and implemented. Its structure is presented in figure 3.

DTVT tool structure

Figure 3 : DTVT tool structure

Input data is in csv format. Data items are identified through their container file and their name. For example, Curvatures_Cap!BeginValueCm refers to the variable BeginValueCm in the file Curvatures_Cap.xls (see figure 5).

Example of data declaration

Figure 4 : Example of data declaration

Supported basic types are INT, BOOL and STRING. Data items are sequences of these basic types. Values are extracted from xls files (see figure 5, the positions are expressed in centimeters).

Example of data valuation

Figure 5 : Example of data valuation

The verification rules are expressed using the B mathematical language and structured as B operations. Instead of having to deal with too large, quantified predicates, a verification rule is decomposed in small steps that allow displaying accurate error message helping to determine the source of the error. A rule is composed of one or several COUNTEREXAMPLEs. COUNTEREXAMPLEs are evaluated in the order they are defined. Keyword COUNTEREXAMPLE is followed by a formatted message (%1, %2, %3, etc. represent the value of the first, second, third parameter of the following ANY substitution).  The ANY substitution allows to filter data or to calculate values.

Example of a verification rule

Figure 6 : Example of a verification rule

In figure 6, the first rule computes the number of couples of the sequence ATC_Equipment_Type whose second element is the string “Trackside OMAP”. The ANY substitution is followed by an EXPECTED field. If some values of the parameters of the ANY substitution satisfy the predicate of that substitution but don’t satisfy the predicate of the EXPECTED field, the error message is displayed with its parameters instantiated.In figure 6, the error message of the second rule displays the value of urbalisSectorID (%1) and nb (%3).

ProB is the central tool for the verification. It has been modified in order to produce a file containing all counter examples detected (see figure 5) and slightly improved to better support some B keywords. DTVT has been experimented with success on several ongoing developments (Mexico, Toronto, Sao Paulo, and Panama) to verify up to 100,000 Excel cells against up to 500 rules. A first round allowed defining required concepts, intermediate constructs (predicates used by several rules) and formalizing a set of generic rules that are shared by all projects. During the next rounds, specific project rules and data files were added. A complete verification is performed in about 10 minutes, including the verification report. The process is completely automatic and can be replayed without any human intervention when data values are modified.


Figure 7 : Number of verified rules per metro



Add Comment Register

Leave a Reply

Post Navigation