Transportation Research Part F 66 (2019) 234–251
Ontology-based adaptive testing for automated driving functions using data mining techniques

M. Elgharbawy a,b,*, A. Schwarzhaupt a, M. Frey b, F. Gauterin b

a Truck Product Engineering, Daimler AG, 70372 Stuttgart, Germany
b Institute of Vehicle System Technology, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany

https://doi.org/10.1016/j.trf.2019.07.021
Article history: Received 1 October 2018; Received in revised form 3 March 2019; Accepted 24 July 2019
Keywords: Event-based time-series analysis; Data mining; Hierarchical agglomerative clustering; Ontology-based test scenario synthesis; Hardware-in-the-Loop co-simulation platform
Abstract

This paper presents an adaptive verification framework for automated driving functions based on ontologies and data mining techniques. Despite the recent rapid growth of driver assistance systems aimed at consolidating road safety, they still face various challenges in coping with the dynamic traffic situations of daily life. Therefore, automotive systems engineering has established data- and knowledge-driven test methods to assure the required functional safety and reliability in a highly safety-critical context. However, the reliance on field testing is inadequate and, in particular, time- and cost-intensive when applied to the next generation of automated driving functions, e.g. collision-free emergency braking and vehicle platooning. The presented framework utilises an ontology-based test scenario synthesis to identify criticality margins using a Hardware-in-the-Loop co-simulation platform for automated driving functions. Additionally, we demonstrate a systematic process to complement virtual testing by extracting insights from a field testing database using event-based time-series analysis. To this end, data mining techniques are used to obtain representative scenarios witnessed in real-world traffic. Agglomerative hierarchical clustering is performed to extract homogeneous groups (clusters) from recorded triggering events by proximity metrics using normalised cross-correlations. Extracted scenarios are subsequently used at earlier stages of development to effectively and efficiently ensure reliability and safety. In summary, the results show the benefits and some of the challenges of using the industry-proven framework, which enables a cost-effective extension of test domain validity throughout software product engineering.

© 2019 Elsevier Ltd. All rights reserved.
1. Background

Passive and active safety systems differ in their evaluation methods. For passive safety systems, a standard assessment approach is well established to verify their behaviour with a reasonable number of crash test cases under certain worst-case conditions. In contrast, the safeguarding of active safety systems presents a set of challenges regarding the variety of relevant scenarios and environmental conditions, the complexity of the systems, the variability of driver behaviour and functional deficiencies. The verification approaches relate to an expected response, reference values or specified requirements. In the absence of explicit system criteria, the validation procedures concern whether the overall behaviour is adequate or safe enough. Objective and subjective evaluation criteria assist in identifying systematic and statistical errors or functional
Nomenclature

Acronyms
ABox  Assertional Box
ACC  Adaptive Cruise Control
ADC  Automated Driving Control
CEB  Collision-free Emergency Braking
DL  Description Logic
ECU  Electronic Control Unit
E/E  Electrical and/or Electronic
Ego-vehicle  Automated vehicle under test
HAC  Hierarchical Agglomerative Clustering
HiL  Hardware-in-the-Loop
KDD  Knowledge Discovery in Databases
NCC  Normalised Cross Correlation
N-FOTs  Naturalistic-Field Operational Tests
NL  Natural Language
RDF  Resource Description Framework
OWL  Ontology Web Language
SWRL  Semantic Web Rule Language
SoP  Start of Production
TBox  Terminological Box
THW  Time Headway
TLC  Time-To-Line-Crossing
TTB  Time-To-Brake
TTC  Time-To-Collision
TTR  Time-To-React
UML  Unified Modeling Language
XiL  Something (X)-in-the-Loop

Definitions
ADC-HiL  closed-loop test bench for the verification, with integration of hardware and software components, of automated driving systems extending to a specific E/E subsystem level
Complete linkage  smallest maximum pairwise distance between elements from each cluster; complete linkage is less susceptible to noise and outliers
Correlation  measure of the linear relationship between the attributes of the objects
Cluster proximity  numerical measure of the degree to which two time-series are different
Data mining  process of automatically discovering novel and useful patterns in large databases
ENABLE-S3  European initiative to enable validation for highly automated safe and secure systems
ISO 26262:2018  standard for determining the functional safety of E/E systems in series-production road vehicles, with an extended scope of application to motorcycles and commercial vehicles such as trucks and buses
ISO 8855:2011  standard for defining right-handed vehicle coordinate systems
ISO 22839:2013  test method standard for defining the required behaviours and test criteria of CEB
L3Pilot  European initiative to enable validation for highly automated safe and secure systems
Off-tracking  phenomenon whereby, when a vehicle turns, the rear wheels track inside the path traced by the front wheels
Ontology-based NL notation  semantic, human- and machine-understandable representation of knowledge terms and their inference rules
Outlier  anomalous value of an attribute that is unusual with respect to the typical values of that attribute
PEGASUS  research project for establishing generally accepted quality criteria, tools and methods as well as scenarios and situations
Prototype-based clustering  clustering in which each time-series is closer to the prototype that defines its cluster than to the prototype of any other cluster
SAE J3016  classification of automation levels from 0 to 5, whereby level 0 means no driving automation, level 1 assisted driving, level 2 partial driving automation, level 3 conditional driving automation, level 4 self-driving vehicles and level 5 driverless driving
SafeMove  research project for system validation of automotive radars using over-the-air target simulators
TAF-BW  test area autonomous driving Baden-Wuerttemberg, a project for automated and interconnected driving of cars, buses and commercial vehicles
Time Headway  required time for the Ego-vehicle to reach the position of the relevant object
Time-series data  special type of sequential data in which each record is a time-series, i.e. a series of measurements taken over time
Time-To-Collision  required time at which a collision occurs if none of the vehicles changes velocity or direction
Z notation  formal specification language used for describing and modelling functional specifications

Variables
$C_1^s$  homogeneous group (cluster) of triggered events of time-series data when driving on a left curve while coming close to a stationary object
$C_2^s$  homogeneous group (cluster) of triggered events of time-series data when driving around a stationary object from the left-hand side
$C_3^s$  homogeneous group (cluster) of triggered events of time-series data when driving around a stationary object from the right-hand side
$C_4^s$  homogeneous group (cluster) of triggered events of time-series data when driving on a right curve while coming close to a stationary object
$d_{rel}^x$  relative longitudinal distance between the vehicle and the relevant object
$d_{rel}^y$  relative lateral deviation between the vehicle and the relevant object
$D_{NCC}$  distance metric calculated from the Normalised Cross Correlation algorithm
$E_1^s$  first escalation level with visual and audible warnings in response to a stationary object
$E_2^s$  second escalation level with haptic warning and partial braking in response to a stationary object
$E_3^s$  third escalation level with emergency braking in response to a stationary object
$\kappa_{ego}$  measured curvature of the predictive trajectory from the radar ECU of the Ego-vehicle
$M$  data set representing the implemented behaviour of an automated driving function
$P$  coverage probability of equivalence classes for ontologies
$R$  data set representing the required behaviour of an automated driving function
$S$  data set representing the specified behaviour of an automated driving function
$TTC_s$  required time to collision with a stationary object, assuming the Ego-vehicle changes neither speed nor direction
$U$  equivalence class of ontologies
$v_{ego}^x$  absolute longitudinal velocity of the Ego-vehicle
$v_{obj}^x$  absolute longitudinal velocity of the object vehicle
deficiencies of active safety systems. User-oriented assessment procedures are the current de facto standard for the validation of active safety and automated driving functions. These methods provide system performance metrics, e.g. confusion matrices for all possible system reactions, with classification as intended functional interventions or unintended side effects. In this scheme, the evaluation of software releases must be carried out in various phases up to the SoP. These phases could be carried out initially with driving simulators, followed by XiL technologies, controlled test tracks and N-FOTs. For testing, the application limits of simulative safeguarding have to be clarified. Additionally, the reasoning and the decision-making process for selecting the appropriate or necessary test method (simulation/laboratory, proving ground or field testing) for different scenarios, and their interaction with other test methods, need to be elaborated (Elgharbawy, Bernier, Frey, & Gauterin, 2016). Consequently, new innovative approaches need to be established, especially in simulation and laboratories, to allow the development of new market-ready products in a short time span and at low cost through highly efficient testing.

2. Related work

Current research efforts address limitations of knowledge-driven approaches by, among others, exploring new methods for the generation of test cases (Elgharbawy, Schwarzhaupt, Frey, & Gauterin, 2019), utilising ontologies to integrate expert knowledge into modelling traffic scenes, and improving extensibility by providing methodologies for specifying traffic scenarios abstractly (Elgharbawy, Arenskrieger, Schwarzhaupt, Frey, & Gauterin, 2019). Similarly, limitations of N-FOTs are addressed by developing defined processes to enable synthetic testing in a closed-loop setting and new methods to accelerate stochastic evaluation based on recorded driving data.
Schuldt, Menzel, and Maurer (2015) propose the use of equivalence classes, boundary value analysis and combinatorial methods for identifying representative driving scenarios (Elgharbawy, Schwarzhaupt, Scherhaufer, Gut, & Frey, 2019b). Schuldt et al. motivate a scenario-based test process and present a systematic test case generation by means of a four-layer abstraction model. The proposed approach provides a systematic generation of test cases but lacks a method to determine a meaningful test coverage. Wachenfeld (2017)
proposes the use of stochastic methods for the introduction of automated driving, where the safety of automated driving cannot be proven statistically using accumulated kilometres of physical on-road testing under consideration of an estimated uncertainty. Bach, Otten, and Sax (2016) introduce a model-based specification of driving scenarios, with the example use case of an ACC system, based on the abstraction of temporal and spatial information. Otten et al. (2018) extend the model-based scenario specification with an automated assessment and evaluation concept for stochastic digital test drives. To speed up the required N-FOTs of automated vehicles in car-following scenarios, Zhao, Huang, Peng, Lam, and LeBlanc (2018) propose an accelerated evaluation method using stochastic optimisation and importance sampling methods (Zhao, 2016).
3. Methodological framework

The conversion from driver assistance systems of levels 0, 1 and 2 to higher levels of automation in accordance with SAE J3016 represents a new challenge for the type-approval of automated commercial trucks. The main difference is that driver assistance can have unintended interventions, which the driver can override at any time if functional limitations appear. Their functions are therefore designed to be controllable, but this can reduce their benefits. The controllability of system interventions and the effectiveness in the field with minimal undesired consequences are therefore decisive for the series development of these driving functions. Moreover, long-haul commercial vehicles are heavier, larger and less manoeuvrable than passenger cars. Commercial truck characteristics (e.g. dimensions, low-speed transient off-tracking, braking distance, type variability) therefore pose new challenges for automated driving functions (Elgharbawy, Schwarzhaupt, Scherhaufer, Gut, & Frey, 2019a). Accordingly, systems engineering requires state-of-the-art evaluation procedures to verify and validate these systems. N-FOTs are carried out to define thresholds for intervening systems based on the collected data. On the one hand, trigger algorithms can be optimised to minimise the frequency and impact of falsely triggered interventions and, on the other hand, to maximise the number of legitimate responses. Nevertheless, automated driving requires the system to exploit the limits of dynamic driving tasks and to master most environmental conditions controlled by a human driver. The ISO 26262:2018 standard extends the functional safety regulations of E/E systems to heavy-duty commercial vehicles. However, the safety standard is limited to avoiding potentially safety-critical situations caused by systematic software and random hardware failures.
Safety violations due to technological and system-technical deficiencies remain outside the scope of ISO 26262:2018. In particular, automated driving without driver monitoring can also lead to potentially safety-critical situations resulting from deficiencies in the estimation, interpretation and perception processes. While there are, at present, no generally accepted test procedures that enable automated driving functions to be validated with affordable effort, ongoing research projects (e.g. PEGASUS, L3Pilot, ENABLE-S3, SafeMove and TAF-BW) show the relevance of research into new test methods. For this reason, the primary question is: How can automated driving be efficiently and effectively verified to achieve the required test completion criteria? (Elgharbawy, Scherhaufer, Oberhollenzer, Frey, & Gauterin, 2019). This paper describes a modular framework that supports the verification process of perception sensors, decision algorithms and functional robustness. The modular framework represents an ADC-HiL co-simulation framework that enables the functional verification of automated driving functions precisely and efficiently on the target ECU in the laboratory. The proposed procedure offers an optimised test strategy for the systematic extension of the requirements-based test coverage, resting upon a modular verification framework with continuous knowledge enhancement from field observations, as represented in Fig. 1. The framework offers an efficient compromise between the requirements of simulation realism and the real-time performance of the simulation environment. In this scheme, the processing chain includes hierarchical clustering of time-series triggering events to identify and assign the necessary test cases to different appropriate test environments. In addition, the structure employed utilises a backend database that is filled with catalogues of relevant driving scenarios from field-based observations.
Using an ontology-based method, a category of adequate and relevant scenarios for existing field tests is extracted. A semantic representation of corner-case scenarios can be obtained using data mining techniques and systematically processed into requirements for adaptive test coverage (Elgharbawy, Schwarzhaupt, & Frey, 2019). The interactive and iterative data mining process for continuous monitoring and learning from field-based observations comprises the following steps:

(I) Selection of time-series variables from the object list sensor dataset to identify driving situations.
(II) Data pre-processing, involving basic functions such as removing noise or outliers and searching for relevant patterns within relevant time intervals.
(III) Data projection, including the identification of appropriate similarity metrics and data normalisation for the semantic comparison of the recorded driving situations.
(IV) Data clustering, comprising the identification of suitable clustering methods and the determination of the optimal number of clusters.
(V) Data regression, consisting of identifying suitable characteristic signal prototypes for each cluster.
(VI) Scenario parsing in a de facto standard format and use of ontology-based transformation rules to synthesise closed-loop variables from open-loop signal prototypes.
(VII) Verification on an ADC-HiL framework and evaluation of test cases based on envelope components of pass and fail criteria.
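The seven steps above can be sketched as a minimal, schematic Python pipeline. Every function below is a trivial placeholder standing in for the techniques detailed in the later sections; the function names and the toy `d_rel_y` signal are illustrative assumptions, not the framework's actual interfaces:

```python
# Schematic end-to-end sketch of steps (I)-(VII); each stage is a
# deliberately trivial stand-in for the real processing described later.

def select_variables(dataset):
    # (I) pick the relevant time-series signals from the object lists
    return [event["d_rel_y"] for event in dataset]

def preprocess(series):
    # (II) e.g. clip implausible outliers to a sensor range (here +/- 10 m)
    return [[min(max(v, -10.0), 10.0) for v in s] for s in series]

def normalise(series):
    # (III) zero-mean each series so shapes, not offsets, are compared
    return [[v - sum(s) / len(s) for v in s] for s in series]

# (IV)-(VII): clustering, prototype regression, scenario parsing and
# ADC-HiL verification would follow on the normalised series.
dataset = [{"d_rel_y": [2.0, 1.0, 0.0]}, {"d_rel_y": [15.0, 1.0, -1.0]}]
prepared = normalise(preprocess(select_variables(dataset)))
```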
Fig. 1. Overview of the coverage-driven functional testing concept with its main components.
These new test cases then complement the existing test cases, which were developed from expert knowledge with natural-language-based statements, in an adaptive test coverage manner. The proposed framework therefore contributes to a potential trade-off between the efficiency and effectiveness criteria of a coverage-driven verification concept.

4. Functional requirements engineering notations

Requirements engineering refers to the process of eliciting, specifying and evaluating the desired behaviour of a software-intensive system. The functional requirements form the backbone of a comprehensive technical understanding of the developed system. Such requirements, therefore, need to be unambiguous and understandable to allow an external testing organisation to perform independent tests of the system. On the one hand, natural-language-based requirements can be engineered in a straightforward way without explicit knowledge of the syntax. On the other hand, model-based requirements facilitate the clarity of a complex software product and enable a simplified representation of the system with diagrams or axioms. The approaches to requirements management can essentially be divided into five notation types using NL and model-based notations, as follows:

1. Ad-hoc NL notation is a free writing style that offers a high degree of flexibility in specifying requirements, but at the same time leads to ambiguity and a lack of expressiveness, completeness and consistency.
2. Structured NL notation has less potential for ambiguity than an ad-hoc informal approach by requiring the use of NL in a structured format.
3. Ontology-based NL notation uses a formal description language to describe an ontological knowledge base with a glossary of terms and relationships specified by a set of rules. The ontologies support the verification of the consistency and completeness of the requirements.
4. Graphical notation, based on finite state machines and sets using UML, represents use cases and/or sequence diagrams. The graphical notations have a precise syntax and defined semantics. In spite of improved human readability, graphical notation may not be suitable for large systems, where the diagrams are not auditable and are not easily maintainable and modifiable.
Fig. 2. Comparable evaluation of common notation types used in functional requirements management.
5. Mathematical notation provides a formal machine-readable format based on set theory with the Z notation specification. The Z notation is a mixture of formal mathematical statements and informal text. The formal mathematical statements give a precise description of the system. The informal text describes in NL the meaning of the mathematical statements to make the specification more readable. Large systems with a complex domain may not be easily specified in the Z notation, where a formal specification is probably not readable and understandable to the client.

Fig. 2 shows a comparative evaluation of these notation types in requirements management on a scale from poor to optimal. The development of functional requirements is a joint process between the client and the contractor, in which the technical knowledge of the client and the software development competence of the contractor become accessible. The formalisation of functional requirements is judged by the following ten quality characteristics: human readability, correctness, expressiveness, completeness, unambiguity, traceability, consistency, verifiability, maintainability and usability. Therefore, the process of adaptive test coverage relies on structured NL and ontology-based NL notations in a complementary way to effectively and efficiently verify automated driving functions. The knowledge management process serves to close the gap between knowledge- and data-driven strategies by supplementing scenario catalogues of structured NL notations derived from expert knowledge with ontology-based NL notations from field-based observations.

5. Data mining techniques

Data mining is an integrated KDD process for automatically identifying useful information in big data repositories. Data mining techniques are typically applied to retrieve novel and useful patterns from large databases.
Therefore, the data mining process involves several transformation steps, from pre-processing object list data to post-processing data mining results. In accordance with the circular buffer concept, representative driving situations in N-FOTs at object list level can be extracted from continuous recordings in the form of time-series of environmental perception sensor data. The triggering events can be individually defined in the data logger depending on the ECU reaction, so that event-based recordings covering 10 s before and after the triggering condition are transferred to the databases. The recordings include vehicle bus system data, sensor object lists and surveillance video streams for data analysis purposes. For example, a CEB function may recognise a road sign gantry as a relevant stationary object during highway operation, which leads to unintended partial braking.

5.1. Interval analysis of time-series data

Time-series analysis allows the extraction of representative situations observed during on-road test drives. The data consist of processed object lists of environmental perception sensors that change over time. Time-series data refers to a specific type of sequential data, where each data set represents a time-series, i.e. a sequence of values that change over time. Outliers are values of time-series data whose characteristics differ from those of most other time-series in the data set. After outliers have been filtered out of the time-series signals, the rolling standard deviation method quantifies the degree of variation of each value in a time-series and selects the optimised time interval around the triggering event. A low standard deviation indicates that the time-series tends to stay close to the mean, while a high standard deviation implies that the time-series tends to be spread over a wider range of values.
The determination of the optimised time interval depends on the assumption that time intervals with high variations of the sensor values are more representative for an efficient cluster
analysis of time-series signals. Therefore, the interval around the trigger event is selected with a higher rolling standard deviation than other intervals. Eq. (1) represents the normalised rolling standard deviation of each point $z(a_i)$ of the time-series $A = \{a_i\}_{i=0}^{m-1}$ to determine the corresponding time intervals of the measured sensor variables, where $\bar{a}_{i+j} = \frac{1}{w}\sum_{j=0}^{w-1} a_{i+j}$ indicates the rolling mean and $w$ denotes the rolling window:

$$z(a_i) = \frac{\sqrt{\frac{1}{w}\sum_{j=0}^{w-1}\left(a_{i+j} - \bar{a}_{i+j}\right)^2}}{\sqrt{\frac{1}{w^2}\left(\sum_{j=0}^{w-1} a_{i+j}\right)^2}} \qquad (1)$$
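A minimal sketch of the rolling-window computation behind Eq. (1), assuming the normalisation is the window standard deviation divided by the absolute window mean (a coefficient of variation). The function name and the toy signal are illustrative, not the paper's implementation:

```python
import numpy as np

def normalised_rolling_std(a, w):
    """Normalised rolling standard deviation z(a_i) of a 1-D time-series.

    Each window's standard deviation is divided by the absolute window
    mean, so that signals on different scales become comparable.
    Windows with zero mean are left as NaN.
    """
    a = np.asarray(a, dtype=float)
    z = np.full(len(a) - w + 1, np.nan)
    for i in range(len(z)):
        window = a[i:i + w]
        mean = window.mean()
        if mean != 0.0:
            z[i] = window.std() / abs(mean)
    return z

# The interval around a triggering event is chosen where z is largest:
z = normalised_rolling_std([1.0, 1.1, 0.9, 4.0, 4.2, 1.0, 1.05], w=3)
best_start = int(np.nanargmax(z))  # start index of the most variable window
```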
We have achieved a significant reduction in search space from the various possible time intervals using the rolling standard deviation. However, the subset of likely time intervals must still be examined to find the time intervals with the best results.

5.2. Proximity matrix selection

The proximity between two time-series signals is a numerical measure of the degree to which the two signals are alike. Proximities are usually non-negative and often lie between 0 for no similarity and 1 for full similarity. Time-series proximity can be computed with the Minkowski distance metric, as in Eq. (2), to compare the original time-series with each other, where $r = 1$ gives the Manhattan distance ($L_1$ norm), $r = 2$ the Euclidean distance ($L_2$ norm) and $r \to \infty$ the supremum distance ($L_\infty$ norm):

$$D_r(A, B) = \left( \sum_{i=0}^{m-1} |a_i - b_i|^r \right)^{\frac{1}{r}} \qquad (2)$$
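Eq. (2) translates directly into a few lines of Python. `minkowski_distance` is an illustrative helper (NumPy assumed), with the supremum norm handled as the limiting case:

```python
import numpy as np

def minkowski_distance(a, b, r):
    """Minkowski distance of order r between two equal-length time-series.

    r=1 gives the Manhattan (L1) distance, r=2 the Euclidean (L2)
    distance; the supremum (L-infinity) distance is the limit r -> inf.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    if np.isinf(r):
        return float(np.max(np.abs(a - b)))  # supremum norm
    return float(np.sum(np.abs(a - b) ** r) ** (1.0 / r))

a, b = [0.0, 1.0, 3.0], [1.0, 1.0, 1.0]
manhattan = minkowski_distance(a, b, 1)       # |0-1| + |1-1| + |3-1| = 3.0
euclidean = minkowski_distance(a, b, 2)       # sqrt(1 + 0 + 4)
supremum  = minkowski_distance(a, b, np.inf)  # max(1, 0, 2) = 2.0
```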
Although the NCC is often used in image and signal processing for template matching, similar traffic situations can be effectively described by similarities in sensor measurements with time shifts and sliding windows. Therefore, the NCC provides a suitable measure for the proximity in the time-series analysis of the measurements from the environmental perception sensors. Eq. (3) gives the NCC between two different time-series $A = \{a_i\}_{i=0}^{m-1}$ and $B = \{b_i\}_{i=0}^{m-1}$ with a time shift $S_j \in \{S_{-m+1}, \ldots, S_{m-1}\}$ and $j \in \{-m+1, \ldots, m-1\}$:

$$D_{NCC}^{S_j}(A, B) = \frac{1}{m} \sum_{i=0}^{m-1} \frac{(a_i - \bar{a})}{\sigma_a} \cdot \frac{(b_{i+j} - \bar{b})}{\sigma_b}, \quad \forall\, i + j \in [0, m-1] \qquad (3)$$

The mean and standard deviation of the time-series $A$ are $\bar{a} = \frac{1}{m}\sum_{i=0}^{m-1} a_i$ and $\sigma_a = \sqrt{\frac{1}{m}\sum_{i=0}^{m-1} (a_i - \bar{a})^2}$; the mean and standard deviation of the time-series $B$ are $\bar{b} = \frac{1}{m}\sum_{i=0}^{m-1} b_i$ and $\sigma_b = \sqrt{\frac{1}{m}\sum_{i=0}^{m-1} (b_i - \bar{b})^2}$, respectively. Based on Eq. (3), $D_{NCC}^{S_j}$ takes a value in $[-1, 1]$ for a certain time shift $j \in \{-m+1, \ldots, m-1\}$, where $-1$ denotes complete dissimilarity and $1$ denotes a perfect match. Eq. (4) searches for the shift with the maximum proximity between the two given time-series over all possible shifts and converts the selected shift into a distance $D_{NCC}$. The computed distance is then defined in the range $[0, 1]$, where 0 indicates the minimum distance for a perfect match and 1 the maximum distance for complete dissimilarity:

$$D_{NCC}(A, B) = \frac{1}{2} \left( 1 - \max_{j} D_{NCC}^{S_j}(A, B) \right), \quad \forall\, j \in [-m+1, m-1] \qquad (4)$$
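A sketch of Eqs. (3) and (4), with `ncc_at_shift` and `ncc_distance` as illustrative helper names. This brute-force implementation favours clarity over the FFT-based computation one would use on large datasets:

```python
import numpy as np

def ncc_at_shift(a, b, j):
    """Normalised cross-correlation at integer shift j, following Eq. (3):
    only overlapping samples with 0 <= i + j <= m - 1 contribute, and the
    sum is normalised by m (series are assumed non-constant)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    m = len(a)
    za = (a - a.mean()) / a.std()
    zb = (b - b.mean()) / b.std()
    total = 0.0
    for i in range(m):
        if 0 <= i + j <= m - 1:
            total += za[i] * zb[i + j]
    return total / m

def ncc_distance(a, b):
    """Shift-invariant distance of Eq. (4): search all shifts for the
    maximum correlation and map it from [-1, 1] to a distance in [0, 1]."""
    m = len(a)
    best = max(ncc_at_shift(a, b, j) for j in range(-m + 1, m))
    return 0.5 * (1.0 - best)

a = [0.0, 1.0, 2.0, 1.0, 0.0]
d_same = ncc_distance(a, a)                        # identical series -> 0
d_diff = ncc_distance(a, [2.0, 1.0, 0.0, 1.0, 2.0])  # dissimilar -> > 0
```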
5.3. Time-series clustering

Cluster analysis refers to unsupervised classification that groups time-series data based on information retrieved from the data describing the time-series signals and their relationships. The time-series within a cluster should be similar to each other and different from the time-series in other clusters. Therefore, the more similarity within a cluster and the more dissimilarity between clusters, the more distinct the clustering. In prototype-based clustering, a cluster consists of a set of time-series in which each signal is more similar to the prototype describing the cluster than to the prototype of any other cluster. Algorithm 1 describes the calculation of agglomerative hierarchical clustering $f_{AHC}$, which generates a hierarchical clustering by starting with each element as a singleton cluster and then recursively merging the two nearest clusters until a single cluster remains. The choice of the appropriate proximity metric and the corresponding linkage criterion plays a major role in allowing Algorithm 1 to effectively update the proximity matrix during the merge process. Therefore, complete linkage is chosen for Algorithm 1, as it is less susceptible to noise and outliers. For complete linkage in hierarchical clustering, the proximity of two clusters is defined as the maximum distance (minimum similarity) between any two time-series signals in the two different clusters.
Algorithm 1. Agglomerative hierarchical clustering algorithm

1: procedure $f_{AHC}$($C = \{C^{(t)}\}_{t=1}^{T}$, $D_{NCC}$, complete linkage)
2:   while length($C$) > 1 do
3:     $(C^{(1)}, C^{(2)}) \leftarrow \underset{C^{(i)}, C^{(j)} \in C}{\arg\min}\; D_{NCC}\left(C^{(i)}, C^{(j)}\right), \quad \forall\, C^{(i)} \neq C^{(j)}$
4:     $C \leftarrow \left(C \setminus \{C^{(1)}, C^{(2)}\}\right) \cup \{C^{(1)} \cup C^{(2)}\}$
5:   end while
6:   return $C$
7: end procedure
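In practice, Algorithm 1 with complete linkage is available off the shelf. A sketch using SciPy's hierarchical clustering on a small, invented $D_{NCC}$ proximity matrix (five hypothetical triggering events) might look like:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical precomputed D_NCC proximity matrix for five triggering
# events (symmetric, zero diagonal, values in [0, 1]); events 0, 1 and 4
# are mutually close, as are events 2 and 3.
d = np.array([
    [0.0,  0.1,  0.9,  0.8,  0.2],
    [0.1,  0.0,  0.85, 0.9,  0.15],
    [0.9,  0.85, 0.0,  0.1,  0.8],
    [0.8,  0.9,  0.1,  0.0,  0.75],
    [0.2,  0.15, 0.8,  0.75, 0.0],
])

# Complete linkage on the condensed distance matrix; Z encodes the
# dendrogram, i.e. the full merge history of Algorithm 1.
Z = linkage(squareform(d), method="complete")

# Cut the dendrogram into two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
```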
6. Collision-free emergency braking function

The CEB function represents an automated driving function in the forward control of the Ego-vehicle that uses a detection unit to measure the distance to an object in front of the Ego-vehicle, as illustrated in Fig. 3. A visual and an acoustic alarm are emitted as a warning indication in the first escalation stage $E_1^s$, automatic partial braking is carried out as a haptic alarm in the second escalation stage $E_2^s$, and full braking is carried out in the third escalation stage $E_3^s$. The emergency braking operation is initiated so as to achieve a predetermined target safety distance between the vehicle and the preceding object after the automatic braking operation is completed. False interventions of the automatic emergency braking are avoided, or at least significantly reduced, by the detection of edge structures and boundary objects of the road (e.g. guideposts, crash barriers and traffic signs), which is usually dependent on the type of road. Moreover, the truck's braking and driving dynamics differ from those of a passenger car. The pneumatic brake system has a time delay and a slower response than the hydraulic brake system in passenger cars. Therefore, the load condition influences both the dynamic stability and the braking dynamics, whereby the difference between fully laden and empty weight is very large.
Fig. 3. Schematic diagram for a CEB function with its escalation stages for collision-free trajectory planning using the example of braking on a stationary object (Trost & Zomotor, 2015).
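The escalation cascade can be illustrated as a simple threshold function on the time-to-collision. The thresholds below are invented placeholders for illustration, not the calibrated values of the series CEB function:

```python
def ceb_escalation(ttc_s, t_warn=4.0, t_partial=2.5, t_emergency=1.5):
    """Map time-to-collision with a stationary object (TTC_s, seconds) to
    the CEB escalation stages E1 (visual/acoustic warning), E2 (partial
    braking) and E3 (emergency braking). The three thresholds are
    illustrative placeholders, not the function's calibrated values."""
    if ttc_s <= t_emergency:
        return "E3"   # emergency braking
    if ttc_s <= t_partial:
        return "E2"   # haptic warning and partial braking
    if ttc_s <= t_warn:
        return "E1"   # visual and acoustic warning
    return None       # no intervention

stage_far = ceb_escalation(5.0)    # no intervention
stage_near = ceb_escalation(2.0)   # partial braking
```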
7. Cluster analysis from N-FOTs

Hierarchical clustering is often graphically represented in a clustermap consisting of a dendrogram and a heatmap. The dendrogram is a tree-like diagram that shows the cluster-subcluster relationships and the order in which clusters are merged. The heatmap represents the proximity matrix after the merge operations. The cluster analysis efficiency depends on the determination of the relevant time-series, the appropriate proximity matrix and the corresponding clustering algorithm. Therefore, the knowledge discovery platform requires a quantified selection of the time-series signals of the available sensor object lists based on expert knowledge. For stationary objects, two object list variables measured by the radar ECU are selected to characterise the event-based driving situations, namely the lateral deviation $d_{rel}^y$ of the relevant stationary object and the road curvature $\kappa_{ego}$. Fig. 4 shows a clustermap visualising the hierarchical clustering of time-series events using complete linkage within four clusters of unlabelled trigger events caused by stationary objects, with only $E_1^s$ and $E_2^s$ occurring. No triggering events were recorded for the escalation level $E_3^s$ during the entire N-FOTs campaign. The well-separated clusters show a very strong, block-diagonal pattern in the reordered proximity matrix.

Fig. 4. Clustermap of reordered proximity matrix using agglomerative hierarchical clustering with complete linkage of 350 time-series events caused by stationary objects.

The cluster $C_1^s$ comprises 183 driving situations in which the Ego-vehicle triggers an unreasonable intervention in a left turn due to an irrelevant obstacle in the right lane. The cluster $C_2^s$ comprises 51 driving situations in which the Ego-vehicle triggers a false intervention due to an unrelated obstacle located on a left traffic island. The cluster $C_3^s$ collects 31 driving situations in which the Ego-vehicle triggers a false intervention due to an unrelated obstacle located on a right traffic island. Finally, the cluster $C_4^s$ represents 85 driving situations in which the Ego-vehicle triggers an inappropriate intervention in a right turn due to an irrelevant obstacle in the left lane. Subsequently, data regression implies the identification of suitable characteristic signal prototypes for each cluster. Fig. 5 represents the signal prototypes of the selected sensor data variables. The cluster $C_1^s$ indicates a decreasing lateral deviation $d_{rel}^y$ of the stationary object in front, which signals that the Ego-vehicle is moving to the right-hand side. Driving in a left-hand curve is also evident from the increasing value of the road curvature. Therefore, the signal prototypes of cluster $C_1^s$ represent driving towards a stationary object in a left turn. Cluster $C_2^s$ shows events where the Ego-vehicle has driven to the right and subsequently to the left, basically driving around the stationary object from the left-hand side. As a result, the signal prototypes of cluster $C_2^s$ show driving around a stationary object from the left-hand side. Cluster $C_3^s$ demonstrates that Ego-vehicles drove around a stationary object from the right-hand side. Cluster $C_4^s$ indicates events where Ego-vehicles are driving in a right turn.
Fig. 5. Extracted signal prototypes of lateral deviation d_rel^y and road curvature κ_ego for each cluster of 350 time-series events caused by stationary objects.
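The clustering pipeline of this section can be sketched as follows, using synthetic stand-in signals in place of the recorded radar time-series: pairwise distances derived from normalised cross-correlations, agglomerative hierarchical clustering with complete linkage, and signal prototypes computed as the element-wise mean of each cluster's members.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def ncc_distance(a, b):
    """1 - maximum normalised cross-correlation of two equal-length series."""
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    return 1.0 - np.correlate(a, b, mode="full").max()

# toy stand-in for recorded trigger-event time-series (e.g. d_rel^y):
# two obviously different signal shapes, each with measurement noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
events = [np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(50) for _ in range(5)] \
       + [t + 0.05 * rng.standard_normal(50) for _ in range(5)]

# proximity matrix from pairwise cross-correlation distances
n = len(events)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = ncc_distance(events[i], events[j])

Z = linkage(squareform(D), method="complete")    # agglomerative, complete linkage
labels = fcluster(Z, t=2, criterion="maxclust")  # cut dendrogram into 2 clusters

# signal prototype per cluster = element-wise mean of its members
prototypes = {c: np.mean([e for e, l in zip(events, labels) if l == c], axis=0)
              for c in set(labels)}
```

The same steps apply to the recorded d_rel^y and κ_ego signals, with the number of clusters chosen from the dendrogram (four in Fig. 4) instead of the two used in this toy example.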
8. Criticality assessment metrics

The scenario-based test concept emphasises testing particularly critical traffic scenarios. Most typical traffic situations are not regarded as particularly dangerous and therefore contribute relatively little to the proof of safety. Thus, the identification of critical situations requires indicators that quantify the criticality of the current or near-future traffic situation (Schreier, 2017). ISO 22839:2013 applies CEB criticality assessment metrics to provide a certain ratio of false positives and false negatives for the functional assessment. Deterministic TTR indicators are used, which describe the remaining time until a critical event happens, such as TLC, TTC and TTB. Other deterministic indicators use driving dynamics parameters and the physical capabilities of the truck to assess criticality, such as the braking acceleration required to avoid dangerous traffic situations. The shortcoming of these indicators is that they rest on assumptions about the trajectory prediction. As a result, small changes in the motion prediction can lead to significantly different results (Wachenfeld, Junietz, Wenzel, & Winner, 2016). Since THW and TTC are often used in criticality assessment, the pass/fail envelopes depend on these criticality parameters (Elgharbawy, Elsayed, Birlet, Frey, & Gauterin, 2018). Eq. (5) presents the mathematical description of THW.
THW = d_rel^x / v_ego^x,   ∀ v_ego^x > 0        (5)
Eq. (6) describes the mathematical description of the TTC_s for each escalation stage (E1s, E2s and E3s) in the case of standing obstacles.

TTC_s = d_rel^x / v_rel^x,   ∀ v_obj^x = 0        (6)
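A minimal implementation of Eqs. (5) and (6) might look as follows. The handling of the degenerate case v_rel^x → 0 (TTC tending to infinity) is an assumption consistent with the discussion of Fig. 6.

```python
def thw(d_rel_x: float, v_ego_x: float) -> float:
    """Time headway, Eq. (5): longitudinal gap over Ego velocity [s]."""
    if v_ego_x <= 0.0:
        raise ValueError("THW is only defined for v_ego^x > 0")
    return d_rel_x / v_ego_x

def ttc_s(d_rel_x: float, v_rel_x: float) -> float:
    """Time to collision with a standing obstacle, Eq. (6) with v_obj^x = 0.

    For a stationary object the relative velocity equals the Ego
    velocity; TTC_s tends to infinity as v_rel^x approaches zero.
    """
    if v_rel_x <= 0.0:
        return float("inf")
    return d_rel_x / v_rel_x

# e.g. a 25 m gap at 45 km/h (= 12.5 m/s) gives TTC_s = 2.0 s,
# well above the 0.9 s limit of the braking cascade
```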
Fig. 6 depicts the development of TTC_s compared to the relative distance d_rel^x with a stationary object located in front of the Ego-vehicle at different longitudinal velocities v_ego^x for the various escalation levels (E1s, E2s and E3s) in a straight-road scenario applied to the ADC-HiL platform. If the braking cascade of the CEB controller is triggered, the driving scenario can be evaluated using the TTC_s calculation below the limit value of 0.9 s. The CEB intervention at the various escalation levels finally brings the Ego-vehicle to the stationary state. At low distances, the TTC_s suddenly increases towards infinity as v_rel^x goes to 0. Fig. 7 represents a parameter identification of TTC_s with a stationary object in front of the Ego-vehicle at different longitudinal velocities of the Ego-vehicle v_ego^x on the various escalation levels (E1s, E2s and E3s). The results are based on synthetic driving data within the simulation coordinate systems of a monocular vision camera as well as a RADAR sensor model, which are applied to the ADC-HiL platform (Elgharbawy, Schwarzhaupt, Scheike, Frey, & Gauterin, 2016). Beyond logical software errors, the TTC_s parameters of triggered events from N-FOTs can be correlated with the pass/fail envelopes obtained from the ADC-HiL to identify the criticality of each triggered event. The snapshots of the clustered events C4s at v_ego^x = 17 km/h and C1s at v_ego^x = 45 km/h show examples of more critical events. However, the snapshot of the clustered event C4s at v_ego^x = 35 km/h shows an example of parameter matching with the obtained pass/fail envelopes, which is less critical according to the required functional behaviour.
Fig. 6. TTC_s calculation related to the relative distance d_rel^x with a stationary object located in front of the Ego-vehicle at different velocities using the ADC-HiL platform.
Fig. 7. Criticality analysis of triggered events from N-FOTs in an example of 3 triggered events based on TTCs parameter identification with a stationary object (Elgharbawy, Scherhaufer, Frey, & Gauterin, 2018).
9. Ontology-based scenario synthesis

In software engineering, an ontology is defined as the explicit specification of a conceptualisation, i.e. an abstract, simplified representation of selected parts of an application domain. The ontology can therefore be considered as a knowledge base consisting of a TBox and an ABox. The TBox describes the concepts of an application domain, expressed by hierarchical classes, axioms and properties, whereas the ABox represents instances of classes and observed facts of situative knowledge (Mohammad, Kaloskampis, Hicks, & Setchi, 2015). Knowledge representation in an ontology is based on DL, and the OWL is the de facto file format used for storing ontologies based on the RDF format. Automated driving involves the use of ontologies in various applications, especially in situation assessment, scene understanding and behavioural planning (Bagschik, Menzel, & Maurer, 2018). Armand, Ibanez-Guzman, and Zinoune (2017) describe the application of ontologies to model interactions based on spatio-temporal relationships between road users and infrastructure. The sensor data are employed as the ABox of an ontology to develop a human-like understanding of scenes. The scene understanding relies on object tracking, map data and the dynamic states of the Ego-vehicle. Behaviour rules are stored in the SWRL to infer knowledge from the TBox for a given ABox from the sensor data. Ulbrich, Nothdurft, Maurer, and Hecker (2014) propose an environmental model derived from a knowledge base with hierarchical classes and relations between the entities. The environmental model is updated by sensor data and utilised for online decisions. Geyer et al. (2014) propose nomenclatures of a unified ontology for generating test cases and scenario catalogues. However, Geyer et al. state that each scenario catalogue shall have its own nomenclature and concepts for knowledge organisation.

Fig. 8 shows an ontology-based test scenario synthesis built on knowledge discovery from triggered events of N-FOTs. The systematic test case generation leads to concrete scenarios and test cases based on a generic model consisting of four layers of scenario description: the first layer covers the road geometry, the second layer static objects, the third layer dynamic objects and the fourth layer weather conditions. The test cases are also deduced from the functional specification, which results from the top-level requirements and the use cases, whereby the use cases are likewise inferred from the top-level requirements. The test cases are executed on an ADC-HiL co-simulation platform. The ontology combines relevant entities in natural language and assigns a formal order through conceptualisation. Irrelevant entities or relations are excluded to ensure the traceability of parameter changes associated with an ontology-based scenario synthesis. The ontology provides a method to derive possible
Fig. 8. Coverage-driven test concept with systematic test case generation based on field-based observations.
observations (ABox) from the modelled knowledge (TBox). Subsequently, the ontology-based scenario synthesis can be converted into de facto simulation data formats (e.g. OpenDRIVE, OpenSCENARIO, etc.). The identification of situations from N-FOTs using data mining techniques offers a valuable solution for generating simulated test scenarios to extend the validity of the test coverage of automated driving functions more cost-effectively. Although corner case scenarios occur rarely, the ontology-based approach provides the relevant parameter space from open-loop sensor signals to ensure the validity of the generated test cases. Fig. 9 illustrates a logical scenario from cluster C1s, which represents 183 events, i.e. 52.29% of the 350 triggered events in total. The ontology uses a "consists of" statement to model the elements of a road network layout with two lane classes and one class for a hard shoulder. The statements "has right neighbour" and "has left neighbour" are used to arrange the road elements relative to each other. The position instances are generated on the basis of a relation "offers position" for the ontology road elements. The statements "left of" and "right of" are utilised to arrange the position instances with logic reasoning. The statements "driving on" and "located on" are employed to control the dynamic objects with different position instances. The object obj1 is defined as a stationary object on the hard shoulder. An SWRL rule is implemented in the ontology for each logical scenario, which allows logic operators to be combined into rules. Therefore, invalid or forbidden combinations are eliminated from the scenario catalogue. Fig. 10 illustrates a logical scenario from cluster C4s, which represents 85 events, i.e. 24.29% of the 350 triggered events in total.
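The relational statements above can be illustrated with a minimal, library-free stand-in for the knowledge base. The class names and relation identifiers are paraphrases of the ontology statements, and the rule only shows the flavour of an SWRL inference (hasRightNeighbour implies the inverse hasLeftNeighbour) rather than a full DL reasoner.

```python
# Minimal stand-in for an ontology knowledge base: TBox-style concept
# names plus ABox-style facts as subject-predicate-object triples.
TBOX_CLASSES = {"Lane", "HardShoulder", "StationaryObject"}

abox = {
    ("lane1", "is_a", "Lane"),
    ("lane2", "is_a", "Lane"),
    ("shoulder", "is_a", "HardShoulder"),
    ("lane1", "has_right_neighbour", "lane2"),
    ("lane2", "has_right_neighbour", "shoulder"),
    ("obj1", "is_a", "StationaryObject"),
    ("obj1", "located_on", "shoulder"),
}

def infer_left_neighbours(triples):
    """SWRL-style rule: hasRightNeighbour(x, y) -> hasLeftNeighbour(y, x)."""
    return {(o, "has_left_neighbour", s)
            for s, p, o in triples if p == "has_right_neighbour"}

# materialise the inferred facts into the ABox
abox |= infer_left_neighbours(abox)
```

In the actual framework these facts would live in an OWL/RDF store and the rule in SWRL, so that a reasoner can also reject invalid or forbidden combinations before scenarios enter the catalogue.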
Fig. 9. Logical scenario synthesis of cluster C1s when driving in a left curve while approaching a stationary object, with 183 events.
Fig. 10. Logical scenario synthesis of cluster C4s when driving in a right curve while approaching a stationary object, with 85 events.
10. Adaptive partitioning of equivalence classes

The adaptive verification approach requires the probability P that a CEB function executes a functional scenario catalogue F. Since the functional scenario A1 = "turn" represents the clusters C1s and C4s, this probability is determined by P(F|turn). If the clusters C2s and C3s have to be considered, the ontology has to be expanded with the functional scenario "turn around". Eq. (7) presents the overall probability of fulfilling the functional scenario catalogue extracted from the cluster analysis of triggered events for a CEB function.
P_t = P(F|turn) · P(turn) + P(F|turn around) · P(turn around)        (7)
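Eq. (7) can be evaluated directly from the reported cluster sizes. The conditional probabilities P(F|·) below are placeholders for illustration only; in the framework they would be obtained from the HiL test results.

```python
# Total probability of the functional scenario catalogue (Eq. 7),
# using the cluster sizes reported for the 350 triggered events.
cluster_sizes = {"C1s": 183, "C2s": 51, "C3s": 31, "C4s": 85}
total = sum(cluster_sizes.values())  # 350 triggered events

# functional scenario "turn" covers C1s and C4s,
# "turn around" covers C2s and C3s
p_turn = (cluster_sizes["C1s"] + cluster_sizes["C4s"]) / total
p_turn_around = (cluster_sizes["C2s"] + cluster_sizes["C3s"]) / total

# placeholder conditional probabilities (assumed values, not from the paper)
p_f_given_turn, p_f_given_turn_around = 0.9, 0.8

# Eq. (7): law of total probability over the functional scenarios
p_t = p_f_given_turn * p_turn + p_f_given_turn_around * p_turn_around
```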
Fig. 11 illustrates a logical scenario from cluster C2s, which represents 51 events, i.e. 14.57% of the 350 triggered events in total. The statements "left around" and "right around" are utilised to arrange the position instances with logic reasoning. Fig. 12 illustrates a logical scenario from cluster C3s, which represents 31 events, i.e. 8.86% of the 350 triggered events in total. In general, a logical scenario can be defined as a combination of one equivalence class for each dimension U = [U_1, U_2, ..., U_n]. The overall probability of logical scenarios can be generally formulated as P(L) = ∑_i P(L|U_i) · P(U_i) for n dimensions, where the number of scenarios results from the product of the equivalence classes of the dimensions, ∏_{i=0}^{n} U_i.

11. Conclusions and future work

The process of knowledge discovery is used to identify adaptive scenario catalogues from field-based observations and knowledge experiences. The ontology-based method is used to extract a category of adequate and relevant scenarios from existing N-FOTs using a semantic representation of corner case scenarios. These are obtained using data-mining techniques and subsequently transformed systematically into requirements-based test coverage. The proposed concept aims to bridge the gap between knowledge- and data-driven test approaches to enable continuous extensibility of experience in an adaptive test coverage manner, as demonstrated in Fig. 13. The deductive gap between required, specified and implemented behaviours refers to the use of invalid hypotheses at different levels of abstraction that cause unintended functionality. In the set diagram, the three sets S, M and R create several overlapping areas when they intersect. Verification refers to a procedure that proves the correct implementation of each individual requirement, e.g. in a laboratory setting, to minimise the intersection area vi, i.e. the set (M ∩ R) \ S. Validation is a procedure or process model that compares the results with observed empirical data to confirm the correctness of the requirements, minimising the intersection area v, so it is the
Fig. 11. Logical scenario synthesis of cluster C2s when driving around a stationary object from the left-hand side with 51 events.
Fig. 12. Logical scenario synthesis of cluster C3s when driving around a stationary object from the right-hand side with 31 events.
Fig. 13. Continuous testing process for achieving the required test completion criteria of automated driving functions.
set M \ (S ∪ R). The accreditation is aimed at minimising the areas i and iv, i.e. the sets R \ (M ∪ S) and (R ∩ S) \ M, respectively. The adaptive functional testing aims to maximise the optimised behaviour in the intersection area iii, i.e. the set S ∩ M ∩ R, and thus to minimise the deductive gap. If the deductive gap is less than the reasonable risk, the continuous test process can be terminated according to the optimisation goals by sign-off recommendations for automated driving functions. The areas ii and vii are not critical, as they have no safety-related impacts; they are the sets (S ∩ M) \ R and S \ (M ∪ R), respectively. Finally, the ADC-HiL framework efficiently verifies the automated driving functions within the adaptive test coverage. Several steps of the process are computerised: data preprocessing, data transformation, data mining and ontology-based test scenario synthesis. Human intervention is still required in the variable selection and the interpretation of the results. The interpretation of the cluster analysis is necessary to make appropriate use of the obtained information. Since it is not possible to guarantee absolute safety for automated vehicles, one of the biggest challenges in automated driving is to argue for a reasonably low residual risk resulting from imperfections of the environmental perception sensors. Such arguments are not currently supported by the relevant safety norms. This paper proposes applying an adaptive testing approach to determine how such an argument could be formed by decomposing the goals to tame the long tail of testing. The structure presented in this paper raises several issues that require substantial future research activities. These activities have to be integrated into a systems engineering approach that supports the structure of the adaptive functional testing.
This technical research work needs to be complemented by activities within industry to form a consensus on risk evaluation and acceptable argumentation structures that would feed into future standards and codes of practice.

Acknowledgements

The authors thank Prof. Dr.-Ing. Eric Sax, Institute for Information Processing Technologies, Karlsruhe Institute of Technology, for comments that greatly improved the manuscript.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.trf.2019.07.021.

References

Armand, A., Ibanez-Guzman, J., & Zinoune, C. (2017). Digital maps for driving assistance systems and autonomous driving. In Automated driving: Safer and more efficient future driving (pp. 201–244). Springer.
Bach, J., Otten, S., & Sax, E. (2016). Model based scenario specification for development and test of automated driving functions. In 2016 IEEE intelligent vehicles symposium (IV) (pp. 1149–1155). Gothenburg.
Bagschik, G., Menzel, T., & Maurer, M. (2018). Ontology based scene creation for the development of automated vehicles. In 2018 IEEE intelligent vehicles symposium (IV) (pp. 1813–1820). Changshu.
Elgharbawy, M., Arenskrieger, R., Schwarzhaupt, A., Frey, M., & Gauterin, F. (2019). A testing framework for predictive driving features with an electronic horizon. Transportation Research Part F: Traffic Psychology and Behaviour, 61, 291–304. Special TRF issue: Driving simulation.
Elgharbawy, M., Bernier, B., Frey, M., & Gauterin, F. (2016). An agile verification framework for traffic sign classification algorithms in heavy vehicles. In 13th IEEE international conference on computer systems and applications (pp. 1–8). Agadir.
Elgharbawy, M., Elsayed, H., Birlet, A., Frey, M., & Gauterin, F. (2018). A scenario-based verification framework for truck platooning. In 18th driving simulation & virtual reality conference & exhibition (pp. 185–187). Antibes.
Elgharbawy, M., Scherhaufer, I., Frey, M., & Gauterin, F. (2018). A data-driven verification framework for active safety functions. In 18th driving simulation and virtual reality conference and exhibition (pp. 131–135). Antibes.
Elgharbawy, M., Scherhaufer, I., Oberhollenzer, K., Frey, M., & Gauterin, F. (2019). Adaptive functional testing for autonomous trucks. International Journal of Transportation Science and Technology, 8, 202–218.
Elgharbawy, M., Schwarzhaupt, A., & Frey, M. (2019). Verfahren zum Testen eines Spurhalteassistenzsystems für ein Fahrzeug [Method for testing a lane keeping assistance system for a vehicle]. DE102017009971A1.
Elgharbawy, M., Schwarzhaupt, A., Frey, M., & Gauterin, F. (2019). A real-time multisensor fusion verification framework for advanced driver assistance systems. Transportation Research Part F: Traffic Psychology and Behaviour, 61, 259–267. Special TRF issue: Driving simulation.
Elgharbawy, M., Schwarzhaupt, A., Scheike, G., Frey, M., & Gauterin, F. (2016). A generic architecture of ADAS sensor fault injection for virtual tests. In 13th IEEE international conference on computer systems and applications (pp. 1–7). Agadir.
Elgharbawy, M., Schwarzhaupt, A., Scherhaufer, I., Gut, M., & Frey, M. (2019a). Verfahren zum Testen eines Totwinkelassistenzsystems für ein Fahrzeug [Method for testing a blind spot assistance system for a vehicle]. DE102018005864A1.
Elgharbawy, M., Schwarzhaupt, A., Scherhaufer, I., Gut, M., & Frey, M. (2019b). Verfahren zum Testen eines Assistenzsystems für ein Fahrzeug [Method for testing an assistance system for a vehicle]. DE102018005865A1.
Geyer, S., Baltzer, M., Franz, B., Hakuli, S., Kauer, M., Kienle, M., Meier, S., Weissgerber, T., Bengler, K., Bruder, R., Flemisch, F., & Winner, H. (2014). Concept and development of a unified ontology for generating test and use-case catalogues for assisted and automated vehicle guidance. IET Intelligent Transport Systems, 8, 183–189.
Mohammad, M. A., Kaloskampis, I., Hicks, Y., & Setchi, R. (2015). Ontology-based framework for risk assessment in road scenes using videos. Procedia Computer Science, 60, 1532–1541.
Otten, S., Bach, J., Wohlfahrt, C., King, C., Lier, J., Schmid, H., ... Sax, E. (2018). Automated assessment and evaluation of digital test drives. In Advanced microsystems for automotive applications 2017 (pp. 189–199). Springer.
Schreier, M. (2017). Bayesian environment representation, prediction, and criticality assessment for driver assistance systems (Ph.D. thesis). TU Darmstadt.
Schuldt, F., Menzel, T., & Maurer, M. (2015). Eine Methode für die Zuordnung von Testfällen für automatisierte Fahrfunktionen auf X-in-the-Loop-Verfahren im modularen virtuellen Testbaukasten [A method for assigning test cases for automated driving functions to X-in-the-loop approaches in the modular virtual test kit]. In Workshop Fahrerassistenzsysteme (pp. 171–182). Springer.
Trost, J., & Zomotor, Z. (2015). Method for operating a brake assist device, and the brake assist apparatus for a vehicle. U.S. Patent No. 9,079,571. Washington.
Ulbrich, S., Nothdurft, T., Maurer, M., & Hecker, P. (2014). Graph-based context representation, environment modeling and information aggregation for automated driving. In 2014 IEEE intelligent vehicles symposium proceedings.
Wachenfeld, W. H. K. (2017). How stochastic can help to introduce automated driving (Ph.D. thesis). TU Darmstadt.
Wachenfeld, W., Junietz, P., Wenzel, R., & Winner, H. (2016). The worst-time-to-collision metric for situation identification. In 2016 IEEE intelligent vehicles symposium (IV) (pp. 729–734). Gothenburg.
Zhao, D. (2016). Accelerated evaluation of automated vehicles (Ph.D. thesis). The University of Michigan.
Zhao, D., Huang, X., Peng, H., Lam, H., & LeBlanc, D. J. (2018). Accelerated evaluation of automated vehicles in car-following maneuvers. IEEE Transactions on Intelligent Transportation Systems, 19, 733–744.