Developing a data element repository to support EHR-driven phenotype algorithm authoring and execution


Journal of Biomedical Informatics 62 (2016) 232–242

Contents lists available at ScienceDirect

Journal of Biomedical Informatics journal homepage: www.elsevier.com/locate/yjbin

Guoqian Jiang a,⇑, Richard C. Kiefer a, Luke V. Rasmussen b, Harold R. Solbrig a, Huan Mo c, Jennifer A. Pacheco d, Jie Xu e, Enid Montague e,f, William K. Thompson f, Joshua C. Denny c,g, Christopher G. Chute h, Jyotishman Pathak i

a Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, USA
b Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
c Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
d Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
e Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
f School of Computing, DePaul University, Chicago, IL, USA
g Department of Medicine, Vanderbilt University, Nashville, TN, USA
h School of Medicine, Johns Hopkins University, Baltimore, MD, USA
i Division of Health Informatics, Weill Cornell Medical College, Cornell University, New York City, NY, USA

Article info

Article history:
Received 28 May 2016
Accepted 4 July 2016
Available online 5 July 2016

Keywords: Quality Data Model (QDM); HL7 Fast Healthcare Interoperability Resources (FHIR); Metadata standards; Semantic Web technology; Phenotype algorithms

Abstract

The Quality Data Model (QDM) is an information model developed by the National Quality Forum for representing electronic health record (EHR)-based electronic clinical quality measures (eCQMs). In conjunction with the HL7 Health Quality Measures Format (HQMF), QDM contains core elements that make it a promising model for representing EHR-driven phenotype algorithms for clinical research. However, the current QDM specification is available only as descriptive documents suitable for human readability and interpretation, but not for machine consumption. The objective of the present study is to develop and evaluate a data element repository (DER) for providing machine-readable QDM data element service APIs to support phenotype algorithm authoring and execution. We used the ISO/IEC 11179 metadata standard to capture the structure for each data element, and leveraged Semantic Web technologies to facilitate semantic representation of these metadata. We observed that there are a number of underspecified areas in the QDM, including the lack of model constraints and pre-defined value sets. We propose a harmonization with the models developed in HL7 Fast Healthcare Interoperability Resources (FHIR) and Clinical Information Modeling Initiatives (CIMI) to enhance the QDM specification and enable the extensibility and better coverage of the DER. We also compared the DER with the existing QDM implementation utilized within the Measure Authoring Tool (MAT) to demonstrate the scalability and extensibility of our DER-based approach.

© 2016 Elsevier Inc. All rights reserved.

Abbreviations: QDM, Quality Data Model; EHR, electronic health record; eCQMs, electronic clinical quality measures; HQMF, Health Quality Measures Format; DER, data element repository; FHIR, Fast Healthcare Interoperability Resources; CIMI, Clinical Information Modeling Initiatives; MAT, Measure Authoring Tool; eMERGE, Electronic Medical Records and Genomics; SHARP, Strategic Health Information Technology Advanced Research Project; HMORN, HMO Research Network; PCORnet, National Patient-Centered Clinical Research Network; PhEMA, phenotype execution and modeling architecture; NQF, National Quality Forum; MDR, Metadata Registry; W3C, World Wide Web Consortium; RDF, Resource Description Framework; OWL, Web Ontology Language; SPARQL, SPARQL Protocol and RDF Query Language; URI, Uniform Resource Identifier; HCLS, Semantic Web Health Care and Life Sciences; ITS, Implementable Technology Specifications; BD2K, Big Data to Knowledge; bioCADDIE, biomedical and healthCAre Data Discovery and Indexing; CEDAR, Center for Expanded Data Annotation and Retrieval; NCI, National Cancer Institute; caDSR, Cancer Data Standards Registry and Repository; API, application programming interface; MMS, meta-model schema; KNIME, Konstanz Information Miner; VSAC, Value Set Authority Center; OMG, Object Management Group; CTS2, Common Terminology Services 2; AMI, Acute Myocardial Infarction; IRA, inter-rater agreement.

⇑ Corresponding author at: Mayo Clinic, Department of Health Sciences Research, 200 First Street, SW, Rochester, MN 55905, USA. E-mail address: [email protected] (G. Jiang).

http://dx.doi.org/10.1016/j.jbi.2016.07.008
1532-0464/© 2016 Elsevier Inc. All rights reserved.

G. Jiang et al. / Journal of Biomedical Informatics 62 (2016) 232–242

1. Introduction

The creation of phenotype algorithms (i.e., structured selection criteria designed to produce research-quality phenotypes) and the execution of these algorithms against electronic health record (EHR) data to identify patient cohorts have become a common practice in a number of research communities, including the Electronic Medical Records and Genomics (eMERGE) Network [1–3], the Strategic Health Information Technology Advanced Research Project (SHARP) [4,5], the HMO Research Network (HMORN) [6,7] and the National Patient-Centered Clinical Research Network (PCORnet) [8]. However, the toolbox for creating reusable and machine-executable phenotype algorithms remains limited, which has hampered effective cross-institutional research collaborations [9]. To address this overarching challenge, we are actively developing a phenotype execution and modeling architecture (PhEMA) [10] (http://projectphema.org/) to enable: (1) unambiguous representation of phenotype algorithm logic and semantically rich patient data; (2) effective execution of the phenotype algorithm to generate reproducible and sharable results; and (3) a repository to share phenotypes and execution results for collaborative research.

The Quality Data Model (QDM) has been chosen in the PhEMA project as an information model for representing phenotype algorithms. QDM was developed by the National Quality Forum (NQF) for representing EHR-based electronic clinical quality measures (eCQMs). In conjunction with the HL7 Health Quality Measures Format (HQMF), QDM contains core elements that make it a promising model for representing phenotype algorithms for clinical research [11,12]. However, the QDM specification [13] is currently available only as descriptive text documents, which require human interpretation and implementation


for broader use and machine consumption. We believe that a standards-based, semantically annotated rendering of the QDM data elements is critical to support the development of phenotype algorithm authoring and execution applications. The objective of this study is to develop and evaluate a data element repository (DER) that provides standard representations and machine-readable service APIs for data elements extracted from the QDM specification. The system architecture and tooling choices and their evaluations are described in the following sections.

2. Background

2.1. NQF QDM

The NQF QDM describes clinical concepts in a standardized format to enable electronic quality performance measurement in support of operationalizing the Meaningful Use Program in the United States. It consists of two modules: a data model module and a logic module [6]. The data model module is used to represent clinical entities (e.g., diagnoses, laboratory results) and includes the notions of category, datatype, attribute, and value set comprising concept codes from one or more terminologies. A QDM element encapsulates a certain category (e.g., Medication) with an associated datatype (e.g., “Medication, Administered”). Each datatype has a number of associated attributes (e.g., Dose). Fig. 1 shows the QDM element structure [13]. In QDM elements, value sets can be used to define possible codes for the QDM element's definition or the QDM element's attributes. The logic module includes logical, comparison, temporal, and subset operators and functions. These may be combined to constrain combinations of data model entities (e.g., Diagnosis A AND (COUNT(Medication B) > 5)). As of July 2015, the latest release of QDM is version 4.1.2 [13]. Table 1

Fig. 1. Quality Data Model (QDM) element structure. (Reproduced using the source from the QDM element specification [13].)
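The element structure in Fig. 1 and the logic example from Section 2.1 (Diagnosis A AND (COUNT(Medication B) > 5)) can be illustrated with a minimal Python sketch. The class names, the event representation as (datatype, code) pairs, and the placeholder codes are all assumptions made for illustration; they are not part of the QDM specification or of the DER.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ValueSet:
    """A named set of codes (placeholder codes, not real terminology codes)."""
    name: str
    codes: FrozenSet[str]


@dataclass(frozen=True)
class QDMElement:
    """A category used in the context of a datatype, bound to a value set."""
    category: str                    # e.g. "Medication"
    datatype: str                    # e.g. "Medication, Administered"
    value_set: ValueSet
    attributes: Tuple[str, ...] = () # e.g. ("Dose",)


# Hypothetical patient event: (QDM datatype, recorded code).
Event = Tuple[str, str]


def count(events: List[Event], element: QDMElement) -> int:
    """COUNT: number of events matching the element's datatype and value set."""
    return sum(
        1
        for datatype, code in events
        if datatype == element.datatype and code in element.value_set.codes
    )


def meets_criterion(events: List[Event], diagnosis: QDMElement,
                    medication: QDMElement, threshold: int = 5) -> bool:
    """Diagnosis A AND (COUNT(Medication B) > threshold)."""
    return count(events, diagnosis) > 0 and count(events, medication) > threshold


DIAGNOSIS_A = QDMElement(
    "Condition/Diagnosis/Problem", "Diagnosis, Active",
    ValueSet("value set A", frozenset({"dx-1"})),
)
MEDICATION_B = QDMElement(
    "Medication", "Medication, Administered",
    ValueSet("value set B", frozenset({"rx-1", "rx-2"})),
    attributes=("Dose",),
)
```

With one matching diagnosis event and six matching medication events, `meets_criterion` holds; with only five medication events it does not, because the comparison is strictly greater than 5.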


Table 1. The QDM modules, core model element definitions, and examples.

| Module | Core model element | Definition | Example |
| --- | --- | --- | --- |
| Data model module | Category | A category consists of a single clinical concept identified by a value set. A category is the highest level of definition for a QDM element. The QDM currently contains 19 categories | Medication, Procedure, Condition/Diagnosis/Problem, Communication, and Encounter |
| Data model module | Datatype | A datatype is the context in which each category is used to describe a part of the clinical care process | ‘Medication, Active’ and ‘Medication, Administered’ as applied to the Medication category |
| Data model module | Attribute | An attribute provides specific detail about a QDM element. QDM elements have two types of attributes, datatype-specific and data flow attributes | ‘Dose’, ‘Frequency’, ‘Route’, ‘Start Datetime’ and ‘Stop Datetime’, for the datatype ‘Medication, Active’ |
| Data model module | Value sets | A value set is a set of values that contain specific codes derived from a particular code system. Value sets are used to define the set of codes that can possibly be found in a patient record for a particular concept | Laboratory Test, Performed: “value set A” (result: “value set B”) |
| Logic module | Attribute filters | Attribute filters can be applied to QDM elements to further restrict the set of events that are returned | Filter by existence of a recorded value, filter by value set or filter by date |
| Logic module | Functions | Min, Max, Median, Average, Count, Sum, Age At, DateDiff, and TimeDiff | Min >= 120 mmHg of: “Physical Exam, Performed: Systolic Blood Pressure (result)” during “Measurement Period” |
| Logic module | Subset operators | First, Second, Third, Fourth, Fifth, Most Recent, Intersection Of, Union Of, Satisfies Any, and Satisfies All | Intersection of: “Encounter, Performed: Office Visit” during “Measurement Period” “Encounter, Performed: Office Visit” ends before start of “Diagnosis, Active: Diabetes” |
| Logic module | Logic operators | And, Or, Not | AND: “Encounter: Hospital Inpatient” AND: “Physical Exam, Performed: Weight Measurement” during “Measurement Period” |
| Logic module | Comparison operators | Equal to, Less Than, Less Than Or Equal To, Greater Than, and Greater Than Or Equal To | “Encounter: Hospital Inpatient (duration > 120 day(s))” |
| Logic module | General relationship operators | Fulfills | “Communication: Provider to Provider: Consult Note” fulfills “Intervention, Order: Referral” |
| Logic module | Temporal operators | ‘Starts Before Start Of’, ‘Starts After Start Of’, ‘Starts Before End Of’, ‘Starts Concurrent With’, ‘Starts During’, ‘Ends During’, ‘Concurrent With’, ‘During’, ‘Overlaps’, etc. | “Diagnosis, Active: Diabetes” starts during “Encounter: Hospital Inpatient” |

shows the definitions and examples of core model elements in the QDM specification.

2.2. ISO/IEC 11179 metadata standard

ISO/IEC 11179 is a six-part international standard known as the ISO/IEC 11179 Metadata Registry (MDR) standard [14]. Part 3 of the ISO/IEC 11179 standard describes a model for formally associating data model elements with their intended meaning. Fig. 2 shows a high-level data description metamodel in the ISO/IEC 11179 specification [14]. The lower part of the metamodel is a representation layer, which describes how information about observations and values is represented, and the upper part of the metamodel is a conceptual layer, which describes how the semantic meaning of the observations and values is represented unambiguously using standard domain ontologies [15]. A data element is one of the foundational concepts in the specification. ISO/IEC 11179 specifies the relationships and interfaces between data elements, value sets (i.e., enumerated value domains) and standard terminologies.

2.3. Semantic Web technologies

The World Wide Web Consortium (W3C) is the main standards body for the World Wide Web [16]. Its goal is to develop interoperable technologies and tools as well as specifications and guidelines to realize the full potential of the Web. The Resource Description Framework (RDF) [17], Web Ontology Language (OWL) [18], and SPARQL [19] (a recursive acronym for SPARQL Protocol and RDF Query Language) specifications have all achieved the level of W3C recommendations (the highest level for the W3C standards), and have gained wide acceptance and use. RDF is a general-purpose framework for naming, describing, and organizing resources. A resource is identified and can be referenced by a

Uniform Resource Identifier (URI). An RDF statement is represented in a triple format (i.e., subject-predicate-object). A set of RDF statements (i.e., triples) forms a directed graph, which expresses a graph data model. SPARQL is a standard query language for RDF graphs, and a SPARQL endpoint can be established to provide standard query services on RDF graphs. OWL provides the standard ontology modeling language to capture formal relationships among entities in a particular domain and enables semantic reasoning and logical inference. These technologies based on the W3C standards provide a solid foundation to define how to model, capture, and disseminate information with the explicit goal of maximizing semantic interoperability.

The W3C Semantic Web Health Care and Life Sciences (HCLS) interest group [20] has been established to develop, advocate for, and support the use of Semantic Web technologies across health care, life sciences, clinical research and translational medicine. As a joint collaboration with the W3C HCLS on Clinical Observations Interoperability, a sub-group of the HL7 Implementable Technology Specifications (ITS) group known as the RDF for Semantic Interoperability [21] was established to facilitate the use of RDF as a common semantic foundation for healthcare information interoperability. In this study, we use Semantic Web technologies to build our DER infrastructure as detailed in the following sections.

2.4. Related work

There have been a number of research efforts to develop and promote metadata standards that help scientists annotate their research data and results [22–24]. Notably, the NIH Big Data to Knowledge (BD2K) bioCADDIE (biomedical and healthCAre Data Discovery and Indexing) project [25] and the Center for Expanded Data Annotation and Retrieval (CEDAR) project [26] were initiated for the study and development of metadata standards and


Fig. 2. A high-level data description meta-model specified in the ISO/IEC 11179 (source from the ISO/IEC11179 specification document [14]).

applications for annotating biomedical datasets to facilitate data discovery, data interpretation, and data reuse. The BD2K community has produced a metadata specification [27] to describe the metadata and the structure for datasets. However, these efforts neither focus on the use cases in EHR-driven phenotype algorithm applications, nor use the ISO/IEC 11179-based approach to manage their metadata.

In the present study, we investigated and used the ISO/IEC 11179 standard since this standard defines a common language to describe different aspects of metadata registries and enables the exchange of such metadata between systems that follow this standard. ISO/IEC 11179-based metadata repositories have been successfully implemented in the following projects: (1) the National Cancer Institute (NCI) Cancer Data Standards Registry and Repository (caDSR) [28,29], (2) the Semantic MDR developed by two European projects: EHR-enabled clinical research (EHR4CR) and patient safety (SALUS) [30,31], and (3) the CDISC2RDF repository under the Food and Drug Administration (FDA) PhUSE project [32].

With respect to the use of the data elements in the QDM specification, the Measure Authoring Tool (MAT) [33] currently implements a subset of the most recent QDM specification to provide a software tool for the creation of electronic clinical quality measures (eCQMs) in standard formats. Table 2 shows the database tables for the core QDM elements implemented in the MAT. However, the implementation is MAT-specific and does not scale to reuse in broader clinical research communities.

Table 2. The database tables for the core QDM elements implemented in the MAT.

| Table name | Table fields |
| --- | --- |
| CATEGORY | CATEGORY_ID, DESCRIPTION, ABBREVIATION |
| DATA_TYPE | DATA_TYPE_ID, DESCRIPTION, CATEGORY_ID |
| QDM_ATTRIBUTES | ID, NAME, DATA_TYPE_ID, QDM_ATTRIBUTE_TYPE |
| ATTRIBUTE_DETAILS | ATTRIBUTE_DETAILS_ID, ATTR_NAME, CODE, CODE_SYSTEM, CODE_SYSTEM_NAME, MODE, TYPE_CODE |
| CODE_SYSTEM | CODE_SYSTEM_ID, DESCRIPTION, CATEGORY_ID, ABBREVIATION |
| CODE | CODE_ID, CODE, DESCRIPTION, CODE_LIST_ID |
| OPERATOR | ID, LONG_NAME, SHORT_NAME, FK_OPERATOR_TYPE |
| OPERATOR_TYPE | ID, NAME |

3. Methods

We used the ISO/IEC 11179 metadata standard [14] to capture the metadata structure for each data element in the DER. Furthermore, we leveraged Semantic Web technologies [16] to facilitate semantic representation of these metadata. In addition, we built a RESTful service application programming interface (API) driven by requirements from the PhEMA phenotype algorithm authoring and execution applications. We observed that there are a number of underspecified areas in the QDM, including the lack of model constraints (e.g., datatype constraints) and predefined value sets. We also compared our approach to an existing QDM implementation in the Measure Authoring Tool (MAT) to demonstrate the scalability and extensibility of our DER-based approach.

3.1. System architecture

We developed a three-layer semantic framework for data element representation and management. Fig. 3 shows the system architecture of the framework. In the repository layer, we describe the QDM data model elements and logic elements using the ISO/IEC 11179 standard and represent them using RDF and OWL. In the semantic service layer, we developed a suite of Semantic Web services on top of our metadata repository using the Linked Data API [34]. The services include both RDF/SPARQL-based services and simple RESTful services with JSON and XML renderings. In the application layer, the RESTful services are consumed within the phenotype algorithm authoring and execution applications.

3.2. System implementation

3.2.1. Creating a QDM reference model schema in OWL

We manually developed a QDM schema using OWL. The schema represents the QDM modules and core model elements as shown in


Fig. 3. System architecture.

Table 1, and is designed as an extension of the ISO/IEC 11179 standard. In the present study, we used a meta-model schema (MMS) in the OWL/RDF rendering (developed by the FDA PhUSE Semantic Technology project [32]), which is a subset of the ISO/IEC 11179 Part 3 metadata model. Fig. 4 shows a Protégé 5 screenshot illustrating the QDM schema and its instance elements rendered in the OWL format as an extension of the MMS. In the MMS, the Administered Item is the root class. In ISO/IEC 11179, an administered item is any item that has registration, governance, and life cycle information associated with it. The MMS also has a number of custom specializations of Context to distinguish the different context levels in a model. The left-hand panel in Fig. 4 shows the high-level classes of the MMS and the QDM schema. The QDM Data Model Element and QDM Logic Element are represented as subclasses of the top class Administered Item of the MMS. We designed a base URI (Uniform Resource Identifier), http://rdf.healthit.gov/qdm/schema#, for the QDM schema so that each element in the schema can be uniquely and uniformly identified.

Two of the authors (GJ and RK) manually converted the data elements specified in the QDM version 4.1.2 from an NQF specification document into an Excel spreadsheet. We then wrote a Java-based program to parse the spreadsheet and transformed the data elements into the RDF Turtle format [35] that represents

Fig. 4. A Protégé 5 screenshot illustrating the QDM schema and its instance-elements rendered in the OWL format. The left-hand panel shows the ISO/IEC 11179 meta-model schema and its QDM schema extension; the panel in the middle shows a list of data elements for a particular class (e.g., QDMDatatype); the right-hand panel shows the metadata (e.g., label, textual definition, types, and context) for a particular data element (e.g., Diagnosis, Active).
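To make the shape of such instance data concrete, the following is a minimal Python sketch of turning spreadsheet-like rows into Turtle. It is not the authors' Java program. The two base URIs and the class name QDMDatatype come from the text; the prefix names, the use of rdfs:label and skos:definition as metadata properties, the local-name rule, and the sample definition string are assumptions made for illustration.

```python
import re

SCHEMA = "http://rdf.healthit.gov/qdm/schema#"    # base URI given in the text
ELEMENT = "http://rdf.healthit.gov/qdm/element#"  # base URI given in the text

# Prefix names and the rdfs/skos properties below are illustrative choices.
PREFIXES = (
    f"@prefix qdm: <{SCHEMA}> .\n"
    f"@prefix qdme: <{ELEMENT}> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
)


def local_name(label: str) -> str:
    """Hypothetical naming rule: keep only letters and digits of the label."""
    return re.sub(r"[^A-Za-z0-9]", "", label)


def escape(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a Turtle string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def to_turtle(rows) -> str:
    """rows: iterable of (qdm_class, label, definition) tuples."""
    blocks = [PREFIXES]
    for qdm_class, label, definition in rows:
        blocks.append(
            f"qdme:{local_name(label)} a qdm:{qdm_class} ;\n"
            f'    rdfs:label "{escape(label)}" ;\n'
            f'    skos:definition "{escape(definition)}" .\n'
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # The definition text is a placeholder, not the QDM wording.
    print(to_turtle([("QDMDatatype", "Diagnosis, Active",
                      "Placeholder definition text.")]))
```

Under these assumptions, the label "Diagnosis, Active" yields the subject `qdme:DiagnosisActive`, typed as `qdm:QDMDatatype`.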


Table 3. The service URI scheme designed for the REST service API.

| REST service URI scheme | Description |
| --- | --- |
| /qdm/categories | Get all QDM categories |
| /qdm/category/[category] | Get a particular QDM category based on name |
| /qdm/datatypes | Get all QDM datatypes |
| /qdm/datatype/[datatype] | Get a particular QDM datatype based on name |
| /qdm/category/[category]/datatypes | Get all QDM datatypes for a particular category |
| /qdm/category/[category]/datatype/[datatype] | Get a particular QDM datatype for a particular category |
| /qdm/attributes | Get all QDM attributes |
| /qdm/attribute/[attribute] | Get a particular QDM attribute based on name |
| /qdm/datatype/[datatype]/attributes | Get all QDM attributes for a particular datatype |
| /qdm/datatype/[datatype]/attribute/[attribute] | Get a particular QDM attribute for a particular datatype |
| /qdm/functions | Get all QDM functions |
| /qdm/function/[function] | Get a particular QDM function |
| /qdm/comparisonOperators | Get all QDM comparison operators |
| /qdm/comparisonOperator/[comparisonOperator] | Get a particular QDM comparison operator based on name |
| /qdm/logicalOperators | Get all QDM logical operators |
| /qdm/logicalOperator/[logicalOperator] | Get a particular QDM logical operator |
| /qdm/relationshipOperators | Get all QDM relationship operators |
| /qdm/relationshipOperator/[relationshipOperator] | Get a particular QDM relationship operator |
| /qdm/subsetOperators | Get all QDM subset operators |
| /qdm/subsetOperator/[subsetOperator] | Get a particular QDM subset operator |
| /qdm/temporalOperators | Get all QDM temporal operators |
| /qdm/temporalOperator/[temporalOperator] | Get a particular QDM temporal operator |

an RDF graph in a compact textual format. Each QDM element is also asserted as an instance of ISO 11179 Data Element. The panel in the middle of Fig. 4 shows a list of data elements for a particular class (e.g., QDMDatatype); the right-hand panel shows the metadata (e.g., label, textual definition, types, and context) for a particular data element (e.g., “Diagnosis, Active”). We also designed another base URI, http://rdf.healthit.gov/qdm/element#, for uniquely and uniformly identifying each data element instance. In total, 19 instances of the QDM Category, 76 instances of the QDM Datatype, 528 instances of the QDM Attribute and 53 instances of the QDM Logic Element are populated using the QDM schema.

3.2.2. Developing the Semantic Web data element services

After the data elements extracted from the QDM specification are represented in RDF, they are loaded into an RDF triple store. In the present study, we used an open source RDF store known as 4store [36] for the backend repository. Using the built-in feature of 4store, we established a SPARQL endpoint that provides standard semantic query services. As RESTful services are well supported by software developers, we adopted the principles of the Linked Data API and developed simple RESTful services on top of the RDF-based DER. Specifically, we designed the service URI scheme based on the QDM schema (see Table 3), and created a collection of SPARQL queries that retrieve the metadata required by each service URI defined in the scheme. The services provide easy-to-process representations for the data element metadata in XML and JSON formats. Both the SPARQL endpoint and the simple RESTful service APIs are publicly available from the PhEMA project website at http://projectphema.org/.

3.2.3. Interacting with phenotype algorithm authoring and execution applications

We developed the requirements for the DER and services in collaboration with the PhEMA developers, both through discussions about desired aspects of these services and by examining the prototype implementation of PhEMA phenotype algorithm authoring and execution applications. This process was conducted iteratively to improve the utility of the services as the applications matured. Features desired by the application developers included a simple RESTful API that returned JSON or XML representations of the data, which obviated the need to perform SPARQL queries directly.
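The URI scheme in Table 3 lends itself to a small client-side helper. The sketch below builds request paths from the scheme and shows one way a client might fetch a JSON rendering. The path patterns come from Table 3; percent-encoding of element names, the `Accept` header for selecting JSON, and the `base_url` parameter are assumptions, since the text does not document how names are encoded or how a format is selected.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def qdm_path(*segments: str) -> str:
    """Join path segments under /qdm/, percent-encoding each one.

    Encoding names such as "Medication, Administered" this way is an
    assumption; the deployed service may expect a different form.
    """
    return "/qdm/" + "/".join(quote(segment, safe="") for segment in segments)


def datatypes_for_category(category: str) -> str:
    """Path for /qdm/category/[category]/datatypes."""
    return qdm_path("category", category, "datatypes")


def attribute_of_datatype(datatype: str, attribute: str) -> str:
    """Path for /qdm/datatype/[datatype]/attribute/[attribute]."""
    return qdm_path("datatype", datatype, "attribute", attribute)


def fetch_json(base_url: str, path: str):
    """GET a path and decode the JSON body.

    base_url is a placeholder for wherever the services are hosted, and
    content negotiation via the Accept header is assumed, not documented.
    """
    request = Request(base_url.rstrip("/") + path,
                      headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `attribute_of_datatype("Medication, Administered", "Dose")` produces `/qdm/datatype/Medication%2C%20Administered/attribute/Dose`. Keeping path construction separate from the network call means the scheme can be checked without a live endpoint.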

As of December 2015, the PhEMA authoring tool has implemented many of its features using the DER service APIs, including: (1) a QDM model element browser that renders the QDM elements in a hierarchical tree; (2) a feature to highlight metadata for a particular data element, e.g., textual definition; and (3) a feature to suggest the value set binding for a particular data element. The use of the JSON REST API has allowed the authoring tool to rapidly introduce new data elements during development by leveraging consistent, computable definitions. Fig. 5 shows (A) the use of QDM data elements and their metadata (e.g., textual definition) in the PhEMA phenotype algorithm authoring application and (B) the use of QDM data elements in constructing executable phenotype algorithms in the Konstanz Information Miner (KNIME) analytics platform (https://www.knime.org/) [37].

4. Findings and discussion

Two fundamental service components in the PhEMA are proposed to enable the standard representation of data elements and value sets used in the phenotype algorithms: data model services and terminology services. The data element repository in the present study is a reference implementation of two components in the PhEMA application suite.

4.1. Underspecified constraints in QDM

Using the ISO/IEC 11179-based representation, we were able to identify a number of underspecified areas in QDM. First, we found that no datatype or cardinality is specified for each data element in QDM, which may cause arbitrary interpretation of a data element when it is used. For example, the datatype of “DiagnosisActive.Severity” should be specified as “Encoded Value”, and that of “DiagnosisActive.StartDatetime” should be specified as “Date”. With such a specification, phenotype algorithm applications could understand that the former element is associated with a list of coded values (i.e., a value set) and the value of the latter is a date. In addition, with a cardinality constraint, the system could know whether the values associated with a data element are required or optional. However, to the best of our knowledge, the QDM specification does not provide such constraints.

To deal with the modeling challenges, we are exploring recent developments in the clinical modeling communities including HL7


Fig. 5. (A) A screenshot of the PhEMA phenotype algorithm authoring application illustrating the use of QDM data elements and their metadata. The left panel shows a QDM model element browser that renders the QDM elements in a hierarchical tree, and a feature to highlight metadata for a particular data element, e.g., textual definition. (B) A screenshot of the PhEMA phenotype algorithm execution platform, based on KNIME, illustrating the use of QDM data elements in constructing executable phenotype algorithms.
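The datatype and cardinality constraints that Section 4.1 finds missing from QDM could be expressed along the following lines. This is a minimal Python sketch, not part of the DER: the element names and the datatypes "Encoded Value" and "Date" follow the examples in the text, while the cardinalities, the severity codes, and the `validate` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, FrozenSet, List


@dataclass(frozen=True)
class Constraint:
    datatype: str                           # "Encoded Value" or "Date"
    min_card: int = 0                       # 0 = optional, 1 = required
    value_set: FrozenSet[str] = frozenset() # allowed codes for encoded values


# Cardinalities and codes below are invented for illustration only.
CONSTRAINTS: Dict[str, Constraint] = {
    "DiagnosisActive.Severity": Constraint(
        "Encoded Value", 0, frozenset({"mild", "severe"})),
    "DiagnosisActive.StartDatetime": Constraint("Date", 1),
}


def validate(record: Dict[str, Any]) -> List[str]:
    """Return a list of constraint violations for one record."""
    problems: List[str] = []
    for name, constraint in CONSTRAINTS.items():
        value = record.get(name)
        if value is None:
            # Cardinality check: only required elements must be present.
            if constraint.min_card > 0:
                problems.append(f"{name}: required value missing")
            continue
        if constraint.datatype == "Date" and not isinstance(value, date):
            problems.append(f"{name}: expected a date")
        elif (constraint.datatype == "Encoded Value"
              and value not in constraint.value_set):
            problems.append(f"{name}: code not in value set")
    return problems
```

With such declarations available from a repository, an authoring tool could reject a severity code outside the bound value set, or flag a missing start date, before an algorithm is executed.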

Fast Healthcare Interoperability Resources (FHIR) and the Clinical Information Modeling Initiatives (CIMI). FHIR is an emerging HL7 standard that leverages existing logical and theoretical models to provide a consistent, easy-to-implement, and rigorous mechanism for exchanging data between healthcare applications [38]. The primary goal of CIMI is to “improve the interoperability of healthcare systems through shared implementable clinical information models” [39]; CIMI works with FHIR in defining resources. In our studies, we have extended the QDM schema with the notions of FHIR Datatypes and Resources and populated the schema with data elements from FHIR models [40].

On the one hand, by loading the DER with the data elements extracted from FHIR models, the well-defined constraints, including data types and value sets, could potentially be used to enhance the QDM specification through appropriate mappings. To this end, we also developed a crowdsourcing approach to harmonizing high-level data elements between QDM and HL7 FHIR [41], which offers the potential to capture and reuse those constraints defined in the HL7 FHIR models for phenotype


Table 4. The mappings between QDM and FHIR that are highly agreed among the reviewers (n = 7).

| QDM data element | FHIR data element | Number of reviewers with agreement |
| --- | --- | --- |
| Device | Device | 7 |
| Diagnosis_Active | Condition | 7 |
| Encounter | Encounter | 7 |
| Medication | Medication | 7 |
| Procedure | Procedure | 7 |
| Communication | Communication | 6 |
| Communication_From Patient to Provider | Communication | 6 |
| Communication_From Provider to Patient | Communication | 6 |
| Communication_From Provider to Provider | Communication | 6 |
| Condition/Diagnosis/Problem | Condition | 6 |
| Diagnosis_Family History | Family history | 6 |
| Diagnosis_Inactive | Condition | 6 |
| Diagnosis_Resolved | Condition | 6 |
| Diagnostic Study_Order | Diagnostic order | 6 |
| Encounter_Active | Encounter | 6 |
| Encounter_Performed | Encounter | 6 |
| Medication_Administered | Medication administration | 6 |
| Medication_Dispensed | Medication dispense | 6 |
| Medication_Order | Medication prescription | 6 |
| Patient Characteristic | Patient | 6 |
| Substance | Substance | 6 |
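Reviewer agreement of the kind tallied in Table 4 can be summarized with Fleiss' kappa [42], the statistic used for the mapping study. The following is a minimal Python sketch of the standard formula, not the authors' analysis code. It assumes the ratings have been arranged as a matrix in which each row is one rated item, each column one category, and each cell the number of raters choosing that category; how the mapping responses were arranged into such a matrix is not described in the text.

```python
from typing import List, Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories matrix of rating counts.

    Every row must sum to the same number of raters (at least 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    total = n_subjects * n_raters
    n_categories = len(counts[0])

    # Proportion of all ratings that fall in each category.
    p_j: List[float] = [
        sum(row[j] for row in counts) / total for j in range(n_categories)
    ]
    # Observed agreement per subject, averaged over subjects.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)
```

Complete agreement gives a value of 1, and agreement at or below chance level gives a value at or below 0, which matches the interpretation stated in the text.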

applications. In total, 94 data elements from QDM (consisting of 18 QDM Categories and 76 QDM Datatypes) and 98 data elements from FHIR (all FHIR Resources) were extracted for mappings. We received the responses from 7 team members and 206 mapping pairs were created. We used Fleiss’ kappa [42] for assessing the reliability of agreement between a fixed number of raters. If the raters are in complete agreement then kappa = 1. If there is no agreement among the raters, then kappa 6 0. All QDM data elements had at least one mapping suggested whereas 65 FHIR data elements did not have any mapping. We observed that most of these 65 FHIR data elements are those not commonly used in the phenotype algorithms (data not shown). Only fair agreement (kappa = 0.24) was achieved. The QDM categories Communication, Condition/Diagnosis/Problem, Encounter, Medication, Procedure and Patient Characteristics were in relatively high inter-rater agreement (IRA) (see Table 4). All of these mappings with high IRA are also the top data elements used in the phenotype algorithms [43–45]. The textual definitions of the data elements, along with multiple factors (e.g., attributes associated with each type, ambiguity in naming within two models), are important for creating correct mappings. For those 65 FHIR data elements that did not have any mappings, most of them belong to non-clinical categories of FHIR Resources, indicating that FHIR covers more granular and infrastructure-related data elements. We consider that a complete and high-quality mapping produced through the harmonization will help clinical researchers understand the domain coverage of the two models and ultimately establish the degree of interoperability between QDM-based phenotype algorithms and patient data populated with FHIR models. On the other hand, clinical phenotype applications require comprehensive coverage of data elements to better support a variety of use cases in clinical and translational research. 
Such comprehensive coverage demands an extensible DER that can load data elements from different clinical information models and resources. Thus, we plan to develop a standard interface with the CIMI [39] modeling languages, which would enable the DER to load data elements from HL7 and CIMI-compliant clinical models.

4.2. Lack of pre-defined value sets in QDM

Although there is a notion of Value Set in the QDM schema, the QDM specification does not provide any pre-defined value sets that can be reused by QDM adopters. Each QDM adopter will need to determine their own strategy for value set definition and management. For example, the 2014 Clinical Quality Measures (CQMs) [46] use the National Library of Medicine Value Set Authority Center (VSAC) [47] as a repository for associated value sets. In the PhEMA project, we adopted a combination of the VSAC (for reuse of existing value sets) and the Object Management Group (OMG) standard Common Terminology Services 2 (CTS2) [48] for value set definition and management in support of phenotype algorithm creation and execution. We developed CTS2 value set services for both the VSAC value sets and the HL7 FHIR value sets, so that the PhEMA applications could invoke such services through standard APIs. Although the value sets needed for PhEMA that did not already exist in the VSAC could be uploaded to the VSAC with appropriate authoring credentials, we consider that the CTS2-based value set service APIs provide additional features that complement the VSAC-based services. First, CTS2 is an OMG standard; standards-based value set services are more interoperable across different systems that observe the same standard. Second, CTS2 has a write API that allows users to define their own value sets within PhEMA applications, which is one of the key requirements for the creation of phenotype algorithms. Third, from the perspective of user experience, it is not straightforward to use the VSAC for publishing value sets given the complexity of the VSAC system, and it is desirable to be able to define value sets within PhEMA applications without directing users to a separate system.

In addition, we loaded all the data elements and their associated value sets used in the definition of CQMs into our DER to facilitate the reuse of the VSAC value sets. Fig. 6 shows a QDM-based semantic representation of a criterion defined in CQM 30: “Diagnosis, Active: Acute Myocardial Infarction (AMI) (ordinality: Principal)” starts during Occurrence A of “Encounter Inpatient”.
As illustrated in Fig. 6, the data elements qdm:AcuteMyocardialInfarction (an instance of the QDM datatype qdm:DiagnosisActive), qdm:EncounterInpatient (an instance of the QDM datatype qdm:EncounterPerformed), and qdm:Principal (an instance of the QDM attribute qdm:DiagnosisActive:Ordinality) are linked with their corresponding value sets using the predicate “mms:dataElementValueDomain”, which is defined in the meta-model schema based on ISO/IEC 11179. With such a semantic representation, the PhEMA authoring applications could, for example, provide a feature that recommends value sets to users who choose a CQM data element to define their phenotype algorithms.
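The value set recommendation idea described above can be sketched with a small in-memory set of triples. This is only an illustration under stated assumptions: the `vs:` value set identifiers are hypothetical placeholders (not real VSAC identifiers), `rdf:type` is assumed as the way to express "an instance of", and the helper names `recommend_value_sets` and `elements_of_type` are invented for this example rather than taken from the DER service API.

```python
# Each triple is (subject, predicate, object), written with prefixed names.
# The "vs:" identifiers are hypothetical placeholders for value sets.
VALUE_DOMAIN = "mms:dataElementValueDomain"

TRIPLES = {
    ("qdm:AcuteMyocardialInfarction", "rdf:type", "qdm:DiagnosisActive"),
    ("qdm:AcuteMyocardialInfarction", VALUE_DOMAIN, "vs:ExampleAmiValueSet"),
    ("qdm:EncounterInpatient", "rdf:type", "qdm:EncounterPerformed"),
    ("qdm:EncounterInpatient", VALUE_DOMAIN, "vs:ExampleEncounterInpatientValueSet"),
    ("qdm:Principal", "rdf:type", "qdm:DiagnosisActive:Ordinality"),
    ("qdm:Principal", VALUE_DOMAIN, "vs:ExamplePrincipalValueSet"),
}


def recommend_value_sets(triples, data_element):
    """Return the value sets linked to a data element via the value-domain predicate."""
    return sorted(
        obj for subj, pred, obj in triples
        if subj == data_element and pred == VALUE_DOMAIN
    )


def elements_of_type(triples, qdm_type):
    """Return the data elements declared as instances of a QDM datatype or attribute."""
    return sorted(
        subj for subj, pred, obj in triples
        if pred == "rdf:type" and obj == qdm_type
    )


# An authoring tool could first list the elements of a chosen datatype,
# then suggest the value sets attached to the element the user picks.
for element in elements_of_type(TRIPLES, "qdm:DiagnosisActive"):
    print(element, recommend_value_sets(TRIPLES, element))
```

In a deployed repository the same lookup would more plausibly be a query against the RDF store; the tuple-based version here only shows the shape of the link between a data element and its value domain.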


Fig. 6. A QDM-based semantic representation of an example criterion: “Diagnosis, Active: Acute Myocardial Infarction (AMI) (ordinality: Principal)” starts during Occurrence A of “Encounter Inpatient”.

4.3. Comparison with the MAT-based implementation

As previously mentioned, the MAT [33] implements a subset of the most recent QDM specification and provides a web-based tool that allows measure developers to author eCQMs using the QDM. The MAT backend is based on a relational database schema, and the QDM model elements are implemented in a number of database tables, as shown in Table 2. In contrast, we implemented the QDM specification using a DER-based approach that leverages standards-based representations and Semantic Web technologies.

First, the DER-based QDM implementation is application-independent, while the MAT-based QDM implementation tightly couples QDM elements to the MAT application. Our DER is developed as a module of the PhEMA infrastructure, which provides semantic and REST service APIs that may be consumed by any application that uses QDM, including (but not limited to) the PhEMA authoring and execution tools. For example, the DER is currently being adopted by a Big Data to Knowledge (BD2K) bioCADDIE pilot project [49] to build tools for indexing clinical research datasets using HL7 FHIR. Here, the DER-based approach with FHIR data element services will be used in building indexing services.

Second, the DER-based approach uses the ISO/IEC 11179 standard to represent the metadata of the data elements from QDM. We found that the textual definitions of the QDM data elements, which help improve the users’ experience when choosing QDM data elements for building their phenotype algorithms, are not captured in the MAT. Also, as previously described, by converting the textual description to a standard representation we were able to identify underspecified areas, such as missing constraints, in the QDM specification. In addition, ISO/IEC 11179 specifies a semantically precise structure for data elements and provides a mechanism to identify two data elements from different models that have the same intended meaning, which would facilitate semantic harmonization of the data elements between different models, e.g., between QDM and FHIR. We also leveraged the CTS2 standard to enable machine-readable value set services, which provide many of the metadata items for value set management that are not covered by the MAT. For example, the MAT Code System table (see Table 2) does not have a field for the Code System Version, whereas the CTS2 value set services do cover this metadata. This is important for handling different versions of a code system.

Third, the DER-based approach leverages scalable Semantic Web technologies. The RDF-based graph data model allows incremental data integration from disparate data sources and provides an agile approach for dynamic aggregation of large knowledge resources and datasets. As demonstrated, our DER has been extended to load data elements from other information models such as HL7 FHIR, and from other sources such as eCQMs.

4.4. Limitations of the study

There are a number of limitations. First, we used a subset of the ISO/IEC 11179 metadata standard, i.e., a meta-model schema (MMS) adopted by the CDISC2RDF project. Although the MMS contains core constructs of the metadata standard and has proved very useful in representing the metadata of the QDM data elements in a lightweight manner, the full potential of the ISO/IEC 11179 standard has yet to be explored. For example, our DER has not implemented a key feature of ISO/IEC 11179 that enables the definition of the intended meaning of a data element or a permissible value using standard vocabularies. Exploring the full specification of the ISO/IEC 11179 standard is beyond the scope of this study but will be one of our future study areas.

Second, the service URI scheme designed for the REST service API is based on the notions of QDM elements (e.g., category, datatype, and attribute). This works well for exposing metadata of the QDM data elements but may cause confusion if we use these notions to represent metadata of data elements extracted from other data models (e.g., FHIR). This demands a generic framework for building metadata services when we load our DER with data elements from a variety of data models. We believe that ISO/IEC 11179 would provide such a generic framework to enable the creation of a standard metadata service URI scheme, and we are working on defining such a scheme using the ISO/IEC 11179 standard.

Third, the mappings between QDM and FHIR created using a crowdsourcing approach are still preliminary, and only fair agreement (kappa = 0.24) was achieved. We plan to first examine the mappings with high agreement among reviewers; a community-based review mechanism would then be needed to incrementally build a collection of reliable mappings between QDM and FHIR.

Finally, the DER and its metadata services are mainly used to support the PhEMA phenotype algorithm applications. To enable a robust metadata infrastructure, it is desirable to collect requirements from research applications in broader clinical research communities. In addition to FHIR, we are also closely collaborating with a number of such communities, including BD2K, eMERGE, PCORnet and OHDSI (Observational Health Data Sciences and Informatics) [50], to identify their needs in metadata management and to enhance and expand the capabilities of our DER infrastructure.

5. Conclusion

We developed a data element repository that provides a standards-based semantic infrastructure to enable machine-readable QDM data element services in support of EHR-driven phenotype algorithm authoring and execution. Using the ISO/IEC 11179-based representation, we were able to identify a number of underspecified areas in QDM.
Compared with an existing MAT-based QDM implementation, we demonstrated the scalability and extensibility of our DER-based approach. In the future, we plan to develop a CIMI-based interface to enhance the extensibility of the DER by incorporating data elements extracted from external clinical data and information models (e.g., HL7 FHIR).

Contributors

G.J., R.K., and L.V.R. drafted the manuscript; G.J., H.R.S., R.K., L.V.R., and C.G.C. led data element repository development; G.J., R.K., and L.V.R. led mapping activities; L.V.R. led authoring environment and modularization studies; J.A.P. and H.M. led algorithm modeling studies; W.K.T., J.A.P., L.V.R., R.K., and H.M. led executability and adaptability studies; J.X. and E.M. led environmental scan and usability studies; J.P., G.J., J.C.D., and W.K.T. provided leadership for the project; all authors contributed expertise and edits.

Acknowledgments

The manuscript is an expanded version of a podium abstract presented at the AMIA Clinical Research Informatics (CRI) 2015 conference. This work has been supported in part by funding from PhEMA (R01 GM105688), eMERGE (U01 HG006379, U01 HG006378 and U01 HG006388), and caCDE-QA (U01 CA180940).

References

[1] C.A. McCarty, R.L. Chisholm, C.G. Chute, I.J. Kullo, G.P. Jarvik, E.B. Larson, et al., The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies, BMC Med. Genomics 4 (2011) 13. Epub 2011/01/29.


[2] O. Gottesman, H. Kuivaniemi, G. Tromp, W.A. Faucett, R. Li, T.A. Manolio, et al., The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future, Genet. Med.: Off. J. Am. College Med. Genet. 15 (10) (2013) 761–771. Epub 2013/06/08.
[3] C.G. Chute, M. Ullman-Cullere, G.M. Wood, S.M. Lin, M. He, J. Pathak, Some experiences and opportunities for big data in translational research, Genet. Med.: Off. J. Am. College Med. Genet. 15 (10) (2013) 802–809. Epub 2013/09/07.
[4] C.G. Chute, J. Pathak, G.K. Savova, K.R. Bailey, M.I. Schor, L.A. Hart, et al., The SHARPn project on secondary use of Electronic Medical Record data: progress, plans, and possibilities, AMIA Annu. Symp. Proc./AMIA Symp. AMIA Symp. 2011 (2011) 248–256. Epub 2011/12/24.
[5] J. Pathak, K.R. Bailey, C.E. Beebe, S. Bethard, D.C. Carrell, P.J. Chen, et al., Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium, J. Am. Med. Inform. Assoc.: JAMIA 20 (e2) (2013) e341–e348. Epub 2013/11/06.
[6] E.E. Thompson, J.F. Steiner, Embedded research to improve health: the 20th annual HMO Research Network conference, March 31–April 3, 2014, Phoenix, Arizona, Clin. Med. Res. 12 (1–2) (2014) 73–76. Epub 2014/10/30.
[7] T.R. Ross, D. Ng, J.S. Brown, R. Pardee, M.C. Hornbrook, G. Hart, et al., The HMO research network virtual data warehouse: a public data model to support collaboration, EGEMS (Wash, DC) 2 (1) (2014) 1049. Epub 2014/01/01.
[8] S.E. Daugherty, S. Wahba, R. Fleurence, Patient-powered research networks: building capacity for conducting patient-centered clinical outcomes research, J. Am. Med. Inform. Assoc.: JAMIA 21 (4) (2014) 583–586. Epub 2014/05/14.
[9] H. Mo, W.K. Thompson, L.V. Rasmussen, J.A. Pacheco, G. Jiang, R. Kiefer, et al., Desiderata for computable representations of electronic health records-driven phenotype algorithms, J. Am. Med. Inform. Assoc. 22 (6) (2015) 1220–1230, http://dx.doi.org/10.1093/jamia/ocv112. Epub 2015/09/05.
[10] L.V. Rasmussen, R.C. Kiefer, H. Mo, P. Speltz, W.K. Thompson, G. Jiang, et al., A modular architecture for electronic health record-driven phenotyping, AMIA Joint Summits Transl. Sci. Proc. AMIA Summit Transl. Sci. 2015 (2015) 147–151. Epub 2015/08/26.
[11] D. Li, C.M. Endle, S. Murthy, C. Stancl, D. Suesse, D. Sottara, et al., Modeling and executing electronic health records driven phenotyping algorithms using the NQF Quality Data Model and JBoss(R) Drools Engine, AMIA Annu. Symp. Proc./AMIA Symp. AMIA Symp. 2012 (2012) 532–541. Epub 2013/01/11.
[12] W.K. Thompson, L.V. Rasmussen, J.A. Pacheco, P.L. Peissig, J.C. Denny, A.N. Kho, et al., An evaluation of the NQF Quality Data Model for representing Electronic Health Record driven phenotyping algorithms, AMIA Annu. Symp. Proc./AMIA Symp. AMIA Symp. 2012 (2012) 911–920. Epub 2013/01/11.
[13] Quality Data Model (QDM) Specification, 2015. Available from: (July 14, 2015).
[14] ISO/IEC 11179, Information Technology – Metadata Registries (MDR), 2015. Available from: (July 14, 2015).
[15] J. Davies, J. Gibbons, S. Harris, C. Crichton, The CancerGrid experience: metadata-based model-driven engineering for clinical trials, Sci. Comput. Progr. 89 (2014) 126–143.
[16] W3C Standards, 2015. Available from: (July 14, 2015).
[17] RDF, 2015. Available from: (December 24, 2015).
[18] OWL Web Ontology Language, 2015. Available from: (December 24, 2015).
[19] SPARQL Query Language for RDF, 2015. Available from: (December 24, 2015).
[20] The W3C Semantic Web Health Care and Life Sciences (HCLS), 2015. Available from: (December 24, 2015).
[21] The RDF for Semantic Interoperability Group, 2015. Available from: (December 24, 2015).
[22] FORCE11, The Future of Research Communications and e-scholarship, 2016. Available from: World Wide Web Consortium (April 28, 2016).
[23] Biosharing, 2016. Available from: (April 28, 2016).
[24] S.A. Sansone, P. Rocca-Serra, D. Field, E. Maguire, C. Taylor, O. Hofmann, et al., Toward interoperable bioscience data, Nat. Genet. 44 (2) (2012) 121–126. Epub 2012/01/28.
[25] NIH BD2K bioCADDIE Project, 2016. Available from: (April 28, 2016).
[26] M.A. Musen, C.A. Bean, K.H. Cheung, M. Dumontier, K.A. Durante, O. Gevaert, et al., The center for expanded data annotation and retrieval, J. Am. Med. Inform. Assoc.: JAMIA 22 (6) (2015) 1148–1152. Epub 2015/06/27.
[27] BD2K bioCADDIE WG3: Metadata Specification, 2016. Available from: (April 28, 2016).
[28] G.A. Komatsoulis, D.B. Warzel, F.W. Hartel, K. Shanbhag, R. Chilukuri, G. Fragoso, et al., CaCORE version 3: implementation of a model driven, service-oriented architecture for semantic interoperability, J. Biomed. Inform. 41 (1) (2008) 106–123. Epub 2007/05/22.
[29] NCI caDSR Wiki, 2015. Available from: (December 24, 2015).
[30] C. Daniel, A. Sinaci, D. Ouagne, E. Sadou, G. Declerck, D. Kalra, et al., Standard-based EHR-enabled applications for clinical research and patient safety: CDISC - IHE QRPH - EHR4CR & SALUS collaboration, AMIA Joint Summits Transl. Sci. Proc. AMIA Summit Transl. Sci. 2014 (2014) 19–25. Epub 2014/01/01.


[31] J. Doods, R. Bache, M. McGilchrist, C. Daniel, M. Dugas, F. Fritz, Piloting the EHR4CR feasibility platform across Europe, Methods Inf. Med. 53 (4) (2014) 264–268. Epub 2014/06/24.
[32] PhUSE Semantic Technology Working Group CDISC Standards, 2015. Available from: (August 3, 2015).
[33] Measure Authoring Tool, 2015. Available from: (December 24, 2015).
[34] Linked Data API, 2015. Available from: (July 14, 2015).
[35] RDF Turtle: Terse RDF Triple Language, 2015. Available from: (December 28, 2015).
[36] The 4store, 2015. Available from: (July 14, 2015).
[37] H. Mo, J.A. Pacheco, L.V. Rasmussen, P. Speltz, J. Pathak, J.C. Denny, et al., A prototype for executable and portable electronic clinical quality measures using the KNIME analytics platform, AMIA Joint Summits Transl. Sci. Proc. AMIA Summit Transl. Sci. 2015 (2015) 127–131. Epub 2015/08/26.
[38] HL7 FHIR DSTU 2, 2015. Available from: (July 14, 2015).
[39] Clinical Information Modeling Initiative, 2015. Available from: (July 14, 2015).
[40] G. Jiang, H.R. Solbrig, R. Kiefer, L.V. Rasmussen, H. Mo, P. Speltz, et al., A standards-based semantic metadata repository to support EHR-driven phenotype authoring and execution, Stud. Health Technol. Inform. 216 (2015) 1098. Epub 2015/08/12.
[41] G. Jiang, H.R. Solbrig, R. Kiefer, L.V. Rasmussen, H. Mo, J.A. Pacheco, et al. (Eds.), Harmonization of quality data model with HL7 FHIR to support EHR-driven phenotype authoring and execution: a pilot study, AMIA Annu. Symp. Proc., 2015, 2015 November 13; p. 1512.
[42] Fleiss’ Kappa, 2016. Available from: (May 17, 2016).
[43] M. Conway, R.L. Berg, D. Carrell, J.C. Denny, A.N. Kho, I.J. Kullo, et al., Analyzing the heterogeneity and complexity of Electronic Health Record oriented phenotyping algorithms, AMIA Annu. Symp. Proc./AMIA Symp. AMIA Symp. 2011 (2011) 274–283. Epub 2011/12/24.
[44] J. Pathak, A.N. Kho, J.C. Denny, Electronic health records-driven phenotyping: challenges, recent advances, and perspectives, J. Am. Med. Inform. Assoc.: JAMIA 20 (e2) (2013) e206–e211. Epub 2013/12/05.
[45] A.N. Kho, J.A. Pacheco, P.L. Peissig, L. Rasmussen, K.M. Newton, N. Weston, et al., Electronic medical records for genetic research: results of the eMERGE consortium, Sci. Transl. Med. 3 (79) (2011) 79re1. Epub 2011/04/22.
[46] Clinical Quality Measures, 2015. Available from: (July 14, 2015).
[47] NLM Value Set Authority Center (VSAC), 2015. Available from: (July 14, 2015).
[48] Common Terminology Services 2 (CTS2) Specification, 2015. Available from: (July 14, 2015).
[49] bioCADDIE Pilot Project, 2015. Available from: (December 24, 2015).
[50] OHDSI (Observational Health Data Sciences and Informatics), 2016. Available from: (May 17, 2016).