
XML Based CCD Processing for HealthVault Integration
June 29, 2009
In recent years, healthcare providers, insurance companies and health applications have all been collecting mountains of health-related data in various electronic formats. The ability to search and analyze this data in context is what justifies the effort of collecting it in the first place. One big challenge that stands in the way of ubiquitous health information is the disparity in data formats between different systems. It is not uncommon today to see legacy applications that use older standards such as flat text files, HL7, CCR, or (even worse) proprietary data formats.
With the latest efforts of Microsoft and Google to create health platforms, this disparity is becoming more obvious to big players in the industry. Microsoft’s HealthVault offers a flexible way of storing health information in a variety of formats. HealthVault has introduced its own type system to store individual health data items such as medications, allergies, conditions, etc. In addition to this, it also supports proprietary data formats defined by the integrating applications. Finally, as a move towards interoperability, HealthVault has recently added support for CCD and CCR formats. HealthVault has the ability to deal with many different data formats to accommodate integration with a variety of healthcare applications, and is on the way to becoming a health data warehouse.
When developing an application that acts as a bridge between many health applications and a data warehouse like HealthVault, one prerequisite is to ensure interoperability between systems. This in turn involves using a common denominator in data representation. There are a number of initiatives to create such a standard. In particular, through recent efforts by HL7 and the American Society for Testing and Materials (ASTM), a data representation format called Continuity of Care Document (CCD) is emerging as the standard for newer applications. Although a standard is slowly being set for new applications, a plethora of software systems running today still do not use CCD. If these older systems used CCD, interoperability would be fairly straightforward. Unfortunately, it’s simply not feasible to modify these legacy systems to use the newer, more compatible standard.
With this obstacle in mind, architects designing health systems that interface with multiple data sources must consider globalizing the input and standardizing the output. In other words, the application must accept as its input as many different data representations as it can. At the same time, it should produce a single format output, preferably the standard format that is widely accepted by the industry (CCD format).
Throughout this white paper, we will tackle the design challenges of an application that aggregates health data from multiple sources, creates a CCD document from the available data, and finally exports the record to Microsoft HealthVault. The following diagram illustrates the application at a high level (See Figure 1).
Figure 1: Disparate Data Sources
Designing a system that works with multiple data sources is a well-identified problem with available solutions. Thus, it is tempting for architects to reduce this problem to one of simple data mapping and delve immediately into implementation. Unfortunately, the problem as applied to the health industry is a special case considering that lives may be at stake. At the same time, quick access to accurate health information about a patient can save time and lives in certain cases. So there is a clear need for interoperability between various health systems that can provide valuable information about a patient.
Knowing the critical nature of the problem at hand, how can the new system consume health data that arrives from external systems in legacy formats and still produce a standardized and accurate health record? When a leading health insurance provider asked us this question, we inspected all available options. In the end, we identified a six-step process that involves collecting data from separate sources, filtering the data, translating it to a form that the application can work with, and finally producing a CCD document. Figure 2 (CCD Generation Steps) outlines the different aspects of the process.
Figure 2: CCD Generation Steps
For the sake of clarity and time, and to stay focused on the topic, we will briefly mention the pre-processing and data aggregation steps but will leave it up to the reader to discover more information. We believe these two steps depend heavily on the existing infrastructure and may change in accordance with each system. We will look at data filtering, data serialization, and data mapping in more detail, however, as they are critical to a successful CCD generation.
Pre-Processing
This step refers to any action that the application might take prior to CCD generation. In its simplest form, this step might include a check to verify the availability of all data sources. In the case of unavailability of any (or all) data sources, the entire process could be terminated without wasting any more resources. This is just one example of a pre-process and, depending on the system in question, this step might involve a more complicated process.
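The availability check described above could be sketched roughly as follows. This is only an illustration in Python: the source names and the probe callables are hypothetical, and a real system would plug in whatever health check each data source actually supports.

```python
from typing import Callable, Dict, List


def unavailable_sources(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Return the names of data sources whose probe fails or raises."""
    failed = []
    for name, probe in probes.items():
        try:
            ok = bool(probe())
        except Exception:
            # A probe that raises is treated the same as one that reports "down".
            ok = False
        if not ok:
            failed.append(name)
    return failed


def should_generate_ccd(probes: Dict[str, Callable[[], bool]],
                        require_all: bool = True) -> bool:
    """Decide whether to continue with CCD generation.

    With require_all=True the process stops if any source is unavailable;
    otherwise it stops only when every source is unavailable.
    """
    failed = unavailable_sources(probes)
    if require_all:
        return not failed
    return len(failed) < len(probes)
```

Whether a single missing source should stop the whole run is a policy decision, which is why the sketch exposes it as a parameter rather than hard-coding it.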
Data Aggregation
This step refers to retrieval of data from all data sources that will be used in CCD generation. The implementation of the data aggregation step might change drastically based on existing implementation and the third-party services with which the current system interacts.
Data Filtering
Data filtering is a necessary evil when dealing with multiple data sources. It is often the case that the same piece of information about a patient’s health record comes from multiple, disparate data sources. In order to create a complete and lean health record, the system in question needs to eliminate duplicate records. One way to do this is to involve a rules engine that works like a state machine throughout the execution of the application and filters out duplicate records. Let’s look at an actual example to illustrate the duplicate data problem.
Let's assume that the current system receives data from two sources: a claims service and a web portal where users enter health-related records themselves. In a simple scenario, the user has started taking ROBITUSSIN and has entered the name as is in the user portal. For the sake of argument, let us assume ROBITUSSIN is a prescription-strength medicine and the claims feed that comes through the insurance company also contains the same record, with one difference. The insurance company record has an extra code field with the value "54569133000," which in the NDC coding system corresponds to "ROBITUSSIN-DAC" and carries the additional generic name information "P-EPHED HCL/CODEINE/GUAIFEN." When the application receives both of these records, it needs to decide which one to use. In this example, the claims feed with its codified data provides much more detail (a different formula), and this additional information can make a drastic difference in how a physician interprets the data. If we change the example slightly to involve penicillin instead of a cough medicine, one can see the huge impact that the duplicate elimination decision can have.
How can the application be structured to use one record versus another? One proven solution is to use a rules engine for different data types and rank the quality of the records based on different criteria such as:
- Origination source of the information
- Required field availability
- Usability of data in the record
Although the first two are self-explanatory, the third point, usability of data in the record, deserves some explanation. Two medication records might both have the same name but different codes. One record might contain a code using the NDC coding system, while the other might have a code using the SNOMED coding system. If the application in question does not know how to interpret SNOMED codes, it should pick the record with the NDC code. Machine readability is the key when deciding which record to work with. Although the criteria might differ between systems, an application should always pick the record it can interpret most effectively. Having a rules engine define and store what it means to be machine-readable greatly facilitates the maintenance of the system over time, as rules will change to accommodate differences in standards and so on. If the rules engine determines the data to be of insufficient quality, that record should be discarded as early as possible in the CCD export process for performance and bandwidth reasons.
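To make the ranking idea concrete, here is a minimal Python sketch that scores duplicate medication records on the three criteria listed above. The record fields, the source rankings, the set of understood coding systems, and the order in which the criteria are compared are all assumptions made for illustration, not the rules of any actual rules engine. The sketch also assumes the duplicates have already been grouped together.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Hypothetical rule data; a real system would load these from the rules engine.
SOURCE_RANK = {"claims": 2, "portal": 1}
KNOWN_CODE_SYSTEMS = {"NDC"}
REQUIRED_FIELDS = ("name",)


@dataclass
class MedicationRecord:
    name: str
    source: str
    code: Optional[str] = None
    code_system: Optional[str] = None


def quality_score(rec: MedicationRecord) -> Tuple[bool, bool, int]:
    """Score a record as (required fields present, usable code, source rank)."""
    has_required = all(getattr(rec, field) for field in REQUIRED_FIELDS)
    usable_code = bool(rec.code) and rec.code_system in KNOWN_CODE_SYSTEMS
    return (has_required, usable_code, SOURCE_RANK.get(rec.source, 0))


def pick_best(duplicates: Iterable[MedicationRecord]) -> Optional[MedicationRecord]:
    """Return the highest-quality duplicate, or None if none is usable."""
    # Records missing required fields are discarded before ranking.
    candidates = [r for r in duplicates if quality_score(r)[0]]
    return max(candidates, key=quality_score) if candidates else None
```

With the ROBITUSSIN example, the coded claims record would win over the uncoded portal entry, and a record coded in a system the application cannot interpret would lose to one coded in NDC even if it came from a higher-ranked source.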
Data Serialization
To set a common ground for the remainder of this section, we will define three terms in the context of this article: application object model, DTO, and XSLT transformations. We will be referring to these terms throughout the remainder of this paper.
Because the assumption is that there is an existing application that knows how to work with different data sources, it is also assumed that the application in question already unifies the data representation in one format. Considering the application is written in an object-oriented language and there is behavior added to raw data, we will refer to this common representation as an application object model.
Data in an XML document can be converted from one schema to another. For example, a medication record originating from different data providers can be represented in different structures, even though both are in XML. To represent the medication in a single XML form, a transformation is needed between the two formats. A transformation language called XSLT comes into play to enable such transformations. XSLT defines a set of rules that are then executed by an XSLT engine to transform one XML format to another.
A Data Transfer Object (DTO) can be thought of as a stripped-down version of the object model instance. DTOs are commonly used in data transformations to simplify the data model and minimize the time it takes to serialize and transfer the data.
When working with different data sources in legacy systems, it is safe to assume that the application knows how to work with separate data sources under one object model. One data source might be a flat file, while the other is a DataSet returned from a service. The purpose of the data serialization step is to generate an XML document containing all the necessary health data that can be further processed by the XSLT rules to create the CCD document. Therefore, the objective is to extract the data from the application object model into an XML document.
The natural tendency would be to serialize the current object model to XML using XML serialization. However, there might be fields/child objects that are not relevant to the CCD mapping. In order to simplify the object model, Netsoft USA has created a plug-in to Microsoft Visual Studio that creates simple Data Transfer Objects (DTOs) from the object model at design time. Subsequently, the DTO model is serialized to XML, giving us an XML representation of the plain data we are exporting to the CCD format.
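The serialization step can be illustrated with a short Python sketch. It is hand-written and is not the output of the Visual Studio plug-in mentioned above. The class names, field names, and XML element names are hypothetical. The point is only that the DTO drops fields that are irrelevant to the CCD mapping before the data is written to XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields
from typing import Iterable, List


@dataclass
class MedicationDto:
    """Plain data only: the fields the CCD mapping needs."""
    name: str
    code: str
    code_system: str


class Medication:
    """Stand-in for a richer application object with extra state."""

    def __init__(self, name: str, code: str, code_system: str,
                 audit_trail: List[str]):
        self.name = name
        self.code = code
        self.code_system = code_system
        self.audit_trail = audit_trail  # not relevant to the CCD mapping

    def to_dto(self) -> MedicationDto:
        return MedicationDto(self.name, self.code, self.code_system)


def serialize(dtos: Iterable[MedicationDto]) -> str:
    """Serialize DTOs to an intermediate XML document for the XSLT step."""
    root = ET.Element("HealthRecord")
    meds = ET.SubElement(root, "Medications")
    for dto in dtos:
        element = ET.SubElement(meds, "Medication")
        for field in fields(dto):
            ET.SubElement(element, field.name).text = str(getattr(dto, field.name))
    return ET.tostring(root, encoding="unicode")
```

The resulting XML is deliberately flat and schema-free; all CCD-specific structure is left to the mapping step that follows.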
Data Mapping
The Continuity of Care Document (CCD) is a very elaborate formatting system with different rules and constraints. Although based on XML, its constraints require a certain structure that might grossly differ from the data representation coming from the various data sources. Again, architects trying to solve this problem might be tempted to reduce it to data mapping from one system to another, much like mapping a flat text file to a database table. As tempting as this may be, certain complexities should be taken into account before delving into implementation.
Figure 3: Object Model Serialization to XML
One such complexity is data semantics. Most healthcare data is codified using different coding systems. Therefore, applications that consume data coming from different systems need to be code-system aware. To illustrate, imagine a bank transaction system that transfers money from one account to another by simply looking at the amount being transferred and disregarding the currency (Euros versus Dollars, for example). Such a system might work in some contexts, but it would not be feasible for international transactions. The same problem exists when dealing with health information. There is a large set of coding standards, such as SNOMED, NDC for medications, and CPT and ICD-9/10 for procedures.
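In code, being code-system aware can be as simple as never letting a code travel without its coding system, much as an amount should never travel without its currency. The following Python sketch is a hypothetical illustration; the type and function names are not taken from any real library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodedValue:
    code: str
    system: str  # for example "NDC" or "SNOMED"


def same_concept(a: CodedValue, b: CodedValue) -> Optional[bool]:
    """Compare two coded values.

    Returns True or False when both use the same coding system, and None
    when the systems differ, because the codes cannot be compared without
    a crosswalk between the two systems.
    """
    if a.system != b.system:
        return None
    return a.code == b.code
```

Returning None instead of False for mismatched systems keeps "different" and "not comparable" apart, which matters when the result feeds a duplicate-elimination decision.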
To create the CCD document from the XML data, an XSLT document should be defined that contains a set of transformations. In essence, XSLT documents are just text and can be edited using a simple text editor. Using an XML-based mapping tool like Altova, however, greatly streamlines the mapping process. One thing to consider when defining an XSLT document is the XSLT version. Currently, there are two versions: 1.0 and 2.0. Although XSLT 1.0 is capable of running basic transformations, in our experience XSLT 2.0 includes a set of string and date processing functions that make the mapping much easier. One downside of using XSLT 2.0 is that the XSLT engine built into .NET 3.0/3.5 does not support XSLT 2.0 functions, so a third-party XSLT engine is required. There are some open source engines and some proprietary ones. Depending on your choice, you might have to tweak the XSLT document because some version- or product-specific functions cannot be migrated to other engines.
We believe that the true power of XSLT processing is the ability to have a master/child document structure, such that the general structure of the CCD mapping is in one master XSLT. The master XSLT in turn delegates individual sections (children) to other imported XSLT files that map to different sections of the CCD document. Child sections can be re-used under different master XSLTs that may be defined for different clients/scenarios. The above approach enables the application to:
- Modify the actual mapping logic in child sections without going through another release cycle.
- Enable re-use of child XSLT documents in different scenarios.
- Dynamically structure the CCD content by enabling/disabling sections.
Figure 4 illustrates the master-child structure of the XSLT document described above.
Figure 4: Master-Child XSLT Structure
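One possible way to enable and disable sections dynamically is to generate the master stylesheet from a list of child sections. The Python sketch below only builds the text of such a master XSLT. The child file names and template names are hypothetical, the master is reduced to a bare skeleton without the CCD header, and running the result still requires an XSLT engine.

```python
from dataclasses import dataclass
from typing import Iterable
from xml.sax.saxutils import quoteattr

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
CDA_NS = "urn:hl7-org:v3"


@dataclass
class ChildSection:
    href: str      # hypothetical child stylesheet file
    template: str  # hypothetical named template defined in that file
    enabled: bool = True


def build_master_xslt(sections: Iterable[ChildSection]) -> str:
    """Build a master stylesheet that imports and calls each enabled child."""
    active = [s for s in sections if s.enabled]
    # xsl:import elements must come before any other top-level element.
    imports = "\n".join(
        f"  <xsl:import href={quoteattr(s.href)}/>" for s in active
    )
    calls = "\n".join(
        f"      <xsl:call-template name={quoteattr(s.template)}/>" for s in active
    )
    return (
        f'<xsl:stylesheet version="2.0" xmlns:xsl="{XSL_NS}" xmlns="{CDA_NS}">\n'
        f"{imports}\n"
        '  <xsl:template match="/">\n'
        "    <ClinicalDocument>\n"
        f"{calls}\n"
        "    </ClinicalDocument>\n"
        "  </xsl:template>\n"
        "</xsl:stylesheet>\n"
    )
```

Because the mapping logic lives in the child files, a change to one section means editing that file only, and a different client or scenario amounts to a different list of sections passed to the builder.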
Communicating with HealthVault
After converting the data into the CCD XML format, the application needs to store the record in HealthVault. To accomplish this, we used Microsoft’s HealthVault SDK, which provides .NET libraries to interact with HealthVault. At this point in the process, the application simply attaches the XML CCD record to a request that exports the CCD document to HealthVault. For more details on how to perform a data export operation to HealthVault, see our related white papers on Netsoft-USA.com.
Caveats
Although we have outlined a six-step process here, there are still some things to keep in mind when working with CCD documents.
First, depending on the availability of the mapping information, it might not make sense to include some records in the CCD document. For example, a lab result of blood glucose level with no unit of measure may not only be ambiguous, but it also might lead to an incorrect diagnosis or improper treatment and could represent a potential danger to a patient’s health.
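A guard of this kind might look like the following Python sketch, in which lab results without a value or a unit of measure are left out of the export. The dictionary keys are hypothetical and stand in for whatever the intermediate data model actually uses.

```python
from typing import Dict, Iterable, List


def is_mappable(lab_result: Dict[str, object]) -> bool:
    """A lab result is exported only if it has both a value and a unit."""
    unit = str(lab_result.get("unit") or "").strip()
    return lab_result.get("value") is not None and bool(unit)


def mappable_results(results: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only the lab results that are safe to place in the CCD."""
    return [r for r in results if is_mappable(r)]
```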
Additionally, throughout the process of creating a CCD document, one must keep in mind the consumers of the resulting data. Particularly in multi-language nations, it might be sensible to provide a translation step at the very end of the process so that human-readable parts of the CCD are translated into the target language(s). A word of caution is needed here: if the target culture also requires changes in units and dates, the translation process should be moved closer to the data filtering step to accommodate such changes.
The generated document must conform to the CCD schema. One thing to consider is the schematic validation of the generated CCD document prior to an attempt to transmit it to HealthVault. This step will enable the application to catch any schematically erroneous documents and possibly save processing power and bandwidth involved in creating a connection to the HealthVault service, transmitting the data, and receiving a schema exception from HealthVault.
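Full schema validation needs a schema-aware XML library, which the Python standard library does not include. As a cheaper stand-in, the sketch below runs only a structural pre-check before transmission: the document must be well-formed and its root must be the ClinicalDocument element in the HL7 v3 namespace. It is not a substitute for validating against the CCD schema, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

CDA_NS = "urn:hl7-org:v3"


def preflight_errors(ccd_xml: str) -> List[str]:
    """Return a list of structural problems; an empty list means 'go ahead'."""
    try:
        root = ET.fromstring(ccd_xml)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    errors = []
    if root.tag != f"{{{CDA_NS}}}ClinicalDocument":
        errors.append(f"unexpected root element: {root.tag}")
    return errors
```

A document that fails even this check can be rejected locally, without opening a connection to the HealthVault service.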
