7 Challenges of Mastering Clinical Data Registries: Interoperability and Data Repurposing

This series outlines the Seven Informatics Challenges for Clinical Data Registries, the questions you should ask when addressing research data management, and how our RexStudy platform is engineered to ensure your research teams generate high-quality, reliable, and statistically sound data.

Building clinical data registries (CDRs) that support the acquisition, curation, and dissemination of clinical research data poses a number of unique challenges. For example, a center-level CDR must accumulate data across multiple studies, time points, and data types while also supporting research operations workflows that vary from project to project. In addition, CDRs must operate within a complex ecology of data sources, consumers, and governance requirements, and that ecology poses a number of informatics challenges for delivering them.

Informatics Challenge #6: Data Reuse and Repurposing 

Research data accumulation and reuse require the ability to import and transform data from a variety of sources. Once data are acquired, a CDR must provide the capability to reorganize and transform them for different uses. On the output side, CDRs must provide both manual and programmatic methods for querying and exporting data.

Questions you should ask before building your CDR 

  • Does the system support complex queries needed in research environments?

  • Does the system provide an Application Programming Interface (API) for programmatic access?

  • Does that API support the complex query functionality?

  • How does the system address unpredictable processing loads on the transactional system imposed by ad hoc queries?

  • What mechanisms are available for ad hoc querying by data consumers?

  • How does the system support learnability by data consumers for self-service data access?

  • What is the strategy for interoperability with external EDC systems?

  • Is there a published standard for technical interoperability and for semantic interoperability?

RexStudy is designed to maximize the value and use of your data 

A powerful web API is essential for any research data repository. Programmatic access should include advanced query and data transformation functionality, not merely record iteration and retrieval. RexStudy uses a powerful open source query API, the HyperText Structured Query Language (HTSQL), which supports query, submission, and data transformation needs.
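For illustration, an HTSQL query over a hypothetical registry schema (the table and column names here are invented, not part of RexStudy) that returns each subject's code and visit count, restricted to female subjects, might look like:

```
/subject{code, sex, count(visit)}?sex='female'
```

Because an HTSQL expression is issued as an ordinary HTTP GET request, the same query that an analyst types interactively can also be called from a script, which is what makes the API useful for programmatic access.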

A systematic approach to exchanging and storing scientific variables generated by forms and measurement instruments across different systems can simplify data exchange. RexStudy uses an open source instrument definition model, Research Instrument Open Standard (RIOS), which supports portable instrument definitions, multiple languages, and multi-channel (e.g., web, SMS) data acquisition.
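As a rough sketch of the idea, a portable instrument definition in the RIOS style pairs instrument metadata with a typed list of fields. The property names below are illustrative assumptions; consult the published RIOS specification for the exact schema:

```json
{
  "id": "urn:example-intake-questionnaire",
  "version": "1.0",
  "title": "Example Intake Questionnaire",
  "record": [
    {"id": "age", "type": "integer"},
    {
      "id": "smoker",
      "type": {
        "base": "enumeration",
        "enumerations": {"yes": {}, "no": {}}
      }
    }
  ]
}
```

Because the definition is plain, declarative data rather than system-specific configuration, the same instrument can be rendered on multiple channels and exchanged between systems without rework.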

Configurable analytic data marts should be used to simplify the data models available to data consumers and reduce training costs. RexStudy allows data managers to configure multiple data marts that better fit users' mental models of their data sets. Data marts also offer a performance advantage: ad hoc queries can run against the analytic data marts, insulating the transactional system from unpredictable loads. Further, each data mart carries its own set of permissions, allowing, for example, a user to access a data mart free of protected health information (PHI) without having access to the transactional system where PHI is stored.

Transactional databases are optimized for proper storage of data, not for ease of search by end-users. They may contain a very large number of tables and columns. Generally, the more types of data you organize in a transactional database, the harder it becomes to teach data consumers how to query it effectively. Further, there is often tension between optimal ways of organizing data for different data consumers. Our solution is to re-organize the data into data marts that are configured for the specific querying needs of each class of data consumer.
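A minimal sketch of this reorganization, using SQLite and invented table names: the normalized transactional tables are joined into one wide, denormalized mart table that a data consumer can query without learning the transactional schema.

```python
import sqlite3

# Transactional schema: normalized tables (names invented for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE visit (id INTEGER PRIMARY KEY, subject_id INTEGER,
                        visit_date TEXT);
    CREATE TABLE measurement (visit_id INTEGER, name TEXT, value REAL);

    INSERT INTO subject VALUES (1, 'S001'), (2, 'S002');
    INSERT INTO visit VALUES (10, 1, '2023-01-05'), (11, 2, '2023-01-06');
    INSERT INTO measurement VALUES (10, 'weight_kg', 71.2),
                                   (11, 'weight_kg', 64.8);
""")

# Analytic data mart: one wide row per visit. Consumers query this table
# directly instead of navigating the joins of the transactional schema.
conn.executescript("""
    CREATE TABLE mart_visits AS
    SELECT s.code AS subject_code,
           v.visit_date,
           m.value AS weight_kg
    FROM visit v
    JOIN subject s ON s.id = v.subject_id
    LEFT JOIN measurement m ON m.visit_id = v.id AND m.name = 'weight_kg';
""")

rows = conn.execute(
    "SELECT subject_code, weight_kg FROM mart_visits ORDER BY subject_code"
).fetchall()
print(rows)  # [('S001', 71.2), ('S002', 64.8)]
```

In practice the mart would be rebuilt on a schedule from the transactional system, and a separate mart (with its own permissions and column selection) would be configured for each class of data consumer.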

Don’t miss the first five parts of this series:

Part 1: Metadata Variety

Part 2: Schema Volatility

Part 3: Workflow Variability

Part 4: Complex Data Provenance

Part 5: Security and Privacy

If you enjoyed this article, register to receive notification of our latest posts, webinars, white papers, and more using the form at the top of our DataBytes blog page.