7 Challenges of Mastering Clinical Data Registries: Metadata Variety

This series outlines the Seven Informatics Challenges for Clinical Data Registries, the questions you should ask when addressing research data management, and how our RexStudy platform is engineered to ensure your research teams generate high-quality, reliable, and statistically sound data.

Building clinical data registries (CDRs) that support the acquisition, curation, and dissemination of clinical research data poses a number of unique challenges. For example, a center-level CDR system needs to accumulate data across multiple studies, time points, and data types. At the same time, it needs to support research operations workflows that are heterogeneous across projects. CDRs must also operate within a complex ecology of data sources, consumers, and governance, and that ecology creates a number of informatics challenges for delivering CDRs.

Informatics Challenge #1: Metadata Variety

Data elements and schemas that are varied and volatile should be treated differently from those that are homogeneous and stable. For example, the data models that define the output of measurement instruments (e.g., data collection forms and devices that generate measurement data files), along with the data those instruments generate, need to be treated differently from research operations data (e.g., studies, grants, and research staff).

A CDR may involve both:

  • Measurement data comprising tens of thousands of scientific variables
  • Operational workflows spanning dozens of tables with hundreds of columns

Questions you should ask before building your CDR

  • How will the system scale to a very large number of variables?
  • How will the system scale to data structures that vary across data types? For example:
      • Relatively flat (e.g., form data)
      • Deep hierarchies (e.g., derived results linked to protocol methods)
      • Highly interlinked structures (e.g., sensor data from experiments that present specific stimuli to participants)

How RexStudy handles Metadata Variety

RexStudy treats the varied, volatile instrument data differently from the more stable research operations data by using a mixed representation model. Instrument metadata and data are collected and stored as hierarchical files (JSON objects), while operational data is stored in database tables. The instrument data is then unpacked into database tables for exploration in data marts.
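
To make the mixed representation idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 and json modules. It is an illustration only and does not show RexStudy's actual schema or API: the table names (study, instrument, assessment, mart_value), the JSON layouts, and the function names are all hypothetical. Operational data lives in an ordinary table, instrument definitions and collected assessments are kept as JSON documents, and a separate step unpacks the JSON into a long-form table that a data mart could query.

```python
import json
import sqlite3

# Hypothetical instrument definition (metadata), kept as a JSON object.
INSTRUMENT = {
    "id": "phq2",
    "version": "1.0",
    "fields": [
        {"id": "little_interest", "type": "integer"},
        {"id": "feeling_down", "type": "integer"},
    ],
}

# Hypothetical completed assessment (data) for that instrument.
ASSESSMENT = {
    "instrument": {"id": "phq2", "version": "1.0"},
    "values": {"little_interest": 2, "feeling_down": 1},
}


def build_registry(conn):
    """Create an operational table, JSON document stores, and a mart table."""
    conn.executescript(
        """
        CREATE TABLE study (code TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE instrument (
            id TEXT, version TEXT, definition TEXT NOT NULL,
            PRIMARY KEY (id, version)
        );
        CREATE TABLE assessment (
            uid INTEGER PRIMARY KEY,
            study_code TEXT NOT NULL REFERENCES study(code),
            document TEXT NOT NULL
        );
        CREATE TABLE mart_value (
            assessment_uid INTEGER, field_id TEXT, value NUMERIC
        );
        """
    )


def load(conn):
    """Insert one study row and store the instrument and assessment as JSON."""
    conn.execute("INSERT INTO study VALUES (?, ?)", ("demo", "Demo Study"))
    conn.execute(
        "INSERT INTO instrument VALUES (?, ?, ?)",
        (INSTRUMENT["id"], INSTRUMENT["version"], json.dumps(INSTRUMENT)),
    )
    cur = conn.execute(
        "INSERT INTO assessment (study_code, document) VALUES (?, ?)",
        ("demo", json.dumps(ASSESSMENT)),
    )
    return cur.lastrowid


def unpack_to_mart(conn):
    """Unpack each JSON assessment into one mart row per instrument field."""
    rows = conn.execute("SELECT uid, document FROM assessment").fetchall()
    for uid, document in rows:
        doc = json.loads(document)
        ref = doc["instrument"]
        (definition,) = conn.execute(
            "SELECT definition FROM instrument WHERE id = ? AND version = ?",
            (ref["id"], ref["version"]),
        ).fetchone()
        # The instrument definition, not the table schema, drives the columns.
        for field in json.loads(definition)["fields"]:
            conn.execute(
                "INSERT INTO mart_value VALUES (?, ?, ?)",
                (uid, field["id"], doc["values"].get(field["id"])),
            )


def demo():
    conn = sqlite3.connect(":memory:")
    build_registry(conn)
    load(conn)
    unpack_to_mart(conn)
    return conn.execute(
        "SELECT field_id, value FROM mart_value ORDER BY field_id"
    ).fetchall()


if __name__ == "__main__":
    print(demo())
```

In this sketch, adding a new instrument or a new variable only means storing another JSON document; the relational tables do not change, which is the point of keeping the volatile metadata out of the fixed schema.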

Stay tuned for the next blog in this series: 7 Challenges of Mastering Clinical Data Registries: Schema Volatility.
