Four ways to increase genomic data diversity for research


There Is a Lack of Diversity in Genomic Data Used for Drug Development

To achieve the vision that precision medicine promises, we need to work towards “delivering the right treatments, at the right time, every time to the right person.” To realize this, we first have to acknowledge the limitations of the data typically used to determine the safety and efficacy of drugs in human populations. The majority of data used in drug development and approvals comes from populations that do not reflect the full heterogeneity of the patient population: study samples over-represent patients of European ancestry and those who seek care at academic medical centers. This lack of diversity matters most when we try to understand the genomic factors of disease and the predictive or prognostic factors in treatment outcomes. It is critical that we supplement clinical trial data with real-world data (including health records, imaging, genomics, proteomics and other omics data) drawn from diverse, representative populations. In doing so, we will be able to target therapies more precisely to well-defined segments of patients, rather than adopting a one-size-fits-all approach.


When exploring genomic data for drug development, however, we see that the majority of genomes used for research lack representation from areas of the world outside Europe, specifically Indigenous, African and Asian populations. Some efforts exist, such as the African Genome Variation Project, which has cataloged the genomic profiles of 100 individuals from each of 10 ethnic groups across 2.5 million genetic variants, and the Global Alliance for Genomics and Health (GA4GH), which is working on a framework to store, analyze and share genomic data among international researchers, but more are needed. Without these data, drug developers will continue to struggle to stratify diverse populations and to develop therapies targeted to patients’ genetic makeup.


A growing number of studies collecting genomic data from populations of non-European ancestry are identifying patterns of mutations that differ from those previously shown to be actionable, especially in oncology. For example, in metastatic non-small cell lung cancer (mNSCLC), researchers found that STK11 mutations were significantly more frequent in African Americans than in patients of European descent, and another study showed a greater percentage of Asian patients with epidermal growth factor receptor (EGFR) mutations.


“Historically, the people who have provided their DNA for genomics research have been overwhelmingly of European ancestry, which creates gaps in knowledge about the genomes from people in the rest of the world.” – NIH


Homogeneity in Cardiovascular Genomics Means Worse Outcomes


Cardiovascular medicine has been significantly impacted by the introduction of genetically driven therapies and care, including clinical genome sequencing, genetic risk scores, induced pluripotent stem cells and targeted therapies like PCSK9 inhibitors. Genetic testing in cardiovascular disease (CVD), from cardiomyopathies to arrhythmias, is already facilitating diagnoses, helping families understand hereditary risk, directing precision therapy and identifying patients for targeted therapy.


However, we know that the majority of the genomic data that informed these advances originated from European populations. One example is the 9p21 locus, which was one of the early targets for risk stratification in coronary artery disease. Despite hundreds of studies of 9p21 in populations of European descent, none have been replicated in African American or Latin American populations. Given that under-represented minorities, such as African American men and women, still have much worse outcomes from CVD than those of European descent, it is clear that there remains an unmet need for genomic data and research in these populations. If we don’t start incorporating genomic data from more diverse populations in clinical research now, the disparities we see in treatment outcomes will only be amplified in the coming years as companion diagnostics and therapies are approved.


Diversity in NAFLD Genomics Drives Understanding of Associated Diseases


In addition to disparities in cardiovascular disease, we see a higher burden of non-alcoholic fatty liver disease (NAFLD) among Hispanic patients than among other ethnic groups, and one meta-analysis reports that fewer than half of clinical trials in non-alcoholic steatohepatitis (NASH) capture ethnic and racial demographic data. There is high unmet need in NASH patients due to the complexity of the disease and the difficulty of identifying accurate biomarkers, leaving physicians to rely primarily on lifestyle measures like diet. The first genome-wide association study (GWAS) of NASH, published in 2008, included African American, European American and Hispanic populations and found that the I148M variant of the PNPLA3 gene was associated with greater hepatic lipid content; subsequent studies have backed this up. Since then, ~30 genetic mutations have been associated with disease and outcomes in NAFLD patients. Genomic data from more diverse populations will allow for better patient stratification and targeted therapy development in this complex disease.


Four Ways to Increase Diversity of Genomic Data


There are several ways that researchers and data vendors can start to address these disparities in genomic data availability:

  1. Establish trust between diverse populations and the research community to encourage data sharing: Working with traditionally marginalized groups to build their trust in the research community will advance development of therapies for these populations.
  2. Encourage investigators to selectively sequence samples from diverse populations: The research community needs to identify and selectively sequence populations with the highest unmet need so that researchers can access the data needed for therapy development in these populations.
  3. Use non-traditional sources of genomic data: Genomic data that come from clinical trial populations, academic medical centers or European registries will continue to lack diversity and carry single-source bias. Alternative sources of genomic data, such as distributed labs with wide geographic spread, could help expand the diversity of available data.
  4. Encourage diverse population inclusion requirements in protocols at all stages of drug development: FDA industry standards for diversity in clinical trials already have support. Let’s extend this to diversity in the genomic data used for research.


By including diverse populations in genomic studies, drug researchers will be able to better associate genomic variants with patient outcomes and ultimately help bring more individualized therapies to market that treat all populations.



The Propagation of Racial Disparities in Cardiovascular Genomics Research (2021)

A more-inclusive genome project aims to capture all of human diversity, Nature

Diversity in Genomic Research: Fact Sheet NIH

Towards equitable and trustworthy genomics research (2022)

Next generation sequencing in cardiovascular diseases NIH

Cardiovascular Precision Medicine in the Genomics Era

Integrating genomics with biomarkers and therapeutic targets to invigorate cardiovascular drug development (2021)

Lack Of Diversity In Genomic Databases Is A Barrier To Translating Precision Medicine Research Into Practice (2018)

Racial disparities in nonalcoholic fatty liver disease clinical trial enrollment: A systematic review and meta-analysis (2020)
Genetic Polymorphisms and Diversity in Nonalcoholic Fatty Liver Disease (NAFLD): A Mini Review (2023)

Neil Hanchard unravels the complexity of childhood diseases, aims to diversify genomics research (2023)
