Four ways to increase genomic data diversity for research

February 2, 2023

There is a Lack of Diversity in Genomic Data used for Drug Development

To achieve the vision that precision medicine promises, we need to work towards “delivering the right treatments, at the right time, every time to the right person.” To realize this, we first have to acknowledge the limitations of the data typically used to determine the safety and efficacy of drugs in human populations. The majority of data used in drug development and approvals comes from populations that do not reflect the full heterogeneity of key factors such as ancestry and care setting: study samples over-represent patients of European ancestry and those who seek care at academic medical centers. This lack of diversity is especially important if we are to understand the genomic factors of disease and the predictive or prognostic factors in treatment outcomes. It is critical that we supplement clinical trial data with real-world data (including health records, imaging, genomics, proteomics and other omics data) drawn from diverse, representative populations. In doing so, we will be able to target therapies more precisely to well-defined segments of patients, rather than adopting a one-size-fits-all approach.

When exploring genomic data for drug development, however, we see that the majority of genomes used for research lack representation from areas of the world outside of Europe, specifically Indigenous, African and Asian populations. Some efforts exist, such as the African Genome Variation Project, which has cataloged the genomic profile of 100 individuals each from 10 ethnic groups for 2.5 million genetic variants, and the Global Alliance for Genomics and Health (GA4GH), which is working on a framework to store, analyze and share genomic data among international researchers, but more are needed. Without these data, drug developers will continue to struggle to stratify diverse populations and to develop therapies targeted to patients’ genetic makeup.

An increasing number of studies collecting genomic data from populations that are not of European descent are identifying patterns of mutations different from those previously shown to be actionable, especially in oncology. For example, in metastatic non-small cell lung cancer (mNSCLC), researchers found that STK11 mutations were significantly more frequent in African Americans than in patients of European descent, and another study showed a greater percentage of Asian patients with epidermal growth factor receptor (EGFR) mutations.

“Historically, the people who have provided their DNA for genomics research have been overwhelmingly of European ancestry, which creates gaps in knowledge about the genomes from people in the rest of the world.” – NIH

Homogeneity in Cardiovascular Genomics Means Worse Outcomes

Cardiovascular medicine has been significantly impacted by the introduction of genetically driven therapies and care, including clinical genome sequencing, genetic risk scores, targeted therapies like PCSK9 inhibitors, and induced pluripotent stem cells. Genetic testing in cardiovascular disease (CVD), from cardiomyopathies to arrhythmias, is already facilitating diagnoses, helping families understand hereditary risk, directing precision therapy and identifying patients for targeted therapy.

However, we know that the majority of the genomic data that informed these advances originated from European populations. One example is 9p21, which was one of the early targets for risk stratification in coronary artery disease. Despite hundreds of studies of 9p21 in populations of European descent, none have been replicated in African American or Latin American populations. Given that under-represented minorities such as African American men and women still have much worse outcomes from cardiovascular disease than do those of European descent, it is clear that there remains an unmet need for genomic data and research in these populations. If we don’t start incorporating genomic data from more diverse populations in clinical research now, the disparities we see in treatment outcomes will only be amplified in the coming years as companion diagnostics and therapies are approved.

Diversity in NAFLD Genomics drives understanding of associated diseases

In addition to disparities in cardiovascular disease, we see a higher burden of non-alcoholic fatty liver disease (NAFLD) among Hispanic patients than among other ethnic groups, and one meta-analysis reports that fewer than half of clinical trials in non-alcoholic steatohepatitis (NASH) capture ethnic and racial demographic data. There is high unmet need in NASH patients due to the complexity of the disease and the difficulty of identifying accurate biomarkers, leaving physicians to rely primarily on lifestyle measures like diet. The first GWAS of NASH, in 2008, which included African American, European American and Hispanic populations, found that the genetic variant I148M was associated with greater lipid content, and subsequent studies have backed this up. Since then, ~30 genetic mutations have been associated with disease and outcomes in NAFLD patients. Genomic data from more diverse populations will allow for better patient stratification and targeted therapy development in this complex disease.

4 ways to increase Diversity of Genomic Data

There are several ways that researchers and data vendors can start to address these disparities in genomic data availability:

Establish trust with diverse populations and the research community to encourage sharing of data: Working with traditionally marginalized groups to enhance trust with the research community will advance development of therapies for these populations.
Encourage investigators to selectively sequence samples from diverse populations: The research community needs to identify the populations with the highest unmet need and prioritize sequencing their samples, so that researchers can access the data needed to develop therapies for these populations.
Use non-traditional sources of genomic data: Genomic data that come from clinical trial populations, academic medical centers or European registries will continue to lack diversity and carry single-source bias. Alternative sources of genomic data, such as distributed labs with wide geographic spread, could help expand diversity in available data.
Encourage diverse population inclusion requirements in protocols at all stages of drug development: FDA industry standards for diversity in clinical trials already have support. Let’s extend this to diversity in the genomic data used for research.

By including diverse populations in genomic studies, drug researchers will be able to better associate genomic variants with patient outcomes and ultimately help bring more individualized therapies to market that treat all populations.