Population genetics data delivers health breakthroughs using SAS® machine learning and advanced analytics
As one of the first community-based population health studies in the US, the Healthy Nevada Project launched in 2016 with three straightforward goals: Conduct sound science, improve health and save lives.
Now the nation’s largest such study, this groundbreaking health and genetics project is three for three.
Developed by Renown Institute for Health Innovation (Renown IHI), the Healthy Nevada Project offers genetic testing at no cost to Nevada residents who want to learn more about their health and genetic profile.
By combining genetic data, environmental data and individual health information, researchers and physicians are gaining new insights into population health, enabling personalized health care while improving the health and well-being of entire communities in Nevada.
“We have cases where people have told us, ‘Thank you so much, you saved my life,’ because they were able to have preventive surgery, or they found a treatable Stage I tumor because of the results of genetic testing. Those are the things we live for in this project.”
Jim Metcalf, Chief Data Scientist, Healthy Nevada Project
Predicting health outcomes with analytics
Painting an accurate portrait of an individual or population to help understand and anticipate health outcomes requires data representing many life factors. Those include genetics, socioeconomic backgrounds, physical environments, lifestyle behaviors and quality of health care.
“One of medicine’s most complicated questions is, how do you predict what someone’s health outcome is going to be?” says Joseph Grzymski, PhD, who serves as Principal Investigator of the Healthy Nevada Project, Chief Scientific Officer of Renown Health, and Research Professor of Computational Biology and Genetics at the Desert Research Institute. “It’s not just genetics, or your blood pressure or where you live. It’s trying to model all the impacting factors for diseases. The massive challenge of population health studies is to build better predictive models to understand why some people get sick and others don’t, why some live to be 90 and above, and determine what that magical equation is.”
Working in tandem with experts in environmental data at the Desert Research Institute, Renown Health fuels the project with de-identified electronic health records. Researchers supplement this with data from the Environmental Protection Agency (EPA), the US Census Bureau, birth and death records, and other data sources to build a population health portrait.
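The kind of linkage described above can be pictured with a minimal sketch. It assumes de-identified records carry only a study ID and a coarse area code, and that environmental readings are averaged per area before being attached. The field names (`study_id`, `zip3`, `pm25`) and all values are hypothetical and purely illustrative; the project's actual schema and tooling are not described in this article.

```python
from collections import defaultdict

# Hypothetical de-identified health records: a study ID and a coarse area code only.
health_records = [
    {"study_id": "P001", "zip3": "895", "diagnosis": "asthma"},
    {"study_id": "P002", "zip3": "891", "diagnosis": "none"},
    {"study_id": "P003", "zip3": "895", "diagnosis": "copd"},
]

# Hypothetical area-level air quality readings (made-up values, not EPA data).
air_quality = [
    {"zip3": "895", "pm25": 41.0},
    {"zip3": "895", "pm25": 35.0},
    {"zip3": "891", "pm25": 12.0},
]


def mean_pm25_by_area(readings):
    """Average the PM2.5 readings for each area code."""
    by_area = defaultdict(list)
    for reading in readings:
        by_area[reading["zip3"]].append(reading["pm25"])
    return {area: sum(values) / len(values) for area, values in by_area.items()}


def link_records(records, area_means):
    """Attach the area-level mean to each record; None if the area has no readings."""
    return [{**rec, "mean_pm25": area_means.get(rec["zip3"])} for rec in records]


linked = link_records(health_records, mean_pm25_by_area(air_quality))
for row in linked:
    print(row)
```

Joining on an area code rather than an address keeps the health records de-identified while still letting environmental exposure appear alongside each participant's outcomes.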
To form connections between participant genetic information and other health factors, data scientists apply machine learning and artificial intelligence capabilities to DNA results generated by Helix, a partner specializing in population genomics.
“We’re working to understand how environmental and other factors can help predict who may be at risk, allow for quicker diagnoses and encourage the development of more precise treatments,” says Jim Metcalf, Chief Data Scientist of the Healthy Nevada Project. “The modern statistical and machine learning methods, along with the intuitive data visualizations made possible by SAS, have been critical elements.”
“The underpinning of a population genetics study is access to data and then the ability to extract, transform and study the data for any of the myriad health outcomes we want to focus on.”
Joseph Grzymski, PhD, Principal Investigator, Healthy Nevada Project; Chief Scientific Officer, Renown Health
Early detection and prevention: ‘The things we live for’
In addition to using analytics to identify populations and subpopulations of people who already have a disease in common, project researchers also apply analytics to get in front of diseases before they manifest in individuals.
After a participant’s voluntary genetic testing, the team checks for risks for many serious genomic conditions, including the top three identified by the Centers for Disease Control and Prevention as medically actionable (CDC Tier 1):
- Hereditary Breast and Ovarian Cancer Syndrome, with increased risk for breast, ovarian, tubal and other cancers due to mutations in BRCA1 or BRCA2 genes.
- Lynch syndrome, a genetic predisposition to colorectal, endometrial, ovarian and certain other cancers.
- Familial hypercholesterolemia (FH), a high cholesterol condition caused by genetic mutations that can lead to a heart attack or stroke if left untreated.
Most individuals affected by these genetic risks aren’t aware they have them. “We have medically licensed genetic counselors who will call our participants if they have a particular mutation and inform them, so they can talk with their physician and make important health decisions,” Metcalf says.
Healthy Nevada Project participant Jordan Stiteler says the unexpected phone call saved her life.
Stiteler, a young mother, had family members who had suffered heart attacks and strokes at early ages. When she learned she carried the FH marker, she received guidance and support to help her make healthy lifestyle and medication choices. Soon several other family members joined the study to learn about their genetic risks.
Genetic screening also makes it possible to get in front of a cancer diagnosis. “The ideal is to detect these mutations prior to any kind of a tumor becoming untreatable,” Metcalf says. “We have cases where people have told us, ‘Thank you so much, you saved my life,’ because they were able to have preventive surgery, or they found a treatable Stage I tumor because of the results of genetic testing. Those are the things we live for in this project.”
“We use SAS to comb through, manipulate and extract 200 terabytes of genetics and health records data. Setting the right parameters, we can look through a billion-record table of physician notes with no problem.”
Jim Metcalf, Chief Data Scientist, Healthy Nevada Project
More data leads to greater understanding
From an initial 10,000 adult participants, the Healthy Nevada Project has grown to more than 52,000 individuals and expanded from northern Nevada to Las Vegas and its outlying areas in the southern part of the state.
According to Grzymski, more genome data from more people equates to greater statistical power and accuracy in understanding the links between who you are and your health outcomes. “The underpinning of a population genetics study is access to data and then the ability to extract, transform and study the data for any of the myriad health outcomes we want to focus on,” he says.
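One way to see why more participants means more precision: the standard error of an estimated proportion shrinks with the square root of the sample size. The sketch below uses the article's participant counts (10,000 initially, 52,000 as a lower bound today) and a hypothetical 1% carrier rate chosen only for illustration. It is a textbook calculation, not an analysis from the project.

```python
from math import sqrt


def standard_error(p, n):
    """Standard error of a sample proportion p estimated from n participants."""
    return sqrt(p * (1 - p) / n)


initial_n = 10_000    # initial cohort size, from the article
current_n = 52_000    # "more than 52,000", used here as a lower bound
carrier_rate = 0.01   # hypothetical rate, for illustration only

se_initial = standard_error(carrier_rate, initial_n)
se_current = standard_error(carrier_rate, current_n)

# The ratio depends only on the sample sizes: sqrt(current_n / initial_n).
improvement = se_initial / se_current
print(f"Estimate is about {improvement:.1f} times more precise")
```

Under these assumptions, growing the cohort from 10,000 to 52,000 tightens the estimate by a factor of about 2.3, whatever the underlying rate is.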
Providing the foundation for those efforts is a SAS platform, which the project runs in an on-premises, HIPAA-compliant computing environment.
“The strength of the language, the depth, everything that SAS brings has been rock solid,” Metcalf says. “We use SAS to comb through, manipulate and extract 200 terabytes of genetics and health records data. Setting the right parameters, we can look through a billion-record table of physician notes with no problem.”
An ongoing journey into population health
The Healthy Nevada Project continues to bring a variety of data sources to the table for insights into population health, including:
- Analysis of statewide data of all emergency room visits to provide an extensive view of why people visit the ER.
- Mining decades of data from EPA air quality monitors in Nevada’s Washoe Valley to determine links between wildfire smoke and population occurrences of respiratory diseases.
- Analyzing hospital COVID-19 data and data from EPA air quality sensors to identify a correlation between smoke from a heavy wildfire season and increases in COVID-19 cases. “The study found an approximate 18% increase in COVID case counts when people are breathing forest fire smoke, underscoring how environmental factors absolutely weigh into this project,” Metcalf says.
The team uses SAS statistical models and analyses to report results to hospital administrators and to submit research to scientific peers for review.
Healthy Nevada Project – Facts & Figures
200 terabytes of genetics and health records data
Increasing analytical power with SAS AI and machine learning
“The SAS platform has been the foundation bedrock of the Healthy Nevada Project,” Metcalf says. “We have immersed ourselves in the machine learning and AI procedures that SAS has and use those on a continual basis.”
For example, a hospital wanted to reduce the time patients spend in the post-anesthesia care unit, or stepdown room, after surgery. To understand why some patients required more time there, the Healthy Nevada Project used a variety of SAS procedures, including variable selection, to support machine learning. This allowed researchers to confirm some candidate causes as key factors and rule out others.
The researchers found that the top factors most directly contributing to time spent in the stepdown room were the anesthesia type used, the patient’s age and the patient’s relative health.
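The article does not say which SAS procedures or selection criteria were used, so the following is only a minimal sketch of the general idea behind variable selection: score each candidate factor by how strongly it is associated with the outcome, then rank them. It uses a simple filter-style ranking by absolute Pearson correlation, which is a swapped-in technique and not the project's method. The column names and all values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def rank_factors(columns, outcome):
    """Return (name, |r|) pairs sorted from strongest to weakest association."""
    scores = {name: abs(pearson(values, outcome)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy records, purely illustrative (not project data).
columns = {
    "general_anesthesia": [1, 0, 1, 1, 0, 0, 1, 0],   # 1 = general, 0 = other
    "age_years": [72, 35, 64, 58, 41, 29, 80, 50],
    "admitted_on_weekend": [0, 1, 0, 1, 1, 0, 1, 0],  # hypothetical distractor
}
minutes_in_unit = [95, 40, 85, 80, 45, 35, 100, 50]

ranking = rank_factors(columns, minutes_in_unit)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy data the weekend flag is unrelated to time in the unit and falls to the bottom of the ranking, which is the sense in which a candidate cause can be ruled out. A real analysis would also need to handle confounding, categorical variables with more than two levels, and interactions between factors.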
“The Healthy Nevada Project has elevated Nevada’s profile in doing cutting-edge research, using data to deliver evidence-based, publishable results in peer-reviewed scientific journals and databases,” says Grzymski. “The entire team is proud of the work we’ve delivered and its impact as we continue to understand what makes people sick or well and enable preventive care.”