PHILADELPHIA — Pleiotropy analysis, which provides insight into how individual genes result in multiple characteristics, has become increasingly valuable as medicine continues to lean into mining genetics to inform disease treatments. Privacy stipulations, though, make it difficult to perform comprehensive pleiotropy analysis because individual patient data often can’t be easily and regularly shared between sites. However, a statistical method called Sum-Share, developed at Penn Medicine, can pull summary information from many different sites to generate significant insights. In a test of the method, published in Nature Communications, Sum-Share’s developers were able to detect more than 1,700 DNA-level variations that could be associated with five different cardiovascular conditions. If patient-specific information from just one site had been used, as is the norm now, only one variation would have been identified.
“Full research of pleiotropy has been difficult to accomplish because of restrictions on merging patient data from electronic health records at different sites, but we were able to figure out a method that turns summary-level data into results that are exponentially greater than what we could accomplish with individual-level data currently available,” said one of the study’s senior authors, Jason Moore, PhD, director of the Institute for Biomedical Informatics and a professor of Biostatistics, Epidemiology and Informatics. “With Sum-Share, we greatly increase our abilities to unveil the genetic factors behind health conditions that range from those dealing with heart health, as was the case in this study, to mental health, with many different applications in between.”
Sum-Share is powered by bio-banks that pool de-identified patient data, including genetic information, from electronic health records (EHRs) for research purposes. For their study, Moore, co-senior author Yong Chen, PhD, an associate professor of Biostatistics, lead author Ruowang Li, PhD, a postdoctoral fellow at Penn, and their colleagues used the eMERGE network to pull seven different sets of EHRs to run through Sum-Share in an attempt to detect shared genetic effects across five cardiovascular-related conditions: obesity, hypothyroidism, type 2 diabetes, hypercholesterolemia, and hyperlipidemia.
With Sum-Share, the researchers found 1,734 different single-nucleotide polymorphisms (SNPs, which are differences in the building blocks of DNA) that could be tied to the five conditions. By contrast, when they used results from just one site’s EHR, they identified only one SNP that could be tied to the conditions.
Additionally, they determined that their findings were identical whether they used summary-level data or individual-level data in Sum-Share, making it a “lossless” system.
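To illustrate what “lossless” can mean in practice, the short Python sketch below uses a deliberately simplified stand-in: an ordinary least-squares slope for one genetic variant and one trait. It is not the Sum-Share algorithm, and the simulated sites, sample sizes, effect size, and function names are all hypothetical. It shows only that, for this simple model, seven sites sharing a handful of sums can reproduce the estimate obtained from pooling every patient row.

```python
import math
import random

def site_summary(genotypes, phenotypes):
    """Summary statistics a site could share instead of patient-level rows."""
    return {
        "n": len(genotypes),
        "sx": sum(genotypes),
        "sy": sum(phenotypes),
        "sxx": sum(g * g for g in genotypes),
        "sxy": sum(g * p for g, p in zip(genotypes, phenotypes)),
    }

def combine(summaries):
    """Add the per-site sums together; no individual records are needed."""
    keys = ("n", "sx", "sy", "sxx", "sxy")
    return {k: sum(s[k] for s in summaries) for k in keys}

def slope(s):
    """Least-squares slope computed from sufficient statistics alone."""
    return (s["n"] * s["sxy"] - s["sx"] * s["sy"]) / (s["n"] * s["sxx"] - s["sx"] ** 2)

# Simulated data (hypothetical): 7 sites, 200 patients each, genotype coded 0/1/2.
random.seed(0)
sites = []
for _ in range(7):
    g = [random.choice([0, 1, 2]) for _ in range(200)]
    p = [0.3 * x + random.gauss(0, 1) for x in g]
    sites.append((g, p))

# Route 1: each site shares only its summary statistics.
combined = combine([site_summary(g, p) for g, p in sites])
slope_from_summaries = slope(combined)

# Route 2: all individual-level rows are pooled in one place.
all_g = [x for g, _ in sites for x in g]
all_p = [y for _, p in sites for y in p]
slope_from_individuals = slope(site_summary(all_g, all_p))

# The two routes agree up to floating-point rounding.
print(math.isclose(slope_from_summaries, slope_from_individuals, rel_tol=1e-9))
```

The agreement holds here because the five sums carry everything this particular estimator needs. Whether and how the same property extends to Sum-Share’s own model is described in the published study, not in this sketch.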
To determine the effectiveness of Sum-Share, the team then compared their method’s results with those of the previous leading method, PheWAS. PheWAS works best when it can draw on whatever individual-level data has been made available from different EHRs. But when the two were put on a level playing field, with both allowed to use individual-level data, Sum-Share was statistically determined to be more powerful in its findings than PheWAS. And because Sum-Share’s findings from summary-level data have been determined to be as insightful as those from individual-level data, it appears to be the best method for determining genetic characteristics.
“This was notable because Sum-Share enables lossless data integration, while PheWAS loses some information when integrating information from multiple sites,” Li explained. “Sum-Share can also reduce the multiple hypothesis testing penalties by jointly modeling different characteristics at once.”
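The arithmetic behind a multiple hypothesis testing penalty can be sketched in a few lines of Python. The example assumes a Bonferroni correction, which the source does not name, and a hypothetical count of one million SNPs; only the figure of five conditions comes from the study. It compares testing every SNP against each condition separately with running one joint test per SNP.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

ALPHA = 0.05          # conventional family-wise error rate (assumption)
N_SNPS = 1_000_000    # hypothetical number of SNPs tested
N_CONDITIONS = 5      # the five conditions examined in the study

# One test per SNP per condition: 5,000,000 tests in total.
separate = bonferroni_threshold(ALPHA, N_SNPS * N_CONDITIONS)

# One joint test per SNP covering all conditions: 1,000,000 tests in total.
joint = bonferroni_threshold(ALPHA, N_SNPS)

print(f"separate tests threshold: {separate:.1e}")
print(f"joint tests threshold:    {joint:.1e}")
print(f"joint threshold is {joint / separate:.0f} times less strict")
```

Under these assumptions, fewer tests mean a less severe threshold for each one, which is the general sense in which joint modeling can lighten the penalty.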
Currently, Sum-Share is mainly designed to be used as a research tool, but there are possibilities for using its insights to improve clinical operations. And, moving forward, there is a chance to use it for some of the most pressing needs facing health care today.
“Sum-Share could be used for COVID-19 with research consortia, such as the Consortium for Clinical Characterization of COVID-19 by EHR (4CE),” Chen said. “These efforts use a federated approach where the data stay local to preserve privacy.”
This study was supported by the National Institutes of Health (grant number NIH LM010098).
Co-authors on the study include Rui Duan, Xinyuan Zhang, Thomas Lumley, Sarah Pendergrass, Christopher Bauer, Hakon Hakonarson, David S. Carrell, Jordan W. Smoller, Wei-Qi Wei, Robert Carroll, Digna R. Velez Edwards, Georgia Wiesner, Patrick Sleiman, Josh C. Denny, Jonathan D. Mosley, and Marylyn D. Ritchie.
Penn Medicine is one of the world’s leading academic medical centers, dedicated to the related missions of medical education, biomedical research, excellence in patient care, and community service. The organization consists of the University of Pennsylvania Health System and Penn’s Raymond and Ruth Perelman School of Medicine, founded in 1765 as the nation’s first medical school.
The Perelman School of Medicine is consistently among the nation's top recipients of funding from the National Institutes of Health, with $550 million awarded in the 2022 fiscal year. Home to a proud history of “firsts” in medicine, Penn Medicine teams have pioneered discoveries and innovations that have shaped modern medicine, including recent breakthroughs such as CAR T cell therapy for cancer and the mRNA technology used in COVID-19 vaccines.
The University of Pennsylvania Health System’s patient care facilities stretch from the Susquehanna River in Pennsylvania to the New Jersey shore. These include the Hospital of the University of Pennsylvania, Penn Presbyterian Medical Center, Chester County Hospital, Lancaster General Health, Penn Medicine Princeton Health, and Pennsylvania Hospital—the nation’s first hospital, founded in 1751. Additional facilities and enterprises include Good Shepherd Penn Partners, Penn Medicine at Home, Lancaster Behavioral Health Hospital, and Princeton House Behavioral Health, among others.
Penn Medicine is an $11.1 billion enterprise powered by more than 49,000 talented faculty and staff.