N3C, on the other hand, is accountable to thousands of researchers at hundreds of participating institutions, with a strong focus on transparency and reproducibility. Everything that users do in the interface, which uses Palantir’s GovCloud platform, is carefully preserved, so anyone with access can retrace their steps.
“This isn’t rocket science, and it’s not really new. It’s just hard work. It’s annoying, it has to be done with care, and we have to validate every step,” says Christopher Chute, a professor of medicine at Johns Hopkins who also co-directs N3C. “The worst thing we can do is methodically turn the data into junk that would give us wrong answers.”
Haendel points out that these efforts did not come easily. “The diversity of skills it took to get to this point, the perseverance, dedication and, frankly, brute force, is just unprecedented,” she says.
That brute force came from many different fields, a number of which are not traditionally part of medical research.
“Having everyone on board from all aspects of science really helped. During covid, people were much more willing to collaborate,” says Mary Boland, a professor of computer science at the University of Pennsylvania. “You could have engineers, you could have computer scientists, physicists, all these people who might not normally participate in public health research.”
Boland is part of a group that uses N3C data to investigate whether covid increases irregular bleeding in women with polycystic ovary syndrome. Outside of covid, most researchers would use insurance claims data to get a large enough data set for population-level analysis, she says.
Claims data can answer some questions about how a drug performs in the real world, for example. But these databases lack an enormous amount of information, including lab results, what symptoms people report, and even whether patients die.
Harvesting and cleaning
Outside of insurance claims databases, most health data collaborations in the United States use a federated model. Participants in these studies all agree to put their own data sets into a common format and then run queries posed by the collective, such as the proportion of severe cases per age group. Several international covid research teams, including Observational Health Data Sciences and Informatics (OHDSI, pronounced “Odyssey”), operate in this way, avoiding the legal and political problems of moving patient data across borders.
OHDSI, which was founded in 2014, has researchers from 30 countries who together hold records for 600 million patients.
“This allows each institution to keep its data behind its own firewalls, with its own data protections in place. It doesn’t require patient data to move back and forth,” says Boland. “It’s comforting for a lot of places, especially with all the hacking that’s been going on lately.”
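To make the federated idea concrete, here is a minimal sketch in Python of the example query mentioned above, the proportion of severe cases per age group. It is an illustration only, not how OHDSI or any real network implements it: the `Record` type, the `local_counts` and `combine` functions, the age-group labels, and the sample records are all hypothetical. Each site computes counts behind its own firewall, and only those aggregate counts, never patient-level rows, are sent to the coordinator.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical common format that every site agrees to use."""
    age_group: str
    severe: bool


def local_counts(records):
    """Runs inside one site; returns only aggregate counts per age group."""
    totals, severe = Counter(), Counter()
    for record in records:
        totals[record.age_group] += 1
        if record.severe:
            severe[record.age_group] += 1
    return totals, severe


def combine(site_results):
    """Runs at the coordinator; sums the per-site counts and computes proportions."""
    totals, severe = Counter(), Counter()
    for site_totals, site_severe in site_results:
        totals.update(site_totals)   # Counter.update adds counts
        severe.update(site_severe)
    return {group: severe[group] / totals[group] for group in totals}


# Made-up records for two sites; these lists never leave their "site".
site_a = [Record("18-39", False), Record("18-39", True), Record("65+", True)]
site_b = [Record("18-39", False), Record("18-39", False),
          Record("65+", True), Record("65+", False)]

proportions = combine([local_counts(site_a), local_counts(site_b)])
print(proportions)
```

The sketch shares counts rather than per-site proportions so that the coordinator can weight each site correctly by its number of patients. A real network would also need safeguards this sketch omits, such as suppressing very small counts that could identify individuals.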