Privacy-First Research: How OpenSAFELY Transforms Medical Data Analysis
OpenSAFELY is revolutionizing medical research by protecting patient privacy while unlocking insights from healthcare data. Hannes Lowette talks to Eli Holderness about its secure, reproducible approach to research.
About the experts
Read further
Revolutionizing Research: How OpenSAFELY Protects GP Data While Unlocking Medical Insights
Hannes Lowette: Hi, I'm Hannes. I'm a principal consultant working for Axxes in Belgium, and I'm joined here today by Eli Holderness to talk about responsibly using GP records for research purposes. Eli, would you care to introduce yourself to the audience?
Eli Holderness: Hello! Yes, my name is Eli. I am a research software advocate at the Bennett Institute, based out of the University of Oxford. The main thing that the Bennett Institute does is design, build, and administer OpenSAFELY, which is what I'm going to talk about with Hannes today.
Hannes Lowette: Now, just for the reference of the audience, can you maybe explain a little bit what OpenSAFELY is and what it does, so that we have a little bit of an idea of what we're talking about?
Eli Holderness: The way I like to talk about it is that OpenSAFELY is both a philosophy of research, of data science being used for research purposes, and a set of tooling that enables that philosophy to be carried out. At the moment we have access to about 58 million rows of patient data in the UK, provided by GPs. If you want to use that data to draw insights, you want to make sure it's handled responsibly with respect to patients' privacy, and that the tooling is in place to make that easy.

That's up and running: papers are being published on the back of research done using the OpenSAFELY tooling. The philosophy is this: if you've got sensitive patient data and you want to run code against it, query it, do statistical analyses on it, then either you have to move that data out of whatever secure environment it's stored in, or you have to take the analysis to the data. OpenSAFELY takes the second approach: all of the research code that is ever run against this data happens inside a secure environment.
Hannes Lowette: That's really interesting to me, because the way I understand what you just said is: instead of giving people access to the data so that they can query it themselves and draw their own conclusions, you run their code against the data, and they never see the actual data. They only see the results of their analysis. Is that how it works?
Eli Holderness: Yes, pretty much. If you are a researcher and you want to write a study using OpenSAFELY, you contact us and get approval for your study. There are several governing bodies that we need to go and talk to about that; NHS England is one of them, and we have our own approvals process.

Once you've been through the approvals process and have been cleared to run that project, you get put in touch with our co-pilot team, which I support as first-line research support. You write some code on GitHub; you don't have to make the repository public until you've actually published any papers, but you write the code on GitHub.
The way that code ends up being run against the real data is that you submit it to us through our OpenSAFELY jobs server. You schedule a job, and then we take that code, some of which will be written in our domain-specific query language, which we wrote specifically for working with electronic healthcare records.

It is incredibly cool, and it is open source; I'd love to talk more about it. You write code in that query language, and that is eventually compiled to SQL that runs against the specific database we have access to in the secure back end. All of the code that makes that happen is open source as well.

Our jobs site, our job runner, runs all of these things, and that code then runs in the secure environment. We, as the OpenSAFELY team, have access to certain things within that secure environment: we've got some limited telemetry that we can get out of there, so we can know why jobs failed or whether they succeeded, things like that.
But we don't even have direct access to look at that data; it's super locked down. So your code has run, you've extracted data from the database into working memory, run transformations on it, run your analyses and statistical models, and then the results are aggregated and made available to output checkers. Output checkers are trained professionals, usually researchers themselves, who look through the results of what you've done, which might be tables and graphs and things like that, and check whether any of what you've output is disclosive, that is, could be used to identify individuals or small groups of individuals.

There are some absolutely hard and fast rules about how that's done, but there's also an element of judgment calls, such as whether it's okay that this number referred to only a really small number of patients because we expected that as part of the study design.

So that's a key part of the process, and output checking is standard across any kind of research that deals with sensitive data.
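As a rough illustration of the kind of hard-and-fast rule output checkers apply, here is a minimal sketch of small-number suppression. It is not the OpenSAFELY output-checking tooling, and the threshold and rounding values are assumptions chosen purely for the example.

```python
# Minimal sketch of small-number suppression for output checking.
# NOTE: illustrative only. This is not the OpenSAFELY tooling, and the
# threshold and rounding base are assumed values, not the project's policy.

SUPPRESSION_THRESHOLD = 7  # counts below this are withheld entirely (assumption)
ROUNDING_BASE = 5          # surviving counts are rounded to this base (assumption)

def suppress_small_counts(counts: dict[str, int]) -> dict[str, int | None]:
    """Withhold small counts and round the rest, so released aggregates
    cannot be traced back to individuals or very small groups."""
    safe = {}
    for group, count in counts.items():
        if count < SUPPRESSION_THRESHOLD:
            safe[group] = None  # suppressed: too few patients to release
        else:
            safe[group] = ROUNDING_BASE * round(count / ROUNDING_BASE)
    return safe

# Example: counts of a prescription by region, before release to the researcher.
raw = {"North": 1342, "South": 887, "East": 4, "West": 212}
print(suppress_small_counts(raw))
# {'North': 1340, 'South': 885, 'East': None, 'West': 210}
```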
Hannes Lowette: You can even see it in systems where you're trying to measure employee satisfaction, right? That's a place where I've run across this: if you start adding parameters to your queries, as soon as you get close enough to being able to identify one of your employees, like who has this one thing...
Eli Holderness: You have to shut that down.
Hannes Lowette: You have to shut that down. And basically that's what you have to do for records that are way more sensitive than employee satisfaction.
Eli Holderness: As far as I know, my own data is in that back end. With GP records in the UK, there are two main suppliers of electronic healthcare record software, TPP and EMIS, and in the past OpenSAFELY has had software running in both of those secure back ends.
Right now, we only have TPP available due to funding reasons. I'm not sure which of the two systems my GP record is on, and the data in my record doesn’t feel personal to me. But if I wanted to, I could recall things like the date and time of an appointment, a diagnosis I received, or a prescription I was given, and use that personal knowledge to explore and access my own records remotely.
But the thing is, I would never get that information back out, because the output checkers would say: that's not good, you can't be doing that. I could do it about myself, and maybe you could argue that I should be able to do that about myself, but I shouldn't be able to do it about, let's say, a politician in my country, some of whose medical history is publicly known.
So output checking is a sort of key part of how we preserve privacy of this data.
Hannes Lowette: And that at the moment is still done by humans.
Eli Holderness: Yes, at the time of recording it's still done by humans within the Institute. That doesn't scale infinitely, so we're looking at better ways to have that output checking be performed by a broader set of people. Having people outside the Institute involved also reduces the chance that anybody could think the checking is biased.

Because obviously we're also doing science here, so we want it to be as open and as democratic as possible.
Hannes Lowette: You're basing these studies on the general GP data available in that database. What kind of research is being done with it? What types of studies are being carried out using this data?
Eli Holderness: OpenSAFELY started during the early days of Covid, so at the moment we have permission for any studies to be run that are related to Covid. We have studies about things like rates of certain drug prescriptions during Covid, and rates of incidence of certain diseases linked to secondary conditions.

One of the primary arguments for having this data available for research in the first place during Covid was: do the vaccines work? Who's getting them? Do we see side effects? So lots of things like that. As we move away from the period when that was considered most critical, we're looking to broaden out. At the time of recording, we don't have permission for non-Covid-related research proposals to be granted with OpenSAFELY, but work is being done to make that happen.
Ideally, sometime this year or maybe next year — I can't say for sure — we'll have permissions in place for any kind of research you can think of. Basically, any question you believe could be explored using this kind of data. What’s interesting, though, is that this data isn’t clinical research data.
It's data that has been collected for the purpose of GPs being able to do their jobs and provide primary health care.
Hannes Lowette: Does that mean that maybe there is some data lacking in the system, or some data that is not really structured for the things that these researchers are doing with it?
Eli Holderness: Well, one of the interesting things is that, alongside this GP data, there are also linked data sets, like a renal registry data set maintained by specialists in that area. All of these data sets have different formats and different ways of accessing them. That means that when you write code against this data in the standard model, the one that isn't the OpenSAFELY model, where you're given direct access to a curated data set, your code is very bespoke: it can essentially only run on that one database and that one data set.

What OpenSAFELY has done is try to address the issue of working with all these different, often inconsistent data schemas. To tackle that, we've developed a domain-specific query language that compiles into the correct type of SQL and handles the data appropriately.
Hannes Lowette: What are some of the intricacies of this DSL that you created?
Eli Holderness: It's built on top of Python, which is just a choice we made because it's a research language. Obviously you have Python, R, and Stata as three of the most commonly used languages for doing statistical research, and out of those three, Python was the one the tech team were most familiar with.

That's because it's also a software industry language, in a way that R and Stata are much more specialized to their research domains. What it is, is something that looks very much like Python with some weird bits of magic sprinkled in: messing around with Python internals and such to make certain things work.
It's incredibly cool. The language is ehrQL, the Electronic Healthcare Record Query Language (pronounced "urcle," like it rhymes with "circle"). As a researcher, you write code in this language that looks a lot like Python. That code is then serialized into a query model, which is analyzed and ultimately transformed into a SQL query.
The type of database it runs against is defined in the configuration. For example, TPP uses a Microsoft SQL Server database, so a version of ehrQL is set up to generate SQL for that dialect when you submit code. But if another data provider uses PostgreSQL instead, the system can be configured to output SQL tailored for that database.
If OpenSAFELY were running in a different environment, it would generate a different SQL query tailored to that setup. The key idea is that you can write a query once and run it across multiple environments, and still get consistent, expected results.
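For a flavour of what that looks like in practice, here is a small dataset definition written in the style of ehrQL's published examples. Treat it as a sketch: the table and method names follow the public documentation at the time of writing but may not match the current release exactly, and the codelist file is invented for illustration.

```python
# Sketch of an ehrQL dataset definition, modelled on the public documentation.
# Names may differ from the current ehrQL release; the codelist is hypothetical.
from ehrql import create_dataset, codelist_from_csv
from ehrql.tables.core import patients, clinical_events

# Hypothetical codelist file; a real study would use a curated SNOMED codelist.
asthma_codes = codelist_from_csv("codelists/asthma.csv", column="code")

index_date = "2023-01-01"
dataset = create_dataset()

# Study population: adults within a plausible age range at the index date.
age = patients.age_on(index_date)
dataset.define_population((age >= 18) & (age < 110))

dataset.age = age
dataset.sex = patients.sex

# Most recent asthma-coded event on or before the index date, if any.
latest_asthma = (
    clinical_events.where(clinical_events.snomedct_code.is_in(asthma_codes))
    .where(clinical_events.date.is_on_or_before(index_date))
    .sort_by(clinical_events.date)
    .last_for_patient()
)
dataset.latest_asthma_date = latest_asthma.date
```

Submitted through the jobs server, a definition like this is serialized into the query model and compiled into whichever SQL dialect the target backend requires.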
Hannes Lowette: So you could theoretically, in the future, run the same programs that your researchers have written against different healthcare providers, possibly even in different countries. Or is that completely out of scope?
Eli Holderness: Who can say? One reason I like to separate OpenSAFELY into a research philosophy and a set of tools is because the philosophy—writing hypothesis-driven code, submitting it to be run, and receiving checked, aggregated results—isn’t specific to healthcare.
You could apply the OpenSAFELY model to pensions, for example, to analyze how people invest, and which groups tend to choose pensions with certain characteristics. That’s also sensitive data. We also have an initiative called Open Safe Schools, which uses the same research philosophy to explore education data—another domain with fragmented, bespoke datasets spread across the country.
The specific tools might change, since something designed for electronic health records might not be ideal for pension data. But the model has huge potential across different domains. A major challenge, though, is that if you’re committed to keeping data within its original environment, how do you handle linkage across different datasets?
That’s not something I know a lot about, but it’s a common issue in data research and trusted research environments (TREs). OpenSAFELY is sometimes described as a TRE, though it works a bit differently.
In most TREs, researchers are granted access to curated pseudonymous datasets. But the key question is—who curated that data? It’s not raw. There’s already, in technical terms, bias introduced by deciding what you're allowed to access.
If you don't need to view the data directly, you don't need to curate it first—and that changes the game.
Hannes Lowette: Does this whole process change how research is done? Since you don’t get access to the raw data, I imagine researchers have to adjust their approach—you can’t just click around and explore what you have.
Eli Holderness: There’s definitely a kind of culture shock for researchers new to OpenSAFELY. It’s a very different experience—you can’t just run quick exploratory queries and get instant feedback. That slower feedback loop is a trade-off; in some ways, it makes the system less usable.
But it also protects against practices like p-hacking—where a researcher starts with a hypothesis, doesn’t get the result they want, and starts slicing and dicing the data to make it fit. For example: excluding certain groups and coming up with a post-hoc justification for it, until the data finally seems to support the original hypothesis.
You can’t really do that with OpenSAFELY. Technically, you could try, but it would take forever. More importantly, all code used in published OpenSAFELY research must be made public. That transparency means anyone can review your logic and ask, “Why did you exclude these groups? What’s the scientific justification?” Saying “It made my hypothesis correct” isn’t good enough.
So OpenSAFELY enforces a truly hypothesis-driven approach. You have to clearly state your question and your expectations up front, and then see what the data says—without the option to quietly tweak things until the result looks right.
Hannes Lowette: Basically forces people to think about their hypotheses a lot harder.
Eli Holderness: I like to imagine that researchers are all thinking about their hypotheses really hard already, but yes, it does. Something this touches on is that research as a profession is a really difficult thing, because as a researcher, you want to do good science. That's why people got into doing research; nobody goes into epidemiology because they just, you know...
Hannes Lowette: Want to get paid.
Eli Holderness: As a researcher, you want to do good science—you don’t want to massage the data just to retrofit your hypothesis. But there’s also real pressure. If you don’t produce positive results, your university might stop funding you. You need a career to pay the bills.
Sometimes, the questions you’re asking lead to negative or null results—like showing there’s no link between a COVID vaccine and increased rates of pneumonia. That’s valuable knowledge, but it’s not flashy, and it’s harder to publish or gain recognition for.
So researchers often find themselves stuck between two forces: the need for publishable results to sustain their careers, and the reality that some hypotheses just don’t pan out. OpenSAFELY shifts the balance. It forces you to ask the right question up front and commit to a hypothesis-driven approach.
That’s a really interesting aspect of the platform.
Recommended talk: Language Games • Eli Holderness • GOTO 2024
Ensuring Secure, Scalable Medical Research with OpenSAFELY
Hannes Lowette: You mentioned that the code is written in your DSL, and then you execute it. How does that work compute-wise? Which servers does it run on? Who manages all that infrastructure? Who is funding it?
Eli Holderness: When you write ehrQL code as a researcher, you can test it locally. We have dummy data generation built in, so when you run your code, it uses simulated data. You can also create and plug in your own dummy dataset if you prefer. Everything executes locally in exactly the same way it does on our backend. That was a deliberate design choice: errors you'd see in the backend should also show up locally. Otherwise, you'd waste a lot of time.
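The principle of identical code paths with dummy data locally can be illustrated with a deliberately simplified sketch. This is not the actual OpenSAFELY tooling (which generates dummy data from the dataset definition itself); it just shows why running the exact same analysis function locally and in the backend catches errors early.

```python
# Sketch of the "same analysis code, dummy data locally" idea.
# NOTE: not the real OpenSAFELY tooling; purely an illustration of the principle.
import random
import pandas as pd

def make_dummy_patients(n: int = 1000, seed: int = 42) -> pd.DataFrame:
    """Generate a plausible-looking but entirely fake patient table."""
    rng = random.Random(seed)
    return pd.DataFrame({
        "patient_id": range(n),
        "age": [rng.randint(0, 100) for _ in range(n)],
        "has_asthma_code": [rng.random() < 0.1 for _ in range(n)],
    })

def analysis(patients: pd.DataFrame) -> pd.DataFrame:
    """The study code: identical whether it sees dummy or real data."""
    adults = patients[patients["age"] >= 18]
    return (
        adults.groupby("has_asthma_code")
        .size()
        .rename("n_patients")
        .reset_index()
    )

# Locally, the analysis runs against dummy data; in the secure backend the same
# function would receive the real extract, so any error surfaces in both places.
print(analysis(make_dummy_patients()))
```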
Since everything is local at this stage, you're never working with sensitive data, which keeps things privacy-compliant. From an infrastructure perspective, we make heavy use of GitHub Codespaces and encourage researchers to work within them.
When code runs in the real environment—our secure backend—it’s currently operating within TPP’s database service. We have a small virtual machine in there with all the necessary tooling. That includes a job scheduler and the executable code itself.
When executed, the code eventually compiles into SQL that runs against the target database. But there are multiple safeguards along the way. For instance, the code is first compiled to JSON, then recompiled, which allows us to validate certain properties of the final SQL query.
For example, we can ensure the query won’t return values below a certain threshold or flag results that might identify individual patients—like if a query only returns a single person. We’ve built in several layers of checks to control how code behaves during execution, ensuring both data privacy and integrity.
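The intermediate-representation step Eli describes can be sketched roughly as follows. The JSON structure and the checks below are invented for illustration; the real OpenSAFELY query model is considerably richer, but the shape of the idea (serialize, validate, then emit SQL) is the same.

```python
# Illustrative sketch of "compile to an intermediate form, validate, then emit SQL".
# The JSON structure and checks are invented; the real query model is far richer.
import json

query = {
    "table": "clinical_events",
    "filters": [{"column": "snomedct_code", "op": "in", "values": ["195967001"]}],
    "aggregate": {"function": "count", "group_by": ["region"]},
}

serialized = json.dumps(query)  # the form that travels to the secure environment

def validate(q: dict) -> None:
    """Reject query shapes that could return patient-level rows."""
    if "aggregate" not in q:
        raise ValueError("only aggregated outputs may leave the backend")
    if not q["aggregate"].get("group_by"):
        raise ValueError("grouping columns are required for disclosure checks")

def to_sql(q: dict) -> str:
    """Render the validated query description as a simple SQL string."""
    cols = ", ".join(q["aggregate"]["group_by"])
    where = " AND ".join(
        f"{f['column']} IN ({', '.join(repr(v) for v in f['values'])})"
        for f in q["filters"]
    )
    return f"SELECT {cols}, COUNT(*) FROM {q['table']} WHERE {where} GROUP BY {cols}"

restored = json.loads(serialized)
validate(restored)
print(to_sql(restored))
# SELECT region, COUNT(*) FROM clinical_events WHERE snomedct_code IN ('195967001') GROUP BY region
```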
Recommended talk: Reading Code Effectively: An Overlooked Dev. Skill • Marit van Dijk & Hannes Lowette • GOTO 2025
Privacy Protection and Output Checking
Hannes Lowette: Is that something that you check automatically? Like, hey, there's a query running here that's returning very little data.
Eli Holderness: Good question. I don't actually know. That would be a question for the platform team that builds and maintains it, which I'm sort of embedded within at the moment. But what you can do is go and look at the source code, because it's all open source, and figure it out. That's what I might go and do after this interview, because I'm interested in the answer.
Hannes Lowette: Because that seems like one of the easy ways that you could already infuse some machine learning into this.
Eli Holderness: Yes, that's low-hanging fruit. And there is discussion within this community of secure records research about using some kind of machine learning or automated output checking. Not to completely remove humans from the process, because I think there will always need to be human eyeballs on it at some point, but to reduce the load on them, make their job easier, and clear out some of those low-hanging-fruit issues.
Recommended talk: Unlocking the Web: Exploring WebAuthn & Beyond • Eli Holderness & Mark Rendle • GOTO 2023
Key Challenges: Data Mismatch, Privacy, and Trust
Hannes Lowette: There are always going to be humans involved, as you mentioned, because the data is very sensitive. But another thing you mentioned about the data is that it's not really collected for the purpose it's being used for at the moment, right? It's collected by GPs to keep patient records, but now we're using it for statistical research.
Eli Holderness: Yes.
Hannes Lowette: What are the difficulties with that? Because I'm assuming there's maybe some stuff missing here or there, or something else going wrong.
Eli Holderness: One thing that comes to mind is that both medications and diagnoses are represented using codes. Diagnoses typically use SNOMED codes, while some medications use a coding format known as dm+d. These codes are organized into hierarchical structures: each code exists within a broader system that groups related concepts together.
Take asthma, for example—there are codes for the general condition, and then increasingly specific ones as you go deeper into the hierarchy. The same goes for medications. You might have a broad code for a whole class of drugs containing paracetamol, and then much more specific codes for individual formulations.
In practice, GPs often don’t use the most specific codes. They might record a general code like “ibuprofen prescribed,” even if the patient actually received a specific type of Nurofen with a certain number of pills in the package. If you're a researcher trying to study that exact formulation, you won’t find it—because the data only captures the general ibuprofen code.
That’s a good example of a mismatch between research needs and clinical reality. Researchers might want highly granular data, but collecting it would be too burdensome for GPs, whose primary job is patient care—not detailed data entry. So that level of detail often just isn’t captured.
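A toy example makes the granularity problem concrete. The hierarchy below is invented for illustration; real SNOMED CT and dm+d hierarchies are much larger and are handled through curated codelists.

```python
# Sketch of why coding granularity matters for research.
# The hierarchy is invented; real SNOMED CT / dm+d hierarchies are far larger.
PARENT_OF = {
    "ibuprofen_200mg_16_pack": "ibuprofen_product",
    "ibuprofen_400mg_24_pack": "ibuprofen_product",
    "ibuprofen_product": "nsaid_class",
}

def ancestors(code: str) -> set[str]:
    """All broader concepts a specific code rolls up into."""
    seen = set()
    while code in PARENT_OF:
        code = PARENT_OF[code]
        seen.add(code)
    return seen

# A GP records only the broad product-level code...
recorded = ["ibuprofen_product", "ibuprofen_product"]

# ...so a study phrased at pack-size granularity finds nothing,
specific_hits = [c for c in recorded if c == "ibuprofen_400mg_24_pack"]

# ...while the same study phrased at product level still works.
broad_hits = [c for c in recorded
              if c == "ibuprofen_product" or "ibuprofen_product" in ancestors(c)]

print(len(specific_hits), len(broad_hits))  # 0 2
```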
Hannes Lowette: Right. So if you want to clean that up, or make sure that the data is more complete, you would have to have all the GPs participate in it.
Eli Holderness: You would have to make it easy and worthwhile for them to do so. At the moment there's no incentive, because that's not what the data is for, and nor, necessarily, should it be. You don't want to make GPs' lives even harder than they already are; general practice could use more funding and more support, as always.

Even though clinical research is vital and worth promoting, if it means GPs are less able to do their jobs, we can't make that trade-off, right?
Hannes Lowette: What are the biggest risks of this type of research? So, assuming that this becomes really popular, where might we have to intervene?
Eli Holderness: One thing we haven't talked about yet is that this data is owned and managed by GPs, but it belongs to patients, and GPs have a duty to respect patients' wishes for privacy, for safe usage, and for worthwhile usage of their data. If something that we do, whether that's the Bennett Institute, the research community as a whole, or the College of GPs, erodes or damages patients' trust, that's really bad.

In the UK, you can opt out of your GP record being used for research purposes; these are called type 1 opt-outs. I think there are about 3 million people who have opted out of their data being used for these research purposes. And the question is what happens if that number rises.
Hannes Lowette: Do you still have the data but just can't use it, or is it not in the system at all?
Eli Holderness: Good question. That's a question for TPP and EMIS, and I don't know the answer, but I know that you should not be using that data for research purposes. I think that data still has to be in the database, because GPs still need to access it; it just won't be made available to us, or there will be some flag on it that says we can't use it. That's my understanding.

But if the number of people who have chosen not to allow their data to be used rises because people don't trust that it's being used responsibly, we can't do science any more. If everybody opts out, we can't do science with that data any more.
Hannes Lowette: You'd devalue the data set.
Eli Holderness: So we have a team at the Institute, PPIE, patient and public involvement and engagement, to make it easy for people to come to us with concerns or questions and have those addressed, or be put in touch with the right person who can address them.

That's one scenario that could be very, very bad: if people feel that their data is not being respected, not being handled with the appropriate care and privacy, or not being used for good purposes. Part of what the approvals process for studies is about is making sure that people are not using patients' data in a way that could be harmful to them, or that isn't actually a public good, things of that nature. A failure in that process could be really, really bad.
Recommended talk: (Guitar) Strings Attached: From UTF-8 to EADGBE • Hannes Lowette • GOTO 2023
Secure, Reproducible Science with Sensitive Data
Hannes Lowette: Where do you see the biggest benefits?
Eli Holderness: Kind of the dream is that it does get used. We talked earlier about applying this philosophy to different domains, and to me that is the dream: wherever we have sensitive data, we have something like this in place to allow it to be used safely for research.

The Bennett Professor at Oxford is Ben Goldacre, who wrote Bad Science and Bad Pharma. He's my boss, a couple of levels up, and he has often talked about the idea that this could be the future of secure records research across lots of different domains. I would love to see that happen, because I think it makes it easier to do better science and more reproducible science.

And the philosophy and the set of tools don't all have to be built by people at the Bennett Institute. It could become the standard way of doing research with secure records. That, to me, is the dream. And then we use it to solve all the problems in the world forever, right? Because we can just do that with more data and by asking the right questions.
Hannes Lowette: I believe at least it will have a positive impact.
Eli Holderness: It's really nice to go to work every day (well, to go downstairs and work remotely) and feel like my job is doing something to make the world just a little bit better. That's a rare thing, and I really appreciate it.
Hannes Lowette: All right. Thanks so much for joining me today, and thanks for giving us an insight into what it is that you do: responsibly handling patient records and sensitive data. I, at least, feel reassured that my data isn't being used in ways it wasn't intended to be.
Eli Holderness: It was a pleasure.