SAS data warehouse from Sciensano supports worldwide reuse of scientific research data about public health
Scientific research is crucial for public health. That’s why Sciensano, formerly the ‘Wetenschappelijk Instituut Volksgezondheid’ (Scientific Institute of Public Health), wanted to find a way to centralize all scientific research data. The institute therefore developed a data warehouse, healthdata.be, to help researchers collect, manage and analyze their data. Sciensano contracted Cronos Group and SBI Consulting to help with this development.
The new data warehouse was built as part of the e-Health 2013-2018 action plan. One of the action points was the standardization of all data collections for scientific policy research. ‘It means our employees and researchers can all work in a standardized way in terms of data quality, storage, analysis and reporting. These best practices mean other researchers can also reuse this data in various other research projects,’ says Johan Van Bussel, coordinator of Healthdata at Sciensano.
Thorough consultation with researchers on data warehouse requirements
Sciensano organized various workshops with researchers to design the new architecture and associated features. Van Bussel: ‘This gave us a good view of what they needed, and meant we could list the requirements for data storage, analysis and reporting one step at a time.’
Secure and structured processes for data collection and storage
‘It goes without saying that we need to store all data extremely securely, both for patients and researchers. We therefore decided to build the data warehouse using SAS technology. The researchers themselves also indicated that they preferred SAS, because it’s ideal for standardizing data collection and storage processes and making the information available to users, all in a secure and anonymized environment.’
Because it is built on SAS, the data warehouse also lets researchers use R, a programming language familiar to many data scientists. And with Hadoop, users can easily process high volumes of unstructured data. Together with its IT partners, Sciensano also built a BI layer into the data warehouse to give researchers sufficient reporting options. The main advantage is that all users now work with the same method.
Linked data helps advancements in research
Bringing all the data collections together in one data warehouse means researchers can easily link projects together. Van Bussel: ‘We can for example link data from the cancer register to data from the social security data warehouse. This can help researchers to better answer questions about reintegration in the labour market after cancer treatment for instance.’
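As a rough illustration of what such a link involves, the sketch below joins two small record sets on a pseudonymized identifier. It is not Sciensano's implementation, which according to the article runs on SAS; the key, the field names and the sample rows are all invented for the example, and a keyed hash is only one possible way to pseudonymize an identifier.

```python
import hashlib
import hmac

# Hypothetical secret key, for illustration only. In a real setup a trusted
# party would hold the key and researchers would never see raw identifiers.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(patient_id: str) -> str:
    """Return a keyed SHA-256 hash so records can be linked without exposing the ID."""
    return hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(cancer_rows, employment_rows):
    """Inner-join two lists of dicts on their 'pseudonym' field."""
    employment_by_pseudonym = {row["pseudonym"]: row for row in employment_rows}
    linked = []
    for row in cancer_rows:
        match = employment_by_pseudonym.get(row["pseudonym"])
        if match is not None:
            # Merge the two records into one combined row.
            linked.append({**row, **match})
    return linked


# Invented sample data standing in for the two sources named in the article.
cancer_register = [
    {"pseudonym": pseudonymize("A001"), "treatment_end": "2015-03"},
    {"pseudonym": pseudonymize("A002"), "treatment_end": "2016-07"},
]
social_security = [
    {"pseudonym": pseudonymize("A001"), "back_at_work": "2015-11"},
]

linked = link_records(cancer_register, social_security)
```

Only the patient present in both sources ends up in `linked`, with the treatment and employment fields side by side. That combined row is the kind of record a researcher would need to study the return to work after treatment.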
Other projects involve data from patients with rare diseases, as well as more common conditions such as diabetes. These illnesses have a big impact on healthcare, and through its data collections Sciensano makes a fundamental contribution to the work of researchers and government institutions.
Unique global platform for research data
In early 2014, Sciensano drew up a short-term plan for 42 projects. ‘In the meantime, we also have a list of more than 250 projects that want to make use of our new infrastructure,’ says Van Bussel. ‘It looks like we’re evolving into an international platform of health data for scientific research, which is unique in the world.’
For now, Sciensano needs to temper researchers’ expectations somewhat. ‘It will still take a while before all data collections can be reused, but we’re evolving towards a much simpler administration and great progress in healthcare,’ concludes Van Bussel.
‘SAS is ideal for standardizing data collection and storage processes and making the information available to users for further analysis.’ Johan Van Bussel, Coordinator of Healthdata, Sciensano
Sciensano – Facts & Figures
Research projects in production
Projects that want to make use of the infrastructure
Users currently working on the data warehouse