Everyone has experienced the plight of cold and flu season at some point during the winter months. Winter storms and cold weather in general keep us inside, where we trade germs more frequently. Often the germs that spread this way lead to hospitalizations. That’s where a research team at the University of Central Florida (UCF) comes in. The team used predictive analytics to show that winter storms have the greatest impact on infectious diseases – more than floods, thunderstorms or windstorms.
This information can help health care providers estimate and prepare for incoming patients. After the team won the fifth annual SAS Data Mining Shootout, I asked UCF team captain Jun Han a few questions about the innovative project.
1) You began by seeking an early-warning system for predicting diseases and took into account multiple data sources like weather, population and demographic data. How did you whittle all of this data down to draw conclusions?
Building an early-warning system for predicting diseases really follows the traditional steps of predictive modeling. We started with the discovery phase of analytics, doing a lot of data exploration to understand every data input. We then integrated all the information to find the best way to apply predictive analytics. That process involved handling missing data points, data quality issues and data transformation.
After that, we moved into model-building using data mining techniques and benchmarked the results of the different modeling approaches. We validated the models, and monitoring the results gave us feedback for the data preparation step. This learning cycle allowed us to increase our prediction accuracy by fine-tuning the models along the way.
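To make that cycle concrete, here is a minimal Python sketch of the same steps: fill in missing values, hold out validation data, and benchmark two candidate models. It is only an illustration under stated assumptions. The team's actual data, tools and models are not described in this interview, so the toy records, the temperature feature and both models below are hypothetical.

```python
from statistics import mean

# Hypothetical toy records: (weekly_low_temp_f, flu_admission). None marks a missing reading.
records = [
    (28.0, 1), (31.0, 1), (None, 1), (35.0, 1),
    (52.0, 0), (60.0, 0), (None, 0), (58.0, 0),
    (30.0, 1), (55.0, 0), (33.0, 1), (62.0, 0),
]

# Hold out the last four records for validation before any preparation,
# so nothing learned from them leaks into the training step.
train_raw, valid_raw = records[:8], records[8:]

# Data preparation: replace missing temperatures with the training mean.
fill = mean(t for t, _ in train_raw if t is not None)

def impute(rows):
    return [(fill if t is None else t, y) for t, y in rows]

train, valid = impute(train_raw), impute(valid_raw)

def majority_model(rows):
    """Baseline: always predict the most common label in the training data."""
    label = 1 if 2 * sum(y for _, y in rows) >= len(rows) else 0
    return lambda temp: label

def threshold_model(rows):
    """Predict an admission when the temperature is below the midpoint of the class means."""
    cold = mean(t for t, y in rows if y == 1)
    warm = mean(t for t, y in rows if y == 0)
    cutoff = (cold + warm) / 2
    return lambda temp: 1 if temp < cutoff else 0

def accuracy(model, rows):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(t) == y for t, y in rows) / len(rows)

# Benchmark both candidates on the held-out rows and keep the better one.
scores = {
    "majority": accuracy(majority_model(train), valid),
    "threshold": accuracy(threshold_model(train), valid),
}
best = max(scores, key=scores.get)
print(scores, best)
```

In a real project the validation scores would feed back into the preparation step, for example by trying a different way of filling missing values and benchmarking again.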
2) What benefits will come out of your predictive models – for health care providers and patients?
I now study statistics, and my undergraduate background is in economics, so a specialist in health care could give a more detailed answer to this question than I can. But it is apparent that our model can give some valuable predictions.
For a specified age group in a given area, within a given week, we can estimate a posterior probability of occurrence for each of 23 kinds of disease. This information will help health care providers allocate their resources in a more timely and efficient way. The model will also improve our understanding of how the emergence and spread of infectious diseases relate to patterns of human activity under different weather conditions. That, in turn, can help epidemiologists improve measures for disease prevention.
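As a rough illustration of what a per-disease posterior probability for one age group, area and week could look like, the Python sketch below estimates it from smoothed admission counts. This is an assumption made for illustration, not the team's model: the interview does not say how the probabilities were computed, and the records, the `posterior` helper and the three disease names are hypothetical stand-ins for the 23 diseases mentioned.

```python
from collections import Counter

# Hypothetical admissions: (age_group, region, week, disease).
admissions = [
    ("65+", "north", 3, "influenza"),
    ("65+", "north", 3, "influenza"),
    ("65+", "north", 3, "pneumonia"),
    ("0-4", "north", 3, "rsv"),
    ("65+", "south", 3, "pneumonia"),
]

# Three placeholder diseases keep the sketch short.
DISEASES = ["influenza", "pneumonia", "rsv"]

def posterior(rows, age_group, region, week, alpha=1.0):
    """Estimate P(disease | age_group, region, week).

    Uses the posterior mean under a symmetric prior with pseudo-count alpha,
    so diseases with no observed cases still get a small non-zero probability.
    """
    key = (age_group, region, week)
    counts = Counter(d for a, r, w, d in rows if (a, r, w) == key)
    total = sum(counts.values()) + alpha * len(DISEASES)
    return {d: (counts[d] + alpha) / total for d in DISEASES}

probs = posterior(admissions, "65+", "north", 3)
for disease, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{disease}: {p:.2f}")
```

A provider could rank the diseases by these probabilities for each group and week to decide where to direct staff and supplies first.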
3) What’s next on your analytical study path?
Given my statistics and economics background, I am interested in all kinds of statistical applications in business intelligence. I am a master’s student who will graduate in May 2012, and I am currently seeking a job. But I will keep learning how to develop data mining applications, especially social network analysis and text mining, because these methods and their corresponding data sources have developed rapidly in recent years.