Leveraging big data to predict the risk of suicide among Canadian youth
SAS Aces Canada Health Infoway’s Data Impact Challenge
By Suzanne Sprajcar Beldycki, Head of Communications, SAS Canada
Suicide is the second leading cause of death among youth in Canada, according to Statistics Canada, accounting for one fifth of deaths of people under the age of 25 in 2011. The Canadian Mental Health Association says that among 15 – 24 year olds the number is an even more frightening 24 percent – the third highest in the industrialized world. Yet despite these disturbing statistics, the signals that an individual plans on self-injury or suicide are hard to isolate.
Canada Health Infoway (Infoway) is an independent, not-for-profit organization funded by the federal government to help improve the health of Canadians by working with partners to accelerate the development, adoption and effective use of digital health across Canada. It created the Data Impact Challenge to try to discover answers to questions like this. Infoway first asked Canadian individuals and organizations to submit the questions they'd like to see answered, and then to vote on the submissions to choose the final group. Authorized users of Canadian health care data sets were then invited to answer any one or more of the challenge questions for an opportunity to earn a share of more than $95,000 in awards. Submissions were evaluated by a panel of data experts and academic judges.
Greg Horne, National Lead, Healthcare at SAS Canada noted, "One of the things we constantly fight against in the healthcare space is the concept of 'I do nothing because I'm trying to boil the ocean with my analytics'. Paralysis by analysis." He promotes thinking about the challenges, understanding what the problem is, and then considering how data and analytics might help to solve the problem.
Infoway was also thinking along those same lines, and put forth a set of very specific questions as part of this year's Data Impact Challenge. “The move from paper to digital has created a critical mass of information that can be quickly accessed and analyzed to help inform the policies that help lead to better informed decision-making,” said Fraser Ratchford, Group Program Director at Canada Health Infoway. “For instance, the Challenge demonstrated that in just a short period of time, and with existing data, important health policy issues and questions can be answered.”
Horne heard about the challenge through his existing relationship with Infoway, and said he worked with them to determine how SAS could get involved. The question he and his team chose to address was how to identify, through their posts on social media, people between the ages of 15 and 25 who had self-harm or suicide in mind.
The rationale for this question, according to Infoway:
With response rates to traditional surveys in decline, it is incumbent on governments and their partners to tap into new and emerging sources of publicly available data such as social media. Response rates to traditional surveys among youth, in particular male youth, are some of the lowest among all age groups in the country.
Youth are active users of social media, with about 74% of Twitter users between the ages of 15 and 25 years of age. Use of social media outlets such as Facebook or Twitter as a data source would serve to augment existing survey and administrative data sources and allow for better analysis, contextualization and interpretation of traditional incidence, prevalence and case level data currently available. There is even a potential opportunity that these new sources could provide an early indication of possible trends to guide more formal surveillance activities.
“Analytics is engrained into so many aspects of our daily lives, from targeted coupons to determining who gets a mortgage loan to a simple online search,” remarked Jos Polfliet, one of the lead data scientists tasked to work on this project. “Investing time and effort around such an important issue like the mental health of youth in Canada was a really rewarding exercise.”
The team comprised Horne and Tim Trussell, Manager Presales Specialist, Data Sciences, both of whom have healthcare backgrounds, and data scientists Marie Soehl and Jos Polfliet, who did the programming. They collected 2.3 million tweets and built a machine learning model, based on the open source PAN author profiling dataset, to predict age. Using text mining software, they identified 1.1 million of the tweets as likely to have been authored by 13 to 17 year olds in Canada. Their analysis made use of natural language processing, predictive modelling, text mining, and data visualization.
"With this project, the cool thing was how we were able to handle that amount of data," Soehl said. "Making sense of it was the difficult part." The insights into who is posting to a site, together with the ability to backtrack and look at their previous behaviour, could lead researchers to a better understanding of the population.
However, there were challenges. Ages are not revealed on Twitter, so the team had to figure out how to tease out the data for 13 – 17 year olds in Canada. "We had a text data set, and we created a model to identify if people were in that age group based on how they talked in their tweets," Soehl said. "From there, we picked some specific buzzwords and created topics around them, and our software mined those tweets to collect the people."
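The two steps Soehl describes, predicting an age group from writing style and then matching tweets against buzzword-based topics, can be illustrated with a minimal Python sketch. This is not the team's SAS implementation: the small naive Bayes classifier stands in for their age model, and the training sentences, labels, topic names and buzzwords are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9']+")

def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())

class NaiveBayesAgeModel:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical topics and buzzwords, chosen only for this example.
TOPICS = {
    "low_mood": {"depressed", "hopeless", "worthless"},
    "sleep": {"insomnia", "sleepless"},
}

def tag_topics(text):
    """Return the set of topics whose buzzwords appear in the tweet."""
    tokens = set(tokenize(text))
    return {topic for topic, words in TOPICS.items() if tokens & words}

def select_tweets(model, tweets, target_label="teen"):
    """Keep tweets predicted to be in the target age group that match a topic."""
    selected = []
    for tweet in tweets:
        if model.predict(tweet) == target_label:
            topics = tag_topics(tweet)
            if topics:
                selected.append((tweet, topics))
    return selected

# Invented training data; a real model would need a large labelled corpus.
TRAIN_TEXTS = [
    "omg homework is so boring lol",
    "lol cant wait for prom omg",
    "meeting with the mortgage broker today",
    "quarterly report due before the office meeting",
]
TRAIN_LABELS = ["teen", "teen", "adult", "adult"]

model = NaiveBayesAgeModel().fit(TRAIN_TEXTS, TRAIN_LABELS)
```

Calling `select_tweets(model, tweets)` on a list of tweet strings returns only the tweets that pass both filters, each paired with its matched topics. A model trained on four sentences is a toy; the point is only the shape of the pipeline.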
Another issue was the restrictions Twitter places on pulling data, though she believes that once this analysis becomes an established solution, Twitter may work with researchers to expedite the process. And, she said, "now that we've shown it's possible, there are a lot of places we can go with it. Once you know your path, and once you figure out what's going to be valuable, things come together quite quickly."
The team looked at the percentage of people in the group who were talking about depression or suicide, and what they were talking about. Horne said that when SAS's work went in front of a Canadian audience working in the healthcare field, they said that it definitely filled a gap in their data, and that was the validation he'd been looking for. The team also won $10,000 (which it donated to mental health charities mind your mind and Rise Asset Development) for creating the best answer to this question.
That doesn't mean the work is done, said Jos Polfliet. "We're just scraping the surface of what can be done with the information." Another way to use the results is to look at patterns and trends. The data can tell us if specific regions in Canada have a problem or help identify a specific school or time of year. Ultimately, it could allow for more targeted prevention campaigns, showing decision makers where the most effort and dollars are needed to treat those most at risk, he pointed out.
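As a rough illustration of the percentage calculation mentioned above, the following Python sketch counts the share of distinct users with at least one tweet on each topic. The input format (pairs of user ID and topic set), the function name and the topic labels are assumptions made for this example, not details of the team's analysis.

```python
from collections import defaultdict

def topic_share_by_user(tagged_tweets):
    """Return {topic: percentage of distinct users with a tweet on that topic}.

    tagged_tweets is an iterable of (user_id, topics) pairs, where topics is
    a (possibly empty) set of topic names assigned to one tweet.
    """
    all_users = set()
    users_by_topic = defaultdict(set)
    for user_id, topics in tagged_tweets:
        all_users.add(user_id)
        for topic in topics:
            users_by_topic[topic].add(user_id)
    if not all_users:
        return {}
    return {
        topic: 100.0 * len(users) / len(all_users)
        for topic, users in users_by_topic.items()
    }
```

Counting users rather than tweets keeps one prolific account from inflating a topic's share.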
"It's just starting to show some of the capabilities, and what is the potential," Horne added. "There is a lot more that can be done, not only around the discovery of people at risk, but then how you manage people at risk, and what you do as a population piece of work as a follow-up."
He envisions the solution being used to find not only at-risk teens, but others like first responders or veterans who may be considering suicide. Privacy is a huge issue, however, especially if the solution is extended to operate on other social media platforms where individuals can be identified. The ethical implications, Horne said, still need to be worked out.
The project pays homage to the data-for-good movement, in which organizations, instead of just using data to boost the bottom line, are looking for new ways to use analytics to make a difference. SAS was founded on the principle of using analytics to change the world. From fighting cancer and researching the Zika virus to changing the lives of Ghanaian women by teaching them how to code, SAS has remained committed to helping solve critical humanitarian issues using data and analytics.