Suicide is the second leading cause of death among youth in Canada, according to Statistics Canada, accounting for one-fifth of deaths of people under the age of 25 in 2011. The Canadian Mental Health Association states that among 15- to 24-year-olds the figure is an even more frightening 24 percent, the third highest in the industrialized world. Yet despite these disturbing statistics, the signals that an individual plans on self-injury or suicide are hard to isolate.
Canada Health Infoway (Infoway), a not-for-profit organization funded by the Canadian government to help improve health by accelerating the development, adoption and effective use of digital health across Canada, created the Data Impact Challenge to try to discover answers to problems like this. Infoway asked Canadian individuals and organizations to submit the questions they'd like to see answered, and then to vote on the submissions to choose the final group. Authorized users of Canadian health care data sets were invited to try to answer one, or more, of the challenge questions for an opportunity to earn a share in over $95,000 in awards. Submissions were then evaluated by a panel of data experts and academic judges.
Greg Horne, National Lead, Healthcare at SAS Canada, noted, "One of the things we constantly fight against in the health care space is the concept of 'I do nothing because I'm trying to boil the ocean with my analytics'. Paralysis by analysis." He promotes thinking about the challenges, understanding what the problem is, and then considering how data and analytics might help to solve it.
Infoway was thinking along those same lines, and put forth a set of specific questions as part of this year's Data Impact Challenge. “The move from paper to digital has created a critical mass of information that can be quickly accessed and analyzed to help inform the policies that help lead to better informed decision-making,” said Fraser Ratchford, Group Program Director at Canada Health Infoway. “The Challenge demonstrated that in just a short period of time, and with existing data, important health policy issues and questions can be answered.”
Horne heard about the challenge through his existing relationship with Infoway, and said he worked with them to determine how SAS could get involved. The question he and his team chose to address involved using social media posts to try to identify people aged 15 to 25 who had self-harm or suicide in mind.
The rationale for this question, according to Infoway:
With response rates to traditional surveys in decline, it's incumbent on governments and their partners to tap into new and emerging sources of publicly available data such as social media. Response rates to traditional surveys among youth, in particular young males, are some of the lowest among all age groups in the country.
Youth are active users of social media, with about 74 percent of Twitter users between the ages of 15 and 25. Use of social media outlets such as Facebook or Twitter as a data source would serve to augment existing survey and administrative data sources and allow for better analysis, contextualization and interpretation of the traditional incidence, prevalence and case-level data currently available. There's even a potential opportunity for these new sources to provide an early indication of possible trends to guide more formal surveillance activities.
“Analytics is engrained in so many aspects of our daily lives, from targeted coupons to determining who gets a mortgage loan to a simple online search," remarked Jos Polfliet, one of the lead data scientists tasked to work on this project. “Investing time and effort around such an important issue like the mental health of youth in Canada was a really rewarding exercise.”
How they did it
Team members Horne and Tim Trussell, Manager Presales Specialist, Data Sciences, both of whom have health care backgrounds, worked with data scientists Marie Soehl and Jos Polfliet, who did the programming. The team collected 2.3 million tweets and built a machine learning model to predict age, based on the open source PAN author profiling dataset. Using text mining software, they identified 1.1 million of those tweets as likely to have been authored by 13- to 17-year-olds in Canada. Their analysis made use of natural language processing, predictive modelling, text mining, and data visualization.
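To make the idea of predicting age from writing style concrete, here is a minimal sketch of a text classifier. It is not the SAS team's model: it assumes a simple Naive Bayes approach, and the class name, labels and training tweets are all invented for illustration. A real model would be trained on a labelled corpus such as the PAN author profiling dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of tweets
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy examples; labels are hypothetical age groups.
train_texts = [
    "ugh so much homework tonight and a math test tomorrow",
    "cant wait for prom, my mom finally said yes",
    "stuck in traffic on the commute to the office again",
    "mortgage renewal meeting then picking up the kids",
]
train_labels = ["teen", "teen", "adult", "adult"]

model = NaiveBayesText().fit(train_texts, train_labels)
print(model.predict("math homework is due tomorrow"))
```

With only four training tweets this is a toy, but it shows the principle Soehl describes: the words people use carry a signal about which age group they belong to.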
"With this project, the cool thing was how we were able to handle that amount of data," Soehl said. "Making sense of it was the difficult part." Insights into who's posting to a site, and then being able to backtrack and look at their previous behavior could lead to a better understanding of the population.
However, there were challenges. Ages are not revealed on Twitter, so the team had to figure out how to tease out the data for 13- to 17-year-olds in Canada. "We had a text data set, and we created a model to identify if people were in that age group based on how they talked in their tweets," Soehl said. "From there, we picked some specific buzzwords and created topics around them, and our software mined those tweets to collect the people."
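The buzzword-and-topic step Soehl describes can be sketched roughly as follows. This is an assumption-laden illustration, not the team's software: the topic names, trigger terms and function names are hypothetical, and a production system would use far richer text mining than simple pattern matching.

```python
import re

# Hypothetical topics and trigger terms, invented for this sketch.
TOPICS = {
    "low_mood": ["depressed", "hopeless", "worthless"],
    "sleep": ["can't sleep", "insomnia"],
    "bullying": ["bullied", "bullying"],
}


def compile_topics(topics):
    """Build one case-insensitive, whole-word pattern per topic."""
    return {
        name: re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
            re.IGNORECASE,
        )
        for name, terms in topics.items()
    }


def tag_tweet(text, patterns):
    """Return the sorted list of topics whose terms appear in the tweet."""
    return sorted(name for name, pat in patterns.items() if pat.search(text))


patterns = compile_topics(TOPICS)
print(tag_tweet("I feel hopeless and can't sleep", patterns))
print(tag_tweet("Great game last night", patterns))
```

Tagging each tweet with topics in this way makes it possible to collect the accounts that mention a topic, which is the "collect the people" step in the quote above.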
Another issue was the restrictions Twitter places on pulling data, though Soehl believes that once this analysis becomes an established solution, Twitter may work with researchers to expedite the process. "Now that we've shown it's possible, there are a lot of places we can go with it," said Soehl. "Once you know your path and figure out what's going to be valuable, things come together quickly."
The team looked at the percentage of people in the group who were talking about depression or suicide, and what they were talking about. Horne said that when SAS's work was presented to a Canadian audience working in health care, they said that it definitely filled a gap in their data, and that was the validation he'd been looking for. The team also won $10,000 for creating the best answer to this question, and donated the award money to two mental health charities: Mind Your Mind and Rise Asset Development.
That doesn't mean the work is done, said Jos Polfliet. "We're just scraping the surface of what can be done with the information.” Another way to use the results is to look at patterns and trends. "The data can tell us if specific regions in Canada have a problem, or help identify a specific school or time of year. Ultimately, it could allow for more targeted prevention campaigns, and help inform decision makers where the most effort and dollars are needed to treat those most at risk," said Polfliet.
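As a rough illustration of the regional view Polfliet mentions, the sketch below computes, for each region, the share of distinct users who posted at least one tweet tagged with a given topic. The record layout, region codes and function name are assumptions made for this example, and the data is invented.

```python
from collections import defaultdict


def topic_share_by_region(records, topic):
    """Share of distinct users per region with at least one tweet on `topic`.

    `records` is an iterable of (user_id, region, topics) tuples, where
    `topics` is the set of topic tags assigned to a single tweet.
    """
    users = defaultdict(set)    # region -> all users seen
    flagged = defaultdict(set)  # region -> users who mentioned the topic
    for user_id, region, topics in records:
        users[region].add(user_id)
        if topic in topics:
            flagged[region].add(user_id)
    return {
        region: len(flagged[region]) / len(ids)
        for region, ids in users.items()
    }


# Invented sample records, one per tweet.
records = [
    ("u1", "ON", {"low_mood"}),
    ("u1", "ON", set()),
    ("u2", "ON", set()),
    ("u3", "BC", set()),
]
shares = topic_share_by_region(records, "low_mood")
print(shares)
```

Counting users rather than tweets keeps one very active account from dominating a region's figure. The same grouping could be applied to a month or a school instead of a region.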
"It's just starting to show some of the capabilities and potential," Horne said. "There's a lot more that can be done, not only around the discovery of people at risk, but then how you manage people at risk, and what you do as a follow up."
He envisions the solution being used to find not only at-risk teens, but others too, like first responders or veterans who may be considering suicide. Privacy is a huge issue, however, especially if the solution is extended to other social media platforms where individuals can be identified. "The ethical implications still need to be worked out," said Horne.
The project is another example of using data for good instead of just to boost the bottom line. Organizations like SAS are looking for new ways to use analytics to make a difference. SAS was founded on the principle of using analytics to change the world. From fighting cancer and researching the Zika virus to changing the lives of women in Ghana by teaching them how to code, SAS has remained committed to helping solve critical humanitarian issues using data and analytics.
Using data and analytics to identify teens' suicide risk
Greg Horne, National Lead, Healthcare at SAS Canada describes how his team answered a crucial question: Can big data be used to predict the risk of suicide among Canadian youth? They dug into the data on Twitter and here's what they found.