There was a stark reminder for teams participating in the SAS Institute Safe Roads Competition that data science problems are real-world problems.
Final presentations before a panel of judges at the March event were delayed when one of the judges was held up by a road closure: a traffic accident had taken the life of a 60-year-old pedestrian only a mile from the Toronto event. Statistics represent more than simply data points.
That the winning team at the competition had taken that to heart was no small coincidence. George Brown College students Corinne Blanchett and Farid Rasi put their number crunching into real-world context: After drilling into datasets from Toronto Police Services (TPS) and fleet management company Geotab Inc., they visited a number of representative accident sites in person or online to glean more insights to bolster their conclusions. This included an in-depth onsite dissection of a fatal accident that tightly corresponded to their modeling and the hair-raising navigation of an east-end crossing made notorious by the data.
But it wasn’t only going those extra few miles that won the judges over. Corinne and Farid’s recommendations also differentiated them from the four other presenting teams. While others made valid and actionable proposals in the areas of targeted awareness and enforcement, and real-time traffic information and diversion, theirs went to the systemic design of the dangerous arteries: self-enforcing safety features like roundabouts and speed bumps, and shorter distances between controlled crossings. Where a person could fail, the road system mustn’t.
DEFINING THE BUSINESS PROBLEM
But most important to TPS data science lead Meghan Fotak, one of the event’s judges, was the team’s approach to discovering the business problem. A hypothesis-driven approach is, in essence, “Here’s a business issue. What can the data tell me about it?” Corinne and Farid’s data-driven approach, on the other hand, is: “What does the data tell me the business problem is?” Think of it as akin to the continuum of types of analytics, from descriptive through diagnostic and predictive to prescriptive analytics.
Like other teams, Corinne and Farid noted a significant overall downward trend in serious accidents. Running contrary to that trend, though, were the severity and frequency of accidents involving pedestrians 55 and older. Mapping accidents by region, traffic control, monthly volume, and a host of other variables, the pair outlined the best predictors of fatal accidents in this demographic: lack of traffic controls, high volume, and high acceleration.
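To make the idea of letting the data surface the problem a little more concrete, here is a minimal Python sketch of one way to spot a group whose trend runs against the overall one. It is not the team's actual method or tooling (the article does not describe either in detail). The records, the age-group labels and the helper names `slope` and `contrary_groups` are all hypothetical, and the counts are invented for illustration rather than drawn from TPS or Geotab data.

```python
from collections import defaultdict

# Synthetic, illustrative records only: (year, pedestrian_age_group, serious_accidents).
# These numbers are invented for the sketch and are NOT from TPS or Geotab data.
records = [
    (2014, "under_55", 120), (2015, "under_55", 112), (2016, "under_55", 101),
    (2017, "under_55", 95), (2018, "under_55", 84),
    (2014, "55_plus", 40), (2015, "55_plus", 43), (2016, "55_plus", 47),
    (2017, "55_plus", 50), (2018, "55_plus", 55),
]


def slope(points):
    """Least-squares slope of y against x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


def contrary_groups(rows):
    """Return the overall yearly trend and the groups trending the opposite way."""
    by_group = defaultdict(list)
    totals = defaultdict(int)
    for year, group, count in rows:
        by_group[group].append((year, count))
        totals[year] += count
    overall = slope(sorted(totals.items()))
    # A group is flagged when the sign of its trend differs from the overall sign.
    flagged = {}
    for group, points in by_group.items():
        group_trend = slope(points)
        if group_trend * overall < 0:
            flagged[group] = group_trend
    return overall, flagged


overall_trend, flagged = contrary_groups(records)
print(f"overall trend: {overall_trend:+.1f} accidents per year")
for group, trend in flagged.items():
    print(f"contrary trend in {group}: {trend:+.1f} accidents per year")
```

A descriptive pass like this only says where to look. Identifying predictors such as traffic controls, volume and acceleration would take the further diagnostic and predictive work the article goes on to describe.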
When they visited the site of an August 2018 fatal accident, they found a perfect storm of predictive conditions—a seniors’ safety zone a five-minute walk (longer for an older person) from the nearest controlled crossing in either direction, with a high volume of traffic building up speed between the distant pairs of traffic lights. On a rainy night like that of the accident, it’s almost inevitable that a senior would try to cross the street between controls. Since there is a strong likelihood of human failure, the road system itself must be engineered to mitigate that failure with more self-enforcing traffic controls.
Notice the progression from descriptive through prescriptive analytics. That’s exactly the journey real-world data scientists take every day, says Fotak.
ENGAGING THE NEXT GENERATION
TPS’s student engagement program is constantly working on projects with universities and colleges revolving around the service’s Public Safety Open Data Portal. The portal’s two-dozen-plus datasets provide open access to reports of crimes, traffic and boundaries for students and the public at large to parse and crunch and integrate with other open datasets, like Geotab’s anonymized and aggregated fleet telematic and GPS data, in search of insights or data science experience.
“Experience” is a key word here. The participants in the road safety challenge are not trained data scientists, nor are many of the users of the portal. Simple tools like SAS Visual Analytics are key to the citizen data scientist. They are learning how data science works, not how a software suite works. Transparency and intuitiveness are the most important features of the toolset.
Like TPS, SAS is continually engaging with the student community. SAS and its partners like TPS and Geotab sponsor hundreds of student events worldwide every year. We’re involved in data science program development at post-secondary institutions from college to post-doctorate levels all over Canada. We recognize the need for data science talent is growing—but the talent pool isn’t getting deeper.
According to 2018 figures from the University of California at Riverside, global demand for data scientists exceeded supply by 50 percent. Meanwhile, few universities offer data science degrees, especially at the undergraduate level, and their small graduating cohorts point to a widening gap. In the U.S. alone, UCR estimated there were 490,000 data science positions available in 2018, with fewer than 200,000 data scientists to fill them.
If your school has a programming opportunity to expose more students to what Harvard Business Review calls “the sexiest job of the 21st Century,” get in touch with us at Lindsay.Hart@sas.com to explore how we can develop it together. Meanwhile, you can explore the many student resources and free learning SAS offers here.