As data scientists, there are five areas where, with some focus and commitment, we can deliver better results. These tips apply whether you’re an analytics veteran or someone relatively new to the cause, and we speak from both of those perspectives: Tina is a long-time analytics ambassador and team leader, while Alex brings the perspective of a recent recruit to the table.
TIP #1: TREAT DATA SCIENCE AS A TEAM SPORT
Do you feel like you’re treading water, having trouble delivering data science initiatives or scaling to absorb new stakeholders or use cases? That’s sometimes symptomatic of a data science team working in isolation from the rest of the business. Typical data science teams are small and scattered throughout the enterprise, organizationally and often geographically.
Data science teams have to get bigger to service the surge in enterprise use cases. But all that growth isn’t necessarily within the core data science team. The extended cast includes business stakeholders, data engineers, application developers, and more, and according to research firm Forrester, those extended data science teams will become bigger than software development teams in the next five years. For a data science team to provide maximum value, there must be buy-in from all stakeholders. Everyone has a significant role in the delivery of a shared analytic and artificial intelligence vision, and that requires effective collaboration among all these team members.
TIP #2: EMBRACE AUTOMATION
Data science taxes the intellect, places demands on the analytic and modeling centres of the brain, and is ultimately a rewarding and satisfying intellectual discipline. Too bad we spend as little as 20 per cent of our time on it, according to a study by CrowdFlower. Most of a data scientist’s time is spent collecting, organizing, and cleansing data sets.
Vendors like SAS offer numerous tools to automate the heavy-lifting part of the analytics job: data preparation and auto-tuning, model assessment and interpretation, and suggestion engines driven by AI.
These can improve productivity, expose extended team members to the analytics curve, and let you get back to the kind of intellectual work that led the Harvard Business Review to name data scientist “The Sexiest Job of the 21st Century.”
“One sometimes finds what one is not looking for,” said Alexander Fleming of his accidental discovery that mold kills bacteria, paving the way for the first antibiotic, penicillin. While a data science query generally starts with a hypothesis, sometimes the data itself raises questions of its own. This iterative process can lead down the road to insights no one had foreseen, or even sought.
Truly experimenting with data requires resources. Data science teams need access to all the relevant data, not just a sample or subset. AI and machine learning use data-hungry algorithms. More data means better models. Work with the extended team to make it easier to bring all the data to the table. A high-performance analytics engine is needed to practically run iteration after iteration of experimental data within reasonable run times.
Remain open to as many algorithms, tools and techniques as you can bring to bear on a dataset. There’s an element of Zen to finding the answer to a question you didn’t know was important to ask. Governance standards are essential to ensure the results can be trusted regardless of coding language.
TIP #4: USE IT OR LOSE IT
Ask a room full of data scientists to raise their hands if they’ve created a new model in the last few months, and watch the hands shoot up. Then ask them how many have seen their models go into production. See how few hands remain in the air. According to Forrester, it’s a common complaint among data science professionals: models are rarely, if ever, deployed.
Chief among the obstacles to moving from discovery to actionable insight is a disjointed workflow between the data science team and the IT team. Analytical models developed by the business side often have to be recoded to run in a production environment, a manual process that can take weeks or months, often resulting in missed business opportunities.
With SAS, you can use a common framework to deploy models developed in different languages, which makes the transition more efficient and effective. That is something your analytics platform should provide. Monitor performance: models decay because analytics is a dynamic science, and the data a model sees in production can drift away from the data it was trained on.
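One way to make "monitor performance" concrete is to compare the scores a model produced at training time with the scores it produces in production. The sketch below uses the population stability index (PSI), a common drift measure; it is only an illustration, not a SAS feature or the authors' method. The function name, the sample scores and the 0.25 review threshold are all assumptions, and scores are assumed to be non-empty lists of values between 0 and 1.

```python
import math


def population_stability_index(baseline, recent, bins=10):
    """Compare two samples of model scores, each assumed to lie in [0, 1]."""
    floor = 1e-6  # avoids log(0) when a bin is empty

    def proportions(scores):
        counts = [0] * bins
        for score in scores:
            # Clamp so that 1.0 (or a stray out-of-range score) lands in an edge bin.
            index = min(max(int(score * bins), 0), bins - 1)
            counts[index] += 1
        return [max(count / len(scores), floor) for count in counts]

    expected = proportions(baseline)
    actual = proportions(recent)
    # Every term is non-negative; the sum is 0 when the two distributions match.
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Hypothetical scores: training-time baseline vs. recent production scores.
baseline_scores = [0.1, 0.2, 0.3, 0.4]
recent_scores = [0.6, 0.7, 0.8, 0.9]
psi = population_stability_index(baseline_scores, recent_scores)
needs_review = psi > 0.25  # rule-of-thumb threshold; an assumption to tune
```

A scheduled job could run a check like this on each new batch of scores and flag the model for retraining when the index crosses whatever threshold the team agrees on.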
Balance accuracy and practicability. Remember the story of the Netflix Prize, a $1-million challenge launched in 2006 to improve the entertainment powerhouse’s recommendation algorithm. It took nearly three years for a team that included AT&T Labs researchers to cross the 10 per cent improvement threshold; in 2012, Netflix announced it wouldn’t use the winning algorithm, partly because of the engineering effort involved. The perfect model isn’t necessarily the best model.
TIP #5: GET ON BOARD WITH AI ETHICS
According to a recent study, Canadian companies are trailing much of the world in the adoption of artificial intelligence technology. Perhaps that shouldn’t be surprising, given the Canadian propensity to move carefully with new technologies. But we’re among the world’s leaders in ethical oversight of AI programs, with 73 per cent of companies creating ethics committees to monitor them, more than any of the 10 countries studied by Forbes Insights.
AI has an impact on the lives of everyone, not simply users or their customers. AI algorithms can be programmed with an inherent bias (one algorithm to predict criminal risk used in the U.S. was found to be inherently biased against African-Americans). AI depends on huge pools of often personally identifiable information, and makes or recommends decisions on everything from credit applications to hiring processes. It’s important to remember that analytics and AI aren’t applied in a vacuum.
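One simple way to look for the kind of bias described above is to compare error rates across groups. The sketch below computes, for each group, the share of people who did not have the predicted outcome but were still flagged by the model (the false positive rate). It is a minimal illustration with made-up records and a hypothetical function name, not a complete fairness audit; a large gap between groups is a signal to investigate, not a verdict.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """records: iterable of (group, predicted_positive, actual_positive) tuples."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_positive, actual_positive in records:
        if not actual_positive:
            negatives[group] += 1
            if predicted_positive:
                false_positives[group] += 1
    # Groups with no actual negatives are left out, since the rate is undefined.
    return {group: false_positives[group] / count
            for group, count in negatives.items()}


# Made-up records for illustration only: (group, model flagged?, outcome occurred?)
records = [
    ("A", True, False),
    ("A", False, False),
    ("A", True, True),
    ("B", False, False),
    ("B", False, False),
]
rates = false_positive_rate_by_group(records)
```

Here group A is wrongly flagged half the time and group B never, which is the kind of disparity an ethics committee would want surfaced before a model goes into production.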
Canada’s Treasury Board has taken a leadership role in online consultation over the use of artificial intelligence for its own applications, hoping to establish rules that will guide AI use for the private sector. It’s crucial for the data science community to advise and participate in the process, and apply its findings to maintain public faith.
Best of luck with all your data science initiatives! To learn more, consider reading the ebook Here and now: The need for an analytics platform.
About the Authors
Tina Schweihofer, Customer Advisory Pre-Sales Manager SAS Canada
Tina Schweihofer is passionate about helping people understand how high-performance analytics, coupled with the right data strategy, can deliver real business benefits. Tina leads a talented data sciences team that helps organizations across industries apply analytics to solve unique business problems using SAS.
Alexander Terado, Solutions Specialist, Data Sciences Team SAS Canada
Alex works with customers across Canada to help them transform their data into intelligence through the power of analytics. He enjoys solving complex problems across the entire analytics lifecycle and is a passionate advocate of data-driven decision making.