Happy 2019! Hopefully your holiday season was hectic in only the best way, and you’re facing the new year with clear eyes, a clean slate, and a new sense of purpose.
Tradition has it that at the turn of the year, we make resolutions to improve ourselves, be better at being who we are and doing what we do. As data scientists, there are five areas where, with some focus and commitment, we can deliver better results this year.
These resolutions apply whether you’re an analytics veteran or someone relatively new to the cause, and we speak from both of those perspectives: Tina is a long-time analytics ambassador and team leader, while Alex brings the perspective of a recent recruit to the table.
The best resolutions to make are the realistic ones. We all have an unused gym membership or unread Gardening for Dummies in our closets because we didn’t approach our resolutions realistically. Remember the advice of the American Psychological Association: start small, change one thing at a time, and ask for support when you need it.
RESOLUTION #1: TREAT DATA SCIENCE AS A TEAM SPORT
Do you feel like you’re treading water, having trouble delivering data science initiatives or scaling to absorb new stakeholders or use cases? That’s sometimes symptomatic of a data science team working in isolation from the rest of the business. Typical data science teams are small and scattered throughout the enterprise, organizationally and often geographically.
Data science teams have to get bigger to service the surge in enterprise use cases. But all that growth isn’t necessarily within the core data science team. The extended cast includes business stakeholders, data engineers, application developers, and more, and according to research firm Forrester, those extended data science teams will become bigger than software development teams in the next five years. For a data science team to provide maximum value, there must be buy-in from all stakeholders. Everyone has a significant role in the delivery of a shared analytic and artificial intelligence vision and that requires effective collaboration among all these team members.
RESOLUTION #2: EMBRACE AUTOMATION
Data science taxes the intellect, places demands on the analytic and modeling centres of the brain, and is ultimately a rewarding and satisfying intellectual discipline. Too bad we spend as little as 20 per cent of our time on it, according to a study by CrowdFlower. Most of a data scientist’s time is spent collecting, organizing, and cleansing data sets.
Vendors like SAS offer numerous tools to automate the heavy-lifting part of the analytics job: data preparation and auto-tuning, model assessment and interpretation, and suggestion engines driven by AI.
These can improve productivity, expose extended team members to the analytics curve, and let you get back to the kind of intellectual work that led the Harvard Business Review to name data scientist “The Sexiest Job of the 21st Century.”
RESOLUTION #3: EXPERIMENT
“One sometimes finds what one is not looking for,” said Alexander Fleming of his accidental discovery that mold kills bacteria, paving the way for the first antibiotic, penicillin. While a data science query generally starts with a hypothesis, sometimes the data itself raises questions of its own. This iterative process can lead down the road to insights no one had foreseen, or even sought.
Truly experimenting with data requires resources. Data science teams need access to all the relevant data, not just a sample or subset. AI and machine learning use data-hungry algorithms. More data means better models. Work with the extended team to make it easier to bring all the data to the table. A high-performance analytics engine is needed to practically run iteration after iteration of experimental data within reasonable run times.
Remain open to as many algorithms, tools and techniques as you can bring to bear on a dataset. There’s an element of Zen to finding the answer to a question you didn’t know was important to ask. Governance standards are essential to ensure the results can be trusted regardless of coding language.
RESOLUTION #4: USE IT OR LOSE IT
Ask a room full of data scientists to raise their hands if they’ve created a new model in the last few months, and watch the hands shoot up. Then ask them how many have seen their models go into production. See how few hands remain in the air. According to Forrester, it’s a common complaint among data science professionals—models are only sometimes, if ever, deployed.
Chief among the obstacles to moving from discovery to actionable insight is a disjointed workflow between the data science team and the IT team. Analytical models developed by the business side often have to be recoded to run in a production environment, a manual process that can take weeks or months, often resulting in missed business opportunities.
With SAS you can use a common framework to deploy models developed in different languages to make the transition more efficient and effective—something your analytics platform should provide. Monitor performance: Models decay because analytics is a dynamic science.
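As a rough illustration of what monitoring for model decay can look like, the sketch below compares a model's accuracy in each scoring window against the accuracy measured at deployment and flags windows that have slipped too far. It is not a SAS feature or API: the function names, the 0.92 baseline, the 0.05 tolerance, and the sample data are all hypothetical, chosen only to show the idea.

```python
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed outcome."""
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


def decayed_windows(baseline_accuracy, windows, tolerance=0.05):
    """Return the labels of scoring windows whose accuracy fell more than
    `tolerance` below the accuracy measured at deployment time."""
    flagged = []
    for label, y_true, y_pred in windows:
        if baseline_accuracy - accuracy(y_true, y_pred) > tolerance:
            flagged.append(label)
    return flagged


# Made-up outcomes and predictions for two monthly scoring windows.
windows = [
    ("2019-01", [1, 0, 1, 1, 0, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]),
    ("2019-02", [1, 0, 1, 1, 0, 1, 0, 0, 1, 1], [0, 0, 1, 0, 0, 1, 1, 0, 1, 0]),
]

# Hypothetical baseline: the accuracy observed when the model went live.
flagged = decayed_windows(baseline_accuracy=0.92, windows=windows)
print(flagged)
```

In practice the right metric and tolerance depend on the business problem; the point is simply that a deployed model needs a scheduled check and an agreed threshold that triggers a review or a retrain.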
Balance accuracy and practicability. Remember the story of the Netflix Prize, a $1-million challenge launched in 2006 to improve the entertainment powerhouse’s recommendation algorithm. It took nearly three years for a team from AT&T Labs to cross the 10 per cent improvement threshold; in 2012, Netflix announced it wouldn’t use the winning algorithm, partly because of the engineering effort involved. The perfect model isn’t necessarily the best model.
RESOLUTION #5: GET ON BOARD WITH AI ETHICS
According to a recent study, Canadian companies are trailing much of the world in the adoption of artificial intelligence technology. Perhaps that shouldn’t be surprising, given the Canadian propensity to move carefully with new technologies. But we’re among the world’s leaders in ethical oversight of AI programs, with 73 per cent of companies creating ethics committees to monitor them, more than any of the 10 countries studied by Forbes Insights.
AI has an impact on the lives of everyone, not simply users or their customers. AI algorithms can be programmed with an inherent bias (one algorithm used in the U.S. to predict criminal risk was found to be inherently biased against African-Americans). AI depends on huge pools of often personally identifiable information, and makes or recommends decisions on everything from credit applications to hiring processes. It’s important to remember that analytics and AI aren’t applied in a vacuum.
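One simple, widely used first check for this kind of bias is to compare how often a model approves cases from different groups. The sketch below computes each group's approval rate and the ratio of the lowest rate to the highest; a ratio below 0.8 (the "four-fifths" rule of thumb) is often treated as a signal to investigate further. The group labels, sample decisions, and function names are hypothetical, and a single ratio is only a starting point, not proof that a model is fair or unfair.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Map each group to the share of its cases that were approved.

    `decisions` is an iterable of (group, was_approved) pairs.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        approved[group] += int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(decisions):
    """Lowest group approval rate divided by the highest.

    Returns 1.0 when no group has any approvals, since the rates are equal.
    """
    rates = approval_rates(decisions)
    highest = max(rates.values())
    if highest == 0:
        return 1.0
    return min(rates.values()) / highest


# Made-up model decisions for two hypothetical applicant groups.
decisions = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 5 + [("B", False)] * 5
)

ratio = disparate_impact_ratio(decisions)
needs_review = ratio < 0.8  # four-fifths rule of thumb
print(round(ratio, 3), needs_review)
```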
Canada’s Treasury Board has taken a leadership role in online consultation over the use of artificial intelligence for its own applications, hoping to establish rules that will guide AI use for the private sector. It’s crucial for the data science community to advise and participate in the process, and apply its findings to maintain public faith.
Best of luck with all your data science initiatives in the coming year.
About the Authors
Tina Schweihofer is a Customer Advisory Pre-Sales Manager at SAS Canada. She is passionate about helping people understand how high-performance analytics, coupled with the right data strategy, can deliver real business benefits. Tina leads a talented data sciences team that helps organizations across industries apply analytics to solve unique business problems using SAS.
Alexander Terado is a Solutions Specialist on the Data Sciences team of SAS Canada. Alex works with customers across Canada to help them transform their data into intelligence through the power of analytics. He enjoys solving complex problems across the entire analytics lifecycle and is a passionate advocate of data-driven decision making.