A data scientist’s views on data literacy
Kirk Borne on the value of universal understanding of data and its effect on society
Jeff Alford, SAS Insights Editor
It’s difficult to recall a time when we were more bombarded with facts, figures and statistics than during the coronavirus pandemic. Statistical curves (and the hoped-for flattening of them), infection rates and, sadly, mortality rates were difficult to compare, contrast and parse.
Our experiences with COVID-19 prove that being data literate is more important than ever. Without some understanding of the ways data can be presented, it’s difficult to distinguish between good and bad analyses. For instance, how would you know which of the differing reports to believe about how long the virus lives on different types of surfaces or even in the air?
Being data literate provides another weapon to help us understand and continue to battle COVID-19 and its effects on us all. Data literacy is just as important in a world that's now heavily influenced by generative AI technology.
We recently had a chance to pick Kirk Borne’s brain on this important topic of data literacy – what it is, why it’s important in our lives and how you can live a better life by being more data literate.
He is a data scientist and astrophysicist and has worked at global technology and consulting firm Booz Allen Hamilton since 2015. His roles include principal data scientist, data science fellow and executive advisor. He provides thought leadership, mentoring, training and consulting activities in data science, machine learning and AI across multiple disciplines. Previously, he was a professor of astrophysics at George Mason University for 12 years in the graduate and undergraduate data science programs. Prior to that, he spent nearly 20 years supporting data systems activities for NASA space science programs, including a role as NASA’s Data Archive Project scientist for the Hubble Space Telescope and as a contract manager in NASA's Astronomy Data Center and Space Science Data Operations Office.
What is data literacy?
Data literacy has several components, which together add up to someone becoming a data-literate person. One of the formal definitions states that data literacy is "the ability to read, work with, analyze, and argue with data." Being data literate means possessing an understanding of what data is and its characteristics (sources, types, formats and data features), data applications (for analysis, business intelligence, data science, decision support, artificial intelligence, automation and analytics), data techniques (such as pattern discovery, pattern recognition and prediction), and data communication (for instance, storytelling, evidence-based reasoning, decision support and visualization).
Why is data literacy gaining in importance and why are we hearing more about it now?
Data literacy is growing in importance for multiple reasons. I group these reasons into three categories:
- Individuals. There are huge career opportunities and numerous job openings. My own experience in teaching also shows me that most students become fascinated with the topic once they understand what it is and why it is important.
- Organizations. For organizations, there is enormous pressure to use their massive stockpiles of data for business insight, innovation and value creation. An organization's data is now one of its most valuable assets, and it is a renewable asset: the same data can be used and reused in different applications to fuel various projects and to enrich multiple value streams.
- Market forces. In addition, market forces are rewarding organizations that are data-driven and that have a data-literate workforce. Organizations that lag in these areas are also beginning to lag in competitiveness, recruiting top talent and market value.
What are good first steps to becoming data literate?
The first thing to do is to recognize that data is everywhere, that nearly everything is digital and that those digital things produce and consume data.
Examples include chatbots, online recommendations, autonomous vehicles, predictive modeling, predictive maintenance, fraud detection, claims processing, social sentiment analysis, fake news detection, facial recognition (a nice feature for touch-free login on your smartphone) and text message auto-complete, just to name a few.
Awareness of how much data and data applications permeate our daily life is the first step toward data literacy.
The next step is to realize that nearly every person, thing and activity in the world is producing data, and those data sources are the input to processes that are generating value (e.g., products, decisions and actions) for someone or for some organization and for almost every single industry, job, and market. Hopefully people can start to envision themselves as contributors to and consumers of data.
The third step to becoming data literate is that people must see that they can learn about and be part of the digital transformation of the world. I teach the broad concepts of data science, machine learning and AI to general audiences by comparing these "complex" things to the analogous normal cognitive abilities of pattern detection, pattern recognition and evidence-based decision-making.
My audiences are amazed that it is really that simple for them to reach the first level of understanding of what otherwise appear to be unreachable, complex topics. If those steps take place, then people are motivated to learn more. If that does not work, then I try to motivate them to read and view targeted content on these topics within the context of things that they personally care about. That could be health, finance, online shopping, sports, entertainment, recreation, travel, science, or anything.
For example, when I taught a graduate course on data science at George Mason University, I had a unit on geospatial databases and spatial analytics. As part of the material, I covered geographic information systems (GIS). GIS can be a highly technical topic for those unfamiliar with it, so I asked the students to complete a simple exercise: open a web browser, and search for "GIS geospatial" plus anything else that interests them (within the domain of science and technical topics, preferably), then report back what they found. I taught that course every year for over 10 years – each year, my students and I were always surprised and entertained by what we discovered.
How can we use data literacy as responsible citizens?
I taught a course on data ethics at GMU. I could just as easily have renamed the course "Data Literacy". In my instruction, I included excerpts from three books: How to Lie with Statistics, How to Lie with Maps, and Visual and Statistical Thinking. I chose these books to demonstrate how we can, either intentionally or accidentally, become producers or consumers of biased data results.
I used good and bad examples of charts, graphs and statistical results to demonstrate to my students how they should start to think about these things as responsible citizens. Responsible citizenship these days hinges on having some data, statistical and information literacy, in order to fight bias, misinterpretations and misleading hypotheses from uses of data in public discourse.
The famous author H.G. Wells said it best more than 100 years ago: "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." That statement would now include data literacy and analytic thinking. One of my most amusing exercises in that data ethics course was the first-day class activity. I asked the students to give me their reaction to a statement that I heard in the news more than 20 years ago from a famous politician, who said: "I am shocked that half the students in this country score below average on their standardized tests." This class exercise led to some very interesting conversations about statistics and averages (means, medians, and modes) in different types of data distributions.
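A small Python sketch can make the point of that classroom exercise concrete. It is only an illustration: the two score lists are invented, not data from the course, and `share_below` is a hypothetical helper written for this example. It compares the share of scores that fall below the mean with the share that fall below the median in a symmetric and in a skewed set of scores.

```python
import statistics


def share_below(values, threshold):
    """Return the fraction of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


# Hypothetical test scores, invented purely for illustration.
symmetric_scores = [50, 60, 70, 80, 90, 100]
skewed_scores = [40, 85, 88, 90, 92, 95]  # one very low score pulls the mean down

for name, scores in [("symmetric", symmetric_scores), ("skewed", skewed_scores)]:
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    print(
        f"{name}: mean={mean:.1f}, median={median:.1f}, "
        f"below mean={share_below(scores, mean):.0%}, "
        f"below median={share_below(scores, median):.0%}"
    )
```

In the symmetric list the mean and median coincide, so half of the scores sit below "average" either way. In the skewed list only one score in six falls below the mean, while half still fall below the median. Whether "half the students score below average" is surprising therefore depends on which average is meant and on the shape of the distribution.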
I must admit that whenever a new student approached me (in my role as undergraduate advisor for the data science degree program), asking me whether they should take my data ethics class or the general university ethics class, I told them that the general ethics class was fine but in my class I would teach them how to lie (humor intended!). They were hooked, and they signed up for my class every time! Specifically, I taught my students the various ways in which people and organizations can and do lie with statistics, whether intentionally or inadvertently. I explained to my students that I do this for three reasons:
- Help students to recognize statistical biases and fallacies in the world.
- Show them how to address these issues when they encounter them.
- Demonstrate how to avoid similar problems in their own data-related activities.
These exercises blended statistical literacy with data literacy because there are common biases in applications of data for statistics, data science, machine learning and AI.
How does data literacy affect the success of an organization?
Data literacy is an essential component of the larger concept of data democratization. Data democratization affects the success of organizations in at least five dimensions:
- Data awareness – employees grow in their awareness of the ubiquity and the types of data that the organization uses (or can use).
- Data relevance – employees begin to see the connection between the data and their own role in the business.
- Data literacy – employees learn to read, work with, analyze and argue with role-appropriate data sources.
- Data science – most (if not all) employees then learn how to gain insights and infer understanding from data (pattern discovery, pattern recognition, pattern exploration and pattern exploitation).
- Data imperative – employees ultimately realize that the inability to use and analyze data is crippling businesses (and potentially their own career longevity).
Do you think that businesses truly understand the importance and are offering data literacy training opportunities to their employees?
Many organizations are now at that stage, but many more are not. Fortunately, such programs are springing up everywhere. Those who are not yet on board need to see all of the benefits that can come from having a data-literate workforce. I have direct personal experience with this.
Several years ago, I was invited by a small company (under 100 employees) to present a two-day training course on data science, which actually covered the five dimensions of data democratization that I just described. What was impressive about this event was that the owners of the company required every one of their employees to attend, not only the technical and business staff.
One of the most excited attendees was the front office receptionist who loved all the new things she was learning. Those business owners truly understood the importance of this data literacy training opportunity for their employees and for their business. The validation came a couple of years later when they successfully sold their company to a larger corporation.
What cultural changes need to happen in society to address data literacy?
First, society needs to realize that data has value. Too often, data is presented as something invasive, destructive or far too complicated for the common person.
Second, there need to be more positive examples, such as data hackathons for social good, business analytics, and the examples in the palm of our hand (our smartphones). Data should be advocated in education, the news, business communications and normal conversation.
Third, there needs to be discussion about how businesses are creating jobs, markets, opportunities and new benefits to society with data.
Fourth, the education system must introduce numbers, stats, data, pattern detection and scientific hypothesis generation from evidence much more intensely, deliberately and creatively in all courses and curricula (at an age-appropriate level, of course) because the world is digital, and it will only become more digital.
Parting thoughts
I offer this reflection. Data permeates our daily lives through all conceivable digital technologies, handheld devices, business activities and personal activities. Through data, the world is computable.
The focus of data literacy should not be on the mathematics, the algorithms or the engineering. Instead, the focus should be on demonstrating that data science and analytics are universally appealing, data literacy is accessible and data fluency is achievable for all.
The democratization of data assets and data literacy is essential for all organizations. Teams of data-literate professionals have the power to understand numerous, diverse data sources, to understand what the data is telling them, and to drive new outcomes, successes and value for any organization. Data literacy is not a math skill – it is a life skill.