Personal data: Getting it right with GDPR
An interview with Jay Exum, Privacy Counsel at SAS
By Cindy Turner, SAS Insights Editor
Ninety percent of the world’s data was created in the last two years alone. More and more of this is big data streaming from connected IoT devices – a trend giving rise to an exponential increase in the amount of personal data.
Today, this data is easier to access and more affordable to process and store than ever before. As a result, many organizations gather significant amounts of personal data about individuals. But our ability to collect and store personal data has accelerated faster than our ability to think through how to manage and protect it.
To learn more about the definition of personal data, why it’s in the news and why it’s being tightly regulated by laws like the General Data Protection Regulation (GDPR), we interviewed Jay Exum, Privacy Counsel at SAS.
What does the term ‘personal data’ mean, and what is its legal definition according to the GDPR?
Instinctively, most people jump to categories when they try to understand personal data. For example, you might think of your business phone number versus your “more personal” Social Security number. That’s not how it works under the GDPR.
The General Data Protection Regulation uses an extremely broad definition of personal data. It’s not based on what category the data is in, and it doesn’t matter how sensitive the data is. “Personal” does not equal “confidential.”
The GDPR is not just about data you consider personal. It’s about data that can be associated with you as an individual. The GDPR defines personal data as any information that relates to an identified or identifiable individual. That’s quite broad. It means that if the data can be specifically tied to an individual – even if it takes extra steps to get there (such as having an encryption key or other knowledge) – it’s personal data.
Think of the IP address your computer uses when connecting to the internet. If someone saw this address by itself, they wouldn’t know who was using it. But with additional information, it would be possible to tie that IP address back to you. So, under the GDPR, an IP address is indirectly identifiable personal data.
“Directly identifiable data is uniquely tied to an individual, such as name or date of birth. Indirectly identifiable data can be tied back to an individual, but only after additional steps are taken.” – Jay Exum, Privacy Counsel, SAS
Are personally identifiable information (PII) and personal data the same thing?
The issue with the term “personally identifiable information” is that its definition is ambiguous. In most cases, people define PII in one of two ways: either with a category-based approach (as in, PII includes these specific types of things about me) or by how easily the information could be associated with an individual. The GDPR definition of personal data, on the other hand, doesn’t care about any of that. It covers everything that even theoretically could be tied back to an individual.
Consider a public social media feed. Many people would say that’s not personal data because it’s not private or sensitive – after all, it’s already been published to the world. But it’s still associated with you as an individual because it’s associated with your Twitter and email accounts, and they can ultimately be traced back to you. According to the GDPR, that means it’s personal data.
The GDPR talks about the data controller and the data processor. What do these terms mean, and what role do controllers and processors play in protecting personal data?
The GDPR definition of processing is as broad as the definition of personal data. Almost anything you can think of – any activity associated with personal data – constitutes processing. It’s not just computer operations; it doesn’t even have to involve a computer. Processing includes collecting, storing, providing access to and backing up data, digital or otherwise. Even if you simply look at personal data from a remote connection, it’s still considered “processing” the data.
According to the GDPR, both controllers and processors “process” personal data. But the data controller is the party with authority to decide what happens to its data. The processor does something specific with the data on behalf of the controller.
For example, controllers decide who to share their data with, and how it will be used. SAS is the controller for its HR data. We decide who our processors will be – like payroll and health care providers. But SAS is sometimes a data processor too, such as when we place a customer’s data in a hosted environment.
The GDPR puts the bulk of responsibility on controllers. For example, it’s the controller’s responsibility to respond to a data subject’s “right to be forgotten” request. But data controllers need to make sure they’re working with reputable processors that will enable them to respond to requests like that.
The GDPR says both controllers and processors can be held accountable for mistreating personal data in some circumstances. Although controllers may bear primary responsibility in many ways under the GDPR, data processors bear direct responsibility of their own.
Is it possible to hide personal data, or make it anonymous? Would that be sufficient protection for personal data under the GDPR?
First, it’s important to distinguish “anonymizing” data from “pseudonymizing” it. Truly anonymous data is not governed by the GDPR, but the GDPR is strict about what that means. It’s only anonymous if it would be essentially impossible for anyone to identify an individual with the data that remains. This is a difficult standard to meet.
Pseudonymizing means that you have done something to make it very difficult – but not impossible – for an individual’s personal data to be used to identify them. Think about a list of employees and their salaries. You could use a sophisticated algorithm to scramble identifying information so no one could interpret it without the key. This would be a privacy-friendly thing to do with the data. But even though the data couldn’t be read or easily connected to an individual after this process, it’s still personal data because the keys to unlock that data exist somewhere.
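The salary-list example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the employee names, the salaries and the `secret_key` value are made up, and a keyed hash (HMAC-SHA256 from the standard library) stands in for the “sophisticated algorithm” mentioned above. It is one of many possible techniques, not a recommendation of a specific method.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """Return a keyed token that stands in for the identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical employee list; names and salaries are invented for illustration.
employees = [("Alice Example", 52000), ("Bob Example", 61000)]

# Assumption: in practice the key would be stored separately from the data.
secret_key = b"keep-this-key-separate-from-the-data"

# The shared list carries tokens instead of names.
pseudonymized = [(pseudonymize(name, secret_key), salary) for name, salary in employees]

# Anyone who holds the key can recompute a token and re-link a row to a person,
# which is why the result is still personal data rather than anonymous data.
relinked = pseudonymize("Alice Example", secret_key) == pseudonymized[0][0]
```

Because the key holder can always reproduce the mapping, the sketch shows pseudonymization, not anonymization.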
There’s no simple answer to whether pseudonymization is “sufficient.” First, there are lots of different ways to do it, and they aren’t all equally protective. Another reason: The GDPR expects you to use privacy measures that are appropriate for the nature of the information and the processing done with it. So your efforts might be sufficient in a medium-sensitivity scenario, but not in a high-sensitivity scenario.
Should I give up on trying to de-identify personal data? What else can I do to protect personal data?
You can’t easily get away from personal data as defined by the GDPR, but it’s still a great idea to pseudonymize your data when it’s practical to do so. Even though the GDPR still considers it personal data, de-identifying data means you’re taking steps to protect privacy. And that’s a good thing.
There are many other things you can do to protect personal data, too. Just to outline a few: You can control access to personal data; use encryption where appropriate; maintain sensible governance policies; make sure you don’t collect or use more personal data than you have a legitimate need for; and securely dispose of it when there’s no longer a valid reason to keep it.
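As one small illustration of the last point, a retention check might look like the sketch below. The two-year `RETENTION` period, the record layout and the `collected_on` field are assumptions made for the example; the GDPR does not prescribe a specific period, and actually deleting the data securely is a separate step that is not shown.

```python
from datetime import date, timedelta

# Assumption: a two-year retention period chosen purely for illustration.
RETENTION = timedelta(days=365 * 2)


def due_for_disposal(records: list[dict], today: date) -> list[dict]:
    """Return the records that have been held longer than the retention period."""
    return [r for r in records if today - r["collected_on"] > RETENTION]


# Hypothetical records; only the collection date matters for this check.
records = [
    {"id": 1, "collected_on": date(2015, 1, 1)},
    {"id": 2, "collected_on": date(2018, 1, 1)},
]

expired = due_for_disposal(records, today=date(2018, 6, 1))
expired_ids = [r["id"] for r in expired]
```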
“Sensitivity changes the risks associated with data – so you need to be extra attentive to protecting sensitive personal data, like genetic data or religious beliefs. But you always need to be thoughtful – all personal data should be handled with care.” – Jay Exum, Privacy Counsel, SAS
What are some ways personal data can be exposed or misused? What concerns does this raise?
The Facebook and Cambridge Analytica incident highlighted issues many people hadn’t considered before. The story started with a University of Cambridge researcher collecting personal data from around 270,000 individuals for academic purposes – a use that was disclosed in Facebook’s privacy policy and terms of use. But when he gave the data to a third party, the number of people affected grew to 87 million. Why? Because the data was no longer just that of the original Facebook account holders – it was all their connections’ data, too.
To be proactive in protecting personal data, organizations should consider “privacy by design and by default.” This data minimization approach means you collect and store the minimal amount of data needed to run your business – for no longer than necessary – and you only disclose what’s essential. It means asking questions up front, and thinking through future scenarios. For example, you might ask: Should we be giving this data to the person or organization that requested it? If so, how much data should we provide? In what form? What can we do to protect the data before we share it?
If you look at your data from a privacy standpoint first, you’ll do more vetting, narrow the data fields and avoid using sensitive personal data. You could provide pseudonymous data to third parties. This approach would be far less intrusive on privacy and would harm fewer people if the data were ever compromised. Another advantage: If your organization embraces privacy by design and later gets hacked, the resulting damage will be much less severe.
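Narrowing the data fields before sharing can be sketched as a simple filter. The field names, the `SENSITIVE_FIELDS` set and the sample record below are hypothetical; a real organization would define them according to its own data and legal advice.

```python
from typing import Any

# Assumption: fields this hypothetical organization treats as sensitive.
SENSITIVE_FIELDS = {"religion", "genetic_marker"}


def minimize(record: dict[str, Any], allowed_fields: set[str]) -> dict[str, Any]:
    """Keep only the fields the recipient needs, and never the sensitive ones."""
    keep = allowed_fields - SENSITIVE_FIELDS
    return {field: value for field, value in record.items() if field in keep}


# Hypothetical customer record, invented for illustration.
record = {
    "name": "Alice Example",
    "postal_code": "27513",
    "purchase_total": 120.5,
    "religion": "undisclosed",
}

# Even if a request lists a sensitive field, it is dropped before sharing.
shared = minimize(record, {"postal_code", "purchase_total", "religion"})
```

The filter defaults to excluding anything not explicitly requested and approved, which mirrors the “by default” part of the principle.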
Do you think most US businesses fully understand personal data in the context of the EU GDPR?
US companies with a global footprint have been thinking about the broad GDPR definition of personal data for a while. Because SAS is global, we’ve had to really wrap our minds around this. We know data subjects in the EU can hold us accountable for what we do with their data.
Responding to data subject rights requests entails knowing where all that person’s information resides in multiple systems, how it flows, how it’s used, stored and protected – and why it’s moving around. Companies that are preparing to be accountable for all the personal data they process under the GDPR quickly realize it’s a significant undertaking, especially those with many years of data in their systems.
What do you think the future will bring – what are the challenges and opportunities?
It’s impossible to foresee all the possibilities, all the contexts where personal data comes into play. We’re just beginning to think as a culture about what this means, and what we should consider reasonable uses of personal data. There’s no map to guide us.
There are clear benefits to GDPR. Rethinking how you manage personal data across the business can help you operate more effectively and efficiently. The process of reviewing how you do things is good hygiene, and can uncover business inefficiencies and risks you didn’t even know you had. It can help free up storage space and bandwidth. What’s more, GDPR may create new revenue opportunities through new products and business models.
One huge challenge, though, is thinking about how to handle personal data globally – across the organization. Rethinking how you do everything is overwhelming. Is it a legal issue or a technological issue? I think it’s a mix of both. But there’s no straightforward set of steps that will solve all the problems. I think there will be more stories in the news, with lessons learned. Things are chaotic. I think they will stay that way until we reach a point of equilibrium – culturally, legally and technologically. It’s scary and interesting and compelling all at the same time.