Automating Privacy in Data Use Featured Signal of Change: SoC1198 December 2020
Since the early days of big data, privacy concerns have limited the potential of data-driven software. For example, 2005's SoC083 — Proliferating Privacy Threats notes that threats to privacy seemed to be emerging at an exponential rate, and 2013's SoC655 — Big Data, Big Concerns says that organizations hoping to develop big-data systems face the hurdles of data regulation and consumers' privacy expectations. SoC933 — Snooping Technologies from 2017 argues that the line between customer service and privacy infringement is becoming narrower, and SoC1059 — (P)Review 2018/2019: Data and Privacy? from the beginning of 2019 provides an overview of all the privacy-related topics that Scan™ discussed in 2018. SoC1171 — Covid-19 and Civil Liberties from 2020 discusses how the use of digital tracking to combat and control the covid-19 (coronavirus-disease-2019) pandemic has come head-to-head with data privacy.

But what if large-scale data collection and processing could coexist with robust individual privacy? In such a future, big data could finally fulfill its potential, unimpeded by privacy regulations and concerns. Data-driven artificial intelligence would likely see significant progress, as would data-driven computing in general. Health care, education, consumer services, and other fields might see breakthroughs. Companies that have seen privacy issues temper their digital-transformation goals would be able to implement aggressive digital strategies. Although uncertainties exist, current signposts suggest that a long-term future in which automation has essentially eliminated privacy problems is becoming plausible. Automated privacy-management tools, federated machine learning, and other emerging technologies could protect individual privacy and enable organizations to exploit rapidly growing amounts of data to the fullest.
Already, various vendors offer semiautomated software to help organizations comply with privacy legislation such as the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act. Investors have shown significant interest in such software. For example, BigID (New York, New York) has raised close to $150 million in funding. The company offers a platform that uses data analytics alongside compliance software to help organizations identify and map personal data, even when those data are distributed across multiple on-premises and cloud-based systems. The platform also partially automates privacy-related functions such as fulfilling data-access requests from customers or employees. And Securiti.ai (San Jose, California), only a little more than two years old, has raised $81 million in funding. Like BigID, Securiti.ai uses machine learning to manage distributed personal data and to partially automate various aspects of privacy compliance. Other examples exist. InCountry (San Francisco, California) launched in May 2019 and has raised $40 million to date. InCountry assists with privacy compliance by helping companies manage data storage across jurisdictions. Other notable providers include Privitar (London, England), TrustArc (San Francisco, California), and OneTrust (Atlanta, Georgia, and London, England), which acquired competitor Integris Software in mid-2020.
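One core task these platforms automate is discovering and mapping personal data scattered across an organization's systems. The sketch below illustrates the idea in miniature; the store names and regex patterns are invented for illustration, and commercial products rely on machine-learning classifiers rather than simple pattern matching:

```python
import re

# Toy sketch: scan heterogeneous data stores for personal data and map
# where each kind of identifier appears. (Illustrative only; real tools
# use ML-based classification, not handwritten regexes.)
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def scan(stores):
    """Return {pii_type: [(store_name, value), ...]} across all stores."""
    found = {}
    for store, text in stores.items():
        for kind, pattern in PII_PATTERNS.items():
            for match in pattern.findall(text):
                found.setdefault(kind, []).append((store, match))
    return found

# Hypothetical on-premises and cloud stores holding the same person's data.
stores = {"crm": "Contact: jane@example.com, 555-123-4567",
          "logs": "user jane@example.com logged in"}
result = scan(stores)
```

A data-access request for this person could then be answered by collecting every store where her identifiers appear, which is the kind of workflow the vendors above partially automate.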
Privacy-management solutions could level the playing field between companies that can and companies that cannot invest substantial sums in compliance. Google's (Alphabet; Mountain View, California) workforce apparently spent hundreds of years of human time on GDPR compliance in the run-up to the GDPR's introduction. Privacy-management tools should lower the cost of regulatory compliance and reduce risk, potentially encouraging companies (including those outside the tech sector) to develop new data services across international markets.
Another family of software that aims to automate data privacy is federated-learning systems for AI. Machine-learning AI relies on (typically very large) data sets to learn its intelligent behaviors; however, in many cases, privacy laws or privacy concerns limit developers' ability to access data for machine learning. Federated-learning systems aim to address these concerns by breaking the AI-training task into discrete modules and distributing the training task to local devices. For example, if an individual's phone contains data necessary for the AI training, the modularized machine-learning training system can use the data on the phone itself without ever uploading them to the cloud; only the trained model updates are uploaded. Google, IBM (Armonk, New York), Intel (Santa Clara, California), and Microsoft (Redmond, Washington) are among the organizations developing federated-machine-learning software.
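The core loop described above can be sketched in a few lines. In this minimal federated-averaging simulation (the one-parameter model and per-device data are invented for illustration), each "device" fits a model on data that never leave it, and a central server averages only the resulting parameters:

```python
# Minimal federated-averaging sketch. Each simulated device trains a
# one-parameter model y = w * x on its own private data; only the
# learned weight (never the raw data) is sent to the server.
def local_train(data, w, lr=0.01, epochs=50):
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * x * (w * x - y)  # gradient step for (w*x - y)**2
    return w

# Private per-device data, all drawn from the same underlying rule y = 3x.
devices = [[(x, 3 * x) for x in (1, 2, 3)],
           [(x, 3 * x) for x in (4, 5)],
           [(x, 3 * x) for x in (0.5, 1.5, 2.5)]]

w_global = 0.0
for _ in range(5):                            # federated training rounds
    local_ws = [local_train(d, w_global) for d in devices]
    w_global = sum(local_ws) / len(local_ws)  # server averages the updates
```

After a few rounds the shared weight converges toward 3 even though the server never sees any (x, y) pair, which is the privacy property federated learning trades on.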
Further progress in privacy automation will likely occur. Notably, IBM recently completed field trials of fully homomorphic encryption (FHE), which enables computers to perform computation and logical operations on encrypted data without having to decrypt the data first. In essence, FHE enables computers to do with encrypted private data anything they can do with unencrypted data—at least in theory. Although IBM's trials mark a promising step forward, FHE requires substantial increases in computational power and memory. For example, IBM's tests revealed that FHE-encrypted machine-learning models require 40 to 50 times the computational power and 10 to 20 times the memory of unencrypted models to perform the same tasks. For now, these performance trade-offs likely limit applications to industries such as financial services, health care, and government; however, the trade-offs will likely diminish as computing speeds increase and FHE techniques evolve.
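FHE schemes themselves are mathematically involved, but the underlying idea (computing on ciphertexts so that the result decrypts to the correct answer) can be illustrated with a much simpler scheme. The sketch below uses textbook Paillier encryption, which is only additively homomorphic, unlike FHE, and uses toy key sizes that provide no real security:

```python
import math
import random

# Toy Paillier keypair. Real deployments use primes of ~1024+ bits;
# these tiny primes are for illustration only and are not secure.
p, q = 61, 53
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
g = n + 1                    # standard simple choice of generator
mu = pow(lam, -1, n)         # valid decryption constant when g = n + 1

def encrypt(m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Recover m from ciphertext c."""
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Homomorphic property: multiplying ciphertexts adds the plaintexts,
# so a server can compute a sum without ever seeing 12 or 30.
c_sum = (encrypt(12) * encrypt(30)) % n2
```

Here `decrypt(c_sum)` yields 42 even though the addition happened entirely on ciphertexts. FHE generalizes this from addition alone to arbitrary computation, which is what makes IBM's performance results significant.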
Today's automated privacy tools are limited and in most cases do not fully automate privacy-related tasks. Even so, they point the way to a future in which organizations can automate privacy-compliance tasks and personal-data processing does not compromise privacy. Although this future is becoming plausible, it is at least ten years away, and significant uncertainty exists. Notable technical challenges remain (for example, FHE is still in the early stages of development), and trust in privacy-protecting software needs to develop. The good news is that along the road to this future, opportunities do exist—for example, the current opportunity to streamline privacy compliance by using already-available commercial tools.