Facebook today open-sourced Opacus, a library for training PyTorch models with differential privacy that’s ostensibly more scalable than existing methods. With the release of Opacus, Facebook says it hopes to provide an easier path for engineers to adopt differential privacy in AI and to accelerate in-the-field differential privacy research.
Typically, differential privacy entails injecting a small amount of noise into the raw data before feeding it into a local machine learning model, thus making it difficult for malicious actors to extract the original files from the trained model. An algorithm can be considered differentially private if an observer seeing its output cannot tell if it used a particular individual’s information in the computation.
“Our goal with Opacus is to preserve the privacy of each training sample while limiting the impact on the accuracy of the final model. Opacus does this by modifying a standard PyTorch optimizer in order to enforce (and measure) differential privacy during training. More specifically, our approach is centered on differentially private stochastic gradient descent,” Facebook explained in a blog post. “The core idea behind this algorithm is that we can protect the privacy of a training dataset by intervening on the parameter gradients that the model uses to update its weights, rather than the data directly.”
Opacus uniquely leverages hooks in PyTorch to achieve an “order of magnitude” speedup compared with existing libraries, according to Facebook. Moreover, it keeps track of how much of the “privacy budget” — a core mathematical concept in differential privacy — has been spent at any given point in time to enable real-time monitoring.
Opacus also employs a cryptographically safe, pseudo-random, GPU-accelerated number generator for security-critical code, and it ships with tutorials and helper functions that warn about incompatible components. The library works behind the scenes with PyTorch, Facebook says, producing standard AI models that can be deployed as usual without extra steps.
“We hope that by developing PyTorch tools like Opacus, we’re democratizing access to such privacy-preserving resources,” Facebook wrote. “We’re bridging the divide between the security community and general machine learning engineers with a faster, more flexible platform using PyTorch.”
The release of Opacus follows Google’s decision to open-source the differential privacy library used in some its core products, such as Google Maps, as well as an experimental module for TensorFlow Privacy that enables assessments of the privacy properties of various machine learning classifiers. More recently, Microsoft released WhiteNoise, a platform-agnostic toolkit for differential privacy in Azure and in open source on GitHub.