Transfer learning is an area of intense AI research: it focuses on storing knowledge gained while solving one problem and applying it to a related problem. But despite recent breakthroughs, it's not yet well understood what enables a successful transfer or which parts of algorithms are responsible for it.
That’s why Google researchers sought to develop analysis techniques tailored to explainability challenges in transfer learning. In a new paper, they say their contributions help to solve a few of the mysteries around why machine learning models successfully — or unsuccessfully — transfer.
During the first of several experiments in the study, the researchers sourced images from a medical imaging data set of chest x-rays (CheXpert) and sketches, clip art, and paintings from the open source DomainNet corpus. They partitioned each image into equal-sized blocks and shuffled the blocks randomly, disrupting the images' high-level visual features, after which they compared the agreements and disagreements between models trained from pretrained weights versus from scratch.
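The block-shuffling setup can be sketched as follows. This is a minimal illustration in NumPy, assuming images whose dimensions divide evenly into square blocks; the block sizes used in the study are not specified here:

```python
import numpy as np

def shuffle_blocks(image, block_size, rng=None):
    """Partition an image into equal-sized blocks and shuffle them randomly.

    This disrupts high-level visual structure while leaving the low-level
    statistics within each block intact (a sketch of the experimental
    setup, not the researchers' code).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    bs = block_size
    assert h % bs == 0 and w % bs == 0, "image must divide evenly into blocks"
    # Split the image into a flat list of blocks, row-major.
    blocks = [image[r:r + bs, c:c + bs]
              for r in range(0, h, bs)
              for c in range(0, w, bs)]
    # Permute the block order, then reassemble the grid.
    order = rng.permutation(len(blocks))
    n_cols = w // bs
    rows = [np.concatenate([blocks[order[j]] for j in range(i, i + n_cols)], axis=1)
            for i in range(0, len(blocks), n_cols)]
    return np.concatenate(rows, axis=0)
```

Smaller blocks destroy more structure: at `block_size=1` the image is a full pixel shuffle, while larger blocks preserve progressively more local visual features.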
The researchers found that the reuse of features — the individual measurable properties of a phenomenon being observed — is an important factor in successful transfers, but not the only one. Low-level statistics of the data that weren't disturbed by block shuffling also play a role. Moreover, any two instances of models trained from pretrained weights make similar mistakes, suggesting these models capture features in common.
Working from this knowledge, the researchers attempted to pinpoint where feature reuse occurs within models. They observed that features become more specialized the deeper they sit in the network and that feature reuse is more prevalent in layers closer to the input. (Deep learning models contain mathematical functions arranged in layers that transmit signals from input data.) They also found it's possible to fine-tune pretrained models on a target task sooner than originally assumed without sacrificing accuracy.
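This layer-wise picture motivates a common fine-tuning recipe: keep the input-side layers, where feature reuse concentrates, fixed, and update only the later, more task-specific layers. A toy sketch (not the paper's code), with the model reduced to a list of per-layer weight matrices:

```python
import numpy as np

def finetune_step(weights, grads, lr, n_frozen):
    """One gradient step that freezes the first n_frozen layers.

    A toy illustration of fine-tuning with frozen lower layers: pretrained
    weights near the input are left untouched, and only the later layers
    are updated toward the target task.
    """
    return [w if i < n_frozen else w - lr * g
            for i, (w, g) in enumerate(zip(weights, grads))]
```

For example, with a three-layer toy model and `n_frozen=2`, only the final layer's weights move after a step; the two pretrained layers closest to the input are reused as-is.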
“Our observation of low-level data statistics improving training speed could lead to better network initialization methods,” the researchers wrote. “Using these findings to improve transfer learning is of interest for future work.”
A better understanding of transfer learning could yield substantial algorithmic performance gains. Already, Google is using transfer learning in Google Translate so that insights gleaned through training on high-resource languages including French, German, and Spanish (which have billions of parallel examples) can be applied to the translation of low-resource languages like Yoruba, Sindhi, and Hawaiian (which have only tens of thousands of examples). Another Google team has applied transfer learning techniques to enable robot control algorithms to learn how to manipulate objects faster with less data.