Increasingly, researchers are using AI to transform historical footage, such as the Apollo 16 moon landing and the Lumière brothers’ 1895 film “Arrival of a Train at La Ciotat Station,” into high-resolution, high-frame-rate videos that look as though they were shot with modern equipment. It’s a boon for preservationists, and the same techniques can be applied to footage in security screening, television production, filmmaking, and similar settings. In an effort to simplify the process, researchers at the University of Rochester, Northeastern University, and Purdue University recently proposed a framework that generates high-resolution slow-motion video from low-frame-rate, low-resolution video. They say that their approach, Space-Time Video Super-Resolution (STVSR), not only generates quantitatively and qualitatively better videos than existing methods but is also three times faster than previous state-of-the-art AI models.
In some ways, it advances the work Nvidia published in 2018, which described an AI model that could apply slow motion to any video, regardless of its frame rate. Similar upscaling techniques have also been applied to video games. Last year, fans of Final Fantasy used a $100 piece of software called A.I. Gigapixel to improve the resolution of Final Fantasy VII’s backdrops.
STVSR learns two tasks simultaneously: temporal interpolation (how to synthesize nonexistent intermediate video frames between the original frames) and spatial super-resolution (how to reconstruct a high-resolution frame from the corresponding reference frame and its neighboring supporting frames). Moreover, thanks to a companion convolutional long short-term memory (ConvLSTM) model, it can temporally align and aggregate context from across the video, then reconstruct frames from the aggregated features.
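To make the two sub-tasks concrete, here is a minimal Python sketch of a naive two-stage baseline: it blends neighboring frames to create in-between frames, then enlarges every frame by pixel repetition. This is not STVSR and not the researchers' code. It is the kind of hand-built pipeline that a learned, one-stage model is meant to improve on. The function names, the NumPy array layout (time, height, width), and the 2x scale factor are all assumptions made for illustration.

```python
import numpy as np


def interpolate_frames(frames):
    """Naive temporal interpolation (illustrative only).

    Inserts one averaged frame between each pair of neighboring frames,
    so T input frames become 2*T - 1 output frames.
    """
    out = [frames[0]]
    for prev, nxt in zip(frames[:-1], frames[1:]):
        out.append(0.5 * (prev + nxt))  # synthesized in-between frame
        out.append(nxt)                 # original frame
    return np.stack(out)


def upscale_frames(frames, scale=2):
    """Naive spatial upscaling by repeating each pixel `scale` times."""
    return frames.repeat(scale, axis=1).repeat(scale, axis=2)


def naive_space_time_upsample(frames, scale=2):
    """Hypothetical two-stage baseline: interpolate in time, then upscale in space."""
    return upscale_frames(interpolate_frames(frames), scale)


# Demo on random data: 4 low-resolution frames of 8x8 pixels.
low_res_clip = np.random.rand(4, 8, 8)
high_res_clip = naive_space_time_upsample(low_res_clip, scale=2)
print(high_res_clip.shape)  # (7, 16, 16)
```

Averaging two frames produces ghosting whenever objects move quickly, and pixel repetition adds no real detail. Those are the weaknesses that learned interpolation and super-resolution are designed to address.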
The researchers trained STVSR on a data set of more than 60,000 seven-frame clips from Vimeo and evaluated it on a separate corpus split into fast-motion, medium-motion, and slow-motion sets to measure performance under different conditions. In experiments, they found that STVSR obtained “significant” improvements on videos with fast motion, including “challenging” scenes such as basketball players moving quickly up a court. It also reconstructed “visually appealing” frames with more accurate image structures and fewer blurring artifacts, while remaining up to four times smaller and at least two times faster than the baseline models.
“With such a one-stage design, our network can well explore intra-relatedness between temporal interpolation and spatial super-resolution in the task,” wrote the coauthors of the preprint paper describing the work. “It enforces our model to adaptively learn to leverage useful local and global temporal contexts for alleviating large motion issues. Extensive experiments show that our … framework is more effective yet efficient than existing … networks, and the proposed feature temporal interpolation network and deformable [model] are capable of handling very challenging fast motion videos.”
The researchers intend to release the source code this summer.