A group of AI researchers from Facebook, Virginia Tech, and National Tsing Hua University in Taiwan say they’ve created a novel way to generate 3D photos that’s superior to Facebook 3D Photos and other existing methods. Facebook 3D Photos launched in October 2018 for dual-camera smartphones like the iPhone X, whose TrueDepth camera captures the depth information in a photo. In the new research, the authors use a range of photos taken with an iPhone to demonstrate how their approach eliminates the blurring and discontinuity artifacts that other 3D methods introduce.
The method could make for better Facebook 3D Photos someday. And if it translates to other settings, it could lead to more lifelike immersion in environments built from 3D digital graphics, such as virtual games and meetings, as well as applications in ecommerce or a future metaverse.
The new learning-based method can generate 3D photos from RGB-D imagery like photos taken with an iPhone X. It also works with simpler 2D photos by using a pretrained depth estimation model. The authors applied their method to historic images from the 20th century to demonstrate its effectiveness on 2D images.
The work also claims better performance than Nvidia’s Xview, as well as Local Light Field Fusion (LLFF), a model presented last year by a group of authors at the computer graphics conference SIGGRAPH.
Performance of the 3D models was assessed using randomly sampled imagery from the RealEstate10K data set. Head-to-head demos of advanced 3D image generation methods are available on a project website and in supplementary material created by authors Meng-Li Shih, Shih-Yang Su, Johannes Kopf, and Jia-Bin Huang.
In recent months, Facebook, Microsoft, and Nvidia have released technology that generates 3D objects from 2D images, but the new method relies heavily on inpainting. Inpainting is the process of using AI to predict missing pixels in an image. It has been used to auto-crop Google Photos videos and to build better unsupervised generative adversarial networks.
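To make the idea of inpainting concrete, here is a minimal classical sketch that fills missing pixels by repeatedly averaging their known neighbors. This is an illustrative stand-in, not the learned inpainting networks the article discusses; the function name and parameters are invented for the example.

```python
import numpy as np

def inpaint_diffusion(image, mask, iterations=200):
    """Fill masked (missing) pixels by repeatedly averaging their
    four cardinal neighbors -- a simple diffusion-based stand-in
    for learned inpainting models."""
    img = image.astype(float).copy()
    img[mask] = 0.0  # unknown pixels start at an arbitrary value
    for _ in range(iterations):
        # Edge-pad, then average the four cardinal neighbors.
        padded = np.pad(img, 1, mode="edge")
        avg = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
               padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        img[mask] = avg[mask]  # update only the unknown pixels
    return img

# Example: a flat gray image with a missing square recovers its value.
image = np.full((16, 16), 0.5)
mask = np.zeros_like(image, dtype=bool)
mask[6:10, 6:10] = True
result = inpaint_diffusion(image, mask)
```

Diffusion like this smears surrounding values into the hole, which is exactly the kind of blur that learned, context-aware inpainting models aim to avoid.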
The cutting-edge 3D photo approach is detailed in a paper published on the preprint server arXiv. Motivated in part by EdgeConnect, a generative adversarial inpainting model introduced in 2019, the authors say their work differs in that it applies inpainting to both color and depth value predictions. Another key difference is that the new learning method adapts to local depth complexity and does not require predetermining a fixed number of layers. Both Facebook 3D Photos and the experimental approach introduced in the recent paper rely on a layered depth image (LDI) representation, which the new method extends to be more adaptive.
“Each LDI pixel stores a color and a depth value. Unlike the original LDI work, we explicitly represent the local connectivity of pixels: each pixel stores pointers to either zero or at most one direct neighbor in each of the four cardinal directions (left, right, top, bottom),” the paper reads. “Unlike most previous approaches we do not require predetermining a fixed number of layers. Instead our algorithm adapts by design to the local depth-complexity of the input and generates a varying number of layers across the image. We have validated our approach on a wide variety of photos captured in different situations.”
The paper was accepted for publication at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), which will take place next month. Initially scheduled for June 16-18 in Seattle, CVPR will, like other major research conferences, move entirely online. According to the AI Index 2019 report, CVPR is one of the largest annual machine learning conferences.