Google today released a demo app that visualizes a real-time point cloud from uDepth, the stereo depth sensor built into the Pixel 4 and Pixel 4 XL. As the tech giant explains in a lengthy post, uDepth taps machine learning to identify users while protecting against spoof attacks. It also supports a number of other features, including photo retouching in post-processing, depth-based scene segmentation, background blur, portrait effects, and 3D photos.
Google warns users not to expect support for the app, which isn’t available on the Google Play Store and whose UI is about as bare-bones as it gets. Selecting the Mode button in the bottom-left corner switches between a heatmap and a color visualization, while tapping and dragging on the visualization changes the perspective.
uDepth might be exclusive to the Pixel 4, but the platform’s technologies could inform development in other domains, like robotics, where accurate depth sensing is the key to obstacle avoidance. Or it might lead to advancements in computer vision, a technology core to robotic process automation and countless other applications.
uDepth, which was recently exposed as an API and is available on the Pixel 4 via the Google Camera app’s Social Media Depth setting, employs two infrared (IR) cameras, an infrared pattern projector, and the Pixel 4’s dedicated neural chip (the Pixel Neural Core). To produce frames with depth, it looks at the region surrounding each pixel in an image captured by one camera and tries to find a similar region in the image from the second camera. The projector casts an IR pattern into the scene that both cameras detect, which makes low-texture regions easier to match.
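To make the region-matching idea concrete, here is a minimal sketch of classic stereo block matching on a single image row. It is not Google's implementation: the function names, the window size, the sum-of-absolute-differences cost, and the focal length and baseline values are all assumptions chosen for illustration. It uses standard triangulation (depth = focal length × baseline / disparity) to turn a pixel shift into a distance.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length windows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_row(left, right, half_window=1, max_disparity=4):
    """For each pixel in `left`, find the horizontal shift (disparity)
    whose surrounding window in `right` looks most similar.
    Border pixels without a full window keep a disparity of 0."""
    width = len(left)
    disparities = [0] * width
    for x in range(half_window, width - half_window):
        window = left[x - half_window : x + half_window + 1]
        best_cost, best_d = None, 0
        for d in range(max_disparity + 1):
            xr = x - d  # candidate position in the right image
            if xr - half_window < 0:
                break  # window would fall off the image edge
            candidate = right[xr - half_window : xr + half_window + 1]
            cost = sad(window, candidate)
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Standard stereo triangulation: depth = focal * baseline / disparity."""
    if disparity_px <= 0:
        return float("inf")  # no measurable shift: treat as infinitely far
    return focal_px * baseline_m / disparity_px


# Hypothetical data: the right row is the left row shifted by two pixels.
left_row = [3, 7, 1, 9, 40, 80, 15, 60, 5, 2, 8, 4]
right_row = left_row[2:] + [0, 0]
row_disparities = match_row(left_row, right_row)

# Hypothetical camera parameters, not the Pixel 4's.
depth_m = disparity_to_depth(row_disparities[5], focal_px=400.0, baseline_m=0.5)
```

In a flat, textureless row every window looks alike and the costs tie, which is the problem the projected IR pattern is meant to address.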
To match the regions, uDepth compiles depth proposals by comparing non-overlapping image tiles. A machine learning model then draws on brightness and neighbor information to correct bad matches in less than 1.5 milliseconds per frame. And to account for divergences from the cameras’ factory calibration caused by hard falls, uDepth evaluates depth images for signs of miscalibration, building confidence in the device state and generating new parameters from the current scene if something’s amiss.
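The tile-based step can be pictured with a rough sketch like the one below, which assigns one disparity proposal to each non-overlapping tile rather than to each pixel. This is an illustrative assumption, not uDepth's actual algorithm: the tile size, the cost function, and the names `tile_cost` and `tile_proposals` are hypothetical, and the machine learning refinement and recalibration steps are not modeled.

```python
def tile_cost(left, right, top, col, size, d):
    """SAD between a tile in `left` and the tile shifted `d` px left in `right`."""
    return sum(
        abs(left[r][c] - right[r][c - d])
        for r in range(top, top + size)
        for c in range(col, col + size)
    )


def tile_proposals(left, right, size=2, max_disparity=3):
    """Return one disparity proposal per non-overlapping size x size tile."""
    rows, cols = len(left), len(left[0])
    proposals = []
    for top in range(0, rows - size + 1, size):
        row_out = []
        for col in range(0, cols - size + 1, size):
            # Only consider shifts that keep the tile inside the right image.
            candidates = range(min(max_disparity, col) + 1)
            best = min(
                candidates,
                key=lambda d: tile_cost(left, right, top, col, size, d),
            )
            row_out.append(best)
        proposals.append(row_out)
    return proposals


# Hypothetical 2 x 6 images: the right image is the left shifted by two pixels.
left_img = [
    [5, 9, 30, 70, 12, 44],
    [8, 3, 61, 25, 90, 17],
]
right_img = [row[2:] + [0, 0] for row in left_img]
proposals = tile_proposals(left_img, right_img)
```

Working per tile cuts the number of comparisons sharply, at the cost of coarser proposals that a later refinement pass would need to clean up.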
uDepth infers 3D depth maps using an AI model that combines color images, people segmentation, and raw depth. According to Google, training it required building a volumetric capture system that could produce “near-photorealistic” models of people using a geodesic sphere outfitted with 331 LEDs, an array of high-resolution cameras, and a set of high-resolution depth sensors. Pixel 4 phones within the sphere were synchronized with the lights and cameras to create real images and synthetic renderings from the handsets’ camera viewpoints.
“The ability to determine 3D information about the scene, called depth sensing, is a valuable tool for developers and users alike, [but] typical RGB-based stereo depth sensing techniques can be computationally expensive, suffer in regions with low texture, and fail completely in extreme low light conditions,” wrote uDepth software lead Michael Schoenberg and Google Research hardware and systems lead Adarsh Kowdle in the Google post. “Because the Face Unlock feature on Pixel 4 must work at high speed and in darkness, it called for a different approach.”