Like Apple and Google, Facebook has been working on computer vision technology that extends the power of mobile apps.
While Apple's ARKit and Google's ARCore use computer vision to estimate the location of horizontal and vertical surfaces, Facebook researchers now claim to have prototype methods that can derive the 3D shape of an object from its 2D image information.
If Facebook's research is as groundbreaking as it sounds, it could surpass the aforementioned mobile toolkits from Apple and Google. As a foundation for future AR wearables, Facebook's breakthroughs could also contribute to the company's own smart glasses development.
This week, Facebook researchers Georgia Gkioxari, Shubham Tulsiani, and David Novotny published their findings on four new methods for 3D image understanding, presented at the International Conference on Computer Vision (ICCV) in Seoul, South Korea, which runs through November 2.
Two of these methods involve identifying 3D objects from 2D images. Building on Mask R-CNN, a model for segmenting the objects in an image (presented at the same conference in 2017), Mesh R-CNN infers the 3D shapes of those detected objects, handling occlusion, clutter, and other challenging conditions in photos.
"Adding a third dimension to object recognition systems that are robust to such complexities requires more technical skills, and current technical frameworks have hampered progress in this area," the team wrote in a blog post.
In addition, the team built another computer vision model that serves as an alternative and complement to Mesh R-CNN. The cheekily named C3DPO (Canonical 3D Pose Networks) system reconstructs 3D objects at scale from only 2D keypoints, covering 14 object categories such as birds, humans, and automobiles.
Such large-scale reconstructions were not possible before: earlier matrix factorization methods ran into memory limitations because, unlike a deep network, they cannot operate in minibatch mode. "Previous methods modeled deformation using multiple simultaneous images and relied on establishing correspondences between instantaneous 3D reconstructions, which requires hardware found mainly in specialized laboratories," the team wrote. "The efficiency advantages introduced by C3DPO enable 3D reconstruction in cases where deploying 3D-capture hardware is not feasible."
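To see what the classical factorization baseline looks like, here is a minimal numpy sketch (my own illustration, not C3DPO itself) of Tomasi-Kanade-style matrix factorization: 2D keypoints from many views are stacked into one large measurement matrix, and the 3D structure is recovered from a rank-3 decomposition. Because the whole matrix must be factorized at once, memory grows with the dataset, which is exactly the limitation a minibatch-trainable deep network avoids.

```python
import numpy as np

# Classical structure-from-motion factorization (toy example):
# project 3D keypoints into many orthographic views, stack the 2D
# observations, and show they form a rank-3 measurement matrix.

rng = np.random.default_rng(0)
P, F = 14, 50                        # 14 keypoints observed in 50 views
X = rng.normal(size=(3, P))          # ground-truth 3D keypoints
X -= X.mean(axis=1, keepdims=True)   # center the shape

rows = []
for _ in range(F):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random rotation
    rows.append(Q[:2] @ X)           # orthographic 2D projection
W = np.vstack(rows)                  # (2F x P) measurement matrix
W -= W.mean(axis=1, keepdims=True)   # remove per-view translation

U, s, Vt = np.linalg.svd(W, full_matrices=False)
print(s[3] / s[0])                   # ~0: the observations have rank 3
shape_basis = Vt[:3]                 # spans the 3D shape, up to a linear ambiguity
```

Note that the SVD here consumes the entire 2F x P matrix at once; with millions of images that matrix no longer fits in memory, whereas a network processing small batches of views has no such constraint.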
The team also built a system for canonical surface mapping. This way, computer vision applications can better understand the common properties shared by different objects in an AR scene.
"For example, if we're training a system to have the right place This may be our use for sitting on a chair or grabbing a mug. Useful the next time the system needs to know where to sit on another chair or how to grab another mug, "wrote the team Tasks can not only help deepen our understanding of traditional 2D imagery and video content, but also the AR / VR experience by transferring representations of objects. "
Finally, VoteNet is an experimental 3D object recognition network that can accurately interpret a 3D point cloud using only geometric information, without relying on color images.
"VoteNet has a simple design, a compact model size and high efficiency" The team claims, with a speed of about 100 milliseconds, for a complete scene and a smaller footprint than previous methods developed for research purposes. " Algorithm captures 3D point clouds from depth cameras and returns 3D bounding boxes of objects with their semantic classes. "