面向真实 3D 物体的感知、理解、重建与生成是计算机视觉领域一直倍受关注的问题,也在近年来取得了飞速的进展。然而,由于社区中长期缺乏大规模的实采 3D 物体数据库,大部分技术方法仍依赖于 ShapeNet[1] 等仿真数据集。再者,仿真数据与真实数据之间的外观和分布差距巨大,这大大限制了它们在现实生活中的应用。为了解决这一困难,近年来也有一些优秀的工作如 CO3D[2] 等从视频/多视角图片中寻求突破点,并利用 SfM 的方式重建 3D 点云,然而这种方式得到的点云往往难以提供完整、干净、精准的 3D 表面和纹理。因此,社区迫切需要一个大规模且高质量的真实世界 3D 物体扫描数据集,这将有助于推进许多3D视觉任务和下游应用。
关于数据集本身,我们会致力于不断扩大和更新数据集以满足更广泛的研究需求。除了现有的应用,我们还计划进一步发展其他下游任务,如 2D / 3D 物体检测和 6D 姿态估计等。除了感知和重建任务外,在 AIGC 时代,我们相信 OmniObject3D 能够在推动真实感 3D 生成方面发挥至关重要的作用。
References:
[1] Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. Shapenet: An information-rich 3d model repository. arXiv.org, 1512.03012, 2015.[2] Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10901–10911, 2021.[3] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009.[4] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll´ar, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV), pages 740–755, 2014.[5] Agrim Gupta, Piotr Dollar, and Ross Girshick. LVIS: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5356–5364, 2019.[6] Mikaela Angelina Uy, Quang-Hieu Pham, Binh-Son Hua, Thanh Nguyen, and Sai-Kit Yeung. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1588–1597, 2019.[7] Jiawei Ren, Liang Pan, and Ziwei Liu. Benchmarking and analyzing point cloud classification under corruptions. In Proceedings of the International Conference on Machine learning (ICML), 2022.[8] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In Proceedings of the European Conference on Computer Vision (ECCV), pages 405–421, 2020.[9] Henrik Aanæs, Rasmus Ramsbøl Jensen, George Vogiatzis, Engin Tola, and Anders Bjorholm Dahl. Large-scale data for multiple-view stereopsis. International Journal of Computer Vision (IJCV), 120(2):153–168, 2016.[10] Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, and Sanja Fidler. Get3d: A generative model of high quality 3d textured shapes learned from images. In Advances in Neural Information Processing Systems (NIPS), 2022.