Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors

ECCV 2024

1School of Software, Tsinghua University, 2Kuaishou Technology, 3Wayne State University

Abstract

Unsigned distance functions (UDFs) have been a vital representation for open surfaces. Using various differentiable renderers, current methods train neural networks to infer a UDF by minimizing the rendering errors between the UDF renderings and the multi-view ground truth. However, these differentiable renderers are mainly handcrafted, which makes them either biased at ray-surface intersections, sensitive to unsigned distance outliers, or unable to scale to large scenes. To resolve these issues, we present a novel differentiable renderer that infers UDFs more accurately. Instead of using handcrafted equations, our differentiable renderer is a neural network pre-trained in a data-driven manner. It learns how to render unsigned distances into depth images, yielding prior knowledge that we dub volume rendering priors. To infer a UDF for an unseen scene from multiple RGB images, we generalize the learned volume rendering priors to map inferred unsigned distances into alpha blending for RGB image rendering. Our results show that the learned volume rendering priors are unbiased, robust, scalable, 3D aware, and, more importantly, easy to learn. We evaluate our method on both widely used benchmarks and real scenes, and report superior performance over state-of-the-art methods.

Method

In this paper, we (1) introduce volume rendering priors to infer UDFs from multi-view images; the prior is learned in a data-driven manner, which provides a novel perspective on recovering geometry with prior knowledge through volume rendering; and (2) propose a novel deep neural network and learning scheme, along with extensive analysis, for learning an unbiased differentiable renderer for UDFs with robustness, scalability, and 3D awareness.

Here is an overview of our method. In the training phase, our volume rendering prior takes sliding windows of ground-truth UDFs from training meshes as input and outputs opaque densities for alpha blending. Its parameters are optimized by minimizing the error between the rendered depths and the ground-truth depth maps. In the testing phase, we freeze the volume rendering prior and use multi-view RGB images to optimize a randomly initialized UDF field, as sketched below.
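To make the two-phase pipeline concrete, here is a minimal PyTorch sketch. Everything in it is an illustrative assumption rather than the paper's actual implementation: the window size, network architectures, sampling scheme, and the names VolumeRenderingPrior, UDFField, and blend_weights are all hypothetical.

```python
# Hypothetical sketch of the two-phase pipeline; all sizes and names are
# illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VolumeRenderingPrior(nn.Module):
    """Maps a sliding window of unsigned distances along a ray to an
    opaque density (alpha) for the center sample of the window."""

    def __init__(self, window: int = 5, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(window, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, udf_windows):  # (rays, samples, window)
        return torch.sigmoid(self.net(udf_windows)).squeeze(-1)


def blend_weights(alpha):
    """Standard alpha blending: w_i = alpha_i * prod_{j<i}(1 - alpha_j)."""
    ones = torch.ones_like(alpha[..., :1])
    trans = torch.cumprod(torch.cat([ones, 1.0 - alpha + 1e-10], -1), -1)[..., :-1]
    return alpha * trans


# ---- Training phase: learn the prior from GT UDFs and GT depth maps ----
prior = VolumeRenderingPrior()
opt = torch.optim.Adam(prior.parameters(), lr=1e-4)

udf_windows = torch.rand(1024, 64, 5)                    # windows of GT UDFs per ray
t_vals = torch.linspace(0.05, 2.0, 64).expand(1024, 64)  # sample depths along rays
gt_depth = torch.rand(1024)                              # GT depth per ray

w = blend_weights(prior(udf_windows))
depth = (w * t_vals).sum(-1)       # depth rendered via alpha blending
loss = F.mse_loss(depth, gt_depth)
opt.zero_grad(); loss.backward(); opt.step()

# ---- Testing phase: freeze the prior, fit a UDF field from RGB images ----
class UDFField(nn.Module):
    """Randomly initialized field predicting (unsigned distance, color)."""

    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 unsigned distance + 3 RGB channels
        )

    def forward(self, x):  # x: (..., 3) query points
        out = self.net(x)
        # softplus keeps the predicted distance non-negative (unsigned)
        return F.softplus(out[..., 0]), torch.sigmoid(out[..., 1:])


for p in prior.parameters():
    p.requires_grad_(False)        # the prior stays frozen at test time

field = UDFField()
opt = torch.optim.Adam(field.parameters(), lr=5e-4)

pts = torch.rand(1024, 64, 3)      # sampled points along camera rays
gt_rgb = torch.rand(1024, 3)       # pixel colors from the multi-view images

udf, rgb = field(pts)
windows = udf.unfold(-1, 5, 1)     # sliding windows per ray: (1024, 60, 5)
w = blend_weights(prior(windows))  # frozen prior maps distances to alphas
pred_rgb = (w.unsqueeze(-1) * rgb[:, 2:-2]).sum(1)  # colors at window centers
loss = F.mse_loss(pred_rgb, gt_rgb)
opt.zero_grad(); loss.backward(); opt.step()
```

The point of the sketch is that the same frozen mapping from unsigned-distance windows to alphas drives both phases: depth supervision when learning the prior, and RGB supervision when optimizing the UDF field for an unseen scene.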

Visualization Results

Comparison on DeepFashion3D Dataset

Visual comparisons of open-surface reconstructions with error maps on the DeepFashion3D dataset. Note that NeAT uses additional mask supervision. The transition from blue to yellow indicates small to large reconstruction errors.

Comparison on DTU Dataset

Comparison on Replica Dataset

Comparison on Real-Captured Datasets

Comparison on real-captured datasets. (a) Our real-captured scenes: the first is a paper-folding greeting card and the second is a plant. (b) The real scans used in NeUDF. The top-right and bottom-right parts of each image show enlarged details and rendered views, respectively.

Visualization Video

BibTeX

@inproceedings{zhang2024learning,
  title={Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors},
  author={Zhang, Wenyuan and Shi, Kanle and Liu, Yu-Shen and Han, Zhizhong},
  booktitle={European Conference on Computer Vision},
  year={2024},
  organization={Springer}
}