Learn Your Scales: Towards Scale-Consistent Generative Novel View Synthesis

{forghani,jjyu,kosta,marcus.brubaker}@yorku.ca, tristan.a@samsung.com
1York University, 2Vector Institute for AI, 3Samsung AI Centre Toronto, 4Google DeepMind

Abstract

Conventional depth-free multi-view datasets are captured using a moving monocular camera without metric calibration. In this monocular setting, the scale of the camera positions is ambiguous. Previous methods have acknowledged scale ambiguity in multi-view data via various ad-hoc normalization pre-processing steps, but have not directly analyzed the effect of incorrect scene scales on their application. In this paper, we seek to understand and address the effect of scale ambiguity when such data are used to train generative novel view synthesis methods (GNVS). In GNVS, new views of a scene or object can be synthesized given, minimally, a single image; the task is therefore unconstrained and necessitates the use of generative methods. The generative nature of these models captures all aspects of uncertainty, including any uncertainty of scene scales, which act as nuisance variables for the task. We study the effect of scene scale ambiguity in GNVS when views are sampled from a single image, isolating its effect on the resulting models, and, based on these intuitions, define new metrics that measure the scale inconsistency of generated views. We then propose a framework to estimate scene scales jointly with the GNVS model in an end-to-end fashion. Empirically, we show that our method reduces the scale inconsistency of generated views without the complexity or downsides of previous scale normalization methods. Further, we show that removing this ambiguity improves the generated image quality of the resulting GNVS model.
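To make the ambiguity concrete, the minimal sketch below (not taken from the paper; the intrinsics, pose, and point are illustrative assumptions) shows that rescaling all camera translations and scene points by the same positive factor leaves image projections unchanged, so the global scale cannot be recovered from images alone.

import numpy as np

# Pinhole camera observing a single 3D point (all values are assumptions).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])          # camera intrinsics
R = np.eye(3)                            # camera rotation
t = np.array([0.2, 0.0, 1.0])            # camera translation
X = np.array([0.5, -0.3, 4.0])           # 3D scene point

def project(K, R, t, X):
    # Standard pinhole projection: x ~ K (R X + t).
    x = K @ (R @ X + t)
    return x[:2] / x[2]

s = 3.7                                  # arbitrary global scale factor
print(project(K, R, t, X))               # original projection
print(project(K, R, s * t, s * X))       # identical projection after rescaling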




RealEstate10K Qualitative Samples

This viewer shows individual novel-view samples generated from the same conditioning information, so that the variability between them can be inspected. Select a scene from the menu below, then use the frame controls to automatically loop through the samples and get a sense of the differences between them. Use the toggle button below the image to highlight regions of interest.

Notice how the samples from models that use scale learning (two right-most columns) "jitter" less.



Sample Flow Consistency (SFC)

SFC measures scale variability via the motion variation among generated images that share the same conditioning image and camera pose. We measure motion with optical flow and use the median absolute deviation (MAD) of the flows as a proxy for scale uncertainty. The lower the SFC, the more consistent the scales of the samples.
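The exact flow estimator and aggregation used by the paper are not spelled out here, so the following is a minimal sketch under assumptions: flows are estimated (e.g., with an off-the-shelf method such as RAFT) from the conditioning image to each of N samples, the MAD is taken per pixel across samples, and the resulting map is averaged into a single score.

import numpy as np

def sample_flow_consistency(flows):
    # flows: array of shape (N, H, W, 2), one optical-flow field per generated
    # sample, all computed against the same conditioning image (assumption).
    # Per-pixel median flow across the N samples.
    median_flow = np.median(flows, axis=0, keepdims=True)       # (1, H, W, 2)
    # Median absolute deviation (MAD) of the flows across samples.
    mad = np.median(np.abs(flows - median_flow), axis=0)        # (H, W, 2)
    # Collapse the two flow channels into a per-pixel deviation magnitude.
    mad_map = np.linalg.norm(mad, axis=-1)                      # (H, W)
    # Aggregate to one score: lower means more scale-consistent samples.
    return mad_map.mean(), mad_map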

In this viewer, we visualize the components of our proposed metric, SFC. As in the previous viewer, you can select a scene from the menu below and view novel-view samples with the same conditioning information, the computed optical flows, and the MAD map derived from those flows. The controls are the same as in the viewer above.


Notice how the MAD maps of the samples from models that use scale learning (two right-most columns) are darker and their flow maps "flicker" less.



Scale-Sensitive Thresholded Symmetric Epipolar Distance (SS-TSED)

1. Start from a camera pose as the conditioning view.
2. Translate it along one of the axes (e.g., the x-axis) and generate the corresponding frame.
3. Then translate it along another axis (e.g., the y-axis) and generate the corresponding frame.
4. The 3D position of a point observed in the conditioning view always lies on a ray originating from the conditioning view.
5. However, a different scene scale in each generated view places the point at a different depth along that ray.
6. These differing scales cause the 2D position of the point observed from one generated camera to lie some distance off the epipolar line formed by the generated views and the point observed by the other camera.
7. In contrast, consistent scales in both generated views cause the 2D position of the point observed from one camera to lie on the epipolar line formed by the generated views and the point observed by the other camera (the sketch below shows the underlying epipolar-distance computation).
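The sketch below illustrates the core computation the steps above describe: forming the fundamental matrix between the two generated views and thresholding a symmetric epipolar distance over correspondences. The matching procedure, threshold value, and the exact scale-sensitive aggregation used for SS-TSED are assumptions here, and x1, x2 denote hypothetical pixel correspondences between the two generated frames.

import numpy as np

def fundamental_from_pose(K, R, t):
    # Fundamental matrix from shared intrinsics K and relative pose (R, t).
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])   # skew-symmetric [t]_x
    E = tx @ R                            # essential matrix
    K_inv = np.linalg.inv(K)
    return K_inv.T @ E @ K_inv

def symmetric_epipolar_distance(F, x1, x2):
    # x1, x2: (N, 2) matched pixel coordinates in the two generated views.
    ones = np.ones((x1.shape[0], 1))
    x1h = np.hstack([x1, ones])           # homogeneous coordinates
    x2h = np.hstack([x2, ones])
    l2 = (F @ x1h.T).T                    # epipolar lines in view 2
    l1 = (F.T @ x2h.T).T                  # epipolar lines in view 1
    num = np.abs(np.sum(x2h * l2, axis=1))            # |x2^T F x1|
    d2 = num / np.linalg.norm(l2[:, :2], axis=1)      # point-line distance in view 2
    d1 = num / np.linalg.norm(l1[:, :2], axis=1)      # point-line distance in view 1
    return 0.5 * (d1 + d2)                # one common symmetric form

def thresholded_score(F, x1, x2, tau=2.0):
    # Fraction of matches whose symmetric epipolar distance is below the
    # threshold tau (in pixels); tau and the aggregation are assumptions.
    return float(np.mean(symmetric_epipolar_distance(F, x1, x2) < tau))

With consistent scales in the two generated views, the implied relative pose matches the epipolar geometry of the correspondences and the distances stay below the threshold; inconsistent scales shift points off their epipolar lines and lower the score.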

BibTeX

@article{forghani2025learnyourscales,
      title={Learn Your Scales: Towards Scale-Consistent Generative Novel View Synthesis},
      author={Forghani, Fereshteh and Yu, Jason J and Aumentado-Armstrong, Tristan and Derpanis, Konstantinos G and Brubaker, Marcus A},
      journal={arXiv preprint arXiv:2503.15412},
      year={2025}
    }