Basic Knowledge of View Synthesis

Mar 14, 2024

[Runyi Yang](<https://runyiyang.github.io/>) / [Runyi’s Blogs](<https://runyiyang.notion.site/Runyi-s-Blogs-f52d6bf73e104c51a4f5e80529b6a9b6>)

1. What is 3D content

3D content is a combination of a shape model and an appearance model that can be rendered into 2D images from different viewpoints.

Computer Graphics: Given a set of 3D properties, how do we set up an efficient, high-quality rendering process?
Computer Vision: Given images, point clouds, or RGB-D data captured by devices (e.g. cameras), recover the principal elements of the scene. The focus is on inverse rendering.

What shape representations to use?
- Mesh
- Point cloud
- Occupancy Field
- Signed Distance Fields
What appearance representations to use?
- Material Texture Map & Environmental Lighting:
  - Separates lighting from material
  - Ideal, but hard to achieve
- Radiance Field (surface light field [1], SIGGRAPH 2000)
  - Does not separate lighting from material
  - Simple, but cannot be edited or relit
What rendering operator to use?
- Depends on the specific shape and appearance representations
- Needs to be fully differentiable

2. Task: View Synthesis / What is View Synthesis?

View synthesis is the problem of generating a synthetic image that looks as if it were taken from a novel viewpoint.

Imagine you have a set of images of a scene that is sufficient to describe the details of the scene or object in the 3D world. You could readily picture what it would look like from a novel view. The task is to teach algorithms to do the same.

3. How to Evaluate View Synthesis

PSNR: Peak Signal to Noise Ratio
- Describes the difference between a noisy image and a clean image
- For a clean image $I$ and a noisy image $K$, both of size $m \times n$, $MSE$ is the mean squared error over corresponding pixels of the two images. $MAX_I$ is the maximum possible pixel value, e.g. 255 for 8-bit images.
- As the $MSE$ decreases, the difference between the two images shrinks and the $PSNR$ increases.
$$ MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1} [I(i,j) - K(i, j)]^2 $$

$$ PSNR = 10 \cdot \log_{10}\left(\frac{MAX_I^2}{MSE}\right) $$
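
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes grayscale images stored as nested lists of rows, and the names `mse`, `psnr`, and `max_value` are chosen here for the example. In practice an array library would be used instead of plain loops.

```python
import math


def mse(image_i, image_k):
    """Mean squared error between two equally sized grayscale images.

    Each image is a list of rows, and each row is a list of pixel values.
    """
    m, n = len(image_i), len(image_i[0])
    total = 0.0
    for row_i, row_k in zip(image_i, image_k):
        for a, b in zip(row_i, row_k):
            total += (a - b) ** 2
    return total / (m * n)


def psnr(image_i, image_k, max_value=255.0):
    """PSNR in decibels; returns infinity when the two images are identical."""
    err = mse(image_i, image_k)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy 2x2 example (values are made up for illustration).
clean = [[10, 20], [30, 40]]
noisy = [[12, 18], [33, 40]]
print(mse(clean, noisy))   # (4 + 4 + 9 + 0) / 4 = 4.25
print(psnr(clean, noisy))  # roughly 41.85 dB
```

The guard for `err == 0` avoids a division by zero; identical images have an unbounded PSNR.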
SSIM: Structural Similarity Index Measure
- Determines whether an image is distorted by describing the similarity between two images
- Measures three properties of the image: luminance, contrast, and structure
  - Luminance
    
    $$ \mu_x = \frac{1}{N}\sum^N_{i=1}x_i \\ l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2+\mu_y^2+C_1} $$
  - Contrast
    
    $$ \sigma_x = (\frac{1}{N-1}\sum_{i=1}^N(x_i-\mu_x)^2)^\frac{1}{2} \\ c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2+\sigma_y^2+C_2} $$
  - Structure
    
    $$ \sigma_{xy}=\frac{1}{N-1}\sum_{i=1}^N(x_i-\mu_x)(y_i-\mu_y) \\ s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} $$
- SSIM
  
  $$ SSIM(x,y) = l(x,y)^\alpha \cdot c(x,y)^\beta \cdot s(x,y)^\gamma $$
  
  where $\alpha, \beta, \gamma$ are hyperparameters and are usually set to 1.
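  - Circular-symmetric Gaussian Weighting Function: in practice, the statistics are computed locally inside a sliding window weighted by a circular-symmetric Gaussian, and the per-window SSIM values are averaged over the image.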
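
As a rough sketch, the three SSIM terms can be computed for a single patch as below. Several things here are assumptions not stated in the text above: the patch is passed as a flat list of pixel values, the whole patch is treated as one window (no Gaussian weighting or sliding window), and the constants follow the common convention $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$, $C_3 = C_2 / 2$ with $K_1 = 0.01$, $K_2 = 0.03$ and $L$ the dynamic range. The function names are illustrative.

```python
import math


def ssim_components(x, y, max_value=255.0, k1=0.01, k2=0.03):
    """Return (luminance, contrast, structure) for two flat patches x and y."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Unbiased (N - 1) estimates, matching the formulas in the text.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    sigma_x, sigma_y = math.sqrt(var_x), math.sqrt(var_y)

    # Stabilizing constants (assumed convention, see lead-in).
    c1 = (k1 * max_value) ** 2
    c2 = (k2 * max_value) ** 2
    c3 = c2 / 2.0

    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sigma_x * sigma_y + c2) / (var_x + var_y + c2)
    structure = (cov_xy + c3) / (sigma_x * sigma_y + c3)
    return luminance, contrast, structure


def ssim(x, y, alpha=1, beta=1, gamma=1, **kwargs):
    """Single-window SSIM as the product of the three terms."""
    l, c, s = ssim_components(x, y, **kwargs)
    return (l ** alpha) * (c ** beta) * (s ** gamma)


# Toy patches (values are made up for illustration).
patch_x = [52, 55, 61, 59, 79, 61, 76, 61]
patch_y = [58, 59, 62, 61, 70, 62, 68, 62]  # lower-contrast version of patch_x
print(ssim(patch_x, patch_x))  # identical patches give a value of about 1
print(ssim(patch_x, patch_y))  # less than 1
```

Note that the structure term can be negative, so non-integer values of `gamma` would need extra care; the default exponents of 1 avoid this.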
LPIPS: Learned Perceptual Image Patch Similarity

LPIPS compares images in deep feature space to address a weakness of L2, PSNR, and SSIM: these metrics cannot reliably recognize smoothed (blurred) images as degraded.

A deep neural network (e.g. AlexNet) extracts features from the two images, and the two sets of features are compared using an L2 / MSE distance.
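
The comparison step can be sketched as follows. This is only an illustration of the feature-distance idea, not the actual LPIPS implementation: the feature extraction by a pretrained network is left out entirely, and hand-written lists stand in for its activations. The function names, the data layout (layers, then spatial positions, then channels), and the optional `channel_weights` argument are assumptions made for this example.

```python
import math


def unit_normalize(vec, eps=1e-10):
    """Scale a channel vector to (approximately) unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def feature_distance(feats_a, feats_b, channel_weights=None):
    """LPIPS-style distance between two stacks of feature maps.

    feats_a, feats_b: list over layers; each layer is a list over spatial
    positions; each position is a list of channel activations.
    channel_weights: optional per-layer list of per-channel weights
    (uniform weights are used when omitted).
    """
    total = 0.0
    for layer_idx, (layer_a, layer_b) in enumerate(zip(feats_a, feats_b)):
        layer_sum = 0.0
        for vec_a, vec_b in zip(layer_a, layer_b):
            na, nb = unit_normalize(vec_a), unit_normalize(vec_b)
            if channel_weights is None:
                weights = [1.0] * len(na)
            else:
                weights = channel_weights[layer_idx]
            # Weighted squared difference across channels at this position.
            layer_sum += sum(w * (p - q) ** 2 for w, p, q in zip(weights, na, nb))
        # Average over spatial positions, then accumulate over layers.
        total += layer_sum / len(layer_a)
    return total


# Stand-in features: 1 layer, 2 spatial positions, 2 channels (made-up values).
feats_a = [[[1.0, 0.0], [0.0, 2.0]]]
feats_b = [[[0.0, 1.0], [0.0, 5.0]]]
print(feature_distance(feats_a, feats_a))  # 0.0 for identical features
print(feature_distance(feats_a, feats_b))  # about 1.0
```

Because each channel vector is normalized first, the second position contributes nothing in the example: `[0, 2]` and `[0, 5]` point in the same direction and differ only in magnitude.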