Julian Ost

Ph.D. Candidate at Princeton University | Inverse Rendering


Inverse Neural Rendering for Explainable Multi-Object Tracking

3D Multi-Object Tracking · Inverse Rendering · Explainability

We propose to recast 3D multi-object tracking (MOT) from RGB cameras as an Inverse Neural Rendering (INR) problem. By optimizing through a differentiable rendering pipeline over the latent space of pre-trained 3D object representations, we retrieve the latents that best represent the object instances in a given input image. Beyond offering an alternative take on tracking, our method enables examining the generated objects, reasoning about failure cases, and resolving ambiguous ones. We validate the generalization and scaling capabilities of our method on automotive datasets that are entirely unseen by our method, without any fine-tuning.
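To make the test-time optimization concrete, here is a minimal sketch of the INR idea: a frozen generative object model decodes a latent code into a rendering, and gradient descent on a photometric loss retrieves the latent that best explains an observed detection. The `FrozenObjectDecoder` MLP is a hypothetical stand-in for the pre-trained 3D representation and differentiable renderer used in the actual pipeline.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained generative object model: it decodes
# a latent code into an RGB rendering. The real pipeline renders a 3D object
# representation differentiably; a small MLP keeps this sketch self-contained.
class FrozenObjectDecoder(nn.Module):
    def __init__(self, latent_dim=64, image_size=32):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * image_size * image_size),
        )

    def forward(self, z):
        rgb = torch.sigmoid(self.net(z))
        return rgb.view(-1, 3, self.image_size, self.image_size)

decoder = FrozenObjectDecoder()
decoder.requires_grad_(False)  # the generative prior stays fixed

observed = torch.rand(1, 3, 32, 32)  # cropped detection from the input frame

# Test-time optimization: retrieve the latent that best explains the observation.
z = torch.zeros(1, 64, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=1e-2)
for step in range(200):
    optimizer.zero_grad()
    rendered = decoder(z)
    loss = torch.nn.functional.mse_loss(rendered, observed)
    loss.backward()
    optimizer.step()

# `z` now encodes the object's shape and appearance; comparing the rendered
# and observed pixels is what makes failure cases directly inspectable.
```

Because the tracker's state is a renderable latent rather than an opaque feature vector, ambiguous associations can be resolved by simply rendering the candidates and comparing them to the image.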


Neural Point Light Fields

CVPR 2022 · Light Fields · Neural Rendering · Point Clouds

Neural Point Light Fields represent scenes implicitly with a light field living on a sparse point cloud. This approach combines the efficiency of point-based representations with the high-quality view synthesis of neural rendering, enabling realistic scene reconstruction from sparse inputs. Promoting sparse point clouds to neural implicit light fields allows us to represent large scenes effectively with only a single radiance evaluation per ray.
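The sketch below illustrates the key efficiency claim under simplifying assumptions: instead of sampling and evaluating a network at many points along each ray, features are gathered from the point cloud and a single network call predicts the ray's color. Nearest-neighbor averaging stands in for the paper's learned aggregation over points near the ray; names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class PointLightField(nn.Module):
    """Toy light field on a point cloud: one color prediction per ray.
    This is a sketch, not the paper's implementation."""
    def __init__(self, feat_dim=32, k=8):
        super().__init__()
        self.k = k
        self.color_mlp = nn.Sequential(
            nn.Linear(feat_dim + 6, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),
        )

    def forward(self, rays_o, rays_d, points, feats):
        # Gather features from the k nearest cloud points per ray origin
        # (the paper uses a learned aggregation over points near the ray).
        dists = torch.cdist(rays_o, points)              # (R, P)
        knn = dists.topk(self.k, largest=False).indices  # (R, k)
        local = feats[knn].mean(dim=1)                   # (R, feat_dim)
        # A single network evaluation per ray: no volumetric sampling.
        ray = torch.cat([rays_o, rays_d, local], dim=-1)
        return self.color_mlp(ray)

model = PointLightField()
rays_o = torch.rand(16, 3)
rays_d = torch.randn(16, 3)
rays_d = rays_d / rays_d.norm(dim=-1, keepdim=True)
points = torch.rand(100, 3)    # sparse point cloud, e.g. from LiDAR
feats = torch.randn(100, 32)   # learnable per-point features
colors = model(rays_o, rays_d, points, feats)  # (16, 3)
```

This is what makes the representation scale to large scenes: the cost of rendering a pixel is one network call, independent of scene depth.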


Neural Scene Graphs for Dynamic Scenes

CVPR 2021 (Oral) · Neural Rendering · Scene Understanding · 3D Reconstruction

Neural Scene Graphs combine traditional scene graph representations with neural networks. We present the first neural rendering approach that decomposes dynamic multi-object scenes into a learned scene graph representation, whose nodes encode object transformations and radiance, to efficiently render novel arrangements and views of the scene. We assess the proposed method on synthetic and real automotive data, validating that our approach learns a dynamic scene solely by observing a video of that scene, and allows rendering photo-realistic novel views of novel scene compositions, with unseen sets of objects at unseen poses.
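A minimal sketch of the decomposition, with hypothetical names: each graph node pairs a rigid pose with a per-object radiance model, and rendering a novel arrangement amounts to editing node poses and querying each object's radiance model with rays expressed in its local frame. The `SceneGraphNode` class and `transform_ray` helper are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

class SceneGraphNode:
    """Illustrative graph node: a rigid transformation plus a per-object
    radiance model. Field names are assumptions for this sketch."""
    def __init__(self, pose, radiance_fn):
        self.pose = pose                # 4x4 world-to-object transform
        self.radiance_fn = radiance_fn  # NeRF-style radiance model

def transform_ray(pose, origin, direction):
    """Map a world-space ray into an object's local frame via its node pose."""
    R, t = pose[:3, :3], pose[:3, 3]
    return origin @ R.T + t, direction @ R.T

# A toy radiance model mapping a local ray to (r, g, b, density).
radiance = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 4))

# Editing the scene graph = moving a node; here the object shifts 2m along x.
pose = torch.eye(4)
pose[:3, 3] = torch.tensor([2.0, 0.0, 0.0])
node = SceneGraphNode(pose, radiance)

o = torch.zeros(1, 3)
d = torch.tensor([[0.0, 0.0, 1.0]])
o_local, d_local = transform_ray(node.pose, o, d)
rgb_sigma = node.radiance_fn(torch.cat([o_local, d_local], dim=-1))
```

Because geometry and appearance live in object-local frames, composing unseen arrangements only requires changing the transformations on the graph's edges, not retraining the radiance models.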