Julian Ost

PhD Candidate at Princeton University | 3D Generation & Inverse Rendering


I'm a PhD candidate in Computer Science at the Princeton Computational Imaging Lab, advised by Felix Heide. My primary research interests lie at the intersection of computer vision and computer graphics, particularly in 3D generation and inverse rendering for perception. I graduated with a B.Sc. in Mechanical Engineering and an M.Sc. in Robotics from the Technical University of Munich (TUM).

News

  • Mar 2026 One paper accepted at SIGGRAPH 2026
  • Jan 2026 LSD-3D presented at AAAI 2026 in Singapore

Selected Publications

LSD-3D: Large-Scale 3D Driving Scene Generation with Geometry Grounding

AAAI 2026 · 3D Scene Generation · World Simulation · Neural Rendering

Large-scale scene data is essential for training and testing in robot learning. Neural reconstruction methods have promised the capability of reconstructing large, physically grounded outdoor scenes from captured sensor data. However, these methods bake in static environments and allow only limited scene control -- they are functionally constrained in scene and trajectory diversity by the captures from which they are reconstructed. In contrast, generating driving data with recent image or video diffusion models offers control, but at the cost of geometry grounding and causality. We aim to bridge this gap and present a method that directly generates large-scale 3D driving scenes with accurate geometry, allowing for causal novel view synthesis with object permanence and explicit 3D geometry estimation.

Towards generalizable and interpretable three-dimensional tracking with inverse neural rendering

Nature Machine Intelligence · 3D Multi-Object Tracking · Inverse Rendering · Explainability

We propose to recast 3D multi-object tracking (MOT) from RGB cameras as an Inverse Neural Rendering (INR) problem. By optimizing over the latent space of pre-trained 3D object representations via a differentiable rendering pipeline, we retrieve the latents that best represent the object instances in a given input image. Beyond offering an alternative take on tracking, our method also enables examining the generated objects, reasoning about failure cases, and resolving ambiguous cases. We validate the generalization and scaling capabilities of our method on automotive datasets that were entirely unseen during training, without any fine-tuning.
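The analysis-by-synthesis idea above can be sketched as a latent optimization loop. This is a minimal illustration, not the paper's pipeline: `decoder` is a toy stand-in for the pre-trained 3D object representation plus differentiable renderer, and all names and dimensions are assumptions.

```python
import torch

# Toy stand-in for a pre-trained object representation + differentiable
# renderer: maps an 8-D latent code to a flattened 16x16 RGB "rendering".
torch.manual_seed(0)
decoder = torch.nn.Sequential(
    torch.nn.Linear(8, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 3 * 16 * 16),
)

def fit_latent(observed, steps=200, lr=0.05):
    """Retrieve the latent code whose rendering best matches `observed`."""
    z = torch.zeros(8, requires_grad=True)   # latent initialization
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Render from the current latent and compare to the observation.
        loss = torch.nn.functional.mse_loss(decoder(z), observed)
        loss.backward()
        opt.step()
    return z.detach(), loss.item()
```

Because the retrieved latent is an explicit object hypothesis, it can be rendered and inspected, which is what makes failure cases examinable.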

Neural Point Light Fields

CVPR 2022 · Light Fields · Neural Rendering · Point Clouds

Neural Point Light Fields represent scenes implicitly with a light field living on a sparse point cloud. This approach combines the efficiency of point-based representations with the high-quality view synthesis capabilities of neural rendering, enabling realistic scene reconstruction from sparse inputs. Promoting sparse point clouds to neural implicit light fields allows us to represent large scenes effectively with only a single radiance evaluation per ray.
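The single-evaluation idea can be illustrated with a toy sketch: each ray's color comes from one network evaluation over features pooled from nearby points, rather than from integrating many samples along the ray. Everything here (the pooling rule, shapes, and names) is an illustrative assumption, not the paper's architecture.

```python
import torch

torch.manual_seed(0)
points = torch.randn(100, 3)        # sparse point cloud positions
point_feats = torch.randn(100, 16)  # learned per-point features
rgb_head = torch.nn.Linear(3 + 3 + 16, 3)  # (origin, direction, feature) -> RGB

def render_ray(origin, direction, k=8):
    """Color one ray with a single network evaluation."""
    dist = ((points - origin) ** 2).sum(dim=-1)
    idx = dist.topk(k, largest=False).indices    # k nearest points
    pooled = point_feats[idx].mean(dim=0)        # aggregate their features
    # One radiance evaluation per ray, instead of many volume samples.
    return rgb_head(torch.cat([origin, direction, pooled]))
```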

Neural Scene Graphs for Dynamic Scenes

CVPR 2021 (Oral) · Neural Rendering · Scene Understanding · 3D Reconstruction

Neural Scene Graphs combine traditional scene graph representations with neural networks. The work presents the first neural rendering approach that decomposes dynamic multi-object scenes into a learned scene graph representation encoding object transformations and radiance, to efficiently render novel arrangements and views of the scene. The method is assessed on synthetic and real automotive data, validating that it learns dynamic scenes from nothing more than a video of the scene and allows for rendering photo-realistic novel views of novel scene compositions, including unseen sets of objects at unseen poses.
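The decomposition can be pictured as a simple data structure: each node pairs a pose with a latent radiance code, so a novel arrangement is just an edit of node transforms. This is a minimal sketch under those assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    name: str
    transform: list      # object-to-world pose (e.g. a flattened 4x4 matrix)
    radiance_latent: list  # code selecting the object's learned appearance

@dataclass
class SceneGraph:
    background: SceneNode        # static background node
    objects: list = field(default_factory=list)  # dynamic object nodes

    def move(self, name, new_transform):
        """Compose a novel scene arrangement by re-posing one object."""
        for node in self.objects:
            if node.name == name:
                node.transform = new_transform
```

Rendering then amounts to querying each node's radiance model under its current transform, which is why unseen arrangements need no retraining.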

Robotics Publications

VERDI: VLM-Embedded Reasoning for Autonomous Driving

End-to-End Planning · Vision-Language Models (VLM) · Autonomous Driving

VERDI introduces a novel framework for VLM-embedded reasoning in autonomous driving scenarios. We aim to achieve both (1) fast online planning with a modularized, differentiable end-to-end (e2e) architecture, and (2) a human-like reasoning process with a vision-language model (VLM). Our key idea is to distill the reasoning process and commonsense knowledge from a VLM into the e2e driving model.

Teaching

COS324 Intro to Machine Learning
Graduate Teaching Assistant, Fall 2024
COS429 Intro to Computer Vision
Graduate Teaching Assistant, Spring 2024

Service

Outstanding Reviewer Award
ECCV 2024, CVPR 2023
Reviewer
Vision: CVPR (2025, 2024, 2023), ICCV (2025, 2023), ECCV 2024
Machine Learning: NeurIPS (2025, 2024, 2023), ICLR 2023, AAAI 2026
Graphics: SIGGRAPH 2026, SIGGRAPH Asia 2023, Eurographics 2025