Julian Ost

PhD Candidate at Princeton University | 3D Generation & Inverse Rendering


I'm a PhD candidate in Computer Science at the Princeton Computational Imaging Lab, advised by Felix Heide. My primary research interests lie at the intersection of computer vision and computer graphics, particularly in 3D generation and inverse rendering for perception. I graduated with a B.Sc. in Mechanical Engineering and an M.Sc. in Robotics from the Technical University of Munich (TUM).

News

  • Mar 2026 One paper accepted at SIGGRAPH 2026
  • Jan 2026 LSD-3D presented at AAAI 2026 in Singapore

Selected Publications

LSD-3D: Large-Scale 3D Driving Scene Generation with Geometry Grounding

AAAI 2026 · 3D Scene Generation · World Simulation · Neural Rendering

Large-scale scene data is essential for training and testing in robot learning. Neural reconstruction methods have promised the capability of reconstructing large, physically grounded outdoor scenes from captured sensor data. However, these methods bake in static environments and allow only limited scene control -- they are functionally constrained in scene and trajectory diversity by the captures from which they are reconstructed. In contrast, generating driving data with recent image or video diffusion models offers control, but at the cost of geometry grounding and causality. We aim to bridge this gap and present a method that directly generates large-scale 3D driving scenes with accurate geometry, allowing for causal novel view synthesis with object permanence and explicit 3D geometry estimation.

Towards generalizable and interpretable three-dimensional tracking with inverse neural rendering

Nature Machine Intelligence · 3D Multi-Object Tracking · Inverse Rendering · Explainability

We propose to recast 3D multi-object tracking (MOT) from RGB cameras as an Inverse Neural Rendering (INR) problem. By optimizing over the latent space of pre-trained 3D object representations via a differentiable rendering pipeline, we retrieve the latents that best represent the object instances in a given input image. Beyond offering an alternative take on tracking, our method also enables examining the generated objects, reasoning about failure cases, and resolving ambiguous cases. We validate the generalization and scaling capabilities of our method on automotive datasets that were entirely unseen during training, without any fine-tuning.
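The analysis-by-synthesis idea above can be sketched as a latent optimization loop. This is a minimal illustration, not the paper's pipeline: `decoder` is a toy stand-in for the pre-trained 3D object representation plus differentiable renderer, and all names and dimensions are assumptions.

```python
import torch

# Toy stand-in for a pre-trained object representation + differentiable
# renderer: maps an 8-D latent code to a flattened 16x16 RGB "rendering".
torch.manual_seed(0)
decoder = torch.nn.Sequential(
    torch.nn.Linear(8, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 3 * 16 * 16),
)

def fit_latent(observed, steps=200, lr=0.05):
    """Retrieve the latent code whose rendering best matches `observed`."""
    z = torch.zeros(8, requires_grad=True)   # latent initialization
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Render from the current latent and compare to the observation.
        loss = torch.nn.functional.mse_loss(decoder(z), observed)
        loss.backward()
        opt.step()
    return z.detach(), loss.item()
```

Because the retrieved latent is an explicit object hypothesis, it can be rendered and inspected, which is what makes failure cases examinable.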

Neural Point Light Fields

CVPR 2022 · Light Fields · Neural Rendering · Point Clouds

Neural Point Light Fields represent scenes implicitly with a light field living on a sparse point cloud. This approach combines the efficiency of point-based representations with the high-quality view synthesis capabilities of neural rendering, enabling realistic scene reconstruction from sparse inputs. Promoting sparse point clouds to neural implicit light fields allows us to represent large scenes effectively with only a single radiance evaluation per ray.
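The single-evaluation idea can be illustrated with a toy sketch: each ray's color comes from one network evaluation over features pooled from nearby points, rather than from integrating many samples along the ray. Everything here (the pooling rule, shapes, and names) is an illustrative assumption, not the paper's architecture.

```python
import torch

torch.manual_seed(0)
points = torch.randn(100, 3)        # sparse point cloud positions
point_feats = torch.randn(100, 16)  # learned per-point features
rgb_head = torch.nn.Linear(3 + 3 + 16, 3)  # (origin, direction, feature) -> RGB

def render_ray(origin, direction, k=8):
    """Color one ray with a single network evaluation."""
    dist = ((points - origin) ** 2).sum(dim=-1)
    idx = dist.topk(k, largest=False).indices    # k nearest points
    pooled = point_feats[idx].mean(dim=0)        # aggregate their features
    # One radiance evaluation per ray, instead of many volume samples.
    return rgb_head(torch.cat([origin, direction, pooled]))
```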

Neural Scene Graphs for Dynamic Scenes

CVPR 2021 (Oral) · Neural Rendering · Scene Understanding · 3D Reconstruction

Neural Scene Graphs combine traditional scene graph representations with neural networks. The work presents the first neural rendering approach that decomposes dynamic multi-object scenes into a learned scene graph representation encoding object transformations and radiance, to efficiently render novel arrangements and views of the scene. The method is assessed on synthetic and real automotive data, validating that it learns dynamic scenes from nothing more than a video of the scene and allows for rendering photo-realistic novel views of novel scene compositions, including unseen sets of objects at unseen poses.
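The decomposition can be pictured as a simple data structure: each node pairs a pose with a latent radiance code, so a novel arrangement is just an edit of node transforms. This is a minimal sketch under those assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    name: str
    transform: list      # object-to-world pose (e.g. a flattened 4x4 matrix)
    radiance_latent: list  # code selecting the object's learned appearance

@dataclass
class SceneGraph:
    background: SceneNode        # static background node
    objects: list = field(default_factory=list)  # dynamic object nodes

    def move(self, name, new_transform):
        """Compose a novel scene arrangement by re-posing one object."""
        for node in self.objects:
            if node.name == name:
                node.transform = new_transform
```

Rendering then amounts to querying each node's radiance model under its current transform, which is why unseen arrangements need no retraining.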

Robotics Publications

VERDI: VLM-Embedded Reasoning for Autonomous Driving

End-to-End Planning · Vision-Language Models (VLM) · Autonomous Driving

VERDI introduces a novel framework for VLM-embedded reasoning in autonomous driving scenarios. We aim to achieve both (1) fast online planning with a modularized, differentiable end-to-end (e2e) architecture, and (2) a human-like reasoning process with a vision-language model (VLM). Our key idea is to distill the reasoning process and commonsense knowledge from a VLM into the e2e driving model.

Teaching

COS324 Intro to Machine Learning
Graduate Teaching Assistant, Fall 2024
COS429 Intro to Computer Vision
Graduate Teaching Assistant, Spring 2024

Service

Outstanding Reviewer Award
ECCV 2024, CVPR 2023
Reviewer
Vision: CVPR (2025, 2024, 2023), ICCV (2025, 2023), ECCV 2024
Machine Learning: NeurIPS (2025, 2024, 2023), ICLR 2023, AAAI 2026
Graphics: SIGGRAPH 2026, SIGGRAPH Asia 2023, Eurographics 2025