Spatiotemporally Consistent Indoor Lighting Estimation with Diffusion Priors
TL;DR: Our method recovers a 4D lighting estimate from any monocular RGB video or image input.

Pipeline overview. We first fine-tune an inpainting diffusion model on small-scale synthetic datasets, which enables it to generalize well to real-world indoor scenes. Because ground-truth 4D lighting data are scarce, we train only on static scenes and, during optimization, lift the model’s predictions into a temporally consistent 4D lighting representation using our proposed method.
Abstract
Indoor lighting estimation from a single image or video remains a challenge due to its highly ill-posed nature, especially when the lighting conditions of the scene vary spatially and temporally. We propose a method that estimates from an input video a continuous light field describing the spatiotemporally varying lighting of the scene. We leverage 2D diffusion priors to optimize such a light field, represented as an MLP. To enable zero-shot generalization to in-the-wild scenes, we fine-tune a pre-trained image diffusion model to predict lighting at multiple locations by jointly inpainting multiple chrome balls as light probes. We evaluate our method on indoor lighting estimation from a single image or video and show superior performance over the compared baselines. Most importantly, we highlight results on spatiotemporally consistent lighting estimation from in-the-wild videos, which is rarely demonstrated in previous works.
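To make the light-field representation concrete, here is a minimal sketch, not the authors' code: a tiny MLP that maps a spatiotemporal query (x, y, z, t) to a compact lighting code, here assumed to be 2nd-order spherical-harmonic coefficients (9 per RGB channel, 27 values). The layer sizes and the SH output parameterization are illustrative assumptions.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Initialize weights and biases for a fully connected MLP."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out))
        b = np.zeros(n_out)
        params.append((w, b))
    return params

def light_field(params, xyzt):
    """Query the light field at N spatiotemporal points, xyzt of shape (N, 4)."""
    h = xyzt
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden layers
    w, b = params[-1]
    return h @ w + b                    # (N, 27) SH lighting codes

rng = np.random.default_rng(0)
params = init_mlp(rng, [4, 64, 64, 27])  # (x, y, z, t) -> 27 SH coefficients
codes = light_field(params, rng.normal(size=(5, 4)))
print(codes.shape)  # (5, 27)
```

In the actual method this MLP would be optimized against the diffusion model's chrome-ball predictions; the sketch only shows the query interface of a continuous spatiotemporal light field.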
Spatially Varying Lighting Estimation from Image
Our method enables spatially varying lighting estimation from a single image. In the video below, we show on the left the estimated lighting, visualized as a chrome ball moving through the scene, and on the right the corresponding depth map.
Try selecting different scenes:







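The chrome-ball visualizations above can be reproduced from any environment map. The sketch below (an assumed convention, not the authors' renderer) renders a mirror-sphere light probe from an equirectangular environment map by reflecting an orthographic view ray off the sphere and looking up the reflected direction.

```python
import numpy as np

def render_chrome_ball(env, size=64):
    """env: (H, W, 3) equirectangular map; returns a (size, size, 3) probe image."""
    ys, xs = np.mgrid[0:size, 0:size]
    u = (xs + 0.5) / size * 2 - 1            # [-1, 1] across the ball
    v = 1 - (ys + 0.5) / size * 2
    r2 = u**2 + v**2
    mask = r2 <= 1.0                          # pixels covered by the sphere
    z = np.sqrt(np.clip(1 - r2, 0, 1))        # sphere normal n = (u, v, z)
    # Reflect the view direction d = (0, 0, -1): r = d - 2 (d . n) n
    rx, ry, rz = 2 * z * u, 2 * z * v, 2 * z * z - 1
    theta = np.arccos(np.clip(ry, -1, 1))     # polar angle from +y (up)
    phi = np.arctan2(rx, -rz)                 # azimuth
    H, W, _ = env.shape
    col = ((phi / (2 * np.pi) + 0.5) * (W - 1)).astype(int)
    row = (theta / np.pi * (H - 1)).astype(int)
    ball = env[row, col]                      # nearest-neighbor lookup
    ball[~mask] = 0.0
    return ball

env = np.random.default_rng(1).random((32, 64, 3))  # placeholder environment
ball = render_chrome_ball(env, size=48)
print(ball.shape)  # (48, 48, 3)
```

The orthographic camera and the equirectangular axis convention are assumptions; any probe renderer following the same reflection rule would produce an equivalent visualization.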
Spatiotemporally Varying Lighting Estimation from Video
Our method enables lighting estimation from a single video with dynamic lighting conditions. In the video below, we show on the left the estimated lighting, visualized as a chrome ball moving through the scene, and on the right the corresponding depth map.
Try selecting different scenes:







Object Relighting Results on Real-World Videos
The environment maps estimated by our method can be used to relight objects in the scene, enabling realistic object insertion and relighting in real-world videos.
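As a minimal illustration of how an estimated environment map can drive relighting (a standard image-based-lighting computation, not the authors' renderer), the sketch below shades a Lambertian surface normal by integrating an equirectangular environment map, weighting each texel by its cosine foreshortening and solid angle.

```python
import numpy as np

def diffuse_shade(env, normal):
    """Lambertian shading of a unit normal under an (H, W, 3) equirectangular map."""
    H, W, _ = env.shape
    theta = (np.arange(H) + 0.5) / H * np.pi        # polar angle from +y (up)
    phi = (np.arange(W) + 0.5) / W * 2 * np.pi      # azimuth
    t, p = np.meshgrid(theta, phi, indexing="ij")
    dirs = np.stack([np.sin(t) * np.cos(p),         # per-texel direction
                     np.cos(t),
                     np.sin(t) * np.sin(p)], axis=-1)
    cos = np.clip(dirs @ normal, 0, None)           # cosine term, (H, W)
    dA = np.sin(t) * (np.pi / H) * (2 * np.pi / W)  # solid angle per texel
    # Irradiance / pi gives the diffuse RGB response (albedo = 1)
    return (env * (cos * dA)[..., None]).sum((0, 1)) / np.pi

env = np.ones((16, 32, 3))                 # uniform white environment
rgb = diffuse_shade(env, np.array([0.0, 1.0, 0.0]))
print(rgb)  # ≈ [1, 1, 1] for a uniform environment
```

A uniform environment yields a shade of approximately 1 because the cosine-weighted hemisphere integral equals pi, which cancels the Lambertian 1/pi factor; the spatiotemporal light field supplies a different environment map per object position and per frame.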

