Abstract



We present a deep neural network for removing undesirable shading features from an unconstrained portrait image, recovering the underlying texture. Our training scheme incorporates three regularization strategies: masked loss, to emphasize high-frequency shading features; soft-shadow loss, which improves sensitivity to subtle changes in lighting; and shading-offset estimation, to supervise separation of shading and texture. Our method demonstrates improved delighting quality and generalization when compared with the state-of-the-art. We further demonstrate how our delighting method can enhance the performance of light-sensitive computer vision tasks such as face relighting and semantic parsing, allowing them to handle extreme lighting conditions.


Overview



(1) We train a U-Net-based CNN to estimate the ground-truth de-lit image Idlt from a given upper-body portrait image Isrc. (2) A second decoder learns the offset image Ioff = Isrc - Idlt during training, allowing the latent space to discriminate between shading and texture. (3) We synthesize soft-shadow variants of each input image in our training set and apply a small regularization loss to their outputs to improve generalization to subtle changes in lighting. (4) We localize high-frequency shading (e.g., shadow borders, reflections) in our training images and emphasize these regions in our loss function to encourage the removal of small but visually significant lighting artifacts.
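The sketch below illustrates how the four training signals above could be combined in a single optimization step. It is a minimal PyTorch approximation, not the authors' released code: the module names (encoder, delight_dec, offset_dec), the plain L1 terms, and the loss weights are all assumptions made for illustration.

import torch.nn.functional as F

def training_step(encoder, delight_dec, offset_dec,
                  I_src, I_dlt, I_soft, shading_mask,
                  w_mask=2.0, w_off=1.0, w_soft=0.1):
    """One training step on a batch of portraits.

    I_src        : input portrait with harsh lighting       (B, 3, H, W)
    I_dlt        : ground-truth de-lit (texture) image      (B, 3, H, W)
    I_soft       : soft-shadow variant of I_src             (B, 3, H, W)
    shading_mask : 1 where high-frequency shading occurs    (B, 1, H, W)
    """
    # (1) Estimate the de-lit image from the source portrait.
    z = encoder(I_src)
    I_pred = delight_dec(z)
    recon_loss = F.l1_loss(I_pred, I_dlt)

    # (4) Masked loss: re-weight errors inside shadow borders and reflections.
    masked_loss = (shading_mask * (I_pred - I_dlt).abs()).mean()

    # (2) Shading-offset branch: a second decoder predicts Ioff = Isrc - Idlt,
    #     pushing the shared latent space to separate shading from texture.
    I_off_pred = offset_dec(z)
    offset_loss = F.l1_loss(I_off_pred, I_src - I_dlt)

    # (3) Soft-shadow regularization: the soft-shadow variant should map to
    #     the same de-lit target, supervised with a small weight.
    I_pred_soft = delight_dec(encoder(I_soft))
    soft_loss = F.l1_loss(I_pred_soft, I_dlt)

    return recon_loss + w_mask * masked_loss + w_off * offset_loss + w_soft * soft_loss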


Dataset



We modify a popular face-recognition dataset, CMU Multi-PIE (a), to synthesize the source (d), target (c), and all intermediary images (e, f, g) required for our supervised learning pipeline. Our training set consists of 130 subjects, each captured under two different poses/clothing (260 unique images). 1,293 lighting conditions (d) were generated for each unique image.
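For intuition, a training sample pairs one of the synthesized lighting conditions with the shared de-lit target for that subject/pose. The sketch below shows one hypothetical way to assemble such a pair; the directory layout and file names are assumptions, not the dataset's actual structure.

from pathlib import Path
import random

def sample_training_pair(root, subject_id, pose_id, n_lightings=1293):
    """Return file paths for one (source, target) training pair."""
    base = Path(root) / "subjects" / subject_id / pose_id
    k = random.randrange(n_lightings)  # pick one of the 1,293 lighting conditions
    return {
        "I_src": base / f"lit_{k:04d}.png",          # differently-lit source image
        "I_dlt": base / "delit.png",                 # shared de-lit (texture) target
        "I_soft": base / f"soft_{k:04d}.png",        # soft-shadow variant of the same condition
        "mask": base / f"shading_mask_{k:04d}.png",  # high-frequency shading mask
    }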


Results



We compare our method with recent state-of-the-art methods: Total Relighting (TR) and Single Image Portrait Relighting via Explicit Multiple Reflectance Channel Modelling (EMR). We emphasize our method's consistency in the removal of lighting artifacts, the recovery of texture, and the preservation of non-lighting-based content.


Applications



Our method serves as an effective data-normalization tool for face relighting by removing shading artifacts from the input image. It also improves the semantic clarity of face images under harsh illumination, which substantially benefits existing face-parsing pipelines.
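The snippet below illustrates this use as a preprocessing step for a light-sensitive downstream task. It is only a usage sketch: delighting_net and face_parser stand in for any pretrained de-lighting and face-parsing models and are not part of the released code.

import torch

@torch.no_grad()
def parse_under_harsh_lighting(delighting_net, face_parser, I_src):
    """I_src: (B, 3, H, W) portrait tensor in [0, 1]."""
    I_dlt = delighting_net(I_src)  # remove shading artifacts first
    labels = face_parser(I_dlt)    # then run the downstream task on the de-lit image
    return labels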



Presentation Video



Citation



@InProceedings{weir2022deep,
  title     = {Deep Portrait Delighting},
  author    = {Weir, Joshua and Zhao, Junhong and Chalmers, Andrew and Rhee, Taehyun},
  booktitle = {Computer Vision -- ECCV 2022},
  year      = {2022}
}


Acknowledgement


This work was supported by the Entrepreneurial University Programme from the Tertiary Education Commission, and the MBIE Smart Idea Programme by the Ministry of Business, Innovation and Employment in New Zealand. We thank all image providers, including Flickr users “Debarshi Ray”, “5of7” and “photographer695”, whose photographs were cropped and processed by our neural network. Sources are provided in the supplementary material.