PortraitBooth : A Versatile Portrait Model for Fast Identity-preserved Personalization

¹Xiamen University, ²Tencent, ³Nanjing University

TL;DR: PortraitBooth enables text-to-portrait generation from a single reference image, preserving identity, supporting diverse expression editing, and scaling to multi-subject generation at low training cost, with no fine-tuning required at inference.


Abstract

Recent advancements in personalized image generation using diffusion models have been noteworthy. However, existing methods suffer from inefficiencies due to the requirement for subject-specific fine-tuning. This computationally intensive process hinders efficient deployment, limiting practical usability. Moreover, these methods often grapple with identity distortion and limited expression diversity. In light of these challenges, we propose PortraitBooth, an innovative approach designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation, without the need for fine-tuning. PortraitBooth leverages subject embeddings from a face recognition model for personalized image generation without fine-tuning. It eliminates computational overhead and mitigates identity distortion. The introduced dynamic identity preservation strategy further ensures close resemblance to the original image identity. Moreover, PortraitBooth incorporates emotion-aware cross-attention control for diverse facial expressions in generated images, supporting text-driven expression editing. Its scalability enables efficient and high-quality image creation, including multi-subject generation. Extensive results demonstrate superior performance over other state-of-the-art methods in both single and multiple image generation scenarios.

Method


Overview of the PortraitBooth framework. Most existing portrait generation methods rely on a CLIP image encoder to extract an identity embedding from reference images. However, CLIP features capture only superficial appearance rather than identity-discriminative structure, and they do not support expression editing. This motivated us to develop a more capable portrait generation method that achieves stronger identity preservation while also enabling expression editing. For more details, please refer to our paper.
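The idea of conditioning generation on a face-recognition embedding rather than a CLIP image feature can be sketched as follows. This is a minimal illustration, not the paper's exact architecture: the projector layout, dimensions (512-d identity feature, 768-d text-token space), and the placeholder-token replacement scheme are all assumptions.

```python
import torch
import torch.nn as nn

class IdentityProjector(nn.Module):
    """Hypothetical projector: maps a face-recognition feature
    (e.g. a 512-d ArcFace-style embedding) into the text-token
    space of the diffusion model's text encoder, so it can stand
    in for a placeholder identity token in the prompt."""

    def __init__(self, id_dim: int = 512, token_dim: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(id_dim, token_dim),
            nn.GELU(),
            nn.Linear(token_dim, token_dim),
        )

    def forward(self, id_embed: torch.Tensor) -> torch.Tensor:
        # id_embed: (B, id_dim) -> (B, token_dim)
        return self.mlp(id_embed)

# Usage sketch: replace a placeholder token's embedding with the
# projected identity feature before feeding tokens to the U-Net.
proj = IdentityProjector()
id_embed = torch.randn(1, 512)         # stand-in for a real face feature
text_tokens = torch.randn(1, 77, 768)  # stand-in for CLIP text embeddings
placeholder_idx = 5                    # assumed position of the "S*" token
text_tokens[:, placeholder_idx] = proj(id_embed)
```

Because the identity feature comes from a recognition network trained to discriminate identities, it carries more identity-specific information than a generic CLIP appearance feature, which is the intuition behind this design.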

Expression Editing


Our model supports diverse facial expression and attribute editing while maintaining strong identity preservation.
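One common way to realize text-driven expression control of the kind described above is to reweight cross-attention toward the prompt's emotion tokens. The sketch below illustrates that general mechanism; the function name, the `boost` scalar, and the choice to add a log-scale bias before the softmax are all illustrative assumptions, not the paper's exact emotion-aware cross-attention control.

```python
import math
import torch

def emotion_aware_attention(attn_logits: torch.Tensor,
                            emotion_token_ids: list[int],
                            boost: float = 2.0) -> torch.Tensor:
    """Amplify cross-attention at the token positions of emotion
    words (e.g. "smiling") so the described expression dominates.

    attn_logits: (B, heads, Q, T) pre-softmax cross-attention logits.
    Adding log(boost) to a logit multiplies that token's attention
    weight by `boost` before renormalization.
    """
    attn_logits = attn_logits.clone()
    attn_logits[..., emotion_token_ids] += math.log(boost)
    return attn_logits.softmax(dim=-1)

# Usage sketch: boost token 3 (assumed index of an emotion word).
logits = torch.zeros(1, 1, 4, 10)
probs = emotion_aware_attention(logits, [3], boost=2.0)
```

In this toy case every token starts with equal logits, so after boosting, the emotion token receives twice the attention mass of any other token.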

Comparisons


Comparison with state-of-the-art methods in identity-preserving personalized portrait generation.


Comparison with the concurrent work InstantID.

More results


Our method is easily extensible: combined with multi-subject generation methods, it achieves personalized portrait generation for multiple subjects.

BibTeX

@article{peng2023portraitbooth,
  title={PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization},
  author={Peng, Xu and Zhu, Junwei and Jiang, Boyuan and Tai, Ying and Luo, Donghao and Zhang, Jiangning and Lin, Wei and Jin, Taisong and Wang, Chengjie and Ji, Rongrong},
  journal={arXiv preprint arXiv:2312.06354},
  year={2023}
}