LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

Yang Yang1  Wen Wang1  Liang Peng3   Chaotian Song2   Yao Chen2  
Hengjia Li1   Xiaolong Yang4   Qinglin Lu4   Deng Cai1   Boxi Wu2   Wei Liu4  

1State Key Lab of CAD&CG, Zhejiang University, China  2School of Software Technology, Zhejiang University, China
3Fabu Inc., China  4Tencent Inc., China

📖TL;DR: LoRA-Composer allows users to generate images with fewer conditions, using readily accessible LoRA techniques (only a prompt and a layout condition are required).

 

For example, we extend to the following 16 customized concepts. 👇 👇 👇

Abstract

Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts. Multi-concept customization has emerged as a particularly challenging task within this domain. Existing approaches often rely on training a fusion matrix over multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. However, we find that this straightforward method faces two major challenges: 1) concept confusion, which occurs when the model cannot preserve distinct individual characteristics, and 2) concept vanishing, where the model fails to generate the intended subjects. To address these issues, we introduce LoRA-Composer, a training-free framework designed to seamlessly integrate multiple LoRAs, thereby enhancing the harmony among different concepts within generated images. LoRA-Composer addresses concept vanishing through Concept Injection Constraints, enhancing concept visibility via an expanded cross-attention mechanism. To combat concept confusion, Concept Isolation Constraints are introduced to refine the self-attention computation. Furthermore, Latent Re-initialization is proposed to effectively stimulate concept-specific latents within designated regions. Extensive experiments demonstrate a notable improvement of LoRA-Composer over standard baselines, especially when image-based conditions such as canny edges or pose estimations are removed.

 

Main Observation

Our method distinguishes itself from Mix-of-Show by eliminating both the image-based conditions (the sketch shown above) and the requirement to train a LoRA fusion matrix. Furthermore, we highlight the limitations of Mix-of-Show through failure cases. In the top row, we illustrate two key issues: concept vanishing, marked by the absence of intended concepts in the image, and concept confusion, where the model mistakenly merges distinct concepts.

Method Overview

LoRA-Composer takes textual, layout, and (optionally) image-based conditions to integrate and customize multiple concepts, using Latent Re-initialization for precise layout generation; a minimal sketch of this step is given below.
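As this page does not include code, the following is a minimal, hypothetical Python sketch of what region-wise latent re-initialization could look like, assuming each concept region of the shared initial latent is overwritten with a latent prepared for that concept alone; the names reinitialize_latent, concept_latents, and boxes are illustrative, not the official implementation.

    import torch

    def reinitialize_latent(base_latent, concept_latents, boxes):
        # base_latent:     (1, C, H, W) Gaussian latent for the full canvas.
        # concept_latents: one (1, C, H, W) latent per concept, e.g. sampled
        #                  with only that concept's LoRA active.
        # boxes:           (x0, y0, x1, y1) layout boxes in latent coordinates.
        latent = base_latent.clone()
        for z_c, (x0, y0, x1, y1) in zip(concept_latents, boxes):
            # Overwrite the designated region so denoising starts from a
            # signal already biased toward the corresponding concept.
            latent[:, :, y0:y1, x0:x1] = z_c[:, :, y0:y1, x0:x1]
        return latent

For a standard 64×64 Stable Diffusion latent, reinitialize_latent(torch.randn(1, 4, 64, 64), [torch.randn(1, 4, 64, 64)], [(8, 8, 56, 32)]) would re-seed a single box.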
Inside the LoRA-Composer block, the Stable Diffusion U-Net is modified in two places: Concept Isolation Constraints in self-attention and Concept Injection Constraints in cross-attention, which place each concept accurately while preventing feature leakage across concepts (see the sketch that follows).
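Purely as an illustration of the two attention constraints named above, the following sketch assumes an integer region map over the flattened spatial tokens (0 = background, i = concept i) rasterized from the layout boxes; the function names and exact masking rules are assumptions rather than the paper's precise formulation.

    import torch
    import torch.nn.functional as F

    def isolated_self_attention(q, k, v, region_map, scale):
        # Concept Isolation: a query token inside one concept's box may not
        # attend to key tokens inside a different concept's box.
        # q, k, v: (B, N, D) flattened spatial tokens; region_map: (N,) ints.
        same_region = region_map[:, None] == region_map[None, :]
        background = region_map[None, :] == 0
        allowed = same_region | background            # (N, N) boolean mask
        scores = (q @ k.transpose(-1, -2)) * scale    # (B, N, N)
        scores = scores.masked_fill(~allowed, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

    def injected_cross_attention(q, text_ks, text_vs, region_map, scale):
        # Concept Injection: each concept's text tokens (encoded through its
        # own LoRA branch) are written only into that concept's region.
        # text_ks[i], text_vs[i]: (B, T, D) keys/values for region label i,
        # with index 0 holding the global prompt used for the background.
        out = torch.zeros_like(q)
        for i, (k, v) in enumerate(zip(text_ks, text_vs)):
            attn = F.softmax((q @ k.transpose(-1, -2)) * scale, dim=-1) @ v
            region = (region_map == i).to(q.dtype)[None, :, None]  # (1, N, 1)
            out = out + region * attn
        return out

Here scale would typically be q.shape[-1] ** -0.5, and region_map can be built directly from the same bounding boxes that drive the layout condition.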

 

Three Highlights of LoRA-Composer


Multi-Concept Generation (without image-based conditions)


Multi-Concept Generation (with image-based conditions)

Yellow boxes highlight instances of concept confusion, while red boxes mark instances of concept vanishing.

 

Bibtex


    @article{yang2024loracomposer,
      title   = {LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models},
      author  = {Yang Yang and Wen Wang and Liang Peng and Chaotian Song and Yao Chen and Hengjia Li and Xiaolong Yang and Qinglin Lu and Deng Cai and Boxi Wu and Wei Liu},
      year    = {2024},
      journal = {arXiv preprint arXiv:2403.11627}
    }