LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models
1State Key Lab of CAD&CG, Zhejiang University, China | 2The School of Software Technology, Zhejiang University, China |
3Fabu Inc. China | 4Tencent Inc. China |
📖TL;DR: LoRA-Composer allows users to generate multi-concept images with fewer conditions, using readily accessible LoRA techniques (only a prompt and a layout condition are required).
For example, our method supports the following 16 customized concepts. 👇 👇 👇
Abstract
Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts. Multi-concept customization emerges as a challenging task within this domain. Existing approaches often rely on training a fusion matrix over multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. However, we find that this straightforward method faces two major challenges: 1) concept confusion, which occurs when the model cannot preserve distinct individual characteristics, and 2) concept vanishing, where the model fails to generate the intended subjects. To address these issues, we introduce LoRA-Composer, a training-free framework designed to seamlessly integrate multiple LoRAs, thereby enhancing the harmony among different concepts within generated images. LoRA-Composer addresses concept vanishing through Concept Injection Constraints, enhancing concept visibility via an expanded cross-attention mechanism. To combat concept confusion, Concept Isolation Constraints are introduced, refining the self-attention computation. Furthermore, Latent Re-initialization is proposed to effectively stimulate concept-specific latents within designated regions. Our extensive testing shows a notable improvement in LoRA-Composer's performance over standard baselines, especially when image-based conditions such as canny edges or pose estimations are removed.
Main Observation
Our method distinguishes itself from Mix-of-Show by eliminating both the image-based conditions (the sketch shown above) and the requirement to train a LoRA fusion matrix. Furthermore, we highlight the limitations of Mix-of-Show through failure cases. In the top row, we illustrate two key issues: concept vanishing, marked by the absence of intended concepts in the image, and concept confusion, where the model mistakenly merges distinct concepts.
Method Overview
LoRA-Composer utilizes textual, layout, and image-based conditions (optional) to integrate and customize multiple concepts through Latent Re-initialization for precise layout generation.
Modifications to the Stable Diffusion U-Net in the LoRA-Composer block include Concept Isolation in self-attention and Concept Injection in cross-attention, optimizing for accurate concept placement while preventing feature leakage across concepts.
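The concept-isolation idea in self-attention can be sketched as restricting each concept's spatial tokens to attend only within that concept's layout box, while background tokens attend globally. This is a minimal single-head illustration under our own assumptions (the function name, the uniform background handling, and the mask semantics are not taken from the paper's implementation):

```python
import torch
import torch.nn.functional as F

def isolated_self_attention(q, k, v, concept_masks):
    """Region-restricted self-attention (illustrative sketch).

    q, k, v:        (N, d) flattened spatial tokens.
    concept_masks:  (C, N) boolean, one mask per concept layout box.
    """
    N, d = q.shape
    # By default every token may attend everywhere (background behavior).
    allow = torch.ones(N, N, dtype=torch.bool)
    for m in concept_masks:
        # Tokens inside a concept box attend only within that same box,
        # preventing feature leakage from other concepts.
        allow[m] = m
    attn = (q @ k.T) / d ** 0.5
    attn = attn.masked_fill(~allow, float('-inf'))
    return F.softmax(attn, dim=-1) @ v
```

With zeroed queries and keys the attention is uniform over the allowed set, so a token inside a box averages only the values inside that box, while a background token averages all values.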
Three Highlights of LoRA-Composer
Multi-Concept Generation (without image-based conditions)
Multi-Concept Generation (with image-based conditions)
Yellow boxes emphasize the issue of concept confusion, while red boxes underscore instances of concept vanishing.
BibTeX