Latent Diffusion
FixedGroupNorm
FixedGroupNorm(target: GroupNorm)
Bases: Chain, Adapter[GroupNorm]
Adapter for GroupNorm layers to fix the mean and variance used for normalization.
This is useful when running tiled inference with an autoencoder, to ensure that the statistics of the GroupNorm layers are consistent across tiles.
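To show why fixed statistics matter, here is a minimal pure-Python sketch, not the refiners implementation. It normalizes each tile with group statistics computed once on a reference input. For simplicity the reference is the full input rather than a downsampled copy, and channels are plain lists. `group_stats` and `group_norm` are hypothetical helpers.

```python
from __future__ import annotations

from statistics import fmean, pvariance


def group_stats(channels: list[list[float]], num_groups: int) -> list[tuple[float, float]]:
    """Return (mean, variance) for each group of consecutive channels."""
    size = len(channels) // num_groups
    stats = []
    for g in range(num_groups):
        values = [v for channel in channels[g * size:(g + 1) * size] for v in channel]
        stats.append((fmean(values), pvariance(values)))
    return stats


def group_norm(
    channels: list[list[float]],
    num_groups: int,
    stats: list[tuple[float, float]] | None = None,
    eps: float = 1e-5,
) -> list[list[float]]:
    """Normalize with the input's own statistics, or with fixed ones if `stats` is given."""
    if stats is None:
        stats = group_stats(channels, num_groups)
    size = len(channels) // num_groups
    return [
        [(v - stats[i // size][0]) / (stats[i // size][1] + eps) ** 0.5 for v in channel]
        for i, channel in enumerate(channels)
    ]


# Two channels with four "pixels" each, split into two tiles of two pixels.
image = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
tiles = [[channel[:2] for channel in image], [channel[2:] for channel in image]]

reference = group_stats(image, num_groups=1)  # captured once, then frozen
fixed_tiles = [group_norm(tile, 1, stats=reference) for tile in tiles]
stitched = [fixed_tiles[0][c] + fixed_tiles[1][c] for c in range(len(image))]
per_tile = [group_norm(tile, 1) for tile in tiles]  # each tile uses its own statistics
```

Stitching tiles normalized with the frozen statistics reproduces the full-image result, whereas `per_tile` shifts each tile differently, which is what shows up as seams.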
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
LatentDiffusionAutoencoder
Bases: Chain
Latent diffusion autoencoder model.
Attributes:
Name | Type | Description |
---|---|---|
encoder_scale | | The encoder scale to use. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | device \| str \| None | The PyTorch device to use. | None |
dtype | dtype \| None | The PyTorch data type to use. | None |
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
decode
encode
images_to_latents
Convert a list of images to latents.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | list[Image] | The list of images to convert. | required |
Returns:
Type | Description |
---|---|
Tensor | A tensor containing the latents associated with the images. |
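As a rough sketch of the conventions usually involved in Stable Diffusion autoencoders (an assumption here, not read from the refiners source): pixels are mapped from [0, 255] to [-1, 1] before encoding, and the encoder output is multiplied by `encoder_scale`, which decoding divides out again. The 0.18215 (SD 1.x) and 0.13025 (SDXL) values are the commonly published scaling factors; all helper names are hypothetical.

```python
SD1_ENCODER_SCALE = 0.18215  # commonly published SD 1.x latent scaling factor (assumed)
SDXL_ENCODER_SCALE = 0.13025  # commonly published SDXL latent scaling factor (assumed)


def pixel_to_model_range(pixel: int) -> float:
    """Map an 8-bit pixel value in [0, 255] to [-1, 1]."""
    return 2 * (pixel / 255) - 1


def model_range_to_pixel(value: float) -> int:
    """Map a value in [-1, 1] back to [0, 255], clamping out-of-range outputs."""
    value = min(max(value, -1.0), 1.0)
    return round((value + 1) / 2 * 255)


def scale_latent(raw: float, encoder_scale: float = SD1_ENCODER_SCALE) -> float:
    """Applied after the encoder; intended to give latents roughly unit variance."""
    return raw * encoder_scale


def unscale_latent(latent: float, encoder_scale: float = SD1_ENCODER_SCALE) -> float:
    """Applied before the decoder to undo `scale_latent`."""
    return latent / encoder_scale
```

Clamping in `model_range_to_pixel` matters because decoder outputs can slightly overshoot [-1, 1].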
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
latents_to_image
latents_to_image(x: Tensor) -> Image
Decode latents to an image.
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
latents_to_images
tiled_image_to_latents
tiled_image_to_latents(image: Image) -> Tensor
Convert an image to latents with gradient blending to smooth tile edges.
This method must be called while the tiled_inference context manager is active:
```python
# lda is a LatentDiffusionAutoencoder, sample_image a PIL image
with lda.tiled_inference(sample_image, tile_size=(768, 1024)):
    latents = lda.tiled_image_to_latents(sample_image)
```
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
tiled_inference
tiled_inference(
image: Image,
tile_size: tuple[int, int] = (512, 512),
blending: int = 64,
) -> Generator[None, None, None]
Context manager for tiled inference operations to save VRAM for large images.
This context manager sets up consistent GroupNorm statistics for tiled operations on the autoencoder: on entry, it captures the statistics of the GroupNorm layers on a downsampled version of the image and fixes them; on exit, it resets them. This ensures that the result is consistent across tiles.
Be careful not to use the regular image_to_latents and latents_to_image methods while this context manager is active: they will fail silently and run the operation without tiling.
```python
with lda.tiled_inference(sample_image, tile_size=(768, 1024), blending=32):
    latents = lda.tiled_image_to_latents(sample_image)
    decoded_image = lda.tiled_latents_to_image(latents)
```
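The exact tiling and blending logic is internal to refiners. As an illustration only, the 1D sketch below splits a signal into overlapping tiles and blends their outputs with linear ramps normalized to sum to one at every position, which is one common way to implement gradient blending. The function names and the ramp shape are assumptions; `overlap` plays the role of the `blending` argument.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of tiles of size `tile` covering [0, length) with at least `overlap` overlap."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, tile - overlap))
    starts.append(length - tile)  # last tile ends exactly at the border
    return starts


def tile_weights(length: int, tile: int, overlap: int) -> tuple[list[int], list[list[float]]]:
    """Linear ramps at inner tile edges, normalized so weights sum to 1 at every position."""
    starts = tile_starts(length, tile, overlap)
    raw = []
    for start in starts:
        size = min(tile, length)
        weights = []
        for i in range(size):
            ramp_in = (i + 1) / (overlap + 1) if start > 0 else 1.0
            ramp_out = (size - i) / (overlap + 1) if start + size < length else 1.0
            weights.append(min(1.0, ramp_in, ramp_out))
        raw.append(weights)
    total = [0.0] * length
    for start, weights in zip(starts, raw):
        for i, w in enumerate(weights):
            total[start + i] += w
    normalized = [[w / total[start + i] for i, w in enumerate(ws)] for start, ws in zip(starts, raw)]
    return starts, normalized


def stitch(outputs: list[list[float]], length: int, tile: int, overlap: int) -> list[float]:
    """Blend per-tile outputs back into a single signal of size `length`."""
    starts, weights = tile_weights(length, tile, overlap)
    result = [0.0] * length
    for start, out, ws in zip(starts, outputs, weights):
        for i, (value, w) in enumerate(zip(out, ws)):
            result[start + i] += value * w
    return result
```

Because the weights form a partition of unity, tiles that agree in their overlaps stitch back to the original signal, while disagreeing tiles are cross-faded instead of meeting at a hard seam.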
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
tiled_latents_to_image
tiled_latents_to_image(x: Tensor) -> Image
Convert latents to an image with gradient blending to smooth tile edges.
This method must be called while the tiled_inference context manager is active:
```python
# lda is a LatentDiffusionAutoencoder, latents come from tiled_image_to_latents
with lda.tiled_inference(sample_image, tile_size=(768, 1024)):
    image = lda.tiled_latents_to_image(latents)
```
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
LatentDiffusionModel
LatentDiffusionModel(
unet: Chain,
lda: LatentDiffusionAutoencoder,
clip_text_encoder: Chain,
solver: Solver,
classifier_free_guidance: bool = True,
device: device | str = "cpu",
dtype: dtype = float32,
)
Source code in src/refiners/foundationals/latent_diffusion/model.py
init_latents
init_latents(
size: tuple[int, int],
init_image: Image | None = None,
noise: Tensor | None = None,
) -> Tensor
Initialize the latents for the diffusion process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
size | tuple[int, int] | The size of the latents, expressed in pixel space. | required |
init_image | Image \| None | The image to use as initialization for the latents. | None |
noise | Tensor \| None | The noise to add to the latents. | None |
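A minimal sketch of what initialization typically involves, under stated assumptions: the latent grid is the pixel size divided by a downsampling factor of 8, with 4 latent channels (consistent with the (1, 4, 64, 64) latents used for 512x512 images elsewhere on this page), and, when an init image is given, its latents are noised with the standard DDPM forward formula. Whether init_latents does exactly this is an assumption; the helper names are hypothetical and work on flat lists instead of tensors.

```python
import math


def latent_shape(size: tuple[int, int], channels: int = 4, downscale: int = 8) -> tuple[int, int, int, int]:
    """Shape of a single-image latent batch for an image of `size` = (height, width) pixels."""
    height, width = size
    return (1, channels, height // downscale, width // downscale)


def add_noise(x0: list[float], noise: list[float], alpha_cumprod: float) -> list[float]:
    """DDPM forward process: sqrt(a) * x0 + sqrt(1 - a) * noise, element-wise."""
    signal, sigma = math.sqrt(alpha_cumprod), math.sqrt(1.0 - alpha_cumprod)
    return [signal * x + sigma * n for x, n in zip(x0, noise)]
```

At `alpha_cumprod = 1.0` the latents are returned untouched, and at `0.0` they are replaced by pure noise; image-to-image sits in between, which is why starting later in the schedule preserves more of the init image.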
Source code in src/refiners/foundationals/latent_diffusion/model.py
sample_noise
staticmethod
sample_noise(
size: tuple[int, ...],
device: device | None = None,
dtype: dtype | None = None,
offset_noise: float | None = None,
) -> Tensor
Sample noise from a normal distribution with an optional offset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
size | tuple[int, ...] | The size of the noise tensor. | required |
device | device \| None | The device to put the noise tensor on. | None |
dtype | dtype \| None | The data type of the noise tensor. | None |
offset_noise | float \| None | The offset of the noise tensor. Useful at training time, see https://www.crosslabs.org/blog/diffusion-with-offset-noise. | None |
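The offset-noise idea from the linked post adds, on top of the usual per-element Gaussian noise, one extra Gaussian sample per channel that is shared by every pixel of that channel, scaled by `offset_noise`. The pure-Python sketch below assumes a single image and a channels-by-pixels list layout; it is not the refiners implementation.

```python
from __future__ import annotations

import random


def sample_noise(
    channels: int,
    pixels: int,
    offset_noise: float | None = None,
    rng: random.Random | None = None,
) -> list[list[float]]:
    rng = rng or random.Random()
    # Standard normal noise for every element.
    noise = [[rng.gauss(0.0, 1.0) for _ in range(pixels)] for _ in range(channels)]
    if offset_noise is not None:
        for channel in noise:
            # One shift per channel, shared by all its pixels: this moves the channel's mean.
            shift = offset_noise * rng.gauss(0.0, 1.0)
            channel[:] = [v + shift for v in channel]
    return noise
```

Shifting whole channels lets a model trained this way learn to change the overall brightness of an image, which per-element noise alone makes difficult.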
Source code in src/refiners/foundationals/latent_diffusion/model.py
set_inference_steps
Set the steps of the diffusion process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_steps | int | The number of inference steps. | required |
first_step | int | The first inference step, used for image-to-image diffusion. You may be used to setting a float in | 0 |
Source code in src/refiners/foundationals/latent_diffusion/model.py
ControlLora
Bases: Passthrough
ControlLora is a Half-UNet clone of the target UNet, patched with various LoRA layers, ZeroConvolution layers, and a ConditionEncoder.
Like ControlNet, it injects residual tensors into the target UNet. See https://github.com/HighCWu/control-lora-v2 for more details.
Gets context:
Type | Description |
---|---|
Float[Tensor, 'batch condition_channels width height'] | The input image. |
Sets context:
Type | Description |
---|---|
list[Tensor] | The residuals to be added to the target UNet's residuals. (context="unet", key="residuals") |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the ControlLora. | required |
unet | SDXLUNet | The target UNet. | required |
scale | float | The scale to multiply the residuals by. | 1.0 |
condition_channels | int | The number of channels of the input condition tensor. | 3 |
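To make the residual mechanism concrete: like ControlNet, the control branch produces one tensor per UNet residual, which is multiplied by `scale` and added to the UNet's own residual. Zero convolutions start with zero weights, so an untrained branch initially adds nothing. The sketch below uses flat lists and hypothetical helper names, and models a zero convolution as a 1x1 convolution (a per-pixel linear map across channels); it is an illustration, not the refiners code.

```python
def conv1x1(pixels: list[list[float]], weight: list[list[float]], bias: list[float]) -> list[list[float]]:
    """1x1 convolution: every pixel's channel vector goes through the same linear map."""
    return [[sum(w * v for w, v in zip(row, px)) + b for row, b in zip(weight, bias)] for px in pixels]


def zero_convolution(channels_in: int, channels_out: int) -> tuple[list[list[float]], list[float]]:
    """Weights and bias of a zero-initialized 1x1 convolution."""
    return [[0.0] * channels_in for _ in range(channels_out)], [0.0] * channels_out


def inject_residuals(
    unet_residuals: list[list[float]], control_residuals: list[list[float]], scale: float = 1.0
) -> list[list[float]]:
    """Add scaled control residuals to the UNet residuals, element-wise."""
    if len(unet_residuals) != len(control_residuals):
        raise ValueError("one control residual is expected per UNet residual")
    return [[u + scale * c for u, c in zip(ur, cr)] for ur, cr in zip(unet_residuals, control_residuals)]


features = [[0.3, -1.2], [2.0, 0.5]]  # two pixels, two channels each
weight, bias = zero_convolution(2, 2)
control = conv1x1(features, weight, bias)  # all zeros at initialization
unet = [[1.0, 2.0], [3.0, 4.0]]
```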
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/control_lora.py
ControlLoraAdapter
ControlLoraAdapter(
name: str,
target: SDXLUNet,
scale: float = 1.0,
condition_channels: int = 3,
weights: dict[str, Tensor] | None = None,
)
Bases: Chain, Adapter[SDXLUNet]
Adapter for ControlLora.
This adapter simply prepends a ControlLora model inside the target SDXLUNet.
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/control_lora.py
load_condition_encoder
staticmethod
load_condition_encoder(
state_dict: dict[str, Tensor], control_lora: ControlLora
)
Load the ConditionEncoder's layers from the state_dict into the ControlLora.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state_dict | dict[str, Tensor] | The state_dict containing the ConditionEncoder layers to load. | required |
control_lora | ControlLora | The ControlLora to load the ConditionEncoder layers into. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/control_lora.py
load_lora_layers
staticmethod
load_lora_layers(
name: str,
state_dict: dict[str, Tensor],
control_lora: ControlLora,
) -> None
Load the LoRA layers from the state_dict into the ControlLora.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the ControlLora. | required |
state_dict | dict[str, Tensor] | The state_dict containing the LoRA layers to load. | required |
control_lora | ControlLora | The ControlLora to load the LoRA layers into. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/control_lora.py
load_weights
Load the weights from the state_dict into the ControlLora.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state_dict | dict[str, Tensor] | The state_dict containing the weights to load. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/control_lora.py
load_zero_convolution_layers
staticmethod
load_zero_convolution_layers(
state_dict: dict[str, Tensor], control_lora: ControlLora
)
Load the ZeroConvolution layers from the state_dict into the ControlLora.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state_dict | dict[str, Tensor] | The state_dict containing the ZeroConvolution layers to load. | required |
control_lora | ControlLora | The ControlLora to load the ZeroConvolution layers into. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/control_lora.py
SDXLAutoencoder
Bases: LatentDiffusionAutoencoder
Stable Diffusion XL autoencoder model.
Attributes:
Name | Type | Description |
---|---|---|
encoder_scale | float | The encoder scale to use. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | device \| str \| None | The PyTorch device to use. | None |
dtype | dtype \| None | The PyTorch data type to use. | None |
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
SDXLIPAdapter
SDXLIPAdapter(
target: SDXLUNet,
clip_image_encoder: CLIPImageEncoderH | None = None,
image_proj: (
ImageProjection | PerceiverResampler | None
) = None,
scale: float = 1.0,
fine_grained: bool = False,
weights: dict[str, Tensor] | None = None,
)
Image Prompt adapter for the Stable Diffusion XL U-Net model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target | SDXLUNet | The SDXLUNet model to adapt. | required |
clip_image_encoder | CLIPImageEncoderH \| None | The CLIP image encoder to use. | None |
image_proj | ImageProjection \| PerceiverResampler \| None | The image projection to use. | None |
scale | float | The scale to use for the image prompt. | 1.0 |
fine_grained | bool | Whether to use a fine-grained image prompt. | False |
weights | dict[str, Tensor] \| None | The weights of the IPAdapter. | None |
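In the IP-Adapter design, image features get their own key/value projections, and the output of that extra cross-attention is added to the text cross-attention output, weighted by `scale`. The sketch below (single query, one-dimensional keys and values, hypothetical names) only illustrates that weighting; it is not the refiners implementation.

```python
import math


def attention(query: float, keys: list[float], values: list[float]) -> float:
    """Dot-product attention for one query with 1-D keys (d = 1, so the 1/sqrt(d) factor is 1)."""
    scores = [query * k for k in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return sum(e / total * v for e, v in zip(exps, values))


def decoupled_cross_attention(
    query: float,
    text_kv: tuple[list[float], list[float]],
    image_kv: tuple[list[float], list[float]],
    scale: float = 1.0,
) -> float:
    """Text cross-attention plus `scale` times image cross-attention."""
    return attention(query, *text_kv) + scale * attention(query, *image_kv)
```

With `scale=0.0` the image prompt has no effect and the output is the text-only attention, which is a convenient way to check that an adapter is wired in without changing results.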
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/image_prompt.py
SDXLLcmAdapter
SDXLLcmAdapter(
target: SDXLUNet,
condition_scale_embedding_dim: int = 256,
condition_scale: float = 7.5,
)
Bases: Chain, Adapter[SDXLUNet]
Note that LCM must be used without CFG. You can disable CFG on SD by setting its classifier_free_guidance attribute to False.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target | SDXLUNet | An SDXL UNet. | required |
condition_scale_embedding_dim | int | LCM uses a condition scale embedding; this is its dimension. | 256 |
condition_scale | float | Because of the embedding, the condition scale must be passed to this adapter instead of SD. The condition scale passed to SD will be ignored. | 7.5 |
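The condition scale embedding turns the guidance scale into a vector, in the same sinusoidal style as timestep embeddings. The sketch follows the formulation used by the diffusers LCM pipeline (embed `w = condition_scale - 1`, multiplied by 1000); whether refiners uses exactly these constants is an assumption, and `guidance_scale_embedding` is a hypothetical name.

```python
import math


def guidance_scale_embedding(condition_scale: float, dim: int = 256) -> list[float]:
    """Sinusoidal embedding of the guidance scale: sine half followed by cosine half."""
    w = (condition_scale - 1.0) * 1000.0
    half = dim // 2
    step = math.log(10000.0) / (half - 1)
    frequencies = [math.exp(-step * i) for i in range(half)]  # geometric frequency ladder
    return [math.sin(w * f) for f in frequencies] + [math.cos(w * f) for f in frequencies]
```

Because the scale is baked into this embedding at distillation time, it has to reach the UNet through the adapter, which is why the condition scale passed to SD itself is ignored.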
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/lcm.py
SDXLUNet
Bases: Chain
Stable Diffusion XL U-Net.
See [arXiv:2307.01952] SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | int | Number of input channels. | required |
device | device \| str \| None | Device to use for computation. | None |
dtype | dtype \| None | Data type to use for computation. | None |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/unet.py
set_clip_text_embedding
set_clip_text_embedding(
clip_text_embedding: Tensor,
) -> None
Set the CLIP text embedding context.
Note
This context is required by the SDXLCrossAttention blocks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
clip_text_embedding | Tensor | The CLIP text embedding tensor. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/unet.py
set_pooled_text_embedding
set_pooled_text_embedding(
pooled_text_embedding: Tensor,
) -> None
Set the pooled text embedding context.
Note
This is required by TextTimeEmbedding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pooled_text_embedding | Tensor | The pooled text embedding tensor. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/unet.py
set_time_ids
set_time_ids(time_ids: Tensor) -> None
Set the time IDs context.
Note
This is required by TextTimeEmbedding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
time_ids | Tensor | The time IDs tensor. | required |
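SDXL's micro-conditioning (described in the SDXL paper cited above) packs three pairs into the time IDs: the original image size, the top-left crop coordinates, and the target size. The sketch builds such a six-value list, assuming (height, width) and (top, left) ordering as in common SDXL pipelines; the helper name is hypothetical, and the list would still need to be turned into a tensor before calling set_time_ids.

```python
def make_time_ids(
    original_size: tuple[int, int] = (1024, 1024),
    crop_top_left: tuple[int, int] = (0, 0),
    target_size: tuple[int, int] = (1024, 1024),
) -> list[int]:
    """Concatenate (height, width), (top, left) and (height, width) into six values."""
    return [*original_size, *crop_top_left, *target_size]
```

Keeping the crop at (0, 0) and both sizes equal to the generation size is the usual way to ask for uncropped, full-resolution-looking images.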
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/unet.py
StableDiffusion_XL
StableDiffusion_XL(
unet: SDXLUNet | None = None,
lda: SDXLAutoencoder | None = None,
clip_text_encoder: DoubleTextEncoder | None = None,
solver: Solver | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: LatentDiffusionModel
Stable Diffusion XL model.
Attributes:
Name | Type | Description |
---|---|---|
unet | SDXLUNet | The U-Net model. |
clip_text_encoder | DoubleTextEncoder | The text encoder. |
lda | SDXLAutoencoder | The image autoencoder. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unet | SDXLUNet \| None | The SDXLUNet U-Net model to use. | None |
lda | SDXLAutoencoder \| None | The SDXLAutoencoder image autoencoder to use. | None |
clip_text_encoder | DoubleTextEncoder \| None | The DoubleTextEncoder text encoder to use. | None |
solver | Solver \| None | The solver to use. | None |
device | device \| str | The PyTorch device to use. | 'cpu' |
dtype | dtype | The PyTorch data type to use. | float32 |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/model.py
compute_clip_text_embedding
compute_clip_text_embedding(
text: str | list[str],
negative_text: str | list[str] = "",
) -> tuple[Tensor, Tensor]
Compute the CLIP text embedding associated with the given prompt and negative prompt.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str \| list[str] | The prompt to compute the CLIP text embedding of. | required |
negative_text | str \| list[str] | The negative prompt to compute the CLIP text embedding of. If not provided, the negative prompt is assumed to be empty. | '' |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/model.py
compute_self_attention_guidance
compute_self_attention_guidance(
x: Tensor,
noise: Tensor,
step: int,
*,
clip_text_embedding: Tensor,
pooled_text_embedding: Tensor,
time_ids: Tensor,
**kwargs: Tensor
) -> Tensor
Compute the self-attention guidance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input tensor. | required |
noise | Tensor | The noise tensor. | required |
step | int | The step to compute the self-attention guidance at. | required |
clip_text_embedding | Tensor | The CLIP text embedding to compute the self-attention guidance with. | required |
pooled_text_embedding | Tensor | The pooled CLIP text embedding to compute the self-attention guidance with. | required |
time_ids | Tensor | The time IDs to compute the self-attention guidance with. | required |
Returns:
Type | Description |
---|---|
Tensor | The computed self-attention guidance. |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/model.py
has_self_attention_guidance
has_self_attention_guidance() -> bool
Whether the model has self-attention guidance or not.
set_self_attention_guidance
Enable or disable self-attention guidance.
See [arXiv:2210.00939] Improving Sample Quality of Diffusion Models Using Self-Attention Guidance for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
enable | bool | Whether to enable self-attention guidance or not. | required |
scale | float | The scale to use. | 1.0 |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/model.py
set_unet_context
set_unet_context(
*,
timestep: Tensor,
clip_text_embedding: Tensor,
pooled_text_embedding: Tensor,
time_ids: Tensor,
**_: Tensor
) -> None
Set the various context parameters required by the U-Net model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timestep | Tensor | The timestep to set. | required |
clip_text_embedding | Tensor | The CLIP text embedding to set. | required |
pooled_text_embedding | Tensor | The pooled CLIP text embedding to set. | required |
time_ids | Tensor | The time IDs to set. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/model.py
add_lcm_lora
add_lcm_lora(
manager: SDLoraManager,
tensors: dict[str, Tensor],
name: str = "lcm",
scale: float = 8.0 / 64.0,
check_validity: bool = True,
) -> None
Add an LCM-LoRA, or a LoRA with a similar structure such as SDXL-Lightning, to SDXLUNet.
This is a complex LoRA, so SDLoraManager.add_loras() is not enough. Instead, the LoRAs are added to the UNet in several iterations, using the filtering mechanism of auto_attach_loras.
LCM-LoRA can be used with or without CFG in SD. If you use CFG, typical values range from 1.0 (same as no CFG) to 2.0.
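The remark that a CFG scale of 1.0 is the same as no CFG follows from the usual classifier-free guidance formula, sketched below on flat lists: the guided prediction is the unconditional prediction plus `scale` times the difference between the conditional and unconditional predictions. The function name is hypothetical.

```python
def classifier_free_guidance(unconditional: list[float], conditional: list[float], scale: float) -> list[float]:
    """Guided noise prediction: uncond + scale * (cond - uncond), element-wise."""
    return [u + scale * (c - u) for u, c in zip(unconditional, conditional)]
```

At `scale = 1.0` the unconditional terms cancel and the conditional prediction comes back unchanged, so the extra unconditional pass costs compute without changing the result.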
Parameters:
Name | Type | Description | Default |
---|---|---|---|
manager | SDLoraManager | An SDLoraManager for SDXL. | required |
tensors | dict[str, Tensor] | The | required |
name | str | The name of the LoRA. | 'lcm' |
scale | float | The scale to use for the LoRA. It should generally not be changed: these LoRAs must use alpha / rank. | 8.0 / 64.0 |
check_validity | bool | Perform additional checks and raise an exception if they fail. | True |
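The default `scale` of 8.0 / 64.0 = 0.125 is an alpha / rank ratio. The sketch below shows where that ratio enters, using the standard LoRA update W' = W + scale * (up @ down) on small nested lists. `merge_lora` is a hypothetical helper, and reading the default as alpha = 8, rank = 64 is inferred from the expression itself.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(
    weight: list[list[float]], down: list[list[float]], up: list[list[float]], scale: float
) -> list[list[float]]:
    """Return weight + scale * (up @ down); `down` is (rank, in) and `up` is (out, rank)."""
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(weight, delta)]


alpha, rank = 8.0, 64.0
scale = alpha / rank
```

Changing `scale` rescales the whole low-rank update, so a value other than alpha / rank changes the effective strength the LoRA was trained with.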
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/lcm_lora.py
ICLight
ICLight(
patch_weights: dict[str, Tensor],
unet: SD1UNet,
lda: SD1Autoencoder | None = None,
clip_text_encoder: CLIPTextEncoderL | None = None,
solver: Solver | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: StableDiffusion_1
IC-Light is a Stable Diffusion model that can be used to relight a reference image.
At initialization, the UNet will be patched to accept four additional input channels. Only the text-conditioned relighting model is supported for now.
Example
```python
import torch
from huggingface_hub import hf_hub_download
from PIL import Image

from refiners.fluxion.utils import load_from_safetensors, manual_seed, no_grad
from refiners.foundationals.clip import CLIPTextEncoderL
from refiners.foundationals.latent_diffusion.stable_diffusion_1 import SD1Autoencoder, SD1UNet
from refiners.foundationals.latent_diffusion.stable_diffusion_1.ic_light import ICLight

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float32
no_grad().__enter__()  # disable gradient tracking for the rest of the script
manual_seed(42)

sd = ICLight(
    patch_weights=load_from_safetensors(
        path=hf_hub_download(
            repo_id="refiners/ic_light.sd1_5.fc",
            filename="model.safetensors",
        ),
        device=device,
    ),
    unet=SD1UNet(in_channels=4, device=device, dtype=dtype).load_from_safetensors(
        tensors_path=hf_hub_download(
            repo_id="refiners/realistic_vision.v5_1.sd1_5.unet",
            filename="model.safetensors",
        )
    ),
    clip_text_encoder=CLIPTextEncoderL(device=device, dtype=dtype).load_from_safetensors(
        tensors_path=hf_hub_download(
            repo_id="refiners/realistic_vision.v5_1.sd1_5.text_encoder",
            filename="model.safetensors",
        )
    ),
    lda=SD1Autoencoder(device=device, dtype=dtype).load_from_safetensors(
        tensors_path=hf_hub_download(
            repo_id="refiners/realistic_vision.v5_1.sd1_5.autoencoder",
            filename="model.safetensors",
        )
    ),
    device=device,
    dtype=dtype,
)

prompt = "soft lighting, high-quality professional image"
negative_prompt = "lowres, bad anatomy, bad hands, cropped, worst quality"
clip_text_embedding = sd.compute_clip_text_embedding(text=prompt, negative_text=negative_prompt)

image = Image.open("reference-image.png").resize((512, 512))
sd.set_ic_light_condition(image)

x = torch.randn(
    size=(1, 4, 64, 64),
    device=device,
    dtype=dtype,
)

for step in sd.steps:
    x = sd(
        x=x,
        step=step,
        clip_text_embedding=clip_text_embedding,
        condition_scale=1.5,
    )
predicted_image = sd.lda.latents_to_image(x)
predicted_image.save("ic-light-output.png")
```
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/ic_light.py
compute_gray_composite
staticmethod
Compute a grayscale composite of an image and a mask.
IC-Light will recreate the image
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image | Image | The image to composite. | required |
mask | Image | The mask to use for the composite. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/ic_light.py
set_ic_light_condition
Set the IC-Light condition.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image | Image | The reference image. | required |
mask | Image \| None | The mask to use for the reference image. | None |
If a mask is provided, it is used to compute a grayscale composite of the image and the mask; otherwise, the image is used as is. Note that IC-Light requires a 127-valued gray background to work.
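One way to build such a gray composite, as a pure-Python sketch: treat the mask as an alpha channel in [0, 255] and blend each RGB pixel toward a (127, 127, 127) background. That this matches compute_gray_composite exactly is an assumption; images are represented as lists of RGB tuples rather than PIL images, and `gray_composite` is a hypothetical name.

```python
Pixel = tuple[int, int, int]


def gray_composite(pixels: list[Pixel], mask: list[int], gray: int = 127) -> list[Pixel]:
    """Keep pixels where the mask is 255, use gray where it is 0, and blend in between."""
    out = []
    for (r, g, b), m in zip(pixels, mask):
        alpha = m / 255  # mask value as opacity of the foreground
        out.append(tuple(round(alpha * c + (1 - alpha) * gray) for c in (r, g, b)))
    return out
```

With a binary mask this simply replaces the background with 127-valued gray, which is the input IC-Light expects when no mask is given.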
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/ic_light.py
SD1Autoencoder
Bases: LatentDiffusionAutoencoder
Stable Diffusion 1.5 autoencoder model.
Attributes:
Name | Type | Description |
---|---|---|
encoder_scale | float | The encoder scale to use. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | device \| str \| None | The PyTorch device to use. | None |
dtype | dtype \| None | The PyTorch data type to use. | None |
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
SD1ELLAAdapter
Bases: ELLAAdapter[SD1UNet]
ELLA adapter for Stable Diffusion 1.5.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target | SD1UNet | The target model to adapt. | required |
weights | dict[str, Tensor] \| None | The weights of the ELLA adapter (see | None |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/ella_adapter.py
SD1UNet
Bases: Chain
Stable Diffusion 1.5 U-Net.
See [arXiv:2112.10752] High-Resolution Image Synthesis with Latent Diffusion Models for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | int | The number of input channels. | required |
device | device \| str \| None | The PyTorch device to use for computation. | None |
dtype | dtype \| None | The PyTorch dtype to use for computation. | None |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/unet.py
set_clip_text_embedding
set_clip_text_embedding(
clip_text_embedding: Tensor,
) -> None
Set the CLIP text embedding.
Note
This context is required by the CLIPLCrossAttention blocks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
clip_text_embedding | Tensor | The CLIP text embedding. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/unet.py
StableDiffusion_1
StableDiffusion_1(
unet: SD1UNet | None = None,
lda: SD1Autoencoder | None = None,
clip_text_encoder: CLIPTextEncoderL | None = None,
solver: Solver | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: LatentDiffusionModel
Stable Diffusion 1.5 model.
Attributes:
Name | Type | Description |
---|---|---|
unet | SD1UNet | The U-Net model. |
clip_text_encoder | CLIPTextEncoderL | The text encoder. |
lda | SD1Autoencoder | The image autoencoder. |
Example:
```python
import torch

from refiners.fluxion.utils import manual_seed, no_grad
from refiners.foundationals.latent_diffusion.stable_diffusion_1 import StableDiffusion_1

# Load SD
sd15 = StableDiffusion_1(device="cuda", dtype=torch.float16)
sd15.clip_text_encoder.load_from_safetensors("sd1_5.text_encoder.safetensors")
sd15.unet.load_from_safetensors("sd1_5.unet.safetensors")
sd15.lda.load_from_safetensors("sd1_5.autoencoder.safetensors")

# Hyperparameters
prompt = "a cute cat, best quality, high quality"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"
seed = 42

sd15.set_inference_steps(50)

with no_grad():  # Disable gradient calculation for memory-efficient inference
    clip_text_embedding = sd15.compute_clip_text_embedding(text=prompt, negative_text=negative_prompt)
    manual_seed(seed)

    x = sd15.init_latents((512, 512)).to(sd15.device, sd15.dtype)

    # Diffusion process
    for step in sd15.steps:
        x = sd15(x, step=step, clip_text_embedding=clip_text_embedding)

    predicted_image = sd15.lda.latents_to_image(x)
    predicted_image.save("output.png")
```
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unet | SD1UNet \| None | The SD1UNet U-Net model to use. | None |
lda | SD1Autoencoder \| None | The SD1Autoencoder image autoencoder to use. | None |
clip_text_encoder | CLIPTextEncoderL \| None | The CLIPTextEncoderL text encoder to use. | None |
solver | Solver \| None | The solver to use. | None |
device | device \| str | The PyTorch device to use. | 'cpu' |
dtype | dtype | The PyTorch data type to use. | float32 |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
compute_clip_text_embedding
compute_clip_text_embedding(
text: str | list[str],
negative_text: str | list[str] = "",
) -> Tensor
Compute the CLIP text embedding associated with the given prompt and negative prompt.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str \| list[str] | The prompt to compute the CLIP text embedding of. | required |
negative_text | str \| list[str] | The negative prompt to compute the CLIP text embedding of. If not provided, the negative prompt is assumed to be empty. | '' |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
compute_self_attention_guidance
compute_self_attention_guidance(
x: Tensor,
noise: Tensor,
step: int,
*,
clip_text_embedding: Tensor,
**kwargs: Tensor
) -> Tensor
Compute the self-attention guidance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input tensor. | required |
noise | Tensor | The noise tensor. | required |
step | int | The step to compute the self-attention guidance at. | required |
clip_text_embedding | Tensor | The CLIP text embedding to compute the self-attention guidance with. | required |
Returns:
Type | Description |
---|---|
Tensor | The computed self-attention guidance. |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
has_self_attention_guidance
has_self_attention_guidance() -> bool
Whether the model has self-attention guidance or not.
set_self_attention_guidance
Set whether to enable self-attention guidance.
See [arXiv:2210.00939] Improving Sample Quality of Diffusion Models Using Self-Attention Guidance for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
enable | bool | Whether to enable self-attention guidance. | required |
scale | float | The scale to use. | 1.0 |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
set_unet_context
Set the various context parameters required by the U-Net model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timestep | Tensor | The timestep tensor to use. | required |
clip_text_embedding | Tensor | The CLIP text embedding tensor to use. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
StableDiffusion_1_Inpainting
StableDiffusion_1_Inpainting(
unet: SD1UNet | None = None,
lda: SD1Autoencoder | None = None,
clip_text_encoder: CLIPTextEncoderL | None = None,
solver: Solver | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: StableDiffusion_1
Stable Diffusion 1.5 inpainting model.
Attributes:
Name | Type | Description |
---|---|---|
unet | | The U-Net model. |
clip_text_encoder | | The text encoder. |
lda | | The image autoencoder. |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
compute_self_attention_guidance
compute_self_attention_guidance(
x: Tensor,
noise: Tensor,
step: int,
*,
clip_text_embedding: Tensor,
**kwargs: Tensor
) -> Tensor
Compute the self-attention guidance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input tensor. | required |
noise | Tensor | The noise tensor. | required |
step | int | The step to compute the self-attention guidance at. | required |
clip_text_embedding | Tensor | The CLIP text embedding to compute the self-attention guidance with. | required |
Returns:
Type | Description |
---|---|
Tensor | The computed self-attention guidance. |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
set_inpainting_conditions
set_inpainting_conditions(
target_image: Image,
mask: Image,
latents_size: tuple[int, int] = (64, 64),
) -> tuple[Tensor, Tensor]
Set the inpainting conditions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_image | Image | The target image to inpaint. | required |
mask | Image | The mask to use for inpainting. | required |
latents_size | tuple[int, int] | The size of the latents to use. | (64, 64) |
Returns:
Type | Description |
---|---|
tuple[Tensor, Tensor] | The mask latents and the target image latents. |
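A sketch of the preprocessing usually behind inpainting conditions, based on the standard SD 1.5 inpainting setup rather than on this method's source: the mask is binarized, the region to repaint is zeroed out in the target image before it is encoded, and the mask is resized to the latent resolution (for example (64, 64) for a 512x512 image). The SD 1.5 inpainting U-Net then receives 4 noisy-latent + 1 mask + 4 masked-image-latent channels. Helper names are hypothetical and work on nested lists of floats.

```python
Grid = list[list[float]]


def binarize(mask: Grid, threshold: float = 0.5) -> Grid:
    """1.0 marks pixels to repaint, 0.0 pixels to keep."""
    return [[1.0 if v >= threshold else 0.0 for v in row] for row in mask]


def mask_out(image: Grid, mask: Grid) -> Grid:
    """Zero the pixels to repaint (mask == 1) and keep the others."""
    return [[p * (1.0 - m) for p, m in zip(prow, mrow)] for prow, mrow in zip(image, mask)]


def resize_nearest(grid: Grid, size: tuple[int, int]) -> Grid:
    """Nearest-neighbor resize to `size` = (height, width), e.g. down to the latent grid."""
    height, width = len(grid), len(grid[0])
    out_h, out_w = size
    return [[grid[i * height // out_h][j * width // out_w] for j in range(out_w)] for i in range(out_h)]


UNET_IN_CHANNELS = 4 + 1 + 4  # noisy latents + mask + masked-image latents
```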
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
DDIM
DDIM(
num_inference_steps: int,
first_inference_step: int = 0,
params: BaseSolverParams | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: Solver
Denoising Diffusion Implicit Model (DDIM) solver.
See [arXiv:2010.02502] Denoising Diffusion Implicit Models for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_inference_steps | int | The number of inference steps to perform. | required |
first_inference_step | int | The first inference step to perform. | 0 |
params | BaseSolverParams \| None | The common parameters for solvers. | None |
device | device \| str | The PyTorch device to use. | 'cpu' |
dtype | dtype | The PyTorch data type to use. | float32 |
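For reference, the deterministic DDIM update from the cited paper (eta = 0) first estimates the clean sample from the predicted noise, then re-noises it to the previous, less noisy level. This is a scalar sketch with the cumulative alphas passed in directly; how refiners builds its timestep schedule is not shown, and `ddim_step` is a hypothetical name.

```python
import math


def ddim_step(x_t: float, noise: float, alpha_bar_t: float, alpha_bar_prev: float) -> float:
    """One deterministic DDIM step from alpha_bar_t to alpha_bar_prev (eta = 0)."""
    # Estimate of the clean sample x_0 implied by the noise prediction.
    x0 = (x_t - math.sqrt(1 - alpha_bar_t) * noise) / math.sqrt(alpha_bar_t)
    # Move x_0 to the previous noise level, reusing the same noise direction.
    return math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1 - alpha_bar_prev) * noise
```

Because no fresh noise is injected, the same starting latents and predictions always give the same trajectory, which is what makes DDIM deterministic.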
Source code in src/refiners/foundationals/latent_diffusion/solvers/ddim.py
DDPM
DDPM(
num_inference_steps: int,
first_inference_step: int = 0,
params: BaseSolverParams | None = None,
device: device | str = "cpu",
)
Bases: Solver
Denoising Diffusion Probabilistic Model (DDPM) solver.
Warning
Only used for training Latent Diffusion models. Cannot be called.
See [arXiv:2006.11239] Denoising Diffusion Probabilistic Models for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_inference_steps | int | The number of inference steps to perform. | required |
first_inference_step | int | The first inference step to perform. | 0 |
params | BaseSolverParams \| None | The common parameters for solvers. | None |
device | device \| str | The PyTorch device to use. | 'cpu' |
Source code in src/refiners/foundationals/latent_diffusion/solvers/ddpm.py
DPMSolver
DPMSolver(
num_inference_steps: int,
first_inference_step: int = 0,
params: BaseSolverParams | None = None,
last_step_first_order: bool = False,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: Solver
Diffusion probabilistic models (DPMs) solver.
See [arXiv:2211.01095] DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models for more details.
Note
Regarding last_step_first_order: DPM-Solver++ is known to introduce artifacts when used with SDXL and few steps. This parameter is a way to mitigate that effect by using a first-order (Euler) update instead of a second-order update for the last step of the diffusion.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_inference_steps | int | The number of inference steps to perform. | required |
first_inference_step | int | The first inference step to perform. | 0 |
params | BaseSolverParams \| None | The common parameters for solvers. | None |
last_step_first_order | bool | Use a first-order update for the last step. | False |
device | device \| str | The PyTorch device to use. | 'cpu' |
dtype | dtype | The PyTorch data type to use. | float32 |
Source code in src/refiners/foundationals/latent_diffusion/solvers/dpm.py
dpm_solver_first_order_update
dpm_solver_first_order_update(
x: Tensor,
noise: Tensor,
step: int,
sde_noise: Tensor | None = None,
) -> Tensor
Applies a first-order backward Euler update to the input data x.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input data. | required |
noise | Tensor | The predicted noise. | required |
step | int | The current step. | required |
Returns:
Type | Description |
---|---|
Tensor | The denoised version of the input data. |
Source code in src/refiners/foundationals/latent_diffusion/solvers/dpm.py
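The first-order DPM-Solver++ update in data-prediction form (arXiv:2211.01095) can be sketched on scalars as follows. Assumptions: alpha and sigma follow the variance-preserving convention suggested by the Solver attributes (alpha_t squared plus sigma_t squared equals 1), the predicted noise is first converted into a predicted clean sample, and the optional sde_noise path is ignored. The function name is hypothetical.

```python
import math


def dpmpp_first_order(x_s: float, eps: float, alpha_s: float, sigma_s: float,
                      alpha_t: float, sigma_t: float) -> float:
    """First-order DPM-Solver++ step from time s to time t (scalar sketch)."""
    # Convert the predicted noise into a predicted clean sample.
    x0 = (x_s - sigma_s * eps) / alpha_s
    # Step size in log-SNR space: h = lambda_t - lambda_s, with lambda = log(alpha / sigma).
    h = math.log(alpha_t / sigma_t) - math.log(alpha_s / sigma_s)
    # x_t = (sigma_t / sigma_s) * x_s - alpha_t * (exp(-h) - 1) * x0
    return (sigma_t / sigma_s) * x_s - alpha_t * math.expm1(-h) * x0
```

With an exact noise prediction the result equals alpha_t * x0 + sigma_t * eps, so this first-order step coincides with a deterministic DDIM step.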
multistep_dpm_solver_second_order_update
multistep_dpm_solver_second_order_update(
x: Tensor, step: int, sde_noise: Tensor | None = None
) -> Tensor
Applies a second-order backward Euler update to the input data x.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input data. | required |
step | int | The current step. | required |
Returns:
Type | Description |
---|---|
Tensor | The denoised version of the input data. |
Source code in src/refiners/foundationals/latent_diffusion/solvers/dpm.py
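The multistep second-order variant (DPM-Solver++(2M) in arXiv:2211.01095) corrects the first-order step with a finite difference between the current and previous data predictions. The scalar sketch below is written directly in terms of those data predictions and (alpha, sigma) pairs; refiners' method instead takes only x and step and keeps earlier predictions internally, which this sketch does not model. All names are hypothetical.

```python
import math


def log_snr(alpha: float, sigma: float) -> float:
    """lambda = log(alpha / sigma)."""
    return math.log(alpha / sigma)


def dpmpp_2m(x: float, x0_cur: float, x0_prev: float,
             prev: tuple[float, float], cur: tuple[float, float],
             nxt: tuple[float, float]) -> float:
    """Multistep second-order DPM-Solver++ step (scalar sketch).

    prev, cur and nxt are (alpha, sigma) pairs for the previous, current and
    next timesteps; x0_prev and x0_cur are the data predictions at prev and cur.
    """
    lam_prev, lam_cur, lam_next = log_snr(*prev), log_snr(*cur), log_snr(*nxt)
    h = lam_next - lam_cur          # size of the step about to be taken
    r = (lam_cur - lam_prev) / h    # previous step size relative to h
    d0 = x0_cur
    d1 = (x0_cur - x0_prev) / r     # finite-difference correction term
    alpha_next, sigma_next = nxt
    phi = math.expm1(-h)            # exp(-h) - 1
    return (sigma_next / cur[1]) * x - alpha_next * phi * d0 - 0.5 * alpha_next * phi * d1
```

When the data prediction does not change between steps, the correction term vanishes and the update reduces to the first-order one.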
rebuild
Rebuilds the solver with new parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_inference_steps | int \| None | The number of inference steps. | required |
first_inference_step | int \| None | The first inference step. | None |
Source code in src/refiners/foundationals/latent_diffusion/solvers/dpm.py
remove_noise
Remove noise from the input tensor using the current step of the diffusion process.
See Solver.remove_noise for more details.
Source code in src/refiners/foundationals/latent_diffusion/solvers/dpm.py
Euler
Euler(
num_inference_steps: int,
first_inference_step: int = 0,
params: BaseSolverParams | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: Solver
Euler solver.
See [arXiv:2206.00364] Elucidating the Design Space of Diffusion-Based Generative Models for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_inference_steps | int | The number of inference steps to perform. | required |
first_inference_step | int | The first inference step to perform. | 0 |
params | BaseSolverParams \| None | The common parameters for solvers. | None |
device | device \| str | The PyTorch device to use. | 'cpu' |
dtype | dtype | The PyTorch data type to use. | float32 |
Source code in src/refiners/foundationals/latent_diffusion/solvers/euler.py
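As a rough illustration of the Euler method from arXiv:2206.00364, the sketch below works in the "sigma" parametrization where a noisy sample is x = x0 + sigma * eps and sigma = sqrt((1 - alpha_bar) / alpha_bar). For a noise-predicting model the ODE derivative dx/dsigma is the predicted noise, so one Euler step is x + (sigma_next - sigma) * eps. Function names are hypothetical, and refiners' Euler solver may organize these quantities differently.

```python
import math


def sigma_from_alpha_bar(alpha_bar: float) -> float:
    """Noise level in the x = x0 + sigma * eps parametrization."""
    return math.sqrt((1.0 - alpha_bar) / alpha_bar)


def euler_step(x: float, eps: float, sigma: float, sigma_next: float) -> float:
    """One Euler step of the probability-flow ODE, using dx/dsigma = eps."""
    return x + (sigma_next - sigma) * eps
```

With an exact noise prediction, a single step down to sigma = 0 lands exactly on the clean sample; in practice the prediction changes along the trajectory, which is why many small steps are taken.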
scale_model_input
Scales the model input according to the current step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The model input. | required |
step | int | The current step. This method is called with | required |
Returns:
Type | Description |
---|---|
Tensor | The scaled model input. |
Source code in src/refiners/foundationals/latent_diffusion/solvers/euler.py
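One common reason for this scaling, stated here as an assumption rather than taken from refiners' source: in the x = x0 + sigma * eps parametrization the sample's variance grows like 1 + sigma^2, while the UNet was trained on variance-preserving inputs sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps. Dividing by sqrt(sigma^2 + 1) maps one onto the other, because 1 / (1 + sigma^2) = alpha_bar. A minimal scalar stand-in:

```python
import math


def scale_model_input(x: float, sigma: float) -> float:
    """Map a sigma-space sample back to the variance-preserving scale."""
    return x / math.sqrt(sigma**2 + 1.0)
```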
FrankenSolver
FrankenSolver(
get_diffusers_scheduler: Callable[[], SchedulerLike],
num_inference_steps: int,
first_inference_step: int = 0,
device: device | str = "cpu",
dtype: dtype = float32,
**kwargs: Any
)
Bases: Solver
Lets you use Diffusers Schedulers as Refiners Solvers.
For instance
Source code in src/refiners/foundationals/latent_diffusion/solvers/franken.py
LCMSolver
LCMSolver(
num_inference_steps: int,
first_inference_step: int = 0,
params: BaseSolverParams | None = None,
num_orig_steps: int = 50,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: Solver
Latent Consistency Model solver.
This solver is designed for use either with a specific base model or a specific LoRA.
See [arXiv:2310.04378] Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference for details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_inference_steps | int | The number of inference steps to perform. | required |
first_inference_step | int | The first inference step to perform. | 0 |
params | BaseSolverParams \| None | The common parameters for solvers. | None |
num_orig_steps | int | The number of inference steps of the emulated DPM solver. | 50 |
device | device \| str | The PyTorch device to use. | 'cpu' |
dtype | dtype | The PyTorch data type to use. | float32 |
Source code in src/refiners/foundationals/latent_diffusion/solvers/lcm.py
ModelPredictionType
NoiseSchedule
An enumeration of schedules used to sample the noise.
Attributes:
Name | Type | Description |
---|---|---|
UNIFORM | | A uniform noise schedule. |
QUADRATIC | | A quadratic noise schedule. Corresponds to "Stable Diffusion" in [arXiv:2305.08891] Common Diffusion Noise Schedules and Sample Steps are Flawed, Table 1. |
KARRAS |
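For reference, the noise-level schedule proposed in arXiv:2206.00364 (Eq. 5) interpolates sigma^(1/rho) linearly between sigma_max and sigma_min, with rho = 7 in the paper. The sketch below implements that formula only; whether refiners' KARRAS schedule is computed this way is an assumption not confirmed by this page, and the function name is hypothetical.

```python
def karras_sigmas(n: int, sigma_min: float, sigma_max: float, rho: float = 7.0) -> list[float]:
    """Noise levels from Karras et al. (2022), Eq. 5, ordered from high to low."""
    if n < 2:
        raise ValueError("need at least two noise levels")
    start = sigma_max ** (1.0 / rho)
    end = sigma_min ** (1.0 / rho)
    # Linear interpolation in sigma^(1/rho) space, then raise back to the power rho.
    return [(start + i / (n - 1) * (end - start)) ** rho for i in range(n)]
```

A large rho concentrates more of the steps at low noise levels, where fine detail is resolved; the paper's reference values are sigma_min = 0.002 and sigma_max = 80.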
Solver
Solver(
num_inference_steps: int,
first_inference_step: int = 0,
params: BaseSolverParams | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
The base class for creating a diffusion model solver.
Solvers create a sequence of noise and scaling factors used in the diffusion process, which gradually transforms the original data distribution into a Gaussian one.
This process is described using several parameters, such as the initial and final diffusion rates, and is encapsulated into a __call__ method that applies a step of the diffusion process.
Attributes:
Name | Type | Description |
---|---|---|
params | ResolvedSolverParams | The common parameters for solvers. See |
num_inference_steps | | The number of inference steps to perform. |
first_inference_step | | The step to start the inference process from. |
scale_factors | | The scale factors used to denoise the input. These are called "betas" in other implementations, and |
cumulative_scale_factors | | The cumulative scale factors used to denoise the input. These are called "alpha_t" in other implementations. |
noise_std | | The standard deviation of the noise used to denoise the input. This is called "sigma_t" in other implementations. |
signal_to_noise_ratios | | The signal-to-noise ratios used to denoise the input. This is called "lambda_t" in other implementations. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_inference_steps | int | The number of inference steps to perform. | required |
first_inference_step | int | The first inference step to perform. | 0 |
params | BaseSolverParams \| None | The common parameters for solvers. | None |
device | device \| str | The PyTorch device to use for the solver's tensors. | 'cpu' |
dtype | dtype | The PyTorch data type to use for the solver's tensors. | float32 |
Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
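The relationships between the "alpha_t", "sigma_t" and "lambda_t" quantities named in the attributes table can be sketched in plain Python under the variance-preserving convention used by DPM-Solver: given per-step diffusion rates beta, alpha_t is the square root of the cumulative product of (1 - beta), sigma_t = sqrt(1 - alpha_t^2), and lambda_t = log(alpha_t / sigma_t). Mapping these onto the cumulative_scale_factors, noise_std and signal_to_noise_ratios attributes follows the names quoted in the table and is an assumption; the function is hypothetical.

```python
import math


def vp_schedule(betas: list[float]) -> dict[str, list[float]]:
    """Derive alpha_t, sigma_t and lambda_t from per-step diffusion rates."""
    alphas, sigmas, lambdas = [], [], []
    cumprod = 1.0
    for beta in betas:
        cumprod *= 1.0 - beta
        alpha = math.sqrt(cumprod)            # "alpha_t": signal scale
        sigma = math.sqrt(1.0 - cumprod)      # "sigma_t": noise standard deviation
        alphas.append(alpha)
        sigmas.append(sigma)
        lambdas.append(math.log(alpha / sigma))  # "lambda_t": log signal-to-noise ratio
    return {"alpha": alphas, "sigma": sigmas, "lambda": lambdas}
```

Because alpha_t shrinks and sigma_t grows as noise accumulates, lambda_t decreases monotonically with the step index.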
add_noise
Add noise to the input tensor using the solver's parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input tensor to add noise to. | required |
noise | Tensor | The noise tensor to add to the input tensor. | required |
step | int \| list[int] | The current step(s) of the diffusion process. | required |
Returns:
Type | Description |
---|---|
Tensor | The input tensor with added noise. |
Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
generate_timesteps
staticmethod
generate_timesteps(
spacing: TimestepSpacing,
num_inference_steps: int,
num_train_timesteps: int = 1000,
offset: int = 0,
) -> Tensor
Generate a tensor of timesteps according to a given spacing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spacing | TimestepSpacing | The spacing to use for the timesteps. | required |
num_inference_steps | int | The number of inference steps to perform. | required |
num_train_timesteps | int | The number of timesteps used to train the diffusion process. | 1000 |
offset | int | The offset to use for the timesteps. | 0 |
Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
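The LINSPACE, LEADING and TRAILING spacings can be illustrated with the 0-indexed conventions popularized by Diffusers (see arXiv:2305.08891, Table 2). This sketch uses plain strings instead of the TimestepSpacing enum and adds offset to every timestep; both are assumptions, and refiners' exact handling of offset and rounding may differ.

```python
def generate_timesteps(spacing: str, num_inference_steps: int,
                       num_train_timesteps: int = 1000, offset: int = 0) -> list[float]:
    """Return inference timesteps in descending order (hypothetical stand-in)."""
    n, t = num_inference_steps, num_train_timesteps
    if spacing in ("linspace", "linspace_rounded"):
        if n < 2:
            raise ValueError("linspace spacing needs at least two steps")
        # n evenly spaced points on [0, t - 1], including both ends.
        steps = [i * (t - 1) / (n - 1) for i in range(n)]
        if spacing == "linspace_rounded":
            steps = [float(round(s)) for s in steps]
    elif spacing == "leading":
        # Starts at 0 and never reaches t - 1 (the noisiest training timestep).
        ratio = t // n
        steps = [float(i * ratio) for i in range(n)]
    elif spacing == "trailing":
        # Starts at t - 1 and never reaches 0.
        ratio = t / n
        steps = [float(round(t - i * ratio) - 1) for i in range(n)]
    else:
        raise ValueError(f"unknown spacing: {spacing}")
    return sorted((s + offset for s in steps), reverse=True)
```

With 1000 training timesteps and 10 inference steps, leading spacing starts sampling at 900 rather than 999, which is the non-zero terminal SNR issue that trailing spacing avoids.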
rebuild
Rebuild the solver with new parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_inference_steps | int \| None | The number of inference steps to perform. | required |
first_inference_step | int \| None | The first inference step to perform. | None |
Returns:
Type | Description |
---|---|
T | A new solver instance with the specified parameters. |
Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
remove_noise
Remove noise from the input tensor using the current step of the diffusion process.
Note
See [arXiv:2006.11239] Denoising Diffusion Probabilistic Models, Equation 15 and [arXiv:2210.00939] Improving Sample Quality of Diffusion Models Using Self-Attention Guidance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input tensor to remove noise from. | required |
noise | Tensor | The noise tensor to remove from the input tensor. | required |
step | int | The current step of the diffusion process. | required |
Returns:
Type | Description |
---|---|
Tensor | The denoised input tensor. |
Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
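Rearranging the forward-noising formula gives the usual clean-sample estimate x0 = (x_t - sigma_t * noise) / alpha_t, which is the form of Equation 15 in arXiv:2006.11239 with alpha_t = sqrt(alpha_bar_t) and sigma_t = sqrt(1 - alpha_bar_t). A scalar sketch with a hypothetical function name:

```python
import math


def remove_noise(x_t: float, noise: float, alpha_bar_t: float) -> float:
    """Estimate x_0 from a noisy sample and the predicted noise (DDPM Eq. 15)."""
    alpha_t = math.sqrt(alpha_bar_t)        # signal scale
    sigma_t = math.sqrt(1.0 - alpha_bar_t)  # noise standard deviation
    return (x_t - sigma_t * noise) / alpha_t
```

If the predicted noise is exactly the noise that was added, the estimate recovers the original sample.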
sample_noise_schedule
sample_noise_schedule() -> Tensor
Sample the noise schedule.
Returns:
Type | Description |
---|---|
Tensor | A tensor representing the noise schedule. |
Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
sample_power_distribution
Sample a power distribution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
power | float | The power to use for the distribution. | 2 |
Returns:
Type | Description |
---|---|
Tensor | A tensor representing the power distribution between the initial and final diffusion rates of the solver. |
Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
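Read together with the NoiseSchedule descriptions, a plausible interpretation is that power = 1 yields the uniform schedule and power = 2 the quadratic ("Stable Diffusion") one. The sketch below assumes the distribution interpolates linearly between initial_rate**(1/power) and final_rate**(1/power) over num_train_timesteps points and raises the result back to power; the parameters other than power are hypothetical stand-ins for the solver's params.

```python
def sample_power_distribution(initial_rate: float, final_rate: float,
                              num_train_timesteps: int, power: float = 2.0) -> list[float]:
    """Interpolate linearly between rate**(1/power) endpoints, then raise to power."""
    if num_train_timesteps < 2:
        raise ValueError("need at least two timesteps")
    start = initial_rate ** (1.0 / power)
    end = final_rate ** (1.0 / power)
    step = (end - start) / (num_train_timesteps - 1)
    return [(start + i * step) ** power for i in range(num_train_timesteps)]
```

Both endpoints always equal the initial and final rates; the power only changes how values are distributed in between.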
scale_model_input
Scale the model's input according to the current timestep.
Note
This method should only be overridden by solvers that need to scale the input according to the current timestep.
By default, it returns the input unscaled (scale = 1).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input tensor to scale. | required |
step | int | The current step of the diffusion process. | required |
Returns:
Type | Description |
---|---|
Tensor | The scaled input tensor. |
Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
to
Move the solver to the specified device and data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | device \| str \| None | The PyTorch device to move the solver to. | None |
dtype | dtype \| None | The PyTorch data type to move the solver to. | None |
Returns:
Type | Description |
---|---|
Solver | The solver instance, moved to the specified device and data type. |
Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
SolverParams
dataclass
SolverParams(
*,
num_train_timesteps: int | None = None,
timesteps_spacing: TimestepSpacing | None = None,
timesteps_offset: int | None = None,
initial_diffusion_rate: float | None = None,
final_diffusion_rate: float | None = None,
noise_schedule: NoiseSchedule | None = None,
sigma_schedule: NoiseSchedule | None = None,
model_prediction_type: (
ModelPredictionType | None
) = None,
sde_variance: float = 0.0
)
Bases: BaseSolverParams
Common parameters for solvers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_train_timesteps | int \| None | The number of timesteps used to train the diffusion process. | None |
timesteps_spacing | TimestepSpacing \| None | The spacing to use for the timesteps. | None |
timesteps_offset | int \| None | The offset to use for the timesteps. | None |
initial_diffusion_rate | float \| None | The initial diffusion rate used to sample the noise schedule. | None |
final_diffusion_rate | float \| None | The final diffusion rate used to sample the noise schedule. | None |
noise_schedule | NoiseSchedule \| None | The type of noise schedule to sample (see NoiseSchedule). | None |
model_prediction_type | ModelPredictionType \| None | Defines what the model predicts. | None |
TimestepSpacing
An enumeration of methods to space the timesteps.
See [arXiv:2305.08891] Common Diffusion Noise Schedules and Sample Steps are Flawed, Table 2.
Attributes:
Name | Type | Description |
---|---|---|
LINSPACE | | Sample N steps with linear interpolation and return a floating-point tensor. |
LINSPACE_ROUNDED | | Same as LINSPACE, but return an integer tensor with rounded timesteps. |
LEADING | | Sample N+1 steps and do not include the last timestep (i.e. bad: non-zero SNR). Used in DDIM, with a mitigation for that issue. |
TRAILING | | Sample N+1 steps and do not include the first timestep. |
CUSTOM | | Use custom timespacing in solver (override |
SDLoraManager
SDLoraManager(target: LatentDiffusionModel)
Manage LoRAs for a Stable Diffusion model.
Note
In the context of SDLoraManager, a "LoRA" is a set of "LoRA layers" that can be attached to a target model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target | LatentDiffusionModel | The target model to manage the LoRAs for. | required |
Source code in src/refiners/foundationals/latent_diffusion/lora.py
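For context, a LoRA layer as described in arXiv:2106.09685 adds a low-rank update to a frozen weight: y = W x + scale * B (A x), where A projects down to rank r and B projects back up. The list-based sketch below shows that arithmetic only; it is not SDLoraManager's code, and reading the scale argument of add_loras as this multiplier is an assumption. The helper names are hypothetical.

```python
Matrix = list[list[float]]


def matvec(m: Matrix, v: list[float]) -> list[float]:
    """Dense matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_linear(x: list[float], weight: Matrix, down: Matrix, up: Matrix,
                scale: float = 1.0) -> list[float]:
    """Frozen linear layer plus a scaled low-rank (LoRA) update."""
    base = matvec(weight, x)             # W x, with W of shape (out, in)
    delta = matvec(up, matvec(down, x))  # B (A x), with A: (r, in) and B: (out, r)
    return [b + scale * d for b, d in zip(base, delta)]
```

Setting scale to 0 leaves the base layer's output unchanged, and several LoRAs attached to the same layer would each contribute their own scaled delta.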
lora_adapters
property
lora_adapters: list[LoraAdapter]
List of all the LoraAdapters managed by the SDLoraManager.
scales
property
The scales of all the LoRAs managed by the SDLoraManager.
add_loras
add_loras(
name: str,
/,
tensors: dict[str, Tensor],
scale: float = 1.0,
unet_inclusions: list[str] | None = None,
unet_exclusions: list[str] | None = None,
unet_preprocess: dict[str, str] | None = None,
text_encoder_inclusions: list[str] | None = None,
text_encoder_exclusions: list[str] | None = None,
) -> None
Load a single LoRA from a state_dict.
Warning
This method expects the keys of the state_dict to be in the formats commonly found on CivitAI's hub.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the LoRA. | required |
tensors | dict[str, Tensor] | The | required |
scale | float | The scale to use for the LoRA. | 1.0 |
unet_inclusions | list[str] \| None | A list of layer names; only layers with such a layer among their ancestors will be considered when patching the UNet. | None |
unet_exclusions | list[str] \| None | A list of layer names; layers with such a layer among their ancestors will not be considered when patching the UNet. If this is | None |
unet_preprocess | dict[str, str] \| None | A map between parts of state dict keys and layer names. This is used to attach some keys to specific parts of the UNet. You should leave it set to | None |
text_encoder_inclusions | list[str] \| None | A list of layer names; only layers with such a layer among their ancestors will be considered when patching the text encoder. | None |
text_encoder_exclusions
|
list[ |