diffengine.models
Subpackages
- diffengine.models.archs
- diffengine.models.editors
- diffengine.models.editors.amused
- diffengine.models.editors.deepfloyd_if
- diffengine.models.editors.distill_sd
- diffengine.models.editors.esd
- diffengine.models.editors.instruct_pix2pix
- diffengine.models.editors.ip_adapter
- diffengine.models.editors.kandinsky
- diffengine.models.editors.lcm
- diffengine.models.editors.pixart_alpha
- diffengine.models.editors.ssd_1b
- diffengine.models.editors.stable_diffusion
- diffengine.models.editors.stable_diffusion_controlnet
- diffengine.models.editors.stable_diffusion_inpaint
- diffengine.models.editors.stable_diffusion_xl
- diffengine.models.editors.stable_diffusion_xl_controlnet
- diffengine.models.editors.stable_diffusion_xl_dpo
- diffengine.models.editors.stable_diffusion_xl_inpaint
- diffengine.models.editors.t2i_adapter
- diffengine.models.editors.wuerstchen
- diffengine.models.losses
- diffengine.models.utils
Package Contents
Classes
- aMUSEd.
- AMUSEdPreprocessor.
- DeepFloyd/IF.
- Distill Stable Diffusion XL.
- Stable Diffusion XL Erasing Concepts from Diffusion Models.
- ESDXLDataPreprocessor.
- Stable Diffusion XL Instruct Pix2Pix.
- Stable Diffusion XL IP-Adapter.
- Stable Diffusion XL IP-Adapter Plus.
- IPAdapterXLDataPreprocessor.
- Stable Diffusion XL IP-Adapter Plus.
- KandinskyV22 Prior.
- KandinskyV22 Decoder.
- KandinskyV22DecoderDataPreprocessor.
- KandinskyV3.
- Stable Diffusion XL Latent Consistency Models.
- PixArt Alpha.
- PixArtAlphaDataPreprocessor.
- SSD1B.
- Stable Diffusion.
- SDDataPreprocessor.
- Stable Diffusion ControlNet.
- SDControlNetDataPreprocessor.
- SDInpaintDataPreprocessor.
- Stable Diffusion Inpaint.
- Stable Diffusion XL.
- SDXLDataPreprocessor.
- SDXLControlNetDataPreprocessor.
- Stable Diffusion XL ControlNet.
- Stable Diffusion XL DPO.
- SDXLDataPreprocessor.
- Stable Diffusion XL Inpaint.
- SDXLInpaintDataPreprocessor.
- Stable Diffusion XL T2I Adapter.
- Wuerstchen Prior.
- L2 loss.
- SNR weighting gamma L2 loss.
- DeBias Estimation loss.
- Huber loss.
- CrossEntropy loss.
- White noise module.
- Offset noise module.
- Pyramid noise module.
- Time Steps module.
- Later biased Time Steps module.
- Earlier biased Time Steps module.
- Range biased Time Steps module.
- Cubic Sampling Time Steps module.
- Wuerstchen Random Time Steps module.
- DDIM Time Steps module.
- class diffengine.models.AMUSEd(tokenizer, text_encoder, vae, transformer, model='amused/amused-512', loss=None, transformer_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, data_preprocessor=None, vae_batch_size=8, *, finetune_text_encoder=False, gradient_checkpointing=False, enable_xformers=False)
Bases: mmengine.model.BaseModel
aMUSEd.
Args:
- tokenizer (dict): Config of tokenizer.
- text_encoder (dict): Config of text encoder.
- vae (dict): Config of vae.
- transformer (dict): Config of transformer.
- model (str): Pretrained model name. Defaults to "amused/amused-512".
- loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
- transformer_lora_config (dict, optional): The LoRA config dict for Transformer, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for Text Encoder, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- prior_loss_weight (float): The weight of prior preservation loss. It takes effect when training DreamBooth with class images.
- data_preprocessor (dict, optional): The pre-process config of the data preprocessor.
- vae_batch_size (int): The batch size of vae. Defaults to 8.
- finetune_text_encoder (bool, optional): Whether to fine-tune text encoder. Defaults to False.
- gradient_checkpointing (bool): Whether or not to use gradient checkpointing to save memory at the expense of slower backward pass. Defaults to False.
- enable_xformers (bool): Whether or not to enable memory efficient attention. Defaults to False.
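As a rough illustration of how the arguments above fit together, the following sketch assembles a config dict for AMUSEd with a LoRA adapter on the transformer. The `type="AMUSEd"` key and the `build_amused_config` helper are assumptions made for this example (mmengine-style configs usually select a class through a `type` key), and the component configs are left as empty placeholders because their contents are not given on this page.

```python
def build_amused_config(lora_rank=4, finetune_text_encoder=False):
    """Assemble a hypothetical AMUSEd config dict (illustrative helper only)."""
    return dict(
        type="AMUSEd",  # assumed registry key, not confirmed by this page
        model="amused/amused-512",  # documented default
        # Placeholder component configs; real ones must name concrete classes.
        tokenizer=dict(),
        text_encoder=dict(),
        vae=dict(),
        transformer=dict(),
        loss=dict(type="L2Loss", loss_weight=1.0),  # documented default loss
        # LoRA on the transformer; type may be LoRA, LoHa, or LoKr.
        transformer_lora_config=dict(type="LoRA", r=lora_rank),
        text_encoder_lora_config=None,
        vae_batch_size=8,
        finetune_text_encoder=finetune_text_encoder,
        gradient_checkpointing=False,
        enable_xformers=False,
    )


config = build_amused_config(lora_rank=4)
print(config["transformer_lora_config"])
```

Keeping the LoRA settings in a nested dict mirrors the documented `dict(type="LoRA", r=4)` example, so further PEFT options could be added to that dict without touching the other arguments.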
- property device: torch.device
Get device information.
- Returns:
The device.
- Return type:
torch.device
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=12, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]): The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional): The height in pixels of the generated image. Defaults to None.
- width (int, optional): The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps. Defaults to 12.
- output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- _forward_vae(img, num_batches)
Forward vae.
- Parameters:
img (torch.Tensor) –
num_batches (int) –
- Return type:
torch.Tensor
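A plausible reading of `vae_batch_size` is that `_forward_vae` encodes the image batch in slices of at most that size to bound peak memory. The sketch below shows that chunking pattern on plain Python lists; `encode_chunk` is a hypothetical stand-in for the VAE call, and the real method works on torch.Tensor inputs and may differ in detail.

```python
def encode_chunk(chunk):
    """Hypothetical stand-in for a VAE encode call: doubles each value."""
    return [2 * x for x in chunk]


def forward_vae_chunked(img, num_batches, vae_batch_size=8):
    """Encode `img` in slices of at most `vae_batch_size` items and concatenate."""
    latents = []
    for start in range(0, num_batches, vae_batch_size):
        # The last slice may be shorter than vae_batch_size.
        latents.extend(encode_chunk(img[start:start + vae_batch_size]))
    return latents


images = list(range(20))  # pretend batch of 20 images
latents = forward_vae_chunked(images, num_batches=len(images), vae_batch_size=8)
print(len(latents))
```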
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
tokenizer (dict) –
text_encoder (dict) –
vae (dict) –
transformer (dict) –
model (str) –
loss (dict | None) –
transformer_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
prior_loss_weight (float) –
data_preprocessor (dict | torch.nn.Module | None) –
vae_batch_size (int) –
finetune_text_encoder (bool) –
gradient_checkpointing (bool) –
enable_xformers (bool) –
- class diffengine.models.AMUSEdPreprocessor(non_blocking=False)
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
AMUSEdPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
- class diffengine.models.DeepFloydIF(tokenizer, scheduler, text_encoder, unet, model='DeepFloyd/IF-I-XL-v1.0', loss=None, unet_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, tokenizer_max_length=77, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, *, finetune_text_encoder=False, gradient_checkpointing=False, enable_xformers=False)
Bases: mmengine.model.BaseModel
DeepFloyd/IF.
Args:
- tokenizer (dict): Config of tokenizer.
- scheduler (dict): Config of scheduler.
- text_encoder (dict): Config of text encoder.
- unet (dict): Config of unet.
- model (str): Pretrained model name. Defaults to 'DeepFloyd/IF-I-XL-v1.0'.
- loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
- unet_lora_config (dict, optional): The LoRA config dict for Unet, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for Text Encoder, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- prior_loss_weight (float): The weight of prior preservation loss. It takes effect when training DreamBooth with class images.
- tokenizer_max_length (int): The max length of tokenizer. Defaults to 77.
- prediction_type (str): The prediction type used for training. Choose between 'epsilon' and 'v_prediction', or leave as None. If None, the scheduler's default prediction type, noise_scheduler.config.prediction_type, is used. Defaults to None.
- data_preprocessor (dict, optional): The pre-process config of the data preprocessor.
- noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
- finetune_text_encoder (bool, optional): Whether to fine-tune text encoder. Defaults to False.
- gradient_checkpointing (bool): Whether or not to use gradient checkpointing to save memory at the expense of slower backward pass. Defaults to False.
- enable_xformers (bool): Whether or not to enable memory efficient attention. Defaults to False.
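The noise and timestep generators are passed as config dicts. The sketch below builds the keyword arguments for the documented defaults and for a variant that switches on input perturbation with the recommended gamma of 0.1. The helper name is hypothetical; only the `WhiteNoise` and `TimeSteps` type names and the parameter names come from the argument list above.

```python
def deepfloyd_noise_kwargs(use_input_perturbation=False):
    """Return noise-related keyword arguments for DeepFloydIF (illustrative helper)."""
    return dict(
        noise_generator=dict(type="WhiteNoise"),  # documented default
        timesteps_generator=dict(type="TimeSteps"),  # documented default
        # 0.0 disables input perturbation; 0.1 is the documented recommendation.
        input_perturbation_gamma=0.1 if use_input_perturbation else 0.0,
    )


default_kwargs = deepfloyd_noise_kwargs()
perturbed_kwargs = deepfloyd_noise_kwargs(use_input_perturbation=True)
print(perturbed_kwargs["input_perturbation_gamma"])
```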
- property device: torch.device
Get device information.
- Returns:
The device.
- Return type:
torch.device
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]): The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional): The height in pixels of the generated image. Defaults to None.
- width (int, optional): The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps. Defaults to 50.
- output_type (str): The output format of the generated image. Choose between 'pil' and 'pt'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- loss(model_pred, noise, latents, timesteps, weight=None)
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
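For context on `prior_loss_weight`, DreamBooth-style prior preservation combines a loss on instance images with a weighted loss on class images. The sketch below shows that weighting on plain Python floats with a mean-squared-error loss. It is a generic illustration, not necessarily how this `loss` method arranges the computation, and the function names are hypothetical.

```python
def mse(pred, target):
    """Mean squared error over two equally long sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target,
                    class_pred, class_target, prior_loss_weight=1.0):
    """Instance loss plus the prior-preservation loss scaled by its weight."""
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


total = dreambooth_loss(
    instance_pred=[1.0, 2.0], instance_target=[1.0, 0.0],
    class_pred=[0.5, 0.5], class_target=[0.0, 0.0],
    prior_loss_weight=0.5,
)
print(total)
```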
- _preprocess_model_input(latents, noise, timesteps)
Preprocess model input.
- Parameters:
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
- Return type:
torch.Tensor
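`input_perturbation_gamma` refers to the Input Perturbation technique, in which the noise used to corrupt the latents is itself perturbed by a small Gaussian term while the training target remains the original noise. The following sketch illustrates the idea on flat Python lists with a linear noising step. The `signal_scale` and `noise_scale` arguments stand in for the scheduler's coefficients and are assumptions of this example, as is the function name.

```python
import random


def perturb_and_add_noise(latents, noise, signal_scale, noise_scale,
                          gamma=0.0, rng=None):
    """Noise the latents, optionally perturbing the noise by gamma * N(0, 1)."""
    rng = rng or random.Random()
    if gamma > 0:
        # Perturb only the noise fed to the noising step.
        input_noise = [n + gamma * rng.gauss(0.0, 1.0) for n in noise]
    else:
        input_noise = list(noise)
    return [signal_scale * x + noise_scale * n
            for x, n in zip(latents, input_noise)]


latents = [1.0, -1.0, 0.5]
noise = [0.2, 0.0, -0.4]
plain = perturb_and_add_noise(latents, noise, 0.8, 0.6, gamma=0.0)
perturbed = perturb_and_add_noise(latents, noise, 0.8, 0.6, gamma=0.1,
                                  rng=random.Random(0))
print(plain)
```

With `gamma=0.0` the function reduces to ordinary noising, which matches the documented default of 0.0.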
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
tokenizer (dict) –
scheduler (dict) –
text_encoder (dict) –
unet (dict) –
model (str) –
loss (dict | None) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
prior_loss_weight (float) –
tokenizer_max_length (int) –
prediction_type (str | None) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
finetune_text_encoder (bool) –
gradient_checkpointing (bool) –
enable_xformers (bool) –
- class diffengine.models.DistillSDXL(*args, model_type, unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, **kwargs)
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Distill Stable Diffusion XL.
Args:
- model_type (str): The type of model to use. Choose from sd_tiny, sd_small.
- unet_lora_config (dict, optional): The LoRA config dict for Unet, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for Text Encoder, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- finetune_text_encoder (bool, optional): Whether to fine-tune text encoder. This should be False when training ControlNet. Defaults to False.
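Because `model_type` and the LoRA `type` each accept only a small set of values, it can be convenient to check them before building the model. The helper below is a hypothetical pre-flight check written for this example; the allowed values are taken from the argument descriptions above, and the class itself may validate its arguments differently.

```python
ALLOWED_MODEL_TYPES = ("sd_tiny", "sd_small")
ALLOWED_LORA_TYPES = ("LoRA", "LoHa", "LoKr")


def check_distill_args(model_type, unet_lora_config=None):
    """Validate DistillSDXL-style arguments and return them as a dict."""
    if model_type not in ALLOWED_MODEL_TYPES:
        raise ValueError(f"model_type must be one of {ALLOWED_MODEL_TYPES}")
    if unet_lora_config is not None:
        if unet_lora_config.get("type") not in ALLOWED_LORA_TYPES:
            raise ValueError(f"LoRA type must be one of {ALLOWED_LORA_TYPES}")
    return dict(model_type=model_type, unet_lora_config=unet_lora_config)


checked = check_distill_args("sd_tiny", dict(type="LoRA", r=4))
print(checked["model_type"])
```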
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
model_type (str) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
- class diffengine.models.ESDXL(*args, finetune_text_encoder=False, pre_compute_text_embeddings=True, height=1024, width=1024, negative_guidance=1.0, train_method='full', prediction_type=None, data_preprocessor=None, **kwargs)
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL Erasing Concepts from Diffusion Models.
Args:
- height (int): Image height. Defaults to 1024.
- width (int): Image width. Defaults to 1024.
- negative_guidance (float): Negative guidance for loss. Defaults to 1.0.
- train_method (str): Training method. Choose from full, xattn, noxattn, selfattn. Defaults to full.
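In the Erasing Concepts from Diffusion Models method, the fine-tuned model is trained toward a target that moves away from the concept-conditioned prediction of the frozen model, scaled by the negative guidance. A minimal sketch of that target on plain Python lists follows. The function and argument names are hypothetical, and how this class computes the target internally is not shown on this page.

```python
def esd_target(uncond_pred, cond_pred, negative_guidance=1.0):
    """Target = unconditional - guidance * (conditional - unconditional)."""
    return [u - negative_guidance * (c - u)
            for u, c in zip(uncond_pred, cond_pred)]


# Frozen-model noise predictions without and with the concept prompt.
target = esd_target(uncond_pred=[0.0, 1.0], cond_pred=[0.5, 0.0],
                    negative_guidance=1.0)
print(target)
```

A larger `negative_guidance` pushes the target further from the conditioned prediction, which is why it appears as a tunable argument.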
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- train(*, mode=True)
Convert the model into training mode.
- Parameters:
mode (bool) –
- Return type:
None
- abstract _preprocess_model_input(latents, noise, timesteps)
Preprocess model input.
- Parameters:
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
- Return type:
torch.Tensor
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
finetune_text_encoder (bool) –
pre_compute_text_embeddings (bool) –
height (int) –
width (int) –
negative_guidance (float) –
train_method (str) –
prediction_type (str | None) –
data_preprocessor (dict | torch.nn.Module | None) –
- class diffengine.models.ESDXLDataPreprocessor(non_blocking=False)
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
ESDXLDataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
Union[dict, list]
- class diffengine.models.StableDiffusionXLInstructPix2Pix(*args, zeros_image_embeddings_prob=0.1, unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, data_preprocessor=None, **kwargs)
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL Instruct Pix2Pix.
Args:
- zeros_image_embeddings_prob (float): The probability of generating zeros image embeddings. Defaults to 0.1.
- unet_lora_config (dict, optional): The LoRA config dict for Unet, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for Text Encoder, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- finetune_text_encoder (bool, optional): Whether to fine-tune text encoder. This should be False when training ControlNet. Defaults to False.
- data_preprocessor (dict, optional): The pre-process config of the data preprocessor.
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, condition_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]): The prompt or prompts to guide the image generation.
- condition_image (List[Union[str, Image.Image]]): The condition image for ControlNet.
- negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional): The height in pixels of the generated image. Defaults to None.
- width (int, optional): The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps. Defaults to 50.
- output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
condition_image (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
zeros_image_embeddings_prob (float) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
data_preprocessor (dict | torch.nn.Module | None) –
- class diffengine.models.IPAdapterXL(*args, image_encoder, image_projection, feature_extractor, pretrained_adapter=None, pretrained_adapter_subfolder='', pretrained_adapter_weights_name='', unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, zeros_image_embeddings_prob=0.1, data_preprocessor=None, hidden_states_idx=-2, **kwargs)
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL IP-Adapter.
Args:
- image_encoder (dict): The image encoder config.
- image_projection (dict): The image projection config.
- feature_extractor (dict): The feature extractor config.
- pretrained_adapter (str, optional): Path to pretrained IP-Adapter. Defaults to None.
- pretrained_adapter_subfolder (str, optional): Sub folder of pretrained IP-Adapter. Defaults to ''.
- pretrained_adapter_weights_name (str, optional): Weights name of pretrained IP-Adapter. Defaults to ''.
- unet_lora_config (dict, optional): The LoRA config dict for Unet, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for Text Encoder, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- finetune_text_encoder (bool, optional): Whether to fine-tune text encoder. This should be False when training ControlNet. Defaults to False.
- zeros_image_embeddings_prob (float): The probability of generating zeros image embeddings. Defaults to 0.1.
- data_preprocessor (dict, optional): The pre-process config of the data preprocessor.
- hidden_states_idx (int): Index of the hidden states to be used. Defaults to -2.
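`zeros_image_embeddings_prob` suggests that, during training, image embeddings are replaced by zeros for a random fraction of samples, a common way to let a model also learn an image-unconditional prediction. The sketch below illustrates that per-sample dropout on nested Python lists. It is an assumption about the mechanism rather than the class's actual code, and the helper name is hypothetical.

```python
import random


def drop_image_embeddings(embeddings, zeros_prob=0.1, rng=None):
    """Replace each sample's embedding by zeros with probability zeros_prob."""
    rng = rng or random.Random()
    out = []
    for emb in embeddings:
        if rng.random() < zeros_prob:
            out.append([0.0] * len(emb))  # dropped: all-zero embedding
        else:
            out.append(list(emb))  # kept unchanged
    return out


batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
kept = drop_image_embeddings(batch, zeros_prob=0.0)
dropped = drop_image_embeddings(batch, zeros_prob=1.0)
print(dropped)
```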
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, example_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]): The prompt or prompts to guide the image generation.
- example_image (List[Union[str, Image.Image]]): The image prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional): The height in pixels of the generated image. Defaults to None.
- width (int, optional): The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps. Defaults to 50.
- output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
example_image (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
image_encoder (dict) –
image_projection (dict) –
feature_extractor (dict) –
pretrained_adapter (str | None) –
pretrained_adapter_subfolder (str) –
pretrained_adapter_weights_name (str) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
zeros_image_embeddings_prob (float) –
data_preprocessor (dict | torch.nn.Module | None) –
hidden_states_idx (int) –
- class diffengine.models.IPAdapterXLPlus(*args, image_encoder, image_projection, feature_extractor, pretrained_adapter=None, pretrained_adapter_subfolder='', pretrained_adapter_weights_name='', unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, zeros_image_embeddings_prob=0.1, data_preprocessor=None, hidden_states_idx=-2, **kwargs)
Bases: IPAdapterXL
Stable Diffusion XL IP-Adapter Plus.
- Parameters:
image_encoder (dict) –
image_projection (dict) –
feature_extractor (dict) –
pretrained_adapter (str | None) –
pretrained_adapter_subfolder (str) –
pretrained_adapter_weights_name (str) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
zeros_image_embeddings_prob (float) –
data_preprocessor (dict | torch.nn.Module | None) –
hidden_states_idx (int) –
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- class diffengine.models.IPAdapterXLDataPreprocessor(non_blocking=False)
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
IPAdapterXLDataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
- class diffengine.models.TimmIPAdapterXLPlus(*args, image_encoder, image_projection, feature_extractor, pretrained_adapter=None, pretrained_adapter_subfolder='', pretrained_adapter_weights_name='', unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, zeros_image_embeddings_prob=0.1, data_preprocessor=None, hidden_states_idx=-2, **kwargs)
Bases: diffengine.models.editors.ip_adapter.ip_adapter_xl.IPAdapterXLPlus
Stable Diffusion XL IP-Adapter Plus.
- Parameters:
image_encoder (dict) –
image_projection (dict) –
feature_extractor (dict) –
pretrained_adapter (str | None) –
pretrained_adapter_subfolder (str) –
pretrained_adapter_weights_name (str) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
zeros_image_embeddings_prob (float) –
data_preprocessor (dict | torch.nn.Module | None) –
hidden_states_idx (int) –
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, example_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]): The prompt or prompts to guide the image generation.
- example_image (List[Union[str, Image.Image]]): The image prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional): The height in pixels of the generated image. Defaults to None.
- width (int, optional): The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps. Defaults to 50.
- output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
example_image (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (list | None) –
mode (str) –
- Return type:
dict
- class diffengine.models.KandinskyV22Prior(tokenizer, scheduler, text_encoder, image_encoder, prior, decoder_model='kandinsky-community/kandinsky-2-2-decoder', prior_model='kandinsky-community/kandinsky-2-2-prior', loss=None, prior_lora_config=None, prior_loss_weight=1.0, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, *, gradient_checkpointing=False, enable_xformers=False)
Bases: mmengine.model.BaseModel
KandinskyV22 Prior.
Args:
- tokenizer (dict): Config of tokenizer.
- scheduler (dict): Config of scheduler.
- text_encoder (dict): Config of text encoder.
- image_encoder (dict): Config of image encoder.
- prior (dict): Config of prior.
- decoder_model (str): Pretrained model name of decoder. Defaults to "kandinsky-community/kandinsky-2-2-decoder".
- prior_model (str): Pretrained model name of prior. Defaults to "kandinsky-community/kandinsky-2-2-prior".
- loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
- prior_lora_config (dict, optional): The LoRA config dict for Prior, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- prior_loss_weight (float): The weight of prior preservation loss. It takes effect when training DreamBooth with class images.
- data_preprocessor (dict, optional): The pre-process config of the data preprocessor.
- noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
- gradient_checkpointing (bool): Whether or not to use gradient checkpointing to save memory at the expense of slower backward pass. Defaults to False.
- enable_xformers (bool): Whether or not to enable memory efficient attention. Defaults to False.
- property device: torch.device
Get device information.
- Returns:
The device.
- Return type:
torch.device
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]): The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional): The height in pixels of the generated image. Defaults to None.
- width (int, optional): The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps. Defaults to 50.
- output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- loss(model_pred, noise, latents, timesteps, weight=None)
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
- _preprocess_model_input(latents, noise, timesteps)
Preprocess model input.
- Parameters:
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
- Return type:
torch.Tensor
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
tokenizer (dict) –
scheduler (dict) –
text_encoder (dict) –
image_encoder (dict) –
prior (dict) –
decoder_model (str) –
prior_model (str) –
loss (dict | None) –
prior_lora_config (dict | None) –
prior_loss_weight (float) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
gradient_checkpointing (bool) –
enable_xformers (bool) –
- class diffengine.models.KandinskyV22Decoder(scheduler, image_encoder, vae, unet, decoder_model='kandinsky-community/kandinsky-2-2-decoder', prior_model='kandinsky-community/kandinsky-2-2-prior', loss=None, unet_lora_config=None, prior_loss_weight=1.0, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, gradient_checkpointing=False, enable_xformers=False)
Bases: mmengine.model.BaseModel
KandinskyV22 Decoder.
Args:
- scheduler (dict): Config of scheduler.
- image_encoder (dict): Config of image encoder.
- vae (dict): Config of vae.
- unet (dict): Config of unet.
- decoder_model (str): Pretrained model name of decoder. Defaults to "kandinsky-community/kandinsky-2-2-decoder".
- prior_model (str): Pretrained model name of prior. Defaults to "kandinsky-community/kandinsky-2-2-prior".
- loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
- unet_lora_config (dict, optional): The LoRA config dict for Unet, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- prior_loss_weight (float): The weight of prior preservation loss. It takes effect when training DreamBooth with class images.
- prediction_type (str): The prediction type used for training. Choose between 'epsilon' and 'v_prediction', or leave as None. If None, the scheduler's default prediction type is used. Defaults to None.
- data_preprocessor (dict, optional): The pre-process config of the data preprocessor.
- noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
- vae_batch_size (int): The batch size of vae. Defaults to 8.
- gradient_checkpointing (bool): Whether or not to use gradient checkpointing to save memory at the expense of slower backward pass. Defaults to False.
- enable_xformers (bool): Whether or not to enable memory efficient attention. Defaults to False.
- property device: torch.device
Get device information.
- Returns:
device.
- Return type:
torch.device
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]):
The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- loss(model_pred, noise, latents, timesteps, weight=None)
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
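The `weight` argument of `loss` and the `prior_loss_weight` constructor argument suggest a per-sample weighting when DreamBooth class images are present. The pure-Python sketch below illustrates that idea on plain lists. The helper names `build_sample_weights` and `weighted_l2_loss`, and the batch layout (instance samples first, class samples second), are assumptions of this sketch and are not taken from the diffengine source.

```python
def build_sample_weights(num_batches, prior_loss_weight):
    """Weight 1.0 for instance samples, prior_loss_weight for class samples.

    Assumes the batch is laid out as [instance samples..., class samples...]
    with both halves the same size (an assumption of this sketch).
    """
    half = num_batches // 2
    return [1.0] * half + [prior_loss_weight] * half


def weighted_l2_loss(pred, target, weight):
    """Mean over samples of weight * (mean squared error of that sample)."""
    per_sample = [
        w * sum((p - t) ** 2 for p, t in zip(ps, ts)) / len(ps)
        for ps, ts, w in zip(pred, target, weight)
    ]
    return sum(per_sample) / len(per_sample)


pred = [[0.0, 2.0], [1.0, 1.0], [0.0, 0.0], [2.0, 2.0]]
target = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
weight = build_sample_weights(num_batches=4, prior_loss_weight=0.5)

# Mirrors the documented return type: a dict mapping names to loss values.
loss_dict = {"loss": weighted_l2_loss(pred, target, weight)}
```

With `prior_loss_weight=1.0` (the default) every sample contributes equally, so the weighting only matters when the class-image term should count less or more than the instance term.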
- _preprocess_model_input(latents, noise, timesteps)
Preprocess model input.
- Parameters:
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
- Return type:
torch.Tensor
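As a rough illustration of what `input_perturbation_gamma` controls, the sketch below adds a small amount of extra Gaussian noise to the sampled noise before forming the noisy model input. It assumes that `_preprocess_model_input` is where this happens and that the scheduler uses the standard DDPM forward process. Both are assumptions, and `perturb_noise` and `add_noise` are hypothetical helpers written on plain lists.

```python
import random


def perturb_noise(noise, gamma, rng):
    """Return noise + gamma * fresh Gaussian noise (input perturbation).

    With gamma <= 0.0 the noise is returned unchanged.
    """
    if gamma <= 0.0:
        return list(noise)
    return [n + gamma * rng.gauss(0.0, 1.0) for n in noise]


def add_noise(latents, noise, alpha_bar_t):
    """DDPM forward process: sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a, b = alpha_bar_t ** 0.5, (1.0 - alpha_bar_t) ** 0.5
    return [a * x + b * n for x, n in zip(latents, noise)]


rng = random.Random(0)
latents = [0.5, -0.25, 1.0]
noise = [rng.gauss(0.0, 1.0) for _ in latents]

# The perturbed noise only builds the model input; the training target
# stays the original `noise`.
input_noise = perturb_noise(noise, gamma=0.1, rng=rng)
noisy_latents = add_noise(latents, input_noise, alpha_bar_t=0.9)
```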
- _forward_vae(img, num_batches)
Forward vae.
- Parameters:
img (torch.Tensor) –
num_batches (int) –
- Return type:
torch.Tensor
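The `vae_batch_size` argument implies that images pass through the VAE in slices instead of all at once, which bounds peak memory. A minimal sketch of that chunking pattern follows. `forward_in_chunks` and `toy_encode` are hypothetical names, and the real method works on tensors, not lists.

```python
def forward_in_chunks(encode, images, num_batches, vae_batch_size=8):
    """Apply `encode` to `images` in slices of at most `vae_batch_size`."""
    latents = []
    for start in range(0, num_batches, vae_batch_size):
        chunk = images[start:start + vae_batch_size]
        latents.extend(encode(chunk))
    return latents


# Hypothetical stand-in for a VAE encoder: halves every value and
# records how large each chunk was.
chunk_sizes = []


def toy_encode(chunk):
    chunk_sizes.append(len(chunk))
    return [x / 2 for x in chunk]


latents = forward_in_chunks(toy_encode, list(range(20)), num_batches=20)
```

Twenty inputs with the default `vae_batch_size=8` are processed as chunks of 8, 8, and 4, and the concatenated result has the same order as the input.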
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to 'loss'.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
scheduler (dict) –
image_encoder (dict) –
vae (dict) –
unet (dict) –
decoder_model (str) –
prior_model (str) –
loss (dict | None) –
unet_lora_config (dict | None) –
prior_loss_weight (float) –
prediction_type (str | None) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
vae_batch_size (int) –
gradient_checkpointing (bool) –
enable_xformers (bool) –
- class diffengine.models.KandinskyV22DecoderDataPreprocessor(non_blocking=False)
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
KandinskyV22DecoderDataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
- class diffengine.models.KandinskyV3(tokenizer, scheduler, text_encoder, vae, unet, model='kandinsky-community/kandinsky-3', loss=None, unet_lora_config=None, prior_loss_weight=1.0, tokenizer_max_length=128, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, gradient_checkpointing=False, enable_xformers=False)
Bases: mmengine.model.BaseModel
KandinskyV3.
Args:
- tokenizer (dict): Config of tokenizer.
- scheduler (dict): Config of scheduler.
- text_encoder (dict): Config of text encoder.
- vae (dict): Config of vae.
- unet (dict): Config of unet.
- model (str): Pretrained model name. Defaults to 'kandinsky-community/kandinsky-3'.
- loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
- unet_lora_config (dict, optional): The LoRA config dict for Unet, for example dict(type='LoRA', r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config: https://github.com/huggingface/peft. Defaults to None.
- prior_loss_weight (float): The weight of prior preservation loss. It applies when training DreamBooth with class images. Defaults to 1.0.
- tokenizer_max_length (int): The max length of tokenizer. Defaults to 128.
- prediction_type (str): The prediction type used for training. Choose between 'epsilon' and 'v_prediction', or leave None to use the default prediction type of the scheduler. Defaults to None.
- data_preprocessor (dict, optional): The pre-process config of
- noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1. Defaults to 0.0.
- vae_batch_size (int): The batch size of vae. Defaults to 8.
- gradient_checkpointing (bool): Whether or not to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
- enable_xformers (bool): Whether or not to enable memory efficient attention. Defaults to False.
- property device: torch.device
Get device information.
- Returns:
device.
- Return type:
torch.device
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]):
The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- loss(model_pred, noise, latents, timesteps, weight=None)
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
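The `prediction_type` argument selects what the model is trained to regress. The sketch below shows the two documented choices for a single timestep on plain lists. It uses the standard velocity definition, and `training_target` is a hypothetical helper, not a diffengine function.

```python
import math


def training_target(prediction_type, latents, noise, alpha_bar_t):
    """Pick the regression target for one timestep.

    - "epsilon": the model predicts the added noise.
    - "v_prediction": the model predicts the velocity
      v = sqrt(alpha_bar_t) * noise - sqrt(1 - alpha_bar_t) * latents.
    """
    if prediction_type == "epsilon":
        return list(noise)
    if prediction_type == "v_prediction":
        a = math.sqrt(alpha_bar_t)
        b = math.sqrt(1.0 - alpha_bar_t)
        return [a * n - b * x for x, n in zip(latents, noise)]
    msg = f"Unknown prediction type {prediction_type}"
    raise ValueError(msg)


latents = [1.0, 0.0]
noise = [0.0, 1.0]
eps_target = training_target("epsilon", latents, noise, alpha_bar_t=0.64)
v_target = training_target("v_prediction", latents, noise, alpha_bar_t=0.64)
```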
- _preprocess_model_input(latents, noise, timesteps)
Preprocess model input.
- Parameters:
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
- Return type:
torch.Tensor
- _forward_vae(img, num_batches)
Forward vae.
- Parameters:
img (torch.Tensor) –
num_batches (int) –
- Return type:
torch.Tensor
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to 'loss'.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
tokenizer (dict) –
scheduler (dict) –
text_encoder (dict) –
vae (dict) –
unet (dict) –
model (str) –
loss (dict | None) –
unet_lora_config (dict | None) –
prior_loss_weight (float) –
tokenizer_max_length (int) –
prediction_type (str | None) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
vae_batch_size (int) –
gradient_checkpointing (bool) –
enable_xformers (bool) –
- class diffengine.models.LatentConsistencyModelsXL(*args, timesteps_generator=None, num_ddim_timesteps=50, w_min=3.0, w_max=15.0, ema_type='ExponentialMovingAverage', ema_momentum=0.05, **kwargs)
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL Latent Consistency Models.
Args:
- timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='DDIMTimeSteps').
- num_ddim_timesteps (int): Number of DDIM timesteps. Defaults to 50.
- w_min (float): Minimum guidance scale. Defaults to 3.0.
- w_max (float): Maximum guidance scale. Defaults to 15.0.
- ema_type (str): The type of EMA. Defaults to 'ExponentialMovingAverage'.
- ema_momentum (float): The EMA momentum. Defaults to 0.05.
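To make `w_min`, `w_max`, and `ema_momentum` concrete, the sketch below shows how latent consistency distillation commonly uses them: one guidance scale is drawn uniformly per sample, and an exponential moving average tracks the trained weights. Whether diffengine follows exactly these formulas is an assumption, and both helper names are hypothetical.

```python
import random


def sample_guidance_scales(batch_size, w_min=3.0, w_max=15.0, rng=random):
    """Draw one guidance scale per sample, uniformly between w_min and w_max."""
    return [(w_max - w_min) * rng.random() + w_min for _ in range(batch_size)]


def ema_update(ema_params, params, momentum=0.05):
    """Move each averaged value a fraction `momentum` toward the new value."""
    return [(1.0 - momentum) * e + momentum * p
            for e, p in zip(ema_params, params)]


scales = sample_guidance_scales(4, rng=random.Random(0))
target_weights = ema_update([1.0, 2.0], [3.0, 2.0], momentum=0.05)
```

A small momentum such as the default 0.05 means the averaged weights change slowly, which is the usual reason to keep an EMA copy as a stable target.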
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, height=None, width=None, num_inference_steps=4, guidance_scale=1.0, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 4.
- guidance_scale (float): The guidance scale. Defaults to 1.0.
- output_type (str): The output format of the generated image.
Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
guidance_scale (float) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- loss(model_pred, gt, timesteps, weight=None)
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
gt (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to 'loss'.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- _predicted_origin(model_output, timesteps, sample)
Predict the origin of the model output.
Args:
- model_output (torch.Tensor): The model output.
- timesteps (torch.Tensor): The timesteps.
- sample (torch.Tensor): The sample.
- Parameters:
model_output (torch.Tensor) –
timesteps (torch.Tensor) –
sample (torch.Tensor) –
- Return type:
torch.Tensor
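For an epsilon-predicting model, "predicting the origin" usually means inverting the forward noising step to estimate the clean sample. The sketch below shows that inversion for one timestep on plain lists. It assumes epsilon prediction and takes the cumulative alpha directly instead of looking it up from `timesteps`, so `predicted_origin` is a simplified, hypothetical counterpart of the method documented above.

```python
import math


def predicted_origin(model_output, sample, alpha_bar_t):
    """Recover x0 from a noisy sample and an epsilon prediction.

    Inverts x_t = alpha_t * x0 + sigma_t * eps with
    alpha_t = sqrt(alpha_bar_t) and sigma_t = sqrt(1 - alpha_bar_t).
    """
    alpha_t = math.sqrt(alpha_bar_t)
    sigma_t = math.sqrt(1.0 - alpha_bar_t)
    return [(x - sigma_t * e) / alpha_t for x, e in zip(sample, model_output)]


# Round trip: noise a clean value, then recover it from the exact noise.
x0, eps, alpha_bar_t = [0.5, -1.0], [0.3, 0.7], 0.81
alpha_t, sigma_t = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
x_t = [alpha_t * x + sigma_t * e for x, e in zip(x0, eps)]
recovered = predicted_origin(eps, x_t, alpha_bar_t)
```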
- Parameters:
timesteps_generator (dict | None) –
num_ddim_timesteps (int) –
w_min (float) –
w_max (float) –
ema_type (str) –
ema_momentum (float) –
- class diffengine.models.PixArtAlpha(tokenizer, scheduler, text_encoder, vae, transformer, model='PixArt-alpha/PixArt-XL-2-1024-MS', loss=None, transformer_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, tokenizer_max_length=120, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, finetune_text_encoder=False, gradient_checkpointing=False, enable_xformers=False)
Bases: mmengine.model.BaseModel
PixArt Alpha.
Args:
- tokenizer (dict): Config of tokenizer.
- scheduler (dict): Config of scheduler.
- text_encoder (dict): Config of text encoder.
- vae (dict): Config of vae.
- transformer (dict): Config of transformer.
- model (str): Pretrained model name. Defaults to 'PixArt-alpha/PixArt-XL-2-1024-MS'.
- loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
- transformer_lora_config (dict, optional): The LoRA config dict for Transformer, for example dict(type='LoRA', r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config: https://github.com/huggingface/peft. Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for Text Encoder, for example dict(type='LoRA', r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config: https://github.com/huggingface/peft. Defaults to None.
- prior_loss_weight (float): The weight of prior preservation loss. It applies when training DreamBooth with class images. Defaults to 1.0.
- tokenizer_max_length (int): The max length of tokenizer. Defaults to 120.
- prediction_type (str): The prediction type used for training. Choose between 'epsilon' and 'v_prediction', or leave None to use the default prediction type of the scheduler. Defaults to None.
- data_preprocessor (dict, optional): The pre-process config of
- noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1. Defaults to 0.0.
- vae_batch_size (int): The batch size of vae. Defaults to 8.
- finetune_text_encoder (bool, optional): Whether to fine-tune text encoder. Defaults to False.
- gradient_checkpointing (bool): Whether or not to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
- enable_xformers (bool): Whether or not to enable memory efficient attention. Defaults to False.
- property device: torch.device
Get device information.
- Returns:
device.
- Return type:
torch.device
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]):
The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- loss(model_pred, noise, latents, timesteps, weight=None)
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
- _preprocess_model_input(latents, noise, timesteps)
Preprocess model input.
- Parameters:
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
- Return type:
torch.Tensor
- _forward_vae(img, num_batches)
Forward vae.
- Parameters:
img (torch.Tensor) –
num_batches (int) –
- Return type:
torch.Tensor
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to 'loss'.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
tokenizer (dict) –
scheduler (dict) –
text_encoder (dict) –
vae (dict) –
transformer (dict) –
model (str) –
loss (dict | None) –
transformer_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
prior_loss_weight (float) –
tokenizer_max_length (int) –
prediction_type (str | None) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
vae_batch_size (int) –
finetune_text_encoder (bool) –
gradient_checkpointing (bool) –
enable_xformers (bool) –
- class diffengine.models.PixArtAlphaDataPreprocessor(non_blocking=False)
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
PixArtAlphaDataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
- class diffengine.models.SSD1B(tokenizer_one, tokenizer_two, scheduler, text_encoder_one, text_encoder_two, vae, teacher_unet, student_unet, model='stabilityai/stable-diffusion-xl-base-1.0', loss=None, unet_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, finetune_text_encoder=False, gradient_checkpointing=False, pre_compute_text_embeddings=False, enable_xformers=False, student_weight_from_teacher=False)
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
SSD1B.
Refer to official implementation: https://github.com/segmind/SSD-1B/blob/main/distill_sdxl.py
Args:
- tokenizer_one (dict): Config of tokenizer one.
- tokenizer_two (dict): Config of tokenizer two.
- scheduler (dict): Config of scheduler.
- text_encoder_one (dict): Config of text encoder one.
- text_encoder_two (dict): Config of text encoder two.
- vae (dict): Config of vae.
- teacher_unet (dict): Config of teacher unet.
- student_unet (dict): Config of student unet.
- model (str): Pretrained model name of Stable Diffusion XL. Defaults to 'stabilityai/stable-diffusion-xl-base-1.0'.
- vae_model (str, optional): Path to pretrained VAE model with better numerical stability. More details: https://github.com/huggingface/diffusers/pull/4038. Defaults to None.
- loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
- unet_lora_config (dict, optional): The LoRA config dict for Unet, for example dict(type='LoRA', r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config: https://github.com/huggingface/peft. Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for Text Encoder, for example dict(type='LoRA', r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config: https://github.com/huggingface/peft. Defaults to None.
- prior_loss_weight (float): The weight of prior preservation loss. It applies when training DreamBooth with class images. Defaults to 1.0.
- prediction_type (str): The prediction type used for training. Choose between 'epsilon' and 'v_prediction', or leave None to use the default prediction type of the scheduler, noise_scheduler.config.prediction_type. Defaults to None.
- data_preprocessor (dict, optional): The pre-process config of
- noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1. Defaults to 0.0.
- vae_batch_size (int): The batch size of vae. Defaults to 8.
- finetune_text_encoder (bool, optional): Whether to fine-tune text encoder. Defaults to False.
- gradient_checkpointing (bool): Whether or not to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
- pre_compute_text_embeddings (bool): Whether or not to pre-compute text embeddings to save memory. Defaults to False.
- enable_xformers (bool): Whether or not to enable memory efficient attention. Defaults to False.
- student_weight_from_teacher (bool): Whether or not to initialize the student model with the teacher model. Defaults to False.
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- _forward_vae(img, num_batches)
Forward vae.
- Parameters:
img (torch.Tensor) –
num_batches (int) –
- Return type:
torch.Tensor
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to 'loss'.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
tokenizer_one (dict) –
tokenizer_two (dict) –
scheduler (dict) –
text_encoder_one (dict) –
text_encoder_two (dict) –
vae (dict) –
teacher_unet (dict) –
student_unet (dict) –
model (str) –
loss (dict | None) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
prior_loss_weight (float) –
prediction_type (str | None) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
vae_batch_size (int) –
finetune_text_encoder (bool) –
gradient_checkpointing (bool) –
pre_compute_text_embeddings (bool) –
enable_xformers (bool) –
student_weight_from_teacher (bool) –
- class diffengine.models.StableDiffusion(tokenizer, scheduler, text_encoder, vae, unet, model='runwayml/stable-diffusion-v1-5', loss=None, unet_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, finetune_text_encoder=False, gradient_checkpointing=False, enable_xformers=False)
Bases: mmengine.model.BaseModel
Stable Diffusion.
Args:
- tokenizer (dict): Config of tokenizer.
- scheduler (dict): Config of scheduler.
- text_encoder (dict): Config of text encoder.
- vae (dict): Config of vae.
- unet (dict): Config of unet.
- model (str): Pretrained model name of Stable Diffusion. Defaults to 'runwayml/stable-diffusion-v1-5'.
- loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
- unet_lora_config (dict, optional): The LoRA config dict for Unet, for example dict(type='LoRA', r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config: https://github.com/huggingface/peft. Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for Text Encoder, for example dict(type='LoRA', r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config: https://github.com/huggingface/peft. Defaults to None.
- prior_loss_weight (float): The weight of prior preservation loss. It applies when training DreamBooth with class images. Defaults to 1.0.
- prediction_type (str): The prediction type used for training. Choose between 'epsilon' and 'v_prediction', or leave None to use the default prediction type of the scheduler. Defaults to None.
- data_preprocessor (dict, optional): The pre-process config of
- noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1. Defaults to 0.0.
- vae_batch_size (int): The batch size of vae. Defaults to 8.
- finetune_text_encoder (bool, optional): Whether to fine-tune text encoder. Defaults to False.
- gradient_checkpointing (bool): Whether or not to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
- enable_xformers (bool): Whether or not to enable memory efficient attention. Defaults to False.
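As an illustration of how the optional arguments above fit together, the fragment below collects a few of them in a plain dict, using only values that appear in this reference. It is a sketch, not a complete config: the required component configs (tokenizer, scheduler, text_encoder, vae, unet) are deliberately left out because their contents are not documented here, and the variable name `model_kwargs` is hypothetical.

```python
# Keyword arguments for StableDiffusion, written as a plain dict.
# The component configs (tokenizer, scheduler, text_encoder, vae, unet)
# are required by the constructor but deliberately omitted here.
model_kwargs = dict(
    model="runwayml/stable-diffusion-v1-5",
    loss=dict(type="L2Loss", loss_weight=1.0),
    # Train LoRA weights on the Unet with rank 4.
    unet_lora_config=dict(type="LoRA", r=4),
    noise_generator=dict(type="WhiteNoise"),
    timesteps_generator=dict(type="TimeSteps"),
    # The documented recommended value for input perturbation.
    input_perturbation_gamma=0.1,
    # Trade a slower backward pass for lower memory use.
    gradient_checkpointing=True,
)
```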
- property device: torch.device
Get device information.
- Returns:
device.
- Return type:
torch.device
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]):
The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
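A call to `infer` might look like the sketch below. To keep it runnable without diffengine or model weights, it uses a hypothetical `StubModel` that only mimics the documented signature. The `generate` wrapper and the 512x512 size are likewise illustrative choices, not values taken from the library.

```python
def generate(model, prompts, num_inference_steps=50):
    """Call `infer` with the documented keyword arguments."""
    return model.infer(
        prompts,
        negative_prompt=None,
        height=512,
        width=512,
        num_inference_steps=num_inference_steps,
        output_type="pil",
    )


class StubModel:
    """Hypothetical stand-in so the sketch runs without diffengine."""

    def infer(self, prompt, negative_prompt=None, height=None, width=None,
              num_inference_steps=50, output_type="pil", **kwargs):
        # A real model would return one image array per prompt.
        return [(p, height, width, num_inference_steps) for p in prompt]


images = generate(StubModel(), ["a photo of a cat", "a watercolor fox"])
```

Since `prompt` is a list, the result holds one entry per prompt, in the same order.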
- loss(model_pred, noise, latents, timesteps, weight=None)
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
- _preprocess_model_input(latents, noise, timesteps)
Preprocess model input.
- Parameters:
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
- Return type:
torch.Tensor
- _forward_vae(img, num_batches)
Forward vae.
- Parameters:
img (torch.Tensor) –
num_batches (int) –
- Return type:
torch.Tensor
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to 'loss'.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
tokenizer (dict) –
scheduler (dict) –
text_encoder (dict) –
vae (dict) –
unet (dict) –
model (str) –
loss (dict | None) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
prior_loss_weight (float) –
prediction_type (str | None) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
vae_batch_size (int) –
finetune_text_encoder (bool) –
gradient_checkpointing (bool) –
enable_xformers (bool) –
- class diffengine.models.SDDataPreprocessor(non_blocking=False)
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDDataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
- class diffengine.models.StableDiffusionControlNet(*args, controlnet_model=None, transformer_layers_per_block=None, unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, data_preprocessor=None, **kwargs)
Bases: diffengine.models.editors.stable_diffusion.StableDiffusion
Stable Diffusion ControlNet.
Args:
- controlnet_model (str, optional): Path to pretrained ControlNet model. If None, use the default ControlNet model from Unet. Defaults to None.
- transformer_layers_per_block (List[int], optional): The number of layers per block in the transformer. More details: https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0-small. Defaults to None.
- unet_lora_config (dict, optional): The LoRA config dict for Unet, for example dict(type='LoRA', r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config: https://github.com/huggingface/peft. Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for Text Encoder, for example dict(type='LoRA', r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config: https://github.com/huggingface/peft. Defaults to None.
- finetune_text_encoder (bool, optional): Whether to fine-tune text encoder. This should be False when training ControlNet. Defaults to False.
- data_preprocessor (dict, optional): The pre-process config of
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, condition_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- condition_image (List[Union[str, Image.Image]]):
The condition image for ControlNet.
- negative_prompt (Optional[str]):
The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
condition_image (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- _forward_compile(noisy_latents, timesteps, encoder_hidden_states, inputs)[source]¶
Forward function for torch.compile.
- Parameters:
noisy_latents (torch.Tensor) –
timesteps (torch.Tensor) –
encoder_hidden_states (torch.Tensor) –
inputs (dict) –
- Return type:
torch.Tensor
- forward(inputs, data_samples=None, mode='loss')[source]¶
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to “loss”.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
controlnet_model (str | None) –
transformer_layers_per_block (list[int] | None) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
data_preprocessor (dict | torch.nn.Module | None) –
- class diffengine.models.SDControlNetDataPreprocessor(non_blocking=False)[source]¶
Bases:
mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDControlNetDataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)[source]¶
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
- class diffengine.models.SDInpaintDataPreprocessor(non_blocking=False)[source]¶
Bases:
mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDInpaintDataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)[source]¶
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
- class diffengine.models.StableDiffusionInpaint(*args, model='runwayml/stable-diffusion-inpainting', data_preprocessor=None, **kwargs)[source]¶
Bases:
diffengine.models.editors.stable_diffusion.StableDiffusion
Stable Diffusion Inpaint.
Args:
- model (str): Pretrained model name of Stable Diffusion.
Defaults to ‘runwayml/stable-diffusion-inpainting’.
- data_preprocessor (dict, optional): The pre-process config of
- prepare_model()[source]¶
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, image, mask, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]¶
Inference function.
Args:¶
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- image (List[Union[str, Image.Image]]):
The image for inpainting.
- mask (List[Union[str, Image.Image]]):
The mask for inpainting.
- negative_prompt (Optional[str]):
The prompt or prompts describing what to avoid in the generated image. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.
**kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
image (list[str | PIL.Image.Image]) –
mask (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- forward(inputs, data_samples=None, mode='loss')[source]¶
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to “loss”.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
model (str) –
data_preprocessor (dict | torch.nn.Module | None) –
- class diffengine.models.StableDiffusionXL(tokenizer_one, tokenizer_two, scheduler, text_encoder_one, text_encoder_two, vae, unet, model='stabilityai/stable-diffusion-xl-base-1.0', loss=None, unet_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, finetune_text_encoder=False, gradient_checkpointing=False, pre_compute_text_embeddings=False, enable_xformers=False)[source]¶
Bases:
mmengine.model.BaseModel
Stable Diffusion XL (https://huggingface.co/papers/2307.01952).
Args:
- tokenizer_one (dict): Config of tokenizer one.
- tokenizer_two (dict): Config of tokenizer two.
- scheduler (dict): Config of scheduler.
- text_encoder_one (dict): Config of text encoder one.
- text_encoder_two (dict): Config of text encoder two.
- vae (dict): Config of vae.
- unet (dict): Config of unet.
- model (str): Pretrained model name of Stable Diffusion XL.
Defaults to ‘stabilityai/stable-diffusion-xl-base-1.0’.
- loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
- unet_lora_config (dict, optional): The LoRA config dict for Unet.
Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for
Text Encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- prior_loss_weight (float): The weight of prior preservation loss.
It works when training dreambooth with class images.
- prediction_type (str): The prediction_type that shall be used for
training. Choose between ‘epsilon’ and ‘v_prediction’, or leave as None. If None, the scheduler's default prediction type, noise_scheduler.config.prediction_type, is used. Defaults to None.
- data_preprocessor (dict, optional): The pre-process config of
- noise_generator (dict, optional): The noise generator config.
Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config.
Defaults to dict(type='TimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation.
The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
- vae_batch_size (int): The batch size of vae. Defaults to 8.
- finetune_text_encoder (bool, optional): Whether to fine-tune text
encoder. Defaults to False.
- gradient_checkpointing (bool): Whether or not to use gradient
checkpointing to save memory at the expense of slower backward pass. Defaults to False.
- pre_compute_text_embeddings (bool): Whether or not to pre-compute text
embeddings to save memory. Defaults to False.
- enable_xformers (bool): Whether or not to enable memory efficient
attention. Defaults to False.
- property device: torch.device¶
Get device information.
- Returns:
device.
- Return type:
torch.device
- prepare_model()[source]¶
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]¶
Inference function.
Args:¶
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]):
The prompt or prompts describing what to avoid in the generated image. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.
**kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- encode_prompt(text_one, text_two)[source]¶
Encode prompt.
Args:
- text_one (torch.Tensor): Token ids from tokenizer one.
- text_two (torch.Tensor): Token ids from tokenizer two.
Returns:
tuple[torch.Tensor, torch.Tensor]: Prompt embeddings.
- Parameters:
text_one (torch.Tensor) –
text_two (torch.Tensor) –
- Return type:
tuple[torch.Tensor, torch.Tensor]
- loss(model_pred, noise, latents, timesteps, weight=None)[source]¶
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
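The role of prior_loss_weight can be illustrated with a minimal sketch, assuming the usual DreamBooth formulation in which the class-image (prior) term is scaled and added to the instance term. The helper name and its scalar inputs are hypothetical and are not taken from this method, which may combine the terms differently internally.

```python
def combine_dreambooth_losses(instance_loss, prior_loss, prior_loss_weight=1.0):
    """Add a weighted prior-preservation term to the instance loss.

    Hypothetical helper: both inputs are already-reduced scalar losses.
    """
    return instance_loss + prior_loss_weight * prior_loss


# A larger prior_loss_weight makes the class-image term count for more.
total = combine_dreambooth_losses(0.5, 0.25, prior_loss_weight=2.0)
```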
- _preprocess_model_input(latents, noise, timesteps)[source]¶
Preprocess model input.
- Parameters:
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
- Return type:
torch.Tensor
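As a rough illustration of how input_perturbation_gamma could enter this step, the sketch below assumes that extra Gaussian noise scaled by gamma is added to the sampled noise before the standard forward-diffusion mix. It uses plain Python lists instead of tensors, and the helper names perturb_noise and add_noise are hypothetical.

```python
import math
import random


def perturb_noise(noise, gamma, rng):
    """Return noise plus gamma-scaled fresh Gaussian noise (hypothetical helper)."""
    if gamma <= 0:
        # No perturbation requested: use the sampled noise unchanged.
        return list(noise)
    return [n + gamma * rng.gauss(0.0, 1.0) for n in noise]


def add_noise(latents, noise, alpha_cumprod):
    """Forward-diffusion mix for a single timestep's cumulative alpha."""
    signal_scale = math.sqrt(alpha_cumprod)
    noise_scale = math.sqrt(1.0 - alpha_cumprod)
    return [signal_scale * x + noise_scale * n for x, n in zip(latents, noise)]


rng = random.Random(0)
latents = [0.5, -1.0, 2.0]
noise = [rng.gauss(0.0, 1.0) for _ in latents]
# The model input uses the perturbed noise; the training target would
# typically remain the unperturbed noise.
noisy_latents = add_noise(latents, perturb_noise(noise, 0.1, rng), alpha_cumprod=0.9)
```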
- _forward_vae(img, num_batches)[source]¶
Forward vae.
- Parameters:
img (torch.Tensor) –
num_batches (int) –
- Return type:
torch.Tensor
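The vae_batch_size argument suggests that images are encoded in chunks rather than all at once. A minimal sketch of that pattern follows, assuming the encoder is any callable that maps a chunk of images to a list of latents; encode_in_chunks and fake_encode are hypothetical stand-ins, not part of the library.

```python
def encode_in_chunks(images, encode_fn, vae_batch_size=8):
    """Encode images in slices of at most vae_batch_size items."""
    latents = []
    for start in range(0, len(images), vae_batch_size):
        chunk = images[start:start + vae_batch_size]
        # Only one chunk is held by the encoder at a time, which bounds peak memory.
        latents.extend(encode_fn(chunk))
    return latents


seen_sizes = []


def fake_encode(chunk):
    """Stand-in encoder that records chunk sizes and doubles each value."""
    seen_sizes.append(len(chunk))
    return [x * 2 for x in chunk]


latents = encode_in_chunks(list(range(20)), fake_encode, vae_batch_size=8)
```

With 20 inputs and a chunk size of 8, the encoder is called three times, and the concatenated result keeps the input order.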
- forward(inputs, data_samples=None, mode='loss')[source]¶
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to “loss”.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
tokenizer_one (dict) –
tokenizer_two (dict) –
scheduler (dict) –
text_encoder_one (dict) –
text_encoder_two (dict) –
vae (dict) –
unet (dict) –
model (str) –
loss (dict | None) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
prior_loss_weight (float) –
prediction_type (str | None) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
vae_batch_size (int) –
finetune_text_encoder (bool) –
gradient_checkpointing (bool) –
pre_compute_text_embeddings (bool) –
enable_xformers (bool) –
- class diffengine.models.SDXLDataPreprocessor(non_blocking=False)[source]¶
Bases:
mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDXLDataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)[source]¶
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
- class diffengine.models.SDXLControlNetDataPreprocessor(non_blocking=False)[source]¶
Bases:
mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDXLControlNetDataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)[source]¶
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
- class diffengine.models.StableDiffusionXLControlNet(*args, controlnet_model=None, transformer_layers_per_block=None, unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, data_preprocessor=None, **kwargs)[source]¶
Bases:
diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL ControlNet.
Args:¶
- controlnet_model (str, optional): Path to pretrained ControlNet model.
If None, use the default ControlNet model from Unet. Defaults to None.
- transformer_layers_per_block (List[int], optional):
The number of layers per block in the transformer. More details: https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0-small. Defaults to None.
- unet_lora_config (dict, optional): The LoRA config dict for Unet.
Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for
Text Encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- finetune_text_encoder (bool, optional): Whether to fine-tune text
encoder. This should be False when training ControlNet. Defaults to False.
- data_preprocessor (dict, optional): The pre-process config of
- prepare_model()[source]¶
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, condition_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]¶
Inference function.
Args:¶
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- condition_image (List[Union[str, Image.Image]]):
The condition image for ControlNet.
- negative_prompt (Optional[str]):
The prompt or prompts describing what to avoid in the generated image. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.
**kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
condition_image (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- _forward_compile(noisy_latents, timesteps, prompt_embeds, unet_added_conditions, inputs)[source]¶
Forward function for torch.compile.
- Parameters:
noisy_latents (torch.Tensor) –
timesteps (torch.Tensor) –
prompt_embeds (torch.Tensor) –
unet_added_conditions (dict) –
inputs (dict) –
- Return type:
torch.Tensor
- forward(inputs, data_samples=None, mode='loss')[source]¶
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to “loss”.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
controlnet_model (str | None) –
transformer_layers_per_block (list[int] | None) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
data_preprocessor (dict | torch.nn.Module | None) –
- class diffengine.models.StableDiffusionXLDPO(*args, beta_dpo=5000, loss=None, data_preprocessor=None, **kwargs)[source]¶
Bases:
diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL DPO.
Args:
- beta_dpo (int): DPO KL Divergence penalty. Defaults to 5000.
- loss (dict, optional): The loss config. Defaults to None.
- data_preprocessor (dict, optional): The pre-process config of
- prepare_model()[source]¶
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- loss(model_pred, ref_pred, noise, latents, timesteps, weight=None)[source]¶
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
ref_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
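To show where beta_dpo enters, here is a minimal scalar sketch of the Diffusion-DPO objective as it is commonly implemented; it is an assumption that this method follows the same form. The inputs are per-sample denoising errors for the preferred ("w") and rejected ("l") images under the trained model and a frozen reference model, and the function names are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(model_err_w, model_err_l, ref_err_w, ref_err_l, beta_dpo=5000):
    """Diffusion-DPO loss for one preference pair (illustrative only)."""
    model_diff = model_err_w - model_err_l
    ref_diff = ref_err_w - ref_err_l
    # Negative when the model separates the pair less well than the reference.
    inside_term = -0.5 * beta_dpo * (model_diff - ref_diff)
    return -log_sigmoid(inside_term)


neutral = dpo_loss(0.2, 0.2, 0.2, 0.2)
improved = dpo_loss(0.1, 0.2, 0.15, 0.15)
```

When the model matches the reference, the loss equals log 2; it falls below that once the model's error on the preferred image drops relative to the rejected one.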
- forward(inputs, data_samples=None, mode='loss')[source]¶
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to “loss”.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
beta_dpo (int) –
loss (dict | None) –
data_preprocessor (dict | torch.nn.Module | None) –
- class diffengine.models.SDXLDPODataPreprocessor(non_blocking=False)[source]¶
Bases:
mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDXLDPODataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)[source]¶
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
- class diffengine.models.StableDiffusionXLInpaint(*args, model='diffusers/stable-diffusion-xl-1.0-inpainting-0.1', data_preprocessor=None, **kwargs)[source]¶
Bases:
diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL Inpaint.
Args:¶
- model (str): pretrained model name of stable diffusion.
Defaults to ‘diffusers/stable-diffusion-xl-1.0-inpainting-0.1’.
- data_preprocessor (dict, optional): The pre-process config of
- prepare_model()[source]¶
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, image, mask, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]¶
Inference function.
Args:¶
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- image (List[Union[str, Image.Image]]):
The image for inpainting.
- mask (List[Union[str, Image.Image]]):
The mask for inpainting.
- negative_prompt (Optional[str]):
The prompt or prompts describing what to avoid in the generated image. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.
**kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
image (list[str | PIL.Image.Image]) –
mask (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- forward(inputs, data_samples=None, mode='loss')[source]¶
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to “loss”.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
model (str) –
data_preprocessor (dict | torch.nn.Module | None) –
- class diffengine.models.SDXLInpaintDataPreprocessor(non_blocking=False)[source]¶
Bases:
mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDXLInpaintDataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)[source]¶
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
- class diffengine.models.StableDiffusionXLT2IAdapter(*args, adapter, unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, timesteps_generator=None, data_preprocessor=None, **kwargs)[source]¶
Bases:
diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL T2I Adapter.
Args:
- adapter (dict): The adapter config.
- unet_lora_config (dict, optional): The LoRA config dict for Unet.
Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for
Text Encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- finetune_text_encoder (bool, optional): Whether to fine-tune text
encoder. This should be False when training the T2I Adapter. Defaults to False.
- timesteps_generator (dict, optional): The timesteps generator config.
Defaults to dict(type='CubicSamplingTimeSteps').
- data_preprocessor (dict, optional): The pre-process config of
- prepare_model()[source]¶
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, condition_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]¶
Inference function.
Args:¶
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- condition_image (List[Union[str, Image.Image]]):
The condition image for the T2I Adapter.
- negative_prompt (Optional[str]):
The prompt or prompts describing what to avoid in the generated image. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.
**kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
condition_image (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- _forward_compile(noisy_latents, timesteps, prompt_embeds, unet_added_conditions, inputs)[source]¶
Forward function for torch.compile.
- Parameters:
noisy_latents (torch.Tensor) –
timesteps (torch.Tensor) –
prompt_embeds (torch.Tensor) –
unet_added_conditions (dict) –
inputs (dict) –
- Return type:
torch.Tensor
- forward(inputs, data_samples=None, mode='loss')[source]¶
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to “loss”.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
adapter (dict) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
timesteps_generator (dict | None) –
data_preprocessor (dict | torch.nn.Module | None) –
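The default timesteps generator for this class is named CubicSamplingTimeSteps but is not described here. The sketch below assumes it follows the cubic scheme often used for T2I-Adapter training, where a uniform sample u is mapped to (1 - u^3) * T so that later, noisier timesteps are drawn more often. The function name and the value of 1000 training timesteps are illustrative assumptions.

```python
import random


def cubic_timesteps(batch_size, num_train_timesteps=1000, rng=None):
    """Sample integer timesteps biased toward the high-noise end."""
    rng = rng or random.Random()
    steps = []
    for _ in range(batch_size):
        u = rng.random()  # uniform in [0, 1)
        t = int((1.0 - u ** 3) * num_train_timesteps)
        # Clamp into the valid index range [0, num_train_timesteps - 1].
        steps.append(min(max(t, 0), num_train_timesteps - 1))
    return steps


samples = cubic_timesteps(1000, rng=random.Random(0))
```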
- class diffengine.models.WuerstchenPriorModel(tokenizer, scheduler, text_encoder, image_encoder, prior, decoder_model='warp-ai/wuerstchen', prior_model='warp-ai/wuerstchen-prior', loss=None, prior_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, *, finetune_text_encoder=False, gradient_checkpointing=False)[source]¶
Bases:
mmengine.model.BaseModel
Wuerstchen Prior (https://arxiv.org/abs/2306.00637).
Args:
- tokenizer (dict): Config of tokenizer.
- scheduler (dict): Config of scheduler.
- text_encoder (dict): Config of text encoder.
- image_encoder (dict): Config of image encoder.
- prior (dict): Config of prior.
- decoder_model (str): Pretrained decoder model name of Wuerstchen.
Defaults to ‘warp-ai/wuerstchen’.
- prior_model (str): Pretrained prior model name of Wuerstchen.
Defaults to ‘warp-ai/wuerstchen-prior’.
- loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
- prior_lora_config (dict, optional): The LoRA config dict for Prior.
Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for
Text Encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- prior_loss_weight (float): The weight of prior preservation loss.
It works when training dreambooth with class images.
- data_preprocessor (dict, optional): The pre-process config of
- noise_generator (dict, optional): The noise generator config.
Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config.
Defaults to dict(type='WuerstchenRandomTimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation.
The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
- finetune_text_encoder (bool, optional): Whether to fine-tune text
encoder. Defaults to False.
- gradient_checkpointing (bool): Whether or not to use gradient
checkpointing to save memory at the expense of slower backward pass. Defaults to False.
- property device: torch.device¶
Get device information.
- Returns:
device.
- Return type:
torch.device
- prepare_model()[source]¶
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- train(*, mode=True)[source]¶
Convert the model into training mode.
- Parameters:
mode (bool) –
- Return type:
None
- infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]¶
Inference function.
Args:¶
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]):
The prompt or prompts describing what to avoid in the generated image. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.
**kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- loss(model_pred, noise, timesteps, weight=None)[source]¶
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
- _preprocess_model_input(latents, noise, timesteps)[source]¶
Preprocess model input.
- Parameters:
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
- Return type:
torch.Tensor
- forward(inputs, data_samples=None, mode='loss')[source]¶
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to “loss”.
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
tokenizer (dict) –
scheduler (dict) –
text_encoder (dict) –
image_encoder (dict) –
prior (dict) –
decoder_model (str) –
prior_model (str) –
loss (dict | None) –
prior_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
prior_loss_weight (float) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
finetune_text_encoder (bool) –
gradient_checkpointing (bool) –
- class diffengine.models.L2Loss(loss_weight=1.0, reduction='mean', loss_name='l2')[source]¶
Bases:
diffengine.models.losses.base.BaseLoss
L2 loss.
Args:
- loss_weight (float, optional): Weight of this loss item.
Defaults to 1.0.
- reduction (str): The reduction method for the loss.
Defaults to ‘mean’.
- loss_name (str, optional): Name of the loss item. If you want this loss
item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘l2’.
- forward(pred, gt, weight=None)[source]¶
Forward function.
Args:
- pred (torch.Tensor): The predicted tensor.
- gt (torch.Tensor): The ground truth tensor.
- weight (torch.Tensor | None, optional): The loss weight. Defaults to None.
Returns:
torch.Tensor: loss
- Parameters:
pred (torch.Tensor) –
gt (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
torch.Tensor
- Parameters:
loss_weight (float) –
reduction (str) –
loss_name (str) –
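The interplay of weight, reduction and loss_weight can be sketched in plain Python as follows. This is an illustrative stand-in over flat lists of floats, not the library's tensor implementation; the function name l2_loss and the order in which the factors are applied are assumptions.

```python
def l2_loss(pred, gt, weight=None, loss_weight=1.0, reduction="mean"):
    """Squared-error loss over flat sequences of floats (illustrative only)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Element-wise squared error.
    errors = [(p - g) ** 2 for p, g in zip(pred, gt)]
    # Optional per-element weighting, mirroring the weight argument.
    if weight is not None:
        errors = [e * w for e, w in zip(errors, weight)]
    if reduction == "mean":
        value = sum(errors) / len(errors)
    elif reduction == "sum":
        value = sum(errors)
    else:
        raise ValueError(f"unsupported reduction: {reduction!r}")
    # The scalar loss_weight scales the reduced loss.
    return loss_weight * value


plain = l2_loss([1.0, 2.0], [0.0, 0.0])
masked = l2_loss([1.0, 2.0], [0.0, 0.0], weight=[1.0, 0.0])
```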
- class diffengine.models.SNRL2Loss(loss_weight=1.0, snr_gamma=5.0, reduction='mean', loss_name='snrl2')[source]¶
Bases:
diffengine.models.losses.base.BaseLoss
SNR weighting gamma L2 loss.
https://arxiv.org/abs/2303.09556
Args:
- loss_weight (float): Weight of this loss item.
Defaults to 1.0.
- snr_gamma (float): SNR weighting gamma to be used if rebalancing the
loss. More details: https://arxiv.org/abs/2303.09556. Defaults to 5.0.
- reduction (str): The reduction method for the loss.
Defaults to ‘mean’.
- loss_name (str, optional): Name of the loss item. If you want this loss
item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘snrl2’.
- property use_snr: bool¶
Whether or not this loss uses SNR.
- Return type:
bool
- forward(pred, gt, timesteps, alphas_cumprod, prediction_type, weight=None)[source]¶
Forward function.
Args:
- pred (torch.Tensor): The predicted tensor.
- gt (torch.Tensor): The ground truth tensor.
- timesteps (torch.Tensor): The timestep tensor.
- alphas_cumprod (torch.Tensor): The alphas_cumprod from the scheduler.
- prediction_type (str): The prediction type from scheduler.
- weight (torch.Tensor | None, optional): The loss weight. Defaults to None.
Returns:
torch.Tensor: loss
- Parameters:
pred (torch.Tensor) –
gt (torch.Tensor) –
timesteps (torch.Tensor) –
alphas_cumprod (torch.Tensor) –
prediction_type (str) –
weight (torch.Tensor | None) –
- Return type:
torch.Tensor
- Parameters:
loss_weight (float) –
snr_gamma (float) –
reduction (str) –
loss_name (str) –
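For orientation, the Min-SNR weighting from the linked paper can be sketched for a single timestep as below. The sketch assumes SNR = alphas_cumprod / (1 - alphas_cumprod) and the usual per-prediction-type normalization; the helper names are hypothetical and the library may organize the computation differently.

```python
def snr(alpha_cumprod):
    """Signal-to-noise ratio for one timestep's cumulative alpha."""
    return alpha_cumprod / (1.0 - alpha_cumprod)


def min_snr_weight(alpha_cumprod, snr_gamma=5.0, prediction_type="epsilon"):
    """Per-timestep loss weight under Min-SNR-gamma weighting."""
    value = snr(alpha_cumprod)
    clipped = min(value, snr_gamma)
    if prediction_type == "epsilon":
        return clipped / value
    if prediction_type == "v_prediction":
        # The velocity objective divides by SNR + 1 instead of SNR.
        return clipped / (value + 1.0)
    raise ValueError(f"unsupported prediction_type: {prediction_type!r}")


low_noise_weight = min_snr_weight(0.9)   # SNR is about 9, clipped to 5
high_noise_weight = min_snr_weight(0.5)  # SNR is 1, below gamma
```

Timesteps whose SNR exceeds snr_gamma are down-weighted, while noisier timesteps keep a weight of 1 under epsilon prediction.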
- class diffengine.models.DeBiasEstimationLoss(loss_weight=1.0, reduction='mean', loss_name='debias_estimation')[source]¶
Bases:
diffengine.models.losses.base.BaseLoss
DeBias Estimation loss.
https://arxiv.org/abs/2310.08442
Args:
- loss_weight (float): Weight of this loss item.
Defaults to 1.0.
- reduction (str): The reduction method for the loss.
Defaults to ‘mean’.
- loss_name (str, optional): Name of the loss item. If you want this loss
item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘debias_estimation’.
- property use_snr: bool¶
Whether or not this loss uses SNR.
- Return type:
bool
- forward(pred, gt, timesteps, alphas_cumprod, prediction_type, weight=None)[source]¶
Forward function.
Args:
- pred (torch.Tensor): The predicted tensor.
- gt (torch.Tensor): The ground truth tensor.
- timesteps (torch.Tensor): The timestep tensor.
- alphas_cumprod (torch.Tensor): The alphas_cumprod from the scheduler.
- prediction_type (str): The prediction type from scheduler.
- weight (torch.Tensor | None, optional): The loss weight. Defaults to None.
Returns:
torch.Tensor: loss
- Parameters:
pred (torch.Tensor) –
gt (torch.Tensor) –
timesteps (torch.Tensor) –
alphas_cumprod (torch.Tensor) –
prediction_type (str) –
weight (torch.Tensor | None) –
- Return type:
torch.Tensor
- Parameters:
loss_weight (float) –
reduction (str) –
loss_name (str) –
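The weighting behind a debias-style loss can be sketched without PyTorch. The block below assumes the commonly cited form of the referenced paper, where each sample's squared error is scaled by 1 / sqrt(SNR) with SNR = alphas_cumprod / (1 - alphas_cumprod) for a noise-prediction model. The function names are hypothetical and this is not the library's implementation; it ignores prediction_type and the optional weight argument.

```python
import math


def snr_from_alpha_bar(alpha_bar: float) -> float:
    """Signal-to-noise ratio of one timestep: alpha_bar / (1 - alpha_bar)."""
    return alpha_bar / (1.0 - alpha_bar)


def debias_weight(alpha_bar: float) -> float:
    """Assumed per-timestep weight 1 / sqrt(SNR) (noise-prediction case)."""
    return 1.0 / math.sqrt(snr_from_alpha_bar(alpha_bar))


def debias_mse(pred, gt, alpha_bars):
    """Mean over samples of the weighted per-sample mean squared error.

    pred, gt: one flat list of floats per sample.
    alpha_bars: alphas_cumprod values already gathered at each sample's timestep.
    """
    per_sample = []
    for p, g, a in zip(pred, gt, alpha_bars):
        mse = sum((pi - gi) ** 2 for pi, gi in zip(p, g)) / len(p)
        per_sample.append(debias_weight(a) * mse)
    return sum(per_sample) / len(per_sample)
```

Under this assumption, noisy timesteps (small alphas_cumprod, low SNR) receive a weight above 1 and nearly clean timesteps a weight below 1.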
- class diffengine.models.HuberLoss(delta=1.0, loss_weight=1.0, reduction='mean', loss_name='l2')
Bases: diffengine.models.losses.base.BaseLoss
Huber loss.
Args:
- delta (float, optional): Specifies the threshold at which to change between delta-scaled L1 and L2 loss. The value must be positive. Defaults to 1.0.
- loss_weight (float, optional): Weight of this loss item. Defaults to 1.0.
- reduction (str): The reduction method for the loss. Defaults to 'mean'.
- loss_name (str, optional): Name of the loss item. To include this loss item in the backward graph, the name must start with loss_. Defaults to 'l2'.
- forward(pred, gt, weight=None)
Forward function.
Args:
pred (torch.Tensor): The predicted tensor.
gt (torch.Tensor): The ground truth tensor.
weight (torch.Tensor | None, optional): The loss weight. Defaults to None.
Returns:
torch.Tensor: loss
- Parameters:
pred (torch.Tensor) –
gt (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
torch.Tensor
- Parameters:
delta (float) –
loss_weight (float) –
reduction (str) –
loss_name (str) –
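The role of delta is easiest to see on scalars. The following is a dependency-free sketch of the standard Huber formula (quadratic for errors up to delta, delta-scaled linear beyond it); the helper names are hypothetical, and the way loss_weight and reduction are applied here is an assumption rather than a statement about the class internals.

```python
def huber(pred: float, gt: float, delta: float = 1.0) -> float:
    """Element-wise Huber loss with threshold delta (must be positive)."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    err = abs(pred - gt)
    if err <= delta:
        return 0.5 * err ** 2  # L2 region
    return delta * (err - 0.5 * delta)  # delta-scaled L1 region


def huber_loss(pred, gt, delta=1.0, loss_weight=1.0, reduction="mean"):
    """Reduce element-wise Huber losses over two equal-length lists."""
    values = [huber(p, g, delta) for p, g in zip(pred, gt)]
    if reduction == "mean":
        total = sum(values) / len(values)
    elif reduction == "sum":
        total = sum(values)
    else:
        raise ValueError(f"unsupported reduction: {reduction}")
    return loss_weight * total
```

A small error such as 0.5 stays in the quadratic branch, while a large error such as 3.0 grows only linearly, which is what makes the loss less sensitive to outliers than plain L2.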
- class diffengine.models.CrossEntropyLoss(loss_weight=1.0, reduction='mean', ignore_index=-100, loss_name='cross_entropy')
Bases: diffengine.models.losses.base.BaseLoss
CrossEntropy loss.
Args:
- loss_weight (float, optional): Weight of this loss item. Defaults to 1.0.
- reduction (str): The reduction method for the loss. Defaults to 'mean'.
- ignore_index (int): Specifies a target value that is ignored. Defaults to -100.
- loss_name (str, optional): Name of the loss item. To include this loss item in the backward graph, the name must start with loss_. Defaults to 'cross_entropy'.
- forward(pred, gt, weight=None)
Forward function.
Args:
pred (torch.Tensor): The predicted tensor.
gt (torch.Tensor): The ground truth tensor.
weight (torch.Tensor | None, optional): The loss weight. Defaults to None.
Returns:
torch.Tensor: loss
- Parameters:
pred (torch.Tensor) –
gt (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
torch.Tensor
- Parameters:
loss_weight (float) –
reduction (str) –
ignore_index (int) –
loss_name (str) –
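To illustrate what ignore_index does, here is a minimal pure-Python sketch of cross entropy over rows of logits in which targets equal to ignore_index are skipped before the reduction. The function name is hypothetical, and returning 0.0 when every target is ignored is a choice made for this sketch only.

```python
import math


def cross_entropy(logits, targets, ignore_index=-100, reduction="mean"):
    """Cross entropy over rows of logits, skipping ignored targets."""
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # this position contributes nothing to the loss
        # Numerically stable log-sum-exp of the row.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_sum_exp - row[target])
    if not losses:
        return 0.0  # sketch-only convention when everything is ignored
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unsupported reduction: {reduction}")
```

With reduction='mean', the average is taken over the non-ignored positions only, so padding tokens marked with -100 do not dilute the loss.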
- class diffengine.models.WhiteNoise(*args, **kwargs)
Bases: torch.nn.Module
White noise module.
- class diffengine.models.OffsetNoise(offset_weight=0.05)
Bases: torch.nn.Module
Offset noise module.
https://www.crosslabs.org/blog/diffusion-with-offset-noise
Args:
offset_weight (float): Noise offset weight. Defaults to 0.05.
- Parameters:
offset_weight (float) –
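The idea described in the linked post is to add one extra Gaussian draw per channel, shared by every pixel of that channel, on top of ordinary white noise. The sketch below shows this for a single sample stored as nested lists; the function name and the list layout are assumptions for illustration, not the module's tensor code.

```python
import random


def offset_noise(channels, height, width, offset_weight=0.05, rng=None):
    """White noise plus a per-channel constant offset, as nested lists."""
    rng = rng or random.Random()
    sample = []
    for _ in range(channels):
        # One draw per channel, scaled by offset_weight and shared by all pixels.
        offset = offset_weight * rng.gauss(0.0, 1.0)
        channel = [
            [rng.gauss(0.0, 1.0) + offset for _ in range(width)]
            for _ in range(height)
        ]
        sample.append(channel)
    return sample
```

Because the offset shifts the mean of a whole channel, the model also sees training targets whose overall brightness differs from zero.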
- class diffengine.models.PyramidNoise(discount=0.9, *, random_multiplier=True)
Bases: torch.nn.Module
Pyramid noise module.
https://wandb.ai/johnowhitaker/multires_noise/reports/Multi-Resolution-Noise-for-Diffusion-Model-Training--VmlldzozNjYyOTU2
Args:
discount (float): Discount factor applied to each successive, coarser noise level. Defaults to 0.9.
random_multiplier (bool): Whether to use random multiplier. Defaults to True.
- Parameters:
discount (float) –
random_multiplier (bool) –
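Multi-resolution (pyramid) noise sums white noise drawn at progressively coarser resolutions, each level upsampled to full size and scaled by a power of discount. The sketch below works on one 2D channel with nearest-neighbour upsampling; the function names, the level cap, the shrink rule, and the final rescaling to unit standard deviation are simplifying assumptions, not a copy of the module.

```python
import random
import statistics


def upsample_nearest(grid, height, width):
    """Nearest-neighbour upsampling of a 2D list to (height, width)."""
    src_h, src_w = len(grid), len(grid[0])
    return [
        [grid[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def pyramid_noise(height, width, discount=0.9, random_multiplier=True,
                  max_levels=10, rng=None):
    """Sum of discounted coarse noise levels, rescaled to unit std."""
    rng = rng or random.Random()
    noise = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
    h, w = height, width
    for level in range(1, max_levels):
        # Shrink by a random factor in [2, 4) or by exactly 2.
        scale = rng.random() * 2 + 2 if random_multiplier else 2.0
        h, w = max(1, int(h / scale)), max(1, int(w / scale))
        coarse = [[rng.gauss(0.0, 1.0) for _ in range(w)] for _ in range(h)]
        up = upsample_nearest(coarse, height, width)
        weight = discount ** level
        for y in range(height):
            for x in range(width):
                noise[y][x] += weight * up[y][x]
        if h == 1 or w == 1:
            break  # the coarsest possible level has been added
    std = statistics.pstdev(v for row in noise for v in row)
    return [[v / std for v in row] for row in noise]
```

A smaller discount makes the coarse levels fade quickly, so the result stays close to white noise; values near 1 keep strong low-frequency structure.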
- class diffengine.models.TimeSteps(*args, **kwargs)
Bases: torch.nn.Module
Time Steps module.
- forward(scheduler, num_batches, device)
Forward pass.
Generates time steps for the given batches.
Args:
scheduler (DDPMScheduler): Scheduler for training diffusion model.
num_batches (int): Batch size.
device (str): Device.
- Parameters:
scheduler (diffusers.DDPMScheduler) –
num_batches (int) –
device (str) –
- Return type:
torch.Tensor
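As a point of comparison for the biased samplers that follow, a plain timestep sampler can be thought of as one uniform integer draw per batch element. The sketch below assumes the range is [0, num_train_timesteps), with num_train_timesteps standing in for the scheduler's configured training step count; the function name is hypothetical.

```python
import random


def uniform_timesteps(num_train_timesteps, num_batches, rng=None):
    """One uniformly random integer timestep per batch element."""
    rng = rng or random.Random()
    return [rng.randrange(num_train_timesteps) for _ in range(num_batches)]
```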
- class diffengine.models.LaterTimeSteps(bias_multiplier=5.0, bias_portion=0.25)
Bases: torch.nn.Module
Later biased Time Steps module.
Args:
bias_multiplier (float): Bias multiplier. Defaults to 5.0.
bias_portion (float): Portion of later time steps to bias. Defaults to 0.25.
- forward(scheduler, num_batches, device)
Forward pass.
Generates time steps for the given batches.
Args:
scheduler (DDPMScheduler): Scheduler for training diffusion model.
num_batches (int): Batch size.
device (str): Device.
- Parameters:
scheduler (diffusers.DDPMScheduler) –
num_batches (int) –
device (str) –
- Return type:
torch.Tensor
- Parameters:
bias_multiplier (float) –
bias_portion (float) –
- class diffengine.models.EarlierTimeSteps(bias_multiplier=5.0, bias_portion=0.25)
Bases: torch.nn.Module
Earlier biased Time Steps module.
Args:
bias_multiplier (float): Bias multiplier. Defaults to 5.0.
bias_portion (float): Portion of earlier time steps to bias. Defaults to 0.25.
- forward(scheduler, num_batches, device)
Forward pass.
Generates time steps for the given batches.
Args:
scheduler (DDPMScheduler): Scheduler for training diffusion model.
num_batches (int): Batch size.
device (str): Device.
- Parameters:
scheduler (diffusers.DDPMScheduler) –
num_batches (int) –
device (str) –
- Return type:
torch.Tensor
- Parameters:
bias_multiplier (float) –
bias_portion (float) –
- class diffengine.models.RangeTimeSteps(bias_multiplier=5.0, bias_begin=0.25, bias_end=0.75)
Bases: torch.nn.Module
Range biased Time Steps module.
Args:
bias_multiplier (float): Bias multiplier. Defaults to 5.0.
bias_begin (float): Fraction of the time step range at which the biased region begins. Defaults to 0.25.
bias_end (float): Fraction of the time step range at which the biased region ends. Defaults to 0.75.
- forward(scheduler, num_batches, device)
Forward pass.
Generates time steps for the given batches.
Args:
scheduler (DDPMScheduler): Scheduler for training diffusion model.
num_batches (int): Batch size.
device (str): Device.
- Parameters:
scheduler (diffusers.DDPMScheduler) –
num_batches (int) –
device (str) –
- Return type:
torch.Tensor
- Parameters:
bias_multiplier (float) –
bias_begin (float) –
bias_end (float) –
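The three biased samplers above (later, earlier, range) can be read as one scheme: start from equal weights for every timestep, multiply the weights inside a chosen region by bias_multiplier, normalise, and sample with replacement. The sketch below assumes exactly that scheme; the helper names and the rounding of region boundaries with int() are assumptions, not the library's code.

```python
import random


def range_bias_weights(num_train_timesteps, bias_multiplier=5.0,
                       bias_begin=0.25, bias_end=0.75):
    """Normalised sampling probabilities with a boosted [begin, end) region."""
    begin = int(bias_begin * num_train_timesteps)
    end = int(bias_end * num_train_timesteps)
    weights = [1.0] * num_train_timesteps
    for t in range(begin, end):
        weights[t] *= bias_multiplier
    total = sum(weights)
    return [w / total for w in weights]


def later_bias_weights(num_train_timesteps, bias_multiplier=5.0, bias_portion=0.25):
    """Boost the last bias_portion of the timesteps."""
    return range_bias_weights(num_train_timesteps, bias_multiplier,
                              1.0 - bias_portion, 1.0)


def earlier_bias_weights(num_train_timesteps, bias_multiplier=5.0, bias_portion=0.25):
    """Boost the first bias_portion of the timesteps."""
    return range_bias_weights(num_train_timesteps, bias_multiplier,
                              0.0, bias_portion)


def sample_timesteps(weights, num_batches, rng=None):
    """Draw timesteps with replacement according to the given weights."""
    rng = rng or random.Random()
    return rng.choices(range(len(weights)), weights=weights, k=num_batches)
```

With the defaults and 1000 training steps, a timestep inside the biased region is five times as likely to be drawn as one outside it.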
- class diffengine.models.CubicSamplingTimeSteps(*args, **kwargs)
Bases: torch.nn.Module
Cubic Sampling Time Steps module.
For more details about why cubic sampling is used, refer to section 3.4 of https://arxiv.org/abs/2302.08453
- forward(scheduler, num_batches, device)
Forward pass.
Generates time steps for the given batches.
Args:
scheduler (DDPMScheduler): Scheduler for training diffusion model.
num_batches (int): Batch size.
device (str): Device.
- Parameters:
scheduler (diffusers.DDPMScheduler) –
num_batches (int) –
device (str) –
- Return type:
torch.Tensor
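Cubic sampling, as described in the referenced section, maps a uniform draw u in [0, 1) to t = (1 - u^3) * T, which concentrates samples at large (noisier) timesteps. The sketch below assumes that formula, truncation to an integer, and clamping to the valid range; the function name is hypothetical and T is passed in as num_train_timesteps.

```python
import random


def cubic_timesteps(num_train_timesteps, num_batches, rng=None):
    """Timesteps drawn as (1 - u**3) * T, clamped to [0, T - 1]."""
    rng = rng or random.Random()
    steps = []
    for _ in range(num_batches):
        u = rng.random()
        t = int((1.0 - u ** 3) * num_train_timesteps)
        steps.append(min(max(t, 0), num_train_timesteps - 1))
    return steps
```

Under this mapping roughly four out of five draws land in the upper half of the timestep range, since 1 - u^3 >= 0.5 whenever u <= 0.5^(1/3), which is about 0.79.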
- class diffengine.models.WuerstchenRandomTimeSteps(*args, **kwargs)
Bases: torch.nn.Module
Wuerstchen Random Time Steps module.
- class diffengine.models.DDIMTimeSteps(num_ddim_timesteps=50)
Bases: torch.nn.Module
DDIM Time Steps module.
Args:
num_ddim_timesteps (int): Number of DDIM timesteps. Defaults to 50.
- forward(scheduler, num_batches, device)
Forward pass.
Generates time steps for the given batches.
Args:
scheduler (DDPMScheduler): Scheduler for training diffusion model.
num_batches (int): Batch size.
device (str): Device.
- Parameters:
scheduler (diffusers.DDPMScheduler) –
num_batches (int) –
device (str) –
- Return type:
torch.Tensor
- Parameters:
num_ddim_timesteps (int) –
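One plausible reading of DDIM-style timestep sampling is to restrict training to an evenly spaced subset of the scheduler's timesteps and draw one grid point per batch element. The sketch below assumes the spacing rule often seen in consistency-distillation training code (step ratio T // N, grid points i * ratio - 1); both that rule and the function names are assumptions, not taken from this module.

```python
import random


def ddim_timestep_grid(num_train_timesteps=1000, num_ddim_timesteps=50):
    """Evenly spaced timesteps ending at the last training step."""
    step_ratio = num_train_timesteps // num_ddim_timesteps
    return [i * step_ratio - 1 for i in range(1, num_ddim_timesteps + 1)]


def sample_ddim_timesteps(num_train_timesteps, num_ddim_timesteps,
                          num_batches, rng=None):
    """Pick one grid timestep uniformly at random per batch element."""
    rng = rng or random.Random()
    grid = ddim_timestep_grid(num_train_timesteps, num_ddim_timesteps)
    return [rng.choice(grid) for _ in range(num_batches)]
```

With 1000 training steps and the default of 50 DDIM steps, this grid is 19, 39, ..., 999.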