`diffengine.models.editors.stable_diffusion_xl_controlnet`

Package Contents

Classes

`SDXLControlNetDataPreprocessor`	SDXLControlNetDataPreprocessor.
`StableDiffusionXLControlNet`	Stable Diffusion XL ControlNet.

class diffengine.models.editors.stable_diffusion_xl_controlnet.SDXLControlNetDataPreprocessor(non_blocking=False)

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

SDXLControlNetDataPreprocessor.

Parameters: non_blocking (Optional[bool])

forward(data, training=False)

Preprocesses the data into the model input format.

After cast_data() has processed the data, forward stacks the list of input tensors into a single batch tensor along the first dimension.

Args:

data (dict): Data returned by the dataloader.
training (bool): Whether to enable training-time augmentation. Defaults to False.

Returns:

dict or list: Data in the same format as the model input.
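
The stacking step described for forward can be pictured without any tensor library. The following is a minimal, framework-free sketch in which nested Python lists stand in for tensors; the helpers `shape` and `stack` and the key `"img"` are hypothetical names chosen for illustration, not part of diffengine. With real tensors the same shape change would typically be produced by `torch.stack`.

```python
from typing import Any


def shape(x: Any) -> tuple:
    """Return the shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        if not x:
            break
        x = x[0]
    return tuple(dims)


def stack(samples: list) -> list:
    """Stack equally shaped samples along a new first dimension."""
    first = shape(samples[0])
    if any(shape(s) != first for s in samples):
        raise ValueError("all samples must share one shape")
    return list(samples)


# Two hypothetical per-sample "images" of shape (3, 2, 2): channels, height, width.
sample_a = [[[0.0, 0.1], [0.2, 0.3]] for _ in range(3)]
sample_b = [[[1.0, 1.1], [1.2, 1.3]] for _ in range(3)]

# "img" is a placeholder key; the real input keys are not listed on this page.
data = {"inputs": {"img": [sample_a, sample_b]}}
data["inputs"]["img"] = stack(data["inputs"]["img"])

# The batch gains a leading dimension equal to the number of samples.
batch_shape = shape(data["inputs"]["img"])
```

The point of the sketch is only the shape change: a list of N samples of shape (C, H, W) becomes one batch of shape (N, C, H, W).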

class diffengine.models.editors.stable_diffusion_xl_controlnet.StableDiffusionXLControlNet(*args, controlnet_model=None, transformer_layers_per_block=None, unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, data_preprocessor=None, **kwargs)

Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL

Stable Diffusion XL ControlNet.

Args:

controlnet_model (str, optional): Path to a pretrained ControlNet model.
If None, the default ControlNet model is created from the UNet. Defaults to None.

transformer_layers_per_block (List[int], optional):
The number of transformer layers per block. More details: https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0-small. Defaults to None.

unet_lora_config (dict, optional): The LoRA config dict for the UNet.
Example: dict(type="LoRA", r=4). type is one of LoRA, LoHa, or LoKr. The other options are the same as in the PEFT config: https://github.com/huggingface/peft. Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for
the text encoder. Example: dict(type="LoRA", r=4). type is one of LoRA, LoHa, or LoKr. The other options are the same as in the PEFT config: https://github.com/huggingface/peft. Defaults to None.

finetune_text_encoder (bool, optional): Whether to fine-tune the text
encoder. This should be False when training ControlNet. Defaults to False.

data_preprocessor (dict, optional): The pre-process config of
SDXLControlNetDataPreprocessor. Defaults to None.
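
The arguments above are typically supplied as a config dict. The following is a minimal sketch of such a fragment, written in the same `dict(...)` style as the LoRA example. The `type` key naming the class, the use of a Hugging Face Hub id as the ControlNet path, and the `transformer_layers_per_block` values are assumptions made for illustration; check the linked model card for the values that match a given checkpoint.

```python
# Hypothetical config fragment for the class documented here.
model = dict(
    # Assumption: the class is selected by name through a "type" key.
    type="StableDiffusionXLControlNet",
    # Assumption: a Hub id is accepted where a path is expected.
    controlnet_model="diffusers/controlnet-canny-sdxl-1.0-small",
    # Illustrative values only; they must match the chosen ControlNet checkpoint.
    transformer_layers_per_block=[0, 0, 1],
    # The docstring states this should be False when training ControlNet.
    finetune_text_encoder=False,
)

# A variant that adds the LoRA config from the docstring example.
# dict(model, ...) copies the base fragment and leaves it unchanged.
lora_variant = dict(model, unet_lora_config=dict(type="LoRA", r=4))
```

Building the variant from a copy leaves the base fragment untouched, so both configurations can be used side by side.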

set_lora()

Set LoRA for the model.

Return type: None

prepare_model()

Prepare the model for training.

Disable gradients for some models.

Return type: None

set_xformers()

Set xformers for the model.

Return type: None

infer(prompt, condition_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)

Inference function.

Args:

prompt (List[str]):
The prompt or prompts to guide the image generation.

condition_image (List[Union[str, Image.Image]]):
The condition image for ControlNet.

negative_prompt (Optional[str]):
The prompt or prompts not to guide the image generation. Defaults to None.

height (int, optional):
The height in pixels of the generated image. Defaults to None.

width (int, optional):
The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps.
Defaults to 50.

output_type (str): The output format of the generated image.
Choose between 'pil' and 'latent'. Defaults to 'pil'.

**kwargs: Other arguments.

Return type:

list[numpy.ndarray]
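
A call to infer can be sketched as follows. This is a minimal illustration, not the library's own usage example: the wrapper `generate` and the `StubModel` class are hypothetical, and the stub only mimics the signature shown above so the snippet runs without loading any weights. The one-to-one pairing of prompts and condition images is an assumption of this sketch.

```python
from typing import Any, Optional, Sequence


def generate(
    model: Any,
    prompts: Sequence[str],
    condition_images: Sequence[Any],
    negative_prompt: Optional[str] = None,
    steps: int = 50,
) -> list:
    """Call ``model.infer`` with one condition image per prompt."""
    # Assumption of this sketch: prompts and condition images are paired one-to-one.
    if len(prompts) != len(condition_images):
        raise ValueError("expected one condition image per prompt")
    return model.infer(
        prompt=list(prompts),
        condition_image=list(condition_images),
        negative_prompt=negative_prompt,
        num_inference_steps=steps,
        output_type="pil",
    )


class StubModel:
    """Stand-in for a loaded StableDiffusionXLControlNet; it generates nothing."""

    def infer(self, prompt, condition_image, negative_prompt=None, height=None,
              width=None, num_inference_steps=50, output_type="pil", **kwargs):
        # Record the arguments so the call can be inspected afterwards.
        self.last_call = dict(
            prompt=prompt,
            condition_image=condition_image,
            num_inference_steps=num_inference_steps,
            output_type=output_type,
        )
        return [f"image for: {p}" for p in prompt]


stub = StubModel()
images = generate(stub, ["a dog"], ["canny_edges.png"], steps=20)
```

With a real model instance in place of `StubModel`, the same wrapper would forward the arguments unchanged; both `prompt` and `condition_image` are passed as lists, as the signature requires.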

_forward_compile(noisy_latents, timesteps, prompt_embeds, unet_added_conditions, inputs)

Forward function for torch.compile.

Parameters:

noisy_latents (torch.Tensor)
timesteps (torch.Tensor)
prompt_embeds (torch.Tensor)
unet_added_conditions (dict)
inputs (dict)

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')

Forward function.

Args:

inputs (dict): The input dict.
data_samples (Optional[list], optional): The data samples. Defaults to None.
mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters (StableDiffusionXLControlNet constructor):

controlnet_model (str | None)
transformer_layers_per_block (list[int] | None)
unet_lora_config (dict | None)
text_encoder_lora_config (dict | None)
finetune_text_encoder (bool)
data_preprocessor (dict | torch.nn.Module | None)
