diffengine.models.editors.ip_adapter.ip_adapter_xl

Module Contents

Classes

IPAdapterXL

Stable Diffusion XL IP-Adapter.

IPAdapterXLPlus

Stable Diffusion XL IP-Adapter Plus.

class diffengine.models.editors.ip_adapter.ip_adapter_xl.IPAdapterXL(*args, image_encoder, image_projection, feature_extractor, pretrained_adapter=None, pretrained_adapter_subfolder='', pretrained_adapter_weights_name='', unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, zeros_image_embeddings_prob=0.1, data_preprocessor=None, hidden_states_idx=-2, **kwargs)[source]

Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL

Stable Diffusion XL IP-Adapter.

Args:

image_encoder (dict): The image encoder config.

image_projection (dict): The image projection config.

feature_extractor (dict): The feature extractor config.

pretrained_adapter (str, optional): Path to the pretrained IP-Adapter. Defaults to None.

pretrained_adapter_subfolder (str, optional): Subfolder of the pretrained IP-Adapter. Defaults to ''.

pretrained_adapter_weights_name (str, optional): Weights name of the pretrained IP-Adapter. Defaults to ''.

unet_lora_config (dict, optional): The LoRA config dict for the UNet, for example dict(type="LoRA", r=4). type is one of LoRA, LoHa, or LoKr. The other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for the text encoder, for example dict(type="LoRA", r=4). type is one of LoRA, LoHa, or LoKr. The other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.

finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. This should be False when training ControlNet. Defaults to False.

zeros_image_embeddings_prob (float): The probability of generating zero image embeddings. Defaults to 0.1.

data_preprocessor (dict, optional): The pre-process config of SDControlNetDataPreprocessor.

hidden_states_idx (int): Index of the hidden states to be used. Defaults to -2.
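As a rough illustration of how these arguments fit together, the following sketch builds a plain config dict whose keys mirror the constructor signature. The `type="IPAdapterXL"` registry key, the empty sub-configs, and the `check_lora_config` helper are assumptions made for illustration only; they are not taken from the library.

```python
# Hypothetical config sketch: the keyword names mirror the IPAdapterXL
# signature, but every value below is a placeholder assumption.
ALLOWED_LORA_TYPES = {"LoRA", "LoHa", "LoKr"}

model = dict(
    type="IPAdapterXL",  # assumed registry key, not confirmed by this page
    # Sub-configs are left empty on purpose; fill them in for a real setup.
    image_encoder=dict(),
    image_projection=dict(),
    feature_extractor=dict(),
    pretrained_adapter=None,
    pretrained_adapter_subfolder="",
    pretrained_adapter_weights_name="",
    unet_lora_config=dict(type="LoRA", r=4),
    text_encoder_lora_config=None,
    finetune_text_encoder=False,
    zeros_image_embeddings_prob=0.1,
    hidden_states_idx=-2,
)


def check_lora_config(cfg):
    """Return True if cfg is None or names one of the documented LoRA types."""
    return cfg is None or cfg.get("type") in ALLOWED_LORA_TYPES
```

A check such as `check_lora_config(model["unet_lora_config"])` can catch a misspelled `type` before the config reaches the model builder.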

set_lora()[source]

Set LoRA for the model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_ip_adapter()[source]

Set IP-Adapter for model.

Return type:

None

infer(prompt, example_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:

prompt (List[str]): The prompt or prompts to guide the image generation.

example_image (List[Union[str, Image.Image]]): The image prompt or prompts to guide the image generation.

negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.

height (int, optional): The height in pixels of the generated image. Defaults to None.

width (int, optional): The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps. Defaults to 50.

output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • example_image (list[str | PIL.Image.Image]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]
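A minimal sketch of calling `infer`, assuming an already-built model object is available. The `run_infer` wrapper is hypothetical, and the requirement that `prompt` and `example_image` have equal length is an assumption of this sketch rather than something this page states.

```python
from typing import Any, List, Optional


def run_infer(model: Any, prompts: List[str], example_images: List[Any],
              negative_prompt: Optional[str] = None, steps: int = 50) -> list:
    """Call model.infer with one example image per prompt."""
    # Assumption: each text prompt is paired with exactly one image prompt.
    if len(prompts) != len(example_images):
        raise ValueError("prompts and example_images must have the same length")
    return model.infer(
        prompt=prompts,
        example_image=example_images,
        negative_prompt=negative_prompt,
        num_inference_steps=steps,
        output_type="pil",
    )
```

With a real model this might be called as `run_infer(model, ["a dog in a park"], ["dog.png"], steps=30)`, where the image path is a placeholder.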

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict
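The `zeros_image_embeddings_prob` argument of the class describes a probability of generating zero image embeddings during training. One way to picture this is the sketch below, which uses plain Python lists instead of tensors. The `drop_image_embeddings` helper and the choice to draw independently for each sample are assumptions; this page does not say how the library implements it.

```python
import random
from typing import List


def drop_image_embeddings(embeds: List[List[float]], prob: float,
                          rng: random.Random) -> List[List[float]]:
    """Replace each sample's embedding with zeros with probability `prob`."""
    out = []
    for emb in embeds:
        # Assumption: one independent draw per sample in the batch.
        if rng.random() < prob:
            out.append([0.0] * len(emb))
        else:
            out.append(list(emb))
    return out
```

With `prob=0.1`, roughly one sample in ten would be trained without an image condition, which is the usual motivation for this kind of conditioning dropout.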

Parameters:
  • image_encoder (dict) –

  • image_projection (dict) –

  • feature_extractor (dict) –

  • pretrained_adapter (str | None) –

  • pretrained_adapter_subfolder (str) –

  • pretrained_adapter_weights_name (str) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • finetune_text_encoder (bool) –

  • zeros_image_embeddings_prob (float) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • hidden_states_idx (int) –

class diffengine.models.editors.ip_adapter.ip_adapter_xl.IPAdapterXLPlus(*args, image_encoder, image_projection, feature_extractor, pretrained_adapter=None, pretrained_adapter_subfolder='', pretrained_adapter_weights_name='', unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, zeros_image_embeddings_prob=0.1, data_preprocessor=None, hidden_states_idx=-2, **kwargs)[source]

Bases: IPAdapterXL

Stable Diffusion XL IP-Adapter Plus.

Parameters:
  • image_encoder (dict) –

  • image_projection (dict) –

  • feature_extractor (dict) –

  • pretrained_adapter (str | None) –

  • pretrained_adapter_subfolder (str) –

  • pretrained_adapter_weights_name (str) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • finetune_text_encoder (bool) –

  • zeros_image_embeddings_prob (float) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • hidden_states_idx (int) –

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict
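The `hidden_states_idx` parameter listed above is an index into the hidden states, with a default of -2. Assuming the image encoder exposes its hidden states as an ordered sequence with one entry per layer, a negative index counts from the end, so -2 would select the second-to-last entry. The `select_hidden_state` helper below is hypothetical and only illustrates the indexing.

```python
from typing import Any, Sequence


def select_hidden_state(hidden_states: Sequence[Any], idx: int = -2) -> Any:
    """Pick one entry from an ordered sequence of per-layer hidden states."""
    # Negative indices count from the end, so -2 is the penultimate entry.
    return hidden_states[idx]


# Placeholder labels stand in for real per-layer tensors.
layers = ["embeddings", "layer_1", "layer_2", "layer_3"]
penultimate = select_hidden_state(layers)
```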