diffengine.models.editors.ip_adapter
Submodules
Package Contents
Classes
IPAdapterXL | Stable Diffusion XL IP-Adapter.
IPAdapterXLPlus | Stable Diffusion XL IP-Adapter Plus.
IPAdapterXLDataPreprocessor | IPAdapterXLDataPreprocessor.
TimmIPAdapterXLPlus | Stable Diffusion XL IP-Adapter Plus.
- class diffengine.models.editors.ip_adapter.IPAdapterXL(*args, image_encoder, image_projection, feature_extractor, pretrained_adapter=None, pretrained_adapter_subfolder='', pretrained_adapter_weights_name='', unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, zeros_image_embeddings_prob=0.1, data_preprocessor=None, hidden_states_idx=-2, **kwargs)
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL IP-Adapter.
Args:
- image_encoder (dict): The image encoder config.
- image_projection (dict): The image projection config.
- feature_extractor (dict): The feature extractor config.
- pretrained_adapter (str, optional): Path to pretrained IP-Adapter. Defaults to None.
- pretrained_adapter_subfolder (str, optional): Subfolder of pretrained IP-Adapter. Defaults to ''.
- pretrained_adapter_weights_name (str, optional): Weights name of pretrained IP-Adapter. Defaults to ''.
- unet_lora_config (dict, optional): The LoRA config dict for the UNet, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for the text encoder, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other options are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. This should be False when training ControlNet. Defaults to False.
- zeros_image_embeddings_prob (float): The probability of generating zero image embeddings. Defaults to 0.1.
- data_preprocessor (dict, optional): The pre-process config of SDControlNetDataPreprocessor.
- hidden_states_idx (int): Index of the hidden states to be used. Defaults to -2.
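The effect of zeros_image_embeddings_prob can be illustrated with a minimal sketch. It assumes the option behaves like conditioning dropout: during training, each sample's image embedding is replaced by zeros with the given probability. The helper name maybe_zero_embeddings and the use of plain Python lists in place of tensors are illustrative assumptions, not diffengine code.

```python
import random


def maybe_zero_embeddings(embeddings, prob=0.1, rng=None):
    """Return a copy of `embeddings` with each row zeroed with probability `prob`.

    `embeddings` is a list of per-sample vectors (lists of floats) standing in
    for a batch tensor. Hypothetical helper, for illustration only.
    """
    rng = rng or random.Random()
    result = []
    for row in embeddings:
        if rng.random() < prob:
            # Drop the image condition for this sample.
            result.append([0.0] * len(row))
        else:
            # Keep the image condition unchanged.
            result.append(list(row))
    return result


batch = [[0.5, -1.2], [0.3, 0.8], [1.0, 2.0]]
always = maybe_zero_embeddings(batch, prob=1.0)  # every row is zeroed
never = maybe_zero_embeddings(batch, prob=0.0)   # batch is left unchanged
```

With the default of 0.1, roughly one training sample in ten would see no image condition under this assumption.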
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, example_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]): The prompt or prompts to guide the image generation.
- example_image (List[Union[str, Image.Image]]): The image prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional): The height in pixels of the generated image. Defaults to None.
- width (int, optional): The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps. Defaults to 50.
- output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
example_image (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
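A call to infer might be prepared as in the sketch below. The helper build_infer_kwargs is hypothetical: it only collects the documented arguments and checks them before they are passed on. Pairing each prompt with exactly one example image is an assumption, not something the docstring states, and "dog.jpg" is a placeholder path.

```python
def build_infer_kwargs(prompt, example_image, negative_prompt=None,
                       height=None, width=None,
                       num_inference_steps=50, output_type="pil"):
    """Collect and sanity-check keyword arguments for `infer` (hypothetical helper)."""
    if len(prompt) != len(example_image):
        # Assumption: prompts and example images are paired one-to-one.
        raise ValueError("prompt and example_image must have the same length")
    if output_type not in ("pil", "latent"):
        raise ValueError("output_type must be 'pil' or 'latent'")
    return dict(
        prompt=prompt,
        example_image=example_image,
        negative_prompt=negative_prompt,
        height=height,
        width=width,
        num_inference_steps=num_inference_steps,
        output_type=output_type,
    )


kwargs = build_infer_kwargs(
    prompt=["a dog sitting on a bench"],
    example_image=["dog.jpg"],  # placeholder path; a PIL image is also accepted
    negative_prompt="blurry",
)
# With a built model instance, the call would then be:
#     images = model.infer(**kwargs)
```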
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
image_encoder (dict) –
image_projection (dict) –
feature_extractor (dict) –
pretrained_adapter (str | None) –
pretrained_adapter_subfolder (str) –
pretrained_adapter_weights_name (str) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
zeros_image_embeddings_prob (float) –
data_preprocessor (dict | torch.nn.Module | None) –
hidden_states_idx (int) –
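The constructor arguments above might be collected in a config dict along the following lines. This is a sketch only: it assumes the class is registered under its own name, and the "type" values of the three nested configs are placeholders, since their real contents depend on the image encoder being used.

```python
# Placeholder sub-configs: the "type" strings are NOT real registry names.
image_encoder = dict(type="<image encoder builder>")
image_projection = dict(type="<image projection module>")
feature_extractor = dict(type="<feature extractor builder>")

model = dict(
    type="IPAdapterXL",  # assumption: registered under the class name
    image_encoder=image_encoder,
    image_projection=image_projection,
    feature_extractor=feature_extractor,
    # Optional LoRA on the UNet, using the documented example values.
    unet_lora_config=dict(type="LoRA", r=4),
    # Documented defaults, written out for clarity.
    text_encoder_lora_config=None,
    finetune_text_encoder=False,
    zeros_image_embeddings_prob=0.1,
    hidden_states_idx=-2,
)
```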
- class diffengine.models.editors.ip_adapter.IPAdapterXLPlus(*args, image_encoder, image_projection, feature_extractor, pretrained_adapter=None, pretrained_adapter_subfolder='', pretrained_adapter_weights_name='', unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, zeros_image_embeddings_prob=0.1, data_preprocessor=None, hidden_states_idx=-2, **kwargs)
Bases: IPAdapterXL
Stable Diffusion XL IP-Adapter Plus.
- Parameters:
image_encoder (dict) –
image_projection (dict) –
feature_extractor (dict) –
pretrained_adapter (str | None) –
pretrained_adapter_subfolder (str) –
pretrained_adapter_weights_name (str) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
zeros_image_embeddings_prob (float) –
data_preprocessor (dict | torch.nn.Module | None) –
hidden_states_idx (int) –
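How hidden_states_idx picks a layer can be sketched as follows, assuming the image encoder exposes a sequence of per-layer hidden states (as Hugging Face transformers models do when called with output_hidden_states=True). The function name and the string stand-ins for tensors are illustrative assumptions.

```python
def select_hidden_state(hidden_states, hidden_states_idx=-2):
    """Pick one entry from a sequence of per-layer hidden states."""
    return hidden_states[hidden_states_idx]


# Stand-ins for tensors: the embedding output followed by three encoder layers.
hidden_states = ("embeddings", "layer_1", "layer_2", "layer_3")

selected = select_hidden_state(hidden_states)  # default -2: penultimate entry
last = select_hidden_state(hidden_states, hidden_states_idx=-1)
```

Under this assumption, the default of -2 selects the output of the second-to-last layer rather than the final one.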
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- class diffengine.models.editors.ip_adapter.IPAdapterXLDataPreprocessor(non_blocking=False)
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
IPAdapterXLDataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.
Args:
- data (dict): Data returned by dataloader.
- training (bool): Whether to enable training time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
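The stacking step described above can be pictured with a small sketch. Nested Python lists stand in for tensors here so the example needs no dependencies; with PyTorch tensors the same step corresponds to torch.stack(samples, dim=0). The helper names shape and stack_batch are illustrative assumptions.

```python
def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list) and nested:
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def stack_batch(samples):
    """Stack equally shaped samples into one batch along a new first dimension."""
    if not samples:
        raise ValueError("cannot stack an empty list of samples")
    expected = shape(samples[0])
    for sample in samples:
        if shape(sample) != expected:
            raise ValueError("all samples must share the same shape")
    return [sample for sample in samples]


# Three samples of shape (2, 2) become one batch of shape (3, 2, 2).
samples = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
batch = stack_batch(samples)
batch_shape = shape(batch)
```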
- class diffengine.models.editors.ip_adapter.TimmIPAdapterXLPlus(*args, image_encoder, image_projection, feature_extractor, pretrained_adapter=None, pretrained_adapter_subfolder='', pretrained_adapter_weights_name='', unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, zeros_image_embeddings_prob=0.1, data_preprocessor=None, hidden_states_idx=-2, **kwargs)
Bases: diffengine.models.editors.ip_adapter.ip_adapter_xl.IPAdapterXLPlus
Stable Diffusion XL IP-Adapter Plus.
- Parameters:
image_encoder (dict) –
image_projection (dict) –
feature_extractor (dict) –
pretrained_adapter (str | None) –
pretrained_adapter_subfolder (str) –
pretrained_adapter_weights_name (str) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
zeros_image_embeddings_prob (float) –
data_preprocessor (dict | torch.nn.Module | None) –
hidden_states_idx (int) –
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, example_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]): The prompt or prompts to guide the image generation.
- example_image (List[Union[str, Image.Image]]): The image prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional): The height in pixels of the generated image. Defaults to None.
- width (int, optional): The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps. Defaults to 50.
- output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
example_image (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (list | None) –
mode (str) –
- Return type:
dict