diffengine.models.editors.kandinsky
Package Contents
Classes
- KandinskyV3: KandinskyV3.
- KandinskyV22Decoder: KandinskyV22 Decoder.
- KandinskyV22DecoderDataPreprocessor: KandinskyV22DecoderDataPreprocessor.
- KandinskyV22Prior: KandinskyV22 Prior.
- class diffengine.models.editors.kandinsky.KandinskyV3(tokenizer, scheduler, text_encoder, vae, unet, model='kandinsky-community/kandinsky-3', loss=None, unet_lora_config=None, prior_loss_weight=1.0, tokenizer_max_length=128, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, gradient_checkpointing=False, enable_xformers=False)
Bases: mmengine.model.BaseModel
KandinskyV3.
Args:
- tokenizer (dict): Config of the tokenizer.
- scheduler (dict): Config of the scheduler.
- text_encoder (dict): Config of the text encoder.
- vae (dict): Config of the VAE.
- unet (dict): Config of the UNet.
- model (str): Pretrained model name. Defaults to "kandinsky-community/kandinsky-3".
- loss (dict): Config of the loss. Defaults to dict(type='L2Loss', loss_weight=1.0) when left as None.
- unet_lora_config (dict, optional): The LoRA config dict for the UNet, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, and LoKr. The other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- prior_loss_weight (float): The weight of the prior preservation loss. It only takes effect when training DreamBooth with class images. Defaults to 1.0.
- tokenizer_max_length (int): The max length of the tokenizer. Defaults to 128.
- prediction_type (str): The prediction type used for training. Choose between 'epsilon' and 'v_prediction', or leave it as None to use the default prediction type of the scheduler. Defaults to None.
- data_preprocessor (dict, optional): The pre-process config of SDDataPreprocessor.
- noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation. The recommended value for Input Perturbation is 0.1. Defaults to 0.0.
- vae_batch_size (int): The batch size of the VAE. Defaults to 8.
- gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
- enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.
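As a rough illustration of how the optional arguments above fit together, the sketch below collects a few of them into a plain dict in the same dict-config style the Args use. The variable name `kandinsky_v3_optional` is hypothetical, and the five required component configs (tokenizer, scheduler, text_encoder, vae, unet) are deliberately left out because their contents are not described in this reference.

```python
# Optional keyword arguments for KandinskyV3, written as a plain dict.
# Only keys and values documented in the Args list are used; the variable
# name is illustrative and not part of diffengine.
kandinsky_v3_optional = dict(
    model="kandinsky-community/kandinsky-3",
    unet_lora_config=dict(type="LoRA", r=4),  # example value from the Args
    prediction_type=None,                     # fall back to the scheduler default
    input_perturbation_gamma=0.1,             # recommended value when enabled
    vae_batch_size=8,
    gradient_checkpointing=True,              # trade backward speed for memory
)
```

The required component configs would have to be supplied alongside these keys before the class could be built.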
- property device: torch.device
Get device information.
- Returns:
The device of the model.
- Return type:
torch.device
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]): The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional): The height in pixels of the generated image. Defaults to None.
- width (int, optional): The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps. Defaults to 50.
- output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- loss(model_pred, noise, latents, timesteps, weight=None)
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
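The loss signature accepts an optional per-sample weight, and the class takes prior_loss_weight for DreamBooth class images. One common way such weights are applied, shown here as a pure-Python sketch and not as diffengine's actual implementation, is to scale the squared error of class-image samples by prior_loss_weight. The function name `weighted_l2_loss` and the `is_class_image` flags are hypothetical.

```python
def weighted_l2_loss(preds, targets, is_class_image, prior_loss_weight=1.0):
    """Mean squared error with class-image samples scaled by prior_loss_weight.

    preds, targets: lists of per-sample lists of floats with matching shapes.
    is_class_image: list of bools, True for prior-preservation (class) samples.
    """
    total = 0.0
    for pred, target, is_prior in zip(preds, targets, is_class_image):
        # Per-sample mean squared error.
        mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
        # Instance samples keep weight 1.0; class samples are re-weighted.
        weight = prior_loss_weight if is_prior else 1.0
        total += weight * mse
    return total / len(preds)


example_loss = weighted_l2_loss(
    preds=[[1.0, 2.0], [0.0, 0.0]],
    targets=[[0.0, 2.0], [2.0, 0.0]],
    is_class_image=[False, True],
    prior_loss_weight=0.5,
)
```

With prior_loss_weight=1.0 this reduces to an ordinary mean squared error over the batch.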
- _preprocess_model_input(latents, noise, timesteps)
Preprocess model input.
- Parameters:
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
- Return type:
torch.Tensor
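_preprocess_model_input takes clean latents, noise, and timesteps and returns a single tensor, which suggests it builds the noised latents fed to the network. The sketch below illustrates that step with input perturbation, assuming the standard forward-diffusion formula x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps and a perturbed noise eps' = eps + gamma * xi with xi drawn from a standard normal. It uses plain Python lists instead of tensors, and `perturb_and_add_noise` and `alpha_bar` are hypothetical names.

```python
import math
import random


def perturb_and_add_noise(latents, noise, alpha_bar, gamma=0.0, rng=None):
    """Return noised latents for one sample given as flat lists of floats.

    alpha_bar: cumulative product of alphas at the sampled timestep, in (0, 1].
    gamma: input perturbation strength; 0.0 disables the perturbation.
    """
    rng = rng or random.Random()
    if gamma > 0:
        # Perturb only the noise used to build the network input; the
        # training target would still be the original, unperturbed noise.
        input_noise = [n + gamma * rng.gauss(0.0, 1.0) for n in noise]
    else:
        input_noise = list(noise)
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * x + noise_scale * e
            for x, e in zip(latents, input_noise)]


noisy = perturb_and_add_noise([1.0, -1.0], [0.5, 0.5], alpha_bar=0.25)
perturbed = perturb_and_add_noise(
    [1.0, -1.0], [0.5, 0.5], alpha_bar=0.25, gamma=0.1, rng=random.Random(0)
)
```

Setting gamma to 0.0, the documented default, leaves the noise untouched, so the function falls back to plain forward diffusion.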
- _forward_vae(img, num_batches)
Forward vae.
- Parameters:
img (torch.Tensor) –
num_batches (int) –
- Return type:
torch.Tensor
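The vae_batch_size argument together with num_batches suggests that images are encoded in chunks rather than all at once, which bounds peak memory. The sketch below shows that pattern with a stand-in encoder; `encode_in_chunks` and `fake_encode` are hypothetical and do not mirror diffengine's internals.

```python
def encode_in_chunks(images, encode_fn, chunk_size=8):
    """Apply encode_fn to consecutive chunks of images and concatenate results."""
    latents = []
    for start in range(0, len(images), chunk_size):
        chunk = images[start:start + chunk_size]
        latents.extend(encode_fn(chunk))
    return latents


def fake_encode(chunk):
    # Stand-in for a VAE encoder: maps each "image" (a number) to a "latent".
    return [x * 0.5 for x in chunk]


latents = encode_in_chunks(list(range(20)), fake_encode, chunk_size=8)
```

Here 20 inputs are processed as chunks of 8, 8, and 4, and the output order matches the input order.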
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
tokenizer (dict) –
scheduler (dict) –
text_encoder (dict) –
vae (dict) –
unet (dict) –
model (str) –
loss (dict | None) –
unet_lora_config (dict | None) –
prior_loss_weight (float) –
tokenizer_max_length (int) –
prediction_type (str | None) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
vae_batch_size (int) –
gradient_checkpointing (bool) –
enable_xformers (bool) –
- class diffengine.models.editors.kandinsky.KandinskyV22Decoder(scheduler, image_encoder, vae, unet, decoder_model='kandinsky-community/kandinsky-2-2-decoder', prior_model='kandinsky-community/kandinsky-2-2-prior', loss=None, unet_lora_config=None, prior_loss_weight=1.0, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, gradient_checkpointing=False, enable_xformers=False)
Bases: mmengine.model.BaseModel
KandinskyV22 Decoder.
Args:
- scheduler (dict): Config of the scheduler.
- image_encoder (dict): Config of the image encoder.
- vae (dict): Config of the VAE.
- unet (dict): Config of the UNet.
- decoder_model (str): Pretrained model name of the decoder. Defaults to "kandinsky-community/kandinsky-2-2-decoder".
- prior_model (str): Pretrained model name of the prior. Defaults to "kandinsky-community/kandinsky-2-2-prior".
- loss (dict): Config of the loss. Defaults to dict(type='L2Loss', loss_weight=1.0) when left as None.
- unet_lora_config (dict, optional): The LoRA config dict for the UNet, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, and LoKr. The other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- prior_loss_weight (float): The weight of the prior preservation loss. It only takes effect when training DreamBooth with class images. Defaults to 1.0.
- prediction_type (str): The prediction type used for training. Choose between 'epsilon' and 'v_prediction', or leave it as None to use the default prediction type of the scheduler. Defaults to None.
- data_preprocessor (dict, optional): The pre-process config of SDDataPreprocessor.
- noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation. The recommended value for Input Perturbation is 0.1. Defaults to 0.0.
- vae_batch_size (int): The batch size of the VAE. Defaults to 8.
- gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
- enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.
- property device: torch.device
Get device information.
- Returns:
The device of the model.
- Return type:
torch.device
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]): The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional): The height in pixels of the generated image. Defaults to None.
- width (int, optional): The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps. Defaults to 50.
- output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- loss(model_pred, noise, latents, timesteps, weight=None)
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
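The prediction_type argument decides what the network is trained to predict, and therefore what the loss compares model_pred against. The sketch below shows the two conventional targets: for 'epsilon' the target is the sampled noise, and for 'v_prediction' it is the velocity v = sqrt(abar_t) * eps - sqrt(1 - abar_t) * x_0. It is an illustration on plain Python lists, not diffengine's code; `training_target` and `alpha_bar` are hypothetical names.

```python
import math


def training_target(prediction_type, latents, noise, alpha_bar):
    """Return the regression target for one sample given as flat lists of floats."""
    if prediction_type == "epsilon":
        # The network predicts the noise that was added.
        return list(noise)
    if prediction_type == "v_prediction":
        # The network predicts the velocity, a mix of noise and clean latents.
        signal_scale = math.sqrt(alpha_bar)
        noise_scale = math.sqrt(1.0 - alpha_bar)
        return [signal_scale * e - noise_scale * x
                for x, e in zip(latents, noise)]
    raise ValueError(f"Unknown prediction type: {prediction_type!r}")


eps_target = training_target("epsilon", [1.0, 2.0], [0.5, -0.5], alpha_bar=0.25)
v_target = training_target("v_prediction", [1.0, 2.0], [0.5, -0.5], alpha_bar=0.25)
```

When prediction_type is left as None, the documented behavior is to use the scheduler's default, so a real implementation would resolve the type from the scheduler before choosing a target.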
- _preprocess_model_input(latents, noise, timesteps)
Preprocess model input.
- Parameters:
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
- Return type:
torch.Tensor
- _forward_vae(img, num_batches)
Forward vae.
- Parameters:
img (torch.Tensor) –
num_batches (int) –
- Return type:
torch.Tensor
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
scheduler (dict) –
image_encoder (dict) –
vae (dict) –
unet (dict) –
decoder_model (str) –
prior_model (str) –
loss (dict | None) –
unet_lora_config (dict | None) –
prior_loss_weight (float) –
prediction_type (str | None) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
vae_batch_size (int) –
gradient_checkpointing (bool) –
enable_xformers (bool) –
- class diffengine.models.editors.kandinsky.KandinskyV22DecoderDataPreprocessor(non_blocking=False)
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
KandinskyV22DecoderDataPreprocessor.
- Parameters:
non_blocking (Optional[bool]) –
- forward(data, training=False)
Preprocesses the data into the model input format.
After the data pre-processing of cast_data(), forward will stack the input tensor list into a batch tensor along the first dimension.
Args:
- data (dict): Data returned by the dataloader.
- training (bool): Whether to enable training-time augmentation.
Returns:
dict or list: Data in the same format as the model input.
- Parameters:
data (dict) –
training (bool) –
- Return type:
dict | list
- class diffengine.models.editors.kandinsky.KandinskyV22Prior(tokenizer, scheduler, text_encoder, image_encoder, prior, decoder_model='kandinsky-community/kandinsky-2-2-decoder', prior_model='kandinsky-community/kandinsky-2-2-prior', loss=None, prior_lora_config=None, prior_loss_weight=1.0, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, *, gradient_checkpointing=False, enable_xformers=False)
Bases: mmengine.model.BaseModel
KandinskyV22 Prior.
Args:
- tokenizer (dict): Config of the tokenizer.
- scheduler (dict): Config of the scheduler.
- text_encoder (dict): Config of the text encoder.
- image_encoder (dict): Config of the image encoder.
- prior (dict): Config of the prior.
- decoder_model (str): Pretrained model name of the decoder. Defaults to "kandinsky-community/kandinsky-2-2-decoder".
- prior_model (str): Pretrained model name of the prior. Defaults to "kandinsky-community/kandinsky-2-2-prior".
- loss (dict): Config of the loss. Defaults to dict(type='L2Loss', loss_weight=1.0) when left as None.
- prior_lora_config (dict, optional): The LoRA config dict for the prior, for example dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, and LoKr. The other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- prior_loss_weight (float): The weight of the prior preservation loss. It only takes effect when training DreamBooth with class images. Defaults to 1.0.
- data_preprocessor (dict, optional): The pre-process config of SDDataPreprocessor.
- noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation. The recommended value for Input Perturbation is 0.1. Defaults to 0.0.
- gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
- enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.
- property device: torch.device
Get device information.
- Returns:
The device of the model.
- Return type:
torch.device
- prepare_model()
Prepare model for training.
Disable gradient for some models.
- Return type:
None
- infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)
Inference function.
Args:
- prompt (List[str]): The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional): The height in pixels of the generated image. Defaults to None.
- width (int, optional): The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps. Defaults to 50.
- output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
- **kwargs: Other arguments.
- Parameters:
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
- Return type:
list[numpy.ndarray]
- loss(model_pred, noise, latents, timesteps, weight=None)
Calculate loss.
- Parameters:
model_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
- Return type:
dict[str, torch.Tensor]
- _preprocess_model_input(latents, noise, timesteps)
Preprocess model input.
- Parameters:
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
- Return type:
torch.Tensor
- forward(inputs, data_samples=None, mode='loss')
Forward function.
Args:
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples. Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:
dict: The loss dict.
- Parameters:
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
- Return type:
dict
- Parameters:
tokenizer (dict) –
scheduler (dict) –
text_encoder (dict) –
image_encoder (dict) –
prior (dict) –
decoder_model (str) –
prior_model (str) –
loss (dict | None) –
prior_lora_config (dict | None) –
prior_loss_weight (float) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
gradient_checkpointing (bool) –
enable_xformers (bool) –