diffengine.datasets.transforms

Package Contents

Classes

BaseTransform

Base class for all transformations.

DumpImage

Dump the image processed by the pipeline.

DumpMaskedImage

Dump the masked image processed by the pipeline.

PackInputs

Pack the inputs data.

LoadMask

Load Mask for multiple types.

AddConstantCaption

AddConstantCaption.

CenterCrop

CenterCrop.

CLIPImageProcessor

CLIPImageProcessor.

ComputeaMUSEdMicroConds

Compute aMUSEd micro_conds as 'micro_conds' in results.

ComputePixArtImgInfo

Compute original height and width + aspect ratio.

ComputeTimeIds

Compute time ids as 'time_ids' in results.

ConcatMultipleImgs

ConcatMultipleImgs.

GetMaskedImage

GetMaskedImage.

MaskToTensor

MaskToTensor.

MultiAspectRatioResizeCenterCrop

Multi Aspect Ratio Resize and Center Crop.

RandomCrop

RandomCrop.

RandomHorizontalFlip

RandomHorizontalFlip.

RandomTextDrop

RandomTextDrop. Replace text to empty.

SaveImageShape

Save image shape as 'ori_img_shape' in results.

T5TextPreprocess

T5 Text Preprocess.

TimmImageProcessor

TimmImageProcessor.

TorchVisonTransformWrapper

TorchVisonTransformWrapper.

TransformersImageProcessor

TransformersImageProcessor.

RandomChoice

Process data with a randomly chosen transform from given candidates.

Attributes

TRANSFORMS

class diffengine.datasets.transforms.BaseTransform[source]

Base class for all transformations.

__call__(results)[source]

Call function to transform data.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

abstract transform(results)[source]

Transform the data.

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as input and can add new items to the dict or modify existing ones. The result dict is returned in the end, which allows multiple transforms to be concatenated into a pipeline.

Args:

results (dict): The result dict.

Returns:

dict: The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None
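
Example:

A minimal sketch of a subclass; the class name and the ‘text’ key here are illustrative, not part of the library:

>>> from diffengine.datasets.transforms import BaseTransform
>>> class LowercaseText(BaseTransform):
...     """Hypothetical transform: lowercase the 'text' entry."""
...     def transform(self, results: dict) -> dict:
...         # modify an existing item and return the dict so the
...         # transform can be chained with others in a pipeline
...         results["text"] = results["text"].lower()
...         return results
>>> LowercaseText()(dict(text="A DOG."))  # -> {'text': 'a dog.'}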

class diffengine.datasets.transforms.DumpImage(max_imgs, dump_dir)[source]

Dump the image processed by the pipeline.

Args:

max_imgs (int): Maximum number of images to dump.

dump_dir (str): Dump output directory.

__call__(results)[source]

Dump the input image to the specified directory.

No changes will be made.

Args:

results (dict): Result dict from loading pipeline.

Returns:

results (dict): Result dict from loading pipeline. (same as input)

Parameters:

results (dict) –

Return type:

dict

Parameters:
  • max_imgs (int) –

  • dump_dir (str) –

class diffengine.datasets.transforms.DumpMaskedImage(max_imgs, dump_dir)[source]

Dump the masked image processed by the pipeline.

Args:

max_imgs (int): Maximum number of images to dump.

dump_dir (str): Dump output directory.

__call__(results)[source]

Dump the input image to the specified directory.

No changes will be made.

Args:

results (dict): Result dict from loading pipeline.

Returns:

results (dict): Result dict from loading pipeline. (same as input)

Parameters:

results (dict) –

Return type:

dict

Parameters:
  • max_imgs (int) –

  • dump_dir (str) –

class diffengine.datasets.transforms.PackInputs(input_keys=None, skip_to_tensor_key=None)[source]

Bases: diffengine.datasets.transforms.BaseTransform

Pack the inputs data.

Required Keys:

  • input_keys

Deleted Keys:

All other keys in the dict.

Args:

input_keys (List[str]): The keys of elements to feed into the model forwarding. Defaults to [‘img’, ‘text’].

skip_to_tensor_key (List[str]): The keys of elements that skip to_tensor. Defaults to [‘text’].

transform(results)[source]

Transform the data.

Parameters:

results (dict) –

Return type:

dict

Parameters:
  • input_keys (list[str] | None) –

  • skip_to_tensor_key (list[str] | None) –
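
Example:

A sketch of a pipeline config in which PackInputs is the terminal step; the preceding transforms are illustrative:

>>> pipeline = [
>>>     dict(type='SaveImageShape'),
>>>     dict(type='torchvision/Resize', size=512),
>>>     dict(type='RandomHorizontalFlip', p=0.5),
>>>     dict(type='PackInputs', input_keys=['img', 'text']),
>>> ]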

class diffengine.datasets.transforms.LoadMask(mask_mode='bbox', mask_config=None)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

Load Mask for multiple types.

Copied from https://github.com/open-mmlab/mmagic/blob/main/mmagic/utils/trans_utils.py

Reference from: mmagic.datasets.transforms.loading.LoadMask

For different types of mask, users need to provide the corresponding config dict.

Example config for bbox:

config = dict(max_bbox_shape=128)

Example config for irregular:

config = dict(
    num_vertices=(4, 12),
    max_angle=4.,
    length_range=(10, 100),
    brush_width=(10, 40),
    area_ratio_range=(0.15, 0.5))

Example config for ff:

config = dict(
    num_vertices=(4, 12),
    mean_angle=1.2,
    angle_range=0.4,
    brush_width=(12, 40))

Args:

mask_mode (str): Mask mode in [‘bbox’, ‘irregular’, ‘ff’, ‘set’, ‘whole’]. Default: ‘bbox’.

  • bbox: square bounding box masks.

  • irregular: irregular holes.

  • ff: free-form holes from DeepFillv2.

  • set: randomly get a mask from a mask set.

  • whole: use the whole image as mask.

mask_config (dict): Params for creating masks. Each type of mask needs different configs. Default: None.

transform(results)[source]

Transform function.

Args:
results (dict): A dict containing the necessary information and data for augmentation.

Returns:

dict: A dict containing the processed data and information.

Parameters:

results (dict) –

Return type:

dict

Parameters:
  • mask_mode (str) –

  • mask_config (dict | None) –
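
Example:

A usage sketch instantiating the ‘bbox’ mode with the config shown above:

>>> from diffengine.datasets.transforms import LoadMask
>>> load_mask = LoadMask(mask_mode='bbox',
...                      mask_config=dict(max_bbox_shape=128))
>>> # or, equivalently, as a pipeline entry:
>>> # dict(type='LoadMask', mask_mode='bbox',
>>> #      mask_config=dict(max_bbox_shape=128))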

diffengine.datasets.transforms.TRANSFORMS[source]

class diffengine.datasets.transforms.AddConstantCaption(constant_caption, keys=None)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

AddConstantCaption.

Example: “a dog.” + constant_caption=“in szn style” -> “a dog. in szn style”

Args:

constant_caption (str): constant_caption to add.

keys (List[str], optional): keys to apply augmentation from results. Defaults to None.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • constant_caption (str) –

  • keys (list[str] | None) –
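
Example:

A usage sketch matching the docstring example above (assumes the default keys target ‘text’):

>>> from diffengine.datasets.transforms import AddConstantCaption
>>> add_caption = AddConstantCaption(constant_caption="in szn style")
>>> add_caption(dict(text="a dog."))  # -> {'text': 'a dog. in szn style'}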

class diffengine.datasets.transforms.CenterCrop(*args, size, keys=None, **kwargs)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

CenterCrop.

The difference from torchvision/CenterCrop is:

1. save the crop top-left as ‘crop_top_left’ and the crop bottom-right as ‘crop_bottom_right’ in results

Args:

size (sequence or int): Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made. If a sequence of length 1 is provided, it will be interpreted as (size[0], size[0]).

keys (List[str]): keys to apply augmentation from results.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘crop_top_left’ and ‘crop_bottom_right’ keys are added as crop points.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • size (collections.abc.Sequence[int] | int) –

  • keys (list[str] | None) –
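
Example:

A sketch of the extra bookkeeping, assuming ‘img’ holds a PIL image:

>>> from PIL import Image
>>> from diffengine.datasets.transforms import CenterCrop
>>> crop = CenterCrop(size=256, keys=['img'])
>>> results = crop(dict(img=Image.new('RGB', (512, 512))))
>>> # besides the cropped 'img', the crop points are recorded as
>>> # 'crop_top_left' (and the bottom-right counterpart) for use
>>> # by later transforms such as ComputeTimeIds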

class diffengine.datasets.transforms.CLIPImageProcessor(key='img', output_key='clip_img', pretrained=None, subfolder=None)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

CLIPImageProcessor.

Args:

key (str): key to apply augmentation from results. Defaults to ‘img’.

output_key (str): output_key after applying augmentation from results. Defaults to ‘clip_img’.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • key (str) –

  • output_key (str) –

  • pretrained (str | None) –

  • subfolder (str | None) –

class diffengine.datasets.transforms.ComputeaMUSEdMicroConds[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

Compute aMUSEd micro_conds as ‘micro_conds’ in results.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘micro_conds’ key is added with the micro-conditioning inputs.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

class diffengine.datasets.transforms.ComputePixArtImgInfo[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

Compute original height and width + aspect ratio.

Return ‘resolution’, ‘aspect_ratio’ in results

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘resolution’ and ‘aspect_ratio’ keys are added from the original image shape.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

class diffengine.datasets.transforms.ComputeTimeIds[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

Compute time ids as ‘time_ids’ in results.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘time_ids’ key is added, computed from the original image shape, crop coordinates, and target size.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None
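
Example:

Time ids are assembled from quantities recorded earlier in the pipeline, so ComputeTimeIds usually comes after the shape- and crop-recording transforms; a sketch (the exact key dependencies are an assumption):

>>> pipeline = [
>>>     dict(type='SaveImageShape'),         # records 'ori_img_shape'
>>>     dict(type='RandomCrop', size=1024),  # records 'crop_top_left'
>>>     dict(type='RandomHorizontalFlip', p=0.5),
>>>     dict(type='ComputeTimeIds'),         # builds 'time_ids'
>>>     dict(type='PackInputs', input_keys=['img', 'text', 'time_ids']),
>>> ]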

class diffengine.datasets.transforms.ConcatMultipleImgs(keys=None)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

ConcatMultipleImgs.

Args:

keys (List[str], optional): keys to apply augmentation from results. Defaults to None.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:

keys (list[str] | None) –

class diffengine.datasets.transforms.GetMaskedImage(key='masked_image')[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

GetMaskedImage.

Args:

key (str): output key. Defaults to ‘masked_image’.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:

key (str) –

class diffengine.datasets.transforms.MaskToTensor(key='mask')[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

MaskToTensor.

  1. Convert mask to tensor.

  2. Transpose mask from (H, W, 1) to (1, H, W)

Args:

key (str): key to apply augmentation from results. Defaults to ‘mask’.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:

key (str) –
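
Example:

The mask transforms are typically chained in an inpainting-style pipeline; a sketch (the ordering is an assumption based on each transform's inputs):

>>> pipeline = [
>>>     dict(type='LoadMask', mask_mode='bbox',
>>>          mask_config=dict(max_bbox_shape=128)),
>>>     dict(type='MaskToTensor'),    # (H, W, 1) -> (1, H, W)
>>>     dict(type='GetMaskedImage'),  # writes 'masked_image'
>>>     dict(type='PackInputs',
>>>          input_keys=['img', 'text', 'mask', 'masked_image']),
>>> ]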

class diffengine.datasets.transforms.MultiAspectRatioResizeCenterCrop(*args, sizes, keys=None, interpolation='bilinear', **kwargs)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

Multi Aspect Ratio Resize and Center Crop.

Args:

sizes (List[sequence]): List of desired output sizes of the crop. Each is a sequence like (h, w).

keys (List[str]): keys to apply augmentation from results.

interpolation (str): Desired interpolation enum defined by torchvision.transforms.InterpolationMode. Defaults to ‘bilinear’.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • sizes (list[collections.abc.Sequence[int]]) –

  • keys (list[str] | None) –

  • interpolation (str) –
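
Example:

A usage sketch with a small set of aspect-ratio buckets (the bucket list is illustrative):

>>> from diffengine.datasets.transforms import (
...     MultiAspectRatioResizeCenterCrop)
>>> crop = MultiAspectRatioResizeCenterCrop(
...     sizes=[(1024, 1024), (768, 1344), (1344, 768)],
...     interpolation='bilinear')
>>> # each image is resized and center-cropped to one of the
>>> # (h, w) buckets above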

class diffengine.datasets.transforms.RandomCrop(*args, size, keys=None, force_same_size=True, **kwargs)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

RandomCrop.

The difference from torchvision/RandomCrop is:

1. save the crop top-left as ‘crop_top_left’ and the crop bottom-right as ‘crop_bottom_right’ in results

2. apply the same random parameters to multiple keys like [‘img’, ‘condition_img’].

Args:

size (sequence or int): Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made. If a sequence of length 1 is provided, it will be interpreted as (size[0], size[0]).

keys (List[str]): keys to apply augmentation from results.

force_same_size (bool): Force the same size for all keys. Defaults to True.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:
dict: ‘crop_top_left’ and ‘crop_bottom_right’ keys are added as crop points.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • size (collections.abc.Sequence[int] | int) –

  • keys (list[str] | None) –

  • force_same_size (bool) –
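
Example:

To crop an image and a paired condition image identically (e.g., ControlNet-style training), pass both keys; a sketch assuming both entries hold same-size images:

>>> from diffengine.datasets.transforms import RandomCrop
>>> crop = RandomCrop(size=512, keys=['img', 'condition_img'])
>>> # the same crop window is applied to both keys, and
>>> # 'crop_top_left' / 'crop_bottom_right' are recorded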

class diffengine.datasets.transforms.RandomHorizontalFlip(*args, p=0.5, keys=None, **kwargs)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

RandomHorizontalFlip.

The difference from torchvision/RandomHorizontalFlip is:

1. update ‘crop_top_left’ and ‘crop_bottom_right’ if they exist.

2. apply the same random parameters to multiple keys like [‘img’, ‘condition_img’].

Args:

p (float): probability of the image being flipped. Default value is 0.5.

keys (List[str]): keys to apply augmentation from results.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘crop_top_left’ key is updated to reflect the flip.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • p (float) –

  • keys (list[str] | None) –

class diffengine.datasets.transforms.RandomTextDrop(p=0.1, keys=None)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

RandomTextDrop. Replace text with an empty string.

Args:

p (float): probability of replacing the text with an empty string. Defaults to 0.1.

keys (List[str]): keys to apply augmentation from results.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • p (float) –

  • keys (list[str] | None) –
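
Example:

Randomly dropping the caption is the usual way to enable classifier-free guidance during training; a usage sketch (assumes the default keys target ‘text’):

>>> from diffengine.datasets.transforms import RandomTextDrop
>>> drop = RandomTextDrop(p=0.1)
>>> results = drop(dict(text="a dog."))
>>> # with probability 0.1, results['text'] is now ''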

class diffengine.datasets.transforms.SaveImageShape[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

Save image shape as ‘ori_img_shape’ in results.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘ori_img_shape’ key is added as original image shape.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

class diffengine.datasets.transforms.T5TextPreprocess(keys=None, *, clean_caption=True)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

T5 Text Preprocess.

Args:

keys (List[str]): keys to apply augmentation from results.

clean_caption (bool): whether to clean the caption. Defaults to True.

_clean_caption(caption)[source]

Clean caption.

Copied from diffusers.pipelines.deepfloyd_if.pipeline_if.IFPipeline._clean_caption

Parameters:

caption (str) –

Return type:

str

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • keys (list[str] | None) –

  • clean_caption (bool) –

class diffengine.datasets.transforms.TimmImageProcessor(pretrained, key='img', output_key='clip_img')[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

TimmImageProcessor.

Args:

pretrained (str): pretrained model name.

key (str): key to apply augmentation from results. Defaults to ‘img’.

output_key (str): output_key after applying augmentation from results. Defaults to ‘clip_img’.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • pretrained (str) –

  • key (str) –

  • output_key (str) –

class diffengine.datasets.transforms.TorchVisonTransformWrapper(transform, *args, keys=None, **kwargs)[source]

TorchVisonTransformWrapper.

We can use torchvision.transforms like dict(type=’torchvision/Resize’, size=512)

Args:

transform (str): The name of the transform, for example torchvision/Resize.

keys (List[str]): keys to apply augmentation from results.

__call__(results)[source]

Call transform.

Parameters:

results (dict) –

Return type:

dict

__repr__()[source]

Repr.

Return type:

str

Parameters:

keys (list[str] | None) –
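
Example:

A sketch of referencing torchvision transforms via the ‘torchvision/’ prefix; extra kwargs are forwarded to the wrapped transform:

>>> pipeline = [
>>>     dict(type='torchvision/Resize', size=512,
>>>          interpolation='bilinear'),
>>>     dict(type='torchvision/ToTensor'),
>>>     dict(type='torchvision/Normalize', mean=[0.5], std=[0.5]),
>>> ]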

class diffengine.datasets.transforms.TransformersImageProcessor(key='img', output_key='clip_img', pretrained=None)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

TransformersImageProcessor.

Args:

key (str): key to apply augmentation from results. Defaults to ‘img’.

output_key (str): output_key after applying augmentation from results. Defaults to ‘clip_img’.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • key (str) –

  • output_key (str) –

  • pretrained (str | None) –

class diffengine.datasets.transforms.RandomChoice(transforms, prob=None)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

Process data with a randomly chosen transform from given candidates.

Copied from mmcv/transforms/wrappers.py.

Args:

transforms (list[list]): A list of transform candidates, each is a sequence of transforms.

prob (list[float], optional): The probabilities associated with each pipeline. The length should be equal to the pipeline number and the sum should be 1. If not given, a uniform distribution will be assumed.

Examples:

>>> # config
>>> pipeline = [
>>>     dict(type='RandomChoice',
>>>         transforms=[
>>>             [dict(type='RandomHorizontalFlip')],  # subpipeline 1
>>>             [dict(type='RandomRotate')],  # subpipeline 2
>>>         ]
>>>     )
>>> ]

__iter__()[source]

Iterate over transforms.

Return type:

collections.abc.Iterator

random_pipeline_index()[source]

Return a random transform index.

Return type:

int

transform(results)[source]

Randomly choose a transform to apply.

Parameters:

results (dict) –

Return type:

dict | None

Parameters:
  • transforms (list[Transform | list[Transform]]) –

  • prob (list[float] | None) –