diffengine.datasets.transforms.processing

Module Contents

Classes

TorchVisonTransformWrapper

TorchVisonTransformWrapper.

SaveImageShape

Save image shape as 'ori_img_shape' in results.

RandomCrop

RandomCrop.

CenterCrop

CenterCrop.

MultiAspectRatioResizeCenterCrop

Multi Aspect Ratio Resize and Center Crop.

RandomHorizontalFlip

RandomHorizontalFlip.

ComputeTimeIds

Compute time ids as 'time_ids' in results.

ComputePixArtImgInfo

Compute original height and width plus aspect ratio.

CLIPImageProcessor

CLIPImageProcessor.

RandomTextDrop

RandomTextDrop. Replace text with an empty string.

T5TextPreprocess

T5 Text Preprocess.

MaskToTensor

MaskToTensor.

GetMaskedImage

GetMaskedImage.

AddConstantCaption

AddConstantCaption.

ConcatMultipleImgs

ConcatMultipleImgs.

ComputeaMUSEdMicroConds

Compute aMUSEd micro_conds as 'micro_conds' in results.

TransformersImageProcessor

TransformersImageProcessor.

TimmImageProcessor

TimmImageProcessor.

Functions

_str_to_torch_dtype(t)

Map to torch.dtype.

_interpolation_modes_from_str(t)

Map to Interpolation.

register_vision_transforms()

Register vision transforms.

Attributes

VISION_TRANSFORMS

diffengine.datasets.transforms.processing._str_to_torch_dtype(t)[source]

Map to torch.dtype.

Parameters:

t (str) –

diffengine.datasets.transforms.processing._interpolation_modes_from_str(t)[source]

Map to Interpolation.

Parameters:

t (str) –

class diffengine.datasets.transforms.processing.TorchVisonTransformWrapper(transform, *args, keys=None, **kwargs)[source]

TorchVisonTransformWrapper.

We can use torchvision.transforms like dict(type='torchvision/Resize', size=512).

Args:

transform (str): The name of transform. For example, torchvision/Resize.

keys (List[str]): keys to apply augmentation from results.

__call__(results)[source]

Call transform.

Parameters:

results (dict) –

Return type:

dict

__repr__()[source]

Repr.

Return type:

str

Parameters:

keys (list[str] | None) –
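As a rough illustration of the config style described above, a pipeline entry might look like the following. Only the torchvision/Resize type with size=512 comes from the docstring; the name train_pipeline, the keys value, and the second entry are assumptions made for this sketch.

```python
# Hypothetical pipeline config. Only the first entry's type and size are
# taken from the docstring; everything else is an illustrative assumption.
train_pipeline = [
    # Wrapped torchvision transform, selected by its registry name.
    dict(type="torchvision/Resize", size=512, keys=["img"]),
    # Assumption: other torchvision transforms follow the same naming scheme.
    dict(type="torchvision/ToTensor"),
]

for step in train_pipeline:
    # Split "torchvision/Resize" into its prefix and transform name.
    prefix, _, name = step["type"].partition("/")
    print(prefix, name)
```

The remaining keyword arguments (here size) would be forwarded to the wrapped transform through **kwargs, as the class signature suggests.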

diffengine.datasets.transforms.processing.register_vision_transforms()[source]

Register vision transforms.

Register transforms in torchvision.transforms to the TRANSFORMS registry.

Returns:

A list of registered transforms’ names.

Return type:

List[str]

diffengine.datasets.transforms.processing.VISION_TRANSFORMS[source]

class diffengine.datasets.transforms.processing.SaveImageShape[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

Save image shape as ‘ori_img_shape’ in results.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘ori_img_shape’ key is added as original image shape.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

class diffengine.datasets.transforms.processing.RandomCrop(*args, size, keys=None, force_same_size=True, **kwargs)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

RandomCrop.

The differences from torchvision/RandomCrop are:

1. Save the crop corners as ‘crop_top_left’ and ‘crop_bottom_right’ in results.

2. Apply the same random parameters to multiple keys like [‘img’, ‘condition_img’].

Args:

size (sequence or int): Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

keys (List[str]): keys to apply augmentation from results.

force_same_size (bool): Force same size for all keys. Defaults to True.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘crop_top_left’ and ‘crop_bottom_right’ keys are added as crop points.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • size (collections.abc.Sequence[int] | int) –

  • keys (list[str] | None) –

  • force_same_size (bool) –
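The two differences listed for RandomCrop can be sketched in plain Python. This is not the library's implementation: images are modelled as nested lists of rows, and the helper names sample_crop_box and random_crop are hypothetical. It only shows one crop box being sampled once, applied to every key, and recorded in results.

```python
import random


def sample_crop_box(height, width, size, rng=random):
    """Sample one square crop box as (top, left, bottom, right)."""
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return top, left, top + size, left + size


def random_crop(results, size, keys=("img", "condition_img"), rng=random):
    """Crop every key with the same box and record the box in results."""
    first = results[keys[0]]
    height, width = len(first), len(first[0])
    top, left, bottom, right = sample_crop_box(height, width, size, rng)
    for key in keys:
        # Images are nested lists here; the real transform works on images.
        results[key] = [row[left:right] for row in results[key][top:bottom]]
    results["crop_top_left"] = [top, left]
    results["crop_bottom_right"] = [bottom, right]
    return results


# Each pixel stores its own (row, col) so the crop offset is easy to verify.
img = [[(r, c) for c in range(6)] for r in range(4)]
cond = [[-(r * 6 + c) for c in range(6)] for r in range(4)]
out = random_crop({"img": img, "condition_img": cond}, size=2,
                  rng=random.Random(0))
print(out["crop_top_left"], out["crop_bottom_right"])
```

Because the box is sampled once outside the loop, ‘img’ and ‘condition_img’ stay pixel-aligned after cropping.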

class diffengine.datasets.transforms.processing.CenterCrop(*args, size, keys=None, **kwargs)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

CenterCrop.

The difference from torchvision/CenterCrop is:

1. Save the crop corners as ‘crop_top_left’ and ‘crop_bottom_right’ in results.

Args:

size (sequence or int): Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

keys (List[str]): keys to apply augmentation from results.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘crop_top_left’ key is added as crop points.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • size (collections.abc.Sequence[int] | int) –

  • keys (list[str] | None) –

class diffengine.datasets.transforms.processing.MultiAspectRatioResizeCenterCrop(*args, sizes, keys=None, interpolation='bilinear', **kwargs)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

Multi Aspect Ratio Resize and Center Crop.

Args:

sizes (List[sequence]): List of desired output sizes of the crop. Each entry is a sequence like (h, w).

keys (List[str]): keys to apply augmentation from results.

interpolation (str): Desired interpolation enum defined by torchvision.transforms.InterpolationMode. Defaults to ‘bilinear’.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • sizes (list[collections.abc.Sequence[int]]) –

  • keys (list[str] | None) –

  • interpolation (str) –
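The docstring does not say how one entry of sizes is chosen for a given image. A common approach in multi-aspect-ratio training, assumed here, is to pick the entry whose height/width ratio is closest to the image's. The function name pick_closest_size is hypothetical.

```python
def pick_closest_size(img_height, img_width, sizes):
    """Return the (h, w) entry whose aspect ratio is closest to the image's.

    Assumption: selection by nearest h/w ratio; the library may differ.
    """
    aspect_ratio = img_height / img_width
    return min(sizes, key=lambda hw: abs(hw[0] / hw[1] - aspect_ratio))


sizes = [(512, 512), (640, 384), (384, 640)]
print(pick_closest_size(1200, 800, sizes))  # tall image -> (640, 384)
print(pick_closest_size(600, 1000, sizes))  # wide image -> (384, 640)
```

After a size is chosen, the image would be resized and center-cropped to it, as the class name indicates.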

class diffengine.datasets.transforms.processing.RandomHorizontalFlip(*args, p=0.5, keys=None, **kwargs)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

RandomHorizontalFlip.

The differences from torchvision/RandomHorizontalFlip are:

1. Update ‘crop_top_left’ and ‘crop_bottom_right’ if they exist.

2. Apply the same random parameters to multiple keys like [‘img’, ‘condition_img’].

Args:

p (float): probability of the image being flipped. Default value is 0.5.

keys (List[str]): keys to apply augmentation from results.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘crop_top_left’ key is updated if it exists.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • p (float) –

  • keys (list[str] | None) –
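Why the crop coordinates need updating after a flip can be shown with a small geometric sketch. It assumes crop points are stored as (y, x) pairs measured in an image of a known width; the function name flip_crop_box is hypothetical, and the library's exact bookkeeping may differ.

```python
def flip_crop_box(crop_top_left, crop_bottom_right, width):
    """Mirror a (y, x) crop box horizontally inside an image of given width."""
    top, left = crop_top_left
    bottom, right = crop_bottom_right
    # After mirroring, the old right edge becomes the new left edge.
    return [top, width - right], [bottom, width - left]


print(flip_crop_box([10, 20], [110, 120], width=200))
# ([10, 80], [110, 180])
```

The vertical coordinates are unchanged, and applying the function twice returns the original box.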

class diffengine.datasets.transforms.processing.ComputeTimeIds[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

Compute time ids as ‘time_ids’ in results.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘time_ids’ key is added.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None
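The docstring does not list what ‘time_ids’ contains. A minimal sketch, assuming the SDXL-style layout of original size, crop top-left, and target size concatenated into six values, could look like this. The function name compute_time_ids and the example numbers are hypothetical.

```python
def compute_time_ids(ori_img_shape, crop_top_left, target_size):
    """Concatenate three (h, w)-style pairs into one flat list of six values.

    Assumption: SDXL-style ordering of original size, crop offset, target size.
    """
    return list(ori_img_shape) + list(crop_top_left) + list(target_size)


results = {"ori_img_shape": [768, 1024], "crop_top_left": [0, 128]}
results["time_ids"] = compute_time_ids(
    results["ori_img_shape"], results["crop_top_left"], target_size=[512, 512]
)
print(results["time_ids"])  # [768, 1024, 0, 128, 512, 512]
```

Under this assumption, SaveImageShape and a crop transform would need to run earlier in the pipeline so that ‘ori_img_shape’ and ‘crop_top_left’ are available.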

class diffengine.datasets.transforms.processing.ComputePixArtImgInfo[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

Compute original height and width plus aspect ratio.

Return ‘resolution’, ‘aspect_ratio’ in results

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘resolution’ and ‘aspect_ratio’ keys are added.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

class diffengine.datasets.transforms.processing.CLIPImageProcessor(key='img', output_key='clip_img', pretrained=None, subfolder=None)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

CLIPImageProcessor.

Args:

key (str): key to apply augmentation from results. Defaults to ‘img’.

output_key (str): output_key after applying augmentation from results. Defaults to ‘clip_img’.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • key (str) –

  • output_key (str) –

  • pretrained (str | None) –

  • subfolder (str | None) –

class diffengine.datasets.transforms.processing.RandomTextDrop(p=0.1, keys=None)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

RandomTextDrop. Replace text with an empty string.

Args:

p (float): probability of the text being dropped. Default value is 0.1.

keys (List[str]): keys to apply augmentation from results.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • p (float) –

  • keys (list[str] | None) –
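The behaviour described for RandomTextDrop can be sketched as follows. This is an illustration rather than the library code: the function name random_text_drop, the default key ‘text’, and the choice of a single random draw shared by all keys are assumptions.

```python
import random


def random_text_drop(results, p=0.1, keys=("text",), rng=random):
    """With probability p, replace the text under every key with ''."""
    # One random draw decides for all keys (an assumption of this sketch).
    if rng.random() < p:
        for key in keys:
            results[key] = ""
    return results


always = random_text_drop({"text": "a dog."}, p=1.0)
never = random_text_drop({"text": "a dog."}, p=0.0)
print(repr(always["text"]), repr(never["text"]))
```

Dropping captions this way is typically used so a model also learns an unconditional branch, which classifier-free guidance relies on at inference time.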

class diffengine.datasets.transforms.processing.T5TextPreprocess(keys=None, *, clean_caption=True)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

T5 Text Preprocess.

Args:

keys (List[str]): keys to apply augmentation from results.

clean_caption (bool): clean caption. Defaults to True.

_clean_caption(caption)[source]

Clean caption.

Copied from diffusers.pipelines.deepfloyd_if.pipeline_if.IFPipeline._clean_caption

Parameters:

caption (str) –

Return type:

str

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • keys (list[str] | None) –

  • clean_caption (bool) –

class diffengine.datasets.transforms.processing.MaskToTensor(key='mask')[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

MaskToTensor.

  1. Convert mask to tensor.

  2. Transpose mask from (H, W, 1) to (1, H, W)

Args:

key (str): key to apply augmentation from results. Defaults to ‘mask’.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:

key (str) –

class diffengine.datasets.transforms.processing.GetMaskedImage(key='masked_image')[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

GetMaskedImage.

Args:

key (str): key to outputs. Defaults to ‘masked_image’.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:

key (str) –

class diffengine.datasets.transforms.processing.AddConstantCaption(constant_caption, keys=None)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

AddConstantCaption.

Example: “a dog.” with constant_caption=“in szn style” -> “a dog. in szn style”

Args:

constant_caption (str): constant_caption to add.

keys (List[str], optional): keys to apply augmentation from results. Defaults to None.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • constant_caption (str) –

  • keys (list[str] | None) –
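The docstring example can be reproduced with a few lines of plain Python. The function name add_constant_caption and the default key ‘text’ are assumptions for this sketch; joining with a single space matches the example output above.

```python
def add_constant_caption(results, constant_caption, keys=("text",)):
    """Append constant_caption to the caption stored under each key."""
    for key in keys:
        # A single space separates the original caption from the suffix.
        results[key] = results[key] + " " + constant_caption
    return results


out = add_constant_caption({"text": "a dog."}, "in szn style")
print(out["text"])  # a dog. in szn style
```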

class diffengine.datasets.transforms.processing.ConcatMultipleImgs(keys=None)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

ConcatMultipleImgs.

Args:

keys (List[str], optional): keys to apply augmentation from results. Defaults to None.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:

keys (list[str] | None) –

class diffengine.datasets.transforms.processing.ComputeaMUSEdMicroConds[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

Compute aMUSEd micro_conds as ‘micro_conds’ in results.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Returns:

dict: ‘micro_conds’ key is added.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

class diffengine.datasets.transforms.processing.TransformersImageProcessor(key='img', output_key='clip_img', pretrained=None)[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

TransformersImageProcessor.

Args:

key (str): key to apply augmentation from results. Defaults to ‘img’.

output_key (str): output_key after applying augmentation from results. Defaults to ‘clip_img’.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • key (str) –

  • output_key (str) –

  • pretrained (str | None) –

class diffengine.datasets.transforms.processing.TimmImageProcessor(pretrained, key='img', output_key='clip_img')[source]

Bases: diffengine.datasets.transforms.base.BaseTransform

TimmImageProcessor.

Args:

pretrained (str): pretrained model name.

key (str): key to apply augmentation from results. Defaults to ‘img’.

output_key (str): output_key after applying augmentation from results. Defaults to ‘clip_img’.

transform(results)[source]

Transform.

Args:

results (dict): The result dict.

Parameters:

results (dict) –

Return type:

dict | tuple[list, list] | None

Parameters:
  • pretrained (str) –

  • key (str) –

  • output_key (str) –