diffengine.datasets.transforms.processing
Module Contents
Classes
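| Class | Summary |
|---|---|
| TorchVisonTransformWrapper | Wrapper that makes torchvision.transforms usable in pipeline configs. |
| SaveImageShape | Save image shape as 'ori_img_shape' in results. |
| RandomCrop | Random crop that records crop coordinates and shares random parameters across keys. |
| CenterCrop | Center crop that records crop coordinates. |
| MultiAspectRatioResizeCenterCrop | Multi-aspect-ratio resize and center crop. |
| RandomHorizontalFlip | Random horizontal flip that updates crop coordinates and shares random parameters across keys. |
| ComputeTimeIds | Compute time ids as 'time_ids' in results. |
| ComputePixArtImgInfo | Compute the original height and width and the aspect ratio. |
| CLIPImageProcessor | CLIPImageProcessor. |
| RandomTextDrop | Randomly replace text with an empty string. |
| T5TextPreprocess | T5 text preprocessing. |
| MaskToTensor | Convert a mask to a tensor. |
| GetMaskedImage | GetMaskedImage. |
| AddConstantCaption | Append a constant caption to text. |
| ConcatMultipleImgs | ConcatMultipleImgs. |
| ComputeaMUSEdMicroConds | Compute aMUSEd micro_conds as 'micro_conds' in results. |
| TransformersImageProcessor | TransformersImageProcessor. |
| TimmImageProcessor | TimmImageProcessor. |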
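Functions
| Function | Summary |
|---|---|
| _str_to_torch_dtype | Map a string to a torch.dtype. |
| _interpolation_modes_from_str | Map a string to a torchvision InterpolationMode. |
| register_vision_transforms | Register vision transforms. |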
Attributes
- diffengine.datasets.transforms.processing._str_to_torch_dtype(t)
Map a string to the corresponding torch.dtype.
- Parameters:
t (str) – Name of the dtype.
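- diffengine.datasets.transforms.processing._interpolation_modes_from_str(t)
Map a string to the corresponding torchvision.transforms.InterpolationMode.
- Parameters:
t (str) – Name of the interpolation mode, for example 'bilinear'.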
- class diffengine.datasets.transforms.processing.TorchVisonTransformWrapper(transform, *args, keys=None, **kwargs)
TorchVisonTransformWrapper.
Wraps a transform from torchvision.transforms so that it can be referenced in a config, for example dict(type='torchvision/Resize', size=512).
- Parameters:
transform (str) – Name of the transform, for example torchvision/Resize.
keys (list[str] | None) – Keys in results to which the augmentation is applied.
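For illustration, a pipeline in an mmengine-style Python config could reference wrapped torchvision transforms by their 'torchvision/' name alongside the other transforms on this page. Only the dict(type='torchvision/Resize', size=512) form comes from the docstring above. The variable name train_pipeline, the choice of transforms, and their order are assumptions, so read this as a config sketch rather than a tested recipe.

```python
# Hypothetical pipeline config. Only the dict(type="torchvision/Resize", ...)
# form is documented above; everything else is an illustrative assumption.
train_pipeline = [
    dict(type="SaveImageShape"),  # records 'ori_img_shape'
    # torchvision.transforms.Resize, wrapped by TorchVisonTransformWrapper
    dict(type="torchvision/Resize", size=512, keys=["img"]),
    dict(type="RandomCrop", size=512),  # records 'crop_top_left'
    dict(type="RandomHorizontalFlip", p=0.5),
    dict(type="ComputeTimeIds"),  # builds 'time_ids'
    dict(type="torchvision/ToTensor", keys=["img"]),
]
```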
- diffengine.datasets.transforms.processing.register_vision_transforms()
Register vision transforms.
Register the transforms in torchvision.transforms to the TRANSFORMS registry.
- Returns:
A list of the names of the registered transforms.
- Return type:
list[str]
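The registration step can be pictured with a minimal sketch. Here a plain dict stands in for the TRANSFORMS registry, and a stand-in module replaces torchvision.transforms. Several details are assumptions: the helper register_classes, the 'torchvision/' key prefix (inferred from the type='torchvision/Resize' example above), and the choice to store the raw classes. This page does not show how diffengine actually builds each registry entry.

```python
import inspect
import types


def register_classes(module: types.ModuleType, registry: dict,
                     prefix: str = "torchvision/") -> list:
    """Register every public class in `module` as registry[prefix + name]."""
    names = []
    for name, obj in vars(module).items():
        # Skip dunder/private attributes and anything that is not a class.
        if name.startswith("_") or not inspect.isclass(obj):
            continue
        registry[prefix + name] = obj
        names.append(prefix + name)
    return names


# Stand-in for torchvision.transforms holding two transform classes.
fake_transforms = types.ModuleType("fake_transforms")
fake_transforms.Resize = type("Resize", (), {})
fake_transforms.CenterCrop = type("CenterCrop", (), {})

TRANSFORMS = {}  # plain dict standing in for the TRANSFORMS registry
print(register_classes(fake_transforms, TRANSFORMS))
# ['torchvision/Resize', 'torchvision/CenterCrop']
```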
- class diffengine.datasets.transforms.processing.SaveImageShape
Bases: diffengine.datasets.transforms.base.BaseTransform
Save image shape as 'ori_img_shape' in results.
- class diffengine.datasets.transforms.processing.RandomCrop(*args, size, keys=None, force_same_size=True, **kwargs)
Bases: diffengine.datasets.transforms.base.BaseTransform
RandomCrop.
The differences from torchvision/RandomCrop are:
1. It saves the crop's top-left corner as 'crop_top_left' and its bottom-right corner as 'crop_bottom_right' in results.
2. It applies the same random parameters to multiple keys, such as ['img', 'condition_img'].
- Parameters:
size (collections.abc.Sequence[int] | int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made. If a sequence of length 1 is provided, it is interpreted as (size[0], size[0]).
keys (list[str] | None) – Keys in results to which the augmentation is applied.
force_same_size (bool) – Force the same size for all keys. Defaults to True.
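Sharing random parameters across keys can be sketched in plain Python. The idea is to draw the crop box once, record its corners, and then (in a real pipeline) crop every key with that same box. The helper name random_crop_box and the [top, left] / [bottom, right] ordering of the saved coordinates are assumptions. Image data is omitted so that only the box arithmetic is shown.

```python
import random


def random_crop_box(img_h, img_w, size, rng):
    """Draw one random crop box and return ([top, left], [bottom, right])."""
    # Normalize `size` as documented: int -> (size, size),
    # length-1 sequence -> (size[0], size[0]).
    if isinstance(size, int):
        crop_h = crop_w = size
    elif len(size) == 1:
        crop_h = crop_w = size[0]
    else:
        crop_h, crop_w = size
    if crop_h > img_h or crop_w > img_w:
        raise ValueError("crop size is larger than the image")
    # randint is inclusive, so every valid offset can be drawn.
    top = rng.randint(0, img_h - crop_h)
    left = rng.randint(0, img_w - crop_w)
    return [top, left], [top + crop_h, left + crop_w]


rng = random.Random(0)
top_left, bottom_right = random_crop_box(600, 800, 512, rng)
# The same box would then crop every key, e.g. 'img' and 'condition_img',
# so the images stay pixel-aligned.
results = {"crop_top_left": top_left, "crop_bottom_right": bottom_right}
```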
- class diffengine.datasets.transforms.processing.CenterCrop(*args, size, keys=None, **kwargs)
Bases: diffengine.datasets.transforms.base.BaseTransform
CenterCrop.
The difference from torchvision/CenterCrop is that it saves the crop's top-left corner as 'crop_top_left' and its bottom-right corner as 'crop_bottom_right' in results.
- Parameters:
size (collections.abc.Sequence[int] | int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made. If a sequence of length 1 is provided, it is interpreted as (size[0], size[0]).
keys (list[str] | None) – Keys in results to which the augmentation is applied.
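The recorded coordinates of a center crop follow from simple arithmetic. The sketch below uses the int(round(diff / 2.0)) rounding that torchvision's center_crop applies. It assumes the crop fits inside the image (no padding), and the helper name center_crop_box is hypothetical.

```python
def center_crop_box(img_h, img_w, crop_h, crop_w):
    """Return ([top, left], [bottom, right]) of a centered crop."""
    # Same rounding as torchvision's center_crop: int(round(diff / 2.0)).
    top = int(round((img_h - crop_h) / 2.0))
    left = int(round((img_w - crop_w) / 2.0))
    return [top, left], [top + crop_h, left + crop_w]


print(center_crop_box(600, 800, 512, 512))  # ([44, 144], [556, 656])
```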
- class diffengine.datasets.transforms.processing.MultiAspectRatioResizeCenterCrop(*args, sizes, keys=None, interpolation='bilinear', **kwargs)
Bases: diffengine.datasets.transforms.base.BaseTransform
Multi-aspect-ratio resize and center crop.
- Parameters:
sizes (list[collections.abc.Sequence[int]]) – List of desired output sizes of the crop, each a sequence like (h, w).
keys (list[str] | None) – Keys in results to which the augmentation is applied.
interpolation (str) – Desired interpolation mode, as defined by torchvision.transforms.InterpolationMode. Defaults to 'bilinear'.
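One plausible reading of multi-aspect-ratio resize and center crop is bucketing. The transform picks the entry of sizes whose h/w ratio is closest to the image's, scales the image until it covers that size, and then center-crops it as CenterCrop does. The closest-ratio rule, the cover-style scaling, and the helper names are assumptions, not taken from the diffengine source.

```python
def pick_bucket(img_h, img_w, sizes):
    """Pick the (h, w) in `sizes` whose h/w ratio is closest to the image's."""
    ratio = img_h / img_w
    return min(sizes, key=lambda s: abs(s[0] / s[1] - ratio))


def resize_to_cover(img_h, img_w, target_h, target_w):
    """Scale (img_h, img_w) so that it fully covers (target_h, target_w)."""
    scale = max(target_h / img_h, target_w / img_w)
    return round(img_h * scale), round(img_w * scale)


sizes = [(1024, 1024), (768, 1344), (1344, 768)]
bucket = pick_bucket(600, 1000, sizes)           # (768, 1344)
resized = resize_to_cover(600, 1000, *bucket)    # (806, 1344)
# Center-crop offsets inside the resized image.
top = int(round((resized[0] - bucket[0]) / 2.0))   # 19
left = int(round((resized[1] - bucket[1]) / 2.0))  # 0
```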
- class diffengine.datasets.transforms.processing.RandomHorizontalFlip(*args, p=0.5, keys=None, **kwargs)
Bases: diffengine.datasets.transforms.base.BaseTransform
RandomHorizontalFlip.
The differences from torchvision/RandomHorizontalFlip are:
1. It updates 'crop_top_left' and 'crop_bottom_right' if they exist.
2. It applies the same random parameters to multiple keys, such as ['img', 'condition_img'].
- Parameters:
p (float) – Probability of the image being flipped. Defaults to 0.5.
keys (list[str] | None) – Keys in results to which the augmentation is applied.
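Updating the crop coordinates after a horizontal flip amounts to mirroring the box. Inside an image of width W, the old right edge becomes the new left edge. The sketch assumes boxes are stored as [y, x] pairs and that W is the width of the image the crop was taken from. Which width diffengine uses is not stated on this page, and the helper name is hypothetical.

```python
def flip_crop_box(top_left, bottom_right, img_w):
    """Mirror a crop box horizontally inside an image of width `img_w`."""
    (top, left), (bottom, right) = top_left, bottom_right
    # y coordinates are unchanged; x coordinates are reflected about the width.
    return [top, img_w - right], [bottom, img_w - left]


print(flip_crop_box([10, 100], [522, 612], img_w=1000))
# ([10, 388], [522, 900])
```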
- class diffengine.datasets.transforms.processing.ComputeTimeIds
Bases: diffengine.datasets.transforms.base.BaseTransform
Compute time ids as 'time_ids' in results.
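SDXL conditions on six extra numbers. In diffusers they are built as original_size + crops_coords_top_left + target_size. Suppose ComputeTimeIds follows the same order, reading 'ori_img_shape' (from SaveImageShape), 'crop_top_left' (from RandomCrop or CenterCrop), and the final image size. Then the computation reduces to a list concatenation. That correspondence, the function name, and the img_shape argument are assumptions.

```python
def compute_time_ids(ori_img_shape, crop_top_left, img_shape):
    """Concatenate original (h, w), crop (top, left) and target (h, w)."""
    # Order follows diffusers' SDXL convention:
    # original_size + crops_coords_top_left + target_size.
    return list(ori_img_shape) + list(crop_top_left) + list(img_shape)


print(compute_time_ids([768, 1024], [0, 170], [1024, 1024]))
# [768, 1024, 0, 170, 1024, 1024]
```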
- class diffengine.datasets.transforms.processing.ComputePixArtImgInfo
Bases: diffengine.datasets.transforms.base.BaseTransform
Compute the original height and width and the aspect ratio.
Store them as 'resolution' and 'aspect_ratio' in results.
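PixArt-α pipelines in diffusers pass the resolution as (height, width) and the aspect ratio as height / width. The sketch below assumes ComputePixArtImgInfo uses the same conventions and takes the original shape from 'ori_img_shape'. Both points are assumptions, and the helper name is hypothetical.

```python
def compute_pixart_img_info(ori_img_shape):
    """Return PixArt-style 'resolution' ([h, w]) and 'aspect_ratio' (h / w)."""
    h, w = ori_img_shape
    return {"resolution": [h, w], "aspect_ratio": h / w}


info = compute_pixart_img_info([768, 1024])
print(info)  # {'resolution': [768, 1024], 'aspect_ratio': 0.75}
```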
- class diffengine.datasets.transforms.processing.CLIPImageProcessor(key='img', output_key='clip_img', pretrained=None, subfolder=None)
Bases: diffengine.datasets.transforms.base.BaseTransform
CLIPImageProcessor.
- Parameters:
key (str) – Key in results to which the processor is applied. Defaults to 'img'.
output_key (str) – Key in results under which the processed output is stored. Defaults to 'clip_img'.
pretrained (str | None)
subfolder (str | None)
- class diffengine.datasets.transforms.processing.RandomTextDrop(p=0.1, keys=None)
Bases: diffengine.datasets.transforms.base.BaseTransform
RandomTextDrop. Randomly replace text with an empty string.
- Parameters:
p (float) – Probability of replacing the text with an empty string. Defaults to 0.1.
keys (list[str] | None) – Keys in results to which the augmentation is applied.
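The drop itself is a single Bernoulli draw per sample. The sketch assumes that one draw is shared by all keys. Whether diffengine draws once or per key is not stated here, and the helper name is hypothetical.

```python
import random


def random_text_drop(results, keys, p, rng):
    """With probability p, replace the text under every key in `keys` with ""."""
    # A single draw per sample, shared by all keys (assumption of this sketch).
    if rng.random() < p:
        for key in keys:
            results[key] = ""
    return results


rng = random.Random(0)
print(random_text_drop({"text": "a dog"}, ["text"], p=1.0, rng=rng))  # {'text': ''}
print(random_text_drop({"text": "a dog"}, ["text"], p=0.0, rng=rng))  # {'text': 'a dog'}
```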
- class diffengine.datasets.transforms.processing.T5TextPreprocess(keys=None, *, clean_caption=True)
Bases: diffengine.datasets.transforms.base.BaseTransform
T5 Text Preprocess.
- Parameters:
keys (list[str] | None) – Keys in results to which the preprocessing is applied.
clean_caption (bool) – Whether to clean the caption. Defaults to True.
- class diffengine.datasets.transforms.processing.MaskToTensor(key='mask')
Bases: diffengine.datasets.transforms.base.BaseTransform
MaskToTensor.
Convert a mask to a tensor, transposing it from (H, W, 1) to (1, H, W).
- Parameters:
key (str) – Key in results to which the conversion is applied. Defaults to 'mask'.
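With NumPy, the axis move is usually written as mask.transpose(2, 0, 1) before converting to a tensor. The dependency-free sketch below shows only the (H, W, 1) to (1, H, W) reordering on nested lists. The helper name hw1_to_1hw is hypothetical, and this page does not say whether diffengine also changes the dtype or value range.

```python
def hw1_to_1hw(mask):
    """Move the trailing channel axis to the front: (H, W, 1) -> (1, H, W)."""
    # mask[y][x] is a length-1 list holding that pixel's single channel value.
    return [[[pixel[0] for pixel in row] for row in mask]]


mask = [[[0], [1], [1]],
        [[1], [0], [0]]]  # H=2, W=3, C=1
print(hw1_to_1hw(mask))  # [[[0, 1, 1], [1, 0, 0]]]
```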
- class diffengine.datasets.transforms.processing.GetMaskedImage(key='masked_image')
Bases: diffengine.datasets.transforms.base.BaseTransform
GetMaskedImage.
- Parameters:
key (str) – Key in results under which the masked image is stored. Defaults to 'masked_image'.
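This page does not define how the masked image is computed. A plausible reading, assumed here, is the diffusers inpainting convention masked_image = image * (mask < 0.5), which zeroes the pixels where the mask is 1 (the region to repaint). The sketch applies that rule to flattened lists. The helper name, the 0.5 threshold, and the use of 'img' and 'mask' as inputs are all assumptions.

```python
def get_masked_image(img, mask, threshold=0.5):
    """Keep pixels where mask < threshold and zero the rest (flat lists)."""
    return [v if m < threshold else 0.0 for v, m in zip(img, mask)]


img = [0.2, -0.5, 0.9, 1.0]
mask = [0.0, 1.0, 0.4, 0.6]  # 1.0 marks pixels to repaint
print(get_masked_image(img, mask))  # [0.2, 0.0, 0.9, 0.0]
```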
- class diffengine.datasets.transforms.processing.AddConstantCaption(constant_caption, keys=None)
Bases: diffengine.datasets.transforms.base.BaseTransform
AddConstantCaption.
Example: with constant_caption="in szn style", the caption "a dog." becomes "a dog. in szn style".
- Parameters:
constant_caption (str) – Constant caption to append.
keys (list[str] | None) – Keys in results to which the augmentation is applied. Defaults to None.
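The documented example ("a dog." becomes "a dog. in szn style") corresponds to joining the caption and the constant with a single space. The sketch reproduces that example. It requires explicit keys because the default used when keys is None is not stated here, and the helper name is hypothetical.

```python
def add_constant_caption(results, keys, constant_caption):
    """Append `constant_caption` to each text key, separated by a space."""
    for key in keys:
        results[key] = f"{results[key]} {constant_caption}"
    return results


print(add_constant_caption({"text": "a dog."}, ["text"], "in szn style"))
# {'text': 'a dog. in szn style'}
```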
- class diffengine.datasets.transforms.processing.ConcatMultipleImgs(keys=None)
Bases: diffengine.datasets.transforms.base.BaseTransform
ConcatMultipleImgs.
- Parameters:
keys (list[str] | None) – Keys in results to which the augmentation is applied. Defaults to None.
- class diffengine.datasets.transforms.processing.ComputeaMUSEdMicroConds
Bases: diffengine.datasets.transforms.base.BaseTransform
Compute aMUSEd micro_conds as 'micro_conds' in results.
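At inference, the diffusers aMUSEd pipeline builds five micro-conditioning values: width, height, the two crop coordinates, and an aesthetic score (default 6). The sketch below assumes ComputeaMUSEdMicroConds uses that layout, filled from 'ori_img_shape' and 'crop_top_left'. The ordering, the 6.0 default, and the helper name are assumptions, since this page does not list the values diffengine stores.

```python
def compute_amused_micro_conds(ori_img_shape, crop_top_left, aesthetic_score=6.0):
    """Build [orig_width, orig_height, crop_top, crop_left, aesthetic_score]."""
    h, w = ori_img_shape
    top, left = crop_top_left
    # Width comes before height, mirroring the diffusers inference layout.
    return [float(w), float(h), float(top), float(left), float(aesthetic_score)]


print(compute_amused_micro_conds([512, 768], [0, 64]))
# [768.0, 512.0, 0.0, 64.0, 6.0]
```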
- class diffengine.datasets.transforms.processing.TransformersImageProcessor(key='img', output_key='clip_img', pretrained=None)
Bases: diffengine.datasets.transforms.base.BaseTransform
TransformersImageProcessor.
- Parameters:
key (str) – Key in results to which the processor is applied. Defaults to 'img'.
output_key (str) – Key in results under which the processed output is stored. Defaults to 'clip_img'.
pretrained (str | None)
- class diffengine.datasets.transforms.processing.TimmImageProcessor(pretrained, key='img', output_key='clip_img')
Bases: diffengine.datasets.transforms.base.BaseTransform
TimmImageProcessor.
- Parameters:
pretrained (str) – Pretrained model name.
key (str) – Key in results to which the processor is applied. Defaults to 'img'.
output_key (str) – Key in results under which the processed output is stored. Defaults to 'clip_img'.