diffengine.datasets.transforms¶
Submodules¶
Package Contents¶
Classes¶
BaseTransform | Base class for all transformations.
DumpImage | Dump the image processed by the pipeline.
DumpMaskedImage | Dump the masked image processed by the pipeline.
PackInputs | Pack the inputs data.
LoadMask | Load Mask for multiple types.
AddConstantCaption | AddConstantCaption.
CenterCrop | CenterCrop.
CLIPImageProcessor | CLIPImageProcessor.
ComputePixArtImgInfo | Compute Orig Height and Width + Aspect Ratio.
ComputeTimeIds | Compute time ids as 'time_ids' in results.
GetMaskedImage | GetMaskedImage.
MaskToTensor | MaskToTensor.
MultiAspectRatioResizeCenterCrop | Multi Aspect Ratio Resize and Center Crop.
RandomCrop | RandomCrop.
RandomHorizontalFlip | RandomHorizontalFlip.
RandomTextDrop | RandomTextDrop. Replace text with an empty string.
SaveImageShape | Save image shape as 'ori_img_shape' in results.
T5TextPreprocess | T5 Text Preprocess.
TorchVisonTransformWrapper | TorchVisonTransformWrapper.
RandomChoice | Process data with a randomly chosen transform from given candidates.
Attributes¶
- class diffengine.datasets.transforms.BaseTransform[source]¶
Base class for all transformations.
- __call__(results)[source]¶
Call function to transform data.
- Parameters:
results (dict) –
- Return type:
dict | tuple[list, list] | None
- abstract transform(results)[source]¶
Transform the data.
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as input and can add new items to the dict or modify existing ones. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.
Args:¶
results (dict): The result dict.
Returns:¶
dict: The result dict.
- Parameters:
results (dict) –
- Return type:
dict | tuple[list, list] | None
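A minimal sketch of the contract described above. The `BaseTransform` below is a local stand-in written for this example, not the class shipped in diffengine, and `AddSuffix` is a hypothetical subclass used only for illustration.

```python
from abc import ABC, abstractmethod


class BaseTransform(ABC):
    """Local stand-in for the documented base class (assumption)."""

    def __call__(self, results: dict):
        # __call__ delegates to transform().
        return self.transform(results)

    @abstractmethod
    def transform(self, results: dict):
        """Subclasses override this and return the (modified) dict."""


class AddSuffix(BaseTransform):
    """Hypothetical transform: appends a suffix to results['text']."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def transform(self, results: dict) -> dict:
        results["text"] = results["text"] + self.suffix
        return results


# Each transform returns the dict, so transforms chain into a pipeline.
pipeline = [AddSuffix(" in watercolor"), AddSuffix(", high detail")]
results = {"text": "a dog"}
for t in pipeline:
    results = t(results)
```

Returning the same dict from every step is what lets a list of transforms act as a pipeline.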
- class diffengine.datasets.transforms.DumpImage(max_imgs, dump_dir)[source]¶
Dump the image processed by the pipeline.
Args:¶
max_imgs (int): Maximum number of images to dump.
dump_dir (str): Dump output directory.
- Parameters:
max_imgs (int) –
dump_dir (str) –
- class diffengine.datasets.transforms.DumpMaskedImage(max_imgs, dump_dir)[source]¶
Dump the masked image processed by the pipeline.
Args:¶
max_imgs (int): Maximum number of images to dump.
dump_dir (str): Dump output directory.
- Parameters:
max_imgs (int) –
dump_dir (str) –
- class diffengine.datasets.transforms.PackInputs(input_keys=None, skip_to_tensor_key=None)[source]¶
Bases: diffengine.datasets.transforms.BaseTransform
Pack the inputs data.
Required Keys:
input_key
Deleted Keys:
All other keys in the dict.
Args:¶
- input_keys (List[str]): Keys of the elements to feed into the model's forward pass. Defaults to ['img', 'text'].
- skip_to_tensor_key (List[str]): Keys of the elements that skip the to_tensor conversion. Defaults to ['text'].
- Parameters:
input_keys (list[str] | None) –
skip_to_tensor_key (list[str] | None) –
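The key filtering described above can be sketched in plain Python. This is an illustration under assumptions: the function `pack_inputs` is hypothetical, nesting the kept items under an `"inputs"` key is assumed, and the tensor conversion step is left out entirely.

```python
def pack_inputs(results: dict, input_keys=None) -> dict:
    """Keep only the entries named in input_keys; drop everything else."""
    if input_keys is None:
        input_keys = ["img", "text"]  # documented default
    # Nesting under "inputs" is an assumption of this sketch.
    return {"inputs": {k: results[k] for k in input_keys if k in results}}


sample = {"img": [[0, 1], [1, 0]], "text": "a dog", "img_path": "dog.png"}
packed = pack_inputs(sample)
```

Here `img_path` is discarded because it is not listed in `input_keys`.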
- class diffengine.datasets.transforms.LoadMask(mask_mode='bbox', mask_config=None)[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
Load Mask for multiple types.
Copied from https://github.com/open-mmlab/mmagic/blob/main/mmagic/utils/trans_utils.py
Reference from: mmagic.datasets.transforms.loading.LoadMask
For different types of mask, users need to provide the corresponding config dict.
Example config for bbox:
config = dict(max_bbox_shape=128)
Example config for irregular:
config = dict( num_vertices=(4, 12), max_angle=4., length_range=(10, 100), brush_width=(10, 40), area_ratio_range=(0.15, 0.5))
Example config for ff:
config = dict( num_vertices=(4, 12), mean_angle=1.2, angle_range=0.4, brush_width=(12, 40))
Args:¶
- mask_mode (str): Mask mode in ['bbox', 'irregular', 'ff', 'set', 'whole']. Default: 'bbox'.
  * bbox: square bounding box masks.
  * irregular: irregular holes.
  * ff: free-form holes from DeepFillv2.
  * set: randomly get a mask from a mask set.
  * whole: use the whole image as mask.
- mask_config (dict): Params for creating masks. Each type of mask needs different configs. Default: None.
- Parameters:
mask_mode (str) –
mask_config (dict | None) –
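As a rough illustration, the example configs above could be placed in pipeline entries like this. The registry name `'LoadMask'` is assumed to match the class name, and the variable names are hypothetical.

```python
# Pipeline entries written as config dicts (registry key is an assumption).
bbox_mask = dict(
    type="LoadMask",
    mask_mode="bbox",
    mask_config=dict(max_bbox_shape=128),
)

ff_mask = dict(
    type="LoadMask",
    mask_mode="ff",
    mask_config=dict(
        num_vertices=(4, 12),
        mean_angle=1.2,
        angle_range=0.4,
        brush_width=(12, 40),
    ),
)
```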
- class diffengine.datasets.transforms.AddConstantCaption(constant_caption, keys=None)[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
AddConstantCaption.
Example: with constant_caption="in szn style", the caption "a dog." becomes "a dog. in szn style".
Args:¶
- constant_caption (str): The constant caption to append.
- keys (List[str], optional): Keys in results to apply the augmentation to. Defaults to None.
- Parameters:
constant_caption (str) –
keys (list[str] | None) –
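The behavior in the example above can be sketched as follows. The function name and the fallback key `'text'` are assumptions made for this illustration.

```python
def add_constant_caption(results: dict, constant_caption: str, keys=None) -> dict:
    """Append a fixed caption, separated by one space, to each text key."""
    if keys is None:
        keys = ["text"]  # assumed default key for this sketch
    for key in keys:
        results[key] = results[key] + " " + constant_caption
    return results


out = add_constant_caption({"text": "a dog."}, "in szn style")
```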
- class diffengine.datasets.transforms.CenterCrop(*args, size, keys=None, **kwargs)[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
CenterCrop.
The difference from torchvision/CenterCrop is:
1. Saves the crop coordinates as 'crop_top_left' and 'crop_bottom_right' in results.
Args:¶
- size (sequence or int): Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
- keys (List[str]): Keys in results to apply the augmentation to.
- Parameters:
size (collections.abc.Sequence[int] | int) –
keys (list[str] | None) –
- class diffengine.datasets.transforms.CLIPImageProcessor(key='img', output_key='clip_img', pretrained=None)[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
CLIPImageProcessor.
Args:¶
- key (str): Key in results to apply the augmentation to. Defaults to 'img'.
- output_key (str): Key under which the processed output is stored in results. Defaults to 'clip_img'.
- Parameters:
key (str) –
output_key (str) –
pretrained (str | None) –
- class diffengine.datasets.transforms.ComputePixArtImgInfo[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
Compute original height and width plus aspect ratio.
Returns 'resolution' and 'aspect_ratio' in results.
- class diffengine.datasets.transforms.ComputeTimeIds[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
Compute time ids as 'time_ids' in results.
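The page does not spell out how 'time_ids' is assembled. A rough sketch, assuming the SDXL-style convention of concatenating the original size, the crop offset, and the target size; the function name and the `target_size` argument are hypothetical.

```python
def compute_time_ids(results: dict, target_size) -> dict:
    """Concatenate size and crop metadata into one flat list (assumption)."""
    results["time_ids"] = (
        list(results["ori_img_shape"])
        + list(results["crop_top_left"])
        + list(target_size)
    )
    return results


res = compute_time_ids(
    {"ori_img_shape": [600, 800], "crop_top_left": [44, 144]},
    target_size=(512, 512),
)
```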
- class diffengine.datasets.transforms.GetMaskedImage(key='masked_image')[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
GetMaskedImage.
Args:¶
- key (str): Key under which the masked image is stored in results. Defaults to 'masked_image'.
- Parameters:
key (str) –
- class diffengine.datasets.transforms.MaskToTensor(key='mask')[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
MaskToTensor.
Convert mask to tensor.
Transpose mask from (H, W, 1) to (1, H, W)
Args:¶
- key (str): key to apply augmentation from results.
Defaults to ‘mask’.
- Parameters:
key (str) –
- class diffengine.datasets.transforms.MultiAspectRatioResizeCenterCrop(*args, sizes, keys=None, interpolation='bilinear', **kwargs)[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
Multi Aspect Ratio Resize and Center Crop.
Args:¶
- sizes (List[sequence]): List of desired output sizes of the crop. Each is a sequence like (h, w).
- keys (List[str]): Keys in results to apply the augmentation to.
- interpolation (str): Desired interpolation enum defined by torchvision.transforms.InterpolationMode. Defaults to 'bilinear'.
- Parameters:
sizes (list[collections.abc.Sequence[int]]) –
keys (list[str] | None) –
interpolation (str) –
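One plausible way to pick a target size from `sizes` is to choose the entry whose aspect ratio is closest to the input image's. The sketch below assumes that selection rule; `pick_bucket` is a hypothetical helper, and the resize and crop steps are omitted.

```python
def pick_bucket(img_h: int, img_w: int, sizes):
    """Return the (h, w) in sizes whose h/w ratio is nearest the image's."""
    ratio = img_h / img_w
    return min(sizes, key=lambda s: abs(s[0] / s[1] - ratio))


sizes = [(512, 512), (384, 640), (640, 384)]
chosen = pick_bucket(720, 1280, sizes)  # landscape input
```

A 720x1280 image has ratio 0.5625, so the (384, 640) bucket with ratio 0.6 is the nearest candidate.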
- class diffengine.datasets.transforms.RandomCrop(*args, size, keys=None, force_same_size=True, **kwargs)[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
RandomCrop.
The differences from torchvision/RandomCrop are:
1. Saves the crop coordinates as 'crop_top_left' and 'crop_bottom_right' in results.
2. Applies the same random parameters to multiple keys like ['img', 'condition_img'].
Args:¶
- size (sequence or int): Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
- keys (List[str]): Keys in results to apply the augmentation to.
- force_same_size (bool): Force same size for all keys. Defaults to True.
- Parameters:
size (collections.abc.Sequence[int] | int) –
keys (list[str] | None) –
force_same_size (bool) –
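The idea of sampling the crop once and reusing it for every key can be sketched with nested lists standing in for images. All helper names here are hypothetical, and (y, x) ordering of the saved coordinates is an assumption.

```python
import random


def sample_crop(img_h, img_w, crop_h, crop_w, rng):
    """Draw one top-left corner so the crop fits inside the image."""
    top = rng.randint(0, img_h - crop_h)
    left = rng.randint(0, img_w - crop_w)
    return top, left


def crop(grid, top, left, crop_h, crop_w):
    """Slice a crop_h x crop_w window out of a nested-list image."""
    return [row[left:left + crop_w] for row in grid[top:top + crop_h]]


rng = random.Random(0)
img = [[(y, x) for x in range(6)] for y in range(4)]
condition_img = [[(y, x) for x in range(6)] for y in range(4)]

# Sample once, then apply the same window to every key.
top, left = sample_crop(4, 6, 2, 3, rng)
results = {
    "img": crop(img, top, left, 2, 3),
    "condition_img": crop(condition_img, top, left, 2, 3),
    "crop_top_left": [top, left],
    "crop_bottom_right": [top + 2, left + 3],
}
```

Sharing the sampled window keeps an image and its conditioning image spatially aligned.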
- class diffengine.datasets.transforms.RandomHorizontalFlip(*args, p=0.5, keys=None, **kwargs)[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
RandomHorizontalFlip.
The differences from torchvision/RandomHorizontalFlip are:
1. Updates 'crop_top_left' and 'crop_bottom_right' if they exist.
2. Applies the same random parameters to multiple keys like ['img', 'condition_img'].
Args:¶
- p (float): Probability of the image being flipped. Default value is 0.5.
- keys (List[str]): Keys in results to apply the augmentation to.
- Parameters:
p (float) –
keys (list[str] | None) –
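When an image is mirrored, crop coordinates recorded earlier no longer describe the same region. A sketch of one way to update them, assuming (y, x) ordering and that the width of the uncropped image is known; `flip_crop_box` and `full_width` are hypothetical names, and the library's exact update rule may differ.

```python
def flip_crop_box(crop_top_left, crop_bottom_right, full_width):
    """Mirror a crop box horizontally inside an image of width full_width."""
    top, left = crop_top_left
    bottom, right = crop_bottom_right
    # The old right edge becomes the new left edge, measured from the
    # opposite side of the image, and vice versa.
    return [top, full_width - right], [bottom, full_width - left]


new_top_left, new_bottom_right = flip_crop_box([10, 20], [110, 220], 300)
```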
- class diffengine.datasets.transforms.RandomTextDrop(p=0.1, keys=None)[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
RandomTextDrop. Replace text with an empty string.
Args:¶
- p (float): Probability of the text being replaced with an empty string. Default value is 0.1.
- keys (List[str]): Keys in results to apply the augmentation to.
- Parameters:
p (float) –
keys (list[str] | None) –
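A minimal sketch of the drop behavior, assuming a single random draw per sample and a fallback key of `'text'`; the function name and the `rng` parameter are hypothetical.

```python
import random


def random_text_drop(results: dict, p: float = 0.1, keys=None, rng=random) -> dict:
    """With probability p, replace each listed text entry with ''."""
    if keys is None:
        keys = ["text"]  # assumed default key for this sketch
    if rng.random() < p:
        for key in keys:
            results[key] = ""
    return results


always = random_text_drop({"text": "a dog"}, p=1.0)
never = random_text_drop({"text": "a dog"}, p=0.0)
```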
- class diffengine.datasets.transforms.SaveImageShape[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
Save image shape as 'ori_img_shape' in results.
- class diffengine.datasets.transforms.T5TextPreprocess(keys=None, *, clean_caption=True)[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
T5 Text Preprocess.
Args:¶
- keys (List[str]): Keys in results to apply the augmentation to.
- clean_caption (bool): Whether to clean the caption. Defaults to True.
- Parameters:
keys (list[str] | None) –
clean_caption (bool) –
- class diffengine.datasets.transforms.TorchVisonTransformWrapper(transform, *args, keys=None, **kwargs)[source]¶
TorchVisonTransformWrapper.
We can use torchvision.transforms like dict(type='torchvision/Resize', size=512)
Args:¶
- transform (str): The name of the transform. For example, torchvision/Resize.
- keys (List[str]): Keys in results to apply the augmentation to.
- Parameters:
keys (list[str] | None) –
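As an illustration of the config form above, a pipeline entry might look like this. The `keys` value and the variable name are assumptions; only the `torchvision/Resize` entry with `size=512` comes from the text.

```python
# The part after "torchvision/" names the torchvision transform;
# remaining items are forwarded as its arguments.
train_pipeline = [
    dict(type="torchvision/Resize", size=512, keys=["img"]),
]

prefix, transform_name = train_pipeline[0]["type"].split("/")
```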
- class diffengine.datasets.transforms.RandomChoice(transforms, prob=None)[source]¶
Bases: diffengine.datasets.transforms.base.BaseTransform
Process data with a randomly chosen transform from given candidates.
Copied from mmcv/transforms/wrappers.py.
Args:¶
- transforms (list[list]): A list of transform candidates, each of which is a sequence of transforms.
- prob (list[float], optional): The probabilities associated with each pipeline. The length should equal the number of candidate pipelines and the sum should be 1. If not given, a uniform distribution is assumed.
Examples:¶
>>> # config
>>> pipeline = [
>>>     dict(type='RandomChoice',
>>>         transforms=[
>>>             [dict(type='RandomHorizontalFlip')],  # subpipeline 1
>>>             [dict(type='RandomRotate')],  # subpipeline 2
>>>         ]
>>>     )
>>> ]
- Parameters:
transforms (list[Transform | list[Transform]]) –
prob (list[float] | None) –
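The selection step can be sketched with the standard library. This is an illustration under assumptions, not the mmcv code: `random_choice` is a hypothetical function and the candidate transforms are plain callables.

```python
import random


def random_choice(results: dict, transforms, prob=None, rng=random) -> dict:
    """Pick one candidate sub-pipeline and run its transforms in order."""
    if prob is not None:
        assert len(prob) == len(transforms)
        assert abs(sum(prob) - 1.0) < 1e-6
    # weights=None makes random.choices sample uniformly.
    idx = rng.choices(range(len(transforms)), weights=prob, k=1)[0]
    for t in transforms[idx]:
        results = t(results)
    return results


candidates = [
    [lambda r: {**r, "tag": "a"}],  # subpipeline 1
    [lambda r: {**r, "tag": "b"}],  # subpipeline 2
]
out = random_choice({"x": 1}, candidates, prob=[1.0, 0.0])
```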