diffengine.datasets.transforms¶
Submodules¶
Package Contents¶
Classes¶
- BaseTransform: Base class for all transformations.
- DumpImage: Dump the image processed by the pipeline.
- DumpMaskedImage: Dump the masked image processed by the pipeline.
- PackInputs: Pack the inputs data.
- LoadMask: Load Mask for multiple types.
- AddConstantCaption
- CenterCrop
- CLIPImageProcessor
- ComputeaMUSEdMicroConds: Compute aMUSEd micro_conds as 'micro_conds' in results.
- ComputePixArtImgInfo: Compute original height and width + aspect ratio.
- ComputeTimeIds: Compute time ids as 'time_ids' in results.
- ConcatMultipleImgs
- GetMaskedImage
- MaskToTensor
- MultiAspectRatioResizeCenterCrop: Multi aspect ratio resize and center crop.
- RandomCrop
- RandomHorizontalFlip
- RandomTextDrop: Replace text with an empty string.
- SaveImageShape: Save image shape as 'ori_img_shape' in results.
- T5TextPreprocess
- TimmImageProcessor
- TorchVisonTransformWrapper
- TransformersImageProcessor
- RandomChoice: Process data with a randomly chosen transform from given candidates.
Attributes¶
- class diffengine.datasets.transforms.BaseTransform[source]¶
Base class for all transformations.
- __call__(results)[source]¶
Call function to transform data.
- Parameters:
results (dict) –
- Return type:
dict | tuple[list, list] | None
- abstract transform(results)[source]¶
Transform the data.
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in it. The result dict is returned in the end, which allows multiple transforms to be concatenated into a pipeline.
Args:¶
results (dict): The result dict.
Returns:¶
dict: The result dict.
- Parameters:
results (dict) –
- Return type:
dict | tuple[list, list] | None
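The contract above can be sketched in a few lines; `AddFlag` is a hypothetical subclass invented here for illustration and is not part of the package:

```python
class BaseTransform:
    """Minimal sketch of the base-class contract described above."""

    def __call__(self, results):
        """Call function to transform data."""
        return self.transform(results)

    def transform(self, results):
        """Subclasses override this to add or modify items in results."""
        raise NotImplementedError


class AddFlag(BaseTransform):
    """Hypothetical subclass: adds a new item to the result dict."""

    def transform(self, results):
        results["flag"] = True  # add new items or modify existing ones
        return results


# Transforms concatenate into a pipeline: the result dict is threaded through.
pipeline = [AddFlag()]
results = {"img": "placeholder"}
for t in pipeline:
    results = t(results)
```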
- class diffengine.datasets.transforms.DumpImage(max_imgs, dump_dir)[source]¶
Dump the image processed by the pipeline.
Args:¶
- max_imgs (int): Maximum number of images to dump.
- dump_dir (str): Dump output directory.
- Parameters:
max_imgs (int) –
dump_dir (str) –
- class diffengine.datasets.transforms.DumpMaskedImage(max_imgs, dump_dir)[source]¶
Dump the masked image processed by the pipeline.
Args:¶
- max_imgs (int): Maximum number of images to dump.
- dump_dir (str): Dump output directory.
- Parameters:
max_imgs (int) –
dump_dir (str) –
- class diffengine.datasets.transforms.PackInputs(input_keys=None, skip_to_tensor_key=None)[source]¶
Bases:
diffengine.datasets.transforms.BaseTransform
Pack the inputs data.
Required Keys:
input_key
Deleted Keys:
All other keys in the dict.
Args:¶
- input_keys (List[str]): Keys of the elements to feed into the model forward pass. Defaults to [‘img’, ‘text’].
- skip_to_tensor_key (List[str]): Keys of the elements that skip the to_tensor conversion. Defaults to [‘text’].
- Parameters:
input_keys (list[str] | None) –
skip_to_tensor_key (list[str] | None) –
- class diffengine.datasets.transforms.LoadMask(mask_mode='bbox', mask_config=None)[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
Load Mask for multiple types.
Copied from https://github.com/open-mmlab/mmagic/blob/main/mmagic/utils/trans_utils.py
Reference from: mmagic.datasets.transforms.loading.LoadMask
For different types of mask, users need to provide the corresponding config dict.
Example config for bbox:
config = dict(max_bbox_shape=128)
Example config for irregular:
config = dict(num_vertices=(4, 12), max_angle=4., length_range=(10, 100), brush_width=(10, 40), area_ratio_range=(0.15, 0.5))
Example config for ff:
config = dict(num_vertices=(4, 12), mean_angle=1.2, angle_range=0.4, brush_width=(12, 40))
Args:¶
- mask_mode (str): Mask mode in [‘bbox’, ‘irregular’, ‘ff’, ‘set’, ‘whole’]. Default: ‘bbox’.
  - bbox: square bounding box masks.
  - irregular: irregular holes.
  - ff: free-form holes from DeepFillv2.
  - set: randomly get a mask from a mask set.
  - whole: use the whole image as mask.
- mask_config (dict): Params for creating masks. Each type of mask needs
different configs. Default: None.
- Parameters:
mask_mode (str) –
mask_config (dict | None) –
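The example configs above can be written as dict-style pipeline entries, following the dict(type=...) registry convention used elsewhere on this page:

```python
# Pipeline entries for the three mask modes documented above.
bbox_mask = dict(
    type="LoadMask", mask_mode="bbox",
    mask_config=dict(max_bbox_shape=128))

irregular_mask = dict(
    type="LoadMask", mask_mode="irregular",
    mask_config=dict(num_vertices=(4, 12), max_angle=4.,
                     length_range=(10, 100), brush_width=(10, 40),
                     area_ratio_range=(0.15, 0.5)))

ff_mask = dict(
    type="LoadMask", mask_mode="ff",
    mask_config=dict(num_vertices=(4, 12), mean_angle=1.2,
                     angle_range=0.4, brush_width=(12, 40)))
```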
- class diffengine.datasets.transforms.AddConstantCaption(constant_caption, keys=None)[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
AddConstantCaption.
- Example: with constant_caption=”in szn style”, “a dog.” -> “a dog. in szn style”.
Args:¶
- constant_caption (str): constant_caption to add.
- keys (List[str], optional): keys to apply augmentation from results. Defaults to None.
- Parameters:
constant_caption (str) –
keys (list[str] | None) –
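A behavioral sketch of the example above; the helper name and signature are hypothetical stand-ins for the transform's internals:

```python
def add_constant_caption(results, constant_caption, keys=("text",)):
    """Sketch of AddConstantCaption: append a fixed suffix to each
    caption key in the result dict."""
    for key in keys:
        results[key] = results[key] + " " + constant_caption
    return results


# Matches the docstring example: "a dog." -> "a dog. in szn style"
out = add_constant_caption({"text": "a dog."}, "in szn style")
```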
- class diffengine.datasets.transforms.CenterCrop(*args, size, keys=None, **kwargs)[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
CenterCrop.
- The difference from torchvision/CenterCrop is:
1. save crop top left as ‘crop_top_left’ and ‘crop_bottom_right’ in results.
Args:¶
- size (sequence or int): Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made. If a sequence of length 1 is provided, it will be interpreted as (size[0], size[0]).
- keys (List[str]): keys to apply augmentation from results.
- Parameters:
size (collections.abc.Sequence[int] | int) –
keys (list[str] | None) –
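The bookkeeping described above reduces to simple arithmetic; this sketch assumes the saved coordinates are ordered [top, left] and [bottom, right], which this page does not confirm:

```python
def center_crop_params(img_h, img_w, size):
    """Sketch of the coordinates CenterCrop records: given an image
    and a target size, compute 'crop_top_left' and 'crop_bottom_right'.
    Coordinate ordering is an assumption."""
    if isinstance(size, int):
        size = (size, size)  # an int means a square crop, as documented
    crop_h, crop_w = size
    top = (img_h - crop_h) // 2
    left = (img_w - crop_w) // 2
    return dict(crop_top_left=[top, left],
                crop_bottom_right=[top + crop_h, left + crop_w])


params = center_crop_params(512, 768, 512)
```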
- class diffengine.datasets.transforms.CLIPImageProcessor(key='img', output_key='clip_img', pretrained=None, subfolder=None)[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
CLIPImageProcessor.
Args:¶
- key (str): key to apply augmentation from results. Defaults to ‘img’.
- output_key (str): output key after applying augmentation from results. Defaults to ‘clip_img’.
- Parameters:
key (str) –
output_key (str) –
pretrained (str | None) –
subfolder (str | None) –
- class diffengine.datasets.transforms.ComputeaMUSEdMicroConds[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
Compute aMUSEd micro_conds as ‘micro_conds’ in results.
- class diffengine.datasets.transforms.ComputePixArtImgInfo[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
Compute original height and width + aspect ratio.
Returns ‘resolution’ and ‘aspect_ratio’ in results.
- class diffengine.datasets.transforms.ComputeTimeIds[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
Compute time ids as ‘time_ids’ in results.
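In SDXL-style training, the time ids are commonly the concatenation of the original image size, the crop top-left offset, and the target size. The sketch below assumes that convention; the exact fields ComputeTimeIds reads are not confirmed by this page, though ‘ori_img_shape’ and ‘crop_top_left’ are produced by other transforms in this module:

```python
def compute_time_ids(results, target_size):
    """Sketch following the SDXL conditioning convention (assumption):
    time_ids = original size + crop top-left + target size."""
    results["time_ids"] = (list(results["ori_img_shape"])
                           + list(results["crop_top_left"])
                           + list(target_size))
    return results


out = compute_time_ids(
    {"ori_img_shape": [1024, 768], "crop_top_left": [0, 128]},
    [512, 512])
```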
- class diffengine.datasets.transforms.ConcatMultipleImgs(keys=None)[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
ConcatMultipleImgs.
Args:¶
- keys (List[str], optional): keys to apply augmentation from results.
Defaults to None.
- Parameters:
keys (list[str] | None) –
- class diffengine.datasets.transforms.GetMaskedImage(key='masked_image')[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
GetMaskedImage.
Args:¶
- key (str): key to outputs.
Defaults to ‘masked_image’.
- Parameters:
key (str) –
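A sketch of the masking step the class name implies, with nested lists standing in for image arrays; the mask polarity (non-zero mask hides the pixel) is an assumption:

```python
def get_masked_image(results, key="masked_image"):
    """Sketch of GetMaskedImage: zero out pixels covered by the mask.
    Nested lists stand in for image arrays here."""
    img, mask = results["img"], results["mask"]
    results[key] = [[0 if m else px for px, m in zip(row, mrow)]
                    for row, mrow in zip(img, mask)]
    return results


out = get_masked_image({"img": [[1, 2], [3, 4]],
                        "mask": [[0, 1], [1, 0]]})
```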
- class diffengine.datasets.transforms.MaskToTensor(key='mask')[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
MaskToTensor.
Convert mask to tensor.
Transpose mask from (H, W, 1) to (1, H, W)
Args:¶
- key (str): key to apply augmentation from results.
Defaults to ‘mask’.
- Parameters:
key (str) –
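The layout change described above, (H, W, 1) -> (1, H, W), is easy to sketch with nested lists; the real transform additionally converts the result to a torch tensor:

```python
def mask_to_chw(mask):
    """Sketch of MaskToTensor's transpose: (H, W, 1) -> (1, H, W),
    using nested lists in place of arrays."""
    return [[[px[0] for px in row] for row in mask]]


# A 2x2 mask with a single channel per pixel.
chw = mask_to_chw([[[0], [1]], [[1], [0]]])
```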
- class diffengine.datasets.transforms.MultiAspectRatioResizeCenterCrop(*args, sizes, keys=None, interpolation='bilinear', **kwargs)[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
Multi Aspect Ratio Resize and Center Crop.
Args:¶
- sizes (List[sequence]): List of desired output sizes of the crop. Sequence like (h, w).
- keys (List[str]): keys to apply augmentation from results.
- interpolation (str): Desired interpolation enum defined by torchvision.transforms.InterpolationMode. Defaults to ‘bilinear’.
- Parameters:
sizes (list[collections.abc.Sequence[int]]) –
keys (list[str] | None) –
interpolation (str) –
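One plausible selection rule for multi-aspect-ratio bucketing is to pick the candidate size whose aspect ratio best matches the input image; the actual rule this class uses is an assumption here:

```python
def pick_size(img_h, img_w, sizes):
    """Sketch (assumption): choose the (h, w) candidate whose aspect
    ratio is closest to the input image's, then resize and center-crop
    to it."""
    ar = img_h / img_w
    return min(sizes, key=lambda s: abs(s[0] / s[1] - ar))


# A wide 512x1024 image maps to the wide bucket.
best = pick_size(512, 1024, [(512, 512), (768, 512), (512, 768)])
```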
- class diffengine.datasets.transforms.RandomCrop(*args, size, keys=None, force_same_size=True, **kwargs)[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
RandomCrop.
- The difference from torchvision/RandomCrop is:
1. save crop top left as ‘crop_top_left’ and ‘crop_bottom_right’ in results.
2. apply the same random parameters to multiple keys like [‘img’, ‘condition_img’].
Args:¶
- size (sequence or int): Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made. If a sequence of length 1 is provided, it will be interpreted as (size[0], size[0]).
- keys (List[str]): keys to apply augmentation from results.
- force_same_size (bool): Force same size for all keys. Defaults to True.
- Parameters:
size (collections.abc.Sequence[int] | int) –
keys (list[str] | None) –
force_same_size (bool) –
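The shared-random-parameters idea in point 2 can be sketched as drawing the crop position once and reusing it for every key, so paired images such as ‘img’ and ‘condition_img’ stay aligned (helper name hypothetical):

```python
import random

def random_crop_params(img_h, img_w, crop_h, crop_w, rng=random):
    """Sketch: draw one crop position to be reused across all keys,
    keeping paired images spatially aligned."""
    top = rng.randint(0, img_h - crop_h)
    left = rng.randint(0, img_w - crop_w)
    return top, left


top, left = random_crop_params(512, 768, 256, 256)
```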
- class diffengine.datasets.transforms.RandomHorizontalFlip(*args, p=0.5, keys=None, **kwargs)[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
RandomHorizontalFlip.
- The difference from torchvision/RandomHorizontalFlip is:
1. update ‘crop_top_left’ and ‘crop_bottom_right’ if they exist.
2. apply the same random parameters to multiple keys like [‘img’, ‘condition_img’].
Args:¶
- p (float): probability of the image being flipped. Defaults to 0.5.
- keys (List[str]): keys to apply augmentation from results.
- Parameters:
p (float) –
keys (list[str] | None) –
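Point 1 above amounts to mirroring the crop's left edge across the image; a sketch under the assumption that coordinates are stored as [top, left] and [bottom, right]:

```python
def flip_crop_top_left(crop_top_left, crop_bottom_right, img_w):
    """Sketch (assumption on coordinate layout): after a horizontal
    flip, the crop's left edge is measured from the mirrored right
    edge of the image."""
    top, left = crop_top_left
    crop_w = crop_bottom_right[1] - left
    return [top, img_w - left - crop_w]


# A 512-wide crop at the left edge of a 768-wide image moves right.
flipped = flip_crop_top_left([0, 0], [512, 512], 768)
```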
- class diffengine.datasets.transforms.RandomTextDrop(p=0.1, keys=None)[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
RandomTextDrop. Replace text with an empty string.
Args:¶
- p (float): probability of the text being replaced with an empty string. Defaults to 0.1.
- keys (List[str]): keys to apply augmentation from results.
- Parameters:
p (float) –
keys (list[str] | None) –
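A behavioral sketch of the drop: with probability p, every caption key is replaced with the empty string (commonly done to support classifier-free guidance, though this page does not state the motivation):

```python
import random

def random_text_drop(results, p=0.1, keys=("text",), rng=random):
    """Sketch of RandomTextDrop: with probability p, replace each
    caption with the empty string."""
    if rng.random() < p:
        for key in keys:
            results[key] = ""
    return results


dropped = random_text_drop({"text": "a dog."}, p=1.0)  # always drops
kept = random_text_drop({"text": "a dog."}, p=0.0)     # never drops
```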
- class diffengine.datasets.transforms.SaveImageShape[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
Save image shape as ‘ori_img_shape’ in results.
- class diffengine.datasets.transforms.T5TextPreprocess(keys=None, *, clean_caption=True)[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
T5 Text Preprocess.
Args:¶
- keys (List[str]): keys to apply augmentation from results.
- clean_caption (bool): Whether to clean the caption. Defaults to True.
Clean caption.
Copied from diffusers.pipelines.deepfloyd_if.pipeline_if.IFPipeline._clean_caption
- Parameters:
caption (str) –
- Return type:
str
- Parameters:
keys (list[str] | None) –
clean_caption (bool) –
- class diffengine.datasets.transforms.TimmImageProcessor(pretrained, key='img', output_key='clip_img')[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
TimmImageProcessor.
Args:¶
- pretrained (str): pretrained model name.
- key (str): key to apply augmentation from results. Defaults to ‘img’.
- output_key (str): output key after applying augmentation from results. Defaults to ‘clip_img’.
- Parameters:
pretrained (str) –
key (str) –
output_key (str) –
- class diffengine.datasets.transforms.TorchVisonTransformWrapper(transform, *args, keys=None, **kwargs)[source]¶
TorchVisonTransformWrapper.
We can use torchvision.transforms like dict(type=’torchvision/Resize’, size=512)
Args:¶
- transform (str): The name of the transform, for example torchvision/Resize.
- keys (List[str]): keys to apply augmentation from results.
- Parameters:
keys (list[str] | None) –
- class diffengine.datasets.transforms.TransformersImageProcessor(key='img', output_key='clip_img', pretrained=None)[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
TransformersImageProcessor.
Args:¶
- key (str): key to apply augmentation from results. Defaults to ‘img’.
- output_key (str): output key after applying augmentation from results. Defaults to ‘clip_img’.
- Parameters:
key (str) –
output_key (str) –
pretrained (str | None) –
- class diffengine.datasets.transforms.RandomChoice(transforms, prob=None)[source]¶
Bases:
diffengine.datasets.transforms.base.BaseTransform
Process data with a randomly chosen transform from given candidates.
Copied from mmcv/transforms/wrappers.py.
Args:¶
- transforms (list[list]): A list of transform candidates, each is a
sequence of transforms.
- prob (list[float], optional): The probabilities associated
with each pipeline. The length should be equal to the pipeline number and the sum should be 1. If not given, a uniform distribution will be assumed.
Examples:¶
>>> # config
>>> pipeline = [
>>>     dict(type='RandomChoice',
>>>          transforms=[
>>>              [dict(type='RandomHorizontalFlip')],  # subpipeline 1
>>>              [dict(type='RandomRotate')],  # subpipeline 2
>>>          ])
>>> ]
- Parameters:
transforms (list[Transform | list[Transform]]) –
prob (list[float] | None) –
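The wrapper's behavior can be sketched with random.choices: a weighted pick of one candidate sub-pipeline, applied end to end. For simplicity this sketch picks at construction time, whereas the real class presumably re-draws on each call:

```python
import random

def random_choice(transforms, prob=None, rng=random):
    """Sketch of RandomChoice: pick one sub-pipeline, weighted by
    `prob` if given (uniform otherwise), and apply it end to end."""
    idx = rng.choices(range(len(transforms)), weights=prob)[0]

    def run(results):
        for t in transforms[idx]:
            results = t(results)
        return results

    return run


chosen = random_choice(
    [[lambda r: {**r, "flipped": True}],   # subpipeline 1
     [lambda r: {**r, "rotated": True}]],  # subpipeline 2
    prob=[1.0, 0.0])  # weight 0 means subpipeline 2 is never picked
out = chosen({"img": 1})
```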