diffengine.datasets

Package Contents

Classes

HFControlNetDataset

Dataset for huggingface datasets.

HFDataset

Dataset for huggingface datasets.

HFDatasetPreComputeEmbs

Dataset for huggingface datasets.

HFDPODataset

DPO Dataset for huggingface datasets.

HFDreamBoothDataset

DreamBooth Dataset for huggingface datasets.

HFESDDatasetPreComputeEmbs

Huggingface Erasing Concepts from Diffusion Models Dataset.

class diffengine.datasets.HFControlNetDataset(dataset, image_column='image', condition_column='condition', caption_column='text', csv='metadata.csv', pipeline=(), cache_dir=None)[source]

Bases: torch.utils.data.Dataset

Dataset for huggingface datasets.

Args:

dataset (str): Dataset name or path to dataset.

image_column (str): Image column name. Defaults to ‘image’.

condition_column (str): Condition column name for ControlNet. Defaults to ‘condition’.

caption_column (str): Caption column name. Defaults to ‘text’.

csv (str): Caption csv file name when loading a local folder. Defaults to ‘metadata.csv’.

pipeline (Sequence): Processing pipeline. Defaults to an empty tuple.

cache_dir (str, optional): The directory where the downloaded datasets will be stored. Defaults to None.

__len__()[source]

Get the length of dataset.

Returns:

The length of the filtered dataset.

Return type:

int

__getitem__(idx)[source]

Get item.

Get the idx-th image and data information of the dataset after self.pipeline.

Args:

idx (int): The index of self.data_list.

Returns:

dict: The idx-th image and data information of dataset after self.pipeline.

Parameters:

idx (int) –

Return type:

dict

Parameters:
  • dataset (str) –

  • image_column (str) –

  • condition_column (str) –

  • caption_column (str) –

  • csv (str) –

  • pipeline (collections.abc.Sequence) –

  • cache_dir (str | None) –
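A minimal usage sketch for HFControlNetDataset, written as a plain config dict in the mmengine style used elsewhere in diffengine configs (an assumption here); the dataset id and the ‘conditioning_image’ column name are hypothetical placeholders, not values this API prescribes:

```python
# Hypothetical config sketch for HFControlNetDataset; the dataset id and the
# condition column name below are placeholders.
train_dataset = dict(
    type="HFControlNetDataset",
    dataset="fusing/fill50k",               # hub dataset name or local path
    condition_column="conditioning_image",  # overrides the 'condition' default
    caption_column="text",                  # default caption column
)
```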

class diffengine.datasets.HFDataset(dataset, image_column='image', caption_column='text', csv='metadata.csv', pipeline=(), cache_dir=None)[source]

Bases: torch.utils.data.Dataset

Dataset for huggingface datasets.

Args:

dataset (str): Dataset name or path to dataset.

image_column (str): Image column name. Defaults to ‘image’.

caption_column (str): Caption column name. Defaults to ‘text’.

csv (str): Caption csv file name when loading a local folder. Defaults to ‘metadata.csv’.

pipeline (Sequence): Processing pipeline. Defaults to an empty tuple.

cache_dir (str, optional): The directory where the downloaded datasets will be stored. Defaults to None.

__len__()[source]

Get the length of dataset.

Returns:

The length of the filtered dataset.

Return type:

int

__getitem__(idx)[source]

Get item.

Get the idx-th image and data information of the dataset after self.pipeline.

Args:

idx (int): The index of self.data_list.

Returns:

dict: The idx-th image and data information of dataset after self.pipeline.

Parameters:

idx (int) –

Return type:

dict

Parameters:
  • dataset (str) –

  • image_column (str) –

  • caption_column (str) –

  • csv (str) –

  • pipeline (collections.abc.Sequence) –

  • cache_dir (str | None) –
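As a sketch, HFDataset can also point at a local image folder, with captions read from the csv file; the folder path below is a hypothetical placeholder, and the mmengine-style dict is an assumption about how diffengine configs are written:

```python
# Hypothetical config sketch: load a local folder whose captions live in
# metadata.csv (the default), one row per image.
train_dataset = dict(
    type="HFDataset",
    dataset="data/pokemon",   # hypothetical local folder, not a hub id
    csv="metadata.csv",       # caption csv inside the folder (default)
    image_column="image",
    caption_column="text",
)
```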

class diffengine.datasets.HFDatasetPreComputeEmbs(*args, model='stabilityai/stable-diffusion-xl-base-1.0', text_hasher='text', device='cuda', proportion_empty_prompts=0.0, **kwargs)[source]

Bases: HFDataset

Dataset for huggingface datasets.

The difference from HFDataset is that it pre-computes Text Encoder embeddings to save memory.

Args:

model (str): Pretrained model name of stable diffusion xl. Defaults to ‘stabilityai/stable-diffusion-xl-base-1.0’.

text_hasher (str): Text embeddings hasher name. Defaults to ‘text’.

device (str): Device used to compute embeddings. Defaults to ‘cuda’.

proportion_empty_prompts (float): The probability of replacing a caption with an empty string. Defaults to 0.0.

__getitem__(idx)[source]

Get item.

Get the idx-th image and data information of the dataset after self.train_transforms.

Args:

idx (int): The index of self.data_list.

Returns:

dict: The idx-th image and data information of dataset after self.train_transforms.

Parameters:

idx (int) –

Return type:

dict

Parameters:
  • model (str) –

  • text_hasher (str) –

  • device (str) –

  • proportion_empty_prompts (float) –
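A hedged sketch of the pre-computing variant, again as an assumed mmengine-style dict; the dataset path and the text_hasher cache key are arbitrary placeholders chosen for illustration:

```python
# Hypothetical config sketch for HFDatasetPreComputeEmbs; dataset path and
# hasher key are placeholders.
train_dataset = dict(
    type="HFDatasetPreComputeEmbs",
    dataset="data/pokemon",                            # hypothetical folder
    model="stabilityai/stable-diffusion-xl-base-1.0",  # default SDXL weights
    text_hasher="sdxl_pokemon_text",  # arbitrary cache key for the embeddings
    device="cuda",
    proportion_empty_prompts=0.1,     # replace 10% of captions with ""
)
```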

class diffengine.datasets.HFDPODataset(dataset, image_columns=None, caption_column='text', label_column='label_0', csv='metadata.csv', pipeline=(), split='train', cache_dir=None)[source]

Bases: torch.utils.data.Dataset

DPO Dataset for huggingface datasets.

Args:

dataset (str): Dataset name or path to dataset.

image_columns (list[str]): Image column names. Defaults to [‘image’].

caption_column (str): Caption column name. Defaults to ‘text’.

label_column (str): Label column name indicating whether image_columns[0] is better than image_columns[1]. Defaults to ‘label_0’.

csv (str): Caption csv file name when loading a local folder. Defaults to ‘metadata.csv’.

pipeline (Sequence): Processing pipeline. Defaults to an empty tuple.

split (str): Dataset split. Defaults to ‘train’.

cache_dir (str, optional): The directory where the downloaded datasets will be stored. Defaults to None.

__len__()[source]

Get the length of dataset.

Returns:

The length of the filtered dataset.

Return type:

int

__getitem__(idx)[source]

Get item.

Get the idx-th image and data information of the dataset after self.pipeline.

Args:

idx (int): The index of self.data_list.

Returns:

dict: The idx-th image and data information of dataset after self.pipeline.

Parameters:

idx (int) –

Return type:

dict

Parameters:
  • dataset (str) –

  • image_columns (list[str] | None) –

  • caption_column (str) –

  • label_column (str) –

  • csv (str) –

  • pipeline (collections.abc.Sequence) –

  • split (str) –

  • cache_dir (str | None) –
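A hedged DPO sketch in the same assumed config style; the preference dataset id and the paired column names are hypothetical placeholders, not names this API prescribes:

```python
# Hypothetical DPO config sketch: two image columns holding the paired samples,
# plus a label column saying whether image_columns[0] is the preferred one.
train_dataset = dict(
    type="HFDPODataset",
    dataset="yuvalkirstain/pickapic_v2",  # hypothetical preference dataset
    image_columns=["jpg_0", "jpg_1"],     # hypothetical paired column names
    label_column="label_0",               # whether image_columns[0] is better
    split="train",
)
```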

class diffengine.datasets.HFDreamBoothDataset(dataset, instance_prompt, image_column='image', dataset_sub_dir=None, class_image_config=None, class_prompt=None, pipeline=(), csv=None, cache_dir=None)[source]

Bases: torch.utils.data.Dataset

DreamBooth Dataset for huggingface datasets.

Args:

dataset (str): Dataset name.

instance_prompt (str): The prompt with identifier specifying the instance.

image_column (str): Image column name. Defaults to ‘image’.

dataset_sub_dir (str, optional): Dataset sub directory name.

class_image_config (dict):

    model (str): Pretrained model name of stable diffusion used to create training data of class images. Defaults to ‘runwayml/stable-diffusion-v1-5’.

    data_dir (str): A folder containing the training data of class images. Defaults to ‘work_dirs/class_image’.

    num_images (int): Minimal number of class images for the prior preservation loss. If class_data_dir does not already contain enough images, additional images will be sampled with class_prompt. Defaults to 200.

    recreate_class_images (bool): Whether to recreate all class images. Defaults to True.

class_prompt (str, optional): The prompt to specify images in the same class as the provided instance images. Defaults to None.

pipeline (Sequence): Processing pipeline. Defaults to an empty tuple.

csv (str, optional): Image path csv file name when loading a local folder. If None, the dataset will be loaded from image folders. Defaults to None.

cache_dir (str, optional): The directory where the downloaded datasets will be stored. Defaults to None.

default_class_image_config: dict
generate_class_image(class_image_config)[source]

Generate class images for prior preservation loss.

Parameters:

class_image_config (dict) –

Return type:

None

__len__()[source]

Get the length of dataset.

Returns:

The length of the filtered dataset.

Return type:

int

__getitem__(idx)[source]

Get item.

Get the idx-th image and data information of the dataset after self.pipeline.

Args:

idx (int): The index of self.data_list.

Returns:

dict: The idx-th image and data information of dataset after self.pipeline.

Parameters:

idx (int) –

Return type:

dict

Parameters:
  • dataset (str) –

  • instance_prompt (str) –

  • image_column (str) –

  • dataset_sub_dir (str | None) –

  • class_image_config (dict | None) –

  • class_prompt (str | None) –

  • pipeline (collections.abc.Sequence) –

  • csv (str | None) –

  • cache_dir (str | None) –
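A hedged DreamBooth sketch with prior preservation enabled, in the same assumed config style; the instance/class prompts and the dataset path are placeholders, while the class_image_config values mirror the documented defaults:

```python
# Hypothetical DreamBooth config sketch; prompts and dataset path are
# placeholders, class_image_config values are the documented defaults.
train_dataset = dict(
    type="HFDreamBoothDataset",
    dataset="data/dog",                    # hypothetical instance image folder
    instance_prompt="a photo of sks dog",  # identifier prompt (placeholder)
    class_prompt="a photo of a dog",       # enables prior preservation
    class_image_config=dict(
        model="runwayml/stable-diffusion-v1-5",  # documented default
        data_dir="work_dirs/class_image",        # documented default
        num_images=200,                          # documented default
        recreate_class_images=True,              # documented default
    ),
)
```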

class diffengine.datasets.HFESDDatasetPreComputeEmbs(forget_caption, model='stabilityai/stable-diffusion-xl-base-1.0', device='cuda', pipeline=())[source]

Bases: torch.utils.data.Dataset

Huggingface Erasing Concepts from Diffusion Models Dataset.

Dataset of huggingface datasets for Erasing Concepts from Diffusion Models.

Args:

forget_caption (str): The caption describing the concept to forget.

model (str): Pretrained model name of stable diffusion xl. Defaults to ‘stabilityai/stable-diffusion-xl-base-1.0’.

device (str): Device used to compute embeddings. Defaults to ‘cuda’.

pipeline (Sequence): Processing pipeline. Defaults to an empty tuple.

__len__()[source]

Get the length of dataset.

Returns:

The length of the filtered dataset.

Return type:

int

__getitem__(idx)[source]

Get the dataset after self.pipeline.

Args:

idx (int): The index.

Returns:

dict: The idx-th data information of dataset after self.pipeline.

Parameters:

idx (int) –

Return type:

dict

Parameters:
  • forget_caption (str) –

  • model (str) –

  • device (str) –

  • pipeline (collections.abc.Sequence) –
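A hedged ESD sketch in the same assumed config style; the forget_caption value is a placeholder concept chosen for illustration:

```python
# Hypothetical ESD config sketch: forget_caption names the concept whose
# embedding is pre-computed and then erased during training.
train_dataset = dict(
    type="HFESDDatasetPreComputeEmbs",
    forget_caption="Van Gogh style",                   # placeholder concept
    model="stabilityai/stable-diffusion-xl-base-1.0",  # default SDXL weights
    device="cuda",
)
```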