torchwrench.extras.hdf package

class torchwrench.extras.hdf.HDFDataset(hdf_fpath: str | Path, *, transform: Callable[[T], U] | None = identity, keep_padding: Iterable[str] = (), return_added_columns: bool = False, open_hdf: bool = True, cast: Literal['to_torch_or_builtin', 'to_torch_or_numpy', 'as_builtin', 'to_numpy_src', 'to_torch_src', 'none'] = 'none', file_kwds: Dict[str, Any] | None = None)

Bases: Generic[T, U], DatasetSlicer[U]

property added_columns : list[str]: Return the list of columns added by the pack_to_hdf function.

property all_columns : list[str]: The names of all columns of the dataset.

at(*args, **kwargs) → Any: Deprecated: Use the get_item method instead.

property attrs : HDFDatasetAttributes

close(ignore_if_closed: bool = False, remove_file: bool = False) → None

property column_names : tuple[str, ...]: The name of each column of the dataset.

get_attrs() → HDFDatasetAttributes

get_column_dtype(column_name: str) → dtype

get_column_shape(column_name: str) → tuple[int, ...]

get_columns_shapes() → dict[str, tuple[int, ...]]

get_hdf_fpath() → Path

get_hdf_keys() → tuple[str, ...]

get_item(index: int, column: None = None) → U
get_item(index: Iterable[int] | slice | None, column: str) → list
get_item(index: Iterable[int] | slice | None, column: list[str] | None = None) → dict[str, list]
get_item(index: Any, column: Any, raw: bool = False) → Any
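The overloads above tie the return type to the types of index and column. A minimal, library-free sketch of that dispatch, written against a plain dict of lists, might look like the following. The name toy_get_item and the in-memory columns argument are hypothetical. Treating index=None as "all rows" and returning a single value for an integer index with a named column are assumptions, not confirmed behaviour of HDFDataset.get_item.

```python
from typing import Any, Dict, Iterable, List, Union

Index = Union[int, Iterable[int], slice, None]
Column = Union[str, List[str], None]


def toy_get_item(columns: Dict[str, List[Any]], index: Index, column: Column = None) -> Any:
    """Toy illustration of the documented get_item overloads (not the library code)."""
    num_rows = len(next(iter(columns.values()))) if columns else 0

    if isinstance(index, int):
        # Single row: one value for a named column (assumption),
        # otherwise one dict item holding the selected columns.
        if isinstance(column, str):
            return columns[column][index]
        names = list(columns) if column is None else column
        return {name: columns[name][index] for name in names}

    # Several rows: normalise the index to a list of integer positions.
    if index is None:
        rows = list(range(num_rows))  # assumption: None selects every row
    elif isinstance(index, slice):
        rows = list(range(*index.indices(num_rows)))
    else:
        rows = list(index)

    if isinstance(column, str):
        # One column name: a flat list of values.
        return [columns[column][i] for i in rows]

    # A list of column names, or None for every column: a dict of lists.
    names = list(columns) if column is None else column
    return {name: [columns[name][i] for i in rows] for name in names}
```

With columns = {"a": [1, 2, 3], "b": ["x", "y", "z"]}, calling toy_get_item(columns, slice(0, 2), "a") yields a list, while toy_get_item(columns, [2, 0]) yields a dict of lists, mirroring the second and third overloads.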

property info : dict[str, Any]: Return the global dataset info.

is_closed() → bool

is_open() → bool

property item_type : 'dict' | 'tuple': Return the type of the items stored in the dataset, either 'dict' or 'tuple'.

property keep_padding : list[str]

keys() → tuple[str, ...]

property num_columns : int

property num_rows : int

open(ignore_if_opened: bool = False) → None

property shape : tuple[int, ...]: The shape of the dataset.

to_dict(raw: bool = False) → dict[str, ndarray]

property transform : Callable[[T], U] | None

property user_attrs : Any
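As an illustration of how the members listed above fit together, the sketch below opens an existing HDF file, reads a few properties, and closes the file again. The helper names summarize_dataset and read_summary are hypothetical, and the file at hdf_fpath is assumed to have been produced beforehand by pack_to_hdf. This is one possible usage pattern, not a prescribed one.

```python
from pathlib import Path
from typing import Any, Dict, Union


def summarize_dataset(ds: Any) -> Dict[str, Any]:
    """Collect a few facts from an object exposing the members documented above."""
    return {
        "num_rows": ds.num_rows,
        "column_names": tuple(ds.column_names),
        # get_item(index) with an int index returns a single item.
        "first_item": ds.get_item(0) if ds.num_rows > 0 else None,
    }


def read_summary(hdf_fpath: Union[str, Path]) -> Dict[str, Any]:
    """Open an existing HDF file, summarize it, and always close it afterwards."""
    # Imported lazily so that summarize_dataset stays usable without h5py installed.
    from torchwrench.extras.hdf import HDFDataset

    ds = HDFDataset(hdf_fpath)  # open_hdf=True by default, so the file is opened here
    try:
        return summarize_dataset(ds)
    finally:
        # ignore_if_closed=True makes the cleanup safe even if the file was already closed.
        ds.close(ignore_if_closed=True)
```

Closing in a finally block keeps the file handle from leaking when reading raises an error.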

torchwrench.extras.hdf.h5py_is_available() → bool

torchwrench.extras.hdf.pack_to_hdf(dataset: SupportsGetitemLen[T_DictOrTuple, Any] | SupportsIterLen[T_DictOrTuple] | Mapping[str, SupportsGetitemLen], hdf_fpath: str | Path, pre_transform: Callable[[T_DictOrTuple], T_DictOrTuple] | None = identity, *, batch_size: int = 32, num_workers: int | Literal['auto'] = 'auto', skip_scan: bool = False, encoding: str = 'utf-8', file_kwds: Dict[str, Any] | None = None, col_kwds: Dict[str, Any] | None = None, shape_suffix: str = '__shape', store_str_as_vlen: bool = False, user_attrs: Any = None, exists: Literal['overwrite', 'skip', 'error'] = 'error', ds_kwds: Dict[str, Any] | None = None, verbose: int = 0) → HDFDataset[T_DictOrTuple, T_DictOrTuple]

Pack a dataset into an HDF file.

Args:

dataset: The dataset to pack. It must be sized and all items must be of dict type. The keys of each dictionary are strings and the values can be int, float, str, Tensor, non-empty List[int], non-empty List[float] or non-empty List[str]. If values are tensors or lists, the number of dimensions must be the same for all items in the dataset.

hdf_fpath: The path to the HDF file.

pre_transform: The optional transform applied to each item returned by the dataset BEFORE it is stored in the HDF file. Can be used for deterministic transforms like Resample, LogMelSpectrogram, etc. defaults to the identity function.

batch_size: The batch size of the dataloader. defaults to 32.

num_workers: The number of workers of the dataloader. If "auto", it will be set to len(os.sched_getaffinity(0)). defaults to "auto".

skip_scan: If True, the input dataset is considered fully homogeneous, which means that all values of a column share the same shape and dtype, which are inferred from the first batch. It is meant to skip the first step, which scans each dataset item once, and to speed up packing to the HDF file. defaults to False.

encoding: String encoding used in the file. defaults to "utf-8".

file_kwds: Options given to the h5py.File object. defaults to None.

col_kwds: Options given to all dataset columns, i.e. to the h5py.File().create_dataset(.) method. defaults to None.

shape_suffix: Shape column suffix in the HDF file. defaults to "__shape".

store_str_as_vlen: If True, store strings as variable length string dtype. defaults to False.

user_attrs: Additional metadata to add to the HDF file. It must be convertible to JSON with json.dumps. defaults to None.

exists: Determines which action is performed if the target HDF file already exists. defaults to "error".
"overwrite": Replaces the target file, then packs the dataset.
"skip": Skips packing and returns the already packed dataset.
"error": Raises a ValueError.

ds_kwds: Keyword arguments passed to the returned HDFDataset instance if the target file already exists and if exists == "skip".

verbose: Verbose level. defaults to 0.
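The num_workers and exists options described above can be sketched with the standard library alone. The helpers resolve_num_workers and decide_action below are hypothetical names and are not part of torchwrench. The fallback to os.cpu_count() on platforms without os.sched_getaffinity is an assumption, as are the "pack" and "load" return labels.

```python
import os
from pathlib import Path
from typing import Union


def resolve_num_workers(num_workers: Union[int, str]) -> int:
    """Turn the documented "auto" value into a concrete worker count."""
    if num_workers == "auto":
        if hasattr(os, "sched_getaffinity"):
            # Documented rule: number of CPUs the current process may run on.
            return len(os.sched_getaffinity(0))
        # Assumption: fall back to the CPU count where sched_getaffinity is missing.
        return os.cpu_count() or 1
    return int(num_workers)


def decide_action(hdf_fpath: Union[str, Path], exists: str) -> str:
    """Return "pack" or "load" according to the documented exists policy."""
    if exists not in ("overwrite", "skip", "error"):
        raise ValueError(f"Invalid argument exists={exists!r}.")
    if not Path(hdf_fpath).exists():
        return "pack"  # nothing to clash with, so packing proceeds
    if exists == "overwrite":
        return "pack"  # the existing file is replaced
    if exists == "skip":
        return "load"  # the already packed dataset is returned
    # exists == "error": the documentation states that a ValueError is raised.
    raise ValueError(f"Target file {hdf_fpath} already exists.")
```

Because exists defaults to "error", re-running a packing script fails on an existing file unless "overwrite" or "skip" is requested explicitly.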

Returns:

hdf_dataset: The target HDF dataset object.
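A minimal usage sketch of pack_to_hdf, assuming torchwrench and h5py are installed and that a plain list of dicts is accepted as a sized dataset. The helpers make_toy_records, is_json_convertible and pack_toy_dataset, as well as the toy column names, are hypothetical. The call uses only parameters that appear in the signature above.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def make_toy_records(n: int) -> List[Dict[str, Any]]:
    """Build n dict items with string keys and int, float and str values."""
    return [{"index": i, "score": i / 2, "name": f"item_{i}"} for i in range(n)]


def is_json_convertible(user_attrs: Any) -> bool:
    """Check the documented requirement that user_attrs works with json.dumps."""
    try:
        json.dumps(user_attrs)
    except (TypeError, ValueError):
        return False
    return True


def pack_toy_dataset(hdf_fpath: Union[str, Path]) -> Any:
    """Pack a small in-memory dataset and return the resulting HDFDataset."""
    # Imported lazily so the helpers above stay usable without h5py installed.
    from torchwrench.extras.hdf import h5py_is_available, pack_to_hdf

    if not h5py_is_available():
        raise RuntimeError("h5py is required to pack a dataset into an HDF file.")

    user_attrs = {"source": "toy example"}
    if not is_json_convertible(user_attrs):
        raise ValueError("user_attrs must be convertible to JSON.")

    return pack_to_hdf(
        make_toy_records(8),
        hdf_fpath,
        batch_size=4,
        num_workers=0,  # load items in the main process for this small example
        user_attrs=user_attrs,
        exists="overwrite",  # replace any file left over from a previous run
    )
```

The returned object is an HDFDataset, so it can be inspected with properties such as num_rows and column_names and released with close().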

