torchwrench’s documentation¶
Collection of functions and modules to help development in PyTorch.
Useful links:
Key features¶
torchwrench functions and modules can be used like torch ones. The default acronym for torchwrench is tw.
Label conversions¶
Supports multiclass labels conversions between probabilities, classes indices, classes names and onehot encoding.
import torchwrench as tw
probs = tw.as_tensor([[0.9, 0.1], [0.4, 0.6]])
names = tw.probs_to_name(probs, idx_to_name={0: "Cat", 1: "Dog"})
# ["Cat", "Dog"]
This package also supports multilabel labels conversions between probabilities, classes multi-indices, classes multi-names and multihot encoding.
import torchwrench as tw
multihot = tw.as_tensor([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
indices = tw.multihot_to_indices(multihot)
# [[0], [1, 2], []]
Finally, this packages includes the powerset multilabel conversions :
import torchwrench as tw
multihot = tw.as_tensor([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
indices = tw.multilabel_to_powerset(multihot, num_classes=3, max_set_size=2)
# tensor([[0, 1, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 1],
# [1, 0, 0, 0, 0, 0, 0]])
Typing¶
Typing with number of dimensions :
import torchwrench as tw
x1 = tw.as_tensor([1, 2])
print(isinstance(x1, tw.Tensor2D)) # False
x2 = tw.as_tensor([[1, 2], [3, 4]])
print(isinstance(x2, tw.Tensor2D)) # True
Typing with tensor dtype :
import torchwrench as tw
x1 = tw.as_tensor([1, 2], dtype=tw.int)
print(isinstance(x1, tw.SignedIntegerTensor)) # True
x2 = tw.as_tensor([1, 2], dtype=tw.long)
print(isinstance(x2, tw.SignedIntegerTensor1D)) # True
x3 = tw.as_tensor([1, 2], dtype=tw.float)
print(isinstance(x3, tw.SignedIntegerTensor)) # False
Padding & cropping¶
Pad a specific dimension :
import torchwrench as tw
x = tw.rand(10, 3, 1)
padded = tw.pad_dim(x, target_length=5, dim=1, pad_value=-1)
# x2 has shape (10, 5, 1), padded with -1
Pad nested list of tensors to a single one :
import torchwrench as tw
tensors = [tw.rand(10, 2), [tw.rand(3)] * 5, tw.rand(0, 5)]
padded = tw.pad_and_stack_rec(tensors, pad_value=0)
# padded has shape (3, 10, 5), padded with 0
Remove values at a specific dimension :
import torchwrench as tw
x = tw.rand(10, 5, 3)
cropped = tw.crop_dim(x, dim=1, target_length=2)
# cropped has shape (10, 2, 3)
Masking¶
import torchwrench as tw
x = tw.as_tensor([3, 1, 2])
mask = tw.lengths_to_non_pad_mask(x, max_len=4)
# Each row i contains x[i] True values for non-padding mask
# tensor([[True, True, True, False],
# [True, False, False, False],
# [True, True, False, False]])
import torchwrench as tw
x = tw.as_tensor([1, 2, 3, 4])
mask = tw.as_tensor([True, True, False, False])
result = tw.masked_mean(x, mask)
# result contains the mean of the values marked as True: 1.5
Others tensors manipulations¶
import torchwrench as tw
x = tw.as_tensor([1, 2, 3, 4])
result = tw.insert_at_indices(x, indices=[0, 2], values=5)
# result contains tensor with inserted values: tensor([5, 1, 2, 5, 3, 4])
import torchwrench as tw
perm = tw.randperm(10)
inv_perm = tw.get_inverse_perm(perm)
x1 = tw.rand(10)
x2 = x1[perm]
x3 = x2[inv_perm]
# inv_perm are indices that allow us to get x3 from x2, i.e. x1 == x3 here
Pre-compute datasets to HDF files¶
Here is an example of pre-computing spectrograms of torchaudio SPEECHCOMMANDS dataset, using pack_dataset function:
from torchaudio.datasets import SPEECHCOMMANDS
from torchaudio.transforms import Spectrogram
from torchwrench import nn
from torchwrench.extras.hdf import pack_to_hdf
speech_commands_root = "path/to/speech_commands"
packed_root = "path/to/packed_dataset.hdf"
dataset = SPEECHCOMMANDS(speech_commands_root, download=True, subset="validation")
# dataset[0] is a tuple, contains waveform and other metadata
class MyTransform(nn.Module):
def __init__(self) -> None:
super().__init__()
self.spectrogram_extractor = Spectrogram()
def forward(self, item):
waveform = item[0]
spectrogram = self.spectrogram_extractor(waveform)
return (spectrogram,) + item[1:]
pack_to_hdf(dataset, packed_root, MyTransform())
Then you can load the pre-computed dataset using HDFDataset:
from torchwrench.extras.hdf import HDFDataset
packed_root = "path/to/packed_dataset.hdf"
packed_dataset = HDFDataset(packed_root)
packed_dataset[0] # == first transformed item, i.e. transform(dataset[0])
Contact¶
Maintainer: - [Étienne Labbé](https://labbeti.github.io/) “Labbeti”: labbeti.pub@gmail.com