torchwrench.nn package
- class torchwrench.nn.Abs(*args: Any, **kwargs: Any)
Bases: Module
Module version of abs().
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
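The note about hooks can be seen directly with plain PyTorch. The sketch below uses a hypothetical `AbsSketch` module, assumed to behave like `torchwrench.nn.Abs` (a thin wrapper around `torch.abs`), and registers a forward hook with `register_forward_hook`. Calling the instance fires the hook; calling `.forward()` directly does not.

```python
import torch
from torch import nn


class AbsSketch(nn.Module):
    """Hypothetical stand-in for torchwrench.nn.Abs, assumed to wrap torch.abs."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.abs(x)


hook_calls = []
m = AbsSketch()
# Record the output shape every time the forward hook fires.
m.register_forward_hook(lambda module, inputs, output: hook_calls.append(output.shape))

x = torch.tensor([-1.0, 2.0, -3.0])
y_call = m(x)            # __call__ runs forward() and the registered hooks
y_direct = m.forward(x)  # calling forward() directly bypasses the hooks
print(y_call, len(hook_calls))
```

Both calls compute the same values; only the instance call increments `hook_calls`.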
- class torchwrench.nn.AdaptiveAvgPool1d(output_size: int | None | tuple[int | None, ...])
Bases: _AdaptiveAvgPoolNd
Applies a 1D adaptive average pooling over an input signal composed of several input planes.
The output size is \(L_{out}\), for any input size. The number of output features is equal to the number of input planes.
- Args:
output_size: the target output size \(L_{out}\).
- Shape:
Input: \((N, C, L_{in})\) or \((C, L_{in})\).
Output: \((N, C, L_{out})\) or \((C, L_{out})\), where \(L_{out}=\text{output\_size}\).
- Examples:
>>> # target output size of 5
>>> m = nn.AdaptiveAvgPool1d(5)
>>> input = torch.randn(1, 64, 8)
>>> output = m(input)
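"Any input size" works because each output position averages a variable-width window. The sketch below assumes the binning rule used by PyTorch's adaptive pooling (output bin i covers input positions from floor(i * L_in / L_out) up to, but excluding, ceil((i + 1) * L_in / L_out)) and checks it against `torch.nn.functional.adaptive_avg_pool1d`, which `torchwrench.nn.AdaptiveAvgPool1d` is assumed to match. The helper `adaptive_avg_pool1d_ref` is hypothetical.

```python
import torch
import torch.nn.functional as F


def adaptive_avg_pool1d_ref(x: torch.Tensor, out_len: int) -> torch.Tensor:
    """Naive adaptive average pooling over the last dimension (hypothetical helper)."""
    in_len = x.shape[-1]
    bins = []
    for i in range(out_len):
        start = (i * in_len) // out_len                    # floor division
        end = ((i + 1) * in_len + out_len - 1) // out_len  # ceil division
        bins.append(x[..., start:end].mean(dim=-1))
    return torch.stack(bins, dim=-1)


x = torch.randn(1, 64, 8)
ref = adaptive_avg_pool1d_ref(x, 5)
lib = F.adaptive_avg_pool1d(x, 5)
print(ref.shape, torch.allclose(ref, lib, atol=1e-6))
```

For L_in = 8 and L_out = 5 the windows are [0, 2), [1, 4), [3, 5), [4, 7), [6, 8), so neighbouring windows may overlap.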
- class torchwrench.nn.AdaptiveAvgPool2d(output_size: int | None | tuple[int | None, ...])
Bases: _AdaptiveAvgPoolNd
Applies a 2D adaptive average pooling over an input signal composed of several input planes.
The output is of size H x W, for any input size. The number of output features is equal to the number of input planes.
- Args:
- output_size: the target output size of the image of the form H x W.
Can be a tuple (H, W) or a single H for a square image H x H. H and W can be either an int, or None, which means the size will be the same as that of the input.
- Shape:
Input: \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\).
Output: \((N, C, S_{0}, S_{1})\) or \((C, S_{0}, S_{1})\), where \(S=\text{output\_size}\).
- Examples:
>>> # target output size of 5x7
>>> m = nn.AdaptiveAvgPool2d((5, 7))
>>> input = torch.randn(1, 64, 8, 9)
>>> output = m(input)
>>> # target output size of 7x7 (square)
>>> m = nn.AdaptiveAvgPool2d(7)
>>> input = torch.randn(1, 64, 10, 9)
>>> output = m(input)
>>> # target output size of 10x7
>>> m = nn.AdaptiveAvgPool2d((None, 7))
>>> input = torch.randn(1, 64, 10, 9)
>>> output = m(input)
- class torchwrench.nn.AdaptiveAvgPool3d(output_size: int | None | tuple[int | None, ...])
Bases: _AdaptiveAvgPoolNd
Applies a 3D adaptive average pooling over an input signal composed of several input planes.
The output is of size D x H x W, for any input size. The number of output features is equal to the number of input planes.
- Args:
- output_size: the target output size of the form D x H x W.
Can be a tuple (D, H, W) or a single number D for a cube D x D x D. D, H and W can be either an int, or None, which means the size will be the same as that of the input.
- Shape:
Input: \((N, C, D_{in}, H_{in}, W_{in})\) or \((C, D_{in}, H_{in}, W_{in})\).
Output: \((N, C, S_{0}, S_{1}, S_{2})\) or \((C, S_{0}, S_{1}, S_{2})\), where \(S=\text{output\_size}\).
- Examples:
>>> # target output size of 5x7x9
>>> m = nn.AdaptiveAvgPool3d((5, 7, 9))
>>> input = torch.randn(1, 64, 8, 9, 10)
>>> output = m(input)
>>> # target output size of 7x7x7 (cube)
>>> m = nn.AdaptiveAvgPool3d(7)
>>> input = torch.randn(1, 64, 10, 9, 8)
>>> output = m(input)
>>> # target output size of 7x9x8
>>> m = nn.AdaptiveAvgPool3d((7, None, None))
>>> input = torch.randn(1, 64, 10, 9, 8)
>>> output = m(input)
- class torchwrench.nn.AdaptiveLogSoftmaxWithLoss(in_features: int, n_classes: int, cutoffs: Sequence[int], div_value: float = 4.0, head_bias: bool = False, device=None, dtype=None)
Bases: Module
Efficient softmax approximation.
As described in Efficient softmax approximation for GPUs by Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, and Hervé Jégou.
Adaptive softmax is an approximate strategy for training models with large output spaces. It is most effective when the label distribution is highly imbalanced, for example in natural language modelling, where the word frequency distribution approximately follows Zipf’s law.
Adaptive softmax partitions the labels into several clusters, according to their frequency. These clusters may each contain a different number of targets. Additionally, clusters containing less frequent labels assign lower-dimensional embeddings to those labels, which speeds up the computation. For each minibatch, only clusters for which at least one target is present are evaluated.
The idea is that the clusters which are accessed frequently (like the first one, containing most frequent labels), should also be cheap to compute – that is, contain a small number of assigned labels.
We highly recommend taking a look at the original paper for more details.
cutoffs should be an ordered Sequence of integers sorted in increasing order. It controls the number of clusters and the partitioning of targets into clusters. For example, setting cutoffs = [10, 100, 1000] means that targets 0, 1, …, 9 will be assigned to the ‘head’ of the adaptive softmax, targets 10, 11, …, 99 will be assigned to the first cluster, targets 100, 101, …, 999 will be assigned to the second cluster, and targets 1000, 1001, …, n_classes - 1 will be assigned to the last, third cluster.
div_value is used to compute the embedding size of each additional cluster, which is given as \(\left\lfloor\frac{\texttt{in\_features}}{\texttt{div\_value}^{idx}}\right\rfloor\), where \(idx\) is the cluster index (with clusters for less frequent words having larger indices, and indices starting from \(1\)).
head_bias, if set to True, adds a bias term to the ‘head’ of the adaptive softmax. See the paper for details. It is set to False in the official implementation.
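The partitioning rule and the div_value formula can be worked through in a few lines. The helper `asm_partition` below is hypothetical; it only restates the rule described above, and its projection sizes are then cross-checked against `torch.nn.AdaptiveLogSoftmaxWithLoss`, assuming `torchwrench.nn.AdaptiveLogSoftmaxWithLoss` builds its layers the same way (each tail cluster as a two-layer projection).

```python
import torch
from torch import nn


def asm_partition(n_classes, cutoffs, in_features, div_value=4.0):
    """Return (name, label range, embedding size) for the head and each cluster."""
    bounds = list(cutoffs) + [n_classes]
    # The head keeps the full input dimensionality.
    parts = [("head", range(0, bounds[0]), in_features)]
    for idx in range(1, len(bounds)):
        size = int(in_features // (div_value ** idx))  # floor(in_features / div_value**idx)
        parts.append((f"cluster {idx}", range(bounds[idx - 1], bounds[idx]), size))
    return parts


parts = asm_partition(10000, [10, 100, 1000], in_features=64)
for name, labels, size in parts:
    print(name, labels.start, labels.stop - 1, size)

# Cross-check against PyTorch: each tail entry is Sequential(Linear(in, size), Linear(size, n_labels)).
m = nn.AdaptiveLogSoftmaxWithLoss(64, 10000, cutoffs=[10, 100, 1000], div_value=4.0)
tail_sizes = [seq[0].out_features for seq in m.tail]
```

With in_features = 64 and div_value = 4.0 the three clusters get 16-, 4- and 1-dimensional projections, so the rarest 9000 labels are scored through a single hidden unit.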
Warning
Labels passed as inputs to this module should be sorted according to their frequency. This means that the most frequent label should be represented by the index 0, and the least frequent label should be represented by the index n_classes - 1.
Note
This module returns a NamedTuple with output and loss fields. See further documentation for details.
Note
To compute log-probabilities for all classes, the log_prob method can be used.
- Args:
in_features (int): Number of features in the input tensor
n_classes (int): Number of classes in the dataset
cutoffs (Sequence): Cutoffs used to assign targets to their buckets
div_value (float, optional): value used as an exponent to compute sizes of the clusters. Default: 4.0
head_bias (bool, optional): If True, adds a bias term to the ‘head’ of the adaptive softmax. Default: False
- Returns:
NamedTuple with output and loss fields:
output is a Tensor of size N containing computed target log probabilities for each example
loss is a Scalar representing the computed negative log likelihood loss
- Shape:
input: \((N, \texttt{in\_features})\) or \((\texttt{in\_features})\)
target: \((N)\) or \(()\) where each value satisfies \(0 <= \texttt{target[i]} < \texttt{n\_classes}\)
output1: \((N)\) or \(()\)
output2: Scalar
- log_prob(input: Tensor) -> Tensor
Compute log probabilities for all \(\texttt{n\_classes}\).
- Args:
input (Tensor): a minibatch of examples
- Returns:
log-probabilities for each class \(c\) in range \(0 <= c < \texttt{n\_classes}\), where \(\texttt{n\_classes}\) is a parameter passed to the AdaptiveLogSoftmaxWithLoss constructor.
- Shape:
Input: \((N, \texttt{in\_features})\)
Output: \((N, \texttt{n\_classes})\)
- predict(input: Tensor) -> Tensor
Return the class with the highest probability for each example in the input minibatch.
This is equivalent to self.log_prob(input).argmax(dim=1), but is more efficient in some cases.
- Args:
input (Tensor): a minibatch of examples
- Returns:
output (Tensor): a class with the highest probability for each example
- Shape:
Input: \((N, \texttt{in\_features})\)
Output: \((N)\)
- tail : ModuleList
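The relationships among forward, log_prob and predict can be checked numerically. This sketch uses `torch.nn.AdaptiveLogSoftmaxWithLoss` and assumes the torchwrench class behaves identically. It verifies that the forward output is the log-probability of each example's target, that the loss is its negated mean, and that predict agrees with log_prob(...).argmax(dim=1).

```python
import torch
from torch import nn

torch.manual_seed(0)
asm = nn.AdaptiveLogSoftmaxWithLoss(in_features=64, n_classes=1000, cutoffs=[10, 100])
x = torch.randn(8, 64)
target = torch.randint(0, 1000, (8,))

with torch.no_grad():
    out = asm(x, target)    # NamedTuple with .output and .loss
    logp = asm.log_prob(x)  # (N, n_classes) log-probabilities
    pred = asm.predict(x)   # (N,) most likely class per example

# Log-probability each example assigns to its own target.
target_logp = logp.gather(1, target.unsqueeze(1)).squeeze(1)
print(out.output.shape, logp.shape, pred.shape)
```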
- class torchwrench.nn.AdaptiveMaxPool1d(output_size: int | None | tuple[int | None, ...], return_indices: bool = False)
Bases: _AdaptiveMaxPoolNd
Applies a 1D adaptive max pooling over an input signal composed of several input planes.
The output size is \(L_{out}\), for any input size. The number of output features is equal to the number of input planes.
- Args:
output_size: the target output size \(L_{out}\).
return_indices: if True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool1d. Default: False
- Shape:
Input: \((N, C, L_{in})\) or \((C, L_{in})\).
Output: \((N, C, L_{out})\) or \((C, L_{out})\), where \(L_{out}=\text{output\_size}\).
- Examples:
>>> # target output size of 5
>>> m = nn.AdaptiveMaxPool1d(5)
>>> input = torch.randn(1, 64, 8)
>>> output = m(input)
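With return_indices=True the second return value holds, for every pooled element, the position of the maximum along the input's length dimension. A quick way to see this, assuming torchwrench's class matches `torch.nn.AdaptiveMaxPool1d`, is to gather the input at those positions and compare with the pooled output.

```python
import torch
from torch import nn

x = torch.randn(1, 64, 8)
pool = nn.AdaptiveMaxPool1d(5, return_indices=True)
out, idx = pool(x)  # idx holds positions along the last (length) dimension

# Gathering the input at the returned positions reproduces the pooled maxima.
recovered = torch.gather(x, dim=-1, index=idx)
print(out.shape, idx.shape, torch.equal(recovered, out))
```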
- class torchwrench.nn.AdaptiveMaxPool2d(output_size: int | None | tuple[int | None, ...], return_indices: bool = False)
Bases: _AdaptiveMaxPoolNd
Applies a 2D adaptive max pooling over an input signal composed of several input planes.
The output is of size \(H_{out} \times W_{out}\), for any input size. The number of output features is equal to the number of input planes.
- Args:
- output_size: the target output size of the image of the form \(H_{out} \times W_{out}\).
Can be a tuple \((H_{out}, W_{out})\) or a single \(H_{out}\) for a square image \(H_{out} \times H_{out}\). \(H_{out}\) and \(W_{out}\) can be either an int, or None, which means the size will be the same as that of the input.
- return_indices: if True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool2d. Default: False
- Shape:
Input: \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\).
Output: \((N, C, H_{out}, W_{out})\) or \((C, H_{out}, W_{out})\), where \((H_{out}, W_{out})=\text{output\_size}\).
- Examples:
>>> # target output size of 5x7
>>> m = nn.AdaptiveMaxPool2d((5, 7))
>>> input = torch.randn(1, 64, 8, 9)
>>> output = m(input)
>>> # target output size of 7x7 (square)
>>> m = nn.AdaptiveMaxPool2d(7)
>>> input = torch.randn(1, 64, 10, 9)
>>> output = m(input)
>>> # target output size of 10x7
>>> m = nn.AdaptiveMaxPool2d((None, 7))
>>> input = torch.randn(1, 64, 10, 9)
>>> output = m(input)
- class torchwrench.nn.AdaptiveMaxPool3d(output_size: int | None | tuple[int | None, ...], return_indices: bool = False)
Bases: _AdaptiveMaxPoolNd
Applies a 3D adaptive max pooling over an input signal composed of several input planes.
The output is of size \(D_{out} \times H_{out} \times W_{out}\), for any input size. The number of output features is equal to the number of input planes.
- Args:
- output_size: the target output size of the image of the form \(D_{out} \times H_{out} \times W_{out}\).
Can be a tuple \((D_{out}, H_{out}, W_{out})\) or a single \(D_{out}\) for a cube \(D_{out} \times D_{out} \times D_{out}\). \(D_{out}\), \(H_{out}\) and \(W_{out}\) can be either an int, or None, which means the size will be the same as that of the input.
- return_indices: if True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool3d. Default: False
- Shape:
Input: \((N, C, D_{in}, H_{in}, W_{in})\) or \((C, D_{in}, H_{in}, W_{in})\).
Output: \((N, C, D_{out}, H_{out}, W_{out})\) or \((C, D_{out}, H_{out}, W_{out})\), where \((D_{out}, H_{out}, W_{out})=\text{output\_size}\).
- Examples:
>>> # target output size of 5x7x9
>>> m = nn.AdaptiveMaxPool3d((5, 7, 9))
>>> input = torch.randn(1, 64, 8, 9, 10)
>>> output = m(input)
>>> # target output size of 7x7x7 (cube)
>>> m = nn.AdaptiveMaxPool3d(7)
>>> input = torch.randn(1, 64, 10, 9, 8)
>>> output = m(input)
>>> # target output size of 7x9x8
>>> m = nn.AdaptiveMaxPool3d((7, None, None))
>>> input = torch.randn(1, 64, 10, 9, 8)
>>> output = m(input)
- class torchwrench.nn.AlphaDropout(p: float = 0.5, inplace: bool = False)
Bases: _DropoutNd
Applies Alpha Dropout over the input.
Alpha Dropout is a type of Dropout that maintains the self-normalizing property. For an input with zero mean and unit standard deviation, the output of Alpha Dropout maintains the original mean and standard deviation of the input. Alpha Dropout goes hand-in-hand with the SELU activation function, which ensures that the outputs have zero mean and unit standard deviation.
During training, it randomly masks some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. The elements to be masked are randomized on every forward call, and the output is scaled and shifted to maintain zero mean and unit standard deviation.
During evaluation the module simply computes an identity function.
More details can be found in the paper Self-Normalizing Neural Networks.
- Args:
p (float): probability of an element to be dropped. Default: 0.5
inplace (bool, optional): If set to True, will do this operation in-place
- Shape:
Input: \((*)\). Input can be of any shape
Output: \((*)\). Output is of the same shape as input
Examples:
>>> m = nn.AlphaDropout(p=0.2)
>>> input = torch.randn(20, 16)
>>> output = m(input)
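Both claims above (statistics preserved in training, identity in evaluation) can be checked empirically. This sketch uses `torch.nn.AlphaDropout` and assumes the torchwrench class behaves the same. A large standard-normal sample keeps the estimated mean and standard deviation close to their theoretical values of 0 and 1.

```python
import torch
from torch import nn

torch.manual_seed(0)
m = nn.AlphaDropout(p=0.2)
x = torch.randn(1_000_000)  # zero mean, unit standard deviation

m.train()
y_train = m(x)  # masked, then scaled and shifted
m.eval()
y_eval = m(x)   # identity in evaluation mode

print(round(y_train.mean().item(), 3), round(y_train.std().item(), 3))
```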
- class torchwrench.nn.Angle(*args: Any, **kwargs: Any)
Bases: Module
Module version of angle().
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.AsTensor(*, device: device | None | 'default' | 'cuda_if_available' | str | int = None, dtype: dtype | None | 'default' | str | DTypeEnum = None)
Bases: Module
Module version of as_tensor().
- extra_repr() -> str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Any) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.AvgPool1d(kernel_size: int | tuple[int], stride: int | tuple[int] = None, padding: int | tuple[int] = 0, ceil_mode: bool = False, count_include_pad: bool = True)
Bases: _AvgPoolNd
Applies a 1D average pooling over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C, L)\), output \((N, C, L_{out})\) and kernel_size \(k\) can be precisely described as:
\[\text{out}(N_i, C_j, l) = \frac{1}{k} \sum_{m=0}^{k-1} \text{input}(N_i, C_j, \text{stride} \times l + m)\]
If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- Note:
When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
Note
pad should be at most half of the effective kernel size.
The parameters kernel_size, stride, padding can each be an int or a one-element tuple.
- Args:
kernel_size: the size of the window
stride: the stride of the window. Default value is kernel_size
padding: implicit zero padding to be added on both sides
ceil_mode: when True, will use ceil instead of floor to compute the output shape
count_include_pad: when True, will include the zero-padding in the averaging calculation
- Shape:
Input: \((N, C, L_{in})\) or \((C, L_{in})\).
Output: \((N, C, L_{out})\) or \((C, L_{out})\), where
\[L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{kernel\_size}}{\text{stride}} + 1\right\rfloor\]
Per the note above, if ceil_mode is True and \((L_{out} - 1) \times \text{stride} \geq L_{in} + \text{padding}\), we skip the last window as it would start in the right padded region, resulting in \(L_{out}\) being reduced by one.
Examples:
>>> # pool with window of size=3, stride=2
>>> m = nn.AvgPool1d(3, stride=2)
>>> m(torch.tensor([[[1., 2, 3, 4, 5, 6, 7]]]))
tensor([[[2., 4., 6.]]])
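The output-length formula, including the ceil_mode rule for dropping a window that would start in the right padding, can be turned into a small function and compared with the shapes `torch.nn.AvgPool1d` actually produces. `pool_out_len` is a hypothetical helper written for this check, and the torchwrench class is assumed to follow the same shape rules. Every test case keeps padding at most half the kernel size, as the note requires.

```python
import torch
from torch import nn


def pool_out_len(l_in, k, stride=None, padding=0, ceil_mode=False):
    """Output length of 1D average pooling, following the Shape formula above."""
    stride = k if stride is None else stride
    span = l_in + 2 * padding - k
    if not ceil_mode:
        return span // stride + 1
    l_out = -(-span // stride) + 1  # ceil division
    # Drop the last window if it would start in the right padded region.
    if (l_out - 1) * stride >= l_in + padding:
        l_out -= 1
    return l_out


# (L_in, kernel_size, stride, padding, ceil_mode)
cases = [(7, 3, 2, 0, False), (8, 3, 2, 0, True), (5, 2, 2, 1, True), (10, 4, 3, 2, True)]
results = []
for l_in, k, s, p, ceil in cases:
    layer = nn.AvgPool1d(k, stride=s, padding=p, ceil_mode=ceil)
    actual = layer(torch.randn(1, 1, l_in)).shape[-1]
    results.append((pool_out_len(l_in, k, s, p, ceil), actual))
print(results)
```

In the third case, ceil division gives 4 windows, but the fourth would start at position 6, which lies in the right padding (L_in + padding = 6), so it is dropped.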
- class torchwrench.nn.AvgPool2d(kernel_size: int | tuple[int, int], stride: int | tuple[int, int] | None = None, padding: int | tuple[int, int] = 0, ceil_mode: bool = False, count_include_pad: bool = True, divisor_override: int | None = None)
Bases: _AvgPoolNd
Applies a 2D average pooling over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C, H, W)\), output \((N, C, H_{out}, W_{out})\) and kernel_size \((kH, kW)\) can be precisely described as:
\[out(N_i, C_j, h, w) = \frac{1}{kH * kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times h + m, stride[1] \times w + n)\]
If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- Note:
When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
Note
pad should be at most half of the effective kernel size.
The parameters kernel_size, stride, padding can either be:
a single int or a single-element tuple – in which case the same value is used for the height and width dimension
a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
- Args:
kernel_size: the size of the window
stride: the stride of the window. Default value is kernel_size
padding: implicit zero padding to be added on both sides
ceil_mode: when True, will use ceil instead of floor to compute the output shape
count_include_pad: when True, will include the zero-padding in the averaging calculation
divisor_override: if specified, it will be used as divisor, otherwise size of the pooling region will be used.
- Shape:
Input: \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\).
Output: \((N, C, H_{out}, W_{out})\) or \((C, H_{out}, W_{out})\), where
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{kernel\_size}[0]}{\text{stride}[0]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{kernel\_size}[1]}{\text{stride}[1]} + 1\right\rfloor\]
Per the note above, if ceil_mode is True and \((H_{out} - 1)\times \text{stride}[0]\geq H_{in} + \text{padding}[0]\), we skip the last window as it would start in the bottom padded region, resulting in \(H_{out}\) being reduced by one. The same applies for \(W_{out}\).
Examples:
>>> # pool of square window of size=3, stride=2
>>> m = nn.AvgPool2d(3, stride=2)
>>> # pool of non-square window
>>> m = nn.AvgPool2d((3, 2), stride=(2, 1))
>>> input = torch.randn(20, 16, 50, 32)
>>> output = m(input)
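The effect of count_include_pad and divisor_override is easiest to see on an all-ones input where every window mixes real elements with padding. This sketch uses `torch.nn.AvgPool2d` and assumes the torchwrench class is equivalent. With kernel 2, stride 2 and padding 1, each of the four output windows covers one real element and three padded zeros.

```python
import torch
from torch import nn

x = torch.ones(1, 1, 2, 2)
# Each 2x2 window sees one real input element (value 1) and three padding zeros.
incl = nn.AvgPool2d(2, stride=2, padding=1, count_include_pad=True)(x)   # 1 / 4
excl = nn.AvgPool2d(2, stride=2, padding=1, count_include_pad=False)(x)  # 1 / 1
div2 = nn.AvgPool2d(2, stride=2, padding=1, divisor_override=2)(x)       # 1 / 2
print(incl.flatten().tolist(), excl.flatten().tolist(), div2.flatten().tolist())
```

divisor_override replaces the divisor outright, so the result no longer depends on count_include_pad.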
- class torchwrench.nn.AvgPool3d(kernel_size: int | tuple[int, int, int], stride: int | tuple[int, int, int] | None = None, padding: int | tuple[int, int, int] = 0, ceil_mode: bool = False, count_include_pad: bool = True, divisor_override: int | None = None)
Bases: _AvgPoolNd
Applies a 3D average pooling over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C, D, H, W)\), output \((N, C, D_{out}, H_{out}, W_{out})\) and kernel_size \((kD, kH, kW)\) can be precisely described as:
\[\begin{split}\begin{aligned} \text{out}(N_i, C_j, d, h, w) ={} & \sum_{k=0}^{kD-1} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} \\ & \frac{\text{input}(N_i, C_j, \text{stride}[0] \times d + k, \text{stride}[1] \times h + m, \text{stride}[2] \times w + n)} {kD \times kH \times kW} \end{aligned}\end{split}\]
If padding is non-zero, then the input is implicitly zero-padded on all three sides for padding number of points.
- Note:
When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
Note
pad should be at most half of the effective kernel size.
The parameters kernel_size, stride can either be:
a single int – in which case the same value is used for the depth, height and width dimension
a tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
- Args:
kernel_size: the size of the window
stride: the stride of the window. Default value is kernel_size
padding: implicit zero padding to be added on all three sides
ceil_mode: when True, will use ceil instead of floor to compute the output shape
count_include_pad: when True, will include the zero-padding in the averaging calculation
divisor_override: if specified, it will be used as divisor, otherwise kernel_size will be used
- Shape:
Input: \((N, C, D_{in}, H_{in}, W_{in})\) or \((C, D_{in}, H_{in}, W_{in})\).
Output: \((N, C, D_{out}, H_{out}, W_{out})\) or \((C, D_{out}, H_{out}, W_{out})\), where
\[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{kernel\_size}[0]}{\text{stride}[0]} + 1\right\rfloor\]
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{kernel\_size}[1]}{\text{stride}[1]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{kernel\_size}[2]}{\text{stride}[2]} + 1\right\rfloor\]
Per the note above, if ceil_mode is True and \((D_{out} - 1)\times \text{stride}[0]\geq D_{in} + \text{padding}[0]\), we skip the last window as it would start in the padded region, resulting in \(D_{out}\) being reduced by one. The same applies for \(W_{out}\) and \(H_{out}\).
Examples:
>>> # pool of square window of size=3, stride=2
>>> m = nn.AvgPool3d(3, stride=2)
>>> # pool of non-square window
>>> m = nn.AvgPool3d((3, 2, 2), stride=(2, 1, 2))
>>> input = torch.randn(20, 16, 50, 44, 31)
>>> output = m(input)
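Applying the Shape formulas to the non-square example above gives D_out = floor((50 - 3) / 2) + 1 = 24, H_out = floor((44 - 2) / 1) + 1 = 43 and W_out = floor((31 - 2) / 2) + 1 = 15. The sketch below checks this arithmetic against `torch.nn.AvgPool3d`, which the torchwrench class is assumed to match. It uses a smaller batch and channel count than the example to keep memory low, since those dimensions do not affect the spatial output size.

```python
import torch
from torch import nn


def out_size(n, k, s, p=0):
    """floor((n + 2p - k) / s) + 1, the Shape formula above without ceil_mode."""
    return (n + 2 * p - k) // s + 1


kernel, stride = (3, 2, 2), (2, 1, 2)
spatial_in = (50, 44, 31)
expected = tuple(out_size(n, k, s) for n, k, s in zip(spatial_in, kernel, stride))

m = nn.AvgPool3d(kernel, stride=stride)
output = m(torch.randn(2, 4, *spatial_in))  # smaller N and C than the example above
print(expected, tuple(output.shape))
```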
- class torchwrench.nn.BCELoss(weight: Tensor | None = None, size_average=None, reduce=None, reduction: str = 'mean')
Bases: _WeightedLoss
Creates a criterion that measures the Binary Cross Entropy between the target and the input probabilities:
The unreduced (i.e. with reduction set to 'none') loss can be described as:
\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right],\]
where \(N\) is the batch size. If reduction is not 'none' (default 'mean'), then
\[\begin{split}\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets \(y\) should be numbers between 0 and 1.
Notice that if \(x_n\) is either 0 or 1, one of the log terms would be mathematically undefined in the above loss equation. PyTorch chooses to set \(\log (0) = -\infty\), since \(\lim_{x\to 0} \log (x) = -\infty\). However, an infinite term in the loss equation is not desirable for several reasons.
For one, if either \(y_n = 0\) or \((1 - y_n) = 0\), then we would be multiplying 0 with infinity. Secondly, if we have an infinite loss value, then we would also have an infinite term in our gradient, since \(\lim_{x\to 0} \frac{d}{dx} \log (x) = \infty\). This would make BCELoss’s backward method nonlinear with respect to \(x_n\), and using it for things like linear regression would not be straightforward.
Our solution is that BCELoss clamps its log function outputs to be greater than or equal to -100. This way, we can always have a finite loss value and a linear backward method.
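The clamp can be observed directly. With a prediction of exactly 0 and a target of 1, the log(x) term would be −∞; after clamping it becomes −100, so the loss is exactly 100. For predictions strictly inside (0, 1) the loss matches the unclamped formula. The sketch uses `torch.nn.BCELoss` and assumes the torchwrench class is the same.

```python
import torch
from torch import nn

loss = nn.BCELoss()
# x = 0 with y = 1 makes log(x) = -inf; the clamp turns that term into -100.
saturated = loss(torch.tensor([0.0]), torch.tensor([1.0]))

# For inputs strictly inside (0, 1) the result matches the unclamped formula.
x = torch.tensor([0.2, 0.7, 0.9])
y = torch.tensor([0.0, 1.0, 1.0])
manual = -(y * torch.log(x) + (1 - y) * torch.log(1 - x)).mean()
print(saturated.item(), loss(x, y).item(), manual.item())
```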
- Args:
- weight (Tensor, optional): a manual rescaling weight given to the loss
of each batch element. If given, has to be a Tensor of size nbatch.
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Target: \((*)\), same shape as the input.
Output: scalar. If reduction is 'none', then \((*)\), same shape as input.
Examples:
>>> m = nn.Sigmoid()
>>> loss = nn.BCELoss()
>>> input = torch.randn(3, 2, requires_grad=True)
>>> target = torch.rand(3, 2, requires_grad=False)
>>> output = loss(m(input), target)
>>> output.backward()
- class torchwrench.nn.BCEWithLogitsLoss(weight: Tensor | None = None, size_average=None, reduce=None, reduction: str = 'mean', pos_weight: Tensor | None = None)
Bases: _Loss
This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
The unreduced (i.e. with reduction set to 'none') loss can be described as:
\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_n \left[ y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log (1 - \sigma(x_n)) \right],\]
where \(N\) is the batch size. If reduction is not 'none' (default 'mean'), then
\[\begin{split}\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets \(y_n\) should be numbers between 0 and 1.
It’s possible to trade off recall and precision by adding weights to positive examples. In the case of multi-label classification the loss can be described as:
\[\ell_c(x, y) = L_c = \{l_{1,c},\dots,l_{N,c}\}^\top, \quad l_{n,c} = - w_{n,c} \left[ p_c y_{n,c} \cdot \log \sigma(x_{n,c}) + (1 - y_{n,c}) \cdot \log (1 - \sigma(x_{n,c})) \right],\]
where \(c\) is the class number (\(c > 1\) for multi-label binary classification, \(c = 1\) for single-label binary classification), \(n\) is the number of the sample in the batch and \(p_c\) is the weight of the positive answer for the class \(c\).
\(p_c > 1\) increases the recall, \(p_c < 1\) increases the precision.
For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to \(\frac{300}{100}=3\). The loss would act as if the dataset contains \(3\times 100=300\) positive examples.
Examples:
>>> target = torch.ones([10, 64], dtype=torch.float32)  # 64 classes, batch size = 10
>>> output = torch.full([10, 64], 1.5)  # A prediction (logit)
>>> pos_weight = torch.ones([64])  # All weights are equal to 1
>>> criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
>>> criterion(output, target)  # -log(sigmoid(1.5))
tensor(0.20...)
In the above example, the pos_weight tensor’s elements correspond to the 64 distinct classes in a multi-label binary classification scenario. Each element in pos_weight is designed to adjust the loss function based on the imbalance between negative and positive samples for the respective class. This approach is useful in datasets with varying levels of class imbalance, ensuring that the loss calculation accurately accounts for the distribution in each class.
- Args:
- weight (Tensor, optional): a manual rescaling weight given to the loss
of each batch element. If given, has to be a Tensor of size nbatch.
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- pos_weight (Tensor, optional): a weight of positive examples to be broadcasted with target. Must be a tensor with equal size along the class dimension to the number of classes. Pay close attention to PyTorch’s broadcasting semantics in order to achieve the desired operations. For a target of size [B, C, H, W] (where B is batch size), a pos_weight of size [B, C, H, W] will apply different pos_weights to each element of the batch, while a pos_weight of size [C, H, W] will apply the same pos_weights across the batch. To apply the same positive weight along all spatial dimensions for a 2D multi-class target [C, H, W], use [C, 1, 1]. Default: None
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Target: \((*)\), same shape as the input.
Output: scalar. If reduction is 'none', then \((*)\), same shape as input.
Examples:
>>> loss = nn.BCEWithLogitsLoss()
>>> input = torch.randn(3, requires_grad=True)
>>> target = torch.empty(3).random_(2)
>>> output = loss(input, target)
>>> output.backward()
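The per-class formula with \(p_c\) can be reproduced by hand using the identity \(\log(1 - \sigma(x)) = \log \sigma(-x)\). The sketch also contrasts the fused loss with Sigmoid followed by BCELoss on a very large logit, where the numerical-stability claim shows up concretely. It uses `torch.nn.BCEWithLogitsLoss` and `torch.nn.functional.logsigmoid`, assuming the torchwrench class matches PyTorch's. The pos_weight values are arbitrary illustration values.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 3)
target = torch.randint(0, 2, (4, 3)).float()
pos_weight = torch.tensor([1.0, 3.0, 0.5])  # one weight per class (last dimension)

lib = nn.BCEWithLogitsLoss(pos_weight=pos_weight, reduction="none")(logits, target)
# l = -[p_c * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))],
# written with log(1 - sigmoid(x)) = log(sigmoid(-x)) to stay numerically stable.
manual = -(pos_weight * target * F.logsigmoid(logits) + (1 - target) * F.logsigmoid(-logits))

# A large logit with target 0: the fused loss stays exact, while Sigmoid + BCELoss
# rounds sigmoid(200) to 1.0 in float32 and hits the -100 log clamp.
big, zero = torch.tensor([200.0]), torch.tensor([0.0])
fused = nn.BCEWithLogitsLoss()(big, zero)
split = nn.BCELoss()(torch.sigmoid(big), zero)
print(fused.item(), split.item())
```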
- class torchwrench.nn.BatchNorm1d(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, device=None, dtype=None)
Bases: _BatchNorm
Applies Batch Normalization over a 2D or 3D input.
Method described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the number of features or channels of the input). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0. At train time in the forward pass, the variance is calculated via the biased estimator, equivalent to torch.var(input, correction=0). However, the value stored in the moving average of the variance is calculated via the unbiased estimator, equivalent to torch.var(input, correction=1).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are instead used during evaluation time as well.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Because the Batch Normalization is done over the C dimension, computing statistics on (N, L) slices, it’s common terminology to call this Temporal Batch Normalization.
- Args:
- num_features: number of features or channels \(C\) of the input
- eps: a value added to the denominator for numerical stability. Default: 1e-5
- momentum: the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
- affine: a boolean value that, when set to True, gives this module learnable affine parameters. Default: True
- track_running_stats: a boolean value that, when set to True, makes this module track the running mean and variance; when set to False, this module does not track such statistics and initializes the statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- Shape:
Input: \((N, C)\) or \((N, C, L)\), where \(N\) is the batch size, \(C\) is the number of features or channels, and \(L\) is the sequence length
Output: \((N, C)\) or \((N, C, L)\) (same shape as input)
Examples:
>>> # With Learnable Parameters
>>> m = nn.BatchNorm1d(100)
>>> # Without Learnable Parameters
>>> m = nn.BatchNorm1d(100, affine=False)
>>> input = torch.randn(20, 100)
>>> output = m(input)
- class torchwrench.nn.BatchNorm2d(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, device=None, dtype=None)
Bases: _BatchNorm
Applies Batch Normalization over a 4D input.
A 4D input is a mini-batch of 2D inputs with an additional channel dimension. The method is described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard deviation are calculated per dimension over the mini-batches, and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the number of channels of the input). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0. At train time in the forward pass, the variance is calculated via the biased estimator, equivalent to torch.var(input, correction=0). However, the value stored in the moving average of the variance is calculated via the unbiased estimator, equivalent to torch.var(input, correction=1).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation as well.
Note
This momentum argument is different from the one used in optimizer classes and from the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it is common terminology to call this Spatial Batch Normalization.
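For the 4D case, "computing statistics on (N, H, W) slices" means each channel gets a single mean and variance pooled over the batch and both spatial axes. The plain-Python sketch below (hypothetical helper name, nested lists standing in for a tensor) makes that reduction explicit.

```python
def per_channel_mean_var(x):
    """Hypothetical helper: mean and biased variance of each channel of a
    (N, C, H, W) nested list, reducing over the N, H and W axes together."""
    num_channels = len(x[0])
    means, variances = [], []
    for ch in range(num_channels):
        # Gather every value in the (N, H, W) slice belonging to channel ch.
        values = [v for sample in x for row in sample[ch] for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        means.append(mean)
        variances.append(var)
    return means, variances


# N=2, C=2, H=1, W=2
x = [
    [[[1.0, 2.0]], [[10.0, 10.0]]],
    [[[3.0, 4.0]], [[20.0, 20.0]]],
]
means, variances = per_channel_mean_var(x)
print(means)      # [2.5, 15.0]
print(variances)  # [1.25, 25.0]
```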
- Args:
- num_features: \(C\) from an expected input of size \((N, C, H, W)\)
- eps: a value added to the denominator for numerical stability. Default: 1e-5
- momentum: the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
- affine: a boolean value that, when set to True, gives this module learnable affine parameters. Default: True
- track_running_stats: a boolean value that, when set to True, makes this module track the running mean and variance; when set to False, this module does not track such statistics and initializes the statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
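With momentum=None, the running statistics become a cumulative (simple) average. One way to see this, sketched below under the assumption that the update factor is 1 / (number of batches tracked so far), is to reuse the documented update rule with that factor in place of momentum. The helper name `update_running_stat` is hypothetical.

```python
def update_running_stat(running, batch_stat, num_batches_tracked, momentum):
    """Hypothetical helper applying x_hat_new = (1 - f) * x_hat + f * x_t.

    num_batches_tracked counts batches seen *including* the current one.
    Assumption: with momentum=None the factor f is 1 / num_batches_tracked,
    which turns the update into a cumulative (simple) average.
    """
    factor = 1.0 / num_batches_tracked if momentum is None else momentum
    return (1 - factor) * running + factor * batch_stat


batch_means = [2.0, 4.0, 9.0]
running = 0.0
for step, m in enumerate(batch_means, start=1):
    running = update_running_stat(running, m, step, momentum=None)
print(running)  # approximately 5.0, the plain average of the batch means
```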
- Shape:
Input: \((N, C, H, W)\)
Output: \((N, C, H, W)\) (same shape as input)
Examples:
>>> # With Learnable Parameters
>>> m = nn.BatchNorm2d(100)
>>> # Without Learnable Parameters
>>> m = nn.BatchNorm2d(100, affine=False)
>>> input = torch.randn(20, 100, 35, 45)
>>> output = m(input)
- class torchwrench.nn.BatchNorm3d(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, device=None, dtype=None)
Bases: _BatchNorm
Applies Batch Normalization over a 5D input.
A 5D input is a mini-batch of 3D inputs with an additional channel dimension. The method is described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard deviation are calculated per dimension over the mini-batches, and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the number of channels of the input). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0. At train time in the forward pass, the variance is calculated via the biased estimator, equivalent to torch.var(input, correction=0). However, the value stored in the moving average of the variance is calculated via the unbiased estimator, equivalent to torch.var(input, correction=1).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation as well.
Note
This momentum argument is different from the one used in optimizer classes and from the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Because the Batch Normalization is done over the C dimension, computing statistics on (N, D, H, W) slices, it is common terminology to call this Volumetric Batch Normalization or Spatio-temporal Batch Normalization.
- Args:
- num_features: \(C\) from an expected input of size \((N, C, D, H, W)\)
- eps: a value added to the denominator for numerical stability. Default: 1e-5
- momentum: the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
- affine: a boolean value that, when set to True, gives this module learnable affine parameters. Default: True
- track_running_stats: a boolean value that, when set to True, makes this module track the running mean and variance; when set to False, this module does not track such statistics and initializes the statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- Shape:
Input: \((N, C, D, H, W)\)
Output: \((N, C, D, H, W)\) (same shape as input)
Examples:
>>> # With Learnable Parameters
>>> m = nn.BatchNorm3d(100)
>>> # Without Learnable Parameters
>>> m = nn.BatchNorm3d(100, affine=False)
>>> input = torch.randn(20, 100, 35, 45, 10)
>>> output = m(input)
- class torchwrench.nn.Bilinear(in1_features: int, in2_features: int, out_features: int, bias: bool = True, device=None, dtype=None)
Bases: Module
Applies a bilinear transformation to the incoming data: \(y = x_1^T A x_2 + b\).
- Args:
- in1_features: size of each first input sample, must be > 0
- in2_features: size of each second input sample, must be > 0
- out_features: size of each output sample, must be > 0
- bias: If set to False, the layer will not learn an additive bias. Default: True
- Shape:
Input1: \((*, H_\text{in1})\) where \(H_\text{in1}=\text{in1\_features}\) and \(*\) means any number of additional dimensions including none. All but the last dimension of the inputs should be the same.
Input2: \((*, H_\text{in2})\) where \(H_\text{in2}=\text{in2\_features}\).
Output: \((*, H_\text{out})\) where \(H_\text{out}=\text{out\_features}\) and all but the last dimension are the same shape as the input.
- Attributes:
- weight: the learnable weights of the module of shape \((\text{out\_features}, \text{in1\_features}, \text{in2\_features})\). The values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in1\_features}}\)
- bias: the learnable bias of the module of shape \((\text{out\_features})\). If bias is True, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in1\_features}}\)
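Written out per output feature, \(y = x_1^T A x_2 + b\) uses one in1_features × in2_features matrix \(A_o\) for each of the out_features outputs, which is why weight has shape (out_features, in1_features, in2_features). The plain-Python sketch below (hypothetical helper, unbatched inputs, hand-picked weights instead of the random initialization) evaluates that sum directly.

```python
def bilinear(x1, x2, weight, bias=None):
    """Hypothetical helper: y_o = x1^T A_o x2 + b_o for each output feature o.

    weight is a nested list of shape (out_features, in1_features, in2_features);
    bias, if given, has length out_features.
    """
    out = []
    for o, a in enumerate(weight):
        y = sum(
            x1[i] * a[i][j] * x2[j]
            for i in range(len(x1))
            for j in range(len(x2))
        )
        if bias is not None:
            y += bias[o]
        out.append(y)
    return out


x1 = [1.0, 2.0]  # in1_features = 2
x2 = [3.0, 4.0]  # in2_features = 2
weight = [
    [[1.0, 0.0], [0.0, 1.0]],  # A_0 = identity, so y_0 = x1 . x2 + b_0
    [[0.0, 1.0], [0.0, 0.0]],  # A_1 picks out x1[0] * x2[1]
]
print(bilinear(x1, x2, weight, bias=[0.5, 0.0]))  # [11.5, 4.0]
```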
Examples:
>>> m = nn.Bilinear(20, 30, 40)
>>> input1 = torch.randn(128, 20)
>>> input2 = torch.randn(128, 30)
>>> output = m(input1, input2)
>>> print(output.size())
torch.Size([128, 40])
- class torchwrench.nn.CELU(alpha: float = 1.0, inplace: bool = False)
Bases: Module
Applies the CELU function element-wise.
\[\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1))\]
More details can be found in the paper Continuously Differentiable Exponential Linear Units.
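The CELU formula translates directly into scalar Python. This sketch (hypothetical function name, no in-place option) shows that positive inputs pass through unchanged while negative inputs are smoothly bounded below by \(-\alpha\).

```python
import math


def celu(x, alpha=1.0):
    """CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1)) for a scalar x."""
    return max(0.0, x) + min(0.0, alpha * (math.exp(x / alpha) - 1.0))


print(celu(2.0))              # 2.0: positive inputs pass through unchanged
print(celu(-1.0))             # exp(-1) - 1, approximately -0.632
print(celu(-1.0, alpha=2.0))  # 2 * (exp(-0.5) - 1), approximately -0.787
```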
- Args:
- alpha: the \(\alpha\) value for the CELU formulation. Default: 1.0
- inplace: can optionally do the operation in-place. Default: False
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.CELU()
>>> input = torch.randn(2)
>>> output = m(input)
- class torchwrench.nn.CTCLoss(blank: int = 0, reduction: str = 'mean', zero_infinity: bool = False)
Bases: _Loss
The Connectionist Temporal Classification loss.
Calculates loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be “many-to-one”, which limits the length of the target sequence such that it must be \(\leq\) the input length.
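The "many-to-one" alignment and the sum over alignments can be checked by hand on a tiny problem. The sketch below is plain Python with made-up probabilities (not log-probabilities) and blank index 0; the helper names are hypothetical, and a real implementation such as nn.CTCLoss works on log_probs instead. It enumerates every alignment path by brute force, keeps those whose collapse (merge repeats, then drop blanks) equals the target, and compares the total with the standard forward (alpha) recursion from Graves et al.

```python
import itertools
import math


def collapse(path, blank=0):
    """Many-to-one CTC mapping: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def ctc_prob_brute_force(probs, target, blank=0):
    """Sum the probabilities of every alignment that collapses to target.

    probs[t][c] is the probability (not log-probability) of class c at step t.
    Cost is exponential in T, so this only suits toy inputs.
    """
    num_classes = len(probs[0])
    total = 0.0
    for path in itertools.product(range(num_classes), repeat=len(probs)):
        if collapse(path, blank) == list(target):
            total += math.prod(probs[t][c] for t, c in enumerate(path))
    return total


def ctc_prob_forward(probs, target, blank=0):
    """The same quantity via the forward (alpha) recursion over the
    blank-extended target [blank, l1, blank, l2, ..., blank]."""
    ext = [blank]
    for label in target:
        ext += [label, blank]
    n_states = len(ext)
    alpha = [0.0] * n_states
    alpha[0] = probs[0][blank]
    if n_states > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new_alpha = [0.0] * n_states
        for s in range(n_states):
            acc = alpha[s]              # stay on the same state
            if s >= 1:
                acc += alpha[s - 1]     # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc += alpha[s - 2]     # skip a blank between distinct labels
            new_alpha[s] = acc * probs[t][ext[s]]
        alpha = new_alpha
    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if n_states > 1 else 0.0)


# T = 3 time steps, C = 3 classes, class 0 is the blank.
probs = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.6, 0.1, 0.3]]
target = [1, 2]
p = ctc_prob_forward(probs, target)
print(ctc_prob_brute_force(probs, target), p)  # both approximately 0.186
print(-math.log(p))  # negative log-likelihood, approximately 1.682
```

The per-sample loss (as with reduction='none') is the negative log of this summed probability; the forward recursion gives the same value in time linear in T instead of exponential.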
- Args:
- blank (int, optional): blank label. Default: \(0\).
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken; 'sum': the output losses will be summed. Default: 'mean'
- zero_infinity (bool, optional): Whether to zero infinite losses and the associated gradients. Default: False. Infinite losses mainly occur when the inputs are too short to be aligned to the targets.
- Shape:
Log_probs: Tensor of size \((T, N, C)\) or \((T, C)\), where \(T = \text{input length}\), \(N = \text{batch size}\), and \(C = \text{number of classes (including blank)}\). The logarithmized probabilities of the outputs (e.g. obtained with torch.nn.functional.log_softmax()).
Targets: Tensor of size \((N, S)\) or \((\operatorname{sum}(\text{target\_lengths}))\), where \(N = \text{batch size}\) and \(S = \text{max target length, if shape is } (N, S)\). It represents the target sequences. Each element in the target sequence is a class index, and a target index cannot be the blank label (default 0). In the \((N, S)\) form, targets are padded to the length of the longest sequence and stacked. In the \((\operatorname{sum}(\text{target\_lengths}))\) form, the targets are assumed to be un-padded and concatenated along a single dimension.
Input_lengths: Tuple or tensor of size \((N)\) or \(()\), where \(N = \text{batch size}\). It represents the lengths of the inputs (each must be \(\leq T\)). The lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.
Target_lengths: Tuple or tensor of size \((N)\) or \(()\), where \(N = \text{batch size}\). It represents the lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. If the target shape is \((N, S)\), target_lengths are effectively the stop index \(s_n\) for each target sequence, such that target_n = targets[n, 0:s_n] for each target in a batch. Lengths must each be \(\leq S\). If the targets are given as a 1D tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor.
Output: scalar if reduction is 'mean' (default) or 'sum'. If reduction is 'none', then \((N)\) if the input is batched or \(()\) if the input is unbatched, where \(N = \text{batch size}\).
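The two target layouts carry the same information. A small plain-Python sketch, with a hypothetical helper name, converting the padded \((N, S)\) form into the concatenated 1D form by keeping targets[n][0:s_n] for each sequence:

```python
def padded_to_concatenated(targets, target_lengths):
    """Hypothetical helper: turn (N, S) padded targets into the 1D
    concatenated form, keeping targets[n][0:s_n] for each sequence."""
    flat = []
    for row, s_n in zip(targets, target_lengths):
        if s_n > len(row):
            raise ValueError("each target length must be <= S")
        flat.extend(row[:s_n])
    return flat


padded = [
    [3, 1, 4, 0, 0],  # s_0 = 3; the trailing entries are padding
    [2, 7, 1, 8, 2],  # s_1 = 5
]
lengths = [3, 5]
flat = padded_to_concatenated(padded, lengths)
print(flat)  # [3, 1, 4, 2, 7, 1, 8, 2]
assert len(flat) == sum(lengths)  # the concatenated form must add up
```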
Examples:
>>> # Targets are to be padded
>>> T = 50  # Input sequence length
>>> C = 20  # Number of classes (including blank)
>>> N = 16  # Batch size
>>> S = 30  # Target sequence length of longest target in batch (padding length)
>>> S_min = 10  # Minimum target length, for demonstration purposes
>>>
>>> # Initialize random batch of input vectors, for *size = (T,N,C)
>>> input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
>>>
>>> # Initialize random batch of targets (0 = blank, 1:C = classes)
>>> target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)
>>>
>>> input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
>>> target_lengths = torch.randint(
...     low=S_min,
...     high=S,
...     size=(N,),
...     dtype=torch.long,
... )
>>> ctc_loss = nn.CTCLoss()
>>> loss = ctc_loss(input, target, input_lengths, target_lengths)
>>> loss.backward()
>>>
>>>
>>> # Targets are to be un-padded
>>> T = 50  # Input sequence length
>>> C = 20  # Number of classes (including blank)
>>> N = 16  # Batch size
>>>
>>> # Initialize random batch of input vectors, for *size = (T,N,C)
>>> input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
>>> input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
>>>
>>> # Initialize random batch of targets (0 = blank, 1:C = classes)
>>> target_lengths = torch.randint(low=1, high=T, size=(N,), dtype=torch.long)
>>> target = torch.randint(
...     low=1,
...     high=C,
...     size=(sum(target_lengths),),
...     dtype=torch.long,
... )
>>> ctc_loss = nn.CTCLoss()
>>> loss = ctc_loss(input, target, input_lengths, target_lengths)
>>> loss.backward()
>>>
>>>
>>> # Targets are to be un-padded and unbatched (effectively N=1)
>>> T = 50  # Input sequence length
>>> C = 20  # Number of classes (including blank)
>>>
>>> # Initialize random batch of input vectors, for *size = (T,C)
>>> # xdoctest: +SKIP("FIXME: error in doctest")
>>> input = torch.randn(T, C).log_softmax(1).detach().requires_grad_()
>>> input_lengths = torch.tensor(T, dtype=torch.long)
>>>
>>> # Initialize random batch of targets (0 = blank, 1:C = classes)
>>> target_lengths = torch.randint(low=1, high=T, size=(), dtype=torch.long)
>>> target = torch.randint(
...     low=1,
...     high=C,
...     size=(target_lengths,),
...     dtype=torch.long,
... )
>>> ctc_loss = nn.CTCLoss()
>>> loss = ctc_loss(input, target, input_lengths, target_lengths)
>>> loss.backward()
- Reference:
A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: https://www.cs.toronto.edu/~graves/icml_2006.pdf
- Note:
In order to use CuDNN, the following must be satisfied: the targets must be in concatenated format, all input_lengths must be T, \(blank=0\), target_lengths \(\leq 256\), the integer arguments must be of dtype torch.int32, and log_probs itself must be of dtype torch.float32.
The regular implementation uses the (more common in PyTorch) torch.long dtype.
- Note:
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background.
- class torchwrench.nn.ChannelShuffle(groups: int)
Bases: Module
Divides and rearranges the channels in a tensor.
This operation divides the channels in a tensor of shape \((N, C, *)\) into g groups as \((N, \frac{C}{g}, g, *)\) and shuffles them, while retaining the original tensor shape in the final output.
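The shuffle amounts to a fixed permutation of channel indices: view the C channels as a (g, C/g) grid, transpose it, and read it back row by row. This plain-Python sketch (hypothetical helper) returns the source channel for each output position and reproduces the ordering in the example below, where channels 0, 1, 2, 3 come out as 0, 2, 1, 3.

```python
def channel_shuffle_order(channels, groups):
    """Hypothetical helper: source channel index for each output channel.

    View the channel axis as (groups, channels // groups), swap the two
    axes, and flatten again.
    """
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    per_group = channels // groups
    return [g * per_group + i for i in range(per_group) for g in range(groups)]


print(channel_shuffle_order(4, 2))  # [0, 2, 1, 3]
print(channel_shuffle_order(6, 3))  # [0, 2, 4, 1, 3, 5]
```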
- Args:
groups (int): number of groups to divide channels in.
Examples:
>>> channel_shuffle = nn.ChannelShuffle(2)
>>> input = torch.arange(1, 17, dtype=torch.float32).view(1, 4, 2, 2)
>>> input
tensor([[[[ 1., 2.],
          [ 3., 4.]],
         [[ 5., 6.],
          [ 7., 8.]],
         [[ 9., 10.],
          [11., 12.]],
         [[13., 14.],
          [15., 16.]]]])
>>> output = channel_shuffle(input)
>>> output
tensor([[[[ 1., 2.],
          [ 3., 4.]],
         [[ 9., 10.],
          [11., 12.]],
         [[ 5., 6.],
          [ 7., 8.]],
         [[13., 14.],
          [15., 16.]]]])
- class torchwrench.nn.ConstantPad1d(padding: int | tuple[int, int], value: float)
Bases: _ConstantPadNd
Pads the input tensor boundaries with a constant value.
For N-dimensional padding, use torch.nn.functional.pad().
- Args:
- padding (int, tuple): the size of the padding. If it is an int, the same padding is used on both boundaries. If a 2-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\))
- Shape:
Input: \((C, W_{in})\) or \((N, C, W_{in})\).
Output: \((C, W_{out})\) or \((N, C, W_{out})\), where
\(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
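As a plain-Python sketch of the width formula (hypothetical helper, a single row instead of a tensor), padding a length-3 row with 2 on each side, or with (3, 1), gives \(W_{out} = 7\) in both cases, matching the second and third examples below.

```python
def constant_pad_1d(row, padding, value):
    """Hypothetical helper: pad one list along its only axis.

    padding is an int (same on both sides) or a (left, right) tuple.
    """
    left, right = (padding, padding) if isinstance(padding, int) else padding
    return [value] * left + list(row) + [value] * right


row = [1.6616, 1.4523, -1.1255]
print(constant_pad_1d(row, 2, 3.5))       # W_out = 3 + 2 + 2 = 7
print(constant_pad_1d(row, (3, 1), 3.5))  # W_out = 3 + 3 + 1 = 7
```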
Examples:
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> m = nn.ConstantPad1d(2, 3.5)
>>> input = torch.randn(1, 2, 4)
>>> input
tensor([[[-1.0491, -0.7152, -0.0749, 0.8530],
         [-1.3287, 1.8966, 0.1466, -0.2771]]])
>>> m(input)
tensor([[[ 3.5000, 3.5000, -1.0491, -0.7152, -0.0749, 0.8530, 3.5000, 3.5000],
         [ 3.5000, 3.5000, -1.3287, 1.8966, 0.1466, -0.2771, 3.5000, 3.5000]]])
>>> m = nn.ConstantPad1d(2, 3.5)
>>> input = torch.randn(1, 2, 3)
>>> input
tensor([[[ 1.6616, 1.4523, -1.1255],
         [-3.6372, 0.1182, -1.8652]]])
>>> m(input)
tensor([[[ 3.5000, 3.5000, 1.6616, 1.4523, -1.1255, 3.5000, 3.5000],
         [ 3.5000, 3.5000, -3.6372, 0.1182, -1.8652, 3.5000, 3.5000]]])
>>> # using different paddings for different sides
>>> m = nn.ConstantPad1d((3, 1), 3.5)
>>> m(input)
tensor([[[ 3.5000, 3.5000, 3.5000, 1.6616, 1.4523, -1.1255, 3.5000],
         [ 3.5000, 3.5000, 3.5000, -3.6372, 0.1182, -1.8652, 3.5000]]])
- class torchwrench.nn.ConstantPad2d(padding: int | tuple[int, int, int, int], value: float)
Bases: _ConstantPadNd
Pads the input tensor boundaries with a constant value.
For N-dimensional padding, use torch.nn.functional.pad().
- Args:
- padding (int, tuple): the size of the padding. If it is an int, the same padding is used on all boundaries. If a 4-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\))
- Shape:
Input: \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\).
Output: \((N, C, H_{out}, W_{out})\) or \((C, H_{out}, W_{out})\), where
\(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\)
\(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
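The 4-tuple lists the last axis (width) first and then the height. The plain-Python sketch below (hypothetical helper, one H × W plane) applies padding=(3, 0, 2, 1) to the 2 × 2 input from the examples and yields a 5 × 5 plane, as the formulas predict.

```python
def constant_pad_2d(grid, padding, value):
    """Hypothetical helper: pad an H x W nested list.

    padding is an int or (left, right, top, bottom); the width (last axis)
    comes first, as in the documented 4-tuple.
    """
    if isinstance(padding, int):
        padding = (padding,) * 4
    left, right, top, bottom = padding
    width = len(grid[0]) + left + right
    padded_rows = [[value] * left + list(r) + [value] * right for r in grid]
    return (
        [[value] * width for _ in range(top)]
        + padded_rows
        + [[value] * width for _ in range(bottom)]
    )


grid = [[1.6585, 0.4320], [-0.8701, -0.4649]]
out = constant_pad_2d(grid, (3, 0, 2, 1), 3.5)
print(len(out), len(out[0]))  # 5 5  (H_out = 2 + 2 + 1, W_out = 2 + 3 + 0)
print(out[2])                 # [3.5, 3.5, 3.5, 1.6585, 0.432]
```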
Examples:
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> m = nn.ConstantPad2d(2, 3.5)
>>> input = torch.randn(1, 2, 2)
>>> input
tensor([[[ 1.6585, 0.4320],
         [-0.8701, -0.4649]]])
>>> m(input)
tensor([[[ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000],
         [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000],
         [ 3.5000, 3.5000, 1.6585, 0.4320, 3.5000, 3.5000],
         [ 3.5000, 3.5000, -0.8701, -0.4649, 3.5000, 3.5000],
         [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000],
         [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000]]])
>>> # using different paddings for different sides
>>> m = nn.ConstantPad2d((3, 0, 2, 1), 3.5)
>>> m(input)
tensor([[[ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000],
         [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000],
         [ 3.5000, 3.5000, 3.5000, 1.6585, 0.4320],
         [ 3.5000, 3.5000, 3.5000, -0.8701, -0.4649],
         [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000]]])
- class torchwrench.nn.ConstantPad3d(padding: int | tuple[int, int, int, int, int, int], value: float)
Bases: _ConstantPadNd
Pads the input tensor boundaries with a constant value.
For N-dimensional padding, use torch.nn.functional.pad().
- Args:
- padding (int, tuple): the size of the padding. If it is an int, the same padding is used on all boundaries. If a 6-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\), \(\text{padding\_front}\), \(\text{padding\_back}\))
- Shape:
Input: \((N, C, D_{in}, H_{in}, W_{in})\) or \((C, D_{in}, H_{in}, W_{in})\).
Output: \((N, C, D_{out}, H_{out}, W_{out})\) or \((C, D_{out}, H_{out}, W_{out})\), where
\(D_{out} = D_{in} + \text{padding\_front} + \text{padding\_back}\)
\(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\)
\(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
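The 6-tuple follows the same last-axis-first order: (left, right) pads W, (top, bottom) pads H, and (front, back) pads D. A plain-Python sketch of the resulting shape (hypothetical helper), applied to the inputs used in the examples below:

```python
def constant_pad_3d_shape(shape, padding):
    """Hypothetical helper: output shape of a 6-tuple constant pad.

    shape is (..., D, H, W); padding is an int or
    (left, right, top, bottom, front, back).
    """
    if isinstance(padding, int):
        padding = (padding,) * 6
    left, right, top, bottom, front, back = padding
    *lead, d, h, w = shape
    return (*lead, d + front + back, h + top + bottom, w + left + right)


print(constant_pad_3d_shape((16, 3, 10, 20, 30), 3))                   # (16, 3, 16, 26, 36)
print(constant_pad_3d_shape((16, 3, 10, 20, 30), (3, 3, 6, 6, 0, 1)))  # (16, 3, 11, 32, 36)
```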
Examples:
>>> m = nn.ConstantPad3d(3, 3.5)
>>> input = torch.randn(16, 3, 10, 20, 30)
>>> output = m(input)
>>> # using different paddings for different sides
>>> m = nn.ConstantPad3d((3, 3, 6, 6, 0, 1), 3.5)
>>> output = m(input)
- class torchwrench.nn.Conv1d(in_channels: int, out_channels: int, kernel_size: int | tuple[int], stride: int | tuple[int] = 1, padding: str | int | tuple[int] = 0, dilation: int | tuple[int] = 1, groups: int = 1, bias: bool = True, padding_mode: 'zeros' | 'reflect' | 'replicate' | 'circular' = 'zeros', device=None, dtype=None)
Bases: _ConvNd
Applies a 1D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, L)\) and output \((N, C_{\text{out}}, L_{\text{out}})\) can be precisely described as:
\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{in} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\]
where \(\star\) is the valid cross-correlation operator, \(N\) is the batch size, \(C\) denotes the number of channels, and \(L\) is the length of the signal sequence.
This module supports TensorFloat32.
On certain ROCm devices, when using float16 inputs this module will use different precision for backward.
- stride controls the stride for the cross-correlation, a single number or a one-element tuple.
- padding controls the amount of padding applied to the input. It can be either a string {'valid', 'same'} or a tuple of ints giving the amount of implicit padding applied on both sides.
- dilation controls the spacing between the kernel points; this is also known as the à trous algorithm. It is harder to describe in words, but the upstream PyTorch documentation links to a helpful visualization of what dilation does.
- groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
  - At groups=1, all inputs are convolved to all outputs.
  - At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated.
  - At groups=in_channels, each input channel is convolved with its own set of filters (of size \(\frac{\text{out\_channels}}{\text{in\_channels}}\)).
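The groups cases above can be summarized as a mapping from each output channel to the block of input channels it sees. The sketch below is a hypothetical plain-Python helper, assuming channels are split into contiguous, equally sized blocks (consistent with the in_channels / groups weight dimension listed under Attributes).

```python
def group_connections(in_channels, out_channels, groups):
    """Hypothetical helper: for each output channel, the input channels it reads.

    Assumes channels are split into `groups` contiguous blocks and that
    output block g only sees input block g, so each filter spans
    in_channels // groups input channels.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    in_per_group = in_channels // groups
    out_per_group = out_channels // groups
    return {
        o: list(range((o // out_per_group) * in_per_group,
                      (o // out_per_group + 1) * in_per_group))
        for o in range(out_channels)
    }


print(group_connections(4, 4, 1)[0])  # [0, 1, 2, 3]: every input feeds every output
print(group_connections(4, 4, 2)[3])  # [2, 3]: two side-by-side convolutions
print(group_connections(4, 8, 4)[5])  # [2]: depthwise, multiplier K = 2
```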
- Note:
When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also known as a “depthwise convolution”.
In other words, for an input of size \((N, C_{in}, L_{in})\), a depthwise convolution with a depthwise multiplier K can be performed with the arguments \((C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in})\).
- Note:
In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. See /notes/randomness for more information.
- Note:
padding='valid' is the same as no padding. padding='same' pads the input so the output has the same shape as the input. However, this mode doesn't support any stride values other than 1.
- Note:
This module supports complex data types, i.e. complex32, complex64, complex128.
- Args:
- in_channels (int): Number of channels in the input image
- out_channels (int): Number of channels produced by the convolution
- kernel_size (int or tuple): Size of the convolving kernel
- stride (int or tuple, optional): Stride of the convolution. Default: 1
- padding (int, tuple or str, optional): Padding added to both sides of the input. Default: 0
- dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
- groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
- bias (bool, optional): If True, adds a learnable bias to the output. Default: True
- padding_mode (str, optional): 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
- Shape:
Input: \((N, C_{in}, L_{in})\) or \((C_{in}, L_{in})\)
Output: \((N, C_{out}, L_{out})\) or \((C_{out}, L_{out})\), where
\[L_{out} = \left\lfloor\frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\]
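A naive plain-Python reference for one output channel makes the cross-correlation, the \(L_{out}\) formula, and padding='same' concrete. It is a sketch only: helper names are hypothetical, groups=1 and zero padding are assumed, and `same_padding` assumes that, when the total padding dilation × (kernel_size - 1) is odd, the extra element goes on the right.

```python
def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv1d_single(x, w, bias=0.0, stride=1, padding=0, dilation=1):
    """Hypothetical helper: one output channel of a 1D cross-correlation.

    x: list of C_in input channels, each a list of length L_in.
    w: list of C_in kernels, each a list of length kernel_size.
    Zeros are added on both sides (padding_mode='zeros'); groups=1.
    """
    k = len(w[0])
    padded = [[0.0] * padding + list(ch) + [0.0] * padding for ch in x]
    l_out = conv1d_out_len(len(x[0]), k, stride, padding, dilation)
    out = []
    for t in range(l_out):
        start = t * stride
        acc = bias
        for ch, kernel in zip(padded, w):
            # No kernel flip: this is cross-correlation, as in the formula above.
            acc += sum(kernel[j] * ch[start + j * dilation] for j in range(k))
        out.append(acc)
    return out


def same_padding(kernel_size, dilation=1):
    """(left, right) padding that keeps L_out == L_in at stride 1.

    Assumption: any odd leftover goes on the right.
    """
    total = dilation * (kernel_size - 1)
    return total // 2, total - total // 2


x = [[1.0, 2.0, 3.0, 4.0, 5.0]]  # C_in = 1, L_in = 5
w = [[1.0, 0.0, -1.0]]           # kernel_size = 3
print(conv1d_single(x, w))                       # [-2.0, -2.0, -2.0]
print(conv1d_single(x, w, stride=2, padding=1))  # [-2.0, -2.0, 4.0]
print(conv1d_single(x, w, dilation=2))           # [-4.0]
left, right = same_padding(3)                    # (1, 1) for an odd kernel
print(conv1d_single(x, w, padding=left))         # 5 outputs, same as L_in
print(conv1d_out_len(50, 3, stride=2))           # 24, the L_out of the Conv1d example
```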
- Attributes:
- weight (Tensor): the learnable weights of the module of shape
\((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{in} * \text{kernel\_size}}\)
- bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is True, then the values of the bias are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{in} * \text{kernel\_size}}\)
Examples:
>>> m = nn.Conv1d(16, 33, 3, stride=2)
>>> input = torch.randn(20, 16, 50)
>>> output = m(input)
- forward(input: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Conv2d(in_channels: int, out_channels: int, kernel_size: int | tuple[int, int], stride: int | tuple[int, int] = 1, padding: str | int | tuple[int, int] = 0, dilation: int | tuple[int, int] = 1, groups: int = 1, bias: bool = True, padding_mode: 'zeros' | 'reflect' | 'replicate' | 'circular' = 'zeros', device=None, dtype=None)
Bases: _ConvNd
Applies a 2D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, H, W)\) and output \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) can be precisely described as:
\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\]
where \(\star\) is the valid 2D cross-correlation operator, \(N\) is the batch size, \(C\) denotes the number of channels, \(H\) is the height of the input planes in pixels, and \(W\) is the width in pixels.
This module supports TensorFloat32.
On certain ROCm devices, when using float16 inputs this module will use different precision for backward.
- stride controls the stride for the cross-correlation, a single number or a tuple.
- padding controls the amount of padding applied to the input. It can be either a string {'valid', 'same'} or an int / a tuple of ints giving the amount of implicit padding applied on both sides.
- dilation controls the spacing between the kernel points; this is also known as the à trous algorithm. It is harder to describe in words, but the upstream PyTorch documentation links to a helpful visualization of what dilation does.
- groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
  - At groups=1, all inputs are convolved to all outputs.
  - At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated.
  - At groups=in_channels, each input channel is convolved with its own set of filters (of size \(\frac{\text{out\_channels}}{\text{in\_channels}}\)).
The parameters kernel_size, stride, padding, dilation can either be:
- a single int, in which case the same value is used for the height and width dimension
- a tuple of two ints, in which case the first int is used for the height dimension, and the second int for the width dimension
- Note:
When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also known as a “depthwise convolution”.
In other words, for an input of size \((N, C_{in}, H_{in}, W_{in})\), a depthwise convolution with a depthwise multiplier K can be performed with the arguments \((C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in})\).
- Note:
In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. See /notes/randomness for more information.
- Note:
padding='valid' is the same as no padding. padding='same' pads the input so the output has the same shape as the input. However, this mode doesn't support any stride values other than 1.
- Note:
This module supports complex data types, i.e. complex32, complex64, complex128.
- Args:
- in_channels (int): Number of channels in the input image
- out_channels (int): Number of channels produced by the convolution
- kernel_size (int or tuple): Size of the convolving kernel
- stride (int or tuple, optional): Stride of the convolution. Default: 1
- padding (int, tuple or str, optional): Padding added to all four sides of the input. Default: 0
- dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
- groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
- bias (bool, optional): If True, adds a learnable bias to the output. Default: True
- padding_mode (str, optional): 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
- Shape:
Input: \((N, C_{in}, H_{in}, W_{in})\) or \((C_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, H_{out}, W_{out})\) or \((C_{out}, H_{out}, W_{out})\), where
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]
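The two shape formulas can be evaluated directly. This plain-Python sketch (hypothetical helper) broadcasts scalar arguments to both dimensions, as described for kernel_size, stride, padding and dilation, and checks the last configuration from the examples for this class on a (50, 100) input.

```python
def conv2d_out_hw(h_in, w_in, kernel_size, stride=1, padding=0, dilation=1):
    """Hypothetical helper evaluating the H_out and W_out formulas."""
    def pair(v):
        # A single int is used for both height and width.
        return (v, v) if isinstance(v, int) else tuple(v)

    k, s, p, d = pair(kernel_size), pair(stride), pair(padding), pair(dilation)
    return tuple(
        (size + 2 * p[i] - d[i] * (k[i] - 1) - 1) // s[i] + 1
        for i, size in enumerate((h_in, w_in))
    )


print(conv2d_out_hw(50, 100, 3, stride=2))  # (24, 49)
print(conv2d_out_hw(50, 100, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1)))  # (26, 100)
```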
- Attributes:
- weight (Tensor): the learnable weights of the module of shape
\((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},\) \(\text{kernel\_size[0]}, \text{kernel\_size[1]})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}\)
- bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is True, then the values of the bias are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}\)
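For a concrete sense of scale, the sketch below evaluates the weight shape and the initialization bound \(\sqrt{k}\) from the formulas above for the (3, 5) kernel used in the examples. It is plain Python with hypothetical helper names and does not sample or create any parameters.

```python
import math


def conv2d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """(out_channels, in_channels / groups, kernel_size[0], kernel_size[1])."""
    kh, kw = (kernel_size, kernel_size) if isinstance(kernel_size, int) else kernel_size
    return (out_channels, in_channels // groups, kh, kw)


def conv2d_init_bound(in_channels, kernel_size, groups=1):
    """sqrt(k) with k = groups / (C_in * kernel_size[0] * kernel_size[1])."""
    kh, kw = (kernel_size, kernel_size) if isinstance(kernel_size, int) else kernel_size
    return math.sqrt(groups / (in_channels * kh * kw))


print(conv2d_weight_shape(16, 33, (3, 5)))      # (33, 16, 3, 5)
print(round(conv2d_init_bound(16, (3, 5)), 4))  # 0.0645, since k = 1 / 240
print(round(conv2d_init_bound(16, 3), 4))       # 0.0833, since k = 1 / 144
```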
Examples:
>>> # With square kernels and equal stride
>>> m = nn.Conv2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> # non-square kernels and unequal stride and with padding and dilation
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)
- forward(input: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Conv3d(in_channels: int, out_channels: int, kernel_size: int | tuple[int, int, int], stride: int | tuple[int, int, int] = 1, padding: str | int | tuple[int, int, int] = 0, dilation: int | tuple[int, int, int] = 1, groups: int = 1, bias: bool = True, padding_mode: 'zeros' | 'reflect' | 'replicate' | 'circular' = 'zeros', device=None, dtype=None)
Bases: _ConvNd
Applies a 3D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C_{in}, D, H, W)\) and output \((N, C_{out}, D_{out}, H_{out}, W_{out})\) can be precisely described as:
\[out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k = 0}^{C_{in} - 1} weight(C_{out_j}, k) \star input(N_i, k)\]
where \(\star\) is the valid 3D cross-correlation operator.
This module supports TensorFloat32.
On certain ROCm devices, when using float16 inputs this module will use different precision for backward.
- stride controls the stride for the cross-correlation.
- padding controls the amount of padding applied to the input. It can be either a string {‘valid’, ‘same’} or a tuple of ints giving the amount of implicit padding applied on both sides.
- dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
- groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
  - At groups=1, all inputs are convolved to all outputs.
  - At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated.
  - At groups=in_channels, each input channel is convolved with its own set of filters (of size \(\frac{\text{out\_channels}}{\text{in\_channels}}\)).
The parameters kernel_size, stride, padding, dilation can either be:
- a single int – in which case the same value is used for the depth, height and width dimensions
- a tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
- Note:
When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also known as a “depthwise convolution”.
In other words, for an input of size \((N, C_{in}, D_{in}, H_{in}, W_{in})\), a depthwise convolution with a depthwise multiplier K can be performed with the arguments \((C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in})\).
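To make the groups rules and the depthwise note concrete, this sketch lists which input channels feed each output channel, plus the resulting weight shape from the Attributes list. group_channel_map and conv3d_weight_shape are hypothetical helpers. The assumption that output channels are assigned to groups in contiguous blocks follows the "side by side, then concatenated" description given for groups=2.

```python
def group_channel_map(in_channels, out_channels, groups):
    """Return {output channel: input channels it is connected to} for a grouped conv."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must both be divisible by groups")
    in_per_group = in_channels // groups
    out_per_group = out_channels // groups
    mapping = {}
    for out_c in range(out_channels):
        g = out_c // out_per_group  # output channels are split into contiguous blocks per group
        mapping[out_c] = list(range(g * in_per_group, (g + 1) * in_per_group))
    return mapping


def conv3d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Shape of Conv3d.weight as listed under Attributes."""
    kd, kh, kw = (kernel_size,) * 3 if isinstance(kernel_size, int) else kernel_size
    return (out_channels, in_channels // groups, kd, kh, kw)


# Depthwise convolution with multiplier K = 2: groups == in_channels == 3, out_channels == 6
print(group_channel_map(3, 6, groups=3))       # {0: [0], 1: [0], 2: [1], 3: [1], 4: [2], 5: [2]}
print(conv3d_weight_shape(3, 6, 3, groups=3))  # (6, 1, 3, 3, 3)
```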
- Note:
In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. See /notes/randomness for more information.
- Note:
padding='valid' is the same as no padding. padding='same' pads the input so the output has the same shape as the input. However, this mode doesn't support any stride values other than 1.
- Note:
This module supports complex data types i.e. complex32, complex64, complex128.
- Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int, tuple or str, optional): Padding added to all six sides of the input. Default: 0
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional): If True, adds a learnable bias to the output. Default: True
padding_mode (str, optional): 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
- Shape:
Input: \((N, C_{in}, D_{in}, H_{in}, W_{in})\) or \((C_{in}, D_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, D_{out}, H_{out}, W_{out})\) or \((C_{out}, D_{out}, H_{out}, W_{out})\), where
\[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) - 1}{\text{stride}[2]} + 1\right\rfloor\]
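The D_out, H_out and W_out formulas can be evaluated with integer arithmetic. The sketch below (conv3d_output_dhw is a hypothetical name) reproduces the second Conv3d example and also checks the padding='same' note for stride 1: an integer padding of dilation * (kernel_size - 1) / 2 per side leaves every spatial size unchanged. It only covers cases where that total is even; how the library splits an odd total is not shown here.

```python
def conv3d_output_dhw(in_dhw, kernel_size, stride=1, padding=0, dilation=1):
    """Apply the Conv3d D_out / H_out / W_out formulas (integer padding only)."""
    def triple(v):
        # A single int applies to depth, height and width.
        return (v,) * 3 if isinstance(v, int) else tuple(v)

    k, s, p, d = triple(kernel_size), triple(stride), triple(padding), triple(dilation)
    return tuple(
        (size + 2 * pi - di * (ki - 1) - 1) // si + 1
        for size, ki, si, pi, di in zip(in_dhw, k, s, p, d)
    )


# Second Conv3d example: input (20, 16, 10, 50, 100)
print(conv3d_output_dhw((10, 50, 100), (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0)))
# (8, 50, 99)

# 'same' with stride 1 needs dilation * (kernel_size - 1) padding in total per dimension;
# when that total is even, splitting it evenly keeps each size unchanged.
for k, d in [(3, 1), (5, 2)]:
    total = d * (k - 1)
    assert conv3d_output_dhw((7, 7, 7), k, padding=total // 2, dilation=d) == (7, 7, 7)
```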
- Attributes:
- weight (Tensor): the learnable weights of the module of shape
\((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},\) \(\text{kernel\_size[0]}, \text{kernel\_size[1]}, \text{kernel\_size[2]})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{in} * \prod_{i=0}^{2}\text{kernel\_size}[i]}\)
- bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is True, then the values of the bias are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{in} * \prod_{i=0}^{2}\text{kernel\_size}[i]}\)
Examples:
>>> # With square kernels and equal stride
>>> m = nn.Conv3d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.Conv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0))
>>> input = torch.randn(20, 16, 10, 50, 100)
>>> output = m(input)
- forward(input: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ConvTranspose1d(in_channels: int, out_channels: int, kernel_size: int | tuple[int], stride: int | tuple[int] = 1, padding: int | tuple[int] = 0, output_padding: int | tuple[int] = 0, groups: int = 1, bias: bool = True, dilation: int | tuple[int] = 1, padding_mode: 'zeros' | 'reflect' | 'replicate' | 'circular' = 'zeros', device=None, dtype=None)
Bases: _ConvTransposeNd
Applies a 1D transposed convolution operator over an input image composed of several input planes.
This module can be seen as the gradient of Conv1d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation as it does not compute a true inverse of convolution). For more information, see the visualizations here and the Deconvolutional Networks paper.
This module supports TensorFloat32.
On certain ROCm devices, when using float16 inputs this module will use different precision for backward.
- stride controls the stride for the cross-correlation.
- padding controls the amount of implicit zero padding on both sides for dilation * (kernel_size - 1) - padding number of points. See note below for details.
- output_padding controls the additional size added to one side of the output shape. See note below for details.
- dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but the link here has a nice visualization of what dilation does.
- groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
  - At groups=1, all inputs are convolved to all outputs.
  - At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated.
  - At groups=in_channels, each input channel is convolved with its own set of filters (of size \(\frac{\text{out\_channels}}{\text{in\_channels}}\)).
- Note:
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv1d and a ConvTranspose1d are initialized with the same parameters, they are inverses of each other with regard to the input and output shapes. However, when stride > 1, Conv1d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape, but does not actually add zero-padding to the output.
- Note:
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background.
- Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of the input. Default: 0
output_padding (int or tuple, optional): Additional size added to one side of the output shape. Default: 0
groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional): If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
- Shape:
Input: \((N, C_{in}, L_{in})\) or \((C_{in}, L_{in})\)
Output: \((N, C_{out}, L_{out})\) or \((C_{out}, L_{out})\), where
\[L_{out} = (L_{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{dilation} \times (\text{kernel\_size} - 1) + \text{output\_padding} + 1\]
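The note on output_padding can be seen directly from the length formulas. This sketch uses the standard Conv1d length formula (the 1D counterpart of the Conv2d and Conv3d formulas) together with the L_out formula above; conv1d_out and conv_transpose1d_out are hypothetical helper names.

```python
def conv1d_out(l_in, k, stride=1, padding=0, dilation=1):
    # Conv1d output length: floor((L + 2p - d(k - 1) - 1) / s) + 1
    return (l_in + 2 * padding - dilation * (k - 1) - 1) // stride + 1


def conv_transpose1d_out(l_in, k, stride=1, padding=0, output_padding=0, dilation=1):
    # L_out = (L_in - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1
    return (l_in - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1


# Conv1d(k=3, stride=2, padding=1) maps both 11 and 12 to 6 ...
print(conv1d_out(11, 3, stride=2, padding=1), conv1d_out(12, 3, stride=2, padding=1))  # 6 6
# ... so the matching ConvTranspose1d returns 11 by default,
# and output_padding=1 selects 12 instead.
print(conv_transpose1d_out(6, 3, stride=2, padding=1))                    # 11
print(conv_transpose1d_out(6, 3, stride=2, padding=1, output_padding=1))  # 12
```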
- Attributes:
- weight (Tensor): the learnable weights of the module of shape
\((\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}},\) \(\text{kernel\_size})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \text{kernel\_size}}\)
- bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is True, then the values of the bias are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \text{kernel\_size}}\)
Examples:
>>> # With square kernels and equal stride
>>> m = nn.ConvTranspose1d(16, 33, 3, stride=2)
>>> input = torch.randn(20, 16, 50)
>>> output = m(input)
>>> # exact output size can be also specified as an argument
>>> input = torch.randn(1, 16, 12)
>>> downsample = nn.Conv1d(16, 16, 3, stride=2, padding=1)
>>> upsample = nn.ConvTranspose1d(16, 16, 3, stride=2, padding=1)
>>> h = downsample(input)
>>> h.size()
torch.Size([1, 16, 6])
>>> output = upsample(h, output_size=input.size())
>>> output.size()
torch.Size([1, 16, 12])
- forward(input: Tensor, output_size: list[int] | None = None) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ConvTranspose2d(in_channels: int, out_channels: int, kernel_size: int | tuple[int, int], stride: int | tuple[int, int] = 1, padding: int | tuple[int, int] = 0, output_padding: int | tuple[int, int] = 0, groups: int = 1, bias: bool = True, dilation: int | tuple[int, int] = 1, padding_mode: 'zeros' | 'reflect' | 'replicate' | 'circular' = 'zeros', device=None, dtype=None)
Bases: _ConvTransposeNd
Applies a 2D transposed convolution operator over an input image composed of several input planes.
This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation as it does not compute a true inverse of convolution). For more information, see the visualizations here and the Deconvolutional Networks paper.
This module supports TensorFloat32.
On certain ROCm devices, when using float16 inputs this module will use different precision for backward.
- stride controls the stride for the cross-correlation. When stride > 1, ConvTranspose2d inserts zeros between input elements along the spatial dimensions before applying the convolution kernel. This zero-insertion operation is the standard behavior of transposed convolutions, which can increase the spatial resolution and is equivalent to a learnable upsampling operation.
- padding controls the amount of implicit zero padding on both sides for dilation * (kernel_size - 1) - padding number of points. See note below for details.
- output_padding controls the additional size added to one side of the output shape. See note below for details.
- dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but the link here has a nice visualization of what dilation does.
- groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
  - At groups=1, all inputs are convolved to all outputs.
  - At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated.
  - At groups=in_channels, each input channel is convolved with its own set of filters (of size \(\frac{\text{out\_channels}}{\text{in\_channels}}\)).
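The zero-insertion step described under stride can be illustrated on plain nested lists. This sketch only shows how stride - 1 zeros are placed between neighboring elements along each spatial dimension; it does not apply a kernel, and insert_zeros and dilate_2d are hypothetical names, not the library's implementation.

```python
def insert_zeros(values, stride):
    """Place stride - 1 zeros between consecutive elements of a 1D list."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    out = []
    for i, v in enumerate(values):
        if i > 0:
            out.extend([0] * (stride - 1))
        out.append(v)
    return out


def dilate_2d(grid, stride_h, stride_w):
    """Apply zero insertion along both spatial dimensions of a 2D list of rows."""
    if not grid:
        return []
    rows = [insert_zeros(row, stride_w) for row in grid]  # width direction
    width = len(rows[0])
    out = []
    for i, row in enumerate(rows):                         # height direction
        if i > 0:
            out.extend([[0] * width for _ in range(stride_h - 1)])
        out.append(row)
    return out


print(dilate_2d([[1, 2], [3, 4]], 2, 2))
# [[1, 0, 2], [0, 0, 0], [3, 0, 4]]
```

An input of length L grows to (L - 1) * stride + 1 after insertion, which is where the (H_in - 1) × stride[0] term in the output-shape formula comes from.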
The parameters kernel_size, stride, padding, output_padding can either be:
- a single int – in which case the same value is used for the height and width dimensions
- a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
- Note:
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv2d and a ConvTranspose2d are initialized with the same parameters, they are inverses of each other with regard to the input and output shapes. However, when stride > 1, Conv2d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape, but does not actually add zero-padding to the output.
- Note:
In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. See /notes/randomness for more information.
- Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
output_padding (int or tuple, optional): Additional size added to one side of each dimension in the output shape. Default: 0
groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional): If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
- Shape:
Input: \((N, C_{in}, H_{in}, W_{in})\) or \((C_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, H_{out}, W_{out})\) or \((C_{out}, H_{out}, W_{out})\), where
\[H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1\]
\[W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1\]
- Attributes:
- weight (Tensor): the learnable weights of the module of shape
\((\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}},\) \(\text{kernel\_size[0]}, \text{kernel\_size[1]})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\_size}[i]}\)
- bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is True, then the values of the bias are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\_size}[i]}\)
Examples:
>>> # With square kernels and equal stride
>>> m = nn.ConvTranspose2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)
>>> # exact output size can be also specified as an argument
>>> input = torch.randn(1, 16, 12, 12)
>>> downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1)
>>> upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)
>>> h = downsample(input)
>>> h.size()
torch.Size([1, 16, 6, 6])
>>> output = upsample(h, output_size=input.size())
>>> output.size()
torch.Size([1, 16, 12, 12])
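When output_size is passed to forward, the required output_padding can be worked out from the H_out/W_out formulas: it is the gap between the requested size and the size obtained with output_padding = 0. The sketch below assumes that gap must lie in [0, stride), in line with the note that output_padding only disambiguates between shapes a strided Conv2d maps together. output_padding_for is a hypothetical helper, and the library's own validation may accept other ranges (for example when dilation is involved).

```python
def output_padding_for(in_size, target, kernel_size, stride, padding, dilation):
    """Per-dimension output_padding that makes ConvTranspose2d produce `target`.

    Assumption for this sketch: valid targets run from the size obtained with
    output_padding = 0 up to that size + stride - 1.
    """
    result = []
    for n, t, k, s, p, d in zip(in_size, target, kernel_size, stride, padding, dilation):
        min_size = (n - 1) * s - 2 * p + d * (k - 1) + 1  # H_out / W_out with output_padding = 0
        extra = t - min_size
        if not 0 <= extra < s:
            raise ValueError(
                f"size {t} not reachable; expected {min_size}..{min_size + s - 1}"
            )
        result.append(extra)
    return tuple(result)


# Example above: h is 6x6, the original input was 12x12, kernel 3, stride 2, padding 1
print(output_padding_for((6, 6), (12, 12), (3, 3), (2, 2), (1, 1), (1, 1)))  # (1, 1)
```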
- class torchwrench.nn.ConvTranspose3d(in_channels: int, out_channels: int, kernel_size: int | tuple[int, int, int], stride: int | tuple[int, int, int] = 1, padding: int | tuple[int, int, int] = 0, output_padding: int | tuple[int, int, int] = 0, groups: int = 1, bias: bool = True, dilation: int | tuple[int, int, int] = 1, padding_mode: 'zeros' | 'reflect' | 'replicate' | 'circular' = 'zeros', device=None, dtype=None)
Bases: _ConvTransposeNd
Applies a 3D transposed convolution operator over an input image composed of several input planes. The transposed convolution operator multiplies each input value element-wise by a learnable kernel, and sums over the outputs from all input feature planes.
This module can be seen as the gradient of Conv3d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation as it does not compute a true inverse of convolution). For more information, see the visualizations here and the Deconvolutional Networks paper.
This module supports TensorFloat32.
On certain ROCm devices, when using float16 inputs this module will use different precision for backward.
- stride controls the stride for the cross-correlation.
- padding controls the amount of implicit zero padding on both sides for dilation * (kernel_size - 1) - padding number of points. See note below for details.
- output_padding controls the additional size added to one side of the output shape. See note below for details.
- dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but the link here has a nice visualization of what dilation does.
- groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
  - At groups=1, all inputs are convolved to all outputs.
  - At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated.
  - At groups=in_channels, each input channel is convolved with its own set of filters (of size \(\frac{\text{out\_channels}}{\text{in\_channels}}\)).
The parameters kernel_size, stride, padding, output_padding can either be:
- a single int – in which case the same value is used for the depth, height and width dimensions
- a tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
- Note:
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv3d and a ConvTranspose3d are initialized with the same parameters, they are inverses of each other with regard to the input and output shapes. However, when stride > 1, Conv3d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape, but does not actually add zero-padding to the output.
- Note:
In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. See /notes/randomness for more information.
- Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
output_padding (int or tuple, optional): Additional size added to one side of each dimension in the output shape. Default: 0
groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional): If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
- Shape:
Input: \((N, C_{in}, D_{in}, H_{in}, W_{in})\) or \((C_{in}, D_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, D_{out}, H_{out}, W_{out})\) or \((C_{out}, D_{out}, H_{out}, W_{out})\), where
\[D_{out} = (D_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1\]
\[H_{out} = (H_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1\]
\[W_{out} = (W_{in} - 1) \times \text{stride}[2] - 2 \times \text{padding}[2] + \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) + \text{output\_padding}[2] + 1\]
- Attributes:
- weight (Tensor): the learnable weights of the module of shape
\((\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}},\) \(\text{kernel\_size[0]}, \text{kernel\_size[1]}, \text{kernel\_size[2]})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \prod_{i=0}^{2}\text{kernel\_size}[i]}\)
- bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is True, then the values of the bias are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \prod_{i=0}^{2}\text{kernel\_size}[i]}\)
Examples:
>>> # With square kernels and equal stride
>>> m = nn.ConvTranspose3d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.ConvTranspose3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(0, 4, 2))
>>> input = torch.randn(20, 16, 10, 50, 100)
>>> output = m(input)
- forward(input: Tensor, output_size: list[int] | None = None) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.CosineEmbeddingLoss(margin: float = 0.0, size_average=None, reduce=None, reduction: str = 'mean')
Bases: _Loss
Creates a criterion that measures the loss given input tensors \(x_1\), \(x_2\) and a Tensor label \(y\) with values 1 or -1. Use (\(y=1\)) to maximize the cosine similarity of two inputs, and (\(y=-1\)) otherwise. This is typically used for learning nonlinear embeddings or semi-supervised learning.
The loss function for each sample is:
\[\begin{split}\text{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\ \max(0, \cos(x_1, x_2) - \text{margin}), & \text{if } y = -1 \end{cases}\end{split}\]
- Args:
- margin (float, optional): Should be a number from \(-1\) to \(1\); \(0\) to \(0.5\) is suggested. If margin is missing, the default value is \(0\).
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input1: \((N, D)\) or \((D)\), where N is the batch size and D is the embedding dimension.
Input2: \((N, D)\) or \((D)\), same shape as Input1.
Target: \((N)\) or \(()\).
Output: If reduction is 'none', then \((N)\), otherwise scalar.
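A plain-Python sketch of the per-sample CosineEmbeddingLoss formula and the three reduction modes, for inputs given as lists of 1D vectors. cosine_embedding_loss is a hypothetical name; unlike the library, this sketch adds no small epsilon to the norms, so an all-zero vector raises ZeroDivisionError.

```python
import math


def cosine(a, b):
    """cos(a, b) = (a . b) / (||a||_2 * ||b||_2)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosine_embedding_loss(x1, x2, y, margin=0.0, reduction="mean"):
    """Per-sample loss from the cases formula, then 'none' / 'mean' / 'sum' reduction."""
    losses = []
    for a, b, label in zip(x1, x2, y):
        c = cosine(a, b)
        if label == 1:
            losses.append(1.0 - c)                 # pull similar pairs together
        elif label == -1:
            losses.append(max(0.0, c - margin))    # push dissimilar pairs below the margin
        else:
            raise ValueError("labels must be 1 or -1")
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    return sum(losses) / len(losses)


x1 = [[1.0, 0.0], [1.0, 0.0]]
x2 = [[1.0, 0.0], [1.0, 1.0]]
print(cosine_embedding_loss(x1, x2, [1, -1], reduction="none"))
```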
Examples:
>>> loss = nn.CosineEmbeddingLoss()
>>> input1 = torch.randn(3, 5, requires_grad=True)
>>> input2 = torch.randn(3, 5, requires_grad=True)
>>> target = torch.ones(3)
>>> output = loss(input1, input2, target)
>>> output.backward()
- class torchwrench.nn.CosineSimilarity(dim: int = 1, eps: float = 1e-08)
Bases: Module
Returns cosine similarity between \(x_1\) and \(x_2\), computed along dim.
\[\text{similarity} = \dfrac{x_1 \cdot x_2}{\max(\Vert x_1 \Vert _2 \cdot \Vert x_2 \Vert _2, \epsilon)}.\]
- Args:
dim (int, optional): Dimension where cosine similarity is computed. Default: 1
eps (float, optional): Small value to avoid division by zero. Default: 1e-8
- Shape:
Input1: \((\ast_1, D, \ast_2)\) where D is at position dim
Input2: \((\ast_1, D, \ast_2)\), same number of dimensions as x1, matching x1 size at dimension dim, and broadcastable with x1 at other dimensions.
Output: \((\ast_1, \ast_2)\)
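The similarity formula can be evaluated for two 1D vectors as follows. This is a sketch of the formula exactly as written (one eps applied to the product of the norms); the precise placement of eps may differ between PyTorch versions, and cosine_similarity here is a hypothetical helper rather than torch.nn.functional.cosine_similarity.

```python
import math


def cosine_similarity(x1, x2, eps=1e-8):
    """similarity = (x1 . x2) / max(||x1||_2 * ||x2||_2, eps), for two 1D lists."""
    dot = sum(a * b for a, b in zip(x1, x2))
    norm_product = math.sqrt(sum(a * a for a in x1)) * math.sqrt(sum(b * b for b in x2))
    return dot / max(norm_product, eps)


print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ~1.0 (parallel vectors)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # 0.0 (orthogonal)
print(cosine_similarity([0.0, 0.0], [1.0, 1.0]))            # 0.0, eps avoids division by zero
```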
- Examples:
>>> input1 = torch.randn(100, 128)
>>> input2 = torch.randn(100, 128)
>>> cos = nn.CosineSimilarity(dim=1, eps=1e-6)
>>> output = cos(input1, input2)
- class torchwrench.nn.CropDim(target_length: int, *, align: 'left' | 'right' | 'center' | 'random' = 'left', dim: int = -1, generator: Generator | None | 'default' | int = None)
Bases: Module
For more information, see crop_dim().
- extra_repr() → str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.CropDims(target_lengths: Iterable[int], *, aligns: 'left' | 'right' | 'center' | 'random' | Iterable['left' | 'right' | 'center' | 'random'] = 'left', dims: Iterable[int] = (-1,), generator: Generator | None | 'default' | int = None)
Bases: Module
For more information, see crop_dims().
- extra_repr() → str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.CrossEntropyLoss(weight: Tensor | None = None, size_average=None, ignore_index: int = -100, reduce=None, reduction: str = 'mean', label_smoothing: float = 0.0)
Bases: _WeightedLoss
This criterion computes the cross entropy loss between input logits and target.
It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.
The input is expected to contain the unnormalized logits for each class (which do not need to be positive or sum to 1, in general). input has to be a Tensor of size \((C)\) for unbatched input, \((minibatch, C)\) or \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) for the K-dimensional case. The last is useful for higher-dimensional inputs, such as computing cross entropy loss per-pixel for 2D images.
The target that this criterion expects should contain either:
- Class indices in the range \([0, C)\) where \(C\) is the number of classes; if ignore_index is specified, this loss also accepts this class index (this index may not necessarily be in the class range). The unreduced (i.e. with reduction set to 'none') loss for this case can be described as:
\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^C \exp(x_{n,c})} \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}\]
where \(x\) is the input, \(y\) is the target, \(w\) is the weight, \(C\) is the number of classes, and \(N\) spans the minibatch dimension as well as \(d_1, ..., d_K\) for the K-dimensional case. If reduction is not 'none' (default 'mean'), then
\[\begin{split}\ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{\sum_{m=1}^N w_{y_m} \cdot \mathbb{1}\{y_m \not= \text{ignore\_index}\}} l_n, & \text{if reduction} = \text{`mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
Note that this case is equivalent to applying LogSoftmax on an input, followed by NLLLoss.
- Probabilities for each class; useful when labels beyond a single class per minibatch item are required, such as for blended labels, label smoothing, etc. The unreduced (i.e. with reduction set to 'none') loss for this case can be described as:
\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - \sum_{c=1}^C w_c \log \frac{\exp(x_{n,c})}{\sum_{i=1}^C \exp(x_{n,i})} y_{n,c}\]
where \(x\) is the input, \(y\) is the target, \(w\) is the weight, \(C\) is the number of classes, and \(N\) spans the minibatch dimension as well as \(d_1, ..., d_K\) for the K-dimensional case. If reduction is not 'none' (default 'mean'), then
\[\begin{split}\ell(x, y) = \begin{cases} \frac{\sum_{n=1}^N l_n}{N}, & \text{if reduction} = \text{`mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
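For the class-index case, the unreduced terms and the weighted 'mean' reduction can be computed as below. cross_entropy_indices is a hypothetical helper for (N, C) inputs given as nested lists. It uses the usual max-subtraction trick for a stable log-softmax, and its 'mean' divides by the summed weights of the non-ignored targets, as in the formula. If every target is ignored it raises ZeroDivisionError; this sketch does not try to match the library in that edge case.

```python
import math


def cross_entropy_indices(logits, targets, weight=None, ignore_index=-100, reduction="mean"):
    """Class-index cross entropy for an (N, C) batch of logits given as nested lists."""
    num_classes = len(logits[0])
    w = weight if weight is not None else [1.0] * num_classes
    losses, weight_sum = [], 0.0
    for row, y in zip(logits, targets):
        if y == ignore_index:
            losses.append(0.0)  # the indicator 1{y_n != ignore_index} removes this term
            continue
        m = max(row)  # subtract the max so exp() cannot overflow
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(-w[y] * (row[y] - log_sum_exp))  # -w_{y_n} * log softmax(x_n)[y_n]
        weight_sum += w[y]
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    return sum(losses) / weight_sum  # 'mean': divide by the summed weights of kept targets


logits = [[2.0, 1.0, 0.0], [0.5, 0.5, 3.0], [1.0, 1.0, 1.0]]
targets = [0, 2, -100]  # the last sample is ignored
print(cross_entropy_indices(logits, targets))
print(cross_entropy_indices(logits, targets, weight=[1.0, 2.0, 3.0]))
```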
Note
The performance of this criterion is generally better when target contains class indices, as this allows for optimized computation. Consider providing target as class probabilities only when a single class label per minibatch item is too restrictive.
- Args:
- weight (Tensor, optional): a manual rescaling weight given to each class.
If given, has to be a Tensor of size C.
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- ignore_index (int, optional): Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets. Note that ignore_index is only applicable when the target contains class indices.
- reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the weighted mean of the output is taken, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- label_smoothing (float, optional): A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution as described in Rethinking the Inception Architecture for Computer Vision. Default: \(0.0\).
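The mixture described for label_smoothing can be sketched for class-index targets, assuming the usual formulation from the Inception paper in which the target for class \(c\) becomes \((1 - \varepsilon)\,\mathbb{1}\{c = y\} + \varepsilon / C\). The function name is hypothetical, and class weights are deliberately left out of this sketch.

```python
from typing import List, Sequence


def smooth_targets(targets: Sequence[int], num_classes: int, label_smoothing: float) -> List[List[float]]:
    """Mix one-hot targets with a uniform distribution over num_classes."""
    if not 0.0 <= label_smoothing <= 1.0:
        raise ValueError("label_smoothing must be in [0.0, 1.0]")
    uniform = label_smoothing / num_classes  # mass spread evenly over all classes
    return [
        [(1.0 - label_smoothing) * (1.0 if c == y else 0.0) + uniform for c in range(num_classes)]
        for y in targets
    ]
```

Each smoothed row still sums to 1, so it is a valid class-probability target.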
- Shape:
Input: Shape \((C)\), \((N, C)\) or \((N, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
Target: If containing class indices, shape \(()\), \((N)\) or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss, where each value should be in \([0, C)\). The target data type is required to be long when using class indices. If containing class probabilities, the target must have the same shape as the input, and each value should be in \([0, 1]\). This means the target data type is required to be float when using class probabilities. Note that PyTorch does not strictly enforce probability constraints on the class probabilities and that it is the user’s responsibility to ensure target contains valid probability distributions (see the examples section below for more details).
Output: If reduction is 'none', shape \(()\), \((N)\) or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss, depending on the shape of the input. Otherwise, scalar.
where:
\[\begin{split}\begin{aligned} C ={} & \text{number of classes} \\ N ={} & \text{batch size} \\ \end{aligned}\end{split}\]
Examples:
>>> # Example of target with class indices
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
>>>
>>> # Example of target with class probabilities
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5).softmax(dim=1)
>>> output = loss(input, target)
>>> output.backward()
Note
When target contains class probabilities, it should consist of soft labels, that is, each target entry should represent a probability distribution over the possible classes for a given data sample, with individual probabilities in [0,1] and the total distribution summing to 1. This is why the softmax() function is applied to the target in the class probabilities example above.
PyTorch does not validate whether the values provided in target lie in the range [0,1] or whether the distribution of each data sample sums to 1. No warning will be raised and it is the user’s responsibility to ensure that target contains valid probability distributions. Providing arbitrary values may yield misleading loss values and unstable gradients during training.
- Examples:
>>> # xdoctest: +SKIP
>>> # Example of target with incorrectly specified class probabilities
>>> loss = nn.CrossEntropyLoss()
>>> torch.manual_seed(283)
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> # Provided target class probabilities are not in range [0,1]
>>> target
tensor([[ 0.7105, 0.4446, 2.0297, 0.2671, -0.6075],
 [-1.0496, -0.2753, -0.3586, 0.9270, 1.0027],
 [ 0.7551, 0.1003, 1.3468, -0.3581, -0.9569]])
>>> # Provided target class probabilities do not sum to 1
>>> target.sum(axis=1)
tensor([2.8444, 0.2462, 0.8873])
>>> # No error message and possible misleading loss value
>>> loss(input, target).item()
4.6379876136779785
>>>
>>> # Example of target with correctly specified class probabilities
>>> # Use .softmax() to ensure true probability distribution
>>> target_new = target.softmax(dim=1)
>>> # New target class probabilities all in range [0,1]
>>> target_new
tensor([[0.1559, 0.1195, 0.5830, 0.1000, 0.0417],
 [0.0496, 0.1075, 0.0990, 0.3579, 0.3860],
 [0.2607, 0.1355, 0.4711, 0.0856, 0.0471]])
>>> # New target class probabilities sum to 1
>>> target_new.sum(axis=1)
tensor([1.0000, 1.0000, 1.0000])
>>> loss(input, target_new).item()
2.55349063873291
- class torchwrench.nn.CrossMapLRN2d(size: int, alpha: float = 0.0001, beta: float = 0.75, k: float = 1)[source]¶
Bases: Module
- class torchwrench.nn.Dropout(p: float = 0.5, inplace: bool = False)[source]¶
Bases: _DropoutNd
During training, randomly zeroes some of the elements of the input tensor with probability p.
The zeroed elements are sampled independently, element by element, from a Bernoulli distribution on every forward call.
This has proven to be an effective technique for regularization and preventing the co-adaptation of neurons as described in the paper Improving neural networks by preventing co-adaptation of feature detectors.
Furthermore, the outputs are scaled by a factor of \(\frac{1}{1-p}\) during training. This means that during evaluation the module simply computes an identity function.
- Args:
p: probability of an element to be zeroed. Default: 0.5
inplace: If set to True, will do this operation in-place. Default: False
- Shape:
Input: \((*)\). Input can be of any shape
Output: \((*)\). Output is of the same shape as input
Examples:
>>> m = nn.Dropout(p=0.2)
>>> input = torch.randn(20, 16)
>>> output = m(input)
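The training-time \(\frac{1}{1-p}\) scaling and the evaluation-time identity can be seen in a small sketch of inverted dropout over a flat list of floats. The function name and the explicit training flag are illustrative assumptions; actual modules switch behavior through train() and eval().

```python
import random
from typing import List, Optional, Sequence


def dropout(values: Sequence[float], p: float = 0.5, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each element with probability p, rescale survivors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not training or p == 0.0:
        return list(values)  # identity in evaluation mode
    if p == 1.0:
        return [0.0] * len(values)  # everything dropped, avoid dividing by zero
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    # rng.random() is uniform on [0, 1), so each element survives with probability 1 - p
    return [v * scale if rng.random() >= p else 0.0 for v in values]
```

Because survivors are scaled by \(\frac{1}{1-p}\), the expected value of each output equals its input, which is why no rescaling is needed at evaluation time.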
- class torchwrench.nn.Dropout2d(p: float = 0.5, inplace: bool = False)[source]¶
Bases: _DropoutNd
Randomly zero out entire channels.
A channel is a 2D feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is a 2D tensor \(\text{input}[i, j]\).
Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.
Usually the input comes from nn.Conv2d modules.
As described in the paper Efficient Object Localization Using Convolutional Networks, if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease.
In this case, nn.Dropout2d() will help promote independence between feature maps and should be used instead.
- Args:
p (float, optional): probability of a channel to be zeroed.
inplace (bool, optional): If set to True, will do this operation in-place.
Warning
Due to historical reasons, this class will perform 1D channel-wise dropout for 3D inputs (as done by nn.Dropout1d). Thus, it currently does NOT support inputs without a batch dimension of shape \((C, H, W)\). This behavior will change in a future release to interpret 3D inputs as no-batch-dim inputs. To maintain the old behavior, switch to nn.Dropout1d.
- Shape:
Input: \((N, C, H, W)\) or \((N, C, L)\).
Output: \((N, C, H, W)\) or \((N, C, L)\) (same shape as input).
Examples:
>>> m = nn.Dropout2d(p=0.2)
>>> input = torch.randn(20, 16, 32, 32)
>>> output = m(input)
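The difference from element-wise dropout is that one Bernoulli draw is made per (sample, channel) pair and applied to the whole feature map. The sketch below shows this on nested lists shaped \((N, C, H, W)\). It assumes the same \(\frac{1}{1-p}\) rescaling that Dropout uses (PyTorch's feature dropout rescales the same way); the function name is hypothetical.

```python
import random
from typing import List, Optional

Batch = List[List[List[List[float]]]]  # nested (N, C, H, W)


def dropout2d(batch: Batch, p: float = 0.5, training: bool = True,
              rng: Optional[random.Random] = None) -> Batch:
    """Channel-wise dropout: zero whole (H, W) feature maps with probability p."""
    if not training or p == 0.0:
        return batch  # identity in evaluation mode
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p) if p < 1.0 else 0.0
    out: Batch = []
    for sample in batch:
        new_sample = []
        for channel in sample:
            keep = rng.random() >= p  # one draw per (sample, channel) pair
            factor = scale if keep else 0.0
            new_sample.append([[v * factor for v in row] for row in channel])
        out.append(new_sample)
    return out
```

Every pixel of a given channel shares the same mask value, so correlated neighbouring activations are dropped together rather than independently.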
- class torchwrench.nn.Dropout3d(p: float = 0.5, inplace: bool = False)[source]¶
Bases: _DropoutNd
Randomly zero out entire channels.
A channel is a 3D feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is a 3D tensor \(\text{input}[i, j]\).
Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.
Usually the input comes from nn.Conv3d modules.
As described in the paper Efficient Object Localization Using Convolutional Networks, if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease.
In this case, nn.Dropout3d() will help promote independence between feature maps and should be used instead.
- Args:
p (float, optional): probability of a channel to be zeroed.
inplace (bool, optional): If set to True, will do this operation in-place.
- Shape:
Input: \((N, C, D, H, W)\) or \((C, D, H, W)\).
Output: \((N, C, D, H, W)\) or \((C, D, H, W)\) (same shape as input).
Examples:
>>> m = nn.Dropout3d(p=0.2)
>>> input = torch.randn(20, 16, 4, 32, 32)
>>> output = m(input)
- class torchwrench.nn.ELU(alpha: float = 1.0, inplace: bool = False)[source]¶
Bases: Module
Applies the Exponential Linear Unit (ELU) function, element-wise.
Method described in the paper: Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs).
ELU is defined as:
\[\begin{split}\text{ELU}(x) = \begin{cases} x, & \text{ if } x > 0\\ \alpha * (\exp(x) - 1), & \text{ if } x \leq 0 \end{cases}\end{split}\]
- Args:
alpha: the \(\alpha\) value for the ELU formulation. Default: 1.0
inplace: can optionally do the operation in-place. Default: False
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.ELU()
>>> input = torch.randn(2)
>>> output = m(input)
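The piecewise definition translates directly into scalar code. This plain-Python sketch (hypothetical helper names) uses math.expm1, which computes \(\exp(x) - 1\) accurately near zero, and applies the function element-wise to mirror the \((*) \to (*)\) shape contract.

```python
import math
from typing import List, Sequence


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU(x) = x if x > 0 else alpha * (exp(x) - 1)."""
    # math.expm1(x) == exp(x) - 1, but without cancellation error for small |x|
    return x if x > 0 else alpha * math.expm1(x)


def elu_list(values: Sequence[float], alpha: float = 1.0) -> List[float]:
    """Element-wise application: the output has the same length as the input."""
    return [elu(v, alpha) for v in values]
```

For large negative inputs the output saturates at \(-\alpha\), which is the property that distinguishes ELU from ReLU.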
- class torchwrench.nn.EModule(*, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = 'first_param')[source]¶
Bases: Generic[InType, OutType], ConfigModule, TypedModule[InType, OutType], ProxyDeviceModule
Enriched torch.nn.Module with proxy device, forward typing and automatic configuration detection from attributes.
The default behaviour is the same as the PyTorch Module class.
- chain(*others: SupportsTypedForward[Any, OutType] | TypedModule[InType, OutType]) ESequential[InType, OutType][source]¶
- chain(*others: Module) ESequential[InType, Any]
- class torchwrench.nn.EModuleDict(modules: Mapping[str, TypedModuleLike[InType, OutType3]] | None = None, *, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)[source]¶
- class torchwrench.nn.EModuleDict(modules: Mapping[str, Module] | None = None, *, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
Bases: Generic[InType, OutType3], EModule[InType, Dict[str, OutType3]], ModuleDict
Enriched torch.nn.ModuleDict with proxy device, forward typing and automatic configuration detection from attributes.
Designed to work with torchwrench.nn.EModule instances. The default behaviour is the same as the PyTorch ModuleDict class, except for the forward call, which returns a dict containing the output of each module called separately.
- forward(*args: InType, **kwargs: InType) dict[str, OutType3][source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.EModuleList(modules: Iterable[TypedModuleLike[InType, OutType3]] | None = None, *, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)[source]¶
- class torchwrench.nn.EModuleList(modules: Iterable[Module] | None = None, *, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
Bases: Generic[InType, OutType3], EModule[InType, List[OutType3]], ModuleList
Enriched torch.nn.ModuleList with proxy device, forward typing and automatic configuration detection from attributes.
Designed to work with torchwrench.nn.EModule instances. The default behaviour is the same as the PyTorch ModuleList class, except for the forward call, which returns a list containing the output of each module called separately.
- forward(*args: InType, **kwargs: InType) list[OutType3][source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
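Both EModuleDict and EModuleList describe a forward call that invokes each contained module separately and collects the results, rather than chaining them like a Sequential. The sketch below models that fan-out with plain Python callables standing in for modules. fan_out_dict and fan_out_list are hypothetical helpers, not torchwrench APIs, and the sketch assumes "called separately" means every module receives the same arguments.

```python
from typing import Any, Callable, Dict, List, Sequence


def fan_out_dict(modules: Dict[str, Callable[..., Any]], *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Call every module on the same inputs; key each output by the module's name."""
    return {name: module(*args, **kwargs) for name, module in modules.items()}


def fan_out_list(modules: Sequence[Callable[..., Any]], *args: Any, **kwargs: Any) -> List[Any]:
    """Call every module on the same inputs; keep outputs in module order."""
    return [module(*args, **kwargs) for module in modules]
```

Insertion order is preserved in both cases, so output keys and positions line up with how the modules were registered.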
- class torchwrench.nn.EModulePartial(fn: Callable[[Concatenate[InType, P]], OutType], *args: __SPHINX_IMMATERIAL_TYPE_VAR__P_P, **kwargs: __SPHINX_IMMATERIAL_TYPE_VAR__P_P)[source]¶
Bases: Generic[InType, OutType], EModule[InType, OutType]
Wraps a Python callable into an nn.Module.
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: InType, **kwargs: __SPHINX_IMMATERIAL_TYPE_VAR__P_P) OutType[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
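EModulePartial binds extra positional and keyword arguments at construction and exposes forward(x, **kwargs). The plain-Python class below is a hypothetical analogue (PartialCallable is not a torchwrench name) showing one plausible reading of that signature: x is passed first, then the bound positional arguments, and call-time keyword arguments are merged over the bound ones. The merge order is an assumption, not documented behavior.

```python
from typing import Any, Callable


class PartialCallable:
    """Hypothetical analogue of wrapping fn with pre-bound arguments."""

    def __init__(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        self.fn = fn
        self.args = args      # bound positional arguments, placed after x
        self.kwargs = kwargs  # bound keyword arguments

    def __call__(self, x: Any, **kwargs: Any) -> Any:
        # assumption: call-time keyword arguments override the bound ones
        merged = {**self.kwargs, **kwargs}
        return self.fn(x, *self.args, **merged)
```

This is close to functools.partial, except that the first argument is reserved for the forward input rather than bound up front.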
- class torchwrench.nn.ESequential(*, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)[source]¶
- class torchwrench.nn.ESequential(arg0: SupportsTypedForward[InType, OutType] | TypedModule[InType, OutType], /, *, unpack_tuple: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE, unpack_dict: bool = False)
- class torchwrench.nn.ESequential(arg0: SupportsTypedForward[InType, Any] | TypedModule[InType, OutType], arg1: SupportsTypedForward[Any, OutType] | TypedModule[InType, OutType], /, *, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
- class torchwrench.nn.ESequential(arg0: SupportsTypedForward[InType, Any] | TypedModule[InType, OutType], arg1: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg2: SupportsTypedForward[Any, OutType] | TypedModule[InType, OutType], /, *, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
- class torchwrench.nn.ESequential(arg0: SupportsTypedForward[InType, Any] | TypedModule[InType, OutType], arg1: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg2: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg3: SupportsTypedForward[Any, OutType] | TypedModule[InType, OutType], /, *, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
- class torchwrench.nn.ESequential(arg0: SupportsTypedForward[InType, Any] | TypedModule[InType, OutType], arg1: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg2: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg3: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg4: SupportsTypedForward[Any, OutType] | TypedModule[InType, OutType], /, *, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
- class torchwrench.nn.ESequential(arg0: SupportsTypedForward[InType, Any] | TypedModule[InType, OutType], arg1: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg2: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg3: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg4: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg5: SupportsTypedForward[Any, OutType] | TypedModule[InType, OutType], /, *, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
- class torchwrench.nn.ESequential(arg0: SupportsTypedForward[InType, Any] | TypedModule[InType, OutType], arg1: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg2: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg3: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg4: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg5: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg6: SupportsTypedForward[Any, OutType] | TypedModule[InType, OutType], /, *, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
- class torchwrench.nn.ESequential(arg0: SupportsTypedForward[InType, Any] | TypedModule[InType, OutType], arg1: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg2: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg3: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg4: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg5: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg6: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg7: SupportsTypedForward[Any, OutType] | TypedModule[InType, OutType], /, *, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
- class torchwrench.nn.ESequential(arg0: SupportsTypedForward[InType, Any] | TypedModule[InType, OutType], arg1: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg2: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg3: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg4: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg5: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg6: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg7: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg8: SupportsTypedForward[Any, OutType] | TypedModule[InType, OutType], /, *, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
- class torchwrench.nn.ESequential(arg0: SupportsTypedForward[InType, Any] | TypedModule[InType, OutType], arg1: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg2: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg3: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg4: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg5: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg6: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg7: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg8: SupportsTypedForward[Any, Any] | TypedModule[InType, OutType], arg9: SupportsTypedForward[Any, OutType] | TypedModule[InType, OutType], /, *, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
- class torchwrench.nn.ESequential(arg: OrderedDict[str, SupportsTypedForward[InType, OutType] | TypedModule[InType, OutType]], /, *, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
- class torchwrench.nn.ESequential(arg: OrderedDict[str, Module], /, *, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
- class torchwrench.nn.ESequential(*args: Module, unpack_tuple: bool = False, unpack_dict: bool = False, strict_load: bool = False, config_to_extra_repr: bool = False, device_detect_mode: 'proxy' | 'first_param' | 'none' = _DEFAULT_DEVICE_DETECT_MODE)
Bases: Generic[InType, OutType], EModule[InType, OutType], TypedSequential[InType, OutType]
Enriched torch.nn.Sequential with proxy device, forward typing and automatic configuration detection from attributes.
Designed to work with torchwrench.nn.EModule instances. The default behaviour is the same as the PyTorch Sequential class.
- class torchwrench.nn.Embedding(num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Tensor | None = None, _freeze: bool = False, device=None, dtype=None)[source]¶
Bases: Module
A simple lookup table that stores embeddings of a fixed dictionary and size.
This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.
- Args:
num_embeddings (int): size of the dictionary of embeddings
embedding_dim (int): the size of each embedding vector
padding_idx (int, optional): If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”. For a newly constructed Embedding, the embedding vector at padding_idx will default to all zeros, but can be updated to another value to be used as the padding vector.
max_norm (float, optional): If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm.
norm_type (float, optional): The p of the p-norm to compute for the max_norm option. Default 2.
scale_grad_by_freq (bool, optional): If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default False.
sparse (bool, optional): If True, gradient w.r.t. weight matrix will be a sparse tensor. See Notes for more details regarding sparse gradients.
- Attributes:
- weight (Tensor): the learnable weights of the module of shape (num_embeddings, embedding_dim)
initialized from \(\mathcal{N}(0, 1)\)
- Shape:
Input: \((*)\), IntTensor or LongTensor of arbitrary shape containing the indices to extract
Output: \((*, H)\), where * is the input shape and \(H=\text{embedding\_dim}\)
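A minimal sketch of the lookup semantics, using nested lists instead of tensors: rows are drawn from \(\mathcal{N}(0, 1)\), the padding_idx row starts as zeros, and when max_norm is given each looked-up row whose p-norm exceeds it is rescaled in the stored table (PyTorch also renormalizes the weight in place). make_table and lookup are hypothetical names, and gradient handling is ignored entirely.

```python
import random
from typing import List, Optional, Sequence

Table = List[List[float]]


def make_table(num_embeddings: int, embedding_dim: int,
               padding_idx: Optional[int] = None, seed: int = 0) -> Table:
    """Rows drawn from N(0, 1); the padding row is initialized to zeros."""
    rng = random.Random(seed)
    table = [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)] for _ in range(num_embeddings)]
    if padding_idx is not None:
        table[padding_idx] = [0.0] * embedding_dim
    return table


def lookup(table: Table, indices: Sequence[int],
           max_norm: Optional[float] = None, norm_type: float = 2.0) -> Table:
    """Gather rows for each index; output shape is (len(indices), embedding_dim)."""
    out = []
    for i in indices:
        row = table[i]
        if max_norm is not None:
            norm = sum(abs(v) ** norm_type for v in row) ** (1.0 / norm_type)
            if norm > max_norm:
                # renormalize the stored row itself, not just the returned copy
                row = [v * (max_norm / norm) for v in row]
                table[i] = row
        out.append(list(row))
    return out
```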
Note
Keep in mind that only a limited number of optimizers support sparse gradients: currently it’s optim.SGD (CUDA and CPU), optim.SparseAdam (CUDA and CPU) and optim.Adagrad (CPU).
Note
When max_norm is not None, Embedding’s forward method will modify the weight tensor in-place. Since tensors needed for gradient computations cannot be modified in-place, performing a differentiable operation on Embedding.weight before calling Embedding’s forward method requires cloning Embedding.weight when max_norm is not None. For example:
import torch
import torch.nn as nn

n, d, m = 3, 5, 7
embedding = nn.Embedding(n, d, max_norm=1.0)
W = torch.randn((m, d), requires_grad=True)
idx = torch.tensor([1, 2])
a = embedding.weight.clone() @ W.t()  # weight must be cloned for this to be differentiable
b = embedding(idx) @ W.t()  # modifies weight in-place
out = a.unsqueeze(0) + b.unsqueeze(1)
loss = out.sigmoid().prod()
loss.backward()
Examples:
>>> # an Embedding module containing 10 tensors of size 3
>>> embedding = nn.Embedding(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> embedding(input)
tensor([[[-0.0251, -1.6902, 0.7172],
 [-0.6431, 0.0748, 0.6969],
 [ 1.4970, 1.3448, -0.9685],
 [-0.3677, -2.7265, -0.1685]],
 [[ 1.4970, 1.3448, -0.9685],
 [ 0.4362, -0.4004, 0.9400],
 [-0.6431, 0.0748, 0.6969],
 [ 0.9124, -2.3616, 1.1151]]])
>>> # example with padding_idx
>>> embedding = nn.Embedding(10, 3, padding_idx=0)
>>> input = torch.LongTensor([[0, 2, 0, 5]])
>>> embedding(input)
tensor([[[ 0.0000, 0.0000, 0.0000],
 [ 0.1535, -2.0309, 0.9315],
 [ 0.0000, 0.0000, 0.0000],
 [-0.1655, 0.9897, 0.0635]]])
>>> # example of changing `pad` vector
>>> padding_idx = 0
>>> embedding = nn.Embedding(3, 3, padding_idx=padding_idx)
>>> embedding.weight
Parameter containing:
tensor([[ 0.0000, 0.0000, 0.0000],
 [-0.7895, -0.7089, -0.0364],
 [ 0.6778, 0.5803, 0.2678]], requires_grad=True)
>>> with torch.no_grad():
...     embedding.weight[padding_idx] = torch.ones(3)
>>> embedding.weight
Parameter containing:
tensor([[ 1.0000, 1.0000, 1.0000],
 [-0.7895, -0.7089, -0.0364],
 [ 0.6778, 0.5803, 0.2678]], requires_grad=True)
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(input: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- classmethod from_pretrained(embeddings, freeze=True, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False)[source]¶
Create an Embedding instance from the given 2-dimensional FloatTensor.
- Args:
- embeddings (Tensor): FloatTensor containing weights for the Embedding.
The first dimension is passed to Embedding as num_embeddings, the second as embedding_dim.
- freeze (bool, optional): If True, the tensor does not get updated in the learning process.
Equivalent to embedding.weight.requires_grad = False. Default: True
- padding_idx (int, optional): If specified, the entries at padding_idx do not contribute to the gradient;
therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”.
max_norm (float, optional): See module initialization documentation.
norm_type (float, optional): See module initialization documentation. Default 2.
scale_grad_by_freq (bool, optional): See module initialization documentation. Default False.
sparse (bool, optional): See module initialization documentation.
Examples:
>>> # FloatTensor containing pretrained weights
>>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
>>> embedding = nn.Embedding.from_pretrained(weight)
>>> # Get embeddings for index 1
>>> input = torch.LongTensor([1])
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> embedding(input)
tensor([[ 4.0000, 5.1000, 6.3000]])
- class torchwrench.nn.EmbeddingBag(num_embeddings: int, embedding_dim: int, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, mode: str = 'mean', sparse: bool = False, _weight: Tensor | None = None, include_last_offset: bool = False, padding_idx: int | None = None, device=None, dtype=None)[source]¶
Bases: Module
Compute sums or means of ‘bags’ of embeddings, without instantiating the intermediate embeddings.
For bags of constant length, no per_sample_weights, no indices equal to padding_idx, and with 2D inputs, this class
- with mode="sum" is equivalent to Embedding followed by torch.sum(dim=1),
- with mode="mean" is equivalent to Embedding followed by torch.mean(dim=1),
- with mode="max" is equivalent to Embedding followed by torch.max(dim=1).
However, EmbeddingBag is much more time and memory efficient than using a chain of these operations.
EmbeddingBag also supports per-sample weights as an argument to the forward pass. This scales the output of the Embedding before performing a weighted reduction as specified by mode. If per_sample_weights is passed, the only supported mode is "sum", which computes a weighted sum according to per_sample_weights.
- Args:
num_embeddings (int): size of the dictionary of embeddings
embedding_dim (int): the size of each embedding vector
max_norm (float, optional): If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm.
norm_type (float, optional): The p of the p-norm to compute for the max_norm option. Default 2.
scale_grad_by_freq (bool, optional): if given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default False. Note: this option is not supported when mode="max".
- mode (str, optional): "sum", "mean" or "max". Specifies the way to reduce the bag. "sum" computes the weighted sum, taking per_sample_weights into consideration. "mean" computes the average of the values in the bag, "max" computes the max value over each bag. Default: "mean"
- sparse (bool, optional): if True, gradient w.r.t. weight matrix will be a sparse tensor. See Notes for more details regarding sparse gradients. Note: this option is not supported when mode="max".
- include_last_offset (bool, optional): if True, the size of offsets is equal to the number of bags + 1. The last element is the size of the input, or the ending index position of the last bag (sequence). This matches the CSR format. Ignored when input is 2D. Default False.
- padding_idx (int, optional): If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”. For a newly constructed EmbeddingBag, the embedding vector at padding_idx will default to all zeros, but can be updated to another value to be used as the padding vector. Note that the embedding vector at padding_idx is excluded from the reduction.
- Attributes:
- weight (Tensor): the learnable weights of the module of shape (num_embeddings, embedding_dim)
initialized from \(\mathcal{N}(0, 1)\).
Examples:
>>> # an EmbeddingBag module containing 10 tensors of size 3
>>> embedding_sum = nn.EmbeddingBag(10, 3, mode='sum')
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9], dtype=torch.long)
>>> offsets = torch.tensor([0, 4], dtype=torch.long)
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> embedding_sum(input, offsets)
tensor([[-0.8861, -5.4350, -0.0523],
 [ 1.1306, -2.5798, -1.0044]])
>>> # Example with padding_idx
>>> embedding_sum = nn.EmbeddingBag(10, 3, mode='sum', padding_idx=2)
>>> input = torch.tensor([2, 2, 2, 2, 4, 3, 2, 9], dtype=torch.long)
>>> offsets = torch.tensor([0, 4], dtype=torch.long)
>>> embedding_sum(input, offsets)
tensor([[ 0.0000, 0.0000, 0.0000],
 [-0.7082, 3.2145, -2.6251]])
>>> # An EmbeddingBag can be loaded from an Embedding like so
>>> embedding = nn.Embedding(10, 3, padding_idx=2)
>>> embedding_sum = nn.EmbeddingBag.from_pretrained(
...     embedding.weight,
...     padding_idx=embedding.padding_idx,
...     mode='sum')
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(input: Tensor, offsets: Tensor | None = None, per_sample_weights: Tensor | None = None) → Tensor
Forward pass of EmbeddingBag.
- Args:
- input (Tensor): Tensor containing bags of indices into the embedding matrix.
- offsets (Tensor, optional): Only used when input is 1D. offsets determines the starting index position of each bag (sequence) in input.
- per_sample_weights (Tensor, optional): a tensor of float / double weights, or None to indicate all weights should be taken to be 1. If specified, per_sample_weights must have exactly the same shape as input and is treated as having the same offsets, if those are not None. Only supported for mode='sum'.
- Returns:
Tensor output of shape (B, embedding_dim).
Note
A few notes about input and offsets:
- input and offsets have to be of the same type, either int or long.
- If input is 2D of shape (B, N), it will be treated as B bags (sequences) each of fixed length N, and this will return B values aggregated in a way depending on the mode. offsets is ignored and required to be None in this case.
- If input is 1D of shape (N), it will be treated as a concatenation of multiple bags (sequences). offsets is required to be a 1D tensor containing the starting index positions of each bag in input. Therefore, for offsets of shape (B), input will be viewed as having B bags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.
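To make the 1D input + offsets semantics concrete, here is a dependency-free sketch in plain Python of how each bag is reduced. The function name `embedding_bag_reference` is hypothetical; it ignores per_sample_weights, max_norm and gradients, and it assumes (following the padding_idx description) that "mean" divides by the number of non-padding indices in each bag. It illustrates the rules above and is not the library's implementation.

```python
from typing import List, Optional, Sequence


def embedding_bag_reference(
    weight: Sequence[Sequence[float]],
    indices: Sequence[int],
    offsets: Sequence[int],
    mode: str = "mean",
    include_last_offset: bool = False,
    padding_idx: Optional[int] = None,
) -> List[List[float]]:
    """Reduce each bag of `indices` (1D input + offsets) to one vector."""
    dim = len(weight[0])
    # Bag b spans indices[bounds[b]:bounds[b + 1]]. Without include_last_offset
    # the last bag implicitly ends at len(indices); with it, offsets has B + 1 entries.
    bounds = list(offsets) if include_last_offset else list(offsets) + [len(indices)]
    out: List[List[float]] = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Entries equal to padding_idx are excluded from the reduction.
        rows = [weight[i] for i in indices[start:end] if i != padding_idx]
        if not rows:
            out.append([0.0] * dim)  # empty bags produce zero vectors
        elif mode == "sum":
            out.append([sum(col) for col in zip(*rows)])
        elif mode == "mean":
            # Assumption: the mean is taken over non-padding entries only.
            out.append([sum(col) / len(rows) for col in zip(*rows)])
        elif mode == "max":
            out.append([max(col) for col in zip(*rows)])
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return out
```

For instance, the single bag [1, 0] over the rows [[1, 2.3, 3], [4, 5.1, 6.3]] averages to roughly [2.5, 3.7, 4.65], which agrees with the from_pretrained example for this class.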
- classmethod from_pretrained(embeddings: Tensor, freeze: bool = True, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, mode: str = 'mean', sparse: bool = False, include_last_offset: bool = False, padding_idx: int | None = None) → EmbeddingBag
Create EmbeddingBag instance from given 2-dimensional FloatTensor.
- Args:
- embeddings (Tensor): FloatTensor containing weights for the EmbeddingBag. The first dimension is passed to EmbeddingBag as ‘num_embeddings’, the second as ‘embedding_dim’.
- freeze (bool, optional): If True, the tensor does not get updated in the learning process. Equivalent to embeddingbag.weight.requires_grad = False. Default: True
- max_norm (float, optional): See module initialization documentation. Default: None
- norm_type (float, optional): See module initialization documentation. Default: 2.0
- scale_grad_by_freq (bool, optional): See module initialization documentation. Default: False
- mode (str, optional): See module initialization documentation. Default: "mean"
- sparse (bool, optional): See module initialization documentation. Default: False
- include_last_offset (bool, optional): See module initialization documentation. Default: False
- padding_idx (int, optional): See module initialization documentation. Default: None
Examples:
>>> # FloatTensor containing pretrained weights
>>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
>>> embeddingbag = nn.EmbeddingBag.from_pretrained(weight)
>>> # One bag holding indices 1 and 0; the default mode 'mean' averages both rows
>>> input = torch.LongTensor([[1, 0]])
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> embeddingbag(input)
tensor([[ 2.5000,  3.7000,  4.6500]])
- class torchwrench.nn.Exp(*args: Any, **kwargs: Any)
Bases: Module
Module version of exp().
- forward(x: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Exp2(*args: Any, **kwargs: Any)
Bases: Module
Module version of exp2().
- forward(x: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.FFT(*args: Any, **kwargs: Any)
Bases: Module
Module version of fft().
- forward(x: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.FeatureAlphaDropout(p: float = 0.5, inplace: bool = False)
Bases: _DropoutNd
Randomly masks out entire channels.
A channel is a feature map, e.g. the \(j\)-th channel of the \(i\)-th sample in the batch input is the tensor \(\text{input}[i, j]\). Instead of setting activations to zero, as in regular Dropout, the activations are set to the negative saturation value of the SELU activation function. More details can be found in the paper Self-Normalizing Neural Networks.
Each channel will be masked independently for each sample on every forward call with probability p using samples from a Bernoulli distribution. The elements to be masked are randomized on every forward call, and scaled and shifted to maintain zero mean and unit variance.
Usually the input comes from nn.AlphaDropout modules.
As described in the paper Efficient Object Localization Using Convolutional Networks, if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease.
In this case, nn.FeatureAlphaDropout() will help promote independence between feature maps and should be used instead.
- Args:
p (float, optional): probability of a channel being masked. Default: 0.5
inplace (bool, optional): If set to True, will do this operation in-place
- Shape:
Input: \((N, C, D, H, W)\) or \((C, D, H, W)\).
Output: \((N, C, D, H, W)\) or \((C, D, H, W)\) (same shape as input).
Examples:
>>> m = nn.FeatureAlphaDropout(p=0.2)
>>> input = torch.randn(20, 16, 4, 32, 32)
>>> output = m(input)
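The channel-wise masking can be sketched in plain Python on a nested list indexed [sample][channel][element]. This is an illustrative sketch, not the torchwrench or PyTorch kernel: the function name is hypothetical, and the affine constants a and b are taken from the alpha-dropout correction in the Self-Normalizing Neural Networks paper, where a masked activation is set to the SELU saturation value \(\alpha' = -\lambda\alpha\) and every value is then mapped through \(a x + b\).

```python
import math
import random
from typing import List

# SELU constants; masked activations are set to ALPHA_PRIME = -scale * alpha.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805
ALPHA_PRIME = -SELU_SCALE * SELU_ALPHA  # roughly -1.7581

Batch = List[List[List[float]]]  # indexed [sample][channel][element]


def feature_alpha_dropout_reference(x: Batch, p: float, rng: random.Random) -> Batch:
    """Mask whole channels with probability p, then rescale and shift."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1) for this sketch")
    q = 1.0 - p
    # Affine correction chosen so that zero-mean, unit-variance inputs keep
    # zero mean and unit variance after masking (alpha dropout, SELU paper).
    a = 1.0 / math.sqrt(q * (1.0 + p * ALPHA_PRIME ** 2))
    b = -a * p * ALPHA_PRIME
    out: Batch = []
    for sample in x:
        new_sample = []
        for channel in sample:
            # One Bernoulli draw per channel, shared by every element in it.
            keep = rng.random() >= p
            new_sample.append([a * (v if keep else ALPHA_PRIME) + b for v in channel])
        out.append(new_sample)
    return out
```

With p = 0 the correction reduces to a = 1 and b = 0, so the input passes through unchanged; for p > 0 every channel comes out either as an affine copy of the input or as a constant.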
- class torchwrench.nn.Flatten(start_dim: int = 1, end_dim: int = -1)
Bases: Module
Flattens a contiguous range of dims into a tensor.
For use with Sequential, see torch.flatten() for details.
- Shape:
Input: \((*, S_{\text{start}},..., S_{i}, ..., S_{\text{end}}, *)\), where \(S_{i}\) is the size at dimension \(i\) and \(*\) means any number of dimensions including none.
Output: \((*, \prod_{i=\text{start}}^{\text{end}} S_{i}, *)\).
- Args:
start_dim: first dim to flatten (default = 1).
end_dim: last dim to flatten (default = -1).
- Examples:
>>> input = torch.randn(32, 1, 5, 5)
>>> # With default parameters
>>> m = nn.Flatten()
>>> output = m(input)
>>> output.size()
torch.Size([32, 25])
>>> # With non-default parameters
>>> m = nn.Flatten(0, 2)
>>> output = m(input)
>>> output.size()
torch.Size([160, 5])
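The output shape given in the Shape section can be computed without building a tensor. The helper below is a hypothetical plain-Python sketch that applies the same start_dim/end_dim rules (including negative indices) to a shape tuple.

```python
from math import prod
from typing import Tuple


def _normalize_dim(dim: int, ndim: int) -> int:
    # Negative dims count from the end, e.g. -1 is the last dimension.
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} out of range for {ndim} dimensions")
    return dim + ndim if dim < 0 else dim


def flattened_shape(shape: Tuple[int, ...], start_dim: int = 1, end_dim: int = -1) -> Tuple[int, ...]:
    """Shape after collapsing dims start_dim..end_dim (inclusive) into one."""
    start = _normalize_dim(start_dim, len(shape))
    end = _normalize_dim(end_dim, len(shape))
    if start > end:
        raise ValueError("start_dim must not come after end_dim")
    return shape[:start] + (prod(shape[start:end + 1]),) + shape[end + 1:]
```

flattened_shape((32, 1, 5, 5)) gives (32, 25) and flattened_shape((32, 1, 5, 5), 0, 2) gives (160, 5), matching the sizes printed in the example.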
- class torchwrench.nn.Fold(output_size: int | tuple[int, ...], kernel_size: int | tuple[int, ...], dilation: int | tuple[int, ...] = 1, padding: int | tuple[int, ...] = 0, stride: int | tuple[int, ...] = 1)
Bases: Module
Combines an array of sliding local blocks into a large containing tensor.
Consider a batched input tensor containing sliding local blocks, e.g., patches of images, of shape \((N, C \times \prod(\text{kernel\_size}), L)\), where \(N\) is the batch dimension, \(C \times \prod(\text{kernel\_size})\) is the number of values within a block (a block has \(\prod(\text{kernel\_size})\) spatial locations each containing a \(C\)-channeled vector), and \(L\) is the total number of blocks. (This is exactly the same specification as the output shape of Unfold.) This operation combines these local blocks into the large output tensor of shape \((N, C, \text{output\_size}[0], \text{output\_size}[1], \dots)\) by summing the overlapping values. Similar to Unfold, the arguments must satisfy
\[L = \prod_d \left\lfloor\frac{\text{output\_size}[d] + 2 \times \text{padding}[d] - \text{dilation}[d] \times (\text{kernel\_size}[d] - 1) - 1}{\text{stride}[d]} + 1\right\rfloor,\]
where \(d\) is over all spatial dimensions.
output_size describes the spatial shape of the large containing tensor of the sliding local blocks. It is useful to resolve the ambiguity when multiple input shapes map to the same number of sliding blocks, e.g., with stride > 1.
The padding, stride and dilation arguments specify how the sliding blocks are retrieved.
- stride controls the stride for the sliding blocks.
- padding controls the amount of implicit zero-paddings on both sides for padding number of points for each dimension before reshaping.
- dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
- Args:
- output_size (int or tuple): the shape of the spatial dimensions of the output (i.e., output.shape[2:])
- kernel_size (int or tuple): the size of the sliding blocks
- dilation (int or tuple, optional): a parameter that controls the stride of elements within the neighborhood. Default: 1
- padding (int or tuple, optional): implicit zero padding to be added on both sides of input. Default: 0
- stride (int or tuple): the stride of the sliding blocks in the input spatial dimensions. Default: 1
If output_size, kernel_size, dilation, padding or stride is an int or a tuple of length 1 then their values will be replicated across all spatial dimensions.
For the case of two output spatial dimensions this operation is sometimes called col2im.
Note
Fold calculates each combined value in the resulting large tensor by summing all values from all containing blocks. Unfold extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other.
In general, folding and unfolding operations are related as follows. Consider Fold and Unfold instances created with the same parameters:
>>> fold_params = dict(kernel_size=..., dilation=..., padding=..., stride=...)
>>> fold = nn.Fold(output_size=..., **fold_params)
>>> unfold = nn.Unfold(**fold_params)
Then for any (supported) input tensor the following equality holds:
fold(unfold(input)) == divisor * input
where divisor is a tensor that depends only on the shape and dtype of the input:
>>> # xdoctest: +SKIP
>>> input_ones = torch.ones(input.shape, dtype=input.dtype)
>>> divisor = fold(unfold(input_ones))
When the divisor tensor contains no zero elements, then fold and unfold operations are inverses of each other (up to constant divisor).
Warning
Currently, only unbatched (3D) or batched (4D) image-like output tensors are supported.
- Shape:
Input: \((N, C \times \prod(\text{kernel\_size}), L)\) or \((C \times \prod(\text{kernel\_size}), L)\)
Output: \((N, C, \text{output\_size}[0], \text{output\_size}[1], \dots)\) or \((C, \text{output\_size}[0], \text{output\_size}[1], \dots)\) as described above
Examples:
>>> fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2))
>>> input = torch.randn(1, 3 * 2 * 2, 12)
>>> output = fold(input)
>>> output.size()
torch.Size([1, 3, 4, 5])
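The constraint on \(L\) can be checked with a small helper before calling Fold. The sketch below (hypothetical name `num_sliding_blocks`, plain Python, output_size passed as a tuple) evaluates the product formula, replicating an int or length-1 argument across the spatial dimensions as described in the Args.

```python
from typing import Sequence, Tuple, Union

IntOrSeq = Union[int, Sequence[int]]


def _expand(value: IntOrSeq, n: int) -> Tuple[int, ...]:
    # An int or a length-1 tuple is replicated across all n spatial dims.
    if isinstance(value, int):
        return (value,) * n
    value = tuple(value)
    return value * n if len(value) == 1 else value


def num_sliding_blocks(output_size: Sequence[int], kernel_size: IntOrSeq,
                       dilation: IntOrSeq = 1, padding: IntOrSeq = 0,
                       stride: IntOrSeq = 1) -> int:
    """Number of blocks L that Fold expects for the given spatial output size."""
    n = len(output_size)
    kernel, dil, pad, strd = (_expand(v, n) for v in (kernel_size, dilation, padding, stride))
    total = 1
    for o, k, d, p, s in zip(output_size, kernel, dil, pad, strd):
        # floor((o + 2p - d(k - 1) - 1) / s + 1), via integer floor division
        total *= (o + 2 * p - d * (k - 1) - 1) // s + 1
    return total
```

num_sliding_blocks((4, 5), (2, 2)) returns 12, the \(L\) used in the example. With kernel_size=2 and stride=2, output sizes (4, 4) and (5, 5) both give 4 blocks, which is why output_size has to be stated explicitly.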
- class torchwrench.nn.FractionalMaxPool2d(kernel_size: int | tuple[int, int], output_size: int | tuple[int, int] | None = None, output_ratio: float | tuple[float, float] | None = None, return_indices: bool = False, _random_samples=None)
Bases: Module
Applies a 2D fractional max pooling over an input signal composed of several input planes.
Fractional MaxPooling is described in detail in the paper Fractional MaxPooling by Ben Graham.
The max-pooling operation is applied in \(kH \times kW\) regions by a stochastic step size determined by the target output size. The number of output features is equal to the number of input planes.
Note
Exactly one of output_size or output_ratio must be defined.
- Args:
- kernel_size: the size of the window to take a max over.
Can be a single number k (for a square kernel of k x k) or a tuple (kh, kw)
- output_size: the target output size of the image of the form oH x oW.
Can be a tuple (oH, oW) or a single number oH for a square image oH x oH. Note that we must have \(kH + oH - 1 \le H_{in}\) and \(kW + oW - 1 \le W_{in}\).
- output_ratio: If one wants to have an output size as a ratio of the input size, this option can be given.
This has to be a number or tuple in the range (0, 1). Note that we must have \(kH + (\text{output\_ratio\_H} \times H_{in}) - 1 \le H_{in}\) and \(kW + (\text{output\_ratio\_W} \times W_{in}) - 1 \le W_{in}\).
- return_indices: if True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool2d(). Default: False
- Shape:
Input: \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\).
Output: \((N, C, H_{out}, W_{out})\) or \((C, H_{out}, W_{out})\), where \((H_{out}, W_{out})=\text{output\_size}\) or \((H_{out}, W_{out})=\text{output\_ratio} \times (H_{in}, W_{in})\).
- Examples:
>>> # pool of square window of size=3, and target output size 13x12
>>> m = nn.FractionalMaxPool2d(3, output_size=(13, 12))
>>> # pool of square window and target output size being half of input image size
>>> m = nn.FractionalMaxPool2d(3, output_ratio=(0.5, 0.5))
>>> input = torch.randn(20, 16, 50, 32)
>>> output = m(input)
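The two size constraints above can be validated before constructing the module. The helper below is a hypothetical plain-Python sketch; it assumes that output_ratio is converted to a size by truncating input_size * ratio toward zero, which appears to match torch.nn.functional but is an assumption here. It handles any number of spatial dimensions, so it applies to FractionalMaxPool3d as well.

```python
from typing import Optional, Sequence, Tuple


def fractional_pool_output_size(
    input_size: Sequence[int],
    kernel_size: Sequence[int],
    output_size: Optional[Sequence[int]] = None,
    output_ratio: Optional[Sequence[float]] = None,
) -> Tuple[int, ...]:
    """Resolve and validate the pooled spatial size (hypothetical helper)."""
    if (output_size is None) == (output_ratio is None):
        raise ValueError("exactly one of output_size or output_ratio must be given")
    if output_size is None:
        if not all(0.0 < r < 1.0 for r in output_ratio):
            raise ValueError("output_ratio must lie in (0, 1)")
        # Assumption: the ratio-derived size is truncated toward zero.
        output_size = [int(i * r) for i, r in zip(input_size, output_ratio)]
    for i, k, o in zip(input_size, kernel_size, output_size):
        # Each pooling window must fit: k + o - 1 <= input size.
        if k + o - 1 > i:
            raise ValueError(f"kernel {k} + output {o} - 1 exceeds input size {i}")
    return tuple(output_size)
```

For the example input of spatial size 50 x 32 with a 3 x 3 kernel and output_ratio=(0.5, 0.5), the sketch resolves the output to 25 x 16.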
- forward(input: Tensor)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.FractionalMaxPool3d(kernel_size: int | tuple[int, int, int], output_size: int | tuple[int, int, int] | None = None, output_ratio: float | tuple[float, float, float] | None = None, return_indices: bool = False, _random_samples=None)
Bases: Module
Applies a 3D fractional max pooling over an input signal composed of several input planes.
Fractional MaxPooling is described in detail in the paper Fractional MaxPooling by Ben Graham.
The max-pooling operation is applied in \(kT \times kH \times kW\) regions by a stochastic step size determined by the target output size. The number of output features is equal to the number of input planes.
Note
Exactly one of output_size or output_ratio must be defined.
- Args:
- kernel_size: the size of the window to take a max over.
Can be a single number k (for a cubic kernel of k x k x k) or a tuple (kt, kh, kw); k must be greater than 0.
- output_size: the target output size of the image of the form oT x oH x oW.
Can be a tuple (oT, oH, oW) or a single number oH for a cubic output of size oH x oH x oH.
- output_ratio: If one wants to have an output size as a ratio of the input size, this option can be given.
This has to be a number or tuple in the range (0, 1).
- return_indices: if True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool3d(). Default: False
- Shape:
Input: \((N, C, T_{in}, H_{in}, W_{in})\) or \((C, T_{in}, H_{in}, W_{in})\).
Output: \((N, C, T_{out}, H_{out}, W_{out})\) or \((C, T_{out}, H_{out}, W_{out})\), where \((T_{out}, H_{out}, W_{out})=\text{output\_size}\) or \((T_{out}, H_{out}, W_{out})=\text{output\_ratio} \times (T_{in}, H_{in}, W_{in})\)
- Examples:
>>> # pool of cubic window of size=3, and target output size 13x12x11
>>> m = nn.FractionalMaxPool3d(3, output_size=(13, 12, 11))
>>> # pool of cubic window and target output size being half of input size
>>> m = nn.FractionalMaxPool3d(3, output_ratio=(0.5, 0.5, 0.5))
>>> input = torch.randn(20, 16, 50, 32, 16)
>>> output = m(input)
- forward(input: Tensor)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.GELU(approximate: str = 'none')
Bases: Module
Applies the Gaussian Error Linear Units function.
\[\text{GELU}(x) = x * \Phi(x)\]
where \(\Phi(x)\) is the cumulative distribution function of the standard Gaussian distribution.
When the approximate argument is 'tanh', GELU is estimated with:
\[\text{GELU}(x) = 0.5 * x * (1 + \text{Tanh}(\sqrt{2 / \pi} * (x + 0.044715 * x^3)))\]
- Args:
- approximate (str, optional): the GELU approximation algorithm to use: 'none' | 'tanh'. Default: 'none'
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.GELU()
>>> input = torch.randn(2)
>>> output = m(input)
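Both variants of the formula can be compared on scalars in plain Python. This sketch (hypothetical `gelu_reference`) expresses \(\Phi\) through math.erf using \(\Phi(x) = \tfrac{1}{2}(1 + \operatorname{erf}(x / \sqrt{2}))\); it illustrates the math and is not meant to reproduce the library kernel bit for bit.

```python
import math


def gelu_reference(x: float, approximate: str = "none") -> float:
    """Scalar GELU, exact or with the tanh approximation."""
    if approximate == "none":
        # Phi(x): standard normal CDF written with the error function
        return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    if approximate == "tanh":
        inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
        return 0.5 * x * (1.0 + math.tanh(inner))
    raise ValueError(f"approximate must be 'none' or 'tanh', got {approximate!r}")
```

On the interval [-3, 3] the two variants agree to within about 1e-3, which is why the tanh form is a reasonable drop-in when erf is expensive.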
- class torchwrench.nn.GLU(dim: int = -1)
Bases: Module
Applies the gated linear unit function.
\({GLU}(a, b)= a \otimes \sigma(b)\) where \(a\) is the first half of the input matrices and \(b\) is the second half.
- Args:
dim (int): the dimension on which to split the input. Default: -1
- Shape:
Input: \((\ast_1, N, \ast_2)\) where * means any number of additional dimensions
Output: \((\ast_1, M, \ast_2)\) where \(M=N/2\)
Examples:
>>> m = nn.GLU()
>>> input = torch.randn(4, 2)
>>> output = m(input)
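The split-and-gate rule can be sketched on a 1D list in plain Python. The helper `glu_reference` is hypothetical and only handles the single (last) dimension; it uses a numerically stable sigmoid so that large negative gate values do not overflow.

```python
import math
from typing import List


def _sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def glu_reference(x: List[float]) -> List[float]:
    """GLU on a 1D list: the first half gated by sigmoid of the second half."""
    if len(x) % 2:
        raise ValueError("the split dimension must have an even size")
    half = len(x) // 2
    a, b = x[:half], x[half:]
    return [ai * _sigmoid(bi) for ai, bi in zip(a, b)]
```

A length-4 input yields a length-2 output, matching \(M = N/2\) in the Shape section.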
- class torchwrench.nn.GRU(input_size, hidden_size, num_layers=1, bias=True, batch_first=False, dropout=0.0, bidirectional=False, device=None, dtype=None)
Bases: RNNBase
Apply a multi-layer gated recurrent unit (GRU) RNN to an input sequence. For each element in the input sequence, each layer computes the following function:
\[\begin{split}\begin{array}{ll} r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{(t-1)}+ b_{hn})) \\ h_t = (1 - z_t) \odot n_t + z_t \odot h_{(t-1)} \end{array}\end{split}\]
where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, \(h_{(t-1)}\) is the hidden state of the layer at time t-1 or the initial hidden state at time 0, and \(r_t\), \(z_t\), \(n_t\) are the reset, update, and new gates, respectively. \(\sigma\) is the sigmoid function, and \(\odot\) is the Hadamard product.
In a multilayer GRU, the input \(x^{(l)}_t\) of the \(l\)-th layer (\(l \ge 2\)) is the hidden state \(h^{(l-1)}_t\) of the previous layer multiplied by dropout \(\delta^{(l-1)}_t\) where each \(\delta^{(l-1)}_t\) is a Bernoulli random variable which is \(0\) with probability dropout.
- Args:
- input_size: The number of expected features in the input x
- hidden_size: The number of features in the hidden state h
- num_layers: Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two GRUs together to form a stacked GRU, with the second GRU taking in outputs of the first GRU and computing the final results. Default: 1
- bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True
- batch_first: If True, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to hidden states. See the Inputs/Outputs sections below for details. Default: False
- dropout: If non-zero, introduces a Dropout layer on the outputs of each GRU layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional: If True, becomes a bidirectional GRU. Default: False
- Inputs: input, h_0
- input: tensor of shape \((L, H_{in})\) for unbatched input, \((L, N, H_{in})\) when batch_first=False or \((N, L, H_{in})\) when batch_first=True containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.
- h_0: tensor of shape \((D * \text{num\_layers}, H_{out})\) or \((D * \text{num\_layers}, N, H_{out})\) containing the initial hidden state for the input sequence. Defaults to zeros if not provided.
where:
\[\begin{split}\begin{aligned} N ={} & \text{batch size} \\ L ={} & \text{sequence length} \\ D ={} & 2 \text{ if bidirectional=True otherwise } 1 \\ H_{in} ={} & \text{input\_size} \\ H_{out} ={} & \text{hidden\_size} \end{aligned}\end{split}\]
- Outputs: output, h_n
- output: tensor of shape \((L, D * H_{out})\) for unbatched input, \((L, N, D * H_{out})\) when batch_first=False or \((N, L, D * H_{out})\) when batch_first=True containing the output features (h_t) from the last layer of the GRU, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
- h_n: tensor of shape \((D * \text{num\_layers}, H_{out})\) or \((D * \text{num\_layers}, N, H_{out})\) containing the final hidden state for the input sequence.
- Attributes:
- weight_ih_l[k]: the learnable input-hidden weights of the \(\text{k}^{th}\) layer (W_ir|W_iz|W_in), of shape (3*hidden_size, input_size) for k = 0. Otherwise, the shape is (3*hidden_size, num_directions * hidden_size)
- weight_hh_l[k]: the learnable hidden-hidden weights of the \(\text{k}^{th}\) layer (W_hr|W_hz|W_hn), of shape (3*hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the \(\text{k}^{th}\) layer (b_ir|b_iz|b_in), of shape (3*hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the \(\text{k}^{th}\) layer (b_hr|b_hz|b_hn), of shape (3*hidden_size)
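The shapes listed under Attributes determine the total parameter count. The plain-Python sketch below (hypothetical helper `gru_num_parameters`) adds them up per layer and per direction; layers after the first receive num_directions * hidden_size input features.

```python
def gru_num_parameters(input_size: int, hidden_size: int, num_layers: int = 1,
                       bias: bool = True, bidirectional: bool = False) -> int:
    """Sum of the weight_ih/weight_hh/bias_ih/bias_hh sizes over layers and directions."""
    num_directions = 2 if bidirectional else 1
    gates = 3 * hidden_size  # r, z and n gates are stacked along the first dim
    total = 0
    for layer in range(num_layers):
        # Layer 0 sees the raw input; later layers see the concatenated directions.
        layer_input = input_size if layer == 0 else num_directions * hidden_size
        per_direction = gates * layer_input + gates * hidden_size  # weight_ih + weight_hh
        if bias:
            per_direction += 2 * gates  # bias_ih + bias_hh
        total += num_directions * per_direction
    return total
```

For nn.GRU(10, 20, 2), as used in this class's example, this gives 4440; summing p.numel() over the module's parameters should give the same number.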
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
Note
For bidirectional GRUs, forward and backward are directions 0 and 1 respectively. Example of splitting the output layers when batch_first=False: output.view(seq_len, batch, num_directions, hidden_size).
Note
batch_first argument is ignored for unbatched inputs.
Note
The calculation of the new gate \(n_t\) subtly differs from the original paper and other frameworks. In the original implementation, the Hadamard product \((\odot)\) between \(r_t\) and the previous hidden state \(h_{(t-1)}\) is done before the multiplication with the weight matrix W and addition of bias:
\[\begin{aligned} n_t = \tanh(W_{in} x_t + b_{in} + W_{hn} ( r_t \odot h_{(t-1)} ) + b_{hn}) \end{aligned}\]
This is in contrast to the PyTorch implementation, where it is done after \(W_{hn} h_{(t-1)}\):
\[\begin{aligned} n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{(t-1)}+ b_{hn})) \end{aligned}\]
This implementation differs on purpose for efficiency.
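The two placements of \(r_t\) can be compared with a single scalar GRU step (input_size = hidden_size = 1) in plain Python. The function `gru_step` and its `reset_after` flag are hypothetical names used only for this illustration. With scalars the weight and the reset gate commute, so the variants differ only through \(b_{hn}\); with vectors, \(W_{hn}(r_t \odot h)\) and \(r_t \odot (W_{hn} h)\) differ in general as well.

```python
import math
from typing import Dict


def _sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gru_step(x: float, h: float, w: Dict[str, float], b: Dict[str, float],
             reset_after: bool = True) -> float:
    """One scalar GRU step; w and b are keyed 'ir', 'hr', 'iz', 'hz', 'in', 'hn'.

    reset_after=True: r gates (W_hn h + b_hn), the formula used by this module.
    reset_after=False: r gates h before W_hn, as in the original paper.
    """
    r = _sigmoid(w["ir"] * x + b["ir"] + w["hr"] * h + b["hr"])
    z = _sigmoid(w["iz"] * x + b["iz"] + w["hz"] * h + b["hz"])
    if reset_after:
        n = math.tanh(w["in"] * x + b["in"] + r * (w["hn"] * h + b["hn"]))
    else:
        n = math.tanh(w["in"] * x + b["in"] + w["hn"] * (r * h) + b["hn"])
    return (1.0 - z) * n + z * h
```

With every weight at 0.5, x = 1, h = 0.5 and all biases zero, both variants return the same value; setting only b_hn = 1 makes them diverge by roughly 0.02.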
Examples:
>>> rnn = nn.GRU(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> output, hn = rnn(input, h0)
- forward(input: Tensor, hx: Tensor | None = None) → tuple[Tensor, Tensor]
- forward(input: PackedSequence, hx: Tensor | None = None) → tuple[PackedSequence, Tensor]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.GRUCell(input_size: int, hidden_size: int, bias: bool = True, device=None, dtype=None)
Bases: RNNCellBase
A gated recurrent unit (GRU) cell.
\[\begin{split}\begin{array}{ll} r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\ z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\ n = \tanh(W_{in} x + b_{in} + r \odot (W_{hn} h + b_{hn})) \\ h' = (1 - z) \odot n + z \odot h \end{array}\end{split}\]
where \(\sigma\) is the sigmoid function, and \(\odot\) is the Hadamard product.
- Args:
input_size: The number of expected features in the input x
hidden_size: The number of features in the hidden state h
bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True
- Inputs: input, hidden
input : tensor containing input features
hidden : tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided.
- Outputs: h’
h’ : tensor containing the next hidden state for each element in the batch
- Shape:
input: \((N, H_{in})\) or \((H_{in})\) tensor containing input features where \(H_{in}\) = input_size.
hidden: \((N, H_{out})\) or \((H_{out})\) tensor containing the initial hidden state where \(H_{out}\) = hidden_size. Defaults to zero if not provided.
output: \((N, H_{out})\) or \((H_{out})\) tensor containing the next hidden state.
- Attributes:
- weight_ih: the learnable input-hidden weights, of shape (3*hidden_size, input_size)
- weight_hh: the learnable hidden-hidden weights, of shape (3*hidden_size, hidden_size)
- bias_ih: the learnable input-hidden bias, of shape (3*hidden_size)
- bias_hh: the learnable hidden-hidden bias, of shape (3*hidden_size)
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
On certain ROCm devices, when using float16 inputs this module will use different precision for backward.
Examples:
>>> rnn = nn.GRUCell(10, 20)
>>> input = torch.randn(6, 3, 10)
>>> hx = torch.randn(3, 20)
>>> output = []
>>> for i in range(6):
...     hx = rnn(input[i], hx)
...     output.append(hx)
- forward(input: Tensor, hx: Tensor | None = None) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.GaussianNLLLoss(*, full: bool = False, eps: float = 1e-06, reduction: str = 'mean')
Bases: _Loss
Gaussian negative log likelihood loss.
The targets are treated as samples from Gaussian distributions with expectations and variances predicted by the neural network. For a target tensor modelled as having Gaussian distribution with a tensor of expectations input and a tensor of positive variances var the loss is:
\[\text{loss} = \frac{1}{2}\left(\log\left(\text{max}\left(\text{var}, \ \text{eps}\right)\right) + \frac{\left(\text{input} - \text{target}\right)^2} {\text{max}\left(\text{var}, \ \text{eps}\right)}\right) + \text{const.}\]
where eps is used for stability. By default, the constant term of the loss function is omitted unless full is True. If var is not the same size as input (due to a homoscedastic assumption), it must either have a final dimension of 1 or have one fewer dimension (with all other sizes being the same) for correct broadcasting.
- Args:
- full (bool, optional): include the constant term in the loss calculation. Default: False.
- eps (float, optional): value used to clamp var (see note below), for stability. Default: 1e-6.
- reduction (str, optional): specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the output is the average of all batch member losses, 'sum': the output is the sum of all batch member losses. Default: 'mean'.
- Shape:
Input: \((N, *)\) or \((*)\) where \(*\) means any number of additional dimensions
Target: \((N, *)\) or \((*)\), same shape as the input, or same shape as the input but with one dimension equal to 1 (to allow for broadcasting)
Var: \((N, *)\) or \((*)\), same shape as the input, or same shape as the input but with one dimension equal to 1, or same shape as the input but with one fewer dimension (to allow for broadcasting), or a scalar value
Output: scalar if reduction is 'mean' (default) or 'sum'. If reduction is 'none', then \((N, *)\), same shape as the input
- Examples:
>>> loss = nn.GaussianNLLLoss()
>>> input = torch.randn(5, 2, requires_grad=True)
>>> target = torch.randn(5, 2)
>>> var = torch.ones(5, 2, requires_grad=True)  # heteroscedastic
>>> output = loss(input, target, var)
>>> output.backward()
>>> loss = nn.GaussianNLLLoss()
>>> input = torch.randn(5, 2, requires_grad=True)
>>> target = torch.randn(5, 2)
>>> var = torch.ones(5, 1, requires_grad=True)  # homoscedastic
>>> output = loss(input, target, var)
>>> output.backward()
- Note:
The clamping of var is ignored with respect to autograd, and so the gradients are unaffected by it.
- Reference:
Nix, D. A. and Weigend, A. S., “Estimating the mean and variance of the target probability distribution”, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), Orlando, FL, USA, 1994, pp. 55-60 vol.1, doi: 10.1109/ICNN.1994.374138.
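The loss formula, the eps clamp and the three reductions can be traced in plain Python. This sketch (hypothetical `gaussian_nll_reference`) expects equal-length sequences, so it does not model the homoscedastic broadcasting of var; the constant term added when full=True is \(\tfrac{1}{2}\log(2\pi)\), which follows from the Gaussian density.

```python
import math
from typing import List, Sequence, Union


def gaussian_nll_reference(means: Sequence[float], target: Sequence[float],
                           var: Sequence[float], full: bool = False,
                           eps: float = 1e-6,
                           reduction: str = "mean") -> Union[float, List[float]]:
    """Element-wise Gaussian NLL; `means` plays the role of `input`."""
    if not (len(means) == len(target) == len(var)):
        raise ValueError("this sketch expects equal-length sequences (no broadcasting)")
    losses = []
    for mu, y, v in zip(means, target, var):
        v = max(v, eps)  # clamp the variance for stability
        loss = 0.5 * (math.log(v) + (mu - y) ** 2 / v)
        if full:
            loss += 0.5 * math.log(2.0 * math.pi)  # constant term, omitted by default
        losses.append(loss)
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"invalid reduction: {reduction!r}")
```

With unit variances, a prediction that misses its target by 1 contributes 0.5 and an exact prediction contributes 0, so the mean over those two elements is 0.25.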
- class torchwrench.nn.GroupNorm(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True, device=None, dtype=None)
Bases: Module
Applies Group Normalization over a mini-batch of inputs.
This layer implements the operation as described in the paper Group Normalization.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The input channels are separated into num_groups groups, each containing num_channels / num_groups channels. num_channels must be divisible by num_groups. The mean and standard deviation are calculated separately over each group. \(\gamma\) and \(\beta\) are learnable per-channel affine transform parameter vectors of size num_channels if affine is True. The variance is calculated via the biased estimator, equivalent to torch.var(input, correction=0).
This layer uses statistics computed from input data in both training and evaluation modes.
- Args:
num_groups (int): number of groups to separate the channels into
num_channels (int): number of channels expected in input
eps: a value added to the denominator for numerical stability. Default: 1e-5
affine: a boolean value that when set to True, this module has learnable per-channel affine parameters initialized to ones (for weights) and zeros (for biases). Default: True.
- Shape:
Input: \((N, C, *)\) where \(C=\text{num\_channels}\)
Output: \((N, C, *)\) (same shape as input)
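The per-group statistics can be sketched for a single sample in plain Python, with the sample given as x[channel][flattened spatial position]. The helper `group_norm_reference` is hypothetical and omits the affine \(\gamma\), \(\beta\) step; it uses the biased variance, as stated above.

```python
import math
from typing import List


def group_norm_reference(x: List[List[float]], num_groups: int,
                         eps: float = 1e-5) -> List[List[float]]:
    """Normalize one sample x[c][s] group by group (no affine parameters)."""
    num_channels = len(x)
    if num_channels % num_groups:
        raise ValueError("num_channels must be divisible by num_groups")
    per_group = num_channels // num_groups
    out: List[List[float]] = []
    for g in range(num_groups):
        channels = x[g * per_group:(g + 1) * per_group]
        # Statistics are pooled over every channel and position in the group.
        values = [v for ch in channels for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # biased (correction=0)
        denom = math.sqrt(var + eps)
        out.extend([(v - mean) / denom for v in ch] for ch in channels)
    return out
```

With num_groups equal to num_channels each channel is normalized on its own (the InstanceNorm-like case), and with num_groups=1 all channels share one mean and variance (the LayerNorm-like case).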
Examples:
>>> input = torch.randn(20, 6, 10, 10)
>>> # Separate 6 channels into 3 groups
>>> m = nn.GroupNorm(3, 6)
>>> # Separate 6 channels into 6 groups (equivalent to InstanceNorm)
>>> m = nn.GroupNorm(6, 6)
>>> # Put all 6 channels into a single group (equivalent to LayerNorm)
>>> m = nn.GroupNorm(1, 6)
>>> # Activating the module
>>> output = m(input)
- extra_repr() → str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(input: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Hardshrink(lambd: float = 0.5)
Bases: Module
Applies the Hard Shrinkage (Hardshrink) function element-wise.
Hardshrink is defined as:
\[\begin{split}\text{HardShrink}(x) = \begin{cases} x, & \text{ if } x > \lambda \\ x, & \text{ if } x < -\lambda \\ 0, & \text{ otherwise } \end{cases}\end{split}\]
- Args:
lambd: the \(\lambda\) value for the Hardshrink formulation. Default: 0.5
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Hardshrink()
>>> input = torch.randn(2)
>>> output = m(input)
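The piecewise definition translates directly to a scalar function; `hardshrink` below is a hypothetical plain-Python sketch of the formula, not the library kernel. Values exactly at \(\pm\lambda\) fall in the "otherwise" branch and become 0.

```python
def hardshrink(x: float, lambd: float = 0.5) -> float:
    """Keep x outside [-lambd, lambd]; zero it inside (boundaries included)."""
    return x if x > lambd or x < -lambd else 0.0
```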
- class torchwrench.nn.Hardsigmoid(inplace: bool = False)
Bases: Module
Applies the Hardsigmoid function element-wise.
Hardsigmoid is defined as:
\[\begin{split}\text{Hardsigmoid}(x) = \begin{cases} 0 & \text{if~} x \le -3, \\ 1 & \text{if~} x \ge +3, \\ x / 6 + 1 / 2 & \text{otherwise} \end{cases}\end{split}\]
- Args:
inplace: can optionally do the operation in-place. Default: False
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Hardsigmoid()
>>> input = torch.randn(2)
>>> output = m(input)
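As a sanity check on the three branches, the following plain-Python sketch evaluates the Hardsigmoid formula for a single float. The helper name is hypothetical and is not part of torchwrench.

```python
def hardsigmoid(x: float) -> float:
    """Scalar restatement of the Hardsigmoid formula (illustrative helper)."""
    if x <= -3.0:
        return 0.0
    if x >= 3.0:
        return 1.0
    # Linear ramp between -3 and +3; it passes through 0.5 at x = 0.
    return x / 6.0 + 0.5
```

The branches meet at x = ±3: -3/6 + 1/2 = 0 and 3/6 + 1/2 = 1. The function is therefore continuous, with slope 1/6 on the ramp.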
- class torchwrench.nn.Hardswish(inplace: bool = False)[source]¶
Bases: Module
Applies the Hardswish function element-wise.
Method described in the paper: Searching for MobileNetV3.
Hardswish is defined as:
\[\begin{split}\text{Hardswish}(x) = \begin{cases} 0 & \text{if~} x \le -3, \\ x & \text{if~} x \ge +3, \\ x \cdot (x + 3) /6 & \text{otherwise} \end{cases}\end{split}\]
- Args:
inplace: can optionally do the operation in-place. Default: False
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Hardswish()
>>> input = torch.randn(2)
>>> output = m(input)
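A scalar plain-Python version of the Hardswish formula is shown below. It is an illustrative sketch, not the module's tensor implementation.

```python
def hardswish(x: float) -> float:
    """Scalar restatement of the Hardswish formula (illustrative helper)."""
    if x <= -3.0:
        return 0.0
    if x >= 3.0:
        return x
    # Middle branch: x * (x + 3) / 6, which equals x * (x / 6 + 1 / 2).
    return x * (x + 3.0) / 6.0
```

The middle branch can be rewritten as x * (x/6 + 1/2), so on the interval (-3, 3) Hardswish is x multiplied by the Hardsigmoid ramp. At x = 3 the middle branch gives 3 * 6 / 6 = 3, which matches the upper branch.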
- class torchwrench.nn.Hardtanh(min_val: float = -1.0, max_val: float = 1.0, inplace: bool = False, min_value: float | None = None, max_value: float | None = None)[source]¶
Bases: Module
Applies the HardTanh function element-wise.
HardTanh is defined as:
\[\begin{split}\text{HardTanh}(x) = \begin{cases} \text{max\_val} & \text{ if } x > \text{ max\_val } \\ \text{min\_val} & \text{ if } x < \text{ min\_val } \\ x & \text{ otherwise } \\ \end{cases}\end{split}\]
- Args:
min_val: minimum value of the linear region range. Default: -1
max_val: maximum value of the linear region range. Default: 1
inplace: can optionally do the operation in-place. Default: False
Keyword arguments min_value and max_value have been deprecated in favor of min_val and max_val.
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Hardtanh(-2, 2)
>>> input = torch.randn(2)
>>> output = m(input)
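HardTanh is a clamp of x to [min_val, max_val]. The plain-Python sketch below states this for one float. The helper name is illustrative and not part of the library.

```python
def hardtanh(x: float, min_val: float = -1.0, max_val: float = 1.0) -> float:
    """Scalar restatement of HardTanh: clamp x into [min_val, max_val]."""
    if x > max_val:
        return max_val
    if x < min_val:
        return min_val
    return x
```

The same result can be written as min(max(x, min_val), max_val). The example above, nn.Hardtanh(-2, 2), corresponds to hardtanh(x, -2.0, 2.0).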
- class torchwrench.nn.HingeEmbeddingLoss(margin: float = 1.0, size_average=None, reduce=None, reduction: str = 'mean')[source]¶
Bases: _Loss
Measures the loss given an input tensor \(x\) and a labels tensor \(y\) (containing 1 or -1). This is usually used for measuring whether two inputs are similar or dissimilar, e.g. using the L1 pairwise distance as \(x\), and is typically used for learning nonlinear embeddings or semi-supervised learning.
The loss for the \(n\)-th sample in the mini-batch is
\[\begin{split}l_n = \begin{cases} x_n, & \text{if}\; y_n = 1,\\ \max \{0, \text{margin} - x_n\}, & \text{if}\; y_n = -1, \end{cases}\end{split}\]
and the total loss function is
\[\begin{split}\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
where \(L = \{l_1,\dots,l_N\}^\top\).
- Args:
margin (float, optional): Has a default value of 1.
size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions. The sum operation operates over all the elements.
Target: \((*)\), same shape as the input.
Output: scalar. If reduction is 'none', then same shape as the input.
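The per-sample cases and reductions above can be traced in plain Python. This sketch works on flat lists and assumes labels are exactly 1 or -1. The function name is hypothetical, not torchwrench's API.

```python
def hinge_embedding_loss(x, y, margin=1.0, reduction="mean"):
    """Plain-Python restatement of l_n and the reductions (illustrative)."""
    losses = []
    for x_n, y_n in zip(x, y):
        if y_n == 1:
            losses.append(x_n)                      # similar pair: penalize x_n
        elif y_n == -1:
            losses.append(max(0.0, margin - x_n))   # dissimilar: only if x_n < margin
        else:
            raise ValueError("labels must be 1 or -1")
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    return losses  # reduction == "none"
```

When x is a distance, similar pairs (y = 1) are penalized by their distance. Dissimilar pairs (y = -1) contribute only while they are closer than margin. In hinge_embedding_loss([0.25, 0.5, 1.5], [1, -1, -1]), the last pair is already beyond the margin and contributes 0.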
- class torchwrench.nn.HuberLoss(reduction: str = 'mean', delta: float = 1.0)[source]¶
Bases: _Loss
Creates a criterion that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise. This loss combines advantages of both L1Loss and MSELoss; the delta-scaled L1 region makes the loss less sensitive to outliers than MSELoss, while the L2 region provides smoothness over L1Loss near 0. See Huber loss for more information.
For a batch of size \(N\), the unreduced loss can be described as:
\[\ell(x, y) = L = \{l_1, ..., l_N\}^T\]
with
\[\begin{split}l_n = \begin{cases} 0.5 (x_n - y_n)^2, & \text{if } |x_n - y_n| < \text{delta} \\ \text{delta} \cdot (|x_n - y_n| - 0.5 \cdot \text{delta}), & \text{otherwise } \end{cases}\end{split}\]
If reduction is not 'none', then:
\[\begin{split}\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
Note
When delta is set to 1, this loss is equivalent to SmoothL1Loss. In general, this loss differs from SmoothL1Loss by a factor of delta (AKA beta in Smooth L1). See SmoothL1Loss for additional discussion on the differences in behavior between the two losses.
- Args:
reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
delta (float, optional): Specifies the threshold at which to change between delta-scaled L1 and L2 loss. The value must be positive. Default: 1.0
- Shape:
Input: \((*)\) where \(*\) means any number of dimensions.
Target: \((*)\), same shape as the input.
Output: scalar. If reduction is 'none', then \((*)\), same shape as the input.
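The "differs by a factor of delta" remark in the note can be checked numerically. The sketch below restates both per-element terms in plain Python. smooth_l1 assumes the beta convention of SmoothL1Loss: 0.5 d²/beta below beta, and |d| - 0.5 beta otherwise. All helper names are illustrative.

```python
def huber(diff: float, delta: float = 1.0) -> float:
    """Per-element Huber term l_n for diff = x_n - y_n."""
    a = abs(diff)
    if a < delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def smooth_l1(diff: float, beta: float = 1.0) -> float:
    """Per-element Smooth L1 term, assuming SmoothL1Loss's beta convention."""
    a = abs(diff)
    if a < beta:
        return 0.5 * a * a / beta
    return a - 0.5 * beta


def huber_loss(x, y, delta=1.0, reduction="mean"):
    """Apply the Huber term element-wise, then reduce."""
    losses = [huber(x_n - y_n, delta) for x_n, y_n in zip(x, y)]
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    return losses  # reduction == "none"
```

Multiplying smooth_l1 with beta = delta by delta reproduces both Huber branches exactly. With delta = 1 the two losses coincide. For example, huber_loss([0.0, 0.0], [0.5, 3.0]) averages 0.125 (L2 region) and 2.5 (L1 region) into 1.3125.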
- class torchwrench.nn.IFFT(*args: Any, **kwargs: Any)[source]¶
Bases: Module
Module version of ifft().
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Identity(*args: Any, **kwargs: Any)[source]¶
Bases: Module
A placeholder identity operator that is argument-insensitive.
- Args:
args: any argument (unused)
kwargs: any keyword argument (unused)
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Identity(54, unused_argument1=0.1, unused_argument2=False)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 20])
- class torchwrench.nn.Imag(*, return_zeros: bool = False)[source]¶
Bases: Module
Module version of imag().
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.IndexToName(idx_to_name: Mapping[int, T_Name] | Sequence[T_Name])[source]¶
Bases: Generic[T_Name], Module
For more information, see index_to_name().
- forward(index: list[int] | Tensor) list[T_Name][source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.IndexToOnehot(num_classes: int, *, padding_idx: int | None = None, device: device | None | 'default' | 'cuda_if_available' | str | int = None, dtype: dtype | None | 'default' | str | DTypeEnum = torch.bool)[source]¶
Bases: Module
For more information, see index_to_onehot().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(index: list[int] | Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- torchwrench.nn.IndicesToMultihot¶
alias of MultiIndicesToMultihot
- torchwrench.nn.IndicesToMultinames¶
alias of MultiIndicesToMultinames
- class torchwrench.nn.InstanceNorm1d(num_features: int, eps: float = 1e-05, momentum: float = 0.1, affine: bool = False, track_running_stats: bool = False, device=None, dtype=None)[source]¶
Bases: _InstanceNorm
Applies Instance Normalization.
This operation applies Instance Normalization over a 2D (unbatched) or 3D (batched) input as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard deviation are calculated per-dimension separately for each object in a mini-batch. \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the number of features or channels of the input) if affine is True. The variance is calculated via the biased estimator, equivalent to torch.var(input, correction=0).
By default, this layer uses instance statistics computed from input data in both training and evaluation modes.
If track_running_stats is set to True, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Note
InstanceNorm1d and LayerNorm are very similar, but have some subtle differences. InstanceNorm1d is applied on each channel of channeled data like multidimensional time series, but LayerNorm is usually applied on the entire sample and often in NLP tasks. Additionally, LayerNorm applies an elementwise affine transform, while InstanceNorm1d usually doesn't apply an affine transform.
- Args:
num_features: number of features or channels \(C\) of the input
eps: a value added to the denominator for numerical stability. Default: 1e-5
momentum: the value used for the running_mean and running_var computation. Default: 0.1
affine: a boolean value that when set to True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: False.
track_running_stats: a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: False
- Shape:
Input: \((N, C, L)\) or \((C, L)\)
Output: \((N, C, L)\) or \((C, L)\) (same shape as input)
Examples:
>>> # Without Learnable Parameters
>>> m = nn.InstanceNorm1d(100)
>>> # With Learnable Parameters
>>> m = nn.InstanceNorm1d(100, affine=True)
>>> input = torch.randn(20, 100, 40)
>>> output = m(input)
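To illustrate the normalization formula and the biased variance, here is a plain-Python sketch that normalizes one unbatched (C, L) sample channel by channel. It stands in for a tensor only as nested lists. The function name and the weight/bias arguments, which play the roles of gamma/beta in the affine=True case, are illustrative assumptions. For a batch, the same computation would run independently on each of the N samples.

```python
import math


def instance_norm_1d(x, eps=1e-5, weight=None, bias=None):
    """Normalize each channel of one (C, L) sample (illustrative sketch).

    x: list of C channels, each a list of L floats.
    weight, bias: optional per-channel gamma and beta (the affine=True case).
    """
    out = []
    for c, channel in enumerate(x):
        n = len(channel)
        mean = sum(channel) / n
        # Biased variance: divide by n, the analogue of correction=0.
        var = sum((v - mean) ** 2 for v in channel) / n
        gamma = 1.0 if weight is None else weight[c]
        beta = 0.0 if bias is None else bias[c]
        out.append([(v - mean) / math.sqrt(var + eps) * gamma + beta for v in channel])
    return out
```

Each output channel has mean 0 and biased variance just under 1, because eps sits in the denominator. A constant channel maps to all zeros rather than dividing by zero.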
- class torchwrench.nn.InstanceNorm2d(num_features: int, eps: float = 1e-05, momentum: float = 0.1, affine: bool = False, track_running_stats: bool = False, device=None, dtype=None)[source]¶
Bases: _InstanceNorm
Applies Instance Normalization.
This operation applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard deviation are calculated per-dimension separately for each object in a mini-batch. \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the number of channels of the input) if affine is True. The variance is calculated via the biased estimator, equivalent to torch.var(input, correction=0).
By default, this layer uses instance statistics computed from input data in both training and evaluation modes.
If track_running_stats is set to True, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Note
InstanceNorm2d and LayerNorm are very similar, but have some subtle differences. InstanceNorm2d is applied on each channel of channeled data like RGB images, but LayerNorm is usually applied on the entire sample and often in NLP tasks. Additionally, LayerNorm applies an elementwise affine transform, while InstanceNorm2d usually doesn't apply an affine transform.
- Args:
num_features: \(C\) from an expected input of size \((N, C, H, W)\) or \((C, H, W)\)
eps: a value added to the denominator for numerical stability. Default: 1e-5
momentum: the value used for the running_mean and running_var computation. Default: 0.1
affine: a boolean value that when set to True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: False.
track_running_stats: a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: False
- Shape:
Input: \((N, C, H, W)\) or \((C, H, W)\)
Output: \((N, C, H, W)\) or \((C, H, W)\) (same shape as input)
Examples:
>>> # Without Learnable Parameters
>>> m = nn.InstanceNorm2d(100)
>>> # With Learnable Parameters
>>> m = nn.InstanceNorm2d(100, affine=True)
>>> input = torch.randn(20, 100, 35, 45)
>>> output = m(input)
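The momentum note for InstanceNorm2d gives the running-statistics update as an exponential moving average. The sketch below applies that rule to a single scalar statistic. The function name is hypothetical. It does not model which variance estimator feeds the observed value.

```python
def update_running_stat(running: float, observed: float, momentum: float = 0.1) -> float:
    """One step of x_hat_new = (1 - momentum) * x_hat + momentum * x_t."""
    return (1.0 - momentum) * running + momentum * observed


# Start from 0.0 and observe the value 1.0 ten times in a row.
r = 0.0
for _ in range(10):
    r = update_running_stat(r, 1.0)
print(r)  # close to 1 - 0.9**10, i.e. about 0.651
```

A larger momentum weights recent observations more heavily, which is the reverse of the usual optimizer sense of "momentum". With momentum = 0.1, a value observed k times in a row has pulled the estimate 1 - 0.9^k of the way toward itself.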
- class torchwrench.nn.InstanceNorm3d(num_features: int, eps: float = 1e-05, momentum: float = 0.1, affine: bool = False, track_running_stats: bool = False, device=None, dtype=None)[source]¶
Bases: _InstanceNorm
Applies Instance Normalization.
This operation applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard deviation are calculated per-dimension separately for each object in a mini-batch. \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the number of channels of the input) if affine is True. The variance is calculated via the biased estimator, equivalent to torch.var(input, correction=0).
By default, this layer uses instance statistics computed from input data in both training and evaluation modes.
If track_running_stats is set to True, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Note
InstanceNorm3d and LayerNorm are very similar, but have some subtle differences. InstanceNorm3d is applied on each channel of channeled data like 3D models with RGB color, but LayerNorm is usually applied on the entire sample and often in NLP tasks. Additionally, LayerNorm applies an elementwise affine transform, while InstanceNorm3d usually doesn't apply an affine transform.
- Args:
num_features: \(C\) from an expected input of size \((N, C, D, H, W)\) or \((C, D, H, W)\)
eps: a value added to the denominator for numerical stability. Default: 1e-5
momentum: the value used for the running_mean and running_var computation. Default: 0.1
affine: a boolean value that when set to True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: False.
track_running_stats: a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: False
- Shape:
Input: \((N, C, D, H, W)\) or \((C, D, H, W)\)
Output: \((N, C, D, H, W)\) or \((C, D, H, W)\) (same shape as input)
Examples:
>>> # Without Learnable Parameters
>>> m = nn.InstanceNorm3d(100)
>>> # With Learnable Parameters
>>> m = nn.InstanceNorm3d(100, affine=True)
>>> input = torch.randn(20, 100, 35, 45, 10)
>>> output = m(input)
- class torchwrench.nn.KLDivLoss(size_average=None, reduce=None, reduction: str = 'mean', log_target: bool = False)[source]¶
Bases: _Loss
The Kullback-Leibler divergence loss.
For tensors of the same shape \(y_{\text{pred}},\ y_{\text{true}}\), where \(y_{\text{pred}}\) is the input and \(y_{\text{true}}\) is the target, we define the pointwise KL-divergence as
\[L(y_{\text{pred}},\ y_{\text{true}}) = y_{\text{true}} \cdot \log \frac{y_{\text{true}}}{y_{\text{pred}}} = y_{\text{true}} \cdot (\log y_{\text{true}} - \log y_{\text{pred}})\]
To avoid underflow issues when computing this quantity, this loss expects the argument input in the log-space. The argument target may also be provided in the log-space if log_target = True.
To summarise, this function is roughly equivalent to computing

    if not log_target:  # default
        loss_pointwise = target * (target.log() - input)
    else:
        loss_pointwise = target.exp() * (target - input)

and then reducing this result depending on the argument reduction as

    if reduction == "mean":  # default
        loss = loss_pointwise.mean()
    elif reduction == "batchmean":  # mathematically correct
        loss = loss_pointwise.sum() / input.size(0)
    elif reduction == "sum":
        loss = loss_pointwise.sum()
    else:  # reduction == "none"
        loss = loss_pointwise

Note
Like all the other losses in PyTorch, this function expects the first argument, input, to be the output of the model (e.g. the neural network) and the second, target, to be the observations in the dataset. This differs from the standard mathematical notation \(KL(P\ ||\ Q)\) where \(P\) denotes the distribution of the observations and \(Q\) denotes the model.
Warning
reduction = "mean" doesn't return the true KL divergence value; please use reduction = "batchmean", which aligns with the mathematical definition.
- Args:
size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional): Specifies the reduction to apply to the output. Default: "mean"
log_target (bool, optional): Specifies whether target is in the log space. Default: False
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Target: \((*)\), same shape as the input.
Output: scalar by default. If reduction is 'none', then \((*)\), same shape as the input.
- Examples:
>>> kl_loss = nn.KLDivLoss(reduction="batchmean")
>>> # input should be a distribution in the log space
>>> input = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)
>>> # Sample a batch of distributions. Usually this would come from the dataset
>>> target = F.softmax(torch.rand(3, 5), dim=1)
>>> output = kl_loss(input, target)
>>>
>>> kl_loss = nn.KLDivLoss(reduction="batchmean", log_target=True)
>>> log_target = F.log_softmax(torch.rand(3, 5), dim=1)
>>> output = kl_loss(input, log_target)
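To see why "batchmean" rather than "mean" matches the true KL divergence, here is a plain-Python restatement of the pseudocode above, using nested lists for a batch of rows. It assumes 0 · log 0 is treated as 0, the usual convention for KL divergence. The function name is illustrative, not torchwrench's API.

```python
import math


def kl_div(input, target, reduction="mean", log_target=False):
    """Plain-Python restatement of the KLDivLoss pseudocode (illustrative).

    input:  rows of log-probabilities (model output, already in log-space).
    target: rows of probabilities, or log-probabilities when log_target=True.
    """
    pointwise = []
    for in_row, t_row in zip(input, target):
        for inp, t in zip(in_row, t_row):
            if log_target:
                pointwise.append(math.exp(t) * (t - inp))
            elif t > 0:
                pointwise.append(t * (math.log(t) - inp))
            else:
                pointwise.append(0.0)  # assumption: 0 * log(0) treated as 0
    if reduction == "mean":
        return sum(pointwise) / len(pointwise)   # divides by batch * classes
    if reduction == "batchmean":
        return sum(pointwise) / len(input)       # divides by batch size only
    if reduction == "sum":
        return sum(pointwise)
    return pointwise  # reduction == "none"
```

Consider two identical rows with target [0.5, 0.5] and model probabilities [0.25, 0.75]. The true per-sample divergence is 0.5 · ln(4/3). "batchmean" returns exactly that value. "mean" also divides by the 2 classes per row and returns half of it.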
- class torchwrench.nn.L1Loss(size_average=None, reduce=None, reduction: str = 'mean')[source]¶
Bases: _Loss
Creates a criterion that measures the mean absolute error (MAE) between each element in the input \(x\) and target \(y\).
The unreduced (i.e. with reduction set to 'none') loss can be described as:
\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = \left| x_n - y_n \right|,\]
where \(N\) is the batch size. If reduction is not 'none' (default 'mean'), then:
\[\begin{split}\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
\(x\) and \(y\) are tensors of arbitrary shapes with a total of \(N\) elements each.
With reduction = 'mean', the sum still operates over all the elements and is then divided by \(N\). The division by \(N\) can be avoided by setting reduction = 'sum'.
Supports real-valued and complex-valued inputs.
- Args:
size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Target: \((*)\), same shape as the input.
Output: scalar. If reduction is 'none', then \((*)\), same shape as the input.
Examples:
>>> loss = nn.L1Loss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()
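The formula and its reductions fit in a few lines of plain Python. This sketch assumes |x_n - y_n| means the complex modulus when the inputs are complex, which is what Python's abs() returns for complex numbers. The name l1_loss is illustrative.

```python
def l1_loss(x, y, reduction="mean"):
    """Element-wise |x_n - y_n| followed by the chosen reduction (illustrative)."""
    # abs() returns the modulus for complex numbers, so complex inputs give real losses.
    losses = [abs(x_n - y_n) for x_n, y_n in zip(x, y)]
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    return losses  # reduction == "none"
```

For example, l1_loss([1.0, 2.0, 3.0], [1.5, 2.0, 1.0], "sum") is 0.5 + 0 + 2 = 2.5, and the default "mean" divides that by N = 3. A complex residual of 3 + 4j contributes 5.0.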
- class torchwrench.nn.LPPool1d(norm_type: float, kernel_size: int | tuple[int, ...], stride: int | tuple[int, ...] | None = None, ceil_mode: bool = False)[source]¶
Bases: _LPPoolNd
Applies a 1D power-average pooling over an input signal composed of several input planes.
On each window, the function computed is:
\[f(X) = \sqrt[p]{\sum_{x \in X} x^{p}}\]
At p = \(\infty\), one gets Max Pooling.
At p = 1, one gets Sum Pooling (which is proportional to Average Pooling).
Note
If the sum to the power of p is zero, the gradient of this function is not defined. This implementation will set the gradient to zero in this case.
- Args:
norm_type: the exponent \(p\) in the formula above
kernel_size: a single int, the size of the window
stride: a single int, the stride of the window. Default value is kernel_size
ceil_mode: when True, will use ceil instead of floor to compute the output shape
- Note:
When ceil_mode is True, sliding windows may go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
- Shape:
Input: \((N, C, L_{in})\) or \((C, L_{in})\).
Output: \((N, C, L_{out})\) or \((C, L_{out})\), where
\[L_{out} = \left\lfloor\frac{L_{in} - \text{kernel\_size}}{\text{stride}} + 1\right\rfloor\]
- Examples:
>>> # power-2 pool of window of length 3, with stride 2.
>>> m = nn.LPPool1d(2, 3, stride=2)
>>> input = torch.randn(20, 16, 50)
>>> output = m(input)
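The window formula and the floor-based output length can be traced on a single 1D signal in plain Python. This is a sketch of the math with ceil_mode=False, not the module's implementation. It assumes non-negative inputs or an integer norm_type, because in Python a negative base raised to a fractional power yields a complex number. The name lp_pool1d is illustrative.

```python
def lp_pool1d(signal, norm_type, kernel_size, stride=None):
    """Power-average pooling of one 1D signal (illustrative, ceil_mode=False)."""
    stride = kernel_size if stride is None else stride
    # floor((L_in - kernel_size) / stride + 1) for L_in >= kernel_size
    l_out = (len(signal) - kernel_size) // stride + 1
    out = []
    for i in range(l_out):
        window = signal[i * stride : i * stride + kernel_size]
        # f(X) = (sum of x^p) ** (1/p)
        out.append(sum(x ** norm_type for x in window) ** (1.0 / norm_type))
    return out
```

With p = 1 each output is the plain window sum, e.g. lp_pool1d([1.0, 2.0, 3.0, 4.0], 1, 2) gives [3.0, 7.0]. A large p approaches the window maximum. A length-50 signal with kernel 3 and stride 2, as in the example above, yields floor(47/2 + 1) = 24 outputs.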
- class torchwrench.nn.LPPool2d(norm_type: float, kernel_size: int | tuple[int, ...], stride: int | tuple[int, ...] | None = None, ceil_mode: bool = False)[source]¶
Bases: _LPPoolNd
Applies a 2D power-average pooling over an input signal composed of several input planes.
On each window, the function computed is:
\[f(X) = \sqrt[p]{\sum_{x \in X} x^{p}}\]
At p = \(\infty\), one gets Max Pooling.
At p = 1, one gets Sum Pooling (which is proportional to average pooling).
The parameters kernel_size and stride can either be:
a single int, in which case the same value is used for the height and width dimension
a tuple of two ints, in which case the first int is used for the height dimension, and the second int for the width dimension
Note
If the sum to the power of p is zero, the gradient of this function is not defined. This implementation will set the gradient to zero in this case.
- Args:
norm_type: the exponent \(p\) in the formula above
kernel_size: the size of the window
stride: the stride of the window. Default value is kernel_size
ceil_mode: when True, will use ceil instead of floor to compute the output shape
- Note:
When ceil_mode is True, sliding windows may go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
- Shape:
Input: \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\).
Output: \((N, C, H_{out}, W_{out})\) or \((C, H_{out}, W_{out})\), where
\[H_{out} = \left\lfloor\frac{H_{in} - \text{kernel\_size}[0]}{\text{stride}[0]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} - \text{kernel\_size}[1]}{\text{stride}[1]} + 1\right\rfloor\]
Examples:
>>> # power-2 pool of square window of size=3, stride=2
>>> m = nn.LPPool2d(2, 3, stride=2)
>>> # pool of non-square window of power 1.2
>>> m = nn.LPPool2d(1.2, (3, 2), stride=(2, 1))
>>> input = torch.randn(20, 16, 50, 32)
>>> output = m(input)
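The two floor formulas can be evaluated directly to predict the output size before running the module. The plain-Python helper below is a hypothetical sketch for ceil_mode=False. It accepts either an int or a (height, width) pair, mirroring the two accepted forms of kernel_size and stride.

```python
def lp_pool2d_out_shape(h_in, w_in, kernel_size, stride=None):
    """(H_out, W_out) from the floor formulas (illustrative, ceil_mode=False)."""
    def pair(v):
        # A single int applies to both dimensions; otherwise (height, width).
        return (v, v) if isinstance(v, int) else tuple(v)

    kh, kw = pair(kernel_size)
    sh, sw = pair(kernel_size if stride is None else stride)
    return (h_in - kh) // sh + 1, (w_in - kw) // sw + 1
```

For the 50 x 32 input in the example above, the square kernel 3 with stride 2 gives (24, 15). The non-square kernel (3, 2) with stride (2, 1) gives (24, 31).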
- class torchwrench.nn.LSTM(input_size, hidden_size, num_layers=1, bias=True, batch_first=False, dropout=0.0, bidirectional=False, proj_size=0, device=None, dtype=None)[source]¶
Bases: RNNBase
Apply a multi-layer long short-term memory (LSTM) RNN to an input sequence. For each element in the input sequence, each layer computes the following function:
\[\begin{split}\begin{array}{ll} \\ i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\ f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\ o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\ c_t = f_t \odot c_{t-1} + i_t \odot g_t \\ h_t = o_t \odot \tanh(c_t) \\ \end{array}\end{split}\]
where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the input at time t, \(h_{t-1}\) is the hidden state of the layer at time t-1 or the initial hidden state at time 0, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and output gates, respectively. \(\sigma\) is the sigmoid function, and \(\odot\) is the Hadamard product.
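To make the gate equations concrete, here is a plain-Python sketch of one time step for the smallest case, input_size = hidden_size = 1. In that case every matrix is a scalar and the Hadamard product is ordinary multiplication. The dict layout for weights and biases is a hypothetical convenience, not how the module stores its parameters. Each bias entry stands for the sum b_i* + b_h*, since only their sum affects the result.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One step of the LSTM gate equations with scalar state (illustrative).

    W: dict gate -> (w_input, w_hidden); b: dict gate -> combined bias.
    Gate keys: 'i' (input), 'f' (forget), 'g' (cell), 'o' (output).
    """
    def pre(gate):
        w_x, w_h = W[gate]
        return w_x * x_t + w_h * h_prev + b[gate]

    i_t = sigmoid(pre("i"))
    f_t = sigmoid(pre("f"))
    g_t = math.tanh(pre("g"))
    o_t = sigmoid(pre("o"))
    c_t = f_t * c_prev + i_t * g_t    # keep part of the old cell, add new content
    h_t = o_t * math.tanh(c_t)        # expose a gated view of the cell state
    return h_t, c_t
```

With all weights and biases at zero, every sigmoid gate is 0.5 and g_t is 0, so the cell state simply halves at each step. A large positive input-gate bias with a large negative forget-gate bias instead overwrites the cell with g_t.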
In a multilayer LSTM, the input \(x^{(l)}_t\) of the \(l\)-th layer (\(l \ge 2\)) is the hidden state \(h^{(l-1)}_t\) of the previous layer multiplied by dropout \(\delta^{(l-1)}_t\) where each \(\delta^{(l-1)}_t\) is a Bernoulli random variable which is \(0\) with probability dropout.
If proj_size > 0 is specified, LSTM with projections will be used. This changes the LSTM cell in the following way. First, the dimension of \(h_t\) will be changed from hidden_size to proj_size (dimensions of \(W_{hi}\) will be changed accordingly). Second, the output hidden state of each layer will be multiplied by a learnable projection matrix: \(h_t = W_{hr}h_t\). Note that as a consequence of this, the output of LSTM network will be of different shape as well. See Inputs/Outputs sections below for exact dimensions of all variables. You can find more details in https://arxiv.org/abs/1402.1128.
- Args:
input_size: The number of expected features in the input x
hidden_size: The number of features in the hidden state h
num_layers: Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True
batch_first: If True, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to hidden or cell states. See the Inputs/Outputs sections below for details. Default: False
dropout: If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0
bidirectional: If True, becomes a bidirectional LSTM. Default: False
proj_size: If > 0, will use LSTM with projections of corresponding size. Default: 0
- Inputs: input, (h_0, c_0)
input: tensor of shape \((L, H_{in})\) for unbatched input, \((L, N, H_{in})\) when batch_first=False or \((N, L, H_{in})\) when batch_first=True containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.
h_0: tensor of shape \((D * \text{num\_layers}, H_{out})\) for unbatched input or \((D * \text{num\_layers}, N, H_{out})\) containing the initial hidden state for each element in the input sequence. Defaults to zeros if (h_0, c_0) is not provided.
c_0: tensor of shape \((D * \text{num\_layers}, H_{cell})\) for unbatched input or \((D * \text{num\_layers}, N, H_{cell})\) containing the initial cell state for each element in the input sequence. Defaults to zeros if (h_0, c_0) is not provided.
where:
\[\begin{split}\begin{aligned} N ={} & \text{batch size} \\ L ={} & \text{sequence length} \\ D ={} & 2 \text{ if bidirectional=True otherwise } 1 \\ H_{in} ={} & \text{input\_size} \\ H_{cell} ={} & \text{hidden\_size} \\ H_{out} ={} & \text{proj\_size if } \text{proj\_size}>0 \text{ otherwise hidden\_size} \\ \end{aligned}\end{split}\]
- Outputs: output, (h_n, c_n)
output: tensor of shape \((L, D * H_{out})\) for unbatched input, \((L, N, D * H_{out})\) when batch_first=False or \((N, L, D * H_{out})\) when batch_first=True containing the output features (h_t) from the last layer of the LSTM, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence. When bidirectional=True, output will contain a concatenation of the forward and reverse hidden states at each time step in the sequence.
h_n: tensor of shape \((D * \text{num\_layers}, H_{out})\) for unbatched input or \((D * \text{num\_layers}, N, H_{out})\) containing the final hidden state for each element in the sequence. When bidirectional=True, h_n will contain a concatenation of the final forward and reverse hidden states, respectively.
c_n: tensor of shape \((D * \text{num\_layers}, H_{cell})\) for unbatched input or \((D * \text{num\_layers}, N, H_{cell})\) containing the final cell state for each element in the sequence. When bidirectional=True, c_n will contain a concatenation of the final forward and reverse cell states, respectively.
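The shape rules for batched inputs and outputs can be collected in one helper, which is handy for checking a model before running it. This plain-Python sketch only applies the symbols N, L, D, H_in, H_cell and H_out defined above. The function name and return layout are illustrative assumptions.

```python
def lstm_io_shapes(L, N, input_size, hidden_size, num_layers=1,
                   bidirectional=False, proj_size=0, batch_first=False):
    """Shapes of input, output, h_n and c_n for a batched call (illustrative)."""
    D = 2 if bidirectional else 1
    H_out = proj_size if proj_size > 0 else hidden_size
    H_cell = hidden_size
    seq_first = not batch_first
    return {
        "input": (L, N, input_size) if seq_first else (N, L, input_size),
        "output": (L, N, D * H_out) if seq_first else (N, L, D * H_out),
        # h_n and c_n ignore batch_first: the layer/direction axis always comes first.
        "h_n": (D * num_layers, N, H_out),
        "c_n": (D * num_layers, N, H_cell),
    }
```

Projections shrink h_n and the output features to proj_size, but c_n keeps hidden_size, which is why H_cell and H_out are listed separately.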
- Attributes:
- weight_ih_l[k]: the learnable input-hidden weights of the \(\text{k}^{th}\) layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size) for k = 0. Otherwise, the shape is (4*hidden_size, num_directions * hidden_size). If proj_size > 0 was specified, the shape will be (4*hidden_size, num_directions * proj_size) for k > 0.
- weight_hh_l[k]: the learnable hidden-hidden weights of the \(\text{k}^{th}\) layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size, hidden_size). If proj_size > 0 was specified, the shape will be (4*hidden_size, proj_size).
- bias_ih_l[k]: the learnable input-hidden bias of the \(\text{k}^{th}\) layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the \(\text{k}^{th}\) layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
- weight_hr_l[k]: the learnable projection weights of the \(\text{k}^{th}\) layer, of shape (proj_size, hidden_size). Only present when proj_size > 0 was specified.
- weight_ih_l[k]_reverse: Analogous to weight_ih_l[k] for the reverse direction. Only present when bidirectional=True.
- weight_hh_l[k]_reverse: Analogous to weight_hh_l[k] for the reverse direction. Only present when bidirectional=True.
- bias_ih_l[k]_reverse: Analogous to bias_ih_l[k] for the reverse direction. Only present when bidirectional=True.
- bias_hh_l[k]_reverse: Analogous to bias_hh_l[k] for the reverse direction. Only present when bidirectional=True.
- weight_hr_l[k]_reverse: Analogous to weight_hr_l[k] for the reverse direction. Only present when bidirectional=True and proj_size > 0 was specified.
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
Note
For bidirectional LSTMs, forward and backward are directions 0 and 1, respectively. Example of splitting the output layers when batch_first=False: output.view(seq_len, batch, num_directions, hidden_size) (use proj_size in place of hidden_size when proj_size > 0).
Note
For bidirectional LSTMs, h_n is not equivalent to the last element of output; the former contains the final forward and reverse hidden states, while the latter contains the final forward hidden state and the initial reverse hidden state.
Note
The batch_first argument is ignored for unbatched inputs.
Note
proj_size should be smaller than hidden_size.
Examples:
>>> rnn = nn.LSTM(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> c0 = torch.randn(2, 3, 20)
>>> output, (hn, cn) = rnn(input, (h0, c0))
- check_forward_args(input: Tensor, hidden: tuple[Tensor, Tensor], batch_sizes: Tensor | None) -> None
- forward(input: Tensor, hx: tuple[Tensor, Tensor] | None = None) -> tuple[Tensor, tuple[Tensor, Tensor]]
- forward(input: PackedSequence, hx: tuple[Tensor, Tensor] | None = None) -> tuple[PackedSequence, tuple[Tensor, Tensor]]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
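The shape rules in the LSTM input and output descriptions above can be condensed into a small helper. This is a hypothetical pure-Python sketch (the name lstm_shapes is illustrative and is not part of torchwrench or PyTorch); it covers batched input only and simply restates the table of N, L, D, H_out and H_cell.

```python
# Hypothetical helper for illustration; not part of torchwrench or PyTorch.
def lstm_shapes(seq_len, batch, hidden_size, num_layers=1,
                bidirectional=False, proj_size=0, batch_first=False):
    """Expected shapes of (output, h_n, c_n) for a batched LSTM call."""
    d = 2 if bidirectional else 1                        # D
    h_out = proj_size if proj_size > 0 else hidden_size  # H_out
    h_cell = hidden_size                                 # H_cell
    if batch_first:
        output = (batch, seq_len, d * h_out)             # (N, L, D * H_out)
    else:
        output = (seq_len, batch, d * h_out)             # (L, N, D * H_out)
    h_n = (d * num_layers, batch, h_out)                 # (D * num_layers, N, H_out)
    c_n = (d * num_layers, batch, h_cell)                # (D * num_layers, N, H_cell)
    return output, h_n, c_n


# Same configuration as the nn.LSTM(10, 20, 2) example: L = 5, N = 3, hidden_size = 20.
print(lstm_shapes(5, 3, 20, num_layers=2))  # ((5, 3, 20), (2, 3, 20), (2, 3, 20))
```

With bidirectional=True, proj_size=8 and batch_first=True, the same call yields output (3, 5, 16), h_n (4, 3, 8) and c_n (4, 3, 20): only the cell state keeps hidden_size as its last dimension.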
- class torchwrench.nn.LSTMCell(input_size: int, hidden_size: int, bias: bool = True, device=None, dtype=None)
Bases: RNNCellBase
A long short-term memory (LSTM) cell.
\[\begin{split}\begin{array}{ll} i = \sigma(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\ f = \sigma(W_{if} x + b_{if} + W_{hf} h + b_{hf}) \\ g = \tanh(W_{ig} x + b_{ig} + W_{hg} h + b_{hg}) \\ o = \sigma(W_{io} x + b_{io} + W_{ho} h + b_{ho}) \\ c' = f \odot c + i \odot g \\ h' = o \odot \tanh(c') \\ \end{array}\end{split}\]
where \(\sigma\) is the sigmoid function, and \(\odot\) is the Hadamard product.
- Args:
input_size: The number of expected features in the input x
hidden_size: The number of features in the hidden state h
bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True
- Inputs: input, (h_0, c_0)
input of shape (batch, input_size) or (input_size): tensor containing input features
h_0 of shape (batch, hidden_size) or (hidden_size): tensor containing the initial hidden state
c_0 of shape (batch, hidden_size) or (hidden_size): tensor containing the initial cell state
If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
- Outputs: (h_1, c_1)
h_1 of shape (batch, hidden_size) or (hidden_size): tensor containing the next hidden state
c_1 of shape (batch, hidden_size) or (hidden_size): tensor containing the next cell state
- Attributes:
- weight_ih: the learnable input-hidden weights, of shape (4*hidden_size, input_size)
- weight_hh: the learnable hidden-hidden weights, of shape (4*hidden_size, hidden_size)
- bias_ih: the learnable input-hidden bias, of shape (4*hidden_size)
- bias_hh: the learnable hidden-hidden bias, of shape (4*hidden_size)
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
On certain ROCm devices, when using float16 inputs this module will use different precision for backward.
Examples:
>>> rnn = nn.LSTMCell(10, 20)  # (input_size, hidden_size)
>>> input = torch.randn(2, 3, 10)  # (time_steps, batch, input_size)
>>> hx = torch.randn(3, 20)  # (batch, hidden_size)
>>> cx = torch.randn(3, 20)
>>> output = []
>>> for i in range(input.size()[0]):
...     hx, cx = rnn(input[i], (hx, cx))
...     output.append(hx)
>>> output = torch.stack(output, dim=0)
- forward(input: Tensor, hx: tuple[Tensor, Tensor] | None = None) -> tuple[Tensor, Tensor]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
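To make the gate equations concrete, here is a minimal pure-Python sketch of a single unbatched LSTMCell update. It assumes the gate rows of the weight matrices are stacked in the order i, f, g, o, matching the (W_ii|W_if|W_ig|W_io) layout described for nn.LSTM's weights; the helper names are hypothetical, and the code is meant for reading the equations, not as a replacement for the module.

```python
import math

# Hypothetical helpers for illustration; not part of torchwrench or PyTorch.


def _matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x, h, c, w_ih, w_hh, b_ih, b_hh):
    """One unbatched LSTM cell update.

    w_ih has 4 * hidden_size rows of length len(x), w_hh has 4 * hidden_size
    rows of length len(h); rows are assumed stacked in gate order i, f, g, o.
    """
    hs = len(h)
    pre = [a + b + p + q for a, b, p, q in
           zip(_matvec(w_ih, x), _matvec(w_hh, h), b_ih, b_hh)]
    i = [_sigmoid(z) for z in pre[0:hs]]            # input gate
    f = [_sigmoid(z) for z in pre[hs:2 * hs]]       # forget gate
    g = [math.tanh(z) for z in pre[2 * hs:3 * hs]]  # candidate cell values
    o = [_sigmoid(z) for z in pre[3 * hs:4 * hs]]   # output gate
    c_next = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # c' = f*c + i*g
    h_next = [ok * math.tanh(ck) for ok, ck in zip(o, c_next)]          # h' = o*tanh(c')
    return h_next, c_next


# hidden_size = input_size = 1 with all-zero weights and biases: every gate
# pre-activation is 0, so i = f = o = 0.5 and g = 0, giving c' = 0.5 * c.
zero_w = [[0.0] for _ in range(4)]
h1, c1 = lstm_cell_step([1.0], [0.0], [2.0], zero_w, zero_w, [0.0] * 4, [0.0] * 4)
print(h1, c1)  # c1 == [1.0], h1 == [0.5 * tanh(1.0)]
```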
- class torchwrench.nn.LayerNorm(normalized_shape: int | list[int] | Size, eps: float = 1e-05, elementwise_affine: bool = True, bias: bool = True, device=None, dtype=None)
Bases: Module
Applies Layer Normalization over a mini-batch of inputs.
This layer implements the operation as described in the paper Layer Normalization
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard deviation are calculated over the last D dimensions, where D is the dimension of normalized_shape. For example, if normalized_shape is (3, 5) (a 2-dimensional shape), the mean and standard deviation are computed over the last 2 dimensions of the input (i.e. input.mean((-2, -1))). \(\gamma\) and \(\beta\) are learnable affine transform parameters of shape normalized_shape if elementwise_affine is True. The variance is calculated via the biased estimator, equivalent to torch.var(input, correction=0).
Note
Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias to each entire channel/plane with the affine option, Layer Normalization applies a per-element scale and bias with elementwise_affine.
This layer uses statistics computed from input data in both training and evaluation modes.
- Args:
- normalized_shape (int or list or torch.Size): input shape from an expected input of size
\[[* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]]\]
If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.
- eps: a value added to the denominator for numerical stability. Default: 1e-5
- elementwise_affine: a boolean value that, when set to True, gives this module learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default: True.
- bias: If set to False, the layer will not learn an additive bias (only relevant if elementwise_affine is True). Default: True.
- Attributes:
- weight: the learnable weights of the module of shape \(\text{normalized\_shape}\) when elementwise_affine is set to True. The values are initialized to 1.
- bias: the learnable bias of the module of shape \(\text{normalized\_shape}\) when elementwise_affine is set to True. The values are initialized to 0.
- Shape:
Input: \((N, *)\)
Output: \((N, *)\) (same shape as input)
Examples:
>>> # NLP Example
>>> batch, sentence_length, embedding_dim = 20, 5, 10
>>> embedding = torch.randn(batch, sentence_length, embedding_dim)
>>> layer_norm = nn.LayerNorm(embedding_dim)
>>> # Activate module
>>> layer_norm(embedding)
>>>
>>> # Image Example
>>> N, C, H, W = 20, 5, 10, 10
>>> input = torch.randn(N, C, H, W)
>>> # Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
>>> layer_norm = nn.LayerNorm([C, H, W])
>>> output = layer_norm(input)
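The normalization formula above, restricted to a normalized_shape given as a single integer (so statistics are taken over the last dimension only), can be written out in plain Python. This is a minimal sketch under that assumption, and the function name is hypothetical; note the division by n rather than n - 1, which is the biased variance estimator mentioned above.

```python
import math


def layer_norm_last_dim(rows, gamma=None, beta=None, eps=1e-5):
    """Hypothetical helper: normalize each row over its last (only) dimension."""
    out = []
    for row in rows:
        n = len(row)
        mean = sum(row) / n
        var = sum((v - mean) ** 2 for v in row) / n  # biased: divide by n, not n - 1
        inv_std = 1.0 / math.sqrt(var + eps)
        y = [(v - mean) * inv_std for v in row]
        if gamma is not None:  # per-element scale, playing the role of weight
            y = [yi * gi for yi, gi in zip(y, gamma)]
        if beta is not None:   # per-element shift, playing the role of bias
            y = [yi + bi for yi, bi in zip(y, beta)]
        out.append(y)
    return out


y = layer_norm_last_dim([[1.0, 2.0, 3.0]])
print(y)  # roughly [[-1.2247, 0.0, 1.2247]]
```

Each output row has zero mean and (up to the eps term) unit variance before gamma and beta are applied.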
- extra_repr() -> str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(input: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.LazyBatchNorm1d(eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)
Bases: _LazyNormBase, _BatchNorm
A torch.nn.BatchNorm1d module with lazy initialization.
Lazy initialization is based on the num_features argument of BatchNorm1d, which is inferred from input.size(1). The attributes that will be lazily initialized are weight, bias, running_mean and running_var.
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- eps: a value added to the denominator for numerical stability. Default: 1e-5
- momentum: the value used for the running_mean and running_var computation. Can be set to None for a cumulative moving average (i.e. simple average). Default: 0.1
- affine: a boolean value that, when set to True, gives this module learnable affine parameters. Default: True
- track_running_stats: a boolean value that, when set to True, makes this module track the running mean and variance; when set to False, this module does not track such statistics and initializes the statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- cls_to_become: alias of BatchNorm1d
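The momentum argument controls how running_mean and running_var are updated in training mode. The sketch below tracks a single feature in plain Python and follows the update rule given in the torch.nn.BatchNorm1d documentation, running = (1 - momentum) * running + momentum * batch_statistic, where the running variance is fed the unbiased batch variance. With momentum=None the factor becomes 1 / num_batches_tracked, which produces a cumulative (simple) average. The function name is hypothetical and this is an illustration, not the library's code.

```python
# Hypothetical helper for illustration; not part of torchwrench or PyTorch.
def update_running_stats(running_mean, running_var, batch, momentum=0.1,
                         num_batches_tracked=0):
    """One training-mode update of the running statistics of a single feature."""
    n = len(batch)
    mean = sum(batch) / n
    unbiased_var = sum((v - mean) ** 2 for v in batch) / (n - 1)  # needs n >= 2
    num_batches_tracked += 1
    # momentum=None switches to a cumulative moving average (simple average).
    factor = 1.0 / num_batches_tracked if momentum is None else momentum
    running_mean = (1.0 - factor) * running_mean + factor * mean
    running_var = (1.0 - factor) * running_var + factor * unbiased_var
    return running_mean, running_var, num_batches_tracked


# Buffers start at running_mean = 0 and running_var = 1.
mean, var, count = 0.0, 1.0, 0
for batch in ([1.0, 3.0], [5.0, 7.0]):
    mean, var, count = update_running_stats(mean, var, batch, momentum=None,
                                            num_batches_tracked=count)
print(mean, var, count)  # 4.0 2.0 2
```

With momentum=None the result is the plain average of the two batch means (2 and 6) and of the two unbiased batch variances (both 2).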
- class torchwrench.nn.LazyBatchNorm2d(eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)
Bases: _LazyNormBase, _BatchNorm
A torch.nn.BatchNorm2d module with lazy initialization.
Lazy initialization is done for the num_features argument of BatchNorm2d, which is inferred from input.size(1). The attributes that will be lazily initialized are weight, bias, running_mean and running_var.
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- eps: a value added to the denominator for numerical stability. Default: 1e-5
- momentum: the value used for the running_mean and running_var computation. Can be set to None for a cumulative moving average (i.e. simple average). Default: 0.1
- affine: a boolean value that, when set to True, gives this module learnable affine parameters. Default: True
- track_running_stats: a boolean value that, when set to True, makes this module track the running mean and variance; when set to False, this module does not track such statistics and initializes the statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- cls_to_become: alias of BatchNorm2d
- class torchwrench.nn.LazyBatchNorm3d(eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)
Bases: _LazyNormBase, _BatchNorm
A torch.nn.BatchNorm3d module with lazy initialization.
Lazy initialization is done for the num_features argument of BatchNorm3d, which is inferred from input.size(1). The attributes that will be lazily initialized are weight, bias, running_mean and running_var.
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- eps: a value added to the denominator for numerical stability. Default: 1e-5
- momentum: the value used for the running_mean and running_var computation. Can be set to None for a cumulative moving average (i.e. simple average). Default: 0.1
- affine: a boolean value that, when set to True, gives this module learnable affine parameters. Default: True
- track_running_stats: a boolean value that, when set to True, makes this module track the running mean and variance; when set to False, this module does not track such statistics and initializes the statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- cls_to_become: alias of BatchNorm3d
- class torchwrench.nn.LazyConv1d(out_channels: int, kernel_size: int | tuple[int], stride: int | tuple[int] = 1, padding: int | tuple[int] = 0, dilation: int | tuple[int] = 1, groups: int = 1, bias: bool = True, padding_mode: 'zeros' | 'reflect' | 'replicate' | 'circular' = 'zeros', device=None, dtype=None)
Bases: _LazyConvXdMixin, Conv1d
A torch.nn.Conv1d module with lazy initialization of the in_channels argument.
The in_channels argument of Conv1d is inferred from input.size(1). The attributes that will be lazily initialized are weight and bias.
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- out_channels (int): Number of channels produced by the convolution
- kernel_size (int or tuple): Size of the convolving kernel
- stride (int or tuple, optional): Stride of the convolution. Default: 1
- padding (int or tuple, optional): Padding added to both sides of the input (zeros by default; see padding_mode). Default: 0
- dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
- groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
- bias (bool, optional): If True, adds a learnable bias to the output. Default: True
- padding_mode (str, optional): 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
See also: torch.nn.Conv1d and torch.nn.modules.lazy.LazyModuleMixin
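Once in_channels has been inferred, LazyConv1d behaves like Conv1d, so its output length follows the shape rule documented for torch.nn.Conv1d: L_out = floor((L_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1). A small plain-Python sketch of that rule; the helper name is hypothetical, and only integer, symmetric padding is handled.

```python
# Hypothetical helper for illustration; not part of torchwrench or PyTorch.
def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution with integer padding on both sides."""
    # Python's // floors, matching the floor(...) in the Conv1d shape formula.
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


for kwargs in ({}, {"stride": 2}, {"padding": 1}, {"dilation": 2}):
    print(kwargs, conv1d_out_len(50, 3, **kwargs))  # 48, 24, 50, 46
```

The channel count does not enter this formula; the lazy module only needs input.size(1) to size its weight.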
- class torchwrench.nn.LazyConv2d(out_channels: int, kernel_size: int | tuple[int, int], stride: int | tuple[int, int] = 1, padding: int | tuple[int, int] = 0, dilation: int | tuple[int, int] = 1, groups: int = 1, bias: bool = True, padding_mode: 'zeros' | 'reflect' | 'replicate' | 'circular' = 'zeros', device=None, dtype=None)
Bases: _LazyConvXdMixin, Conv2d
A torch.nn.Conv2d module with lazy initialization of the in_channels argument.
The in_channels argument of Conv2d is inferred from input.size(1). The attributes that will be lazily initialized are weight and bias.
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- out_channels (int): Number of channels produced by the convolution
- kernel_size (int or tuple): Size of the convolving kernel
- stride (int or tuple, optional): Stride of the convolution. Default: 1
- padding (int or tuple, optional): Padding added to both sides of the input (zeros by default; see padding_mode). Default: 0
- dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
- groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
- bias (bool, optional): If True, adds a learnable bias to the output. Default: True
- padding_mode (str, optional): 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
See also: torch.nn.Conv2d and torch.nn.modules.lazy.LazyModuleMixin
- class torchwrench.nn.LazyConv3d(out_channels: int, kernel_size: int | tuple[int, int, int], stride: int | tuple[int, int, int] = 1, padding: int | tuple[int, int, int] = 0, dilation: int | tuple[int, int, int] = 1, groups: int = 1, bias: bool = True, padding_mode: 'zeros' | 'reflect' | 'replicate' | 'circular' = 'zeros', device=None, dtype=None)
Bases: _LazyConvXdMixin, Conv3d
A torch.nn.Conv3d module with lazy initialization of the in_channels argument.
The in_channels argument of Conv3d is inferred from input.size(1). The attributes that will be lazily initialized are weight and bias.
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- out_channels (int): Number of channels produced by the convolution
- kernel_size (int or tuple): Size of the convolving kernel
- stride (int or tuple, optional): Stride of the convolution. Default: 1
- padding (int or tuple, optional): Padding added to both sides of the input (zeros by default; see padding_mode). Default: 0
- dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
- groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
- bias (bool, optional): If True, adds a learnable bias to the output. Default: True
- padding_mode (str, optional): 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
See also: torch.nn.Conv3d and torch.nn.modules.lazy.LazyModuleMixin
- class torchwrench.nn.LazyConvTranspose1d(out_channels: int, kernel_size: int | tuple[int], stride: int | tuple[int] = 1, padding: int | tuple[int] = 0, output_padding: int | tuple[int] = 0, groups: int = 1, bias: bool = True, dilation: int | tuple[int] = 1, padding_mode: 'zeros' | 'reflect' | 'replicate' | 'circular' = 'zeros', device=None, dtype=None)
Bases: _LazyConvXdMixin, ConvTranspose1d
A torch.nn.ConvTranspose1d module with lazy initialization of the in_channels argument.
The in_channels argument of ConvTranspose1d is inferred from input.size(1). The attributes that will be lazily initialized are weight and bias.
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- out_channels (int): Number of channels produced by the convolution
- kernel_size (int or tuple): Size of the convolving kernel
- stride (int or tuple, optional): Stride of the convolution. Default: 1
- padding (int or tuple, optional): dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of the input. Default: 0
- output_padding (int or tuple, optional): Additional size added to one side of the output shape. Default: 0
- groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
- bias (bool, optional): If True, adds a learnable bias to the output. Default: True
- dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
- cls_to_become: alias of ConvTranspose1d
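For the transposed case, the torch.nn.ConvTranspose1d documentation gives L_out = (L_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1. The plain-Python sketch below (hypothetical helper name) shows why output_padding exists: a stride-2 Conv1d with kernel_size=3 maps inputs of length 49 and of length 50 to the same length 24, and output_padding selects which of those two lengths the transposed layer produces.

```python
# Hypothetical helper for illustration; not part of torchwrench or PyTorch.
def conv_transpose1d_out_len(l_in, kernel_size, stride=1, padding=0,
                             output_padding=0, dilation=1):
    """Output length of a 1D transposed convolution."""
    return ((l_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


print(conv_transpose1d_out_len(24, 3, stride=2))                    # 49
print(conv_transpose1d_out_len(24, 3, stride=2, output_padding=1))  # 50
```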
- class torchwrench.nn.LazyConvTranspose2d(out_channels: int, kernel_size: int | tuple[int, int], stride: int | tuple[int, int] = 1, padding: int | tuple[int, int] = 0, output_padding: int | tuple[int, int] = 0, groups: int = 1, bias: bool = True, dilation: int = 1, padding_mode: 'zeros' | 'reflect' | 'replicate' | 'circular' = 'zeros', device=None, dtype=None)
Bases: _LazyConvXdMixin, ConvTranspose2d
A torch.nn.ConvTranspose2d module with lazy initialization of the in_channels argument.
The in_channels argument of ConvTranspose2d is inferred from input.size(1). The attributes that will be lazily initialized are weight and bias.
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- out_channels (int): Number of channels produced by the convolution
- kernel_size (int or tuple): Size of the convolving kernel
- stride (int or tuple, optional): Stride of the convolution. Default: 1
- padding (int or tuple, optional): dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
- output_padding (int or tuple, optional): Additional size added to one side of each dimension in the output shape. Default: 0
- groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
- bias (bool, optional): If True, adds a learnable bias to the output. Default: True
- dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
- cls_to_become: alias of ConvTranspose2d
- class torchwrench.nn.LazyConvTranspose3d(out_channels: int, kernel_size: int | tuple[int, int, int], stride: int | tuple[int, int, int] = 1, padding: int | tuple[int, int, int] = 0, output_padding: int | tuple[int, int, int] = 0, groups: int = 1, bias: bool = True, dilation: int | tuple[int, int, int] = 1, padding_mode: 'zeros' | 'reflect' | 'replicate' | 'circular' = 'zeros', device=None, dtype=None)
Bases: _LazyConvXdMixin, ConvTranspose3d
A torch.nn.ConvTranspose3d module with lazy initialization of the in_channels argument.
The in_channels argument of ConvTranspose3d is inferred from input.size(1). The attributes that will be lazily initialized are weight and bias.
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- out_channels (int): Number of channels produced by the convolution
- kernel_size (int or tuple): Size of the convolving kernel
- stride (int or tuple, optional): Stride of the convolution. Default: 1
- padding (int or tuple, optional): dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
- output_padding (int or tuple, optional): Additional size added to one side of each dimension in the output shape. Default: 0
- groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
- bias (bool, optional): If True, adds a learnable bias to the output. Default: True
- dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
- cls_to_become: alias of ConvTranspose3d
- class torchwrench.nn.LazyInstanceNorm1d(eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)
Bases: _LazyNormBase, _InstanceNorm
A torch.nn.InstanceNorm1d module with lazy initialization of the num_features argument.
The num_features argument of InstanceNorm1d is inferred from input.size(1). The attributes that will be lazily initialized are weight, bias, running_mean and running_var.
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- num_features: \(C\) from an expected input of size \((N, C, L)\) or \((C, L)\); for this lazy module it is inferred from input.size(1) rather than passed to the constructor
- eps: a value added to the denominator for numerical stability. Default: 1e-5
- momentum: the value used for the running_mean and running_var computation. Default: 0.1
- affine: a boolean value that, when set to True, gives this module learnable affine parameters, initialized the same way as done for batch normalization. Default: True (per the signature above; the non-lazy InstanceNorm1d defaults to False)
- track_running_stats: a boolean value that, when set to True, makes this module track the running mean and variance; when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True (per the signature above; the non-lazy InstanceNorm1d defaults to False)
- Shape:
Input: \((N, C, L)\) or \((C, L)\)
Output: \((N, C, L)\) or \((C, L)\) (same shape as input)
- cls_to_become: alias of InstanceNorm1d
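The difference between InstanceNorm1d and BatchNorm1d lies in which elements share a mean and variance. A minimal plain-Python sketch on an input of shape (N, C, L) stored as nested lists (the function names are hypothetical): instance normalization computes one mean per (sample, channel) pair over L, while batch normalization pools each channel over both N and L.

```python
# Hypothetical helpers for illustration; not part of torchwrench or PyTorch.
def instance_norm_means(x):
    """One mean per (sample, channel) pair, computed over the L axis only."""
    return [[sum(channel) / len(channel) for channel in sample] for sample in x]


def batch_norm_means(x):
    """One mean per channel, pooled over every sample and every position."""
    num_channels = len(x[0])
    means = []
    for c in range(num_channels):
        values = [v for sample in x for v in sample[c]]
        means.append(sum(values) / len(values))
    return means


# N = 2 samples, C = 1 channel, L = 2 positions.
x = [[[1.0, 3.0]], [[5.0, 7.0]]]
print(instance_norm_means(x))  # [[2.0], [6.0]]
print(batch_norm_means(x))     # [4.0]
```

Variances are grouped the same way, so with instance normalization no statistic ever mixes values from different samples.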
- class torchwrench.nn.LazyInstanceNorm2d(eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)
Bases: _LazyNormBase, _InstanceNorm
A torch.nn.InstanceNorm2d module with lazy initialization of the num_features argument.
The num_features argument of InstanceNorm2d is inferred from input.size(1). The attributes that will be lazily initialized are weight, bias, running_mean and running_var.
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- num_features: \(C\) from an expected input of size \((N, C, H, W)\) or \((C, H, W)\); for this lazy module it is inferred from input.size(1) rather than passed to the constructor
- eps: a value added to the denominator for numerical stability. Default: 1e-5
- momentum: the value used for the running_mean and running_var computation. Default: 0.1
- affine: a boolean value that, when set to True, gives this module learnable affine parameters, initialized the same way as done for batch normalization. Default: True (per the signature above; the non-lazy InstanceNorm2d defaults to False)
- track_running_stats: a boolean value that, when set to True, makes this module track the running mean and variance; when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True (per the signature above; the non-lazy InstanceNorm2d defaults to False)
- Shape:
Input: \((N, C, H, W)\) or \((C, H, W)\)
Output: \((N, C, H, W)\) or \((C, H, W)\) (same shape as input)
- cls_to_become: alias of InstanceNorm2d
- class torchwrench.nn.LazyInstanceNorm3d(eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)
Bases: _LazyNormBase, _InstanceNorm
A torch.nn.InstanceNorm3d module with lazy initialization of the num_features argument.
The num_features argument of InstanceNorm3d is inferred from input.size(1). The attributes that will be lazily initialized are weight, bias, running_mean and running_var.
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- num_features: \(C\) from an expected input of size \((N, C, D, H, W)\) or \((C, D, H, W)\); for this lazy module it is inferred from input.size(1) rather than passed to the constructor
- eps: a value added to the denominator for numerical stability. Default: 1e-5
- momentum: the value used for the running_mean and running_var computation. Default: 0.1
- affine: a boolean value that, when set to True, gives this module learnable affine parameters, initialized the same way as done for batch normalization. Default: True (per the signature above; the non-lazy InstanceNorm3d defaults to False)
- track_running_stats: a boolean value that, when set to True, makes this module track the running mean and variance; when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True (per the signature above; the non-lazy InstanceNorm3d defaults to False)
- Shape:
Input: \((N, C, D, H, W)\) or \((C, D, H, W)\)
Output: \((N, C, D, H, W)\) or \((C, D, H, W)\) (same shape as input)
- cls_to_become: alias of InstanceNorm3d
- class torchwrench.nn.LazyLinear(out_features: int, bias: bool = True, device=None, dtype=None)
Bases: LazyModuleMixin, Linear
A torch.nn.Linear module where in_features is inferred.
In this module, the weight and bias are of class torch.nn.UninitializedParameter. They will be initialized after the first call to forward, at which point the module becomes a regular torch.nn.Linear module. The in_features argument of Linear is inferred from input.shape[-1].
Check torch.nn.modules.lazy.LazyModuleMixin for further documentation on lazy modules and their limitations.
- Args:
- out_features: size of each output sample
- bias: If set to False, the layer will not learn an additive bias. Default: True
- Attributes:
- weight: the learnable weights of the module of shape \((\text{out\_features}, \text{in\_features})\). The values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in\_features}}\)
- bias: the learnable bias of the module of shape \((\text{out\_features})\). If bias is True, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{in\_features}}\)
- bias: UninitializedParameter
- initialize_parameters(input) -> None
Infers in_features based on input and initializes parameters.
- weight: UninitializedParameter
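The lazy-initialization idea can be illustrated without PyTorch. The class below is a hypothetical, pure-Python toy and not torchwrench's or PyTorch's implementation: it leaves its weights unset until the first call, infers in_features from the length of the input vector (the analogue of input.shape[-1]), and then draws weights and biases from U(-sqrt(k), sqrt(k)) with k = 1 / in_features, as described in the Attributes above.

```python
import math
import random


class ToyLazyLinear:
    """Hypothetical pure-Python toy showing lazy in_features inference."""

    def __init__(self, out_features, bias=True, seed=0):
        self.out_features = out_features
        self.use_bias = bias
        self.in_features = None  # unknown until the first call
        self.weight = None
        self.bias = None
        self._rng = random.Random(seed)

    def _initialize(self, in_features):
        self.in_features = in_features
        bound = math.sqrt(1.0 / in_features)  # sqrt(k) with k = 1 / in_features
        self.weight = [[self._rng.uniform(-bound, bound) for _ in range(in_features)]
                       for _ in range(self.out_features)]
        if self.use_bias:
            self.bias = [self._rng.uniform(-bound, bound)
                         for _ in range(self.out_features)]

    def __call__(self, x):
        if self.weight is None:
            self._initialize(len(x))  # plays the role of input.shape[-1]
        elif len(x) != self.in_features:
            raise ValueError(f"expected {self.in_features} features, got {len(x)}")
        y = [sum(w * v for w, v in zip(row, x)) for row in self.weight]
        if self.bias is not None:
            y = [yi + bi for yi, bi in zip(y, self.bias)]
        return y


layer = ToyLazyLinear(out_features=3)
print(layer.in_features)          # None before the first call
y = layer([1.0, 2.0, 3.0, 4.0])
print(layer.in_features, len(y))  # 4 3
```

After the first call the toy behaves like an ordinary fixed-size layer and rejects inputs of a different length, mirroring how a lazy module turns into its regular counterpart.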
- class torchwrench.nn.LeakyReLU(negative_slope: float = 0.01, inplace: bool = False)
Bases: Module
Applies the LeakyReLU function element-wise.
\[\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)\]
or
\[\begin{split}\text{LeakyReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \text{negative\_slope} \times x, & \text{ otherwise } \end{cases}\end{split}\]
- Args:
- negative_slope: Controls the angle of the negative slope (which is used for negative input values). Default: 1e-2
- inplace: can optionally do the operation in-place. Default: False
- Shape:
Input: \((*)\) where * means any number of additional dimensions
Output: \((*)\), same shape as the input
Examples:
>>> m = nn.LeakyReLU(0.1)
>>> input = torch.randn(2)
>>> output = m(input)
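The two formulas given for LeakyReLU agree for every input. A quick plain-Python check of that equivalence on scalars (the helper names are hypothetical; the actual module applies the function element-wise to tensors):

```python
# Hypothetical helpers for illustration; not part of torchwrench or PyTorch.
def leaky_relu_max_min(x, negative_slope=0.01):
    # max(0, x) + negative_slope * min(0, x)
    return max(0.0, x) + negative_slope * min(0.0, x)


def leaky_relu_piecewise(x, negative_slope=0.01):
    # x if x >= 0, otherwise negative_slope * x
    return x if x >= 0 else negative_slope * x


for value in (-3.0, -0.5, 0.0, 2.0):
    print(value, leaky_relu_max_min(value, 0.1), leaky_relu_piecewise(value, 0.1))
```

For x >= 0 the min term vanishes, and for x < 0 the max term vanishes, which is why the two forms coincide.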
- class torchwrench.nn.Linear(in_features: int, out_features: int, bias: bool = True, device=None, dtype=None)
Bases: Module
Applies an affine linear transformation to the incoming data: \(y = xA^T + b\).
This module supports TensorFloat32.
On certain ROCm devices, when using float16 inputs this module will use different precision for backward.
- Args:
- in_features: size of each input sample
- out_features: size of each output sample
- bias: If set to False, the layer will not learn an additive bias. Default: True
- Shape:
Input: \((*, H_\text{in})\) where \(*\) means any number of dimensions including none and \(H_\text{in} = \text{in\_features}\).
Output: \((*, H_\text{out})\) where all but the last dimension are the same shape as the input and \(H_\text{out} = \text{out\_features}\).
- Attributes:
- weight: the learnable weights of the module of shape
\((\text{out\_features}, \text{in\_features})\). The values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in\_features}}\)
- bias: the learnable bias of the module of shape \((\text{out\_features})\). If bias is True, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{in\_features}}\)
Examples:
>>> m = nn.Linear(20, 30)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 30])
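Written out in plain Python, y = xA^T + b means that output feature j is the dot product of the input row with row j of the weight matrix A (of shape (out_features, in_features)), plus b_j. A minimal sketch for a batch of rows, with nested lists standing in for tensors (the helper name is hypothetical):

```python
# Hypothetical helper for illustration; not part of torchwrench or PyTorch.
def linear(x, weight, bias=None):
    """x: rows of length in_features; weight: out_features rows of length in_features."""
    out = []
    for row in x:
        # Multiplying by A^T amounts to taking dot products with the rows of A.
        y = [sum(a * v for a, v in zip(w_row, row)) for w_row in weight]
        if bias is not None:
            y = [yj + bj for yj, bj in zip(y, bias)]
        out.append(y)
    return out


weight = [[1, 0], [0, 1], [1, 1]]  # out_features = 3, in_features = 2
bias = [0, 0, 10]
print(linear([[2, 3], [1, -1]], weight, bias))  # [[2, 3, 15], [1, -1, 10]]
```

Storing the weight as (out_features, in_features) is why the formula uses A^T: each output feature reads one contiguous row of A.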
- class torchwrench.nn.LocalResponseNorm(size: int, alpha: float = 0.0001, beta: float = 0.75, k: float = 1.0)
Bases: Module
Applies local response normalization over an input signal.
The input signal is composed of several input planes, where channels occupy the second dimension. Normalization is applied across channels.
\[b_{c} = a_{c}\left(k + \frac{\alpha}{n} \sum_{c'=\max(0, c-n/2)}^{\min(N-1,c+n/2)}a_{c'}^2\right)^{-\beta}\]
where \(n\) is size and \(N\) here denotes the number of channels (\(C\) in the Shape section below).
- Args:
- size: number of neighbouring channels used for normalization
- alpha: multiplicative factor. Default: 0.0001
- beta: exponent. Default: 0.75
- k: additive factor. Default: 1
- Shape:
Input: \((N, C, *)\)
Output: \((N, C, *)\) (same shape as input)
Examples:
>>> lrn = nn.LocalResponseNorm(2)
>>> signal_2d = torch.randn(32, 5, 24, 24)
>>> signal_4d = torch.randn(16, 5, 7, 7, 7, 7)
>>> output_2d = lrn(signal_2d)
>>> output_4d = lrn(signal_4d)
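The LRN formula above can be transcribed directly for a single sample whose channels are given as lists of spatial values. This is a hypothetical plain-Python sketch that reads n/2 in the summation bounds as integer division (size // 2); treat it as an illustration of the formula rather than a bit-exact copy of the library, whose window handling may differ, for example for an even size.

```python
# Hypothetical helper for illustration; not part of torchwrench or PyTorch.
def local_response_norm(a, size, alpha=1e-4, beta=0.75, k=1.0):
    """a: list of C channels for one sample, each a list of spatial values."""
    num_channels = len(a)
    half = size // 2  # n/2 in the summation bounds, read as integer division
    out = []
    for c in range(num_channels):
        lo, hi = max(0, c - half), min(num_channels - 1, c + half)
        row = []
        for p, value in enumerate(a[c]):
            sq_sum = sum(a[j][p] ** 2 for j in range(lo, hi + 1))
            row.append(value * (k + alpha / size * sq_sum) ** (-beta))
        out.append(row)
    return out


# Three channels with one spatial position each; alpha / size = 1 keeps the numbers simple.
out = local_response_norm([[1.0], [2.0], [3.0]], size=3, alpha=3.0, beta=1.0)
print(out)  # equals [[1/6], [2/15], [3/14]]
```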
- class torchwrench.nn.Log(*args: Any, **kwargs: Any)
Bases: Module
Module version of log().
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Log10(*args: Any, **kwargs: Any)
Bases: Module
Module version of log10().
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Log2(*args: Any, **kwargs: Any)
Bases: Module
Module version of log2().
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.LogSigmoid(*args: Any, **kwargs: Any)
Bases: Module
Applies the LogSigmoid function element-wise.
\[\text{LogSigmoid}(x) = \log\left(\frac{ 1 }{ 1 + \exp(-x)}\right)\]
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.LogSigmoid() >>> input = torch.randn(2) >>> output = m(input)
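Evaluating log(1 / (1 + exp(-x))) literally overflows for large negative x. The plain-Python sketch below uses the standard rewriting log_sigmoid(x) = min(x, 0) - log1p(exp(-|x|)), which is algebraically equal to the formula above; it is an illustration, not necessarily how torch computes LogSigmoid internally, and the function name is hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    # exp(-|x|) always lies in (0, 1], so it can never overflow.
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


print(log_sigmoid(0.0))      # log(0.5)
print(log_sigmoid(-1000.0))  # -1000.0, whereas exp(1000) would overflow
```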
- class torchwrench.nn.LogSoftmax(dim: int | None = None)[source]¶
Bases: Module
Applies the \(\log(\text{Softmax}(x))\) function to an n-dimensional input Tensor.
The LogSoftmax function can be written as:
\[\text{LogSoftmax}(x_{i}) = \log\left(\frac{\exp(x_i) }{ \sum_j \exp(x_j)} \right)\]
- Shape:
Input: \((*)\), where \(*\) means any number of additional dimensions.
Output: \((*)\), same shape as the input.
- Args:
dim (int): A dimension along which LogSoftmax will be computed.
- Returns:
a Tensor of the same dimension and shape as the input with values in the range [-inf, 0]
Examples:
>>> m = nn.LogSoftmax(dim=1)
>>> input = torch.randn(2, 3)
>>> output = m(input)
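The definition is usually evaluated with the log-sum-exp trick: subtract the maximum before exponentiating so that exp() cannot overflow. The plain-Python sketch below handles a single 1-D slice only (the module's dim handling is left out); the function name is hypothetical.

```python
import math


def log_softmax(xs):
    """log(softmax(xs)) for a list of floats, via the log-sum-exp trick."""
    m = max(xs)  # shift by the max so every exp() argument is <= 0
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum_exp for x in xs]


out = log_softmax([1.0, 2.0, 3.0])
print(out)
print(sum(math.exp(v) for v in out))  # close to 1.0
print(log_softmax([1000.0, 1000.0]))  # both entries close to log(0.5), no overflow
```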
- class torchwrench.nn.LogSoftmaxMultidim(dims: Iterable[int] | None = (-1,))[source]¶
Bases: Module
For more information, see softmax_multidim().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(input: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.MSELoss(size_average=None, reduce=None, reduction: str = 'mean')[source]¶
Bases: _Loss
Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input \(x\) and target \(y\).
The unreduced (i.e. with reduction set to 'none') loss can be described as:
\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = \left( x_n - y_n \right)^2,\]
where \(N\) is the total number of elements. If reduction is not 'none' (default 'mean'), then:
\[\begin{split}\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
\(x\) and \(y\) are tensors of arbitrary shapes with a total of \(N\) elements each.
The mean operation still operates over all the elements, and divides by \(N\).
The division by \(N\) can be avoided if one sets reduction = 'sum'.
- Args:
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Target: \((*)\), same shape as the input.
Examples:
>>> loss = nn.MSELoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()
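The three reduction modes can be checked on flat lists with the plain-Python sketch below. It is an illustration of the formulas, not the torch implementation: the function name is hypothetical, and the deprecated size_average/reduce arguments are left out.

```python
def mse_loss(xs, ys, reduction="mean"):
    """Mean squared error over flat sequences, mirroring the three reduction modes."""
    if len(xs) != len(ys):
        raise ValueError("input and target must have the same number of elements")
    losses = [(x - y) ** 2 for x, y in zip(xs, ys)]  # l_n = (x_n - y_n)^2
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)  # divides by N, the total element count
    raise ValueError(f"unknown reduction: {reduction!r}")


x = [1.0, 2.0, 3.0]
y = [1.0, 0.0, 0.0]
print(mse_loss(x, y, "none"))  # [0.0, 4.0, 9.0]
print(mse_loss(x, y, "sum"))   # 13.0
print(mse_loss(x, y))          # 13 / 3
```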
- class torchwrench.nn.MarginRankingLoss(margin: float = 0.0, size_average=None, reduce=None, reduction: str = 'mean')[source]¶
Bases: _Loss
Creates a criterion that measures the loss given inputs \(x1\), \(x2\), two 1D mini-batch or 0D Tensors, and a label 1D mini-batch or 0D Tensor \(y\) (containing 1 or -1).
If \(y = 1\), then it is assumed that the first input should be ranked higher (have a larger value) than the second input, and vice versa for \(y = -1\).
The loss function for each pair of samples in the mini-batch is:
\[\text{loss}(x1, x2, y) = \max(0, -y * (x1 - x2) + \text{margin})\]
- Args:
- margin (float, optional): Has a default value of \(0\).
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input1: \((N)\) or \(()\), where \(N\) is the batch size.
Input2: \((N)\) or \(()\), same shape as Input1.
Target: \((N)\) or \(()\), same shape as the inputs.
Output: scalar. If reduction is 'none' and the input size is not \(()\), then \((N)\).
Examples:
>>> loss = nn.MarginRankingLoss()
>>> input1 = torch.randn(3, requires_grad=True)
>>> input2 = torch.randn(3, requires_grad=True)
>>> target = torch.randn(3).sign()
>>> output = loss(input1, input2, target)
>>> output.backward()
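The per-pair formula and the reduction step are easy to trace on small lists. The plain-Python sketch below is an illustration only; the function name is hypothetical and it omits the deprecated arguments.

```python
def margin_ranking_loss(x1, x2, y, margin=0.0, reduction="mean"):
    """max(0, -y * (x1 - x2) + margin) for each pair, then reduced."""
    losses = [max(0.0, -t * (a - b) + margin) for a, b, t in zip(x1, x2, y)]
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")


# With y = 1 the first input should rank higher. The first pair (3 vs 1)
# beats the margin and costs nothing; the second pair (1 vs 2) is mis-ranked.
print(margin_ranking_loss([3.0, 1.0], [1.0, 2.0], [1, 1], margin=0.5, reduction="none"))  # [0.0, 1.5]
```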
- class torchwrench.nn.MaskedMean(dim: None | int | Iterable[int] = None)[source]¶
Bases: Module
For more information, see masked_mean().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(tensor: Tensor, non_pad_mask: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.MaskedSum(dim: None | int | Iterable[int] = None)[source]¶
Bases: Module
For more information, see masked_sum().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(tensor: Tensor, non_pad_mask: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
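The exact behaviour of MaskedMean and MaskedSum is defined by masked_mean() and masked_sum(), which are not reproduced in this section. Assuming they aggregate only the entries where non_pad_mask is True (an assumption based on the names, so check the function docs), a 1-D plain-Python sketch of the two reductions could look like this. Both functions are hypothetical stand-ins, and raising on an all-False mask is a choice of this sketch, not documented behaviour.

```python
def masked_sum(values, non_pad_mask):
    """Sum of the entries whose mask value is True (padding is ignored)."""
    return sum(v for v, keep in zip(values, non_pad_mask) if keep)


def masked_mean(values, non_pad_mask):
    """Mean over non-padded entries; the divisor is the number of True mask values."""
    count = sum(1 for keep in non_pad_mask if keep)
    if count == 0:
        raise ValueError("mask selects no elements")
    return masked_sum(values, non_pad_mask) / count


seq = [2.0, 4.0, 0.0, 0.0]  # the last two positions are padding
mask = [True, True, False, False]
print(masked_sum(seq, mask))   # 6.0
print(masked_mean(seq, mask))  # 3.0, not 1.5: padding does not dilute the mean
```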
- class torchwrench.nn.Max(dim: int | None = None, keepdim: bool = False, *, return_values: bool = True, return_indices: bool | None = None)[source]¶
Bases: Module
Module version of max().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor | max[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.MaxPool1d(kernel_size: int | tuple[int, ...], stride: int | tuple[int, ...] | None = None, padding: int | tuple[int, ...] = 0, dilation: int | tuple[int, ...] = 1, return_indices: bool = False, ceil_mode: bool = False)[source]¶
Bases: _MaxPoolNd
Applies a 1D max pooling over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C, L)\) and output \((N, C, L_{out})\) can be precisely described as:
\[out(N_i, C_j, k) = \max_{m=0, \ldots, \text{kernel\_size} - 1} input(N_i, C_j, stride \times k + m)\]
If padding is non-zero, then the input is implicitly padded with negative infinity on both sides for padding number of points. dilation is the stride between the elements within the sliding window. This link has a nice visualization of the pooling parameters.
- Note:
When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
- Args:
kernel_size: The size of the sliding window, must be > 0.
stride: The stride of the sliding window, must be > 0. Default value is kernel_size.
padding: Implicit negative infinity padding to be added on both sides, must be >= 0 and <= kernel_size / 2.
dilation: The stride between elements within a sliding window, must be > 0.
return_indices: If True, will return the argmax along with the max values. Useful for torch.nn.MaxUnpool1d later.
ceil_mode: If True, will use ceil instead of floor to compute the output shape. This ensures that every element in the input tensor is covered by a sliding window.
- Shape:
Input: \((N, C, L_{in})\) or \((C, L_{in})\).
Output: \((N, C, L_{out})\) or \((C, L_{out})\),
where ceil_mode = False
\[L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}}\right\rfloor + 1\]
where ceil_mode = True
\[L_{out} = \left\lceil \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}}\right\rceil + 1\]
To ensure that the last pooling window starts inside the input, \(L_{out}\) is reduced by 1 when \((L_{out} - 1) \times \text{stride} \geq L_{in} + \text{padding}\).
Examples:
>>> # pool of size=3, stride=2
>>> m = nn.MaxPool1d(3, stride=2)
>>> input = torch.randn(20, 16, 50)
>>> output = m(input)
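The output-length rules, including the last-window correction for ceil_mode=True, can be checked without running a pooling layer. The plain-Python sketch below is a hypothetical helper written to our understanding of how PyTorch computes the pooled length; it is not part of torchwrench or torch.

```python
def maxpool1d_out_len(l_in, kernel_size, stride=None, padding=0, dilation=1, ceil_mode=False):
    """Output length of 1-D max pooling for the given hyperparameters."""
    if stride is None:
        stride = kernel_size  # documented default
    span = l_in + 2 * padding - dilation * (kernel_size - 1) - 1
    if not ceil_mode:
        return span // stride + 1
    l_out = -(-span // stride) + 1  # ceiling division
    # Drop the last window if it would start in the right padding.
    if (l_out - 1) * stride >= l_in + padding:
        l_out -= 1
    return l_out


print(maxpool1d_out_len(50, 3, stride=2))                 # 24, as in the example above
print(maxpool1d_out_len(5, 2, stride=2))                  # 2: the 5th element is never visited
print(maxpool1d_out_len(5, 2, stride=2, ceil_mode=True))  # 3: a partial window covers it
```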
- class torchwrench.nn.MaxPool2d(kernel_size: int | tuple[int, ...], stride: int | tuple[int, ...] | None = None, padding: int | tuple[int, ...] = 0, dilation: int | tuple[int, ...] = 1, return_indices: bool = False, ceil_mode: bool = False)[source]¶
Bases: _MaxPoolNd
Applies a 2D max pooling over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C, H, W)\), output \((N, C, H_{out}, W_{out})\) and kernel_size \((kH, kW)\) can be precisely described as:
\[\begin{split}\begin{aligned} out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\ & \text{input}(N_i, C_j, \text{stride[0]} \times h + m, \text{stride[1]} \times w + n) \end{aligned}\end{split}\]
If padding is non-zero, then the input is implicitly padded with negative infinity on both sides for padding number of points. dilation controls the spacing between the kernel points. It is harder to describe, but this link has a nice visualization of what dilation does.
- Note:
When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
The parameters kernel_size, stride, padding, dilation can either be:
- a single int, in which case the same value is used for the height and width dimension
- a tuple of two ints, in which case the first int is used for the height dimension, and the second int for the width dimension
- Args:
kernel_size: the size of the window to take a max over
stride: the stride of the window. Default value is kernel_size
padding: Implicit negative infinity padding to be added on both sides
dilation: a parameter that controls the stride of elements in the window
return_indices: if True, will return the max indices along with the outputs. Useful for torch.nn.MaxUnpool2d later
ceil_mode: when True, will use ceil instead of floor to compute the output shape
- Shape:
Input: \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\)
Output: \((N, C, H_{out}, W_{out})\) or \((C, H_{out}, W_{out})\), where
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding[0]} - \text{dilation[0]} \times (\text{kernel\_size[0]} - 1) - 1}{\text{stride[0]}} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding[1]} - \text{dilation[1]} \times (\text{kernel\_size[1]} - 1) - 1}{\text{stride[1]}} + 1\right\rfloor\]
Examples:
>>> # pool of square window of size=3, stride=2
>>> m = nn.MaxPool2d(3, stride=2)
>>> # pool of non-square window
>>> m = nn.MaxPool2d((3, 2), stride=(2, 1))
>>> input = torch.randn(20, 16, 50, 32)
>>> output = m(input)
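The shape formulas apply independently to each spatial dimension once kernel_size, stride, padding, and dilation have been expanded from a single int to a pair. The plain-Python sketch below shows that bookkeeping for ceil_mode=False only; the helper names are hypothetical.

```python
def _pair(v):
    """Expand a single int to (v, v); pass tuples through unchanged."""
    return (v, v) if isinstance(v, int) else tuple(v)


def maxpool2d_out_hw(h_in, w_in, kernel_size, stride=None, padding=0, dilation=1):
    """(H_out, W_out) for 2-D max pooling with ceil_mode=False."""
    k = _pair(kernel_size)
    s = k if stride is None else _pair(stride)  # stride defaults to kernel_size
    p, d = _pair(padding), _pair(dilation)
    return tuple(
        (n + 2 * p[i] - d[i] * (k[i] - 1) - 1) // s[i] + 1
        for i, n in enumerate((h_in, w_in))
    )


# The non-square window from the example above, on a 50 x 32 input
print(maxpool2d_out_hw(50, 32, (3, 2), stride=(2, 1)))  # (24, 31)
```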
- class torchwrench.nn.MaxPool3d(kernel_size: int | tuple[int, ...], stride: int | tuple[int, ...] | None = None, padding: int | tuple[int, ...] = 0, dilation: int | tuple[int, ...] = 1, return_indices: bool = False, ceil_mode: bool = False)[source]¶
Bases: _MaxPoolNd
Applies a 3D max pooling over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C, D, H, W)\), output \((N, C, D_{out}, H_{out}, W_{out})\) and kernel_size \((kD, kH, kW)\) can be precisely described as:
\[\begin{split}\begin{aligned} \text{out}(N_i, C_j, d, h, w) ={} & \max_{k=0, \ldots, kD-1} \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\ & \text{input}(N_i, C_j, \text{stride[0]} \times d + k, \text{stride[1]} \times h + m, \text{stride[2]} \times w + n) \end{aligned}\end{split}\]
If padding is non-zero, then the input is implicitly padded with negative infinity on both sides for padding number of points. dilation controls the spacing between the kernel points. It is harder to describe, but this link has a nice visualization of what dilation does.
- Note:
When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
The parameters kernel_size, stride, padding, dilation can either be:
- a single int, in which case the same value is used for the depth, height and width dimension
- a tuple of three ints, in which case the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
- Args:
kernel_size: the size of the window to take a max over
stride: the stride of the window. Default value is kernel_size
padding: Implicit negative infinity padding to be added on both sides of each of the three spatial dimensions
dilation: a parameter that controls the stride of elements in the window
return_indices: if True, will return the max indices along with the outputs. Useful for torch.nn.MaxUnpool3d later
ceil_mode: when True, will use ceil instead of floor to compute the output shape
- Shape:
Input: \((N, C, D_{in}, H_{in}, W_{in})\) or \((C, D_{in}, H_{in}, W_{in})\).
Output: \((N, C, D_{out}, H_{out}, W_{out})\) or \((C, D_{out}, H_{out}, W_{out})\), where
\[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) - 1}{\text{stride}[2]} + 1\right\rfloor\]
Examples:
>>> # pool of square window of size=3, stride=2
>>> m = nn.MaxPool3d(3, stride=2)
>>> # pool of non-square window
>>> m = nn.MaxPool3d((3, 2, 2), stride=(2, 1, 2))
>>> input = torch.randn(20, 16, 50, 44, 31)
>>> output = m(input)
- class torchwrench.nn.MaxUnpool1d(kernel_size: int | tuple[int], stride: int | tuple[int] | None = None, padding: int | tuple[int] = 0)[source]¶
Bases: _MaxUnpoolNd
Computes a partial inverse of MaxPool1d.
MaxPool1d is not fully invertible, since the non-maximal values are lost. MaxUnpool1d takes in as input the output of MaxPool1d including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero.
- Note:
This operation may behave nondeterministically when the input indices have repeated values. See https://github.com/pytorch/pytorch/issues/80827 and /notes/randomness for more information.
Note
MaxPool1d can map several input sizes to the same output sizes. Hence, the inversion process can become ambiguous. To accommodate this, you can provide the needed output size as an additional argument output_size in the forward call. See the Inputs and Example below.
- Args:
kernel_size (int or tuple): Size of the max pooling window.
stride (int or tuple): Stride of the max pooling window. It is set to kernel_size by default.
padding (int or tuple): Padding that was added to the input
- Inputs:
input: the input Tensor to invert
indices: the indices given out by MaxPool1d
output_size (optional): the targeted output size
- Shape:
Input: \((N, C, H_{in})\) or \((C, H_{in})\).
Output: \((N, C, H_{out})\) or \((C, H_{out})\), where
\[H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0]\]
or as given by output_size in the call operator
Example:
>>> # xdoctest: +IGNORE_WANT("do other tests modify the global state?")
>>> pool = nn.MaxPool1d(2, stride=2, return_indices=True)
>>> unpool = nn.MaxUnpool1d(2, stride=2)
>>> input = torch.tensor([[[1., 2, 3, 4, 5, 6, 7, 8]]])
>>> output, indices = pool(input)
>>> unpool(output, indices)
tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8.]]])
>>> # Example showcasing the use of output_size
>>> input = torch.tensor([[[1., 2, 3, 4, 5, 6, 7, 8, 9]]])
>>> output, indices = pool(input)
>>> unpool(output, indices, output_size=input.size())
tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8., 0.]]])
>>> unpool(output, indices)
tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8.]]])
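The partial inverse amounts to scattering each pooled maximum back to the position recorded in indices and filling everything else with zeros. The plain-Python sketch below illustrates this for a single channel; max_unpool1d is hypothetical, and, unlike the module, it takes output_size as a plain int length rather than a full size.

```python
def max_unpool1d(values, indices, kernel_size, stride=None, padding=0, output_size=None):
    """Scatter pooled maxima back to their argmax positions; all other slots are zero."""
    if stride is None:
        stride = kernel_size
    if output_size is None:
        # H_out = (H_in - 1) * stride - 2 * padding + kernel_size
        output_size = (len(values) - 1) * stride - 2 * padding + kernel_size
    out = [0.0] * output_size
    for v, i in zip(values, indices):
        out[i] = v
    return out


# Pooled values and indices from the 8-element example above
print(max_unpool1d([2.0, 4.0, 6.0, 8.0], [1, 3, 5, 7], kernel_size=2, stride=2))
# 9-element input: the default length is 8, so output_size=9 restores the trailing slot
print(max_unpool1d([2.0, 4.0, 6.0, 8.0], [1, 3, 5, 7], kernel_size=2, stride=2, output_size=9))
```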
- class torchwrench.nn.MaxUnpool2d(kernel_size: int | tuple[int, int], stride: int | tuple[int, int] | None = None, padding: int | tuple[int, int] = 0)[source]¶
Bases: _MaxUnpoolNd
Computes a partial inverse of MaxPool2d.
MaxPool2d is not fully invertible, since the non-maximal values are lost. MaxUnpool2d takes in as input the output of MaxPool2d including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero.
- Note:
This operation may behave nondeterministically when the input indices have repeated values. See https://github.com/pytorch/pytorch/issues/80827 and /notes/randomness for more information.
Note
MaxPool2d can map several input sizes to the same output sizes. Hence, the inversion process can become ambiguous. To accommodate this, you can provide the needed output size as an additional argument output_size in the forward call. See the Inputs and Example below.
- Args:
kernel_size (int or tuple): Size of the max pooling window.
stride (int or tuple): Stride of the max pooling window. It is set to kernel_size by default.
padding (int or tuple): Padding that was added to the input
- Inputs:
input: the input Tensor to invert
indices: the indices given out by MaxPool2d
output_size (optional): the targeted output size
- Shape:
Input: \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\).
Output: \((N, C, H_{out}, W_{out})\) or \((C, H_{out}, W_{out})\), where
\[H_{out} = (H_{in} - 1) \times \text{stride[0]} - 2 \times \text{padding[0]} + \text{kernel\_size[0]}\]
\[W_{out} = (W_{in} - 1) \times \text{stride[1]} - 2 \times \text{padding[1]} + \text{kernel\_size[1]}\]
or as given by output_size in the call operator
Example:
>>> pool = nn.MaxPool2d(2, stride=2, return_indices=True)
>>> unpool = nn.MaxUnpool2d(2, stride=2)
>>> input = torch.tensor([[[[ 1.,  2.,  3.,  4.],
...                         [ 5.,  6.,  7.,  8.],
...                         [ 9., 10., 11., 12.],
...                         [13., 14., 15., 16.]]]])
>>> output, indices = pool(input)
>>> unpool(output, indices)
tensor([[[[ 0.,  0.,  0.,  0.],
          [ 0.,  6.,  0.,  8.],
          [ 0.,  0.,  0.,  0.],
          [ 0., 14.,  0., 16.]]]])
>>> # Now using output_size to resolve an ambiguous size for the inverse
>>> input = torch.tensor([[[[ 1.,  2.,  3.,  4.,  5.],
...                         [ 6.,  7.,  8.,  9., 10.],
...                         [11., 12., 13., 14., 15.],
...                         [16., 17., 18., 19., 20.]]]])
>>> output, indices = pool(input)
>>> # This call will not work without specifying output_size
>>> unpool(output, indices, output_size=input.size())
tensor([[[[ 0.,  0.,  0.,  0.,  0.],
          [ 0.,  7.,  0.,  9.,  0.],
          [ 0.,  0.,  0.,  0.,  0.],
          [ 0., 17.,  0., 19.,  0.]]]])
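Why the second call needs output_size follows from the two shape formulas: widths 4 and 5 both pool down to 2 columns, while the default inverse of 2 columns is 4. The plain-Python sketch below makes that arithmetic explicit; both helpers are hypothetical and assume no padding and no dilation in the pooling step.

```python
def pool_out(n, kernel_size, stride):
    """Floor-mode pooled length (no padding, no dilation)."""
    return (n - kernel_size) // stride + 1


def unpool_default(n, kernel_size, stride, padding=0):
    """Default unpooled length: (n - 1) * stride - 2 * padding + kernel_size."""
    return (n - 1) * stride - 2 * padding + kernel_size


# Widths 4 and 5 both pool to 2 with a 2x2 window and stride 2 ...
print(pool_out(4, 2, 2), pool_out(5, 2, 2))  # 2 2
# ... but unpooling 2 columns defaults back to 4, so width 5 must be given explicitly.
print(unpool_default(2, 2, 2))               # 4
```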
- class torchwrench.nn.MaxUnpool3d(kernel_size: int | tuple[int, int, int], stride: int | tuple[int, int, int] | None = None, padding: int | tuple[int, int, int] = 0)[source]¶
Bases: _MaxUnpoolNd
Computes a partial inverse of MaxPool3d.
MaxPool3d is not fully invertible, since the non-maximal values are lost. MaxUnpool3d takes in as input the output of MaxPool3d including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero.
- Note:
This operation may behave nondeterministically when the input indices have repeated values. See https://github.com/pytorch/pytorch/issues/80827 and /notes/randomness for more information.
Note
MaxPool3d can map several input sizes to the same output sizes. Hence, the inversion process can become ambiguous. To accommodate this, you can provide the needed output size as an additional argument output_size in the forward call. See the Inputs section below.
- Args:
kernel_size (int or tuple): Size of the max pooling window.
stride (int or tuple): Stride of the max pooling window. It is set to kernel_size by default.
padding (int or tuple): Padding that was added to the input
- Inputs:
input: the input Tensor to invert
indices: the indices given out by MaxPool3d
output_size (optional): the targeted output size
- Shape:
Input: \((N, C, D_{in}, H_{in}, W_{in})\) or \((C, D_{in}, H_{in}, W_{in})\).
Output: \((N, C, D_{out}, H_{out}, W_{out})\) or \((C, D_{out}, H_{out}, W_{out})\), where
\[D_{out} = (D_{in} - 1) \times \text{stride[0]} - 2 \times \text{padding[0]} + \text{kernel\_size[0]}\]
\[H_{out} = (H_{in} - 1) \times \text{stride[1]} - 2 \times \text{padding[1]} + \text{kernel\_size[1]}\]
\[W_{out} = (W_{in} - 1) \times \text{stride[2]} - 2 \times \text{padding[2]} + \text{kernel\_size[2]}\]
or as given by output_size in the call operator
Example:
>>> # pool of square window of size=3, stride=2
>>> pool = nn.MaxPool3d(3, stride=2, return_indices=True)
>>> unpool = nn.MaxUnpool3d(3, stride=2)
>>> output, indices = pool(torch.randn(20, 16, 51, 33, 15))
>>> unpooled_output = unpool(output, indices)
>>> unpooled_output.size()
torch.Size([20, 16, 51, 33, 15])
- class torchwrench.nn.Mean(dim: int | None = None, keepdim: bool = False, dtype: dtype | None | 'default' | str | DTypeEnum = None)[source]¶
Bases: Module
Module version of mean().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Min(dim: int | None = None, keepdim: bool = False, *, return_values: bool = True, return_indices: bool | None = None)[source]¶
Bases: Module
Module version of min().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor | min[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Mish(inplace: bool = False)[source]¶
Bases: Module
Applies the Mish function, element-wise.
Mish: A Self Regularized Non-Monotonic Neural Activation Function.
\[\text{Mish}(x) = x * \text{Tanh}(\text{Softplus}(x))\]
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Mish()
>>> input = torch.randn(2)
>>> output = m(input)
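A scalar plain-Python sketch of the Mish formula follows. Softplus(x) = log(1 + exp(x)) is written in an overflow-safe form, max(x, 0) + log1p(exp(-|x|)), which is algebraically equal. Both function names are hypothetical; this illustrates the formula rather than the torch kernel.

```python
import math


def softplus(x: float) -> float:
    """log(1 + exp(x)), rewritten so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish(x) = x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


# Mish is close to the identity for large positive x, and small and negative
# (but not zero) for negative x, which makes it non-monotonic.
for v in (-2.0, 0.0, 1.0, 20.0):
    print(v, mish(v))
```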
- class torchwrench.nn.Module(*args: Any, **kwargs: Any)[source]¶
Bases: object
Base class for all neural network modules.
Your models should also subclass this class.
Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

    import torch.nn as nn
    import torch.nn.functional as F

    class Model(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.conv1 = nn.Conv2d(1, 20, 5)
            self.conv2 = nn.Conv2d(20, 20, 5)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will also have their parameters converted when you call to(), etc.
Note
As per the example above, an __init__() call to the parent class must be made before assignment on the child.
- Variables:¶
- training : bool
Boolean that represents whether this module is in training or evaluation mode.
- T_destination = ~T_destination¶
- add_module(name: str, module: Module | None) None[source]¶
Add a child module to the current module.
The module can be accessed as an attribute using the given name.
- Args:
name (str): name of the child module. The child module can be accessed from this module using the given name.
module (Module): child module to be added to the module.
- apply(fn: Callable[[Module], None]) Self[source]¶
Apply fn recursively to every submodule (as returned by .children()) as well as self.
Typical use includes initializing the parameters of a model (see also torch.nn.init).
- Args:
fn (Module -> None): function to be applied to each submodule
- Returns:
Module: self
Example:
>>> @torch.no_grad()
... def init_weights(m):
...     print(m)
...     if type(m) is nn.Linear:
...         m.weight.fill_(1.0)
...         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
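In the output above, both Linear layers are printed before the Sequential container: apply visits every child subtree first and the module itself last (a post-order traversal), then returns self. The toy plain-Python sketch below reproduces only that visiting order; Node is a hypothetical stand-in, not torch.nn.Module.

```python
class Node:
    """Hypothetical stand-in for a module tree (not torch.nn.Module)."""

    def __init__(self, name, children=()):
        self.name = name
        self._children = list(children)

    def children(self):
        return iter(self._children)

    def apply(self, fn):
        # Recurse into every child subtree first, then call fn on self (post-order).
        for child in self.children():
            child.apply(fn)
        fn(self)
        return self


visited = []
net = Node("seq", [Node("linear0"), Node("linear1")])
net.apply(lambda m: visited.append(m.name))
print(visited)  # ['linear0', 'linear1', 'seq']
```

Returning self is what lets a call such as net.apply(init_weights) be chained or echoed by the interpreter.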
- bfloat16() Self[source]¶
Casts all floating point parameters and buffers to bfloat16 datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- buffers(recurse: bool = True) Iterator[Tensor][source]¶
Return an iterator over module buffers.
- Args:
recurse (bool): if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields:
torch.Tensor: module buffer
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> for buf in model.buffers():
...     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])
- children() Iterator[Module][source]¶
Return an iterator over immediate children modules.
- Yields:
Module: a child module
- compile(*args, **kwargs) None[source]¶
Compile this Module’s forward using torch.compile().
This Module’s __call__ method is compiled and all arguments are passed as-is to torch.compile().
See torch.compile() for details on the arguments for this function.
- cpu() Self[source]¶
Move all model parameters and buffers to the CPU.
Note
This method modifies the module in-place.
- Returns:
Module: self
- cuda(device: int | device | None = None) Self[source]¶
Move all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on GPU while being optimized.
Note
This method modifies the module in-place.
- Args:
device (int, optional): if specified, all parameters will be copied to that device
- Returns:
Module: self
- double() Self[source]¶
Casts all floating point parameters and buffers to double datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- eval() Self[source]¶
Set the module in evaluation mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, i.e. whether they are affected, e.g. Dropout, BatchNorm, etc.
This is equivalent to self.train(False).
See Locally disabling gradient computation for a comparison between .eval() and several similar mechanisms that may be confused with it.
- Returns:
Module: self
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- float() Self[source]¶
Casts all floating point parameters and buffers to float datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- forward(*input: Any) None¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- get_buffer(target: str) Tensor[source]¶
Return the buffer given by target if it exists, otherwise throw an error.
See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.
- Args:
target: The fully-qualified string name of the buffer to look for. (See get_submodule for how to specify a fully-qualified string.)
- Returns:
torch.Tensor: The buffer referenced by target
- Raises:
AttributeError: If the target string references an invalid path or resolves to something that is not a buffer
- get_extra_state() Any[source]¶
Return any extra state to include in the module’s state_dict.
Implement this and a corresponding set_extra_state() for your module if you need to store extra state. This function is called when building the module’s state_dict().
Note that extra state should be picklable to ensure working serialization of the state_dict. We only provide backwards compatibility guarantees for serializing Tensors; other objects may break backwards compatibility if their serialized pickled form changes.
- Returns:
object: Any extra state to store in the module’s state_dict
- get_parameter(target: str) Parameter[source]¶
Return the parameter given by target if it exists, otherwise throw an error.
See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.
- Args:
target: The fully-qualified string name of the Parameter to look for. (See get_submodule for how to specify a fully-qualified string.)
- Returns:
torch.nn.Parameter: The Parameter referenced by target
- Raises:
AttributeError: If the target string references an invalid path or resolves to something that is not an nn.Parameter
- get_submodule(target: str) Module[source]¶
Return the submodule given by target if it exists, otherwise throw an error.
For example, let’s say you have an nn.Module A that looks like this:
    A(
        (net_b): Module(
            (net_c): Module(
                (conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
            )
            (linear): Linear(in_features=100, out_features=200, bias=True)
        )
    )
(The diagram shows an nn.Module A, which has a nested submodule net_b, which itself has two submodules net_c and linear. net_c then has a submodule conv.)
To check whether or not we have the linear submodule, we would call get_submodule("net_b.linear"). To check whether we have the conv submodule, we would call get_submodule("net_b.net_c.conv").
The runtime of get_submodule is bounded by the degree of module nesting in target. A query against named_modules achieves the same result, but it is O(N) in the number of transitive modules. So, for a simple check to see if some submodule exists, get_submodule should always be used.
- Args:
- target: The fully-qualified string name of the submodule
to look for. (See above example for how to specify a fully-qualified string.)
- Returns:
torch.nn.Module: The submodule referenced by target.
- Raises:
- AttributeError: If at any point along the path resulting from
the target string the (sub)path resolves to a non-existent attribute name or an object that is not an instance of
nn.Module.
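The lookup described above can be tried out with a small sketch. It assumes this class resolves dotted paths exactly like torch.nn.Module.get_submodule, and it rebuilds the diagram's tree from plain nn.Module containers (the variable names are illustrative):

```python
from torch import nn

# Rebuild the tree from the diagram using plain nn.Module containers.
a = nn.Module()
a.net_b = nn.Module()
a.net_b.net_c = nn.Module()
a.net_b.net_c.conv = nn.Conv2d(16, 33, 3, stride=2)
a.net_b.linear = nn.Linear(100, 200)

conv = a.get_submodule("net_b.net_c.conv")  # follows net_b -> net_c -> conv
print(conv)  # Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))

# A path that does not exist raises AttributeError instead of returning None.
try:
    a.get_submodule("net_b.missing")
    found = True
except AttributeError:
    found = False
print(found)  # False
```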
- half() Self[source]¶
Casts all floating point parameters and buffers to half datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- ipu(device: int | device | None = None) Self[source]¶
Move all model parameters and buffers to the IPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on IPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- load_state_dict(state_dict: Mapping[str, Any], strict: bool = True, assign: bool = False)[source]¶
Copy parameters and buffers from state_dict into this module and its descendants.
If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
Warning
If assign is True, the optimizer must be created after the call to load_state_dict unless get_swap_module_params_on_conversion() is True.
- Args:
- state_dict (dict): a dict containing parameters and
persistent buffers.
- strict (bool, optional): whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- assign (bool, optional): When set to False, the properties of the tensors in the current module are preserved, whereas setting it to True preserves properties of the Tensors in the state dict. The only exception is the requires_grad field of Parameter, for which the value from the module is preserved. Default: False
- Returns:
NamedTuple with missing_keys and unexpected_keys fields:
missing_keys is a list of str containing any keys that are expected by this module but missing from the provided state_dict.
unexpected_keys is a list of str containing the keys that are not expected by this module but present in the provided state_dict.
- Note:
If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
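As a rough illustration of the returned NamedTuple, the sketch below loads a deliberately mismatched state dict with strict=False. It uses torch.nn.Linear directly and assumes the inherited load_state_dict behaves the same; the "extra" key is a made-up example:

```python
import torch
from torch import nn

src = nn.Linear(3, 2)
dst = nn.Linear(3, 2)

# Omit "bias" and add an unrelated key to provoke both kinds of mismatch.
state = {"weight": src.weight.detach().clone(), "extra": torch.zeros(1)}

result = dst.load_state_dict(state, strict=False)
print(result.missing_keys)     # ['bias']
print(result.unexpected_keys)  # ['extra']
print(torch.equal(dst.weight, src.weight))  # True: matching keys are still copied

# With strict=True (the default) the same state dict is rejected.
try:
    dst.load_state_dict(state)
    strict_failed = False
except RuntimeError:
    strict_failed = True
```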
- modules() Iterator[Module][source]¶
Return an iterator over all modules in the network.
- Yields:
Module: a module in the network
- Note:
Duplicate modules are returned only once. In the following example,
l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
- mtia(device: int | device | None = None) Self[source]¶
Move all model parameters and buffers to the MTIA.
This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on MTIA while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- named_buffers(prefix: str = '', recurse: bool = True, remove_duplicate: bool = True) Iterator[tuple[str, Tensor]][source]¶
Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Args:
- prefix (str): prefix to prepend to all buffer names.
- recurse (bool, optional): if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Defaults to True.
- remove_duplicate (bool, optional): whether to remove the duplicated buffers in the result. Defaults to True.
- Yields:
(str, torch.Tensor): Tuple containing the name and buffer
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> for name, buf in self.named_buffers():
...     if name in ['running_var']:
...         print(buf.size())
- named_children() Iterator[tuple[str, Module]][source]¶
Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- Yields:
(str, Module): Tuple containing a name and child module
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> for name, module in model.named_children():
...     if name in ['conv4', 'conv5']:
...         print(module)
- named_modules(memo: set[Module] | None = None, prefix: str = '', remove_duplicate: bool = True)[source]¶
Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- Args:
- memo: a memo to store the set of modules already added to the result
- prefix: a prefix that will be added to the name of the module
- remove_duplicate: whether to remove the duplicated module instances in the result or not
- Yields:
(str, Module): Tuple of name and module
- Note:
Duplicate modules are returned only once. In the following example,
l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
- named_parameters(prefix: str = '', recurse: bool = True, remove_duplicate: bool = True) Iterator[tuple[str, Parameter]][source]¶
Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Args:
- prefix (str): prefix to prepend to all parameter names.
- recurse (bool): if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- remove_duplicate (bool, optional): whether to remove the duplicated parameters in the result. Defaults to True.
- Yields:
(str, Parameter): Tuple containing the name and parameter
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> for name, param in self.named_parameters():
...     if name in ['bias']:
...         print(param.size())
- parameters(recurse: bool = True) Iterator[Parameter][source]¶
Return an iterator over module parameters.
This is typically passed to an optimizer.
- Args:
- recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
Parameter: module parameter
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> for param in model.parameters():
...     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])
- register_backward_hook(hook: Callable[[Module, tuple[Tensor, ...] | Tensor, tuple[Tensor, ...] | Tensor], None | tuple[Tensor, ...] | Tensor]) RemovableHandle[source]¶
Register a backward hook on the module.
This function is deprecated in favor of register_full_backward_hook() and the behavior of this function will change in future versions.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- register_buffer(name: str, tensor: Tensor | None, persistent: bool = True) None[source]¶
Add a buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.
Buffers can be accessed as attributes using given names.
- Args:
- name (str): name of the buffer. The buffer can be accessed from this module using the given name
- tensor (Tensor or None): buffer to be registered. If None, then operations that run on buffers, such as cuda, are ignored. If None, the buffer is not included in the module’s state_dict.
- persistent (bool): whether the buffer is part of this module’s state_dict.
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> self.register_buffer('running_mean', torch.zeros(num_features))
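A minimal sketch of the persistent/non-persistent distinction, assuming the inherited register_buffer matches torch.nn.Module. RunningStats and its buffer names are hypothetical:

```python
import torch
from torch import nn

class RunningStats(nn.Module):
    """Hypothetical module with one persistent and one non-persistent buffer."""

    def __init__(self, num_features: int) -> None:
        super().__init__()
        # Persistent (default): saved in state_dict and moved by .to()/.cuda().
        self.register_buffer("running_mean", torch.zeros(num_features))
        # Non-persistent: still moved by .to()/.cuda(), but NOT saved in state_dict.
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)

m = RunningStats(4)
print(list(m.state_dict().keys()))              # ['running_mean']
print([name for name, _ in m.named_buffers()])  # ['running_mean', 'scratch']
print(list(m.parameters()))                     # [] (buffers are not parameters)
```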
- register_forward_hook(hook: Callable[[T, tuple[Any, ...], Any], Any | None] | Callable[[T, tuple[Any, ...], dict[str, Any], Any], Any | None], *, prepend: bool = False, with_kwargs: bool = False, always_call: bool = False) RemovableHandle[source]¶
Register a forward hook on the module.
The hook will be called every time after forward() has computed an output.
If with_kwargs is False or not specified, the input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the output. It can modify the input in place, but this has no effect on forward, since the hook is called after forward() has run. The hook should have the following signature:
hook(module, args, output) -> None or modified output
If with_kwargs is True, the forward hook will be passed the kwargs given to the forward function and is expected to return the output, possibly modified. The hook should have the following signature:
hook(module, args, kwargs, output) -> None or modified output
- Args:
- hook (Callable): The user-defined hook to be registered.
- prepend (bool): If True, the provided hook will be fired before all existing forward hooks on this torch.nn.Module. Otherwise, the provided hook will be fired after all existing forward hooks on this torch.nn.Module. Note that global forward hooks registered with register_module_forward_hook() will fire before all hooks registered by this method. Default: False
- with_kwargs (bool): If True, the hook will be passed the kwargs given to the forward function. Default: False
- always_call (bool): If True, the hook will be run regardless of whether an exception is raised while calling the Module. Default: False
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
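The following sketch registers a forward hook that records an intermediate activation, then removes it through the returned handle. It relies only on torch.nn (assumed to match this class's inherited method); the captured dictionary is an illustrative choice, not part of the API:

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
captured = {}

def save_output(module, args, output):
    # Runs after forward(); returning None leaves the output unchanged.
    captured["relu"] = output.detach()

handle = model[1].register_forward_hook(save_output)
out = model(torch.randn(5, 4))
print(captured["relu"].shape)  # torch.Size([5, 3])

handle.remove()  # the hook no longer fires on later calls
```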
- register_forward_pre_hook(hook: Callable[[T, tuple[Any, ...]], Any | None] | Callable[[T, tuple[Any, ...], dict[str, Any]], tuple[Any, dict[str, Any]] | None], *, prepend: bool = False, with_kwargs: bool = False) RemovableHandle[source]¶
Register a forward pre-hook on the module.
The hook will be called every time before forward() is invoked.
If with_kwargs is false or not specified, the input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the input. The user can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned (unless that value is already a tuple). The hook should have the following signature:
hook(module, args) -> None or modified input
If with_kwargs is true, the forward pre-hook will be passed the kwargs given to the forward function, and if the hook modifies the input, both the args and kwargs should be returned. The hook should have the following signature:
hook(module, args, kwargs) -> None or a tuple of modified input and kwargs
- Args:
- hook (Callable): The user-defined hook to be registered.
- prepend (bool): If true, the provided hook will be fired before all existing forward_pre hooks on this torch.nn.Module. Otherwise, the provided hook will be fired after all existing forward_pre hooks on this torch.nn.Module. Note that global forward_pre hooks registered with register_module_forward_pre_hook() will fire before all hooks registered by this method. Default: False
- with_kwargs (bool): If true, the hook will be passed the kwargs given to the forward function. Default: False
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
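A small sketch of a forward pre-hook that replaces the positional input, assuming the torch.nn.Module semantics described above. Because the hook returns a single tensor rather than a tuple, it is wrapped into a one-element tuple before forward() runs:

```python
import torch
from torch import nn

layer = nn.Linear(2, 2)

def double_input(module, args):
    # args is the tuple of positional inputs given to the module.
    (x,) = args
    return x * 2  # a single value is wrapped into a tuple automatically

handle = layer.register_forward_pre_hook(double_input)
x = torch.ones(1, 2)
with torch.no_grad():
    hooked = layer(x)      # forward() actually sees 2 * x
    handle.remove()
    plain = layer(2 * x)   # same computation without the hook
print(torch.allclose(hooked, plain))  # True
```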
- register_full_backward_hook(hook: Callable[[Module, tuple[Tensor, ...] | Tensor, tuple[Tensor, ...] | Tensor], None | tuple[Tensor, ...] | Tensor], prepend: bool = False) RemovableHandle[source]¶
Register a backward hook on the module.
The hook will be called every time the gradients with respect to a module are computed, and its firing rules are as follows:
Ordinarily, the hook fires when the gradients are computed with respect to the module inputs.
If none of the module inputs require gradients, the hook will fire when the gradients are computed with respect to module outputs.
If none of the module outputs require gradients, then the hooks will not fire.
The hook should have the following signature:
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs, respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments, and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.
For technical reasons, when this hook is applied to a Module, its forward function will receive a view of each Tensor passed to the Module. Similarly, the caller will receive a view of each Tensor returned by the Module’s forward function.
Warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
- Args:
- hook (Callable): The user-defined hook to be registered.
- prepend (bool): If true, the provided hook will be fired before all existing backward hooks on this torch.nn.Module. Otherwise, the provided hook will be fired after all existing backward hooks on this torch.nn.Module. Note that global backward hooks registered with register_module_full_backward_hook() will fire before all hooks registered by this method.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
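To see which gradients the hook receives, the sketch below records grad_output for a single torch.nn.Linear layer (assumed to behave like this class here). The record function and grads dictionary are illustrative names; the expected values follow from differentiating a plain sum, whose gradient with respect to each output element is 1:

```python
import torch
from torch import nn

layer = nn.Linear(3, 1)
grads = {}

def record(module, grad_input, grad_output):
    # Both arguments are tuples. Returning None keeps grad_input unchanged.
    grads["out"] = grad_output[0].detach().clone()

layer.register_full_backward_hook(record)

x = torch.randn(4, 3, requires_grad=True)
layer(x).sum().backward()
print(grads["out"])  # a (4, 1) tensor of ones: d(sum) / d(output)
```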
- register_full_backward_pre_hook(hook: Callable[[Module, tuple[Tensor, ...] | Tensor], None | tuple[Tensor, ...] | Tensor], prepend: bool = False) RemovableHandle[source]¶
Register a backward pre-hook on the module.
The hook will be called every time the gradients for the module are computed. The hook should have the following signature:
hook(module, grad_output) -> tuple[Tensor, ...], Tensor or None
The grad_output is a tuple. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the output that will be used in place of grad_output in subsequent computations. Entries in grad_output will be None for all non-Tensor arguments.
For technical reasons, when this hook is applied to a Module, its forward function will receive a view of each Tensor passed to the Module. Similarly, the caller will receive a view of each Tensor returned by the Module’s forward function.
Warning
Modifying inputs inplace is not allowed when using backward hooks and will raise an error.
- Args:
- hook (Callable): The user-defined hook to be registered.
- prepend (bool): If true, the provided hook will be fired before all existing backward_pre hooks on this torch.nn.Module. Otherwise, the provided hook will be fired after all existing backward_pre hooks on this torch.nn.Module. Note that global backward_pre hooks registered with register_module_full_backward_pre_hook() will fire before all hooks registered by this method.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
- register_load_state_dict_post_hook(hook)[source]¶
Register a post-hook to be run after module’s load_state_dict() is called.
- It should have the following signature:
hook(module, incompatible_keys) -> None
The module argument is the current module that this hook is registered on, and the incompatible_keys argument is a NamedTuple consisting of attributes missing_keys and unexpected_keys. missing_keys is a list of str containing the missing keys and unexpected_keys is a list of str containing the unexpected keys.
The given incompatible_keys can be modified in place if needed.
Note that the checks performed when calling load_state_dict() with strict=True are affected by modifications the hook makes to missing_keys or unexpected_keys, as expected. Additions to either set of keys will result in an error being thrown when strict=True, and clearing out both missing and unexpected keys will avoid an error.
- Returns:
torch.utils.hooks.RemovableHandle: a handle that can be used to remove the added hook by calling handle.remove()
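The interaction with strict=True can be sketched as follows: a post-hook removes a known missing key in place, so a strict load that would otherwise fail succeeds. The hook name is hypothetical, and the example assumes the inherited method matches torch.nn.Module:

```python
import torch
from torch import nn

model = nn.Linear(2, 2)

def allow_missing_bias(module, incompatible_keys):
    # Mutate the list in place; the strict check runs after post-hooks.
    if "bias" in incompatible_keys.missing_keys:
        incompatible_keys.missing_keys.remove("bias")

model.register_load_state_dict_post_hook(allow_missing_bias)

# This state dict lacks "bias", which would normally fail with strict=True.
result = model.load_state_dict({"weight": torch.eye(2)}, strict=True)
print(result.missing_keys)  # []
```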
- register_load_state_dict_pre_hook(hook)[source]¶
Register a pre-hook to be run before module’s load_state_dict() is called.
- It should have the following signature:
hook(module, state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs) -> None
- Arguments:
- hook (Callable): Callable hook that will be invoked before loading the state dict.
- register_parameter(name: str, param: Parameter | None) None[source]¶
Add a parameter to the module.
The parameter can be accessed as an attribute using given name.
- Args:
- name (str): name of the parameter. The parameter can be accessed
from this module using the given name
- param (Parameter or None): parameter to be added to the module. If None, then operations that run on parameters, such as cuda, are ignored. If None, the parameter is not included in the module’s state_dict.
- register_state_dict_post_hook(hook)[source]¶
Register a post-hook for the state_dict() method.
- It should have the following signature:
hook(module, state_dict, prefix, local_metadata) -> None
The registered hooks can modify the state_dict in place.
- register_state_dict_pre_hook(hook)[source]¶
Register a pre-hook for the state_dict() method.
- It should have the following signature:
hook(module, prefix, keep_vars) -> None
The registered hooks can be used to perform pre-processing before the state_dict call is made.
- requires_grad_(requires_grad: bool = True) Self[source]¶
Change if autograd should record operations on parameters in this module.
This method sets the parameters’ requires_grad attributes in-place.
This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
See Locally disabling gradient computation for a comparison between .requires_grad_() and several similar mechanisms that may be confused with it.
- Args:
- requires_grad (bool): whether autograd should record operations on parameters in this module. Default: True.
- Returns:
Module: self
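A typical freezing pattern, sketched with torch.nn and assuming the inherited requires_grad_ behaves the same: freeze the first layer and hand only the remaining trainable parameters to an optimizer (the layer sizes and learning rate are arbitrary):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

# Freeze the first layer, e.g. a pretrained feature extractor.
model[0].requires_grad_(False)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['2.weight', '2.bias']

# Pass only the parameters that still require gradients to the optimizer.
optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=0.1)
```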
- set_extra_state(state: Any) None[source]¶
Set extra state contained in the loaded state_dict.
This function is called from load_state_dict() to handle any extra state found within the state_dict. Implement this function and a corresponding get_extra_state() for your module if you need to store extra state within its state_dict.
- Args:
state (dict): Extra state from the state_dict
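A minimal sketch of a get_extra_state/set_extra_state pair that round-trips a plain Python counter through the state_dict. The Counter module is hypothetical; the "_extra_state" key name is what torch.nn.Module uses and is assumed to apply here as well:

```python
import torch
from torch import nn

class Counter(nn.Module):
    """Hypothetical module that keeps a non-tensor call counter as extra state."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self.calls += 1
        return x

    def get_extra_state(self):
        # Must be picklable so the state_dict can be serialized.
        return {"calls": self.calls}

    def set_extra_state(self, state) -> None:
        self.calls = state["calls"]

src = Counter()
src(torch.zeros(1))
src(torch.zeros(1))
sd = src.state_dict()
print(list(sd.keys()))  # ['_extra_state']

dst = Counter()
dst.load_state_dict(sd)  # calls dst.set_extra_state(...)
print(dst.calls)  # 2
```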
- set_submodule(target: str, module: Module, strict: bool = False) None[source]¶
Set the submodule given by target if it exists, otherwise throw an error.
Note
If strict is set to False (default), the method will replace an existing submodule or create a new submodule if the parent module exists. If strict is set to True, the method will only attempt to replace an existing submodule and throw an error if the submodule does not exist.
For example, let’s say you have an nn.Module A that looks like this:
    A(
        (net_b): Module(
            (net_c): Module(
                (conv): Conv2d(3, 3, 3)
            )
            (linear): Linear(3, 3)
        )
    )
(The diagram shows an nn.Module A. A has a nested submodule net_b, which itself has two submodules net_c and linear. net_c then has a submodule conv.)
To override the Conv2d with a new submodule Linear, you could call set_submodule("net_b.net_c.conv", nn.Linear(1, 1)), where strict could be True or False.
To add a new submodule Conv2d to the existing net_b module, you would call set_submodule("net_b.conv", nn.Conv2d(1, 1, 1)).
In the above, if you set strict=True and call set_submodule("net_b.conv", nn.Conv2d(1, 1, 1), strict=True), an AttributeError will be raised because net_b does not have a submodule named conv.
- Args:
- target: The fully-qualified string name of the submodule
to look for. (See above example for how to specify a fully-qualified string.)
- module: The module to set the submodule to.
- strict: If False, the method will replace an existing submodule or create a new submodule if the parent module exists. If True, the method will only attempt to replace an existing submodule and throw an error if the submodule doesn’t already exist.
- Raises:
ValueError: If the target string is empty or if module is not an instance of nn.Module.
AttributeError: If at any point along the path resulting from the target string the (sub)path resolves to a non-existent attribute name or an object that is not an instance of nn.Module.
- state_dict(*, destination: T_destination, prefix: str = '', keep_vars: bool = False) T_destination[source]¶
- state_dict(*, prefix: str = '', keep_vars: bool = False) dict[str, Any]
Return a dictionary containing references to the whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are the corresponding parameter and buffer names. Parameters and buffers set to None are not included.
Note
The returned object is a shallow copy. It contains references to the module’s parameters and buffers.
Warning
Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.
Warning
Please avoid the use of argument destination as it is not designed for end-users.
- Args:
- destination (dict, optional): If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.
- prefix (str, optional): a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.
- keep_vars (bool, optional): by default the Tensors returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.
- Returns:
- dict:
a dictionary containing a whole state of the module
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> module.state_dict().keys()
['bias', 'weight']
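A short sketch of the prefix and keep_vars arguments and of the shallow-copy note, using torch.nn.Linear and assuming the inherited state_dict matches it:

```python
from torch import nn

layer = nn.Linear(2, 2)

sd = layer.state_dict(prefix="encoder.")
print(list(sd.keys()))  # ['encoder.weight', 'encoder.bias']

# keep_vars=False (default): detached tensors that share storage with the parameters.
print(sd["encoder.weight"].requires_grad)  # False
print(sd["encoder.weight"].data_ptr() == layer.weight.data_ptr())  # True (shallow copy)

# keep_vars=True: the Parameter objects themselves are returned.
print(layer.state_dict(keep_vars=True)["weight"] is layer.weight)  # True
```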
- to(device: str | device | int | None = ..., dtype: dtype | None = ..., non_blocking: bool = ...) Self[source]¶
- to(dtype: dtype, non_blocking: bool = ...) Self
- to(tensor: Tensor, non_blocking: bool = ...) Self
Move and/or cast the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Args:
- device (torch.device): the desired device of the parameters and buffers in this module
- dtype (torch.dtype): the desired floating point or complex dtype of the parameters and buffers in this module
- tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
- memory_format (torch.memory_format): the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns:
Module: self
Examples:
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=False).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- to_empty(*, device: str | device | int | None, recurse: bool = True) Self[source]¶
Move the parameters and buffers to the specified device without copying storage.
- Args:
- device (torch.device): The desired device of the parameters and buffers in this module.
- recurse (bool): Whether parameters and buffers of submodules should be recursively moved to the specified device.
- Returns:
Module: self
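One common use of to_empty, sketched under the assumption that the inherited method matches torch.nn.Module: build a module on the "meta" device without allocating memory, then materialize it on the CPU. Since the new storage is uninitialized, the sketch re-initializes the weights with Linear.reset_parameters():

```python
import torch
from torch import nn

# Shapes only: no parameter memory is allocated on the "meta" device.
layer = nn.Linear(4, 4, device="meta")
print(layer.weight.device)  # meta

# Allocate uninitialized storage on the CPU, then fill it explicitly.
layer.to_empty(device="cpu")
layer.reset_parameters()
print(layer.weight.device)  # cpu
```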
- train(mode: bool = True) Self[source]¶
Set the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, i.e., whether they are affected, e.g. Dropout, BatchNorm, etc.
- Args:
- mode (bool): whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
Module: self
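The effect of train() and eval() is easiest to observe on Dropout. This sketch uses torch.nn.Dropout with p=0.5 (assumed representative of the modules mentioned above); in training mode each kept element is scaled by 1 / (1 - p) = 2, while evaluation mode passes the input through unchanged:

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()       # same as drop.train(True)
y_train = drop(x)  # each entry is either zeroed or scaled to 2.0
print(set(y_train.unique().tolist()) <= {0.0, 2.0})  # True

drop.eval()        # same as drop.train(False)
print(drop.training, torch.equal(drop(x), x))  # False True
```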
- type(dst_type: dtype | str) Self[source]¶
Casts all parameters and buffers to dst_type.
Note
This method modifies the module in-place.
- Args:
dst_type (type or string): the desired type
- Returns:
Module: self
- xpu(device: int | device | None = None) Self[source]¶
Move all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on XPU while being optimized.
Note
This method modifies the module in-place.
- Arguments:
- device (int, optional): if specified, all parameters will be
copied to that device
- Returns:
Module: self
- zero_grad(set_to_none: bool = True) None[source]¶
Reset gradients of all model parameters.
See similar function under torch.optim.Optimizer for more context.
- Args:
- set_to_none (bool): instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
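A sketch contrasting the two set_to_none settings after one backward pass, using torch.nn.Linear (assumed to behave like this class for this method):

```python
import torch
from torch import nn

layer = nn.Linear(3, 1)
layer(torch.randn(2, 3)).sum().backward()

layer.zero_grad(set_to_none=False)  # keeps the gradient tensors, filled with zeros
zeroed = layer.weight.grad.clone()  # a (1, 3) tensor of zeros

layer.zero_grad(set_to_none=True)   # drops the gradient tensors entirely
print(layer.weight.grad)            # None
```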
- class torchwrench.nn.ModuleDict(modules: Mapping[str, Module] | None = None)[source]¶
Bases: Module
Holds submodules in a dictionary.
ModuleDict can be indexed like a regular Python dictionary, but modules it contains are properly registered, and will be visible by all Module methods.
ModuleDict is an ordered dictionary that respects
- the order of insertion, and
- in update(), the order of the merged OrderedDict, dict (started from Python 3.6) or another ModuleDict (the argument to update()).
Note that update() with other unordered mapping types does not preserve the order of the merged mapping.
- Args:
- modules (iterable, optional): a mapping (dictionary) of (string: module)
or an iterable of key-value pairs of type (string, module)
Example:
    class MyModule(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.choices = nn.ModuleDict(
                {"conv": nn.Conv2d(10, 10, 3), "pool": nn.MaxPool2d(3)}
            )
            self.activations = nn.ModuleDict(
                [["lrelu", nn.LeakyReLU()], ["prelu", nn.PReLU()]]
            )

        def forward(self, x, choice, act):
            x = self.choices[choice](x)
            x = self.activations[act](x)
            return x
- pop(key: str) Module[source]¶
Remove key from the ModuleDict and return its module.
- Args:
key (str): key to pop from the ModuleDict
- update(modules: Mapping[str, Module]) None[source]¶
Update the ModuleDict with key-value pairs from a mapping, overwriting existing keys.
Note
If modules is an OrderedDict, a ModuleDict, or an iterable of key-value pairs, the order of new elements in it is preserved.
- values() ValuesView[Module][source]¶
Return an iterable of the ModuleDict values.
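The ordering rules for update() and the behavior of pop() can be checked with a short sketch. It uses torch.nn.ModuleDict and assumes this class follows the same semantics; the layer choices are arbitrary:

```python
from collections import OrderedDict

from torch import nn

layers = nn.ModuleDict({"conv": nn.Conv2d(1, 4, 3), "pool": nn.MaxPool2d(2)})

# update() overwrites "pool" in its original position and appends "act".
layers.update(OrderedDict([("act", nn.ReLU()), ("pool", nn.AvgPool2d(2))]))
print(list(layers.keys()))  # ['conv', 'pool', 'act']

removed = layers.pop("act")  # returns the module and unregisters it
print(type(removed).__name__, list(layers.keys()))  # ReLU ['conv', 'pool']
print(type(layers["pool"]).__name__)  # AvgPool2d
```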
- class torchwrench.nn.ModuleList(modules: Iterable[Module] | None = None)[source]¶
Bases: Module
Holds submodules in a list.
ModuleList can be indexed like a regular Python list, but modules it contains are properly registered, and will be visible by all Module methods.
- Args:
modules (iterable, optional): an iterable of modules to add
Example:
    class MyModule(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

        def forward(self, x):
            # ModuleList can act as an iterable, or be indexed using ints
            for i, l in enumerate(self.linears):
                x = self.linears[i // 2](x) + l(x)
            return x
- append(module: Module) Self[source]¶
Append a given module to the end of the list.
- Args:
module (nn.Module): module to append
- extend(modules: Iterable[Module]) Self[source]¶
Append modules from a Python iterable to the end of the list.
- Args:
modules (iterable): iterable of modules to append
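A brief sketch of append() and extend(), assuming they return the list itself as the Self annotation suggests, which allows chaining. It uses torch.nn.ModuleList and also shows that the registered layers are visible to parameters():

```python
from torch import nn

layers = nn.ModuleList([nn.Linear(4, 4)])
# Both methods return the list itself, so the calls can be chained.
layers.append(nn.ReLU()).extend([nn.Linear(4, 2), nn.Sigmoid()])

print(len(layers))                          # 4
print(sum(1 for _ in layers.parameters()))  # 4: weight and bias of each Linear
```

A plain Python list holding the same layers would hide them from parameters(), to(), and state_dict().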
- torchwrench.nn.ModulePartial¶
alias of
EModulePartial
- class torchwrench.nn.MoveToRec(predicate: Callable[[Tensor | Module], bool] | None = None)[source]¶
Bases: Module
Module version of move_to_rec().
- forward(x: Any) Any[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.MultiIndicesToMultihot(num_classes: int, *, padding_idx: int | None = None, device: device | None | 'default' | 'cuda_if_available' | str | int = None, dtype: dtype | None | 'default' | str | DTypeEnum = torch.bool)[source]¶
Bases: Module
For more information, see indices_to_multihot().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(indices: list[list[int]] | list[Tensor]) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Moduleinstance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
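The multi-hot encoding behind MultiIndicesToMultihot (and its inverse, MultihotToMultiIndices) can be illustrated with plain PyTorch. This is a sketch, not torchwrench's indices_to_multihot(): the helper names are hypothetical, and skipping entries equal to padding_idx is an assumption about how that argument is used.

```python
import torch


def multi_indices_to_multihot_sketch(indices, num_classes, padding_idx=None):
    # One row per sample; True marks each class index present in that sample.
    out = torch.zeros(len(indices), num_classes, dtype=torch.bool)
    for row, sample in enumerate(indices):
        for idx in sample:
            idx = int(idx)  # accepts Python ints or 0-d tensors
            if padding_idx is not None and idx == padding_idx:
                continue  # assumption: padding entries are skipped
            out[row, idx] = True
    return out


def multihot_to_multi_indices_sketch(multihot):
    # Inverse direction: list the positions of the True entries in each row.
    return [row.nonzero().flatten().tolist() for row in multihot]


mh = multi_indices_to_multihot_sketch([[0, 2], [1], [3, -1]], num_classes=4, padding_idx=-1)
print(mh.int().tolist())                     # [[1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
print(multihot_to_multi_indices_sketch(mh))  # [[0, 2], [1], [3]]
```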
- class torchwrench.nn.MultiIndicesToMultinames(idx_to_name: Mapping[int, T_Name], *, padding_idx: int | None = None)[source]¶
Bases: Generic[T_Name], Module
For more information, see indices_to_multinames().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(indices: list[list[int]] | list[Tensor]) list[list[T_Name]][source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction: str = 'mean')[source]¶
Bases: _Loss
Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input \(x\) (a 2D mini-batch Tensor) and target \(y\) (a 2D Tensor of target class indices). For each sample in the mini-batch:
\[\text{loss}(x, y) = \sum_{ij}\frac{\max(0, 1 - (x[y[j]] - x[i]))}{\text{x.size}(0)}\]
where \(i \in \left\{0, \; \cdots , \; \text{x.size}(0) - 1\right\}\), \(j \in \left\{0, \; \cdots , \; \text{y.size}(0) - 1\right\}\), \(0 \leq y[j] \leq \text{x.size}(0)-1\), and \(i \neq y[j]\) for all \(i\) and \(j\).
\(y\) and \(x\) must have the same size.
The criterion only considers a contiguous block of non-negative targets that starts at the front.
This allows different samples to have a variable number of target classes.
- Args:
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input: \((C)\) or \((N, C)\), where N is the batch size and C is the number of classes.
Target: \((C)\) or \((N, C)\), label targets padded by -1 to the same shape as the input.
Output: scalar. If reduction is 'none', then \((N)\).
Examples:
>>> loss = nn.MultiLabelMarginLoss()
>>> x = torch.FloatTensor([[0.1, 0.2, 0.4, 0.8]])
>>> # for target y, only consider labels 3 and 0, not after label -1
>>> y = torch.LongTensor([[3, 0, -1, 1]])
>>> # 0.25 * ((1-(0.1-0.2)) + (1-(0.1-0.4)) + (1-(0.8-0.2)) + (1-(0.8-0.4)))
>>> loss(x, y)
tensor(0.85...)
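To make the formula concrete, here is a pure-Python reference for a single sample that reproduces the 0.85 from the example above. It is a teaching aid under the stated definition (targets read up to the first -1, a sum over target/non-target pairs, divided by x.size(0)), not PyTorch's implementation; the function name is hypothetical.

```python
def multilabel_margin_loss_ref(x, y):
    """Per-sample loss: x is a list of C scores, y a list of -1-padded target indices."""
    # Only the contiguous block of non-negative targets at the front counts.
    targets = []
    for t in y:
        if t < 0:
            break
        targets.append(t)
    # i ranges over classes that are not targets (i != y[j] for every j).
    non_targets = [i for i in range(len(x)) if i not in targets]
    total = sum(
        max(0.0, 1.0 - (x[t] - x[i])) for t in targets for i in non_targets
    )
    return total / len(x)  # divide by x.size(0)


print(round(multilabel_margin_loss_ref([0.1, 0.2, 0.4, 0.8], [3, 0, -1, 1]), 4))  # 0.85
```

With a batch of one sample, the default 'mean' reduction returns this per-sample value unchanged.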
- class torchwrench.nn.MultiLabelSoftMarginLoss(weight: Tensor | None = None, size_average=None, reduce=None, reduction: str = 'mean')[source]¶
Bases: _WeightedLoss
Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input \(x\) and target \(y\) of size \((N, C)\). For each sample in the minibatch:
\[\text{loss}(x, y) = - \frac{1}{C} \sum_i \left[ y[i] \log\left((1 + \exp(-x[i]))^{-1}\right) + (1-y[i]) \log\left(\frac{\exp(-x[i])}{1 + \exp(-x[i])}\right) \right]\]
where \(i \in \left\{0, \; \cdots , \; C - 1\right\}\) and \(y[i] \in \left\{0, \; 1\right\}\).
- Args:
- weight (Tensor, optional): a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones.
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input: \((N, C)\), where N is the batch size and C is the number of classes.
Target: \((N, C)\), label targets must have the same shape as the input.
Output: scalar. If reduction is 'none', then \((N)\).
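A pure-Python reference of the per-sample formula, with the two log terms written as log-sigmoid expressions. The helper name is hypothetical, the weight argument is left out, and no care is taken for extreme logits (math.exp can overflow for very negative inputs). With zero logits every class contributes log(1/2), so the loss is log 2.

```python
import math


def multilabel_soft_margin_ref(x, y):
    """Per-sample loss from the formula above: x are C logits, y are 0/1 labels."""
    c = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        log_sig = -math.log1p(math.exp(-xi))  # log((1 + exp(-x))^-1)
        log_one_minus_sig = -xi + log_sig     # log(exp(-x) / (1 + exp(-x)))
        total += yi * log_sig + (1 - yi) * log_one_minus_sig
    return -total / c  # the leading -1/C factor


print(round(multilabel_soft_margin_ref([0.0, 0.0], [1, 0]), 4))  # 0.6931
```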
- class torchwrench.nn.MultiMarginLoss(p: int = 1, margin: float = 1.0, weight: Tensor | None = None, size_average=None, reduce=None, reduction: str = 'mean')[source]¶
Bases: _WeightedLoss
Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input \(x\) (a 2D mini-batch Tensor) and target \(y\) (a 1D tensor of target class indices, \(0 \leq y \leq \text{x.size}(1)-1\)).
For each mini-batch sample, the loss in terms of the 1D input \(x\) and scalar target \(y\) is:
\[\text{loss}(x, y) = \frac{\sum_i \max(0, \text{margin} - x[y] + x[i])^p}{\text{x.size}(0)}\]
where \(i \in \left\{0, \; \cdots , \; \text{x.size}(0) - 1\right\}\) and \(i \neq y\).
Optionally, you can give non-equal weighting on the classes by passing a 1D weight tensor into the constructor. The loss function then becomes:
\[\text{loss}(x, y) = \frac{\sum_i w[y] * \max(0, \text{margin} - x[y] + x[i])^p}{\text{x.size}(0)}\]
- Args:
- p (int, optional): Has a default value of \(1\). \(1\) and \(2\) are the only supported values.
- margin (float, optional): Has a default value of \(1\).
- weight (Tensor, optional): a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones.
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input: \((N, C)\) or \((C)\), where \(N\) is the batch size and \(C\) is the number of classes.
Target: \((N)\) or \(()\), where each value is \(0 \leq \text{targets}[i] \leq C-1\).
Output: scalar. If reduction is 'none', then same shape as the target.
Examples:
>>> loss = nn.MultiMarginLoss()
>>> x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
>>> y = torch.tensor([3])
>>> # 0.25 * ((1-(0.8-0.1)) + (1-(0.8-0.2)) + (1-(0.8-0.4)))
>>> loss(x, y)
tensor(0.32...)
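The arithmetic in the example can be reproduced with a pure-Python reference of the per-sample formula, including the optional exponent p and the per-class weight w[y]. This is an illustrative sketch with a hypothetical function name, not the library's implementation.

```python
def multi_margin_loss_ref(x, y, p=1, margin=1.0, weight=None):
    """Per-sample loss: x is a list of C scores, y the target class index."""
    if p not in (1, 2):
        raise ValueError("only p=1 and p=2 are supported")
    w = 1.0 if weight is None else weight[y]  # w[y] scales every term
    total = sum(
        w * max(0.0, margin - x[y] + x[i]) ** p for i in range(len(x)) if i != y
    )
    return total / len(x)  # divide by x.size(0)


print(round(multi_margin_loss_ref([0.1, 0.2, 0.4, 0.8], 3), 4))  # 0.325
```

With p=2 the same input gives (0.3² + 0.4² + 0.6²) / 4 = 0.1525.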
- class torchwrench.nn.MultiheadAttention(embed_dim, num_heads, dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, kdim=None, vdim=None, batch_first=False, device=None, dtype=None)[source]¶
Bases: Module
Allows the model to jointly attend to information from different representation subspaces.
This MultiheadAttention layer implements the original architecture described in the Attention Is All You Need paper. The intent of this layer is as a reference implementation for foundational understanding and thus it contains only limited features relative to newer architectures. Given the fast pace of innovation in transformer-like architectures, we recommend exploring this tutorial to build efficient layers from building blocks in core or using higher level libraries from the PyTorch Ecosystem.
Multi-Head Attention is defined as:
\[\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1,\dots,\text{head}_h)W^O\]
where \(\text{head}_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V)\).
nn.MultiheadAttention will use the optimized implementations of scaled_dot_product_attention() when possible.
In addition to support for the new scaled_dot_product_attention() function, for speeding up inference, MHA will use fastpath inference with support for Nested Tensors, iff:
- self attention is being computed (i.e., query, key, and value are the same tensor).
- inputs are batched (3D) with batch_first==True
- either autograd is disabled (using torch.inference_mode or torch.no_grad) or no tensor argument requires_grad
- training is disabled (using .eval())
- add_bias_kv is False
- add_zero_attn is False
- kdim and vdim are equal to embed_dim
- if a NestedTensor is passed, neither key_padding_mask nor attn_mask is passed
- autocast is disabled
If the optimized inference fastpath implementation is in use, a NestedTensor can be passed for query/key/value to represent padding more efficiently than using a padding mask. In this case, a NestedTensor will be returned, and an additional speedup proportional to the fraction of the input that is padding can be expected.
- Args:
embed_dim: Total dimension of the model.
num_heads: Number of parallel attention heads. Note that embed_dim will be split across num_heads (i.e. each head will have dimension embed_dim // num_heads), so embed_dim must be divisible by num_heads.
dropout: Dropout probability on attn_output_weights. Default: 0.0 (no dropout).
bias: If specified, adds bias to input / output projection layers. Default: True.
add_bias_kv: If specified, adds bias to the key and value sequences at dim=0. Default: False.
add_zero_attn: If specified, adds a new batch of zeros to the key and value sequences at dim=1. Default: False.
kdim: Total number of features for keys. Default: None (uses kdim=embed_dim).
vdim: Total number of features for values. Default: None (uses vdim=embed_dim).
batch_first: If True, then the input and output tensors are provided as (batch, seq, feature). Default: False (seq, batch, feature).
Examples:
>>> multihead_attn = nn.MultiheadAttention(embed_dim, num_heads)
>>> attn_output, attn_output_weights = multihead_attn(query, key, value)
- forward(query: Tensor, key: Tensor, value: Tensor, key_padding_mask: Tensor | None = None, need_weights: bool = True, attn_mask: Tensor | None = None, average_attn_weights: bool = True, is_causal: bool = False) tuple[Tensor, Tensor | None][source]¶
Compute attention outputs using query, key, and value embeddings.
Supports optional parameters for padding, masks and attention weights.
- Args:
- query: Query embeddings of shape \((L, E_q)\) for unbatched input, \((L, N, E_q)\) when batch_first=False or \((N, L, E_q)\) when batch_first=True, where \(L\) is the target sequence length, \(N\) is the batch size, and \(E_q\) is the query embedding dimension embed_dim. Queries are compared against key-value pairs to produce the output. See “Attention Is All You Need” for more details.
- key: Key embeddings of shape \((S, E_k)\) for unbatched input, \((S, N, E_k)\) when batch_first=False or \((N, S, E_k)\) when batch_first=True, where \(S\) is the source sequence length, \(N\) is the batch size, and \(E_k\) is the key embedding dimension kdim. See “Attention Is All You Need” for more details.
- value: Value embeddings of shape \((S, E_v)\) for unbatched input, \((S, N, E_v)\) when batch_first=False or \((N, S, E_v)\) when batch_first=True, where \(S\) is the source sequence length, \(N\) is the batch size, and \(E_v\) is the value embedding dimension vdim. See “Attention Is All You Need” for more details.
- key_padding_mask: If specified, a mask of shape \((N, S)\) indicating which elements within key to ignore for the purpose of attention (i.e. treat as “padding”). For unbatched query, shape should be \((S)\). Binary and float masks are supported. For a binary mask, a True value indicates that the corresponding key value will be ignored for the purpose of attention. For a float mask, it will be directly added to the corresponding key value.
- need_weights: If specified, returns attn_output_weights in addition to attn_output. Set need_weights=False to use the optimized scaled_dot_product_attention and achieve the best performance for MHA. Default: True.
- attn_mask: If specified, a 2D or 3D mask preventing attention to certain positions. Must be of shape \((L, S)\) or \((N\cdot\text{num\_heads}, L, S)\), where \(N\) is the batch size, \(L\) is the target sequence length, and \(S\) is the source sequence length. A 2D mask will be broadcast across the batch while a 3D mask allows for a different mask for each entry in the batch. Binary and float masks are supported. For a binary mask, a True value indicates that the corresponding position is not allowed to attend. For a float mask, the mask values will be added to the attention weight. If both attn_mask and key_padding_mask are supplied, their types should match.
- average_attn_weights: If true, indicates that the returned attn_weights should be averaged across heads. Otherwise, attn_weights are provided separately per head. Note that this flag only has an effect when need_weights=True. Default: True (i.e. average weights across heads)
- is_causal: If specified, applies a causal mask as attention mask. Default: False. Warning: is_causal provides a hint that attn_mask is the causal mask. Providing incorrect hints can result in incorrect execution, including forward and backward compatibility.
- Outputs:
- attn_output - Attention outputs of shape \((L, E)\) when input is unbatched, \((L, N, E)\) when batch_first=False or \((N, L, E)\) when batch_first=True, where \(L\) is the target sequence length, \(N\) is the batch size, and \(E\) is the embedding dimension embed_dim.
- attn_output_weights - Only returned when need_weights=True (otherwise None is returned in its place). If average_attn_weights=True, returns attention weights averaged across heads of shape \((L, S)\) when input is unbatched or \((N, L, S)\), where \(N\) is the batch size, \(L\) is the target sequence length, and \(S\) is the source sequence length. If average_attn_weights=False, returns attention weights per head of shape \((\text{num\_heads}, L, S)\) when input is unbatched or \((N, \text{num\_heads}, L, S)\).
Note
batch_first argument is ignored for unbatched inputs.
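A shape check for the forward() contract above, using torch.nn.MultiheadAttention and assuming torchwrench.nn.MultiheadAttention accepts the same arguments. It uses batch_first=True, a boolean key_padding_mask and per-head weights; the sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embed_dim, num_heads = 16, 4  # embed_dim must be divisible by num_heads
N, L, S = 2, 5, 7             # batch size, target length, source length

mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
query = torch.randn(N, L, embed_dim)  # (N, L, E_q) because batch_first=True
key = torch.randn(N, S, embed_dim)    # (N, S, E_k)
value = torch.randn(N, S, embed_dim)  # (N, S, E_v)

# True marks padded key positions to ignore: the last 2 keys of sample 1.
key_padding_mask = torch.zeros(N, S, dtype=torch.bool)
key_padding_mask[1, -2:] = True

out, weights = mha(query, key, value, key_padding_mask=key_padding_mask,
                   average_attn_weights=False)
print(out.shape)      # torch.Size([2, 5, 16])
print(weights.shape)  # torch.Size([2, 4, 5, 7]), i.e. (N, num_heads, L, S)
```

Because dropout defaults to 0.0, each row of weights sums to 1, and the padded keys of sample 1 receive zero weight.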
- merge_masks(attn_mask: Tensor | None, key_padding_mask: Tensor | None, query: Tensor) tuple[Tensor | None, int | None][source]¶
Determine mask type and combine masks if necessary.
If only one mask is provided, that mask and the corresponding mask type will be returned. If both masks are provided, they will both be expanded to shape (batch_size, num_heads, seq_len, seq_len) and combined with logical or, and mask type 2 will be returned.
- Args:
attn_mask: attention mask of shape (seq_len, seq_len), mask type 0
key_padding_mask: padding mask of shape (batch_size, seq_len), mask type 1
query: query embeddings of shape (batch_size, seq_len, embed_dim)
- Returns:
merged_mask: merged mask
mask_type: merged mask type (0, 1, or 2)
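The combination described for merge_masks() can be sketched for the boolean case. This follows the description above only: it assumes self-attention (so both masks share seq_len) and boolean masks where True means "blocked", and it omits the mask-type return value. The function name is hypothetical and this is not the method's actual code.

```python
import torch


def merge_masks_sketch(attn_mask, key_padding_mask, num_heads):
    """Combine an (L, L) attention mask and an (N, L) padding mask into (N, num_heads, L, L)."""
    batch_size, seq_len = key_padding_mask.shape
    # The attention mask is shared by every sample and head.
    attn = attn_mask.view(1, 1, seq_len, seq_len).expand(batch_size, num_heads, -1, -1)
    # The padding mask blocks whole key columns, for every head and query row.
    pad = key_padding_mask.view(batch_size, 1, 1, seq_len).expand(-1, num_heads, seq_len, -1)
    return attn | pad  # logical or: blocked if either mask blocks it


causal = torch.triu(torch.ones(3, 3, dtype=torch.bool), diagonal=1)  # block future positions
padding = torch.tensor([[False, False, True], [False, False, False]])  # sample 0: last key padded
merged = merge_masks_sketch(causal, padding, num_heads=2)
print(merged.shape)                 # torch.Size([2, 2, 3, 3])
print(merged[0, 0].int().tolist())  # [[0, 1, 1], [0, 0, 1], [0, 0, 1]]
```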
- torchwrench.nn.MultihotToIndices¶
alias of MultihotToMultiIndices
- class torchwrench.nn.MultihotToMultiIndices(*, padding_idx: int | None = None)[source]¶
Bases: Module
For more information, see multihot_to_indices().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(multihot: Tensor) list | LongTensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.MultihotToMultinames(idx_to_name: Mapping[int, T_Name])[source]¶
Bases: Generic[T_Name], Module
For more information, see multihot_to_multinames().
- forward(multihot: Tensor) list[list[T_Name]][source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.MultilabelToPowerset(num_classes: int, max_set_size: int)[source]¶
Bases: Module
Module version of multilabel_to_powerset().
- forward(multilabel: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- torchwrench.nn.MultinamesToIndices¶
alias of MultinamesToMultiIndices
- class torchwrench.nn.MultinamesToMultiIndices(idx_to_name: Mapping[int, T_Name])[source]¶
Bases: Generic[T_Name], Module
For more information, see multinames_to_indices().
- forward(names: list[list[T_Name]]) list[list[int]][source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.MultinamesToMultihot(idx_to_name: Mapping[int, T_Name], *, device: device | None | 'default' | 'cuda_if_available' | str | int = None, dtype: dtype | None | 'default' | str | DTypeEnum = torch.bool)[source]¶
Bases: Generic[T_Name], Module
For more information, see multinames_to_multihot().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(names: list[list[T_Name]]) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.NDArrayToTensor(*, device: device | None | 'default' | 'cuda_if_available' | str | int = None, dtype: dtype | None | 'default' | str | DTypeEnum = None)[source]¶
Bases: Module
For more information, see ndarray_to_tensor().
- forward(x: ndarray) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.NLLLoss(weight: Tensor | None = None, size_average=None, ignore_index: int = -100, reduce=None, reduction: str = 'mean')[source]¶
Bases: _WeightedLoss
The negative log likelihood loss. It is useful to train a classification problem with C classes.
If provided, the optional argument weight should be a 1D Tensor assigning a weight to each of the classes. This is particularly useful when you have an unbalanced training set.
The input given through a forward call is expected to contain log-probabilities of each class. input has to be a Tensor of size either \((minibatch, C)\) or \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) for the K-dimensional case. The latter is useful for higher dimension inputs, such as computing NLL loss per-pixel for 2D images.
Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer in the last layer of your network. You may use CrossEntropyLoss instead, if you prefer not to add an extra layer.
The target that this loss expects should be a class index in the range \([0, C-1]\) where C = number of classes; if ignore_index is specified, this loss also accepts this class index (this index may not necessarily be in the class range).
The unreduced (i.e. with reduction set to 'none') loss can be described as:
\[\begin{split}\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \\ l_n = - w_{y_n} x_{n,y_n}, \\ w_{c} = \text{weight}[c] \cdot \mathbb{1}\{c \not= \text{ignore\_index}\},\end{split}\]
where \(x\) is the input, \(y\) is the target, \(w\) is the weight, and \(N\) is the batch size. If reduction is not 'none' (default 'mean'), then
\[\begin{split}\ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{\sum_{m=1}^N w_{y_m}} l_n, & \text{if reduction} = \text{`mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
- Args:
- weight (Tensor, optional): a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones.
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: None
- ignore_index (int, optional): Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets.
- reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: None
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the weighted mean of the output is taken, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input: \((N, C)\) or \((C)\), where C = number of classes, N = batch size, or \((N, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
Target: \((N)\) or \(()\), where each value is \(0 \leq \text{targets}[i] \leq C-1\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
Output: If reduction is 'none', shape \((N)\) or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss. Otherwise, scalar.
Examples:
>>> log_softmax = nn.LogSoftmax(dim=1)
>>> loss_fn = nn.NLLLoss()
>>> # input to NLLLoss is of size N x C = 3 x 5
>>> input = torch.randn(3, 5, requires_grad=True)
>>> # each element in target must have 0 <= value < C
>>> target = torch.tensor([1, 0, 4])
>>> loss = loss_fn(log_softmax(input), target)
>>> loss.backward()
>>>
>>> # 2D loss example (used, for example, with image inputs)
>>> N, C = 5, 4
>>> loss_fn = nn.NLLLoss()
>>> data = torch.randn(N, 16, 10, 10)
>>> conv = nn.Conv2d(16, C, (3, 3))
>>> log_softmax = nn.LogSoftmax(dim=1)
>>> # output of conv forward is of shape [N, C, 8, 8]
>>> output = log_softmax(conv(data))
>>> # each element in target must have 0 <= value < C
>>> target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
>>> # input to NLLLoss is of size N x C x height (8) x width (8)
>>> loss = loss_fn(output, target)
>>> loss.backward()
- class torchwrench.nn.NameToIndex(idx_to_name: Mapping[int, T_Name] | Sequence[T_Name])[source]¶
Bases: Generic[T_Name], Module
For more information, see name_to_index().
- forward(name: list[T_Name]) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.NameToOnehot(idx_to_name: Mapping[int, T_Name] | Sequence[T_Name], *, device: device | None | 'default' | 'cuda_if_available' | str | int = None, dtype: dtype | None | 'default' | str | DTypeEnum = torch.bool)[source]¶
Bases: Generic[T_Name], Module
For more information, see name_to_onehot().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(name: list[T_Name]) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Normalize(p: float = 2.0, dim: int = 1, eps: float = 1e-12)[source]¶
Bases: Module
Module version of normalize().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
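Assuming torchwrench's normalize() computes the same thing as torch.nn.functional.normalize (whose defaults p=2.0, dim=1, eps=1e-12 coincide with the constructor defaults above), each slice along dim is divided by max(||v||_p, eps). The sketch checks that formula; eps keeps an all-zero row at zero instead of producing NaN.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[3.0, 4.0], [0.0, 0.0]])

# Assumed semantics: v / max(||v||_p, eps) along `dim`.
p, dim, eps = 2.0, 1, 1e-12
norms = x.norm(p=p, dim=dim, keepdim=True).clamp_min(eps)
manual = x / norms

print(torch.allclose(manual, F.normalize(x, p=p, dim=dim, eps=eps)))  # True
```

The first row becomes [0.6, 0.8] (the 3-4-5 triangle), and the zero row is divided by eps and stays zero.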
- class torchwrench.nn.OnehotToIndex(dim: int = -1)[source]¶
Bases: Module
For more information, see onehot_to_index().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(onehot: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.OnehotToName(idx_to_name: Mapping[int, T_Name] | Sequence[T_Name], dim: int = -1)[source]¶
Bases: Generic[T_Name], Module
For more information, see onehot_to_name().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(onehot: Tensor) list[T_Name][source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
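The conversions behind OnehotToIndex, OnehotToName and NameToOnehot can be sketched in plain PyTorch, assuming each row has exactly one hot entry along dim=-1 and that names are looked up by position. The class names are hypothetical and this is not torchwrench's implementation.

```python
import torch
import torch.nn.functional as F

idx_to_name = ["cat", "dog", "bird"]  # hypothetical class names
onehot = torch.tensor([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=torch.bool)

# Index of the single True entry along the class dimension (dim=-1).
# argmax is not implemented for bool tensors, so cast to int first.
indices = onehot.int().argmax(dim=-1)
names = [idx_to_name[i] for i in indices.tolist()]

# Reverse direction (names -> one-hot), in the spirit of NameToOnehot.
name_to_idx = {name: i for i, name in enumerate(idx_to_name)}
back = F.one_hot(torch.tensor([name_to_idx[n] for n in names]),
                 num_classes=len(idx_to_name)).bool()

print(indices.tolist())           # [1, 2, 0]
print(names)                      # ['dog', 'bird', 'cat']
print(torch.equal(back, onehot))  # True
```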
- class torchwrench.nn.PReLU(num_parameters: int = 1, init: float = 0.25, device=None, dtype=None)[source]¶
Bases: Module
Applies the element-wise PReLU function.
\[\text{PReLU}(x) = \max(0,x) + a * \min(0,x)\]
or
\[\begin{split}\text{PReLU}(x) = \begin{cases} x, & \text{ if } x \ge 0 \\ ax, & \text{ otherwise } \end{cases}\end{split}\]
Here \(a\) is a learnable parameter. When called without arguments, nn.PReLU() uses a single parameter \(a\) across all input channels. If called with nn.PReLU(nChannels), a separate \(a\) is used for each input channel.
Note
Weight decay should not be used when learning \(a\) for good performance.
Note
The channel dim is the 2nd dim of input. When input has fewer than 2 dims, there is no channel dim and the number of channels is 1.
- Args:
- num_parameters (int): number of \(a\) to learn. Although it takes an int as input, only two values are legitimate: 1, or the number of channels at input. Default: 1
- init (float): the initial value of \(a\). Default: 0.25
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
- Attributes:
weight (Tensor): the learnable weights of shape (num_parameters).
Examples:
>>> m = nn.PReLU()
>>> input = torch.randn(2)
>>> output = m(input)
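The PReLU formula and the shared versus per-channel parameter can be checked directly. This sketch uses torch.nn.PReLU (assuming torchwrench.nn.PReLU matches it) on an (N, C) input, where the channel dim is dim 1; the per-channel values 0.1 and 0.5 are arbitrary.

```python
import torch
import torch.nn as nn

x = torch.tensor([[-2.0, 1.0], [3.0, -4.0]])  # (N=2, C=2); channel dim is dim 1

shared = nn.PReLU()  # one parameter a = 0.25 for all channels
per_channel = nn.PReLU(num_parameters=2)
with torch.no_grad():
    per_channel.weight.copy_(torch.tensor([0.1, 0.5]))  # a_0 = 0.1, a_1 = 0.5

# PReLU(x) = max(0, x) + a * min(0, x); a (shape (C,)) broadcasts over dim 1 here.
expected_shared = x.clamp(min=0) + 0.25 * x.clamp(max=0)
expected_per_channel = x.clamp(min=0) + torch.tensor([0.1, 0.5]) * x.clamp(max=0)

print(torch.allclose(shared(x), expected_shared))            # True
print(torch.allclose(per_channel(x), expected_per_channel))  # True
```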
- class torchwrench.nn.PadAndCropDim(target_length: int, align: 'left' | 'right' | 'center' | 'random' = 'left', pad_value: int | float | bool | Tensor0D | Callable[[Tensor], int | float | bool] = 0.0, dim: int = -1, mode: 'constant' | 'reflect' | 'replicate' | 'circular' = 'constant', generator: Generator | None | 'default' | int = None)[source]¶
Bases: Module
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.PadAndStackRec(pad_value: int | float | bool = 0, *, align: 'left' | 'right' | 'center' | 'random' = 'left', device: device | None | 'default' | 'cuda_if_available' | str | int = None, dtype: dtype | None | 'default' | str | DTypeEnum = None)[source]¶
Bases: Module
For more information, see pad_and_stack_rec().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(sequence: Tensor | int | float | tuple | list) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.PadDim(target_length: int, *, dim: int = -1, align: 'left' | 'right' | 'center' | 'random' = 'left', pad_value: int | float | bool | Tensor0D | Callable[[Tensor], int | float | bool] = 0.0, mode: 'constant' | 'reflect' | 'replicate' | 'circular' = 'constant', generator: Generator | None | 'default' | int = None)[source]¶
Bases: Module
For more information, see pad_dim().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.PadDims(target_lengths: Iterable[int], *, dims: Iterable[int] | None | 'auto' = None, aligns: 'left' | 'right' | 'center' | 'random' | Iterable['left' | 'right' | 'center' | 'random'] = 'left', pad_value: int | float | bool | Tensor0D | Callable[[Tensor], int | float | bool] = 0.0, mode: 'constant' | 'reflect' | 'replicate' | 'circular' = 'constant', generator: Generator | None | 'default' | int = None)[source]¶
Bases: Module
For more information, see pad_dims().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.PairwiseDistance(p: float = 2.0, eps: float = 1e-06, keepdim: bool = False)[source]¶
Bases: Module
Computes the pairwise distance between input vectors, or between corresponding rows of input matrices.
Distances are computed using the p-norm, with the constant eps added to avoid division by zero if p is negative, i.e.:
\[\mathrm{dist}\left(x, y\right) = \left\Vert x-y + \epsilon e \right\Vert_p,\]
where \(e\) is the vector of ones and the p-norm is given by
\[\Vert x \Vert _p = \left( \sum_{i=1}^n \vert x_i \vert ^ p \right) ^ {1/p}.\]
- Args:
p (real, optional): the norm degree. Can be negative. Default: 2
eps (float, optional): small value to avoid division by zero. Default: 1e-6
keepdim (bool, optional): determines whether or not to keep the vector dimension. Default: False
- Shape:
Input1: \((N, D)\) or \((D)\), where N = batch dimension and D = vector dimension
Input2: \((N, D)\) or \((D)\), same shape as Input1
Output: \((N)\) or \(()\) based on input dimension. If keepdim is True, then \((N, 1)\) or \((1)\) based on input dimension.
- Examples:
>>> pdist = nn.PairwiseDistance(p=2)
>>> input1 = torch.randn(100, 128)
>>> input2 = torch.randn(100, 128)
>>> output = pdist(input1, input2)
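To make the formula concrete, the following dependency-free sketch computes \(\Vert x - y + \epsilon e \Vert_p\) for plain Python lists, with one distance per row for batched (N, D) inputs. The function names are hypothetical and this is only an illustration of the formula, not the library's code path.

```python
from typing import List, Sequence


def pairwise_distance(
    x: Sequence[float], y: Sequence[float], p: float = 2.0, eps: float = 1e-6
) -> float:
    """||x - y + eps * e||_p for two vectors of the same length D."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # eps is added to every component of the difference before taking the norm.
    return sum(abs(a - b + eps) ** p for a, b in zip(x, y)) ** (1.0 / p)


def batched_pairwise_distance(
    xs: Sequence[Sequence[float]],
    ys: Sequence[Sequence[float]],
    p: float = 2.0,
    eps: float = 1e-6,
) -> List[float]:
    """Row-wise distances for (N, D) inputs, giving an (N,) result."""
    return [pairwise_distance(x, y, p, eps) for x, y in zip(xs, ys)]
```

With eps=0.0, the distance between [0, 0] and [3, 4] is 5.0. With a negative p and identical vectors, eps is what keeps every term non-zero; without it, Python would evaluate 0.0 ** p for p < 0 and raise ZeroDivisionError.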
- class torchwrench.nn.ParameterDict(parameters: Any = None)[source]¶
Bases: Module
Holds parameters in a dictionary.
ParameterDict can be indexed like a regular Python dictionary, but the Parameters it contains are properly registered and will be visible to all Module methods. Other objects are treated as they would be by a regular Python dictionary.
ParameterDict is an ordered dictionary. update() with other unordered mapping types (e.g., Python’s plain dict) does not preserve the order of the merged mapping. On the other hand, an OrderedDict or another ParameterDict will preserve its ordering.
Note that the constructor, assigning an element of the dictionary, and the update() method will convert any Tensor into a Parameter.
- Args:
parameters (iterable, optional): a mapping (dictionary) of (string : Any) or an iterable of key-value pairs of type (string, Any)
Example:
    class MyModule(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.params = nn.ParameterDict(
                {
                    "left": nn.Parameter(torch.randn(5, 10)),
                    "right": nn.Parameter(torch.randn(5, 10)),
                }
            )

        def forward(self, x, choice):
            x = self.params[choice].mm(x)
            return x
- copy() ParameterDict[source]¶
Return a copy of this ParameterDict instance.
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- fromkeys(keys: Iterable[str], default: Any | None = None) ParameterDict[source]¶
Return a new ParameterDict with the keys provided.
- Args:
keys (iterable, string): keys to make the new ParameterDict from
default (Parameter, optional): value to set for all keys
- get(key: str, default: Any | None = None) Any[source]¶
Return the parameter associated with key if present; otherwise, return default if provided, or None if not.
- Args:
key (str): key to get from the ParameterDict
default (Parameter, optional): value to return if key is not present
- pop(key: str) Any[source]¶
Remove key from the ParameterDict and return its parameter.
- Args:
key (str): key to pop from the ParameterDict
- popitem() tuple[str, Any][source]¶
Remove and return the last inserted (key, parameter) pair from the ParameterDict.
- setdefault(key: str, default: Any | None = None) Any[source]¶
Set the default for a key in the ParameterDict.
If key is in the ParameterDict, return its value. If not, insert key with a parameter default and return default. default defaults to None.
- Args:
key (str): key to set the default for
default (Any): the parameter set to the key
- update(parameters: Mapping[str, Any] | ParameterDict) None[source]¶
Update the ParameterDict with key-value pairs from parameters, overwriting existing keys.
Note
If parameters is an OrderedDict, a ParameterDict, or an iterable of key-value pairs, the order of new elements in it is preserved.
- Args:
parameters (iterable): a mapping (dictionary) from string to Parameter, or an iterable of key-value pairs of type (string, Parameter)
- class torchwrench.nn.ParameterList(values: Iterable[Any] | None = None)[source]¶
Bases: Module
Holds parameters in a list.
ParameterList can be used like a regular Python list, but Tensors that are Parameter are properly registered and will be visible to all Module methods.
Note that the constructor, assigning an element of the list, the append() method, and the extend() method will convert any Tensor into a Parameter.
- Args:
values (iterable, optional): an iterable of elements to add to the list.
Example:
    class MyModule(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.params = nn.ParameterList(
                [nn.Parameter(torch.randn(10, 10)) for i in range(10)]
            )

        def forward(self, x):
            # ParameterList can act as an iterable, or be indexed using ints
            for i, p in enumerate(self.params):
                x = self.params[i // 2].mm(x) + p.mm(x)
            return x
- append(value: Any) Self[source]¶
Append a given value at the end of the list.
- Args:
value (Any): value to append
- class torchwrench.nn.Permute(*args: int)[source]¶
Bases: Module
Module version of permute().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.PixelShuffle(upscale_factor: int)[source]¶
Bases: Module
Rearrange elements in a tensor according to an upscaling factor.
Rearranges elements in a tensor of shape \((*, C \times r^2, H, W)\) to a tensor of shape \((*, C, H \times r, W \times r)\), where r is an upscale factor.
This is useful for implementing efficient sub-pixel convolution with a stride of \(1/r\).
See the paper: Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network by Shi et al. (2016) for more details.
- Args:
upscale_factor (int): factor to increase spatial resolution by
- Shape:
Input: \((*, C_{in}, H_{in}, W_{in})\), where * is zero or more batch dimensions
Output: \((*, C_{out}, H_{out}, W_{out})\), where
\[C_{out} = C_{in} \div \text{upscale\_factor}^2\]
\[H_{out} = H_{in} \times \text{upscale\_factor}\]
\[W_{out} = W_{in} \times \text{upscale\_factor}\]
Examples:
>>> pixel_shuffle = nn.PixelShuffle(3)
>>> input = torch.randn(1, 9, 4, 4)
>>> output = pixel_shuffle(input)
>>> print(output.size())
torch.Size([1, 1, 12, 12])
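The shape change alone does not show where each value goes. The sketch below spells out one index mapping for a single sample stored as nested lists indexed [channel][row][col]: out[c][h*r + i][w*r + j] = in[c*r² + i*r + j][h][w]. This corresponds to viewing the channels as (C, r, r) and interleaving the two r axes with H and W, which is my understanding of PyTorch's convention; treat it as an illustrative assumption rather than the library's implementation. The name pixel_shuffle here is a plain-Python stand-in.

```python
from typing import List

Grid = List[List[List[float]]]  # indexed as [channel][row][col]


def pixel_shuffle(x: Grid, r: int) -> Grid:
    """(C * r**2, H, W) -> (C, H * r, W * r) for one sample (no batch dims)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("the channel count must be divisible by r**2")
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):  # row offset inside each r x r output block
            for j in range(r):  # column offset inside each r x r output block
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out
```

For example, four 1x1 channels holding 0, 1, 2, 3 with r = 2 become a single 2x2 channel [[0, 1], [2, 3]].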
- class torchwrench.nn.PixelUnshuffle(downscale_factor: int)[source]¶
Bases: Module
Reverse the PixelShuffle operation.
Reverses the PixelShuffle operation by rearranging elements in a tensor of shape \((*, C, H \times r, W \times r)\) to a tensor of shape \((*, C \times r^2, H, W)\), where r is a downscale factor.
See the paper: Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network by Shi et al. (2016) for more details.
- Args:
downscale_factor (int): factor to decrease spatial resolution by
- Shape:
Input: \((*, C_{in}, H_{in}, W_{in})\), where * is zero or more batch dimensions
Output: \((*, C_{out}, H_{out}, W_{out})\), where
\[C_{out} = C_{in} \times \text{downscale\_factor}^2\]
\[H_{out} = H_{in} \div \text{downscale\_factor}\]
\[W_{out} = W_{in} \div \text{downscale\_factor}\]
Examples:
>>> pixel_unshuffle = nn.PixelUnshuffle(3)
>>> input = torch.randn(1, 1, 12, 12)
>>> output = pixel_unshuffle(input)
>>> print(output.size())
torch.Size([1, 9, 4, 4])
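As a counterpart to the shape formulas, here is a plain-Python sketch of the inverse index mapping for one sample stored as nested lists indexed [channel][row][col]: out[c*r² + i*r + j][h][w] = in[c][h*r + i][w*r + j]. It assumes the same channel ordering as PyTorch's PixelShuffle (my understanding, not taken from this page). The function name is a stand-in and not the library's code.

```python
from typing import List

Grid = List[List[List[float]]]  # indexed as [channel][row][col]


def pixel_unshuffle(x: Grid, r: int) -> Grid:
    """(C, H * r, W * r) -> (C * r**2, H, W) for one sample (no batch dims)."""
    c, h_in, w_in = len(x), len(x[0]), len(x[0][0])
    if h_in % r != 0 or w_in % r != 0:
        raise ValueError("height and width must be divisible by r")
    h, w = h_in // r, w_in // r
    out = [[[0.0] * w for _ in range(h)] for _ in range(c * r * r)]
    for ch in range(c):
        for i in range(r):  # row offset inside each r x r input block
            for j in range(r):  # column offset inside each r x r input block
                dst = out[ch * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        dst[y][z] = x[ch][y * r + i][z * r + j]
    return out
```

Applied to a single 2x2 channel [[0, 1], [2, 3]] with r = 2, this yields four 1x1 channels holding 0, 1, 2, 3, undoing the corresponding shuffle.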
- class torchwrench.nn.PoissonNLLLoss(log_input: bool = True, full: bool = False, size_average=None, eps: float = 1e-08, reduce=None, reduction: str = 'mean')[source]¶
Bases: _Loss
Negative log likelihood loss with Poisson distribution of target.
The loss can be described as:
\[ \begin{align}\begin{aligned}\text{target} \sim \mathrm{Poisson}(\text{input})\\\text{loss}(\text{input}, \text{target}) = \text{input} - \text{target} * \log(\text{input}) + \log(\text{target!})\end{aligned}\end{align} \]
The last term can be omitted or approximated with the Stirling formula. The approximation is used for target values greater than 1; for targets less than or equal to 1, zeros are added to the loss.
- Args:
log_input (bool, optional): if True, the loss is computed as \(\exp(\text{input}) - \text{target}*\text{input}\); if False, the loss is \(\text{input} - \text{target}*\log(\text{input}+\text{eps})\).
full (bool, optional): whether to compute the full loss, i.e. to add the Stirling approximation term
\[\text{target}*\log(\text{target}) - \text{target} + 0.5 * \log(2\pi\text{target}).\]
size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
eps (float, optional): small value to avoid evaluation of \(\log(0)\) when log_input = False. Default: 1e-8
reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
Examples:
>>> loss = nn.PoissonNLLLoss()
>>> log_input = torch.randn(5, 2, requires_grad=True)
>>> target = torch.randn(5, 2)
>>> output = loss(log_input, target)
>>> output.backward()
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Target: \((*)\), same shape as the input.
Output: scalar by default. If reduction is 'none', then \((*)\), the same shape as the input.
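The per-element formulas above can be checked by hand with a small, dependency-free sketch. It follows the documented cases (log_input, the eps guard, the Stirling term added only for targets greater than 1) and the three reduction modes. It is an illustration of the math under those documented rules, not the library's implementation, and it ignores the deprecated size_average/reduce arguments.

```python
import math
from typing import List, Sequence, Union


def poisson_nll_loss(
    input: Sequence[float],
    target: Sequence[float],
    log_input: bool = True,
    full: bool = False,
    eps: float = 1e-8,
    reduction: str = "mean",
) -> Union[float, List[float]]:
    losses = []
    for x, t in zip(input, target):
        if log_input:
            # `input` holds log-rates: exp(input) - target * input
            loss = math.exp(x) - t * x
        else:
            # `input` holds rates; eps guards against log(0)
            loss = x - t * math.log(x + eps)
        if full and t > 1:
            # Stirling approximation of log(target!), only for targets > 1
            loss += t * math.log(t) - t + 0.5 * math.log(2 * math.pi * t)
        losses.append(loss)
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"Invalid reduction={reduction!r}")
```

For example, with log_input=True, an input of 0.0 and a target of 1.0 give exp(0) - 1 * 0 = 1.0, and full=True adds nothing because the target is not greater than 1.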
- class torchwrench.nn.PositionalEncoding(emb_size: int, dropout_p: float, maxlen: int = 5000, device: device | None | 'default' | 'cuda_if_available' | str | int = None)[source]¶
Bases: Module
- forward(token_emb: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Pow(exponent: int | float | bool | Tensor)[source]¶
Bases: Module
Module version of pow().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.PowersetToMultilabel(num_classes: int, max_set_size: int, soft: bool = False)[source]¶
Bases: Module
Module version of powerset_to_multilabel().
- forward(powerset: Tensor, soft: bool | None = None) Tensor3D[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ProbsToIndex(dim: int = -1)[source]¶
Bases: Module
For more information, see probs_to_index().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(probs: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- torchwrench.nn.ProbsToIndices¶
alias of ProbsToMultiIndices
- class torchwrench.nn.ProbsToMultiIndices(threshold: float | Tensor, *, padding_idx: int | None = None)[source]¶
Bases: Module
For more information, see probs_to_indices().
- forward(probs: Tensor) list | LongTensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ProbsToMultihot(threshold: float | Tensor, *, device: device | None | 'default' | 'cuda_if_available' | str | int = None, dtype: dtype | None | 'default' | str | DTypeEnum = torch.bool)[source]¶
Bases: Module
For more information, see probs_to_multihot().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(probs: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ProbsToMultinames(threshold: float | Tensor, idx_to_name: Mapping[int, T_Name])[source]¶
Bases: Generic[T_Name], Module
For more information, see probs_to_multinames().
- forward(probs: Tensor) list[list[T_Name]][source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ProbsToName(idx_to_name: Mapping[int, T_Name] | Sequence[T_Name], dim: int = -1)[source]¶
Bases: Generic[T_Name], Module
For more information, see probs_to_name().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(probs: Tensor) list[T_Name][source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ProbsToOnehot(*, dim: int = -1, device: device | None | 'default' | 'cuda_if_available' | str | int = None, dtype: dtype | None | 'default' | str | DTypeEnum = torch.bool)[source]¶
Bases: Module
For more information, see probs_to_onehot().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(probs: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.RNN(input_size, hidden_size, num_layers=1, nonlinearity='tanh', bias=True, batch_first=False, dropout=0.0, bidirectional=False, device=None, dtype=None)[source]¶
Bases: RNNBase
Apply a multi-layer Elman RNN with \(\tanh\) or \(\text{ReLU}\) non-linearity to an input sequence. For each element in the input sequence, each layer computes the following function:
\[h_t = \tanh(x_t W_{ih}^T + b_{ih} + h_{t-1}W_{hh}^T + b_{hh})\]
where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the same layer at time t-1 or the initial hidden state at time 0. For layers after the first, \(x_t\) is the hidden state \(h_t\) of the layer below. If nonlinearity is 'relu', then \(\text{ReLU}\) is used instead of \(\tanh\).
    # Efficient implementation equivalent to the following with bidirectional=False
    rnn = nn.RNN(input_size, hidden_size, num_layers)
    params = dict(rnn.named_parameters())

    def forward(x, hx=None, batch_first=False):
        if batch_first:
            x = x.transpose(0, 1)
        seq_len, batch_size, _ = x.size()
        if hx is None:
            hx = torch.zeros(rnn.num_layers, batch_size, rnn.hidden_size)
        h_t_minus_1 = hx.clone()
        h_t = hx.clone()
        output = []
        for t in range(seq_len):
            for layer in range(rnn.num_layers):
                input_t = x[t] if layer == 0 else h_t[layer - 1]
                h_t[layer] = torch.tanh(
                    input_t @ params[f"weight_ih_l{layer}"].T
                    + h_t_minus_1[layer] @ params[f"weight_hh_l{layer}"].T
                    + params[f"bias_hh_l{layer}"]
                    + params[f"bias_ih_l{layer}"]
                )
            output.append(h_t[-1].clone())
            h_t_minus_1 = h_t.clone()
        output = torch.stack(output)
        if batch_first:
            output = output.transpose(0, 1)
        return output, h_t
- Args:
input_size: The number of expected features in the input x
hidden_size: The number of features in the hidden state h
num_layers: Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
nonlinearity: The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True
batch_first: If True, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to hidden or cell states. See the Inputs/Outputs sections below for details. Default: False
dropout: If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
bidirectional: If True, becomes a bidirectional RNN. Default: False
- Inputs: input, hx
input: tensor of shape \((L, H_{in})\) for unbatched input, \((L, N, H_{in})\) when batch_first=False or \((N, L, H_{in})\) when batch_first=True containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.
hx: tensor of shape \((D * \text{num\_layers}, H_{out})\) for unbatched input or \((D * \text{num\_layers}, N, H_{out})\) containing the initial hidden state for the input sequence batch. Defaults to zeros if not provided.
where:
\[\begin{split}\begin{aligned} N ={} & \text{batch size} \\ L ={} & \text{sequence length} \\ D ={} & 2 \text{ if bidirectional=True otherwise } 1 \\ H_{in} ={} & \text{input\_size} \\ H_{out} ={} & \text{hidden\_size} \end{aligned}\end{split}\]
- Outputs: output, h_n
output: tensor of shape \((L, D * H_{out})\) for unbatched input, \((L, N, D * H_{out})\) when batch_first=False or \((N, L, D * H_{out})\) when batch_first=True containing the output features (h_t) from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
h_n: tensor of shape \((D * \text{num\_layers}, H_{out})\) for unbatched input or \((D * \text{num\_layers}, N, H_{out})\) containing the final hidden state for each element in the batch.
- Attributes:
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer,
of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer,
of shape (hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer,
of shape (hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer,
of shape (hidden_size)
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
Note
For bidirectional RNNs, forward and backward are directions 0 and 1 respectively. Example of splitting the output layers when batch_first=False: output.view(seq_len, batch, num_directions, hidden_size).
Note
The batch_first argument is ignored for unbatched inputs.
Examples:
>>> rnn = nn.RNN(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> output, hn = rnn(input, h0)
- class torchwrench.nn.RNNBase(mode: str, input_size: int, hidden_size: int, num_layers: int = 1, bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, proj_size: int = 0, device=None, dtype=None)[source]¶
Bases: Module
Base class for RNN modules (RNN, LSTM, GRU).
Implements aspects of RNNs shared by the RNN, LSTM, and GRU classes, such as module initialization and utility methods for parameter storage management.
Note
The forward method is not implemented by the RNNBase class.
Note
LSTM and GRU classes override some methods implemented by RNNBase.
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- flatten_parameters() None[source]¶
Reset the parameter data pointers so that they can use faster code paths.
Right now, this works only if the module is on the GPU and cuDNN is enabled. Otherwise, it’s a no-op.
- class torchwrench.nn.RNNCell(input_size: int, hidden_size: int, bias: bool = True, nonlinearity: str = 'tanh', device=None, dtype=None)[source]¶
Bases: RNNCellBase
An Elman RNN cell with tanh or ReLU non-linearity.
\[h' = \tanh(W_{ih} x + b_{ih} + W_{hh} h + b_{hh})\]
If nonlinearity is 'relu', then ReLU is used in place of tanh.
- Args:
input_size: The number of expected features in the input x
hidden_size: The number of features in the hidden state h
bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True
nonlinearity: The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- Inputs: input, hidden
input: tensor containing input features
hidden: tensor containing the initial hidden state. Defaults to zero if not provided.
- Outputs: h’
h’ of shape (batch, hidden_size): tensor containing the next hidden state for each element in the batch
- Shape:
input: \((N, H_{in})\) or \((H_{in})\) tensor containing input features where \(H_{in}\) = input_size.
hidden: \((N, H_{out})\) or \((H_{out})\) tensor containing the initial hidden state where \(H_{out}\) = hidden_size. Defaults to zero if not provided.
output: \((N, H_{out})\) or \((H_{out})\) tensor containing the next hidden state.
- Attributes:
- weight_ih: the learnable input-hidden weights, of shape
(hidden_size, input_size)
- weight_hh: the learnable hidden-hidden weights, of shape
(hidden_size, hidden_size)
bias_ih: the learnable input-hidden bias, of shape (hidden_size)
bias_hh: the learnable hidden-hidden bias, of shape (hidden_size)
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
Examples:
>>> rnn = nn.RNNCell(10, 20)
>>> input = torch.randn(6, 3, 10)
>>> hx = torch.randn(3, 20)
>>> output = []
>>> for i in range(6):
...     hx = rnn(input[i], hx)
...     output.append(hx)
- forward(input: Tensor, hx: Tensor | None = None) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
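To spell out the cell equation \(h' = \tanh(W_{ih} x + b_{ih} + W_{hh} h + b_{hh})\) for a single unbatched input, here is a dependency-free sketch using nested lists as matrices (W_ih of shape (hidden_size, input_size), W_hh of shape (hidden_size, hidden_size)). The function names are hypothetical; with bias=False you would pass zero vectors for b_ih and b_hh. It mirrors the documented math, not the library's kernels.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def _matvec(m: Matrix, v: Sequence[float]) -> Vector:
    # (rows x cols) @ (cols,) -> (rows,)
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_cell_step(
    x: Sequence[float],
    h: Optional[Sequence[float]],
    w_ih: Matrix,  # shape (hidden_size, input_size)
    w_hh: Matrix,  # shape (hidden_size, hidden_size)
    b_ih: Sequence[float],
    b_hh: Sequence[float],
    nonlinearity: str = "tanh",
) -> Vector:
    if h is None:
        h = [0.0] * len(b_hh)  # the hidden state defaults to zeros
    pre = [
        a + b + c + d
        for a, b, c, d in zip(_matvec(w_ih, x), b_ih, _matvec(w_hh, h), b_hh)
    ]
    if nonlinearity == "tanh":
        return [math.tanh(v) for v in pre]
    if nonlinearity == "relu":
        return [max(0.0, v) for v in pre]
    raise ValueError(f"Unknown nonlinearity={nonlinearity!r}")
```

Unrolling over a sequence works like the RNNCell example above: feed each step's output back in as h for the next step.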
- class torchwrench.nn.RNNCellBase(input_size: int, hidden_size: int, bias: bool, num_chunks: int, device=None, dtype=None)[source]¶
Bases: Module
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- class torchwrench.nn.RReLU(lower: float = 0.125, upper: float = 0.3333333333333333, inplace: bool = False)[source]¶
Bases: Module
Applies the randomized leaky rectified linear unit function, element-wise.
Method described in the paper: Empirical Evaluation of Rectified Activations in Convolutional Network.
The function is defined as:
\[\begin{split}\text{RReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ ax & \text{ otherwise } \end{cases}\end{split}\]
where \(a\) is randomly sampled from the uniform distribution \(\mathcal{U}(\text{lower}, \text{upper})\) during training, while during evaluation \(a\) is fixed to \(a = \frac{\text{lower} + \text{upper}}{2}\).
- Args:
lower: lower bound of the uniform distribution. Default: \(\frac{1}{8}\)
upper: upper bound of the uniform distribution. Default: \(\frac{1}{3}\)
inplace: can optionally do the operation in-place. Default: False
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.RReLU(0.1, 0.3)
>>> input = torch.randn(2)
>>> output = m(input)
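The difference between training and evaluation behaviour is easy to miss, so here is a dependency-free sketch of the definition above. Assumption: in training mode it draws a fresh slope \(a\) for every negative element, which is my understanding of the PyTorch behaviour; in evaluation mode it uses the fixed mean slope. The function name rrelu is a stand-in, not the library's API.

```python
import random
from typing import List, Optional, Sequence


def rrelu(
    xs: Sequence[float],
    lower: float = 1 / 8,
    upper: float = 1 / 3,
    training: bool = False,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Non-negative inputs pass through; negative inputs are scaled by a slope a."""
    if training:
        rng = rng or random.Random()
        # Training: sample a ~ U(lower, upper) independently per element.
        return [x if x >= 0 else rng.uniform(lower, upper) * x for x in xs]
    # Evaluation: a is fixed to the midpoint (lower + upper) / 2.
    a = (lower + upper) / 2
    return [x if x >= 0 else a * x for x in xs]
```

With lower=0.1 and upper=0.3, evaluation mode maps -4.0 to -0.8, whereas training mode maps it to a random value between -1.2 and -0.4.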
- class torchwrench.nn.ReLU(inplace: bool = False)[source]¶
Bases: Module
Applies the rectified linear unit function element-wise.
\(\text{ReLU}(x) = (x)^+ = \max(0, x)\)
- Args:
inplace: can optionally do the operation in-place. Default: False
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.ReLU()
>>> input = torch.randn(2)
>>> output = m(input)

An implementation of CReLU - https://arxiv.org/abs/1603.05201

>>> m = nn.ReLU()
>>> input = torch.randn(2).unsqueeze(0)
>>> output = torch.cat((m(input), m(-input)))
- class torchwrench.nn.ReLU6(inplace: bool = False)[source]¶
Bases: Hardtanh
Applies the ReLU6 function element-wise.
\[\text{ReLU6}(x) = \min(\max(0,x), 6)\]
- Args:
inplace: can optionally do the operation in-place. Default: False
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.ReLU6()
>>> input = torch.randn(2)
>>> output = m(input)
- class torchwrench.nn.Real(*args: Any, **kwargs: Any)[source]¶
Bases: Module
Module version of real().
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ReflectionPad1d(padding: int | tuple[int, int])[source]¶
Bases: _ReflectionPadNd
Pads the input tensor using the reflection of the input boundary.
For N-dimensional padding, use torch.nn.functional.pad().
- Args:
- padding (int, tuple): the size of the padding. If it is an int, uses the same padding in all boundaries. If a 2-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\)). Note that the padding size should be less than the corresponding input dimension.
- Shape:
Input: \((C, W_{in})\) or \((N, C, W_{in})\).
Output: \((C, W_{out})\) or \((N, C, W_{out})\), where
\(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
Examples:
>>> m = nn.ReflectionPad1d(2)
>>> input = torch.arange(8, dtype=torch.float).reshape(1, 2, 4)
>>> input
tensor([[[0., 1., 2., 3.],
         [4., 5., 6., 7.]]])
>>> m(input)
tensor([[[2., 1., 0., 1., 2., 3., 2., 1.],
         [6., 5., 4., 5., 6., 7., 6., 5.]]])
>>> # using different paddings for different sides
>>> m = nn.ReflectionPad1d((3, 1))
>>> m(input)
tensor([[[3., 2., 1., 0., 1., 2., 3., 2.],
         [7., 6., 5., 4., 5., 6., 7., 6.]]])
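Reflection padding mirrors the sequence about its first and last elements without repeating them. That is why the padding must be smaller than the input length: a pad of n or more would need an element that does not exist. The dependency-free sketch below reproduces the outputs of the example above for a single row. The function name is hypothetical; this is an illustration of the rule, not the library's implementation.

```python
from typing import List, Sequence, Tuple, Union


def reflection_pad_1d(
    row: Sequence[float], padding: Union[int, Tuple[int, int]]
) -> List[float]:
    left, right = (padding, padding) if isinstance(padding, int) else padding
    n = len(row)
    if left >= n or right >= n:
        raise ValueError("padding must be smaller than the input length")
    # Mirror about the first element without repeating it: row[left], ..., row[1]
    prefix = [row[i] for i in range(left, 0, -1)]
    # Mirror about the last element without repeating it: row[n-2], row[n-3], ...
    suffix = [row[n - 2 - i] for i in range(right)]
    return prefix + list(row) + suffix
```

For [0, 1, 2, 3] with padding 2, the prefix is [2, 1] and the suffix is [2, 1], giving [2, 1, 0, 1, 2, 3, 2, 1], as in the first row of the example.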
- class torchwrench.nn.ReflectionPad2d(padding: int | tuple[int, int, int, int])[source]¶
Bases: _ReflectionPadNd
Pads the input tensor using the reflection of the input boundary.
For N-dimensional padding, use torch.nn.functional.pad().
- Args:
- padding (int, tuple): the size of the padding. If it is an int, uses the same padding in all boundaries. If a 4-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\)). Note that the padding size should be less than the corresponding input dimension.
- Shape:
Input: \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\).
Output: \((N, C, H_{out}, W_{out})\) or \((C, H_{out}, W_{out})\), where
\(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\)
\(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
Examples:
>>> # xdoctest: +IGNORE_WANT("not sure why xdoctest is choking on this")
>>> m = nn.ReflectionPad2d(2)
>>> input = torch.arange(9, dtype=torch.float).reshape(1, 1, 3, 3)
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> m(input)
tensor([[[[8., 7., 6., 7., 8., 7., 6.],
          [5., 4., 3., 4., 5., 4., 3.],
          [2., 1., 0., 1., 2., 1., 0.],
          [5., 4., 3., 4., 5., 4., 3.],
          [8., 7., 6., 7., 8., 7., 6.],
          [5., 4., 3., 4., 5., 4., 3.],
          [2., 1., 0., 1., 2., 1., 0.]]]])
>>> # using different paddings for different sides
>>> m = nn.ReflectionPad2d((1, 1, 2, 0))
>>> m(input)
tensor([[[[7., 6., 7., 8., 7.],
          [4., 3., 4., 5., 4.],
          [1., 0., 1., 2., 1.],
          [4., 3., 4., 5., 4.],
          [7., 6., 7., 8., 7.]]]])
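Two-dimensional reflection padding is separable: padding every row along the width and then reflecting the padded rows along the height gives the same grid as the (1, 1, 2, 0) example above. The sketch below is pure Python; reflect_pad_1d and reflect_pad_2d are hypothetical helpers written for illustration, and the tuple order (left, right, top, bottom) follows the Args description.

```python
def reflect_pad_1d(values, left, right):
    """Reflect-pad a list without repeating the boundary element (hypothetical helper)."""
    n = len(values)
    if max(left, right) >= n:
        raise ValueError("reflection padding must be smaller than the input length")
    return (
        [values[i] for i in range(left, 0, -1)]
        + list(values)
        + [values[n - 2 - i] for i in range(right)]
    )


def reflect_pad_2d(grid, padding):
    """Reflect-pad a list of rows; padding is (left, right, top, bottom)."""
    left, right, top, bottom = padding
    # First pad along the width (inside each row) ...
    rows = [reflect_pad_1d(row, left, right) for row in grid]
    # ... then treat whole rows as the elements to reflect along the height.
    return reflect_pad_1d(rows, top, bottom)


grid = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
for row in reflect_pad_2d(grid, (1, 1, 2, 0)):
    print(row)
# [7, 6, 7, 8, 7]
# [4, 3, 4, 5, 4]
# [1, 0, 1, 2, 1]
# [4, 3, 4, 5, 4]
# [7, 6, 7, 8, 7]
```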
- class torchwrench.nn.ReflectionPad3d(padding: int | tuple[int, int, int, int, int, int])
Bases: _ReflectionPadNd
Pads the input tensor using the reflection of the input boundary.
For N-dimensional padding, use torch.nn.functional.pad().
- Args:
- padding (int, tuple): the size of the padding. If it is an int, uses the same padding in all boundaries. If a 6-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\), \(\text{padding\_front}\), \(\text{padding\_back}\)). Note that the padding size should be less than the corresponding input dimension.
- Shape:
Input: \((N, C, D_{in}, H_{in}, W_{in})\) or \((C, D_{in}, H_{in}, W_{in})\).
Output: \((N, C, D_{out}, H_{out}, W_{out})\) or \((C, D_{out}, H_{out}, W_{out})\), where
\(D_{out} = D_{in} + \text{padding\_front} + \text{padding\_back}\)
\(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\)
\(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
Examples:
>>> # xdoctest: +IGNORE_WANT("not sure why xdoctest is choking on this")
>>> m = nn.ReflectionPad3d(1)
>>> input = torch.arange(8, dtype=torch.float).reshape(1, 1, 2, 2, 2)
>>> m(input)
tensor([[[[[7., 6., 7., 6.],
           [5., 4., 5., 4.],
           [7., 6., 7., 6.],
           [5., 4., 5., 4.]],
          [[3., 2., 3., 2.],
           [1., 0., 1., 0.],
           [3., 2., 3., 2.],
           [1., 0., 1., 0.]],
          [[7., 6., 7., 6.],
           [5., 4., 5., 4.],
           [7., 6., 7., 6.],
           [5., 4., 5., 4.]],
          [[3., 2., 3., 2.],
           [1., 0., 1., 0.],
           [3., 2., 3., 2.],
           [1., 0., 1., 0.]]]]])
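The 6-tuple lists the last dimension first: (left, right) pads W, (top, bottom) pads H, and (front, back) pads D. The following pure-Python sketch, built on a hypothetical helper reflection_pad3d_output_shape, applies the Shape formulas above and checks the "padding smaller than the dimension" constraint per axis. It only computes shapes and does not touch tensor data.

```python
def reflection_pad3d_output_shape(input_shape, padding):
    """Padded shape for an input of shape (..., D, H, W) (hypothetical helper)."""
    if isinstance(padding, int):
        padding = (padding,) * 6
    left, right, top, bottom, front, back = padding
    *lead, d, h, w = input_shape
    # The padding tuple starts at the last dimension (W) and works backwards to D.
    for size, pads in ((w, (left, right)), (h, (top, bottom)), (d, (front, back))):
        if max(pads) >= size:
            raise ValueError("reflection padding must be smaller than the padded dimension")
    return (*lead, d + front + back, h + top + bottom, w + left + right)


print(reflection_pad3d_output_shape((1, 1, 2, 2, 2), 1))                     # (1, 1, 4, 4, 4)
print(reflection_pad3d_output_shape((16, 3, 8, 32, 48), (1, 2, 3, 4, 0, 5)))  # (16, 3, 13, 39, 51)
```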
- class torchwrench.nn.Repeat(*repeats: int)
Bases: Module
Module version of repeat().
- extra_repr() -> str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.RepeatInterleave(repeats: int | Tensor, dim: int, output_size: int | None = None)
Bases: Module
Module version of repeat_interleave().
- extra_repr() -> str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.RepeatInterleaveNd(repeats: int, dim: int)
Bases: Module
For more information, see repeat_interleave_nd().
- extra_repr() -> str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ReplicationPad1d(padding: int | tuple[int, int])
Bases: _ReplicationPadNd
Pads the input tensor using replication of the input boundary.
For N-dimensional padding, use torch.nn.functional.pad().
- Args:
- padding (int, tuple): the size of the padding. If it is an int, uses the same padding in all boundaries. If a 2-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\)). Note that the output dimensions must remain positive.
- Shape:
Input: \((C, W_{in})\) or \((N, C, W_{in})\).
Output: \((C, W_{out})\) or \((N, C, W_{out})\), where
\(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
Examples:
>>> # xdoctest: +IGNORE_WANT("not sure why xdoctest is choking on this")
>>> m = nn.ReplicationPad1d(2)
>>> input = torch.arange(8, dtype=torch.float).reshape(1, 2, 4)
>>> input
tensor([[[0., 1., 2., 3.],
         [4., 5., 6., 7.]]])
>>> m(input)
tensor([[[0., 0., 0., 1., 2., 3., 3., 3.],
         [4., 4., 4., 5., 6., 7., 7., 7.]]])
>>> # using different paddings for different sides
>>> m = nn.ReplicationPad1d((3, 1))
>>> m(input)
tensor([[[0., 0., 0., 0., 1., 2., 3., 3.],
         [4., 4., 4., 4., 5., 6., 7., 7.]]])
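Replication padding can be read as index clamping: every output position copies the nearest valid input position. The pure-Python sketch below (replicate_pad_1d is a hypothetical helper, not part of torchwrench) reproduces the (3, 1) example above. It also illustrates a practical difference from reflection padding: because border values are simply copied, the padding may exceed the input length, and only the output length has to stay positive.

```python
def replicate_pad_1d(values, left, right):
    """Replication-pad a 1-D list by clamping output indices (hypothetical helper)."""
    n = len(values)
    out_len = n + left + right
    if n == 0 or out_len <= 0:
        raise ValueError("input must be non-empty and the output length must remain positive")
    # Output position i reads input position i - left, clamped into [0, n - 1].
    return [values[min(max(i - left, 0), n - 1)] for i in range(out_len)]


print(replicate_pad_1d([0, 1, 2, 3], 3, 1))  # [0, 0, 0, 0, 1, 2, 3, 3]
print(replicate_pad_1d([5], 2, 2))           # [5, 5, 5, 5, 5]
```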
- class torchwrench.nn.ReplicationPad2d(padding: int | tuple[int, int, int, int])
Bases: _ReplicationPadNd
Pads the input tensor using replication of the input boundary.
For N-dimensional padding, use torch.nn.functional.pad().
- Args:
- padding (int, tuple): the size of the padding. If it is an int, uses the same padding in all boundaries. If a 4-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\)). Note that the output dimensions must remain positive.
- Shape:
Input: \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\).
Output: \((N, C, H_{out}, W_{out})\) or \((C, H_{out}, W_{out})\), where
\(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\)
\(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
Examples:
>>> m = nn.ReplicationPad2d(2)
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> input = torch.arange(9, dtype=torch.float).reshape(1, 1, 3, 3)
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> m(input)
tensor([[[[0., 0., 0., 1., 2., 2., 2.],
          [0., 0., 0., 1., 2., 2., 2.],
          [0., 0., 0., 1., 2., 2., 2.],
          [3., 3., 3., 4., 5., 5., 5.],
          [6., 6., 6., 7., 8., 8., 8.],
          [6., 6., 6., 7., 8., 8., 8.],
          [6., 6., 6., 7., 8., 8., 8.]]]])
>>> # using different paddings for different sides
>>> m = nn.ReplicationPad2d((1, 1, 2, 0))
>>> m(input)
tensor([[[[0., 0., 1., 2., 2.],
          [0., 0., 1., 2., 2.],
          [0., 0., 1., 2., 2.],
          [3., 3., 4., 5., 5.],
          [6., 6., 7., 8., 8.]]]])
- class torchwrench.nn.ReplicationPad3d(padding: int | tuple[int, int, int, int, int, int])
Bases: _ReplicationPadNd
Pads the input tensor using replication of the input boundary.
For N-dimensional padding, use torch.nn.functional.pad().
- Args:
- padding (int, tuple): the size of the padding. If it is an int, uses the same padding in all boundaries. If a 6-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\), \(\text{padding\_front}\), \(\text{padding\_back}\)). Note that the output dimensions must remain positive.
- Shape:
Input: \((N, C, D_{in}, H_{in}, W_{in})\) or \((C, D_{in}, H_{in}, W_{in})\).
Output: \((N, C, D_{out}, H_{out}, W_{out})\) or \((C, D_{out}, H_{out}, W_{out})\), where
\(D_{out} = D_{in} + \text{padding\_front} + \text{padding\_back}\)
\(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\)
\(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
Examples:
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> m = nn.ReplicationPad3d(3)
>>> input = torch.randn(16, 3, 8, 320, 480)
>>> output = m(input)
>>> # using different paddings for different sides
>>> m = nn.ReplicationPad3d((3, 3, 6, 6, 1, 1))
>>> output = m(input)
- class torchwrench.nn.ResampleNearestFreqs(orig_freq: int, new_freq: int, dims: int | Iterable[int] = -1, round_fn: Callable[[Tensor], Tensor] = torch.floor)
Bases: Module
- extra_repr() -> str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ResampleNearestRates(rates: float | Iterable[float], dims: int | Iterable[int] = -1, round_fn: Callable[[Tensor], Tensor] = torch.floor)
Bases: Module
For more information, see resample_nearest_rates().
- extra_repr() -> str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ResampleNearestSteps(steps: float | Iterable[float], dims: int | Iterable[int] = -1, round_fn: Callable[[Tensor], Tensor] = torch.floor)
Bases: Module
- extra_repr() -> str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Reshape(*shape: int)
Bases: Module
Module version of reshape().
- extra_repr() -> str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.SELU(inplace: bool = False)
Bases: Module
Applies the SELU function element-wise.
\[\text{SELU}(x) = \text{scale} * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1)))\]
with \(\alpha = 1.6732632423543772848170429916717\) and \(\text{scale} = 1.0507009873554804934193349852946\).
Warning
When using kaiming_normal or kaiming_normal_ for initialisation, nonlinearity='linear' should be used instead of nonlinearity='selu' in order to get Self-Normalizing Neural Networks. See torch.nn.init.calculate_gain() for more information.
More details can be found in the paper Self-Normalizing Neural Networks.
- Args:
inplace (bool, optional): can optionally do the operation in-place. Default: False
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.SELU()
>>> input = torch.randn(2)
>>> output = m(input)
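The SELU formula can be checked numerically with a scalar pure-Python sketch. The selu helper below is hypothetical and only mirrors the formula above; it is not the torchwrench or PyTorch implementation. For positive inputs SELU just multiplies by scale, while large negative inputs saturate at -scale * alpha, roughly -1.7581.

```python
import math

ALPHA = 1.6732632423543772848170429916717
SCALE = 1.0507009873554804934193349852946


def selu(x: float) -> float:
    """Scalar SELU: scale * (max(0, x) + min(0, alpha * (exp(x) - 1)))."""
    return SCALE * (max(0.0, x) + min(0.0, ALPHA * (math.exp(x) - 1.0)))


print(selu(1.0) == SCALE)  # True: the positive branch is a pure rescaling
print(selu(-20.0))         # close to -SCALE * ALPHA, about -1.7581
```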
- class torchwrench.nn.Sequential(*args: Module)
- class torchwrench.nn.Sequential(arg: OrderedDict[str, Module])
Bases: Module
A sequential container.
Modules will be added to it in the order they are passed in the constructor. Alternatively, an OrderedDict of modules can be passed in. The forward() method of Sequential accepts any input and forwards it to the first module it contains. It then “chains” outputs to inputs sequentially for each subsequent module, finally returning the output of the last module.
The value a Sequential provides over manually calling a sequence of modules is that it allows treating the whole container as a single module, such that performing a transformation on the Sequential applies to each of the modules it stores (which are each a registered submodule of the Sequential).
What’s the difference between a Sequential and a torch.nn.ModuleList? A ModuleList is exactly what it sounds like: a list for storing Modules. On the other hand, the layers in a Sequential are connected in a cascading way.
Example:
from collections import OrderedDict

import torch.nn as nn

# Using Sequential to create a small model. When `model` is run,
# input will first be passed to `Conv2d(1,20,5)`. The output of
# `Conv2d(1,20,5)` will be used as the input to the first
# `ReLU`; the output of the first `ReLU` will become the input
# for `Conv2d(20,64,5)`. Finally, the output of
# `Conv2d(20,64,5)` will be used as input to the second `ReLU`
model = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU(),
)

# Using Sequential with OrderedDict. This is functionally the
# same as the above code
model = nn.Sequential(
    OrderedDict(
        [
            ("conv1", nn.Conv2d(1, 20, 5)),
            ("relu1", nn.ReLU()),
            ("conv2", nn.Conv2d(20, 64, 5)),
            ("relu2", nn.ReLU()),
        ]
    )
)
- append(module: Module) -> Self
Append a given module to the end.
- Args:
module (nn.Module): module to append
Example:
>>> import torch.nn as nn
>>> n = nn.Sequential(nn.Linear(1, 2), nn.Linear(2, 3))
>>> n.append(nn.Linear(3, 4))
Sequential(
  (0): Linear(in_features=1, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=3, bias=True)
  (2): Linear(in_features=3, out_features=4, bias=True)
)
- extend(sequential: Iterable[Module]) -> Self
Extends the current Sequential container with layers from another Sequential container.
- Args:
sequential (Sequential): A Sequential container whose layers will be added to the current container.
Example:
>>> import torch.nn as nn
>>> n = nn.Sequential(nn.Linear(1, 2), nn.Linear(2, 3))
>>> other = nn.Sequential(nn.Linear(3, 4), nn.Linear(4, 5))
>>> n.extend(other)  # or `n + other`
Sequential(
  (0): Linear(in_features=1, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=3, bias=True)
  (2): Linear(in_features=3, out_features=4, bias=True)
  (3): Linear(in_features=4, out_features=5, bias=True)
)
- insert(index: int, module: Module) -> Self
Inserts a module into the Sequential container at the specified index.
- Args:
index (int): The index to insert the module.
module (Module): The module to be inserted.
Example:
>>> import torch.nn as nn
>>> n = nn.Sequential(nn.Linear(1, 2), nn.Linear(2, 3))
>>> n.insert(0, nn.Linear(3, 4))
Sequential(
  (0): Linear(in_features=3, out_features=4, bias=True)
  (1): Linear(in_features=1, out_features=2, bias=True)
  (2): Linear(in_features=2, out_features=3, bias=True)
)
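The "chaining" that Sequential's forward() performs can be pictured with plain callables: each layer receives the previous layer's output. The sketch below uses a hypothetical run_sequential helper and ignores everything else a Module container provides (parameter registration, hooks, device and dtype moves); it only illustrates the data flow.

```python
from functools import reduce
from typing import Any, Callable, Sequence


def run_sequential(layers: Sequence[Callable[[Any], Any]], x: Any) -> Any:
    """Feed x through each layer in order; the output of one is the input of the next."""
    return reduce(lambda acc, layer: layer(acc), layers, x)


layers = [lambda v: v + 1, lambda v: v * 10, lambda v: v - 3]
print(run_sequential(layers, 2))  # 27, i.e. ((2 + 1) * 10) - 3
print(run_sequential([], 5))      # 5: an empty chain returns its input unchanged
```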
- class torchwrench.nn.Shuffled(dims: int | Iterable[int] = -1, generator: Generator | None | 'default' | int = None)
Bases: Module
- extra_repr() -> str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.SiLU(inplace: bool = False)
Bases: Module
Applies the Sigmoid Linear Unit (SiLU) function element-wise.
The SiLU function is also known as the swish function.
\[\text{silu}(x) = x * \sigma(x), \text{where } \sigma(x) \text{ is the logistic sigmoid.}\]
Note
See Gaussian Error Linear Units (GELUs) where the SiLU (Sigmoid Linear Unit) was originally coined, and see Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning and Swish: a Self-Gated Activation Function where the SiLU was experimented with later.
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.SiLU()
>>> input = torch.randn(2)
>>> output = m(input)
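The SiLU definition x * sigma(x) can be evaluated with a short scalar sketch. Both helpers below are hypothetical and written only for illustration; the sigmoid is split into two branches so that math.exp is never called on a large positive number. Unlike ReLU, SiLU lets small negative inputs through as small negative outputs.

```python
import math


def sigmoid(x: float) -> float:
    # Logistic sigmoid 1 / (1 + exp(-x)), arranged so exp() never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x: float) -> float:
    """SiLU / swish: x * sigmoid(x)."""
    return x * sigmoid(x)


print(silu(0.0))   # 0.0
print(silu(-1.0))  # about -0.2689
print(silu(50.0))  # about 50.0: for large x, SiLU approaches the identity
```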
- class torchwrench.nn.Sigmoid(*args: Any, **kwargs: Any)
Bases: Module
Applies the Sigmoid function element-wise.
\[\text{Sigmoid}(x) = \sigma(x) = \frac{1}{1 + \exp(-x)}\]
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Sigmoid()
>>> input = torch.randn(2)
>>> output = m(input)
- class torchwrench.nn.SmoothL1Loss(size_average=None, reduce=None, reduction: str = 'mean', beta: float = 1.0)
Bases: _Loss
Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. It is less sensitive to outliers than torch.nn.MSELoss and in some cases prevents exploding gradients (e.g. see the paper Fast R-CNN by Ross Girshick).
For a batch of size \(N\), the unreduced loss can be described as:
\[\ell(x, y) = L = \{l_1, \dots, l_N\}^T\]
with
\[\begin{split}l_n = \begin{cases} 0.5 (x_n - y_n)^2 / \beta, & \text{if } |x_n - y_n| < \beta \\ |x_n - y_n| - 0.5 * \beta, & \text{otherwise } \end{cases}\end{split}\]
If reduction is not 'none', then:
\[\begin{split}\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
Note
Smooth L1 loss can be seen as exactly L1Loss, but with the \(|x - y| < \beta\) portion replaced with a quadratic function such that its slope is 1 at \(|x - y| = \beta\). The quadratic segment smooths the L1 loss near \(|x - y| = 0\).
Note
Smooth L1 loss is closely related to HuberLoss, being equivalent to \(\text{huber}(x, y) / \beta\) (note that Smooth L1’s beta hyper-parameter is also known as delta for Huber). This leads to the following differences:
- As beta -> 0, Smooth L1 loss converges to L1Loss, while HuberLoss converges to a constant 0 loss. When beta is 0, Smooth L1 loss is equivalent to L1 loss.
- As beta -> \(+\infty\), Smooth L1 loss converges to a constant 0 loss, while HuberLoss converges to MSELoss.
- For Smooth L1 loss, as beta varies, the L1 segment of the loss has a constant slope of 1. For HuberLoss, the slope of the L1 segment is beta.
- Args:
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- beta (float, optional): Specifies the threshold at which to change between L1 and L2 loss. The value must be non-negative. Default: 1.0
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Target: \((*)\), same shape as the input.
Output: scalar. If reduction is 'none', then \((*)\), same shape as the input.
- class torchwrench.nn.SoftMarginLoss(size_average=None, reduce=None, reduction: str = 'mean')
Bases: _Loss
Creates a criterion that optimizes a two-class classification logistic loss between input tensor \(x\) and target tensor \(y\) (containing 1 or -1).
\[\text{loss}(x, y) = \sum_i \frac{\log(1 + \exp(-y[i]*x[i]))}{\text{x.nelement}()}\]
- Args:
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Target: \((*)\), same shape as the input.
Output: scalar. If reduction is 'none', then \((*)\), same shape as the input.
- class torchwrench.nn.Softmax(dim: int | None = None)
Bases: Module
Applies the Softmax function to an n-dimensional input Tensor.
Rescales it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1 along dim.
Softmax is defined as:
\[\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]
When the input Tensor is a sparse tensor, the unspecified values are treated as -inf.
- Shape:
Input: \((*)\), where \(*\) means any number of additional dimensions.
Output: \((*)\), same shape as the input.
- Returns:
a Tensor of the same dimension and shape as the input with values in the range [0, 1]
- Args:
- dim (int): A dimension along which Softmax will be computed (so every slice along dim will sum to 1).
Note
This module doesn’t work directly with NLLLoss, which expects log-probabilities as input (i.e. a log applied between the Softmax and NLLLoss). Use LogSoftmax instead (it’s faster and has better numerical properties).
Examples:
>>> m = nn.Softmax(dim=1)
>>> input = torch.randn(2, 3)
>>> output = m(input)
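Because softmax is unchanged when the same constant is added to every input, implementations commonly subtract the maximum before exponentiating so that exp() cannot overflow. The pure-Python sketch below applies that trick to a single 1-D slice (the slice along dim). softmax here is a hypothetical helper for illustration, not the module's implementation.

```python
import math


def softmax(xs):
    """Softmax over a 1-D list, shifted by max(xs) so exp() never overflows."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print([round(p, 4) for p in probs])  # [0.09, 0.2447, 0.6652]
print(softmax([1000.0, 1000.0]))     # [0.5, 0.5]; exp(1000.0) on its own would overflow
```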
- class torchwrench.nn.Softmax2d(*args: Any, **kwargs: Any)
Bases: Module
Applies SoftMax over features to each spatial location.
When given an image of Channels x Height x Width, it will apply Softmax to each location \((Channels, h_i, w_j)\).
- Shape:
Input: \((N, C, H, W)\) or \((C, H, W)\).
Output: \((N, C, H, W)\) or \((C, H, W)\) (same shape as input)
- Returns:
a Tensor of the same dimension and shape as the input with values in the range [0, 1]
Examples:
>>> m = nn.Softmax2d()
>>> # softmax is computed over the 2nd dimension (channels)
>>> input = torch.randn(2, 3, 12, 13)
>>> output = m(input)
- class torchwrench.nn.SoftmaxMultidim(dims: Iterable[int] | None = (-1,))
Bases: Module
For more information, see softmax_multidim().
- extra_repr() -> str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(input: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Softmin(dim: int | None = None)
Bases: Module
Applies the Softmin function to an n-dimensional input Tensor.
Rescales it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1 along dim.
Softmin is defined as:
\[\text{Softmin}(x_{i}) = \frac{\exp(-x_i)}{\sum_j \exp(-x_j)}\]
- Shape:
Input: \((*)\), where \(*\) means any number of additional dimensions.
Output: \((*)\), same shape as the input.
- Args:
- dim (int): A dimension along which Softmin will be computed (so every slice along dim will sum to 1).
- Returns:
a Tensor of the same dimension and shape as the input, with values in the range [0, 1]
Examples:
>>> m = nn.Softmin(dim=1)
>>> input = torch.randn(2, 3)
>>> output = m(input)
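From the definition above, Softmin is softmax applied to the negated inputs, so the smallest value receives the largest weight. The pure-Python sketch below (softmin is a hypothetical helper) computes it for a single 1-D slice, using the same max-shift for numerical stability that softmax implementations commonly use.

```python
import math


def softmin(xs):
    """exp(-x_i) / sum_j exp(-x_j) over a 1-D list, i.e. softmax of the negated inputs."""
    neg = [-x for x in xs]
    m = max(neg)  # shifting by a constant does not change the result
    exps = [math.exp(v - m) for v in neg]
    total = sum(exps)
    return [e / total for e in exps]


print([round(p, 4) for p in softmin([1.0, 2.0, 3.0])])  # [0.6652, 0.2447, 0.09]
```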
- class torchwrench.nn.Softplus(beta: float = 1.0, threshold: float = 20.0)
Bases: Module
Applies the Softplus function element-wise.
\[\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))\]
SoftPlus is a smooth approximation to the ReLU function and can be used to constrain the output of a machine to always be positive.
For numerical stability, the implementation reverts to the linear function when \(\text{input} \times \beta > \text{threshold}\).
- Args:
beta: the \(\beta\) value for the Softplus formulation. Default: 1
threshold: values above this revert to a linear function. Default: 20
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Softplus()
>>> input = torch.randn(2)
>>> output = m(input)
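The formula and the threshold rule can be combined into a scalar sketch. softplus below is a hypothetical helper that follows the description above: when beta * x exceeds threshold it returns x unchanged, since log(1 + exp(z)) differs from z by only about exp(-z) in that regime; otherwise it evaluates the formula with math.log1p.

```python
import math


def softplus(x: float, beta: float = 1.0, threshold: float = 20.0) -> float:
    """(1 / beta) * log(1 + exp(beta * x)), returning x itself once beta * x > threshold."""
    z = beta * x
    if z > threshold:
        return x  # linear regime
    return math.log1p(math.exp(z)) / beta


print(softplus(0.0))   # log(2), about 0.6931
print(softplus(25.0))  # 25.0: above the default threshold, the input is returned unchanged
print(softplus(-5.0))  # small but still positive
```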
- class torchwrench.nn.Softshrink(lambd: float = 0.5)
Bases: Module
Applies the soft shrinkage function element-wise.
\[\begin{split}\text{SoftShrinkage}(x) = \begin{cases} x - \lambda, & \text{ if } x > \lambda \\ x + \lambda, & \text{ if } x < -\lambda \\ 0, & \text{ otherwise } \end{cases}\end{split}\]
- Args:
lambd: the \(\lambda\) value (must be non-negative) for the Softshrink formulation. Default: 0.5
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Softshrink()
>>> input = torch.randn(2)
>>> output = m(input)
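The three cases of the soft shrinkage formula translate directly into a scalar sketch. softshrink below is a hypothetical helper: values inside [-lambd, lambd] become exactly 0, and everything else is moved toward zero by lambd. The boundary value x = lambd falls into the "otherwise" case.

```python
def softshrink(x: float, lambd: float = 0.5) -> float:
    """Shrink x toward zero by lambd; values inside [-lambd, lambd] become exactly 0."""
    if lambd < 0:
        raise ValueError("lambd must be non-negative")
    if x > lambd:
        return x - lambd
    if x < -lambd:
        return x + lambd
    return 0.0


print([softshrink(v) for v in (-2.0, -0.3, 0.0, 0.5, 1.25)])  # [-1.5, 0.0, 0.0, 0.0, 0.75]
```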
- class torchwrench.nn.Softsign(*args: Any, **kwargs: Any)
Bases: Module
Applies the element-wise Softsign function.
\[\text{SoftSign}(x) = \frac{x}{ 1 + |x|}\]
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Softsign()
>>> input = torch.randn(2)
>>> output = m(input)
- class torchwrench.nn.Sort(dim: int = -1, descending: bool = False, *, return_values: bool = True, return_indices: bool = True)
Bases: Module
Module version of sort().
- forward(x: Tensor) -> sort | Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Squeeze(dim: int | Iterable[int] | None = None, mode: 'view_if_possible' | 'view' | 'copy' | 'inplace' = 'view_if_possible')
Bases: Module
Module version of squeeze().
- extra_repr() -> str
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.SyncBatchNorm(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, process_group: Any | None = None, device=None, dtype=None)
Bases: _BatchNorm
Applies Batch Normalization over an N-dimensional input.
The N-D input is a mini-batch of [N-2]D inputs with an additional channel dimension, as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated per-dimension over all mini-batches of the same process group. \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the number of channels, num_features). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, correction=0).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to False, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well.
Note
This momentum argument is different from the one used in optimizer classes and from the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Because the Batch Normalization is done for each channel in the C dimension, computing statistics on (N, +) slices, it’s common terminology to call this Volumetric Batch Normalization or Spatio-temporal Batch Normalization.
Currently SyncBatchNorm only supports DistributedDataParallel (DDP) with a single GPU per process. Use torch.nn.SyncBatchNorm.convert_sync_batchnorm() to convert BatchNorm*D layers to SyncBatchNorm before wrapping the network with DDP.
- Args:
- num_features: \(C\) from an expected input of size \((N, C, +)\)
- eps: a value added to the denominator for numerical stability. Default: 1e-5
- momentum: the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
- affine: a boolean value that when set to True, this module has learnable affine parameters. Default: True
- track_running_stats: a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- process_group: synchronization of stats happens within each process group individually. Default behavior is synchronization across the whole world.
- Shape:
Input: \((N, C, +)\)
Output: \((N, C, +)\) (same shape as input)
Note
Synchronization of batchnorm statistics occurs only while training, i.e. synchronization is disabled when model.eval() is set or if self.training is otherwise False.
Examples:
>>> # xdoctest: +SKIP
>>> import torch.distributed as dist
>>> # With Learnable Parameters
>>> m = nn.SyncBatchNorm(100)
>>> # creating process group (optional)
>>> # ranks is a list of int identifying rank ids.
>>> ranks = list(range(8))
>>> r1, r2 = ranks[:4], ranks[4:]
>>> # Note: every rank calls into new_group for every
>>> # process group created, even if that rank is not
>>> # part of the group.
>>> process_groups = [torch.distributed.new_group(pids) for pids in [r1, r2]]
>>> process_group = process_groups[0 if dist.get_rank() <= 3 else 1]
>>> # Without Learnable Parameters
>>> m = nn.SyncBatchNorm(100, affine=False, process_group=process_group)
>>> input = torch.randn(20, 100, 35, 45, 10)
>>> output = m(input)
>>> # network is a model containing nn.BatchNorm layers
>>> sync_bn_network = nn.SyncBatchNorm.convert_sync_batchnorm(network, process_group)
>>> # only single gpu per process is currently supported
>>> ddp_sync_bn_network = torch.nn.parallel.DistributedDataParallel(
...     sync_bn_network,
...     device_ids=[args.local_rank],
...     output_device=args.local_rank)
-
classmethod convert_sync_batchnorm(module, process_group=None)[source]¶
Converts all BatchNorm*D layers in the model to torch.nn.SyncBatchNorm layers.
- Args:
module (nn.Module): module containing one or more BatchNorm*D layers.
process_group (optional): process group to scope synchronization. Default is the whole world.
- Returns:
The original module with the converted torch.nn.SyncBatchNorm layers. If the original module is a BatchNorm*D layer, a new torch.nn.SyncBatchNorm layer object will be returned instead.
Example:
>>> # Network with nn.BatchNorm layer
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA)
>>> module = torch.nn.Sequential(
...     torch.nn.Linear(20, 100),
...     torch.nn.BatchNorm1d(100),
... ).cuda()
>>> # creating process group (optional)
>>> # ranks is a list of int identifying rank ids.
>>> ranks = list(range(8))
>>> r1, r2 = ranks[:4], ranks[4:]
>>> # Note: every rank calls into new_group for every
>>> # process group created, even if that rank is not
>>> # part of the group.
>>> # xdoctest: +SKIP("distributed")
>>> process_groups = [torch.distributed.new_group(pids) for pids in [r1, r2]]
>>> process_group = process_groups[0 if torch.distributed.get_rank() <= 3 else 1]
>>> sync_bn_module = torch.nn.SyncBatchNorm.convert_sync_batchnorm(module, process_group)
-
class torchwrench.nn.TFlatten(start_dim: int = 0, end_dim: int | None = None)[source]¶
Bases: Module
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor[source]¶
- forward(x: ndarray | generic) ndarray
- forward(x: T_BuiltinScalar) list[T_BuiltinScalar]
- forward(x: Iterable[T_BuiltinScalar]) list[T_BuiltinScalar]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Tanh(*args: Any, **kwargs: Any)[source]¶
Bases: Module
Applies the Hyperbolic Tangent (Tanh) function element-wise.
Tanh is defined as:
\[\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)} {\exp(x) + \exp(-x)}\]
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Tanh() >>> input = torch.randn(2) >>> output = m(input)
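The definition above can be checked numerically with a plain-Python sketch built on the standard math module rather than the module itself; tanh_from_exp is a hypothetical name. Evaluating the formula literally overflows for large |x| (math.exp raises OverflowError for arguments above roughly 709), which is one reason libraries do not compute tanh this way.

```python
import math


def tanh_from_exp(x: float) -> float:
    """Tanh computed directly from the exponential definition."""
    e_pos, e_neg = math.exp(x), math.exp(-x)
    return (e_pos - e_neg) / (e_pos + e_neg)


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # The definition agrees with the library tanh up to rounding error.
    assert math.isclose(tanh_from_exp(x), math.tanh(x), rel_tol=1e-12, abs_tol=1e-15)
```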
- class torchwrench.nn.Tanhshrink(*args: Any, **kwargs: Any)[source]¶
Bases: Module
Applies the element-wise Tanhshrink function.
\[\text{Tanhshrink}(x) = x - \tanh(x)\]
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Tanhshrink() >>> input = torch.randn(2) >>> output = m(input)
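A scalar sketch of Tanhshrink with the standard math module (tanhshrink is a hypothetical name, not the module's API). It also shows the two regimes implied by the formula: near zero, tanh(x) is approximately x - x**3/3, so the output is roughly x**3/3; for large |x|, tanh saturates at ±1, so the output approaches x ∓ 1.

```python
import math


def tanhshrink(x: float) -> float:
    """Element-wise Tanhshrink for a single float: x - tanh(x)."""
    return x - math.tanh(x)


# Near zero the output is tiny and close to x**3 / 3.
small = 0.01
print(tanhshrink(small), small**3 / 3)
# For large positive x, tanh(x) is almost 1, so the output is close to x - 1.
print(tanhshrink(10.0))
```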
- class torchwrench.nn.TensorTo(**kwargs)[source]¶
Bases: Module
Module version of to().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
class torchwrench.nn.TensorToNDArray(*, dtype: str | dtype | None = None, force: bool = False)[source]¶
Bases: Module
For more information, see tensor_to_ndarray().
- forward(x: Tensor) ndarray[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
class torchwrench.nn.Threshold(threshold: float, value: float, inplace: bool = False)[source]¶
Bases: Module
Thresholds each element of the input Tensor.
Threshold is defined as:
\[\begin{split}y = \begin{cases} x, &\text{ if } x > \text{threshold} \\ \text{value}, &\text{ otherwise } \end{cases}\end{split}\]
- Args:
threshold: The value to threshold at.
value: The value to replace with.
inplace: can optionally do the operation in-place. Default: False.
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples:
>>> m = nn.Threshold(0, 0.5) >>> input = torch.arange(-3, 3) >>> output = m(input)
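The piecewise definition can be mirrored element-wise on a plain Python list, using the same configuration as the nn.Threshold(0, 0.5) example on the values -3 to 2. apply_threshold is a hypothetical helper, not part of the library. Note the strict inequality: an element equal to threshold is replaced, too.

```python
def apply_threshold(values, threshold, value):
    """y = x if x > threshold else value, applied element-wise (hypothetical helper)."""
    # Strict ">" means elements equal to the threshold are also replaced.
    return [x if x > threshold else value for x in values]


print(apply_threshold(range(-3, 3), threshold=0, value=0.5))
```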
- class torchwrench.nn.ToItem(*args: Any, **kwargs: Any)[source]¶
Bases: Module
Module version of to_item().
- forward(x: bool | int | float | complex | None | str | bytes | ndarray | generic | Tensor0D | Tensor | SupportsIterLen) bool | int | float | complex | None | str | bytes[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ToList(*args: Any, **kwargs: Any)[source]¶
Bases: Module
Module version of tolist().
- forward(x: Tensor) list[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
class torchwrench.nn.ToNDArray(*, dtype: str | dtype | None = None, force: bool = False)[source]¶
Bases: Module
For more information, see to_ndarray().
- forward(x: Tensor | ndarray | list) ndarray[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
class torchwrench.nn.TopP(p: float, dim: int = -1, largest: bool = True, *, return_values: bool = True, return_indices: bool = True)[source]¶
Bases: Module
Module version of top_p().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor | LongTensor | top_p[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
class torchwrench.nn.Topk(k: int, dim: int = -1, largest: bool = True, sorted: bool = True, *, return_values: bool = True, return_indices: bool = True)[source]¶
Bases: Module
Module version of topk().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor | LongTensor | topk[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
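A plain-Python sketch of what a top-k selection returns along a single dimension, corresponding to return_values=True and return_indices=True; how Topk packages its output for other flag combinations is not shown here. topk_1d is a hypothetical helper. Breaking ties by original position (Python's sort is stable, even with reverse=True) is a choice of this sketch, not a documented guarantee of topk().

```python
def topk_1d(values, k, largest=True):
    """Return (top values, their indices) for a 1-D list, best first.

    Hypothetical plain-Python illustration of top-k along one dimension.
    """
    if not 0 <= k <= len(values):
        raise ValueError(f"k must be in [0, {len(values)}], got {k}")
    # Sort positions by their value; reverse=True puts the largest first.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=largest)
    indices = order[:k]
    return [values[i] for i in indices], indices


vals, idx = topk_1d([1.0, 5.0, 3.0, 4.0], k=2)
print(vals, idx)
```

Returning indices alongside values is what lets callers gather matching entries from another list of the same length.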
-
class torchwrench.nn.TransformDrop(transform: Callable[[T], T], p: float, generator: Generator | None | 'default' | int = None)[source]¶
Bases: Generic[T], EModule[T, T]
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: T) T[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Transformer(d_model: int = 512, nhead: int = 8, num_encoder_layers: int = 6, num_decoder_layers: int = 6, dim_feedforward: int = 2048, dropout: float = 0.1, activation: str | ~collections.abc.Callable[[~torch.Tensor], ~torch.Tensor] = <function relu>, custom_encoder: ~typing.Any | None = None, custom_decoder: ~typing.Any | None = None, layer_norm_eps: float = 1e-05, batch_first: bool = False, norm_first: bool = False, bias: bool = True, device=None, dtype=None)[source]¶
Bases: Module
A basic transformer layer.
This Transformer layer implements the original Transformer architecture described in the Attention Is All You Need paper. The intent of this layer is as a reference implementation for foundational understanding and thus it contains only limited features relative to newer Transformer architectures. Given the fast pace of innovation in transformer-like architectures, we recommend exploring this tutorial to build an efficient transformer layer from building blocks in core or using higher level libraries from the PyTorch Ecosystem.
- Args:
d_model: the number of expected features in the encoder/decoder inputs (default=512).
nhead: the number of heads in the multi-head attention models (default=8).
num_encoder_layers: the number of sub-encoder-layers in the encoder (default=6).
num_decoder_layers: the number of sub-decoder-layers in the decoder (default=6).
dim_feedforward: the dimension of the feedforward network model (default=2048).
dropout: the dropout value (default=0.1).
activation: the activation function of the encoder/decoder intermediate layer; can be a string (“relu” or “gelu”) or a unary callable. Default: relu.
custom_encoder: custom encoder (default=None).
custom_decoder: custom decoder (default=None).
layer_norm_eps: the eps value in layer normalization components (default=1e-5).
batch_first: If True, then the input and output tensors are provided as (batch, seq, feature). Default: False (seq, batch, feature).
norm_first: if True, encoder and decoder layers will perform LayerNorms before other attention and feedforward operations, otherwise after. Default: False (after).
bias: If set to False, Linear and LayerNorm layers will not learn an additive bias. Default: True.
- Examples:
>>> transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12) >>> src = torch.rand((10, 32, 512)) >>> tgt = torch.rand((20, 32, 512)) >>> out = transformer_model(src, tgt)
Note: A full example to apply nn.Transformer module for the word language model is available in https://github.com/pytorch/examples/tree/master/word_language_model
-
forward(src: Tensor, tgt: Tensor, src_mask: Tensor | None = None, tgt_mask: Tensor | None = None, memory_mask: Tensor | None = None, src_key_padding_mask: Tensor | None = None, tgt_key_padding_mask: Tensor | None = None, memory_key_padding_mask: Tensor | None = None, src_is_causal: bool | None = None, tgt_is_causal: bool | None = None, memory_is_causal: bool = False) Tensor[source]¶
Take in and process masked source/target sequences.
Note
If a boolean tensor is provided for any of the [src/tgt/memory]_mask arguments, positions with a True value are not allowed to participate in the attention, which is the opposite of the definition for attn_mask in torch.nn.functional.scaled_dot_product_attention().
- Args:
src: the sequence to the encoder (required).
tgt: the sequence to the decoder (required).
src_mask: the additive mask for the src sequence (optional).
tgt_mask: the additive mask for the tgt sequence (optional).
memory_mask: the additive mask for the encoder output (optional).
src_key_padding_mask: the Tensor mask for src keys per batch (optional).
tgt_key_padding_mask: the Tensor mask for tgt keys per batch (optional).
memory_key_padding_mask: the Tensor mask for memory keys per batch (optional).
src_is_causal: If specified, applies a causal mask as src_mask. Default: None; try to detect a causal mask. Warning: src_is_causal provides a hint that src_mask is the causal mask. Providing incorrect hints can result in incorrect execution, including forward and backward compatibility.
tgt_is_causal: If specified, applies a causal mask as tgt_mask. Default: None; try to detect a causal mask. Warning: tgt_is_causal provides a hint that tgt_mask is the causal mask. Providing incorrect hints can result in incorrect execution, including forward and backward compatibility.
memory_is_causal: If specified, applies a causal mask as memory_mask. Default: False. Warning: memory_is_causal provides a hint that memory_mask is the causal mask. Providing incorrect hints can result in incorrect execution, including forward and backward compatibility.
- Shape:
src: \((S, E)\) for unbatched input, \((S, N, E)\) if batch_first=False or \((N, S, E)\) if batch_first=True.
tgt: \((T, E)\) for unbatched input, \((T, N, E)\) if batch_first=False or \((N, T, E)\) if batch_first=True.
src_mask: \((S, S)\) or \((N\cdot\text{num\_heads}, S, S)\).
tgt_mask: \((T, T)\) or \((N\cdot\text{num\_heads}, T, T)\).
memory_mask: \((T, S)\).
src_key_padding_mask: \((S)\) for unbatched input otherwise \((N, S)\).
tgt_key_padding_mask: \((T)\) for unbatched input otherwise \((N, T)\).
memory_key_padding_mask: \((S)\) for unbatched input otherwise \((N, S)\).
Note: [src/tgt/memory]_mask ensures that position \(i\) is allowed to attend the unmasked positions. If a BoolTensor is provided, positions with True are not allowed to attend while False values will be unchanged. If a FloatTensor is provided, it will be added to the attention weight. [src/tgt/memory]_key_padding_mask provides specified elements in the key to be ignored by the attention. If a BoolTensor is provided, the positions with the value of True will be ignored while the positions with the value of False will be unchanged.
output: \((T, E)\) for unbatched input, \((T, N, E)\) if batch_first=False or \((N, T, E)\) if batch_first=True.
Note: Due to the multi-head attention architecture in the transformer model, the output sequence length of a transformer is the same as the input sequence (i.e. target) length of the decoder.
where \(S\) is the source sequence length, \(T\) is the target sequence length, \(N\) is the batch size, and \(E\) is the feature number.
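The mask conventions described above can be illustrated with nested Python lists standing in for tensors; all helper names here are hypothetical. Row i is the query position and column j the key position. The additive form, 0.0 on and below the diagonal and -inf above it, is the kind of causal mask PyTorch's nn.Transformer.generate_square_subsequent_mask returns. The bool form follows the True = not allowed to attend convention used here, and to_sdpa_bool flips it for scaled_dot_product_attention, where True means the position may attend.

```python
NEG_INF = float("-inf")


def bool_causal_mask(size):
    """Bool causal mask, nn.Transformer convention: True = NOT allowed to attend.

    Query position i may only attend to key positions j <= i.
    """
    return [[j > i for j in range(size)] for i in range(size)]


def bool_to_additive(mask):
    """Turn a bool mask (True = blocked) into a float mask added to attention scores."""
    return [[NEG_INF if blocked else 0.0 for blocked in row] for row in mask]


def to_sdpa_bool(mask):
    """Invert the convention for scaled_dot_product_attention (True = allowed)."""
    return [[not blocked for blocked in row] for row in mask]


causal = bool_causal_mask(3)
for row in bool_to_additive(causal):
    print(row)
```

Adding -inf to a score drives its softmax weight to zero, which is why the additive and bool forms block the same positions.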
- Examples:
>>> # xdoctest: +SKIP >>> output = transformer_model( ... src, tgt, src_mask=src_mask, tgt_mask=tgt_mask ... )
-
class torchwrench.nn.TransformerDecoder(decoder_layer: TransformerDecoderLayer, num_layers: int, norm: Module | None = None)[source]¶
Bases: Module
TransformerDecoder is a stack of N decoder layers.
This TransformerDecoder layer implements the original architecture described in the Attention Is All You Need paper. The intent of this layer is as a reference implementation for foundational understanding and thus it contains only limited features relative to newer Transformer architectures. Given the fast pace of innovation in transformer-like architectures, we recommend exploring this tutorial to build efficient layers from building blocks in core or using higher level libraries from the PyTorch Ecosystem.
Warning
All layers in the TransformerDecoder are initialized with the same parameters. It is recommended to manually initialize the layers after creating the TransformerDecoder instance.
- Args:
decoder_layer: an instance of the TransformerDecoderLayer() class (required).
num_layers: the number of sub-decoder-layers in the decoder (required).
norm: the layer normalization component (optional).
- Examples:
>>> decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8) >>> transformer_decoder = nn.TransformerDecoder(decoder_layer, num_layers=6) >>> memory = torch.rand(10, 32, 512) >>> tgt = torch.rand(20, 32, 512) >>> out = transformer_decoder(tgt, memory)
-
forward(tgt: Tensor, memory: Tensor, tgt_mask: Tensor | None = None, memory_mask: Tensor | None = None, tgt_key_padding_mask: Tensor | None = None, memory_key_padding_mask: Tensor | None = None, tgt_is_causal: bool | None = None, memory_is_causal: bool = False) Tensor[source]¶
Pass the inputs (and mask) through the decoder layer in turn.
- Args:
tgt: the sequence to the decoder (required).
memory: the sequence from the last layer of the encoder (required).
tgt_mask: the mask for the tgt sequence (optional).
memory_mask: the mask for the memory sequence (optional).
tgt_key_padding_mask: the mask for the tgt keys per batch (optional).
memory_key_padding_mask: the mask for the memory keys per batch (optional).
tgt_is_causal: If specified, applies a causal mask as tgt mask. Default: None; try to detect a causal mask. Warning: tgt_is_causal provides a hint that tgt_mask is the causal mask. Providing incorrect hints can result in incorrect execution, including forward and backward compatibility.
memory_is_causal: If specified, applies a causal mask as memory mask. Default: False. Warning: memory_is_causal provides a hint that memory_mask is the causal mask. Providing incorrect hints can result in incorrect execution, including forward and backward compatibility.
- Shape:
see the docs in
Transformer.
- class torchwrench.nn.TransformerDecoderLayer(d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1, activation: str | ~collections.abc.Callable[[~torch.Tensor], ~torch.Tensor] = <function relu>, layer_norm_eps: float = 1e-05, batch_first: bool = False, norm_first: bool = False, bias: bool = True, device=None, dtype=None)[source]¶
Bases: Module
TransformerDecoderLayer is made up of self-attn, multi-head-attn and feedforward network.
This TransformerDecoderLayer implements the original architecture described in the Attention Is All You Need paper. The intent of this layer is as a reference implementation for foundational understanding and thus it contains only limited features relative to newer Transformer architectures. Given the fast pace of innovation in transformer-like architectures, we recommend exploring this tutorial to build efficient layers from building blocks in core or using higher level libraries from the PyTorch Ecosystem.
- Args:
d_model: the number of expected features in the input (required).
nhead: the number of heads in the multi-head attention models (required).
dim_feedforward: the dimension of the feedforward network model (default=2048).
dropout: the dropout value (default=0.1).
activation: the activation function of the intermediate layer; can be a string (“relu” or “gelu”) or a unary callable. Default: relu.
layer_norm_eps: the eps value in layer normalization components (default=1e-5).
batch_first: If True, then the input and output tensors are provided as (batch, seq, feature). Default: False (seq, batch, feature).
norm_first: if True, layer norm is done prior to self attention, multihead attention and feedforward operations, respectively. Otherwise it’s done after. Default: False (after).
bias: If set to False, Linear and LayerNorm layers will not learn an additive bias. Default: True.
- Examples:
>>> decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
>>> memory = torch.rand(10, 32, 512)
>>> tgt = torch.rand(20, 32, 512)
>>> out = decoder_layer(tgt, memory)
- Alternatively, when batch_first is True:
>>> decoder_layer = nn.TransformerDecoderLayer(
...     d_model=512, nhead=8, batch_first=True
... )
>>> memory = torch.rand(32, 10, 512)
>>> tgt = torch.rand(32, 20, 512)
>>> out = decoder_layer(tgt, memory)
-
forward(tgt: Tensor, memory: Tensor, tgt_mask: Tensor | None = None, memory_mask: Tensor | None = None, tgt_key_padding_mask: Tensor | None = None, memory_key_padding_mask: Tensor | None = None, tgt_is_causal: bool = False, memory_is_causal: bool = False) Tensor[source]¶
Pass the inputs (and mask) through the decoder layer.
- Args:
tgt: the sequence to the decoder layer (required).
memory: the sequence from the last layer of the encoder (required).
tgt_mask: the mask for the tgt sequence (optional).
memory_mask: the mask for the memory sequence (optional).
tgt_key_padding_mask: the mask for the tgt keys per batch (optional).
memory_key_padding_mask: the mask for the memory keys per batch (optional).
tgt_is_causal: If specified, applies a causal mask as tgt mask. Default: False. Warning: tgt_is_causal provides a hint that tgt_mask is the causal mask. Providing incorrect hints can result in incorrect execution, including forward and backward compatibility.
memory_is_causal: If specified, applies a causal mask as memory mask. Default: False. Warning: memory_is_causal provides a hint that memory_mask is the causal mask. Providing incorrect hints can result in incorrect execution, including forward and backward compatibility.
- Shape:
see the docs in
Transformer.
-
class torchwrench.nn.TransformerEncoder(encoder_layer: TransformerEncoderLayer, num_layers: int, norm: Module | None = None, enable_nested_tensor: bool = True, mask_check: bool = True)[source]¶
Bases: Module
TransformerEncoder is a stack of N encoder layers.
This TransformerEncoder layer implements the original architecture described in the Attention Is All You Need paper. The intent of this layer is as a reference implementation for foundational understanding and thus it contains only limited features relative to newer Transformer architectures. Given the fast pace of innovation in transformer-like architectures, we recommend exploring this tutorial to build efficient layers from building blocks in core or using higher level libraries from the PyTorch Ecosystem.
Warning
All layers in the TransformerEncoder are initialized with the same parameters. It is recommended to manually initialize the layers after creating the TransformerEncoder instance.
- Args:
encoder_layer: an instance of the TransformerEncoderLayer() class (required).
num_layers: the number of sub-encoder-layers in the encoder (required).
norm: the layer normalization component (optional).
enable_nested_tensor: if True, input will automatically convert to nested tensor (and convert back on output). This will improve the overall performance of TransformerEncoder when padding rate is high. Default: True (enabled).
- Examples:
>>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8) >>> transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6) >>> src = torch.rand(10, 32, 512) >>> out = transformer_encoder(src)
-
forward(src: Tensor, mask: Tensor | None = None, src_key_padding_mask: Tensor | None = None, is_causal: bool | None = None) Tensor[source]¶
Pass the input through the encoder layers in turn.
- Args:
src: the sequence to the encoder (required).
mask: the mask for the src sequence (optional).
src_key_padding_mask: the mask for the src keys per batch (optional).
is_causal: If specified, applies a causal mask as mask. Default: None; try to detect a causal mask. Warning: is_causal provides a hint that mask is the causal mask. Providing incorrect hints can result in incorrect execution, including forward and backward compatibility.
- Shape:
see the docs in
Transformer.
- class torchwrench.nn.TransformerEncoderLayer(d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1, activation: str | ~collections.abc.Callable[[~torch.Tensor], ~torch.Tensor] = <function relu>, layer_norm_eps: float = 1e-05, batch_first: bool = False, norm_first: bool = False, bias: bool = True, device=None, dtype=None)[source]¶
Bases: Module
TransformerEncoderLayer is made up of self-attn and feedforward network.
This TransformerEncoderLayer implements the original architecture described in the Attention Is All You Need paper. The intent of this layer is as a reference implementation for foundational understanding and thus it contains only limited features relative to newer Transformer architectures. Given the fast pace of innovation in transformer-like architectures, we recommend exploring this tutorial to build efficient layers from building blocks in core or using higher level libraries from the PyTorch Ecosystem.
TransformerEncoderLayer can handle either traditional torch.tensor inputs, or Nested Tensor inputs. Derived classes are expected to similarly accept both input formats. (Not all combinations of inputs are currently supported by TransformerEncoderLayer while Nested Tensor is in prototype state.)
If you are implementing a custom layer, you may derive it either from the Module or TransformerEncoderLayer class. If your custom layer supports both torch.Tensor and Nested Tensor inputs, make its implementation a derived class of TransformerEncoderLayer. If your custom layer supports only torch.Tensor inputs, derive its implementation from Module.
- Args:
d_model: the number of expected features in the input (required).
nhead: the number of heads in the multi-head attention models (required).
dim_feedforward: the dimension of the feedforward network model (default=2048).
dropout: the dropout value (default=0.1).
activation: the activation function of the intermediate layer; can be a string (“relu” or “gelu”) or a unary callable. Default: relu.
layer_norm_eps: the eps value in layer normalization components (default=1e-5).
batch_first: If True, then the input and output tensors are provided as (batch, seq, feature). Default: False (seq, batch, feature).
norm_first: if True, layer norm is done prior to attention and feedforward operations, respectively. Otherwise it’s done after. Default: False (after).
bias: If set to False, Linear and LayerNorm layers will not learn an additive bias. Default: True.
- Examples:
>>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
>>> src = torch.rand(10, 32, 512)
>>> out = encoder_layer(src)
- Alternatively, when batch_first is True:
>>> encoder_layer = nn.TransformerEncoderLayer(
...     d_model=512, nhead=8, batch_first=True
... )
>>> src = torch.rand(32, 10, 512)
>>> out = encoder_layer(src)
- Fast path:
forward() will use a special optimized implementation described in FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness if all of the following conditions are met:
- Either autograd is disabled (using torch.inference_mode or torch.no_grad) or no tensor argument requires_grad
- training is disabled (using .eval())
- batch_first is True and the input is batched (i.e., src.dim() == 3)
- activation is one of: "relu", "gelu", torch.nn.functional.relu, or torch.nn.functional.gelu
- at most one of src_mask and src_key_padding_mask is passed
- if src is a NestedTensor, neither src_mask nor src_key_padding_mask is passed
- the two LayerNorm instances have a consistent eps value (this will naturally be the case unless the caller has manually modified one without modifying the other)
If the optimized implementation is in use, a NestedTensor can be passed for src to represent padding more efficiently than using a padding mask. In this case, a NestedTensor will be returned, and an additional speedup proportional to the fraction of the input that is padding can be expected.
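The fast-path conditions listed above can be summarized as one predicate over plain flags. This is a hypothetical checklist for reading the rules, not PyTorch's dispatch code, which inspects the tensors and module state directly and may check further conditions not documented here. The parameter names are illustrative assumptions.

```python
def fast_path_eligible(
    *,
    autograd_needed: bool,       # autograd enabled AND some tensor argument requires grad
    training: bool,              # module.training (True unless .eval() was called)
    batch_first: bool,
    src_dim: int,                # src.dim()
    activation: str,             # activation name, e.g. "relu" or "gelu"
    has_src_mask: bool,
    has_key_padding_mask: bool,
    src_is_nested: bool,
    layer_norm_eps_match: bool,  # both LayerNorm instances share the same eps
) -> bool:
    """Hypothetical checklist mirroring the documented fast-path conditions."""
    if autograd_needed or training:
        return False
    if not batch_first or src_dim != 3:
        return False
    if activation not in ("relu", "gelu"):
        return False
    if has_src_mask and has_key_padding_mask:
        return False  # at most one of the two masks may be passed
    if src_is_nested and (has_src_mask or has_key_padding_mask):
        return False  # NestedTensor inputs must come without either mask
    return layer_norm_eps_match


inference_config = dict(
    autograd_needed=False, training=False, batch_first=True, src_dim=3,
    activation="relu", has_src_mask=False, has_key_padding_mask=True,
    src_is_nested=False, layer_norm_eps_match=True,
)
print(fast_path_eligible(**inference_config))
```

Flipping any single flag away from this inference-style configuration, such as leaving the module in training mode, is enough to fall back to the regular implementation.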
-
forward(src: Tensor, src_mask: Tensor | None = None, src_key_padding_mask: Tensor | None = None, is_causal: bool = False) Tensor[source]¶
Pass the input through the encoder layer.
- Args:
src: the sequence to the encoder layer (required).
src_mask: the mask for the src sequence (optional).
src_key_padding_mask: the mask for the src keys per batch (optional).
is_causal: If specified, applies a causal mask as src mask. Default: False. Warning: is_causal provides a hint that src_mask is the causal mask. Providing incorrect hints can result in incorrect execution, including forward and backward compatibility.
- Shape:
see the docs in
Transformer.
-
class torchwrench.nn.Transpose(dim0: int, dim1: int, copy: bool = False)[source]¶
Bases: Module
Module version of transpose().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
class torchwrench.nn.TripletMarginLoss(margin: float = 1.0, p: float = 2.0, eps: float = 1e-06, swap: bool = False, size_average=None, reduce=None, reduction: str = 'mean')[source]¶
Bases: _Loss
Creates a criterion that measures the triplet loss given input tensors \(x1\), \(x2\), \(x3\) and a margin with a value greater than \(0\). This is used for measuring a relative similarity between samples. A triplet is composed of a, p and n (i.e., anchor, positive example and negative example, respectively). The shapes of all input tensors should be \((N, D)\).
The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al.
The loss function for each sample in the mini-batch is:
\[L(a, p, n) = \max \{d(a_i, p_i) - d(a_i, n_i) + {\rm margin}, 0\}\]
where
\[d(x_i, y_i) = \left\lVert {\bf x}_i - {\bf y}_i \right\rVert_p\]
The norm is calculated using the specified p value and a small constant \(\varepsilon\) is added for numerical stability.
See also
TripletMarginWithDistanceLoss, which computes the triplet margin loss for input tensors using a custom distance function.
- Args:
margin (float, optional): Default: \(1\).
p (float, optional): The norm degree for pairwise distance. Default: \(2\).
eps (float, optional): Small constant for numerical stability. Default: \(1e-6\).
swap (bool, optional): The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al. Default: False.
size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional): Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input: \((N, D)\) or \((D)\) where \(D\) is the vector dimension.
Output: A Tensor of shape \((N)\) if reduction is 'none' and input shape is \((N, D)\); a scalar otherwise.
Examples:
>>> triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2, eps=1e-7)
>>> anchor = torch.randn(100, 128, requires_grad=True)
>>> positive = torch.randn(100, 128, requires_grad=True)
>>> negative = torch.randn(100, 128, requires_grad=True)
>>> output = triplet_loss(anchor, positive, negative)
>>> output.backward()
- class torchwrench.nn.TripletMarginWithDistanceLoss(*, distance_function: Callable[[Tensor, Tensor], Tensor] | None = None, margin: float = 1.0, swap: bool = False, reduction: str = 'mean')[source]¶
Bases: _Loss
Creates a criterion that measures the triplet loss given input tensors \(a\), \(p\), and \(n\) (representing anchor, positive, and negative examples, respectively), and a nonnegative, real-valued function (“distance function”) used to compute the relationship between the anchor and positive example (“positive distance”) and the anchor and negative example (“negative distance”).
The unreduced loss (i.e., with reduction set to 'none') can be described as:
\[\ell(a, p, n) = L = \{l_1,\dots,l_N\}^\top, \quad l_i = \max \{d(a_i, p_i) - d(a_i, n_i) + {\rm margin}, 0\}\]
where \(N\) is the batch size; \(d\) is a nonnegative, real-valued function quantifying the closeness of two tensors, referred to as the distance_function; and \({\rm margin}\) is a nonnegative margin representing the minimum difference between the positive and negative distances that is required for the loss to be 0. The input tensors have \(N\) elements each and can be of any shape that the distance function can handle.
If reduction is not 'none' (default 'mean'), then:
\[\ell(a, p, n) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases}\]
See also
TripletMarginLoss, which computes the triplet loss for input tensors using the \(l_p\) distance as the distance function.
- Args:
distance_function (Callable, optional): A nonnegative, real-valued function that quantifies the closeness of two tensors. If not specified, nn.PairwiseDistance will be used. Default: None
margin (float, optional): A nonnegative margin representing the minimum difference between the positive and negative distances required for the loss to be 0. Larger margins penalize cases where the negative examples are not distant enough from the anchors, relative to the positives. Default: \(1\).
swap (bool, optional): Whether to use the distance swap described in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al. If True, and if the positive example is closer to the negative example than the anchor is, swaps the positive example and the anchor in the loss computation. Default: False.
reduction (str, optional): Specifies the (optional) reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
- Shape:
Input: \((N, *)\) where \(*\) represents any number of additional dimensions as supported by the distance function.
Output: A Tensor of shape \((N)\) if reduction is 'none', or a scalar otherwise.
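The unreduced and reduced forms of the loss can be sketched in pure Python, with the distance function passed in as a plain callable. This is an illustration of the formulas above under simplifying assumptions (each example is a flat list of floats, and `swap` is modeled as taking the smaller of \(d(a_i, n_i)\) and \(d(p_i, n_i)\)); `triplet_loss_with_distance` and `l_infinity` are hypothetical names, not torchwrench APIs.

```python
from typing import Callable, List, Sequence, Union

Vector = Sequence[float]
DistanceFn = Callable[[Vector, Vector], float]


def l_infinity(x: Vector, y: Vector) -> float:
    """Example distance function: the largest absolute coordinate difference."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))


def triplet_loss_with_distance(
    anchors: Sequence[Vector],
    positives: Sequence[Vector],
    negatives: Sequence[Vector],
    distance_function: DistanceFn,
    margin: float = 1.0,
    swap: bool = False,
    reduction: str = "mean",
) -> Union[float, List[float]]:
    losses: List[float] = []
    for a, p, n in zip(anchors, positives, negatives):
        d_pos = distance_function(a, p)
        d_neg = distance_function(a, n)
        if swap:
            # If the positive is closer to the negative than the anchor is,
            # use d(p, n) as the negative distance instead.
            d_neg = min(d_neg, distance_function(p, n))
        losses.append(max(d_pos - d_neg + margin, 0.0))  # l_i
    if reduction == "none":
        return losses  # the vector L = {l_1, ..., l_N}
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    raise ValueError(f"invalid reduction: {reduction!r}")
```

Passing any other callable (for instance one based on cosine similarity) changes only `distance_function`; the margin, swap and reduction logic stays the same.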
Examples:
>>> import torch.nn.functional as F
>>> # Initialize embeddings
>>> embedding = nn.Embedding(1000, 128)
>>> anchor_ids = torch.randint(0, 1000, (1,))
>>> positive_ids = torch.randint(0, 1000, (1,))
>>> negative_ids = torch.randint(0, 1000, (1,))
>>> anchor = embedding(anchor_ids)
>>> positive = embedding(positive_ids)
>>> negative = embedding(negative_ids)
>>>
>>> # Built-in Distance Function
>>> triplet_loss = \
...     nn.TripletMarginWithDistanceLoss(distance_function=nn.PairwiseDistance())
>>> output = triplet_loss(anchor, positive, negative)
>>> # retain_graph=True keeps the shared embedding graph for the later backward calls
>>> output.backward(retain_graph=True)
>>>
>>> # Custom Distance Function
>>> def l_infinity(x1, x2):
...     return torch.max(torch.abs(x1 - x2), dim=1).values
>>>
>>> triplet_loss = (
...     nn.TripletMarginWithDistanceLoss(distance_function=l_infinity, margin=1.5))
>>> output = triplet_loss(anchor, positive, negative)
>>> output.backward(retain_graph=True)
>>>
>>> # Custom Distance Function (Lambda)
>>> triplet_loss = (
...     nn.TripletMarginWithDistanceLoss(
...         distance_function=lambda x, y: 1.0 - F.cosine_similarity(x, y)))
>>> output = triplet_loss(anchor, positive, negative)
>>> output.backward()
- Reference:
V. Balntas, et al.: Learning shallow convolutional feature descriptors with triplet losses: https://bmva-archive.org.uk/bmvc/2016/papers/paper119/index.html
- class torchwrench.nn.Unflatten(dim: int | str, unflattened_size: Size | list[int] | tuple[int, ...] | tuple[tuple[str, int]])[source]¶
Bases: Module
Unflattens a tensor dim, expanding it to a desired shape. For use with Sequential.
dim specifies the dimension of the input tensor to be unflattened, and it can be either int or str when Tensor or NamedTensor is used, respectively.
unflattened_size is the new shape of the unflattened dimension of the tensor and it can be a tuple of ints or a list of ints or torch.Size for Tensor input; a NamedShape (tuple of (name, size) tuples) for NamedTensor input.
- Shape:
Input: \((*, S_{\text{dim}}, *)\), where \(S_{\text{dim}}\) is the size at dimension dim and \(*\) means any number of dimensions including none.
Output: \((*, U_1, ..., U_n, *)\), where \(U\) = unflattened_size and \(\prod_{i=1}^n U_i = S_{\text{dim}}\).
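The shape rule above can be checked without any tensors. The following sketch, built around a hypothetical helper `unflattened_shape`, computes the output shape for a plain integer dim (named dimensions are not modeled, and every entry of unflattened_size must be given explicitly) and rejects sizes whose product does not equal \(S_{\text{dim}}\).

```python
from math import prod
from typing import Sequence, Tuple


def unflattened_shape(shape: Sequence[int], dim: int, unflattened_size: Sequence[int]) -> Tuple[int, ...]:
    """Replace shape[dim] by unflattened_size, checking prod(U_i) == S_dim."""
    ndim = len(shape)
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} is out of range for a {ndim}-D shape")
    dim %= ndim  # normalize negative dims, e.g. -1 -> ndim - 1
    if prod(unflattened_size) != shape[dim]:
        raise ValueError(f"prod({tuple(unflattened_size)}) != {shape[dim]}")
    return tuple(shape[:dim]) + tuple(unflattened_size) + tuple(shape[dim + 1:])
```

For instance, `unflattened_shape((2, 50), 1, (2, 5, 5))` returns `(2, 2, 5, 5)`, the same shape as in the examples below.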
- Args:
dim (Union[int, str]): Dimension to be unflattened
unflattened_size (Union[torch.Size, Tuple, List, NamedShape]): New shape of the unflattened dimension
- Examples:
>>> input = torch.randn(2, 50)
>>> # With tuple of ints
>>> m = nn.Sequential(
...     nn.Linear(50, 50),
...     nn.Unflatten(1, (2, 5, 5))
... )
>>> output = m(input)
>>> output.size()
torch.Size([2, 2, 5, 5])
>>> # With torch.Size
>>> m = nn.Sequential(
...     nn.Linear(50, 50),
...     nn.Unflatten(1, torch.Size([2, 5, 5]))
... )
>>> output = m(input)
>>> output.size()
torch.Size([2, 2, 5, 5])
>>> # With namedshape (tuple of tuples)
>>> input = torch.randn(2, 50, names=("N", "features"))
>>> unflatten = nn.Unflatten("features", (("C", 2), ("H", 5), ("W", 5)))
>>> output = unflatten(input)
>>> output.size()
torch.Size([2, 2, 5, 5])
- class torchwrench.nn.Unfold(kernel_size: int | tuple[int, ...], dilation: int | tuple[int, ...] = 1, padding: int | tuple[int, ...] = 0, stride: int | tuple[int, ...] = 1)[source]¶
Bases: Module
Extracts sliding local blocks from a batched input tensor.
Consider a batched input tensor of shape \((N, C, *)\), where \(N\) is the batch dimension, \(C\) is the channel dimension, and \(*\) represents arbitrary spatial dimensions. This operation flattens each sliding kernel_size-sized block within the spatial dimensions of input into a column (i.e., last dimension) of a 3-D output tensor of shape \((N, C \times \prod(\text{kernel\_size}), L)\), where \(C \times \prod(\text{kernel\_size})\) is the total number of values within each block (a block has \(\prod(\text{kernel\_size})\) spatial locations, each containing a \(C\)-channeled vector), and \(L\) is the total number of such blocks:
\[L = \prod_d \left\lfloor\frac{\text{spatial\_size}[d] + 2 \times \text{padding}[d] - \text{dilation}[d] \times (\text{kernel\_size}[d] - 1) - 1}{\text{stride}[d]} + 1\right\rfloor,\]
where \(\text{spatial\_size}\) is formed by the spatial dimensions of input (\(*\) above), and \(d\) is over all spatial dimensions.
Therefore, indexing output at the last dimension (column dimension) gives all values within a certain block.
The padding, stride and dilation arguments specify how the sliding blocks are retrieved.
stride controls the stride for the sliding blocks.
padding controls the amount of implicit zero-padding added on both sides of each spatial dimension (padding points per side) before reshaping.
dilation controls the spacing between the kernel points; this is also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
- Args:
kernel_size (int or tuple): the size of the sliding blocks
dilation (int or tuple, optional): a parameter that controls the stride of elements within the neighborhood. Default: 1
padding (int or tuple, optional): implicit zero padding to be added on both sides of input. Default: 0
stride (int or tuple, optional): the stride of the sliding blocks in the input spatial dimensions. Default: 1
If kernel_size, dilation, padding or stride is an int or a tuple of length 1, their values will be replicated across all spatial dimensions.
For the case of two input spatial dimensions this operation is sometimes called im2col.
Note
Fold calculates each combined value in the resulting large tensor by summing all values from all containing blocks. Unfold extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other.
In general, folding and unfolding operations are related as follows. Consider Fold and Unfold instances created with the same parameters:
>>> fold_params = dict(kernel_size=..., dilation=..., padding=..., stride=...)
>>> fold = nn.Fold(output_size=..., **fold_params)
>>> unfold = nn.Unfold(**fold_params)
Then for any (supported) input tensor the following equality holds:
fold(unfold(input)) == divisor * input
where divisor is a tensor that depends only on the shape and dtype of the input:
>>> # xdoctest: +SKIP
>>> input_ones = torch.ones(input.shape, dtype=input.dtype)
>>> divisor = fold(unfold(input_ones))
When the divisor tensor contains no zero elements, fold and unfold operations are inverses of each other (up to constant divisor).
Warning
Currently, only 4-D input tensors (batched image-like tensors) are supported.
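The divisor in the relation above counts, for every input position, how many sliding blocks contain it. The pure-Python sketch below computes that count per spatial dimension and, because the blocks form a grid, takes the outer product for the 2-D case. It is meant as an illustration of what fold(unfold(ones)) should contain, not as the library's algorithm; `coverage_1d`, `divisor_2d` and `is_invertible` are hypothetical names.

```python
from typing import List, Sequence


def coverage_1d(length: int, kernel_size: int, dilation: int = 1, padding: int = 0, stride: int = 1) -> List[int]:
    """Count how many sliding blocks contain each input position along one spatial dimension."""
    n_blocks = (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
    counts = [0] * length
    for b in range(max(n_blocks, 0)):
        start = b * stride - padding  # block start, in unpadded coordinates
        for m in range(kernel_size):
            pos = start + m * dilation
            if 0 <= pos < length:  # skip positions that fall in the zero padding
                counts[pos] += 1
    return counts


def divisor_2d(
    height: int,
    width: int,
    kernel_size: Sequence[int],
    dilation: Sequence[int] = (1, 1),
    padding: Sequence[int] = (0, 0),
    stride: Sequence[int] = (1, 1),
) -> List[List[int]]:
    """Blocks form a grid, so the 2-D count is the outer product of the 1-D counts."""
    rows = coverage_1d(height, kernel_size[0], dilation[0], padding[0], stride[0])
    cols = coverage_1d(width, kernel_size[1], dilation[1], padding[1], stride[1])
    return [[r * c for c in cols] for r in rows]


def is_invertible(divisor: List[List[int]]) -> bool:
    """fold undoes unfold (up to the divisor) only if every position is covered."""
    return all(v != 0 for row in divisor for v in row)
```

With kernel_size=2 and stride=3 on a length-5 dimension, position 2 is never visited (`coverage_1d(5, 2, stride=3)` is `[1, 1, 0, 1, 1]`), so that setting cannot be inverted.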
- Shape:
Input: \((N, C, *)\)
Output: \((N, C \times \prod(\text{kernel\_size}), L)\) as described above
Examples:
>>> unfold = nn.Unfold(kernel_size=(2, 3))
>>> input = torch.randn(2, 5, 3, 4)
>>> output = unfold(input)
>>> # each patch contains 30 values (2x3=6 vectors, each of 5 channels)
>>> # 4 blocks (2x3 kernels) in total in the 3x4 input
>>> output.size()
torch.Size([2, 30, 4])
>>> # xdoctest: +IGNORE_WANT
>>> # Convolution is equivalent with Unfold + Matrix Multiplication + Fold (or view to output shape)
>>> inp = torch.randn(1, 3, 10, 12)
>>> w = torch.randn(2, 3, 4, 5)
>>> inp_unf = torch.nn.functional.unfold(inp, (4, 5))
>>> out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2)
>>> out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1))
>>> # or equivalently (and avoiding a copy),
>>> # out = out_unf.view(1, 2, 7, 8)
>>> (torch.nn.functional.conv2d(inp, w) - out).abs().max()
tensor(1.9073e-06)
- class torchwrench.nn.Unsqueeze(dim: int | Iterable[int], mode: 'view_if_possible' | 'view' | 'copy' | 'inplace' = 'view_if_possible')[source]¶
Bases: Module
Module version of unsqueeze().
- extra_repr() str[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x: T_TensorOrArray) T_TensorOrArray[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.Upsample(size: int | tuple[int, ...] | None = None, scale_factor: float | tuple[float, ...] | None = None, mode: str = 'nearest', align_corners: bool | None = None, recompute_scale_factor: bool | None = None)[source]¶
Bases: Module
Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data.
The input data is assumed to be of the form minibatch x channels x [optional depth] x [optional height] x width. Hence, for spatial inputs, we expect a 4D Tensor and for volumetric inputs, we expect a 5D Tensor.
The algorithms available for upsampling are nearest neighbor, and linear, bilinear/bicubic, and trilinear for 3D, 4D and 5D input tensors, respectively.
One can either give a scale_factor or the target output size to calculate the output size. (You cannot give both, as it is ambiguous.)
- Args:
size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int], optional): output spatial sizes
scale_factor (float or Tuple[float] or Tuple[float, float] or Tuple[float, float, float], optional): multiplier for spatial size. If it is a tuple, its length has to match the number of spatial dimensions of the input.
mode (str, optional): the upsampling algorithm: one of 'nearest', 'linear', 'bilinear', 'bicubic' and 'trilinear'. Default: 'nearest'
align_corners (bool, optional): if True, the corner pixels of the input and output tensors are aligned, thus preserving the values at those pixels. This only has effect when mode is 'linear', 'bilinear', 'bicubic', or 'trilinear'. Default: False
recompute_scale_factor (bool, optional): recompute the scale_factor for use in the interpolation calculation. If recompute_scale_factor is True, then scale_factor must be passed in and scale_factor is used to compute the output size. The computed output size will be used to infer new scales for the interpolation. Note that when scale_factor is floating-point, it may differ from the recomputed scale_factor due to rounding and precision issues. If recompute_scale_factor is False, then size or scale_factor will be used directly for interpolation.
- Shape:
Input: \((N, C, W_{in})\), \((N, C, H_{in}, W_{in})\) or \((N, C, D_{in}, H_{in}, W_{in})\)
Output: \((N, C, W_{out})\), \((N, C, H_{out}, W_{out})\) or \((N, C, D_{out}, H_{out}, W_{out})\), where
\[D_{out} = \left\lfloor D_{in} \times \text{scale\_factor} \right\rfloor\]
\[H_{out} = \left\lfloor H_{in} \times \text{scale\_factor} \right\rfloor\]
\[W_{out} = \left\lfloor W_{in} \times \text{scale\_factor} \right\rfloor\]
Warning
With align_corners = True, the linearly interpolating modes (linear, bilinear, bicubic, and trilinear) don’t proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is align_corners = False. See below for concrete examples on how this affects the outputs.
Note
If you want downsampling/general resizing, you should use interpolate().
Examples:
>>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2)
>>> input
tensor([[[[1., 2.],
          [3., 4.]]]])
>>> m = nn.Upsample(scale_factor=2, mode='nearest')
>>> m(input)
tensor([[[[1., 1., 2., 2.],
          [1., 1., 2., 2.],
          [3., 3., 4., 4.],
          [3., 3., 4., 4.]]]])
>>> # xdoctest: +IGNORE_WANT("other tests seem to modify printing styles")
>>> m = nn.Upsample(scale_factor=2, mode='bilinear')  # align_corners=False
>>> m(input)
tensor([[[[1.0000, 1.2500, 1.7500, 2.0000],
          [1.5000, 1.7500, 2.2500, 2.5000],
          [2.5000, 2.7500, 3.2500, 3.5000],
          [3.0000, 3.2500, 3.7500, 4.0000]]]])
>>> m = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
>>> m(input)
tensor([[[[1.0000, 1.3333, 1.6667, 2.0000],
          [1.6667, 2.0000, 2.3333, 2.6667],
          [2.3333, 2.6667, 3.0000, 3.3333],
          [3.0000, 3.3333, 3.6667, 4.0000]]]])
>>> # Try scaling the same data in a larger tensor
>>> input_3x3 = torch.zeros(3, 3).view(1, 1, 3, 3)
>>> input_3x3[:, :, :2, :2].copy_(input)
tensor([[[[1., 2.],
          [3., 4.]]]])
>>> input_3x3
tensor([[[[1., 2., 0.],
          [3., 4., 0.],
          [0., 0., 0.]]]])
>>> # xdoctest: +IGNORE_WANT("seems to fail when other tests are run in the same session")
>>> m = nn.Upsample(scale_factor=2, mode='bilinear')  # align_corners=False
>>> # Notice that values in top left corner are the same with the small input (except at boundary)
>>> m(input_3x3)
tensor([[[[1.0000, 1.2500, 1.7500, 1.5000, 0.5000, 0.0000],
          [1.5000, 1.7500, 2.2500, 1.8750, 0.6250, 0.0000],
          [2.5000, 2.7500, 3.2500, 2.6250, 0.8750, 0.0000],
          [2.2500, 2.4375, 2.8125, 2.2500, 0.7500, 0.0000],
          [0.7500, 0.8125, 0.9375, 0.7500, 0.2500, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]])
>>> m = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
>>> # Notice that values in top left corner are now changed
>>> m(input_3x3)
tensor([[[[1.0000, 1.4000, 1.8000, 1.6000, 0.8000, 0.0000],
          [1.8000, 2.2000, 2.6000, 2.2400, 1.1200, 0.0000],
          [2.6000, 3.0000, 3.4000, 2.8800, 1.4400, 0.0000],
          [2.4000, 2.7200, 3.0400, 2.5600, 1.2800, 0.0000],
          [1.2000, 1.3600, 1.5200, 1.2800, 0.6400, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]])
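Bilinear interpolation is separable, and in both bilinear examples above the top output row maps back onto input row 0, so the first output row is just a 1-D linear interpolation of the first input row. The pure-Python sketch below reproduces those rows under two assumed coordinate mappings: with align_corners=True the corners coincide (src = dst * (in - 1) / (out - 1)); with align_corners=False pixel centers are aligned (src = (dst + 0.5) * in / out - 0.5, clamped at 0). It uses the ratio in / out, which matches scale_factor only when the output size is exactly in * scale_factor. The helper names are hypothetical.

```python
from math import floor
from typing import List, Sequence


def source_coordinate(dst: int, in_size: int, out_size: int, align_corners: bool) -> float:
    """Map an output index to a (fractional) input coordinate."""
    if align_corners:
        # Corner pixels of input and output coincide.
        return dst * (in_size - 1) / (out_size - 1) if out_size > 1 else 0.0
    # Pixel centers are aligned instead; coordinates left of the first center are clamped to 0.
    return max((dst + 0.5) * in_size / out_size - 0.5, 0.0)


def upsample_linear_1d(values: Sequence[float], scale_factor: float, align_corners: bool = False) -> List[float]:
    in_size = len(values)
    out_size = floor(in_size * scale_factor)  # W_out = floor(W_in * scale_factor)
    out = []
    for dst in range(out_size):
        src = source_coordinate(dst, in_size, out_size, align_corners)
        i0 = min(int(src), in_size - 1)  # left neighbor
        i1 = min(i0 + 1, in_size - 1)  # right neighbor, clamped at the border
        frac = src - i0
        out.append(values[i0] + frac * (values[i1] - values[i0]))
    return out
```

`upsample_linear_1d([1.0, 2.0, 0.0], 2)` yields `[1.0, 1.25, 1.75, 1.5, 0.5, 0.0]`, the first row of the 3x3 align_corners=False output, while `align_corners=True` gives approximately `[1.0, 1.4, 1.8, 1.6, 0.8, 0.0]`, showing why the top-left values change.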
- class torchwrench.nn.UpsamplingBilinear2d(size: int | tuple[int, int] | None = None, scale_factor: float | tuple[float, float] | None = None)[source]¶
Bases: Upsample
Applies a 2D bilinear upsampling to an input signal composed of several input channels.
To specify the scale, it takes either the size or the scale_factor as its constructor argument.
When size is given, it is the output size of the image (h, w).
- Args:
size (int or Tuple[int, int], optional): output spatial sizes
scale_factor (float or Tuple[float, float], optional): multiplier for spatial size.
Warning
This class is deprecated in favor of interpolate(). It is equivalent to nn.functional.interpolate(..., mode='bilinear', align_corners=True).
- Shape:
Input: \((N, C, H_{in}, W_{in})\)
Output: \((N, C, H_{out}, W_{out})\) where
\[H_{out} = \left\lfloor H_{in} \times \text{scale\_factor} \right\rfloor\]
\[W_{out} = \left\lfloor W_{in} \times \text{scale\_factor} \right\rfloor\]
Examples:
>>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2)
>>> input
tensor([[[[1., 2.],
          [3., 4.]]]])
>>> # xdoctest: +IGNORE_WANT("do other tests modify the global state?")
>>> m = nn.UpsamplingBilinear2d(scale_factor=2)
>>> m(input)
tensor([[[[1.0000, 1.3333, 1.6667, 2.0000],
          [1.6667, 2.0000, 2.3333, 2.6667],
          [2.3333, 2.6667, 3.0000, 3.3333],
          [3.0000, 3.3333, 3.6667, 4.0000]]]])
- class torchwrench.nn.UpsamplingNearest2d(size: int | tuple[int, int] | None = None, scale_factor: float | tuple[float, float] | None = None)[source]¶
Bases: Upsample
Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels.
To specify the scale, it takes either the size or the scale_factor as its constructor argument.
When size is given, it is the output size of the image (h, w).
- Args:
size (int or Tuple[int, int], optional): output spatial sizes
scale_factor (float or Tuple[float, float], optional): multiplier for spatial size.
Warning
This class is deprecated in favor of interpolate().
- Shape:
Input: \((N, C, H_{in}, W_{in})\)
Output: \((N, C, H_{out}, W_{out})\) where
\[H_{out} = \left\lfloor H_{in} \times \text{scale\_factor} \right\rfloor\]
\[W_{out} = \left\lfloor W_{in} \times \text{scale\_factor} \right\rfloor\]
Examples:
>>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2)
>>> input
tensor([[[[1., 2.],
          [3., 4.]]]])
>>> m = nn.UpsamplingNearest2d(scale_factor=2)
>>> m(input)
tensor([[[[1., 1., 2., 2.],
          [1., 1., 2., 2.],
          [3., 3., 4., 4.],
          [3., 3., 4., 4.]]]])
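The nearest-neighbor output above can be reproduced by copying, for each output index dst, the input index floor(dst * in / out). That index rule is an assumption about the legacy 'nearest' mapping (it agrees with the example for integer scale factors); the sketch applies it to rows and then to the list of rows, since nearest upsampling is separable. `upsample_nearest_1d` and `upsample_nearest_2d` are hypothetical helpers.

```python
from math import floor
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def upsample_nearest_1d(values: Sequence[T], scale_factor: float) -> List[T]:
    """Output index dst copies input index floor(dst * in_size / out_size)."""
    in_size = len(values)
    out_size = floor(in_size * scale_factor)
    return [values[min(dst * in_size // out_size, in_size - 1)] for dst in range(out_size)]


def upsample_nearest_2d(grid: Sequence[Sequence[float]], scale_factor: float) -> List[List[float]]:
    """Upsample each row, then repeat whole rows (nearest-neighbor is separable)."""
    widened = [upsample_nearest_1d(row, scale_factor) for row in grid]
    return [list(row) for row in upsample_nearest_1d(widened, scale_factor)]
```

`upsample_nearest_2d([[1.0, 2.0], [3.0, 4.0]], 2)` returns the same 4 x 4 grid as the example output.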
- class torchwrench.nn.View(dtype: dtype, /)[source]¶
- class torchwrench.nn.View(size: Sequence[int], /)
- class torchwrench.nn.View(*size: int)
Bases: Module
- forward(x: Tensor) Tensor[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ViewAsComplex(*args: Any, **kwargs: Any)[source]¶
Bases: Module
Module version of view_as_complex().
- forward(x: Tensor | ndarray | tuple[float, float]) ComplexFloatingTensor | ndarray | complex[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ViewAsReal(*args: Any, **kwargs: Any)[source]¶
Bases: Module
Module version of view_as_real().
- forward(x: Tensor | ndarray | complex) Tensor | ndarray | tuple[float, float][source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class torchwrench.nn.ZeroPad2d(padding: int | tuple[int, int, int, int])[source]¶
Bases: ConstantPad2d
Pads the input tensor boundaries with zero.
For N-dimensional padding, use torch.nn.functional.pad().
- Args:
padding (int, tuple): the size of the padding. If it is an int, uses the same padding in all boundaries. If it is a 4-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\)).
- Shape:
Input: \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\).
Output: \((N, C, H_{out}, W_{out})\) or \((C, H_{out}, W_{out})\), where
\(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\)
\(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
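The output-size formulas follow directly from where the zeros go. This pure-Python sketch pads a 2-D list of lists using the (left, right, top, bottom) order described above; it assumes nonnegative padding and a non-empty grid, and `zero_pad_2d` is a hypothetical helper rather than part of torchwrench.

```python
from typing import List, Sequence, Tuple, Union


def zero_pad_2d(grid: Sequence[Sequence[float]], padding: Union[int, Tuple[int, int, int, int]]) -> List[List[float]]:
    """Pad a 2-D list of lists with zeros; padding is an int or (left, right, top, bottom)."""
    if isinstance(padding, int):
        left = right = top = bottom = padding
    else:
        left, right, top, bottom = padding
    width = len(grid[0]) + left + right  # W_out = W_in + padding_left + padding_right
    zero_row = [0.0] * width
    body = [[0.0] * left + list(row) + [0.0] * right for row in grid]
    # H_out = H_in + padding_top + padding_bottom
    return [list(zero_row) for _ in range(top)] + body + [list(zero_row) for _ in range(bottom)]
```

A 3 x 3 input padded with `(1, 1, 2, 0)` becomes 5 x 5, and with `2` it becomes 7 x 7, matching the example outputs below.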
Examples:
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> m = nn.ZeroPad2d(2)
>>> input = torch.randn(1, 1, 3, 3)
>>> input
tensor([[[[-0.1678, -0.4418, 1.9466],
          [ 0.9604, -0.4219, -0.5241],
          [-0.9162, -0.5436, -0.6446]]]])
>>> m(input)
tensor([[[[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
          [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
          [ 0.0000, 0.0000, -0.1678, -0.4418, 1.9466, 0.0000, 0.0000],
          [ 0.0000, 0.0000, 0.9604, -0.4219, -0.5241, 0.0000, 0.0000],
          [ 0.0000, 0.0000, -0.9162, -0.5436, -0.6446, 0.0000, 0.0000],
          [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
          [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]])
>>> # using different paddings for different sides
>>> m = nn.ZeroPad2d((1, 1, 2, 0))
>>> m(input)
tensor([[[[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
          [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
          [ 0.0000, -0.1678, -0.4418, 1.9466, 0.0000],
          [ 0.0000, 0.9604, -0.4219, -0.5241, 0.0000],
          [ 0.0000, -0.9162, -0.5436, -0.6446, 0.0000]]]])
Subpackages¶
- torchwrench.nn.functional package
- torchwrench.nn.functional.activity_to_segments
- torchwrench.nn.functional.activity_to_segments_list
- torchwrench.nn.functional.all_eq
- torchwrench.nn.functional.all_ne
- torchwrench.nn.functional.arange
- torchwrench.nn.functional.as_device
- torchwrench.nn.functional.as_dtype
- torchwrench.nn.functional.as_generator
- torchwrench.nn.functional.as_tensor
- torchwrench.nn.functional.average_power
- torchwrench.nn.functional.cat
- torchwrench.nn.functional.cat_padded_batch
- torchwrench.nn.functional.checksum_any
- torchwrench.nn.functional.concat
- torchwrench.nn.functional.count_parameters
- torchwrench.nn.functional.crop_dim
- torchwrench.nn.functional.crop_dims
- torchwrench.nn.functional.deep_equal
- torchwrench.nn.functional.empty
- torchwrench.nn.functional.equal
- torchwrench.nn.functional.find
- torchwrench.nn.functional.flatten
- torchwrench.nn.functional.full
- torchwrench.nn.functional.generate_square_subsequent_mask
- torchwrench.nn.functional.get_default_device
- torchwrench.nn.functional.get_default_dtype
- torchwrench.nn.functional.get_default_generator
- torchwrench.nn.functional.get_inverse_perm
- torchwrench.nn.functional.get_ndim
- torchwrench.nn.functional.get_perm_indices
- torchwrench.nn.functional.get_shape
- torchwrench.nn.functional.identity
- torchwrench.nn.functional.index_to_name
- torchwrench.nn.functional.index_to_onehot
- torchwrench.nn.functional.indices_to_multihot
- torchwrench.nn.functional.indices_to_multinames
- torchwrench.nn.functional.initial_seed
- torchwrench.nn.functional.insert_at_indices
- torchwrench.nn.functional.is_complex
- torchwrench.nn.functional.is_convertible_to_tensor
- torchwrench.nn.functional.is_floating_point
- torchwrench.nn.functional.is_full
- torchwrench.nn.functional.is_sorted
- torchwrench.nn.functional.is_stackable
- torchwrench.nn.functional.is_tensor
- torchwrench.nn.functional.is_unique
- torchwrench.nn.functional.lengths_to_non_pad_mask
- torchwrench.nn.functional.lengths_to_pad_mask
- torchwrench.nn.functional.lengths_to_ratios
- torchwrench.nn.functional.log_softmax_multidim
- torchwrench.nn.functional.manual_seed
- torchwrench.nn.functional.masked_equal
- torchwrench.nn.functional.masked_mean
- torchwrench.nn.functional.masked_sum
- torchwrench.nn.functional.matmul
- torchwrench.nn.functional.move_to
- torchwrench.nn.functional.move_to_rec
- torchwrench.nn.functional.mse
- torchwrench.nn.functional.multi_indices_to_multihot
- torchwrench.nn.functional.multi_indices_to_multinames
- torchwrench.nn.functional.multihot_to_indices
- torchwrench.nn.functional.multihot_to_multi_indices
- torchwrench.nn.functional.multihot_to_multinames
- torchwrench.nn.functional.multilabel_to_powerset
- torchwrench.nn.functional.multinames_to_indices
- torchwrench.nn.functional.multinames_to_multi_indices
- torchwrench.nn.functional.multinames_to_multihot
- torchwrench.nn.functional.name_to_index
- torchwrench.nn.functional.name_to_onehot
- torchwrench.nn.functional.ndim
- torchwrench.nn.functional.nelement
- torchwrench.nn.functional.no_grad
- torchwrench.nn.functional.non_pad_mask_to_lengths
- torchwrench.nn.functional.non_pad_mask_to_ratios
- torchwrench.nn.functional.one_hot
- torchwrench.nn.functional.onehot_to_index
- torchwrench.nn.functional.onehot_to_name
- torchwrench.nn.functional.ones
- torchwrench.nn.functional.pad_and_crop_dim
- torchwrench.nn.functional.pad_and_stack_rec
- torchwrench.nn.functional.pad_dim
- torchwrench.nn.functional.pad_dims
- torchwrench.nn.functional.pad_mask_to_lengths
- torchwrench.nn.functional.pad_mask_to_ratios
- torchwrench.nn.functional.powerset_to_multilabel
- torchwrench.nn.functional.probs_to_index
- torchwrench.nn.functional.probs_to_indices
- torchwrench.nn.functional.probs_to_multi_indices
- torchwrench.nn.functional.probs_to_multihot
- torchwrench.nn.functional.probs_to_multinames
- torchwrench.nn.functional.probs_to_name
- torchwrench.nn.functional.probs_to_onehot
- torchwrench.nn.functional.prod
- torchwrench.nn.functional.rand
- torchwrench.nn.functional.randint
- torchwrench.nn.functional.randn
- torchwrench.nn.functional.randperm
- torchwrench.nn.functional.randperm_diff
- torchwrench.nn.functional.ranks
- torchwrench.nn.functional.ratios_to_lengths
- torchwrench.nn.functional.ratios_to_non_pad_mask
- torchwrench.nn.functional.ratios_to_pad_mask
- torchwrench.nn.functional.recursive_to
- torchwrench.nn.functional.remove_at_indices
- torchwrench.nn.functional.repeat_interleave_nd
- torchwrench.nn.functional.resample_nearest_freqs
- torchwrench.nn.functional.resample_nearest_rates
- torchwrench.nn.functional.resample_nearest_steps
- torchwrench.nn.functional.rmse
- torchwrench.nn.functional.seed
- torchwrench.nn.functional.segments_list_to_activity
- torchwrench.nn.functional.segments_to_activity
- torchwrench.nn.functional.segments_to_segments_list
- torchwrench.nn.functional.set_default_dtype
- torchwrench.nn.functional.set_default_generator
- torchwrench.nn.functional.shape
- torchwrench.nn.functional.shuffled
- torchwrench.nn.functional.softmax_multidim
- torchwrench.nn.functional.split
- torchwrench.nn.functional.squeeze
- torchwrench.nn.functional.squeeze_
- torchwrench.nn.functional.squeeze_copy
- torchwrench.nn.functional.stack
- torchwrench.nn.functional.tensor_to_lengths
- torchwrench.nn.functional.tensor_to_non_pad_mask
- torchwrench.nn.functional.tensor_to_pad_mask
- torchwrench.nn.functional.tensor_to_tensors_list
- torchwrench.nn.functional.tensors_list_to_lengths
- torchwrench.nn.functional.to_item
- torchwrench.nn.functional.to_tensor
- torchwrench.nn.functional.top_k
- torchwrench.nn.functional.top_p
- torchwrench.nn.functional.topk
- torchwrench.nn.functional.transform_drop
- torchwrench.nn.functional.unsqueeze
- torchwrench.nn.functional.unsqueeze_
- torchwrench.nn.functional.unsqueeze_copy
- torchwrench.nn.functional.view_as_complex
- torchwrench.nn.functional.view_as_real
- torchwrench.nn.functional.where
- torchwrench.nn.functional.zeros
- Submodules
- torchwrench.nn.functional.activation module
- torchwrench.nn.functional.checksum module
- torchwrench.nn.functional.checksum.checksum_dataframe
- torchwrench.nn.functional.checksum.checksum_dtype
- torchwrench.nn.functional.checksum.checksum_module
- torchwrench.nn.functional.checksum.checksum_numpy
- torchwrench.nn.functional.checksum.checksum_series
- torchwrench.nn.functional.checksum.checksum_tensor
- torchwrench.nn.functional.cropping module
- torchwrench.nn.functional.indices module
- torchwrench.nn.functional.make module
- torchwrench.nn.functional.mask module
- torchwrench.nn.functional.mask.generate_square_subsequent_mask
- torchwrench.nn.functional.mask.lengths_to_non_pad_mask
- torchwrench.nn.functional.mask.lengths_to_pad_mask
- torchwrench.nn.functional.mask.lengths_to_ratios
- torchwrench.nn.functional.mask.masked_equal
- torchwrench.nn.functional.mask.masked_mean
- torchwrench.nn.functional.mask.masked_sum
- torchwrench.nn.functional.mask.non_pad_mask_to_lengths
- torchwrench.nn.functional.mask.non_pad_mask_to_ratios
- torchwrench.nn.functional.mask.pad_mask_to_lengths
- torchwrench.nn.functional.mask.pad_mask_to_ratios
- torchwrench.nn.functional.mask.ratios_to_lengths
- torchwrench.nn.functional.mask.ratios_to_non_pad_mask
- torchwrench.nn.functional.mask.ratios_to_pad_mask
- torchwrench.nn.functional.mask.tensor_to_lengths
- torchwrench.nn.functional.mask.tensor_to_non_pad_mask
- torchwrench.nn.functional.mask.tensor_to_pad_mask
- torchwrench.nn.functional.mask.tensor_to_tensors_list
- torchwrench.nn.functional.mask.tensors_list_to_lengths
- torchwrench.nn.functional.multiclass module
- torchwrench.nn.functional.multiclass.index_to_name
- torchwrench.nn.functional.multiclass.index_to_onehot
- torchwrench.nn.functional.multiclass.name_to_index
- torchwrench.nn.functional.multiclass.name_to_onehot
- torchwrench.nn.functional.multiclass.one_hot
- torchwrench.nn.functional.multiclass.onehot_to_index
- torchwrench.nn.functional.multiclass.onehot_to_name
- torchwrench.nn.functional.multiclass.probs_to_index
- torchwrench.nn.functional.multiclass.probs_to_name
- torchwrench.nn.functional.multiclass.probs_to_onehot
- torchwrench.nn.functional.multilabel module
- torchwrench.nn.functional.multilabel.indices_to_multihot
- torchwrench.nn.functional.multilabel.indices_to_multinames
- torchwrench.nn.functional.multilabel.multi_indices_to_multihot
- torchwrench.nn.functional.multilabel.multi_indices_to_multinames
- torchwrench.nn.functional.multilabel.multihot_to_indices
- torchwrench.nn.functional.multilabel.multihot_to_multi_indices
- torchwrench.nn.functional.multilabel.multihot_to_multinames
- torchwrench.nn.functional.multilabel.multinames_to_indices
- torchwrench.nn.functional.multilabel.multinames_to_multi_indices
- torchwrench.nn.functional.multilabel.multinames_to_multihot
- torchwrench.nn.functional.multilabel.probs_to_indices
- torchwrench.nn.functional.multilabel.probs_to_multi_indices
- torchwrench.nn.functional.multilabel.probs_to_multihot
- torchwrench.nn.functional.multilabel.probs_to_multinames
- torchwrench.nn.functional.new module
- torchwrench.nn.functional.new.arange
- torchwrench.nn.functional.new.empty
- torchwrench.nn.functional.new.full
- torchwrench.nn.functional.new.ones
- torchwrench.nn.functional.new.rand
- torchwrench.nn.functional.new.randint
- torchwrench.nn.functional.new.randn
- torchwrench.nn.functional.new.randperm
- torchwrench.nn.functional.new.zeros
- torchwrench.nn.functional.others module
- torchwrench.nn.functional.others.average_power
- torchwrench.nn.functional.others.cat
- torchwrench.nn.functional.others.concat
- torchwrench.nn.functional.others.count_parameters
- torchwrench.nn.functional.others.deep_equal
- torchwrench.nn.functional.others.find
- torchwrench.nn.functional.others.get_ndim
- torchwrench.nn.functional.others.get_shape
- torchwrench.nn.functional.others.mse
- torchwrench.nn.functional.others.ndim
- torchwrench.nn.functional.others.nelement
- torchwrench.nn.functional.others.prod
- torchwrench.nn.functional.others.ranks
- torchwrench.nn.functional.others.rmse
- torchwrench.nn.functional.others.shape
- torchwrench.nn.functional.others.stack
- torchwrench.nn.functional.padding module
- torchwrench.nn.functional.powerset module
- torchwrench.nn.functional.predicate module
- torchwrench.nn.functional.predicate.all_eq
- torchwrench.nn.functional.predicate.all_ne
- torchwrench.nn.functional.predicate.is_complex
- torchwrench.nn.functional.predicate.is_convertible_to_tensor
- torchwrench.nn.functional.predicate.is_floating_point
- torchwrench.nn.functional.predicate.is_full
- torchwrench.nn.functional.predicate.is_sorted
- torchwrench.nn.functional.predicate.is_stackable
- torchwrench.nn.functional.predicate.is_unique
- torchwrench.nn.functional.segments module
- torchwrench.nn.functional.transform module
- torchwrench.nn.functional.transform.as_tensor
- torchwrench.nn.functional.transform.flatten
- torchwrench.nn.functional.transform.move_to
- torchwrench.nn.functional.transform.move_to_rec
- torchwrench.nn.functional.transform.pad_and_crop_dim
- torchwrench.nn.functional.transform.repeat_interleave_nd
- torchwrench.nn.functional.transform.resample_nearest_freqs
- torchwrench.nn.functional.transform.resample_nearest_rates
- torchwrench.nn.functional.transform.resample_nearest_steps
- torchwrench.nn.functional.transform.shuffled
- torchwrench.nn.functional.transform.squeeze
- torchwrench.nn.functional.transform.squeeze_
- torchwrench.nn.functional.transform.squeeze_copy
- torchwrench.nn.functional.transform.to_item
- torchwrench.nn.functional.transform.to_tensor
- torchwrench.nn.functional.transform.top_k
- torchwrench.nn.functional.transform.top_p
- torchwrench.nn.functional.transform.topk
- torchwrench.nn.functional.transform.transform_drop
- torchwrench.nn.functional.transform.unsqueeze
- torchwrench.nn.functional.transform.unsqueeze_
- torchwrench.nn.functional.transform.unsqueeze_copy
- torchwrench.nn.functional.transform.view_as_complex
- torchwrench.nn.functional.transform.view_as_real
- torchwrench.nn.modules package
- torchwrench.nn.modules.Abs
- torchwrench.nn.modules.Angle
- torchwrench.nn.modules.AsTensor
- torchwrench.nn.modules.CropDim
- torchwrench.nn.modules.CropDims
- torchwrench.nn.modules.EModule
- torchwrench.nn.modules.EModuleDict
- torchwrench.nn.modules.EModuleList
- torchwrench.nn.modules.EModulePartial
- torchwrench.nn.modules.ESequential
- torchwrench.nn.modules.Exp
- torchwrench.nn.modules.Exp2
- torchwrench.nn.modules.FFT
- torchwrench.nn.modules.IFFT
- torchwrench.nn.modules.Identity
- torchwrench.nn.modules.Imag
- torchwrench.nn.modules.IndexToName
- torchwrench.nn.modules.IndexToOnehot
- torchwrench.nn.modules.IndicesToMultihot
- torchwrench.nn.modules.IndicesToMultinames
- torchwrench.nn.modules.Log
- torchwrench.nn.modules.Log10
- torchwrench.nn.modules.Log2
- torchwrench.nn.modules.LogSoftmaxMultidim
- torchwrench.nn.modules.MaskedMean
- torchwrench.nn.modules.MaskedSum
- torchwrench.nn.modules.Max
- torchwrench.nn.modules.Mean
- torchwrench.nn.modules.Min
- torchwrench.nn.modules.Module
- Variables
- T_destination
- add_module
- apply
- bfloat16
- buffers
- call_super_init
- children
- compile
- cpu
- cuda
- double
- dump_patches
- eval
- extra_repr
- float
- forward
- get_buffer
- get_extra_state
- get_parameter
- get_submodule
- half
- ipu
- load_state_dict
- modules
- mtia
- named_buffers
- named_children
- named_modules
- named_parameters
- parameters
- register_backward_hook
- register_buffer
- register_forward_hook
- register_forward_pre_hook
- register_full_backward_hook
- register_full_backward_pre_hook
- register_load_state_dict_post_hook
- register_load_state_dict_pre_hook
- register_module
- register_parameter
- register_state_dict_post_hook
- register_state_dict_pre_hook
- requires_grad_
- set_extra_state
- set_submodule
- share_memory
- state_dict
- to
- to_empty
- train
- training
- type
- xpu
- zero_grad
- torchwrench.nn.modules.ModuleDict
- torchwrench.nn.modules.ModuleList
- torchwrench.nn.modules.ModulePartial
- torchwrench.nn.modules.MoveToRec
- torchwrench.nn.modules.MultiIndicesToMultihot
- torchwrench.nn.modules.MultiIndicesToMultinames
- torchwrench.nn.modules.MultihotToIndices
- torchwrench.nn.modules.MultihotToMultiIndices
- torchwrench.nn.modules.MultihotToMultinames
- torchwrench.nn.modules.MultilabelToPowerset
- torchwrench.nn.modules.MultinamesToIndices
- torchwrench.nn.modules.MultinamesToMultiIndices
- torchwrench.nn.modules.MultinamesToMultihot
- torchwrench.nn.modules.NDArrayToTensor
- torchwrench.nn.modules.NameToIndex
- torchwrench.nn.modules.NameToOnehot
- torchwrench.nn.modules.Normalize
- torchwrench.nn.modules.OnehotToIndex
- torchwrench.nn.modules.OnehotToName
- torchwrench.nn.modules.PadAndCropDim
- torchwrench.nn.modules.PadAndStackRec
- torchwrench.nn.modules.PadDim
- torchwrench.nn.modules.PadDims
- torchwrench.nn.modules.Permute
- torchwrench.nn.modules.PositionalEncoding
- torchwrench.nn.modules.Pow
- torchwrench.nn.modules.PowersetToMultilabel
- torchwrench.nn.modules.ProbsToIndex
- torchwrench.nn.modules.ProbsToIndices
- torchwrench.nn.modules.ProbsToMultiIndices
- torchwrench.nn.modules.ProbsToMultihot
- torchwrench.nn.modules.ProbsToMultinames
- torchwrench.nn.modules.ProbsToName
- torchwrench.nn.modules.ProbsToOnehot
- torchwrench.nn.modules.Real
- torchwrench.nn.modules.Repeat
- torchwrench.nn.modules.RepeatInterleave
- torchwrench.nn.modules.RepeatInterleaveNd
- torchwrench.nn.modules.ResampleNearestFreqs
- torchwrench.nn.modules.ResampleNearestRates
- torchwrench.nn.modules.ResampleNearestSteps
- torchwrench.nn.modules.Reshape
- torchwrench.nn.modules.Sequential
- torchwrench.nn.modules.Shuffled
- torchwrench.nn.modules.SoftmaxMultidim
- torchwrench.nn.modules.Sort
- torchwrench.nn.modules.Squeeze
- torchwrench.nn.modules.TFlatten
- torchwrench.nn.modules.TensorTo
- torchwrench.nn.modules.TensorToNDArray
- torchwrench.nn.modules.ToItem
- torchwrench.nn.modules.ToList
- torchwrench.nn.modules.ToNDArray
- torchwrench.nn.modules.ToTensor
- torchwrench.nn.modules.TopP
- torchwrench.nn.modules.Topk
- torchwrench.nn.modules.TransformDrop
- torchwrench.nn.modules.Transpose
- torchwrench.nn.modules.Unsqueeze
- torchwrench.nn.modules.View
- torchwrench.nn.modules.ViewAsComplex
- torchwrench.nn.modules.ViewAsReal
- Submodules
- torchwrench.nn.modules.activation module
- torchwrench.nn.modules.container module
- torchwrench.nn.modules.crop module
- torchwrench.nn.modules.layer module
- torchwrench.nn.modules.mask module
- torchwrench.nn.modules.module module
- torchwrench.nn.modules.multiclass module
- torchwrench.nn.modules.multiclass.IndexToName
- torchwrench.nn.modules.multiclass.IndexToOnehot
- torchwrench.nn.modules.multiclass.NameToIndex
- torchwrench.nn.modules.multiclass.NameToOnehot
- torchwrench.nn.modules.multiclass.OnehotToIndex
- torchwrench.nn.modules.multiclass.OnehotToName
- torchwrench.nn.modules.multiclass.ProbsToIndex
- torchwrench.nn.modules.multiclass.ProbsToName
- torchwrench.nn.modules.multiclass.ProbsToOnehot
- torchwrench.nn.modules.multilabel module
- torchwrench.nn.modules.multilabel.IndicesToMultihot
- torchwrench.nn.modules.multilabel.IndicesToMultinames
- torchwrench.nn.modules.multilabel.MultiIndicesToMultihot
- torchwrench.nn.modules.multilabel.MultiIndicesToMultinames
- torchwrench.nn.modules.multilabel.MultihotToIndices
- torchwrench.nn.modules.multilabel.MultihotToMultiIndices
- torchwrench.nn.modules.multilabel.MultihotToMultinames
- torchwrench.nn.modules.multilabel.MultinamesToIndices
- torchwrench.nn.modules.multilabel.MultinamesToMultiIndices
- torchwrench.nn.modules.multilabel.MultinamesToMultihot
- torchwrench.nn.modules.multilabel.ProbsToIndices
- torchwrench.nn.modules.multilabel.ProbsToMultiIndices
- torchwrench.nn.modules.multilabel.ProbsToMultihot
- torchwrench.nn.modules.multilabel.ProbsToMultinames
- torchwrench.nn.modules.numpy module
- torchwrench.nn.modules.padding module
- torchwrench.nn.modules.powerset module
- torchwrench.nn.modules.tensor module
- torchwrench.nn.modules.tensor.Abs
- torchwrench.nn.modules.tensor.Angle
- torchwrench.nn.modules.tensor.Exp
- torchwrench.nn.modules.tensor.Exp2
- torchwrench.nn.modules.tensor.FFT
- torchwrench.nn.modules.tensor.IFFT
- torchwrench.nn.modules.tensor.Imag
- torchwrench.nn.modules.tensor.Interpolate
- torchwrench.nn.modules.tensor.Log
- torchwrench.nn.modules.tensor.Log10
- torchwrench.nn.modules.tensor.Log2
- torchwrench.nn.modules.tensor.Max
- torchwrench.nn.modules.tensor.Mean
- torchwrench.nn.modules.tensor.Min
- torchwrench.nn.modules.tensor.Normalize
- torchwrench.nn.modules.tensor.Permute
- torchwrench.nn.modules.tensor.Pow
- torchwrench.nn.modules.tensor.Real
- torchwrench.nn.modules.tensor.Repeat
- torchwrench.nn.modules.tensor.RepeatInterleave
- torchwrench.nn.modules.tensor.Reshape
- torchwrench.nn.modules.tensor.Sort
- torchwrench.nn.modules.tensor.TensorTo
- torchwrench.nn.modules.tensor.ToList
- torchwrench.nn.modules.tensor.Transpose
- torchwrench.nn.modules.tensor.View
- torchwrench.nn.modules.transform module
- torchwrench.nn.modules.transform.AsTensor
- torchwrench.nn.modules.transform.Identity
- torchwrench.nn.modules.transform.MoveToRec
- torchwrench.nn.modules.transform.PadAndCropDim
- torchwrench.nn.modules.transform.RepeatInterleaveNd
- torchwrench.nn.modules.transform.ResampleNearestFreqs
- torchwrench.nn.modules.transform.ResampleNearestRates
- torchwrench.nn.modules.transform.ResampleNearestSteps
- torchwrench.nn.modules.transform.Shuffled
- torchwrench.nn.modules.transform.Squeeze
- torchwrench.nn.modules.transform.TFlatten
- torchwrench.nn.modules.transform.ToItem
- torchwrench.nn.modules.transform.ToTensor
- torchwrench.nn.modules.transform.TopK
- torchwrench.nn.modules.transform.TopP
- torchwrench.nn.modules.transform.Topk
- torchwrench.nn.modules.transform.TransformDrop
- torchwrench.nn.modules.transform.Unsqueeze
- torchwrench.nn.modules.transform.ViewAsComplex
- torchwrench.nn.modules.transform.ViewAsReal