Calculates loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be "many-to-one", which limits the length of the target sequence such that it must be ≤ the input length.
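The "sum over possible alignments" can be made concrete with the standard CTC forward (alpha) recursion. The sketch below is a pure-Python illustration of that math for a single sequence, not the library implementation; the helper name `ctc_loss_single` and its list-of-lists input format are inventions for this example.

```python
import math

def ctc_loss_single(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC, summing over all
    valid alignments via the forward (alpha) recursion.
    log_probs: T lists, log_probs[t][c] = log P(class c at step t).
    target: list of class indices (no blanks)."""
    # Extended target: blanks interleaved -> [blank, t1, blank, t2, ..., blank]
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S, T = len(ext), len(log_probs)
    NEG_INF = float("-inf")

    def logsumexp(*xs):
        m = max(xs)
        if m == NEG_INF:
            return NEG_INF
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # alpha[s]: log-prob of all prefixes of alignments ending at ext[s]
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            a = alpha[s]
            if s > 0:
                a = logsumexp(a, alpha[s - 1])
            # May skip a blank only between two *different* labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = logsumexp(a, alpha[s - 2])
            new[s] = a + log_probs[t][ext[s]]
        alpha = new
    # Complete paths end on the last label or the trailing blank
    return -logsumexp(alpha[-1], alpha[-2] if S > 1 else NEG_INF)
```

For uniform probabilities over two classes with T = 2 and target `[1]`, three length-2 paths collapse to the target (`01`, `10`, `11`), so the loss is `-log(3/4)`.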
reduction: (string, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken; 'sum': the output losses will be summed. Default: 'mean'
zero_infinity: (bool, optional): Whether to zero infinite losses and the associated gradients. Infinite losses mainly occur when the inputs are too short to be aligned to the targets. Default: FALSE
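A target of length S needs at least S input steps, plus one extra step for every pair of adjacent repeated labels (CTC must emit a blank between equal labels); any shorter input cannot be aligned and produces an infinite loss. A small helper (illustrative only; `min_input_length` is not part of the API) makes that bound concrete:

```python
def min_input_length(target):
    """Smallest input length T for which CTC can align `target` at all:
    one step per label, plus one mandatory blank between each pair of
    equal neighbouring labels."""
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats
```

For example, `min_input_length([1, 1, 2])` is 4, so an input of length 3 would yield an infinite loss for that target.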
Note
In order to use CuDNN, the following must be satisfied: targets must be in concatenated format, all input_lengths must be T, blank = 0, target_lengths ≤ 256, and the integer arguments must be of dtype torch_int32. The regular implementation uses the (more common in PyTorch) torch_long dtype.
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = TRUE.
Shape
Log_probs: Tensor of size (T, N, C), where T = input length, N = batch size, and C = number of classes (including blank). The logarithmized probabilities of the outputs (e.g. obtained with nnf_log_softmax()).
Targets: Tensor of size (N, S) or (sum(target_lengths)), where N = batch size and S = max target length if shape is (N, S). It represents the target sequences. Each element in the target sequence is a class index, and the target index cannot be blank (default = 0). In the (N, S) form, targets are padded to the length of the longest sequence and stacked. In the (sum(target_lengths)) form, the targets are assumed to be un-padded and concatenated along one dimension.
Input_lengths: Tuple or tensor of size (N), where N = batch size. It represents the lengths of the inputs (each must be ≤ T). Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.
Target_lengths: Tuple or tensor of size (N), where N = batch size. It represents the lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. If target shape is (N, S), target_lengths are effectively the stop index s_n for each target sequence, such that target_n = targets[n, 0:s_n] for each target in a batch. Lengths must each be ≤ S.
If the targets are given as a 1d tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor.
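The relationship between the padded (N, S) form and the concatenated 1d form is just row-wise truncation followed by flattening. The following pure-Python sketch (the helper `concat_targets` is an invention for this example, not a library function) shows the conversion:

```python
def concat_targets(padded, target_lengths):
    """Flatten padded (N, S) targets into the concatenated 1d form:
    keep only the first target_lengths[n] entries of each row,
    i.e. target_n = targets[n, 0:s_n]."""
    flat = []
    for row, length in zip(padded, target_lengths):
        flat.extend(row[:length])
    return flat
```

For example, `concat_targets([[1, 2, 0], [3, 0, 0]], [2, 1])` yields `[1, 2, 3]`, whose total length equals `sum(target_lengths)` as required.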
Output: scalar. If reduction is 'none', then (N), where N = batch size.