input_size: The number of expected features in the input x
hidden_size: The number of features in the hidden state h
num_layers: Number of recurrent layers. E.g., setting num_layers=2
would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
nonlinearity: The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
bias: If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE
batch_first: If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE
dropout: If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
bidirectional: If TRUE, becomes a bidirectional RNN. Default: FALSE
...: other arguments that can be passed to the super class.
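A minimal construction sketch, assuming the torch R package is installed and attached (the argument values are illustrative):

library(torch)

# Two stacked RNN layers over 10-dimensional inputs with a
# 20-dimensional hidden state; remaining arguments keep the
# defaults listed above.
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)

The later examples on this page reuse this rnn module.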
Details
For each element in the input sequence, each layer computes the following function:
h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})
where h_t is the hidden state at time t, x_t is the input at time t, and h_{(t-1)} is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then ReLU is used instead of tanh.
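The recurrence can be sketched directly in torch operations. All tensor names below are illustrative, not part of the module's API; this is a single step for a single layer:

library(torch)

input_size <- 4
hidden_size <- 3

# Illustrative weights and biases, matching the shapes in the
# Attributes section below.
w_ih <- torch_randn(hidden_size, input_size)
w_hh <- torch_randn(hidden_size, hidden_size)
b_ih <- torch_randn(hidden_size)
b_hh <- torch_randn(hidden_size)

x_t    <- torch_randn(input_size)   # input at time t
h_prev <- torch_zeros(hidden_size)  # h_(t-1), zero at time 0

# One application of the formula above.
h_t <- torch_tanh(
  torch_matmul(w_ih, x_t) + b_ih + torch_matmul(w_hh, h_prev) + b_hh
)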
Inputs
input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
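Continuing with the rnn module constructed above (num_layers = 2, unidirectional, hidden_size = 20), a sketch of a forward pass with both inputs:

input <- torch_randn(5, 3, 10)  # (seq_len, batch, input_size)
h0    <- torch_zeros(2, 3, 20)  # (num_layers * num_directions, batch, hidden_size)

out <- rnn(input, h0)  # returns a list: out[[1]] is output, out[[2]] is h_n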
Outputs
output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(c(seq_len, batch, num_directions, hidden_size)), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(c(num_layers, num_directions, batch, hidden_size)).
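A sketch of separating the directions in the unpacked bidirectional case. Sizes are illustrative; note that R tensor indexing is 1-based, so direction 0 above corresponds to index 1:

birnn <- nn_rnn(input_size = 10, hidden_size = 20, bidirectional = TRUE)
out <- birnn(torch_randn(5, 3, 10))  # h_0 defaults to zeros

output <- out[[1]]                    # shape (5, 3, 2 * 20)
by_dir <- output$view(c(5, 3, 2, 20)) # expose the direction axis

forward_out  <- by_dir[ , , 1, ]  # direction 0 above (R is 1-based)
backward_out <- by_dir[ , , 2, ]  # direction 1 above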
Shape
Input1: (L, N, H_in) tensor containing input features, where H_in = input_size and L is the sequence length.
Input2: (S, N, H_out) tensor containing the initial hidden state for each element in the batch, where H_out = hidden_size and S = num_layers * num_directions. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Output1: (L, N, H_all) where H_all = num_directions * hidden_size
Output2: (S, N, H_out) tensor containing the next hidden state for each element in the batch
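These shapes can be checked on the unidirectional module from above; dim() on a torch tensor returns its shape as an integer vector:

out <- rnn(torch_randn(5, 3, 10))  # h_0 defaults to zero

dim(out[[1]])  # 5 3 20 -> (L, N, H_all), H_all = 1 * 20 here
dim(out[[2]])  # 2 3 20 -> (S, N, H_out), S = num_layers * num_directions = 2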
Attributes
weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
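A sketch of inspecting these attributes through the module's parameter list. The exact entry names and their ordering are an assumption here, per the l[k] convention above; check names(rnn$parameters) on your version:

names(rnn$parameters)
# one weight_ih, weight_hh, bias_ih and bias_hh entry per layer

dim(rnn$parameters[[1]])  # 20 10 -> (hidden_size, input_size), assuming the
                          # first entry is the first layer's input-hidden weight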
Note
All the weights and biases are initialized from U(-\sqrt{k}, \sqrt{k}) where k = 1/hidden_size.
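A quick empirical check of this bound on the freshly constructed module from above, where hidden_size = 20:

bound <- sqrt(1 / 20)  # sqrt(k) with k = 1 / hidden_size

# Every parameter entry should lie within [-bound, bound] before training.
all(sapply(rnn$parameters, function(p) as.numeric(p$abs()$max()) <= bound))
# TRUE on a freshly constructed module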