Table B.2 gives the parameter names and default values used by two common Python-based deep learning frameworks: Keras [Chollet, 2021], a high-level interface to TensorFlow, and PyTorch. For documentation on Keras, see https://keras.io. For documentation on PyTorch, see https://pytorch.org.
| Algorithm (this book) | Page | Name (Keras) | Default (Keras) | Name (PyTorch) | Default (PyTorch) |
|---|---|---|---|---|---|
| Dense | 8.3 | Dense | | Linear | |
| units | | units | – | out_features | – |
| | | (implicit) | | in_features | – |
| update | 8.3 | SGD | | SGD | |
| learning_rate | | learning_rate | 0.01 | lr | – |
| momentum | 8.2.1 | momentum | 0 | momentum | 0 |
| RMS-Prop | 8.2.2 | RMSprop | | RMSprop | |
| learning_rate | | learning_rate | 0.001 | lr | 0.01 |
| rho | | rho | 0.9 | alpha | 0.99 |
| epsilon | | epsilon | 1e-07 | eps | 1e-08 |
| Adam | 8.2.3 | Adam | | Adam | |
| learning_rate | | learning_rate | 0.001 | lr | 0.001 |
| beta_1 | | beta_1 | 0.9 | betas[0] | 0.9 |
| beta_2 | | beta_2 | 0.999 | betas[1] | 0.999 |
| epsilon | | epsilon | 1e-07 | eps | 1e-08 |
| Dropout | 8.7 | Dropout | | Dropout | |
| rate | | rate | – | p | 0.5 |
| 2D Conv | 8.9 | Conv2D | | Conv2d | |
| kernel_size | | kernel_size | – | kernel_size | – |
| # output channels | | filters | – | out_channels | – |
| # input channels | | (implicit) | | in_channels | – |
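To make the renamed parameters concrete, the following is a minimal sketch (not from the book) of equivalent dropout and 2D convolution layers in each framework; the channel counts and kernel size are arbitrary, and it assumes tensorflow and torch are installed.

```python
# Sketch: the same dropout and 2D convolution layers under each framework's names.
from tensorflow.keras import layers   # Keras (TensorFlow)
import torch.nn as nn                 # PyTorch

# Keras: `filters` is the number of output channels; input channels are implicit.
keras_conv = layers.Conv2D(filters=32, kernel_size=3)
keras_drop = layers.Dropout(rate=0.5)            # `rate` has no default

# PyTorch: both input and output channels are explicit; `p` defaults to 0.5.
torch_conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
torch_drop = nn.Dropout(p=0.5)
```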
In both Keras and PyTorch, the optimizer is specified separately from the model. The optimizer corresponding to the update of Figure 8.9 is SGD (stochastic gradient descent); in both frameworks, momentum is a parameter of SGD.
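As a minimal sketch (not from the book; the layer sizes and momentum value are arbitrary), the same SGD-with-momentum update might be specified as follows. In Keras the optimizer is passed to compile, whereas in PyTorch it is constructed from the model's parameters.

```python
# Sketch: specifying SGD with momentum in Keras and in PyTorch.
from tensorflow import keras
import torch
import torch.nn as nn

# Keras: the optimizer is given to compile(); learning_rate defaults to 0.01.
keras_model = keras.Sequential([keras.layers.Dense(units=1)])
keras_model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
                    loss="mse")

# PyTorch: the optimizer takes the model's parameters; lr is given explicitly
# (the table lists no default for it).
torch_model = nn.Linear(in_features=5, out_features=1)
torch_opt = torch.optim.SGD(torch_model.parameters(), lr=0.01, momentum=0.9)
```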
In Keras, the number of input features is implicit; it is determined by the output size of the layer below the one being defined.
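For example, here is a minimal sketch (not from the book; the sizes 4, 8, and 1 are arbitrary) of the same two-layer network in each framework, showing that Keras infers each layer's input size while PyTorch's Linear requires in_features.

```python
# Sketch: Keras infers each layer's input size; PyTorch's Linear requires it.
from tensorflow import keras
import torch.nn as nn

keras_model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(units=8),   # input size (4) inferred from the layer below
    keras.layers.Dense(units=1),   # input size (8) inferred from the layer below
])

torch_model = nn.Sequential(
    nn.Linear(in_features=4, out_features=8),   # input size stated explicitly
    nn.Linear(in_features=8, out_features=1),
)
```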