Table B.2 gives the parameter names and default values used by two common Python-based deep learning frameworks: Keras [Chollet, 2021], a high-level interface to TensorFlow, and PyTorch. For documentation on Keras, see https://keras.io. For documentation on PyTorch, see https://pytorch.org.
| Algorithm (this book) | Page | Name (Keras) | Default (Keras) | Name (PyTorch) | Default (PyTorch) |
|---|---|---|---|---|---|
| Dense | 8.3 | Dense | | Linear | |
| units | | units | – | out_features | – |
| | | (implicit) | | in_features | – |
| update | 8.3 | SGD | | SGD | |
| learning_rate | | learning_rate | 0.01 | lr | – |
| momentum | 8.2.1 | momentum | 0 | momentum | 0 |
| RMS-Prop | 8.2.2 | RMSprop | | RMSprop | |
| learning_rate | | learning_rate | 0.001 | lr | 0.01 |
| rho | | rho | 0.9 | alpha | 0.99 |
| epsilon | | epsilon | 1e-07 | eps | 1e-08 |
| Adam | 8.2.3 | Adam | | Adam | |
| learning_rate | | learning_rate | 0.001 | lr | 0.001 |
| beta_1 | | beta_1 | 0.9 | betas[0] | 0.9 |
| beta_2 | | beta_2 | 0.999 | betas[1] | 0.999 |
| epsilon | | epsilon | 1e-07 | eps | 1e-08 |
| Dropout | 8.7 | Dropout | | Dropout | |
| rate | | rate | – | p | 0.5 |
| 2D Conv | 8.9 | Conv2D | | Conv2d | |
| kernel_size | | kernel_size | – | kernel_size | – |
| # output channels | | filters | – | out_channels | – |
| # input channels | | (implicit) | | in_channels | – |
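To make the renamed parameters concrete, the following is a minimal sketch (not from the book) of equivalent dropout and 2D convolution layers in each framework; the channel counts and kernel size are arbitrary, and it assumes tensorflow and torch are installed.

```python
# Sketch: the same dropout and 2D convolution layers under each framework's names.
from tensorflow.keras import layers   # Keras (TensorFlow)
import torch.nn as nn                 # PyTorch

# Keras: `filters` is the number of output channels; input channels are implicit.
keras_conv = layers.Conv2D(filters=32, kernel_size=3)
keras_drop = layers.Dropout(rate=0.5)            # `rate` has no default

# PyTorch: both input and output channels are explicit; `p` defaults to 0.5.
torch_conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
torch_drop = nn.Dropout(p=0.5)
```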
In both Keras and PyTorch, the optimizer is specified separately from the model. The optimizer corresponding to the update of Figure 8.9 is SGD (stochastic gradient descent); in both frameworks, momentum is a parameter of SGD.
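As a minimal sketch (not from the book; the layer sizes and momentum value are arbitrary), the same SGD-with-momentum update might be specified as follows. In Keras the optimizer is passed to compile, whereas in PyTorch it is constructed from the model's parameters.

```python
# Sketch: specifying SGD with momentum in Keras and in PyTorch.
from tensorflow import keras
import torch
import torch.nn as nn

# Keras: the optimizer is given to compile(); learning_rate defaults to 0.01.
keras_model = keras.Sequential([keras.layers.Dense(units=1)])
keras_model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
                    loss="mse")

# PyTorch: the optimizer takes the model's parameters; lr is given explicitly
# (the table lists no default for it).
torch_model = nn.Linear(in_features=5, out_features=1)
torch_opt = torch.optim.SGD(torch_model.parameters(), lr=0.01, momentum=0.9)
```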
In Keras, the number of input features is implicit; it is determined by the output size of the layer below the one being defined.
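For example, here is a minimal sketch (not from the book; the sizes 4, 8, and 1 are arbitrary) of the same two-layer network in each framework, showing that Keras infers each layer's input size while PyTorch's Linear requires in_features.

```python
# Sketch: Keras infers each layer's input size; PyTorch's Linear requires it.
from tensorflow import keras
import torch.nn as nn

keras_model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(units=8),   # input size (4) inferred from the layer below
    keras.layers.Dense(units=1),   # input size (8) inferred from the layer below
])

torch_model = nn.Sequential(
    nn.Linear(in_features=4, out_features=8),   # input size stated explicitly
    nn.Linear(in_features=8, out_features=1),
)
```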