PUBLICATIONS

The Golden Ratio in Machine Learning.


Jaeger S

50th IEEE Applied Imagery Pattern Recognition Workshop (AIPR), October 2021.

Abstract:

Gradient descent has been a central training principle for artificial neural networks from the early beginnings to today's deep learning networks. The most common implementation is the backpropagation algorithm for training feedforward neural networks in a supervised fashion. A drawback of backpropagation has been the search required to find optimal values of two important training parameters, the learning rate and the momentum weight. The learning rate specifies the step size towards a minimum of the loss function when following the gradient, while the momentum weight considers previous weight changes when updating current weights. Using both parameters in conjunction with each other generally improves training, although their specific values do not follow immediately from standard backpropagation theory. This paper proposes a new information-theoretical loss function based on cross-entropy for which it derives a specific learning rate and momentum weight. Many training procedures based on backpropagation use cross-entropy directly as their loss function. Instead, this paper investigates a dual-process model in which one process minimizes the Kullback-Leibler divergence while its dual counterpart minimizes the Shannon entropy. The golden ratio plays an important role here, making it possible to derive theoretical values for the learning rate and momentum weight that closely match the empirically determined values traditionally used in the literature. To validate this information-theoretical approach further, classification results for a handwritten digit recognition task are presented, showing that the proposed loss function, in conjunction with the derived learning rate and momentum weight, works in practice.
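For readers unfamiliar with the two parameters discussed above, the following minimal Python sketch shows where the learning rate and momentum weight enter a standard gradient descent update with momentum, and how cross-entropy decomposes into Shannon entropy plus the Kullback-Leibler divergence. It is an illustration only; the parameter values shown are conventional placeholders, not the golden-ratio-derived values reported in the paper.

    import numpy as np

    def momentum_step(w, grad, velocity, eta=0.1, alpha=0.9):
        """One weight update: eta is the learning rate (step size along the
        negative gradient), alpha is the momentum weight (fraction of the
        previous weight change carried over). Values here are placeholders."""
        velocity = alpha * velocity - eta * grad
        return w + velocity, velocity

    def cross_entropy(p, q):
        """H(p, q) = H(p) + KL(p || q): the quantity the dual processes
        described in the abstract act on."""
        return -np.sum(p * np.log(q))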


Jaeger S. The Golden Ratio in Machine Learning. 
50th IEEE Applied Imagery Pattern Recognition Workshop (AIPR), October 2021.

PDF