Mar 31, 2018

Mathematical optimization theory and methods

https://research.googleblog.com/2018/03/using-machine-learning-to-discover.html

https://arxiv.org/pdf/1706.10207.pdf

Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning
Frank E. Curtis, Katya Scheinberg
July 3, 2017

Abstract
The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning. It is written with an INFORMS audience in mind, specifically those readers who are familiar with the basics of optimization algorithms, but less familiar with machine learning. We begin by deriving a formulation of a supervised learning problem and show how it leads to various optimization problems, depending on the context and underlying assumptions. We then discuss some of the distinctive features of these optimization problems, focusing on the examples of logistic regression and the training of deep neural networks. The latter half of the tutorial focuses on optimization algorithms, first for convex logistic regression, for which we discuss the use of first-order methods, the stochastic gradient method, variance-reducing stochastic methods, and second-order methods. Finally, we discuss how these approaches can be employed for the training of deep neural networks, emphasizing the difficulties that arise from the complex, nonconvex structure of these models.
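
As a quick illustration of the kind of problem the tutorial starts from, here is a minimal NumPy sketch (my own, not from the paper) of binary logistic regression trained with the stochastic gradient method; the synthetic data, step size, and epoch count are placeholder assumptions.

import numpy as np

# Minimal sketch: binary logistic regression trained with the stochastic
# gradient method. Data, step size, and epoch count are illustrative choices.
rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)  # labels in {0, 1}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
step = 0.1
for epoch in range(10):
    for i in rng.permutation(n):
        # Stochastic gradient of the logistic loss at one sample:
        # (sigmoid(x_i . w) - y_i) * x_i
        g = (sigmoid(X[i] @ w) - y[i]) * X[i]
        w -= step * g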

https://arxiv.org/pdf/1709.07417.pdf

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain-specific language that describes a mathematical update equation based on a list of primitive functions, such as the gradient, the running average of the gradient, etc. The controller is trained with Reinforcement Learning to maximize the performance of a model after a few epochs. On CIFAR-10, our method discovers several update rules that are better than many commonly used optimizers, such as Adam, RMSProp, or SGD with and without Momentum, on a ConvNet model. We introduce two new optimizers, named PowerSign and AddSign, which we show transfer well and improve training on a variety of different tasks and architectures, including ImageNet classification and Google's neural machine translation system.
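
As I read the paper, both discovered rules compare the sign of the current gradient with the sign of its running average and scale the gradient step accordingly. A rough NumPy sketch of the basic (no-decay) variants is below; the learning rate and moving-average coefficient are placeholder choices, not values from the paper.

import numpy as np

# Rough sketch of the basic AddSign and PowerSign updates (no internal decay
# schedule). lr and beta are placeholder hyperparameters; m is the running
# average of the gradient.
def addsign_step(w, g, m, lr=0.01, beta=0.9):
    m = beta * m + (1 - beta) * g
    w = w - lr * (1 + np.sign(g) * np.sign(m)) * g
    return w, m

def powersign_step(w, g, m, lr=0.01, beta=0.9):
    m = beta * m + (1 - beta) * g
    w = w - lr * np.exp(np.sign(g) * np.sign(m)) * g  # base alpha = e
    return w, m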

https://arxiv.org/abs/1412.6980v8

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
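
For reference, Adam keeps exponential moving averages of the gradient and of its elementwise square, corrects their initialization bias, and scales the step by the ratio of the two. A compact NumPy transcription of the update, using the paper's suggested default hyperparameters, is sketched below.

import numpy as np

# Compact sketch of the Adam update for step t (1-indexed), using the paper's
# suggested defaults: alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8.
def adam_step(theta, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g    # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v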

https://arxiv.org/pdf/1708.07827.pdf

https://arxiv.org/pdf/1611.05827.pdf

https://arxiv.org/pdf/1706.03662.pdf

https://arxiv.org/pdf/1706.04638.pdf
