https://devblogs.nvidia.com/deep-learning-nutshell-reinforcement-learning/
https://medium.com/@m.alzantot/deep-reinforcement-learning-demysitifed-episode-2-policy-iteration-value-iteration-and-q-978f9e89ddaa
In practice, random search does not work well for complex problems where the search space (which depends on the number of possible states and actions) is large. Genetic algorithms are also meta-heuristic optimization methods, so they provide no guarantee of finding an optimal solution. In this article, we are going to introduce fundamental reinforcement learning algorithms.
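As a rough illustration of why this breaks down, here is a minimal sketch of random policy search on a toy 5-state chain MDP. The environment, the state/action encoding and all function names are illustrative assumptions, not code from the articles linked above. Even in this tiny setting the space of deterministic policies has 2^5 members; with realistic state and action counts it grows as |A|^|S|, which is why blind sampling stops working.

```python
import random

# Toy 1-D chain MDP: states 0..4, start at 0, reward +1 for reaching state 4.
# Actions: 0 = left, 1 = right. Episodes end after 20 steps or at the goal.
N_STATES, GOAL, MAX_STEPS = 5, 4, 20

def run_episode(policy):
    """Return the total reward obtained by following a deterministic policy."""
    state, total = 0, 0.0
    for _ in range(MAX_STEPS):
        action = policy[state]
        state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
        if state == GOAL:
            total += 1.0
            break
    return total

def random_search(n_candidates=100):
    """Sample random deterministic policies and keep the best one found."""
    best_policy, best_return = None, float("-inf")
    for _ in range(n_candidates):
        candidate = [random.choice([0, 1]) for _ in range(N_STATES)]
        ret = run_episode(candidate)
        if ret > best_return:
            best_policy, best_return = candidate, ret
    return best_policy, best_return

print(random_search())
```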
https://www.cs.rochester.edu/~gildea/2013_Spring/Notes/csc446lecture21notes.pdf
https://mpatacchiola.github.io/blog/2016/12/09/dissecting-reinforcement-learning.html
https://mpatacchiola.github.io/blog/2017/01/15/dissecting-reinforcement-learning-2.html
The advantages of MC methods over the dynamic programming approach are the following:
- MC methods allow learning optimal behaviour directly from interaction with the environment.
- It is easy and efficient to focus MC methods on a small subset of the states.
- MC methods can be used with simulations (sample models).
In this post I will analyse the first two points. The third point is less intuitive. In many applications it is easy to simulate episodes, but it can be extremely difficult to construct the transition model required by dynamic programming techniques. In all these cases the MC method rules.
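To make the first two points concrete, here is a minimal sketch of first-visit Monte Carlo prediction: the value of each state is estimated purely by averaging sampled returns, without ever touching a transition model. The chain environment, the fixed behaviour policy and all names below are illustrative assumptions, not code from the post being summarised.

```python
import random
from collections import defaultdict

# Illustrative stand-in for "interaction with the environment": a 1-D chain
# with states 0..4; the agent moves right with prob. 0.7, left otherwise,
# and receives reward +1 only upon reaching the terminal state 4.
def generate_episode(start=0, max_steps=50):
    state, episode = start, []
    for _ in range(max_steps):
        move = 1 if random.random() < 0.7 else -1
        next_state = min(max(state + move, 0), 4)
        reward = 1.0 if next_state == 4 else 0.0
        episode.append((state, reward))
        state = next_state
        if state == 4:
            break
    return episode

def first_visit_mc(n_episodes=5000, gamma=0.9):
    """Estimate V(s) by averaging returns from the first visit to s in each episode."""
    returns, values = defaultdict(list), {}
    for _ in range(n_episodes):
        episode = generate_episode()
        G, first_visit_return = 0.0, {}
        # Walk the episode backwards, accumulating the discounted return G.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            first_visit_return[state] = G  # keeps overwriting until the earliest visit remains
        for state, G in first_visit_return.items():
            returns[state].append(G)
    for state, gs in returns.items():
        values[state] = sum(gs) / len(gs)
    return values

print(first_visit_mc())
```

Running `first_visit_mc()` prints value estimates for every visited state; focusing on a small subset of the states (the second bullet point above) is just a matter of filtering which first visits get recorded.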
Stability of Generative Adversarial Networks | ARAYA Inc.
http://www.araya.org/archives/1183
https://www.analyticsvidhya.com/blog/2017/06/introductory-generative-adversarial-networks-gans/
https://openreview.net/forum?id=ryepFJbA-&noteId=S1zS8JpSf
Abstract: We propose studying GAN training dynamics as regret minimization, which is in contrast to the popular view that there is consistent minimization of a divergence between real and generated distributions. We analyze the convergence of GAN training from this new point of view to understand why mode collapse happens. We hypothesize the existence of undesirable local equilibria in this non-convex game to be responsible for mode collapse. We observe that these local equilibria often exhibit sharp gradients of the discriminator function around some real data points. We demonstrate that these degenerate local equilibria can be avoided with a gradient penalty scheme called DRAGAN. We show that DRAGAN enables faster training, achieves improved stability with fewer mode collapses, and leads to generator networks with better modeling performance across a variety of architectures and objective functions.
TL;DR: Analysis of convergence and mode collapse by studying GAN training process as regret minimization
Keywords: GAN, Generative Adversarial Networks, Mode Collapse, Stability, Game Theory, Regret Minimization, Convergence, Gradient Penalty
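For readers who want to see roughly what the gradient penalty scheme looks like in code, below is a hedged PyTorch sketch of a DRAGAN-style penalty term: real samples are perturbed with noise and the discriminator's gradient norm at those perturbed points is pushed towards 1. This is only an illustration of the idea described in the abstract, not the authors' reference implementation; the coefficients `lambda_gp`, `k` and `c` are assumed defaults.

```python
import torch

def dragan_gradient_penalty(discriminator, real_batch, lambda_gp=10.0, k=1.0, c=0.5):
    """DRAGAN-style penalty: encourage unit-norm discriminator gradients in a
    noise-perturbed neighbourhood of the real data, where sharp gradients
    around real points tend to appear near degenerate local equilibria."""
    # Perturb real samples with noise scaled by the batch standard deviation.
    noise = c * real_batch.std() * torch.rand_like(real_batch)
    interpolated = (real_batch + noise).detach().requires_grad_(True)

    d_out = discriminator(interpolated)
    grads = torch.autograd.grad(
        outputs=d_out, inputs=interpolated,
        grad_outputs=torch.ones_like(d_out),
        create_graph=True, retain_graph=True,
    )[0]

    # Penalize deviation of the per-sample gradient norm from k (usually 1).
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - k) ** 2).mean()
```

The returned term is simply added to the discriminator loss before backpropagation; the generator loss is left unchanged.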