Aug 5, 2018

A study in meta-learning and its application to mathematical optimization

Proposal:   Neural Architecture Search can be used to search for an optimal algorithm (an analogous example is the genetic algorithm used for PCB design):

Referencing an earlier writeup on optimization via Deep Learning:


and:


Now we will consider the idea of using "Neural Architecture Search" to search through the space of algorithms.

Meta-learning is achieved either via simulation (of a biological neural network) or via "neural architecture search", which is essentially searching by evolving different algorithms, optimizing at the algorithm level based on reinforcement learning strategies (a toy sketch of the evolutionary side follows).
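As a minimal illustration of "evolving the algorithms", here is a sketch of an evolutionary architecture search. Everything in it is hypothetical: the search space, and especially the fitness function, which is a stand-in for actually training each candidate and measuring validation accuracy.

import random

# Hypothetical toy search space: each "algorithm" is a sequence of layer choices.
SEARCH_SPACE = ["conv3x3", "conv5x5", "maxpool", "skip", "dense"]

def random_candidate(length=6):
    return [random.choice(SEARCH_SPACE) for _ in range(length)]

def mutate(candidate):
    child = candidate[:]
    child[random.randrange(len(child))] = random.choice(SEARCH_SPACE)
    return child

def fitness(candidate):
    # Stand-in for "train the candidate, return validation accuracy".
    # Here: a toy score that prefers convolutions followed by pooling.
    return sum(1 for a, b in zip(candidate, candidate[1:])
               if a.startswith("conv") and b == "maxpool")

population = [random_candidate() for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                        # selection
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(10)]      # mutation

print("best:", max(population, key=fitness))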

It started with this paper:


and subsequently Google spearheaded the push into the AutoML arena:









But other than these two approaches (simulation + NAS), there are other approaches to Meta-Learning:

MAML:



Shared Hierarchies (and the inventor is a high-school student):


Some comments:

a.   RNNs have been successfully used in meta-learning, combined with RL as feedback, to decide how the optimal algorithm can be designed (a minimal controller sketch follows).
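A minimal PyTorch sketch of the RNN-controller idea, assuming a toy search space of 5 choices per step; the reward function here is a stub where the validation accuracy of the trained child network would normally go:

import torch
import torch.nn as nn

NUM_CHOICES, SEQ_LEN, HIDDEN = 5, 6, 32

class Controller(nn.Module):
    def __init__(self):
        super().__init__()
        self.cell = nn.LSTMCell(NUM_CHOICES, HIDDEN)
        self.head = nn.Linear(HIDDEN, NUM_CHOICES)

    def sample(self):
        # Sample one architecture decision per step, feeding each
        # choice back in as the next input.
        h, c = torch.zeros(1, HIDDEN), torch.zeros(1, HIDDEN)
        x = torch.zeros(1, NUM_CHOICES)
        log_probs, actions = [], []
        for _ in range(SEQ_LEN):
            h, c = self.cell(x, (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            a = dist.sample()
            log_probs.append(dist.log_prob(a))
            actions.append(a.item())
            x = torch.nn.functional.one_hot(a, NUM_CHOICES).float()
        return actions, torch.stack(log_probs).sum()

def reward(actions):
    # Stub: replace with "train the child network, return accuracy".
    return float(len(set(actions))) / NUM_CHOICES

controller = Controller()
opt = torch.optim.Adam(controller.parameters(), lr=1e-3)
baseline = 0.0
for step in range(200):
    actions, log_prob = controller.sample()
    r = reward(actions)
    baseline = 0.9 * baseline + 0.1 * r      # moving-average baseline
    loss = -(r - baseline) * log_prob        # REINFORCE
    opt.zero_grad(); loss.backward(); opt.step()

The moving-average baseline is the usual variance-reduction trick for REINFORCE; without it the gradient estimate is much noisier.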

b.  Information as statistics: the more information, the better, as the (conditional) entropy becomes smaller with a larger information set (a quick numeric check follows).
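A numeric check of the claim on a toy joint distribution: conditioning on X (i.e., having more information) can only reduce the entropy of Y, since H(Y|X) <= H(Y).

import numpy as np

# Toy joint distribution p(x, y): rows are X, columns are Y.
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])

p_y = p_xy.sum(axis=0)                               # marginal p(y)
H_y = -np.sum(p_y * np.log2(p_y))                    # entropy H(Y)

p_x = p_xy.sum(axis=1)
p_y_given_x = p_xy / p_x[:, None]                    # conditional p(y|x)
H_y_given_x = -np.sum(p_xy * np.log2(p_y_given_x))   # H(Y|X)

print(f"H(Y)   = {H_y:.3f} bits")
print(f"H(Y|X) = {H_y_given_x:.3f} bits")            # never larger than H(Y)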

c.   With a greater amount of information, fewer training rounds should be needed, as the correlations become easier to find.

d.   Graphical vs. sequential representation:   these two have different characteristics, such as serialization of tasks versus parallel division of tasks.   But graphical information in turn has two possibilities:   the individual nodes in the graph either can or cannot happen concurrently.   If they cannot, the graph encodes an ordering, as in the different types of optimization strategies; if they can, it is an example of concurrently executing tasks (see the sketch below).
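A small sketch of the concurrent case, assuming a hypothetical task graph: a Kahn-style topological pass groups the nodes into levels, and all nodes within the same level have no mutual dependencies and can execute concurrently.

from collections import defaultdict

# Hypothetical task graph: edges point from a task to its dependents.
edges = {"load": ["clean", "augment"],
         "clean": ["train"],
         "augment": ["train"],
         "train": ["evaluate"],
         "evaluate": []}

indegree = defaultdict(int)
for deps in edges.values():
    for d in deps:
        indegree[d] += 1

# Peel off one "level" at a time; each level is a concurrently runnable set.
level = [t for t in edges if indegree[t] == 0]
while level:
    print("run concurrently:", level)
    nxt = []
    for t in level:
        for d in edges[t]:
            indegree[d] -= 1
            if indegree[d] == 0:
                nxt.append(d)
    level = nxt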

e.   Meta-Learning, together with genetic algorithms, must have a Bayesian component that chooses the optimal strategies based on past knowledge of what works and what doesn't (one simple form is sketched below).
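One simple form such a Bayesian component could take (the strategy names and toy success rates here are hypothetical) is Thompson sampling with a Beta-Bernoulli posterior per strategy:

import random

# A Beta(a, b) posterior per strategy; "experience" is whether the
# strategy improved the objective on a given run.
strategies = {"mutate": [1, 1], "crossover": [1, 1], "local_search": [1, 1]}

def choose():
    # Thompson sampling: draw from each posterior, pick the best draw.
    return max(strategies, key=lambda s: random.betavariate(*strategies[s]))

def update(name, success):
    a, b = strategies[name]
    strategies[name] = [a + success, b + (1 - success)]

for run in range(100):
    s = choose()
    success = int(random.random() < {"mutate": 0.3,
                                     "crossover": 0.6,
                                     "local_search": 0.5}[s])  # toy outcomes
    update(s, success)

print(strategies)   # "crossover" should accumulate the most successes

Strategies that have worked in the past get sampled more often, while unproven ones still receive occasional exploration.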

f.   Representation is still a key problem, as different representations will have different ways of being trained and meta-learned.   Is there a way to identify the optimal data, so as to feed back into the system and indicate the optimal meta-learning rules?   And after the meta-learning rules have been fixed, they will be used to generate the training data and subsequently the problem-solving algorithm itself.

g.  As the GPU is good at convolution, the training data (or any representation) should have a one-to-one mapping to images.   This correspondence will help to "visualize" our algorithm, or give a "causal" or "explanatory" view of the data (a trivial mapping is sketched below).
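A trivial sketch of such a one-to-one mapping, assuming a hypothetical length-64 feature vector: reshape it into an 8x8 single-channel "image" that a convolution can process, with the inverse reshape recovering the features exactly.

import numpy as np
import torch

features = np.random.rand(64)
image = features.reshape(8, 8)            # the one-to-one mapping

# The inverse mapping recovers the original features exactly,
# so nothing is lost in the visual representation.
assert np.array_equal(image.reshape(-1), features)

# The image form is directly consumable by a (GPU-friendly) convolution.
x = torch.from_numpy(image).float()[None, None]     # shape (1, 1, 8, 8)
y = torch.nn.Conv2d(1, 4, kernel_size=3, padding=1)(x)
print(y.shape)                                      # (1, 4, 8, 8)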

h.   Different types of data will require different update frequencies:   the meta-learner, being an algorithm itself, still has to fit the set of training data, but ideally it should be updated less frequently than the base learner (see the two-timescale sketch below).
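A toy sketch of this two-timescale idea, with a hypothetical meta-rule that adjusts the learning rate far less often than the base gradient updates run:

META_EVERY = 50
lr = 0.1
w = 0.0
prev_loss = float("inf")

def loss(w):
    # Toy objective with its minimum at w = 3.
    return (w - 3.0) ** 2

for step in range(1, 501):
    grad = 2.0 * (w - 3.0)             # base update: every step
    w -= lr * grad
    if step % META_EVERY == 0:         # meta update: every META_EVERY steps
        cur = loss(w)
        lr *= 1.1 if cur < prev_loss else 0.5   # crude meta-rule on lr
        prev_loss = cur

print(f"w = {w:.4f}, final lr = {lr:.4f}")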

i.   In (h) above, the different nodes can be the data or the constituents of meta-learning itself; if they are drawn structurally/temporally into a graphical or hierarchical representation, they will have different implications for how they affect one another, through changes in the "probabilistic" value of being chosen (Bayesian, derived through experience).


