May 8, 2019

On the Requirements for Automated Source Code Writing

Continuing the previous discussion:

https://ilovedeeplearning.blogspot.com/2017/08/internal-representation-of-source-code.html

Here are the basic requirements (partial list) for implementing an automated source code writing system:

1.   Common abstract representation:   Be it C, Python, Lua or Perl, be able to convert these different languages' constructs into a common internal representation of objects + transformations/operations + temporal sequence:   let's call these "code snippets", or "IR" for intermediate representation.

2.   All logic manipulation and transformation will be done at the IR level.   Debugging of programming errors will also be logically derived at this level.

3.   Composing individual code snippets into a larger entity - call it a "code segment" - following generic rules of programming (essentially defining the concepts of "constraints", time ordering, etc.).    Sometimes no rules may be available for composing particular code snippets, and a more intermediary form of "code snippet" may have to be identified to enable composition.

4.   Able to identify abstract requirements based on one overall objective (e.g., sort a list of objects based on the value of a certain field).

5.   Able to convert the abstract requirements into actual individual code snippets.

6.   Dataset A:   Conversion of language-dependent programs to IR.   Able to convert lots of existing programs and identify all the canonical "code snippets".   (In the case of C, the code can be converted into LLVM IR, and that IR serves as the intermediate canonical representation - the "code snippets" - independent of the hardware and of the other languages.)

7.   Dataset B:   With (6) successful, the data for learning how to program will be available - the next step is to learn all the illegal syntax, illegal semantics and meanings, etc.   These data are generated by actual compilers compiling incorrectly written software.

8.   The final output will be the language-dependent part:  C, C++, Python, etc.   So given the IR, how to translate it into the different languages implementing the program (a minimal sketch of steps 4, 5 and 8 appears after this list).
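
As a minimal sketch of steps 4, 5 and 8 (the SortByField class and the record_t struct below are hypothetical illustrations, not an existing system), the abstract requirement "sort the list of objects by a certain field" can be captured as a single IR node and then emitted in different target languages:

    from dataclasses import dataclass

    # Hypothetical, minimal IR node: "sort a collection of records by one field".
    # In a full system this would be one of many canonical "code snippets".
    @dataclass
    class SortByField:
        collection: str   # name of the list/array to sort
        field: str        # name of the field used as the sort key

        def to_python(self) -> str:
            # Language-dependent emission for Python (step 8).
            return (f"{self.collection} = sorted({self.collection}, "
                    f"key=lambda x: x.{self.field})")

        def to_c(self) -> str:
            # Language-dependent emission for C, assuming a record_t struct
            # array with an integer field of this name.
            return (
                f"int cmp_{self.field}(const void *a, const void *b) {{\n"
                f"    return ((const record_t *)a)->{self.field} -\n"
                f"           ((const record_t *)b)->{self.field};\n"
                f"}}\n"
                f"qsort({self.collection}, n, sizeof(record_t), cmp_{self.field});"
            )

    # Abstract requirement (step 4) -> IR node (step 5) -> concrete source (step 8).
    snippet = SortByField(collection="employees", field="salary")
    print(snippet.to_python())
    print(snippet.to_c())

Steps 1-3 and 6 would run in the opposite direction as well: parsing existing C or Python into such nodes and composing them under ordering constraints.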

May 4, 2019

BERT/ELMo/Transformer

https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py


Key questions:

1.  What is so unique about bidirectional processing?

2.  Does bidirectional processing lead to confusion or interference with the weights of previously learned material?

3.  What is the interpretation of each direction's processing?

4.  What characteristics of thought or understanding can be emulated through unidirectional or bidirectional neural networks?

5.  How to learn, via unsupervised learning, the relationships between different words?      How to know whether sentences phrased in different ways have the same meaning?

6.  How to learn the changes in meaning when the words are reordered, or when different tenses are used?

7.  How to learn, via pairing, the relationship between questions and their answers?

8.  How to learn the shared meaning attached to translations of the same sentence into different languages?

9.  How to learn the concept of ordering:   steps in the sequencing of ideas/concepts from one component to another, from one time slice to another, causes and effects?

10.   How to learn across different abstraction levels of concepts:   "class" vs "instantiation", "cars" vs "Toyota", etc. - one set of entities is just different instantiations of the generic class.

11.   Answering the basic classes of questions:   HOW, WHY, WHEN, WHERE, WHAT, and so on....

What is BERT?

Key innovation:   MLM (masked language modeling) and next sentence prediction.

This diagram best describes innovation #1 for BERT, MLM training:
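
In code form, a rough sketch of the MLM masking step (the 15%/80%/10%/10% proportions are those described for BERT; the whitespace tokenization and toy vocabulary are simplifications for illustration):

    import random

    MASK = "[MASK]"

    def mask_tokens(tokens, mask_prob=0.15, seed=0):
        # Roughly 15% of positions are selected. For each selected position,
        # BERT substitutes [MASK] 80% of the time, a random token 10%, and
        # keeps the token unchanged 10%; the model is trained to predict the
        # original token at those positions.
        rng = random.Random(seed)
        vocab = list(set(tokens))            # toy vocabulary for illustration
        masked, labels = [], []
        for tok in tokens:
            if rng.random() < mask_prob:
                labels.append(tok)           # target the model must recover
                r = rng.random()
                if r < 0.8:
                    masked.append(MASK)
                elif r < 0.9:
                    masked.append(rng.choice(vocab))
                else:
                    masked.append(tok)
            else:
                labels.append(None)          # no loss at this position
                masked.append(tok)
        return masked, labels

    print(mask_tokens("the cat sat on the mat".split()))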


And next sentence prediction is shown here:
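
In code form, a rough sketch of how next-sentence-prediction pairs are typically constructed: half the time sentence B is the true continuation of A (IsNext), half the time it is a random sentence (NotNext). The toy corpus and helper below are illustrative only:

    import random

    def make_nsp_pairs(sentences, seed=0):
        # Build (A, B, label) training examples for next sentence prediction.
        rng = random.Random(seed)
        pairs = []
        for i in range(len(sentences) - 1):
            a, true_next = sentences[i], sentences[i + 1]
            if rng.random() < 0.5:
                b, label = true_next, "IsNext"
            else:
                # random sentence that is neither A nor the true continuation
                b = rng.choice([s for s in sentences if s not in (a, true_next)])
                label = "NotNext"
            pairs.append((a, b, label))
        return pairs

    corpus = ["He went to the store.", "He bought a gallon of milk.",
              "Penguins are flightless birds.", "They live in the Antarctic."]
    for a, b, label in make_nsp_pairs(corpus):
        print(label, "|", a, "->", b)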



Further innovation needed:

Prediction of the next idea/word is an indicator of "intelligence" or "understanding".   But there may be many different variations of the next word or concept, and after training, they may or may not be combined together.   If they are, it is because they express the same idea; if not, it is because the ideas are distinctly different.

After you have next-word/next-sentence prediction, what about multiple subsequent words or sentences, possibly with an ordering requirement entailed - how is it possible to cascade these operations?

If you cascade them in a sequential manner through deeper and deeper neural networks, would it be possible to consider different neural architectures, or implement some skip connections, or possibly create randomized forgetting, or drop all the weights?

I.e., A -> predict B, and predict B1, B2, etc. - and then how about doing backward induction/deduction to predict A?

Masking can be treated as a form of skip connection, or forgetting, or regularization via zeroing the weights.   So how about randomized masking?
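
Treating masking as regularization via zeroing suggests something like the following toy sketch (dropout-style random zeroing with numpy; this is one possible realization for illustration, not BERT's actual mechanism):

    import numpy as np

    rng = np.random.default_rng(0)

    def random_mask(x, drop_prob=0.15):
        # Zero a random subset of entries; scale the survivors so the
        # expected value is unchanged (the usual inverted-dropout convention).
        keep = rng.random(x.shape) >= drop_prob
        return x * keep / (1.0 - drop_prob)

    h = np.ones((2, 8))        # stand-in for a block of activations or weights
    print(random_mask(h))      # a different random subset is zeroed each call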