Jul 8, 2019

BERT/MTL/MASS/ELMO

BERT:

https://arxiv.org/pdf/1810.04805.pdf

http://jalammar.github.io/illustrated-bert/
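
As a quick way to poke at a pretrained BERT masked-LM, a minimal sketch (assuming the HuggingFace transformers and torch packages are installed; the model name and sentence are just examples):

# Query a pretrained BERT masked-language model via the fill-mask pipeline.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for candidate in unmasker("The man went to the [MASK]."):
    print(candidate["token_str"], candidate["score"])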



Sentencepiece/Wordpiece:

https://github.com/google/sentencepiece

https://www.reddit.com/r/MachineLearning/comments/axkmi0/d_why_is_code_or_libraries_for_wordpiece/

https://stackoverflow.com/questions/55382596/how-is-wordpiece-tokenization-helpful-to-effectively-deal-with-rare-words-proble/55416944

https://arxiv.org/pdf/1609.08144.pdf
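
A minimal sentencepiece usage sketch (assuming "pip install sentencepiece"; the corpus file name and vocab size below are just placeholders):

import sentencepiece as spm

# Train a subword model on a plain-text corpus (one sentence per line).
spm.SentencePieceTrainer.train(input="corpus.txt", model_prefix="m", vocab_size=8000)

# Load the trained model and split a sentence into subword pieces.
sp = spm.SentencePieceProcessor(model_file="m.model")
print(sp.encode("machine learning is fun", out_type=str))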

ERNIE:

http://research.baidu.com/Blog/index-view?id=113

MTL:

https://arxiv.org/pdf/1901.11504.pdf

MASS:

https://www.microsoft.com/en-us/research/blog/introducing-mass-a-pre-training-method-that-outperforms-bert-and-gpt-in-sequence-to-sequence-language-generation-tasks/

https://www.microsoft.com/en-us/research/publication/mass-masked-sequence-to-sequence-pre-training-for-language-generation/

TRANSFORMER:

https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/ 

https://jalammar.github.io/illustrated-transformer/

Jun 5, 2019

nvcc & PTX: how nvcc works internally and how to generate PTX


Go to any of the CUDA examples:


And edit the Makefile to modify two things:   NVCCFLAGS and the GPU architecture (SMS) codes.

First is NVCCFLAGS:  append a "-ptx" to the end, so the line becomes something like NVCCFLAGS := -m64 -ptx. 


And the SMS variable, which enumerates all the different GPU architectures: modify it to a single GPU architecture (the build output below, for example, uses 35):




And then "make" will generate:

"/home/tthtlc/cuda-10.0"/bin/nvcc -ccbin g++ -I../../common/inc  -m64 -ptx     -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -o dxtc.o -c dxtc.cu"/home/tthtlc/cuda-10.0"/bin/nvcc -ccbin g++   -m64 -ptx       -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -o dxtc dxtc.o
mkdir -p ../../bin/x86_64/linux/release
cp dxtc ../../bin/x86_64/linux/release
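
(As an aside, if you only want the PTX for a single .cu file, you can also drive nvcc directly instead of going through the samples Makefile - a minimal sketch, assuming nvcc is on the PATH and a hypothetical kernel file vector_add.cu:)

import subprocess

# Ask nvcc to emit PTX text for a virtual architecture instead of a binary.
subprocess.run(
    ["nvcc", "-ptx", "-arch=compute_35", "vector_add.cu", "-o", "vector_add.ptx"],
    check=True,
)
print(open("vector_add.ptx").read()[:500])   # peek at the generated PTX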

Then open the dxtc.o file, which now consists of PTX instructions:





PTX has unique features in its memory model: rather than a single flat address space, it defines distinct state spaces such as .global, .shared, .local, .const and .reg.


May 8, 2019

On the requirements for Automated Source Code writing

Continuing the previous discussion:

https://ilovedeeplearning.blogspot.com/2017/08/internal-representation-of-source-code.html

Here are the basic requirements (partial list) for implementing an automated source code writing system:

1.   Common abstract representation:   Be it C, Python, Lua or Perl, be able to convert the constructs of these different languages into a common internal representation of objects + transformations/operations + temporal sequence:   let's call these "code snippets", or "IR" for intermediate representation (a toy sketch follows after this list).

2.   All logic manipulation and transformation will be at the IR level.   Debugging of programming errors will also be logically derived at this level.

3.   Composing individual code snippets into a larger entity - call it a "code segment" - following generic programming rules (essentially defining concepts such as "constraints", time ordering, etc.).    Sometimes there may be no rules available for composing code snippets, and a more intermediate form of "code snippet" may have to be identified to enable composition.

4.   Able to identify abstract requirements based on one overall objective (eg, sort the list of objects based on the values of a certain field).

5.   Able to convert the abstract requirements into actual individual code snippets.

6.   Dataset A:   Conversion of language-dependent programs to IR.   Able to convert lots of existing programs and identify all the canonical "code snippets".   (In the case of C, programs can be converted into LLVM IR, which serves as the intermediate canonical representation - the "code snippets" - independent of both hardware and source language.)

7.   Dataset B:   With (6) successful, the data for learning how to program will be available - so the next step is to learn all the illegal syntax, illegal semantics and meanings, etc.   These data are generated by actual compilers compiling deliberately ill-formed programs.

8.   The final output will be the language-dependent part:  C, C++, Python, etc.   So given the IR, the task is to translate it into the different languages that implement the program.
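
To make (1) and (3) slightly more concrete, a purely illustrative toy sketch (all names are hypothetical, not an actual IR design) of a "code snippet" record and one simple composition constraint:

from dataclasses import dataclass
from typing import List

@dataclass
class Snippet:
    # One canonical "code snippet" in the hypothetical IR.
    name: str
    inputs: List[str]    # objects the snippet consumes
    outputs: List[str]   # objects the snippet produces
    op: str              # the transformation/operation performed

def compose(a: Snippet, b: Snippet) -> List[Snippet]:
    # Compose two snippets into a "code segment" if the time-ordering
    # constraint holds: everything b consumes must be produced by a.
    if not set(b.inputs).issubset(a.outputs):
        raise ValueError(f"cannot compose {a.name} -> {b.name}: unmet inputs")
    return [a, b]

# Toy example: load a list of records, then sort it by a field.
read = Snippet("read_records", inputs=["path"], outputs=["records"], op="load")
sort = Snippet("sort_by_field", inputs=["records"], outputs=["sorted_records"], op="sort")
print([s.name for s in compose(read, sort)])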

May 4, 2019

BERT/ELMO/Transformer

https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py


key questions:

1.  What is so unique about bidirectional processing?

2.  Does bidirectionality lead to confusion, or mess up the weights of previously learned material?

3.  What is the interpretation of each direction's processing?

4.  What characteristics of thoughts or understanding can be emulated through directed or bidirectional neural networks?

5.  How to learn, via unsupervised learning, the relationships between different words?      How to know if sentences phrased in different ways have the same meaning?

6.  How to learn the changes of meaning when the words are reorganized, or different tenses are used?

7.  How to learn, via pairing, the relationship between questions and their answers?

8.  How to learn the similar meaning attached to translations of the same sentence into different languages?

9.  How to learn ordering concepts:   steps in the sequencing of ideas/concepts from one component to another, one time slice to another, causes and effects?

10.   How to learn across different abstraction levels of concepts:   "class" vs "instantiation", "cars" vs "Toyota", etc. - one set of entities is just a different instantiation of the generic class.

11.   Answering the basic classes of questions:   HOW, WHY, WHEN, WHERE, WHAT and so on.

What is BERT?

Key innovation parts:   MLM (masked language modelling) and next sentence prediction.





This diagram best describes innovation #1 for BERT, MLM training:


And next sentence prediction is here:
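
A toy sketch of how the two pre-training objectives turn raw text into training examples (plain Python with naive whitespace tokenization; real BERT uses WordPiece tokens and an 80/10/10 mask/random/keep rule rather than always inserting [MASK]):

import random

def make_mlm_example(tokens, mask_prob=0.15):
    # Masked LM: hide ~15% of tokens; the model must predict the originals.
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append("[MASK]")
            targets.append(tok)      # loss only at masked positions
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets

def make_nsp_example(sent_a, sent_b, corpus_sentences):
    # Next sentence prediction: 50% keep the true next sentence (IsNext),
    # 50% swap in a random sentence from the corpus (NotNext).
    if random.random() < 0.5:
        return (sent_a, sent_b, "IsNext")
    return (sent_a, random.choice(corpus_sentences), "NotNext")

tokens = "the man went to the store to buy milk".split()
print(make_mlm_example(tokens))
print(make_nsp_example("he opened the fridge", "the milk was gone",
                       ["paris is in france", "dogs chase cats"]))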



Further innovation needed:

Prediction of the next idea/word is an indicator of "intelligence" or "understanding".   But there may be many different variations of the next word or concept, and after training they may or may not be combined together: if they are, it is because they express the same idea; if not, it is because the ideas are distinctly different.

After you have next-word/next-sentence prediction, what about multiple subsequent words or sentences, possibly with an ordering requirement entailed - how can these operations be cascaded?

If you cascade them sequentially through deeper and deeper neural networks, would it be possible to consider different neural architectures, implement some skip connections, or possibly introduce randomized forgetting, or dropping of weights?

I.e., A -> predict B, and predict B1, B2, etc. - and then how about doing backward induction/deduction to predict A?

Masking can be treated as a form of skip connection, or forgetting, or regularization via zeroing the weights.   So how about randomized masking?   
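
A minimal sketch of what randomized masking of weights could look like (purely illustrative, using PyTorch; the drop probability is arbitrary):

import torch

def randomly_mask_weights(weight, drop_prob=0.1):
    # Zero out a random subset of weights - a crude form of forgetting /
    # regularization, in the spirit of dropout applied to the weights themselves.
    keep_mask = (torch.rand_like(weight) >= drop_prob).float()
    return weight * keep_mask

w = torch.randn(4, 4)
print(randomly_mask_weights(w, drop_prob=0.25))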

Apr 27, 2019

How to enable line-by-line python debugging in jupyter + Anaconda environment or Google Colab environment?

First, Anaconda's default Python version is 3.7, but we will need 3.6.




First, try out this Jupyter notebook:

https://pytorch.org/tutorials/beginner/nn_tutorial.html

or as captured here:

https://gist.github.com/tthtlc/42e68c773988437c1a4158d55f26f819

Next, look for the "set_trace()" call inside it:


This is where you enable the jupyter debugging.

If you are in a Google Colab environment, you just need to insert:

from IPython.core.debugger import set_trace
set_trace()
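
For instance, a minimal sketch of dropping into the debugger inside a loop (a toy loop with made-up variable names, not the tutorial's exact code):

from IPython.core.debugger import set_trace

bs = 64
for i in range(3):
    set_trace()          # execution pauses here; step with "n", continue with "c"
    start_i = i * bs
    end_i = start_i + bs
    print(start_i, end_i)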

(An example debugger session is captured below.)

After running "set_trace()", the IPython debugger will stop at the next line and you can enter "help":

ipdb> help

Documented commands (type help <topic>):
========================================
EOF    cl         disable  interact  next    psource  rv         unt   
a      clear      display  j         p       q        s          until 
alias  commands   down     jump      pdef    quit     source     up    
args   condition  enable   l         pdoc    r        step       w     
b      cont       exit     list      pfile   restart  tbreak     whatis
break  continue   h        ll        pinfo   return   u          where 
bt     d          help     longlist  pinfo2  retval   unalias  
c      debug      ignore   n         pp      run      undisplay

Miscellaneous help topics:
==========================
exec  pdb

Now entering "n" to continue execution:

ipdb> n
> <ipython-input-...-d26dbfac9245>(16)<module>()
     14         set_trace()
     15         start_i = i * bs
---> 16         end_i = start_i + bs
     17         xb = x_train[start_i:end_i].to(dev)
     18         yb = y_train[start_i:end_i].to(dev)

ipdb> 
> <ipython-input-...-d26dbfac9245>(17)<module>()
     15         start_i = i * bs
     16         end_i = start_i + bs
---> 17         xb = x_train[start_i:end_i].to(dev)
     18         yb = y_train[start_i:end_i].to(dev)
     19         pred = model(xb)

ipdb> 
> <ipython-input-...-d26dbfac9245>(18)<module>()
     16         end_i = start_i + bs
     17         xb = x_train[start_i:end_i].to(dev)
---> 18         yb = y_train[start_i:end_i].to(dev)
     19         pred = model(xb)
     20         loss = loss_func(pred, yb)


Conflict errors while pip install

When you execute this:


You get a "six" module-not-found error.   So either you "pip install" or "conda install" the "six" module.

It is often the case that "conda" has packages that conflict with those from "pip install" - assuming that "pip" is under the "conda" base install directory ($HOME/ana350):

First this is the location of conda:

which conda
/home/tthtlc/ana350/condabin/conda    (or $HOME/ana350/condabin/conda)

And so under its bin directory you can find the pip and python command:

ls -al ~/ana350/bin/pip
/home/tthtlc/ana350/bin/pip

But after creating an environment called "tf36", you get a "pip" and "python" in a subdirectory of its own under the conda base directory:

which pip
/home/tthtlc/ana350/envs/tf36/bin/pip

When you "pip install", you get the conflict error:


So next you will try "conda install":


SUCCESS.

This means that the "six" module does indeed exist inside the conda base installation - which is not always the case - and if you try to pip install it, you will get a conflict.
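
To confirm which copy of the module actually gets imported afterwards, a quick check from Python (run inside the tf36 environment):

import sys
import six

print(sys.executable)   # should point into ~/ana350/envs/tf36/bin
print(six.__file__)     # shows which site-packages the module came from
print(six.__version__)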

And running the bash shell again:


No more error.