Sep 5, 2018

Source code pattern learning via Machine Learning

Questions:

1.   What are the motivations for finding patterns?

Finding code clones, finding bug patterns, debugging software, writing and composing new programs from existing code fragments, identifying the differences among programs, identifying equivalent machine/code translations for virtualization, etc.
2.   What are the different ways the same objectives can be achieved programmatically?

Canonical reduction into equivalence classes, searching for similar pattern trees.
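
As a minimal sketch of canonical reduction (an illustration only, with Python's `ast` module standing in for whatever representation is actually used, such as LLVM IR): two snippets fall into the same equivalence class when their syntax trees match after identifiers are renamed to positional placeholders. The helper names `canonical_form` and `group_clones` are hypothetical.

```python
import ast
import builtins
from collections import defaultdict


class _Canonicalizer(ast.NodeTransformer):
    """Rename user-defined identifiers to v0, v1, ... in order of first use."""

    def __init__(self):
        self.names = {}

    def _canon(self, name):
        # Keep builtins (len, range, ...) so that different calls stay distinct.
        if hasattr(builtins, name):
            return name
        if name not in self.names:
            self.names[name] = f"v{len(self.names)}"
        return self.names[name]

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

    def visit_FunctionDef(self, node):
        node.name = "f"  # the function's own name is not part of the pattern
        self.generic_visit(node)
        return node


def canonical_form(source):
    """Return a string that is identical for snippets differing only in names."""
    tree = _Canonicalizer().visit(ast.parse(source))
    return ast.dump(tree)  # line/column attributes are omitted by default


def group_clones(snippets):
    """Group a {label: source} mapping into classes of equivalent snippets."""
    classes = defaultdict(list)
    for label, source in snippets.items():
        classes[canonical_form(source)].append(label)
    return [sorted(labels) for labels in classes.values()]
```

This only catches clones that differ in naming and layout. Reordered statements or a different choice of loop construct would need a stronger normalisation step.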
3.   Given an existing program, how to work out the high-level objectives of the program?

Identifying characteristics by classification into algorithmic classes: sorting, reversing, copying, transferring, XOR-ing, addition/subtraction/division/modulo computation, insertion into strings, deletion from strings, etc.
4.   Given the technique for solving a problem (in high-level terms), how to work out the program that achieves it? (The reverse of question 3.)
5.   If completing a program can be treated as equivalent to the travelling salesman problem, what are the different probabilistic methods for selecting among paths/solutions/techniques to achieve the goal?

Analysing equivalence via reordering of high-level primitives and associating cost components with each ordering, step-wise reasoning from the adjacency matrix of the problem in graph form, etc.
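
One way to make the reordering idea concrete is the rough sketch below, under assumptions not stated in the notes: the primitives are numbered steps, `dep` is an adjacency matrix where `dep[i][j] == 1` means step i must run before step j, and `cost[a][b]` is a made-up cost of running step b directly after step a. Every ordering that respects `dep` is treated as equivalent, and the cheapest one is selected by brute force. All function names are hypothetical.

```python
from itertools import permutations


def valid_orderings(dep):
    """Yield every ordering of the steps that respects the dependency matrix."""
    n = len(dep)
    for order in permutations(range(n)):
        pos = {step: k for k, step in enumerate(order)}
        if all(pos[i] < pos[j]
               for i in range(n) for j in range(n) if dep[i][j]):
            yield order


def path_cost(order, cost):
    """Sum the transition costs between consecutive steps of an ordering."""
    return sum(cost[a][b] for a, b in zip(order, order[1:]))


def cheapest_ordering(dep, cost):
    """Pick the lowest-cost ordering among all equivalent ones."""
    return min(valid_orderings(dep), key=lambda order: path_cost(order, cost))
```

Enumerating permutations is only feasible for a handful of steps. For larger problems this exhaustive search is the part that a probabilistic method (random restarts, simulated annealing and so on) would replace.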
6.   Dynamic code simulation/emulation. This is equivalent to language translation.
 
7.   What are the minimum primitives, or the simplest algorithm, needed to solve a programming problem?

Identifying canonical primitives for any program (in terms of LLVM IR, generated from any programming language).
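
A rough first step toward seeing which primitives a program uses is to count the instruction opcodes in its textual LLVM IR (for example the `.ll` output of `clang -S -emit-llvm`). The sketch below is a line-based approximation, not a real IR parser: it assumes one instruction per line inside `define ... { }` bodies, and the function name `opcode_histogram` is hypothetical.

```python
from collections import Counter

# Optional markers that can precede the "call" opcode.
_CALL_PREFIXES = {"tail", "musttail", "notail"}


def opcode_histogram(ir_text):
    """Count instruction opcodes inside function bodies of textual LLVM IR."""
    counts = Counter()
    in_body = False
    for raw in ir_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("define "):
            in_body = line.endswith("{")
            continue
        if line == "}":
            in_body = False
            continue
        if not in_body or line.endswith(":"):  # skip globals and block labels
            continue
        if " = " in line:  # "%x = add i32 ..." -> "add i32 ..."
            line = line.split(" = ", 1)[1]
        tokens = line.split()
        op = tokens[0]
        if op in _CALL_PREFIXES and len(tokens) > 1:
            op = tokens[1]
        counts[op] += 1
    return counts
```

Comparing such histograms across programs compiled from different source languages gives a crude view of which primitives recur. A more careful analysis would use LLVM's own parser rather than string splitting.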
8.   What is the best canonical representation (LLVM?) for understanding/representing different programs?
https://arxiv.org/abs/1602.05110
