Continuing the previous discussion:
Here are the basic requirements (partial list) for implementing an automated source code writing system:
1. Common abstract representation: Whether the source is C, Python, Lua or Perl, the system must be able to convert each language's constructs into a common internal representation of objects + transformations/operations + temporal sequence. Let's call it "code snippets", or "IR" for intermediate representation.
2. All logic manipulation and transformation will take place at the IR level. Debugging of programming errors will also be done by logical reasoning at this level.
3. Composing individual code snippets into a larger entity (call it a "code segment") by following the generic rules of programming, which essentially define concepts such as "constraints", time ordering, etc. Sometimes no rule will be available for composing two code snippets, and a further intermediate form of "code snippet" may have to be identified to make the composition possible.
4. Able to identify abstract requirements from one overall objective (e.g., sort a list of objects by the value of a certain field).
5. Able to convert the abstract requirements into actual individual code snippets.
6. Dataset A: Conversion of language-dependent programs to IR. The system must be able to convert a large number of existing programs and identify all the canonical "code snippets". (In the case of C, a program can be converted into LLVM IR, and that IR is the intermediate canonical representation, or "code snippets": largely hardware-independent and independent of other languages.)
7. Dataset B: Once (6) is successful, the data for learning how to program will be available. The next step is to learn all the "illegal" syntax, illegal semantics and meanings, etc. This data is generated by running actual compilers on incorrectly written programs and collecting the errors they report.
8. The final output will be the language-dependent part: given the IR, translate it into the different target languages (C, C++, Python, etc.) to produce the actual program.
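To make requirements 1, 3 and 8 a little more concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions, not a proposed design: the names `Snippet`, `compose`, `emit_python` and `emit_c`, and the three operations "const", "add" and "mul", are all hypothetical. A snippet is one operation on named objects, composition enforces a single time-ordering constraint (a name must be defined before it is used), and two small emitters translate the same IR into Python-style and C-style text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """One IR "code snippet": an operation that produces `target` from `inputs`."""
    op: str            # hypothetical op names: "const", "add", "mul"
    target: str        # name of the object this snippet defines
    inputs: tuple = () # a literal for "const", otherwise names of earlier objects


def compose(snippets, known=()):
    """Compose snippets into a "code segment".

    Enforces one ordering constraint: every input name must already be
    defined by an earlier snippet (or listed in `known`).
    """
    defined = set(known)
    for s in snippets:
        if s.op != "const":
            missing = [name for name in s.inputs if name not in defined]
            if missing:
                raise ValueError(f"{s.target}: undefined inputs {missing}")
        defined.add(s.target)
    return list(snippets)


_SYMBOL = {"add": "+", "mul": "*"}


def _expr(s):
    """Render the right-hand side of a snippet as an infix expression."""
    if s.op == "const":
        return str(s.inputs[0])
    return f" {_SYMBOL[s.op]} ".join(s.inputs)


def emit_python(segment):
    """Language-dependent back end: Python assignments."""
    return "\n".join(f"{s.target} = {_expr(s)}" for s in segment)


def emit_c(segment):
    """Language-dependent back end: C declarations (assumes int values only)."""
    return "\n".join(f"int {s.target} = {_expr(s)};" for s in segment)


segment = compose([
    Snippet("const", "a", (2,)),
    Snippet("const", "b", (3,)),
    Snippet("add", "c", ("a", "b")),
])
print(emit_python(segment))
print(emit_c(segment))
```

Running this prints `a = 2`, `b = 3`, `c = a + b`, followed by the same three statements as C declarations (`int a = 2;` and so on). Reordering the list so that the "add" snippet comes first makes `compose` raise a `ValueError`, which is the kind of rule violation that requirement 3 is about.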