Sep 28, 2017

Source code representation

Analyzing a sample source file from NIST's SAMATE Juliet test suite:



The same program after compiling it with GCC and dumping its GIMPLE intermediate representation, which keeps only the core structure:


1.   Using the above GIMPLE source code, the entire program will be streamed into the LSTM network.

(Figure: input and output of an LSTM network)


2.   The caller-callee relationship will be ignored.

3.   The "potential_flaw()" comments will be removed from the input. They will be used as training labels instead: the position of each comment (relative to the start of the program) plus the nature of the software vulnerability.


The "potential_flaw()" comment serves as the output for supervised learning, and the group of source lines it is associated with (in the sense of security bugs) is taken to be a few lines before and after the "potential_flaw()" comment.

4.   The nature of the classification is described literally by the text of the "potential_flaw()" comment.
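Steps 3 and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's pipeline. It assumes the flaw markers appear as C comments of the form /* POTENTIAL FLAW: <description> */ (the style used in the Juliet C test cases). The names extract_labels, FlawLabel, tokenize, build_vocab and encode, and the context window size, are hypothetical. The C preprocessor discards comments before GIMPLE is produced, so the sketch works on the original commented source. It blanks each marker in place so that every line keeps its original index, which keeps positions relative to the start of the program.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed marker format (hypothetical for this sketch):
#   /* POTENTIAL FLAW: <description> */
FLAW_RE = re.compile(r"/\*\s*POTENTIAL FLAW:\s*(.*?)\s*\*/")
# Crude lexer: identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


@dataclass
class FlawLabel:
    line: int           # 0-based line index of the marker, from the program start
    nature: str         # literal comment text, used as the class label
    window: List[str]   # cleaned source lines around the marker


def extract_labels(source: str, context: int = 3) -> Tuple[List[str], List[FlawLabel]]:
    """Remove flaw markers from the source and return (clean_lines, labels)."""
    clean_lines: List[str] = []
    found: List[Tuple[int, str]] = []
    for i, line in enumerate(source.splitlines()):
        match = FLAW_RE.search(line)
        if match:
            found.append((i, match.group(1)))
            line = FLAW_RE.sub("", line)  # blank the marker, keep the line
        clean_lines.append(line)
    labels = []
    for pos, nature in found:
        lo = max(0, pos - context)
        hi = min(len(clean_lines), pos + context + 1)
        labels.append(FlawLabel(pos, nature, clean_lines[lo:hi]))
    return clean_lines, labels


def tokenize(lines: List[str]) -> List[str]:
    """Flatten source lines into one token stream for the LSTM."""
    return [tok for line in lines for tok in TOKEN_RE.findall(line)]


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Map each distinct token to an integer id; 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Turn tokens into the integer sequence fed to the network."""
    return [vocab.get(tok, 0) for tok in tokens]


sample = """void bad()
{
    char *data;
    data = NULL;
    /* POTENTIAL FLAW: Attempt to use data, which may be NULL */
    printf("%c", data[0]);
}"""

if __name__ == "__main__":
    lines, labels = extract_labels(sample, context=2)
    tokens = tokenize(lines)
    ids = encode(tokens, build_vocab(tokens))
    for label in labels:
        print(label.line, label.nature)
    print(ids)
```

The integer sequence from encode() is what would be streamed into the LSTM one token at a time, while each FlawLabel supplies the target position and class. The "few lines before and after" window is the context parameter here, and its value is a free choice rather than something fixed by the method.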
