Analyzing a sample source file from NIST's SAMATE Juliet test suite:
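A minimal sketch of such a test case, modeled on Juliet's CWE-121 (stack-based buffer overflow) family; the function name is illustrative, and the "POTENTIAL FLAW" marker comment is what this post refers to as the "potential_flaw()" comment:

/* Hypothetical Juliet-style test case (modeled on the CWE-121
   stack-based buffer overflow family); names are illustrative. */
#include <string.h>

void CWE121_memcpy_bad(void)
{
    char data[10];
    char source[20];
    memset(source, 'A', sizeof(source) - 1);
    source[sizeof(source) - 1] = '\0';
    /* POTENTIAL FLAW: copying 20 bytes overflows the 10-byte 'data' buffer */
    memcpy(data, source, sizeof(source));
}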
And after compiling it with gcc's GIMPLE dump mechanism to print out only the core structure:
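The dump can be produced with gcc's -fdump-tree-gimple option, e.g. "gcc -c -fdump-tree-gimple CWE121_memcpy.c", which writes a dump file named like CWE121_memcpy.c.004t.gimple (the pass number varies across GCC versions). For the sketch above, the lowered GIMPLE looks roughly like this; exact temporaries and ordering depend on the GCC version:

void CWE121_memcpy_bad ()
{
  char data[10];
  char source[20];

  memset (&source, 65, 19);
  source[19] = 0;
  memcpy (&data, &source, 20);
}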
1. Using the above GIMPLE source code, the entire program will be streamed into the LSTM network.
2. The caller-callee relationship will be ignored.
3. The "potential_flaw()" comment will be removed and used as the training target: the position of the comment (relative to the start of the program) plus the nature of the software vulnerability. The "potential_flaw()" comment serves as the output for supervised learning, and the group of source lines it is associated with (in the sense of security bugs) is taken to be a few lines before and after the comment (a small extraction sketch follows this list).
4. The nature of the classification is described literally by the comment text inside the "potential_flaw()" marker.
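A minimal sketch of the extraction in step 3, in C to match the rest of the post: scan a file line by line, record each marker's line offset as the position and the literal comment text as the class label; the lines within +/-WINDOW of the marker form the associated code group. The "POTENTIAL FLAW" marker string and the WINDOW size are assumptions, not fixed by anything above:

/* Sketch: emit (position, window, label) training records from the
   "POTENTIAL FLAW" comments of a Juliet-style source or GIMPLE dump.
   The marker string and WINDOW size are illustrative assumptions. */
#include <stdio.h>
#include <string.h>

#define WINDOW 3  /* lines of context kept before/after the marker */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <source-or-dump-file>\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "r");
    if (!fp) {
        perror("fopen");
        return 1;
    }
    char line[1024];
    long lineno = 0;
    while (fgets(line, sizeof(line), fp)) {
        lineno++;
        const char *hit = strstr(line, "POTENTIAL FLAW");
        if (hit) {
            /* Position relative to the start of the program, the
               surrounding code group, and the literal comment text
               as the supervised-learning label. */
            printf("position=%ld window=[%ld,%ld] label=%s",
                   lineno, lineno - WINDOW, lineno + WINDOW, hit);
        }
    }
    fclose(fp);
    return 0;
}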
References:
http://www.coli.uni-saarland.de/~claytong/posters/EMNLP16_Poster.pdf
https://www.inf.ed.ac.uk/teaching/courses/asr/2015-16/asr11-nnlm.pdf
http://mt-class.org/jhu/slides/lecture-nn-lm.pdf
https://stats.stackexchange.com/questions/158834/what-is-a-feasible-sequence-length-for-an-rnn-to-model
https://www.tensorflow.org/tutorials/recurrent#truncated-backpropagation
https://stackoverflow.com/questions/44478272/how-to-handle-extremely-long-lstm-sequence-length
"Exploring the Limits of Language Modeling": https://arxiv.org/pdf/1602.02410.pdf
"N-gram Language Modeling Using Recurrent Neural Network Estimation": https://arxiv.org/pdf/1703.10724.pdf