Sep 16, 2017

MAST: A review and how it can be used for security bugs

https://mast-group.github.io/

These reviews are solely my opinions and understanding (+misunderstanding).

Interesting Sequence Mining:   

A good example of this is "memcpy()" + "ret": if the length passed to memcpy() exceeds the size of the destination buffer, a heap or stack overflow can result, and on the stack the overwritten return address is then consumed by "ret".   So this kind of sequence is interesting for security.   But human input is needed to judge how useful a given sequence is.   Is it possible to discover other such sequences based purely on frequency of occurrence?
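To make the frequency question concrete, here is a minimal sketch in Python. It is plain support counting over ordered call pairs, not the MAST miner itself. The function name `frequent_call_pairs`, the `min_support` parameter and the sample traces are all made up for illustration.

```python
from collections import Counter

def frequent_call_pairs(traces, min_support=2):
    """Return ordered pairs (a, b), where b appears after a in a trace,
    that occur in at least `min_support` traces.

    Hypothetical helper: naive support counting, not a probabilistic model.
    """
    counts = Counter()
    for trace in traces:
        # Collect each pair once per trace so a long trace cannot inflate the count.
        pairs_in_trace = set()
        for i, first in enumerate(trace):
            for second in trace[i + 1:]:
                pairs_in_trace.add((first, second))
        counts.update(pairs_in_trace)
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Made-up call traces, one list per function body.
traces = [
    ["malloc", "strlen", "memcpy", "free"],
    ["malloc", "memcpy", "free"],
    ["fopen", "fread", "fclose"],
]
result = frequent_call_pairs(traces)
print(result)
```

Pairs that survive the threshold are only candidates. Deciding whether something like ("malloc", "memcpy") matters for security still needs a human.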

Probabilistic API mining

This runs into the same ranking problem as before:   how do we discover probabilistically significant APIs or functions?   Candidate signals are how frequently a function is called by others, how frequently it calls other functions, or the distribution pattern of where the API is used across different locations of the program.
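The first two signals (called by others, calling others) can be sketched as fan-in and fan-out over a call graph. This is only raw counting and does not attempt any probabilistic ranking. The `call_degrees` helper and the function names in the sample graph are hypothetical.

```python
from collections import Counter

def call_degrees(call_graph):
    """Compute fan-in and fan-out for a call graph given as {caller: [callees]}.

    fan_in[f]  = number of distinct functions that call f
    fan_out[f] = number of distinct functions that f calls
    """
    fan_in = Counter()
    fan_out = {}
    for caller, callees in call_graph.items():
        distinct = set(callees)
        fan_out[caller] = len(distinct)
        for callee in distinct:
            fan_in[callee] += 1
    return fan_in, fan_out

# Made-up call graph of a small C program.
call_graph = {
    "main": ["parse_header", "copy_body"],
    "parse_header": ["memcpy", "strlen"],
    "copy_body": ["memcpy", "malloc"],
}
fan_in, fan_out = call_degrees(call_graph)
print(fan_in.most_common(1))
print(fan_out)
```

In this toy graph memcpy has the highest fan-in, which is the kind of function one would want to look at first.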

Naturalize

The input is all the tokens of the C source code, and thus, after assimilating enough tokens, it is possible to see generalized patterns in naming conventions.   Associating each name with its surrounding neighbors and their interactions is necessary to achieve this characterization.   For example, a for() loop almost always uses "i" or "j" as the loop counter.   So "loop" + counter are the neighboring concepts tagged to the "for()" statement.   Many other canonical statements exist:   malloc(), while(), etc.   Some are part of C itself, others are system calls, and yet others are just local functions.   The ordering of the statements, plus which ones call each other, can be used to draw an associative diagram linking all of these together.
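As a toy version of the loop-counter observation, the sketch below counts which identifier gets initialised at the start of each C for() statement. It is a regular expression plus a frequency count, far simpler than a model learned over all tokens, and the regex only handles simple initialisers such as "int i = 0". The names `FOR_INIT`, `loop_counter_names` and the sample source are my own assumptions.

```python
import re
from collections import Counter

# Matches "for (", then optional type words such as "int" or "unsigned int",
# then captures the identifier that is assigned in the initialiser.
FOR_INIT = re.compile(r"\bfor\s*\(\s*(?:[A-Za-z_]\w*\s+)*([A-Za-z_]\w*)\s*=")

def loop_counter_names(source):
    """Count the variable names used as for-loop counters in C source text."""
    return Counter(FOR_INIT.findall(source))

# Made-up C fragment.
sample = """
for (int i = 0; i < n; i++) { total += a[i]; }
for (j = 0; j < m; j++) { b[j] = 0; }
for (size_t i = 0; i < len; i++) { dst[i] = src[i]; }
for (int row = 0; row < h; row++) { clear(row); }
"""
names = loop_counter_names(sample)
print(names.most_common())
```

A counter name that is rare across a whole code base (say "row" where everything else uses "i") is the kind of outlier that a naming-convention model would flag.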

Extreme Source Summarization

Sometimes a security bug takes the form of a usage pattern that straddles multiple functions.   In this case some kind of summary would be good to have, one that points back to the places in the source code where the pattern is used.
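One very crude form of such a summary is to list, for each entry function, which security-relevant APIs it can reach through its callees. The sketch below does a plain reachability walk over a call graph. It is my own stand-in and not the summarization technique MAST uses. The `reachable_apis` helper, the set of "interesting" APIs and the function names are hypothetical.

```python
def reachable_apis(call_graph, root, interesting):
    """Return the interesting APIs reachable from `root` via any chain of calls.

    call_graph maps each function to the list of functions it calls.
    Interesting APIs are treated as leaves and are not expanded further.
    """
    found = set()
    visited = set()
    stack = [root]
    while stack:
        fn = stack.pop()
        if fn in visited:
            continue  # also guards against recursion cycles
        visited.add(fn)
        for callee in call_graph.get(fn, []):
            if callee in interesting:
                found.add(callee)
            else:
                stack.append(callee)
    return found

# Made-up program: the length is read in one function and used in another.
call_graph = {
    "handle_request": ["read_len", "copy_payload"],
    "read_len": ["recv"],
    "copy_payload": ["memcpy"],
}
summary = reachable_apis(call_graph, "handle_request", {"recv", "memcpy", "malloc"})
print(sorted(summary))
```

Here neither read_len nor copy_payload looks suspicious on its own, but the summary for handle_request shows untrusted input (recv) and a copy (memcpy) under the same caller, which points back to where to read the source.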
