Fluid Attacks logo
Contact Us
Young hacker smiling
Zero false positives

Expert intelligence + effective automation

Contact logo Contact Us
GET A DEMO
Parsing code. Photo by Markus Spiske on Unsplash: https://unsplash.com/photos/hvSr_CVecVI

Parse and Conquer

Why Asserts uses Parser combinators
Why Asserts uses parser combinators as its main static code analysis tool? Asserts is our vulnerability closure checker engine, which tests whether vulnerabilities found manually are still open. Many of those functions test for poor programming practices, with aid of parser combinators.
User icon Rafael Ballestas
Folder icon attacks
Calendar icon

2019-05-07



As you might have noticed, at Fluid Attacks we like parser combinators, functional programming, and, of course, Python. In the parser article, I showed you the essentials of Pyparsing and we also showed how to leverage its power to find SQL injections in a PHP application. Here we will extend those essentials to show you how we use parser combinators in Asserts, our vulnerability closure checker engine. Feel free to refer to that article for more details on how Pyparsing works, though I’ll try my best to leave no keyword used here without explanation.

Parser combinators are particularly useful when the expressions to be analyzed are complex. You don’t really need parser combinators to break up email addresses into usernames and domains. For that, a regular expression will suffice. However, when what you need to do is more involved, such as looking for SQL injections, or analyzing source code for poor programming practices as per our Rules and recommendations, parsers are our tool of choice. This is one of the tasks at which Asserts excels, determine whether a vulnerability, found by one of our analysts in the source code, is still open by deeply searching in it, with the aid of parser combinators. Let us see how that works.

Suppose an analyst was auditing some Java source code and found out that it uses the insecure DES cipher to mask the information of a bank transaction in the file transactions.java. DES is insecure due to its small 56-bit key size, which could theoretically be brute-forced in 6 minutes. In order to report the vulnerability, they would write a script that automatically checks if the vulnerability is still there:

Asserts script expl.py to check DES usage
from fluidasserts.lang import java

FILE = 'transactions.java'
java.uses_des_algorithm(FILE)

Simple, right? Just running that script tells you whether the inscure DES algorithm is used in that particular file. Or you can even point it at an entire directory, and Asserts will test every Java source file for DES usage. But what’s behind the curtains? Since Asserts is now open-source, anyone can actually check out what this function does:

java.py. See Gitlab for rest of code.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
from pyparsing import (CaselessKeyword, Word, Literal, Optional, alphas, Or,
                       alphanums, Suppress, nestedExpr, javaStyleComment,
                       SkipTo, QuotedString, oneOf)

from fluidasserts.helper import lang

...

def uses_des_algorithm(java_dest: str, exclude: list = None) -> bool:
    """
    Check if code uses DES as encryption algorithm.

    See `REQ.150 <https://fluidattacks.com/web/es/rules/149/>`_.

    :param java_dest: Path to a Java source file or package.
    """
    method = 'Cipher.getInstance("DES")'
    tk_mess_dig = CaselessKeyword('cipher')
    tk_get_inst = CaselessKeyword('getinstance')
    tk_alg = Literal('"') + CaselessKeyword('des') + Literal('"')
    tk_params = Literal('(') + tk_alg + Literal(')')
    instance_des = tk_mess_dig + Literal('.') + tk_get_inst + tk_params

    result = False
    try:
        matches = lang.check_grammar(instance_des, java_dest, LANGUAGE_SPECS,
                                     exclude)
        if not matches:
            show_unknown('Not files matched',
                         details=dict(code_dest=java_dest))
            return False
    except FileNotFoundError:
        show_unknown('File does not exist', details=dict(code_dest=java_dest))
        return False
    for code_file, vulns in matches.items():
        if vulns:
            show_open('Code uses {} method'.format(method),
                      details=dict(file=code_file,
                                   fingerprint=lang.
                                   file_hash(code_file),
                                   lines=", ".join([str(x) for x in vulns]),
                                   total_vulns=len(vulns)))
            result = True
        else:
            show_close('Code does not use {} method'.format(method),
                       details=dict(file=code_file,
                                    fingerprint=lang.
                                    file_hash(code_file)))
    return result

Notice how, in the first few lines above (17-22), the parser instance_des is built from smaller parsers such as tk_mess_dig, which matches the single keyword "Cipher", but also any variations in case, should they happen. Hence, CaselessKeyword. Literal forgoes any such assumption: if its double quotes, they must be there, exactly. Pyparsing also takes care of handling white space by overloading the + operator to mean "followed, possibly with some whitespace in between". Building a regex to match the same would be perhaps more compact, but never as readable or maintainable. This is one of the many advantages of parser combinators over regular expressions.

Next, this parser is passed along with the other required parameters to the function check_grammar in the lang module (more on that later). This function should return the matches in said file for the built parser. Thus the actual matching code can be reused. If there are matches, that means the vulnerability is open, hence Asserts will show_open a message like this:

Asserts open vulnerability message
Figure 1. Asserts open vulnerability message

Otherwise it shows a similar message, only with less alarming colors.

So, what does the lang module do with the combined parsers and the file? More parsing! It tests whether the path is a single file or a directory, whether it has the correct extension, but really all the meat is in the _get_match_lines_ method. It parses the text in the given source code file, to find out if those lines are comments, i.e., not functional code, so as to skip them. In the future, Asserts will be able to use this comment-code discrimination in order to find lines of code which were commented out and later abandoned, which might pose the risk of unpredictable behavior in the application if carelessly uncommented, depending on who has access to the code.

From fluidasserts.helper.lang module. See full code on Gitlab.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
def _get_match_lines(grammar: ParserElement, code_file: str,  # noqa
                     lang_spec: dict) -> List:  # noqa
    """
    Check grammar in file.

    :param grammar: Pyparsing grammar against which file will be checked.
    :param code_file: Source code file to check.
    :param lang_spec: Contains language-specific syntax elements, such as
                       acceptable file extensions and comment delimiters.
    :return: List of lines that contain grammar matches.
    """
    with open(code_file, encoding='latin-1') as file_fd:
        affected_lines = []
        counter = 0
        in_block_comment = False
        for line in file_fd.readlines():
            counter += 1
            try:
                if lang_spec.get('line_comment'):
                    parser = ~Or(lang_spec.get('line_comment'))
                    parser.parseString(line)
            except ParseException:
                continue
            if lang_spec.get('block_comment_start'):
                try:
                    block_start = Literal(lang_spec.get('block_comment_start'))
                    parser = SkipTo(block_start) + block_start
                    parser.parseString(line)
                    in_block_comment = True
                except (ParseException, IndexError):
                    pass

                if in_block_comment and lang_spec.get('block_comment_end'):
                    try:
                        block_end = Literal(lang_spec.get('block_comment_end'))
                        parser = SkipTo(block_end) + block_end
                        parser.parseString(line)
                        in_block_comment = False
                        continue
                    except ParseException:
                        continue
                    except IndexError:
                        pass
            try:
                results = grammar.searchString(line, maxMatches=1)
                if not _is_empty_result(results):
                    affected_lines.append(counter)
            except ParseException:
                pass
    return affected_lines

After testing if the code we’re looking at is functional or not, it is simply a matter of invoking the searchString method from PyParsing, which, as its name implies, searches the given string for matches of the given parser. The module has a few more tricks up its sleeve, such as turning the parsing search results into pretty strings and parsing chunks of lines of code. All that, again, with the help of parser combinators.

The most important takeaway from looking at the source code, and what lies behind it, of this single function, is that using parser combinators in Asserts allows us not only to have readable, maintainable code for our own use and of others, but also for this code to be easily extensible and reusable. PyParsing, due to its object-oriented interface, its clear naming conventions, and the fact that coding parsers in it is just pythonic, allows our team to write and rewrite static code analysis tools that change as do its users' needs.

That just wouldn’t be possible with regular expressions. Regexes must be tailor-made, carefully designed with one specific objective in mind. One application. So that regex that might search for conditionals without default actions in Javascript is bound to be useless for the same purpose in a different language. Such is not the case with parser combinators, as most code is easily modified or reusable. Also, nesting searches as we did above (parsing before parsing to know if we’re inside a block comment) will definitely require uber-complex regular expressions, if it is possible at all.

Just like uses_des_algorithm above, Asserts packs convenient functions to test for many of our requirements or recommendations for secure coding, for several different languages, and growing daily. Pyparsing enhances a significant part of our static code analysis tools in a way that, as mentioned earlier, with regexes would only be ad hoc or impossible to maintain.

References

  1. Asserts documentation.

  2. McGuire, Paul (2008). Getting started with pyparsing. O’Reilly shortcuts.


Author picture

Rafael Ballestas

Mathematician

with an itch for CS



Related




Service status - Terms of Use