Syringe ready to inject bad stuff. Credit: https://pixabay.com/es/photos/jeringa-healthcare-aguja-medicina-417786/

Tainted love

It's all about sanitization
A brief description of static and dynamic taint analysis, also known as taint checking. Essentially, unsanitized user input from a source must never arrive at a security-critical sink without first passing through sanitization. Taint checkers verify that it never does.
Rafael Ballestas
2019-08-30



In several past articles, we have briefly touched on the concept of taint analysis. In this article, we would like to fill in the gaps that those passing references may have left. This concept is closely linked to the code representations used by some of the ML-powered vulnerability detectors we have presented before and is well complemented by symbolic execution, so we deemed it necessary to expand on it a bit.

Most of the OWASP Top 10 web application vulnerabilities arise because an attacker can inject code into the application's inputs, and that code is then used to perform some action on the server. The classic example is SQL injection.

For example, this page from bWAPP has an input where the user is supposed to write a movie name, which should contain only alphanumeric characters:

Figure 1. bWAPP movie search

Well, maybe some movies will have the occasional dash or question mark. Here is what this page does with the user input:

Adapted from bWAPP code
$title = $_POST["title"];
$sql = "SELECT * FROM movies WHERE title LIKE '%" . $title . "%'";
$recordset = mysql_query($sql, $link);

The input is taken from the POST request and pasted straight into a SQL query, which is executed right then and there, and the results are shown to the user in a table. If, instead of an actual movie name, an attacker writes this in the box:

%' UNION SELECT id, login, password, email, secret,
activated, admin FROM users;#

Then the SQL query would become this (the trailing # comments out what is left of the original query):

SELECT * FROM movies WHERE title LIKE '%%'
  UNION SELECT id, login, password, email, secret,
  activated, admin FROM users;

This fetches all the movies' information, but also the users' login information:

Figure 2. Tainted!
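The same flaw can be reproduced outside bWAPP. The following is only a minimal sketch, assuming an in-memory SQLite database in place of bWAPP's MySQL one; the table layout, the sample rows and the search_movies function are made up for illustration. Since SQLite does not treat # as a comment marker, the payload ends with -- instead.

```python
import sqlite3

# Hypothetical stand-in for the bWAPP database (schema and rows are assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (id INTEGER, title TEXT, year TEXT);
    CREATE TABLE users  (id INTEGER, login TEXT, password TEXT);
    INSERT INTO movies VALUES (1, 'Iron Man', '2008');
    INSERT INTO users  VALUES (1, 'bee', 's3cret-hash');
""")

def search_movies(title):
    # Same flaw as the PHP snippet: user input is pasted into the query text.
    sql = "SELECT * FROM movies WHERE title LIKE '%" + title + "%'"
    return conn.execute(sql).fetchall()

# A well-behaved search only returns movies.
honest = search_movies("Iron")

# The injected UNION appends rows from the users table to the result;
# the trailing "--" comments out the leftover %' of the original query.
payload = "%' UNION SELECT id, login, password FROM users --"
leaked = search_movies(payload)

print(honest)
print(leaked)
```

The second result mixes movie rows with rows from the users table, which is the same effect shown in Figure 2.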

The user input has thus become tainted, and hence the SQL query is now tainted, too. In the context of taint checking, the $title input above is called a source: the place where the bad input, and thus the possible injection, comes from.

What is, in the end, the problem with tainted inputs? It depends on what is done with them at the sink, where the input is consumed. As we have stated many times in past articles illustrating other injection-style or taint-style vulnerabilities (as we may now call them with every right), this can be avoided with input validation (checking that the input is valid) and sanitization (fixing it if it is not, by removing dangerous characters or allowing only known good ones).
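To make the difference between validation and sanitization concrete, here is a small sketch. The allow-list (letters, digits, spaces, dashes and question marks) and the function names are assumptions chosen for the movie title example, not bWAPP's actual rules.

```python
import re

# Assumed policy: letters, digits, spaces, dashes and question marks only.
ALLOWED = r"A-Za-z0-9 ?\-"
VALID_TITLE = re.compile("[" + ALLOWED + "]{1,100}")
INVALID_CHARS = re.compile("[^" + ALLOWED + "]")

def is_valid_title(title):
    """Validation: accept the input only if all of it is on the allow-list."""
    return VALID_TITLE.fullmatch(title) is not None

def sanitize_title(title):
    """Sanitization: drop every character that is not on the allow-list."""
    return INVALID_CHARS.sub("", title)

payload = "%' UNION SELECT login, password FROM users;#"

print(is_valid_title("Iron Man"))   # a plain title passes validation
print(is_valid_title(payload))      # the injection payload does not
print(sanitize_title(payload))      # quotes, commas, ; and # are stripped
```

Validation rejects the request outright, while sanitization lets it through in a harmless form: without the single quote, the payload can no longer break out of the string literal in the query.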

Taint analysis or taint checking consists, thus, of identifying all sources of potentially dangerous user input and all security-critical sinks (system calls, process interactions, shell invocations, file modifications, etc.), and then figuring out whether there are any sanitizers between each source-sink pair:

Figure 3. Taint analysis diagram via Coseinc.
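The source-sanitizer-sink idea can be sketched as a search over a data-flow graph. Everything in the example below is hypothetical: the node names, the graph itself and the unsanitized_paths helper are made up to illustrate the principle and do not come from any particular tool.

```python
# Hypothetical data-flow graph: an edge a -> b means "data flows from a to b".
FLOWS = {
    "POST[title]": ["build_query", "escape_html"],
    "escape_html": ["render_page"],
    "build_query": ["mysql_query"],
    "config_file": ["open_log"],
}
SOURCES = {"POST[title]"}
SINKS = {"mysql_query", "render_page", "open_log"}
SANITIZERS = {"escape_html"}

def unsanitized_paths(flows, sources, sinks, sanitizers):
    """Return every source-to-sink path that passes through no sanitizer."""
    findings = []
    stack = [[source] for source in sorted(sources)]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node in sinks:
            findings.append(path)
            continue
        for nxt in flows.get(node, []):
            # Do not follow data past a sanitizer, and do not loop forever.
            if nxt not in sanitizers and nxt not in path:
                stack.append(path + [nxt])
    return findings

findings = unsanitized_paths(FLOWS, SOURCES, SINKS, SANITIZERS)
for path in findings:
    print(" -> ".join(path))
```

Only the path that reaches mysql_query is reported. The flow into render_page goes through a sanitizer, and open_log is never fed by a source.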

Then, depending on whether the taint analysis is static (performed on the code) or dynamic (performed at run time), the taint-checking tool should, respectively, either report the issues to the developers so they can fix them or prevent security-critical operations at the sink from executing on tainted data.

Dynamic taint analysis: the Perl approach

The Perl programming language is well known for having used taint analysis since its early days, at least as early as 1989. It is one of its main built-in security features, as can be seen by browsing perlsec.

The Perl approach to taint checking is simple:

  • Treat every input as tainted.

my $name = $cgi->param("name");  # Get $name from user input: tainted!

  • Any variable assigned from an expression that contains a tainted variable is also tainted.

my $full = $name . '@fluidattacks.com';  # Now $full is also tainted

  • A tainted variable can only be untainted by laundering it, i.e., by extracting the accepted part with a regular expression capture:

if ($full =~ /^([-\@\w.]+)$/) {
    $full = $1;                    # $full is now untainted
}

  • No tainted variable can be used in a risky command, such as invoking a sub-shell, opening files, interacting with system processes, etc. That is the real run-time protection. Thus, a query like the following would fail if $full were still tainted (with DBI, this check applies when the handle's TaintIn attribute is set):

$dbh->do("SELECT * FROM users WHERE email = '$full';");

All a user needs to do to enable taint mode in Perl is add the -T switch when running from the command line or, in the case of executable scripts such as CGI scripts (a common use case for Perl), add that switch to the shebang:

#!/usr/bin/perl -T

It is worth noting that, since Perl is an interpreted scripting language, this taint mode is only a run-time protection, which might not be bulletproof and might also block legitimate requests.
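To see how little machinery the Perl rules in this section require, they can be mimicked in a few lines of Python. This is only an illustrative sketch, not how Perl implements taint mode: the Tainted class, the launder function and the run_query sink are all hypothetical names, and taint is only propagated through the + operator (formatting functions and f-strings would silently drop it).

```python
import re

class Tainted(str):
    """A string that comes, directly or indirectly, from user input."""

    def __add__(self, other):
        # tainted + anything is tainted
        return Tainted(str(self) + str(other))

    def __radd__(self, other):
        # anything + tainted is tainted
        return Tainted(str(other) + str(self))

class TaintError(Exception):
    pass

def launder(value, pattern):
    """Untaint a value only if all of it matches the given pattern."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise TaintError("input rejected: " + repr(str(value)))
    return str(match.group(0))  # plain str, no longer tainted

def run_query(sql):
    """Hypothetical sink: refuses anything built from tainted data."""
    if isinstance(sql, Tainted):
        raise TaintError("tainted data reached a sink")
    return "executed: " + sql

name = Tainted("rafael")               # rule 1: every input is tainted
full = name + "@fluidattacks.com"      # rule 2: taint propagates
query = "SELECT * FROM users WHERE email = '" + full + "'"

try:
    run_query(query)                   # rule 4: the sink refuses tainted data
except TaintError as error:
    print("blocked:", error)

clean = launder(full, r"[-@\w.]+")     # rule 3: launder through a regex
print(run_query("SELECT * FROM users WHERE email = '" + clean + "'"))
```

As in Perl, the protection is only as good as the laundering pattern: a pattern that matches everything would untaint everything.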

Static analysis: the PyT approach

PyT is a static taint-checking tool for detecting security vulnerabilities in Python web applications. More specifically, it was designed with Flask applications in mind. It was developed as a Master's thesis project by Stefan Micheelsen and Bruno Thalman at Aalborg University.

We chose this as an example of static taint checking not for its results (it reported 0 vulnerabilities in our own projects and, curiously, also in the tiny but bug-ridden Damn Small Vulnerable Web application), but rather for the well-written and easy-to-follow thesis that explains PyT's inner workings and, hence, the theory behind static taint analysis.

As you might expect by now, taint analysis is linked to the flow of information inside the program, which can be described more accurately by means of the program's Control Flow Graph. The authors use this as the basis for a mathematical model known as a lattice, which has an interesting property: on a lattice of finite height, every monotone (order-preserving) function has a fixed point, i.e., applying it repeatedly eventually stands still. As it happens, code reachability and data flow can be represented in terms of equations on this lattice, and the fixed point property guarantees that these equations have a solution. Here is a friendlier depiction of the process, in the authors' own drawings:

Figure 4. Overview of PyT’s process from [1]
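The fixed-point idea can be illustrated with a toy data-flow analysis. The sketch below is not PyT's implementation: the five-statement program, its hand-written control flow graph, and the transfer and analyze functions are all assumptions made for the example. The lattice here is the set of subsets of program variables ordered by inclusion, and the analysis keeps reapplying the equations until nothing changes.

```python
# Hypothetical program, one statement per CFG node:
#   0: title = request_input()        (source)
#   1: if flag:
#   2:     title = sanitize(title)    (sanitizer)
#   3: sql = "SELECT ..." + title
#   4: execute(sql)                   (sink)
# Each node is (kind, assigned variable, variables used).
NODES = {
    0: ("source", "title", []),
    1: ("branch", None, []),
    2: ("sanitize", "title", ["title"]),
    3: ("assign", "sql", ["title"]),
    4: ("sink", None, ["sql"]),
}
EDGES = {0: [1], 1: [2, 3], 2: [3], 3: [4], 4: []}

def transfer(node, tainted_in):
    """Monotone transfer function: tainted variables after the node runs."""
    kind, target, used = NODES[node]
    out = set(tainted_in)
    if kind == "source":
        out.add(target)
    elif kind == "sanitize":
        out.discard(target)
    elif kind == "assign":
        if any(var in tainted_in for var in used):
            out.add(target)
        else:
            out.discard(target)
    return frozenset(out)

def analyze():
    """Reapply the equations until no set changes, i.e., the fixed point."""
    preds = {n: [p for p in EDGES if n in EDGES[p]] for n in NODES}
    out = {n: frozenset() for n in NODES}
    changed = True
    while changed:
        changed = False
        for n in sorted(NODES):
            # Join: a variable is tainted if it is tainted on any incoming path.
            tainted_in = frozenset().union(*(out[p] for p in preds[n]))
            new_out = transfer(n, tainted_in)
            if new_out != out[n]:
                out[n], changed = new_out, True
    # Tainted variables on entry to each node.
    return {n: frozenset().union(*(out[p] for p in preds[n])) for n in NODES}

tainted_at = analyze()
report = [n for n, (kind, _, used) in NODES.items()
          if kind == "sink" and any(var in tainted_at[n] for var in used)]
print("tainted sinks:", report)
```

The sink at node 4 is reported because one branch skips the sanitizer. If every path went through node 2, the set reaching the sink would be empty and nothing would be flagged.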

The final step, of course, is reporting, so that the developer might take the appropriate measures to fix the taint vulnerabilities.


The idea in both incarnations of taint analysis is simple but powerful: figure out the attack surface and make sure that tainted input can never reach the treasure. Following this simple idea will surely lead to more secure code. And if you are not sure, you can always give a taint-checking tool a try.

References

  1. Stefan Micheelsen, Bruno Thalman. PyT: A Static Analysis Tool for Detecting Security Vulnerabilities in Python Web Applications. MSc thesis

