PHP Aspis: Using Partial Taint Tracking to Protect Against Web Injection Attacks

The most common types of web application attacks involve code injection: JavaScript that is embedded into the generated HTML (XSS), SQL that is part of a generated database query (SQL Injection) or scripts that are executed on the web server (Shell Injection and Eval Injection). These attacks commonly exploit the web application's trust in user-provided data. If user-provided data are not properly filtered and sanitised before use, an attacker can trick the application into generating arbitrary HTML responses and SQL queries, or even into executing user-supplied, malicious code.
Past research has suggested runtime taint tracking as an effective solution to prevent injection exploits. In this approach, the origin of all data within the application is tracked by associating meta-data with strings. When an application executes a sensitive operation, such as outputting HTML, these meta-data are used to escape potentially dangerous values. The most efficient implementation of taint tracking is within the language runtime. Runtime taint tracking is not widely used in PHP, however, because it relies on custom runtimes that are not available in production environments.

Overview

PHP Aspis is a tool that performs taint tracking only on parts of a codebase by rewriting source code to explicitly track and propagate the origin of characters in strings. PHP Aspis augments values to include taint meta-data and rewrites PHP statements so that they operate in the presence of taint meta-data and propagate them correctly. PHP Aspis then uses the taint meta-data to automatically sanitise user-provided untrusted values and transparently prevent injection attacks. Overall, PHP Aspis does not require modifications to the PHP language runtime or to the web server.
 
The figure on the side presents an overview of a PHP application transformed with PHP Aspis. Overall, PHP Aspis' design focuses on three different aspects.

PHP Aspis Overview

1. Taint meta-data

PHP Aspis uses character-level taint tracking, i.e. it tracks the taint of each string character individually. PHP Aspis can track multiple independent, user-provided taint categories. A taint category is a generic way of defining how an application is supposed to sanitise data and how PHP Aspis should enforce that the application always sanitises data before they are used. Each taint category is defined as a set of sanitisation functions and a set of guarded sinks. A sanitisation function is called by the application to transform untrusted user data so that they cannot be used for a particular type of injection attack. Guarded sinks are functions that protect data flow to sensitive sink functions. When a call to a sink function is made, PHP Aspis invokes the guard with references to the parameters passed to the sink function.
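The idea of a taint category (sanitisation functions plus guarded sinks) can be sketched as follows. This is only an illustration: the array layout, the key names and the helpers sanitise() and guarded_print() are assumptions made for this sketch, not PHP Aspis' configuration format or API. For brevity, values are modelled as a string with a single tainted flag instead of character-level meta-data.

```php
<?php
// Illustrative only: layout, key names and helper names are assumptions.
// Values are modelled as [string, bool $tainted] for brevity.

$xss = [
    // Functions the application calls to make data safe for HTML output.
    'sanitisers' => ['htmlentities', 'htmlspecialchars'],
    // Sink name => guard that receives the sink's parameters by reference.
    'sinks' => [
        'print' => function (array &$params): void {
            foreach ($params as &$p) {
                if ($p[1]) { // still tainted when it reaches the sink
                    $p = [htmlspecialchars($p[0]), false];
                }
            }
            unset($p);
        },
    ],
];

// Calling a sanitiser that belongs to the category clears the taint flag.
function sanitise(array $category, string $fn, array $value): array
{
    if (!in_array($fn, $category['sanitisers'], true)) {
        throw new InvalidArgumentException("$fn is not a sanitiser");
    }
    return [$fn($value[0]), false];
}

// Run the guard first, then hand the plain strings to the real sink.
function guarded_print(array $category, array ...$params): string
{
    $category['sinks']['print']($params);
    $out = '';
    foreach ($params as $p) {
        $out .= $p[0];
    }
    print $out;
    return $out;
}

// The untainted markup passes through; the tainted part is escaped.
$html = guarded_print($xss, ['<b>Hi</b> ', false], ['<script>x</script>', true]);
$safe = sanitise($xss, 'htmlspecialchars', ['<i>', true]);
```

In this sketch the guard only acts on values that are still tainted, so data that the application has already sanitised is not escaped a second time.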

PHP Aspis relies on PHP arrays to store taint meta-data by enclosing the original values. For example, a tainted string "Yiannis" will be stored as array("Yiannis", 0=>true).
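The example above is best read as schematic notation: taken literally as a PHP array literal, the explicit key 0 would overwrite the string stored at index 0. A minimal sketch of one possible layout, assuming the value sits in slot 0 and a separate taint map sits in slot 1, is shown below. The helper names enclose() and is_tainted_at() and the exact layout are assumptions for illustration only.

```php
<?php
// Hypothetical helpers; names and layout are assumptions of this sketch.

// Enclose a plain value together with a taint map. The map holds
// index => bool entries; each entry applies from that character index
// up to the next entry.
function enclose($value, array $taint = [0 => false]): array
{
    return [$value, $taint];
}

// Report whether the character at $index is tainted.
function is_tainted_at(array $aspis, int $index): bool
{
    $map = $aspis[1];
    ksort($map);
    $tainted = false;
    foreach ($map as $start => $flag) {
        if ($start > $index) {
            break;
        }
        $tainted = $flag;
    }
    return $tainted;
}

$name  = enclose("Yiannis", [0 => true]);                   // fully tainted
$mixed = enclose("Hello Yiannis", [0 => false, 6 => true]); // tainted from index 6

$original = $name[0];               // the enclosed original string
$atTwo    = is_tainted_at($mixed, 2); // inside "Hello "
$atEight  = is_tainted_at($mixed, 8); // inside "Yiannis"
```

Storing only the positions where taint changes keeps the meta-data small for the common cases of fully tainted or fully untainted strings.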

2. Taint tracking code

In each transformed PHP script, PHP Aspis inserts initialisation code that scans the superglobal arrays to identify the HTTP request data, replaces all submitted values with their Aspis-enclosed counterparts and marks user-submitted values as fully tainted. As a result, all initial values are Aspis-protected in the transformed script. Then, all program statements and expressions are transformed to operate with Aspis-protected values, propagate their taint correctly and return Aspis-protected values. For example, the function AspisConcat() replaces all string concatenation operations (e.g. double quotes or the concat operator .) and returns an Aspis-protected result. Control statements are similarly transformed to access the enclosed original values directly.
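The two steps described here, wrapping request data as fully tainted and propagating taint through concatenation, can be sketched as follows. The functions taint_request() and sketch_concat() are hypothetical stand-ins written for this illustration (the latter plays the role the text gives to AspisConcat(), whose real signature is not shown here), and the [string, taint map] layout is an assumption.

```php
<?php
// Sketch only. Values are [string, taintMap]; a taintMap holds
// index => bool entries, each valid until the next entry.

// Wrap every request value as fully tainted, recursing into nested arrays.
function taint_request(array $input): array
{
    $out = [];
    foreach ($input as $key => $value) {
        $out[$key] = is_array($value)
            ? taint_request($value)
            : [(string) $value, [0 => true]];
    }
    return $out;
}

// Concatenate two enclosed strings and merge their taint maps,
// shifting the right-hand map by the length of the left-hand string.
function sketch_concat(array $left, array $right): array
{
    $offset = strlen($left[0]);
    $taint = $offset > 0 ? $left[1] : [];
    foreach ($right[1] as $index => $flag) {
        $taint[$index + $offset] = $flag;
    }
    return [$left[0] . $right[0], $taint];
}

// A literal array stands in for $_GET so the sketch runs anywhere.
$get = taint_request(['name' => 'Yiannis']);

// Corresponds to the untransformed expression: "Hello " . $_GET['name']
$greeting = sketch_concat(['Hello ', [0 => false]], $get['name']);
```

After the call, the constant prefix stays untainted while the characters that came from the request remain tainted, which is what allows a later sink guard to escape only the user-supplied part.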

Built-in PHP functions cannot operate on Aspis-protected values and do not propagate taint meta-data. PHP Aspis uses interceptor functions to intercept calls to them and attach wrappers for taint propagation. Dynamic PHP features such as variable variables and runtime code generation are also supported.
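A minimal sketch of the interceptor idea follows: unwrap the enclosed values, call the built-in function, and re-attach taint meta-data to the result. The names intercept_strtoupper() and intercept_generic() are hypothetical, and the fallback policy shown (taint the whole result if any argument carries taint) is an assumption of this sketch rather than documented PHP Aspis behaviour.

```php
<?php
// Sketch only. Values are [string, taintMap] with index => bool entries.

// Dedicated interceptor: strtoupper() maps each byte to one byte, so
// character positions (and hence the taint map) carry over unchanged.
function intercept_strtoupper(array $p): array
{
    return [strtoupper($p[0]), $p[1]];
}

// Generic fallback for built-ins without a dedicated interceptor.
// Assumed policy: the result is fully tainted if any argument has
// a tainted region, and fully untainted otherwise.
function intercept_generic(callable $builtin, array ...$args): array
{
    $plain = [];
    $anyTainted = false;
    foreach ($args as $arg) {
        $plain[] = $arg[0];
        $anyTainted = $anyTainted || in_array(true, $arg[1], true);
    }
    return [$builtin(...$plain), [0 => $anyTainted]];
}

$upper    = intercept_strtoupper(['abc', [0 => true]]);
$reversed = intercept_generic('strrev', ['abc', [0 => false, 1 => true]]);
$trimmed  = intercept_generic('trim', [' ok ', [0 => false]]);
```

A dedicated interceptor can preserve character-level precision, whereas a generic fallback has to approximate because it knows nothing about how the built-in rearranges characters.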

3. Non-tracking transformations

The taint-tracking transformations generate taint tracking code that handles Aspis-protected values. For example, a tracking function that changes the case of a string parameter $p expects to find the actual string in $p[0]. Such a function can no longer be called directly from non-tracking code with a simple string for its parameter. Instead, PHP Aspis requires additional transformations to intercept this call and automatically convert $p to an Aspis-protected value, which is marked as fully untainted.

Compatibility transformations make changes to both tracking and non-tracking code. These changes alter the data that are exchanged between a tracking context and a non-tracking context, i.e. data exchanged between functions, classes and code in the global scope. For example, PHP Aspis transforms all cross-context function calls: for a call from a tracking to a non-tracking context, taint is removed from the parameters and the return value is Aspis-protected again. Similar support is provided for global variables, accesses to the superglobal arrays and for code generated at runtime.
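The cross-context calls described in this section can be sketched in both directions. All helper names (protect(), unprotect(), call_untracked(), call_tracked()) and the two example functions are hypothetical, and unwrapping the return value on the way back into non-tracking code is an assumption of this sketch.

```php
<?php
// Sketch only. Values are [string, taintMap] with index => bool entries.

// Enclose a plain value as fully untainted.
function protect($value): array
{
    return [$value, [0 => false]];
}

// Drop the meta-data and return the enclosed original value.
function unprotect(array $aspis)
{
    return $aspis[0];
}

// A function in non-tracking code: plain strings in, plain string out.
function legacy_slug(string $title): string
{
    return strtolower(str_replace(' ', '-', $title));
}

// A function in tracking code: expects the actual string in $p[0].
function tracked_upper(array $p): array
{
    return [strtoupper($p[0]), $p[1]];
}

// Tracking -> non-tracking: strip taint from the parameters,
// then Aspis-protect the return value again (as untainted).
function call_untracked(callable $fn, array ...$args): array
{
    $plain = array_map('unprotect', $args);
    return protect($fn(...$plain));
}

// Non-tracking -> tracking: enclose plain parameters as fully untainted,
// then unwrap the result for the non-tracking caller (assumption).
function call_tracked(callable $fn, ...$args)
{
    $wrapped = array_map('protect', $args);
    return unprotect($fn(...$wrapped));
}

$slug  = call_untracked('legacy_slug', ['Hello World', [0 => true]]);
$upper = call_tracked('tracked_upper', 'hello');
```

Note that the tainted input to call_untracked() comes back untainted: taint is not followed through non-tracking code, which is the trade-off that partial taint tracking accepts.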

Publications

The USENIX WebApps paper that contains more details on the design of PHP Aspis can be found here.

Source Code

The PHP Aspis source code is available on GitHub. You can try PHP Aspis and see how it performs with your PHP applications. Detailed instructions on how to use it are included with the source. Please don't forget to share your experiences and suggestions with us.

People

Acknowledgements

This work was supported by grants EP/F042469 and EP/F044216 ("SmartFlow: Extendable Event-Based Middleware") from the UK Engineering and Physical Sciences Research Council (EPSRC).