PHP Security Triathlon: Filtering, Validation, and Escaping (Part 1)

When developing applications, we generally have a convention: Don’t trust any data from data sources that are not under our control. For example these external sources:

$_GET
$_POST
$_REQUEST
$_COOKIE
$argv
php://stdin
php://input
file_get_contents()
Remote database
Remote API
Data from the client

All of these external sources may be attack vectors and may (intentionally or unintentionally) inject malicious data into PHP scripts. It’s easy to write a PHP script that accepts user input and then renders the output, but it takes some work to implement it securely. Here I use Chen Bianjin’s three-board axe as an introduction, and I will introduce three techniques: filtering input, verifying data, and escaping output.

1. Filter input

Filtering input means escaping or removing unsafe characters. Before the data reaches the storage layer of the application, it is important to filter the input data. This is the first line of defense. If the site’s comment box accepts HTML, users can add malicious comments <script> label, as follows:

<p>
This article is useful!
</p>
<script>windows.location.href="https://morioh.com";</script>

If this comment is not filtered, the malicious code will be stored in the database and then rendered in the web page. When the user visits this page, they will be redirected to a potentially unsafe phishing website (this attack has a more professional name: XSS attack). This simple example illustrates why we want to filter input data that we do not control. Usually the input data we want to filter includes HTML, SQL queries and user information.

HTML

We can use the htmlentities function provided by PHP to filter HTML. This function will convert all HTML tag characters (&, <,>, etc.) into corresponding HTML entities for safe rendering after the application storage layer is taken out. But sometimes we allow users to enter certain HTML elements, especially when entering rich text, such as pictures and links. However, we htmlentities cannot verify the HTML and cannot detect the character set of the input string, so we cannot achieve such a function.

<?php
$input = "<p><script>alert('Morioh.com');</script></p>"；
echo htmlentities($input, ENT_QUOTES, 'UTF-8');

htmlentities The first parameter represents the HTML string to be processed, the second parameter represents the single quotes to be escaped, and the third parameter represents the character set encoding of the input string.

The htmlentities opposite is the html_entity_decode method, which converts all HTML entities into corresponding HTML tags.

In addition, PHP also provides a similar built-in function htmlspecialchars, which is also used to convert HTML tag characters to HTML entities, but only limited characters can be converted (refer to the official document: https://www.php.net/manual/en/function.htmlspecialchars.php ), if you want to convert all characters or use the htmlentities method, it is worth mentioning htmlentities that htmlspecialchars there is a method, as well as the opposite htmlspecialchars_decode.

If you want to directly remove all HTML tags from the input string, you can use the strip_tags method.

If you need more powerful filtering of HTML functions, you can use the HTML Purifier library, which is a very robust and secure PHP library that is designed to filter HTML input using specified rules. In Laravel we can use the corresponding extension package to implement the filtering function.

SQL query

Sometimes applications must construct SQL queries based on input data. These data may come from the query string of the HTTP request or from the URI fragment of the HTTP request. If you are not careful, it may be used by malicious people for SQL injection attacks (Splicing SQL statements to destroy the database or obtain sensitive information). Many junior programmers might write code like this:

$sql = sprintf(
    'UPDATE users SET password = "%s" WHERE id = %s',
    $_POST['password'],
    $_GET['id']
);

This is very risky. For example, someone sends a request to HTTP by:

POST /user?id=1 HTTP/1.1
Content-Length: 17
Content-Type: application/x-www-form-urlencoded

password=abc”;--

This HTTP request will set the password of each user to abc, because many SQL databases treat-as the beginning of a comment, subsequent text is ignored.

Unfiltered input data must not be used in SQL queries. If you want to use input data in SQL queries, you must use PDO prepared statements (PDO is a built-in database abstraction layer in PHP that provides a unified interface for different database drivers). PDO prepared statements are a function provided by PDO, which can be used to filter external data, and then embed the filtered data into SQL statements to avoid the above SQL injection problems. In addition, the prepared statements can be compiled and run multiple times to reduce the impact on the system Occupy resources for higher execution efficiency. After PDO, we will focus on the database part later.

It is worth noting that many modern PHP frameworks use the MVC architecture pattern to encapsulate database operations into the Model layer. The bottom layer of the framework has already done a good job of avoiding SQL injection, as long as we use the methods provided by the model class to perform database operations You can basically avoid the risk of SQL injection.

Let’s take Laravel as an example to see how the bottom layer avoids SQL injection. Rewrite the above updatestatement, the code will be like this:

$id = $_GET['id'];
$password = $_POST['password'];
User::find($id)->update(['password'=>bcrypt($password)]);

Because the model class calls the builder method at the bottom, the builder (Illuminate\Database\Query\Builder) updatemethod will eventually be called:

public function update(array $values)
{
    $bindings = array_values(array_merge($values, $this->getBindings()));

    $sql = $this->grammar->compileUpdate($this, $values);

    return $this->connection->update($sql, $this->cleanBindings($bindings));
}

This code passes in the parameter to be updated, and then $binding sobtains the binding relationship. Here we get an array containing the password sum updated_at(default update timestamp), and then generate the preprocessing through Illuminate\Database\Query\Grammars\Grammar the compileUpdate method of the Grammar() class. SQL statement, the corresponding SQL statement here is:

update `users` set `password` = ?, `updated_at` = ? where `id` = ?

Then finally pass the prepared SQL statement and the corresponding binding relationship to the database for execution. We will continue to discuss about SQL injection in the subsequent database section.

User profile information

If there is a user account in the application, it may be necessary to process profile information such as email addresses, phone numbers, postal codes, and so on. PHP expected this to happen, so it provided a filter_var sum filter_input function. The parameters of these two functions can use different flags to filter different types of input: email addresses, URL-encoded strings, integers, floating-point numbers, HTML characters, URLs, and specific ranges of ASCII characters.

The following example shows how to filter email addresses and remove !#$%&'*+-/=?^_{|}~@.[]all characters except letters, numbers, and:

<?php
$email = 'louieperry@morioh.com';
$emailSafe = filter_var($email, FILTER_SANITIZE_EMAIL);

For more filter_var use, please refer to the official PHP documentation: https://www.php.net/manual/en/function.filter-var.php , and the corresponding removal filter please refer to: https://www.php.net/manual/en/filter.filters.sanitize.php.

Of course, filter_var functions can also be used to filter other form submission data.

#php #laravel #php-security