A frequent problem that developers encounter when building web applications in PHP is, "How should I represent this data structure as a string?" Two common examples include:

* Caching a complex data structure (to reduce database load)
* Communicating API requests and responses between HTTP-aware applications

This seems like the sort of problem that you could expect would have pre-existing, straightforward solutions built into every major programming language that aren't accompanied by significant security risk. Sadly, this isn't the case.

## Popular Serialization Strategies (and their Respective Vulnerabilities)

Let's look at the most common use cases, and the danger involved with each:
```php
libxml_disable_entity_loader(true);
```
Even with this protection in place, [libxml has an established history of security vulnerabilities](https://www.cvedetails.com/vulnerability-list/vendor_id-1962/product_id-3311/Xmlsoft-Libxml2.html). You can't really get away from libxml2 in PHP, but even if you could, most [XML parsers have mishandled CDATA sections](https://lcamtuf.blogspot.com/2014/11/afl-fuzz-nobody-expects-cdata-sections.html) in ways that are often exploitable.
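One conservative way to apply this protection, sketched below, is to refuse any XML document that carries a DOCTYPE at all, since entity declarations live there. The helper name `parseUntrustedXml()` is hypothetical, and the sketch assumes the DOM extension is installed. The version guard is there because `libxml_disable_entity_loader()` is deprecated as of PHP 8.0.

```php
<?php
/**
 * Hypothetical helper: parse untrusted XML, rejecting any DOCTYPE.
 * Assumes ext-dom is available.
 */
function parseUntrustedXml(string $xml): DOMDocument
{
    if (\PHP_VERSION_ID < 80000) {
        // Deprecated as of PHP 8.0, so only call it on older versions.
        libxml_disable_entity_loader(true);
    }

    // Collect parser errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // LIBXML_NONET forbids network access during parsing.
    // LIBXML_NOENT (entity substitution) is deliberately not passed.
    $loaded = $dom->loadXML($xml, LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new InvalidArgumentException('Malformed XML');
    }
    foreach ($dom->childNodes as $child) {
        if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
            throw new InvalidArgumentException('DOCTYPE declarations are not accepted');
        }
    }
    return $dom;
}
```

This does nothing about memory-safety bugs inside libxml2 itself; it only narrows the set of features an attacker can reach.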
**Recommendations**:
* See below.
## Everything is Broken; How Do We Protect Ourselves?
So far, we've only looked at known vulnerabilities in each of the three deserialization strategies built into the PHP language, and the situation looks terrible. The prospect of as-yet-undiscovered vulnerabilities makes it grimmer still. However, these vulnerabilities aren't unavoidable.
### Recommendation: Only Accept GET and POST Fields
When designing your APIs, instead of accepting a JSON/XML blob (or `serialize()` output) in the request body, as many APIs do, use the tools provided by the HTTP standard: GET and POST fields. PHP already mitigates hash-table collision denial-of-service attacks against these fields through the `max_input_vars` limit, introduced in response to the hash-collision attacks disclosed in 2011.
```php
// Sending:
// Don't do this:
curl_setopt($ch, CURLOPT_POSTFIELDS, '{"data":{"does_php_do_the_right_thing_here?":"no"}}');

// Do this instead. http_build_query() is needed because cURL cannot
// encode a nested array passed directly to CURLOPT_POSTFIELDS:
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
    'data' => [
        'does_php_do_the_right_thing_here?' => 'yes, yes it does'
    ]
]));

// Receiving:
$data = $_POST['data'];
```
If you're building API requests by hand, http_build_query() will come in handy.
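As a small illustration of what `http_build_query()` produces, the sketch below encodes a nested array and decodes it again with `parse_str()`, which applies the same bracket-notation rules PHP uses for URL-encoded form fields. The payload is hypothetical.

```php
<?php
// Hypothetical payload, for illustration only.
$fields = [
    'data' => [
        'user' => 'alice',
        'tags' => ['php', 'security'],
    ],
];

// Encode the nested array as an application/x-www-form-urlencoded body.
$body = http_build_query($fields);
// data%5Buser%5D=alice&data%5Btags%5D%5B0%5D=php&data%5Btags%5D%5B1%5D=security

// Decode it again; nested keys are rebuilt from the bracket notation.
parse_str($body, $decoded);
var_dump($decoded === $fields); // bool(true)
```

Note that every value comes back as a string, so type information such as integers or booleans has to be validated and cast on the receiving side.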
### Recommendation: Authenticate the Messages you Send to Yourself
While the previous recommendation dealt with receiving arbitrary data from end users, this one is better suited to circumstances where you have two servers talking to each other (e.g. an internal API for communicating with a microservice) or where you're storing data on the client and don't want it to be tampered with (e.g. an encrypted cookie).
This is also a good idea in situations where you're, for example, storing data in a memcached cluster and want to reduce the lateral attack surface if one of the other servers gets compromised.
This recommendation takes a page from the [JSON Web Tokens](https://github.com/lcobucci/jwt) approach:
* Serialize then authenticate.
* Verify then deserialize.
In most cases a simple HMAC (with [constant-time validation](https://paragonie.com/blog/2015/11/preventing-timing-attacks-on-string-comparison-with-double-hmac-strategy)) will suffice; in others, you'll need to use digital signatures. Either way, ask a security expert to review your decision and implementation. (If you don't know any: [We consult](https://paragonie.com/service/technology-consulting).)
For example, using [Halite](https://github.com/paragonie/halite):
```php
<?php
use \ParagonIE\Halite\Symmetric\Crypto as Symmetric;
use \ParagonIE\Halite\KeyFactory;
use \ParagonIE\Halite\Util;

$authKey = KeyFactory::loadAuthenticationKey('/outside/project/path/auth.key');

// Serialization:
$serialized = json_encode($yourData);
$storeMe = Symmetric::authenticate($serialized, $authKey) . $serialized;

// Deserialization:
// The MAC is hex-encoded, so it occupies 2 * CRYPTO_AUTH_BYTES characters.
$mac = Util::safeSubstr($storeMe, 0, 2 * \Sodium\CRYPTO_AUTH_BYTES);
$message = Util::safeSubstr($storeMe, 2 * \Sodium\CRYPTO_AUTH_BYTES);
if (Symmetric::verify($message, $authKey, $mac)) {
    // Only decode once the MAC has been verified.
    $object = json_decode($message);
}
```
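If Halite isn't available, the same serialize-then-authenticate pattern can be sketched with PHP's built-in `hash_hmac()` and `hash_equals()` (the latter compares in constant time). The function names `sealMessage()` and `openMessage()` are hypothetical, and the key is generated inline only to keep the example self-contained; a real key would be loaded from outside the project path.

```php
<?php
// Assumption: a real deployment loads a persistent key from secure storage.
$key = random_bytes(32);

// Hypothetical helper: serialize, then prepend a hex-encoded HMAC-SHA256.
function sealMessage(array $data, string $key): string
{
    $serialized = json_encode($data, JSON_THROW_ON_ERROR);
    return hash_hmac('sha256', $serialized, $key) . $serialized;
}

// Hypothetical helper: verify the MAC first, and only then deserialize.
// Assumes mbstring.func_overload is off (it was removed in PHP 8).
function openMessage(string $sealed, string $key): ?array
{
    if (strlen($sealed) < 64) {
        return null; // too short to contain a 64-character hex MAC
    }
    $mac = substr($sealed, 0, 64);
    $serialized = substr($sealed, 64);
    $expected = hash_hmac('sha256', $serialized, $key);
    if (!hash_equals($expected, $mac)) {
        return null; // tampered with, or sealed under a different key
    }
    return json_decode($serialized, true);
}

$sealed = sealMessage(['user_id' => 42], $key);
$opened = openMessage($sealed, $key);
```

Returning `null` on any failure keeps callers from ever touching unauthenticated data; whether to throw an exception instead is a design choice.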
Another example, using public key cryptography (digital signatures):
```php
<?php
use \ParagonIE\Halite\Asymmetric\Crypto as Asymmetric;
use \ParagonIE\Halite\KeyFactory;
use \ParagonIE\Halite\Util;

$keyPair = KeyFactory::loadSignatureKeyPair('/outside/project/path/signing.secretkey');
$secretKey = $keyPair->getSecretKey();

// Serialization:
$serialized = serialize($yourData);
$storeMe = Asymmetric::sign($serialized, $secretKey) . $serialized;

// Deserialization:
$publicKey = KeyFactory::loadSignaturePublicKey('/outside/project/path/signing.publickey');
// Or $publicKey = $keyPair->getPublicKey();
$signature = Util::safeSubstr($storeMe, 0, 2 * \Sodium\CRYPTO_SIGN_BYTES);
$message = Util::safeSubstr($storeMe, 2 * \Sodium\CRYPTO_SIGN_BYTES);
if (Asymmetric::verify($message, $publicKey, $signature)) {
    $object = unserialize($message, ['allowed_classes' => false]);
}
```
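For comparison, here is a rough sketch of the same sign-then-verify flow using the sodium extension bundled with PHP 7.2 and later, without Halite. It assumes ext-sodium is enabled; `signMessage()` and `verifyMessage()` are hypothetical names, and the key pair is generated inline only for illustration. Unlike the Halite example, the signature here is stored as raw bytes rather than hex.

```php
<?php
// Assumption: ext-sodium is enabled; real keys would be loaded from disk.
$keyPair   = sodium_crypto_sign_keypair();
$secretKey = sodium_crypto_sign_secretkey($keyPair);
$publicKey = sodium_crypto_sign_publickey($keyPair);

// Hypothetical helper: serialize, then prepend a raw 64-byte signature.
function signMessage(array $data, string $secretKey): string
{
    $serialized = json_encode($data, JSON_THROW_ON_ERROR);
    return sodium_crypto_sign_detached($serialized, $secretKey) . $serialized;
}

// Hypothetical helper: verify the signature before deserializing.
function verifyMessage(string $signed, string $publicKey): ?array
{
    if (strlen($signed) < SODIUM_CRYPTO_SIGN_BYTES) {
        return null; // too short to contain a signature
    }
    $signature  = substr($signed, 0, SODIUM_CRYPTO_SIGN_BYTES);
    $serialized = substr($signed, SODIUM_CRYPTO_SIGN_BYTES);
    if (!sodium_crypto_sign_verify_detached($signature, $serialized, $publicKey)) {
        return null; // signature does not match this message and key
    }
    return json_decode($serialized, true);
}

$signed   = signMessage(['id' => 7], $secretKey);
$verified = verifyMessage($signed, $publicKey);
```

Only the holder of the secret key can produce valid messages, while any server holding the public key can verify them, which is the main reason to prefer signatures over an HMAC when the verifier should not be able to forge.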
-----
In the future, we hope to see PHP internally adopt a non-deterministic hash table (i.e. something similar to [SipHash](https://www.131002.net/siphash/) with a randomly generated key) to make data serialization safer. Until then, your best bet is to either avoid these features in the first place or use strong cryptography as a mitigation.