Earlier this year, we set out to write a fully constant-time implementation of all the RFC 4648 character encoding functions (and their respective decoding functions). Fortunately, a lot of the groundwork was already laid by Steve "Sc00bz" Thomas (of DecryptoCat fame).
Our implementation is available on GitHub under paragonie/constant_time_encoding under the same license as Steve's code (MIT). Feel free to use it to enhance the security of your PHP projects, especially if you need Base32 encoding (which PHP doesn't provide but, for example, Google Authenticator requires).
It's also available via Composer:
composer require paragonie/constant_time_encoding
This post discusses a situation where encoding (which, despite widespread confusion, is a distinct concept from encryption) intersects with designing safe cryptography systems.
What is RFC 4648 Encoding?
Most PHP programmers are more familiar with these functions:
- bin2hex() and hex2bin()
- base64_encode() and base64_decode()
They are but a subset of all the character encoding schemes defined in RFC 4648. The total list includes:
- Base16 (hexadecimal)
- Base32
- Base32 with an extended Hex alphabet
- Base64
- URL-safe Base64
While other numerical bases are common (Bitcoin uses Base58 with a checksum), the cool thing about the character encoding schemes defined in RFC 4648 is that their bases are all powers of 2, so every encoded character carries a whole number of bits.
- $2^{4} = 16$
- $2^{5} = 32$
- $2^{6} = 64$
The RFC 4648 encoding schemes are advantageous when working with compressed or raw binary data and converting it into an ASCII-only form for transport. Among other things, they are useful for storing encryption keys in a JSON configuration file.
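As a minimal sketch of that last use case, the snippet below round-trips a random key through a JSON document using PHP's built-in encoders. The encryption_key field name is hypothetical, and the built-in functions used here are not the constant-time versions discussed later in this post.

```php
<?php
// Generate 32 bytes of raw binary key material.
$key = random_bytes(32);

// Raw bytes are not safe to embed in a JSON string, so encode them first.
// 'encryption_key' is a hypothetical field name used only for illustration.
$json = json_encode(['encryption_key' => base64_encode($key)]);

// Later: read the configuration back and recover the raw key.
$config = json_decode($json, true);
$decoded = base64_decode($config['encryption_key'], true);

// Each hex character carries 4 bits and each Base64 character carries 6,
// so 32 bytes become 64 hex characters or 44 Base64 characters (padded).
$hexLength = strlen(bin2hex($key));
$b64Length = strlen(base64_encode($key));
```

The length arithmetic is where the power-of-2 property pays off: the encoded size depends only on the input size, never on its contents.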
What Does Constant-Time Mean?
When a function is constant-time, it means that the time it takes to perform a calculation is not dependent on the contents of its inputs; only their size.
For example, this will return FALSE as soon as $password contains a character other than a:
var_dump($password === str_repeat("a", 1024));
Conversely, this will always take the same time (assuming $password is 1024 characters long):
var_dump(hash_equals($password, str_repeat("a", 1024)));
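To illustrate the idea, here is a rough sketch of how a comparison can avoid exiting early. The constantTimeEquals() helper is hypothetical and exists only for explanation; it is not how hash_equals() is implemented internally, and real code should simply call hash_equals().

```php
<?php
/**
 * Illustrative constant-time string comparison (hypothetical helper).
 * In real code, prefer PHP's built-in hash_equals().
 */
function constantTimeEquals(string $known, string $given): bool
{
    $len = strlen($known);
    if ($len !== strlen($given)) {
        // The length is not treated as a secret in this sketch.
        return false;
    }

    // unpack('C*', ...) returns a 1-indexed array of byte values.
    $a = unpack('C*', $known);
    $b = unpack('C*', $given);

    // Accumulate every byte difference instead of returning at the first
    // mismatch, so the loop always visits all $len positions.
    $diff = 0;
    for ($i = 1; $i <= $len; ++$i) {
        $diff |= $a[$i] ^ $b[$i];
    }
    return $diff === 0;
}
```

The loop does the same amount of work whether the first byte or the last byte differs, which is the property the early-exit comparison lacks.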
This might not seem like a big deal, but consider comparing the Message Authentication Code (MAC) of an encrypted message. If comparing the MAC you calculated for the message with the MAC that was sent doesn't always take the same amount of time, an attacker can slowly deduce a valid MAC for a forged message. This is called a timing attack.
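A minimal sketch of a MAC check that avoids this leak might look like the following. The verifyMac() helper and the choice of HMAC-SHA256 are assumptions made for illustration; the point is only that the final comparison goes through hash_equals() rather than === or ==.

```php
<?php
/**
 * Verify a MAC over a ciphertext (hypothetical helper).
 * HMAC-SHA256 is an assumed choice of MAC for this sketch.
 */
function verifyMac(string $ciphertext, string $receivedMac, string $macKey): bool
{
    // Recalculate the raw (binary) MAC for the message we received.
    $expected = hash_hmac('sha256', $ciphertext, $macKey, true);

    // Compare without revealing how many leading bytes matched.
    return hash_equals($expected, $receivedMac);
}
```

If this returns false, the message should be rejected before any decryption is attempted.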
String comparisons (which outside of cryptography are considered a benign operation) aren't the only thing that can leak timing information. Micro-architecture side channels, such as cache-timing attacks, are far more pernicious. Even software implementations of AES are vulnerable to cache-timing attacks (PDF). Cryptographers have been able to deduce the AES key based on cache-timing information in 65 milliseconds.
Why the World Needs Constant-Time RFC 4648 Encoding
If you work with cryptography, you probably generate and store secret information by encoding it. If you're using the standard character encoding functions that ship with most programming languages, you might be opening the door to cache-timing attacks.
Whether this is practically exploitable is still an open research question. What we do know is that the encoding functions in most programming languages rely on table look-ups, and when they are applied to cryptographic secrets those look-ups are indexed by secret data, which is an easy way to introduce cache-timing vulnerabilities into a cryptosystem.
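To make the distinction concrete, here is a sketch that converts a single 4-bit value to its hexadecimal character in two ways. Both function names are hypothetical, and this is an illustration of the general technique rather than a copy of any particular library's code.

```php
<?php
// Table look-up: the string offset depends on the (possibly secret) nibble,
// so which memory is touched can depend on the secret.
function nibbleToHexLookup(int $nibble): string
{
    $table = '0123456789abcdef';
    return $table[$nibble & 0x0f];
}

// Arithmetic alternative: no secret-dependent index and no branch.
function nibbleToHexArithmetic(int $nibble): string
{
    $n = $nibble & 0x0f;
    // ($n - 10) >> 8 is -1 (all bits set) when $n < 10, and 0 otherwise.
    // ~38 equals -39, so digits become 87 + $n - 39 = 48 + $n ('0'..'9'),
    // while letters become 87 + $n ('a'..'f').
    return pack('C', 87 + $n + ((($n - 10) >> 8) & ~38));
}
```

Both functions return the same character for every input from 0 to 15; only the second avoids using the input as a memory offset.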
Rather than wait for a practical exploit to be developed, you can use a solution that is available today (for PHP, anyway). If you're writing software in another language, libsodium ships with hexadecimal encoding/decoding functions that are cache-timing-safe.
How We Designed our Constant-time Encoding Library
We started with a simple fork of Steve Thomas's ConstTimeEncoding to modernize the code (PSR-4, integration with Composer, etc.). We ended up going a lot further:
- Instead of ord() and chr(), which amplify cache-timing leaks due to PHP's optimizations, we opted to use pack() and unpack().
- To better handle function overloading, we wrote a Binary class that reliably delivers the expected results of strlen() and substr() when working with raw binary data.
- We implemented Hex, Base32, Base32Hex, and Base64UrlSafe for complete RFC 4648 coverage.
- We built a unit test suite (via PHPUnit).
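The first two points can be sketched roughly as follows. The function names below are hypothetical stand-ins and are not the library's actual API; the sketch assumes the overloading in question is the mbstring function-overloading feature found in older PHP versions, under which strlen() and substr() could count characters instead of bytes.

```php
<?php
// Byte length of a raw binary string, regardless of mbstring overloading.
function binaryStrlen(string $str): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, '8bit'); // '8bit' forces byte semantics
    }
    return strlen($str);
}

// Byte-oriented substring of a raw binary string.
function binarySubstr(string $str, int $start, ?int $length = null): string
{
    if (function_exists('mb_substr')) {
        return mb_substr($str, $start, $length, '8bit');
    }
    return $length === null
        ? substr($str, $start)
        : substr($str, $start, $length);
}

// Read one byte as an integer with unpack() instead of ord().
function byteAt(string $str, int $offset): int
{
    return unpack('C', $str[$offset])[1];
}

// Turn an integer back into a byte with pack() instead of chr().
function byteToChar(int $byte): string
{
    return pack('C', $byte);
}
```

Routing every length, slice, and byte conversion through helpers like these keeps the encoders themselves free of direct calls to the functions the list above warns about.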
How to Use the Library
We kept the code delta between PHP's built-in functions and our library reasonably small.
Old:
<?php
var_dump(strtr(base64_encode(random_bytes(32)), '+/', '-_'));
New:
<?php
use \ParagonIE\ConstantTime\Base64UrlSafe;
var_dump(Base64UrlSafe::encode(random_bytes(32)));
For Hex and Base32 encoding, there is a separate encodeUpper() method that returns capital letters instead of lowercase.
What Projects Use this Library?
- CMS Airship and any other Paragon Initiative Enterprises projects, when appropriate
- phpseclib
- Niklas Keller's Two-Factor Authentication Library
On Boring Cryptography
"Boring cryptography" refers to cryptography designs and implementations that are obviously secure. This means having at least a 128-bit security level (Ed25519, where an attack costs roughly $2^{128}$ operations) instead of 1024-bit RSA (where an attack is estimated to cost approximately $2^{80}$). Boring cryptography means being obviously constant-time. When cryptography is boring, there's far less room for implementers to make cataclysmic mistakes (such as repeating an ECDSA nonce).
Cryptographers are working hard to bring boring cryptography to the masses. Paragon Initiative Enterprises is similarly working hard to bring boring levels of security to PHP. That is why we're building Airship: The PHP community deserves a CMS/blogging platform that is obviously secure, written from an understanding of how PHP applications are attacked in the real world.
Remember, "Attacks only get better; they never get worse."