Earlier this year, we set out to write a fully constant-time implementation of all the RFC 4648 character encoding functions (and their respective decoding functions). Fortunately, a lot of the groundwork was already laid by [Steve "Sc00bz" Thomas](https://www.tobtu.com) (of the DecryptoCat fame).

Our implementation is available on GitHub under [paragonie/constant_time_encoding](https://github.com/paragonie/constant_time_encoding) under the same license as Steve's code (MIT). Feel free to use it to enhance the security of your PHP projects, especially if you need Base32 encoding (which PHP doesn't provide but, for example, Google Authenticator requires).

It's also available via Composer:

> `composer require paragonie/constant_time_encoding`

This post discusses a situation where encoding (which, despite widespread confusion, is a [distinct concept from encryption](https://paragonie.com/blog/2015/08/you-wouldnt-base64-a-password-cryptography-decoded)) intersects with designing safe cryptography systems.

### What is RFC 4648 Encoding?

Most PHP programmers are familiar with these functions:

* `bin2hex()` and `hex2bin()`
* `base64_encode()` and `base64_decode()`

They are but a subset of the character encoding schemes defined in [RFC 4648](https://tools.ietf.org/html/rfc4648#page-2). The full list includes:

* Base16 (hexadecimal)
* Base32
* Base32 with an extended Hex alphabet
* Base64
* URL-safe Base64

While other numerical bases are common (Bitcoin uses [Base-58 with a checksum](https://github.com/adamcaudill/Base58Check)), the cool thing about the character encoding schemes defined in RFC 4648 is that their bases are all powers of 2.

* $2^{4} = 16$
* $2^{5} = 32$
* $2^{6} = 64$

The RFC 4648 encoding schemes are advantageous when working with compressed or raw binary data and converting it into an ASCII-only form for transport. Among other things, they are useful for storing encryption keys in a JSON configuration file.

### What Does Constant-Time Mean?

When a function is constant-time, the time it takes to perform a calculation does not depend on the contents of its inputs, only on their size. For example, this will return `FALSE` as soon as `$password` contains a character other than `a`:

    var_dump($password === str_repeat("a", 1024));

Conversely, this will always take the same time (assuming `$password` is 1024 characters long). Note that `hash_equals()` expects the known string first and the user-supplied string second:

    var_dump(hash_equals(str_repeat("a", 1024), $password));

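To make the idea concrete, here is a minimal sketch of how a constant-time comparison can be written by hand. The function name `ct_equals()` is hypothetical and exists only for illustration; in real code, prefer the built-in `hash_equals()`. The sketch assumes the lengths of the strings are public and only their contents are secret.

```php
<?php
// Illustrative sketch only; ct_equals() is a hypothetical name.
// In production code, use PHP's built-in hash_equals() instead.
function ct_equals(string $known, string $user): bool
{
    $len = strlen($known);
    // Lengths are treated as public information.
    if ($len !== strlen($user)) {
        return false;
    }
    // unpack('C*', ...) returns a 1-indexed array of byte values.
    $k = unpack('C*', $known);
    $u = unpack('C*', $user);
    $diff = 0;
    for ($i = 1; $i <= $len; ++$i) {
        // XOR is zero only when the bytes match; OR accumulates any
        // difference without exiting the loop early.
        $diff |= $k[$i] ^ $u[$i];
    }
    return $diff === 0;
}

var_dump(ct_equals(str_repeat("a", 1024), str_repeat("a", 1024)));       // bool(true)
var_dump(ct_equals(str_repeat("a", 1024), "b" . str_repeat("a", 1023))); // bool(false)
```

Every byte is visited regardless of where the first mismatch occurs, so the running time depends only on the length of the inputs.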
This might not seem like a big deal, but consider comparing a Message Authentication Code (MAC) for an encrypted message. If comparing the MAC you calculated for the message with the MAC that was sent doesn't always take the same amount of time, an attacker can slowly deduce a valid MAC for a forged message. This is called a [timing attack](https://en.wikipedia.org/wiki/Timing_attack).

String comparisons (which outside of cryptography are considered a benign operation) aren't the only thing that can leak timing information. Micro-architecture side channels, such as cache-timing attacks, are far more pernicious.

Even [software implementations of AES are vulnerable to cache-timing attacks](https://cr.yp.to/antiforgery/cachetiming-20050414.pdf) (PDF). Cryptographers have been able to deduce the AES key based on cache-timing information in 65 milliseconds.

### Why the World Needs Constant-Time RFC 4648 Encoding

If you work with cryptography, you probably generate and store secret information by encoding it. If you're using the standard character encoding functions that ship with most programming languages, you *might* be opening the door to cache-timing attacks.

This is still an [open research question](https://defuse.ca/side-channels-in-encoding-functions.htm), but we do know that the encoding functions in most programming languages [use table look-ups indexed by secret data](https://cryptocoding.net/index.php/Coding_rules#Avoid_table_look-ups_indexed_by_secret_data) when they are applied to cryptographic secrets, which is an easy way to introduce cache-timing vulnerabilities into a cryptosystem.

Rather than wait for a practical exploit to be developed, the solution is available today (for PHP, anyway). If you're writing software in another language, [libsodium](https://download.libsodium.org/doc/) ships with hexadecimal encoding/decoding functions that are cache-timing-safe.

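To illustrate the difference between a table look-up indexed by secret data and a branch-free alternative, here is a rough sketch of hexadecimal encoding written both ways. The function names are hypothetical, and the arithmetic version is only an approximation of the general approach in the style of Steve Thomas's code, not necessarily the exact implementation that ships in any library.

```php
<?php
// Hypothetical illustration; these function names are not a library API.

// Table look-up: the index into $table depends on the secret byte,
// so which memory gets touched depends on the secret.
function hex_encode_lookup(string $bin): string
{
    $table = '0123456789abcdef';
    $hex = '';
    foreach (unpack('C*', $bin) as $byte) {
        $hex .= $table[$byte >> 4] . $table[$byte & 0x0f];
    }
    return $hex;
}

// Arithmetic: no secret-dependent index or branch.
// For a nibble $n in 0..9, ($n - 10) >> 8 is -1, so the mask selects ~38 (-39)
// and the result is 48 + $n ('0'..'9'). For 10..15 the shift yields 0 and the
// result is 87 + $n ('a'..'f').
function hex_encode_arithmetic(string $bin): string
{
    $hex = '';
    foreach (unpack('C*', $bin) as $byte) {
        $hi = $byte >> 4;
        $lo = $byte & 0x0f;
        $hex .= pack(
            'CC',
            87 + $hi + ((($hi - 10) >> 8) & ~38),
            87 + $lo + ((($lo - 10) >> 8) & ~38)
        );
    }
    return $hex;
}

var_dump(hex_encode_arithmetic("\x00\xff") === bin2hex("\x00\xff")); // bool(true)
```

Both functions produce the same output; the difference is only in whether the memory accessed depends on the secret being encoded.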
## How We Designed our Constant-time Encoding Library

We started with a simple fork of [Steve Thomas's ConstTimeEncoding](https://github.com/Sc00bz/ConstTimeEncoding) to modernize the code (PSR-4, integration with Composer, etc.). We ended up going a lot further:

* Instead of `ord()` and `chr()`, which amplify cache-timing leaks due to PHP's optimizations, we opted to use `pack()` and `unpack()`.
* To better handle [function overloading](https://secure.php.net/manual/en/mbstring.overload.php), we wrote a `Binary` class that reliably delivers the expected results of `strlen()` and `substr()` when working with raw binary data.
* We implemented `Hex`, `Base32`, `Base32Hex`, and `Base64UrlSafe` for complete RFC 4648 coverage.
* We built a unit test suite (via PHPUnit).

### How to Use the Library

We kept the code delta between using our library and PHP's built-in functions reasonably small.

Old:
New:

    use ParagonIE\ConstantTime\Base64UrlSafe;

    var_dump(Base64UrlSafe::encode(random_bytes(32)));

For Hex and Base32 encoding, there is a separate `encodeUpper()` method that returns capital letters instead of lowercase.

## What Projects Use this Library?

* [CMS Airship](https://github.com/paragonie/airship) and any other Paragon Initiative Enterprises projects, when appropriate
* [phpseclib](https://github.com/phpseclib/phpseclib)
* [Niklas Keller's Two-Factor Authentication Library](https://github.com/kelunik/two-factor)

## On Boring Cryptography

"Boring cryptography" refers to cryptography designs and implementations that are *obviously secure*. This means having at least a 128-bit security level (Ed25519, which takes roughly $2^{128}$ operations to break) instead of 1024-bit RSA (which is estimated to offer approximately 80 bits, or about $2^{80}$ operations). Boring cryptography means being *obviously constant-time*.

When cryptography is boring, there's far less room for implementers to make cataclysmic mistakes (such as [repeating an ECDSA nonce](https://www.schneier.com/blog/archives/2011/01/sony_ps3_securi.html)).

Cryptographers are working hard to bring boring cryptography to the masses. Paragon Initiative Enterprises is similarly working hard to bring boring levels of security to PHP. That is why we're building [Airship](https://github.com/paragonie/airship): the PHP community deserves a CMS/blogging platform that is *obviously secure*, written from an understanding of how PHP applications are attacked in the real world.

Remember, "Attacks only get better; they never get worse."