Back in 2017, we outlined the fundamentals of searchable encryption with PHP and SQL. Shortly after, we implemented this design in a library we call CipherSweet.
Our initial design constraints were as follows:
- Only use the cryptography tools that are already widely available to developers.
- Only use encryption modes that are secure against chosen-ciphertext attacks.
- Treat usability as a security property.
- Remain as loosely schema-agnostic as possible, so that it's possible to use our design in NoSQL contexts or wildly different SQL database layouts.
- Be extensible, so that it may be integrated with many other products and services.
Today, we'd like to talk about some of the challenges we've encountered, as well as some of the features that have landed in CipherSweet since its inception, and how we believe they are beneficial for the adoption of usable cryptography at scale.
If you're not familiar with cryptography terms, you may find this page useful.
Challenges in Searchable Encryption
As of the time of this writing, it's difficult to declare a "state of the art" design for searchable encryption, for two reasons:
- Different threat models and operational requirements.
- Ongoing academic research into different designs and attacks.
Cryptographers interested in encrypted search engines are likely invested in the ongoing research into fully homomorphic encryption (FHE), which allows the database server to perform calculations on the ciphertext and return an encrypted result to the application to decrypt.
Some projects (e.g. the encrypted camera app Pixek and much of the other work of Seny Kamara, et al.) use a technique called structured encryption to accomplish encrypted search with a different threat model and set of operational requirements. Namely, the queries and tags are encrypted client-side and the server just acts as a data mule with no additional power to perform computations.
In either case, there are a few challenges that any proposed design must help its users overcome if it is to be used in the real world.
Active Cryptanalytic Attacks
The most significant real-world deterrents to adopting fully homomorphic encryption today are:
- Performance.
- Cryptography implementation availability.
However, savvy companies will also list a third deterrent: adaptive chosen-ciphertext attacks.
This can be a controversial point to raise, because its significance depends on your application's threat model. Some application developers really trust their database server to not lie to the application.
More generally, all forms of active attacks from a privileged but not omnipotent user (e.g. root access to the database server, but not root access on the client application software) should be considered when designing any kind of encrypted search feature.
Small Input Domains
Let's say you're designing software for a hospital computer network and need to store protected health information with very few possible inputs (e.g. HIV status).
Even if you can encrypt this data securely (i.e. using AEAD and without message length oracles), any system that allows you to quickly search the database for a specific value (e.g. HIV Positive) introduces the risk of leaking information through side-channels.
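To make that risk concrete, here is a minimal sketch, assuming a deliberately naive search index built from HMAC-SHA256 over a two-valued field. The function name naiveIndex and the sample data are hypothetical and exist only for illustration; this is not CipherSweet code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, deliberately naive search index: a keyed hash of the value.
 * The key hides WHICH value a tag stands for, but not how rows are grouped.
 */
function naiveIndex(string $value, string $key): string
{
    return substr(hash_hmac('sha256', $value, $key), 0, 8);
}

$key = random_bytes(32);

// Made-up column contents; assume each value is also encrypted separately.
$statuses = ['negative', 'negative', 'positive', 'negative', 'negative', 'negative'];

// What someone with read access to the database sees in the index column.
$tags = array_map(
    function (string $status) use ($key): string {
        return naiveIndex($status, $key);
    },
    $statuses
);

// Only two distinct tags can exist, so the rows split into two groups.
// Knowing one patient's status (or just the base rate) labels every row.
$groups = array_count_values($tags);
arsort($groups);
print_r($groups);
```

The encryption itself is never attacked here. The grouping alone is the side-channel, which is why small input domains need special care.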
Information Leakage
Search operations are ripe for oracles.
In particular: Order-revealing encryption techniques leak your plaintext, similar to block ciphers in ECB mode.
Any proposal for searchable encryption must be able to account for its information leakage and provide users a simple way of understanding and managing that risk.
CipherSweet: A High-Level Overview
This is a brief introduction to CipherSweet and a high-level overview. For more depth, please refer to the official documentation on GitHub.
Where to Get CipherSweet
CipherSweet is available on GitHub, and can be installed via Composer with the following command:
composer require paragonie/ciphersweet
Using CipherSweet
First, you need a backend, which handles all of the cryptographic heavy lifting.
We give you two to choose from, but there's also a BackendInterface if anyone ever needs to define their own:
- FIPSCrypto only uses the algorithms approved for use by FIPS 140-2. Note that using this backend doesn't automatically make your application FIPS 140-2 certified.
- ModernCrypto uses libsodium, and is generally recommended in most situations.
Once you've chosen a backend, you're done thinking about cryptography algorithms. You don't need to specify a cipher mode, or a hash function, or anything else. Instead, the next step is to decide how you want to manage your keys.
In addition to a few generic options, CipherSweet provides a KeyProviderInterface to allow developers to integrate with their own custom key management solutions.
Finally, you just need to pass the backend and key provider to the engine. From this point on, the engine is the only object you need to work with directly.
All together, it looks like this:
<?php
use ParagonIE\CipherSweet\Backend\ModernCrypto;
use ParagonIE\CipherSweet\KeyProvider\StringProvider;
use ParagonIE\CipherSweet\CipherSweet;
// First, choose your backend:
$backend = new ModernCrypto();
// Next, your key provider:
$provider = new StringProvider(
    // The key provider stores the BackendInterface for internal use:
    $backend,
    // Example key, chosen randomly, hex-encoded:
    '4e1c44f87b4cdf21808762970b356891db180a9dd9850e7baf2a79ff3ab8a2fc'
);
// From this point forward, you only need your Engine:
$engine = new CipherSweet($provider);
Once you have a working CipherSweet engine, you have a lot of flexibility in how you use it. In each of the following classes, you'll mostly use the following methods:
- prepareForStorage() on INSERT and UPDATE queries.
- getAllBlindIndexes() / getBlindIndex() for SELECT queries.
- decrypt() / decryptRow() / decryptManyRows() for decrypting after the SELECT query.
The encrypt/decrypt APIs were named more verbosely than simply encrypt() / decrypt() to ensure that the intent is communicated whenever a developer works with them.
EncryptedField: Searchable Encryption for a Single Column
EncryptedField is a minimalistic interface for encrypting a single column of a database table.
EncryptedField is designed for projects that only ever need to encrypt a single field, but still want to be able to search on the values of this field.
<?php
use ParagonIE\CipherSweet\BlindIndex;
use ParagonIE\CipherSweet\CipherSweet;
use ParagonIE\CipherSweet\EncryptedField;
use ParagonIE\CipherSweet\Transformation\LastFourDigits;
/** @var CipherSweet $engine */
$ssn = (new EncryptedField($engine, 'contacts', 'ssn'))
    ->addBlindIndex(
        new BlindIndex('contact_ssn_full', [], 8)
    )
    ->addBlindIndex(
        new BlindIndex('contact_ssn_last_four', [new LastFourDigits()], 4)
    );
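Conceptually, each blind index in the snippet above is a keyed hash of the (optionally transformed) plaintext, truncated to the requested number of bits. The following is a minimal sketch of that idea, assuming HMAC-SHA384 as the keyed hash. The names lastFourDigits and sketchBlindIndex are hypothetical stand-ins, and the real library derives a separate key per index and differs in its details.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for a "last four digits" transformation. */
function lastFourDigits(string $input): string
{
    $digits = (string) preg_replace('/[^0-9]/', '', $input);
    return (string) substr($digits, -4);
}

/**
 * Hypothetical blind index: HMAC the plaintext, then keep only the
 * first $bits bits of the output. Returns a hex string.
 */
function sketchBlindIndex(string $plaintext, string $indexKey, int $bits): string
{
    if ($bits < 1 || $bits > 384) {
        throw new InvalidArgumentException('bits must be between 1 and 384');
    }
    $raw = hash_hmac('sha384', $plaintext, $indexKey, true);
    $bytes = intdiv($bits + 7, 8);
    $out = substr($raw, 0, $bytes);
    // Zero the unused low bits of the final byte when $bits is not a
    // multiple of 8, so exactly $bits bits of the hash survive.
    $spare = ($bytes * 8) - $bits;
    if ($spare > 0) {
        $mask = (0xFF << $spare) & 0xFF;
        $out[$bytes - 1] = chr(ord($out[$bytes - 1]) & $mask);
    }
    return bin2hex($out);
}

// Assumed usage: the value stored at INSERT time and the value computed
// at SELECT time both reduce to the digits "6789", so they match.
$key = random_bytes(32);
$stored = sketchBlindIndex(lastFourDigits('123-45-6789'), $key, 4);
$query = sketchBlindIndex(lastFourDigits('6789'), $key, 4);
var_dump(hash_equals($stored, $query));
```

Because only 4 bits survive truncation, many unrelated SSNs share each index value. The index narrows the candidate rows, and the application decrypts them to discard false positives.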
EncryptedRow: Searchable Encryption for Many Columns in One Table
EncryptedRow is a more powerful API that operates on rows of data at a time.
EncryptedRow is designed for projects that encrypt multiple fields and/or wish to create compound blind indexes.
It also has built-in handling for integers, floating point numbers, and (nullable) boolean values, which furthermore doesn't leak the size of the stored values in the ciphertext length:
<?php
use ParagonIE\CipherSweet\CipherSweet;
use ParagonIE\CipherSweet\EncryptedRow;
/** @var CipherSweet $engine */
$row = (new EncryptedRow($engine, 'contacts'))
    ->addTextField('first_name')
    ->addTextField('last_name')
    ->addTextField('ssn')
    ->addBooleanField('hivstatus')
    ->addFloatField('latitude')
    ->addFloatField('longitude')
    ->addIntegerField('birth_year');
EncryptedRow expects an array that maps column names to values, like so:
<?php
$input = [
    'contactid' => 12345,
    'first_name' => 'Jane',
    'last_name' => 'Doe',
    'ssn' => '123-45-6789',
    'hivstatus' => false,
    'latitude' => 52.52,
    'longitude' => -33.106,
    'birth_year' => 1988,
    'extraneous' => true
];
EncryptedMultiRows: Searchable Encryption for Many Tables
EncryptedMultiRows is a multi-row abstraction designed to make it easier to work on heavily-normalized databases and integrate CipherSweet with ORMs (e.g. Eloquent).
Under the hood, it maintains an internal array of EncryptedRow objects (one for each table), so the features that EncryptedRow provides are also usable from EncryptedMultiRows.
Anyone familiar with EncryptedRow should find the API for EncryptedMultiRows to be familiar.
<?php
use ParagonIE\CipherSweet\CipherSweet;
use ParagonIE\CipherSweet\EncryptedMultiRows;
/** @var CipherSweet $engine */
$rowSet = (new EncryptedMultiRows($engine))
    ->addTextField('contacts', 'first_name')
    ->addTextField('contacts', 'last_name')
    ->addTextField('contacts', 'ssn')
    ->addBooleanField('contacts', 'hivstatus')
    ->addFloatField('contacts', 'latitude')
    ->addFloatField('contacts', 'longitude')
    ->addIntegerField('contacts', 'birth_year')
    ->addTextField('foobar', 'test');
EncryptedMultiRows expects an array of table names mapped to an array that in turn maps columns to values, like so:
<?php
$input = [
    'contacts' => [
        'contactid' => 12345,
        'first_name' => 'Jane',
        'last_name' => 'Doe',
        'ssn' => '123-45-6789',
        'hivstatus' => null, // unknown
        'latitude' => 52.52,
        'longitude' => -33.106,
        'birth_year' => 1988,
        'extraneous' => true
    ],
    'foobar' => [
        'foobarid' => 23,
        'contactid' => 12345,
        'test' => 'paragonie'
    ]
];
CipherSweet's Usable Cryptography Wins
In addition to being designed in accordance with cryptographically secure PHP best practices, CipherSweet was also carefully constructed to be a user-friendly cryptographic API.
Here are some of the design decisions and features that help it meet its usable security goals.
Blind Index Planning
If you're not familiar with blind indexes, please read the blog post detailing the fundamentals of our design.
Our blind indexing technique has a relatively straightforward information leakage profile, since the building block we use is a keyed hash function (e.g. HMAC-SHA384 or BLAKE2b) or key derivation function (e.g. PBKDF2-SHA384 or Argon2id), which is then truncated and used as a Bloom filter.
- If you make your index outputs too small, you'll incur a performance penalty from false positives that makes the blind index almost pointless.
- If you make your index outputs too large, you introduce the risk of creating unique fingerprints of the plaintext. The existence of reliable fingerprints introduces the risk of known- and chosen-plaintext attacks.
However, calculating a safe output size for each blind index involves a bit of math:
Generally, for a given population P, you want there to be between 2 and sqrt(P) hash prefix collisions (which we call "coincidences") in the blind index output.
To save developers time doing pencil and paper math, we created Planner classes, which let you figure out how many bits you can safely use for your blind index outputs. No pencil and paper needed.
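To show the shape of that math, here is a minimal sketch, assuming a single blind index on the field and uniformly distributed hash outputs, so that an index of L bits over a population P produces roughly P / 2^L coincidences. The function sketchRecommendBits is hypothetical and is not the Planner API.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not the Planner API.
 * Finds the index sizes (in bits) for which the estimated number of
 * coincidences, $population / 2^$bits, lies between 2 and sqrt($population).
 *
 * @return array{min: int, max: int}|null null when no size qualifies
 */
function sketchRecommendBits(int $population): ?array
{
    if ($population < 1) {
        throw new InvalidArgumentException('population must be positive');
    }
    $upper = sqrt($population);
    $safe = [];
    for ($bits = 1; $bits <= 62; ++$bits) {
        $coincidences = $population / (1 << $bits);
        if ($coincidences >= 2 && $coincidences <= $upper) {
            $safe[] = $bits;
        }
    }
    return $safe === [] ? null : ['min' => min($safe), 'max' => max($safe)];
}

print_r(sketchRecommendBits(16777216));
```

Under these assumptions, a population of 16,777,216 rows (2^24) allows between 12 bits (4,096 coincidences, which equals the square root of the population) and 23 bits (2 coincidences). Several indexes on the same field, or a small input domain, change the estimate, so treat these numbers as illustrative only.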
Compound Blind Indexes
A compound blind index is simply a blind index that was created from multiple fields at once. This is extremely useful if you want to filter your encrypted search results based on a boolean field without leaking the boolean value directly in the index value.
More broadly, compound blind indexes give you a flexible way to index common search criteria to make lookups fast.
For example, using EncryptedRow:
<?php
use ParagonIE\CipherSweet\CipherSweet;
use ParagonIE\CipherSweet\Transformation\AlphaCharactersOnly;
use ParagonIE\CipherSweet\Transformation\FirstCharacter;
use ParagonIE\CipherSweet\Transformation\Lowercase;
use ParagonIE\CipherSweet\Transformation\LastFourDigits;
use ParagonIE\CipherSweet\EncryptedRow;
/** @var EncryptedRow $row */
$row->addCompoundIndex(
    $row->createCompoundIndex(
        'contact_first_init_last_name',
        ['first_name', 'last_name'],
        64, // 64 bits = 8 bytes
        true
    )
        ->addTransform('first_name', new AlphaCharactersOnly())
        ->addTransform('first_name', new Lowercase())
        ->addTransform('first_name', new FirstCharacter())
        ->addTransform('last_name', new AlphaCharactersOnly())
        ->addTransform('last_name', new Lowercase())
);
This gives you a case-insensitive index of first initial + last name.
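The idea behind such an index can be sketched without the library: transform each field, pack the results so that field boundaries stay unambiguous, then hash and truncate. This is a minimal sketch under those assumptions. The names alphaOnly, firstChar, and sketchCompoundIndex are hypothetical, and the packing format is our own choice for illustration rather than CipherSweet's.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-ins for the transformations named above. */
function alphaOnly(string $s): string
{
    return (string) preg_replace('/[^A-Za-z]/', '', $s);
}

function firstChar(string $s): string
{
    return (string) substr($s, 0, 1);
}

/**
 * Hypothetical compound index over first initial + last name:
 * transform each field, length-prefix it, HMAC the packed string,
 * and keep 64 bits (8 bytes) of the result as hex.
 */
function sketchCompoundIndex(string $firstName, string $lastName, string $indexKey): string
{
    $parts = [
        firstChar(strtolower(alphaOnly($firstName))),
        strtolower(alphaOnly($lastName)),
    ];
    $packed = '';
    foreach ($parts as $part) {
        // The 8-byte length prefix keeps ("ab", "c") distinct from ("a", "bc").
        $packed .= pack('J', strlen($part)) . $part;
    }
    return bin2hex(substr(hash_hmac('sha384', $packed, $indexKey, true), 0, 8));
}

$key = random_bytes(32);
$a = sketchCompoundIndex('Jane', 'Doe', $key);
$b = sketchCompoundIndex('JOHN', 'doe', $key);
var_dump(hash_equals($a, $b));
```

Both calls reduce to the pair ("j", "doe"), so they produce the same index value. A search for "J. Doe" therefore matches either row, regardless of capitalization or punctuation in the stored names.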
Built-In Key Separation
Information leakage is especially harmful if you're using the same key everywhere.
To mitigate this, CipherSweet automatically derives distinct subkeys for each table and column, and then for each blind index, using a process called the key hierarchy.
The short of it is: Your KeyProvider defines a master key, from which the actual key used for encrypting each field is derived. We use HKDF and carefully-chosen domain separation constants to ensure cross-protocol attacks are not possible.
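As a rough illustration of this kind of derivation, here is a minimal sketch using PHP's built-in hash_hkdf(). The constants, the function names, and the choice of which value goes into the HKDF salt versus the info parameter are all assumptions made for this example, not a specification of CipherSweet's key hierarchy.

```php
<?php
declare(strict_types=1);

// Illustrative domain separation constants (assumed values).
const SKETCH_DS_FIELD_ENCRYPTION = "\x01";
const SKETCH_DS_BLIND_INDEX = "\x02";

/** Hypothetical: derive the encryption key for one table/column pair. */
function sketchFieldKey(string $masterKey, string $table, string $column): string
{
    return hash_hkdf(
        'sha384',
        $masterKey,
        32,
        SKETCH_DS_FIELD_ENCRYPTION . $column, // info
        $table                                // salt
    );
}

/** Hypothetical: derive the key for one named blind index on a column. */
function sketchIndexKey(
    string $masterKey,
    string $table,
    string $column,
    string $indexName
): string {
    // A different constant ensures the index root key can never equal
    // the field encryption key for the same table and column.
    $root = hash_hkdf(
        'sha384',
        $masterKey,
        32,
        SKETCH_DS_BLIND_INDEX . $column,
        $table
    );
    return hash_hkdf('sha384', $root, 32, $indexName);
}

$master = random_bytes(32);
$ssnKey = sketchFieldKey($master, 'contacts', 'ssn');
$nameKey = sketchFieldKey($master, 'contacts', 'last_name');
$ssnIndexKey = sketchIndexKey($master, 'contacts', 'ssn', 'contact_ssn_last_four');
var_dump(bin2hex($ssnKey) !== bin2hex($nameKey));
```

With this structure, learning something about one index's key or outputs gives no direct handle on another column's encryption key, because every derived key depends on its own table, column, purpose, and index name.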
Key Rotation
If you ever need to switch CipherSweet backends or rotate your keys, we created a special-purpose suite of PHP classes to facilitate less-painful data migrations and reduce the amount of boilerplate code needed.
<?php
use ParagonIE\CipherSweet\CipherSweet;
use ParagonIE\CipherSweet\KeyRotation\FieldRotator;
use ParagonIE\CipherSweet\EncryptedField;
// 1. Set up
/**
 * @var string $ciphertext
 * @var CipherSweet $old
 * @var CipherSweet $new
 */
$oldField = new EncryptedField($old, 'contacts', 'ssn');
$newField = new EncryptedField($new, 'contacts', 'ssn');
$rotator = new FieldRotator($oldField, $newField);
// 2. Using the rotator
if ($rotator->needsReEncrypt($ciphertext)) {
    list($ciphertext, $indices) = $rotator->prepareForUpdate($ciphertext);
    // Then update this row in the database.
}
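One way to picture the "check, then re-encrypt" pattern is the following minimal sketch, assuming libsodium's secretbox and a simple trial decryption under the new key. The functions sketchEncrypt, sketchDecrypt, and sketchRotate are hypothetical, and this is not a description of how FieldRotator works internally.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: encrypt with a random nonce, prepended to the output. */
function sketchEncrypt(string $plaintext, string $key): string
{
    $nonce = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    return $nonce . sodium_crypto_secretbox($plaintext, $nonce, $key);
}

/** Hypothetical helper: returns null if the key or ciphertext is wrong. */
function sketchDecrypt(string $ciphertext, string $key): ?string
{
    $minLength = SODIUM_CRYPTO_SECRETBOX_NONCEBYTES + SODIUM_CRYPTO_SECRETBOX_MACBYTES;
    if (strlen($ciphertext) < $minLength) {
        return null;
    }
    $nonce = substr($ciphertext, 0, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    $body = substr($ciphertext, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    $plaintext = sodium_crypto_secretbox_open($body, $nonce, $key);
    return $plaintext === false ? null : $plaintext;
}

/** Hypothetical rotation step: re-encrypt only when the new key fails. */
function sketchRotate(string $ciphertext, string $oldKey, string $newKey): string
{
    if (sketchDecrypt($ciphertext, $newKey) !== null) {
        return $ciphertext; // Already encrypted under the new key.
    }
    $plaintext = sketchDecrypt($ciphertext, $oldKey);
    if ($plaintext === null) {
        throw new RuntimeException('Ciphertext matches neither key');
    }
    return sketchEncrypt($plaintext, $newKey);
}

$oldKey = random_bytes(SODIUM_CRYPTO_SECRETBOX_KEYBYTES);
$newKey = random_bytes(SODIUM_CRYPTO_SECRETBOX_KEYBYTES);
$legacy = sketchEncrypt('123-45-6789', $oldKey);
$rotated = sketchRotate($legacy, $oldKey, $newKey);
```

Running sketchRotate() a second time on $rotated returns it unchanged, which makes a migration job safe to resume after an interruption. In a real migration the blind indexes must be recalculated as well, which is why prepareForUpdate() above returns both a ciphertext and the indices.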
You can learn more about the various migration features here.
Upcoming Developments in CipherSweet
One of the items on our roadmap for PHP security in 2019 is to bring CipherSweet to your favorite framework, with as little friction as possible. To this end, we will be releasing ORM integrations throughout Q1 2019, starting with Eloquent and Doctrine.
Additionally, we plan on shipping KeyProvider
implementations to integrate with cloud KMS solutions and common HSM solutions (e.g. YubiHSM). These will be standalone packages that extend the core functionality of CipherSweet to allow businesses and government offices to meet their stringent security compliance requirements without polluting the main library with code to tolerate oddly-specific requirements.
When both of these developments have been completed, adopting searchable encryption in your PHP software should be as painless as possible.
Finally, we want to develop CipherSweet beyond the PHP language. We want to provide compatible implementations for Java, C#, and Node.js developers in our initial run, although we're happy to assist the open source community in developing and auditing compatible libraries in other languages.
Honorable mention: Ryan Littlefield has already started on an early Python implementation of CipherSweet.
Support the Development of CipherSweet
If you'd like to support our development efforts, please consider purchasing an enterprise support contract from our company.