Simple way to encode a string according to a password?

RexE picture RexE · Mar 22, 2010 · Viewed 222.2k times · Source

Does Python have a built-in, simple way of encoding/decoding strings using a password?

Something like this:

>>> encode('John Doe', password = 'mypass')
'sjkl28cn2sx0'
>>> decode('sjkl28cn2sx0', password = 'mypass')
'John Doe'

So the string "John Doe" gets encrypted as 'sjkl28cn2sx0'. To get the original string, I would "unlock" that string with the key 'mypass', which is a password in my source code. I'd like this to be the way I can encrypt/decrypt a Word document with a password.

I would like to use these encrypted strings as URL parameters. My goal is obfuscation, not strong security; nothing mission critical is being encoded. I realize I could use a database table to store keys and values, but am trying to be minimalist.

Answer

Martijn Pieters picture Martijn Pieters · Mar 13, 2019

Python has no built-in encryption schemes, no. You also should take encrypted data storage serious; trivial encryption schemes that one developer understands to be insecure and a toy scheme may well be mistaken for a secure scheme by a less experienced developer. If you encrypt, encrypt properly.

You don’t need to do much work to implement a proper encryption scheme however. First of all, don’t re-invent the cryptography wheel, use a trusted cryptography library to handle this for you. For Python 3, that trusted library is cryptography.

I also recommend that encryption and decryption applies to bytes; encode text messages to bytes first; stringvalue.encode() encodes to UTF8, easily reverted again using bytesvalue.decode().

Last but not least, when encrypting and decrypting, we talk about keys, not passwords. A key should not be human memorable, it is something you store in a secret location but machine readable, whereas a password often can be human-readable and memorised. You can derive a key from a password, with a little care.

But for a web application or process running in a cluster without human attention to keep running it, you want to use a key. Passwords are for when only an end-user needs access to the specific information. Even then, you usually secure the application with a password, then exchange encrypted information using a key, perhaps one attached to the user account.

Symmetric key encryption

Fernet – AES CBC + HMAC, strongly recommended

The cryptography library includes the Fernet recipe, a best-practices recipe for using cryptography. Fernet is an open standard, with ready implementations in a wide range of programming languages and it packages AES CBC encryption for you with version information, a timestamp and an HMAC signature to prevent message tampering.

Fernet makes it very easy to encrypt and decrypt messages and keep you secure. It is the ideal method for encrypting data with a secret.

I recommend you use Fernet.generate_key() to generate a secure key. You can use a password too (next section), but a full 32-byte secret key (16 bytes to encrypt with, plus another 16 for the signature) is going to be more secure than most passwords you could think of.

The key that Fernet generates is a bytes object with URL and file safe base64 characters, so printable:

from cryptography.fernet import Fernet

key = Fernet.generate_key()  # store in a secure location
print("Key:", key.decode())

To encrypt or decrypt messages, create a Fernet() instance with the given key, and call the Fernet.encrypt() or Fernet.decrypt(), both the plaintext message to encrypt and the encrypted token are bytes objects.

encrypt() and decrypt() functions would look like:

from cryptography.fernet import Fernet

def encrypt(message: bytes, key: bytes) -> bytes:
    return Fernet(key).encrypt(message)

def decrypt(token: bytes, key: bytes) -> bytes:
    return Fernet(key).decrypt(token)

Demo:

>>> key = Fernet.generate_key()
>>> print(key.decode())
GZWKEhHGNopxRdOHS4H4IyKhLQ8lwnyU7vRLrM3sebY=
>>> message = 'John Doe'
>>> encrypt(message.encode(), key)
'gAAAAABciT3pFbbSihD_HZBZ8kqfAj94UhknamBuirZWKivWOukgKQ03qE2mcuvpuwCSuZ-X_Xkud0uWQLZ5e-aOwLC0Ccnepg=='
>>> token = _
>>> decrypt(token, key).decode()
'John Doe'

Fernet with password – key derived from password, weakens the security somewhat

You can use a password instead of a secret key, provided you use a strong key derivation method. You do then have to include the salt and the HMAC iteration count in the message, so the encrypted value is not Fernet-compatible anymore without first separating salt, count and Fernet token:

import secrets
from base64 import urlsafe_b64encode as b64e, urlsafe_b64decode as b64d

from cryptography.fernet import Fernet
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

backend = default_backend()
iterations = 100_000

def _derive_key(password: bytes, salt: bytes, iterations: int = iterations) -> bytes:
    """Derive a secret key from a given password and salt"""
    kdf = PBKDF2HMAC(
        algorithm=hashes.SHA256(), length=32, salt=salt,
        iterations=iterations, backend=backend)
    return b64e(kdf.derive(password))

def password_encrypt(message: bytes, password: str, iterations: int = iterations) -> bytes:
    salt = secrets.token_bytes(16)
    key = _derive_key(password.encode(), salt, iterations)
    return b64e(
        b'%b%b%b' % (
            salt,
            iterations.to_bytes(4, 'big'),
            b64d(Fernet(key).encrypt(message)),
        )
    )

def password_decrypt(token: bytes, password: str) -> bytes:
    decoded = b64d(token)
    salt, iter, token = decoded[:16], decoded[16:20], b64e(decoded[20:])
    iterations = int.from_bytes(iter, 'big')
    key = _derive_key(password.encode(), salt, iterations)
    return Fernet(key).decrypt(token)

Demo:

>>> message = 'John Doe'
>>> password = 'mypass'
>>> password_encrypt(message.encode(), password)
b'9Ljs-w8IRM3XT1NDBbSBuQABhqCAAAAAAFyJdhiCPXms2vQHO7o81xZJn5r8_PAtro8Qpw48kdKrq4vt-551BCUbcErb_GyYRz8SVsu8hxTXvvKOn9QdewRGDfwx'
>>> token = _
>>> password_decrypt(token, password).decode()
'John Doe'

Including the salt in the output makes it possible to use a random salt value, which in turn ensures the encrypted output is guaranteed to be fully random regardless of password reuse or message repetition. Including the iteration count ensures that you can adjust for CPU performance increases over time without losing the ability to decrypt older messages.

A password alone can be as safe as a Fernet 32-byte random key, provided you generate a properly random password from a similar size pool. 32 bytes gives you 256 ^ 32 number of keys, so if you use an alphabet of 74 characters (26 upper, 26 lower, 10 digits and 12 possible symbols), then your password should be at least math.ceil(math.log(256 ** 32, 74)) == 42 characters long. However, a well-selected larger number of HMAC iterations can mitigate the lack of entropy somewhat as this makes it much more expensive for an attacker to brute force their way in.

Just know that choosing a shorter but still reasonably secure password won’t cripple this scheme, it just reduces the number of possible values a brute-force attacker would have to search through; make sure to pick a strong enough password for your security requirements.

Alternatives

Obscuring

An alternative is not to encrypt. Don't be tempted to just use a low-security cipher, or a home-spun implementation of, say Vignere. There is no security in these approaches, but may give an inexperienced developer that is given the task to maintain your code in future the illusion of security, which is worse than no security at all.

If all you need is obscurity, just base64 the data; for URL-safe requirements, the base64.urlsafe_b64encode() function is fine. Don't use a password here, just encode and you are done. At most, add some compression (like zlib):

import zlib
from base64 import urlsafe_b64encode as b64e, urlsafe_b64decode as b64d

def obscure(data: bytes) -> bytes:
    return b64e(zlib.compress(data, 9))

def unobscure(obscured: bytes) -> bytes:
    return zlib.decompress(b64d(obscured))

This turns b'Hello world!' into b'eNrzSM3JyVcozy_KSVEEAB0JBF4='.

Integrity only

If all you need is a way to make sure that the data can be trusted to be unaltered after having been sent to an untrusted client and received back, then you want to sign the data, you can use the hmac library for this with SHA1 (still considered secure for HMAC signing) or better:

import hmac
import hashlib

def sign(data: bytes, key: bytes, algorithm=hashlib.sha256) -> bytes:
    assert len(key) >= algorithm().digest_size, (
        "Key must be at least as long as the digest size of the "
        "hashing algorithm"
    )
    return hmac.new(key, data, algorithm).digest()

def verify(signature: bytes, data: bytes, key: bytes, algorithm=hashlib.sha256) -> bytes:
    expected = sign(data, key, algorithm)
    return hmac.compare_digest(expected, signature)

Use this to sign data, then attach the signature with the data and send that to the client. When you receive the data back, split data and signature and verify. I've set the default algorithm to SHA256, so you'll need a 32-byte key:

key = secrets.token_bytes(32)

You may want to look at the itsdangerous library, which packages this all up with serialisation and de-serialisation in various formats.

Using AES-GCM encryption to provide encryption and integrity

Fernet builds on AEC-CBC with a HMAC signature to ensure integrity of the encrypted data; a malicious attacker can't feed your system nonsense data to keep your service busy running in circles with bad input, because the ciphertext is signed.

The Galois / Counter mode block cipher produces ciphertext and a tag to serve the same purpose, so can be used to serve the same purposes. The downside is that unlike Fernet there is no easy-to-use one-size-fits-all recipe to reuse on other platforms. AES-GCM also doesn't use padding, so this encryption ciphertext matches the length of the input message (whereas Fernet / AES-CBC encrypts messages to blocks of fixed length, obscuring the message length somewhat).

AES256-GCM takes the usual 32 byte secret as a key:

key = secrets.token_bytes(32)

then use

import binascii, time
from base64 import urlsafe_b64encode as b64e, urlsafe_b64decode as b64d

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
from cryptography.exceptions import InvalidTag

backend = default_backend()

def aes_gcm_encrypt(message: bytes, key: bytes) -> bytes:
    current_time = int(time.time()).to_bytes(8, 'big')
    algorithm = algorithms.AES(key)
    iv = secrets.token_bytes(algorithm.block_size // 8)
    cipher = Cipher(algorithm, modes.GCM(iv), backend=backend)
    encryptor = cipher.encryptor()
    encryptor.authenticate_additional_data(current_time)
    ciphertext = encryptor.update(message) + encryptor.finalize()        
    return b64e(current_time + iv + ciphertext + encryptor.tag)

def aes_gcm_decrypt(token: bytes, key: bytes, ttl=None) -> bytes:
    algorithm = algorithms.AES(key)
    try:
        data = b64d(token)
    except (TypeError, binascii.Error):
        raise InvalidToken
    timestamp, iv, tag = data[:8], data[8:algorithm.block_size // 8 + 8], data[-16:]
    if ttl is not None:
        current_time = int(time.time())
        time_encrypted, = int.from_bytes(data[:8], 'big')
        if time_encrypted + ttl < current_time or current_time + 60 < time_encrypted:
            # too old or created well before our current time + 1 h to account for clock skew
            raise InvalidToken
    cipher = Cipher(algorithm, modes.GCM(iv, tag), backend=backend)
    decryptor = cipher.decryptor()
    decryptor.authenticate_additional_data(timestamp)
    ciphertext = data[8 + len(iv):-16]
    return decryptor.update(ciphertext) + decryptor.finalize()

I've included a timestamp to support the same time-to-live use-cases that Fernet supports.

Other approaches on this page, in Python 3

AES CFB - like CBC but without the need to pad

This is the approach that All Іѕ Vаиітy follows, albeit incorrectly. This is the cryptography version, but note that I include the IV in the ciphertext, it should not be stored as a global (reusing an IV weakens the security of the key, and storing it as a module global means it'll be re-generated the next Python invocation, rendering all ciphertext undecryptable):

import secrets
from base64 import urlsafe_b64encode as b64e, urlsafe_b64decode as b64d

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend

backend = default_backend()

def aes_cfb_encrypt(message, key):
    algorithm = algorithms.AES(key)
    iv = secrets.token_bytes(algorithm.block_size // 8)
    cipher = Cipher(algorithm, modes.CFB(iv), backend=backend)
    encryptor = cipher.encryptor()
    ciphertext = encryptor.update(message) + encryptor.finalize()
    return b64e(iv + ciphertext)

def aes_cfb_decrypt(ciphertext, key):
    iv_ciphertext = b64d(ciphertext)
    algorithm = algorithms.AES(key)
    size = algorithm.block_size // 8
    iv, encrypted = iv_ciphertext[:size], iv_ciphertext[size:]
    cipher = Cipher(algorithm, modes.CFB(iv), backend=backend)
    decryptor = cipher.decryptor()
    return decryptor.update(encrypted) + decryptor.finalize()

This lacks the added armoring of an HMAC signature and there is no timestamp; you’d have to add those yourself.

The above also illustrates how easy it is to combine basic cryptography building blocks incorrectly; All Іѕ Vаиітy‘s incorrect handling of the IV value can lead to a data breach or all encrypted messages being unreadable because the IV is lost. Using Fernet instead protects you from such mistakes.

AES ECB – not secure

If you previously implemented AES ECB encryption and need to still support this in Python 3, you can do so still with cryptography too. The same caveats apply, ECB is not secure enough for real-life applications. Re-implementing that answer for Python 3, adding automatic handling of padding:

from base64 import urlsafe_b64encode as b64e, urlsafe_b64decode as b64d

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.backends import default_backend

backend = default_backend()

def aes_ecb_encrypt(message, key):
    cipher = Cipher(algorithms.AES(key), modes.ECB(), backend=backend)
    encryptor = cipher.encryptor()
    padder = padding.PKCS7(cipher.algorithm.block_size).padder()
    padded = padder.update(msg_text.encode()) + padder.finalize()
    return b64e(encryptor.update(padded) + encryptor.finalize())

def aes_ecb_decrypt(ciphertext, key):
    cipher = Cipher(algorithms.AES(key), modes.ECB(), backend=backend)
    decryptor = cipher.decryptor()
    unpadder = padding.PKCS7(cipher.algorithm.block_size).unpadder()
    padded = decryptor.update(b64d(ciphertext)) + decryptor.finalize()
    return unpadder.update(padded) + unpadder.finalize()

Again, this lacks the HMAC signature, and you shouldn’t use ECB anyway. The above is there merely to illustrate that cryptography can handle the common cryptographic building blocks, even the ones you shouldn’t actually use.