I am not a cryptanalyst, nor a mathematician – which makes it debatable whether I am qualified to write this series of articles. But I do have a good-enough grasp of the concept to notice that, even though we seem to live in a world where cryptography has a direct impact on our daily lives, there is considerable misunderstanding floating around. Whether the trigger is a question or a statement from a work colleague, a piece of software that I'm reviewing or an article from a publication, I often find myself thinking that a lot of people would greatly benefit from, at the very least, a basic understanding of what cryptography is and what it's not. I'm certainly not referring just to software engineers: middle and upper management have decisions to make, directions to set and products to acquire; salespeople have products to... sell; spokespeople have statements to make when an incident occurs; journalists have questions to ask and articles to write; employees in a corporation will stumble upon different technologies, such as digital signatures; finally, the majority of us entrust a great deal of sensitive data to different entities through digital channels.
The trigger today was an article on BBC News regarding Grindr and the sharing of its members' HIV status. I read the same story on a few other news web sites and they all had one thing in common: encryption (which is one of the main purposes of the field of cryptography) was mentioned several times, but the lack of clarification made the information rather useless. So I decided to try and explain the basic concepts in a way that a non-technical person would find not just easy to understand, but also useful. And because I consider myself a programmer and I can't simply ignore my fellow guild members, I will add some additional, clearly marked information for them, along with some examples using Python Cryptography.
Cryptography's main purpose is to keep communication or data secure: inaccessible to an attacker interested in obtaining the private information. Its origins can be traced to antiquity, but in modern times the methods employed have become increasingly complex and tied to mathematics and engineering. And while until recently its use wasn't as widespread, being mainly confined to the military realm, the advent of the Internet has meant that the privacy and financial security of most of us rely heavily on cryptography and, more importantly, on how well it's implemented.
There are two important things to take into consideration in the context of cryptography. The first is that the landscape constantly evolves: very, very few cryptographic algorithms are proven to be perpetually secure. As new methods, weaknesses or important technological or mathematical advancements are discovered, what is considered unbreakable today may become broken tomorrow. The good news is that, for the most part, change comes relatively slowly. Even when a very serious weakness is found, the danger is usually not immediate, but rather looming: a few other conditions would need to be met to result in total loss of confidentiality. The industry is also quite good at enforcing continuous improvement to maintain at least similar, if not better, levels of privacy: for example, new algorithms are pushed forward and older ones deprecated at the first signs of weakness, before they become broken. However, all of this means that you can't simply encrypt some data, keep it in the cupboard and expect that it won't be accessible in 20 years' time.
The second crucial aspect is that cryptography is often not employed by experts in the field. Contrary to what I've learned is a popular belief, cryptography is not within the expertise of software engineers; it's actually mathematicians who do the hard work of inventing, verifying and constantly trying to break the building blocks that form the discipline. These are then applied in IT systems by software engineers, disconcertingly often without a good grasp of the basics. Even when the ingredients are of the highest quality, considered fresh and safe to use, and the cook has the best intentions in mind, it's easy to get things wrong and end up with a rather bad-tasting soup. Ironically, our current need for cryptography may exacerbate the problem when cryptography is seen as a panacea and used in a way that provides a false sense of security.
Finally, while I'll try to avoid using technical terms without an explanation, there are a number of concepts which will be repeatedly employed and either form a basic prerequisite or are useful to develop a vocabulary in the subject:
- The use of Alice, Bob and Eve is very common in the literature to represent the good guys (the first two, the A and B parties) trying to exchange messages securely and the bad guy, Eve (the eavesdropper), who tries to subvert their efforts. Mallory is a generic malicious attacker, also interested in harming the good characters. Any resemblance to real people is purely coincidental.
- We'll refer to data generically, whatever it may represent: a text message, a PDF document, sounds etc. In a computer system or network, all data is in fact a series of bits: a sequence of 0s and 1s. It has to be interpreted according to some rules to give it meaning, whether for a computer or for a human being. We won't be going into more detail here, but remember that whether we're talking about financial information from an Internet Banking web site or a smiley face sent over a messaging app, the data itself is nothing more than a sequence of bits.
- An algorithm is simply a set of instructions, a recipe. It contains the rules to transform the input data into an output. For example, we could build an algorithm to capitalise all words in a Microsoft Word document. The scientific definition is rather complex, but you could argue that everything that happens on a computer system is governed by algorithms.
- A function, just like in mathematics, can provide an implementation of an algorithm. In this respect, we might use the two terms interchangeably, as the fine distinction isn't relevant to our discussion. A function will take an input and provide an output.
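To make the points about data, algorithms and functions a little more concrete, here is a small Python 3 sketch. The helpers to_bits and capitalise_words are hypothetical, written only for illustration; the second is a simplified stand-in for the "capitalise all words" example, working on plain text rather than on a Word document.

```python
def to_bits(data: bytes) -> str:
    """Render each byte as 8 binary digits, separated by spaces."""
    return " ".join(format(byte, "08b") for byte in data)


def capitalise_words(text: str) -> str:
    """A tiny algorithm: upper-case the first letter of every space-separated word."""
    # word[:1] copes with empty strings produced by consecutive spaces.
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


# A text message and a smiley face (U+1F642) are both just bytes, i.e. 0s and 1s.
print(to_bits("Hi".encode("utf-8")))          # 01001000 01101001
print(to_bits("\U0001F642".encode("utf-8")))  # 11110000 10011111 10011001 10000010
print(capitalise_words("hello alice and bob"))  # Hello Alice And Bob
```

Nothing in the bits themselves says "text" or "smiley": the UTF-8 encoding rules are what give them that meaning, and capitalise_words is simply a function that takes one input and produces one output.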
A word for developers
It's generally a good idea to use the higher-level building blocks from well-reviewed libraries as much as possible. For example, when establishing a secure communications channel, using the TLS implementation from OpenSSL or GnuTLS is preferable to designing a new protocol. Not only does this offer a simple upgrade path when ciphers, key exchange mechanisms or protocols become obsolete, but it also shields you from the very fine details which need to be taken into consideration.
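In Python, the standard library's ssl module is one such higher-level building block: it wraps OpenSSL (on most platforms) instead of asking you to assemble ciphers and key exchanges yourself. The sketch below is a minimal illustration, not a complete client; open_tls_connection is a hypothetical helper name, and the host you pass to it is up to you.

```python
import socket
import ssl


def open_tls_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TLS-protected socket using Python's standard ssl module."""
    # create_default_context() picks sensible defaults: certificate verification
    # and hostname checking are enabled, and obsolete protocols such as SSLv3 are refused.
    context = ssl.create_default_context()
    raw_socket = socket.create_connection((host, port))
    # server_hostname enables SNI and lets the context check the certificate's hostname.
    return context.wrap_socket(raw_socket, server_hostname=host)
```

Calling version() on the returned socket reports the negotiated protocol. When the defaults change in a newer Python or OpenSSL release, code like this picks up the improvement without being rewritten, which is exactly the upgrade path mentioned above.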
One more thing applies to almost every topic that will be covered in this series: many cryptographic applications require a cryptographically secure pseudo-random number generator (CSPRNG). Not every use strictly needs higher-quality numbers, but it rarely hurts to err on the safe side. In Python, os.urandom() provides just that: "a string of n random bytes suitable for cryptographic use". Your choice of development environment should always provide something equivalent; never rely on a cryptographically insecure PRNG unless you know exactly what you're doing.
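A minimal sketch of the difference in practice. The 32-byte length is just an illustrative choice (for instance, the size of a 256-bit key). The secrets module, available since Python 3.6, is an extra not mentioned above: it offers convenience functions on top of the same operating-system source that os.urandom() uses.

```python
import os
import random
import secrets

# 32 bytes (256 bits) from the operating system's CSPRNG, e.g. for a symmetric key.
key = os.urandom(32)

# Python 3.6+ alternative: 16 random bytes from the same OS source, as 32 hex characters.
token = secrets.token_hex(16)

# Do NOT use this for anything security-related: the random module is based on
# the Mersenne Twister, which is fine for simulations but predictable to an attacker.
insecure = random.getrandbits(256).to_bytes(32, "big")
```

All three values look equally "random" when printed, which is precisely why the mistake is easy to make: the difference lies in how predictable the generator is, not in how its output looks.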
The examples will be short and run in the Python 3 interpreter. Fans of other languages should be able to easily understand the concept and apply it to any environment.
Today's cryptography deals with more than that, but we'll leave that for a future article.
Which I stubbornly refuse to spell with a lowercase i.
Pseudo-random numbers generated from more entropy.