Blockchain technology, and smart contracts in particular, is the hot new "internet". Since the creation of Bitcoin, we've seen the rise of the public smart contract platform Ethereum and of several "private" systems such as The Linux Foundation's Hyperledger. These distributed ledgers have become the hot new foundation on which to build apps, leveraging the additional trust that their distributed nature is supposed to provide.

Demian Brener of OpenZeppelin, which provides a set of reusable smart contracts atop Ethereum, writes -

There are no tools for developers to easily create, test, verify and audit smart contracts, and do so collaboratively.

As we accrete such common contract tools and libraries, a blockchain app developer's job certainly gets easier. But how should we audit the set of contracts that make up our applications to ensure that they use the blockchain platform correctly, without loopholes?

That is, simply put, a hard problem. It can be particularly hard for someone coming from another domain of expertise who wants to use smart contracts as a platform.

To explain why that is a hard problem, we need to go back to what a blockchain really is: not from the technology perspective, but in terms of the system properties it offers to the apps built on it. In summary, a blockchain provides you a database -

  1. whose records cannot be mutated once created,
  2. which is extremely hard to tamper with undetectably, and
  3. which is auditable by non-participants.

The immutability of records is provided by chaining new blocks of records onto the existing chain (hence "blockchain"): each block carries a cryptographic hash of the block before it. Tampering is evident because if one record is altered, the hashes recorded in all subsequent blocks no longer match up. Furthermore, even if a tamperer recomputed every later hash so that the chain verified fine, all the other blockchain peers would still have to agree to the change. Auditability comes from the fact that anyone can openly and inexpensively recompute these hashes from the content of each block.
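The chaining and tamper evidence described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the block layout (a `prev_hash` field plus a list of `records`) and the function names are hypothetical, SHA-256 over canonical JSON is an assumed hashing choice, and real blockchains also use Merkle trees, block headers and a consensus protocol, none of which is shown here.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical (key-sorted) JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, records: list) -> None:
    """Append a block that commits to the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64  # genesis placeholder
    chain.append({"prev_hash": prev, "records": records})


def verify_chain(chain: list) -> bool:
    """Anyone holding a copy can recompute every link; no secrets needed."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


chain = []
append_block(chain, ["alice pays bob 5"])
append_block(chain, ["bob pays carol 2"])
append_block(chain, ["carol pays dave 1"])
print(verify_chain(chain))  # True

chain[0]["records"][0] = "alice pays bob 500"  # tamper with an old record
print(verify_chain(chain))  # False: block 1 no longer matches block 0
```

A tamperer could rewrite every later `prev_hash` so that `verify_chain` passes again on their own copy; that is why the other peers, each holding their own copy, must also agree to any change.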

The cryptography that goes towards providing these properties is nothing short of genius, but the value of these properties is, relatively speaking, not hard to articulate and understand. After all, these properties, when enforceable without a central authority stepping in, should vastly increase our trust in such a system.

If we put aside the immutability and tamper evidence and focus on the auditability of the records on a blockchain, we can see that all information placed in these records must be independently and timelessly verifiable. In other words, the data placed there cannot refer to entities that can change over time. For example, if we want to log that The New York Times wrote an article about Trump on 29th May 2017, it is not sufficient to place a link to the article, note down the date and time, add it as a record on the blockchain and expect to be done. The server serving up the link may go down. The maintainer may change the content of the link to say something else entirely. The link may redirect to another article or to a cat GIF. This mutability of what the URL refers to makes such a record impossible to audit at an arbitrary later time.

In that sense, the blockchain is best not used as an arbitrary database. Doing so unnecessarily increases the cost of storing records while reaping none of the benefits listed above; one might as well put the information in a regular public relational database. Only information of a certain nature benefits from living on a blockchain.

So, to make a public record that the NYT did publish such an article, what can we do? Let's say we have a system that archives all NYT articles as they appear. We could then store a copy of the entire article as a record. But then, what stops anyone from uploading any content and claiming that it was published by the NYT? One thing we can do to minimize the amount of data we store is to put not the whole article but only a cryptographic hash of it on the blockchain, while archiving the article itself in a regular database. Since we can, at any time, recompute the hash of the archived article and show, with negligible probability of error, that it matches what was logged on the blockchain, it is sufficient to talk about storing these recomputable hashes.
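The store-a-hash, verify-later idea above can be sketched as follows. This is an illustration under assumptions: SHA-256 is an assumed choice of hash, the function names are hypothetical, and the article text is a placeholder. The archive itself (an ordinary database) is not shown; only the digest would go on-chain. Because the comparison is byte-exact, any text normalization (encoding, whitespace) would have to be fixed once and applied identically at logging time and at audit time.

```python
import hashlib


def article_fingerprint(article_text: str) -> str:
    """Hash the exact article bytes; only this digest goes on-chain."""
    return hashlib.sha256(article_text.encode("utf-8")).hexdigest()


def matches_ledger(archived_text: str, logged_digest: str) -> bool:
    """Recompute the hash of the archived copy and compare to the ledger."""
    return article_fingerprint(archived_text) == logged_digest


# Placeholder article text and a hypothetical on-chain entry.
archived = "Full text of the article as archived on publication day."
on_chain = article_fingerprint(archived)

print(matches_ledger(archived, on_chain))              # True
print(matches_ledger(archived + " edited", on_chain))  # False
```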

We still haven't addressed the problem of "what if someone records a fake article on the blockchain?". But all we need is a mechanism that increases our trust that a hash pulled out of the system was indeed computed from a genuine NYT article. For example, if all CDNs logged hashes of the articles they served, then showing that our logged hash also appears in the logs of a few independent CDNs over which we have no admin control would increase the trust others place in our claim.
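The corroboration idea above amounts to a quorum check across independent logs. A minimal sketch, assuming the hypothetical situation the paragraph describes: each independent party (here, made-up CDN names) publishes the set of digests it has seen, and we accept a claim only if at least `quorum` of them recorded the same digest. The threshold of 2 is an arbitrary assumption.

```python
from typing import Dict, Set


def corroborating_sources(digest: str, independent_logs: Dict[str, Set[str]]) -> Set[str]:
    """Names of the independent logs that also recorded this digest."""
    return {name for name, digests in independent_logs.items() if digest in digests}


def is_corroborated(digest: str, independent_logs: Dict[str, Set[str]], quorum: int = 2) -> bool:
    """Trust the claim only if at least `quorum` independent logs agree."""
    return len(corroborating_sources(digest, independent_logs)) >= quorum


# Hypothetical logs published by three CDNs we do not control.
logs = {
    "cdn-a": {"d1", "d2"},
    "cdn-b": {"d1"},
    "cdn-c": {"d3"},
}
print(is_corroborated("d1", logs))  # True: seen by cdn-a and cdn-b
print(is_corroborated("d2", logs))  # False: only cdn-a saw it
```

The value of such a check rests entirely on the logs being genuinely independent of one another and of whoever made the claim; counting copies of the same log twice would add no trust.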

In short, injecting external data into a blockchain record is a non-trivial problem.

On the other hand, say we're logging information about a book. We can refer to the book by its ISBN. The ISBN database is reliably maintained across the globe and lets us look up, at any time, the metadata associated with a book once we know its ISBN. The ISBN, therefore, is an auditable data item that can be placed in a blockchain record. The probability that the highly replicated ISBN database is tampered with in an undetectable way is pretty low: to change the book that a number refers to, we would need to change not only the database contents but also every printed or downloaded copy of the book that carries the ISBN. To strengthen this further, if the ISBN registration agencies placed each new registration record on the blockchain instead of in a normal database, then that record's cryptographic hash could be used in place of the ISBN to refer to the book uniquely.
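Using a registration record's hash as the book's identifier, as suggested above, works only if every auditor encodes the record identically before hashing. A minimal sketch, assuming canonical JSON (sorted keys, fixed separators) and SHA-256; the record's field names and values are hypothetical and do not reflect any real ISBN agency's schema.

```python
import hashlib
import json


def record_id(registration: dict) -> str:
    """Derive an identifier from the record's content (canonical JSON + SHA-256)."""
    canonical = json.dumps(registration, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical registration record; field names are illustrative only.
registration = {
    "title": "An Example Book",
    "author": "A. N. Author",
    "publisher": "Example Press",
    "year": 2017,
}
book_id = record_id(registration)

# Key order does not matter (sort_keys), so any auditor re-encoding
# the same fields arrives at the same identifier.
reordered = dict(reversed(list(registration.items())))
print(record_id(reordered) == book_id)  # True

# Any change to the registered metadata yields a different identifier.
print(record_id({**registration, "year": 2018}) == book_id)  # False
```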

In this way, any information that a blockchain record refers to is ideally folded, eventually, into the blockchain itself, making the entire system closed.

If we fail to create such a closed system, the next best thing is to refer only to highly trusted and timeless systems. Since the strength of a blockchain system grows with the volume of transactions recorded on it, folding statements produced by these highly trusted systems into blockchain records would, effectively, carve them on digital diamonds.

The immutability demands of a blockchain app, therefore, necessarily spread to every part of the system its data comes from. Immutability is an infectious virus.

Not even something as common as an email address necessarily qualifies as immutable data. Of course, nothing stops us from placing email addresses in a record as usual, but what do they refer to? Does an address refer to the author of a book? If so, has the author since changed her email address? Has the provider reassigned the address to someone else? Was the address placed there without the consent of its owner? Did it even exist in the first place? Should we include some proof that an email was actually sent from this address? Should we include the IP address from which the mail was sent? The DKIM signature of the mailer that sent it? Different applications will require different answers to these questions. Bitcoin, for example, neatly sidesteps this identity problem by creating its own identity system, the wallet address, which exists by construction within the blockchain system.
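To make the contrast with email concrete, here is a minimal sketch of an identifier that exists by construction inside the system: it is derived purely from key material, so anyone can recompute it and no provider can reassign it. The derivation is deliberately simplified (the first 20 bytes of SHA-256, hex-encoded); Bitcoin's classic P2PKH addresses instead use RIPEMD-160 of SHA-256 of the public key, encoded with Base58Check. The key bytes below are placeholders, not a real public key, and a real system would also require a signature made with the matching private key to prove control of the address, which is not shown.

```python
import hashlib


def address_from_public_key(public_key: bytes) -> str:
    """Simplified, hypothetical address: first 20 bytes of SHA-256(key), as hex.

    Not Bitcoin's real encoding (that is RIPEMD-160(SHA-256(key)) + Base58Check).
    """
    return hashlib.sha256(public_key).digest()[:20].hex()


# Placeholder bytes standing in for a 33-byte compressed public key (assumption).
public_key = bytes.fromhex("02" + "11" * 32)
addr = address_from_public_key(public_key)

print(len(addr))  # 40 hex characters (20 bytes)
# Deterministic: any auditor recomputes the same identifier from the same key.
print(address_from_public_key(public_key) == addr)  # True
```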

So, how do we audit a blockchain application expressed as smart contracts?

We need to ensure that literally every bit of information we include in our contract comes from a timeless source.

With data that goes on the blockchain, it is all the more imperative to answer the timeless questions of epistemology -

  1. What do you know? - i.e. what does the data on the blockchain signify?
  2. How do you know it? - i.e. how are we sure that it does indeed signify what you claim it does?