Blockchains aren’t always the answer
Blockchain has been a hot topic over the past few years, with the public-facing news and hype being focused on the crypto market and NFTs (Non-Fungible Tokens). Public blockchains, like Bitcoin, have proven utility as an alternative to traditional financial rails, but what is of interest to the enterprise market are private blockchains, where a blockchain network can be orchestrated between a known set of parties that share a business process, such as supply chain. Blockchain, by definition, is a distributed computing technology. This means all parties participating in the network must host infrastructure (blockchain nodes). Commitment of transactions to the blockchain must be done through network consensus, such as Byzantine Fault Tolerance, but suffers from low throughput (eg; ~1000 TPS). It’s also still relatively nascent, so development tools are new, languages for writing the business logic for the blockchain (smart contracts) are bespoke, so developing new systems built on blockchain are not for the faint of heart – you’re in for a journey!
I led the Azure Blockchain Service team at Microsoft from 2018 – 2020, where we built a fully-managed Azure service for blockchain networks. In my discussions with 100’s of CxO’s over that time, one thing I learned was enterprises may think they want a blockchain for their scenario, but upon inspection I found what they really wanted was the ability to establish digital trust in their existing business process. What I mean by digital trust is replacing, or augmenting, existing manual processes of verifying data integrity with a cryptographic means of verification. As a simple example, how can you ensure that a privileged user in HR IT has not maliciously changed a colleague’s salary? While SQL Auditing can capture the event, what it does not capture is what the change was. While this is a simple example, and one that could be easily detected and remediated, if it was a bulk transaction changing several rows in several tables – the forensics to detect and remediate are challenging. If instead, we can make that data tamper-evident, with historical values captured and cryptographically protected, remediation becomes much simpler. Employing a decentralized blockchain in this case does not make sense. The system in question (HR records) is a centralized system – why introduce the overhead of a complex blockchain just to protect the data?
Azure SQL Database ledger – the power of blockchain with the simplicity of SQL
Azure SQL Database ledger is a new feature of Azure SQL Database that employs the cryptographic patterns used in blockchains to make data in tamper-evident, coupled with the ability to capture the history of changes made along with metadata useful for forensics, such as knowing which transaction impacted which rows. Let’s explore how ledger tables work and where they’re useful.
Updatable Ledger Tables
As the name implies, with updatable ledger tables, you can issue UPDATE and DELETE statements in your ledger-enabled database. Contemplating a blockchain solution for an existing system would require changing your data pattern to one that is append-only – you can’t modify the immutable data in a blockchain. Instead, you’d have to issue corresponding, correcting transactions in order to “mimic” an update. Here’s how updatable ledger tables work.
1. A transaction is submitted to the database issuing an update to an existing row, or set of rows.
2. The previous values of the row(s) being updated are subsequently copied to a history table, which has the same schema as the updatable ledger table.
• You may say, “This sounds like Temporal to me”, and you’d be right – to a point. The history table leverages the temporal functionality.
• Always generated columns are added to the updatable and history tables which track the ID of the transaction, the sequence of operation (we’ll show an example of this later).
3. In order to view a chronicle of your data, the ledger view is created which shows the current, and previous values of your rows.
4. Data is protected through SHA 256 hashing, ultimately producing “blocks” of transactions which are hashed together to form a blockchain data structure.
• Each row updated in a transaction is hashed and using a Merkle Tree to produce a root hash representing all rows in a transaction.
• Each transaction in a block is hashed and using a Merkle Tree produces a root hash representing all transactions (and rows per the above bullet) in a block.
• Each block is hashed using the root hash of the block, hashed with the hash of the previous block
5. The hash of each block represents the state of the entire database at the point in time it was created. This hash, called the database digest is pushed outside of the SQL Database to a trusted storage service (eg: Azure Storage Immutable Blobs, or Azure Confidential Ledger).
6. To detect any potential tampering in the database, the sys.sp_verify_database_ledger stored procedure is run, which fetches the database digests in storage, re-calculates the hashes in the database, and compares the values. If the hashes do not match, a tampering event has occurred.
Append-Only Ledger Tables
Append-only ledger tables provide an additional level of security, by blocking UPDATE and DELETE T-SQL statements. This means if a privileged user, such as a DBA, attempts to make a malicious simple change, the SQL Server will block the operation. Since they only allow append-only operations, there is no need for a history table when using append-only ledger tables. The cryptographic protection for append-only ledger tables is no different than what is described above for updatable ledger tables.
Data lineage and forensics
I described above how GENERATED ALWAYS columns added to updatable and append-only ledger tables provide the metadata needed to understand changes made to your ledger tables. Let’s take a closer look at how this would work.
We’ll use a simple banking example, tracking the account balances of customers at a bank. Note that a real schema for a banking system would be much more complicated than this, but I simplified this for clarity.
We’ll create our schema, however, note the new argument LEDGER = ON. This tells the SQL Server that this will be an updatable ledger table. If I had wanted to create and append-only ledger table, I would have added LEDGER = ON (APPEND_ONLY = ON) instead.
CREATE SCHEMA [Account] GO CREATE TABLE [Account].[Balance] ( [CustomerID] INT NOT NULL PRIMARY KEY CLUSTERED, [LastName] VARCHAR (50) NOT NULL, [FirstName] VARCHAR (50) NOT NULL, [Balance] DECIMAL (10,2) NOT NULL ) WITH ( SYSTEM_VERSIONING = ON, LEDGER = ON ); GO
This will not only create our updatable ledger table, but will also create the associated history table and ledger view, similar to the below:
Now we’ll add a customer, Nick, with an opening balance of $50 to the bank’s system.
INSERT INTO [Account].[Balance] VALUES (1, 'Jones', 'Nick', 50)
Now let’s submit another transaction, this time adding 3 new customers, John, Joe and Mary with various opening balances.
INSERT INTO [Account].[Balance] VALUES (2, 'Smith', 'John', 500), (3, 'Smith', 'Joe', 30), (4, 'Michaels', 'Mary', 200)
Now lastly, let’s update Nick’s balance from $50 to $100.
UPDATE [Account].[Balance] SET [Balance] = 100 WHERE [CustomerID] = 1
OK, so we’ve submitted 3 transactions of adding 4 new customers to the bank and updated one customer’s balance. Let’s see how this looks if we inspect the updatable ledger table, the history table and the ledger view.
SELECT * ,[ledger_start_transaction_id] ,[ledger_end_transaction_id] ,[ledger_start_sequence_number] ,[ledger_end_sequence_number] FROM [Account].[Balance] GO SELECT * FROM Account.MSSQL_LedgerHistoryFor_46610071 GO SELECT * FROM Account.Balance_Ledger ORDER BY ledger_transaction_id GO
What we see here is the following:
1. Each transaction has its own transaction ID.
2. Since the addition of John, Joe and Mary are part of a single transaction, they have the same transaction ID, but the ledger_start_sequence_number indicates the order of inserts within the transaction.
3. Nick’s original balance of $50 has been moved into the history table.
4. The update of Nick’s balance to $100 is represented as a DELETE of the original value in the ledger view, followed by an INSERT of the updated balance.
The Database Ledger
The blockchain data structure in Azure SQL Database ledger is captured in 2 new system tables, sys.database_ledger_transactions , which captures all transactions issued, the block they are sealed in, commit time, and name of the user that issued the transaction. The table hashes are the root hash of the transaction as described earlier.
sys.database_ledger_blocks captures the blocks created, their associated transaction root hash, the number of transactions incorporated in the block as well as the hash of the previous block preceding it.
SELECT * FROM sys.database_ledger_transactions GO SELECT * FROM sys.database_ledger_blocks GO
Now that we’ve talked about the core aspects of how the ledger feature works, we’ll continue next month and talk about database verification. Database verification is how you detect tampering in the database, and there are several ways to configure and achieve this based on your preferences – so tune-in next month!