There’s no doubt that SQL is getting old. It was developed in the early 1970s, by IBM - in an age where computers were large centralised things; a very different world from today. Indeed, in IBM’s 1974 paper on SEQUEL (as it was then known) in
Communications of the ACM, it was designed not only for use by programmers to access a database, but also for “accountants, engineers, architects, and urban planners”.
Clearly, either standards of user-friendliness have improved over the past thirty-five years - or our standards of friendly users have dropped.
So what about this new “NoSQL” idea?
Way before SQL came on the scene, programmers processed files directly. You’d open a file, and get records from it, one by one, in order, unless you did an explicit seek to a known record number. No joins, no indices. Pretty quickly, people started to write libraries of file-handling routines. These evolved into actual databases, each with their own strengths and weaknesses - and their own programming interfaces. When SQL emerged, it was a good fit for the architectures of the time; and it soon became a standard interface to different database packages. Sure, compatability has never been great - but it’s been an issue of adapting to different dialects, rather than rewriting the entire application.
Non-SQL databases lingered in a few places. Embedded systems didn’t want the overhead of SQL, and standalone applications often had to use something simpler for their internal storage, at least until SQLite came on the scene. The UNIX databases of passwords, groups, and so on in /etc stayed the way they were, because that was a standard in its own right. And sometimes more specialised database interfaces were made for specific purposes, such as LDAP.
The development of the Web has created a growing desire to build scaleable, fault-tolerant, systems. When the Web was more read-only it was easy to maintain a pool of identical database servers with the same data manually replicated to them all, or to just put in bigger, better, SQL servers with redundant hardware.
As time passed, web sites migrated towards more interactive models, creating new pressure for databases to handle real-time updates. While some form of replication is the only way of achieving fault-tolerant data storage, replication of SQL without a bottleneck somewhere is hard. And here’s why.
Imagine you issue two updates:
UPDATE table SET foo = foo + 1 WHERE pk = 1;
UPDATE table SET foo = foo * 2 WHERE pk = 1;
Clearly, it matters what order those queries are applied in. If foo starts off as 1, and you do them in the order shown, it’ll end up as 4; if you do them in the other order, it’ll end up as 3. So in a network of clients and servers, each server has to perform the updates in the same order if we’re to have any kind of consistency between our replicas. But to agree on a global order, you either need a central master server (that becomes a bottleneck), or a complex protocol that gets consensus between all the servers (which becomes a bottleneck). So however you do it, there’s bottlenecks, which harm scalability and availability.