I’m sure this has been blogged about many, many times, but it’s hard to believe just what a pain doing research using global country data over any significant time span can be. Obviously I was delighted to find the marvellously assembled CShapes, one man’s attempt to gather all the information about countries throughout history into one place. Countries come and counties go, and representing this seemingly simple fact requires three things which really don’t go together well in any of the database management systems around. These are:

  1. Space – what does this country look like from space? Where are it’s borders?
  2. Time – when did the country appear/disappear/start to look like this.
  3. A history of events to string all these spaces and times together.

So when East and West Germany become just Germany in 1989, what is the researcher attempting to do a time-series type analysis to do with the attributes of each country. Let’s say we’re interested in how much aid a certain country gives under particular circumstances; we may be asking the question “does a natural disaster in recipient country x cause a donor country a to give less money to recipient country y? And if so, how long does the effect take to work?”

To begin to answer this question we need to ‘do something’ with the information we have about how much East Germany and West Germany give once the year we’re looking at gets beyond 1989. “Just add it up,” you cry! “What could be simpler?” But the system needs to ‘know’ that these two countries became one. So we need to store a kind of ‘transformations’ register which keeps track of which shapes morphed into which other shapes and when.

But then things get complicated: how can I make sure that the data in the transformations register leads to a view of the world which makes sense? Standard database constraints areĀ  no longer enough. Countries are allowed to exist twice, unlike the standard database uniqueness constraints, just not at the same time. How is this represented in a database?

Well, the outstandingly named Richard T. Snodgrass (in his book Developing Time-Oriented Database Applications in SQL) would have me add something called a ‘check assertion’. This basically ensures that before any updates are commited to a table, certain rules are followed, one of which might be that a certain country cannot exist twice at the same time. My problem with all of this? PostGreSQL doesn’t support assertions, despite it being a part of the SQL92 standard. Why not?!

Apparently no database system does. And we poor adventurers surfing the edge of what can be done with information systems are left scratching our heads and writing blog entries instead…


All about databases

Oh my goodness, you cry in thrilled elation. A blog all about one researcher’s investigations into the vagueries and subtleties of database design?

Whilst I acknowledge that for a good proportion of the seven billion potential readers of this blog, the prospect of a blog about data is not going to compete with the universes of Twitter, YouTube and gathering fresh water (yes, I realise that not all of the seven billion spend their time idly wondering what to do with their computer-bound, office tarnished days) but for those happy few for whom this counts as a ‘hobby or interest’ on dating websites and LinkedIn profiles, a great day has dawned.

I’ll be filling these pages with interesting stuff I come across in my attempt to build the ultimate all-emcompassing “Global Intelligence System”, first reviews of which included the now-fabled “it’s not intelligent, and it’s certainly not global”. At least said critic acknowledged that it was a system, eh?!

Till next time, DBMS-fans.