I’m sure this has been blogged about many, many times, but it’s hard to believe just what a pain doing research using global country data over any significant time span can be. Obviously I was delighted to find the marvellously assembled CShapes, one man’s attempt to gather all the information about countries throughout history into one place. Countries come and counties go, and representing this seemingly simple fact requires three things which really don’t go together well in any of the database management systems around. These are:
- Space – what does this country look like from space? Where are it’s borders?
- Time – when did the country appear/disappear/start to look like this.
- A history of events to string all these spaces and times together.
So when East and West Germany become just Germany in 1989, what is the researcher attempting to do a time-series type analysis to do with the attributes of each country. Let’s say we’re interested in how much aid a certain country gives under particular circumstances; we may be asking the question “does a natural disaster in recipient country x cause a donor country a to give less money to recipient country y? And if so, how long does the effect take to work?”
To begin to answer this question we need to ‘do something’ with the information we have about how much East Germany and West Germany give once the year we’re looking at gets beyond 1989. “Just add it up,” you cry! “What could be simpler?” But the system needs to ‘know’ that these two countries became one. So we need to store a kind of ‘transformations’ register which keeps track of which shapes morphed into which other shapes and when.
But then things get complicated: how can I make sure that the data in the transformations register leads to a view of the world which makes sense? Standard database constraints areĀ no longer enough. Countries are allowed to exist twice, unlike the standard database uniqueness constraints, just not at the same time. How is this represented in a database?
Well, the outstandingly named Richard T. Snodgrass (in his book Developing Time-Oriented Database Applications in SQL) would have me add something called a ‘check assertion’. This basically ensures that before any updates are commited to a table, certain rules are followed, one of which might be that a certain country cannot exist twice at the same time. My problem with all of this? PostGreSQL doesn’t support assertions, despite it being a part of the SQL92 standard. Why not?!
Apparently no database system does. And we poor adventurers surfing the edge of what can be done with information systems are left scratching our heads and writing blog entries instead…