Visualising similarity

A model of the global economy is, by its very nature, an unwieldy object to work with. There are 40 countries (we want more; that’s coming next) and the economy of each country is described by the economic activity of 35 sectors.

Each sector in each country interacts with each other sector in each other country creating close to two million interactions.

This is great for wowing potential users of the model with the sheer scale and size of thing, but it makes life pretty hard if you want to ask a question like “what effect has a certain change had on… well, everything?”

This is hard because “everything” here encompasses two million numbers some of which will have gone up and others of which will have gone down.

If you don’t put any effort into visualisation, the output of the model looks absolutely horrible:

Output of a World Input-Output Table

Needless to say, picking interesting information out of such a mass of numbers involves some careful thought. (For the interested, what you’re seeing here is dollar-valued commodity flows between sectors within the Australian economy, the sectors being numbered 1 to 35.)

The paper I’m writing at the moment asks an even trickier question than “what’s going on?”. I’m trying to work out how our model compares with other, more standard, ways of doing this kind of thing. This means making the same change in two models and comparing the results.

One way to boil down lots of information into a far smaller number of ‘things’ is to rank the numbers you’re analysing. This just means putting the numbers into order then saying which number is biggest, which is second-biggest etc.

So in our case, if we make a change to the global economy, instead of looking at a horrifying table of numbers we can just say “Australia was the country most affected by the change. Netherlands was second, Spain tenth, Bulgaria 39th…” and so on.

The advantage to this approach is that, when comparing the results of two models, you can just compare the ranks of the countries and see if they’re similar. If they are, you might be justified in concluding that the models are doing more-or-less the same thing.

It also allows for some nice visualisation. If we write down all the countries in one column in the order of their rank (most-affected by some change we’ve made, to least-affected) using one model, and make a second column where the countries are ordered according to their rank using the other model, we can quickly see where the differences are, particularly if we draw nice lines between the countries to show how their position has changed.

Here’s the outcome of such an experiment:

chn_vec_rank_change_wiot_vs_model
The design for this visualisation was inspired by a similar thing in the work of Hidalgo and Hausmann, see here on p4!

It shows the results of reducing demand for Chinese vehicles by $1M on the global economy in 2010. The left-hand column shows the results using a traditional model (for the interested: it’s called a Multi-Region Input-Output model, or MRIO). The most-affected countries are at the top and the least-affected at the bottom. The right-hand column is the same but for our model.

With the exception of Slovakia, the results look pretty good. The ranks are generally pretty similar which is encouraging. We’re currently trying to find out what’s going on with Slovakia, and I’ll post here if we ever find out!

(Note that Taiwan is not in our model, because the UN doesn’t report trade data for it, as it deems it to be a part of China. I won’t be delving into this international controversy here!)

Advertisement

The most important industry in the world

I’ve been modelling the interconnected nature of the global economy by simulating a reduction in demand for various sectors in various countries. It’s a very simple little piece of analysis:

What would happen if the demand for a given sector in a given country was reduced by a single US dollar?

In answering this question for every sector in every country in the model, you can get a sense of which sectors have the biggest impact on the global economy. Basically you reduce the demand for each sector by a dollar and watch what happens to the rest of the world.

Unexpectedly, perhaps, this most-important sector is the vehicles sector in China. If demand for vehicles dropped by a single dollar, an unbelievable $98 would be lost in terms of global production. This is a truly astonishing conclusion.

So where does this $98 dollars come from? Well, the interconnectedness of the global economy is behind the magnitude of the number. In short, not only do sectors which feed the Chinese vehicle sector suffer, but all the sectors which feed those sectors and so on through the network that is the global economy. And a hint of how complex the picture is, is given by this image (click for full size):

Each circle is a sector in a certain country. The lines between the sectors represent changes in trade between them due to the $1 reduction in demand for Chinese vehicles. The sectors are sized according to how affected they are by the change. (Note for technical types only: they are sized proportional to their eigenvector centrality.)

It goes to show how interconnected the global economy really is. This small change in China has knock-on effects for the US, Japan, Korea, Germany, Italy, the Netherlands… the list goes on and on.

What the flip is going on with global trade?

Similarly to many branches of statistics-gathering, the world’s trade statistics bureaux lack, in their communication style, a certain panache. The writings of such agencies are characterised by a complete absence of zing, lightness-of-touch and joie de vivre. I’ve blogged before about horrific diagrams like that shown here, and how the whole enterprise of gathering information about global trade is inaccessible and unpleasant.

So it gave me an extra tickle, to find a rare example of humour in a working paper from the National Bureau of Economic Research called “World Trade Flows: 1962-2000”.

In the paper, they present a number of databases of world trade flows from a series of years between 1962 and 2000. Blind or indifferent to the fears of the “Millenium Bug“, they use a two-digit code to represent the particular year. Let me recap: it’s a database of World Trade Flows at a given two-digit year we’ll generically call “??”.

The result can be seen here on p48 of the report. Fantastic stuff…
Feenstra et al 2005

How visualising networks broke my browser

Networks: aren’t they great? The sexiest modelling paradigm around at the moment and there are no shortage of social science researchers itching to jump on the bandwagon.

Never one to drag my heels, I blogged last week about the attempts by me and my colleagues to bring network science into Economics, and included a fancy graphic to demonstrate how visualising networks can look pretty, and potentially be informative about systems with complex interconnections.

An imagined network of three countries, Red, Blue and Green, using three products, A, B and C internally as intermediate inputs to the production process, and also trading these products with one another.
An imagined network of three countries, Red, Blue and Green, using three products, A, B and C internally as intermediate inputs to the production process, and also trading these products with one another.

But the image I included was static, prepared in a piece of open source network analysis software called Gephi (it’s one of those pieces of software that everybody hates, everybody uses, but no one understands). The natural extension to this is an interactive network diagram. Imagine if we could play with the network shown in that picture. How cool would it be to be able to drag the nodes around to see how the network responds?

Well, there is a way; and, in fact, it’s been done many times before. This cool-looking interactive visualisation is by web-visualisation guru Mike Bostock. The guy brings together insane technical skillz (he seems single-handedly to have written the popular javascript visualisation library d3) with an eye for beautiful design that leads to some of the most breath-taking infographics on the web.

His network visualisation uses something called a force-directed graph in which physics equations are used to determine the behaviour of a network. The nodes (drawn as circles) repel each other like charged particles, and the links between the circles act like springs, pulling the nodes back together. This leads to a balanced state where the nodes are as far apart from each other as they can be, under the constraint that they’re attached together with springs of varying strength.

The network shown in Mike Bostock’s example is pretty simple, but it struck me as a great way to visualise my network of networks. Here’s an example. This is Great Britain’s economy in 2009. Each circle is a sector of the economy, and a link between two sectors shows the extent to which one sector sold goods to the other in that year. For simplicity, most of the smaller links have been filtered out (otherwise, the whole thing is a tangled mess!)

This is great: the sectors are circles, with the bigger circles being the bigger sectors overall, and the connections between the circles being the value of the goods sold from one sector to another. The thicker the line, the more goods were sold.

But there’s a key piece of information missing from this way of viewing the network: the flows between sectors have direction, that is to say, it matters that sector A sold £100 worth of stuff to sector B, rather than the other way around. So how to visualise the network in a way that emphasises the directionality of the links as well as the size?

We could try putting arrows on the ends of the links, right? Mike Bostock has thought of this already of course, and has a simple example here. But the problem is that the circles in his example are a fixed size. If the circles were bigger, the arrows would get hidden underneath them. How to place the arrows when the circles are all different sizes and the line connecting them is ‘bendy’ is an ‘unpleasant’ maths problem.

How to place an arrow when circle sizes differ

I wrestled with putting arrows on the lines for a while before abandoning the project altogether. Then, after some skillful Googling (as vital to the 21st century citizen as reading and writing was to citizens of previous centuries) I came across this from Mike Bostock’s website:

Making a gradient follow a path

With this idea, I could make each end of the links a different colour with, say, red being the seller’s end, and green being the buyer’s end. On a very small subset of my UK 2009 economy network, things seem to work pretty well:

but the computational overhead is massive. Each line in this network is really a group of around 30 little pieces of line, each with its own colour, creating the effect of a smooth transition from green to red. That means that the browser has to work much harder than it otherwise would have to. This approaches scales very poorly. Here’s a slightly more filled out network (these videos are real-time captures of my browser’s output):

Although the network is still a tiny fraction of the complete picture, things are already starting to slow down. Finally, just to really push things to the limit, here’s the network as shown in the very first video in this post. As you can see, although the resulting network looks “pretty cool” (for which read, mind-bogglingly complex) my browser has basically ceased to function. It takes around ten seconds to process each frame of the animation.

So it looks like the colouring of the links is not workable. Watch this space for more updates as I try different methods for showing a big network with directed links.

Gineau-Bissau imported 500,000 litres of ‘Ice, Snow and Potable Water’ in 2003

So, like, trade data, yeah? It’s great isn’t it? All the world’s trade in tangible products, itemised by zealous customs officials, and browsable by the idle researcher, thumbing through a Who’s Who-style almanac of fascinating trade numbers, discovering little gems in the diamond stats here, unusual themes of the international art market there. A joy.

Or so you’d think. The reality is significantly more painful than my sepia-tinted dusty old office with copies of the UN’s famous COMTRADE database lying around like old Yellow Pages waiting to be perused.

What you actually have to do if you want, in a quiet moment between world-saving academic discoveries, to know of the trade patterns of the world is use this horrible-looking and overloaded website. Somehow, you’d think the world’s most glamorous and most-studied database would look more, well, bling. But there it is. It also operates incredibly slowly. Here’s an entertaining stat from the website itself:

The blistering speed demonstrated by the UN's flagship data product. COMTRADE returns a single row in just 20 seconds.
The blistering speed demonstrated by the UN's flagship data product. COMTRADE returns a single row in just 20 seconds.

The database boasts close to two billion records. This means that if a densely typed book were produced with one trade record per line, the book would be around 217 metres thick (See the bottom of this post for the calculation). In order that the UN doesn’t spend all its limited resources on server power, they’ve limited the queries you can submit to the database to be those that would return fewer than 50,000 records. So you can’t just ask: “how much stuff does the UK export to the rest of the world?” because, with around 6,300 product categories, 200-odd countries, and around fifty years of data, you quickly hit that ceiling.

The maximum will be relaxed if you contribute to server costs: for a mere $1,000US the limited is upped to 50 million records. This means that, in principle at least, you could download the entire database in just 35 queries. But how to put those queries together? We can select “all products” and “all years” and then a random bunch of countries, in the hope that the limit won’t be exceeded. But it’s impossible to know a priori how many countries will fit into a single 50 million-record query.

So I decided to do things the ‘brute force’ way: no single country exceeds the 50 million record data limit (as far as I can tell) so by submitting queries country by country, I should be able safely to avoid the ceiling. But this is still a tedious process for 200 countries: queries must be submitted via click-boxes on the website (which is painfully slow running as I’ve mentioned) and then, once the query is ready, an email is sent and you go back to the website to download a file containing the data. This file must be named appropriately (by hand) and saved somewhere appropriate before being uploaded to our data server. Keeping track of which countries you’ve submitted, which are ready, which you’ve downloaded and which uploaded to the database is a painful process.

So you can imagine my horror (or, if you can’t, think blood draining from face, dry mouth, bulging eyes, exploding brain) upon discovering that, 75 countries in to this long, boring process, I’ve been asking the server for the wrong pieces of information.

Instead of the dollar value of each transaction, I’ve ended up with quantity of a product traded. This means I now know, for example, that the Bahamas traded nine live horses with the US, but not how much those horses were worth. I also know that Swaziland bought 10 kilos of used postage stamps from South Africa, but not how much they spent buying them.

For aggregation purposes, this information is utterly useless. What is the total export value of the Solomon Islands? Well, it’s 70 tons of “Ornamental fish, live”, plus 19 kg of edible offal, plus 317 tons of “Palm kernel or babassu oil, crude”. It’s just not going to work.

So it’s back to square 1 with the downloading of trade data from the UN website. If anyone knows of a better way of doing this, let this weary researcher know quick, or there may be one fewer “Professional brainbox, unfrozen” exported from the UK in future editions of the data.

Here’s how the thickness of our imaginary book of trade data was calculated. Microsoft Word can squeeze 46 lines of fairly dense data onto an A4 page. My copy of Pemnberton & Rau’s “Mathematics for Economists” is 4cm thick and has 700 pages, or 0.0057cm per page. The COMTRADE database has 1.75 billion records, which means it’d need 38 million pages, for a total book thickness of 217,391cm or 217 metres.