How visualising networks broke my browser

Networks: aren’t they great? The sexiest modelling paradigm around at the moment and there are no shortage of social science researchers itching to jump on the bandwagon.

Never one to drag my heels, I blogged last week about the attempts by me and my colleagues to bring network science into Economics, and included a fancy graphic to demonstrate how visualising networks can look pretty, and potentially be informative about systems with complex interconnections.

An imagined network of three countries, Red, Blue and Green, using three products, A, B and C internally as intermediate inputs to the production process, and also trading these products with one another.
An imagined network of three countries, Red, Blue and Green, using three products, A, B and C internally as intermediate inputs to the production process, and also trading these products with one another.

But the image I included was static, prepared in a piece of open source network analysis software called Gephi (it’s one of those pieces of software that everybody hates, everybody uses, but no one understands). The natural extension to this is an interactive network diagram. Imagine if we could play with the network shown in that picture. How cool would it be to be able to drag the nodes around to see how the network responds?

Well, there is a way; and, in fact, it’s been done many times before. This cool-looking interactive visualisation is by web-visualisation guru Mike Bostock. The guy brings together insane technical skillz (he seems single-handedly to have written the popular javascript visualisation library d3) with an eye for beautiful design that leads to some of the most breath-taking infographics on the web.

His network visualisation uses something called a force-directed graph in which physics equations are used to determine the behaviour of a network. The nodes (drawn as circles) repel each other like charged particles, and the links between the circles act like springs, pulling the nodes back together. This leads to a balanced state where the nodes are as far apart from each other as they can be, under the constraint that they’re attached together with springs of varying strength.

The network shown in Mike Bostock’s example is pretty simple, but it struck me as a great way to visualise my network of networks. Here’s an example. This is Great Britain’s economy in 2009. Each circle is a sector of the economy, and a link between two sectors shows the extent to which one sector sold goods to the other in that year. For simplicity, most of the smaller links have been filtered out (otherwise, the whole thing is a tangled mess!)

This is great: the sectors are circles, with the bigger circles being the bigger sectors overall, and the connections between the circles being the value of the goods sold from one sector to another. The thicker the line, the more goods were sold.

But there’s a key piece of information missing from this way of viewing the network: the flows between sectors have direction, that is to say, it matters that sector A sold £100 worth of stuff to sector B, rather than the other way around. So how to visualise the network in a way that emphasises the directionality of the links as well as the size?

We could try putting arrows on the ends of the links, right? Mike Bostock has thought of this already of course, and has a simple example here. But the problem is that the circles in his example are a fixed size. If the circles were bigger, the arrows would get hidden underneath them. How to place the arrows when the circles are all different sizes and the line connecting them is ‘bendy’ is an ‘unpleasant’ maths problem.

How to place an arrow when circle sizes differ

I wrestled with putting arrows on the lines for a while before abandoning the project altogether. Then, after some skillful Googling (as vital to the 21st century citizen as reading and writing was to citizens of previous centuries) I came across this from Mike Bostock’s website:

Making a gradient follow a path

With this idea, I could make each end of the links a different colour with, say, red being the seller’s end, and green being the buyer’s end. On a very small subset of my UK 2009 economy network, things seem to work pretty well:

but the computational overhead is massive. Each line in this network is really a group of around 30 little pieces of line, each with its own colour, creating the effect of a smooth transition from green to red. That means that the browser has to work much harder than it otherwise would have to. This approaches scales very poorly. Here’s a slightly more filled out network (these videos are real-time captures of my browser’s output):

Although the network is still a tiny fraction of the complete picture, things are already starting to slow down. Finally, just to really push things to the limit, here’s the network as shown in the very first video in this post. As you can see, although the resulting network looks “pretty cool” (for which read, mind-bogglingly complex) my browser has basically ceased to function. It takes around ten seconds to process each frame of the animation.

So it looks like the colouring of the links is not workable. Watch this space for more updates as I try different methods for showing a big network with directed links.

Advertisements

Gineau-Bissau imported 500,000 litres of ‘Ice, Snow and Potable Water’ in 2003

So, like, trade data, yeah? It’s great isn’t it? All the world’s trade in tangible products, itemised by zealous customs officials, and browsable by the idle researcher, thumbing through a Who’s Who-style almanac of fascinating trade numbers, discovering little gems in the diamond stats here, unusual themes of the international art market there. A joy.

Or so you’d think. The reality is significantly more painful than my sepia-tinted dusty old office with copies of the UN’s famous COMTRADE database lying around like old Yellow Pages waiting to be perused.

What you actually have to do if you want, in a quiet moment between world-saving academic discoveries, to know of the trade patterns of the world is use this horrible-looking and overloaded website. Somehow, you’d think the world’s most glamorous and most-studied database would look more, well, bling. But there it is. It also operates incredibly slowly. Here’s an entertaining stat from the website itself:

The blistering speed demonstrated by the UN's flagship data product. COMTRADE returns a single row in just 20 seconds.
The blistering speed demonstrated by the UN's flagship data product. COMTRADE returns a single row in just 20 seconds.

The database boasts close to two billion records. This means that if a densely typed book were produced with one trade record per line, the book would be around 217 metres thick (See the bottom of this post for the calculation). In order that the UN doesn’t spend all its limited resources on server power, they’ve limited the queries you can submit to the database to be those that would return fewer than 50,000 records. So you can’t just ask: “how much stuff does the UK export to the rest of the world?” because, with around 6,300 product categories, 200-odd countries, and around fifty years of data, you quickly hit that ceiling.

The maximum will be relaxed if you contribute to server costs: for a mere $1,000US the limited is upped to 50 million records. This means that, in principle at least, you could download the entire database in just 35 queries. But how to put those queries together? We can select “all products” and “all years” and then a random bunch of countries, in the hope that the limit won’t be exceeded. But it’s impossible to know a priori how many countries will fit into a single 50 million-record query.

So I decided to do things the ‘brute force’ way: no single country exceeds the 50 million record data limit (as far as I can tell) so by submitting queries country by country, I should be able safely to avoid the ceiling. But this is still a tedious process for 200 countries: queries must be submitted via click-boxes on the website (which is painfully slow running as I’ve mentioned) and then, once the query is ready, an email is sent and you go back to the website to download a file containing the data. This file must be named appropriately (by hand) and saved somewhere appropriate before being uploaded to our data server. Keeping track of which countries you’ve submitted, which are ready, which you’ve downloaded and which uploaded to the database is a painful process.

So you can imagine my horror (or, if you can’t, think blood draining from face, dry mouth, bulging eyes, exploding brain) upon discovering that, 75 countries in to this long, boring process, I’ve been asking the server for the wrong pieces of information.

Instead of the dollar value of each transaction, I’ve ended up with quantity of a product traded. This means I now know, for example, that the Bahamas traded nine live horses with the US, but not how much those horses were worth. I also know that Swaziland bought 10 kilos of used postage stamps from South Africa, but not how much they spent buying them.

For aggregation purposes, this information is utterly useless. What is the total export value of the Solomon Islands? Well, it’s 70 tons of “Ornamental fish, live”, plus 19 kg of edible offal, plus 317 tons of “Palm kernel or babassu oil, crude”. It’s just not going to work.

So it’s back to square 1 with the downloading of trade data from the UN website. If anyone knows of a better way of doing this, let this weary researcher know quick, or there may be one fewer “Professional brainbox, unfrozen” exported from the UK in future editions of the data.

Here’s how the thickness of our imaginary book of trade data was calculated. Microsoft Word can squeeze 46 lines of fairly dense data onto an A4 page. My copy of Pemnberton & Rau’s “Mathematics for Economists” is 4cm thick and has 700 pages, or 0.0057cm per page. The COMTRADE database has 1.75 billion records, which means it’d need 38 million pages, for a total book thickness of 217,391cm or 217 metres.