How visualising networks broke my browser

Networks: aren’t they great? The sexiest modelling paradigm around at the moment, and there is no shortage of social science researchers itching to jump on the bandwagon.

Never one to drag my heels, I blogged last week about the attempts by me and my colleagues to bring network science into Economics, and included a fancy graphic to demonstrate how visualising networks can look pretty, and potentially be informative about systems with complex interconnections.

An imagined network of three countries, Red, Blue and Green, using three products, A, B and C internally as intermediate inputs to the production process, and also trading these products with one another.

But the image I included was static, prepared in a piece of open source network analysis software called Gephi (it’s one of those pieces of software that everybody hates, everybody uses, but no one understands). The natural extension to this is an interactive network diagram. Imagine if we could play with the network shown in that picture. How cool would it be to be able to drag the nodes around to see how the network responds?

Well, there is a way; and, in fact, it’s been done many times before. This cool-looking interactive visualisation is by web-visualisation guru Mike Bostock. The guy brings together insane technical skillz (he seems single-handedly to have written the popular JavaScript visualisation library d3) with an eye for beautiful design that leads to some of the most breath-taking infographics on the web.

His network visualisation uses something called a force-directed graph in which physics equations are used to determine the behaviour of a network. The nodes (drawn as circles) repel each other like charged particles, and the links between the circles act like springs, pulling the nodes back together. This leads to a balanced state where the nodes are as far apart from each other as they can be, under the constraint that they’re attached together with springs of varying strength.
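As a rough illustration of the physics described above, here is a minimal Python sketch of a force-directed layout. It is not d3's implementation: the function name `force_layout`, the constants (repulsion, spring stiffness, step size, the cap on movement per step) and the toy three-node graph are all assumptions made purely for illustration.

```python
import math
import random


def force_layout(nodes, links, steps=300, repulsion=1.0, spring=0.1,
                 rest_length=1.0, step_size=0.05, max_move=0.1, seed=0):
    """Toy force-directed layout: nodes repel, links pull like springs.

    nodes: list of node names
    links: list of (source, target, strength) tuples
    Returns a dict mapping each node to an (x, y) position.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels, like charged particles.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Each link acts like a spring pulling its two ends together.
        for a, b, strength in links:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = spring * strength * (dist - rest_length)
            force[a][0] += pull * dx / dist
            force[a][1] += pull * dy / dist
            force[b][0] -= pull * dx / dist
            force[b][1] -= pull * dy / dist
        # Move each node a little way along its net force, capped per step.
        for n in nodes:
            mx = step_size * force[n][0]
            my = step_size * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return {n: tuple(p) for n, p in pos.items()}


layout = force_layout(["A", "B", "C"], [("A", "B", 1.0), ("B", "C", 0.5)])
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The pairwise repulsion loop is the expensive part: its cost grows with the square of the number of nodes, which is one reason big networks are hard work for a browser.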

The network shown in Mike Bostock’s example is pretty simple, but it struck me as a great way to visualise my network of networks. Here’s an example. This is Great Britain’s economy in 2009. Each circle is a sector of the economy, and a link between two sectors shows the extent to which one sector sold goods to the other in that year. For simplicity, most of the smaller links have been filtered out (otherwise, the whole thing is a tangled mess!)

This is great: the sectors are circles, with the bigger circles being the bigger sectors overall, and the connections between the circles being the value of the goods sold from one sector to another. The thicker the line, the more goods were sold.

But there’s a key piece of information missing from this way of viewing the network: the flows between sectors have direction, that is to say, it matters that sector A sold £100 worth of stuff to sector B, rather than the other way around. So how to visualise the network in a way that emphasises the directionality of the links as well as the size?

We could try putting arrows on the ends of the links, right? Mike Bostock has thought of this already of course, and has a simple example here. But the problem is that the circles in his example are a fixed size. If the circles were bigger, the arrows would get hidden underneath them. How to place the arrows when the circles are all different sizes and the line connecting them is ‘bendy’ is an ‘unpleasant’ maths problem.

How to place an arrow when circle sizes differ
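For the easier case of a straight link, the arrow tip can sit where the line meets the edge of the target circle. The sketch below assumes straight links only (the bendy case described above is the unpleasant one and is not covered here), and `arrow_tip` is a hypothetical helper name.

```python
import math


def arrow_tip(source, target, target_radius):
    """Point where a straight link from source meets the target circle's edge."""
    sx, sy = source
    tx, ty = target
    dist = math.hypot(tx - sx, ty - sy)
    if dist <= target_radius:
        raise ValueError("source centre lies inside the target circle")
    # Step back from the target's centre along the link by one radius.
    ratio = (dist - target_radius) / dist
    return (sx + (tx - sx) * ratio, sy + (ty - sy) * ratio)


# A big circle and a small circle at the same distance get different tips.
print(arrow_tip((0, 0), (10, 0), 3))
print(arrow_tip((0, 0), (10, 0), 1))
```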

I wrestled with putting arrows on the lines for a while before abandoning the project altogether. Then, after some skillful Googling (as vital to the 21st century citizen as reading and writing were to citizens of previous centuries) I came across this from Mike Bostock’s website:

Making a gradient follow a path

With this idea, I could make each end of the links a different colour with, say, red being the seller’s end, and green being the buyer’s end. On a very small subset of my UK 2009 economy network, things seem to work pretty well:

But the computational overhead is massive. Each line in this network is really a group of around 30 little pieces of line, each with its own colour, creating the effect of a smooth transition from green to red. That means that the browser has to work much harder than it otherwise would have to. This approach scales very poorly. Here’s a slightly more filled out network (these videos are real-time captures of my browser’s output):
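To see where the overhead comes from, here is a minimal Python sketch of the idea: chop each link into short pieces and give each piece its own interpolated colour. It assumes straight links and plain RGB interpolation, and the names `gradient_segments` and `elements_to_draw` are hypothetical. It is not the code behind the videos.

```python
def gradient_segments(start, end, n_segments=30,
                      start_rgb=(255, 0, 0), end_rgb=(0, 128, 0)):
    """Split a straight link into short pieces, each with its own colour.

    start is the seller's end (red), end is the buyer's end (green).
    Returns a list of (point_from, point_to, rgb) tuples.
    """
    segments = []
    for i in range(n_segments):
        t0 = i / n_segments
        t1 = (i + 1) / n_segments
        mid = (t0 + t1) / 2
        p0 = tuple(a + (b - a) * t0 for a, b in zip(start, end))
        p1 = tuple(a + (b - a) * t1 for a, b in zip(start, end))
        # Colour each piece by its midpoint's position along the link.
        rgb = tuple(round(a + (b - a) * mid) for a, b in zip(start_rgb, end_rgb))
        segments.append((p0, p1, rgb))
    return segments


def elements_to_draw(n_links, n_segments=30):
    """Number of separate line pieces the browser must redraw every frame."""
    return n_links * n_segments


print(len(gradient_segments((0, 0), (100, 50))))
print(elements_to_draw(500))
```

Every link costs 30 drawn elements rather than one, and all of them have to be repositioned on each frame of the animation.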

Although the network is still a tiny fraction of the complete picture, things are already starting to slow down. Finally, just to really push things to the limit, here’s the network as shown in the very first video in this post. As you can see, although the resulting network looks “pretty cool” (for which read, mind-bogglingly complex) my browser has basically ceased to function. It takes around ten seconds to process each frame of the animation.

So it looks like the colouring of the links is not workable. Watch this space for more updates as I try different methods for showing a big network with directed links.

Putting Networks into Economics: A Manifesto

The social sciences are buzzing with excitement about network analysis: here at UCL’s Centre for Advanced Spatial Analysis (CASA), networks are being used in ways ranging from the intuitive, such as modelling traffic flows around road networks and patterns in the use of social networking, to the highly counter-intuitive, such as studying where and when crimes occur in relation to other crimes, how rioters and the police play out a networked game of cat and mouse, or how ethnic conflicts are affected by access to, and use by the government of, media both new and traditional.

But Economics has been late to the game. Eric Beinhocker has written forcefully about the need for Economics to adapt, by doing nothing more controversial than simply acknowledging the existence of the Second Law of Thermodynamics (the law of ever-increasing disorder, pithily summarised by Jagger/Richards as “you can’t always get what you want”) and embracing networks and the science associated with them: complexity.

The cost at which a particular government can borrow from investors is clearly related to how convinced the money markets are that other governments represent a safe bet, yet borrowing costs are routinely studied using only the variables of the country being studied. Similarly, you can’t expect to explain movements in macroeconomic variables such as exchange rates or foreign trade, without including the other countries in your analysis. It is this kind of old-world thinking, regress A against B to see if it has ‘an effect’, that has led Economics to consistently misunderstand the world as it really is today. (And, indeed, as it always has been.)

John Stuart Mill, in motivating previous generations of economists, described the booms and busts of macroeconomics as being like a stormy sea tossing and rolling a boat: “Would you advise those who go to sea to deny the wind and the waves — or to make use of them and to find the means of guarding against their dangers?” This is an analogy which seems immediately to require a global focus to the study of Economics, and an approach to the systems and interactions involved which is meteorological in scale. But too often we see attempts to predict the movement of the ship based on the actions of the crew, rather than an acknowledgement that bigger forces are at work.

Economic stormy seas: who is really in control here?
Image: James E. Buttersworth [Public domain], via Wikimedia Commons

So to the problem of applying the lessons of complexity to the global economic system: we must start with a network upon which to operate. But the systems involved are intimidatingly large. The European Union, the globe’s biggest single market, represents the economic activity of some 500 million inhabitants trading goods and services with a value of around $16 trillion. Any practicable representation of a network of systems of this complexity, and the global economy is surely the system of maximum-conceivable complexity, would be such a gross simplification as to iron out all of the interactions, chain reactions, bifurcations and time-lag effects which could potentially make a model useful for explaining or forecasting global events. And yet we must begin somewhere.

As with the meteorologist, the place to begin must surely be with data. Given the dismal state of Economics as a tool for predicting even the simplest of human interactions, let alone something as subtle and sensitive as the famous but little-understood “market outcome”, we need to build inductive models based on as much data as we can gather. An inductive model is one which is driven, to the greatest extent possible, by what can actually be observed without relying on assumptions or postulated mechanisms through which one variable affects another.

So how can a network representation of the global economy be put together in a way which is rich in data and light on assumptions? Perhaps we might continue the meteorological analogy: just as the weather is something greater than the movement of warm and cold air around the world, and yet can be described and predicted by knowing about how, where and when the air moves, so the economy is a function of, but something more than, the global flow of goods and services. And just as storms and heatwaves are predicted by weathermen watching airflows, the phenomena of interest to the economist, wealth and growth, poverty and inequality, might be studied by watching these goods and services move around the world. Note that there’s no assumption about causality in weather forecasting, or any statement about the mechanisms by which certain air flows result in certain weather patterns. Forecasters simply trust the data to include these implicit relationships without ever attempting to specify them. This is why an inductive model like this is distinct from a deductive one, where theories are postulated to explain mechanisms and causal links.

We can start by describing the flow of goods and services around the global economy as taking place at just two distinct levels: within an economy, and between economies. These two levels can each be characterised by a simple question.

Within an economy: to what extent does the production of goods and services, the level of which is set in response to demand from consumers, require the production of other goods and services as inputs to the production process? The answer to this question can be represented by a network of dependency between the various goods and services an economy produces. Some goods require a great deal of input from other sectors of the economy, others are almost independent.

Between economies: to what extent does trade in goods and services occur between economies which produce a product, and economies which consume it? The answer again comes in the form of a network of trading relationships. Some countries have a history of trading certain products with one another, and other countries are largely self-sufficient. Critically, in both cases, the structure of the network, that is to say the relative importance of each interconnection, can be derived purely from data.

A network of networks
A network representation of trade: “it’s complicated.” Image: me and my amazing skills

In both cases, we build a network representation of flows which takes the available data as a starting point, and takes that data seriously. It is the data which defines which countries are linked to which other countries, and the data which defines the relative significance of those links. Where traditional economic models attempt to say something insightful about the high levels of trade between, say, France and its former colonies, here we let the data do the talking. If France trades more with Senegal, a former French colony, than it does with Ethiopia, a former Italian colony, then the data will reflect this, and the network which is built on that data will include the bias in its construction, along with all other biases too numerous to hope to describe explicitly. We are taking the observable state of the trade network as the best possible proxy for all the subtleties and irrationalities of human interaction and, in doing so, explicitly abandon the attempt to include such unknowables in our model. It is in this sense that we are taking the data seriously.

Even with nothing but a static description of the world’s economy as viewed through the observed networks of production and trade, we can perform some interesting analysis. For example, which trade link (that is to say, which product, traded between which two countries) contributes most to global output? The answer may not be simply the trade link with the highest dollar value. For example, Belgium provides around a quarter of France’s imported iron and steel. If those two countries had a falling out which caused them to stop trading altogether then, all things being equal, France would have 25% less imported steel to use in manufacturing other goods. The French car industry uses 75% domestic metals and 25% imported metal, meaning that it would lose a quarter of a quarter of its metal, and French output of cars would be down around 6.25%. Since 50% of all cars produced in France are exported, and France exports over $20 billion-worth of cars a year, the spat between Belgium and France would cost the world over $1 billion in lost car exports alone.
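The chain of shares in this example can be multiplied through directly. The sketch below takes the quoted figures at face value and assumes that car output falls in exact proportion to the metal lost and that exports fall in proportion to output. The variable names are illustrative. On those assumptions the drop in car output comes to about 6.25% and the lost exports to roughly $1.25 billion, in line with the "over $1 billion" figure.

```python
# Shares quoted in the text, taken at face value.
belgium_share_of_imported_steel = 0.25   # a quarter of France's imported iron and steel
imported_share_of_car_metal = 0.25       # car industry: 75% domestic, 25% imported
car_exports_usd = 20e9                   # "over $20 billion-worth of cars a year"

# Assumption: output falls one-for-one with the share of metal lost.
lost_metal_share = belgium_share_of_imported_steel * imported_share_of_car_metal
lost_car_exports_usd = lost_metal_share * car_exports_usd

print(f"Car output down by {lost_metal_share:.2%}")
print(f"Lost car exports: ${lost_car_exports_usd / 1e9:.2f} billion")
```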

The above simple example is constructed using trade data from the UN, and a sector-by-sector description of France’s economy, produced by the European Union, including which products are required to produce cars and how much of each of those products is imported. By combining these two data sets we build a picture of the movement of goods and services not only through economies but between economies and, hence, around the world. By adding some simple linearity assumptions—to make 1% more cars, France needs 1% more steel—we can even begin to do some dynamic analysis of the global economy, to answer questions such as: how will the world respond to a growth in demand for cars coming from China? Or: which products would most benefit Nigeria if imports were replaced with domestic production? And how would that affect Nigeria’s trading partners?
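A toy two-sector example shows what the linearity assumption buys. This is a minimal sketch of a standard input-output style calculation, which is an assumption about the kind of model meant here. The coefficients and demand figures are invented for illustration and are not taken from the UN or EU data, and `total_output` is a hypothetical name.

```python
def total_output(coefficients, final_demand, rounds=200):
    """Total output needed to meet final demand, including intermediate inputs.

    coefficients[i][j] is the amount of product i needed to make one unit of
    product j. Each round adds the inputs required by the previous round's
    output, i.e. it repeats x = A x + d until the numbers settle.
    """
    sectors = list(final_demand)
    output = dict(final_demand)
    for _ in range(rounds):
        output = {
            i: final_demand[i]
            + sum(coefficients[i][j] * output[j] for j in sectors)
            for i in sectors
        }
    return output


# Invented coefficients: one unit of cars needs 0.3 units of steel, and so on.
coefficients = {
    "steel": {"steel": 0.10, "cars": 0.30},
    "cars": {"steel": 0.00, "cars": 0.05},
}
base = total_output(coefficients, {"steel": 10.0, "cars": 100.0})
grown = total_output(coefficients, {"steel": 10.0, "cars": 110.0})

for sector in base:
    print(f"{sector}: {base[sector]:.1f} -> {grown[sector]:.1f}")
```

Because everything is linear, extra demand for cars feeds straight through to extra steel. The same logic chains across borders once the trade network is attached.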

We describe this framework, a network of networks based on observable data, as a Global Demonstration Model: it is a least-assumptions description of the world’s economy designed to give a demonstration of the power of network and complexity science when applied to questions which are inescapably global in nature, such as those concerning migration, trade, international security and development aid. More subtle and realistic models are of course possible, but the more interpretation and insight we add to the model, the further we get from the data.

Here at CASA we’re currently assembling the data that will allow us to build our global network of networks. We expect the first version of our demonstration model, and the first of the demonstration analyses based on it, to be completed in the autumn.

I’ll be posting the results of these explorations here on this blog. Stay tuned…

Oxfam, the ODI and the open access debate

Anyone following either of the development world’s Twitter stalwarts, Duncan Green at Oxfam and Owen Barder at the Centre for Global Development (CGDev), may have been confused by the torrent of acronyms coming out of yesterday’s burst of activity regarding open access to journals. (These guys are so prolific, and entertaining, on Twitter that you wonder how they have time to do anything else!)


My short summary of the situation is this (gents, correct me if any of this is wrong):

An NGO called the Overseas Development Institute (ODI) publish a couple of peer-reviewed journals, the kind of thing that is important for academics to get their results taken seriously. In brief, once a piece of research is in a peer-reviewed journal, it can effectively be treated as ‘respectable’ by the rest of the academic community: other researchers are ‘allowed’ to take it as the starting point for their own research. (Obviously they can disagree that the conclusions of the research are valid, but there’s less scope for arguing that the research itself is meaningless.) The two journals published by the ODI are called Development Policy Review (DPR) and Disasters.

If you want to read the articles in either of these ODI journals, or any other journal not explicitly published as ‘open access’, you need to (a) be a member of an academic institution which has paid the publisher for access rights, (b) be from one of a particular set of developing countries which have a special waiver of the access fee, or (c) pay for the article yourself. (In my experience, they are around $50 a pop.)

Access to the latest research by as many people as possible (and not just other full-time paid brainboxes) is generally considered a “good thing”, and it’s over the ODI’s open access policy that Green and Barder have been getting so excited on Twitter. The ODI’s line on open access is that they are pro the idea, but that the journals are actually published by a third party so they don’t actually have the right to make the articles available on their website.

And the debate is not purely one-sided. While open access sounds like a great idea, there are costs involved in publishing, as with any other curated service. Both the publisher, Wiley, and the ODI themselves have costs to cover in producing research which is of high quality and which, more importantly, is double-checked by the eyes of at least two experts in the field. This principle of peer review is at the very heart of the scientific process (which can be summarised thus: clever person does experiment, other clever people check what they’ve done makes sense, future generations of clever people can build on this work without having to verify all the results themselves).

So watch out for Green’s and Barder’s joint blog rant about open access, and the debate around the funding of and access to science which will follow.

Guinea-Bissau imported 500,000 litres of ‘Ice, Snow and Potable Water’ in 2003

So, like, trade data, yeah? It’s great isn’t it? All the world’s trade in tangible products, itemised by zealous customs officials, and browsable by the idle researcher, thumbing through a Who’s Who-style almanac of fascinating trade numbers, discovering little gems in the diamond stats here, unusual themes of the international art market there. A joy.

Or so you’d think. The reality is significantly more painful than my sepia-tinted dusty old office with copies of the UN’s famous COMTRADE database lying around like old Yellow Pages waiting to be perused.

What you actually have to do if you want, in a quiet moment between world-saving academic discoveries, to know of the trade patterns of the world is use this horrible-looking and overloaded website. Somehow, you’d think the world’s most glamorous and most-studied database would look more, well, bling. But there it is. It also operates incredibly slowly. Here’s an entertaining stat from the website itself:

The blistering speed demonstrated by the UN's flagship data product. COMTRADE returns a single row in just 20 seconds.

The database boasts close to two billion records. This means that if a densely typed book were produced with one trade record per line, the book would be around 2.2 kilometres thick (see the bottom of this post for the calculation). In order that the UN doesn’t spend all its limited resources on server power, they’ve limited the queries you can submit to the database to be those that would return fewer than 50,000 records. So you can’t just ask: “how much stuff does the UK export to the rest of the world?” because, with around 6,300 product categories, 200-odd countries, and around fifty years of data, you quickly hit that ceiling.

The maximum will be relaxed if you contribute to server costs: for a mere US$1,000 the limit is upped to 50 million records. This means that, in principle at least, you could download the entire database in just 35 queries. But how to put those queries together? We can select “all products” and “all years” and then a random bunch of countries, in the hope that the limit won’t be exceeded. But it’s impossible to know a priori how many countries will fit into a single 50 million-record query.
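The two limits can be checked with some back-of-envelope Python. The first figure is only an upper bound: it assumes one record per product, partner country and year, which real trade data will not reach because most products are not traded with most partners every year. The 1.75 billion record count is the one used in the book calculation at the bottom of this post, and the variable names are illustrative.

```python
product_categories = 6_300
partner_countries = 200
years = 50

free_limit = 50_000
paid_limit = 50_000_000
total_records = 1_750_000_000

# Upper bound on the rows one reporting country could generate.
max_rows_one_country = product_categories * partner_countries * years
print(f"Possible rows for one country: {max_rows_one_country:,}")
print(f"Times over the free limit: {max_rows_one_country // free_limit:,}")

# Fewest queries needed if every paid query came back completely full.
min_paid_queries = -(-total_records // paid_limit)  # ceiling division
print(f"Minimum paid queries for the whole database: {min_paid_queries}")
```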

So I decided to do things the ‘brute force’ way: no single country exceeds the 50 million record data limit (as far as I can tell) so by submitting queries country by country, I should be able safely to avoid the ceiling. But this is still a tedious process for 200 countries: queries must be submitted via click-boxes on the website (which is painfully slow running as I’ve mentioned) and then, once the query is ready, an email is sent and you go back to the website to download a file containing the data. This file must be named appropriately (by hand) and saved somewhere appropriate before being uploaded to our data server. Keeping track of which countries you’ve submitted, which are ready, which you’ve downloaded and which uploaded to the database is a painful process.

So you can imagine my horror (or, if you can’t, think blood draining from face, dry mouth, bulging eyes, exploding brain) upon discovering that, 75 countries into this long, boring process, I’ve been asking the server for the wrong pieces of information.

Instead of the dollar value of each transaction, I’ve ended up with quantity of a product traded. This means I now know, for example, that the Bahamas traded nine live horses with the US, but not how much those horses were worth. I also know that Swaziland bought 10 kilos of used postage stamps from South Africa, but not how much they spent buying them.

For aggregation purposes, this information is utterly useless. What is the total export value of the Solomon Islands? Well, it’s 70 tons of “Ornamental fish, live”, plus 19 kg of edible offal, plus 317 tons of “Palm kernel or babassu oil, crude”. It’s just not going to work.

So it’s back to square one with the downloading of trade data from the UN website. If anyone knows of a better way of doing this, let this weary researcher know quick, or there may be one fewer “Professional brainbox, unfrozen” exported from the UK in future editions of the data.

Here’s how the thickness of our imaginary book of trade data was calculated. Microsoft Word can squeeze 46 lines of fairly dense data onto an A4 page. My copy of Pemberton & Rau’s “Mathematics for Economists” is 4cm thick and has 700 pages, or 0.0057cm per page. The COMTRADE database has 1.75 billion records, which means it’d need 38 million pages, for a total book thickness of 217,391cm or about 2,174 metres.
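For anyone who wants to check the sums, here is the same calculation as a few lines of Python, using only the figures quoted above (the variable names are illustrative). It comes out at about 217,391cm, which is roughly 2,174 metres.

```python
lines_per_page = 46          # dense lines Word fits on an A4 page
book_thickness_cm = 4        # thickness of the reference textbook
book_pages = 700             # pages in the reference textbook
records = 1.75e9             # records in the COMTRADE database

cm_per_page = book_thickness_cm / book_pages
pages_needed = records / lines_per_page
thickness_cm = pages_needed * cm_per_page
thickness_m = thickness_cm / 100

print(f"Thickness per page: {cm_per_page:.4f} cm")
print(f"Pages needed: {pages_needed:,.0f}")
print(f"Book thickness: {thickness_cm:,.0f} cm, or {thickness_m:,.0f} metres")
```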