Untangling the ridiculogram

Anyone who’s done any work in the vicinity of network science or, more specifically, seen social scientists attempting network science, will have seen plenty of images such as this:


ridiculogram2
Taken from Fagiolo and Mastrorillo (2013)

or this:

ridiculogram
Taken from Adamic and Glance (2005)

which add little scientific input, but merely dazzle the reader with the complexity and sheer magnitude of the networks being analysed. At a recent talk in Cambridge I heard the legendary Mark Newman refer to these network spaghetti-servings as “ridiculograms”. I know researchers who have been asked specifically to produce such diagrams to impress the difficulty of their project onto an adoring lay-crowd.

Well, I’m hoping to do a little more with a network visualisation. It’s not because I’m better than the people who made those diagrams (apologies to the researchers in question; I don’t mean to be rude) but just because I’ve got a lot of time to dedicate to painfully learning the tools we have available for turning spaghetti into scientific knowledge. (Caution: this pasta metaphor may have outlived its usefulness.)

The tool of choice for the modern networks visualiser is called Gephi and it really is one of the n wonders of the open-source software world. Which makes me feel guilty about saying that I hate it, but I do. It’s amazing and brilliant but, at least on Windows 64-bit, it’s buggy as hell, and does all sorts of crazy stuff when you’re least expecting it*. Christ knows that if it were up to me to develop all of open-source, the world would be a much, much worse place.

But here’s my friendly guide to untangling that ridiculogram step-by-step, without losing too much sleep in the process:

Here’s how every network visualisation you’ve ever seen has started life. This is what you get when you import your graph into Gephi:
chinese vehicles step 1
This particular graph happens to be a version of a trade network. Each node is an economic sector in a particular country, and the edges (lines) represent the size of the flow between sectors. The truth is a bit more detailed than that. Actually this graph shows the response of the trade network to a $1M reduction in demand for Chinese vehicles.

The first thing to do is lay those bad-boys out. I’ve gone for the Yifan Hu layout, because it visually separates clusters.
chinese vehicles step 2

This then lays the ground-work for the more visually pleasing “Force Atlas 2” layout. Here I’ve gone for “Dissuade Hubs”, “LinLog mode” and “Prevent Overlap” with Scaling=5.0 and Gravity=4.0:
chinese vehicles step 3

Now let’s colour the nodes by country, to see if the clusters match to countries. It looks like they broadly do (which makes sense because sectors interact most with domestic sectors):
chinese vehicles step 4
To do this in Gephi, go to the Partition tab at the top right and select Nodes. Click on the refresh button and pick the variable you want to partition by. This will assign a random colour to each group.

There are rather too many countries showing here, making the colours all a bit similar, and the clusters not all that clear. Let’s wrestle with Gephi filters. (This is hard, boring and severely counterintuitive.) To filter by country, go to Filters > Attributes > Equal and select ‘country’. To filter for just China, you’d simply enter CHN into the pattern box, but our life is a bit more difficult, because we want to keep a number of countries at once. To do this we use the regular expression ‘or’ operator, which is a vertical bar: ‘|’. So my pattern looks like ‘CHN|JPN|DEU|USA|KOR|FRA|AUS|ITA|GBR|BRA’. Tick the Use regex box and click OK. Now click Filter and the filter will be applied:
chinese vehicles step 5
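If the vertical-bar pattern looks a bit cryptic, here’s a minimal Python sketch of what that filter is doing. The node names are made up for illustration, and I’m assuming the filter has to match the whole country code rather than just part of it; I haven’t checked exactly how Gephi applies the pattern internally.

```python
import re

# The same alternation pattern typed into Gephi's pattern box.
PATTERN = re.compile(r"CHN|JPN|DEU|USA|KOR|FRA|AUS|ITA|GBR|BRA")

# Hypothetical node table: (node id, country code).
nodes = [
    ("CHN_vehicles", "CHN"),
    ("NLD_chemicals", "NLD"),
    ("USA_steel", "USA"),
    ("IND_textiles", "IND"),
]


def keep_node(country: str) -> bool:
    """True when the whole country code equals one of the alternatives."""
    return PATTERN.fullmatch(country) is not None


kept = [node_id for node_id, country in nodes if keep_node(country)]
print(kept)  # ['CHN_vehicles', 'USA_steel']
```

Each `|` simply separates one acceptable value from the next, so adding a country means adding another bar and another code.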

This is starting to look a bit nicer, but we need to resize the nodes to show which are the most important. I usually size by node centrality (basically a measure of how ‘important’ the node is in the network). To do this, go to the Statistics tab, and click Run next to Eigenvector Centrality. This adds the centrality of each node as a property. To set the size of the nodes, go to Ranking at the top-left, and click the red diamond which, for some reason, stands for node size. Select Eigenvector Centrality from the list and click Apply:
chinese vehicles step 6
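For the curious, eigenvector centrality can be sketched in a few lines of Python using power iteration: every node repeatedly takes the sum of the scores of the nodes pointing at it. This is a rough illustration of the idea on a tiny invented graph, not Gephi’s own implementation, and the function name and the scaling (largest score set to 1) are my own choices.

```python
def eigenvector_centrality(adjacency, iterations=100):
    """Power iteration on a weighted adjacency matrix (list of lists).

    adjacency[i][j] is the weight of the edge from node i to node j,
    so a node scores highly when high-scoring nodes point at it.
    """
    n = len(adjacency)
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = [
            sum(adjacency[i][j] * scores[i] for i in range(n))
            for j in range(n)
        ]
        largest = max(new_scores)
        if largest == 0:
            return new_scores  # no edges at all: every score is zero
        # Rescale so the most central node always has a score of 1.
        scores = [s / largest for s in new_scores]
    return scores


# Toy undirected graph: nodes 0, 1, 2 form a triangle, node 3 hangs off node 0.
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(adjacency)
```

Node 0 comes out on top because it has the most neighbours, and node 3 comes out at the bottom because its only neighbour is shared with everyone else.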

So this is ok, but the China nodes (in green) are all so much more significant that everything else is basically invisible. The node sizes can be fine-tuned using the Spline… link. I set my spline like this, which gives some definition to all the big values and allows lots of medium values to come through:
chinese vehicles step 7

Resulting in:
chinese vehicles step 8
which looks much better.
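The effect of the spline is easier to see with numbers. Here’s a small Python sketch that scales values to the range 0 to 1 and then bends them through a concave curve before turning them into sizes. The square-root curve is just a stand-in for whatever shape you draw in the spline editor, and the function name and size limits are invented for the example.

```python
def node_sizes(values, min_size=10.0, max_size=100.0, exponent=0.5):
    """Map raw centrality values to node sizes through a concave curve."""
    lo, hi = min(values), max(values)
    span = hi - lo
    sizes = []
    for v in values:
        t = (v - lo) / span if span else 0.0  # normalise to [0, 1]
        eased = t ** exponent  # exponent < 1 lifts the medium values
        sizes.append(min_size + eased * (max_size - min_size))
    return sizes


sizes = node_sizes([0.0, 0.25, 1.0])
linear = node_sizes([0.0, 0.25, 1.0], exponent=1.0)
```

With a straight-line mapping, the middle node would get a size of 32.5; with the curve it gets 55, which is the “medium values coming through” effect.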

Now time to label the nodes. An almost invisible button at the bottom-right of the screen is actually an up-arrow behind which hides the labelling dialog. (Note that this is definitely the single-worst piece of UI design I’ve ever seen.) If you’re lucky enough to find this, set node labels on and adjust the size slider until you can see them. If the attribute you want for the label isn’t already selected, click Configure… and change it in there.

We now have:
chinese vehicles step 9

We’re almost finished with the layout, but let’s space the clusters out a bit, so we can see what’s going on within countries. The ‘Noverlap’ layout isn’t installed by default, but you can install it easily from the Plugins menu. This is really useful for spreading out clusters of nodes. I ran it with a ratio of 2.0 and a margin of 20.0. I then also ran the Expansion layout followed by the Label Adjust layout. This combination of layouts seems to get everything looking peachy:
chinese vehicles step 10

Now that the layout is complete, we can filter out some of the smaller edges. Edges are great for laying out accurately, but you don’t want to see every one on the finished diagram. To add another filter to the country filter we’ve already got, we need to add it as a subfilter. More crazy counter-intuitiveness. My completed set of filters looked like this:
chinese vehicles step 11
Note that to set the range, you can double-click on the number and type it in. Saves messing with that stupid slider.
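Conceptually, the combined filter is doing something like the Python sketch below: keep an edge only if both of its ends survived the country filter and its weight falls inside the chosen range. The edges, weights and function name here are invented for illustration, and this says nothing about how Gephi nests filters internally.

```python
# Hypothetical edges: (source, target, weight).
edges = [
    ("CHN_vehicles", "CHN_steel", 250.0),
    ("CHN_vehicles", "USA_steel", 3.0),
    ("USA_steel", "CHN_steel", 40.0),
    ("CHN_steel", "NLD_chemicals", 500.0),
]

# Nodes that survived the country filter.
kept_nodes = {"CHN_vehicles", "CHN_steel", "USA_steel"}


def visible_edges(edges, kept_nodes, min_weight, max_weight=float("inf")):
    """Keep edges between surviving nodes whose weight is within the range."""
    return [
        (source, target, weight)
        for source, target, weight in edges
        if source in kept_nodes
        and target in kept_nodes
        and min_weight <= weight <= max_weight
    ]


shown = visible_edges(edges, kept_nodes, min_weight=10.0)
```

The tiny 3.0 edge disappears because of the weight range, and the 500.0 edge disappears because its target was already removed by the country filter.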

Now that the graph is nicely laid out and filtered, it’s time to switch to the Preview pane. This pane doesn’t redraw unless you click Refresh after every change. After selecting Nodes > Show Labels and Edges > Rescale weight, the default preview looks like this, already pretty nice:
chinese vehicles step 13

I’ve changed the font (at least on my system, you have to do this by just typing the name and size into the Font box; I’ve typed “Tahoma 12” here) and massively upped the edge thickness. This means that the biggest flows are ridiculously thick, but it’s a fair trade-off to get some of the smaller flows to show up too:
chinese vehicles step 14

For a few final flourishes, export your preview to an SVG, and open it in a vector-graphics editor. (I’m using the free and totally brilliant Inkscape, but feel free to pay a million pounds for Illustrator.) I’ve used the editor to add some country labels, and move a few of the sector labels to make them more readable and less cluttered. I’ve also deleted a few nodes for visual clarity’s sake. Here’s the finished product. Pretty good I reckon, and certainly a world away from the hairball-style ridiculogram we started with:
chinese vehicles step 15
* Here’s a list of Gephi bugs that have had me smashing my completely innocent keyboard in frustration: 1) When you save your work, then close the application, it worryingly asks you if you want to save. (Which makes me feel that something is amiss.) An insanely large number of times, the resulting file is then corrupted somehow and won’t open. 2) Hand-wrought queries you’ve spent ages writing are not saved, so you have to make ’em from scratch every time. Ugh. 3) The export to SVG option often results in a terse little “NullPointerException” error message and no SVG is produced.

That’s it. Rant more-or-less over.


The most important industry in the world

I’ve been modelling the interconnected nature of the global economy by simulating a reduction in demand for various sectors in various countries. It’s a very simple little piece of analysis:

What would happen if the demand for a given sector in a given country was reduced by a single US dollar?

In answering this question for every sector in every country in the model, you can get a sense of which sectors have the biggest impact on the global economy. Basically you reduce the demand for each sector by a dollar and watch what happens to the rest of the world.

Unexpectedly, perhaps, the most important sector turns out to be the vehicles sector in China. If demand for Chinese vehicles dropped by a single dollar, an unbelievable $98 would be lost in terms of global production. This is a truly astonishing conclusion.

So where does this $98 come from? Well, the interconnectedness of the global economy is behind the magnitude of the number. In short, not only do sectors which feed the Chinese vehicle sector suffer, but so do all the sectors which feed those sectors, and so on through the network that is the global economy. A hint of how complex the picture is, is given by this image (click for full size):

Each circle is a sector in a certain country. The lines between the sectors represent changes in trade between them due to the $1 reduction in demand for Chinese vehicles. The sectors are sized according to how affected they are by the change. (Note for technical types only: they are sized proportional to their eigenvector centrality.)

It goes to show how interconnected the global economy really is. This small change in China has knock-on effects for the US, Japan, Korea, Germany, Italy, the Netherlands… the list goes on and on.
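The “and so on through the network” part can be sketched with a toy example. I’m assuming a standard input-output style setup here, where each dollar of a sector’s output needs a fixed amount of input from other sectors; that is my assumption for illustration, and the two-sector coefficients below are invented, not taken from the model.

```python
def total_output_change(coefficients, demand_change, rounds=200):
    """Add up the direct change plus every round of knock-on effects.

    coefficients[i][j] is the input needed from sector i per dollar of
    sector j's output. The sum d + A d + A^2 d + ... is built round by round.
    """
    n = len(coefficients)
    total = list(demand_change)
    current = list(demand_change)
    for _ in range(rounds):
        # What the sectors hit in the previous round no longer buy.
        current = [
            sum(coefficients[i][j] * current[j] for j in range(n))
            for i in range(n)
        ]
        total = [t + c for t, c in zip(total, current)]
    return total


# Invented two-sector economy; demand for sector 0 falls by one dollar.
coefficients = [
    [0.2, 0.3],
    [0.4, 0.1],
]
change = total_output_change(coefficients, [-1.0, 0.0])
```

In this toy economy the one-dollar cut ends up costing about $1.50 of output in sector 0 and about $0.67 in sector 1, so the total loss is bigger than the original cut even with only two sectors.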

How visualising networks broke my browser

Networks: aren’t they great? The sexiest modelling paradigm around at the moment, and there is no shortage of social science researchers itching to jump on the bandwagon.

Never one to drag my heels, I blogged last week about the attempts by me and my colleagues to bring network science into Economics, and included a fancy graphic to demonstrate how visualising networks can look pretty, and potentially be informative about systems with complex interconnections.

An imagined network of three countries, Red, Blue and Green, using three products, A, B and C internally as intermediate inputs to the production process, and also trading these products with one another.

But the image I included was static, prepared in a piece of open source network analysis software called Gephi (it’s one of those pieces of software that everybody hates, everybody uses, but no one understands). The natural extension to this is an interactive network diagram. Imagine if we could play with the network shown in that picture. How cool would it be to be able to drag the nodes around to see how the network responds?

Well, there is a way; and, in fact, it’s been done many times before. This cool-looking interactive visualisation is by web-visualisation guru Mike Bostock. The guy brings together insane technical skillz (he seems single-handedly to have written the popular javascript visualisation library d3) with an eye for beautiful design that leads to some of the most breath-taking infographics on the web.

His network visualisation uses something called a force-directed graph in which physics equations are used to determine the behaviour of a network. The nodes (drawn as circles) repel each other like charged particles, and the links between the circles act like springs, pulling the nodes back together. This leads to a balanced state where the nodes are as far apart from each other as they can be, under the constraint that they’re attached together with springs of varying strength.
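To make the physics a bit more concrete, here’s a bare-bones Python sketch of one step of such a simulation: every pair of nodes pushes apart, and every link pulls its two ends towards a preferred length. This is my own simplified illustration of the idea, not the algorithm d3 actually uses, and all the parameter names and values are made up.

```python
import math


def force_step(positions, edges, repulsion=1.0, spring=0.1,
               rest_length=1.0, step=0.1):
    """Move every node a small step along the net force acting on it."""
    forces = {node: [0.0, 0.0] for node in positions}
    nodes = list(positions)

    # Every pair of nodes repels, like charged particles.
    for index, a in enumerate(nodes):
        for b in nodes[index + 1:]:
            dx = positions[a][0] - positions[b][0]
            dy = positions[a][1] - positions[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            push = repulsion / dist ** 2
            forces[a][0] += push * dx / dist
            forces[a][1] += push * dy / dist
            forces[b][0] -= push * dx / dist
            forces[b][1] -= push * dy / dist

    # Every link acts like a spring between its two ends.
    for a, b in edges:
        dx = positions[b][0] - positions[a][0]
        dy = positions[b][1] - positions[a][1]
        dist = math.hypot(dx, dy) or 1e-9
        pull = spring * (dist - rest_length)
        forces[a][0] += pull * dx / dist
        forces[a][1] += pull * dy / dist
        forces[b][0] -= pull * dx / dist
        forces[b][1] -= pull * dy / dist

    return {
        node: (positions[node][0] + step * forces[node][0],
               positions[node][1] + step * forces[node][1])
        for node in positions
    }


start = {"a": (0.0, 0.0), "b": (3.0, 0.0)}
linked = force_step(start, edges=[("a", "b")])
unlinked = force_step(start, edges=[])
```

Run the step over and over and the layout settles down: linked nodes that start far apart drift together, while unlinked ones keep drifting apart.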

The network shown in Mike Bostock’s example is pretty simple, but it struck me as a great way to visualise my network of networks. Here’s an example. This is Great Britain’s economy in 2009. Each circle is a sector of the economy, and a link between two sectors shows the extent to which one sector sold goods to the other in that year. For simplicity, most of the smaller links have been filtered out (otherwise, the whole thing is a tangled mess!)

This is great: the sectors are circles, with the bigger circles being the bigger sectors overall, and the connections between the circles being the value of the goods sold from one sector to another. The thicker the line, the more goods were sold.

But there’s a key piece of information missing from this way of viewing the network: the flows between sectors have direction, that is to say, it matters that sector A sold £100 worth of stuff to sector B, rather than the other way around. So how to visualise the network in a way that emphasises the directionality of the links as well as the size?

We could try putting arrows on the ends of the links, right? Mike Bostock has thought of this already of course, and has a simple example here. But the problem is that the circles in his example are a fixed size. If the circles were bigger, the arrows would get hidden underneath them. How to place the arrows when the circles are all different sizes and the line connecting them is ‘bendy’ is an ‘unpleasant’ maths problem.
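For a straight link, at least, the maths is manageable: stop the line at the edge of the target circle instead of at its centre. Here’s a minimal Python sketch of that easy case (the function name is mine). It does nothing for the bendy links, which is where the unpleasantness starts.

```python
import math


def arrow_tip(source, target, target_radius):
    """Point where a straight link from source meets the target circle."""
    dx = target[0] - source[0]
    dy = target[1] - source[1]
    dist = math.hypot(dx, dy)
    if dist <= target_radius:
        # The source sits inside the target circle, so there is no
        # visible stretch of line to put an arrowhead on.
        return source
    scale = (dist - target_radius) / dist
    return (source[0] + dx * scale, source[1] + dy * scale)


tip = arrow_tip((0.0, 0.0), (10.0, 0.0), target_radius=2.0)
```

Here the link runs 10 units along the x-axis and the target circle has a radius of 2, so the arrowhead should sit 8 units along.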

How to place an arrow when circle sizes differ

I wrestled with putting arrows on the lines for a while before abandoning the project altogether. Then, after some skilful Googling (as vital to the 21st century citizen as reading and writing were to citizens of previous centuries) I came across this from Mike Bostock’s website:

Making a gradient follow a path

With this idea, I could make each end of the links a different colour with, say, red being the seller’s end, and green being the buyer’s end. On a very small subset of my UK 2009 economy network, things seem to work pretty well:

but the computational overhead is massive. Each line in this network is really a group of around 30 little pieces of line, each with its own colour, creating the effect of a smooth transition from green to red. That means that the browser has to work much harder than it otherwise would. This approach scales very poorly. Here’s a slightly more filled out network (these videos are real-time captures of my browser’s output):

Although the network is still a tiny fraction of the complete picture, things are already starting to slow down. Finally, just to really push things to the limit, here’s the network as shown in the very first video in this post. As you can see, although the resulting network looks “pretty cool” (for which read, mind-bogglingly complex) my browser has basically ceased to function. It takes around ten seconds to process each frame of the animation.
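The reason for the slowdown is easy to see if you sketch the segment trick in code. Here’s a simplified Python version for a straight link, with red at the seller’s end fading to green at the buyer’s end. The function name and colour values are my own, and the real thing works on curved paths in the browser rather than straight lines in Python.

```python
def gradient_segments(seller, buyer, seller_colour=(255, 0, 0),
                      buyer_colour=(0, 128, 0), pieces=30):
    """Chop one link into short pieces, each with its own blended colour."""
    segments = []
    for k in range(pieces):
        t0 = k / pieces
        t1 = (k + 1) / pieces
        mid = (t0 + t1) / 2  # colour is sampled at the middle of the piece
        start = (seller[0] + (buyer[0] - seller[0]) * t0,
                 seller[1] + (buyer[1] - seller[1]) * t0)
        end = (seller[0] + (buyer[0] - seller[0]) * t1,
               seller[1] + (buyer[1] - seller[1]) * t1)
        colour = tuple(
            round(a + (b - a) * mid)
            for a, b in zip(seller_colour, buyer_colour)
        )
        segments.append((start, end, colour))
    return segments


segments = gradient_segments((0.0, 0.0), (30.0, 0.0))
```

One link becomes 30 separate shapes, so a network with a thousand links means thirty thousand things for the browser to redraw on every frame.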

So it looks like the colouring of the links is not workable. Watch this space for more updates as I try different methods for showing a big network with directed links.