Gineau-Bissau imported 500,000 litres of ‘Ice, Snow and Potable Water’ in 2003

So, like, trade data, yeah? It’s great isn’t it? All the world’s trade in tangible products, itemised by zealous customs officials, and browsable by the idle researcher, thumbing through a Who’s Who-style almanac of fascinating trade numbers, discovering little gems in the diamond stats here, unusual themes of the international art market there. A joy.

Or so you’d think. The reality is significantly more painful than my sepia-tinted dusty old office with copies of the UN’s famous COMTRADE database lying around like old Yellow Pages waiting to be perused.

What you actually have to do if you want, in a quiet moment between world-saving academic discoveries, to know of the trade patterns of the world is use this horrible-looking and overloaded website. Somehow, you’d think the world’s most glamorous and most-studied database would look more, well, bling. But there it is. It also operates incredibly slowly. Here’s an entertaining stat from the website itself:

The blistering speed demonstrated by the UN's flagship data product. COMTRADE returns a single row in just 20 seconds.
The blistering speed demonstrated by the UN's flagship data product. COMTRADE returns a single row in just 20 seconds.

The database boasts close to two billion records. This means that if a densely typed book were produced with one trade record per line, the book would be around 217 metres thick (See the bottom of this post for the calculation). In order that the UN doesn’t spend all its limited resources on server power, they’ve limited the queries you can submit to the database to be those that would return fewer than 50,000 records. So you can’t just ask: “how much stuff does the UK export to the rest of the world?” because, with around 6,300 product categories, 200-odd countries, and around fifty years of data, you quickly hit that ceiling.

The maximum will be relaxed if you contribute to server costs: for a mere $1,000US the limited is upped to 50 million records. This means that, in principle at least, you could download the entire database in just 35 queries. But how to put those queries together? We can select “all products” and “all years” and then a random bunch of countries, in the hope that the limit won’t be exceeded. But it’s impossible to know a priori how many countries will fit into a single 50 million-record query.

So I decided to do things the ‘brute force’ way: no single country exceeds the 50 million record data limit (as far as I can tell) so by submitting queries country by country, I should be able safely to avoid the ceiling. But this is still a tedious process for 200 countries: queries must be submitted via click-boxes on the website (which is painfully slow running as I’ve mentioned) and then, once the query is ready, an email is sent and you go back to the website to download a file containing the data. This file must be named appropriately (by hand) and saved somewhere appropriate before being uploaded to our data server. Keeping track of which countries you’ve submitted, which are ready, which you’ve downloaded and which uploaded to the database is a painful process.

So you can imagine my horror (or, if you can’t, think blood draining from face, dry mouth, bulging eyes, exploding brain) upon discovering that, 75 countries in to this long, boring process, I’ve been asking the server for the wrong pieces of information.

Instead of the dollar value of each transaction, I’ve ended up with quantity of a product traded. This means I now know, for example, that the Bahamas traded nine live horses with the US, but not how much those horses were worth. I also know that Swaziland bought 10 kilos of used postage stamps from South Africa, but not how much they spent buying them.

For aggregation purposes, this information is utterly useless. What is the total export value of the Solomon Islands? Well, it’s 70 tons of “Ornamental fish, live”, plus 19 kg of edible offal, plus 317 tons of “Palm kernel or babassu oil, crude”. It’s just not going to work.

So it’s back to square 1 with the downloading of trade data from the UN website. If anyone knows of a better way of doing this, let this weary researcher know quick, or there may be one fewer “Professional brainbox, unfrozen” exported from the UK in future editions of the data.

Here’s how the thickness of our imaginary book of trade data was calculated. Microsoft Word can squeeze 46 lines of fairly dense data onto an A4 page. My copy of Pemnberton & Rau’s “Mathematics for Economists” is 4cm thick and has 700 pages, or 0.0057cm per page. The COMTRADE database has 1.75 billion records, which means it’d need 38 million pages, for a total book thickness of 217,391cm or 217 metres.

Advertisements

Author: Rob Levy

Economist at NEF. Former teaching Fellow in Economics at UCL and Bristol University. Recently submitted my PhD. We'll see what happens...

2 thoughts on “Gineau-Bissau imported 500,000 litres of ‘Ice, Snow and Potable Water’ in 2003”

  1. I feel your pain old chap. Gave up on this database earlier today. It might save some time to pre-calculate all the query URLs and run through them individually – that way you only need to click the ‘download’ button to retrieve the csv..

    Like

  2. An entertaining addendum:
    I’ve discovered that there are some products in the UN data whose product descriptions include double-quotes. Why is this a problem?

    Well, the data arrives at my computer in comma-separated variables (CSV) format and double-quotes have a special meaning in this format. They’re a way of telling the computer, this comma isn’t really a comma. Look at this, for example:

    name, description, quantity
    Chair, It’s a chair, 3
    Table, Just a normal table, 10
    Furniture, “Everything else, including ashtrays”, 20

    The double-quotes in the last line tell the computer that “Everything else, including ashtrays” is all one piece of information, and that the comma doesn’t mean “move on to the next piece of information” like it normally does.

    There are 14 products in the UN data product list whose descriptions contain double-quotes, which totally buggers the CSV format up.

    A particularly amusing example is product code 9507, described as:
    Fishing rods, fish-hooks and other line fishing tackle; fish landing nets, butterfly nets and similar nets; decoy “birds” (other than those of heading 92.08 or 97.05) and similar hunting or shooting requisites.

    Like

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s