Finally, a Python workflow that actually works

I do all my exploratory data work in Jupyter Notebook. It’s an amazing mix of nicely-formatted text, syntax-highlighted code and clear outputs. I love it:


But it’s not the complete package. I often want to peek at the results of a calculation I’ve done, just to verify to myself that I know what I’m doing. Here’s a typical example:


I don’t really want to print out the results of this calculation. I just want to peek at it to verify that it makes sense.

I’ve finally found a dream way of giving myself both things at once: a nice-looking record of what I’ve done, with the ability to easily run chunks of code again and again, and a way to peek at data and outputs in a forgettable, scroll-off-the-top-of-the-screen way.

It involves starting a Jupyter Notebook, then connecting an IPython console to the same kernel. This means that the IPython console and the Jupyter Notebook are seeing the same Python instance: the same variables, the same packages, the same function definitions. Here’s that example again, but this time the Notebook doesn’t report the result of the calculation; it just does it. I’m then peeking at the result in a neighbouring console window:


To get this to work, you just copy/paste the long string of numbers and letters after “Kernel started” in the Jupyter Notebook window (here it’s 561204fc-c49f…):


Then, in your favourite terminal, type:

jupyter console --existing 561204fc-c49f-44a7-abf2-902535573282

You’ll now have an IPython console running off the same kernel. Wicked.
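If you’d rather not copy the ID by hand, a small helper can build the command for you. This is only a minimal sketch: it assumes the kernel’s connection file is named kernel-<id>.json (the usual pattern for kernels started from the Notebook), and the function names and the example path are made up for illustration.

```python
import re
from pathlib import Path


def kernel_id_from_connection_file(path):
    """Pull the kernel ID out of a connection file name like kernel-<id>.json."""
    # Assumption: the file follows the kernel-<id>.json naming pattern.
    match = re.fullmatch(r"kernel-(.+)\.json", Path(path).name)
    if match is None:
        raise ValueError(f"not a kernel connection file: {path}")
    return match.group(1)


def console_command(kernel_id):
    """Build the terminal command that attaches a console to that kernel."""
    return f"jupyter console --existing {kernel_id}"


# Hypothetical path, using the kernel ID from the example above.
example_path = "/some/runtime/dir/kernel-561204fc-c49f-44a7-abf2-902535573282.json"
command = console_command(kernel_id_from_connection_file(example_path))
print(command)
```

Inside a Notebook cell, the real path should be available from ipykernel.connect.get_connection_file(), and the %connect_info magic prints the same connection details.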


GDP may partly be “based on randomly generated numbers”

GDP is easy to describe. It’s one of the reasons it is so popular as a measure of what a country “does” economically. You just add up everything that’s produced in a given year, and subtract all the stuff that went into making it. What’s left is the genuinely “new” stuff the economy produced that year. Easy.

Behind that simple description lies a tangle of thorny issues involving philosophy (what counts as a “thing”? what gives a thing value?), sociology (what value does education have? or a healthy society?), gender politics (why does childcare not count as having value?) and a host of other hard-to-answer qualitative issues.

But these things seem like minutiae in comparison to the big data question this poses: How do we know anything about the economy at all? Who is actually counting and recording all this?

This might seem like a silly question. Surely the government just “knows” what is happening in the economy, right? Don’t they have to know what’s being bought and sold in order to calculate tax bills?

Well, they do know a lot of that stuff for VAT purposes but, for now at least, that information is not made available to the people who calculate GDP. (This is set to partly change in 2018.) Companies are obliged to report their turnover and their profits but, for the calculation of GDP, these are not timely enough, don’t cover enough of the economy (especially small firms) and are not cross-checked with other sources. There is also no requirement for companies to submit this data in a format which could easily be aggregated into a single, huge database. Imagine going through thousands of company reports copy/pasting values out of PDF tables.

So the government instead sends out around 50,000 monthly surveys asking questions about what was produced, how much was bought, and where any profits made went (i.e. to wages, investment or shareholders). Together these make up the Monthly Business Survey (MBS), the little-known secret of how the government knows what’s happening in the economy and, hence, how it calculates GDP.

The accuracy of GDP depends directly on the accuracy of thousands of self-reported surveys. Statistics 101 tells us that random mis-measurements should tend to cancel one another out once results are aggregated across surveys, especially when there are lots of people being surveyed. But systematic under/over-reporting must play a role in some way or another.
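A toy simulation makes the difference concrete. This is only a sketch with made-up numbers: the turnover range, the 20% noise level and the 5% under-reporting bias are all assumptions chosen for illustration, not anything taken from the MBS.

```python
import random


def aggregate_error(n_surveys, noise_sd, bias, seed=0):
    """Relative error of the summed survey reports versus the summed truth."""
    rng = random.Random(seed)
    true_total = 0.0
    reported_total = 0.0
    for _ in range(n_surveys):
        true_value = rng.uniform(50, 150)   # made-up "true" turnover
        error = rng.gauss(bias, noise_sd)   # proportional reporting error
        true_total += true_value
        reported_total += true_value * (1 + error)
    return reported_total / true_total - 1


# Random mistakes only: individual answers are off by around 20% either way.
random_only = aggregate_error(50_000, noise_sd=0.20, bias=0.0)

# The same noise, plus everyone shading their answer down by 5% on average.
with_bias = aggregate_error(50_000, noise_sd=0.20, bias=-0.05)

print(f"random errors only:   {random_only:+.2%}")
print(f"with systematic bias: {with_bias:+.2%}")
```

With this many respondents the random errors all but vanish from the total, while the systematic bias comes through at close to its full size.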

Predicting the direction of this systematic mis-reporting of financial data is difficult. Would a firm want to over-report its successes for boastful purposes, or under-report them, fearful of a knock on the door from the taxman?

There are also at least some business owners who appear not to take the process of completing the survey quite as seriously as they might.

One business owner was mightily disgruntled at the amount of data being asked for:

The form asks for details of total turnover, retail turnover, commodity breakdown of retail cover, expenditure, employment costs, energy costs, goods materials & services costs, taxes, duties & levies paid, value of stock held, capital expenditure …

To which another gleefully responded:

I had to do one, waste of time, didn’t try too hard with the answers. Scribbled some rubbish and throw it back at ’em.

There’s even some suggestion that the process of sanity-checking the responses is not as water-tight as it might be:

I had one of these about 10 years ago, followed up by threats of fines.
I eventually filled it in with totally vague responses and notes along the lines of “this is an estimate based on randomly generated numbers” it appears that they can’t force you to give meaningful data.
Apparently this blatantly stupid information was completely acceptable and they went away 🙂

Ouch. Let’s hope not too many business owners are following this one’s lead…