Finally, a Python workflow that actually works

I do all my exploratory data work in Jupyter Notebook. It’s an amazing mix of nicely-formatted text, syntax-highlighted code and clear outputs. I love it:


But it’s not the complete package. I often want to peek at the results of a calculation I’ve done, just to verify to myself that I know what I’m doing. Here’s a typical example:


I don’t really want to print out the results of this calculation. I just want to peek at it to verify that it makes sense.

I’ve finally found a dream way of giving myself both things at once: a nice-looking record of what I’ve done, with the ability to easily run chunks of code again and again, and a way to peek at data and outputs in a forgettable, scroll-off-the-top-of-the-screen way.

It involves starting a Jupyter Notebook, then connecting an IPython console to the same kernel. This means that the IPython console and the Jupyter Notebook are seeing the same Python instance: the same variables, the same packages, the same function definitions. Here’s that example again, but this time the Notebook doesn’t report the result of the calculation; it just does it. I’m then peeking at the result in a neighbouring console window:


To get this to work, you just copy/paste the long string of numbers and letters after “Kernel started” in the Jupyter Notebook window (here it’s 561204fc-c49f…):


Then, in your favourite terminal, type:

jupyter console --existing 561204fc-c49f-44a7-abf2-902535573282

You’ll now have an IPython console running off the same kernel. Wicked.
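If you’d rather not copy/paste by hand, here is a minimal sketch that pulls the kernel ID out of a log line and builds the command for you. The helper name `console_command` and the exact shape of the log line are my assumptions; the only thing taken from the steps above is that the ID follows the words “Kernel started”.

```python
import re

# Assumption: the notebook server logs a line containing
# "Kernel started: <kernel id>", where the id looks like a UUID.
KERNEL_STARTED = re.compile(
    r"Kernel started: "
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def console_command(log_line):
    """Return the `jupyter console` command for the kernel named in log_line."""
    match = KERNEL_STARTED.search(log_line)
    if match is None:
        raise ValueError("no 'Kernel started: <id>' found in this line")
    return "jupyter console --existing " + match.group(1)


# Hypothetical log line, using the kernel ID from the example above.
line = (
    "[I 10:15:02.123 NotebookApp] Kernel started: "
    "561204fc-c49f-44a7-abf2-902535573282"
)
print(console_command(line))
```

Paste the printed command into your terminal as before.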


GDP may partly be “based on randomly generated numbers”

GDP is easy to describe. It’s one of the reasons it is so popular as a measure of what a country “does” economically. You just add up everything that’s produced in a given year, and subtract all the stuff that went into making it. What’s left is the genuinely “new” stuff the economy produced that year. Easy.

Behind that simple description lies a tangle of thorny issues involving philosophy (what counts as a “thing”? what gives a thing value?), sociology (what value does education have? or a healthy society?), gender politics (why does childcare not count as having value?) and a host of other hard-to-answer qualitative issues.

But these things seem like minutiae in comparison to the big data question this poses: How do we know anything about the economy at all? Who is actually counting and recording all this?

This might seem like a silly question. Surely the government just “knows” what is happening in the economy, right? Don’t they have to know what’s being bought and sold in order to calculate tax bills?

Well, they do know a lot of that stuff for VAT purposes but, for now at least, that information is not made available to the people who calculate GDP. (This is set to partly change in 2018.) Companies are obliged to report their turnover and their profits but, for the calculation of GDP, these are not timely enough, don’t cover enough of the economy (especially small firms) and are not cross-checked with other sources. There is also no requirement for companies to submit this data in a format which could easily be aggregated into a single, huge database. Imagine going through thousands of company reports copy/pasting values out of PDF tables.

So the government instead sends out around 50,000 monthly surveys asking questions about what was produced, how much was bought, and where any profits made went (i.e. to wages, investment or shareholders). Together these make up the Monthly Business Survey (MBS), the little-known secret of how the government knows what’s happening in the economy and, hence, how it calculates GDP.

The accuracy of GDP depends directly on the accuracy of thousands of self-reported surveys. Statistics 101 tells us that random mis-measurements should tend to cancel one another out once results are aggregated across surveys, especially when there are lots of people being surveyed. But systematic under/over-reporting must play a role in some way or another.
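The Statistics 101 point can be checked with a quick simulation. This is only a sketch: the function name, the noise level and the size of the bias are all made-up assumptions, not anything to do with how the MBS is actually processed. Each simulated survey reports its true value plus random noise, plus an optional systematic bias.

```python
import random


def mean_reporting_error(n_surveys, noise_sd, bias, seed=0):
    """Average reporting error across n_surveys simulated responses.

    Each response is off by `bias` (systematic) plus Gaussian noise
    with standard deviation `noise_sd` (random). All values hypothetical.
    """
    rng = random.Random(seed)
    errors = [bias + rng.gauss(0, noise_sd) for _ in range(n_surveys)]
    return sum(errors) / n_surveys


# Random noise only: the average error shrinks towards zero.
unbiased = mean_reporting_error(50_000, noise_sd=10.0, bias=0.0)

# Same noise, but everyone under-reports by 2 units: the average
# error stays close to -2 however many surveys are sent out.
biased = mean_reporting_error(50_000, noise_sd=10.0, bias=-2.0)

print(f"random errors only:   {unbiased:+.3f}")
print(f"with systematic bias: {biased:+.3f}")
```

More surveys make the first number smaller, but do nothing for the second.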

Predicting the direction of this systematic mis-reporting of financial data is difficult. Would a firm want to over-report its successes for boastful purposes, or under-report them, fearful of a knock on the door from the taxman?

There are also at least some business owners who appear not to take the process of completing the survey quite as seriously as they might.

One business owner was mightily disgruntled at the amount of data being asked for:

The form asks for details of total turnover, retail turnover, commodity breakdown of retail cover, expenditure, employment costs, energy costs, goods materials & services costs, taxes, duties & levies paid, value of stock held, capital expenditure …

To which another gleefully responded:

I had to do one, waste of time, didn’t try too hard with the answers. Scribbled some rubbish and throw it back at ’em.

There’s even some suggestion that the process of checking the responses for sanity is not as water-tight as it might be:

I had one of these about 10 years ago, followed up by threats of fines.
I eventually filled it in with totally vague responses and notes along the lines of “this is an estimate based on randomly generated numbers” it appears that they can’t force you to give meaningful data.
Apparently this blatantly stupid information was completely acceptable and they went away 🙂

Ouch. Let’s hope not too many business owners are following this one’s lead…

BBC gender pay gap

The release of the BBC’s list of top-paid talent has quite unexpectedly focused on the difference between men’s pay and women’s pay.

But it’s hard to look at a list and see whether women are doing worse than men, or whether there are fewer women on the list. Clearly either of these is a bad show, but a bit of data viz should be able to make the picture clearer.

In the pictures below, men and women are ranked from best-paid at the top to least well-paid at the bottom. On the right, all men and women are ranked together.

In a perfectly fair distribution the lines would all be flat. Any downward-sloping lines suggest that someone is doing worse overall than their position within their gender would suggest.
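For anyone wanting to build the same kind of chart, here is a rough sketch of the two rankings involved. The names and pay figures are invented placeholders, not the BBC numbers, and `rank_table` is a hypothetical helper of my own.

```python
# Hypothetical (name, gender, pay) records -- NOT the BBC figures.
people = [
    ("A", "M", 500),
    ("B", "M", 400),
    ("C", "F", 450),
    ("D", "F", 300),
    ("E", "M", 350),
]


def rank_table(people):
    """Map each name to (rank within own gender, rank overall); 1 = best paid."""
    overall = sorted(people, key=lambda p: p[2], reverse=True)
    seen_per_gender = {}
    table = {}
    for overall_rank, (name, gender, _pay) in enumerate(overall, start=1):
        # Walking down the overall ranking, count how many people of
        # this gender have appeared so far: that is the within-gender rank.
        seen_per_gender[gender] = seen_per_gender.get(gender, 0) + 1
        table[name] = (seen_per_gender[gender], overall_rank)
    return table


for name, (within, overall) in sorted(rank_table(people).items()):
    print(f"{name}: rank within gender {within}, rank overall {overall}")
```

Each line in the chart joins a person’s within-gender rank to their overall rank, so the gap between the two numbers is what makes a line slope.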


Claudia Winkleman is the best-paid woman at the BBC but only the 8th best-paid person overall. And as for the hardest-working person on TV, Laura Kuenssberg, she’s the 13th best-paid woman, but is out-earned by Jools Holland, the 36th best-paid man.

Talk about a pay Squeeze.

SpareRoom is over. What’s next for Britain’s flatsharers?

There’s an air of arms race about flatsharing adverts of the kind hosted by sites like SpareRoom.

All individuals want the same thing: to find other individuals, either to move into the spare room in their shared house, or who have a spare room to move into.

And all agents and landlords want the same thing too: to attempt to act as middleman between those two sets of individuals, taking a barely-earned cut in the process. Agents in particular have nothing at all to gain from the kind of peer-to-peer flatsharing which SpareRoom et al. are supposed to excel at.

So what do these middlemen do? They jump onto the site themselves, flooding it with adverts of their own, drowning the individuals out in a sea of noise. This heavy spamming of supposedly peer-to-peer sites is characterised by attention-seeking tactics of ever-increasing volume.

Look closely at the first four hits in a search for rental rooms in one particular North London postcode district:


These adverts are laden with capital letters, random punctuation marks, and shouted claims about the virtues of the room available: “//SUPER PRICE TODAY/” yells one. Not one of these top four results is posted by an individual looking for a flatmate.

When one site becomes flooded with this kind of spammy advert (Gumtree was first to fall prey to this), attention shifts to another site where, for a time, flatsharers meet and interact unimpeded. This continues until, inevitably, the middlemen get wind of this new platform, and the spamming begins.

But this time seems to be different. SpareRoom has been pretty much defunct now for at least 18 months and, as far as I can tell, no platform has arisen to take its place.

This absence of a usable service has driven individuals looking for flats or flatmates into the walled garden of Facebook. One potential flatsharer told me:

…all about word of mouth I reckon…SpareRoom was shit!


Flatsharing via Facebook is great if you’re a social media bigwig, and your many friends are all sharing flats too, but it’s unhelpful for people who’ve just arrived in town, or whose friends are mostly homeowners or shacked up in couples.

The move to Facebook has shown that there is a real appetite for a reliable, trustworthy way to fill a room going in a flat share. What everyone really wants is to fill their room with a friend, a friend-of-friend, or someone like-minded enough to ensure they’ll get on.

Overheads for running a site which hosts genuine adverts from genuine flatsharers come pretty close to zero, so crazy fees like those of SpareRoom are not going to be the answer to keeping a new site clean of spammers. There needs to be some kind of verification process where your Facebook friends can vouch for the genuine nature of your flatshare advert: is it what it claims to be? is the advert genuinely from a current housemate?

These are problems which have been solved over and over again in other domains. It’s indicative of the lack of renters’ collective bargaining power that the process of finding rooms and flatmates is so poorly served by the options available.

Are bad exams harder to mark than good ones?

Bored out of my brains marking a large pile (84) of exams, I decided to spice things up a little bit by timing how long it took me to mark each question. I happen to favour the style of marking where you mark every question 1, followed by every question 2 etc.

It seemed clear to me that the speed of marking should depend on the order the paper was marked in: as I become more and more familiar with the kind of weird stuff students write in response to my questions, and as I solidify in my mind how many marks I feel particular answers are worth, it makes sense that I should speed up.

I also wondered whether bad answers are more difficult to mark than good answers.

Looking at this, I can see I’m right about my first assumption:


The duration clearly decreases on average as the marking exercise goes on.

But what about the crucial question: is marking a bad exam answer harder than marking a good one?


The answer is a fairly resounding “no” (Specification 1). The low R² also makes you think there’s something more important going on here, namely handwriting: I’d guess that’s the big factor.
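For the curious, a regression like Specification (1) can be sketched in a few lines. I’m assuming here that it is a simple least-squares fit of marking time on marks awarded; the numbers below are invented for illustration and are not my actual marking data, and `ols_fit` is just a hypothetical helper.

```python
def ols_fit(x, y):
    """Simple least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: the share of the variation in y that the fitted line explains.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: marks awarded and seconds spent marking each answer.
marks = [2, 5, 7, 9, 4, 8]
seconds = [40, 55, 35, 50, 45, 60]

slope, intercept, r_squared = ols_fit(marks, seconds)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.2f}")
```

A slope near zero together with a low R² is what a “no” looks like: marks explain very little of the variation in marking time.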

But like a good social scientist, I wasn’t happy with leaving it there. Maybe there was some kind of nonlinear effect: very, very bad answers are easy to mark (since there’s usually little or nothing written), and so are very good ones.

Specification (2) shows that there’s not much evidence that this is the case. (In fact p is less than 0.1 for both marks and squared marks in (2), so in some social science contexts, I’d award myself a little star!)

You might think: this is the most boring result imaginable. Why is this worth a blog post?

Well, you’re right of course… but the struggle against publication bias just claimed one small victory!

#Procrastinalysis

Does your econ department care about new ideas?

The Reteaching Economics network is a group of early-career economics teachers interested in moving the teaching of economics on from nonsense like this:

NAIRU – Not in fact a Pacific island nation. Image by Asacarny, distributed under a CC BY 2.5 licence.

to something that more closely resembles the actual world which students expected to be finding out about when they signed up for an Economics undergraduate degree[1].

They are inspired by the incredible Rethinking Economics, an international student group whose founders have just written an excellent book on the “perils of leaving economics to the experts.” The energy and vision of this group of students makes the average econ department look like exactly the kind of left-behind-with-dust-gathering legacy institution it usually is.

The Reteaching Economics group publishes a list of its members online, so we can see which institutions are most likely to care about teaching new ideas. Here’s the breakdown, including only institutions with more than one member.

reteaching-by-institution
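The counting behind a breakdown like this is simple enough to sketch. The member list below is a made-up placeholder rather than the real Reteaching Economics list, and the function name is my own invention.

```python
from collections import Counter

# Hypothetical (member, institution) pairs -- not the real member list.
members = [
    ("Member 1", "University A"),
    ("Member 2", "University A"),
    ("Member 3", "University B"),
    ("Member 4", "University A"),
    ("Member 5", "University B"),
    ("Member 6", "University C"),
]


def institutions_with_multiple_members(members):
    """Return (institution, count) pairs, largest first, dropping singletons."""
    counts = Counter(institution for _member, institution in members)
    return [(inst, n) for inst, n in counts.most_common() if n > 1]


for institution, n in institutions_with_multiple_members(members):
    print(f"{institution}: {n}")
```

University C, with its single member, is dropped from the output, just as singleton institutions are left out of the breakdown.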

There is a fairly long list of lonely singleton Reteachers (full disclosure: I’m one of them) who are the distant outposts of these ideas in otherwise skeptical economics departments. They should be a source of sympathy, greetings cards and reassuring poems/songs/flashmobs.

If you study Economics at an institution other than these, why not make your teacher aware of the new world being created just beyond the walls of their department?

@ReteachEcon

@RethinkEcon

[1] This is excluding the not-inconsiderable minority of econ undergrads who signed up because they want a job at Goldman/KPMG/etc. as soon as possible, who would probably be best served by finding out as little about the real world as possible.

Does your department care about tax havens?

Oxfam published a press release yesterday containing an open letter to world leaders calling for them to “make significant moves towards ending the era of tax havens” which are “distorting the working of the global economy”.

This seems to me like a pretty important intervention, and it’s a rare opportunity for economists to use their variably-justified reputation as “people who know about the economy” to do something positive to fix the way the system works at a global level.

These kinds of coordinated interventions into the actual workings of the global economic system seem woefully infrequent to me. So when one does happen, I’m anxious to be a part of it in whatever tiny way I can. And the least a bottom-rung economist like me can do is choose to work for the institution which is pulling hardest in the right direction.

Oxfam published the full list of signatories to the letter, so I thought I’d do a teeny bit of analysis, to see which countries and which UK departments care most about bringing an end to the global tax haven system. Perhaps this will help any early-careerists choose an institution whose interests align with theirs: after all, there is more to judging a faculty than counting its number of peer-reviewed publications.

Spoiler alert: the number of signatories from Bristol, the university I’ve recently moved to be a part of? Zero. Bad times…


 

First, here’s the breakdown of signatories by country:

Italy leads the field by a long, long way. At first glance, its signatories look to be from a decent variety of institutions, but I haven’t checked this properly. Future work, someone? Italy is a known advocate for change in economics, being a big adopter of the CORE project which aims to transform how undergrads are taught economics. The UK comes a pleasing second (I look at this result in a bit more detail below), and it’s good to see the USA being well represented too.

It’s worth pointing out here that not all signatories are equal: although France has only a paltry ten signatories, one is Olivier Blanchard and another is Thomas Piketty. These are both absolute giants of the economics world, and their contribution here, particularly Blanchard who is not exactly known as an iconoclast, is very significant indeed. (One might do an analysis by Google Scholar citation count instead, which would show up these differences.)

Now let’s look at which institutions those 50 UK signatories come from:

The top two, SOAS and Greenwich, are both already on my radar as being slightly more radical than the standard econ department. But this is not a list dominated by such agitators: the LSE is hardly known as an anti-establishment hotbed, and nor are Warwick or Oxford.

Big-name mover for change, Ha-Joon Chang, cuts a lonely figure as the sole signatory from Cambridge. I hope he at least has some research staff of his own to go punting with.

And, as I mentioned, zero signatories from Bristol.

#EndTaxHavens.