I never thought I’d say this, but Stata rules the roost at at least one thing: lagging a column in panel data. What does that mean?
Imagine you’re looking at test scores, and you think this year’s test score depends on last year’s (a sensible assumption perhaps). Your dataset would look something like this:
Year Name Score Previous Score Score Before That 2010 Bob 30 . . 2011 Bob 59 30 . 2012 Bob 63 59 30
The previous scores are calculated by ‘lagging’ the data by one and two periods (note that the dot represents a missing value. Stata has “.” and R has “NA”)
To lag data in Stata you simply type L.Score and for inclusion of lots of lags, L(0/3).Score will give you the score, last year’s score, the year before that AND the year before that one too. Amazing!
Now, as I search for a way to do this in R, imaging my horror on stumbling upon this syntactical nightmare. Being involved with computing gets you used to difference syntax systems. For example, I know that the SQL-like “WHERE Name=’Rob’ AND Home=’London'” looks like “AND(A$2$=’Rob’,A$3$=’London’) in Excel. But you get the feeling that, as computer languages beging to be able to claim some maturity of design, some kind of basic tenets of language syntax would begin to emerge. (Note that, were Excel to be designed today, there’s no way the developers would opt for the AND(A,B) tomfoolery.)
So, to lag in R, I have to do this:
plm(Name ~ Score, lag(Score=2,c(2,3)))
Try getting your head around that! Score for readability of code? Nought! Score for being able to remember how to do this next time I want to? Nought point nought!
Ah, the joys of community-developed software.