So far so simple in terms of reading data into the R environment (I’m using the wonderful RStudio by the way). Download a little library, run about 5 lines of code and, boom, my PostgreSQL view is available as a data frame in R. Easy peasy.
library(package=RPostgreSQL)
library(package=plm)

## Load the PostgreSQL driver
drv <- dbDriver("PostgreSQL")

## Open a connection
con <- dbConnect(drv, host="xxxxxxxx", dbname="xxxxxxx", user="xxxxxx", password="xxxxxx")

## Read the whole contents of a table into a data frame
rs <- dbReadTable(con, "rlvw_cntry_year_aid_affected")
More tricky is the whole panel data regression part. Panel data has two dimensions: a time dimension (in my case the years from about 1960 to 2008) and an “individuals” dimension, in my case countries. So I have aid-received data for all countries for each year in the set, which adds up to a lot of observations overall.
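To make the two dimensions concrete, here is a minimal sketch of what a panel looks like in "long" format: one row per country per year. The country codes, the short year range and the `aid` values are all made up for illustration; they are not the columns or contents of my actual view.

```r
# Hypothetical toy panel: 3 countries observed over 4 years.
countries <- c("A", "B", "C")
years     <- 1960:1963

# One row per (country, year) pair; year varies fastest within each country.
panel <- expand.grid(year = years, country = countries,
                     stringsAsFactors = FALSE)

# Placeholder values standing in for aid received.
panel$aid <- seq_len(nrow(panel))

# Observations = number of countries x number of years.
n_obs <- nrow(panel)

# A balanced panel has the same number of years for every country
# and no repeated (country, year) pairs.
obs_per_country <- table(panel$country)
n_duplicates    <- anyDuplicated(panel[, c("country", "year")])

print(panel)
```

With roughly 50 years and a long list of countries, the same layout is what makes the real data set so big.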
The hard part is using both dimensions when running a regression. In principle, the maths is not complicated. And in the statistics package I’m used to using, it’s straightforward (once you know how!). You tell the software which column represents your time dimension, and which your individual dimension and off you go. In R, I’m not sure yet how that stuff works, so it’s back to the reading board for me as I trawl through online tutorials etc. I’ll report back once I’ve worked out how to do it.
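For what it's worth, the plm package loaded above looks like the natural place to start. As far as I can tell, its `plm()` function takes an `index` argument naming the individual column first and the time column second. The sketch below is only that, a sketch: the data are invented, and the column names `country`, `year`, `aid` and `affected` are my assumptions, not necessarily what the view contains. It fits a fixed-effects ("within") model, which is one common choice among several.

```r
library(plm)

# Hypothetical toy panel (made-up numbers): 3 countries x 4 years.
panel <- data.frame(
  country  = rep(c("A", "B", "C"), each = 4),
  year     = rep(2001:2004, times = 3),
  aid      = c(1, 2, 3, 4,  2, 4, 6, 8,  1, 3, 5, 7),
  affected = c(12, 15, 15, 18,  25, 28, 33, 36,  31, 36, 39, 44)
)

# index = c(individual column, time column) tells plm about both dimensions.
# model = "within" requests the fixed-effects estimator.
fe <- plm(affected ~ aid, data = panel,
          index = c("country", "year"), model = "within")

# Cross-check: ordinary least squares with one dummy per country
# should give the same slope on aid as the within estimator.
lsdv <- lm(affected ~ aid + factor(country), data = panel)

print(summary(fe))
```

Swapping `model = "within"` for `"random"` or `"pooling"` should give the other standard estimators, though I haven't compared them on the real data yet.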