Population migration derived from ACS 2015 5-year PUMS dataset
Now that we have got all the data imported, let's have some fun.
For those of you who know me, I have been an advocate of the open source movement for a while now, and my preferred statistical software has been R. However, I had not spent much time on R lately. We all have a lot on our plates, and we revisit a tool when we need it.
A couple of months ago, I spent my spare time writing quite a bit of R code, so I thought I would be right at home when I took on this migration project. Boy, was I wrong... gosh. I spent almost a whole day fixing some bugs. They were not really bugs: because I had decided to include the NA definitions in my definition database, my old code ran into problems when referencing those definitions. Anyway, I got it fixed, but I did not end up using R after all.
Then my IT training kicked in: I realized that instead of using statistical software for this project, a few SQL statements would reduce the task to almost nothing. Come to think of it, SQL is not only easier, it actually runs much faster. A database is designed to work with data on the hard disk, whereas most statistical software loads all the data into memory and ties up the computer's resources. By the way, a while back I had the idea of using a database as my statistical software. I actually checked the MS SQL documentation on custom (user-defined) functions and, do you know what, it is totally possible. Now the question is who is going to take on that project.
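Python's built-in sqlite3 module is a convenient way to illustrate the idea of a database doubling as statistical software. The sketch below registers a custom aggregate, a weighted standard deviation (core SQLite does not ship a standard deviation aggregate), and then uses it in an ordinary GROUP BY query. It is an SQLite stand-in for the MS SQL user-defined functions mentioned above, not what I actually ran for this project; the table layout and the names wstdev, person, state, age and wgt are hypothetical.

```python
import math
import sqlite3


class WeightedStdev:
    """Weighted population standard deviation, usable as a SQLite aggregate."""

    def __init__(self):
        self.sum_w = 0.0    # sum of weights
        self.sum_wx = 0.0   # sum of weight * value
        self.sum_wx2 = 0.0  # sum of weight * value^2

    def step(self, value, weight):
        # Skip NULLs, as built-in SQL aggregates do.
        if value is None or weight is None:
            return
        self.sum_w += weight
        self.sum_wx += weight * value
        self.sum_wx2 += weight * value * value

    def finalize(self):
        if self.sum_w == 0:
            return None
        mean = self.sum_wx / self.sum_w
        variance = self.sum_wx2 / self.sum_w - mean * mean
        # Guard against tiny negative values caused by rounding.
        return math.sqrt(max(variance, 0.0))


def weighted_stdev_by_group(rows):
    """Load (state, age, wgt) rows into an in-memory table and aggregate in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("wstdev", 2, WeightedStdev)
    conn.execute("CREATE TABLE person (state TEXT, age REAL, wgt REAL)")
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", rows)
    result = dict(conn.execute(
        "SELECT state, wstdev(age, wgt) FROM person GROUP BY state ORDER BY state"
    ).fetchall())
    conn.close()
    return result
```

Once registered, the aggregate behaves like SUM or AVG, so the heavy lifting stays inside the database engine instead of in a separate statistics package.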
Anyway, I ended up running a few SQL statements and dumping the results into Excel with a bunch of formulas. Sorry, I haven't really invested in OpenOffice yet.
OK, let's get back to the topic. The American Community Survey (ACS) is conducted by the US Census Bureau on an annual basis. The PUMS (Public Use Microdata Sample) file is a sample drawn from the collected responses, and it allows users to derive results that the Census Bureau has not already tabulated.
The ACS includes a question asking respondents where they lived one year ago, so we can use the PUMS data to derive migration information. One interesting application is to combine this question with the respondents' educational attainment. This lets analysts see what level of education the people moving out of a state have attained and, hence, whether the state suffers a brain drain, i.e., loses its highly educated residents.
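The in/out/net flow calculation itself fits in one SQL query. Here is a minimal sketch, again through Python's sqlite3, assuming a simplified person table whose columns st (current state), migsp (state of residence one year ago, NULL when not applicable), educ (education category) and pwgtp (person weight) are loosely modeled on the PUMS variables ST, MIGSP, SCHL and PWGTP; check the PUMS data dictionary for the real codes. The sketch ignores moves from abroad and counts each mover's weight once toward the destination's inflow and once toward the origin's outflow.

```python
import sqlite3

# Hypothetical simplified schema; real PUMS files use coded values.
FLOW_QUERY = """
WITH movers AS (
    -- Only people whose state a year ago differs from their current state.
    SELECT st, migsp, educ, pwgtp FROM person
    WHERE migsp IS NOT NULL AND migsp <> st
),
flows AS (
    SELECT st AS state, educ, pwgtp AS inflow, 0 AS outflow FROM movers
    UNION ALL
    SELECT migsp AS state, educ, 0, pwgtp FROM movers
)
SELECT state, educ,
       SUM(inflow)                AS in_migration,
       SUM(outflow)               AS out_migration,
       SUM(inflow) - SUM(outflow) AS net_migration
FROM flows
GROUP BY state, educ
ORDER BY state, educ
"""


def migration_by_education(rows):
    """rows: iterable of (st, migsp, educ, pwgtp) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (st TEXT, migsp TEXT, educ TEXT, pwgtp INTEGER)"
    )
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(FLOW_QUERY).fetchall()
    conn.close()
    return result
```

A negative net_migration for a highly educated category is the kind of signal that points toward a brain drain, subject to the sampling error discussed next.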
Click here for the resulting file. Please note that any result derived from a sample carries sampling error, and this file does not include margins of error, which describe the range within which the true value is likely to lie. In our case, if the margin of error is large enough, the true value of a net in-migration could turn out to be negative, which would actually indicate an out-migration. So, the file is for reference only. The author is working on consolidating some of the categories and hopes to report data with reasonable margins of error.
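For ACS PUMS, the Census Bureau documents a replicate-weight method for margins of error: compute the same estimate once with the main weight and once with each of the 80 replicate weights (PWGTP1 to PWGTP80 for person records), then SE = sqrt(4/80 × Σ(X_r − X)²) and the 90 percent margin of error is 1.645 × SE. The sketch below assumes those 81 estimates have already been computed, for example by rerunning the flow query once per weight column; the function names are hypothetical.

```python
import math

Z_90 = 1.645       # z-value for a 90 percent confidence level
N_REPLICATES = 80  # ACS PUMS provides 80 replicate weights per record


def replicate_se(estimate, replicate_estimates):
    """Standard error from the ACS replicate-weight formula."""
    if len(replicate_estimates) != N_REPLICATES:
        raise ValueError("expected 80 replicate estimates")
    sum_sq = sum((r - estimate) ** 2 for r in replicate_estimates)
    return math.sqrt(4.0 * sum_sq / N_REPLICATES)


def moe_interval(estimate, replicate_estimates):
    """Return (moe, lower, upper) for a 90 percent interval."""
    moe = Z_90 * replicate_se(estimate, replicate_estimates)
    return moe, estimate - moe, estimate + moe
```

For instance, a net in-migration estimate of 20 whose replicate estimates give SE = 20 has a margin of error of about 32.9, so the interval runs from roughly -12.9 to 52.9 and straddles zero: the data cannot tell whether the state actually gained or lost people in that category.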