Things about society.

Wednesday, December 28, 2016

ACS 2015 1-year PUMS for database/IT professionals

Continuing the ACS PUMS database project, the next task is to import the 2015 1-year ACS PUMS product.

Comparing Census's 2015 Data Dictionary file (PUMSDataDict15.txt) with the 2014 one, I noticed that Census added leading spaces to many lines, possibly to make the file more readable for human users. After some thought, I decided to simply remove those leading spaces with sed before running the file through my definition processor. I also noticed that many value definitions in this year's file span more than one line, so I modified my program to handle that.
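For readers who would rather do the cleanup step in a script than in sed, here is a minimal Python sketch of the same idea. The function name strip_leading_spaces and the sample lines are made up for illustration and are not taken from my definition processor or copied from the Census file.

```python
def strip_leading_spaces(lines):
    """Drop leading spaces from every line, similar in spirit to sed 's/^ *//'."""
    # lstrip(" ") removes only spaces, so tabs and the trailing newline are kept.
    return [line.lstrip(" ") for line in lines]


# Hypothetical sample lines, only to show the effect of the cleanup.
sample = [
    "EXAMPLE    1\n",
    "    An example variable\n",
    "           1 .First value\n",
]

for cleaned in strip_leading_spaces(sample):
    print(cleaned, end="")
```

Stripping only the leading spaces leaves the rest of each line untouched, so whatever comes after the first non-space character still reaches the parser unchanged.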

Running the Data Dictionary file through my program yields the following parsing errors. Some are clearly unintended mistakes; others may simply be because Census did not spend the time and effort to establish clear syntax rules that would make their products machine friendly. Here are the parsing errors:

      An unrecognized line before DIALUP
        - it read 's line intentionally blank; content continues.'
      A two-line Note: before TEN
      An unrecognized line before PARTNER
        - it read 's line purposely blank; content continues.'
      The NWAB, NWAV, NWLA, NWLK, and NWRE variables deviate from the format of the others
        - extra text after the variable size
      PAP has a blank line before 00001..99999
        - this is an obvious error
      NAICSP bbbbbbbb has no space before the description
        - in most value definitions, a space separates the value from the description
      NAICSP 928110P1 to 928110P7 have no space before the description
        - same as above
      Just before end(*), a Note line instead of a Note: line
        - a Note, not a Note:
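A problem like the last one in the list can be caught by a small check before the real parsing starts. The sketch below is only an illustration: find_note_without_colon is a hypothetical helper, not part of my program, and it assumes a note line is expected to begin with the literal text Note: once leading spaces are ignored.

```python
import re


def find_note_without_colon(lines):
    """Return 1-based numbers of lines that start with 'Note' but not 'Note:'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        # \b keeps words such as 'Notes' from matching; (?!:) rejects 'Note:'.
        if re.match(r"Note\b(?!:)", text):
            hits.append(number)
    return hits


# Hypothetical sample lines, not copied from the Census file.
sample = [
    "Note: a well-formed note\n",
    "Note a note missing its colon\n",
]
print(find_note_without_colon(sample))
```

The reported line numbers point straight at the places that need a manual edit.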

Since there were only a few issues, I was able to edit the file manually and make it machine-parseable.

