Things about society.

Wednesday, January 25, 2017

ACS 2015 5-year PUMS for database/IT professionals

Continuing with the ACS PUMS database project, the next task is to import the 2015 5-year ACS PUMS product.

Comparing Census's 2015 Data Dictionary file (PUMS_Data_Dictionary_2011-2015.txt) with the 2014 one, we again noticed that Census added leading spaces to many lines, possibly to make the file more readable for human users. Following the same steps I used for the 2015 1-year file, I removed those leading spaces with sed before running the file through my definition processor.
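For readers who prefer Python to sed, the same cleanup could be sketched roughly as below. This is only an illustration, not the command I ran: the function name strip_leading_blanks and the sample lines are made up for the example. It strips only ordinary spaces and tabs, so any other odd byte at the start of a line is left in place for later inspection.

```python
def strip_leading_blanks(lines):
    """Remove leading spaces and tabs from each line; leave the rest untouched."""
    # An explicit character set is passed to lstrip so that only plain
    # spaces and tabs are removed (a bare lstrip() would also eat NBSP).
    return [line.lstrip(" \t") for line in lines]


# Made-up sample lines, not copied from the real Data Dictionary file.
sample = ["    NAME      1\n", "  Some description\n", "\n"]
for line in strip_leading_blanks(sample):
    print(repr(line))
```

Reading the file in, passing its lines through this function, and writing them back out should be roughly equivalent to a sed substitution that deletes leading blanks.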

Running the Data Dictionary file through my program yields the following parsing errors. Some are clearly unintended mistakes; others may simply be because Census did not spend the time and effort to establish clear syntax rules that would make its products machine-friendly. Here are the parsing errors:

        - value 1001264: the blank after '1001264' is actually an A0h instead of a regular space
        - just before TEN: a two-line 'NOTE:'
        - no blank line after the 'PERSON RECORD' section mark
        - value 1001264: the blank after '1001264' is actually an A0h instead of a regular space
        - just before GCL: a two-line 'NOTE:'
        - no empty line before FPINCP

The A0h one is really interesting. For those interested, A0h is the non-breaking space (NBSP) character, written as &nbsp; in HTML. Without a good hex editor, it took me a lot of effort to figure out what was going wrong.
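In hindsight, a few lines of Python could stand in for the hex editor. The sketch below is a hypothetical helper (find_non_ascii is my own name for it, and the sample bytes are made up): it reads raw bytes and reports the line, column, and hex value of every byte above 7Fh, which would include a stray A0h.

```python
def find_non_ascii(data: bytes):
    """Return (line_number, column, hex_value) for every byte above 7Fh."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        # Iterating over a bytes object yields integers in the range 0..255.
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((line_no, col, f"{byte:02X}h"))
    return hits


# Made-up sample: an A0h byte hides right after the value on line 1.
sample = b"1001264\xa0.some text\nplain ascii line\n"
print(find_non_ascii(sample))  # prints [(1, 8, 'A0h')]
```

To check a real file, open it in binary mode ("rb") and pass the result of read() to the function, so that no decoding step can hide or alter the offending byte.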
