SocialPond

Things about society.

Saturday, November 04, 2017

Continued struggle with IPEDS data



As mentioned earlier, automating data imports takes an enormous amount of dedication and effort. For those who do not appreciate that, there is really no need to share the knowledge with them.

Here is an example that demonstrates the kind of work and dedication needed to solve just one problem that I ran into while importing IPEDS data.

One problematic format I ran into in some IPEDS csv files is:
...,""some text quoted with two double quotes"", ...

As humans, we can see that this line breaks the csv convention, and most csv importing programs are likely to fail on it.

As a data user, I have a few resolutions to consider. If I am dealing with only this one file, the fastest way is to open the file in a text editor and modify the line so that the csv file can be imported into my application. If you are thinking this way, you are most likely a data analyst and probably think this is how things should be handled. Since you are higher up in the data food chain, you likely have not appreciated the work and thought of IT professionals.

IT professionals are likely to view the situation from a much broader point of view and ask questions like: What if this error exists in an ACS csv file? If you know the size of a typical ACS csv file, you will realize that there are probably very few text editors that can effectively open the file, let alone locate the erroneous line and fix it.

IT professionals may also ask: What if other csv files also have this problem? How can I handle this automatically?

One tool a lot of IT professionals know about is the sed program. Using sed to fix this problem is straightforward (the -r switch turns on extended regular expressions, which the unescaped parentheses and braces need):
  sed -r 's/,"("[^"]{2,}")"/,\1/g' InCsv
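For readers without sed at hand, here is a minimal sketch of the same substitution using Python's re module. It is not the method used in this post, only an equivalent illustration. The names fix_line and fix_file are made up for this example, and the utf-8 default encoding is an assumption; IPEDS files may need a different one.

```python
import re

# Same pattern as the sed expression: a comma, then a field wrapped in
# two double quotes on each side, with 2+ non-quote characters inside.
DOUBLED_QUOTES = re.compile(r',"("[^"]{2,}")"')


def fix_line(line: str) -> str:
    """Collapse every ,""text"" on one line into ,"text"."""
    return DOUBLED_QUOTES.sub(r",\1", line)


def fix_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> int:
    """Stream src_path into dst_path line by line.

    Only one line is held in memory at a time, so the file size does not
    matter. Returns the number of lines that were changed.
    """
    changed = 0
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            fixed = fix_line(line)
            changed += fixed != line
            dst.write(fixed)
    return changed
```

Calling fix_line('1,""some text"",2') gives '1,"some text",2'. One caveat applies to the pattern itself, in sed as well as here: it also matches across two empty quoted fields, so a line like a,"",xx,"",b would be rewritten too. It is worth checking for such lines before running this over a whole set of files.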

Unfortunately, if you want to invoke this from VBA, the command becomes much more complicated:
  Cmd.exe /c ^"sed ^-r ^-n ^'^{s^/^,^"^(^"^[^^^^^"^]^{2^,^}^"^)^"^/^,^\1^/g^; p^;^}^' InCsv ^"

Let's just say this: if you have no clue what we are talking about here, you should appreciate the work of IT professionals.
