SocialPond

Things about society.

Wednesday, May 17, 2017

NCES IPEDS data for Database/IT professionals


Personally, I am an IT professional working in an education agency. I have been in this position long enough to deal with social science researchers a lot. One interesting observation is that even though social science researchers deal with data all the time, without an IT background they are still limited in their ability to handle large amounts of data efficiently. Often, these staff rely on expensive commercial software and computer hardware to perform their tasks. When leading projects, they are often limited by their vision of how to provide and deliver efficient data products.

On the other hand, people with strong IT training can have a better vision of how things work, know the real limits of things, and set goals that others can't. I love this Elon Musk story, "Simple math is why Elon Musk’s companies keep doing what others don’t even consider possible", in which physics is applied first, because it is the fundamental that dictates the limits. The value of real STEM training is the vision and the knowledge of limits. Applied to data processing, IT is that knowledge.

The Integrated Postsecondary Education Data System (IPEDS) is a set of data collected from a large number of postsecondary education institutions in the United States. The survey is conducted by the National Center for Education Statistics (NCES). The collected data are available for anyone to use. For casual use, you can easily obtain the data you are interested in manually. However, as we all know, the real power of data is multiplied when you have all of it in one place, in a ready-to-use state. Yes, most likely we are talking about a database.

Glancing over the data retrieval options offered by NCES/IPEDS, the 'Complete data files' option seems to be the best way to retrieve the whole IPEDS data set. After practicing a bit manually, you soon realize that selecting and downloading the files by hand will take a long time just to download them, let alone import them.
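As a rough illustration of the "importing them" part, the Python sketch below loads one downloaded archive into SQLite. It assumes each archive holds a single CSV file with a header row, that the text is Latin-1 encoded, and that storing every column as TEXT is acceptable. The function name `load_zipped_csv` and the table naming are hypothetical, and SQLite simply stands in for whatever database you actually use.

```python
import csv
import io
import sqlite3
import zipfile


def load_zipped_csv(conn, zip_source, table, encoding="latin-1"):
    """Load the first CSV inside a zip archive into a SQLite table.

    Assumptions (not confirmed by the post): one CSV per archive, a header
    row, Latin-1 text, and every column stored as TEXT.
    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        csv_name = next(n for n in zf.namelist() if n.lower().endswith(".csv"))
        text = zf.read(csv_name).decode(encoding)

    reader = csv.reader(io.StringIO(text, newline=""))
    header = [h.strip() for h in next(reader)]

    # Quote identifiers so column names are used verbatim.
    columns = ", ".join(f'"{h}" TEXT' for h in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')

    placeholders = ", ".join(["?"] * len(header))
    rows = [row for row in reader if row]  # skip blank trailing lines
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```

Keeping every column as TEXT avoids guessing types up front; columns can be cast later in SQL once the data is in one place.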

With enough IT knowledge, a reasonable approach to this problem could be:
1) Save the download page;
2) Make minimal fixes to the page so that it conforms to XML;
3) Devise a short XSLT transformation script;
4) Copy the transformed page into a database;
5) With the list of files to download on hand, write scripts to download the files automatically.
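The steps above can be sketched in Python. This is not the original implementation: it swaps the XSLT step for Python's built-in ElementTree parser, uses SQLite as the database, and assumes the fixed-up page contains ordinary `<a href="...">` links whose targets end in `.zip`. The base URL, the `download_queue` table, and the function names are all hypothetical.

```python
import os
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlparse

# Hypothetical: relative hrefs on the saved page are resolved against this URL.
BASE_URL = "https://nces.ed.gov/ipeds/datacenter/"


def extract_zip_links(xml_text, base_url=BASE_URL):
    """Steps 2-3: collect absolute URLs of every .zip link in the fixed-up page."""
    root = ET.fromstring(xml_text)
    links = []
    for el in root.iter():
        # Drop any namespace so both <a> and XHTML-namespaced <a> match.
        tag = el.tag.rsplit("}", 1)[-1] if isinstance(el.tag, str) else ""
        href = el.get("href")
        if tag == "a" and href and href.lower().endswith(".zip"):
            links.append(urljoin(base_url, href))
    return links


def queue_links(conn, links):
    """Step 4: store the file list in a table that doubles as a download queue."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS download_queue ("
        " url TEXT PRIMARY KEY,"
        " downloaded INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO download_queue (url) VALUES (?)",
        [(u,) for u in links],
    )
    conn.commit()


def download_pending(conn, dest_dir, fetch=urllib.request.urlretrieve):
    """Step 5: fetch every queued file that is not yet marked as downloaded."""
    pending = conn.execute(
        "SELECT url FROM download_queue WHERE downloaded = 0 ORDER BY url"
    ).fetchall()
    for (url,) in pending:
        target = os.path.join(dest_dir, os.path.basename(urlparse(url).path))
        fetch(url, target)
        # Mark each file as soon as it lands so an interrupted run can resume.
        conn.execute("UPDATE download_queue SET downloaded = 1 WHERE url = ?", (url,))
        conn.commit()
    return len(pending)
```

Keeping a `downloaded` flag per file means that calling `download_pending` again after an interruption only fetches what is still missing. The `fetch` parameter defaults to `urllib.request.urlretrieve` but can be swapped out, for example for a stub while testing.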

In addition to the above implementation, a scheduling mechanism was implemented so that downloading can continue across the time windows available for it.
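One way such a scheduler might look, assuming downloads are only allowed inside a daily time window (for example overnight): the sketch waits until the window opens before each file and handles windows that wrap past midnight. The post does not say how its scheduling was done, so the window times, the `download_one` callback, and the function names here are hypothetical.

```python
import time
from datetime import datetime, timedelta
from datetime import time as dt_time


def in_window(t, start, end):
    """True if clock time t falls inside [start, end), allowing wrap past midnight."""
    if start <= end:
        return start <= t < end
    # Wrapping window such as 22:00-06:00: late evening or early morning.
    return t >= start or t < end


def seconds_until_window(now, start, end):
    """Seconds to wait from datetime `now` until the window next opens (0 if open)."""
    if in_window(now.time(), start, end):
        return 0.0
    opens = datetime.combine(now.date(), start)
    if opens <= now:
        opens += timedelta(days=1)
    return (opens - now).total_seconds()


def run_in_windows(pending, download_one, start, end, now=datetime.now, sleep=time.sleep):
    """Download items one by one, sleeping whenever the window is closed."""
    done = 0
    for item in pending:
        wait = seconds_until_window(now(), start, end)
        if wait > 0:
            sleep(wait)
        download_one(item)
        done += 1
    return done
```

For example, `run_in_windows(urls, fetch_one, dt_time(22, 0), dt_time(6, 0))` would only start downloads between 10 p.m. and 6 a.m.; a download that starts just before the window closes is allowed to finish.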
  

