Tuesday, May 1, 2012

Zap Gremlins

So much fun exporting data from MSSQL Express to PostgreSQL. 

So the MSSQL "Save Results As..." feature from the query editor window allows export as csv - which is a step in the right direction, but still pathetic in it's options [ie: none] However the encoding appears to be UTF-16, I don't know if this is database derived or if it forces encoding at export time [I have no interest in learning more about MSSQL unless it's absolutely necessary, out of the gate I despise it with a deep and abiding loathing]

When attempting to prep the exported csv with a simple php script to insert the data into a holding table, I was appalled to find out that there were encoding artifacts *in between every character* [my ignorance of UTF-16 encoding shines here... I'm assuming it's a multibyte character encoding byte of some sort] - long story short, the data needs further intervention before it's ready to import/insert into a UTF-8 PostgresSQL database table.

TextWrangler to the rescue:
Using the Zap Gremlins function I was able to eradicate all non ascii characters from my dataset - which in this instance was fine as I'm doing very rudimentary and gross information prep.