Case Study 2: Data Transformation Stage 3

Manipulating the data formatting

In this stage I edited the CSV output of my Python script from stage 2. As mentioned, the script wrote each row as a bracket-enclosed list of quoted values separated by commas, so the file could not be read as a standard CSV as is.

In LibreOffice I used Find/Replace to delete all square brackets, single quotes, and the spaces following commas. I then split the data on commas with the Text to Columns feature, added a column containing the city name, added column headers (field names), and saved the result as a CSV.
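The manual Find/Replace and Text to Columns steps could also be scripted. The following is a minimal sketch, not the method used in this project, assuming each non-empty line of the stage 2 output is a Python-style list literal such as ['img1', 'Bird', '0.97']. The column names in HEADER and the sample values are hypothetical. It parses each line with ast.literal_eval and writes rows with Python's csv module, which automatically quotes any field that contains a comma.

```python
import ast
import csv
import io

# Hypothetical field names for illustration; the project's real headers are not listed here.
HEADER = ["city", "image_id", "label", "score"]


def reformat_stage2_output(raw_text: str, city: str) -> str:
    """Convert lines like "['img1', 'Bird', '0.97']" into standard CSV text.

    Assumes each non-empty line of the stage 2 output is a Python list literal.
    """
    out = io.StringIO()
    # QUOTE_MINIMAL (the default) wraps fields containing commas in double quotes.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        values = ast.literal_eval(line)  # safely parses the bracketed, quoted list
        writer.writerow([city, *values])  # prepend the city column
    return out.getvalue()
```

For example, passing the line ['img1', 'Ducks, geese and swans', '0.91'] with city "NYC" yields the row NYC,img1,"Ducks, geese and swans",0.91, so the label stays in a single column instead of being split on its internal comma.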

The two CSVs I formatted in LibreOffice are available at the links below this section’s context and critique.


Context and critique

This stage was necessitated by the awkwardly formatted output of my script in the preceding stage. Manual intervention like this is prone to consistency errors. In one instance, a Google-applied label contained a comma (“Ducks, geese and swans”), so splitting on commas broke the label across two columns, which I had to fix by hand.
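The broken label can be reproduced, and caught automatically, with a short check. This sketch assumes the same hypothetical list-literal lines as input and an expected field count chosen for illustration. strip_like_find_replace only approximates the Find/Replace steps described earlier, and field_count_mismatches is a hypothetical helper that flags rows that do not split into the expected number of columns.

```python
def strip_like_find_replace(line: str) -> str:
    """Approximate the manual Find/Replace: drop brackets, single quotes,
    and spaces that follow commas."""
    for ch in "[]'":
        line = line.replace(ch, "")
    return line.replace(", ", ",")


def field_count_mismatches(lines, expected: int):
    """Return (line_number, field_count) for each row that does not split
    into `expected` fields after stripping and splitting on commas."""
    bad = []
    for n, line in enumerate(lines, start=1):
        count = len(strip_like_find_replace(line).split(","))
        if count != expected:
            bad.append((n, count))
    return bad


sample = [
    "['img1', 'Bird', '0.97']",
    "['img2', 'Ducks, geese and swans', '0.91']",  # label contains a comma
]
print(field_count_mismatches(sample, expected=3))
```

Here the second row splits into four fields instead of three because the comma inside the label is indistinguishable from a delimiter once the quotes are removed. Running a check like this before saving would surface such rows instead of relying on spotting them by eye.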

Stage 3 data consists of two output CSVs. Right-click the links below to download them from GitHub.

RIGHT CLICK TO DOWNLOAD – NYC

RIGHT CLICK TO DOWNLOAD – NYC 25%


Continue on to the next section

Return to table of contents
