Case Study 2: Data Transformation Stage 4

Isolating unique labels

To see which labels were unique to one set, I wrote a Python script that compared the labels in the two input CSVs and wrote two new CSVs, one for each image set, containing only the labels unique to that set. I also had the script return a count of unique labels for each set: the full-resolution base image set contained 101 unique labels, and the reduced-resolution set contained 120.
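The comparison described above comes down to a set difference in each direction. The following is a minimal sketch of that idea, not the script linked below: the function names, the file names in the closing comment, and the assumption that each CSV has a header row with a column called "label" are all hypothetical.

```python
import csv


def read_labels(csv_path, column="label"):
    """Return the distinct, whitespace-trimmed labels in one CSV column."""
    labels = set()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = (row.get(column) or "").strip()
            if value:  # skip blank or missing cells
                labels.add(value)
    return labels


def write_labels(labels, csv_path, column="label"):
    """Write labels to a one-column CSV, sorted so the output is repeatable."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])
        for label in sorted(labels):
            writer.writerow([label])


def isolate_unique_labels(path_a, path_b, out_a, out_b, column="label"):
    """Write the labels unique to each input CSV and return both counts."""
    labels_a = read_labels(path_a, column)
    labels_b = read_labels(path_b, column)
    only_a = labels_a - labels_b  # in set A but not in set B
    only_b = labels_b - labels_a  # in set B but not in set A
    write_labels(only_a, out_a, column)
    write_labels(only_b, out_b, column)
    return len(only_a), len(only_b)


# Example call with hypothetical file names:
# isolate_unique_labels("full_res.csv", "reduced_res.csv",
#                       "full_res_unique.csv", "reduced_res_unique.csv")
```

This sketch compares labels exactly as written apart from surrounding whitespace, so "Taxi" and "taxi" would count as different labels. Whether that is the right behavior depends on how the labels were recorded in the earlier stages.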

The Python script I wrote for this stage and its two output CSVs are available at the links below this section’s commentary and critique. Also included is a PDF of the consolidated comparison document, in which images that had unique labels in both the original and lower-res versions are highlighted.

Context and critique

This stage consisted of a transformation through a script. The opportunity for error or unintended consequences here is again in the coding, but I spot-checked the results by searching for the labels the script deemed unique to one image set in the other image set's output files from earlier stages, and everything seems to be functioning as intended.
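I ran this spot check by hand, but the same idea could be automated along the following lines. This is a hedged sketch only: the function name, the sample size, and the fixed random seed are assumptions made for illustration, and the labels are passed in as plain Python collections rather than read from the actual files.

```python
import random


def spot_check(unique_labels, other_set_labels, sample_size=10, seed=0):
    """Sample labels marked unique and return any that appear in the other set.

    An empty list means every sampled label passed the check.
    """
    pool = sorted(unique_labels)  # sort first so the sample is repeatable
    rng = random.Random(seed)
    sample = rng.sample(pool, min(sample_size, len(pool)))
    other = {label.strip() for label in other_set_labels}
    return [label for label in sample if label.strip() in other]
```

Any label returned by the function would have been wrongly marked as unique, since it also appears in the other image set.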

Stage 4 data consists of one script and two output CSVs, both produced by that script. Also available is a PDF of the consolidated data with unique labels highlighted. Click the links below to download from GitHub.

RIGHT CLICK TO DOWNLOAD – Script

RIGHT CLICK TO DOWNLOAD – NYC output CSV

RIGHT CLICK TO DOWNLOAD – NYC 25% output CSV

RIGHT CLICK TO DOWNLOAD – Consolidated comparison PDF
