Hello people, it’s been a while since I last wrote one of these tutorials, so I figured I’d cover something I recently did at work. We had to convert a YAML file to a CSV file because some of the data we were onboarding had changed file type! It used to arrive as a CSV and was now being sent as a YAML file, so I needed to figure out how to do the conversion manually and then automate it.

Since I absolutely love Python, and I absolutely love using the pandas library to work with data, I decided to use both for this file conversion rather than an online tool that does it at the click of a button LOL. I just wanted to have a bit of fun!

What is a YAML file?

YAML originally stood for "Yet Another Markup Language" and now officially stands for the recursive "YAML Ain’t Markup Language", which I guess is a bit tongue in cheek. It is widely used for configuration files, can be read and written from almost any programming language, and (as of version 1.2) is a superset of JSON, another data serialisation language. It can do everything JSON can do and more: it supports comments, is easier to read, uses Python-like indentation, and so on.
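To see the "superset of JSON" point in practice, here is a small sketch using PyYAML (the `yaml` module used later in this post). It parses the same record written once as commented YAML and once as plain JSON; PyYAML targets YAML 1.1, but it reads ordinary JSON like this without trouble. The sample data is made up for illustration.

```python
import yaml  # PyYAML: pip install pyyaml

# The same record written two ways: YAML (with a comment) and plain JSON.
yaml_text = """
# YAML allows comments like this one
name: Ada
languages:
  - python
  - sql
"""
json_text = '{"name": "Ada", "languages": ["python", "sql"]}'

from_yaml = yaml.safe_load(yaml_text)
# JSON text is also valid YAML here, so the YAML parser reads it too.
from_json = yaml.safe_load(json_text)

print(from_yaml)               # {'name': 'Ada', 'languages': ['python', 'sql']}
print(from_yaml == from_json)  # True
```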

Before converting the YAML file to a CSV with Python, I first needed to look at the data to see what cleaning it required.

I can’t share exactly what I was working on, so here is an example I found online showing what YAML looks like compared with XML and JSON.

Image source: Nehal Kahn

As you can see, YAML looks a lot like JSON, so a good first step is to normalise the data by loading the YAML file and converting it to JSON. Go ahead and try this now on Google Colab!

Here is what you need in Google Colab to upload your files:

from google.colab import files

# Upload file 
uploaded = files.upload()
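In Colab, `files.upload()` opens a file picker, saves the chosen files to the current working directory (which is why the next step can simply open `'file.yaml'`), and returns a dictionary mapping each filename to its contents as bytes. If you would rather parse straight from that dictionary, here is a minimal sketch; `load_uploaded_yaml` is a hypothetical helper name, and a hand-built dict stands in for the real upload so it also runs outside Colab.

```python
import yaml


def load_uploaded_yaml(uploaded):
    """Hypothetical helper: parse the first .yaml/.yml file in an upload dict.

    `uploaded` maps each filename to its contents as bytes, the shape
    returned by google.colab's files.upload().
    """
    for name, content in uploaded.items():
        if name.lower().endswith((".yaml", ".yml")):
            # safe_load accepts bytes and detects the encoding (UTF-8 here).
            return name, yaml.safe_load(content)
    raise ValueError("no .yaml or .yml file found in the upload")


# Outside Colab, a hand-built dict stands in for files.upload().
fake_upload = {"file.yaml": b"name: Ada\nrole: engineer\n"}
name, data = load_uploaded_yaml(fake_upload)
print(name, data)  # file.yaml {'name': 'Ada', 'role': 'engineer'}
```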

Then import the libraries you need and convert the YAML to JSON:

import yaml
import json
import pandas as pd

# Open YAML file 
with open('file.yaml', 'r') as file:
    configuration = yaml.safe_load(file)

# Convert to JSON
with open('file.json', 'w') as json_file:
    json.dump(configuration, json_file)
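One gotcha worth knowing about this step: `yaml.safe_load` turns unquoted dates such as `2023-01-05` into Python `datetime.date` objects, and `json.dump` raises a `TypeError` on those. A simple workaround, sketched below with made-up data, is to pass `default=str` so any value the `json` module cannot serialise is converted with `str()`.

```python
import json

import yaml

yaml_text = """
name: Ada
joined: 2023-01-05
"""

data = yaml.safe_load(yaml_text)
print(type(data["joined"]))  # <class 'datetime.date'>

# json.dumps(data) on its own would raise TypeError for the date value.
# default=str calls str() on it, giving back the string '2023-01-05'.
json_text = json.dumps(data, default=str)
print(json_text)  # {"name": "Ada", "joined": "2023-01-05"}
```

The same `default=str` argument works with `json.dump(configuration, json_file, default=str)` when writing to a file.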

Next, read the JSON into a DataFrame so you can use pandas to clean and reorganise the data however you see fit!

# Read the JSON file into a DataFrame
df = pd.read_json('file.json')
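If your YAML contains nested mappings, `pd.read_json` typically leaves them as dictionaries inside a single column, which does not translate well to CSV. `pd.json_normalize` flattens nested keys into dotted column names, and it works directly on the Python objects returned by `yaml.safe_load`, so the intermediate JSON file becomes optional. The records below are hypothetical.

```python
import pandas as pd

# Hypothetical records, shaped like a list of items loaded from YAML.
records = [
    {"id": 1, "user": {"name": "Ada", "email": "ada@example.com"}},
    {"id": 2, "user": {"name": "Grace", "email": "grace@example.com"}},
]

# json_normalize flattens nested dicts into dotted column names.
df = pd.json_normalize(records)
print(df.columns.tolist())  # ['id', 'user.name', 'user.email']
```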

Boom, there you have it: your DataFrame is ready for cleansing, organising, or whatever else you want to do with it! We had to do much more after this, but it’s specific to our data, so feel free to play around with your YAML data in DataFrame form.
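One cleaning step that often comes up with YAML data is list values: YAML sequences end up as Python lists inside DataFrame cells, while a CSV cell holds a single string. A minimal sketch, assuming a hypothetical `tags` column and a semicolon as the separator, joins each list into one string.

```python
import pandas as pd

# Hypothetical data: YAML lists often end up as Python lists inside cells.
df = pd.DataFrame({
    "name": ["Ada", "Grace"],
    "tags": [["python", "sql"], ["cobol"]],
})

# Join each list into one string so it fits in a single CSV cell.
# (df.explode("tags") is the alternative: one row per list item.)
df["tags"] = df["tags"].apply(lambda items: ";".join(items))
print(df.to_csv(index=False))
```

Which option is right depends on how the CSV is consumed downstream: joined strings keep one row per record, while `explode` keeps one value per cell.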

Once you’ve finished playing around with your data, it’s time to export it as a CSV.

# Export the DataFrame; index=False stops pandas writing the row index as an extra column
df.to_csv('file.csv', index=False, encoding='utf-8')
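To automate the whole thing, the steps can be folded into a single function. This is a sketch rather than the exact pipeline we used at work: `yaml_to_csv` is a hypothetical name, it skips the intermediate JSON file by flattening with `pd.json_normalize`, and it assumes the top level of the YAML is a list of mappings (one per row).

```python
import os
import tempfile

import pandas as pd
import yaml


def yaml_to_csv(yaml_path, csv_path):
    """Hypothetical helper: convert a YAML list of records into a CSV file.

    Assumes the top level of the YAML is a list of mappings (one per row);
    other shapes would need reshaping before json_normalize.
    """
    with open(yaml_path, "r", encoding="utf-8") as f:
        data = yaml.safe_load(f)
    # Flatten straight from the parsed objects; no intermediate JSON file.
    df = pd.json_normalize(data)
    df.to_csv(csv_path, index=False, encoding="utf-8")
    return df


# Usage: write a small sample YAML file, convert it, and read the CSV back.
with tempfile.TemporaryDirectory() as tmp:
    yaml_path = os.path.join(tmp, "file.yaml")
    csv_path = os.path.join(tmp, "file.csv")
    with open(yaml_path, "w", encoding="utf-8") as f:
        f.write("- id: 1\n  name: Ada\n- id: 2\n  name: Grace\n")
    df = yaml_to_csv(yaml_path, csv_path)
    with open(csv_path, encoding="utf-8") as f:
        csv_text = f.read()

print(csv_text)
```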

And voilà! I am not kidding you, it was as easy as 1, 2, 3 🙂


If you found this post useful, please support House Ninety Two on Ko-fi.