Hello people, it’s been a while since I last wrote one of these tutorials, so I figured I’d cover something I recently did at work. We had to convert a YAML file to a CSV file because some of the data we were onboarding changed file type! It used to arrive in CSV format and was now being sent as a YAML file, so I needed to figure out how to do this manually and then automate it.
Since I absolutely love Python and I absolutely love using the pandas library to work with data, I decided to use both for this file conversion rather than some online tool that does it at the click of a button LOL. I just wanted to have a bit of fun!
What is a YAML file?
YAML originally stood for Yet Another Markup Language, which I guess is a bit tongue in cheek (officially it is now the recursive acronym YAML Ain’t Markup Language). It is mostly used for configuration files, works with pretty much any programming language and, as of version 1.2, is a superset of JSON, another data serialisation language. It can do everything that JSON can do and more: it supports comments, is more readable, uses Python-like indentation, etc.
In order to convert the YAML file to a CSV file using Python, I first needed to look at the data to see what cleaning was required.
I can’t share exactly what I was working on, so I’ve added some examples found online; this is what YAML looks like in comparison to XML and JSON.
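To make the comparison concrete, here is a minimal sketch using made-up sample data (not the data I was working on). It assumes PyYAML is installed (`pip install pyyaml`), parses a small YAML string and prints the same structure as JSON so you can see the two side by side.

```python
import json

import yaml  # PyYAML; assumed to be installed

# Made-up sample data, purely for illustration
yaml_text = """
# comments like this one are allowed in YAML, but not in JSON
employees:
  - name: Alice
    role: analyst
  - name: Bob
    role: engineer
"""

# safe_load turns the YAML text into plain Python dicts and lists
data = yaml.safe_load(yaml_text)

# The same structure written out as JSON (note the comment is gone)
json_text = json.dumps(data, indent=2)
print(json_text)
```

The indentation in the YAML does the job that the braces and brackets do in the JSON.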
So as you can see, YAML looks a lot like JSON, so you can first normalise it by loading the YAML file and then converting it to JSON. Go ahead and try this now on Google Colab!
Here is what you need on Google Colab if you are uploading any files:
    from google.colab import files

    # Upload file
    uploaded = files.upload()
Then import the libraries you require and convert the YAML to JSON:
    import yaml
    import json
    import pandas as pd

    # Open YAML file
    with open('file.yaml', 'r') as file:
        configuration = yaml.safe_load(file)

    # Convert to JSON
    with open('file.json', 'w') as json_file:
        json.dump(configuration, json_file)
Then you need to read the JSON as a DataFrame in order to make use of pandas to clean and re-organise the data however you see fit!
    # Read JSON as DataFrame (read_json accepts the file path directly)
    df = pd.read_json('file.json')
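One thing to watch out for: if your YAML is nested, `pd.read_json` can leave you with columns full of dictionaries. A possible alternative is `pd.json_normalize`, which flattens nested records into dotted column names. Here is a rough sketch with made-up records, shaped like what `yaml.safe_load` might hand back; your own data will almost certainly look different.

```python
import pandas as pd

# Hypothetical records, similar in shape to the output of yaml.safe_load
records = [
    {"name": "Alice", "contact": {"email": "alice@example.com", "city": "Leeds"}},
    {"name": "Bob", "contact": {"email": "bob@example.com", "city": "York"}},
]

# Nested keys become dotted column names such as "contact.email"
df = pd.json_normalize(records)
print(df.columns.tolist())
print(df)
```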
Boom, there you have it: your DataFrame, ready for cleansing, organising or whatever you want to do with it! We had to do much more after this, but that’s unique to our data, so feel free to play around with your YAML file in DataFrame format.
Once you’ve finished playing around with your data, it’s time to export it as a CSV.
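For the export itself, pandas has `DataFrame.to_csv`. The sketch below uses a small stand-in DataFrame with made-up values, and the output name `file.csv` is just an assumption to match the other file names in this post.

```python
import pandas as pd

# Stand-in for your cleaned DataFrame (made-up values)
df = pd.DataFrame({"name": ["Alice", "Bob"], "role": ["analyst", "engineer"]})

# index=False leaves out the row numbers pandas would otherwise write as a first column
df.to_csv("file.csv", index=False)

# Read the file back in as a quick sanity check
print(pd.read_csv("file.csv"))
```

If you are on Google Colab, `files.download("file.csv")` from `google.colab` should then let you save the result to your own machine.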
Et voilà! I am not kidding you, it was as easy as 1.. 2.. 3 🙂