It seems like every time I hand in or finish an exam, my immediate reaction is “Oh god I wanna go write a blog about this”. It’s almost like I need a release and somewhere to put all that pent-up emotion lol. This review is for the Big Data Analytics module on the Computer Science MSc in Data Analytics at The University of York.
Big Data Analytics
This module walks you through the CRISP-DM process (Cross-Industry Standard Process for Data Mining), which is a Data Science/Mining process for working through data to solve a problem. Each week covers a section of the process, and by the end of the module you will have learnt how to come up with your own research question, explore data using descriptive statistics, and clean and prepare data ready for machine learning. Then you analyse the results, all of which prepares you for the final assignment.
Unfortunately, you don’t get taught how to use Python to do all these things. Instead, you can use any mix of Python, Excel and Weka. Weka is an open-source machine learning tool used in Data Mining. If you know how to use Python with Jupyter Notebooks then you will have the advantage of being able to clean data quickly. Weka does this too, but it’s annoying to learn. Luckily, we learnt how to use Python to clean data in the gruelling module: Advanced Programming where they teach you Python in 1 week to try to kill you before you’re even halfway through the module 🤪.
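For anyone wondering what "cleaning data in Python" looks like in practice, here is a minimal sketch. It is not taken from the module: the sample data, the column names and the `clean_rows` helper are all made up for illustration, and it sticks to the standard library so it runs anywhere (in a Jupyter Notebook you would more likely reach for pandas, where `drop_duplicates` and `fillna` cover the same steps).

```python
import csv
import io

# Hypothetical raw data: stray whitespace, a missing age, and a duplicate row.
RAW_CSV = """name,age,city
 Alice ,34,York
Bob,,Leeds
 Alice ,34,York
Carol,29, york
"""


def clean_rows(text):
    """Strip whitespace, normalise city names, drop duplicates, fill missing ages."""
    rows = []
    seen = set()
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip()
        city = row["city"].strip().title()
        age = int(row["age"]) if row["age"].strip() else None
        key = (name, age, city)
        if key in seen:  # skip exact duplicates
            continue
        seen.add(key)
        rows.append({"name": name, "age": age, "city": city})

    # Replace missing ages with the mean of the known ones (one simple choice;
    # this sketch assumes at least one age is present).
    known = [r["age"] for r in rows if r["age"] is not None]
    mean_age = sum(known) / len(known)
    for r in rows:
        if r["age"] is None:
            r["age"] = mean_age
    return rows


cleaned = clean_rows(RAW_CSV)
for r in cleaned:
    print(r)
```

Filling gaps with the mean is only one option. Whatever you pick, it is the kind of decision you would want to justify in the assignment.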
Then when you’re done fixing up your data, you learn about different machine learning models, what they do and what type of data each one is best suited to. Then you use these on your data to categorise it or make predictions. Mine were completely terrible no matter what I did; it just didn’t work out… maybe yours will be better but my machines have learnt jack shit for the past 2 weeks 😭.
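To make "use a model to categorise your data" a bit more concrete, here is a rough sketch of one of the simplest classifiers, k-nearest neighbours, written in plain Python. This is an illustration rather than what the module or Weka does under the hood: the toy dataset, the feature meanings and the `predict` function are all invented for the example.

```python
import math

# Hypothetical toy data: (hours_studied, hours_slept) -> "pass" or "fail".
TRAIN = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((1.5, 6.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((7.0, 8.0), "pass"),
    ((8.0, 6.5), "pass"),
]
# Held-out examples the model never sees while "learning".
TEST = [
    ((1.2, 5.0), "fail"),
    ((7.5, 7.0), "pass"),
    ((5.5, 7.5), "pass"),
]


def predict(point, training, k=3):
    """Label a point by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda ex: math.dist(point, ex[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


predictions = [predict(features, TRAIN) for features, _ in TEST]
correct = sum(p == actual for p, (_, actual) in zip(predictions, TEST))
accuracy = correct / len(TEST)
print(predictions, accuracy)
```

The toy data is deliberately clean and well separated, so the classifier has an easy job. Real datasets rarely behave this nicely, which is where the justification part of the assignment comes in.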
A good friend on Slack (GFOS) from this course (that I met last week on Slack who has the same last name, is a similar age AND lives near me – like WTF!!!) told me you can have awful results but if you justify why then your awful results become AWESOME!!! So bear that in mind.
Once you have analysed your results, you learn why you would put the data in a database: namely, how to convert your flat data file into a relational database and how to show this visually, using an Entity Relationship Diagram or text schema, to anyone else who would want to deal with it.
Then you are directed to where you can learn simple SQL queries to communicate with your database. You don’t really LEARN SQL; you just learn what it can do. They give pointers on where you can learn more about the subjects in depth.
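Here is a small sketch of what turning a flat file into a relational database and then querying it with SQL can look like, using Python's built-in `sqlite3` module. The table names, columns and sample rows are assumptions made up for this example, not the module's dataset.

```python
import sqlite3

# Hypothetical flat file: the customer's details repeat on every order row.
FLAT_ROWS = [
    ("Alice", "York", "Laptop", 900.0),
    ("Alice", "York", "Mouse", 20.0),
    ("Bob", "Leeds", "Keyboard", 45.0),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL,
        UNIQUE (name, city)
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item TEXT NOT NULL,
        price REAL NOT NULL
    );
""")

for name, city, item, price in FLAT_ROWS:
    # Store each customer once, then link the purchase via the foreign key.
    conn.execute(
        "INSERT OR IGNORE INTO customer (name, city) VALUES (?, ?)", (name, city)
    )
    (customer_id,) = conn.execute(
        "SELECT customer_id FROM customer WHERE name = ? AND city = ?", (name, city)
    ).fetchone()
    conn.execute(
        "INSERT INTO purchase (customer_id, item, price) VALUES (?, ?, ?)",
        (customer_id, item, price),
    )

# A simple query: total spend per customer, joining the two tables.
totals = conn.execute("""
    SELECT c.name, SUM(p.price)
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)
```

The point of the split is that Alice's city now lives in exactly one row, so there is only one place to fix it if it changes. The two tables and the `customer_id` link between them are what an Entity Relationship Diagram would show.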
Some great resources I used that were quick to dip in and out of throughout this course were:
- Data Mining with Weka on FutureLearn – The guy who wrote the Data Mining book which you’ll have to read created this short course on Weka.
- SQL Tutorial on W3Schools – Runs you through the most basic SQL commands in a simple read-and-do format. You don’t really need SQL for this module but I’m sure you’ll need it in the real world.
- Python for Data Analysis by Wes McKinney – This book was part of the Advanced Programming module but I bought it because it was so damn good to flick through and has lots of practical guidance on how to use Python to manipulate data for statistics and analysis.
Finally you learn about Privacy Issues and Validation. These are covered briefly and not much in depth. They were interesting but I struggled to understand the topic on validation.
Anyways that’s pretty much it. It’s the same gruelling method of making you read 10000 pages across 2-4 books with pimped out paragraphs. I made the mistake (AGAIN) of reading through all the content and then tackling the assignment last. My new GFOS gave some great advice as she finished her assignment… ONE WEEK AHEAD OF THE DEADLINE! I was gobsmacked. These were her 2 pence…
- Chrome plugin for a Pomodoro Assistant – Use the Pomodoro method, which means 20 minutes of study time, then a 5-minute break, for productivity.
- Start the assignment WHILST you study each topic. Each week was associated with an area of the assignment, so she would read through the content and do the matching part of the assignment whilst it was fresh in her mind. This allowed her to complete it on the go rather than in a mad rush at the end, where I put in all-nighters that bring me back to my prime Uni days.
- Assign a whole day to each part of the assignment. This helped her break down the task into manageable chunks and allowed a whole day dedicated to each area of focus.
If you’ve happened to stumble upon this blog, haven’t enrolled yet and are reading this to decide whether you want to sign up for the Computer Science MSc in Data Analytics course at The University of York, then I suggest you read my other reviews for each module.
Other Helpful Applications
- Visual Paradigm – You can use this for your Entity Relationship Diagrams (or any diagrams). It’s free.
- or Lucidchart – A prettier app for diagrams/models. Read my article on how to get the educational license FOR FREE.
- Mendeley – Save all your IEEE references for your assignment using the handy Chrome plugin. Educational license = FREE.
- Bibguru – Automatically convert links into IEEE citations for your assignments – BEWARE OF PAGINATION FOR BOOKS – It’s buggy AF. In my feedback, they mentioned that if I’m referencing specific topics I needed to add page numbers. So proofread all your references if you use this.
- Google documents – I know everyone uses this but you can access it from anywhere and exporting to PDF makes it super convenient for submission.
- Jupyter Notebooks – A great interactive interface for cleaning data in Python.
I enjoyed learning more about Machine Learning tools. I honestly didn’t know what ML was. Until this module, I always thought it was a robot that was programmed to learn things on its own, build on that knowledge and eventually take over the world as an AI… but in reality, it’s more about the information you feed the ‘robot’, which has specific learning processes. Then it uses that to make its own judgements. In Machine Learning there are loads of models to choose from depending on how you want the learning to happen, so it’s not as crazy as you see in Science Fiction movies. Although, I’m sure there’s some super boffin out there who knows how to do the crazy AI stuff. I, on the other hand, now know how to click some buttons in Weka and wrongly classify my data 😂 which is definitely progress from last month when I couldn’t even understand WTF Linear Regression was… (it’s one of the most basic algorithms in ML) 😭
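For anyone else who is still fuzzy on Linear Regression: with one input it just means finding the straight line that best fits your points, then using that line to predict new values. Below is a minimal sketch of ordinary least squares in plain Python. The `fit_line` function and the numbers are made up for illustration and are not from the module.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y move together, divided by how much x moves on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that follows y = 2x + 1 exactly.
hours = [1, 2, 3, 4]
scores = [3, 5, 7, 9]
slope, intercept = fit_line(hours, scores)
predicted = slope * 5 + intercept  # prediction for an unseen input, x = 5
print(slope, intercept, predicted)
```

Because the made-up points sit exactly on a line, the fit recovers it perfectly. With messy real data the line only approximates the points, and how far off it is becomes part of what you report.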
Can I just say…how tired I am??
I went from an excited and positive the-sun-shines-out-my-bum kinda student starting her Master’s Degree to a weathered, mentally exhausted and heavily pedantic old hag in less than 6 months 😂. Don’t get me wrong there are highlights to doing this course but there are also way too many low points.
The main issue I found is that they direct you to the content to “learn”, but there is no way of confirming that what you’ve learnt is correct until your final assignment… which is marked, and by then it’s too late to unlearn the wrongdoings. Therefore, my suggestion is that when you are taught a technique, you try it, write it out and then send an email asking for feedback. I wish I had done this for all the diagrams, considering we had to do some for the assignment and I had no clue in hell what to do.
I also found the assignment really difficult. Namely, because none of my models or statistical things were working but also because I didn’t know how to write about everything I was doing. On top of having to answer 3 research questions, having to do it in under 300 words was an even BIGGER STRUGGLE; by the time I did 1000 hours of research on Google my words were over the 300-word limit by about 3000 words and I was struggling to cut my paragraphs down 😵. I stayed up until the early hours and handed it in last minute, once again, but now I feel a bit more confident about what Linear Regression is… amongst other things 😂.
Furthermore, I’ve learned so much, which I’m happy about, but I am still frustrated with how I am unable to time-manage properly. I’ve bought 3 books on Time Management, Critical Writing and Report Writing. I’ve read the latter two and they came in SUPER handy but I am yet to read the time management one. Since my payment didn’t go through, I’ve now been forced to go on a study break, pushing my graduation date back by 6 months. With the time I have I may write another article… maybe even a whole book on helping others with their writing skills, but I will wait until I get the results for this module before I go ahead and lead others astray. Sign up to my newsletter to keep updated ✌️.
Anyways, thanks for reading!