Preface
I’m taking a Quantitative Reasoning (QR) class at the National University of Singapore (where I study) this semester. I came to some pretty harsh realisations just 2 weeks into the class that I hope to address here.
The subject and focus of the class is the Pursuit of Happiness. No, it’s not a course on how to achieve happiness or how to ponder it beyond face value – leave that to the philosophers. There is no fixed objective, and the “target board” seems to move every week as we learn new ways of becoming better “data analysts”. Of course, this is not a substitute for the other, more established Data Science and Data Analytics classes at NUS.
The real goal of the class, that I’m slowly grasping, is to ask better, specific, actionable questions after examining any data or piece of information. It’s not about answering them just yet; it’s about being able to ask them in the first place.
… and this blogpost outlines how I royally messed that up!!!
In some ways, this blogpost presents a Data Science spin on my friend Linus’s article titled “Build tools around workflows, not workflows around tools”. It’s an 11/10 must-read you can find here.
The Fault In Our Tools
I’ll bite: I have a technical background. I study Computer Science and Pure Mathematics with statistics and data science baked into my curriculum. By practice, I’m a Machine Learning research student working on some lovely projects under the guidance of amazing professors and mentors who I’m indebted to for life.
Since class began, I authored cool Colab notebooks and generated some colourful plots of the dataset I was working with, sharing them with my Professor and class (it’s an incremental knowledge-sharing-based course). I tried out all sorts of correlations while remembering the age-old mantra of “correlation does not imply causation”.
I liked to think I have this gigantic arsenal of technical tools at my disposal. In fact, I was among the few in class who felt confident working with large datasets and writing code to achieve funky results.
Despite all that, I somehow managed to ask the wrong questions during my exploratory data analysis (EDA) and drew inaccurate insights from my datasets. In fact, I made some silly mistakes that senior data scientists would chide their juniors for.
… and that made me the biggest loser.
Setting Traps … For Myself
I say “Loser” not in the general sense, but in that I went astray from the true motive of the class, what it stood for, and what my Professor was trying to tell me (and my classmates) all this while. Simply put, I lost track of the true goal.
You see, my ability to write code, knowledge of fancy algorithms, and usage of sTaTe-oF-tHe-aRt ML/DS models on varied types of data were the reasons for my immediate downfall. I very epically trapped and handicapped myself in a tool-centric way of thinking that ended up constraining my QR skills (i.e., the ability to ask meaningful questions).
I strongly believed that having access to these technical tools would enable me to ask better questions.
It dawned on me the same day I started writing this blogpost:
Asking questions about data is NOT dependent on the tools used. It's the tools that are dependent on the questions asked about the data.
The Professor could have easily called this an “Intro to Data Science and Analytics” class and I’d probably have passed with minor scratches. But there’d have been zero value-add on top of what I’m already doing in my major(s). It just had to be different! My toolkit – my supposed “superpowers” – became my greatest weakness in this class.
It hit me like a truck
This week (at the time of writing), my Professor was going through the exercise of learning to ask good questions during our weekly seminars. He was showing us the World Happiness Report (going back to the topic of the class) and its various nuances.
The catch? There was no code, no funky visualisation tools, no GPU-powered notebooks, no colourful graphs. Nothing. Just MS Excel.
He was asking some really high-level questions about it, and so were my friends who were discussing it on the call. The Professor showed me (and my peers) that it is possible to ask meaningful questions without using anything fancy, no matter the complexity of the dataset. Then again, my Professor told us at the start to keep it simple. Why hadn’t I thought of that?
Which brings me to the main point of this blogpost …
Stop Uselessly Chasing Complexity
I made technology my crutch.
I realised that this technical toolkit had me chasing complexity just so that I could use said technical toolkit. There was no “fun” in going back to the basic data tools that were no longer the popular choices.
It brought me closer to an even greater realisation:
Once you learn about the complexity and subtleties behind things around you, you subconsciously wish to see it everywhere because it validates what you know / what you just learned.
It makes you feel that the knowledge you possess is very useful and thereby, worth using everywhere.
It was no different here: I came in thinking that my awesomely cool mAcHiNe lEaRnInG mOdElS and bAyEsIaN iNfErEnCe aLgOrItHmS would be my saving grace. I thought I’d be using them everywhere and asking some really cool questions about the data (what QR is all about!) that would blow everyone’s minds.
I did NOT want to go back to basic EDA methods. Doing that was uncool and boring. My fallacy was equating “uncool basics” with “not showing originality in the questions asked about data”.
You could say that it was a rather intellectual-ego-quenching thing to think about. I felt like I’d be like everybody else if I touched the basic EDA techniques like regression lines or plotting simple line graphs. I thought that I’d have no edge over everyone else in my class.
Yet, doing that was exactly what helped my friends ask better questions that matter.
Long story short, stop chasing complexity in what you do just so that you get to use what you’ve spent years learning. Sometimes, going back to the basics can pay fat dividends in the long term.
It makes life so much easier and more enjoyable too!
Questions First, Tools Next
What I figured out isn’t exactly new. It’s a rite of passage for people studying ML and DS. It just so happens to be my turn to narrate the story and pass on what I know.
QR is a means to ask questions that are more than meets the eye, not simple X-Y relationships and correlations; the latter is just plain-old statistics.
QR is like “reading between the lines” but for numeric data.
This isn’t a one-off, sudden realisation. Over the 2 weeks of the class, I observed that friends of mine from non-STEM majors (i.e., those who didn’t have the “tech toolkit”) asked some really solid questions and drew powerful insights from the different datasets they were examining. All in all, what they shared seemed well-planned and well-executed.
I expected to do better in the class just because I had access to more powerful tools. I didn’t even think about my ability to generate novel, useful questions …
… which is one of the prime objectives of the class.
Tips for my ML/DS Pals
Here’s some stuff I want to share with you. It’s in bullet point format because I’m running out of things to say 🤣
- Stop chasing complexity: KISS
- Ask questions about the data at first glance before doing complex code-related EDA:
  - How was the data created?
  - Why was something measured the way it was?
  - Is this the best way to measure said variable?
  - For multiple data sources, are they all measured in the same way to ensure consistency and fairness?
- Data cleaning is everything
- The source of and authority behind the data is very important
- You don’t need to know all sorts of fancy tools to ask better questions
- Use common sense to find out what’s the best way to represent your data
There are tons more, but these are what I’m going to practise extensively in class from here on out. Gone are the days when the mere presence of my technical toolkit justified using it on data.
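That said, if you do end up reaching for code, the first-glance questions above don’t need anything fancy. Here’s a minimal sketch in plain Python (no libraries). The rows, the column names (`source`, `score`, `scale`) and both helper functions are made up for illustration; none of it comes from the World Happiness Report or my class notebooks.

```python
from collections import defaultdict

# Hypothetical rows, invented purely for illustration (not real survey data).
rows = [
    {"country": "A", "source": "survey_1", "score": "7.2", "scale": "0-10"},
    {"country": "B", "source": "survey_1", "score": "", "scale": "0-10"},
    {"country": "C", "source": "survey_2", "score": "68", "scale": "0-100"},
]


def missing_counts(rows):
    """Count empty or absent values per column."""
    counts = defaultdict(int)
    columns = {key for row in rows for key in row}
    for row in rows:
        for col in columns:
            if row.get(col) in (None, ""):
                counts[col] += 1
    return dict(counts)


def scales_by_source(rows):
    """Map each data source to the measurement scales it reports."""
    scales = defaultdict(set)
    for row in rows:
        scales[row["source"]].add(row["scale"])
    return {source: sorted(found) for source, found in scales.items()}


missing = missing_counts(rows)
scales = scales_by_source(rows)
all_scales = {s for found in scales.values() for s in found}

print("Missing values per column:", missing)
print("Scales per source:", scales)
# A mismatch here is a prompt for a question, not an answer in itself.
print("All sources measured the same way?", len(all_scales) == 1)
```

The output is only a starting point: a missing score or a mismatched scale should send you back to asking how the data was created and why it was measured that way.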
On Next Steps
Since the dawn of this “newfound knowledge”, I’ve decided to ditch my toolkit in class.
It’s a rather scary move given my background, but I hope to start from the ground up. I’m essentially removing one very important degree of freedom (i.e., the technical toolkit) so I can focus purely on asking better, specific, actionable questions that matter simply by looking at the data as is. I’ll primarily be using Google Sheets and MS Excel for my analyses – if others are using them and asking great questions, why can’t they work for me?
Of course, this isn’t permanent. I’ll slowly be transitioning back to my powertools once I begin to ask some good QR-related questions and analyse data the right way (whatever that means).
Till then, let’s see how it goes! More updates to come.