In 2015, when I entered college and first started learning about data science and machine learning, the title of “data scientist” meant anything from analyzing a business’s data and suggesting strategy adjustments to doing novel research in deep learning algorithms. Since then, the field has splintered into many different roles, with the people doing the aforementioned tasks now working under the titles of “Data Analyst” and “Research Scientist”. Some other roles that now exist are “Data Engineer”, “Applied Scientist”, and “Machine Learning Engineer”.
For a long time, the bottleneck to creating useful machine learning products was insufficient accuracy, caused by a lack of good datasets, algorithms that could learn from them, and computing power. Researchers in both academia and industry have largely removed this bottleneck through new datasets, algorithms that can learn from both labeled and unlabeled data, and the continuation of Moore’s Law.
But despite these amazing advances, they are being applied almost exclusively at large tech companies, suggesting a further bottleneck in the system.
I believe that this bottleneck now lies in putting these models into production.
Some believed that machine learning models were no different from other code and should be treated accordingly: built and deployed with the same processes and tools developed over the last few decades. This belief is false, and it has caused much strife for the teams that tried it.
It is out of this realization that the role of the “Machine Learning Engineer” was created. And as more companies have started to use ML, the popularity of the role has grown accordingly:
But what does a ML engineer actually do?
Like the “data scientist” before it, what MLEs actually do varies greatly from company to company. The only commonality is the mandate to “put models into production”.
One of the reasons that adoption of ML is so hard from an organizational perspective is that it requires many different but interconnected parts to work together. Models need to be built from data and then deployed in such a way that they can continue learning. This involves coordination between data teams, researchers, dev ops, and occasionally even hardware infrastructure in addition to all the normal coordination required for “traditional” software like product, PM, and UI/UX.
As such, the MLE role tends to require a horizontal skill set: knowing a little about many fields rather than a lot about one (the vertical approach).
This may be different at companies whose machine learning capability is very mature (Google, FB, Netflix, Stitchfix, etc.), but those represent a small minority of the places one will find oneself working.
So what fields are these? What skills does one need? Here’s what Workera (sister company to deeplearning.ai) has to say:
I broadly agree with this; my only nitpicks are that I would de-emphasize mathematics to “developing” and place more emphasis on software engineering, folding data engineering and DevOps into it.
Having been on Twitter for the past few months, I’ve noticed that many aspiring MLEs tend to focus only on the machine learning skills, especially modeling complex problems such as those on Kaggle.
In my own work as an MLE at Workday, I’d estimate that I spend 25-40% of my development time doing data science and building models, at least 50% in software or data engineering, and then maybe 10% on DevOps.
A recurring theme in my podcasting and writing is that there is a mismatch between what most aspiring MLEs tend to focus on and what is actually useful. Reading new deep learning papers is fun and exciting, but reflects less than 10% of the actual things I do in my day-to-day. Far more valuable is to know the basics of each ML sub-field, defined as enough knowledge that you could get up to speed with the state of the art quickly if needed.
I posit that one would get a far higher return from learning more about data and software engineering than from trying to keep up with model architectures that are likely to be outdated in less than a year.
When I interviewed Josh Tobin, he mentioned that modeling is likely to become more commoditized with the rise of technologies such as Lobe and AutoML, in addition to the trend of more inference being handled by external APIs. I asked him: “Given what you’ve just said, what should machine learning engineers focus on?” He responded: “Focus on the data and on getting models into production.”
And in the most popular episode of the podcast to date, Shreya Shankar said that the main skill she wishes she had learned earlier was data wrangling: interacting with databases, ETL with MapReduce/Spark, and data cleaning.
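To make “data wrangling” concrete, here is a minimal sketch in plain Python of the kind of cleaning and aggregation work Shreya describes. The records, field names, and helper functions are all illustrative assumptions on my part, not anything from her episode; at scale you would express the same logic in SQL or Spark.

```python
# Hypothetical raw event rows, as they might arrive from an upstream system:
# missing keys and malformed values included, as is typical in practice.
raw_rows = [
    {"user_id": "u1", "amount": "10.5"},
    {"user_id": "u1", "amount": "not-a-number"},  # malformed value to drop
    {"user_id": None, "amount": "3.0"},           # missing key to drop
    {"user_id": "u3", "amount": "7.25"},
]

def clean(rows):
    """Coerce types and drop rows that can't be repaired."""
    out = []
    for row in rows:
        if row["user_id"] is None:
            continue
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue
        out.append({"user_id": row["user_id"], "amount": amount})
    return out

def totals_by_user(rows):
    """A toy aggregation step: total amount per user."""
    agg = {}
    for row in rows:
        agg[row["user_id"]] = agg.get(row["user_id"], 0.0) + row["amount"]
    return agg

cleaned = clean(raw_rows)
per_user = totals_by_user(cleaned)
```

Nothing here is glamorous, and that is the point: most of the effort in real pipelines goes into deciding which rows are salvageable and which are not, before any model ever sees the data.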
If you’re still not convinced, here’s what two ML-Twitter superstars, Chip Huyen and Santiago Valdarrama, posted earlier this year:
Notice how “ML Algos” are at #7 on Chip’s list and #6 on Santiago’s.
Yes, these skills aren’t as “sexy” as the latest SotA transformer, but for exactly that reason they are overlooked by almost everyone else you’re competing with, giving you a solid edge in the job market.