In a previous post, I argued that aspiring Machine Learning Engineers should focus more on engineering than on machine learning. Most practitioners will never need to train a fully state-of-the-art model, and reading papers about them should mostly be regarded as entertainment rather than as having serious career value.
That said, there are some research areas that happen to have significant overlap with problems commonly faced as a practitioner in industry. In this post I detail three areas I think are the most relevant.
Transfer and Few-Shot Learning
Transfer learning is nothing new. The concept was first proposed in 1976, with the first success occurring in 1981. But after numerous AI winters, it was mostly ignored until deep learning returned to the forefront of AI with AlexNet’s success on ImageNet in 2012. Soon after, researchers took up the subject again and started reusing the first few layers of deep convolutional networks across many different tasks. At NIPS 2016, Andrew Ng even said that transfer learning would be the next driver of commercial ML success.
Since then, transfer learning has obtained success after success, not just in computer vision, but also in natural language processing with the rise of pre-training methods like BERT’s that are able to leverage massive amounts of unlabeled data.
At this point, as an applied researcher, it is very rare that I train any deep learning model from scratch, especially for complex tasks like those in NLP. Leveraging pre-trained models like those from Hugging Face greatly increases accuracy, especially on tasks where there is not much data available. In extreme cases, models must learn from very few examples, and it is in these scenarios that we can turn to research for guidance on how to get the most out of limited data.
In the past year, GPT-3 has shown us what’s possible in few-shot learning with truly massive model sizes. But generative text modeling is an extraordinarily difficult task that few practitioners will ever need to tackle, and we don’t need multi-million-dollar budgets to obtain good results from few data points. Few-shot learning is a particularly hot field right now and is one of the few that I think is really worth paying attention to as a practitioner.
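The core idea behind this style of transfer learning can be sketched in plain Python: keep a frozen "pretrained" feature extractor and fit only a tiny head on a handful of labeled examples. Everything here is illustrative — the `pretrained_features` function is a stand-in for a real pretrained encoder, and the nearest-centroid head is just one of the simplest few-shot classifiers:

```python
import math
from collections import defaultdict

def pretrained_features(x):
    """Stand-in for a frozen pretrained encoder: maps a raw input
    (here a list of numbers) to a fixed feature vector."""
    return [sum(x), max(x) - min(x)]

def fit_centroids(examples):
    """Fit a nearest-centroid head on very few labeled examples.
    `examples` is a list of (raw_input, label) pairs."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for x, label in examples:
        f = pretrained_features(x)
        sums[label] = [a + b for a, b in zip(sums[label], f)]
        counts[label] += 1
    return {lbl: [v / counts[lbl] for v in s] for lbl, s in sums.items()}

def predict(centroids, x):
    """Classify by distance to the nearest class centroid in feature space."""
    f = pretrained_features(x)
    return min(centroids, key=lambda lbl: math.dist(centroids[lbl], f))

# Two examples per class can be enough when the frozen features are informative.
train = [([1, 1, 1], "small"), ([2, 1, 2], "small"),
         ([9, 8, 9], "large"), ([7, 9, 8], "large")]
centroids = fit_centroids(train)
print(predict(centroids, [8, 8, 9]))  # → large
```

In practice the encoder would be a real pretrained network and the head might be a linear layer, but the division of labor is the same: almost all the knowledge lives in the frozen features, so the part that must be learned from scratch needs very little data.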
- Sebastian Ruder’s blog post
- A Comprehensive Survey on Transfer Learning
- Generalizing from a Few Examples: A Survey on Few-Shot Learning
Explainability and Interpretability
If neural networks had perfect accuracy, we wouldn’t need to know why they make the predictions they do. But they don’t, so this field becomes immensely important as these “black box” models are used for increasingly vital tasks.
If you work on a product or feature where customers directly interact with a model, you’ve inevitably had a customer bring you an example where the model made an error and ask why. In these situations, it is rare that a technical explanation of why you cannot know will suffice. Trust me on this one.
Feature importance charts or input saliency maps can be helpful at times, but still don’t tell the full story of what is going on. Unfortunately, this is often where we have to accept the limits of our tools and turn to recent research if we truly want to dive deeper.
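One of the simpler tools behind those feature importance charts is permutation importance: shuffle one feature’s values across examples and measure how much accuracy drops. The sketch below is a from-scratch illustration on a toy model (the model and data are made up for the example):

```python
import random

def accuracy(model, X, y):
    """Fraction of examples the model classifies correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature_idx, n_repeats=10, seed=0):
    """Importance of one feature = average accuracy drop when that
    feature's column is shuffled across rows, breaking its relationship
    to the labels while leaving its marginal distribution intact."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature_idx] for row in X]
        rng.shuffle(column)
        X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                  for row, v in zip(X, column)]
        drops.append(base - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats

# Toy model that only ever looks at feature 0.
model = lambda row: int(row[0] > 0.5)
X = [[0.1, 0.9], [0.9, 0.1], [0.2, 0.8], [0.8, 0.2]]
y = [0, 1, 0, 1]
print(permutation_importance(model, X, y, feature_idx=0))  # large drop
print(permutation_importance(model, X, y, feature_idx=1))  # 0.0: ignored feature
```

Note what this does and doesn’t tell you: it reveals which features the model depends on globally, but not why the model made a specific prediction for a specific customer — which is exactly the gap the research cited below tries to close.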
On the AI ethics front, some models need to be provably fair with respect to certain features, and that is only possible with the tools coming out of research in the explainability space. I suspect this will only become more important.
- Explainable Artificial Intelligence: a Systematic Review
- Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI
- A Survey on the Explainability of Supervised Machine Learning
Model Compression and Distillation
I’ve had to learn the hard way that accuracy at all costs is not a feasible way to go about creating models meant for production. Features have SLAs and infrastructure has limits. In the past, I spent an entire month working on a very clever model that greatly outperformed its predecessor, only to find that it exceeded the maximum latency we had to support.
With the rise of real-time machine learning, reducing inference latency becomes increasingly important. There are two ways of doing this: reducing the number of ops in a model and/or making the ops run faster.
While much of the latter is not quite production-ready, since it involves specialty hardware that most teams will not have access to, quantization is the principal technique and is often a no-brainer to apply. PyTorch even recently added support for quantization-aware training!
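The arithmetic behind post-training quantization is straightforward, which is part of why it is such a no-brainer. A minimal from-scratch sketch (frameworks like PyTorch automate this per layer, with better calibration than this symmetric max-abs scheme):

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: w_q = round(w / scale), where the
    scale maps the largest |w| to 127, the int8 maximum."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.12, -0.50, 0.33, 0.99, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)        # small integers in [-127, 127]
print(max_err)  # rounding error is bounded by scale / 2
```

Storing int8 instead of float32 cuts model size by 4x, and integer ops are typically much faster on CPU — the accuracy cost is the per-weight rounding error shown above, which is usually negligible for over-parameterized networks.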
In terms of reducing model ops, pruning and distillation have been the main techniques, with the latter achieving especially good results on language models. This area of research is highly practical, and I wouldn’t be surprised if we start seeing more tools released to aid in this task.
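Distillation’s key trick is training the small student on the teacher’s temperature-softened probabilities rather than on hard labels. A sketch of the soft-target computation, assuming raw teacher logits as input (the logits here are made up for illustration):

```python
import math

def soft_targets(logits, temperature=2.0):
    """Convert teacher logits into a softened probability distribution.
    Higher temperature exposes more of the teacher's 'dark knowledge'
    about which wrong classes are almost right."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(targets, student_probs):
    """Cross-entropy between the teacher's soft targets and the
    student's predicted distribution, one term of the training loss."""
    return -sum(t * math.log(p) for t, p in zip(targets, student_probs))

teacher_logits = [5.0, 2.0, 0.1]
hard = soft_targets(teacher_logits, temperature=1.0)  # nearly one-hot
soft = soft_targets(teacher_logits, temperature=4.0)  # smoother distribution
print(hard)
print(soft)
```

At temperature 1 the teacher’s output is nearly one-hot, so the student learns little more than the label; at higher temperatures the runner-up classes get meaningful probability mass, and matching that richer signal is what lets a much smaller student approach the teacher’s accuracy.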
- A Survey of Model Compression and Acceleration for Deep Neural Networks
- Knowledge Distillation: A Survey
- Survey of Machine Learning Accelerators