Evidently AI: Data and Model Monitoring

Elena and Emeli of Evidently AI discuss what they've learned applying ML across a wide variety of industries, including manufacturing and industrial process improvement, and then go into why they've started building tools for data and ML monitoring as well as how teams can do it better.

Guest Bio

Elena Samuylova and Emeli Dral are the co-founders of Evidently AI, where they build open source tools to analyze and monitor machine learning models. Elena was previously the head of the startup ecosystem at Yandex, director of business development at Yandex Data Factory, and chief product officer at Mechanica AI. Emeli was previously a data scientist at Yandex and chief data scientist at Yandex Data Factory and Mechanica AI, in addition to teaching machine learning both online and at multiple universities.

Links Mentioned

Show Notes

  • 02:15 How Emeli and Elena each got started in data science
    • For Emeli, data science was a practical choice: she attended the Yandex School of Data Analysis and found it all very interesting
    • Elena started out in economics and business, but wanted to go into IT because she liked being an early adopter of technologies.
      • She originally worked at Yandex running their startup accelerator
      • Yandex launched an internal startup (Data Factory) in 2014 that consulted on applying ML at many different companies, and she didn't want to miss the opportunity
    • Emeli was a data scientist at the Data Factory, which is where they started working together
  • 07:10 Applying machine learning across a wide variety of industries at the Yandex Data Factory
    • Common challenges encountered
      • Enterprises have their data stored in many different ways, varying qualities of implementation
      • Hardest part is choosing the use-case from the data available, formulating the problem statement, deciding on metrics
      • Had to manage expectations, lots of hype around "AI"
        • Had to learn how to communicate what is possible with the technology and the limitations
    • How they chose what problem to work on
      • High-volume process that could be automated with reasonably low risk, high enough upside
      • Common use-cases were demand prediction, churn prediction, marketing personalization
    • Differences across industries
      • Algorithms used are usually the same, data setups were always very different
        • e.g. e-commerce purchase data needs to be treated differently than manufacturing sensor data
        • Very important to work with domain experts to deeply understand the data
  • 14:55 Using ML for industrial process improvement
    • Later both worked at Mechanica AI, a startup building an ML platform for process improvement of the production side of industrial manufacturing
    • There's always variation in the process: different raw materials, different environment (humidity, temperature)
    • Creating models to help the control operators understand what's happening
      • Predicting output quality, finding anomalies, etc. so that operator could intervene if necessary
    • Previous models were standard statistical process control
      • ML added an additional layer on top that took into account the machine's historical operating data
  • 23:35 Challenges encountered in industrial ML
    • Data is stored in very different systems than in IT companies
      • "Even just getting there and extracting the data is a challenge"
      • Sometimes using custom-built data collection and storage software
    • Differing levels of data:
      • Raw millisecond-level sensor time series used for immediate process control
      • Second layer which aggregates raw data and stores long term
    • Data available in real-time is slightly different than what's available historically
      • Sensors change, aggregation methods change
    • Sensors will fail or go haywire, differing sample rates or time zones between them
    • Machines are very different between plants, hard to deliver a solution that works out-of-the-box
      • Need to work closely with experts at each plant to customize the solution
  • 27:15 The huge opportunity for ML in manufacturing
    • "Majority of industrial companies are very well suited to apply these technologies because you can collect this data pretty much immediately... In many cases the major issue that we came across is that companies just throw away the data, they just don't store it."
      • "10 years ago when they instrumented this, no one thought about using all this data. The only reason why it is even recorded is because if you have some incidents you might want to look through to understand why it happened. But storing this data seemed just a cost for these companies until they understood that they can actually use it for improvement."
    • Process manufacturing has been instrumenting machines for decades
    • IT companies have a lot to learn from manufacturing as well
      • Combining human and algorithmic decision making
      • Parallels with advanced process control (APC) systems
      • Fall-back systems, safe overrides
        • 3 levels of progressively simpler models that can be used if a sensor goes out or collection fails for whatever reason
    • "We should not underestimate the size of this opportunity. Literally one third of the global GDP is manufacturing."
    • Don't have to deal with personal data, don't have issues with ethics and bias
  • 37:40 Why they started working on tools for data and ML monitoring
    • They faced all sorts of data quality issues when applying ML previously, not many easy-to-use tools to address them
    • Domain experts, who are often non-technical, will often spot data problems before data scientists do, but currently only data scientists are in a position to find them
    • "Successful adoption only happens when all the stakeholders are on board, they understand and trust the model and they know that they can actually rely on it."
  • 42:50 Different kinds of data drift and how to address them
    • Continuous drift
      • Data shifting slightly over time little by little
      • Harder to spot
      • Can address by continually fine-tuning on the newest data
    • Sudden drift
      • Large change at once in the distribution
      • Easier to spot
      • Harder to address: need to fully re-train only on newest data
        • Not always enough available
  • 48:25 Common mistakes ML teams make in monitoring
    • "Many companies treat monitoring as an afterthought."
    • One mistake is only relying on direct model feedback, which in some cases has a long lag time
      • e.g. Demand forecasting: you might only know the ground truth a week or month after the prediction is made
      • Need other ways to monitor the data that are faster
        • Statistical properties of the input data
        • Distribution of the output
    • Over-alerting
      • Lots of different features for any given model, need to figure out which are the most important that should be alerted on
        • Otherwise you won't trust the alerts and will ignore important ones when they come up
    • Not tying the accuracy metrics to a business metric and monitoring that as well
      • Not always easy, but very important
      • Need to make sure that the model is having the intended effect, that customers are using it correctly
      • e.g. Demand forecasting: measuring an inventory efficiency KPI
      • Get all stakeholders in the same room to sort it out
  • 55:25 Features of Evidently AI's library
  • 57:35 Building open source software
    • Want to establish best practices to be used by a wide variety of companies
    • Market is perhaps too early to have a commercial solution
    • Efficient distribution model
      • Developers can download it and use it right away
  • 01:00:15 Suggestions for how to use Evidently right now
    • Currently generates reports from batches of data
    • Can have it run as an automated step whenever an offline model is run on new data
    • Can use it as part of an acceptance test whenever a model is re-trained
  • 01:02:25 Technical roadmap for Evidently
    • Building an alerting service
    • Adding more connectors so that most data sources can be used
    • Model and data observability
      • Surfacing insights you might not have directly been looking for
  • 01:05:50 Monitoring complex data
    • Hard, but not impossible
    • NLP: alerting on anomalous topics or tokens
  • 01:08:50 Business roadmap for Evidently
    • Want to talk to many people to figure out best practices
  • 01:11:35 Rapid fire questions
    • For fun:
      • Elena: travel photography, walking
      • Emeli: basketball
    • Books:
      • Elena: Thinking, Fast and Slow; Flow
      • Emeli: Doing Good Better
    • Advice:
      • Elena: Don't start a business for the money
      • Emeli: Invest in fundamentals: engineering skills, programming, and math
    • Recently changed mind on:
      • Elena: Most problems don't need ML solutions
      • Emeli: Hobbies aren't just a distraction from work
    • Contrarian truth:
      • Elena: Remote work is an amazing opportunity
      • Emeli: Don't need to be passionate about something to choose it as a career
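
Example: a faster drift signal than ground-truth feedback

The notes at 42:50 and 48:25 suggest watching the statistical properties of the input data and the distribution of the output when ground truth arrives late. The sketch below is one possible way to do that, not Evidently's API and not a method described in the episode: it compares a reference batch with a current batch, column by column, using a hand-written two-sample Kolmogorov-Smirnov statistic. The function names, the column names, and the 0.3 threshold are all assumptions made for illustration; a real threshold would need tuning per feature to avoid the over-alerting problem mentioned above.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def drifted_columns(reference, current, threshold=0.3):
    """Return the columns whose KS statistic exceeds the threshold.

    `reference` and `current` map column names to lists of numbers.
    The default threshold is a hypothetical value, not a recommendation.
    """
    return [
        name
        for name in reference
        if ks_statistic(reference[name], current[name]) > threshold
    ]


# Hypothetical batches: input features plus the model's own predictions.
reference_batch = {
    "temperature": [20, 21, 22, 23, 24, 25],
    "humidity": [40, 42, 44, 46, 48, 50],
    "prediction": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
}
current_batch = {
    "temperature": [30, 31, 32, 33, 34, 35],  # sudden shift
    "humidity": [41, 43, 45, 47, 49, 51],     # roughly unchanged
    "prediction": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
}

if __name__ == "__main__":
    print(drifted_columns(reference_batch, current_batch))
```

A check like this can run as an automated step whenever a batch model scores new data, which is the usage pattern described at 01:00:15. Restricting `reference` to the handful of features that matter most is one way to keep the alert volume manageable.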