Dan Jeffries: AI Infrastructure and Ethics

Dan discusses why he's so excited about the future of machine learning, where it is on the technology adoption curve, the rise of a "canonical stack" of AI infrastructure, and practically approaching the hard problems in AI ethics.

Guest Bio

Dan Jeffries is the chief technical evangelist at Pachyderm, a leading data science platform. He's a prominent writer and speaker on all things related to the future. He's been in software for over two decades, many of those at Red Hat, and is the founder of the AI Infrastructure Alliance and Practical AI Ethics.

Links Mentioned

Show Notes

  • 02:15 How Dan got started in computer science
    • College job was assistant to the secretary at a software company
      • Went to the IT guy and said "teach me everything you know"
    • Learned that "tenacity is the greatest skill when figuring out computer problems"
    • Two approaches to finding solutions, aligned with introvert/extrovert traits
      • "If you're a bit more extroverted then you tend to reach out to other folks and you build a tremendous network of very intelligent people who are experts that you ask questions"
      • "Other folks who are more introverted will tend to go to YouTube, go to the web, go to a bunch of tutorials, dig into forums. They're master searchers."
    • "You need to be really good at one of them, probably good at both in the long run"
  • 06:50 What Dan is most excited about in AI
    • "The history of man is the history of intelligence and artificial intelligence is the technology that changes everything"
      • Every leap forward is from a new, more intelligent way of doing things
        • Hunting -> agriculture
        • Monocropping -> leaving half the fields empty -> crop rotation
    • Humans tend to focus too much on the negatives
      • No tech itself is good or bad, it's about how humans use it
    • Everything will be assisted by AI
    • Especially excited about creative enhancement in music and arts
      • "Think about if you could, as a musician, take out your guitar and play an amazing riff. And then, the algorithm would say, cool, here's 50 continuations of that. And you listen to them and you go, you know what? Number eight is amazing. Give me more variants on that."
    • Lots of applications in healthcare, especially preventative
      • Wearables able to alert you before something bad might happen so you can go to a doctor in time
  • 14:45 Where we are in the adoption curve of ML
    • Early-adopter stage
    • Still figuring out what problems we even have as we apply ML
      • e.g., feature stores have only recently been developed and put to use
    • "We're just beginning to figure this out... The software is a little rough around the edges that reminds me of a lot of the early days of Linux, where it was [only] for hobbyists. You had to know 50 different [things] just to get it working and installed, much less do anything... We're in the Netscape and SSL just got invented phase of the internet."
    • Still building out the core infrastructure for companies to use instead of rolling their own
  • 20:40 The "Canonical Stack" of ML
    • Dan's article: Rise of the Canonical Stack in ML
    • 3-5 general categories of infrastructure
      • Data aggregation, wrangling
      • Experiment tracking, training pipelines
      • Model deployment, serving
      • Model monitoring, auditing, management
    • Models need to be continuously re-trained on new data coming in
    • The last category is developing fastest, especially as bias becomes a bigger issue and people want to know how models are making decisions
    • Pachyderm: data-driven pipelines instead of porting over logic-driven ones from traditional software
      • Plus data versioning and lineage, guaranteed immutability
        • Inspired by copy-on-write file systems
    • Very interested in how feature stores will evolve and interact with various parts of the stack
    • End-to-end solutions aren't going to be adopted because requirements are too different across companies
  • 32:00 Dan's goal for the AI Infrastructure Alliance
    • Doesn't want the alliance to be prescriptive (i.e., "these are the companies to use")
    • Wants competitors to work together, make their tools all integrate with each other, create common abstractions and interfaces
  • 40:55 "Problems that ML startups don't know they're going to have"
    • Being able to reproduce a model at any given time
    • Compliance with data regulations (GDPR and similar)
    • Data and model security, access control
    • Connecting different tools together and maintaining those connections
  • 49:00 Closed vs open source tools in the Canonical Stack
    • Open source tools start off uglier, but eventually tend to become more widely adopted and more powerful
    • Some tools make more sense to be SaaS/proprietary
      • Those that can achieve economies of scale across customers
        • ML-assisted data labeling
        • Specialized hardware training platform
        • General model APIs
      • Those that require integration with large numbers of other systems
        • Data and model monitoring
  • 01:08:40 Dan's practical approach to AI Ethics
    • Most companies haven't yet grasped how important it is
      • Too busy just trying to get it to work
    • Need algorithm transparency and explainability
    • Better communication to end users about an algorithm's capabilities and limitations
    • Algorithms might be able to make better decisions if they could take sensitive characteristics into account, but that isn't allowed right now
      • Some algos are just using other features as proxies for sensitive ones
    • Wants companies to actually address these issues instead of just using platitudes for good PR
    • Can't hold algorithms to too high a standard--they just need to be better than a human across all categories
      • e.g., a hiring algo that's not perfectly unbiased, but still less biased than a human
    • Ethics needs to be a consideration from leaders at the top of an org
    • Need to be clear on what is being traded off for ethical considerations--no free lunch
    • Create a system to find and fix algorithm biases--not a one-off action
      • Don't assume you will be perfect and never have an issue
  • 01:23:50 Rapid fire questions
    • For fun: travel during non-COVID times, otherwise cooking
    • Books: Thinking in Bets, A History of the World in 6 Glasses, Super Thinking
    • Underrated ML application: art (e.g., Adobe's ML features)
    • Advice: Always make time for learning new things and following interests
    • Contrarian truth: People need to be taught to think better in school
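
Aside on the 20:40 notes: the data versioning, lineage, and immutability ideas mentioned there, described as inspired by copy-on-write file systems, can be illustrated with a toy in-memory store. This is only a sketch of the general copy-on-write idea, not Pachyderm's implementation or API. The names `ToyRepo`, `Commit`, and `digest` are hypothetical and chosen just for this example. Each commit is an immutable snapshot that reuses the blobs of unchanged files from its parent, and parent pointers give a simple lineage.

```python
import hashlib
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional


def digest(data: bytes) -> str:
    """Content address for a blob: identical bytes always map to the same key."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Commit:
    """An immutable snapshot: file path -> blob key, plus a pointer to the parent."""
    commit_id: str
    parent_id: Optional[str]
    files: Mapping[str, str]


class ToyRepo:
    """Hypothetical in-memory repo for illustration only (not a real library)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self._commits: Dict[str, Commit] = {}
        self.head: Optional[str] = None

    def commit(self, changes: Dict[str, bytes]) -> Commit:
        # Copy only the parent's small path -> key table, never the data itself.
        parent = self._commits[self.head] if self.head is not None else None
        files = dict(parent.files) if parent is not None else {}
        for path, data in changes.items():
            key = digest(data)
            self._blobs.setdefault(key, data)  # store new content once
            files[path] = key                  # repoint only the changed path
        # Derive the commit id from the parent id and the file table.
        payload = repr((self.head, sorted(files.items()))).encode()
        commit_id = digest(payload)[:12]
        snapshot = Commit(commit_id, self.head, MappingProxyType(files))
        self._commits[commit_id] = snapshot
        self.head = commit_id
        return snapshot

    def read(self, commit_id: str, path: str) -> bytes:
        """Read a file exactly as it was in the given commit."""
        return self._blobs[self._commits[commit_id].files[path]]

    def lineage(self, commit_id: str) -> List[str]:
        """Walk parent pointers from a commit back to the first commit."""
        chain: List[str] = []
        current: Optional[str] = commit_id
        while current is not None:
            chain.append(current)
            current = self._commits[current].parent_id
        return chain

    def blob_count(self) -> int:
        return len(self._blobs)


repo = ToyRepo()
v1 = repo.commit({"train.csv": b"a,b\n1,2\n", "labels.csv": b"y\n0\n"})
v2 = repo.commit({"train.csv": b"a,b\n1,2\n3,4\n"})  # labels.csv is untouched
old_train = repo.read(v1.commit_id, "train.csv")     # old version still readable
history = repo.lineage(v2.commit_id)                 # newest commit first
```

In this sketch, the second commit stores one new blob for `train.csv` and shares the `labels.csv` blob with the first commit, so any earlier version stays readable. That is the property that makes it possible to trace which data a model was trained on.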