
Should Our AI Data Be in the Cloud? On-Device AI vs Cloud-Based AI

Recorded: August 20, 2024
Released: September 3, 2024
Episode Number: 325

Should AI run in the cloud or on-device? Some people think that on-device is the only way forward, as it ensures a certain level of privacy, while others worry that our local hardware (PC, smartphone, smartwatch) isn’t powerful enough to deliver cutting-edge features. In this episode, Matt and Mike explore how on-device AI and cloud-based AI work, discussing the benefits of each approach while analyzing and debating their differences at both a hardware/software and a user-experience level. Are you worried about your AI information being out in the cloud?



Episode Sponsor - Magic Mind & Wix Studio

Magic Mind returns as an episode sponsor this week and we thank them for their support!

Limited time Magic Mind deal for our listeners!

https://magicmind.com/HTMLPOD20 - 20% off for one-time purchases and subscriptions (Use the link and code!)

Code: HTMLPOD20

Wix Studio: The Web Platform for Agencies and Enterprises

Wix Studio is the new web platform tailored to designers, developers, and marketers who build websites for others or for large organizations. The magic of Wix Studio is its advanced design capabilities, which make website creation efficient and intuitive.

Check out Wix Studio today.



How to support the show

Patreon

Prices subject to change and are listed in USD

  • Support the show from as little as ~$1/month
  • Get a shoutout at the end of the episode (while supplies last) for just ~$3/month
  • Help support the HTML All The Things Podcast: Click Here


Show Notes

Introduction

  • On-device versus in-cloud is a huge debate these days for many different technologies
    • Although the argument takes slightly different forms (e.g. in video games, players argue that single-player games shouldn’t rely on an “always online” internet connection because it’s another point of failure)
  • For some of the newest tech, like AI, the argument takes on a different meaning, as it will shape our devices’ hardware and change how we use them forever
  • There are two camps:
    • Cloud-based AI
    • On-device AI
  • Cloud-based AI is…
    • AI that does its query processing in the cloud on a server somewhere, then sends that response back
    • [I’ll refer to cloud-based AI as cloud AI]
  • On-device AI is…
    • AI that does its processing entirely on the device, without using the cloud. This includes learning as well, an integral part of the UX of AI (a minimal sketch contrasting the two approaches follows this list)
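
A minimal sketch of this difference in Python (the endpoint URL, API key, and local helper below are placeholders for illustration, not a real service):

  import requests  # third-party HTTP library (pip install requests)

  API_URL = "https://api.example.com/v1/generate"  # hypothetical cloud endpoint

  def cloud_ai_query(prompt: str) -> str:
      # Cloud AI: the prompt leaves the device; a datacenter does the heavy
      # lifting and the response comes back over the network
      response = requests.post(
          API_URL,
          json={"prompt": prompt},
          headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder key
          timeout=30,  # network latency and outages are failure modes here
      )
      response.raise_for_status()
      return response.json()["text"]

  def run_local_model(prompt: str) -> str:
      # Stand-in for real local inference (e.g. a TensorFlow Lite model,
      # sketched later in these notes)
      return f"(processed locally) {prompt}"

  def on_device_ai_query(prompt: str) -> str:
      # On-device AI: the prompt never leaves the device, so it works offline
      # and the cloud never sees your data
      return run_local_model(prompt)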

Benefits of on-device AI

  • No reliance on an internet connection
    • Especially important for mobile devices due to spotty data connection in rural areas, large buildings, or holes in standard coverage
    • Mobile networks are often high-latency as well (albeit somewhat remedied by modern tech), which can delay the sending of queries and the receiving of responses
  • Speed!
    • Latency (already mentioned) plays a big part in slowing down AI
    • Even when the connection is 5G+ with full bars, sending data across the country (or sometimes even further) to a datacenter to be processed, then back again, takes time
  • Privacy
    • If you’re sending data out into the cloud, then there’s a risk of it being exposed
    • When processing is done on your device, the cloud doesn’t need to know about your data - but you should still safeguard your device as it could be lost or stolen!

Benefits of cloud AI

  • Access to more computing power 
    • Datacenters offer access to a lot of computing power (virtually limitless in the context of most AI queries/responses)
    • Datacenters are updated frequently (new servers, more RAM, hardware maintenance and repair) to keep up with their computational load and new features - upgrades that you don’t have to do yourself (e.g. you don’t need to buy a new iPhone)
    • What this all means is that cloud AI is scalable
  • Updates
    • On top of the hardware updates, datacenters will handle software updates to new AI models, apply interim updates, and monitor that other fixes are applied and working as they should
  • More data
    • If you grew up in the 90s, you’ve seen our devices grow from simple, unconnected experiences to ones that almost always need a connection - because there’s only so much you can do locally
    • Think of your smartphone experience: if you lost your data connection, you could still:
      • Use the calculator
      • Take photos
      • Play local games (rare these days)
      • Call and text
      • You’d lose: data messaging (e.g. WhatsApp), social media, GPS for navigation & photo tagging (device dependent), and your voice assistant (device dependent)
    • Access to data gives us:
      • Way more context and information (e.g. news)
      • Easier connections with friends, family, and strangers (social media)
    • AI, being an intelligence, can utilize this data to enhance its responses
      • For example, if your AI is on-device only and its knowledge of world events was last updated on January 1, 2018, it would have no knowledge of COVID-19 (a major world-shaping event)
      • Humans are the query creators: if new slang becomes popular (e.g. “cap”) and the user types it into an AI that has been offline for years, it won’t understand the query

How On-Device AI Works

Hardware

  • CPU (Central Processing Unit)
    • Good at everything, but not great at anything
    • Capable of running AI processes quickly (when they’re less demanding)
  • GPU (Graphics Processing Unit)
    • Originally meant to just process graphics
    • Very effective at parallel processing - ideal for matrix and vector calculations (common in AI)
    • Can help speed up AI processes on-device by handling multiple operations simultaneously 
  • NPU (Neural Processing Unit)
    • Special chip for neural network operations
    • Optimized for AI-related calculations, performing them quickly and efficiently (a major plus for mobile devices running on battery)
  • ASIC (Application-Specific Integrated Circuits) and FPGAs (Field-Programmable Gate Arrays)
    • Custom-built hardware designed for particular applications, which can include AI workloads
    • Because the logic is baked into the hardware, they’re very efficient - but not very flexible if what they need to do changes
    • FPGAs are a bit more flexible than ASICs, as they can be re-programmed after manufacturing to perform specific functions (e.g. AI tasks)

Software

  • Machine Learning Frameworks
    • TensorFlow Lite, PyTorch Mobile, and Core ML are examples of popular machine learning libraries designed for mobile and edge devices (a minimal inference sketch follows this list)
      • Side Note: 
        • An edge device is a computing device that operates at the "edge" of a network, meaning it processes data closer to the source of that data rather than relying on centralized cloud or data center resources
        • For example, a digital camera that has on-device AI to recognize faces is an edge device
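
To make those frameworks concrete, here’s a minimal sketch of on-device inference using TensorFlow Lite’s Python interpreter (it assumes you already have a converted model file; “model.tflite” is a placeholder name):

  import numpy as np
  import tensorflow as tf  # pip install tensorflow

  # Load the converted model and allocate its input/output buffers
  interpreter = tf.lite.Interpreter(model_path="model.tflite")
  interpreter.allocate_tensors()

  input_details = interpreter.get_input_details()
  output_details = interpreter.get_output_details()

  # Build a dummy input matching the model's expected shape and dtype
  input_data = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])

  interpreter.set_tensor(input_details[0]["index"], input_data)
  interpreter.invoke()  # inference runs entirely on the local device

  result = interpreter.get_tensor(output_details[0]["index"])
  print(result.shape)

On resource-constrained edge devices, the same model can be run with the much smaller tflite-runtime package instead of the full TensorFlow install.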

Model Optimization Techniques 

  • Quantization
    • Reduces the precision of computations from floating point to integers, decreasing model size and speeding up inference while maintaining acceptable accuracy (illustrated, along with pruning, sparse representations, and compression, in the first sketch after this list).
  • Pruning
    • Removes redundant or insignificant weights from a neural network, reducing model size and computational demand without significantly impacting performance.
  • Knowledge Distillation
    • Trains a smaller model (the "student") to mimic a larger model (the "teacher"), resulting in a compact model that requires less computational power but still performs well.
  • Model Compression
    • Reduces model size by simplifying its structure through techniques like matrix factorization, minimizing complexity with minimal impact on accuracy.
  • Sparse Representations
    • Converts dense matrices to sparse formats, reducing computational and memory requirements, especially effective in deep learning models with many zero parameters.
  • Layer Fusion
    • Merges multiple neural network layers into one, reducing computational overhead and improving inference speed, such as fusing batch normalization with convolutional layers (a worked example follows this list).
  • Neural Architecture Search (NAS)
    • Automates the design of neural networks optimized for specific hardware, finding the most efficient architecture given the device's computational constraints.
  • Hardware-aware Tuning
    • Tailors AI models to the specific hardware they will run on, optimizing for characteristics like memory bandwidth and processor speed to maximize performance.
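
A toy numpy sketch of four of the techniques above - quantization, pruning, sparse representations, and compression via matrix factorization - applied to a single stand-in weight matrix (real toolchains like TensorFlow Lite and PyTorch automate this across whole models):

  import numpy as np

  rng = np.random.default_rng(0)
  W = rng.normal(size=(128, 128)).astype(np.float32)  # stand-in weight matrix

  # Quantization: map float32 weights to int8 with a per-tensor scale factor
  scale = np.abs(W).max() / 127.0
  W_int8 = np.round(W / scale).astype(np.int8)   # 4x smaller than float32
  W_dequant = W_int8.astype(np.float32) * scale  # approximate reconstruction
  print("max quantization error:", np.abs(W - W_dequant).max())

  # Pruning: zero out the smallest-magnitude weights (here, the bottom 80%)
  threshold = np.quantile(np.abs(W), 0.80)
  W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

  # Sparse representation: store only surviving weights and their positions
  rows, cols = np.nonzero(W_pruned)
  values = W_pruned[rows, cols]  # ~20% of the original entries
  print("stored values:", values.size, "of", W.size)

  # Compression via matrix factorization: keep only the top-k singular values,
  # replacing one big matrix with two thin ones
  U, S, Vt = np.linalg.svd(W, full_matrices=False)
  k = 32
  W_lowrank = (U[:, :k] * S[:k]) @ Vt[:k, :]

Each transformation trades a little accuracy for a smaller or faster model, which is exactly the bargain on-device AI has to strike.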
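
And a toy sketch of layer fusion, folding batch normalization into the preceding convolution’s weights and bias so inference runs one layer instead of two (shapes and parameter names here are illustrative):

  import numpy as np

  rng = np.random.default_rng(1)
  out_ch, in_ch, k = 8, 3, 3
  W = rng.normal(size=(out_ch, in_ch, k, k)).astype(np.float32)  # conv weights
  b = np.zeros(out_ch, dtype=np.float32)                         # conv bias

  # Learned batch-norm parameters (one per output channel)
  gamma = rng.normal(size=out_ch).astype(np.float32)
  beta = rng.normal(size=out_ch).astype(np.float32)
  mean = rng.normal(size=out_ch).astype(np.float32)
  var = rng.uniform(0.5, 1.5, size=out_ch).astype(np.float32)
  eps = 1e-5

  # BN(conv(x)) equals conv'(x) with rescaled weights and a shifted bias:
  #   W' = W * gamma / sqrt(var + eps)   (applied per output channel)
  #   b' = (b - mean) * gamma / sqrt(var + eps) + beta
  scale = gamma / np.sqrt(var + eps)
  W_fused = W * scale[:, None, None, None]
  b_fused = (b - mean) * scale + beta

After fusion, a convolution with W_fused and b_fused produces the same inference-time output as the original conv-then-batch-norm pair, with one fewer pass over the data.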
