On-Device Machine Learning
Running machine learning models locally on the user's device rather than sending data to a remote server for inference. Also called edge ML or client-side ML. Core value proposition of the WebMachineLearning initiative and the Prompt API.
Why It Matters
| Dimension | Cloud Inference | On-Device Inference |
|---|---|---|
| Privacy | Data sent to server | Data never leaves device |
| Latency | Network round-trip plus server time | No network round-trip; bounded by local compute |
| Offline | Not available | Works without internet |
| Cost | Per-query API fees | Zero marginal cost |
| Throughput | Rate-limited by API | Limited by device hardware |
Key Enablers
- Hardware acceleration: NPUs, GPUs, and specialized ML chips in modern devices
- Model compression: quantization, pruning, and distillation make large models fit on-device
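- Browser APIs: the WebNN API exposes device hardware (CPU, GPU, NPU) to web apps, and the Prompt API exposes a browser-provided language model
- OS-level models and runtimes: browsers can build on models or ML frameworks supplied by the platform (e.g., Google's Gemini Nano on Android, Apple's Core ML framework)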
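To make the quantization point concrete, here is a minimal sketch of symmetric per-tensor int8 quantization. It rests on simplifying assumptions: one scale for the whole tensor and 8-bit codes. Production toolchains often use per-channel or per-block scales and go down to 4 bits. The function and type names are hypothetical.

```typescript
// Sketch: symmetric per-tensor int8 quantization.
// Assumption: a single scale for the whole tensor (real toolchains often
// use per-channel or per-block scales and lower bit widths).

interface QuantizedTensor {
  values: Int8Array; // integer codes in [-127, 127]
  scale: number;     // real-valued size of one integer step
}

function quantizeInt8(weights: Float32Array): QuantizedTensor {
  // The largest magnitude maps to 127, which fixes the step size.
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
  }
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;

  const values = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    // Round to the nearest step, then clamp to the symmetric range.
    const q = Math.round(weights[i] / scale);
    values[i] = Math.max(-127, Math.min(127, q));
  }
  return { values, scale };
}

function dequantizeInt8(t: QuantizedTensor): Float32Array {
  // Approximate reconstruction: code × scale.
  const out = new Float32Array(t.values.length);
  for (let i = 0; i < t.values.length; i++) {
    out[i] = t.values[i] * t.scale;
  }
  return out;
}

const original = new Float32Array([0.9, -1.0, 0.1, 0.0, 0.33]);
const packed = quantizeInt8(original);
const restored = dequantizeInt8(packed);
// 5 weights: 20 bytes as float32, 5 bytes as int8 (plus one scale).
console.log(original.byteLength, packed.values.byteLength); // prints: 20 5
console.log(Array.from(restored));
```

Each weight shrinks from 4 bytes to 1. Because every value is rounded to the nearest step, the per-weight reconstruction error is at most half a step (`scale / 2`). This loss in precision is what buys the smaller model.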
Trade-offs
Advantages:
- Privacy by default — no data transmitted
- Works offline
- No API costs
- Low latency for real-time use cases
Limitations:
- Model capability bounded by device compute
- Large one-time model download before first use
- Results and performance vary across devices and hardware
- Smaller context windows than cloud models
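The first-run download cost follows from simple arithmetic: weight storage is roughly parameter count × bits per weight ÷ 8 bytes. The sketch below assumes the weights dominate the file size and ignores tokenizer files, metadata, and quantization scales. It uses a hypothetical 2-billion-parameter model, so the figures are illustrative rather than measurements of any shipping model.

```typescript
// Rough weight-file size: parameters × bits per weight ÷ 8 bytes.
// Assumption: weights dominate; tokenizer files, metadata and
// quantization scales are ignored, so real downloads are larger.

function weightBytes(parameterCount: number, bitsPerWeight: number): number {
  return (parameterCount * bitsPerWeight) / 8;
}

function formatGB(bytes: number): string {
  // Decimal gigabytes (1 GB = 1e9 bytes), one decimal place.
  return `${(bytes / 1e9).toFixed(1)} GB`;
}

// Hypothetical 2-billion-parameter model at several precisions.
const parameterCount = 2e9;
for (const bits of [32, 16, 8, 4]) {
  console.log(`${bits}-bit: ${formatGB(weightBytes(parameterCount, bits))}`);
}
// prints:
// 32-bit: 8.0 GB
// 16-bit: 4.0 GB
// 8-bit: 2.0 GB
// 4-bit: 1.0 GB
```

Going from 16-bit to 4-bit weights cuts the download by about 4×. For this reason, model compression and first-run download size are two sides of the same trade-off.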
Web Platform Connection
WebMachineLearning standardizes browser access to on-device ML. WebNN API provides the low-level hardware interface; Prompt API and Writing Assistance APIs expose higher-level LLM capabilities.
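This layering suggests a simple selection order for an app. It can use the browser-provided model when present, run its own model through WebNN as a second choice, and fall back to a server only as a last resort. Below is a minimal sketch of that decision. It assumes the Prompt API is exposed as a global `LanguageModel` and WebNN as `navigator.ml`. Both surfaces have changed across browser releases, so check current documentation. The type and function names are hypothetical.

```typescript
// Hypothetical helper: choose where inference runs.
// Assumptions: the Prompt API appears as a global `LanguageModel` and
// WebNN as `navigator.ml`; only their presence is checked here.

type Backend = "prompt-api" | "webnn" | "cloud";

interface Capabilities {
  promptApi: boolean; // a browser-provided language model is exposed
  webnn: boolean;     // the low-level WebNN hardware interface is exposed
}

// `env` defaults to the real global object; tests can pass a stub.
function detectCapabilities(
  env: Record<string, any> = globalThis as unknown as Record<string, any>,
): Capabilities {
  const nav = env["navigator"];
  return {
    promptApi: typeof env["LanguageModel"] !== "undefined",
    webnn: nav != null && nav.ml != null,
  };
}

function chooseBackend(caps: Capabilities): Backend {
  // Highest level first: the browser ships and manages the model.
  if (caps.promptApi) return "prompt-api";
  // Next: the app brings its own model and runs it on local hardware.
  if (caps.webnn) return "webnn";
  // No on-device path: data has to leave the device.
  return "cloud";
}

// In an environment exposing neither API, this logs "cloud".
console.log(chooseBackend(detectCapabilities()));
```

Presence checks are only a first filter. In recent Chrome versions, `LanguageModel.availability()` reports whether the model is ready or still has to be downloaded, which matters given the first-run download cost. Falling back to `"cloud"` also gives up the privacy property from the table above.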
Related
- Machine Learning (ML)
- AI Inference
- AI Privacy
- WebMachineLearning
- Prompt API
- WebNN API
- Browser-Provided Language Models
- Large Language Models (LLMs)
- Edge AI
- Edge Computing
- Neural Processing Unit (NPU)
- Gemini Nano
- Transformers.js
- ONNX Runtime Web
- Web Assembly (WASM)
- WebGPU
About Sébastien
I'm Sébastien Dubois, and I'm on a mission to help knowledge workers escape information overload. After 20+ years in IT and seeing too many brilliant minds drowning in digital chaos, I've decided to help people build systems that actually work. Through the Knowii Community, my courses, products & services and my Website/Newsletter, I share practical and battle-tested systems.
I write about Knowledge Work, Personal Knowledge Management, Note-taking, Lifelong Learning, Personal Organization, Productivity, and more. I also craft lovely digital products and tools.