1. Start with the workload, not the chip
The most common mistake in AI hardware is designing the product backward from a chip someone read about on a blog. Start with three questions instead:
- What's the latency budget? 100ms, 1s, or 5s? This determines whether you need a dedicated NPU or can ride on the CPU.
- What's the power envelope? Battery-powered (under 500mW peak) or wall-powered (no hard power limit, but heat still matters)?
- What's the minimum acceptable accuracy? 95% mAP for face detection, or 70% for a "good enough" wake-word detector?
Once you have these, silicon selection becomes a lookup table, not a debate.
2. The silicon landscape (2026)
Here's our mental model for picking AI-capable silicon. It's not exhaustive, but covers 95% of what gets built.
| Tier | Chips | Typical Use | TOPS | Power |
|---|---|---|---|---|
| MCU class | ESP32-S3, RP2040 + TinyML | Keyword spotting, simple CV | 0.05-0.5 | 50-200 mW |
| Linux SoC entry | Allwinner V3s, BL808 | Local CV with small quantized models | ≤0.1 | 0.5-2 W |
| Linux SoC mid | Rockchip RK3588, Amlogic A311D | Multi-stream CV, 7B LLMs | 5-6 | 3-8 W |
| Dedicated NPU | Hailo-8, Google Edge TPU | Always-on vision, fast inference | 4-26 | 2-5 W |
| Flagship edge AI | NVIDIA Jetson Orin, Qualcomm RB5 | Multi-modal, foundation models | 15-275 | 7-60 W |
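To make the "lookup table, not a debate" idea concrete, here is a minimal sketch that encodes only the power question against the upper end of each row's Power column. It assumes the table rows are ordered by increasing capability and picks the most capable tier whose worst-case power fits the budget; the `Tier` enum and `pick_tier` function are hypothetical names, and latency and accuracy would be further filters in a real selection.

```c
#include <assert.h>

/* Tiers in the same order as the table rows (assumed: increasing capability). */
typedef enum {
    TIER_NONE = -1,      /* nothing in the table fits the budget */
    TIER_MCU,
    TIER_LINUX_ENTRY,
    TIER_LINUX_MID,
    TIER_DEDICATED_NPU,
    TIER_FLAGSHIP,
} Tier;

/* Upper end of each row's Power column, in watts. */
static const double TIER_MAX_WATTS[] = {
    [TIER_MCU]           = 0.2,
    [TIER_LINUX_ENTRY]   = 2.0,
    [TIER_LINUX_MID]     = 8.0,
    [TIER_DEDICATED_NPU] = 5.0,
    [TIER_FLAGSHIP]      = 60.0,
};

/* Return the most capable tier whose worst-case power fits the budget. */
static Tier pick_tier(double power_budget_w)
{
    assert(power_budget_w >= 0.0);
    Tier best = TIER_NONE;
    for (int t = TIER_MCU; t <= TIER_FLAGSHIP; ++t) {
        if (TIER_MAX_WATTS[t] <= power_budget_w) {
            best = (Tier)t;  /* later rows overwrite earlier ones */
        }
    }
    return best;
}
```

For example, a battery product with a 500mW peak budget lands in the MCU class, because the entry Linux SoC row can reach 2 W.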
Real example: our AI-tracker (SKYTAG-like)
This is the SKYTAG family: a small battery-powered finder. On paper, the job was to run a tiny object-detection model on a camera frame, decide whether "your stuff" is in frame, and notify you. Workload: ~150ms latency, under 200mW peak power, 80% mAP on person detection.
Silicon choice: ESP32-S3 + VL53L5CX (an 8x8 multizone time-of-flight sensor). We use ToF instead of a camera because (1) it's 10x lower power, (2) it's robust to lighting, and (3) we don't actually need to recognize "what" the object is, only "where" it is relative to the tracker. This reframed the entire problem and saved us 2 years of model optimization.
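Answering "where" rather than "what" can be as simple as finding the nearest zone in the sensor's 8x8 distance grid and turning its row and column into an approximate bearing. This is a minimal sketch under stated assumptions: distances arrive as a row-major array of 64 millimetre values (row 0 at the top), invalid zones are marked with a hypothetical `NO_TARGET` value, and the field of view is taken as 45° in each axis. Check ST's driver documentation for the real zone ordering, orientation, and target-status flags before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRID 8                /* 8x8 multizone mode */
#define FOV_DEG 45.0          /* assumed horizontal and vertical field of view */
#define NO_TARGET UINT16_MAX  /* hypothetical marker for zones without a valid target */

typedef struct {
    int found;             /* 1 if any zone had a valid target */
    uint16_t range_mm;     /* distance to the nearest target */
    double azimuth_deg;    /* left/right angle of that zone's centre */
    double elevation_deg;  /* up/down angle of that zone's centre */
} TargetFix;

/* Find the nearest valid zone and convert its row/column into a bearing. */
static TargetFix nearest_target(const uint16_t ranges_mm[GRID * GRID])
{
    assert(ranges_mm != NULL);
    TargetFix fix = {0, NO_TARGET, 0.0, 0.0};
    int best = -1;
    for (int i = 0; i < GRID * GRID; ++i) {
        if (ranges_mm[i] != NO_TARGET && ranges_mm[i] < fix.range_mm) {
            fix.range_mm = ranges_mm[i];
            best = i;
        }
    }
    if (best < 0) {
        return fix;  /* nothing in view */
    }
    const double pitch = FOV_DEG / GRID;  /* degrees per zone */
    int row = best / GRID;
    int col = best % GRID;
    fix.found = 1;
    fix.azimuth_deg = (col + 0.5) * pitch - FOV_DEG / 2.0;    /* negative = left */
    fix.elevation_deg = FOV_DEG / 2.0 - (row + 0.5) * pitch;  /* negative = below */
    return fix;
}
```

No model is involved: a min-scan over 64 values is cheap enough to run on every frame without touching the power budget.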
3. Thermal design is where AI products die
AI workloads spike. A face-detection model running at 10 FPS on a small SoC can push CPU utilization to 80% and draw 2-3x baseline power. In a sealed plastic enclosure, that can bring the case to 60°C within 20 minutes, which is above the 45°C touch-comfort limit and accelerates battery degradation.
Three mitigations we use:
- Duty cycle: Run inference at 1Hz, not 10Hz. 80% of "real-time" products don't actually need real-time.
- Thermal mass: Use the metal enclosure as a heat spreader. We spec aluminum enclosures for any product drawing >1.5W continuous.
- Adaptive throttling: Drop to a smaller model when the case temp exceeds 50°C. We do this with a single thermistor + 3 lines of code.
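The adaptive-throttling rule can be sketched as a comparison on the raw thermistor ADC reading, which avoids floating-point math on the device. Everything about the part is an assumption here: a 10 kΩ NTC with B = 3950 K on the low side of a divider with a 10 kΩ resistor to VCC, read by a 12-bit ADC. The 45°C release point is a hypothetical hysteresis band added so the model does not flap at the 50°C threshold; `select_model` and the threshold constants are illustrative names, and the counts were computed offline from the formulas in the comment.

```c
#include <assert.h>

/* Assumed part: 10 kOhm NTC, B = 3950 K, low side of a divider with 10 kOhm to VCC,
 * 12-bit ADC (0..4095).
 *   R_ntc(T) = R0 * exp(B * (1/T - 1/T0)),  T0 = 298.15 K
 *   adc      = 4095 * R_ntc / (10k + R_ntc)
 * Thresholds below were computed offline from those formulas (approximate). */
#define ADC_AT_50C 1081  /* ~3.59 kOhm: throttle threshold from the text */
#define ADC_AT_45C 1241  /* ~4.35 kOhm: assumed release point (hysteresis) */

typedef enum { MODEL_FULL, MODEL_SMALL } ModelSize;

/* NTC resistance falls as it heats, so a LOWER reading means a HOTTER case. */
static ModelSize select_model(ModelSize current, int adc)
{
    assert(adc >= 0 && adc <= 4095);
    if (current == MODEL_FULL && adc < ADC_AT_50C) return MODEL_SMALL;
    if (current == MODEL_SMALL && adc > ADC_AT_45C) return MODEL_FULL;
    return current;  /* inside the 45-50 degC band: keep whatever is running */
}
```

The three comparison lines are the whole policy; the work is in deriving the two constants for your actual thermistor and divider.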
4. The firmware pipeline nobody talks about
The boring, unglamorous part of AI hardware is the firmware pipeline that gets the model onto the device and keeps it updated. Here's our standard stack:
```
// 1. Train on cloud (PyTorch → ONNX)
// 2. Quantize (ONNX → TFLite-Micro / NCNN / vendor SDK)
// 3. Bundle into firmware (CMake embeds model as C array)
// 4. OTA update via signed delta patches
// 5. A/B partition for rollback (we've never had a bricked device since adopting this)
```
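Step 5 boils down to a small decision the bootloader makes on every boot. Below is a minimal sketch of that decision, assuming hypothetical per-slot metadata (`Slot`) kept in persistent storage and an assumed budget of three unconfirmed boot attempts. In practice you would lean on your bootloader's own rollback support (for example MCUboot or ESP-IDF's app rollback) rather than hand-rolling this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BOOT_ATTEMPTS 3  /* assumed retry budget before rolling back */

/* Hypothetical per-slot metadata kept in a small persistent region. */
typedef struct {
    bool     valid;      /* image passed its signature check when written */
    bool     confirmed;  /* app reported a healthy boot at least once */
    uint32_t version;
    uint8_t  attempts;   /* boots tried while still unconfirmed */
} Slot;

/* Pick the slot to boot: newest valid image wins, but an unconfirmed image
 * only gets MAX_BOOT_ATTEMPTS tries before it is invalidated.
 * Returns 0 or 1, or -1 if neither slot is bootable. */
static int choose_boot_slot(Slot slots[2])
{
    assert(slots != NULL);
    for (int i = 0; i < 2; ++i) {
        if (slots[i].valid && !slots[i].confirmed &&
            slots[i].attempts >= MAX_BOOT_ATTEMPTS) {
            slots[i].valid = false;  /* rollback: give up on this image */
        }
    }
    int pick = -1;
    for (int i = 0; i < 2; ++i) {
        if (slots[i].valid &&
            (pick < 0 || slots[i].version > slots[pick].version)) {
            pick = i;
        }
    }
    if (pick >= 0 && !slots[pick].confirmed) {
        slots[pick].attempts++;  /* count this try before jumping to it */
    }
    return pick;
}

/* Called by the application once it has verified it is healthy. */
static void confirm_slot(Slot *slot)
{
    assert(slot != NULL);
    slot->confirmed = true;
    slot->attempts = 0;
}
```

The key design choice is that an update must prove itself by calling `confirm_slot`; silence (a crash loop) counts as failure and the device falls back to the last known-good image.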
Two non-obvious points:
- Quantization is where 80% of accuracy loss happens. Don't trust the framework defaults. We always validate quantized accuracy against a held-out test set before shipping.
- OTA matters even for low-volume products. Your model will improve, your bug fixes will ship, and you will screw up at least one update. A/B partitions and signed payloads are not optional.
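One reason framework defaults bite is calibration range. The sketch below uses the common asymmetric (affine) int8 scheme, `q = round(x / scale) + zero_point`, with `scale` and `zero_point` derived from the observed min/max of a tensor, and measures the worst round-trip error. It is an illustration of the arithmetic, not any specific toolchain's implementation; the function names are hypothetical. A single outlier in the calibration data stretches the range and makes every ordinary value less precise.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float scale; int zero_point; } QParams;

static int round_half_away(float v) { return (int)(v >= 0.0f ? v + 0.5f : v - 0.5f); }

/* Asymmetric int8 parameters from the observed [min, max] calibration range. */
static QParams calibrate(const float *x, size_t n)
{
    assert(x != NULL && n > 0);
    float lo = x[0], hi = x[0];
    for (size_t i = 1; i < n; ++i) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    if (lo > 0.0f) lo = 0.0f;  /* range must include 0 so it maps to an exact code */
    if (hi < 0.0f) hi = 0.0f;
    QParams p;
    p.scale = (hi - lo) / 255.0f;
    if (p.scale == 0.0f) p.scale = 1.0f;  /* all-zero tensor */
    p.zero_point = -128 - round_half_away(lo / p.scale);
    return p;
}

static signed char quantize(float v, QParams p)
{
    int q = round_half_away(v / p.scale) + p.zero_point;
    if (q < -128) q = -128;  /* saturate to the int8 range */
    if (q > 127) q = 127;
    return (signed char)q;
}

static float dequantize(signed char q, QParams p) { return (q - p.zero_point) * p.scale; }

/* Worst-case |x - dequantize(quantize(x))| over a tensor. */
static float max_roundtrip_error(const float *x, size_t n, QParams p)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float e = x[i] - dequantize(quantize(x[i], p), p);
        if (e < 0.0f) e = -e;
        if (e > worst) worst = e;
    }
    return worst;
}
```

With values in [-1, 1] the error stays near half a step (about 0.004); adding one value of 20 to the calibration set pushes the step to about 0.08 and the error on ordinary values up by roughly an order of magnitude. Per-tensor error is only a proxy, though: the held-out test set is still the real check.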
5. Field test, then field test more
Lab accuracy is meaningless. We put every AI product through three field-test gates before mass production:
- Controlled environment: 50 scripted scenarios, hit 95% success.
- Uncontrolled environment: 20 volunteers use the device for a week, report failures.
- Adversarial environment: Lighting, motion blur, occlusion, EMI from other electronics. If you can't handle a kid waving it around, you don't ship.
Each gate typically drops accuracy by 5-10%. Plan for it.
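"Plan for it" can be turned into a quick budget. This sketch assumes each gate removes a fixed relative fraction of accuracy and that the losses compound across the three gates; the source does not say whether the 5-10% is relative or in percentage points, so treat the model as an assumption. The function names are hypothetical.

```c
#include <assert.h>

/* Accuracy left after `gates` field-test gates, assuming each gate removes a
 * fixed relative fraction `drop` (0.05 means 5%). */
static double accuracy_after_gates(double lab_accuracy, double drop, int gates)
{
    assert(drop >= 0.0 && drop < 1.0 && gates >= 0);
    double acc = lab_accuracy;
    for (int i = 0; i < gates; ++i) {
        acc *= (1.0 - drop);
    }
    return acc;
}

/* Lab accuracy needed to still hit `field_target` after the gates.
 * A result above 1.0 means the target is unreachable at that drop rate. */
static double required_lab_accuracy(double field_target, double drop, int gates)
{
    return field_target / accuracy_after_gates(1.0, drop, gates);
}
```

Under this model, an 80% field target needs about 93% in the lab at 5% per gate, and is unreachable at 10% per gate (0.8 / 0.9³ ≈ 1.10).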
6. When to bring in EMS (us)
Most AI hardware founders underestimate the gap between "I have a working prototype on my desk" and "I can ship 1,000 units." The gap is:
- BOM hardening for supply-chain variance
- DFM review (your schematic works, but your layout can't be assembled at scale)
- Firmware reproducibility (your dev board flashed fine; production needs signed images)
- Compliance (FCC, CE, RoHS, MFi for Apple ecosystems)
We do free DFM review on any design that comes through our door. If you're past the prototype stage and want a sanity check before you spend 50k on a pilot run, email us.