1. Start with the workload, not the chip

The most common mistake in AI hardware is reverse-engineering the product around a chip someone read about on a blog. You should start with three questions:

  1. What's the latency budget? 100 ms, 1 s, or 5 s? This determines whether you need a dedicated NPU or can ride on the CPU.
  2. What's the power envelope? Battery-powered (under 500 mW peak) or wall-powered (effectively unconstrained)?
  3. What's the minimum acceptable accuracy? 95% mAP for face detection, or 70% for a "good enough" wake-word?

Once you have these, silicon selection becomes a lookup table, not a debate.

2. The silicon landscape (2026)

Here's our mental model for picking AI-capable silicon. It's not exhaustive, but covers 95% of what gets built.

Tier             | Chips                            | Typical Use                      | TOPS     | Power
-----------------|----------------------------------|----------------------------------|----------|----------
MCU class        | ESP32-S3, RP2040 + TinyML        | Keyword spotting, simple CV      | 0.05-0.5 | 50-200 mW
Linux SoC entry  | Allwinner V3s, BL808             | Local CV, small LLMs (1-3B)      | 1-2      | 0.5-2 W
Linux SoC mid    | Rockchip RK3588, Amlogic A311D   | Multi-stream CV, 7B LLMs         | 6-12     | 3-8 W
Dedicated NPU    | Hailo-8, Google Edge TPU         | Always-on vision, fast inference | 20-26    | 2-5 W
Flagship edge AI | NVIDIA Jetson Orin, Qualcomm RB5 | Multi-modal, foundation models   | 40-275   | 7-60 W
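
To make the "lookup table" idea concrete, here is a minimal sketch that encodes the tiers above and returns the first one that fits. The selection rule (first tier whose top TOPS figure covers the workload and whose lowest power figure fits the budget) and the names Tier, TIERS, and pick_tier are assumptions for illustration, not our actual tooling. It also assumes you have already turned your latency budget into a required TOPS number, which depends on your model's operation count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    tops_min: float
    tops_max: float
    power_min_w: float
    power_max_w: float


# Figures copied from the table above, in the table's order.
TIERS = [
    Tier("MCU class", 0.05, 0.5, 0.05, 0.2),
    Tier("Linux SoC entry", 1, 2, 0.5, 2),
    Tier("Linux SoC mid", 6, 12, 3, 8),
    Tier("Dedicated NPU", 20, 26, 2, 5),
    Tier("Flagship edge AI", 40, 275, 7, 60),
]


def pick_tier(required_tops: float,
              power_budget_w: Optional[float]) -> Optional[Tier]:
    """Return the first tier that can plausibly serve the workload.

    power_budget_w=None means wall-powered (no power constraint).
    Assumed rule: the tier's best-case TOPS must cover the requirement
    and its lowest power figure must fit inside the budget.
    """
    for tier in TIERS:
        fits_compute = tier.tops_max >= required_tops
        fits_power = power_budget_w is None or tier.power_min_w <= power_budget_w
        if fits_compute and fits_power:
            return tier
    return None  # nothing in the table fits; revisit the requirements


if __name__ == "__main__":
    choice = pick_tier(required_tops=0.1, power_budget_w=0.2)
    print(choice.name if choice else "no tier fits")
```

A None result is useful in its own right: it tells you the requirements conflict before anyone starts arguing about chips.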

Real example: our AI-tracker (SKYTAG-like)

This is the SKYTAG family: a small battery-powered finder. The original brief was to run a tiny object-detection model on a camera frame, decide if "your stuff" is in frame, and notify you. Workload: ~150 ms latency, under 200 mW peak power, 80% mAP on person detection.

Silicon choice: ESP32-S3 + VL53L5CX (time-of-flight sensor). We use ToF instead of a camera because (1) it draws about 10x less power, (2) it's robust to lighting, and (3) we don't actually need to recognize "what" the object is, only "where" it is relative to the tracker. This reframed the entire problem and saved us 2 years of model optimization.

Reframe trick: Often the biggest AI-hardware win is realizing you don't need AI at all. A dumb sensor with smart backend logic can outperform a clever on-device model.

3. Thermal design is where AI products die

AI workloads spike. A face-detection model running at 10 FPS on a small SoC will push the CPU to 80% utilization, drawing 2-3x baseline power. In a sealed plastic enclosure, that means the case temperature hits 60°C within 20 minutes, which is above the touch-comfort limit (45°C) and accelerates battery degradation.

Three mitigations we use:

  1. Duty cycle: Run inference at 1 Hz, not 10 Hz. 80% of "real-time" products don't actually need real-time.
  2. Thermal mass: Use a metal enclosure as a heat spreader. We spec aluminum enclosures for any product drawing >1.5 W continuous.
  3. Adaptive throttling: Drop to a smaller model when the case temp exceeds 50°C. We do this with a single thermistor + 3 lines of code.
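
Mitigations 1 and 3 can be sketched in a few lines. This is an illustration, not our firmware: the function names, the 80 ms inference time, the 0.5 W / 1.5 W power figures, and the 45°C resume threshold (hysteresis so the device doesn't flap between models) are all assumptions. Only the 50°C throttle point comes from the list above.

```python
def average_power_w(idle_w: float, active_w: float,
                    inference_s: float, rate_hz: float) -> float:
    """Average power when one inference of inference_s seconds runs
    rate_hz times per second and the device idles in between."""
    duty = min(1.0, inference_s * rate_hz)  # fraction of time spent inferring
    return idle_w + duty * (active_w - idle_w)


THROTTLE_ABOVE_C = 50.0  # from the text: drop to a smaller model above this
RESUME_BELOW_C = 45.0    # assumed hysteresis margin, not from the text


def select_model(case_temp_c: float, current: str) -> str:
    """Pick "full" or "small" from the thermistor reading."""
    if case_temp_c > THROTTLE_ABOVE_C:
        return "small"
    if case_temp_c < RESUME_BELOW_C:
        return "full"
    return current  # inside the hysteresis band: keep what we have


if __name__ == "__main__":
    # Assumed figures: 0.5 W idle, 1.5 W during inference, 80 ms per frame.
    for rate in (10, 1):
        print(rate, "Hz:", round(average_power_w(0.5, 1.5, 0.08, rate), 2), "W")
    print(select_model(52.0, "full"))
```

With those assumed figures, dropping from 10 Hz to 1 Hz takes average power from about 1.3 W to about 0.58 W, which is the whole argument for duty cycling.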

4. The firmware pipeline nobody talks about

The boring, unglamorous part of AI hardware is the firmware pipeline that gets the model onto the device and keeps it updated. Here's our standard stack:

// 1. Train on cloud (PyTorch → ONNX)
// 2. Quantize (ONNX → TFLite-Micro / NCNN / vendor SDK)
// 3. Bundle into firmware (CMake embeds model as C array)
// 4. OTA update via signed delta patches
// 5. A/B partition for rollback (we've never had a bricked device since adopting this)
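
Steps 3 and 5 are the easiest to show in code. The sketch below is a Python stand-in, not our build system: in the real pipeline CMake does the embedding, and the bootloader (not Python) picks the slot. The names model_to_c_array, choose_boot_slot, g_model, and the limit of 3 boot attempts are hypothetical.

```python
def model_to_c_array(data: bytes, name: str = "g_model",
                     per_line: int = 12) -> str:
    """Render a model blob as a C source snippet (similar to `xxd -i`)."""
    if not data:
        raise ValueError("model blob is empty")
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {name}_len = {len(data)};\n"
    )


MAX_BOOT_ATTEMPTS = 3  # assumed limit before giving up on a new image


def choose_boot_slot(active: str, boot_attempts: int, confirmed: bool) -> str:
    """A/B rollback rule: keep booting the active slot unless a freshly
    updated image has failed to confirm itself too many times."""
    other = "B" if active == "A" else "A"
    if not confirmed and boot_attempts >= MAX_BOOT_ATTEMPTS:
        return other  # roll back to the previous known-good image
    return active


if __name__ == "__main__":
    print(model_to_c_array(bytes(range(16))))
    print(choose_boot_slot("B", boot_attempts=3, confirmed=False))
```

The key design choice in the rollback rule is that new firmware has to prove itself (set the confirmed flag after a successful self-test) rather than the old firmware having to be proven bad.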

Two non-obvious points:

5. Field test, then field test more

Lab accuracy is meaningless. We put every AI product through three field-test gates before mass production:

  1. Controlled environment: 50 scripted scenarios, hit 95% success.
  2. Uncontrolled environment: 20 volunteers use the device for a week, report failures.
  3. Adversarial environment: Lighting, motion blur, occlusion, EMI from other electronics. If you can't handle a kid waving it around, you don't ship.

Each gate typically drops accuracy by 5-10%. Plan for it.
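
To see what "plan for it" means in numbers, here is a small sketch. It assumes the drops are percentage points and that they stack across all three gates; if your drops are relative, the numbers shift a little but the conclusion doesn't. The function name is hypothetical.

```python
def expected_field_accuracy(lab_accuracy_pct: float, gates: int = 3,
                            drop_pct: tuple = (5.0, 10.0)) -> tuple:
    """Return (pessimistic, optimistic) accuracy in percent after all gates.

    Assumes each gate subtracts between drop_pct[0] and drop_pct[1]
    percentage points and that the drops accumulate.
    """
    low_drop, high_drop = drop_pct
    optimistic = max(0.0, lab_accuracy_pct - gates * low_drop)
    pessimistic = max(0.0, lab_accuracy_pct - gates * high_drop)
    return pessimistic, optimistic


if __name__ == "__main__":
    worst, best = expected_field_accuracy(95.0)
    print(f"95% in the lab -> {worst:.0f}% to {best:.0f}% in the field")
```

Under those assumptions, a model that scores 95% in the lab lands somewhere between 65% and 80% after the third gate, so set your lab target well above the number you actually need in the field.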

6. When to bring in EMS (us)

Most AI hardware founders underestimate the gap between "I have a working prototype on my desk" and "I can ship 1,000 units." The gap is:

The rule of thumb: If your prototype works on a breadboard but you're not sure if the connector is in stock at 10k quantity, talk to an EMS partner before you commit to the design.

We do free DFM review on any design that comes through our door. If you're past the prototype stage and want a sanity check before you spend 50k on a pilot run, email us.