AI Hardware Development Guide: From Edge Inference to Production

1. Start with the workload, not the chip

The most common mistake in AI hardware is reverse-engineering the product around a chip someone read about on a blog. You should start with three questions:

What's the latency budget? 100ms, 1s, or 5s? This determines whether you need a dedicated NPU or can ride on the CPU.
What's the power envelope? Battery-powered (under 500 mW peak) or wall-powered (looser, though thermals still bound you)?
What's the minimum acceptable accuracy? 95% mAP for face detection, or 70% for a "good enough" wake word?

Once you have these, silicon selection becomes a lookup table, not a debate.

2. The silicon landscape (2026)

Here's our mental model for picking AI-capable silicon. It's not exhaustive, but it covers 95% of what gets built.

Tier	Chips	Typical use	TOPS (INT8, approx.)	Power
MCU class	ESP32-S3, RP2040 + TinyML	Keyword spotting, simple CV	0.05-0.5	50-200 mW
Linux SoC entry	Allwinner V3s, BL808	Local CV, small LLMs (1-3B)	1-2	0.5-2 W
Linux SoC mid	Rockchip RK3588, Amlogic A311D	Multi-stream CV, 7B LLMs	6-12	3-8 W
Dedicated NPU	Hailo-8, Google Edge TPU	Always-on vision, fast inference	4-26	2-5 W
Flagship edge AI	NVIDIA Jetson Orin, Qualcomm RB5	Multi-modal, foundation models	15-275	7-60 W
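To make the "lookup table" idea concrete, here is a minimal C sketch that walks the tiers above from smallest to largest and returns the first one whose compute and peak power fit your numbers. It assumes you have already turned the latency and accuracy budget into a rough TOPS requirement. The struct, `pick_tier`, `tier_name`, and the choice to use the upper end of each range are our illustrative assumptions, not a vendor tool; using the upper end is deliberately optimistic, so derate it in practice.

```c
#include <stddef.h>

/* One row of the silicon table; numbers are the upper ends of each range. */
typedef struct {
    const char *name;
    double max_tops;     /* upper end of the TOPS column */
    double max_power_w;  /* upper end of the Power column, in watts */
} SiliconTier;

static const SiliconTier kTiers[] = {
    {"MCU class",        0.5,   0.2},
    {"Linux SoC entry",  2.0,   2.0},
    {"Linux SoC mid",    12.0,  8.0},
    {"Dedicated NPU",    26.0,  5.0},
    {"Flagship edge AI", 275.0, 60.0},
};

#define NUM_TIERS (sizeof kTiers / sizeof kTiers[0])

/* Return the index of the first tier with enough compute whose peak power
 * fits the budget, or -1 if nothing in the table qualifies. */
int pick_tier(double required_tops, double power_budget_w) {
    for (size_t i = 0; i < NUM_TIERS; i++) {
        if (kTiers[i].max_tops >= required_tops &&
            kTiers[i].max_power_w <= power_budget_w) {
            return (int)i;
        }
    }
    return -1;
}

/* Human-readable name for a result of pick_tier. */
const char *tier_name(int idx) {
    return (idx >= 0 && (size_t)idx < NUM_TIERS) ? kTiers[idx].name : "no tier fits";
}
```

For example, `pick_tier(20.0, 6.0)` lands on the dedicated-NPU row, while a 100 TOPS requirement under a 10 W budget returns -1, which is the cue to revisit the workload rather than shop for a different chip.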

Real example: our AI-tracker (SKYTAG-like)

This is the SKYTAG family: a small battery-powered finder that, as first specced, needed to run a tiny object-detection model on a camera frame, decide whether "your stuff" is in frame, and notify you. Workload: ~150 ms latency, under 200 mW peak power, 80% mAP on person detection.

Silicon choice: ESP32-S3 + VL53L5CX (a multizone time-of-flight sensor). We use ToF instead of a camera because (1) it's 10x lower power, (2) it's robust to lighting, and (3) we don't actually need to recognize what the object is, only where it is relative to the tracker. This reframed the entire problem and saved us 2 years of model optimization.

Reframe trick: Often the biggest AI-hardware win is realizing you don't need AI at all. A dumb sensor with smart backend logic can outperform a clever on-device model.
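The "where, not what" reframing is cheap in code. A multizone ToF sensor reports a grid of per-zone distances, and finding the nearest valid zone already gives you a range and a rough bearing. The sketch below is our illustration, not the SKYTAG firmware: it assumes the VL53L5CX's 8x8 zone mode, a field of view of roughly 45 degrees, and that your driver hands you a row-major distance array plus a per-zone validity flag. `find_nearest` and `NearestTarget` are hypothetical names, and the code does not use ST's driver API.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOF_GRID    8       /* assumed 8x8 zone mode */
#define TOF_FOV_DEG 45.0f   /* assumed field of view per axis */

typedef struct {
    bool found;
    int row, col;
    uint16_t distance_mm;
    float azimuth_deg;  /* left/right of boresight; sign depends on mounting */
} NearestTarget;

/* distance_mm and valid are row-major TOF_GRID x TOF_GRID arrays.
 * Zones that are invalid, zero, or beyond max_range_mm are ignored. */
NearestTarget find_nearest(const uint16_t distance_mm[TOF_GRID * TOF_GRID],
                           const bool valid[TOF_GRID * TOF_GRID],
                           uint16_t max_range_mm) {
    NearestTarget best = { .found = false };
    for (int i = 0; i < TOF_GRID * TOF_GRID; i++) {
        if (!valid[i] || distance_mm[i] == 0 || distance_mm[i] > max_range_mm) {
            continue;
        }
        if (!best.found || distance_mm[i] < best.distance_mm) {
            best.found = true;
            best.row = i / TOF_GRID;
            best.col = i % TOF_GRID;
            best.distance_mm = distance_mm[i];
        }
    }
    if (best.found) {
        /* Linear approximation: map the zone's centre across the FoV. */
        float zone_deg = TOF_FOV_DEG / TOF_GRID;
        best.azimuth_deg = (best.col + 0.5f) * zone_deg - TOF_FOV_DEG / 2.0f;
    }
    return best;
}
```

The backend logic (is this the tag's owner walking away, is the object behind a wall) then works on a handful of numbers per frame instead of pixels.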

3. Thermal design is where AI products die

AI workloads spike. A face-detection model running at 10 FPS on a small SoC can push CPU utilization to 80%, drawing 2-3x baseline power. In a sealed plastic enclosure, the case can reach 60°C within 20 minutes: above the touch-comfort limit (about 45°C), and hot enough to accelerate battery degradation.

Three mitigations we use:

Duty cycle: Run inference at 1Hz, not 10Hz. 80% of "real-time" products don't actually need real-time.
Thermal mass: Use a metal enclosure as a heat spreader. We spec aluminum enclosures for any product drawing more than 1.5 W continuously.
Adaptive throttling: Drop to a smaller model when the case temp exceeds 50°C. We do this with a single thermistor + 3 lines of code.
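The thermistor throttle can be sketched as below. Comparing raw ADC counts against precomputed thresholds keeps floating point and `libm` off the MCU. The part values (10 kOhm NTC with B = 3950 K, 10 kOhm fixed resistor, 12-bit ADC) and the 45°C release point are assumptions for illustration; only the 50°C trip point comes from the text. `select_model` and the threshold names are hypothetical.

```c
#include <stdint.h>

/* Assumed circuit: 10 kOhm NTC (B = 3950 K) on the low side of a divider
 * with a 10 kOhm fixed resistor, read by a 12-bit ADC referenced to the
 * divider supply. Hotter case -> lower NTC resistance -> LOWER ADC count.
 *
 * Thresholds computed offline with the Beta equation:
 *   R(T) = R0 * exp(B * (1/T - 1/T0)),  adc = 4095 * R / (R_fixed + R)
 *   50 C -> R ~ 3.59 kOhm -> adc ~ 1081
 *   45 C -> R ~ 4.35 kOhm -> adc ~ 1241
 * Recompute both for your own thermistor and divider. */
#define ADC_HOT_50C   1081u  /* at or below: switch to the small model */
#define ADC_COOL_45C  1241u  /* at or above: allow the full model again */

typedef enum { MODEL_FULL, MODEL_SMALL } ModelChoice;

/* Hysteresis between 45 C and 50 C keeps the firmware from flapping
 * between models while the case hovers near the trip point. */
ModelChoice select_model(ModelChoice current, uint16_t adc_raw) {
    if (current == MODEL_FULL && adc_raw <= ADC_HOT_50C) return MODEL_SMALL;
    if (current == MODEL_SMALL && adc_raw >= ADC_COOL_45C) return MODEL_FULL;
    return current;
}
```

The 5°C gap is a design choice: without it, a case sitting right at 50°C would swap models on nearly every reading.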

4. The firmware pipeline nobody talks about

The boring, unglamorous part of AI hardware is the firmware pipeline that gets the model onto the device and keeps it updated. Here's our standard stack:

// 1. Train on cloud (PyTorch → ONNX)
// 2. Convert and quantize (ONNX → TFLite-Micro / NCNN / vendor SDK)
// 3. Bundle into firmware (CMake embeds model as C array)
// 4. OTA update via signed delta patches
// 5. A/B partition for rollback (we've never had a bricked device since adopting this)
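Step 5 usually comes down to a boot-attempt counter: a freshly written slot gets a few trial boots, and if the application never confirms it is healthy, the bootloader falls back to the other slot. The sketch below shows one common shape of that logic. The struct layout, `choose_slot`, `confirm_boot`, and `MAX_TRIAL_BOOTS` are hypothetical, and verification of the signed delta patch is represented only by a `valid` flag set when the image was written. In a real product you would more likely lean on a bootloader with this built in, such as MCUboot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical boot metadata, e.g. kept in a small flash sector that the
 * bootloader reads before jumping to an application slot. */
typedef struct {
    uint8_t active;    /* slot to boot: 0 = A, 1 = B */
    bool    pending;   /* true right after an OTA image was written to `active` */
    uint8_t attempts;  /* trial boots since the OTA without confirmation */
    bool    valid[2];  /* slot passed its signature check when written */
} BootMeta;

#define MAX_TRIAL_BOOTS 3u

/* Called by the bootloader on every reset; returns the slot to jump to. */
uint8_t choose_slot(BootMeta *m) {
    if (m->pending) {
        if (m->attempts >= MAX_TRIAL_BOOTS || !m->valid[m->active]) {
            /* New image never confirmed itself: roll back to the other slot. */
            m->active ^= 1u;
            m->pending = false;
            m->attempts = 0;
        } else {
            m->attempts++;
        }
    }
    return m->active;
}

/* Called by the application once it has proven healthy (model loaded,
 * sensors up); makes the new slot permanent. */
void confirm_boot(BootMeta *m) {
    m->pending = false;
    m->attempts = 0;
}
```

The important property is that rollback needs no cooperation from the broken image: if it hangs or crashes before calling `confirm_boot`, the counter alone brings the device back.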

Two non-obvious points:

Quantization is where, in our experience, about 80% of accuracy loss happens. Don't trust the framework defaults. We always validate quantized accuracy against a held-out test set before shipping.
OTA matters even at low volume. Your model will improve, your bug fixes will ship, and you will screw up at least one update. A/B partitions and signed payloads are not optional.
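To see where that loss comes from and what the validation gate looks like, here is a minimal sketch. It uses plain affine int8 quantization as a stand-in for whatever scheme your toolchain applies, and a gate that fails the build when the quantized model's held-out accuracy drops more than a chosen margin below the float model's. All function names and the margin are assumptions; in practice this check often lives in the training pipeline rather than in C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Affine int8 quantization: q = clamp(round(x / scale + zero_point)).
 * The rounding and clamping steps are where precision is lost. */
int8_t quantize(float x, float scale, int32_t zero_point) {
    float q = x / scale + (float)zero_point;
    /* Clamp in float first so the integer cast below is always in range. */
    if (q < -128.0f) q = -128.0f;
    if (q > 127.0f) q = 127.0f;
    /* Round half away from zero without pulling in libm. */
    return (int8_t)(q >= 0.0f ? q + 0.5f : q - 0.5f);
}

float dequantize(int8_t q, float scale, int32_t zero_point) {
    return (float)(q - zero_point) * scale;
}

/* Fraction of held-out samples whose predicted class matches the label. */
float accuracy(const int *pred, const int *label, size_t n) {
    size_t correct = 0;
    for (size_t i = 0; i < n; i++) correct += (pred[i] == label[i]);
    return n ? (float)correct / (float)n : 0.0f;
}

/* Release gate: pass only if the quantized model loses at most max_drop
 * (absolute, e.g. 0.02 = 2 points) versus the float model. */
bool quantization_gate(const int *float_pred, const int *int8_pred,
                       const int *label, size_t n, float max_drop) {
    return accuracy(float_pred, label, n) - accuracy(int8_pred, label, n) <= max_drop;
}
```

Values outside the calibrated range clamp to -128 or 127, so a poorly chosen `scale` from a default calibration run can silently saturate activations; that is one concrete reason not to trust the defaults.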

5. Field test, then field test more

Lab accuracy is meaningless. We put every AI product through three field-test gates before mass production:

Controlled environment: 50 scripted scenarios, hit 95% success.
Uncontrolled environment: 20 volunteers use the device for a week, report failures.
Adversarial environment: Harsh and changing lighting, motion blur, occlusion, EMI from nearby electronics. If you can't handle a kid waving it around, you don't ship.

Each gate typically drops accuracy by 5-10%. Plan for it.
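"Plan for it" is easiest as arithmetic. The text doesn't say whether the 5-10% drop is relative or in percentage points; the sketch below assumes relative drops that compound across the three gates. Both function names are hypothetical.

```c
/* Expected accuracy after `gates` field-test gates, each removing the
 * fraction drop_per_gate of the remaining accuracy (assumed compounding). */
double accuracy_after_gates(double lab_accuracy, double drop_per_gate, int gates) {
    double acc = lab_accuracy;
    for (int g = 0; g < gates; g++) acc *= (1.0 - drop_per_gate);
    return acc;
}

/* Lab accuracy needed so the result still meets field_target after the gates. */
double required_lab_accuracy(double field_target, double drop_per_gate, int gates) {
    double acc = field_target;
    for (int g = 0; g < gates; g++) acc /= (1.0 - drop_per_gate);
    return acc;
}
```

Under that assumption, a model at 95% in the lab lands near 69% in the field with 10% per gate, or near 81% with 5% per gate. Run it the other way and an 80% field target with 10% drops would need about 110% lab accuracy, which is impossible, so either the gates have to cost less or the target has to move.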

6. When to bring in EMS (us)

Most AI hardware founders underestimate the gap between "I have a working prototype on my desk" and "I can ship 1,000 units." The gap is:

BOM hardening for supply-chain variance
DFM review (your schematic works, but your layout can't be assembled at scale)
Firmware reproducibility (your dev board flashed fine; production needs signed images)
Compliance (FCC, CE, RoHS, MFi for Apple ecosystems)

The rule of thumb: If your prototype works on a breadboard but you're not sure if the connector is in stock at 10k quantity, talk to an EMS partner before you commit to the design.

We do free DFM review on any design that comes through our door. If you're past the prototype stage and want a sanity check before you spend 50k on a pilot run, email us.
