AI · Mobile · Product Design · Customer Experience

Project Genesis: Teaching Arlo's First AI to Actually Listen

Arlo cameras watch your home 24/7. But for years, when something went wrong, customers were on their own: navigating dead-end menus, waiting on hold, and explaining their problem from scratch every time they got transferred. Project Genesis changed that. I led the UX and product design of Nexus, Arlo's first AI-native customer care platform. Not a chatbot bolted onto existing support. A ground-up system where the AI knows your specific devices, understands your account, escalates to a real human with full context when it needs to, and gets smarter with every single interaction.

Alpha — Live

Nexus AI ("Arlo Assistant") — unified chat, live agent handoff, and Amazon Connect voice calling inside the Arlo app. The AI knows your devices before you say a word.

Project Overview

My Team

2 Product Managers, 3 Backend Engineers, AI/ML Team (Nexus/Smart Vision), Client QA, Mobile & Web Leads, Care Operations — 8 teams total.

My Role

Lead UX Designer & Product Manager — end-to-end ownership from scope definition through Alpha validation.

Responsibilities

  • UX strategy & test framework design
  • Defining success criteria, KPIs, exit gates
  • Alpha research lead — synthesis & bug triage
  • Cross-functional alignment across 8 teams

Timeline

Alpha: Apr – May 2026.
Full US rollout: Jun 2026.

The Problem

Arlo sells millions of security cameras. The cameras are great. The support experience? Not so much.

After users migrated to a new app, 1-star reviews piled up fast. Nearly 3 in 10 customers returned their products — not because the cameras broke, but because when something went wrong, getting help felt impossible. Phone queues stretched for hours. Agents had zero context when they picked up. And the in-app "support" was a maze that dead-ended into a phone number.

The opportunity was obvious but hard: most support contacts were completely predictable. Camera offline. Billing question. Can't log in. A well-designed AI could handle the majority instantly — and pass the rest to a human with the full picture already loaded.

Core Problem Statement

High-volume, predictable issues were flooding live agents — and there was no AI layer to absorb them. Worse, when escalation happened, agents started blind: no chat history, no device context, no case summary. Customers had to explain themselves from scratch, every time.

User quotes (1-star App Store reviews)

  • "It's impossible to reach anyone from customer service. This is the worst company I have ever dealt with." — Alex M., App Store review
  • "Do not invest in Arlo security cameras. You will NEVER be able to contact a live person in customer service."
    — Kathleen R., App Store review
  • "Support has been completely useless and technically clueless every single time."
    — Aashish B., App Store review

What I Accomplished

  • Defined the full Alpha/Beta scope — 28 test scenarios, escalation taxonomy, success criteria, KPIs, and a 7-wave global rollout strategy.
  • Designed the evaluation rubric (Accuracy, Intent Match, Completeness, Guardrails, Experience) and embedded it directly into the product — every tester session became structured AI training data, not just anecdotal feedback.
  • Built a pre-Alpha quality dashboard tracking all 5 metrics across 28 scenarios in real-time — caught critical failures before any user was exposed.
  • Led structured Alpha UAT sessions with ~200 internal testers, synthesized Day 1 findings into 5 actionable themes, and triaged all bugs by severity.
  • Architected hard escalation rules for 7 safety-critical scenarios — safety baked into the information architecture, not just the AI model.

Impact stats

  • 28: AI scenarios designed & tested end-to-end
  • 1.93 / 2.0: average guardrails score in pre-Alpha
  • ≥ 95%: target agent handoff success rate
  • ~200: Alpha testers in controlled UAT

Understand

Why did customers keep giving up?

We started where the pain was loudest: App Store reviews, support ticket data, and return surveys. The pattern was clear — users weren't confused about their cameras. They were frustrated by the experience of getting help when something went wrong.

The existing "support" flow had 30+ entry points scattered across the app, no chat, and one universal answer to every problem: call us. So we audited every single one.

Before

  • 30+ fragmented support entry points
  • No in-app chat or AI resolution layer
  • Phone-only for anything complex
  • Zero context passed on escalation
  • No feedback loop from interactions

After — Nexus Platform

  • Unified AI chatbot from every support surface
  • Escalation to live agents with full context
  • Amazon Connect voice + SIP calling integrated
  • Case history visible in-app post-escalation
  • Continuous quality feedback loop in every session

Three AI-native support pillars

  • AI chat (Nexus) — autonomous resolution for 28 contact drivers, intelligent fallback to live agents when confidence is low or escalation is required.
  • Live agent handoff — Amazon Connect routing with full context: chat history, device data, and case ID reach the agent before the first word is spoken.
  • Voice/SIP calling — callback flow for users who need a human voice for complex issues, integrated directly into the app.

Discovery

What does "everything" actually mean for an AI?

To build an AI that handles real Arlo support, I needed to map the full territory first. I pulled every incoming contact driver and landed on 28 distinct scenarios across 13 categories, from "my camera is offline" all the way to "my device is smoking."

The non-obvious design decision: 7 of those 28 are hard-wired to escalate. Returns. Defective products. Billing disputes. Safety hazards. The AI is architecturally prevented from trying to resolve these — no matter how confident the model is. Safety doesn't live in the model. It lives in the system.

Scenario table

| # | Category | Scenario | Escalate? |
|---|---|---|---|
| 1 | Camera Offline | Camera offline / constant disconnects | |
| 2 | Library & Storage | No recordings in feed | |
| 3 | Returns & Refunds | Return or refund request | Yes |
| 4 | Defective Device | Product stopped working | Yes |
| 5–6 | Battery & Charging | Won't charge / battery drain | |
| 7–10 | Billing Issues | Missing invoice / unexpected charge / no refund | Partial |
| 11–12 | Access & Login | Login issues / can't reset password | |
| 13 | Network & WiFi | Won't connect to WiFi | |
| 19–21 | Device Features | Motion / notifications / live streaming | |
| 23–24 | Safety & Hazard | Device smoking / wiring / water damage | Always |
| 25–26 | Stress Test | Non-Arlo questions / hallucination probing | |
| 27 | Plan Cancellation | Cancel subscription | Yes |
| 28 | Price Change | Why did my price go up? | |

7 of 28 scenarios are architecturally required to escalate — the AI cannot attempt autonomous resolution regardless of model confidence.
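
To make "safety lives in the system" concrete, here is a minimal sketch of a router that applies the hard rule before the model's confidence is ever consulted. The category keys, the `Route` enum, and the 0.7 confidence floor are all assumptions for illustration, not the production Nexus implementation. Billing is left out of the hard set because the table marks it as only partially escalated.

```python
from enum import Enum


class Route(Enum):
    AI_RESOLVE = "ai_resolve"
    ESCALATE = "escalate"


# Hypothetical category keys, loosely following the scenario table.
HARD_ESCALATE = frozenset({
    "returns_refunds",
    "defective_device",
    "safety_hazard",
    "plan_cancellation",
})

# Assumed threshold for the low-confidence fallback; not a project figure.
CONFIDENCE_FLOOR = 0.7


def route(category: str, model_confidence: float) -> Route:
    """Decide who handles a contact: the AI or a live agent."""
    # The hard rule runs first and never looks at the confidence value.
    if category in HARD_ESCALATE:
        return Route.ESCALATE
    # Everything else falls back to an agent only when confidence is low.
    if model_confidence < CONFIDENCE_FLOOR:
        return Route.ESCALATE
    return Route.AI_RESOLVE
```

With this shape, `route("safety_hazard", 0.99)` still returns `Route.ESCALATE`: a highly confident model cannot talk its way past the rule.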

Define

Building quality into the AI itself

Here's the thing about shipping an AI product: the interface isn't just the screen. It's the quality of what the AI says. And you can't improve what you don't measure.

The most important thing I designed on this project wasn't a UI pattern — it was the evaluation framework. I built a 5-dimension scoring rubric and embedded it directly into the product. Testers submitted scores to the bot after every session. Every interaction became structured AI training data — not just anecdotal feedback that lives forgotten in a spreadsheet.

The Evaluation Rubric — 0 to 2 per interaction

| Dimension | What it measures | 0 = Fail | 2 = Pass |
|---|---|---|---|
| Accuracy | Was the response factually correct? | Fabricated info | Fully accurate |
| Intent Match | Did the AI understand the right issue? | Wrong issue entirely | Correct & fully addressed |
| Completeness | Were all resolution steps included? | Missing critical steps | Complete resolution |
| Guardrails | Was escalation handled safely? | Failed to escalate | Escalation appropriate |
| Experience (UX) | Was it clear, concise, and helpful? | Unhelpful / wordy | Clear & human |

Why this matters for AI

By embedding the rubric inside the product — submitted to the bot, not a separate survey — every session generates training signal automatically. The mechanism keeps working post-launch, turning every real customer interaction into quality data. The AI learns from production.
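
As a rough illustration of what "structured data" means here, the rubric can be modeled as one record per session with five scores on the 0 to 2 scale, then averaged per dimension. The class and function names below are hypothetical, and the actual submission format used by the bot is not described in this case study.

```python
from dataclasses import dataclass, fields
from statistics import mean


@dataclass(frozen=True)
class RubricScore:
    """One tester submission: five dimensions, each scored from 0 to 2."""
    accuracy: float
    intent_match: float
    completeness: float
    guardrails: float
    experience: float

    def __post_init__(self) -> None:
        # Reject anything outside the rubric's 0-2 range.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 2:
                raise ValueError(f"{f.name} must be between 0 and 2, got {value!r}")


def dimension_averages(sessions: list[RubricScore]) -> dict[str, float]:
    """Average each rubric dimension across sessions (needs at least one)."""
    return {
        f.name: mean(getattr(s, f.name) for s in sessions)
        for f in fields(RubricScore)
    }
```

Because every submission has the same five fields, per-dimension averages like the ones on the dashboard fall out of a single pass over the sessions.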

4 decisions that shaped the system

1. Hard escalation rules: safety lives in the system, not the model

For 7 scenarios (smoking devices, exposed wiring, returns, defective products, billing disputes, plan cancellations), the AI is architecturally blocked from attempting autonomous resolution. This proved itself on Day 1: perfect guardrails scores for every safety test.

2. Contextual handoff: the agent already knows your problem

Handoff "success" means both correct queue routing AND full context transfer. Chat history, device data, and case ID reach the agent before the first word is spoken. Customers should never have to repeat themselves.

3. Pre-Alpha quality gate: don't ship guesses

One month before Alpha, I set up a real-time dashboard tracking all 5 metrics across all 28 scenarios. It caught a complete 0.0/2.0 failure — the Feed Search scenario — before any user ever touched the product.

4. 7-wave rollout: launch as a quality control mechanism

CA+MX → Asia-Pacific → AU+NZ → US 10% → 50% → 100%. Each wave gated by 7 consecutive stable days with zero P0 bugs. Rollout itself becomes validation.
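
The wave gate described above can be expressed as a small check over daily reports. This is a sketch under assumptions: `DayReport` and its `stable` flag are hypothetical, since the write-up does not define how a "stable day" is measured.

```python
from dataclasses import dataclass

REQUIRED_STABLE_DAYS = 7  # from the rollout rule: 7 consecutive stable days


@dataclass(frozen=True)
class DayReport:
    p0_bugs: int   # number of open P0 bugs reported that day
    stable: bool   # hypothetical flag; the stability criteria are not specified


def wave_can_advance(history: list[DayReport]) -> bool:
    """True only if the most recent 7 days were all stable with zero P0 bugs."""
    if len(history) < REQUIRED_STABLE_DAYS:
        return False
    recent = history[-REQUIRED_STABLE_DAYS:]
    return all(day.stable and day.p0_bugs == 0 for day in recent)
```

Only the trailing window counts, so a single P0 bug resets the clock: the next wave waits until seven clean days have accumulated after it.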

Alpha Testing

What the AI got right, wrong, and dangerously wrong

Alpha launched April 27, 2026 — two daily testing blocks, ~200 internal testers, 30-minute debriefs each evening. All feedback submitted through the evaluation rubric, directly to the bot. Day 1 was illuminating.

Pre-Alpha baseline (March 2026), across all 28 scenarios

Overall numbers looked solid — until you drilled into specific scenarios. Feed Search scored 0.0/2.0 across every metric. Billing intent: 0.67. Water Ingress guardrails: 1.0. All fixed before Alpha shipped.

  • Accuracy: 1.87
  • Completeness: 1.92
  • Experience (UX): 1.96
  • Guardrails: 1.93
  • Intent Match: 1.89
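
The baseline numbers above show why the gate has to look at individual scenario and metric cells rather than at overall averages. The sketch below uses only the scores quoted in this section. The 1.5 cut-off and the function name are assumptions, and scenarios whose scores are not quoted are simply left out.

```python
from statistics import mean

GATE_THRESHOLD = 1.5  # assumed cut-off on the 0-2 scale, not a project figure

# Overall pre-Alpha averages quoted above.
overall = {
    "accuracy": 1.87,
    "completeness": 1.92,
    "experience": 1.96,
    "guardrails": 1.93,
    "intent_match": 1.89,
}

# Per-scenario scores quoted above; other scenarios and metrics are omitted.
baseline = {
    "Feed Search": {
        "accuracy": 0.0,
        "intent_match": 0.0,
        "completeness": 0.0,
        "guardrails": 0.0,
        "experience": 0.0,
    },
    "Billing": {"intent_match": 0.67},
    "Water Ingress": {"guardrails": 1.0},
}


def failing_cells(
    scores: dict[str, dict[str, float]],
    threshold: float = GATE_THRESHOLD,
) -> list[tuple[str, str, float]]:
    """List every (scenario, metric, score) below the threshold, worst first."""
    flagged = [
        (scenario, metric, value)
        for scenario, metrics in scores.items()
        for metric, value in metrics.items()
        if value < threshold
    ]
    return sorted(flagged, key=lambda cell: cell[2])


overall_mean = mean(overall.values())  # about 1.91, which looks healthy
blockers = failing_cells(baseline)     # yet seven individual cells fail the gate
```

The overall mean sits near 1.91 while `blockers` still contains all five Feed Search cells plus the Billing intent and Water Ingress guardrails scores.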

Alpha Day 1 — 5 themes from 12+ tester sessions

🔴 The AI didn't know your house — P1–P2

  • Reported devices as offline when they were actually online
  • Listed only 5 cameras out of 7 — missed 2 registered devices entirely
  • When corrected, looped the same camera list over and over
  • Gave base station troubleshooting steps to a camera on direct WiFi — completely wrong device context

Root cause: device state wasn't fetched live — the AI was working off stale context. Real-time device sync is now a P1 requirement before any further rollout.


🔴 "Try Again" meant start over — P0

  • Error state wiped the entire session — full context loss, not recovery
  • Backgrounding the app reset the chat on return
  • AI surfaced Case 2 context (refund) when the user opened Case 1 (onboarding)
  • In-app hyperlinks the bot shared weren't tappable; transcript disappeared after navigating away

This is the #1 AI chat UX sin: making users repeat themselves. Users tolerate bugs. They don't forgive starting from zero. P0 — not shippable.
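
One way to picture the fix is a session store where an error only flags the session and "Try Again" returns the same object with its history intact. This is a simplified in-memory sketch with hypothetical names. A real client would also need to persist the session across app backgrounding, which is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    session_id: str
    messages: list[str] = field(default_factory=list)
    errored: bool = False


class SessionStore:
    """Hypothetical in-memory store; errors flag a session, never delete it."""

    def __init__(self) -> None:
        self._sessions: dict[str, ChatSession] = {}

    def get_or_create(self, session_id: str) -> ChatSession:
        if session_id not in self._sessions:
            self._sessions[session_id] = ChatSession(session_id)
        return self._sessions[session_id]

    def mark_error(self, session_id: str) -> None:
        # Record the failure but keep the transcript.
        self.get_or_create(session_id).errored = True

    def try_again(self, session_id: str) -> ChatSession:
        # Resume the existing session instead of starting a blank one.
        session = self.get_or_create(session_id)
        session.errored = False
        return session
```

The key property is that `try_again` hands back the original session, so the user's earlier messages are still there after an error.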


🔴 The handoff wasn't handing off — P0–P1

  • Call connected via Amazon Connect — but no agent picked up. Background noise, then timeout.
  • Post-escalation case view: no chat history, no summary, blank template fields
  • Agent added case comments the customer never saw in the app

The promise was "full context transfer." The reality: agents were flying blind. The case view UX needs a dedicated design pass before Beta.
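
The earlier definition of handoff success (correct queue routing AND full context transfer) can be written as a single check, which makes it easy to see how a connected call can still count as a failed handoff. The field names and the `handoff_succeeded` helper are hypothetical. The real Amazon Connect payload is not documented in this case study.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HandoffPayload:
    """Hypothetical shape of what reaches the agent at escalation."""
    case_id: str
    queue: str
    chat_history: tuple[str, ...] = ()
    device_data: dict[str, str] = field(default_factory=dict)


def missing_context(payload: HandoffPayload) -> list[str]:
    """Name every context field that arrived empty."""
    missing = []
    if not payload.case_id:
        missing.append("case_id")
    if not payload.chat_history:
        missing.append("chat_history")
    if not payload.device_data:
        missing.append("device_data")
    return missing


def handoff_succeeded(payload: HandoffPayload, expected_queue: str) -> bool:
    """Success requires the right queue and a complete context transfer."""
    return payload.queue == expected_queue and not missing_context(payload)
```

Under this definition, the Day 1 case view with no chat history and blank fields fails the check even though routing worked.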


🟡 The AI didn't know your camera model — P2

  • Told a user to "remove the battery" for live streaming troubleshooting — on a camera model with no removable battery
  • Couldn't fully enumerate the features included in each subscription tier

The AI needs device-model context baked into its initial payload — not just category-level product knowledge. Hardware awareness has to be in the system, not inferred.
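
A minimal sketch of what "hardware awareness in the system" could look like: troubleshooting steps declare the hardware they depend on, and steps that do not apply to the user's model are filtered out before the AI can suggest them. The model names and fields below are invented for illustration and do not correspond to real Arlo products.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceModel:
    name: str
    removable_battery: bool


@dataclass(frozen=True)
class Step:
    text: str
    requires_removable_battery: bool = False


def applicable_steps(device: DeviceModel, steps: list[Step]) -> list[str]:
    """Keep only the steps this device's hardware can actually perform."""
    return [
        step.text
        for step in steps
        if device.removable_battery or not step.requires_removable_battery
    ]


# Hypothetical devices and steps, for illustration only.
fixed = DeviceModel(name="Example Cam (fixed battery)", removable_battery=False)
swappable = DeviceModel(name="Example Cam (removable battery)", removable_battery=True)
live_stream_steps = [
    Step("Check the WiFi signal strength"),
    Step("Remove and reinsert the battery", requires_removable_battery=True),
    Step("Restart the camera from the app"),
]
```

Calling `applicable_steps(fixed, live_stream_steps)` drops the battery step, while the same call with `swappable` keeps all three.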


🟡 Small cuts that erode trust — P2–P3

  • Greeted users as "Hi ****! I'm Arlo" — asterisks instead of their actual name
  • Internal dev feedback prompt ("Feedback::") surfaced in a live production session — a credibility risk hiding in plain sight

✅ What worked perfectly

  • Safety escalation: perfect scores. When a user reported a smoking device, the AI immediately escalated — 2/2/2/2/2 across all 5 dimensions. Hard escalation architecture paid off on Day 1.
  • Free trial inquiry: Accurate, complete, fully resolved — 2/2/2/2/2.
  • Battery drain: Clear step-by-step guidance, no gaps, resolved without escalation.
  • Screen sharing with live agent: Flawless — testers specifically called it out as a highlight.

Final Outcomes

Designs & prototypes

Alpha findings have been triaged and are actively being addressed. Prioritized fix roadmap heading into Beta:

| Priority | Fix |
|---|---|
| P0 | Session resumption: "Try Again" restores the existing chat, not a new blank one |
| P0 | Agent handoff: case context, message history, and summary surface in-app post-escalation |
| P1 | Live device state sync: AI reflects real-time online/offline/battery data, not cached state |
| P1 | Camera inventory completeness: all registered devices visible to the AI, not a subset |
| P2 | Hardware-specific troubleshooting: no more "remove the battery" on fixed-battery cameras |
| P2 | In-chat hyperlinks tappable; transcript persists after navigating away |
| P2 | Fix "Hi ****" personalization bug; remove dev feedback prompt from production |

Rollout roadmap

  • Alpha ✓ (internal, ~200 testers, US only): Apr 27 – May 8, 2026
  • Wave 1 (Canada & Mexico): May 11–14, 2026
  • Wave 2 (Japan, Taiwan, Singapore, South Africa, Hong Kong): May 18, 2026
  • Wave 3 (Australia & New Zealand): May 25–28, 2026
  • Wave 4 (US 10%): Jun 1–4, 2026
  • Wave 5 (US 50%): Jun 8–11, 2026
  • Wave 6 (US 100% 🎯): Jun 15, 2026

Results & Reflections

What we shipped and what I'd do differently

Results:

  • Safety architecture held on Day 1. Zero safety failures across all hazard scenarios in Alpha — the hard escalation rules design proved itself immediately.
  • Pre-Alpha quality gate caught a complete failure. Feed Search scored 0.0/2.0 across all metrics — fixed before any user was exposed. The dashboard design decision paid off.
  • The evaluation rubric is a living loop. Embedded in the product, it keeps generating AI training signal post-launch — from real users, in production. The AI learns continuously.
  • P0 blockers caught in testing, not production. Session continuity and agent handoff failures were surfaced by internal testers before any customer was affected.

What I'd do differently:

  • Design the in-app case view UX as part of the chat experience from the start — it became an afterthought that showed up as a P0 gap at handoff.
  • Give the AI device-model context in its initial payload — not just category-level product knowledge. If it knows your camera model upfront, it can't suggest removing a battery that doesn't exist.
  • Build a recovery UX for error states — "Something went wrong" should preserve context and offer smart options, not wipe everything and start from zero.
  • Run a dedicated research sprint on the agent-side UX — I solved for the customer experience; the agents receiving the handoff deserve the same design attention.
"The most interesting thing I designed wasn't the chat interface — it was the feedback loop. Every tester session, every customer interaction, feeds back into the model. Nexus doesn't just resolve tickets today. It gets better at resolving them tomorrow."
