Tech Strategic Plan · For Internal Review

Lumen AI
Health Coach:
An AI‑first rebuild.

If we were starting Lumen today — with the breath data we have, the user base we've built, and the AI tools available in 2026 — we probably wouldn't build a hardware company with an app attached. We'd build a metabolic intelligence product that happens to use a device, with community and gamification woven in to keep people coming back. Here's a sketch of what that could look like.

Ilya Yakushev · Prepared for the Lumen Team

Three layers, one continuous loop.

Today, the device, the app, and the data each live in their own mental model. The proposal is to bring them into one stack, with each layer having a clear job: data flows down, insights flow back up.

Layer 1 · The brain

Metabolic Intelligence Core

A continuously learning model of this user — identity, experiments, commitments, metabolic assessment, learned habits, and conversation state.

  • Persistent user metabolic model — features grow with every breath, meal, sleep night, workout.
  • Recommendation engine: impact × completion-probability × experiment relevance × state fit × time-of-day × intensity budget (see the sketch below).
  • Loop state machine — manages all five engagement loops simultaneously.
  • Six physiological mediators (cortisol, glycogen, circadian rhythm, muscle mass, CoQ10, insulin sensitivity) as the explanatory language.
↓ Feeds Layer 2: user state & recommendations
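A minimal sketch of that scoring rule, assuming each factor arrives pre-normalized to [0, 1] from the user model (all names here are illustrative, not a spec):

```python
from dataclasses import dataclass

@dataclass
class ActionCandidate:
    """One possible Next Best Action. Every factor is assumed to arrive
    pre-normalized to [0, 1] from the user model; names are illustrative."""
    name: str
    impact: float                  # expected effect on metabolic outcomes
    completion_probability: float  # learned from this user's history
    experiment_relevance: float    # fit with the currently active experiment
    state_fit: float               # fit with current glycogen/cortisol state
    time_of_day_fit: float         # e.g. fasted-breath prompts in the morning
    intensity_cost: float          # share of today's intensity budget consumed

def score(c: ActionCandidate, budget_left: float) -> float:
    """Multiplicative six-factor score; overdrawing the intensity budget
    zeroes the candidate rather than merely penalizing it."""
    if c.intensity_cost > budget_left:
        return 0.0
    budget_fit = 1.0 - c.intensity_cost / max(budget_left, 1e-9)
    return (c.impact * c.completion_probability * c.experiment_relevance
            * c.state_fit * c.time_of_day_fit * budget_fit)

def next_best_action(candidates: list[ActionCandidate],
                     budget_left: float) -> ActionCandidate:
    return max(candidates, key=lambda c: score(c, budget_left))
```

Multiplicative scoring means any zero factor vetoes the action outright, which matches the budget behavior: an action the user won't complete, or can't afford today, should never win on impact alone.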
Layer 2 · The voice

Conversational AI Coach

An LLM agent grounded in the user's data and the Lumen knowledge base. It works proactively, not just reactively — initiating breaths, following up on commitments, and acknowledging completion.

  • Six communication mechanics: commitment request, loop closure, reward, prediction, expectation-setting, balance approach (assembled in the sketch below).
  • Multi-modal: text, breath triggers, meal photo analysis (Tier 2).
  • Always explains why through the relevant mediator before revealing the result.
↓ Feeds Layer 3: cards (NBA, insight, prediction, summary)
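To make the proactive behavior concrete, here is a rough shape of a single coach turn: grounded in the user's data, one mechanic per turn, mediator before result. All names and the prompt wording are assumptions for illustration:

```python
MECHANICS = ["commitment_request", "loop_closure", "reward",
             "prediction", "expectation_setting", "balance_approach"]

def build_coach_turn(user_model: dict, kb_snippets: list[str],
                     open_loop: dict | None) -> dict:
    """Assemble one proactive turn for the LLM coach.

    Encodes the rules above: ground in the user's own data, pick a
    mechanic (here: close an open loop first, otherwise request a new
    commitment), and explain the mediator before revealing the result.
    The prompt wording is illustrative, not the production prompt.
    """
    mechanic = "loop_closure" if open_loop else "commitment_request"
    system_prompt = "\n".join([
        "You are the Lumen metabolic coach.",
        f"Use exactly one mechanic this turn: {mechanic}.",
        "Explain the relevant mediator (glycogen, cortisol, circadian "
        "rhythm, ...) before revealing any result.",
        "Ground every claim in the context below; never invent data.",
        "## User model", repr(user_model),
        "## Knowledge base", *kb_snippets,
    ])
    return {"system": system_prompt, "mechanic": mechanic}
```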
Layer 3 · The body

Visual Body Interface

Card-based, real-time view of body state. Engagement is tied to action — the rule is no commit, no card (enforced in the sketch below the list). The day's cards accumulate, weekly summaries roll up, and the Journal keeps everything for drill-down.

  • Three tabs: Coach (conversation) · Visual Body (execution) · Journal (persistent record).
  • Drill-down hierarchy: weekly summary → daily summary → individual card.
  • Mobile-web-first delivery for velocity (per the current sprint focus).
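A minimal sketch of the gating rule and the first drill-down level, with the card fields assumed:

```python
from collections import defaultdict
from datetime import date

def issue_card(card: dict, committed_action_ids: set[str]) -> dict | None:
    """The Visual Body rule: no commit, no card."""
    if card["action_id"] not in committed_action_ids:
        return None               # never render a card the user didn't commit to
    return card

def daily_summaries(cards: list[dict]) -> dict[date, list[dict]]:
    """First drill-down level: cards group into daily summaries; the weekly
    summary is the same fold applied to the seven daily buckets."""
    days: dict[date, list[dict]] = defaultdict(list)
    for c in cards:
        days[c["issued_on"]].append(c)   # `issued_on` is an assumed field
    return dict(days)
```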

Three squads, an agent fleet, and a PM who owns the KPI.

Squads map 1:1 to the architecture layers, so engineers and the systems they own ship together. AI agents do the volume work; people do the judgment work. Headcount stays lean by design: 13 builders, 3 enablers, plus an agent fleet.

SQUAD 1 · Layer 1

Metabolic Intelligence

Owns the user model, recommendation engine, mediator definitions, and the breath/lifestyle data pipeline.

ML Engineer ×2
Metabolic Scientist ×1
Data Engineer ×1
TL · Senior ML Engineer / VP of Data
SQUAD 2 · Layer 2

AI Coach & Conversation

Owns the LLM agent stack, RAG over the user model, prompt & tool architecture, and the loop state machine that turns conversation into commitments.

AI / LLM Engineer ×2
Product Designer (conversation UX) ×1
Backend Engineer ×1
TL · AI Engineer / CTO
SQUAD 3 · Layer 3

Mobile & Visual Body

Owns the three-tab app, the card system, real-time body visualization, and the bridge from the agent's outputs to a touchable, shippable surface.

iOS / Android Engineer ×2
Frontend / React Native Engineer ×1
Product Designer ×1
TL · Mobile Lead / VP of R&D
ENABLERS · CROSS-SQUAD · KPI & release quality

Enabling roles

Kept lean on purpose. The PM owns the North Star KPI, the data scientist closes the analysis loop, and QA holds the release bar.

Product Manager ×1 — owns KPI
Data Scientist ×1 — KPI & cohort analysis
QA Engineer (or AI-solution equivalent) ×1

What's automated, what isn't, and why.

Function · AI agent · Human · Why
Daily recommendation generation · Fully automated · Reviews edge cases (~5% of plans) · Volume and scoring are mechanical; people only triage the outliers.
Conversation responses · LLM with grounding · Quality monitoring & red-teaming · Real-time conversation can't realistically be staffed by humans; quality is sampled instead.
Experiment design · Generates candidates · Metabolic Scientist approves · The LLM proposes; the scientist signs off on anything that touches clinical claims.
Insight generation · Pattern detection · Validates clinical claims · Pattern-finding is statistical; framing it safely is a human call.
User support · Tier 1 deflection · Tier 2+ human · FAQ-shaped issues fit an LLM well; hardware, billing, and medical cases go to a human.
Product decisions · Not delegated · PM + Founders · Product strategy is a judgment call, not an optimization problem.
Safety / medical review · Not delegated · Medical Officer · Liability and regulatory exposure should sit with a named human.
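Two rows of the table, as toy code (the threshold, categories, and routing labels are assumptions, not a spec):

```python
def needs_human_review(outlier_score: float, threshold: float = 0.95) -> bool:
    """Flag roughly the outlier tail (~5%) of auto-generated plans for a human.

    `outlier_score` is assumed to be a [0, 1] anomaly signal from the
    scorer (unusual action mix, clinical-claim flag, cohort deviation).
    """
    return outlier_score > threshold

ESCALATIONS = {
    "hardware": "human_tier2",
    "billing": "human_tier2",
    "medical": "medical_officer",
}

def route_support(category: str) -> str:
    """Tier 1 deflection: FAQ-shaped issues stay with the LLM;
    hardware, billing, and medical cases go to a human."""
    return ESCALATIONS.get(category, "llm_tier1")
```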

PHER — Positive Health Engagement Rate.

One number, focused on quality over volume. We propose using it as the leading indicator for retention and outcomes, and as the shared metric across squads.

PHER (Positive Health Engagement Rate)  =  actions the user completed ÷ actions the coach recommended

Read as a percentage: of every 100 Next Best Actions the coach issues, how many does the user actually complete? A PHER of 75% means three out of four recommendations landed.
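The arithmetic, as a small helper; the zero-recommendation edge case is our own choice:

```python
def pher(completed: int, recommended: int) -> float:
    """PHER = actions completed ÷ actions recommended, as a percentage.

    Returning 0.0 when nothing was recommended is an edge-case choice:
    a silent coach earns no credit.
    """
    return 100.0 * completed / recommended if recommended else 0.0

assert pher(3, 4) == 75.0   # "three out of four recommendations landed"
```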

Why this metric, and not DAU?

DAU (Daily Active Users) measures how often people open the app, which can quietly reward noisy notifications. PHER measures whether the coach is being useful. A coach that issues fewer NBAs (Next Best Actions) and gets each one done is generally healthier than one that floods the user. Every recommendation the LLM emits is judged on whether it landed.

Weeks 1–2 · ≥ 65% of NBAs completed
Week 12 · ≥ 75% of NBAs completed
Steady-state · ≥ 80% of NBAs completed

Secondary metrics: actions per day, return-interaction rate, and Flex Score movement (the lagging outcome PHER should drive).

≥ 75%
PHER target · Week 12

Foundation, then experience, then intelligence.

About five months end-to-end. Internal dogfooding gates the beta — the team should want to use it before anyone else does. Mobile web first, since speed of iteration matters more than native packaging at this stage.

Phase 1 · Foundation

3 weeks

Internal dogfood release
  • Persistent user model + ingestion pipeline.
  • Loop state machine — MVP scope: Next Best Action + Daily Summary loops (sketched below).
  • LLM Coach with metabolic grounding (RAG over the user model + canonical KB).
  • Mobile-web shell; three tabs in skeleton form.
  • Internal dogfood gate: team uses it daily.
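A minimal sketch of that state machine, with state names and transitions assumed from the commit-and-complete flow described earlier:

```python
from enum import Enum, auto

class LoopState(Enum):
    ISSUED = auto()      # coach recommended the action
    COMMITTED = auto()   # user accepted the commitment request
    COMPLETED = auto()   # user did it; a card is issued and the loop closes
    EXPIRED = auto()     # the window passed without completion

# Legal transitions, shared by the two MVP loops
# (Next Best Action and Daily Summary).
LEGAL = {
    LoopState.ISSUED:    {LoopState.COMMITTED, LoopState.EXPIRED},
    LoopState.COMMITTED: {LoopState.COMPLETED, LoopState.EXPIRED},
}

def advance(current: LoopState, nxt: LoopState) -> LoopState:
    """Move one loop forward; illegal jumps raise instead of passing silently."""
    if nxt not in LEGAL.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt
```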
Phase 2 · Core Experience

1 month

Closed beta · 1,000 users
  • All five engagement loops live.
  • Visual Body tab shipped — full card system with no-commit / no-card rule.
  • Intraday adjustment engine ("balance your day", easier-alternative replacement; sketched below).
  • PHER dashboards instrumented; cohort scaffolding stood up.
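One plausible shape for the easier-alternative replacement, assuming the Layer 1 scorer exposes per-action intensity_cost and expected_impact:

```python
def balance_day(missed: dict, alternatives: list[dict],
                intensity_budget_left: float) -> dict | None:
    """'Balance your day': when an NBA is about to be missed, offer the
    easiest alternative that still fits the remaining intensity budget.

    `intensity_cost` and `expected_impact` are assumed outputs of the
    Layer 1 scorer; the field names are illustrative.
    """
    viable = [a for a in alternatives
              if a["intensity_cost"] <= intensity_budget_left
              and a["intensity_cost"] < missed["intensity_cost"]]
    if not viable:
        return None   # nothing easier fits; let the loop expire gracefully
    # Easiest first; expected impact breaks ties.
    return min(viable, key=lambda a: (a["intensity_cost"], -a["expected_impact"]))
```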
Phase 3 · Intelligence Layer

3 months

General availability
  • Full six-signal recommendation scoring.
  • Experiment engine — automated hypothesis → test → personalized insight (lifecycle sketched below).
  • Journal + drill-down hierarchy (week → day → card).
  • Cohort benchmarking surfaces in-product.
  • Lumen Pro integrations & clinical dashboards lit up.
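A sketch of that experiment lifecycle, with the scientist approval gate from the division-of-labor table wired in (field names and the 14-day window are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class Experiment:
    """Hypothesis -> test -> personalized insight, gated per the
    division-of-labor table: nothing runs without scientist sign-off."""
    hypothesis: str                 # e.g. "an earlier dinner improves morning flex"
    metric: str                     # e.g. "morning_breath_score"
    days: int = 14                  # assumed test window
    approved: bool = False          # Metabolic Scientist sign-off
    readings: list[float] = field(default_factory=list)

    def insight(self, baseline: float) -> str | None:
        """Returns a personalized insight once the test window is complete."""
        if not self.approved or len(self.readings) < self.days:
            return None
        delta = sum(self.readings) / len(self.readings) - baseline
        return f"{self.hypothesis}: {delta:+.2f} vs. baseline on {self.metric}"
```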

The advantages compound, the flywheel turns, and the data itself becomes a meaningful switching cost.

Data advantage

Proprietary breath data and longitudinal metabolic profiles.

76M+ measurements would be hard for anyone else to reproduce in a single venture cycle. Personal Flex Score histories get richer with every breath, and an LLM-first company with a generic wearable would have a hard time recreating this from scratch.

Defensibility

The user's switching cost is their own data.

The metabolic model gets personalized over months, not days. After 90 days, a user's coach should know them better than a fresh system would — even on the same hardware. That's a reasonable source of stickiness.

AI-native

Designed for agents from day one, not bolted on after.

The user model, loop state machine, and card system are designed to give the agent a clean substrate. The intent is to build the app around the agent, rather than retrofit an LLM into an existing app.

Engagement economy

The points loop can double as a revenue flywheel.

The coach issues NBAs, the user commits and completes, points accrue, and redemptions either deepen the product (advanced features → ARPU) or run through partners (gift cards → affiliate revenue). Gamification can pull double duty as engagement and monetization.

Retention & community

We expect this to lower churn and lift engagement — that's the whole point of the rebuild.

Today, most Lumen drop-off happens in the first 30–60 days, before users have built enough metabolic insight to feel the value. The AI coach, the points loop, and the community layer are designed to fix that window directly. Internal target: ≥ 20% reduction in 60-day churn versus the current cohort once the new experience reaches GA, with a corresponding lift in weekly return-interaction rate.

Coach proactivity
The coach initiates breaths and follows up on commitments instead of waiting to be opened — every closed loop is a reason to come back.
Points, levels, streaks
Server-authoritative XP, level progression, and streak tracking make daily completion feel like progress, not a chore. Redemptions give the points somewhere to land.
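Concretely, "server-authoritative" could look like this sketch, where the client only ever reports a completion and the server owns every XP, level, and streak write (field names and the level curve are placeholders):

```python
from datetime import date, timedelta

def yesterday(day: str) -> str:
    return (date.fromisoformat(day) - timedelta(days=1)).isoformat()

def apply_completion(profile: dict, xp_award: int, today: str) -> dict:
    """Runs server-side only; assumes an initialized profile dict."""
    profile["xp"] += xp_award
    profile["level"] = profile["xp"] // 1000   # placeholder level curve
    last = profile.get("last_completion_day")
    if last == today:
        pass                      # second completion today: streak already counted
    elif last == yesterday(today):
        profile["streak"] += 1    # consecutive day: streak grows
    else:
        profile["streak"] = 1     # gap (or first ever): streak resets
    profile["last_completion_day"] = today
    return profile
```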
Community & leaderboards
Daily, weekly, monthly, and all-time leaderboards plus friend cohorts give users someone to compete (and commiserate) with. Social proof is one of the strongest known retention levers in health apps.
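A period-bucketed leaderboard sketch: in-memory for illustration (production would likely sit on a sorted-set store), with friend cohorts as a filter over the same buckets:

```python
from collections import defaultdict
from datetime import date

class Leaderboards:
    """Daily / weekly / monthly / all-time boards as period buckets."""

    def __init__(self):
        # bucket key -> user id -> points
        self.scores = defaultdict(lambda: defaultdict(int))

    def add(self, user: str, points: int, day: date) -> None:
        iso = day.isocalendar()
        for bucket in (f"day:{day}",
                       f"week:{iso.year}-{iso.week}",
                       f"month:{day.year}-{day.month:02d}",
                       "all-time"):
            self.scores[bucket][user] += points

    def top(self, bucket: str, n: int = 10,
            friends: set[str] | None = None) -> list[tuple[str, int]]:
        rows = list(self.scores[bucket].items())
        if friends is not None:                 # friend-cohort view
            rows = [(u, s) for u, s in rows if u in friends]
        return sorted(rows, key=lambda r: r[1], reverse=True)[:n]
```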
The flywheel: more users → better cohort data → better recommendations → better outcomes → word of mouth + retention → more users.