AI-First Engineering

OpenPhone should be built the way the product behaves: small autonomous loops, strong runtime contracts, visible evidence, and human review at the right boundaries.

This document is the operating map for making that real in the repository.

Principles

Build repeatable loops, not heroic one-off prompts.
Prefer small green units over large branches.
Make every agent run inherit the same repository contract.
Turn repeated manual review into scripts, schemas, benchmarks, or docs checks.
Keep physical-device evidence separate from source control.
Let humans approve product, safety, and release decisions; let agents do the repetitive exploration, repair, and verification work.

Required Loops

Fast Commit Loop

Use for normal implementation tasks.

Pick one small task with a clear exit condition.
Inspect relevant docs, contracts, and tests.
Make the smallest coherent change.
Run ./scripts/check.sh.
Run git diff --check.
Fix failures with at most a few bounded repair passes.
Commit one coherent green unit.

This loop is the default for docs, scripts, schemas, protocol contracts, assistant implementation, and integration code.

Runtime Contract Loop

Use when changing OpenPhone runtime tools, OpenClaw integration, MCP, CLI, or future remote runtimes.

Required local checks:

./scripts/check-runtime-protocol.sh
./scripts/check.sh
git diff --check

Required live checks when hardware and a gateway are available:

./scripts/smoke-test-openclaw-runtime.sh
./scripts/run-eval-suite.sh

Any new runtime must define:

command and event mapping;
default exposure and safety class;
confirmation behavior for state-changing actions;
identity, token, and storage boundaries;
at least one smoke or contract test.

Device Eval Loop

Use before claiming phone-control quality improved.

The canonical smoke suite is:

./scripts/run-eval-suite.sh

Benchmark coverage is:

./scripts/run-agent-benchmark.sh \
  --benchmark docs/agent-benchmarks/openphone-v0.json

Device eval output belongs under ignored .worktree/ paths or GitHub Actions artifacts, not committed source files.

Review Loop

Use for pull requests before merge.

A reviewer agent compares the branch against the base branch.
Findings focus on correctness, safety, privacy, missing validation, docs drift, and release risk.
The implementer fixes only accepted findings.
The reviewer re-checks once.

The goal is not to make agents argue forever. The loop exits when findings are fixed, explicitly declined, or moved to follow-up work.

Docs Freshness Loop

Run regularly, and whenever product behavior changes.

Docs review should check:

whether docs/README.md still points to the canonical entry points;
whether architecture, runtime, security, testing, and release docs agree;
whether obsolete claims can be deleted instead of explained around;
whether new scripts or contracts are documented;
whether docs are readable for a new contributor and specific enough for an agent to act on.

Long-term, the docs should be published as a static site, for example at docs.openphone.secondly.com. The source of truth should remain this repo; the site should render curated docs, not become a second place where truth drifts.

Local-only agent notes belong in ignored docs/local-temp/, not in public docs. See docs/LOCAL_AGENT_NOTES.md.

CI And CD Target State

Current Baseline

ci.yml runs repository checks and whitespace checks on GitHub-hosted Linux.
eval.yml runs physical trajectory smokes on the openphone-device self-hosted runner.
release.yml runs release work on the openphone-build self-hosted runner.

Next CI Layers

PR contract checks
- Always run ./scripts/check.sh and git diff --check.
- Fail quickly on schema, protocol, policy, CLI, MCP, and assistant Java regressions.
Emulator runtime smoke
- Build or install an emulator image with the OpenPhone assistant.
- Boot it headlessly on a self-hosted runner.
- Run ADB-backed UI/context checks.
- Exercise local runtime actions without depending on provider keys.
Remote runtime smoke
- Start or connect to an OpenClaw gateway.
- Configure the phone runtime over ADB.
- Verify presence, command exposure, openphone.screen.get, and one safe tool round trip.
Physical device eval
- Run trajectory smokes nightly and on demand.
- Run the benchmark suite before releases and after high-risk runtime work.
- Upload trajectories and summaries as private artifacts.
Release gate
- Require build, device, runtime, docs, license, and security evidence before tagging a release.

Release Publishing Loop

Every public release should have one obvious trail:

Update docs/releases/CHANGELOG.md.
Update the versioned release notes under docs/releases/.
Run CI and relevant evals.
Dispatch .github/workflows/release.yml with the version, device, release notes file, prerelease flag, and latest-release behavior.
Publish OTA artifacts, SHA256SUMS, and ARTIFACTS.md to GitHub Releases.
Confirm the GitHub release page is the public source for that version.

Work Queue Shape

Keep tasks small enough that one agent can finish, validate, and commit them.

Each task should include:

goal;
likely files;
validation command;
risk level;
exit condition;
whether device, emulator, OpenClaw, or docs-site validation is required.

Good lanes:

runtime protocol contracts;
assistant UI and runtime implementation;
policy, approvals, and auditability;
OpenClaw and future runtime adapters;
emulator/device eval infrastructure;
docs/site publishing;
release automation.

Codex Routines To Add

These routines should run in isolated worktrees where possible.

PR reviewer: review new pull requests for correctness, safety, docs drift, missing tests, and privacy risk.
CI watchdog: when CI fails, produce a diagnosis or small patch.
Daily docs gardener: find stale, contradictory, or orphaned docs and propose a small cleanup.
Eval summarizer: after nightly device evals, summarize pass/fail, regressions, and artifacts.
Work queue curator: maintain the next set of small tasks by lane.

Unattended routines must avoid broad local secrets and should not commit or push without explicit human review unless the action is narrowly scoped and well proven.

Docs Site Target

A hosted docs site should be generated from docs/ and should have a curated navigation rather than dumping every Markdown file equally.

Recommended first version:

static Markdown site generator such as VitePress, Docusaurus, or MkDocs;
custom domain docs.openphone.secondly.com;
GitHub Pages or another static host;
docs build check in CI;
broken-link check;
clear sections for Concepts, Build, Testing, Runtime, Devices, Releases, Legal, and Contributing.

The first docs-site PR should only add the publishing scaffold and navigation. It should not rewrite all docs at once.

Definition Of AI-First

OpenPhone is AI-first when:

every PR is reviewed by humans and agents;
every important behavior has a script, contract, benchmark, or eval;
docs are written for both people and agents;
agents can safely run in parallel worktrees;
CI proves local, remote, emulator, and physical-device behavior at the right cadence;
repeated work turns into reusable loops.

AI-First Engineering

On this page