AI-First Engineering
OpenPhone should be built the way the product behaves: small autonomous loops, strong runtime contracts, visible evidence, and human review at the right boundaries.
This document is the operating map for making that real in the repository.
Principles
- Build repeatable loops, not heroic one-off prompts.
- Prefer small green units over large branches.
- Make every agent run inherit the same repository contract.
- Turn repeated manual review into scripts, schemas, benchmarks, or docs checks.
- Keep physical-device evidence separate from source control.
- Let humans approve product, safety, and release decisions; let agents do the repetitive exploration, repair, and verification work.
Required Loops
Fast Commit Loop
Use for normal implementation tasks.
- Pick one small task with a clear exit condition.
- Inspect relevant docs, contracts, and tests.
- Make the smallest coherent change.
- Run ./scripts/check.sh.
- Run git diff --check.
- Fix failures with at most a few bounded repair passes.
- Commit one coherent green unit.
This loop is the default for docs, scripts, schemas, protocol contracts, assistant implementation, and integration code.
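The bounded repair step in this loop can be sketched as below. This is a minimal illustration, not the repository's tooling: `run_checks`, `fast_commit_loop`, the `repair` callback, and `MAX_REPAIR_PASSES` are hypothetical names, and the limit of 3 is an assumed reading of "a few".

```python
import subprocess
from typing import Callable, Sequence

# The two checks named in the Fast Commit Loop.
CHECK_COMMANDS: Sequence[Sequence[str]] = (
    ("./scripts/check.sh",),
    ("git", "diff", "--check"),
)
MAX_REPAIR_PASSES = 3  # assumed value for "a few"; not taken from the repository


def run_checks(commands: Sequence[Sequence[str]] = CHECK_COMMANDS) -> bool:
    """Return True only if every check command exits with status 0."""
    return all(subprocess.run(list(cmd)).returncode == 0 for cmd in commands)


def fast_commit_loop(
    checks: Callable[[], bool],
    repair: Callable[[], object],
    max_passes: int = MAX_REPAIR_PASSES,
) -> bool:
    """Run checks, repair after each failure, and give up after max_passes repairs."""
    if checks():
        return True
    for _ in range(max_passes):
        repair()  # hypothetical hook: an agent or human applies one fix
        if checks():
            return True
    return False  # still red: stop and escalate instead of looping forever
```

In a real run, `checks` would be `run_checks` and `repair` would hand the failure output to whoever is doing the fix. A `False` result means the unit is not green and should not be committed.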
Runtime Contract Loop
Use when changing OpenPhone runtime tools, OpenClaw integration, MCP, CLI, or future remote runtimes.
Required local checks:
./scripts/check-runtime-protocol.sh
./scripts/check.sh
git diff --check
Required live checks when hardware and a gateway are available:
./scripts/smoke-test-openclaw-runtime.sh
./scripts/run-eval-suite.sh
Any new runtime must define:
- command and event mapping;
- default exposure and safety class;
- confirmation behavior for state-changing actions;
- identity, token, and storage boundaries;
- at least one smoke or contract test.
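One way to keep that checklist honest is to record it as data and flag the gaps. The sketch below assumes a hypothetical `RuntimeContract` record; the field names mirror the list above but are not taken from the repository's schemas.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class RuntimeContract:
    """Hypothetical record of what a new runtime must define."""

    name: str
    command_event_mapping: Dict[str, str] = field(default_factory=dict)
    default_exposure: str = ""
    safety_class: str = ""
    confirmation_behavior: str = ""
    identity_token_storage_boundaries: str = ""
    tests: List[str] = field(default_factory=list)


def missing_items(contract: RuntimeContract) -> List[str]:
    """Return the names of required fields that are still empty."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name)]
```

A contract check could refuse to register a runtime while `missing_items` returns anything, which turns the list into an enforceable gate rather than a reminder.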
Device Eval Loop
Use before claiming phone-control quality improved.
The canonical smoke suite is:
./scripts/run-eval-suite.sh
Benchmark coverage is:
./scripts/run-agent-benchmark.sh \
--benchmark docs/agent-benchmarks/openphone-v0.json
Device eval output belongs under ignored .worktree/ paths or GitHub Actions
artifacts, not committed source files.
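The output rule could be enforced with a small path check before an eval script writes anything. This is a sketch under the assumption that paths are repository-relative; `is_allowed_evidence_path` is a hypothetical helper, not an existing script.

```python
from pathlib import PurePosixPath

IGNORED_EVIDENCE_ROOT = ".worktree"  # directory named in the rule above


def is_allowed_evidence_path(path: str) -> bool:
    """True if a repo-relative path sits under the ignored .worktree/ directory."""
    parts = PurePosixPath(path).parts
    return (
        len(parts) > 1
        and parts[0] == IGNORED_EVIDENCE_ROOT
        and ".." not in parts  # reject paths that climb back out
    )
```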
Review Loop
Use for pull requests before merge.
- A reviewer agent compares the branch against the base branch.
- Findings focus on correctness, safety, privacy, missing validation, docs drift, and release risk.
- The implementer fixes only accepted findings.
- The reviewer re-checks once.
The goal is not to make agents argue forever. The loop exits when findings are fixed, explicitly declined, or moved to follow-up work.
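The exit condition can be made explicit by tracking a status per finding. The sketch below is illustrative only; `Status`, `Finding`, and `review_loop_done` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Status(Enum):
    OPEN = "open"
    FIXED = "fixed"
    DECLINED = "declined"    # explicitly declined by the implementer
    FOLLOW_UP = "follow-up"  # moved to follow-up work


@dataclass
class Finding:
    summary: str
    status: Status = Status.OPEN


def review_loop_done(findings: Iterable[Finding]) -> bool:
    """The loop exits once no finding is still open."""
    return all(f.status is not Status.OPEN for f in findings)
```

Because every finding must land in one of three terminal states, the reviewer's single re-check has a clear question to answer: is anything still open?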
Docs Freshness Loop
Run regularly, and whenever product behavior changes.
Docs review should check:
- whether docs/README.md still points to the canonical entry points;
- whether architecture, runtime, security, testing, and release docs agree;
- whether obsolete claims can be deleted instead of explained around;
- whether new scripts or contracts are documented;
- whether docs are readable for a new contributor and specific enough for an agent to act on.
Long-term, the docs should be published as a static site, for example at
docs.openphone.secondly.com. The source of truth should remain this repo; the
site should render curated docs, not become a second place where truth drifts.
Local-only agent notes belong in ignored docs/local-temp/, not in public docs.
See docs/LOCAL_AGENT_NOTES.md.
CI And CD Target State
Current Baseline
- ci.yml runs repository checks and whitespace checks on GitHub-hosted Linux.
- eval.yml runs physical trajectory smokes on the openphone-device self-hosted runner.
- release.yml runs release work on the openphone-build self-hosted runner.
Next CI Layers
- PR contract checks
  - Always run ./scripts/check.sh and git diff --check.
  - Fail quickly on schema, protocol, policy, CLI, MCP, and assistant Java regressions.
- Emulator runtime smoke
  - Build or install an emulator image with the OpenPhone assistant.
  - Boot it headlessly on a self-hosted runner.
  - Run ADB-backed UI/context checks.
  - Exercise local runtime actions without depending on provider keys.
- Remote runtime smoke
  - Start or connect to an OpenClaw gateway.
  - Configure the phone runtime over ADB.
  - Verify presence, command exposure, openphone.screen.get, and one safe tool round trip.
- Physical device eval
  - Run trajectory smokes nightly and on demand.
  - Run the benchmark suite before releases and after high-risk runtime work.
  - Upload trajectories and summaries as private artifacts.
- Release gate
  - Require build, device, runtime, docs, license, and security evidence before tagging a release.
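The release gate reduces to a simple rule: every required evidence kind must be present and passing. A minimal sketch, assuming evidence is collected as a mapping from kind to pass/fail (`release_blockers` is a hypothetical helper):

```python
from typing import Dict, List

# Evidence kinds named in the release gate above.
REQUIRED_EVIDENCE = ("build", "device", "runtime", "docs", "license", "security")


def release_blockers(evidence: Dict[str, bool]) -> List[str]:
    """Return required evidence kinds that are missing or not passing."""
    return [kind for kind in REQUIRED_EVIDENCE if not evidence.get(kind, False)]
```

Tagging would proceed only when the returned list is empty. Missing evidence is treated the same as failing evidence, so a skipped eval cannot pass the gate silently.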
Release Publishing Loop
Every public release should have one obvious trail:
- Update docs/releases/CHANGELOG.md.
- Update the versioned release notes under docs/releases/.
- Run CI and relevant evals.
- Dispatch .github/workflows/release.yml with the version, device, release notes file, prerelease flag, and latest-release behavior.
- Publish OTA artifacts, SHA256SUMS, and ARTIFACTS.md to GitHub Releases.
- Confirm the GitHub release page is the public source for that version.
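For the checksum file, the sketch below produces text in the layout that the `sha256sum` tool emits and verifies (hex digest, two spaces, file name). Whether the release workflow uses this exact layout is an assumption; `file_sha256` and `sha256sums` are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large OTA artifacts are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256sums(paths: Iterable[Path]) -> str:
    """Build SHA256SUMS text: one '<hex digest>  <file name>' line per file."""
    return "".join(f"{file_sha256(p)}  {p.name}\n" for p in sorted(paths))
```

Sorting the paths keeps the output stable between runs, which makes the published file easy to diff against a rebuilt one.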
Work Queue Shape
Keep tasks small enough that one agent can finish, validate, and commit them.
Each task should include:
- goal;
- likely files;
- validation command;
- risk level;
- exit condition;
- whether device, emulator, OpenClaw, or docs-site validation is required.
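The task shape above maps naturally onto a small record with a readiness check. This is a sketch, not an existing schema: `Task`, `task_problems`, and the validation target identifiers are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List

# Validation targets named in the list above; the identifiers are illustrative.
VALIDATION_TARGETS = ("device", "emulator", "openclaw", "docs-site")


@dataclass
class Task:
    """Hypothetical work-queue entry with the fields listed above."""

    goal: str
    likely_files: List[str]
    validation_command: str
    risk_level: str
    exit_condition: str
    required_validation: List[str] = field(default_factory=list)


def task_problems(task: Task) -> List[str]:
    """Return reasons the task is not ready to hand to an agent."""
    problems = []
    for name in ("goal", "validation_command", "risk_level", "exit_condition"):
        if not getattr(task, name).strip():
            problems.append(f"{name} is empty")
    if not task.likely_files:
        problems.append("likely_files is empty")
    for target in task.required_validation:
        if target not in VALIDATION_TARGETS:
            problems.append(f"unknown validation target: {target}")
    return problems
```

An empty `required_validation` list means the task needs only the local checks, which is the common case for docs and schema work.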
Good lanes:
- runtime protocol contracts;
- assistant UI and runtime implementation;
- policy, approvals, and auditability;
- OpenClaw and future runtime adapters;
- emulator/device eval infrastructure;
- docs/site publishing;
- release automation.
Codex Routines To Add
These routines should run in isolated worktrees where possible.
- PR reviewer: review new pull requests for correctness, safety, docs drift, missing tests, and privacy risk.
- CI watchdog: when CI fails, produce a diagnosis or small patch.
- Daily docs gardener: find stale, contradictory, or orphaned docs and propose a small cleanup.
- Eval summarizer: after nightly device evals, summarize pass/fail, regressions, and artifacts.
- Work queue curator: maintain the next set of small tasks by lane.
Unattended routines must not rely on broadly scoped local secrets, and they should not commit or push without explicit human review unless the action is narrowly scoped and well proven.
Docs Site Target
A hosted docs site should be generated from docs/ and should use curated
navigation rather than listing every Markdown file with equal weight.
Recommended first version:
- static Markdown site generator such as VitePress, Docusaurus, or MkDocs;
- custom domain docs.openphone.secondly.com;
- GitHub Pages or another static host;
- docs build check in CI;
- broken-link check;
- clear sections for Concepts, Build, Testing, Runtime, Devices, Releases, Legal, and Contributing.
The first docs-site PR should only add the publishing scaffold and navigation. It should not rewrite all docs at once.
Definition Of AI-First
OpenPhone is AI-first when:
- every PR is reviewed by humans and agents;
- every important behavior has a script, contract, benchmark, or eval;
- docs are written for both people and agents;
- agents can safely run in parallel worktrees;
- CI proves local, remote, emulator, and physical-device behavior at the right cadence;
- repeated work turns into reusable loops.