Table of Contents
- Key Highlights
- Introduction
- How the product loop simplifies a complex coaching problem
- Why rules-first matters where safety is essential
- Choosing the stack: Next.js, Supabase, OpenAI Responses API, and why not heavy orchestration
- Structured output and strict JSON schemas: why parsing is not enough
- Data model and security: training_states, workout_sessions, workout_sets
- Coach insights and caching: treat evidence as first-class
- UX lessons: language, onboarding, and transparency
- Practical safety boundaries and conservative defaults
- Limitations of the initial build and engineering priorities
- Extending to trainers: permissions, auditing, and workflows
- Why structured validation and rule fallbacks reduce legal and ethical risk
- Cost and performance considerations in model usage
- Real-world examples that highlight the design trade-offs
- Design principles that guided development
- Metrics that matter for validating the loop
- What this approach signals for other domains
- Engineering roadmap: from prototype to product
- Final product assessment: strengths and where to watch
- FAQ
Key Highlights
- A rules-first architecture pairs deterministic safety checks with the OpenAI Responses API to generate clear, constrained workout plans and session explanations.
- The product emphasizes persisted workflows—logging real session data, validating AI output with JSON schemas, and keeping deterministic fallbacks to prevent unsafe recommendations.
- Built on Next.js and Supabase, the app demonstrates how lightweight stacks can support iterative product learning, trainer workflows, and future extensions without over-relying on complex orchestration frameworks.
Introduction
Many people can get a workout plan from a prompt or an online coach. Tracking what actually happened in the gym, deciding what to change next, and doing so in a way that respects safety and real-world variability is far harder. Be My Trainer tackles that operational problem by combining a compact technical stack with a rules-first decision layer and constrained use of a generative model. The result is an application that treats AI as an assistant for language and personalization, while keeping deterministic rules in charge of safety-critical choices.
This article walks through the product loop, architectural choices, and design trade-offs behind Be My Trainer. It shows why structured model outputs and persistent user history matter for fitness apps and explains how simple engineering decisions—controlled schemas, local rule engines, and conservative defaults—reduce risk and improve trust. The lessons apply broadly to any domain where user safety and gradual adaptation must coexist with personalized, language-rich experiences.
How the product loop simplifies a complex coaching problem
The core of the product is a feedback loop that maps a small user profile into a starter plan, captures detailed session data, then adjusts subsequent workouts based on concrete evidence. That loop shifts the hard work from speculative personalization to empirical, session-level adaptation.
Onboarding collects a concise profile: primary goal, experience level, age range, weekly training frequency, available equipment, and limits or preferences. Those values seed an initial program: a weekly layout of workout days, each with exercises, sets, reps, and an explicit coach message. Critically, the system never invents starting weights. Instead, starting load is forced to zero so the user picks an appropriate weight in the gym. This removes a common safety failure mode where models suggest unrealistic loads.
During a workout, logging ranges from full set-by-set entries (reps, weight, effort) to quick feedback buttons like "too easy", "good", or "too hard." The user also records energy and soreness after the session and can add free-text notes. These session-level signals are the most important input to the next-session decision. The workflow becomes:
- Profile → generate plan
- Complete workout → log performance and recovery
- Calculate a safe adjustment → have AI phrase an explanation
- Persist the next workout and coach message
Persisting history matters. When adjustments come from actual logged data, progressive overload is anchored to what the user did, not to what a model guessed they could do. The result is an evidence-based training path rather than a speculative one-off plan.
Real-world implication: Compare a friend who tells you to "add 5 pounds every week" with a coach who watches form and effort. Be My Trainer records what the user actually did and approximates the attentive coach, but only within the limits its rule layer and validation allow.
Why rules-first matters where safety is essential
Fitness guidance is not purely creative text. It has safety constraints, predictable responses to pain and fatigue, and a need for conservative defaults. The rules-first pattern used here enforces a narrow safety envelope and delegates explanation to the model.
The decision pipeline looks like this:
- User input → deterministic rule decision
- Rule layer produces a set of allowed safe adjustments
- Structured AI request presents those options
- AI selects among allowed actions and composes an explanation
- Server validates the AI response; if invalid, the fallback rule result is used
This approach prevents a model from inventing a risky recommendation such as pushing through pain, drastically increasing load after a poor session, or prescribing an exercise the user lacks equipment for. The rule engine considers quantities like percentage of target reps completed, average effort across sets, energy and soreness ratings, and presence of pain-related language in notes. It then maps these signals into one of a small number of determinations: progress gradually, reduce volume by 20%, hold/make easier, stop or swap a movement, or repeat the workout.
An illustrative scenario: A user misses several reps on a heavy compound lift, reports high soreness (4/5), and adds a note about feeling "sharp pain in the left knee." The rule engine will flag the movement and select the "Stop or swap a flagged movement" decision. The model may be asked to explain the swap in clear coaching language and suggest alternative exercises targeting the same muscles without the problematic knee stress. If the AI fails to produce a valid response, the deterministic swap stands as the fallback.
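The mapping from signals to a determination can be sketched as a small pure function. This is a minimal illustration, not the app's actual rule set: the names (`SessionSignals`, `decideAdjustment`), the 1-5 rating scales, the pain keyword list, and every threshold are assumptions chosen for the example.

```typescript
// Hypothetical names and thresholds throughout; the article does not publish the real rules.
type CoachDecision =
  | "progress_gradually"
  | "reduce_volume"
  | "hold_or_make_easier"
  | "stop_or_swap"
  | "repeat_workout";

interface SessionSignals {
  completionPct: number; // share of target reps completed, 0-100
  avgEffort: number;     // mean effort rating across sets, 1-5 scale assumed
  energy: number;        // post-session energy, 1-5 scale assumed
  soreness: number;      // post-session soreness, 1-5 scale assumed
  notes: string;         // free-text notes from the user
}

// Assumed keyword list. Naive substring matching over-triggers (for example on
// "no pain"), which errs on the conservative side for this sketch.
const PAIN_TERMS = ["pain", "sharp", "injury", "hurts", "tweaked"];

function mentionsPain(notes: string): boolean {
  const lower = notes.toLowerCase();
  return PAIN_TERMS.some((term) => lower.indexOf(term) !== -1);
}

function decideAdjustment(s: SessionSignals): CoachDecision {
  // Safety checks run first so a strong set can never mask a pain report.
  if (mentionsPain(s.notes)) return "stop_or_swap";
  // Poor recovery on both ratings: cut volume.
  if (s.soreness >= 4 && s.energy <= 2) return "reduce_volume";
  // Many missed reps or near-maximal effort: hold or make the session easier.
  if (s.completionPct < 70 || s.avgEffort >= 4.5) return "hold_or_make_easier";
  // A few missed reps: repeat the same workout before progressing.
  if (s.completionPct < 100) return "repeat_workout";
  // All reps done, but one recovery signal is poor: hold rather than progress.
  if (s.energy <= 2 || s.soreness >= 4) return "hold_or_make_easier";
  return "progress_gradually";
}
```

Because the function is pure, each branch can be covered by a unit test, and the knee-pain scenario above resolves to the stop-or-swap decision regardless of how well the remaining sets went.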
This pattern provides several advantages:
- Predictability: Users and developers can reason about outcomes because the rule layer determines the candidate actions.
- Testability: Rules are unit-testable and auditable, simplifying verification across edge cases.
- Transparency: The app can show whether a recommendation came from AI or the fallback rules and present the evidence that influenced the decision.
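The validate-or-fall-back step of the pipeline, together with the provenance flag mentioned under transparency, might look like the sketch below. The response shape (`decision`, `explanation`), the function name `resolveDecision`, and the canned fallback messages are all assumptions made for illustration.

```typescript
type Decision =
  | "progress_gradually"
  | "reduce_volume"
  | "hold_or_make_easier"
  | "stop_or_swap"
  | "repeat_workout";

interface ResolvedDecision {
  decision: Decision;
  explanation: string;
  source: "ai" | "rules"; // provenance shown to the user
}

// Hypothetical canned messages used when the model output is unusable.
const FALLBACK_TEXT: Record<Decision, string> = {
  progress_gradually: "Solid session. Next time, make a small step up.",
  reduce_volume: "Recovery looks strained, so the next session is lighter.",
  hold_or_make_easier: "We will hold here until this feels comfortable.",
  stop_or_swap: "That movement is paused and replaced with an alternative.",
  repeat_workout: "Repeat this workout to lock in the target reps.",
};

function resolveDecision(
  allowed: Decision[],          // options the rule layer permits
  fallback: Decision,           // the rule layer's own choice
  rawAiResponse: string | null  // null when the API call failed
): ResolvedDecision {
  const fromRules: ResolvedDecision = {
    decision: fallback,
    explanation: FALLBACK_TEXT[fallback],
    source: "rules",
  };
  if (rawAiResponse === null) return fromRules;

  let parsed: unknown;
  try {
    parsed = JSON.parse(rawAiResponse);
  } catch (_err) {
    return fromRules; // malformed JSON
  }
  if (typeof parsed !== "object" || parsed === null) return fromRules;

  const candidate = parsed as { decision?: unknown; explanation?: unknown };
  const decisionValue = candidate.decision;
  const explanationValue = candidate.explanation;
  if (typeof decisionValue !== "string" || typeof explanationValue !== "string") {
    return fromRules;
  }
  // The model may only pick from the allowed set; anything else is rejected.
  const chosen = allowed.filter((d) => d === decisionValue)[0];
  if (chosen === undefined || explanationValue.trim() === "") return fromRules;

  return { decision: chosen, explanation: explanationValue.trim(), source: "ai" };
}
```

Every failure path returns the same rule-derived object, so callers never have to handle a missing decision.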
The rules-first approach is applicable in any domain combining personalization with safety. Clinical triage, medication reminders, or industrial control panels benefit from deterministic boundaries paired with language-rich AI for explanation.
Choosing the stack: Next.js, Supabase, OpenAI Responses API, and why not heavy orchestration
Be My Trainer uses a compact, pragmatic stack. Next.js and React handle the public site, the authenticated app, and server API routes. Supabase provides both authentication and a Postgres backend for persisted training data. The OpenAI Responses API generates workouts, composes coach messages, and helps label coach insights. Vercel hosts the app.
The decision to avoid agent frameworks or LangChain-style orchestration reflects product realities. Early workflows are short and structured. Adding a full orchestration layer would increase complexity and indirection without delivering immediate value. The current architecture looks like:
Browser → Next.js App Router → Supabase Auth and Postgres → Next.js API routes → OpenAI Responses API
This pipeline keeps latency manageable, preserves server-side validation, and allows the developer to control schema and fallback behavior. It also maps naturally to a development cadence of quick iterations: modify rules, tweak schema, and observe user behavior.
Real-world comparison: Startups often weigh a "big platform now" vs "simple, extensible stack" decision. Be My Trainer illustrates the lean approach: solve the product's core research questions—does the loop encourage continued training, do users accept conservative adjustments—before investing in orchestration complexity.
Structured output and strict JSON schemas: why parsing is not enough
One of the most common pitfalls when integrating language models is relying on hopeful parsing of free-form responses. Be My Trainer avoids that by requiring strict JSON schemas for AI responses. For plan generation, responses must include a coach message and an array of workout days. Each workout day needs a name, focus, and a list of exercises with constrained attributes.
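As an illustration of the shape described above, the plan response could be specified with a plain JSON Schema held in a TypeScript constant. The field names (`coachMessage`, `days`, `startingWeight`, and so on) are assumptions; the article states only which pieces of information must be present. Which schema keywords the model API's strict mode accepts should be checked against its documentation.

```typescript
// A plain JSON Schema sketch for the plan response. Field names are hypothetical.
const exerciseSchema = {
  type: "object",
  additionalProperties: false,
  required: ["name", "sets", "reps", "startingWeight"],
  properties: {
    name: { type: "string" },
    sets: { type: "integer" },
    reps: { type: "integer" },
    // The server overwrites this with 0 no matter what the model returns.
    startingWeight: { type: "number" },
  },
};

const workoutDaySchema = {
  type: "object",
  additionalProperties: false,
  required: ["name", "focus", "exercises"],
  properties: {
    name: { type: "string" },
    focus: { type: "string" },
    exercises: { type: "array", items: exerciseSchema },
  },
};

const planSchema = {
  type: "object",
  additionalProperties: false,
  required: ["coachMessage", "days"],
  properties: {
    coachMessage: { type: "string" },
    days: { type: "array", items: workoutDaySchema },
  },
};
```

A schema like this fixes the structure only; numeric ranges and equipment checks still belong in server code, where they can be tested and changed without touching the prompt.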
The server applies additional validation:
- Training days must be between two and six
- Exercise counts are limited to sensible ranges
- Sets/reps fall within realistic bands
- Missing values receive conservative fallback defaults
- Starting weights are forced to zero
The forced zero is crucial. Without it, a model might invent a starting weight that could be unsafe. The app invites users to self-select an appropriate weight in the gym, then relies on actual logged loads to guide progression.
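A minimal sketch of that server-side pass follows. The two-to-six day limit and the forced zero come from the list above; the exercise cap, the set and rep bands, the default values, and all type and function names are assumptions for the example.

```typescript
// Shapes as they might arrive from the model (every field optional) and as stored.
interface RawExercise { name?: string; sets?: number; reps?: number; startingWeight?: number }
interface RawDay { name?: string; focus?: string; exercises?: RawExercise[] }
interface RawPlan { coachMessage?: string; days?: RawDay[] }

interface Exercise { name: string; sets: number; reps: number; startingWeight: 0 }
interface WorkoutDay { name: string; focus: string; exercises: Exercise[] }
interface Plan { coachMessage: string; days: WorkoutDay[] }

const MIN_DAYS = 2;      // from the article
const MAX_DAYS = 6;      // from the article
const MAX_EXERCISES = 8; // assumed cap

function clampOrDefault(value: number | undefined, min: number, max: number, fallback: number): number {
  if (typeof value !== "number" || !isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.round(value)));
}

// Returns null when the plan cannot be repaired, so the caller can fall back.
function sanitizePlan(raw: RawPlan): Plan | null {
  const rawDays = raw.days || [];
  if (rawDays.length < MIN_DAYS || rawDays.length > MAX_DAYS) return null;

  const days: WorkoutDay[] = rawDays.map((rawDay, i) => ({
    name: rawDay.name || `Day ${i + 1}`,
    focus: rawDay.focus || "General",
    exercises: (rawDay.exercises || [])
      .filter((e) => typeof e.name === "string" && e.name.trim() !== "")
      .slice(0, MAX_EXERCISES)
      .map((e): Exercise => ({
        name: (e.name as string).trim(),
        sets: clampOrDefault(e.sets, 1, 6, 3),   // assumed band and default
        reps: clampOrDefault(e.reps, 1, 30, 10), // assumed band and default
        startingWeight: 0, // always zero: the user picks the load in the gym
      })),
  }));

  if (days.some((d) => d.exercises.length === 0)) return null;
  return { coachMessage: raw.coachMessage || "", days };
}
```

Typing `startingWeight` as the literal `0` means the compiler rejects any code path that tries to store a model-supplied load.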
Structured output yields clear benefits:
- Deterministic ingestion: Server code can safely read well-typed objects and enforce business rules.
- Easier error handling: If the AI returns malformed JSON or violates constraints, the server can revert to conservative defaults or rule-based outcomes.
- Improved auditability: It's straightforward to log the AI response alongside the rules-applied decisions, enabling later investigation if outcomes diverge.
A concrete example: Onboard a user who selects "hypertrophy" and equipment "dumbbells only." The AI returns three workouts formatted as prescribed. The server checks exercises to ensure none require barbells or machines. If an exercise like "barbell deadlift" appears, the server replaces it with a safe dumbbell alternative. That substitution stops a dangerous mismatch before it reaches the user.
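The equipment check in that example can be sketched as a lookup against an exercise catalog. The catalog contents, the `Equipment` categories, and the function name `enforceEquipment` are invented for illustration; the article does not describe how the app stores exercise metadata.

```typescript
type Equipment = "bodyweight" | "dumbbells" | "barbell" | "machine";

interface CatalogEntry {
  requires: Equipment;
  alternatives: string[]; // ordered by preference
}

// Hypothetical catalog; a real one would be larger and reviewed by a coach.
const CATALOG: Record<string, CatalogEntry> = {
  "barbell deadlift": { requires: "barbell", alternatives: ["dumbbell romanian deadlift", "glute bridge"] },
  "dumbbell romanian deadlift": { requires: "dumbbells", alternatives: ["glute bridge"] },
  "glute bridge": { requires: "bodyweight", alternatives: [] },
  "leg press": { requires: "machine", alternatives: ["dumbbell goblet squat", "bodyweight squat"] },
  "dumbbell goblet squat": { requires: "dumbbells", alternatives: ["bodyweight squat"] },
  "bodyweight squat": { requires: "bodyweight", alternatives: [] },
};

function lookup(name: string): CatalogEntry | null {
  const key = name.trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(CATALOG, key) ? CATALOG[key] : null;
}

function canPerform(entry: CatalogEntry, available: Equipment[]): boolean {
  // Bodyweight movements need no equipment.
  return entry.requires === "bodyweight" || available.indexOf(entry.requires) !== -1;
}

// Returns the exercise to prescribe, or null when nothing safe is known.
function enforceEquipment(exerciseName: string, available: Equipment[]): string | null {
  const entry = lookup(exerciseName);
  if (entry === null) return null; // unknown exercise: drop it rather than guess
  if (canPerform(entry, available)) return exerciseName.trim().toLowerCase();
  for (const alt of entry.alternatives) {
    const altEntry = lookup(alt);
    if (altEntry !== null && canPerform(altEntry, available)) return alt;
  }
  return null;
}
```

Returning `null` for unknown exercises is a conservative choice for this sketch: an exercise the server cannot classify is treated as unavailable.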
Structured schemas do not guarantee correctness, but they create a deliberate boundary. The model handles natural language and personalization within that boundary; the application enforces safety and plausibility outside it.
Data model and security: training_states, workout_sessions, workout_sets
Be My Trainer organizes training data into three primary tables:
- training_states: stores the user's profile, active plan, current workout position, and the most recent coach message. This table represents the user's immediate "state" in the training workflow.
- workout_sessions: records each finished session, including energy, soreness, free-text notes, completion percentage, coach decision, and timestamp.
- workout_sets: contains set-level entries—exercise reference, repetitions, weight used, effort rating, and set order.
This separation supports efficient queries tailored to different UI needs. Generating weekly adherence charts or retrieving recent session summaries is faster when the session-level aggregates are readily available. Conversely, exercise-level progress and trends require detailed set data.
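To make the split concrete, here is one way the three row shapes could be typed, with a small helper that needs set-level detail. Column names are guesses based on the descriptions above, not the app's actual schema.

```typescript
// Hypothetical row shapes; column names are inferred from the prose, not the real schema.
interface TrainingStateRow {
  user_id: string;
  profile: { goal: string; experience: string; daysPerWeek: number; equipment: string[] };
  active_plan: unknown;          // stored plan document
  current_workout_index: number; // position within the plan
  last_coach_message: string;
}

interface WorkoutSessionRow {
  id: string;
  user_id: string;
  energy: number;
  soreness: number;
  notes: string;
  completion_pct: number;
  coach_decision: string;
  completed_at: string; // ISO timestamp
}

interface WorkoutSetRow {
  session_id: string;
  exercise: string;
  reps: number;
  weight_used: number;
  effort: number;
  set_order: number;
}

// Exercise-level trend data needs the set rows: heaviest logged weight per exercise.
function topWeightByExercise(sets: WorkoutSetRow[]): Record<string, number> {
  const best: Record<string, number> = {};
  for (const row of sets) {
    const seen = Object.prototype.hasOwnProperty.call(best, row.exercise);
    if (!seen || row.weight_used > best[row.exercise]) {
      best[row.exercise] = row.weight_used;
    }
  }
  return best;
}
```

A weekly adherence chart, by contrast, would read only `completion_pct` and `completed_at` from the session rows and never touch the set table.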
Supabase's Row Level Security enforces per-user access controls, ensuring users can only read and write their own training data. This guards against accidental leaks across accounts and simplifies compliance with privacy expectations.
Practical benefits of this model include:
- Clear separation of concerns: dashboards, charts, and rules read the appropriate granularity without complex joins.
- Easier backups and migrations: storing session summaries separate from set logs makes it practical to archive older detail while keeping recent history online.
- Security-by-default: RLS and authentication handle the common case of "private fitness data," helping prevent exposure of sensitive logs.
A cautionary note: the first version keeps the coach insight cache in the browser rather than the database. That reduces backend calls but complicates multi-device consistency. Future iterations will likely migrate caching and insight storage to server-side caches or the database to maintain consistent behavior across devices.
Coach insights and caching: treat evidence as first-class
The app offers coach insights after a minimum of three completed workouts. The process begins with deterministic signal extraction: average completion, mean energy and soreness, count of very hard sets, and history of conservative decisions. These metrics form the "evidence baseline."
Only after calculating this baseline does the app send a request to the language model for improved labels or a more nuanced explanation. The model can refine phrasing or suggest a next action, but it does not replace the evidence. The evidence remains owned by the rule layer.
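The deterministic extraction step might be sketched as follows. The three-workout minimum comes from the text; the five-session review window, the one-decimal rounding, the set of decisions counted as conservative, and all names are assumptions.

```typescript
interface SessionSummary {
  completionPct: number;
  energy: number;       // 1-5 scale assumed
  soreness: number;     // 1-5 scale assumed
  veryHardSets: number; // sets the user rated at the top of the effort scale
  decision: string;     // coach decision recorded for the session
}

interface EvidenceBaseline {
  sessionsReviewed: number;
  avgCompletionPct: number;
  avgEnergy: number;
  avgSoreness: number;
  veryHardSets: number;
  conservativeDecisions: number;
}

const MIN_SESSIONS = 3;  // from the article
const REVIEW_WINDOW = 5; // assumed window size

// Assumed: everything except plain progression or repetition counts as conservative.
const CONSERVATIVE = ["reduce_volume", "hold_or_make_easier", "stop_or_swap"];

function meanOneDecimal(values: number[]): number {
  const total = values.reduce((sum, v) => sum + v, 0);
  return Math.round((total / values.length) * 10) / 10;
}

// `history` is ordered oldest to newest. Returns null until enough data exists.
function buildEvidence(history: SessionSummary[]): EvidenceBaseline | null {
  if (history.length < MIN_SESSIONS) return null;
  const recent = history.slice(-REVIEW_WINDOW);
  return {
    sessionsReviewed: recent.length,
    avgCompletionPct: meanOneDecimal(recent.map((s) => s.completionPct)),
    avgEnergy: meanOneDecimal(recent.map((s) => s.energy)),
    avgSoreness: meanOneDecimal(recent.map((s) => s.soreness)),
    veryHardSets: recent.reduce((sum, s) => sum + s.veryHardSets, 0),
    conservativeDecisions: recent.filter((s) => CONSERVATIVE.indexOf(s.decision) !== -1).length,
  };
}
```

Only this object, not the raw history, would then be passed to the model, which keeps the numbers under the rule layer's control and the request small.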
Caching matters. Coach insights are cached in the browser keyed to the user and the IDs of recent workouts. A page refresh does not trigger another AI call. Only when a new completed workout changes the key will the insight be regenerated. This prevents unnecessary model requests and keeps the cost profile manageable.
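A sketch of that keying scheme is shown below. The key format and function names are assumptions, and `generate` is synchronous here only to keep the example short; a real model call would be asynchronous. The `KeyValueStore` interface mirrors the `getItem`/`setItem` subset of the browser's Web Storage API, so `window.localStorage` could be passed in directly.

```typescript
// Subset of the Web Storage API, so localStorage satisfies it in the browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key format: the user plus the IDs of the workouts in the review window.
function insightCacheKey(userId: string, recentWorkoutIds: string[]): string {
  return `coach-insight:${userId}:${recentWorkoutIds.join(",")}`;
}

function getOrCreateInsight(
  store: KeyValueStore,
  userId: string,
  recentWorkoutIds: string[],
  generate: () => string // stands in for the model call
): string {
  const key = insightCacheKey(userId, recentWorkoutIds);
  const cached = store.getItem(key);
  if (cached !== null) return cached; // page refresh: no new model call

  const fresh = generate();
  store.setItem(key, fresh);
  return fresh;
}
```

Because the workout IDs are part of the key, completing a new workout produces a different key and therefore a cache miss. Old entries are never evicted in this sketch, so a real implementation would also need cleanup.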
Analogy: Think of evidence as lab results and the model as a clinician explaining them. The data—blood pressure, soreness scores, completed reps—should remain primary. The clinician provides interpretation but cannot manufacture raw numbers.
This pattern enforces good behavior for AI usage:
- AI calls are evidence-triggered, not render-triggered
- Explanations enhance but do not supplant objective metrics
- Cached results respect rate limits and user experience
From a product perspective, this prevented a subtle UX bug: a technically correct but repetitive coaching line kept reappearing because the underlying rules did not change. Surfacing the actual signal and the review window (for instance, "average soreness 4.2/5 over the last five sessions") made the insight actionable and less generic.
UX lessons: language, onboarding, and transparency
Some of the highest-impact improvements did not involve model changes or infrastructure. They were product decisions that improved comprehension and trust.
Terminology was adjusted after user feedback. Early versions assumed familiarity with "sets," "reps," "load," and "effort," and newcomers found those terms opaque. The interface now explains set vs round, uses "weight used" instead of "load," clarifies what counts as a repetition, and maintains a quick logging path with three simple options: "too easy," "good," or "too hard."
Coach insights were revised to be more specific. A message that repeatedly said "recovery needs attention" felt generic even if it was accurate. Rewriting the card to show the measurable signal—what window was used and the actual value—transformed it into actionable feedback.
Transparency replaced "AI" branding as the primary interface cue. The main action reads "Generate plan" rather than "Generate AI plan." After a workout, the explanation clarifies whether the final decision originated from the deterministic rules or the model. Users respond better to explanations that show the evidence and the decision source, and that builds more trust than a buzzword does.
Real-world impact: In fitness, trust matters more than novelty. A trustworthy, predictable app nudges consistency. Novelty without reliability drives churn. Small UX choices—plain language, clear evidence, and honest descriptions of AI's role—produce better retention.
Practical safety boundaries and conservative defaults
The app establishes clear, conservative safety boundaries to limit risk:
- Adult-only onboarding prevents accidental recommendations to minors.
- No prescribed starting weights; users select loads based on their comfort.
- Conservative handling of poor recovery: sessions can be reduced or held rather than escalated.
- Pain or injury language triggers stop-or-swap behavior for exercises that could exacerbate issues.
- Prominent Terms, Privacy, and Safety pages explain limitations and advise seeking licensed help when necessary.
These choices reflect an understanding of the domain: exercise guidance interacts with physical health, and small errors can cause harm. Conservative defaults reduce the chance of an unsafe recommendation slipping through.
Example: A user reports persistent shoulder pain. The system flags the movement and either stops prescribing pressing motions that involve the shoulder or substitutes safer variations. The app also includes clear language advising a medical professional if pain persists.
This conservative posture is aligned with regulatory and ethical imperatives. It also preserves user trust. An app that aggressively ramps up load after poor sessions risks injury and reputation damage.
Limitations of the initial build and engineering priorities
The first release deliberately accepts several engineering trade-offs to accelerate learning:
- Coach-insight cache lives in the browser rather than a centralized store, complicating multi-device consistency.
- Account deletion is handled via a verified request flow instead of a fully automated self-service path.
- Automated tests around training rules are missing, increasing the risk of regressions.
- The main application component is monolithic and needs refactoring into smaller modules to improve maintainability.
- Database changes are applied via a shared schema file rather than versioned migrations.
These limitations are not technical oversights but intentional prioritizations. The immediate priorities are product validation: does the loop hold users, is the onboarding clear, and does adaptation feel reasonable? Once those questions receive answers, the engineering plan moves to hardening tasks: automated tests, data migrations, server-side caching, and modularization.
Roadmap items include:
- Moving coach-insight caching server-side to ensure consistent cross-device behavior.
- Implementing a full audit trail for trainer interventions and client management.
- Adding automated tests for the rule engine to ensure predictable outcomes as rules evolve.
- Adopting migration tooling to track database changes and enable safer deployments.
This incremental posture embodies a product-first philosophy: validate user value quickly, then invest in engineering rigor.
Extending to trainers: permissions, auditing, and workflows
Long-term plans center on enabling personal trainers to manage clients within the same foundation. That introduces new requirements:
- A trainer/client permission model with granular access controls.
- An auditable change log so clients and trainers can see who changed a plan and why.
- A trainer dashboard that surfaces client signals, recovery metrics, and suggested adjustments.
- Decision-support features allowing trainers to accept, reject, or modify AI recommendations.
These features change the product from a single-user assistant into a collaboration platform where AI acts as a support tool for human professionals rather than a replacement.
A plausible trainer workflow:
- Trainer creates a client profile and proposes an initial plan.
- Client completes workouts and logs sets; trainer receives alerts about flagged sessions (pain, repeated missed reps).
- The app suggests adjustments derived from the same rules-first pipeline, labeled as "suggested by the assistant."
- The trainer reviews suggestions, makes changes, and the system logs the action with a timestamp and rationale.
From a compliance and trust perspective, auditable changes are essential. Trainers must know that their clients' data is not being used to automate medical decisions, and clients must trust that any recommendation from the app reflects a human-reviewed process when applicable.
Building for trainers also magnifies scale considerations: multiple client accounts per trainer, permission revocation, and multi-device dashboards. That will motivate a move to versioned schema migrations, server-side caching, and robust background processing.
Why structured validation and rule fallbacks reduce legal and ethical risk
The combination of structured schemas, deterministic rule decisions, and conservative fallbacks mitigates several legal and ethical risks inherent in delivering health-adjacent recommendations.
Risk vectors that this approach addresses:
- Hallucination risk: Models may invent plausible but unsafe instructions. JSON schemas and server-side validation prevent malformed or out-of-scope outputs from reaching users.
- Overreach: A model might recommend clinical-level interventions. Rules restrict output to a finite, safe set of actions.
- Traceability: Deterministic rules create an audit trail and easier explanations for why a particular adjustment occurred.
- User autonomy: Forcing starting weights to zero requires users to self-assess and make safe choices. This preserves user responsibility and reduces liability tied to model misestimation.
These mitigations do not eliminate risk. They do, however, move decision points into predictable code paths, which simplifies testing, logging, and human review when needed.
From a governance standpoint, this design supports safer deployments and clearer terms-of-service disclosures. It also aligns with guidelines that emphasize "AI as assistant, human oversight required" for high-stakes domains.
Cost and performance considerations in model usage
A practical product must balance the value derived from model calls against both latency and cost. Be My Trainer adopts strategies to keep AI usage targeted and efficient:
- Use AI for plan generation, coach messaging, and occasional insights—not for every UI render.
- Cache insights client-side and regenerate only when new session evidence appears.
- Keep model requests structured and minimal in size by sending only the necessary evidence baseline and allowed decisions rather than the entire session history.
- Implement fallback rules that produce acceptable outcomes when API calls fail or responses are invalid.
This design limits the frequency of model invocations and keeps per-user cost predictable. It also prevents a single page refresh from consuming API quota.
Real-world parallel: A customer support system might use models to summarize a conversation once but not to generate responses for every keystroke. The same throttle principles apply to fitness guidance.
Real-world examples that highlight the design trade-offs
Example 1 — Conservative progression vs. aggressive gain
- Scenario: A novice user completes a set with perfect reps but reports low energy. A model free to decide might encourage pushing harder to exploit the favorable set outcome. The rule engine prioritizes recovery signals and suggests holding progression or reducing volume. The model crafts empathetic language explaining that consistency and recovery yield better long-term gains than immediate overload.
Example 2 — Exercise swapping to avoid injury
- Scenario: A user reports lateral knee pain after sets of lunges. Rules detect pain language and flag the movement. The AI produces a swap to split-stance deadlifts or glute bridges, explaining why those variations reduce knee shear while targeting similar muscles. The server validates exercise equipment compatibility and rejects any swap requiring unavailable equipment.
Example 3 — Handling inconsistent logging
- Scenario: A user frequently uses the quick logging path ("too easy"/"good"/"too hard"), producing sparse weight data. Rules emphasize consistency and recommend a few sessions focusing on establishing baselines rather than immediate progression. The model composes a coach message that asks the user to log at least one full session with weights to enable more accurate adjustments.
Each example shows why blending deterministic rules with constrained model explanations produces safer and more useful outcomes than either pure automation or pure manual rules.
Design principles that guided development
The engineering and product choices embody several core principles:
- Evidence-first: Decisions should rely on logged behavior and recovery rather than speculative guesses.
- Minimal surprise: Recommendations must be predictable and explainable. Users should understand why the app made a change.
- Conservative defaults: When in doubt, choose safer actions—hold volume, reduce load, or swap a movement.
- Schema-driven interfaces: Require structured outputs so the server can enforce constraints before saving or displaying information.
- Transparent provenance: Show whether an outcome came from a rule or an AI explanation, alongside the evidence that supported it.
- Incremental complexity: Start with a simple stack and add orchestration only when workflows demand it.
These principles prioritize user safety, product clarity, and rapid learning.
Metrics that matter for validating the loop
To determine whether the product meets its goals, the team focuses on a handful of behavioral and technical metrics:
- Onboarding completion rate: How many users finish the profile and receive an initial plan?
- First-week retention: Do users return to log at least one workout after generating a plan?
- Workout completion rate: Of workouts started, how many are completed and how many sets are logged?
- Logging fidelity: What percentage of sessions include set-level weight and reps vs. quick categorical logging?
- Adjustment acceptance: Do users adhere to the app's suggested next workouts or override them?
- Safety flag rate: Frequency of stop-or-swap actions and follow-up actions like medical referrals or trainer interventions.
- Latency and cost per user: Average time to generate a plan and cost of AI calls per active user.
Tracking these metrics helps answer the central product question: does the loop help people train more effectively and safely than a static plan or ad-hoc prompts?
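As one example, logging fidelity reduces to a simple ratio over session records. The `SessionLog` shape and the rule that a session counts as detailed when it has at least one set with both weight and reps are assumptions for this sketch.

```typescript
interface SessionLog {
  setsWithWeightAndReps: number; // set-level entries carrying both values
  usedQuickFeedback: boolean;    // "too easy" / "good" / "too hard" path
}

// Percentage of sessions with at least one detailed set entry; null when there is no data.
function loggingFidelityPct(sessions: SessionLog[]): number | null {
  if (sessions.length === 0) return null;
  const detailed = sessions.filter((s) => s.setsWithWeightAndReps > 0).length;
  return Math.round((detailed / sessions.length) * 100);
}
```

Returning `null` for an empty history keeps "no data yet" distinct from a genuine 0% on a dashboard.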
What this approach signals for other domains
The rules-first, schema-validated pattern has broader relevance beyond fitness. Any domain where personalization intersects with safety—telehealth triage, financial advice, educational remediation—benefits from this pattern. The key ideas translate directly:
- Determine a small set of safe, auditable actions for the system to select from.
- Use deterministic logic to create that set based on objective signals.
- Employ models to craft explanations, improve phrasing, or personalize messages.
- Validate structured model outputs before they can alter application state.
- Keep fallbacks and caching in place to maintain service continuity when model calls fail.
Adopting this pattern reduces catastrophic failure modes that arise when a model is granted unconstrained agency.
Engineering roadmap: from prototype to product
To move from an MVP to a robust product, several engineering tasks are prioritized:
- Adoption of a versioned database schema with tracked migrations to enable safe changes across environments.
- Unit and integration tests for the rule engine to protect behavior as rules evolve.
- Server-side caching for coach insights, ensuring consistent cross-device behavior.
- Modularization of the main application component into smaller, testable units for maintainability.
- Audit logging and a permission model to support trainer/client workflows.
- A background job system for heavier asynchronous tasks (e.g., periodic insights generation across many users).
- Improved account deletion workflows for compliance and user autonomy.
Each of these steps strengthens reliability and prepares the product for scale while preserving the core rules-first philosophy.
Final product assessment: strengths and where to watch
Be My Trainer’s central strength is its careful allocation of responsibilities: deterministic rules handle safety and evidence, while the model enriches language and personalization. This architecture reduces risk, increases predictability, and makes outcomes auditable.
Watch these areas as the product grows:
- Rule complexity: As new rules are added, ensure they remain testable and do not produce conflicting signals.
- Trainer tooling: Permissions, auditing, and workflow complexity will require careful UX and security planning.
- Cost management: As user count increases, refine which AI calls are essential and explore batching or lighter-weight models for routine tasks.
- Data consistency: Migrate caching and insights to server-side stores to avoid fragmentation across devices.
- Regulatory posture: If the product drifts toward clinical recommendations, prepare for stricter compliance obligations.
With prudent engineering and product discipline, the same foundation supports both individual users and trainer-centric features while maintaining safety and clarity.
FAQ
Q: How does Be My Trainer decide whether to increase or reduce my workload? A: The app calculates objective signals from your session logs—completion of target reps, average effort, energy and soreness ratings, and any pain language in notes. A deterministic rule set maps those signals to a small set of safe decisions (progress gradually, reduce volume, hold/make easier, stop or swap a movement, repeat the workout). The model receives only the allowed options and composes an explanation; if the model fails, the deterministic decision is used.
Q: Can the app recommend starting weights for beginners? A: No. The app forces starting weights to zero and asks users to select an appropriate weight during their first sessions. This prevents the system from guessing a load that may be unsafe. Adjustments and progressive overload derive from real logged history rather than speculative estimates.
Q: Does the app replace a human trainer or medical advice? A: No. Be My Trainer provides general fitness guidance and is not a substitute for a qualified trainer, coach, or clinician. The app includes explicit safety messaging and advises seeking professional help when pain or clinical concerns arise.
Q: What happens if the AI returns malformed data or a recommendation that's out of bounds? A: The server validates AI responses against strict JSON schemas and additional application rules. If the response is malformed or invalid, the deterministic rule fallback is applied and persisted. This ensures the user always receives a valid next workout.
Q: Why not use LangChain or an agent framework? A: The initial workflows are short and structured; a heavy orchestration layer would add complexity with little immediate benefit. The current stack—Next.js, Supabase, and the OpenAI Responses API—keeps the system straightforward and easier to iterate. The team may adopt orchestration tools later if the product requires retrieval, complex tools, or long-running workflows.
Q: How are user data and privacy protected? A: User data is stored in Supabase Postgres with Row Level Security limiting reads and writes to authenticated users' own data. The app exposes Terms, Privacy, and Safety pages and provides user controls as the product matures. Future iterations will improve account deletion workflows and auditability.
Q: Can I use the app as a trainer managing multiple clients? A: The initial release focuses on individual users. Trainer features—multi-client management, auditable plan changes, and permission models—are planned for later versions after validating the core single-user loop.
Q: How often does the app call the AI, and how are costs managed? A: AI calls occur for plan generation, specific coach messages, and periodic coach insights. Insights are cached in the browser and regenerated only when new session evidence appears. The system also has deterministic fallbacks to ensure the app functions even without successful AI responses, keeping cost and latency under control.
Q: What are the known limitations of the current build? A: Notable limitations include browser-only caching for insights, a verified-but-manual account deletion flow, missing automated tests for training rules, a monolithic main component that needs splitting, and database changes managed via a shared schema file rather than migrations. These are explicit priorities for the engineering roadmap.
Q: How can I provide feedback or try the app? A: Be My Trainer is live at bemytrainer.app. The team welcomes feedback from users who try onboarding and log a complete workout to help refine onboarding, rules, and UX.