feat(code): Suggest experiment opportunities in setup discovery#2109
feat(code): Suggest experiment opportunities in setup discovery#2109jurajmajerik wants to merge 2 commits intomainfrom
Conversation
Adds an experiment-category tier to setup discovery, gated by the posthog-code-experiment-suggestions flag. When enabled, the agent scans for A/B-testable surfaces alongside the existing bug and PostHog-instrumentation tiers, and the slot-reservation rule keeps at least one experiment card if a credible candidate exists. Accepted experiment cards build a task prompt that reuses the run-experiment skill template to scaffold flag, draft experiment, and code wiring end-to-end. Defaults off in production; on in dev via import.meta.env.DEV.
# Conflicts: # apps/code/src/renderer/features/setup/services/setupRunService.ts # apps/code/src/renderer/features/setup/types.ts # apps/code/src/renderer/features/setup/utils/buildDiscoveredTaskPrompt.ts # apps/code/src/renderer/features/setup/utils/categoryConfig.ts # apps/code/src/shared/types/analytics.ts
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
apps/code/src/renderer/features/setup/prompts.ts:39-45
**Duplicated category list (OnceAndOnlyOnce violation)**
`BASE_ALLOWED_CATEGORIES` in `prompts.ts` and `BASE_CATEGORY_ENUM` in `types.ts` enumerate exactly the same nine base categories. If a new category is added to the schema in the future, both constants must be updated in sync — a gap that will inevitably be missed. The prompt's allowed-categories string should be derived from the schema's enum array (e.g. export `BASE_CATEGORY_ENUM` from `types.ts` and call `.join(", ")` here) so there is a single source of truth.
### Issue 2 of 2
apps/code/src/renderer/features/setup/services/setupRunService.ts:357-359
**`import.meta.env.DEV` makes the flag bypass un-testable in dev**
`isFeatureFlagEnabled(EXPERIMENT_SUGGESTIONS_FLAG) || import.meta.env.DEV` means every developer running a local build always gets experiment suggestions, with no way to exercise the flag-off code path without a production build. If someone wants to verify the fallback behaviour (e.g. confirming that `experiment` cards surfaced during a flag-on session still render after the flag is turned off), they cannot do so from a normal dev environment. Consider whether this was intentional or whether a more explicit local override (e.g. a query-param or an env var) would give more control during testing.
Reviews (1): Last reviewed commit: "Merge branch 'main' into experiments/dis..." | Re-trigger Greptile |
| const BASE_ALLOWED_CATEGORIES = | ||
| "bug, security, dead_code, duplication, performance, stale_feature_flag, error_tracking, event_tracking, funnel"; | ||
|
|
||
| function buildDiscoveryRules(includeExperiments: boolean): string { | ||
| const allowed = includeExperiments | ||
| ? `${BASE_ALLOWED_CATEGORIES}, experiment` | ||
| : BASE_ALLOWED_CATEGORIES; |
There was a problem hiding this comment.
Duplicated category list (OnceAndOnlyOnce violation)
BASE_ALLOWED_CATEGORIES in prompts.ts and BASE_CATEGORY_ENUM in types.ts enumerate exactly the same nine base categories. If a new category is added to the schema in the future, both constants must be updated in sync — a gap that will inevitably be missed. The prompt's allowed-categories string should be derived from the schema's enum array (e.g. export BASE_CATEGORY_ENUM from types.ts and call .join(", ") here) so there is a single source of truth.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/code/src/renderer/features/setup/prompts.ts
Line: 39-45
Comment:
**Duplicated category list (OnceAndOnlyOnce violation)**
`BASE_ALLOWED_CATEGORIES` in `prompts.ts` and `BASE_CATEGORY_ENUM` in `types.ts` enumerate exactly the same nine base categories. If a new category is added to the schema in the future, both constants must be updated in sync — a gap that will inevitably be missed. The prompt's allowed-categories string should be derived from the schema's enum array (e.g. export `BASE_CATEGORY_ENUM` from `types.ts` and call `.join(", ")` here) so there is a single source of truth.
How can I resolve this? If you propose a fix, please make it concise.| const includeExperiments = | ||
| isFeatureFlagEnabled(EXPERIMENT_SUGGESTIONS_FLAG) || | ||
| import.meta.env.DEV; |
There was a problem hiding this comment.
import.meta.env.DEV makes the flag bypass un-testable in dev
isFeatureFlagEnabled(EXPERIMENT_SUGGESTIONS_FLAG) || import.meta.env.DEV means every developer running a local build always gets experiment suggestions, with no way to exercise the flag-off code path without a production build. If someone wants to verify the fallback behaviour (e.g. confirming that experiment cards surfaced during a flag-on session still render after the flag is turned off), they cannot do so from a normal dev environment. Consider whether this was intentional or whether a more explicit local override (e.g. a query-param or an env var) would give more control during testing.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/code/src/renderer/features/setup/services/setupRunService.ts
Line: 357-359
Comment:
**`import.meta.env.DEV` makes the flag bypass un-testable in dev**
`isFeatureFlagEnabled(EXPERIMENT_SUGGESTIONS_FLAG) || import.meta.env.DEV` means every developer running a local build always gets experiment suggestions, with no way to exercise the flag-off code path without a production build. If someone wants to verify the fallback behaviour (e.g. confirming that `experiment` cards surfaced during a flag-on session still render after the flag is turned off), they cannot do so from a normal dev environment. Consider whether this was intentional or whether a more explicit local override (e.g. a query-param or an env var) would give more control during testing.
How can I resolve this? If you propose a fix, please make it concise.
adboio
left a comment
There was a problem hiding this comment.
this is sick, thank you!! one minor greptile comment that seems legit but otherwise lgtm
| const BASE_ALLOWED_CATEGORIES = | ||
| "bug, security, dead_code, duplication, performance, stale_feature_flag, error_tracking, event_tracking, funnel"; | ||
|
|
||
| function buildDiscoveryRules(includeExperiments: boolean): string { | ||
| const allowed = includeExperiments | ||
| ? `${BASE_ALLOWED_CATEGORIES}, experiment` | ||
| : BASE_ALLOWED_CATEGORIES; |
Problem
Setup currently only suggests bug fixes and cleanups. We want it to also suggest experiments to run.
Changes
The setup discovery scan now also looks for places in the code where an A/B test would make sense (CTAs, pricing, signup flows, empty states) and shows them as suggestion cards alongside the bugs.
Behind the
posthog-code-experiment-suggestionsflag. Off in prod, on in dev.If the user clicks an experiment card, the agent creates a feature flag, a draft experiment with a primary metric, and wires the variants into the code. It reuses the existing
run-experimentskill prompt for that part.The agent only includes an experiment card if it finds a real candidate. When it does, it drops a lower-priority bug/cleanup card to make room.
How did you test this?
Built a small Vite + React + posthog-js app with a landing CTA, pricing, signup, empty state, and one XSS bug. Ran discovery on it. Got an experiment suggestion plus the XSS. Clicked the experiment card and the agent created the flag, the draft experiment (
signup_submittedas primary metric,pricing_plan_selectedas guardrail), and updated the code.Follow-up: add analytics on what experiments are being suggested before we turn the flag on for more people.
Publish to changelog?
no