Discuss your proposed contribution before starting. Skipping this step risks having your work discarded entirely after you've put considerable time and effort into it. You can DM Miguel on Slack for a 1:1 call.
Wait for review. We'll do our best to get to your contribution as soon as possible. If it's been 2-3 days and you haven't received any comments, DM Miguel on Slack.
Merge into the evals branch. To prevent spam and misuse, we don't let external contributors run our CI via GitHub Actions. If your contribution passes an initial screen, we'll run our evals on it.
By default, all PRs run the following tests that you can also run from the repo source:
Lint (npm run lint) - Runs Prettier and ESLint. If this fails, you can most likely fix simple linting errors by running npm run format.
Build (npm run build) - Lints and builds TS → JS in dist/ via tsup.
End-to-End (npm run e2e) - Deterministic end-to-end Playwright tests that verify basic functionality of stagehand.page and stagehand.context, as well as compatibility with the Browserbase API.
Combination (npm run evals category combination) - AI-based end-to-end evals that use combinations of act, extract, and observe.
If you're changing act, extract, or observe itself, we might also run the dedicated act/extract/observe evals to ensure existing functionality doesn't regress significantly.
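For context, the e2e suite checks that the objects Stagehand exposes still behave like plain Playwright objects. The sketch below is illustrative only, not one of the actual tests; the import path and constructor options are assumptions, so match them to whatever the repo's existing tests use.

```typescript
// Illustrative sketch of the kind of Playwright-compatible behavior the e2e
// suite verifies. Import path and options are assumptions, not the real tests.
import { Stagehand } from "@browserbasehq/stagehand";

async function smokeTest() {
  const stagehand = new Stagehand({ env: "LOCAL" });
  await stagehand.init();

  const page = stagehand.page; // Playwright-compatible Page
  await page.goto("https://example.com");
  const title = await page.title(); // plain Playwright APIs still work

  const context = stagehand.context; // Playwright-compatible BrowserContext
  console.log(`open pages: ${context.pages().length}, title: ${title}`);

  await stagehand.close();
}

smokeTest().catch((err) => {
  console.error(err);
  process.exit(1);
});
```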
Cleanup and merge to main. Once your contribution is in the evals branch, you unfortunately can't make any further changes to it. The internal Stagehand team is responsible for cleaning up the code and bringing it into main.
Use draft PRs. If your PR is a work in progress, please convert it to a draft (see below) while you're working on it, then mark it ready for review and add reviewers when you're done. This helps us prevent clutter in the review queue.
Provide a reproducible test plan. Include an eval (preferred) or an example script; a rough sketch of what an eval can look like follows this list. We can't merge your PR if we can't run something that specifically exercises your contribution.
Add your script to evals.config.json with the default category combination (or act/extract/observe if you're only testing act, extract, or observe).
Add a changeset. Run npx changeset (TypeScript) or uvx changeset (Python) to add a changeset that will be reflected directly in the CHANGELOG for the upcoming release. When prompted, choose the bump type:
patch - no net new functionality for an end user
minor - some net new functionality for an end user (a new function parameter, a newly exposed type, etc.)
major - you shouldn't be committing a major change
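To make the test-plan and evals.config.json steps above concrete, here is a rough sketch of an eval-style script that exercises act, extract, and observe. The function shape, return value, and import path are assumptions on our part; the repo's eval harness defines the exact contract, so copy an existing eval task and mirror its structure rather than using this sketch verbatim.

```typescript
// Hedged sketch of an eval-style script combining act, extract, and observe.
// The wrapper signature and return shape are assumptions; mirror an existing
// eval in the repo for the real contract.
import { z } from "zod";
import { Stagehand } from "@browserbasehq/stagehand";

export async function myFeatureEval() {
  const stagehand = new Stagehand({ env: "LOCAL" });
  await stagehand.init();
  const page = stagehand.page;

  await page.goto("https://example.com");

  // act: perform a natural-language action
  await page.act("click the 'More information' link");

  // extract: pull structured data using a zod schema
  const { heading } = await page.extract({
    instruction: "extract the main heading on the page",
    schema: z.object({ heading: z.string() }),
  });

  // observe: list candidate elements/actions for an instruction
  const observations = await page.observe("find all links on the page");

  await stagehand.close();

  // Return a simple pass/fail signal for the harness to score.
  return { _success: heading.length > 0 && observations.length > 0 };
}
```

Once the script exists, register it in evals.config.json under the combination category (or act/extract/observe) by copying an existing entry; the field names are defined by that config file, not by this guide.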