fix(execution): cap isolate memory at 128MB and recycle workers every 200 executions (#4543)
Greptile Summary
This PR makes two targeted production fixes to address a memory alarm triggered by native heap accumulation in the isolated-vm worker pool: it reverts the isolate memory cap from 256MB to 128MB and lowers the worker-recycling threshold from 500 to 200 executions.
Confidence Score: 5/5 — safe to merge. All three files are consistent with each other, and the changes are backed by 48h of production telemetry. The changes are narrow and mutually consistent: the env schema default, the in-code fallback, the isolate memory cap, and the error messages all tell the same story. The 256 → 128 MB revert is validated by zero memory-limit errors in production. The 500 → 200 recycling threshold directly addresses the observed RSS growth (nativeContexts climbing to 475, RSS peaking at 15.6 GB). Both values remain env-overridable, so further tuning requires no code change. No files require special attention.
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[New execution request] --> B{Worker available?}
    B -- yes --> C[Assign to worker]
    B -- no --> D[Queue / spawn new worker]
    C --> E[Create ivm.Isolate\nmemoryLimit: 128 MB]
    E --> F[Execute code / task]
    F --> G{Memory exceeded?}
    G -- yes --> H[Return MemoryLimitError\n'128 MB exceeded']
    G -- no --> I[Return result]
    I --> J[Increment worker.execCount]
    J --> K{execCount >= 200?}
    K -- yes --> L[Mark worker retiring\nprocess.kill on idle]
    K -- no --> M[Worker stays in pool]
    L --> N[Spawn replacement worker]
```
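The recycle decision in the flowchart (increment `execCount`, compare against the threshold, mark `retiring`) can be sketched as below. This is a minimal illustration, not the code in `apps/sim/lib/execution/isolated-vm.ts`: `WorkerSlot` and `afterExecution` are hypothetical names, though the `IVM_MAX_EXECUTIONS_PER_WORKER` override and the 200 default come from this PR.

```typescript
// Default mirrors the new threshold; overridable via env, as in the PR.
const MAX_EXECUTIONS = Number(process.env.IVM_MAX_EXECUTIONS_PER_WORKER ?? 200)

// Illustrative shape of a pool entry (hypothetical; not the real type).
interface WorkerSlot {
  execCount: number
  retiring: boolean // once set, the worker drains and is killed when idle
}

// Called after each completed execution (steps J/K in the flowchart).
function afterExecution(worker: WorkerSlot): void {
  worker.execCount += 1
  if (worker.execCount >= MAX_EXECUTIONS) {
    // Mark for retirement; killing the process is what actually
    // reclaims the isolate's native memory.
    worker.retiring = true
  }
}

// Example: a worker crosses the ceiling on its 200th execution.
const w: WorkerSlot = { execCount: 199, retiring: false }
afterExecution(w)
console.log(w.retiring) // true
```

Because the threshold is read from the environment at startup, dropping it further (e.g. to 100) needs only a config change, not a deploy of new code.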
✅ Bugbot reviewed your changes and found no new issues! (commit eb6068b)
Summary

- `memoryLimit` reduced from 256MB → 128MB in `isolated-vm-worker.cjs` (both `executeCode` and `executeTask`). 256MB was added in #4505 (improvement(sandbox): upgrade pptx/docx/pdf bootstrap with image helpers, MIME guards, and 256 MB isolate limit) for doc generation; production data (48h) shows zero `Reached memory limit` errors, so the headroom was unused. Error messages updated accordingly.
- `MAX_EXECUTIONS_PER_WORKER` default lowered from 500 → 200 in both `apps/sim/lib/core/config/env.ts` (env schema default) and `apps/sim/lib/execution/isolated-vm.ts` (fallback). Workers recycle 2.5× more aggressively. All 89 worker retirements in the last 48h hit the 500 ceiling, and `MemoryTelemetry` shows the native context count climbing to 475 with RSS peaks of 15.6GB — only `process.kill()` reclaims that native memory, so faster recycling caps steady-state RSS.
- The threshold remains env-overridable (`IVM_MAX_EXECUTIONS_PER_WORKER`).

Why
Production app tasks triggered the `sim-production-us-east-1-app-task-high-memory` alarm, with memory climbing from 1GB → 13.7GB in 30 minutes after the v0.6.72 deploy. Math (13GB ÷ 256MB ≈ 50 isolates) plus telemetry (externalMB peaking at 11.7GB, nativeContexts at 475) point at native memory accumulating in the parent across executions faster than workers recycle.

Note on chosen value (200)
Initial fix used 100 (5× reduction). Raised to 200 after weighing tradeoffs:

- The env override (`IVM_MAX_EXECUTIONS_PER_WORKER`) lets us drop to 100 without a code change if post-deploy telemetry shows RSS still climbing.

Type of Change
Testing

Tested manually. Spawn-rate impact at 200 is negligible: peak 74/hr → ~185/hr across the 4-worker pool, with retirements spread out by the `retiring` flag (no in-flight interruption).

Checklist
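As a back-of-envelope check on the two figures quoted above (the isolate count implied by the RSS math in Why, and the spawn-rate scaling in Testing):

```typescript
// ~13 GB of growth divided by the 256 MB isolate cap suggests roughly
// 50 isolates' worth of un-reclaimed native memory (Why section).
const impliedIsolates = (13 * 1024) / 256 // 52, i.e. ≈ 50

// Lowering the recycling ceiling from 500 to 200 multiplies the worker
// spawn rate by 2.5×: the observed 74/hr peak becomes 185/hr (Testing).
const projectedSpawnRate = 74 * (500 / 200) // 185

console.log(impliedIsolates, projectedSpawnRate)
```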