feat: extract sandbox runtimes from core #8093
zouyonghe wants to merge 30 commits into AstrBotDevs:master from
Conversation
Sorry @zouyonghe, your pull request is larger than the review limit of 150000 diff characters
Code Review
This pull request transitions the sandbox management system to a provider-based architecture, removing hardcoded implementations for Shipyard and CUA from the core. It introduces a SandboxManager, SandboxRegistry, and a SandboxProvider protocol to handle sandbox lifecycles, leases, and persistence generically. The dashboard is updated with new sandbox routes, and tool registration is refactored to be dynamically driven by providers. Review feedback highlights the need to restore provider-specific system prompts lost during extraction, improve resilience when sandboxes are busy, ensure compatibility with different booter implementations of the available check, and avoid blocking the event loop with synchronous file I/O in the registry.
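The lease semantics the summary refers to (one session controls a sandbox at a time; a second session's acquire fails until release) can be illustrated with a minimal in-memory sketch. The class and method names here are illustrative only, not the PR's actual API:

```python
# Hypothetical sketch of sandbox lease semantics: a sandbox is held by
# at most one session; re-acquiring by the holder is idempotent.
class SandboxLeases:
    def __init__(self) -> None:
        self._holder: dict[str, str] = {}  # sandbox_id -> session_id

    def acquire_lease(self, sandbox_id: str, session_id: str) -> bool:
        holder = self._holder.get(sandbox_id)
        if holder is None or holder == session_id:
            self._holder[sandbox_id] = session_id
            return True
        return False  # held by another session: "busy"

    def release_lease(self, sandbox_id: str, session_id: str) -> None:
        # Only the current holder may release.
        if self._holder.get(sandbox_id) == session_id:
            del self._holder[sandbox_id]
```

This is the invariant behind the "Sandbox ... is busy" error discussed in the review below: a failed acquire means some other session currently controls the sandbox.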
```python
provider_info = get_sandbox_provider_info(booter)
if provider_info:
    for tool_name in provider_info.get("tool_names", []):
        tool = tool_mgr.get_func(tool_name)
        if tool and getattr(tool, "active", True):
            req.func_tool.add_tool(tool)
```
The generic sandbox tool mounting logic should also include provider-specific system prompt instructions if available. The extraction of runtimes removed important guidance (e.g., file path rules for Shipyard Neo or GUI instructions for CUA) that the LLM needs to use the tools effectively. The SandboxProvider protocol should be updated to provide these instructions. Please ensure this new functionality is accompanied by corresponding unit tests and that the logic for extracting provider info is shared to avoid duplication.
Suggested change:

```diff
 provider_info = get_sandbox_provider_info(booter)
 if provider_info:
+    if prompt := provider_info.get("system_prompt"):
+        req.system_prompt = f"{req.system_prompt or ''}\n{prompt}\n"
     for tool_name in provider_info.get("tool_names", []):
         tool = tool_mgr.get_func(tool_name)
         if tool and getattr(tool, "active", True):
             req.func_tool.add_tool(tool)
```
References
- When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.
- New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
```python
def _sandbox_provider_info(provider_id: str, provider: SandboxProvider) -> dict:
    return {
        "provider_id": provider_id,
        "capabilities": sorted(getattr(provider, "capabilities", set())),
        "tool_names": sorted(getattr(provider, "tool_names", set())),
    }
```
Include the system_prompt from the provider in the info dictionary so it can be utilized by the agent during tool registration. Refactor the logic into a shared helper function to avoid code duplication. Also, ensure this new functionality is accompanied by unit tests.
Suggested change:

```diff
 def _sandbox_provider_info(provider_id: str, provider: SandboxProvider) -> dict:
     return {
         "provider_id": provider_id,
         "capabilities": sorted(getattr(provider, "capabilities", set())),
         "tool_names": sorted(getattr(provider, "tool_names", set())),
+        "system_prompt": getattr(provider, "system_prompt", ""),
     }
```
References
- When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.
- New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
```python
available = getattr(booter, "available", None)
if available is None:
    return True
return await available()
```
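The check above assumes `available()` always returns an awaitable, which one of the later commits notes can raise a `TypeError` when a provider exposes `available` as a sync method or a plain property. A more tolerant version (a hedged sketch; the helper name and exact handling are assumptions, not the PR's final code) could accept all three shapes:

```python
import asyncio
import inspect

# Sketch of an `available` check tolerant of booters that expose
# `available` as an async method, a sync method, a bool attribute,
# or not at all.
async def booter_available(booter) -> bool:
    available = getattr(booter, "available", None)
    if available is None:
        return True  # no health check exposed: assume healthy
    if callable(available):
        result = available()
        if inspect.isawaitable(result):
            result = await result  # async method
        return bool(result)  # sync method
    return bool(available)  # plain attribute / property value
```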
```python
if not self.acquire_lease(current_sandbox_id, session_id):
    raise RuntimeError(f"Sandbox {current_sandbox_id} is busy")
booter = self.session_booter[current_sandbox_id]
if await self.booter_available(booter):
    self.registry.touch_sandbox(current_sandbox_id)
    self.save_registry()
    self.schedule_idle_cleanup(current_sandbox_id, idle_timeout)
    return booter
self.session_booter.pop(current_sandbox_id, None)
```
If the current sandbox is busy (e.g., taken over by another session), raising a RuntimeError here will cause the agent to fail. Instead of crashing, it would be more resilient to clear the current sandbox assignment for this session and fall through to the logic that finds the default sandbox or creates a new one.
Suggested change:

```diff
-if not self.acquire_lease(current_sandbox_id, session_id):
-    raise RuntimeError(f"Sandbox {current_sandbox_id} is busy")
-booter = self.session_booter[current_sandbox_id]
-if await self.booter_available(booter):
-    self.registry.touch_sandbox(current_sandbox_id)
-    self.save_registry()
-    self.schedule_idle_cleanup(current_sandbox_id, idle_timeout)
-    return booter
-self.session_booter.pop(current_sandbox_id, None)
+if self.acquire_lease(current_sandbox_id, session_id):
+    booter = self.session_booter[current_sandbox_id]
+    if await self.booter_available(booter):
+        self.registry.touch_sandbox(current_sandbox_id)
+        self.save_registry()
+        self.schedule_idle_cleanup(current_sandbox_id, idle_timeout)
+        return booter
+    self.session_booter.pop(current_sandbox_id, None)
+else:
+    self.registry.set_current_sandbox_id(session_id, None)
+    self.save_registry()
```
```python
class SandboxProvider(Protocol):
    provider_id: str
    capabilities: set[str]
    tool_names: set[str]
```
The SandboxProvider protocol should include a system_prompt field to allow runtimes to provide necessary instructions to the LLM (e.g., path conventions or specific tool usage workflows). New functionality should be accompanied by corresponding unit tests.
Suggested change:

```diff
 class SandboxProvider(Protocol):
     provider_id: str
     capabilities: set[str]
     tool_names: set[str]
+    system_prompt: str = ""
```
References
- New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
```python
def save(self) -> None:
    self.storage_path.parent.mkdir(parents=True, exist_ok=True)
    self.storage_path.write_text(
```
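The review flags two problems with this `save()` path: `write_text` blocks the event loop when called from async code, and a crash mid-write can leave a corrupted registry file. A common remedy, sketched below with illustrative names (the PR's actual implementation may differ), is an atomic write via a temp file plus `os.replace`, offloaded with `asyncio.to_thread` from async callers:

```python
import json
import os
import tempfile
from pathlib import Path

# Sketch of an atomic registry save: write to a temp file in the same
# directory, fsync, then os.replace() so readers never observe a
# partially written file if the process crashes mid-write.
def save_registry_atomic(storage_path: Path, data: dict) -> None:
    storage_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=storage_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, storage_path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)  # drop the partial temp file on failure
        raise
```

From a coroutine this would typically be invoked as `await asyncio.to_thread(save_registry_atomic, path, data)` so the file I/O never runs on the event loop thread.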
Use sandbox_manager as the source of active sandbox booters so provider sandboxes receive skill updates again, and write the sandbox registry atomically to avoid corrupting persisted state on crashes.
- Use sandbox_manager.booter_available() in sync_skills_to_active_sandboxes to avoid TypeError when provider available is a sync method or property.
- Release lease and save registry when current booter becomes unavailable in get_or_create_booter, preventing lease leaks.
- Reschedule idle cleanup on destroy_booter failure instead of leaking the sandbox without any future cleanup task.
- Wrap CopyFileBetweenSandboxesTool temp file cleanup in try/finally to prevent leaks on upload/download errors.
- Update session_current in takeover_sandbox for consistent get_current_sandbox semantics.
- Force-cleanup session_booter, idle_state and registry in destroy_sandbox even if destroy_booter raises an exception.
- Add boot_lock around create_sandbox_uncontrolled to prevent concurrent booter creation for the same sandbox_id.
- Remove unused session_id parameter from _apply_sandbox_tools.
- Remove unreachable break in get_or_create_booter.
- Update tests to match adjusted _apply_sandbox_tools signature.
…ill sync
- Upgrade SandboxProvider Protocol with optional plugin_config, provider_api_version, system_prompt, auto_sync_skills and lifecycle hooks (on_sandbox_created / on_sandbox_destroyed).
- register_sandbox_provider now accepts a tools=list parameter. Core automatically appends tools to llm_tools and removes them on unregister.
- unregister_sandbox_provider with force=True now synchronously cleans up registry records and spawns best-effort async destroy_booter tasks.
- cleanup_managed_sandboxes no longer skips sandboxes whose provider has been unregistered; it always frees booters and registry state.
- get_or_create_booter automatically syncs skills after boot unless provider sets auto_sync_skills=False.
- Invoke optional lifecycle hooks after create and destroy.
- Remove manual skill sync calls from all provider create_booter methods.
- Update tests for new protocol shape and auto-sync behaviour.
…e, best-effort skill sync
- _cleanup_provider_sandboxes_sync now calls registry.save() after deleting records so force-unregister cleanup is persisted to disk.
- Extract _finalize_created_booter helper and call it from both get_or_create_booter and create_sandbox_uncontrolled so explicit sandbox creation (e.g. astrbot_create_sandbox) also runs skill sync and on_sandbox_created hooks.
- Wrap auto skill sync in try/except so a sync failure logs a warning instead of leaving a live but orphaned sandbox registered in state.
- Extract _invoke_sandbox_created_hook helper.
- Move hook invocation out of _finalize_created_booter so it fires only after the sandbox is leased, matching the Protocol docstring guarantee.
- get_or_create_booter calls hook after _finalize_created_booter (lease already held during creation).
- create_sandbox calls hook after acquire_lease, eliminating the race where hook/skill sync could delay leasing and let idle cleanup expire the sandbox.
…aths
- Make takeover_sandbox async so it can await the hook.
- Trigger hook in switch_current_sandbox_checked after lease cleanup.
- Trigger hook in get_observer_booter_by_id for dashboard-created sandboxes.
- Only mark created_hook_fired on success; transient failures are retried.
- Reset idle cleanup timer in create_sandbox after lease acquisition.
- Update callers (dashboard route, TakeoverSandboxTool, tests) to await.

Closes: on_sandbox_created skipped for dashboard/uncontrolled sandboxes
…nitoring
- Remove on_sandbox_created from get_observer_booter_by_id; observer access does not acquire a lease and must not trigger creation-side initialization. The hook still fires on first lease via switch/takeover/create_sandbox.
- Protect created_hook_fired check-and-set with _sandbox_boot_lock to prevent duplicate triggers under concurrent lease operations.
- Allow Dashboard screenshot/shell to bypass lease check (session_id=None) so admins can monitor any sandbox regardless of current controller.
… semantics
- get_observer_booter_by_id now accepts require_lease=True (default). When False, it allows read-only observer access to sandboxes controlled by other sessions without raising, but skips touch_sandbox and idle_cleanup reset so observer polling does not extend another session's sandbox lifetime.
- Dashboard shell restores session_id lease check (require_lease=True default); only screenshot bypasses lease with require_lease=False for monitoring.
- Update test_sandbox_dashboard_screenshot_respects_session_ownership to test_sandbox_dashboard_screenshot_bypasses_lease_for_monitoring, verifying that read-only observer access succeeds without a lease.
Dashboard is an administrative interface; both shell and screenshot should be usable on any sandbox regardless of current lease holder.
- run_shell: pass require_lease=False to get_observer_booter_by_id.
- Update test to verify admin shell access bypasses session ownership.
Add canonical sandbox lifecycle states to replace vague 'not running' errors:
- creating: boot in progress
- running: active and available
- error: creation failed or booter health check failed
- stopping: destroy in progress
- stopped: destroyed but persistent record kept
- unknown: reconciled on startup

Changes:
- SandboxStatus enum gains CREATING, ERROR, STOPPING states.
- create_sandbox_uncontrolled sets CREATING before boot, ERROR on failure.
- _finalize_created_booter sets RUNNING after successful boot.
- destroy_sandbox sets STOPPING before teardown, STOPPED for persistent.
- get_observer_booter_by_id returns state-aware error messages: 'still being created', 'being destroyed', 'has been destroyed', 'encountered an error', 'unavailable (health check failed)'.
- booter_available failure now updates status to ERROR instead of unknown.
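The lifecycle states listed above map naturally onto a string-valued enum plus a message table for the observer path. A minimal sketch, reconstructed from the state names in this commit (the PR's actual enum may carry more members or different message wiring):

```python
from enum import Enum

# Canonical sandbox lifecycle states from the commit message.
class SandboxStatus(str, Enum):
    CREATING = "creating"   # boot in progress
    RUNNING = "running"     # active and available
    ERROR = "error"         # creation or health check failed
    STOPPING = "stopping"   # destroy in progress
    STOPPED = "stopped"     # destroyed, persistent record kept
    UNKNOWN = "unknown"     # reconciled on startup

# State-aware observer error messages, mirroring the wording above
# (table name is illustrative).
OBSERVER_ERRORS = {
    SandboxStatus.CREATING: "still being created",
    SandboxStatus.STOPPING: "being destroyed",
    SandboxStatus.STOPPED: "has been destroyed",
    SandboxStatus.ERROR: "encountered an error",
}
```

Deriving from `str` keeps the enum JSON-serializable, which matters since statuses are persisted in the registry file.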
- Add status column with color-coded chips: creating (amber), running (success), error (error), stopping (warning), stopped (grey).
- Add i18n labels for all new lifecycle states in en-US and zh-CN.
- Disable action buttons (console, screenshot, config, set-default) when sandbox status is not 'running' to prevent errors on creating/stopping sandboxes.
- Update table headers to show status instead of lease chip.
Return dashboard sandbox create requests immediately in creating state, boot them in background, and let the UI transition from creating to idle or occupied without manual refresh. Also harden pending boot cleanup and keep sandbox details synchronized with refreshed records.
Clarify sandbox guidance so the model checks the current sandbox first, lists reusable sandboxes next, and treats creating a new sandbox as a last resort reserved for explicit fresh or separate environment requests.
Update sandbox prompts, tool descriptions, and dashboard messages to use occupancy wording instead of lease or 租约, including the release completion message and related labels.
Summary
Verification
- `uv run ruff format ...` reported `20 files left unchanged`
- `uv run ruff check ...` reported `All checks passed!`
- `uv run pytest tests/unit/test_cua_extracted_from_core.py tests/unit/test_sandbox_computer_client.py tests/unit/test_sandbox_manager.py tests/unit/test_sandbox_provider.py tests/unit/test_sandbox_registry.py tests/unit/test_sandbox_models.py tests/unit/test_computer.py tests/unit/test_astr_main_agent.py tests/test_computer_config.py tests/test_dashboard.py -q` reported `174 passed, 1 warning`