feat: extract sandbox runtimes from core #8093
zouyonghe wants to merge 30 commits into AstrBotDevs:master from
Conversation
Sorry @zouyonghe, your pull request is larger than the review limit of 150000 diff characters
Code Review
This pull request transitions the sandbox management system to a provider-based architecture, removing hardcoded implementations for Shipyard and CUA from the core. It introduces a SandboxManager, SandboxRegistry, and a SandboxProvider protocol to handle sandbox lifecycles, leases, and persistence generically. The dashboard is updated with new sandbox routes, and tool registration is refactored to be dynamically driven by providers. Review feedback highlights the need to restore provider-specific system prompts lost during extraction, improve resilience when sandboxes are busy, ensure compatibility with different booter implementations of the available check, and avoid blocking the event loop with synchronous file I/O in the registry.
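The lease semantics the summary refers to (one session controls a sandbox at a time; a second session's acquire fails until release) can be illustrated with a minimal in-memory sketch. The class and method names here are illustrative only, not the PR's actual API:

```python
# Hypothetical sketch of sandbox lease semantics: a sandbox is held by
# at most one session; re-acquiring by the holder is idempotent.
class SandboxLeases:
    def __init__(self) -> None:
        self._holder: dict[str, str] = {}  # sandbox_id -> session_id

    def acquire_lease(self, sandbox_id: str, session_id: str) -> bool:
        holder = self._holder.get(sandbox_id)
        if holder is None or holder == session_id:
            self._holder[sandbox_id] = session_id
            return True
        return False  # held by another session: "busy"

    def release_lease(self, sandbox_id: str, session_id: str) -> None:
        # Only the current holder may release.
        if self._holder.get(sandbox_id) == session_id:
            del self._holder[sandbox_id]
```

This is the invariant behind the "Sandbox ... is busy" error discussed in the review below: a failed acquire means some other session currently controls the sandbox.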
```python
provider_info = get_sandbox_provider_info(booter)
if provider_info:
    for tool_name in provider_info.get("tool_names", []):
        tool = tool_mgr.get_func(tool_name)
        if tool and getattr(tool, "active", True):
            req.func_tool.add_tool(tool)
```
The generic sandbox tool mounting logic should also include provider-specific system prompt instructions if available. The extraction of runtimes removed important guidance (e.g., file path rules for Shipyard Neo or GUI instructions for CUA) that the LLM needs to use the tools effectively. The SandboxProvider protocol should be updated to provide these instructions. Please ensure this new functionality is accompanied by corresponding unit tests and that the logic for extracting provider info is shared to avoid duplication.
Suggested change:

```diff
 provider_info = get_sandbox_provider_info(booter)
 if provider_info:
+    if prompt := provider_info.get("system_prompt"):
+        req.system_prompt = f"{req.system_prompt or ''}\n{prompt}\n"
     for tool_name in provider_info.get("tool_names", []):
         tool = tool_mgr.get_func(tool_name)
         if tool and getattr(tool, "active", True):
             req.func_tool.add_tool(tool)
```
References
- When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.
- New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
```python
def _sandbox_provider_info(provider_id: str, provider: SandboxProvider) -> dict:
    return {
        "provider_id": provider_id,
        "capabilities": sorted(getattr(provider, "capabilities", set())),
        "tool_names": sorted(getattr(provider, "tool_names", set())),
    }
```
Include the system_prompt from the provider in the info dictionary so it can be utilized by the agent during tool registration. Refactor the logic into a shared helper function to avoid code duplication. Also, ensure this new functionality is accompanied by unit tests.
Suggested change:

```diff
 def _sandbox_provider_info(provider_id: str, provider: SandboxProvider) -> dict:
     return {
         "provider_id": provider_id,
         "capabilities": sorted(getattr(provider, "capabilities", set())),
         "tool_names": sorted(getattr(provider, "tool_names", set())),
+        "system_prompt": getattr(provider, "system_prompt", ""),
     }
```
References
- When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.
- New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
```python
available = getattr(booter, "available", None)
if available is None:
    return True
return await available()
```
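The check above assumes `available()` always returns an awaitable, which one of the later commits notes can raise a `TypeError` when a provider exposes `available` as a sync method or a plain property. A more tolerant version (a hedged sketch; the helper name and exact handling are assumptions, not the PR's final code) could accept all three shapes:

```python
import asyncio
import inspect

# Sketch of an `available` check tolerant of booters that expose
# `available` as an async method, a sync method, a bool attribute,
# or not at all.
async def booter_available(booter) -> bool:
    available = getattr(booter, "available", None)
    if available is None:
        return True  # no health check exposed: assume healthy
    if callable(available):
        result = available()
        if inspect.isawaitable(result):
            result = await result  # async method
        return bool(result)  # sync method
    return bool(available)  # plain attribute / property value
```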
```python
if not self.acquire_lease(current_sandbox_id, session_id):
    raise RuntimeError(f"Sandbox {current_sandbox_id} is busy")
booter = self.session_booter[current_sandbox_id]
if await self.booter_available(booter):
    self.registry.touch_sandbox(current_sandbox_id)
    self.save_registry()
    self.schedule_idle_cleanup(current_sandbox_id, idle_timeout)
    return booter
self.session_booter.pop(current_sandbox_id, None)
```
If the current sandbox is busy (e.g., taken over by another session), raising a RuntimeError here will cause the agent to fail. Instead of crashing, it would be more resilient to clear the current sandbox assignment for this session and fall through to the logic that finds the default sandbox or creates a new one.
Suggested change:

```diff
-if not self.acquire_lease(current_sandbox_id, session_id):
-    raise RuntimeError(f"Sandbox {current_sandbox_id} is busy")
-booter = self.session_booter[current_sandbox_id]
-if await self.booter_available(booter):
-    self.registry.touch_sandbox(current_sandbox_id)
-    self.save_registry()
-    self.schedule_idle_cleanup(current_sandbox_id, idle_timeout)
-    return booter
-self.session_booter.pop(current_sandbox_id, None)
+if self.acquire_lease(current_sandbox_id, session_id):
+    booter = self.session_booter[current_sandbox_id]
+    if await self.booter_available(booter):
+        self.registry.touch_sandbox(current_sandbox_id)
+        self.save_registry()
+        self.schedule_idle_cleanup(current_sandbox_id, idle_timeout)
+        return booter
+    self.session_booter.pop(current_sandbox_id, None)
+else:
+    self.registry.set_current_sandbox_id(session_id, None)
+    self.save_registry()
```
```python
class SandboxProvider(Protocol):
    provider_id: str
    capabilities: set[str]
    tool_names: set[str]
```
The SandboxProvider protocol should include a system_prompt field to allow runtimes to provide necessary instructions to the LLM (e.g., path conventions or specific tool usage workflows). New functionality should be accompanied by corresponding unit tests.
Suggested change:

```diff
 class SandboxProvider(Protocol):
     provider_id: str
     capabilities: set[str]
     tool_names: set[str]
+    system_prompt: str = ""
```
References
- New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
```python
def save(self) -> None:
    self.storage_path.parent.mkdir(parents=True, exist_ok=True)
    self.storage_path.write_text(
```
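The review flags two problems with this `save()` path: `write_text` blocks the event loop when called from async code, and a crash mid-write can leave a corrupted registry file. A common remedy, sketched below with illustrative names (the PR's actual implementation may differ), is an atomic write via a temp file plus `os.replace`, offloaded with `asyncio.to_thread` from async callers:

```python
import json
import os
import tempfile
from pathlib import Path

# Sketch of an atomic registry save: write to a temp file in the same
# directory, fsync, then os.replace() so readers never observe a
# partially written file if the process crashes mid-write.
def save_registry_atomic(storage_path: Path, data: dict) -> None:
    storage_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=storage_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, storage_path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)  # drop the partial temp file on failure
        raise
```

From a coroutine this would typically be invoked as `await asyncio.to_thread(save_registry_atomic, path, data)` so the file I/O never runs on the event loop thread.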
Use sandbox_manager as the source of active sandbox booters so provider sandboxes receive skill updates again, and write the sandbox registry atomically to avoid corrupting persisted state on crashes.
- Use sandbox_manager.booter_available() in sync_skills_to_active_sandboxes to avoid TypeError when provider available is a sync method or property.
- Release lease and save registry when current booter becomes unavailable in get_or_create_booter, preventing lease leaks.
- Reschedule idle cleanup on destroy_booter failure instead of leaking the sandbox without any future cleanup task.
- Wrap CopyFileBetweenSandboxesTool temp file cleanup in try/finally to prevent leaks on upload/download errors.
- Update session_current in takeover_sandbox for consistent get_current_sandbox semantics.
- Force-cleanup session_booter, idle_state and registry in destroy_sandbox even if destroy_booter raises an exception.
- Add boot_lock around create_sandbox_uncontrolled to prevent concurrent booter creation for the same sandbox_id.
- Remove unused session_id parameter from _apply_sandbox_tools.
- Remove unreachable break in get_or_create_booter.
- Update tests to match adjusted _apply_sandbox_tools signature.
…ill sync
- Upgrade SandboxProvider Protocol with optional plugin_config, provider_api_version, system_prompt, auto_sync_skills and lifecycle hooks (on_sandbox_created / on_sandbox_destroyed).
- register_sandbox_provider now accepts a tools=list parameter. Core automatically appends tools to llm_tools and removes them on unregister.
- unregister_sandbox_provider with force=True now synchronously cleans up registry records and spawns best-effort async destroy_booter tasks.
- cleanup_managed_sandboxes no longer skips sandboxes whose provider has been unregistered; it always frees booters and registry state.
- get_or_create_booter automatically syncs skills after boot unless provider sets auto_sync_skills=False.
- Invoke optional lifecycle hooks after create and destroy.
- Remove manual skill sync calls from all provider create_booter methods.
- Update tests for new protocol shape and auto-sync behaviour.
…e, best-effort skill sync
- _cleanup_provider_sandboxes_sync now calls registry.save() after deleting records so force-unregister cleanup is persisted to disk.
- Extract _finalize_created_booter helper and call it from both get_or_create_booter and create_sandbox_uncontrolled so explicit sandbox creation (e.g. astrbot_create_sandbox) also runs skill sync and on_sandbox_created hooks.
- Wrap auto skill sync in try/except so a sync failure logs a warning instead of leaving a live but orphaned sandbox registered in state.
- Extract _invoke_sandbox_created_hook helper.
- Move hook invocation out of _finalize_created_booter so it fires only after the sandbox is leased, matching the Protocol docstring guarantee.
- get_or_create_booter calls hook after _finalize_created_booter (lease already held during creation).
- create_sandbox calls hook after acquire_lease, eliminating the race where hook/skill sync could delay leasing and let idle cleanup expire the sandbox.
…aths
- Make takeover_sandbox async so it can await the hook.
- Trigger hook in switch_current_sandbox_checked after lease cleanup.
- Trigger hook in get_observer_booter_by_id for dashboard-created sandboxes.
- Only mark created_hook_fired on success; transient failures are retried.
- Reset idle cleanup timer in create_sandbox after lease acquisition.
- Update callers (dashboard route, TakeoverSandboxTool, tests) to await.

Closes: on_sandbox_created skipped for dashboard/uncontrolled sandboxes
…nitoring
- Remove on_sandbox_created from get_observer_booter_by_id; observer access does not acquire a lease and must not trigger creation-side initialization. The hook still fires on first lease via switch/takeover/create_sandbox.
- Protect created_hook_fired check-and-set with _sandbox_boot_lock to prevent duplicate triggers under concurrent lease operations.
- Allow Dashboard screenshot/shell to bypass lease check (session_id=None) so admins can monitor any sandbox regardless of current controller.
… semantics
- get_observer_booter_by_id now accepts require_lease=True (default). When False, it allows read-only observer access to sandboxes controlled by other sessions without raising, but skips touch_sandbox and idle_cleanup reset so observer polling does not extend another session's sandbox lifetime.
- Dashboard shell restores session_id lease check (require_lease=True default); only screenshot bypasses lease with require_lease=False for monitoring.
- Update test_sandbox_dashboard_screenshot_respects_session_ownership to test_sandbox_dashboard_screenshot_bypasses_lease_for_monitoring, verifying that read-only observer access succeeds without a lease.
Dashboard is an administrative interface; both shell and screenshot should be usable on any sandbox regardless of current lease holder.
- run_shell: pass require_lease=False to get_observer_booter_by_id.
- Update test to verify admin shell access bypasses session ownership.
Add canonical sandbox lifecycle states to replace vague 'not running' errors:
- creating: boot in progress
- running: active and available
- error: creation failed or booter health check failed
- stopping: destroy in progress
- stopped: destroyed but persistent record kept
- unknown: reconciled on startup

Changes:
- SandboxStatus enum gains CREATING, ERROR, STOPPING states.
- create_sandbox_uncontrolled sets CREATING before boot, ERROR on failure.
- _finalize_created_booter sets RUNNING after successful boot.
- destroy_sandbox sets STOPPING before teardown, STOPPED for persistent.
- get_observer_booter_by_id returns state-aware error messages: 'still being created', 'being destroyed', 'has been destroyed', 'encountered an error', 'unavailable (health check failed)'.
- booter_available failure now updates status to ERROR instead of unknown.
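The lifecycle states listed above map naturally onto a string-valued enum plus a message table for the observer path. A minimal sketch, reconstructed from the state names in this commit (the PR's actual enum may carry more members or different message wiring):

```python
from enum import Enum

# Canonical sandbox lifecycle states from the commit message.
class SandboxStatus(str, Enum):
    CREATING = "creating"   # boot in progress
    RUNNING = "running"     # active and available
    ERROR = "error"         # creation or health check failed
    STOPPING = "stopping"   # destroy in progress
    STOPPED = "stopped"     # destroyed, persistent record kept
    UNKNOWN = "unknown"     # reconciled on startup

# State-aware observer error messages, mirroring the wording above
# (table name is illustrative).
OBSERVER_ERRORS = {
    SandboxStatus.CREATING: "still being created",
    SandboxStatus.STOPPING: "being destroyed",
    SandboxStatus.STOPPED: "has been destroyed",
    SandboxStatus.ERROR: "encountered an error",
}
```

Deriving from `str` keeps the enum JSON-serializable, which matters since statuses are persisted in the registry file.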
- Add status column with color-coded chips: creating (amber), running (success), error (error), stopping (warning), stopped (grey).
- Add i18n labels for all new lifecycle states in en-US and zh-CN.
- Disable action buttons (console, screenshot, config, set-default) when sandbox status is not 'running' to prevent errors on creating/stopping sandboxes.
- Update table headers to show status instead of lease chip.
Return dashboard sandbox create requests immediately in creating state, boot them in background, and let the UI transition from creating to idle or occupied without manual refresh. Also harden pending boot cleanup and keep sandbox details synchronized with refreshed records.
Clarify sandbox guidance so the model checks the current sandbox first, lists reusable sandboxes next, and treats creating a new sandbox as a last resort reserved for explicit fresh or separate environment requests.
Update sandbox prompts, tool descriptions, and dashboard messages to use occupancy wording instead of lease or 租约, including the release completion message and related labels.
Summary
Verification
- `uv run ruff format ...` reported `20 files left unchanged`
- `uv run ruff check ...` reported `All checks passed!`
- `uv run pytest tests/unit/test_cua_extracted_from_core.py tests/unit/test_sandbox_computer_client.py tests/unit/test_sandbox_manager.py tests/unit/test_sandbox_provider.py tests/unit/test_sandbox_registry.py tests/unit/test_sandbox_models.py tests/unit/test_computer.py tests/unit/test_astr_main_agent.py tests/test_computer_config.py tests/test_dashboard.py -q` reported `174 passed, 1 warning`