Skip to content

feat: extract sandbox runtimes from core#8093

Open
zouyonghe wants to merge 30 commits intoAstrBotDevs:masterfrom
zouyonghe:feat/generic-sandbox-core
Open

feat: extract sandbox runtimes from core#8093
zouyonghe wants to merge 30 commits intoAstrBotDevs:masterfrom
zouyonghe:feat/generic-sandbox-core

Conversation

@zouyonghe
Copy link
Copy Markdown
Member

Summary

  • Add generic sandbox provider registration, registry, manager, and dashboard sandbox APIs.
  • Remove built-in CUA, Shipyard, Shipyard Neo, Boxlite, and Bay runtime implementations from core.
  • Keep core sandbox tool mounting provider-driven and remove runtime-specific dashboard/config coupling.

Verification

  • uv run ruff format ... reported 20 files left unchanged
  • uv run ruff check ... reported All checks passed!
  • uv run pytest tests/unit/test_cua_extracted_from_core.py tests/unit/test_sandbox_computer_client.py tests/unit/test_sandbox_manager.py tests/unit/test_sandbox_provider.py tests/unit/test_sandbox_registry.py tests/unit/test_sandbox_models.py tests/unit/test_computer.py tests/unit/test_astr_main_agent.py tests/test_computer_config.py tests/test_dashboard.py -q reported 174 passed, 1 warning

@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 9, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @zouyonghe, your pull request is larger than the review limit of 150000 diff characters

@dosubot dosubot Bot added the area:core The bug / feature is about astrbot's core, backend label May 9, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request transitions the sandbox management system to a provider-based architecture, removing hardcoded implementations for Shipyard and CUA from the core. It introduces a SandboxManager, SandboxRegistry, and a SandboxProvider protocol to handle sandbox lifecycles, leases, and persistence generically. The dashboard is updated with new sandbox routes, and tool registration is refactored to be dynamically driven by providers. Review feedback highlights the need to restore provider-specific system prompts lost during extraction, improve resilience when sandboxes are busy, ensure compatibility with different booter implementations of the available check, and avoid blocking the event loop with synchronous file I/O in the registry.

Comment thread astrbot/core/astr_main_agent.py Outdated
Comment on lines +989 to +994
provider_info = get_sandbox_provider_info(booter)
if provider_info:
for tool_name in provider_info.get("tool_names", []):
tool = tool_mgr.get_func(tool_name)
if tool and getattr(tool, "active", True):
req.func_tool.add_tool(tool)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The generic sandbox tool mounting logic should also include provider-specific system prompt instructions if available. The extraction of runtimes removed important guidance (e.g., file path rules for Shipyard Neo or GUI instructions for CUA) that the LLM needs to use the tools effectively. The SandboxProvider protocol should be updated to provide these instructions. Please ensure this new functionality is accompanied by corresponding unit tests and that the logic for extracting provider info is shared to avoid duplication.

Suggested change
provider_info = get_sandbox_provider_info(booter)
if provider_info:
for tool_name in provider_info.get("tool_names", []):
tool = tool_mgr.get_func(tool_name)
if tool and getattr(tool, "active", True):
req.func_tool.add_tool(tool)
provider_info = get_sandbox_provider_info(booter)
if provider_info:
if prompt := provider_info.get("system_prompt"):
req.system_prompt = f"{req.system_prompt or ''}\n{prompt}\n"
for tool_name in provider_info.get("tool_names", []):
tool = tool_mgr.get_func(tool_name)
if tool and getattr(tool, "active", True):
req.func_tool.add_tool(tool)
References
  1. When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.
  2. New functionality, such as handling attachments, should be accompanied by corresponding unit tests.

Comment on lines +27 to +32
def _sandbox_provider_info(provider_id: str, provider: SandboxProvider) -> dict:
return {
"provider_id": provider_id,
"capabilities": sorted(getattr(provider, "capabilities", set())),
"tool_names": sorted(getattr(provider, "tool_names", set())),
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Include the system_prompt from the provider in the info dictionary so it can be utilized by the agent during tool registration. Refactor the logic into a shared helper function to avoid code duplication. Also, ensure this new functionality is accompanied by unit tests.

Suggested change
def _sandbox_provider_info(provider_id: str, provider: SandboxProvider) -> dict:
return {
"provider_id": provider_id,
"capabilities": sorted(getattr(provider, "capabilities", set())),
"tool_names": sorted(getattr(provider, "tool_names", set())),
}
def _sandbox_provider_info(provider_id: str, provider: SandboxProvider) -> dict:
return {
"provider_id": provider_id,
"capabilities": sorted(getattr(provider, "capabilities", set())),
"tool_names": sorted(getattr(provider, "tool_names", set())),
"system_prompt": getattr(provider, "system_prompt", ""),
}
References
  1. When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.
  2. New functionality, such as handling attachments, should be accompanied by corresponding unit tests.

available = getattr(booter, "available", None)
if available is None:
return True
return await available()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The available attribute might be a property or a synchronous method in some implementations. It's safer to check if it's callable before awaiting it to avoid a TypeError.

        if callable(available):
            return await available()
        return bool(available)

Comment on lines +178 to +186
if not self.acquire_lease(current_sandbox_id, session_id):
raise RuntimeError(f"Sandbox {current_sandbox_id} is busy")
booter = self.session_booter[current_sandbox_id]
if await self.booter_available(booter):
self.registry.touch_sandbox(current_sandbox_id)
self.save_registry()
self.schedule_idle_cleanup(current_sandbox_id, idle_timeout)
return booter
self.session_booter.pop(current_sandbox_id, None)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If the current sandbox is busy (e.g., taken over by another session), raising a RuntimeError here will cause the agent to fail. Instead of crashing, it would be more resilient to clear the current sandbox assignment for this session and fall through to the logic that finds the default sandbox or creates a new one.

Suggested change
if not self.acquire_lease(current_sandbox_id, session_id):
raise RuntimeError(f"Sandbox {current_sandbox_id} is busy")
booter = self.session_booter[current_sandbox_id]
if await self.booter_available(booter):
self.registry.touch_sandbox(current_sandbox_id)
self.save_registry()
self.schedule_idle_cleanup(current_sandbox_id, idle_timeout)
return booter
self.session_booter.pop(current_sandbox_id, None)
if self.acquire_lease(current_sandbox_id, session_id):
booter = self.session_booter[current_sandbox_id]
if await self.booter_available(booter):
self.registry.touch_sandbox(current_sandbox_id)
self.save_registry()
self.schedule_idle_cleanup(current_sandbox_id, idle_timeout)
return booter
self.session_booter.pop(current_sandbox_id, None)
else:
self.registry.set_current_sandbox_id(session_id, None)
self.save_registry()

Comment on lines +9 to +12
class SandboxProvider(Protocol):
provider_id: str
capabilities: set[str]
tool_names: set[str]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The SandboxProvider protocol should include a system_prompt field to allow runtimes to provide necessary instructions to the LLM (e.g., path conventions or specific tool usage workflows). New functionality should be accompanied by corresponding unit tests.

Suggested change
class SandboxProvider(Protocol):
provider_id: str
capabilities: set[str]
tool_names: set[str]
class SandboxProvider(Protocol):
provider_id: str
capabilities: set[str]
tool_names: set[str]
system_prompt: str = ""
References
  1. New functionality, such as handling attachments, should be accompanied by corresponding unit tests.


def save(self) -> None:
self.storage_path.parent.mkdir(parents=True, exist_ok=True)
self.storage_path.write_text(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Performing synchronous file I/O with write_text inside the event loop can block other coroutines, especially if the registry grows large. Consider using an executor or an asynchronous file library for persistence.

zouyonghe added 24 commits May 9, 2026 13:05
Use sandbox_manager as the source of active sandbox booters so provider sandboxes receive skill updates again, and write the sandbox registry atomically to avoid corrupting persisted state on crashes.
- Use sandbox_manager.booter_available() in sync_skills_to_active_sandboxes to
  avoid TypeError when provider available is a sync method or property.
- Release lease and save registry when current booter becomes unavailable in
  get_or_create_booter, preventing lease leaks.
- Reschedule idle cleanup on destroy_booter failure instead of leaking the
  sandbox without any future cleanup task.
- Wrap CopyFileBetweenSandboxesTool temp file cleanup in try/finally to prevent
  leaks on upload/download errors.
- Update session_current in takeover_sandbox for consistent get_current_sandbox
  semantics.
- Force-cleanup session_booter, idle_state and registry in destroy_sandbox even
  if destroy_booter raises an exception.
- Add boot_lock around create_sandbox_uncontrolled to prevent concurrent booter
  creation for the same sandbox_id.
- Remove unused session_id parameter from _apply_sandbox_tools.
- Remove unreachable break in get_or_create_booter.
- Update tests to match adjusted _apply_sandbox_tools signature.
…ill sync

- Upgrade SandboxProvider Protocol with optional plugin_config,
  provider_api_version, system_prompt, auto_sync_skills and lifecycle hooks
  (on_sandbox_created / on_sandbox_destroyed).
- register_sandbox_provider now accepts a tools=list parameter. Core
  automatically appends tools to llm_tools and removes them on unregister.
- unregister_sandbox_provider with force=True now synchronously cleans up
  registry records and spawns best-effort async destroy_booter tasks.
- cleanup_managed_sandboxes no longer skips sandboxes whose provider has been
  unregistered; it always frees booters and registry state.
- get_or_create_booter automatically syncs skills after boot unless provider
  sets auto_sync_skills=False.
- invoke optional lifecycle hooks after create and destroy.
- Remove manual skill sync calls from all provider create_booter methods.
- Update tests for new protocol shape and auto-sync behaviour.
…e, best-effort skill sync

- _cleanup_provider_sandboxes_sync now calls registry.save() after deleting
  records so force-unregister cleanup is persisted to disk.
- Extract _finalize_created_booter helper and call it from both
  get_or_create_booter and create_sandbox_uncontrolled so explicit sandbox
  creation (e.g. astrbot_create_sandbox) also runs skill sync and
  on_sandbox_created hooks.
- Wrap auto skill sync in try/except so a sync failure logs a warning instead
  of leaving a live but orphaned sandbox registered in state.
- Extract _invoke_sandbox_created_hook helper.
- Move hook invocation out of _finalize_created_booter so it fires only after
  the sandbox is leased, matching the Protocol docstring guarantee.
- get_or_create_booter calls hook after _finalize_created_booter (lease already
  held during creation).
- create_sandbox calls hook after acquire_lease, eliminating the race where
  hook/skill sync could delay leasing and let idle cleanup expire the sandbox.
…aths

- Make takeover_sandbox async so it can await the hook
- Trigger hook in switch_current_sandbox_checked after lease cleanup
- Trigger hook in get_observer_booter_by_id for dashboard-created sandboxes
- Only mark created_hook_fired on success; transient failures are retried
- Reset idle cleanup timer in create_sandbox after lease acquisition
- Update callers (dashboard route, TakeoverSandboxTool, tests) to await

Closes: on_sandbox_created skipped for dashboard/uncontrolled sandboxes
…nitoring

- Remove on_sandbox_created from get_observer_booter_by_id; observer access
  does not acquire a lease and must not trigger creation-side initialization.
  The hook still fires on first lease via switch/takeover/create_sandbox.

- Protect created_hook_fired check-and-set with _sandbox_boot_lock to prevent
  duplicate triggers under concurrent lease operations.

- Allow Dashboard screenshot/shell to bypass lease check (session_id=None)
  so admins can monitor any sandbox regardless of current controller.
… semantics

- get_observer_booter_by_id now accepts require_lease=True (default).  When
  False, it allows read-only observer access to sandboxes controlled by other
  sessions without raising, but skips touch_sandbox and idle_cleanup reset so
  observer polling does not extend another session's sandbox lifetime.

- Dashboard shell restores session_id lease check (require_lease=True default);
  only screenshot bypasses lease with require_lease=False for monitoring.

- Update test_sandbox_dashboard_screenshot_respects_session_ownership to
  test_sandbox_dashboard_screenshot_bypasses_lease_for_monitoring, verifying
  that read-only observer access succeeds without a lease.
Dashboard is an administrative interface; both shell and screenshot
should be usable on any sandbox regardless of current lease holder.

- run_shell: pass require_lease=False to get_observer_booter_by_id
- Update test to verify admin shell access bypasses session ownership
Add canonical sandbox lifecycle states to replace vague 'not running' errors:
- creating: boot in progress
- running: active and available
- error: creation failed or booter health check failed
- stopping: destroy in progress
- stopped: destroyed but persistent record kept
- unknown: reconciled on startup

Changes:
- SandboxStatus enum gains CREATING, ERROR, STOPPING states
- create_sandbox_uncontrolled sets CREATING before boot, ERROR on failure
- _finalize_created_booter sets RUNNING after successful boot
- destroy_sandbox sets STOPPING before teardown, STOPPED for persistent
- get_observer_booter_by_id returns state-aware error messages:
  'still being created', 'being destroyed', 'has been destroyed',
  'encountered an error', 'unavailable (health check failed)'
- booter_available failure now updates status to ERROR instead of unknown
@zouyonghe zouyonghe force-pushed the feat/generic-sandbox-core branch from 5e11841 to 48a1900 Compare May 9, 2026 17:08
@zouyonghe zouyonghe force-pushed the feat/generic-sandbox-core branch 2 times, most recently from 2bbed6d to 94b1e80 Compare May 9, 2026 17:29
- Add status column with color-coded chips:
  creating(amber), running(success), error(error), stopping(warning), stopped/stopped(grey)
- Add i18n labels for all new lifecycle states in en-US and zh-CN
- Disable action buttons (console, screenshot, config, set-default) when
  sandbox status is not 'running' to prevent errors on creating/stopping sandboxes
- Update table headers to show status instead of lease chip
@zouyonghe zouyonghe force-pushed the feat/generic-sandbox-core branch from 94b1e80 to 14a5d12 Compare May 9, 2026 17:34
zouyonghe added 4 commits May 10, 2026 04:29
Return dashboard sandbox create requests immediately in creating state, boot them in background, and let the UI transition from creating to idle or occupied without manual refresh. Also harden pending boot cleanup and keep sandbox details synchronized with refreshed records.
Clarify sandbox guidance so the model checks the current sandbox first, lists reusable sandboxes next, and treats creating a new sandbox as a last resort reserved for explicit fresh or separate environment requests.
Update sandbox prompts, tool descriptions, and dashboard messages to use occupancy wording instead of lease or 租约, including the release completion message and related labels.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant