stagehand mcp - add screenshots as a resource #38

Merged
9 changes: 2 additions & 7 deletions stagehand/README.md
@@ -64,14 +64,9 @@ A Model Context Protocol (MCP) server that provides AI-powered web automation ca

### Resources

The server provides access to two types of resources:
The server provides access to one resource:

1. **Console Logs** (`console://logs`)

- Browser console output in text format
- Includes all console messages from the browser

2. **Screenshots** (`screenshot://<name>`)
**Screenshots** (`screenshot://<name>`)
- PNG images of captured screenshots
- Accessible via the screenshot name specified during capture
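
The `screenshot://<name>` scheme can be illustrated with a small sketch. It assumes names follow the `screenshot-<ISO timestamp>` pattern that `stagehand/src/index.ts` uses in this PR; the helper names (`screenshotName`, `screenshotUri`, `parseScreenshotUri`) are hypothetical and do not exist in the codebase.

```typescript
// Hypothetical helpers (not part of this PR) that mirror the naming in the diff.
const SCREENSHOT_SCHEME = "screenshot://";

// Same transformation as the diff: ISO timestamp with every ":" replaced by "-".
function screenshotName(takenAt: Date): string {
  return `screenshot-${takenAt.toISOString().replace(/:/g, "-")}`;
}

// Build the resource URI that a ListResources response would advertise.
function screenshotUri(screenshot: string): string {
  return `${SCREENSHOT_SCHEME}${screenshot}`;
}

// Recover the screenshot name from a URI, or undefined for any other scheme.
function parseScreenshotUri(uri: string): string | undefined {
  return uri.startsWith(SCREENSHOT_SCHEME)
    ? uri.slice(SCREENSHOT_SCHEME.length)
    : undefined;
}

const exampleName = screenshotName(new Date(Date.UTC(2025, 0, 2, 3, 4, 5)));
console.log(screenshotUri(exampleName));
// prints "screenshot://screenshot-2025-01-02T03-04-05.000Z"
```

Replacing the colons keeps the name free of characters that would be awkward in a URI, so the name can be appended to the scheme as-is.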

91 changes: 55 additions & 36 deletions stagehand/src/index.ts
@@ -8,7 +8,8 @@ import {
CallToolResult,
Tool,
ListResourcesRequestSchema,
ListResourceTemplatesRequestSchema
ListResourceTemplatesRequestSchema,
ReadResourceRequestSchema
} from "@modelcontextprotocol/sdk/types.js";

import { Stagehand } from "@browserbasehq/stagehand";
@@ -134,28 +135,18 @@ const TOOLS: Tool[] = [
},
{
name: "screenshot",
description: "Take a screenshot of the current page. Use this tool to learn where you are on the page when controlling the browser with Stagehand.",
description: "Takes a screenshot of the current page. Use this tool to learn where you are on the page when controlling the browser with Stagehand. Only use this tool when the other tools are not sufficient to get the information you need.",
inputSchema: {
type: "object",
properties: {
fullPage: {
type: "boolean",
description: "Whether to take a screenshot of the full page (true) or just the visible viewport (false). Default is false."
},
path: {
type: "string",
description: "Optional. Custom file path where the screenshot should be saved. If not provided, a default path will be used."
}
}
properties: {},
},
},
];

// Global state
let stagehand: Stagehand | undefined;
let serverInstance: Server | undefined;
const consoleLogs: string[] = [];
const operationLogs: string[] = [];
const screenshots = new Map<string, string>();

function log(message: string, level: 'info' | 'error' | 'debug' = 'info') {
const timestamp = new Date().toISOString();
@@ -401,34 +392,33 @@ async function handleToolCall(

case "screenshot":
try {
const fullPage = args.fullPage === true;

// Create a screenshots directory next to the logs directory
const SCREENSHOTS_DIR = path.join(__dirname, '../screenshots');
if (!fs.existsSync(SCREENSHOTS_DIR)) {
fs.mkdirSync(SCREENSHOTS_DIR, { recursive: true });
}

const screenshotBuffer = await stagehand.page.screenshot({
fullPage: false
});

// Generate a filename based on timestamp if path not provided
const screenshotPath = args.path || path.join(SCREENSHOTS_DIR, `screenshot-${new Date().toISOString().replace(/:/g, '-')}.png`);
// Convert buffer to base64 string and store in memory
const screenshotBase64 = screenshotBuffer.toString('base64');
const name = `screenshot-${new Date().toISOString().replace(/:/g, '-')}`;
screenshots.set(name, screenshotBase64);

// If a custom path is provided, ensure its directory exists
if (args.path) {
const customDir = path.dirname(screenshotPath);
if (!fs.existsSync(customDir)) {
fs.mkdirSync(customDir, { recursive: true });
}
// Notify the client that the list of available resources has changed
if (serverInstance) {
serverInstance.notification({
method: "notifications/resources/list_changed",
});
}

// Take the screenshot
// making fullpage false temporarily
await stagehand.page.screenshot({ path: screenshotPath, fullPage: false });

return {
content: [
{
type: "text",
text: `Screenshot taken and saved to: ${screenshotPath}`,
text: `Screenshot taken with name: ${name}`,
},
{
type: "image",
data: screenshotBase64,
mimeType: "image/png",
},
],
isError: false,
@@ -536,8 +526,15 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
server.setRequestHandler(ListResourcesRequestSchema, async (request) => {
try {
logRequest('ListResources', request.params);
// Return an empty list since we don't have any resources defined
const response = { resources: [] };
const response = {
resources: [
...Array.from(screenshots.keys()).map((name) => ({
uri: `screenshot://${name}`,
mimeType: "image/png",
name: `Screenshot: ${name}`,
})),
]
};
const sanitizedResponse = sanitizeMessage(response);
logResponse('ListResources', JSON.parse(sanitizedResponse));
return JSON.parse(sanitizedResponse);
@@ -571,6 +568,28 @@ server.setRequestHandler(ListResourceTemplatesRequestSchema, async (request) =>
}
});

server.setRequestHandler(ReadResourceRequestSchema, async (request) => {
const uri = request.params.uri.toString();

if (uri.startsWith("screenshot://")) {
const name = uri.split("://")[1];
const screenshot = screenshots.get(name);
if (screenshot) {
return {
contents: [
{
uri,
mimeType: "image/png",
blob: screenshot,
},
],
};
}
}

throw new Error(`Resource not found: ${uri}`);
});
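
For review purposes, the store, list, and read flow added in this file can be modelled without the MCP SDK. The sketch below is a stand-in under stated assumptions, not the server code: `ScreenshotStore`, `ResourceEntry`, and `BlobContents` are hypothetical names, and the payload is a placeholder base64 string rather than a real PNG.

```typescript
// Sketch only: a standalone model of the in-memory screenshot map in this diff.
// All type and class names here are hypothetical, not MCP SDK APIs.
interface ResourceEntry {
  uri: string;
  mimeType: string;
  name: string;
}

interface BlobContents {
  uri: string;
  mimeType: string;
  blob: string; // base64-encoded image data
}

class ScreenshotStore {
  private readonly screenshots = new Map<string, string>();

  // Store an already base64-encoded screenshot under the given name.
  add(name: string, base64Png: string): void {
    this.screenshots.set(name, base64Png);
  }

  // Shape of the entries a ListResources response carries in the diff.
  list(): ResourceEntry[] {
    return Array.from(this.screenshots.keys()).map((name) => ({
      uri: `screenshot://${name}`,
      mimeType: "image/png",
      name: `Screenshot: ${name}`,
    }));
  }

  // Resolve a screenshot:// URI, throwing when nothing matches (as the diff does).
  read(uri: string): BlobContents {
    const prefix = "screenshot://";
    if (uri.startsWith(prefix)) {
      const blob = this.screenshots.get(uri.slice(prefix.length));
      if (blob !== undefined) {
        return { uri, mimeType: "image/png", blob };
      }
    }
    throw new Error(`Resource not found: ${uri}`);
  }
}

const store = new ScreenshotStore();
store.add("screenshot-demo", "cGxhY2Vob2xkZXI="); // placeholder payload, not a PNG
console.log(store.list().map((entry) => entry.uri));
console.log(store.read("screenshot://screenshot-demo").mimeType);
```

The sketch strips the scheme prefix with `slice` rather than `split("://")[1]`, which is a deliberate deviation from the diff: it returns the full remainder of the URI even if a name were ever to contain `://`. It also compares against `undefined` so that an empty string would still count as a stored value.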

// Run the server
async function runServer() {
const transport = new StdioServerTransport();