Intro
Hello! I'm a Ninja Web Developer. Hi-Yah!🥷
I made an image generation and editing MCP before.↓
🧠🥷How to make Image generation and editing MCP (Gemini API + Cline and Cursor)
However, it had problems, such as fetching data at 4-second intervals.
And the worst part was that it could display the image the first time, but after the second generation it could not, until I reloaded the page manually.
So I reloaded the page every 4 seconds.🙅
That gives a terrible user experience, because the display flashes every 4 seconds, but I ran out of time to fix it.🙇
Therefore, this time I made Ver 2.0 to solve these problems.🎉
Here we go!🚀
Solving problem 1 (Fetching data at 4-second intervals)
I used stdio
for the MCP transport in the previous code, since it is the default setting of MCP.↓
https://modelcontextprotocol.io/docs/concepts/transports#standard-input%2Foutput-stdio
Stdio cannot communicate with the browser directly, so I fetched the data from the backend to the frontend at 4-second intervals.
This way is simple and easy, but I didn't think it was good.
In the new code, I changed it to WebSocket,
which I used to make an avatar control MCP in my previous post.↓↓
🧠🥷How to make AI controled Avatar 2 (Vroid MCP + Cline and Cursor + Unity)
WebSocket is a communication protocol that allows real-time, two-way communication between a client (like a browser) and a server over a single, long-lived connection.
MCP also supports SSE (Server-Sent Events).
SSE is simpler and easier than WebSocket, but it is less flexible.↓
https://modelcontextprotocol.io/docs/concepts/transports#server-sent-events-sse
On March 26, SSE was replaced with Streamable HTTP.
It is more flexible than SSE, but looked harder to set up in a Next.js app than WebSocket.↓
https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http
Comparing MCP transports
Here is a table comparing the MCP transports.↓
| MCP Transports | Next.js Frontend Access | Next.js Backend Access | Complexity | Flexibility | Note |
|---|---|---|---|---|---|
| WebSocket | ✅ WebSocket | ✅ ws | ⭐⭐⭐ | ⭐⭐⭐⭐ | Real-time two-way transport. |
| Streamable HTTP | ✅ fetch + Readable Stream | ✅ NextResponse + stream | ⭐⭐⭐⭐ | ⭐⭐⭐ | One-way from server to browser. |
| SSE | ✅ EventSource | ✅ res.write | ⭐⭐ | ⭐⭐ | One-way from server to browser. |
| Polling | ✅ fetch in useEffect | ✅ API route | ⭐ | ⭐ | Browser polls a Next.js API route every few seconds. |
| stdio | ❌ | ✅ | ⭐ | ⭐ | MCP stdio can't reach the browser. |
Solving problem 2 (The image does not update after the second generation)
I thought it was a React useEffect
problem, but it was actually a browser image cache
problem.
In the previous code, the same image URL was set every time, so the cached image was used instead of the new image.
I added a timestamp to the image URL, so the browser can notice and reload the new image every time one is made.
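The fix itself is tiny. A sketch of the cache-busting helper (the function name is mine; the app code inlines this as a template literal):

```typescript
// Appends a changing query parameter so the browser sees a new URL
// each time and bypasses its image cache.
function withCacheBuster(url: string, timestamp: number): string {
  return `${url}?t=${timestamp}`;
}

// Usage: in the app this appears as `${data.imageUrl}?t=${Date.now()}`.
const busted = withCacheBuster("/generated-image.png", 1700000000000);
// busted === "/generated-image.png?t=1700000000000"
```

The file on disk keeps the same name; only the URL string changes, which is enough for the browser to re-fetch it.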
Outline of the system
This image generation and editing MCP has a simple three-layer structure.
Cline or Cursor → MCP → Next.js Web App
1️⃣ Cline or Cursor sends the instruction to the MCP.
2️⃣ The MCP relays the instruction to the Next.js Web App.
3️⃣ The Next.js Web App calls the Gemini API, generates or edits an image, and displays it in the browser.
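The relay in step 2️⃣ is essentially a lookup from tool name to API route. A minimal sketch — the route paths match the ones used in the MCP server code in this post, but the function name is illustrative:

```typescript
// Maps an MCP tool name to the Next.js API route the server relays it to.
// Returns null for tools this server does not know.
function routeForTool(tool: string): string | null {
  const routes: Record<string, string> = {
    generate_image: "/api/generate-image",
    edit_image: "/api/edit-image",
  };
  return routes[tool] ?? null;
}
```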
How to set up the Next.js Web App
1️⃣ Make an API key for the Gemini API
https://aistudio.google.com/app/apikey
2️⃣ Make a Next.js project
npx create-next-app@latest
https://nextjs.org/docs/app/getting-started/installation
3️⃣ Install Gemini API library
npm install @google/genai
4️⃣ Add the code files
Code of frontend (app/page.tsx)
"use client";
import React, { useState, useEffect } from "react";
export default function HomePage() {
const [image, setImage] = useState("");
const [update, setUpdate] = useState(0);
async function generateImage() {
console.log("generateImage called");
try {
const response = await fetch("/api/generate-image");
const data = await response.json();
if (data.imageUrl) {
setImage(`${data.imageUrl}?t=${Date.now()}`);
setUpdate((u) => u + 1); // functional update: the WebSocket handler would otherwise see a stale "update"
}
} catch (error) {
console.error("Error generate image:", error);
}
}
async function editImage() {
console.log("editImage called");
try {
const response = await fetch("/api/edit-image");
const data = await response.json();
if (data.imageUrl) {
setImage(`${data.imageUrl}?t=${Date.now()}`);
setUpdate((u) => u + 1); // functional update: the WebSocket handler would otherwise see a stale "update"
}
} catch (error) {
console.error("Error edit image:", error);
}
}
useEffect(() => {
const ws = new WebSocket("ws://localhost:8080");
ws.onopen = () => {
console.log("Connected to WebSocket server");
};
ws.onmessage = (event) => {
console.log("Received message: " + event.data);
console.log("Received message type: " + typeof event.data);
if (event.data === "generate_image") {
generateImage();
}
if (event.data === "edit_image") {
editImage();
}
};
ws.onclose = () => {
console.log("Disconnected from WebSocket server");
};
return () => {
console.log("Closed WebSocket server");
ws.close();
};
}, []);
return (
<div>
<h1>Generate and Edit Image</h1>
{image && (
<img src={image} alt="Generated Image" style={{ maxWidth: "500px" }} />
)}
</div>
);
}
Code of image generation (app/api/generate-image/route.ts)
This code is the same as Ver 1.0.
import { NextRequest, NextResponse } from "next/server";
import { GoogleGenAI, Modality } from "@google/genai";
import fs from "fs/promises";
import path from "path";
let cachedImage = "";
export async function POST(req: NextRequest) {
try {
const { prompt } = await req.json();
const ai = new GoogleGenAI({
apiKey: process.env.GEMINI_API_KEY!,
});
const response = await ai.models.generateContent({
model: "gemini-2.0-flash-exp-image-generation",
contents: prompt,
config: {
responseModalities: [Modality.TEXT, Modality.IMAGE],
},
});
let imageData = "";
if (
response.candidates &&
response.candidates[0] &&
response.candidates[0].content &&
response.candidates[0].content.parts
) {
for (const part of response.candidates[0].content.parts) {
if (part.inlineData) {
imageData = part.inlineData.data || "";
break;
}
}
}
if (imageData) {
let base64Data = imageData;
if (imageData.startsWith("data:image")) {
base64Data = imageData.split(",")[1];
}
if (base64Data) {
const imageBuffer = Buffer.from(base64Data, "base64");
const imagePath = path.join(
process.cwd(),
"public",
"generated-image.png"
);
await fs.writeFile(imagePath, imageBuffer);
const imageUrl = "/generated-image.png";
cachedImage = imageUrl;
} else {
console.error("No base64 data found in image data");
return NextResponse.json(
{ error: "No base64 data found in image data" },
{ status: 500 }
);
}
} else {
console.error("No image data found in response");
return NextResponse.json(
{ error: "No image data found in response" },
{ status: 500 }
);
}
return NextResponse.json({ imageUrl: cachedImage });
} catch (error: unknown) {
console.error("Error:", error);
return NextResponse.json(
{ error: (error as Error).message },
{ status: 500 }
);
}
}
export async function GET() {
return NextResponse.json({ imageUrl: cachedImage });
}
Code of image editing (app/api/edit-image/route.ts)
This code is the same as Ver 1.0.
import { NextRequest, NextResponse } from "next/server";
import { GoogleGenAI, Modality } from "@google/genai";
import fs from "fs/promises";
import path from "path";
let cachedImage = "";
export async function POST(req: NextRequest) {
try {
const { prompt } = await req.json();
const imagePath = path.join(process.cwd(), "public", "generated-image.png");
const imageBuffer = await fs.readFile(imagePath);
const ai = new GoogleGenAI({
apiKey: process.env.GEMINI_API_KEY!,
});
const contents = [
{ text: prompt },
{
inlineData: {
mimeType: "image/png",
data: imageBuffer.toString("base64"),
},
},
];
const response = await ai.models.generateContent({
model: "gemini-2.0-flash-exp-image-generation",
contents,
config: {
responseModalities: [Modality.TEXT, Modality.IMAGE],
},
});
let imageData = "";
if (
response.candidates &&
response.candidates[0] &&
response.candidates[0].content &&
response.candidates[0].content.parts
) {
for (const part of response.candidates[0].content.parts) {
if (part.inlineData) {
imageData = part.inlineData.data || "";
break;
}
}
}
if (imageData) {
let base64Data = imageData;
if (imageData.startsWith("data:image")) {
base64Data = imageData.split(",")[1];
}
if (base64Data) {
const imageBuffer = Buffer.from(base64Data, "base64");
const imagePath = path.join(
process.cwd(),
"public",
"edited-image.png"
);
await fs.writeFile(imagePath, imageBuffer);
const imageUrl = "/edited-image.png";
cachedImage = imageUrl;
} else {
console.error("No base64 data found in image data");
return NextResponse.json(
{ error: "No base64 data found in image data" },
{ status: 500 }
);
}
} else {
console.error("No image data found in response");
return NextResponse.json(
{ error: "No image data found in response" },
{ status: 500 }
);
}
return NextResponse.json({ imageUrl: cachedImage });
} catch (error: unknown) {
console.error("Error:", error);
return NextResponse.json(
{ error: (error as Error).message },
{ status: 500 }
);
}
}
export async function GET() {
return NextResponse.json({ imageUrl: cachedImage });
}
Environment Variable (.env.local)
GEMINI_API_KEY=your_real_api_key_here
How to set up the Image generation and editing MCP
1️⃣ Make a folder for Image MCP Server and open it from your editor.
2️⃣ Make package.json
.↓
npm init
3️⃣ Install the MCP SDK, plus axios, which the index.ts below uses to call the Next.js API.↓
npm install @modelcontextprotocol/sdk axios
4️⃣ Install WebSocket — the ws package itself and its type definitions.↓
npm install ws
npm install --save-dev @types/ws
5️⃣ Install TypeScript and make tsconfig.json.↓
npm install --save-dev typescript
npx tsc --init
6️⃣ Add "build": "tsc" to the scripts section of package.json.
7️⃣ Add index.ts (TypeScript) of the Image MCP Server.↓
#!/usr/bin/env node
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
CallToolRequestSchema,
ErrorCode,
ListToolsRequestSchema,
McpError,
} from "@modelcontextprotocol/sdk/types.js";
import axios from "axios";
class ImageServer {
private server: Server;
constructor() {
this.server = new Server(
{
name: "image-mcp-server",
version: "0.1.0",
},
{
capabilities: {
resources: {},
tools: {},
},
}
);
this.setupToolHandlers();
this.server.onerror = (error) => console.error("[MCP Error]", error);
process.on("SIGINT", async () => {
await this.server.close();
process.exit(0);
});
}
private setupToolHandlers() {
this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
tools: [
{
name: "generate_image",
description: "Generates an image using Gemini API",
inputSchema: {
type: "object",
properties: {
prompt: {
type: "string",
description: "The prompt to use for image generation",
},
},
required: ["prompt"],
},
},
{
name: "edit_image",
description: "Edits an image using Gemini API",
inputSchema: {
type: "object",
properties: {
prompt: {
type: "string",
description: "The prompt to use for image editing",
},
},
required: ["prompt"],
},
},
],
}));
this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === "generate_image") {
const { prompt } = request.params.arguments as { prompt: string };
try {
await axios.post("http://localhost:3000/api/generate-image", {
prompt: prompt,
});
return {
content: [
{
type: "text",
text: "Image generated successfully",
},
],
};
} catch (error: any) {
console.error(error);
return {
content: [
{
type: "text",
text: `Error generating image: ${error.message}`,
},
],
isError: true,
};
}
} else if (request.params.name === "edit_image") {
const { prompt } = request.params.arguments as { prompt: string };
try {
await axios.post("http://localhost:3000/api/edit-image", {
prompt: prompt,
});
return {
content: [
{
type: "text",
text: "Image edited successfully",
},
],
};
} catch (error: any) {
console.error(error);
return {
content: [
{
type: "text",
text: `Error editing image: ${error.message}`,
},
],
isError: true,
};
}
}
throw new McpError(
ErrorCode.MethodNotFound,
`Unknown tool: ${request.params.name}`
);
});
}
async run() {
const transport = new StdioServerTransport();
await this.server.connect(transport);
console.error("Image MCP server running on stdio");
console.log("mcp ok!");
}
}
const server = new ImageServer();
server.run().catch(console.error);
8️⃣ Build index.ts into index.js.↓
npm run build
9️⃣ Set cline_mcp_settings.json for Cline and mcp.json for Cursor.↓
{
"mcpServers": {
"image-mcp-server": {
"command": "node",
"args": ["path to index.js"]
}
}
}
How to use them
1️⃣ Run npm run dev to start the Next.js App, and access http://localhost:3000.
2️⃣ Ask your Cline or Cursor to generate an image.
For example,
Use "generate_image" tool of "image-mcp-server",
and send "Create a plumber wearing a red overall
and a red hat with a mustache.
It shouldn't look too much like Mario."
3️⃣ Ask your Cline or Cursor to edit an image.
For example,
Use "edit_image" tool of "image-mcp-server",
and send "Change the overall and hat to green."
4️⃣ Yay! We can generate and edit images from Cline and Cursor.🎉
Outro
Making our own original app is very challenging and fun, but it also takes a lot of time.
However, it brings lots of new learning and a sense of accomplishment.
So let's keep trying new skills more and more.
I hope you learned something from this post, or at least enjoyed it a little.
Thank you for reading.
Happy AI coding!🤖 Hi-Yah!🥷
Wait! Can I improve the App even more?
After writing this post up to this point, I suddenly noticed that this MCP would be simpler if I replaced WebSocket + stdio with WebSocket only.
In other words, I can replace stdio with a single WebSocket connection that also transmits to the Next.js backend.
Put in a table, this MCP changes like this.↓↓
| App Version | MCP → Next.js Frontend | MCP → Next.js Backend |
|---|---|---|
| Ver 1.0 | polling | stdio |
| Ver 2.0 | WebSocket | stdio |
| Ver 3.0 | WebSocket | WebSocket |
I wish I had noticed sooner..., but anyway I gave it a try and made Ver 3.0.
Ver 3.0
This is the MCP server (index.ts) Ver 3.0.
The transport between the MCP and the backend has been changed from stdio to WebSocket.
The other code is the same as before.
#!/usr/bin/env node
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { WebSocketServer } from "ws";
import {
CallToolRequestSchema,
ErrorCode,
ListToolsRequestSchema,
McpError,
} from "@modelcontextprotocol/sdk/types.js";
class ImageServer {
private server: Server;
private wss: WebSocketServer;
constructor() {
this.server = new Server(
{
name: "image-mcp-server",
version: "0.1.0",
},
{
capabilities: {
resources: {},
tools: {},
},
}
);
this.wss = new WebSocketServer({ port: 8080 });
this.wss.on("connection", (ws) => {
console.log("Client connected");
ws.on("message", async (message) => {
try {
const request = JSON.parse(message.toString());
if (
request.method === "callTool" &&
request.params.name === "generate_image"
) {
const { prompt } = request.params.arguments; // destructure from the arguments object, not from the prompt string
const response = await fetch(
"http://localhost:3000/api/generate-image",
{
method: "POST",
headers: {
"Content-Type": "application/json",
},
body: JSON.stringify({ prompt: prompt }),
}
);
if (!response.ok) {
throw new Error(
`generate_image HTTP error! status: ${response.status}`
);
}
const data = await response.json();
if (!data.imageUrl) {
ws.send(
JSON.stringify({ error: "No image data found in response" })
);
} else {
this.wss.clients.forEach((client) => {
console.log("Sending generate_image");
client.send("generate_image");
});
ws.send(JSON.stringify({ imageUrl: data.imageUrl }));
}
} else if (
request.method === "callTool" &&
request.params.name === "edit_image"
) {
const { prompt } = request.params.arguments; // destructure from the arguments object, not from the prompt string
const response = await fetch(
"http://localhost:3000/api/edit-image",
{
method: "POST",
headers: {
"Content-Type": "application/json",
},
body: JSON.stringify({ prompt: prompt }),
}
);
if (!response.ok) {
throw new Error(
`edit_image HTTP error! status: ${response.status}`
);
}
const data = await response.json();
if (!data.imageUrl) {
ws.send(
JSON.stringify({ error: "No image data found in response" })
);
} else {
this.wss.clients.forEach((client) => {
console.log("Sending edit_image");
client.send("edit_image");
});
ws.send(JSON.stringify({ imageUrl: data.imageUrl }));
}
}
} catch (error: any) {
console.error(error);
ws.send(JSON.stringify({ error: error.message }));
}
});
ws.on("close", () => {
console.log("Client disconnected");
});
});
this.setupToolHandlers();
this.server.onerror = (error) => console.error("[MCP Error]", error);
process.on("SIGINT", async () => {
await this.server.close();
process.exit(0);
});
}
private setupToolHandlers() {
this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
tools: [
{
name: "generate_image",
description: "Generates an image using Gemini API",
inputSchema: {
type: "object",
properties: {
prompt: {
type: "string",
description: "The prompt to use for image generation",
},
},
required: ["prompt"],
},
},
{
name: "edit_image",
description: "Edits an image using Gemini API",
inputSchema: {
type: "object",
properties: {
prompt: {
type: "string",
description: "The prompt to use for image editing",
},
},
required: ["prompt"],
},
},
],
}));
}
async run() {
console.error("Image MCP server running on websocket");
console.log("mcp ok!");
}
}
const server = new ImageServer();
server.run().catch(console.error);
Outro 2
Using WebSocket
seems good for communication between the MCP and both the Next.js frontend and backend.
Transports are difficult to understand, and I would like to learn more about them another day.
I hope you learned something from this post, or at least enjoyed it a little.
Thank you for reading.
Happy AI coding!🤖 Hi-Yah!🥷