2026-03-28
9 min read
Web Scraping MCP Server: Build One in 10 Minutes
If you've used Claude Desktop, Cursor, or Windsurf recently, you've probably noticed they can do a lot more than autocomplete code. They can run terminal commands, read files, call APIs. The protocol behind this is MCP -- Model Context Protocol -- and it's quickly becoming the standard way AI tools interact with the outside world.
The most useful MCP server you can build? One that scrapes the web. Here's how to build a web scraping MCP server in about 50 lines of code using Last Crawler's API.
What is MCP?
MCP (Model Context Protocol) is an open standard created by Anthropic that lets AI applications call external tools. Think of it like USB-C for AI -- a single protocol that connects any AI client to any capability.
Before MCP, every AI tool had its own plugin system. ChatGPT had plugins (RIP). Cursor had its own extensions. Nothing was compatible. MCP fixes that by defining a universal JSON-RPC interface: the AI client sends a request, the MCP server runs a tool, and sends back the result.
An MCP server is just a program that exposes tools. Each tool has a name, a description, and typed parameters. The AI reads those descriptions, decides when to call each tool, and interprets the results.
The key pieces:
- MCP Client: The AI application (Claude Desktop, Cursor, Windsurf, Cline)
- MCP Server: Your code that exposes tools
- Tools: Individual functions the AI can call -- like extract_json, extract_markdown, or screenshot
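Under the hood the exchange is plain JSON-RPC. A tools/call request and its response look roughly like this -- the payloads below are illustrative (the tool name matches the server built later in this post, and the URL is made up):

```typescript
// Illustrative JSON-RPC messages exchanged between an MCP client and server.
// The URL and text content are made-up examples.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "extract_markdown",
    arguments: { url: "https://example.com" },
  },
};

// The server replies with the tool result wrapped in a content array.
const response = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    content: [{ type: "text", text: "# Example Domain\n..." }],
  },
};

console.log(request.method, response.result.content[0].type);
```

The SDKs hide this framing entirely -- you only ever define tools and return results.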
Why web scraping is the killer MCP use case
AI coding assistants are powerful but blind. They can write code, refactor files, debug errors -- but they can't look at a webpage. They don't know what's on your competitor's pricing page. They can't check if your deployment actually rendered correctly. They can't pull live data from documentation sites.
This is why a web scraping MCP server is so useful. It gives your AI assistant eyes on the internet.
Some things that become possible:
- Ask Claude to "check if the homepage loads correctly" and it actually visits the URL
- Tell Cursor to "build a component that matches this design" and give it a URL instead of a screenshot
- Have your AI research assistant pull structured data from 10 competitor pages
The critical thing here is live data. Agents need real-time information, not cached responses from a training set. A web scraping MCP server hits the actual page, renders JavaScript, and returns fresh content -- which is exactly what makes it different from the AI just "knowing" about a site from training data.
For more on why agents specifically need structured web data, see our deep dive on web scraping APIs for AI agents.
Build it: a web scraping MCP server in ~50 lines
We'll build a TypeScript MCP server that exposes three tools:
- extract_json -- extract structured data from any URL using a JSON schema
- extract_markdown -- convert any webpage to clean markdown
- screenshot -- take a screenshot of any URL
Each tool calls a Last Crawler API endpoint. If you haven't used it before, the short version: you POST a URL and some options, you get back structured data. No browser management, no proxy rotation, no CAPTCHA solving on your end. Check out the URL-to-JSON API and URL-to-Markdown API guides for the full details.
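To make the API contract concrete before wrapping it in MCP, here is a sketch of the HTTP request the server will build. The endpoint path and Bearer auth scheme follow the callApi helper shown below; buildRequest itself is a hypothetical illustration, not part of the API:

```typescript
// Build (but don't send) the HTTP request the MCP server will make.
// Endpoint path and Bearer auth follow this article's callApi helper;
// buildRequest is a hypothetical helper for illustration only.
const API_BASE = "https://lastcrawler.xyz/api";

function buildRequest(endpoint: string, apiKey: string, body: object) {
  return {
    url: `${API_BASE}/${endpoint}`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

const req = buildRequest("markdown", "sk-test", { url: "https://example.com" });
console.log(req.url); // https://lastcrawler.xyz/api/markdown
// To actually send it: await fetch(req.url, req.init)
```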
Step 1: Set up the project
bash
mkdir web-scraper-mcp && cd web-scraper-mcp
npm init -y
npm install @modelcontextprotocol/sdk zod
npm install -D tsx
You'll also need a Last Crawler API key. Grab one from lastcrawler.xyz.
Step 2: Write the server
Create index.ts:
typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
const API_BASE = "https://lastcrawler.xyz/api";
const API_KEY = process.env.LASTCRAWLER_API_KEY!;
async function callApi(endpoint: string, body: object) {
const res = await fetch(`${API_BASE}/${endpoint}`, {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${API_KEY}`,
},
body: JSON.stringify(body),
});
if (!res.ok) throw new Error(`API error: ${res.status}`);
return res;
}
const server = new McpServer({
name: "web-scraper",
version: "1.0.0",
});
server.tool(
"extract_json",
"Extract structured data from a URL using a JSON schema",
{
url: z.string().url(),
schema: z.record(z.any()).describe("JSON schema defining the data to extract"),
},
async ({ url, schema }) => {
const res = await callApi("json", { url, schema });
const data = await res.json();
return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
}
);
server.tool(
"extract_markdown",
"Convert a webpage to clean markdown",
{ url: z.string().url() },
async ({ url }) => {
const res = await callApi("markdown", { url });
const text = await res.text();
return { content: [{ type: "text", text }] };
}
);
server.tool(
"screenshot",
"Take a screenshot of a URL",
{
url: z.string().url(),
viewport_width: z.number().optional().default(1280),
},
async ({ url, viewport_width }) => {
const res = await callApi("screenshot", {
url,
viewport_width,
});
const base64 = Buffer.from(await res.arrayBuffer()).toString("base64");
return { content: [{ type: "image", data: base64, mimeType: "image/png" }] };
}
);
const transport = new StdioServerTransport();
await server.connect(transport);
That's the entire server. Three tools, basic error handling, properly typed.
Step 3: Configure for Claude Desktop
Add this to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
json
{
"mcpServers": {
"web-scraper": {
"command": "npx",
"args": ["tsx", "/path/to/web-scraper-mcp/index.ts"],
"env": {
"LASTCRAWLER_API_KEY": "your-api-key-here"
}
}
}
}
Restart Claude Desktop. You should see the tools appear in the toolbar.
Step 4: Configure for Cursor
Cursor uses the same MCP config format. Add a .cursor/mcp.json file in your project root:
json
{
"mcpServers": {
"web-scraper": {
"command": "npx",
"args": ["tsx", "/path/to/web-scraper-mcp/index.ts"],
"env": {
"LASTCRAWLER_API_KEY": "your-api-key-here"
}
}
}
}
That's it. Cursor picks it up automatically.
What you can do with it
Once the MCP server is running, your AI assistant can browse the web. Here are the use cases that actually matter:
Research agent
Ask Claude: "Compare the pricing of Vercel, Netlify, and Railway. Put it in a table."
Claude calls extract_json three times with a pricing schema, gets back structured data, and builds the comparison table. No manual data entry, no copy-pasting from three browser tabs. The structured extraction approach -- which we cover in detail in the URL-to-JSON guide -- means the AI gets typed numbers it can actually compare, not blobs of text.
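What does a "pricing schema" look like in practice? Here is a sketch of the kind of JSON Schema the assistant might pass as the schema argument to extract_json -- the field names are hypothetical; any standard JSON Schema shape works:

```typescript
// An illustrative JSON Schema the AI might pass as the `schema` argument
// to extract_json. Field names are hypothetical examples.
const pricingSchema = {
  type: "object",
  properties: {
    plans: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          monthly_price_usd: { type: "number" },
          features: { type: "array", items: { type: "string" } },
        },
        required: ["name", "monthly_price_usd"],
      },
    },
  },
  required: ["plans"],
};

// The tool call would then be roughly:
// extract_json({ url: "https://vercel.com/pricing", schema: pricingSchema })
console.log(pricingSchema.required);
```

Because monthly_price_usd is typed as a number, the AI can sort and compare plans numerically instead of parsing "$19/mo" strings.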
Competitive analysis
"Check the top 5 results for 'project management software' and extract their key features and pricing tiers."
The AI calls extract_markdown on each result page, reads the content, then calls extract_json with a features schema to pull structured data. You get a competitive landscape in 60 seconds.
Content monitoring
"Check if our changelog page has been updated since last Tuesday."
The AI calls extract_markdown on your changelog, reads the dates, and tells you. You could wrap this in a cron job with a simple script that triggers the MCP server.
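That cron-style check can be sketched as follows -- hashing the markdown and comparing against the last run's hash. The fetch step is only commented because it hits the live API; contentHash and hasChanged are hypothetical helpers:

```typescript
import { createHash } from "node:crypto";

// Hash the markdown so we can compare against the last run's stored hash.
function contentHash(markdown: string): string {
  return createHash("sha256").update(markdown).digest("hex");
}

// Pure comparison helper: did the page change since the stored hash?
function hasChanged(markdown: string, lastHash: string): boolean {
  return contentHash(markdown) !== lastHash;
}

// In a cron job you would first fetch the page (sketch, not run here):
// const res = await fetch("https://lastcrawler.xyz/api/markdown", { /* ... */ });
// const markdown = await res.text();
const markdown = "## Changelog\n- 2026-03-20: fixed a bug";
const lastHash = contentHash("## Changelog\n- 2026-03-12: initial release");
console.log(hasChanged(markdown, lastHash)); // true: content differs
```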
Visual verification
"Take a screenshot of our staging environment and tell me if the hero section looks right."
The AI calls screenshot, gets back the image, and uses its vision capabilities to analyze the render. Useful for catching visual regressions after deploys.
For more patterns like these, our post on building AI agent tools goes deeper into how to design schema-per-task tools that agents can actually reason with.
How this compares to Firecrawl's MCP server
Firecrawl has an MCP server. It's popular. Here's an honest comparison:
| | Last Crawler MCP | Firecrawl MCP |
|---|---|---|
| Setup complexity | ~50 lines, one file | Separate package, more config |
| Endpoints | JSON, Markdown, Screenshot, PDF, Links, Crawl, Scrape, Content, Snapshot | Scrape, Crawl, Map, Extract |
| Structured extraction | Schema-based JSON with type enforcement | Similar extraction, less type control |
| JavaScript rendering | Full headless browser, every request | Full headless browser |
| Screenshot support | Built-in endpoint | Not available as MCP tool |
| Pricing | Pay per request, no tiers | Tiered plans starting at $19/mo |
The biggest practical difference: Last Crawler gives you more endpoint variety. You get markdown conversion, screenshots, PDFs, link extraction, and structured JSON all from the same API. With Firecrawl you're mostly working with scrape and crawl.
The other difference is simplicity. The MCP server above is one file. You can read every line, understand exactly what it does, and modify it for your needs. No black-box SDK wrapping another SDK.
Going further: adding more tools
The server above covers the basics, but Last Crawler's API has more endpoints you could expose:
typescript
// Extract all links from a page
server.tool(
"extract_links",
"Get all links from a webpage",
{ url: z.string().url() },
async ({ url }) => {
const res = await callApi("links", { url });
const data = await res.json();
return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
}
);
// Crawl multiple pages
server.tool(
"crawl_site",
"Crawl a website starting from a URL",
{
url: z.string().url(),
max_pages: z.number().optional().default(10),
},
async ({ url, max_pages }) => {
const res = await callApi("crawl", { url, max_pages });
const data = await res.json();
return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
}
);
You could also add authentication handling, caching, rate limiting, or request queuing depending on your use case. The MCP SDK supports all of these patterns.
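As one example of those patterns, here is a sketch of a small in-memory TTL cache you could wrap around callApi so repeated tool calls for the same URL don't re-hit the API. The Fetcher signature is simplified for illustration (it returns a string instead of a Response):

```typescript
// A minimal in-memory TTL cache around an async fetcher.
// `Fetcher` stands in for callApi; the signature is simplified to return
// a string for illustration.
type Fetcher = (endpoint: string, body: object) => Promise<string>;

function withCache(fetcher: Fetcher, ttlMs: number): Fetcher {
  const cache = new Map<string, { value: string; expires: number }>();
  return async (endpoint, body) => {
    const key = `${endpoint}:${JSON.stringify(body)}`;
    const hit = cache.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // fresh cache hit
    const value = await fetcher(endpoint, body);
    cache.set(key, { value, expires: Date.now() + ttlMs });
    return value;
  };
}

// Usage sketch with a fake fetcher that counts calls:
let calls = 0;
const fake: Fetcher = async () => { calls++; return "result"; };
const cached = withCache(fake, 60_000);
await cached("markdown", { url: "https://example.com" });
await cached("markdown", { url: "https://example.com" }); // served from cache
console.log(calls); // 1
```

For a stdio MCP server the cache lives as long as the process, which is typically one editor session -- usually exactly the window where an agent re-requests the same page.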
FAQ
What is an MCP server?
An MCP server is a lightweight program that exposes tools to AI applications using the Model Context Protocol standard. It communicates via JSON-RPC, typically over stdio. Any AI client that supports MCP -- like Claude Desktop, Cursor, Windsurf, or Cline -- can discover and call these tools automatically.
Do I need to run a browser locally?
No. Last Crawler handles browser rendering on the server side. Your MCP server is just making HTTP requests -- it's a thin wrapper around API calls. No Puppeteer, no Playwright, no Chrome installation needed.
Can I use this with Python instead of TypeScript?
Yes. The MCP SDK has a Python package (mcp). The same pattern applies -- define tools, call the API, return results. Here's the equivalent Python setup:
bash
pip install mcp httpx
python
from mcp.server.fastmcp import FastMCP
import httpx
import os

mcp = FastMCP("web-scraper")

API_BASE = "https://lastcrawler.xyz/api"
API_KEY = os.environ["LASTCRAWLER_API_KEY"]
@mcp.tool()
async def extract_json(url: str, schema: dict) -> str:
"""Extract structured data from a URL using a JSON schema."""
async with httpx.AsyncClient() as client:
res = await client.post(
f"{API_BASE}/json",
json={"url": url, "schema": schema},
headers={"Authorization": f"Bearer {API_KEY}"},
)
return res.text
@mcp.tool()
async def extract_markdown(url: str) -> str:
"""Convert a webpage to clean markdown."""
async with httpx.AsyncClient() as client:
res = await client.post(
f"{API_BASE}/markdown",
json={"url": url},
headers={"Authorization": f"Bearer {API_KEY}"},
)
    return res.text

if __name__ == "__main__":
    mcp.run()
How many requests can I make?
Depends on your Last Crawler plan. The API handles rate limiting and retries on its end, so your MCP server doesn't need to worry about it. For most development use -- calling a few URLs per conversation -- you won't come close to any limits.
Can the AI call these tools automatically?
Yes, that's the whole point of MCP. The AI reads the tool descriptions, decides when to call them based on the conversation context, and invokes them with the right parameters. You don't need to tell it to "use the extract_json tool" -- just ask it to "get the pricing data from this URL" and it figures out which tool to use.
Is this secure?
Your API key stays in the MCP server config -- it's never sent to the AI model. The AI only sees the tool descriptions and results. The MCP server runs locally on your machine, and all API calls go directly from your machine to Last Crawler's servers over HTTPS.
Last Crawler