lastcrawler.xyz

2026-02-15

Developer Tools, APIs, Tutorial

Turn Any Website into an API and Extract Structured Data from Any URL

Not every data source has an API. Most don't. Government databases, real estate listings, academic papers, product catalogs, job boards — the data is on the web, behind HTML, and there's no structured way to access it.

Here's how to turn any of them into a typed API you can call from your code.

Step 1: Define your schema

Look at the page and decide what data you want. Then write it down as a schema: a JSON-shaped object whose values name the type of each field.

Say you want to extract job listings from a company's careers page:

typescript

const jobSchema = {
  jobs: [{
    title: "string",
    department: "string",
    location: "string",
    type: "string",       // full-time, part-time, contract
    posted_date: "string"
  }]
};

That's it. No inspecting the DOM, no writing CSS selectors, no figuring out which API endpoints the SPA calls internally. For a deeper look at schema-driven extraction, see our URL to JSON API guide.

Step 2: Call the endpoint

typescript

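const response = await fetch("https://api.lastcrawler.xyz/json", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_KEY",
    "Content-Type": "application/json" // the body below is JSON
  },
  body: JSON.stringify({
    url: "https://example.com/careers",
    schema: jobSchema
  })
});

// Fail loudly on HTTP errors instead of parsing an error body as data
if (!response.ok) throw new Error(`Extraction failed: HTTP ${response.status}`);

const { jobs } = await response.json();
// [{ title: "Senior Engineer", department: "Platform", ... }, ...]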

You now have structured data from a page that has no API.
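The call above assumes the request always succeeds. A minimal sketch of a reusable wrapper that surfaces HTTP failures is shown here, assuming the endpoint and request body from Step 2. The names `extractJson`, `ExtractionError`, and `FetchLike` are hypothetical and not part of any Last Crawler SDK, and the `Content-Type` header is an assumption about what the endpoint expects.

```typescript
// Hypothetical helper: names and error handling are illustrative assumptions.
// FetchLike is the small subset of fetch this sketch relies on, so a fake
// implementation can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

class ExtractionError extends Error {
  readonly status: number;

  constructor(status: number, targetUrl: string) {
    super(`Extraction of ${targetUrl} failed with HTTP ${status}`);
    this.name = "ExtractionError";
    this.status = status;
  }
}

async function extractJson(
  targetUrl: string,
  schema: unknown,
  apiKey: string,
  fetchImpl: FetchLike = fetch
): Promise<unknown> {
  const response = await fetchImpl("https://api.lastcrawler.xyz/json", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json", // assumed; the body is JSON
    },
    body: JSON.stringify({ url: targetUrl, schema }),
  });

  // Throw on non-2xx responses so callers never treat an error body as data.
  if (!response.ok) throw new ExtractionError(response.status, targetUrl);
  return response.json();
}
```

The result is deliberately typed as `unknown`, so the caller has to decide how to narrow it rather than trusting the response shape implicitly.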

Step 3: Add TypeScript types

Since the schema is static, you can define types and get full autocomplete:

typescript

interface Job {
  title: string;
  department: string;
  location: string;
  type: string;
  posted_date: string;
}

interface JobsResponse {
  jobs: Job[];
}

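// Use this in place of the untyped destructuring in Step 2:
// a response body can only be read once.
const data: JobsResponse = await response.json();
// Full autocomplete and compile-time type checking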
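Hand-written interfaces can drift from the schema they mirror. One option, sketched here, is to derive the type from the schema itself. This assumes schemas use only the `"string"` and `"number"` markers, one-element arrays, and nested objects seen in this post. `InferSchema` is a hypothetical helper written for this example, not something the API provides.

```typescript
// Hypothetical type-level helper: maps a schema literal to the data type it describes.
type InferSchema<S> =
  S extends "string" ? string :
  S extends "number" ? number :
  S extends readonly [infer Item] ? InferSchema<Item>[] :
  S extends object ? { -readonly [K in keyof S]: InferSchema<S[K]> } :
  never;

// `as const` keeps the literal "string" markers instead of widening them to string.
const jobSchema = {
  jobs: [{
    title: "string",
    department: "string",
    location: "string",
    type: "string",
    posted_date: "string",
  }],
} as const;

// Resolves to { jobs: { title: string; department: string; ... }[] }
type JobsResponse = InferSchema<typeof jobSchema>;

const sample: JobsResponse = {
  jobs: [{
    title: "Senior Engineer",
    department: "Platform",
    location: "Remote",
    type: "full-time",
    posted_date: "2026-02-01",
  }],
};

const titles = sample.jobs.map((job) => job.title);
console.log(titles);
```

With this approach, adding a field to the schema updates the type automatically. Like any TypeScript type, it is checked at compile time only and does not validate what the endpoint actually returns.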

Real examples

Real estate listings:

json

{
  "listings": [{
    "address": "string",
    "price": "number",
    "bedrooms": "number",
    "bathrooms": "number",
    "sqft": "number",
    "status": "string"
  }]
}

Academic paper metadata:

json

{
  "title": "string",
  "authors": ["string"],
  "abstract": "string",
  "published": "string",
  "citations": "number",
  "doi": "string"
}

Restaurant menu:

json

{
  "categories": [{
    "name": "string",
    "items": [{
      "name": "string",
      "price": "number",
      "description": "string",
      "dietary": ["string"]
    }]
  }]
}

Each of these takes 30 seconds to define and works on any site in the category. No site-specific code. If you need the raw page content instead of structured JSON, our URL to Markdown API is the right tool for that.
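Because the same schema format covers every example above, a single runtime check can guard all of them. The sketch below assumes schemas contain only `"string"` and `"number"` markers, one-element arrays, and nested objects. `Shape` and `conforms` are hypothetical names, and whether the extraction endpoint already guarantees conforming output is not something this post states.

```typescript
// Hypothetical descriptor type covering the schema examples in this post.
type Shape = "string" | "number" | Shape[] | { [key: string]: Shape };

// Returns true when `value` has every field the shape asks for, with the right type.
function conforms(shape: Shape, value: unknown): boolean {
  if (shape === "string") return typeof value === "string";
  if (shape === "number") return typeof value === "number" && Number.isFinite(value);

  if (Array.isArray(shape)) {
    if (!Array.isArray(value)) return false;
    const [itemShape] = shape;
    // An empty array shape places no constraint on the items.
    if (itemShape === undefined) return true;
    return value.every((item) => conforms(itemShape, item));
  }

  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const record = value as Record<string, unknown>;
  return Object.entries(shape).every(([key, fieldShape]) =>
    conforms(fieldShape, record[key])
  );
}

const menuShape: Shape = {
  categories: [{
    name: "string",
    items: [{
      name: "string",
      price: "number",
      description: "string",
      dietary: ["string"],
    }],
  }],
};

const goodMenu = {
  categories: [{
    name: "Mains",
    items: [{ name: "Ramen", price: 14.5, description: "Pork broth", dietary: [] }],
  }],
};

// Same menu, but the price came back as text instead of a number.
const badMenu = {
  categories: [{
    name: "Mains",
    items: [{ name: "Ramen", price: "14.50", description: "Pork broth", dietary: [] }],
  }],
};

console.log(conforms(menuShape, goodMenu), conforms(menuShape, badMenu));
```

Extra fields in the data are ignored here, and every field in the shape is treated as required. A stricter or more lenient policy would be a small change to the object branch.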

Turn any website into an API that survives redesigns

This is the real advantage over traditional scraping. The schema describes what data you want, not where it is on the page. When the site redesigns, switches frameworks, or restructures its DOM, your extraction keeps working, as long as the data itself is still on the page.

We've tested this across hundreds of site redesigns. Schemas hold up in ways that selector-based scraping can't. Under the hood, this is powered by a headless browser API running on trusted edge infrastructure that renders pages in real browsers before extraction.

Wrapping it in a proper API

For production use, wrap the extraction in a simple API route:

typescript

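// /api/jobs/[company].ts
// Assumes `crawler`, `COMPANY_URLS`, and `jobSchema` are defined or imported elsewhere.
export async function GET(
  req: Request,
  { params }: { params: { company: string } }
) {
  const url = COMPANY_URLS[params.company];
  // Unknown company slugs get a 404 instead of an extraction call on `undefined`
  if (!url) return Response.json({ error: "Unknown company" }, { status: 404 });

  const data = await crawler.json(url, { schema: jobSchema });
  return Response.json(data);
}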

Now you have /api/jobs/stripe, /api/jobs/vercel, /api/jobs/linear — all backed by live web data, all returning the same typed schema. No RSS feed required. No official API needed. Any website becomes your data source.
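A route like this triggers a fresh extraction on every request. Careers pages change slowly, so caching results for a few minutes may be worthwhile. The sketch below is one way to do that, assuming an in-memory cache is acceptable for your deployment. `withTtlCache` and `Loader` are hypothetical names, and the injectable `now` clock exists only to make the behavior easy to test.

```typescript
// Hypothetical in-memory TTL cache around any async, key-based loader.
type Loader<T> = (key: string) => Promise<T>;

function withTtlCache<T>(
  load: Loader<T>,
  ttlMs: number,
  now: () => number = Date.now
): Loader<T> {
  const entries = new Map<string, { value: T; expiresAt: number }>();

  return async (key) => {
    const hit = entries.get(key);
    // Serve the cached value while it is still fresh.
    if (hit !== undefined && hit.expiresAt > now()) return hit.value;

    // Otherwise reload and remember the result until the TTL elapses.
    const value = await load(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Usage sketch inside the route above (crawler, COMPANY_URLS, and jobSchema
// come from the surrounding article and are not defined here):
//   const cachedJobs = withTtlCache(
//     (company) => crawler.json(COMPANY_URLS[company], { schema: jobSchema }),
//     15 * 60 * 1000
//   );
//   const data = await cachedJobs(params.company);
```

Two caveats apply to this sketch: concurrent requests for the same uncached key will each trigger an extraction, and on serverless platforms the cache lives only as long as the instance does. A shared store would be needed to avoid both.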

FAQ

How do I turn a website into an API without an official API?

Define a JSON schema for the data you want, then POST the URL and schema to a structured extraction endpoint. You get back clean, typed JSON. Wrap that in a simple API route and you have a live endpoint backed by any webpage — no scraping infrastructure needed.

How do I extract structured data from any URL reliably?

Use schema-driven AI extraction rather than CSS selectors or XPath. Define the fields you want as a JSON schema, and the AI figures out where that data lives on the page. This approach works across site redesigns and framework changes because it understands content semantically, not structurally.

What kinds of websites can be turned into a structured data API?

Any publicly accessible page with readable content: e-commerce product pages, job boards, real estate listings, government databases, academic paper indexes, restaurant menus, pricing pages. If a human can read the data on the page, a schema-driven extractor can return it as structured JSON.

Last Crawler
