Firecrawl

🔥 The API to search, scrape, and interact with the web for AI. Three integrated capabilities — Search, Scrape, Interact — exposed through one API. Open source under AGPL-3.0 and self-hostable via docker-compose, the engine also powers the firecrawl.dev cloud SaaS run by the same team.

$ git clone https://github.com/firecrawl/firecrawl && cd firecrawl && docker compose up

What it does

Getting clean, LLM-ready data from the live web is a real bottleneck for AI agents and RAG pipelines. General-purpose scrapers leave you to handle JavaScript rendering, complex markup, robots.txt, and multi-step interactions yourself, and their output rarely arrives in a shape that an LLM can consume directly.

Firecrawl bundles that infrastructure into one API. Quoting firecrawl.dev: “the infrastructure layer that helps AI find, read, and act on the live web.” Output is returned as LLM-ready markdown or structured data from the start.

Key features — three integrated capabilities

  • Search — web search

    Run a query and get search results, with optional content extraction for each hit in the same call.

  • Scrape — page → clean data

    Extract a single URL into JSON, markdown, or branding formats. JavaScript rendering and complex markup are handled automatically.

  • Interact — page automation

    Automate clicks, typing, and navigation to reach content that static scraping cannot.

Additional endpoints include Agent (autonomous multi-source research), Crawl (multi-page extraction with depth and page limits), Map (discover indexed URLs on a site), and Batch Scrape (parallel processing of many URLs).
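
As a rough illustration of the Scrape capability described above, the sketch below builds a request body in Python. The `url` and `formats` field names are assumptions modeled on the style of the `/v2/search` example on this page, and `build_scrape_payload` is a hypothetical helper; check the API reference before relying on either.

```python
import json
from typing import List, Optional


def build_scrape_payload(url: str, formats: Optional[List[str]] = None) -> str:
    """Return a JSON body for a single-page scrape request.

    Assumption: the scrape endpoint accepts a "url" string and a
    "formats" list. Neither field name is confirmed by this page.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be absolute (http:// or https://)")
    # Default to markdown, the LLM-ready output the page describes.
    payload = {"url": url, "formats": formats or ["markdown"]}
    return json.dumps(payload)


print(build_scrape_payload("https://example.com"))
```

Keeping payload construction separate from the HTTP call makes it easy to validate inputs before any credits are spent.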

Cloud vs Open Source

| Aspect | Open Source (this repo) | Cloud (firecrawl.dev) |
| --- | --- | --- |
| Operator | You | Firecrawl team |
| License | AGPL-3.0 (SDKs / some UI = MIT) | SaaS terms |
| Extra features | Core engine | Additional cloud-only features (see README comparison) |
| Cost | Your infra cost | 1,000 credits/month free + paid plans |
| Data control | Full self-control | Routed through Firecrawl infrastructure |
| Best for | Strict data residency, cost or customization control | Fast start without infrastructure overhead |

SDKs

| Language | Install |
| --- | --- |
| Python | pip install firecrawl-py |
| Node.js | npm install @mendable/firecrawl-js |
| Java | JitPack via Gradle / Maven (com.github.firecrawl:firecrawl-java-sdk:2.0) |
| Elixir | {:firecrawl, "~> 1.0"} |
| Rust | firecrawl = "2" |

A community Go SDK is linked separately in the README.

Usage

Cloud (fastest start) — generate an API key at firecrawl.dev and call directly.

curl -X POST 'https://api.firecrawl.dev/v2/search' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"query": "firecrawl", "limit": 5}'
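
The same request can be sketched in Python using only the standard library. The endpoint, headers, and body mirror the curl call above; `build_search_request` is a hypothetical helper name, and the request is built but not sent so the example runs without a real key.

```python
import json
import urllib.request

API_BASE = "https://api.firecrawl.dev"  # cloud host from the curl example


def build_search_request(api_key: str, query: str, limit: int = 5) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/v2/search",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_search_request("fc-YOUR_API_KEY", "firecrawl")
print(request.get_method(), request.full_url)

# To send it with a real key:
# with urllib.request.urlopen(request, timeout=30) as response:
#     results = json.load(response)
```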

Self-host — use the docker-compose stack at the repo root.

git clone https://github.com/firecrawl/firecrawl
cd firecrawl
docker compose up

See SELF_HOST.md in the repo for environment setup and dependencies.

From Claude Code — use the Firecrawl MCP. Point it at a self-hosted instance via FIRECRAWL_API_URL to keep the cloud out of the loop entirely.
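
One way to picture the cloud versus self-hosted choice is a small resolver over two environment variables. `FIRECRAWL_API_URL` comes from the paragraph above and `FIRECRAWL_API_KEY` is the variable the MCP uses for cloud access; `resolve_firecrawl_target` and the `http://localhost:3002/` address are hypothetical illustrations, not the MCP server's actual logic.

```python
from typing import Mapping, Optional, Tuple

CLOUD_API_URL = "https://api.firecrawl.dev"  # host used in the curl example


def resolve_firecrawl_target(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (base_url, api_key) for a client.

    A self-hosted URL, when set, takes priority so that no request
    reaches the cloud; otherwise a cloud API key is required.
    """
    self_hosted = env.get("FIRECRAWL_API_URL", "").strip()
    api_key = env.get("FIRECRAWL_API_KEY") or None
    if self_hosted:
        return self_hosted.rstrip("/"), api_key
    if api_key is None:
        raise RuntimeError("Set FIRECRAWL_API_URL or FIRECRAWL_API_KEY")
    return CLOUD_API_URL, api_key


# Hypothetical self-hosted address; substitute your own deployment.
print(resolve_firecrawl_target({"FIRECRAWL_API_URL": "http://localhost:3002/"}))
```

In practice you would pass `os.environ` as the mapping; taking it as a parameter keeps the function easy to test.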

Notes

  • AGPL-3.0 has real obligations — review the copyleft terms before integrating the engine source into a commercial product. Simply calling the API as a client (via MCP or SDK) is generally unaffected.
  • SDKs and some UI components are MIT — explicit in the README. Client-side integration draws only the MIT-licensed parts.
  • robots.txt respected by default — README quote: “Firecrawl respects robots.txt by default,” and: “It is the sole responsibility of end users to respect websites’ policies when scraping.”
  • Adoption — firecrawl.dev cites over one million signups and customers including Apple, Canva, and Lovable.
  • Actively maintained — near-daily commits since the first commit in April 2024.
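
Because respecting site policies is ultimately the caller's responsibility, some users may want a client-side check before submitting a URL. The sketch below uses Python's standard `urllib.robotparser` on an inline, made-up robots.txt; `is_allowed` and the `my-agent` user agent are hypothetical, and the check is independent of whatever Firecrawl does internally.

```python
from urllib.robotparser import RobotFileParser

# Made-up policy for illustration; a real check would fetch the site's robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "my-agent", "https://example.com/docs"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-agent", "https://example.com/private/report"))
```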

See also

  • [MCP] Hugging Face · Hugging Face's official remote MCP server, hosted by Hugging Face. Search models, datasets, Spaces, papers, and docs in natural language and call Gradio Space tools from Claude Code. (claudekit.io/tools/huggingface)

  • [Skill] Last30Days · An AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web, then synthesizes a grounded summary. The engine scores results by actual engagement metrics (upvotes, likes, view counts, prediction market odds) rather than editorial authority, resolves entities first (handles, subreddits, GitHub repos, hashtags), runs parallel multi-source queries, merges duplicate stories across platforms, and produces cited briefs. Reddit, HN, Polymarket, and GitHub work immediately with zero configuration. (claudekit.io/tools/last30days)

  • [MCP] Firecrawl MCP · 🔥 Official Firecrawl MCP Server. Adds powerful web scraping and search to Cursor, Claude and any other LLM clients. Exposes 12+ tools spanning single-page scrape, batch processing, site crawl, search, structured extraction, autonomous research agent, and interactive page automation, returning clean LLM-ready markdown. (claudekit.io/tools/firecrawl-mcp)

Frequently Asked Questions

What is Firecrawl?
Quoting the README: "The API to search, scrape, and interact with the web for AI." It is a full-stack backend service written in TypeScript, Python, Rust, and Java. The same engine powers the [firecrawl.dev](https://firecrawl.dev) cloud SaaS and is open source under AGPL-3.0 for anyone to self-host.

Is it open source? What's the license?
Yes. It is published on GitHub under AGPL-3.0. From the README: "This project is primarily licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). The SDKs and some UI components are licensed under the MIT License."

How does it relate to firecrawl.dev?
firecrawl.dev is the cloud SaaS run by the same Firecrawl team: a hosted version of this engine with additional cloud-only features (see the README's "Open Source vs Cloud" comparison). The free plan starts at 1,000 credits per month.

How do I self-host it?
Use the `docker-compose.yaml` in the repo root and follow the `SELF_HOST.md` guide. It runs as a containerized stack (with services like Redis as dependencies), so it is more than a single `docker run` but lighter than a bare-metal infrastructure deployment.

Which SDKs are available?
Officially supported: Python (`firecrawl-py`), Node.js (`@mendable/firecrawl-js`), Java (Gradle/Maven via JitPack), Elixir (`firecrawl`), and Rust (`firecrawl`). A community Go SDK is also linked in the README.

How do I use it from Claude Code?
Through the [Firecrawl MCP](/en/tools/firecrawl-mcp/). The MCP server can target either the cloud (`FIRECRAWL_API_KEY`) or a self-hosted instance (`FIRECRAWL_API_URL`), so you can use your own deployment from inside Claude as well.