Firecrawl
🔥 The API to search, scrape, and interact with the web for AI. Three integrated capabilities (Search, Scrape, and Interact) are exposed through one API. Open source under AGPL-3.0 and self-hostable via docker-compose, the engine also powers the firecrawl.dev cloud SaaS run by the same team.
git clone https://github.com/firecrawl/firecrawl && cd firecrawl && docker compose up
pip install firecrawl-py
npm install @mendable/firecrawl-js
What it does
The infrastructure for "clean, LLM-ready data" from the live web is a real bottleneck for AI agents and RAG pipelines. General scrapers leave you to handle JavaScript rendering, complex markup, robots.txt, and multi-step interactions yourself, and the output rarely lands in a shape that an LLM can consume directly.
Firecrawl bundles that infrastructure into one API. Quoting firecrawl.dev: "the infrastructure layer that helps AI find, read, and act on the live web." Output is returned as LLM-ready markdown or structured data from the start.
Key features: three integrated capabilities
- Search (web search): Run a query and get search results, with optional content extraction for each hit in the same call.
- Scrape (page to clean data): Extract a single URL into JSON, markdown, or branding formats. JavaScript rendering and complex markup are handled automatically.
- Interact (page automation): Automate clicks, typing, and navigation to reach content that static scraping cannot.
Additional endpoints include Agent (autonomous multi-source research), Crawl (multi-page extraction with depth and page limits), Map (discover indexed URLs on a site), and Batch Scrape (parallel processing of many URLs).
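To make the Scrape capability more concrete, here is a minimal sketch of what a scrape request body might look like. The field names `url` and `formats`, and the helper `build_scrape_payload`, are assumptions made for illustration and are not confirmed by this page; check the official API reference before relying on them.

```python
import json
from urllib.parse import urlparse


def build_scrape_payload(url: str, formats=("markdown",)) -> str:
    """Serialize a JSON body asking for one URL in the given output formats.

    Hypothetical helper: "url" and "formats" are assumed field names.
    """
    parsed = urlparse(url)
    # Reject relative or scheme-less input early, before any request is made.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    return json.dumps({"url": url, "formats": list(formats)})


if __name__ == "__main__":
    print(build_scrape_payload("https://example.com"))
```

The sketch only builds the body; sending it would follow the same pattern as the search call in the Usage section.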
Cloud vs Open Source
| Aspect | Open Source (this repo) | Cloud (firecrawl.dev) |
|---|---|---|
| Operator | You | Firecrawl team |
| License | AGPL-3.0 (SDKs / some UI = MIT) | SaaS terms |
| Extra features | Core engine | Additional cloud-only features (see README comparison) |
| Cost | Your infra cost | 1,000 credits/month free + paid plans |
| Data control | Full self-control | Routed through Firecrawl infrastructure |
| Best for | Strict data residency, cost or customization control | Fast start without infrastructure overhead |
SDKs
| Language | Install |
|---|---|
| Python | pip install firecrawl-py |
| Node.js | npm install @mendable/firecrawl-js |
| Java | JitPack via Gradle / Maven (com.github.firecrawl:firecrawl-java-sdk:2.0) |
| Elixir | {:firecrawl, "~> 1.0"} |
| Rust | firecrawl = "2" |
A community Go SDK is linked separately in the README.
Usage
Cloud (fastest start): generate an API key at firecrawl.dev and call the API directly.
curl -X POST 'https://api.firecrawl.dev/v2/search' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{"query": "firecrawl", "limit": 5}'
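The same search call can be sketched in Python using only the standard library. The endpoint, headers, and body mirror the curl command above; the function names `build_search_request` and `run_search` are hypothetical, and the shape of the JSON response is not described on this page, so the sketch returns it undecoded beyond `json.loads`.

```python
import json
import urllib.request

API_BASE = "https://api.firecrawl.dev"  # cloud endpoint taken from the curl example


def build_search_request(api_key: str, query: str, limit: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST request to the v2 search endpoint."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/v2/search",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_search(api_key: str, query: str, limit: int = 5) -> dict:
    """Send the request and parse the JSON response (needs network access and a valid key)."""
    request = build_search_request(api_key, query, limit)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the first function easy to inspect or unit-test without touching the network. In practice the official `firecrawl-py` SDK is likely the more convenient route.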
Self-host: use the docker-compose stack at the repo root.
git clone https://github.com/firecrawl/firecrawl
cd firecrawl
docker compose up
See SELF_HOST.md in the repo for environment setup and dependencies.
From Claude Code: use the Firecrawl MCP. Point it at a self-hosted instance via FIRECRAWL_API_URL to keep the cloud out of the loop entirely.
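The choice between cloud and self-hosted targets can be sketched as a small lookup over the two environment variables named on this page. This is an illustration only, not how the MCP server is implemented: the helper `resolve_firecrawl_target`, the rule that `FIRECRAWL_API_URL` takes priority, and the idea that a self-hosted instance may run without a key are all assumptions.

```python
import os

CLOUD_API_URL = "https://api.firecrawl.dev"


def resolve_firecrawl_target(env=None):
    """Return (base_url, api_key), preferring a self-hosted URL when one is set.

    Hypothetical helper; the precedence rule is an assumption for illustration.
    """
    env = os.environ if env is None else env
    base_url = env.get("FIRECRAWL_API_URL", "").strip().rstrip("/")
    api_key = env.get("FIRECRAWL_API_KEY", "").strip() or None
    if base_url:
        # Self-hosted instance: traffic never reaches the cloud endpoint.
        return base_url, api_key
    if api_key is None:
        raise ValueError(
            "Set FIRECRAWL_API_URL (self-hosted) or FIRECRAWL_API_KEY (cloud)"
        )
    return CLOUD_API_URL, api_key
```

Passing a plain dictionary instead of `os.environ` makes the behavior easy to check, for example `resolve_firecrawl_target({"FIRECRAWL_API_KEY": "fc-YOUR_API_KEY"})` selects the cloud endpoint.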
Notes
- AGPL-3.0 has real obligations: review the copyleft terms before integrating the engine source into a commercial product. Simply calling the API as a client (via MCP or SDK) is generally unaffected.
- SDKs and some UI components are MIT: this is explicit in the README. Client-side integration uses only the MIT-licensed parts.
- robots.txt respected by default: the README states "Firecrawl respects robots.txt by default," and adds: "It is the sole responsibility of end users to respect websites' policies when scraping."
- Adoption: firecrawl.dev cites over one million signups and customers including Apple, Canva, and Lovable.
- Actively maintained: near-daily commits since the first commit in April 2024.
Frequently Asked Questions
What is Firecrawl?
Quoting the README: "The API to search, scrape, and interact with the web for AI." It is a full-stack backend service written in TypeScript, Python, Rust, and Java that powers the [firecrawl.dev](https://firecrawl.dev) cloud SaaS and is also open source under AGPL-3.0 for anyone to self-host.
Is it open source? What's the license?
Yes, it is published on GitHub under AGPL-3.0. From the README: "This project is primarily licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). The SDKs and some UI components are licensed under the MIT License." In short, the core engine is AGPL; SDKs and some UI components are MIT.
How does it relate to firecrawl.dev?
firecrawl.dev is the cloud SaaS run by the same Firecrawl team: a hosted version of this engine with additional cloud-only features (see the README's "Open Source vs Cloud" comparison). The free plan starts at 1,000 credits per month.
How do I self-host it?
Use the `docker-compose.yaml` in the repo root and follow the `SELF_HOST.md` guide. It runs as a containerized stack (with services like Redis as dependencies). It is not a single `docker run`, but it is lighter than a bare-metal deployment.
Which SDKs are available?
Officially supported: Python (`firecrawl-py`), Node.js (`@mendable/firecrawl-js`), Java (Gradle/Maven via JitPack), Elixir (`firecrawl`), and Rust (`firecrawl`). A community Go SDK is also linked in the README.
How do I use it from Claude Code?
Through the [Firecrawl MCP](/en/tools/firecrawl-mcp/). The MCP server can target either the cloud (`FIRECRAWL_API_KEY`) or a self-hosted instance (`FIRECRAWL_API_URL`), so you can use your own deployment from inside Claude as well.