diff --git a/README.md b/README.md
index 17ba373..205ff3f 100644
--- a/README.md
+++ b/README.md
@@ -215,8 +215,6 @@
 curl -X POST https://api.firecrawl.dev/v0/scrape \
 ```
 
-Coming soon to the Langchain and LLama Index integrations.
-
 ## Using Python SDK
 
 ### Installing Python SDK
@@ -250,7 +248,7 @@
 scraped_data = app.scrape_url(url)
 
 ### Extracting structured data from a URL
 
-With LLM extraction, you can easily extract structured data from any URL. We support pydantic schemas to make it easier for you too. Here is how you to use it:
+With LLM extraction, you can easily extract structured data from any URL. We support pydantic schemas to make it easier for you too. Here is how to use it:
 
 ```python
 class ArticleSchema(BaseModel):
@@ -283,6 +281,125 @@
 query = 'What is Mendable?'
 search_result = app.search(query)
 ```
 
+## Using the Node SDK
+
+### Installation
+
+To install the Firecrawl Node SDK, you can use npm:
+
+```bash
+npm install @mendable/firecrawl-js
+```
+
+### Usage
+
+1. Get an API key from [firecrawl.dev](https://firecrawl.dev)
+2. Set the API key as an environment variable named `FIRECRAWL_API_KEY` or pass it as a parameter to the `FirecrawlApp` class, as in the sketch below.
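+
+A minimal initialization sketch (it reuses the `FirecrawlApp` import and constructor from the extraction example further below; the environment-variable fallback is the behavior described in step 2, and `fc-YOUR_API_KEY` is a placeholder):
+
+```js
+import FirecrawlApp from "@mendable/firecrawl-js";
+
+// Pass the key explicitly, or omit it to fall back to the
+// FIRECRAWL_API_KEY environment variable described above.
+const app = new FirecrawlApp({ apiKey: "fc-YOUR_API_KEY" });
+```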
+
+### Scraping a URL
+
+To scrape a single URL with error handling, use the `scrapeUrl` method. It takes the URL as a parameter and returns the scraped data as an object.
+
+```js
+try {
+  const url = 'https://example.com';
+  const scrapedData = await app.scrapeUrl(url);
+  console.log(scrapedData);
+} catch (error) {
+  console.error('Error occurred while scraping:', error.message);
+}
+```
+
+### Crawling a Website
+
+To crawl a website, use the `crawlUrl` method. It takes the starting URL and optional parameters as arguments. The `params` argument allows you to specify additional options for the crawl job, such as the maximum number of pages to crawl, allowed domains, and the output format. When `waitUntilDone` is `true`, the call waits for the crawl job to finish before returning; otherwise it returns a job ID that you can pass to `checkCrawlStatus` (see below).
+
+```js
+const crawlUrl = 'https://example.com';
+const params = {
+  crawlerOptions: {
+    excludes: ['blog/'],
+    includes: [], // leave empty for all pages
+    limit: 1000,
+  },
+  pageOptions: {
+    onlyMainContent: true
+  }
+};
+const waitUntilDone = true;
+const timeout = 5;
+const crawlResult = await app.crawlUrl(
+  crawlUrl,
+  params,
+  waitUntilDone,
+  timeout
+);
+```
+
+### Checking Crawl Status
+
+To check the status of a crawl job, use the `checkCrawlStatus` method. It takes the job ID as a parameter and returns the current status of the crawl job.
+
+```js
+const status = await app.checkCrawlStatus(jobId);
+console.log(status);
+```
+
+### Extracting structured data from a URL
+
+With LLM extraction, you can easily extract structured data from any URL. We support zod schemas to make it easier for you too. Here is how to use it:
+
+```js
+import FirecrawlApp from "@mendable/firecrawl-js";
+import { z } from "zod";
+
+const app = new FirecrawlApp({
+  apiKey: "fc-YOUR_API_KEY",
+});
+
+// Define schema to extract contents into
+const schema = z.object({
+  top: z
+    .array(
+      z.object({
+        title: z.string(),
+        points: z.number(),
+        by: z.string(),
+        commentsURL: z.string(),
+      })
+    )
+    .length(5)
+    .describe("Top 5 stories on Hacker News"),
+});
+
+const scrapeResult = await app.scrapeUrl("https://news.ycombinator.com", {
+  extractorOptions: { extractionSchema: schema },
+});
+
+console.log(scrapeResult.data["llm_extraction"]);
+```
+
+### Search for a query
+
+With the `search` method, you can search for a query in a search engine and get the top results along with the page content for each result. The method takes the query as a parameter and returns the search results.
+
+```js
+const query = 'what is mendable?';
+const searchResults = await app.search(query, {
+  pageOptions: {
+    fetchPageContent: true // Fetch the page content for each search result
+  }
+});
+```
+
 ## Contributing
 
 We love contributions! Please read our [contributing guide](CONTRIBUTING.md) before submitting a pull request.