Commit Graph

1124 Commits

Author SHA1 Message Date
Nicolas 2c1221750b Merge pull request #449 from mendableai/bugfix/malformed-url-sitemap
Added regex for links in sitemap
2024-07-24 20:37:35 -04:00
Nicolas 6ad7e24403 Update ingestion.tsx 2024-07-24 18:15:51 -04:00
Nicolas 92843a356d Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-07-24 18:13:36 -04:00
Nicolas 1e13ddbe8e Nick: changes to the ui component 2024-07-24 18:13:34 -04:00
Gergő Móricz 623b547292 fix(fly.toml): scale up memory limit 2024-07-24 23:39:00 +02:00
Nicolas 15890772be Scale bump 2024-07-24 16:56:19 -04:00
Eric Ciarla a4bccbe3bb Firecrawl UI Template
Firecrawl UI template
2024-07-24 15:05:55 -04:00
Eric Ciarla a62c0730c1 Delete package-lock.json 2024-07-24 15:00:19 -04:00
Eric Ciarla 4cb091ad05 Update .gitignore 2024-07-24 14:59:34 -04:00
Eric Ciarla 4596d0b2e6 Add ReadMe and LICENSE 2024-07-24 14:56:53 -04:00
Eric Ciarla 9654721bf2 Vite commit 2024-07-24 14:27:50 -04:00
rafaelsideguide cc98f83fda added failed and completed log events 2024-07-24 15:25:36 -03:00
Jakob Stadlhuber 2dc7be3869 Remove liveness and readiness probes from worker.yaml
This commit removes the liveness and readiness probes configuration from the Kubernetes worker manifest. Additionally, a Service definition for the worker application has been removed. These changes might be necessary to update the deployment strategy or simplify the configuration.
2024-07-24 19:38:54 +02:00
Jakob Stadlhuber d68f349109 Update Kubernetes YAMLs and add worker service
Refactored container configurations in worker, api, and playwright-service YAMLs to streamline syntax and add missing fields. Added a service definition for the worker component and included a new environment variable in the configmap for rate-limiting. These changes enhance configuration clarity and ensure proper resource definitions.
2024-07-24 19:31:37 +02:00
Jakob Stadlhuber f26bda2477 Update Docker build paths in Kubernetes setup README
Corrected relative paths for Docker build commands to ensure the appropriate directories are targeted. This fix is crucial for successful image builds and deployment consistency in the Kubernetes cluster setup.
2024-07-24 19:06:19 +02:00
Jakob Stadlhuber 895e80caa4 Add liveness and readiness probes to Kubernetes configs
Introduced liveness and readiness probes for the Playwright service, API, and worker components. This ensures that Kubernetes can better manage the health and availability of these services by periodically checking their endpoints. This enhancement will improve the robustness and reliability of the deployed applications.
2024-07-24 19:00:23 +02:00
Jakob Stadlhuber be9e7f9edf Update Kubernetes configs for playwright-service, api, and worker
Added new ConfigMap for playwright-service and adjusted existing references.
Applied imagePullPolicy: Always to ensure all images are updated promptly.
Updated README to include --no-cache for Docker build instructions.
2024-07-24 18:54:16 +02:00
Gergo Moricz 60c74357df feat(ScrapeEvents): log queue events 2024-07-24 18:44:14 +02:00
Jakob Stadlhuber 497aa5d25e Update Kubernetes configs for playwright-service, api, and worker
Added new ConfigMap for playwright-service and adjusted existing references.
Applied imagePullPolicy: Always to ensure all images are updated promptly.
Updated README to include --no-cache for Docker build instructions.
2024-07-24 17:55:45 +02:00
rafaelsideguide 4eca6bd301 fix/check-for-auth-on-scrape-log 2024-07-24 12:54:14 -03:00
Nicolas 4ead89f983 Merge pull request #453 from mendableai/nsc/notion-fix
Notion Website Fixes
2024-07-24 11:40:19 -04:00
Nicolas 3a1b8a9797 Update website_params.ts 2024-07-24 11:04:47 -04:00
Nicolas 8b48ec8d30 Update website_params.ts 2024-07-24 11:02:20 -04:00
Gergo Moricz 4d35ad073c feat(monitoring/scrape): include url, worker, response_size 2024-07-24 16:43:39 +02:00
Gergo Moricz 64bcedeefc fix(monitoring): bad success check on scrape 2024-07-24 16:21:59 +02:00
Gergo Moricz d57dbbd0c6 fix: add jobId for scrape 2024-07-24 15:18:12 +02:00
Gergo Moricz 71072fef3b fix(scrape-events): bad logic 2024-07-24 14:46:41 +02:00
Gergo Moricz 7cd9bf92e3 feat: scrape event logging to DB 2024-07-24 14:31:25 +02:00
Rafael Miller 5e728c1a4d Update apps/api/src/scraper/WebScraper/crawler.ts
no need for regex

Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2024-07-24 08:33:00 -03:00
Eric Ciarla 1b7a00624d Delete old comp 2024-07-23 21:51:08 -04:00
Eric Ciarla 565bc09439 Basic react app 2024-07-23 21:48:11 -04:00
rafaelsideguide 6208ecdbc0 added logger 2024-07-23 17:30:46 -03:00
Eric Ciarla a0d89169ed init 2024-07-23 15:48:12 -04:00
Nicolas f0b07b509b Update index.ts 2024-07-23 15:15:56 -04:00
rafaelsideguide a684bd3c5d added regex for links in sitemap 2024-07-23 09:07:23 -03:00
Nicolas 252bc09ee2 Merge pull request #447 from mendableai/nsc/speed-improvements
/scrape should now be 600ms-900ms faster
2024-07-22 19:18:24 -04:00
Nicolas ac692ef09c Update CONTRIBUTING.md 2024-07-22 19:17:53 -04:00
Nicolas 30e706b43f Update scrape.ts 2024-07-22 19:15:24 -04:00
Nicolas 8916fec66c Update index.ts 2024-07-22 19:14:53 -04:00
Nicolas 575ddc9e6e Update scrape.ts 2024-07-22 19:12:51 -04:00
Nicolas e31a5007d5 Nick: speed improvements 2024-07-22 18:30:58 -04:00
Nicolas 1bc36e1a56 Update fly-direct.yml 2024-07-22 14:12:55 -04:00
Nicolas b229fbebd8 Update scrape_log.ts 2024-07-19 12:53:26 -04:00
rafaelsideguide 5c02dbe20c fix(isFile): added .tiff extension 2024-07-18 17:07:21 -03:00
Gergo Moricz f0e95ce399 fix(WebCrawler): filter out file URLs when taking URLs from sitemap 2024-07-18 21:49:37 +02:00
Gergo Moricz 95c6c63b85 fix(fly): raise heap limit to 4G per process 2024-07-18 20:56:54 +02:00
Nicolas 5f14f4f788 Update blocklist.ts 2024-07-18 14:20:19 -04:00
Nicolas 6161b83890 Update scrape_log.ts 2024-07-18 14:17:08 -04:00
Nicolas c402c85346 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-07-18 14:16:51 -04:00
Nicolas 2dd7398aad Update scrape_log.ts 2024-07-18 14:16:46 -04:00