API Documentation - Web Scraper Online

📘 Introduction

Welcome to the Web Scraper Online API documentation. This API allows you to programmatically scrape websites, manage jobs, and retrieve statistics.

Base URL

https://cloner.gtechgroup.it/api/v1

Authentication

All API requests require authentication using an API key. Include your API key in one of two ways:

Header: X-API-Key: YOUR_API_KEY
Query Parameter: ?api_key=YOUR_API_KEY

Note: Contact the administrator to obtain an API key. The default rate limit is 60 requests per minute per key.
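The two authentication options can be sketched in Python as follows. This is a minimal illustration using only the standard library; the helper name `build_request` and its `use_header` flag are hypothetical, and the request is only constructed, not sent.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://cloner.gtechgroup.it/api/v1"


def build_request(path, api_key, params=None, use_header=True):
    """Build (but do not send) an authenticated GET request.

    use_header=True  -> key goes in the X-API-Key header
    use_header=False -> key goes in the api_key query parameter
    """
    query = dict(params or {})
    headers = {}
    if use_header:
        headers["X-API-Key"] = api_key
    else:
        query["api_key"] = api_key
    url = BASE_URL + path
    if query:
        url += "?" + urlencode(query)
    return urllib.request.Request(url, headers=headers, method="GET")
```

The returned object can be passed to `urllib.request.urlopen` to perform the call.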

📍 Endpoints

GET /info

Get API information and available endpoints.

Request

curl -X GET "https://cloner.gtechgroup.it/api/v1/info" \
     -H "X-API-Key: YOUR_API_KEY"

Response (200 OK)

{
  "success": true,
  "message": "API information",
  "data": {
    "version": "1.0.11",
    "name": "Web Scraper Online API",
    "endpoints": { ... },
    "rate_limit": "60 req/min"
  },
  "timestamp": 1705824000,
  "api_version": "v1"
}

POST /scrape

Create a new scraping job. By default the job is queued and processed asynchronously.

Request Body

Parameter	Type	Required	Description
`url`	string	Yes	The URL to scrape
`depth`	integer	No	Crawling depth (0-5, default: 2)
`max_pages`	integer	No	Maximum pages to scrape (default: 100)
`html_only`	boolean	No	Download only HTML files (default: false)
`async`	boolean	No	Async mode (default: true)
`filters`	object	No	Advanced filters (presets, patterns, etc.)

Request Example

curl -X POST "https://cloner.gtechgroup.it/api/v1/scrape" \
     -H "X-API-Key: YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "url": "https://example.com",
       "depth": 2,
       "max_pages": 50,
       "html_only": false,
       "async": true,
       "filters": {
         "presets": ["no_tracking"],
         "exclude_patterns": ["/admin", "/login"]
       }
     }'

Response (202 Accepted)

{
  "success": true,
  "message": "Job created",
  "data": {
    "job_id": "job_20250120123045_a1b2c3d4",
    "status": "pending",
    "status_url": "https://cloner.gtechgroup.it/api/v1/jobs/job_20250120123045_a1b2c3d4",
    "message": "Job queued for processing. Check status_url for updates."
  },
  "timestamp": 1705824000,
  "api_version": "v1"
}
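As a sketch of how a client might assemble the request body before sending it, the helper below applies the documented defaults and the documented 0-5 range for `depth`. The function name `build_scrape_payload` is hypothetical, and the argument is called `run_async` only because `async` is a reserved word in Python; the validation is a client-side convenience, not a description of what the server checks.

```python
import json


def build_scrape_payload(url, depth=2, max_pages=100, html_only=False,
                         run_async=True, filters=None):
    """Return the JSON-serializable body for POST /scrape."""
    if not url:
        raise ValueError("url is required")
    if not 0 <= depth <= 5:
        raise ValueError("depth must be between 0 and 5")
    payload = {
        "url": url,
        "depth": depth,
        "max_pages": max_pages,
        "html_only": html_only,
        "async": run_async,  # sent under the documented key name
    }
    if filters is not None:
        payload["filters"] = filters
    return payload


def encode_body(payload):
    """Encode the payload as UTF-8 JSON bytes for the request body."""
    return json.dumps(payload).encode("utf-8")
```

The encoded bytes would be sent with `Content-Type: application/json`, as in the curl example above.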

GET /jobs

List all jobs with optional filtering.

Query Parameters

Parameter	Type	Description
`limit`	integer	Results per page (default: 100)
`offset`	integer	Pagination offset (default: 0)
`status`	string	Filter by status (pending, running, completed, failed)
`url`	string	Filter by URL pattern

Request

curl -X GET "https://cloner.gtechgroup.it/api/v1/jobs?limit=10&status=completed" \
     -H "X-API-Key: YOUR_API_KEY"

Response (200 OK)

{
  "success": true,
  "message": "Jobs retrieved",
  "data": {
    "jobs": [ ... ],
    "count": 10,
    "limit": 10,
    "offset": 0
  },
  "timestamp": 1705824000,
  "api_version": "v1"
}
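The `limit` and `offset` parameters can be combined to walk through all jobs page by page. The sketch below is one way to do that; `jobs_query` and `iter_jobs` are hypothetical helpers, and `fetch_page` stands in for whatever function performs the HTTP call and returns the `data` object of the response. It assumes that a page containing fewer than `limit` jobs is the last one, which the source does not state explicitly.

```python
from urllib.parse import urlencode

VALID_STATUSES = {"pending", "running", "completed", "failed"}


def jobs_query(limit=100, offset=0, status=None, url=None):
    """Build the path and query string for GET /jobs."""
    params = {"limit": limit, "offset": offset}
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError("unknown status: %r" % status)
        params["status"] = status
    if url is not None:
        params["url"] = url
    return "/jobs?" + urlencode(params)


def iter_jobs(fetch_page, limit=100):
    """Yield every job, requesting one page at a time.

    fetch_page(limit, offset) must return the "data" object of a
    GET /jobs response, i.e. a dict with a "jobs" list.
    """
    offset = 0
    while True:
        jobs = fetch_page(limit, offset)["jobs"]
        yield from jobs
        if len(jobs) < limit:  # assumed end-of-results signal
            break
        offset += limit
```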

GET /jobs/{id}

Get details of a specific job.

Request

curl -X GET "https://cloner.gtechgroup.it/api/v1/jobs/job_20250120123045_a1b2c3d4" \
     -H "X-API-Key: YOUR_API_KEY"

Response (200 OK)

{
  "success": true,
  "message": "Job retrieved",
  "data": {
    "id": "job_20250120123045_a1b2c3d4",
    "url": "https://example.com",
    "status": "completed",
    "created_at": 1705824000,
    "started_at": 1705824010,
    "completed_at": 1705824125,
    "duration": 115,
    "result": {
      "zip_file": "example_com_20250120123045.zip",
      "zip_size": 5242880
    },
    "stats": {
      "pages_downloaded": 25,
      "resources_downloaded": 150,
      "bytes_downloaded": 5242880,
      "duplicates_skipped": 3,
      "errors": 0
    }
  },
  "timestamp": 1705824125,
  "api_version": "v1"
}
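Because jobs are processed asynchronously, a client typically polls this endpoint until the job finishes. The following is a minimal polling sketch under two assumptions: `completed` and `failed` are treated as the terminal statuses, and `get_job` is a hypothetical caller-supplied function that returns the `data` object for a job ID. The interval and timeout values are illustrative.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}  # assumed terminal states


def wait_for_job(get_job, job_id, interval=5.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_job(job_id) until the job reaches a terminal status.

    Returns the final job dict, or raises TimeoutError if the job is
    still pending or running after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        job = get_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError("job %s did not finish in time" % job_id)
        sleep(interval)
```

Keeping `interval` at a few seconds or more helps a polling client stay well within the 60 requests per minute limit.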

DELETE /jobs/{id}

Delete a specific job.

Request

curl -X DELETE "https://cloner.gtechgroup.it/api/v1/jobs/job_20250120123045_a1b2c3d4" \
     -H "X-API-Key: YOUR_API_KEY"

Response (200 OK)

{
  "success": true,
  "message": "Job deleted",
  "data": {
    "job_id": "job_20250120123045_a1b2c3d4"
  },
  "timestamp": 1705824200,
  "api_version": "v1"
}

GET /stats

Get statistics for a given period.

Query Parameters

Parameter	Values	Description
`period`	today, week, month, year, all	Time period for statistics (default: all)

Request

curl -X GET "https://cloner.gtechgroup.it/api/v1/stats?period=month" \
     -H "X-API-Key: YOUR_API_KEY"

Response (200 OK)

{
  "success": true,
  "message": "Statistics retrieved",
  "data": {
    "total_jobs": 150,
    "completed": 135,
    "failed": 10,
    "running": 3,
    "pending": 2,
    "total_pages": 3500,
    "total_resources": 25000,
    "total_bytes": 524288000,
    "avg_duration": 95.5,
    "success_rate": 90.0,
    "top_domains": { ... },
    "top_cms": { ... }
  },
  "timestamp": 1705824300,
  "api_version": "v1"
}
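The sample values above are internally consistent: the four status counts add up to `total_jobs`, and 135 completed out of 150 gives the reported 90.0. The sketch below recomputes those figures on the client side. The formula for `success_rate` is inferred from the sample numbers rather than documented, and `summarize_stats` is a hypothetical helper.

```python
def summarize_stats(data):
    """Derive a few convenience figures from the GET /stats "data" object."""
    total = data["total_jobs"]
    by_status = sum(data[k] for k in ("completed", "failed", "running", "pending"))
    # Inferred formula: percentage of all jobs that completed.
    rate = round(100 * data["completed"] / total, 1) if total else 0.0
    return {
        "counts_consistent": by_status == total,
        "success_rate": rate,
        "total_mib": data["total_bytes"] / (1024 * 1024),
    }
```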

⚠️ Error Codes

Code	Error	Description
400	Bad Request	Invalid request parameters
401	Unauthorized	Missing or invalid API key
404	Not Found	Endpoint or resource not found
405	Method Not Allowed	HTTP method not supported for this endpoint
429	Too Many Requests	Rate limit exceeded
500	Internal Server Error	Unexpected server error

Error Response Format

{
  "success": false,
  "message": "Error message",
  "error": {
    "error": "ERROR_CODE",
    "details": "Additional details..."
  },
  "timestamp": 1705824400,
  "api_version": "v1"
}
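Since success and error responses share the same envelope, a client can unwrap both in one place. The sketch below shows one possible approach; `ApiError` and `unwrap` are hypothetical names, and the code relies only on the fields shown in the response formats above.

```python
import json


class ApiError(Exception):
    """Raised when the response envelope has "success": false."""

    def __init__(self, status, message, code=None, details=None):
        super().__init__("%s (HTTP %s)" % (message, status))
        self.status = status
        self.message = message
        self.code = code
        self.details = details


def unwrap(status, body):
    """Return the "data" object of a successful response, else raise ApiError.

    status is the HTTP status code; body is the raw JSON text.
    """
    envelope = json.loads(body)
    if envelope.get("success"):
        return envelope.get("data")
    err = envelope.get("error") or {}
    raise ApiError(status, envelope.get("message"),
                   code=err.get("error"), details=err.get("details"))
```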

⏱️ Rate Limiting

API requests are rate-limited per API key. The default limit is 60 requests per minute.

Tip: When the limit is exceeded, the API returns a 429 Too Many Requests response with a Retry-After header giving the number of seconds to wait until the limit resets.
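A client can honor the Retry-After header with a small retry wrapper. The sketch below assumes the header value is a plain number of seconds, as described above, and falls back to a default delay if it is missing or unparseable. `retry_delay` and `send_with_retry` are hypothetical helpers, and `send` stands in for any function that performs one request and returns `(status, headers, body)`.

```python
import time


def retry_delay(status, headers, default=1.0):
    """Return seconds to wait before retrying, or None if no retry is needed."""
    if status != 429:
        return None
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default  # header missing or not a number


def send_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it is not rate limited or attempts run out.

    Returns the last (status, headers, body) tuple received.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        delay = retry_delay(status, headers)
        if delay is None or attempt == max_attempts:
            return status, headers, body
        sleep(delay)
```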

🔍 Advanced Filters

Use advanced filters to control what gets scraped:

Available Presets

html_only - Download only HTML files
images_only - Download only images
no_images - Exclude all images
no_videos - Exclude all videos
no_media - Exclude images, videos, and audio
essential_only - Only HTML, CSS, JS up to 5MB
no_tracking - Exclude analytics and tracking scripts

Custom Filters Example

{
  "url": "https://example.com",
  "filters": {
    "presets": ["no_tracking", "no_videos"],
    "exclude_patterns": ["/admin/*", "/wp-admin/*", "*.pdf"],
    "include_patterns": ["/blog/*", "/docs/*"],
    "max_size": "10MB",
    "blocked_extensions": ["zip", "exe", "dmg"]
  }
}
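To catch typos in preset names before a request is sent, a client could build the `filters` object with a small helper like the one below. `build_filters` is a hypothetical name; the preset list is copied from the section above, and the check is a client-side convenience rather than a description of server behavior. Pattern syntax and the `max_size` format are passed through unchanged because the source does not specify them further.

```python
KNOWN_PRESETS = {
    "html_only", "images_only", "no_images", "no_videos",
    "no_media", "essential_only", "no_tracking",
}


def build_filters(presets=(), exclude_patterns=(), include_patterns=(),
                  max_size=None, blocked_extensions=()):
    """Return a "filters" object containing only the options that were set."""
    unknown = [p for p in presets if p not in KNOWN_PRESETS]
    if unknown:
        raise ValueError("unknown presets: %s" % ", ".join(unknown))
    filters = {}
    if presets:
        filters["presets"] = list(presets)
    if exclude_patterns:
        filters["exclude_patterns"] = list(exclude_patterns)
    if include_patterns:
        filters["include_patterns"] = list(include_patterns)
    if max_size is not None:
        filters["max_size"] = max_size
    if blocked_extensions:
        filters["blocked_extensions"] = list(blocked_extensions)
    return filters
```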

✅ Ready to start? Get your API key from the administrator and start integrating!
