🚀 Web Scraper API Documentation

RESTful API v1.0.11 - Complete integration guide

📘 Introduction

Welcome to the Web Scraper Online API documentation. This API allows you to programmatically scrape websites, manage jobs, and retrieve statistics.

Base URL

https://cloner.gtechgroup.it/api/v1

Authentication

All API requests require authentication with an API key. Include your API key in one of two ways:

X-API-Key header: X-API-Key: YOUR_API_KEY (used in every example in this guide)

Note: Contact the administrator to obtain an API key. The default rate limit is 60 requests per minute.
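A minimal Python sketch of an authenticated request, using only the standard library. It sends the key in the X-API-Key header, as every curl example in this guide does, and only builds the request without sending it. The helper name make_request is hypothetical and not part of the API.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://cloner.gtechgroup.it/api/v1"


def make_request(path, api_key, method="GET", params=None, payload=None):
    """Build (but do not send) an authenticated request to the API.

    make_request is a hypothetical helper, not part of the API itself.
    """
    url = BASE_URL + path
    if params:
        # Query parameters such as limit/offset/status are URL-encoded.
        url += "?" + urllib.parse.urlencode(params)
    headers = {"X-API-Key": api_key}
    data = None
    if payload is not None:
        # Request bodies are JSON, matching the curl examples.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# To actually send it: urllib.request.urlopen(make_request("/info", "YOUR_API_KEY"))
```

Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so error bodies are read from the exception object.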

📍 Endpoints

GET /info

Get API information and available endpoints.

Request

curl -X GET "https://cloner.gtechgroup.it/api/v1/info" \
     -H "X-API-Key: YOUR_API_KEY"

Response (200 OK)

{
  "success": true,
  "message": "API information",
  "data": {
    "version": "1.0.11",
    "name": "Web Scraper Online API",
    "endpoints": { ... },
    "rate_limit": "60 req/min"
  },
  "timestamp": 1705824000,
  "api_version": "v1"
}
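Every response shown in this guide shares the same envelope (success, message, data or error, timestamp, api_version). Here is a small sketch, assuming that envelope holds for all endpoints, that returns the data payload and raises otherwise. The helper name unwrap is hypothetical.

```python
import json


def unwrap(body):
    """Return the `data` field of a response envelope, or raise on failure."""
    envelope = json.loads(body)
    if not envelope.get("success"):
        # Failed responses carry an `error` object instead of `data`.
        raise RuntimeError(envelope.get("message", "request failed"))
    return envelope["data"]


sample = """{
  "success": true,
  "message": "API information",
  "data": {"version": "1.0.11", "name": "Web Scraper Online API",
           "rate_limit": "60 req/min"},
  "timestamp": 1705824000,
  "api_version": "v1"
}"""
info = unwrap(sample)
print(info["version"], info["rate_limit"])  # 1.0.11 60 req/min
```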

POST /scrape

Create a new scraping job. Jobs run asynchronously by default (async: true).

Request Body

Parameter   Type      Required   Description
url         string    Yes        The URL to scrape
depth       integer   No         Crawling depth (0-5, default: 2)
max_pages   integer   No         Maximum number of pages to scrape (default: 100)
html_only   boolean   No         Download only HTML files (default: false)
async       boolean   No         Run the job asynchronously (default: true)
filters     object    No         Advanced filters (presets, patterns, etc.)

Request Example

curl -X POST "https://cloner.gtechgroup.it/api/v1/scrape" \
     -H "X-API-Key: YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "url": "https://example.com",
       "depth": 2,
       "max_pages": 50,
       "html_only": false,
       "async": true,
       "filters": {
         "presets": ["no_tracking"],
         "exclude_patterns": ["/admin", "/login"]
       }
     }'

Response (202 Accepted)

{
  "success": true,
  "message": "Job created",
  "data": {
    "job_id": "job_20250120123045_a1b2c3d4",
    "status": "pending",
    "status_url": "https://cloner.gtechgroup.it/api/v1/jobs/job_20250120123045_a1b2c3d4",
    "message": "Job queued for processing. Check status_url for updates."
  },
  "timestamp": 1705824000,
  "api_version": "v1"
}
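Before calling POST /scrape, a client can enforce the documented constraints locally. This sketch builds the request body from the parameter table (url required, depth 0-5, the listed defaults). The function build_scrape_payload is hypothetical, and rejecting a max_pages value below 1 is an added assumption that the table does not state.

```python
def build_scrape_payload(url, depth=2, max_pages=100, html_only=False,
                         run_async=True, filters=None):
    """Build a POST /scrape body using the documented defaults."""
    if not url:
        raise ValueError("url is required")
    if not 0 <= depth <= 5:
        raise ValueError("depth must be between 0 and 5")
    if max_pages < 1:
        # Assumption: the table gives no minimum, but 0 pages makes no sense.
        raise ValueError("max_pages must be positive")
    payload = {
        "url": url,
        "depth": depth,
        "max_pages": max_pages,
        "html_only": html_only,
        # `async` is a Python keyword, so the argument is named run_async.
        "async": run_async,
    }
    if filters:
        payload["filters"] = filters
    return payload
```

The resulting dict, serialized with json.dumps, corresponds to the -d body in the curl example; the job_id and status_url then come from the data object of the 202 response.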

GET /jobs

List all jobs with optional filtering.

Query Parameters

Parameter   Type      Description
limit       integer   Results per page (default: 100)
offset      integer   Pagination offset (default: 0)
status      string    Filter by status (pending, running, completed, failed)
url         string    Filter by URL pattern

Request

curl -X GET "https://cloner.gtechgroup.it/api/v1/jobs?limit=10&status=completed" \
     -H "X-API-Key: YOUR_API_KEY"

Response (200 OK)

{
  "success": true,
  "message": "Jobs retrieved",
  "data": {
    "jobs": [ ... ],
    "count": 10,
    "limit": 10,
    "offset": 0
  },
  "timestamp": 1705824000,
  "api_version": "v1"
}
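To walk through every job, a client can advance offset by limit until a page comes back short. The documentation does not say how the end of the list is signalled, so treating a page with fewer than limit jobs as the last one is an assumption. iter_jobs and the injected fetch_page callable (which would issue GET /jobs with the given query parameters and return the data object) are hypothetical.

```python
def iter_jobs(fetch_page, limit=100, **filters):
    """Yield jobs from GET /jobs page by page.

    fetch_page(params) is a hypothetical callable that performs the request
    with the given query parameters and returns the response's `data`.
    """
    offset = 0
    while True:
        params = {"limit": limit, "offset": offset, **filters}
        page = fetch_page(params)["jobs"]
        yield from page
        # Assumption: a page shorter than `limit` is the last one.
        if len(page) < limit:
            return
        offset += limit
```

Extra keyword arguments such as status="completed" or url="example.com" are passed through as query filters.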

GET /jobs/{id}

Get details of a specific job.

Request

curl -X GET "https://cloner.gtechgroup.it/api/v1/jobs/job_20250120123045_a1b2c3d4" \
     -H "X-API-Key: YOUR_API_KEY"

Response (200 OK)

{
  "success": true,
  "message": "Job retrieved",
  "data": {
    "id": "job_20250120123045_a1b2c3d4",
    "url": "https://example.com",
    "status": "completed",
    "created_at": 1705824000,
    "started_at": 1705824010,
    "completed_at": 1705824125,
    "duration": 115,
    "result": {
      "zip_file": "example_com_20250120123045.zip",
      "zip_size": 5242880
    },
    "stats": {
      "pages_downloaded": 25,
      "resources_downloaded": 150,
      "bytes_downloaded": 5242880,
      "duplicates_skipped": 3,
      "errors": 0
    }
  },
  "timestamp": 1705824125,
  "api_version": "v1"
}
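Because POST /scrape is asynchronous by default, a client typically polls the status_url until the job finishes. This is a sketch, assuming completed and failed (from the status filter values under GET /jobs) are the only terminal states. wait_for_job, the get_job callable, and the interval and timeout defaults are hypothetical.

```python
import time

TERMINAL = {"completed", "failed"}  # assumption: no other terminal states


def wait_for_job(get_job, job_id, interval=5.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll GET /jobs/{id} until the job completes or fails.

    get_job(job_id) is a hypothetical callable returning the response `data`.
    """
    deadline = clock() + timeout
    while True:
        job = get_job(job_id)
        if job["status"] in TERMINAL:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"{job_id} still {job['status']} after {timeout}s")
        sleep(interval)
```

Injecting sleep and clock keeps the loop testable; a 5-second interval uses at most 12 of the default 60 requests per minute.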

DELETE /jobs/{id}

Delete a specific job.

Request

curl -X DELETE "https://cloner.gtechgroup.it/api/v1/jobs/job_20250120123045_a1b2c3d4" \
     -H "X-API-Key: YOUR_API_KEY"

Response (200 OK)

{
  "success": true,
  "message": "Job deleted",
  "data": {
    "job_id": "job_20250120123045_a1b2c3d4"
  },
  "timestamp": 1705824200,
  "api_version": "v1"
}
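When cleaning up jobs, it can be convenient to treat a 404 (resource not found, per the error table) as "already deleted" so the operation can be repeated safely. This is a design choice, not documented API behaviour. delete_job, delete_path, and the send callable are hypothetical, and percent-encoding the id is a defensive assumption (the sample ids contain only letters, digits, and underscores).

```python
import urllib.parse


def delete_path(job_id):
    """Build the /jobs/{id} path; safe="" also encodes "/" in the id."""
    return "/jobs/" + urllib.parse.quote(job_id, safe="")


def delete_job(send, job_id):
    """Delete a job, returning False if it no longer exists.

    send(method, path) is a hypothetical callable returning the HTTP status.
    """
    status = send("DELETE", delete_path(job_id))
    if status == 200:
        return True
    if status == 404:
        return False  # nothing left to delete
    raise RuntimeError(f"DELETE failed with HTTP {status}")
```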

GET /stats

Get statistics for a given period.

Query Parameters

Parameter   Values                            Description
period      today, week, month, year, all    Time period for statistics (default: all)

Request

curl -X GET "https://cloner.gtechgroup.it/api/v1/stats?period=month" \
     -H "X-API-Key: YOUR_API_KEY"

Response (200 OK)

{
  "success": true,
  "message": "Statistics retrieved",
  "data": {
    "total_jobs": 150,
    "completed": 135,
    "failed": 10,
    "running": 3,
    "pending": 2,
    "total_pages": 3500,
    "total_resources": 25000,
    "total_bytes": 524288000,
    "avg_duration": 95.5,
    "success_rate": 90.0,
    "top_domains": { ... },
    "top_cms": { ... }
  },
  "timestamp": 1705824300,
  "api_version": "v1"
}
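The sample figures are internally consistent: the four status counts sum to total_jobs (135 + 10 + 3 + 2 = 150), and success_rate matches completed / total_jobs × 100 (135 / 150 = 90.0). The sketch below validates period against the documented values and recomputes that rate. The formula is inferred from the sample, not documented, and both helper names are hypothetical.

```python
PERIODS = ("today", "week", "month", "year", "all")


def stats_params(period="all"):
    """Build query parameters for GET /stats, rejecting unknown periods."""
    if period not in PERIODS:
        raise ValueError(f"period must be one of {', '.join(PERIODS)}")
    return {"period": period}


def success_rate(data):
    """Recompute success_rate as completed / total_jobs * 100 (inferred)."""
    total = data["total_jobs"]
    return round(data["completed"] / total * 100, 1) if total else 0.0


sample = {"total_jobs": 150, "completed": 135, "failed": 10,
          "running": 3, "pending": 2, "total_bytes": 524288000}
print(success_rate(sample))                      # 90.0
print(sample["total_bytes"] / 1024 ** 2, "MiB")  # 500.0 MiB
```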

⚠️ Error Codes

Code   Error                   Description
400    Bad Request             Invalid request parameters
401    Unauthorized            Missing or invalid API key
404    Not Found               Endpoint or resource not found
405    Method Not Allowed      HTTP method not supported for this endpoint
429    Too Many Requests       Rate limit exceeded
500    Internal Server Error   Unexpected server error

Error Response Format

{
  "success": false,
  "message": "Error message",
  "error": {
    "error": "ERROR_CODE",
    "details": "Additional details..."
  },
  "timestamp": 1705824400,
  "api_version": "v1"
}
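A sketch of mapping a failed response onto a Python exception. The status codes and the nested error.error / error.details fields come from this section. The ApiError class and raise_for_error helper are hypothetical, and falling back to an empty envelope when the body is not JSON is an assumption (for example, an HTML page from a proxy).

```python
import json


class ApiError(Exception):
    """A failed API call: HTTP status plus the envelope's error fields."""

    def __init__(self, status, message, code=None, details=None):
        super().__init__(f"HTTP {status} {code}: {message}")
        self.status = status
        self.code = code
        self.details = details


def raise_for_error(status, body):
    """Return the envelope on success, otherwise raise ApiError."""
    try:
        envelope = json.loads(body) if body else {}
    except ValueError:
        envelope = {}  # assumption: non-JSON error bodies are possible
    if 200 <= status < 300 and envelope.get("success", False):
        return envelope
    error = envelope.get("error") or {}
    raise ApiError(status, envelope.get("message", "Unknown error"),
                   code=error.get("error"), details=error.get("details"))
```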

⏱️ Rate Limiting

API requests are rate-limited per API key. The default limit is 60 requests per minute.

Tip: When you exceed the rate limit, the API returns a 429 Too Many Requests response with a Retry-After header giving the number of seconds until the limit resets.
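A retry loop that honours Retry-After on 429 responses, as the tip describes. The send callable (returning status, headers, body) is hypothetical. The 60-second fallback when the header is missing or not an integer (one full window of the per-minute limit) and the retry cap are assumptions.

```python
import time


def send_with_retry(send, max_retries=3, sleep=time.sleep, fallback=60):
    """Call send() and retry after 429 responses, waiting Retry-After seconds.

    send() is a hypothetical callable returning (status, headers, body),
    where headers is a dict with lowercase keys.
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_retries:
            return status, headers, body
        try:
            wait = int(headers.get("retry-after", fallback))
        except ValueError:
            wait = fallback  # assumption: ignore a non-integer value
        sleep(max(wait, 0))
```

After max_retries retries the last 429 response is returned rather than raised, so the caller decides how to report it.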

🔍 Advanced Filters

Use advanced filters to control what gets scraped:

Available Presets

Custom Filters Example

{
  "url": "https://example.com",
  "filters": {
    "presets": ["no_tracking", "no_videos"],
    "exclude_patterns": ["/admin/*", "/wp-admin/*", "*.pdf"],
    "include_patterns": ["/blog/*", "/docs/*"],
    "max_size": "10MB",
    "blocked_extensions": ["zip", "exe", "dmg"]
  }
}
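A sketch that assembles the filters object from the keys shown in the example above, dropping fields that were not supplied. The function build_filters is hypothetical. Normalizing extensions to the bare lowercase form used in blocked_extensions (and removing duplicates) is an assumption about what the server expects. Preset names are passed through unchanged.

```python
def build_filters(presets=(), exclude_patterns=(), include_patterns=(),
                  max_size=None, blocked_extensions=()):
    """Build a `filters` object, omitting keys that were not supplied."""
    filters = {
        "presets": list(presets),
        "exclude_patterns": list(exclude_patterns),
        "include_patterns": list(include_patterns),
        # Assumption: extensions are sent without a leading dot, in lowercase;
        # dict.fromkeys removes duplicates while keeping the original order.
        "blocked_extensions": list(dict.fromkeys(
            ext.lstrip(".").lower() for ext in blocked_extensions)),
    }
    if max_size is not None:
        filters["max_size"] = max_size  # e.g. "10MB", as in the example
    return {key: value for key, value in filters.items() if value}


body = {"url": "https://example.com",
        "filters": build_filters(presets=["no_tracking", "no_videos"],
                                 exclude_patterns=["/admin/*", "*.pdf"],
                                 blocked_extensions=[".ZIP", "exe", "dmg"])}
```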
✅ Ready to start? Get your API key from the administrator and start integrating!