RESTful API v1.0.11 - Complete integration guide
Welcome to the Web Scraper Online API documentation. This API allows you to programmatically scrape websites, manage jobs, and retrieve statistics.
Base URL: https://cloner.gtechgroup.it/api/v1
All API requests require authentication using an API key. Include your API key in one of two ways:
- Header: X-API-Key: YOUR_API_KEY
- Query parameter: ?api_key=YOUR_API_KEY
GET /info returns API information and the list of available endpoints.
curl -X GET "https://cloner.gtechgroup.it/api/v1/info" \
  -H "X-API-Key: YOUR_API_KEY"
Example response:
{
  "success": true,
  "message": "API information",
  "data": {
    "version": "1.0.11",
    "name": "Web Scraper Online API",
    "endpoints": { ... },
    "rate_limit": "60 req/min"
  },
  "timestamp": 1705824000,
  "api_version": "v1"
}
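Both authentication methods can be sketched with the Python standard library. This is a minimal sketch, not an official client. `build_request` is a hypothetical helper that builds an authenticated `urllib` request for any endpoint path but does not send it. It places the key either in the X-API-Key header or in the api_key query parameter.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://cloner.gtechgroup.it/api/v1"


def build_request(path, api_key, use_query_param=False, method="GET", params=None):
    """Build (but do not send) an authenticated request.

    Hypothetical helper: by default the key goes in the X-API-Key header;
    with use_query_param=True it is sent as ?api_key=... instead.
    """
    query = dict(params or {})
    headers = {}
    if use_query_param:
        query["api_key"] = api_key
    else:
        headers["X-API-Key"] = api_key
    url = BASE_URL + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(url, headers=headers, method=method)


if __name__ == "__main__":
    req = build_request("/info", "YOUR_API_KEY")
    print(req.full_url, req.header_items())
    # Sending it requires network access and a valid key:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
```

The header form keeps the key out of the URL. URLs are more likely to appear in server or proxy logs.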
POST /scrape creates a new scraping job. Jobs run asynchronously by default (async: true). Request body parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to scrape |
| depth | integer | No | Crawling depth (0-5, default: 2) |
| max_pages | integer | No | Maximum pages to scrape (default: 100) |
| html_only | boolean | No | Download only HTML files (default: false) |
| async | boolean | No | Async mode (default: true) |
| filters | object | No | Advanced filters (presets, patterns, etc.) |
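A client can check the documented constraints before submitting a job. The sketch below builds a request body for POST /scrape using the defaults from the parameter table. `build_scrape_payload` is a hypothetical helper. Only the 0-5 range for depth is documented. The non-empty url and positive max_pages checks are assumptions.

```python
import json


def build_scrape_payload(url, depth=2, max_pages=100, html_only=False,
                         async_mode=True, filters=None):
    """Build a JSON-ready body for POST /scrape (hypothetical helper)."""
    if not url:
        raise ValueError("url is required")
    if not 0 <= depth <= 5:  # documented range
        raise ValueError("depth must be between 0 and 5")
    if max_pages < 1:  # assumption: the docs give no explicit lower bound
        raise ValueError("max_pages must be at least 1")
    payload = {
        "url": url,
        "depth": depth,
        "max_pages": max_pages,
        "html_only": html_only,
        # "async" is a Python keyword, so the argument is named async_mode
        "async": async_mode,
    }
    if filters:
        payload["filters"] = filters
    return payload


if __name__ == "__main__":
    body = build_scrape_payload("https://example.com", max_pages=50,
                                filters={"presets": ["no_tracking"]})
    print(json.dumps(body, indent=2))
```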
curl -X POST "https://cloner.gtechgroup.it/api/v1/scrape" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "depth": 2,
    "max_pages": 50,
    "html_only": false,
    "async": true,
    "filters": {
      "presets": ["no_tracking"],
      "exclude_patterns": ["/admin", "/login"]
    }
  }'
Example response:
{
  "success": true,
  "message": "Job created",
  "data": {
    "job_id": "job_20250120123045_a1b2c3d4",
    "status": "pending",
    "status_url": "https://cloner.gtechgroup.it/api/v1/jobs/job_20250120123045_a1b2c3d4",
    "message": "Job queued for processing. Check status_url for updates."
  },
  "timestamp": 1705824000,
  "api_version": "v1"
}
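After submitting a job, the client needs the job_id and status_url from the response envelope. This is a minimal sketch, assuming the raw response body is available as a string or bytes. `parse_job_created` is a hypothetical name.

```python
import json


def parse_job_created(body):
    """Return (job_id, status_url) from a POST /scrape response body."""
    envelope = json.loads(body)
    if not envelope.get("success"):
        # The error envelope carries a human-readable "message"
        raise RuntimeError(envelope.get("message", "request failed"))
    data = envelope["data"]
    return data["job_id"], data["status_url"]
```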
GET /jobs lists all jobs. Use these query parameters for filtering and pagination:
| Parameter | Type | Description |
|---|---|---|
| limit | integer | Results per page (default: 100) |
| offset | integer | Pagination offset (default: 0) |
| status | string | Filter by status: pending, running, completed, or failed |
| url | string | Filter by URL pattern |
curl -X GET "https://cloner.gtechgroup.it/api/v1/jobs?limit=10&status=completed" \
  -H "X-API-Key: YOUR_API_KEY"
Example response:
{
  "success": true,
  "message": "Jobs retrieved",
  "data": {
    "jobs": [ ... ],
    "count": 10,
    "limit": 10,
    "offset": 0
  },
  "timestamp": 1705824000,
  "api_version": "v1"
}
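The limit and offset parameters let a client walk the full job list page by page. This is a sketch under stated assumptions. `fetch_page` is a hypothetical callable that performs GET /jobs?limit=...&offset=... and returns the parsed "data" object. The docs do not say whether "count" is the page size or the overall total. The loop therefore stops when a page returns fewer jobs than requested.

```python
from typing import Callable, Dict, Iterator


def iter_jobs(fetch_page: Callable[[int, int], Dict], limit: int = 100) -> Iterator[Dict]:
    """Yield every job by advancing offset in steps of limit."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    offset = 0
    while True:
        data = fetch_page(limit, offset)
        jobs = data["jobs"]
        yield from jobs
        # A short (or empty) page is taken to mean there are no more results
        if len(jobs) < limit:
            return
        offset += limit
```

When the total is an exact multiple of limit, this costs one extra request that returns an empty page.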
GET /jobs/{job_id} returns the details of a specific job. This is the URL returned as status_url when the job is created.
curl -X GET "https://cloner.gtechgroup.it/api/v1/jobs/job_20250120123045_a1b2c3d4" \
  -H "X-API-Key: YOUR_API_KEY"
Example response:
{
  "success": true,
  "message": "Job retrieved",
  "data": {
    "id": "job_20250120123045_a1b2c3d4",
    "url": "https://example.com",
    "status": "completed",
    "created_at": 1705824000,
    "started_at": 1705824010,
    "completed_at": 1705824125,
    "duration": 115,
    "result": {
      "zip_file": "example_com_20250120123045.zip",
      "zip_size": 5242880
    },
    "stats": {
      "pages_downloaded": 25,
      "resources_downloaded": 150,
      "bytes_downloaded": 5242880,
      "duplicates_skipped": 3,
      "errors": 0
    }
  },
  "timestamp": 1705824125,
  "api_version": "v1"
}
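Because jobs run asynchronously, a client typically polls this endpoint until the job finishes. This is a minimal polling sketch. `fetch_job` is a hypothetical callable that requests the job's status_url and returns the "data" object. The sketch treats "completed" and "failed" as the terminal statuses, based on the status values listed for GET /jobs. The 5-second default interval is an assumption, chosen to stay well under the 60 requests/minute limit.

```python
import time
from typing import Callable, Dict

# Statuses after which the job is assumed not to change again
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[], Dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll fetch_job() until a terminal status, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job.get('id')} is still {job['status']}")
        sleep(interval)
```

The `sleep` and `clock` parameters are injectable so the loop can be exercised without real waiting.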
DELETE /jobs/{job_id} deletes a specific job.
curl -X DELETE "https://cloner.gtechgroup.it/api/v1/jobs/job_20250120123045_a1b2c3d4" \
  -H "X-API-Key: YOUR_API_KEY"
Example response:
{
  "success": true,
  "message": "Job deleted",
  "data": {
    "job_id": "job_20250120123045_a1b2c3d4"
  },
  "timestamp": 1705824200,
  "api_version": "v1"
}
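The same call can be built with `urllib` by setting the HTTP method explicitly. The response can then be checked against the requested ID. This is a sketch only. `build_delete_request` and `confirm_deleted` are hypothetical helpers, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://cloner.gtechgroup.it/api/v1"


def build_delete_request(job_id, api_key):
    """Build a DELETE /jobs/{job_id} request (not sent)."""
    # quote() escapes any characters that are unsafe in a path segment
    url = f"{BASE_URL}/jobs/{urllib.parse.quote(job_id, safe='')}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="DELETE")


def confirm_deleted(body, job_id):
    """True if the response reports success for the same job_id."""
    envelope = json.loads(body)
    data = envelope.get("data") or {}
    return bool(envelope.get("success")) and data.get("job_id") == job_id
```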
GET /stats returns aggregate statistics for a given period.
| Parameter | Values | Description |
|---|---|---|
| period | today, week, month, year, all | Time period for statistics (default: all) |
curl -X GET "https://cloner.gtechgroup.it/api/v1/stats?period=month" \
  -H "X-API-Key: YOUR_API_KEY"
Example response:
{
  "success": true,
  "message": "Statistics retrieved",
  "data": {
    "total_jobs": 150,
    "completed": 135,
    "failed": 10,
    "running": 3,
    "pending": 2,
    "total_pages": 3500,
    "total_resources": 25000,
    "total_bytes": 524288000,
    "avg_duration": 95.5,
    "success_rate": 90.0,
    "top_domains": { ... },
    "top_cms": { ... }
  },
  "timestamp": 1705824300,
  "api_version": "v1"
}
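In the example response, success_rate appears to be completed / total_jobs × 100 (135 / 150 × 100 = 90.0). The four status counts also sum to total_jobs (135 + 10 + 3 + 2 = 150). Both relationships are inferred from the example values only, not documented. The sketch below recomputes them for a sanity check. Rounding to one decimal place is an assumption.

```python
def success_rate(stats):
    """Recompute success_rate as a percentage of total_jobs (inferred formula)."""
    total = stats["total_jobs"]
    if total == 0:
        return 0.0
    return round(100 * stats["completed"] / total, 1)


def status_breakdown_ok(stats):
    """Check that the per-status counts add up to total_jobs."""
    keys = ("completed", "failed", "running", "pending")
    return sum(stats[k] for k in keys) == stats["total_jobs"]
```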
The API reports errors with standard HTTP status codes:
| Code | Error | Description |
|---|---|---|
| 400 | Bad Request | Invalid request parameters |
| 401 | Unauthorized | Missing or invalid API key |
| 404 | Not Found | Endpoint or resource not found |
| 405 | Method Not Allowed | HTTP method not supported for this endpoint |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Unexpected server error |
All error responses share the same envelope, with success set to false:
{
  "success": false,
  "message": "Error message",
  "error": {
    "error": "ERROR_CODE",
    "details": "Additional details..."
  },
  "timestamp": 1705824400,
  "api_version": "v1"
}
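Since every response shares one envelope, a client can turn failures into a single exception type. The sketch below does this with hypothetical names (`ApiError`, `raise_for_error`, `is_retryable`). It treats 429 and 500 as retryable. That is an assumption: the docs only imply that 429 can be retried, via Retry-After.

```python
import json


class ApiError(Exception):
    """Error raised for responses whose envelope has success: false."""

    def __init__(self, status, code, message, details=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details


# Assumption: rate limiting and transient server errors are worth retrying
RETRYABLE_STATUSES = {429, 500}


def raise_for_error(status, body):
    """Return the "data" object, or raise ApiError if success is false."""
    envelope = json.loads(body)
    if envelope.get("success"):
        return envelope.get("data")
    err = envelope.get("error") or {}
    raise ApiError(status, err.get("error", "UNKNOWN"),
                   envelope.get("message", ""), err.get("details"))


def is_retryable(exc):
    return exc.status in RETRYABLE_STATUSES
```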
API requests are rate-limited per API key. The default limit is 60 requests per minute.
When the limit is exceeded, the API returns a 429 Too Many Requests response with a Retry-After header giving the number of seconds until the limit resets.
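A client can honor Retry-After by waiting the indicated number of seconds before retrying. This is a sketch, not a definitive implementation. `send` is a hypothetical callable returning (status, headers, body). `headers` is assumed to be a plain dict keyed exactly "Retry-After", whereas real HTTP libraries usually match header names case-insensitively. If the header is missing or not numeric, the sketch falls back to one full 60-second window, which is an assumption.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body) as returned by a hypothetical send()
Response = Tuple[int, Mapping[str, str], bytes]


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    default_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying 429 responses after the Retry-After delay.

    Any non-429 response is returned immediately; after max_attempts the
    last 429 response is returned to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        try:
            wait = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not a plain number of seconds
            wait = default_wait
        sleep(wait)
    raise AssertionError("unreachable")
```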
Use the filters object of a POST /scrape request to control what gets scraped. Available presets:
- html_only: Download only HTML files
- images_only: Download only images
- no_images: Exclude all images
- no_videos: Exclude all videos
- no_media: Exclude images, videos, and audio
- essential_only: Only HTML, CSS, and JS, up to 5MB
- no_tracking: Exclude analytics and tracking scripts
Presets can be combined with include and exclude patterns, a maximum file size, and blocked extensions. Example request body:
{
  "url": "https://example.com",
  "filters": {
    "presets": ["no_tracking", "no_videos"],
    "exclude_patterns": ["/admin/*", "/wp-admin/*", "*.pdf"],
    "include_patterns": ["/blog/*", "/docs/*"],
    "max_size": "10MB",
    "blocked_extensions": ["zip", "exe", "dmg"]
  }
}
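A client can reject unknown presets before sending a request. The sketch below builds a filters object containing only the keys that are set. `build_filters` is a hypothetical helper, and its preset names are taken from the list above. Stripping a leading dot from extensions matches the bare format used in the example ("zip", not ".zip"). How the server itself matches patterns is not documented, so patterns are passed through unchanged.

```python
PRESETS = {
    "html_only", "images_only", "no_images", "no_videos",
    "no_media", "essential_only", "no_tracking",
}


def build_filters(presets=(), exclude_patterns=(), include_patterns=(),
                  max_size=None, blocked_extensions=()):
    """Build a filters object, omitting any option that is not set."""
    unknown = set(presets) - PRESETS
    if unknown:
        raise ValueError(f"unknown presets: {sorted(unknown)}")
    filters = {}
    if presets:
        filters["presets"] = list(presets)
    if exclude_patterns:
        filters["exclude_patterns"] = list(exclude_patterns)
    if include_patterns:
        filters["include_patterns"] = list(include_patterns)
    if max_size is not None:
        filters["max_size"] = max_size  # e.g. "10MB", as in the example
    if blocked_extensions:
        # Normalize ".zip" to "zip" to match the example's format
        filters["blocked_extensions"] = [e.lstrip(".") for e in blocked_extensions]
    return filters
```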