Extraction rules
Skip BeautifulSoup, Cheerio, and regex. Tell Shifter which CSS selectors map to which JSON fields, and get clean JSON back.
Basic syntax
Pass extract_rules as a URL-encoded JSON object where each key is an output field and each value describes how to extract it.
Minimal example:
{ "title": { "selector": "h1", "output": "text" } }
URL-encoded and sent:
curl "https://scrape.shifter.io/v1?api_key=YOUR_API_KEY&url=https://example.com&extract_rules=%7B%22title%22%3A%7B%22selector%22%3A%22h1%22%2C%22output%22%3A%22text%22%7D%7D"
# {"title": "Example Domain"}
Output types
| output | Returns | Example |
|---|---|---|
| text | Text content of the element | "Example Domain" |
| html | Inner HTML | "<strong>Example</strong> Domain" |
| @<attr> | Attribute value | "@href" → "https://example.com/" |
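To make the encoding step concrete, here is a minimal Python sketch that serializes a rules object and percent-encodes it with the standard library only. The helper name `encode_rules` is hypothetical and the `title_html` and `first_link` fields are illustrative; only the `selector` and `output` keys and the three output types come from the table above.

```python
import json
from urllib.parse import quote


def encode_rules(rules: dict) -> str:
    """Serialize a rules object to compact JSON and percent-encode it."""
    # Compact separators avoid spaces in the JSON, keeping the query string short.
    compact = json.dumps(rules, separators=(",", ":"))
    # safe="" percent-encodes every reserved character, including "/" and ":".
    return quote(compact, safe="")


# One rule per output type: text, inner HTML, and an attribute value.
rules = {
    "title": {"selector": "h1", "output": "text"},
    "title_html": {"selector": "h1", "output": "html"},
    "first_link": {"selector": "a", "output": "@href"},
}

encoded = encode_rules(rules)
print(encoded)
```

Calling `encode_rules` on the single-field rule from the minimal example produces the same `extract_rules` value that appears in the curl command, so the result can be appended to the request URL as-is.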
Multiple fields
Add keys to the rules object:
{
  "title": { "selector": "h1", "output": "text" },
  "description": { "selector": "meta[name=description]", "output": "@content" },
  "canonical": { "selector": "link[rel=canonical]", "output": "@href" }
}
Response:
{
  "title": "Example Domain",
  "description": "The example domain...",
  "canonical": "https://example.com/"
}
Arrays
For lists (search results, product cards, table rows), wrap the rule in a parent that specifies type: "list" and item:
{
  "products": {
    "selector": "div.product",
    "type": "list",
    "item": {
      "name": { "selector": "h2", "output": "text" },
      "price": { "selector": ".price", "output": "text" },
      "link": { "selector": "a.title", "output": "@href" }
    }
  }
}
Response:
{
  "products": [
    { "name": "Item A", "price": "$19.99", "link": "/item-a" },
    { "name": "Item B", "price": "$24.50", "link": "/item-b" }
  ]
}
JSON auto-parser
For endpoints that already return JSON (most REST APIs), add auto_parser=1 to parse the body and return it as-is:
curl "https://scrape.shifter.io/v1?api_key=YOUR_API_KEY&url=https://api.example.com/products&auto_parser=1"

- Test selectors in the browser dev console first: document.querySelector(...).
- URL-encode the rules JSON properly. Most HTTP clients do this automatically when you pass the rules as a parameter object.
- If a field is missing on the page, it returns null rather than failing the request.
- Sessions & proxies: scrape across paginated flows.
- Advanced: headers, cookies, POST bodies.
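The tips on encoding and missing fields can be sketched in Python: passing the rules as a parameter mapping lets the standard library handle URL encoding, and a small check reports fields that came back as null. The helper names `build_query` and `missing_fields` are hypothetical, the endpoint and parameter names are taken from the curl examples, and the sample payload is illustrative rather than a real response.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://scrape.shifter.io/v1"  # endpoint as shown in the curl examples


def build_query(api_key: str, url: str, rules: dict) -> str:
    """Build the full request URL; urlencode percent-encodes each value."""
    params = {
        "api_key": api_key,
        "url": url,
        "extract_rules": json.dumps(rules, separators=(",", ":")),
    }
    return f"{BASE_URL}?{urlencode(params)}"


def missing_fields(result: dict) -> list[str]:
    """Return the names of fields that came back as null (None in Python)."""
    return [name for name, value in result.items() if value is None]


rules = {
    "title": {"selector": "h1", "output": "text"},
    "canonical": {"selector": "link[rel=canonical]", "output": "@href"},
}
request_url = build_query("YOUR_API_KEY", "https://example.com", rules)
print(request_url)

# Illustrative payload: the page had no canonical link, so that field is null.
sample = json.loads('{"title": "Example Domain", "canonical": null}')
print(missing_fields(sample))
```

Note that `urlencode` also percent-encodes the target `url` value, which the curl examples pass unencoded; this is the usual behaviour of HTTP client parameter handling.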