Using Shifter with Scrapy
Plug Shifter's residential and ISP proxies into any Scrapy spider with a small downloader middleware. Per-request rotation, sticky sessions, and per-spider geo-targeting, all in about 20 lines of Python.
Quick start
Install
pip install scrapy
Basic usage
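# settings.py
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.ShifterProxyMiddleware": 350,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 360,
}

# middlewares.py
class ShifterProxyMiddleware:
    # Replace USERNAME and PASSWORD with your Shifter credentials.
    # The explicit http:// scheme matches the other examples on this page.
    PROXY = (
        "http://customer-USERNAME-country-us-sid-123ABC:"
        "PASSWORD@p.shifter.io:443"
    )

    def process_request(self, request, spider):
        # HttpProxyMiddleware reads this key and routes the request through the proxy.
        request.meta["proxy"] = self.PROXY

# Run as usual:
# scrapy crawl my_spider
Features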
Examples
Downloader middleware (sticky session)
The standard way to wire a proxy into Scrapy. Add a `sid` to the username and every request from the spider shares one residential IP. Add `country-uk-city-london` for geo-targeting.
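The username format described above can be assembled by a small pure function. The following is a minimal sketch, not part of any Shifter SDK: `build_proxy_url` and its parameters are hypothetical names, and the segment order (country, city, sid, ttl) is assumed from the examples on this page. Percent-encoding the credentials keeps characters such as `@` or `:` in a password from breaking the URL.

```python
from urllib.parse import quote


def build_proxy_url(user, password, country, city=None, sid=None, ttl=None,
                    host="p.shifter.io:443"):
    """Assemble a proxy URL in the username format shown on this page.

    Hypothetical helper: the segment order is an assumption taken from
    the examples, not a documented API.
    """
    parts = [f"customer-{user}", f"country-{country}"]
    if city:
        parts.append(f"city-{city}")
    if sid:
        parts.append(f"sid-{sid}")
    if ttl:
        parts.append(f"ttl-{ttl}")
    username = "-".join(parts)
    # Percent-encode both parts so reserved characters stay inside the userinfo.
    return f"http://{quote(username, safe='')}:{quote(password, safe='')}@{host}"


url = build_proxy_url("USERNAME", "p@ss:word", "uk", city="london", sid="123ABC")
print(url)
```

In a real project the user and password would more likely come from Scrapy settings or environment variables than from literals in the source.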
# myproject/middlewares.py
import secrets


class ShifterProxyMiddleware:
    """Routes every Scrapy request through Shifter's residential pool."""

    def __init__(self, country="us", city=None, ttl=300):
        # One random session id per crawl keeps all requests on the same IP.
        self.sid = secrets.token_hex(4)
        parts = [
            "customer-USERNAME",
            f"country-{country}",
        ]
        if city:
            parts.append(f"city-{city}")
        parts.append(f"sid-{self.sid}")
        parts.append(f"ttl-{ttl}")
        username = "-".join(parts)
        self.proxy_url = f"http://{username}:PASSWORD@p.shifter.io:443"

    @classmethod
    def from_crawler(cls, crawler):
        # Read the geo-targeting options from the project settings.
        s = crawler.settings
        return cls(
            country=s.get("SHIFTER_COUNTRY", "us"),
            city=s.get("SHIFTER_CITY"),
            ttl=s.getint("SHIFTER_TTL", 300),
        )

    def process_request(self, request, spider):
        request.meta["proxy"] = self.proxy_url

# myproject/settings.py
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.ShifterProxyMiddleware": 350,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 360,
}
SHIFTER_COUNTRY = "uk"
SHIFTER_CITY = "london"
Per-request rotation
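Do not pin a sid; without one, the gateway rotates the IP on every request. This is useful for high-volume scraping of paginated targets where each page should look like a different visitor. The `ShifterRotatingMiddleware` example in this section goes one step further and assigns a fresh, unique sid to every request.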
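For the variant that simply leaves the sid out, a minimal sketch might look like this. The class name `ShifterGatewayRotationMiddleware` is hypothetical, and the `SimpleNamespace` objects are stand-ins for a Scrapy request and spider so the snippet runs without a crawl.

```python
from types import SimpleNamespace


class ShifterGatewayRotationMiddleware:
    """Omits the sid so the gateway chooses the IP for each request."""

    PROXY_HOST = "p.shifter.io:443"

    def process_request(self, request, spider):
        # Fall back to "us" when the spider does not define a country.
        country = getattr(spider, "country", "us")
        username = f"customer-USERNAME-country-{country}"  # no sid segment
        request.meta["proxy"] = f"http://{username}:PASSWORD@{self.PROXY_HOST}"
        return None  # let Scrapy continue processing the request


# Stand-ins for scrapy.Request and scrapy.Spider, for illustration only.
request = SimpleNamespace(meta={})
spider = SimpleNamespace(country="de")
ShifterGatewayRotationMiddleware().process_request(request, spider)
print(request.meta["proxy"])
```

Registering it works the same way as the other middlewares on this page, through `DOWNLOADER_MIDDLEWARES` or a spider's `custom_settings`.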
# myproject/middlewares.py
import secrets


class ShifterRotatingMiddleware:
    """Rotates the residential IP on every Scrapy request."""

    PROXY_HOST = "p.shifter.io:443"

    def process_request(self, request, spider):
        # Unique sid per request -> a fresh session, and so a new IP, for every fetch
        unique_sid = secrets.token_hex(6)
        # Fall back to "us" if the spider does not define a country attribute.
        country = getattr(spider, "country", "us")
        username = (
            f"customer-USERNAME-country-{country}"
            f"-sid-{unique_sid}"
        )
        request.meta["proxy"] = (
            f"http://{username}:PASSWORD@{self.PROXY_HOST}"
        )

# myproject/spiders/products.py
import scrapy


class ProductsSpider(scrapy.Spider):
    name = "products"
    country = "us"  # consumed by the middleware
    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            "myproject.middlewares.ShifterRotatingMiddleware": 350,
        },
        "CONCURRENT_REQUESTS": 32,
    }
    # Pages 1 through 99
    start_urls = [
        f"https://example.com/products?page={i}" for i in range(1, 100)
    ]

    def parse(self, response):
        for card in response.css(".product-card"):
            yield {
                "title": card.css("h2::text").get(),
                "price": card.css(".price::text").get(),
                "url": response.urljoin(card.css("a::attr(href)").get()),
            }
Country-specific spiders (parallel geo-scraping)
Create one spider class and parameterize the country at run time. Run several instances in parallel, each with its own residential IP pool.
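One way to start several instances in parallel is a small launcher script that spawns one `scrapy crawl` process per country. The sketch below assumes the spider is named `localized` (as in this section's example), that the `scrapy` command is on the PATH, and that the script is run from the project directory; `build_crawl_command` and `run_parallel` are hypothetical helper names.

```python
import subprocess


def build_crawl_command(country, spider_name="localized"):
    """Return the argv for one `scrapy crawl` run targeting one country."""
    return ["scrapy", "crawl", spider_name, "-a", f"country={country}"]


def run_parallel(countries, spider_name="localized"):
    """Start one crawl per country, wait for all, return exit codes by country."""
    # Popen returns immediately, so all crawls are running before we wait.
    procs = {
        country: subprocess.Popen(build_crawl_command(country, spider_name))
        for country in countries
    }
    return {country: proc.wait() for country, proc in procs.items()}


print(build_crawl_command("uk"))
```

Calling `run_parallel(["uk", "de", "jp"])` would then start three crawls at once and report each exit code. Separate processes keep the crawls isolated from one another, at the cost of one Python interpreter per country.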
# scrapy crawl localized -a country=uk
# scrapy crawl localized -a country=de
# scrapy crawl localized -a country=jp
import scrapy


class LocalizedSpider(scrapy.Spider):
    name = "localized"

    def __init__(self, country="us", *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Set via -a country=... on the command line.
        self.country = country
        self.start_urls = [
            f"https://www.example.com/{country}/products",
        ]

    def start_requests(self):
        # One sticky session per country; the explicit http:// scheme
        # matches the other examples on this page.
        proxy = (
            f"http://customer-USERNAME-country-{self.country}-sid-{self.country}-batch:"
            f"PASSWORD@p.shifter.io:443"
        )
        for url in self.start_urls:
            yield scrapy.Request(url, meta={"proxy": proxy}, callback=self.parse)

    def parse(self, response):
        for product in response.css(".product"):
            yield {
                "country": self.country,
                "title": product.css("h2::text").get(),
                "price": product.css(".price::text").get(),
            }
Scrapy + scrapy-playwright (JavaScript-rendered pages)
If the target needs JavaScript, swap the download handler for scrapy-playwright. Pass the proxy in the launch options; Scrapy still handles scheduling and pipelines.
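The proxy entry in the launch options is a plain dict with `server`, `username`, and `password` keys, so it can be built per country instead of hard-coded. A minimal sketch, assuming a hypothetical helper `playwright_proxy` and the same username format used elsewhere on this page:

```python
def playwright_proxy(country, sid, user="USERNAME", password="PASSWORD"):
    """Build the proxy dict for the browser launch options.

    Hypothetical helper; the key names follow the settings example in
    this section.
    """
    return {
        "server": "http://p.shifter.io:443",
        "username": f"customer-{user}-country-{country}-sid-{sid}",
        "password": password,
    }


# Would go into settings.py in place of a hard-coded proxy dict.
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True,
    "proxy": playwright_proxy("fr", "789GHI"),
}
print(PLAYWRIGHT_LAUNCH_OPTIONS["proxy"]["username"])
```

Because these are launch options, the proxy applies to the whole browser, so every page in the crawl shares the same session.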
# pip install scrapy-playwright
# playwright install chromium

# settings.py
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True,
    "proxy": {
        "server": "http://p.shifter.io:443",
        "username": "customer-USERNAME-country-fr-sid-789GHI",
        "password": "PASSWORD",
    },
}

# spider.py
import scrapy


class JsHeavySpider(scrapy.Spider):
    name = "js_heavy"
    start_urls = ["https://app.example.com/dashboard"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                # playwright_include_page exposes the Page object to the callback.
                meta={"playwright": True, "playwright_include_page": True},
                callback=self.parse,
            )

    async def parse(self, response):
        page = response.meta["playwright_page"]
        try:
            await page.wait_for_selector(".widget")
            widgets = await page.query_selector_all(".widget")
            for w in widgets:
                yield {"label": await w.text_content()}
        finally:
            # Close the page even if the selector wait fails.
            await page.close()
Frequently asked questions
Common questions about using Shifter with Scrapy.
Write a small downloader middleware that sets `request.meta['proxy']` to your Shifter URL, and register it in DOWNLOADER_MIDDLEWARES with a priority below 750, the default priority of HttpProxyMiddleware, so that it runs first. Twenty lines of Python, no SDK required.
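The priority rule can be illustrated with a simplified model of how Scrapy orders downloader middlewares: project entries are merged over the built-in defaults, entries set to `None` are dropped, and `process_request` hooks run in ascending priority. The function `request_order` is a hypothetical illustration, not Scrapy's own code, and only the HttpProxyMiddleware default is modeled here.

```python
HTTPPROXY = "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware"


def request_order(custom, base=None):
    """Return middleware paths in the order their process_request hooks run.

    Simplified model: only the HttpProxyMiddleware default (750) is
    included unless another base mapping is passed in.
    """
    merged = {HTTPPROXY: 750} if base is None else dict(base)
    merged.update(custom)
    # A priority of None disables a middleware.
    active = {path: prio for path, prio in merged.items() if prio is not None}
    return sorted(active, key=active.get)


order = request_order({"myproject.middlewares.ShifterProxyMiddleware": 350})
print(order)
```

With a priority of 350 the Shifter middleware comes first, so `request.meta['proxy']` is already set by the time HttpProxyMiddleware handles the request.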