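Integrations

Use Shifter with Apify

Plug Shifter's residential and ISP proxies into any Apify Actor: Crawlee handles queue management and retries, and Shifter supplies the residential IPs. Crawlee's ProxyConfiguration accepts Shifter URLs directly.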

Quick start

Install

npm install apify crawlee

Basic usage

// main.js (an Apify Actor)
import { Actor } from "apify";
import { CheerioCrawler, ProxyConfiguration } from "crawlee";

await Actor.init();

const proxyConfiguration = new ProxyConfiguration({
  proxyUrls: [
    // Crawlee needs a full URL with a scheme; http:// is assumed here.
    "http://customer-USERNAME-country-us-sid-123ABC:PASSWORD@p.shifter.io:443",
  ],
});

const crawler = new CheerioCrawler({
  proxyConfiguration,
  async requestHandler({ request, $, log }) {
    log.info(`${request.url} -> ${$("h1").text().trim()}`);
  },
});

await crawler.run(["https://example.com"]);
await Actor.exit();
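
The username in the proxy URL above carries the targeting options. As a minimal sketch, the helper below assembles such a URL from its parts. It assumes the `customer-USERNAME-country-XX[-city-NAME]-sid-ID[-ttl-SECONDS]` layout seen in the examples on this page and an `http://` scheme, neither of which is checked here against Shifter's reference. `buildShifterUrl` and its option names are hypothetical.

```javascript
// Hypothetical helper: assembles a Shifter proxy URL from its parts.
// The username layout and the http:// scheme are assumptions.
function buildShifterUrl({ username, password, country, city, sid, ttl }) {
  const parts = ["customer", username];
  if (country) parts.push("country", country);
  if (city) parts.push("city", city);
  if (sid) parts.push("sid", sid);
  if (ttl) parts.push("ttl", String(ttl));

  // Encode the password so characters like "@" or ":" cannot break the URL.
  const auth = `${parts.join("-")}:${encodeURIComponent(password)}`;
  return `http://${auth}@p.shifter.io:443`;
}

const url = buildShifterUrl({
  username: "USERNAME",
  password: "PASSWORD",
  country: "us",
  sid: "123ABC",
});
console.log(url);
```

The result can go straight into the `proxyUrls` array, which keeps credentials and targeting options out of hand-edited strings.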

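Features

Native ProxyConfiguration support in Crawlee: pass Shifter URLs or a newUrlFunction
Works with CheerioCrawler, PuppeteerCrawler, PlaywrightCrawler, and BasicCrawler
Sticky IPs per session through Crawlee's session pool; stale sessions are retired automatically on bans
Geo-targeting in 195+ countries, with the country parameterized through the Actor input schema
Ready for scheduled runs in Apify Console, autoscaling on Apify Cloud, and self-hosted Crawlee
Compatible with the full Apify SDK: pushData, key-value stores, request queues, and dataset persistence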

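Examples

Crawlee + per-session rotation

Crawlee handles bans and session expiry automatically. Use newUrlFunction to generate a new Shifter URL for each session: when Crawlee retires a session after a ban, the next session gets a fresh residential IP.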

import { Actor } from "apify";
import { CheerioCrawler, ProxyConfiguration } from "crawlee";

await Actor.init();

const proxyConfiguration = new ProxyConfiguration({
  // Each session asks for a fresh URL, and Crawlee bumps the session
  // on bans, so stale IPs get cycled out automatically.
  // Crawlee needs a full URL with a scheme; http:// is assumed here.
  newUrlFunction: () => {
    const sid = Math.random().toString(36).slice(2, 10);
    return `http://customer-USERNAME-country-uk-sid-${sid}-ttl-300:PASSWORD@p.shifter.io:443`;
  },
});

const crawler = new CheerioCrawler({
  proxyConfiguration,
  useSessionPool: true,
  persistCookiesPerSession: true,
  maxConcurrency: 8,

  async requestHandler({ request, $, enqueueLinks, log, session }) {
    log.info(`Session ${session.id} -> ${request.url}`);

    // Collect the cards first, then push once so the dataset write is awaited.
    const items = $(".product-card")
      .map((_, el) => ({
        url:   request.url,
        title: $(el).find("h2").text().trim(),
        price: $(el).find(".price").text().trim(),
      }))
      .get();

    // Push to dataset (auto-persisted by Apify)
    await Actor.pushData(items);

    await enqueueLinks({ selector: "a.next-page", strategy: "same-domain" });
  },

  failedRequestHandler({ request, log }) {
    log.error(`Failed after retries: ${request.url}`);
  },
});

await crawler.run(["https://example.co.uk/products"]);
await Actor.exit();
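
Crawlee passes the session ID to `newUrlFunction` as its first argument. One way to keep a single Shifter `sid` for the lifetime of a Crawlee session is to derive it from that ID rather than from `Math.random()`, as sketched below. `makeSessionUrlFunction` is a hypothetical helper. The `http://` scheme and the reduction of the ID to alphanumerics are assumptions about what Shifter accepts.

```javascript
// Hypothetical factory: returns a function usable as `newUrlFunction`.
function makeSessionUrlFunction({ username, password, country, ttl = 300 }) {
  return (sessionId) => {
    // If no session ID is passed, fall back to a random one.
    const raw = sessionId ?? Math.random().toString(36).slice(2, 10);
    // Assumption: Shifter's sid should be plain alphanumerics.
    const sid = String(raw).replace(/[^a-zA-Z0-9]/g, "");
    return `http://customer-${username}-country-${country}-sid-${sid}-ttl-${ttl}:${password}@p.shifter.io:443`;
  };
}

const newUrlFunction = makeSessionUrlFunction({
  username: "USERNAME",
  password: "PASSWORD",
  country: "uk",
});

// The same Crawlee session maps to the same URL, so the IP can stay sticky.
console.log(newUrlFunction("session_abc123") === newUrlFunction("session_abc123")); // true
console.log(newUrlFunction("session_abc123") === newUrlFunction("session_xyz789")); // false
```

Passed as `new ProxyConfiguration({ newUrlFunction })`, this gives each Crawlee session its own `sid`. When Crawlee retires a session, the replacement session's ID yields a different `sid`.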

PuppeteerCrawler (JS-heavy targets)

When the target needs a real browser, swap CheerioCrawler for PuppeteerCrawler. The same ProxyConfiguration plugs in unchanged: Crawlee passes the Shifter URL to Puppeteer's launch arguments.

import { Actor } from "apify";
import { PuppeteerCrawler, ProxyConfiguration } from "crawlee";

await Actor.init();

const proxyConfiguration = new ProxyConfiguration({
  // Crawlee needs a full URL with a scheme; http:// is assumed here.
  newUrlFunction: () => {
    const sid = Math.random().toString(36).slice(2, 10);
    return `http://customer-USERNAME-country-de-city-berlin-sid-${sid}:PASSWORD@p.shifter.io:443`;
  },
});

const crawler = new PuppeteerCrawler({
  proxyConfiguration,
  useSessionPool: true,
  launchContext: {
    launchOptions: { headless: "new" },
  },
  maxConcurrency: 4,

  async requestHandler({ request, page, log }) {
    log.info(`Visiting ${request.url}`);
    await page.waitForSelector(".product");

    const products = await page.$$eval(".product", (els) =>
      els.map((el) => ({
        title: el.querySelector("h2")?.textContent?.trim(),
        price: el.querySelector(".price")?.textContent?.trim(),
      })),
    );

    await Actor.pushData(products);
  },
});

await crawler.run(["https://example.de/categories/electronics"]);
await Actor.exit();

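Per-country Actor with an input schema

Expose country as an Apify Actor input. The Actor reads the value at startup and points Shifter at the matching residential IP pool, so the same Actor code works for every region.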

// .actor/input_schema.json
{
  "title": "Localized Scraper Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrl": { "type": "string", "title": "Start URL", "default": "https://example.com" },
    "country":  { "type": "string", "title": "Country",   "enum": ["us","uk","de","jp","fr","br"], "default": "us" },
    "maxPages": { "type": "integer", "title": "Max Pages", "default": 100, "minimum": 1, "maximum": 5000 }
  },
  "required": ["startUrl", "country"]
}

// main.js
import { Actor } from "apify";
import { CheerioCrawler, ProxyConfiguration } from "crawlee";

await Actor.init();

const { startUrl, country, maxPages } = await Actor.getInput();

const proxyConfiguration = new ProxyConfiguration({
  // Crawlee needs a full URL with a scheme; http:// is assumed here.
  newUrlFunction: () => {
    const sid = Math.random().toString(36).slice(2, 10);
    return `http://customer-USERNAME-country-${country}-sid-${sid}-ttl-300:PASSWORD@p.shifter.io:443`;
  },
});

const crawler = new CheerioCrawler({
  proxyConfiguration,
  maxRequestsPerCrawl: maxPages,
  useSessionPool: true,

  async requestHandler({ request, $, enqueueLinks }) {
    await Actor.pushData({
      country,
      url:   request.url,
      title: $("title").text().trim(),
      h1:    $("h1").first().text().trim(),
    });
    await enqueueLinks({ strategy: "same-domain" });
  },
});

await crawler.run([startUrl]);
await Actor.exit();
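
`Actor.getInput()` resolves to `null` when a run has no input, so destructuring it directly can throw. A small guard such as the one below, run before the proxy URL is built, fills in defaults and rejects a `country` outside the list. `resolveInput` is a hypothetical helper; the allowed list and defaults mirror the schema above.

```javascript
// Hypothetical guard: fills defaults and validates the Actor input.
const ALLOWED_COUNTRIES = ["us", "uk", "de", "jp", "fr", "br"];

function resolveInput(input) {
  // Actor.getInput() resolves to null when the run has no input.
  const { startUrl, country = "us", maxPages = 100 } = input ?? {};

  if (typeof startUrl !== "string" || startUrl.length === 0) {
    throw new Error("startUrl is required");
  }
  const code = String(country).toLowerCase();
  if (!ALLOWED_COUNTRIES.includes(code)) {
    throw new Error(`Unsupported country: ${country}`);
  }
  return { startUrl, country: code, maxPages };
}

console.log(resolveInput({ startUrl: "https://example.com", country: "DE" }));
```

In `main.js` the guard would wrap the existing call: `const { startUrl, country, maxPages } = resolveInput(await Actor.getInput());`.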

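Apify SDK without Crawlee (custom logic)

If Crawlee does not fit your use case, you can still get Shifter proxy URLs from ProxyConfiguration and use them with any HTTP client. Sessions, retries, and persistence continue to work.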

import { Actor } from "apify";
import { ProxyConfiguration } from "crawlee";
import { gotScraping } from "got-scraping";

await Actor.init();

const proxyConfiguration = new ProxyConfiguration({
  // A full URL with a scheme is needed; http:// is assumed here.
  newUrlFunction: () => {
    const sid = Math.random().toString(36).slice(2, 10);
    return `http://customer-USERNAME-country-fr-sid-${sid}:PASSWORD@p.shifter.io:443`;
  },
});

// Pull a fresh proxy URL per logical task
async function fetchTarget(url) {
  const proxyUrl = await proxyConfiguration.newUrl();

  const html = await gotScraping({
    url,
    proxyUrl,
    headerGeneratorOptions: {
      browsers: [{ name: "chrome", minVersion: 120 }],
      locales:  ["en-US"],
    },
  }).text();

  return html;
}

const urls = [
  "https://example.fr/api/v1/products?page=1",
  "https://example.fr/api/v1/products?page=2",
  // ...
];

for (const url of urls) {
  try {
    const html = await fetchTarget(url);
    await Actor.pushData({ url, length: html.length });
  } catch (err) {
    console.error(`Failed ${url}: ${err.message}`);
  }
}

await Actor.exit();
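
`got-scraping` takes the proxy as a single URL string, but some HTTP clients expect host, port, and credentials as separate fields. As a sketch of that conversion, the hypothetical `splitProxyUrl` helper below uses the standard `URL` class. It assumes the string returned by `proxyConfiguration.newUrl()` includes a scheme such as `http://`.

```javascript
// Hypothetical helper: splits a proxy URL into the fields other clients expect.
function splitProxyUrl(proxyUrl) {
  const url = new URL(proxyUrl);
  // URL drops the port when it equals the scheme's default, so restore it.
  const defaultPort = url.protocol === "https:" ? 443 : 80;
  return {
    protocol: url.protocol.replace(":", ""),
    host: url.hostname,
    port: url.port ? Number(url.port) : defaultPort,
    // Credentials come back percent-encoded, so decode them.
    username: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password),
  };
}

console.log(
  splitProxyUrl("http://customer-USERNAME-country-fr-sid-abc123:PASSWORD@p.shifter.io:443"),
);
```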

FAQ

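Common questions about using Shifter with Apify.

Use Crawlee's ProxyConfiguration class and pass either a `proxyUrls` array (one or more Shifter URLs) or a `newUrlFunction` that returns a new Shifter URL for each session. Pass the configuration to the crawler, and every request is routed through Shifter automatically.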

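Get started

Start using Shifter with Apify

Run Apify Actors through Shifter's 205M+ residential and ISP proxies, with native Crawlee ProxyConfiguration, session-level sticky IPs, and full support for Cheerio, Puppeteer, and Playwright crawlers.

Try Shifter free. Set up in minutes, cancel anytime.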