Send one request to an Amazon scraping API and receive normalized JSON for products, search results, offers, reviews, Q&A, and sellers. This approach removes proxy rotation, CAPTCHA solving, and headless-browser maintenance while giving you consistent fields you can store or analyze immediately.
Typical payloads return item titles, ASINs, prices, availability, images, star ratings, review counts, offers (buy box and third-party), and seller metadata. Most providers support country- or ZIP-level geolocation so you can fetch localized content per marketplace.
Before you start
Gather these basics and note key terms so your requests return the data you expect.
- API credentials from a scraping provider (e.g., Oxylabs, ScraperAPI, Scrapingdog).
- Python 3.11 or curl installed for the examples.
- Know the Amazon marketplace domain (for example, com, de, or nl).
- Identify the target ASIN or keyword for searches.
- Compliance note: scraping publicly visible pages is generally lawful, but sites can restrict automated access in their terms and may throttle or block. Avoid logging into seller accounts for scraping, respect rate limits, and consider separate infrastructure to prevent account risk.
Method 1 — Use a structured Amazon Scraping API (Oxylabs)
This method returns parsed JSON for product, search, offers, reviews, Q&A, best sellers, and seller pages, and supports geotargeting, optional JavaScript rendering, webhooks, and scheduling.
Step 1: Create an API user in your provider’s dashboard.
Step 2: Build a JSON payload for a product request using an ASIN.
{
"source": "amazon_product",
"query": "B0CX23V2ZK",
"domain": "com",
"geo_location": "90210",
"parse": true
}
Step 3: Send a POST request to the real-time endpoint.
import requests
auth_user = "YOUR_USERNAME"
auth_pass = "YOUR_PASSWORD"
payload = {
"source": "amazon_product",
"query": "B0CX23V2ZK",
"domain": "com",
"geo_location": "90210",
"parse": True
}
resp = requests.post(
"https://realtime.oxylabs.io/v1/queries",
auth=(auth_user, auth_pass),
json=payload,
timeout=60
)
print(resp.json())
Step 4: Confirm the response includes expected fields.
Expect keys such as asin, title, price, stock, images, rating, and buy-box details.
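As a quick check, you can print those fields from the Step 3 response. The results[0]["content"] nesting below is the typical shape of parsed real-time responses, but the exact structure is an assumption to verify against your provider's documentation.
# Sanity-check the parsed response from Step 3.
# Assumption: parsed data lives under results[0]["content"]; adjust if your
# provider nests it differently.
data = resp.json()
content = data.get("results", [{}])[0].get("content", {})
for field in ("asin", "title", "price", "stock", "rating", "images"):
    print(field, "->", content.get(field))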
Step 5: Switch targets by changing the source and payload.
# Example: search results with parsing
payload = {
"source": "amazon_search",
"domain": "nl",
"query": "adidas",
"start_page": 1,
"pages": 2,
"parse": True
}
Step 6: Enable JavaScript rendering when elements load dynamically.
Add "render": true
to the payload if key sections require client-side execution.
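Using the Python payload style from Step 5, the product request from Step 2 with rendering enabled would look like the sketch below. The flag name and boolean value follow this step; some providers expect a string value instead, so confirm against the docs. Rendering adds latency, so enable it only for pages that need it.
# Product payload from Step 2 with client-side rendering enabled.
payload = {
    "source": "amazon_product",
    "query": "B0CX23V2ZK",
    "domain": "com",
    "geo_location": "90210",
    "parse": True,
    "render": True
}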
Step 7: Receive results asynchronously via webhook.
Set callback_url in the payload to push parsed results to your endpoint without holding the connection open.
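A sketch of the same product payload with a webhook added. The callback URL is a placeholder for an endpoint you control, and some providers only honor callbacks on their asynchronous (push-pull) endpoint rather than the real-time one, so check the docs.
# Product payload with a webhook for asynchronous delivery.
# The URL below is a placeholder -- replace it with an HTTPS endpoint you
# control that accepts POSTed JSON.
payload = {
    "source": "amazon_product",
    "query": "B0CX23V2ZK",
    "domain": "com",
    "parse": True,
    "callback_url": "https://example.com/hooks/amazon-results"
}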
Step 8: Automate recurring jobs with the provider’s scheduler.
Create schedules in the dashboard to run product, search, or review collection at intervals and deliver output to cloud storage or your API.
Step 9: Localize results with domain and delivery location.
Use the marketplace TLD via domain and set geo_location (country, city, or ZIP where supported) to fetch accurate regional data.
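For example, the same product request localized to the German marketplace might look like the payload below; the postal code is illustrative, and supported geo_location formats vary by provider.
# Same ASIN, localized to amazon.de with a Berlin postal code as the
# delivery location (illustrative values).
payload = {
    "source": "amazon_product",
    "query": "B0CX23V2ZK",
    "domain": "de",
    "geo_location": "10115",
    "parse": True
}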
Method 2 — Use ScraperAPI’s structured Amazon endpoints
This option provides ready-to-use JSON for popular Amazon page types with GET requests and includes an output mode for text or markdown if you want LLM-ready content.
Step 1: Copy your api_key from the ScraperAPI dashboard.
Step 2: Request search results from the structured endpoint.
curl -G "https://api.scraperapi.com/structured/amazon/search" \
--data-urlencode "api_key=YOUR_API_KEY" \
--data-urlencode "query=boxing gloves" \
--data-urlencode "page=1" \
--data-urlencode "country=us"
Step 3: Parse the results array for fields like ASIN, title, price, rating, position, and image URL.
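The same request as Step 2, sent from Python, makes iterating the array straightforward. The key names inside each result (asin, name or title, price, and so on) can differ between provider versions, so print one raw item first and adjust.
import requests

# Structured search request mirroring the curl call in Step 2.
resp = requests.get(
    "https://api.scraperapi.com/structured/amazon/search",
    params={
        "api_key": "YOUR_API_KEY",
        "query": "boxing gloves",
        "page": 1,
        "country": "us",
    },
    timeout=60,
)
resp.raise_for_status()

for item in resp.json().get("results", []):
    # Key names are assumptions -- print(item) once to confirm them.
    print(item.get("asin"), item.get("price"), item.get("name") or item.get("title"))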
Step 4: Fetch specific page types using their structured routes.
# Product details by ASIN (example endpoint shape)
curl -G "https://api.scraperapi.com/structured/amazon/product" \
--data-urlencode "api_key=YOUR_API_KEY" \
--data-urlencode "asin=B08SJ3Y3QF" \
--data-urlencode "country=us"
Step 5: Return LLM-ready content by switching output mode.
curl -G "https://api.scraperapi.com/structured/amazon/product" \
--data-urlencode "api_key=YOUR_API_KEY" \
--data-urlencode "asin=B08SJ3Y3QF" \
--data-urlencode "output=markdown"
Step 6: Scale large jobs using the provider’s async mode and webhooks.
Use asynchronous requests to submit many URLs at once, receive notifications, and avoid managing retries and timeouts yourself.
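A minimal sketch of submitting one asynchronous job with a webhook notification. The endpoint URL, field names, and callback shape below are assumptions drawn from the provider's async documentation; verify them before relying on this.
import requests

# Assumed async jobs endpoint and payload shape -- confirm against the docs.
job = requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": "YOUR_API_KEY",
        "url": "https://www.amazon.com/dp/B08SJ3Y3QF",
        # Optional webhook so your app is notified instead of polling:
        "callback": {"type": "webhook", "url": "https://example.com/hooks/scrape-done"},
    },
    timeout=60,
)
print(job.json())  # typically contains a job id and a status URL to poll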
Method 3 — Run automated, no‑code Amazon jobs (ScraperAPI DataPipeline)
For teams that prefer configuration over code, the vendor template schedules complete projects and exports results to CSV/JSON or a webhook.
Step 1: Open the DataPipeline interface and select the Amazon template.
Step 2: Provide inputs as an ASIN list or a list of search queries.
Step 3: Choose where to deliver results.
- Webhook to your application.
- Downloadable JSON or CSV.
- Cloud storage destinations supported by the provider.
Step 4: Schedule the job to run at your preferred frequency.
Step 5: Start the pipeline and monitor completion status from the dashboard.
Method 4 — Use an open-source Amazon scraper CLI (GitHub)
This CLI demonstrates a straightforward way to turn category or department pages into a CSV without writing your own parser.
Step 1: Install Python 3.11 on your system.
Step 2: Clone the repository locally.
git clone https://github.com/oxylabs/amazon-scraper.git
cd amazon-scraper
Step 3: Install project dependencies.
make install
Step 4: Run the scraper with a specific Amazon department URL.
make scrape URL="https://www.amazon.com/s?i=specialty-aps&bbn=16225009011&rh=n%3A%2116225009011%2Cn%3A541966"
Step 5: Open the generated CSV to review item titles, URLs, ASINs, images, and prices.
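If you prefer to inspect the file programmatically, a few lines of Python will do; the filename below is a placeholder for whatever path the CLI reports when the run finishes.
import csv

# "output.csv" is a placeholder -- use the path printed by "make scrape".
with open("output.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    print(reader.fieldnames)   # confirm the exact column names
    print(next(reader, None))  # peek at the first data row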
Method 5 — Build a lightweight scraper in Python (DIY)
This approach gives full control but requires careful handling of anti-bot systems; many teams switch to an API when volume grows or reliability matters.
Step 1: Install libraries for HTTP and HTML parsing.
pip install requests beautifulsoup4
Step 2: Prepare a session with realistic headers and, if needed, a rotating proxy or scraping API gateway.
import requests
session = requests.Session()
session.headers.update({
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
"Accept-Language": "en-US,en;q=0.9"
})
# Optional proxy or API gateway
# session.proxies.update({"https": "http://user:pass@proxyhost:port"})
Step 3: Request a search results page for a keyword and check the status code.
from bs4 import BeautifulSoup
params = {"k": "boxing gloves"}
r = session.get("https://www.amazon.com/s", params=params, timeout=30)
r.raise_for_status()
soup = BeautifulSoup(r.text, "html.parser")
Step 4: Parse product tiles to extract name, URL, price, and ASIN.
items = []
for card in soup.select("div[data-asin][data-component-type='s-search-result']"):
    asin = card.get("data-asin")
    title_el = card.select_one("h2 a span")
    href_el = card.select_one("h2 a[href]")
    price_el = card.select_one("span.a-price span.a-offscreen")
    if asin and title_el and href_el:
        items.append({
            "asin": asin,
            "title": title_el.get_text(strip=True),
            "url": "https://www.amazon.com" + href_el["href"],
            "price": price_el.get_text(strip=True) if price_el else None
        })
print(len(items), "items")
Step 5: Throttle your crawl to avoid being rate-limited.
Introduce randomized delays between requests and consider exponential backoff on non‑200 responses.
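A small helper like the one below (a sketch; tune the delay window and retry count to your volume) wraps the session from Step 2 with both behaviors.
import random
import time

def polite_get(session, url, max_retries=4, **kwargs):
    """GET with a jittered delay before each attempt and exponential backoff
    on non-200 responses."""
    for attempt in range(max_retries):
        time.sleep(random.uniform(2.0, 6.0))  # randomized pause between requests
        r = session.get(url, timeout=30, **kwargs)
        if r.status_code == 200:
            return r
        time.sleep(2 ** attempt)              # back off: 1s, 2s, 4s, ...
    r.raise_for_status()                      # surface the final failure
    return r

# Usage with the Step 3 search request:
# r = polite_get(session, "https://www.amazon.com/s", params={"k": "boxing gloves"})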
Step 6: Save results to disk for downstream analysis.
import json, pathlib
pathlib.Path("data").mkdir(exist_ok=True)
with open("data/search-boxing-gloves.json", "w", encoding="utf-8") as f:
json.dump(items, f, ensure_ascii=False, indent=2)
Scaling, accuracy, and compliance tips
- Use geotargeting when prices or availability vary by region; set the marketplace domain and delivery location where supported.
- Prefer parsed JSON (parse=true) to cut data cleaning time and reduce selector breakage.
- Adopt async + webhooks for millions of URLs to avoid connection limits and to offload retries.
- Keep a small golden set of ASINs and compare fields over time to detect parser drift quickly (see the sketch after this list).
- Review site terms and avoid scraping while authenticated to sensitive accounts; separate infrastructure lowers risk.
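One way to implement the golden-set check: keep a snapshot of known-good records keyed by ASIN and flag watched fields that suddenly come back empty. Values such as price change legitimately, so missing fields are the stronger signal of parser breakage. The file name, field list, and record shape here are illustrative.
import json

WATCHED_FIELDS = ("title", "price", "rating")

# "golden_set.json" is an illustrative path: a JSON list of known-good records.
with open("golden_set.json", encoding="utf-8") as f:
    golden = {rec["asin"]: rec for rec in json.load(f)}

def detect_drift(fresh_records):
    """Yield (asin, field, baseline_value) for watched fields that went missing."""
    for rec in fresh_records:
        baseline = golden.get(rec.get("asin"))
        if baseline is None:
            continue
        for field in WATCHED_FIELDS:
            if rec.get(field) is None and baseline.get(field) is not None:
                yield rec["asin"], field, baseline[field]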
Pick the method that matches your workload: structured APIs for reliability and speed, no‑code for quick scheduling, or DIY for full control. With a small setup, you can move from manual browsing to consistent JSON in minutes.