A Comprehensive Walkthrough: Scraping and Cleansing eBay Product Data in Simple Steps
Originally published as https://www.octoparse.com/blog/scraping-and-cleansing-ebay-data#
Today, we'll show you step by step how to build your own no-code eBay web scraper to collect eBay product data in bulk. Then we'll conduct a quick research using the extracted data (price, sold number, product specification, availabilities, and more) to analyze if one kind of product sells better than the other kinds of product. If you are in the eCommerce business or are looking to run a similar analysis, definitely read on.
To make things easier, we will use Octoparse for data collection, and QuickTable for data cleansing & analysis. Both of these tools are built for non-coders and are quite intuitive to work with.
Table of Contents
Why scrape eBay product data
As a globally popular C2C e-commerce platform, there are more than 1.7 billion product listings on eBay, making it an irreplaceable data source for all kinds of business analysis. It enables you to gather a variety of data, such as product categories, price specifications, prices, availabilities, top sellers, sales, etc. This data can be valuable for finding the perfect marketing niche for your business, setting the right prices for the products you’re selling, monitoring how your customers are liking your products, sneaking around your competitors, and much more.
Define a goal for the study
The purpose of this study is to find out if there is a relationship between the prices, sold numbers, and scales of the toy vehicles that are selling on eBay. For our source data, we'll run a quick search on eBay using the keyword "toy car". What we've got in the search results is the product data we need for the research, especially, this is the URL to the search results: https://www.ebay.com/sch/i.html?_from=R40&_nkw=toy+car&_sacat=0&_pgn=35
Collect data - Scrape product data using Octoparse
First, we'll use Octoparse to scrape the product data from the search results pages.
Go ahead and download Octoparse on your device. Once you have it installed, sign up for a free account. The free plan is good for most web scraping scenarios unless you are on for something of a larger scale.
Step 1: Create a new task with auto-detection
Enter the target URL into the search bar and click the "Start" button. Octoparse will start loading the page right away.
Once the page finishes loading, click "Auto-detect webpage data". Octoparse will scan the page and try its best to "guess" what data you'd like to extract. Learn more about auto-detection.
Tips:
Octoparse also provides pre-built templates for eBay. You can find it on the home page or by searching for "eBay" in the search bar. With the template, you'd only need to enter a few required parameters to start collecting data from eBay right away.
Step 2: Create and modify the workflow
Click "Create a workflow" after the auto-detection is done. The scraping workflow will be auto-generated on the right-hand side. You may click on each step to check if it works properly and add or delete any unwanted steps. The data to be extracted is highlighted in red inside the built-in browser and is also available in the preview section below. You can edit the data fields directly in the preview section whenever needed.
Tips:
If Octoparse fails to detect the desired data on the page, you can turn off the "auto-detect" feature and build a workflow manually by following the tips on the Tips panel. Check out this step-by-step tutorial for how to build an eBay scraper from scratch.
Step 3: Scrape and export the data
Once the workflow is tested and working, click on the Run button. You can choose to run your scraping task in the cloud if you have a premium subscription or on your local device if you don’t have one. It can take a while to scrape the data depending on the exact amount of data you'd like to fetch. You can wait until the task is completed or stop the run when you have just enough data extracted. The data extracted can be exported and downloaded as CSV, Excel, or whatever formats work for you.
Tips:
If you are still having trouble building the perfect crawler at this point, no worries, here's a pre-made task file (.otd) that you can import into Octoparse and run to scrape data from eBay directly. Or, you can use this sample data to proceed with the rest of the post.
Clean data - turn data into usable formats
Now that we have the source data ready, it’s time to dig into the analysis part of it. We will use Quicktable to clean the dataset and create a graphic illustration of how these data points relate to one another.
Please sign up for a free account on QuickTable first.
Step 1: Upload the CSV file and create a new recipe
Launch QuickTable and create a new project named "Toy Car Analysis on eBay". Upload the scraped CSV file to the project as a new dataset and open it. Click the Save Recipe button to create a new recipe.
Step 2: Clean and rename the data columns
You'd want to remove any unwanted data from the dataset and keep only the columns that are relevant to the research. For the sake of this study, we'll keep the columns "Title", "Price", and "sitem__hotness". Rename the columns as needed.
Step 3: Extract the data into usable formats
3.1 Extract price information
The "Price" data we have now is currently in string format so you'd want to convert the data into numerical values before doing any sort of calculations with the data. We'll only keep the lower price for each product. Select the Price column, click the Text button, and then select Substring->Extract numbers. A new column "Price0" will be created to include only the lower price of the price range. You can delete the original data column and rename Price0 to Min Price.
3.2 Extract sold number with a formula
The data we now have for "sitem__hotness" is quite messy. It can be something like "Last one" or any number followed by the word "sold". Apparently, we are only interested in the numerical part of it.
To extract the sold number information from the column "sitem__hotness", we'll use a formula. Click on the Formula button and the formula panel will show up on the right side. Enter a name for the new column and then input the below formula: IF(CONTAINS(`sitem__hotness`, "sold"), `sitem__hotness`, "")
3.3 Extract scale information with regular expression
Notice there isn't a single column for the scale of the toys but this information is very often revealed in the Title column. In this case, we'll need to clean the Title column using a regular expression that extracts the scale info from the product titles. Click "Text->Substring->Regular extraction", then select column "Title" in the Substring panel, enter the regular expression, 1[:/][0-9]{1,3}, then enter the new column name "Scale". The scale information will then be extracted into a new column.
Still, the extracted scale info is displayed with either "/" or ":." and can be tough to work with. To keep the format consistent, we'll replace one of them with the other one. Click Replace, select column Scale, and enter "/" as the Original value and ":" as the New value, then click the "Save" button.
3.4 Calculate the average price and sold numbers
Last but not least, we'll group the products by scale, and find out what the average prices and sold numbers are for toys of various scales. To do that, we need to add a "Group by" step. Click the "Group by" button, select the column Scale in the list box "Group by", and Min Price and Average in the list box "Column calculations". Click Save and you'll get an average price and sold number of products of different scales.
With QuickTable, every step of the data cleaning process will be saved and kept in the Recipe panel. You can review the steps or reuse the formula whenever needed.
Analyze data - visualize data using charts and graphs
Now we have the data cleaned and ready, the easiest way to see if there's any relationship between the toy scales, the prices and the sold number is to create a chart. Click Chart, and define X and Y as shown in the following image. Click the "Save" button and you'll get a chart created right away.
Looking at this chart, it is clear that the 1:32 toy cars are the best-selling ones on eBay, and the 1:18 toy cars are more expensive than toys of any other scale. In conclusion, our study does show that out of all the toy vehicles that are selling on eBay, those of 1:32 scale sell the best. So if you are a seller who wants to boost sales volume, you’d probably want to sell more of those. On the contrary, if you are aiming for higher prices items, you’d probably want to sell more 1:18 toy cars.
Wrap up
This study is only an oversimplified example of how you can conduct research of interest using product data on eBay. There's no doubt about eBay being one of the largest data goldmines for market research, competitor analysis, trend research and much more. Tools like Octoparse and QuickTable make it possible for anyone to compile and analyze the data without the need to code. Of course, the idea is not restricted to eBay but can be extended to any other eCommerce stores like Amazon, AliExpress, and others.
Comments
Post a Comment