Revamp Your Supplier Search on AliExpress with Data Scraping and Cleansing

Originally published as https://www.octoparse.com/blog/scraping-and-cleansing-aliexpress-data

Do you know where to find suppliers for your e-commerce business? AliExpress is worth a look. It is a top global e-commerce platform for both retail and wholesale, and many sellers on Amazon and Shopify source their stock from it.

 

This article shows how you can use data to find reliable suppliers on AliExpress.

 

Taking "toy cars" as an example, we will guide you step by step through extracting data from AliExpress, including suppliers, products, prices, ratings, and the number sold. Then we will clean and aggregate the scraped data to get the average price and rating for each supplier, as well as the total number sold. With this information in hand, you can easily see which suppliers are dependable and suitable for your business.

 

 

Why scrape data from AliExpress

There are more than 100 million product listings on AliExpress. Almost everything you need in daily life can be found there, such as cell phones, clothes, toys, and surveillance devices. You can collect prices, ratings, reviews, and supplier information from it, just as you can from Amazon or Shopify.

 

AliExpress differs from other e-commerce platforms in that it lets you track product sales, which most other platforms do not. This data can be quite valuable for identifying market trends and potential suppliers and, ultimately, for developing a deeper understanding of the products and the market.

 

Scrape data with Octoparse

Octoparse is a no-code web crawler that makes data scraping easier and faster without requiring you to write code. If this is your first time using the tool, go to Octoparse to download the software and install it on your device, then sign up for a free account and log in.

 

Step 1: Create a new task

Enter the target URL below into the search bar and click "Start". The page will load in Octoparse's built-in browser within a few seconds.

 

Target URL: https://www.aliexpress.com/wholesale?catId=0&initiative_id=SB_20221122192506&SearchText=toy+car&spm=a2g0o.home.1000002.0&dida=y
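If you want to run the same task for a different product, the search keyword appears to live in the `SearchText` query parameter of the target URL. The sketch below decodes that parameter and swaps in another keyword using Python's standard library. The helper name `with_search_text` is hypothetical, and whether AliExpress accepts the rewritten URL with the other parameters left unchanged is an assumption you should verify in the browser.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

TARGET_URL = (
    "https://www.aliexpress.com/wholesale?catId=0"
    "&initiative_id=SB_20221122192506&SearchText=toy+car"
    "&spm=a2g0o.home.1000002.0&dida=y"
)


def with_search_text(url, keyword):
    """Return a copy of url whose SearchText parameter is set to keyword."""
    parts = urlsplit(url)
    # parse_qs decodes "toy+car" to "toy car" and keeps the parameter order.
    query = parse_qs(parts.query, keep_blank_values=True)
    query["SearchText"] = [keyword]
    # doseq=True writes each list value back as a plain key=value pair.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


current_keyword = parse_qs(urlsplit(TARGET_URL).query)["SearchText"][0]
new_url = with_search_text(TARGET_URL, "rc car")
```

Here `current_keyword` is the decoded keyword from the tutorial's URL, and `new_url` is the same URL with only the keyword replaced.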

 

[Screenshot: create a new task]

 

Step 2: Select the data we need

2.1 Once the page is loaded, click "Auto-detect webpage data" on the Tips panel. Then Octoparse will look through the whole page and give you a data preview. The data contained in this preview will be highlighted in red on the page.

 

[Screenshot: auto-detect webpage data]

 

2.2 Preview all the selected data fields at the bottom. By clicking on the name of a data field, you can check the exact location of that field on the page. Next, remove unnecessary data and rename the data fields as needed. In this case, we'll keep the product URL, price, number sold, rating, and supplier for further use.

 

Tips:

The auto-detection approach is basically Octoparse predicting the data you might need. If you can't get the desired data fields with auto-detection, you can try selecting the elements manually by following the tips on the Tips Panel.

 

Step 3: Create and modify the workflow

After selecting all the data you want, click "Create Workflow". Then a workflow will show up on the right-hand side. You can gain an overview of the entire scraping process and check if the steps work properly by clicking through them.

 

Step 4: Run the task and export the data

When everything is set, click "Run" to start the scraping process. Choose whether to run the task on your device or in the Cloud, and whether to use Standard Mode or Boost Mode. Octoparse will take care of the rest for you. Once the run is finished, export the data as a CSV file.

 

[Screenshot: run the task]

 

Clean and analyze data using QuickTable

A look at the scraped dataset shows that it is still messy. Before diving into any further analysis, we should do some simple data cleansing. We'll use QuickTable, a handy Excel alternative for handling large and messy datasets.

 

Clean data

Step 1: Upload the data file and create a new recipe

Sign up for a free account on QuickTable, then log in. Create a new project called "Toy Car Analysis on AliExpress", and upload the scraped CSV file into the project as a new dataset. Once the data file is successfully uploaded, you can open it and click the "Save Recipe" button to create a new recipe.

 

Step 2: Keep and rename columns

Keep the columns "Title_URL", "mgxne1", "mgxne2", "_1knf9", "expam", and "ox0kz". Then, rename them as "product link", "price fragment 1", "price fragment 2", "sold", "rating", and "supplier" respectively.
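If you prefer to script this step rather than use QuickTable, the same keep-and-rename operation can be sketched with Python's built-in `csv` module. The column names come from the step above. The function name `keep_and_rename` and the sample row are made up for illustration, and your export may use different auto-generated column names.

```python
import csv
import io

# Scraped column name -> readable name, as listed in the step above.
COLUMN_NAMES = {
    "Title_URL": "product link",
    "mgxne1": "price fragment 1",
    "mgxne2": "price fragment 2",
    "_1knf9": "sold",
    "expam": "rating",
    "ox0kz": "supplier",
}


def keep_and_rename(csv_file):
    """Read CSV rows, keep only the mapped columns, and rename them."""
    reader = csv.DictReader(csv_file)
    return [
        {new: row.get(old, "") for old, new in COLUMN_NAMES.items()}
        for row in reader
    ]


# Made-up sample standing in for the exported file.
sample = io.StringIO(
    "Title,Title_URL,mgxne1,mgxne2,_1knf9,expam,ox0kz\n"
    "Toy car,https://example.com/item/1,3,45,120 sold,4.8,Shop A\n"
)
cleaned = keep_and_rename(sample)
```

For a real export, pass an open file instead of the sample, for example `open("export.csv", newline="", encoding="utf-8")`, where the file name is a placeholder.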

 

Step 3: Recover the price information

You may have noticed that product prices were split improperly in the file. This is because a dot separates the two parts of each price on the original website. Using the "Merge" menu in QuickTable, you can merge the two columns in a second and create a new column with the correct prices.
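For readers who want to see what the merge does, here is a minimal Python sketch of the same idea. It assumes that "price fragment 1" holds the digits before the dot and "price fragment 2" the digits after it, possibly with a currency prefix or stray characters. The function name `merge_price` is hypothetical, and this is not how QuickTable implements its "Merge" menu.

```python
import re
from decimal import Decimal


def merge_price(fragment_1, fragment_2):
    """Rejoin a price that was split at the decimal point."""
    # Drop everything except digits, e.g. a "US $" prefix or thousands commas.
    whole = re.sub(r"\D", "", fragment_1)
    cents = re.sub(r"\D", "", fragment_2)
    if not whole:
        raise ValueError(f"no digits found in {fragment_1!r}")
    # Decimal avoids the rounding surprises of binary floats for money.
    return Decimal(f"{whole}.{cents or '0'}")


price = merge_price("US $12", "09")
```

With the inputs shown, `price` is `Decimal("12.09")`. If the second fragment is empty, the price is treated as a whole number.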

 

[Screenshot: merge columns]

 

 

Step 4: Extract the numeric value of the "sold" column

Take another look at the column "sold", and you'll find that the data is in string format and cannot be used for calculations directly. We need to have it extracted into numerical values before we perform any sort of calculations with it. You can use "Text->Substring->Extract number" to extract the numerical values into a new column. Then, remove the original column and rename the newly-created column to "sold".
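The extraction can also be sketched in a few lines of Python with a regular expression. This example assumes the "sold" cells look like "120 sold" or "1,234 sold". The function name `extract_number` is hypothetical, and abbreviated values such as "5K+" are not handled.

```python
import re


def extract_number(text):
    """Return the first integer found in text, or None if there is none."""
    # Match a run of digits that may contain thousands separators.
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


sold_values = [extract_number(cell) for cell in ["120 sold", "1,234 sold", ""]]
```

Cells without any digits come back as `None`, so you can decide later whether to drop them or count them as zero.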

 

Analyze data

We now have a clean dataset, and it's time to dive into the numbers. A supplier may sell several different products on AliExpress, so we'll first group the data by supplier. Click the "Group by" button and select "supplier" in the "Group by" list box.

 

To get the average price and rating, as well as the total number sold, perform the calculation steps as shown in the screenshot. Click on Save and you will get the results in three columns.
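As a rough equivalent of the group-by step, the following Python sketch computes the same three aggregates per supplier. It assumes each row is a dictionary with numeric "price", "rating", and "sold" values, where a missing rating is `None`. The function name `summarize_by_supplier` and the sample rows are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_supplier(rows):
    """Group rows by supplier and compute average price, average rating, total sold."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["supplier"]].append(row)

    summary = {}
    for supplier, items in groups.items():
        # Products without a rating are left out of the rating average.
        ratings = [item["rating"] for item in items if item["rating"] is not None]
        summary[supplier] = {
            "avg_price": mean(item["price"] for item in items),
            "avg_rating": mean(ratings) if ratings else None,
            "total_sold": sum(item["sold"] for item in items),
        }
    return summary


# Made-up rows standing in for the cleaned dataset.
rows = [
    {"supplier": "Shop A", "price": 2.5, "rating": 4.5, "sold": 120},
    {"supplier": "Shop A", "price": 3.5, "rating": 5.0, "sold": 30},
    {"supplier": "Shop B", "price": 8.0, "rating": None, "sold": 7},
]
summary = summarize_by_supplier(rows)
```

Each supplier ends up with one entry holding the three values that the QuickTable result shows as three columns.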

 

[Screenshot: group data by supplier]

 

 

Now that you have the average product price for each supplier, you can easily see which suppliers sell at prices that fit your budget. In addition, a high total number sold together with a high average rating helps you pinpoint suppliers that are well established and have a good reputation.
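To turn those criteria into a shortlist, one option is a simple filter over the per-supplier figures. The sketch below assumes a dictionary that maps each supplier to its average price, average rating, and total sold. The function name `shortlist`, the thresholds, and the sample figures are all hypothetical, so pick limits that match your own budget and risk tolerance.

```python
def shortlist(summary, max_avg_price, min_avg_rating, min_total_sold):
    """Return supplier names that meet all three thresholds, best sellers first."""
    picks = [
        (name, stats)
        for name, stats in summary.items()
        if stats["avg_price"] <= max_avg_price
        and stats["avg_rating"] is not None
        and stats["avg_rating"] >= min_avg_rating
        and stats["total_sold"] >= min_total_sold
    ]
    # Rank the remaining suppliers by total units sold, highest first.
    picks.sort(key=lambda item: item[1]["total_sold"], reverse=True)
    return [name for name, _ in picks]


# Made-up per-supplier figures for illustration.
supplier_summary = {
    "Shop A": {"avg_price": 3.0, "avg_rating": 4.75, "total_sold": 150},
    "Shop B": {"avg_price": 8.0, "avg_rating": None, "total_sold": 7},
    "Shop C": {"avg_price": 4.0, "avg_rating": 4.5, "total_sold": 900},
}
candidates = shortlist(supplier_summary, max_avg_price=5.0, min_avg_rating=4.5, min_total_sold=100)
```

Suppliers with no rating at all are excluded here, a cautious choice that you may want to relax for new shops.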

 

Wrap-up
When looking for suppliers online, there are many things to take into account. Price, rating, and sales are only the basics; additional data such as the number of orders, location, and shipping fees need to be considered as well. With data extraction, you can gather whatever information your analysis needs and eventually find the right suppliers for your e-commerce business.
