Importing Shopify product info programmatically

Sometimes eCommerce product data is scraped for the purpose of reselling the products. For example, a retail eCommerce company might source its products from a distributor that does not provide an easy way to integrate with a Shopify store. This problem can be solved through web scraping: once the data is scraped, it can be imported into the Shopify store.

One way to do that is to wrangle the product dataset into one or more files that follow the Shopify product CSV schema and import them via the Shopify store admin dashboard. However, this entails human intervention and may be considered inferior to a fully automatic solution.

The good news is that Shopify provides a public API for store administration tasks, including importing and updating products. We can integrate this API with our web scraping solution.

We will show how the REST version of the Shopify Admin API can be used to upload a pre-scraped product dataset. But first, we need to get some API credentials. The following assumes you have a development store that you can play with.

We need to create something called a Custom App. In your Shopify admin dashboard, go to the Apps tab and press the “Develop apps” button at the top. On the next page, press the “Create an app” button at the top. A modal view will ask for the app name. Once the app is created, we need to configure API scopes. Press the “Configure Admin API scopes” button on the new page. A bunch of checkboxes will be presented to you. For the purpose of this task, we need to choose read_products and write_products. Now press the “Save” button at the bottom of the page, then press the “Install app” button at the top of the page. You will be given a chance to see and copy the Admin API access token. Make sure you save it somewhere before navigating away from the page (the script below reads it from a file named api_token.txt). Don’t worry about the API key and secret - we won’t be needing them here.
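Before running a full import, it can be worth checking that the token and scopes actually work. The following is a minimal sketch, assuming the products/count.json endpoint of the same 2022-04 API version used later in this post. The names build_count_request and count_products are hypothetical helpers, and the standard library urllib is used here only to keep the snippet free of third-party dependencies.

```python
import json
import urllib.request

API_VERSION = "2022-04"  # same version as the import script in this post


def build_count_request(store_name, api_token):
    """Return (url, headers) for a read-only call that needs read_products."""
    url = "https://{}.myshopify.com/admin/api/{}/products/count.json".format(
        store_name, API_VERSION
    )
    # Strip whitespace in case the token was read from a file with a newline.
    headers = {"X-Shopify-Access-Token": api_token.strip()}
    return url, headers


def count_products(store_name, api_token):
    """Call the API; an HTTPError (e.g. 401 or 403) hints at a bad token or scope."""
    url, headers = build_count_request(store_name, api_token)
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["count"]
```

Calling count_products("your-store-name", token) against a development store should return the number of products if the credentials are set up correctly.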

Let us get familiar with the Shopify product data model. There are three entities of interest to us: product, product variant and product image. A product is the parent object that may have one or more variants associated with it, and some images as well. Furthermore, product variants are allowed to make a secondary association to product images (e.g. a product variant of a specific color will point to the corresponding image). For the exact details, see the reference pages for these three resources on the Shopify developer portal.
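To make these relationships concrete, here is a rough sketch of how a product with two variants and two images could look as a Python dictionary. The IDs and values are made up for illustration. The field names image_id (on a variant) and variant_ids (on an image) are the ones the REST Admin API uses for the secondary association, but the reference pages remain the authority on the full field list. image_for_variant is a hypothetical helper.

```python
# Hypothetical product as returned by the REST Admin API (heavily trimmed).
product = {
    "id": 1001,
    "title": "Example Trainers",
    "images": [
        {"id": 501, "position": 1, "src": "https://example.com/red.jpg", "variant_ids": [2001]},
        {"id": 502, "position": 2, "src": "https://example.com/blue.jpg", "variant_ids": [2002]},
    ],
    "variants": [
        {"id": 2001, "sku": "SKU-RED", "option1": "Red", "image_id": 501},
        {"id": 2002, "sku": "SKU-BLUE", "option1": "Blue", "image_id": 502},
    ],
}


def image_for_variant(product, variant):
    """Follow the variant's image_id to the matching product image, if any."""
    for image in product["images"]:
        if image["id"] == variant.get("image_id"):
            return image
    return None


for variant in product["variants"]:
    image = image_for_variant(product, variant)
    print(variant["sku"], "->", image["src"] if image else "no image")
```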

We will be importing a product CSV file that has the following columns:

  • title - product title.
  • asin - Amazon product ID.
  • price - price in GBP, with the pound sign included. Sometimes a range of prices.
  • brand - brand name that sometimes needs a little cleanup.
  • product_details - product description.
  • breadcrumbs - slash-separated list of breadcrumb strings.
  • images_list - JSON array containing URLs of product images.
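As a small illustration of what such rows look like once parsed, the sketch below reads a single made-up row with csv.DictReader and normalizes the fields that need cleanup. The sample values and the clean_row helper are hypothetical. The cleanup rules mirror those used in the import script in this post and assume the brand text follows the "Visit the ... Store" pattern.

```python
import csv
import io
import json

# One made-up row using the column names listed above.
SAMPLE_CSV = (
    "title,asin,price,brand,product_details,breadcrumbs,images_list\n"
    'Example Trainers,B000000000,£20.00 - £35.50,Visit the Acme Store,'
    'Lightweight running shoe,Shoes/Men/Trainers,'
    '"[""https://example.com/1.jpg"", ""https://example.com/2.jpg""]"\n'
)


def clean_row(row):
    """Return a copy of the row with price, brand, breadcrumbs and images normalized."""
    return {
        "title": row["title"],
        "asin": row["asin"],
        # For a price range, keep the upper bound and drop the pound sign.
        "price": row["price"].split(" - ")[-1].replace("£", ""),
        "brand": row["brand"].replace("Visit the ", "").replace(" Store", ""),
        "product_details": row["product_details"],
        "breadcrumbs": row["breadcrumbs"].split("/"),
        "images": json.loads(row["images_list"]),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = clean_row(rows[0])
print(cleaned["price"], cleaned["brand"], cleaned["breadcrumbs"], len(cleaned["images"]))
```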

Let us discuss the following code that implements the importing.

#!/usr/bin/python3

import csv
import json
import time
from pprint import pprint

import requests

API_TOKEN = None


def check_if_present(store_name, title, asin):
    headers = {"Content-Type": "application/json", "X-Shopify-Access-Token": API_TOKEN}

    api_url = "https://{}.myshopify.com/admin/api/2022-04/products.json".format(
        store_name
    )

    params = {"title": title}

    resp = requests.get(api_url, headers=headers, params=params, timeout=10)

    json_dict = resp.json()

    if json_dict.get("products") is None or len(json_dict.get("products")) == 0:
        return False

    for prod_dict in json_dict.get("products"):
        variant = prod_dict.get("variants")[0]
        if variant.get("sku") == asin:
            return True

    return False


def import_product(store_name, product):
    headers = {"Content-Type": "application/json", "X-Shopify-Access-Token": API_TOKEN}

    api_url = "https://{}.myshopify.com/admin/api/2022-04/products.json".format(
        store_name
    )

    payload = {
        "product": {
            "title": product.get("title"),
            "body_html": product.get("product_details"),
            "vendor": product.get("brand", "")
            .replace("Visit the ", "")
            .replace(" Store", ""),
            "tags": product.get("breadcrumbs", "").split("/"),
            "published": True,
            "options": [
                {"name": "Size"},
            ],
            "product_type": "Shoes",
            "images": [],
            "variants": [
                {
                    "sku": product.get("asin"),
                    "price": product.get("price").split(" - ")[-1].replace("£", ""),
                    "requires_shipping": True,
                }
            ],
        }
    }

    i = 1

    for img_url in json.loads(product.get("images_list")):
        payload["product"]["images"].append(
            {
                "src": img_url,
                "position": i,
            }
        )

        i += 1

    pprint(payload)

    resp = requests.post(api_url, headers=headers, json=payload, timeout=10)

    if resp.status_code != 201:
        print("Error: Import to shopify failed!")
        print(resp.text)


def main():
    global API_TOKEN

    store_name = input("Enter store name (part of subdomain before .myshopify.com): ")

    # Strip whitespace so a trailing newline does not end up in the HTTP header.
    with open("api_token.txt", "r") as t_f:
        API_TOKEN = t_f.read().strip()

    # The csv module documentation recommends opening files with newline="".
    in_f = open("amazon_uk_shoes_dataset.csv", "r", newline="")

    csv_reader = csv.DictReader(in_f)

    for row in csv_reader:
        if row.get("price") == "":
            continue

        if not check_if_present(store_name, row.get("title"), row.get("asin")):
            import_product(store_name, row)
        else:
            print(
                "{} ({}) already present - skipping...".format(
                    row.get("title"), row.get("asin")
                )
            )

        time.sleep(0.5)

    in_f.close()


if __name__ == "__main__":
    main()

In the main() function we read the CSV file row by row. We skip rows that do not have a price defined.

Next, we call the check_if_present() function, which checks whether a product matching the title from the CSV, with a variant SKU matching the asin field, is already present in the store. This is necessary to prevent duplicate product listings from appearing in the store when the script is launched multiple times. If no pre-existing product listing is found, we call the import_product() function, which generates a JSON payload describing the product in question (with some data cleanup) and sends an HTTP POST request to the corresponding API endpoint. In both cases we must set the X-Shopify-Access-Token header to the API token we got from the Shopify admin dashboard to authorize the request. Since the POST request creates a new resource, a successful response has status code 201, so we treat anything else as an error.
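The duplicate check boils down to scanning the products returned by the title query for a variant whose SKU equals the ASIN. Below is a minimal sketch of just that matching step, written as a pure function so it can be tried without network access. has_matching_sku is a hypothetical name, and unlike the script it looks at every variant rather than only the first one.

```python
def has_matching_sku(products_response, asin):
    """Return True if any variant of any returned product has the given SKU.

    products_response is the decoded JSON body of a products.json query,
    e.g. {"products": [{"variants": [{"sku": "..."}]}]}.
    """
    for product in products_response.get("products") or []:
        for variant in product.get("variants") or []:
            if variant.get("sku") == asin:
                return True
    return False


# Made-up response bodies for illustration.
existing = {"products": [{"title": "Example Trainers", "variants": [{"sku": "B000000000"}]}]}
empty = {"products": []}

print(has_matching_sku(existing, "B000000000"))
print(has_matching_sku(empty, "B000000000"))
```

Treating a missing "products" key as "not present" keeps the behaviour of the script, although an error body from the API would arguably deserve separate handling.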

You may notice that time.sleep() is called in the code to slow it down. That’s because of Shopify rate limits: we must not go faster than two requests per second. That is indeed quite slow, so importing a non-trivial number of products will take some time.
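A fixed sleep is the simplest approach. A slightly more adaptive sketch is shown below, assuming the REST Admin API behaviour of reporting bucket usage in the X-Shopify-Shop-Api-Call-Limit response header (formatted like "32/40") and answering HTTP 429 with a Retry-After header when the limit is exceeded. seconds_to_wait and its 0.8 threshold are hypothetical choices, not something the original script does.

```python
def seconds_to_wait(status_code, headers, default_delay=0.5):
    """Suggest how long to pause before the next request.

    headers is a mapping of response headers (for example resp.headers
    from requests). The 0.8 threshold is an arbitrary illustrative choice.
    """
    if status_code == 429:
        # Throttled: honour Retry-After if present, else back off for 2 seconds.
        return float(headers.get("Retry-After", 2.0))

    call_limit = headers.get("X-Shopify-Shop-Api-Call-Limit")
    if call_limit:
        used, capacity = (int(part) for part in call_limit.split("/"))
        if used / capacity >= 0.8:
            # Bucket is nearly full: slow down to let it drain.
            return default_delay * 2

    return default_delay


print(seconds_to_wait(200, {"X-Shopify-Shop-Api-Call-Limit": "5/40"}))
print(seconds_to_wait(200, {"X-Shopify-Shop-Api-Call-Limit": "36/40"}))
print(seconds_to_wait(429, {"Retry-After": "2.0"}))
```

In a script like the one above, the returned value could be passed to time.sleep() after each request, using the status code and headers of the latest response.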

Shopify also has client libraries for using its API in a few programming languages, including Python. The Shopify Python SDK is based on the pyactiveresource library, which attempts to extend the concept of Object-Relational Mapping to RESTful APIs. On the surface, this seems like an improvement, because now we have an object-oriented API that allows dealing with API entities in a cleaner way. Instead of constructing the JSON payload dictionary as we did before, we could instantiate a Product object and set a property for each field. However, this library is not without its own problems. Since it calls the very same REST API, we still need to deal with the slowness of data importing. Furthermore, it does not have a proper API session object and relies on global state fairly heavily. At the time of writing, the documentation for this library is outdated and discusses the now-deprecated way of setting up a Private App to access the API instead of the modern way we used in the code above.

Shopify also has a GraphQL API with a bulk import process that entails converting the data to a JSONL file, uploading it to an S3 bucket and calling the GraphQL API to get it processed. But that’s a topic for another day.
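Just to give a flavour of the first step, here is a rough sketch of converting product rows to JSONL. It assumes each line carries the variables for one product creation mutation under an "input" key, and the field names (title, descriptionHtml, vendor, tags) are assumptions that should be verified against the GraphQL reference for the API version in use. rows_to_jsonl is a hypothetical helper, and the upload and processing steps are not covered.

```python
import json


def rows_to_jsonl(rows):
    """Serialize product rows as JSONL, one set of mutation variables per line."""
    lines = []
    for row in rows:
        variables = {
            "input": {
                "title": row["title"],
                "descriptionHtml": row["product_details"],
                "vendor": row["brand"],
                "tags": row["breadcrumbs"].split("/"),
            }
        }
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        lines.append(json.dumps(variables, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


# Made-up row for illustration.
sample_rows = [
    {
        "title": "Example Trainers",
        "product_details": "Lightweight running shoe",
        "brand": "Acme",
        "breadcrumbs": "Shoes/Men/Trainers",
    }
]
print(rows_to_jsonl(sample_rows), end="")
```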

Trickster Dev


By rl1987, 2022-06-04