Overview

SESE’s Price Scraper is a Python application that scrapes competitors’ websites for price information. SESE uses this information to decide how much to raise the price of each of our seed varieties each year.

When run, the application reads a tab-delimited file containing the SKU#, name, category and organic status of each of our products. It uses this information to create a Product object for each SESE variety.
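
The loading step can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the Product fields, the column order, the absence of a header row, and the "True"/"False" encoding of organic status are all assumptions, and the sample rows are made up.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical stand-in for the application's Product class."""
    sku: str
    name: str
    category: str
    organic: bool


def read_products(handle):
    """Build one Product per non-empty row of a tab-delimited file object."""
    reader = csv.reader(handle, delimiter="\t")
    products = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Assumed column order: SKU#, name, category, organic status.
        sku, name, category, organic = row[:4]
        is_organic = organic.strip().lower() == "true"
        products.append(Product(sku, name, category, is_organic))
    return products


# Made-up sample data standing in for the real input file.
sample = io.StringIO(
    "10101\tBlue Lake Bean\tBeans\tTrue\n"
    "20202\tDanvers Carrot\tCarrots\tFalse\n"
)
products = read_products(sample)
```

With a real file, the same function would be called on a handle opened with newline="" so that the csv module handles line endings itself.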

Each Product object creates one new object for each website to scrape. These objects are instances of classes that subclass the BaseSite abstract class, such as the BotanicalInterests class. Each website class implements the logic for scraping a single website for a single product.
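
The relationship between BaseSite and a website class might look something like the sketch below. The method name find_price, the ABBREVIATION attribute, and the ExampleSite class are hypothetical; the real BaseSite may define different abstract methods. The example looks a price up in a hard-coded dictionary instead of making a network request.

```python
from abc import ABC, abstractmethod


class BaseSite(ABC):
    """Hypothetical shape of the abstract base class."""

    # Assumed: a short abbreviation identifying the company.
    ABBREVIATION = None

    def __init__(self, product_name):
        self.product_name = product_name

    @abstractmethod
    def find_price(self):
        """Return the price as a string, or None if no match is found."""


class ExampleSite(BaseSite):
    """Illustrative subclass; a real one would fetch and parse a web page."""

    ABBREVIATION = "ex"
    # Made-up data standing in for a scraped catalog.
    _FAKE_CATALOG = {"Danvers Carrot": "3.25"}

    def find_price(self):
        return self._FAKE_CATALOG.get(self.product_name)


site = ExampleSite("Danvers Carrot")
price = site.find_price()
```

Because find_price is declared abstract, Python refuses to instantiate BaseSite directly or any subclass that forgets to override it, which catches an incomplete website class early.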

After every Product object has created all its website objects, the application iterates over the Product objects and writes the results to a tab-delimited output file.
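
As a rough sketch of the output step, the function below writes one row per product with one price column per company. The row layout, the dictionary structure, and the abbreviations "bi" and "ex" are assumptions made for illustration; the real output columns may differ.

```python
import csv
import io


def write_report(rows, company_order, handle):
    """Write a header row, then one tab-delimited row per product."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["SKU", "Name"] + list(company_order))
    for row in rows:
        # Leave the cell empty when a company had no price for the product.
        prices = [row["prices"].get(company, "") for company in company_order]
        writer.writerow([row["sku"], row["name"]] + prices)


# Made-up data: one product, priced by only one of two companies.
rows = [{"sku": "20202", "name": "Danvers Carrot", "prices": {"bi": "3.25"}}]
buffer = io.StringIO()
write_report(rows, ["bi", "ex"], buffer)
output = buffer.getvalue()
```

Passing the company order in explicitly keeps the columns stable from run to run, which is presumably the role a setting such as COMPANY_HEADER_ORDER plays.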

To add a website for the application to scrape, create a new website class that subclasses BaseSite and overrides its abstract methods. Then edit the settings module so that the COMPANY_HEADER_ORDER setting contains the abbreviation of the new website and the COMPANIES_TO_PROCESS setting contains the dotted path to the website’s implementation class, for example sites.botanical_interests.BotanicalInterests.
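
The two settings might look like the fragment below, followed by one common way of turning a dotted path into a class. Treating both settings as lists, the abbreviation "bi", and the load_class helper are assumptions; the application may store and resolve these values differently. The helper is demonstrated on a standard-library class so the sketch runs without the sites package.

```python
import importlib

# Assumed settings layout; "bi" is a hypothetical abbreviation.
COMPANY_HEADER_ORDER = ["bi"]
COMPANIES_TO_PROCESS = ["sites.botanical_interests.BotanicalInterests"]


def load_class(dotted_path):
    """Import the module part of a dotted path and return the named class."""
    module_path, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Demonstrated with a standard-library class instead of a real website class.
loaded = load_class("collections.OrderedDict")
```

Resolving classes from path strings this way means a new website can be enabled by editing the settings alone, without touching the code that drives the scrape.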