SESE’s Price Scraper is a Python application built to scrape competitors’ websites for price information. SESE uses this information to determine the size of our yearly price increase for each variety of seed we carry.
When run, the application reads a tab-delimited file containing the SKU#, name,
category and organic status of each of our products. It uses this information
to create a
Product object for each SESE variety.
Each Product object then creates a new object for each website to
scrape. These objects are from classes that sub-class the
BaseSite abstract class, such as the
BotanicalInterests class. The website
classes implement the specific functionality for scraping a single website for
a single product.
Once each Product object has created all its website
objects, the application runs through each one in turn,
creating a tab-delimited file as the output.
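
The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the application's actual code: Product, BaseSite and BotanicalInterests are the names used above, while the find_price method, the helper functions read_products and write_report, the output column order and the "True"/"False" encoding of the organic status are all hypothetical. The scraping step is stubbed out so the sketch runs without network access.

```python
import csv
import io
from abc import ABC, abstractmethod


class BaseSite(ABC):
    """Scrapes a single website for a single product."""

    def __init__(self, product):
        self.product = product

    @abstractmethod
    def find_price(self):
        """Return the competitor's price as a string, or None if not found.

        Hypothetical method name; the real abstract methods may differ.
        """


class BotanicalInterests(BaseSite):
    def find_price(self):
        # A real implementation would fetch and parse the website here.
        # This stub reports "not found" so the sketch runs offline.
        return None


class Product:
    """One SESE variety, holding one site object per website to scrape."""

    def __init__(self, sku, name, category, organic, site_classes):
        self.sku = sku
        self.name = name
        self.category = category
        self.organic = organic
        self.sites = [site_class(self) for site_class in site_classes]

    def to_row(self):
        # Assumed layout: product fields first, then one price per website.
        prices = [site.find_price() or "" for site in self.sites]
        return [self.sku, self.name, self.category, str(self.organic)] + prices


def read_products(handle, site_classes):
    """Build a Product for each row of a tab-delimited input file."""
    products = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        sku, name, category, organic = row
        # Assumption: organic status is stored as the text "True" or "False".
        products.append(
            Product(sku, name, category, organic == "True", site_classes)
        )
    return products


def write_report(products, handle):
    """Write one tab-delimited output row per product."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for product in products:
        writer.writerow(product.to_row())


if __name__ == "__main__":
    # Made-up input row, used only to exercise the sketch.
    source = io.StringIO("1001\tExample Kale\tGreens\tTrue\n")
    report = io.StringIO()
    write_report(read_products(source, [BotanicalInterests]), report)
    print(report.getvalue(), end="")
```

Passing the list of site classes into Product keeps the sketch independent of any particular settings layout; in the real application that list is driven by configuration.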
To add a website for the application to scrape, a new
website class should be created, sub-classing the
BaseSite class and overriding its abstract methods. Next, the
settings should be edited so that the
relevant setting contains the abbreviation of the new website, and the
COMPANIES_TO_PROCESS setting contains the path to the
website’s implementation class, for example