Overview
SESE’s Price Scraper is a Python application built to scrape competitors’ websites for price information. SESE uses this information to determine the size of our yearly price increase for each seed variety we carry.
When run, the application reads a tab-delimited file containing the SKU#, name, category and organic status of each of our products. It uses this information to create a Product object for each SESE variety.
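The input step described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the Product fields, the load_products helper, the column order, the "TRUE"/"FALSE" encoding of organic status, and the sample rows are all assumptions made for the example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical stand-in for the application's Product class."""
    sku: str
    name: str
    category: str
    organic: bool


def load_products(tsv_file):
    """Build one Product per non-empty row of a tab-delimited file object."""
    reader = csv.reader(tsv_file, delimiter="\t")
    products = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        sku, name, category, organic = row[:4]
        # Assumption: organic status is stored as a TRUE/FALSE style flag.
        is_organic = organic.strip().lower() in ("true", "yes", "1")
        products.append(Product(sku, name, category, is_organic))
    return products


# Made-up sample rows standing in for the real product file.
sample = io.StringIO(
    "10001\tExample Tomato\tTomatoes\tTRUE\n"
    "10002\tExample Bean\tBeans\tFALSE\n"
)
products = load_products(sample)
```

With a real file, the same helper could be called on the result of open(path, newline=""), which is how the csv module expects files to be opened.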
Each Product object creates a new object for each website to scrape. These objects are instances of classes that sub-class the BaseSite abstract class, such as the BotanicalInterests class. Each website class implements the specific functionality for scraping a single website for a single product.
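The relationship between the abstract class and a website class might look something like the sketch below. The real BaseSite interface is not shown in this overview, so the ABBREVIATION attribute, the find_price method, and the ExampleSite class are hypothetical names chosen for illustration, and the regular-expression parsing is a toy substitute for real scraping logic.

```python
import re
from abc import ABC, abstractmethod


class BaseSite(ABC):
    """Hypothetical sketch of the abstract class that site classes extend."""

    ABBREVIATION = None  # short company code, set by each sub-class

    def __init__(self, product_name):
        self.product_name = product_name

    @abstractmethod
    def find_price(self, page_html):
        """Return the price found in page_html as a string, or None."""


class ExampleSite(BaseSite):
    """Hypothetical site class; a real one would target one competitor."""

    ABBREVIATION = "EX"

    def find_price(self, page_html):
        # Toy parsing: take the first dollar amount that appears on the page.
        match = re.search(r"\$(\d+\.\d{2})", page_html)
        return match.group(1) if match else None


site = ExampleSite("Example Tomato")
price = site.find_price("<span class='price'>$3.25</span>")
```

Because find_price is marked abstract, Python refuses to instantiate BaseSite directly or any sub-class that forgets to override it, which catches incomplete site classes early.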
After every Product object has created all its website objects, the application iterates over the Product objects and writes the results to a tab-delimited output file.
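Writing the output could be done with the standard csv module, as in this minimal sketch. The write_report helper, the column names, and the sample row are assumptions; the actual output layout is not described in this overview.

```python
import csv
import io


def write_report(header, rows, out_file):
    """Write a header line and one tab-delimited line per product."""
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow(row)


# Hypothetical columns: product details followed by one price per company.
header = ["SKU", "Name", "EX Price"]
rows = [["10001", "Example Tomato", "3.25"]]

buffer = io.StringIO()
write_report(header, rows, buffer)
report_text = buffer.getvalue()
```

An in-memory buffer is used here only to keep the example self-contained; a file opened with open(path, "w", newline="") would work the same way.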
To add another website for the application to scrape, create a new website class that sub-classes the BaseSite class and overrides its abstract methods. Next, edit the settings module so that the COMPANY_HEADER_ORDER setting contains the abbreviation of the new website and the COMPANIES_TO_PROCESS setting contains the path to the website’s implementation class, for example sites.botanical_interests.BotanicalInterests.
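As a rough illustration, the two settings and the lookup of a class from its dotted path might look like the sketch below. That the settings are lists, the "BI" and "EX" abbreviations, the sites.example_site.ExampleSite path, and the load_class helper are all assumptions; only the setting names and the BotanicalInterests path come from the text above.

```python
import importlib

# Assumed shape of the two settings after adding a hypothetical new site.
COMPANY_HEADER_ORDER = ["BI", "EX"]
COMPANIES_TO_PROCESS = [
    "sites.botanical_interests.BotanicalInterests",
    "sites.example_site.ExampleSite",  # hypothetical new website class
]


def load_class(dotted_path):
    """Import the module part of dotted_path and return the named class."""
    module_path, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Demonstrated on a standard-library class so the sketch runs anywhere.
loaded = load_class("collections.OrderedDict")
```

Inside the application, the same helper could be applied to each entry of COMPANIES_TO_PROCESS, provided the referenced modules are importable.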