The goal of this project is to create a robust PHP/cURL scraper that collects deals and their details from a non-English "daily deal" website (a Groupon clone).
Only semi-structured data is available (i.e., there is no RSS feed or API), so the scraper must identify the individual data fields by matching HTML patterns with regular expressions.
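As a rough illustration of the pattern-matching approach, the sketch below fetches a page with cURL and pulls a title and price out of each deal block. The markup (`div class="deal"`, `h2`, `span class="price"`), the sample text, and the function names `fetchPage` and `extractDeals` are assumptions made for the example; the real site's HTML will differ and the pattern would need to be adapted to it.

```php
<?php
// Hypothetical helper: download one page and return its HTML, or null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 30,   // seconds
    ]);
    $html = curl_exec($ch);
    return $html === false ? null : $html;
}

// Extract title and price from every deal block.
// The class names below are assumed; adjust them to the target site's markup.
function extractDeals(string $html): array
{
    // "s" lets "." span newlines; "u" expects valid UTF-8 input, so a page
    // in another encoding should be converted to UTF-8 first.
    $pattern = '~<div class="deal"[^>]*>.*?'
             . '<h2[^>]*>(?P<title>.*?)</h2>.*?'
             . '<span class="price"[^>]*>(?P<price>.*?)</span>'
             . '~su';

    $deals = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $deals[] = [
                'title' => trim(html_entity_decode(strip_tags($m['title']), ENT_QUOTES, 'UTF-8')),
                'price' => trim(strip_tags($m['price'])),
            ];
        }
    }
    return $deals;
}

// Usage with made-up sample markup (no network access needed):
$sample = '<div class="deal"><h2>Spa day &amp; massage</h2><span class="price">49,90 kr</span></div>'
        . '<div class="deal"><h2>Pizza for two</h2><span class="price">129 kr</span></div>';
$deals = extractDeals($sample);
```

Named capture groups keep the field mapping readable when more fields are added, and the lazy `.*?` quantifiers stop each match at the first closing tag so that neighbouring deals are not merged.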
The scraper script will be set up as a cron job and will run 2-3 times a day. The scraped results should be stored in a MySQL database.
It is important that the database contains only currently valid deals marked as active, with no duplicates. Old deals should be kept but flagged as no longer valid. The PHP script must therefore include mechanisms that prevent duplicate rows and flag expired deals.
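One possible way to meet the no-duplicates and flag-old-deals requirements is sketched below. It assumes a `deals` table with a unique `deal_key` column, an `is_valid` flag, and a `last_seen` timestamp. The table layout, the column names, and the functions `dealKey` and `syncDeals` are illustrative assumptions, not part of the brief. Each run updates or inserts the deals it found, then flags every row that was not seen in that run.

```php
<?php
// Assumed MySQL table (hypothetical names):
//   CREATE TABLE deals (
//     id        INT AUTO_INCREMENT PRIMARY KEY,
//     deal_key  CHAR(40) NOT NULL UNIQUE,
//     title     VARCHAR(255),
//     price     DECIMAL(10,2),
//     is_valid  TINYINT(1) NOT NULL DEFAULT 1,
//     last_seen DATETIME NOT NULL
//   );

// Identity of a deal. Assumption: the deal URL is stable across scraper runs.
function dealKey(array $deal): string
{
    return sha1(strtolower(trim($deal['url'])));
}

// Insert new deals, refresh known ones, and flag deals missing from this run.
// $runTime is the start time of the current run, formatted 'Y-m-d H:i:s'.
function syncDeals(PDO $db, array $deals, string $runTime): void
{
    // An empty result most likely means the scrape failed; do not expire everything.
    if ($deals === []) {
        return;
    }

    $find   = $db->prepare('SELECT id FROM deals WHERE deal_key = ?');
    $insert = $db->prepare(
        'INSERT INTO deals (deal_key, title, price, is_valid, last_seen) VALUES (?, ?, ?, 1, ?)'
    );
    $update = $db->prepare(
        'UPDATE deals SET title = ?, price = ?, is_valid = 1, last_seen = ? WHERE id = ?'
    );
    $expire = $db->prepare('UPDATE deals SET is_valid = 0 WHERE last_seen < ?');

    $db->beginTransaction();
    foreach ($deals as $deal) {
        $key = dealKey($deal);
        $find->execute([$key]);
        $id = $find->fetchColumn();
        $find->closeCursor();

        if ($id === false) {
            $insert->execute([$key, $deal['title'], $deal['price'], $runTime]);
        } else {
            $update->execute([$deal['title'], $deal['price'], $runTime, $id]);
        }
    }
    // Rows not touched in this run are kept but marked as no longer valid.
    $expire->execute([$runTime]);
    $db->commit();
}
```

The SQL is deliberately portable. On MySQL the select-then-write step could instead be a single `INSERT ... ON DUPLICATE KEY UPDATE` statement, relying on the unique index on `deal_key`.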
Data fields will typically include the data listed below (where available; otherwise the field should be set to NULL). The script should validate and transform the data to ensure that only correct data is inserted into the database.
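The validation and transformation step might look roughly like the sketch below. The field names (`title`, `url`, `price`, `expires_at`), the price format (decimal comma, dot as thousands separator), and the date format (`d.m.Y H:i`) are assumptions chosen for illustration, as are the function names; the actual fields and formats depend on the target site. Any field that is missing or fails validation becomes `null`, which PDO stores as SQL NULL.

```php
<?php
// Convert a scraped price such as "1.299,50 kr" to a float, or null if unusable.
// Assumption: decimal comma with dot as thousands separator. A value like
// "1.299" with no comma is ambiguous and is read here as 1.299.
function parsePrice(?string $raw): ?float
{
    if ($raw === null) {
        return null;
    }
    $s = preg_replace('/[^\d.,]/u', '', $raw); // keep digits and separators only
    if ($s === null || $s === '') {
        return null;
    }
    if (strpos($s, ',') !== false) {
        $s = str_replace('.', '', $s);  // drop thousands separators
        $s = str_replace(',', '.', $s); // decimal comma to decimal point
    }
    return is_numeric($s) ? round((float) $s, 2) : null;
}

// Convert a scraped date to MySQL DATETIME format, or null if it does not parse.
// The leading "!" resets unparsed fields (e.g. seconds) to zero.
function parseDate(?string $raw, string $format = '!d.m.Y H:i'): ?string
{
    if ($raw === null) {
        return null;
    }
    $dt = DateTime::createFromFormat($format, trim($raw));
    $errors = DateTime::getLastErrors(); // array, or false when there were no problems
    if ($dt === false || ($errors && ($errors['warning_count'] || $errors['error_count']))) {
        return null; // also rejects impossible dates such as 31.02.2024
    }
    return $dt->format('Y-m-d H:i:s');
}

// Build a clean row from raw scraped strings; missing or invalid fields become null.
function normalizeDeal(array $raw): array
{
    $title = isset($raw['title'])
        ? trim(html_entity_decode(strip_tags($raw['title']), ENT_QUOTES, 'UTF-8'))
        : '';
    $url = $raw['url'] ?? null;

    return [
        'title'      => $title !== '' ? $title : null,
        'url'        => ($url !== null && filter_var($url, FILTER_VALIDATE_URL)) ? $url : null,
        'price'      => parsePrice($raw['price'] ?? null),
        'expires_at' => parseDate($raw['expires_at'] ?? null),
    ];
}
```

Returning `null` rather than an empty string or zero keeps "unknown" distinguishable from a real value such as a price of 0.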