The task requires writing source code for a program that can extract information from the very large crawled snapshot of the Web hosted on Amazon S3.
If you have not worked with S3 buckets before, this will be a good opportunity to do so.
The program would take a domain as input, run on an EC2 instance, and produce a tab-separated text file with three columns: SourceURL\tAnchorText\tTargetURL, where TargetURL points to the input domain.
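To make the expected output concrete, here is a minimal sketch in Python of the per-page step: given one page's URL and HTML, it emits SourceURL\tAnchorText\tTargetURL rows for links that point to the input domain. It assumes the page HTML has already been read out of the crawl data (reading from S3 is not shown), and it assumes subdomains such as www.amazon.com count as matching amazon.com. The names AnchorCollector, matches_domain, and link_rows are illustrative only, not part of any required interface.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so tabs and newlines inside the anchor
            # text cannot break the tab-separated output.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def matches_domain(url, domain):
    """True if the URL's host is the domain or one of its subdomains (assumption)."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def link_rows(source_url, html, domain):
    """Yield 'SourceURL\\tAnchorText\\tTargetURL' lines for links into the domain."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    for href, text in parser.links:
        try:
            # Resolve relative links against the page they were found on.
            target = urljoin(source_url, href.strip())
            keep = matches_domain(target, domain)
        except ValueError:
            continue  # skip malformed URLs
        if keep:
            yield "\t".join((source_url, text, target))
```

For example, calling link_rows("http://example.org/page", html, "amazon.com") on a page that links to https://www.amazon.com/x would yield one row for that link, while skipping links to other hosts such as notamazon.com. Whether links from the domain to itself should be kept is left open here.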
The domains I would like to test are amazon.com, overstock.com, and dealerdirectparts.com, i.e. please provide three files, one per domain. These will likely be large files, so ideally you would deliver them directly via S3.
The source code should ideally be in Java, Python, or PHP.