I'm looking for a skilled coder to scrape all available English titles from Project Gutenberg. The contents of Gutenberg.org are available via a very large RDF file. The coder should be able to parse this 200+ MB file and extract the needed data. If some of the data is not available in this file, the coder should be able to whip up a scraper to grab it, using the reference below.
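A file of 200+ MB is best parsed as a stream rather than loaded whole. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse` to yield (book no., title) pairs for English records. The record layout (a `pgterms:etext` element with an `rdf:ID` like `etext1342`, a `dc:title` child, and a `dc:language` subtree containing `en`) and the namespace URIs are assumptions about catalog.rdf; check them against the real file before relying on this.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for the catalog.rdf format; check the real file's
# root element and adjust if they differ.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
PG_NS = "http://www.gutenberg.org/rdfterms/"


def iter_english_titles(source):
    """Stream (book_no, title) pairs for English records in a catalog file.

    `source` is a filename or binary file object. Assumes each book is a
    <pgterms:etext rdf:ID="etextNNN"> element directly under <rdf:RDF>, with a
    <dc:title> child and a <dc:language> subtree whose text is "en".
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of <rdf:RDF>
    for event, elem in context:
        if event != "end" or elem.tag != f"{{{PG_NS}}}etext":
            continue
        record_id = elem.get(f"{{{RDF_NS}}}ID", "")
        title = elem.findtext(f"{{{DC_NS}}}title")
        lang = elem.find(f"{{{DC_NS}}}language")
        codes = {t.strip() for t in lang.itertext()} if lang is not None else set()
        number = record_id[len("etext"):]
        if "en" in codes and title and record_id.startswith("etext") and number.isdigit():
            # Collapse newlines/extra spaces inside multi-line titles.
            yield int(number), " ".join(title.split())
        # Drop all records parsed so far so memory use stays small
        # even for a very large file.
        root.clear()
```

Clearing the root after each finished record is what keeps memory flat; without it, `iterparse` still builds the whole tree in memory.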
After looking at the RDF files, it appears we can use the catalog.rdf file to identify the English books, then fetch each book's individual RDF file for the remaining data. These are available here: [obscured] /ebooks/12345.rdf (replace 12345 with the ebook no. you are interested in).
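Fetching and reading one per-book RDF file might look like the sketch below. The host is obscured in this posting, so `base_url` is a caller-supplied placeholder, and the helper names are hypothetical. The element layout (`pgterms:ebook` with `rdf:about="ebooks/NNN"`, `dcterms:title`, and `dcterms:creator/pgterms:agent/pgterms:name`) and the namespace URIs are assumptions to verify against a real file.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespaces assumed for the per-book RDF files; verify against a real file.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"
PGTERMS_NS = "http://www.gutenberg.org/2009/pgterms/"


def book_rdf_url(base_url, book_no):
    """Build the per-book RDF URL. `base_url` is a placeholder: the real host
    is obscured in the posting, so the caller must supply it."""
    return f"{base_url.rstrip('/')}/ebooks/{int(book_no)}.rdf"


def parse_book_rdf(xml_bytes):
    """Extract book no., title, and author full name from one book's RDF."""
    root = ET.fromstring(xml_bytes)
    ebook = root.find(f"{{{PGTERMS_NS}}}ebook")
    if ebook is None:
        return None
    about = ebook.get(f"{{{RDF_NS}}}about", "")  # assumed form: "ebooks/NNN"
    tail = about.rsplit("/", 1)[-1]
    title = ebook.findtext(f"{{{DCTERMS_NS}}}title") or ""
    author = ebook.findtext(
        f"{{{DCTERMS_NS}}}creator/{{{PGTERMS_NS}}}agent/{{{PGTERMS_NS}}}name"
    ) or ""
    return {
        "book_no": int(tail) if tail.isdigit() else None,
        "title": " ".join(title.split()),
        "author": author.strip(),
    }


def fetch_book(base_url, book_no, timeout=30):
    """Download and parse one book's RDF file (network access required)."""
    with urllib.request.urlopen(book_rdf_url(base_url, book_no), timeout=timeout) as resp:
        return parse_book_rdf(resp.read())
```

Keeping the parser separate from the download makes it testable offline and lets a scraper add throttling or retries around `fetch_book` without touching the parsing logic.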
The required data will be placed into a CSV file with the following columns:
- Book no. (the numeric value Gutenberg assigns to each book)
- Book title
- Author full name
- Author first name (some parsing required from full name)
- Author last name (some parsing required from full name)
- ePub URL (location ...
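The first/last name parsing and the CSV output could be sketched as below. It assumes author strings follow a "Last, First" order, optionally followed by a life-date segment such as "1775-1817"; that convention is an assumption, and real data will need extra cases (pseudonyms, titles, multiple authors). `split_author` and `write_catalog_csv` are hypothetical names, and only the columns visible in this posting are written.

```python
import csv


def split_author(full_name):
    """Split an author string into (first, last).

    Assumes the common "Last, First" order, possibly followed by a life-date
    segment such as "1775-1817"; falls back to "First ... Last" when there
    is no comma.
    """
    parts = [p.strip() for p in full_name.split(",") if p.strip()]
    # Drop trailing segments that contain digits (assumed to be life dates).
    while len(parts) > 1 and any(ch.isdigit() for ch in parts[-1]):
        parts.pop()
    if not parts:
        return "", ""
    if len(parts) >= 2:
        return parts[1], parts[0]
    words = parts[0].split()
    if len(words) == 1:
        return "", words[0]
    return " ".join(words[:-1]), words[-1]


# Only the columns visible in the posting; the list there is cut off.
HEADER = ["Book no.", "Book title", "Author full name",
          "Author first name", "Author last name", "ePub URL"]


def write_catalog_csv(rows, out):
    """Write (book_no, title, full_name, epub_url) tuples to a CSV file object."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    for book_no, title, full_name, epub_url in rows:
        first, last = split_author(full_name)
        writer.writerow([book_no, title, full_name, first, last, epub_url])
```

When writing to disk, open the file with `open(path, "w", newline="", encoding="utf-8")` so the `csv` module controls line endings and non-ASCII titles survive. Full names containing commas are quoted automatically.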