We're looking for a person to help run scripts that extract text from HTML files
and then convert the text to a format that suits our needs. Large volumes of text
need to be prepared (billions of words). The approach is to run the input data through
a series of conversion steps with defined points for human assistance. We do not want the input text loaded into a word processor and edited by hand; that doesn't scale and is not repeatable. We're looking for someone who is strong in Python, Perl, and C scripting and has extensive Linux experience. Strong English skills are a plus.
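As a rough illustration of the first conversion step (pulling text out of an HTML file), here is a minimal sketch using only the Python standard library. The names `TextExtractor` and `extract_text` are hypothetical, and the choice to skip `<script>` and `<style>` contents is an assumption about what counts as text for this job, not something the posting specifies.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping the bodies of <script> and <style>."""

    SKIP = {"script", "style"}  # assumption: these never hold wanted text

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def extract_text(html):
    """Return the text of an HTML string as a single whitespace-normalized line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Because each step is a plain function from string to string, it can be chained with later steps and rerun on the full data set, which fits the repeatability requirement. One caveat: joining chunks with a space will split a word that is broken by an inline tag (for example `wor<b>ld</b>`), so a production version may need tag-aware joining.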
Preparing text in a format that is "suitable" involves three aspects.
1. Normalization of the text. This includes converting Unicode
characters to plain ASCII, converting all text to a single case
(lowercase or uppercase), and removing the majority of punctuation.
It is also important to transform dates, numbers, times, and so
forth to mimic how people would tend t...
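
The parts of the normalization step that are fully described above (ASCII conversion, a single case, punctuation removal) might be sketched as follows. This is only an illustration under stated assumptions: the function name `normalize` is hypothetical, lowercase is picked arbitrarily from the two allowed cases, and keeping apostrophes is a guess at what "the majority of punctuation" leaves behind. The date, number, and time rewriting is not attempted here because the posting's description of it is cut off.

```python
import re
import unicodedata


def normalize(text):
    """Convert text to lowercase ASCII with most punctuation removed."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII bytes removes the combining marks; characters with
    # no ASCII decomposition (e.g. curly quotes) are dropped entirely.
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    lowered = ascii_text.lower()
    # Assumption: keep letters, digits, apostrophes, and whitespace only.
    stripped = re.sub(r"[^a-z0-9'\s]", " ", lowered)
    # Collapse runs of whitespace left behind by removed punctuation.
    return " ".join(stripped.split())
```

For example, `normalize("Café, déjà vu! It's 5pm.")` returns `"cafe deja vu it's 5pm"`.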