Hourly Rate: Not Sure | Duration: Not Sure | Posted: Jul 22, 2015 | Ends: 1d, 9h
Need finished in 3 days!

Objective: Process PDFs to create counts of key words/key phrases in a database table.

High-level process (basic system):
a. Input: Medical journals in PDF
b. Step 1: Use Apache Tika to convert the PDFs to text files
c. Step 2: MapReduce to count words/phrases, output to tab-separated files
d. Step 3: Load the tab-separated files into database tables

I want to be clear on the requirements:
1. Must be able to query the finished tables (the word count table in item 3 and the word combination count table in item 4)
2. Tables to be organized similar to the attached sheet
3. Table should include a count of all words that appear
4. Table should include 5 two-word combinations: 'positive results', 'negative results', 'positive response', 'negative response', 'in remission'

You will need to download the PDF files from an SFTP site and then re-upload the .dat files to my SFTP.
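To make Step 2 concrete, here is a minimal single-process sketch of the map/reduce counting pattern in Python. It is only an illustration under assumptions: the text is taken to be already extracted from the PDFs (Step 1), the function names `map_document`, `reduce_counts`, and `to_tsv` are hypothetical, the tokenizer is a simple guess at what counts as a "word", and a real job would run on a MapReduce framework rather than in one process. Phrases are matched as adjacent word pairs, so a pair that straddles a sentence boundary would also be counted.

```python
import csv
import io
import re
from collections import Counter

# The five two-word combinations named in the requirements.
PHRASES = (
    "positive results",
    "negative results",
    "positive response",
    "negative response",
    "in remission",
)

# Assumption: a "word" is a run of letters/digits, optionally with an apostrophe suffix.
WORD_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return WORD_RE.findall(text.lower())


def map_document(text):
    """Map step: per-document word counts and target-phrase counts."""
    tokens = tokenize(text)
    words = Counter(tokens)
    bigrams = Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    # Keep every target phrase, including those with a count of zero.
    phrases = Counter({p: bigrams[p] for p in PHRASES})
    return words, phrases


def reduce_counts(partials):
    """Reduce step: sum the per-document counters into overall totals."""
    words, phrases = Counter(), Counter()
    for doc_words, doc_phrases in partials:
        words.update(doc_words)
        phrases.update(doc_phrases)
    return words, phrases


def to_tsv(counts):
    """Render counts as tab-separated lines, highest count first."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for term, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([term, n])
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up sample text standing in for Tika output.
    docs = [
        "Positive results were seen. The patient is in remission.",
        "Negative response; positive results again.",
    ]
    words, phrases = reduce_counts(map_document(d) for d in docs)
    print(to_tsv(phrases))
```

The two tab-separated outputs (one for words, one for phrases) correspond to the two tables in the requirements; loading them into a database (Step 3) is left out of this sketch.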
Category: Data Engineering
Fixed Price: $1,000 - $5,000 | Posted: Jul 01, 2015 | Ends: 55d, 16h
Note: The project is based in the USA, but it is 100% remote; work from anywhere.

We are developing a customer intelligence platform that helps software-as-a-service, e-commerce, mobile, and social businesses make smarter decisions using person-based data. Our mission is to help businesses delight their customers. We're currently tracking billions of activities per month. Everyone on our engineering team helps solve interesting problems with scaling, data science, visualization, API architecture, and delivering insights.

WHAT WE'RE LOOKING FOR

We track a LOT of data and are constantly optimizing how data gets in and out of our systems. We are looking for candidates who love tackling the types of problems that come with this. We use a mix of many different technologies, and while our environment changes, here are some of the technologies we are currently using: Git, key-value stores, distributed systems, Python, C++, C, Ruby, Map/Reduce, PostgreSQL.

Help us build solutions that make you and our c...
Category: Other IT & Programming