Fixed Price: $150 or less
Posted: 10h, 22m ago | Ends: 1d, 13h
I am working on a research study that analyzes the clickstream data of a subscription-based website. I need to identify the differences in behavior between people who pay for a subscription and people who do not.

We need to process around 820 individual tab-separated values (TSV) files that together total around 3 TB. There are hundreds of millions of rows and 550+ fields. All files share the same format, and each file represents one full day of data, so in total there are 820 consecutive days of data.

We have the most powerful compute-optimized virtual machine instance that Amazon Web Services offers (c3.8xlarge), with 32 virtual CPUs, 10 TB of SSD storage, and 60 GB of memory, running Ubuntu Server. You will be granted access to AWS in order to perform the work.

We need to do four main things:
- Develop a script to import the data from the files into a master table.
- Derive additional fields using existing fields contained in the master table and add those new fields...
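For illustration only, the import step might look like the minimal Python sketch below. It assumes every daily file begins with the same header row and uses SQLite purely to stay self-contained; at around 3 TB a different storage engine may well be more suitable. The function name `import_tsv_files`, the table name `master`, and the `source_file` column are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path
from typing import Iterable

BATCH_SIZE = 10_000  # rows per executemany() call; an arbitrary illustrative value


def import_tsv_files(db_path: str, tsv_files: Iterable[Path], table: str = "master") -> int:
    """Append every row of each daily TSV file to one master table.

    Hypothetical helper. Assumes each file starts with an identical header row.
    Returns the total number of data rows inserted.
    """
    conn = sqlite3.connect(db_path)
    total = 0
    columns = None
    insert_sql = ""
    try:
        # Process files in name order (e.g. one file per consecutive day).
        for path in sorted(tsv_files):
            with open(path, newline="", encoding="utf-8") as fh:
                reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
                header = next(reader)
                if columns is None:
                    # First file defines the schema; every field is stored as TEXT.
                    columns = header
                    cols_sql = ", ".join(f'"{c}" TEXT' for c in columns)
                    conn.execute(
                        f'CREATE TABLE IF NOT EXISTS "{table}" (source_file TEXT, {cols_sql})'
                    )
                    placeholders = ", ".join("?" * (len(columns) + 1))
                    insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'
                elif header != columns:
                    raise ValueError(f"{path.name}: header differs from the first file")
                # Insert in batches so a large file is never held in memory at once.
                batch = []
                for row in reader:
                    batch.append([path.name, *row])
                    if len(batch) >= BATCH_SIZE:
                        conn.executemany(insert_sql, batch)
                        total += len(batch)
                        batch.clear()
                if batch:
                    conn.executemany(insert_sql, batch)
                    total += len(batch)
            conn.commit()  # one commit per daily file
    finally:
        conn.close()
    return total
```

A call such as `import_tsv_files("clickstream.db", Path("/data").glob("*.tsv"))` (hypothetical paths) would load every file in a directory. Recording the originating file name per row keeps each row traceable to its day of data.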