You are to extract a list of all the unique URLs available from this common big data set:
A corpus of web crawl data composed of over 5 billion web pages.
Deliverables:
1. Flat CSV files with one URL per line.
2. Code, scripts, and docs associated with producing this result on Linux.
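
For illustration only, here is a minimal Java sketch of one way the deduplication step could work. It is not a prescribed method. It assumes the URLs have already been pulled out of the crawl records into a plain text file with one URL per line, encoded as valid UTF-8. The class name UniqueUrls, the bucket file names, and the default of 256 buckets are all hypothetical. The idea is a two-pass hash partition: equal URLs always land in the same bucket file, so each bucket can be deduplicated in memory on its own.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper class; names and defaults are assumptions for illustration.
final class UniqueUrls {

    // Pass 1: scatter URLs into bucket files so that equal URLs share a bucket.
    static List<Path> partition(Path input, Path workDir, int buckets) throws IOException {
        List<Path> paths = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try {
            for (int i = 0; i < buckets; i++) {
                Path p = workDir.resolve("bucket-" + i + ".txt");
                paths.add(p);
                writers.add(Files.newBufferedWriter(p, StandardCharsets.UTF_8));
            }
            try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    String url = line.trim();
                    if (url.isEmpty()) {
                        continue; // skip blank lines
                    }
                    // floorMod keeps the index non-negative for negative hash codes.
                    int index = Math.floorMod(url.hashCode(), buckets);
                    BufferedWriter w = writers.get(index);
                    w.write(url);
                    w.newLine();
                }
            }
        } finally {
            for (BufferedWriter w : writers) {
                w.close();
            }
        }
        return paths;
    }

    // Pass 2: deduplicate each bucket in memory and append it to the output file.
    // Returns the number of unique URLs written.
    static long writeUnique(List<Path> bucketFiles, Path output) throws IOException {
        long count = 0;
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path bucket : bucketFiles) {
                TreeSet<String> unique =
                        new TreeSet<>(Files.readAllLines(bucket, StandardCharsets.UTF_8));
                for (String url : unique) {
                    out.write(url);
                    out.newLine();
                    count++;
                }
                Files.delete(bucket); // free disk space as we go
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        int buckets = args.length > 2 ? Integer.parseInt(args[2]) : 256;
        Path workDir = Files.createTempDirectory("url-buckets");
        long written = writeUnique(partition(input, workDir, buckets), output);
        System.out.println(written + " unique URLs written to " + output);
    }
}
```

On Linux with Java 11 or later this could be run as `java UniqueUrls.java urls.txt unique.csv 256`, where both file names are placeholders. The bucket count should be large enough that each bucket fits in memory and small enough to stay under the open-file limit. For a corpus of this size a single machine is unlikely to be enough, so the same partition-then-deduplicate idea would more likely be run as a distributed job.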
Only bid if you are co...
Skills: big data, Java, research