Programmers: SAX-XML parsing of DMOZ ODP RDF
I am seeking a programmer (or team) to build an effective solution that will parse the DMOZ Open Directory data dump and save it to a SQL file. According to the project's site, the dump is loosely RDF-based. I am open to any working solution, but PHP or Java-based solutions will receive first consideration. The intended platform is (simply) my localhost, a Windows-based XAMPP install (Apache, MySQL, PHP).
From the research I have done, SAX appears to be the preferred method given the size of the dump (1GB+), but that is my only reason for requesting SAX; if you know a better way, be my guest. The goal is a quick, efficient script that parses the entire dump and then writes it to a SQL command file.
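For bidders unfamiliar with the approach, here is a minimal sketch of what SAX-style streaming could look like in Java. It is an illustration, not a required design. The element and attribute names (`ExternalPage`, `about`, `d:Title`, `d:Description`, `topic`) are assumptions about the dump layout and should be checked against the actual file; the class name `OdpSaxSketch` and the four-column row shape are made up for the example.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class OdpSaxSketch {

    /** Streams the dump and hands each page to the sink as {url, title, description, topic}. */
    static class PageHandler extends DefaultHandler {
        private final Consumer<String[]> sink;
        private final StringBuilder text = new StringBuilder();
        private String[] current; // non-null only while inside an ExternalPage element

        PageHandler(Consumer<String[]> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for the new element
            if ("ExternalPage".equals(qName)) { // assumed element name
                current = new String[] { atts.getValue("about"), "", "", "" };
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several chunks, so append rather than assign
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return; // outside any ExternalPage; ignored in this sketch
            }
            switch (qName) {
                case "d:Title":       current[1] = text.toString().trim(); break;
                case "d:Description": current[2] = text.toString().trim(); break;
                case "topic":         current[3] = text.toString().trim(); break;
                case "ExternalPage":
                    sink.accept(current); // emit the row immediately; nothing accumulates in memory
                    current = null;
                    break;
                default:
                    break;
            }
        }
    }

    /** Parses the stream with a non-namespace-aware parser, so qName keeps the "d:" prefix. */
    static void parse(InputStream in, Consumer<String[]> sink) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new PageHandler(sink));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java OdpSaxSketch <dump-file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            // Echo progress to the console, as the posting asks for
            parse(in, row -> System.out.println("page: " + row[0]));
        }
    }
}
```

Because each row is passed to the callback as soon as its closing tag is seen, memory use stays flat regardless of the dump size. A full solution would also need to handle the other record types in the dump, which this sketch skips.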
Here's what is needed:
* The database info and credentials must be configurable through an independent settings file, as this script may be used with multiple DBs in the future and I do not want to hunt line by line to change credentials
* The filename to be parsed should be a variable; it may be stored in the same external "settings" file as the database info
* It must be capable of importing all information contained within the ODP dump; I will filter extraneous data on my own
* Data should be exported to a file which can be imported into SQL; I will be using SQLyog to import it into the DB
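To make the settings-file and SQL-output requirements concrete, here is a rough Java sketch under stated assumptions. The property keys (`dump.file`, `db.table`), the table and column names, and the class name `OdpSqlWriterSketch` are all hypothetical; the quoting assumes MySQL in its default mode, where backslash is an escape character.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

class OdpSqlWriterSketch {

    /** Loads key=value settings; the key names used here are hypothetical. */
    static Properties loadSettings(Reader reader) throws IOException {
        Properties settings = new Properties();
        settings.load(reader);
        return settings;
    }

    /** Quotes a value as a SQL string literal, assuming MySQL's default backslash escaping. */
    static String quote(String value) {
        if (value == null) {
            return "NULL";
        }
        return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'";
    }

    /** Builds one INSERT line; the column names are assumptions for illustration. */
    static String insertStatement(String table, String url, String title,
                                  String description, String topic) {
        return "INSERT INTO `" + table + "` (`url`, `title`, `description`, `topic`) VALUES ("
                + quote(url) + ", " + quote(title) + ", "
                + quote(description) + ", " + quote(topic) + ");";
    }

    public static void main(String[] args) throws IOException {
        // In real use the Reader would wrap the external settings file
        Properties settings = loadSettings(
                new StringReader("dump.file=content.rdf.u8\ndb.table=odp_pages\n"));
        String table = settings.getProperty("db.table");
        System.out.println("Parsing " + settings.getProperty("dump.file"));
        System.out.println(insertStatement(table, "http://example.com/",
                "Bob's Page", "An example entry", "Top/Test"));
    }
}
```

One detail worth knowing if this route is taken: in a Java properties file the backslash is itself an escape character, so a Windows path has to be written with forward slashes or doubled backslashes.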
Here's what I do NOT need : )
* Convoluted solutions - I want simple and efficient processing, please
* HTML output - your output goal is limited to the SQL file which may be imported
* An interface - I am comfortable at a code level; I just don't have the time or knowledge to write this specific type of parser, so I do not need an interface beyond the program ECHOing what it is doing at the moment
* Options - once again, simple and efficient is the goal, and I will do my own data filtering, so there should not be any options to configure beyond the filename to parse
As part of the terms of this project, the completed source needs to be released without restriction so that future modifications may be made as needed. Thanks for bidding! : )