
Snippet: Groan… Lots of Data :(

*snippet* I’m currently rebuilding large sections of my website(s) and need to import a substantial amount of data into the new content management system. By “substantial amount” I mean around 2GB of data(!). However, the data seems to be very slightly corrupted (around a quarter of a record in every 1 million entries), so I have to run another script to correct the corruption and then rerun the parser utility. And I’ll tell you this: even on a 2.4GHz machine, parsing 2GB of data takes a looong time. Especially when it fails and you’ve got to restart from scratch.
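For what it’s worth, here is a minimal sketch of one way to avoid the restart-from-scratch problem: set the bad records aside and carry on, rather than aborting the whole run. This is not the parser described above. The one-record-per-line layout, the tab delimiter, the field count and the name `parse_records` are all assumptions made up for illustration.

```python
def parse_records(lines, rejects, expected_fields=3, delimiter="\t"):
    """Yield well-formed records; collect malformed ones in `rejects`.

    Assumes (hypothetically) one record per line with a fixed number of
    delimiter-separated fields. A bad line is recorded with its line
    number and skipped, so one corrupt record does not abort the run.
    """
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) != expected_fields:
            rejects.append((line_no, line))
            continue
        yield fields


# Usage: works on any iterable of lines, including an open file object.
sample = ["a\tb\tc\n", "broken\n", "d\te\tf\n"]
bad = []
good = list(parse_records(sample, bad))
print(len(good), "good records,", len(bad), "rejected")
```

Because it is a generator, it never holds the whole input in memory, and the rejected lines can be fixed up and fed through again afterwards on their own.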

Of course, once it’s parsed, I’ve then got to import it all into a MySQL database (for speed, I’m having the parser write out SQL statements rather than insert directly into the database), then index it (which will take ages: believe me, once you pass the half-a-million-row mark, MySQL begins to crawl), then link all the data together and finally export it into a suitable format. There’s no way I’m going to bog down my server by having it make around 50 database requests per page!
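As a rough illustration of the “write SQL statements instead of importing directly” idea, here is a small sketch that batches records into multi-row `INSERT` statements and writes them to a file-like object. The table name, column names, batch size and both helper functions are hypothetical, and the quoting is deliberately minimal, so treat it as a sketch rather than something to point at untrusted data.

```python
import io


def sql_quote(value):
    """Render a value as a MySQL string literal (minimal escaping only)."""
    if value is None:
        return "NULL"
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def write_inserts(records, out, table, columns, batch_size=1000):
    """Write multi-row INSERT statements to `out`; return how many were written."""
    col_list = ", ".join(f"`{c}`" for c in columns)
    header = f"INSERT INTO `{table}` ({col_list}) VALUES\n"
    batch = []
    statements = 0
    for record in records:
        row = ", ".join(sql_quote(record[c]) for c in columns)
        batch.append(f"({row})")
        if len(batch) >= batch_size:
            out.write(header + ",\n".join(batch) + ";\n")
            statements += 1
            batch = []
    if batch:  # flush the final, partially filled batch
        out.write(header + ",\n".join(batch) + ";\n")
        statements += 1
    return statements


# Usage with made-up rows; in practice `out` would be an open .sql file.
rows = [{"id": 1, "title": "O'Reilly"}, {"id": 2, "title": None}]
buf = io.StringIO()
write_inserts(rows, buf, "entries", ["id", "title"], batch_size=1000)
print(buf.getvalue())
```

The resulting file can then be fed to the database in one go, and grouping many rows into each statement generally means far fewer round trips than inserting one row at a time.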

Fingers crossed that I’ll have it all parsed by Sunday…

(That’s why there haven’t been many blog entries lately: my machine slows to a crawl whilst parsing, and that’s with a plain command-line parser that has no GUI to slow it down.)
