Snippet: Groan… Lots of Data :(

September 5th, 2003 by Richy B.

*snippet* I’m currently in the process of rebuilding large sections of my website(s) and need to import a substantial amount of data into the new content management system. By “substantial amount” I mean around 2GB of data(!). However, the data seems to be very, very slightly corrupted (around a quarter of a record in every 1 million entries, i.e. roughly one bad record per 4 million), so I have to run another script to correct the corruption and then rerun the parser utility. And I’ll tell you this: even on a 2.4GHz machine, parsing 2GB of data takes a looong time. Especially when it fails and you’ve got to restart from scratch.

Of course, once it’s parsed, I’ve then got to import it all into a MySQL database (I’m having the parser write out SQL statements instead of importing directly, for speed reasons). Then I’ve got to index it (which will take ages: believe me, once you hit the half-a-million-row mark, MySQL begins to crawl), link all the data together, and export it into a suitable format. No way am I going to bog down my server by having it make around 50 database requests per page!
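For illustration, here is a minimal sketch of the “write the SQL statements instead of importing directly” approach. It assumes Python and a hypothetical `entries` table with `id`, `title` and `body` columns; none of these names come from the original system. Rows are grouped into multi-row INSERT statements, which MySQL generally loads faster than one INSERT per row, and the resulting file could then be fed to the client with something like `mysql dbname < import.sql`. The escaping here only handles backslashes and single quotes, so treat it as a sketch rather than a production-safe encoder.

```python
import io
from typing import Iterable, Sequence, TextIO


def sql_literal(value) -> str:
    """Render a Python value as a MySQL literal (minimal escaping; sketch only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    # Escape backslashes first, then single quotes, MySQL-style.
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def write_inserts(rows: Iterable[Sequence], out: TextIO,
                  table: str = "entries",
                  columns: Sequence[str] = ("id", "title", "body"),
                  batch_size: int = 1000) -> int:
    """Write rows as batched multi-row INSERTs; return the statement count.

    `entries` and its columns are hypothetical placeholders.
    """
    col_list = ", ".join(f"`{c}`" for c in columns)
    batch = []
    statements = 0

    def flush() -> None:
        nonlocal statements
        if not batch:
            return
        values = ",\n".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in batch
        )
        out.write(f"INSERT INTO `{table}` ({col_list}) VALUES\n{values};\n")
        statements += 1
        batch.clear()

    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            flush()
    flush()  # write any leftover partial batch
    return statements


if __name__ == "__main__":
    buf = io.StringIO()
    count = write_inserts([(1, "Hello", "It's here"), (2, "Second", None)], buf)
    print(count)
    print(buf.getvalue())
```

Writing to a file rather than talking to the database row by row also means a failed parse leaves behind a partial `.sql` file you can inspect, instead of a half-populated table.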

Fingers crossed that I’ll have it all parsed by Sunday…

(That’s why there haven’t been many blog entries lately: my machine begins to crawl whilst parsing, and it’s just a command-line parsing system anyway, with no GUI to slow it down.)

This post is over 6 months old.

This means that, despite my best intentions, it may no longer be accurate.

This blog holds over 12 years of archived content. During that time, I may have changed my opinion of something, technology will have advanced (and old "best standards" may no longer apply), my technology "know-how" has improved, etc. Updating all the archival entries would probably take me a considerable amount of time, and would defeat the point of keeping them anyway.

Please take these posts for what they are: a brief look into my past, my history, my journey and "caveat emptor".
