
Month: October 2003

Personal: Oh crap….

Right, any posts I make within the next couple of hours may not make any sense as I’m currently slightly drunk (say practically a whole bottle of Port without much food). Why have I got drunk on a “school night”? Well, it wasn’t intentional – I just had a bit of “middling news” over the weekend which has only just really hit home.

Basically – and without saying too much – certain medical conditions can be hereditary and others are “not yet known if they could be hereditary or not”. I’ve been aware for a couple of years now that I’ve been running a “slightly higher than average” risk of suffering from a certain condition due to heredity, but I was informed on Saturday that something I thought was confined to one side of my family tree isn’t. A quick test conducted on Saturday afternoon indicated that, yes, I do appear to be already suffering from that condition – even though I’m only 24 (and, yes, I’m being intentionally vague as it’s slightly private but I just want to get it off my chest: the whole reason for this blog).

And if that isn’t enough to contend with, I stand a “higher than average risk” of suffering from a number of other conditions – and to compound it all, there was quite a bit of news coverage on Sunday evening/night about two other males suffering from the “primary condition”. B-gger.

I hope this is just ramblings of a drunkard (my subconscious does actually control my “ability to get drunk” – if I’m happy and with friends, then I cannot get drunk no matter how hard I try: my body just won’t let me drink any more :- whereas if something is playing on my mind, I just gradually get drunk until the matter arises in my main consciousness) and that I’m just suffering from a bout of acute hypochondria – but… *gulp*.

Blarg! Sometimes I hate being me!

Spam: 15k Emails…

Quick notification: if you’ve got an email address for me ending in .demon.co.uk (such as example.demon.co.uk), please change it to just .com (so an email address of joebloggs@example.demon.co.uk will change to joebloggs@example.com). I’m having to migrate all my email accounts to my main server from my 8+ year old Demon ISP mailbox as the spam levels have just got too high. Yesterday I deleted over 15,000 emails from my Demon ISP account, leaving fewer than 5,000 to delete today: by the time I woke up the levels were back up to 14,700 and growing 🙁
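For anyone updating an address book in bulk, the change described above can be sketched in a few lines of Perl. The helper name migrate_address is made up for illustration, and it assumes every old address follows the same "name.demon.co.uk" pattern as the example:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "user@name.demon.co.uk" into "user@name.com".
# Addresses that do not end in ".demon.co.uk" are returned unchanged.
sub migrate_address {
    my ($address) = @_;
    (my $migrated = $address) =~ s/\.demon\.co\.uk\z/.com/i;
    return $migrated;
}

print migrate_address('joebloggs@example.demon.co.uk'), "\n";   # joebloggs@example.com
```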

This wouldn’t be too bad apart from the fact Demon’s mail servers are suffering from the load – as Gradwell’s ISP Mail System Performance chart shows, emails sent to demon.co.uk email addresses are taking over 6 hours on average to arrive: and that’s just not good enough for me (plus the lack of “server side filtering” makes things even worse).

I’m still planning on keeping Demon for my ADSL connectivity at the moment (I don’t currently have time to hunt for ADSL providers that can offer over 3Gb of traffic per day), but I no longer trust them with my email.

However – the Gradwell chart shows that one of our (i.e. my employer’s) main rivals in one portion of their business – UK2.net – is just as bad as Demon: delays of over 5 hours and 13 missing emails.

Snippet: Ok, now I’m freaked out….

Last night, just before I went to bed, I watched an episode of South Park entitled “The Super Best Friends” where Jesus and Pals (Buddha, Lord Krishna, Seaman and a few others) fight the David Blainetologists – and I thought, “Since David Blaine is currently trying to attract publicity in London, it’ll be quite funny if Channel 4 were to show this episode again.”

So, just before I go to bed tonight, I’m just channel surfing and guess which episode of South Park is showing on Channel 4….

Yep – spooky! Especially since earlier they were also showing David Blaine’s “Vertigo” (when he’s just standing on a pole in Times Square) and that just happens to be on my “watch rotation”* as well.

Blogging: Comment Spam

Like practically everybody else in the blogsphere at the moment, I’m suffering quite a bit of comment spam: I had to block my first IP address yesterday – and now I’m blocking the following 7 IP addresses:
209.210.176.19
209.210.176.20
209.210.176.21
209.210.176.22
209.210.176.23
80.50.117.113
64.109.143.166
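As a rough sketch of what such a block list amounts to (this is not how Movable Type's own IP banning is implemented, and the is_blocked helper is a made-up name), the seven addresses can sit in a hash and be checked before a comment is accepted:

```perl
use strict;
use warnings;

# The seven addresses listed above, stored as hash keys for fast lookup.
my %blocked = map { $_ => 1 } qw(
    209.210.176.19
    209.210.176.20
    209.210.176.21
    209.210.176.22
    209.210.176.23
    80.50.117.113
    64.109.143.166
);

# Hypothetical helper: returns 1 if the commenter's IP is on the list, else 0.
sub is_blocked {
    my ($ip) = @_;
    return exists $blocked{$ip} ? 1 : 0;
}

print is_blocked('80.50.117.113') ? "rejected\n" : "accepted\n";   # rejected
```

An exact-match list like this only catches addresses already seen; a spammer moving to 209.210.176.24 would get through until that address is added too.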

What sort of spam have I been getting? Well, 80.50.117.113 from “klaus” was a Cheap Viagra, Vicodin, Xanax, Prescription Drugs, and Penis Enlargement Pills spam and 64.109.143.166 from “Alex Dolbayov” was for “Great Site Folks! I have another [?] big t-ts site for you which is really the #1 big t-ts site” (and that’s after I’ve implemented Neil’s change the comment cgi-bin script filename patch type thing) and all the rest were over a variety of posts advertising the same pedo orientated porn site which a number of others have been unfortunately hit with.

Patches I’m going to try include URLs including zipcode are prohibited, Avoid Comment Spam, Comment Spam Quick Fix and I’m certainly going to try Jay Allen’s MT-BlackList once it’s released (I’ve in fact had an email from Kadyellebee of MT-Plugins to let me know that it’ll be included in the MT Plugins Manager as soon as it’s released!).

(I may also include Avoid Duplicate Comments, and use some of the advice from Seven quick tips for a spam-free blog and the Comment Queue Script/MT Hack.)

Expect a few minor things to change around here once it’s all been implemented (oh, I’ve also installed the Trickle thingy so I can schedule “future blog entries”).

Techy: Regexp’s are slow…

As part of a massive project that I’ve been working on over the last few weeks in my spare time (hence the lack of good quality blog posts), I’ve been having to handle a lot of data. By “a lot”, I mean that during one day my computer transferred over 5Gb of data to and from remote systems – and then had over 21million MySQL database queries/updates to process: you try and use a computer for anything else when it’s processing the h-ll out of a lot of files in several different formats. (Oh, many thanks to Jeremy Zawodny for reminding me of his post regarding MySQL’s 4Gb table limit – I had seen it before, but once I took into account that I was hitting the 4Gb limit after just a couple of thousand records, and I’m dealing with a minimum of 3million records, I thought a DB redesign was needed.)

Anyway, I’ve just had to process 250 files containing over 700Mb of TSV (tab separated values) data and extract the information I need. I originally used a Perl regexp (regular expression) to separate the data in each line and then perform a brief comparison on the data (if field X is the letter “B, C or D” and field Z is “K or M” then make an SQL database insert entry, otherwise ignore). Alas, the script was SLOOOOW. After a day of processing, the script had only done around 30 files and then crashed my machine for some reason (probably because I was trying to make it go faster by increasing its priority).

So I decided to try a rewrite that uses the split(/\t/,$_); command to split the data and then an if ($fieldx =~ /^[BCD]$/) style test to compare the data and store it. Perl then shot through the data and 5.4million records later, it’s extracted the 2million items I needed. Speed? 400 seconds! Yep, that’s fast!
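For the curious, the split-and-compare approach looks roughly like the sketch below. The column positions, the sample lines and the wanted_fields name are all assumptions for illustration, since the real data layout isn't described here. The character classes are written as [BCD] and [KM] because a | inside square brackets would match a literal pipe character rather than mean “or”:

```perl
use strict;
use warnings;

# Assumed zero-based column positions for "field X" and "field Z".
my ($FIELD_X, $FIELD_Z) = (0, 2);

# Returns an array ref of the line's fields if the line should be kept,
# or undef if it should be ignored.
sub wanted_fields {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields instead of dropping them.
    my @fields = split /\t/, $line, -1;
    return unless defined $fields[$FIELD_X] && defined $fields[$FIELD_Z];
    return unless $fields[$FIELD_X] =~ /^[BCD]$/;   # field X is B, C or D
    return unless $fields[$FIELD_Z] =~ /^[KM]$/;    # field Z is K or M
    return \@fields;
}

# Made-up sample lines; a real script would read each file line by line.
my @sample = ("B\tfoo\tK\n", "A\tbar\tK\n", "D\tbaz\tZ\n", "C\tqux\tM\n");
for my $line (@sample) {
    my $fields = wanted_fields($line) or next;
    print join('|', @$fields), "\n";   # the SQL insert would go here
}
```

Only the first and last sample lines pass both tests, so those are the only two printed.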

What have I learnt? Well, if you know you can “trust the data” (i.e. it’s been produced by a computer instead of being typed in by an error prone human, and you know the data hasn’t been corrupted), then use the split command instead of regular expressions – it’s a lot, lot faster (just a brief speed test during development showed that regexps were taking over 843 seconds to process the same data that a split took less than 8 seconds to process!).
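A comparison along these lines can be repeated with Perl’s standard Benchmark module. The sketch below is not the original speed test: the ten-column sample line and the one-capture-group-per-column pattern are guesses at what a regexp version might look like, and the timings will vary with the machine, the Perl version and the shape of the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic stand-in for one TSV record: ten tab-separated columns.
my $line = join "\t", qw(B alpha beta gamma K delta epsilon zeta eta theta);

# Hypothetical capturing regexp with one group per column.
my $pattern = join '\t', ('([^\t]*)') x 10;
my $regexp  = qr/\A$pattern\z/;

sub fields_by_regexp { my @f = $_[0] =~ $regexp; return @f }
sub fields_by_split  { return split /\t/, $_[0], -1 }

# Run each approach for about one CPU second and print a comparison table.
cmpthese(-1, {
    regexp => sub { my @f = fields_by_regexp($line) },
    split  => sub { my @f = fields_by_split($line) },
});
```

Both subs return the same ten fields for this line, so the table compares only the cost of extracting them.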

Anyway, now that I’ve extracted the data, I’ve got to get it into the database and then get the hard bit done, where I’ve got to link the new 2million items with the 3million items already in the database whilst querying around 20 remote database systems. Once that’s done and it’s all cross-referenced (which will take ages – I’ll have to use regular expressions for all the cross-referencing), I’ve then got to get the data analysed before it can be output and (finally) uploaded to my web server. It’s a lot of data, but if this pans out (which it should do), this project will instantly offer something no one else on the internet offers at the moment (although I’m aware of a number of similar developments in the pipeline).