
Category: Life: Work and Techy

Techy: Really cheap International Calls

Ok – now phone prices are getting silly. Using a service I found two days ago called Telediscount, my parents can call me while I’m on holiday in Japan at a cheaper rate (1p per minute) than when I’m only 12 miles away from them in the UK (around 2–5p per minute).

It was bad enough that I could already call the States for 10p per minute via Komtel, but with Telediscount – 1p per minute?!?

Somebody please explain to me how it can be cheaper to make a transatlantic international phone call than it is to call the house next door? (NTL charge me a minimum of 1p per minute off peak and 2p on peak for local calls – US calls via Telediscount’s 0844 8 610 610 number are 1p per minute at ALL times!) And, no, you don’t need to “sign up” to Telediscount or anything – just start calling.

Scary innit!

Oh – it gets better: using their 0911 50 101 50 “premium rate” number (15p per minute), I can call UK mobile numbers at, I believe, less than the rate the mobile phone operators themselves charge the network(s) for terminating calls.

Cheap telephone (phone) calls ahoy!

Work: Job Offers

Wow. Even though I’m currently in full-time employment (and actually just had a pay rise), I’ve had a number of job offers in the last 5 days: £20,000, £24,000 and, the latest, £26,000 (oh, and there was also the “1 week contract for £1,000” telephone call). People want me and it’s nice to feel wanted 😉

However, I like it in my current position: it’s near to home (around a 30 minute walk – the others are in London and Kent: if one was in Leicester or in a certain town in the South West I might have to think about it), the staff are nice (even my new biatch – don’t ask, but I’m sure a couple of lurkers here know what I mean by that 😉 ), and it’s doing what I enjoy. Of course, the job offers also relate to what I enjoy doing AND are much more money (nearly double what I’m currently on) but… but…

I’m sticking where I am for the time being (although my boss does know that in around 6 years I’ll be looking for another job – OR sooner if I make several million a month from self-employment), but I can’t deny the offers are very tempting. I just want to know why, during 6 months of unemployment, I didn’t receive a single offer from these companies *grr*.

Techy: Standardised WHOIS records

Can somebody please point me in the direction of a Perl, PHP or C/C++ module/script that will handle at least 90% of all TLDs (top level domains) and registrars and be able to parse out “discrete” data records (such as the registrant’s address or the expiry date)?

I’ve come across Perl’s Net::ParseWhois (no longer being maintained, AFAIK), which only supports around 50% of the .com and .net registrars, and Net::Whois, which supports .edu, .net and .com domains – and even those not very well. I’ve also tried GeekTools’ WP Whois Proxy scripts, but those don’t parse the data (plus they seem extremely slow for what they do).

I don’t have to write handlers for 134+ registrars, do I? Surely someone else has done this before me and somebody reading this blog knows how/where to get access to the script/data… Please….

If only all the registrars had agreed on a standardised WHOIS record format *sigh*
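In case it helps show why this is such a pain: because every registrar invents its own layout, the only general approach I can see is a dispatch table of per-registrar parsers. Here’s a rough sketch of what one handler might look like – the sample record, field names and the `parse_whois` helper are all made up for illustration, not from any real registrar:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical raw WHOIS output -- every real registrar formats this
# differently, which is exactly the problem.
my $raw = <<'END';
Domain Name: EXAMPLE.COM
Registrant: Example Widgets Ltd, 1 High Street, Anytown, UK
Expiry Date: 2005-03-14
END

# One parser per registrar/TLD "style"; a full version would need one
# of these for each of the 134+ registrars. *sigh*
my %parsers = (
    'com' => sub {
        my ($text) = @_;
        my %rec;
        ($rec{registrant}) = $text =~ /^Registrant:\s*(.+)$/m;
        ($rec{expires})    = $text =~ /^Expiry Date:\s*(.+)$/m;
        return \%rec;
    },
    # 'uk' => sub { ... Nominet's rather different format ... },
);

# Look up the right handler for the TLD and run it over the raw text.
sub parse_whois {
    my ($tld, $text) = @_;
    my $parser = $parsers{$tld} or die "No handler for .$tld\n";
    return $parser->($text);
}

my $rec = parse_whois('com', $raw);
print "Registrant: $rec->{registrant}\n";
print "Expires:    $rec->{expires}\n";
```

Multiply that anonymous sub by 134+ formats and you can see why I’d rather find somebody else’s script.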

Techy: Regexp’s are slow…

As part of a massive project that I’ve been working on over the last few weeks in my spare time (hence the lack of good quality blog posts), I’ve been having to handle a lot of data. By “a lot”, I mean that during one day my computer transferred over 5GB of data to and from remote systems – and then had over 21 million MySQL database queries/updates to process: you try and use a computer for anything else when it’s processing the h-ll out of a lot of files in several different formats. (Oh, many thanks to Jeremy Zawodny for reminding me of his post regarding MySQL’s 4GB table limit – I had seen it before, but once I took into account that I was hitting the 4GB limit after just a couple of thousand records (and I’m dealing with a minimum of 3 million records), I thought a DB redesign was needed.)

Anyway, I’ve just had to process 250 files totalling over 700MB of TSV (tab-separated values) data and extract the information I need. I originally used a Perl regexp (regular expression) to separate the data in each line and then performed a brief comparison on the data (if field X is the letter “B”, “C” or “D” and field Z is “K” or “M”, then make an SQL database insert entry, otherwise ignore it). Alas, the script was SLOOOOW. After a day of processing, the script had only done around 30 files and then crashed my machine for some reason (probably because I was trying to make it go faster by increasing its priority).

So I decided to rewrite it to use the split(/\t/, $_); command to split the data and then use an (if $fieldx =~ /^[BCD]$/) style test to compare the data and store it. Perl then shot through the data and, 5.4 million records later, it had extracted the 2 million items I needed. Speed? 400 seconds! Yep, that’s fast!
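For anyone curious, the fast version boils down to something like this – the sample lines and field positions are invented for the sketch, and where the real script does an SQL INSERT I just push onto an array:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample TSV lines standing in for the real 700MB of data.
my @lines = (
    "1001\tB\tfoo\tK\n",
    "1002\tE\tbar\tM\n",   # field 2 not B/C/D -- skipped
    "1003\tD\tbaz\tM\n",
    "1004\tC\tqux\tZ\n",   # field 4 not K/M -- skipped
);

my @wanted;
for (@lines) {
    chomp;
    # split on tabs is far cheaper per line than a capturing regexp
    my @f = split /\t/, $_;
    # keep the record only if field 2 is B, C or D AND field 4 is K or M
    next unless $f[1] =~ /^[BCD]$/ && $f[3] =~ /^[KM]$/;
    push @wanted, \@f;    # the real script does an SQL insert here
}

printf "kept %d of %d records\n", scalar @wanted, scalar @lines;
```

(Note the character classes are /^[BCD]$/ and not /[B|C|D]/ – sticking pipes inside square brackets would also match a literal “|”.)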

What have I learnt? Well, if you know you can “trust the data” (i.e. it’s been produced by a computer instead of being typed in by an error-prone human – and you know the data hasn’t been corrupted), then use the split command instead of regular expressions – it’s a lot, lot faster (just a brief speed test during development showed that regexps were taking over 843 seconds to process the same data that a split took less than 8 seconds to process!).
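My quick-and-dirty speed test was along these lines, using the standard Benchmark module that ships with Perl – the sample line is invented, and the exact numbers will obviously vary wildly by machine and by how many fields you’re pulling out:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# One representative (made up) TSV line, repeated to give a workload.
my @lines = ("1001\tB\tfoo\tK\textra\tfields\there\n") x 1000;

cmpthese(500, {
    # capture the first four fields with a regexp, as my slow version did
    regexp => sub {
        for (@lines) {
            my @f = $_ =~ /^([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)/;
        }
    },
    # just split on tabs
    split => sub {
        for (@lines) {
            my @f = split /\t/, $_;
        }
    },
});
```

cmpthese prints a little table comparing the two rates – handy for settling “which way is faster” arguments before committing to a day-long processing run.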

Anyway, now I’ve extracted the data, I’ve got to get it into the database and then do the hard bit: linking the new 2 million items with the 3 million items already in the database whilst querying around 20 remote database systems. Once that’s done and it’s all cross-referenced (which will take ages – I’ll have to use regular expressions for all the cross-referencing), I’ve then got to get the data analysed before it can be outputted and (finally) uploaded to my web server. It’s a lot of data, but if this pans out (which it should do), this project will instantly offer something no one else on the internet offers at the moment (although I’m aware of a number of similar developments in the pipeline).