Richy's Random Ramblings

Command Line awk Regular Expression for Apache logs

For code testing against a live site, I’ve had to extract all the URLs from an Apache access log file – but how do you do this from the Linux command line?

The secret is to use two regular expressions (regexp) in an “awk” command – for example:

cat examine.txt | awk 'sub(/.*(GET|POST) \//,"")&&sub(/ HTTP.*/,"")'

This will pipe the contents of the file examine.txt to awk, which will run two regular expression substitutions. The first one removes the phrase “GET /” or “POST /” and anything before it, and the second removes the phrase “ HTTP” and anything after it. Lines where either substitution fails (for example, requests that are neither GET nor POST) are not printed, so it’ll give you a nice list of URLs to test.
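To see what the two substitutions do, here is a minimal sketch run against three made-up log lines. The sample addresses, timestamps and paths are my own assumptions for illustration, not taken from a real log, and the function names sample_log and extract_paths are hypothetical.

```shell
# Hypothetical sample lines in the usual Apache access log layout.
sample_log() {
  printf '%s\n' \
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /index.php?id=1 HTTP/1.1" 200 2326' \
    '203.0.113.6 - - [10/Oct/2013:13:55:40 +0000] "POST /login.php HTTP/1.1" 302 0' \
    '203.0.113.7 - - [10/Oct/2013:13:55:41 +0000] "HEAD /robots.txt HTTP/1.1" 200 0'
}

# The same two substitutions as the one-liner above.
# sub() returns 0 when nothing matched, so the HEAD line fails the
# first sub() and is never printed.
extract_paths() {
  awk 'sub(/.*(GET|POST) \//,"")&&sub(/ HTTP.*/,"")'
}

sample_log | extract_paths
# prints:
# index.php?id=1
# login.php
```

Note that the leading slash is stripped along with everything before it, so a request for the site root ends up as an empty line.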

Oh – and if you’d like it to produce a nice “curl friendly” file of just URLs starting “xyz.php” from host example.com then:

cat examine.txt | grep "GET /xyz.php" | awk 'sub(/.*(GET|POST) \//,"http://example.com/")&&sub(/ HTTP.*/,"")' > curl.txt

should do the trick (combine that with cat curl.txt | xargs -I {} curl {} > /dev/null to test; -I {} is the current spelling of the older, deprecated -i option)
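If you need this for more than one host or script name, the same pipeline can be wrapped in a small function. This is only a sketch: build_url_list, its two arguments and the sample log lines are names and data I’ve made up for illustration, and the final xargs step just echoes the curl commands rather than running them.

```shell
# build_url_list HOST PREFIX  (reads an access log on stdin)
# HOST and PREFIX are hypothetical parameters, e.g. example.com and xyz.php.
build_url_list() {
  host=$1
  prefix=$2
  # grep -F treats the prefix as a fixed string, so the "." in
  # "xyz.php" is not treated as a regex wildcard.
  grep -F "GET /$prefix" |
    awk -v base="http://$host/" \
      'sub(/.*(GET|POST) \//, base) && sub(/ HTTP.*/, "")'
}

# Made-up sample input; only the first line starts with xyz.php.
printf '%s\n' \
  '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /xyz.php?a=1 HTTP/1.1" 200 12' \
  '203.0.113.5 - - [10/Oct/2013:13:55:37 +0000] "GET /other.php HTTP/1.1" 200 12' |
  build_url_list example.com xyz.php |
  xargs -n1 echo curl    # dry run: shows each command instead of fetching
```

Drop the echo once the list looks right, and redirect curl’s output to /dev/null as before.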

Techy: Setting up IPv6 on Linux Mint

On my Linux Mint 14 HP Laptop, I need connectivity to the “new” IPv6 internet – however, neither our office router (a 4-year-old Draytek Vigor 2820n) nor our ISP (BT Business Infinity) supports IPv6: so how do I get access?

Well, first of all, I signed up for an account via SixXS.net (which provides what is called a “4to6 tunnel”) and waited for my account to be approved (it took about 3 days). I then requested a tunnel type of “Dynamic NAT-traversing IPv4 Endpoint using AYIYA” from the UK provider Goscomb and waited for that to be approved (which took about 8 minutes).

Once I had the tunnel, I needed to configure my laptop. A quick “sudo apt-get install aiccu” installed the AICCU package – which is the “SixXS Automatic IPv6 Connectivity Client Utility” (it’s also available for Windows, Mac, FreeBSD and many other platforms).

I then modified /etc/aiccu.conf by running “dpkg-reconfigure aiccu” which prompted me to select my Tunnel Broker (SixXS), my SixXS username (in the format WXYZ-SIXXS) and my SixXS password. It then automatically restarted aiccu for me.
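A quick way to check that the tunnel came up is to look for a global-scope IPv6 address on the machine. The sketch below filters the output of “ip -6 addr show”. The function name global_ipv6_addrs is hypothetical, and the sample text (including the interface name and the 2001:db8:: documentation address) is made up so that the example runs anywhere.

```shell
# Reads `ip -6 addr show` style output on stdin and prints every
# global-scope IPv6 address without its prefix length.
global_ipv6_addrs() {
  awk '$1 == "inet6" && /scope global/ { sub(/\/.*/, "", $2); print $2 }'
}

# Made-up sample output; on a live machine use:
#   ip -6 addr show | global_ipv6_addrs
printf '%s\n' \
  '1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536' \
  '    inet6 ::1/128 scope host' \
  '9: sixxs: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280' \
  '    inet6 2001:db8:1::2/64 scope global' \
  '    inet6 fe80::1/64 scope link' |
  global_ipv6_addrs
# prints: 2001:db8:1::2
```

If nothing is printed on the real machine, the tunnel has probably not been brought up yet.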

Finally, I needed to configure my DNS to use IPv6 resolvers. I tend to use OpenDNS for DNS provision (if your nameservers are 208.67.222.222 and 208.67.220.220 then you do as well), but I needed IPv6 ones. Luckily, OpenDNS provides the following IPv6 DNS resolvers: 2620:0:ccc::2 and 2620:0:ccd::2. I added them to my /etc/resolv.conf file in the format:
# OpenDNS ipv6
nameserver 2620:0:ccc::2
nameserver 2620:0:ccd::2
# OpenDNS ipv4
nameserver 208.67.222.222
nameserver 208.67.220.220
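One thing worth checking: according to the resolv.conf man page, the glibc resolver only reads up to three nameserver lines (MAXNS), so with four entries the last one is likely to be ignored. Here is a rough sketch that lists the entries in a resolv.conf-style file and flags any beyond the third. The function name check_resolv and the wording of the note are my own.

```shell
# check_resolv [FILE]  (reads stdin when no file is given)
# Prints each nameserver with its address family and flags
# entries after the third.
check_resolv() {
  awk '$1 == "nameserver" {
         n++
         family = ($2 ~ /:/) ? "IPv6" : "IPv4"
         note = (n > 3) ? " (beyond the first 3, likely ignored)" : ""
         print n ": " $2 " " family note
       }' "$@"
}

# Demo on the entries above; on a live machine use:
#   check_resolv /etc/resolv.conf
check_resolv <<'EOF'
# OpenDNS ipv6
nameserver 2620:0:ccc::2
nameserver 2620:0:ccd::2
# OpenDNS ipv4
nameserver 208.67.222.222
nameserver 208.67.220.220
EOF
```

If both IPv4 resolvers matter to you, dropping one of the IPv6 entries (or reordering) keeps the list within the limit.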

I then went to the IPv6 test site at http://www.test-ipv6.com/ and my score was 10/10! Woot!

Google Adwords: Bullet Point recommendations

Here is a list of bullet points to keep in mind when running advertisements on Google Adwords:

* Each campaign should be very specific (“cpanel web hosting”)
* Use of the exact match [keyword] system will target specific keywords entered on their own
* Set up a separate campaign for EACH country (try not to group countries)
* Multiple campaigns/adgroups with keyword specific ads and budget
* Upper casing the first letter of each Adword is recommended
* Using exclamation marks at end of adwords is good
* Use {KeyWord: Default text} to generate headlines
* Create variant adverts – sometimes with minor differences
* Try to keep the number of keywords per adgroup low: Google suggests 5 good keywords, with a maximum of 15 keywords per adgroup
* The Display URL can be “faked” (i.e. doesn’t really exist): for example http://example.com/webhosting
* Use “Keywords->See search terms” to see exactly which search terms were triggered in that adgroup
* Make sure negative keywords such as “free”, “jobs”, “careers” etc are set if not relevant
* On negative keywords, DON’T do things such as “free web hosting” as it will negatively match “web” AND “hosting”! If you need to block that specific phrase, use [free web hosting]
* Use separate campaigns for the display network, with each adgroup targeting specific types of sites and each having a separate budget
* If targeting generic words such as “web hosting”, use a separate campaign with separate budgets
* Recommend setting up Ad Extensions using Sitelinks. Possibly including telephone numbers and/or product images (although this would need us to add “products” to Google Merchant Centre with fixed pricing: so may not be relevant).

Geographical precision compared

Data from Wikipedia and Stack Exchange.

| Precision | UK Ordnance Survey Eastings and Northings Grid Reference | Degrees | Decimal Degrees |
|---|---|---|---|
| 111km | - | 1 | 0 decimal places |
| 11.1km | - | 0.1 | 1 decimal place |
| 1.1km | 2 digits each/4 digits total (1km square) | 0.01 | 2 decimal places |
| 111m | 3 digits each/6 digits total (100m square) | 0.001 | 3 decimal places |
| 11.1m | 4 digits each/8 digits total (10m square) | 0.0001 | 4 decimal places |
| 1.1m | 5 digits each/10 digits total (1m square) | 0.00001 | 5 decimal places |
| 11cm | 6 digits each/12 digits total (10cm square) | 0.000001 | 6 decimal places |
| 1.11cm | 7 digits each/14 digits total (1cm square) | 0.0000001 | 7 decimal places |
| 1.11mm | 8 digits each/16 digits total (1mm square) | 0.00000001 | 8 decimal places |
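The precision figures follow from simple arithmetic: one degree of latitude is roughly 40,075km / 360, or about 111km, and each extra decimal place divides that by ten. A small sketch of the calculation is below. The function name precision_table is hypothetical, and the 40,075km figure is the Earth’s approximate equatorial circumference, so treat the results as ballpark values. A degree of longitude also shrinks as you move away from the equator.

```shell
# Prints the approximate ground distance, in metres, covered by one
# unit of the last decimal place of a coordinate in decimal degrees.
precision_table() {
  awk 'BEGIN {
    metres_per_degree = 40075 / 360 * 1000   # about 111319 m
    for (d = 0; d <= 8; d++) {
      printf "%d decimal places: %.4f m\n", d, metres_per_degree / (10 ^ d)
    }
  }'
}

precision_table
# first line printed: 0 decimal places: 111319.4444 m
```

So 5 decimal places works out at about 1.1m, in line with the 10-digit grid reference row.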