Varnish: Normalizing / Normalising the URL

March 3rd, 2010 by Richy B.

We’ve had a small issue with our installation of the Varnish proxy cache not working as efficiently as we had hoped. We tracked this down to our use of Google AdWords and Google Analytics for tracking: Google was adding query string parameters such as utm_source, utm_medium, utm_campaign and gclid to the URL. This caused Varnish either not to cache the page or to treat each variant as a separate URL, which led to poor cache usage.

I’ve added this code to fix it, which may be of use to others:

/* Normalize the URL: first strip any fragment (it shouldn't reach the server anyway, but just in case) */
if (req.url ~ "\#") {
    set req.url = regsub(req.url, "\#.*$", "");
}
/* Normalize the URL: remove Google tracking parameters */
if (req.url ~ "\?") {
    /* Drop tracking parameters that follow another parameter; [^&]* takes the whole value */
    set req.url = regsuball(req.url, "&(utm_source|utm_medium|utm_campaign|gclid)=[^&]*", "");
    /* Drop a tracking parameter that comes first, keeping the "?" */
    set req.url = regsuball(req.url, "\?(utm_source|utm_medium|utm_campaign|gclid)=[^&]*", "?");
    /* Tidy up a leftover "?&" or a trailing "?" */
    set req.url = regsub(req.url, "\?&", "?");
    set req.url = regsub(req.url, "\?$", "");
}
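
For context, here is a minimal sketch of how the query string clean-up could be wired in. The post does not say where the code lives, so this assumes it runs in vcl_recv, before the cache lookup. The subroutine name normalize_url and the sample URL in the trace are made up for illustration. The value pattern [^&]* is also an assumption: it removes everything up to the next "&", so values containing characters such as "." or "%" are stripped whole.

```vcl
# Hypothetical helper subroutine; the name normalize_url is illustrative only.
sub normalize_url {
    if (req.url ~ "\?") {
        # Remove tracking parameters that follow another parameter.
        set req.url = regsuball(req.url, "&(utm_source|utm_medium|utm_campaign|gclid)=[^&]*", "");
        # Remove a tracking parameter in first position, keeping the "?".
        set req.url = regsuball(req.url, "\?(utm_source|utm_medium|utm_campaign|gclid)=[^&]*", "?");
        # Tidy up a leftover "?&" or a trailing "?".
        set req.url = regsub(req.url, "\?&", "?");
        set req.url = regsub(req.url, "\?$", "");
    }
}

sub vcl_recv {
    # Assumption: called before the lookup, so the cached object is
    # keyed on the cleaned-up req.url rather than the tracked one.
    call normalize_url;
}

# Trace for a made-up request, /page?utm_source=google&id=7&gclid=abc123:
#   strip "&name=value" pairs  -> /page?utm_source=google&id=7
#   strip "?name=value" pair   -> /page?&id=7
#   collapse "?&" to "?"       -> /page?id=7
#   drop a trailing "?"        -> /page?id=7 (nothing to do)
```

With this in place, /page?id=7 and the tracked variant of the same URL should end up sharing one cache entry.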


1 comment

  1. I was looking for this exact thing and stumbled across your blog. Thank you.

    I’ve shortened your VCL code a bit. Here is what I used:

    # Strip out Google Analytics campaign variables. They are only needed
    # by the JavaScript running on the page:
    # utm_source, utm_medium, utm_campaign, gclid
    if (req.url ~ "(\?|&)(gclid|utm_[a-z]+)=") {
        set req.url = regsuball(req.url, "(gclid|utm_[a-z]+)=[-_A-Za-z0-9]+&?", "");
        set req.url = regsub(req.url, "(\?|&)$", "");
    }
