For code testing against a live site, I’ve had to extract all the URLs from an Apache access log – but how do you do that from the Linux command line?
The secret is to use two regular expressions (regexps) in an “awk” command – for example:
cat examine.txt | awk 'sub(/.*(GET|POST) \//,"")&&sub(/ HTTP.*/,"")'
This pipes the contents of the file examine.txt to awk, which runs two substitutions on each line. The first removes “GET /” or “POST /” and everything before it; the second removes “ HTTP” and everything after it. That leaves you with a nice list of URLs to test.
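For example, given a made-up access log line such as:
192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.php?id=1 HTTP/1.1" 200 512
the command above prints just:
index.php?id=1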
Oh – and if you’d like it to produce a nice “curl friendly” file of just the URLs starting with “xyz.php” on host example.com, then:
cat examine.txt | grep "GET /xyz.php" | awk 'sub(/.*(GET|POST) \//,"http://example.com/")&&sub(/ HTTP.*/,"")' > curl.txt
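Assuming the log contained a request like “GET /xyz.php?id=1 HTTP/1.1” (again, a made-up example), curl.txt would end up holding full URLs of the form:
http://example.com/xyz.php?id=1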
should do the trick. Combine that with
cat curl.txt | xargs -I {} curl {} > /dev/null
to fire a request at each URL in turn.
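If you’d rather see the HTTP status code for each URL instead of throwing everything away, a small loop along these lines (using curl’s -s, -o and -w options, and reading the curl.txt file produced above) should also work:
while read -r url; do
  # -s: no progress output, -o /dev/null: discard the body,
  # -w: print the status code and the URL that was fetched
  curl -s -o /dev/null -w "%{http_code} %{url_effective}\n" "$url"
done < curl.txt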