phil wilson :: a geek commodity

10:33 AM


Wednesday, December 03, 2003

Buried in the comments of Mark Pilgrim’s latest post about aggregators and power laws is this little gem of a Bash script. Given the name of your RSS file and an Apache access log, it returns a list of the user-agents that have fetched the file, each with the number of distinct IP addresses it was seen from, sorted from most to fewest.

So for me:

grep /rssify.php /var/log/apache/access.log | cut -d " " -f 1,12- | sort | uniq | cut -d " " -f 2- | cut -d "/" -f 1 | sort |uniq -c | sort -gr | head -25

and you can swap grep for zgrep to search through the gzipped archives (e.g. zgrep /rssify.php /var/log/apache/access.log.1.gz).
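
For anyone wondering what each stage of that one-liner does, here is the same pipeline as a commented shell function. Treat it as a sketch rather than the original author's script: the name feed_readers is made up, it reads the log on standard input, and it assumes Apache's "combined" log format, where the quoted user-agent starts at field 12. I've also used sort -u and sort -rn in place of sort | uniq and sort -gr, which should give the same result for whole-number counts.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): count distinct client
# IPs per user-agent name for one path.
# Usage: feed_readers PATH < access.log
feed_readers() {
    # Keep only log lines that mention the feed path. Like the original
    # grep, this matches anywhere on the line, not just the request field.
    grep -F -- "$1" |
    # Field 1 is the client IP; fields 12 onward are the quoted user-agent
    # (assumption: Apache "combined" log format).
    cut -d ' ' -f 1,12- |
    # Collapse repeat visits: one line per distinct (IP, user-agent) pair.
    sort -u |
    # Drop the IP, then keep only the agent name before the first "/".
    # The opening double quote from the log line stays attached to the name.
    cut -d ' ' -f 2- |
    cut -d '/' -f 1 |
    # Count lines per agent name, largest count first, top 25 only.
    sort | uniq -c | sort -rn |
    head -n 25
}
```

With that defined, feed_readers /rssify.php < /var/log/apache/access.log does the same job as the one-liner, and zcat /var/log/apache/access.log.1.gz | feed_readers /rssify.php covers a gzipped archive.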