[sword-devel] module statistics

Troy A. Griffitts scribe at crosswire.org
Sun Aug 24 04:50:14 MST 2008

Dear Greg,

Thank you so much for your work.  Both you and DM offered to help with
this.  As DM has a ton of other tasks, I'm sure he would appreciate it
if you wanted to own this.  Here is the history up to now.

Originally, I believe Joachim, Chris, Martin, and DM had a hand in 
creating, improving, debugging, etc., a perl script to do module 
statistics.  I think they worked out a good way to minimize skewed 
numbers from multiple retries, multiple files per module, etc.  I've
moved their script to ~sword/bin/ on the server and placed it under 
version control.
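For reference, here is a minimal Python sketch of the kind of counting discussed in this thread: every successful request is tallied per module, but each IP address counts only once toward "unique" downloads. It assumes Apache common/combined log lines, and it uses a hypothetical rule that the module name is the requested file's name without its extension. The Perl script's actual rules may well differ, so treat this as an illustration only.

```python
import re
from collections import defaultdict
from typing import Iterable

# Start of an Apache common/combined log line (format assumed):
#   host ident user [timestamp] "METHOD path PROTOCOL" status ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def module_from_path(path: str) -> str:
    """Hypothetical rule: the file name without its extension is the module.

    "/some/dir/ESV.zip" -> "ESV".  Real download URLs may need another rule.
    """
    name = path.rsplit("/", 1)[-1]
    return name.split(".", 1)[0]


def count_downloads(lines: Iterable[str]) -> dict[str, tuple[int, int]]:
    """Return {module: (total_requests, unique_ips)} over all given lines."""
    totals: dict[str, int] = defaultdict(int)
    ips: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("status") != "200":
            # Skip unparseable lines and anything but a complete 200 response
            # (e.g. 206 partial responses from resumed retries).
            continue
        module = module_from_path(m.group("path"))
        totals[module] += 1              # every complete request
        ips[module].add(m.group("ip"))   # repeat requests from one IP count once
    return {mod: (totals[mod], len(ips[mod])) for mod in totals}
```

Because `count_downloads` accepts any iterable of lines, several log files can be aggregated with `itertools.chain(open(a), open(b))`, so the same IP is recognized across all of them. FTP transfer logs use a different line format and would need their own pattern.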

If you'd like to own this task moving forward, you are more than
welcome, and I think I can say this for all those involved in the
process in the past (though they can speak up if they still have a
heartfelt attachment to the task).  However, so that we don't lose what
they learned, I would like to ask you to take a look at their script
and see how they decided to compute the numbers.

This script is run from a daily cron job to produce the top20.html file
on SWORD's front page.  The cron job runs it with these arguments:

/home/sword/bin/makeDownloadsStats.pl /home/sword/html/top20.html 20 30

If your new Python script could take the same params and generate a
similar file, it would make it easy for me to substitute it into the 
cron job.
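A drop-in replacement would mainly need to accept the same three positional arguments. Below is a hedged Python sketch of that interface. The meaning of 20 and 30 isn't stated here, so treating the first as the number of modules to list (going by the name top20.html) and the second as a look-back window in days is only a guess to check against the Perl script. The names `count` and `days` are hypothetical.

```python
import argparse
import html


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Accept the same positional arguments the cron job passes today."""
    parser = argparse.ArgumentParser(description="Write a top-N module download list.")
    parser.add_argument("output", help="HTML file to write, e.g. top20.html")
    # ASSUMPTION: first number = how many modules to list.
    parser.add_argument("count", type=int, help="number of modules to list (assumed)")
    # ASSUMPTION: second number = look-back window in days (unverified guess).
    parser.add_argument("days", type=int, help="look-back window in days (assumed)")
    return parser.parse_args(argv)


def render_top_n(stats: dict[str, int], count: int) -> str:
    """Render the `count` most-downloaded modules as an HTML ordered list."""
    # Highest count first; ties broken alphabetically for stable output.
    ranked = sorted(stats.items(), key=lambda kv: (-kv[1], kv[0]))[:count]
    items = "\n".join(
        f"  <li>{html.escape(name)} ({downloads})</li>" for name, downloads in ranked
    )
    return f"<ol>\n{items}\n</ol>\n"


def main(argv: list[str], stats: dict[str, int]) -> None:
    """Write the report named on the command line from precomputed counts.

    Selecting which log lines feed `stats` (presumably using args.days)
    is not shown here.
    """
    args = parse_args(argv)
    with open(args.output, "w", encoding="utf-8") as out:
        out.write(render_top_n(stats, args.count))
```

Keeping the argument order identical means only the script name in the cron entry would have to change.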

If you don't feel this is something you'd like to own, maybe DM is still 
willing to look into updating the current Perl script.

Thanks, everyone, for your recent and past work on this.
Automation is our friend: it captures nebulous knowledge floating around 
and places it into a solid description, and keeps humans out of the role 
of 'bottleneck'. :)


Greg Hellings wrote:
> Troy,
> I've written up a log processor for the download statistics.  It's the
> executable .py file in my user directory on the server.  Below is an
> example run of it:
> [ghellings at www ~]$ ./process_log.py ESV <path-to-log snipped>
> Total downloads: 362
> Unique downloads: 210
> It will accept as many files on the command line as you like and
> report their statistics in aggregate.  This is most useful for
> tracking IP addresses across multiple files.  It also works for the
> FTP files, but for those the total download count is misleading,
> since it counts individual downloads of both the New AND Old
> Testament .bz* files.  Thus, each download of the module should show
> up as about 6 files in the "total downloads" count.  Unique downloads
> are based solely on IP address.
> As an example of the counting discrepancy:
> [ghellings at www ~]$ ./process_log.py ESV <path-to-log snipped>
> Total downloads: 540
> Unique downloads: 84
> Examples for comparison:
> [ghellings at www ~]$ ./process_log.py KJV <ftp log>
> Total downloads: 2098
> Unique downloads: 163
> [ghellings at www ~]$ ./process_log.py KJV <http log>
> Total downloads: 342
> Unique downloads: 198
> Those stats are based on the log files currently in use.  If you
> would like a version of the script that also reports download totals
> for all modules, that can be provided with little extra work.
> --Greg
> On Tue, Aug 19, 2008 at 4:14 PM, Greg Hellings <greg.hellings at gmail.com> wrote:
>> Troy,
>> On Tue, Aug 19, 2008 at 4:04 PM, Troy A. Griffitts <scribe at crosswire.org> wrote:
>>> Hey guys.  We have a few things that need addressing:
>>> Log files got a new naming convention recently.  Instead of:
>>> ffff
>>> ffff.1
>>> ffff.2
>>> ...
>>> It has become
>>> ffff
>>> ffff-20080819
>>> ffff-20080818
>>> ...
>>> Hence our Perl scripts that generate module statistics, shown in
>>> the left panel here, are not working:
>> I don't know the first thing about Perl, so editing that is out for
>> me.  A rewrite in Python is possible if no one with Perl knowledge
>> shows up.
>>> http://crosswire.org/sword
>>> Also, Crossway asks for periodic download statistics for their ESV
>>> module.  I generated the last report for them by hand, but I would love
>>> for someone to write a script that would run on the first of each month
>>> and email them statistics for the previous month.
>> What format is the file in (I'm guessing it's an Apache access
>> log)?  A simple Python script should be more than sufficient for this
>> purpose.  I can probably whip one up in little time.  Also, what
>> statistics do you need: just a download count, or do you also want
>> information on unique IP address downloads, etc.?  A sample line
>> from the file that pertains to the ESV (or several lines, if a file
>> access is spread across more than one) should be enough to base the
>> work on; more would help if the line appears in multiple formats.
>> Also, odds are good that the same script can be used to generate the
>> statistics for any individual module.
>> --Greg
>>> Any takers?
>>>        -Troy.
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
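Since the breakage came from the switch from "ffff.1, ffff.2, ..." to "ffff-20080819"-style names, a replacement script could simply recognize both. Here is a minimal Python sketch, assuming the rotated copies sit in one directory next to the live file and are uncompressed ("ffff" is just the placeholder name used above). It returns them oldest first.

```python
import re
from pathlib import Path


def order_log_names(base: str, names: list[str]) -> list[str]:
    """Pick out `base`'s log files under either rotation scheme, oldest first.

    Handles the old "ffff.1, ffff.2, ..." style (higher number = older),
    the new "ffff-20080819" date style, and the live file "ffff" itself.
    Anything else, including compressed copies, is ignored.
    """
    pattern = re.compile(rf"^{re.escape(base)}(?:\.(\d+)|-(\d{{8}}))?$")
    keyed = []
    for name in names:
        m = pattern.match(name)
        if not m:
            continue
        number, date = m.groups()
        if number is not None:
            key = (0, -int(number), "")  # old numbered files predate dated ones
        elif date is not None:
            key = (1, 0, date)           # YYYYMMDD sorts correctly as a string
        else:
            key = (2, 0, "")             # the current, still-growing log
        keyed.append((key, name))
    return [name for _, name in sorted(keyed)]


def find_logs(directory: str, base: str) -> list[Path]:
    """Return paths to all copies of `base` in `directory`, oldest first."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    return [Path(directory) / n for n in order_log_names(base, names)]
```

Matching by pattern rather than by a fixed list of names means the script keeps working whichever rotation scheme the server uses.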

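For the monthly ESV report that Troy asked about, the date handling is the main piece. Below is a hedged Python sketch that works out the previous calendar month and keeps only the log lines stamped within it. It assumes Apache's bracketed timestamp format (e.g. [19/Aug/2008:10:00:00 -0700]) and an English/C locale for month abbreviations.

```python
import re
from collections.abc import Iterable, Iterator
from datetime import date, datetime, timedelta

# Matches the bracketed Apache timestamp, e.g. [19/Aug/2008:10:00:00 -0700].
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def previous_month(today: date) -> tuple[int, int]:
    """Return (year, month) for the calendar month before `today`."""
    last_day_of_prev = today.replace(day=1) - timedelta(days=1)
    return last_day_of_prev.year, last_day_of_prev.month


def lines_for_month(lines: Iterable[str], year: int, month: int) -> Iterator[str]:
    """Yield only the log lines whose timestamp falls in the given month."""
    for line in lines:
        m = TIMESTAMP.search(line)
        if not m:
            continue  # no bracketed timestamp on this line
        try:
            # %b expects English month abbreviations (C/English locale).
            stamp = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # bracketed text that isn't an Apache timestamp
        # Compares in the log's own local time, not UTC.
        if (stamp.year, stamp.month) == (year, month):
            yield line
```

A script started by cron on the first of each month (schedule `0 6 1 * *`) could call `previous_month(date.today())`, pass the relevant log files' lines through `lines_for_month`, count the ESV entries, and mail the result with the standard library's `smtplib`. The mailing step is left out because the recipient and mail setup aren't specified here.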