[sword-devel] Character Frequency
Greg Hellings
greg.hellings at gmail.com
Sun Jul 3 10:43:16 MST 2011
In fact,
http://dl.thehellings.com/count.py
churns through kjv.xml in 11 seconds on my machine and gives the
desired output of character counts. It can be invoked with the
name of a file (python count.py kjv.xml), as part of a pipe (cat
kjv.xml | ./count.py), or with a whole list of files (./count.py
kjv.xml kjvfull.xml kjvlite.xml).
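
In case the link is unreachable, here is a minimal sketch of what a
script along those lines might look like. It is not the contents of
count.py itself: the function names (count_chars, open_inputs), the
assumption that the input is UTF-8, and the output layout are all
illustrative guesses. It reads the files named on the command line, or
stdin when none are given, and prints counts in descending order much
like the sort | uniq -c | sort -nr pipeline does.

```python
#!/usr/bin/env python3
"""Character-frequency sketch (hypothetical; not the actual count.py)."""
import sys
from collections import Counter


def open_inputs(names):
    """Yield text streams: the named files, or stdin if no names are given.

    Assumes the input is UTF-8 (an assumption, not stated in the thread).
    """
    if not names:
        yield sys.stdin
        return
    for name in names:
        with open(name, encoding="utf-8") as handle:
            yield handle


def count_chars(streams):
    """Return a Counter of every character seen across all streams."""
    counts = Counter()
    for stream in streams:
        for line in stream:
            # Counter.update on a string counts each character in it.
            counts.update(line)
    return counts


def main(argv):
    counts = count_chars(open_inputs(argv[1:]))
    # Most frequent first; repr() keeps spaces and newlines visible.
    for char, n in counts.most_common():
        print(f"{n:8d} {char!r}")


if __name__ == "__main__":
    main(sys.argv)
```

Counting in a single pass with a Counter avoids sorting one line per
character, which is probably most of the reason a script like this
beats the pipe version.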
--Greg
On Sun, Jul 3, 2011 at 12:30 PM, Greg Hellings <greg.hellings at gmail.com> wrote:
> A few simple pipes in Unix can do the same thing with relative ease.
>
> cat kjv.xml | sed -e 's/./&\n/g' | sort | uniq -c | sort -nr
> 1669596
> 1661832 "
> 1330866 o
> 1307266 r
> 1172801 s
> 1156121 e
> 1092384 n
> 1029125 m
> 901465 t
> 864037 >
> 864037 <
> 830916 =
> 776214 a
> 772641 w
> 625029 h
> 609087 :
> 560652 g
> 497519 l
> 469056 /
> 406801 i
> 393184 0
> 370919 p
> 350731 1
> 312386 H
> 290358 2
> 283469 8
> 263960 3
> 257239 d
> 220707 .
> 209066 5
> 204056 b
> 197713 4
> 197400 c
> 193701 7
> 183464 6
> 175932 G
> 172006 9
> 152074 -
> 133127 I
> 126782 M
> 121721 D
> 115182 N
> 114636 v
> 113384 T
> 111775 u
> 109108 y
> 107290 P
> 94242 A
> 85226 S
> 84923 f
> 74768 ,
> 73229 C
> 39531 J
> 36203 V
> 35707 k
> 34899
> 25991 E
> 24737 R
> 23948 F
> 20676 O
> 18179 x
> 16367 L
> 10159 ;
> 6930 z
> 5389 K
> 5047 B
> 4036 …
> 3421 ?
> 3283 X
> 2970 ¶
> 2596 j
> 2489 W
> 2334 q
> 2040 '
> 1776 Z
> 797 U
> 551 Y
> 313 !
> 240 )
> 240 (
> 199 Q
> 93 æ
> 5 }
> 5 {
> 3 Æ
> 1 ת
> 1 ש
> 1 ר
> 1 ק
> 1 צ
> 1 פ
> 1 ע
> 1 ס
> 1 נ
> 1 מ
> 1 ל
> 1 כ
> 1 י
> 1 ט
> 1 ח
> 1 ז
> 1 ו
> 1 ה
> 1 ד
> 1 ג
> 1 ב
> 1 א
>
> The format looks a bit nicer on the terminal. It takes about 75 seconds
> to run on the file. A few simple lines in Python or the like take only
> about 10s and are equally simple to whip up.
>
> --Greg
>
> On Sun, Jul 3, 2011 at 11:53 AM, David Haslam <dfhmch at googlemail.com> wrote:
>> A useful tool for analysing or editing source text files is BabelPad, the
>> Unicode Text Editor (for Windows).
>> http://www.babelstone.co.uk/Software/BabelPad.html
>>
>> One of the Menu Tool Options is Character Frequency.
>>
>> This can be very helpful for detecting unexpected code points, such as those
>> introduced when translators were inconsistent in their editing.
>>
>> David
>