[sword-devel] Character Frequency
Greg Hellings
greg.hellings at gmail.com
Sun Jul 3 10:30:32 MST 2011
A few simple pipes in Unix can do the same thing with relative ease.
cat kjv.xml | sed -e 's/./&\n/g' | sort | uniq -c | sort -nr
1669596
1661832 "
1330866 o
1307266 r
1172801 s
1156121 e
1092384 n
1029125 m
901465 t
864037 >
864037 <
830916 =
776214 a
772641 w
625029 h
609087 :
560652 g
497519 l
469056 /
406801 i
393184 0
370919 p
350731 1
312386 H
290358 2
283469 8
263960 3
257239 d
220707 .
209066 5
204056 b
197713 4
197400 c
193701 7
183464 6
175932 G
172006 9
152074 -
133127 I
126782 M
121721 D
115182 N
114636 v
113384 T
111775 u
109108 y
107290 P
94242 A
85226 S
84923 f
74768 ,
73229 C
39531 J
36203 V
35707 k
34899
25991 E
24737 R
23948 F
20676 O
18179 x
16367 L
10159 ;
6930 z
5389 K
5047 B
4036 …
3421 ?
3283 X
2970 ¶
2596 j
2489 W
2334 q
2040 '
1776 Z
797 U
551 Y
313 !
240 )
240 (
199 Q
93 æ
5 }
5 {
3 Æ
1 ת
1 ש
1 ר
1 ק
1 צ
1 פ
1 ע
1 ס
1 נ
1 מ
1 ל
1 כ
1 י
1 ט
1 ח
1 ז
1 ו
1 ה
1 ד
1 ג
1 ב
1 א
The format looks a bit nicer on the terminal. The pipeline takes about
75 seconds to run on the file. A few simple lines in Python or the like
take only about 10 seconds and are equally simple to whip up.
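
For illustration, here is a minimal sketch of what such a Python version might look like. It is not the script that was timed above; the function names and the UTF-8 assumption are hypothetical choices made for the example. Unlike the sed pipeline, it counts newline characters directly rather than reporting empty lines.

```python
import sys
from collections import Counter


def char_frequency(text):
    """Return (character, count) pairs, most frequent first."""
    return Counter(text).most_common()


def main(path):
    # Assumption: the input file is UTF-8 encoded; adjust if yours is not.
    with open(path, encoding="utf-8") as handle:
        counts = char_frequency(handle.read())
    for char, count in counts:
        # repr() makes whitespace and control characters visible.
        print(f"{count:>8} {char!r}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved under a name of your choosing (say, charfreq.py), it could be run as `python3 charfreq.py kjv.xml`.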
--Greg
On Sun, Jul 3, 2011 at 11:53 AM, David Haslam <dfhmch at googlemail.com> wrote:
> A useful tool for analysing or editing source text files is BabelPad, the
> Unicode Text Editor (for Windows).
> http://www.babelstone.co.uk/Software/BabelPad.html
>
> One of the Menu Tool Options is Character Frequency.
>
> This can be very helpful to detect unexpected code points, such as when the
> translators were inconsistent when they were editing.
>
> David