[sword-devel] Character Frequency
Greg Hellings
greg.hellings at gmail.com
Mon Jul 4 11:10:40 MST 2011
Fixed:
$ count.py kjv.xml
Code point Character Name Count
000020 SPACE 1669596
000022 " QUOTATION MARK 1661832
00006F o LATIN SMALL LETTER O 1330866
000072 r LATIN SMALL LETTER R 1307266
000073 s LATIN SMALL LETTER S 1172801
000065 e LATIN SMALL LETTER E 1156121
00006E n LATIN SMALL LETTER N 1092384
00006D m LATIN SMALL LETTER M 1029125
000074 t LATIN SMALL LETTER T 901465
00003C < LESS-THAN SIGN 864037
00003E > GREATER-THAN SIGN 864037
00003D = EQUALS SIGN 830916
000061 a LATIN SMALL LETTER A 776214
000077 w LATIN SMALL LETTER W 772641
000068 h LATIN SMALL LETTER H 625029
00003A : COLON 609087
000067 g LATIN SMALL LETTER G 560652
00006C l LATIN SMALL LETTER L 497519
00002F / SOLIDUS 469056
000069 i LATIN SMALL LETTER I 406801
000030 0 DIGIT ZERO 393184
000070 p LATIN SMALL LETTER P 370919
000031 1 DIGIT ONE 350731
000048 H LATIN CAPITAL LETTER H 312386
000032 2 DIGIT TWO 290358
000038 8 DIGIT EIGHT 283469
000033 3 DIGIT THREE 263960
000064 d LATIN SMALL LETTER D 257239
00002E . FULL STOP 220707
000035 5 DIGIT FIVE 209066
000062 b LATIN SMALL LETTER B 204056
000034 4 DIGIT FOUR 197713
000063 c LATIN SMALL LETTER C 197400
000037 7 DIGIT SEVEN 193701
000036 6 DIGIT SIX 183464
000047 G LATIN CAPITAL LETTER G 175932
000039 9 DIGIT NINE 172006
00002D - HYPHEN-MINUS 152074
000049 I LATIN CAPITAL LETTER I 133127
00004D M LATIN CAPITAL LETTER M 126782
000044 D LATIN CAPITAL LETTER D 121721
00004E N LATIN CAPITAL LETTER N 115182
000076 v LATIN SMALL LETTER V 114636
000054 T LATIN CAPITAL LETTER T 113384
000075 u LATIN SMALL LETTER U 111775
000079 y LATIN SMALL LETTER Y 109108
000050 P LATIN CAPITAL LETTER P 107290
000041 A LATIN CAPITAL LETTER A 94242
000053 S LATIN CAPITAL LETTER S 85226
000066 f LATIN SMALL LETTER F 84923
00002C , COMMA 74768
000043 C LATIN CAPITAL LETTER C 73229
00004A J LATIN CAPITAL LETTER J 39531
000056 V LATIN CAPITAL LETTER V 36203
00006B k LATIN SMALL LETTER K 35707
00000A
not found 34899
000045 E LATIN CAPITAL LETTER E 25991
000052 R LATIN CAPITAL LETTER R 24737
000046 F LATIN CAPITAL LETTER F 23948
00004F O LATIN CAPITAL LETTER O 20676
000078 x LATIN SMALL LETTER X 18179
00004C L LATIN CAPITAL LETTER L 16367
00003B ; SEMICOLON 10159
00007A z LATIN SMALL LETTER Z 6930
00004B K LATIN CAPITAL LETTER K 5389
000042 B LATIN CAPITAL LETTER B 5047
00003F ? QUESTION MARK 3421
000058 X LATIN CAPITAL LETTER X 3283
002026 … HORIZONTAL ELLIPSIS 3115
0000B6 ¶ PILCROW SIGN 2970
00006A j LATIN SMALL LETTER J 2596
000057 W LATIN CAPITAL LETTER W 2489
000071 q LATIN SMALL LETTER Q 2334
000027 ' APOSTROPHE 2040
00005A Z LATIN CAPITAL LETTER Z 1776
002013 – EN DASH 920
000055 U LATIN CAPITAL LETTER U 797
000059 Y LATIN CAPITAL LETTER Y 551
000021 ! EXCLAMATION MARK 313
000028 ( LEFT PARENTHESIS 240
000029 ) RIGHT PARENTHESIS 240
000051 Q LATIN CAPITAL LETTER Q 199
0000E6 æ LATIN SMALL LETTER AE 93
00007B { LEFT CURLY BRACKET 5
00007D } RIGHT CURLY BRACKET 5
0000C6 Æ LATIN CAPITAL LETTER AE 3
0005D1 ב HEBREW LETTER BET 1
0005D5 ו HEBREW LETTER VAV 1
0005D9 י HEBREW LETTER YOD 1
0005E1 ס HEBREW LETTER SAMEKH 1
0005E9 ש HEBREW LETTER SHIN 1
0005D2 ג HEBREW LETTER GIMEL 1
0005D6 ז HEBREW LETTER ZAYIN 1
0005DE מ HEBREW LETTER MEM 1
0005E2 ע HEBREW LETTER AYIN 1
0005E6 צ HEBREW LETTER TSADI 1
0005EA ת HEBREW LETTER TAV 1
0005D3 ד HEBREW LETTER DALET 1
0005D7 ח HEBREW LETTER HET 1
0005DB כ HEBREW LETTER KAF 1
0005E7 ק HEBREW LETTER QOF 1
002015 ― HORIZONTAL BAR 1
0005D0 א HEBREW LETTER ALEF 1
0005D4 ה HEBREW LETTER HE 1
0005D8 ט HEBREW LETTER TET 1
0005DC ל HEBREW LETTER LAMED 1
0005E0 נ HEBREW LETTER NUN 1
0005E4 פ HEBREW LETTER PE 1
0005E8 ר HEBREW LETTER RESH 1
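
The script itself is not attached to this message, so what follows is only a minimal sketch of how a report like the one above could be produced. It is not the actual count.py. It assumes Python 3 and UTF-8 input, and the function names (count_characters, format_row, report) are made up for illustration. The "not found" label in the 00000A row is consistent with passing a default to unicodedata.name(), since control characters have no Unicode name. That row is split across two lines above because the line feed was printed as-is; the sketch substitutes a space for non-printable characters to keep each row on one line.

```python
import sys
import unicodedata
from collections import Counter


def count_characters(text):
    """Return a Counter mapping each character in text to its frequency."""
    return Counter(text)


def format_row(char, count):
    """Format one row: code point, character, Unicode name, count."""
    # Control characters have no Unicode name, so fall back to a label.
    name = unicodedata.name(char, "not found")
    # Assumption: show non-printable characters as a space so that a raw
    # newline does not break the row in two.
    shown = char if char.isprintable() else " "
    return "%06X %s %s %d" % (ord(char), shown, name, count)


def report(text):
    """Return the report lines, most frequent character first."""
    lines = ["Code point Character Name Count"]
    for char, count in count_characters(text).most_common():
        lines.append(format_row(char, count))
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count.py kjv.xml
    with open(sys.argv[1], encoding="utf-8") as handle:
        print("\n".join(report(handle.read())))
```

Note that a count like this runs over the raw file, so OSIS markup characters (angle brackets, quotation marks, equals signs, attribute names) are included along with the Bible text, which is why they rank so high in the list above.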
--Greg
On Mon, Jul 4, 2011 at 10:41 AM, David Haslam <dfhmch at googlemail.com> wrote:
> Output is a tad less descriptive than that from BabelPad.
>
> Here's the first 25 lines from a file I was working on.
>
> /For files with long character names, best to use a wider tab setting in
> one's editor./
>
> Code point Character Character Name Count
> 000020 SPACE 609,105
> 000021 ! EXCLAMATION MARK 2,009
> 000022 " QUOTATION MARK 2,245
> 000027 ' APOSTROPHE 199
> 000028 ( LEFT PARENTHESIS 93
> 000029 ) RIGHT PARENTHESIS 93
> 00002A * ASTERISK 3,500
> 00002B + PLUS SIGN 66
> 00002C , COMMA 73,327
> 00002D - HYPHEN-MINUS 901
> 00002E . FULL STOP 22,991
> 000030 0 DIGIT ZERO 2,822
> 000031 1 DIGIT ONE 14,709
> 000032 2 DIGIT TWO 10,486
> 000033 3 DIGIT THREE 6,626
> 000034 4 DIGIT FOUR 4,786
> 000035 5 DIGIT FIVE 3,897
> 000036 6 DIGIT SIX 3,478
> 000037 7 DIGIT SEVEN 3,230
> 000038 8 DIGIT EIGHT 3,062
> 000039 9 DIGIT NINE 2,920
> 00003A : COLON 10,445
> 00003B ; SEMICOLON 11,513
> 00003F ? QUESTION MARK 3,010