[sword-svn] r344 - trunk/misc

greg.hellings at crosswire.org greg.hellings at crosswire.org
Sun Jan 8 10:31:04 MST 2012


Author: greg.hellings
Date: 2012-01-08 10:31:04 -0700 (Sun, 08 Jan 2012)
New Revision: 344

Added:
   trunk/misc/count.py
Log:
Added a script which counts character frequencies and outputs the
frequency of each along with the Unicode character name for that
code point.

Added: trunk/misc/count.py
===================================================================
--- trunk/misc/count.py	                        (rev 0)
+++ trunk/misc/count.py	2012-01-08 17:31:04 UTC (rev 344)
@@ -0,0 +1,37 @@
+#!/usr/bin/env python
+# Distributed under the "here, have it" license
+# Written by Greg Hellings, all rights reserved
+
+# Counts all the characters in a file, assumes UTF-8 encoding, and
+# reports the frequency of each character as well as the Unicode
+# character name for that code point. Can accept an arbitrary number
+# of files on the argument line and will report the aggregate across
+# each file. Can also accept input from stdin. If you want to mix
+# stdin with files pass the filename '-' on the argument line.
+
+import fileinput
+from unicodedata import name
+from operator import itemgetter
+
+def sort_dict(adic):
+	items = adic.items()
+	items.sort()
+	return [value for key, value in items]
+
+chars = dict()
+for line in fileinput.input():
+	for c in line.decode('utf-8'):
+		if not chars.has_key(c):
+			chars[c] = 1
+		else:
+			chars[c] += 1
+
+items = chars.items()
+items.sort(key=itemgetter(1), reverse=True)
+print 'Code point\tCharacter\tName\t\tCount'
+for key, val in items:
+	try:
+		n = name(key)
+	except:
+		n = 'not found'
+	print '%06X\t\t%s\t%24s %s' % (ord(key), key, n, val)


Property changes on: trunk/misc/count.py
___________________________________________________________________
Added: svn:executable
   + *




More information about the sword-cvs mailing list