[sword-devel] Normalising on the commandline
Chris Little
chrislit at crosswire.org
Wed Jan 21 12:01:16 MST 2009
Peter von Kaehne wrote:
> As a side issue of the other debate - how can I achieve NFC for a text I
> am working on via commandline utilities?
>
> All I can find in ICU documentation is about programming methods
> available, but I have seen no command line utilities.
DM's suggestion of using the Perl facility is fine, and I use it myself
often when I'm scripting in Perl. But there's also an ICU command-line
utility that can do normalization (and much more).
uconv (meant as a replacement for iconv, if you're familiar with that)
does codepage/encoding conversion, transliteration, and normalization.
It's part of the standard ICU distribution and we have Windows binaries
on the FTP site:
http://crosswire.org/ftpmirror/pub/sword/utils/win32/uconv.zip
http://crosswire.org/ftpmirror/pub/sword/utils/win32/icudt40-big.zip
(I'd recommend the big, 7.6 MB version of the ICU data for this.)
Use is fairly straightforward. To take a file "input" and NFC-normalize
it into a file "output" (assuming both are UTF-8), you would run:

    uconv -f utf-8 -t utf-8 -x NFC -o output input
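
For anyone without uconv at hand, or wanting to double-check what the
command did, here is a minimal sketch of the same operation in Python.
Python is not something that came up in this thread; the sketch relies
only on the standard library's unicodedata module. The function names
to_nfc and normalize_file are made up for illustration, and it assumes
the file is UTF-8 and small enough to read into memory in one go.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


def normalize_file(src: str, dst: str) -> None:
    """Read a UTF-8 file, NFC-normalize it, and write the result as UTF-8.

    newline="" turns off newline translation on both sides, so the
    original line endings pass through unchanged.
    """
    with open(src, encoding="utf-8", newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(to_nfc(text))
```

Calling normalize_file("input", "output") is meant to mirror the uconv
command above. A quick sanity check is that to_nfc applied to "e" followed
by U+0301 (combining acute accent) gives the single character U+00E9.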
--Chris