[sword-devel] Normalising on the commandline
DM Smith
dmsmith555 at yahoo.com
Wed Jan 21 08:34:03 MST 2009
Peter von Kaehne wrote:
> As a side issue of the other debate - how can I achieve NFC for a text I
> am working on via commandline utilities?
>
> All I can find in ICU documentation is about programming methods
> available, but I have seen no command line utilities.
>
> Peter
You can use perl to do it, using the following module:
http://search.cpan.org/~sadahiro/Unicode-Normalize-1.02/Normalize.pm
Note that the more recent the version of perl, the more recent the version
of Unicode it supports. See the bottom of that page for the mapping.
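Once this is installed, it should be something like the following (I'm
going from memory, as I haven't used perl significantly for quite a
while). The input is assumed to be UTF-8; it has to be decoded before
NFC() and encoded again afterwards, because perl otherwise treats each
line as bytes:
perl -p -i.bak -MUnicode::Normalize -MEncode -e '$_ = encode("UTF-8", NFC(decode("UTF-8", $_)))' filename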
This will rename filename to filename.bak, apply the argument of -e to
every line, and write each resulting line to a new file named filename.
For more details see:
perldoc perlrun
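If perl is not at hand, the same in-place idea can be sketched in Python. This is only an illustration, not something from the SWORD tools: the function name nfc_file and the ".bak" default are made up for the example, and the input is assumed to be UTF-8.

```python
import shutil
import unicodedata
from pathlib import Path


def nfc_file(path, backup_suffix=".bak"):
    """Normalize a UTF-8 text file to NFC in place, keeping a backup copy.

    Hypothetical helper for illustration; assumes the file is valid UTF-8.
    """
    path = Path(path)
    # Keep the original next to the file, similar to perl's -i.bak.
    shutil.copy2(path, path.with_name(path.name + backup_suffix))
    # Work on bytes so that line endings are preserved exactly.
    text = path.read_bytes().decode("utf-8")
    path.write_bytes(unicodedata.normalize("NFC", text).encode("utf-8"))
```

Calling nfc_file("x.txt") would leave the untouched original in x.txt.bak and the NFC form in x.txt. A file that is not valid UTF-8 raises UnicodeDecodeError; by then the backup copy has already been written, but x.txt itself is left unchanged.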
The tei2mod and osis2mod utilities convert to Unicode and normalize to
NFC by default. You can turn this off when you know the input is already
NFC or that it is cp1252. Chris has said that he'd like all the
module-making programs to be modified to do the same.
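To find out whether a file is already NFC before turning the normalization off, a quick check along these lines might help. It is a rough sketch in Python rather than anything shipped with SWORD; the name is_nfc is invented for the example, and it treats anything that is not valid UTF-8 (such as cp1252 input) as "not NFC".

```python
import unicodedata


def is_nfc(data: bytes) -> bool:
    """Return True if data is valid UTF-8 and already in NFC.

    Hypothetical helper for illustration only.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8 at all (for example cp1252 input).
        return False
    # NFC text is unchanged by a further NFC normalization.
    return unicodedata.normalize("NFC", text) == text
```

For example, is_nfc(open("x.txt", "rb").read()) reads the whole file as bytes and reports whether normalizing it would change anything.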
Hope this helps.
In Him,
DM