[sword-devel] Converting RTF \'XX to UTF-8
Chris Little
chrislit at crosswire.org
Sun Jun 22 12:31:24 MST 2008
Aha! An example goes a long way, so now I understand the real problem.
You just need to change the codepage. cp1252 is the Windows equivalent
of ISO 8859-1. Since you want Greek, you need the ISO 8859-7 equivalent,
which would be cp1253, thus:
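perl -CO -pe 'use Encode; s/\\\x27([0-9a-fA-F]{2})/decode("cp1253", chr(hex($1)))/eg'
(The apostrophe is written as \x27 in the pattern because a single-quoted
shell string cannot contain a literal single quote.)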
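To see why the code page matters, here is a small Python sketch (Python rather than perl, purely for illustration) that decodes the bytes from the \'XX escapes in the sample under both code pages. The variable names are made up for this example, and the \u769? escape is left out.

```python
# Bytes taken from the \'XX escapes in the sample RTF
# (\'c3 \'e5 \'ed \'e5 \'f3 \'e9 \'f2); the \u769? escape is omitted here.
raw = bytes.fromhex("c3e5ede5f3e9f2")

# Decoding with the Western code page yields accented Latin letters.
as_latin = raw.decode("cp1252")   # 'Ãåíåóéò'

# Decoding with the Greek code page yields the intended letters
# (still without the accent, which comes from the \u769 escape).
as_greek = raw.decode("cp1253")   # 'Γενεσις'
```

The bytes are identical in both cases; only the table used to interpret them changes.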
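If you don't have cp1253 as an available encoding in perl, just skip the
decode part, convert the \'XX to raw bytes and let iconv do the conversion.
Drop the -CO here, otherwise perl re-encodes each byte as UTF-8 before
iconv sees it:
perl -pe 's/\\\x27([0-9a-fA-F]{2})/chr(hex($1))/eg' | iconv -f CP1253 -t UTF-8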
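And to manage the codepoints not in cp1253, you can do a separate pass
over the converted output. The \uN escape is followed by a one-character
fallback (the "?" in your sample), which the "." consumes; -CS makes perl
read its input as UTF-8 too, so the already-converted Greek is not encoded twice:
perl -CS -pe 's/\\u(\d{1,5})./pack("U", $1)/eg'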
(I haven't tested that, so it might be a little off, but it should point
you in the right direction.)
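For comparison, both passes can be folded into one. The following is a minimal Python sketch, not a full RTF parser: the function name rtf_escapes_to_text and its codepage parameter are hypothetical, and it assumes every \uN escape is followed by exactly one plain fallback character, as in the sample.

```python
import re

# Matches either \'XX (a hex byte in the document's code page)
# or \uN followed by a single fallback character.
ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})|\\u(-?\d{1,5}).")

def rtf_escapes_to_text(rtf, codepage="cp1253"):
    """Replace \\'XX and \\uN escapes with the characters they stand for."""
    def repl(match):
        if match.group(1) is not None:
            # \'XX: one byte, interpreted in the given code page.
            byte = bytes([int(match.group(1), 16)])
            return byte.decode(codepage, errors="replace")
        # \uN: a signed 16-bit decimal code point; negative values wrap.
        n = int(match.group(2))
        if n < 0:
            n += 65536
        return chr(n)
    return ESCAPE.sub(repl, rtf)

sample = r"\cf2 \'c3\'e5\u769?\'ed\'e5\'f3\'e9\'f2\cf0"
converted = rtf_escapes_to_text(sample)
# converted keeps the \cf2 and \cf0 control words and contains the Greek
# word with U+0301 (combining acute accent) after the epsilon.
```

Other RTF control words such as \cf2 are left untouched and would still need to be stripped separately.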
--Chris
Karl Kleinpaste wrote:
> I've got an RTF document which contains this kind of encoding:
>
> \cf2 \'c3\'e5\u769?\'ed\'e5\'f3\'e9\'f2\cf0
>
> That renders the word "Genesis" in the Greek, i.e. \'c3\'e5 is the
> capital gamma. As seen in another app which uses this RTF natively:
>
> [image attachment not preserved in the archive]
>
> I need to find a scriptable way to convert this kind of encoding to
> UTF-8. I've tried a few things (and Chris has offered a couple more
> variants) of this general flavor:
>
> perl -CO -pe 'use Encode; s/\\\'([0-9a-fA-F]{2})/decode("cp1252", chr(hex($1)))/eg'
>
> But at best I seem to get what amounts to a format-shifted identity
> function (\'ab becomes an actual 0xAB byte) which does me no good.
>
> Any ideas?
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page