[sword-devel] SIL Ezra fonts
Chris Little
sword-devel@crosswire.org
Sat, 28 Sep 2002 08:47:45 -0700 (MST)
On Fri, 27 Sep 2002, Joel Mawhorter wrote:
> On September 27, 2002 22:49, Joel Mawhorter wrote:
> > Why use a Windows specific encoding for a cross-platform library?
It's not such a bad encoding, even if MS did design it--and I'm not
confident that they did. In any event, it's well supported on pretty much
every platform, if for no reason other than that it is the default
encoding used by the monopoly.
> > I don't think Codepage 1252 could be a superset of ISO-8859-1; aren't they
> > both one byte encodings?
>
> Oops, check first, speak second. You're right, CP1252 is a superset of
> ISO-8859-1. I guess I should read James 1:19 more often. :)
:) Like I said, it's just a couple of characters different. In ISO
they're unassigned.
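
For anyone who wants to see exactly where the two encodings part ways, here is a rough sketch in Python (not SWORD code; the function name is made up for illustration). It relies on Python's built-in "cp1252" and "latin-1" codecs. Note that Python's latin-1 codec maps 0x80-0x9F to C1 control characters rather than treating them as unassigned, and its cp1252 codec leaves a handful of bytes in that range undefined.

```python
def differing_bytes():
    """Return the byte values that decode differently under CP1252 and Latin-1."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("latin-1")  # always succeeds: byte N -> U+00NN
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252 = None  # byte is undefined in Python's cp1252 codec
        if cp1252 != latin1:
            diffs.append(b)
    return diffs


if __name__ == "__main__":
    diffs = differing_bytes()
    print("first differing byte: 0x%02X" % diffs[0])
    print("last differing byte:  0x%02X" % diffs[-1])
```

Under these codecs every difference falls in 0x80-0x9F; everything from 0xA0 up (the European accented letters) is the same in both.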
> > Maybe just really close? Do we use anything above
> > 0x7f in Codepage 1252? I assume all the european stuff is up there.
Exactly.
> > What about Hebrew and Greek? Do we use UTF-8 for that?
Yes, I hope so. There might be some still floating around that use Symbol
or another font encoding, but we have tried to eliminate these.
Incidentally, CP1252 texts are no longer created either. Everything is
either USASCII or UTF-8. UTF-8 stuff uses Encoding=UTF-8. USASCII stuff
uses the CP1252 code. When everything is migrated to UTF-8, we'll
probably drop CP1252. Though, with more modules coming from the public,
we might want to keep CP1252 support, I guess.
> > Sorry for all the questions. I'm just trying to figure out what needs to be
> > dealt with for the searching stuff.
For searching, you should never have to deal with anything that is not
UTF-8 because decoding CP1252 to UTF-8 is one of the first things we do in
the filters.
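
To make the idea concrete, here is a minimal sketch in Python of "normalize to UTF-8 first, search second". This is not the actual SWORD filter; the function name and the module_encoding parameter are hypothetical, and I'm assuming that anything not marked UTF-8 is CP1252.

```python
def to_utf8(raw: bytes, module_encoding: str = "cp1252") -> bytes:
    """Return raw text as UTF-8 bytes (hypothetical helper, not SWORD's API)."""
    if module_encoding.upper() == "UTF-8":
        return raw  # already UTF-8, pass through untouched
    # Assumption: everything else is CP1252. Bytes that cp1252 does not
    # define are replaced with U+FFFD instead of raising an error.
    return raw.decode("cp1252", errors="replace").encode("utf-8")


def contains(raw: bytes, needle: str, module_encoding: str = "cp1252") -> bool:
    """Search only ever sees UTF-8, whatever the module was stored in."""
    text = to_utf8(raw, module_encoding).decode("utf-8")
    return needle in text


if __name__ == "__main__":
    legacy = b"caf\xe9"                   # "café" stored as CP1252
    modern = "λόγος".encode("utf-8")      # Greek stored as UTF-8
    print(contains(legacy, "é"))
    print(contains(modern, "λόγ", "UTF-8"))
```

The point is just that the search code takes one code path: by the time it runs, the encoding question has already been settled.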
--Chris