[sword-devel] Coming soon: new improved sword searching
Joel Mawhorter
sword-devel@crosswire.org
Sun, 08 Sep 2002 19:31:23 -0700
On September 8, 2002 13:12, Chris Little wrote:
> FWIW, we need to upgrade our regexp engine. The current one (from GNU)
> has a couple of problems that I was aware of. First it is GPL--this is
> the last GPL component in the library. If it were replaced with something
> else, we could license Sword under non-GPL licenses to other entities
> (e.g. Bible societies that don't want to deal with GPL's restrictions) or
> put it out publicly under a license that we write that better meets our
> needs than the GPL. Second (and probably more immediately important) it
> doesn't handle UTF-8.
Wouldn't it make more sense to use UTF-16 than UTF-8 in regular expressions?
At least with UTF-16, in most cases, 1 character == 1 symbol so regular
expressions would be more manageable (e.g. what does a dot mean in a regular
expression when being matched against symbols that can be represented in 1, 2
or 3 chars?). Does ICU have regular expression support? I know the regular
expression support in Java 1.4 is very nice and uses UTF-16 but alas we can't
really use that in Sword unless we come up with a CNNI (C non-native
interface :-).
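
To make the question about the dot concrete, here is a small sketch in Python
(not Sword code; the sample string and variable names are made up for
illustration). It compares a byte-oriented match against UTF-8 with a
character-oriented match, and counts UTF-16 code units for the same text. The
last line shows why "in most cases" matters: characters outside the Basic
Multilingual Plane take two UTF-16 units.

```python
import re

# Sample text (an assumption for illustration): 'a' takes 1 byte in UTF-8,
# U+00E9 (e with acute) takes 2, and U+20AC (euro sign) takes 3.
text = "a\u00e9\u20ac"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

# Byte-oriented matching: '.' consumes one byte, not one symbol.
byte_matches = re.findall(rb".", utf8, flags=re.DOTALL)

# Character-oriented matching: '.' consumes one whole symbol.
char_matches = re.findall(r".", text, flags=re.DOTALL)

# Each UTF-16 code unit is 2 bytes.
utf16_units = len(utf16) // 2

# A character outside the BMP (U+1D11E, musical G clef) needs a surrogate pair.
clef_units = len("\U0001D11E".encode("utf-16-le")) // 2

print(len(byte_matches))  # 6
print(len(char_matches))  # 3
print(utf16_units)        # 3
print(clef_units)         # 2
```

So a regexp engine that is not UTF-8 aware sees six things for the dot to
match where a reader sees three, while a UTF-16 engine working on 16-bit units
gets the right count here but would still split the clef in two.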
Joel