[sword-devel] New Accented Greek NT with Morph

DM Smith dmsmith555 at yahoo.com
Mon Apr 25 05:01:23 MST 2005


Great input!

With composed characters, the placement of accents is determined by the 
font designer, so the accents are placed correctly. Rendering of 
decomposed characters is an approximation. So, yes, composed looks 
better, provided the end user has a font that completely supports 
composed characters.

Font selection is critical with composed. Many fonts support unaccented 
Greek letters, so when decomposed is rendered with a font that does not 
do accents, the accents are represented as boxes (or perhaps something 
else). This is really ugly, but it is possible to ignore the "static" 
and read the unaccented letters. But with the same font, composed is 
entirely unreadable, since each accented letter is replaced as a whole 
by a box.

End users will need help making good selections. Currently, in 
BibleDesktop, we provide a Font Picker that gives an alphabetized list 
of each font on the system and a bunch of sizes that the user might like 
to use. This does not provide the user any help in picking out a good font.

With regard to searching, it is important that the search is normalized 
the same way that the text is normalized. One of the challenges with 
decomposed is that when two accents are provided, they can be in more 
than one order. That is why one normalizes decomposed by recomposing 
and then decomposing. Also, I have heard that for some accented letters 
there is more than one composed form. This is why one often normalizes 
composed by decomposing and then recomposing.
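
To illustrate the normalization point, here is a minimal Python sketch 
using the standard unicodedata module. It is not Sword or JSword code, 
and the helper names (to_nfd, to_nfc) are made up for this example. It 
takes two decomposed spellings of alpha with acute and iota subscript 
that differ only in the order of the marks, plus the two composed code 
points for alpha with acute (U+1F71 and U+03AC), and compares them 
before and after normalization.

```python
import unicodedata

def to_nfd(text):
    """Canonical decomposition; combining marks are put in canonical order."""
    return unicodedata.normalize("NFD", text)

def to_nfc(text):
    """Canonical decomposition followed by canonical composition."""
    return unicodedata.normalize("NFC", text)

# Alpha with an acute (U+0301) and an iota subscript (U+0345),
# with the two combining marks stored in different orders.
acute_first = "\u03b1\u0301\u0345"
subscript_first = "\u03b1\u0345\u0301"

# Two different composed code points for "alpha with acute".
alpha_oxia = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA
alpha_tonos = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS

if __name__ == "__main__":
    # Compare the raw strings, then the normalized ones.
    print(acute_first == subscript_first)
    print(to_nfd(acute_first) == to_nfd(subscript_first))
    print(to_nfc(alpha_oxia) == to_nfc(alpha_tonos))
```

Whichever form a module is stored in, the search string would need to 
go through the same function before it is compared with the text.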

Lucene provides for the ability to have multiple indexes per word (think 
of these as columns in a database table). One can index the unaccented 
form, the composed form, a transliterated form, ...
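
As a rough sketch of what "several forms per word" could look like, the 
Python below derives a composed, a decomposed and an unaccented form 
from one word. It does not use Lucene at all; the function names and 
the dictionary keys are hypothetical and only stand in for whatever 
index names a front-end would choose. A transliterated form is left out 
because it would need a mapping table that is not discussed here.

```python
import unicodedata

def strip_accents(text):
    """Drop all combining marks (accents, breathings, iota subscript)."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)

def index_forms(word):
    """Return the forms of one word that could each go into its own index.

    The keys are hypothetical field names, not anything Sword defines.
    """
    return {
        "composed": unicodedata.normalize("NFC", word),
        "decomposed": unicodedata.normalize("NFD", word),
        "unaccented": strip_accents(word),
    }

if __name__ == "__main__":
    # "logos" with the accented omicron stored as U+1F79 (omicron with oxia)
    word = "\u03bb\u1f79\u03b3\u03bf\u03c2"
    for name, form in index_forms(word).items():
        print(name, [hex(ord(ch)) for ch in form])
```

A query would be passed through the same function as the text, and the 
form matching the chosen index would be looked up.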

ICU provides the ability for front-ends to be independent of the 
implementation decisions of the Sword API. The frontend can always 
compose what it is handed before it is rendered.

What is important is for Sword to normalize non-Latin-1 text as modules 
and indexes are built. The normalization form used should be published 
so that front-end developers can code accordingly.


Eeli Kaikkonen wrote:

>This Accented Greek NT thing is great, and I'd like to share some thoughts 
>about it. I'm not a specialist in fonts, Greek or unicode, so be warned. I 
>hope this gives some new and useful thoughts, if nothing else.
>
>First, about de/precomposed characters. If the text uses decomposed 
>characters, the font renderer has to compose them. Bible software cannot 
>make any difference; the result may look good or bad depending on the 
>renderer. If the text uses precomposed characters, the renderer renders the 
>glyphs straight from the font file, and it's up to the font author to make a 
>good looking glyph. Of course also when using precomposed characters the font 
>file may have bugs which the renderer cannot fix, and renderer may not handle 
>correctly the situations where a glyph in the font file is actually a link to 
>some other glyphs (this is not the same thing as de/precomposing!).
>
>So there are many renderers, many fonts and many combinations, and any of them 
>may have bugs. Using decomposed characters adds more chances of having bugs. 
>Therefore I would prefer the precomposed form.
>
>I did not know there are free (like in thought) unicode fonts covering Greek 
>Extended before this thread. I like FreeFont, though it has some bugs. I 
>think I found the reason for those bugs, now I have to report them.
>
>If there are reasons for the Sword library being Free Software, there are also 
>reasons for fonts being Free or Open. In my humble opinion we as individuals 
>and the Sword project as a whole could support FreeFont project in some way 
>or another. Finding and reporting bugs is one way. It would be quite 
>short-sighted to choose a good looking but non-free font for use with Sword.
>
>Fonts are of course not the problem of library, but of the frontends. However, 
>I think there are many developers here who are working for the frontends. A 
>Bible software could even include the font files, and that would help the 
>users because they would not have to find a proper font from their system. At 
>least the software developers could add pointers to the Free font files into 
>documentation.
>
>Here is more information about FreeFont:
>http://www.nongnu.org/freefont/
>http://savannah.nongnu.org/projects/freefont/
>
>I have put some screenshots in my www pages. I think they show quite clearly 
>that precomposed is better than decomposed. I copied the text shown in
>http://crosswire.org/study/parallelstudy.jsp?add=WHNU&add=WHAC&add=WHACD.
>Unfortunately I did not get that page (the fonts) working with Konqueror or 
>Firefox. The CSS is too complicated to edit by hand and makes the worst 
>possible mistake usability wise: it overwrites the settings which the user 
>has got right before. I could use FreeFont, Gentium or some other and I think 
>that the browsers could handle them. But the CSS gives other font names and 
>either I don't have them or they don't include Greek Extended properly.
>
>Anyways, I copied some verses to KWord and OpenOffice (I use Debian 
>GNU/Linux). They render the fonts differently. Both render the precomposed 
>characters well. Both have problems with decomposed characters. Look at verse 
>1, Iakobos and diaspora, and verse 4, ina eete. I used two fonts, Gentium and 
>FreeSerif. FreeSerif looks better. Additionally FreeFont has also sans serif 
>and monospace fonts, and sans serif looks even better or is easier to read 
>with small sizes.
>
>Here are the screenshots, they are large pictures:
>http://iki.fi/eelik/kwordjacobgreek.png
>http://iki.fi/eelik/oojacobgreek.png
>
>
>Then, about searching. If you want to do the search using accents you have to 
>know exactly what you want. Remember that accents may depend on words other 
>than the ones you are searching for. Also, if you don't know Greek very well, 
>it might be hard to remember the accents even though you remember the word. 
>Only rarely does someone really want to search for accents. Mostly those who use 
>Sword want to do biblical interpretation, not linguistic research. Therefore 
>I think that accents should in some way or another be excluded from 
>searching.
>
>For canonical New Testament the best solution might be using search with 
>Strong's numbers or some equivalent. There already are modules with Strong's 
>numbers and morphological tags and the new modules also have at least the 
>morphological tags. Those tags give the possibility to search by any form of 
>the word, and accents may be ignored. Doing syntactical analysis becomes 
>possible too, and it is not a small advantage.
>
>It is up to a frontend software to make this kind of search usable. For the 
>Sword library it would be enough to offer the search for text letter by 
>letter, and search for numbers/tags.
>
>If someone wants to have search with Greek words and accents, precomposed form 
>would be better. I think it is faster to do a search with precomposed 
>characters because there is less to compare. Only if someone wants to search 
>for a word where e.g. "the last alpha may have grave OR acute" the decomposed 
>form would be better. And actually even then the frontend could alter the 
>search string by normalizing and making the proper OR statements.
>
>Troy wrote that we could "b)NFC both the search string and the text before 
>searching". But why NFC the text before searching? The text should be 
>normalized to NFC or NFD already, there is no reason to offer a 
>non-normalized module. The search string can be normalized to any known form, 
>whether it be NFC or NFD, if the form of the text module is known. 
>(Normalization forms are quite hard to understand reading the Unicode 
>documentation, I suppose NFC means the most precomposed form and NFD the most 
>decomposed.)
>
>The bottom line is this:
>1. Precomposed is better. I don't see any reason to use decomposed text in 
>modules.
>2. It would be good to support Free or Open fonts in some way or another.
>
>  
>

