<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On May 16, 2008, at 8:30 AM, Mullins, Steven wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">DM,<br><br>Thanks for the tips and direction, it is much appreciated!<br>I'm going to work on these issues as time allows. I may<br>still have to bug you with a question or two as I learn how<br>jsword is structured. I'm very new to Java and object-oriented<br>programming in general (unless you count python). I tend to<br>think and write procedurally i.e. (C, Perl and Fortran),<br>but will try hard to fit the paradigm of the existing code.<br><br>I'd really like to see jsword on par with BibleWorks:<br><a href="http://www.bibleworks.com/">http://www.bibleworks.com/</a> in the area of searching and<br>morphological analysis of greek texts. I think with some<br>work we can get it there.</blockquote><div><br></div>Yes, this would be great. Here are some ideas. (Some are in Jira, which is down at the moment, so we can't get to our issues database.)<br><br></div><div>You may find the following of interest:</div><div><a href="https://issues.apache.org/jira/browse/LUCENE-1284">https://issues.apache.org/jira/browse/LUCENE-1284</a></div><div><br></div><div>This is a contribution to Lucene that allows words to be broken up into their constituent parts for searching. This is very important for languages that have compound words, such as German. Basically, a word such as "hotdog" becomes searchable as both "hot" and "dog".</div><div><br></div><div>There is also some work going on regarding n-grams. The basic idea here is that some languages (e.g. Thai and Japanese) do not have word boundaries. 
Searching in these languages is a matter of finding substring matches.</div><div>This is discussed here: </div><div><font class="Apple-style-span" color="#144FAE"><span class="Apple-style-span" style="text-decoration: underline;"><span class="Apple-style-span" style="color: rgb(0, 0, 0); "><a href="https://issues.apache.org/jira/browse/LUCENE-1224">https://issues.apache.org/jira/browse/LUCENE-1224</a></span></span></font></div><div><span class="Apple-style-span" style="text-decoration: underline;"><a href="https://issues.apache.org/jira/browse/LUCENE-1225">https://issues.apache.org/jira/browse/LUCENE-1225</a></span></div><div>and in some threads on jira-dev</div><div><br></div><div>I don't know whether either of those applies to Greek and/or Hebrew.</div><div><br></div><div>The other thing that we need is the ability to strip accents, vowel points and cantillation.</div><div><br></div><div>Soon we will have a Greek text with accents. When we do, it will be important to search with and without regard to accents.</div><div><br></div><div>To search a Unicode text reliably, we need to normalize the text before storing it and normalize search requests the same way before doing the search. (Unicode has various normalization forms.) The texts should already be NFC, but that may not be the best form for indexing and searching.</div><div><br></div><div>The same goes for Hebrew. With Hebrew it is also important to be able to remove cantillation for the sake of readability.</div><div><br></div><div>I'd also like to add the ability to transliterate these texts. Chris Little has done some wonderful work here for the Sword engine. This would help beginners learn how to read Greek and Hebrew texts. 
It might also help as an additional normalization form to index.</div><div><br></div><div>I think it would be interesting in a work like the KJV to do a Strong's search that retrieves a list of the different translations of a particular Strong's number.</div><div><br></div><div>What are some of your ideas?</div><div><br></div><div>I will be focused on adding BookMarks for the next release (after the one that is about to be done now) and won't be able to get to much of anything else except bug fixes.</div><div><br></div><div>In Him,</div><div><span class="Apple-tab-span" style="white-space:pre">        </span>DM<br></div><div><br></div><div><br></div><div><br></div></body></html>