[jsword-devel] Passage List Parsing

Tonny Kohar tonny.kohar at gmail.com
Fri Aug 29 22:45:21 MST 2008


Hi,

Whoa, that really is a long list of problems :)

> Some of the flaws:
> 1) Uses hard-coded delimiters for the verse list and for the ranges. This
> might be OK, but splitting on them arbitrarily is not OK.

To avoid arbitrary delimiters, the delimiters could be defined in the
config/properties file alongside the BibleNames for each locale. The
drawback is that a book name could then not contain a punctuation
character that is the same as the range delimiter.
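A minimal sketch of that idea, assuming a per-locale properties file with two hypothetical keys, delim.list and delim.range (JSword does not define these; the names are for illustration only). It also shows the drawback: a book name containing either delimiter has to be rejected or escaped.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for locale-specific passage delimiters. The property
// keys "delim.list" and "delim.range" are assumptions, not JSword's.
final class PassageDelimiters {
    final char listDelim;   // separates references, e.g. ',' in "Gen 1:1, 3:4"
    final char rangeDelim;  // separates range ends, e.g. '-' in "Gen 1:1-5"

    PassageDelimiters(Properties props) {
        this.listDelim = firstChar(props, "delim.list", ',');
        this.rangeDelim = firstChar(props, "delim.range", '-');
    }

    // Use the first character of the configured value, or the default if unset or empty.
    private static char firstChar(Properties props, String key, char fallback) {
        String value = props.getProperty(key);
        return (value == null || value.isEmpty()) ? fallback : value.charAt(0);
    }

    // The drawback: a book name must not contain either delimiter,
    // otherwise splitting on the delimiter would cut the name apart.
    boolean conflictsWith(String bookName) {
        return bookName.indexOf(listDelim) >= 0 || bookName.indexOf(rangeDelim) >= 0;
    }

    // Parse the contents of a locale's properties file (e.g. read from a resource).
    static PassageDelimiters fromText(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new PassageDelimiters(props);
    }
}
```

A locale that does not override the keys falls back to ',' and '-'; the conflict check could be run once over all BibleNames when the locale is loaded.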

> 2) The code looks up the book several times. This is an expensive operation,
> it should only be done once, if at all possible.
> 3) The code handles osisRefs as a fall back case. These use spaces to
> delimit verse references. On failure, spaces are replaced with commas and
> reparsed. As we go more and more to osisRefs and osisIDs in SWORD modules,
> this should be the norm, not the exception.

Did you mean that osisRefs/osisIDs would be handled first and, if that
fails for whatever reason, the current algorithm would be used?
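A sketch of that ordering, under two stated assumptions: the osisRef check uses a deliberately simplified pattern (Book, Book.chapter or Book.chapter.verse, optionally joined by '-'), not the full OSIS reference grammar, and the existing human-readable parser is passed in as a function. Since whitespace separates osisRefs, the string is split on spaces and accepted only if every piece matches; otherwise the whole string goes to the legacy parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

final class OsisFirstParser {
    // Simplified osisID: Book[.chapter[.verse]], where Book is e.g. "Gen" or "1John".
    // This is an illustrative assumption, not the complete OSIS syntax.
    private static final String OSIS_ID = "[1-4]?[A-Za-z]+(\\.\\d+){0,2}";
    private static final Pattern OSIS_REF = Pattern.compile(OSIS_ID + "(-" + OSIS_ID + ")?");

    // Try the strict osisRef form first; if any whitespace-separated token
    // fails to match, hand the original string to the legacy parser.
    static List<String> parse(String ref, Function<String, List<String>> legacyParser) {
        String trimmed = ref.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        List<String> tokens = Arrays.asList(trimmed.split("\\s+"));
        for (String token : tokens) {
            if (!OSIS_REF.matcher(token).matches()) {
                return legacyParser.apply(ref);
            }
        }
        return new ArrayList<>(tokens);
    }
}
```

Note that a bare book name such as "Gen" is itself a valid osisID here, so "Gen 1:1" only falls back because "1:1" fails the pattern.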

> 4) I think it would be better to have a streaming tokenization that
> normalizes book names as they are found.

> Some of the other bugs:
> 1) Does not handle verse 0.

Is this what you mean by introductory verses in point 1.7? Is there
any SWORD module that has these introductory verses that I could use
as an example?

> 2) Does not handle 5ff properly. This is taken as 5, ff and not 5-ff.
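One way to treat "ff" as an open-ended range rather than a separate token is to recognize it as a suffix of the verse number. A minimal sketch, assuming the caller already knows the last verse of the chapter (from versification data) and passes it in as lastVerse; the VerseSpan class is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical verse span within a single chapter.
final class VerseSpan {
    final int start;
    final int end;

    VerseSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // "5" -> 5..5, "5-7" -> 5..7, "5ff" / "5 ff." -> 5..lastVerse.
    private static final Pattern SPAN = Pattern.compile("(\\d+)\\s*(?:(ff)|-(\\d+))?\\.?");

    static VerseSpan parse(String token, int lastVerse) {
        Matcher m = SPAN.matcher(token.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a verse span: " + token);
        }
        int start = Integer.parseInt(m.group(1));
        if (m.group(2) != null) {
            return new VerseSpan(start, lastVerse);            // open-ended "ff"
        }
        if (m.group(3) != null) {
            return new VerseSpan(start, Integer.parseInt(m.group(3)));
        }
        return new VerseSpan(start, start);                    // single verse
    }
}
```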

> Consider a reference that starts with a book name. As we gather text into
> what might be a book name we could determine which book name it could be. We
> have a limited catalog of names and abbreviations. Given this universe,
> there are only so many start characters. If we see one of these, then we
> only need to consider those words. The second letter narrows it further. At
> some point, some words in our universe are done. These are candidates. If
> there is more input that can match, we continue. When we are done, we are
> left with the candidates which may need to be disambiguated. At this point
> we have a valid, matched book name, or an error/recovery condition. If the
> names were built into a Trie and one walked down it as given above, I think
> that would work.
>
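A minimal sketch of the trie described above, assuming names and abbreviations are matched case-insensitively and that the longest complete name wins. Each node where a name ends records the book(s) it can denote, so an abbreviation shared by two books comes back as two candidates for a later disambiguation step. The class and the sample names are illustrative only; a real matcher would also check that the match ends at a word boundary, so that e.g. "Jo" is not accepted as a prefix of "Joel".

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative trie over book names and abbreviations (not JSword code).
final class BookNameTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final Set<String> books = new LinkedHashSet<>(); // non-empty => a name ends here
    }

    // Result of a walk: input chars consumed and the books the name may denote.
    static final class Match {
        final int length;
        final Set<String> candidates;

        Match(int length, Set<String> candidates) {
            this.length = length;
            this.candidates = candidates;
        }
    }

    private final Node root = new Node();

    void add(String name, String bookId) {
        Node n = root;
        for (char c : name.toCharArray()) {
            n = n.children.computeIfAbsent(Character.toLowerCase(c), k -> new Node());
        }
        n.books.add(bookId);
    }

    // Walk down from input[start] one character at a time, remembering the
    // deepest node that completes a name; stop as soon as no child matches.
    Match longestMatch(String input, int start) {
        Node n = root;
        Match best = new Match(0, new LinkedHashSet<>());
        for (int i = start; i < input.length(); i++) {
            n = n.children.get(Character.toLowerCase(input.charAt(i)));
            if (n == null) {
                break;
            }
            if (!n.books.isEmpty()) {
                best = new Match(i - start + 1, n.books);
            }
        }
        return best;
    }
}
```

Because names such as "1 John" are stored whole, a leading number is consumed as part of the book name, and the walk stops at the first character that cannot continue any name, which leaves the rest of the input for chapter/verse parsing.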

How does the trie solve the problems of
- point 1.5
- point 3.1
- point 3.6
- "The parsing engine is o.c.j.passage.AccuracyType. The basic
responsibility of AccuracyType is to determine what a string reference
is, given its context (or basis). AccuracyType.tokenize(String ref)
parses the reference into parts on digit boundaries. There is an
undocumented assumption in the code that book names do not end in
numbers. This pertains to books like 3 John, and it may be that this
does not work for other languages which might call it the equivalent
of John 3."

Or does the trie not care about those and only try to find the book
name, with disambiguation handled in a separate step?
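To make the quoted assumption concrete, here is an illustrative tokenizer that splits wherever the text switches between digits and non-digits (whitespace counts as a non-digit and is trimmed). It is a sketch written from the description above, not JSword's AccuracyType.tokenize. Once a reference is split this way, the "3" in "John 3" looks exactly like a chapter number, which is why a book name ending in a number cannot be recognized after tokenization; matching whole names first avoids that.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative digit-boundary tokenizer (not the JSword implementation).
final class DigitBoundaryTokenizer {
    static List<String> tokenize(String ref) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Boolean inDigits = null; // null until the first character has been seen
        for (int i = 0; i < ref.length(); i++) {
            char c = ref.charAt(i);
            boolean digit = Character.isDigit(c);
            // Start a new token whenever we cross a digit/non-digit boundary.
            if (inDigits != null && digit != inDigits) {
                addTrimmed(parts, current);
                current.setLength(0);
            }
            current.append(c);
            inDigits = digit;
        }
        addTrimmed(parts, current);
        return parts;
    }

    // Drop surrounding whitespace and skip tokens that were only whitespace.
    private static void addTrimmed(List<String> parts, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            parts.add(s);
        }
    }
}
```

For example, "3 John 1:2" becomes [3, John, 1, :, 2], while "John 3" becomes [John, 3]: nothing in the tokens says whether that "3" belongs to the book name.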

Side note: do Character.isLetter(char) and Character.isDigit(char)
work for locales other than English?
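For reference, java.lang.Character classifies characters by Unicode general category, not by the default Locale: isLetter is true for categories Lu, Ll, Lt, Lm and Lo, and isDigit for Nd. So accented, Cyrillic or Hebrew letters count as letters and Arabic-Indic or Devanagari digits count as digits, while a letter-number such as a Roman numeral is neither. A small check (the sample characters are arbitrary):

```java
// java.lang.Character uses Unicode character data, not the JVM's default Locale,
// so these results do not change with Locale.getDefault().
final class UnicodeClassDemo {
    // Returns {isLetter, isDigit} for the given character.
    static boolean[] classify(char c) {
        return new boolean[] { Character.isLetter(c), Character.isDigit(c) };
    }

    public static void main(String[] args) {
        char[] samples = {
            'J',        // Latin capital J (Lu)
            '\u00e9',   // e with acute accent (Ll)
            '\u0416',   // Cyrillic capital Zhe (Lu)
            '\u05d0',   // Hebrew alef (Lo)
            '\u0663',   // Arabic-Indic digit three (Nd)
            '\u0969',   // Devanagari digit three (Nd)
            '\u2162'    // Roman numeral three (Nl): neither letter nor digit
        };
        for (char c : samples) {
            boolean[] r = classify(c);
            System.out.printf("U+%04X letter=%b digit=%b numeric=%d%n",
                (int) c, r[0], r[1], Character.getNumericValue(c));
        }
    }
}
```

So a tokenizer built on isDigit would also split on non-Latin digits, which may or may not be what a given locale wants.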

Cheers
Tonny Kohar
-- 
Alkitab Bible Study
imagine, design, create ...
http://www.kiyut.com


