[sword-devel] OSHB module

Kahunapule Michael Johnson kahunapule at eBible.org
Thu Mar 14 20:03:26 EDT 2024

Right now, all modules on eBible.org force Strong's numbers to be G or H followed by 4 or 5 digits, with leading zeroes as necessary to make 4 digits. The reason for this is that Paratext and the DBL software choke on any other format. The decision was forced on me, really.

Ideally, I would consider the Real Solution to be that any process that READS Strong's numbers should tolerate the presence or absence of leading zeroes. Indeed, the G or H, if missing, should be inferred from the Testament in which it is found. (Tagging of the longer Esther and Daniel should require an explicit G or H.) But if you write Strong's numbers, maximum compatibility would come from sticking to the Paratext/DBL pattern. Maximum encoding efficiency, of course, would be in the other direction: stripping out the redundant leading zeroes and implied G or H would save space. At this point, though, I think maximum compatibility is more important.
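
To illustrate what I mean by a tolerant reader, here is a rough Python sketch. The function name parse_strongs and the "OT"/"NT" testament labels are made up for this example and are not taken from any Sword or Paratext code. It accepts any number of leading zeroes, infers H or G from the testament when the letter is missing, and refuses to guess when no testament is given (as for the longer Esther and Daniel).

```python
import re

# Optional G/H letter, any number of leading zeroes, then the digits.
_STRONGS = re.compile(r"^([GHgh]?)0*([0-9]+)$")


def parse_strongs(raw, testament=None):
    """Return (letter, number) for a Strong's reference.

    `testament` is a hypothetical label, "OT" or "NT", used only to
    infer the letter when the reference does not carry one.
    """
    m = _STRONGS.match(raw.strip())
    if m is None:
        raise ValueError(f"not a Strong's number: {raw!r}")
    letter = m.group(1).upper()
    number = int(m.group(2))
    if not letter:
        if testament == "OT":
            letter = "H"
        elif testament == "NT":
            letter = "G"
        else:
            # No safe default, e.g. Greek additions to OT books.
            raise ValueError(f"explicit G or H required: {raw!r}")
    return letter, number
```

With this, "H00001", "H1", and a bare "0001" read in the Old Testament all come out as ("H", 1).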

Right now, asking for all modules to be rebuilt one way or another is a really big ask. It is probably easier to preprocess all Strong's numbers to make the format consistent within the back end. That way a string comparison in the search should work just fine. We would just have to decide what the search format should be. G or H should be supplied to disambiguate when necessary, and leading zeroes either supplied or stripped. Make sense?
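
A possible shape for that preprocessing step, as a Python sketch only: to_search_key, default_prefix, and min_digits are names invented for this example, and the choice of the Paratext/DBL pattern (letter plus at least 4 digits) as the search format is an assumption, since that is exactly the decision still to be made.

```python
import re


def to_search_key(raw, default_prefix=None, min_digits=4):
    """Rewrite a Strong's number into one consistent search format.

    `default_prefix` ("G" or "H") is supplied by the caller when the
    input carries no letter. `min_digits` is the zero-padded width;
    4 matches the Paratext/DBL pattern, 1 gives the stripped form.
    """
    m = re.fullmatch(r"([GHgh]?)([0-9]+)", raw.strip())
    if m is None:
        raise ValueError(f"not a Strong's number: {raw!r}")
    prefix = m.group(1).upper() or default_prefix
    if prefix not in ("G", "H"):
        raise ValueError(f"cannot disambiguate G/H for {raw!r}")
    # int() drops existing leading zeroes before re-padding.
    return f"{prefix}{int(m.group(2)):0{min_digits}d}"
```

If both the indexed text and the search term pass through the same function, "H00001" in a module and "1" typed by a user (with default_prefix "H") both become "H0001", so a plain string comparison matches.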

Of course, if a strong consensus on Strong's number formatting could be obtained and manifested in code in all relevant Sword Project front and back end software, I could go either way. My Bible translation source would still have the Paratext/DBL format, but stripping out leading zeroes in writing OSIS files is not hard. For now, though, I must agree with Karl about the probability of his trademarked Real Solution coming to pass. Sigh.

On 3/14/24 11:23, Karl Kleinpaste wrote:
> Quite honestly, the Real Solution™ to this problem is to bite the bullet, make a concrete decision that Strong's numbers are to be encoded in exactly one way, and re-work all existing modules to conform to that standard. Personally, I advocate that such a standard would stipulate Strong's numbers to be encoded in minimal (natural) digits: Encoding an OT reference as "1" means a Heb Strong's dictionary key of "00001" and an NT "1401" means a Grk Strong's dictionary key of "01401", that is, zeroes to create dictionary module keys are prepended to natural numbers to fill exactly 5 digits.
> I've never bothered to attempt a final fix to this problem in Xiphos for exactly the reason that, no matter which direction I might take, it will be an unreliable hack; that in turn is because the very concept of a leading '0' as a weak discriminant between Heb and Grk Strong's numbers is itself an unreliable hack. Whenever the subsequent conceptual change came along, to distinguish Heb/Grk numbers according to a leading H or G (that is, lucene search using e.g. "lemma:G1401"), /that/ was the point at which the leading-zero-encoding nonsense should have been forced into the trash bin.
> It was not, and here we are.
> Probability of the Real Solution™ coming to pass: Vanishingly close to zero.


Michael Johnson
mljohnson.org <https://mljohnson.org/> • eBible.org <https://eBible.org> • WorldEnglish.Bible <https://WorldEnglish.Bible> • PNG.Bible <https://PNG.Bible>
Signal/Telegram/WhatsApp/Telephone: +1 808-333-6921
Skype: kahunapule • Telegram/Twitter: @kahunapule • Facebook: fb.me/kahunapule <https://www.facebook.com/kahunapule>