[sword-devel] French with Strongs?
Michael Johnson
Michael at eBible.org
Sun Jul 24 21:49:18 EDT 2022
Hello, Robert & all.
This is still close enough to Sword software development that it is probably appropriate to make an initial response on this list. If we get too much into the nitty-gritty, we should probably go off list.
First, my main mission is viral digital Bible distribution to as many people speaking as many languages as possible, in the formats that work best for them. One of the 14 or so formats I distribute Bibles in is CrossWire Sword modules. Most of what I do is focused on minority languages, like Matigsalug, Matupi Chin, Gadsup, and Anindilyakwa. So far, Matupi Chin is the only minority language I have tagged with Strong's numbers (thanks to a lot of manual labor by some brothers in Myanmar). I was hoping eventually to do a sort of automated Rosetta Stone analysis of each of the Bibles in the distribution to tag them with Strong's numbers, because that enables some nice user interface features for Bible study. Some Sword front ends touch on these capabilities a little, although I think there is room for improvement there. Anyway, my Rosetta Stone analysis got bogged down in versification variations and in trying to come up with algorithms that work across languages with different word orders. It turns out that infrequently used words, like the "beam" and "speck" we are supposed to keep out of our eyes, are a bit tough to tell apart based only on correlations with Greek words in the same verses. For more literal-leaning translations, though, enough can be done to be useful. The hard cases could be relegated to manual intervention by someone who knows the language.
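To illustrate why rare words like "beam" and "speck" resist a purely statistical approach, here is a minimal Python sketch of one common verse-level co-occurrence measure, the Dice coefficient. It is a generic technique chosen for illustration, not the Rosetta Stone analysis described above, and the short word lists are toy data rather than real verse texts. Because the two English words and their Greek counterparts (κάρφος, δοκός) occur together in the same verses, both pairings score identically and correlation alone cannot separate them.

```python
from collections import Counter
from itertools import product


def dice_scores(aligned_verses):
    """Score each (target word, Greek lemma) pair by verse-level co-occurrence.

    aligned_verses: list of (target_words, greek_lemmas) tuples, one per verse.
    Returns {(word, lemma): Dice coefficient}, where
    Dice = 2 * verses_with_both / (verses_with_word + verses_with_lemma).
    """
    word_count = Counter()
    lemma_count = Counter()
    pair_count = Counter()
    for target_words, greek_lemmas in aligned_verses:
        words = set(target_words)    # count each word at most once per verse
        lemmas = set(greek_lemmas)
        word_count.update(words)
        lemma_count.update(lemmas)
        pair_count.update(product(words, lemmas))
    return {
        (w, l): 2 * n / (word_count[w] + lemma_count[l])
        for (w, l), n in pair_count.items()
    }


# Toy data: the rare words always appear together, as in the speck/beam passages.
verses = [
    (["speck", "brother", "beam", "eye"], ["κάρφος", "ἀδελφός", "δοκός", "ὀφθαλμός"]),
    (["speck", "beam", "eye"], ["κάρφος", "δοκός", "ὀφθαλμός"]),
    (["beam", "speck", "brother", "eye"], ["δοκός", "κάρφος", "ἀδελφός", "ὀφθαλμός"]),
]
scores = dice_scores(verses)
print(scores[("speck", "κάρφος")], scores[("speck", "δοκός")])  # 1.0 1.0
```

A tie like this is exactly the kind of case that has to fall back on word order, morphology, or the manual intervention mentioned above.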
Now, about speaker identification. That sounds nice, but it is tough to automate, given the vast differences in the ways different languages handle quotations. I know that Glyssen takes a pretty good stab at it, but on the other hand, I've never had a Bible translation pass Glyssen checks. Not even in English. So have fun with that. I'll watch. ;-)
Now for the mechanics:
I use Haiola, which is free and open source, from https://haiola.org. As you know, it takes USFM, USX, or USFX input and produces multiple outputs, including a dialect of OSIS suitable for feeding to osis2mod. USFX is my "hub" format: if the input is USFM or USX, I convert it to USFX first, then convert from USFX to the other formats. Some formatting and some noncanonical text (e.g., footnotes in subtitles) get discarded in the conversion to OSIS, which is why I stopped distributing OSIS files. Anyway, any transformations I do on the text, I do only on the USFX text. This includes some dialect conversions and phrase substitutions, like those going from the /World English Bible/ to the /World English Bible British Edition/, the /World Messianic Bible/, and the /World Messianic Bible British Edition/. It also includes Strong's number merging.
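Phrase substitutions of the kind used to derive the British and Messianic editions can be sketched as a whole-word search and replace. This Python sketch assumes a simple mapping of phrases to replacements; the two entries are hypothetical illustrations, not eBible.org's actual tables. It works on a plain string, whereas the workflow above applies such changes to the USFX text, where a real implementation would also need to avoid touching markup.

```python
import re


def make_substituter(table):
    """Build a function that applies whole-word phrase substitutions.

    Longer phrases are tried first, so a multi-word phrase wins over a
    single word it contains. Matching is case-sensitive.
    """
    if not table:
        return lambda text: text
    phrases = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b")
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)


# Hypothetical, illustrative entries only.
to_british = make_substituter({"color": "colour", "honor": "honour"})
print(to_british("They honor the color of the veil."))
# They honour the colour of the veil.
```

The `\b` anchors keep "honor" from matching inside "honored"; inflected forms would need their own entries.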
Since most of you would prefer Python, C++, or something else to Pascal, I'll give some high-level pseudo-code rather than detailed Pascal source:
To merge Strong's numbers, first I read the source language texts with Strong's numbers into a SourceWords table consisting of
Source word, Strong's number, lemma, and morphology.
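As a rough illustration, the SourceWords table might look like this in SQL. The sketch uses Python's built-in sqlite3 module as a stand-in for MariaDB; the column names, the index, and the sample morphology code are assumptions, not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column names mirroring the four fields described above.
conn.execute("""
    CREATE TABLE SourceWords (
        Word       TEXT NOT NULL,  -- surface form as it appears in the source text
        Strongs    TEXT NOT NULL,  -- normalized number, e.g. 'G3056'
        Lemma      TEXT,           -- dictionary form
        Morphology TEXT            -- parsing code in whatever scheme the source uses
    )""")
conn.execute("CREATE INDEX idx_sourcewords_strongs ON SourceWords (Strongs)")
conn.execute("INSERT INTO SourceWords VALUES (?, ?, ?, ?)",
             ("λόγος", "G3056", "λόγος", "N-NSM"))  # illustrative row
```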
Then I read the model text(s) with Strong's numbers already inserted (the hard way) into a StrongsGlossesxxx table, where xxx is the language code (e.g., eng for English, fra for French, deu for German). That table contains
record ID, Count, Strong's number, and Gloss.
The count is incremented every time the same Strong's number and Gloss are found paired.
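The counting step can be sketched as "update the pair if it exists, otherwise insert it with a count of 1". This Python sketch uses sqlite3 as a stand-in for MariaDB (where `INSERT ... ON DUPLICATE KEY UPDATE` would be a one-statement alternative); the table name StrongsGlossesfra follows the naming pattern above, and the column names and toy pairs are assumptions.

```python
import sqlite3

TABLE = "StrongsGlossesfra"  # StrongsGlosses + language code (hypothetical instance)


def add_gloss(conn, strongs, gloss):
    """Count one (Strong's number, gloss) pairing seen in the model text."""
    cur = conn.execute(
        f"UPDATE {TABLE} SET Count = Count + 1 WHERE Strongs = ? AND Gloss = ?",
        (strongs, gloss))
    if cur.rowcount == 0:  # first time this pair has been seen
        conn.execute(
            f"INSERT INTO {TABLE} (Count, Strongs, Gloss) VALUES (1, ?, ?)",
            (strongs, gloss))


conn = sqlite3.connect(":memory:")
conn.execute(f"""CREATE TABLE {TABLE} (
    ID      INTEGER PRIMARY KEY,  -- record ID
    Count   INTEGER NOT NULL,
    Strongs TEXT NOT NULL,
    Gloss   TEXT NOT NULL,
    UNIQUE (Strongs, Gloss))""")

# Toy pairs as they might be read from a tagged French model text.
for strongs, gloss in [("G2316", "Dieu"), ("G3056", "parole"), ("G2316", "Dieu")]:
    add_gloss(conn, strongs, gloss)

print(conn.execute(f"SELECT Strongs, Gloss, Count FROM {TABLE} ORDER BY Count DESC").fetchall())
# [('G2316', 'Dieu', 2), ('G3056', 'parole', 1)]
```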
I also write to a SourceVerses table, which has records of
Record ID, Strong's number, verse.
There will be one record in that table for every Strong's number found in every verse.
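A minimal Python sketch of filling SourceVerses, again with sqlite3 standing in for MariaDB. The "JHN.1.1" verse-reference format and the partial Strong's lists are assumptions for illustration. Repeats of the same number within one verse are collapsed to a single row, which does not change the result of the `Strongs in (select ... where Verse = ...)` test used later.

```python
import sqlite3


def load_source_verses(conn, tagged_verses):
    """Insert one (Strongs, Verse) row per distinct Strong's number per verse.

    tagged_verses maps a verse reference to the Strong's numbers found in it.
    """
    rows = sorted({(strongs, verse)
                   for verse, numbers in tagged_verses.items()
                   for strongs in numbers})
    conn.executemany("INSERT INTO SourceVerses (Strongs, Verse) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE SourceVerses (
    ID INTEGER PRIMARY KEY, Strongs TEXT NOT NULL, Verse TEXT NOT NULL)""")
# The per-word lookup filters on Verse, so index it.
conn.execute("CREATE INDEX idx_sourceverses_verse ON SourceVerses (Verse, Strongs)")

load_source_verses(conn, {
    "JHN.1.1": ["G1722", "G0746", "G3056", "G3056", "G2316"],  # partial, toy list
    "JHN.1.2": ["G3778", "G2316"],
})
print(conn.execute("SELECT COUNT(*) FROM SourceVerses").fetchone()[0])  # 6
```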
Then I can write the Strong's numbers into the target text.
For each canonical word in the text
'select Strongs from StrongsGlosses'+langId+
' where Gloss="'+nakedWord+
'" and Strongs in (select Strongs from SourceVerses where Verse = "'+
verse+'") order by Count desc limit 1;' {Get the most common match in THIS verse.}
If the above select gives an answer, use it. If not, try again with sourceLetter set to 'G' for the NT or DC (deuterocanonical books) and 'H' for the OT:
'select Strongs from StrongsGlosses'+langId+
' where Gloss="'+nakedWord+'" and Strongs like "'+sourceLetter+
'%" order by Count desc limit 1;' {Get the most common match in any verse.}
Write the Strong's number into the target text.
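The two-stage lookup translates fairly directly into Python. This sketch uses sqlite3 in place of MariaDB, with hypothetical verse references and made-up counts. It binds the gloss, verse, and letter as query parameters instead of concatenating them into the SQL string, so quote characters in the data cannot break the query; table names cannot be bound, so the language code is checked before it is pasted into the table name.

```python
import sqlite3


def lookup_strongs(conn, lang_id, naked_word, verse, source_letter):
    """Return the best Strong's number for one word, or None.

    First prefer numbers that occur in the source text of THIS verse;
    otherwise fall back to the most common match for the testament.
    """
    if not lang_id.isalpha():  # guard: the table name is built from lang_id
        raise ValueError(f"bad language id: {lang_id!r}")
    table = f"StrongsGlosses{lang_id}"
    row = conn.execute(
        f"SELECT Strongs FROM {table} WHERE Gloss = ? AND Strongs IN "
        "(SELECT Strongs FROM SourceVerses WHERE Verse = ?) "
        "ORDER BY Count DESC LIMIT 1",
        (naked_word, verse)).fetchone()
    if row is None:
        row = conn.execute(
            f"SELECT Strongs FROM {table} WHERE Gloss = ? AND Strongs LIKE ? "
            "ORDER BY Count DESC LIMIT 1",
            (naked_word, source_letter + "%")).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StrongsGlossesfra (ID INTEGER PRIMARY KEY, Count INTEGER, Strongs TEXT, Gloss TEXT);
    CREATE TABLE SourceVerses (ID INTEGER PRIMARY KEY, Strongs TEXT, Verse TEXT);
""")
# Made-up counts: "parole" glosses both logos (G3056) and rhema (G4487).
conn.executemany("INSERT INTO StrongsGlossesfra (Count, Strongs, Gloss) VALUES (?, ?, ?)",
                 [(50, "G3056", "parole"), (20, "G4487", "parole")])
conn.execute("INSERT INTO SourceVerses (Strongs, Verse) VALUES ('G4487', 'LUK.1.38')")

print(lookup_strongs(conn, "fra", "parole", "LUK.1.38", "G"))  # G4487 (in-verse match)
print(lookup_strongs(conn, "fra", "parole", "MAT.1.1", "G"))   # G3056 (most common overall)
```

The in-verse query lets a less common but contextually correct number win over the globally most frequent one; the fallback only fires when nothing in the verse matches.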
The simplified pseudo-code above omits some details, like skipping words that are already tagged and normalizing Strong's numbers to always be "G" or "H" followed by exactly 4 digits (padding with leading zeroes, if needed). Inconsistencies in the way Strong's numbers are represented impede matching, making the logic more complicated than a simple string compare. Alternate numbering systems, like extended Strong's numbers, don't really help at this level unless they are applied universally to all the source languages that I use for templates.
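A minimal normalizer for the "G or H plus exactly 4 digits" rule might look like the Python sketch below. Returning None for anything else (including extended numbers with a letter suffix or a fifth digit) and accepting an optional default letter for bare numbers are assumptions of this sketch, not part of the process described above.

```python
import re

# Optional testament letter, optional spaces, leading zeroes, then 1-4 digits.
_STRONGS = re.compile(r"\s*([GHgh]?)\s*0*(\d{1,4})\s*")


def normalize_strongs(raw, default_letter=None):
    """Normalize a Strong's number to 'G' or 'H' plus exactly four digits.

    Returns None for forms this sketch does not handle.
    """
    m = _STRONGS.fullmatch(raw)
    if not m:
        return None
    letter = m.group(1).upper() or default_letter
    if letter not in ("G", "H"):
        return None
    return f"{letter}{int(m.group(2)):04d}"


print(normalize_strongs("G25"))        # G0025
print(normalize_strongs("h 430"))      # H0430
print(normalize_strongs("3056", "G"))  # G3056
print(normalize_strongs("G1234a"))     # None
```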
The rest of the Free Pascal code is parsing and writing the USFX XML. Most of the "heavy lifting" is done in SQL. I use MariaDB (a fork of MySQL) for that, because it is powerful and free, and because I use it for other things, too.
On 7/24/22 10:29, Robert Hunt wrote:
>
> Hi Michael,
>
> I'm very interested to learn more about your procedure because I'm wanting to work on a few related things this year:
>
> * attempt a combined/expanded name (person/location/deity) DB -- see https://github.com/Freely-Given-org/Bible_speaker_identification/discussions/11 and other discussions there
> * take the new SR GNT <https://greekcntr.org/collation/index.htm> (currently at RC1) and the OSHB <hb.openscriptures.org/> and add links to the above (SR GNT already has its extended Strongs with an extra digit, and IIRC OSHB has regular Strongs)
> * Manually add links from pronouns (e.g., "And /he/ said" might map to Pilate. "When Jesus arrived /there/" might map to Galilee), probably writing a specific, throw-away command-line or simple windowed interface to make this one-off task speedier.
> * Try to apply this information to a non-European language (Matigsalug from where we worked in the Philippines) to discover the issues involved.
>
> So you'll see some overlap with what you've done, and why I'm interested to learn more. I'm guessing you use Haiola to convert formats -- what's your intermediate format? I would use Python rather than Pascal, but would still be interested to view your code and DB if that's feasible. Possibly it would give me a head start; possibly it would just give me ideas on how to do it. (I played with some tagging ideas <https://freely-given.org/BibleTranslations/English/OET/Tags.html> a decade ago, but I'm currently leaning towards more of a native stand-off format with back-and-forth USFM3 converters.)
>
> Feel free to reply privately or move conversation if it's not appropriate here. But I'm sure that I'm not the only one with these types of aims, and others probably have more experience and/or better ideas than me.
>
> Nice job with DBS Cyber.Bible!
>
> Blessings,
> Robert.
>
> On 25/07/22 07:12, Michael Johnson wrote:
>> Yes, I have a script that works for both OT and NT in French, German, Spanish, and English. I also have one for the NT in Biblical Greek. I can start with USFX, USFM, or USX. I am running the process now for the old French Ostervald Bible. It isn't yet easy to share, because it uses a combination of Haiola, some custom Free Pascal code, some SQL, and a MariaDB database with the words, Strong's numbers, and where they are found. Anyway, if all goes well, fraFOB1744eb should have Strong's numbers embedded by tomorrow.
>>
>> There is a lot of room for improvement, especially in the database, so that equivalent words are also listed (like both spellings of Isaiah). That way, more words would find matches. Still, enough of the words are matched to be somewhat useful in study. See cyber.Bible <https://cyber.Bible/study/> and pull up some Bible translations in the above languages in parallel. The way the highlights move when you mouse over words is nice. Then, when you click on a word with a Strong's number, you get a Greek/English dictionary entry. Of course, there are a lot of languages not yet supported with Strong's numbers, and only one language for the dictionary so far, so we are in no danger of running out of work to do.
>>
>> On 7/24/22 06:25, Fr Cyrille wrote:
>>> This is very nice news!!
>>> I'm working on the frecrampon with Strong's numbers. Michael, do you have a script to merge the Strong's numbers for the OT books? I already did it for the NT books.
>>> I work first with the usfm files: https://gitlab.com/crosswire-bible-society/neo-crampon-libre
>>>
>>> On 24/07/2022 at 08:06, Michael Johnson wrote:
>>>> There is also Strongs support for frasbl2022eb with the same Strong's numbers merged from the Louis Segond.
>>>>
>>>> On 7/23/22 13:29, Timmy wrote:
>>>>> Good news. French F10 now has Strongs support. Someone contacted Michael Johnson from eBible, who was able to merge back in the Strongs numbers. See https://github.com/AndBible/and-bible/issues/1774
>>>>>
>>>>> Kind regards,
>>>>> --
>>>>> Timmy Braun
>>>>>
>>>>>
>>>>> On Thu, Feb 17, 2022 at 1:27 PM Fr Cyrille <fr.cyrille at tiberiade.be> wrote:
>>>>>
>>>>> This is amazing!!!!
>>>>> It helps a lot! A few questions:
>>>>>
>>>>> * I'm working with the USFM file; any idea for the conversion? If not, I can convert the imp to USFM after migratetags has done its work.
>>>>>
>>>>> * How can I improve the script? For instance, the source has Esaïe but the output has Isaïe.
>>>>>
>>>>> * Do you also have the script for the Old Testament?
>>>>>
>>>>> First I need to finish improving the source.
>>>>>
>>>>>
>>>>>
>>>>> On 17/02/2022 at 18:47, Troy A. Griffitts wrote:
>>>>>>
>>>>>> :)
>>>>>>
>>>>>> OK. So, I've updated the source a bit with improvements we made for the NA28 module. If you already have SWORD built and installed, or have installed the sword-dev packages for your distribution, you should be able to:
>>>>>>
>>>>>> cd ~/src
>>>>>>
>>>>>> svn co https://crosswire.org/svn/sword-tools/trunk sword-tools
>>>>>>
>>>>>> cd sword-tools/migratetags
>>>>>>
>>>>>> make
>>>>>>
>>>>>> ./migratetags --help
>>>>>>
>>>>>> ./migratetags -ss KJV -t BBE > BBE.imp
>>>>>>
>>>>>> # or more practically, you will probably wish to use the -v flag to see how the mapping is going:
>>>>>>
>>>>>> ./migratetags -ss KJV -t BBE -v | less
>>>>>>
>>>>>>
>>>>>> Let me know if that doesn't get you started well enough to improve things on your own.
>>>>>>
>>>>>> Troy
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/17/22 08:51, Fr Cyrille wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 17/02/2022 at 15:57, Troy A. Griffitts wrote:
>>>>>>>> Dear Fr. Cyrille,
>>>>>>>>
>>>>>>>> Have a look here. We regularly use this for migrating tags between modules. You can use one of the default matchers or try the GNT matcher to see how it does or use it as an example to write a more "French-specific" matcher.
>>>>>>>>
>>>>>>>> Let me know if you'd like help getting started:
>>>>>>>>
>>>>>>>> http://crosswire.org/svn/sword-tools/trunk/migratetags/
>>>>>>> It's Chinese to me... How do I use this?
>>>>>>>>
>>>>>>>> On February 17, 2022 7:49:07 AM MST, Fr Cyrille <fr.cyrille at tiberiade.be> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 17/02/2022 at 15:32, Timmy wrote:
>>>>>>>>> Greetings. Lately there have been a few requests in AndBible support for a French Bible with Strongs support. One even sounded like they had one before, but it seems that module (F10) no longer has Strongs. I don't know its history.
>>>>>>>>>
>>>>>>>>> So my question, are there any modules publicly available in French with Strongs support? If so, which ones?
>>>>>>>> I'm working on it, and any help is welcome... I have a private source of the Segond with Strong's numbers (from BibleWorks), and I'm looking for a way to merge the Strong's numbers into my translation (a modernized Crampon). I have already added some numbers to it. See the module here: https://gitlab.com/crosswire-bible-society/neo-crampon-libre
>>>>>>>>>
>>>>>>>>> Thanks and God bless,
>>>>>>>>> --
>>>>>>>>> Timmy Braun
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> sword-devel mailing list:sword-devel at crosswire.org
>>>>>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>>>>>> Instructions to unsubscribe/change your settings at above page
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>> --
>>
>> Aloha,
>> Michael Johnson
>> 26 HIWALANI LOOP • MAKAWAO HI 96768-8747 • USA
>> mljohnson.org <https://mljohnson.org/> • eBible.org <https://eBible.org> • WorldEnglish.Bible <https://WorldEnglish.Bible> • PNG.Bible <https://PNG.Bible>
>> Signal/Telegram/WhatsApp/Telephone: +1 808-333-6921
>> Skype: kahunapule • Telegram/Twitter: @kahunapule • Facebook: fb.me/kahunapule <https://www.facebook.com/kahunapule>
>>
>>
>