[sword-devel] Fw: Re: [modules] New Beta Module: Tyndale
David Haslam
dfhdfh at protonmail.com
Sun May 11 16:35:03 EDT 2025
I'm forwarding this to the wider community, in order to obtain a response regarding my suggestion that we design a new SWORD filter to process abbreviations.
See my last reply to the modules team for details.
Best regards,
David
Sent with [Proton Mail](https://pr.tn/ref/SWXT9A5YZ67G) secure email.
------- Forwarded Message -------
From: David Haslam <dfhdfh at protonmail.com>
Date: On Sunday, May 11th, 2025 at 4:45 PM
Subject: Re: [modules] New Beta Module: Tyndale
To: domcox at crosswire.org <domcox at crosswire.org>, Fr Cyrille <fr.cyrille at tiberiade.be>
CC: modules at crosswire.org <modules at crosswire.org>
> Dear all,
>
> Today, I have begun to examine the use of Roman numerals to translate numbers in the Tyndale module text exported using diatheke.
>
> The following records match a simple PCRE that looks for words consisting entirely of the lowercase letters permitted in Roman numerals.
>
> Here's my PCRE: [ijvxlcdm]+
>
> The search was performed on the word frequency analysis already done using BabelPad Tools.
>
>> 1 cxliiii
>> 38 did
>> 15 i
>> 16 ii
>> 1 iic
>> 22 iii
>> 31 iiii
>> 1 iiiii
>> 68 iiij
>> 81 iij
>> 137 ij
>> 16 ix
>> 25 l
>> 1 li
>> 1 liii
>> 2 liiij
>> 3 liij
>> 1 lij
>> 2 lix
>> 3 lvij
>> 10 lx
>> 2 lxi
>> 2 lxiiij
>> 4 lxij
>> 1 lxix
>> 4 lxv
>> 1 lxvj
>> 25 lxx
>> 2 lxxiiij
>> 2 lxxiij
>> 2 lxxij
>> 6 lxxv
>> 1 lxxvi
>> 1 lxxvij
>> 3 lxxx
>> 1 lxxxiij
>> 2 lxxxij
>> 2 lxxxvi
>> 1 lxxxvij
>> 1 lxxxx
>> 1 m
>> 7 mi
>> 1 mid
>> 86 v
>> 26 vi
>> 43 vii
>> 5 viii
>> 26 viij
>> 133 vij
>> 5 vj
>> 51 x
>> 9 xi
>> 45 xii
>> 1 xiiii
>> 20 xiiij
>> 4 xiij
>> 31 xij
>> 1 xix
>> 1 xj
>> 59 xl
>> 2 xli
>> 2 xlii
>> 3 xliiii
>> 1 xliij
>> 1 xlij
>> 1 xlix
>> 4 xlv
>> 3 xlvi
>> 1 xlviij
>> 1 xlvij
>> 18 xv
>> 6 xvi
>> 3 xviii
>> 1 xviij
>> 4 xvij
>> 53 xx
>> 1 xxiii
>> 7 xxiiii
>> 2 xxiiij
>> 1 xxiij
>> 3 xxij
>> 2 xxix
>> 1 xxj
>> 2 xxv
>> 2 xxviij
>> 2 xxvij
>> 51 xxx
>> 1 xxxiiij
>> 3 xxxiij
>> 6 xxxij
>> 3 xxxv
>> 2 xxxvi
>> 1 xxxviii
>> 1 xxxviij
>> 5 xxxvij
>
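
The filtering step described above can be sketched in Python. This is only a minimal sketch, assuming the word-frequency export is plain text with one "count word" pair per line, as in the listing above. The function name candidate_numerals is hypothetical, and fullmatch() is used so that only whole words made of the permitted letters are kept.

```python
import re

# Same character class as the PCRE in the message; fullmatch() anchors it
# to the whole word, which is what "consist entirely of" implies.
ROMAN_CHARS = re.compile(r"[ijvxlcdm]+")

def candidate_numerals(lines):
    """Yield (count, word) for words made only of lowercase Roman-numeral letters.

    Assumes each line looks like "<count> <word>" (hypothetical export format).
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        count, word = int(parts[0]), parts[1]
        if ROMAN_CHARS.fullmatch(word):
            yield count, word

sample = ["1 cxliiii", "38 did", "137 ij", "12 the", "59 xl"]
print(list(candidate_numerals(sample)))
```

Ordinary words such as "did" still pass this filter, so the output is a list of candidates to review rather than a list of confirmed numerals.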
> Observations:
>
> - Most of the numbers in verse text that potentially match Roman numerals are lowercase.
> - There are 103 unique strings that potentially match Roman numerals irrespective of case.
> - There are 95 unique strings that potentially match lowercase Roman numerals.
> - A few of these can be discounted as being ordinary words: "did", "mi", "mid", etc.
> - Arabic numeral 4 is often represented as either "iiii" or "iiij" instead of "iv" reflecting the usage of that period.
> - The use of the alternative final letter "j" in place of "i" is likely to be a printer's flourish of that period.
> - The vast majority of such strings found in verse text are marked with a period (full stop) before and after, e.g. ".xxx."
> - Some strings omit one or both of these period delimiters!
> - Some strings are wrongly preceded by ". " rather than " ." (misplaced delimiter due to OCR error?)
> - The total number of matches to PCRE "\W[ijvxlcdm]+\W" (without the quotes) is 1293
> - Of those 1293, only 958 match the PCRE "\.[ijvxlcdm]+\." (i.e. with both of the proper period delimiters).
> - That leaves 335 instances in which there's a missing or misplaced period delimiter (or which are ordinary words).
> - Searching for patterns that include uppercase Roman numerals is more difficult because of the very common word "I" (first person pronoun).
> - The total number of matches to PCRE "\W[ijvxlcdmJVXLCDM]+\W" (without the quotes) is 1314.
> - That means we thereby discovered 21 further potential candidates in which at least one letter is uppercase, excluding "I".
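
The delimiter counts in the observations can be reproduced along the following lines. This is a sketch under stated assumptions: the exported verse text is available as a single string, Python's re module stands in for PCRE (these two patterns use syntax common to both), and the function name delimiter_report is hypothetical.

```python
import re

LOWER = r"[ijvxlcdm]+"
# Any non-word character on both sides, as in "\W[ijvxlcdm]+\W".
ANY_DELIM = re.compile(r"\W(" + LOWER + r")\W")
# A period on both sides, as in "\.[ijvxlcdm]+\.".
BOTH_PERIODS = re.compile(r"\.(" + LOWER + r")\.")

def delimiter_report(text):
    """Count candidate numerals and how many carry both period delimiters."""
    total = len(ANY_DELIM.findall(text))
    proper = len(BOTH_PERIODS.findall(text))
    # "suspect" mixes missing or misplaced delimiters with ordinary words.
    return {"total": total, "proper": proper, "suspect": total - proper}

# Invented sample text, for illustration only (not taken from the module).
sample = "he lived .xxx. yeares and begat . xij sonnes and did evyll"
print(delimiter_report(sample))
```

Because each match consumes its delimiters, two candidates that share a single delimiter character would be counted only once. Lookarounds would avoid that, at the cost of departing from the patterns quoted above.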
>
> If the Tyndale Bible was printed consistently, with every number in lowercase and delimited by two periods,
> then it is apparent that the digitised text fails to transcribe many of these numbers faithfully!
>
> We therefore require the upstream source to be thoroughly checked in this regard, and edited to fix all such OCR errors.
>
> Looking to the future, we might also make good use of the OSIS element abbr to encode all such numbers. E.g.
>
> <abbr type="x-Roman" expansion="30">.xxx.</abbr>
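
Markup of this kind could be generated mechanically once a token is confirmed to be a number. The sketch below is illustrative only: the helper names roman_to_int and abbr_element are hypothetical, and the conversion assumes the additive forms ("iiii", "lxxxx") and the final "j" noted in the observations. A string such as "iic" is ambiguous under this simple rule and would need manual review.

```python
from xml.sax.saxutils import escape

VALUES = {"i": 1, "j": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

def roman_to_int(numeral):
    """Convert a Roman numeral, tolerating additive forms such as "iiii"
    and a final "j" used in place of "i"."""
    values = [VALUES[ch] for ch in numeral.lower()]
    total = 0
    for current, following in zip(values, values[1:] + [0]):
        # A smaller value before a larger one is subtractive (e.g. "xl" = 40).
        total += -current if current < following else current
    return total

def abbr_element(token):
    """Wrap a period-delimited token such as ".xxx." in an OSIS abbr element."""
    value = roman_to_int(token.strip("."))
    return f'<abbr type="x-Roman" expansion="{value}">{escape(token)}</abbr>'

print(abbr_element(".xxx."))
print(roman_to_int("iiij"), roman_to_int("xl"), roman_to_int("cxliiii"))
```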
>
> Aside: It would be a cool enhancement to the SWORD API to provide support for a new filter:
>
> GlobalOptionFilter=OSISExpandAbbreviations
>
> cf. Does the SWORD API already provide any support for the abbr element? If so, what is the functionality?
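
As a rough illustration of what an abbreviation-expanding filter might do, the Python sketch below replaces each abbr element with the value of its expansion attribute. This is not SWORD API code and says nothing about how such a filter would be implemented in the engine. The function name expand_abbreviations is hypothetical, and the input is assumed to be a small well-formed fragment without XML namespaces.

```python
import xml.etree.ElementTree as ET

def expand_abbreviations(verse_xml):
    """Replace every <abbr expansion="..."> element with its expansion text."""
    root = ET.fromstring(verse_xml)
    # Snapshot the elements first, since the tree is modified while looping.
    for parent in list(root.iter()):
        previous = None  # last child kept so far; receives the spliced text
        for child in list(parent):
            if child.tag == "abbr" and "expansion" in child.attrib:
                text = child.get("expansion") + (child.tail or "")
                if previous is None:
                    parent.text = (parent.text or "") + text
                else:
                    previous.tail = (previous.tail or "") + text
                parent.remove(child)
            else:
                previous = child
    return ET.tostring(root, encoding="unicode")

# Invented sample verse, for illustration only.
sample = '<verse>And he lived <abbr type="x-Roman" expansion="30">.xxx.</abbr> yeares</verse>'
print(expand_abbreviations(sample))
```

With the option switched off, a front-end would simply display the original ".xxx." content of the element unchanged.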
>
> Best regards,
>
> David
>
> On Sunday, May 11th, 2025 at 3:35 PM, David Haslam <dfhdfh at protonmail.com> wrote:
>
>> Dear Cyrille, dear Dom,
>>
>> In numerous places, the digital text of the Tyndale module omits the macron over a vowel that's there in the original printed pages. e.g. Abraha - should be Abrahā.
>>
>> This is just one example of the many kinds of deficiencies in the upstream source.
>>
>> Fixing these in the upstream source would require a lot of intensive effort.
>>
>> Best regards,
>>
>> David
>>
>> On Wednesday, May 7th, 2025 at 7:29 PM, David Haslam <dfhdfh at protonmail.com> wrote:
>>
>>> Hi Cyrille,
>>>
>>> Unless users know what the MALTESE CROSS & the CROSS PATTY WITH RIGHT CROSSBAR actually denote, how does including them help the Bible student?
>>>
>>> - Can we try to find out more?
>>> - Would ChatGPT help in any way?
>>>
>>> Best regards,
>>>
>>> David
>>>
>>> On Wednesday, May 7th, 2025 at 6:30 PM, Fr Cyrille <fr.cyrille at tiberiade.be> wrote:
>>>
>>>> On 07/05/2025 at 15:08, David Haslam wrote:
>>>>
>>>>> Hi Cyrille,
>>>>>
>>>>> Why was only one correction made?
>>>>>
>>>>> I listed two locations where the verse hadn't been properly referenced!
>>>>>
>>>>> - You have fixed Acts 9:38:
>>>>> - You have not fixed Revelation of John 1:9:
>>>>
>>>> I did, but I missed the osisID....
>>>>
>>>>> And those two types of peculiar symbol are all still there!
>>>>>
>>>>> - 3 of U+2720 ✠ MALTESE CROSS
>>>>> - 5 of U+2E50 ⹐ CROSS PATTY WITH RIGHT CROSSBAR
>>>>
>>>> OK, do you want them to be removed?
>>>>
>>>>> Best regards,
>>>>>
>>>>> David
>>>>>
>>>>> On Wednesday, May 7th, 2025 at 1:50 PM, domcox at crosswire.org <domcox at crosswire.org> wrote:
>>>>>
>>>>>> This is to announce that we have just now uploaded Tyndale
>>>>>> in the CrossWire beta repository for testing purposes.
>>>>>>
>>>>>> If no concern or quality alert has been raised on the list,
>>>>>> Tyndale will be published in a week.
>>>>>>
>>>>>> This is an update.
>>>>>> Language=English
>>>>>> Version=2.0
>>>>>> History_2.0=(2025-05-07) New source
>>>>>> TextSource=https://en.wikisource.org/wiki/Bible_(Tyndale)
>>>>>> Versification=KJV
>>>>>>
>>>>>> Many thanks to everyone who contributed to this release.
>>>>>>
>>>>>> yours
>>>>>>
>>>>>> P.S.: This email is sent automatically.
>>>>>>
>>>>>> _______________________________________________
>>>>>> modules mailing list
>>>>>> modules at crosswire.org
>>>>>> http://www.crosswire.org/mailman/listinfo/modules
>>>>
>>>> --
>>>> Do you love the Bible? Are you a theology student? Use the free application [Xiphos](https://xiphos.org/) or [Andbible](https://andbible.github.io/) and get access to source texts, commentaries, dictionaries and many other features... Contact me for French translations.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Analysis.7z
Type: application/x-compressed
Size: 45153 bytes
Desc: not available
URL: <http://crosswire.org/pipermail/sword-devel/attachments/20250511/246c2f05/attachment-0001.bin>