[jsword-devel] Patching from one versification to another

Thu Mar 14 18:06:50 MST 2013

When searching the archives, use the word "mapping"; there are some pertinent discussions.

If you come up with a mapping that works, that's great! If our data structure is very clear and can be migrated to C++ (by someone else), that's even better. If we can externalize it as a file that can be loaded by JSword, then SWORD can use it directly.

BTW, I hope to be able to externalize the V11N so that a module can supply one. With mapping (patching), another file would need to be supplied.

The representation of a mapping in Java is a lot easier (as you have described) than getting the data correct.

Some embedded comments below:

On Mar 14, 2013, at 7:03 PM, Chris Burrell <chris at burrell.me.uk> wrote:

> I will wait to see if anyone responds, but I thought I'd emailed a few months ago and got no response (although I can't find my original email). While I understand that support for versification as a whole might be tricky, I'm not sure I agree that patching from one to another is tricky as well.
> 
> If you allow your master versification to have split verses, then everything becomes rather trivial:
> 
> 2 verses in the original, 1 verse in the KJV-based master
> Gen.1.1=Gen.1.1
> Gen.1.2=Gen.1.1

Right.
> 
> 1 verse in the original, 2 verses in the master
> Gen.1.1=Gen.1.1-Gen.1.2
Is there a good representation for the rest of the verses being off by one?

> 
> Using splits, so that 2 separate versifications can refer to the same parts of verses
> Gen.1.1=Gen.1.1a
> Gen.1.2=Gen.1.1b
> Gen.1.3=Gen.1.1c

I'm not sure that splits should be represented. The modules certainly wouldn't mark the division between a, b, c, ....

> 
> Then another versification can also refer to the same parts. It would become necessary to keep track of what the parts are, so that they can easily be reused when appropriate. For example, a second versification might be mapped as follows:
> 
> Gen.1.1=Gen.1.1a-Gen.1.1b
> Gen.1.2=Gen.1.1c
> 
> No mapping
> OSIS IDs mean the same thing.
> 
yes.
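A minimal Java sketch of how the `source=master` lines above could be held in memory and loaded from an external file, assuming: one mapping per line, an unmapped OSIS ID means the same verse in the master, right-hand ranges stay inside one chapter, and split parts carry a single trailing letter. The class `VerseMapping` and the `#` comment convention are hypothetical illustrations, not JSword API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: one versification's "source=master" lines.
// OSIS IDs with no line are assumed to mean the same verse in the master.
final class VerseMapping {
    // Book.Chapter.Verse with an optional one-letter split suffix, e.g. Gen.1.1a
    private static final Pattern ID = Pattern.compile("([^.]+)\\.(\\d+)\\.(\\d+)([a-z]?)");

    private final Map<String, List<String>> toMaster = new LinkedHashMap<>();

    /** Parses one mapping per line; blank lines and lines starting with '#' are skipped. */
    static VerseMapping parse(String text) {
        VerseMapping mapping = new VerseMapping();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                mapping.addLine(trimmed);
            }
        }
        return mapping;
    }

    void addLine(String line) {
        String[] sides = line.split("=", 2);
        if (sides.length != 2) {
            throw new IllegalArgumentException("Expected source=master: " + line);
        }
        toMaster.put(sides[0].trim(), expandRange(sides[1].trim()));
    }

    /** Master IDs for a source ID; an ID without a mapping line maps to itself. */
    List<String> lookup(String sourceId) {
        return toMaster.getOrDefault(sourceId, List.of(sourceId));
    }

    /** Expands Gen.1.1-Gen.1.3 or Gen.1.1a-Gen.1.1c; a single ID is returned as-is. */
    static List<String> expandRange(String ref) {
        if (ref.isEmpty()) {
            return List.of(); // e.g. "Ps.53.1=" maps to nothing
        }
        String[] ends = ref.split("-", 2);
        if (ends.length == 1) {
            return List.of(ref);
        }
        Matcher a = ID.matcher(ends[0]);
        Matcher b = ID.matcher(ends[1]);
        if (!a.matches() || !b.matches()
                || !a.group(1).equals(b.group(1)) || !a.group(2).equals(b.group(2))) {
            throw new IllegalArgumentException("Only same-chapter ranges are supported: " + ref);
        }
        String prefix = a.group(1) + "." + a.group(2) + ".";
        List<String> out = new ArrayList<>();
        if (a.group(4).isEmpty() && b.group(4).isEmpty()) {
            // Whole verses: Gen.1.1-Gen.1.3
            int first = Integer.parseInt(a.group(3));
            int last = Integer.parseInt(b.group(3));
            for (int v = first; v <= last; v++) {
                out.add(prefix + v);
            }
        } else if (a.group(3).equals(b.group(3)) && !a.group(4).isEmpty() && !b.group(4).isEmpty()) {
            // Parts of one verse: Gen.1.1a-Gen.1.1c
            for (char c = a.group(4).charAt(0); c <= b.group(4).charAt(0); c++) {
                out.add(prefix + a.group(3) + c);
            }
        } else {
            throw new IllegalArgumentException("Cannot mix whole verses and parts: " + ref);
        }
        return out;
    }

    public static void main(String[] args) {
        VerseMapping m = parse("Gen.1.1=Gen.1.1-Gen.1.2\nGen.1.2=Gen.1.1c\n");
        System.out.println(m.lookup("Gen.1.1"));  // [Gen.1.1, Gen.1.2]
        System.out.println(m.lookup("Exod.1.1")); // [Exod.1.1]
    }
}
```

Keeping the right-hand side as a list lets merged verses, split verses and chapter-boundary shifts (`Gen.1.2=Gen.2.1`) share one representation.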

> I think the above covers "Split verse", "Merged verse" and "Different verse boundaries".
> 
> Chapter boundaries can be mapped the same way:
> Gen.1.1=Gen.1.1
> Gen.1.2=Gen.2.1

Having a compact representation for the resulting shift would be good.

> 
> indicates that Genesis 1:2 can be found in Gen 2:1 in the master.
> 
> Extra verses
> We simply introduce some new IDs (OSIS IDs, or another form of unique identifier) in the master to identify the content of these verses.
> 
> Additional verses within a chapter can be represented using the above.

So, say the KJV has a chapter 4 with 20 verses, and a v11n B differs from the KJV in that B has verses 6-11 that are not in the KJV, while verse 12 to the end of the chapter in B is the same as verses 6-20 in the KJV.

So then the base is not the KJV itself but a modified representation of the KJV that has 6 more verses in chapter 4. This modified representation is what I was calling the "rosetta stone".

I think a mapping has to be bi-directional.
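One way to make a single set of lines bi-directional, sketched in Java under the assumption that mappings are stored as a `source -> master` table: invert it for the master-to-source direction. The identity fallback needs a guard, because a source verse that has been remapped elsewhere must not also claim its own number in the master. `ReverseMapping` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derive master -> source from the same source -> master table,
// so each versification needs only one set of mapping lines.
final class ReverseMapping {
    /** Inverts source->master into master->source; merged verses yield several sources. */
    static Map<String, List<String>> invert(Map<String, List<String>> toMaster) {
        Map<String, List<String>> toSource = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : toMaster.entrySet()) {
            for (String masterId : e.getValue()) {
                toSource.computeIfAbsent(masterId, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return toSource;
    }

    /**
     * Source IDs for a master ID. Falls back to identity only when the source verse
     * with the same name has not been remapped somewhere else.
     */
    static List<String> toSource(String masterId,
                                 Map<String, List<String>> toMaster,
                                 Map<String, List<String>> inverted) {
        List<String> hit = inverted.get(masterId);
        if (hit != null) {
            return hit;
        }
        return toMaster.containsKey(masterId) ? List.of() : List.of(masterId);
    }

    public static void main(String[] args) {
        // Two source verses merged into master Gen.1.1; the next one shifts down by one.
        Map<String, List<String>> toMaster = new LinkedHashMap<>();
        toMaster.put("Gen.1.1", List.of("Gen.1.1"));
        toMaster.put("Gen.1.2", List.of("Gen.1.1"));
        toMaster.put("Gen.1.3", List.of("Gen.1.2"));
        Map<String, List<String>> inverted = invert(toMaster);
        System.out.println(toSource("Gen.1.1", toMaster, inverted)); // [Gen.1.1, Gen.1.2]
        System.out.println(toSource("Gen.2.1", toMaster, inverted)); // [Gen.2.1]
    }
}
```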

Missing verses
A v11n might leave out something in the middle of a KJV chapter. (The ESV has this, but shrewdly has a note in those verses explaining that the verse is not Biblical).

You said that if a mapping is not present then the verses are the same between the two. How do you represent a hole?
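A possible answer to the hole question, sketched in Java: let an empty side of a line mean "nothing". Here `=Matt.17.21` (hypothetical syntax) declares a master verse that this versification lacks, and `Ps.53.1=` declares a source verse with no master counterpart, matching the "nothing" option suggested for Psalm headings. `HoleAwareMapping` is a hypothetical name, and the right-hand side is limited to a single ID for brevity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an empty side of a mapping line means "nothing".
//   "=Matt.17.21"  master verse with no counterpart in this versification (a hole)
//   "Ps.53.1="     source verse with no counterpart in the master
// Any other ID without a line still maps to itself.
final class HoleAwareMapping {
    private final Map<String, List<String>> toMaster = new HashMap<>();
    private final Set<String> holes = new HashSet<>();

    void addLine(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("Expected source=master: " + line);
        }
        String source = line.substring(0, eq).trim();
        String master = line.substring(eq + 1).trim();
        if (source.isEmpty() && master.isEmpty()) {
            throw new IllegalArgumentException("Both sides are empty: " + line);
        }
        if (source.isEmpty()) {
            holes.add(master);
        } else {
            toMaster.put(source, master.isEmpty() ? List.of() : List.of(master));
        }
    }

    /** Master IDs for a source ID; empty for a hole or a verse with no master counterpart. */
    List<String> masterFor(String sourceId) {
        List<String> hit = toMaster.get(sourceId);
        if (hit != null) {
            return hit;
        }
        // Without this check a same-named ID would map onto the missing verse by identity.
        return holes.contains(sourceId) ? List.of() : List.of(sourceId);
    }

    /** True when the master verse has been declared missing from this versification. */
    boolean isHole(String masterId) {
        return holes.contains(masterId);
    }

    public static void main(String[] args) {
        HoleAwareMapping v11n = new HoleAwareMapping();
        v11n.addLine("=Matt.17.21");
        System.out.println(v11n.masterFor("Matt.17.21")); // []
        System.out.println(v11n.masterFor("Matt.17.22")); // [Matt.17.22]
    }
}
```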
> 
> Psalm headings:
> Ps.53.1=Ps.53.1 (not required, or required if we map it to 'nothing' or '0')
> Ps.53.2=Ps.53.1
> Ps.53.3=Ps.53.2
> 
> We can reduce this if we want to introduce a less verbose way of mapping things:
> Ps.53.2-7-=Ps.53.1-+1 (where minus simply indicates that there is an offset of 1 compared to the master)
I hadn't noticed this when I asked about a compact representation.
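A sketch of how a compact offset line could expand into ordinary one-verse mappings. The syntax `Ps.53.2-7=Ps.53.1` used here is an assumption, since the notation above is not fully specified: source verses 2 to 7 map one-to-one onto master verses starting at 1, which is the constant shift of -1 caused by the Psalm heading. `OffsetRange` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical compact syntax: "Ps.53.2-7=Ps.53.1" means source verses 2..7 map
// one-to-one onto master verses starting at 1, i.e. a constant offset of -1.
final class OffsetRange {
    private static final Pattern LINE =
            Pattern.compile("([^.=]+)\\.(\\d+)\\.(\\d+)-(\\d+)=([^.=]+)\\.(\\d+)\\.(\\d+)");

    /** Expands one compact line into the equivalent single-verse source=master pairs. */
    static Map<String, String> expand(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a compact offset line: " + line);
        }
        int first = Integer.parseInt(m.group(3));
        int last = Integer.parseInt(m.group(4));
        int masterStart = Integer.parseInt(m.group(7));
        if (last < first) {
            throw new IllegalArgumentException("Range runs backwards: " + line);
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int v = first; v <= last; v++) {
            String source = m.group(1) + "." + m.group(2) + "." + v;
            String master = m.group(5) + "." + m.group(6) + "." + (masterStart + (v - first));
            pairs.put(source, master);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints Ps.53.2=Ps.53.1 through Ps.53.7=Ps.53.6
        expand("Ps.53.2-7=Ps.53.1").forEach((s, t) -> System.out.println(s + "=" + t));
    }
}
```

Expanding at load time keeps the lookup code unchanged; only the file format gets shorter.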

> 
> I don't think we need to introduce splits on the left-hand side. The reason is that you can't do anything with an OSIS ID of Gen.1.1a, since you're going to retrieve a whole verse anyway, so we can keep splits only ever on the right-hand side.
> 

I don't think we need splits on either side.

> For verse ranges, we can first expand the range to the list of verses it contains in the source versification. Then the user/software needs to choose whether we're attempting to compare a contiguous section in the target versification, or verses of the same content.
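The two choices for a verse range can be sketched in Java as follows, simplified to verse numbers within a single chapter. `SAME_CONTENT` returns only the target verses the source verses map onto; `CONTIGUOUS` fills in everything between the first and last of them. The names `RangeCompare` and `Mode` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of the two choices for a source range, within one chapter.
// verseMap holds explicit source->target verse numbers; missing keys map to themselves.
final class RangeCompare {
    enum Mode { SAME_CONTENT, CONTIGUOUS }

    static List<Integer> mapRange(int first, int last, Map<Integer, List<Integer>> verseMap, Mode mode) {
        // 1. Expand the range to its verses and map each one.
        TreeSet<Integer> hits = new TreeSet<>();
        for (int v = first; v <= last; v++) {
            hits.addAll(verseMap.getOrDefault(v, List.of(v)));
        }
        // 2a. Same content: exactly the target verses the source verses landed on.
        if (hits.isEmpty() || mode == Mode.SAME_CONTENT) {
            return new ArrayList<>(hits);
        }
        // 2b. Contiguous: everything from the first to the last target verse.
        List<Integer> span = new ArrayList<>();
        for (int v = hits.first(); v <= hits.last(); v++) {
            span.add(v);
        }
        return span;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> moved = Map.of(5, List.of(9)); // source v5 is target v9
        System.out.println(mapRange(4, 6, moved, Mode.SAME_CONTENT)); // [4, 6, 9]
        System.out.println(mapRange(4, 6, moved, Mode.CONTIGUOUS));   // [4, 5, 6, 7, 8, 9]
    }
}
```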
> 
> ---
> 
> I saw some posts in the archives, but not a lot. One was by Greg H, with whom I agree, saying that the KJV versification should be the master plus extended bits from the Apocrypha. That makes it easier for people to write mappings.
> 
> I agree with your last point, that coming up with the mappings can be hard, and it will take someone who understands where the verses really differ in content. But I think the above system is pretty simple.
> 
> I disagree with the idea of having a master versification being "the rosetta stone", if by that we mean inflexible to change.

I didn't mean it should be inflexible to change. It may be resistant to change.

> There are bound to be changes (new/custom versifications, bug fixes, oversights, etc.) We can easily provide a tool that inserts splits and rewrites all the mappings safely for that.

Yes.
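A minimal Java sketch of such a split tool, assuming every versification's mapping is a mutable `source -> master` table and parts are named with a trailing letter (`Gen.1.1a`, `Gen.1.1b`, ...). Every whole-verse reference to the split verse is replaced by the full list of parts, so each existing mapping still covers the same content. `SplitTool` is a hypothetical name, and the sketch does not know about holes: it assumes a table with no line for the verse contains it under the same number.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the split tool: when a master verse is divided into parts
// (Gen.1.1 -> Gen.1.1a, Gen.1.1b), rewrite every versification's source->master table
// so each existing mapping still covers exactly the same content.
final class SplitTool {
    static void split(List<Map<String, List<String>>> tables, String masterId, int parts) {
        if (parts < 2 || parts > 26) {
            throw new IllegalArgumentException("parts must be between 2 and 26");
        }
        List<String> pieces = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            pieces.add(masterId + (char) ('a' + i));
        }
        for (Map<String, List<String>> table : tables) {
            // Assumption: a table with no line for this ID maps it by identity and does
            // contain the verse, so that implicit mapping is written out explicitly.
            table.putIfAbsent(masterId, pieces);
            for (Map.Entry<String, List<String>> e : table.entrySet()) {
                List<String> rewritten = new ArrayList<>();
                for (String id : e.getValue()) {
                    if (id.equals(masterId)) {
                        rewritten.addAll(pieces);
                    } else {
                        rewritten.add(id);
                    }
                }
                e.setValue(rewritten);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        merged.put("Gen.1.1", List.of("Gen.1.1", "Gen.1.2"));
        Map<String, List<String>> kjvLike = new LinkedHashMap<>(); // identity everywhere
        split(List.of(merged, kjvLike), "Gen.1.1", 2);
        System.out.println(merged);  // {Gen.1.1=[Gen.1.1a, Gen.1.1b, Gen.1.2]}
        System.out.println(kjvLike); // {Gen.1.1=[Gen.1.1a, Gen.1.1b]}
    }
}
```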

> 
> I'm happy to work with others if they want to work on this, but I've found that most of the posts on the lists about this are rather old. I'll wait and see what comes of my post, but on the other hand, STEP needs this rather quickly (for text comparisons, parallel texts and interlinears). I have the mappings for the Masoretic Text, which would solve most of STEP's interlinear issues when using 1 or more ancient texts. And there's nothing to stop us from putting in the mappings one by one, especially if we provide a tool that can do that.

Go for it.

> 
> To be clear, I'm not trying to solve your last point of "In each of these it is important to determine what will be represented by a v11n".

My point was that the granularity of a split would never be represented in the data structures of a v11n or in a module, so it can safely be ignored.

> Someone else can do this when they come up with a different v11n. The problem of mapping them is distinctly separate, and it's the one I'm trying to address.
> 
> Maybe I'm missing something?

Be ready for surprises. I don't know anything that's missing.

> Chris
> 
> 
> 
> On 13 March 2013 22:46, DM Smith <dmsmith at crosswire.org> wrote:
> The work has started. And it is very hard. Not trying to discourage you, but it'd be better to work with others. Give it some time. Perhaps search the sword-devel archives for discussions regarding the difficulties and the work that needs to go into it.
> Harry Plantinga (over at CCEL) has done some work on this, too. (Hope I have his name right.)
> 
> Here are some of the issues of comparing one version to another:
> Split verse. This causes an increase in the number of verses in one chapter.
> Merged verses. This causes a decrease in the number of verses in one chapter.
> Different verse boundaries. This causes no change in the number of verses in one chapter. (E.g. some Greek NTs have the end of John 1:3 as the start of John 1:4.)
> Split chapters. This reduces the number of verses in one chapter and increases the number of chapters in a book. (Malachi has four chapters in the KJV but three in the German tradition.)
> Different chapter boundaries. This causes the number of verses in one chapter to increase (decrease) and in the next to decrease (increase). (The Greek NTs sometimes put the last few verses of one chapter at the beginning of the next, or the first few verses of one chapter at the end of the prior one.)
> Additional or fewer verses at the end of a chapter. (Mark 16:9-20, which some regard as an apocryphal addition to the NT.)
> Additional or fewer verses in the middle of a chapter. (The placement of the woman taken in adultery, John 7:53-8:11, differs by tradition.)
> Additional or fewer chapters in the middle of a book.
> Psalms in which the canonical introduction to the Psalm is numbered as verse 1.
> The Apocrypha is a mess of inconsistencies. In some versifications, the Apocrypha is inserted into canonical books of the Bible, e.g. Esther and Daniel.
> 
> In each of these it is important to determine what will be represented by a v11n. For example, verses that have different verse boundaries would not be represented by different v11ns.
> 
> I think Chris Little was working on a new v11n that basically took all (many?) of the different traditions and merged them together. This would then be the rosetta stone. I don't know how far along he is on that, or whether it is still at the idea stage.
> 
> The "super" v11n would be used transitively to line up the different verses.
> 
> Some v11ns are traditions by language. E.g. Germans and Russians each have their own particular v11ns. It'd be reasonable for a bi-lingual user to view books in parallel from different v11ns. I wonder whether native speakers will be needed to help figure out the mapping.
> 
> While I have studied English, French, Italian, Spanish, German, Greek and Hebrew and have a Masters of Divinity, I don't feel competent to tackle this. (code wise yes, but not the actual mapping) Also, there are other things that more readily occupy my frontal lobes.
> 
> -- DM
> 
> 
> On Mar 13, 2013, at 5:54 PM, Chris Burrell <chris at burrell.me.uk> wrote:
> 
>> Hi DM
>> 
>> I'm wondering about doing some of the work to convert from one v11n to another, especially if I don't hear back on the sword-devel.
>> 
>> I'm thinking of doing the following:
>> - create a converter that goes through to a "master" versification as you suggested. We define the mappings for those OsisIDs that don't match. The master version is based on the KJV versification (for ease of being able to create new mappings) + the books that aren't in the KJV.
>> 
>> - mappings include just the bits that don't match up.
>> 
>> - mappings can be defined for sets of verses with an offset, e.g. Psalm 51:1-5=Psalm 51:1+1 (where +1 indicates an offset of 1 to be applied to the source).
>> 
>> - Have the concept of split verses (i.e. Rom 1:1a, Rom 1:1b, etc.)
>> 
>> - A tool to create a split in the master versification, which rewrites all the current mappings, such that when someone writes a mapping and needs to split a verse, they can introduce the split without adversely affecting every other mapping.
>> 
>> - the lookup process takes a String (OSIS ID) with a source and target versification (also overloaded to take a Key/Verse). It then queries the master mapping, which returns 1 or more entries (an entry being a whole verse, or a set of split verses). Then it goes to the reverse mappings for the target versification and does the same. For incomplete verses on one side or the other, we round up to the closest verse.
>> 
>> - The reason for basing the master versification on the KJV (or some other English versification) is that it's easy to work against. The alternative would be to go for the most split versification ever found, but that becomes painful in the future if someone decides to split another verse into 2 parts.
>> 
>> - When introducing a split, we would want to record what the split actually refers to (i.e. what content). This wouldn't be used by the library, but instead be useful for people coming along and writing new mappings.
>> 
>> Those are some of my thoughts so far.
>> Chris
>> 
> 
> 
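The lookup described in that list (source to master, then master to target through the reversed target mapping, rounding split parts up to whole verses) might look like this in Java. It assumes each versification supplies only a `source -> master` table in which missing IDs map to themselves, and that target IDs are always whole verses. `Translator` and its methods are hypothetical names; the overloads taking JSword `Key`/`Verse` objects are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the lookup: source -> master -> target, where each
// versification supplies only a source->master table (missing IDs map to themselves)
// and target IDs are always whole verses.
final class Translator {
    static List<String> translate(String sourceId,
                                  Map<String, List<String>> sourceToMaster,
                                  Map<String, List<String>> targetToMaster) {
        Map<String, List<String>> masterToTarget = invert(targetToMaster); // rebuilt per call for brevity
        Set<String> result = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (String masterId : sourceToMaster.getOrDefault(sourceId, List.of(sourceId))) {
            String whole = wholeVerse(masterId);
            List<String> hit = masterToTarget.get(masterId);
            if (hit == null) {
                hit = masterToTarget.get(whole); // target only knows the whole verse
            }
            if (hit != null) {
                result.addAll(hit);
            } else if (!targetToMaster.containsKey(whole)) {
                result.add(whole); // identity, rounded up from a split part if needed
            }
        }
        return new ArrayList<>(result);
    }

    /** Gen.1.1a -> Gen.1.1; whole-verse IDs are returned unchanged. */
    static String wholeVerse(String id) {
        char last = id.charAt(id.length() - 1);
        return Character.isLowerCase(last) ? id.substring(0, id.length() - 1) : id;
    }

    static Map<String, List<String>> invert(Map<String, List<String>> toMaster) {
        Map<String, List<String>> inverted = new HashMap<>();
        for (Map.Entry<String, List<String>> e : toMaster.entrySet()) {
            for (String masterId : e.getValue()) {
                inverted.computeIfAbsent(masterId, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return inverted;
    }

    public static void main(String[] args) {
        // The source divides master Gen.1.1 into two verses; the target keeps it whole.
        Map<String, List<String>> source = Map.of("Gen.1.1", List.of("Gen.1.1a"), "Gen.1.2", List.of("Gen.1.1b"));
        Map<String, List<String>> target = Map.of("Gen.1.1", List.of("Gen.1.1a", "Gen.1.1b"));
        System.out.println(translate("Gen.1.2", source, target)); // [Gen.1.1]
        System.out.println(translate("Gen.1.1", target, source)); // [Gen.1.1, Gen.1.2]
    }
}
```

Because both directions go through the master, adding a new versification only means writing its own table, not one table per pair.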
