Chris Umphress wrote:
>>>When I run osis2mod on the osis file I obtained from CCEL, I have the four
>>>files that are produced. Problem: the nt file is ridiculously large, in the
>>>range of hundreds of megabytes, but doesn't contain any text. It simply
>>>contains OSIS format, verse delimiters mostly. The ot file, on the other hand,
>>>is empty.

Make sure you are using the earliest generation file you can find. In 
other words, figure out who created the file originally and get it from 
them. CCEL credits Unbound Bible in the case of the French Darby. 
However, Unbound Bible doesn't create content, to my knowledge, so they 
got their text from somewhere else. See if you can find the actual 
source and then see if the text is really public domain.

(Unbound Bible gets a number of their texts from Online Bible. The 
Online Bible version of the French Darby says it is copyright 1991. If 
this is the source of the text that Unbound Bible and CCEL have then we 
cannot use it.)

> The file frdarb.osis does not appear to follow the OSIS standards. I
> am still surprised that osis2mod creates a 700MB output file, but I
> believe this may be caused by the formatting of frdarb.osis.

osis2mod does not do any kind of validation on your OSIS input file. We 
expect you to supply it with valid OSIS (2.0+) that conforms to best 
practices. If it produces errors when you run it on such files, 
please let us know so that we can improve the importer.
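
Since osis2mod will not catch malformed input for you, it can help to 
run a quick check before importing. The following is only a minimal 
sketch in Python, not part of osis2mod or Sword: check_well_formed is a 
hypothetical helper name, and it tests XML well-formedness only. It 
does not validate against the OSIS schema, so a file can pass this 
check and still be invalid OSIS.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, message).

    Hypothetical helper: this checks well-formedness only, NOT
    conformance to the OSIS schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column of the first problem.
        return False, str(err)
    return True, None


if __name__ == "__main__":
    sample = '<osis><verse osisID="Gen.1.10">text</verse></osis>'
    print(check_well_formed(sample))
```

A file that fails even this check is not worth feeding to the importer. 
Real schema validation needs a validating parser and the OSIS schema 
itself.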

> If it were written this way, the file would be parsed correctly using osis2mod:
> <verse osisID="Gen.1.9">
> 9 ¶ Et Dieu dit: Que les eaux qui sont au-dessous des cieux se
> rassemblent en un lieu, et que le sec paraisse. Et il fut ainsi.
> </verse>
> <verse osisID="Gen.1.10">
> 10 Et Dieu appela le sec Terre, et le rassemblement des eaux, il l'appela Mers.
> </verse>

Those leading verse numbers in the text should also be removed. And the 
pilcrow characters should be replaced by paragraph containers. 
(Incidentally, Online Bible inserts those pilcrow characters 
automatically, which leads me to believe this text came from them.)
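
As a rough illustration of that cleanup, here is a minimal Python 
sketch. The function name clean_verse_text is made up for this example, 
and it assumes you already know the verse number (for instance from the 
osisID), so that a number is stripped only when it matches the verse it 
sits in. The sketch does not write paragraph containers itself. It only 
reports whether a pilcrow was found, so that your converter can open a 
new paragraph at that verse.

```python
import re


def clean_verse_text(text, verse_number):
    """Strip a leading verse number and pilcrow from one verse's text.

    Returns (cleaned_text, starts_paragraph). Hypothetical helper:
    verse_number is assumed to come from the verse's osisID.
    """
    # Optional leading number (only if it equals verse_number),
    # followed by an optional pilcrow. Both parts may be absent.
    pattern = re.compile(r"^\s*(?:%d\b\s*)?(¶\s*)?" % verse_number)
    match = pattern.match(text)  # always matches, possibly an empty span
    starts_paragraph = match.group(1) is not None
    return text[match.end():].strip(), starts_paragraph


if __name__ == "__main__":
    print(clean_verse_text("9 ¶ Et Dieu dit: Que les eaux", 9))
    print(clean_verse_text("10 Et Dieu appela le sec Terre", 10))
```

Matching against the known verse number is a deliberate choice here: a 
blanket "strip any leading digits" rule would also eat a number that 
legitimately begins a verse.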


