[sword-devel] Repository of uncompiled modules? Errors in Vietnamese bible !

Chris Little sword-devel@crosswire.org
Sat, 3 May 2003 09:42:50 -0700 (MST)


On Sat, 3 May 2003, Nguyen Ly wrote:

> I managed to find a mod2osis utility on the bibletechnologies.net
> (OSIS) website, though the link on the main page is broken. Try this
> page instead:  http://www.bibletechnologieswg.org/osis/tools/

You should use mod2vpl to turn the module into a plaintext file.  We don't 
have tools to turn an OSIS document into a Bible module yet, but we can 
use the output of mod2vpl as input to vpl2mod.  (You can find mod2vpl & 
vpl2mod in ftp://ftp.crosswire.org/pub/sword/utils/win32/ .)

> Is there currently a repository of submitted uncompiled modules
> accessible from somewhere?

Not currently, though I would like there to be one eventually, at least 
for our developers' private use.  As you figured out, the best way to get 
the text is by exporting from the module, but you can also find the 
Vietnamese text at http://www.unboundbible.com/zips/index.cfm .  Their 
UTF-8 text is what ours is based on, so you might have an easier time 
working with their non-UTF-8 text if they made encoding errors.
 
> I've been using the Vietnamese bible and found some minor errors in the
> text. For example, some characters don't display properly (John 1:19);
> this is due to a conversion problem from the original text file being
> converted from VNI to unicode format. Another issue is with the spelling
> of the word "Jesus" in Viet. In some places it's spelt Je^sus and in
> others it's Gie^-xu. Technically, the latter is more correct since
> there's no "J" in the Viet alphabet. This may make searches inaccurate.

Do you have a print version of the same Bible?  If you do, and it 
consistently uses Gie^-xu instead of Je^sus, then this change is 
appropriate.  But from the source files, it appears that Je^sus is far 
more common than Gie^-xu since Je^sus has 1168 occurrences and Gie^-xu has 
only 3.  Google also shows a preference for Je^sus (2280 occurrences) over 
Gie^-xu (910 occurrences).  So the spelling change does not appear to 
me to be appropriate.
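
For anyone who wants to repeat the count on their own export, here is a 
rough Python sketch of how it could be done on a verse-per-line text file 
such as mod2vpl output.  It is only an illustration, not the method used 
for the numbers above.  The function names, the sample lines, and the 
accented spellings "Jêsus" and "Giê-xu" (standing in for Je^sus and 
Gie^-xu) are assumptions.  Text is normalized to NFC first so that 
precomposed and decomposed accents are counted alike.

```python
import unicodedata
from collections import Counter


def count_variants(lines, variants):
    """Count case-sensitive occurrences of each spelling variant.

    Both the variants and the text are normalized to NFC so that
    "e" + combining circumflex matches the precomposed "ê".
    """
    targets = [unicodedata.normalize("NFC", v) for v in variants]
    counts = Counter({t: 0 for t in targets})
    for line in lines:
        text = unicodedata.normalize("NFC", line)
        for t in targets:
            counts[t] += text.count(t)
    return counts


def count_in_file(path, variants, encoding="utf-8"):
    """Run count_variants over a verse-per-line file (path is hypothetical)."""
    with open(path, encoding=encoding) as handle:
        return count_variants(handle, variants)


if __name__ == "__main__":
    # Made-up sample lines, not real verse text.
    sample = [
        "Ref 1:1 J\u00easus ... J\u00easus",
        "Ref 1:2 Gie\u0302-xu ...",  # decomposed accent on purpose
    ]
    # Assumed accented forms of the two spellings discussed above.
    for spelling, n in count_variants(sample, ["J\u00easus", "Gi\u00ea-xu"]).items():
        print(spelling, n)
```

To run it against a real export, call count_in_file() with the path to 
your own file and the two spellings as they actually appear in it.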

--Chris