[sword-devel] KJV2006 - 4th Beta

DM Smith dmsmith555 at yahoo.com
Sat Mar 25 10:14:09 MST 2006



L.Allan-pbio wrote:
> Thanks for working on this.
>
> I would add a vote for providing KjvLite (with all or most of the 
> embedded tags removed.)
Here you go: http://www.crosswire.org/~dmsmith/kjv2006/kjvlite.zip (If 
it is not there, check back later)

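I created it by running the following XSLT stylesheet, which strips the 
<w> elements and the x-strongsMarkup notes and milestones. I left 
divineName, transChange, the remaining notes and q in place.

If you don't want these, it is a trivial change from match="osis:w" to 
match="osis:w|osis:q" (or whatever else you don't want.)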

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:osis="http://www.bibletechnologies.net/2003/OSIS/namespace"
  >

  <xsl:output method="xml" indent="no"/>

  <xsl:template match="osis:w"><xsl:apply-templates /></xsl:template>

  <!-- ignore markup notes -->
  <xsl:template match=" osis:note[@type = 'x-strongsMarkup'] | 
osis:milestone[@type = 'x-strongsMarkup']"/>

  <!-- Copy all remaining nodes -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
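
For anyone without an XSLT processor at hand, roughly the same stripping 
can be sketched in Python with only the standard library. This is not how 
kjvlite.zip was made (that was the stylesheet above). The function name 
strip_strongs_markup and the sample verse are made up for illustration, 
and the lemma and morph values are placeholders.

```python
from xml.dom import minidom

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"


def strip_strongs_markup(xml_text):
    """Unwrap every osis:w element and drop x-strongsMarkup notes/milestones."""
    doc = minidom.parseString(xml_text)

    # Same effect as the osis:w template: keep the children, lose the wrapper.
    for w in list(doc.getElementsByTagNameNS(OSIS_NS, "w")):
        parent = w.parentNode
        while w.firstChild is not None:
            # insertBefore() detaches the child from <w> before moving it.
            parent.insertBefore(w.firstChild, w)
        parent.removeChild(w)

    # Same effect as the empty template for the markup audit elements.
    for name in ("note", "milestone"):
        for el in list(doc.getElementsByTagNameNS(OSIS_NS, name)):
            if el.getAttribute("type") == "x-strongsMarkup":
                el.parentNode.removeChild(el)

    return doc.documentElement.toxml()


# Hypothetical sample; the lemma and morph values are placeholders.
SAMPLE = (
    '<verse xmlns="' + OSIS_NS + '">'
    '<w src="1" lemma="strong:G1234" morph="robinson:T-ASM">In</w> '
    '<w src="2" lemma="strong:G1234" morph="robinson:T-ASM">the beginning</w>'
    '<milestone type="x-strongsMarkup"/>'
    '</verse>'
)

if __name__ == "__main__":
    print(strip_strongs_markup(SAMPLE))
```

To strip q as well, the first loop would have to run over "q" in 
addition to "w", which mirrors the match="osis:w|osis:q" change.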

>
> LcdBible does a one-time detagging of verses with a one-pass 
> state-machine defilter, and puts the entire contents in a 4 meg buffer 
> .... and "brute-force" strstr searches are about 10x - 50x faster. 
> I've been working on Boyer-Moore-Horspool searching, which can be 
> quite a bit faster once the search word(s) are four characters or longer.
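
For readers who have not met the two techniques mentioned here, a 
minimal Python sketch follows. It is not LcdBible's code, which is not 
shown in this thread. detag and horspool_find are hypothetical names, the 
detagger assumes that '<' and '>' never occur outside tags, and Python 
is used for brevity, not to reproduce the quoted speedups.

```python
def detag(marked_up):
    """One-pass, two-state machine: copy characters unless inside <...>."""
    in_tag = False
    out = []
    for ch in marked_up:
        if in_tag:
            if ch == ">":
                in_tag = False  # tag closed, resume copying
        elif ch == "<":
            in_tag = True       # tag opened, stop copying
        else:
            out.append(ch)
    return "".join(out)


def horspool_find(text, pattern):
    """Boyer-Moore-Horspool search: index of first match, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # For each character of the pattern except the last, the distance from
    # its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            return pos
        # Skip ahead based on the text character under the pattern's last
        # position; characters not in the pattern allow a full-length skip.
        pos += shift.get(text[pos + m - 1], m)
    return -1


if __name__ == "__main__":
    plain = detag('<w lemma="strong:G1234">In</w> the <q>beginning</q>')
    print(plain, horspool_find(plain, "beginning"))
```

The longer the pattern, the larger the typical skip, which fits the 
remark above about search words of four characters or more.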
>
>
> ----- Original Message ----- From: "DM Smith" <dmsmith555 at yahoo.com>
> To: "SWORD Developers' Collaboration Forum" <sword-devel at crosswire.org>
> Sent: Saturday, March 25, 2006 8:59 AM
> Subject: Re: [sword-devel] KJV2006 - 4th Beta
>
>
>> L.Allan-pbio wrote:
>>> DM,
>>>
>>> I tried out the "raw" KJV2006-Beta-4 with sword.exe rc1 ....
>>>
>>> drum roll, please <g>
>>>
>>> Well, nothing shows up. Am I providing "dummy checking" or otherwise 
>>> doing something wrong?
>>
>> I guess you are providing "dummy checking". (I'm the *dummy* for not 
>> actually testing this module! ;)
>>
>>>
>>> The KJV2006-ztext shows up ok, but not the raw vpl.
>>>
>>> When I try to look at nt and ot with a text editor, I'm getting a 
>>> message:
>>> "nt" contains characters that do not exist in code page 1252 (ANSI - 
>>> Latin I). They will be converted to the system default character, if 
>>> you click ok."
>>>
>>> After clicking ok, I'm still not seeing anything.
>>>
>>> I looked at kjvraw.conf, and see that it is trying to use ztext:
>>> [KJVraw]
>>> DataPath=./modules/texts/rawtext/kjvraw/
>>> ModDrv=zText
>>> BlockType=BOOK
>>> CompressType=ZIP
>>>
>>> I changed that to WEB settings, and it shows up ok.
>>
>> It was a cut and paste error in the conf. I fixed the conf and 
>> re-zipped the file. It should work now (still haven't tested it! :)
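
A mismatch like this one can be caught before zipping with a small 
check. The Python sketch below is only a heuristic under stated 
assumptions: it treats the conf as INI-like, it assumes that a DataPath 
containing /rawtext/ should go with ModDrv=RawText, and the FIXED text is 
a guess at what the corrected conf looks like (the thread does not show 
it). check_module_conf is a hypothetical name.

```python
import configparser


def check_module_conf(conf_text):
    """Return a list of complaints about driver/path mismatches."""
    # strict=False tolerates repeated keys; interpolation=None leaves '%' alone.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep the case of keys such as ModDrv
    parser.read_string(conf_text)

    problems = []
    for name in parser.sections():
        driver = parser[name].get("ModDrv", "")
        path = parser[name].get("DataPath", "")
        # Assumption: the directory name reflects the intended driver.
        if "/rawtext/" in path and driver != "RawText":
            problems.append(
                "[%s] DataPath looks raw but ModDrv=%s" % (name, driver)
            )
    return problems


# The conf as quoted in this thread.
BROKEN = """[KJVraw]
DataPath=./modules/texts/rawtext/kjvraw/
ModDrv=zText
BlockType=BOOK
CompressType=ZIP
"""

# Assumed shape of the fix; not taken from the re-zipped file.
FIXED = """[KJVraw]
DataPath=./modules/texts/rawtext/kjvraw/
ModDrv=RawText
"""

if __name__ == "__main__":
    print(check_module_conf(BROKEN))
    print(check_module_conf(FIXED))
```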
>>
>>>
>>> Mark 1:9 seems to be a reasonable length (was over 15000 before).
>>>
>>> Line 7972 and 8275 are about 3800 characters long. There are a 
>>> number of verses that are nearly 3000 characters long.
>>
>> I took a look at the verse at line 7972 in the raw NT module and it 
>> seems reasonable. It is just the by-product of deep, rich markup.
>>
>> In the Strong's markup for the NT portion of the KJV2003 module, which 
>> is preserved here, every word in the TR is represented with a <w> tag.
>> The w tag carries the following attributes:
>>    src="n", where n is the position of the Greek word in the TR.
>>    lemma="strong:G1234", the Strong's number, which can be looked up 
>> in Strong's.
>>    morph="robinson:T-ASM", the morphology code, which can be looked 
>> up in Robinson's.
>> If a Greek word is translated into a phrase, and some of the words in 
>> that phrase do not come from it, then the <w> tag is split into several 
>> non-adjacent parts and each of these has extra attributes:
>>    type="x-split"
>>    subType="x-n" where n is the "split" number. (Not sure what it 
>> means or how it was derived, or how it is used, if at all)
>> For Greek words that are not translated, the <w> element is fully 
>> attributed but empty. Its position in the verse may be anywhere.
>> Additionally, in the KJV words that are in italics in the print 
>> version are represented with <transChange type="added">added 
>> words</transChange>
>> And if the verse contains a quote from Jesus, it is marked up as well.
>> Most verses in the NT part of the module also have elements 
>> representing an audit trail.
>> There may be other markup.
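
To make the attribute layout concrete, here is a small Python sketch 
that reads those attributes back out of a verse. The sample verse is 
invented, the attribute values are the placeholders used above, and 
describe_words is a hypothetical helper, not part of SWORD.

```python
import xml.etree.ElementTree as ET

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"

# Invented sample: one split word, some added words, one untranslated word.
SAMPLE = (
    '<verse xmlns="' + OSIS_NS + '">'
    '<w src="2" lemma="strong:G1234" morph="robinson:T-ASM" '
    'type="x-split" subType="x-1">the</w> '
    '<transChange type="added">added words</transChange> '
    '<w src="1" lemma="strong:G1234" morph="robinson:T-ASM"/>'
    '</verse>'
)


def describe_words(xml_text):
    """Return one dict per <w> element with its decoded attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for w in root.iter("{%s}w" % OSIS_NS):
        rows.append({
            "src": w.get("src"),
            # Drop the "strong:" and "robinson:" prefixes.
            "strongs": (w.get("lemma") or "").partition(":")[2],
            "morph": (w.get("morph") or "").partition(":")[2],
            "split": w.get("type") == "x-split",
            # Fully attributed but empty: a Greek word left untranslated.
            "untranslated": not (w.text or "").strip() and len(w) == 0,
        })
    return rows


if __name__ == "__main__":
    for row in describe_words(SAMPLE):
        print(row)
```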
>>
>> There are over 50 words from the Greek being represented in the verse.
>> There are a lot of italic words.
>> The verse contains a quote of Jesus.
>>
>> So yes, the line is long, but its length is quite reasonable.
>>
>>>
>>> HTH,
>>>
>>> ----- Original Message ----- From: "DM Smith" <dmsmith555 at yahoo.com>
>>> To: "SWORD Developers' Collaboration Forum" <sword-devel at crosswire.org>
>>> Sent: Saturday, March 25, 2006 7:22 AM
>>> Subject: [sword-devel] KJV2006 - 4th Beta
>>>
>>>
>>>> YAB (yet another beta) (downloadable from the links at the bottom 
>>>> of http://www.crosswire.org/~dmsmith/kjv2006)
>>>>
>>>> Again, I really value your input and the time you take to evaluate 
>>>> these betas.
>>>
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
>>>
>>>


