[sword-devel] KJV2006 - 4th Beta
DM Smith
dmsmith555 at yahoo.com
Sat Mar 25 10:14:09 MST 2006
L.Allan-pbio wrote:
> Thanks for working on this.
>
> I would add a vote for providing KjvLite (with all or most of the
> embedded tags removed.)
Here you go: http://www.crosswire.org/~dmsmith/kjv2006/kjvlite.zip (If
it is not there, check back later)
I created it by running the following XSL stylesheet, which removes the
<w> tags but keeps the text inside them. I left divineName,
transChange, notes and q in place.
If you don't want these either, it is a trivial change from match="osis:w" to
match="osis:w|osis:q" (or whatever else you don't want).
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:osis="http://www.bibletechnologies.net/2003/OSIS/namespace"
>
  <xsl:output method="xml" indent="no"/>

  <!-- unwrap w elements: drop the tag, keep its content -->
  <xsl:template match="osis:w"><xsl:apply-templates /></xsl:template>

  <!-- ignore markup notes -->
  <xsl:template match="osis:note[@type = 'x-strongsMarkup'] |
                       osis:milestone[@type = 'x-strongsMarkup']"/>

  <!-- Copy all remaining nodes -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
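
For anyone without an XSLT processor at hand, roughly the same filtering can be sketched in Python with only the standard library. This is an illustration, not the script used to build kjvlite.zip: the names strip_w, _copy_into and _append_text are made up for the example, it assumes the root element is not itself a w, and ElementTree's default parser drops comments and processing instructions, which the stylesheet would copy.

```python
import xml.etree.ElementTree as ET

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"
W = f"{{{OSIS_NS}}}w"
NOTE = f"{{{OSIS_NS}}}note"
MILESTONE = f"{{{OSIS_NS}}}milestone"


def _append_text(parent, text):
    """Append text at the current end of parent's content."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _copy_into(src, dst_parent):
    """Copy src under dst_parent, unwrapping w and dropping strongsMarkup."""
    if src.tag in (NOTE, MILESTONE) and src.get("type") == "x-strongsMarkup":
        pass  # element dropped; the text that follows it is kept below
    elif src.tag == W:
        # Keep the content of w, but not the element itself.
        _append_text(dst_parent, src.text)
        for child in src:
            _copy_into(child, dst_parent)
    else:
        new = ET.SubElement(dst_parent, src.tag, dict(src.attrib))
        new.text = src.text
        for child in src:
            _copy_into(child, new)
    _append_text(dst_parent, src.tail)


def strip_w(xml_text):
    """Return a copy of the parsed document without w elements.

    Assumption: the root element is not a w element.
    """
    root = ET.fromstring(xml_text)
    holder = ET.Element("holder")  # temporary parent for the copied root
    _copy_into(root, holder)
    return holder[0]
```

To also drop q, as described above for the stylesheet, the `src.tag == W` test would be widened to a set of tags.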
>
> LcdBible does a one-time detagging of verses with a one-pass
> state-machine defilter, and puts the entire contents in a 4 meg buffer
> .... and "brute-force" strstr searches are about 10x - 50x faster.
> I've been working on Boyer-Moore-Horspool searching, which can be
> quite a bit faster once the search word(s) are four characters or longer.
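
For readers unfamiliar with it, the Boyer-Moore-Horspool search mentioned above can be sketched as follows. This is a minimal Python illustration of the technique, not LcdBible's code; the name horspool_find is made up for the example, and it works on bytes so the shift table can be a plain 256-entry list. The skip distance grows with the pattern length, which fits the remark that the gain shows up with longer search words.

```python
def horspool_find(text: bytes, pattern: bytes) -> int:
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1

    # Bad-character table: how far the window may slide, keyed by the byte
    # aligned with the last position of the pattern. Bytes that do not occur
    # in pattern[:-1] allow a full shift of m.
    shift = [m] * 256
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i

    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        pos += shift[text[pos + m - 1]]
    return -1
```

The result is meant to agree with bytes.find for the same arguments; no speed claim is made for this Python version.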
>
>
> ----- Original Message ----- From: "DM Smith" <dmsmith555 at yahoo.com>
> To: "SWORD Developers' Collaboration Forum" <sword-devel at crosswire.org>
> Sent: Saturday, March 25, 2006 8:59 AM
> Subject: Re: [sword-devel] KJV2006 - 4th Beta
>
>
>> L.Allan-pbio wrote:
>>> DM,
>>>
>>> I tried out the "raw" KJV2006-Beta-4 with sword.exe rc1 ....
>>>
>>> drum roll, please <g>
>>>
>>> Well, nothing shows up. Am I providing "dummy checking" or otherwise
>>> doing something wrong?
>>
>> I guess you are providing "dummy checking". (I'm the *dummy* for not
>> actually testing this module! ;)
>>
>>>
>>> The KJV2006-ztext shows up ok, but not the raw vpl.
>>>
>>> When I try to look at nt and ot with a text editor, I'm getting a
>>> message:
>>> "nt" contains characters that do not exist in code page 1252 (ANSI -
>>> Latin I). They will be converted to the system default character, if
>>> you click ok.
>>>
>>> After clicking ok, I'm still not seeing anything.
>>>
>>> I looked at kjvraw.conf, and see that it is trying to use ztext:
>>> [KJVraw]
>>> DataPath=./modules/texts/rawtext/kjvraw/
>>> ModDrv=zText
>>> BlockType=BOOK
>>> CompressType=ZIP
>>>
>>> I changed that to WEB settings, and it shows up ok.
>>
>> It was a cut and paste error in the conf. I fixed the conf and
>> re-zipped the file. It should work now (still haven't tested it! :)
>>
>>>
>>> Mark 1:9 seems to be a reasonable length (was over 15000 before).
>>>
>>> Lines 7972 and 8275 are about 3800 characters long. There are a
>>> number of verses that are nearly 3000 characters long.
>>
>> I took a look at the verse at line 7972 in the raw NT module and it
>> seems reasonable. It is just the by-product of deep, rich markup.
>>
>> The nature of the Strong's markup for the NT portion of the KJV2003
>> module, preserved here, is that every word in the TR is represented
>> with a <w> tag.
>> The w tag carries the following attributes:
>>     src="n", where n is the position of the Greek word in the TR.
>>     lemma="strong:G1234", which gives the Strong's number that
>>         can be looked up in Strongs.
>>     morph="robinson:T-ASM", which gives the morphology code that
>>         can be looked up in Robinsons.
>> If the word is translated into a phrase and some of the words in the
>> phrase are not from it, then the <w> tag is split into several
>> non-adjacent parts, and each of these has extra attributes:
>>     type="x-split"
>>     subType="x-n", where n is the "split" number. (Not sure what it
>>         means, how it was derived, or how it is used, if at all.)
>> For Greek words that are not translated, the <w> element is fully
>> attributed but empty. Its position in the verse may be anywhere.
>> Additionally, words that are in italics in the print version of the
>> KJV are represented with <transChange type="added">added
>> words</transChange>.
>> And if the verse contains a quote from Jesus, it is marked up as well.
>> Most verses in the NT part of the module also have elements
>> representing an audit trail.
>> There may be other markup.
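
As an illustration of the attribute layout described above, a single <w> element could be picked apart like this in Python. It is only a sketch: describe_w is a made-up helper, the sample elements used with it are invented rather than copied from the module, and the namespace is left off to keep the fragment short.

```python
import xml.etree.ElementTree as ET


def describe_w(fragment):
    """Summarise the attributes of one <w> element given as an XML string."""
    w = ET.fromstring(fragment)
    lemma = w.get("lemma", "")
    morph = w.get("morph", "")
    return {
        "text": w.text or "",
        # src: position of the Greek word in the TR
        "position": w.get("src"),
        # lemma="strong:G1234" -> "G1234"
        "strongs": lemma.split(":", 1)[1] if ":" in lemma else lemma,
        # morph="robinson:T-ASM" -> "T-ASM"
        "morph": morph.split(":", 1)[1] if ":" in morph else morph,
        # subType is only reported for split words
        "split": w.get("subType") if w.get("type") == "x-split" else None,
        # untranslated Greek words are fully attributed but empty
        "untranslated": not (w.text or "").strip() and len(w) == 0,
    }


# Hypothetical sample, not taken from the KJV2006 text.
info = describe_w(
    '<w src="3" lemma="strong:G2424" morph="robinson:N-NSM">Jesus</w>'
)
```

With so many attributes on every word, the markup easily outweighs the verse text itself.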
>>
>> There are over 50 Greek words represented in the verse.
>> There are a lot of italic words.
>> The verse contains a quote of Jesus.
>>
>> So yes, the verse is long, but its length is reasonable.
>>
>>>
>>> HTH,
>>>
>>> ----- Original Message ----- From: "DM Smith" <dmsmith555 at yahoo.com>
>>> To: "SWORD Developers' Collaboration Forum" <sword-devel at crosswire.org>
>>> Sent: Saturday, March 25, 2006 7:22 AM
>>> Subject: [sword-devel] KJV2006 - 4th Beta
>>>
>>>
>>>> YAB (yet another beta) (downloadable from the links at the bottom
>>>> of http://www.crosswire.org/~dmsmith/kjv2006)
>>>>
>>>> Again, I really value your input and the time you take to evaluate
>>>> these betas.
>>>
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page