[sword-devel] phrasal concordance

Fri Jun 23 16:34:57 MST 2006

Yes!  That sounds like it's on the same track as I was thinking, except I'm
not sure it would end up with every combination.  It would need to consider
not only "in the", "in the beginning", etc., but also "the beginning", etc.
That means not only making the phrases longer and longer, but also starting
at each consecutive word and trying every length from that word, as long as
(of course) the phrase doesn't run past the end of the book.

The results might need an additional reference number for the position of
the word in the verse (book:chapter:verse:word), so "in", "in the", "in the
beginning", "the", "the beginning", etc. would be referenced as 1:1:1:1
(1-word phrase), 1:1:1:1 (2-word phrase), 1:1:1:1 (3-word phrase), 1:1:1:2
(1-word phrase), 1:1:1:2 (2-word phrase), etc.
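
A minimal Python sketch of that exhaustive enumeration, assuming one book is
supplied as (book, chapter, verse, text) tuples in reading order.  The
function name `build_phrase_concordance`, the `max_len` cap, and the
whitespace-only tokenization are all hypothetical choices for illustration.
Each entry is keyed by the phrase and records where it starts plus its
length, following the book:chapter:verse:word scheme described above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Where a phrase starts, plus its length: (book, chapter, verse, word, length).
Ref = Tuple[int, int, int, int, int]


def build_phrase_concordance(
    verses: List[Tuple[int, int, int, str]], max_len: int
) -> Dict[Tuple[str, ...], List[Ref]]:
    """Index every phrase of 1..max_len words in ONE book.

    `verses` holds (book, chapter, verse, text) tuples in reading order.
    """
    # Flatten the book into one word list, remembering each word's position.
    words: List[str] = []
    positions: List[Tuple[int, int, int, int]] = []
    for book, chapter, verse, text in verses:
        # Naive tokenization (an assumption): lowercase, split on whitespace.
        for word_no, word in enumerate(text.lower().split(), start=1):
            words.append(word)
            positions.append((book, chapter, verse, word_no))

    concordance: Dict[Tuple[str, ...], List[Ref]] = defaultdict(list)
    for start in range(len(words)):
        # Every length from this word, capped so the phrase never runs past
        # the end of the book (it may cross verse boundaries).
        for length in range(1, min(max_len, len(words) - start) + 1):
            phrase = tuple(words[start:start + length])
            concordance[phrase].append(positions[start] + (length,))
    return dict(concordance)


genesis_1_1 = [(1, 1, 1, "In the beginning God created the heaven and the earth.")]
conc = build_phrase_concordance(genesis_1_1, max_len=3)
print(conc[("in", "the")])  # [(1, 1, 1, 1, 2)]
print(conc[("the",)])       # [(1, 1, 1, 2, 1), (1, 1, 1, 6, 1), (1, 1, 1, 9, 1)]
```

The number of entries grows roughly as (words in the book) x `max_len`, so
some cap on phrase length is probably needed for a whole book.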

If a list of each unique word, with an ID number for each, were used
together with a copy of the book stored as those numbers instead of the
words they represent, iterating through all the different combinations
would probably be less processor-intensive (processing load was a major
problem I ran into).  Using numbers instead of words might also make
analysis of the finished product easier.
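
A sketch of the word-ID idea in Python, assuming IDs are handed out in order
of first appearance.  `encode_words` and `decode_words` are hypothetical
names; the text is stored in the standard `array` module as raw unsigned
integers rather than as a list of string objects.

```python
from array import array
from typing import Dict, List, Tuple


def encode_words(words: List[str]) -> Tuple[Dict[str, int], array]:
    """Give each distinct word an integer ID (in order of first use) and
    return the vocabulary plus the text rewritten as an array of IDs."""
    vocab: Dict[str, int] = {}
    encoded = array("L")  # unsigned long: raw integers, at least 4 bytes each
    for word in words:
        if word not in vocab:
            vocab[word] = len(vocab)
        encoded.append(vocab[word])
    return vocab, encoded


def decode_words(ids, vocab: Dict[str, int]) -> List[str]:
    """Turn a sequence of IDs back into words for display."""
    by_id = {i: w for w, i in vocab.items()}
    return [by_id[i] for i in ids]


text = "in the beginning god created the heaven and the earth".split()
vocab, ids = encode_words(text)
print(list(ids))                      # [0, 1, 2, 3, 4, 1, 5, 6, 1, 7]
print(decode_words(ids[0:3], vocab))  # ['in', 'the', 'beginning']
```

A phrase then becomes a short run of integers.  How much this actually saves
depends on the language and data structures used; the gain is likely largest
in a compiled language, where comparing phrases becomes comparing integers.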

I hope my words are clear enough.  If you'd like further thoughts from me, 
please ask.

JB
----- Original Message ----- 
From: "Troy A. Griffitts" <scribe at crosswire.org>
To: "SWORD Developers' Collaboration Forum" <sword-devel at crosswire.org>
Sent: Friday, June 23, 2006 6:06 PM
Subject: Re: [sword-devel] phrasal concordance

> Jeremy,
> It would be interesting to write a text analysis program that followed
> an algorithm something like:
>
> search("in the"); if there are results, write an entry ["in the" :
> result verses] and add a word:
> search("in the beginning")....
>
> If there are no results, drop one word from the front, search("the
> beginning"), and continue adding words and writing entries until there
> are no results.
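
A minimal Python sketch of this grow-then-drop loop over a single word list.
Assumptions: "results" is taken to mean the phrase occurs at least
`min_hits` times (default 2), since any phrase taken from the text always
matches itself at least once; hits are word offsets rather than verses; and
`search` is a naive linear scan, not SWORD's own search.  The names `search`
and `grow_phrases` are hypothetical.

```python
from typing import Dict, List, Tuple


def search(words: List[str], phrase: Tuple[str, ...]) -> List[int]:
    """Naive phrase search: every word offset where `phrase` occurs."""
    n = len(phrase)
    return [i for i in range(len(words) - n + 1) if tuple(words[i:i + n]) == phrase]


def grow_phrases(words: List[str], min_hits: int = 2) -> Dict[Tuple[str, ...], List[int]]:
    """Extend the current phrase one word at a time while it still has at
    least `min_hits` occurrences; when it doesn't, drop its first word."""
    entries: Dict[Tuple[str, ...], List[int]] = {}
    start, end = 0, 2  # current phrase is words[start:end]
    while end <= len(words):
        phrase = tuple(words[start:end])
        hits = search(words, phrase)
        if len(hits) >= min_hits:
            entries[phrase] = hits  # write an entry, then add a word
            end += 1
        else:
            start += 1                 # drop one word at the front
            end = max(end, start + 2)  # but never go below two words
    return entries


text = "a b c x a b c y".split()
print(grow_phrases(text))  # {('a', 'b'): [0, 4], ('a', 'b', 'c'): [0, 4]}
```

Because it follows a single sliding window, this loop does not enumerate
every combination of start word and length; it trades completeness for far
fewer searches.
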
>
>
> Not sure if this would be the best way to produce such research, but it
> would be neat to see.  Maybe a first pass could score every word by the
> total number of times it is used.  Then you could score phrases higher
> for having more words and less frequent words.
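
One possible reading of that scoring idea, sketched in Python.  The formula
is a hypothetical choice, not something specified here: each word
contributes 1/frequency, so a phrase's score rises both with its length and
with the rarity of its words.  `phrase_score` and `rank_phrases` are
illustrative names; the first pass is just `collections.Counter`.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def phrase_score(phrase: Sequence[str], counts: Counter) -> float:
    """Hypothetical score: each word adds 1/frequency, so phrases score
    higher the more words they have and the rarer those words are.
    Assumes every word in `phrase` was counted (count >= 1)."""
    return sum(1.0 / counts[w] for w in phrase)


def rank_phrases(phrases: Iterable[Tuple[str, ...]], counts: Counter) -> List[Tuple[str, ...]]:
    """Highest-scoring phrases first."""
    return sorted(phrases, key=lambda p: phrase_score(p, counts), reverse=True)


# First pass: total number of times each word is used.
words = "in the beginning god created the heaven and the earth".split()
counts = Counter(words)

candidates = [("the",), ("the", "beginning"), ("in", "the", "beginning")]
for p in rank_phrases(candidates, counts):
    print(" ".join(p), round(phrase_score(p, counts), 3))
# in the beginning 2.333
# the beginning 1.333
# the 0.333
```

Other weightings (for example log-based, as in inverse document frequency)
would fit the same description; this one is simply the easiest to check by
hand.
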
>
> It sounds like it might produce interesting research data.
>
> -Troy.
>
>
>
> Jeremy Bickel wrote:
>> Hello all.  I really hope this is the right place for this.  If not,
>> please forgive me. :-D
>>
>> What about a hard-coded (smallish, if possible) concordance of phrases
>> of every size in any book (Genesis, Ruth, Josephus, etc.), from 1 to x
>> words (where 1-word phrases would be a traditional concordance)?  This
>> could then be used, for instance, to quickly identify similarities
>> between texts.  A 20-word phrase within a single book (one that doesn't
>> run across books), found in multiple places, perhaps in multiple books,
>> might shed light on significant phrases whose significance would
>> otherwise be obscure.
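
A short Python sketch of the cross-book comparison, assuming each book is
already a separate list of words, so no phrase can run from one book into
the next.  `ngrams` and `shared_phrases` are hypothetical names, and
tokenization is left to the caller.  For the 20-word case, `n` would be 20.

```python
from typing import List, Set, Tuple


def ngrams(words: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Every n-word phrase in one book; phrases never run past its end."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(book_a: List[str], book_b: List[str], n: int) -> Set[Tuple[str, ...]]:
    """n-word phrases that appear in both books."""
    return ngrams(book_a, n) & ngrams(book_b, n)


a = "the lord is my shepherd i shall not want".split()
b = "the lord is my light and my salvation".split()
print(shared_phrases(a, b, 4))  # {('the', 'lord', 'is', 'my')}
```

Set intersection only says which phrases are shared; locating them again
would need a positional index such as a phrase concordance.
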
>>
>> At first glance, this might not seem important, because search is
>> already built into Sword.  But after thinking about this for a while, I
>> can see real potential in it.
>>
>> Thanks.
>> ###########################
>> God is love Himself.  God is completely just.  Fear Him and be at peace.