[sword-devel] Musings about the Cherokee NT module
DM Smith
dmsmith at crosswire.org
Sun Jul 1 13:45:26 MST 2012
First a couple of assumptions regarding the algorithm:
1) That transliterated Cherokee to Latin produces names that are somewhat similar to English ones.
2) At least they are more similar to a name than to a non-name word.
3) For a given verse having more than one name, the result is accurate.
You already know how to loop in Perl, so that is not important. So start with a single verse having names.
For each name "N" in the verse using the English name list for that verse
do
    for each word "W" in the same verse in the Cherokee text
    do
        compare "W" to "N" to get a score "S"
        if "N" has not been scored or if the "S" is better than the current "S" for "N"
        then
            save the "W" and "S" for "N"
        end if
    done
done
Of course, this double loop is classically inefficient. But it is easy to understand.
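
Since Python was asked about, here is a minimal sketch of that double loop in Python. Everything in it is an assumption made for illustration: the function names, the use of difflib's SequenceMatcher as a stand-in comparison, and the sample words, which are invented Latin-letter strings and not taken from the Cherokee text.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in comparison: a ratio in [0, 1], higher means more alike."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(names, words, score=similarity):
    """For each name N, keep the word W with the best score S."""
    best = {}
    for n in names:
        for w in words:
            s = score(n, w)
            # Save W and S if N has no score yet or S beats the current one.
            if n not in best or s > best[n][1]:
                best[n] = (w, s)
    return best


# Invented sample data for one verse (hypothetical, not real text).
names = ["Peter", "Simon"]
words = ["pita", "ale", "simoni", "hia"]
result = best_matches(names, words)
for name, (word, s) in result.items():
    print(name, "->", word, round(s, 2))
```

The comparison is passed in as a parameter, so any other scoring function where higher means better can be swapped in without touching the loop.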
The work is in the comparison. There are a variety of comparisons. Chris mentioned 2:
a) Edit distance - The minimum number of edits on "Wa" that will transform it into "Wb", where an edit is either a character insertion, a character deletion, a character change or a swap of adjacent characters.
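b) Soundex - Often associated with Donald Knuth, who describes it in The Art of Computer Programming, though it was patented by Robert Russell and Margaret Odell around 1918. Basically, it converts a string into a short code (a letter followed by three digits) based upon its pronunciation (using computer rules). Words that sound alike tend to get the same code, so the comparison is really match or no match.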
I didn't look but I'm pretty sure that these two are available in CPAN.
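
For anyone who wants to see what the two comparisons do before reaching for a module, here is a rough Python sketch of both. It is only an illustration under stated assumptions: the edit distance is the "optimal string alignment" variant (insert, delete, change, swap of adjacent characters), the Soundex follows the common American rules, and both assume plain Latin letters as input. The function names are mine.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, changes and adjacent swaps."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # change (or match)
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


# Letter groups for Soundex digits 1..6; vowels, h, w and y have no digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the four-character Soundex code, or "" for an empty word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (out + "000")[:4]


print(edit_distance("kitten", "sitting"), soundex("Robert"), soundex("Rupert"))
```

Note that the two point in opposite directions as scores: a lower edit distance is better, while Soundex codes are normally just tested for equality.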
Is that more detail enough?
BTW, a language that does not break between one word and the next would require extra effort to break the text up into words, or a sliding window of indeterminate length would be needed.
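
A sliding window could look roughly like the Python sketch below. This is only one guess at how to do it: the names are hypothetical, the window length is allowed to vary by an assumed "slack" of two characters either side of the name's length, and the scoring function is supplied by the caller with higher meaning better.

```python
def candidate_windows(text, min_len, max_len):
    """Yield (start, substring) for every window whose length is in range."""
    for start in range(len(text)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(text):
                break  # longer windows from this start would also overrun
            yield start, text[start:end]


def best_window(name, text, score, slack=2):
    """Return (score, start, substring) for the best window, or None."""
    min_len = max(1, len(name) - slack)
    max_len = len(name) + slack
    best = None
    for start, chunk in candidate_windows(text, min_len, max_len):
        s = score(name, chunk)
        if best is None or s > best[0]:
            best = (s, start, chunk)
    return best


# Toy usage with an exact-match score on an invented unbroken string.
exact = lambda a, b: 1.0 if a == b else 0.0
print(best_window("pita", "xxpitayy", exact))
```

This tries every start position with every allowed length, so it is even less efficient than the double loop, but it shows the idea.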
-- DM
On Jul 1, 2012, at 3:51 PM, refdoc at gmx.net wrote:
> While I must confess my interest in Cherokee is fairly limited, the process of proximity testing would be extremely helpful for study bible creation in any number of languages. Could you explain the algorithms in more detail? Are there CPAN or Python modules available?
>
> ----- Reply message -----
> From: "Greg Hellings" <greg.hellings at gmail.com>
> To: "SWORD Developers' Collaboration Forum" <sword-devel at crosswire.org>
> Subject: [sword-devel] Musings about the Cherokee NT module
> Date: Sun, Jul 1, 2012 8:09 pm
>
>
> On Sun, Jul 1, 2012 at 1:48 PM, DM Smith <dmsmith at crosswire.org> wrote:
> > I think what Greg said was correct. I understood it the same way:
> >
> > Starting w/ a breakdown of the names in each verse in an English Bible, iterate over that set of verses in the Cherokee Bible, doing the following:
> >
> > For each word in the original Cherokee text, transliterate into Latin characters (so A -> B for each word). Then heuristically compare the list of English names to each B, determining the best B for each English name (so C -> B for each name).
> >
> > At this point we have a transitive relationship of A -> B -> C thus, A can be tagged as a name.
>
> Yes, this is my understanding as well. The process of transliterating
> B back into A' (hopefully the same as A but unlikely to be 100% so) is
> unnecessary because the A -> B process will be done in memory and in
> the same process as the comparison. That way it already has the
> mapping between the original and the transliteration and can properly
> tag the original text rather than trying to round-trip the process.
>
> On the other hand, if we have a Latin version of a text that is closer
> to Cherokee than any of our English texts, we can probably do the
> heuristic step with even more confidence.
>
> --Greg
>
> >
> > As an optimization, the mapping from A <-> C can be retained and used.
> >
> > In Him,
> > DM
> >
> > On Jul 1, 2012, at 2:34 PM, David Haslam wrote:
> >
> >> Hi Greg,
> >>
> >> If all we wanted to achieve is the capitalization of proper names for the
> >> transliteration, the back conversion wouldn't be needed.
> >> We could even make a Cherokee Latin module, were we so inclined.
> >> /Aside - I've even actually made one for myself, but without any uppercase
> >> letters/.
> >>
> >> It only becomes relevant were we to go the whole way to restore the original
> >> orthography
> >> with proper names and sentence starts having 20% enlarged Cherokee syllabary
> >> symbols,
> >> as observed in the PDF file I downloaded from Google books.
> >>
> >> Does this answer your question?
> >>
> >> Remember - these are musings that emerged from pursuing my curiosity much
> >> further than I'd normally do.
> >>
> >> For me it was what I learned during the process that's important, yet
> >> something significant has emerged.
> >> Namely that a detailed inspection of
> >> http://en.wikipedia.org/wiki/Cherokee_syllabary brought to light:
> >>
> >> (a) the potential for inaccuracies arising from a theoretical round trip
> >> (b) confirmation from the Cherokee NT text that this was real, not just
> >> theoretical
> >> (c) a demonstrable method to work around this weakness
> >>
> >> The Wikipedia page is something I edited while I was engaged in this line of
> >> investigation.
> >> In particular, this section.
> >> http://en.wikipedia.org/wiki/Cherokee_syllabary#Transliteration_issues
> >> was added.
> >>
> >> Best regards,
> >>
> >> David
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context: http://sword-dev.350566.n4.nabble.com/Musings-about-the-Cherokee-NT-module-tp4650474p4650482.html
> >> Sent from the SWORD Dev mailing list archive at Nabble.com.
> >>
> >> _______________________________________________
> >> sword-devel mailing list: sword-devel at crosswire.org
> >> http://www.crosswire.org/mailman/listinfo/sword-devel
> >> Instructions to unsubscribe/change your settings at above page