<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Please remember,<br>
<br>
SWORD already supports a search normalization layer. We have
normalizers for many things like accents, diacritics, etc., that
we run on the text before passing it to Lucene (or to our
own search mechanism).<br>
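<br>
To make the idea concrete, here is a minimal Python sketch of what an
accent/diacritic normalizer does. It is not SWORD's code (the real
filters are C++ classes in the engine). strip_accents is a hypothetical
name, and the approach shown (Unicode NFD decomposition, then dropping
combining marks) is just one common way to do it.<br>
<pre wrap="">
```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical accent/diacritic normalizer (not SWORD's implementation).

    Decomposes each character (NFD), drops the combining marks that
    carry accents and breathings, then recomposes what is left (NFC).
    """
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", base_only)


# In this sketch the same function is applied to the module text at
# index time and to the user's query at search time, so both sides
# end up in the same normalized form.
indexed = strip_accents("παραγγέλλω")
query = strip_accents("παραγγελλω")
print(indexed == query)  # True
print(strip_accents("café"))  # cafe
```
</pre>
The normalized form only has to be consistent between index and
query. It is never displayed, so it does not need to be pretty.<br>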
<br>
SWORD has distinct stages where it applies filters. The two most
obvious are the render stage and the search stage (named Render
and Strip in the engine). We have many filters that do many
different things, and any of them can be applied to a module for
normalizing during search by adding a line such as
LocalStripFilter=FilterName to the module's .conf file.<br>
<br>
Here are the filters currently available:<br>
<a class="moz-txt-link-freetext" href="http://www.crosswire.org/svn/sword/trunk/src/modules/filters/">http://www.crosswire.org/svn/sword/trunk/src/modules/filters/</a><br>
<br>
<br>
So, for example, we have:<br>
<br>
LocalStripFilter=UTF8GreekAccents<br>
LocalStripFilter=PapyriPlain<br>
<br>
to normalize papyrological searches on the Duke Databank of
Papyri:<br>
<a class="moz-txt-link-freetext" href="http://crosswire.org/study/wordsearchresults.jsp?mod=DDP&searchTerm=%CF%80%CE%B1%CF%81%CE%B1%CE%B3%CE%B3%CE%B5%CE%BB%CE%BB*">http://crosswire.org/study/wordsearchresults.jsp?mod=DDP&searchTerm=%CF%80%CE%B1%CF%81%CE%B1%CE%B3%CE%B3%CE%B5%CE%BB%CE%BB*</a><br>
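<br>
Assuming each LocalStripFilter line adds one filter and the filters run
in the order they are listed, the configuration above can be pictured as
the small Python pipeline below. This is only an illustration. The
FILTERS table, the helper names, and the bracket-removing behavior given
to the PapyriPlain stand-in are all hypothetical, not SWORD's actual
code or API.<br>
<pre wrap="">
```python
import unicodedata

# Hypothetical stand-ins for two SWORD strip filters. The real
# UTF8GreekAccents and PapyriPlain filters are C++ classes in the engine
# and may behave differently.


def greek_accents_stand_in(text: str) -> str:
    """Drop combining accents/breathings after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def papyri_plain_stand_in(text: str) -> str:
    """Assumed behavior: remove editorial brackets such as [ ] and ( )."""
    return text.translate(str.maketrans("", "", "[](){}"))


FILTERS = {
    "UTF8GreekAccents": greek_accents_stand_in,
    "PapyriPlain": papyri_plain_stand_in,
}


def strip_filters_from_conf(conf_text: str):
    """Collect LocalStripFilter entries in the order they appear."""
    names = []
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "LocalStripFilter":
            names.append(value.strip())
    return [FILTERS[name] for name in names]


def normalize(text: str, filters) -> str:
    """Run the text through every configured strip filter in turn."""
    for f in filters:
        text = f(text)
    return text


conf = "LocalStripFilter=UTF8GreekAccents\nLocalStripFilter=PapyriPlain\n"
filters = strip_filters_from_conf(conf)
print(normalize("παρ[αγγέλ]λω", filters))  # παραγγελλω
```
</pre>
In this sketch, keeping each normalization a separate, named filter is
what lets a module's .conf mix and match them. The same list can also be
run over a search term so that it meets the indexed text in the same form.<br>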
<br>
The normalizations proposed here certainly need to be discussed and
considered, but we already have a mechanism in place for this in SWORD.<br>
<br>
Troy<br>
<br>
<br>
<br>
On 03/03/2013 05:57 PM, DM Smith wrote:<br>
</div>
<blockquote
cite="mid:DD0A06DB-1C52-4A58-9224-4639E42CC988@crosswire.org"
type="cite">
<br>
<div>
<div>On Mar 3, 2013, at 11:53 AM, Chris Burrell <<a
moz-do-not-send="true" href="mailto:chris@burrell.me.uk">chris@burrell.me.uk</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<p dir="ltr">Yes, although in French only the contracted form is
correct.</p>
<div><br>
</div>
</blockquote>
<div><br>
</div>
WRT indexing and searching, it really doesn't matter which form is
correct. The normalized form is never shown to the user, which is
just as well, since normalization often produces forms that look
ugly to the end user.</div>
<div><br>
</div>
<div>-- DM</div>
<div><br>
<blockquote type="cite">
<div class="gmail_quote">On 3 Mar 2013 16:10, "David Haslam"
<<a moz-do-not-send="true"
href="mailto:dfhmch@googlemail.com">dfhmch@googlemail.com</a>>
wrote:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
There are similar issues in French modules.<br>
<br>
For example, some French Bibles have "coeur", some have "cœur",
and some even use both!<br>
<br>
etc., etc.<br>
<br>
David<br>
<br>
<br>
<br>
--<br>
View this message in context: <a moz-do-not-send="true"
href="http://sword-dev.350566.n4.nabble.com/Searching-for-hyphenated-words-tp4652016p4652042.html"
target="_blank">http://sword-dev.350566.n4.nabble.com/Searching-for-hyphenated-words-tp4652016p4652042.html</a><br>
Sent from the SWORD Dev mailing list archive at <a
moz-do-not-send="true" href="http://Nabble.com">Nabble.com</a>.<br>
<br>
_______________________________________________<br>
sword-devel mailing list: <a moz-do-not-send="true"
href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a><br>
<a moz-do-not-send="true"
href="http://www.crosswire.org/mailman/listinfo/sword-devel"
target="_blank">http://www.crosswire.org/mailman/listinfo/sword-devel</a><br>
Instructions to unsubscribe/change your settings at above
page</blockquote>
</div>
</blockquote>
</div>
</blockquote>
<br>
</body>
</html>