<html><body><div dir="ltr">Thinking about this more, I think the best outcome would likely come from deploying a mixture of global and per-module settings:</div><div dir="ltr"><br></div><div dir="ltr">Optimising for read-only indices (no writes) should clearly be a global choice, while the word-length limit may be a value discovered for each module while the module is being made?</div><div dir="ltr"><br></div><div id="ms-outlook-mobile-body-separator-line" dir="ltr"><br></div><div id="ms-outlook-mobile-signature">Sent from <a href="https://aka.ms/o0ukef">Outlook for iOS</a></div><div id="mail-editor-reference-message-container" class="ms-outlook-mobile-reference-message"><hr style="display: inline-block; width: 98%;"><div id="divRplyFwdMsg" dir="ltr"><span style="font-family: Calibri, sans-serif;"><b>From:</b> sword-devel <sword-devel-bounces@crosswire.org> on behalf of Peter von Kaehne <refdoc@gmx.net><br><b>Sent:</b> Friday, June 6, 2025 7:52 am<br><b>To:</b> SWORD Developers' Collaboration Forum <sword-devel@crosswire.org><br><b>Subject:</b> Re: [sword-devel] RIP CLucene on Mac Silicon</span><div style="font-family: Calibri, sans-serif;"> </div></div><div dir="ltr">Here are some indexing parameters. </div><div dir="ltr" style="color: rgb(0, 0, 0);"><a href="https://getting-started-with-xapian.readthedocs.io/en/latest/concepts/indexing/limitations.html#index-limitations" rel="noreferrer noopener">https://getting-started-with-xapian.readthedocs.io/en/latest/concepts/indexing/limitations.html#index-limitations</a></div><div dir="ltr"><br></div><div dir="ltr">Not all of these will be relevant, but the default word size of 245 bytes will be unnecessary for most languages.
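A minimal C++ sketch of the mixed approach proposed in this thread, under stated assumptions: every name in it (GlobalIndexSettings, ModuleIndexSettings, IndexPlan, discoverModuleSettings, planIndex) is hypothetical and not part of SWORD or Xapian. The read-only/compact choice is held once, globally, while the word-length limit is discovered by scanning each module's text during module making and is capped at the 245-byte figure quoted above. The byte-based tokenizer is a crude stand-in assumed for illustration only; it is not how SWORD or Xapian actually split words.

```cpp
// Hypothetical sketch: global index settings plus a per-module word-length
// limit discovered while the module is made. None of these names exist in
// SWORD or Xapian; they only illustrate the split proposed in the thread.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Applies to every module's index (assumed global choice).
struct GlobalIndexSettings {
    bool readOnly = true;  // indices are rebuilt from scratch, never updated
    bool compact = true;   // prefer small size and fast reads over write speed
};

// Discovered per module while scanning its text.
struct ModuleIndexSettings {
    std::size_t maxWordBytes = 0;  // longest word seen, in bytes
};

// Final plan that a (hypothetical) indexer would consume.
struct IndexPlan {
    GlobalIndexSettings global;
    ModuleIndexSettings module;
    std::size_t wordLimit;  // per-module limit, capped at the backend limit
};

// Crude tokenizer assumed for illustration: ASCII letters and digits, plus
// every byte >= 0x80 so that multi-byte UTF-8 characters stay inside a word.
bool isWordByte(unsigned char c) {
    return c >= 0x80 || (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Scan the module text once and record the longest run of word bytes.
ModuleIndexSettings discoverModuleSettings(const std::string& text) {
    ModuleIndexSettings settings;
    std::size_t run = 0;
    for (unsigned char c : text) {
        if (isWordByte(c)) {
            ++run;
            settings.maxWordBytes = std::max(settings.maxWordBytes, run);
        } else {
            run = 0;
        }
    }
    return settings;
}

// Combine the global choice with the per-module discovery. The default of 245
// is the figure quoted in this thread, treated here as an assumption.
IndexPlan planIndex(const GlobalIndexSettings& global,
                    const std::string& moduleText,
                    std::size_t backendLimit = 245) {
    ModuleIndexSettings module = discoverModuleSettings(moduleText);
    return IndexPlan{global, module, std::min(module.maxWordBytes, backendLimit)};
}
```

For example, planIndex(GlobalIndexSettings{}, "In the beginning God created").wordLimit is 9, the byte length of "beginning". How such a limit would actually be passed to Xapian is left open here; the Xapian TermGenerator documentation should be checked for what it supports.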
</div><div dir="ltr"><br></div><div dir="ltr">Peter</div><div id="ms-outlook-mobile-body-separator-line" dir="ltr"><br></div><div id="mail-editor-reference-message-container" class="ms-outlook-mobile-reference-message"><hr style="display: inline-block; width: 98%;"><div id="divRplyFwdMsg" dir="ltr"><span style="font-family: Calibri, sans-serif;"><b>From:</b> sword-devel <sword-devel-bounces@crosswire.org> on behalf of Peter von Kaehne <refdoc@gmx.net><br><b>Sent:</b> Friday, June 6, 2025 7:43 am<br><b>To:</b> SWORD Developers' Collaboration Forum <sword-devel@crosswire.org><br><b>Subject:</b> Re: [sword-devel] RIP CLucene on Mac Silicon</span><div style="font-family: Calibri, sans-serif;"> </div></div><div dir="ltr">Xapian is, of course, used extensively by GNOME, and while the initial indexing of a full home directory or a full mailbox (each many times the size of a SWORD module, or even of a relatively sizeable library) can take its time, I have never had concerns about index size in daily use with GNOME indexing.</div><div dir="ltr"><br></div><div dir="ltr">So could it be that the problem is not Xapian per se, but the parameters we give it and the way we use it for indexing?</div><div dir="ltr"><br></div><div dir="ltr" style="color: rgb(0, 0, 0);"><a href="https://getting-started-with-xapian.readthedocs.io/en/latest/advanced/scalability.html" rel="noreferrer noopener">https://getting-started-with-xapian.readthedocs.io/en/latest/advanced/scalability.html</a></div><div dir="ltr" style="color: rgb(0, 0, 0);"><br></div><div dir="ltr" style="color: rgb(0, 0, 0);">This suggests that there are many possible ways of tweaking it. FWIW, our module indices are individual, per-module indices rather than library-wide ones. For most modules we do not need any facility to update a search index; we just rebuild it from scratch when we get an updated module.
So our trees could, and should, be optimised at least for that: compact size and fast reading, with no writing necessary.</div><div dir="ltr" style="color: rgb(0, 0, 0);"><br></div><div dir="ltr" style="color: rgb(0, 0, 0);">Would this, and any of the other material further down, help? (I have not looked too hard, as I do not yet know the search-related code.)</div><div id="ms-outlook-mobile-body-separator-line" dir="ltr"><br></div><div id="mail-editor-reference-message-container" class="ms-outlook-mobile-reference-message"><hr style="display: inline-block; width: 98%;"><div id="divRplyFwdMsg" dir="ltr"><span style="font-family: Calibri, sans-serif;"><b>From:</b> sword-devel <sword-devel-bounces@crosswire.org> on behalf of Greg Hellings <greg.hellings@gmail.com><br><b>Sent:</b> Friday, June 6, 2025 5:55 am<br><b>To:</b> SWORD Developers' Collaboration Forum <sword-devel@crosswire.org><br><b>Subject:</b> Re: [sword-devel] RIP CLucene on Mac Silicon</span><div style="font-family: Calibri, sans-serif;"> </div></div><div dir="ltr"><br></div><div dir="ltr"><br></div><div dir="ltr" class="gmail_attr">On Thu, Jun 5, 2025 at 1:56 PM Karl Kleinpaste <<a href="mailto:karl@kleinpaste.org">karl@kleinpaste.org</a>> wrote:</div><blockquote style="margin: 0px 0px 0px 0.8ex; padding-left: 1ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204);"><div dir="ltr" class="gmail_quote">On 6/5/25 1:07 PM, Greg Hellings wrote:<br>
</div><blockquote><div dir="ltr" class="gmail_quote">Sword
has support for Xapian, I believe, which is a much more recent and
up-to-date library</div></blockquote><div dir="ltr" class="gmail_quote"><br>
<span style="font-family: FreeSerif;">Way back in November 2014, when Xapian's
presence in Sword was new, I experimented with it. The problem I
found is that its generated indices are absolutely humongous. At
the time, I wrote to the list here to say that they represented a 7x size
increase, and that what was once a couple of Gbytes had ballooned to
23.2 Gbytes when I went through a round of mkfastmod for all my
installed modules.</span></div></blockquote><div dir="ltr" class="gmail_quote gmail_quote_container"><br></div><div dir="ltr" class="gmail_quote gmail_quote_container">Running with only the KJV module just now, I get:</div><div dir="ltr" class="gmail_quote gmail_quote_container"><br></div><div dir="ltr" class="gmail_quote gmail_quote_container">CLucene indexes the KJV in 12.5 seconds, producing a 12 MB lucene directory</div><div dir="ltr" class="gmail_quote gmail_quote_container">Xapian indexes the KJV in 31 seconds, producing a 185 MB xapian directory</div><div dir="ltr" class="gmail_quote gmail_quote_container"><br></div><div dir="ltr" class="gmail_quote gmail_quote_container">It looks like it hasn't really gotten any better since your tests, Karl.</div><div dir="ltr" class="gmail_quote gmail_quote_container"> </div><blockquote style="margin: 0px 0px 0px 0.8ex; padding-left: 1ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204);"><div dir="ltr" class="gmail_quote" style="font-family: FreeSerif;"><br>
I would reference this from the sword-devel archives, but
<a href="http://www.crosswire.org">www.crosswire.org</a> is failing to respond right now.</div></blockquote><div dir="ltr" class="gmail_quote gmail_quote_container"><br></div><div dir="ltr" class="gmail_quote gmail_quote_container">Apropos of none of the above, in order for mkfastmod to be able to make a Xapian index, I had to apply the attached patch to the released Sword 1.9.0, because mkfastmod was not updated when Xapian was first added as a target. Without it, mkfastmod doesn't know that it can run and gives the error that search frameworks are not supported.</div><div dir="ltr" class="gmail_quote gmail_quote_container"><br></div><div dir="ltr" class="gmail_quote gmail_quote_container">I am also unable to load any part of the <a href="http://crosswire.org">crosswire.org</a> site, so I don't know whether the patch has been applied to trunk, but I would venture to guess not. Xapian builds of Sword don't seem to be very popular as long as CLucene still exists on Linux.</div><div dir="ltr" class="gmail_quote gmail_quote_container"><br></div><div dir="ltr" class="gmail_quote gmail_quote_container">--Greg</div><blockquote style="margin: 0px 0px 0px 0.8ex; padding-left: 1ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204);"><div dir="ltr" class="gmail_quote">
_______________________________________________<br>
sword-devel mailing list: <a href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a><br>
<a href="http://crosswire.org/mailman/listinfo/sword-devel" rel="noreferrer">http://crosswire.org/mailman/listinfo/sword-devel</a><br>
Instructions to unsubscribe/change your settings at above page<br>
</div></blockquote></div><div> </div></div><div> </div></div><div> </div></body></html>