<HTML><BODY style="word-wrap: break-word; -khtml-nbsp-mode: space; -khtml-line-break: after-white-space; ">While the design is recursive, it is probably not going to recurse except for Raw GenBooks.<DIV><BR class="khtml-block-placeholder"></DIV><DIV>In JSword the interface for a Key allows any Key to have children. This would be akin to a book having chapters and chapters having verses. However, in the case of a Bible the Key is a flat list. With regard to the storage requirements of a Key to the whole Bible, the amount of storage it takes depends on what kind of optimization is used for the Key. It might be a:<DIV><BR class="khtml-block-placeholder"></DIV><DIV>BitwisePassage with one bit for each verse in the Key. BitwisePassage has a constant space requirement.</DIV><DIV>RangedPassage with very little storage overhead. Each range is stored separately. It is slower to iterate over than any of the other implementations.</DIV><DIV>DistinctPassage uses way too much storage, with one Key object per verse.</DIV><DIV>PassageTally keeps a weight for each of the keys it stores. It is used to prioritize search results.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>I have found that generating the search index is expensive, but I have found ways to make it faster. The first thing to know is that Lucene uses lots of temporary files on disk to build the index. Depending on what hardware I use, I can index an entire Bible in anywhere from under 2 minutes to 5 minutes. However, on Windows I found that it took in excess of 40 minutes. This was with an AMD 2400+. I did two things that got it down to a few minutes. First, I turned off Microsoft's "fast index". It turns out MS tried to index all of these temporary files. It should not have tried to index any. Second, I was using a "smart" virus scanner that scanned every file as it was deposited on the disk or perhaps accessed from the disk. I am not sure which. Turning both of these off gave me an indexing time of about 4 minutes. 
Subsequently, I replaced the virus scanner and never turned "fast indexing" back on.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>However, I don't expect that we will build an index on a "small" device. Rather, I would imagine we would pre-build it and load it.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>With regard to the Job class, we should change it to a Job interface and make the current class an implementation of it. Then we can create a null implementation of Job that does nothing, for contexts where there should be no reporting of progress. Or we can create an appropriate implementation for the target device.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Please note that none of the code following the comment "// report progress" is needed if we don't report progress.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Also, there are several opportunities for optimization here (e.g. the number of verses in a Bible does not change as progress is made, so it need not be looked up for every verse). Also, the implementation should be generalized a bit more. As written, it does not allow for indexing commentaries or dictionaries. 
It should allow for indexing all books.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>I'll see if I can make those changes.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>In Him,</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN>DM</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV><BR><DIV><DIV>On Dec 19, 2006, at 12:38 AM, Zhaojun Li wrote:</DIV><BR class="Apple-interchange-newline"><BLOCKQUOTE type="cite">Here are the two methods, one original, one mirror.<BR><BR> /**<BR> * Dig down into a Key indexing as we go.<BR> */<BR> private void newgenerateSearchIndexImpl( List errors, IndexWriter writer, Key key) throws BookException, IOException <BR> {<BR> int bookNum = 0;<BR> int oldBookNum = -1;<BR> int percent = 0;<BR> String name = ""; //$NON-NLS-1$<BR> String text = ""; //$NON-NLS-1$<BR> BookData data = null; <BR> Key subkey = null;<BR> Verse verse = null;<BR> Document doc = null;<BR> for (Iterator it = key.iterator(); it.hasNext(); )<BR> {<BR> subkey = (Key) it.next();<BR> if ( subkey.canHaveChildren())<BR> {<BR> newgenerateSearchIndexImpl( errors, writer, subkey);<BR> }<BR> else<BR> {<BR> data = null;<BR> try <BR> {<BR> data = book.getData(subkey);<BR> }<BR> catch (BookException e)<BR> {<BR> errors.add(subkey);<BR> continue; <BR> }<BR><BR> text = data.getVerseText();<BR><BR> // Do the actual indexing<BR> if (text != null && text.length() > 0)<BR> {<BR> doc = new Document(); <BR> doc.add(new Field(FIELD_NAME, subkey.getOsisRef(), Field.Store.YES, Field.Index.NO));<BR> doc.add(new Field(FIELD_BODY, new StringReader(text))); <BR> writer.addDocument(doc);<BR> }<BR><BR> // report progress<BR> verse = KeyUtil.getVerse(subkey);<BR><BR> try<BR> {<BR> percent = 95 * verse.getOrdinal() / BibleInfo.versesInBible();<BR> bookNum = verse.getBook();<BR> if (oldBookNum != bookNum)<BR> {<BR> name = BibleInfo.getBookName(bookNum);<BR> oldBookNum = bookNum;<BR> }<BR> }<BR> catch 
(NoSuchVerseException ex)<BR> {<BR> log.error("Failed to get book name from verse: " + verse, ex); //$NON-NLS-1$ <BR> assert false;<BR> name = subkey.getName();<BR> }<BR><BR> <BR> }<BR> }<BR> }<BR><BR> /**<BR> * Dig down into a Key indexing as we go. <BR> */<BR> private void generateSearchIndexImpl(Job job, List errors, IndexWriter writer, Key key) throws BookException, IOException<BR> {<BR> int bookNum = 0;<BR> int oldBookNum = -1;<BR> int percent = 0; <BR> String name = ""; //$NON-NLS-1$<BR> String text = ""; //$NON-NLS-1$<BR> BookData data = null;<BR> Key subkey = null;<BR> Verse verse = null;<BR> Document doc = null; <BR> for (Iterator it = key.iterator(); it.hasNext(); )<BR> {<BR> subkey = (Key) it.next();<BR> if (subkey.canHaveChildren())<BR> {<BR> generateSearchIndexImpl(job, errors, writer, subkey); <BR> }<BR> else<BR> {<BR> data = null;<BR> try<BR> {<BR> data = book.getData(subkey);<BR> }<BR> catch (BookException e) <BR> {<BR> errors.add(subkey);<BR> continue;<BR> }<BR> <BR> text = data.getVerseText();<BR> <BR> // Do the actual indexing <BR> if (text != null && text.length() > 0)<BR> {<BR> doc = new Document();<BR> doc.add(new Field(FIELD_NAME, subkey.getOsisRef(), Field.Store.YES, Field.Index.NO));<BR> doc.add(new Field(FIELD_BODY, new StringReader(text)));<BR> writer.addDocument(doc);<BR> }<BR> <BR> // report progress <BR> verse = KeyUtil.getVerse(subkey);<BR> <BR> try<BR> {<BR> percent = 95 * verse.getOrdinal() / BibleInfo.versesInBible();<BR> bookNum = verse.getBook();<BR> if (oldBookNum != bookNum)<BR> {<BR> name = BibleInfo.getBookName(bookNum);<BR> oldBookNum = bookNum;<BR> } <BR> }<BR> catch (NoSuchVerseException ex)<BR> {<BR> log.error("Failed to get book name from verse: " + verse, ex); //$NON-NLS-1$<BR> assert false; <BR> name = subkey.getName();<BR> }<BR> <BR> job.setProgress(percent, Msg.INDEXING.toString(name));<BR> <BR> // This could take a long time ...<BR> Thread.yield();<BR> if 
(Thread.currentThread().isInterrupted())<BR> {<BR> break;<BR> }<BR> }<BR> }<BR> }<BR><BR><BR><DIV><SPAN class="gmail_quote"> On 12/19/06, <B class="gmail_sendername">Zhaojun Li</B> <<A href="mailto:lzj369@gmail.com">lzj369@gmail.com</A>> wrote:</SPAN><BLOCKQUOTE class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Hi, Dear all,<BR><BR>I am new to Lucene, so please help.<BR><BR>I need to remove the job class from the current Lucene implementation. What I did is: create mirror method from generateSearchIndexImpl by removing any Job class reference. I tested it and it works. <BR><BR>However, the speed is not good. In the design, it is a recursive call. How to do multithreading for this? I mean by usual thread class, not JSWORD Job api.<BR><BR>Thanks!<BR><SPAN class="sg"><BR>Zhaojun<BR><BR> <BR><BR> </SPAN></BLOCKQUOTE></DIV><BR><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">_______________________________________________</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">jsword-devel mailing list</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><A href="mailto:jsword-devel@crosswire.org">jsword-devel@crosswire.org</A></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><A href="http://www.crosswire.org/mailman/listinfo/jsword-devel">http://www.crosswire.org/mailman/listinfo/jsword-devel</A></DIV> </BLOCKQUOTE></DIV><BR></DIV></DIV></BODY></HTML>