[sword-devel] Optimizing index time Was: Re: module modtime -vs- CLucene index out-of-date-ness
DM Smith
dmsmith555 at yahoo.com
Thu May 3 03:48:28 MST 2007
That's a 3x improvement under Windows when Google indexing and McAfee
are on.
I don't have commit privs for this patch.
Would someone else please commit it?
In His Service,
DM
On May 2, 2007, at 11:42 PM, Chris Little wrote:
> My benchmark system is a 2.0GHz Pentium-M with a 7200RPM drive and
> 1.25GB RAM. These are times for compressing KJV (which takes
> significantly longer than most other Bibles).
>
> Old mkfastmod (mcafee protection & google indexing on):
> 5m33.007s
>
> Old mkfastmod (mcafee protection & google indexing off):
> 4m16.322s
>
> New mkfastmod (virus protection/search indexing made insignificant
> differences):
> 1m46.252s
>
>
> --Chris
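The "3x" in DM's reply above comes from these timings. Here is a short check that parses the `XmY.YYYs` values in the form a shell `time` command prints them. The helper name `to_seconds` is illustrative only.

```python
import re


def to_seconds(t: str) -> float:
    """Convert a shell `time` value such as '5m33.007s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", t)
    if m is None:
        raise ValueError(f"unrecognized time value: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


old_on = to_seconds("5m33.007s")   # old mkfastmod, McAfee + Google indexing on
old_off = to_seconds("4m16.322s")  # old mkfastmod, both off
new = to_seconds("1m46.252s")      # new mkfastmod (RAMDirectory patch)

print(f"speedup with scanners on:  {old_on / new:.2f}x")
print(f"speedup with scanners off: {old_off / new:.2f}x")
```

This gives roughly 3.1x against the old code with the scanners on and 2.4x against the old code with them off. The 3x figure therefore describes the scanners-on case.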
>
>
> DM Smith wrote:
>> Karl fixed the bugs in my patch and I am attaching a new patch.
>> His statistics under Cygwin on Windows XP against all modules:
>> Before: old mkfastmod: 344.577u 129.499s 9:08.59 86.4%
>> After: new mkfastmod: 328.452u 29.749s 6:20.30 94.1%
>> (The four values are: user time, system time, wall-clock time and CPU
>> utilization.)
>> So there was about a 30% gain in wall-clock time.
>>
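Breaking Karl's figures down field by field shows where the gain came from. The sketch below parses the first four fields of the csh/tcsh-style `time` line quoted above (user, system, wall clock, CPU%) and compares before and after. The parser is a hypothetical helper written for this illustration.

```python
def parse_csh_time(line: str) -> dict:
    """Parse the first four fields of csh/tcsh `time` output, e.g.
    '344.577u 129.499s 9:08.59 86.4%'."""
    user, system, wall, cpu = line.split()[:4]
    # Wall clock may be m:ss.ss or h:mm:ss.ss; fold each part in base 60.
    wall_seconds = 0.0
    for part in wall.split(":"):
        wall_seconds = wall_seconds * 60 + float(part)
    return {
        "user": float(user.rstrip("u")),
        "system": float(system.rstrip("s")),
        "wall": wall_seconds,
        "cpu_pct": float(cpu.rstrip("%")),
    }


before = parse_csh_time("344.577u 129.499s 9:08.59 86.4%")
after = parse_csh_time("328.452u 29.749s 6:20.30 94.1%")

for key in ("user", "system", "wall"):
    change = 1 - after[key] / before[key]
    print(f"{key:>6}: {before[key]:8.2f}s -> {after[key]:8.2f}s ({change:.1%} lower)")
```

User time drops by under 5%, while system (kernel) time, which includes the cost of file-system calls, falls by about 77%. Wall-clock time falls by about 30.7%. This fits DM's earlier remark that the change moves indexing from disk-bound to CPU-bound.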
>> Chris has volunteered to benchmark under Windows.
>>
>>
>> On May 2, 2007, at 7:52 PM, DM Smith wrote:
>>
>>> Attached is a patch that uses the RAMDirectory. It parallels the
>>> JSword code and it compiles, but other than that I have not
>>> tested it.
>>>
>>> Would any of you mind testing it, especially on Windows with virus
>>> scanning both on and off? There should be a negligible difference
>>> between the two. Also, please measure RAM usage when indexing a Bible
>>> with Strong's numbers, such as the KJV.
>>>
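On Linux or macOS, one way to get the peak-memory figure DM asks for is to run the indexer as a child process and read the kernel's resource accounting afterwards. This does not cover the Windows setup he is most interested in. The sketch below uses Python's standard `resource` module. The `mkfastmod KJV` command line in the usage note is an assumption about how the tool is invoked.

```python
import resource
import subprocess
import sys


def peak_child_rss_bytes(cmd: list[str]) -> int:
    """Run cmd to completion and return the peak resident set size among
    this process's waited-for children, in bytes (Unix only).

    RUSAGE_CHILDREN reports the largest peak seen so far across all
    children, so measure one command per fresh Python process.
    """
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python peak_rss.py <command> [args...]
    print(peak_child_rss_bytes(sys.argv[1:]))
```

For example, `python peak_rss.py mkfastmod KJV` would print the peak RSS of that indexing run, assuming `mkfastmod` takes the module name as its argument.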
>>> <patch.zip>
>>>
>>> In His Service,
>>> DM
>>>
>>> On May 2, 2007, at 4:54 PM, DM Smith wrote:
>>>
>>>> Chris Little wrote:
>>>>> Unfortunately, that's impractical. With a virus scanner on, the
>>>>> compression takes 5 minutes for a single Bible (OT+NT) on my Win32
>>>>> system (2GHz Pent-M, 7200RPM drive), due to the constant disk access.
>>>>> We would either have to tell users to disable virus protection or deal
>>>>> with people complaining that their systems freeze every time they
>>>>> add/update a module.
>>>>> --Chris
>>>>>
>>>>
>>>> Actually, Lucene has a RAMDirectory implementation to which the index
>>>> can be written, and once indexing is complete the index can be copied
>>>> to the local file system. We've done this in JSword and the results
>>>> were phenomenal. I presume that CLucene is similar enough to Lucene to
>>>> have it too. It takes fewer than 10 lines of additional code in Java.
>>>>
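The write-to-RAM-then-copy pattern described above can be sketched without Lucene at all. The `RamDirectory` class below is a hypothetical toy written for illustration. It is not the Lucene or CLucene `RAMDirectory` API, and the file names are made up. Index data accumulates in memory during indexing and reaches the file system in a single pass at the end.

```python
import tempfile
from pathlib import Path


class RamDirectory:
    """Toy in-memory "directory" (hypothetical; not the Lucene/CLucene API).

    Index files are accumulated as byte buffers in RAM while indexing runs,
    then written to the real file system in one pass at the end.
    """

    def __init__(self) -> None:
        self.files: dict[str, bytearray] = {}

    def append(self, name: str, data: bytes) -> None:
        # Each small write touches memory only; nothing reaches the file
        # system until copy_to() is called.
        self.files.setdefault(name, bytearray()).extend(data)

    def size_bytes(self) -> int:
        # RAM held is proportional to the size of the finished index.
        return sum(len(buf) for buf in self.files.values())

    def copy_to(self, target: Path) -> None:
        # Flush: one sequential write per file once indexing is done.
        target.mkdir(parents=True, exist_ok=True)
        for name, buf in self.files.items():
            (target / name).write_bytes(bytes(buf))


if __name__ == "__main__":
    ram = RamDirectory()
    for verse in ("In the beginning", "God created", "the heaven and the earth"):
        ram.append("terms.txt", (verse + "\n").encode())
    with tempfile.TemporaryDirectory() as tmp:
        ram.copy_to(Path(tmp) / "lucene")
        print(ram.size_bytes(), sorted(p.name for p in (Path(tmp) / "lucene").iterdir()))
```

In real Lucene, the same idea is expressed by pointing an `IndexWriter` at a `RAMDirectory` and copying the result into a file-system directory afterwards. Check the Lucene or CLucene version in use for the exact copy call. `size_bytes()` makes the memory cost in the next paragraph concrete: the whole index must fit in RAM before it is flushed.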
>>>> The only problem is that it uses RAM proportional to the size of the
>>>> final index. I have not measured how big that is, but since Win98SE
>>>> with all the updates is hardly usable on an old Pentium laptop with
>>>> less than 64MB of RAM, I think that most machines have enough RAM.
>>>> After upgrading my old laptop to 128MB of RAM, JSword can index in
>>>> about 4 minutes, whereas before I never had the patience to let it
>>>> complete.
>>>>
>>>> That aside, indexing shifts from being disk-bound to CPU-bound, and
>>>> the machine is still practically unresponsive. So I think that it will
>>>> still be impractical.
>>>>
>>>>>
>>>>> Kahunapule Michael Johnson wrote:
>>>>>
>>>>>> What about updating the Sword engine to index each module as it is
>>>>>> installed, where indexing can be used? That way, you get small
>>>>>> downloads for everyone, faster searches for those who can use
>>>>>> indexes, and only a little more module installation time.
>>>>>>
>>>>>> Just a thought...
>>>>>>
>>>>>> Michael