[sword-devel] Question for Optimization / speed experts

David White sword-devel@crosswire.org
07 Jan 2003 20:45:51 +1100


- Personally, I would find a list of *all* keys more desirable (that
>   is, list any key that is found in any of the chosen lexicons). I
>   took a look at Bibleworks when visiting my brother (he is teaching
>   OT) and noticed it has a feature where instead of having a window
>   per dictionary, you have one window displaying the definitions in a
>   list of selected dictionaries, like this:

I think that perhaps this is so. At the least, one could have the option of either the union or intersection.

- Your algorithm is O(m * n), which explains the runtime. The
>   algorithm proposed by Glen looks good, especially as it runs in
>   linear time (ie, looks at each entry exactly once). However, it can
>   still be optimized: If an element is > the currently active item,
>   you use that one to continue your search. Also, you can step the
>   marker more often.

I really don't think an extreme amount of optimization is required; just
moving to an O(n) algorithm instead of the current O(m*n) should be
sufficient imho.

> I implemented the algorithm in perl (just for
>   fun), but tried to not use perl-specific optimizations. I put some
>   code into subroutines so the main algorithm reads better.
> 

Not being able to resist the sirens call to compare programming
languages using highly contrived examples, I have whipped up an
equivalent implementation in C++. It is as follows:

----

#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

using namespace std;

int main()
{
    const int sets[][20] =
    {{ 0,1,2, 3,       7, 8, 9, 10, 11, 12, 13, 14, 15,    -1,},
     { 0,     3,       7, 8,    10, 11,     13, 14,    16, -1,},
     { 0,1,2, 3,  5,   7, 8,    10, 11, 12, 13, 14,    16, -1,},
     {      2,3,4,  6, 7, 8,    10, 11, 12,     14,    16, -1,},
     { 0,1,2, 3,       7, 8, 9, 10, 11,     13, 14, 15,    -1,}};

    vector<int>
results(sets[0],sets[0]+sizeof(sets[0])/sizeof(*sets[0]));

    for(int i = 1; i != sizeof(sets)/sizeof(*sets); ++i) {
        vector<int> v;
        v.swap(results);

        set_intersection(v.begin(),v.end(),
                         sets[i],find(sets[i],(const int*)NULL,-1),
                         back_inserter(results));
    }

    cout << "common items in the sets: ";
    copy(results.begin(),results.end(),ostream_iterator<int>(cout," "));
    cout << "\n";
}

---

Note that one can easily change from the intersection to the union by
changing set_intersection to set_union.

Blessings,

-David.