[sword-devel] Question for Optimization / speed experts

Christian Renz sword-devel@crosswire.org
Mon, 6 Jan 2003 19:18:24 -0300


>BibleTime supports displaying different lexicons side by side, it offers only 
>the keys which are available in all of the chosen lexicons.

Some ideas for this:

- You don't have to build the whole list at once. Build e.g. a list of
  20 or 30 items, then calculate more as the user scrolls up/down. You
  might even calculate the list using an idle thread. 

- Personally, I would find a list of *all* keys more desirable (that
  is, list any key that is found in any of the chosen lexicons). I
  took a look at Bibleworks when visiting my brother (he is teaching
  OT) and noticed it has a feature where instead of having a window
  per dictionary, you have one window displaying the definitions in a
  list of selected dictionaries, like this:

    x Dictionaries -----------*
    | Yaddayadda              |
    | ...                     |
    |                         |
    | BDB                     |
    | ...                     |
    |                         |
    | Websters                |
    | ...                     |
    |                         |
    * ------------------------*

  It is a nice feature I would like to see in sword as well.

- Your algorithm is O(m * n), which explains the runtime. The
  algorithm proposed by Glen looks good, especially as it runs in
  linear time (ie, looks at each entry exactly once). However, it can
  still be optimized: If an element is > the currently active item,
  you use that one to continue your search. Also, you can step the
  marker more often. I implemented the algorithm in perl (just for
  fun), but tried to not use perl-specific optimizations. I put some
  code into subroutines so the main algorithm reads better.

--------------- 8< --------------- 8< --------------- 
# Calculate intersection of sets. 20030106 crenz@web42.com

use strict;
use warnings;

my @sets =
    ([ 0, 1, 2,   3,          7,   8, 9, 10,   11, 12, 13,   14, 15,    ],
     [ 0,         3,          7,   8,    10,   11,     13,   14,    16, ],
     [ 0, 1, 2,   3,    5,    7,   8,    10,   11, 12, 13,   14,    16, ],
     [       2,   3, 4,    6, 7,   8,    10,   11, 12,       14,    16, ],
     [ 0, 1, 2,   3,          7,   8, 9, 10,   11,     13,   14, 15,    ]);

my $numSets = scalar @sets; # $numsets = 5;
my @intersection = ();
my $currentSet = 0;
my @marker = (0, 0, 0, 0, 0);
my $currentItem;

# helper functions 

sub stepMarker {
    $marker[$currentSet]++;
    if ($marker[$currentSet] >= scalar @{$sets[$currentSet]}) {
        # we are at the end of the current set
        # => abort, there cannot be any more common items
        last MAIN;
    }
}

sub nextSet {
    $currentSet++;
    $currentSet %= $numSets;
}

sub getCurrentItem {
    return $sets[$currentSet]->[$marker[$currentSet]];
}


# main intersection algorithm

my $activeItem = getCurrentItem;
my $isPresent = 1;
MAIN: while (1) {
    stepMarker();
    nextSet;

    while (($currentItem = getCurrentItem()) < $activeItem) {
        stepMarker();
    }

    if ($currentItem > $activeItem) {
        $activeItem = $currentItem;
        $isPresent = 1;
    } else {
        $isPresent++;

        if ($isPresent == $numSets) {
            # item has been found present in all sets
            push @intersection, $activeItem;
            stepMarker();
            $activeItem = getCurrentItem;
            $isPresent = 1;
        }
    }
}

print "Common items in the $numSets sets: @intersection\n";
--------------- 8< --------------- 8< --------------- 

  If you want to combine all items present in any list, you could use
  Mergesort. It's only O(n * log n) for two lists.

Greetings,
   Christian

-- 
crenz@web42.com - http://www.web42.com/crenz/ - http://www.web42.com/

"Few are those who can see with their own eyes and hear with their own
hearts."  -- Albert Einstein