#1537 Binary search generalization

Yuri Strot Tue 17 May 2011

sys::List.binarySearch has one small issue - it requires a key. However it doesn't work in some cases. For example I have person class:

class Person
{
  Int id
  Str firstName
  Str lastName
  Str age
  Person[] kids
  ...
}

If I have list of persons sorted by id I can't use binary search to get person by id because I haven't person. Yeah, I can create fake person but it's additional overhead and it's ugly.

Actually binary search doesn't requre a key, because on each step we already have one and just need to determine where we should go. So it can be simplified as following:

List.newBinarySearch(|V -> Int|)

Default implementation is just a special case when we already have comparator or want to use default one:

List.binarySearch(V key, |V, V -> Int| f := defComparer)
{
  List.newBinarySearch(|V k -> Int| { f(k, key) })
}

Does it make sense?

brian Tue 17 May 2011

Does it make sense?

No...

Are you suggesting to replace binarySearch API with a single V->Int function? Not sure I follow how you actually pass the key you want to search for?

vkuzkokov Tue 17 May 2011

Not sure I follow how you actually pass the key you want to search for?

This way.

List.binarySearch(V key, |V, V -> Int| f := defComparer)
{
  List.newBinarySearch(|V k -> Int| { f(k, key) })
}

Bind second parameter of comparator to the key you want to search, thus getting a |V->Int| function.

newBinarySearch is a more general case allowing to search without having any key instance.

Yuri Strot Tue 17 May 2011

OK, let's look to the binary search implementation:

//consider that 'f(a, b)' our compare function
public final long binarySearch(Object key, Func f)
{
  Object[] values = this.values;
  int low = 0, high = size-1;
  while (low <= high)
  {
    int probe = (low + high) >> 1;
    Object val = values[probe]; // 1
    int cmp = f.call(val, key); // 2
    if (cmp < 0) low = probe + 1;
    else if (cmp > 0) high = probe - 1;
    else return probe;
  }
  return -(low + 1);
}

On each iteration we have probe index in our list and do following:

  1. get element val at probe index
  2. calculate cmp based on val
  3. determine next step based on cmp

The most tricky step is 2. We require user to specify function f with two arguments and always use key (specified by user) as a second argument. The simplification I'm talking about is following (only API and step 2 was changed):

//f ~ |V -> Int|
public final long binarySearch(Func f)
{
  Object[] values = this.values;
  int low = 0, high = size-1;
  while (low <= high)
  {
    int probe = (low + high) >> 1;
    Object val = values[probe]; // 1
    // use f to get direction based on the current key
    int cmp = f.call(val); // 2
    if (cmp < 0) low = probe + 1;
    else if (cmp > 0) high = probe - 1;
    else return probe;
  }
  return -(low + 1);
}

I'm not talking about replacing because first variant usable as well when we already have a comparator. But I want to have both, because second variant useful when we haven't a key. Also as I mentioned we need only one implementation, because the first variant can be simply implemented with the second (as I demonstrated in the first comment).

brian Tue 17 May 2011

Okay that makes perfect sense now.

I think the only thing we need to move this forward is to decide:

  1. do we keep both versions
  2. what should we call this new version?

Yuri Strot Tue 17 May 2011

  1. I think we should keep current binarySearch as is for compatibility. Also I'm sure that such API is exactly what user expect. So we need to have both.
  2. Good question. I'm trying to find something like this in other langs/libs but with no luck. Does anybody know about such binary search somewhere else?

qualidafial Tue 17 May 2011

  1. I'd say keep both, they are useful in different scenarios.
  2. unarySearch? ;) Here are all the methods I saw with Func arguments that accept one list element for evaluation: all, any, exclude, find, findAll, findIndex. So.. binaryFind maybe?

Also, in the binarySearch fandoc, s/insertation/insertion/ :)

DanielFath Tue 17 May 2011

My suggestion - binaryFind. It's similar to search but implies slight difference. To me search implies searching for something, while finding could be more like discovering something that matches a criteria.

brian Wed 18 May 2011

I like binaryFind, seems like the best suggestion so far.

ykuzkokov or ystrot - do you guys want to email me a patch?

Yuri Strot Wed 18 May 2011

binaryFind looks cool. I'll work on it.

Login or Signup to reply.