#884 List API proposals

ivan Thu 24 Dec 2009

Sometimes it can be useful to have List.minIndex and List.maxIndex methods which are equivalent to sys::List#min and sys::List#max, but return indices instead of values.

Also it would be great to have binary search methods which return first or last element index, for example binarySearchFirst and binarySearchLast

EDIT: removed invalid code snippet

brian Sat 26 Dec 2009

minIndex and maxIndex I could potentially see, although I've never actually stumbled across a case where I've wanted those myself - so I'd have to hear some other votes before adding those methods

the binarySearchFirst and binarySearchLast I'm a bit more reluctant about adding; what exactly is the difference b/w binarySearch and binarySearchFirst? and is binarySearchLast just handles if you have a sequence of the same item?

helium Sat 26 Dec 2009

C++ has a std::lower_bound and std::upper_bound which return the first / last index at which you could insert the value without violating the ordering of the collection. I don't know if that is what ivan wants. Those functions give you useful results even if the collection does not contain the value you search for, whereas if I remember correctly Fantom returns a negative value if the item is not present.

index := list.binarySearchFirst(x)
list.insert(index, x)   // always valid and preserves ordering

DanielFath Sat 26 Dec 2009

On the topic of min/max index, wouldn't it be better to add a single function like

Int[]? indices(V val)

that returns indices for given value, instead of adding two function for corner cases? You could chain it them to achieve desired effect.

indices(list.max)

Personally, I think sys::List is just getting a tad too bloated for my taste.

helium Sat 26 Dec 2009

That's something totally different, isn't it?

list := [1, 5, 9]
index := list.binarySearchFirst(3)  // returns 1
list.insert(index, 3)               // list is now [1, 3, 5, 9]

How would you do that with your indices?

KevinKelley Sat 26 Dec 2009

We've got a couple different things conflated into one class. min and max (and the suggested minIndex and maxIndex) only make sense for unsorted lists; binarySearch and its proposed friends only make sense for sorted.

I agree that the extra methods would be convenient, but when we've already made the compromise to use a single List class for anything that's at all list-like, instead of having List, OrderedList, Stack, ... then it's probably better to not overload the namespace too much. List is already kind of full, seems to me.

helium Sat 26 Dec 2009

KevinKelley is right. List is a monolithic class.

And to me it seems it even has duplicates: What's the difference between add and push? Is push just there to make it look like a stack? But than where are the methods to make it look like a queue?

DanielFath Sat 26 Dec 2009

How would you do that with your indices?

I was referring to List.max(min)Index, not the BinarySearch.

helium Sun 27 Dec 2009

Sorry, my fault.

DanielFath Sun 27 Dec 2009

In hindsight my post was ambiguous so I changed it a bit.

I like the humane approach of first, last, each, eachr... But it really gets bogged down when you merge stack and list functionality with nearly any List-like structure doubling the interfaces for no good reason.

I really think there should be helper/wrapper/util methods when instantiating Stacks, OrderedLists, Queue and Deque.

brian Sun 27 Dec 2009

First a bit of philosophy - I am a big believer in Humane Interfaces. I think 100 classes with 100 methods is better way to organize code than 1000 classes with 10 methods. Just a matter of a more balanced tree :-)

So I don't mind having lots of useful methods on core classes like List, Map, DateTime, etc because I believe that is the right way to organize code. However, there is still a fairly high bar to add methods to a core class.

Right now I'd say these proposed methods don't seem to meet the bar.

DanielFath Sun 27 Dec 2009

Sure I respect that but when you start duplicating functionality (with peek / first and add / push) you invalidate DRY principle which IMO is more important than Humane Interface.

Also tighter coupling causes less code smells.

ivan Mon 28 Dec 2009

For binarySearchFirst and binarySearchLast I meant the following:

i := [1, 3, 3, 3, 3, 3, 5].binarySearch(3) //i == 3
i := [1, 3, 3, 3, 3, 3, 5].binarySearchFirst(3) //i == 1
i := [1, 3, 3, 3, 3, 3, 5].binarySearchLast(3) //i == 5

Regarding minIndex and maxIndex I meant:

min := [1, 3, 5].min |a,b| { a<=>b } //min == 1
minIndex := [1, 3, 5].minIndex |a,b| { a<=>b } //minIndex == 0

Of course the same result can be easily achieved with using indexOf, but this is not efficient

DanielFath Mon 28 Dec 2009

Why wouldn't indexOf work? I mean if you don't even want all indices what is the point? Some marginal speed up? What would require this level of optimization?

Another thing why would binarySearchFirst (and last) take anything but the comparator?

ivan Mon 28 Dec 2009

binarySearchFirst/'Last' should have exactly the same signature as binarySearch.

regarding minIndex - it's not a performance issue, I just thought that probably it looks more natural:

minIndex := myList.minIndex(...)

than:

minIndex := myList.index(myList.min(...))

It's somehow similar to remove and removeAt for me - of course it's possible to write list.removeAt(list.index(obj)) instead of list.remove(obj), or list.remove(list[index]) instead of list.removeAt(index)

Anyway, I agree, these methods are too rare and not vital

KevinKelley Mon 28 Dec 2009

When there are several items that match, binarySearch is, by its nature, going to be indeterminate in which of the matches it returns. So, for sorted lists where identical items are allowed, when you use binarySearch to find the insertion point for a new value, you need to write something like

ip := list.binarySearch(value)
while (ip > 0 && list[ip-1] == value) // if prev also matched,
  ip = ip-1;                          // back up
// okay to insert

I think I agree with this one ( binarySearchFirst and ...Last) - at least, I do know that using binarySearch is awkward. My biggest thing with it is trying to interpret its -(insertPoint-1) returned value; I have to write a little test case every single time :-)

The only reason I hesitated to agree is, that I also agree that sys::List already has a pretty large footprint. But this particular use case is fairly perilous; something maybe should be done.

As to the minIndex and maxIndex: maybe I'm missing something. If the list is sorted, min and max are first and last, so I guess you're talking about unsorted lists for those. In that case, you have to look at every item anyway, right, so why not findIndex ? (edit: I see ivan got a response ahead of me, so nevermind this paragraph :-)

ivan Mon 28 Dec 2009

Yes, findIndex makes sense. However, I think that min and minIndex are on the same level of usefulness. Also I agree that sys::List is way too big now. Probably it is time to review it and move some methods to some util class? So, instead of list.min |a,b| {a <=> b}, there should be ListUtil.min(list) |a, b| {a <=> b}? Here the list of methods which I guess can be moved to utility:

Converting intersection and union to static methods also brings "varargs" advantage - instead of

list1.interserction(list2).intersection(list3)

there can be:

ListUtil.intersection([list1, list2, list3])

Also, it may be useful to introduce some mixin for List so utility methods can work with this mixin, not with list

jodastephen Mon 28 Dec 2009

Intention revealing

Having re-read the humane docs, I think the key phrase I read was that the methods selected to be on the API should be "intention revealing". Thus, last is a good method name because it reveals more about the developers intention when re-reading the code at a later point in time.

The same intention revealing approach should apply to types as well as methods. A Stack is not a List and a List is not a Stack. Thus, IMO Fantom is playing with fire by trying to conflate them in one class. Now, this may partly be due to the specific generics implementation, and partly due to the lack of a language feature to easily delegate functionality to another class, but either way I believe it to be a mistake.

Method proposals

  • containsAny - similar to containsAll, but returns true if this list contains any of the other list.
  • methods to mutate the existing list (keep/filter), like exclude or findAll
  • fill might be better named addMultiple
  • unique might be better named deduplicate (although having a Set class would be more intention rvealing)
  • addUnique adds to the list if not already there, although again a Set would be clearer

Same

containsAllSame, containsSame, indexSame, removeSame.... These have been added because there are two ways to compare objects as equal. Except that isn't true, there are many ways. For example, if each object contains both state compared in equals, and a unique object id, then there will be a requirement for "contains by id". Thus, we should remove containsSame and change contains:

Bool contains(V item, |V, V -> Bool| matcher := EqualsMatchers.equals)

where Fantom supplies EqualsMatchers.equals and EqualsMatchers.same, and an application can supply "equals by id". (Ideally, EqualsMatchers.equals should just be Obj#equals that hoists its target to be an additional first arg, but I'm not sure thats possible in Fantom right now)

While

eachWhile, eachrWhile.... These still smack of a lack of a language feature. Now I know we've looked at this in the past, but perhaps we need to think again.

Idempotent (added in an edit)

Perhaps there should be an @Idempotent facet, as having it in the docs means it can't be queried by the application.

jodastephen Mon 28 Dec 2009

Binary search

I think that what might be needed is to return a pair instead of a single Int:

Int[] binarySearch(V item)

(Ideally, the result type should be a Pair<Int,Int> in Java generics speak)

Index 0 of the result is the index of the first match, or the insertion point before the match.

Index 1 of the result is the index just after the last match, or the insertion point after the match.

If index 0 == index 1, then no match was found.

DanielFath Mon 28 Dec 2009

Thanks KevinKelley, I don't use binarySearch so I didn't know that it was that much of a hassle.

Ivan, well it would be convenient but as you said it is a real corner case. I would prefer if there was a superior way to express method chaining like list.index(.max) or something but if all else fails an extra method would be a nice addition (to ListUtil just to be clear).

However what jodastephen said really hit the mark. List, Stack and Set (and to some extent OrderedList) aren't the same thing. They can all be implemented using list but when you merge their functionality it really doesn't look nice.

I was thinking the exact thing about Same in methods. It should be removed and === should just become optional/default parameter.

brian Mon 28 Dec 2009

The binarySearch just maps to API used by java.util.Collections which maybe is not the best API, but seems reasonable. In general if you have multiple items you don't care where it gets inserted, and we don't want to sacrifice performance.

Regarding the the xxxSame methods - the reason I didn't make those closures was strictly for performance. I feel that finding something using the === is common enough in performance critical situations that it warrants an optimized method.

Although it might be a good idea to add closures to contains, containsAll

andy Mon 28 Dec 2009

Although it might be a good idea to add closures to contains, containsAll

You can do that today with find - just have to check for null. If the closure for contains was optional - then maybe its ok - otherwise seems like alot of overlap.

@minIndex,maxIndex

I don't see this a common enough to warrant methods on List - not everything has to be done with a single method call.

jodastephen Thu 7 Jan 2010

The binarySearch just maps to API used by java.util.Collections which maybe is not the best API, but seems reasonable. In general if you have multiple items you don't care where it gets inserted, and we don't want to sacrifice performance.

If you're worried about performance, then add a second method binarySearchIndices. This is one case where the Java method, copied by Fantom, is horrible to use for anything other than the simple use case.

Regarding the the xxxSame methods - the reason I didn't make those closures was strictly for performance.

Maybe speed is occasionally important. But it doesn't remove the use case of equals/contains etc based on something other than .equals().

I'd also like to hear comments on mutating the existing list, and on some of the method renames I proposed.

katox Thu 7 Jan 2010

fill might be better named addMultiple

I'm not too sure about this, fill is a rather traditional name. On the other hand I've been bitten more then once when trying

listOfLists.fill([,], max)

So if it were to be renamed I'd prefer a name which would be really clear -- that this results in [..., a, a, a] not in [..., a1, a2, a3]

jodastephen Thu 7 Jan 2010

fill makes sense if the list is empty (ie. as a factory method). addMultiple is clearer as to what is really occurring, and also fits if try to use the list as a bag data structure (although again, just like Stack and Set, it should be a separate class.

brian Thu 7 Jan 2010

I'd also like to hear comments on mutating the existing list, and on some of the method renames I proposed.

  • containsAny: might be useful, although it doesn't exist because I've actually never needed it
  • mutate exclude/findAll: you end up needing to create a separate array under the covers, so not sure we get any performance benefit
  • fill might be better named addMultiple: I think fill is right name for common case when working with empty list; maybe we should use a range instead of a count?
    list.fill(0..5, "")
  • unique: I think this is a good simple name for what it does
  • addUnique: might be useful (although you are right, that really implies Set whichis better backed by map than list)
    • contains, index: these already have methods which take a closure (find, any, all, findIndex, etc);

Perhaps there should be an @Idempotent facet, as having it in the docs means it can't be queried by the application.

I like that idea.

jodastephen Thu 7 Jan 2010

  • containsAny - we use that quite a lot at $work (travel industry) Its a pain its not in the Java collections framework.
  • mutators - this isn't about performance! Consider a "Java bean", a regular mutable POJO. I want to be able to filter the list on that bean in situ:

    // proposed bean.flights.removeMatching(someCriteriaInAClosure)

    // today? temp := bean.flights temp2 := temp.exclude(someCriteriaInAClosure) bean.flights.clear() bean.flights.addAll(temp2)

I should also say that exclude() violates my expectations, as I'd expect it to mutate the list, not return a new one. (And the examples for exclude, find and so on are misleading too as they don't assign the result to a variable)

  • fill - its certainly clearer with a range
  • unique - again this is something we use all the time at $work, but we would need to mutate, not return new, maybe a new method?

Many methods are duplicated with a mutating and non-mutating version. But I still have difficulty know which are which.

A general comment: I also have large classes with Joda-Time/JSR-310. But I make them understandable by having relatively few common verbs. This greatly aids navigation by auto-complete, which is the most common technique these days. For example, when changing the time-zone offset, you can retain the same local time, or the same instant. So I use a common verb prefix to make them appear together in auto-complete:

date.withOffsetSameLocal(offset)
date.withOffsetSameInstant(offset)

katox Thu 7 Jan 2010

that really implies Set which is better backed by map than list

Wouldn't adding Set/Bag/Dict types have a large impact on current API? It would have to be expanded hierarchically to avoid duplicating everything for similar but distinct types, uncessarily complicating everything.

I've always been more productive with just gluing stuff with associative arrays than for instance STL. If you really need some kind of uber data structure I think you should use a specialized library (outside core language).

A Stack is not a List and a List is not a Stack.

Sounds like an invitation to generics.

Most programmers know (or even seek for) DS backing structures anyway. If there were no Stack class or conveniece methods they'd probably use plain [,] to simulate it (or they'd roll a completely new lib if generics were available).

Having some O(n), 0(1), ... guarantees in docs on things like size() would be nice though.

brian Fri 8 Jan 2010

The Fantom convention I use is that you never expose a mutable list, mutation is done safely inside a class and only a ro or toImmutable copy is every exposed. So I think adding more mutating methods would be the wrong path to head down.

Actually I think we made a big mistake making sort/reverse in-place. We just make all those transform methods (map, sort, etc) return new lists. Any objections to making them return new lists? I'm thinking it would be a much better design, because often now you are given a ro list to sort and end up doing someList.dup.sort, but with this change you can safely just take any list (ro or immutable) and just call sort.

Regarding Stacks, Sets, etc... I am firm believer that just about every API problem where you share a data structure b/w encapsulated APIs can be boiled down to just lists and maps. You might use a more complicated data structure internally, but you should design APIs that share only lists or maps. This is why I've placed so much emphasis on these two classes and they are given special generic privilege. And I don't even want to consider introducing full generics or additional generic classes at this point.

Now perhaps the thorn in the side is Sets. In general I've always used Map as a poor maps sets (where I just use the keys and ignore the value). Sets are best publically exposed as a List, but backed by a map. And they are common enough to warrant generic parameterization.

But I'm thinking that for all practical purposes the List API is the Set API. it is just a question of a different backing store. I solved this problem elegantly with Map by just making caseInsensitive and ordered fields to change the backing store to support different use cases. And I think we can do the same with List by just adding an isSet field. Then we can get all the List API and generic goodness. What do you guys think of that idea?

andy Fri 8 Jan 2010

Actually I think we made a big mistake making sort/reverse in-place

I agree, lets make them all consistently return a new list - this one gets me alot.

I am firm believer that just about every API problem where you share a data structure b/w encapsulated APIs can be boiled down to just lists and maps

+1 - this model makes gluing code together much simpler.

Now perhaps the thorn in the side is Sets

Agree again, I run across this fairly often, and do the same thing using a Map while ignoring the values, and return map.keys as List. If I could efficiently use a List for this, it'd be nice.

helium Fri 8 Jan 2010

Actually I think we made a big mistake making sort/reverse in-place.

Sorting in-place is faster than returning a sorted copy. You could solve it by adding further methods: sort and reverse operate in-place, sorted and reversed return copies and leave the original list untouched. But as we are discussing that list is already too bloated that might be a bad idea.

tactics Fri 8 Jan 2010

If we're making that push for greater purity, what about adding a List.concat that returned the concatenation of two lists? I mistakenly used List.addAll for this in the thread `http://fantom.org/sidewalk/topic/807`. Maybe a Map.merge that would do a similar thing for Maps.

I am firm believer that just about every API problem where you share a data structure b/w encapsulated APIs can be boiled down to just lists and maps

This is precisely the kind of philosophy that makes Fantom such an interesting language to me.

katox Fri 8 Jan 2010

Sorting in-place is faster than returning a sorted copy. You could solve it by adding further methods: sort and reverse operate in-place, sorted and reversed return copies and leave the original list untouched.

Even though it is probably faster I agree making sort and reverse return a new array is more coherent with the rest of Fantom API. I'd rather make it default and add new functions the other way around, non-mutating by default - maybe some sort of new convention (postfix?) for mutating methods would help.

+1 on not touching [,] and [:] usage in interfaces. +1 on making non-mutating the default.

I mistakenly used List.addAll for this in the thread #807. Maybe a Map.merge that would do a similar thing for Maps.

I don't see what is confusing on plus returning concatenated (not flattened) two lists returning a copy. But concat for Lists and merge for Maps would be good names too.

DanielFath Sat 9 Jan 2010

On the topic of Sets, Stacks and Ordered lists. I have to say a few words.

First of all I like the thought the goes into Fantom and I respect your choices when it comes to humane and abolishment of generics (even though I disagree in the latter case).

Speed

I'm not sure what you use below the hood for List but either way there is little to no way it will always be optimal. In Java I could always choose the implementation best suited for my needs or if push comes to shove make my own.

In Fantom there ain't much of a choice. Ok, truth be told this is probably the least of concern (though I still don't get the insisting of Same operator).

Behavior

Adding field to sys::List that change implementation or behavior, such as isSet, (is)Ordered and caseInsensitive could be a solution that will just run into more unexpected behavior (like ordered and caseInsensitive being exclusive). I mean what if future more fields are added that aren't exclusive with ordered, but are exclusive with caseInsensitive?

BTW if all of these behaviors are exclusive wouldn't making those fields into Enum a more sensible choice?

My proposition for solving this is to find a way to group your slots. I remember someone already proposing something like that. It seems like a more encompassing solution overall.

list := ['a','b','c']
list[3]               => c 
stack := list.Stack

stack.push('c')       => [a,b,c,c]
stack[3]              => UnknownMethodErr


set := list.Set
set.add('c')          => [a,b,c]
set[3]                => c

Iteration

The way we iterate through a structure should be independent from the data structures. I think some kind of generic iterator should exist but I'm unsure how it would look like in Fantom.

helium Sat 9 Jan 2010

My proposition for solving this is to find a way to group your slots. I remember someone already proposing something like that.

I did, but deleted it 5 minutes later, because I wanted to think more about it before proposing it. Seems like you read it in that short time.

lbertrand Sat 9 Jan 2010

Sorry, a little bit off topic, but if all lists and maps are immutable, would it not be great to have their implementation like in clojure, so that the cost of immutability is not too high?

brian Sun 10 Jan 2010

I mean what if future more fields are added that aren't exclusive with ordered, but are exclusive with caseInsensitive?

I think we just have define APIs that are practically implemented. BTW, there is no reason caseInsensitive and ordered have to be exclusive, other than I just didn't have the time to write real backing stores myself to handle it.

My proposition for solving this is to find a way to group your slots. I remember someone already proposing something like that. It seems like a more encompassing solution overall

I don't really like that - a List is a List and a Map is Map. How you use them is up you. Eventually I'd like to have oodles of other collections in util, but they won't occupy the same promence in the API as List and Map. List and Map are the "API types" you use to share data between APIs. Other collections should be internal structures.

The way we iterate through a structure should be independent from the data structures. I think some kind of generic iterator should exist but I'm unsure how it would look like in Fantom.

It does today with duck typing:

collection->each |item| { echo(item) }

Login or Signup to reply.