#224 pattern matching and python tuples

mike Tue 6 May 2008

The Roadmap mentions enhancing the switch statement so that it "does more of what functional languages do with pattern matching".

This reminded me of Python's tuples, which are very similar in spirit to mvbind from Common Lisp. Python tuples are not quite as clever as pattern-matching-assigment in e.g. Erlang, but they do accomplish one great thing: avoiding creating a class just to pass a value back out from a function.

Python-style tuples on their own may not be particularly useful for enhancing switch, but they could be extended to support Erlang-style matching if assigment were allowed to "fail" by returning False or null.

My point though is that even if plain Python-tuples were simply imported as-is into Fan, they perform a very useful task in that they allow a simplistic form of lightweight object creation. Python actually uses them as return values in various places in its libraries, e.g. os.path.splitext, or getopt.getopt.

Here is an example python script that I whipped up, showing what you can do with tuples:

#!/usr/bin/python

# function that returns a tuple
def foo():
    return ('w', 'x', ('y', 'z'))

# function that "deconstructs" a tuple argument
def bar(x, (y, z)):
    assert x == 'd'
    assert y == 'e'
    assert z == 'f'

if __name__ == '__main__':

    # make a nested tuple
    t = ('a', 'b', ('c', 'd'))

    # use tuple assignment
    (m, n, (o, p)) = t
    assert m == 'a'
    assert n == 'b'
    assert o == 'c'
    assert p == 'd'

    # function that returns a tuple
    (m, n, (o, p)) = foo()
    assert m == 'w'
    assert n == 'x'
    assert o == 'y'
    assert p == 'z'

    # function that "deconstructs" a tuple argument
    bar('d', ('e', 'f'))

brian Tue 6 May 2008

IMO tuples are kind of a weird compromise between list and map literals, with some clever variable manipulation. One of things I would like to do is figure out how can solve some of the tuple use cases using strictly the existing List/Map types. For example light-weight returns are already easy:

Obj[] foo() { return ['w', 'x', ['y', 'z']] }

What isn't easy today is decomposing a list into variables. I would like to figure out a way to do that leveraging the existing collection types and not having to introduce first class tuples. I haven't given it much thought other than the fairly narrow problem of multiple returns:

w, x, yz := foo

That is pretty simple, but not as general purpose or as flexible as real tuples.

tactics Tue 6 May 2008

I also support the idea of tuples.

The distinction in Python isn't a good example. In Haskell and ML, though, tuples are very important data structures. The nice property they have is that they preserve exact typing. So instead of having a bunch of method calls returning Obj[], you get a very clear idea of what's coming back to you: (Str, Int) or (Int, Int, Int). They are also very nice datastructures for a quick and dirty association list. Yeah, usually you can use a map, but assoc. lists are really handy too.

Regardless of everything, though, untupling of some variety is definitely a must-have.

brian Tue 6 May 2008

I agree that some type of untupling is probably a requirement for 1.0.

And you are quite right that typed tuples would indeed be a bit distinction b/w the Python model. That would be pretty nice actually. I will put this on the roadmap doc.

helium Wed 7 May 2008

If the language gets pattern matching and you use patterns with the := operator like in ML you would get "untupling" and list deconstruction for free.

(* O'Caml *)
let foo = 42;;  (* just like your foo := 42. 42 is matched against the pattern foo *)

let (x, y) = (42, 7);;   (* "untupling", the tuple (42, 7) is matched against (x, y) *)

let (z, _) = (42, 7);;   (* (42, 7) is matched against (z, _), which only binds z. We aren't interested in the second tuple element *)

let h::t = [1; 2; 3; 4];;  (* h becomes the head (1) and t the tail ([2;3;4]) *)

let (a, b)::c = [(1,2); (3,4)];; (* a becomes 1, b becomes 2 and c becomes [(3, 4)] *)

If you allow patterns you can deconstruct in an arbitrarily complex way.

The only problem I see is that it might not work well together with Fan's type annoation syntax.

foo := 42  // would mean match 42 against x, so x get's bound to 42
Int bar := 42   // OK, I help the type-inferer a bit

[(x, y), z] := [(1, "one"), (2, "two")]  // binds x to 1, y to "one" and z to (2, "two")

But how would type annotations look like in the last case?

[(Int x, String y), (Int, String) z] := [(1, "one"), (2, "two")]

Or do I allways annotate the whole pattern?

(Int, String)[] [(x, y), z] := [(1, "one"), (2, "two")]

tactics Wed 7 May 2008

I'm not sure I support the idea of pattern matching against lists in Fan. Tuples yes, but lists....

Lists are the fundamental building block of functional languages. The paradigm is to split the head and tail, process the head, then recurse on the tail. However, in OOP and more procedure-oriented languages, splitting on the head and tail is unnecessary (and more costly, as lists are usually implemented as resizeable arrays).

Additionally, splitting on head and tail is not a total operation. In MLs, this is ok (for function defs), because if the (h::t) clause fails a pattern match, it moves on to the next pattern. But in variable assignments, there are no alternative patterns, and this would cause situations where an exception can be thrown on assignment, which is just weird.

So the above code would need to be something more like:

(Int, String)[] list = [(1, "one"), (2, "two")]
(Int, String) (x, y) = list[0]
(Int, String)[] z = list[1..-1]

Then, everything is completely type sound, and errors are thrown by the [] method instead of by magic ;-)

helium Thu 8 May 2008

"The paradigm is to split the head and tail, process the head, then recurse on the tail."

This is normaly called "map", isn't it?

"I'm not sure I support the idea of pattern matching against lists in Fan. Tuples yes, but lists...."

I'd suggest to use something like F#'s active patterns or Scala's extractors. IMHO, without something like this pattern matching wouldn't make much sense in an OO language. I don't suggest that there sould be an operator :: the can deconstruct lists. This was just an example for deep pattern matching.

"Additionally, splitting on head and tail is not a total operation. In MLs, this is ok (for function defs), because if the (h::t) clause fails a pattern match, it moves on to the next pattern."

How so?

let (1, a) = (1, 2);; (* OK *)
let (1, b) = (2, 2);; (* exception, there is no other pattern you could "move" to *)

This is stupid and you never do this, but it is possible (and the compiler will warn you, that the pattern isn't exhaustive).

The roadmap says:

"Today the Fan switch statement is a not much different than its C and Java ancestors. It seems to make sense to enhance the switch statement to embody more of what functional languages do with pattern matching. We'll need to do some brainstorming on this one."

This does not sound like they only want pattern matching on tuples, especially if you consider that until this thread they didn't even consider adding tuples.

EDIT: blockquotes don't seem to work, so I added quotation marks.

mike Thu 8 May 2008

My original idea was to have tuples be totally dynamic and untyped. But Fan is a (mostly) statically typed language, so if it has tuples, then I agree that they should be typed. Most of my knowledge of functional languages is with dynamically typed Lisp dialects (plus a little Erlang), so I clearly need to bone up on ML, Haskell, etc. Haskell makes my brain hurt though.

I think from an implementation standpoint, that if no one really needs to match against anything other than tuples, then that is all Fan needs to do. Some concrete use cases for-and-against would be useful.

@brian -- I took a cursory look through the api and didn't see anything that would obviously be better expressed with a tuple. But it would probably be a good idea for someone with a detailed knowledge of the library to look for opportunities to use tuples in the public api before it becomes too set in stone.

tactics Thu 8 May 2008

Haskell's not that bad once you get used to the hurting. Once you finally get it, you start seeing everything you do in types.

If you really want an untyped tuple, you simply create an (Obj, Obj) tuple, and use the -> syntax to do dynamic calls.

Thinking about tuples some more, this presents and interesting issue with accessing components.

In Python, it can also be done through iteration or indexing:

for x in (1, 2, 3):
   print x

print (1,2,3,4)[2]

However, these don't really make sense, because the types would always have to be generalized to Obj. For example, what is the type of the expression:

Int index = getUserInput
echo( (1, "Ham", [1,2] )[index] )

You don't know! Not at compile time anyway. So tuples would really have to act a lot more like those in ML and Haskell. I'm wondering if there might be issues, though, since pattern matching, being a Statement in Fan's syntax, wouldn't be as flexible as in a functional language (where everything is an expression).

brian Thu 8 May 2008

This discussion has really got me thinking. I definitely think tuples and pattern matching go together and need to be designed together. Unfortunately I haven't thought a lot about pattern matching and that use case. But coming at from another direction, I like to think of tuples as a simple structures where declaring a class is too busy. Let me ask this: do we really want to access tuples values by position or by name? I would expect that by name is preferred right?

If you think along those lines then instead of tuples, maybe we should be thinking more like C# anonymous types. Anonymous types let us declare a new class "inline" via something like with-blocks. Using C# syntax it might look something like:

x := new { id = 7; firstName="Bart"; lastName="Simpson" }

Now x has a first class type that I can use calls and normal reflection on:

x.type.fields      =>  ["id", "firstName", "lastName"]
x.type.field("id") =>  Int id
x.firstName        =>  "Bart"
x->lastName        =>  "Simpson"

The way C# uses anonymous types within LINQ to mix relational and OO is pretty elegant. In fact we've struggled with that for haven - how to you "type" something like an SQL projection? What we are doing today is encapsulated into the sql::Row class which itself uses dynamic types.

What do you guys think of moving that direction? How might that work with pattern matching?

andy Thu 8 May 2008

I probably can't provide much useful input on this topic, since I have very limited experience with tuples in languages like python, ML, and Haskell. Mostly my only use case has been for multiple return values.

But I have used "anonymous types" in JavaScript alot, and they are very useful. And I use them in exactly the way Brian has outlined above. So I think something like C# anonymous types would be a very nice feature to have.

tactics Thu 8 May 2008

OCaml has this concept (they call them records). The only issue with them is they have no pretty type literal. In OCaml, given the example above, I think the syntax for the type is something like {id : int, firstName : string, lastName : string}, which is pretty verbose. To me, at least, it feels like anonymous types are hardly anonymous at all because of all the identifiers! If you're going to carry around the identifiers, you should probably just create a class.

But that's just my opinion =-)

Like Andy said, the main use is passing back multiple return values from a function. There's all sorts of cool things you can do with this. One example, I remember I did in PHP a while ago, was parsing these IDs we had in our system that looked like "123-a". But we needed a method to parse slightly malformed IDs, such as "123a", "123 a", etc. It would have been awesome to write a function that took a string and returned an (Int, String). But since it was PHP, I had to return an array instead. (This was before I learned about PHP's list macro, of course).

But a list style macro doesn't really work in a statically typed language. As in the above example, it's often nice to return heterogeneous types.

If I have time, I will try to pull out some nice uses I've seen from the Python standard library. Off the top of my head, two that stand out in my head are socket.accept(), which returns the incoming socket and it's Inet address, and os.walk(), which returns an iterator of type (String, Stirng[], String[]), so you can do cool things like:

for dirpath, dirnames, filenames in os.walk('mydirectory'):
    print dirpath
    for file in dirnames + filenames:
        print '   ',  file

brian Thu 8 May 2008

I actually think both of those use cases get mapped into Fan's OO APIs pretty well without tuples since we have a nice TcpSocket and File class:

dir := `mydirectory`.toFile
echo(dir)
dir.list.each |File f| { echo(f.path) }

Actually having a File object you can call path, name, ext, parent, etc on seems more flexible.

Although assuming we did use anonymous types, I think having the fields tagged with a name makes things a lot more readable:

{Str dirpath, Str[] dirnames, Str[] filenames} walk()

That gets back to my original point - doesn't it result in cleaner more maintainable code to access a "tuple's" values with names versus ordering? I think so.

Although I don't really think tuples or anonymous types are really necessary for APIs. Except for a couple rare cases where I've really wanted multiple returns, things can always be done pretty cleanly with the existing type system IMO. To me the big use cases for something like this are:

  • pattern matching
  • internal logic such as message passing b/w threads

helium Thu 8 May 2008

"But coming at from another direction, I like to think of tuples as a simple structures where declaring a class is too busy. Let me ask this: do we really want to access tuples values by position or by name? I would expect that by name is preferred right?"

If I don't use pattern matching I access tuples normaly "by index" (if you want to call it that). In most cases I only have tuples with two elements (pairs) and use a function that returns the first or the second element of that pair (normally called fst and snd, at least in Haskell and O'Caml). For example this happen's when I'm only interested in one of the two results of a function.

"If you think along those lines then instead of tuples, maybe we should be thinking more like C# anonymous types. [...] What do you guys think of moving that direction? How might that work with pattern matching?"

In O'Caml you can pattern match against records, so you could do it just like they do. But I very rarly use this. Well, actually I don't use records that often.

let bar = { x = 0; y = 3.1415926; z = "" };;

let { x = a; y = b; z = c } = bar;;

Now a is bound to 0, b is bound to 3.1415926, c is bound to "".

Normaly I want either simple tuples (most often pairs, seldom triples) or full blown objects. Records are somewhere in between. They are nice in the case of LINQ, but I'm not sure that there are many other use-cases. Perhaps you can use them to write functions where you can pass arguments by name, rather than by position.

"That gets back to my original point - doesn't it result in cleaner more maintainable code to access a "tuple's" values with names versus ordering? I think so."

Then they are no longer tuples!!! Then they are records! A tuple is something like a arecord with unnamed elements. It comes from mathemathics and you can't just define it like you want to, at least not without confusing many people.

Login or Signup to reply.