#305 Constructor Design Take 3

brian Mon 21 Jul 2008

I think it is time to lay the constructor debate to rest. Previous discussions:

I think the discussion has been really great, but I've convinced myself that the current design, while not perfect, is the correct design. The focal point of my decision is that true constructors as currently designed serve as good initialization methods for both client code and subclasses. Neither true static methods nor instance methods can satisfy that requirement.

However, there are still two enhancements required:

Short Calling Syntax

There seems to be general agreement to provide some syntax sugar for calling a constructor. Proposed change is to compile Type(args), the construction operator, as follows:

if the arguments match Type.make then bind to it (whether constructor or factory)
if the arguments match Type.fromStr then bind to it (existing behavior, to keep serialization a true subset of the language)
otherwise compiler error

The nice thing about this feature is we could add new rules in the future, so it decouples the caller from the actual implementation a little bit (at the source level only). Plus we are already using make in this way for Type {...} which is sugar for Type.make {...}.
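The lookup order above can be modelled roughly in Java. This is only a sketch: the Fan compiler would resolve the binding statically, whereas this uses runtime reflection, and ConstructionOperator, resolve, matches and Pt are hypothetical names invented for the illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical model of the proposed rule for Type(args); not Fan compiler code.
final class ConstructionOperator {
    // Returns the name of the method that Type(args) would bind to.
    static String resolve(Class<?> type, Object... args) {
        if (matches(type, "make", args)) return "make";        // rule 1
        if (matches(type, "fromStr", args)) return "fromStr";  // rule 2
        throw new IllegalArgumentException(                     // rule 3: "compiler error"
            "no make or fromStr on " + type.getSimpleName() + " accepts these arguments");
    }

    // True if the type declares a static method with this name whose
    // parameter types accept the given arguments.
    private static boolean matches(Class<?> type, String name, Object[] args) {
        for (Method m : type.getDeclaredMethods()) {
            if (!m.getName().equals(name) || !Modifier.isStatic(m.getModifiers())) continue;
            Class<?>[] params = m.getParameterTypes();
            if (params.length != args.length) continue;
            boolean ok = true;
            for (int i = 0; i < params.length; i++) {
                if (!params[i].isInstance(args[i])) ok = false;
            }
            if (ok) return true;
        }
        return false;
    }
}

// Example type: make takes two Integers, fromStr takes a String.
final class Pt {
    final int x, y;
    private Pt(int x, int y) { this.x = x; this.y = y; }
    static Pt make(Integer x, Integer y) { return new Pt(x, y); }
    static Pt fromStr(String s) {
        String[] parts = s.split(",");
        return new Pt(Integer.parseInt(parts[0].trim()), Integer.parseInt(parts[1].trim()));
    }
}
```

With these definitions, resolve(Pt.class, 1, 2) binds to make, resolve(Pt.class, "3,4") falls through to fromStr, and resolve(Pt.class, 1.5) is rejected. A later rule could be added between the second and third steps without changing any caller.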

Convention will be that Type(...) is preferred over Type.make(...). Not sure whether Type() is preferred over Type.make (when there are no arguments). We should pick one as convention.

Validation

There is broad agreement that validation is required once all the const fields have been initialized. There is no consensus on how that actually happens. My proposal is to extend the current two-phase construction process into a three-phase construction process:

constructor runs (class code)
with-block on the constructor expression runs (client code)
new handler runs (class code)

At any point during the construction's three phases, const or readonly fields can be set. Because client code can set const fields after the constructor, the class gets a chance to validate in its new handler. Once all three phases are complete, construction is complete and const fields are guaranteed immutable.

The new handler is specified similarly to a static initializer, using the new keyword:

new
{
  if (name == null) throw NewErr("name is null")
}

By convention validation errors will throw NewErr. New handlers run from the top of the base class down to the subclasses. The superclass new handler is guaranteed to be called. There is no requirement to "call super", it is implicit. Mixins are not allowed to have new handlers. New handlers are generated as public virtual instance methods called "new$" with an implicit super call. The new handler is invoked by the compiler at the constructor call site using CallVirtual.
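To make the ordering concrete, here is a rough Java model of the three phases. It is a sketch under assumptions: Base, Derived, Construction and construct are made-up names, IllegalStateException stands in for NewErr, and the super call that the Fan compiler would generate implicitly is written out by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Java model of the proposed three phases; all names here are illustrative.
class Base {
    final List<String> log = new ArrayList<>();
    String name;

    Base() { log.add("ctor Base"); }

    // Stands in for the generated "new$" method: a virtual instance method.
    void new$() {
        log.add("new Base");
        if (name == null) throw new IllegalStateException("name is null"); // NewErr in the proposal
    }
}

class Derived extends Base {
    Derived() { super(); log.add("ctor Derived"); }

    @Override
    void new$() {
        super.new$();           // the compiler would insert this call implicitly
        log.add("new Derived");
    }
}

final class Construction {
    // Roughly what the compiler would emit at the constructor call site.
    static <T extends Base> T construct(Supplier<T> ctor, Consumer<T> withBlock) {
        T obj = ctor.get();     // phase 1: constructor (class code)
        withBlock.accept(obj);  // phase 2: with-block (client code)
        obj.new$();             // phase 3: new handler (class code), dispatched virtually
        return obj;
    }
}
```

Calling construct(Derived::new, o -> { o.log.add("with-block"); o.name = "origin"; }) records the events in the order ctor Base, ctor Derived, with-block, new Base, new Derived. If the with-block leaves name unset, the handler in Base throws before the Derived handler runs.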

Future work might entail annotating fields for compiler generated error checking (such as a nonnull facet). But I want to get some experience with the manual design first.

brian Mon 21 Jul 2008

One thing I forgot to mention is that a new handler can change its own const fields, but cannot change any const fields of its superclass. As the new handlers execute down the class hierarchy, each level of the hierarchy gets locked down.
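A small Java model of the lock-down rule, for illustration only. In the proposal this would presumably be a compile-time check; the sketch turns it into a runtime flag so the behaviour can be observed. Animal, Dog, setSpecies and the touchSuperField switch are hypothetical names.

```java
// Runtime model of the lock-down rule; in the proposal it would be checked by the compiler.
class Animal {
    private String species;
    private boolean locked;    // becomes true once Animal's own handler has run

    String species() { return species; }

    // Models assignment to the const field "species".
    void setSpecies(String value) {
        if (locked) throw new IllegalStateException("Animal.species is locked");
        species = value;
    }

    // Stand-in for Animal's new handler.
    void new$() {
        if (species == null) setSpecies("unknown"); // a handler may still set its own fields
        locked = true;                               // this level is now immutable
    }
}

class Dog extends Animal {
    private final boolean touchSuperField;

    Dog(boolean touchSuperField) { this.touchSuperField = touchSuperField; }

    // Stand-in for Dog's new handler.
    @Override
    void new$() {
        super.new$();                               // implicit in the proposal
        if (touchSuperField) setSpecies("canine"); // rejected: Animal is already locked
    }
}
```

Here new Dog(false) can have setSpecies called before new$() runs, which corresponds to the constructor and with-block phases. Once new$() has run, any further setSpecies call fails, and new Dog(true).new$() fails inside the subclass handler because the superclass level is already locked.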

jodastephen Mon 21 Jul 2008

Let's accept these two items as is - I think that they will enhance Fan. Personally, I'd choose Type() as the no-args construction variant.

However, unfortunately, I don't quite think this is the end of the story. (Note that the following don't prevent you from going ahead and implementing your proposal, as these are extensions/comments not changes).

1) Derived state:

class Point {
  const Int x
  const Int y
  const Int area
}

point := Point {x = 10; y = 10; area = 500}

As we can see, the point has been set up with an invalid area (should be x * y). We can add a new-handler to manage this:

new {
  area = x * y
}

But this doesn't affect the caller:

point := Point {x = 10; y = 10; area = 500}
point.area   // this will return 100 despite the 500 above

This definitely doesn't seem desirable to me.
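The surprise can be reproduced in a rough Java model. Rect and its make method are hypothetical stand-ins for the Fan class and its with-block construction, and the fields are left mutable because Java has no equivalent of setting a const field from a with-block.

```java
import java.util.function.Consumer;

// Models a class with const fields x, y and area under the proposed new handler.
final class Rect {
    int x, y, area;

    // Stand-in for the new handler: recomputes the derived field.
    private void new$() { area = x * y; }

    // Stand-in for Rect { ... }: constructor, with-block, then new handler.
    static Rect make(Consumer<Rect> withBlock) {
        Rect r = new Rect();
        withBlock.accept(r);   // client code may set any field, including area
        r.new$();              // class code silently overwrites area
        return r;
    }
}
```

Calling Rect.make(p -> { p.x = 10; p.y = 10; p.area = 500; }) yields an object whose area is 100. The caller's 500 is discarded without any error, which is the behaviour being objected to.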

What alternatives are there? Well the field could just be a calculated field - easy in this case, but let's say the calculation is heavyweight and slow, and we want to cache the result in real storage. (Can you have a const calculated field?)

Or, we could mark the area field as derived, maybe using readonly (allowing the field to be set by the class, but not by the with-block):

class Point {
  const Int x
  const Int y
  const readonly Int area
}

point := Point {x = 10; y = 10}              // compiles
point := Point {x = 10; y = 10; area = 500}  // does not compile

This might already work - I haven't tried it yet. If so, it needs additional documentation.

2) What do constructors do?

Given this proposal, what does the constructor do:

set up state from arguments, typically simple assigns
set up private state, only if a with-block does not affect the state

What does it not do:

validate state - has to go in new-handler
set up private/public state which could be affected by with-block

As such, it seems most likely that a lot of constructors will simply be specifying a set of values which are assigned directly to fields. This common case is boilerplate, and it would be nice to improve it with an optional syntax sugar. For example:

class Point : NamedObj {
  const Int x := 0
  const Int y := 0
  new make(name, x, y)
}
// syntax sugar for
new make(Str name := null, Int x := 0, Int y := 0) : super(name) {
  this.x = x;
  this.y = y;
}

I'd also like to document some other possible changes, and why they might now be discounted:

1) One constructor.

The concept was to make construction simpler by only having one constructor - additional factories would be used instead, each calling the one constructor.

However, with the finding that a constructor is a useful way to initialize a class together with a new-handler, there doesn't seem to be any great benefit to having only one constructor. It might be a convention however.

2) Embedded super() call.

The super() call could be embedded in the constructor with code allowed before it:

new make(Str name := null) {
  baseName = name[1..-1]
  super(baseName)
  this.name = name;
}

However, this is difficult to make work in superclass hierarchies. And with constructors being less important now, it's just not worth it.

I'm sure I'll think of something else later.

helium Mon 21 Jul 2008

As we can see, the point has been set up with an invalid area (should be x * y).

Wrong. A point has no area. (It doesn't have a length or a volume, either.)

jodastephen Mon 21 Jul 2008

I've obviously read too many Point/Line/Rect examples. Think of the issue, not the name "area" ;-)

brian Mon 21 Jul 2008

At this point, I think the short calling syntax is finalized and I will be implementing it this week. I don't consider the validation matter closed; it still doesn't sit well with me. I just wanted to get a concrete proposal written up to help organize my mind.

What alternatives are there? Well the field could just be a calculated field - easy in this case, but let's say the calculation is heavyweight and slow

Typically you would use a once method for that, although currently I disallow once methods for const classes. I've thought about allowing once methods on a const class but running thru them all at construction time.

Or, we could mark the area field as derived, maybe using readonly (allowing the field to be set by the class, but not by the with-block)

That idea I like. We should keep thinking about that.

The new handler thing is starting to feel like overkill. After sleeping on it I realize that I have to generate and call the new handler all the time, even if 95% of classes don't declare or need one. Otherwise we end up with a fragile base class problem. After thinking more about it, I'm not sure I even have any existing code where I really care that much about a new handler. There aren't that many cases where it is the end of the world to not validate at construction time.

So maybe a better approach is to consider how to work around the problem when you do care. You can always do this:

const class Foo
{
  new make(Str name) { _name = name }
  Str name() { return _name }
  private const Str _name
}

Definitely has a bit of the Java smell, but is a serviceable mechanism for the cases when you don't want to allow name to be exposed as a const field. Tom had suggested a couple times to "just wait and see", and I've wavered back and forth. Now I'm kind of leaning towards do nothing again.
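For comparison, the same shape in Java. This is only a sketch: Named is a made-up class, and the null check is an assumed example of the validation this pattern makes possible, since a private field cannot be touched by client code after the constructor returns.

```java
// Java counterpart of the Foo workaround: the field is private, so only the
// constructor can set it, and the constructor is free to validate.
final class Named {
    private final String name;

    Named(String name) {
        if (name == null) throw new IllegalArgumentException("name is null");
        this.name = name;
    }

    // Accessor method instead of an exposed field.
    String name() { return name; }
}
```

The cost is the accessor boilerplate; the benefit is that validation happens in exactly one place and no later phase can invalidate it.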

The concept was to make construction simpler by only having one constructor - additional factories would be used instead, each calling the one constructor.

Yes, this issue kind of got lost. Actually this would be a really good thing if we can swing it because it allows me to emit a Fan constructor as one Java ctor call (today it requires 3 method calls). I still mean to go thru my constructor list and evaluate the places where I'm really using more than one. I will do that this week. If we can live with a single constructor, then I would still like to evaluate that change.

The super() call could be embedded in the constructor with code allowed before it

I consider this issue decided - the super call always goes first C++/C# style.

jodastephen Mon 21 Jul 2008

After sleeping on it I realize that I have to generate and call the new handler all the time, even if 95% of classes don't declare or need one

I think that the only happy solution will be bytecode that contains a genuine constructor with one argument for each field that can be set. Yes it will have many arguments, but that shouldn't be a problem. And it means the fields can be truly bytecode final. Note that the Fan source code doesn't need to reflect the bytecode.
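As a sketch of what I mean, the emitted class might look something like this in Java terms. Account and its fields are invented for the example, and placing the validation inside the constructor is my assumption about where it would naturally go.

```java
// What the generated class might look like: one constructor argument per settable field.
final class Account {
    final String owner;
    final int balance;

    Account(String owner, int balance) {
        // All values are known here, so validation can run before anything is stored.
        if (owner == null) throw new IllegalArgumentException("owner is null");
        if (balance < 0) throw new IllegalArgumentException("balance is negative");
        this.owner = owner;      // assigned exactly once, so the field can be final
        this.balance = balance;
    }
}
```

A source expression such as Account { owner = "ann"; balance = 10 } would then compile to the single call new Account("ann", 10), with any field the with-block does not mention passed as its default value.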

There aren't that many cases where it is the end of the world to not validate at construction time.

I disagree strongly with that. This seems like a statement that leans very much towards the dynamic "let it happen" end of the spectrum. If Fan is to be a general purpose language like Java, it needs to appeal to the mass of developers, many of whom like tight control.

Or to put it another way, most of the classes I write in Java need tight control over their internal state.

brian Tue 22 Jul 2008

If Fan is to be a general purpose language like Java, it needs to appeal to the mass of developers, many of whom like tight control.

My point was that anything you can do in Java can be done similarly in Fan. If you want to keep tight control, then don't make your fields public. Use private fields with accessor methods. Since this is an already common pattern in Java, it seems a reasonable option when that level of control is desired.

jodastephen Tue 22 Jul 2008

If you want to keep tight control, then don't make your fields public. Use private fields with accessor methods

Whilst I appreciate your point, I hope you can see that this is not a valid solution. Creating and using fields is going to be key in Fan and it needs to be simple and bulletproof. Any boilerplate is bad.

Here are the basics of where I am:

(1) const state needs to be validated at the end of the construction process
(2) private state needs to be setup at the end of the construction process, if it needs to be kept up to date after that, then setters or derived getters should be used
(3) public state can be set freely (setters/with-blocks), if it needs to be validated then setters must be used
(4) derived const state needs to be setup at the end of the construction process

I don't see any of these four as particularly negotiable to make the language usable.

Note that the new-handler addresses (1), (2) and (4), not (3). Not bad for the feature, but it does feel like a bit of a hack.

I think my view is that construction is a special phase of an object's lifecycle, and one that will necessarily have special rules. Trying to use the same with-block approach for both does appear to break whichever way I look at it. (Note that constructing and then immediately using a normal with-block is fine.)

JohnDG Tue 22 Jul 2008

I consider this issue decided - the super call always goes first C++/C# style.

I really hate this style -- not that I like Java's any more (I don't). Many times I need to perform work in the constructor involving the this pointer (typically in the context of delegation/composition), and then pass some derived artifact to the superclass. However, Java won't let you reference the this pointer until the superclass constructor is invoked. As a result, I'm forced to extract this logic into a factory method that calls an initialize method of the class -- with attendant consequences such as no longer being able to use final fields. Then classes that would otherwise be derived from this class are forced to delegate, instead, leading to endless boilerplate.
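The workaround I end up with in Java looks roughly like this. Widget, Button, initialize and create are made-up names, and the artifact is just a string so the sketch stays small.

```java
// Hypothetical names throughout; models the factory-plus-initialize workaround.
class Widget {
    private Object artifact;             // cannot be final: it is set after construction

    protected Widget() {}

    // Called by factories once "this" is available.
    protected void initialize(Object artifact) { this.artifact = artifact; }

    Object artifact() { return artifact; }
}

class Button extends Widget {
    private Button() { super(); }        // super(makeArtifact(this)) would not compile here

    // Factory method: construct first, then derive the artifact from the instance.
    static Button create() {
        Button b = new Button();
        b.initialize("artifact for " + b.getClass().getSimpleName());
        return b;
    }
}
```

Button.create().artifact() returns "artifact for Button", but the field had to give up final, and the private constructor means nothing can subclass Button any more.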

For issues like this, instead of trying to outsmart developers, I think it's better to trust them. I know the implications of referencing this before the superclass is invoked (and they are quite similar to the implications of referencing this before the subclass constructor is invoked, which Java allows) -- and for all my use cases, there are none.

Which is all why I originally wanted the super = notation. It stays out of my way and lets me initialize the superclass anytime I want and with any method I want (constructor or factory method).

However, I do see a possible way to achieve the effect I want for some cases in the current design. Something like:

class MyClass : SuperClass 
{
   new make() : super(createArtifact(this)) 
   {
   }

   static Artifact createArtifact(MyClass myClass) 
   {
     //
   }
}

Not sure if it will compile, though (it won't in Java).

andy Tue 22 Jul 2008

I've been on the fence with validation, but I think I've convinced myself it should not be part of the language. The problem is you're not really solving it in totality. It's very likely you need external resources to determine the validity of an object (to stay with the point example):

pt := Point(5,3) { someConstField = "invalid value" }

So validation solves the case where I internally want to verify the object. But say in my application, points must have positive values:

pt := Point(-5,3) { someConstField = "valid value" }
validatePoint(pt) // throw Err("x must be positive")

So why bother with validating Point when I'm just going to do it again in my application code?

Or in the third case (the most common one IMO), I just don't care. So in the cases where I do care, it seems like that's an application issue to implement, and we shouldn't complicate the core language with this feature.

brian Tue 22 Jul 2008

I hope you can see that this is not a valid solution. Creating and using fields is going to be key in Fan and it needs to be simple and bulletproof. Any boilerplate is bad.

I understand this sentiment. But in the end fields expose a certain detail about the implementation. There is a difference between a method which returns a value and a field which stores a value.

I think Fan solves the normal mutable field boilerplate that drives me crazy about Java and C#.

But I'm also not so quick to say that Fan solves all those problems. Accessing a const field is not binary compatible with a method call (accessing a normal field is).

John - I hear you, but I don't particularly like any of the proposals to enable that.

JohnDG Tue 22 Jul 2008

John - I hear you, but I don't particularly like any of the proposals to enable that.

Sure. Who knows, maybe mixins will eliminate the need to do this kind of thing? Need more experience with Fan to say.

alexlamsl Tue 22 Jul 2008

One thing I forgot to mention is that a new handler can change its own const fields, but cannot change any const fields of its superclass. As the new handlers execute down the class hierarchy, each level of the hierarchy gets locked down.

I would like to know more about that...

class A {
  new make() { Sys.out.printLine("make A") }
  new { Sys.out.printLine("new A") }
}

class B : A {
  new make() { Sys.out.printLine("make B") }
  new { Sys.out.printLine("new B") }
}

// what would this print?
B.make { Sys.out.printLine("with B") }

brian Wed 23 Jul 2008

I would like to know more about that

I don't really understand your example since there aren't any fields.

alexlamsl Thu 24 Jul 2008

Since I'm asking about the order of execution, I didn't think I needed to include fields explicitly...

But I guess I can be more rigorous:

class A {
  const Obj a;

  // OK
  new make() {
    a = Obj.make();
  }

  // OK
  new {
    a = Obj.make();
  }
}


class B : A {
  // allowed?
  new make() {
    a = Obj.make();
  }

  // allowed?
  new {
    a = Obj.make();
  }
}


// allowed?
B.make {
  a = Obj.make();
}

brian Thu 24 Jul 2008

Based on my original proposal (not that it is the one I favor now), the rules were:

class B : A 
{
  // allowed, super class hasn't run its own new handler yet
  new make() { a = Obj.make(); }  

  // not allowed, super class locked down after its new-handler  
  new { a = Obj.make(); }
}

// allowed, super class hasn't run its own new handler yet
B.make { a = Obj.make(); }

alexlamsl Thu 24 Jul 2008

Indeed - that doesn't sit too well with me :-/

Ignoring the validation parts, what is the behaviour for subclass modifying super const fields today?

Somehow I feel strange about the idea of B being able to modify a...
