#345 Pod dependencies

brian Mon 25 Aug 2008

I just want to move the pod dependency discussion to a different thread.

I think it is safe to allow multiple versions of one pod loaded at a time if dependencies are managed properly.

There are a couple problems with this approach. First, this often turns into a classloading nightmare in Java. Debugging this sort of thing where multiple classloaders have loaded different versions of the same named class is pure evil in my book.

This would be especially bad in Fan because we heavily use meta-data style programming by passing around qnames and types. A given qname such as "acme::Foo" should always resolve to the exact same Type instance. It shouldn't be a context sensitive resolution.

Also remember that Fan builds a type database of your installed pods. This is what enables that style of meta-programming. But it wouldn't work in a multi-version architecture.

So loading multiple versions can be incredibly flexible but you must pay an extreme price in complexity.

Fan's philosophy is to carefully declare your dependencies. The runtime will not load your pod if your dependencies are not met. This turns it into a social problem - you have to define a versioning design with real thought given to how you handle backward compatibility.

We do still need to define versioning rules. For example if you are going to develop a pod, which version of sys should you depend on? We probably need to say any 1.X version is guaranteed backward compatible and that we will roll to 2.X for breaking changes.
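For example, a pod's build script might declare its dependency on sys like this (a minimal sketch; the pod name and the exact build-API details are illustrative, not prescriptive):

using build

class Build : BuildPod
{
  new make()
  {
    podName = "acme"
    summary = "Example pod"
    version = Version.fromStr("1.0.0")
    depends = ["sys 1.0"]   // any sys 1.0.x; the runtime refuses to load acme otherwise
    srcDirs = [`fan/`]
  }
}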

katox Mon 25 Aug 2008

Debugging this sort of thing where multiple classloaders have loaded different versions of the same named class is pure evil in my book.

It is not the versioning that causes the JAR hell; it is the lack of versioning.

In Java you have no idea which class version you load or access (unless there is just one available). The classpath is flat, but there is a classloader hierarchy. There is nothing preventing you from specifying arbitrary junk of colliding jars, or even portions of jars, on your classpath - the first loaded class wins. The pity is that often you don't know which class file will be loaded first (or even from which jar). To complicate it even more, the delegation of class loading is (by convention!) the reverse of polymorphism: parent first.

On the other hand, there are no problems with the same name for a class or a variable in properly structured Java code. Encapsulation and scoping guarantee that such a thing is fine. The compiler doesn't force you to name each variable or inner class differently - at least not javac ;)

A given qname such as "acme::Foo" should always resolve to the exact same Type instance. It shouldn't be a context sensitive resolution.

There is nothing preventing us from specifying "acme::Foo" in code with versioned pods. It is up to the assembly to resolve the correct pod context.

Also remember that Fan builds a type database of your installed pods. This is what enables that style of meta-programming. But it wouldn't work in a multi-version architecture.

If the database is not flat I don't see a problem with that.

So loading multiple versions can be incredibly flexible but you must pay an extreme price in complexity.

No, you don't have to - if you stick with versioned pods not individual classes or mixins.

Pod versioning

Let's clarify why would we want versioned pods. In my opinion the main reasons are:

  1. to avoid class and namespace clashes,
  2. to support modularity and code reuse,
  3. to ease migration to newer software components.

If you allow just one pod version at a time you gain 1 but lose 2 and 3. With pod versioning you can have your cake and eat it too, if you avoid too much topping.

Versioning scheme

I'd suggest slight changes to sys::Depend in order to copy the well-known OSGi scheme (along with its terminology): <major>.<minor>.<micro>.<qualifier>, where <major>, <minor> and <micro> are numbers and <qualifier> is alphanumeric.

Each segment captures a different intent:

  • the major segment indicates an incompatible break in the API
  • the minor segment indicates backward-compatible, usually "externally visible", changes
  • the micro segment indicates bug fixes only
  • the string qualifier segment indicates a particular build or variation

Exported packages have to indicate a specific version (unlike mvn SNAPSHOT, for instance). Importers can indicate a range using mathematical interval notation (which could be swapped for a Fan notation).

Dependencies (I'd preserve the Fan additions because they seem logical to me):

"foo 1.2"             Any version of foo 1.2 with any micro version or qualifier
"foo 1.2.64"          Any build or variation of foo 1.2.64
"foo 1.2.64-b14"      Fragile exact build version
"foo 1.2+"            Any version of foo 1.2 or greater
"foo 0+"              Any version of foo
"foo [1.2.3, 4.5.6)"  Any version 1.2.3 <= x < 4.5.6
"foo [1.2.3, 4.5.6]"  Any version 1.2.3 <= x <= 4.5.6
"foo (1.2.3, 4.5.6]"  Any version 1.2.3 < x <= 4.5.6

The qualifiers are sorted alphabetically, so foo-1.2.3-rc1 < foo-1.2.3-rc2 etc. An empty qualifier beats everything: foo-1.2.3-rcX < foo-1.2.3.

By introducing a non-numeric segment we'd allow the popular "release candidate" pattern (hey, this needs testing, it is the next major upgrade) and build versions separate from bug fixes. Still, alphabetical ordering of qualifiers keeps things simple enough (no need to guess at a meaning).
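A minimal sketch of the proposed qualifier comparison (sys::Version has no qualifier segment today, so this helper is purely hypothetical):

class QualifierSketch
{
  // compare qualifiers alphabetically; the empty qualifier sorts after everything,
  // so "rc1" < "rc2" < ""  (i.e. foo-1.2.3-rc1 < foo-1.2.3-rc2 < foo-1.2.3)
  static Int compareQualifier(Str a, Str b)
  {
    if (a == b) return 0
    if (a.isEmpty) return 1      // empty qualifier beats everything
    if (b.isEmpty) return -1
    return a <=> b               // plain alphabetical order
  }
}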

Dependencies

The dependency graph would be layered, constructed using the following rules:

  1. a pod is a subject of a dependency (not a Class, Mixin etc.)
  2. a pod must declare all its direct input dependencies
  3. a pod must not declare dependencies on two pods of the same name with different versions
  4. input dependencies of a pod are opaque

This set of rules should be sufficient. The first rule ensures proper granularity. The second and fourth rules ensure that we know what our pod depends on without messing with transitive dependencies - they are invisible to our pod (we want to use net-2.0, we don't care about its internals like parser-1.1). The third rule ensures there is no ambiguity among our pod dependencies.

The graph from the previous posting would look like:

((parser-1.1.0) <-- net-2.0.1) <-- A-0.2.0
                (parser-1.0.0) <-|

There is no reason why A couldn't use parser-1.0 along with net-2.0. Pod parser-1.1 won't be visible to A. The rules for sys would be no different (including versioning).
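A sketch of how those declarations might look in build scripts (the build-API details and field names are assumptions; the versions are from the example):

using build

class BuildNet : BuildPod   // net-2.0.1
{
  new make()
  {
    podName = "net"
    version = Version.fromStr("2.0.1")
    depends = ["parser 1.1"]              // opaque to users of net
  }
}

class BuildA : BuildPod     // A-0.2.0
{
  new make()
  {
    podName = "A"
    version = Version.fromStr("0.2.0")
    depends = ["net 2.0", "parser 1.0"]   // A never sees net's parser-1.1
  }
}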

Pod sharing

Of course this could lead to a great multiplication of pod versions in a Fan system. This can be alleviated by careful tracking of dependencies and by removing unnecessary pod versions through wise use of version intervals and well-planned migrations.

A pod codebase could be shared. I see no problem there. Various pods could instantiate classes identified as "acme::Foo" - the assembly would know the context and it would give us the proper version automatically.

Imagine we upgraded A to v0.3:

((parser-1.1.3) <-- net-2.0.2) <-- A-0.3.0
                (parser-1.1.3) <-|

In A-0.3 we requested version [parser-1.1.2, parser-1.2) because we know that a nasty bug was corrected in parser-1.1.2. The networking pod net-2.0.2 has some bugs fixed and still has the same old dependency on parser version 1.1.0+. Now, with parser-1.1.3 available to the assembly, this pod is shared by the system (though invisible to A through the net pod).

There could be problems with singleton classes or with configuration via a bunch of static methods, but that's rather smelly anyway. I'd suggest preventing sharing of such pods completely.

Another problem could be competing for external resources, but that problem is not exclusive to versioning; it can happen with any pods of any content.

Summary

By loosening the rule from one pod version per runtime to one pod version per dependency we can get far better flexibility and robustness for large systems with rather low overhead. The rules for dealing with versions and dependencies are extremely simple.

I'd also say this system is quite natural. If you wanted to construct a cart and you already had an engine and a frame, you wouldn't "refactor" the frame just because the engine case had come with atypical bolts inside. You could use whatever bolts you liked without even noticing. However, if you wanted to modify the engine, using the same bolts would be a no-brainer.

jodastephen Mon 25 Aug 2008

When I saw that Fan supported pods (modules) and versions, I immediately assumed that loading multiple versions of the same pod would be supported. I'm a little surprised that it's not.

Now, I haven't thought through all the implications of multiple loaded pods of different versions, but the discussion above seems like a good start. It does seem to me that the flexibility is worth the extra complexity - i.e., change is a fact of life in coding, so let's design support for it in.

As soon as you get into a world of many open source pods, each with their own set of dependencies on other open source pods, this rapidly becomes a nightmare scenario where it can be nigh on impossible to get a set of working pods with the right versions. Building in the versioning is surely the right solution.

It should also be obvious that this would be a major selling feature for Fan!

brian Tue 26 Aug 2008

This isn't a black and white issue. Loading multiple versions of a pod into memory at the same time does indeed provide a great deal of flexibility. But it does add a lot of complexity. So this is an engineering decision - what is the right trade-off to make?

Currently my position is that the complexity is not worth the extra flexibility. In the end we may have to disagree on the right trade-off, but this is a critical issue so it is definitely worth exploring.

What We Do Have

Just for starters let's examine what Fan does have today. We have a very well defined mechanism for creating uniquely named pods which are the standard unit of deployment, naming, and versioning. Pods must explicitly declare their dependencies, which are verified at compile time (you can't import a pod which is not in your dependency list). If a pod's dependencies are not met, the runtime will refuse to load the pod and report an appropriate error message. The rules for versions and dependencies are rigorously defined in the Version and Depend classes.
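For example, a quick sketch of those classes in action (the pod name is made up):

class DependCheck
{
  static Void main()
  {
    installed := Version.fromStr("1.2.64")
    dep := Depend.fromStr("acme 1.2+")                   // any acme 1.2 or greater
    echo(dep.match(installed))                           // true
    echo(Depend.fromStr("acme 1.3").match(installed))    // false
  }
}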

IMO this is a pretty solid design and puts Fan ahead of most other platforms when it comes to module management.

So now let's look at what Fan is missing - the ability to load multiple versions of a pod into memory at the same time.

Use Case

Let me create a use case to base our discussion on. Let's assume four fictitious pods:

  • UtilPro: an open source set of utils like Apache commons
  • BlogPro: a blogging package, has dependency on UtilPro
  • StylePro: a package for styling websites, has dependency on UtilPro
  • MyApp: an application I'm building with dependency on BlogPro and StylePro

We have two different packages BlogPro and StylePro with a dependency on UtilPro. The dependency of MyApp on BlogPro and StylePro creates a diamond dependency graph.

Where things can go wrong is if UtilPro makes a breaking change between 1.0 and 2.0. BlogPro depends on the original UtilPro 1.0 version and StylePro depends on the new 2.0 UtilPro version. This poses a problem because now I can't use BlogPro and StylePro at the same time. This is a pretty big issue, and why being able to use multiple versions of UtilPro at the same time would be quite powerful.
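To make the conflict concrete, here is a small sketch using the Depend class with the fictitious pods above (the specific version numbers are assumptions):

class DiamondCheck
{
  static Void main()
  {
    blogNeeds  := Depend.fromStr("UtilPro 1.0")   // BlogPro's declared dependency
    styleNeeds := Depend.fromStr("UtilPro 2.0")   // StylePro's declared dependency
    ["1.0.3", "2.0.1"].each |Str s|
    {
      v := Version.fromStr(s)
      echo("UtilPro $s: BlogPro=${blogNeeds.match(v)} StylePro=${styleNeeds.match(v)}")
    }
    // no single version matches both, so BlogPro and StylePro cannot coexist
  }
}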

Complexity

If we wanted to allow two versions of UtilPro to be loaded at the same time, how would that actually work? We want UtilPro 1.0 to be loaded and used by BlogPro and UtilPro 2.0 to be loaded and used by StylePro.

But this is where things get really ugly. In order to make this work, we have to ensure that BlogPro and StylePro's usage of UtilPro is completely internal. For example consider what would happen if BlogPro and StylePro used UtilPro's classes in their public API:

UtilWidget BlogPro.produceWidget()
Void StylePro.consumeWidget(UtilWidget w)

What version of UtilWidget would MyApp see in this case? And what would happen if we tried to pass an instance between those two APIs? We can't support this because there would be two instances of the UtilWidget type in memory, and each of these APIs would be using a different version.
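For example, a sketch of what MyApp would naturally write against those hypothetical APIs (assuming static methods), and where it breaks when two versions of UtilWidget are loaded:

class MyApp
{
  Void run()
  {
    w := BlogPro.produceWidget      // built against UtilPro 1.0's UtilWidget
    StylePro.consumeWidget(w)       // compiled against UtilPro 2.0's UtilWidget
                                    // -> two distinct types, so this cannot type-check
                                    //    (or fails with a cast error at runtime)
  }
}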

So this gets to the heart of the matter - you can't support multiple versions of the same pod if the next pod in the dependency chain exposes any of those APIs publicly. So we've created a huge restriction on when the multi-version feature can actually be used - this drastically reduces its attractiveness in our trade-off battle between flexibility and simplicity.

But if we still wanted to persist in building this feature, we have to figure out the rules for "public" usage of a multi-versioned API. I clearly can't use a multi-versioned class in any public or protected scoped API - that wreaks havoc in diamond dependencies. But even if I restrict to internal/private APIs, this is a fairly leaky abstraction. What would happen if I passed one of those objects out thru an API marked as Obj? In Fan this happens all the time such as in message passing between threads. Now I start to get weird class cast exceptions because even though I have an instance of UtilWidget, it isn't the right version of UtilWidget. So I stick by my original premise - it does turn into a debugging nightmare.
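For example (UtilWidget is the hypothetical type from above; the actual thread messaging API is omitted):

class Consumer
{
  // receives a message typed only as Obj, e.g. from another thread
  Void onMsg(Obj msg)
  {
    w := (UtilWidget)msg   // cast succeeds or fails depending on which version
                           // of UtilWidget the sender's pod resolved
  }
}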

My conclusion is that loading multiple versions of a pod into memory can only be supported with severe restrictions and even with those restrictions would introduce tremendous complexity into the runtime's classloading, reflection, and type database APIs.

Alternate Solutions

So if we don't support multiple pod versions in memory at the same time, how do we solve our UtilPro problem above? Really it becomes a process problem. First you'd like to think that if UtilPro was widely deployed it would never break backward compatibility - or if it did then it should choose a new pod name because effectively it is something new and different in that case.

Even if that isn't realistic, the developer of BlogPro is going to have to put out a version which depends on the latest version of UtilPro if we want to consider BlogPro actively maintained. We have to assume a process where everyone is working toward dependencies on the latest stuff.

It certainly isn't perfect, but building large scale systems from lots of different software components from multiple vendors (or projects) is never perfect - it is a tough problem on many fronts: versioning, scheduling, integration, etc.

andrey Tue 26 Aug 2008

Hi Brian,

The Eclipse Runtime, which is now aligned with OSGi (and has influenced it), is a versioned-modules solution like the hypothetical pod system you described. Eclipse has existed for years and is proven by many vendors and huge deployments (including hundreds of versioned modules, sometimes with multiple versions of a concrete module).

Obviously, concerns like the ones you described exist, but I hope the Eclipse community can confirm that the cases you described are extremely rare. For example, our company delivered tens of small and big Eclipse-based projects, which depend on almost every key Eclipse technology (and its set of plugins/modules). In our practice we met the "UtilPod" problem only once (and I remember it well, because the ugly workaround we did was to create the old-Util object using the proper classloader's reflection and copy values from the new-Util object).

Again, this is not only my experience; I believe thousands of Eclipse committers and many more Eclipse Platform users rarely (or never) meet these problems in practice.

As for the type database, Eclipse has a similar concept used for IoC (Inversion of Control): each plugin contributes searchable "extensions" to the database, much like each pod contributes types to the type database. So the challenge is similar, and the Eclipse runtime seems to solve these problems well.

My hope for Fan is that it will be able to absorb the best ideas from around the world, and the Eclipse experience is not the worst place to look for some :)

Kind Regards, Andrey

brian Tue 26 Aug 2008

Andrey,

Yes I am very familiar with OSGi - in fact I've personally implemented an OSGi runtime (although it was years ago so I'm a bit rusty). There are some very elegant aspects of OSGi, but I also disagree with some fundamental design decisions. So I don't consider it the perfect paradigm. Obviously it works for the Eclipse community, but my experience with it led to the nightmare scenarios I discussed above. It works well when you cleanly separate your public interfaces (which are shared and always the latest version) from your private implementation (which might be multi-versioned). But it doesn't work well without that separation, especially if you want to use implementation inheritance. I think Fan's more extensive use of meta-programming and serialization compounds these issues. But I would be interested to hear feedback from people who have a deep Eclipse development experience.

katox Tue 26 Aug 2008

Brian, thanks for a detailed response. I'll stick with your example as it is more detailed and shows basically the same problem as the previous one.

we have to ensure that BlogPro and StylePro's usage of UtilPro is completely internal

Yes, and it is quite intentional to force developers to do so. Why not, if it can be done with reasonable effort?

what would happen if BlogPro and StylePro used UtilPro's classes in their public API

Then clearly MyApp has a direct dependency on UtilPro. BlogPro and StylePro should indicate the version of UtilPro they need in their respective public APIs - they know which version they use from their own input dependencies.

That's why I don't really like the term diamond dependencies - transitive dependencies of dependent pods can't be seen by MyApp. From MyApp's view it is a tree (of two branches in this case).

What version of UtilWidget would MyApp see in this case?

The one "UtilPro::UtilWidget" that is resolved from version of UtilPro that fits StylePro, BlogPro and MyApp requirements, ie. requirements of directly interfacing classes. The UtilPro dependency of StylePro and BlogPro would be resolved on MyApp load.

We can differentiate several cases:

  1. There is a single version of UtilPro available (among accessible pods) that fits all - then this version is disambiguated, set as the exact version of the output dependency of StylePro and BlogPro, and used as the input dependency of MyApp (on MyApp load).
  2. There are multiple versions of UtilPro available that fit all - then manual disambiguation by setting tighter input dependencies of MyApp would be needed (or, possibly, we could just pick the latest version).
  3. There is a version of UtilPro that fits StylePro and another version of UtilPro that fits BlogPro, but they are incompatible. Though each may fit MyApp itself, an assembly error is reported asking for manual resolution. This would probably result in a necessary migration of StylePro or BlogPro to achieve compatible interfaces, or in adapting one or both interfaces.
  4. There is no such pod which would satisfy all the dependencies. An assembly error is reported asking for manual resolution.

A disadvantage of this method is that it needs a static resolution before MyApp is loaded.
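A minimal sketch of that static resolution step; the matching uses today's Depend and Version, while the surrounding registry and error handling are hypothetical:

class ResolveSketch
{
  // pick the pod version that satisfies every collected constraint;
  // null means case 3 or 4 above: report an assembly error
  static Version? resolve(Version[] available, Depend[] constraints)
  {
    matches := available.findAll |Version v->Bool|
    {
      constraints.all |Depend d->Bool| { d.match(v) }
    }
    if (matches.isEmpty) return null       // no fitting version: assembly error
    return matches.sort.last               // several fit: pick the latest (case 2)
  }
}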

If we needed hotplugging, it might happen that the dependency on UtilPro had already been resolved (in #1 or #2, before we tried to load MyApp). In such a case we could either reload the entire subtree (re-resolving all dependencies) or just report an assembly error if the previously resolved version didn't fit MyApp.

you can't support multiple versions of the same pod if the next pod in the dependency chain exposes any of those APIs publicly. So we've created a huge restriction on when the multi-version feature can actually be used

You can but it would be automatically resolved to a single version.

If an incompatible UtilWidget class is required by interfacing classes, it doesn't matter whether it is just a different version or a different class altogether. It won't work and, of course, we have to resolve such cases manually.

I clearly can't use a multi-versioned class in any public or protected scoped API - that wreaks havoc in diamond dependencies.

No, but we can use a public class from a multi-versioned pod in a public API. Though it would be wise to avoid it anyway if the same can be achieved using basic Fan types. Another possibility is to isolate common dependencies (for instance some data transfer objects) into a separate (and separately versioned) pod.

protected scoped API

Anything like protected (as in Java) that would ignore the pod modules would pose a problem because it'd go through layers of pods - breaking scoping.

What would happen if I passed one of those objects out thru an API marked as Obj? In Fan this happens all the time such as in message passing between threads.

Yes, that's also a problem. But we can't be upset that Fan didn't resolve our objects correctly if we deliberately specified "any object" in public APIs. If we passed a totally different kind of object (not just a different version) through the same liberal API we'd also get weird runtime class cast exceptions. Isn't this the type of problem we are actually trying to avoid by using a type system?

Note that this problem would only show up if we needed to pass an object from one thread to another thread handled by a class within another pod. Internally the problem does not exist.

Now I start to get weird class cast exceptions because even though I have an instance of UtilWidget, it isn't the right version of UtilWidget.

Yes, it would result in a class cast exception. But - because the assembly knows that this call was an inter-pod call - it can provide additional versioning information in such a case.

If the message were "MyApp.fan:143 classCastError can't pass UtilPro-1.2.3::UtilWidget to BlogPro-2.0.0::AcceptingClass, UtilPro-2.0.0::UtilWidget expected" instead of the cryptic "MyApp.fan:143 classCastError can't pass UtilPro::UtilWidget to BlogPro::AcceptingClass, UtilPro::UtilWidget expected" (as in Java), I'd say it is on par with two differently named classes (passed in by mistake).

If we needed untyped public pod APIs (with Obj) "all over the place" maybe generics would sort this out. Otherwise it would seem like a bad pod API design to me.

My conclusion is that loading multiple versions of a pod into memory can only be supported with severe restrictions and even with those restrictions would introduce tremendous complexity into the runtime's classloading, reflection, and type database APIs.

Yes, this type of versioning is restrictive. But it allows us to automatically resolve examples like the ones above if there is a fitting pod version available. So far it doesn't look that complicated to me, but maybe I am missing something.

Alternate Solutions ... Really it becomes a process problem.

This alternate solution would force us to resolve all the dependencies all the time. Not just the (interfacing) conflicting ones - even the dependencies that would otherwise be strictly internal. This seems like a huge trade-off to me.

The proposal described above is just a little looser but allows Fan to resolve such cases automatically. In my observation the problem you described tends to occur seldom in systems with coarser granularity. There are certainly very complex and powerful packages with many dependencies but rather simple APIs...

alexlamsl Thu 28 Aug 2008

I agree to keep the system as it is for now. We can always make it less restrictive when we gather more experience for using Fan in the field.

katox Sun 31 Aug 2008

True, Alex, but this type of decision is better made before you develop much code. Otherwise it might force you to choose a suboptimal solution to avoid rewriting existing (and working!) components and APIs. Even this kind of relaxation might have unexpected side effects...
