#368 Java/C# Interop

brian Thu 2 Oct 2008

I've been giving a lot of thought to lang interop after my experience at the JVM Summit last week. The very first post to this forum laid out three rules:

  • Golden Rule: Fan Source -> Fan Libraries
  • Silver Rule: Fan Source -> Java/C# Libraries
  • Bronze Rule: Java/C# Source -> Fan Libraries

Up until this point we've been super focused on the "Golden Rule" (Fan-to-Fan), which really forced us to solve the JVM and CLR portability issue. But we definitely need to refocus our efforts on interop with Java and C# (both ways). I think it has been a mistake to ignore this problem for so long. I've decided this is the next problem to tackle.

The solution space includes a couple different aspects:

  1. the syntax for bringing Java/.NET libraries "into the Fan type system"
  2. how this stuff plugs into the compiler
  3. how this stuff plugs into the runtime
  4. how the Fan type system is exposed to the Java and .NET world

We've talked about the interop problem a bit in a previous discussion. I think there was fairly broad consensus that we need static binding to the Java and C# libraries (a position I strongly agree with myself). So there is a syntax problem to solve which will somehow allow compiler plugins to bring alternate Java or .NET libraries into the Fan type system for static type checking and call opcodes. I had proposed generating stubs, but that idea doesn't make sense because whatever we can do ahead of time is most likely best done at compile time with whatever libraries you want to use.

FanObj Problem

The aspect of lang interop which seems like it is going to be the real bitch is the runtime design. Today both the Java and .NET runtimes are premised upon all Fan types implementing the Obj interface and subclassing from the FanObj class. This technique also seems to be used in other languages like JRuby and Jython. The problem with that approach is that it doesn't allow us to bring native Objects from a Java/.NET library into the Fan runtime without boxing. Boxing is feasible, but creates a serious performance problem and can muck up the notion of object identity.
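To make the identity concern concrete, here is a minimal Java sketch. BoxedObj and IdentityDemo are hypothetical names made up for illustration, not classes in the Fan runtime; the only assumption is that a native object gets wrapped in a fresh box each time it crosses into the Fan side.

```java
// Hypothetical wrapper, for illustration only; not part of the Fan runtime.
final class BoxedObj {
    final Object wrapped;
    BoxedObj(Object wrapped) { this.wrapped = wrapped; }
}

class IdentityDemo {
    // Each crossing into the Fan side allocates a new box.
    static BoxedObj toFan(Object javaObj) { return new BoxedObj(javaObj); }

    // Two boxes around the same native object are different references.
    static boolean sameBox(Object javaObj) {
        return toFan(javaObj) == toFan(javaObj);
    }

    // The native object underneath is still the same reference.
    static boolean sameWrapped(Object javaObj) {
        return toFan(javaObj).wrapped == toFan(javaObj).wrapped;
    }
}
```

With this shape, reference equality on the boxes no longer agrees with reference equality on the underlying objects, and every crossing costs an allocation.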

So the question then becomes how we can redesign the two Fan runtimes to use normal java.lang.Object and System.Object. Basically it means we can't use normal method dispatch for Obj methods like hash, equals, compare, toStr, trap, and toImmutable. We need to make these methods work for both Fan types and arbitrary Java/.NET types. I think that is a tractable problem using some magic in the fcode and emitters. No doubt it is an incredible pain in the ass to refactor, but doable. It also potentially opens the door to leveraging the built-in equals method, which I would suspect gets optimized by HotSpot better than Fan's boxed version (one of the earliest HotSpot optimizations was to optimize equals for reference equality).

Boxed Strings?

So then once you make that leap, it seems quite likely we can apply the same techniques to sys::Str so that we don't need to box java.lang.String and System.String. Rather we can route the Fan APIs to static methods that take a native string. Using native strings in place of boxed strings is definitely a nice performance and memory improvement. And I expect it too might benefit from HotSpot optimizations better than boxed strings.
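A rough sketch of what routing Str calls to static methods on a native string could look like in Java. The class name StrCalls is hypothetical, and the method names are only loosely modeled on sys::Str; treat the signatures as assumptions, not as the runtime's actual API.

```java
// Hypothetical static helpers that operate directly on java.lang.String,
// so no Fan-specific string box is ever allocated.
final class StrCalls {
    private StrCalls() {}

    // The receiver of the Fan call becomes the first argument.
    static long size(String self) { return self.length(); }

    static boolean isEmpty(String self) { return self.isEmpty(); }

    static String reverse(String self) {
        return new StringBuilder(self).reverse().toString();
    }
}
```

Under this scheme a Fan expression such as "abc".reverse would compile to a call like StrCalls.reverse("abc"), with the native string passed as the first argument.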

Primitives

So then I start thinking about taking it all the way. What does it look like when Fan is used to build a library for use by normal Java and C# code? Idiomatic Fan maps to normal Java and C# probably better than any other alternate language out there (because it is such a small evolution of Java and C# itself). The only real ugliness is that Bool, Int, and Float will be boxed. For return values this isn't a huge deal because you can just call ".val" (still annoying). But it is a serious pain in the ass for arguments because instead of using normal bool, int, or float literals in your Java/C# code you have to call something like Int.make(10).

Solving the bool, int, float problem is kind of a can of worms. Since both Java and C# support method overloading we could potentially generate overloaded versions of all the methods which take the appropriate primitives. That is a bit heavy weight and doesn't solve the return value problem (since we can't overload by return value).
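The overloading idea, and its limit, can be sketched in Java as follows. BoxedInt and Calc are hypothetical stand-ins (BoxedInt plays the role of a Fan-specific boxed Int); nothing here is taken from the actual emitter.

```java
// Hypothetical boxed integer, standing in for a Fan-specific boxed Int.
final class BoxedInt {
    final long val;
    private BoxedInt(long val) { this.val = val; }
    static BoxedInt make(long val) { return new BoxedInt(val); }
}

class Calc {
    // The method as it might be emitted today: boxed in, boxed out.
    static BoxedInt add(BoxedInt a, BoxedInt b) {
        return BoxedInt.make(a.val + b.val);
    }

    // Generated convenience overload that accepts primitives.
    static BoxedInt add(long a, long b) {
        return add(BoxedInt.make(a), BoxedInt.make(b));
    }

    // Not legal Java: it would differ from the overload above only
    // by return type, so the result still has to come back boxed.
    // static long add(long a, long b) { ... }
}
```

A Java caller can then write Calc.add(4, 5), but still has to read .val off the result, and every method with boxed parameters needs an extra overload for each combination of primitive arguments.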

So I'm debating taking the plunge towards using primitive booleans, longs, and doubles for sys::Bool, sys::Int, and sys::Float respectively. Using boxed numerics is the number 1 performance issue for Fan as compared to Java and C#. Dropping down to use primitives would put Fan on par with Java and C# performance. We might have a slight penalty on 32-bit machines using longs everywhere, but that won't last long in the scheme of things.

But the complexity is nothing to discount. As soon as Fan's type system uses primitives we have to deal with a fractured type system. We now have to deal with boxing and unboxing and all the headaches that causes. But we have to solve this on the lang interop front no matter what.

The thing I'm really struggling with is how to deal with null primitives. This would be a really great reason to move towards a full nullable type system since our APIs make extensive use of null Int in Str and IO APIs. But I just don't think I can swallow that. So I don't know how to handle that - the way C# only did nullable types on primitives is pretty ugly.

Conclusion

So these are the lines I'm thinking of:

  • move toward using normal Object in the runtime
  • create alternate call semantics for Obj methods
  • move toward unboxed, native Strings using alternate call semantics
  • evaluate moving toward primitives
  • once the foundation is sound solve the syntax/compiler design

If we solved all those problems then Fan becomes really ideal for Java and C# interop. We could call native libraries without boxing everything. We could expose Fan libraries as normal Java and C# libraries - it won't be perfect since all your ints would be longs (but that is probably the way it should be on the eve of 64-bit platforms), but it would be a pretty darn good solution.

I'm a bit stumped on the right way to go with primitives. Getting primitive performance is something I would absolutely love! Value types in the JVM are probably not going to happen until at least 1.8, if ever, which at best means six years until widespread deployment. So I'm more inclined to make Fan less pure in the name of performance and lang interop.

helium Thu 2 Oct 2008

But it is a serious pain in the ass for arguments because instead of using normal bool, int, or float literals in your Java/C# code you have to call something like Int.make(10).

You could do this:

// c#
class Int
{
   // ...
   public static implicit operator Int(int x)
   {
      return Int.make(x);
   }
   public static implicit operator int(Int x)
   {
      return x.getPrimitiveValue();
   }
}

So in C# it's actually pretty convenient. In Java ... I don't know. You could create a class with static methods

// Java
class Fan
{
    public static Int fan(int x)
    { 
       return Int.make(x);
    }
    // same for other primitives
}

and static import it. That way you could write

object.method(fan(42), fan(3.141592654))

rather than

object.method(Int.make(42), Decimal.make(3.141592654))

Not much, but I'd still prefer it.

JohnDG Thu 2 Oct 2008

Fantastic news! In my opinion, Fan will never be more than a niche language without easy, seamless support for Java and/or C#. There are just too many nice JVM languages that make it really easy to do interop with Java code, and too much existing code written in Java/C# to ignore -- either from a productivity standpoint (e.g. "I'm developing a new application, what library can I use?"), or from a business standpoint (e.g. "We've got 1 million lines of code of Java and cannot afford to rewrite it, so we'll just stick with Java.").

As for primitives, in an ideal world, any declaration of Int in Fan would create a primitive, but putting such a value in a map or feeding it to a function that expects an object would do autoboxing. So from the Java/C# side, nearly all Fan methods use primitives. Meanwhile, when Fan is calling Java/C#, the user still specifies Int but the compiler decides what suitable function it can call, and does unboxing if necessary.

brian Thu 2 Oct 2008

So in C# it's actually pretty convenient. In Java ... I don't know. You could create a class with static methods

That is a good idea Helium - definitely C# would make for much prettier interop assuming we kept everything boxed in Fan specific classes. But after sleeping on it I was thinking that even if we keep numerics boxed, I would actually use java.lang.Boolean, java.lang.Long, and java.lang.Double. Then I will get auto-boxing from Java for free when calling into Fan libraries:

// fan function
Int foo(Int a, Int b) { return a + b }

// would map to Java as
Long foo(Long a, Long b) { return Int.plus(a, b); } // or some static method

// so I could use natural Java code and let the Java compiler 
// deal with boxing/unboxing
long r = foo(4, 5);

The real question here is how Fan's type system deals with primitives. Obviously we need to support boxing/unboxing. The tricky question is nulls - do we allow an Int value to be null? Because that can't be mapped to a primitive. Java's solution is to make a distinction between int and Integer. C#'s solution is nullable types, but only for value types. What should Fan's approach be?
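The null problem is easy to reproduce in plain Java. In this sketch NullUnboxDemo and findIndex are hypothetical names; findIndex just mimics the style of API that returns a null Int when nothing is found.

```java
class NullUnboxDemo {
    // A boxed Long can say "not found" with null; a primitive long cannot.
    static Long findIndex(String s, char c) {
        int i = s.indexOf(c);
        return i < 0 ? null : Long.valueOf(i);
    }

    // Returns true if unboxing the null result throws.
    static boolean unboxingNullThrows() {
        Long missing = findIndex("fan", 'z');
        try {
            long r = missing;   // auto-unboxing calls missing.longValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

So any Fan signature that maps Int straight to a primitive long has to either rule out null or fail at runtime at the point of unboxing.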

brian Thu 2 Oct 2008

Actually I just tried that in Java, and auto-boxing doesn't appear to coerce ints into Longs so you have to do this:

long r = foo(4L, 5L);

That seems inconsistent with the rest of the language, and would be a bit annoying for interop. Does anyone know why javac works that way?

Edit: I asked this question on JVM-Lang group.
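For reference, a small sketch of the behavior described above. As far as the Java Language Specification's rules for method invocation go, an argument may be boxed or primitive-widened, but not widened and then boxed, so an int literal boxes to Integer and never reaches Long. BoxingDemo is a hypothetical class name.

```java
class BoxingDemo {
    // Same shape as the mapped Fan function above.
    static Long foo(Long a, Long b) { return a + b; }

    // long literals box directly to Long.
    static long callWithLongLiterals() { return foo(4L, 5L); }

    // int values need an explicit widening cast before they can box to Long.
    static long callWithCasts() {
        int x = 4, y = 5;
        return foo((long) x, (long) y);
    }

    // Does not compile: int boxes to Integer, and Integer is not a Long.
    // static long callWithIntLiterals() { return foo(4, 5); }
}
```

If the parameters were primitive long instead of Long, foo(4, 5) would compile, since plain int-to-long widening is allowed.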

jodastephen Sat 4 Oct 2008

This sounds really good, as I've been getting a bit worried about how Fan might imply starting all projects from scratch. This should allow an easier intro.

On primitives and nulls, my gut feeling says that it is a type-system thing. We've already talked about using a symbol with the type to define whether a variable can hold null or not. And the academic evidence suggests that most programmers intend most variables to not hold null (not null by default).

Providing null in the type system may be a harder change to make, but it has more power longer term. Just allowing non-null behaviour for primitives seems like a cop-out, as the principle applies to any variable.

The key to such a change is getting the auto-casting right, to avoid excessive knock on effects from making a change in one part of a program.

brian Sat 4 Oct 2008

This coming week, we need to discuss the notion of primitives and null in the type system. In the meantime, I've checked in the first phase of the Java runtime refactoring.

Previously the Java runtime assumed that all objects extended fan.sys.FanObj and implemented the fan.sys.Obj interface. I've gotten rid of the fan.sys.Obj interface and the runtime now uses java.lang.Object instead. Method calls on sys::Obj are now routed to static methods on FanObj. For example, Obj.hash is now implemented as:

public static Int hash(Object self)
{
  if (self instanceof FanObj)
    return ((FanObj)self).hash();
  else
    return Int.make(self.hashCode());
}

Until we switch to use Java primitives, this means we have a conflict with Object.equals since we can't override by return type. So sys::Obj.equals maps to _equals for now. We can fix that when we fall back to use real Java primitives.
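The clash looks roughly like this in Java. BoxedBool and Point are hypothetical stand-ins (BoxedBool plays the role of the Fan-specific Bool class); only the _equals name comes from the change described above.

```java
// Hypothetical boxed boolean, standing in for the Fan-specific Bool class.
final class BoxedBool {
    final boolean val;
    private BoxedBool(boolean val) { this.val = val; }
    static BoxedBool make(boolean val) { return new BoxedBool(val); }
}

// Hypothetical Fan-compiled class, for illustration only.
class Point {
    final long x;
    Point(long x) { this.x = x; }

    // Not legal Java: same name and parameters as Object.equals(Object),
    // but an incompatible return type (BoxedBool instead of boolean).
    // public BoxedBool equals(Object that) { ... }

    // Workaround: a different method name avoids the clash.
    public BoxedBool _equals(Object that) {
        return BoxedBool.make(that instanceof Point && ((Point) that).x == x);
    }
}
```

Once the return type is a real boolean, the method can simply be named equals and override Object.equals directly.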

This was a pretty major refactoring of all the Java code. But it didn't require touching a line of the compiler or any Fan code - so it does prove we have a clean separation of concerns. I was a bit concerned about how this would affect performance since we have an extra level of indirection on Obj methods. But there was no noticeable change; it was +/- 1% for various benchmarks. HotSpot does its job well.

So the real point of this change is that we can now represent any normal Java object within the Fan type system as sys::Obj. This is a huge win for efficient interop with Java libraries.
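As a self-contained illustration of that point, here is the same dispatch pattern applied to toStr. MiniFanObj and ObjCalls are hypothetical names; MiniFanObj is a one-method stand-in for fan.sys.FanObj, and the string it returns is made up for the example.

```java
// Hypothetical stand-in for fan.sys.FanObj, reduced to one method.
class MiniFanObj {
    public String toStr() { return "a Fan object"; }
}

final class ObjCalls {
    private ObjCalls() {}

    // Static entry point for Obj.toStr: works for Fan objects
    // and for any plain Java object alike.
    static String toStr(Object self) {
        if (self instanceof MiniFanObj)
            return ((MiniFanObj) self).toStr();
        else
            return self.toString();
    }
}
```

A plain java.lang.Integer or a java.util.ArrayList can be passed to ObjCalls.toStr without wrapping it first, which is exactly what the old FanObj-only design ruled out.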

The next phases:

  1. Replace Bool, Int, Float, Str with java.lang.Boolean, Long, Double, and String. This will allow us to leverage javac's auto-boxing for cleaner interop. When we move to real Java primitives it will also let us map efficiently with java.lang.reflect. I also think it will put us in a much stronger position for HotSpot optimizations and future value-type features.
  2. Decide how to model primitives and nulls in Fan type system.
  3. Refactor Fan compiler and Java runtime to deal with primitives.
  4. Define interop syntax and compiler plugins.

alexlamsl Mon 6 Oct 2008

Until we switch to use Java primitives, this means we have a conflict with Object.equals since we can't override by return type. So sys::Obj.equals maps to _equals for now. We can fix that when we fall back to use real Java primitives.

Why not just auto(un)box?

brian Mon 6 Oct 2008

Why not just auto(un)box?

We will eventually, but first things first.

I have completed refactoring for the Java runtime to map types as follows:

sys::Obj     =>  java.lang.Object
sys::Bool    =>  java.lang.Boolean
sys::Str     =>  java.lang.String
sys::Num     =>  java.lang.Number
sys::Int     =>  java.lang.Long
sys::Float   =>  java.lang.Double
sys::Decimal =>  java.math.BigDecimal

For the most part these changes were completely isolated in the Java runtime. One minor change I did need to make was to make the methods on sys::Num non-virtual and to make its constructor internal. But other than that, no Fan code or APIs were changed.
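One way to picture why the sys::Num methods become non-virtual: java.lang.Number cannot be given new instance methods, so calls would have to go through static helpers. This is only a sketch under that assumption; NumCalls is a hypothetical class name, and the toInt/toFloat names are assumed to mirror sys::Num.

```java
// Hypothetical static helpers for sys::Num calls on java.lang.Number.
final class NumCalls {
    private NumCalls() {}

    // sys::Int maps to java.lang.Long, so the result is a Long.
    static Long toInt(Number self) { return self.longValue(); }

    // sys::Float maps to java.lang.Double.
    static Double toFloat(Number self) { return self.doubleValue(); }
}
```

The same helpers then work for Long, Double, and BigDecimal receivers, since all three extend Number.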

Performance-wise things stayed about the same. Using Java's boxed numerics actually hurt performance slightly. Getting rid of Fan's boxed string increased performance a tad. The compiler (which is string intensive) got around 1% faster.

Andy has started on refactoring the C# code following the same model.

This is just the first step towards primitives, nullable types, and hopefully clean, efficient interop with Java and .NET libraries.

alexlamsl Mon 6 Oct 2008

Using Java's boxed numerics actually hurt performance slightly.

Not unexpected or hard to imagine :-)

But as a standard feature of the Java language, it is quite likely to get a performance boost from later Sun implementations...

Having said that, I'm not holding my breath on that one. Looking forward to using a Fan library in Java "natively" soon!

(BTW - is BigInteger not represented in Fan because it can be conveniently represented by a Decimal? Just like the case between Double and Float?)

brian Tue 7 Oct 2008

(BTW - is BigInteger not represented in Fan because it can be conveniently represented by a Decimal? Just like the case between Double and Float?)

I'm inclined to say we just use sys::Decimal (BigDecimal) when you need something like BigInteger. Maybe one day the JVM will get a full Scheme like numerical tower, which would be a much more elegant solution. But I'm not going to tackle that for Fan.
