#633 Packed value lists

tompalmer Thu 11 Jun 2009

As discussed in #625, for performance (speed and memory) reasons, it might be good to make Float[], Int[], and Bool[] be packed arrays in memory, for platforms where that is possible.

I'm breaking the subject out here to give it a topic of its own.

Doing this would break backwards compatibility, since Lists would be required to validate that objects added conform to (i.e., are subtypes of / assignable to) their of type.

It also has interest implementation issues to consider, such as whether the JVM/CLR emit takes care of the details or if there are custom internal subclasses or whatnot. Are special getFloat (and other) methods exposed on List, or is that all hidden? If exposed, how does this relate to operator overloading?

Note that FFI might still be wanted for packed arrays of smaller size (say 32-bit floats or ints), but this would cover performance issues for the common cases.

brian Thu 11 Jun 2009

Promoted to ticket #633 and assigned to brian

Not sure if this is something we really need for 1.0, but certainly worth tracking to add since it might boost performance.

I am pretty sure we can do things like get/set/add under the covers as an optimization, other things that work with functions like List.each require boxing. So it seems like unless you really know what you are doing you could actually make performance worse by boxing all the time on access versus one boxing operation on add.

This would make a lot more sense if functions could be optimized to avoid boxing (but holy cow that is a can of worms on the JVM).

tompalmer Thu 11 Jun 2009

other things that work with functions like List.each require boxing. > This would make a lot more sense if functions could be optimized to avoid boxing (but holy cow that is a can of worms on the JVM).

Good points. Might be cool to provide optimized function calls for Funcs with value argument types. But maybe you are right to defer the pain for now.

Still, it might be worth adding runtime enforcement of of for future-proofing compatibility reasons. Or were there performance or other concerns with enforcing of? Seems like it's been discussed before, but I don't remember the details and haven't gone researching old posts.

KevinKelley Thu 11 Jun 2009

Seems like it's been discussed before

#562 is about checked lists, and I think the outcome was, Sounds good but nobody's sure it's absolutely necessary and won't cause problems.

qualidafial Thu 11 Jun 2009

Still, it might be worth adding runtime enforcement of of for future-proofing compatibility reasons.

Seems like this could be trivially enforced in the JVM if sys::List used java.lang.reflect.Array.newInstance(of.toClass(), size) instead of new Object[size], no?

qualidafial Thu 11 Jun 2009

So it seems like unless you really know what you are doing you could actually make performance worse by boxing all the time on access versus one boxing operation on add.

That's a really good point. Something as simple as List.each would force every element to be boxed and discarded--yuck.

So perhaps having distinct IntList and FloatList types will be the most performant, if not the most elegant.

nobody's sure it's absolutely necessary and won't cause problems.

I assumed that using the decorator pattern would sidestep the side effects issue. Did I miss something?

tompalmer Thu 11 Jun 2009

#562 is about checked lists

Thanks. I'll go make the recommendation there, then.

Seems like this could be trivially enforced in the JVM if sys::List used java.lang.reflect.Array.newInstance(of.toClass(), size) instead of new Object[size], no?

I haven't looked at the code. I'd presumed it used ArrayList in Java, but I never looked. If it does manage arrays directly, and if Fan classes always map so nicely to Java classes, that would be an easy fix, I agree.

brian Thu 11 Jun 2009

Still, it might be worth adding runtime enforcement of of for future-proofing compatibility reasons.

I don't think enforcing of is really a backward compatibility issue. If you create an Int[], then cast to a Obj[] and then stick a Str in it then that is a bug. The fact that the runtime allows it (until you do a get) doesn't mean that it won't break in the future. So I guess I take the viewpoint, that backward compatibility guarantees aren't extended to incorrect/buggy code that only works because of loop holes in the runtime checks.

So perhaps having distinct IntList and FloatList types will be the most performant, if not the most elegant.

That might end up being the best solution. Although honestly I don't consider this a huge issue since Fan as is still way faster than Groovy, Ruby, etc. I think it really only comes into play if doing hard code math/graphics - does that warrant special optimizations in the core? Or is it best just handled by another library?

tompalmer Thu 11 Jun 2009

I don't think enforcing of is really a backward compatibility issue. If you create an Int[], then cast to a Obj[] and then stick a Str in it then that is a bug.

Still, runtime behavior manages to get exploited in clever ways. I still think it matters.

Although honestly I don't consider this a huge issue since Fan as is still way faster than Groovy, Ruby, etc.

In that recent performance comparison, it seems like they needlessly used an ArrayList in Java when a raw array might have shown an even much larger gap.

Or is it best just handled by another library?

For my matrix code, I could provide cross-runtime optimization more automatically for common cases if this was supported directly in Fan. Also, it just seemed like it might be nice if, out of the box, large Fan numeric lists just happened to perform well.

That said, it might be necessary eventually to support matrices of smaller (than 64-bit) types eventually anyway, and I wouldn't recommend putting that in the core.

There is one use case for me if Int[] was packed. I currently use Int[] for n-dimensional matrix indexing (e.g., the element at [3, 7, 4]). But that creates lots of stray objects, and I'm afraid of slow math when providing indexes on callbacks (incrementing indexes as it goes, though maybe the Long object flyweights in the JVM take away some of the pain, and I haven't profile anything). I need to decide about changing this to use the Mat type which can optimize such things or stick to the Int[] type which makes it clear that integers are needed, and only a single list (rather than n-dimensional data). To summarize, Int[] is clearer, but it might not perform as well as I'd like.

Again, I'm fine indefinitely postponing the packed list feature to the future, but I'm not sure I agree that changing behavior for abusing of (back to #562 again) should be considered safe. Well, maybe the docs could reserve the right to enforce that in the future. That might be good enough for now.

KevinKelley Thu 11 Jun 2009

If you create an Int[], then cast to a Obj[] and then stick a Str in it then that is a bug.

I agree, but it's worrisome -- bad code has a way of getting enshrined: "we know it's wrong, but we can't afford to re-write it".

That said, I don't see this as being a huge issue, and I wouldn't think enforcing of would be worth a slowdown. qualidafials checked decorator, or brians addChecked(Func) idea, might be the best way to deal with the checked issue. But I'm not personally feeling a great need for it.

As to the packed arrays -- my feeling is that can probably be handled okay in library code. I'd say that until somebody proves that

  • A) Fan is too slow
  • and B) it could be faster if we did X
  • oh yeah, and C) X won't break anything else.

So far anyway, it feels like we're speculating. Tom's work on Matrix implementation might give us something to test with, get a feel for what Fan can really do.

qualidafial Thu 11 Jun 2009

If you create an Int[], then cast to a Obj[] and then stick a Str in it then that is a bug.

Agreed. However if somebody tries to put a Str into an Int[] I would rather see his code blow up than mine. The problem with the current permissive behavior is that when the bug finally surfaces, any trace of how that Str got in there could be gone. So instead of being able to just look at the top of the stack trace to see which idiot caused a CastErr I may have to search through the entire code base or trace through a long, tedious debug session. Bonus points if the list passes through a few actors first.

brian Thu 11 Jun 2009

Agreed. However if somebody tries to put a Str into an Int[] I would rather see his code blow up than mine. The problem with the current permissive behavior is that when the bug finally surfaces, any trace of how that Str got in there could be gone.

I don't disagree with you, I agree it would nice to have. But at the same time I just really don't care that much about it - it has been that way with Java ArrayList since the beginning (untyped until you tried to pull a a value out and cast it).

That said, if someone wants to submit a patch that does check adds against List.of which 1) doesn't add a lot of code and 2) doesn't degrade the performance of List at all, I would certainly welcome taking a look at it.

The simplest thing would be to use of to check the type being added with reflection. A more sophisticated approach would be to allocate the correct type of the backing array using reflection. I don't know how HotSpot will optimize those reflective calls.

qualidafial Thu 11 Jun 2009

That said, if someone wants to submit a patch that does check adds against List.of which 1) doesn't add a lot of code and 2) doesn't degrade the performance of List at all, I would certainly welcome taking a look at it.

I will create a patch for this tonight or tomorrow. Rather than checking every element though my plan is to create the List's internal array using java.lang.reflect.Array.newInstance(of.toClass(),capacity) instead of new Object[capacity] and take advantage of the implicit type checking already built into the JVM.

brian Thu 11 Jun 2009

Great - can't wait to see it. I've heard supposedly that later versions of HotSpot optimize newInstance(X) to be as fast a new X[]

tompalmer Fri 12 Jun 2009

I agree. Sounds cool. Maybe profiling before and after would be nice.

Rather than checking every element though my plan is to create the List's internal array using java.lang.reflect.Array.newInstance(of.toClass(),capacity)

Skipping null checks for now, then? (Just bringing up the topic. Not arguing one way or another at the moment.)

brian Wed 17 Jun 2009

Qualidafail submitted a patch to get this started, and after a few tweaks I was able to get List to use a typed array as it backing store. This lets Java handle the type checking and throw ArrayStoreException if there is a problem.

Measuring performance before and after shows that the enhancement really doesn't effect performance any measurable way, so I pushed the change into hg.

Now you will get a CastErr if you attempt to add an incorrectly typed item to a list which has been upcast:

fansh> Obj[] strs := Str[,]
[,]
fansh> strs.add(4)
sys::CastErr: Adding 'sys::Int' into 'sys::Str[]'
  fan.sys.List.insert (List.java:375)
  fan.sys.List.add (List.java:342)

tompalmer Thu 18 Jun 2009

Sounds awesome, but I've been thinking about some potential problems with this nice solution.

I've been thinking about how it probably won't handle things like so:

strLists := Str[][,]
Obj[][] objLists := strLists
objLists.add(Num[,])

Seems like type erasure will keep such things from working automatically. Also, if the check is done for Java, it needs done for .NET and JavaScript (in my opinion, at least, but maybe I'm just being ornery). I'm guessing .NET will work at least as easily as Java (could be wrong), but JavaScript will presumably need manual checks entirely.

And there's also still the nullable thing. And making Map behavior consistent with List behavior.

Might all be fine, but it's all stuff to consider, at least.

For manually required checks, maybe optimized methods for Obj[] and Obj?[] could help prevent extra code execution for JavaScript. Or extra clever adding of extra checks only for necessary cases by the compiler itself (or the native code emit stage) could be ways of handling it. Since static typing can show checks in some cases to be unneeded.

Again, just some thoughts. Maybe all fine to ignore for now.

brian Thu 18 Jun 2009

As far as I can concerned this is a just an extra plus we got b/c the JVM seems to handle it for us really efficiently. No matter what we do, the JVM and JS will not handle type casts the same at runtime, the JVM just does a lot more under the covers since it is a statically typed environment.

tompalmer Thu 18 Jun 2009

Sounds fine for now, but I recommend documenting that such things are officially illegal and the right to enforce them is occasionally enforced and always reserved.

It also occurred to me that for the same erasure reasons, we already went along with not enforcing the 'of' for type casts of Lists for normal vars, and the newly committed behavior would be consistent with that. Or in other words, I think the new behavior in Java stays consistent with other parts of the language, so I think that's good.

andrey Thu 18 Jun 2009

Hi folks,

What about fanish version of java.nio.Buffer hierarchy based on underlying JDK implementation? Also it could be used further in conjunction with potential (a)sync I/O subsystem.

Kind Regards, Andrey

brian Thu 18 Jun 2009

That is pretty much what sys::Buf is, it has three implementations:

  1. MemBuf backed by byte[] in memory
  2. FileBuf backed by a RandomAccessFile
  3. MmapBuf backed by a nio.MappedByteBuffer

Although I tried extremely hard to unify buffered IO with stream IO (and all-n-all it works pretty well).

brian Wed 12 May 2010

Ticket cancelled

We may eventually want to implement Bool, Int, and Float lists with their Java primitive counterparts. But given that so much of Fantom uses boxed versions, it is unclear what the performance implications will be. At this point I don't think the extra complexity warrants this feature. What would be really cool is if the JVM optimized this for us since with the latest Fantom code, the JVM knows its has a Long[]. Anyways I am going to stop tracking this as a ticket for now.

Login or Signup to reply.