#2452 Regex.split() Implementation

SlimerDude Wed 2 Sep 2015

The current Regex.split() implementation delegates to Java's java.util.regex.Pattern.split(), which has odd behaviour when a limit is used: the last entry in the returned array contains all the remaining input that wasn't split. This differs from Javascript's String.split():

Java:
"1 2 3 4 5".split(" ", 3);  // --> ["1", "2", "3 4 5"]

Javascript:
"1 2 3 4 5".split(" ", 3);  // --> ["1", "2", "3"]

Java's split can't be reproduced in Javascript without writing your own split() method that manually applies the regex and builds up the array itself. Urgh.
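For illustration, that manual approach might look roughly like the loop below. It is sketched in Java (using java.util.regex.Matcher) so it can be checked against Pattern.split(); a Javascript port could follow the same loop using RegExp.exec() with a global regex. The helper name javaStyleSplit is hypothetical. It only accepts a positive limit, and it ignores some Pattern.split() edge cases, such as the way a zero-width match at the very start of the input is handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaStyleSplit {

    // Hypothetical helper: mimics Pattern.split(input, limit) for a positive
    // limit by walking the matches by hand. Once limit - 1 pieces have been
    // cut off, whatever input is left over becomes the final element.
    static List<String> javaStyleSplit(String regex, String input, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("this sketch only handles limit > 0");
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> parts = new ArrayList<>();
        int start = 0;
        // Stop searching as soon as only one slot is left for the remainder.
        while (parts.size() < limit - 1 && m.find()) {
            parts.add(input.substring(start, m.start()));
            start = m.end();
        }
        // The unsplit remainder (possibly the whole input) is the last entry.
        parts.add(input.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(javaStyleSplit(" ", "1 2 3 4 5", 3)); // [1, 2, 3 4 5]
    }
}
```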

But Javascript's split method is pretty easy to replicate in Java (and, in my opinion, its behaviour is more intuitive).
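As a rough sketch of that claim, assuming a non-negative limit: split on every match with a negative limit (so trailing empty strings are kept, as Javascript keeps them) and then truncate the array. The helper jsStyleSplit is hypothetical, and it does not reproduce every Javascript quirk; for example, Javascript splices capturing-group matches into the result and treats some zero-width matches differently from Java.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class JsStyleSplit {

    // Hypothetical helper: approximates Javascript's String.split(separator, limit)
    // in Java for a non-negative limit. Split on every match (a negative limit
    // tells Pattern.split() to keep trailing empty strings, as Javascript does),
    // then simply drop everything past the first 'limit' entries.
    static String[] jsStyleSplit(String regex, String input, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("this sketch only handles limit >= 0");
        }
        String[] all = Pattern.compile(regex).split(input, -1);
        return Arrays.copyOf(all, Math.min(limit, all.length));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(jsStyleSplit(" ", "1 2 3 4 5", 3))); // [1, 2, 3]
    }
}
```

Splitting the whole string before truncating keeps the sketch short; a Matcher-based loop could stop searching early on large inputs.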

So, in the interests of uniformity, how about updating Fantom's Regex.split() to follow Javascript's behaviour?

brian Wed 2 Sep 2015

If it's not the same, it might be good to see how other engines, like Ruby, Python, etc., do it.

SlimerDude Thu 3 Sep 2015

Interesting...

  • Python returns the remainder of the string
  • Go returns the remainder of the string
  • .NET returns the remainder of the string
  • Ruby returns the remainder of the string, and also follows Java's behaviour of trimming trailing empty strings from the array when the limit is zero.
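To make the Java side of that comparison concrete, here is how the three kinds of limit behave with String.split(), which is documented to give the same result as Pattern.compile(regex).split(str, limit). Ruby's String#split, given no limit or a limit of zero, likewise drops the trailing empty strings (in Ruby, "a,b,,".split(",") returns ["a", "b"]).

```java
import java.util.Arrays;

public class JavaSplitLimits {
    public static void main(String[] args) {
        String s = "a,b,,";
        // limit > 0: at most 'limit' entries; the last holds the unsplit remainder
        System.out.println(Arrays.toString(s.split(",", 2)));  // [a, b,,]
        // limit == 0: split fully, then trailing empty strings are discarded
        System.out.println(Arrays.toString(s.split(",", 0)));  // [a, b]
        // limit < 0: split fully and keep trailing empty strings
        System.out.println(Arrays.toString(s.split(",", -1))); // [a, b, , ]
    }
}
```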

I don't know who's following who with this! Javascript still doesn't do it though :)

brian Thu 3 Sep 2015

I would say we should make it work like Java does, then. Although I actually never use the limit, so I wouldn't personally mind if JS worked differently from Java (but opinions are welcome).

SlimerDude Sat 5 Sep 2015

I don't use limit with split either (*), but I would want Fantom's behaviour to remain consistent between runtimes. Only then can you be certain that libraries are truly interchangeable between environments.

(*) I think I used it on the Fantex Website to prevent some regular expressions from returning a seemingly infinite number of matches.

andy Wed 9 Sep 2015

JVM and JS regex grammars differ, so it's never truly portable. I think we could allow some differences, as long as they are well documented.
