A 10 Year Journey, Stop 5: Growing the SOM Family

SOM, the Simple Object Machine, has been a steady companion for much of my research. As mentioned earlier, all this work on virtual machines started for me with CSOM, a C-based implementation of a simple Smalltalk language. From the beginning, SOM was meant as a vehicle for teaching language implementation techniques as well as for doing research on related topics. As such, it is kept simple. The interpreter implementations do not aim to be fast. Instead, concepts are supposed to be expressed explicitly and consistently, so that the code base is accessible for students. Similarly, the language is kept simple and includes dynamic typing, objects, classes, closures, and non-local returns. With these features, the core of typical object-oriented languages is easily covered. One might wonder about exceptions, but their dynamic semantics are very similar to those of non-local returns and are thus covered, too.
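To illustrate that connection, here is a minimal sketch, with invented names, of how a Java-based interpreter might implement non-local returns: a host-level exception unwinds the stack to the block's home method, which is essentially the same machinery one would use for exceptions.

    // Hypothetical sketch: non-local returns as a host-level exception.
    final class NonLocalReturn extends RuntimeException {
      final Object value;      // the value being returned
      final Object homeFrame;  // activation of the lexically enclosing method

      NonLocalReturn(Object value, Object homeFrame) {
        super(null, null, false, false);  // pure control flow: no message, no stack trace
        this.value = value;
        this.homeFrame = homeFrame;
      }
    }

    // Executing `^ expr` inside a block:
    //   throw new NonLocalReturn(evaluate(expr), lexicalHomeOf(currentFrame));
    // Invoking a method:
    //   try { return executeBody(frame); }
    //   catch (NonLocalReturn nlr) {
    //     if (nlr.homeFrame == frame) { return nlr.value; }  // reached the home method
    //     throw nlr;                                          // keep unwinding
    //   }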

Originally, SOM was implemented in Java. Later, CSOM, SOM++ (a C++-based implementation), and AweSOM (a Smalltalk-based implementation) joined the family. Some of this history is documented on the old project pages at the HPI, where much of this work was done.

When I picked up maintaining the SOM family for my own purposes, I added PySOM, a Python-based implementation, and JsSOM, implemented in JavaScript. As part of the work on building a fast language implementation, I also added TruffleSOM, a SOM implementation using the Truffle framework on top of the JVM, as well as RPySOM, an RPython-based bytecode interpreter for SOM, and RTruffleSOM, a Truffle-style AST interpreter implemented in RPython.

A Fast SOM

For TruffleSOM and RTruffleSOM, the focus was on performance. This means that clarity and simplicity were somewhat compromised. In the code base, that’s usually visible in the form of concepts being implemented multiple times to cover different use cases or special cases. Otherwise, the language features haven’t really changed. The only thing that got extended is the set of basic operations implemented for SOM, which we call primitives, i.e., built-in operations such as basic mathematical operations, bit operations, and similar things that either cannot be expressed in the language or are hard to express efficiently.
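To give an idea, a primitive boils down to something like the following sketch; the names are illustrative and not SOM’s actual classes:

    // Illustrative sketch: a primitive is a built-in operation implemented
    // in the host language and installed under a selector.
    interface Primitive {
      Object invoke(Object receiver, Object[] arguments);
    }

    // Integer addition; real SOM integers are objects, but plain longs
    // keep the sketch self-contained.
    final class IntegerAddition implements Primitive {
      @Override
      public Object invoke(Object receiver, Object[] arguments) {
        return (Long) receiver + (Long) arguments[0];
      }
    }

    // Installation would then associate it with the selector `+` on the
    // Integer class, e.g.: integerClass.installPrimitive("+", new IntegerAddition());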

The main reason to extend SOM’s set of primitives was to support a wide set of benchmarks. With Are We Fast Yet, I started a project to compare the performance of a common set of object-oriented language features across different programming languages. One of my main goals was to be able to understand how fast TruffleSOM and RTruffleSOM are, for instance compared to state-of-the-art Java or JavaScript VMs.

Well, let’s have a look at the results:

Figure: Performance Overview of SOM Implementations

The figure shows the performance of various SOM implementations relative to Java 1.8, i.e., the HotSpot C2 compiler. To be specific, it shows the peak performance discounting warmup and compilation cost. As another reference point for a common dynamic language, I also included Node.js 8.1 as a JavaScript VM.

As the numbers show, TruffleSOM and RTruffleSOM reach about the same performance on the used benchmarks. Compared to Java, they are all about 2-4x slower. Looking at the results for Node.js, I would argue that I managed to reach the performance of state-of-the-art dynamic language VMs with my little interpreters.

The simple SOM implementations are much slower, however. SOM and SOM++ are about 500x slower. That is quite a bit slower than the performance reached by the Java interpreter, which is only about 10-50x slower than just-in-time compiled and highly optimized Java. The slowness of SOM and SOM++ is very much expected because of their focus on teaching. Besides the many little design choices that are not optimal for performance, there is also the bytecode set, which is designed to be fairly minimal and thus causes a high overhead compared to the optimized bytecode sets used by Java, Ruby, or Smalltalk-80.
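To give an impression of just how minimal it is, here is SOM’s bytecode set as I remember it; treat the exact list as an assumption and consider the SOM sources authoritative:

    // SOM's bytecode set, from memory: roughly these 16 operations.
    enum Bytecode {
      HALT, DUP,
      PUSH_LOCAL, PUSH_ARGUMENT, PUSH_FIELD, PUSH_BLOCK, PUSH_CONSTANT, PUSH_GLOBAL,
      POP, POP_LOCAL, POP_ARGUMENT, POP_FIELD,
      SEND, SUPER_SEND,
      RETURN_LOCAL, RETURN_NON_LOCAL
    }

Since even arithmetic and control flow go through generic SENDs, a lot of dispatch overhead is incurred that optimized bytecode sets avoid with specialized instructions.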

Making SOM++ Fast with Eclipse OMR

As shown with TruffleSOM and RTruffleSOM, meta-compilation approaches are one possible way to gain state-of-the-art performance. Another promising approach is the reuse of existing VM technology in the form of components to improve existing systems. One of the most interesting systems in that field is currently Eclipse OMR. The goal of this project, which is currently driven by IBM, is to enable languages such as Ruby or Python to use the technology behind IBM’s J9 Java Virtual Machine. At some point, they decided to pick up SOM++ as a showcase for their technology. They first integrated their garbage collector, and later added some basic support for their JIT compiler. My understanding is that it currently compiles each bytecode of a SOM method into the J9 IR using the JitBuilder project, allowing a little bit of inlining, but not doing many optimizations. The result is a 4-5x speedup over the basic SOM++ interpreter. For someone implementing languages, such a speedup is great, and nothing to sneeze at, even if we start from a super slow system. As a result, you reach the performance of optimized interpreters, while still maintaining a minimal bytecode set and the general simplicity of the system. Minus the complexity of the JIT compiler itself, of course.

To reach the same performance as TruffleSOM and RTruffleSOM, there is quite a bit more work to be done. I’d guess SOM++ OMR would need more profiling information to guide the JIT compiler. It will probably also need a few other tricks, like an efficient object model and stack representation, to really achieve the same speed. But anyway, to me it is super cool to see someone else picking up SOM for their purposes and building something new with it 🙂.

Other Uses of SOM

And while we are at it: over the years, some other projects spun off from SOM. There was NXTalk for the Lego Mindstorms system. My own ActorSOM++, which implemented a simple actor language as part of SOM. And more recently, SOMns, a Newspeak implementation derived from TruffleSOM. You might have noticed, it’s actually a bit faster than TruffleSOM itself :) And it supports all kinds of concurrency models, including actors, CSP, STM, fork/join, and classic threads and locks.

Similar to SOM++ OMR, the Mu Micro VM project picked up a SOM implementation to showcase their own technology. Specifically, they used RPySOM, the RPython-based bytecode interpreter, for their experiments.

Guido Chari forked TruffleSOM to build TruffleMate and experiment with making really all parts of a language runtime reflectively accessible, while maintaining excellent performance.

And last, but not least, Richard Roberts is currently working on a Grace implementation on top of SOMns.

So there are quite a few things happening around SOM and its various offshoots. I hope the next 10 years are going to be as much fun as the last.

And with that, I’ll end this series of blog posts. If you’re interested in learning more, check out the community section on the SOM homepage, ask me on Twitter @smarr, or send me an email.

A 10 Year Journey, Stop 4: Concurrency and Tooling

This post, the fourth in the series, is about my current work on concurrency and tooling. As mentioned before, I believe that there is not a single concurrency model that is suitable for all problems we might want to solve. Actually, I think this can be stated even more strongly: not a single concurrency model is appropriate for a majority of the problems we want to solve.

In practice, this means that the programming languages we have today all get it wrong. They all pick a winner. Be it shared memory, a very common choice (e.g., C/C++, Java, C#, Python, …), or be it message-passing models with strong isolation (e.g., Erlang, JavaScript, Dart, …). At some point, the choice will turn out to be suboptimal and restricting. It will either lead to code riddled with data races or deadlocks, or it will become increasingly difficult to ensure data consistency or performance.

So, what’s the solution? I have been advocating for a while that we should investigate the combination of concurrency models. As part of this research, we started Project MetaConc. The goal of the project is not necessarily to design languages that combine models, but instead to provide more immediate solutions in the form of tools that allow us to reason about the complex programs we already have.

We outlined the vision in a short position paper last year. The general idea is to devise some form of meta-level interface for tools to abstract from the concrete concurrency models present in a language or library. By meta-level interface, I mean either some form of reflective API in a language, or an interface to the language implementation, which could be an API or something like a debugger protocol. This would allow us to construct tools that can handle many different concurrency models and, ideally, their combinations, too.
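To make this a bit more concrete, such a meta-level interface could look roughly like the following; this is a purely hypothetical sketch with invented names, not the actual MetaConc API:

    import java.util.List;

    // Hypothetical sketch: a language implementation describes its
    // concurrency models as data, so tools don't hard-code any model.
    interface ConcurrencyMetaInterface {
      // Entity kinds the language offers, e.g., "actor", "thread", "channel".
      List<String> entityKinds();

      // Stepping operations a debugger may request, e.g., "step-to-next-message".
      List<String> steppingOperations();

      // Breakpoint kinds, e.g., "on-message-receive", "on-lock-acquire".
      List<String> breakpointKinds();
    }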

We started the project by looking into debugging. Specifically, we focused on the actor model and identified the kinds of bugs that might occur and what kind of support we would want from a debugger to handle actor problems. Carmen, one of our PhD students, reported on it at the AGERE’16 workshop.

I myself had good fun adding debugging support to SOMns. With Truffle’s debugger support, that’s pretty straightforward. You just need to add a few tags to your AST nodes, as described in my blog post. Since I was already busy with the debugger, I also invested some time into the Language Server Protocol (LSP), a project pushed by Red Hat and Microsoft. I think, with a platform like Truffle and a generic way of talking to an IDE like the LSP, it should be possible to get basic IDE support for a language by just implementing a Truffle interpreter. But since that’s getting a little off topic, I’ll just point at the Can we get the IDE for free, too? blog post. In practical terms, the LSP allowed me to provide very basic support for code completion, jump-to-definition, and debugging in Visual Studio Code.
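The tagging itself is only a few lines per node class. The sketch below follows the current Truffle instrumentation API as I remember it; the 2017 API looked a bit different, so treat the details as assumptions:

    import com.oracle.truffle.api.instrumentation.InstrumentableNode;
    import com.oracle.truffle.api.instrumentation.StandardTags;
    import com.oracle.truffle.api.instrumentation.Tag;
    import com.oracle.truffle.api.nodes.Node;

    // Marking an AST node as a statement lets the Truffle debugger suspend
    // and step on it. (Wrapper-node generation is omitted in this sketch.)
    abstract class SOMStatementNode extends Node implements InstrumentableNode {
      @Override
      public boolean isInstrumentable() {
        return true;
      }

      @Override
      public boolean hasTag(Class<? extends Tag> tag) {
        return tag == StandardTags.StatementTag.class;
      }
    }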

More recently, I demoed our own Kómpos debugger for various concurrency models at the <Programming>‘17 conference. It is a live debugger that abstracts from specific concurrency models and instead allows us to use custom stepping operations and breakpoints as provided by the SOMns interpreter. In the demo, I wasn’t actually able to show that yet. That’s more recent work, which we wrote up and submitted for review to DLS’17. And at least for debuggers, I think we have come very close to the goal set out for the project. We devised a protocol that uses metadata to communicate from the interpreter to the debugger which breakpoints and stepping operations are supported. This makes the debugger completely independent from the concurrency models. We also showed that one can use such a concurrency-agnostic protocol to provide visualizations. And I hope that’s a good indication that other advanced debugging tools can be built on it, too.

That’s it for today. There are so many other things I probably won’t get to mention. But in the next, and likely last, post in the series, I am going to look at the SOM family of language implementations.

A 10 Year Journey, Stop 3: Performance, Performance, and Metaprogramming

The third post of this series is about how I started using Truffle and Graal, pretty much 4 years ago. It might be ranty in parts, but I started using Truffle when it was at a very early stage. So, things are a lot better today.

Concurrency needs Performance, usually

As mentioned in the last post, the result of my PhD was an ownership-based metaobject protocol (MOP) that is meant to enable VMs to support a wide range of different concurrency models. The major problem with the approach, and also with my evaluation, was that I couldn’t show that it is practical. The RoarVM is a simple bytecode interpreter, and the literature on compiling and optimizing metaobject protocols talked only about static systems, with restrictions that would make the ownership-based MOP impossible. Worse, MOPs had been kind of abandoned by the research community, because performance was an issue. Many researchers moved on to aspect-oriented approaches, at least in part because aspects are applied in a more targeted way and thus incur less general overhead than MOPs.
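For readers who haven’t seen the earlier posts: the core idea of the ownership-based MOP is that every object is owned by a domain, and the domain’s metaobject intercepts the fundamental operations on the objects it owns. A rough sketch, with names that only approximate the published design:

    // Rough, illustrative sketch of an ownership-based MOP: a domain
    // intercepts the fundamental operations on the objects it owns, so a
    // concurrency model can enforce its semantics at the meta level.
    interface Domain {
      Object readField(Object obj, int fieldIndex);
      void writeField(Object obj, int fieldIndex, Object value);
      Object requestExecution(Object receiver, String selector, Object[] arguments);
    }

    // An STM domain could log writes for rollback; an actor domain could
    // reject field writes coming from outside the owning actor.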

A hard problem, abandoned for 20 years, and nobody really interested in it anymore? Pretty much sounds like a stupid idea, right? It probably was, and perhaps still is. But concurrency researchers typically want to show that their techniques are useful for performance-critical applications. And I wanted to do that, too.

How to get a fast VM?

So, I needed a new platform for my research. One option would have been to take the RoarVM all the way and build a state-of-the-art JIT compiler for it. Another would have been to apply the ideas of the RoarVM to the CogVM and improve its JIT compiler. But building another JIT compiler? That’s a huge undertaking. It would probably take a few person-years to get anywhere useful. And, while I am curious about compilers, I don’t really see myself building more than a baseline JIT compiler.

But what other options do we have? RPython is a pretty interesting project. It promises you a meta-tracing JIT compiler for your simple interpreter. Sounds great. But there’s a catch: RPython doesn’t really have a concurrency story compatible with my goals. There’s a bit of experimenting with STM going on, but no decent shared-memory GC.

And then there was that Truffle thing colleagues from Oracle Labs kept talking about. It had just been released as open source at that point. Truffle promised that simple AST-based interpreters combined with partial evaluation would run applications as fast as Java on a JVM. Sounds great. And the JVM has everything I need for my research, too.

TruffleSOM: The first steps with a simple Smalltalk on top of Truffle

Truffle it is, I thought, and started implementing the little Smalltalk I had been toying around with in 2007 on top of it. There was already a Java version, simply called SOM. So, how hard could it be?

Well, it turns out, much harder than expected. Not really being a compiler person, I had no intuition for how partial evaluation would work on my interpreter. And as a testament to how hard it was for me: apparently, even to this day, I am third in the ranking of who spammed the graal-dev mailing list the most.

I suppose there were three important reasons for that. As I mentioned before, I had no intuition of how the partial evaluation really works and what kind of optimizations I could expect from the system. The second reason was that I did not really have access to the expertise. Mailing lists are fine, but slow. And people need to be willing to take the time to answer. So, during my endeavor of building a fast Smalltalk, the most helpful conversations were actually in person at conferences with people from the Truffle team, or when I got the chance to visit them in Linz. And the third reason, which is fixed by now, was the overall stability of the system. To me it was very surprising that I ran into many bugs in Truffle and the Graal compiler. I could somehow not really believe that the Truffle team didn’t encounter those issues with their JavaScript and Ruby implementations. But as it turns out, each new language implementation does things somewhat differently, and the languages are just different enough to trigger new edge cases that haven’t been considered before. As far as I know, a little of that is still happening today with Graal: every new language leads to one or two bugs being discovered that none of the previous languages hit.

How can this be sooo hard???

All in all, my experience of building a new language with Truffle and Graal was far from pleasant. On the contrary, it was frustrating. I often just didn’t have the knowledge to debug problems myself. And the Truffle team didn’t really have time to teach an outsider all those basics.

So, yeah, I was very close to throwing in the towel. Well, I actually kind of did. At least a little. There was a moment of “fuck this, how can it be so hard? screw Truffle! Let’s look into RPython!”

And PySOM was born. PySOM is a literal port of the SOM bytecode interpreter to Python. SOM is really a simple and small language. If you know what you’re doing and can type reasonably fast, you can implement it in 3 or 4 days in a new language.

PySOM was the first step. The next step was RPySOM: a port from Python to RPython, which is the Python subset that the RPython toolchain can compile statically into a fast interpreter with a meta-tracing JIT compiler. This experience was sooo much more pleasant. One big reason was that the PyPy+RPython community uses an IRC channel for communication and was super friendly and happy to help with all my problems. Another reason was that I had already known Carl Friedrich, one of the PyPy people, for a while, and he guided me through the classic pitfalls. And, I suppose, RPython at that point was already much more mature than Truffle was. So, fewer crashes, and I don’t think I triggered many bugs in RPython either. And, since it is a trace-based compiler, understanding what the optimizer did was also much easier, because the result mapped much more directly back to the input code of the interpreter than with a fancy graph-based compiler IR. So, yeah, RPySOM was born, and with that additional knowledge, I kind of managed to make TruffleSOM a reality, too.

Some of this story, and what we learned was written down in the Are We There Yet? article.

And Finally, Fast Metaprogramming!

Then it was time to get back to my original problem: how do I make my ownership-based metaobject protocol fast? Well, it turns out that if you have a fancy JIT compiler, the solution is pretty simple and already existed in other Truffle interpreters: dispatch chains. Dispatch chains are essentially lexical caches for dispatch operations. A generalization of polymorphic inline caches, if you will.
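For illustration, a dispatch chain boils down to something like the sketch below. The names are invented, and a real Truffle implementation would use node rewriting, but the structure is the same: a per-call-site chain of guarded entries that the partial evaluator can turn into straight-line, guarded code.

    // Sketch of a dispatch chain, a per-call-site cache for dispatch targets.
    abstract class DispatchNode {
      abstract Object dispatch(Object receiver, Object[] arguments);
    }

    interface InvokableMethod {  // hypothetical stand-in for a method object
      Object invoke(Object receiver, Object[] arguments);
    }

    final class CachedDispatchNode extends DispatchNode {
      private final Class<?> expectedClass;        // the guard
      private final InvokableMethod cachedTarget;  // lookup result for that class
      private final DispatchNode next;             // rest of the chain

      CachedDispatchNode(Class<?> expectedClass, InvokableMethod cachedTarget,
          DispatchNode next) {
        this.expectedClass = expectedClass;
        this.cachedTarget = cachedTarget;
        this.next = next;
      }

      @Override
      Object dispatch(Object receiver, Object[] arguments) {
        if (receiver.getClass() == expectedClass) {
          return cachedTarget.invoke(receiver, arguments);  // guard hit
        }
        return next.dispatch(receiver, arguments);          // miss: try next entry
      }
    }

    final class UninitializedDispatchNode extends DispatchNode {
      @Override
      Object dispatch(Object receiver, Object[] arguments) {
        // A real implementation would look up the method here and extend
        // the chain with a new CachedDispatchNode; omitted for brevity.
        throw new UnsupportedOperationException("chain growth omitted");
      }
    }

The same structure can cache the targets of reflective operations and metaobject-protocol handlers, not just plain message sends, which is why it generalizes polymorphic inline caches.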

Together with Chris Seaton, I published a paper on Zero-Overhead Metaprogramming, in which we were able to show how all kinds of reflective operations can be made fast, and in which I was finally able to show that my metaobject protocol can be realized without sacrificing performance.

A bit later, I also wrote up a longer paper comparing meta-tracing and partial evaluation in more detail.

Cutting a long story short: nothing is as easy as it sounds. In total, it took me two years to go from a simple Smalltalk AST interpreter to a system that can take on Java. But things should be better today. When starting to implement a new language with Truffle, there are now a few tutorials and other resources, and the platform is much more mature and pleasant to use!

Next week, I might take a break from this series, but there are at least two more posts coming:

  • Concurrency and Tooling, or ‘What is project MetaConc?’
  • and, Growing the SOM Family
