[offtopic] marshalling

Tyson Dowd trd@cs.mu.OZ.AU
Thu, 4 Jan 2001 17:48:22 +1100


On 03-Jan-2001, linas@linas.org <linas@linas.org> wrote:
> > There's one technical feature of .NET that tends to get lost in the
> > spin.  This might give you an idea of the level of interoperability
> > you can get.  It's possible (indeed, it's simple) for a class written in
> > one language to inherit from a class written in another language. 
> > Since there is a single root class (Object) this happens all the time.
> > This is full object orientation, virtual methods, implementation
> > inheritance, you name it.  Accessing a field or calling a method of a
> > type written in another language is as easy as if it were one written in
> > your language.  
> 
> Looks beautiful. However, isn't there dirt under that carpet?
> 
> How does the 'class factory' get the 'meta-class description'?
> 
> I know of only a few ways of getting the meta information:
> -- use SWIG: it parses C header files and tries to guess.
> -- use g-wrap & scheme forms to specify the interface
> -- use (corba) IDL and a stub generator to get language bindings
> -- write an XML schema that specifies the object.

You missed one:

-- use reflection on .class files (like Java)

And it's the final option that MS goes for.  The meta-information
(which contains all of the stuff that is usually in an IDL) is stored
with the code -- the jitter uses it to jit the code, and other compilers
use it to interface to the code.  Your compiler will simply load the
.dll you want to interface with, reflect over the class in it, and
insert the contents into its symbol tables somehow.
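As a rough analogy only (this is Python, not .NET, and the `Hello` class
and `build_symbol_table` helper are made-up names for illustration),
reflecting over an already-loaded class to fill a compiler-style symbol
table could look something like this:

```python
import inspect


class Hello:
    """Stand-in for a class found in some already-compiled module."""

    greeting = "Hello, world"

    def main(self, times: int) -> None:
        for _ in range(times):
            print(self.greeting)


def build_symbol_table(cls):
    """Reflect over cls and record its methods and plain fields."""
    table = {"name": cls.__name__, "methods": {}, "fields": {}}
    for name, member in vars(cls).items():
        if name.startswith("__"):
            continue  # skip interpreter-internal entries
        if inspect.isfunction(member):
            # Keep the signature as text, roughly what an IDL would state.
            table["methods"][name] = str(inspect.signature(member))
        else:
            table["fields"][name] = type(member).__name__
    return table


symbols = build_symbol_table(Hello)
print(symbols)
```

No separate interface description is consulted here: everything in the
table comes from the class itself, which is the point of the reflection
approach.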

Everything can be disassembled into information like this:

.assembly 'hello' { }
.class public 'hello' {
    .method static default void 'main'() {
        .entrypoint
        .maxstack 1
        .zeroinit
        .locals ()
        ldstr   "Hello, world\n"
        tail.
        call    void 'System'.'Console'::'WriteLine'(class 'System'.'String')
        ret
    }
}

And the meta-information is extensible -- you can add your own
meta-information (as a normal developer or as a compiler writer).
For example, C++ can add an attribute to an "int32" parameter to say that
in C++ you should consider this parameter to be a "long".  But other
languages are free to ignore this attribute and treat it as an int32,
since they might not draw any distinction between the two (or might not
have a long at all).  In Pascal you can add an attribute to a parameter
to mark it as an output parameter, so that the caller knows they don't
have to initialize the actual parameter before the call.
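To make the "free to ignore" idea concrete, here is a small Python
sketch that uses `typing.Annotated` as a stand-in for custom attributes.
The `Int32` and `CppLong` classes, the `resize` function, and the two
`type_seen_by_*` helpers are all hypothetical names; none of this is the
.NET API.

```python
from typing import Annotated, get_type_hints


class Int32:
    """Stand-in for the VM's 32-bit integer type."""


class CppLong:
    """Hypothetical custom attribute: 'C++ callers, treat this as long'."""


def resize(width: Annotated[Int32, CppLong()]) -> None:
    """Example method whose parameter carries an extra attribute."""


def type_seen_by_cpp(func, param):
    """A consumer that understands the CppLong attribute."""
    base = get_type_hints(func)[param]
    hint = get_type_hints(func, include_extras=True)[param]
    extras = getattr(hint, "__metadata__", ())
    if any(isinstance(extra, CppLong) for extra in extras):
        return "long"
    return base.__name__


def type_seen_by_other(func, param):
    """A consumer that ignores custom attributes and sees the base type."""
    return get_type_hints(func)[param].__name__


print(type_seen_by_cpp(resize, "width"))
print(type_seen_by_other(resize, "width"))
```

Both consumers read the same declaration; only the one that knows about
the attribute changes its view of the parameter.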

Finally, because this kind of impedance mismatch happens a lot, there is
a standard level of interoperability called the CLS (Common Language
Specification).  This is a subset of the type system plus some naming
conventions and some functionality that compilers must implement (e.g.
keyword escaping), which together make it relatively easy to interop
with other languages.  It is expected (although it remains to be seen)
that you will offer a simplified API for interoperability purposes, and
a richer one for languages that might be close to your own.
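Keyword escaping is easy to picture with a small sketch.  This one is in
Python and assumes a trailing underscore as the escape, which is only
Python's usual convention; the CLS does not dictate that spelling, and
`escape_identifier` is a made-up helper name.

```python
import keyword


def escape_identifier(name):
    """Return a spelling of name that is usable as a Python identifier."""
    if keyword.iskeyword(name):
        # The name collides with a reserved word in this language.
        return name + "_"
    return name


# Names as they might appear in another language's meta-information.
imported = ["hello", "class", "main", "lambda"]
print([escape_identifier(n) for n in imported])
```

A type or method whose name happens to be a keyword in the consuming
language stays reachable this way instead of becoming a syntax error.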

> 
> The last three options all require extra work on the part of the 
> interface designer.  The first option only works automatically 
> in the simplest cases, and needs hand-massaging for anything harder.
> 
> The way you wrote up your description makes it sound like Microsoft
> has somehow magically solved the meta-class problem, presumably
> by building a super-duper-ultra-smart version of SWIG.  Right?

If you develop super-SWIG for N languages, it either
	(a) understands just one sort of "header file" (e.g. C header files)
	   and generates N different bindings, or
	(b) understands N different "header files" and generates N
	   different bindings

Doing (a) is just the same as using an IDL, and we already eliminated
that.  Besides, as the implementor of super-SWIG, you have to maintain N
different bindings, while those languages go their separate directions,
which is a tough job.

Doing (b) is an N x N problem, where N is the number of languages that
want to interoperate.  This is N times tougher than (a).

So the solution .NET uses is to develop one VM, and embed the meta-info
in the code the VM runs.  They support just the VM, so they don't have
to worry about
N of anything.  Everybody else targets the VM and has to worry about
getting the VM types working in their system, so they don't have to
worry about N of anything either.  Each compiler writer has to write a
reasonably simple converter from the CLS to their language, and can
optionally implement a moderately complex full binding of all the .NET
types and features into their language. 
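The counting argument can be written down as a bit of arithmetic.  The
Python below is only a sketch of the numbers in this mail; the function
names are made up, and "N x N" is taken literally for option (b).

```python
def bindings_option_a(n):
    """(a): one header format in, N bindings out, all on one maintainer."""
    return n


def bindings_option_b(n):
    """(b): N header formats in, N bindings out for each, so N x N."""
    return n * n


def converters_per_compiler_writer(n):
    """Shared VM: each compiler writer owns one CLS converter, whatever N is."""
    return 1


# Compare the three counts as the number of languages grows.
for n in (2, 5, 10):
    print(n, bindings_option_a(n), bindings_option_b(n),
          converters_per_compiler_writer(n))
```

With the shared VM the total work still grows with N, but it is spread
out so that no single party has to maintain more than its own converter.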

The only time you have an explosion in interoperability problems is when
N languages with features not directly supported by the VM want to
interoperate on those features.  And because you can use custom
attributes to add your own meta-info to the VM, the hope is that they
will lead to some sort of cooperative standard.  Whether this actually
pans out remains to be seen.

Your other mail on this topic said:

> OK,
> I think I got it now: viz. basically, a super-duper SWIG.
>   
> http://www.swig.org
>   
> The 'right thing' to do in the free software world would be to
> write a module for SWIG that auto-generates SOAP schema &
> perform the marshalling when invoked.

Well, this depends entirely on what you mean by right thing ;-)

I think this would be nice, and it looks like SWIG would be pretty
suitable for this task.

But I also think that an open-source universal VM for Linux et al would
be a very nice thing too.  It's just a much bigger problem than a SWIG
module.

-- 
       Tyson Dowd           # 
                            #  Surreal humour isn't everyone's cup of fur.
     trd@cs.mu.oz.au        # 
http://www.cs.mu.oz.au/~trd #