[offtopic] marshalling

linas@linas.org linas@linas.org
Wed, 3 Jan 2001 12:14:28 -0600 (CST)


It's been rumoured that Christopher Browne said:
> 
> > Here's the question: if one writes a soap dtd/schema in the M$
> > framework, it will then auto-generate language bindings for several 
> > languages? (i.e. they treat the  soap dtd/schema as an IDL for 
> > all practical purposes? OR did they invent some new IDL language?)
> 
> None of the above.
> 
> What SOAP amounts to....

Right, but that doesn't answer the question.  Marshalling and message
formats are related to language bindings, but are not the same thing.


For example, if I were building a system like this, and this were
C++, I might build a base class like so:

class XML_IO_Base_Class {
  public:
    void SendMyself ();   // converts object to XML and 
                          // sends it on socket
    void ReceiveMyself(); // receives XML from socket, and 
                          // populates object field values
  private:
    int socket;   // etc ...
};

And anything deriving from this can send/receive itself over the net:
e.g.

class gnc_account : public XML_IO_Base_Class {
  public:
    char * account_name;
    char * bank_name;
    int current_balance;
};

So, to send the account struct across the net, we do:

gnc_account *acc = new gnc_account;
acc->account_name = "ABC Credit Line";
acc->current_balance=100;
acc->SendMyself();

The problem is, of course, that XML_IO_Base_Class needs to 'find out' 
or 'know', somehow, that gnc_account consists of three fields, two
of which are char *, and one of which is int.

In CORBA and RPC, a description of the structure is accomplished with 
an IDL; then IDL parsers generate 'stubs' that automatically
implement routines such as "SendMyself()". 
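
To make that concrete, here is a rough sketch of the kind of thing a
generated stub could boil down to: a table describing each field, plus
one generic routine that walks the table.  The names FieldDesc,
gnc_account_fields and to_xml are made up for illustration (no real
IDL compiler is claimed to emit exactly this), the fields are declared
const char * so that string literals can be assigned, and XML escaping
is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical field descriptor that an IDL compiler might emit.
enum FieldType { FIELD_STRING, FIELD_INT };

struct FieldDesc {
    const char *name;    // element name used on the wire
    FieldType type;      // how to interpret the bytes at 'offset'
    std::size_t offset;  // position of the field inside the struct
};

struct gnc_account {
    const char *account_name;
    const char *bank_name;
    int current_balance;
};

// This table is the part that would be machine-generated from the IDL.
static const FieldDesc gnc_account_fields[] = {
    { "account_name",    FIELD_STRING, offsetof(gnc_account, account_name) },
    { "bank_name",       FIELD_STRING, offsetof(gnc_account, bank_name) },
    { "current_balance", FIELD_INT,    offsetof(gnc_account, current_balance) },
};
static const std::size_t gnc_account_field_count =
    sizeof(gnc_account_fields) / sizeof(gnc_account_fields[0]);

// Generic marshaller: it knows nothing about gnc_account except what
// the table says.  Note: no escaping of '<' or '&' is done here.
std::string to_xml(const char *tag, const void *obj,
                   const FieldDesc *fields, std::size_t count)
{
    assert(obj != nullptr);
    const char *base = static_cast<const char *>(obj);
    std::ostringstream out;
    out << "<" << tag << ">";
    for (std::size_t i = 0; i < count; ++i) {
        const FieldDesc &f = fields[i];
        out << "<" << f.name << ">";
        if (f.type == FIELD_INT) {
            out << *reinterpret_cast<const int *>(base + f.offset);
        } else {
            const char *s =
                *reinterpret_cast<const char *const *>(base + f.offset);
            if (s) out << s;  // a null string becomes an empty element
        }
        out << "</" << f.name << ">";
    }
    out << "</" << tag << ">";
    return out.str();
}
```

The point is only that the type column in the table has to come from
somewhere; the marshaller cannot guess it.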

Note that XML DTDs contain *almost* enough information to be an IDL,
but not quite.   In particular, they fail to indicate the type of
the field.
For example:
<gnc_account>
  <account_name>ABC Credit Line</account_name>
  <current_balance>100</current_balance>
</gnc_account>

has a DTD that looks like
<!ELEMENT gnc_account     (account_name, current_balance)>
<!ELEMENT account_name    (#PCDATA)>
<!ELEMENT current_balance (#PCDATA)>

Sooo ... is the balance an int or a char *?  The DTD doesn't say,
can't say.
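
Here is a small sketch of what that means for a C++ receiver.  Both
"100" and "ABC Credit Line" are perfectly good #PCDATA, so the code
has to decide for itself whether to attempt an integer conversion.
The helper name pcdata_to_int is hypothetical.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Try to read element text as an int.  Returns false if the text is
// empty, is not a number, has trailing junk, or does not fit in an int.
bool pcdata_to_int(const std::string &text, int &out)
{
    if (text.empty()) return false;
    errno = 0;
    char *end = nullptr;
    long v = std::strtol(text.c_str(), &end, 10);
    if (end == text.c_str()) return false;     // no digits consumed
    if (*end != '\0') return false;            // trailing junk
    if (errno == ERANGE) return false;         // out of range for long
    if (v < INT_MIN || v > INT_MAX) return false;
    out = static_cast<int>(v);
    return true;
}
```

Nothing in the DTD tells the receiver to call this for current_balance
and not for account_name; that knowledge has to live somewhere else.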

This isn't a problem for perl, because perl 'doesn't have' types.

$account_balance = "100";  # string? number? no problem in perl

I don't know C# or VB, but maybe they're untyped as well.  But for 
C, C++, and Java, it's a problem.

Sooo I restate the question:  Does Microsoft define a new IDL
to deal with this issue, or do they extend the idea of a DTD
to indicate the type in some way?  For example, OFX marks up types
in the following rather bogus fashion:

<!-- four byte integer -->
<!ENTITY % INTTYPE  "(#PCDATA)">
<!--#ENTITY % INTTYPE			#Datatype(I-4)-->

<!-- string of 255 chars or less, BSOD if more -->
<!ENTITY % CHARTYPE "(#PCDATA)">
<!--#ENTITY % CHARTYPE			#Datatype(A-255)-->
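
A tool that wanted to use those comment annotations would have to
fish the type back out of them.  A minimal sketch, assuming the hint
has already been extracted as a string such as "I-4" or "A-255", and
assuming (from the comments above only) that 'I' means an integer of
that many bytes and 'A' a string of at most that many characters.
TypeHint and parse_type_hint are invented names.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

struct TypeHint {
    char kind;  // 'I' = integer, 'A' = character string (assumed)
    long size;  // bytes for 'I', maximum length for 'A' (assumed)
};

// Parse a hint of the form <letter>-<positive number>, e.g. "I-4".
bool parse_type_hint(const std::string &spec, TypeHint &out)
{
    if (spec.size() < 3 || spec[1] != '-') return false;
    if (spec[0] != 'I' && spec[0] != 'A') return false;  // unknown kind
    if (!std::isdigit(static_cast<unsigned char>(spec[2]))) return false;
    char *end = nullptr;
    long n = std::strtol(spec.c_str() + 2, &end, 10);
    if (*end != '\0' || n <= 0) return false;  // trailing junk or bad size
    out.kind = spec[0];
    out.size = n;
    return true;
}
```

That works, but the type information lives in comments that a
standard DTD parser throws away, which is what makes it bogus.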


---------
Either SOAP / .NET needs to solve this problem for typed languages,
or everybody should program in perl.  I'm curious about how this is
handled.

--linas