Is there anything *enjoyable* about our development process?

Stuart D. Gathman stuart at bmsi.com
Sun Oct 16 23:34:04 EDT 2005


On Sun, 16 Oct 2005, Thomas Bushnell BSG wrote:

> So, I take this as a grudgingly granted (finally) admission that I
> cannot, in fact, add whitespace wherever I please between Python
> tokens.  

Short answer: no

Long (and Clintonesque) answer: If you define "tokens" as lexical units that do
not contain whitespace, then Python programs suitable for a standard Python
lexer cannot be expressed with those tokens.  Hence it makes no difference
what you do with the whitespace: the programs will all be invalid (except for
a few expressions).

If you define "token" as whatever the language lexical structure defines
as a "token" (and in Python, the INDENT and DEDENT tokens are composed of
white space characters), then you can add whitespace between "tokens" wherever
you please, since whitespace outside of tokens is ignored.  
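
Both halves of that claim can be checked with the standard library's
`tokenize` module.  This is only a small sketch; the helper name
`token_names` and the two sample strings are made up for illustration.

```python
import io
import tokenize

def token_names(source):
    """Return the names of the tokens the standard tokenizer produces."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]

tight = "if x:\n    y=1\n"
loose = "if   x  :\n    y   =   1\n"

# Extra blanks *between* tokens do not change the token stream.
assert token_names(tight) == token_names(loose)

# The leading blanks on the second line are themselves a token (INDENT),
# and the end of the block shows up as a DEDENT.
print(token_names(tight))
```

The printed list contains INDENT and DEDENT entries alongside the ordinary
NAME, OP and NUMBER tokens, which is the sense of "token" used above.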

And remember C also has whitespace in some of its tokens - string literals.
(So perhaps you want to define "token" as "whatever languages I am familiar
with define as a token".)

Of course, that Clintonesque answer needs a rationale:

You cannot add whitespace wherever you please in a C program and
expect it to get accepted by any project with a coding standard.
In both cases (Python and C that conforms to a coding standard), you can add
whitespace where you please within reason.

> My point is simply exactly what it was long before: that python is not
> modern in syntax; it's a step backwards, because it's not context-free.

It's a step forward for programs that can be easily read and understood
by humans - even when they are unfamiliar with the language.

Here is a suggestion.  Pretend Python has BEGIN and END (or INDENT and DEDENT
- the actual names in the Python compiler) tokens.  Use them to delimit blocks.
Have your code generator put the whole program on one line if it wants. (You
can use ';' to separate statements in Python.)  Then beautify it and remove the
BEGIN and END tokens.  That last step is a really trivial filter.  The
INDENT and DEDENT tokens are represented by newlines and tabs/spaces rather
than visible characters.  
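
A minimal sketch of such a filter, under some loudly stated assumptions:
the generator emits BEGIN, END and ';' as separate whitespace-delimited
words, none of them ever appears inside a string literal, and the function
name `unbrace` is hypothetical.  It does the beautifying and the removal
of BEGIN/END in one pass.

```python
def unbrace(text, indent="    "):
    """Turn 'hdr : BEGIN stmt ; stmt END' into indented Python source.

    Assumes BEGIN, END and ';' are separate whitespace-delimited words
    and never occur inside string literals.
    """
    lines, current, depth = [], [], 0

    def flush():
        # Emit the words collected so far as one line at the current depth.
        if current:
            lines.append(indent * depth + " ".join(current))
            current.clear()

    for word in text.split():
        if word == "BEGIN":
            flush()
            depth += 1
        elif word == "END":
            flush()
            depth -= 1
        elif word == ";":
            flush()
        else:
            current.append(word)
    flush()
    return "\n".join(lines) + "\n"

# A whole program on one line, as a code generator might emit it.
one_line = ("def f ( n ) : BEGIN if n > 1 : BEGIN return n * f ( n - 1 ) END "
            "return 1 END a = f ( 3 ) ; b = f ( 5 )")

program = unbrace(one_line)
print(program)

namespace = {}
exec(program, namespace)   # the filtered text is ordinary Python
```

A real generator would want a proper tokenizer in front of this so that
string literals survive, but the block handling itself stays this small.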

The Python *grammar* is context free in terms of the tokens involved.  I know
you hate the lexical representation of the INDENT and DEDENT tokens, but it is
only a *lexical* issue, and easily adapted to your taste.  And the standard
lexical representation happens to match what C, Pascal, or any other block
structured language looks like when presented for human consumption.  Even a
LISP tokenizer has context.  It is only after the characters have been
converted to the beautiful internal list structures that LISP is context free.
Internally, the Python compiler has INDENT and DEDENT tokens to delimit blocks
just like any other language with blocks.
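
To make "the context lives in the lexer" concrete, here is a simplified
sketch of the indentation stack described in the Python language
reference.  It is not the real tokenizer: it handles spaces only, ignores
continuation lines, brackets and comments, and the name `block_tokens`
is invented for the example.

```python
def block_tokens(source):
    """Yield ('INDENT',), ('DEDENT',) or ('LINE', text) per logical line.

    Simplified: spaces only, no continuation lines, brackets or comments.
    """
    stack = [0]                      # the only context the lexer keeps
    for raw in source.splitlines():
        text = raw.lstrip(" ")
        if not text:
            continue                 # blank lines carry no block information
        width = len(raw) - len(text)
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT",)
        while width < stack[-1]:
            stack.pop()
            yield ("DEDENT",)
        if width != stack[-1]:
            raise IndentationError("dedent does not match any outer level")
        yield ("LINE", text)
    while len(stack) > 1:            # close blocks still open at end of input
        stack.pop()
        yield ("DEDENT",)

sample = "if a:\n    if b:\n        c\nd\n"
kinds = [tok[0] for tok in block_tokens(sample)]
print(kinds)
```

Once the stack has done its work, the INDENT/DEDENT pairs nest exactly
like BEGIN/END or braces, and the grammar that consumes them never needs
to look at a column number.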

I am often told that the standard prefix notation in LISP is not a problem,
because it is trivial to add an infix front end.  I even did a bit of
programming using such a front end for an IBM XT version of LISP (MUMATH).
While true, it is not feasible in practice to have every programmer
use their own macro syntax.  Truly amazing things can be done with the
C preprocessor, especially when you realize that recursion (#include)
and decision (#if) make it Turing complete (if somewhat slow, file includes
not being especially fast).  But such code is only accepted by the obfuscated C
contest.  So everyone in a LISP project has to use the same syntax - which is
almost always prefix - making the possibility of using infix instead a red
herring.

> I don't know about you, but I regard the whole point of computers in
> *not* requiring humans to do everything.

Humans supply intelligence - literally the ability "to choose between".
It will be a while, if ever, before humans are able to pass on that
ability to their mechanical creations.  The whole point of human interfaces
and computer languages is to make expressing those choices both
intuitive for the human and feasible for the machine to implement.

> > Since the core of the system is in C, 
> > your code generators would be much better off targetting C code.

> C is not an extension language; or do we need to repeat *that* part of
> this discussion one more time?  

Agreed, but the code generator argument is a red herring for an extension
language.  If you need a code generator for something, it should target C -
the core project language.  Unless the algorithm is truly self modifying
(like, say, "genetic" algorithms for super-optimizers), code generators are a
hack to work around weaknesses in a language (e.g. bitblt mini compilers 
for graphics drawing primitives).

> Why not aim for both?  Good grief.

In my opinion, LISP excels at the machine readable part, especially
for self modifying programs, but not the human readable part.  But that
is only an opinion.  Perhaps there is a statistical study somewhere of 
how well the average person or even the average programmer comprehends 
LISP code that could provide a more objective evaluation.

Since Python seems to be so controversial, how about some of the other
options (including sticking with Scheme)?

Do we agree on these requirements for the high level language?

Must be dynamic (late binding).
Must have automatic memory management.
Compilation must be "invisible" to the script programmer (gives
  the illusion of a pure interpreter even if some kind of byte code
  or machine code is cached).
Must be a standard package on most distros.
Must be currently actively supported.
Must provide a clean C language extension API to interface with
  the core GnuCash C code.
  o Must support callbacks into C from high level language.
  o Must support callbacks into high level language from C.
Must provide a robust module system able to encapsulate the GnuCash interface
  in a module.
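
To illustrate what the two callback directions mean, here is a rough
sketch using Python's `ctypes` module and the C library's `qsort`.  It is
not a C extension API and not a proposal for GnuCash; it assumes a POSIX
system where the C library can be loaded this way, and `compare` and
`values` are names made up for the example.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system; find_library("c") locates the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature of the comparison callback: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

def compare(a, b):
    """High level code that the C library will call back into."""
    return (a[0] > b[0]) - (a[0] < b[0])

libc.qsort.restype = None
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                       ctypes.c_size_t, CMPFUNC]

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)

# High level language -> C (the qsort call), and C -> high level language
# (qsort invoking compare for every comparison).
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), CMPFUNC(compare))
print(list(values))
```

Any candidate language would need an equally direct way to do both
directions against the GnuCash core.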

Other features are more controversial:

Robust native support for "object oriented" paradigm (LISP could argue
  that closures are more general).
Block structure (BEGIN/END,INDENT/DEDENT,{/},[/],yada/yoda).
Scripts should be understandable by humans with minimal 
  or no introduction to the language.  This may leave out otherwise
  very elegant languages like Prolog and pure functional languages.
  o This may also leave out languages with postfix or prefix based syntax.
  o Infix languages with no operator precedence are probably also out
    (e.g. Smalltalk, APL).

-- 
	      Stuart D. Gathman <stuart at bmsi.com>
    Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.



More information about the gnucash-devel mailing list