new text format for HTML files?

Christopher Browne cbbrowne@hex.net
Sun, 13 Aug 2000 21:55:02 -0500


On Sun, 13 Aug 2000 22:04:52 -0000, the world broke into rejoicing as
Yannick LE NY <y-le-ny@ifrance.com>  said:
> The English HTML files are now in SGML.
> For the next update of the French HTML files, do you want them in SGML or in
> HTML?
> Translating the HTML files into French already takes me a long time; if I need to
> learn SGML,
> I will need more time before I can first release these files, but if it is
> necessary, I will learn it,
> because it will be a future standard (see below):
> 
> >DocBook/XML.  Dan's summary of the Documentation Summit is at
> >http://mail.gnome.org/pipermail/gnome-doc-list/2000-July/001502.html
> 
> And I will need these packages below or others:
> 
> >And making that all work out normally requires three packages:
> >   a) Jade/OpenJade
> >   b) DocBook DTD
> >   c) Norm Walsh's Modular Style Sheets
> >which are all available in RPM and .deb form.
> 
> But how do you convert the HTML files into SGML files?
> Is there a text editor that allows natural entry and output in SGML?

I've run through the French docs twice, mapping a bunch of stuff,
and then giving up and starting over in hopes of getting it to go
a little more cleanly the next time.  It hasn't been getting
much cleaner, unfortunately.

I wrote a DSSSL script that transforms HTML tags into
DocBook tags; it is _not_ a complete translation by any means.
It does _no_ handling of <DIV> entries, which appear to be in fairly
major use in the French docs, and it expects itemized lists like:
<ul>  
  <li> this 
  <li> that 
  <li> other thing 
</ul> 
to be set up as such, whereas the HTML files seem to do a lot of
physical formatting that looks like:
<ul> <div> <b> Topic </b> about something </ul>
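
As a rough illustration of that kind of tag mapping (written in Python
rather than DSSSL, and _not_ the script mentioned above), here is a
minimal sketch.  The mapping table, the class name HtmlToDocBook and the
function convert() are all made up for the example; it only copes with
flat, un-nested lists, and it merely records the tags it does not know
how to map.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping, for illustration only.
LIST_MAP = {"ul": "itemizedlist", "ol": "orderedlist"}


class HtmlToDocBook(HTMLParser):
    """Map flat HTML lists to DocBook list markup; record anything else."""

    def __init__(self):
        super().__init__()
        self.out = []          # output pieces, in document order
        self.unhandled = []    # tags this sketch has no mapping for
        self.in_item = False   # True while a <listitem> is open

    def _close_item(self):
        if self.in_item:
            self.out.append("</para></listitem>")
            self.in_item = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # HTML allows <li> without </li>; close the previous item first.
            self._close_item()
            self.out.append("<listitem><para>")
            self.in_item = True
        elif tag in LIST_MAP:
            self.out.append("<%s>" % LIST_MAP[tag])
        else:
            self.unhandled.append(tag)

    def handle_endtag(self, tag):
        if tag == "li":
            self._close_item()
        elif tag in LIST_MAP:
            self._close_item()
            self.out.append("</%s>" % LIST_MAP[tag])

    def handle_data(self, data):
        text = " ".join(data.split())   # collapse runs of whitespace
        if text:
            self.out.append(escape(text, quote=False))


def convert(html_text):
    """Return (docbook_text, unhandled_tags) for a fragment of HTML."""
    parser = HtmlToDocBook()
    parser.feed(html_text)
    parser.close()
    return "\n".join(parser.out), parser.unhandled
```

Fed the well-formed list, convert() returns an itemizedlist with three
listitem entries and an empty "unhandled" list.  Fed the second form, it
reports div and b as unhandled, and the text ends up directly inside the
itemizedlist with no listitem around it, which is not usable DocBook.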

Any portion where tagging has been "abused", by using tags for something
other than what they were intended for because some web browser happens
to render the result "attractively", winds up requiring that the whole
section get retagged from scratch.
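
One way to find such sections ahead of time is to scan for lists whose
direct children are not list items.  The following is only a sketch of
that idea in Python; ListAbuseFinder and find_list_abuse() are
hypothetical names, the set of "void" tags is deliberately incomplete,
and the check covers just this one kind of misuse.

```python
from html.parser import HTMLParser

LISTS = ("ul", "ol")
# Tags with no end tag; an incomplete set, enough for a sketch.
VOID = {"br", "hr", "img", "meta", "link", "input"}


class ListAbuseFinder(HTMLParser):
    """Report anything other than <li> found directly inside <ul>/<ol>."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, innermost last
        self.problems = []   # (line number, description)

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # A new <li> implicitly closes a previous, unclosed <li>.
            while self.stack and self.stack[-1] not in LISTS:
                self.stack.pop()
        elif self.stack and self.stack[-1] in LISTS:
            line = self.getpos()[0]
            self.problems.append(
                (line, "<%s> directly inside <%s>" % (tag, self.stack[-1])))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip() and self.stack and self.stack[-1] in LISTS:
            line = self.getpos()[0]
            self.problems.append(
                (line, "text directly inside <%s>" % self.stack[-1]))


def find_list_abuse(html_text):
    """Return a list of (line, description) for suspicious list content."""
    finder = ListAbuseFinder()
    finder.feed(html_text)
    finder.close()
    return finder.problems
```

A list with proper <li> entries produces no reports, while the
<ul> <div> ... form is flagged at the <div>.  That at least gives a
list of places to retag by hand; it does not do the retagging.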

It looks like there is no _significant_ shortcut that avoids retagging
much of it by hand.

Furthermore, I haven't yet seriously looked at accent handling.

The English docs were in a fairly readily transformable form; the
French docs will take a _lot_ more work...
--
cbbrowne@hex.net - <http://www.hex.net/~cbbrowne/>
The proof of a system's value is its existence.
-- Alan Perlis
[Thus implying COBOL and JCL /do/ have some value after all!  Ed.]