can you link html to sgml?

Christopher B. Browne cbbrowne@hex.net
28 Nov 2000 07:03:59 -0600


Richard -Gilligan- Uschold writes:
 > I have an html report that I want to call up from a link in the sgml
 > help system.  I do not want to convert the html to sgml first, not by
 > hand, anyway, as it was generated by directly by scheme.  Can this be
 > done?

I've got a DSSSL script [attached below] that can be used to transform
HTML into "something almost-like-DocBook;" the results need to get
cleaned up a tad in order to be used, but tend to at least be Useful
Enough in avoiding the work of manually tagging it all.

Usage is thus:
% jade -t sgml -d wherever-dsl-file-got-put.dsl inputfile.html > output.docbook

Note that the DSSSL script does _nothing_ about tables, and if you
have funky HTML attributes like CLASS, FONT, and such, the script will
pass them through unchanged, which might not be exactly what is
wanted.

I would also suggest taking the output and passing it through
"sgmlnorm" so as to clean up the formatting that is Not Very Pretty.

% sgmlnorm output.docbook > output2.docbook

[which will likely complain about a whole _host_ of things that are
Not Nice in the converted document...]

More or less the whole point of the problem is that HTML _as commonly
used_ is so ill-specified that HTML documents tend to contain all
sorts of unstructured "physical rendering rubbish" that is impossible
to transform into DocBook in a really coherent way.  The above stuff
gets you probably 60% of the way; the other 40% will involve taking a
look at what's produced, and probably _throwing out_ a bunch of tags
that were being used to try to force in some physical rendition
information that should outright Not Be In Use.

---  Set phasers to Cut Here  ---

<!doctype style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN">

(define debug
  (external-procedure "UNREGISTERED::James Clark//Procedure::debug"))

(declare-flow-object-class element
  "UNREGISTERED::James Clark//Flow Object Class::element")
(declare-flow-object-class empty-element
  "UNREGISTERED::James Clark//Flow Object Class::empty-element")
(declare-flow-object-class document-type
  "UNREGISTERED::James Clark//Flow Object Class::document-type")
(declare-flow-object-class processing-instruction
  "UNREGISTERED::James Clark//Flow Object Class::processing-instruction")
(declare-flow-object-class formatting-instruction
  "UNREGISTERED::James Clark//Flow Object Class::formatting-instruction")
(declare-characteristic preserve-sdata?
  "UNREGISTERED::James Clark//Characteristic::preserve-sdata?"
  #t)

(define (copy-attributes #!optional (nd (current-node)))
  (let loop ((atts (named-node-list-names (attributes nd))))
    (if (null? atts)
        '()
        (let* ((name (car atts))
               (value (attribute-string name nd)))
          (if value
              (cons (list name value)
                    (loop (cdr atts)))
              (loop (cdr atts)))))))

(default (if (node-property 'momitend (current-node))
		(make empty-element attributes: (copy-attributes))
		(make element attributes: (copy-attributes))))

(define (write-string str)
  (make formatting-instruction
        data: str))

(element HTML
    (make sequence
	(make document-type 
		name: "ARTICLE" 
		public-id: "-//Davenport//DTD DocBook V3.0//EN")
	(process-children)))

(element article (make element))

(element title (make element))

(element head
    (make element gi: "Artheader"))

(element BODY
        (make element gi: "Para"))

(element h1
   (sosofo-append
    (write-string "<") (write-string "Sect1") (write-string ">")
    (make element gi: "Title" )))

(element h2
   (sosofo-append
    (write-string "<") (write-string "Sect2") (write-string ">")
    (make element gi: "Title" )))

(element h3
   (sosofo-append
    (write-string "<") (write-string "Sect3") (write-string ">")
    (make element gi: "Title" )))

(element h4
   (sosofo-append
    (write-string "<") (write-string "Sect4") (write-string ">")
    (make element gi: "Title" )))

(element h5
   (sosofo-append
    (write-string "<") (write-string "Sect5") (write-string ">")
    (make element gi: "Title" )))

(element heading
    (make element gi: "Title"))

(element p
    (make element gi: "Para"))

(element tt
    (make element gi: "Literal"
	attributes: `(("remap" "tt")))) ;; fixme

(element tscreen (process-children)) ; FIXME

(element ul
    (make element gi: "ItemizedList"))

(element li
   (make element gi: "ListItem" 
	(make element gi: "Para")))

(element URL
    (make element gi: "ULink"
	  attributes: `(("URL" ,(attribute-string "URL")))
	  (if (attribute-string "NAME")
		(literal (attribute-string "NAME"))
		(literal (attribute-string "URL")))))

(element IMG
   (make element gi: "Inlinegraphic"
        attributes: `(("Fileref" ,(attribute-string "SRC")) (copy-attributes))))

(element A
    (if
        (attribute-string "HREF")
        (make element gi: "Ulink"
               attributes: `(("URL" ,(attribute-string "HREF"))(copy-attributes)))
        (make element gi: "Anchor"
               attributes: `(("ID" ,(attribute-string "NAME"))(copy-attributes)))))

(element label 
   (make empty-element gi: "Anchor"
	attributes: (copy-attributes)))

(element ol
    (make element gi: "OrderedList"))

(element em
    (make element gi: "Emphasis"))

(element bf
    (make element gi: "Literal"
		  attributes: `(("remap" "bf"))))

(element pre
    (make element gi: "ProgramListing"))

(element quotep (process-children))


(element dl
   (make element gi: "GlossList"
	(process-matching-children "DT")))

(define (get-sibs)
    (let loop ( (rest (follow (current-node)))
		(accum (empty-sosofo)))
	(let ( (tag (gi (node-list-first rest))))
	    (if (or (not tag)
		    (string=? tag "DT"))
		 accum
		(loop (node-list-rest rest)
		    (sosofo-append accum 
			(process-node-list 
			    (node-list-first rest))))))))

(element DT
   (make element gi: "GlossEntry"
        (make element gi: "GlossTerm")
        (make element gi: "GlossDef" 
	     (get-sibs))))

(element BR
    (make element gi: "Emphasis"))

-- 
(concatenate 'string "cbbrowne" "@hex.net") <http://www.ntlug.org/~cbbrowne/>
"If someone  criticises your attitude towards an  operating system and
your only answer is ``get  laid,'' this indicates that you have either
got a  major case  of Reality  Distortion Syndrome or  that you  are a
troll..." -- mawa