can you link html to sgml?
Christopher B. Browne
cbbrowne@hex.net
28 Nov 2000 07:03:59 -0600
Richard -Gilligan- Uschold writes:
> I have an html report that I want to call up from a link in the sgml
> help system. I do not want to convert the html to sgml first, not by
> hand, anyway, as it was generated by directly by scheme. Can this be
> done?
I've got a DSSSL script [attached below] that can be used to transform
HTML into "something almost-like-DocBook;" the results need to get
cleaned up a tad in order to be used, but tend to at least be Useful
Enough in avoiding the work of manually tagging it all.
Usage is thus:
% jade -t sgml -d wherever-dsl-file-got-put.dsl inputfile.html > output.docbook
Note that the DSSSL script does _nothing_ about tables, and if you
have funky HTML attributes like CLASS, FONT, and such, the script will
pass them through unchanged, which might not be exactly what is
wanted.
I would also suggest taking the output and passing it through
"sgmlnorm" so as to clean up the formatting that is Not Very Pretty.
% sgmlnorm output.docbook > output2.docbook
[which will likely complain about a whole _host_ of things that are
Not Nice in the converted document...]
More or less the whole point of the problem is that HTML _as commonly
used_ is so ill-specified that HTML documents tend to contain all
sorts of unstructured "physical rendering rubbish" that is impossible
to transform into DocBook in a really coherent way. The above stuff
gets you probably 60% of the way; the other 40% will involve taking a
look at what's produced, and probably _throwing out_ a bunch of tags
that were being used to try to force in some physical rendition
information that should outright Not Be In Use.
--- Set phasers to Cut Here ---
<!doctype style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN">
(define debug
(external-procedure "UNREGISTERED::James Clark//Procedure::debug"))
(declare-flow-object-class element
"UNREGISTERED::James Clark//Flow Object Class::element")
(declare-flow-object-class empty-element
"UNREGISTERED::James Clark//Flow Object Class::empty-element")
(declare-flow-object-class document-type
"UNREGISTERED::James Clark//Flow Object Class::document-type")
(declare-flow-object-class processing-instruction
"UNREGISTERED::James Clark//Flow Object Class::processing-instruction")
(declare-flow-object-class formatting-instruction
"UNREGISTERED::James Clark//Flow Object Class::formatting-instruction")
(declare-characteristic preserve-sdata?
"UNREGISTERED::James Clark//Characteristic::preserve-sdata?"
#t)
(define (copy-attributes #!optional (nd (current-node)))
(let loop ((atts (named-node-list-names (attributes nd))))
(if (null? atts)
'()
(let* ((name (car atts))
(value (attribute-string name nd)))
(if value
(cons (list name value)
(loop (cdr atts)))
(loop (cdr atts)))))))
(default (if (node-property 'momitend (current-node))
(make empty-element attributes: (copy-attributes))
(make element attributes: (copy-attributes))))
(define (write-string str)
(make formatting-instruction
data: str))
(element HTML
(make sequence
(make document-type
name: "ARTICLE"
public-id: "-//Davenport//DTD DocBook V3.0//EN")
(process-children)))
(element article (make element))
(element title (make element))
(element head
(make element gi: "Artheader"))
(element BODY
(make element gi: "Para"))
(element h1
(sosofo-append
(write-string "<") (write-string "Sect1") (write-string ">")
(make element gi: "Title" )))
(element h2
(sosofo-append
(write-string "<") (write-string "Sect2") (write-string ">")
(make element gi: "Title" )))
(element h3
(sosofo-append
(write-string "<") (write-string "Sect3") (write-string ">")
(make element gi: "Title" )))
(element h4
(sosofo-append
(write-string "<") (write-string "Sect4") (write-string ">")
(make element gi: "Title" )))
(element h5
(sosofo-append
(write-string "<") (write-string "Sect5") (write-string ">")
(make element gi: "Title" )))
(element heading
(make element gi: "Title"))
(element p
(make element gi: "Para"))
(element tt
(make element gi: "Literal"
attributes: `(("remap" "tt")))) ;; fixme
(element tscreen (process-children)) ; FIXME
(element ul
(make element gi: "ItemizedList"))
(element li
(make element gi: "ListItem"
(make element gi: "Para")))
(element URL
(make element gi: "ULink"
attributes: `(("URL" ,(attribute-string "URL")))
(if (attribute-string "NAME")
(literal (attribute-string "NAME"))
(literal (attribute-string "URL")))))
(element IMG
(make element gi: "Inlinegraphic"
attributes: `(("Fileref" ,(attribute-string "SRC")) (copy-attributes))))
(element A
(if
(attribute-string "HREF")
(make element gi: "Ulink"
attributes: `(("URL" ,(attribute-string "HREF"))(copy-attributes)))
(make element gi: "Anchor"
attributes: `(("ID" ,(attribute-string "NAME"))(copy-attributes)))))
(element label
(make empty-element gi: "Anchor"
attributes: (copy-attributes)))
(element ol
(make element gi: "OrderedList"))
(element em
(make element gi: "Emphasis"))
(element bf
(make element gi: "Literal"
attributes: `(("remap" "bf"))))
(element pre
(make element gi: "ProgramListing"))
(element quotep (process-children))
(element dl
(make element gi: "GlossList"
(process-matching-children "DT")))
(define (get-sibs)
(let loop ( (rest (follow (current-node)))
(accum (empty-sosofo)))
(let ( (tag (gi (node-list-first rest))))
(if (or (not tag)
(string=? tag "DT"))
accum
(loop (node-list-rest rest)
(sosofo-append accum
(process-node-list
(node-list-first rest))))))))
(element DT
(make element gi: "GlossEntry"
(make element gi: "GlossTerm")
(make element gi: "GlossDef"
(get-sibs))))
(element BR
(make element gi: "Emphasis"))
--
(concatenate 'string "cbbrowne" "@hex.net") <http://www.ntlug.org/~cbbrowne/>
"If someone criticises your attitude towards an operating system and
your only answer is ``get laid,'' this indicates that you have either
got a major case of Reality Distortion Syndrome or that you are a
troll..." -- mawa