Re: Testing reports

Colin Scott gnucash at double-bars.net
Mon Apr 9 23:32:00 EDT 2012


> Can one not find an html parser that will read the html into a DOM
> tree?  Then one could walk the tree comparing the tags, attributes,
> and contents with a DOM from a reference page.  That way minor 
> changes in the html that do not change the resulting displayed page
> will be ignored.

I expect you are right, but I've never worked with HTML at that sort of level, so please pardon my ignorance!

Besides, I was making two points.  The first was that one needs to weigh the cost of what one is doing against the benefits.  If it's easy, then maybe it's worth doing - but maybe not, read on!  :-)

The second was that *ANY* change to the HTML output needs to be flagged and checked - if you have something that works, it shouldn't be changed at all unless there is a very good reason for making the change.  If the output changes, and the changed HTML is determined to be correct, then it should become the reference against which future test output is compared.  IMHO one should *NEVER* automatically accept a change in output without it having been thoroughly scrutinised and checked by a human!
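
To make the second point concrete, here is a rough sketch of the "flag any change" check, assuming Python and its standard difflib module.  The function name and the "reference"/"current" labels are made up for illustration and are not part of any GnuCash test code.  It treats the two texts as different if even one character differs and returns a diff for a human to read; it makes no attempt to decide whether the change is legitimate.

```python
import difflib


def diff_against_reference(reference_html, current_html):
    """Return unified-diff lines; an empty list means the output is unchanged.

    Hypothetical helper: any textual difference at all is reported,
    including a change of case or whitespace.
    """
    if reference_html == current_html:
        return []
    return list(
        difflib.unified_diff(
            reference_html.splitlines(keepends=True),
            current_html.splitlines(keepends=True),
            fromfile="reference",
            tofile="current",
        )
    )


if __name__ == "__main__":
    old = '<A HREF="foo">foo</A>\n'
    new = '<a href="foo">foo</a>\n'
    for line in diff_against_reference(old, new):
        print(line, end="")
```

Only once a human has looked at the diff and accepted it would the current output be saved as the new reference.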

Colin

-------- Original Message --------

*Subject:* spam,Re: Testing reports
*From:* "Colin Scott" <gnucash at double-bars.net>
*To:* warlord at MIT.EDU
*CC:* jralls at ceridwen.us, yawar.amin at gmail.com, gnucash-user at gnucash.org, gnucash at double-bars.net
*Date:* Mon, 9 Apr 2012 10:00 +0100 (BST)

Sorry for the delay - I've been away.

> Define "html text"?  For example, would you consider this:
> 
>   <A HREF="foo">foo</A>
> 
> and this:
> 
>   <a href="foo">foo</a>

For the purposes of the exercise in hand, I would consider them to be different.  Working out whether the difference is significant is not something I would want to leave to a robot - not because it isn't perfectly possible to do robotically, but because it simply isn't worth devoting that much time and effort to inventing a robot that will do it reliably.  After all, even such an apparently minor change as a case inversion shouldn't be made without a damn' good reason!

>  What about:
>   <a href="foo">foo</a>    and    <a href="bar">foo</a>
> ?

Different under *any* circumstances!  :-)

Colin

-------- Original Message --------

*Subject:* Re: Testing reports
*From:* Derek Atkins <warlord at MIT.EDU>
*To:* spam at spambayes.invalid,gnucash at double-bars.net
*CC:* jralls at ceridwen.us, yawar.amin at gmail.com, gnucash-user at gnucash.org
*Date:* Mon, 02 Apr 2012 09:19:14 -0400

"Colin Scott" <gnucash at double-bars.net> writes:

>> So basically what you would like to test is that the normalized HTML
>> output is "the same".  This would require:
>
> Actually, no.  Were I doing this I would automate a test for whether
> the HTML text has changed.  At the point a change is found, it would
> probably require human intervention to see what has changed and why -
> *any* change is an error unless there is a legitimate reason for it,
> and automating the legitimacy test is probably too expensive a task to
> be worth doing.

Define "html text"?  For example, would you consider this:

  <A HREF="foo">foo</A>

and this:

  <a href="foo">foo</a>

to be the same or different?   What about:
  <a href="foo">foo</a>    and    <a href="bar">foo</a>
?

To me I would normalize in such a way that the first two would be
considered "the same" but the second two would be considered
"different".

I'm not sure how I would implement that, tho.  But those would be my
requirements.  Similarly, I would want these two to result in a "match":

  <p>This is a paragraph</p>

and

  <p>This
  is
  a paragraph</p>
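
One possible way to meet those requirements, as a minimal sketch only: it assumes Python's standard html.parser module, and the class and function names are invented for illustration.  The parser already lowercases tag and attribute names, so the sketch only has to sort attributes and collapse runs of whitespace in text.  Attribute values are left untouched, so href="foo" and href="bar" still differ.

```python
from html.parser import HTMLParser


class _Normalizer(HTMLParser):
    """Collect a token list that ignores tag case and whitespace layout."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names already lowercased;
        # sorting by name makes attribute order irrelevant as well.
        ordered = tuple(sorted(attrs, key=lambda pair: pair[0]))
        self.tokens.append(("start", tag, ordered))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Collapse any run of whitespace (including newlines) to one space
        # and drop text that is whitespace only.
        text = " ".join(data.split())
        if text:
            self.tokens.append(("text", text))


def normalize(html):
    """Return the normalized token list for an HTML fragment."""
    parser = _Normalizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


if __name__ == "__main__":
    print(normalize('<A HREF="foo">foo</A>') == normalize('<a href="foo">foo</a>'))
    print(normalize('<a href="foo">foo</a>') == normalize('<a href="bar">foo</a>'))
```

Two pages would then "match" when their token lists are equal.  This is deliberately crude: it also drops leading and trailing spaces next to inline tags, which can matter for the displayed page, so a real test would need more care there.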

-derek
-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available



