Testing reports

Tue Apr 10 10:23:04 EDT 2012

On Apr 9, 2012, at 10:53 PM, Colin Scott wrote:

> 
> Yes, I accept that one might argue about whether the output is the HTML or the report.  I would incline to the former, on the grounds that if the HTML doesn't change then neither does the report, and if the HTML does then the report might (or might not).  I guess it depends, as you suggest, on how you create the HTML - and in general I don't disagree with anything you say.  I am sure you similarly understand where I am coming from ...  :-)
> 
> If my approach seems simplistic, then it is deliberately so.  I found over many years that it is unhelpful to introduce needless complexities, and that it is always easier to start simple and add the complexities one finds to be necessary, rather than starting complex and then finding that some of the initial complexity is superfluous.  This approach generally gave me more robust results, and got them quicker ...
> 
> Colin
> 
> -------- Original Message --------
> 
> *Subject:* spam,Re: spam,Re: Testing reports
> *From:* Colin Law <clanlaw at googlemail.com>
> *To:* gnucash at double-bars.net, gnucash-user at gnucash.org
> *Date:* Tue, 10 Apr 2012 13:26:49 +0100
> 
> On 10 April 2012 04:32, Colin Scott <gnucash at double-bars.net> wrote:
>> 
>>> Can one not find an html parser that will read the html into a DOM
>>> tree?  Then one could walk the tree comparing the tags, attributes,
>>> and contents with a DOM from a reference page.  That way minor
>>> changes in the html that do not change the resulting displayed page
>>> will be ignored.
>> 
>> I expect you are right, but I've never worked with HTML at that sort of level, so please pardon my ignorance!
>> 
>> Besides, I was making two points.  The first was that one needs to weigh the cost of what one is doing against the benefits.  If it's easy, then maybe it's worth doing - but maybe not, read on!  :-)
>> 
>> The second was that *ANY* change to the HTML output needs to be flagged and checked - if you have something that works, it shouldn't be changed at all unless there is a very good reason for making the change.  If the output changes, and the changed HTML is determined to be correct, then it should become the reference against which future test output is compared.  IMHO one should *NEVER* automatically accept a change in output without it having been thoroughly scrutinised and checked by a human!
> 
> It is a bit of a philosophical point really, but one could argue that
> the output from the software is a report rather than a chunk of html.
> What matters is what the report looks like, and if the Document Object
> Model has not changed then it will look the same whatever the details
> of the underlying html.  Using the DOM allows the test to relate to
> the end result, with tests such as "there should be a paragraph of a
> particular class containing particular text", rather than "there
> should be a string of the form <p class =......".
> I don't know how the html is generated in the code, if it uses an html
> library (or if at some point in the future it used an html library)
> then another issue could be that a newer version of the library might,
> for example, change the order of attributes in a tag.  A DOM based
> test would not care about this but for a text based one it would be a
> disaster.

You guys are forgetting about CSS's effect on the way a particular DOM is rendered. That can make a huge difference in appearance. Play around a bit with http://www.csszengarden.com/ to see what I mean (or just for fun, it's an awesome demo!)

An easy way to test the HTML without worrying about tag case or extraneous whitespace is to use HTML Diff [1] against a canned comparison document. That will get us close enough for a first order test, I think. Anyone got time to code something up?
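
To make the idea concrete, here is a rough Python sketch of that kind of comparison. It is not the HTML Diff tool at [1] and not anything that exists in GnuCash; it uses only the standard library's html.parser, and the names normalize and same_report are made up for this example. Each document is reduced to a list of events with lower-cased tags, sorted attributes, and collapsed whitespace, and the two lists are then compared:

```python
from html.parser import HTMLParser


class _Normalizer(HTMLParser):
    """Collect a canonical list of events for one HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lower-cases tag and attribute names.
        # Sorting the attributes makes their order irrelevant.
        canonical = tuple(sorted((name, value or "") for name, value in attrs))
        self.events.append(("start", tag, canonical))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Collapse runs of whitespace and drop whitespace-only text.
        text = " ".join(data.split())
        if text:
            self.events.append(("text", text))


def normalize(html):
    """Hypothetical helper: return the canonical event list for html."""
    parser = _Normalizer()
    parser.feed(html)
    parser.close()
    return parser.events


def same_report(actual, reference):
    """Hypothetical helper: True if both documents normalize identically."""
    return normalize(actual) == normalize(reference)
```

A test would call same_report(generated_html, canned_html) and flag any mismatch for a human to look at. Note that this sketch treats <br> and <br/> as different, and it says nothing about CSS.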

The talk about DOM gave me another idea: Since the html generator seems to be outputting XHTML transitional (at least it *says* it is), we can write a strict schema and then validate the output report against the schema. If it validates, it passes.
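
Real schema validation needs a validating parser, which Python's standard library does not include. As a much weaker stand-in, the sketch below only checks that a report parses as well-formed XML and contains a few expected elements. The REQUIRED list and the name check_report are assumptions for illustration and are not taken from the GnuCash report code:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical list of element paths every report is assumed to contain.
REQUIRED = ("head", "head/title", "body")


def check_report(xhtml_text):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xhtml_text)
    except ET.ParseError as err:
        return ["not well-formed XML: %s" % err]

    problems = []
    if root.tag != XHTML_NS + "html":
        problems.append("root element is %r, expected XHTML html" % root.tag)
    for path in REQUIRED:
        # Qualify every step of the path with the XHTML namespace.
        qualified = "/".join(XHTML_NS + step for step in path.split("/"))
        if root.find(qualified) is None:
            problems.append("missing <%s>" % path)
    return problems
```

This is a structural check, not validation against a schema: it would not catch a misplaced element or a bad attribute, and it may reject reports that use named entities the XML parser does not know about.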

The HTML generation code lives in src/report/report-system, is written in Scheme, and appears from a cursory inspection to be self-contained (meaning that it doesn't use an external library, except for jqplot, which draws the graphs).

Regards,
John Ralls

[1] http://www.aaronsw.com/2002/diff/