XML Parse error on Local Build

Sat Dec 10 08:33:01 EST 2016

On Friday 9 December 2016 19:00:51 CET, Chris Good wrote:
> > Message: 2
> > Date: Wed, 07 Dec 2016 20:36:30 +0100
> > From: Geert Janssens <geert.gnucash at kobaltwit.be>
> > To: gnucash-devel at gnucash.org, "David T." <sunfish62 at yahoo.com>
> > Subject: Re: XML Parse error on Local Build
> > Message-ID: <1942799.LyrSZbBnvp at legolas.kobaltwit.lan>
> > Content-Type: text/plain; charset="us-ascii"
> > 
> > On Tuesday 6 December 2016 17:05:37 CET, David T. via gnucash-devel wrote:
> > > Hi,
> > > 
> > > I am trying to finish a first pass at a new Glossary for the Guide. My
> > > changes are focused in a new file (gnc-glossary.xml) and in
> > > gnucash-guide.xml. In the latter, I have added two lines to
> > > incorporate the new file into the guide.
> > > 
> > > I have reached the point of running the commands:
> > > 
> > > xmllint --valid --noout gnucash-guide.xml
> > > xsltproc -o ../../../output_html/guide/ ../../xsl/general-customization.xsl gnucash-guide.xml
> > > 
> > > These are prescribed by the Wiki page.
> > 
> > A side note, more targeted at Chris who's done an excellent job so far to
> > keep the Documentation Update wiki page up to date:
> > I haven't fully read that page in a long time, so this never struck me
> > before:
> > 
> > I wonder why you explicitly explain how to use xmllint and xsltproc?
> > Both commands are incorporated in the makefile system for the
> > documentation.
> > 
> > So my typical flow (after a git clone/checkout/pull whatever) is
> > - in the gnucash-docs directory run "./autogen.sh"
> > - then make a build directory to keep the build documentation out of the
> > source tree for cleanliness. Let's call that directory "build" for this
> > example.
> > Typically this can be done using "mkdir build"
> > - Then from within the build directory call configure. Assuming build is a
> > direct subdirectory of gnucash-docs, do
> > 
> >   cd build
> >   ../configure
> > 
> > - This will recreate the gnucash-docs directory structure under build, but
> > not the files. Instead you will find a Makefile in most directories. From
> > now on you can choose at which level you wish to run commands: either for
> > all of the documentation, only for the guide, or only for one specific
> > language of the guide. Just cd into the proper directory. For the options
> > above these would be respectively: build, build/guide and build/guide/<lang>.
> > - The command to run xmllint is
> >   make check
> > - The command to generate the html documentation is
> >   make html
> > 
> > Did you have a particular reason not to use these? The documentation will
> > let you use these anyway in a later step to "test on linux", so I figure you
> > might as well just stick with them for the earlier steps as well. That way
> > there's only one set of commands to remember.
> > 
> > Regards,
> > 
> > Geert
> 
> Hi Geert,
> 
> I've always used the xmllint and xsltproc commands for several reasons.
> 1) That's what was in the wiki Documentation_Update_Instructions and until
> John Ralls mentioned recently that you could use the make commands instead I
> had no idea this was possible.
> 2) I didn't understand enough about what the autogen.sh, configure & make
> commands actually did. I did have a look but decided not to spend too much
> time on that as I had a workable system (without using a separate build
> system). Thanks for your explanations. I learn all the time.
> The instructions for creating a system where you could build gnucash-docs
> are spread all over the place, refer mostly to building gnucash itself, and
> use different directory names (install, build, release, distrelease etc). I
> understand that we are free to name some directories whatever we like, but
> the lack of consistent guidance makes it hard to know exactly what the
> documentation is referring to.
> 
> BTW, what does autogen.sh do? Does it examine the system to determine if &
> where required software is installed and save that information for later use
> of configure?
> 
I agree the complete details of autogen/configure/make are fairly complex. I 
don't even understand all of it myself. The complexity stems from the design 
idea to create a build system that's generic enough to build all kinds of 
objects (applications, libraries, documentation, websites, whatever) on as 
many different platforms as possible. Luckily we don't need all of this 
complexity for our purposes.

Let me try to explain this top to bottom. The command we're ultimately 
interested in is 'make'. This command follows recipes to convert some source 
files into some end object(s). For gnucash this is a bunch of C and Guile 
sources which are transformed into the gnucash application (which itself is 
composed of an executable and lots of libraries). There are lots of details 
I'm glossing over here, focusing mostly on the principles.

For our docs, plain make (that is, make without an explicit target) does 
nothing, because the xml files are both the source and the target that yelp 
needs to display the docs.

However aside from the main target one can also define alternative targets. A 
few spring to mind for the docs: html, pdf, mobi, epub, install. So one could 
run
make pdf
and make will look for a recipe to build one or more pdf files, depending on 
which directory you execute the command in.

'install' is a special recipe that is intended to put whatever make built as 
primary target (the xml files in case of the docs, which are the same as the 
source xml files) in a special location where a typical unix system expects 
these files. For example, executables typically should be installed in
/usr/bin on linux. This location can be overridden. I'll get back to this 
later.

Where does make find all these recipes?
make looks for a file called Makefile in the current directory. Makefile holds 
the recipes and other configuration items. Unfortunately this file is very 
complex in itself because - as I said earlier - the build system is designed 
to work in very different environments. 
For example, the tools needed to create targets from the 
source files (like xsltproc) may be installed in different locations on 
different platforms, or even have a different name (like xsltproc.exe in a 
Windows environment). Or some commandline options for certain tools may not 
be available on all platforms or versions of the tool. Manually coping with 
all these variations will quickly result in a mess.
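To make this a bit more concrete, below is a much simplified sketch of the kind of lookup configure does for a tool like xsltproc: try each candidate name on the PATH and use the first one found. The function name find_tool is made up for this illustration; the real configure script is generated by autoconf and handles far more cases than this.

```shell
#!/bin/sh
# find_tool NAME...: hypothetical helper illustrating a configure-style
# program lookup. Prints the full path of the first candidate name found
# on $PATH and returns 0, or returns 1 if none of them is installed.
find_tool() {
    for name in "$@"; do
        if found=$(command -v "$name" 2>/dev/null); then
            printf '%s\n' "$found"
            return 0
        fi
    done
    return 1
}
```

For example, XSLTPROC=$(find_tool xsltproc xsltproc.exe) || echo "xsltproc not found" >&2 would store the path of whichever variant is installed, or report that neither is.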

To cope with this, a configuration step was added to the build process. This 
step runs a very complicated script called "configure" that analyzes your 
system and actually creates the Makefile files I described above. configure 
looks up the installation location of all tools used and can enable or disable 
certain recipes in the Makefiles based on its findings. "configure" can also 
take extra commandline options that can alter what it will include in the 
Makefiles. You can see these options by running configure --help. Most of them 
are not relevant for us, except for the few we invented ourselves like
--with-mobi.

"configure" doesn't take Makefiles as input. Instead it reads files 
called Makefile.in, each of which is a pseudo Makefile with lots of variables 
that still need their final values.
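As a rough illustration of those variables: in a Makefile.in, values detected by configure typically appear as placeholders of the form @NAME@, and configure (through the config.status script it generates) writes the final Makefile with those placeholders replaced. The sed-based helper below only mimics that substitution for a single variable; its name and the XSLTPROC example are assumptions for illustration, not taken from the gnucash-docs sources.

```shell
#!/bin/sh
# substitute_var NAME VALUE: hypothetical helper mimicking how configure
# turns Makefile.in into Makefile by replacing @NAME@ placeholders.
# Reads Makefile.in-style text on stdin and writes the result to stdout.
substitute_var() {
    name=$1
    # Escape backslash, & and the | delimiter so sed inserts VALUE literally.
    value=$(printf '%s\n' "$2" | sed 's/[\\&|]/\\&/g')
    sed "s|@${name}@|${value}|g"
}
```

Usage would look like substitute_var XSLTPROC /usr/bin/xsltproc < Makefile.in > Makefile; the real configure substitutes many variables in one pass.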

Still, writing a configure script and Makefile.in files remains very 
cumbersome, and lots of the information in both recurs in different 
projects. So yet another step was added earlier in the build process. 
This step generates the configure script and the Makefile.in files 
from another set of files: configure.ac and the Makefile.am files.
These files encode the essence of configure and the final Makefiles, but with 
everything removed that can be detected by a smart configuration script. Only 
the details that matter and are unique for each project are retained in these 
files.
Generating configure and the various Makefile.in files involves a few steps, 
but they are always the same, regardless of the platform you run them on. For 
this reason they are combined in a small script called autogen.sh. 
This script itself is platform independent and is used to initialize the 
whole build system for a given project.

And that's the general idea behind the autotools based build system. Again I 
glossed over a lot of details.

As a rule of thumb, autogen.sh should be called the first time you want to 
initialize the build system and sometimes when changes are made in 
configure.ac or the Makefile.am files, although such changes are frequently 
autodetected. Calling autogen.sh when it's not necessary does no 
harm. The only side effect is that the next build may take 
longer.
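If you prefer checking to guessing, that rule of thumb can be approximated with file timestamps: autogen.sh is worth re-running when configure does not exist yet, or when configure.ac or any Makefile.am is newer than the generated configure script. The function below is a hypothetical helper sketching that heuristic; it is not part of gnucash-docs, and the generated Makefiles often detect such changes on their own anyway.

```shell
#!/bin/sh
# needs_autogen SRC_DIR: hypothetical helper (not part of gnucash-docs).
# Returns 0 ("run ./autogen.sh") when the generated configure script is
# missing or older than configure.ac or any Makefile.am; 1 otherwise.
needs_autogen() {
    src=$1
    # First run: nothing has been generated yet.
    [ -f "$src/configure" ] || return 0
    # configure.ac was edited after configure was generated.
    if [ "$src/configure.ac" -nt "$src/configure" ]; then
        return 0
    fi
    # Some Makefile.am in the tree was edited after configure was generated.
    if [ -n "$(find "$src" -name Makefile.am -newer "$src/configure")" ]; then
        return 0
    fi
    return 1
}
```

In the source directory, needs_autogen . && ./autogen.sh would then only regenerate when one of those inputs changed.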

The same goes for "configure". It should generally be called right after 
autogen.sh for the same reasons. Here too, calling it when not strictly 
required has no negative side effect, other than that the next build may 
take longer because more objects will be rebuilt.

Lastly, make is the command you'll want to call after each change in the 
sources you want reflected in the targets.

Now on to directories. In a build system there are three important 
directories:
- the source directory
- the build directory
- the installation directory (which can also be more than one directory all 
under a special directory called the prefix-directory)

The source directory should be clear - it's where your source files are. For 
the documentation, that is gnucash-docs (the clone of our github repository) 
with all its subdirectories on your system. If you want to make changes to the 
documentation, you do this in the source directory.

The build directory is a directory used by the build system to store all 
files/objects that are generated by the build system. *make is always executed 
in the build directory*. Each make recipe results in at least one such file or 
object.
Without any extra steps, the build directory will be the same as the source 
directory. So if you build the html files for the English guide for example, 
these files will be put next to the xml files for that guide in the source 
directory (although the recipe will put them in a subdirectory as a 
convenience).
If you don't want this (and it's generally recommended *not* to build in the 
source directory directly for various reasons), you can also choose to work 
with a separate build directory. The way to do this is a bit unusual but very 
simple once you understand it: configure will use the directory from which 
it's called as the build directory. An example will help here:
Assume you have cloned the gnucash-docs repository here:
/home/user/gnucash/gnucash-docs
and want to use
/home/user/gnucash/gnucash-docs/build
as build directory

Then first run autogen.sh in /home/user/gnucash/gnucash-docs as follows
cd /home/user/gnucash/gnucash-docs
./autogen.sh
This ensures the configure script is generated. Now create the directory you 
want to use as build directory
mkdir /home/user/gnucash/gnucash-docs/build
Note that "build" is purely arbitrary. It can be whatever suits you, and even 
wherever it suits you. Some people create it as a subdirectory to the source 
directory. I tend to have it in a completely different location to have all 
builds together under one directory. That is a matter of preference.
Now to "call configure from the build directory", you first have to be *in* 
the build directory:
cd /home/user/gnucash/gnucash-docs/build
And then invoke configure. Configure is still in the source directory, so in 
order to call it you need to use the full (absolute or relative) path to it:
/home/user/gnucash/gnucash-docs/configure
or
../configure
Both are equivalent. The former uses the absolute path, the latter uses a 
relative path.
With this, configure will use /home/user/gnucash/gnucash-docs/build as the 
build directory. The Makefile files and target objects will all be stored in 
there (and its subdirectories).

And finally there is the installation location, which is the location where 
the generated objects should finally end up to be used sensibly in the system. 
Without any special configuration this will be /usr/local/... on linux based 
systems. This is however not a good default location during 
development (or documentation changing), because it would interfere with 
your stable, running system, and that's generally not what you want on a modern 
system where software is usually installed via a package manager.
So for development purposes we need to tell the build system to use another 
final location. This is done when running configure, by setting the
--prefix option. You can find a reference to this in the wiki page as well in 
the section about testing on linux. Again whatever you choose as base 
directory for the --prefix option is really a matter of preference. The only 
conditions are:
- it should be a writable location for the user that runs make install
- it should not be in the default paths /usr or /usr/local
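These two conditions are easy to check before running configure. The helper below is a hypothetical sketch (check_prefix is not part of gnucash-docs); it also insists on an absolute path, because configure scripts generated by autoconf refuse a relative --prefix.

```shell
#!/bin/sh
# check_prefix DIR: hypothetical helper that sanity-checks a --prefix value.
# Returns 0 if DIR is absolute, outside /usr (which also covers /usr/local),
# and writable by the current user (or creatable under a writable parent).
check_prefix() {
    p=$1
    case $p in
        /*) ;;
        *) echo "prefix must be an absolute path: $p" >&2; return 1 ;;
    esac
    case $p in
        /usr|/usr/*) echo "prefix is a system location: $p" >&2; return 1 ;;
    esac
    # Walk up to the nearest existing directory and test that it is writable.
    d=$p
    while [ ! -d "$d" ]; do
        d=$(dirname "$d")
    done
    if [ ! -w "$d" ]; then
        echo "prefix is not writable: $d" >&2
        return 1
    fi
    return 0
}
```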
For the example started above, one could use, say,
/home/user/gnucash/gnucash-docs/install
Or the full configure command (using relative paths here):
../configure --prefix=/home/user/gnucash/gnucash-docs/install
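Putting the steps of this example together, the one-time setup could be scripted roughly as below. This is only a sketch: setup_build is a hypothetical name, the three directories are passed as arguments, and configure is called through its absolute path, which is equivalent to ../configure when build is a direct subdirectory of the source directory.

```shell
#!/bin/sh
# setup_build SRC BUILD PREFIX: hypothetical helper for a one-time,
# out-of-tree setup of gnucash-docs. The body runs in a subshell ( ... )
# so the cd commands do not change the caller's working directory.
setup_build() (
    src=$(cd "$1" && pwd) || exit 1     # absolute path to the source tree
    mkdir -p "$2" || exit 1
    build=$(cd "$2" && pwd) || exit 1   # absolute path to the build directory
    cd "$src" || exit 1
    ./autogen.sh || exit 1              # always run in the top-level source dir
    cd "$build" || exit 1               # configure uses the current dir as build dir
    "$src/configure" --prefix="$3"      # PREFIX must be an absolute path
)
```

For the example above this would be setup_build /home/user/gnucash/gnucash-docs /home/user/gnucash/gnucash-docs/build /home/user/gnucash/gnucash-docs/install. After that you keep editing in the source directory and run make (or make install) in the build directory.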

And this is as far as I can tell all there is to know. You may need to install 
a few packages at the beginning. xsltproc is one, the build system also 
requires autoconf, automake and libtool (which provides libtoolize). I believe most distributions 
provide generic packages to install all that is needed to have a suitable 
development environment. These should take care of most of the dependencies. 
I'm a bit vague here as it's been quite a while since I had to start from 
scratch like this. Sorry about that.

> 3) I wanted to use an IDE, netbeans being what I am most familiar with. I
> wanted to use the netbeans xml editor and integrated git/github support. I
> haven't tried to use netbeans to do any of the making. Maybe that is
> possible, but the GnuCash make system seems very easy to use.
> I found that I could:
>  a. Clone gnucash-docs to ~/github/gnucash-docs
>  b. Rename gnucash-docs to gnucash-docs-save (as a netbeans project must be
> created in an empty folder)
>  c. Create a netbeans project in ~/github/gnucash-docs
>  d. copy/move all files and directories (including hidden folders like
> .git) in ~/github/gnucash-docs-save to ~/github/gnucash-docs and I would
> have a working netbeans project with working git/github integration in the
> repository directory.
> 
I haven't used netbeans but I believe you should be able to configure it so it 
matches the use of autogen.sh and configure as described above. I am using a 
mixture of Eclipse and the command line for my development and I can configure 
Eclipse to match my command line configuration. So I can run builds from 
within Eclipse or straight from the command line with the same result. I 
usually run make from the command line because it allows more fine-grained 
control, and I have more experience now than when I started on the project. In 
the past I did most of it directly in Eclipse.

> 4) Using a separate 'build' directory seemed as though it would cause
> problems with my system and be more work. Should I edit the files in my
> repository or the build directory? I would need to be very careful which
> files I edit if I decided to only edit files in the build directories. Will
> I need to manually copy files between the build and repository directories?
> 
As described above, you always edit files in the source directory. If you want 
to update the targets, this is done by calling make in the appropriate build 
(sub-)directory.

One thing I haven't stressed above:
autogen.sh is *always* run in the top-level *source* directory
configure is *always* run in the top-level *build* directory (which can be the 
same as the source directory, but that's discouraged)
make can be called in *any* subdirectory of the build directory. If you only 
want to build the English guide, you can run make in <build-dir>/guide/C
If you want to build help in all languages, you can run make in
<build-dir>/help

Also, as a reminder: running "make" without a target will do nothing in the 
documentation, because the final targets for yelp are the exact same xml 
files we are working on in the sources. More useful make invocations here 
would typically be
make check (to run xmllint)
make html (to build the html version of the documentation, which is the 
xsltproc command)
make pdf (to build the pdf version of the documentation)
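Since make can be run in any build subdirectory, one handy pattern is to validate first and only build the html when validation passes. A minimal sketch, assuming GNU make (its -C option switches to a directory before reading the Makefile there) and the check and html targets listed above; check_then_html is a hypothetical name.

```shell
#!/bin/sh
# check_then_html BUILD_SUBDIR: hypothetical helper (not part of gnucash-docs).
# Runs "make check" (the xmllint validation) in the given build subdirectory,
# for example <build-dir>/guide/C, and runs "make html" only if it succeeds.
check_then_html() {
    dir=$1
    if ! make -C "$dir" check; then
        echo "validation failed in $dir; not building html" >&2
        return 1
    fi
    make -C "$dir" html
}
```

For the English guide in the earlier example: check_then_html /home/user/gnucash/gnucash-docs/build/guide/C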

> 5) From memory I think the documentation on installing the dependencies
> needed to use 'make' etc could use some work. That could discourage
> non-technical people from doing documentation, whereas the instructions for
> using xmllint and xsltproc are pretty easy to follow IMHO. Those commands
> are always in my bash history so I don't have to remember them.
Yes, the documentation on installing the basic commands needed for a make 
based build system for our docs needs improvement.

On the other hand you also had to install xmllint and xsltproc. I assume you 
also followed documentation to do so or had to search for it. I think if the 
documentation on installing the prerequisites is straightforward, using the 
make system should not be that much more complicated.

Note also that using the make system has the advantage that your commands are 
always up to date. For example, the xsltproc invocation by make uses more 
options than your manual command does. It may be that in the future more 
tweaks are added by one of the devs for various reasons. Then you'd have to be 
alert enough to pick this up and update your manual command as well (both in 
your bash history and in the wiki page).
> 
> With your information, I can see now that after building the 'build' system,
> I can just use netbeans to edit the repository files, run the appropriate
> make command in the appropriate build directory in a terminal, then use
> netbeans git/github integration when ready. I assume the netbeans project
> files will not break the autogen.sh, configure or make commands.
> I'll test and update the 'documentation update' documentation.
> 
Indeed. Netbeans should be able to use that make based build system without 
issues. It's a well established system and most IDEs support it.

Regards,

Geert