[GNC-dev] Import PDF to GnuCash

Tommy Trussell tommy.trussell at gmail.com
Tue Jul 31 14:52:19 EDT 2018


On Sat, Jul 28, 2018 at 1:08 AM jeffrey black <beastmaster126 at hotmail.com>
wrote:

> As near as I understand it, Quickbooks imports a specially formatted pdf
> file of a statement for reconciliation.  I am sure there is a large
> amount of money flowing between Quickbooks and Adobe for this right.
> Adobe has gone to great lengths to make their files viewable and printer
> printable only, unless you pay fees for features that used to be built
> in, like export to M$document files (which I used to use extensively for
> university extension publications).
>

At some point Adobe declared PDF to be an "open" format, so in many cases
you can peek inside and do something interesting with the files. PDFs are
"container" files and can contain more than one representation of a
document at a time.

PDFs generated by desktop applications and the like generally contain
an abbreviated version of the PostScript page description
language. A PDF generated by a scanner application normally contains a
compressed TIFF image because that's directly compatible with fax software.
(And even if it isn't, ImageMagick can generally convert to whatever image
format you need.)

Some widely available applications, such as LibreOffice, can generate a PDF
with multiple items in the container at once. LibreOffice calls theirs a
"Hybrid PDF," and those PDF files contain the PostScript image AND the
document's editable source.

All this to say... If you acquired one of the "specially formatted" PDF
documents intended for Quickbooks, I wonder what other document type they
might have embedded into the file. For Quickbooks' purposes it would
likely be a Quickbooks or OFX file, because parsing the PostScript or
another image file format might be too unreliable. Of course, they might do
something uncharitable like encrypt it or even compress it in an unusual
fashion to make reverse-engineering it a hurdle.

Most Linux distributions include several useful PDF parsing and
manipulation utilities, so conceivably extracting useful data might be
relatively straightforward with a bit of command-line tinkering.
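
As a rough first pass at that tinkering, here is a minimal Python sketch that
looks through the raw bytes of a PDF for the name tokens the format uses for
attachments and encryption, and lists the stream compression filters it
sees. The marker list and the function names (scan_pdf_bytes,
stream_filters) are my own choices for illustration, not part of any
existing tool, and I have not tried this on an actual Quickbooks statement.
It is only a heuristic: if the file keeps its objects in compressed object
streams, the markers will not be visible to a plain byte scan.

```python
import re
import sys

# PDF name tokens worth looking for (an illustrative, hand-picked list).
MARKERS = {
    b"/EmbeddedFiles": "name tree listing attached files",
    b"/EmbeddedFile": "stream holding an attached file",
    b"/Filespec": "file specification dictionary",
    b"/Encrypt": "encryption dictionary",
    b"/ObjStm": "compressed object streams (may hide other markers)",
}


def scan_pdf_bytes(data):
    """Count how often each marker appears as a complete PDF name."""
    counts = {}
    for marker in MARKERS:
        # The lookahead keeps /EmbeddedFile from also matching /EmbeddedFiles.
        pattern = re.escape(marker) + rb"(?![A-Za-z0-9])"
        counts[marker.decode("ascii")] = len(re.findall(pattern, data))
    return counts


def stream_filters(data):
    """Return the sorted, de-duplicated first filter name of each /Filter entry."""
    names = re.findall(rb"/Filter\s*\[?\s*/(\w+)", data)
    return sorted({name.decode("ascii") for name in names})


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            pdf_bytes = handle.read()
        for name, count in scan_pdf_bytes(pdf_bytes).items():
            note = MARKERS[name.encode("ascii")]
            print(f"{name:16} {count:4}  {note}")
        print("filters:", ", ".join(stream_filters(pdf_bytes)) or "(none seen)")
```

If the scan does turn up /EmbeddedFiles or /Filespec entries, a dedicated
tool is the better next step. For example, the pdfdetach utility from
poppler-utils can list attachments with "pdfdetach -list statement.pdf"
(statement.pdf being a placeholder file name).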

