[GNC-dev] Import PDF to GnuCash

John Ralls jralls at ceridwen.us
Tue Jul 31 19:16:16 EDT 2018



> On Jul 31, 2018, at 11:52 AM, Tommy Trussell <tommy.trussell at gmail.com> wrote:
> 
> On Sat, Jul 28, 2018 at 1:08 AM jeffrey black <beastmaster126 at hotmail.com>
> wrote:
> 
>> As near as I understand it, Quickbooks imports a specially formatted pdf
>> file of a statement for reconciliation.  I am sure there is a large
>> amount of money flowing between Quickbooks and Adobe for this right.
>> Adobe has gone to great lengths to make their files viewable and printer
>> printable only, unless you pay fees for features that used to be built
>> in, like export to M$document files (which I used to use extensively for
>> university extension publications).
>> 
> 
> At some point Adobe declared PDF to be an "open" format, so in many cases
> you can peek inside and do something interesting with the files. PDFs are
> "container" files and can contain more than one representation of a
> document at a time.
> 
> The kind of PDFs that get generated by desktop applications and such
> generally contain an abbreviated version of the PostScript page description
> language. A PDF generated by a scanner application normally contains a
> compressed TIFF image because that's directly compatible with fax software.
> (And even if it isn't, ImageMagick can generally convert to whatever image
> format you need.)
> 
> Some widely available applications, such as LibreOffice, can generate a PDF
> with multiple items in the container at once. LibreOffice calls theirs a
> "Hybrid PDF," and those PDF files contain the PostScript image AND the
> document's editable source.
> 
> All this to say... If you acquired one of the "specially formatted" PDF
> documents intended for Quickbooks, I wonder what other document type they
> might have embedded into the file? For Quickbooks' purposes it would
> likely be a Quickbooks or OFX file because parsing the PostScript or
> another image file format might be too unreliable. Of course they might do
> something uncharitable like encrypt it or even compress it in an unusual
> fashion to make reverse-engineering it a hurdle.
> 
> Most Linux distributions include several useful PDF parsing and
> manipulation utilities, so conceivably extracting useful data might be
> relatively straightforward with a bit of command-line tinkering.
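
As a rough illustration of the "container" idea discussed above, here is a minimal Python sketch that only checks whether a PDF appears to carry attachments at all. It scans the raw bytes for the name tokens the PDF format uses for attachments (/EmbeddedFiles, /Filespec, /EmbeddedFile). The function name count_attachment_markers is made up for this example. It is not a real parser: newer PDFs can keep objects inside compressed object streams, so a count of zero does not prove nothing is embedded, and it does not extract or decode anything.

```python
import re
import sys

# PDF name tokens that usually accompany an attached (embedded) file.
# /EmbeddedFiles is the name-tree key under the catalog's /Names entry;
# /Filespec and /EmbeddedFile are the object types for the attachment.
MARKERS = ("EmbeddedFiles", "Filespec", "EmbeddedFile")


def count_attachment_markers(pdf_bytes):
    """Return {marker: occurrences} for a raw, unparsed PDF byte string.

    Hypothetical helper for illustration only; it does a plain byte scan
    and will miss markers hidden inside compressed object streams.
    """
    counts = {}
    for name in MARKERS:
        # The lookahead keeps /EmbeddedFile from also matching /EmbeddedFiles.
        pattern = re.compile(b"/" + name.encode("ascii") + rb"(?![A-Za-z0-9])")
        counts[name] = len(pattern.findall(pdf_bytes))
    return counts


if __name__ == "__main__":
    # Usage: python scan_pdf.py statement.pdf
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as fh:
            data = fh.read()
        for name, n in count_attachment_markers(data).items():
            print(f"/{name}: {n}")
```

A non-zero count only suggests that a dedicated PDF tool is worth trying on the file; it says nothing about what the attachment contains or whether it is encrypted.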

Careful. Intuit very likely has a "no reverse engineering" clause in their EULA and prying into their "special" PDF format in order to enable a competing product, even (or maybe especially) a FLOSS one, is likely to get one some attention from their lawyers.

Regards,
John Ralls



