Successful data recovery (was Re: File signatures??)
matt at considine.net
Fri Jun 30 17:01:31 EDT 2017
Max, all,
Thank you for the pointers and help. I'm pleased to say that I seem to
have recovered my data. Still to be found are the customizations I made
to the standard report, but if what I have recovered stands up to some
checks against known bank balances, etc., then I won't be too far from
where I was a month ago.
What I have been trying to sift through (to recap a bit) is the result
of a recovery done with testdisk/photorec, which left a blizzard of
files and file fragments on a multi-terabyte hard drive. By and large
the filenames were lost (though not in all cases), and photorec uses a
list of known file signatures to try to append the appropriate file
extension. This largely works, but not always. Finally, if I had known
of a definitive file signature *before* I started the recovery, that
might have helped, but for text-oriented files (vs. JPEGs, PDFs,
executables, etc.) such a signature isn't always reliable or available.
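(As an aside for anyone in a similar spot: a quick way to peek at a
candidate file's leading bytes is a small wrapper like the one below.
This is just my own sketch, not anything photorec provides; a gzip
stream begins with the bytes 1f 8b, and Max notes below that an
uncompressed GnuCash file shows "<gnc-v2" near the start.)

# Sketch: dump the first 48 bytes of each argument as hex + ASCII,
# enough to spot the gzip magic (1f 8b) or a "<gnc-v2" marker by eye.
# Assumes xxd is installed (it ships with vim on most distributions).
checksig() {
    for f in "$@"; do
        echo "== $f"
        head -c 48 "$f" | xxd
    done
}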
Fortunately, photorec seems to recognize XML and xml.gz formatted files.
Diving head first into a pool I hadn't been in before, I came up with
bash scripts (this is a Linux machine I'm working on) to do recursive
searches. Basically, I would open a terminal window, run
gedit ~/.bashrc
and add the following to the end:
function odsgrep(){
    term="$1"
    echo "Start search : $term"
    OIFS="$IFS"
    IFS=$'\n'    # split find's output on newlines only, so spaces in names survive
    for file in $(find . -name "*.ods"); do
        echo "$file"
        # an .ods is a zip archive: pull out content.xml, tidy it, grep for the term
        unzip -p "$file" content.xml | tidy -q -xml 2> /dev/null | grep -i -F "$term" > /dev/null
        if [ $? -eq 0 ]; then
            echo "FOUND FILE $file"
            #echo "$file"
        fi
    done
    IFS="$OIFS"
    echo "Finished search : $term"
}
function mattpdfgrep(){
    term="$1"
    echo "Start search : $term"
    OIFS="$IFS"
    IFS=$'\n'
    for file in $(find . -name "*.pdf"); do
        #echo "$file"
        # convert the PDF to text (keeping HTML metadata) and grep the result
        pdftotext -htmlmeta "$file" - | grep --with-filename --label="$file" --color -i -F "$term"
        if [ $? -eq 0 ]; then
            echo "$file"
            pdfinfo "$file"
        fi
    done
    IFS="$OIFS"
    echo "Finished search : $term"
}
function mattxlsgrep(){
    term="$1"
    echo "Start search : $term"
    OIFS="$IFS"
    IFS=$'\n'
    # .xlsx files: convert to CSV with xlsx2csv, then grep
    for file in $(find . -name "*.xlsx"); do
        #echo "$file"
        xlsx2csv "$file" | grep --with-filename --label="$file" --color -i -F "$term"
        if [ $? -eq 0 ]; then
            echo "$file"
        fi
    done
    # legacy .xls files: convert to CSV with xls2csv, then grep
    for file in $(find . -name "*.xls"); do
        #echo "$file"
        xls2csv "$file" | grep --with-filename --label="$file" --color -i -F "$term"
        if [ $? -eq 0 ]; then
            echo "$file"
        fi
    done
    IFS="$OIFS"
    echo "Finished search : $term"
}
function mattxmlgzgrep(){
    term="$1"
    echo "Start search : $term"
    OIFS="$IFS"
    IFS=$'\n'
    for file in $(find . -name "*.xml.gz"); do
        #echo "$file"
        # decompress to stdout, tidy the XML, and grep for the term
        gunzip -c "$file" | tidy -q -xml 2> /dev/null | grep -i -F "$term" > /dev/null
        if [ $? -eq 0 ]; then
            echo "FOUND FILE $file"
            #echo "$file"
        fi
    done
    IFS="$OIFS"
    echo "Finished search : $term"
}
function matttxtgrep(){
    term="$1"
    echo "Start search : $term"
    OIFS="$IFS"
    IFS=$'\n'
    for file in $(find . -name "*.txt"); do
        #echo "$file"
        grep -i -F "$term" "$file" > /dev/null
        if [ $? -eq 0 ]; then
            echo "FOUND FILE $file"
            #echo "$file"
        fi
    done
    IFS="$OIFS"
    echo "Finished search : $term"
}
These custom commands (built from a 'net search that turned up a variant
of the first one) allow for recursive file searches as well as the
subsequent unzipping and string-search operations. Importantly, they
attempt to look inside spreadsheets and PDFs, which aren't otherwise
"grep-able".
To find the data, I used the mattxmlgzgrep routine to search *backwards*
in time, starting with
<ts:date>2017-06
It found no files, which was expected, since I had last worked on this
account in March or April, around US tax season. The next search, for
<ts:date>2017-05
also turned up nothing. But searching for <ts:date>2017-04 turned up one
hit, and <ts:date>2017-03 turned up a large number. So even though the
files' timestamps were set to the date of the recovery, searching
backwards for dated entries let me narrow things down.
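In terms of actual commands, the sequence looked roughly like this:

mattxmlgzgrep '<ts:date>2017-06'   # nothing -- expected, no work done in June
mattxmlgzgrep '<ts:date>2017-05'   # nothing
mattxmlgzgrep '<ts:date>2017-04'   # one hit
mattxmlgzgrep '<ts:date>2017-03'   # a large number of hits (tax-season activity)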
Examining the file in GnuCash (it seemed to have been pulled in cleanly)
showed all the categories, accounts, data, etc. that I expected to see.
It would be great to find the files related to the standard report
customizations, and I'll spend a little time trying to do that. I'm not
sure yet what would make a suitable "marker", but I think I have a
candidate or two. After that I need to find the other records that made
up some of this workflow. Fortunately, they were all digital to begin
with, and I believe I still have access to them online.
Thanks again for everyone's help. If there's anything I can share in
return, let me know.
Matt
On 2017-06-30 14:06, max at hyre.net wrote:
> Dear Matt:
>
>> The problem is that the recovery operation (using
>> Testdisk/Photorec) results in files and file fragments
>> that may or may not be correctly identified by file
>> extensions.
>
> It sounds like what you want is a magic number (file-format ID:
> https://en.wikipedia.org/wiki/File_format#Magic_number) for .gnucash
> files. Looking at my file it appears that ``<gnc-v2'' starting at the
> 41st character in the file would do it. (I presume the `2' in
> ``-v2'' is a version number, and could change at some future date, but
> for now that's not a problem.)
>
> It would be nice if the recovery program lets you add to the
> file-ID list, otherwise you're back to grep. I hope that it
> recognizes gzipped files (possible GNUCash files, compressed), but if
> not you want to look for the first two characters = 0x1f 0x8b. Of
> course, then you'll have to unzip them to see whether they're really
> what you want. :-/
>
> Gurus: Is this right? For future-proofing, can we assume the
> magic number will always be in position 41? Is there an actual,
> designated, magic number for GNUCash files somewhere?
>
> Hope this makes sense/helps...
>
>
> Best wishes,
>
> Max Hyre