Search engine

Neil Williams linux at
Wed May 5 15:16:26 EDT 2004

On Wednesday 05 May 2004 7:05, Derek Atkins wrote:
> Neil Williams <linux at> writes:
> > Same applies for the search engine - the faster the script can access the
> > HTML archive, the faster the search returns the answer.
> Ahh, so it's doing a real-time grep, effectively?  

Yes. PHP uses readdir() and then fopen() in a loop.
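
A minimal sketch of that loop, written in Python rather than PHP purely for illustration. The function name, the ".html" filter and the case-insensitive match are assumptions, not the actual script; os.scandir() stands in for readdir() and open() for fopen():

```python
import os

def grep_archive(archive_dir, term):
    """Return sorted names of .html files in archive_dir that contain term."""
    needle = term.lower()
    hits = []
    # Stand-in for PHP readdir(): visit the directory entries one by one.
    with os.scandir(archive_dir) as entries:
        for entry in entries:
            if not entry.is_file() or not entry.name.endswith(".html"):
                continue
            # Stand-in for PHP fopen(): every search re-reads every file.
            with open(entry.path, encoding="utf-8", errors="replace") as fh:
                if needle in fh.read().lower():
                    hits.append(entry.name)
    return sorted(hits)
```

Because each query reads every file, the search time is tied directly to how fast the script can reach the HTML archive.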

> Ok, that makes sense. 
> That also explains why you don't think you need an index.  Honestly,
> I don't know much about search engines, but I was under the impression
> that you could make multiple indices that speed up searches.

I think that's more for database backend searches or where the search terms 
are strictly defined: price between X and Y or catalogue ID etc. Creating and 
maintaining indices outside a database is not fun!
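
For comparison, the kind of index being discussed could look roughly like the sketch below: a map from each word to the messages that contain it. This is again Python for illustration only, and build_index() and lookup() are hypothetical names, not part of any existing script:

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def lookup(index, term):
    """Return the sorted document names recorded for a single word."""
    return sorted(index.get(term.lower(), set()))
```

A lookup no longer touches the files, but the index has to be rebuilt or updated every time a message is added to the archive, which is the maintenance burden referred to above.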

> But.. eh. 
> That explains the disconnect.  We're on the same page, now.

Great - I was getting a tad confused.

> Ok, how about this:  CVS access to a "search engine" module which gets
> pulled out into the web server?  This way:
> 1) you can make changes to the server in near-real-time
> 2) we get a change history of the script
> 3) it still keeps the server protected
> Would this be "good enough" for you?

Novel idea! It'll work with one proviso - could you make a tarball of:
please? I need some real files to test the adapted script. If you could put 
the tarball somewhere (any site will do) and let me know the URL, it'll be 
easier than sending by email - I'm aware these are not small archives.

> >> Let me know what other information you need.
> >
> > absolute path names,
> the mail archives are /var/mailman/archives/public/<listname>/...

Great, thanks. I'll reproduce the same structure locally.

> What other paths do you need?
> > ability to use phpinfo() as and when required (without
> > leaving it there for the entire world to see),
> I have no idea what this implies or how to grant you this
> access.

That's OK. It's a simple file that calls a built-in PHP function. It simply 
outputs the entire list of PHP and server environment variables, PHP config 
and status messages. That's why it isn't good to have around all the time!

One way round this is simple - the info doesn't really change, so you could 
create this single-line file:
<?php phpinfo(); ?>
call it phpinfo.php, put it somewhere in the webspace for the lists site (so 
that it picks up the correct environment for my script), send me the HTML 
output and then delete the file.
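
If it helps, the create / capture / delete steps could be automated along these lines. This is only a sketch in Python; capture_phpinfo(), its parameters and the pluggable fetch argument are all hypothetical, and the URL has to be whatever the lists site actually serves that directory as:

```python
import os
import urllib.request

def capture_phpinfo(webroot, url, out_path, fetch=None):
    """Create phpinfo.php, save its rendered HTML to out_path, then delete it."""
    if fetch is None:
        # Default: request the page from the running web server over HTTP.
        fetch = lambda u: urllib.request.urlopen(u).read()
    script = os.path.join(webroot, "phpinfo.php")
    with open(script, "w", encoding="ascii") as fh:
        fh.write("<?php phpinfo(); ?>\n")
    try:
        html = fetch(url)
        with open(out_path, "wb") as out:
            out.write(html)
    finally:
        # Always remove the file so the output is not left for the world to see.
        os.remove(script)
    return out_path
```

The try/finally means the file is removed even if the request fails.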

> > and access.
> Would CVS be sufficient?  Assume that your commits are auto-pushed
> into some area.  This way you can still upload changes to your script
> to fix bugs.


> If you send me an ssh key I can set up CVS access in a matter of
> minutes, and then I can set up a job to pull out the search script
> whenever there are changes made to CVS, allowing easy updates.

You should have just received that.

> One reason I'd like to audit the scripts and limit access is that I'm
> a bit paranoid about system security.  PHP has never been a strong
> winner in that area.  So I'm obviously a bit concerned.

I can use Perl if you prefer. That script is the one running searches and was 
used on dclug - it needs the time-aware code added and it was easier to do 
that with a new one in PHP for the dclug 

It's too late this side of the pond to finish the script now (the problems of 
international development, eh?) so let me know the CVS commands / URLs and 
I'll start work on it on Friday (more paid work tomorrow).

Let me know if you want me to do this in Perl asap. Ta.

> > So the contribute script is to be updated . . . . ?
> Well, it needs to be updated..  Whether it WILL be is something
> different..  Access to is rather limited.  :(



Neil Williams

More information about the gnucash-devel mailing list