Search engine

Derek Atkins warlord at MIT.EDU
Wed May 5 14:05:01 EDT 2004


Neil Williams <linux at codehelp.co.uk> writes:

> So I'd like to have the search engine on machine 2 under lists.gnucash.org 
> using local filesystem access - read, no write access required for the script 
> except to upload new versions.

Yep, that's what I think, too.  Both the doxygen docs and search engine
should be on cvs/lists.  I'm planning to work on the doxygen scripts
relatively soon.

> Same applies for the search engine - the faster the script can access the HTML 
> archive, the faster the search returns the answer.

Ahh, so it's doing a real-time grep, effectively?  Ok, that makes sense.
That also explains why you don't think you need an index.  Honestly,
I don't know much about search engines, but I was under the impression
that you could make multiple indices that speed up searches.  But.. eh.
That explains the disconnect.  We're on the same page, now.
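
To make the difference concrete, here is a minimal sketch in Python (not
the PHP script under discussion) contrasting a per-query scan with a
prebuilt word index. The function names, the "*.html" file pattern and
the crude tokenizer are all assumptions for illustration. Markup is not
stripped, and the scan matches substrings while the index matches whole
words.

```python
import re
from collections import defaultdict
from pathlib import Path

# Crude tokenizer (an assumption): runs of lowercase letters and digits.
WORD = re.compile(r"[a-z0-9]+")


def scan_search(root, term):
    """Real-time 'grep': read every archived HTML file on each query."""
    term = term.lower()
    hits = []
    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if term in text.lower():  # substring match, like a case-insensitive grep
            hits.append(path)
    return hits


def build_index(root):
    """Read every file once and map each word to the files containing it."""
    index = defaultdict(set)
    for path in Path(root).rglob("*.html"):
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        for word in WORD.findall(text):
            index[word].add(path)
    return index


def index_search(index, term):
    """Whole-word lookup in the prebuilt index; no file is read here."""
    return sorted(index.get(term.lower(), set()))
```

The scan costs one pass over the whole archive per query, so disk speed
dominates. The index moves that cost to build time, but it has to be
rebuilt or updated as new mail arrives.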

> Yuk! I may be good at PHP but I know when I need testing and speedy updates to 
> my code!! Sorry, I've done this before and it simply did NOT work. I really 
> do not want to get involved in that morass again.

Ok, how about this:  CVS access to a "search engine" module which gets
pulled out into the web server?  This way:

1) you can make changes to the server in near-real-time
2) we get a change history of the script
3) it still keeps the server protected

Would this be "good enough" for you?

>> Let me know what other information you need.  
>
> absolute path names,

the mail archives are /var/mailman/archives/public/<listname>/...

What other paths do you need?

> ability to use phpinfo() as and when required (without 
> leaving it there for the entire world to see), 

I have no idea what this implies or how to grant you this
access.

> and access.

Would CVS be sufficient?  Assume that your commits are auto-pushed
into some area.  This way you can still upload changes to your script
to fix bugs.

>  Sorry, I really 
> cannot work without FTP or preferably SSH. I'm NOT going to sit and download 
> the archive page by page to create a test site and I don't work without being 
> able to do my own updates. FTP allows me to download an accurate copy quickly 
> and upload script updates simply, SSH allows me to compress the copy into a 
> tarball, download that and delete the temporary tarball. Either way, I really 
> have tried to do this without access and it is a process WORSE than learning 
> QOF. We really, really, truthfully do NOT want to go there! This is a 
> simple job that can be done today, IF some sort of FTP or SSH user access is 
> available. Just an ordinary user, but if this isn't available then sorry, I 
> really can't help with the search engine.
>
>> No offence, but right now not 
>> even the developers have shell access to the server.
>
> Pity. I've got a script that is just waiting to be adapted but I really cannot 
> proceed without some access. With the timezone differences as well, I will 
> not be able to fix bugs unless I can update in real time from my own box. 
> (Yes, there may well be bugs, nobody writes perfect code first time!)

If you send me an ssh key I can set up CVS access in a matter of
minutes, and then I can set up a job to pull out the search script
whenever there are changes made to CVS, allowing easy updates.
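
As a rough sketch of what such a pull job could look like, assuming it
is a small Python wrapper around "cvs export" (the repository path,
module name and target directory below are hypothetical placeholders,
and how the job gets triggered is left open):

```python
import subprocess


def export_command(cvsroot, module, target_name, tag="HEAD"):
    """Build the argument list for a `cvs export` of one module.

    `cvs export` needs a revision (-r) or date (-D); the -d after
    `export` names the output directory, relative to the working
    directory the command runs in.
    """
    return ["cvs", "-q", "-d", cvsroot,
            "export", "-r", tag, "-d", target_name, module]


def export_module(cvsroot, module, parent_dir, target_name, tag="HEAD"):
    """Run the export inside parent_dir; raises CalledProcessError on failure."""
    cmd = export_command(cvsroot, module, target_name, tag)
    subprocess.run(cmd, cwd=parent_dir, check=True)


if __name__ == "__main__":
    # Hypothetical values, shown only to illustrate the shape of the command.
    print(" ".join(export_command("/path/to/cvsroot", "search-engine", "search")))
```

An export, unlike a checkout, leaves no CVS administrative directories
in the web tree, and the web server itself needs no write access to the
repository.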

One reason I'd like to audit the scripts and limit access is that I'm
a bit paranoid about system security.  PHP has never been a strong
winner in that area.  So I'm obviously a bit concerned.

>> > With regard to:
>> > http://www.gnucash.org/en/contribute.phtml
>> >  We even need someone to make sure that the mail archives are running
>> > correctly, and that recent mail is getting indexed & is searchable.
>> > (webmaster selected)
>> >
>> > Is that bit about checking the operation of the mail archives still a
>> > problem? From only a casual use of the archive, recent messages seem to
>> > be added very quickly. The 'searchable' I can solve for you.
>>
>> No, it is not a problem any more.  The lists are running fine, and the
>> archives are running fine.
>
> So the contribute script is to be updated . . . . ?

Well, it needs to be updated..  Whether it WILL be is something
different..  Access to www.gnucash.org is rather limited.  :(

-derek
-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available


More information about the gnucash-devel mailing list