
Haystack with Xapian or Solr or PyLucene

Hi,

Are these supported/available in PA? I want to use them. I wanted to check before I start development.

Not right now, but it's on the list. I've added an upvote.

Hmm... I checked today by importing the xapian package and running a simple program. It ran fine. Looks like I will just drop Haystack and live with Xapian alone.

That's odd! I don't think we installed it deliberately -- perhaps something else installed it as a dependency. I'll make a note to add a test to make sure it's there in future versions.
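Checking whether a package is installed before starting development can be scripted. The sketch below is a hypothetical helper (`available` is not part of any library) that uses the standard library's `importlib.util.find_spec` to report whether each top-level module can be found. The import names are assumptions: `xapian` for the Xapian bindings, `whoosh` for Whoosh, `haystack` for Django Haystack and `lucene` for PyLucene. Note that `find_spec` only confirms the module can be located; actually importing it and running a small program, as above, is a stronger check.

```python
import importlib.util


def available(module_names):
    """Map each top-level module name to True if Python can locate it here.

    Hypothetical helper: find_spec() locates the module without importing it,
    so a broken compiled extension could still be reported as available.
    """
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # Assumed import names for the search libraries discussed in this thread.
    for name, ok in available(["xapian", "whoosh", "haystack", "lucene"]).items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```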

Hey Giles,

One more question.

My current search implementation uses DB queries backed by a local memory cache. I want to replace this search mechanism with Xapian, because Xapian prebuilds the index (a B+ tree) and is disk-based (it maintains its index in files).

What I want to know is: is there a minimum guarantee of disk access speed on the PA platform? Or do you see any flaw in my plan?
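To make the "prebuilt, disk-based index" idea in this plan concrete, here is a toy inverted index built with the standard library's `shelve` module: the term-to-document mapping is computed once and written to files, and a query then reads only the postings for its own terms instead of scanning every row. This is a swapped-in illustration, not how Xapian stores its B-tree or ranks results, and `tokenize`, `build_index` and `search` are hypothetical names.

```python
import os
import re
import shelve
import tempfile


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(path, docs):
    """Prebuild term -> sorted doc ids and store it in shelve files at `path`."""
    postings = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings.setdefault(term, set()).add(doc_id)
    with shelve.open(path, flag="n") as db:  # "n": always create a new, empty index
        for term, ids in postings.items():
            db[term] = sorted(ids)


def search(path, query):
    """AND query: return ids of docs containing every term, reading only those postings."""
    terms = tokenize(query)
    if not terms:
        return []
    result = None
    with shelve.open(path, flag="r") as db:  # read-only at query time
        for term in terms:
            ids = set(db.get(term, []))
            result = ids if result is None else result & ids
    return sorted(result)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        index_path = os.path.join(d, "toy_index")
        build_index(index_path, {1: "Haystack with Xapian", 2: "Xapian is disk based"})
        print(search(index_path, "xapian disk"))
```

Because the index lives in files, where those files are stored (local disk or networked storage) directly affects query speed, which is exactly the question being asked here.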

Hmm, that sounds like it would work. But the disk access speed question is a good one. We can't guarantee disk access speeds right now, unfortunately. The thing is, your disk needs to be accessible from a number of different machines -- the ones where your consoles run, the ones where your web apps run, and the ones where your scheduled tasks run. So it's networked storage, which means that access can be slow and speed can vary.

On the other hand, we're putting a lot of work into making it as fast as possible; yesterday we released an upgrade with a significant improvement (which we got by moving Dropbox syncing to a different server). And we'll be working on it more in the future.

I guess the best thing to do to get a feel as to whether it's likely to be acceptable would be to see if people advise against using networked storage for xapian indexes. We're using NFS as the transport, in case that matters.
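Since speed on networked storage can't be guaranteed and varies over time, one way to get a rough feel is to time file I/O in the directory where the index would live. The sketch below is a hypothetical helper, not a PythonAnywhere tool: it writes a scratch file, forces it to storage with `os.fsync`, reads it back and reports the timings. Treat the read figure with caution, since the OS (and NFS client) may serve it from cache, and run it a few times at different times of day.

```python
import os
import tempfile
import time


def time_file_io(directory, size_bytes=8 * 1024 * 1024, chunk=64 * 1024):
    """Write, fsync and re-read a scratch file in `directory`; return timings in seconds.

    Hypothetical helper. Reads may be served from cache, so they can look
    faster than a cold read of a large index would be.
    """
    payload = os.urandom(chunk)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            written = 0
            while written < size_bytes:
                f.write(payload)
                written += len(payload)
            f.flush()
            os.fsync(f.fileno())  # make sure the data has actually reached storage
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        read = 0
        with open(path, "rb") as f:
            while True:
                block = f.read(chunk)
                if not block:
                    break
                read += len(block)
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the scratch file
    return {"bytes": read, "write_s": write_s, "read_s": read_s}


if __name__ == "__main__":
    # Assumption: pass the directory where your search index would be stored.
    print(time_file_io(os.path.expanduser("~")))
```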

Isn't the same argument (networked storage) applicable to the DB as well? I assume the DB is accessed over the network too, so the speed of accessing a disk should be the same as the speed of accessing a DB, unless the PA team did something special to optimize the access?

I would say the main difference is the amount of data that needs to be shuffled back and forth. A database access is usually a very small query that tells the database to send a small subset of the database back to the client. So you can get useful access to a multi-gigabyte database where you're only sending a few K each way. For filesystem-type access, you're assuming that the file is available in its entirety, so you could end up shipping gigabytes across the network to filter out a few pieces.
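The difference in data volume can be put into rough numbers with a toy model. The sketch counts the bytes that would cross the network in each case, under assumed sizes (100 bytes per record, a 200-byte query); the function names are hypothetical. It models the whole-file case described above; a real index such as Xapian's reads only the blocks it needs, so it sits somewhere between the two extremes.

```python
# Toy model: bytes crossing the network for whole-file access vs. a server-side query.
RECORD_SIZE = 100  # assumed bytes per record


def bytes_for_file_scan(records, predicate):
    """Whole-file access: every record travels to the client, which filters locally."""
    shipped = len(records) * RECORD_SIZE
    matches = [r for r in records if predicate(r)]
    return shipped, matches


def bytes_for_server_query(records, predicate, query_size=200):
    """Client-server DB: only the query goes out and only matching rows come back."""
    matches = [r for r in records if predicate(r)]  # this filtering runs on the server
    shipped = query_size + len(matches) * RECORD_SIZE
    return shipped, matches


if __name__ == "__main__":
    records = list(range(100_000))
    wanted = lambda r: r % 50_000 == 0  # selects 2 of the 100,000 records
    print(bytes_for_file_scan(records, wanted)[0])     # 10,000,000 bytes shipped
    print(bytes_for_server_query(records, wanted)[0])  # 400 bytes shipped
```

Both approaches find the same two records; only the amount of data moved differs.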

Is Haystack still not supported??? I'm going to have to cancel if so.

You can use Haystack with the Whoosh backend (since it's just Python and doesn't need a server), just not any of the others. We could probably also support Xapian soon. Solr and Elasticsearch require quite a bit more work on our infrastructure.
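For reference, a minimal Django settings fragment pointing Haystack at the Whoosh backend might look like the sketch below. It assumes the `HAYSTACK_CONNECTIONS` style used by Haystack 2.x and later (older 1.x releases used different settings), and the project path is a hypothetical placeholder. You would also add `'haystack'` to `INSTALLED_APPS` and build the index with `python manage.py rebuild_index`.

```python
import os

# Hypothetical project directory; in a real settings.py this is usually BASE_DIR.
PROJECT_DIR = os.path.expanduser("~/mysite")

HAYSTACK_CONNECTIONS = {
    "default": {
        # Whoosh is pure Python, so no separate search server is needed.
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        # Whoosh keeps its index as files in this directory.
        "PATH": os.path.join(PROJECT_DIR, "whoosh_index"),
    },
}
```

Like the Xapian plan earlier in the thread, this keeps the index in files, so the same networked-storage considerations apply.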

Any updates on the support for Haystack?

Still only the Whoosh backend I'm afraid :(

Have there been any updates on the support for Haystack, specifically Solr?

No, there haven't.

How about now? :)

We will post an update when there is an update.