Forums

fcntl() / flock() functionality

Since I'm not particularly familiar with the filesystem functionality underlying the hosting on PA, can I safely assume that Python's fcntl.lockf() function will work correctly? I'm particularly interested in locking between instances of the web hosting process, but it would also be interesting to know whether it guards against invocations from the consoles.
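To make that concrete, what I have in mind is just a plain advisory lock around a read-modify-write, along the lines of the sketch below (the file name and contents are placeholders, and whether the lock is actually honoured across the storage backend is exactly what I'm asking about):

    # Rough sketch of advisory locking with fcntl.lockf().
    import fcntl

    def update_shared_file(path, new_contents):
        # Open (creating if necessary) and take an exclusive lock;
        # lockf() blocks until any other holder releases it.
        with open(path, "a+") as f:
            fcntl.lockf(f, fcntl.LOCK_EX)
            try:
                f.seek(0)
                f.truncate()
                f.write(new_contents)
                f.flush()
            finally:
                fcntl.lockf(f, fcntl.LOCK_UN)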

I'm pretty confident that the locking works over NFS, but I wasn't sure whether the filesystem on PA used NFS or was backed by something like Amazon EBS (about which I know precious little!).

We're using both! EBS-backed storage mounted over NFS... so I'm afraid you can't be quite sure of anything! I'm reminded of a Ted Dziuba article I read once... traces of it here

What are you trying to achieve?

Here's the fellow!

So calls to fsync() may be unreliable... I'd have thought calls to locking functions might fare better, though?

Really it was a question that got thrown up while I was working through possible ideas for a project. I'll explain how I got there for interest, but please don't consider this as a request to do anything, I'm just exploring possibilities.

I'm thinking about writing an online home budgeting tool, initially entirely for personal use but toying with the idea of making available if it proves useful (you never know!). On that basis, some of the data could be quite sensitive - especially if the system gets extended to store real transaction data. As a result, I'd like to make sure that data stored in non-volatile storage is at least somewhat protected, ideally using AES with the user's own password as a passphrase (so the administrator of the tool doesn't even have access).

One option would be to use per-column encryption, and I'm aware that MySQL has some limited functions to help with this. This is pretty CPU-intensive (though that can be mitigated with sensible in-memory caching), and also potentially reveals some information (e.g. number of transactions).
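For illustration, the kind of thing I mean is MySQL's AES_ENCRYPT()/AES_DECRYPT() pair, used roughly as below (the table, columns and hard-coded key are all made up for the example; the real key would be derived from the user's password):

    # Per-column encryption using MySQL's built-in AES functions via MySQLdb.
    import MySQLdb

    conn = MySQLdb.connect(user="budget", passwd="secret", db="budget")
    cur = conn.cursor()
    key = "key-derived-from-user-password"

    # AES_ENCRYPT() returns a binary string, so the column wants to be
    # VARBINARY/BLOB rather than VARCHAR.
    cur.execute(
        "INSERT INTO transactions (user_id, details) "
        "VALUES (%s, AES_ENCRYPT(%s, %s))",
        (1, "groceries: 42.50", key))

    # ...and decrypted on the way back out.
    cur.execute(
        "SELECT AES_DECRYPT(details, %s) FROM transactions WHERE user_id = %s",
        (key, 1))
    rows = cur.fetchall()
    conn.commit()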

Another option would be wholesale database encryption which incurs a one-off CPU hit after which data processing in memory is quite cheap. I spotted the SQLCipher project, but it looks like it's quite a lot of hassle getting a version of the standard Python sqlite3 module compiled with support for this.

So then I was considering a somewhat knock-together solution involving an object-based store serialised out with cPickle and then encrypted to disk with PyCrypto. For this to work sensibly with a web service, however, some sort of locking would be required. Failing locking, I'm sure there would be a feasible fallback option if we could rely on the atomicity of a rename operation - however, I'm not sure about even that.
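For what it's worth, the shape I had in mind was roughly the following (key handling and the on-disk format are glossed over, and of course the whole thing hinges on os.rename() being atomic on the destination filesystem):

    # Serialise with cPickle, encrypt with PyCrypto, write to a temp file
    # and rename it over the old copy so readers never see a partial write.
    import cPickle
    import os
    import tempfile
    from Crypto import Random
    from Crypto.Cipher import AES

    def save_store(path, obj, key):
        # key must be 16, 24 or 32 bytes (derived from the user's password).
        iv = Random.new().read(AES.block_size)
        cipher = AES.new(key, AES.MODE_CFB, iv)
        data = iv + cipher.encrypt(cPickle.dumps(obj, cPickle.HIGHEST_PROTOCOL))
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        try:
            os.write(fd, data)
        finally:
            os.close(fd)
        os.rename(tmp_path, path)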

That's where my question came in - if we can't make too many assumptions about the backend then maybe I should stop trying to be too clever (probably the most common failing of software engineers?) and just use MySQL with per-column AES encryption for the sensitive data for now. I'd just have to be a little careful with the schema, as obviously reading hundreds of rows and using AES on each would suck up quite a bit of CPU and I wouldn't want to load the service down with that... But I'm sure there are sensible ways of grouping data such that it can be held in a small number of binary blobs.

I guess, actually, the cPickle solution could even be used with MySQL as a back-end instead of the filesystem. Hm. I feel a little unclean for even suggesting it... (^_^)
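Something along these lines, I suppose, with the encrypted pickle living in a single row per user (table and column names invented for the sketch; conn is a MySQLdb connection):

    # Keep the encrypted cPickle blob in MySQL rather than on disk, so the
    # database rather than the filesystem worries about concurrent writers.
    def save_blob(conn, user_id, encrypted_blob):
        cur = conn.cursor()
        cur.execute(
            "REPLACE INTO stores (user_id, payload) VALUES (%s, %s)",
            (user_id, encrypted_blob))
        conn.commit()

    def load_blob(conn, user_id):
        cur = conn.cursor()
        cur.execute("SELECT payload FROM stores WHERE user_id = %s", (user_id,))
        row = cur.fetchone()
        return row[0] if row else None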

Anyway, hopefully that gives you some background - but mostly I was curious about what we can and can't assume about the filesystem, and I think "very little" is a perfectly reasonable answer!