Forum Pagination

The addition of pagination to the PA forum is obviously a good feature. I'm sure it is highly desirable in most cases. However, there are certainly times when seeing the whole list is also desirable, and preferable to using the Google search. Would you please consider adding a See All button to the forum so we can manually go back to the old view when needed? Thank-you...☺

The reason we had to add it was that the DB hit from showing all of the forum posts (including the "last post by" username) was getting too large to support :-(

Would bumping up the page size be of any use? Or would you specifically like to be able to see everything? Perhaps without the last-post data?

Choice b would work for me. It's the option to see it all listed from time to time. And I completely understand that you had to paginate it. I was actually surprised it hadn't happened sooner. So, yeah the ability to list them all on demand w/o the last-post data would cover my uses...☺

Is it searching for a particular string in the title that you need? That could be addressed with a suitable LIKE clause on the SELECT and if the column was indexed that would be a lot faster than a table scan. Needs a bit of fussy UI work, though, and it could seem a bit redundant with the Google search box there as well - multiple search boxes would probably be confusing.

(Of course, I'm assuming an SQL back-end here - same principle could hold with a NoSQL DB, but the index would need to be manually implemented)
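As a minimal sketch of the kind of title search being described, assuming an SQLite-style backend (the `threads` table and column names here are hypothetical, not the forum's actual schema):

```python
import sqlite3

# Hypothetical schema -- table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_threads_title ON threads (title)")
conn.executemany(
    "INSERT INTO threads (title) VALUES (?)",
    [("Forum pagination",), ("Scheduled tasks",), ("Pagination bug report",)],
)

# Note: a LIKE with a leading wildcard ("%pagination%") generally forces a
# scan anyway; only an anchored prefix ("pagination%") can use a B-tree
# index on most engines.
rows = conn.execute(
    "SELECT id, title FROM threads WHERE title LIKE ? ORDER BY id",
    ("%pagination%",),
).fetchall()
print(rows)
```

(SQLite's `LIKE` is case-insensitive for ASCII by default, which happens to suit a forum search; other engines may need `ILIKE` or `LOWER()`.)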

Perhaps if you explained your use-case in a little more detail we can find a solution which is database efficient and meets your needs?

I'm a little confused as to what you are asking, Cartroo. I thought the issue was settled. Of course I'm reading your response at 1am, so perhaps I'm just missing your point.

Maybe your question was directed @ giles?

It was directed at you, but if the "last poster" overhead was the only issue then my question is arguably redundant. I was rather assuming that showing all the threads could be quite a load even without the last poster, so I was trying to find out more about your use-case in case we can find a solution which doesn't involve the same DB hit.

In general I think any function which offers an ever-increasing list is going to hit a scalability issue at some point, even if it's just the sheer size of the HTTP response. It may be that with removing the last poster that point has been moved far into the future, of course, in which case the point is moot.

Personally I always get a bit twitchy about unbounded lists in any web app, but if it's the result of a SELECT on a single table (or equivalent) then it should be fast as long as there's an index on the "last update" column (again, assuming SQL).

I used it a few ways.

  1. First, having had the long-form list for so long, I got used to where certain older posts were on the list and could scroll directly to them. Now that we're paginated I have no idea which page they're on, and when I jump to an arbitrary page I don't know whether to go forward or backward to find the right one.
  2. Sometimes I can only remember a word from the title of a topic (again, with its relative place in the timeline), and being able to use the browser's search to zip up and down a single page is much faster than making multiple attempts at a Google search when I need to try a few words. Plus, on one page the items appear in history order, rather than in Google's ranking order.
  3. Nothing against Google, but I don't like being tracked, so when I can avoid having my online activity logged for someone else's benefit...I do. I'm not trying to make this issue about something it isn't, but our privacy is not just eroding in this era, it is evaporating, and anything that slows that down is a feature to me!
  4. There is more, but I could live w/o it for the rest, so I guess the above is my justification for wanting it back.

P.S. I hope that doesn't sound paranoid. I do try and keep these issues in balance.

Would new post notifications also help?

As in something other than using RSS for threads?

In principle you could use the RSS feed to implement some or all of what you want, although in practice I'm not sure if there's much software out there which would have the requisite functionality so you might end up having to craft something in Python.

If the RSS feed was updated to support RFC 5005-style feed archiving then this creates a permanent record which can be cached locally (once an archive feed is created, it's not permitted to change - not to be confused with a paginated feed, which is much more dynamic and also, confusingly, covered in the same RFC). This would mean that you'd have the entire history if you followed the chain of "previous" URLs, but each URL would only be a paginated chunk similar to the HTML view of the feed.
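A rough sketch of what following that chain of "previous" links might look like, assuming an Atom feed with RFC 5005 `rel="prev-archive"` links (the feed contents and URLs here are inlined fakes so the example is self-contained; a real client would fetch them over HTTP with `urllib.request` or similar):

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical archive chain, inlined for the sake of the example.
FEEDS = {
    "current": """<feed xmlns="http://www.w3.org/2005/Atom">
        <entry><title>Thread 3</title></entry>
        <link rel="prev-archive" href="arch-1"/>
    </feed>""",
    "arch-1": """<feed xmlns="http://www.w3.org/2005/Atom">
        <entry><title>Thread 2</title></entry>
        <link rel="prev-archive" href="arch-0"/>
    </feed>""",
    "arch-0": """<feed xmlns="http://www.w3.org/2005/Atom">
        <entry><title>Thread 1</title></entry>
    </feed>""",
}

def walk_archive(url):
    """Yield entry titles, following rel="prev-archive" links back in time."""
    while url is not None:
        root = ET.fromstring(FEEDS[url])
        for entry in root.findall(ATOM + "entry"):
            yield entry.findtext(ATOM + "title")
        link = root.find(ATOM + "link[@rel='prev-archive']")
        url = link.get("href") if link is not None else None

titles = list(walk_archive("current"))
print(titles)  # newest first, back through the whole archive
```

Since archive documents are immutable per the RFC, each fetched chunk could be cached on disk forever, so only the "current" feed would ever need re-fetching.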

I've implemented feeds which support this RFC and it's feasible, but a little fiddly. Also, most RSS feed readers are very simple beasts (and typically pretty poorly written) so they often won't utilise the fact that it's an archived feed to cache things intelligently. Often they're hard-coded to only ever show the most recent N posts, anyway.

On reflection, maybe the simpler solution is to generate a massive "link to every thread ever" static HTML page and just update it every hour or so from a cron job. Using a naive approach (i.e. regenerating the file every time) this would be a significant DB hit, but hopefully the operation could be spread over a few minutes with artificial pauses between each DB query to keep the load small. Alternatively, something more cunning could be done to re-use the static HTML from last time, filter out any threads that had been updated, regenerate just that portion from the DB, and stick it on the front of the file.

Seems a bit of a quirky thing to do, but it might at least amortise the DB cost across a longer period and keep it predictable (as opposed to being per-request).
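The naive "regenerate with pauses" version of that cron job might look something like this sketch (again assuming SQLite and a hypothetical `threads` table; the chunk size and pause would be tuned to the real load):

```python
import sqlite3
import time

# Hypothetical DB -- in the real job this would be the forum's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO threads (title) VALUES (?)",
                 [("Thread %d" % i,) for i in range(10)])

CHUNK = 4     # rows per query -- small, so each individual DB hit is cheap
PAUSE = 0.01  # artificial pause between queries (seconds; longer in prod)

parts = ["<ul>"]
last_id = 0
while True:
    rows = conn.execute(
        "SELECT id, title FROM threads WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, CHUNK)).fetchall()
    if not rows:
        break
    for thread_id, title in rows:
        parts.append('<li><a href="/forums/%d/">%s</a></li>'
                     % (thread_id, title))
    last_id = rows[-1][0]
    time.sleep(PAUSE)  # spread the load instead of one big scan

parts.append("</ul>")
html = "\n".join(parts)
# A real job would write this atomically: temp file, then os.rename().
```

Keyset pagination on `id` (rather than `OFFSET`) keeps each chunked query an index lookup, so the cost stays flat however long the thread list grows.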

It's funny how things that seem so easy when we think of them are so often the opposite, and how much gigahertz speeds and high-level abstraction can spoil us, hiding the detail of what really must happen for the work we do with computers.

One of the advantages of coming from an embedded software background is being acutely conscious of all the overheads that high-level abstractions can incur. One of the disadvantages, of course, is a constant nagging urge to over-engineer every aspect of a solution until it's as lightweight as possible.

It all balances out in the end. (^_^)

(And then you work on large-scale distributed systems and suddenly all those "silly" aspects of embedded development don't seem so silly any more when you're trying to handle millions of connections a second spread across a hundred machines worldwide and still keep the back-end storage consistent).

Sounds like you've been quite busy. Out of curiosity what source control is your favorite? Oh, and why...☺

EDIT: Looks like my iPhone's Google Chrome has submitted several copies of this post at varying stages of completion - strangely apropos for a post about SCM. I'll try and tidy it up now...

I don't have particularly strong opinions about source control apart from Microsoft SourceSafe which, bluntly, isn't.

I had many years using Perforce which is about the most competent centralised SCM of which I'm aware - its merge tracking is significantly better than Subversion's comparatively primitive equivalent, and the fact that it tracks client state centrally makes updates fast on even large repositories. Features such as being able to "shelve" pending changes to the server are also handy, somewhat akin to Git's stashing. It was certainly a massive improvement on the CVS I'd used prior to that (and RCS before that, although that's delving back into university days).

I've heard it may have scalability issues with very large teams (thousands of users) but it's quite possible this is admin error. As an aside, their customer support is also brilliant.

However, with boring predictability I've become quite a fan of distributed SCMs lately so my current choice would be Git, although I must confess I've only used Mercurial and Bazaar very lightly when I've come across projects which use them, and I've never had occasion to try BitKeeper or darcs.

I know some people find Git confusing at first and this doesn't surprise me too much - it's been written by a bunch of Unix hackers and it shows in the raft of minor inconsistencies and quirks across the command set. However, I personally like it for the same reasons I like C and Python - it contains a core set of very competent functionality, which takes a small set of simple concepts and combines them in powerful ways. There is then a clearly separated layer of more complex functionality built on top, but you can always go back to the basic principles to understand how this is working.

I think Git definitely rewards some careful background reading in how it works under the hood - it makes the usage patterns significantly easier to understand in context. I even went as far as to write my own little reference to Git structure which helped cement my understanding.

That said, Git is still a fairly low-level tool and is more of a toolkit with which to design a process than it is a definition of the process itself. This is in contrast to more centralised systems, which tend to be designed to support particular processes or methodologies. As a result, any teams using it need to be prepared to layer their own set of policies on top - for example, deciding whether to use a truly decentralised model or have a "golden" central repository that everyone pushes to, whether to gate commits on reviews being performed, how release and feature branches should be used, etc. These are decisions that face any team using SCM to some extent, but Git tends to offer more options and hence requires more decisions.

I'm a sufficient fan of Git that at my current employer, who uses Subversion, I personally use git-svn to layer a local Git repository on top of the Subversion one. This works really well as long as you use the SVN repo as your one true upstream repository - if you peer with other Git repos, it can become confused.

So, if I had to recommend an SCM tool it would be Git for preference (assuming a technical audience) or Perforce if a centralised system is mandated.

Time to bring up the seminal guide to Git using spatial analogies

PS - @Cartroo, the link to your reference is broken.

Personally I didn't find it any more or less confusing than any other DVCS. Once you understand that it references file content directly and diffs are created on demand (as opposed to many SCMs, which store diffs directly and create file content on demand) then it starts to become straightforward. Ultimately, if you expect everything to work the same as the things you're already familiar with, you'll never move on to a system which is significantly better.
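That content-addressing is simple enough to demonstrate: a blob's name is just the SHA-1 of a short header plus the raw file content, which you can reproduce in a few lines (this mirrors what `git hash-object` does for blobs):

```python
import hashlib

def git_blob_sha1(data):
    """Return the object name Git would assign to this blob content."""
    # Git hashes "blob <length>\0" followed by the raw bytes.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# The famous empty-blob hash, identical in every Git repository:
print(git_blob_sha1(b""))        # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_sha1(b"hello\n"))
```

Because the name is derived purely from content, identical files are stored once no matter how many paths or commits reference them - diffs fall out of comparing trees, not the other way around.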

Of course, some features like rebasing are a little intricate, but the only reason they seem more confusing than in other systems is that those systems simply don't offer them. It's quite possible not to use them. I think this point could be better made by the Git community and/or documentation, but they're keen to push these things as they're part of what makes Git so flexible. Another example is the fact that Git has a set of different merge strategies to choose from - nice and flexible for power users, but just another thing to confuse beginners who don't know enough to realise they can just rely on Git to do The Right Thing™ most of the time.

On the subject of the documentation, it is quite technically-worded, although Pro Git does a fine job of demystifying it in my opinion. It's a common failing of Open Source projects, and I'm sure it'll get better over time as more people try to write high level Git tutorials.

PS - @Harry: thanks, I forgot that links with colons need to be URL-escaped - should work now.