You can look forward to those annoying console freezes going away in the very near future folks!
THANK YOU VERY MUCH for addressing this: "consoles sometimes seem to hang for a few seconds, once every few minutes? Well, it turns out there was a blocking call somewhere in our ioloop, but we hadn't really noticed because we were off in our own little cluster. Boy is it annoying on live though! So we've spent a couple of days tracking down the problem, and we've now got a fix going through our integration loop."
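For anyone wondering why a single blocking call can freeze every console at once, here is a minimal sketch. It uses the standard library's asyncio rather than tornado, and the names (heartbeat, slow_call, the two handlers) are made up for illustration; this is not the actual console server code. A "heartbeat" task stands in for console traffic, and we measure the longest gap between its ticks while a slow call runs either directly on the loop or in a worker thread.

```python
import asyncio
import time


async def heartbeat(gaps, interval=0.01, beats=20):
    """Stand-in for console traffic: record the delay between successive ticks."""
    last = time.monotonic()
    for _ in range(beats):
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


def slow_call(seconds=0.3):
    """Hypothetical blocking operation, e.g. a synchronous network or disk call."""
    time.sleep(seconds)


async def handler_blocking():
    # Runs on the event loop thread: nothing else is serviced until it returns.
    slow_call()


async def handler_offloaded():
    # Hands the same call to a worker thread so the loop keeps servicing others.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, slow_call)


async def worst_gap(handler):
    gaps = []
    ticker = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0.03)  # let a few ticks through first
    await handler()
    await ticker
    return max(gaps)


def measure(handler):
    """Return the longest stall (in seconds) seen by the heartbeat."""
    return asyncio.run(worst_gap(handler))
```

Calling measure(handler_blocking) should report a stall of roughly the full length of the slow call, while measure(handler_offloaded) should stay close to the tick interval. Exact numbers will vary with machine load.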
Yep, being in exactly the same environment as our users really made all the difference. When you feel pain, we feel pain... Plus it feels more friendly :)
PA...the best dogfood I've ever tasted...err...ONLY!!...yeah, that's it only.
I have been awaiting improvements in reducing latency -- but so far, the console is still very sluggish. Sometimes it is even hard to know whether a typed character has been received by PAW. So, sadly, the console is currently not "usable" as advertised.
I'm wondering if the particular AMI (Amazon Machine Image) instance I have been assigned is overloaded with intensive processes run by other users sharing it. That is hard for a user to determine because the "ps" command, as well as "top", is disabled. Please kindly check on this. Maybe give console processes better "nice" values :-)
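As a side note on "nice" values: on Linux an unprivileged user can normally only raise the niceness of their own processes (that is, lower their priority), so giving consoles a better value would have to be done on the server side. What a user can do is run their own heavy jobs at a lower priority. A minimal sketch, assuming a Unix-like system; run_niced and child_niceness are hypothetical helper names, not anything PAW provides:

```python
import os
import subprocess
import sys


def run_niced(args, increment=10):
    """Run a command at a lower priority (higher niceness) than the caller."""
    return subprocess.run(
        args,
        # Executed in the child just before exec; raises its niceness.
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


def child_niceness(increment):
    """Ask a niced child Python process to report its own niceness."""
    # os.nice(0) adds nothing and returns the current niceness.
    result = run_niced(
        [sys.executable, "-c", "import os; print(os.nice(0))"], increment
    )
    return int(result.stdout.strip())
```

For example, child_niceness(5) should return the caller's niceness plus 5, capped at the Linux maximum of 19.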
I would appreciate other insights into how the console (via browser) can be made snappy and responsive. Thank you very much.
In my experience, latency is often a trickier problem to solve than throughput, especially on virtualised and hosted systems. Looks like it's definitely being actively worked on, however, so I'd hope things will be improving.
Yes, do take a look at the blog post. We've got a bunch of changes planned to the console server that we hope to be able to roll out in the next week or so. We've already removed one major bug that would cause 3-4 second pauses on consoles.
Aside from that, it would be good to understand how you're using the consoles. What do you do in them? Is it mostly launching commands from Bash, or doing line-by-line manipulations in an IPython console?
Many of us use the consoles every day for development, using vim. We find the experience far from satisfactory, but certainly it's still "usable" - but that may be because vi was actually designed as an editor for high-latency environments... http://www.theregister.co.uk/2003/09/11/bill_joys_greatest_gift/
Hi Harry, when PAW launched, vim via the console through the browser was responsive -- and impressive, because all the customization files could be placed in a Dropbox folder. Now I hesitate to use vim since it is difficult to know whether any editing commands have been properly executed, and there is a long delay before visually seeing the edited result in the browser.
If a user cannot be fully confident in the operations of the text editor, clearly writing code is a no-go (this is not directed at Go fans :-) Perhaps vim should get a better "nice" value (seriously, it looks like it's connected to a 300-baud modem, which is about 0.03 kB/s). But sure, as Cartroo points out, there's a vast difference between CPU-bound and I/O-bound processes.
I appreciate the efforts by PAW, and will check in later on the progress being made. Thanks again.
Hi rsvp -- thanks for the note.
Just to update everyone on where we are right now: we're pushing a big change through our integration testing system. We have high hopes for it -- it moves processes around so that the load on the console server should be significantly smaller. If it all works then vi should become much more usable. We're also planning to release a change that should make it harder for out-of-control web apps belonging to one user to affect other users' apps.
The next change is going to streamline the way the file browser works. We have a first cut of something that makes the directory listing page much faster, but doesn't speed up downloading files and editing and saving them in the in-browser editor. This needs to go through rigorous security testing -- the streamlining changes the way we sandbox everyone's data, so it's high-risk. We'll also push that one live as soon as we're happy with it, and then move right on to streamlining the downloading, editing and saving.
We've got a bunch of other ideas about how we can speed things up, but our plan is currently to get the stuff that's in the pipeline done, then to work out where that moves the bottlenecks to before we decide which ones to go for.
So, in summary -- we're working on performance and nothing else right now. We know that it's been becoming unacceptable over the last few weeks.
@giles: Thanks for the update. I'm not sure how much control over the underlying scheduling the virtualised system gives you, but if it's based on Linux's rather optimistically named Completely Fair Scheduler, there are a few things that can be tweaked to adjust latency. For example, if you can manage to split off the interactive tasks into more or less their own instance, you could try cranking down the scheduling granularity - this may make things rather less efficient overall, but it might make things feel snappier for things like text editing (mostly idle, but need low latency).
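In case it helps anyone check what their own kernel is currently using, here is a rough sketch that reads the CFS tunables if they are exposed. The file locations are an assumption that depends on kernel version and build options: older kernels exposed them under /proc/sys/kernel (with scheduler debugging enabled), later ones moved them to debugfs, which usually needs root, and some kernels don't expose them at all. The function simply returns None for anything it can't read.

```python
from pathlib import Path

# Candidate locations for each tunable; which (if any) exist depends on the
# kernel version and configuration, so treat these paths as assumptions.
CANDIDATES = {
    "latency_ns": [
        "/proc/sys/kernel/sched_latency_ns",
        "/sys/kernel/debug/sched/latency_ns",
    ],
    "min_granularity_ns": [
        "/proc/sys/kernel/sched_min_granularity_ns",
        "/sys/kernel/debug/sched/min_granularity_ns",
    ],
    "wakeup_granularity_ns": [
        "/proc/sys/kernel/sched_wakeup_granularity_ns",
        "/sys/kernel/debug/sched/wakeup_granularity_ns",
    ],
}


def read_sched_tunables():
    """Return {name: value in ns, or None if not readable on this system}."""
    values = {}
    for name, paths in CANDIDATES.items():
        values[name] = None
        for path in paths:
            try:
                values[name] = int(Path(path).read_text().strip())
                break
            except (OSError, ValueError):
                # Missing file, no permission, or unexpected contents.
                continue
    return values
```

Lowering the granularity values (as root, by writing to the same files) is the "cranking down" I mean; whether that actually helps on a given virtualised host is something you'd have to measure.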
Just another idea to throw into the mix, in case it hasn't cropped up so far.
@Cartroo -- that's interesting, thanks!
I think another important step we should consider taking is to split the console servers (which are tornado-based servers that handle passing messages from and to in-browser consoles) from the place where the code executes. Right now, even after we've moved the scheduled tasks to their own server type, the stuff that people run from consoles will still run on the same instances as the console server. Which means that if the server gets overloaded then pretty much no matter what we do to boost the amount of CPU available to the console server (cgroups, etc) it will get bogged down.
Of course, if a keystroke from a browser has to go
browser --websockets--> console server --ssh--> execution server
and back again before it's echoed, then we might wind up worsening the latency even further -- but hopefully not too much, as the last hop would be entirely on the fast internal Amazon network. Hopefully...
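To put rough numbers on that, here is a back-of-the-envelope sketch. The latencies are invented purely for illustration (we haven't measured anything yet), and the model just assumes an echoed keystroke crosses every hop once in each direction:

```python
def echo_round_trip_ms(hops):
    """Time for a keystroke to reach the far end and be echoed back.

    `hops` maps a hop name to its one-way latency in milliseconds.
    """
    return 2 * sum(hops.values())


# Hypothetical figures, not measurements.
current = {"browser -> console server (websockets)": 80}
split = dict(current)
split["console server -> execution server (ssh)"] = 1

extra_ms = echo_round_trip_ms(split) - echo_round_trip_ms(current)
```

With these made-up figures the extra hop adds 2 ms to a 160 ms round trip, which is why a fast internal network would make the split tolerable. If the internal hop turned out to be slow or jittery, the picture would change.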
Presumably one of the principal advantages of offloading console and execution on to separate servers is that you can scale them independently? I guess the execution speed can be scaled according to account type in principle, whereas the console load is probably an approximately constant per-user load.
But yes, it does depend on Amazon's internal network being fairly reliably low latency - fingers crossed, eh? (^_^)
+100 Giles: "split the console servers from the place where the code executes."
I will bet that a dramatic reduction in console latency will result.
@Cartroo -- exactly! And it also means that we can insulate console performance from machine load better. Cgroups and other virtualisation systems are great, but there's nothing that stops process X from affecting process Y's performance more thoroughly and reliably than putting them on different hardware :-)
@rsvp -- that's what we're hoping... lots of testing to do first, though.