
Server returning HTTP 403 for Telegram Webhook

Hello!

First of all, thanks for the great support. Every time I've needed it, you've been great.

Now, on to the actual problem. I'm using this PythonAnywhere account to host some Telegram bots, and I've set up webhooks so that Telegram POSTs my bots' updates to my site. That part works fine. The problem is that the server sometimes responds with HTTP 403, and I don't know why. The 403s seem to happen randomly (HTTP 200 responses happen too), but they've become more frequent lately. My application uses Django, and I've deactivated the CsrfViewMiddleware, since all of my views are for the bots.
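For reference, deactivating the CSRF middleware in a Django project of this era means commenting it out of the middleware list in settings.py. This is an abbreviated, assumed list (not the author's actual settings); Django 1.10 and later name the setting MIDDLEWARE instead of MIDDLEWARE_CLASSES:

```python
# settings.py (fragment) -- assumed, abbreviated middleware list for a Django 1.9 project.
MIDDLEWARE_CLASSES = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    # CSRF protection disabled: every view is a bot webhook endpoint.
    # 'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
```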

This is the last line of my access log, for reference:

149.154.167.213 - - [18/Mar/2016:20:50:07 +0000] "POST /bot/<TOKEN-REDACTED>/ HTTP/1.1" 403 1477 "-" "-" "149.154.167.213"

Really, I have absolutely no idea why the 403s are happening.

Any ideas for what might be going on?
Thanks in advance!

Fabricio

This is what I get when I reload the webapp and try sending the command in Telegram again:

149.154.167.217 - - [18/Mar/2016:21:03:53 +0000] "POST /bot/<TOKEN-REDACTED>/ HTTP/1.1" 200 5 "-" "-" "149.154.167.217"

Fabricio

Interesting! My first thought was that it might be CSRF-related, but you've already ruled that out. Is there any pattern -- for example, it works for some particular amount of time and then consistently 403s? Or is it more random?
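One way to check for a pattern is to scan the access log for the points where the POST status flips from 200 to 403 and back. A minimal sketch, assuming every line follows the access-log format shown above; the log path in the usage note is an assumption based on PythonAnywhere's usual naming:

```python
import re
from datetime import datetime

# Matches the start of an access-log line like the ones quoted above:
# IP, two "-" fields, [timestamp], "METHOD PATH PROTOCOL", status code.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def parse_line(line):
    """Return (timestamp, method, status) for one log line, or None if it doesn't match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return ts, m["method"], int(m["status"])


def status_changes(lines, method="POST"):
    """Yield (timestamp, old_status, new_status) whenever the status of
    consecutive requests with the given method changes."""
    previous = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or parsed[1] != method:
            continue
        ts, _, status = parsed
        if previous is not None and status != previous:
            yield ts, previous, status
        previous = status
```

Usage: `with open("/var/log/fawerstelebot.pythonanywhere.com.access.log") as f: for change in status_changes(f): print(change)` prints one line per transition, which makes the time between a reload and the first 403 easy to read off.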

Yeah, there is one. In fact, it's just like you said: it returns 200 for some amount of time (I can't really say how long, though), and then BAM, 403s all the way. Currently, I have a daily scheduled task that touches the WSGI file every six hours to reload the web app automatically.

Some context on the daily task: I created the script below yesterday and set it to run every day at 00:00 UTC.

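#!/bin/bash

# Touch the WSGI file 0, 6, 12 and 18 hours after the task starts (00:00 UTC).
# Touching the WSGI file makes PythonAnywhere reload the web app.
for i in 0 6 12 18; do
    (sleep $((3600 * i)); touch "/var/www/fawerstelebot_pythonanywhere_com_wsgi.py") &
done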

I'll have it run until we have a solution for this problem.

That's really interesting. I've heard of people getting a problem like this with Flask, but never with Django.

Is there any way to ask Telegram for the contents of the 403 responses? Or perhaps you could curl for it next time it happens? It would be interesting to know which level of the stack is sending it, and the content might help track that down.

Is there any way to ask Telegram for the contents of the 403 responses?

I'm afraid not. :/

Or perhaps you could curl for it next time it happens?

Well, I could simulate a POST request when I see the 403s coming. I'll deactivate the daily task so they can happen, and then reply to this thread with the content of the response. Maybe a pickled response object? :p
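A simulated delivery only needs to look like Telegram's: a JSON POST with no Referer header. A minimal standard-library sketch; the update payload and the URL in the usage note are hypothetical placeholders, and urllib does send its own User-Agent, unlike the "-" seen in the log:

```python
import json
import urllib.error
import urllib.request


def build_fake_update_request(url, update):
    """Build a POST resembling a Telegram webhook delivery: a JSON body and
    no Referer header (the access log above shows "-" in the Referer field)."""
    body = json.dumps(update).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_and_show(request, limit=2000):
    """Send the request and return (status, start of the response body).
    urlopen raises HTTPError for 4xx/5xx, but the error object still carries
    the body, which is where an HTML 403 page would be."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read(limit).decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read(limit).decode("utf-8", "replace")
```

Usage: `send_and_show(build_fake_update_request("https://fawerstelebot.pythonanywhere.com/bot/<TOKEN-REDACTED>/", {"update_id": 1}))`, with the real token path substituted. Note that the bot will try to process the fake update if the request gets through.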

Sounds good! Hopefully it won't be anything too horrible to interpret...

Hi Giles. How are you?

I'm back with more info. Since last time, I've left DEBUG = True so I could get more detail from the response. This is what I got:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8">
  <meta name="robots" content="NONE,NOARCHIVE">
  <title>403 Forbidden</title>
  <style type="text/css">
    html * { padding:0; margin:0; }
    body * { padding:10px 20px; }
    body * * { padding:0; }
    body { font:small sans-serif; background:#eee; }
    body>div { border-bottom:1px solid #ddd; }
    h1 { font-weight:normal; margin-bottom:.4em; }
    h1 span { font-size:60%; color:#666; font-weight:normal; }
    #info { background:#f6f6f6; }
    #info ul { margin: 0.5em 4em; }
    #info p, #summary p { padding-top:10px; }
    #summary { background: #ffc; }
    #explanation { background:#eee; border-bottom: 0px none; }
  </style>
</head>
<body>
<div id="summary">
  <h1>Forbidden <span>(403)</span></h1>
  <p>CSRF verification failed. Request aborted.</p>

  <p>You are seeing this message because this HTTPS site requires a &#39;Referer header&#39; to be sent by your Web browser, but none was sent. This header is required for security reasons, to ensure that your browser is not being hijacked by third parties.</p>
  <p>If you have configured your browser to disable &#39;Referer&#39; headers, please re-enable them, at least for this site, or for HTTPS connections, or for &#39;same-origin&#39; requests.</p>


</div>

<div id="explanation">
  <p><small>More information is available with DEBUG=True.</small></p>
</div>

</body>
</html>

For some reason, it's trying to validate CSRF, which took me by surprise. I commented out the CsrfViewMiddleware, so how is that even possible?
On the other hand, it also asks for the Referer header. I don't think there's a way to add this header to the requests Telegram sends; I have no control over the requests their servers make.

Thoughts?

Thanks,
Fabricio

(And yes, it ran flawlessly for 10 days before this 403.)
Fabricio

That's really weird. An experiment -- I know you've disabled the CSRF middleware, but perhaps you could add a @csrf_exempt to the view that's causing the problem?
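For context, Django's @csrf_exempt works by setting a csrf_exempt attribute on the view, and CsrfViewMiddleware skips its checks for any view that carries it. The sketch below is a deliberately simplified, standard-library model of that mechanism, not Django's code: check_csrf is a hypothetical stand-in, and the real middleware also validates the CSRF token and applies the Referer check only to HTTPS requests. If the real middleware produced the 403, an exempt view should never reach the Referer check:

```python
import functools


def csrf_exempt(view):
    """Simplified model of Django's decorator: wrap the view and set a
    csrf_exempt = True attribute on the wrapper."""
    @functools.wraps(view)
    def wrapped_view(*args, **kwargs):
        return view(*args, **kwargs)
    wrapped_view.csrf_exempt = True
    return wrapped_view


def check_csrf(meta, view):
    """Simplified model of the middleware's per-view check (hypothetical).
    Returns None to let the request through, or a rejection reason."""
    if getattr(view, "csrf_exempt", False):
        return None  # exempt views skip the check entirely
    if meta.get("HTTP_REFERER") is None:
        return "no Referer"  # only the Referer part of the real check is modelled
    return None
```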

perhaps you could add a @csrf_exempt to the view that's causing the problem?

Just added it. Let's see how it behaves...
I'll be back with more info when possible.

Thanks!

Hi Giles,

The problem seems to have something to do with the Referer header after all. Even with the csrf_exempt decorator, I still get the same error as before.

The rendered error page is identical to the one I posted before: "Forbidden (403) CSRF verification failed. Request aborted.", with the same note about the missing Referer header.

That's really weird, though -- it says that it's a CSRF-related message, and the only place that message appears in the Django source is in the CSRF middleware. It's as if it's ignoring the @csrf_exempt decorator.

Can I take a look at your code? We can see it from our side, but we always ask permission first. Just let me know which Django app and URL the relevant bit is in -- over a "Send feedback" message if you don't want to post it publicly.

I just added the CSRF_TRUSTED_ORIGINS setting. Let's hope it solves the issue.

Fabricio

Hi Giles,

Feel free to look at the code. The app is the default one (telebot) and the view is bot_webhook.

Thanks! I'll take a look when I'm back in the office tomorrow.

Great! Thanks.

Fabricio

Is there a possibility that this is happening when your app hasn't been hit for a while? The thing is, when a website hasn't been accessed for 26 hours, we hibernate it. Then we wake it up the next time a hit comes in. It's possible that the wake-up isn't working properly and is somehow causing some kind of CSRF problems.

Evidence for that would be if the problem occurs after there's been no traffic to the site for 26 hours; evidence against would be if it's happening even when the site has been constantly busy.

It's possible that the wake-up isn't working properly and is somehow causing some kind of CSRF problems.

Still, that's weird. Why would the CSRF checking re-activate on its own? I really don't get it.

Anyway, I found this in Django's CsrfViewMiddleware:

referer = force_text(
    request.META.get('HTTP_REFERER'),
    strings_only=True,
    errors='replace'
)
if referer is None:
    return self._reject(request, REASON_NO_REFERER)

So it really does expect a Referer header when it checks CSRF. I wrote my own middleware that injects HTTP_REFERER into the request's META dict; let's see how it behaves from now on.
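A middleware like that could look roughly as follows. This is a hypothetical sketch, not the author's code: it uses the new-style middleware API (Django 1.10+; on 1.9 the same logic would live in a process_request method), and the referer value is an assumed placeholder that should match the site's own HTTPS origin, because Django's Referer check compares the Referer against the request's host. It also switches off the Referer protection for every view, which is only tolerable here because all views are bot endpoints:

```python
class InjectRefererMiddleware:
    """Hypothetical middleware: fill in a Referer when the client sent none,
    so the Referer check in the Django snippet above never sees None."""

    def __init__(self, get_response,
                 referer="https://fawerstelebot.pythonanywhere.com/"):
        self.get_response = get_response
        # Assumed value: the site's own HTTPS origin.
        self.referer = referer

    def __call__(self, request):
        # Leave a real Referer untouched; only add one when it is missing.
        request.META.setdefault("HTTP_REFERER", self.referer)
        return self.get_response(request)
```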

Here's what I'm thinking: when your app is hibernated, then gets a hit and needs to be woken up, some of our code is executed to reactivate it. That code is run by Django -- our own Django instance, not yours. I'm wondering if our Django code messes up something that yours is relying on, which generates an error that it's mis-reporting as a CSRF problem. It's a really fuzzy idea -- the code paths in that part of the system are a bit tricky, with nginx, uWSGI and Django all talking to each other asynchronously.

If we could narrow the problem down to "this only happens when the website is woken up after hibernation" that would narrow things down to a smaller bit of code that I could instrument to see if we can track things down a little better.

That code is run by Django -- our own Django instance, not yours.

Makes sense. That may be why mine ends up asking for a CSRF token. It makes so much sense that I think this could also explain the Flask 403s you mentioned a few posts earlier. It would also explain why GET hits aren't affected by any of this, but POSTs are... Feels like it's all clear now. :p

If we could narrow the problem down to "this only happens when the website is woken up after hibernation" that would narrow things down to a smaller bit of code that I could instrument to see if we can track things down a little better.

Well, my friends and I (and sometimes even random people) use the bots I'm developing on a daily basis, so I can't guarantee the site will ever go 26 hours without a hit. But let's try it! And if it's your Django instance that runs while mine is being woken up, the middleware I wrote is completely disposable; I'll remove it after the next 403.

Side question

Does the GET request from python-requests that my app receives have anything to do with the "reload webapp" button on the Web tab?

Just had a eureka moment -- I think there is a way this could be triggered. The symptoms of this code path would be:

  • Website gets stopped for some reason -- perhaps by hibernation, but importantly this could also happen if the web server your app is running on got rebooted, or if your site was temporarily down for another reason.
  • After this happens, your web app gets POST requests, which should start it up.
  • Those POST requests would just lead to you getting CSRF errors.
  • But if you did a GET request to any page on your site, it would spin up and start handling things normally.

This one doesn't depend on the 26-hour hibernation window -- it could happen if the web server was rebooted, or for any number of other intermittent reasons.

So, my question is: are almost all of the requests coming into your web app POSTs? And if they are, does any GET to the app fix it without reloading?
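If a GET does wake the app up, a scheduled task that requests any page on the site is one possible stopgap until the underlying problem is fixed. A minimal standard-library sketch; the URL in the usage note is an assumption inferred from the WSGI file name, and the opener parameter exists only so the function can be tested without network access:

```python
import urllib.error
import urllib.request


def warm_up(url, opener=urllib.request.urlopen, timeout=30):
    """Send a plain GET so that a stopped web app gets started before the
    next webhook POST arrives. Returns the HTTP status code."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A 4xx/5xx still means the request reached the server.
        return err.code
```

Usage: `warm_up("https://fawerstelebot.pythonanywhere.com/")` from a scheduled task.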

We've just pushed a system patch that fixes that problem. It looks like if a web app was shut down for some reason (system reboot, certain kinds of glitch, maybe hibernation) then only a GET request would wake it up. POST requests, in particular, would be rejected with a CSRF error, and the app wouldn't be woken up. So if your app is one that processes mostly POST requests, you'd see this problem.

Our new code wakes up the app when a POST is received. One slight issue remains -- the first POST request that wakes it up will receive a "503 Service Unavailable" response with the "retry-after" header set to "5". If you handle this and do the retry, then the next request will work. We believe that browsers do that automatically, but unfortunately the requests library doesn't, and it's possible that the Telegram webhooks don't either.
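For clients you do control (a script using urllib, for instance), honouring the 503 and its Retry-After header might look like the sketch below. send is a hypothetical zero-argument callable that performs the POST, for example `lambda: urllib.request.urlopen(req)`; Telegram's own retry behaviour is outside the app's control:

```python
import time
import urllib.error


def send_with_retry(send, max_attempts=3, default_delay=5, sleep=time.sleep):
    """Call send() and retry when it raises an HTTPError with status 503,
    waiting as long as the Retry-After header asks. Any other error, or a
    503 on the last attempt, is re-raised."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 503 or attempt == max_attempts:
                raise
            retry_after = (err.headers or {}).get("Retry-After")
            # Retry-After may be a number of seconds or an HTTP date; only the
            # seconds form is handled here, otherwise fall back to default_delay.
            if retry_after is not None and retry_after.strip().isdigit():
                delay = int(retry_after)
            else:
                delay = default_delay
            sleep(delay)
```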

Hi Giles,

Thanks for the updates.

So, my question is: are almost all of the requests coming into your web app POSTs?

About 99% of them, I'd say?

We've just pushed a system patch that fixes that problem. (...) Our new code wakes up the app when a POST is received.

Great! I appreciate that!

the first POST request that wakes it up will receive a "503 Service Unavailable" response with the "retry-after" header set to "5".

Nice. That helps a lot anyway. Telegram keeps resending a webhook request until the server responds with 200, or until some amount of time has passed; I'd say it tries for 5 to 10 seconds before giving up. So even if the server first responds with 503, the app should still receive and process the next attempt, even after the 5-second wait.


I appreciate all your support, Giles. Really. This was the biggest problem I had with your service, and yet you guys have managed to fix it gracefully. Thanks a LOT!

Fabricio
fswerneck | fawerstelebot

Glad we could help! And thank you for posting about it; we'd seen this problem occasionally with some users' websites in the past, but we'd never managed to work out what the cause was. It was you giving so much information, and sticking with us while we tried to debug it -- combined with another equally helpful person with the same problem contacting us over email -- that helped us finally work out what the problem was. Really glad to have it fixed!

Really glad to have it fixed!

I guess today's tea time was good! heheh.

SOLVED AND CLOSED.

              / ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄\    
<<<<<<:>~  <   Yay!           |
              \_________/