
Call Python scripts via a PHP cron job on an external website?

Hello. I am developing a football predictions website. Part of it uses the Understat library, which uses the BeautifulSoup library to import some data from Understat.com.

An example Python script using Understat is this:

import asyncio
import json

import aiohttp
from understat import Understat


async def main():
    async with aiohttp.ClientSession() as session:
        understat = Understat(session)
        fixtures = await understat.get_league_results(
            "epl",
            2020
        )
        print(json.dumps(fixtures))


loop = asyncio.get_event_loop()
loop.run_until_complete(main())

On my local development site I have been calling my Python script from a PHP script with this code, which captures the JSON that the Python script prints:

$pyscript = 'get_league_results_20_21.py';
$python = 'C:/Users/Nick/AppData/Local/Programs/Python/Python38/python.exe';
$cmd = "$python $pyscript";
exec($cmd, $output);
$results = json_decode($output[0], true);

When my site is live, 3 such scripts need to be run daily or even hourly via a cron job.

My hosting provider has told me their shared hosting does not support Python and I would need to spend around 5x as much to upgrade to a VPS, which would be over-budget.

Can the Understat library and Python scripts be hosted on pythonanywhere.com and called from a PHP file on another host? If so, can the free version do that?

(I did open an account here, set up a Python 3.8 virtual environment, and installed Understat in it via pip, but got stuck after that and haven't been able to get my scripts working so far.)

You'd need a paid account for that, because free accounts can only schedule one task to run once a day. But with that, you should be able to get it to work. The trick would be to write your output JSON data to a file, and then to set up a simple static website that served up those files and nothing else. If you gave them a hard-to-guess filename and thus URL, like https://nickhope.pythonanywhere.com/daea4a00-4c4e-4684-9d83-7e17cb476c1f/file1.json, and only accessed them from your PHP code using an HTTPS URL, then you'd essentially have built a private API for your data.
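A minimal sketch of the "write your output JSON data to a file" step, assuming the files live in a directory you map as static files on the Web tab. The function name write_json_atomically is hypothetical, not part of Understat or PythonAnywhere. It writes to a temporary file first and then swaps it in, so a PHP request that arrives mid-update never gets a half-written file.

```python
import json
import os
import tempfile


def write_json_atomically(data, path):
    """Write data to path as JSON, replacing any existing file in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file in the same directory first, so a client that
    # fetches the URL mid-update never receives a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        # mkstemp creates owner-only (0600) files; make it world-readable so a
        # web server running as a different user can still serve it.
        os.chmod(tmp_path, 0o644)
        # os.replace swaps the new file in atomically on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

In the Understat script, the print(json.dumps(fixtures)) line would then become something like write_json_atomically(fixtures, "/home/nickhope/static/<hard-to-guess-name>/file1.json"), where that directory is a hypothetical example of the folder behind your static-files mapping.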

Thank you very much, Giles. That sounds feasible. I was hoping that, with the remote web server doing the hourly or daily scheduling via cron jobs (rather than PythonAnywhere doing it), there might be a way to run the Understat Python script on PythonAnywhere, triggered by a call from the PHP script on the remote server, and for the JSON output to become directly available to that PHP script as soon as the Python script has run. It is only a backend database update script and is not run on demand when users visit the site, so speed is relatively unimportant. In other words, my PHP script would look and work much like the one in my first post, but with paths on PythonAnywhere to '.../get_league_results_20_21.py' and '.../python.exe'.

Does that make sense? I am new to much of this so I may well be imagining something that is ridiculous.

A specific question: Is it possible to test your suggested method with a free account, accepting that the task can only run once a day?

You can't do it like that. You could create a web app with an endpoint that triggers the execution of some code, but that's not a way to run long-running scripts. Paid accounts have always-on tasks (https://help.pythonanywhere.com/pages/AlwaysOnTasks/), so you could create an endpoint that triggers something your always-on task is watching for.
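A rough sketch of the always-on side of that idea, assuming a hypothetical flag file that the web endpoint creates whenever the PHP cron job asks for an update. The flag path, the 30-second interval and the job callable are illustrative choices, not PythonAnywhere APIs.

```python
import os
import time


def run_if_requested(flag_path, job):
    """Run job() once if flag_path exists; return True if it ran."""
    try:
        # Remove the flag before running the job, so a request that arrives
        # while the job is still running recreates it and triggers another run.
        os.remove(flag_path)
    except FileNotFoundError:
        return False
    job()
    return True


def poll_forever(flag_path, job, interval_seconds=30):
    """Main loop for an always-on task: check for the flag every interval."""
    while True:
        run_if_requested(flag_path, job)
        time.sleep(interval_seconds)
```

The always-on task would call poll_forever with your own update function and a path such as the hypothetical "/home/nickhope/understat/run-update.flag", and the endpoint would only need to create that file (for example with open(path, "a").close()) and return immediately, so the slow scraping never runs inside a web request.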

Thank you fjl.

So I intend to set things up to serve JSON files via a simple static website as Giles suggested. Before switching to a paid account I'm trying to test that things will work, but I'm stuck...

I set up a Python 3.8 virtual environment at /home/nickhope/.virtualenvs/understat, installed the Understat library and its dependencies using pip, and verified they are there with 'pip list'.

I made a folder /home/nickhope/understat/ and uploaded my 3 simple python scripts. One of them, for example, is called get_league_results_20_21.py and contains the exact code in the upper code block in my original post.

On the Web tab I set these:

Virtualenv: /home/nickhope/.virtualenvs/understat
Source code: /home/nickhope/understat
Working directory: /home/nickhope/
WSGI configuration file: /var/www/nickhope_pythonanywhere_com_wsgi.py

I don't know what should go in my nickhope_pythonanywhere_com_wsgi.py file. So far I have this:

import sys
path = '/home/nickhope/understat'
if path not in sys.path:
    sys.path.append(path)
from my_wsgi_file import application  # noqa

I am pretty sure the last line needs editing but I don't know what to change it to. I have done a lot of research on this but nearly everything I find is for Django, Flask etc., which I'm not using.

Finally, I tried running get_league_results_20_21.py in a console. The first time I ran it I got:

ModuleNotFoundError: No module named 'aiohttp'

After commenting that line out and running it again I got:

ModuleNotFoundError: No module named 'understat'

It looks like I have some fundamental things wrong. Any help would be greatly appreciated.

If you're not using Flask or Django, what web framework are you using? If you're not using a web framework, then you'd have to be writing raw WSGI code.
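For reference, raw WSGI code is just a callable that takes environ and start_response and returns an iterable of bytes. This sketch serves JSON files by name from a directory; DATA_DIR and make_json_app are hypothetical names, and a static-files mapping on the Web tab does the same job without any code.

```python
import os
import re

DATA_DIR = "/home/nickhope/data"  # hypothetical directory holding the JSON files

# Only allow plain file names, so a request can't escape DATA_DIR with "../".
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.json")


def make_json_app(data_dir):
    def application(environ, start_response):
        name = environ.get("PATH_INFO", "").lstrip("/")
        path = os.path.join(data_dir, name)
        if not SAFE_NAME.fullmatch(name) or not os.path.isfile(path):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not found\n"]
        with open(path, "rb") as f:
            body = f.read()
        start_response("200 OK", [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return application


# The WSGI server looks up this module-level callable.
application = make_json_app(DATA_DIR)
```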

Your import errors are probably because you have not activated your virtualenv in the console you're using (assuming you installed aiohttp into your virtualenv). The early parts of this page describe the basics of using virtualenvs in consoles.
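A small diagnostic sketch for seeing which interpreter a script actually runs under (for example the Files tab's Run button versus a bash console) and whether it can find the two modules from the errors above. The function name is hypothetical; the in_virtualenv check assumes venv or virtualenv 20+, which set sys.base_prefix.

```python
import importlib.util
import sys


def environment_report(modules=("aiohttp", "understat")):
    """Report the running interpreter and which top-level modules it can import."""
    return {
        "executable": sys.executable,
        # Inside a venv/virtualenv 20+, sys.prefix points at the environment
        # while sys.base_prefix still points at the base installation.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # find_spec returns None when a top-level module cannot be found.
        "importable": {
            name: importlib.util.find_spec(name) is not None for name in modules
        },
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it once with plain python and once after source /home/nickhope/.virtualenvs/understat/bin/activate should show whether the difference is just which interpreter is being used.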

Thanks for your reply. I'm not using a web framework (as far as I know). I'm just trying to run simple scripts like that seen in my original post. I didn't write the script myself. It came from the Understat documentation.

aiohttp was installed as a dependency in my virtual environment when I ran 'pip install understat'.

I got the 'ModuleNotFoundError' errors at the bottom of my last comment when I navigated to the get_league_results_20_21.py file through the "Files" tab and ran it using the ">>>Run" button.

So when I run it in a bash console using 'python get_league_results_20_21.py' I get a bit further, but it fails with the error messages shown below. I suppose these could be errors in my script, the library, or even the site it's connecting to, but it does run for me locally.

Traceback (most recent call last):
  File "/home/nickhope/.virtualenvs/understat/lib/python3.8/site-packages/aiohttp/connector.py", line 936, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore  # noqa
  File "/usr/lib/python3.8/asyncio/base_events.py", line 1017, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.8/asyncio/base_events.py", line 1002, in create_connection
    sock = await self._connect_sock(
  File "/usr/lib/python3.8/asyncio/base_events.py", line 916, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.8/asyncio/selector_events.py", line 485, in sock_connect
    return await fut
  File "/usr/lib/python3.8/asyncio/selector_events.py", line 517, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('87.236.16.151', 443)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "get_league_results_20_21.py", line 26, in <module>
    loop.run_until_complete(main())
  File "/usr/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "get_league_results_20_21.py", line 13, in main
    fixtures = await understat.get_league_results(
  File "/home/nickhope/.virtualenvs/understat/lib/python3.8/site-packages/understat/understat.py", line 95, in get_league_results
    dates_data = await get_data(self.session, url, "datesData")
  File "/home/nickhope/.virtualenvs/understat/lib/python3.8/site-packages/understat/utils.py", line 55, in get_data
    html = await fetch(session, url)
  File "/home/nickhope/.virtualenvs/understat/lib/python3.8/site-packages/understat/utils.py", line 29, in fetch
    async with session.get(url) as response:
  File "/home/nickhope/.virtualenvs/understat/lib/python3.8/site-packages/aiohttp/client.py", line 1012, in __aenter__
    self._resp = await self._coro
  File "/home/nickhope/.virtualenvs/understat/lib/python3.8/site-packages/aiohttp/client.py", line 480, in _request
    conn = await self._connector.connect(
  File "/home/nickhope/.virtualenvs/understat/lib/python3.8/site-packages/aiohttp/connector.py", line 523, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/home/nickhope/.virtualenvs/understat/lib/python3.8/site-packages/aiohttp/connector.py", line 858, in _create_connection
    _, proto = await self._create_direct_connection(
  File "/home/nickhope/.virtualenvs/understat/lib/python3.8/site-packages/aiohttp/connector.py", line 1004, in _create_direct_connection
    raise last_exc
  File "/home/nickhope/.virtualenvs/understat/lib/python3.8/site-packages/aiohttp/connector.py", line 980, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "/home/nickhope/.virtualenvs/understat/lib/python3.8/site-packages/aiohttp/connector.py", line 943, in _wrap_create_connection
    raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host understat.com:443 ssl:default [Connect call failed ('87.236.16.151', 443)]

Free accounts can only connect to sites on the whitelist, so you would not be able to connect to an arbitrary IP like "87.236.16.151" (understat.com). See https://www.pythonanywhere.com/whitelist/

Ah, I see. I upgraded to a $5 account and have been able to get this done in the way Giles suggested. Thank you very much for your help, giles, fjl & glenn.

Although things are working, I just wanted to clear this up. Apologies for my Python inexperience...

This is my current WSGI configuration file at /var/www/nickhope_pythonanywhere_com_wsgi.py:

import sys
path = '/home/nickhope/understat'
if path not in sys.path:
    sys.path.append(path)
from my_wsgi_file import application  # noqa

As I'm not using a framework, I've only uncommented some of the CUSTOM WSGI section. Is the last line correct or even necessary? I guess "my_wsgi_file" and "application" are placeholders but I'm not sure exactly what to replace them with. Not surprisingly I got some "No module named 'my_wsgi_file'" errors in my error log yesterday.

Or perhaps none of the lines are necessary in my case?

It's a placeholder, but there must be an application available in the WSGI file, as a callable named "application" that is either imported or defined there. The WSGI protocol requires it.
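If nothing in /home/nickhope/understat needs to be imported by the web app, a sketch of a WSGI file that drops the my_wsgi_file import and defines application inline could look like this. It assumes the JSON files are served through a static-files mapping on the Web tab, which as I understand it is handled before requests reach the WSGI app, so this placeholder only answers other URLs.

```python
# Hypothetical contents for /var/www/nickhope_pythonanywhere_com_wsgi.py.
# No sys.path changes are needed because nothing is imported from the project.


def application(environ, start_response):
    """Minimal WSGI app: reply 404 to any request the static mapping doesn't serve."""
    body = b"Not found\n"
    start_response("404 Not Found", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```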