Deploying SciPy on AWS

Fun and games with Elastic Beanstalk config files

This is a bit of a technical post for RiskTree, but I thought that it would be worth making this available in case anyone else has the same problem. We’re currently developing some new features for RiskTree (hint: Monte Carlo risk analysis) and these require some sophisticated mathematical processing. The best way to do this, without reinventing the wheel, is by using the SciPy library.

All went well in developing the new code, and it was soon up and running in the development environment. This was looking good. And then we pushed it into our test environment, hosted in AWS Elastic Beanstalk (EB).

Here’s where things went wrong, and in an unusual way. Normally when we deploy, one of two things happens. Most often, the new build just works. Occasionally we get the dreaded ‘Internal Server Error’ message and realize that one of the Python packages has a versioning problem. This time, the application just hung. There was no response at all until the Elastic Load Balancer eventually returned a gateway timeout. So we checked the EB logs, and these were equally unhelpful:

Script timed out before returning headers: application.py

So what was going on? We checked the packages and the dependencies for SciPy, which was the only server-side change we were making. Everything matched our locally hosted development environment. Next we started searching for information about SciPy incompatibilities with EB, and there is surprisingly little to be found. Fortunately, one of the results was a blog post by David McKeown, who had run into exactly the same problem. Despite, by his own admission, having little experience of Python, he carefully and forensically broke the problem down and traced the cause to running SciPy under WSGI in the Apache server that Elastic Beanstalk uses. The problem appears to lie in running C extension modules, which NumPy (a dependency of SciPy) uses, in that environment. Even better, David’s blog post provided a solution, and we were able to take his code and rework it to fix the problem for RiskTree.
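To see which parts of NumPy are compiled C extensions, the kind of module identified above as the culprit, one quick check is to look at the file each loaded module came from. The sketch below is a hypothetical helper (is_c_extension and compiled_submodules are illustrative names, not part of RiskTree or David’s fix). It only lists modules; it does not reproduce the hang.

```python
import importlib.machinery
import sys
import types
from typing import List


def is_c_extension(module: types.ModuleType) -> bool:
    """Return True if the module was loaded from a compiled extension file."""
    path = getattr(module, "__file__", None)
    if path is None:
        # Modules built into the interpreter itself have no __file__.
        return False
    # EXTENSION_SUFFIXES holds the platform's suffixes, e.g. ".so" or ".pyd".
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


def compiled_submodules(package: str) -> List[str]:
    """List already-imported modules in `package` that are C extensions."""
    prefix = package + "."
    return sorted(
        name
        for name, mod in list(sys.modules.items())
        if mod is not None
        and (name == package or name.startswith(prefix))
        and is_c_extension(mod)
    )


if __name__ == "__main__":
    try:
        import numpy  # noqa: F401  (imported only so its submodules load)
    except ImportError:
        print("NumPy is not installed")
    else:
        for name in compiled_submodules("numpy"):
            print(name)
```

Run in a local environment with NumPy installed, this prints the compiled pieces that NumPy pulls in as soon as it is imported.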

The solution is to add a configuration file to the .ebextensions folder. If this folder doesn’t already exist, create it in the root directory of your Elastic Beanstalk project. The configuration file can be called anything you like, but it must have a .config extension. If you already have other config files here, remember that they are processed in alphabetical order by filename. We called ours 02_scipy.config so that we’d remember what it was for, and because we already had a config file starting with 01.
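Because processing order follows the filenames, a numeric prefix is an easy way to control it. The sketch below models that rule under a simplifying assumption: plain, case-sensitive lexicographic sorting of the .config files. This is our reading of the ordering described above, not a reimplementation of EB’s own logic, and ebextension_order and list_ebextensions are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterable, List


def ebextension_order(filenames: Iterable[str]) -> List[str]:
    """Return .config files in alphabetical order, ignoring other files.

    Assumption: plain lexicographic sorting by filename.
    """
    return sorted(name for name in filenames if name.endswith(".config"))


def list_ebextensions(project_root: str) -> List[str]:
    """Apply ebextension_order to a project's .ebextensions folder."""
    folder = Path(project_root) / ".ebextensions"
    if not folder.is_dir():
        return []
    return ebextension_order(p.name for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    print(list_ebextensions("."))
```

Lexicographic order is why zero-padded prefixes such as 01 and 02 are safer than bare numbers: "10_late.config" sorts before "2_early.config".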

The file contents are as follows (written in YAML):

# This file is essential for loading the SciPy library on EB. Without it, requests to the application hang
# Reference: https://medium.com/@DaveJMcKeown/deploying-scipy-into-aws-elastic-beanstalk-2e5e481155de

container_commands:  
  AddGlobalWSGIGroupAccess: 
    command: "if ! grep -q 'WSGIApplicationGroup %{GLOBAL}' ../wsgi.conf ; then echo 'WSGIApplicationGroup %{GLOBAL}' >> ../wsgi.conf; fi;"
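The shell one-liner is an idempotent “check, then append”: grep -q looks for the directive anywhere in the file, and echo ... >> adds it only if it is missing, so redeploying never duplicates it. To make that logic explicit, here is a rough Python equivalent. It is a sketch only: ensure_directive and DIRECTIVE are hypothetical names, and EB itself runs the shell command, not this code.

```python
import os
import tempfile
from pathlib import Path

DIRECTIVE = "WSGIApplicationGroup %{GLOBAL}"


def ensure_directive(conf_path: str, directive: str = DIRECTIVE) -> bool:
    """Append `directive` to the file unless it already appears in it.

    Mirrors the grep/echo one-liner. Returns True if the file was changed.
    """
    path = Path(conf_path)
    text = path.read_text()
    if directive in text:
        return False  # already present, so running again is a no-op
    # Unlike a bare `echo >>`, start a fresh line if the file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as f:
        f.write(prefix + directive + "\n")
    return True


if __name__ == "__main__":
    # Demo on a throwaway file standing in for wsgi.conf.
    fd, demo_path = tempfile.mkstemp(suffix=".conf")
    os.close(fd)
    Path(demo_path).write_text("<VirtualHost *:80>\n</VirtualHost>\n")
    print(ensure_directive(demo_path))  # True: directive appended
    print(ensure_directive(demo_path))  # False: already present
    os.remove(demo_path)
```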

We’ve deliberately included a link to David’s blog post so that anyone else looking at the file can see why it does what it does. The key part is the container_commands section. Its command checks whether the directive WSGIApplicationGroup %{GLOBAL} is already present in the wsgi.conf file and, if not, appends it. In short, this tells mod_wsgi to run the application in Python’s main interpreter rather than in a sub-interpreter, which C extension modules such as NumPy’s do not always support. For the full technical detail, please go and read David’s blog post.
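If you want to confirm that the directive has taken effect, one option is a throwaway diagnostic WSGI callable like the sketch below. It relies on the mod_wsgi.application_group key that, according to the mod_wsgi documentation, mod_wsgi adds to the WSGI environ, where an empty string indicates the global application group (the main interpreter). The callable is named application because that is what EB looks for in application.py by default; everything else here is hypothetical and not part of RiskTree or David’s fix.

```python
def application(environ, start_response):
    """Hypothetical diagnostic app: report which mod_wsgi application group is in use."""
    group = environ.get("mod_wsgi.application_group")
    if group is None:
        # The key is only set by mod_wsgi, e.g. it is absent under wsgiref.
        message = "Not running under mod_wsgi"
    elif group == "":
        # An empty group name means the global group, i.e. the main interpreter.
        message = "Main interpreter (WSGIApplicationGroup %{GLOBAL})"
    else:
        message = "Sub-interpreter: " + group
    body = message.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Served locally with wsgiref.simple_server, it will report that it is not running under mod_wsgi; on the EB instance, after the config file has been applied, it should report the main interpreter.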

We’re posting this here on our blog not to claim credit, but to help anyone else who has the same problem. SciPy is a major Python library, and we can’t be the only ones who need to deploy it to Elastic Beanstalk. David wrote his original post back in 2015; by posting this, we hope to let people know that his neat solution is just as relevant today.