Deploying SciPy on AWS

Fun and games with Elastic Beanstalk config files

This is a bit of a technical post for RiskTree, but I thought that it would be worth making this available in case anyone else has the same problem. We’re currently developing some new features for RiskTree (hint: Monte Carlo risk analysis) and these require some sophisticated mathematical processing. The best way to do this, without reinventing the wheel, is by using the SciPy library.

All went well in developing the new code, and it was soon up and running in the development environment. This was looking good. And then we pushed it into our test environment, hosted in AWS Elastic Beanstalk (EB).

Here’s where things went wrong, and in an unusual way. Normally when we deploy, one of two things happens. Most frequently, the new build just works. Occasionally we get the dreaded ‘Internal Server Error’ message, and realize that one of the Python packages has had a problem with versioning. This time, it just hung. No response at all, until a gateway timeout from the Elastic Load Balancer occurred. So we checked the EB logs, and these were equally unhelpful:

Script timed out before returning headers: application.py

So what was going on? We checked the packages and the dependencies for SciPy, which was the only server-side change that we were making. Everything matched with our locally hosted Dev environment. Next we started searching for information about SciPy incompatibilities with EB. There is surprisingly little to be found. Fortunately, one of the results was a blog by David McKeown, who had run into exactly the same problem. Despite, by his own admission, having little experience of Python, he carefully and forensically broke the problem down into pieces and determined the cause to be running SciPy with WSGI in the Apache server used in Elastic Beanstalk. It appears to be a problem with running C extension modules, which are used by NumPy, which is a dependency for SciPy to run. Even better, David’s blog post provided a solution. We were able to take his code and rework it to fix the problem for RiskTree.

The solution is to add a configuration file to the .ebextensions directory. If this directory doesn’t already exist, you need to create it under the root directory of your Elastic Beanstalk project. The configuration file can be called anything you like, but must have a .config extension. If you have other config files here already, remember that they will be run in alphabetical order by filename. We called ours 02_scipy.config so that we’d remember what it was for, and because we already had a config file that started with 01.

The file contents are as follows (written in YAML):

# This file is essential for EB loading the SciPy library. Without it, EB will hang
# Reference: https://medium.com/@DaveJMcKeown/deploying-scipy-into-aws-elastic-beanstalk-2e5e481155de

container_commands:
  AddGlobalWSGIGroupAccess:
    command: "if ! grep -q 'WSGIApplicationGroup %{GLOBAL}' ../wsgi.conf ; then echo 'WSGIApplicationGroup %{GLOBAL}' >> ../wsgi.conf; fi;"

We’ve deliberately included a link to David’s blog post so that anyone else looking at the file can see why it does what it does. The key part is the container_commands section. This checks whether the WSGIApplicationGroup %{GLOBAL} directive is already present in the wsgi.conf file, and if not, appends it. In short, the directive makes mod_wsgi run the application in the main Python interpreter rather than in a sub-interpreter, which is what C extension modules such as those in NumPy expect. If you want to know more about why this is necessary, please go and read David’s blog post, which provides the technical detail.
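For anyone who finds the one-line shell command hard to read, here is a rough Python sketch of the same "append only if missing" logic. It is purely illustrative and is not what runs on Elastic Beanstalk: the function name ensure_directive and the file path passed to it are our own assumptions for the example.

```python
from pathlib import Path

# The directive that the container command appends to wsgi.conf.
DIRECTIVE = "WSGIApplicationGroup %{GLOBAL}"


def ensure_directive(conf_path: Path, directive: str = DIRECTIVE) -> bool:
    """Append `directive` to the file unless it is already there.

    Returns True if the file was changed, False if nothing needed doing.
    (Hypothetical helper, written only to illustrate the shell one-liner.)
    """
    text = conf_path.read_text() if conf_path.exists() else ""
    if directive in text:
        return False  # same outcome as grep -q succeeding: nothing to append

    # Unlike a bare `echo >>`, we also make sure the directive starts on
    # its own line if the file does not end with a newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with conf_path.open("a") as handle:
        handle.write(f"{separator}{directive}\n")
    return True
```

Because the check comes before the append, running it on every deployment is safe: the directive ends up in the file exactly once, however many times you deploy.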

We’re posting this here on our blog not to claim credit, but to help anyone else who has the same problem. SciPy is a major Python library, and there must be other people who will need to deploy it to Elastic Beanstalk. David wrote his original post back in 2015; by posting this, we hope that people will know that this neat solution is just as relevant today.

May update for RiskTree

The latest update for RiskTree has just gone live. There aren’t any big, flashy features this time – just a number of incremental improvements that will make your experience smoother and easier. The most noticeable change will be at login: instead of including a field for the MFA code with the username and password, we’ve moved this to a separate box. Everyone will enter their standard credentials, but only users who have enabled MFA will be prompted for their code. It’s a small change, but it makes the login process a little bit clearer!

Another change is the ability to save files in a compressed format. This is particularly useful for the risk assessment reports created by RiskTree Processor, as these files can get rather large. The new compressed format has the extension .rtz (RiskTree zipped), and this can be used for loading and saving both RiskTree files and reports.

Please continue to let us have your suggestions for improvements and features for RiskTree. We’re driven by our users’ requests, and they all help to make the process better for everyone.

Principles are good, but exceptions are OK

Recently we discussed the security of RiskTree with a client, who ran through the NCSC Cloud Security Principles. Since RiskTree is delivered as software-as-a-service, this made sense. One point that arose was the lack of Multi-Factor Authentication (MFA) in use: CSP Principle 10 states that 2FA is ‘considered good practice’, using either a hardware or software token or out of band challenge.

We subsequently implemented MFA, using the Time-based One-time Password Algorithm. This allows users to enter a code (either manually, or by scanning a QR code) into an app on their mobile phone, such as Google Authenticator, Microsoft Authenticator, or LastPass Authenticator. We haven’t mandated its use, preferring to let our users decide whether they need it. We’re allowing our users to make the decision about their data, because it isn’t ours to make. To help inform their decision, we used RiskTree to analyse the risks of client data being stolen from RiskTree, and put the information onto the site as a demo (https://risktree.2t-security.co.uk/demo). Consequently, some users have enabled it, and some haven’t.
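For readers curious about what an authenticator app is actually computing, here is a minimal sketch of the Time-based One-time Password Algorithm as described in RFC 6238, using only the Python standard library. This is not RiskTree’s implementation: the function name totp and its default parameters (30-second step, six digits, HMAC-SHA1) are assumptions chosen to match common authenticator app defaults. A real service would normally rely on a maintained library instead.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Return the TOTP code for `secret` at `at_time` (seconds since epoch).

    Illustrative sketch of RFC 6238 with HMAC-SHA1; names are hypothetical.
    """
    if at_time is None:
        at_time = time.time()

    # Number of whole time steps since the Unix epoch, as an 8-byte counter.
    counter = struct.pack(">Q", int(at_time // step))
    digest = hmac.new(secret, counter, hashlib.sha1).digest()

    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte pick
    # an offset, and 31 bits are read from there.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF

    return str(value % 10 ** digits).zfill(digits)


# RFC 6238 test vector: ASCII secret, time = 59 s, 8 digits.
print(totp(b"12345678901234567890", at_time=59, digits=8))  # 94287082
```

The server and the app share the secret (this is what the QR code carries, usually base32-encoded), so both can derive the same short-lived code without it ever being sent in advance.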

The key question though, and the reason for this blog post, is “what are we actually protecting?” RiskTree stores no risk data from any of our clients – all of the data are handled locally in the browser. For the risk calculations, just a node reference and the six assessment values are transferred to the servers for processing into the risk assessment and for creation of the data charts and visualizations, making the most of the power of cloud processing for the intensive number-crunching part of the work. The only data that we hold about our users are their name, e-mail address, and organization, and we don’t show the latter two, even if a user is logged in. There is no ability for the user to change any of these, so the risk of data tampering has been removed. The only thing that can be achieved by stealing credentials is that the attacker can use RiskTree without paying.

This leads us to conclude that for RiskTree, MFA is overkill. We’ve designed our SaaS to avoid holding any data – to an extent it’s almost like a lambda, in that a user throws some data at it and it returns some results. Whilst the principle of applying MFA to SaaS is sound, it isn’t important if there isn’t anything to protect. Having security principles is a great idea, but following them slavishly isn’t. When performing a risk assessment, always keep the context in mind, and be prepared to have exceptions.

Introducing RiskFlow™

A RiskFlow diagram, created using RiskTree

We’ve added a new data visualization tool to RiskTree. It’s a Sankey diagram for risk, showing how your risks change as you introduce countermeasures. The example above shows a typical diagram, with risks moving from intrinsic on the left, to residual in the middle, to target on the right. Residual risks are risks that have been mitigated to some extent, through the application of countermeasures. Target risks show the effect of future countermeasures – this could be a ‘what if’ scenario, or could be taken from a project plan showing that they will be introduced at a future date.

The thickness of the arrows reflects the number of risks moving between each block. So, of the eight intrinsic risks at Very High, when their residual status is assessed, three remain at this level, one becomes High, three become Medium-High, and one becomes Medium.
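To make the idea of arrow thickness concrete, here is a small Python sketch that counts how many risks move between each pair of levels, using the eight Very High risks from the example above. It is only an illustration of the counting step and not how RiskTree builds the diagram; the names transitions and link_weights are made up for this example.

```python
from collections import Counter

# One (intrinsic level, residual level) pair per risk, taken from the
# worked example: eight risks that all start at Very High.
transitions = (
    [("Very High", "Very High")] * 3
    + [("Very High", "High")] * 1
    + [("Very High", "Medium-High")] * 3
    + [("Very High", "Medium")] * 1
)


def link_weights(pairs):
    """Count the risks flowing between each pair of levels.

    Each count is the weight (thickness) of one arrow in a Sankey diagram.
    """
    return Counter(pairs)


weights = link_weights(transitions)
for (source, target), count in weights.items():
    print(f"{source} -> {target}: {count}")
```

The four counts add up to eight, so every intrinsic risk is accounted for by exactly one arrow leaving the Very High block.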

The diagram is interactive. As you move your mouse over the blocks and the arrows, a pop-up box lists the risks. These are all linked back to the main risk table in RiskTree, so you can drill into the data in the same way you can with all of our other charts.

The RiskTree Processor tabs, showing the location of RiskFlow

As with all of our other upgrades, RiskFlow will work for reports that have been previously created. Just reload the report file and you should see the new tab in the Risk Charts section.

Reaction to RiskFlow from our testers has been uniformly positive. We hope that you’ll feel the same way when you get to see it.

RiskTree update

We’ve just updated RiskTree ready for 2020. The changes include:

  • You can now view charts showing how you’ve used tags in your RiskTree and reports.
  • The search functionality uses tabs to display its results (much clearer!) and is now available in the RiskTree Processor, as well as the Designer.
  • The risk labels on the RiskSpider and RiskGraph data visualizations can now be wrapped to two lines (no more truncated risk names!).
  • Improvements to the CSV file downloads.
  • A number of other minor tweaks and improvements.

Hello!

Welcome to the 2T Security blog. We’ve created this so that we can write about security and risk topics that interest us, and that might be of interest to the community. We’ll also be writing about our RiskTree process for risk management, especially where this doesn’t fit in with the standard help pages.

We’re looking forward to the conversation!