Deploying SciPy on AWS

Fun and games with Elastic Beanstalk config files

This is a bit of a technical post for RiskTree, but I thought that it would be worth making this available in case anyone else has the same problem. We’re currently developing some new features for RiskTree (hint: Monte Carlo risk analysis) and these require some sophisticated mathematical processing. The best way to do this, without reinventing the wheel, is by using the SciPy library.

All went well in developing the new code, and it was soon up and running in the development environment. This was looking good. And then we pushed it into our test environment, hosted in AWS Elastic Beanstalk (EB).

Here’s where things went wrong, and in an unusual way. Normally when we deploy, one of two things happens. Most frequently, the new build just works. Occasionally we get the dreaded ‘Internal Server Error’ message, and realize that one of the Python packages has had a versioning problem. This time, it just hung: no response at all, until the Elastic Load Balancer returned a gateway timeout. So we checked the EB logs, and these were equally unhelpful:

Script timed out before returning headers:

So what was going on? We checked the packages and dependencies for SciPy, which was the only server-side change that we were making. Everything matched our locally hosted Dev environment. Next we started searching for information about SciPy incompatibilities with EB, and there is surprisingly little to be found. Fortunately, one of the results was a blog by David McKeown, who had run into exactly the same problem. Despite, by his own admission, having little experience of Python, he carefully and forensically broke the problem down into pieces and determined the cause to be running SciPy under WSGI in the Apache server used by Elastic Beanstalk. It appears to be a problem with running C extension modules, which NumPy (a dependency of SciPy) relies on. Even better, David’s blog post provided a solution, and we were able to rework his code to fix the problem for RiskTree.

The solution is to add a configuration file to the .ebextensions directory. If this directory doesn’t already exist, create it under the root directory of your Elastic Beanstalk project. The file can be called anything you like, but must have a .config extension. If you already have other config files there, remember that they are run in alphabetical order by filename. We called ours 02_scipy.config so that we’d remember what it was for, and because we already had a config file starting with 01.

The file contents are as follows (written in YAML):

# This file is essential for EB loading the SciPy library. Without it, EB will hang.
# Reference:

container_commands:
  01_wsgi_global:    # the command name is arbitrary; commands run in alphabetical order
    command: "if ! grep -q 'WSGIApplicationGroup %{GLOBAL}' ../wsgi.conf ; then echo 'WSGIApplicationGroup %{GLOBAL}' >> ../wsgi.conf; fi;"

We’ve deliberately included a link to David’s blog post so that anyone else looking at the file can see why it does what it does. The key part is the container_commands section. This checks to see if the WSGIApplicationGroup %{GLOBAL} is already present in the wsgi.conf file, and if not, adds it in. If you want to know why this is necessary, please go and read David’s blog post, which provides the technical detail.
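The shell one-liner packs the whole fix into a single line. As a sketch, the same idempotent check-then-append logic looks like this in Python (the file path here is illustrative, not the real wsgi.conf location):

```python
from pathlib import Path

LINE = "WSGIApplicationGroup %{GLOBAL}"

def ensure_line(conf_path: str, line: str = LINE) -> None:
    """Append `line` to the file only if it is not already present."""
    path = Path(conf_path)
    text = path.read_text() if path.exists() else ""
    if line not in text:
        with path.open("a") as f:
            f.write(line + "\n")

# Start from a clean file, then run the fix twice: the second call is a no-op,
# which is why the container command is safe to re-run on every deployment.
Path("/tmp/wsgi.conf").unlink(missing_ok=True)
ensure_line("/tmp/wsgi.conf")
ensure_line("/tmp/wsgi.conf")
print(Path("/tmp/wsgi.conf").read_text().count(LINE))  # → 1
```

The idempotency matters because container commands run on every deployment, not just the first one.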

We’re posting this here on our blog not to claim credit, but to help anyone else who has the same problem. SciPy is a major Python library, and there must be other people who will need to deploy it to Elastic Beanstalk. David wrote his original post back in 2015; by posting this, we hope that people will know that this neat solution is just as relevant today.

How we’re using SpectX log analysis as a security tool

I have found that log analysis tools can be complicated to learn and cumbersome to support, especially for smaller-scale projects. Every project and company needs a log analysis capability, but many large commercial tools are overkill.

Here at 2T Security we host a web service that offers one of our UK clients read-only visualisations of some large datasets, for consumption by external, nationally based third parties. The service is hosted on Amazon Web Services (AWS), with Web Application Firewall (WAF) and application logs stored in S3 buckets.

As this is an external web service, we have implemented security measures and the tool we are using for monitoring and analytics is SpectX.

SpectX is a log analyser used for data analytics, investigation and manipulation. It reads and parses raw log files and returns them in a structured, accessible format. SpectX is pointed at the web service’s data source and returns only the result of the query, not the raw data. It requires no additional local storage, as it pulls directly from the source for processing.

One example of how we monitor our web service is by identifying blocked access attempts and, using open-source threat feeds, building a broader security monitoring capability:

SpectX is given API credentials to access the S3 buckets, giving a single-pane-of-glass view. The examples below have been anonymised.

Using SpectX parsing rules, I can define what the logs look like and how the output should be displayed:

The pattern defined above is for AWS WAF logs, highlighting all blocked attempts on the web service over the last 10 days. By extending the pattern, I can extract additional data fields to return the IP address in question, along with its geo tag and country code.
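AWS WAF writes its logs as JSON, so the extraction step boils down to pulling named fields out of each record. Stripped to the fields used here, it looks something like this (the sample record below is invented and heavily simplified):

```python
import json

# One anonymised record in the shape of an AWS WAF log entry (simplified).
raw = '''{"timestamp": 1589000000000,
          "action": "BLOCK",
          "terminatingRuleId": "RateLimit",
          "httpRequest": {"clientIp": "198.51.100.7", "country": "RO"}}'''

record = json.loads(raw)
if record["action"] == "BLOCK":
    req = record["httpRequest"]
    # The client IP and source country of the blocked request.
    print(req["clientIp"], req["country"])  # → 198.51.100.7 RO
```

SpectX does this field extraction declaratively through its pattern language rather than in code, but the result is the same: structured rows from raw JSON.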

Using SpectX, I can break apart the logs generated by AWS and display them in an easily readable format. The results are displayed as follows:

The results produce a list of IPs attempting access, with geolocation and country code attached. This can be displayed graphically, for example on a map:

Using this information, I can compare the data against other sources, such as known-bad IP lists: in this case, IPsum. There are many open-source threat feeds available, but IPsum is my preference because it is updated daily with the latest known-bad IPs, collated from various sources and pushed to an easily retrieved GitHub repository. Numerous others, such as Blocklist.de or CI Army, could be used interchangeably.
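The IPsum feed is plain text, one IP per line with a tab-separated count of how many blacklists it appears on. A minimal parser for that format might look like this (the sample data is made up, and the threshold is a choice, not part of the feed):

```python
# Parse IPsum-style feed text ("<ip>\t<number of blacklists>") into a set
# of bad IPs seen on at least `min_sources` lists.
SAMPLE_FEED = """# IP\tnumber of (black)lists
198.51.100.7\t8
203.0.113.42\t3
192.0.2.1\t1
"""

def parse_ipsum(text: str, min_sources: int = 3) -> set[str]:
    bad = set()
    for line in text.splitlines():
        if not line or line.startswith("#"):   # skip blanks and comments
            continue
        ip, count = line.split("\t")
        if int(count) >= min_sources:          # ignore IPs seen on few lists
            bad.add(ip)
    return bad

print(sorted(parse_ipsum(SAMPLE_FEED)))  # → ['198.51.100.7', '203.0.113.42']
```

Requiring an IP to appear on several lists is a simple way of trading coverage for fewer false positives.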

Inputting data into SpectX is fairly straightforward. The source URL is parsed and returned with a preview:

When preparing a query, it contains the prewritten pattern. In this instance I need only the IP address, so I select just that field. The pattern below pulls the most recent master list of IPs:

SpectX will simultaneously run the two queries (“IPsum” and “known bad lookups”) to pull all blocked attempts to the webservice, whilst querying the most recent version of IPsum’s data.

$pattern = <<<PATTERN
   }(greedy='AWSX'):AWS EOL
| filter (path_time >= now()[-135 day] AND path_time <= now()[+1 day])
| parse(pattern:$pattern)
| select(AWS[timestamp], AWS[action],AWS[terminatingRuleId], IPADDR(AWS[AWSX][httpRequest][clientIp]) as IP)
| select (count(IP) as Hits, IP)
| group(IP)
| join(@[/PATH/Report/Bad IPSUM] on left.IP = right.IPaddress)
| select(*, geo(IP), CC(IP))
| sort(Hits DESC)
| select(*, case_when:
           WHEN cc='--' THEN (DNS_LOOKUP(IP))
           ELSE CC(IP)
           END)
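In Python terms, the core of this query is a group-and-count over blocked client IPs followed by an inner join against the known-bad list, sorted by hit count. A sketch with invented data:

```python
from collections import Counter

# Invented example data: client IPs from blocked WAF requests, and a
# known-bad set standing in for the IPsum query result.
blocked = ["198.51.100.7", "198.51.100.7", "192.0.2.1", "203.0.113.42"]
known_bad = {"198.51.100.7", "203.0.113.42"}

# group(IP) + count(IP) as Hits + join on the bad list + sort(Hits DESC):
hits = Counter(ip for ip in blocked if ip in known_bad)
for ip, count in hits.most_common():
    print(ip, count)
```

Note that 192.0.2.1 drops out of the result: it was blocked, but is not on the known-bad list, which is exactly what the join achieves.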

As a result, comparing both the known bad IPs with the blocked IP attempts returns with the following:

With this result, I can clearly see who has been attempting to gain access to the web service from an identified known-bad IP address. In this instance the IP address was not attributed to a country, so I also ran a DNS lookup for further insight into the IP’s owner: this returned null. After a domain lookup, I was able to identify that the address had been reported as abusive from a Russian domain.

I then ran the same query against successful login attempts, to check whether any known-bad IPs were gaining access to the network. Fortunately, there were none.

Another instance in which SpectX might be used is recognising anomalies. For example, when looking for the most popular HTTP version visiting the web service, the following was found:

$pattern = <<<PATTERN
   }(greedy='AWSX'):AWS EOL
| filter(path_time BETWEEN T('2020-07-16 00:00:00 +0200') AND now()[+1 day])
| parse(pattern:$pattern)
| select(AWS[AWSX][httpRequest][httpVersion] as httpVersion)
| select (*, count(httpVersion) as Hits)
| group (httpVersion)
| sort(Hits DESC) 
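The group, count and sort stages of that query can be sketched in Python with invented sample data:

```python
from collections import Counter

# Invented sample of httpVersion values extracted from the logs; the odd
# final value stands in for a spoofed header.
versions = ["HTTP/1.1", "HTTP/1.1", "HTTP/2.0", "HTTP/1.0", "HTTP/9.1"]

# group(httpVersion) + count(httpVersion) as Hits + sort(Hits DESC):
for version, hits in Counter(versions).most_common():
    print(version, hits)
```

Anything outside the handful of legitimate HTTP versions stands out immediately at the bottom of the sorted counts.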

Although HTTP versions 1.0, 1.1, and 2 are all expected, one anomaly was identified as a likely spoofed HTTP header. Upon closer inspection I found it was a blocked attempt from a Romanian-based IP known for IP scanning.

2T Security is a partner of SpectX. We use it internally, and we are now using it for regular monitoring to baseline what we know to be normal. The more we use SpectX, the easier and more efficient our processes will become at identifying anomalies, security alerts and other unrecognised or malicious activity.

For more information about SpectX and how we have used the tool with our own systems and projects, please contact us.

May update for RiskTree

The latest update for RiskTree has just gone live. There aren’t any big, flashy features this time – just a number of incremental improvements that will make your experience smoother and easier. The most noticeable change will be at login: instead of including a field for the MFA code alongside the username and password, we’ve split this into a separate box. Everyone will enter their standard credentials, but only users who have enabled MFA will be prompted for their code. It’s a small change, but it makes the login process a little bit clearer!

Another change is the ability to save files in a compressed format. This is particularly useful for the risk assessment reports created by RiskTree Processor, as these files can get rather large. The new compressed format has the extension .rtz (RiskTree zipped), and this can be used for loading and saving both RiskTree files and reports.
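Compressed save formats of this kind are commonly just a compressed wrapper around the existing file content. The .rtz internals aren’t published, so the sketch below is purely hypothetical: gzipped JSON with invented field names, to illustrate why large report files shrink well.

```python
# Hypothetical sketch of a compressed save format: gzipped JSON.
# The structure and key names here are illustrative, not RiskTree's spec.
import gzip
import json

report = {"title": "Example risk report",
          "risks": [{"name": "Data theft", "level": "Medium"}]}

def save_rtz(path: str, data: dict) -> None:
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(data, f)

def load_rtz(path: str) -> dict:
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

save_rtz("/tmp/example.rtz", report)
print(load_rtz("/tmp/example.rtz") == report)  # → True
```

Text-heavy, repetitive report data compresses particularly well, which is why the gain is biggest for large reports.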

Please continue to let us have your suggestions for improvements and features for RiskTree. We’re driven by our users’ requests, and they all help to make the process better for everyone.

Principles are good, but exceptions are OK

Recently we discussed the security of RiskTree with a client, who ran through the NCSC Cloud Security Principles. Since RiskTree is delivered as software-as-a-service, this made sense. One point that arose was the lack of Multi-Factor Authentication (MFA) in use: CSP Principle 10 states that 2FA is ‘considered good practice’, using either a hardware or software token, or an out-of-band challenge.

We subsequently implemented MFA, using the Time-based One-time Password (TOTP) algorithm. This allows users to enter a code (either manually, or by scanning a QR code) into an app on their mobile phone, such as Google Authenticator, Microsoft Authenticator, or LastPass Authenticator. We haven’t mandated its use, preferring to let our users decide whether they need it: the decision is about their data, and it isn’t ours to make. To help inform their decision, we used RiskTree to analyse the risks of client data being stolen from RiskTree, and put the information onto the site as a demo. Consequently, some users have enabled it, and some haven’t.

The key question though, and the reason for this blog post, is “what are we actually protecting?”. RiskTree stores no risk data from any of our clients – all of the data are handled locally in the browser. For the risk calculations, just a node reference and the six assessment values are transferred to the servers for processing into the risk assessment and the creation of the data charts and visualizations, making the most of the power of cloud processing for the intensive number-crunching part of the work. The only data that we hold about our users is their name, e-mail address, and organization, and we don’t show the latter two, even when a user is logged in. There is no ability for the user to change any of these, so the risk of data tampering has been removed. The only thing that can be achieved by stealing credentials is that the attacker can use RiskTree without paying.

This leads us to conclude that for RiskTree, MFA is overkill. We’ve designed our SaaS to avoid holding any data – to an extent it’s almost like a lambda, in that a user throws some data at it and it returns some results. Whilst the principle of applying MFA to SaaS is sound, it isn’t important if there isn’t anything to protect. Having security principles is a great idea, but following them slavishly isn’t. When performing a risk assessment, always keep the context in mind, and be prepared to have exceptions.

Introducing RiskFlow™

A RiskFlow diagram, created using RiskTree

We’ve added a new data visualization tool to RiskTree. It’s a Sankey diagram for risk, showing how your risks change as you introduce countermeasures. The example above shows a typical diagram, with risks moving from intrinsic on the left, to residual in the middle, to target on the right. Residual risks are risks that have been mitigated to some extent, through the application of countermeasures. Target risks show the effect of future countermeasures – this could be a ‘what if’ scenario, or could be taken from a project plan showing that they will be introduced at a future date.

The thickness of the arrows reflects the number of risks moving between each block. So, of the eight intrinsic risks at Very High, when their residual status is assessed, three remain at this level, one becomes High, three become Medium-High, and one becomes Medium.
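In data terms, each arrow is just a count of (intrinsic level, residual level) pairs. The Very High example above can be sketched like this (the transition data is illustrative):

```python
from collections import Counter

# The eight Very High intrinsic risks from the example, written as
# (intrinsic level, residual level) transitions.
transitions = ([("Very High", "Very High")] * 3
               + [("Very High", "High")]
               + [("Very High", "Medium-High")] * 3
               + [("Very High", "Medium")])

# Arrow thickness = number of risks taking each transition.
widths = Counter(transitions)
print(widths[("Very High", "Very High")])   # → 3
print(widths[("Very High", "High")])        # → 1
```

The same counting applies to the residual-to-target stage, giving the full three-column flow.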

The diagram is interactive. As you move your mouse over the blocks and the arrows, a pop-up box lists the risks. These are all linked back to the main risk table in RiskTree, so you can drill into the data in the same way you can with all of our other charts.

The RiskTree Processor tabs, showing the location of RiskFlow

As with all of our other upgrades, RiskFlow will work for reports that have been previously created. Just reload the report file and you should see the new tab in the Risk Charts section.

Reaction to RiskFlow from our testers has been uniformly positive. We hope that you’ll feel the same way when you get to see it.

RiskTree update

We’ve just updated RiskTree ready for 2020. The changes include:

  • You can now view charts showing how you’ve used tags in your RiskTree and reports.
  • The search functionality uses tabs to display its results (much clearer!) and is now available in the RiskTree Processor, as well as the Designer.
  • The risk labels on the RiskSpider and RiskGraph data visualizations can now be wrapped to two lines (no more truncated risk names!).
  • Improvements to the CSV file downloads.
  • A number of other minor tweaks and improvements.


Welcome to the 2T Security blog. We’ve created this so that we can write about security and risk topics that interest us, and that might be of interest to the community. We’ll also be writing about our RiskTree process for risk management, especially where this doesn’t fit in with the standard help pages.

We’re looking forward to the conversation!