Streaming Global Cyber Attack Analytics with Tableau and Python


Introduction and Hacks

As grandiose a notion as the title may imply, there have been some really promising and powerful moves made in the advancement of smoothly integrating real-time and/or streaming data technologies into most any enterprise reporting and analytics architecture. When used in tandem with functional programming languages like Python, we now have the ability to create enterprise grade data engineering scripts to handle the manipulation and flow of data, large or small, for final consumption in all manner of business applications.

In this cavalcade of coding, we're going to use a combination of Satori, a free data streaming client, and python to stream live world cyber attack activity via an api. We'll consume the records as json, and then use a few choice python libraries to parse, normalize, and insert the records into a mysql database. Finally, we'll hook it all up to Tableau and watch cyber attacks happen in real time with a really cool visualization.


The Specs

For the this exercise, we're going to bite things off a chunk at a time. We're going to utilize a service called Satori, a streaming data source aggregator that will make it easy for us to hook up to any number of streams to work with as we please. In this case, we'll be working with the Live Cyber Attack Threat Map data set. Next, we'll set up our producer code that will do a couple of things. First it will create the API client from which we will be ingesting a constant flow of cyber attack records. Next, we'll take these records and convert them to a data frame using the Pandas library for python. Finally, we will insert them into a MySQL database. This will allow us to use this live feed as a source for Tableau in order to create a geo mapping of countries that are currently being targeted by cyber attacks.


The Data Source

Satori Logo

Satori is a new-ish service that aggregates the web's streaming data sources and provides developers with a client and some sample code that they can then use to set up their own live data streams. While your interests may lie in how you can stream your own company's data, it then simply becomes a matter of using python's requests library to get at whatever internal sources you might need. Find more on the requests library here.

Satori has taken a lot of the guess work out of the first step of the process for us, as they provide basic code samples in a number of popular languages to access their streaming service and to generate records. You can find the link to this code in a number of popular languages here. Note that you'll need to install their client and get your own app key. I've added a bit of code at the end to handle the insertion of records, and to continue the flow, should any records produce a warning.


Satori Code

# Imports
from __future__ import print_function

import sys  
import threading  
from pandas import DataFrame  
from satori.rtm.client import make_client, SubscriptionMode

# Local Imports
from create_table import engine

# Satori Variables
channel = "live-cyber-attack-threat-map"  
endpoint = "wss://open-data.api.satori.com"  
appkey = " "

# Local Variables
table = 'hack_attacks'


def main():

    with make_client(
            endpoint=endpoint, appkey=appkey) as client:

        print('Connected!')

        mailbox = []
        got_message_event = threading.Event()

        class SubscriptionObserver(object):
            def on_subscription_data(self, data):
                for message in data['messages']:
                    mailbox.append(message)
                got_message_event.set()

        subscription_observer = SubscriptionObserver()
        client.subscribe(
            channel,
            SubscriptionMode.SIMPLE,
            subscription_observer)

        if not got_message_event.wait(30):
            print("Timeout while waiting for a message")
            sys.exit(1)

        for message in mailbox:
                # Create dataframe
                data = DataFrame([message],
                                 columns=['attack_type', 'attacker_ip', 'attack_port',
                                          'latitude2', 'longitude2', 'longitude',
                                          'city_target', 'country_target', 'attack_subtype',
                                          'latitude', 'city_origin', 'country_origin'])
                # Insert records to table
                try:
                    data.to_sql(table, engine, if_exists='append')

                except Exception as e:
                    print(e)

if __name__ == '__main__':  
    main()


Creating a Table

Now that we've set up the streaming code that we'll use to fill our table, we'll need to set up the table in MySQL to hold them all. For this we'll use the SQLAlchemy ORM (object relational mapper). It's a high falutin' term for a tool that simply abstracts SQL commands to be more 'pythonic'; that is, you don't necessarily have to be a SQL expert to create tables in your given database. Admittedly, it can be a bit daunting to get the hang of, but give it a shot. Many developers choose to interact a with a given database either via direct SQL or using an ORM. It's good practice to use a separate python file, in this case settings.py (or some variation thereof), to hold your database connection string in the following format (the addition of the mysqldb tag at the beginning is as a result of the installation of the mysql library you'll need for python), entitled SQLALCHEMY_DATABASE_URI:

'mysql+mysqldb://db_user:pass@db_host/db_name'  

Don't forget to sign in to your database to validate success!


Feeding MySQL and Tableau

Now all we need to do is turn on the hose and watch our table fill up. Running producer.py, we can then open a new tab, log in to our database to make sure our table is being populated, and go to work. Create a new connection to your MySQL database (called my db 'hacks') in Tableau and verify that everything is in order once you navigate to the data preview. There are lots of nulls in this data set, but this will simply be a matter of filtering them out on the front end.

Tableau should pick up right away on the geo data in the dataset, as denoted by the little globe icon next to the field.
We can now simply double-click on the corresponding geo data field, in this case we'll be using Country Target, and then the Number of Records field in the Measures area.
I've chosen to use the 'Dark' map theme for this example as it just really jives with the whole cyber attack, international espionage vibe. Note that you'll need to maintain a live connection, via Tableau, to your datasource and refresh at the interval you'd like, if using Tableau Desktop. If you're curious about how to automagically provide for this functionality, a quick google search will come up with some solutions.

Subscribe to Rittman Mead

Get the latest posts delivered right to your inbox.

or subscribe via RSS with Feedly!

Subscribe to Our
Monthly Newsletter!

* indicates required