Revealing malware relationships with GraphDB: Part 1

Feb 7, 2019

In this post, we will learn how a graph database like Neo4j can help visualize relationships between malware samples, and how those relationships can be extended to identify patterns across samples. Before we dig into Neo4j, let’s start with some fundamental graph terminology:

Nodes represent entities such as a human, car, laptop or phone.

Properties are key-value attributes that nodes can contain. The type of steering wheel or the number of tires would be a property of the “car” node.

Labels are a way to group together nodes of a similar type. For example, a label of “FastFood” may include nodes such as “Taco Bell”, “McDonald’s”, and “Chipotle”.

Edges (also called relationships) represent the connection between two nodes. Note that “vertices” is another name for nodes, not edges. Relationships can also have their own properties.
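To make the four terms above concrete, here is a minimal in-memory sketch in plain Python. It is not how Neo4j stores data; the class names, the “Sample” and “Domain” labels, the “CONNECTS_TO” relationship type, and all values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, e.g. a malware sample or a domain."""
    node_id: str
    labels: set = field(default_factory=set)        # groups such as "Sample"
    properties: dict = field(default_factory=dict)  # key-value attributes


@dataclass
class Edge:
    """A directed relationship between two nodes, with its own properties."""
    start: Node
    end: Node
    rel_type: str
    properties: dict = field(default_factory=dict)


def nodes_with_label(nodes, label):
    """Return every node that carries the given label."""
    return [n for n in nodes if label in n.labels]


# Hypothetical data: one sample that contacts one domain.
sample = Node("n1", {"Sample"}, {"sha256": "placeholder-hash-1"})
domain = Node("n2", {"Domain"}, {"name": "c2.example.com"})
edge = Edge(sample, domain, "CONNECTS_TO", {"port": 443})

print([n.node_id for n in nodes_with_label([sample, domain], "Sample")])
print(f"({edge.start.node_id})-[:{edge.rel_type}]->({edge.end.node_id})")
```

The last line prints the edge in an arrow notation similar to the one Neo4j uses for relationships in its query language, Cypher.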

Getting started with Neo4j


Neo4j is a graph database known for its simplicity and easy-to-use interface. I find the structure of a graph database quite fascinating, along with the challenge of normalizing malware analysis data for each sample into a schema that works for a graph database. To get started, we first need a running Neo4j instance. The quickest way to do this is with Docker. Once you have Docker installed, you can pull down a Neo4j Docker image using the following command:

docker pull neo4j

Once you have the image downloaded to your system, you can start the container by running the command below:

docker run \
    --publish=7474:7474 --publish=7687:7687 \
    --volume=$HOME/neo4j/data:/data \
    neo4j


If all goes well, you should see some standard output in your console, including the line:

INFO Remote interface available at http://localhost:7474/

If you navigate to this URL in your browser, you should be prompted to log in to the Neo4j Docker container using the default credentials “neo4j/neo4j”. After logging in and changing your password, you can begin exploring the interface. If you’re new to Neo4j, I would recommend digging into the “Learning about Neo4j” section to get a handle on the syntax for searching and updating nodes or edges in the database.
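As a rough preview of what that syntax looks like, the sketch below builds two Cypher statements as Python strings: one that creates (or reuses) two nodes and the edge between them, and one that searches for samples sharing a domain. The “Sample” and “Domain” labels, the “CONNECTS_TO” relationship type, and the property names are assumptions made for this example, not a schema defined in this post.

```python
def build_connection_query(sha256, domain):
    """Return a parameterized Cypher statement and its parameters.

    MERGE creates the node or relationship only if it does not exist yet,
    so running the statement twice does not produce duplicates.
    """
    query = (
        "MERGE (s:Sample {sha256: $sha256}) "
        "MERGE (d:Domain {name: $domain}) "
        "MERGE (s)-[:CONNECTS_TO]->(d)"
    )
    # Values travel separately from the query text instead of being
    # concatenated into it.
    params = {"sha256": sha256, "domain": domain}
    return query, params


# Search: pairs of distinct samples that contact the same domain.
FIND_SHARED_DOMAINS = (
    "MATCH (a:Sample)-[:CONNECTS_TO]->(d:Domain)<-[:CONNECTS_TO]-(b:Sample) "
    "WHERE a.sha256 < b.sha256 "
    "RETURN a.sha256, b.sha256, d.name"
)

query, params = build_connection_query("placeholder-hash-1", "c2.example.com")
print(query)
print(params)
```

The search statement can be pasted directly into the query bar of the Neo4j browser. The update statement uses parameters, so it is better suited to a client such as the official Python driver, where the pair would typically be passed as `session.run(query, params)`.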

