In this post, we will learn how a graph database like Neo4j can help visualize malware relationships and how those relationships can be extended to identify patterns between samples. Before we dig into Neo4j, let’s start with some fundamental graph terminology:
Nodes represent entities such as a human, car, laptop or phone.
Properties are attributes a node can hold, typically stored as key-value pairs. The color or the number of tires would be properties of the “car” node.
Labels are a way to group together nodes of a similar type. For example, a label of “FastFood” may include nodes such as “Taco Bell”, “McDonald’s”, and “Chipotle”.
Edges (or relationships) represent the connection between two nodes. Note that “vertices” is another name for nodes, not for edges. Relationships can also have their own properties.
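To make these terms concrete, here is a minimal sketch in plain Python of how nodes, labels, properties, and edges fit together. It does not use Neo4j at all. The class names (Node, Edge), the helper nodes_with_label, and the sample data such as “Alice” and the DRIVES relationship are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    """An entity in the graph, e.g. a car or a restaurant."""
    name: str
    labels: Set[str] = field(default_factory=set)              # groups of similar nodes
    properties: Dict[str, Any] = field(default_factory=dict)   # key-value attributes


@dataclass
class Edge:
    """A directed relationship between two nodes, with its own properties."""
    start: Node
    end: Node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


def nodes_with_label(nodes: List[Node], label: str) -> List[str]:
    """Return the names of all nodes that carry the given label."""
    return [n.name for n in nodes if label in n.labels]


# Hypothetical sample data, reusing the examples from the list above.
taco_bell = Node("Taco Bell", labels={"FastFood"})
mcdonalds = Node("McDonald's", labels={"FastFood"})
car = Node("car", labels={"Vehicle"}, properties={"tires": 4, "color": "red"})
person = Node("Alice", labels={"Human"})

# An edge connects two nodes and can hold properties of its own.
drives = Edge(person, car, "DRIVES", properties={"since": 2019})

nodes = [taco_bell, mcdonalds, car, person]
print(nodes_with_label(nodes, "FastFood"))
print(f"{drives.start.name} -[{drives.rel_type}]-> {drives.end.name}")
```

A graph database stores and indexes this same shape of data for you, so questions like “which nodes share a label?” or “what is connected to this node?” become queries rather than hand-written loops.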
Getting started with Neo4j
Link: https://neo4j.com/
Neo4j is a graph database known for its simplicity and easy-to-use interface. I find the structure of a graph database fascinating, as is the challenge of normalizing the malware analysis data for each sample into a schema that works for a graph database. To get started, we first need to get a Neo4j instance running. The quickest way to do this is with Docker. Once you have Docker installed (https://docs.docker.com/install/), you can pull down a Neo4j Docker image using the following command:
docker pull neo4j