Introduction to Malware Analysis

Why malware analysis

Malware analysis (“MA”) is a fun and exciting journey for anyone new to or seasoned in the career field. Taking a specimen (malware sample) and reverse engineering it to better understand its inner workings can be a long, tedious adventure. The sheer number of malware samples circulating on the internet, along with the various formats specimens are found in, makes malware analysis a good challenge. Beyond learning MA as a hobby, here are some other reasons why we perform malware analysis:

  • To better understand how a specimen works. This may yield certain unique attributes about how the malware was written, methods it performs or its dependencies.

How do I get started?!

If you’re new to malware analysis, you want to ensure you’ve taken the right precautions before handling any malicious code. This series of posts will cover the following objectives:

  • Gather additional readings and resources that helped me get started.

Resources to get you started

Books

Research/training websites

Operational Security (OpSec)

When handling malicious code, there are some best practices you should follow to avoid any adverse effects. This is especially true if you perform malware analysis at your day job: you don’t want to be responsible for infecting your work laptop, letting the malware spread to other systems or, worse, granting an adversary access to your system.

  • Don’t visit an attacker’s infrastructure, especially from corporate or company networks: We call this “don’t poke the bear”. Why?! I’m glad you asked. For starters, you may inadvertently tip off the attackers. If you visit the attackers’ server(s) manually (e.g. web browser, wget, curl, etc.) or, depending on the specimen you’re working with, run it and allow it to communicate back to the attackers’ server(s), the attacker may notice this activity coming from your Internet Protocol (“IP”) range or from an unexpected user agent. If the attacker deployed the malware to a specific set of target networks or regions, traffic from outside that set would look suspicious. As a result, the attacker could burn down their infrastructure, document your IP space or roll out new malware to the target. While these may be extreme cases, it’s generally good practice not to let the malware communicate over the internet unless you know what you’re doing.
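Before running a specimen, it can help to confirm that the analysis VM really can’t reach the outside world. The sketch below is a hypothetical helper (`has_outbound_connectivity` is not part of any tool mentioned in this post) that simply attempts a TCP handshake; on an isolated VM every probe should come back blocked. It is a quick sanity check under the assumption that outbound TCP is the main leak you care about, not a substitute for reviewing the hypervisor’s network settings.

```python
import socket


def has_outbound_connectivity(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds from this machine.

    Intended as a pre-detonation check inside an analysis VM: on a properly
    isolated network, probes to outside hosts should return False.
    """
    try:
        # create_connection resolves the name and performs a full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused or unreachable networks and timeouts all land here.
        return False
```

For example, `has_outbound_connectivity("example.com", 443)` should return `False` inside the VM before you run anything. Pick probe targets that are harmless and meaningful for your lab; never use the attacker’s own infrastructure as a probe.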

Malware in the wild

The figure below outlines some of the various malware formats that cross my lab. We will go into these formats and a few others in later posts.

Figure 1: Common malware formats
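A file’s extension can lie, so a common first step is to look at its leading “magic bytes”. The sketch below checks a handful of well-known signatures; the list is illustrative and is not a reproduction of Figure 1, and the names `MAGIC_SIGNATURES`, `identify_format` and `identify_file` are hypothetical. A match is only a first hint (for example, `MZ` just means a DOS/PE-style header is present), not proof of what the file does.

```python
# Well-known leading "magic bytes" for several common file formats.
MAGIC_SIGNATURES = [
    (b"MZ", "DOS/PE executable (EXE/DLL)"),
    (b"\x7fELF", "ELF executable"),
    (b"%PDF", "PDF document"),
    (b"PK\x03\x04", "ZIP container (also OOXML .docx/.xlsx, JAR, APK)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy .doc/.xls, .msi)"),
    (b"{\\rtf", "Rich Text Format"),
]


def identify_format(data: bytes) -> str:
    """Return a best-guess format name based on the first bytes of a file."""
    for magic, name in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    # Read only the header bytes; the file is never executed or opened by its handler.
    with open(path, "rb") as fh:
        return identify_format(fh.read(16))
```

Calling `identify_file("sample.bin")` on a renamed executable would still report a DOS/PE header, regardless of the `.bin` extension.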

Types of malware analysis

With modern malware analysis, initial triage is usually handled by an automated sandbox solution such as Cuckoo sandbox. However, a malware sandbox is not always effective, and malware analysts may need to resort to manual analysis, especially when they’re in the field and time is everything. When on the clock, initial triage of any malware sample should take roughly 15–30 minutes on average to yield results, although the speed of analysis depends on the type of specimen and the experience of the analyst. Most of the time, samples are submitted to a sandbox in the background while the analyst extracts additional details from the sample, sometimes comparing notes between the sandbox output and manual triage.
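One detail that is cheap to extract during manual triage, and easy to match against a sandbox report, is the sample’s cryptographic hashes. Here is a minimal sketch using Python’s standard `hashlib`; the function name `hash_sample` is hypothetical. It reads the file in chunks and never executes it.

```python
import hashlib


def hash_sample(path: str, chunk_size: int = 65536) -> dict:
    """Compute MD5, SHA-1 and SHA-256 of a file without executing it."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as fh:
        # Stream the file in chunks so large samples don't need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Recording all three digests makes it easier to line up your notes with sandbox output and with whatever hash your colleagues or threat-intel feeds happen to use.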

As an example of comparing notes: when Cuckoo sandbox shows that a registry key was created, I can reproduce the same results in my local analysis VM, acquire the registry key data and dump the raw bytes as needed. It’s not uncommon to spend a day or even a week performing analysis, depending on the sample or the level of analysis required. With most samples, running them inside an automated sandbox does the trick, but if the sample performs additional operations, such as runtime decryption of a configuration file, anti-debugging or anti-VM checks, we may need to take a deeper dive.
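Once you have the raw bytes of something like a registry value (however you export it on the analysis VM; that step isn’t shown here), a classic offset / hex / ASCII dump makes embedded strings and headers easy to spot. The sketch below is a hypothetical `hexdump` helper producing output in the general style of tools like `xxd`; it is not a reimplementation of any particular tool’s exact format.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset / hex / ASCII columns."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' in the ASCII column.
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short final rows.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)
```

For instance, `print(hexdump(value_bytes))`, where `value_bytes` is whatever binary data you pulled from the VM, would make an embedded `MZ` header stand out immediately in the right-hand column.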

In general, I’ve found that malware analysis can be broken down into four separate categories.

Figure 2: Malware triage categories

Virtual machines

Virtual machines (“VMs”) are a must-have for any malware analyst. Unless you have the proper tooling in place (e.g. hard drive cloning), it’s best to set up a VM for each flavor of operating system and bitness (i.e. x86 and x64).
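To decide which VM a Windows sample belongs in, one quick check is the `Machine` field of its PE header: the DWORD at offset `0x3C` (`e_lfanew`) points to the `PE\0\0` signature, and the WORD right after it records the target architecture (`0x014C` for x86, `0x8664` for x64). The sketch below is a hypothetical helper that reads only those fields; it assumes a reasonably well-formed PE and is not a full parser.

```python
import struct

# IMAGE_FILE_HEADER.Machine values for the two most common Windows architectures.
MACHINE_TYPES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE file's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    # e_lfanew (offset 0x3C) is a little-endian DWORD pointing at the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # Machine is the first WORD of IMAGE_FILE_HEADER, right after the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_TYPES.get(machine, f"other (0x{machine:04x})")


def pe_machine_from_file(path: str) -> str:
    with open(path, "rb") as fh:
        # 4 KiB covers the DOS stub and headers of typical (non-malformed) PE files.
        return pe_machine(fh.read(4096))
```

A result of `x86 (32-bit)` or `x64 (64-bit)` tells you which bitness the sample was built for; anything else (ARM64, for example) is reported with its raw value so you can look it up.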

Figure 3: Win7 VM after setting up the Flare-VM

When setting up a malware analysis environment, I find that one of three virtualization applications is commonly used:

  • VMware Workstation

When setting up new virtual machines, I recommend you review the following items:

Figure 4: Host-only networking configuration

NAT/Bridged with internet

  • Simulated. In this configuration, you set up another isolated network on which only the VMs can communicate with each other, without any internet access. For example, VM1 and VM2 could be placed on an “internal”-only network. You can then route all web traffic from VM1 to VM2, and VM2 could run Wireshark to collect network packets.
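In the simulated setup, it often helps if VM2 does more than capture packets: a sample that gets no HTTP answer may simply stop. Below is a minimal sketch of a catch-all web responder for VM2 built on Python’s standard `http.server`. It assumes VM1’s web traffic is already routed to VM2 (that routing is not shown), and the names `CatchAllHandler` and `start_fake_web_server` are hypothetical. It answers every GET/POST with the same placeholder response and records what was requested; real samples may expect specific responses this sketch does not provide.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CatchAllHandler(BaseHTTPRequestHandler):
    """Answer every GET/POST with the same canned response and record each request."""

    # (method, path, user_agent, body) tuples, kept in memory for later review.
    captured = []

    def _respond(self):
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else b""
        CatchAllHandler.captured.append(
            (self.command, self.path, self.headers.get("User-Agent", ""), body)
        )
        payload = b"OK"  # placeholder; not what any real C2 server would send
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    # Route both verbs to the same handler.
    do_GET = _respond
    do_POST = _respond


def start_fake_web_server(bind: str = "0.0.0.0", port: int = 80) -> HTTPServer:
    """Start the responder on a background thread and return the server object.

    Binding to port 80 typically requires root/administrator privileges.
    The base handler also prints one log line per request to stderr.
    """
    server = HTTPServer((bind, port), CatchAllHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Running this alongside Wireshark on VM2 gives you both the raw packets and a quick list of the URLs and user agents the sample tried, without any of that traffic leaving the isolated network.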

Dealing with obfuscation

Throughout your malware analysis journey, you will encounter blocks of code or text with various levels of obfuscation, that is, data that has been purposefully modified to make analysis harder. Common obfuscation techniques include Base64, char, ord, concatenation, code comments, string replacement, xor and raw byte streams, just to name a few. To make matters worse, some specimens use a “layered” approach, applying the same obfuscation technique multiple times or using a different technique per layer. We cover a basic example of this later in this section.
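To make the idea of layering concrete, here is a minimal two-layer scheme: XOR with a repeating key, then Base64 so the result is printable. The key, the example string and the helper names (`xor_bytes`, `obfuscate`, `deobfuscate`) are all hypothetical; real specimens pick their own layers and keys. The key point is that deobfuscation peels the layers in reverse order.

```python
import base64


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; applying it twice with the same key restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def obfuscate(plaintext: str, key: bytes) -> str:
    # Layer 1: XOR with the key. Layer 2: Base64 so the result is printable text.
    return base64.b64encode(xor_bytes(plaintext.encode(), key)).decode("ascii")


def deobfuscate(blob: str, key: bytes) -> str:
    # Undo the layers in reverse order: Base64 first, then XOR.
    return xor_bytes(base64.b64decode(blob), key).decode()
```

With a hypothetical single-byte key `b"\x42"`, `deobfuscate(obfuscate("cmd.exe /c whoami", b"\x42"), b"\x42")` returns the original command line. In CyberChef, the equivalent recipe would be “From Base64” followed by “XOR” with the same key.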

To combat most obfuscation, I recommend using a toolkit called “CyberChef”. This tool comes loaded with many common encoding and encryption routines, which can be chained together to handle layered obfuscation or encryption. The best way to get CyberChef up and running is to pull it down and run it inside a Docker container:

Figure 5: Terminal output after running the command “sudo docker pull remnux/cyberchef”

Once you pull down the Docker image, run it using the command below:

## MacOS/Linux

sudo docker run -d -p 8080:80 remnux/cyberchef

## Visit http://localhost:8080 in any web browser

CyberChef is excellent at handling single-layer obfuscation, and it handles layered obfuscation with ease as well. For example, say we run into the hex string below:

53 58 4e 75 4a 33 51 67 64 47 68 70 63 79 42 6d 64 57 34 2f 49 51 3d 3d

If we paste that string of hex into CyberChef and choose “From Hex” as the first layer and “From Base64” as the second layer, we should see the plain text “Isn’t this fun?!”. CyberChef supports drag and drop too. To make things even better, you can save this as a “recipe” using the “Save recipe” button at the bottom. This is excellent when you want to share or reuse the recipe.
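The same two-layer recipe can be reproduced in a few lines of Python, which is handy when you want to script the decoding across many samples. This is a sketch mirroring the CyberChef steps, not CyberChef’s own code; the function name `from_hex_then_base64` is hypothetical. The intermediate layer is the Base64 text `SXNuJ3QgdGhpcyBmdW4/IQ==`.

```python
import base64


def from_hex_then_base64(hex_text: str) -> str:
    """Mirror a two-step CyberChef recipe: 'From Hex', then 'From Base64'."""
    layer1 = bytes.fromhex(hex_text)    # whitespace between byte pairs is ignored
    layer2 = base64.b64decode(layer1)   # decode the Base64 text recovered from layer 1
    return layer2.decode("utf-8")


sample = "53 58 4e 75 4a 33 51 67 64 47 68 70 63 79 42 6d 64 57 34 2f 49 51 3d 3d"
print(from_hex_then_base64(sample))  # Isn't this fun?!
```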

Figure 6: CyberChef inside the web browser

In the next blog post, we will go into our first malware analysis category, basic static analysis.

Posting on various topics including incident response, malware analysis, development and finance/investing automation.
