Espionage Droid Sandbox - Our Datasets



Dataset 1:
Toward Developing a Systematic Approach to Generate Benchmark Android Malware Datasets and Classification (CICAndMal2017)


We generated a new Android malware dataset, named CICAndMal2017, which is fully labeled and includes network traffic, logs, API/SYS calls, phone statistics, and memory dumps of 42 malware families. The following figure shows the network architecture for the experimental setup and configuration. In this experiment, we used three laptops connected to three real smartphones.


To create a comprehensive data-capture scenario, we first built a taxonomy of malware behavior and characteristics. The taxonomy comprises 20 types of attacks (A1-A20) and 4 types of C&C communications (C1-C4), as shown in the table below.

Based on this taxonomy, we created a specific scenario for each malware category. The table below lists the scenario for each category.

We also defined three data-capture states to overcome the stealthiness of advanced malware:
1) Installation: The first state, captured immediately after installing the malware (1-3 min).
2) Before restart: The second state, captured during the 15 min before rebooting the phones.
3) After restart: The last state, captured during the 15 min after rebooting the phones.
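The three states above can be written down as a small data structure, which is handy when labeling capture files. This is only a sketch: the class name `CaptureState`, the field names, and the helper `max_capture_minutes` are hypothetical and not part of the dataset tooling, and the summed duration is a derived illustration rather than a figure stated by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureState:
    """One data-capture state (hypothetical structure for illustration)."""
    name: str
    description: str
    min_minutes: int  # shortest capture window stated for this state
    max_minutes: int  # longest capture window stated for this state


# Durations are taken from the list above: 1-3 min, 15 min, 15 min.
CAPTURE_STATES = (
    CaptureState("installation", "immediately after installing the malware", 1, 3),
    CaptureState("before_restart", "15 min before rebooting the phone", 15, 15),
    CaptureState("after_restart", "15 min after rebooting the phone", 15, 15),
)


def max_capture_minutes(states=CAPTURE_STATES) -> int:
    """Sum the longest window of each state (an assumed, derived quantity)."""
    return sum(state.max_minutes for state in states)
```

Under these assumptions, `max_capture_minutes()` adds up the longest stated window for each state, giving a rough upper bound on capture time per malware sample.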

Feature extraction and selection:
We captured network traffic (.pcap files) during all three states (installation, before restart, and after restart) and extracted more than 80 features from it with CICFlowMeter-V3.
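As an illustration of working with the extracted flow features, the sketch below reads a flow-feature CSV and tags each row with the capture state it came from. It makes several assumptions: the column names are placeholders that may not match the headers of a given CICFlowMeter version, the two sample rows are invented for the example and are not dataset values, and `load_flows` and the `capture_state` key are hypothetical names.

```python
import csv
import io
from typing import Dict, List

# Invented sample rows; real CICFlowMeter output has 80+ columns
# and its header names may differ from these placeholders.
SAMPLE_CSV = """Flow Duration,Total Fwd Packets,Total Backward Packets
120034,4,2
98,1,0
"""


def load_flows(csv_text: str, state: str) -> List[Dict[str, object]]:
    """Parse numeric flow features and tag each row with its capture state."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        # Header names are stripped because exported CSVs sometimes pad them.
        record = {name.strip(): float(value) for name, value in row.items()}
        record["capture_state"] = state
        rows.append(record)
    return rows


flows = load_flows(SAMPLE_CSV, "installation")
```

Keeping the capture state on every row makes it possible to compare the same feature across installation, before-restart, and after-restart traffic.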

The full research paper outlining the details of the dataset and its underlying principles:

Arash Habibi Lashkari, Andi Fitriah A. Kadir, Laya Taheri, and Ali A. Ghorbani, “Toward Developing a Systematic Approach to Generate Benchmark Android Malware Datasets and Classification”, in Proceedings of the 52nd IEEE International Carnahan Conference on Security Technology (ICCST), Montreal, Quebec, Canada, 2018.


**************************************************************************************************


Dataset 2:
Android Adware and General Malware Dataset (AAGM):


A labeled dataset of mobile malware traffic from real smartphones, built with nine new flow-based network traffic features. This dataset includes 1900 benign and malicious apps in 12 different families. The 400 malware apps are from two categories: adware (250), and general malware (150).
The creation of the AAGM dataset is divided into three phases, shown in the figure below.
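The composition figures quoted for AAGM can be cross-checked with a few lines of arithmetic. This sketch assumes that every app not counted as malware is benign; the function name `summarize` and the dictionary keys are hypothetical.

```python
# Counts as stated in the dataset description.
TOTAL_APPS = 1900
MALWARE_BY_CATEGORY = {"adware": 250, "general_malware": 150}


def summarize(total: int, malware_by_category: dict) -> dict:
    """Derive malware and benign counts, assuming non-malware apps are benign."""
    malware = sum(malware_by_category.values())
    benign = total - malware
    return {
        "malware": malware,
        "benign": benign,
        "malware_share": malware / total,
    }


summary = summarize(TOTAL_APPS, MALWARE_BY_CATEGORY)
```

The two malware categories sum to the 400 malware apps mentioned above, which under the stated assumption leaves 1500 benign apps, so malware makes up roughly a fifth of the dataset.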

To further analyze our dataset, we employed Droidkin, a lightweight detector of Android app similarity. Droidkin is used to investigate the relationships among the three app categories: adware, general malware, and benign. The following figure visualizes the result of this analysis. The red circles represent categories, and the small black circles represent the apps that belong to those categories. Overall, there is only a weak relationship between the three categories.
For more information about this dataset, please see the related published paper:

Arash Habibi Lashkari, Andi Fitriah A. Kadir, Hugo Gonzalez, Kenneth Fon Mbah, and Ali A. Ghorbani, “Towards a Network-Based Framework for Android Malware Detection and Characterization”, in Proceedings of the 15th International Conference on Privacy, Security and Trust (PST), Calgary, Canada, 2017.




To request the datasets, please visit the following links:
Dataset | Link | Description
CICAndMal2017 | CICAndMal2017 | Android malware running on real phones, along with network traffic, memory dumps, logs, permissions, API calls, and phone statistics data
AAGM | AAGM | Android Adware and General Malware Dataset
If you use our datasets in your research, please also cite the corresponding published papers.