TY - GEN
T1 - AndroCT
T2 - 18th IEEE/ACM International Conference on Mining Software Repositories, MSR 2021
AU - Li, Wen
AU - Fu, Xiaoqin
AU - Cai, Haipeng
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/5
Y1 - 2021/5
N2 - Data-driven approaches have proven to be promising in mobile software analysis, yet these approaches rely on sizable, high-quality datasets. For Android app analysis in particular, there have been several well-known datasets that are widely used by the community. However, there is still a lack of such datasets that represent the run-time behaviors of apps: existing datasets are largely static, whereas run-time datasets are essential for data-driven dynamic and hybrid analysis of apps. In this paper, we present AndroCT, a large-scale dataset on the run-time traces of function calls in 35,974 benign and malicious Android apps from ten historical years (2010 through 2019). These call traces were produced by running each sample app against automatically generated test inputs for ten minutes. Moreover, each app was exercised both on an emulator and on a real device, and the traces were separately curated. AndroCT has been used to build a novel dynamic profile of Android apps that has enabled several effective techniques and informative empirical studies concerning Android app security. We describe what this dataset includes, how it was created and stored, and how it has been used in the past and could be used in the future.
AB - Data-driven approaches have proven to be promising in mobile software analysis, yet these approaches rely on sizable, high-quality datasets. For Android app analysis in particular, there have been several well-known datasets that are widely used by the community. However, there is still a lack of such datasets that represent the run-time behaviors of apps: existing datasets are largely static, whereas run-time datasets are essential for data-driven dynamic and hybrid analysis of apps. In this paper, we present AndroCT, a large-scale dataset on the run-time traces of function calls in 35,974 benign and malicious Android apps from ten historical years (2010 through 2019). These call traces were produced by running each sample app against automatically generated test inputs for ten minutes. Moreover, each app was exercised both on an emulator and on a real device, and the traces were separately curated. AndroCT has been used to build a novel dynamic profile of Android apps that has enabled several effective techniques and informative empirical studies concerning Android app security. We describe what this dataset includes, how it was created and stored, and how it has been used in the past and could be used in the future.
KW - Android apps
KW - Dataset
KW - Function calls
KW - Tracing
UR - https://www.scopus.com/pages/publications/85113627287
U2 - 10.1109/MSR52588.2021.00076
DO - 10.1109/MSR52588.2021.00076
M3 - Conference contribution
AN - SCOPUS:85113627287
T3 - Proceedings - 2021 IEEE/ACM 18th International Conference on Mining Software Repositories, MSR 2021
SP - 570
EP - 574
BT - Proceedings - 2021 IEEE/ACM 18th International Conference on Mining Software Repositories, MSR 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 May 2021 through 19 May 2021
ER -