TY - GEN
T1 - Poster
T2 - 15th ACM International Conference on Mobile Systems, Applications, and Services, MobiSys 2017
AU - Shen, Feng
AU - Del Vecchio, Justin
AU - Mohaisen, Aziz
AU - Ko, Steven Y.
AU - Ziarek, Lukasz
PY - 2017/6/16
Y1 - 2017/6/16
AB - This paper proposes a new technique for detecting mobile malware based on information flow analysis. Our approach focuses on the structure of the information flows gathered by our analysis and on the patterns of behavior present in those flows. Our analysis gathers not only simple flows, which have a single source and a single sink, but also Multi-Flows, which either start from a single source and flow to multiple sinks, or start from multiple sources and flow to a single sink. This analysis captures the more complex behavior exhibited by both recent malware and recent benign applications. Our tool applies N-gram analysis to the sequences of API calls that occur along control-flow paths in Multi-Flows, which lets us identify both unique and common behavioral patterns and precisely characterize Multi-Flows with respect to app behavior. Using this approach, we show that information flow analysis must look beyond simple flows to be effective for malware detection. By analyzing recently collected malware, we show that malware has evolved beyond simply collecting sensitive information and immediately exposing it. Many previous systems focus on identifying the existence of simple information flows, i.e., they treat an information flow as just a (source, sink) pair. However, modern malware performs complex computations before, during, and after collecting sensitive information, and tends to aggregate data before exposing it. A simple (source, sink) view of information flow does not adequately capture such behavior. Our approach is distinguished by two features. First, our information flow analysis represents an information flow not as a simple (source, sink) pair, but as a sequence of API calls. This allows us to distinguish flows that share the same sources and sinks based on the computation performed along them.
Second, our information flow analysis detects Multi-Flows, i.e., flows that either start with a single source and flow to multiple sinks, or start with multiple sources and flow to a single sink. We treat each such flow as a single flow rather than as multiple distinct flows, which allows us to examine the structure of the flows themselves. Fig. 1 shows a comparison of a Multi-Flow and its corresponding simple flows. We use N-gram analysis to extract features from Multi-Flows and their API sequences, and use these features to perform SVM-based classification. Based on this approach, we build an open-source implementation of Multi-Flow analysis and API sequencing in the BlueSeal framework [2] [3] [1], along with N-gram analysis and an SVM-based classifier. We also conduct a detailed evaluation study that highlights the differences between old and new apps. Using app behavior extracted as features from both Multi-Flows and their API usage patterns, we apply machine learning techniques to automatically identify malware based on the structure of its computation over sensitive data. We test our tool on a set of 1,576 benign apps downloaded from Google Play and 2,422 known malicious apps. Our results show that differences in app behavior on sensitive data can be a significant factor in malware detection.
UR - https://www.scopus.com/pages/publications/85026239531
U2 - 10.1145/3081333.3089315
DO - 10.1145/3081333.3089315
M3 - Conference contribution
AN - SCOPUS:85026239531
T3 - MobiSys 2017 - Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services
SP - 171
BT - MobiSys 2017 - Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services
PB - Association for Computing Machinery, Inc
Y2 - 19 June 2017 through 23 June 2017
ER -