Recently, the number of cases in which malicious code is distributed by exploiting websites that provide image search has continued to increase, and malicious code distributed through such websites is causing personal information theft and frequent DDoS attacks. We collected the distribution patterns of malicious code hidden on image search websites and analyzed the collected malicious code and malicious scripts. From the malicious samples, we derived additional distribution patterns of web-based malware. Similar patterns are grouped together, and a representative feature is then extracted from each group. Each category of malicious samples contains malicious script code and its variants. We implemented a system that automatically detects malicious websites using these malicious script patterns. The proposed malicious script patterns are expected to be applicable to the detection of zero-day attacks.
1. Introduction

Personal information breaches and DDoS attacks that begin by infecting a user’s PC with malicious code have recently been on the rise. Among the various infection routes, the distribution of malicious code by exploiting vulnerabilities in websites and user PCs is growing steadily [1]. Websites involved in such infections are of two types: a distribution site, which hosts the malicious code itself, and an intermediate site, which automatically connects a visitor’s PC to a distribution site through an address implanted in its code. Hackers set up a distribution site containing malicious code, compromise an intermediate site, and insert the URL of the distribution site into it. Any visitor to the intermediate site is then unwittingly redirected to the distribution site, and his or her PC becomes infected with malicious code. High-traffic websites such as portals, blogs, and bulletin boards are favored targets. Moreover, detecting the malicious code is becoming more difficult because hackers obfuscate it before implanting it on the target websites [2].

Methods of detecting malicious code hidden on a website fall into two categories. The first is signature-based detection, which checks the source code of a website for inserted malicious code [3]. It is fast, but it performs poorly against zero-day attacks. The second is behavior-based detection, which visits a website and tracks the resulting status changes, such as modification of files on the visitor’s PC or execution of malicious code [4]. It is slow, but it performs well against zero-day attacks.
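The contrast between the two categories can be made concrete with a minimal sketch of signature-based detection, in which the page source is searched for exact strings already known from analyzed malware. The signature set, the `.example` URL, and the function name `scan_with_signatures` are hypothetical placeholders, not a real signature database. The second sample page shows why exact matching struggles with new or obfuscated code: splitting a single identifier is enough to evade the signature.

```python
# Minimal sketch of signature-based detection (hypothetical signatures).
# Each signature is an exact string taken from previously analyzed malware.
KNOWN_SIGNATURES = {
    "http://malware-dist.example/payload.js",  # hypothetical distribution-site URL
    "document.write(unescape('%3Cif",          # hypothetical known script fragment
}


def scan_with_signatures(page_source: str, signatures=KNOWN_SIGNATURES) -> list:
    """Return, in sorted order, every signature found verbatim in the page source."""
    return sorted(sig for sig in signatures if sig in page_source)


# A page containing a known fragment is flagged...
infected = "<script>document.write(unescape('%3Ciframe src=x'))</script>"
# ...but the same behavior with the object name split apart is not.
obfuscated = "<script>var d=window['docu'+'ment'];d.write(unescape('%3Ciframe src=x'))</script>"

if __name__ == "__main__":
    print(scan_with_signatures(infected))    # ["document.write(unescape('%3Cif"]
    print(scan_with_signatures(obfuscated))  # []
```

The lookup is a plain substring test, which is what makes this approach fast; it is also why any variant not already in the signature set goes unnoticed.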
This paper proposes a new detection method, distinct from the existing ones, that checks web page scripts for the obfuscation patterns typical of malicious scripts [5]. It extracts malicious distribution patterns by analyzing malicious scripts distributed via web pages [6]. Because the extracted patterns are registered and checked as detection rules, the method retains the speed of signature-based detection while still being able to detect zero-day attacks [7]. The study also examines patterns and technology trends in malicious code distribution: by analyzing the distribution patterns of new malicious code, it presents how these patterns have changed and how they are likely to change in the future. Based on the collected distribution patterns, it analyzes vulnerability types and develops detection algorithms and similarity comparison algorithms for each malicious code distribution pattern [8]. The developed algorithms are to be applied in automated modules that analyze malicious code distribution patterns [9].

In this paper, we compare three recent methods for detecting malicious code on image-providing websites: signature-based detection, behavior-based detection, and script pattern-based detection. The purpose of this paper is to efficiently detect malicious code that targets image-providing sites [10]. A limitation of this paper is that it is difficult to collect a wide variety of malicious scripts [11].

This paper consists of five sections. Section 2 describes the security vulnerabilities of web applications and the distribution paths of malicious code reported in existing studies, and it analyzes and classifies previous studies on website detection methods. Section 3 analyzes the distribution patterns of malicious scripts and describes the proposed website detection method. Section 4 explains the experiment conducted with the proposed method and its results. Finally, Section 5 concludes the paper.

2. Existing Study

2.1. Security Vulnerabilities of Web Applications

Hackers exploit the vulnerabilities of a web application to conceal malicious code within web pages that serve as intermediate sites for distributing and spreading malicious code [12]. The Open Web Application Security Project (OWASP) defines the security vulnerabilities of web applications that hackers target, as listed in Table 1.

Table 1. Web application security vulnerabilities and response methods.
2.2. Paths of Malicious Code Distribution

There are two main paths by which the PCs of Internet users become infected with malicious code: via a distribution site that hides malicious code, or via an intermediate site that conceals executable code which automatically redirects the visitor to a distribution site. Figure 1 shows the intermediate and distribution sections used in distributing malicious code through web pages [13].
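The redirect that an intermediate site hides is often an invisible iframe or an external script pointing to the distribution site. The following minimal sketch, built on Python's standard `html.parser` and `urllib.parse` modules, lists such candidate redirect targets in a page. The class and function names, the zero-size or `display:none` iframe heuristic, and the other-host rule for scripts are illustrative assumptions, not rules from this paper; legitimate pages also load external scripts (for example, from CDNs), so the output is only a list of candidates.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RedirectTargetFinder(HTMLParser):
    """Collect URLs loaded through hidden iframes or scripts from other hosts."""

    def __init__(self, page_host: str):
        super().__init__()
        self.page_host = page_host
        self.targets = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        src = a.get("src")
        if not src:
            return
        if tag == "iframe":
            # Assumed heuristic: zero-size or display:none frames load pages unseen
            style = (a.get("style") or "").replace(" ", "").lower()
            if a.get("width") == "0" or a.get("height") == "0" or "display:none" in style:
                self.targets.append(src)
        elif tag == "script":
            # Scripts fetched from another host are candidate redirect targets
            host = urlparse(src).hostname
            if host and host != self.page_host:
                self.targets.append(src)


def find_redirect_targets(html: str, page_host: str) -> list:
    """Return candidate distribution-site URLs embedded in an (intermediate) page."""
    finder = RedirectTargetFinder(page_host)
    finder.feed(html)
    finder.close()
    return finder.targets
```

In practice each returned URL would itself be fetched and inspected, which is how a chain from an intermediate site to a distribution site can be followed.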
2.3. Malicious Code Detection System

Google Safe Browsing, MS HoneyMonkey, and UW Spycrawler are representative overseas malicious code detection systems. They are designed to detect malicious distribution sites, zero-day attacks, spyware websites, and so on [15].
Table 2 compares the malicious code detection methods of Google, MS, and the University of Washington. It indicates that behavior-based detection has low detection performance.

Table 2. Comparison of malicious code detection methods.
3. Proposed Method of Malicious Script Pattern Analysis

If a website is hacked and its update file is forged, every user PC that downloads the file will be infected with malware. For this reason, real-time forgery checking is essential. As shown in Fig. 4, the website sensor network detects update file forgery as follows. The website operator must register the update file on the inspection server before distributing it to users. Forgery detection is then performed in real time by comparing the hash value of the update file on the website with the registered hash value. In addition to the existing signature-based and behavior-based analyses, this paper proposes a pattern analysis method for malicious scripts to verify malicious distribution sites, which offers high detection speed while extending to the detection of zero-day attacks.

3.1. Analyzing Malicious Script Distribution Patterns

To analyze malicious script patterns, we crawled 500 domestic and overseas websites that had been confirmed as malicious distribution sites. Of these, 95% used some form of web script, and the remaining 5% had scripts inserted into multimedia files. The analyzed distribution patterns are listed in Table 3.

Table 3. Analysis results on the distribution patterns of malicious scripts.
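The update file forgery check described at the start of Section 3 reduces to comparing hash values. The minimal sketch below assumes SHA-256 (the text only says "hash value") and uses a hypothetical in-memory `InspectionServer` class in place of the real inspection server; real-time checking would amount to calling `is_forged` on a schedule or on every change to the served file.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the file contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class InspectionServer:
    """Hypothetical registry of update-file hashes submitted by the website operator."""

    def __init__(self):
        self._registered = {}

    def register(self, file_name: str, data: bytes) -> None:
        # Step 1: the operator registers the file BEFORE distributing it to users.
        self._registered[file_name] = sha256_hex(data)

    def is_forged(self, file_name: str, served_data: bytes) -> bool:
        # Step 2: compare the file currently served by the website with the
        # registered hash; an unregistered file is treated as suspicious too.
        expected = self._registered.get(file_name)
        return expected is None or expected != sha256_hex(served_data)


if __name__ == "__main__":
    server = InspectionServer()
    server.register("update.exe", b"official build 1.0")
    print(server.is_forged("update.exe", b"official build 1.0"))  # False
    print(server.is_forged("update.exe", b"trojanized build"))    # True
```

Treating an unregistered file as forged is a design choice of this sketch; it enforces the rule that registration must precede distribution.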
Table 4 shows the 10 detection algorithms used to detect malicious script patterns. The first and second algorithms detect the encoding and obfuscation of large character strings. The third pattern covers obfuscation in which the web page source code is disassembled and reassembled. The fourth pattern concerns the eval function, which obfuscated scripts use to convert character strings written as formulas into numbers. The fifth pattern was derived because special characters and symbols are used frequently inside script tags when the source is encoded or obfuscated. The sixth pattern covers the us-ascii and jscript.encode encoding methods, which are not used in normal web pages but are frequently used in malicious scripts. The seventh pattern covers cases in which a file is loaded to distribute malicious code as if it were a multimedia file. The eighth and ninth patterns detect cases in which an exe executable file or a web shell script is used as a malicious exploit file. The 10th pattern detects cases in which a malicious distribution file is secretly loaded through the img tag, which normally displays an image file.

Table 4. Malicious script detection algorithm.
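Several of these patterns can be approximated with regular expressions over the page source. The sketch below is hypothetical: the pattern names, regexes, and the 0.3 symbol-density threshold are assumptions loosely modeled on the first, fourth, fifth, sixth, eighth, and 10th patterns as described above, not the rules of Table 4. The third, seventh, and ninth patterns are omitted because the text does not describe them in enough detail to sketch.

```python
import re

# Hypothetical regexes loosely modeled on some of the patterns described above.
# They are NOT the Table 4 rules.
PATTERNS = {
    # long runs of %XX, \xXX or \uXXXX escapes (large encoded strings)
    "long_escaped_string": re.compile(
        r"(?:%[0-9a-f]{2}|\\x[0-9a-f]{2}|\\u[0-9a-f]{4}){50,}", re.I),
    # eval applied directly to decoded or computed character data
    "eval_decoding": re.compile(
        r"\beval\s*\(\s*(?:unescape|String\.fromCharCode)\s*\(", re.I),
    # us-ascii charset or JScript.Encode language attribute
    "unusual_encoding": re.compile(
        r"charset\s*=\s*[\"']?us-ascii|language\s*=\s*[\"']?jscript\.encode", re.I),
    # direct references to Windows executables
    "exe_reference": re.compile(
        r"(?:src|href)\s*=\s*[\"']?[^\"'\s>]+\.exe\b", re.I),
    # img tags whose src is not an image
    "img_non_image_src": re.compile(
        r"<img\b[^>]*\bsrc\s*=\s*[\"']?[^\"'\s>]+\.(?:exe|js|php|asp|html?)\b", re.I),
}

SCRIPT_BODY = re.compile(r"<script\b[^>]*>(.*?)</script>", re.I | re.S)


def special_char_density_high(html: str, threshold: float = 0.3, min_len: int = 20) -> bool:
    """True if some script body is dominated by symbols (threshold is an assumption)."""
    for body in SCRIPT_BODY.findall(html):
        chars = [c for c in body if not c.isspace()]
        if len(chars) >= min_len:
            symbols = sum(1 for c in chars if not c.isalnum())
            if symbols / len(chars) > threshold:
                return True
    return False


def match_patterns(html: str) -> set:
    """Return the names of all patterns that fire on the page source."""
    hits = {name for name, rx in PATTERNS.items() if rx.search(html)}
    if special_char_density_high(html):
        hits.add("special_char_density")
    return hits
```

For example, `match_patterns('<img src="http://dist.example/run.exe" width="0">')` fires both `exe_reference` and `img_non_image_src`. Because rules this coarse also fire on some benign pages, a match marks a page as suspicious rather than malicious.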
3.2. Proposed Detection Method Based on Malicious Script Pattern

The malicious script distribution patterns proposed in this paper can also appear on normal web pages. Therefore, detecting a pattern on a web page does not by itself prove that the page is being exploited as an intermediate or distribution site. However, web pages showing such patterns can be classified as suspicious, and an importance value for each pattern can be used to increase the certainty of that judgment. In this proposal, we calculated the importance of each pattern from how frequently it is used on normal web pages so that the level of suspicion can be expressed quantitatively.
Table 5 shows the importance calculated after measuring the false-positive rate of each malicious script distribution algorithm against a whitelist of roughly 100,000 entries provided by the Korea Internet and Security Agency. The first, fifth, and seventh algorithms have low importance because of their high false-positive rates; low importance indicates that the corresponding distribution patterns are also widely used in normal web pages. The proposed method is effective because it detects by script pattern matching, which keeps algorithmic complexity and memory load low.

Table 5. Calculation of importance.
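The exact formula behind Table 5 is not given in the text, so the following is only a minimal sketch under a stated assumption: importance falls linearly with the false-positive rate measured on the whitelist, that is, importance = `max_importance` × (1 − FP rate). The function name and the cap `MAX_IMPORTANCE = 50` are hypothetical; the cap is chosen below the 60-point detection threshold used in the experiment, so that no single pattern can mark a page as detected on its own, in line with the observation that one matched pattern is not proof of compromise.

```python
MAX_IMPORTANCE = 50.0  # hypothetical cap, kept below the 60-point detection threshold


def importance_from_whitelist(hit_counts: dict, whitelist_size: int,
                              max_importance: float = MAX_IMPORTANCE) -> dict:
    """Map each pattern to an importance score from its false-positive rate.

    Assumption: importance = max_importance * (1 - FP rate), where the FP rate
    is the share of whitelisted (benign) pages on which the pattern fires.
    """
    if whitelist_size <= 0:
        raise ValueError("whitelist_size must be positive")
    importance = {}
    for name, hits in hit_counts.items():
        if not 0 <= hits <= whitelist_size:
            raise ValueError(f"invalid hit count for {name}: {hits}")
        fp_rate = hits / whitelist_size
        importance[name] = round(max_importance * (1.0 - fp_rate), 2)
    return importance


if __name__ == "__main__":
    # Hypothetical counts of benign pages (out of 100,000) on which each pattern fired
    counts = {"exe_reference": 120, "eval_decoding": 2_500, "special_char_density": 18_000}
    print(importance_from_whitelist(counts, 100_000))
```

A pattern that fires on many whitelisted pages, as the first, fifth, and seventh algorithms do, ends up with a lower weight. A steeper (for example, logarithmic) mapping would separate the patterns more sharply; the linear form is used here only for simplicity.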
3.3. Cyber Training Education

The malicious script patterns presented in this paper also make cyber training possible. On the attack side, trainees can practice hacking with the malicious script patterns; on the defense side, trainees can practice detecting hacking attacks that use malicious scripts. Teams can attack each other’s websites and compete to detect the attacks, which allows high-level cyber training.

4. Experiment

As shown in Fig. 5, the proposed detection method based on malicious script distribution patterns collects information from websites, matches it against distribution patterns, and detects malicious distribution and intermediate sites. It is designed to respond to zero-day attacks by continually adding distribution patterns obtained from new analyses. In the static analysis stage, the detection system collects the website to be checked with a crawler and inspects its source code for malicious scripts using anomaly detection patterns. These patterns check the website source for two types of anomalies: links to websites that spread malware, and malicious scripts that attack PC vulnerabilities. It is therefore important to collect and analyze as many malware cases as possible and register them as detection patterns, so that the detection system can find them when they are concealed. The experiment on the proposed detection method used algorithms extracted from 500 samples of malicious distribution and hopping (intermediate) sites revealed by Google and MS. Each algorithm was weighted according to its importance, and a page was considered detected when its calculated risk score was 60 or higher. The results for each malicious distribution pattern algorithm are shown in Table 6. All algorithms show high detection rates except the first, second, and third.
These three algorithms show relatively high nondetection rates because character strings can be disassembled and reassembled in many different ways, which limits how far detection algorithms for them can be developed.

Table 6. Malicious script detection algorithm.
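The scoring step of the experiment, in which algorithms are weighted by importance and a page is detected at a risk score of 60 or higher, can be sketched as follows. The aggregation rule (sum of the importance of matched patterns, capped at 100) is an assumption, and the values in `EXAMPLE_IMPORTANCE` are illustrative placeholders, not the values of Table 5. The helper `suspected_share` is likewise hypothetical; it computes the kind of percentage-versus-threshold quantity plotted in Fig. 6.

```python
DETECTION_THRESHOLD = 60.0  # risk score at or above which a page counts as detected

# Illustrative importance values only; they are NOT the values of Table 5.
EXAMPLE_IMPORTANCE = {
    "exe_reference": 45.0,
    "img_non_image_src": 40.0,
    "eval_decoding": 30.0,
    "special_char_density": 10.0,
}


def risk_score(matched: set, importance: dict) -> float:
    """Assumed aggregation: sum of the importance of matched patterns, capped at 100."""
    return min(100.0, float(sum(importance.get(name, 0.0) for name in matched)))


def classify(matched: set, importance: dict = EXAMPLE_IMPORTANCE,
             threshold: float = DETECTION_THRESHOLD) -> tuple:
    """Return (risk score, detected?) for the set of patterns found on a page."""
    score = risk_score(matched, importance)
    return score, score >= threshold


def suspected_share(scores: list, threshold: float) -> float:
    """Percentage of pages whose risk score is at or above the threshold."""
    if not scores:
        return 0.0
    return 100.0 * sum(1 for s in scores if s >= threshold) / len(scores)


if __name__ == "__main__":
    print(classify({"exe_reference", "img_non_image_src"}))  # (85.0, True)
    print(classify({"special_char_density"}))                 # (10.0, False)
```

Sweeping `threshold` in `suspected_share` over a set of scored pages yields a curve of the kind shown in Fig. 6: raising the threshold can only keep or lower the share of pages flagged.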
Figure 6 shows how the percentage of suspected web pages changes with the threshold. The higher the threshold, the lower the percentage of web pages classified as risky; in particular, the percentage drops sharply at around 86 points. Thus, the proposed malicious script distribution patterns make it possible to analyze websites by risk importance.

5. Conclusion

Although the distribution of malicious code exploiting the vulnerabilities of websites and web browsers has recently been increasing, existing malicious code detection methods do not consider the characteristics of the web. There are two existing approaches to malicious code detection: signature-based detection, which identifies malicious code, and behavior-based detection, which tracks status changes caused by visiting a website. Signature-based detection is fast but ineffective against zero-day attacks, whereas behavior-based detection is effective against zero-day attacks but slow. Therefore, a detection method that considers the characteristics of the web is required to detect attacks on websites effectively. This paper proposed a method to detect distribution and intermediate sites based on malicious script distribution patterns. We extracted distribution patterns by analyzing the scripts of malicious distribution sites and then classifying their common characteristics. Whereas the existing detection methods focus on the execution of malicious code, the proposed distribution pattern detection method analyzes client-side scripts, taking the characteristics of the web into account to improve detection speed and capability. As malicious distribution sites increasingly exploit the vulnerabilities of smart devices, future research on malicious script distribution patterns for the mobile web is needed.
The purpose of this study is to introduce a system for detecting malware distributed through websites. The system is expected to detect malware quickly and systematically and to help reveal the hidden purposes of attackers. The advantage of this method is that it detects patterns quickly; its drawback is that it cannot detect new malware whose patterns have not yet been registered.

References

1. B. A. Khalaf et al., “Comprehensive review of artificial intelligence and statistical approaches in distributed denial of service attack and defense methods,” IEEE Access, 7, 51691–51713 (2019). https://doi.org/10.1109/ACCESS.2019.2908998
2. Y. Wang et al., “Detecting stealth software with strider GhostBuster,” in Proc. DSN, 368–377 (2005). https://doi.org/10.1109/DSN.2005.39
3. S. Singhal, U. Chawla and R. Shorey, “Machine learning and concept drift based approach for malicious website detection,” in Int. Conf. Commun. Syst. & Netw. (COMSNETS), 582–585 (2020). https://doi.org/10.1109/COMSNETS48256.2020.9027485
4. K. Rieck et al., “Learning and classification of malware behavior,” Lect. Notes Comput. Sci., 5137, 108–125 (2008).
5. H. Mishra, R. K. Karsh and K. Pavani, “Anomaly-based detection of system-level threats and statistical analysis,” in Smart Comput. Paradigms: New Progr. and Chall., 271–279 (2019). https://doi.org/10.1007/978-981-13-9680-9_23
6. J. Wang, A. Ghosh and Y. Huang, “Web canary: a virtualized web browser to support large-scale silent collaboration in detecting malicious web sites,” in Proc. CollaborateCom, 24–33 (2008). https://doi.org/10.1007/978-3-642-03354-4_3
7. P. Likarish, E. Jung and I. Jo, “Obfuscated malicious javascript detection using classification techniques,” in Proc. MALWARE, 47–54 (2009). https://doi.org/10.1109/MALWARE.2009.5403020
8. R. Panigrahi et al., “A consolidated decision tree-based intrusion detection system for binary and multiclass imbalanced datasets,” Mathematics, 9(7), 751 (2021). https://doi.org/10.3390/math9070751
9. S. A. Mostafa et al., “Formulating layered adjustable autonomy for unmanned aerial vehicles,” Int. J. Intell. Comput. Cybern., 10, 430–450 (2017). https://doi.org/10.1108/IJICC-02-2017-0013
10. C. L. Chowdhary et al., “Analytical study of hybrid techniques for image encryption and decryption,” Sensors, 20(18), 5162 (2020). https://doi.org/10.3390/s20185162
11. N. Khan, J. Abdullah and A. S. Khan, “Defending malicious script attacks using machine learning classifiers,” Wireless Commun. Mobile Comput., 2017, 9 (2017). https://doi.org/10.1155/2017/5360472
12. X. Lu et al., “A universal malicious documents static detection framework based on feature generalization,” Appl. Sci., 11(24), 12134 (2021). https://doi.org/10.3390/app112412134
13. Y. Hou et al., “Malicious web content detection by machine learning,” Expert Syst. Appl., 37(1), 55–60 (2010). https://doi.org/10.1016/j.eswa.2009.05.023
14. D. Liu and J. H. Lee, “CNN based malicious website detection by invalidating multiple web spams,” IEEE Access, 8, 97258–97266 (2020). https://doi.org/10.1109/ACCESS.2020.2995157
15. M. Cova, C. Krügel and G. Vigna, “Detection and analysis of drive-by-download attacks and malicious javascript code,” in Proc. WWW, 281–290 (2010). https://doi.org/10.1145/1772690.1772720
16. M. So-Yeon et al., “Design of comprehensive security vulnerability analysis system through efficient inspection method according to necessity of upgrading system vulnerability,” J. Korea Acad. Ind. Cooperation Soc., 18(7), 1–8 (2017). https://doi.org/10.5762/KAIS.2017.18.7.1
17. K. H. Kim, D. I. Lee and Y. T. Shin, “Research on cloud-based on web application malware detection methods,” Lect. Notes Electr. Eng., 474, 817–822 (2018). https://doi.org/10.1007/978-981-10-7605-3_130
18. K. Pavani, H. Mishra and R. Karsh, “Multi-attached network topology with different routing protocols and stub network resolution in OSPF routing,” in Proc. Third Int. Conf. Microelectron., Comput. and Commun. Syst., 129–141 (2019). https://doi.org/10.1007/978-981-13-7091-5_12
19. K. Nandhini and R. Balasubramaniam, “Malicious website detection using probabilistic data structure bloom filter,” in 3rd Int. Conf. Comput. Methodologies and Commun. (ICCMC), 311–316 (2019). https://doi.org/10.1109/ICCMC.2019.8819818
20. S. Kyung-Sang and N. Wonshik, “A study on the implementation of a system providing reliable malware information service,” Int. J. Electr. Eng. Educ., 58(2), 517–530 (2019). https://doi.org/10.1177/0020720919828982
21. C. Sharma, S. C. Jain and A. K. Sharma, “A quantitative risk analysis methodology for the security of web application database against SQL injection (SQLi) attacks utilizing fuzzy logic system as computational technique,” Int. J. Electr. Eng. Educ. (2019). https://doi.org/10.1177/0020720919847542
22. M. A. Mohammed et al., “Implementing an agent-based multi-natural language anti-spam model,” in Int. Symp. Agent, Multi-Agent Syst. and Rob. (ISAMSR), 1–5 (2018). https://doi.org/10.1109/ISAMSR.2018.8540555
23. T. Shibahara et al., “Detecting malicious websites by integrating malicious, benign, and compromised redirection subgraph similarities,” in IEEE 41st Annu. Comput. Software and Appl. Conf. (COMPSAC), 655–664 (2017). https://doi.org/10.1109/COMPSAC.2017.105
24. H.-W. Hsiao, D.-N. Chen and T. J. Wu, “Detecting hiding malicious website using network traffic mining approach,” in 2nd Int. Conf. Educ. Technol. and Comput., V5-276–V5-280 (2010). https://doi.org/10.1109/ICETC.2010.5530064
25. G. Tan et al., “Adaptive malicious URL detection: learning in the presence of concept drifts,” in 17th IEEE Int. Conf. Trust, Security and Privacy in Comput. and Commun./12th IEEE Int. Conf. Big Data Sci. and Eng. (TrustCom/BigDataSE), 737–743 (2018). https://doi.org/10.1109/TrustCom/BigDataSE.2018.00107
26. W. Chia-Chun et al., “A trustworthy web-based system platform for teaching evaluation and STEM education,” Int. J. Electr. Eng. Educ. (2019). https://doi.org/10.1177/0020720919853427
Biography

Yong-joon Lee received his PhD in computer science from Soongsil University in 2005. From 2006 to 2009, he was a deputy researcher at the LG CNS Technology Research Department. From 2010 to 2015, he was a senior research fellow with the Korea Internet and Security Agency. From 2016 to 2019, he was a digital forensic research officer at the Information Security Office, Defense Security Support Command, Republic of Korea. He is currently an assistant professor with the Department of Cyber Security, Far East University, Republic of Korea. His research interests include industrial security, cybersecurity, and internal information leakage prevention.