
Published: Jan 17, 2025

Advanced phishing detection with AI techniques


Executive summary

Phishing, an increasingly complex cyber threat, now extends beyond email to other communication channels such as text messages and social media. 

These attacks, which aim to steal personal data or commit financial fraud, rely on sophisticated social engineering that can slip past traditional firewall defences. Phishing's global reach, the financial damage it causes, and its increasingly targeted nature call for more advanced solutions. 

This white paper introduces a novel AI-based approach using Natural Language Processing (NLP) and computer vision to counter these evolving cyber threats. 

The scope of phishing

Phishing attacks have become a prominent global threat and consistently rank among the top cybersecurity concerns in numerous reports. They cause significant financial losses every year: the FBI's Internet Crime Complaint Center (IC3) reports that more than US$1.8 billion was lost in the United States in 2021 to business email compromise (BEC) scams alone. Both the frequency and the sophistication of phishing attacks have been increasing, as cybercriminals adopt more advanced methods to evade detection. 

Targeted spear-phishing 

Spear-phishing attacks are highly targeted attempts in which cybercriminals tailor their deceptive messages to specific individuals or organisations. These attacks tend to be more successful than generic phishing because they often use personal information to create a semblance of legitimacy. 

Industry-specific vulnerabilities 

Certain sectors, notably finance, healthcare, and government, are more vulnerable to phishing attacks due to the sensitive nature of the data they manage. 

Phishing detection techniques 

Current phishing detection solutions range from simple rule-based heuristics to advanced machine learning and AI-based methods. These solutions include user behaviour monitoring, sandboxing of suspicious links or attachments, URL analysis using blacklists and reputation databases, and real-time URL scanning. Multiple deep learning models, including computer vision and NLP models, can be combined to create a novel, comprehensive anti-phishing solution. 
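The simplest of these techniques are rule-based URL heuristics combined with a blocklist lookup, which can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions:

- The blocklist and shortener entries are hypothetical.
- The flag names are hypothetical.
- The thresholds (75 characters, four or more dots) are hypothetical.

A production system would query live reputation databases rather than hard-coded sets.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical example lists; a real system would load these from
# blocklist and reputation feeds instead of hard-coding them.
BLOCKLIST = {"malicious.example"}
URL_SHORTENERS = {"bit.ly", "tinyurl.com", "t.co"}


def url_heuristics(url: str) -> list:
    """Return the names of simple rule-based red flags raised by a URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    flags = []
    if host in BLOCKLIST:
        flags.append("blocklisted")
    if host in URL_SHORTENERS:
        flags.append("url_shortener")  # the real destination is hidden
    try:
        ipaddress.ip_address(host)
        flags.append("raw_ip_host")  # no domain name at all
    except ValueError:
        pass
    if "@" in parts.netloc:
        flags.append("userinfo_in_url")  # e.g. https://bank.com@evil.example/
    if host.count(".") >= 4:  # hypothetical threshold
        flags.append("many_subdomains")
    if parts.scheme != "https":
        flags.append("no_https")
    if len(url) > 75:  # hypothetical threshold
        flags.append("long_url")
    return flags


print(url_heuristics("http://192.168.0.1/login"))  # ['raw_ip_host', 'no_https']
```

Each rule is cheap and explainable, but each is also easy for attackers to sidestep. That limitation is what motivates the learned models discussed in this paper.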

The growing threat of phishing

Phishing in 2023 

Around 4,100 phishing attempts were reported to the Singapore Cyber Emergency Response Team (SingCERT) in 2023, less than half of the number reported in 2022. Notwithstanding the decline, the number of phishing attempts was still about 30% higher than in 2021. 

Evolving phishing strategies 

Bad actors are constantly refining their phishing strategies to bypass conventional detection methods. They often use URL shortening services to obscure malicious links and create webpages that are visually indistinguishable from legitimate sites. 

Advanced phishing schemes also render text as images to evade text-based detection. They also employ homograph attacks, which use domain names built from look-alike characters, to trick users into visiting harmful websites. 
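A homograph check can be sketched in three steps:

1. Decode punycode (`xn--`) labels.
2. Map look-alike characters to their ASCII counterparts.
3. Test whether the resulting "skeleton" matches a protected domain.

This sketch rests on two assumptions:

- The confusables table is a tiny hand-picked subset. Real use would need the full Unicode confusables data from Unicode Technical Standard #39.
- The `protected` set of legitimate domains is hypothetical.

```python
import unicodedata
from typing import Optional, Set

# Small, illustrative subset of Unicode confusables (UTS #39); a real
# system would load the full confusables data instead.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u043e": "o",  # Cyrillic small letter o
    "\u0440": "p",  # Cyrillic small letter er
    "\u0441": "c",  # Cyrillic small letter es
    "\u0445": "x",  # Cyrillic small letter ha
    "\u0456": "i",  # Cyrillic small letter Byelorussian-Ukrainian i
    "\u03bf": "o",  # Greek small letter omicron
}


def to_unicode(domain: str) -> str:
    """Decode punycode (xn--) labels so look-alike characters become visible."""
    try:
        return domain.encode("ascii").decode("idna")
    except UnicodeError:
        return domain  # already Unicode, or not valid IDNA


def normalise(domain: str) -> str:
    return unicodedata.normalize("NFKC", to_unicode(domain).lower())


def homograph_target(domain: str, protected: Set[str]) -> Optional[str]:
    """Return the protected domain that `domain` imitates, or None."""
    unicode_form = normalise(domain)
    if unicode_form in protected:
        return None  # it is the legitimate domain itself
    skeleton = "".join(CONFUSABLES.get(ch, ch) for ch in unicode_form)
    return skeleton if skeleton in protected else None


protected = {"apple.com", "paypal.com"}  # hypothetical protected list
print(homograph_target("\u0430pple.com", protected))  # apple.com
```

Decoding punycode first means the check catches the look-alike whether it arrives in Unicode or in its `xn--` form.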

Our innovative solution

In response to the sophisticated and evolving nature of phishing attacks, we have developed a state-of-the-art detection system that combines multiple deep learning models, including computer vision and NLP models. The system is built on a detailed understanding of phishing markers. It draws on a proactive knowledge base that is continuously updated with the latest information on popular websites worldwide, with a particular focus on Singapore. 

In Table 1, we compare the features offered by Google Safe Browsing, Microsoft Office 365, and other commercial off-the-shelf (COTS) solutions on the market with those of our solution: 

Table 1: Capabilities comparison table 

Advanced phishing detection techniques 

Brand verification 

Our system compares suspicious webpages with an extensive list of known, legitimate brand webpages. This identifies deceptive sites that mimic the appearance of an authentic brand but operate under a different domain. When we detect such a similarity, we flag the webpage as a phishing site that is targeting the brand it resembles. 
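The brand verification logic can be illustrated with a minimal sketch. It assumes each reference brand is stored with its legitimate domains and a visual embedding of its official page. The following are all hypothetical:

- the `BRANDS` table;
- the tiny three-dimensional embeddings;
- the 0.95 similarity threshold;
- the function names.

In practice the embeddings would come from a screenshot or logo model. A page is flagged only when it closely resembles a known brand but is served from a host outside that brand's domains.

```python
import math
from typing import Optional, Sequence

# Hypothetical reference data: legitimate domains plus a visual embedding
# of each brand's official page (tiny 3-D vectors for illustration only).
BRANDS = {
    "ExampleBank": {"domains": {"examplebank.com"}, "embedding": [0.9, 0.1, 0.3]},
    "ShopCo": {"domains": {"shopco.sg"}, "embedding": [0.1, 0.8, 0.5]},
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def is_brand_host(host: str, domains: set) -> bool:
    """True if host is one of the brand's domains or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


def verify_brand(host: str, page_embedding: Sequence[float],
                 threshold: float = 0.95) -> Optional[str]:
    """Return the brand a look-alike page impersonates, or None."""
    brand, score = max(
        ((name, cosine(page_embedding, ref["embedding"])) for name, ref in BRANDS.items()),
        key=lambda pair: pair[1],
    )
    if score < threshold:
        return None  # does not resemble any known brand closely enough
    if is_brand_host(host, BRANDS[brand]["domains"]):
        return None  # looks like the brand and is served from its own domain
    return brand  # looks like the brand but is hosted elsewhere: flag it


print(verify_brand("examplebank.com.evil.io", [0.9, 0.1, 0.3]))  # ExampleBank
```

The suffix check is deliberately strict. `examplebank.com.evil.io` contains the brand's domain but is not a subdomain of it, so it is still flagged.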

Employing computer vision 

We use advanced computer vision algorithms to analyse webpage screenshots and logos. This analysis is instrumental in detecting phishing sites that imitate the visual design of legitimate brands, and it improves the accuracy with which we identify potential threats. 
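Logo comparison can be illustrated with a perceptual hash. The sketch below uses average hashing (aHash) purely as a simple, dependency-free stand-in for the deep computer vision models described here, under these assumptions:

- Images are 2-D lists of grayscale values.
- The `known_logos` mapping is hypothetical.
- The 10-bit distance threshold is hypothetical.

```python
from typing import Dict, List, Optional

Image = List[List[int]]  # grayscale pixel rows, values 0-255


def average_hash(pixels: Image, size: int = 8) -> List[int]:
    """aHash: shrink to size x size by block averaging, then threshold at the mean."""
    h, w = len(pixels), len(pixels[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [
                pixels[y][x]
                for y in range(by * h // size, (by + 1) * h // size)
                for x in range(bx * w // size, (bx + 1) * w // size)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if c >= mean else 0 for c in cells]


def hamming(a: List[int], b: List[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def match_logo(candidate: Image, known_logos: Dict[str, List[int]],
               max_distance: int = 10) -> Optional[str]:
    """Return the brand whose stored logo hash is closest, if close enough."""
    if not known_logos:
        return None
    cand = average_hash(candidate)
    brand, dist = min(((b, hamming(cand, ref_hash)) for b, ref_hash in known_logos.items()),
                      key=lambda pair: pair[1])
    return brand if dist <= max_distance else None


# Two hypothetical 16x16 "logos": dark-left/bright-right and bright-top/dark-bottom.
left_dark = [[0 if x < 8 else 255 for x in range(16)] for _ in range(16)]
top_bright = [[255 if y < 8 else 0 for _ in range(16)] for y in range(16)]
known = {"BrandA": average_hash(left_dark), "BrandB": average_hash(top_bright)}
noisy_copy = [[10 if x < 8 else 240 for x in range(16)] for _ in range(16)]
print(match_logo(noisy_copy, known))  # BrandA
```

A slightly altered copy of a logo keeps the same hash, which is the property a counterfeit-logo check relies on. Learned visual features are far more robust to resizing, cropping and recolouring than aHash.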

Natural Language Processing 

Complementing our computer vision capabilities, we use NLP to extract and analyse information from the HTML source code of webpages. This helps us identify the linguistic patterns and cues that indicate phishing attempts. 
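As a rough illustration, the following sketch extracts visible text from HTML and looks for phishing cues. It uses Python's standard `html.parser` module and matches hand-picked phrases. The cue lists are hypothetical, and keyword matching is a much simpler substitute for the learned NLP models described above.

```python
import re
from html.parser import HTMLParser

# Hypothetical cue phrases; learned NLP models would replace these lists.
URGENCY_CUES = ("verify your account", "account suspended", "urgent", "within 24 hours")
CREDENTIAL_CUES = ("password", "one-time password", "otp", "login", "sign in")


class VisibleTextParser(HTMLParser):
    """Collect human-visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def find_cues(text: str, cues) -> list:
    """Return the cue phrases that appear in text as whole words."""
    return [c for c in cues if re.search(r"\b" + re.escape(c) + r"\b", text)]


def linguistic_cues(html: str) -> dict:
    text = extract_text(html).lower()
    return {"urgency": find_cues(text, URGENCY_CUES),
            "credential": find_cues(text, CREDENTIAL_CUES)}


page = ("<html><head><script>var password = 'x';</script></head><body>"
        "<h1>Account suspended</h1><p>Please sign in and verify your account "
        "within 24 hours.</p><input type='password'></body></html>")
print(linguistic_cues(page))
```

Skipping `<script>` and `<style>` content keeps the analysis focused on what a user would actually read. In the example, the word "password" inside the script does not count as a cue.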

Figure 1: NCS Multi-model Phishing Detection system 

As depicted in Figure 1, our detection system incorporates a suite of deep learning models, enabling us to detect sophisticated phishing attempts that elude traditional detection methods. We analyse multiple elements, including webpage layout, logos, and textual content, to ensure a comprehensive approach to phishing detection. 

Deep learning for explainable detection 

Our multi-model approach includes several critical components: 

  • Web page layout detector: Analyses webpage layouts for inconsistencies and anomalies that are characteristic of phishing sites

  • Logo matcher: Compares logos against known brands to detect counterfeit usage 

  • Domain checker: Scrutinises domains to identify suspicious or deceptive URLs 

  • Credential Request Page (CRP) classifier: Identifies pages designed to capture sensitive information 
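The sketch below shows one way these components' outputs might be combined into an explainable verdict. The fusion rule is hypothetical: at least two components must raise a flag, and one of them must be the CRP classifier. The component names and the `ComponentResult` type are also hypothetical. The point is that each flag carries a human-readable reason, so an analyst can see why a page was blocked.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ComponentResult:
    name: str         # hypothetical component id, e.g. "logo_matcher"
    suspicious: bool  # did this component raise a flag?
    evidence: str     # human-readable reason for an analyst


def combine(results: List[ComponentResult], min_flags: int = 2) -> Tuple[bool, List[str]]:
    """Hypothetical fusion rule: phishing if the CRP classifier fires and at
    least `min_flags` components (including it) flag the page."""
    flagged = [r for r in results if r.suspicious]
    crp_fired = any(r.name == "crp_classifier" for r in flagged)
    verdict = crp_fired and len(flagged) >= min_flags
    return verdict, [f"{r.name}: {r.evidence}" for r in flagged]


page = [
    ComponentResult("layout_detector", False, "layout consistent with brand"),
    ComponentResult("logo_matcher", True, "logo matches ExampleBank"),
    ComponentResult("domain_checker", True, "domain not owned by ExampleBank"),
    ComponentResult("crp_classifier", True, "page asks for a password"),
]
verdict, reasons = combine(page)
print(verdict)  # True
for reason in reasons:
    print(" -", reason)
```

Requiring the CRP flag encodes the idea that a look-alike page which asks for nothing sensitive is less urgent. Whether the system described here uses such a rule is not stated.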

Proactive knowledge base 

A key feature of our system is its proactive knowledge base, which serves several essential functions: 

  • Continuous updates: Houses and continuously updates current information on popular websites globally 

  • Focus on Singapore and government sites: Specifically tracks Singaporean websites and government portals for enhanced local protection 

  • Semi-automatic knowledge retrieval: Gathers information from multiple sources, including the Google Image API and Wikidata Extractor, to ensure a comprehensive and up-to-date database 
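The semi-automatic aspect can be sketched as a review queue. Domains gathered from external sources are held as pending until an analyst approves them, so unverified data never enters the trusted list. The `BrandEntry` structure, the `ingest` and `approve` functions, and the source labels are hypothetical. Logo data is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class BrandEntry:
    brand: str
    domains: Set[str] = field(default_factory=set)          # analyst-approved
    pending_domains: Set[str] = field(default_factory=set)  # awaiting review
    sources: Set[str] = field(default_factory=set)          # where data came from


def ingest(kb: Dict[str, BrandEntry], brand: str,
           domains: Iterable[str], source: str) -> BrandEntry:
    """Add automatically gathered domains as pending, never as trusted."""
    entry = kb.setdefault(brand, BrandEntry(brand))
    entry.pending_domains |= set(domains) - entry.domains
    entry.sources.add(source)
    return entry


def approve(kb: Dict[str, BrandEntry], brand: str, domain: str) -> None:
    """Analyst sign-off: move one pending domain into the trusted list."""
    entry = kb[brand]
    entry.pending_domains.remove(domain)  # KeyError if it was never proposed
    entry.domains.add(domain)


kb: Dict[str, BrandEntry] = {}
ingest(kb, "ExampleBank", ["examplebank.com"], "wikidata")
ingest(kb, "ExampleBank", ["examplebank.com", "examplebank.sg"], "image_search")
approve(kb, "ExampleBank", "examplebank.com")
print(kb["ExampleBank"].domains, kb["ExampleBank"].pending_domains)
# {'examplebank.com'} {'examplebank.sg'}
```

Re-ingesting an already approved domain leaves it trusted and does not send it back for review.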

Advantages of our solution

Our AI-powered approach offers several advantages over standard phishing detection solutions: 

  • Enhanced detection accuracy: The combination of computer vision and NLP enables us to detect phishing sites accurately, even when fraudsters use advanced tactics to mimic legitimate brands. In testing on the OpenPhish and Tranco datasets, our solution achieved an F1 score of 89%. Its precision reached 97.1%, meaning that fewer than 3% of the sites it flagged were actually legitimate. Its recall was 82.3%, meaning that it correctly detected about four in five of the malicious websites in the test data. These outcomes highlight the effectiveness of our methodology in identifying phishing threats and strengthening cybersecurity measures. 

  • Adaptability: As phishing techniques continue to evolve, our solution can be updated to recognise new patterns and trends to ensure ongoing protection against emerging threats. 

  • Reduced false positives: By employing multiple detection techniques, we reduce the likelihood of false positives and minimise disruptions to legitimate web traffic. 

  • Protection for targeted sectors: The financial services, government, and logistics sectors are frequent targets of phishing attacks, and can benefit greatly from our advanced detection system. 
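As a quick consistency check on the reported metrics, recall that F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The stated precision (97.1%) and recall (82.3%) should therefore reproduce the stated F1 score of about 89%. The helper functions below are a generic illustration, not the authors' evaluation code.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported above: precision 97.1%, recall 82.3%.
print(round(f1_score(0.971, 0.823), 3))  # 0.891
```

The result, 0.891, agrees with the 89% F1 score reported for the OpenPhish and Tranco evaluation.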

Conclusion

Phishing attacks are a persistent problem for individuals and organisations alike. The tactics employed by malicious actors are constantly evolving, making it difficult to rely on traditional rule-based firewall defences. Our AI-powered phishing detection solution leverages advanced NLP and computer vision techniques to provide a more robust defence against these malicious tactics. 

Our solution compares suspicious webpages to a database of legitimate brand sites, analyses visual elements, and extracts information from HTML source code to deliver enhanced detection accuracy, adaptability, and protection for targeted sectors. We believe our approach represents a significant step forward in the ongoing battle against phishing attacks, particularly in a world where cyber threats are constantly changing. 

References

Cyber Security Agency of Singapore. (2024). Singapore Cyber Landscape 2023. Retrieved from https://www.csa.gov.sg/docs/default-source/publications/2024/singapore-cyber-landscape-2023.pdf

