Artificial intelligence (AI) is making waves across many industries. It has found use in healthcare, manufacturing, retail, finance, and other sectors that deal with large volumes of data. Although AI does not yet possess the full capabilities of human intelligence, it has a distinct advantage in terms of speed. Once trained to recognize patterns in data and perform the necessary operations on them, it can process information quickly and efficiently.
In digital forensics and cyber incident response (DFIR), where the rapid discovery of relevant information in a multitude of files and records is crucial, AI can be a game-changer, helping save both lives and businesses.
The challenges in traditional digital forensics
One of the most significant challenges in modern digital forensics, both in the corporate sector and law enforcement, is the abundance of data. Due to increasing digital storage capacities, even mobile devices today can accumulate up to 1TB of information.
Given that DFIR cases can involve a handful of devices, it is not uncommon to have a few dozen terabytes of data within a single investigation. Such volumes make evidence processing and examination time-consuming, to say the least.
Digital forensics tools alleviate the burden of data abundance by automating many DFIR processes, such as the acquisition of data from electronic devices, decryption, and the extraction of digital evidence. However, after data is extracted and presented in an easy-to-use interface, there are still thousands of files and records to examine. Experts often spend considerable time manually searching for events, documents, and conversations relevant to their investigation. A significant bottleneck in this process is the review of numerous pictures and videos and reading through chats and emails.
With limited human resources and digital forensics software licenses in the lab, data-rich cases cause significant delays in investigations and growing case backlogs. Digital forensics software offers various tools to streamline data analysis, but can the adoption of AI technologies solve the problem once and for all?
AI-powered technologies in digital forensics
Digital forensics started benefiting from AI features a few years ago. The first major development in this regard was the implementation of neural networks for picture recognition and categorization. This powerful tool has been instrumental for forensic examiners in law enforcement, enabling them to analyze pictures from CCTV and seized devices more efficiently. It significantly accelerated the identification of persons of interest and child abuse victims as well as the detection of case-related content, such as firearms or pornography.
Another promising AI technology with significant potential to enhance digital investigations is the large language model (LLM). LLMs are trained on diverse text sources, including scientific literature, fiction, blog posts, and forum discussions. Such extensive training enables them to skillfully leverage human language and knowledge, performing various natural language processing (NLP) tasks. They can analyze, categorize, and summarize text, engage in conversations, answer questions, and even reason.
Digital devices involved in criminal or cybersecurity investigations typically include years of text records in messengers, emails, notes, documents, logs, and other files. Large language models have the necessary skills to analyze this text data and help digital examiners quickly pinpoint critical details needed for investigations.
Belkasoft has recently made this AI technology more accessible to digital forensics examiners. We extended our company’s flagship product, Belkasoft X, with an LLM-based tool called BelkaGPT. This offline AI assistant analyzes data extracted from digital devices and helps users discover evidence using natural language queries. The first version of BelkaGPT already demonstrates success in several key areas:
- Detection of topics of interest: Users can ask generic questions such as “Can you find anything suspicious?” or submit more specific queries, like searching for mentions of financial transactions, account update requests, specific names, locations, plans, events, and more.
- Determining the emotional tone of texts: BelkaGPT can identify whether a text contains expressions of sentiment, such as threats, concerns, or conflicts, and determine the nature of relationships between conversation participants.
- Identifying picture properties: BelkaGPT accesses information from case database records and considers additional properties assigned during analysis. For instance, if you run AI picture analysis and Belkasoft X identifies images with guns or nudity, BelkaGPT will recognize these categories in the case and respond to related questions.
Unlike traditional keyword searches, which only detect exact word matches, BelkaGPT focuses on understanding the meaning behind words. That is why, even if a thought is expressed in synonyms or idioms, the LLM can still uncover it. This capability adds precision to investigations as it helps find the details that may be missed by keyword searches or overlooked by a weary examiner.
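The difference between the two approaches can be illustrated with a toy sketch. It does not show how BelkaGPT works internally: a hand-written table of related phrases (the hypothetical `CONCEPTS` dictionary) stands in for the understanding a language model learns from its training, and the messages are invented for illustration.

```python
import re

# Invented sample messages; not real case data.
MESSAGES = [
    "Wire the cash to my account before Friday.",
    "He said he would make her pay for talking to the cops.",
    "Lunch at noon?",
]

# Hypothetical stand-in for a language model: phrases that express the same idea.
CONCEPTS = {
    "money transfer": ["transfer", "wire the cash", "send the funds"],
    "threat": ["threat", "make her pay", "you will regret"],
}


def keyword_search(messages, keyword):
    """Return messages that contain the exact keyword as a whole word."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [m for m in messages if pattern.search(m)]


def concept_search(messages, concept):
    """Return messages containing any phrase associated with the concept."""
    phrases = CONCEPTS.get(concept, [])
    return [m for m in messages if any(p in m.lower() for p in phrases)]


exact_hits = keyword_search(MESSAGES, "threat")
concept_hits = concept_search(MESSAGES, "threat")
print(exact_hits)    # no message contains the literal word "threat"
print(concept_hits)  # the idiom "make her pay" is matched
```

A real LLM does not rely on a fixed phrase table, which is exactly why it can catch wording an examiner did not think to list in advance.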
Nevertheless, AI is an evolving field, and BelkaGPT will continue to develop. Our plan is to make it more versatile, configurable, precise, and focused on the context of digital forensics and cyber incident response.
The benefits of AI in digital forensics
Integrating AI into the digital investigation workflow boosts the productivity of forensic experts by enhancing both speed and quality. AI helps identify key evidence more quickly, allowing investigators to focus on the critical aspects of their cases.
For law enforcement, AI implementation reduces case backlogs and accelerates the delivery of justice, contributing to a safer and more secure society. In corporate security, it shortens the time needed to investigate and contain cyber incidents, minimizing the financial and reputational damage caused by downtime and data breaches.
Addressing concerns with AI adoption
While AI tools enable forensic experts to discover evidence faster and with greater accuracy, working with AI is not without its challenges. Several common concerns arise when implementing AI in digital forensics and cyber incident response.
Improper output
No matter how advanced, AI operates within the boundaries of its training, which can sometimes be incomplete or imperfect. Large language models, in particular, may produce inaccurate information if their training data lacks sufficient detail on a given topic. As a result, investigations involving AI technologies require human oversight.
In DFIR, validating discovered evidence is standard practice. It is common to use multiple digital forensics tools to verify extracted data and manually check critical details in source files. Therefore, validating AI results will not be a new requirement for digital examiners.
For large language models, there are methods to mitigate the risk of inaccurate output. For instance, our tool, BelkaGPT, is designed to generate responses based solely on case data. If it cannot find relevant information, it states that the required data is unavailable. Additionally, it provides references to the three most relevant artifacts in the case database, allowing examiners to quickly verify their contents and origins.
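The general idea of grounding answers in case data can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BelkaGPT's implementation: simple word-overlap (cosine) similarity stands in for whatever relevance model a real tool uses, the artifact IDs and texts are invented, the `min_score` threshold is arbitrary, and the step where a language model writes the actual answer is omitted.

```python
import math
import re
from collections import Counter

# Invented artifacts standing in for records in a case database.
ARTIFACTS = {
    "chat-001": "Send the payment to the new account tonight",
    "email-014": "Please update the account details for the payment",
    "note-003": "Buy milk and bread",
    "chat-027": "The payment went through, account is updated",
    "sms-110": "Running late, see you at the station",
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, artifacts, top_k=3, min_score=0.2):
    """Return up to top_k (artifact_id, score) pairs above min_score, best first."""
    q = Counter(tokenize(question))
    scored = [(aid, cosine(q, Counter(tokenize(text)))) for aid, text in artifacts.items()]
    relevant = [(aid, s) for aid, s in scored if s >= min_score]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)[:top_k]


def answer(question, artifacts):
    """Answer only from case data; say so explicitly when nothing relevant exists."""
    refs = retrieve(question, artifacts)
    if not refs:
        return "The required data is not available in this case.", []
    return "Relevant artifacts found; verify them in the source data.", [aid for aid, _ in refs]


print(answer("payment account", ARTIFACTS))
print(answer("passport forgery", ARTIFACTS))
```

The design choice worth noting is the explicit empty-result branch: refusing to answer when no artifact clears the relevance threshold is what keeps the output tied to evidence the examiner can check.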
Data privacy
AI solutions require significant computing power, often provided by vendors through cloud services. For most DFIR labs, cloud infrastructure is a no-go since they work with sensitive information and are usually prohibited from sharing it with third parties in any form. As a result, only local AI tools are viable options for digital forensics.
Cost of implementation
AI technologies rely on powerful GPUs for data processing, which are currently in high demand and costly. So, how affordable is it to run an offline AI solution in a digital forensics lab?
Fortunately, DFIR units are often equipped with such hardware, as it is required for other tasks like password brute-forcing. AI tools can also be optimized to run on less powerful GPUs. For instance, BelkaGPT requires a GPU with a minimum of 8GB of VRAM for optimal performance, with prices for such GPUs starting at $250.
The future of AI in digital forensics
The future of AI in digital forensics promises significant advancements, particularly as large language models continue to evolve. These models will become increasingly adept at understanding the meaning of digital artifacts found on electronic devices and their role in investigations.
Beyond text-based analysis, the future of AI in digital forensics will be shaped by the development of multi-modal AI technologies. These technologies, capable of processing and analyzing data across multiple formats, will enhance forensic investigations by covering a broader range of tasks. For instance, multi-modal AI could simultaneously analyze text, media files, and system records, providing a holistic view of the evidence.
Incorporating AI advancements into digital forensics will require ongoing adaptation and innovation.
Conclusion
AI is revolutionizing digital forensics and cyber incident response by enhancing the speed and precision with which forensic experts can analyze vast amounts of data. While AI tools like BelkaGPT demonstrate significant promise in streamlining investigations, they also require careful implementation and oversight to ensure accuracy and address challenges such as data privacy and cost. AI technologies can make the digital forensics industry more efficient, helping law enforcement and corporate security teams deliver justice and investigate cyber threats more effectively.
As AI continues to evolve, its role in digital forensics will expand, offering new ways to process and interpret complex data. With new advancements like multi-modal AI coming soon, digital forensics is set to reach new levels of precision and efficiency. The future will depend on our ability to adapt and refine these technologies, unlocking their full potential to meet the growing demands of digital investigations.
About the Author
Yuri Gubanov is the Founder and CEO of Belkasoft, a company specializing in Digital Forensics and Cyber Incident Response Software. Since the company's inception in 2002, Yuri has led Belkasoft to become a global leader in digital forensics solutions, trusted by law enforcement agencies, corporate clients, and private investigators in over 130 countries.
Yuri can be reached online at LinkedIn. Company website: https://belkasoft.com/
Source: www.cyberdefensemagazine.com