By Ron Kerbs, CEO of Kidas
Machine learning (ML) and classifiers have been used as cybersecurity tools for years. Starting in the 1990s, machine learning techniques began detecting known attacks and deviations from normal system behavior. At the beginning of the 21st century, machine learning tools analyzed traffic and communication to identify abnormalities. This was the rise of data-driven approaches: the availability of copious amounts of data and computational power in the 2000s enabled significant advancements in machine learning. At first, machine learning was applied at the network and system level, which gave rise to intrusion detection systems (IDS). These systems used ML algorithms to analyze network traffic and flag suspicious activity, such as viruses and malware.
In the last few decades, companies like Meta, Google and Twitter have relied on Natural Language Processing (NLP) to detect other types of threats: social threats like scams, hate speech and bullying on social media. NLP-based monitoring of social media communications is accurate, but not accurate enough to let these companies cut their moderation budgets. In fact, they still employ large moderation teams.
Recent developments in large language models (LLMs) like OpenAI’s GPT-4 enable companies to improve the accuracy of moderation tasks.
The Problem
Currently, I see three main challenges to getting these models to the point where they are accurate enough.
- Availability of data
The models are trained on big data sets. To monitor gaming or social media DMs correctly, you need access to this specific data. However, this data is not accessible because these conversations are private between users. Building monitoring models is also not a core competency for many of these businesses. Though they may recognize the need to develop these systems internally, doing so often detracts from the company’s core mission. Furthermore, these companies are reluctant to share data externally as it’s extremely valuable. Take, for example, Reddit and Quora, both of which started charging for data even though it is available online. In a TechCrunch article, Reddit CEO Steve Huffman said that the data shared on Reddit is extremely valuable. He went on to say that many users feel so comfortable in the community that they share things they may not feel comfortable sharing elsewhere. “There’s a lot of stuff on the site that you’d only ever say in therapy, or AA, or never at all,” Huffman is later quoted as saying. With access to that information, Reddit saw an opportunity to sell it instead of giving it to large companies for free.
- Change in slang and communication type
Slang is an ever-evolving aspect of language that changes over time. It reflects the cultural, social and generational shifts within a society. The evolution of slang can be influenced by numerous factors, including technology, pop culture, social movements and globalization. For example, current movies, television shows and music influence the slang that people use. Catchphrases, expressions and words popularized by celebrities or influencers can quickly enter mainstream language. Technology and the internet have also had a significant impact on slang, as they have created spaces for people to communicate in an abbreviated language with words like “LOL,” “OMG” or even emojis. In short, people adopt new slang words and emojis either to intentionally mislead the algorithm or simply because language changes very quickly. Unless these models are retrained regularly, they will miss more and more as language evolves. However, training such a huge model costs a lot of money and computation power, so retraining on a day-to-day basis is almost impossible at current costs.
- The 20/80 rule
In general, there is an unwritten understanding that if it takes 100 percent effort to get to 100 percent accuracy, 20 percent of the effort is invested in getting to 80 percent accuracy, and the remaining 80 percent of the effort is needed to gain the final 20 percent. In other words, the last stretch towards perfection in fine-tuning a machine learning model is always the longest. Moving from 95 percent accuracy to 99 percent is hard, but moving from 99 percent to 99.5 percent is harder still.
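To make the diminishing returns concrete, here is a small back-of-the-envelope sketch. The volume of one million messages is an assumed figure chosen purely for illustration, and `errors_at` is a hypothetical helper, not part of any real moderation system.

```python
# Illustrative arithmetic only; the message volume is an assumed figure.
MESSAGES = 1_000_000


def errors_at(accuracy_pct, volume=MESSAGES):
    """Number of misclassified messages at a given accuracy (in percent)."""
    return round(volume * (100 - accuracy_pct) / 100)


# The two accuracy steps mentioned in the text.
steps = [(95, 99), (99, 99.5)]

for before, after in steps:
    removed = errors_at(before) - errors_at(after)
    print(
        f"{before}% -> {after}%: {removed:,} fewer errors, "
        f"{errors_at(after):,} still misclassified"
    )
```

Under this assumption, the step from 95 to 99 percent removes 40,000 errors, while the step from 99 to 99.5 percent removes only 5,000 and still leaves 5,000 misclassified messages. Each later step buys less in absolute terms even as, per the 20/80 intuition, it demands more effort.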
The Solution
While it is tempting to use an LLM for monitoring, a better bet is to use a specific model for each task: for example, one model for scams, another for hate speech, and so on. This results in algorithms that are much more cost-efficient and easier to train. LLMs can undoubtedly assist in creating or validating training sets, but using them for the monitoring itself is inefficient.
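As a rough sketch of what "one model per task" can look like, the example below trains a separate tiny Naive Bayes classifier for each task and runs every message through all of them. Everything here is an assumption made for illustration: the `TinyNaiveBayes` class, the `flag` helper, the choice of "scam" and "bullying" as tasks, and the handful of made-up training sentences. A production system would use far more data and a proper ML library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyNaiveBayes:
    """Binary multinomial Naive Bayes with add-one smoothing (toy example)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, examples):
        """Train on (text, is_harmful) pairs."""
        for text, label in examples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for token in tokens:
            if token in self.vocab:  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Made-up training data; each task gets its own small labeled set.
BENIGN = [
    ("nice shot that was a great game", False),
    ("want to play another round later", False),
]
TRAINING_SETS = {
    "scam": [
        ("send me your password for free skins", True),
        ("free gift card click this link now", True),
    ] + BENIGN,
    "bullying": [
        ("you are worthless nobody likes you", True),
        ("uninstall the game you loser", True),
    ] + BENIGN,
}

# One independent model per task, so each can be retrained on its own.
MODELS = {task: TinyNaiveBayes().fit(data) for task, data in TRAINING_SETS.items()}


def flag(text):
    """Return the names of all tasks whose model flags this message."""
    return [task for task, model in MODELS.items() if model.predict(text)]


print(flag("click this link for a free gift card"))
print(flag("nobody likes you loser"))
print(flag("nice game, want to play later"))
```

Because each task has its own model and its own training set, new slang for one kind of threat only requires retraining that one small model, which is the cost advantage argued for above.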
AI can be a powerful tool for monitoring and addressing potential cybercrimes and for keeping children safe from cyberbullying and scams. However, at this time, it is best used in an assisting role and should be complemented with human judgment and oversight. At this stage, human involvement is crucial for interpreting AI-generated alerts, addressing false negatives or positives and providing emotional support and guidance to those, especially children, in potentially harmful situations.
I believe we will achieve artificial general intelligence (AGI) at some point, but in the next decade, specific expert-trained algorithms will continue to outperform LLMs for these tasks at a fraction of the cost.
About the Author
Ron Kerbs is the Founder and CEO of Kidas. Ron has a decade of experience in leading technology teams and investing in early-stage startups. After volunteering in various children-focused NGOs, he decided to address the problem of gaming toxicity. Ron can be reached online on Twitter, Instagram and at Kidas’ company website https://getkidas.com/.
Source: www.cyberdefensemagazine.com