From the course: Artificial Intelligence for Cybersecurity

Intrusion detection at scale

- [Instructor] Pretend that you are a homeowner. You worry about the safety of your family, the house, and your belongings. Wouldn't you want to know if a burglar had broken a window and entered the house with the intent to harm or steal? In the same way, an organization guards the data and information stored on its servers that are accessible over the network. Intrusion detection is the process of monitoring the corporate network and systems for malicious activity or policy violations. If adversaries attack your organization and go after the crown jewels, such as intellectual property and confidential data, they are very likely to leave a trail of actions.
As a security professional, your goal is to ensure that the information and assets of your organization are protected and secure. But to do so, you depend on crucial information, information that is buried in the terabytes of logs, network packets, and files generated every second by almost every device and piece of software on your corporate network. However much you may despise this mundane task, you have to sift through the data to prevent the next potential breach. The challenge is that the data is enormous, never ending, and full of complex interrelationships. You can't monitor this data 24/7 by yourself. Instead, you rely on a number of different tools to find signs of suspicious activity. But these tools have limitations. Their programmatic, rule-based approach does not handle the massive amount of data being generated in an enterprise.
Let's take an example. Just as a homeowner worries about a home invasion by a burglar, you, as a security professional, worry about attackers intruding into your enterprise network. Your goal is to allow only desirable traffic into your network and to identify and stop any unwanted intrusions.
A well-known solution in the industry for this problem is to deploy an intrusion detection system, also known as an IDS. There are two types of IDS: a network intrusion detection system, which monitors traffic patterns across the entire network, and a host intrusion detection system, which monitors a particular machine, say a workstation or a server, for abnormalities. An IDS constantly scans and parses network packets and log data. It looks for signatures by using a pattern matching approach. By the way, a signature in this context is a known pattern of packet flow, or binary code, that has previously been deemed malicious. If the IDS fails to find a signature, or even worse, the signature doesn't even exist in the database, it will fail to catch the intrusion, and the attack will go undetected.
With the amount of traffic and logs in an enterprise, this approach of signature matching is constantly under stress. Let's see how. To really do a good job of finding incidents such as brute force attacks or denial of service attacks, this approach needs to look through a large amount of data over a period of time. It also needs to consider many sources of data. Not only that, it also needs to sift through a variety of attributes, known as features. Some examples of features are the protocol being used, the IP addresses, whether the packet contains a script or not, and so on. As you can see, every additional time window, data source, and feature multiplies the amount of data the algorithm needs to continuously scan through. And that just does not scale, which, in turn, leads to slow responses or incorrect matches.
When you apply an IDS tool that leverages artificial intelligence, instead of searching for patterns, you're effectively creating a predictor model behind the scenes. And you do so by training it on data along with the features. You finally arrive at a trained model that uses only the features that are absolutely necessary to detect an anomaly. No more, and no less.
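For contrast with the trained-model approach, the signature matching described earlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature names, the regular expressions, and the sample requests are all hypothetical, and a real IDS works on parsed packets and logs with a far larger signature database. The sketch shows the failure mode mentioned above, where an attack that has no matching signature passes unnoticed.

```python
import re

# Hypothetical signature database: name -> pattern previously deemed malicious.
SIGNATURES = {
    "sql_injection": re.compile(r"union\s+select", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./\.\./"),
}

def match_signatures(payload: str) -> list[str]:
    """Return the names of every signature found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]

# Hypothetical sample requests.
known_attack = "GET /items?id=1 UNION SELECT password FROM users"
obfuscated_attack = "GET /items?id=1 UNION/**/SELECT password FROM users"
normal_request = "GET /items?id=1"

print(match_signatures(known_attack))       # ['sql_injection']
print(match_signatures(obfuscated_attack))  # [] (no signature covers this variant)
print(match_signatures(normal_request))     # []
```

The obfuscated request carries the same intent as the known attack, but because it replaces the whitespace with a comment, the stored pattern no longer matches, and nothing is flagged.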
You then deploy the trained predictor model and let the incoming logs or traffic data continuously pass through the model in production. Finally, the deployed model determines if a new event is indeed an intrusion, or just business as usual. Of course, if you have the expertise, you can build a custom intrusion detection system in-house. But most organizations choose to buy and deploy a commercial product.
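The train-then-deploy workflow described above can be illustrated with a toy example. This is a minimal sketch, not what any particular product does: it uses a simple per-feature z-score baseline as a stand-in for the predictor model, and the feature names, the training rows, and the threshold of 3.0 are all assumptions made for illustration. Training learns what normal traffic looks like and drops features that never vary, and the deployed model then scores each incoming event.

```python
from statistics import mean, pstdev

# Hypothetical per-minute features extracted from logs and traffic.
FEATURES = ["failed_logins", "bytes_out_kb", "distinct_ports", "protocol_is_tcp"]

# Hypothetical training data: observations of normal activity only.
normal_traffic = [
    [1, 120, 3, 1],
    [0, 100, 2, 1],
    [2, 140, 4, 1],
    [1, 110, 3, 1],
    [0, 130, 2, 1],
    [2, 120, 4, 1],
]

def train(rows, names):
    """Learn mean and spread per feature; drop features that never vary."""
    model = {}
    for i, name in enumerate(names):
        column = [row[i] for row in rows]
        spread = pstdev(column)
        if spread > 0:  # a constant feature carries no signal, so leave it out
            model[name] = (i, mean(column), spread)
    return model

def score(model, event):
    """Largest absolute z-score across the retained features."""
    return max(abs(event[i] - mu) / sigma for i, mu, sigma in model.values())

def is_intrusion(model, event, threshold=3.0):
    """Flag the event when any retained feature is far from its normal range."""
    return score(model, event) > threshold

model = train(normal_traffic, FEATURES)
print(sorted(model))  # features the trained model actually uses

# "Production": each incoming event passes through the deployed model.
incoming = [
    [1, 125, 3, 1],   # looks like normal activity
    [40, 115, 3, 1],  # burst of failed logins, as in a brute force attempt
]
for event in incoming:
    print(event, "intrusion" if is_intrusion(model, event) else "business as usual")
```

In this sketch the `protocol_is_tcp` feature is discarded during training because it is identical in every row, which mirrors the idea of keeping only the features the model needs. The second event is flagged because its failed-login count is far outside what the model saw during training, even though no signature for it exists.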
