Deep Learning for vulnerability disclosure

Oscar Uribe

Security analyst

Updated

Sep 23, 2019



3 min

Data scientists are increasingly using Artificial Intelligence (AI) algorithms to approach problems from the perspective of data, in areas such as medicine, data mining, and robotics.

Some researchers have been exploring how Artificial Intelligence can be used in cybersecurity, for example, to detect vulnerabilities in source code.

Most vulnerabilities result from bad programming practices. When these vulnerabilities are not detected in time, attackers can later discover and exploit them. It is therefore important to detect vulnerabilities in the early stages of a system's development.

Some tools perform static analysis of the source code: they check it for problems without compiling or executing it. Others perform dynamic analysis: they feed the system's inputs with preset or random values in order to check for failures or improper exception handling.
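As a rough illustration of the dynamic approach, the sketch below feeds preset and random inputs to a small function and records any exception the function does not handle on purpose. The `parse_record` function, its planted bug, and the `fuzz` helper are hypothetical examples made up for this post, not part of any real tool.

```python
import random

def parse_record(data):
    """Hypothetical system under test: the first byte declares the payload length."""
    length = data[0]  # planted bug: raises IndexError on empty input
    payload = data[1:1 + length]
    if len(payload) != length:
        # Expected, documented rejection of malformed input.
        raise ValueError("declared length exceeds payload")
    return payload

def fuzz(target, presets, random_trials=100, seed=0):
    """Run target on preset and random inputs; collect unexpected exceptions."""
    rng = random.Random(seed)
    inputs = list(presets)
    for _ in range(random_trials):
        size = rng.randint(0, 8)
        inputs.append(bytes(rng.randrange(256) for _ in range(size)))
    crashes = []
    for data in inputs:
        try:
            target(data)
        except ValueError:
            pass  # handled error path, not a finding
        except Exception as exc:
            crashes.append((data, type(exc).__name__))
    return crashes

crashes = fuzz(parse_record, presets=[b"", b"\x00", b"\x03abc"])
print(crashes[0])  # (b'', 'IndexError')
```

The empty preset input triggers the unhandled `IndexError`, which is the kind of improper exception handling a dynamic analyzer is meant to surface.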

Initial thoughts

In a Boston University article, the authors discuss the possibility of using Deep Learning and Machine Learning algorithms to automatically detect source code vulnerabilities. The idea stems from the fact that there is a large amount of open-source code available to be analyzed. After all, code is just text, and it is possible to use data mining algorithms on source code to extract training data.

Static and dynamic code analyzers do not get the most out of source code. The algorithms that they use are based on preset rules that do not take into account small variations in the original rule. The result is that some vulnerabilities and failures may remain undiscovered.
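To make the limitation concrete, here is a minimal sketch of a preset rule and a small variation that escapes it. The single regular expression below is an assumption chosen for illustration; real analyzers use much richer rule sets, but the failure mode is the same in spirit.

```python
import re

# Hypothetical preset rule: flag direct calls to strcpy, a classic source
# of buffer overflows.
UNSAFE_CALL = re.compile(r"\bstrcpy\s*\(")

def rule_flags(c_source):
    """Return True if the preset rule matches anywhere in the source."""
    return bool(UNSAFE_CALL.search(c_source))

# Matches the rule exactly.
direct_call = r"""
void copy(char *src) {
    char buf[8];
    strcpy(buf, src);
}
"""

# The same unchecked copy written as a loop: the rule has nothing to match.
manual_loop = r"""
void copy(char *src) {
    char buf[8];
    int i = 0;
    while (src[i] != '\0') { buf[i] = src[i]; i++; }
    buf[i] = '\0';
}
"""

print(rule_flags(direct_call))  # True
print(rule_flags(manual_loop))  # False
```

Both functions can overflow `buf`, yet only the first is reported. A model trained on many examples of vulnerable code could, in principle, learn that the two are similar.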

The purpose of this exercise was to use data mining, Deep Learning, and Machine Learning techniques to automate a process that is frequently susceptible to human error. Such errors can result in unnoticed vulnerabilities in applications or operating systems, which hackers may then exploit.

Data

For data, they used C and C++ code from different sources: the SATE IV Juliet Test Suite (a collection of test cases that contain known vulnerabilities), code from Debian distributions, and public GitHub repositories.

Vulnerable code distribution.

Labeling

For labeling, a custom lexer was created to capture only the important information in the code and mark the rest as generic. The labels already provided by the test suite were used as they were. For the Debian and GitHub code, the authors ran dynamic analyzers and collected the outputs, which security professionals could later map to known vulnerabilities from the Common Weakness Enumeration (CWE) list. In the GitHub repositories, they also searched the commits for words such as “buggy”, “error”, “fixed”, and “broken” in order to classify each block of source code as vulnerable or non-vulnerable.
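The lexer idea can be sketched in a few lines. This is only an illustration under stated assumptions: the `KEEP` set, the placeholder names `<ID>`, `<NUM>`, and `<STR>`, and the token pattern are invented here and are not the vocabulary used in the article.

```python
import re

# Assumption: a small set of C keywords and library calls kept verbatim.
KEEP = {"if", "else", "for", "while", "return", "int", "char", "void",
        "sizeof", "malloc", "free", "strcpy", "memcpy"}

TOKEN = re.compile(r"""
    (?P<string>"(?:\\.|[^"\\])*")           # string literal
  | (?P<number>\d+)                         # integer literal
  | (?P<name>[A-Za-z_]\w*)                  # keyword or identifier
  | (?P<op>[-+*/%=<>!&|]+|[()\[\]{};,.])    # operators and punctuation
""", re.VERBOSE)

def lex(c_source):
    """Keep meaningful tokens; replace names and literals with generic ones."""
    tokens = []
    for match in TOKEN.finditer(c_source):
        kind, text = match.lastgroup, match.group()
        if kind == "string":
            tokens.append("<STR>")
        elif kind == "number":
            tokens.append("<NUM>")
        elif kind == "name":
            tokens.append(text if text in KEEP else "<ID>")
        else:
            tokens.append(text)
    return tokens

print(lex("char buf[8]; strcpy(buf, user_input);"))
# ['char', '<ID>', '[', '<NUM>', ']', ';', 'strcpy', '(', '<ID>', ',', '<ID>', ')', ';']
```

Mapping developer-chosen names to a generic token keeps the vocabulary small, so two functions that differ only in variable names produce the same token sequence.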

Statistics CWE vulnerabilities detected.

Feature extraction

In the feature extraction step, two types of Neural Networks were tried, CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network).

Although the neural networks worked well for extracting features, their classification results were not the best. To address this, the authors passed the features extracted by the neural networks to a Random Forest classifier, which gave better results and avoided overfitting.
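The feature extraction half of that pipeline can be sketched without any library: slide a few filters over the sequence of token vectors and keep the strongest response of each. Everything below is an assumption for illustration. The sizes are arbitrary, and the embeddings and filters are random and untrained, whereas a real model would learn them.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

EMBED_DIM = 4    # size of each token vector (assumed)
KERNEL_SIZE = 3  # number of tokens covered by one filter (assumed)
NUM_FILTERS = 5  # number of features produced per code block (assumed)

def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]

# Untrained, random filters; each spans KERNEL_SIZE consecutive token vectors.
filters = [rand_vec(EMBED_DIM * KERNEL_SIZE) for _ in range(NUM_FILTERS)]
embeddings = {}  # token -> vector, filled lazily

def extract_features(tokens):
    """1D convolution with ReLU, followed by max pooling over positions."""
    vectors = []
    for tok in tokens:
        if tok not in embeddings:
            embeddings[tok] = rand_vec(EMBED_DIM)
        vectors.append(embeddings[tok])
    features = []
    for filt in filters:
        responses = []
        for start in range(len(vectors) - KERNEL_SIZE + 1):
            window = [x for vec in vectors[start:start + KERNEL_SIZE] for x in vec]
            activation = sum(w * x for w, x in zip(filt, window))
            responses.append(max(0.0, activation))  # ReLU
        # Keep the strongest response; 0.0 if the sequence is too short.
        features.append(max(responses, default=0.0))
    return features

tokens = ["char", "<ID>", "[", "<NUM>", "]", ";", "strcpy", "(", "<ID>", ")"]
print(len(extract_features(tokens)))  # 5
```

Each code block becomes a fixed-length feature vector regardless of how many tokens it has. In a setup like the one described, those vectors and their labels would then be handed to a separate classifier, for instance scikit-learn's `RandomForestClassifier`.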

Convolutional Neural Network Model and Random Forest.

Results

Vulnerability detection using data mining, Deep Learning, and Machine Learning has some advantages over lexical analyzers: the source code does not need to be compiled for the models to work, and the models can be adjusted to obtain the desired precision.
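One common way to adjust a classifier for a desired precision is to move the score threshold above which a code block is reported. The sketch below assumes the model outputs a score per block; the scores, labels, and helper names are invented for illustration.

```python
def precision_at(threshold, scores, labels):
    """Fraction of flagged blocks (score >= threshold) that are truly vulnerable."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else None

def lowest_threshold_for(target, scores, labels):
    """Return the lowest score threshold whose precision meets the target."""
    for threshold in sorted(set(scores)):
        precision = precision_at(threshold, scores, labels)
        if precision is not None and precision >= target:
            return threshold
    return None

# Hypothetical model scores and ground-truth labels (1 = vulnerable).
scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30]
labels = [1, 1, 0, 1, 0, 0]

print(lowest_threshold_for(0.75, scores, labels))  # 0.6
print(lowest_threshold_for(0.90, scores, labels))  # 0.9
```

Raising the threshold reduces false alarms at the cost of missing some real vulnerabilities, so the choice depends on how much manual review a team can afford.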

Static analyzers have a limited number of findings because of preset rules and the fact that they do not take into account the variations of the rules. Static analyzers only identify a small portion of the real vulnerabilities present in the source code.

Detection of vulnerabilities.

This algorithm can highlight code blocks that might introduce a vulnerability, which makes it possible to suggest fixes. It can also simply notify the person in charge, who then determines whether a vulnerability is actually present.

Conclusions

Deep Learning and Machine Learning techniques approach problem-solving from a different perspective: that of the data. The article discussed above illustrates how Artificial Intelligence is helping to automate security tasks previously done by humans, allowing them to focus on the analysis of problems rather than their detection.

Before these tools can be widely used within the industry, they need some improvement. However, they demonstrate the potential this type of tool has in the process of vulnerability disclosure. It is also important to evaluate the possibility of integrating them into software development through continuous integration, in order to detect vulnerabilities in early stages and avoid the spread of known security issues to later versions of the system.

Tags:

machine-learning

security-testing

software






