Rafael Ballestas
With symbolic execution
Here is a reflection on the need to represent code before feeding it into neural-network-based encoders such as code2vec, word2vec, and code2seq.
Rafael Ballestas
From code to words
Here we talk about code2seq, whose distinguishing feature is adapting neural machine translation techniques to the task of mapping a snippet of code to a sequence of words.
Rafael Ballestas
Vector representations of code
Here is a tutorial on using code2vec to predict method names, measure the model's accuracy, and export the corresponding vector embeddings.
Rafael Ballestas
Vector representations of code
Here we discuss code2vec's relation to word2vec and autoencoders to better grasp how feasible it is to represent code as vectors, which is our main interest.
Rafael Ballestas
Distributed representations of natural language
This post is an overview of word2vec, a method for obtaining vectors that represent natural language in a way that is suitable for machine learning algorithms.
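To give a feel for what such vectors buy us, here is a toy sketch: the embeddings below are made up for illustration (real word2vec vectors are trained from corpus co-occurrence and have hundreds of dimensions), but cosine similarity already shows related words scoring closer than unrelated ones.

```python
import math

# Hypothetical 4-dimensional embeddings, hand-picked for this demo;
# word2vec would learn such vectors from a text corpus.
embeddings = {
    "king":  [0.8, 0.6, 0.1, 0.2],
    "queen": [0.7, 0.7, 0.2, 0.2],
    "apple": [0.1, 0.2, 0.9, 0.8],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related words end up closer in the vector space.
print(cosine(embeddings["king"], embeddings["queen"]))  # high
print(cosine(embeddings["king"], embeddings["apple"]))  # low
```

This geometric notion of similarity is precisely what makes vector representations suitable inputs for machine learning algorithms.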
Rafael Ballestas
Prioritize code auditing via ML
This post is a high-level review of our previous discussion concerning machine learning techniques applied to vulnerability discovery and exploitation.
Rafael Ballestas
A pipeline to classify vulnerable code
Here is a simple attempt to define a vulnerability classifier using categorical encoding and a basic neural network with a single hidden layer.
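The shape of such a pipeline can be sketched in plain Python. Everything here is hypothetical (the token vocabulary, the hand-picked weights, and the "vulnerability" scores); a real pipeline would learn the weights from labeled code samples, but the flow is the same: one-hot encode a categorical input, pass it through one hidden layer, and read off a score.

```python
import math

# Hypothetical vocabulary of code tokens; categorical (one-hot) encoding.
vocab = ["strcpy", "malloc", "free", "printf"]

def one_hot(token):
    return [1.0 if t == token else 0.0 for t in vocab]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy weights for a single hidden layer of 2 units plus an output
# unit scoring "vulnerable". In practice these come from training.
W1 = [[2.0, -1.0], [-1.0, 1.0], [-1.0, 1.0], [0.5, -0.5]]
W2 = [3.0, -2.0]

def score(token):
    x = one_hot(token)
    hidden = [sigmoid(sum(xi * W1[i][j] for i, xi in enumerate(x)))
              for j in range(2)]
    return sigmoid(sum(h * w for h, w in zip(hidden, W2)))

print(score("strcpy"))  # the riskier call scores higher
print(score("printf"))
```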
Rafael Ballestas
Simple linear regression in scikit
In this post, we begin to tackle why vectors are the most appropriate representation for data as input to machine learning algorithms.
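As a taste of the fit discussed in that post, here is the closed-form least-squares computation for a single feature in plain Python, using made-up toy data; scikit-learn's `LinearRegression` solves the same problem (in vector form) when you call `fit`.

```python
# Toy data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope: covariance of x and y over variance of x.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(slope, intercept)  # → 2.0 1.0
```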
Rafael Ballestas
It's all about sanitization
This blog post provides a brief description of static and dynamic taint analysis, also known as taint checking.
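The core idea of dynamic taint checking can be sketched in a few lines of Python. This is a toy, not a real taint engine: values from an untrusted source are marked tainted, taint propagates through operations, and a sensitive sink rejects any tainted value that was never sanitized.

```python
class Tainted(str):
    """String subclass marking data from an untrusted source."""

def source(user_input):
    return Tainted(user_input)            # taint at the source

def concat(a, b):
    result = str(a) + str(b)
    if isinstance(a, Tainted) or isinstance(b, Tainted):
        return Tainted(result)            # taint propagates
    return result

def sanitize(value):
    # Naive quote-escaping, for demonstration only.
    return str(value).replace("'", "''")  # plain str: taint cleared

def sink(query):
    if isinstance(query, Tainted):
        raise ValueError("tainted data reached a sensitive sink")
    return query

name = source("Robert'; DROP TABLE users;--")
query = concat("SELECT * FROM users WHERE name = '", name)
# sink(query) would raise here; sanitizing first makes it acceptable.
print(sink(sanitize(query)))
```

Real taint engines track this at the instruction or bytecode level rather than through wrapper types, but the source/propagation/sanitizer/sink vocabulary is the same.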