Posts by Rafael Ballestas

Rafael was a security analyst at Fluid Attacks from January 2018 until May 2020.

Photo by Fern M. Lomibao on Unsplash

February 14, 2020

Rafael Ballestas


With symbolic execution

Here's a reflection on the need to represent code before actually feeding it into neural-network-based encoders, such as code2vec, word2vec, and code2seq.


January 31, 2020

Rafael Ballestas


From code to words

Here we talk about code2seq, which differs from code2vec in that it adapts neural machine translation techniques to the task of mapping a snippet of code to a sequence of words.


January 24, 2020

Rafael Ballestas


Vector representations of code

Here is a tutorial on using code2vec to predict method names, determine the accuracy of the model, and export the corresponding vector embeddings.


January 10, 2020

Rafael Ballestas


Vector representations of code

Here we discuss code2vec's relationship with word2vec and autoencoders to better grasp how feasible it is to represent code as vectors, which is our main interest.

Photo by Possessed Photography on Unsplash

December 13, 2019

Rafael Ballestas


Distributed representations of natural language

This post is an overview of word2vec, a method for obtaining vectors that represent natural language in a way that is suitable for machine learning algorithms.

Photo by camilo jimenez on Unsplash

October 18, 2019

Rafael Ballestas


Prioritize code auditing via ML

This post is a high-level review of our previous discussion concerning machine learning techniques applied to vulnerability discovery and exploitation.

Photo by Rishi Deep on Unsplash

October 4, 2019

Rafael Ballestas


A pipeline to classify vulnerable code

Here is a simple attempt to define a vulnerability classifier using categorical encoding and a basic neural network with a single hidden layer.


October 2, 2019

Rafael Ballestas


Simple linear regression in scikit-learn

In this post, we begin to explain why vectors are the most appropriate representation for data used as input to machine learning algorithms.

Photo by Sara Kurfeß on Unsplash

August 30, 2019

Rafael Ballestas


It's all about sanitization

This blog post provides a brief description of static and dynamic taint analysis, also known as taint checking.
