XML: eXploitable Markup Language

Rafael Ballestas

Security Analyst

Updated

16 Feb 2018

6 min read

Markup languages are "systems for annotating a document in a way that is syntactically distinguishable from the text." What does that really mean? I reckon it's better understood with examples. But first, a warning: if you use them to store sensitive information, you should be really careful about how that information is handled.

Perhaps the best example is the ubiquitous HTML, the language of the internet. When you visit a webpage, you download a file with plain text mixed with a bunch of 'tags' which make the text look the way it does when rendered in a web browser. The tags are used to define:

page structure, such as the division into sections
text formatting
general page style
the inclusion of media in the page (images, videos, etc.)

A tag looks like this: <h1> some big text </h1>, and your browser renders it as a title. If you’re interested in learning more about HTML, check out W3Schools.

Some tags and their results.
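To see what "syntactically distinguishable" means in practice, here is a minimal sketch using Python's standard html.parser module, which reports tags and text as separate events. The TagTextSplitter class and the split_markup helper are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class TagTextSplitter(HTMLParser):
    """Record markup (tags) and content (text) as separate events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Anything between tags is content, not markup.
        self.events.append(("text", data))


def split_markup(document):
    parser = TagTextSplitter()
    parser.feed(document)
    parser.close()  # flush any buffered text
    return parser.events


if __name__ == "__main__":
    for event in split_markup("<h1> some big text </h1>"):
        print(event)
```

The parser never confuses the h1 tag with the words it wraps, which is exactly what lets a browser decide how to display the text.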

There are other markup languages for different purposes, like:

UNIX’s troff, which is used to render man pages.
Don Knuth's TeX, still the standard way to properly typeset maths on a computer.
Markdown and AsciiDoc (with which this article was written), which are generally used for program documentation.

What all these markup languages have in common is that their main goal is to explicitly state the structure and hierarchy of a document, separating content from appearance.

When a markup language is not enough

Now, some clever guys liked the idea of 'tags' and structure, but not so much the restricted set of tags in HTML or the narrow purposes of the others. So they took it upon themselves to design a markup language they could use for 'anything'. And thus the 'eXtensible Markup Language' (XML) was born.

You can really use it for anything. For example:

Office suites like LibreOffice use it in their document formats.
SVG vector images (the ones you can zoom in on indefinitely without pixelation) are written in XML.
RSS and Atom feeds, which let you keep up to date with a website without visiting it, are XML-based.

But you can also use it as your own on-the-fly file format or information exchange protocol. Say you want to exchange a person’s information with someone else; you could do it like this:

But then what is what? Among all those numbers, which one is the post code, which the street number, and which the phone?

OK, you might say you can just agree on the order of the columns. That would work for a while, but it would be difficult to maintain, not to mention messy.
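Here is a minimal sketch of that agreed-order approach in Python, assuming a comma-separated record built from the values in this article's XML example. RECORD, AGREED_ORDER and parse_record are hypothetical names. Note that swapping two columns on one side raises no error at all; the data is simply mislabeled.

```python
# Hypothetical flat record: values borrowed from the XML example,
# separated by commas and with no labels.
RECORD = "Wyatt,H,James,37498,65789,W Broadway,1101,1014"

# The column order both parties would have to agree on (an assumption).
AGREED_ORDER = ["name", "initial", "last", "home", "mobile",
                "street", "number", "postcode"]


def parse_record(line, order):
    """Label each comma-separated value by its position in `order`."""
    values = line.split(",")
    if len(values) != len(order):
        raise ValueError(f"expected {len(order)} fields, got {len(values)}")
    return dict(zip(order, values))


if __name__ == "__main__":
    person = parse_record(RECORD, AGREED_ORDER)
    print(person["postcode"])  # 1014

    # One side swaps 'home' and 'mobile': nothing fails, the data is just wrong.
    swapped = AGREED_ORDER[:]
    swapped[3], swapped[4] = swapped[4], swapped[3]
    print(parse_record(RECORD, swapped)["home"])  # 65789, really the mobile
```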

What if we could do it like HTML, with some new tags?

Example XML.

<people>
  <person>
    <name> Wyatt </name>
    <initial> H </initial>
    <last> James </last>
    <home> 37498 </home>
    <mobile> 65789 </mobile>
    <address type="US">
      <street> W Broadway </street>
      <number> 1101 </number>
      <postcode> 1014 </postcode>
    </address>
  </person>
  <person>
    ...
  </person>
</people>

OK, maybe that is a little verbose, but it does have structure, it is readable even by someone who does not know the format, and it has the advantage of being machine-readable. Your website can easily read XML files with a few lines of JavaScript. Thus, XML quickly became a web standard, and even a W3C Recommendation, thanks to how easy it makes sharing data in a structured way.

Given a structure like the one above, you can think of such an XML document as a 'tree' made up of 'nodes'. One way a program can read an XML document is by navigating this tree-like structure.
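As a sketch of that tree view, the following Python snippet uses the standard xml.etree.ElementTree module to parse a trimmed copy of the people document (padding spaces removed, some fields dropped) and walk its nodes depth-first, printing each node's path. The walk helper is a hypothetical name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the people document (padding spaces removed).
PEOPLE_XML = """
<people>
  <person>
    <name>Wyatt</name>
    <last>James</last>
    <address type="US">
      <street>W Broadway</street>
      <number>1101</number>
    </address>
  </person>
</people>
"""


def walk(node, path=""):
    """Yield (path, text) for every element, depth-first from `node`."""
    path = f"{path}/{node.tag}"
    yield path, (node.text or "").strip()
    for child in node:  # an Element iterates over its direct children
        yield from walk(child, path)


if __name__ == "__main__":
    root = ET.fromstring(PEOPLE_XML.strip())
    for node_path, text in walk(root):
        print(node_path, repr(text))
```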

Suppose we have two more people in our file. You could access the streets where all of them live with

 /people/person/address/street

These 'queries', which are not unlike SQL queries, are part of the XPath language. What they return is an ordered list: you can access the individual streets by their position or by asking questions about them (e.g., select the people who live on Broadway). These questions are called 'predicates', for example:
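Python's ElementTree supports only a limited subset of XPath, so it can only approximate these queries, but it is enough for a sketch of /people/person/address/street, access by position, and a simple predicate. The two extra people (Ada and Alan, on Main St and Broadway) are made-up data, and the paths are written relative to the <people> root element.

```python
import xml.etree.ElementTree as ET

# Made-up three-person document; only Wyatt comes from the article.
PEOPLE_XML = """<people>
  <person><name>Wyatt</name>
    <address><street>W Broadway</street><number>1101</number></address>
  </person>
  <person><name>Ada</name>
    <address><street>Main St</street><number>42</number></address>
  </person>
  <person><name>Alan</name>
    <address><street>Broadway</street><number>2300</number></address>
  </person>
</people>"""

root = ET.fromstring(PEOPLE_XML)  # root is the <people> element

# XPath /people/person/address/street, relative to the root element.
streets = [s.text for s in root.findall("./person/address/street")]
print(streets)  # ['W Broadway', 'Main St', 'Broadway']

# Access by position: XPath positions start at 1, not 0.
print(root.find("./person[2]/address/street").text)  # Main St

# A simple predicate: the person whose <name> is exactly 'Alan'.
print(root.find("./person[name='Alan']/address/street").text)  # Broadway
```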

 /people/person/address[number>1000]/street

selects all street names from people whose address number is larger than 1000.
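ElementTree's XPath subset does not include numeric comparisons like number>1000, so this sketch evaluates that predicate in Python instead. The data is made up, and streets_with_number_above is a hypothetical helper that emulates /people/person/address[number>1000]/street.

```python
import xml.etree.ElementTree as ET

# Made-up three-person document; only Wyatt comes from the article.
PEOPLE_XML = """<people>
  <person><name>Wyatt</name>
    <address><street>W Broadway</street><number>1101</number></address>
  </person>
  <person><name>Ada</name>
    <address><street>Main St</street><number>42</number></address>
  </person>
  <person><name>Alan</name>
    <address><street>Broadway</street><number>2300</number></address>
  </person>
</people>"""


def streets_with_number_above(root, limit):
    """Emulate the XPath /people/person/address[number>limit]/street.

    ElementTree has no numeric comparisons in its predicates, so the
    filter runs in Python. Addresses without a <number> are skipped,
    because float('nan') compares False with everything.
    """
    return [
        address.findtext("street")
        for address in root.findall("./person/address")
        if float(address.findtext("number", default="nan")) > limit
    ]


root = ET.fromstring(PEOPLE_XML)
print(streets_with_number_above(root, 1000))  # ['W Broadway', 'Broadway']
```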

You can even do math with the results of your queries, combine queries with logical operators, use wildcards, and refer to nodes by their position relative to other nodes in the tree.
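The following sketch shows two of these features with ElementTree: the * wildcard and the .// descendant search (the counterpart of XPath's // abbreviation). XPath 1.0 also has count() and sum() functions; ElementTree does not implement XPath functions, so this sketch computes the same numbers in Python over made-up data.

```python
import xml.etree.ElementTree as ET

# Made-up two-person document for illustration.
PEOPLE_XML = """<people>
  <person><name>Wyatt</name><mobile>65789</mobile>
    <address><street>W Broadway</street><number>1101</number></address>
  </person>
  <person><name>Ada</name>
    <address><street>Main St</street><number>42</number></address>
  </person>
</people>"""

root = ET.fromstring(PEOPLE_XML)

# '*' is a wildcard: every child element of every <person>.
child_tags = [el.tag for el in root.findall("./person/*")]
print(child_tags)  # ['name', 'mobile', 'address', 'name', 'address']

# './/' searches all descendants, like XPath's '//'.
numbers = [int(n.text) for n in root.findall(".//number")]

# XPath 1.0 would use count(/people/person) and sum(//number).
person_count = len(root.findall("./person"))
print(person_count)  # 2
print(sum(numbers))  # 1143
```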

It gets better: you don’t need to know JavaScript to make these queries. As with SQL, they can be made from pretty much any programming language. But even this apparently good neutrality has its dark side: being implementation-independent also means that attacks can be automated.

What? Attacks? Like databases, XML files can be a useful tool for storing and sharing data, but malicious users can also turn them into an attack surface. They can take advantage of a website that uses XPath to inject malicious queries, which may do something as innocent as listing the entire file or as harmful as bypassing authentication and even elevating their privileges on the website. XPath injections are particularly dangerous when XML files are used to store passwords, authentication details or other sensitive information.

Injecting XPath into a vulnerable app

Remember bWAPP? It’s vulnerable to XPath injection, too! Here we have a website where superheroes can log in. Assume we don’t know that this authentication uses XML. If we try normal text or empty fields, we just get "invalid credentials" as the response. But we do know that the site is PHP-based, and in that language strings can be single (') or double (") quoted. If we enter just a quote character, we get the following response:

Login form response to testing query.

The important bit is what is hiding behind the bee:

So now we know they are using the PHP xpath() function to run an XPath query on XML data. Since we don’t know the structure of the file, we may never know the exact XPath, but we may guess that it ends like this:

 login='<input1>' and password='<input2>'

Thus, if we type something like x' to close the quote and then append or 'a'='a, the expression evaluates to true. Let’s do that in both the login and password fields, so that the end of the expression becomes:

 login='x' or 'a'='a' and password='x' or 'a'='a'

Since in XPath the and operator binds more tightly than or, the injected predicate parses as login='x' or ('a'='a' and password='x') or 'a'='a', and the final 'a'='a' makes it true for every node. In that case, the XPath selects all entries in the tree. However, the page is designed to give this response to a successful login:

So Neo must be the first node in the XML authentication file tree. We now know they are using XML for authentication thanks to the two injections: the bad one and the good one.
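To make the injection concrete, here is a minimal Python sketch of the string concatenation a vulnerable app might perform. The /heroes/hero prefix in the hypothetical build_login_query is an assumption (we only guessed how the query ends), and the hypothetical injected_predicate simply mirrors the injected boolean logic for one node. Python, like XPath, gives and higher precedence than or, so both group the expression the same way.

```python
def build_login_query(login, password):
    """Concatenate user input into an XPath string, as a vulnerable app might.

    Assumption: the /heroes/hero prefix; only the ending was guessed.
    """
    return ("/heroes/hero[login='" + login +
            "' and password='" + password + "']")


def injected_predicate(login_value, password_value):
    """Mirror the injected predicate for one node.

    Reads as: login='x' or ('a'='a' and password='x') or 'a'='a'.
    """
    return (login_value == "x" or "a" == "a" and password_value == "x"
            or "a" == "a")


if __name__ == "__main__":
    payload = "x' or 'a'='a"
    print(build_login_query(payload, payload))
    # /heroes/hero[login='x' or 'a'='a' and password='x' or 'a'='a']

    # True whatever the stored credentials are, so every <hero> matches.
    print(injected_predicate("neo", "trinity"))  # True
```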

The source of the problem

This is the actual line that runs the XPath:

And indeed, the XML file has a structure like this:

<heroes>
  <hero>
    <id>1</id>
    <login>neo</login>
    <password>trinity</password>
    <secret>Oh why didn't I took that BLACK pill?</secret>
    <movie>The Matrix</movie>
    <genre>action sci-fi</genre>
  </hero>
  <hero>
    ...
  </hero>
</heroes>
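One way to sidestep the problem is to never build a query out of user input. This sketch uses Python's ElementTree on a one-hero copy of the structure above and compares the login and password as plain strings, so quotes in the input have no special meaning. find_hero is a hypothetical helper, and the plain-text password is kept only to mirror the example.

```python
import xml.etree.ElementTree as ET

# One-hero copy of the structure above (demo data only).
HEROES_XML = """<heroes>
  <hero>
    <id>1</id>
    <login>neo</login>
    <password>trinity</password>
  </hero>
</heroes>"""


def find_hero(root, login, password):
    """Return the matching <hero> element, or None.

    The input is only compared as text; it never becomes query syntax.
    """
    for hero in root.iter("hero"):
        if (hero.findtext("login") == login
                and hero.findtext("password") == password):
            return hero
    return None


if __name__ == "__main__":
    root = ET.fromstring(HEROES_XML)
    payload = "x' or 'a'='a"
    print(find_hero(root, payload, payload))  # None
    print(find_hero(root, "neo", "trinity").findtext("id"))  # 1
```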

It’s generally not a good idea to store users and passwords (and in this case, 'secrets') in plain text files, even with the XML structure.
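A common alternative to plain-text passwords is a salted, deliberately slow hash. Here is a minimal sketch with PBKDF2-HMAC-SHA256 from Python's standard hashlib module; the iteration count is only illustrative, and hash_password and verify_password are hypothetical helpers.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    salt, digest = hash_password("trinity")
    print(verify_password("trinity", salt, digest))
    print(verify_password("x' or 'a'='a", salt, digest))
```

Only the salt and digest would be stored, so a leaked file no longer reveals the passwords themselves.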

And it’s even worse to use them to check authentication, especially with XML files since, as we’ve just shown, they can be vulnerable to XPath injection attacks.

This goes to show once more the importance of input validation: never take input from users as-is, because then you’re opening a window through which attackers will try to get in.
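Two common defenses can be sketched in Python. One is an allow-list check on the login, assuming a hypothetical policy of 1 to 32 letters, digits, underscores or hyphens. The other, for when a query must be built from a string, quotes the value as an XPath 1.0 string literal: since XPath 1.0 literals have no escape character, a value containing both kinds of quotes is split and rebuilt with concat(). Both helper names are hypothetical.

```python
import re

# Hypothetical policy: logins are 1-32 letters, digits, '_' or '-'.
LOGIN_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,32}")


def is_valid_login(login):
    """Allow-list check: reject anything outside the expected alphabet."""
    return LOGIN_PATTERN.fullmatch(login) is not None


def xpath_literal(value):
    """Quote `value` as an XPath 1.0 string literal.

    XPath 1.0 has no escape character inside literals, so a value that
    contains both quote kinds is split on ' and joined with concat().
    """
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


if __name__ == "__main__":
    payload = "x' or 'a'='a"
    print(is_valid_login(payload))  # False
    print("/heroes/hero[login=" + xpath_literal(payload) + "]")
```

With xpath_literal, the payload becomes the double-quoted literal "x' or 'a'='a", so it is compared as a string instead of being parsed as XPath. Where the library supports XPath variables (lxml does, for example), passing the input as a variable is preferable to building strings at all.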


Tags:

vulnerability

web








Start your free 21-day trial

Discover the benefits of our Continuous Hacking solution, already enjoyed by companies of all sizes.


Fluid Attacks' solutions enable organizations to identify, prioritize and remediate vulnerabilities in their software throughout the SDLC. Supported by AI, automated tools and pentesters, Fluid Attacks accelerates companies' mitigation of risk exposure and strengthens their cybersecurity posture.