MongoDB .NET BSON Serialization

dotnet mongodb

When working with MongoDB, you have a few options for converting your classes into documents:

  • A reflection-based, generic serializer (a SerializerBase<T> subclass): Use reflection to read the properties of your class at runtime, iterate over them, and serialize/deserialize each one
  • JSON to BSON converter: Serialize as JSON and then to BSON using Mongo’s helper methods
  • Custom serializer: Write a direct converter for your class (like writing a JsonConverter for System.Text.Json)

The JSON way is the quickest to write, but that alone does not make it the best. The best way is what it always is: it depends.

Reflection

This is likely the best solution for a project with a large number of document types, or one where the document bodies are dynamic, but the implementation can be complex. If you need to cover a wide variety of complex objects, your serialization method has to handle every type that can appear.

For each property being serialized, you need to write the matching BSON type by calling w.WriteString(), w.WriteInt32(), and so on. Using the correct BSON type matters, especially for something like decimal: unless it is written as a BSON decimal (Decimal128), you will lose precision.

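using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Serializers;

public class GenericSerializer<T> : SerializerBase<T>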
{
    public override void Serialize(BsonSerializationContext context, BsonSerializationArgs args, T value)
    {
        var w = context.Writer;
        // BSON start
        w.WriteStartDocument();

        var type = typeof(T);
        var properties = type.GetProperties(BindingFlags.Public | BindingFlags.Instance);

        foreach (var property in properties)
        {
            // The value of the current property on the object being serialized
            var propertyValue = property.GetValue(value);
            // The name of the current property
            var propertyName = property.Name;

            // Writes the name of the property
            w.WriteName(propertyName);

            // Writes the property based on the type of the value
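            switch (propertyValue)
            {
                case null:
                    // A name must always be followed by a value, so write an explicit null
                    w.WriteNull();
                    break;
                case string stringValue:
                    w.WriteString(stringValue);
                    break;
                case int intValue:
                    w.WriteInt32(intValue);
                    break;
                // case decimal, case CustomObject, etc.
                default:
                    // Fail loudly rather than leave a name without a value
                    throw new NotSupportedException($"No BSON mapping for {property.PropertyType}");
            }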
        }

        w.WriteEndDocument();
    }
...
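
The decimal case left as a comment above could be filled in roughly as follows. This is a minimal sketch rather than part of the original serializer: it writes to an in-memory BsonDocumentWriter so that it can run on its own, and the field name Price is made up for illustration.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Bson.IO;

var document = new BsonDocument();
using (var w = new BsonDocumentWriter(document))
{
    w.WriteStartDocument();
    w.WriteName("Price"); // hypothetical field name
    // Decimal128 keeps the exact decimal digits; WriteDouble would not
    w.WriteDecimal128(new Decimal128(0.123m));
    w.WriteEndDocument();
}

Console.WriteLine(document["Price"].BsonType); // Decimal128
```

Inside the switch in Serialize, the equivalent would be a case for decimal that calls w.WriteDecimal128 in the same way.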

For deserialization, you need to consider the following for the object that you will be writing to:

  • Is the object immutable? You will need to instantiate this object, so if it is, you need to know all properties and their values at the beginning. If the object is mutable, you can call the corresponding setters after instantiating.
  • How do you instantiate it? The example below assumes a constructor, but you might not have one. Maybe you use a factory pattern or some other method, but you need to be able to use that creation method generically here.

The BSON values need to be mapped to the corresponding properties on the target object. The method below takes the first public constructor of the target class and reads its parameter names and types. It matches BSON fields to those parameters by name (ignoring case) and passes the values to the constructor to create the object.

If you have multiple constructors, you can pick the correct one using criteria you pass to the serializer, though with generics this will probably require an extra layer of abstraction around the serializer. If the type has no usable constructor, FormatterServices.GetUninitializedObject may help (on recent .NET versions it is marked obsolete and RuntimeHelpers.GetUninitializedObject does the same job). If you have public setters, you can use reflection to call SetValue.

    public override T Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
    {
        // Get the whole BSON doc
        var document = BsonDocumentSerializer.Instance.Deserialize(context, args);
        // Get all the field names in the BSON doc
        var documentFieldNames = document.Elements.Select(x => x.Name).ToList();

        // For type T that we are deserializing, look for the constructor
        var constructor = typeof(T).GetConstructors().First();
        // Constructor parameters in declaration order. Keying them by type would
        // throw as soon as two parameters share a type (for example two strings).
        var constructorParams = constructor.GetParameters();

        // For each constructor parameter, find the corresponding field in the BSON document
        var constructorValues = new List<object?>();
        foreach (var constructorParam in constructorParams)
        {
            var type = constructorParam.ParameterType;
            var name = constructorParam.Name;

            // The corresponding field in the BSON document
            var bsonFieldName =
                documentFieldNames.FirstOrDefault(x => string.Equals(x, name, StringComparison.OrdinalIgnoreCase));
            if (bsonFieldName == null)
            {
                // Keep the argument list aligned with the parameter list
                constructorValues.Add(type.IsValueType ? Activator.CreateInstance(type) : null);
                continue;
            }

            // The value for that BSON field
            var bsonValue = document.GetValue(bsonFieldName);

            // Set that BSON value for the corresponding constructor param
            if (type == typeof(string))
            {
                constructorValues.Add(bsonValue.AsString);
            }
            else if (type == typeof(int))
            {
                constructorValues.Add(bsonValue.AsInt32);
            }
            // else if (type == typeof(decimal)), else if (type == typeof(CustomObject)), etc.
            else
            {
                throw new NotSupportedException($"No BSON mapping for {type}");
            }
        }

        // Instantiate the object via the constructor
        return (T)constructor.Invoke(constructorValues.ToArray());
    }
}
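
The setter-based alternative mentioned earlier (create the object without a constructor, then call SetValue) could be sketched like this. It is an illustration under assumptions, not the approach used in the serializer above: Populate and Person are hypothetical names, and a plain dictionary stands in for the values you would read from the BSON document.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

var person = Populate<Person>(new Dictionary<string, object?>
{
    ["Name"] = "Ada",
    ["Age"] = 36,
});
Console.WriteLine($"{person.Name}, {person.Age}"); // Ada, 36

// Creates a T without running any constructor, then assigns matching properties by name.
static T Populate<T>(IReadOnlyDictionary<string, object?> values) where T : class
{
    // No constructor and no property initializers run here
    var instance = (T)RuntimeHelpers.GetUninitializedObject(typeof(T));
    foreach (var property in typeof(T).GetProperties())
    {
        if (property.CanWrite && values.TryGetValue(property.Name, out var value))
        {
            property.SetValue(instance, value);
        }
    }
    return instance;
}

// Hypothetical target type with public setters
public class Person
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}
```

Because no constructor runs, any property missing from the dictionary keeps its type's default value (null for Name here, not the empty string from the initializer).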

Primitive-only JSON to BSON serialization

Say you have a class with only primitive properties:

public class SomeClass
{
    public string Name { get; init; }
    public int Count { get; init; }
}

To keep things simple, you can write a BSON serializer that converts the object to JSON first, and then to BSON. You could even use generics to accept any type of object that is composed of primitives.

using System;
using System.Text.Json;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;

public class SomeClassJsonSerializer : IBsonSerializer<SomeClass>
{
    public Type ValueType => typeof(SomeClass);
    private static readonly IBsonSerializer Serializer = BsonSerializer.LookupSerializer(typeof(BsonDocument));

    object IBsonSerializer.Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
    {
        return Deserialize(context, args);
    }

    public void Serialize(BsonSerializationContext context, BsonSerializationArgs args, object value)
    {
        Serialize(context, args, (SomeClass)value);
    }

    public SomeClass Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
    {
        // Get the BSON document
        var document = Serializer.Deserialize(context, args);
        var bsonDocument = document.ToBsonDocument();
        bsonDocument.Remove("_id");

        // Convert to JSON, and then deserialize using JSON
        var result = BsonExtensionMethods.ToJson(bsonDocument)!;
        return JsonSerializer.Deserialize<SomeClass>(result);
    }

    public void Serialize(BsonSerializationContext context, BsonSerializationArgs args, SomeClass value)
    {
        // Serialize SomeClass to JSON
        var jsonDocument = JsonSerializer.Serialize(value);
        // Use built-in method to serialize JSON to BSON(Document)
        var bsonDocument = BsonSerializer.Deserialize<BsonDocument>(jsonDocument);
        Serializer.Serialize(context, bsonDocument.AsBsonValue);
    }
}

IBsonSerializer<SomeClass> extends IBsonSerializer, so you need to include both the generic and non-generic implementations of Serialize and Deserialize. But the logic is simple:

  • Serialize: SomeClass => JSON => BSON
  • Deserialize: BSON => JSON => SomeClass

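Note that MongoDB adds an _id field to each document on insert when one is not supplied; this is MongoDB's behavior, not part of BSON itself. The Deserialize method above removes it before converting to JSON, since SomeClass has no matching property.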
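
The generic variant mentioned earlier could look something like the sketch below. JsonBridgeSerializer and Item are hypothetical names, and it derives from SerializerBase<T> to avoid writing the non-generic methods by hand. It also assumes a driver version that has JsonOutputMode.RelaxedExtendedJson, which is requested explicitly so that the output is plain JSON that System.Text.Json can parse.

```csharp
using System;
using System.Text.Json;
using MongoDB.Bson;
using MongoDB.Bson.IO;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Serializers;

BsonSerializer.RegisterSerializer(new JsonBridgeSerializer<Item>());

var bytes = new Item { Name = "widget", Count = 3 }.ToBson();
var item = BsonSerializer.Deserialize<Item>(bytes);
Console.WriteLine($"{item.Name}: {item.Count}"); // widget: 3

// Hypothetical primitive-only type
public class Item
{
    public string Name { get; init; } = "";
    public int Count { get; init; }
}

public class JsonBridgeSerializer<T> : SerializerBase<T>
{
    public override void Serialize(BsonSerializationContext context, BsonSerializationArgs args, T value)
    {
        // T => JSON => BsonDocument => BSON
        var json = JsonSerializer.Serialize(value);
        var document = BsonDocument.Parse(json);
        BsonDocumentSerializer.Instance.Serialize(context, document);
    }

    public override T Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
    {
        // BSON => BsonDocument => JSON => T
        var document = BsonDocumentSerializer.Instance.Deserialize(context, args);
        document.Remove("_id");
        var settings = new JsonWriterSettings { OutputMode = JsonOutputMode.RelaxedExtendedJson };
        return JsonSerializer.Deserialize<T>(document.ToJson(settings))!;
    }
}
```

This sketch does not handle null values or types whose JSON form is not an object, so it only suits simple document classes.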

Complex-type BSON serialization

For more complex classes, especially if you only have a few of them, as I did, writing custom serializers is also an option.

Consider that SomeClass now has a nested class:

public class SomeClass
{
    public string Name { get; init; }
    public int Count { get; init; }
    public SomeNestedClass Nested { get; init; }
}

public class SomeNestedClass
{
    public string NestedName { get; init; }
}

This time you can derive from SerializerBase<T>, since you are writing a serializer the way the MongoDB driver intends. You will need to write a serializer for both SomeClass and SomeNestedClass, but the process is much the same as writing a JsonConverter for System.Text.Json.

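using MongoDB.Bson;
using MongoDB.Bson.IO;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Serializers;

public class SomeClassSerializer : SerializerBase<SomeClass>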
{
    public override SomeClass Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
    {
        // Get the serializer for SomeNestedClass
        var nestedClassSerializer = BsonSerializer.LookupSerializer<SomeNestedClass>();

        var r = context.Reader;

        // The field names may not be in a specific order, so have placeholders for the values
        string? name = null;
        int? count = null;
        SomeNestedClass? nestedClass = null;

        r.ReadStartDocument();
        // For each field, read the name and value
        while (r.ReadBsonType() != BsonType.EndOfDocument)
        {
            var fieldName = r.ReadName();

            switch (fieldName)
            {
                case nameof(SomeClass.Name):
                    name = r.ReadString(); // You must read the value by its type
                    break;
                case nameof(SomeClass.Count):
                    count = r.ReadInt32();
                    break;
                case nameof(SomeClass.Nested):
                    nestedClass = nestedClassSerializer.Deserialize(context, args);
                    break;
                default:
                    r.SkipValue();
                    break;
            }
        }

        r.ReadEndDocument();

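        // SomeClass only has init-only properties, so use an object initializer
        return new SomeClass { Name = name!, Count = count!.Value, Nested = nestedClass! };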
    }

    public override void Serialize(BsonSerializationContext context, BsonSerializationArgs args, SomeClass value)
    {
        var w = context.Writer;
        w.WriteStartDocument();

        // Write the field name, and then the value
        w.WriteName(nameof(SomeClass.Name));
        w.WriteString(value.Name);

        w.WriteName(nameof(SomeClass.Count));
        w.WriteInt32(value.Count);

        // Write the field name, and then the value, which is an object/document itself
        w.WriteName(nameof(SomeClass.Nested));
        w.WriteStartDocument();
        w.WriteName(nameof(SomeNestedClass.NestedName));
        w.WriteString(value.Nested.NestedName);
        w.WriteEndDocument();

        w.WriteEndDocument();
    }
}

public class SomeNestedClassSerializer : SerializerBase<SomeNestedClass>
{
    public override SomeNestedClass Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
    {
        var r = context.Reader;

        string? name = null;

        r.ReadStartDocument();
        // For each field, read the name and value
        while (r.ReadBsonType() != BsonType.EndOfDocument)
        {
            var fieldName = r.ReadName();

            switch (fieldName)
            {
                case nameof(SomeNestedClass.NestedName):
                    name = r.ReadString(); // You must read the value by its type
                    break;
                default:
                    r.SkipValue();
                    break;
            }
        }

        r.ReadEndDocument();

        // SomeNestedClass only has init-only properties, so use an object initializer
        return new SomeNestedClass { NestedName = name! };
    }

    public override void Serialize(BsonSerializationContext context, BsonSerializationArgs args, SomeNestedClass value)
    {
        var w = context.Writer;
        w.WriteStartDocument();

        // Write the field name, and then the value
        w.WriteName(nameof(SomeNestedClass.NestedName));
        w.WriteString(value.NestedName);

        w.WriteEndDocument();
    }
}
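
Custom serializers only take effect once they are registered, and the nested lookup in Deserialize returns the custom nested serializer only if that one is registered too. The sketch below shows registration and a round trip using two small hypothetical types (Envelope and Note) in place of SomeClass and SomeNestedClass. As a variation on the code above, it delegates to the nested serializer when writing as well as when reading.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Bson.IO;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Serializers;

// Register once at startup, before either type is first serialized
BsonSerializer.RegisterSerializer(new NoteSerializer());
BsonSerializer.RegisterSerializer(new EnvelopeSerializer());

var bytes = new Envelope { Body = new Note { Text = "hello" } }.ToBson();
var copy = BsonSerializer.Deserialize<Envelope>(bytes);
Console.WriteLine(copy.Body.Text); // hello

public class Note { public string Text { get; init; } = ""; }
public class Envelope { public Note Body { get; init; } = new(); }

public class NoteSerializer : SerializerBase<Note>
{
    public override void Serialize(BsonSerializationContext context, BsonSerializationArgs args, Note value)
    {
        var w = context.Writer;
        w.WriteStartDocument();
        w.WriteName(nameof(Note.Text));
        w.WriteString(value.Text);
        w.WriteEndDocument();
    }

    public override Note Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
    {
        var r = context.Reader;
        var text = "";
        r.ReadStartDocument();
        while (r.ReadBsonType() != BsonType.EndOfDocument)
        {
            if (r.ReadName() == nameof(Note.Text)) text = r.ReadString();
            else r.SkipValue();
        }
        r.ReadEndDocument();
        return new Note { Text = text };
    }
}

public class EnvelopeSerializer : SerializerBase<Envelope>
{
    public override void Serialize(BsonSerializationContext context, BsonSerializationArgs args, Envelope value)
    {
        var w = context.Writer;
        w.WriteStartDocument();
        w.WriteName(nameof(Envelope.Body));
        // Delegate the nested document to whichever serializer is registered for Note
        BsonSerializer.LookupSerializer<Note>().Serialize(context, value.Body);
        w.WriteEndDocument();
    }

    public override Envelope Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
    {
        var r = context.Reader;
        Note? body = null;
        r.ReadStartDocument();
        while (r.ReadBsonType() != BsonType.EndOfDocument)
        {
            if (r.ReadName() == nameof(Envelope.Body))
                body = BsonSerializer.LookupSerializer<Note>().Deserialize(context);
            else r.SkipValue();
        }
        r.ReadEndDocument();
        return new Envelope { Body = body ?? new Note() };
    }
}
```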

Gotchas

If you use the JSON to BSON method, you run into an issue when your object has a decimal property. The decimal value 0.123m is serialized to JSON first, producing the bare number 0.123, which is then stored in BSON as a double. When you deserialize, it is treated as a BSON double, not a decimal. A proper BSON decimal is stored as:

new NumberDecimal("0.123")

which will properly be deserialized as a decimal.
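
The difference is easy to check in isolation. The following sketch uses an anonymous object with a made-up Price property, sends it through JSON, and compares the resulting BSON type with a document built directly from a Decimal128:

```csharp
using System;
using System.Text.Json;
using MongoDB.Bson;

// Through JSON: the decimal becomes a bare JSON number
var json = JsonSerializer.Serialize(new { Price = 0.123m });
Console.WriteLine(json); // {"Price":0.123}

var viaJson = BsonDocument.Parse(json);
Console.WriteLine(viaJson["Price"].BsonType); // Double

// Directly: the value is stored as a BSON decimal
var direct = new BsonDocument("Price", new BsonDecimal128(new Decimal128(0.123m)));
Console.WriteLine(direct["Price"].BsonType); // Decimal128
```

If your classes contain decimals, that points toward the reflection-based or hand-written serializers, where you control the BSON type that is written.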