.Net: New Feature: MongoDB VectorStore - Allow for filters on nested sub-documents #10152

Open
zarusz opened this issue Jan 10, 2025 · 0 comments
Labels
.NET Issue or Pull requests regarding .NET code triage


name: MongoDB VectorStore - Support Filters on Nested Sub-Documents
about: Currently, MongoDBVectorStoreRecordCollection does not support filtering on nested sub-documents. I am requesting the ability to apply filters on nested fields within MongoDB documents during vector searches.


Feature Request

When working with MongoDBVectorStoreRecordCollection, it is not possible to filter vector searches on fields inside nested sub-documents (embedded objects).

Example Scenario:

Consider the following MongoDB document:

{
  "_id": { "$oid": "673871520bb02bf2a7bb8b4e" },
  "chunkNumber": 1147,
  "url": "https://mywebsite.com",
  "text": "some text",
  "textEmbedding": [],
  "metadata": {
    "source": "gravity9",
    "content_type": "text/css",
    "title": "",
    "targetId": "00000000-0000-0000-0000-000000000000"
  }
}

And this corresponding MongoDB index:

{
  "fields": [
    { "type": "vector", "numDimensions": 1536, "path": "textEmbedding", "similarity": "cosine" },
    { "type": "filter", "path": "metadata.source" },
    { "type": "filter", "path": "metadata.targetId" }
  ]
}
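
For context, the index above already declares the dotted paths as filter fields, so a native Atlas Vector Search query can filter on them. Below is a rough sketch of such a $vectorSearch stage, built with System.Text.Json only so it carries no driver dependency. The index name "vector_index", the numCandidates heuristic (10 x limit), and the class and method names are assumptions for illustration and are not part of Semantic Kernel.

```csharp
using System.Text.Json.Nodes;

public static class VectorSearchStageSketch
{
    // Builds a $vectorSearch aggregation stage that pre-filters on the
    // nested field "metadata.source" using MongoDB dot notation.
    public static JsonObject Build(float[] queryVector, string source, int limit)
    {
        var vector = new JsonArray();
        foreach (var value in queryVector)
        {
            vector.Add(JsonValue.Create(value));
        }

        return new JsonObject
        {
            ["$vectorSearch"] = new JsonObject
            {
                ["index"] = "vector_index",      // assumed index name
                ["path"] = "textEmbedding",      // vector field from the index definition
                ["queryVector"] = vector,
                ["numCandidates"] = limit * 10,  // assumed heuristic, tune as needed
                ["limit"] = limit,
                ["filter"] = new JsonObject
                {
                    // The path must be declared as a "filter" field in the index.
                    ["metadata.source"] = new JsonObject { ["$eq"] = source }
                }
            }
        };
    }
}
```

Calling `VectorSearchStageSketch.Build(embedding, "gravity9", 5).ToJsonString()` produces the stage as JSON. This is the query shape the request asks the connector to be able to generate.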

The model in C# is defined as:

public class KnowledgeChunk
{
    [BsonId(IdGenerator = typeof(StringObjectIdGenerator))]
    [BsonRepresentation(BsonType.ObjectId)]
    [VectorStoreRecordKey]
    public required string Id { get; set; }

    [BsonElement("chunkNumber")]
    [VectorStoreRecordData(IsFilterable = true)]
    public int ChunkNumber { get; set; }

    [BsonElement("text")]
    [VectorStoreRecordData]
    public required string Text { get; set; }

    [BsonElement("textEmbedding")]
    [VectorStoreRecordVector(1536, DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float>? TextEmbedding { get; set; }

    [BsonElement("metadata")]
    // [VectorStoreRecordData] // Uncommenting this throws an error
    public required KnowledgeChunkMetadata Metadata { get; set; }
}

[BsonIgnoreExtraElements]
public class KnowledgeChunkMetadata
{
    [BsonElement("source")]
    [VectorStoreRecordData(IsFilterable = true)]
    public required string Source { get; set; }

    [BsonElement("targetId")]
    [VectorStoreRecordData(IsFilterable = true)]
    public required string TargetId { get; set; }
}

Problem

  1. Error When Marking Nested Properties as Filterable:
    Uncommenting [VectorStoreRecordData] on the Metadata property throws an exception due to unsupported property types:

    System.ArgumentException: Data properties must be one of the supported types...
    Type of the property 'Metadata' is Gravity9.Service.Agent.Application.Plugins.KnowledgeDb.KnowledgeChunkMetadata.
    
  2. No Way to Filter Nested Fields:
    There’s no clear method to filter on nested fields like Metadata.Source using the VectorSearchFilter API.

    Example attempt (fails):

    var propertyName = nameof(KnowledgeChunk.Metadata); // Tried "Metadata.Source" too
    VectorSearchFilter? filter = new VectorSearchFilter().EqualTo(propertyName, "gravity9");
    
    var options = new VectorSearchOptions
    {
        IncludeVectors = false,
        IncludeTotalCount = false,
        Top = request.Top,
        Filter = filter,
    };
    
    var items = await collection.VectorizedSearchAsync(contentVector, options: options, cancellationToken: cancellationToken);

  3. Limitation in Supported Data Types:
    Reviewing MongoDBConstants.SupportedDataTypes suggests only primitive data types are filterable, blocking nested object filtering.


Proposed Solution

  • Enhance Filter Support:
    Enable filtering on nested sub-document fields (e.g., metadata.source, metadata.targetId).
  • Flatten or Path-Based Filtering:
    Implement path-based filtering similar to MongoDB’s dot notation (metadata.source), or automatically flatten nested objects for filtering.
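
To make the path-based option more concrete, here is a minimal sketch of how a property selector could be turned into a dot-notation path. Everything in it is hypothetical: FilterPath, SampleChunk and SampleMetadata are illustration-only names, and the camel-casing rule is an assumption that happens to match the [BsonElement] names in the model above. A real implementation would presumably read the storage names from the attributes or the record definition instead.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Simplified stand-ins for the model in this issue (hypothetical).
public class SampleMetadata
{
    public string Source { get; set; } = "";
    public string TargetId { get; set; } = "";
}

public class SampleChunk
{
    public int ChunkNumber { get; set; }
    public SampleMetadata Metadata { get; set; } = new();
}

public static class FilterPath
{
    // Converts a selector such as x => x.Metadata.Source into "metadata.source".
    public static string For<T>(Expression<Func<T, object?>> selector)
    {
        Expression? node = selector.Body;

        // Value-type members are boxed through a Convert node; unwrap it.
        if (node is UnaryExpression { NodeType: ExpressionType.Convert } convert)
        {
            node = convert.Operand;
        }

        // Walk from the innermost member back to the lambda parameter.
        var segments = new Stack<string>();
        while (node is MemberExpression member)
        {
            segments.Push(ToCamelCase(member.Member.Name));
            node = member.Expression;
        }

        if (node is not ParameterExpression || segments.Count == 0)
        {
            throw new ArgumentException(
                "Selector must be a chain of property accesses.", nameof(selector));
        }

        // The stack enumerates outermost-first, which is the order the path needs.
        return string.Join(".", segments);
    }

    // Assumed naming convention: lower-case the first character only.
    private static string ToCamelCase(string name) =>
        name.Length == 0 ? name : char.ToLowerInvariant(name[0]) + name.Substring(1);
}
```

With this, `FilterPath.For<SampleChunk>(x => x.Metadata.Source)` yields a path that could be handed to a filter clause. Whether the connector should accept such paths directly or flatten nested objects internally is the open design question.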

Questions

  1. Are there any plans to support filtering on nested sub-documents in MongoDBVectorStoreRecordCollection?
  2. If not currently planned, would the team be open to accepting a community contribution to add this feature?

Thank you for considering this request!


Environment:

  • Library: SemanticKernel
  • Storage: MongoDB VectorStore
  • Language: C# (.NET 8.0)

Impact:
This enhancement would significantly improve filtering flexibility for complex document structures and unlock more advanced search capabilities.

@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code triage labels Jan 10, 2025