Understand Stale Statistics and Query Execution Plans in MongoDB

1. What Are Database Statistics
2. Stale Statistics in MongoDB
- 2.1. How Statistics Affect Execution Plans
- 2.2. How Stale Statistics Impact MongoDB
3. Detecting Stale Statistics in MongoDB
4. MongoDB Query Examples: From Optimal to Problematic
- 4.1. Baseline: Optimal Query Performance
- 4.2. Potentially Slow Queries
5. Performance Improvement Strategies
- 5.1. Index Design
  - 5.1.1. For the nested array query
  - 5.1.2. For compound queries on arrays
  - 5.1.3. For text searches
  - 5.1.4. For $elemMatch compound queries
  - 5.1.5. For date range queries
  - 5.1.6. For aggregation with $unwind
- 5.2. Check index usage statistics
- 5.3. Track index selectivity changes over time
6. Fixing stale plan cache Problems when Indexes Already Exist in MongoDB
- 6.1. Clear Plan Cache
- 6.2. Rebuild Indexes
- 6.3. Use Index Hints
- 6.4. Analyze Query Performance
- 6.5. Configure Plan Cache Settings
7. Set Up Alerts
- 7.1. MongoDB Atlas Alert Setup
- 7.2. Self-Hosted MongoDB alert setup

February 26, 2026

Database

Mongodb

1. What Are Database Statistics

Database statistics are metadata that the query optimizer uses to estimate the cost of different execution plans. These statistics include:

Row counts in tables
Distribution of values in columns
Index cardinality
Data density and selectivity
NULL value frequencies

When these statistics become outdated (stale), the optimizer makes suboptimal decisions, leading to poor query performance.

2. Stale Statistics in MongoDB

2.1. How Statistics Affect Execution Plans

MongoDB's query optimizer uses collection metadata and index statistics to decide:

Whether to use an index or collection scan
Which index to use when multiple options exist
Query shape and plan caching decisions

Unlike relational databases, MongoDB uses a different approach with:

Query plan caching based on query shape
Empirical testing of multiple plans
Adaptive plan selection

2.2. How Stale Statistics Impact MongoDB

MongoDB's query planner can suffer from:

Cached plans becoming suboptimal as data distribution changes
Index selection issues when index cardinality changes significantly
Performance degradation with stale plan cache

For detection methods, see 〈3. Detecting Stale Statistics in MongoDB〉. For solutions, see 〈5. Performance Improvement Strategies〉 and 〈6. Fixing stale plan cache Problems when Indexes Already Exist in MongoDB〉.

3. Detecting Stale Statistics in MongoDB

3.1. Key Difference from Relational Databases

Unlike PostgreSQL or Oracle, MongoDB doesn't maintain explicit staleness indicators like n_mod_since_analyze or stale_stats columns. Instead, we must detect stale statistics indirectly through performance symptoms and query analysis (see 〈3.4. Indirect Detection Method: Performance degradation symptoms〉 and 〈3.5. Analyze an explain("executionStats") output〉).

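3.2. MongoDB's Automatic Plan Cache Invalidation

Older MongoDB releases (2.6 and earlier) flushed a collection's cached query plans after approximately 1,000 write operations (inserts, updates, deletes). Current releases no longer rely on a write counter. Instead:

Creating or dropping an index, dropping the collection, or restarting mongod clears the collection's plan cache
A cached plan is re-evaluated (replanned) when running it takes far more work than it did when the plan was cached
Collections whose data shifts slowly might keep a suboptimal cached plan for a long time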

However, this automatic mechanism doesn't always catch gradual data distribution changes or index selectivity degradation.

3.3. Important note for MongoDB Atlas users

Atlas restricts direct access to plan cache operations ($planCacheStats, getPlanCache()) for security and resource management.

Where plan cache operations ARE available:

Self-hosted MongoDB - full access to all plan cache commands
MongoDB Community/Enterprise running on our own servers (on-premises or cloud VMs)
Local MongoDB development instances (localhost)
Containerized MongoDB (Docker, Kubernetes) that we manage

Where plan cache operations are NOT available:

MongoDB Atlas (managed cloud service) - restricted for security
Atlas Serverless - no direct cache access
Other managed MongoDB services - may have restrictions

Atlas users should rely on:

Performance Advisor in Atlas UI (provides similar insights)
Query profiler and slow query logs
explain("executionStats") for individual queries (still works! - see 〈3.5. Analyze an explain("executionStats") output〉)
Atlas monitoring metrics and alerts (see 〈7.1. MongoDB Atlas Alert Setup〉)
Real User Monitoring (RUM) for query performance tracking

For all detection methods, see 〈3.4. Indirect Detection Method: Performance degradation symptoms〉 and 〈3.5. Analyze an explain("executionStats") output〉.

3.4. Indirect Detection Method: Performance degradation symptoms

Query execution time suddenly increases without code changes
Previously fast queries become slow
Increased CPU or memory usage for specific queries
More timeouts or slow query log entries

3.5. Analyze an `explain("executionStats")` output

db.orders.find({ status: "pending" }).explain("executionStats")

Key metrics to examine:

{
  "executionStats": {
    "nReturned": 100,           // Documents returned to client
    "totalDocsExamined": 50000, // Documents scanned
    "executionTimeMillis": 1523,
    "executionStages": {
      "stage": "COLLSCAN"       // Collection scan instead of index!
    }
  }
}
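
As a rough aid for reading this output, the examination ratio can be computed directly from the `executionStats` fields. The sketch below is plain JavaScript working on an explain-shaped object. `summarizeExplain` is a hypothetical helper name, not a MongoDB API, and the sample object simply mirrors the values shown above.

```javascript
// Hypothetical helper: pull out the fields discussed above from an
// explain("executionStats") result that is passed in as a plain object.
function summarizeExplain(explain) {
  const stats = explain.executionStats;
  // Avoid dividing by zero when the query returned nothing.
  const ratio =
    stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : Infinity;
  return {
    stage: stats.executionStages.stage,
    nReturned: stats.nReturned,
    totalDocsExamined: stats.totalDocsExamined,
    examinationRatio: ratio,
  };
}

// Sample object mirroring the output shown above (illustrative values).
const sampleExplain = {
  executionStats: {
    nReturned: 100,
    totalDocsExamined: 50000,
    executionTimeMillis: 1523,
    executionStages: { stage: "COLLSCAN" },
  },
};

const summary = summarizeExplain(sampleExplain);
console.log(summary);
```

In mongosh, the same function could be fed the object returned by `.explain("executionStats")` instead of the sample.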

3.6. Red flags indicating stale statistics or poor index usage

3.6.1. High examination ratio (`totalDocsExamined` / `nReturned` > 10)

3.6.1.1. On B-tree

Misconception. A high examination ratio is inevitable because we filter documents with WHERE clauses first, then return results.

This is wrong. With proper indexes, the database uses the index to identify matching documents BEFORE examining them. The index is the filter mechanism.

Misconception. But don't I need to scan all rows to check the WHERE condition?

No! This is the entire purpose of indexes - to avoid scanning all rows.

How B-tree indexes eliminate full scans:

Think of an index like a phone book:

Without index (full scan): To find "John Smith", open EVERY page and check EVERY name (scan all 1 million entries)
With index (B-tree): Jump to the "S" section → Jump to "Sm" → Jump to "Smith, John" (check ~log₂(1M) ≈ 20 entries)
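
To make the log₂ estimate concrete, here is a small sketch that counts the probes a binary search needs over one million sorted positions. It is only an analogy for the logarithmic growth: a real B-tree node holds many keys, so its depth is smaller still. `probesForBinarySearch` is a name made up for this illustration.

```javascript
// Count how many probes a binary search makes before it reaches
// `targetIndex` in a sorted range of `n` positions.
function probesForBinarySearch(n, targetIndex) {
  let low = 0;
  let high = n - 1;
  let probes = 0;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    probes += 1;
    if (mid === targetIndex) return probes;
    if (mid < targetIndex) low = mid + 1;
    else high = mid - 1;
  }
  return probes;
}

const probes = probesForBinarySearch(1000000, 999999);
const bound = Math.ceil(Math.log2(1000000));
console.log(`probes: ${probes}, log2 bound: ${bound}`);
```

The probe count never exceeds the log₂ bound, whereas a linear scan could need up to 1,000,000 checks.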

3.6.1.2. Detailed example on how the B-tree structure works

// Collection has 1,000,000 users
// We want: status = "active"
// Only 1,000 users are "active"

// WITHOUT INDEX - Must scan everything:
db.users.find({ status: "active" })

// Physical process WITHOUT index:
// 1. Start at first document on disk
// 2. Read document #1, check status → "inactive" → skip
// 3. Read document #2, check status → "pending" → skip
// 4. Read document #3, check status → "active" → keep!
// 5. Read document #4, check status → "inactive" → skip
// ... repeat for ALL 1,000,000 documents!
// Total reads: 1,000,000 documents from disk
// Time: ~10+ seconds

Now let's see what happens with an index:

// CREATE INDEX - builds this B-tree structure:
db.users.createIndex({ status: 1 })

// Index creates a separate sorted tree structure:
/*
B-tree Index on "status" field:
                    [Root Node]
                  /            \
            ["active"]          ["inactive", "pending"]
              /                     /              \
    [Doc IDs: 5, 12,         [Doc IDs:        [Doc IDs:
    23, 45, 78,              1, 2, 3,         999,997,
    ... 1000 IDs]            ... 500k IDs]    ... 499k IDs]

The index stores:
- Sorted keys ("active", "inactive", "pending")
- Document IDs/pointers for each key value
- Tree structure for fast lookup (log N time)
*/

// Now query WITH index:
db.users.find({ status: "active" })

// Physical process WITH index:
// 1. Look at root node of B-tree (1 read)
// 2. Compare "active" < root → go left branch (1 read)
// 3. Found "active" node → contains list of 1,000 document IDs
// 4. Read only those 1,000 documents by ID from disk
// Total reads: ~3 index nodes + 1,000 documents = 1,003 reads
// Time: ~10 milliseconds

// Result:
// WITHOUT index: Read 1,000,000 docs, examined 1,000,000, returned 1,000 (1000x ratio)
// WITH index:    Read 1,000 docs, examined 1,000, returned 1,000 (1.0x ratio)

Question. Why DON'T we need to scan all rows with an index?

The B-tree index is a pre-sorted lookup table that was built when we created the index:

// When we run: db.users.createIndex({ status: 1 })
// MongoDB builds this mapping (simplified):

// Index structure (conceptual, not literal storage):
// {
//   "active": [5, 12, 23, 45, 78, ... 1000 document IDs],
//   "inactive": [1, 2, 3, 4, 6, ... 500,000 document IDs],
//   "pending": [7, 8, 9, ... 499,000 document IDs]
// }

// When we query: db.users.find({ status: "active" })
// MongoDB does NOT scan the collection!
// Instead:
// 1. Look up "active" in the pre-built index → Found! [5, 12, 23, ...]
// 2. Jump directly to documents 5, 12, 23, ... on disk
// 3. Return those specific documents

The key difference:

Full scan: Check condition on every row in the table
Index scan: Check condition in the index only (which is sorted and tiny), then fetch only matching rows

3.6.1.3. Performance comparison

Operation	Without Index	With Index	Improvement
Data structures accessed	Main table only	Index tree + specific docs	Separate lookup structure
Comparisons needed	1,000,000	~20 (tree depth) + 1,000	1000x fewer
Documents read from disk	1,000,000	1,000	Read only matches
Time complexity	O(n), linear	O(log n + k)	k = matching docs
Actual time	10+ s	10 ms	1000x faster
totalDocsExamined	1,000,000	1,000	Only matching docs
Examination ratio	1000x	1.0x	Perfect efficiency

Let's answer the following question:

Question. Do we need to scan all rows to check the WHERE condition?

No! The index is a pre-built answer to our WHERE condition.

When we create an index, MongoDB:

Scans all rows ONCE during index creation
Builds a sorted structure (B-tree) with the answer
Maintains it as data changes

When we query:

Look up in the pre-built answer (the index)
Get document IDs instantly (no scanning needed)
Fetch only those specific documents

Think of it as: "Scanning is done at index creation time, not at query time."
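
The idea can be simulated in a few lines of plain JavaScript. This is a sketch under simplifying assumptions: an array stands in for the collection, a `Map` stands in for the B-tree, and the sizes are scaled down from the example above. None of the names below are MongoDB APIs.

```javascript
// Build a small in-memory "collection". Every 1000th document is "active".
const docs = [];
for (let id = 0; id < 100000; id++) {
  const status =
    id % 1000 === 0 ? "active" : id % 2 === 0 ? "inactive" : "pending";
  docs.push({ _id: id, status });
}

// "Index creation": one full pass that records which ids hold each value.
const statusIndex = new Map();
for (const doc of docs) {
  if (!statusIndex.has(doc.status)) statusIndex.set(doc.status, []);
  statusIndex.get(doc.status).push(doc._id);
}

// Query without the index: every document is examined.
function findByScan(status) {
  let examined = 0;
  let returned = 0;
  for (const doc of docs) {
    examined += 1;
    if (doc.status === status) returned += 1;
  }
  return { examined, returned };
}

// Query with the index: only the ids listed under the key are fetched.
function findByIndex(status) {
  const ids = statusIndex.get(status) || [];
  const fetched = ids.map((id) => docs[id]);
  return { examined: fetched.length, returned: fetched.length };
}

const scanResult = findByScan("active");
const indexResult = findByIndex("active");
console.log("scan:", scanResult, "index:", indexResult);
```

The scan pays its full cost on every query, while the indexed lookup pays it once at build time and then touches only the matching documents.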

3.6.1.4. When high examination ratio is acceptable

// An index that covers only part of the predicate leaves some filtering to the fetch step
db.collection.find({ "tags": "urgent", "status": "active" })

Why an index on tags alone examines more documents than it returns:

Given these documents:

{ _id: 1, tags: ["urgent", "bug"], status: "active" }     // Match
{ _id: 2, tags: ["urgent", "feature"], status: "closed" } // Has "urgent" but wrong status
{ _id: 3, tags: ["urgent", "docs"], status: "active" }    // Match
{ _id: 4, tags: ["urgent", "test"], status: "pending" }   // Has "urgent" but wrong status

With a single-field multikey index on {tags: 1} (a compound index on {tags: 1, status: 1} would let MongoDB apply both predicates inside the index and examine only the 2 matching documents):

Index lookup: Find all entries where tags = "urgent" (using index) → finds 4 documents
Document examination: MongoDB must fetch each document and check its status field
- Doc 1: urgent + active → return
- Doc 2: urgent + closed → examined but not returned
- Doc 3: urgent + active → return
- Doc 4: urgent + pending → examined but not returned

Result:

nReturned: 2 (docs 1 and 3)
totalDocsExamined: 4 (all documents with "urgent")
Ratio: 2x (examined 4, returned 2)

Why this is acceptable:

Still used an index (IXSCAN, not COLLSCAN)
Only examined 4 documents, not the entire collection (could be 50,000 docs)
The overhead is due to legitimate index matches that failed secondary predicates
Much faster than scanning all documents without an index

Acceptable ratio: 1.5x - 3x for indexes that cover only some of the query's predicates
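
The counts above can be reproduced with a small simulation. This is a sketch that assumes the index covers only `tags`, so `status` has to be checked on each fetched document. The `Map` is a stand-in for a multikey index, not how MongoDB stores it.

```javascript
const tickets = [
  { _id: 1, tags: ["urgent", "bug"], status: "active" },
  { _id: 2, tags: ["urgent", "feature"], status: "closed" },
  { _id: 3, tags: ["urgent", "docs"], status: "active" },
  { _id: 4, tags: ["urgent", "test"], status: "pending" },
];

// Multikey-style index on `tags` only: one entry per array element.
const tagIndex = new Map();
for (const doc of tickets) {
  for (const tag of doc.tags) {
    if (!tagIndex.has(tag)) tagIndex.set(tag, []);
    tagIndex.get(tag).push(doc._id);
  }
}

// The index narrows by tag; status is checked on each fetched document.
const candidateIds = tagIndex.get("urgent") || [];
const fetched = candidateIds.map((id) => tickets.find((d) => d._id === id));
const matches = fetched.filter((d) => d.status === "active");

const totalDocsExamined = fetched.length;
const nReturned = matches.length;
console.log({ totalDocsExamined, nReturned, ratio: totalDocsExamined / nReturned });
```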

3.6.1.5. When high ratio indicates a problem

// No index exists - must scan everything
nReturned: 100
totalDocsExamined: 50,000
stage: "COLLSCAN"
// Problem: Missing index, add one!

// Wrong index chosen due to stale statistics
nReturned: 100
totalDocsExamined: 50,000
stage: "IXSCAN"
indexName: "wrong_index"
// Problem: Optimizer chose poorly selective index
// Solution: Update statistics or hint correct index

// Index exists but has poor selectivity
nReturned: 1
totalDocsExamined: 10,000
stage: "IXSCAN"
indexName: "status_1"
// Problem: Index not selective enough (e.g., status has only 2 values)
// Solution: Create compound index with more selective fields

3.6.2. Collection scans (stage: `COLLSCAN`) when an index should exist

This is the PRIMARY indicator. If we see COLLSCAN:

No index exists on filtered fields, OR
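A full scan is the natural plan anyway (the query has no filter, or the collection is small enough that scanning it is cheap). Unlike cost-based relational optimizers, MongoDB's classic planner does not fall back to a collection scan at a fixed percentage-of-rows threshold when a usable index exists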

// Expected COLLSCAN (acceptable):
db.smallCollection.find({})  // Fetching all documents
// Collection has 100 documents - full scan is efficient

// Problematic COLLSCAN:
db.largeCollection.find({ userId: "12345" })  // Should use index
// Collection has 10M documents - needs index on userId!

3.6.3. Wrong index chosen (`indexName` shows unexpected index)

Even with IXSCAN, the optimizer might choose the wrong index due to stale statistics:

// Expected: Use compound index { userId: 1, timestamp: 1 }
// Actual: Used index { timestamp: 1 } - less selective
stage: "IXSCAN"
indexName: "timestamp_1"
totalDocsExamined: 50,000  // High ratio despite using index!
nReturned: 10

Question. When does the optimizer choose the wrong index?

The query optimizer uses statistics to estimate how many documents each index will need to examine (this estimate is called cardinality). It chooses the index with the lowest estimated cost.

3.6.3.1. Scenario 1: Stale statistics cause wrong index selection

Consider a users collection with 1,000,000 documents and two indexes:

Index A: { status: 1 } (user account status)
Index B: { email: 1 } (user email)

Query: db.users.find({ status: "premium", email: "john@example.com" })

When statistics were fresh (1 month ago):

Status distribution: 50% "free", 50% "premium" (500K each)
Optimizer's estimate: Index A would scan ~500K docs, Index B would scan ~1 doc
Correct choice: Index B (email is unique)

After data growth (statistics NOT updated):

Status distribution changed: 90% "free", 10% "premium" (900K free, 100K premium)
But optimizer still thinks: 50% "free", 50% "premium" (old stats)
Optimizer's calculation (based on stale stats):
- Index A: ~500K docs to scan (WRONG - actually 100K)
- Index B: ~1 doc to scan (correct)
- Still chooses Index B (still the right choice: the estimate for Index A is wrong, but not wrong enough to change the ranking)

3.6.3.2. Scenario 2: With multiple stale statistics

Query: db.users.find({ status: "premium", country: "USA" })

Indexes:

Index A: { status: 1 }
Index C: { country: 1 }

Optimizer's calculation (using stale stats):

Old stats say: 50% premium users (500K), 30% USA users (300K)
Optimizer estimates: Index A scans 500K, Index C scans 300K
Choice: Index C (lower estimate)

Actual current data:

Reality: 10% premium users (100K), 80% USA users (800K)
If optimizer knew this: Index A scans 100K, Index C scans 800K
Should have chosen: Index A (much better!)

Result with wrong index choice:

stage: "IXSCAN"
indexName: "country_1"  // Wrong index!
totalDocsExamined: 800,000  // Scans every USA user
nReturned: 8,000  // Only 1% of USA users are premium
// Examination ratio: 800,000 / 8,000 = 100x inefficiency!
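
The arithmetic in this scenario can be written out as a toy cost comparison. This is only a sketch of the reasoning in this section: it assumes a purely estimate-driven choice, and `cheapestIndex` and the two selectivity tables are hypothetical names, not part of MongoDB.

```javascript
const totalDocs = 1000000;

// Fraction of the collection each index is believed to match.
const staleStats = { status_1: 0.5, country_1: 0.3 };  // what the optimizer thinks
const actualStats = { status_1: 0.1, country_1: 0.8 }; // what the data looks like now

// Pick the index with the lowest estimated number of documents to scan.
function cheapestIndex(selectivityByIndex) {
  let best = null;
  for (const [indexName, fraction] of Object.entries(selectivityByIndex)) {
    const estimatedDocs = Math.round(fraction * totalDocs);
    if (best === null || estimatedDocs < best.estimatedDocs) {
      best = { indexName, estimatedDocs };
    }
  }
  return best;
}

const chosenWithStale = cheapestIndex(staleStats);
const chosenWithActual = cheapestIndex(actualStats);
console.log("stale estimates pick:", chosenWithStale);
console.log("actual data would pick:", chosenWithActual);
```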

3.6.3.3. Summary of the two scenarios

Why this happens.

Statistics capture data distribution at a point in time
Data changes (inserts, updates, deletes skew distribution)
Optimizer uses OLD cardinality estimates to calculate index costs
Wrong index appears cheaper based on outdated information
Query uses IXSCAN but examines many irrelevant documents

MongoDB vs SQL databases (same behavior). This is NOT unique to MongoDB. Major relational databases (such as PostgreSQL, MySQL, Oracle, SQL Server) have the same issue. They all:

Use B-tree indexes (same data structure)
Rely on statistics for query planning
Need periodic statistics updates to maintain optimal plans
Can choose wrong indexes when statistics are stale

The index mechanics work identically - the problem is the optimizer's decision-making based on outdated information, not the index structure itself.

How to fix. Force the optimizer to use the correct index. When stale statistics cause wrong index selection, we have several solutions:

Solution 1 (Update statistics, a permanent fix). MongoDB automatically maintains statistics, but we can force a refresh:

// Option 1: Reindex the collection (rebuilds all of its indexes)
// Note: reIndex() is deprecated since MongoDB 6.0 and, from 5.0, runs only on standalone instances
db.users.reIndex()

// Option 2: Run validate command (checks data and index consistency; can correct count and size stats)
db.runCommand({ validate: "users", full: true })

// Option 3: Drop and recreate problematic index (this also clears the collection's plan cache)
db.users.dropIndex("country_1")
db.users.createIndex({ country: 1 })

Solution 2 (Use index hints, a temporary workaround). Force the query to use a specific index:

// Force use of status index
db.users.find({ status: "premium", country: "USA" })
  .hint({ status: 1 })

// Alternative: Force by index name
db.users.find({ status: "premium", country: "USA" })
  .hint("status_1")

// Verify it uses the correct index
db.users.find({ status: "premium", country: "USA" })
  .hint({ status: 1 })
  .explain("executionStats")

Solution 3 (Create compound index, a long-term solution). If queries frequently filter on multiple fields:

// Create compound index
db.users.createIndex({ status: 1, country: 1 })

// Now this query will efficiently use the compound index
db.users.find({ status: "premium", country: "USA" })
// Uses status first (10% selectivity) then country (eliminates most remaining)

Solution 4 (Rewrite query). Sometimes restructuring the query helps, but verify the effect with explain(), because the aggregation optimizer can merge adjacent $match stages back into a single filter:

// Original problematic query
db.users.find({ status: "premium", country: "USA" })

// Alternative: Use aggregation with explicit pipeline
db.users.aggregate([
  { $match: { status: "premium" } },  // Intended to use the status index first
  { $match: { country: "USA" } }       // Intended to filter by country afterwards
])

Verification. After applying any solution, confirm the fix worked using explain():

db.users.find({ status: "premium", country: "USA" })
  .explain("executionStats")

// Check these values:
// 1. indexName on the IXSCAN stage (usually under executionStats.executionStages.inputStage): "status_1"
// 2. executionStats.totalDocsExamined: ~100,000 (not 800,000)
// 3. Examination ratio: ~12.5x (100K examined / 8K returned)
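
Because the index scan is often nested under a FETCH stage, it can help to walk the plan tree rather than read one fixed path. The sketch below does this over a plain object. `collectStages` is a hypothetical helper, and the sample plan is an illustrative shape rather than output captured from a real server.

```javascript
// Recursively collect every stage in a plan tree, following the
// inputStage / inputStages links that explain output uses for children.
function collectStages(stage, found = []) {
  if (!stage) return found;
  found.push({ stage: stage.stage, indexName: stage.indexName });
  if (stage.inputStage) collectStages(stage.inputStage, found);
  for (const child of stage.inputStages || []) collectStages(child, found);
  return found;
}

// Illustrative plan shape: FETCH filtering on country over an IXSCAN on status.
const sampleWinningPlan = {
  stage: "FETCH",
  filter: { country: { $eq: "USA" } },
  inputStage: {
    stage: "IXSCAN",
    indexName: "status_1",
    keyPattern: { status: 1 },
  },
};

const stages = collectStages(sampleWinningPlan);
const indexNames = stages
  .filter((s) => s.stage === "IXSCAN")
  .map((s) => s.indexName);
console.log(indexNames);
```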

3.6.4. High execution time (`executionTimeMillis` significantly higher than baseline)

Combined with other metrics to identify problems.

3.7. Diagnostic Workflow: What to check in order

When investigating query performance using explain("executionStats") (see 〈3.5. Analyze an explain("executionStats") output〉 for details):

First: Check stage - is it COLLSCAN or IXSCAN?
- COLLSCAN → Need to create index (see 〈5. Performance Improvement Strategies〉)
- IXSCAN → Continue to next check
Second: Check indexName - is it the right index?
- Wrong index → Clear plan cache (see 〈6. Fixing stale plan cache Problems when Indexes Already Exist in MongoDB〉) or use hint
- Correct index → Continue to next check
Third: Check examination ratio - is the index selective?
- High ratio with correct index → Index has poor selectivity, need compound index
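
The three checks can be sketched as a single function over an explain-shaped object. This is a simplified illustration: `diagnose` and its return labels are hypothetical, the threshold of 10 comes from the red-flag heading in section 3.6.1, and plans with multiple branches are not handled.

```javascript
// Hypothetical helper that walks the three checks in order.
// `expectedIndexName` and `maxRatio` are inputs we choose ourselves.
function diagnose(explain, expectedIndexName, maxRatio = 10) {
  const stats = explain.executionStats;

  // Follow inputStage links down to the stage that actually reads data.
  let leaf = stats.executionStages;
  while (leaf.inputStage) leaf = leaf.inputStage;

  // First: collection scan or index scan?
  if (leaf.stage === "COLLSCAN") return "create-index";

  // Second: is it the index we expected?
  if (leaf.indexName !== expectedIndexName) return "clear-plan-cache-or-hint";

  // Third: is the index selective enough?
  const ratio =
    stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : Infinity;
  return ratio > maxRatio ? "poor-selectivity" : "ok";
}

// Illustrative inputs modeled on the examples in section 3.6.
const collscanCase = {
  executionStats: {
    nReturned: 100,
    totalDocsExamined: 50000,
    executionStages: { stage: "COLLSCAN" },
  },
};
const wrongIndexCase = {
  executionStats: {
    nReturned: 10,
    totalDocsExamined: 50000,
    executionStages: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "timestamp_1" },
    },
  },
};

console.log(diagnose(collscanCase, "userId_1_timestamp_1"));
console.log(diagnose(wrongIndexCase, "userId_1_timestamp_1"));
```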

4. MongoDB Query Examples: From Optimal to Problematic

Understanding what makes queries slow helps identify when statistics might be stale (see 〈3.6. Red flags indicating stale statistics or poor index usage〉). The query patterns below commonly suffer from performance issues, and all of them run against the following real database:

Document Schema.

{
  _id: ObjectId,
  messagesSessionId: String,
  success: Boolean,
  result: [  // Large array with 70+ subdocuments
    {
      issueId: String,
      summaryUuid: String,
      groupID: Number,
      lang: String,
      title: String,
      summary: String,
      originalScripts: Array,
      imgUrls: Array,
      assignee: Array,
      topic: Array,
      location: Array,
      // ... many more fields
    }
  ]
}

4.1. Baseline: Optimal Query Performance

Before examining slow queries, let's see what optimal performance looks like:

Query:

db.llmsummaries.find({ _id: ObjectId("68abc7447d87776535cb4d04") })
  .explain("executionStats")

Key findings from the output.

Execution Statistics.

executionStats: {
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 0,
  totalKeysExamined: 1,
  totalDocsExamined: 1
}

Performance Analysis.

Metric	Value	Notes
Efficiency ratio	`1 / 1 = 1.0x`	Perfect!
Execution time	0ms	Sub-millisecond
Index usage	`EXPRESS_IXSCAN` on `_id_` index	—
Plan caching	`isCached: false`	Simple `_id` lookups don't require caching

Query Plan.

winningPlan: {
  isCached: false,
  stage: 'EXPRESS_IXSCAN',        // Optimized index scan (MongoDB 8.0+)
  keyPattern: '{ _id: 1 }',
  indexName: '_id_'
}
rejectedPlans: []                  // No alternative plans considered

Resource Usage.

operationMetrics: {
  docBytesRead: 68251,            // ~68KB document size
  idxEntryBytesRead: 14,          // Only 14 bytes read from index
  cpuNanos: 155102                // ~155 microseconds CPU time
}

What makes this optimal:

Perfect index utilization. Index lookup required only 14 bytes, found exact document pointer immediately
Optimal execution path. No rejected plans, no caching overhead, zero optimization time
No stale statistics concerns. Direct _id lookup always uses primary index, no cardinality estimation needed

This is what we aim for because:

Examination ratio of 1.0x (perfect selectivity)
Sub-millisecond execution
Appropriate index usage
No wasted scans

Now let's contrast this with problematic query patterns:

4.2. Potentially Slow Queries

All examples below have performance issues that require optimization. For solutions, see 〈5. Performance Improvement Strategies〉.

4.2.1. Example 1: Scanning Nested Arrays Without Indexes

// Query: Find all documents where any result has a specific assignee
db.llmsummaries.find({
  "result.assignee.userId": "5febb417-3f38-4894-8388-6f79585a5b72"
}).explain("executionStats")

Actual execution statistics:

{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'BillieStorage-DEV.llmsummaries',
    parsedQuery: { 'result.assignee.userId': { '$eq': '5febb417-3f38-4894-8388-6f79585a5b72' } },
    indexFilterSet: false,
    winningPlan: {
      isCached: false,
      stage: 'COLLSCAN',
      filter: { 'result.assignee.userId': { '$eq': '5febb417-3f38-4894-8388-6f79585a5b72' } },
      direction: 'forward'
    },
    rejectedPlans: []
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 3,
    executionTimeMillis: 148,
    totalKeysExamined: 0,
    totalDocsExamined: 6319,
    executionStages: {
      isCached: false,
      stage: 'COLLSCAN',
      filter: { 'result.assignee.userId': { '$eq': '5febb417-3f38-4894-8388-6f79585a5b72' } },
      nReturned: 3,
      executionTimeMillisEstimate: 53,
      works: 6320,
      advanced: 3,
      needTime: 6316,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      direction: 'forward',
      docsExamined: 6319
    },
    operationMetrics: {
      docBytesRead: 22236748,
      docUnitsRead: 176868,
      cpuNanos: 25932069
    }
  },
  command: {
    find: 'llmsummaries',
    filter: { 'result.assignee.userId': '5febb417-3f38-4894-8388-6f79585a5b72' },
    '$db': 'BillieStorage-DEV'
  },
  serverInfo: {
    host: 'ac-i1i65op-shard-00-01.50i0ong.mongodb.net',
    port: 27017,
    version: '8.0.19',
    gitVersion: 'cc1adb6b0875cc3003854ac2489818e459686f0b'
  },
  ok: 1
}

Performance analysis:

Metric	Value	Impact
Efficiency ratio	6319 / 3 = 2,106x inefficiency!	Extremely wasteful
Collection scan	Examined every single document	6,319 documents
Index usage	`totalKeysExamined: 0`	No index helped
Resource waste	Read 22MB of data	To return 3 matching documents
Nested array overhead	Must scan `result` array	In each document

Why this is slow:

MongoDB must open and examine all 6,319 documents
For each document, traverse the nested result array
Check each assignee array element within each result
No index on result.assignee.userId to shortcut this process
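
The traversal described above can be sketched in plain JavaScript. This is a simplified model of dotted-path matching (it ignores numeric path components and other special cases), and `valuesAtPath` and `matchesEq` are hypothetical helpers with made-up sample data.

```javascript
// Collect every value reachable through a dotted path, descending into
// arrays at each level (a simplification of MongoDB's path semantics).
function valuesAtPath(value, pathParts) {
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, pathParts));
  }
  if (pathParts.length === 0) return [value];
  if (value === null || typeof value !== "object") return [];
  const [head, ...rest] = pathParts;
  return head in value ? valuesAtPath(value[head], rest) : [];
}

function matchesEq(doc, path, expected) {
  return valuesAtPath(doc, path.split(".")).includes(expected);
}

// Made-up documents that follow the schema's nesting.
const sampleDocs = [
  { _id: 1, result: [{ assignee: [{ userId: "u-1" }] }, { assignee: [{ userId: "u-2" }] }] },
  { _id: 2, result: [{ assignee: [] }] },
  { _id: 3, result: [{ assignee: [{ userId: "u-2" }, { userId: "u-3" }] }] },
];

// Without an index, every document has to be opened and traversed.
let docsExamined = 0;
const matched = sampleDocs.filter((doc) => {
  docsExamined += 1;
  return matchesEq(doc, "result.assignee.userId", "u-2");
});

console.log({ docsExamined, matchedIds: matched.map((d) => d._id) });
```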

4.2.2. Example 2: Complex Text Search in Nested Documents

// Query: Search for keywords in summaries within result array
db.llmsummaries.find({
  "result.summary": { $regex: /floor height|elevation/i }
}).explain("executionStats")

Actual execution statistics:

{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'BillieStorage-DEV.llmsummaries',
    parsedQuery: { 'result.summary': { '$regex': 'floor height|elevation', '$options': 'i' } },
    indexFilterSet: false,
    winningPlan: {
      isCached: false,
      stage: 'COLLSCAN',
      filter: { 'result.summary': { '$regex': 'floor height|elevation', '$options': 'i' } },
      direction: 'forward'
    },
    rejectedPlans: []
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 6,
    executionTimeMillis: 24,
    totalKeysExamined: 0,
    totalDocsExamined: 6319,
    executionStages: {
      isCached: false,
      stage: 'COLLSCAN',
      filter: { 'result.summary': { '$regex': 'floor height|elevation', '$options': 'i' } },
      nReturned: 6,
      executionTimeMillisEstimate: 7,
      works: 6320,
      advanced: 6,
      needTime: 6313,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      direction: 'forward',
      docsExamined: 6319
    },
    operationMetrics: {
      docBytesRead: 22236748,
      docUnitsRead: 176868,
      cpuNanos: 24605620
    }
  },
  command: {
    find: 'llmsummaries',
    filter: { 'result.summary': { '$regex': 'floor height|elevation', '$options': 'i' } },
    '$db': 'BillieStorage-DEV'
  },
  serverInfo: {
    host: 'ac-i1i65op-shard-00-02.50i0ong.mongodb.net',
    port: 27017,
    version: '8.0.19',
    gitVersion: 'cc1adb6b0875cc3003854ac2489818e459686f0b'
  },
  ok: 1
}

Performance analysis:

Metric	Value	Impact
Efficiency ratio	6319 / 6 = 1,053x inefficiency!	Highly wasteful
Collection scan	Full collection scan	Despite only 6 matches
Regex overhead	Case-insensitive pattern matching (`/i` flag)	On every document
Nested array scanning	Must check `result.summary`	In every array element
No text index	Cannot leverage inverted index	For text search

Why this is particularly slow:

Case-insensitive regex (/i) prevents simple string comparison
MongoDB evaluates regex pattern against every summary field in every result array
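With up to ~70 result elements per document, the worst case is 6,319 docs × 70 elements = 442,330 string matches (Example 4 measures about 2 elements per document on average, so the typical count is lower)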
Pattern has alternation (|) requiring two substring checks per field
No short-circuit: must scan entire collection even after finding matches

4.2.3. Example 3: Multiple Array Element Matching

// Query: Find sessions with specific topic AND location
db.llmsummaries.find({
  "result": {
    $elemMatch: {
      "topic.name": "FLOORING",
      "location.detail": "2F"
    }
  }
}).explain("executionStats")

Actual execution statistics:

{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'BillieStorage-DEV.llmsummaries',
    parsedQuery: {
      result: {
        '$elemMatch': {
          '$and': [
            { 'location.detail': { '$eq': '2F' } },
            { 'topic.name': { '$eq': 'FLOORING' } }
          ]
        }
      }
    },
    indexFilterSet: false,
    winningPlan: {
      isCached: false,
      stage: 'COLLSCAN',
      filter: {
        result: {
          '$elemMatch': {
            '$and': [
              { 'location.detail': { '$eq': '2F' } },
              { 'topic.name': { '$eq': 'FLOORING' } }
            ]
          }
        }
      },
      direction: 'forward'
    },
    rejectedPlans: []
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 0,
    executionTimeMillis: 10,
    totalKeysExamined: 0,
    totalDocsExamined: 6319,
    executionStages: {
      isCached: false,
      stage: 'COLLSCAN',
      filter: {
        result: {
          '$elemMatch': {
            '$and': [
              { 'location.detail': { '$eq': '2F' } },
              { 'topic.name': { '$eq': 'FLOORING' } }
            ]
          }
        }
      },
      nReturned: 0,
      executionTimeMillisEstimate: 4,
      works: 6320,
      advanced: 0,
      needTime: 6319,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      direction: 'forward',
      docsExamined: 6319
    },
    operationMetrics: {
      docBytesRead: 22236748,
      docUnitsRead: 176868,
      cpuNanos: 10192977
    }
  },
  queryShapeHash: '7AFB5DAD83629CECD2045DED8612797324E0D3DEC377A9369D11FD6CD0D3C81E',
  command: {
    find: 'llmsummaries',
    filter: {
      result: {
        '$elemMatch': {
          'topic.name': 'FLOORING',
          'location.detail': '2F'
        }
      }
    },
    '$db': 'BillieStorage-DEV'
  },
  serverInfo: {
    host: 'ac-i1i65op-shard-00-02.50i0ong.mongodb.net',
    port: 27017,
    version: '8.0.19',
    gitVersion: 'cc1adb6b0875cc3003854ac2489818e459686f0b'
  },
  ok: 1
}

Performance analysis:

Metric	Value	Impact
Efficiency ratio	Undefined (0 returned)	Examined all 6,319 documents unnecessarily
Collection scan	Cannot short-circuit	Even though no matches exist
Index usage	`totalKeysExamined: 0`	No compound index available
$elemMatch complexity	Must evaluate AND condition	On nested arrays for every document
Resource waste	Read 22MB	To return zero results

Why this is particularly inefficient:

Compound condition overhead: MongoDB evaluates both topic.name = "FLOORING" AND location.detail = "2F" for each result element
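Nested array scanning: With up to ~70 result elements per document, the worst case is 6,319 docs × 70 = 442,330 array element checks (Example 4 measures about 2 elements per document on average, so the typical count is lower)
No early termination: Even though no matches are found, MongoDB must scan the entire collection to confirm that
Nested arrays: Both topic and location are arrays inside the result array, so each check descends through two levels of arrays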
$elemMatch semantics: Requires finding single array element matching ALL conditions - more complex than separate filters
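
The difference between $elemMatch and separate dotted-path filters can be shown with a small plain-JavaScript model. This is a sketch of the matching semantics only, with made-up data. `hasTopic` and `hasLocation` are hypothetical helpers.

```javascript
// One made-up session: FLOORING and 2F both appear, but in different elements.
const session = {
  result: [
    { topic: [{ name: "FLOORING" }], location: [{ detail: "1F" }] },
    { topic: [{ name: "PAINT" }], location: [{ detail: "2F" }] },
  ],
};

const hasTopic = (r, name) => r.topic.some((t) => t.name === name);
const hasLocation = (r, detail) => r.location.some((l) => l.detail === detail);

// Separate filters ("result.topic.name" and "result.location.detail"):
// each condition may be satisfied by a different result element.
const separateFilters =
  session.result.some((r) => hasTopic(r, "FLOORING")) &&
  session.result.some((r) => hasLocation(r, "2F"));

// $elemMatch-style: a single result element must satisfy both conditions.
const elemMatch = session.result.some(
  (r) => hasTopic(r, "FLOORING") && hasLocation(r, "2F")
);

console.log({ separateFilters, elemMatch });
```

Here the separate filters match the session while the $elemMatch-style check does not, which is why $elemMatch has to evaluate both conditions together on every element.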

4.2.4. Example 4: Aggregation Counting Array Elements

// Query: Count total issues by groupID across all sessions
db.llmsummaries.aggregate([
  { $unwind: "$result" },
  { $group: { _id: "$result.groupID", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])

Actual execution statistics:

// Performance measurement results:
Execution time: 638ms
Number of unique groups: 124
Total unwound documents: 12920
Input documents: 6319
Expansion factor: 2.04x (12920 / 6319)

Pipeline stages:

$unwind: Expands 6,319 documents into 12,920 documents (2.04x expansion)
$group: Aggregates 12,920 unwound documents into 124 unique groups
$sort: Sorts 124 groups in memory by count descending

Performance analysis:

Metric	Value	Impact
Document expansion	2.04x more than input	12,920 unwound from 6,319 input
Array size	~2 elements on average	Per `result` array
Execution time	638ms	For processing 12,920 unwound documents
Memory usage	Memory-intensive operation	All grouping and sorting in RAM
Memory limits	Without `allowDiskUse: true`	Large result sets could exceed limits

Why this is slow:

No index can help: a pipeline that starts with $unwind must perform a full collection scan (COLLSCAN) to read every document
Array unwinding creates an intermediate result set about 2x larger than the input
Grouping and sorting hold their working set in memory
For larger arrays or collections, execution time grows roughly linearly with the total number of unwound documents
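
To make the expansion factor concrete, here is a minimal in-memory sketch of what the three stages do. It runs in plain JavaScript against a tiny hypothetical sample, not against the real collection; the `sessions` data and the helper name `countByGroup` are assumptions made for illustration only.

```javascript
// Hypothetical sample shaped like llmsummaries documents (illustration only).
const sessions = [
  { _id: 1, result: [{ groupID: "g1" }, { groupID: "g2" }] },
  { _id: 2, result: [{ groupID: "g1" }] },
  { _id: 3, result: [] }, // $unwind drops documents with an empty array by default
];

function countByGroup(docs) {
  // $unwind: one intermediate row per array element
  const unwound = docs.flatMap((doc) =>
    (doc.result || []).map((element) => ({ _id: doc._id, result: element }))
  );
  // $group: count rows per groupID
  const counts = new Map();
  for (const row of unwound) {
    counts.set(row.result.groupID, (counts.get(row.result.groupID) || 0) + 1);
  }
  // $sort: descending by count
  const groups = [...counts.entries()]
    .map(([id, count]) => ({ _id: id, count }))
    .sort((a, b) => b.count - a.count);
  return {
    unwoundCount: unwound.length,
    expansion: unwound.length / docs.length,
    groups,
  };
}

const summary = countByGroup(sessions);
console.log(summary.unwoundCount, summary.expansion, summary.groups);
```

The intermediate `unwound` array is the part that grows with array size: every input document contributes one row per `result` element before any grouping can start.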

4.2.5.
Example 5: Date Range Query on Nested Timestamps

// Query: Find sessions with results between date range
db.llmsummaries.find({
  "result.issueCreatedAt": {
    $gte: NumberLong("1756086000000"),
    $lte: NumberLong("1756090000000")
  }
}).explain("executionStats")

Actual execution statistics:

{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'BillieStorage-DEV.llmsummaries',
    parsedQuery: {
      '$and': [
        {
          'result.issueCreatedAt': {
            '$lte': 1756090000000
          }
        },
        {
          'result.issueCreatedAt': {
            '$gte': 1756086000000
          }
        }
      ]
    },
    indexFilterSet: false,
    queryHash: '974F7C6A',
    planCacheShapeHash: '974F7C6A',
    planCacheKey: 'F1AE37C2',
    optimizationTimeMillis: 0,
    maxIndexedOrSolutionsReached: false,
    maxIndexedAndSolutionsReached: false,
    maxScansToExplodeReached: false,
    prunedSimilarIndexes: false,
    winningPlan: {
      isCached: false,
      stage: 'COLLSCAN',
      filter: {
        '$and': [
          {
            'result.issueCreatedAt': {
              '$lte': 1756090000000
            }
          },
          {
            'result.issueCreatedAt': {
              '$gte': 1756086000000
            }
          }
        ]
      },
      direction: 'forward'
    },
    rejectedPlans: []
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 1,
    executionTimeMillis: 10,
    totalKeysExamined: 0,
    totalDocsExamined: 6319,
    executionStages: {
      isCached: false,
      stage: 'COLLSCAN',
      filter: {
        '$and': [
          {
            'result.issueCreatedAt': {
              '$lte': 1756090000000
            }
          },
          {
            'result.issueCreatedAt': {
              '$gte': 1756086000000
            }
          }
        ]
      },
      nReturned: 1,
      executionTimeMillisEstimate: 10,
      works: 6320,
      advanced: 1,
      needTime: 6318,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      direction: 'forward',
      docsExamined: 6319
    },
    operationMetrics: {
      docBytesRead: 22236748,
      docUnitsRead: 176868,
      cpuNanos: 10846482
    }
  },
  queryShapeHash: '0E0E049F234A4E643217E2AFF74105762F0747D592C87669F9E1CBDB47FB24F0',
  command: {
    find: 'llmsummaries',
    filter: {
      'result.issueCreatedAt': {
        '$gte': 1756086000000,
        '$lte': 1756090000000
      }
    },
    '$db': 'BillieStorage-DEV'
  },
  serverInfo: {
    host: 'ac-i1i65op-shard-00-02.50i0ong.mongodb.net',
    port: 27017,
    version: '8.0.19',
    gitVersion: 'cc1adb6b0875cc3003854ac2489818e459686f0b'
  },
  ok: 1
}

Performance analysis:

Metric	Value	Impact
Efficiency ratio	6319 / 1 = 6,319x inefficiency!	Worst efficiency among all examples
Collection scan	Examined every document	For single match
Index usage	`totalKeysExamined: 0`	No index on date field
Range query overhead	Must evaluate `$gte` AND `$lte`	On nested arrays
Resource waste	Read 22MB	To return 1 matching document

Why this is extremely inefficient:

Date range comparisons: MongoDB evaluates both conditions ($gte and $lte) against every nested issueCreatedAt value
Nested array scanning: every result element in all 6,319 documents is compared (12,920 elements according to Example 4's $unwind measurement)
NumberLong values: timestamps are stored as 64-bit integers (epoch milliseconds); each comparison is cheap, but it is repeated for every element
No temporal index: without an index on the field, MongoDB cannot use a B-tree range scan to locate documents within the time window
Worst efficiency ratio: 6,319x is the highest among all examples, since the entire collection is scanned for 1 result
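
One detail of this query is easy to miss: when a range is written against a dotted path into an array, MongoDB allows each bound to be satisfied by a different array element. The sketch below reproduces both matching rules in plain JavaScript so the difference is visible; the function names and the `straddling` sample document are hypothetical, and timestamps are assumed to be plain numbers.

```javascript
const windowStart = 1756086000000;
const windowEnd = 1756090000000;

// Dotted-path semantics: each bound may be satisfied by a different element.
function matchesDottedPath(doc) {
  const times = (doc.result || []).map((r) => r.issueCreatedAt);
  return (
    times.some((t) => t >= windowStart) && times.some((t) => t <= windowEnd)
  );
}

// $elemMatch semantics: one single element must satisfy both bounds.
function matchesElemMatch(doc) {
  return (doc.result || []).some(
    (r) => r.issueCreatedAt >= windowStart && r.issueCreatedAt <= windowEnd
  );
}

// Hypothetical session: one issue before the window, one after, none inside.
const straddling = {
  result: [
    { issueCreatedAt: 1756000000000 },
    { issueCreatedAt: 1756100000000 },
  ],
};

console.log(matchesDottedPath(straddling)); // true
console.log(matchesElemMatch(straddling)); // false
```

If the intent is "at least one issue created inside the window", wrapping the range in `result: { $elemMatch: { issueCreatedAt: { $gte: ..., $lte: ... } } }` expresses that, and it also lets a multikey index on the field apply both bounds to the scan.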

4.2.6.
Example 6: Compound Query with OR Conditions

// Query: Find urgent OR negative sentiment issues
db.llmsummaries.find({
  $or: [
    { "result.priority.name": "URGENT" },
    { "result.sentiment.name": "NEGATIVE" }
  ]
}).explain("executionStats")

Actual execution statistics:

{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'BillieStorage-DEV.llmsummaries',
    parsedQuery: {
      '$or': [
        { 'result.priority.name': { '$eq': 'URGENT' } },
        { 'result.sentiment.name': { '$eq': 'NEGATIVE' } }
      ]
    },
    indexFilterSet: false,
    queryHash: 'A51BF552',
    planCacheShapeHash: 'A51BF552',
    planCacheKey: '85972E1A',
    optimizationTimeMillis: 0,
    maxIndexedOrSolutionsReached: false,
    maxIndexedAndSolutionsReached: false,
    maxScansToExplodeReached: false,
    prunedSimilarIndexes: false,
    winningPlan: {
      isCached: false,
      stage: 'SUBPLAN',
      inputStage: {
        stage: 'COLLSCAN',
        filter: {
          '$or': [
            { 'result.priority.name': { '$eq': 'URGENT' } },
            { 'result.sentiment.name': { '$eq': 'NEGATIVE' } }
          ]
        },
        direction: 'forward'
      }
    },
    rejectedPlans: []
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 1243,
    executionTimeMillis: 15,
    totalKeysExamined: 0,
    totalDocsExamined: 6319,
    executionStages: {
      isCached: false,
      stage: 'SUBPLAN',
      nReturned: 1243,
      executionTimeMillisEstimate: 7,
      works: 6320,
      advanced: 1243,
      needTime: 5076,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      inputStage: {
        stage: 'COLLSCAN',
        filter: {
          '$or': [
            { 'result.priority.name': { '$eq': 'URGENT' } },
            { 'result.sentiment.name': { '$eq': 'NEGATIVE' } }
          ]
        },
        nReturned: 1243,
        executionTimeMillisEstimate: 7,
        works: 6320,
        advanced: 1243,
        needTime: 5076,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 1,
        direction: 'forward',
        docsExamined: 6319
      }
    },
    operationMetrics: {
      docBytesRead: 22236748,
      docUnitsRead: 176868,
      cpuNanos: 15083626
    }
  },
  queryShapeHash: '06F16492B28860305AE761F7D4D06584BC6F6A47C7A55DBB3C9BAC7CFFE8A8D6',
  command: {
    find: 'llmsummaries',
    filter: {
      '$or': [
        { 'result.priority.name': 'URGENT' },
        { 'result.sentiment.name': 'NEGATIVE' }
      ]
    },
    '$db': 'BillieStorage-DEV'
  },
  serverInfo: {
    host: 'ac-i1i65op-shard-00-02.50i0ong.mongodb.net',
    port: 27017,
    version: '8.0.19',
    gitVersion: 'cc1adb6b0875cc3003854ac2489818e459686f0b'
  },
  ok: 1
}

Performance analysis:

Metric	Value	Impact
Efficiency ratio	6319 / 1243 = 5.08x inefficiency	Better than other examples
Selectivity	Returns 1,243 documents (19.7%)	Best match rate among all examples!
Scan type	`SUBPLAN` stage with `COLLSCAN`	No indexes used
Execution time	Only 15ms	Despite scanning entire collection
Index usage	`totalKeysExamined: 0`	No indexes on either field

Why this has better efficiency than other examples:

Higher match rate: OR condition is less selective - many documents match at least one condition
19.7% hit rate: Returns 1,243 out of 6,319 documents examined
Less wasted work: Compared to Example 5 (99.98% waste), this only wastes ~80% of examined data
Fast per-document: 15ms / 1243 docs = ~0.012ms per returned document

Why it still needs optimization:

SUBPLAN with COLLSCAN: MongoDB plans each $or branch separately (SUBPLAN), but with no indexes available it still scans all documents
Limited short-circuiting: a document that fails the first $or clause must still be checked against the second
Nested array overhead: both fields are checked across the result elements of all 6,319 documents (up to 12,920 elements × 2 fields, about 25,840 field checks)
No index union: without an index on each field, MongoDB cannot combine (union) the results of separate index scans

4.2.7.
Signs these queries need attention

explain() shows COLLSCAN instead of IXSCAN (or SUBPLAN with COLLSCAN for OR queries)
totalDocsExamined / nReturned > 100 for queries that return few documents (Example 1: 2106x, Example 2: 1053x, Example 5: 6319x!)
Even a comparatively good ratio can be improved (Example 6: 5.08x with 1,243 results, but still scans all docs)
executionTimeMillis > 100ms for queries returning <10 documents
Aggregation pipelines taking > 500ms for moderate data (Example 4: 638ms for 6,319 docs)
totalKeysExamined: 0 means no index was used at all (all five find() examples!)
docBytesRead is disproportionately large (every find() example read 22MB)
High cpuNanos relative to actual work done
Examining entire collection even when returning zero results (Example 3)
Extremely high efficiency ratios for single-document results (Example 5: 6319x)
Expansion in aggregations (Example 4: 2.04x with $unwind), which grows with array size
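
Several of these signs can be checked mechanically from an `explain("executionStats")` result. The following is a rough sketch under stated assumptions: it expects the classic explain layout shown in the outputs above (`explainVersion: '1'`), the helper names `stageNames` and `explainWarnings` are hypothetical, and the default threshold of 100 simply mirrors the rule of thumb in the list.

```javascript
// Collect every stage name in a plan tree (classic explain layout).
function stageNames(stage, names = []) {
  if (!stage) return names;
  names.push(stage.stage);
  if (stage.inputStage) stageNames(stage.inputStage, names);
  for (const child of stage.inputStages || []) stageNames(child, names);
  return names;
}

function explainWarnings(explain, { maxRatio = 100 } = {}) {
  const stats = explain.executionStats;
  const warnings = [];
  if (stageNames(stats.executionStages).includes("COLLSCAN")) {
    warnings.push("COLLSCAN");
  }
  if (stats.totalKeysExamined === 0 && stats.totalDocsExamined > 0) {
    warnings.push("no index keys examined");
  }
  if (stats.nReturned === 0 && stats.totalDocsExamined > 0) {
    warnings.push("scanned documents but returned none");
  } else if (
    stats.nReturned > 0 &&
    stats.totalDocsExamined / stats.nReturned > maxRatio
  ) {
    warnings.push("examined/returned ratio above threshold");
  }
  return warnings;
}

// Trimmed copy of the Example 5 statistics shown earlier.
const example5 = {
  executionStats: {
    nReturned: 1,
    totalKeysExamined: 0,
    totalDocsExamined: 6319,
    executionStages: { stage: "COLLSCAN" },
  },
};
console.log(explainWarnings(example5));
```

In mongosh the same function can be fed the live result of `db.llmsummaries.find(...).explain("executionStats")` instead of a trimmed copy.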

5. Performance Improvement Strategies

This section focuses on creating new indexes for queries that are missing them or have poorly designed indexes.

When to use these strategies:

Queries showing COLLSCAN (collection scan)
No suitable indexes exist for common query patterns
Existing indexes don't match query patterns (e.g., compound index field order)
High totalDocsExamined / nReturned ratio indicating missing indexes

5.1.
Index Design

5.1.1.
For the nested array query

For 〈4.2.1. Example 1: Scanning Nested Arrays Without Indexes〉:

// Create multikey index on nested array field
db.llmsummaries.createIndex({
  "result.assignee.userId": 1
})

// After index creation, the query should show:
// - stage: 'IXSCAN' instead of 'COLLSCAN'
// - totalKeysExamined: ~3 (only matching index keys)
// - totalDocsExamined: 3 (exact matches only)
// - executionTimeMillis: <5ms (instead of 148ms)
// - Efficiency ratio: 1.0x (instead of 2106x)

Expected improvement:

Before index: 148ms, examined 6,319 docs, read 22MB
After index: <5ms, examined 3 docs, read ~40KB
Speedup: 30-50x faster
Resource savings: 99.95% less data read

5.1.2.
For compound queries on arrays

// Create compound index for multiple common filters
db.llmsummaries.createIndex({
  "result.assignee.userId": 1,
  "result.groupID": 1
})

// Or denormalize if queries are critical
// Move frequently-queried array elements to root level

5.1.3.
For text searches

For 〈4.2.2. Example 2: Complex Text Search in Nested Documents〉:

// Create text index on summary fields (a collection can have only one text index)
db.llmsummaries.createIndex({
  "result.summary": "text",
  "result.title": "text"
})

// Query with text index (replaces regex)
// Note: $text matches documents containing ANY of the stemmed terms,
// which is broader than the regex /floor height|elevation/i
db.llmsummaries.find({
  $text: { $search: "floor height elevation" }
})

// After text index creation, the query should show:
// - a text stage (TEXT_MATCH on recent versions, TEXT on older ones) instead of 'COLLSCAN'
// - Uses inverted index for efficient word matching
// - No case-sensitivity overhead (text indexes are case-insensitive by default)
// - executionTimeMillis: <10ms (instead of 24ms)
// - Efficiency ratio: Much closer to 1.0x

Expected improvement:

Before text index: 24ms, examined 6,319 docs, regex evaluated against every result element
After text index: <10ms, examined only matching docs, efficient word lookup
Speedup: 2-5x faster
Note: Text indexes work with $text operator, not $regex

Alternative for regex patterns:

// If regex is required, consider filtering first
db.llmsummaries.find({
  success: true,  // Filter on indexed field first (assumes an index on success)
  "result.summary": { $regex: /floor height|elevation/i }
})

// Or store commonly searched terms separately
db.llmsummaries.createIndex({ "result.keywords": 1 })

5.1.4.
For `$elemMatch` compound queries

For 〈4.2.3. Example 3: Multiple Array Element Matching〉:

// Option 1: Create compound multikey index
// Caveat: MongoDB cannot index parallel arrays. If topic and location are both
// arrays inside each result element, index creation (or a later insert) fails
// with a "cannot index parallel arrays" error; use the denormalized alternative.
db.llmsummaries.createIndex({
  "result.topic.name": 1,
  "result.location.detail": 1
})

// Query remains the same - optimizer can use the compound index
db.llmsummaries.find({
  "result": {
    $elemMatch: {
      "topic.name": "FLOORING",
      "location.detail": "2F"
    }
  }
})

// After compound index creation, the query should show:
// - stage: 'IXSCAN' instead of 'COLLSCAN'
// - totalKeysExamined: varies based on selectivity
// - executionTimeMillis: <5ms (instead of 10ms)
// - Can short-circuit when no matches exist

Expected improvement:

Before index: 10ms, examined all 6,319 docs, read 22MB, 0 results
After compound index: <5ms, examined only candidate docs, early termination
Speedup: 2-3x faster even for non-matching queries
Resource savings: 95%+ less data read when matches exist

Alternative: Denormalize for critical queries

// If this query pattern is frequent, denormalize to root level
// Add aggregated fields during document creation:
{
  _id: ObjectId,
  messagesSessionId: String,
  hasFlooring2F: Boolean,  // Pre-computed flag
  topicLocations: [        // Flattened combinations
    { topic: "FLOORING", location: "2F" }
  ],
  result: [ /* original nested data */ ]
}

// Query becomes simple and fast:
db.llmsummaries.find({ hasFlooring2F: true })
// Or with index on topicLocations:
db.llmsummaries.find({
  topicLocations: { $elemMatch: { topic: "FLOORING", location: "2F" } }
})

5.1.5.
For date range queries

For 〈4.2.5. Example 5: Date Range Query on Nested Timestamps〉:

// Create index on nested date field
db.llmsummaries.createIndex({
  "result.issueCreatedAt": 1
})

// Query remains the same - optimizer will use index for range scan
// (on a multikey index only one of the two bounds limits the scan
// unless the range is wrapped in $elemMatch)
db.llmsummaries.find({
  "result.issueCreatedAt": {
    $gte: NumberLong("1756086000000"),
    $lte: NumberLong("1756090000000")
  }
})

// After index creation, the query should show:
// - stage: 'IXSCAN' instead of 'COLLSCAN'
// - Uses B-tree range scan to quickly locate matching documents
// - executionTimeMillis: <5ms (instead of 10ms)
// - Efficiency ratio: Much closer to 1.0x (instead of 6319x!)

Expected improvement:

Before index: 10ms, examined 6,319 docs, read 22MB, 1 result
After index: <5ms, examined only docs within date range, minimal reads
Speedup: 2-5x faster for typical date range queries
Resource savings: 99%+ less data scanned

Alternative: Denormalize for frequent date queries

// If date range queries are critical, add indexed date field at root level
{
  _id: ObjectId,
  messagesSessionId: String,
  earliestIssueDate: NumberLong,   // Min date from result array
  latestIssueDate: NumberLong,     // Max date from result array
  result: [ /* original nested data with issueCreatedAt */ ]
}

// Create simple index (faster than multikey index)
db.llmsummaries.createIndex({
  earliestIssueDate: 1,
  latestIssueDate: 1
})

// Query becomes more efficient.
// This overlap test matches the same sessions as the original dotted-path query:
// sessions whose issue dates overlap or straddle the window.
db.llmsummaries.find({
  earliestIssueDate: { $lte: NumberLong("1756090000000") },
  latestIssueDate: { $gte: NumberLong("1756086000000") }
})
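
The two root-level fields have to be computed whenever a document is written. A minimal sketch of that step follows, assuming timestamps arrive as plain JavaScript numbers (epoch milliseconds) and that documents with no issues should store `null`; the helper name `withIssueDateBounds` is hypothetical.

```javascript
// Derive earliestIssueDate / latestIssueDate from the nested result array.
function withIssueDateBounds(doc) {
  const times = (doc.result || [])
    .map((r) => r.issueCreatedAt)
    .filter((t) => typeof t === "number");
  if (times.length === 0) {
    return { ...doc, earliestIssueDate: null, latestIssueDate: null };
  }
  return {
    ...doc,
    earliestIssueDate: Math.min(...times),
    latestIssueDate: Math.max(...times),
  };
}

const enriched = withIssueDateBounds({
  messagesSessionId: "session-1", // hypothetical value
  result: [
    { issueCreatedAt: 1756086500000 },
    { issueCreatedAt: 1756000000000 },
  ],
});
console.log(enriched.earliestIssueDate, enriched.latestIssueDate);
```

The bounds must be recomputed on every update that adds, removes, or edits a `result` element, which is the update-complexity cost of this denormalization.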

5.1.6.
For aggregation with `$unwind`

For 〈4.2.4. Example 4: Aggregation Counting Array Elements〉:

// Option 1: Add $match before $unwind to reduce documents processed
// (relevantSessions is a placeholder array of session IDs defined by the caller)
db.llmsummaries.aggregate([
  { $match: { success: true, messagesSessionId: { $in: relevantSessions } } },  // Filter early
  { $project: { result: 1 } },     // Project only needed fields
  { $unwind: "$result" },
  { $group: { _id: "$result.groupID", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
], { allowDiskUse: true })

// Option 2: Use $facet to compute several results in one pass over the collection
// (this does not make $unwind itself cheaper)
db.llmsummaries.aggregate([
  {
    $facet: {
      groupCounts: [
        { $unwind: "$result" },
        { $group: { _id: "$result.groupID", count: { $sum: 1 } } },
        { $sort: { count: -1 } }
      ],
      totalDocs: [
        { $count: "total" }
      ]
    }
  }
])

Expected improvement:

Current performance: 638ms, 6,319 input → 12,920 unwound (2.04x expansion)
With $match filter: Reduces input document set, proportionally faster
With allowDiskUse: Prevents memory errors for larger result sets
Note: Indexes cannot optimize $unwind operations, but can speed up initial $match

Alternative: Pre-aggregate during data ingestion

// If groupID counts are critical, maintain them at document level
{
  _id: ObjectId,
  messagesSessionId: String,
  groupCounts: {           // Pre-computed aggregation
    "group1": 3,
    "group2": 5
  },
  result: [ /* original nested data */ ]
}

// Query becomes a simple find + aggregation
db.llmsummaries.aggregate([
  { $project: { groupCounts: { $objectToArray: "$groupCounts" } } },
  { $unwind: "$groupCounts" },
  { $group: {
      _id: "$groupCounts.k",
      totalCount: { $sum: "$groupCounts.v" }
  } },
  { $sort: { totalCount: -1 } }
])

// Or define a standard view for convenience
// (a standard view re-runs its pipeline on every read, so it is not faster)
db.createView(
  "groupCountsView",
  "llmsummaries",
  [
    { $unwind: "$result" },
    { $group: { _id: "$result.groupID", count: { $sum: 1 } } },
    { $sort: { count: -1 } }
  ]
)

// For an on-demand materialized view, run the same $unwind/$group pipeline
// with a final $merge stage that writes into a separate collection,
// and re-run it on a schedule for incremental updates

Expected improvement with pre-aggregation:

Before: 638ms processing 12,920 unwound documents
After: <50ms querying pre-computed counts from document root
Speedup: 10-15x faster
Trade-off: Additional storage and update complexity
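
As a rough illustration of the ingestion-side work this trade-off refers to, the sketch below builds the `groupCounts` field from the `result` array and then sums the per-document maps, mirroring what the `$objectToArray` / `$group` pipeline does on the server. The helper names and the sample documents are hypothetical, and `groupID` values are assumed to be safe to use as field names.

```javascript
// Build the per-document groupCounts field from the nested result array.
function withGroupCounts(doc) {
  const groupCounts = {};
  for (const element of doc.result || []) {
    const key = String(element.groupID);
    groupCounts[key] = (groupCounts[key] || 0) + 1;
  }
  return { ...doc, groupCounts };
}

// Sum the per-document maps across a set of documents.
function totalGroupCounts(docs) {
  const totals = {};
  for (const doc of docs) {
    for (const [group, count] of Object.entries(doc.groupCounts || {})) {
      totals[group] = (totals[group] || 0) + count;
    }
  }
  return totals;
}

const ingested = [
  { _id: 1, result: [{ groupID: "group1" }, { groupID: "group2" }, { groupID: "group1" }] },
  { _id: 2, result: [{ groupID: "group2" }] },
].map(withGroupCounts);

console.log(totalGroupCounts(ingested)); // { group1: 2, group2: 2 }
```

Every write path that changes `result` has to call the equivalent of `withGroupCounts`, otherwise the stored counts drift from the nested data.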

For OR queries (〈4.2.6. Example 6: Compound Query with OR Conditions〉 above):

// Create separate indexes for each OR condition
db.llmsummaries.createIndex({
  "result.priority.name": 1
})

db.llmsummaries.createIndex({
  "result.sentiment.name": 1
})

// Query remains the same - the optimizer can scan both indexes and union the results
db.llmsummaries.find({
  $or: [
    { "result.priority.name": "URGENT" },
    { "result.sentiment.name": "NEGATIVE" }
  ]
})

// After index creation, the query should show:
// - Multiple index scans combined (index union)
// - Reduced documents examined (only candidates from each index)
// - executionTimeMillis: <10ms (instead of 15ms)
// - Better efficiency as collection grows

Expected improvement:

Before indexes: 15ms, examined 6,319 docs (5.08x examined/returned ratio), read 22MB
After indexes: <10ms, examines only documents referenced by matching index entries (estimated ratio ~1.5-2x)
Speedup: 1.5-2x faster
Scalability: the gain grows as the collection grows

Alternative: Denormalize with boolean flags

// Add pre-computed flags at root level for frequent OR patterns
{
  _id: ObjectId,
  messagesSessionId: String,
  hasUrgentPriority: Boolean,      // Pre-computed flag
  hasNegativeSentiment: Boolean,   // Pre-computed flag
  result: [ /* original nested data */ ]
}

// Create simple indexes (non-multikey, faster)
db.llmsummaries.createIndex({ hasUrgentPriority: 1 })
db.llmsummaries.createIndex({ hasNegativeSentiment: 1 })

// Query becomes much more efficient:
db.llmsummaries.find({
  $or: [
    { hasUrgentPriority: true },
    { hasNegativeSentiment: true }
  ]
})

5.2.
Check index usage statistics

db.collection.aggregate([{ $indexStats: {} }])

Example output:

{
  name: 'messagesSessionId_1',
  key: { messagesSessionId: 1 },
  accesses: {
    ops: 0,                              // ← Index operations count
    since: ISODate("2026-02-24T16:17:27.004Z")  // ← When counter started
  }
}

What each field actually means:

ops: Number of operations that used this index since the since timestamp
since: When the statistics counter started tracking (NOT last use time!)
- Resets when: mongod restarts, index is rebuilt, replica member restarts
- Counters are kept per mongod process, so check every replica set member before drawing conclusions
- If since is recent (days/weeks ago) and ops: 0, not enough data yet
- If since is old (months ago) and ops: 0, the index is very likely unused on that node

What this CAN tell us:

Unused indexes - ops: 0 with old since date (>30 days) = not used on this node during that window
Relatively unused indexes - Very low ops compared to other indexes
Which indexes are hot - High ops count indicates frequent use

What this CANNOT tell us:

Index selectivity or efficiency
Whether the index helps performance
If queries using the index are slow
If a different index would be better

Interpreting the results:

For the example output above:

// If `since` is 2026-02-24 (2 days ago):
ops: 0, since: 2026-02-24  // ← Too recent to conclude anything
                           // Could be: recently created, or mongod restarted

// If `since` is 2025-11-01 (3+ months ago):
ops: 0, since: 2025-11-01  // ← Strong signal: index unused for months
                           // Safe to consider dropping (after testing!)

// If comparing indexes:
Index A: ops: 5000, since: 2026-01-01
Index B: ops: 2, since: 2026-01-01    // ← Index B rarely used, investigate why

Action items:

Unused indexes (ops: 0 with old since) - Consider dropping to reduce write overhead
Low-use indexes - Verify they serve specific important queries
High-use indexes - Ensure they're properly sized and maintained

Important: Always test in non-production first! An index with ops: 0 might still be critical for a rare but important query (monthly reports, admin operations, etc.).
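
The interpretation rules above can be written down as a small classifier. This is only a sketch: the function name `classifyIndexUsage`, the label strings, and the 30-day default are assumptions for illustration, and `ops` is assumed to have been converted to a plain number before the call.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Classify one $indexStats entry relative to a reference time.
function classifyIndexUsage(stats, now, minDays = 30) {
  const ageDays = (now - stats.accesses.since) / DAY_MS;
  if (ageDays < minDays) return "too-recent"; // counter window too short to judge
  return Number(stats.accesses.ops) === 0 ? "unused-candidate" : "in-use";
}

const now = new Date("2026-02-26T00:00:00Z");

console.log(
  classifyIndexUsage(
    { name: "messagesSessionId_1", accesses: { ops: 0, since: new Date("2026-02-24T16:17:27.004Z") } },
    now
  )
); // "too-recent"

console.log(
  classifyIndexUsage(
    { name: "old_index", accesses: { ops: 0, since: new Date("2025-11-01T00:00:00Z") } },
    now
  )
); // "unused-candidate"
```

An "unused-candidate" label is a prompt to investigate, not a decision to drop: the same check should come back with the same answer on every replica set member first.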

5.3.
Track index selectivity changes over time

The key to detecting stale plans is comparing query execution behavior before and after collection changes, not plan cache metadata.

What we need to track:

Index selectivity ratio - for each important query:

// Run periodically and log results
const result = db.orders.find({ status: "pending" }).explain("executionStats");
const { nReturned, totalDocsExamined } = result.executionStats;
// Guard against division by zero when no documents were examined
const selectivity = totalDocsExamined > 0 ? nReturned / totalDocsExamined : 1;
console.log(`Selectivity: ${selectivity}`); // Lower = worse

Execution time trends - same query taking longer?

const result = db.orders.find({ customerId: 123 }).explain("executionStats");
console.log(`Execution time: ${result.executionStats.executionTimeMillis}ms`);

Plan stability - is the same index being chosen?

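const result = db.orders.find({ status: "pending" }).explain("executionStats");
// indexName lives on the IXSCAN stage, which is usually nested under FETCH
let stage = result.queryPlanner.winningPlan;
while (stage && stage.stage !== "IXSCAN" && stage.inputStage) stage = stage.inputStage;
console.log(`Winning plan: ${result.queryPlanner.winningPlan.stage}`);
console.log(`Index used: ${stage && stage.indexName ? stage.indexName : "none"}`);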

Tracking Flow:

Establish baseline explain() results for critical queries
Re-run same queries periodically to detect degradation
Compare totalDocsExamined / nReturned ratio over time
A ratio that climbs well above the baseline while the indexes are unchanged (scanning many docs, returning few) = likely stale plan
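
The tracking flow can be sketched as two small functions: one reduces an explain result to the numbers worth storing, the other compares a stored baseline with a fresh run. The names `snapshot` and `degraded`, the factor of 2, and the sample explain objects are all assumptions for illustration; a query that returns nothing is scored by its `totalDocsExamined`, the same convention the profiler pipeline later in this article uses.

```javascript
// Reduce an explain("executionStats") result to the values worth tracking.
function snapshot(explain) {
  const s = explain.executionStats;
  return {
    ratio: s.nReturned > 0 ? s.totalDocsExamined / s.nReturned : s.totalDocsExamined,
    millis: s.executionTimeMillis,
    rootStage: explain.queryPlanner.winningPlan.stage,
  };
}

// List the ways in which the current snapshot is worse than the baseline.
function degraded(baseline, current, factor = 2) {
  const reasons = [];
  if (current.ratio > baseline.ratio * factor) reasons.push("ratio");
  if (current.millis > baseline.millis * factor) reasons.push("time");
  if (current.rootStage !== baseline.rootStage) reasons.push("plan");
  return reasons;
}

// Hypothetical baseline (indexed) and later run (plan regressed to a scan).
const baselineRun = snapshot({
  queryPlanner: { winningPlan: { stage: "FETCH" } },
  executionStats: { nReturned: 3, totalDocsExamined: 3, executionTimeMillis: 2 },
});
const laterRun = snapshot({
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { nReturned: 3, totalDocsExamined: 6319, executionTimeMillis: 148 },
});

console.log(degraded(baselineRun, laterRun));
```

Storing the snapshots with a timestamp (for example in a small monitoring collection) turns the comparison into a trend rather than a one-off check.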

6. Fixing stale plan cache Problems when Indexes Already Exist in MongoDB

We will adopt solutions in this section when:

Indexes exist, but query performance suddenly degraded
explain() shows wrong index being used
Collection has grown significantly since plan was cached
Data distribution has changed (e.g., status field that was evenly distributed is now 99% "archived")

NOT for:

Missing indexes (see 〈5. Performance Improvement Strategies〉 instead)
Collection scans on unindexed queries (create indexes first)
Query patterns that inherently need to scan many documents

6.1.
Clear Plan Cache

Self-hosted MongoDB:

// Clear all cached plans for a collection
db.collection.getPlanCache().clear()

// Clear specific query shape (arguments: query, projection, sort)
db.collection.getPlanCache().clearPlansByQuery(
  { field: value },
  { field: 1 },   // projection (optional)
  { field: 1 }    // sort (optional)
)

MongoDB Atlas:

// Plan cache commands may be restricted on Atlas (for example on free/shared tiers)
// Atlas manages plan cache automatically

// Alternative: Force plan re-evaluation by:
// 1. Adding/dropping an index (clears the collection's cached plans)
// 2. Waiting for automatic replanning (a cached plan is evicted when it does
//    far more work than it did when it was cached)
// 3. Using hints to override cached plan selection

// Example: Rebuild index to clear related plans
db.collection.dropIndex("index_name")
db.collection.createIndex({ field: 1 }, { name: "index_name" })

6.2.
Rebuild Indexes

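// Rebuild ALL indexes on the collection
// (reIndex is deprecated since MongoDB 6.0 and only runs on standalone instances)
db.collection.reIndex()

// Drop and recreate a single index (this also clears the collection's cached plans)
db.collection.dropIndex("index_name")
db.collection.createIndex({ field: 1 })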

6.3.
Use Index Hints

// Force specific index usage
db.collection.find({ field: value }).hint({ field: 1 })

// Force collection scan
db.collection.find({ field: value }).hint({ $natural: 1 })

6.4.
Analyze Query Performance

// Get detailed execution statistics
db.collection.find({ field: value }).explain("executionStats")

// Monitor slow queries
db.setProfilingLevel(2)  // Profile all operations (expensive; prefer level 1 with slowms in production)
db.system.profile.find().limit(10).sort({ ts: -1 })

6.5.
Configure Plan Cache Settings

Self-hosted MongoDB 8.0+:

// Plan cache is managed automatically
// But we can influence it through query settings (setQuerySettings requires MongoDB 8.0+)

// Set query settings for a specific query shape
// ("mydb" and "collection" are placeholder names)
db.adminCommand({
  setQuerySettings: {
    find: "collection",
    filter: { field: "value" },
    $db: "mydb"
  },
  settings: {
    indexHints: {
      ns: { db: "mydb", coll: "collection" },
      allowedIndexes: ["index_name"]
    }
  }
})

MongoDB Atlas approach:

Use Performance Advisor. Atlas automatically suggests indexes and identifies slow queries
Enable Profiling. Set profiling level via Atlas UI (Cluster → Configuration → Additional Settings)
Monitor Metrics. Track query efficiency through Atlas monitoring dashboard
Create Alerts. Set up alerts for high query execution times or scan ratios
Use Query Profiler. Analyze historical query performance in Atlas UI

7. Set Up Alerts

Monitor for performance degradation that might indicate stale statistics (see 〈2.2. How Stale Statistics Impact MongoDB〉 for why this matters):

Sudden increases in query execution time
Changes in execution plan choices
Resource utilization spikes

7.1.
MongoDB Atlas Alert Setup

WHERE to configure these alerts:

These alert configurations can be set up in three ways:

Method 1: Atlas Web UI (Easiest - Visual Interface)

Log into MongoDB Atlas Console: https://cloud.mongodb.com
Select the Project
Click "Alerts" in the left sidebar
Click "Add Alert" button
Choose alert type from dropdown (e.g., "Query Targeting: Scanned Objects / Returned")
Set threshold value (e.g., 1000)
Configure notification channels (email, Slack, PagerDuty, etc.)
Save the alert

Method 2: Atlas Admin API (Programmatic - REST API)

# Create alert via REST API
# (v2 endpoints expect a versioned Accept header; metric alerts use the
# OUTSIDE_METRIC_THRESHOLD event type with a metricThreshold object)
curl --user "{PUBLIC_KEY}:{PRIVATE_KEY}" --digest \
  --header "Accept: application/vnd.atlas.2023-01-01+json" \
  --header "Content-Type: application/json" \
  --request POST \
  "https://cloud.mongodb.com/api/atlas/v2/groups/{PROJECT_ID}/alertConfigs" \
  --data '{
    "eventTypeName": "OUTSIDE_METRIC_THRESHOLD",
    "enabled": true,
    "metricThreshold": {
      "metricName": "QUERY_TARGETING_SCANNED_OBJECTS_PER_RETURNED",
      "mode": "AVERAGE",
      "operator": "GREATER_THAN",
      "threshold": 1000,
      "units": "RAW"
    },
    "notifications": [
      {
        "typeName": "EMAIL",
        "emailEnabled": true,
        "emailAddress": "alerts@example.com",
        "delayMin": 0
      }
    ]
  }'

Method 3: Terraform (Infrastructure as Code)

resource "mongodbatlas_alert_configuration" "query_efficiency" {
  project_id = var.project_id

  # Metric alerts use the generic event type plus a metric_threshold_config block
  event_type = "OUTSIDE_METRIC_THRESHOLD"
  enabled    = true

  metric_threshold_config {
    metric_name = "QUERY_TARGETING_SCANNED_OBJECTS_PER_RETURNED"
    operator    = "GREATER_THAN"
    threshold   = 1000
    units       = "RAW"
    mode        = "AVERAGE"
  }

  notification {
    type_name     = "EMAIL"
    email_enabled = true
    email_address = "alerts@example.com"
    delay_min     = 0
  }
}

The simplified settings below summarize what each alert monitors. These values correspond to what we enter in the UI or send via API:

1// ALERT CONFIGURATION EXAMPLES
2// These are the settings we configure (via UI, API, or Terraform)
3
4// 1. Query Efficiency Alert (Detects inefficient queries)
5// WHERE: Atlas UI → Alerts → Add Alert → "Query Targeting: Scanned Objects / Returned"
6// OR: Use in REST API request body
7// OR: Use in Terraform resource definition
8{
9  eventTypeName: "QUERY_TARGETING_SCANNED_OBJECTS_PER_RETURNED",
10  threshold: 1000,  // Alert if scanned/returned ratio > 1000
11  operator: "GREATER_THAN"
12}
13// What it does: Monitors the ratio of documents scanned vs documents returned
14// Why it matters: High ratio (e.g., 1000x) means query is examining 1000 docs to return 1
15// Example: If this fires, we are likely seeing the problems in Examples 1-5 (2106x, 6319x inefficiency)
16// Real-world: Would have caught Example 5 (6319x) and Example 1 (2106x)
17
18// 2. Slow Query Alert (Detects long-running queries)
19// WHERE: Atlas UI → Alerts → Add Alert → "Query Execution Time"
20// OR: Use in REST API / Terraform with this configuration
21{
22  eventTypeName: "QUERY_EXECUTION_TIME",
23  threshold: 100,   // Alert if query takes > 100ms
24  operator: "GREATER_THAN"
25}
26// What it does: Triggers when any query takes longer than threshold (in milliseconds)
27// Why it matters: Indicates missing indexes, stale statistics, or inefficient operations
28// Example: Would catch Example 1 (148ms) and Example 4 aggregation (638ms)
29// Real-world: Would have caught Example 4 (638ms aggregation)
30
31// 3. Collection Scan Alert (Detects full table scans)
32// WHERE: Atlas UI → Alerts → Add Alert → "Collection Scans"
33// OR: Use in REST API / Terraform with this configuration
34{
35  eventTypeName: "COLLSCANS",
  threshold: 100,   // Alert if > 100 collection scans per minute
  operator: "GREATER_THAN"
}
// What it does: Counts how many COLLSCAN operations happen per time window
// Why it matters: COLLSCAN means no index was used, so the entire collection is read
// Example: ALL our examples 1-6 would trigger this (all showed stage: 'COLLSCAN')
// Real-world: If we see this firing frequently, we need indexes on those collections

// WHERE DO THESE CONFIGURATIONS COME FROM?
// - eventTypeName: Predefined by MongoDB Atlas (see full list in Atlas documentation)
// - threshold: We choose based on our application's performance requirements
// - operator: Usually "GREATER_THAN" for performance alerts, can also be "LESS_THAN"
//
// Examples of available eventTypeNames (not exhaustive):
// - QUERY_TARGETING_SCANNED_OBJECTS_PER_RETURNED (efficiency ratio)
// - QUERY_EXECUTION_TIME (query duration in ms)
// - COLLSCANS (full collection scans count)
// - OPCOUNTER_REPL_CMD, OPCOUNTER_REPL_UPDATE, etc. (operation counters)
// - HOST_DOWN, CLUSTER_MONGOS_IS_MISSING (availability alerts)
// - Many more in Atlas documentation:
//   https://www.mongodb.com/docs/atlas/reference/alert-config-settings/

// SUMMARY: Where to use alert configurations
// 1. Atlas UI: Click through the web interface (easiest)
// 2. Atlas Admin API: Send POST request with JSON config (for automation)
// 3. Terraform: Define as resources in .tf files (infrastructure as code)
// All three methods use the same eventTypeName, threshold, and operator values

// ============================================================================
// OPTION 2: MongoDB Profiler (Different from Alerts - Captures Query Details)
// ============================================================================
// Note: Profiling is NOT an alert system. It's a diagnostic tool that records
// slow query details to help us understand WHAT queries are problematic.
// Use profiling AFTER alerts notify us of performance issues.

// WHERE: Run these commands in MongoDB Shell (mongosh) or our application code
// WHAT: Records detailed execution info in the system.profile collection
// WHY: Helps identify which specific queries triggered our Atlas alerts

// Enable profiling - captures slow queries for analysis
// Level 0 = off, Level 1 = slow queries only, Level 2 = all queries (expensive!)
db.setProfilingLevel(1, { slowms: 100 })  // Profile queries > 100ms

// Profiling captures:
// - Full query text
// - Execution time (millis)
// - Documents examined
// - Query plan used
// - Timestamp
// This helps identify WHICH queries are slow, not just that slowness exists

// Check profiled queries - find the 10 most recent slow queries
db.system.profile.find({
  millis: { $gt: 100 }  // Filter: only queries that took > 100ms
}).sort({
  ts: -1  // Sort by timestamp descending (most recent first)
}).limit(10)

// Output example:
// {
//   op: "query",
//   ns: "mydb.llmsummaries",
//   command: { find: "llmsummaries", filter: { "result.assignee.userId": "..." } },
//   millis: 148,  // Took 148ms - matches our Example 1!
//   ts: ISODate("2026-02-26T10:30:00Z"),
//   planSummary: "COLLSCAN",  // Shows it did full collection scan
//   docsExamined: 6319,
//   nreturned: 3
// }

// Advanced: Find queries with worst efficiency ratios
db.system.profile.aggregate([
  { $match: { millis: { $gt: 10 } } },
  { $project: {
      ns: 1,
      millis: 1,
      docsExamined: 1,
      nreturned: 1,
      // Ratio of documents examined to documents returned;
      // when nothing was returned, fall back to docsExamined to avoid dividing by zero
      ratio: {
        $cond: {
          if: { $eq: ["$nreturned", 0] },
          then: "$docsExamined",
          else: { $divide: ["$docsExamined", "$nreturned"] }
        }
      }
  }},
  { $match: { ratio: { $gt: 100 } } },  // Only bad efficiency
  { $sort: { ratio: -1 } },
  { $limit: 10 }
])
// This would show Example 5 (6319x) at the top!
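
The same ranking can also be done client-side once profiler documents have been fetched, for example through a driver. The following is a minimal Python sketch, assuming each entry is a dict carrying the `ns`, `docsExamined`, and `nreturned` fields shown in the output example; the function name `worst_ratios` and the sample entries are hypothetical.

```python
from typing import Any


def worst_ratios(entries: list[dict[str, Any]], min_ratio: float = 100, top: int = 10) -> list[tuple[str, float]]:
    """Rank profiler entries by docsExamined / nreturned, worst first."""
    ranked = []
    for entry in entries:
        examined = entry.get("docsExamined", 0)
        returned = entry.get("nreturned", 0)
        # Same rule as the $cond stage: with zero results, use docsExamined as the ratio
        ratio = examined if returned == 0 else examined / returned
        if ratio > min_ratio:
            ranked.append((entry.get("ns", "?"), ratio))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top]


# Hypothetical sample entries shaped like system.profile documents
sample = [
    {"ns": "mydb.llmsummaries", "docsExamined": 6319, "nreturned": 3},
    {"ns": "mydb.llmsummaries", "docsExamined": 6319, "nreturned": 1},
    {"ns": "mydb.users", "docsExamined": 20, "nreturned": 10},
]
for ns, ratio in worst_ratios(sample):
    print(f"{ns}: {ratio:.0f}x")
```

Entries below the `min_ratio` cutoff (here the `mydb.users` query with a ratio of 2) are dropped, which keeps the report focused on queries that likely need an index.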

Recommended workflow: Alerts + Profiling together

// Step 1: Set up Atlas alerts (one-time setup)
// → Alerts notify us WHEN performance degrades

// Step 2: When alert fires, enable profiling to investigate
db.setProfilingLevel(1, { slowms: 100 })

// Step 3: Check profiled queries to find culprits
db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 })

// Step 4: Use explain() on specific slow queries
db.collection.find(problematicQuery).explain("executionStats")

// Step 5: Fix the issue (add indexes, optimize query)
db.collection.createIndex({ field: 1 })

// Step 6: Verify fix with explain() again
// → Should see IXSCAN instead of COLLSCAN, lower executionTimeMillis

// Step 7: Turn off profiling (optional, to reduce overhead)
db.setProfilingLevel(0)
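
The verification in Step 6 can be scripted by walking the explain output and collecting stage names. Below is a minimal Python sketch, assuming the classic explain layout in which `queryPlanner.winningPlan` nests child stages under `inputStage` or `inputStages`; some server versions wrap the plan in a `queryPlan` field, which the sketch treats as an assumption. The helper names and the two sample documents are hypothetical.

```python
def plan_stages(plan: dict) -> list[str]:
    """Return all stage names in a winning plan, outermost first."""
    # Some server versions wrap the classic plan in a "queryPlan" field (assumption)
    plan = plan.get("queryPlan", plan)
    stages = [plan["stage"]] if "stage" in plan else []
    children = list(plan.get("inputStages", []))
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def uses_collection_scan(explain_output: dict) -> bool:
    """True when the winning plan contains a COLLSCAN stage."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning)


# Hypothetical explain() results before and after creating an index
before = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}
after = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "field_1"},
}}}
print(uses_collection_scan(before), uses_collection_scan(after))
```

With a driver, the dict passed to `uses_collection_scan` would be whatever the driver returns for an explain call on the slow query.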

7.2.
Self-Hosted MongoDB alert setup

For self-hosted MongoDB, we need to set up external monitoring since Atlas's built-in alerts aren't available.

Option 1: Prometheus + Grafana + Alertmanager (Recommended)

This is a widely used open-source stack for MongoDB monitoring:

# docker-compose.yml for monitoring stack
version: '3.8'
services:
  # MongoDB Exporter - collects metrics from MongoDB
  mongodb-exporter:
    image: percona/mongodb_exporter:latest
    command:
      - '--mongodb.uri=mongodb://user:password@mongodb:27017'
      - '--collect-all'
    ports:
      - "9216:9216"
    environment:
      - MONGODB_URI=mongodb://user:password@mongodb:27017

  # Prometheus - stores metrics and evaluates alert rules
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alerts.yml:/etc/prometheus/alerts.yml
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'

  # Grafana - visualization dashboards
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-storage:/var/lib/grafana

  # Alertmanager - manages and routes alerts
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml

volumes:
  grafana-storage:

Prometheus configuration (prometheus.yml):

global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Load alert rules
rule_files:
  - 'alerts.yml'

# Configure Alertmanager
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Scrape MongoDB metrics
scrape_configs:
  - job_name: 'mongodb'
    static_configs:
      - targets: ['mongodb-exporter:9216']

Alert rules (alerts.yml), rough equivalents of the Atlas alerts:

# NOTE: Metric and label names differ between mongodb_exporter versions.
# The names below are assumptions based on recent percona/mongodb_exporter
# releases; check the exporter's /metrics output and adjust before use.
groups:
  - name: mongodb_performance
    interval: 30s
    rules:
      # Alert 1: Query Efficiency (scanned/returned ratio)
      # Equivalent to: QUERY_TARGETING_SCANNED_OBJECTS_PER_RETURNED
      # serverStatus reports scanned documents under metrics.queryExecutor.scannedObjects
      # and returned documents under metrics.document.returned
      - alert: HighQueryScanRatio
        expr: |
          (
            rate(mongodb_ss_metrics_queryExecutor_scannedObjects[5m])
            / ignoring(doc_op_type)
            rate(mongodb_ss_metrics_document{doc_op_type="returned"}[5m])
          ) > 1000
        for: 5m
        labels:
          severity: warning
          category: query_efficiency
        annotations:
          summary: "High query scan ratio detected"
          description: "MongoDB is scanning {{ $value }}x more documents than returned (threshold: 1000x). This indicates missing indexes or inefficient queries."

      # Alert 2: High Query Rate
      # Only a loose stand-in for QUERY_EXECUTION_TIME: opcounters count operations,
      # not their duration, so use the profiler to measure actual latency
      - alert: HighQueryRate
        expr: rate(mongodb_ss_opcounters{legacy_op_type="query"}[5m]) > 100
        for: 2m
        labels:
          severity: warning
          category: query_performance
        annotations:
          summary: "High number of query operations"
          description: "{{ $value }} queries per second detected. Check profiler for slow queries."

      # Alert 3: Collection Scans
      # Equivalent to: COLLSCANS
      # Based on metrics.queryExecutor.collectionScans.total (MongoDB 4.4+).
      # The scan_and_order metric counts in-memory sorts, not collection scans.
      # rate() is per second, so multiply by 60 to compare against a per-minute threshold
      - alert: HighCollectionScans
        expr: rate(mongodb_ss_metrics_queryExecutor_collectionScans_total[5m]) * 60 > 100
        for: 2m
        labels:
          severity: critical
          category: index_missing
        annotations:
          summary: "High rate of collection scans"
          description: "{{ $value }} collection scans per minute. Queries are not using indexes efficiently."

      # Additional useful alerts
      - alert: ReplicationLagHigh
        expr: mongodb_mongod_replset_member_replication_lag > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Replication lag is high"
          description: "Replication lag is {{ $value }} seconds"

      - alert: ConnectionsHigh
        expr: mongodb_ss_connections{conn_type="current"} > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High number of connections"
          description: "{{ $value }} active connections to MongoDB"
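
Two ideas in these rules are easy to misread: `rate(...[5m])` turns an ever-growing counter into a per-second speed, and `for:` delays firing until the condition has held continuously. The Python sketch below models both in simplified form. It is not how Prometheus is implemented (it ignores counter resets and extrapolation), and the function names and sample numbers are hypothetical.

```python
def per_second_rate(samples: list[tuple[float, float]]) -> float:
    """Approximate rate(): counter increase divided by elapsed seconds.

    samples is a list of (unix_time, counter_value) pairs inside the window.
    """
    if len(samples) < 2:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0) if t1 > t0 else 0.0


def fires(evaluations: list[tuple[float, bool]], for_seconds: float) -> bool:
    """Model `for:`: fire only if the condition held continuously long enough.

    evaluations is a list of (unix_time, condition_was_true), oldest first.
    """
    pending_since = None
    for when, is_true in evaluations:
        if not is_true:
            pending_since = None  # Any false evaluation resets the pending timer
        elif pending_since is None:
            pending_since = when
    if pending_since is None:
        return False
    return evaluations[-1][0] - pending_since >= for_seconds


# Hypothetical counters sampled 60 s apart: scanned grows much faster than returned
scanned = [(0, 0), (60, 3_000_000)]
returned = [(0, 0), (60, 1_500)]
ratio = per_second_rate(scanned) / per_second_rate(returned)
print(ratio)
```

Because both counters are cumulative since server start, dividing their raw values would describe the whole uptime; dividing their rates describes only the recent window, which is what an alert needs.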

Alertmanager configuration (alertmanager.yml):

global:
  resolve_timeout: 5m

# Where to send alerts
route:
  group_by: ['alertname', 'cluster']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
    - match:
        severity: warning
      receiver: 'warning-alerts'

receivers:
  # Email notifications (fallback when no route matches)
  - name: 'default'
    email_configs:
      - to: 'team@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'your-email@gmail.com'
        auth_password: 'your-app-password'

  # Slack notifications for critical alerts
  - name: 'critical-alerts'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#critical-alerts'
        title: 'MongoDB Critical Alert'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

  # Generic webhook for warning-level alerts
  - name: 'warning-alerts'
    webhook_configs:
      - url: 'http://your-webhook-endpoint/alerts'
        send_resolved: true
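
The routing section decides which receiver gets each alert: child routes are checked in order, the first one whose labels match wins, and anything unmatched falls back to the top-level receiver. The Python sketch below illustrates that lookup in simplified form (it ignores `continue`, regex matchers, and grouping); `pick_receiver` is a hypothetical helper, not part of Alertmanager.

```python
def pick_receiver(labels: dict[str, str], route: dict) -> str:
    """Return the receiver for an alert, checking child routes in order."""
    for child in route.get("routes", []):
        wanted = child.get("match", {})
        # A child route matches when every label it lists equals the alert's value
        if all(labels.get(key) == value for key, value in wanted.items()):
            return pick_receiver(labels, child)
    return route["receiver"]


# The route tree from alertmanager.yml, written as a Python dict
route = {
    "receiver": "default",
    "routes": [
        {"match": {"severity": "critical"}, "receiver": "critical-alerts"},
        {"match": {"severity": "warning"}, "receiver": "warning-alerts"},
    ],
}

print(pick_receiver({"alertname": "HighCollectionScans", "severity": "critical"}, route))
print(pick_receiver({"alertname": "ConnectionsHigh", "severity": "warning"}, route))
print(pick_receiver({"alertname": "Custom", "severity": "info"}, route))
```

An alert without a `severity` label, or with a value other than `critical` or `warning`, ends up at the `default` email receiver, so every rule should set a severity if we want it routed deliberately.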

Deploy the stack:

# Start all services
docker-compose up -d

# Access Grafana at http://localhost:3000 (admin/admin)
# Access Prometheus at http://localhost:9090
# Access Alertmanager at http://localhost:9093

# Import MongoDB dashboard in Grafana
# Dashboard ID: 2583 (Percona MongoDB Exporter)

Option 2: MongoDB Enterprise Ops Manager (Commercial)

If we have MongoDB Enterprise, we can use Ops Manager, which provides Atlas-like features:

# Install Ops Manager
curl -OL https://downloads.mongodb.com/on-prem-mms/rpm/mongodb-mms-x.x.x.x86_64.rpm
sudo rpm -ivh mongodb-mms-x.x.x.x86_64.rpm

# Configure alerts in Ops Manager UI
# Navigate to: Ops Manager → Project → Alerts → Add Alert
# Alert types similar to those in Atlas are available

Option 3: Custom Python/Node.js Monitoring Script

For simpler setups, create a custom monitoring script:

# monitor_mongodb.py
import smtplib
import time
from datetime import datetime, timedelta, timezone
from email.mime.text import MIMEText

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client.admin

# serverStatus counters are cumulative since server start, so we keep the
# previous sample and alert on the change between two checks
previous_counters = None

def check_query_efficiency():
    """Monitor scanned vs returned documents since the previous check"""
    global previous_counters
    metrics = db.command('serverStatus')['metrics']

    # Scanned documents live under queryExecutor, returned documents under document
    scanned = metrics['queryExecutor'].get('scannedObjects', 0)
    returned = metrics['document'].get('returned', 0)

    last, previous_counters = previous_counters, (scanned, returned)
    if last is None:
        return  # First sample: nothing to compare against yet

    scanned_delta = scanned - last[0]
    returned_delta = returned - last[1]

    if returned_delta > 0:
        ratio = scanned_delta / returned_delta
        if ratio > 1000:
            send_alert(
                f"High scan ratio detected: {ratio:.0f}x",
                f"Scanned: {scanned_delta}, Returned: {returned_delta} since last check"
            )

def check_slow_queries():
    """Check profiler for slow queries (requires profiling to be enabled)"""
    profiler_db = client['your_database']  # Replace with the profiled database

    # system.profile stores ts in UTC, so compare against a UTC timestamp
    since = datetime.now(timezone.utc) - timedelta(minutes=5)
    count = profiler_db.system.profile.count_documents({
        'millis': {'$gt': 100},
        'ts': {'$gte': since}
    })

    if count > 50:
        send_alert(
            f"High number of slow queries: {count} in last 5 minutes",
            "Threshold: 50 queries > 100ms"
        )

def send_alert(subject, message):
    """Send email alert"""
    msg = MIMEText(message)
    msg['Subject'] = f'MongoDB Alert: {subject}'
    msg['From'] = 'monitor@example.com'
    msg['To'] = 'team@example.com'

    # Placeholder credentials: load real ones from the environment or a secret store
    with smtplib.SMTP('smtp.gmail.com', 587) as server:
        server.starttls()
        server.login('your-email@gmail.com', 'your-password')
        server.send_message(msg)
    print(f"Alert sent: {subject}")

# Run monitoring loop
while True:
    try:
        check_query_efficiency()
        check_slow_queries()
    except Exception as e:
        print(f"Error: {e}")

    time.sleep(60)  # Check every minute
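
Because the loop runs every 60 seconds, a condition that persists would send a new email on every pass. One way to limit that is a per-alert cooldown. The following is a minimal sketch of the idea; the class name `AlertCooldown`, the key strings, and the 15-minute window are assumptions for illustration, not part of the script above.

```python
import time


class AlertCooldown:
    """Suppress repeats of the same alert key within a cooldown window."""

    def __init__(self, cooldown_seconds: float = 900, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # Injectable so the behavior can be tested without sleeping
        self._last_sent: dict[str, float] = {}

    def should_send(self, key: str) -> bool:
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_sent[key] = now
        return True


# Example with a fake clock standing in for real time
fake_now = [0.0]
cooldown = AlertCooldown(cooldown_seconds=900, clock=lambda: fake_now[0])
print(cooldown.should_send("high_scan_ratio"))  # first alert goes out
fake_now[0] = 60
print(cooldown.should_send("high_scan_ratio"))  # suppressed one minute later
fake_now[0] = 1000
print(cooldown.should_send("high_scan_ratio"))  # allowed again after the window
```

In the monitoring script, the key should be a stable name per check rather than the email subject, since the subject contains values that change on every pass.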

Run the monitor:

# Install dependencies
pip install pymongo

# Run as background service
nohup python3 monitor_mongodb.py > monitor.log 2>&1 &

# Or use a systemd service (assumes we have already created a
# mongodb-monitor.service unit file that runs the script)
sudo systemctl enable mongodb-monitor
sudo systemctl start mongodb-monitor

Option 4: Nagios/Zabbix Integration

# For Nagios - install MongoDB plugin
cd /usr/lib64/nagios/plugins/
wget https://raw.githubusercontent.com/mzupan/nagios-plugin-mongodb/master/check_mongodb.py
chmod +x check_mongodb.py

# Configure check command in Nagios
# /etc/nagios/objects/commands.cfg
# Note: run check_mongodb.py --help to confirm that the action passed to -A
# is supported by the installed plugin version
define command {
    command_name    check_mongodb_query_time
    command_line    $USER1$/check_mongodb.py -H $HOSTADDRESS$ -A query_time -W 100 -C 500
}

# Define service check
define service {
    use                     generic-service
    host_name               mongodb-server
    service_description     MongoDB Query Time
    check_command           check_mongodb_query_time
}
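
The `-W 100 -C 500` arguments follow the usual Nagios plugin convention: the measured value is compared against a warning and a critical threshold, and the result is reported through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The Python sketch below shows that mapping in isolation; it is not taken from `check_mongodb.py`, and `nagios_status` is a hypothetical helper.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_status(value, warning: float, critical: float) -> tuple[int, str]:
    """Map a measured value to a Nagios exit code and status line."""
    if value is None:
        return UNKNOWN, "UNKNOWN - no measurement available"
    if value >= critical:
        return CRITICAL, f"CRITICAL - value is {value}"
    if value >= warning:
        return WARNING, f"WARNING - value is {value}"
    return OK, f"OK - value is {value}"


# Thresholds taken from the command definition: -W 100 -C 500
for measured in (42, 148, 900):
    code, line = nagios_status(measured, warning=100, critical=500)
    print(code, line)
```

A real plugin would print the status line and then call `sys.exit(code)`, since Nagios reads the state from the process exit code.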

Comparison of self-hosted monitoring solutions:

Solution	Setup Complexity	Features	Cost	Best For
Prometheus + Grafana	Medium	Excellent	Free	Production environments
Ops Manager	Low	Atlas-like	$$$$	Enterprise with budget
Custom Scripts	Low	Basic	Free	Small deployments
Nagios/Zabbix	High	Good	Free	Existing monitoring

Recommended approach for self-hosted:

Start with: Custom Python script (quick to set up)
Scale to: Prometheus + Grafana (production-ready)
Enterprise: Ops Manager (if budget allows)
