0%
June 15, 2023

Inner and Left Joining Multiple Collections in Mongo --- The preserveNullAndEmptyArrays in $unwind

mongo

Demonstration 1 Look up from Arrays Instead of Collections (Inner Joins Only)

Stages
  • Since look up is only available for collections in mongodb, if there is a property among the array of elements, like id, which we need to left join with another collection. We need to $unwind it to reduce the problem into our old familiar problem.

  • Next we need to undo the $unwind process, which can be done by using using all other fields jointly to form an_id object, then we push the lookup results into an array according to this _id.

  • We then use $replaceRoot: { newRoot: ... } trick to move everything in _id: {...} to the parent level.

Full Breakdown
1db.ns_material_categories.aggregate([
2    {
3        $lookup: {
4            let: { group_code: "$group_code" },
5            from: "ns_material_category_groups",
6            pipeline: [
7                { $match: { $expr: { $eq: ["$code", "$$group_code"] } } },
8                { $project: { name: 1, code: 1, _id: 0 } }
9            ],
10            as: "materialGroup"
11        }
12    },
13    {
14        $match: {
15            $expr: {
16                $eq: ["$enabled", "Y"]
17            }
18        }
19    },
20    {
21        $unwind: "$materialGroup"
22    },

This lookup only aims at left-joining ns_material_category_groups to get materialGroup, which becomes an array of objects of only 1 element (due to 1-1 correspondance), we $unwind to sequeeze the array into an object:

 [{
    "_id": {
        "timestamp": 1648724334,
        "date": "2022-03-31T10:58:54.000+00:00"
    },
    "code": "FINISHES_CERAMIC_TILES",
    "group_code": "InteriorFinishes",
    "enabled": "Y",
    "materialGroup": {
        "code": "InteriorFinishes"
    }
}]

We next need to lookup from another collection using the field code:

23   {
24        $lookup: {
25            let: { code: "$code" },
26            from: "ns_materials",
27            pipeline: [
28                {
29                    $match: {
30                        $expr: {
31                            $and: [
32                                { $eq: ["$category_code", "$$code"] },
33                                { $eq: ["$is_dummy", "Y"] }
34                            ]
35                        }
36                    }
37                },
38                {
39                    $project: {
40                        _id: 0,
41                        manu_code: 1,
42                    }
43                }
44            ],
45            as: "suppliers"
46        }
47    },

At this point our queried object becomes

[
    {
        "_id": {
            "timestamp": 1648724334,
            "date": "2022-03-31T10:58:54.000+00:00"
        },
        "code": "FINISHES_CERAMIC_TILES",
        "group_code": "InteriorFinishes",
        "enabled": "Y",
        "materialGroup": {
            "code": "InteriorFinishes"
        },
        "suppliers": [
            {
                "manu_code": "DAUGRES"
            },
            {
                "manu_code": "ELEPHOME"
            },
            {
                "manu_code": "HONGYU"
            },
            {
                "manu_code": "KITO"
            },
            {
                "manu_code": "MMMOSAIC"
            }
        ]
    },
 ...
]

But we need to look up from another collection using manu_code, we $unwind the suppliers field and do a $lookup from ns_material_manus again to get the name_en field.

48    {
49        $unwind: "$suppliers"
50    },
51    {
52        $lookup: {
53            let: { manu_code: "$suppliers.manu_code" },
54            from: "ns_material_manus",
55            pipeline: [
56                {
57                    $match: {
58                        $expr: {
59                            $eq: ["$$manu_code", "$internal_code"]
60                        }
61                    }
62                },
63                {
64                    $project: {
65                        _id: 0,
66                        name_en: 1
67                    }
68                },
69                { $limit: 1 }
70            ],
71            as: "supplierDetail"
72        }
73    },

At this point we get

[
    {
        "_id": {
            "timestamp": 1648724334,
            "date": "2022-03-31T10:58:54.000+00:00"
        },
        "code": "FINISHES_CERAMIC_TILES",
        "group_code": "InteriorFinishes",
        "enabled": "Y",
        "materialGroup": {
            "code": "InteriorFinishes"
        },
        "suppliers": {
            "manu_code": "DAUGRES"
        },
        "supplierDetail": [
            {
                "name_en": "DAUGRES"
            }
        ]
    },
    ...
]
  • Note that we have used { $limit: 1 } as we understand each company only has one name, lookup results are born with being an array.

  • We $unwind it to get ordinary object and finally use the original fields jointly as an _id to group all supplier results:

74    {
75        $unwind: "$supplierDetail"
76    },
77    {
78        $group: {
79            _id: {
80                code: "$code",
81                group_code: "$group_code",
82                name: "$name",
83                "code": "$code",
84                "group_code": "$group_code",
85                "enabled": "$enabled",
86                "materialGroup": "$materialGroup"
87            },
88            suppliers: { "$push": { name_en: "$supplierDetail.name_en" } }
89        }
90    },

The result at this point is of the form:

[
    {
        "_id": {
            "code": "EXTERIOR_FINISHES_COMPOSITE_DECKING",
            "group_code": "ExteriorFlooring",
            "enabled": "Y",
            "materialGroup": {
                "code": "ExteriorFlooring"
            }
        },
        "suppliers": [
            {
                "name_en": "GREENZONE"
            },
            {
                "name_en": "MEXY"
            }
        ]
    },
    ...
]

We finally move everything inside _id to upper level:

93    {
94        $replaceRoot: {
95            newRoot: {
96                $mergeObjects: ["$_id", { suppliers: "$suppliers" }]
97            }
98        }
99    }
100])

which finally yields

[
    {
        "code": "TAPWARES_KITCHEN_TAPWARE",
        "group_code": "Tapwares.Accessories",
        "enabled": "Y",
        "materialGroup": {
            "code": "Tapwares.Accessories"
        },
        "suppliers": [
            {
                "name_en": "EMPOLO"
            },
            {
                "name_en": "ECCO VITA"
            }
        ]
    },
    ...
]

Demonstration 2: $lookup + $unwind by Default is an Inner Join

Example Demonstrating Inner Join Behaviour

Consider the following aggregation, we are restricting ourself to look at only one "problematic" document by enforcing { code: "NEW_CATEGORY_20230614C" }.

1db.ns_material_categories.aggregate([
2    {
3        $match: {
4            $and: [
5                { enabled: "Y" },
6                { code: "NEW_CATEGORY_20230614C" }
7            ]
8        }
9    },
10    {
11        $lookup: {
12            let: { group_code: "$group_code" },
13            from: "ns_material_category_groups",
14            pipeline: [
15                { $match: { $expr: { $eq: ["$code", "$$group_code"] } } },
16                { $project: { oid: { $toString: "$_id" }, name: 1, code: 1, _id: 0 } }
17            ],
18            as: "catGroup"
19        }
20    },
21    {
22        $unwind: "$catGroup"
23    },
24    {
25        $lookup: {
26            let: { code: "$code" },
27            from: "ns_materials",
28            pipeline: [
29                {
30                    $match: {
31                        $expr: {
32                            $and: [
33                                { $eq: ["$category_code", "$$code"] },
34                                { $eq: ["$is_dummy", "Y"] }
35                            ]
36                        }
37                    }
38                },
39                {
40                    $project: {
41                        _id: 0,
42                        manu_code: 1,
43                    }
44                }
45            ],
46            as: "suppliers"
47        }
48    },

At this point the result is:

[
  {
    "_id": {
      "$oid": "648954d9547dee498d4f85a3"
    },
    "code": "NEW_CATEGORY_20230614C",
    "group_code": "NewCategoryGroup20230614c",
    "enabled": "Y",
    "catGroup": {
      "code": "NewCategoryGroup20230614c",
      "oid": "648954d2547dee498d4f85a2"
    },
    "suppliers": []
  }
]

Note that suppliers is an empty array, which means that we have nothing matched from the table on the right.

Each item in suppliers contains a field called manu_code, with which we will be using to look up from another collection, for this purpose, we need to $unwind suppliers field and then do a $lookup using $suppliers.manu_code.

50    {
51        $unwind: "$suppliers"
52    },

but oops!

[]

We conclude from this stage that by default $lookup + $unwind is an inner join operation (assuming we will be doing $group action in the last stage).

unwind with preserveNullAndEmptyArrays: true

As in mySQL we most of the time want to left join instead of inner join, if we instead

50    {
51        $unwind: {
52            path: "$suppliers",
53            preserveNullAndEmptyArrays: true
54        }
55    },

we get

[
  {
    "_id": {
      "$oid": "648954d9547dee498d4f85a3"
    },
    "code": "NEW_CATEGORY_20230614C",
    "group_code": "NewCategoryGroup20230614c",
    "enabled": "Y",
    "catGroup": {
      "code": "NewCategoryGroup20230614c",
      "oid": "648954d2547dee498d4f85a2"
    }
  }
]

that means row on the left table is preserved even we match nothing on the right table. This is exactly what left join does.

If we do a look up using suppliers.manu_code:

53    {
54        $lookup: {
55            let: { manu_code: "$suppliers.manu_code" },
56            from: "ns_material_manus",
57            pipeline: [
58                {
59                    $match: {
60                        $expr: {
61                            $eq: ["$$manu_code", "$internal_code"]
62                        }
63                    }
64                },
65                {
66                    $project: {
67                        _id: 0,
68                        name_en: 1
69                    }
70                },
71                { $limit: 1 }
72            ],
73            as: "supplierDetail"
74        }
75    },

we get

[
  {
    "_id": {
      "$oid": "648954d9547dee498d4f85a3"
    },
    "code": "NEW_CATEGORY_20230614C",
    "group_code": "NewCategoryGroup20230614c",
    "enabled": "Y",
    "catGroup": {
      "code": "NewCategoryGroup20230614c",
      "oid": "648954d2547dee498d4f85a2"
    },
    "supplierDetail": []
  }
]

As desired.

Grouping and Handling the [ {} ] Result
Starting from Nonempty Suppliers Document and Group the Results

We enforce { code: "EXTERIOR_FLOORING_COBBLE_STONE" } at the beginning of the aggregation script, this has 3 suppliers and thefore we have 3 supplierDetails.

Up to line 75 of the aggregation script, the result will be 3 pieces of data having the following form (with suppliers and supplierDetail being the only varying field)

{
  "_id": {
    "$oid": "63316d25fe1ff183f4436771"
  },
  "code": "KITCHEN_APPLIANCES_OVEN",
  "group_code": "KitchenAppliances",
  "enabled": "Y",
  "catGroup": {
    "code": "KitchenAppliances",
    "oid": "63316d07fe1ff183f443676a"
  },
  "suppliers": {
    "manu_code": "FOTILE"
  },
  "supplierDetail": [
    {
      "name_en": "FOTILE"
    }
  ]
}

Next we $unwind (just for squeezing the array) and $group what we need:

76    {
77        $unwind: {
78            path: "$supplierDetail",
79            preserveNullAndEmptyArrays: true
80        }
81    },
82    {
83        $group: {
84            _id: {
85                oid: { $toString: "$_id" },
86                code: "$code",
87                enabled: "$enabled",
88                catGroup: "$catGroup"
89            },
90            suppliers: { "$push": { name_en: "$supplierDetail.name_en" } }
91        }
92    },

which yields:

[
  {
    "_id": {
      "oid": "63316d25fe1ff183f4436770",
      "code": "EXTERIOR_FLOORING_COBBLE_STONE",
      "enabled": "Y",
      "catGroup": {
        "code": "ExteriorFlooring",
        "oid": "62cbbf0b35fc26d4990aa2f1"
      }
    },
    "suppliers": [
      {
        "name_en": "DALEI"
      },
      {
        "name_en": "EASTWOOD STONE"
      },
      {
        "name_en": "STONELINK"
      }
    ]
  }
]

$replaceRoot is the trick to bring everything in _id to the top level:

99    {
100        $replaceRoot: {
101            newRoot: {
102                $mergeObjects: ["$_id", { suppliers: "$suppliers" }]
103            }
104        }
105    },

which results in:

[
  {
    "oid": "63316d25fe1ff183f4436770",
    "code": "EXTERIOR_FLOORING_COBBLE_STONE",
    "enabled": "Y",
    "catGroup": {
      "code": "ExteriorFlooring",
      "oid": "62cbbf0b35fc26d4990aa2f1"
    },
    "suppliers": [
      {
        "name_en": "DALEI"
      },
      {
        "name_en": "EASTWOOD STONE"
      },
      {
        "name_en": "STONELINK"
      }
    ]
  }
]

We finally look up another table for technical purpose. Note the use of javascript, which is considerably tedious if we rely merely on mongo query:

106    {
107        $lookup: {
108            from: "ns_generic_form_templates",
109            let: { code: "$code" },
110            pipeline: [
111                { $project: { subtypes: 1 } },
112                {
113                    $addFields: {
114                        formAnalysis: {
115                            $function: {
116                                body: function (code, subtypes, formId) {
117                                    if (!subtypes) {
118                                        return null
119                                    } else {
120                                        const hasForm = subtypes.indexOf(code) > -1;
121
122                                        return { category_code: code, hasForm, formOid: formId }
123                                    }
124                                },
125                                args: ["$$code", "$subtypes", "$_id"],
126                                lang: 'js'
127                            }
128                        }
129                    }
130                },
131                { $project: { subtypes: 0 } },
132                {
133                    $match: { "formAnalysis.hasForm": true }
134                },
135                { $limit: 1 },
136                { $replaceRoot: { newRoot: "$formAnalysis" } },
137                { $set: { formOid: { $toString: "$formOid" } } },
138            ],
139            as: "genericForm"
140        }
141    },

We next $unwind again simply for sequeezing the array into a single object:

142    {
143        $unwind: {
144            path: "$genericForm",
145            preserveNullAndEmptyArrays: true
146        }
147    },

And we get a pretty result:

[
  {
    "oid": "63316d25fe1ff183f4436770",
    "code": "EXTERIOR_FLOORING_COBBLE_STONE",
    "enabled": "Y",
    "catGroup": {
      "code": "ExteriorFlooring",
      "oid": "62cbbf0b35fc26d4990aa2f1"
    },
    "suppliers": [
      {
        "name_en": "DALEI"
      },
      {
        "name_en": "EASTWOOD STONE"
      },
      {
        "name_en": "STONELINK"
      }
    ],
    "genericForm": {
      "category_code": "EXTERIOR_FLOORING_COBBLE_STONE",
      "hasForm": true,
      "formOid": "62972948261ac272d152e238"
    }
  }
]
Starting from Empty Suppliers Document and Deal with [ {} ]

We go back to { code: "NEW_CATEGORY_20230614C" } which has no suppliers. We run through all the same script to line 147 as above, then the result becomes

[
  {
    "oid": "648954d9547dee498d4f85a3",
    "code": "NEW_CATEGORY_20230614C",
    "enabled": "Y",
    "catGroup": {
      "code": "NewCategoryGroup20230614c",
      "oid": "648954d2547dee498d4f85a2"
    },
    "suppliers": [
      {}
    ]
  }
]

To make [ {} ] into [], we do a filtering:

149        {
150            $set: {
151                suppliers: {
152                    $filter: {
153                        input: "$suppliers",
154                        cond: {
155                            $ifNull: ["$$this.name_en", false]
156                        }
157                    }
158                }
159            }
160        }
161    ]
162)

and we get:

[
  {
    "oid": "648954d9547dee498d4f85a3",
    "code": "NEW_CATEGORY_20230614C",
    "enabled": "Y",
    "catGroup": {
      "code": "NewCategoryGroup20230614c",
      "oid": "648954d2547dee498d4f85a2"
    },
    "suppliers": []
  }
]