Demonstration 1 Look up from Arrays Instead of Collections (Inner Joins Only)
Stages
-
Since look up is only available for collections in mongodb, if there is a property among the array of elements, like
id
, which we need to left join with another collection. We need to$unwind
it to reduce the problem into our old familiar problem. -
Next we need to undo the
$unwind
process, which can be done by using using all other fields jointly to form an_id
object, then we push the lookup results into an array according to this_id
. -
We then use
$replaceRoot: { newRoot: ... }
trick to move everything in_id: {...}
to the parent level.
Full Breakdown
1db.ns_material_categories.aggregate([ 2 { 3 $lookup: { 4 let: { group_code: "$group_code" }, 5 from: "ns_material_category_groups", 6 pipeline: [ 7 { $match: { $expr: { $eq: ["$code", "$$group_code"] } } }, 8 { $project: { name: 1, code: 1, _id: 0 } } 9 ], 10 as: "materialGroup" 11 } 12 }, 13 { 14 $match: { 15 $expr: { 16 $eq: ["$enabled", "Y"] 17 } 18 } 19 }, 20 { 21 $unwind: "$materialGroup" 22 },
This lookup only aims at left-joining ns_material_category_groups
to get materialGroup
, which becomes an array of objects of only 1 element (due to 1-1 correspondance), we $unwind
to sequeeze the array into an object:
[{ "_id": { "timestamp": 1648724334, "date": "2022-03-31T10:58:54.000+00:00" }, "code": "FINISHES_CERAMIC_TILES", "group_code": "InteriorFinishes", "enabled": "Y", "materialGroup": { "code": "InteriorFinishes" } }]
We next need to lookup from another collection using the field code
:
23 { 24 $lookup: { 25 let: { code: "$code" }, 26 from: "ns_materials", 27 pipeline: [ 28 { 29 $match: { 30 $expr: { 31 $and: [ 32 { $eq: ["$category_code", "$$code"] }, 33 { $eq: ["$is_dummy", "Y"] } 34 ] 35 } 36 } 37 }, 38 { 39 $project: { 40 _id: 0, 41 manu_code: 1, 42 } 43 } 44 ], 45 as: "suppliers" 46 } 47 },
At this point our queried object becomes
[ { "_id": { "timestamp": 1648724334, "date": "2022-03-31T10:58:54.000+00:00" }, "code": "FINISHES_CERAMIC_TILES", "group_code": "InteriorFinishes", "enabled": "Y", "materialGroup": { "code": "InteriorFinishes" }, "suppliers": [ { "manu_code": "DAUGRES" }, { "manu_code": "ELEPHOME" }, { "manu_code": "HONGYU" }, { "manu_code": "KITO" }, { "manu_code": "MMMOSAIC" } ] }, ... ]
But we need to look up from another collection using manu_code
, we $unwind
the suppliers
field and do a $lookup
from ns_material_manus
again to get the name_en
field.
48 { 49 $unwind: "$suppliers" 50 }, 51 { 52 $lookup: { 53 let: { manu_code: "$suppliers.manu_code" }, 54 from: "ns_material_manus", 55 pipeline: [ 56 { 57 $match: { 58 $expr: { 59 $eq: ["$$manu_code", "$internal_code"] 60 } 61 } 62 }, 63 { 64 $project: { 65 _id: 0, 66 name_en: 1 67 } 68 }, 69 { $limit: 1 } 70 ], 71 as: "supplierDetail" 72 } 73 },
At this point we get
[ { "_id": { "timestamp": 1648724334, "date": "2022-03-31T10:58:54.000+00:00" }, "code": "FINISHES_CERAMIC_TILES", "group_code": "InteriorFinishes", "enabled": "Y", "materialGroup": { "code": "InteriorFinishes" }, "suppliers": { "manu_code": "DAUGRES" }, "supplierDetail": [ { "name_en": "DAUGRES" } ] }, ... ]
-
Note that we have used
{ $limit: 1 }
as we understand each company only has one name, lookup results are born with being an array. -
We
$unwind
it to get ordinary object and finally use the original fields jointly as an_id
to group all supplier results:
74 { 75 $unwind: "$supplierDetail" 76 }, 77 { 78 $group: { 79 _id: { 80 code: "$code", 81 group_code: "$group_code", 82 name: "$name", 83 "code": "$code", 84 "group_code": "$group_code", 85 "enabled": "$enabled", 86 "materialGroup": "$materialGroup" 87 }, 88 suppliers: { "$push": { name_en: "$supplierDetail.name_en" } } 89 } 90 },
The result at this point is of the form:
[ { "_id": { "code": "EXTERIOR_FINISHES_COMPOSITE_DECKING", "group_code": "ExteriorFlooring", "enabled": "Y", "materialGroup": { "code": "ExteriorFlooring" } }, "suppliers": [ { "name_en": "GREENZONE" }, { "name_en": "MEXY" } ] }, ... ]
We finally move everything inside _id
to upper level:
93 { 94 $replaceRoot: { 95 newRoot: { 96 $mergeObjects: ["$_id", { suppliers: "$suppliers" }] 97 } 98 } 99 } 100])
which finally yields
[ { "code": "TAPWARES_KITCHEN_TAPWARE", "group_code": "Tapwares.Accessories", "enabled": "Y", "materialGroup": { "code": "Tapwares.Accessories" }, "suppliers": [ { "name_en": "EMPOLO" }, { "name_en": "ECCO VITA" } ] }, ... ]
Demonstration 2: $lookup
+ $unwind
by Default is an Inner Join
$lookup
+ $unwind
by Default is an Inner JoinExample Demonstrating Inner Join Behaviour
Consider the following aggregation, we are restricting ourself to look at only one "problematic" document by enforcing { code: "NEW_CATEGORY_20230614C" }
.
1db.ns_material_categories.aggregate([ 2 { 3 $match: { 4 $and: [ 5 { enabled: "Y" }, 6 { code: "NEW_CATEGORY_20230614C" } 7 ] 8 } 9 }, 10 { 11 $lookup: { 12 let: { group_code: "$group_code" }, 13 from: "ns_material_category_groups", 14 pipeline: [ 15 { $match: { $expr: { $eq: ["$code", "$$group_code"] } } }, 16 { $project: { oid: { $toString: "$_id" }, name: 1, code: 1, _id: 0 } } 17 ], 18 as: "catGroup" 19 } 20 }, 21 { 22 $unwind: "$catGroup" 23 }, 24 { 25 $lookup: { 26 let: { code: "$code" }, 27 from: "ns_materials", 28 pipeline: [ 29 { 30 $match: { 31 $expr: { 32 $and: [ 33 { $eq: ["$category_code", "$$code"] }, 34 { $eq: ["$is_dummy", "Y"] } 35 ] 36 } 37 } 38 }, 39 { 40 $project: { 41 _id: 0, 42 manu_code: 1, 43 } 44 } 45 ], 46 as: "suppliers" 47 } 48 },
At this point the result is:
[ { "_id": { "$oid": "648954d9547dee498d4f85a3" }, "code": "NEW_CATEGORY_20230614C", "group_code": "NewCategoryGroup20230614c", "enabled": "Y", "catGroup": { "code": "NewCategoryGroup20230614c", "oid": "648954d2547dee498d4f85a2" }, "suppliers": [] } ]
Note that suppliers
is an empty array, which means that we have nothing matched from the table on the right.
Each item in suppliers
contains a field called manu_code
, with which we will be using to look up from another collection, for this purpose, we need to $unwind
suppliers
field and then do a $lookup
using $suppliers.manu_code
.
50 { 51 $unwind: "$suppliers" 52 },
but oops!
[]
We conclude from this stage that by default $lookup + $unwind
is an inner join operation (assuming we will be doing $group
action in the last stage).
unwind with preserveNullAndEmptyArrays: true
As in mySQL we most of the time want to left join
instead of inner join
, if we instead
50 { 51 $unwind: { 52 path: "$suppliers", 53 preserveNullAndEmptyArrays: true 54 } 55 },
we get
[ { "_id": { "$oid": "648954d9547dee498d4f85a3" }, "code": "NEW_CATEGORY_20230614C", "group_code": "NewCategoryGroup20230614c", "enabled": "Y", "catGroup": { "code": "NewCategoryGroup20230614c", "oid": "648954d2547dee498d4f85a2" } } ]
that means row on the left table is preserved even we match nothing on the right table. This is exactly what left join
does.
If we do a look up using suppliers.manu_code
:
53 { 54 $lookup: { 55 let: { manu_code: "$suppliers.manu_code" }, 56 from: "ns_material_manus", 57 pipeline: [ 58 { 59 $match: { 60 $expr: { 61 $eq: ["$$manu_code", "$internal_code"] 62 } 63 } 64 }, 65 { 66 $project: { 67 _id: 0, 68 name_en: 1 69 } 70 }, 71 { $limit: 1 } 72 ], 73 as: "supplierDetail" 74 } 75 },
we get
[ { "_id": { "$oid": "648954d9547dee498d4f85a3" }, "code": "NEW_CATEGORY_20230614C", "group_code": "NewCategoryGroup20230614c", "enabled": "Y", "catGroup": { "code": "NewCategoryGroup20230614c", "oid": "648954d2547dee498d4f85a2" }, "supplierDetail": [] } ]
As desired.
Grouping and Handling the [ {} ]
Result
[ {} ]
ResultStarting from Nonempty Suppliers Document and Group the Results
We enforce { code: "EXTERIOR_FLOORING_COBBLE_STONE" }
at the beginning of the aggregation script, this has 3 suppliers and thefore we have 3 supplierDetails.
Up to line 75 of the aggregation script, the result will be 3 pieces of data having the following form (with suppliers
and supplierDetail
being the only varying field)
{ "_id": { "$oid": "63316d25fe1ff183f4436771" }, "code": "KITCHEN_APPLIANCES_OVEN", "group_code": "KitchenAppliances", "enabled": "Y", "catGroup": { "code": "KitchenAppliances", "oid": "63316d07fe1ff183f443676a" }, "suppliers": { "manu_code": "FOTILE" }, "supplierDetail": [ { "name_en": "FOTILE" } ] }
Next we $unwind
(just for squeezing the array) and $group
what we need:
76 { 77 $unwind: { 78 path: "$supplierDetail", 79 preserveNullAndEmptyArrays: true 80 } 81 }, 82 { 83 $group: { 84 _id: { 85 oid: { $toString: "$_id" }, 86 code: "$code", 87 enabled: "$enabled", 88 catGroup: "$catGroup" 89 }, 90 suppliers: { "$push": { name_en: "$supplierDetail.name_en" } } 91 } 92 },
which yields:
[ { "_id": { "oid": "63316d25fe1ff183f4436770", "code": "EXTERIOR_FLOORING_COBBLE_STONE", "enabled": "Y", "catGroup": { "code": "ExteriorFlooring", "oid": "62cbbf0b35fc26d4990aa2f1" } }, "suppliers": [ { "name_en": "DALEI" }, { "name_en": "EASTWOOD STONE" }, { "name_en": "STONELINK" } ] } ]
$replaceRoot
is the trick to bring everything in _id
to the top level:
99 { 100 $replaceRoot: { 101 newRoot: { 102 $mergeObjects: ["$_id", { suppliers: "$suppliers" }] 103 } 104 } 105 },
which results in:
[ { "oid": "63316d25fe1ff183f4436770", "code": "EXTERIOR_FLOORING_COBBLE_STONE", "enabled": "Y", "catGroup": { "code": "ExteriorFlooring", "oid": "62cbbf0b35fc26d4990aa2f1" }, "suppliers": [ { "name_en": "DALEI" }, { "name_en": "EASTWOOD STONE" }, { "name_en": "STONELINK" } ] } ]
We finally look up another table for technical purpose. Note the use of javascript, which is considerably tedious if we rely merely on mongo query:
106 { 107 $lookup: { 108 from: "ns_generic_form_templates", 109 let: { code: "$code" }, 110 pipeline: [ 111 { $project: { subtypes: 1 } }, 112 { 113 $addFields: { 114 formAnalysis: { 115 $function: { 116 body: function (code, subtypes, formId) { 117 if (!subtypes) { 118 return null 119 } else { 120 const hasForm = subtypes.indexOf(code) > -1; 121 122 return { category_code: code, hasForm, formOid: formId } 123 } 124 }, 125 args: ["$$code", "$subtypes", "$_id"], 126 lang: 'js' 127 } 128 } 129 } 130 }, 131 { $project: { subtypes: 0 } }, 132 { 133 $match: { "formAnalysis.hasForm": true } 134 }, 135 { $limit: 1 }, 136 { $replaceRoot: { newRoot: "$formAnalysis" } }, 137 { $set: { formOid: { $toString: "$formOid" } } }, 138 ], 139 as: "genericForm" 140 } 141 },
We next $unwind
again simply for sequeezing the array into a single object:
142 { 143 $unwind: { 144 path: "$genericForm", 145 preserveNullAndEmptyArrays: true 146 } 147 },
And we get a pretty result:
[ { "oid": "63316d25fe1ff183f4436770", "code": "EXTERIOR_FLOORING_COBBLE_STONE", "enabled": "Y", "catGroup": { "code": "ExteriorFlooring", "oid": "62cbbf0b35fc26d4990aa2f1" }, "suppliers": [ { "name_en": "DALEI" }, { "name_en": "EASTWOOD STONE" }, { "name_en": "STONELINK" } ], "genericForm": { "category_code": "EXTERIOR_FLOORING_COBBLE_STONE", "hasForm": true, "formOid": "62972948261ac272d152e238" } } ]
Starting from Empty Suppliers Document and Deal with [ {} ]
[ {} ]
We go back to { code: "NEW_CATEGORY_20230614C" }
which has no suppliers. We run through all the same script to line 147 as above, then the result becomes
[ { "oid": "648954d9547dee498d4f85a3", "code": "NEW_CATEGORY_20230614C", "enabled": "Y", "catGroup": { "code": "NewCategoryGroup20230614c", "oid": "648954d2547dee498d4f85a2" }, "suppliers": [ {} ] } ]
To make [ {} ]
into []
, we do a filtering:
149 { 150 $set: { 151 suppliers: { 152 $filter: { 153 input: "$suppliers", 154 cond: { 155 $ifNull: ["$$this.name_en", false] 156 } 157 } 158 } 159 } 160 } 161 ] 162)
and we get:
[ { "oid": "648954d9547dee498d4f85a3", "code": "NEW_CATEGORY_20230614C", "enabled": "Y", "catGroup": { "code": "NewCategoryGroup20230614c", "oid": "648954d2547dee498d4f85a2" }, "suppliers": [] } ]