Matching ObjectId to String for $graphLookup
I'm trying to run a $graphLookup as demonstrated in the screenshot below:



$graphLookup sample



The objective is, given a specific record (see the commented-out $match there), to retrieve its full "path" through the immediateAncestors property. As you can see, that's not happening.



I introduced $convert here to treat the collection's _id as a string, believing it could then "match" the _id values in the immediateAncestors list (which are strings).
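
For reference, the pipeline in that screenshot was roughly of this shape (reconstructed from the description above, since the original is only an image; the stage layout and the output field name ancestorPath are assumptions):

```
db.collection.aggregate([
  // { "$match": { "_id": ObjectId("...") } },   // the commented-out $match
  { "$addFields": {
    "_id": { "$convert": { "input": "$_id", "to": "string" } }
  }},
  { "$graphLookup": {
    "from": "collection",
    "startWith": "$immediateAncestors._id",
    "connectFromField": "immediateAncestors._id",
    "connectToField": "_id",
    "as": "ancestorPath"
  }}
])
```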



So I ran another test with different data (no ObjectIds involved):



db.nodos.insert({"id":5,"name":"cinco","children":[{"id":4}]})
db.nodos.insert({"id":4,"name":"quatro","ancestors":[{"id":5}],"children":[{"id":3}]})
db.nodos.insert({"id":6,"name":"seis","children":[{"id":3}]})
db.nodos.insert({"id":1,"name":"um","children":[{"id":2}]})
db.nodos.insert({"id":2,"name":"dois","ancestors":[{"id":1}],"children":[{"id":3}]})
db.nodos.insert({"id":3,"name":"três","ancestors":[{"id":2},{"id":4},{"id":6}]})
db.nodos.insert({"id":7,"name":"sete","children":[{"id":5}]})


And the query:



db.nodos.aggregate( [
  { "$match": { "id": 3 } },
  { "$graphLookup": {
    "from": "nodos",
    "startWith": "$ancestors.id",
    "connectFromField": "ancestors.id",
    "connectToField": "id",
    "as": "ANCESTORS_FROM_BEGINNING"
  }},
  { "$project": {
    "name": 1,
    "id": 1,
    "ANCESTORS_FROM_BEGINNING": "$ANCESTORS_FROM_BEGINNING.id"
  }}
] )


...which outputs what I was expecting (the five records directly and indirectly connected to the one with id 3):




{
  "_id" : ObjectId("5afe270fb4719112b613f1b4"),
  "id" : 3.0,
  "name" : "três",
  "ANCESTORS_FROM_BEGINNING" : [
    1.0,
    4.0,
    6.0,
    5.0,
    2.0
  ]
}


The question is: is there a way to achieve the objective I mentioned at the beginning?



I'm running MongoDB 3.7.9 (from the official Docker image).



Thanks in advance!










  • Not the way you seem to think it would. The problem here would appear to be that the "from" in $graphLookup is getting data from the "collection" and not the $project on successive recursions. You could try a "view" which does the projection and use that as the "from" source. Also you probably just want the aliased $toString or $toObjectId in this context, since there's no practical purpose for the onError in your context.

    – Neil Lunn
    May 18 '18 at 1:31

  • Keep in mind though that using $convert or its aliases for this specific purpose is really a "band aid" and not a practical solution. The "real solution" is to make sure the data which "looks like an ObjectId" is actually of an ObjectId type in ALL places where you record it. This kind of "brute force" was never the intention of the casting conversions being introduced.

    – Neil Lunn
    May 18 '18 at 1:33

  • Thanks for the considerations, Neil. Do you think that storing a true ObjectId there will make things better? If it happens that I could manage that, will the query work without $project and $convert?

    – Cesar
    May 18 '18 at 2:17

  • Sorry, been away. Your "screenshot" does not really help here. It would be better to discover the JSON view of the data and paste some of that into the question instead. My general opinion is that the "brute force" you are trying to do is not the correct approach, and instead the types should be converted in the data so that they actually match without coercion. But putting some clear data in the post we can use with copy/paste makes that easier to explain.

    – Neil Lunn
    May 18 '18 at 3:35

















mongodb mongodb-query aggregation-framework






edited May 18 '18 at 6:26 by Neil Lunn
asked May 18 '18 at 1:18 by Cesar
1 Answer
You are currently using a development version of MongoDB which has some features enabled that are expected to be released officially with MongoDB 4.0. Note that some features may be subject to change before the final release, so be aware of this before you commit production code to it.



Why $convert fails here



Probably the best way to explain this is to look at your altered sample, but replacing the top-level _id values with ObjectIds and using "strings" for those under the arrays:




{
  "_id" : ObjectId("5afe5763419503c46544e272"),
  "name" : "cinco",
  "children" : [ { "_id" : "5afe5763419503c46544e273" } ]
},
{
  "_id" : ObjectId("5afe5763419503c46544e273"),
  "name" : "quatro",
  "ancestors" : [ { "_id" : "5afe5763419503c46544e272" } ],
  "children" : [ { "_id" : "5afe5763419503c46544e277" } ]
},
{
  "_id" : ObjectId("5afe5763419503c46544e274"),
  "name" : "seis",
  "children" : [ { "_id" : "5afe5763419503c46544e277" } ]
},
{
  "_id" : ObjectId("5afe5763419503c46544e275"),
  "name" : "um",
  "children" : [ { "_id" : "5afe5763419503c46544e276" } ]
},
{
  "_id" : ObjectId("5afe5763419503c46544e276"),
  "name" : "dois",
  "ancestors" : [ { "_id" : "5afe5763419503c46544e275" } ],
  "children" : [ { "_id" : "5afe5763419503c46544e277" } ]
},
{
  "_id" : ObjectId("5afe5763419503c46544e277"),
  "name" : "três",
  "ancestors" : [
    { "_id" : "5afe5763419503c46544e273" },
    { "_id" : "5afe5763419503c46544e274" },
    { "_id" : "5afe5763419503c46544e276" }
  ]
},
{
  "_id" : ObjectId("5afe5764419503c46544e278"),
  "name" : "sete",
  "children" : [ { "_id" : "5afe5763419503c46544e272" } ]
}



That should give a general simulation of what you were trying to work with.



What you attempted was to convert the _id value into a "string" via $project before entering the $graphLookup stage. The reason this fails is that whilst you did an initial $project "within" this pipeline, the source for $graphLookup in the "from" option is still the unaltered collection, and therefore you don't get the correct details on the subsequent "lookup" iterations.



db.strcoll.aggregate([
  { "$match": { "name": "três" } },
  { "$addFields": {
    "_id": { "$toString": "$_id" }
  }},
  { "$graphLookup": {
    "from": "strcoll",
    "startWith": "$ancestors._id",
    "connectFromField": "ancestors._id",
    "connectToField": "_id",
    "as": "ANCESTORS_FROM_BEGINNING"
  }},
  { "$project": {
    "name": 1,
    "ANCESTORS_FROM_BEGINNING": "$ANCESTORS_FROM_BEGINNING._id"
  }}
])


Does not match on the "lookup" therefore:




{
  "_id" : "5afe5763419503c46544e277",
  "name" : "três",
  "ANCESTORS_FROM_BEGINNING" : [ ]
}



"Patching" the problem



However, that is the core problem, and not a failing of $convert or its aliases itself. In order to make this actually work we can instead create a "view" which presents itself as a collection for the sake of input.



I'll do this the other way around and convert the "strings" to ObjectId via $toObjectId:



db.createView("idview","strcoll",[
  { "$addFields": {
    "ancestors": {
      "$ifNull": [
        { "$map": {
          "input": "$ancestors",
          "in": { "_id": { "$toObjectId": "$$this._id" } }
        }},
        "$$REMOVE"
      ]
    },
    "children": {
      "$ifNull": [
        { "$map": {
          "input": "$children",
          "in": { "_id": { "$toObjectId": "$$this._id" } }
        }},
        "$$REMOVE"
      ]
    }
  }}
])


Using the "view" however means that the data is consistently seen with the values converted. So the following aggregation using the view:



db.idview.aggregate([
  { "$match": { "name": "três" } },
  { "$graphLookup": {
    "from": "idview",
    "startWith": "$ancestors._id",
    "connectFromField": "ancestors._id",
    "connectToField": "_id",
    "as": "ANCESTORS_FROM_BEGINNING"
  }},
  { "$project": {
    "name": 1,
    "ANCESTORS_FROM_BEGINNING": "$ANCESTORS_FROM_BEGINNING._id"
  }}
])


Returns the expected output:




{
  "_id" : ObjectId("5afe5763419503c46544e277"),
  "name" : "três",
  "ANCESTORS_FROM_BEGINNING" : [
    ObjectId("5afe5763419503c46544e275"),
    ObjectId("5afe5763419503c46544e273"),
    ObjectId("5afe5763419503c46544e274"),
    ObjectId("5afe5763419503c46544e276"),
    ObjectId("5afe5763419503c46544e272")
  ]
}



Fixing the problem



With all of that said, the real issue here is that you have some data which "looks like" an ObjectId value, and is in fact valid as an ObjectId, but has been recorded as a "string". The basic obstacle to everything working as it should is that the two "types" are not the same, and this results in an equality mismatch as the "joins" are attempted.



So the real fix is still the same as it always has been: go through the data and fix it so that the "strings" are actually also ObjectId values. These will then match the _id keys which they are meant to refer to, and you save a considerable amount of storage space, since an ObjectId takes up a lot less space to store than its string representation in hexadecimal characters.
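
As a rough check on that size claim (simple arithmetic on the BSON element layout; the field-name bytes are identical either way and ignored here):

```javascript
// Per-value BSON sizes for the same 24-hex-character identifier.
const objectIdBytes = 12;            // an ObjectId is a fixed 12-byte value
const hexStringBytes = 4 + 24 + 1;   // int32 length prefix + 24 chars + NUL

console.log(objectIdBytes);          // 12
console.log(hexStringBytes);         // 29
```

So each identifier stored as a string costs roughly 2.4 times the bytes of the equivalent ObjectId, before any index overhead.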



Using MongoDB 4.0 methods, you "could" actually use "$toObjectId" in order to write a new collection, in much the same manner as we created the "view" earlier:



db.strcoll.aggregate([
  { "$addFields": {
    "ancestors": {
      "$ifNull": [
        { "$map": {
          "input": "$ancestors",
          "in": { "_id": { "$toObjectId": "$$this._id" } }
        }},
        "$$REMOVE"
      ]
    },
    "children": {
      "$ifNull": [
        { "$map": {
          "input": "$children",
          "in": { "_id": { "$toObjectId": "$$this._id" } }
        }},
        "$$REMOVE"
      ]
    }
  }},
  { "$out": "fixedcol" }
])


Or of course, where you "need" to keep the same collection, the traditional "loop and update" remains the same as what has always been required:



var updates = [];

db.strcoll.find().forEach(doc => {
  var update = { '$set': {} };

  if ( doc.hasOwnProperty('children') )
    update.$set.children = doc.children.map(e => ({ _id: new ObjectId(e._id) }));
  if ( doc.hasOwnProperty('ancestors') )
    update.$set.ancestors = doc.ancestors.map(e => ({ _id: new ObjectId(e._id) }));

  updates.push({
    "updateOne": {
      "filter": { "_id": doc._id },
      update
    }
  });

  if ( updates.length > 1000 ) {
    db.strcoll.bulkWrite(updates);
    updates = [];
  }

})

if ( updates.length > 0 ) {
  db.strcoll.bulkWrite(updates);
  updates = [];
}



Which is actually a bit of a "sledgehammer" due to actually overwriting the entire array in a single go. Not a great idea for a production environment, but enough as a demonstration for the purposes of this exercise.



Conclusion



So whilst MongoDB 4.0 will add these "casting" features which can indeed be very useful, their actual intent is not really for cases such as this. They are in fact much more useful as demonstrated in the "conversion" to a new collection using an aggregation pipeline than most other possible uses.



Whilst we "can" create a "view" which transforms the data types to enable things like $lookup and $graphLookup to work where the actual collection data differs, this really is only a "band-aid" on the real problem as the data types really should not differ, and should in fact be permanently converted.



Using a "view" actually means that the aggregation pipeline for construction needs to effectively run every time the "collection" ( actually a "view" ) is accessed, which creates a real overhead.



Avoiding overhead is usually a design goal, therefore correcting such data storage mistakes is imperative to getting real performance out of your application, rather than just working with "brute force" that will only slow things down.




A much safer "conversion" script applies "matched" updates to each array element. The code here requires NodeJS v10.x and a latest release MongoDB node driver 3.1.x:



const { MongoClient, ObjectID: ObjectId } = require('mongodb');
const EJSON = require('mongodb-extended-json');

const uri = 'mongodb://localhost/';

const log = data => console.log(EJSON.stringify(data, undefined, 2));

(async function() {

  try {

    const client = await MongoClient.connect(uri);
    let db = client.db('test');
    let coll = db.collection('strcoll');

    let fields = ["ancestors", "children"];

    let cursor = coll.find({
      $or: fields.map(f => ({ [`${f}._id`]: { "$type": "string" } }))
    }).project(fields.reduce((o,f) => ({ ...o, [f]: 1 }),{}));

    let batch = [];

    for await ( let { _id, ...doc } of cursor ) {

      let $set = {};
      let arrayFilters = [];

      for ( const f of fields ) {
        if ( doc.hasOwnProperty(f) ) {
          $set = { ...$set,
            ...doc[f].reduce((o,{ _id },i) =>
              ({ ...o, [`${f}.$[${f.substr(0,1)}${i}]._id`]: ObjectId(_id) }),
            {})
          };

          arrayFilters = [ ...arrayFilters,
            ...doc[f].map(({ _id },i) =>
              ({ [`${f.substr(0,1)}${i}._id`]: _id }))
          ];
        }
      }

      if (arrayFilters.length > 0)
        batch = [ ...batch,
          { updateOne: { filter: { _id }, update: { $set }, arrayFilters } }
        ];

      if ( batch.length > 1000 ) {
        let result = await coll.bulkWrite(batch);
        batch = [];
      }

    }

    if ( batch.length > 0 ) {
      log({ batch });
      let result = await coll.bulkWrite(batch);
      log({ result });
    }

    await client.close();

  } catch(e) {
    console.error(e)
  } finally {
    process.exit()
  }

})()


Produces and executes bulk operations like these for the seven documents:




"updateOne":
"filter":
"_id":
"$oid": "5afe5763419503c46544e272"

,
"update":
"$set":
"children.$[c0]._id":
"$oid": "5afe5763419503c46544e273"


,
"arrayFilters": [

"c0._id": "5afe5763419503c46544e273"

]

,

"updateOne":
"filter":
"_id":
"$oid": "5afe5763419503c46544e273"

,
"update":
"$set":
"ancestors.$[a0]._id":
"$oid": "5afe5763419503c46544e272"
,
"children.$[c0]._id":
"$oid": "5afe5763419503c46544e277"


,
"arrayFilters": [

"a0._id": "5afe5763419503c46544e272"
,

"c0._id": "5afe5763419503c46544e277"

]

,

"updateOne":
"filter":
"_id":
"$oid": "5afe5763419503c46544e274"

,
"update":
"$set":
"children.$[c0]._id":
"$oid": "5afe5763419503c46544e277"


,
"arrayFilters": [

"c0._id": "5afe5763419503c46544e277"

]

,

"updateOne":
"filter":
"_id":
"$oid": "5afe5763419503c46544e275"

,
"update":
"$set":
"children.$[c0]._id":
"$oid": "5afe5763419503c46544e276"


,
"arrayFilters": [

"c0._id": "5afe5763419503c46544e276"

]

,

"updateOne":
"filter":
"_id":
"$oid": "5afe5763419503c46544e276"

,
"update":
"$set":
"ancestors.$[a0]._id":
"$oid": "5afe5763419503c46544e275"
,
"children.$[c0]._id":
"$oid": "5afe5763419503c46544e277"


,
"arrayFilters": [

"a0._id": "5afe5763419503c46544e275"
,

"c0._id": "5afe5763419503c46544e277"

]

,

"updateOne":
"filter":
"_id":
"$oid": "5afe5763419503c46544e277"

,
"update":
"$set":
"ancestors.$[a0]._id":
"$oid": "5afe5763419503c46544e273"
,
"ancestors.$[a1]._id":
"$oid": "5afe5763419503c46544e274"
,
"ancestors.$[a2]._id":
"$oid": "5afe5763419503c46544e276"


,
"arrayFilters": [

"a0._id": "5afe5763419503c46544e273"
,

"a1._id": "5afe5763419503c46544e274"
,

"a2._id": "5afe5763419503c46544e276"

]

,

"updateOne":
"filter":
"_id":
"$oid": "5afe5764419503c46544e278"

,
"update":
"$set":
"children.$[c0]._id":
"$oid": "5afe5763419503c46544e272"


,
"arrayFilters": [

"c0._id": "5afe5763419503c46544e272"

]
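Each `updateOne` pairs a filtered positional `$set` path (e.g. `children.$[c0]._id`) with a matching `arrayFilters` entry, so only the targeted array element is rewritten. The derivation of those pairs from a single document can be isolated as a pure function; this is just a sketch mirroring the loop in the script above, with the driver's `ObjectId` wrapping left out so it runs standalone on plain strings:

```javascript
// Sketch: build the $set paths and arrayFilters for one document,
// mirroring the batch-building loop above. In the real script each
// $set value would be wrapped as ObjectId(_id); here values stay
// as strings so the function runs without the driver.
function buildUpdate(doc, fields = ["ancestors", "children"]) {
  let $set = {};
  let arrayFilters = [];
  for (const f of fields) {
    if (!doc.hasOwnProperty(f)) continue;
    doc[f].forEach(({ _id }, i) => {
      const tag = `${f.substr(0, 1)}${i}`;   // "a0", "c0", "c1", ...
      $set[`${f}.$[${tag}]._id`] = _id;      // filtered positional path
      arrayFilters.push({ [`${tag}._id`]: _id });
    });
  }
  return { $set, arrayFilters };
}

const { $set, arrayFilters } = buildUpdate({
  ancestors: [{ _id: "5afe5763419503c46544e272" }],
  children:  [{ _id: "5afe5763419503c46544e277" }]
});
console.log($set);
// { 'ancestors.$[a0]._id': '5afe5763419503c46544e272',
//   'children.$[c0]._id': '5afe5763419503c46544e277' }
```

The identifier in each `$[...]` placeholder must match an identifier used in `arrayFilters`, which is why both are generated from the same `tag`.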







  • Awesome explanation! I'll stay with performance and go with "clean" fields from the beginning 🙄

    – Cesar
    May 18 '18 at 11:18











  • @Cesar It was worth writing because you won't be the last to ask it. People are already asking it now who are not yet aware that these "conversion" functions are coming. Once it hits production I would expect the number of people attempting it to only increase. So "thanks for asking" I guess.

    – Neil Lunn
    May 18 '18 at 11:22












"filter":
"_id":
"$oid": "5afe5763419503c46544e276"

,
"update":
"$set":
"ancestors.$[a0]._id":
"$oid": "5afe5763419503c46544e275"
,
"children.$[c0]._id":
"$oid": "5afe5763419503c46544e277"


,
"arrayFilters": [

"a0._id": "5afe5763419503c46544e275"
,

"c0._id": "5afe5763419503c46544e277"

]

,

"updateOne":
"filter":
"_id":
"$oid": "5afe5763419503c46544e277"

,
"update":
"$set":
"ancestors.$[a0]._id":
"$oid": "5afe5763419503c46544e273"
,
"ancestors.$[a1]._id":
"$oid": "5afe5763419503c46544e274"
,
"ancestors.$[a2]._id":
"$oid": "5afe5763419503c46544e276"


,
"arrayFilters": [

"a0._id": "5afe5763419503c46544e273"
,

"a1._id": "5afe5763419503c46544e274"
,

"a2._id": "5afe5763419503c46544e276"

]

,

"updateOne":
"filter":
"_id":
"$oid": "5afe5764419503c46544e278"

,
"update":
"$set":
"children.$[c0]._id":
"$oid": "5afe5763419503c46544e272"


,
"arrayFilters": [

"c0._id": "5afe5763419503c46544e272"

]







share|improve this answer















You are currently using a development version of MongoDB which has some features enabled that are expected to ship with MongoDB 4.0 as an official release. Note that some features may be subject to change before the final release, so be aware of this before committing production code to them.



Why $convert fails here



Probably the best way to explain this is to look at your altered sample, but replacing with ObjectId values for _id and "strings" for those under the arrays:




{
    "_id" : ObjectId("5afe5763419503c46544e272"),
    "name" : "cinco",
    "children" : [ { "_id" : "5afe5763419503c46544e273" } ]
},
{
    "_id" : ObjectId("5afe5763419503c46544e273"),
    "name" : "quatro",
    "ancestors" : [ { "_id" : "5afe5763419503c46544e272" } ],
    "children" : [ { "_id" : "5afe5763419503c46544e277" } ]
},
{
    "_id" : ObjectId("5afe5763419503c46544e274"),
    "name" : "seis",
    "children" : [ { "_id" : "5afe5763419503c46544e277" } ]
},
{
    "_id" : ObjectId("5afe5763419503c46544e275"),
    "name" : "um",
    "children" : [ { "_id" : "5afe5763419503c46544e276" } ]
},
{
    "_id" : ObjectId("5afe5763419503c46544e276"),
    "name" : "dois",
    "ancestors" : [ { "_id" : "5afe5763419503c46544e275" } ],
    "children" : [ { "_id" : "5afe5763419503c46544e277" } ]
},
{
    "_id" : ObjectId("5afe5763419503c46544e277"),
    "name" : "três",
    "ancestors" : [
        { "_id" : "5afe5763419503c46544e273" },
        { "_id" : "5afe5763419503c46544e274" },
        { "_id" : "5afe5763419503c46544e276" }
    ]
},
{
    "_id" : ObjectId("5afe5764419503c46544e278"),
    "name" : "sete",
    "children" : [ { "_id" : "5afe5763419503c46544e272" } ]
}


That should give a general simulation of what you were trying to work with.



What you attempted was to convert the _id value into a "string" via $project before entering the $graphLookup stage. The reason this fails is that whilst you did an initial $project "within" this pipeline, the source for $graphLookup in the "from" option is still the unaltered collection, and therefore you don't get the correct details on the subsequent "lookup" iterations.



db.strcoll.aggregate([
  { "$match": { "name": "três" } },
  { "$addFields": {
    "_id": { "$toString": "$_id" }
  }},
  { "$graphLookup": {
    "from": "strcoll",
    "startWith": "$ancestors._id",
    "connectFromField": "ancestors._id",
    "connectToField": "_id",
    "as": "ANCESTORS_FROM_BEGINNING"
  }},
  { "$project": {
    "name": 1,
    "ANCESTORS_FROM_BEGINNING": "$ANCESTORS_FROM_BEGINNING._id"
  }}
])


Does not match on the "lookup" therefore:




{
    "_id" : "5afe5763419503c46544e277",
    "name" : "três",
    "ANCESTORS_FROM_BEGINNING" : [ ]
}



"Patching" the problem



However that is the core problem, and not a failing of $convert or its aliases itself. In order to make this actually work we can instead create a "view" which presents itself as a collection for the sake of input.



I'll do this the other way around and convert the "strings" to ObjectId via $toObjectId:



db.createView("idview","strcoll",[
  { "$addFields": {
    "ancestors": {
      "$ifNull": [
        { "$map": {
          "input": "$ancestors",
          "in": { "_id": { "$toObjectId": "$$this._id" } }
        }},
        "$$REMOVE"
      ]
    },
    "children": {
      "$ifNull": [
        { "$map": {
          "input": "$children",
          "in": { "_id": { "$toObjectId": "$$this._id" } }
        }},
        "$$REMOVE"
      ]
    }
  }}
])


Using the "view" however means that the data is consistently seen with the values converted. So the following aggregation using the view:



db.idview.aggregate([
  { "$match": { "name": "três" } },
  { "$graphLookup": {
    "from": "idview",
    "startWith": "$ancestors._id",
    "connectFromField": "ancestors._id",
    "connectToField": "_id",
    "as": "ANCESTORS_FROM_BEGINNING"
  }},
  { "$project": {
    "name": 1,
    "ANCESTORS_FROM_BEGINNING": "$ANCESTORS_FROM_BEGINNING._id"
  }}
])


Returns the expected output:




{
    "_id" : ObjectId("5afe5763419503c46544e277"),
    "name" : "três",
    "ANCESTORS_FROM_BEGINNING" : [
        ObjectId("5afe5763419503c46544e275"),
        ObjectId("5afe5763419503c46544e273"),
        ObjectId("5afe5763419503c46544e274"),
        ObjectId("5afe5763419503c46544e276"),
        ObjectId("5afe5763419503c46544e272")
    ]
}


Fixing the problem



With all of that said, the real issue here is that you have some data which "looks like" an ObjectId value and is in fact valid as an ObjectId, however it has been recorded as a "string". The basic issue to everything working as it should is that the two "types" are not the same and this results in an equality mismatch as the "joins" are attempted.



So the real fix is still the same as it always has been, which is to instead go through the data and fix it so that the "strings" are actually also ObjectId values. These will then match the _id keys which they are meant to refer to, and you save a considerable amount of storage space, since an ObjectId takes up a lot less space to store than its string representation in hexadecimal characters.



Using MongoDB 4.0 methods, you "could" actually use "$toObjectId" in order to write a new collection, in much the same manner as we created the "view" earlier:



db.strcoll.aggregate([
  { "$addFields": {
    "ancestors": {
      "$ifNull": [
        { "$map": {
          "input": "$ancestors",
          "in": { "_id": { "$toObjectId": "$$this._id" } }
        }},
        "$$REMOVE"
      ]
    },
    "children": {
      "$ifNull": [
        { "$map": {
          "input": "$children",
          "in": { "_id": { "$toObjectId": "$$this._id" } }
        }},
        "$$REMOVE"
      ]
    }
  }},
  { "$out": "fixedcol" }
])


Or of course where you "need" to keep the same collection, then the traditional "loop and update" remains the same as what has always been required:



var updates = [];

db.strcoll.find().forEach(doc => {
  var update = { '$set': {} };

  if ( doc.hasOwnProperty('children') )
    update.$set.children = doc.children.map(e => ({ _id: new ObjectId(e._id) }));
  if ( doc.hasOwnProperty('ancestors') )
    update.$set.ancestors = doc.ancestors.map(e => ({ _id: new ObjectId(e._id) }));

  updates.push({
    "updateOne": {
      "filter": { "_id": doc._id },
      update
    }
  });

  if ( updates.length > 1000 ) {
    db.strcoll.bulkWrite(updates);
    updates = [];
  }
})

if ( updates.length > 0 ) {
  db.strcoll.bulkWrite(updates);
  updates = [];
}



This is actually a bit of a "sledgehammer", since it overwrites the entire array in a single go. Not a great idea for a production environment, but enough as a demonstration for the purposes of this exercise.



Conclusion



So whilst MongoDB 4.0 will add these "casting" features which can indeed be very useful, their actual intent is not really for cases such as this. They are in fact much more useful as demonstrated in the "conversion" to a new collection using an aggregation pipeline than most other possible uses.



Whilst we "can" create a "view" which transforms the data types to enable things like $lookup and $graphLookup to work where the actual collection data differs, this really is only a "band-aid" on the real problem as the data types really should not differ, and should in fact be permanently converted.



Using a "view" actually means that the aggregation pipeline for construction needs to effectively run every time the "collection" ( actually a "view" ) is accessed, which creates a real overhead.



Avoiding overhead is usually a design goal, therefore correcting such data storage mistakes is imperative to getting real performance out of your application, rather than just working with "brute force" that will only slow things down.




A much safer "conversion" script applies "matched" updates to each array element instead. The code here requires NodeJS v10.x and the latest release of the MongoDB node driver, 3.1.x:



const { MongoClient, ObjectID: ObjectId } = require('mongodb');
const EJSON = require('mongodb-extended-json');

const uri = 'mongodb://localhost/';

const log = data => console.log(EJSON.stringify(data, undefined, 2));

(async function() {

  try {

    const client = await MongoClient.connect(uri);
    let db = client.db('test');
    let coll = db.collection('strcoll');

    let fields = ["ancestors", "children"];

    let cursor = coll.find({
      $or: fields.map(f => ({ [`${f}._id`]: { "$type": "string" } }))
    }).project(fields.reduce((o,f) => ({ ...o, [f]: 1 }),{}));

    let batch = [];

    for await ( let { _id, ...doc } of cursor ) {

      let $set = {};
      let arrayFilters = [];

      for ( const f of fields ) {
        if ( doc.hasOwnProperty(f) ) {
          $set = { ...$set,
            ...doc[f].reduce((o,{ _id },i) =>
              ({ ...o, [`${f}.$[${f.substr(0,1)}${i}]._id`]: ObjectId(_id) }),
            {})
          };

          arrayFilters = [ ...arrayFilters,
            ...doc[f].map(({ _id },i) =>
              ({ [`${f.substr(0,1)}${i}._id`]: _id }))
          ];
        }
      }

      if (arrayFilters.length > 0)
        batch = [ ...batch,
          { updateOne: { filter: { _id }, update: { $set }, arrayFilters } }
        ];

      if ( batch.length > 1000 ) {
        let result = await coll.bulkWrite(batch);
        batch = [];
      }

    }

    if ( batch.length > 0 ) {
      log( batch );
      let result = await coll.bulkWrite(batch);
      log( result );
    }

    await client.close();

  } catch(e) {
    console.error(e)
  } finally {
    process.exit()
  }

})()


Produces and executes bulk operations like these for the seven documents:




[
  {
    "updateOne": {
      "filter": {
        "_id": { "$oid": "5afe5763419503c46544e272" }
      },
      "update": {
        "$set": {
          "children.$[c0]._id": { "$oid": "5afe5763419503c46544e273" }
        }
      },
      "arrayFilters": [
        { "c0._id": "5afe5763419503c46544e273" }
      ]
    }
  },
  {
    "updateOne": {
      "filter": {
        "_id": { "$oid": "5afe5763419503c46544e273" }
      },
      "update": {
        "$set": {
          "ancestors.$[a0]._id": { "$oid": "5afe5763419503c46544e272" },
          "children.$[c0]._id": { "$oid": "5afe5763419503c46544e277" }
        }
      },
      "arrayFilters": [
        { "a0._id": "5afe5763419503c46544e272" },
        { "c0._id": "5afe5763419503c46544e277" }
      ]
    }
  },
  {
    "updateOne": {
      "filter": {
        "_id": { "$oid": "5afe5763419503c46544e274" }
      },
      "update": {
        "$set": {
          "children.$[c0]._id": { "$oid": "5afe5763419503c46544e277" }
        }
      },
      "arrayFilters": [
        { "c0._id": "5afe5763419503c46544e277" }
      ]
    }
  },
  {
    "updateOne": {
      "filter": {
        "_id": { "$oid": "5afe5763419503c46544e275" }
      },
      "update": {
        "$set": {
          "children.$[c0]._id": { "$oid": "5afe5763419503c46544e276" }
        }
      },
      "arrayFilters": [
        { "c0._id": "5afe5763419503c46544e276" }
      ]
    }
  },
  {
    "updateOne": {
      "filter": {
        "_id": { "$oid": "5afe5763419503c46544e276" }
      },
      "update": {
        "$set": {
          "ancestors.$[a0]._id": { "$oid": "5afe5763419503c46544e275" },
          "children.$[c0]._id": { "$oid": "5afe5763419503c46544e277" }
        }
      },
      "arrayFilters": [
        { "a0._id": "5afe5763419503c46544e275" },
        { "c0._id": "5afe5763419503c46544e277" }
      ]
    }
  },
  {
    "updateOne": {
      "filter": {
        "_id": { "$oid": "5afe5763419503c46544e277" }
      },
      "update": {
        "$set": {
          "ancestors.$[a0]._id": { "$oid": "5afe5763419503c46544e273" },
          "ancestors.$[a1]._id": { "$oid": "5afe5763419503c46544e274" },
          "ancestors.$[a2]._id": { "$oid": "5afe5763419503c46544e276" }
        }
      },
      "arrayFilters": [
        { "a0._id": "5afe5763419503c46544e273" },
        { "a1._id": "5afe5763419503c46544e274" },
        { "a2._id": "5afe5763419503c46544e276" }
      ]
    }
  },
  {
    "updateOne": {
      "filter": {
        "_id": { "$oid": "5afe5764419503c46544e278" }
      },
      "update": {
        "$set": {
          "children.$[c0]._id": { "$oid": "5afe5763419503c46544e272" }
        }
      },
      "arrayFilters": [
        { "c0._id": "5afe5763419503c46544e272" }
      ]
    }
  }
]








edited May 19 '18 at 4:45
answered May 18 '18 at 6:24

Neil Lunn
  • Awesome explanation! I'll stay with performance and go with "clean" fields from the beginning 🙄

    – Cesar
    May 18 '18 at 11:18











  • @Cesar It was worth writing because you won't be the last to ask it. People already ask it now who are not yet aware that these "conversion" functions are coming. Once it hits production I would expect the number of people attempting it to only increase. So "thanks for asking" I guess.

    – Neil Lunn
    May 18 '18 at 11:22

















