Calculating the frequency counter of all the unique values of a nested field
I want to find the frequency count of every unique value of a nested field in a MongoDB document.
To be more specific, suppose my collection, db['sample'], consists of the following documents:
    {'a': 1, 'b': {'c': 25, 'd': "x", 'e': 36}},
    {'a': 2, 'b': {'c': 5, 'd': "xx", 'e': 36}},
    {'a': 33, 'b': {'c': 25, 'd': "xx", 'e': 36}},
    {'a': 17, 'b': {'c': 25, 'd': "xxx", 'e': 36}},
How can I get the frequency count of each unique value of the field 'd'? That is, my output should be {'d': {"xx": 2, "x": 1, "xxx": 1}}
Is this even possible? I'd appreciate any help on this. Thank you.
I looked up the documentation for aggregation and for the $objectToArray operator, which converts the map to an array, and tried the following in PyMongo:
1)
    db['sample'].aggregate([
        {"$addFields": {"b": {"$objectToArray": "$b"}}},
        {"$unwind": "$b"},
        {"$group": {"_id": "$b.k",
                    "count": {"$sum": "$b.v"}}}
    ])
This gives the cumulative sum of each field's values where that is possible, for example 'c' : 25 + 5 + 25 + 25.
2)
    db['sample'].aggregate([
        {"$addFields": {"b": {"$objectToArray": "$b"}}},
        {"$unwind": "$b"},
        {"$group": {"_id": "$b.k",
                    "count": {"$sum": 1}}}
    ])
This gives the total number of times each field is present in the documents: 'c' : 4, 'd' : 4, etc.
python mongodb aggregation-framework pymongo
edited Mar 24 at 9:05 by Neil Lunn
asked Mar 24 at 8:25 by pfatagaga
1 Answer
You are basically approaching this the wrong way around. You have a clear path to "b.d" as the key you want to aggregate on, so there is no need to convert this to an array:
    cursor = db.sample.aggregate([
        { "$group": {
            "_id": "$b.d",
            "count": { "$sum": 1 }
        }},
        { "$group": {
            "_id": None,
            "data": { "$push": { "k": "$_id", "v": "$count" } }
        }},
        { "$replaceRoot": {
            "newRoot": { "$arrayToObject": "$data" }
        }}
    ])

    for doc in cursor:
        print(doc)
Returns
    {'x': 1, 'xx': 2, 'xxx': 1}
But that is actually overkill, since in reality all of the work was done in that initial $group statement. All you really need to do is run that stage, fetch the results, and combine them into a single dictionary as the desired output:
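    from functools import reduce  # reduce is not a builtin on Python 3

    cursor = db.sample.aggregate([
        { "$group": {
            "_id": "$b.d",
            "count": { "$sum": 1 }
        }}
    ])

    data = list(cursor)

    # Fold each { "_id": value, "count": n } document into one dictionary.
    # Dict unpacking is used because dict.items() views cannot be joined with + on Python 3.
    result = reduce(
        lambda x, y: {**x, y['_id']: y['count']},
        data, {})

    print(result)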
Which returns exactly the same thing:
    {'x': 1, 'xx': 2, 'xxx': 1}
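If reduce feels heavy for this, the same fold can be written as a dict comprehension. This is only a minimal sketch, assuming each result document carries `_id` and `count` fields as produced by the `$group` stage; `counts_from_group` is a hypothetical helper name, and `group_output` is hard-coded sample data standing in for `list(cursor)` rather than anything fetched from a server:

```python
def counts_from_group(docs):
    """Fold $group results ({'_id': value, 'count': n}) into one dict."""
    return {doc["_id"]: doc["count"] for doc in docs}


# Hard-coded stand-in for list(cursor); an assumption for illustration only.
group_output = [
    {"_id": "xxx", "count": 1},
    {"_id": "xx", "count": 2},
    {"_id": "x", "count": 1},
]

result = counts_from_group(group_output)
print(result)
```

The comprehension makes a single pass and builds one dictionary, whereas the reduce version creates a new dictionary on every step. For a handful of distinct values the difference does not matter.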
Moreover, it does so without the gymnastics required by adding the other aggregation stages and operators, and you did not change what really comes back from the server, since the initial $group response is basically:
    { "_id" : "xxx", "count" : 1 }
    { "_id" : "xx", "count" : 2 }
    { "_id" : "x", "count" : 1 }
So the real lesson here is that whilst you can do fancy manipulation within an aggregation pipeline, you probably should not when the alternative is cleaner and much more readable code.
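To see why a single $group on "$b.d" with a $sum of 1 already does all of the counting, here is a rough client-side equivalent using collections.Counter. It only illustrates the grouping logic and says nothing about how the server implements it; `sample_docs` is a hard-coded copy of the documents from the question, and `group_like` is a hypothetical name for the reshaped result:

```python
from collections import Counter

# Hard-coded copy of the question's documents; no database is involved.
sample_docs = [
    {"a": 1, "b": {"c": 25, "d": "x", "e": 36}},
    {"a": 2, "b": {"c": 5, "d": "xx", "e": 36}},
    {"a": 33, "b": {"c": 25, "d": "xx", "e": 36}},
    {"a": 17, "b": {"c": 25, "d": "xxx", "e": 36}},
]

# Same idea as {"$group": {"_id": "$b.d", "count": {"$sum": 1}}}:
# one bucket per distinct value of b.d, incremented once per document.
counts = Counter(doc["b"]["d"] for doc in sample_docs)

# Reshape into the {"_id": ..., "count": ...} form a $group stage returns.
group_like = [{"_id": value, "count": n} for value, n in counts.items()]
print(group_like)
```

On a real collection you would still want the server to do this, since it avoids pulling every document over the wire just to count them.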
For reference though, all that happens is that the additional $group uses $push to create an array of documents with k and v keys, which is the form the next pipeline stage expects. That next stage uses $replaceRoot to take the output of $arrayToObject applied to the array created in the previous stage, which basically transforms it into an Object/Dictionary.
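If it helps to picture that transformation, the following is a small Python imitation of what $arrayToObject does with an array of k/v documents. It is a sketch for intuition only; `array_to_object` is a hypothetical helper, and `pushed` is hard-coded to look like the array the $push step would build:

```python
def array_to_object(pairs):
    """Mimic $arrayToObject for a list of {'k': key, 'v': value} documents."""
    return {pair["k"]: pair["v"] for pair in pairs}


# Hard-coded stand-in for the "data" array built by $push.
pushed = [
    {"k": "xxx", "v": 1},
    {"k": "xx", "v": 2},
    {"k": "x", "v": 1},
]

new_root = array_to_object(pushed)
print(new_root)
```

One difference worth keeping in mind: a Python dict accepts any hashable key, while $arrayToObject expects k to be a string, which is why the longer pipeline in this answer wraps the numeric values in $toString.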
By comparison, the reduce is doing exactly the same thing. We take the cursor results into a list so that Python functions can act on it. Then it's just a matter of traversing the documents in that list, which always have _id as the key and another named property for the "counted" output (here we used count), and transforming those into the key and value pairs of the final dictionary output.
Just for fun, something based off of your initial attempts might be:
    db.sample.aggregate([
        { "$addFields": { "b": { "$objectToArray": "$b" } } },
        { "$unwind": "$b" },
        { "$group": {
            "_id": {
                "_id": "$b.k",
                "k": "$b.v"
            },
            "count": { "$sum": 1 }
        }},
        { "$group": {
            "_id": "$_id._id",
            "data": { "$push": { "k": { "$toString": "$_id.k" }, "v": "$count" } }
        }},
        { "$addFields": {
            "data": { "$arrayToObject": "$data" }
        }}
    ])
Which would return:
    { "_id" : "c", "data" : { "25" : 3, "5" : 1 } }
    { "_id" : "e", "data" : { "36" : 4 } }
    { "_id" : "d", "data" : { "xxx" : 1, "xx" : 2, "x" : 1 } }
Again, the same result, without the additional pipeline stages for the transformation, comes from using map and reduce in Python:
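    from functools import reduce  # reduce is not a builtin on Python 3

    cursor = db.sample.aggregate([
        { "$addFields": { "b": { "$objectToArray": "$b" } } },
        { "$unwind": "$b" },
        { "$group": {
            "_id": {
                "_id": "$b.k",
                "k": "$b.v"
            },
            "count": { "$sum": 1 }
        }},
        { "$group": {
            "_id": "$_id._id",
            "data": { "$push": { "k": "$_id.k", "v": "$count" } }
        }}
    ])

    data = list(cursor)

    # For each field, fold its list of { "k": value, "v": count } pairs into a dictionary.
    # list() is needed because map returns a lazy iterator on Python 3.
    result = list(map(lambda d: {
        '_id': d['_id'],
        'data': reduce(
            lambda x, y: {**x, y['k']: y['v']},
            d['data'], {})
    }, data))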
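The same client-side reshaping can also be expressed with nested comprehensions instead of map and reduce. Treat this as a sketch under assumptions: `fold_grouped` is a hypothetical helper, and `grouped` is hard-coded to resemble what the two $group stages would return for the sample documents (without $toString, so the numeric values stay numeric):

```python
def fold_grouped(docs):
    """Turn each document's list of {'k', 'v'} pairs into a plain dict."""
    return [
        {"_id": d["_id"], "data": {p["k"]: p["v"] for p in d["data"]}}
        for d in docs
    ]


# Hard-coded stand-in for list(cursor); not fetched from a real server.
grouped = [
    {"_id": "c", "data": [{"k": 25, "v": 3}, {"k": 5, "v": 1}]},
    {"_id": "e", "data": [{"k": 36, "v": 4}]},
    {"_id": "d", "data": [{"k": "xxx", "v": 1}, {"k": "xx", "v": 2}, {"k": "x", "v": 1}]},
]

result = fold_grouped(grouped)
for doc in result:
    print(doc)
```

Because the keys are never converted to strings here, the counts for 'c' and 'e' end up keyed by integers, which Python dictionaries handle without complaint.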
edited Mar 24 at 9:51
answered Mar 24 at 9:03 by Neil Lunn