


Calculating the frequency counter of all the unique values of a nested field

















I want to count how often each unique value of a nested field occurs across the documents of a MongoDB collection.



To be more specific, say my collection db['sample'] consists of the following documents:



    {'a': 1, 'b': {'c': 25, 'd': "x", 'e': 36}},
    {'a': 2, 'b': {'c': 5, 'd': "xx", 'e': 36}},
    {'a': 33, 'b': {'c': 25, 'd': "xx", 'e': 36}},
    {'a': 17, 'b': {'c': 25, 'd': "xxx", 'e': 36}},


How can I get the frequency count of every unique value of the field 'd'? That is, my output should be {'d': {"xx": 2, "x": 1, "xxx": 1}}.



Is this even possible? Any help is appreciated. Thank you.



I looked up the documentation for aggregation and for the $objectToArray operator (to convert the embedded document into an array) and tried the following in PyMongo:



1)



    db['sample'].aggregate([
      { "$addFields": { "b": { "$objectToArray": "$b" } } },
      { "$unwind": "$b" },
      { "$group": { "_id": "$b.k", "count": { "$sum": "$b.v" } } }
    ])


This gives the sum of the values of each field where summing is possible, for example 'c': 25 + 5 + 25 + 25.



2)



    db['sample'].aggregate([
      { "$addFields": { "b": { "$objectToArray": "$b" } } },
      { "$unwind": "$b" },
      { "$group": { "_id": "$b.k", "count": { "$sum": 1 } } }
    ])


This gives the number of documents in which each field is present, for example 'c': 4, 'd': 4, etc.
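To see why neither attempt yields per-value counts, here is a minimal pure-Python emulation of the $objectToArray, $unwind and $group steps on the four sample documents. It needs no MongoDB and the helper name emulate_attempts is hypothetical. Grouping on the key k alone puts every value of a field into the same group, so summing v adds the values together and summing 1 only counts how often the field appears.

```python
from collections import defaultdict

# The four sample documents from the question
docs = [
    {'a': 1, 'b': {'c': 25, 'd': "x", 'e': 36}},
    {'a': 2, 'b': {'c': 5, 'd': "xx", 'e': 36}},
    {'a': 33, 'b': {'c': 25, 'd': "xx", 'e': 36}},
    {'a': 17, 'b': {'c': 25, 'd': "xxx", 'e': 36}},
]


def emulate_attempts(documents):
    """Hypothetical helper mimicking what attempts 1) and 2) compute."""
    sum_of_values = defaultdict(int)   # attempt 1: {"$sum": "$b.v"}
    field_presence = defaultdict(int)  # attempt 2: {"$sum": 1}
    for doc in documents:
        # $objectToArray + $unwind: one (k, v) pair per field of 'b'
        for k, v in doc['b'].items():
            # $group on "$b.k": all values of a field share one group.
            # $sum ignores non-numeric values, so strings contribute 0.
            sum_of_values[k] += v if isinstance(v, (int, float)) else 0
            field_presence[k] += 1
    return dict(sum_of_values), dict(field_presence)


sums, presence = emulate_attempts(docs)
print(sums)      # {'c': 80, 'd': 0, 'e': 144}
print(presence)  # {'c': 4, 'd': 4, 'e': 4}
```

The fix is therefore to group on the value of "b.d" itself rather than on the field name.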

















      python mongodb aggregation-framework pymongo














edited Mar 24 at 9:05 by Neil Lunn

asked Mar 24 at 8:25 by pfatagaga






















1 Answer














You are approaching this the wrong way around. You have a direct path, "b.d", to the key you want to aggregate on, so there is no need to convert anything to an array:



    cursor = db.sample.aggregate([
      { "$group": {
        "_id": "$b.d",
        "count": { "$sum": 1 }
      }},
      { "$group": {
        "_id": None,
        "data": { "$push": { "k": "$_id", "v": "$count" } }
      }},
      { "$replaceRoot": {
        "newRoot": { "$arrayToObject": "$data" }
      }}
    ])

    for doc in cursor:
        print(doc)


This returns:

    {'x': 1, 'xx': 2, 'xxx': 1}


But that is actually overkill, since all of the real work is done in the initial $group stage. All you need to do is run that stage, fetch the results, and combine them into a single dictionary as the desired output:



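    from functools import reduce

    cursor = db.sample.aggregate([
      { "$group": {
        "_id": "$b.d",
        "count": { "$sum": 1 }
      }}
    ])

    data = list(cursor)

    # Merge each {"_id": value, "count": n} document into one dict,
    # starting from an empty dict
    result = reduce(
      lambda x, y: {**x, y['_id']: y['count']},
      data,
      {}
    )

    print(result)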


This returns exactly the same thing:

    {'x': 1, 'xx': 2, 'xxx': 1}


Moreover, it does so without the gymnastics of the extra aggregation stages and operators, and it does not change what actually comes back from the server, since the response from the initial $group is basically:

    { "_id" : "xxx", "count" : 1 }
    { "_id" : "xx", "count" : 2 }
    { "_id" : "x", "count" : 1 }


So the real lesson here is that while you can do fancy manipulation within an aggregation pipeline, you probably should not when the alternative is cleaner and much more readable code.



For reference, all that happens is that the additional $group uses $push to build an array of documents with k and v keys, which is the shape expected by the next pipeline stage. That stage uses $replaceRoot to take the output of $arrayToObject applied to that array and make the resulting object (a dictionary in Python) the whole output document.
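To make that data flow concrete, here is a minimal pure-Python emulation of the last two stages. It is not MongoDB code; it only mirrors what $push, $arrayToObject and $replaceRoot do, fed with the output of the first $group shown above.

```python
# Output of the first $group stage, as shown in the answer
group_output = [
    {"_id": "xxx", "count": 1},
    {"_id": "xx", "count": 2},
    {"_id": "x", "count": 1},
]

# Second $group with "_id": None: $push one {k, v} document per input
data = [{"k": doc["_id"], "v": doc["count"]} for doc in group_output]

# $arrayToObject: turn the list of {k, v} pairs into a single object
new_root = {pair["k"]: pair["v"] for pair in data}

# $replaceRoot: that object becomes the entire output document
print(new_root)
```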



The reduce does exactly the same thing, only on the client. We take the cursor results into a list so that Python functions can act on it. Then it's just a matter of traversing the documents in that list, which always have _id as the key and another named property holding the counted output (here we used count), and turning each of those into a key/value pair of the final dictionary.
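If the reduce reads as too clever, a dictionary comprehension performs the same fold in one line. A minimal sketch, run on a hard-coded list standing in for list(cursor) so it needs no database:

```python
from functools import reduce

# Stand-in for list(cursor) from the single-$group pipeline
data = [
    {"_id": "xxx", "count": 1},
    {"_id": "xx", "count": 2},
    {"_id": "x", "count": 1},
]

# The reduce from the answer, written for Python 3
via_reduce = reduce(lambda x, y: {**x, y["_id"]: y["count"]}, data, {})

# Equivalent dictionary comprehension: one key/value pair per document
via_comprehension = {doc["_id"]: doc["count"] for doc in data}

print(via_reduce == via_comprehension)  # True
```

The comprehension also avoids rebuilding a new dictionary on every step, which the {**x, ...} form inside reduce does.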




Just for fun, something based on your initial attempts might be:



    db.sample.aggregate([
      { "$addFields": { "b": { "$objectToArray": "$b" } } },
      { "$unwind": "$b" },
      { "$group": {
        "_id": {
          "_id": "$b.k",
          "k": "$b.v"
        },
        "count": { "$sum": 1 }
      }},
      { "$group": {
        "_id": "$_id._id",
        "data": { "$push": { "k": { "$toString": "$_id.k" }, "v": "$count" } }
      }},
      { "$addFields": {
        "data": { "$arrayToObject": "$data" }
      }}
    ])


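The $toString is there because $arrayToObject requires string keys, while the values of "c" and "e" are numbers ($toString itself requires MongoDB 4.0 or later). This would return:

    { "_id" : "c", "data" : { "25" : 3, "5" : 1 } }
    { "_id" : "e", "data" : { "36" : 4 } }
    { "_id" : "d", "data" : { "xxx" : 1, "xx" : 2, "x" : 1 } }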


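Again, you can get the same result without the additional transformation stages by using map and reduce in Python. The only difference is that numeric values stay as integer keys, since Python dictionaries do not require string keys: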



    from functools import reduce

    cursor = db.sample.aggregate([
      { "$addFields": { "b": { "$objectToArray": "$b" } } },
      { "$unwind": "$b" },
      { "$group": {
        "_id": {
          "_id": "$b.k",
          "k": "$b.v"
        },
        "count": { "$sum": 1 }
      }},
      { "$group": {
        "_id": "$_id._id",
        "data": { "$push": { "k": "$_id.k", "v": "$count" } }
      }}
    ])

    data = list(cursor)

    # Fold each field's list of {k, v} pairs into a dict.
    # map() is lazy in Python 3, so list() materializes the result.
    result = list(map(lambda d: {
      '_id': d['_id'],
      'data': reduce(lambda x, y: {**x, y['k']: y['v']}, d['data'], {})
    }, data))
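The compound-key grouping used in these last two pipelines can also be emulated in plain Python, which may help when reasoning about what each stage produces. This is a sketch that assumes the four sample documents from the question are the whole collection; str() mirrors the $toString step. Note that $group does not guarantee any output order, so the server may return the fields in a different order than this emulation.

```python
from collections import defaultdict

# The four sample documents from the question
docs = [
    {'a': 1, 'b': {'c': 25, 'd': "x", 'e': 36}},
    {'a': 2, 'b': {'c': 5, 'd': "xx", 'e': 36}},
    {'a': 33, 'b': {'c': 25, 'd': "xx", 'e': 36}},
    {'a': 17, 'b': {'c': 25, 'd': "xxx", 'e': 36}},
]

# $objectToArray + $unwind, then $group on {"_id": "$b.k", "k": "$b.v"}:
# count how often each (field, value) pair occurs
pair_counts = defaultdict(int)
for doc in docs:
    for field, value in doc['b'].items():
        pair_counts[(field, value)] += 1

# Second $group on the field name, collecting value -> count pairs;
# str() mirrors $toString so keys match what $arrayToObject produces
per_field = defaultdict(dict)
for (field, value), count in pair_counts.items():
    per_field[field][str(value)] = count

result = [{'_id': field, 'data': data} for field, data in per_field.items()]
for doc in result:
    print(doc)
```

The entry for 'd' is exactly the frequency count the question asks for.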





          share|improve this answer

























            Your Answer






            StackExchange.ifUsing("editor", function ()
            StackExchange.using("externalEditor", function ()
            StackExchange.using("snippets", function ()
            StackExchange.snippets.init();
            );
            );
            , "code-snippets");

            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "1"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader:
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            ,
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );













            draft saved

            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55321910%2fcalculating-the-frequency-counter-of-all-the-unique-values-of-a-nested-field%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            You are basically approaching this the wrong way around. You have a clear path to "b.d" as the key you want to aggregate on, there is no need to convert this to an array:



            cursor = db.sample.aggregate([
            "$group":
            "_id": "$b.d",
            "count": "$sum": 1
            ,
            "$group":
            "_id": None,
            "data": "$push": "k": "$_id", "v": "$count"
            ,
            "$replaceRoot":
            "newRoot": "$arrayToObject": "$data"

            ])

            for doc in cursor:
            print(doc)


            Returns



             'x': 1, 'xx': 2, 'xxx': 1 


            But that is actually overkill, since in reality all of the work was done in that initial $group statement. All you really need to do is run that and fetch the results and combine them into a single dictionary as the desired output:



            cursor = db.sample.aggregate([
            "$group":
            "_id": "$b.d",
            "count": "$sum": 1

            ])

            data = list(cursor)

            result = reduce(
            lambda x,y:
            dict(x.items() + y['_id']: y['count'] .items()), data,)

            print(result)


            Which returns exactly the same thing:



             'x': 1, 'xx': 2, 'xxx': 1 


            Moreover it does so without the gymnastics required by adding the other aggregation stages and operators, and you did not change what really comes back from the server since the initial $group response is basically:



             "_id" : "xxx", "count" : 1 
            "_id" : "xx", "count" : 2
            "_id" : "x", "count" : 1


            So the real lesson here is whilst you can do fancy manipulation within an aggregation pipeline, what you really should be considering is you probably should not when the alternative is cleaner and much more readable code.



            For reference though all that happens is the additional $group uses $push to create an array with k and v keys as would be expected in the next pipeline stage. Where that next stage uses $replaceRoot to take the output of $arrayToObject from that array created in the previous stage, and basically transforms that into an Object/Dictionary.



            By contrast, the reduce is doing exactly the same thing. We basically take the cursor results into a list so python functions can act on that list. Then it's just a matter of traversing the doucments in that list which always have _id as the key and another named property for the "counted" output ( here we used count ) and simply transforming those as the key and value pairs for the final dictionary output.




            Just for fun, something based off of your initial attempts might be:



            db.sample.aggregate([
            "$addFields": "b": "$objectToArray": "$b" ,
            "$unwind": "$b" ,
            "$group":
            "_id":
            "_id": "$b.k",
            "k": "$b.v"
            ,
            "count": "$sum": 1

            ,
            "$group":
            "_id": "$_id._id",
            "data": "$push": "k": "$toString": "$_id.k" , "v": "$count"
            ,
            "$addFields":
            "data": "$arrayToObject": "$data"

            ])


            Which would return:



             "_id" : "c", "data" : "25" : 3, "5" : 1 
            "_id" : "e", "data" : "36" : 4
            "_id" : "d", "data" : "xxx" : 1, "xx" : 2, "x" : 1


            Again, the same result without the additional pipeline stages to transform comes from using map and reduce with python:



            cursor = db.sample.aggregate([
            "$addFields": "b": "$objectToArray": "$b" ,
            "$unwind": "$b" ,
            "$group":
            "_id":
            "_id": "$b.k",
            "k": "$b.v"
            ,
            "count": "$sum": 1
            ,
            "$group":
            "_id": "$_id._id",
            "data": "$push": "k": "$_id.k", "v": "$count"

            ])

            data = list(cursor)


            result = map(lambda d:
            '_id': d['_id'],
            'data': reduce(lambda x,y:
            dict(x.items() + y['k']: y['v'] .items()), d['data'],
            )
            ,data)





            share|improve this answer





























              0














              You are basically approaching this the wrong way around. You have a clear path to "b.d" as the key you want to aggregate on, there is no need to convert this to an array:



              cursor = db.sample.aggregate([
              "$group":
              "_id": "$b.d",
              "count": "$sum": 1
              ,
              "$group":
              "_id": None,
              "data": "$push": "k": "$_id", "v": "$count"
              ,
              "$replaceRoot":
              "newRoot": "$arrayToObject": "$data"

              ])

              for doc in cursor:
              print(doc)


              Returns



               'x': 1, 'xx': 2, 'xxx': 1 


              But that is actually overkill, since in reality all of the work was done in that initial $group statement. All you really need to do is run that and fetch the results and combine them into a single dictionary as the desired output:



              cursor = db.sample.aggregate([
              "$group":
              "_id": "$b.d",
              "count": "$sum": 1

              ])

              data = list(cursor)

              result = reduce(
              lambda x,y:
              dict(x.items() + y['_id']: y['count'] .items()), data,)

              print(result)


              Which returns exactly the same thing:



               'x': 1, 'xx': 2, 'xxx': 1 


              Moreover it does so without the gymnastics required by adding the other aggregation stages and operators, and you did not change what really comes back from the server since the initial $group response is basically:



               "_id" : "xxx", "count" : 1 
              "_id" : "xx", "count" : 2
              "_id" : "x", "count" : 1


              So the real lesson here is whilst you can do fancy manipulation within an aggregation pipeline, what you really should be considering is you probably should not when the alternative is cleaner and much more readable code.



              For reference though all that happens is the additional $group uses $push to create an array with k and v keys as would be expected in the next pipeline stage. Where that next stage uses $replaceRoot to take the output of $arrayToObject from that array created in the previous stage, and basically transforms that into an Object/Dictionary.



              By contrast, the reduce is doing exactly the same thing. We basically take the cursor results into a list so python functions can act on that list. Then it's just a matter of traversing the doucments in that list which always have _id as the key and another named property for the "counted" output ( here we used count ) and simply transforming those as the key and value pairs for the final dictionary output.




              Just for fun, something based off of your initial attempts might be:



              db.sample.aggregate([
              "$addFields": "b": "$objectToArray": "$b" ,
              "$unwind": "$b" ,
              "$group":
              "_id":
              "_id": "$b.k",
              "k": "$b.v"
              ,
              "count": "$sum": 1

              ,
              "$group":
              "_id": "$_id._id",
              "data": "$push": "k": "$toString": "$_id.k" , "v": "$count"
              ,
              "$addFields":
              "data": "$arrayToObject": "$data"

              ])


              Which would return:



               "_id" : "c", "data" : "25" : 3, "5" : 1 
              "_id" : "e", "data" : "36" : 4
              "_id" : "d", "data" : "xxx" : 1, "xx" : 2, "x" : 1


              Again, the same result without the additional pipeline stages to transform comes from using map and reduce with python:



              cursor = db.sample.aggregate([
              "$addFields": "b": "$objectToArray": "$b" ,
              "$unwind": "$b" ,
              "$group":
              "_id":
              "_id": "$b.k",
              "k": "$b.v"
              ,
              "count": "$sum": 1
              ,
              "$group":
              "_id": "$_id._id",
              "data": "$push": "k": "$_id.k", "v": "$count"

              ])

              data = list(cursor)


              result = map(lambda d:
              '_id': d['_id'],
              'data': reduce(lambda x,y:
              dict(x.items() + y['k']: y['v'] .items()), d['data'],
              )
              ,data)





              share|improve this answer



























                0












                0








                0







                You are basically approaching this the wrong way around. You have a clear path to "b.d" as the key you want to aggregate on, there is no need to convert this to an array:



                cursor = db.sample.aggregate([
                "$group":
                "_id": "$b.d",
                "count": "$sum": 1
                ,
                "$group":
                "_id": None,
                "data": "$push": "k": "$_id", "v": "$count"
                ,
                "$replaceRoot":
                "newRoot": "$arrayToObject": "$data"

                ])

                for doc in cursor:
                print(doc)


                Returns



                 'x': 1, 'xx': 2, 'xxx': 1 


                But that is actually overkill, since in reality all of the work was done in that initial $group statement. All you really need to do is run that and fetch the results and combine them into a single dictionary as the desired output:



                cursor = db.sample.aggregate([
                "$group":
                "_id": "$b.d",
                "count": "$sum": 1

                ])

                data = list(cursor)

                result = reduce(
                lambda x,y:
                dict(x.items() + y['_id']: y['count'] .items()), data,)

                print(result)


                Which returns exactly the same thing:



                 'x': 1, 'xx': 2, 'xxx': 1 


                Moreover it does so without the gymnastics required by adding the other aggregation stages and operators, and you did not change what really comes back from the server since the initial $group response is basically:



                 "_id" : "xxx", "count" : 1 
                "_id" : "xx", "count" : 2
                "_id" : "x", "count" : 1


                So the real lesson here is whilst you can do fancy manipulation within an aggregation pipeline, what you really should be considering is you probably should not when the alternative is cleaner and much more readable code.



                For reference though all that happens is the additional $group uses $push to create an array with k and v keys as would be expected in the next pipeline stage. Where that next stage uses $replaceRoot to take the output of $arrayToObject from that array created in the previous stage, and basically transforms that into an Object/Dictionary.



                By contrast, the reduce is doing exactly the same thing. We basically take the cursor results into a list so python functions can act on that list. Then it's just a matter of traversing the doucments in that list which always have _id as the key and another named property for the "counted" output ( here we used count ) and simply transforming those as the key and value pairs for the final dictionary output.




                Just for fun, something based off of your initial attempts might be:



                db.sample.aggregate([
                "$addFields": "b": "$objectToArray": "$b" ,
                "$unwind": "$b" ,
                "$group":
                "_id":
                "_id": "$b.k",
                "k": "$b.v"
                ,
                "count": "$sum": 1

                ,
                "$group":
                "_id": "$_id._id",
                "data": "$push": "k": "$toString": "$_id.k" , "v": "$count"
                ,
                "$addFields":
                "data": "$arrayToObject": "$data"

                ])


                Which would return:



                 "_id" : "c", "data" : "25" : 3, "5" : 1 
                "_id" : "e", "data" : "36" : 4
                "_id" : "d", "data" : "xxx" : 1, "xx" : 2, "x" : 1


                Again, the same result without the additional pipeline stages to transform comes from using map and reduce with python:



                cursor = db.sample.aggregate([
                "$addFields": "b": "$objectToArray": "$b" ,
                "$unwind": "$b" ,
                "$group":
                "_id":
                "_id": "$b.k",
                "k": "$b.v"
                ,
                "count": "$sum": 1
                ,
                "$group":
                "_id": "$_id._id",
                "data": "$push": "k": "$_id.k", "v": "$count"

                ])

                data = list(cursor)


                result = map(lambda d:
                '_id': d['_id'],
                'data': reduce(lambda x,y:
                dict(x.items() + y['k']: y['v'] .items()), d['data'],
                )
                ,data)





                share|improve this answer















                You are basically approaching this the wrong way around. You have a clear path to "b.d" as the key you want to aggregate on, there is no need to convert this to an array:



                cursor = db.sample.aggregate([
                "$group":
                "_id": "$b.d",
                "count": "$sum": 1
                ,
                "$group":
                "_id": None,
                "data": "$push": "k": "$_id", "v": "$count"
                ,
                "$replaceRoot":
                "newRoot": "$arrayToObject": "$data"

                ])

                for doc in cursor:
                print(doc)


                Returns



                 'x': 1, 'xx': 2, 'xxx': 1 


                But that is actually overkill, since in reality all of the work was done in that initial $group statement. All you really need to do is run that and fetch the results and combine them into a single dictionary as the desired output:



                cursor = db.sample.aggregate([
                "$group":
                "_id": "$b.d",
                "count": "$sum": 1

                ])

                data = list(cursor)

                result = reduce(
                lambda x,y:
                dict(x.items() + y['_id']: y['count'] .items()), data,)

                print(result)


                Which returns exactly the same thing:



                 'x': 1, 'xx': 2, 'xxx': 1 


                Moreover it does so without the gymnastics required by adding the other aggregation stages and operators, and you did not change what really comes back from the server since the initial $group response is basically:



                 "_id" : "xxx", "count" : 1 
                "_id" : "xx", "count" : 2
                "_id" : "x", "count" : 1


                So the real lesson here is whilst you can do fancy manipulation within an aggregation pipeline, what you really should be considering is you probably should not when the alternative is cleaner and much more readable code.



                For reference though all that happens is the additional $group uses $push to create an array with k and v keys as would be expected in the next pipeline stage. Where that next stage uses $replaceRoot to take the output of $arrayToObject from that array created in the previous stage, and basically transforms that into an Object/Dictionary.



                By contrast, the reduce is doing exactly the same thing. We basically take the cursor results into a list so python functions can act on that list. Then it's just a matter of traversing the doucments in that list which always have _id as the key and another named property for the "counted" output ( here we used count ) and simply transforming those as the key and value pairs for the final dictionary output.




                Just for fun, something based on your initial attempts might be:



                db.sample.aggregate([
                  { "$addFields": { "b": { "$objectToArray": "$b" } } },
                  { "$unwind": "$b" },
                  { "$group": {
                    "_id": {
                      "_id": "$b.k",
                      "k": "$b.v"
                    },
                    "count": { "$sum": 1 }
                  }},
                  { "$group": {
                    "_id": "$_id._id",
                    "data": { "$push": { "k": { "$toString": "$_id.k" }, "v": "$count" } }
                  }},
                  { "$addFields": {
                    "data": { "$arrayToObject": "$data" }
                  }}
                ])


                Which would return:



                { "_id" : "c", "data" : { "25" : 3, "5" : 1 } }
                { "_id" : "e", "data" : { "36" : 4 } }
                { "_id" : "d", "data" : { "xxx" : 1, "xx" : 2, "x" : 1 } }
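To see what each stage of that pipeline contributes, here is a rough client-side emulation. It is a sketch under stated assumptions: the input documents are hypothetical (picked so the counts match the output above), `count_nested_values` is an illustrative name, and Python's `str()` may format some numeric types differently from MongoDB's $toString, so treat it as an approximation rather than a reimplementation.

```python
from collections import Counter, defaultdict

# Hypothetical documents whose values reproduce the counts shown above.
docs = [
    {"b": {"c": 25, "d": "x", "e": 36}},
    {"b": {"c": 25, "d": "xx", "e": 36}},
    {"b": {"c": 25, "d": "xx", "e": 36}},
    {"b": {"c": 5, "d": "xxx", "e": 36}},
]


def count_nested_values(documents):
    # $objectToArray + $unwind: one (k, v) pair per field per document.
    pairs = (
        (key, value)
        for doc in documents
        for key, value in doc["b"].items()
    )
    # First $group on {_id: k, k: v} with $sum: 1: count each (key, value) pair.
    counts = Counter(pairs)
    # Second $group + $toString + $arrayToObject: regroup by field name and
    # build a {str(value): count} object for each field.
    per_field = defaultdict(dict)
    for (key, value), n in counts.items():
        per_field[key][str(value)] = n
    return [{"_id": key, "data": data} for key, data in per_field.items()]


for row in count_nested_values(docs):
    print(row)
```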


                Again, the same result can be had without the additional transformation stage by using map and reduce in Python:



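                from functools import reduce

                # Assumes `db` is an existing pymongo Database handle
                cursor = db.sample.aggregate([
                    { "$addFields": { "b": { "$objectToArray": "$b" } } },
                    { "$unwind": "$b" },
                    { "$group": {
                        "_id": {
                            "_id": "$b.k",
                            "k": "$b.v"
                        },
                        "count": { "$sum": 1 }
                    }},
                    { "$group": {
                        "_id": "$_id._id",
                        "data": { "$push": { "k": "$_id.k", "v": "$count" } }
                    }}
                ])

                data = list(cursor)

                # map() is lazy in Python 3, so wrap it in list();
                # each [{k, v}, ...] array is folded into a single dictionary
                result = list(map(lambda d: {
                    '_id': d['_id'],
                    'data': reduce(lambda x, y: {**x, y['k']: y['v']}, d['data'], {})
                }, data))

                print(result)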
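The client-side transform above can be exercised without a server by feeding it documents shaped like the output of the two $group stages. This is a sketch: the input list is hypothetical (its values mirror the result shown earlier) and `to_dict` is an illustrative helper name. One visible difference from the server-side version is that, without $toString, the keys keep their original types, so numeric values stay as integer keys.

```python
from functools import reduce

# Hypothetical output of the two $group stages: one document per field,
# with the counted values pushed into a [{k, v}, ...] array under "data".
data = [
    {"_id": "c", "data": [{"k": 25, "v": 3}, {"k": 5, "v": 1}]},
    {"_id": "e", "data": [{"k": 36, "v": 4}]},
    {"_id": "d", "data": [{"k": "xxx", "v": 1}, {"k": "xx", "v": 2}, {"k": "x", "v": 1}]},
]


def to_dict(pairs):
    # Fold a [{k, v}, ...] array into a single dict, similar to $arrayToObject.
    return reduce(lambda acc, p: {**acc, p["k"]: p["v"]}, pairs, {})


# list() is needed because map() returns a lazy iterator in Python 3.
result = list(map(lambda d: {"_id": d["_id"], "data": to_dict(d["data"])}, data))

for row in result:
    print(row)
```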






                answered Mar 24 at 9:03 by Neil Lunn (edited Mar 24 at 9:51)