BigQuery Merge JSon Documents Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern) The Ask Question Wizard is Live! Data science time! April 2019 and salary with experienceBigquery - json_extract all elements from an arrayHow can I “pretty” format my JSON output in Ruby on Rails?How do I format a Microsoft JSON date?Can comments be used in JSON?How can I pretty-print JSON in a shell script?What is the correct JSON content type?How can I deserialize JSON to a simple Dictionary<string,string> in ASP.NET?Why does Google prepend while(1); to their JSON responses?Parse JSON in JavaScript?Pretty-Printing JSON with PHPHow do I POST JSON data with Curl from a terminal/commandline to Test Spring REST?

Mortgage adviser recommends a longer term than necessary combined with overpayments

Who can trigger ship-wide alerts in Star Trek?

Is there folklore associating late breastfeeding with low intelligence and/or gullibility?

How to politely respond to generic emails requesting a PhD/job in my lab? Without wasting too much time

Why is "Captain Marvel" translated as male in Portugal?

What is the electric potential inside a point charge?

Does the STL have a way to apply a function before calling less than?

Single author papers against my advisor's will?

How can I protect witches in combat who wear limited clothing?

Keep going mode for require-package

Problem when applying foreach loop

Why does tar appear to skip file contents when output file is /dev/null?

Replacing HDD with SSD; what about non-APFS/APFS?

Estimated State payment too big --> money back; + 2018 Tax Reform

What is the largest species of polychaete?

What was the last x86 CPU that did not have the x87 floating-point unit built in?

How does modal jazz use chord progressions?

Using "nakedly" instead of "with nothing on"

Can I throw a sword that doesn't have the Thrown property at someone?

How to say that you spent the night with someone, you were only sleeping and nothing else?

New Order #5: where Fibonacci and Beatty meet at Wythoff

I'm thinking of a number

What did Darwin mean by 'squib' here?

Windows 10: How to Lock (not sleep) laptop on lid close?



BigQuery Merge JSon Documents



Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern)
The Ask Question Wizard is Live!
Data science time! April 2019 and salary with experienceBigquery - json_extract all elements from an arrayHow can I “pretty” format my JSON output in Ruby on Rails?How do I format a Microsoft JSON date?Can comments be used in JSON?How can I pretty-print JSON in a shell script?What is the correct JSON content type?How can I deserialize JSON to a simple Dictionary<string,string> in ASP.NET?Why does Google prepend while(1); to their JSON responses?Parse JSON in JavaScript?Pretty-Printing JSON with PHPHow do I POST JSON data with Curl from a terminal/commandline to Test Spring REST?



.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;








0















I would like to have a table with a JSON column. This JSON column can include arbitrary documents. I would like to merge these documents according to their timestamp which is available in another column. Is there a way to merge these JSON documents by their timestamp?



Here is an example:



at t3 time a:"1", b:"2"
at t2 time b:"4"
at t1 time a:"4", c:"5"



I want to create a:"1", b:"2", c:"5" as output. Is it possible to this in BigQuery?










share|improve this question




























    0















    I would like to have a table with a JSON column. This JSON column can include arbitrary documents. I would like to merge these documents according to their timestamp which is available in another column. Is there a way to merge these JSON documents by their timestamp?



    Here is an example:



    at t3 time a:"1", b:"2"
    at t2 time b:"4"
    at t1 time a:"4", c:"5"



    I want to create a:"1", b:"2", c:"5" as output. Is it possible to this in BigQuery?










    share|improve this question
























      0












      0








      0








      I would like to have a table with a JSON column. This JSON column can include arbitrary documents. I would like to merge these documents according to their timestamp which is available in another column. Is there a way to merge these JSON documents by their timestamp?



      Here is an example:



      at t3 time a:"1", b:"2"
      at t2 time b:"4"
      at t1 time a:"4", c:"5"



      I want to create a:"1", b:"2", c:"5" as output. Is it possible to this in BigQuery?










      share|improve this question














      I would like to have a table with a JSON column. This JSON column can include arbitrary documents. I would like to merge these documents according to their timestamp which is available in another column. Is there a way to merge these JSON documents by their timestamp?



      Here is an example:



      at t3 time a:"1", b:"2"
      at t2 time b:"4"
      at t1 time a:"4", c:"5"



      I want to create a:"1", b:"2", c:"5" as output. Is it possible to this in BigQuery?







      json google-bigquery






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Mar 22 at 7:19









      CemoCemo

      2,92743765




      2,92743765






















          3 Answers
          3






          active

          oldest

          votes


















          1














          Below is for BigQuery Standard SQL



          #standardSQL
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          Note: above solution generic and does not require knowing in advance attribute names (like a, b, etc.) rather it parse and extract whatever it will find. Obviously it relies on assumption of simple jsons like in your examples



          You can test, play with above using sample data from your question as in example below



          #standardSQL
          WITH `project.dataset.table` AS (
          SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
          SELECT '"b":"4"', 2 UNION ALL
          SELECT '"a":"4", "c":"5"', 1
          )
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          with result



          Row json 
          1 "a":"1", "b":"2", "c":"5"


          Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below



          #standardSQL
          WITH `project.dataset.table` AS (
          SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
          SELECT '"b":"4"', 2 UNION ALL
          SELECT '"a":"4", "c":"5"', 1 UNION ALL
          SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
          SELECT '"abc":"3", "vwy":"4"', 3
          )
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          with result



          Row json 
          1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"





          share|improve this answer























          • This is really so beautiful. :)

            – Cemo
            Mar 23 at 21:33


















          1














          There might be a better way, while the first idea occurred to me is:



          1. Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')

          2. Use JavaScript UDF to parse JSON / merge / output to string.

          #standardSQL
          CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
          RETURNS STRING
          LANGUAGE js
          AS
          """
          // TODO 1: split json string with '||||||' to get multiple parts
          // . 2: parse each json parts into object
          // 3: merge objects in your own way

          // fake output, just to demonstrate the idea
          var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
          return JSON.stringify(obj);
          """;
          WITH
          sample_data AS (
          SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
          UNION ALL
          SELECT 'b:"4"' AS json, 2000 AS timestamp
          UNION ALL
          SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )

          SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
          FROM sample_data
          GROUP BY timestamp


          Output:



          output






          share|improve this answer

























          • very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

            – Tamir Klein
            Mar 22 at 8:41











          • Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

            – Mikhail Berlyant
            Mar 22 at 11:34











          • I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

            – Yun Zhang
            Mar 22 at 17:05


















          1














          Here is a possible solution using BigQuery standard SQL functions, with your data:



          #standardSQL
          WITH test AS (
          SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
          SELECT '"b":"4"' AS json, 2 AS t UNION ALL
          SELECT '"a":"4", "c":"5"' AS json, 1 AS t
          )
          SELECT data_row, TO_JSON_STRING(data_row) AS json_row
          FROM (
          SELECT
          ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
          ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
          ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
          FROM(
          SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
          JSON_EXTRACT_SCALAR(json,'$.b') AS b,
          JSON_EXTRACT_SCALAR(json,'$.c') AS c,
          t
          FROM test
          )
          ) AS data_row


          Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:



          Row data_row.a data_row.b data_row.c json_row 
          1 1 2 5 "a":"1","b":"2","c":"5"


          The problem with this query is that you have to specify all the documents (in this case, a,b,c).






          share|improve this answer























          • Thanks. This is also beautiful.

            – Cemo
            Mar 23 at 21:34











          Your Answer






          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "1"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55294662%2fbigquery-merge-json-documents%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          3 Answers
          3






          active

          oldest

          votes








          3 Answers
          3






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1














          Below is for BigQuery Standard SQL



          #standardSQL
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          Note: above solution generic and does not require knowing in advance attribute names (like a, b, etc.) rather it parse and extract whatever it will find. Obviously it relies on assumption of simple jsons like in your examples



          You can test, play with above using sample data from your question as in example below



          #standardSQL
          WITH `project.dataset.table` AS (
          SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
          SELECT '"b":"4"', 2 UNION ALL
          SELECT '"a":"4", "c":"5"', 1
          )
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          with result



          Row json 
          1 "a":"1", "b":"2", "c":"5"


          Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below



          #standardSQL
          WITH `project.dataset.table` AS (
          SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
          SELECT '"b":"4"', 2 UNION ALL
          SELECT '"a":"4", "c":"5"', 1 UNION ALL
          SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
          SELECT '"abc":"3", "vwy":"4"', 3
          )
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          with result



          Row json 
          1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"





          share|improve this answer























          • This is really so beautiful. :)

            – Cemo
            Mar 23 at 21:33















          1














          Below is for BigQuery Standard SQL



          #standardSQL
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          Note: above solution generic and does not require knowing in advance attribute names (like a, b, etc.) rather it parse and extract whatever it will find. Obviously it relies on assumption of simple jsons like in your examples



          You can test, play with above using sample data from your question as in example below



          #standardSQL
          WITH `project.dataset.table` AS (
          SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
          SELECT '"b":"4"', 2 UNION ALL
          SELECT '"a":"4", "c":"5"', 1
          )
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          with result



          Row json 
          1 "a":"1", "b":"2", "c":"5"


          Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below



          #standardSQL
          WITH `project.dataset.table` AS (
          SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
          SELECT '"b":"4"', 2 UNION ALL
          SELECT '"a":"4", "c":"5"', 1 UNION ALL
          SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
          SELECT '"abc":"3", "vwy":"4"', 3
          )
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          with result



          Row json 
          1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"





          share|improve this answer























          • This is really so beautiful. :)

            – Cemo
            Mar 23 at 21:33













          1












          1








          1







          Below is for BigQuery Standard SQL



          #standardSQL
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          Note: above solution generic and does not require knowing in advance attribute names (like a, b, etc.) rather it parse and extract whatever it will find. Obviously it relies on assumption of simple jsons like in your examples



          You can test, play with above using sample data from your question as in example below



          #standardSQL
          WITH `project.dataset.table` AS (
          SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
          SELECT '"b":"4"', 2 UNION ALL
          SELECT '"a":"4", "c":"5"', 1
          )
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          with result



          Row json 
          1 "a":"1", "b":"2", "c":"5"


          Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below



          #standardSQL
          WITH `project.dataset.table` AS (
          SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
          SELECT '"b":"4"', 2 UNION ALL
          SELECT '"a":"4", "c":"5"', 1 UNION ALL
          SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
          SELECT '"abc":"3", "vwy":"4"', 3
          )
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          with result



          Row json 
          1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"





          share|improve this answer













          Below is for BigQuery Standard SQL



          #standardSQL
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          Note: above solution generic and does not require knowing in advance attribute names (like a, b, etc.) rather it parse and extract whatever it will find. Obviously it relies on assumption of simple jsons like in your examples



          You can test, play with above using sample data from your question as in example below



          #standardSQL
          WITH `project.dataset.table` AS (
          SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
          SELECT '"b":"4"', 2 UNION ALL
          SELECT '"a":"4", "c":"5"', 1
          )
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          with result



          Row json 
          1 "a":"1", "b":"2", "c":"5"


          Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below



          #standardSQL
          WITH `project.dataset.table` AS (
          SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
          SELECT '"b":"4"', 2 UNION ALL
          SELECT '"a":"4", "c":"5"', 1 UNION ALL
          SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
          SELECT '"abc":"3", "vwy":"4"', 3
          )
          SELECT STRING_AGG(y, ', ' ORDER BY y) json
          FROM (
          SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
          FROM `project.dataset.table`,
          UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
          GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
          )


          with result



          Row json 
          1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Mar 22 at 11:27









          Mikhail BerlyantMikhail Berlyant

          63.3k43874




          63.3k43874












          • This is really so beautiful. :)

            – Cemo
            Mar 23 at 21:33

















          • This is really so beautiful. :)

            – Cemo
            Mar 23 at 21:33
















          This is really so beautiful. :)

          – Cemo
          Mar 23 at 21:33





          This is really so beautiful. :)

          – Cemo
          Mar 23 at 21:33













          1














          There might be a better way, while the first idea occurred to me is:



          1. Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')

          2. Use JavaScript UDF to parse JSON / merge / output to string.

          #standardSQL
          CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
          RETURNS STRING
          LANGUAGE js
          AS
          """
          // TODO 1: split json string with '||||||' to get multiple parts
          // . 2: parse each json parts into object
          // 3: merge objects in your own way

          // fake output, just to demonstrate the idea
          var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
          return JSON.stringify(obj);
          """;
          WITH
          sample_data AS (
          SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
          UNION ALL
          SELECT 'b:"4"' AS json, 2000 AS timestamp
          UNION ALL
          SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )

          SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
          FROM sample_data
          GROUP BY timestamp


          Output:



          output






          share|improve this answer

























          • very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

            – Tamir Klein
            Mar 22 at 8:41











          • Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

            – Mikhail Berlyant
            Mar 22 at 11:34











          • I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

            – Yun Zhang
            Mar 22 at 17:05















          1














          There might be a better way, while the first idea occurred to me is:



          1. Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')

          2. Use JavaScript UDF to parse JSON / merge / output to string.

          #standardSQL
          CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
          RETURNS STRING
          LANGUAGE js
          AS
          """
          // TODO 1: split json string with '||||||' to get multiple parts
          // . 2: parse each json parts into object
          // 3: merge objects in your own way

          // fake output, just to demonstrate the idea
          var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
          return JSON.stringify(obj);
          """;
          WITH
          sample_data AS (
          SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
          UNION ALL
          SELECT 'b:"4"' AS json, 2000 AS timestamp
          UNION ALL
          SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )

          SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
          FROM sample_data
          GROUP BY timestamp


          Output:



          output






          share|improve this answer

























          • very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

            – Tamir Klein
            Mar 22 at 8:41











          • Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

            – Mikhail Berlyant
            Mar 22 at 11:34











          • I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

            – Yun Zhang
            Mar 22 at 17:05













          1












          1








          1







          There might be a better way, while the first idea occurred to me is:



          1. Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')

          2. Use JavaScript UDF to parse JSON / merge / output to string.

          #standardSQL
          CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
          RETURNS STRING
          LANGUAGE js
          AS
          """
          // TODO 1: split json string with '||||||' to get multiple parts
          // . 2: parse each json parts into object
          // 3: merge objects in your own way

          // fake output, just to demonstrate the idea
          var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
          return JSON.stringify(obj);
          """;
          WITH
          sample_data AS (
          SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
          UNION ALL
          SELECT 'b:"4"' AS json, 2000 AS timestamp
          UNION ALL
          SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )

          SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
          FROM sample_data
          GROUP BY timestamp


          Output:



          output






          share|improve this answer















          There might be a better way, while the first idea occurred to me is:



          1. Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')

          2. Use JavaScript UDF to parse JSON / merge / output to string.

          #standardSQL
          CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
          RETURNS STRING
          LANGUAGE js
          AS
          """
          // TODO 1: split json string with '||||||' to get multiple parts
          // . 2: parse each json parts into object
          // 3: merge objects in your own way

          // fake output, just to demonstrate the idea
          var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
          return JSON.stringify(obj);
          """;
          WITH
          sample_data AS (
          SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
          UNION ALL
          SELECT 'b:"4"' AS json, 2000 AS timestamp
          UNION ALL
          SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )

          SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
          FROM sample_data
          GROUP BY timestamp


          Output:



          output







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Mar 22 at 8:02

























          answered Mar 22 at 7:50









          Yun ZhangYun Zhang

          36710




          36710












          • very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

            – Tamir Klein
            Mar 22 at 8:41











          • Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

            – Mikhail Berlyant
            Mar 22 at 11:34











          • I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

            – Yun Zhang
            Mar 22 at 17:05

















          • very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

            – Tamir Klein
            Mar 22 at 8:41











          • Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

            – Mikhail Berlyant
            Mar 22 at 11:34











          • I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

            – Yun Zhang
            Mar 22 at 17:05
















          very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

          – Tamir Klein
          Mar 22 at 8:41





          very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

          – Tamir Klein
          Mar 22 at 8:41













          Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

          – Mikhail Berlyant
          Mar 22 at 11:34





          Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

          – Mikhail Berlyant
          Mar 22 at 11:34













          I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

          – Yun Zhang
          Mar 22 at 17:05





          I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

          – Yun Zhang
          Mar 22 at 17:05











          1














          Here is a possible solution using BigQuery standard SQL functions, with your data:



          #standardSQL
          WITH test AS (
          SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
          SELECT '"b":"4"' AS json, 2 AS t UNION ALL
          SELECT '"a":"4", "c":"5"' AS json, 1 AS t
          )
          SELECT data_row, TO_JSON_STRING(data_row) AS json_row
          FROM (
          SELECT
          ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
          ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
          ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
          FROM(
          SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
          JSON_EXTRACT_SCALAR(json,'$.b') AS b,
          JSON_EXTRACT_SCALAR(json,'$.c') AS c,
          t
          FROM test
          )
          ) AS data_row


          Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:



          Row data_row.a data_row.b data_row.c json_row 
          1 1 2 5 "a":"1","b":"2","c":"5"


          The problem with this query is that you have to specify all the documents (in this case, a,b,c).






          share|improve this answer























          • Thanks. This is also beautiful.

            – Cemo
            Mar 23 at 21:34















          1














          Here is a possible solution using BigQuery standard SQL functions, with your data:



          #standardSQL
          WITH test AS (
          SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
          SELECT '"b":"4"' AS json, 2 AS t UNION ALL
          SELECT '"a":"4", "c":"5"' AS json, 1 AS t
          )
          SELECT data_row, TO_JSON_STRING(data_row) AS json_row
          FROM (
          SELECT
          ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
          ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
          ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
          FROM(
          SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
          JSON_EXTRACT_SCALAR(json,'$.b') AS b,
          JSON_EXTRACT_SCALAR(json,'$.c') AS c,
          t
          FROM test
          )
          ) AS data_row


          Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:



          Row data_row.a data_row.b data_row.c json_row 
          1 1 2 5 "a":"1","b":"2","c":"5"


          The problem with this query is that you have to specify all the documents (in this case, a,b,c).






          share|improve this answer























          • Thanks. This is also beautiful.

            – Cemo
            Mar 23 at 21:34













          1












          1








          1







          Here is a possible solution using BigQuery standard SQL functions, with your data:



          #standardSQL
          WITH test AS (
          SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
          SELECT '"b":"4"' AS json, 2 AS t UNION ALL
          SELECT '"a":"4", "c":"5"' AS json, 1 AS t
          )
          SELECT data_row, TO_JSON_STRING(data_row) AS json_row
          FROM (
          SELECT
          ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
          ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
          ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
          FROM(
          SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
          JSON_EXTRACT_SCALAR(json,'$.b') AS b,
          JSON_EXTRACT_SCALAR(json,'$.c') AS c,
          t
          FROM test
          )
          ) AS data_row


          Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:



          Row data_row.a data_row.b data_row.c json_row 
          1 1 2 5 "a":"1","b":"2","c":"5"


          The problem with this query is that you have to specify all the documents (in this case, a,b,c).






          share|improve this answer













          Here is a possible solution using BigQuery standard SQL functions, with your data:



          #standardSQL
          WITH test AS (
          SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
          SELECT '"b":"4"' AS json, 2 AS t UNION ALL
          SELECT '"a":"4", "c":"5"' AS json, 1 AS t
          )
          SELECT data_row, TO_JSON_STRING(data_row) AS json_row
          FROM (
          SELECT
          ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
          ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
          ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
          FROM(
          SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
          JSON_EXTRACT_SCALAR(json,'$.b') AS b,
          JSON_EXTRACT_SCALAR(json,'$.c') AS c,
          t
          FROM test
          )
          ) AS data_row


          Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:



          Row data_row.a data_row.b data_row.c json_row 
          1 1 2 5 "a":"1","b":"2","c":"5"


          The problem with this query is that you have to specify all the documents (in this case, a,b,c).







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Mar 22 at 10:24









          enle linenle lin

          1,021210




          1,021210












          • Thanks. This is also beautiful.

            – Cemo
            Mar 23 at 21:34

















          • Thanks. This is also beautiful.

            – Cemo
            Mar 23 at 21:34
















          Thanks. This is also beautiful.

          – Cemo
          Mar 23 at 21:34





          Thanks. This is also beautiful.

          – Cemo
          Mar 23 at 21:34

















          draft saved

          draft discarded
















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid


          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.

          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55294662%2fbigquery-merge-json-documents%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          SQL error code 1064 with creating Laravel foreign keysForeign key constraints: When to use ON UPDATE and ON DELETEDropping column with foreign key Laravel error: General error: 1025 Error on renameLaravel SQL Can't create tableLaravel Migration foreign key errorLaravel php artisan migrate:refresh giving a syntax errorSQLSTATE[42S01]: Base table or view already exists or Base table or view already exists: 1050 Tableerror in migrating laravel file to xampp serverSyntax error or access violation: 1064:syntax to use near 'unsigned not null, modelName varchar(191) not null, title varchar(191) not nLaravel cannot create new table field in mysqlLaravel 5.7:Last migration creates table but is not registered in the migration table

          용인 삼성생명 블루밍스 목차 통계 역대 감독 선수단 응원단 경기장 같이 보기 외부 링크 둘러보기 메뉴samsungblueminx.comeh선수 명단용인 삼성생명 블루밍스용인 삼성생명 블루밍스ehsamsungblueminx.comeheheheh

          155 수학 과학 기타 둘러보기 메뉴eh추가해eh문서를 완성해