BigQuery Merge JSon Documents Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern) The Ask Question Wizard is Live! Data science time! April 2019 and salary with experienceBigquery - json_extract all elements from an arrayHow can I “pretty” format my JSON output in Ruby on Rails?How do I format a Microsoft JSON date?Can comments be used in JSON?How can I pretty-print JSON in a shell script?What is the correct JSON content type?How can I deserialize JSON to a simple Dictionary<string,string> in ASP.NET?Why does Google prepend while(1); to their JSON responses?Parse JSON in JavaScript?Pretty-Printing JSON with PHPHow do I POST JSON data with Curl from a terminal/commandline to Test Spring REST?

Mortgage adviser recommends a longer term than necessary combined with overpayments

Who can trigger ship-wide alerts in Star Trek?

Is there folklore associating late breastfeeding with low intelligence and/or gullibility?

How to politely respond to generic emails requesting a PhD/job in my lab? Without wasting too much time

Why is "Captain Marvel" translated as male in Portugal?

What is the electric potential inside a point charge?

Does the STL have a way to apply a function before calling less than?

Single author papers against my advisor's will?

How can I protect witches in combat who wear limited clothing?

Keep going mode for require-package

Problem when applying foreach loop

Why does tar appear to skip file contents when output file is /dev/null?

Replacing HDD with SSD; what about non-APFS/APFS?

Estimated State payment too big --> money back; + 2018 Tax Reform

What is the largest species of polychaete?

What was the last x86 CPU that did not have the x87 floating-point unit built in?

How does modal jazz use chord progressions?

Using "nakedly" instead of "with nothing on"

Can I throw a sword that doesn't have the Thrown property at someone?

How to say that you spent the night with someone, you were only sleeping and nothing else?

New Order #5: where Fibonacci and Beatty meet at Wythoff

I'm thinking of a number

What did Darwin mean by 'squib' here?

Windows 10: How to Lock (not sleep) laptop on lid close?

BigQuery Merge JSon Documents

Announcing the arrival of Valued Associate #679: Cesar Manara

Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern)

The Ask Question Wizard is Live!

Data science time! April 2019 and salary with experienceBigquery - json_extract all elements from an arrayHow can I “pretty” format my JSON output in Ruby on Rails?How do I format a Microsoft JSON date?Can comments be used in JSON?How can I pretty-print JSON in a shell script?What is the correct JSON content type?How can I deserialize JSON to a simple Dictionary<string,string> in ASP.NET?Why does Google prepend while(1); to their JSON responses?Parse JSON in JavaScript?Pretty-Printing JSON with PHPHow do I POST JSON data with Curl from a terminal/commandline to Test Spring REST?

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;

I would like to have a table with a JSON column. This JSON column can include arbitrary documents. I would like to merge these documents according to their timestamp which is available in another column. Is there a way to merge these JSON documents by their timestamp?

Here is an example:

at t3 time a:"1", b:"2"
at t2 time b:"4"
at t1 time a:"4", c:"5"

I want to create a:"1", b:"2", c:"5" as output. Is it possible to this in BigQuery?

asked Mar 22 at 7:19

Cemo

2,92743765

add a comment |

Here is an example:

at t3 time a:"1", b:"2"
at t2 time b:"4"
at t1 time a:"4", c:"5"

I want to create a:"1", b:"2", c:"5" as output. Is it possible to this in BigQuery?

asked Mar 22 at 7:19

Cemo

2,92743765

add a comment |

Here is an example:

at t3 time a:"1", b:"2"
at t2 time b:"4"
at t1 time a:"4", c:"5"

I want to create a:"1", b:"2", c:"5" as output. Is it possible to this in BigQuery?

asked Mar 22 at 7:19

Cemo

2,92743765

Here is an example:

at t3 time a:"1", b:"2"
at t2 time b:"4"
at t1 time a:"4", c:"5"

I want to create a:"1", b:"2", c:"5" as output. Is it possible to this in BigQuery?

json google-bigquery

asked Mar 22 at 7:19

Cemo

2,92743765

asked Mar 22 at 7:19

Cemo

2,92743765

asked Mar 22 at 7:19

Cemo

2,92743765

asked Mar 22 at 7:19

Cemo

2,92743765

asked Mar 22 at 7:19

Cemo

2,92743765

add a comment |

3 Answers
3

active

oldest

votes

Below is for BigQuery Standard SQL

#standardSQL
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

Note: above solution generic and does not require knowing in advance attribute names (like a, b, etc.) rather it parse and extract whatever it will find. Obviously it relies on assumption of simple jsons like in your examples

You can test, play with above using sample data from your question as in example below

#standardSQL
WITH `project.dataset.table` AS (
 SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
 SELECT '"b":"4"', 2 UNION ALL 
 SELECT '"a":"4", "c":"5"', 1 
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

with result

Row json 
1 "a":"1", "b":"2", "c":"5"

Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below

#standardSQL
WITH `project.dataset.table` AS (
 SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
 SELECT '"b":"4"', 2 UNION ALL 
 SELECT '"a":"4", "c":"5"', 1 UNION ALL
 SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
 SELECT '"abc":"3", "vwy":"4"', 3 
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

with result

Row json 
1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"

answered Mar 22 at 11:27

Mikhail Berlyant

63.3k43874

This is really so beautiful. :)

– Cemo
Mar 23 at 21:33

add a comment |

There might be a better way, while the first idea occurred to me is:

Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')

Use JavaScript UDF to parse JSON / merge / output to string.

#standardSQL
CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
RETURNS STRING
LANGUAGE js
AS 
"""
 // TODO 1: split json string with '||||||' to get multiple parts
 // . 2: parse each json parts into object
 // 3: merge objects in your own way

 // fake output, just to demonstrate the idea
 var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
 return JSON.stringify(obj);
""";
WITH
 sample_data AS (
 SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
 UNION ALL
 SELECT 'b:"4"' AS json, 2000 AS timestamp
 UNION ALL 
 SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )

SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
FROM sample_data
GROUP BY timestamp

Output:

output

edited Mar 22 at 8:02

answered Mar 22 at 7:50

Yun Zhang

36710

very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

– Tamir Klein
Mar 22 at 8:41

Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

– Mikhail Berlyant
Mar 22 at 11:34

I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

– Yun Zhang
Mar 22 at 17:05

add a comment |

Here is a possible solution using BigQuery standard SQL functions, with your data:

#standardSQL
WITH test AS (
 SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
 SELECT '"b":"4"' AS json, 2 AS t UNION ALL 
 SELECT '"a":"4", "c":"5"' AS json, 1 AS t 
)
SELECT data_row, TO_JSON_STRING(data_row) AS json_row
FROM (
 SELECT 
 ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
 ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
 ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
 FROM(
 SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
 JSON_EXTRACT_SCALAR(json,'$.b') AS b,
 JSON_EXTRACT_SCALAR(json,'$.c') AS c, 
 t
 FROM test
 )
) AS data_row

Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:

Row data_row.a data_row.b data_row.c json_row 
1 1 2 5 "a":"1","b":"2","c":"5"

The problem with this query is that you have to specify all the documents (in this case, a,b,c).

answered Mar 22 at 10:24

enle lin

1,021210

Thanks. This is also beautiful.

– Cemo
Mar 23 at 21:34

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55294662%2fbigquery-merge-json-documents%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Below is for BigQuery Standard SQL

#standardSQL
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

You can test, play with above using sample data from your question as in example below

#standardSQL
WITH `project.dataset.table` AS (
 SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
 SELECT '"b":"4"', 2 UNION ALL 
 SELECT '"a":"4", "c":"5"', 1 
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

with result

Row json 
1 "a":"1", "b":"2", "c":"5"

Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below

#standardSQL
WITH `project.dataset.table` AS (
 SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
 SELECT '"b":"4"', 2 UNION ALL 
 SELECT '"a":"4", "c":"5"', 1 UNION ALL
 SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
 SELECT '"abc":"3", "vwy":"4"', 3 
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

with result

Row json 
1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"

answered Mar 22 at 11:27

Mikhail Berlyant

63.3k43874

This is really so beautiful. :)

– Cemo
Mar 23 at 21:33

add a comment |

Below is for BigQuery Standard SQL

#standardSQL
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

You can test, play with above using sample data from your question as in example below

#standardSQL
WITH `project.dataset.table` AS (
 SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
 SELECT '"b":"4"', 2 UNION ALL 
 SELECT '"a":"4", "c":"5"', 1 
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

with result

Row json 
1 "a":"1", "b":"2", "c":"5"

Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below

#standardSQL
WITH `project.dataset.table` AS (
 SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
 SELECT '"b":"4"', 2 UNION ALL 
 SELECT '"a":"4", "c":"5"', 1 UNION ALL
 SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
 SELECT '"abc":"3", "vwy":"4"', 3 
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

with result

Row json 
1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"

answered Mar 22 at 11:27

Mikhail Berlyant

63.3k43874

This is really so beautiful. :)

– Cemo
Mar 23 at 21:33

add a comment |

Below is for BigQuery Standard SQL

#standardSQL
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

You can test, play with above using sample data from your question as in example below

#standardSQL
WITH `project.dataset.table` AS (
 SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
 SELECT '"b":"4"', 2 UNION ALL 
 SELECT '"a":"4", "c":"5"', 1 
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

with result

Row json 
1 "a":"1", "b":"2", "c":"5"

Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below

#standardSQL
WITH `project.dataset.table` AS (
 SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
 SELECT '"b":"4"', 2 UNION ALL 
 SELECT '"a":"4", "c":"5"', 1 UNION ALL
 SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
 SELECT '"abc":"3", "vwy":"4"', 3 
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

with result

Row json 
1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"

answered Mar 22 at 11:27

Mikhail Berlyant

63.3k43874

Below is for BigQuery Standard SQL

#standardSQL
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

You can test, play with above using sample data from your question as in example below

#standardSQL
WITH `project.dataset.table` AS (
 SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
 SELECT '"b":"4"', 2 UNION ALL 
 SELECT '"a":"4", "c":"5"', 1 
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

with result

Row json 
1 "a":"1", "b":"2", "c":"5"

Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below

#standardSQL
WITH `project.dataset.table` AS (
 SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
 SELECT '"b":"4"', 2 UNION ALL 
 SELECT '"a":"4", "c":"5"', 1 UNION ALL
 SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
 SELECT '"abc":"3", "vwy":"4"', 3 
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
 SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
 FROM `project.dataset.table`, 
 UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
 GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)]) 
)

with result

Row json 
1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"

answered Mar 22 at 11:27

Mikhail Berlyant

63.3k43874

answered Mar 22 at 11:27

Mikhail Berlyant

63.3k43874

answered Mar 22 at 11:27

Mikhail Berlyant

63.3k43874

answered Mar 22 at 11:27

Mikhail Berlyant

63.3k43874

This is really so beautiful. :)

– Cemo
Mar 23 at 21:33

add a comment |

This is really so beautiful. :)

– Cemo
Mar 23 at 21:33

This is really so beautiful. :)

– Cemo
Mar 23 at 21:33

add a comment |

There might be a better way, while the first idea occurred to me is:

Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')

Use JavaScript UDF to parse JSON / merge / output to string.

#standardSQL
CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
RETURNS STRING
LANGUAGE js
AS 
"""
 // TODO 1: split json string with '||||||' to get multiple parts
 // . 2: parse each json parts into object
 // 3: merge objects in your own way

 // fake output, just to demonstrate the idea
 var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
 return JSON.stringify(obj);
""";
WITH
 sample_data AS (
 SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
 UNION ALL
 SELECT 'b:"4"' AS json, 2000 AS timestamp
 UNION ALL 
 SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )

SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
FROM sample_data
GROUP BY timestamp

Output:

output

edited Mar 22 at 8:02

answered Mar 22 at 7:50

Yun Zhang

36710

very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

– Tamir Klein
Mar 22 at 8:41

Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

– Mikhail Berlyant
Mar 22 at 11:34

I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

– Yun Zhang
Mar 22 at 17:05

add a comment |

There might be a better way, while the first idea occurred to me is:

Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')

Use JavaScript UDF to parse JSON / merge / output to string.

#standardSQL
CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
RETURNS STRING
LANGUAGE js
AS 
"""
 // TODO 1: split json string with '||||||' to get multiple parts
 // . 2: parse each json parts into object
 // 3: merge objects in your own way

 // fake output, just to demonstrate the idea
 var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
 return JSON.stringify(obj);
""";
WITH
 sample_data AS (
 SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
 UNION ALL
 SELECT 'b:"4"' AS json, 2000 AS timestamp
 UNION ALL 
 SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )

SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
FROM sample_data
GROUP BY timestamp

Output:

output

edited Mar 22 at 8:02

answered Mar 22 at 7:50

Yun Zhang

36710

very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

– Tamir Klein
Mar 22 at 8:41

Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

– Mikhail Berlyant
Mar 22 at 11:34

I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

– Yun Zhang
Mar 22 at 17:05

add a comment |

There might be a better way, while the first idea occurred to me is:

Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')

Use JavaScript UDF to parse JSON / merge / output to string.

#standardSQL
CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
RETURNS STRING
LANGUAGE js
AS 
"""
 // TODO 1: split json string with '||||||' to get multiple parts
 // . 2: parse each json parts into object
 // 3: merge objects in your own way

 // fake output, just to demonstrate the idea
 var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
 return JSON.stringify(obj);
""";
WITH
 sample_data AS (
 SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
 UNION ALL
 SELECT 'b:"4"' AS json, 2000 AS timestamp
 UNION ALL 
 SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )

SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
FROM sample_data
GROUP BY timestamp

Output:

output

edited Mar 22 at 8:02

answered Mar 22 at 7:50

Yun Zhang

36710

There might be a better way, while the first idea occurred to me is:

Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')

Use JavaScript UDF to parse JSON / merge / output to string.

#standardSQL
CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
RETURNS STRING
LANGUAGE js
AS 
"""
 // TODO 1: split json string with '||||||' to get multiple parts
 // . 2: parse each json parts into object
 // 3: merge objects in your own way

 // fake output, just to demonstrate the idea
 var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
 return JSON.stringify(obj);
""";
WITH
 sample_data AS (
 SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
 UNION ALL
 SELECT 'b:"4"' AS json, 2000 AS timestamp
 UNION ALL 
 SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )

SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
FROM sample_data
GROUP BY timestamp

Output:

output

edited Mar 22 at 8:02

answered Mar 22 at 7:50

Yun Zhang

36710

edited Mar 22 at 8:02

answered Mar 22 at 7:50

Yun Zhang

36710

answered Mar 22 at 7:50

Yun Zhang

36710

answered Mar 22 at 7:50

Yun Zhang

36710

very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

– Tamir Klein
Mar 22 at 8:41

Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

– Mikhail Berlyant
Mar 22 at 11:34

I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

– Yun Zhang
Mar 22 at 17:05

add a comment |

very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

– Tamir Klein
Mar 22 at 8:41

Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

– Mikhail Berlyant
Mar 22 at 11:34

I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

– Yun Zhang
Mar 22 at 17:05

very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958

– Tamir Klein
Mar 22 at 8:41

Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already

– Mikhail Berlyant
Mar 22 at 11:34

I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.

– Yun Zhang
Mar 22 at 17:05

add a comment |

Here is a possible solution using BigQuery standard SQL functions, with your data:

#standardSQL
WITH test AS (
 SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
 SELECT '"b":"4"' AS json, 2 AS t UNION ALL 
 SELECT '"a":"4", "c":"5"' AS json, 1 AS t 
)
SELECT data_row, TO_JSON_STRING(data_row) AS json_row
FROM (
 SELECT 
 ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
 ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
 ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
 FROM(
 SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
 JSON_EXTRACT_SCALAR(json,'$.b') AS b,
 JSON_EXTRACT_SCALAR(json,'$.c') AS c, 
 t
 FROM test
 )
) AS data_row

Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:

Row data_row.a data_row.b data_row.c json_row 
1 1 2 5 "a":"1","b":"2","c":"5"

The problem with this query is that you have to specify all the documents (in this case, a,b,c).

answered Mar 22 at 10:24

enle lin

1,021210

Thanks. This is also beautiful.

– Cemo
Mar 23 at 21:34

add a comment |

Here is a possible solution using BigQuery standard SQL functions, with your data:

#standardSQL
WITH test AS (
 SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
 SELECT '"b":"4"' AS json, 2 AS t UNION ALL 
 SELECT '"a":"4", "c":"5"' AS json, 1 AS t 
)
SELECT data_row, TO_JSON_STRING(data_row) AS json_row
FROM (
 SELECT 
 ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
 ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
 ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
 FROM(
 SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
 JSON_EXTRACT_SCALAR(json,'$.b') AS b,
 JSON_EXTRACT_SCALAR(json,'$.c') AS c, 
 t
 FROM test
 )
) AS data_row

Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:

Row data_row.a data_row.b data_row.c json_row 
1 1 2 5 "a":"1","b":"2","c":"5"

The problem with this query is that you have to specify all the documents (in this case, a,b,c).

answered Mar 22 at 10:24

enle lin

1,021210

Thanks. This is also beautiful.

– Cemo
Mar 23 at 21:34

add a comment |

Here is a possible solution using BigQuery standard SQL functions, with your data:

#standardSQL
WITH test AS (
 SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
 SELECT '"b":"4"' AS json, 2 AS t UNION ALL 
 SELECT '"a":"4", "c":"5"' AS json, 1 AS t 
)
SELECT data_row, TO_JSON_STRING(data_row) AS json_row
FROM (
 SELECT 
 ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
 ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
 ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
 FROM(
 SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
 JSON_EXTRACT_SCALAR(json,'$.b') AS b,
 JSON_EXTRACT_SCALAR(json,'$.c') AS c, 
 t
 FROM test
 )
) AS data_row

Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:

Row data_row.a data_row.b data_row.c json_row 
1 1 2 5 "a":"1","b":"2","c":"5"

The problem with this query is that you have to specify all the documents (in this case, a,b,c).

answered Mar 22 at 10:24

enle lin

1,021210

Here is a possible solution using BigQuery standard SQL functions, with your data:

#standardSQL
WITH test AS (
 SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
 SELECT '"b":"4"' AS json, 2 AS t UNION ALL 
 SELECT '"a":"4", "c":"5"' AS json, 1 AS t 
)
SELECT data_row, TO_JSON_STRING(data_row) AS json_row
FROM (
 SELECT 
 ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
 ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
 ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
 FROM(
 SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
 JSON_EXTRACT_SCALAR(json,'$.b') AS b,
 JSON_EXTRACT_SCALAR(json,'$.c') AS c, 
 t
 FROM test
 )
) AS data_row

Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:

Row data_row.a data_row.b data_row.c json_row 
1 1 2 5 "a":"1","b":"2","c":"5"

The problem with this query is that you have to specify all the documents (in this case, a,b,c).

answered Mar 22 at 10:24

enle lin

1,021210

answered Mar 22 at 10:24

enle lin

1,021210

answered Mar 22 at 10:24

enle lin

1,021210

answered Mar 22 at 10:24

enle lin

1,021210

Thanks. This is also beautiful.

– Cemo
Mar 23 at 21:34

add a comment |

Thanks. This is also beautiful.

– Cemo
Mar 23 at 21:34

Thanks. This is also beautiful.

– Cemo
Mar 23 at 21:34

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Styjun

3 Answers
3

Your Answer

Post as a guest

3 Answers
3

3 Answers
3

Post as a guest

Popular posts from this blog

밀양 대씨 역사 각주 함께 보기 둘러보기 메뉴밀양 대씨

1973년 목차 사건 문화 탄생 사망 노벨상 달력 둘러보기 메뉴

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

3 Answers 3

3 Answers 3

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

밀양 대씨 역사 각주 함께 보기 둘러보기 메뉴밀양 대씨

1973년 목차 사건 문화 탄생 사망 노벨상 달력 둘러보기 메뉴

3 Answers
3

3 Answers
3

3 Answers
3