BigQuery Merge JSon Documents Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern) The Ask Question Wizard is Live! Data science time! April 2019 and salary with experienceBigquery - json_extract all elements from an arrayHow can I “pretty” format my JSON output in Ruby on Rails?How do I format a Microsoft JSON date?Can comments be used in JSON?How can I pretty-print JSON in a shell script?What is the correct JSON content type?How can I deserialize JSON to a simple Dictionary<string,string> in ASP.NET?Why does Google prepend while(1); to their JSON responses?Parse JSON in JavaScript?Pretty-Printing JSON with PHPHow do I POST JSON data with Curl from a terminal/commandline to Test Spring REST?
Mortgage adviser recommends a longer term than necessary combined with overpayments
Who can trigger ship-wide alerts in Star Trek?
Is there folklore associating late breastfeeding with low intelligence and/or gullibility?
How to politely respond to generic emails requesting a PhD/job in my lab? Without wasting too much time
Why is "Captain Marvel" translated as male in Portugal?
What is the electric potential inside a point charge?
Does the STL have a way to apply a function before calling less than?
Single author papers against my advisor's will?
How can I protect witches in combat who wear limited clothing?
Keep going mode for require-package
Problem when applying foreach loop
Why does tar appear to skip file contents when output file is /dev/null?
Replacing HDD with SSD; what about non-APFS/APFS?
Estimated State payment too big --> money back; + 2018 Tax Reform
What is the largest species of polychaete?
What was the last x86 CPU that did not have the x87 floating-point unit built in?
How does modal jazz use chord progressions?
Using "nakedly" instead of "with nothing on"
Can I throw a sword that doesn't have the Thrown property at someone?
How to say that you spent the night with someone, you were only sleeping and nothing else?
New Order #5: where Fibonacci and Beatty meet at Wythoff
I'm thinking of a number
What did Darwin mean by 'squib' here?
Windows 10: How to Lock (not sleep) laptop on lid close?
BigQuery Merge JSon Documents
Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern)
The Ask Question Wizard is Live!
Data science time! April 2019 and salary with experienceBigquery - json_extract all elements from an arrayHow can I “pretty” format my JSON output in Ruby on Rails?How do I format a Microsoft JSON date?Can comments be used in JSON?How can I pretty-print JSON in a shell script?What is the correct JSON content type?How can I deserialize JSON to a simple Dictionary<string,string> in ASP.NET?Why does Google prepend while(1); to their JSON responses?Parse JSON in JavaScript?Pretty-Printing JSON with PHPHow do I POST JSON data with Curl from a terminal/commandline to Test Spring REST?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;
I would like to have a table with a JSON column. This JSON column can include arbitrary documents. I would like to merge these documents according to their timestamp which is available in another column. Is there a way to merge these JSON documents by their timestamp?
Here is an example:
at t3 time a:"1", b:"2"at t2 time b:"4"at t1 time a:"4", c:"5"
I want to create a:"1", b:"2", c:"5" as output. Is it possible to this in BigQuery?
json google-bigquery
add a comment |
I would like to have a table with a JSON column. This JSON column can include arbitrary documents. I would like to merge these documents according to their timestamp which is available in another column. Is there a way to merge these JSON documents by their timestamp?
Here is an example:
at t3 time a:"1", b:"2"at t2 time b:"4"at t1 time a:"4", c:"5"
I want to create a:"1", b:"2", c:"5" as output. Is it possible to this in BigQuery?
json google-bigquery
add a comment |
I would like to have a table with a JSON column. This JSON column can include arbitrary documents. I would like to merge these documents according to their timestamp which is available in another column. Is there a way to merge these JSON documents by their timestamp?
Here is an example:
at t3 time a:"1", b:"2"at t2 time b:"4"at t1 time a:"4", c:"5"
I want to create a:"1", b:"2", c:"5" as output. Is it possible to this in BigQuery?
json google-bigquery
I would like to have a table with a JSON column. This JSON column can include arbitrary documents. I would like to merge these documents according to their timestamp which is available in another column. Is there a way to merge these JSON documents by their timestamp?
Here is an example:
at t3 time a:"1", b:"2"at t2 time b:"4"at t1 time a:"4", c:"5"
I want to create a:"1", b:"2", c:"5" as output. Is it possible to this in BigQuery?
json google-bigquery
json google-bigquery
asked Mar 22 at 7:19
CemoCemo
2,92743765
2,92743765
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
Below is for BigQuery Standard SQL
#standardSQL
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
Note: above solution generic and does not require knowing in advance attribute names (like a, b, etc.) rather it parse and extract whatever it will find. Obviously it relies on assumption of simple jsons like in your examples
You can test, play with above using sample data from your question as in example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
SELECT '"b":"4"', 2 UNION ALL
SELECT '"a":"4", "c":"5"', 1
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
with result
Row json
1 "a":"1", "b":"2", "c":"5"
Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
SELECT '"b":"4"', 2 UNION ALL
SELECT '"a":"4", "c":"5"', 1 UNION ALL
SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
SELECT '"abc":"3", "vwy":"4"', 3
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
with result
Row json
1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"
This is really so beautiful. :)
– Cemo
Mar 23 at 21:33
add a comment |
There might be a better way, while the first idea occurred to me is:
- Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')
- Use JavaScript UDF to parse JSON / merge / output to string.
#standardSQL
CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
RETURNS STRING
LANGUAGE js
AS
"""
// TODO 1: split json string with '||||||' to get multiple parts
// . 2: parse each json parts into object
// 3: merge objects in your own way
// fake output, just to demonstrate the idea
var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
return JSON.stringify(obj);
""";
WITH
sample_data AS (
SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
UNION ALL
SELECT 'b:"4"' AS json, 2000 AS timestamp
UNION ALL
SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )
SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
FROM sample_data
GROUP BY timestamp
Output:

very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958
– Tamir Klein
Mar 22 at 8:41
Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already
– Mikhail Berlyant
Mar 22 at 11:34
I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.
– Yun Zhang
Mar 22 at 17:05
add a comment |
Here is a possible solution using BigQuery standard SQL functions, with your data:
#standardSQL
WITH test AS (
SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
SELECT '"b":"4"' AS json, 2 AS t UNION ALL
SELECT '"a":"4", "c":"5"' AS json, 1 AS t
)
SELECT data_row, TO_JSON_STRING(data_row) AS json_row
FROM (
SELECT
ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
FROM(
SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
JSON_EXTRACT_SCALAR(json,'$.b') AS b,
JSON_EXTRACT_SCALAR(json,'$.c') AS c,
t
FROM test
)
) AS data_row
Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:
Row data_row.a data_row.b data_row.c json_row
1 1 2 5 "a":"1","b":"2","c":"5"
The problem with this query is that you have to specify all the documents (in this case, a,b,c).
Thanks. This is also beautiful.
– Cemo
Mar 23 at 21:34
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55294662%2fbigquery-merge-json-documents%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Below is for BigQuery Standard SQL
#standardSQL
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
Note: above solution generic and does not require knowing in advance attribute names (like a, b, etc.) rather it parse and extract whatever it will find. Obviously it relies on assumption of simple jsons like in your examples
You can test, play with above using sample data from your question as in example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
SELECT '"b":"4"', 2 UNION ALL
SELECT '"a":"4", "c":"5"', 1
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
with result
Row json
1 "a":"1", "b":"2", "c":"5"
Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
SELECT '"b":"4"', 2 UNION ALL
SELECT '"a":"4", "c":"5"', 1 UNION ALL
SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
SELECT '"abc":"3", "vwy":"4"', 3
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
with result
Row json
1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"
This is really so beautiful. :)
– Cemo
Mar 23 at 21:33
add a comment |
Below is for BigQuery Standard SQL
#standardSQL
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
Note: above solution generic and does not require knowing in advance attribute names (like a, b, etc.) rather it parse and extract whatever it will find. Obviously it relies on assumption of simple jsons like in your examples
You can test, play with above using sample data from your question as in example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
SELECT '"b":"4"', 2 UNION ALL
SELECT '"a":"4", "c":"5"', 1
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
with result
Row json
1 "a":"1", "b":"2", "c":"5"
Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
SELECT '"b":"4"', 2 UNION ALL
SELECT '"a":"4", "c":"5"', 1 UNION ALL
SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
SELECT '"abc":"3", "vwy":"4"', 3
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
with result
Row json
1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"
This is really so beautiful. :)
– Cemo
Mar 23 at 21:33
add a comment |
Below is for BigQuery Standard SQL
#standardSQL
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
Note: above solution generic and does not require knowing in advance attribute names (like a, b, etc.) rather it parse and extract whatever it will find. Obviously it relies on assumption of simple jsons like in your examples
You can test, play with above using sample data from your question as in example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
SELECT '"b":"4"', 2 UNION ALL
SELECT '"a":"4", "c":"5"', 1
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
with result
Row json
1 "a":"1", "b":"2", "c":"5"
Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
SELECT '"b":"4"', 2 UNION ALL
SELECT '"a":"4", "c":"5"', 1 UNION ALL
SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
SELECT '"abc":"3", "vwy":"4"', 3
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
with result
Row json
1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"
Below is for BigQuery Standard SQL
#standardSQL
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
Note: above solution generic and does not require knowing in advance attribute names (like a, b, etc.) rather it parse and extract whatever it will find. Obviously it relies on assumption of simple jsons like in your examples
You can test, play with above using sample data from your question as in example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
SELECT '"b":"4"', 2 UNION ALL
SELECT '"a":"4", "c":"5"', 1
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
with result
Row json
1 "a":"1", "b":"2", "c":"5"
Because (as I mentioned) it is generic enough - you can add rows with more attributes w/o changing code like in below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '"a":"1", "b":"2"' json, 3 t UNION ALL
SELECT '"b":"4"', 2 UNION ALL
SELECT '"a":"4", "c":"5"', 1 UNION ALL
SELECT '"abc":"1", "xyz":"2"', 3 UNION ALL
SELECT '"abc":"3", "vwy":"4"', 3
)
SELECT STRING_AGG(y, ', ' ORDER BY y) json
FROM (
SELECT STRING_AGG(TRIM(x) ORDER BY t DESC LIMIT 1) y
FROM `project.dataset.table`,
UNNEST(SPLIT(REGEXP_REPLACE(json, r'', ''))) x
GROUP BY TRIM(SPLIT(x, ':')[OFFSET(0)])
)
with result
Row json
1 "a":"1", "abc":"1", "b":"2", "c":"5", "vwy":"4", "xyz":"2"
answered Mar 22 at 11:27
Mikhail BerlyantMikhail Berlyant
63.3k43874
63.3k43874
This is really so beautiful. :)
– Cemo
Mar 23 at 21:33
add a comment |
This is really so beautiful. :)
– Cemo
Mar 23 at 21:33
This is really so beautiful. :)
– Cemo
Mar 23 at 21:33
This is really so beautiful. :)
– Cemo
Mar 23 at 21:33
add a comment |
There might be a better way, while the first idea occurred to me is:
- Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')
- Use JavaScript UDF to parse JSON / merge / output to string.
#standardSQL
CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
RETURNS STRING
LANGUAGE js
AS
"""
// TODO 1: split json string with '||||||' to get multiple parts
// . 2: parse each json parts into object
// 3: merge objects in your own way
// fake output, just to demonstrate the idea
var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
return JSON.stringify(obj);
""";
WITH
sample_data AS (
SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
UNION ALL
SELECT 'b:"4"' AS json, 2000 AS timestamp
UNION ALL
SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )
SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
FROM sample_data
GROUP BY timestamp
Output:

very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958
– Tamir Klein
Mar 22 at 8:41
Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already
– Mikhail Berlyant
Mar 22 at 11:34
I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.
– Yun Zhang
Mar 22 at 17:05
add a comment |
There might be a better way, while the first idea occurred to me is:
- Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')
- Use JavaScript UDF to parse JSON / merge / output to string.
#standardSQL
CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
RETURNS STRING
LANGUAGE js
AS
"""
// TODO 1: split json string with '||||||' to get multiple parts
// . 2: parse each json parts into object
// 3: merge objects in your own way
// fake output, just to demonstrate the idea
var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
return JSON.stringify(obj);
""";
WITH
sample_data AS (
SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
UNION ALL
SELECT 'b:"4"' AS json, 2000 AS timestamp
UNION ALL
SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )
SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
FROM sample_data
GROUP BY timestamp
Output:

very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958
– Tamir Klein
Mar 22 at 8:41
Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already
– Mikhail Berlyant
Mar 22 at 11:34
I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.
– Yun Zhang
Mar 22 at 17:05
add a comment |
There might be a better way, while the first idea occurred to me is:
- Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')
- Use JavaScript UDF to parse JSON / merge / output to string.
#standardSQL
CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
RETURNS STRING
LANGUAGE js
AS
"""
// TODO 1: split json string with '||||||' to get multiple parts
// . 2: parse each json parts into object
// 3: merge objects in your own way
// fake output, just to demonstrate the idea
var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
return JSON.stringify(obj);
""";
WITH
sample_data AS (
SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
UNION ALL
SELECT 'b:"4"' AS json, 2000 AS timestamp
UNION ALL
SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )
SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
FROM sample_data
GROUP BY timestamp
Output:

There might be a better way, while the first idea occurred to me is:
- Aggregate multiple json string using STRING_AGG with unique separator (like in below example: '||||||')
- Use JavaScript UDF to parse JSON / merge / output to string.
#standardSQL
CREATE TEMPORARY FUNCTION merge_json(json_string STRING)
RETURNS STRING
LANGUAGE js
AS
"""
// TODO 1: split json string with '||||||' to get multiple parts
// . 2: parse each json parts into object
// 3: merge objects in your own way
// fake output, just to demonstrate the idea
var obj = JSON.parse('"a":"1", "b":"2", "c":"5"')
return JSON.stringify(obj);
""";
WITH
sample_data AS (
SELECT 'a:"1", b:"2"' AS json, 1000 AS timestamp
UNION ALL
SELECT 'b:"4"' AS json, 2000 AS timestamp
UNION ALL
SELECT 'a:"4", c:"5"' AS json, 1000 AS timestamp )
SELECT timestamp, merge_json(STRING_AGG(json, '||||||')) as joined_json
FROM sample_data
GROUP BY timestamp
Output:

edited Mar 22 at 8:02
answered Mar 22 at 7:50
Yun ZhangYun Zhang
36710
36710
very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958
– Tamir Klein
Mar 22 at 8:41
Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already
– Mikhail Berlyant
Mar 22 at 11:34
I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.
– Yun Zhang
Mar 22 at 17:05
add a comment |
very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958
– Tamir Klein
Mar 22 at 8:41
Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already
– Mikhail Berlyant
Mar 22 at 11:34
I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.
– Yun Zhang
Mar 22 at 17:05
very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958
– Tamir Klein
Mar 22 at 8:41
very nice, more example of similar solution can be found in this question stackoverflow.com/q/52120182/1031958
– Tamir Klein
Mar 22 at 8:41
Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already
– Mikhail Berlyant
Mar 22 at 11:34
Fake output? Fake answers? Really - what is nice about it? Curious about this type of the answers! I saw few of such already
– Mikhail Berlyant
Mar 22 at 11:34
I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.
– Yun Zhang
Mar 22 at 17:05
I feel sorry about @MikhailBerlyant seems offended by a "fake" answer. While the consideration behind this answer is, it opens a door to handle much more complicated input with more complicated aggregation method. The original question mentioned it is to deal with "arbitrary documents" which can easily be nested data structure. REGEXP_REPLACE is going to have hard time to go further.
– Yun Zhang
Mar 22 at 17:05
add a comment |
Here is a possible solution using BigQuery standard SQL functions, with your data:
#standardSQL
WITH test AS (
SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
SELECT '"b":"4"' AS json, 2 AS t UNION ALL
SELECT '"a":"4", "c":"5"' AS json, 1 AS t
)
SELECT data_row, TO_JSON_STRING(data_row) AS json_row
FROM (
SELECT
ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
FROM(
SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
JSON_EXTRACT_SCALAR(json,'$.b') AS b,
JSON_EXTRACT_SCALAR(json,'$.c') AS c,
t
FROM test
)
) AS data_row
Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:
Row data_row.a data_row.b data_row.c json_row
1 1 2 5 "a":"1","b":"2","c":"5"
The problem with this query is that you have to specify all the documents (in this case, a,b,c).
Thanks. This is also beautiful.
– Cemo
Mar 23 at 21:34
add a comment |
Here is a possible solution using BigQuery standard SQL functions, with your data:
#standardSQL
WITH test AS (
SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
SELECT '"b":"4"' AS json, 2 AS t UNION ALL
SELECT '"a":"4", "c":"5"' AS json, 1 AS t
)
SELECT data_row, TO_JSON_STRING(data_row) AS json_row
FROM (
SELECT
ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
FROM(
SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
JSON_EXTRACT_SCALAR(json,'$.b') AS b,
JSON_EXTRACT_SCALAR(json,'$.c') AS c,
t
FROM test
)
) AS data_row
Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:
Row data_row.a data_row.b data_row.c json_row
1 1 2 5 "a":"1","b":"2","c":"5"
The problem with this query is that you have to specify all the documents (in this case, a,b,c).
Thanks. This is also beautiful.
– Cemo
Mar 23 at 21:34
add a comment |
Here is a possible solution using BigQuery standard SQL functions, with your data:
#standardSQL
WITH test AS (
SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
SELECT '"b":"4"' AS json, 2 AS t UNION ALL
SELECT '"a":"4", "c":"5"' AS json, 1 AS t
)
SELECT data_row, TO_JSON_STRING(data_row) AS json_row
FROM (
SELECT
ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
FROM(
SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
JSON_EXTRACT_SCALAR(json,'$.b') AS b,
JSON_EXTRACT_SCALAR(json,'$.c') AS c,
t
FROM test
)
) AS data_row
Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:
Row data_row.a data_row.b data_row.c json_row
1 1 2 5 "a":"1","b":"2","c":"5"
The problem with this query is that you have to specify all the documents (in this case, a,b,c).
Here is a possible solution using BigQuery standard SQL functions, with your data:
#standardSQL
WITH test AS (
SELECT '"a":"1", "b":"2"' AS json, 3 AS t UNION ALL
SELECT '"b":"4"' AS json, 2 AS t UNION ALL
SELECT '"a":"4", "c":"5"' AS json, 1 AS t
)
SELECT data_row, TO_JSON_STRING(data_row) AS json_row
FROM (
SELECT
ARRAY_TO_STRING(ARRAY_AGG(a IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS a,
ARRAY_TO_STRING(ARRAY_AGG(b IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS b,
ARRAY_TO_STRING(ARRAY_AGG(c IGNORE NULLS ORDER BY t DESC LIMIT 1),'') AS c
FROM(
SELECT JSON_EXTRACT_SCALAR(json,'$.a') AS a,
JSON_EXTRACT_SCALAR(json,'$.b') AS b,
JSON_EXTRACT_SCALAR(json,'$.c') AS c,
t
FROM test
)
) AS data_row
Notice that ARRAY_AGG is used only to find the latest no NULL value for each document, so it is transformed to STRING with ARRAY_TO_STRING. The result of this query is, which should be desired:
Row data_row.a data_row.b data_row.c json_row
1 1 2 5 "a":"1","b":"2","c":"5"
The problem with this query is that you have to specify all the documents (in this case, a,b,c).
answered Mar 22 at 10:24
enle linenle lin
1,021210
1,021210
Thanks. This is also beautiful.
– Cemo
Mar 23 at 21:34
add a comment |
Thanks. This is also beautiful.
– Cemo
Mar 23 at 21:34
Thanks. This is also beautiful.
– Cemo
Mar 23 at 21:34
Thanks. This is also beautiful.
– Cemo
Mar 23 at 21:34
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55294662%2fbigquery-merge-json-documents%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown