Avro data is not converted Spark
I wrote one column of a Spark data frame to Kafka in Avro format. I then read the data back from that topic and convert it from Avro to a data frame column. The column is a timestamp, but instead of the timestamps from the database I get default values:
1970-01-01 00:00:00
1970-01-01 00:00:00
1970-01-01 00:00:00
1970-01-01 00:00:00
1970-01-01 00:00:00
1970-01-01 00:00:00
1970-01-01 00:00:00
1970-01-01 00:00:00
1970-01-01 00:00:00
1970-01-01 00:00:00
The same behavior occurs with columns of other data types, such as String. The original timestamp values look like this, and this is the result I want to get back:
2019-03-19 12:26:03.003
2019-03-19 12:26:09
2019-03-19 12:27:04.003
2019-03-19 12:27:08.007
2019-03-19 12:28:01.013
2019-03-19 12:28:05.007
2019-03-19 12:28:09.023
2019-03-19 12:29:04.003
2019-03-19 12:29:07.047
2019-03-19 12:30:00.003
And here is the same data after conversion to Avro:
00 F0 E1 9B BC B3 9C C2 05
00 80 E9 F7 C1 B3 9C C2 05
00 F0 86 B2 F6 B3 9C C2 05
00 B0 E9 9A FA B3 9C C2 05
00 90 A4 E1 AC B4 9C C2 05
00 B0 EA C8 B0 B4 9C C2 05
00 B0 88 B3 B4 B4 9C C2 05
00 F0 BE EA E8 B4 9C C2 05
00 B0 89 DE EB B4 9C C2 05
00 F0 B6 9E 9E B5 9C C2 05
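To check what these bytes actually contain, here is a minimal sketch that decodes them by hand. It assumes the dump is standard Avro binary encoding and that each record is a union branch index followed by a long; both are my reading of the bytes, not something I have confirmed. The object and method names (AvroBytesCheck, parseHex, readLong, decode) are hypothetical helpers, not part of spark-avro.

```scala
import java.time.Instant

object AvroBytesCheck {
  /** Parse a hex dump such as "00 F0 E1" into raw bytes. */
  def parseHex(dump: String): Array[Byte] =
    dump.trim.split("\\s+").map(h => Integer.parseInt(h, 16).toByte)

  /** Decode one Avro long (zigzag varint) starting at `offset`.
    * Returns the decoded value and the offset of the next unread byte. */
  def readLong(bytes: Array[Byte], offset: Int): (Long, Int) = {
    var pos = offset
    var shift = 0
    var raw = 0L
    var more = true
    while (more) {
      val b = bytes(pos) & 0xff
      raw |= (b & 0x7fL) << shift   // low 7 bits carry the payload
      shift += 7
      pos += 1
      more = (b & 0x80) != 0        // high bit set means another byte follows
    }
    ((raw >>> 1) ^ -(raw & 1L), pos) // undo the zigzag encoding
  }

  /** Assumption: record = [union branch index][long], with the long read as
    * microseconds since the epoch. */
  def decode(dump: String): (Long, Instant) = {
    val bytes = parseHex(dump)
    val (branch, next) = readLong(bytes, 0)
    val (micros, _) = readLong(bytes, next)
    val instant = Instant.ofEpochSecond(
      Math.floorDiv(micros, 1000000L),
      Math.floorMod(micros, 1000000L) * 1000L)
    (branch, instant)
  }
}

// AvroBytesCheck.decode("00 F0 E1 9B BC B3 9C C2 05")
// → (0, 2019-03-19T12:26:03.003Z)
```

Read this way, the first record holds branch index 0 followed by 1552998363003000, which matches the original 2019-03-19 12:26:03.003 only when taken as microseconds, not milliseconds. Every record also starts with a 00 byte, which would decode to 0 (1970-01-01 00:00:00) if a reader took it for the value itself. That last point is an inference from the bytes and may not be the whole story.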
What can I do to fix this conversion problem?
Here is the code that writes Avro into Kafka, reads it back, and converts it to a data frame column. I used the to_avro and from_avro methods from spark-avro:
import org.apache.spark.sql.avro._

val castDF = testDataDF.select(to_avro(testDataDF.col("update_database_time")) as 'value)

castDF
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrapServers)
  .option("topic", "app_state_test")
  .save()

val cachedDf = spark
  .read
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrapServers)
  .option("subscribe", "app_state_test")
  .load()

val jsonSchema = """{"name": "update_database_time", "type": "long", "logicalType": "timestamp-millis", "default": "NONE"}"""

cachedDf.select(from_avro(cachedDf.col("value"), jsonSchema) as 'test)
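One thing that may be worth trying is a reader schema that mirrors the bytes in the dump. Decoding the first record by hand as Avro zigzag varints gives a leading 0 followed by 1552998363003000, which lines up with 2019-03-19 12:26:03.003 only as microseconds. The sketch below therefore assumes the writer produced a union whose branch 0 is a long tagged timestamp-micros. The name ReaderSchemaSketch is hypothetical, and the schema is untested against this topic.

```scala
object ReaderSchemaSketch {
  // Assumed reader schema: a union with the timestamp in branch 0 and null in
  // branch 1, using timestamp-micros instead of timestamp-millis.
  val readerSchema: String =
    """[{"type": "long", "logicalType": "timestamp-micros"}, "null"]"""
}

// Same call as in the question, with only the schema string swapped:
// cachedDf.select(from_avro(cachedDf.col("value"), ReaderSchemaSketch.readerSchema) as 'test)
```

Unlike the schema in the question, this one is a bare type with no "name" or "default" keys, since those belong to a field inside a record, not to a standalone type.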
apache-spark apache-kafka spark-avro
asked Mar 27 at 13:49
Cassie