Checking whether elements of a tweets array contain one of the elements of a positive-words array, and counting the matches
We are building a sentiment analysis application, and we converted our tweets DataFrame to an array. We created another array consisting of positive words, but we cannot count the number of tweets containing one of those positive words. We tried the code below and got 1 as the result. It must be more than 1; apparently it did not count:
import scala.io.Source

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
var tweetDF = sqlContext.read.json("hdfs:///sandbox/tutorial-files/770/tweets_staging/*")
tweetDF.show()
var messages = tweetDF.select("msg").collect.map(_.toSeq)
println("Total messages: " + messages.size)
val positive = Source.fromFile("/home/teslavm/positive.txt").getLines.toArray
var happyCount = 0
for (e <- 0 until messages.size)
  for (f <- 0 until positive.size)
    if (messages(e).contains(positive(f)))
      happyCount = happyCount + 1
print("\nNumber of happy messages: " + happyCount)
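A minimal, Spark-free sketch of what goes wrong in the loop above (the message text and word are made-up stand-ins): `Row.toSeq` produces a `Seq[Any]` whose single element is the whole message string, and `Seq.contains` tests whole-element equality, not substrings, so a positive word inside a longer message is never counted.

```scala
// The Seq produced by Row.toSeq holds the entire message as ONE element.
val message: Seq[Any] = Seq("Yes I am happy")

// Whole-element equality: "happy" is not an element of the Seq, so false.
val exactMatch = message.contains("happy")

// A per-element substring check is what the counting loop actually needs.
val substringMatch = message.exists(_.toString.contains("happy"))

println(s"contains: $exactMatch, exists + contains: $substringMatch")
// prints "contains: false, exists + contains: true"
```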
scala apache-spark
Which error are you getting? BTW, it is not recommended to call collect on Spark: you lose all the advantages of distributed computing, and if the dataset is pretty big you would blow out the memory. – Luis Miguel Mejía Suárez, Mar 22 at 14:51
I edited my question. – user10856854, Mar 22 at 14:57
asked Mar 22 at 14:48 by user10856854 (edited Mar 22 at 15:12)
1 Answer
This should work. It has the advantage that you do not have to collect the result, as well as being more functional.

import scala.io.Source
import spark.implicits._ // needed for .as[String]

val messages = tweetDF.select("msg").as[String]

val positiveWords =
  Source
    .fromFile("/home/teslavm/positive.txt")
    .getLines
    .toList
    .map(word => word.toLowerCase)

def hasPositiveWords(message: String): Boolean = {
  val _message = message.toLowerCase
  positiveWords.exists(word => _message.contains(word))
}

val positiveMessages = messages.filter(hasPositiveWords _)
println(positiveMessages.count())
I tested this code locally with:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._
val tweetDF = List(
(1, "Yes I am happy"),
(2, "Sadness is a way of life"),
(3, "No, no, no, no, yes")
).toDF("id", "msg")
val positiveWords = List("yes", "happy")
And it worked.
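If the closure-based filter above still hits "Task not serializable" (see the comment below), a sketch of an alternative is to build the predicate out of Spark Column expressions, so no driver-side closure has to be shipped to the executors. This assumes the same toy DataFrame and the column name "msg" from the question; the word list is a stand-in for the file contents.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val tweetDF = List(
  (1, "Yes I am happy"),
  (2, "Sadness is a way of life"),
  (3, "No, no, no, no, yes")
).toDF("id", "msg")

val positiveWords = List("yes", "happy") // stand-in for the file contents

// One Column per word, OR-ed together; Spark evaluates this as an
// expression tree, so nothing from the driver needs to be serialized.
val hasPositive = positiveWords
  .map(word => lower(col("msg")).contains(word))
  .reduce(_ || _)

println(tweetDF.filter(hasPositive).count()) // 2: rows 1 and 3 match
```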
Gives error org.apache.spark.SparkException: Task not serializable. – user10856854, Mar 22 at 15:03
Can you provide a MCVE of how to create the tweetDF so I can test the code myself? It could be just the show of your actual DF. – Luis Miguel Mejía Suárez, Mar 22 at 15:08
I added it to my question. – user10856854, Mar 22 at 15:14
@büşratabak could you give it another try after the edit and see if it works? If not, could you please check with the simple tests I made? You can replace the positiveWords list with the one read from a file; it should work too. – Luis Miguel Mejía Suárez, Mar 22 at 15:34
answered Mar 22 at 15:00 by Luis Miguel Mejía Suárez (edited Mar 22 at 15:32)