Checking if elements of a tweets array contain one of the elements of positive words array and count



We are building a sentiment analysis application and have converted our tweets DataFrame to an array. We created another array of positive words, but we cannot count the number of tweets that contain at least one of those words. With the code below we get 1 as the result, although it should be more than 1. Apparently the matches are not being counted:



    import scala.io.Source

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    var tweetDF = sqlContext.read.json("hdfs:///sandbox/tutorial-files/770/tweets_staging/*")
    tweetDF.show()
    // Each collected Row becomes a Seq[Any] holding the single msg value
    var messages = tweetDF.select("msg").collect.map(_.toSeq)
    println("Total messages: " + messages.size)
    val positive = Source.fromFile("/home/teslavm/positive.txt").getLines.toArray
    var happyCount = 0
    for (e <- 0 until messages.size)
      for (f <- 0 until positive.size)
        if (messages(e).contains(positive(f)))
          happyCount = happyCount + 1

    print("\nNumber of happy messages: " + happyCount)


scala apache-spark





asked Mar 22 at 14:48 by user10856854
edited Mar 22 at 15:12

  • Which error are you getting? BTW, calling collect is not recommended in Spark: you lose all the advantages of distributed computing, and if the dataset is large you can run out of memory on the driver.

    – Luis Miguel Mejía Suárez
    Mar 22 at 14:51

  • I edited my question.

    – user10856854
    Mar 22 at 14:57

1 Answer

This should work.
It has the advantage that you do not have to collect the DataFrame to the driver, and it is more functional in style.



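    import scala.io.Source
    // Provides the Encoder needed by .as[String]; assumes a SparkSession
    // named spark, as in the test setup below
    import spark.implicits._

    val messages = tweetDF.select("msg").as[String]

    // Lowercase the word list once so that matching is case-insensitive
    val positiveWords =
      Source
        .fromFile("/home/teslavm/positive.txt")
        .getLines
        .toList
        .map(word => word.toLowerCase)

    // True if the message contains at least one positive word as a substring
    def hasPositiveWords(message: String): Boolean = {
      val _message = message.toLowerCase
      positiveWords.exists(word => _message.contains(word))
    }

    val positiveMessages = messages.filter(hasPositiveWords _)

    println(positiveMessages.count())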


I tested this code locally with:



    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    val tweetDF = List(
      (1, "Yes I am happy"),
      (2, "Sadness is a way of life"),
      (3, "No, no, no, no, yes")
    ).toDF("id", "msg")

    val positiveWords = List("yes", "happy")


And it worked.
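
A likely reason the original loop printed 1 (an assumption based on the posted code, not something confirmed in the thread): collect.map(_.toSeq) turns each row into a Seq[Any] whose only element is the whole tweet, and Seq.contains tests element equality rather than substring containment. The loop also adds one per (message, word) pair instead of one per message. Here is a minimal plain-Scala sketch of the difference, with no Spark involved. PositiveCount and countPositive are hypothetical names used only for illustration.

```scala
object PositiveCount {
  // Counts the messages that contain at least one positive word,
  // using a case-insensitive substring match. Each message counts once.
  def countPositive(messages: Seq[String], positiveWords: Seq[String]): Int = {
    // Drop empty entries: every string contains "", so a blank line
    // in the word file would otherwise match every message.
    val words = positiveWords.map(_.trim.toLowerCase).filter(_.nonEmpty)
    messages.count { message =>
      val lower = message.toLowerCase
      words.exists(word => lower.contains(word))
    }
  }

  def main(args: Array[String]): Unit = {
    // Shape of one collected row after _.toSeq: a single element, the tweet
    val row: Seq[Any] = Seq("Yes I am happy")

    // Seq.contains compares whole elements, so this is false
    println(row.contains("happy"))
    // String.contains looks for a substring, so this is true
    println("Yes I am happy".contains("happy"))

    val messages = Seq("Yes I am happy", "Sadness is a way of life", "No, no, no, no, yes")
    println(countPositive(messages, Seq("yes", "happy")))
  }
}
```

With the three test messages used above, two of them contain "yes" or "happy", so the count is 2.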





answered Mar 22 at 15:00 by Luis Miguel Mejía Suárez
edited Mar 22 at 15:32

  • This gives the error org.apache.spark.SparkException: Task not serializable.

    – user10856854
    Mar 22 at 15:03

  • Can you provide an MCVE showing how to create tweetDF, so I can test the code myself? It could be just the output of show on your actual DF.

    – Luis Miguel Mejía Suárez
    Mar 22 at 15:08

  • I added it to my question.

    – user10856854
    Mar 22 at 15:14

  • @büşratabak could you give it another try after the edit and see if it works? If not, could you please check with the simple test I made? You can replace the positiveWords list with the one read from the file; it should work too.

    – Luis Miguel Mejía Suárez
    Mar 22 at 15:34
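
On the Task not serializable error mentioned above: it often means that the function passed to filter captured something from the enclosing scope that Spark cannot serialize. That is a guess here, since the thread does not show the stack trace. One way to sidestep user closures entirely, sketched below under that assumption, is to express the match as a column expression using lower and rlike from Spark SQL. The helper PositivePattern.build is a hypothetical name; it only builds the regular expression and runs without Spark. The Spark call is shown in comments because it needs the tweetDF from the question.

```scala
import java.util.regex.Pattern

object PositivePattern {
  // Builds an alternation such as (\Qyes\E|\Qhappy\E) from the word list.
  // Pattern.quote escapes each word so it is matched literally.
  // Note: an empty word list yields "()", which matches every string.
  def build(words: Seq[String]): String =
    words
      .map(_.trim.toLowerCase)
      .filter(_.nonEmpty)
      .map(word => Pattern.quote(word))
      .mkString("(", "|", ")")

  def main(args: Array[String]): Unit = {
    val pattern = build(Seq("yes", "happy"))
    println(pattern)

    // Intended use with Spark (not executed here; assumes tweetDF has a msg column):
    //   import org.apache.spark.sql.functions.{col, lower}
    //   val happyCount = tweetDF.filter(lower(col("msg")).rlike(pattern)).count()
  }
}
```

This still matches substrings, so a word like "yes" also matches "yesterday"; that is the same behaviour as the contains-based version in the answer.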
















