Checking if elements of a tweets array contain one of the elements of positive words array and count























We are building a sentiment analysis application and have converted our tweets DataFrame to an array. We created another array of positive words, but we cannot count the number of tweets that contain at least one of those positive words. We tried the code below and got 1 as the result. It should be more than 1; apparently the matches are not being counted:



val sqlContext = new org.apache.spark.sql.SQLContext(sc)
var tweetDF = sqlContext.read.json("hdfs:///sandbox/tutorial-files/770/tweets_staging/*")
tweetDF.show()
var messages = tweetDF.select("msg").collect.map(_.toSeq)
println("Total messages: " + messages.size)
val positive = Source.fromFile("/home/teslavm/positive.txt").getLines.toArray
var happyCount = 0
for (e <- 0 until messages.size)
  for (f <- 0 until positive.size)
    if (messages(e).contains(positive(f)))
      happyCount = happyCount + 1

print("\nNumber of happy messages: " + happyCount)


























  • Which error are you getting? BTW, calling collect in Spark is not recommended: you lose all the advantages of distributed computing, and if the dataset is large you can run out of memory.

    – Luis Miguel Mejía Suárez
    Mar 22 at 14:51











  • I edited my question

    – user10856854
    Mar 22 at 14:57
























scala apache-spark













edited Mar 22 at 15:12

























asked Mar 22 at 14:48







user10856854































1 Answer














This should work.
It avoids collecting the result to the driver and is written in a more functional style.



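import scala.io.Source

// Requires `import spark.implicits._` in scope for the String encoder.
val messages = tweetDF.select("msg").as[String]

// Load the positive words once on the driver, normalised to lower case.
val positiveWords =
  Source
    .fromFile("/home/teslavm/positive.txt")
    .getLines
    .toList
    .map(word => word.toLowerCase)

// True if the message contains at least one positive word (substring match).
def hasPositiveWords(message: String): Boolean = {
  val _message = message.toLowerCase
  positiveWords.exists(word => _message.contains(word))
}

val positiveMessages = messages.filter(hasPositiveWords _)

println(positiveMessages.count())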


I tested this code locally with:



import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val tweetDF = List(
  (1, "Yes I am happy"),
  (2, "Sadness is a way of life"),
  (3, "No, no, no, no, yes")
).toDF("id", "msg")

val positiveWords = List("yes", "happy")


And it worked.
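
A likely explanation for the count of 1 in the question (an assumption, since the full data is not shown): `_.toSeq` turns each collected Row into a `Seq[Any]` whose single element is the whole tweet, and `contains` on a Seq tests element equality, not substring membership. Only a tweet that is exactly equal to a positive word would match. The nested loop would also count a tweet once per matching word instead of once per tweet. The plain-Scala sketch below illustrates the difference without Spark; the object name and sample data are made up for illustration.

```scala
// Plain Scala sketch (no Spark needed); names and sample data are hypothetical.
object HappyCountSketch {
  val positive: List[String] = List("yes", "happy")

  // Each collected Row becomes a Seq[Any] holding the whole tweet as one element.
  val rows: List[Seq[Any]] = List(
    Seq("Yes I am happy"),
    Seq("Sadness is a way of life"),
    Seq("yes")
  )

  // Element equality, as in the question: only a tweet that IS a positive word matches.
  val byElement: Int =
    rows.count(row => positive.exists(word => row.contains(word)))

  // Substring test on the tweet text itself, counting each tweet at most once.
  val bySubstring: Int = rows.count { row =>
    val text = row.head.toString.toLowerCase
    positive.exists(word => text.contains(word))
  }

  def main(args: Array[String]): Unit = {
    println(s"element equality: $byElement")
    println(s"substring match:  $bySubstring")
  }
}
```

Note that a substring test also matches inside longer words (for example "yes" inside "yesterday"); splitting the text into words first would be stricter, if that matters for the sentiment count.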






























  • This gives the error org.apache.spark.SparkException: Task not serializable

    – user10856854
    Mar 22 at 15:03











  • Can you provide an MCVE showing how to create tweetDF so I can test the code myself? It could be just the output of show on your actual DF.

    – Luis Miguel Mejía Suárez
    Mar 22 at 15:08











  • I added it to my question

    – user10856854
    Mar 22 at 15:14











  • @büşratabak could you give it another try after the edit and see if it works? If not, could you please check with the simple tests I made? You can replace the positiveWords list with the one read from a file; it should work too.

    – Luis Miguel Mejía Suárez
    Mar 22 at 15:34

























edited Mar 22 at 15:32

























answered Mar 22 at 15:00









Luis Miguel Mejía Suárez

















