Checking whether elements of a tweets array contain one of the elements of a positive-words array, and counting the matches
We are building a sentiment analysis application, and we converted our tweets DataFrame to an array. We created another array consisting of positive words, but we cannot count the number of tweets containing one of those positive words. We tried the code below and got 1 as the result. It must be more than 1; apparently it did not count:
import scala.io.Source

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
var tweetDF = sqlContext.read.json("hdfs:///sandbox/tutorial-files/770/tweets_staging/*")
tweetDF.show()
var messages = tweetDF.select("msg").collect.map(_.toSeq)
println("Total messages: " + messages.size)
val positive = Source.fromFile("/home/teslavm/positive.txt").getLines.toArray
var happyCount = 0
for (e <- 0 until messages.size)
  for (f <- 0 until positive.size)
    if (messages(e).contains(positive(f)))
      happyCount = happyCount + 1
print("\nNumber of happy messages: " + happyCount)
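A minimal, Spark-free sketch of what goes wrong in the loop above (the message text and word are made-up stand-ins): `Row.toSeq` produces a `Seq[Any]` whose single element is the whole message string, and `Seq.contains` tests whole-element equality, not substrings, so a positive word inside a longer message is never counted.

```scala
// The Seq produced by Row.toSeq holds the entire message as ONE element.
val message: Seq[Any] = Seq("Yes I am happy")

// Whole-element equality: "happy" is not an element of the Seq, so false.
val exactMatch = message.contains("happy")

// A per-element substring check is what the counting loop actually needs.
val substringMatch = message.exists(_.toString.contains("happy"))

println(s"contains: $exactMatch, exists + contains: $substringMatch")
// prints "contains: false, exists + contains: true"
```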
scala apache-spark
Which error are you getting? BTW, it is not recommended to call collect on Spark: you lose all the advantages of distributed computing, and if the dataset is pretty big you would blow out the memory. – Luis Miguel Mejía Suárez, Mar 22 at 14:51
I edited my question. – user10856854, Mar 22 at 14:57
asked Mar 22 at 14:48 by user10856854 (edited Mar 22 at 15:12)
1 Answer
This should work. It has the advantage that you do not have to collect the result, as well as being more functional.

import scala.io.Source
import spark.implicits._ // needed for .as[String]

val messages = tweetDF.select("msg").as[String]

val positiveWords =
  Source
    .fromFile("/home/teslavm/positive.txt")
    .getLines
    .toList
    .map(word => word.toLowerCase)

def hasPositiveWords(message: String): Boolean = {
  val _message = message.toLowerCase
  positiveWords.exists(word => _message.contains(word))
}

val positiveMessages = messages.filter(hasPositiveWords _)
println(positiveMessages.count())
I tested this code locally with:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._
val tweetDF = List(
(1, "Yes I am happy"),
(2, "Sadness is a way of life"),
(3, "No, no, no, no, yes")
).toDF("id", "msg")
val positiveWords = List("yes", "happy")
And it worked.
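If the closure-based filter above still hits "Task not serializable" (see the comment below), a sketch of an alternative is to build the predicate out of Spark Column expressions, so no driver-side closure has to be shipped to the executors. This assumes the same toy DataFrame and the column name "msg" from the question; the word list is a stand-in for the file contents.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val tweetDF = List(
  (1, "Yes I am happy"),
  (2, "Sadness is a way of life"),
  (3, "No, no, no, no, yes")
).toDF("id", "msg")

val positiveWords = List("yes", "happy") // stand-in for the file contents

// One Column per word, OR-ed together; Spark evaluates this as an
// expression tree, so nothing from the driver needs to be serialized.
val hasPositive = positiveWords
  .map(word => lower(col("msg")).contains(word))
  .reduce(_ || _)

println(tweetDF.filter(hasPositive).count()) // 2: rows 1 and 3 match
```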
Gives error org.apache.spark.SparkException: Task not serializable. – user10856854, Mar 22 at 15:03
Can you provide a MCVE of how to create the tweetDF so I can test the code myself? It could be just the show of your actual DF. – Luis Miguel Mejía Suárez, Mar 22 at 15:08
I added it to my question. – user10856854, Mar 22 at 15:14
@büşratabak could you give it another try after the edit and see if it works? If not, could you please check with the simple tests I made? You can replace the positiveWords list with the one read from a file; it should work too. – Luis Miguel Mejía Suárez, Mar 22 at 15:34
answered Mar 22 at 15:00 by Luis Miguel Mejía Suárez (edited Mar 22 at 15:32)