Task 1: We have provided some synthetic (fake, semi-randomly generated) Twitter data in a csv file named project_twitter_data.csv, which has the text of a tweet, the number of retweets of that tweet, and the number of replies to that tweet. We have also provided words that express positive sentiment and negative sentiment, in the files positive_words.txt and negative_words.txt.

Your task is to build a sentiment classifier, which will detect how positive or negative each tweet is. You will create a csv file, which contains columns for the Number of Retweets, Number of Replies, Positive Score (which is how many happy words are in the tweet), Negative Score (which is how many angry words are in the tweet), and the Net Score for each tweet. At the end, you upload the csv file to Excel or Google Sheets, and produce a graph of the Net Score vs Number of Retweets.

To start, define a function called strip_punctuation which takes one parameter, a string which represents a word, and removes characters considered punctuation from everywhere in the word. (Hint: remember the .replace() method for strings.)
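
A minimal sketch of strip_punctuation, following the .replace() hint. The punctuation_chars list here is an assumption for illustration; the assignment's code window may supply its own list of characters.

```python
# Assumed punctuation set (not given in the task text); adjust to match the assignment.
punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", "#", "@"]

def strip_punctuation(word):
    """Return word with every character listed in punctuation_chars removed."""
    for char in punctuation_chars:
        # replace() removes the character wherever it occurs, not just at the ends
        word = word.replace(char, "")
    return word

print(strip_punctuation("#Amazing!!"))
```

Because .replace() acts on every occurrence, punctuation inside the word (such as the apostrophe in "don't") is removed as well.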

Task 2: Next, copy in your strip_punctuation function and define a function called get_pos which takes one parameter, a string which represents one or more sentences, and calculates how many words in the string are considered positive words. Use the list positive_words to determine which words count as positive. The function should return a positive integer: how many occurrences there are of positive words in the text.
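
One possible shape for get_pos is sketched below. The positive_words list is a short stand-in so the example runs on its own (in the assignment it comes from positive_words.txt), and lowercasing the text before matching is an assumption about how the word list is written.

```python
# Assumed punctuation set and stand-in word list, for illustration only.
punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", "#", "@"]
positive_words = ["good", "great", "happy", "love"]

def strip_punctuation(word):
    """Return word with every character listed in punctuation_chars removed."""
    for char in punctuation_chars:
        word = word.replace(char, "")
    return word

def get_pos(text):
    """Count the words in text that appear in positive_words."""
    count = 0
    # Lowercase first (assumes the word list is lowercase), then split on whitespace
    for word in text.lower().split():
        if strip_punctuation(word) in positive_words:
            count += 1
    return count

print(get_pos("What a GREAT day, I love it!"))
```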

Task 3: Next, copy in your strip_punctuation function and define a function called get_neg which takes one parameter, a string which represents one or more sentences, and calculates how many words in the string are considered negative words. Use the list negative_words to determine which words count as negative. The function should return a positive integer: how many occurrences there are of negative words in the text.
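
get_neg can be sketched the same way. This version counts matches with sum() over a generator instead of an explicit counter; the negative_words list is a stand-in for the contents of negative_words.txt, and the punctuation set is assumed.

```python
# Assumed punctuation set and stand-in word list, for illustration only.
punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", "#", "@"]
negative_words = ["bad", "awful", "hate", "sad"]

def strip_punctuation(word):
    """Return word with every character listed in punctuation_chars removed."""
    for char in punctuation_chars:
        word = word.replace(char, "")
    return word

def get_neg(text):
    """Count the words in text that appear in negative_words."""
    # Each matching word contributes 1 to the total
    return sum(
        1
        for word in text.lower().split()
        if strip_punctuation(word) in negative_words
    )

print(get_neg("So sad and awful."))
```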

Task 4: Finally, copy in your previous functions and write code that opens the file project_twitter_data.csv, which has the fake generated Twitter data (the text of a tweet, the number of retweets of that tweet, and the number of replies to that tweet). Your task is to build a sentiment classifier, which will detect how positive or negative each tweet is. Copy the code from the code windows above, and put that in the top of this code window. Now, you will write code to create a csv file called resulting_data.csv, which contains the Number of Retweets, Number of Replies, Positive Score (which is how many happy words are in the tweet), Negative Score (which is how many angry words are in the tweet), and the Net Score (how positive or negative the text is overall) for each tweet. The file should have those headers in that order. Remember that there is another component to this project. You will upload the csv file to Excel or Google Sheets and produce a graph of the Net Score vs Number of Retweets.
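
The final step might look like the sketch below. Several things here are assumptions rather than part of the task text: the word lists are short stand-ins, the input columns are taken to be in the order text, retweets, replies with one header row, the sample header names are hypothetical, and Net Score is computed as Positive Score minus Negative Score. The helper names count_matches and write_results are also hypothetical. The function takes open file objects so it can be tried on in-memory data.

```python
import csv
import io

# Assumed punctuation set and stand-in word lists, for illustration only.
punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", "#", "@"]
positive_words = ["good", "great", "happy", "love"]
negative_words = ["bad", "awful", "hate", "sad"]

def strip_punctuation(word):
    for char in punctuation_chars:
        word = word.replace(char, "")
    return word

def count_matches(text, word_list):
    """Hypothetical helper: count words of text found in word_list."""
    return sum(1 for w in text.lower().split() if strip_punctuation(w) in word_list)

def get_pos(text):
    return count_matches(text, positive_words)

def get_neg(text):
    return count_matches(text, negative_words)

def write_results(in_file, out_file):
    """Read tweet rows from in_file and write the scored rows to out_file."""
    reader = csv.reader(in_file)
    next(reader)  # skip the input header row (assumed to be present)
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["Number of Retweets", "Number of Replies",
                     "Positive Score", "Negative Score", "Net Score"])
    for text, retweets, replies in reader:  # assumed column order
        pos = get_pos(text)
        neg = get_neg(text)
        writer.writerow([retweets, replies, pos, neg, pos - neg])  # assumed net = pos - neg

# Try it on in-memory sample data (made up for this example).
sample = io.StringIO(
    "tweet_text,retweet_count,reply_count\n"
    "I love this great day!,3,1\n"
    "So sad and awful.,0,2\n"
)
result = io.StringIO()
write_results(sample, result)
print(result.getvalue())
```

With the real files, the same function could be called as write_results(open("project_twitter_data.csv", newline=""), open("resulting_data.csv", "w", newline="")), ideally inside with blocks so both files are closed.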
