
Example

Data set: collections of text documents.
Problem: count the frequency of nouns that appear at least 100 times in the documents.
(i) Mapper function: tokenize each line into terms (words), and filter out terms that are not nouns.
(ii) Mapper output: key is a noun, value is 1.
(iii) Reducer input: key is a noun, value is a list of 1’s.
(iv) Reduce function: sum up the 1’s for each key (noun).
(v) Reducer output: key is a noun, value is the frequency of that noun (filter out the nouns whose frequencies are below the 100-occurrence threshold).

(a) Data set: Amazon book ratings data. Each line in the data file has 4 columns (reviewer id, book id, book genre, rating), where ratings are integer-valued, ranging from 1 to 4.
Problem: identify the highest rated book, i.e., the book with the highest average rating, for each book genre. Note that each book can have more than one rating (e.g., from different reviewers).

(b) Data set: movie preference data. Each record in the data file contains the movie title and the list of users who liked the movie. For example:
    jaws user111 user134 user313 user5812
    star_wars user111 user313 user388 user4422
Problem: for each pair of users, count the number of movies they both liked. The output may exclude pairs of users who do not have any movies they both liked.

(c) Data set: maximum and minimum daily temperature readings for weather stations from around the world. Each line in the data files has 4 columns (station id, date, max temperature, min temperature).
Problem: find the station id and date of anomalous temperature readings in the data set. A temperature reading is anomalous if the minimum daily temperature exceeds the maximum temperature for the given day.

(d) Data set: Instagram friendship graph. Each record corresponds to an Instagram user, followed by a list of his/her friends. For example, the graph data may contain the following records:
    john123 mary456 tom312 lee222
    mary456 john123
    tom312 john123 lee222
    lee222 john123 tom312
The first line above states that mary456, tom312, and lee222 are friends of john123.
Problem: find pairs of Instagram users who are not friends with each other but who share one or more common friends. This is known as the "friend-of-a-friend" (FOF) problem. For example, mary456 and tom312 are both friends of john123, but they are not friends with each other. The Hadoop program should only output the pair (u, v) if u < v. In the previous example, the program should only output the pair (mary456, tom312) but not (tom312, mary456).

(e) Data set: cancer data. Each line in the data file corresponds to a patient with the following nominal-valued attributes: patientid, gender, marital status, smoker, weight class, and class, where the class attribute has value yes or no to indicate whether the patient has cancer.
    12345, female, married, smoker, normal, yes
    13, male, single, nonsmoker, normal, no
    14423, male, married, smoker, overweight, yes
Problem: compute the Gini index for each of the following attributes: gender, marital status, smoker, and weight class, based on the distribution of their class values.
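
The worked noun-count example at the top of the question, steps (i) to (v), can be sketched in plain Python as a rough illustration. This is not Hadoop code: the map, shuffle, and reduce phases are simulated in memory, and the names `NOUNS`, `is_noun`, `shuffle`, and `run_job` are hypothetical. In particular, `is_noun` is a stand-in lookup against a tiny hand-written word list; a real job would presumably use a part-of-speech tagger instead.

```python
import re
from collections import defaultdict

# Assumption: a tiny stand-in noun lexicon, used only for illustration.
# A real job would replace this with a part-of-speech tagger.
NOUNS = {"cat", "dog", "house"}


def is_noun(term):
    """Hypothetical noun test based on the stand-in lexicon above."""
    return term in NOUNS


def mapper(line):
    """Steps (i)-(ii): tokenize a line and emit (noun, 1) per noun occurrence."""
    for term in re.findall(r"[a-z]+", line.lower()):
        if is_noun(term):
            yield term, 1


def shuffle(pairs):
    """Step (iii): group mapper output so each key maps to its list of 1's."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(noun, ones, min_count):
    """Steps (iv)-(v): sum the 1's and drop nouns below the threshold."""
    total = sum(ones)
    if total >= min_count:
        yield noun, total


def run_job(lines, min_count=100):
    """Simulate the whole job in memory and return {noun: frequency}."""
    pairs = (pair for line in lines for pair in mapper(line))
    result = {}
    for noun, ones in shuffle(pairs).items():
        for key, total in reducer(noun, ones, min_count):
            result[key] = total
    return result
```

Calling `run_job(lines)` with the default `min_count=100` matches the threshold in the problem statement; a smaller value is convenient for trying the sketch on a few lines of text.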

