Question 1 (Index Construction):
Suppose you have joined a search engine development team to design a search algorithm based on both the Vector model and the Boolean model.
You have collected the following unstructured documents and plan to apply an indexing technique to convert them into an inverted index.

Doc 1: data science is field to use scientific method, process, algorithm, system to extract knowledge.

Doc 2: data mining is the process to discover pattern in large data to involve method at the database system.

Doc 3: information system is the study of network of hardware and software that people use to process data.

To answer the questions below, show the detailed procedure step by step.
Remove all stop words and punctuation before creating the inverted index. Then complete the following steps:

Question 1.1:
Create a merged inverted list including the within-document frequencies for each term.
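
One way to approach this step is sketched below in Python. It is only an illustration, not the expected model answer: the stop-word list `STOP_WORDS` is an assumption (the question does not say which list to use), and the helper names `tokenize` and `build_inverted_list` are made up for this sketch. No stemming is applied.

```python
import re
from collections import Counter, defaultdict

# Document texts copied from the question.
DOCS = {
    1: "data science is field to use scientific method, process, algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data to involve method at the database system.",
    3: "information system is the study of network of hardware and software that people use to process data.",
}

# Assumed stop-word list; the question does not say which list to use.
STOP_WORDS = {"is", "to", "the", "in", "at", "of", "and", "that"}


def tokenize(text):
    """Lowercase, drop punctuation, and filter out the assumed stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_inverted_list(docs):
    """Return {term: [(doc_id, within-document frequency), ...]}, sorted by term."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term, tf in Counter(tokenize(docs[doc_id])).items():
            index[term].append((doc_id, tf))
    return {term: index[term] for term in sorted(index)}


inverted = build_inverted_list(DOCS)
for term, postings in inverted.items():
    print(term, postings)
```

Each printed line is one entry of the merged list: the term followed by its (document, frequency) pairs. With a different stop-word list the set of terms changes, so state the list you used in your answer.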

Question 1.2:
Use the index created as above to create a dictionary and the related posting file.
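
A common layout, sketched here as an assumption rather than the required answer, keeps one dictionary entry per term (document frequency plus an offset) and stores all (document, frequency) pairs in a single flat posting file. The sample below uses only five terms taken from the three documents; `split_index` and the `df`/`offset` field names are hypothetical.

```python
# A few entries of a merged inverted list: term -> [(doc_id, tf), ...].
# Only five terms are shown to keep the sketch short.
SAMPLE_INVERTED = {
    "data": [(1, 1), (2, 2), (3, 1)],
    "method": [(1, 1), (2, 1)],
    "process": [(1, 1), (2, 1), (3, 1)],
    "system": [(1, 1), (2, 1), (3, 1)],
    "use": [(1, 1), (3, 1)],
}


def split_index(inverted):
    """Split a merged inverted list into a dictionary and a flat posting file.

    dictionary[term] = {"df": number of documents, "offset": start position in postings}
    postings         = [(doc_id, tf), ...] for all terms, in dictionary order
    """
    dictionary = {}
    postings = []
    for term in sorted(inverted):
        entries = inverted[term]
        dictionary[term] = {"df": len(entries), "offset": len(postings)}
        postings.extend(entries)
    return dictionary, postings


def lookup(term, dictionary, postings):
    """Read a term's postings back using its dictionary entry."""
    entry = dictionary.get(term)
    if entry is None:
        return []
    return postings[entry["offset"]: entry["offset"] + entry["df"]]


dictionary, postings = split_index(SAMPLE_INVERTED)
print(dictionary["method"], lookup("method", dictionary, postings))
```

The dictionary stays small enough to hold in memory, while the posting file can be read by offset only for the terms a query actually needs.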

Question 1.3:
Please design three Boolean queries (for example, web AND search) and list the relevant documents for each query. Each query must contain at least two keywords, and no keyword may appear in only one document.
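
As an illustration, Boolean queries can be evaluated with set operations on the posting lists. The three queries below are example choices, not the only valid ones; every keyword used (data, method, use) occurs in at least two of the three documents. The helper names are hypothetical.

```python
# Document sets for terms that occur in more than one document,
# read off the three documents in the question.
POSTINGS = {
    "data": {1, 2, 3},
    "method": {1, 2},
    "process": {1, 2, 3},
    "system": {1, 2, 3},
    "use": {1, 3},
}
ALL_DOCS = {1, 2, 3}


def docs_for(term):
    """Documents containing the term (empty set if the term is not indexed)."""
    return POSTINGS.get(term, set())


def q_and(a, b):
    return docs_for(a) & docs_for(b)   # intersection


def q_or(a, b):
    return docs_for(a) | docs_for(b)   # union


def q_and_not(a, b):
    return docs_for(a) & (ALL_DOCS - docs_for(b))   # a AND NOT b


print("method AND use    ->", sorted(q_and("method", "use")))
print("method OR use     ->", sorted(q_or("method", "use")))
print("data AND NOT use  ->", sorted(q_and_not("data", "use")))
```

Here method AND use matches Doc 1 only, method OR use matches all three documents, and data AND NOT use matches Doc 2 only.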

Question 1.4:
Please use the Vector model to query the inverted index, and compare the result with that of the Boolean model. (Hint: you can use cosine similarity and set a similarity threshold.)
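
A minimal sketch of the vector-model side follows. Several choices here are assumptions, not part of the question: raw term frequencies are used as weights (no idf), the stop-word list is the small `STOP_WORDS` set below, the example query is "method use", and the threshold is 0.3.

```python
import math
import re
from collections import Counter

# Document texts copied from the question.
DOCS = {
    1: "data science is field to use scientific method, process, algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data to involve method at the database system.",
    3: "information system is the study of network of hardware and software that people use to process data.",
}

# Assumed stop-word list; the question does not specify one.
STOP_WORDS = {"is", "to", "the", "in", "at", "of", "and", "that"}

# Assumed similarity threshold.
THRESHOLD = 0.3


def to_vector(text):
    """Term-frequency vector after removing punctuation and stop words."""
    return Counter(t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return [(doc_id, score), ...] sorted by descending cosine similarity."""
    q = to_vector(query)
    scores = {doc_id: cosine(q, to_vector(text)) for doc_id, text in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank("method use", DOCS)
retrieved = [doc_id for doc_id, score in ranking if score >= THRESHOLD]
for doc_id, score in ranking:
    print(f"Doc {doc_id}: {score:.3f}")
print("Above threshold:", retrieved)
```

Under these assumptions the ranking is Doc 1, Doc 3, Doc 2, and only Doc 1 passes the threshold, which agrees with the Boolean query method AND use. Unlike the Boolean model, the vector model still assigns non-zero scores to Doc 2 and Doc 3, which each contain one of the two query terms, so lowering the threshold moves the result toward the Boolean OR.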
