subject
Computers and Technology, 03.03.2020 01:28 196336

In word segmentation, you are given as input a string of alphabetical characters ([a-z]) without whitespace, and your goal is to insert spaces into this string such that the result is the most fluent according to the language model.

a. Consider the following greedy algorithm: Begin at the front of the string. Find the ending position for the next word that minimizes the language model cost. Repeat, beginning at the end of this chosen segment.
Show that this greedy search is suboptimal. In particular, provide an example input string on which the greedy approach would fail to find the lowest-cost segmentation of the input.
In creating this example, you are free to design the n-gram cost function (both the choice of n and the cost of any n-gram sequences) but costs must be positive and lower cost should indicate better fluency. Note that the cost function doesn't need to be explicitly defined. You can just point out the relative cost of different word sequences that are relevant to the example you provide. And your example should be based on a realistic English word sequence β€” don't simply use abstract symbols with designated costs.

b. Implement an algorithm that, unlike greedy, finds the optimal word segmentation of an input character sequence. Your algorithm will consider costs based simply on a unigram cost function.
Before jumping into code, you should think about how to frame this problem as a state-space search problem. How would you represent a state? What are the successors of a state? What are the state transition costs? (You don't need to answer these questions in your writeup.)
Fill in the member functions of the SegmentationProblem class and the segmentWords function. The argument unigramCost is a function that takes in a single string representing a word and outputs its unigram cost. You can assume that all the inputs would be in lower case. The function segmentWords should return the segmented sentence with spaces as delimiters, i. e. ' '.join(words).
For convenience, you can actually run python submission. py to enter a console in which you can type character sequences that will be segmented by your implementation of segmentWords. To request a segmentation, type seg mystring into the prompt. For example:

>> seg
Query (seg):
this is not my beautiful house

Console commands other than seg β€” namely ins and both β€” will be used for the upcoming parts of the assignment. Other commands that might help with debugging can be found by typing help at the prompt.
Hint: You are encouraged to refer to NumberLineSearchProblem and GridSearchProblem implemented in util. py for reference. They don't contribute to testing your submitted code but only serve as a guideline for what your code should look like.
Hint: the final actions that ucs (a UniformCostSearch object) takes can be accessed through ucs. actions.

ansver
Answers: 2

Another question on Computers and Technology

question
Computers and Technology, 22.06.2019 15:00
When designing content as part of your content marketing strategy, what does the "think" stage represent in the "see, think, do, care" framework?
Answers: 3
question
Computers and Technology, 23.06.2019 07:30
To check spelling errors in a document, the word application uses the to determine appropriate spelling. internet built-in dictionary user-defined words other text in the document
Answers: 2
question
Computers and Technology, 23.06.2019 22:20
What is a programming method that provides for interactive modules to a website?
Answers: 1
question
Computers and Technology, 24.06.2019 22:30
In writing a paper for his english class, gavin quoted an author of the book. what should he include in his paper to credit the source? citation caption header entry
Answers: 1
You know the right answer?
In word segmentation, you are given as input a string of alphabetical characters ([a-z]) without whi...
Questions
question
Mathematics, 25.06.2019 03:00
question
Mathematics, 25.06.2019 03:00
question
History, 25.06.2019 03:00