The Levenshtein distance is a character-based string metric that measures the difference between two strings. In this problem, you need to implement a word-based version of the Levenshtein distance.
Given two strings, compute the minimum number of word edits needed to transform one string into the other. The allowable edits are insertion, deletion, or substitution of a single word. Treat words as case-insensitive. Contractions and hyphenated words are allowed, but you may ignore other punctuation.
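The word rules above can be read in more than one way. As a minimal sketch of one reading, the helper below lower-cases the input and keeps runs of letters and digits, allowing single apostrophes or hyphens inside a word. The name `split_words` is hypothetical, and restricting words to ASCII letters and digits is an assumption the problem does not state.

```matlab
function words = split_words(s)
% SPLIT_WORDS  Hypothetical helper: lower-case S and return its words
% as a cell array of character vectors.
% Assumption: a word is a run of ASCII letters or digits, optionally
% joined by single apostrophes or hyphens, so "don't" and "well-known"
% each stay one word. All other punctuation is dropped.
    words = regexp(lower(s), '[a-z0-9]+([''-][a-z0-9]+)*', 'match');
end
```

With this reading, `split_words('Which words need to be edited?')` yields six words, with the trailing question mark discarded.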
Example
If
s1 = 'I do not like MATLAB'
s2 = 'I love MATLAB a lot'
then
d = 4
because at least four edits are required to transform s1 into s2 (a substitution for each of the last four words).
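The count in this example can be checked with the standard dynamic-programming recurrence for Levenshtein distance, applied to words instead of characters. The following is a sketch under stated assumptions, not the reference solution: `word_edit_distance` is a hypothetical name, it takes two cell arrays of words, and it leaves punctuation handling to the caller.

```matlab
function d = word_edit_distance(a, b)
% WORD_EDIT_DISTANCE  Hypothetical name: Levenshtein distance between two
% cell arrays of words, A and B, compared case-insensitively.
    m = numel(a);
    n = numel(b);
    % D(i+1, j+1) holds the distance between the first i words of A
    % and the first j words of B.
    D = zeros(m + 1, n + 1);
    D(:, 1) = (0:m)';   % turning i words into none takes i deletions
    D(1, :) = 0:n;      % turning no words into j words takes j insertions
    for i = 1:m
        for j = 1:n
            cost = ~strcmpi(a{i}, b{j});   % 0 if the words match, 1 otherwise
            D(i + 1, j + 1) = min([ ...
                D(i, j + 1) + 1, ...       % delete a{i}
                D(i + 1, j) + 1, ...       % insert b{j}
                D(i, j) + cost]);          % substitute a{i} with b{j}, or keep a match
        end
    end
    d = D(m + 1, n + 1);
end
```

Because the example strings contain no punctuation, splitting on whitespace is enough here: `word_edit_distance(strsplit(s1), strsplit(s2))` returns 4 for the s1 and s2 above.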
Problem Comments
3 Comments
Question on Test Suite #2.
s1 = 'Which words need to be edited?';
s2 = 'Can you tell which words need to be edited?';
d_correct = 3;
I see this as 2: substitute w for W and insert 'Can you tell '. Where is the third change?
Richard - you do not have to substitute w for W (words are case-insensitive). And inserting 'Can you tell' is three edits, one for each word.
I understand how this problem was designed, but I disagree. For instance, transforming s1 = 'I do not like MATLAB' into s2 = 'I love MATLAB a lot' should be a two-word edit: the three words 'do not like' could be grouped into one unit and changed into 'love', and 'a lot' could be treated as a single insertion after the word MATLAB. That is probably closer to how the original Levenshtein distance would measure it, since the character-based algorithm returns 15 edits: transforming 'do not like' into 'love' requires 9 character edits, and inserting ' a lot' after 'MATLAB' requires 6.