Fuzzy matching multiword strings & sentences in R

Some “large green plantains”, courtesy of Wikipedia user Daegis.

A colleague asked me about fuzzy matching of string data, a problem that often comes up when linking datasets. I figured I might as well reproduce my comments here, since the problem is so common and many of the built-in algorithms are well suited to matching single words but not multiword strings.
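To make the word-versus-multiword point concrete, here is a minimal sketch using base R's `adist()`, which computes a generalized Levenshtein edit distance. The helper `normalize_tokens()` is hypothetical and introduced only for illustration; it applies a simple token-sort normalization (lowercase, strip punctuation, sort the words), which is one common workaround and not necessarily the approach taken in the full post.

```r
# Hypothetical helper (an assumption for this sketch, not from the post):
# lowercase, replace punctuation with spaces, then sort the words so that
# word order no longer affects the comparison.
normalize_tokens <- function(x) {
  x <- tolower(gsub("[[:punct:]]", " ", x))
  vapply(
    strsplit(trimws(x), "[[:space:]]+"),
    function(words) paste(sort(words), collapse = " "),
    character(1)
  )
}

a <- "large green plantains"
b <- "Plantains, green (large)"

# Character-level edit distance on the raw strings: inflated by the
# reordered words, capitalization, and punctuation.
raw_dist <- adist(a, b)[1, 1]

# The same distance after token-sort normalization: both strings
# reduce to "green large plantains", so the distance is 0.
norm_dist <- adist(normalize_tokens(a), normalize_tokens(b))[1, 1]

print(c(raw = raw_dist, normalized = norm_dist))
```

The raw distance treats the two labels as quite different even though they contain the same words, which is the kind of mismatch that makes plain edit distance awkward for multiword strings. Sorting tokens discards word order entirely, so it would be a poor choice where order carries meaning.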