Let's say I have a bunch of product strings that could vary slightly in format, e.g.:
- Super red Megaman T-shirt
- Super Megaman blue T-shirt
- Super black Megaman T-shirt
- Super Megaman T-shirt - aquamarine
I want to compare them all and remove any words/symbols that don't appear in every string, ideally to leave:
- Super Megaman T-shirt
This is so I can streamline making product multipacks from a bunch of existing products, and give users an automatic product title for the bundle as a starting point.
I appreciate it won't be perfect and, depending on how strict the original product strings are, it might go a bit wonky (e.g. in the above example I might get the trailing hyphen in the output), but this is just a baseline best guess to save a bit of typing/copying/pasting.
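A stray trailing hyphen like that could probably be cleaned up with a small post-processing step. The sketch below is a hypothetical helper, `tidyTitle()`, which assumes that spaces and hyphens are the only separators likely to be left dangling at either end. It collapses repeated whitespace and then trims those characters from both ends, leaving hyphens inside words such as "T-shirt" alone.

```php
<?php
// Hypothetical helper (not part of the question): tidies a generated title
// by collapsing runs of whitespace into single spaces, then trimming
// spaces and hyphens from both ends of the string.
function tidyTitle(string $title): string
{
    $title = preg_replace('/\s+/', ' ', $title);
    return trim($title, " -");
}

echo tidyTitle('Super Megaman T-shirt -'), PHP_EOL;
```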
The surrounding string content won't always be T-shirts or follow a particular pattern. The number of strings to compare is variable. It likely won't be more than six, but an efficient general purpose solution would be great.
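For a variable number of strings, one possible word-level approach (a sketch, assuming words are separated by whitespace and PHP 7.4 or later) is to count how many titles each word appears in, then keep the words of the first title, in their original order, whose count equals the number of titles. `commonTitle()` is a hypothetical name. Each title is scanned once, so the work grows roughly linearly with the total number of words.

```php
<?php
// Hypothetical helper: returns the words of the first title (in order)
// that also appear in every other title. Assumes whitespace-separated words.
function commonTitle(array $titles): string
{
    if ($titles === []) {
        return '';
    }
    $seenIn = [];     // word => number of titles containing it
    $tokenised = [];  // one word list per title, re-indexed from 0
    foreach ($titles as $title) {
        $words = preg_split('/\s+/', trim($title), -1, PREG_SPLIT_NO_EMPTY);
        $tokenised[] = $words;
        // Count each word at most once per title.
        foreach (array_unique($words) as $word) {
            $seenIn[$word] = ($seenIn[$word] ?? 0) + 1;
        }
    }
    $n = count($titles);
    // Keep only the first title's words that were seen in all $n titles.
    $kept = array_filter($tokenised[0], fn($w): bool => $seenIn[$w] === $n);
    return implode(' ', $kept);
}

echo commonTitle([
    'Super red Megaman T-shirt',
    'Super Megaman blue T-shirt',
    'Super black Megaman T-shirt',
    'Super Megaman T-shirt - aquamarine',
]), PHP_EOL;
```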
I thought PHP's `array_diff()` or `array_intersect()` might help, but they only compare whole string entries. Likewise, `strcmp()` only tells me whether one string is the same as another.
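The whole-entry limitation can be seen with two of the example titles. In the sketch below (assuming single spaces between words, so `explode(' ', ...)` is enough), intersecting arrays that each hold one complete title finds nothing in common, whereas intersecting arrays of their words keeps the shared words.

```php
<?php
$titles = ['Super red Megaman T-shirt', 'Super Megaman blue T-shirt'];

// Whole-string entries: the two titles are not identical, so nothing matches.
$wholeStrings = array_intersect([$titles[0]], [$titles[1]]);

// Word entries: split each title on spaces, then intersect the word arrays.
$wordLevel = array_intersect(explode(' ', $titles[0]), explode(' ', $titles[1]));
$common = implode(' ', $wordLevel);

var_dump($wholeStrings);
echo $common, PHP_EOL;
```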
My only (failed) approach has been to split the strings into characters with `str_split()` and use `array_intersect()` to compare them character by character, like this:
```php
$test = array(
    'Super red Megaman T-shirt',
    'Super Megaman blue T-shirt',
    'Super black Megaman T-shirt',
    'Super Megaman T-shirt - aquamarine',
);
foreach ($test as $item) {
    $prodCompare[] = str_split($item);
}
var_dump(join(array_intersect($prodCompare[0], $prodCompare[1], $prodCompare[2], $prodCompare[3])));
```
That gets close:
Super re Megaman T-shirt
but the result depends on which characters happen to occur in the differing words (the "re" from "red" survives because "r" and "e" appear in every string), which string I start with, and how long the first string is.
It also has the drawback that I need to hard-code the arrays to compare, which means complex logic when I don't know how many strings there are going to be.
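One way to avoid hard-coding the array arguments, sketched here under the assumption of PHP 5.6 or later, is argument unpacking with `...`, which passes every element of `$prodCompare` to `array_intersect()` however many there are. This only removes the hard-coding: it is still the same character-level comparison, so it returns the same result as the version above.

```php
<?php
$test = [
    'Super red Megaman T-shirt',
    'Super Megaman blue T-shirt',
    'Super black Megaman T-shirt',
    'Super Megaman T-shirt - aquamarine',
];

// One character array per title, whatever the number of titles.
$prodCompare = array_map('str_split', $test);

// Unpack all the character arrays into array_intersect() at once,
// instead of naming $prodCompare[0], $prodCompare[1], ... by hand.
$result = implode('', array_intersect(...$prodCompare));
echo $result, PHP_EOL;
```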
I keep thinking I'm missing something obvious and that there must be a clever approach, say XORing the strings, or using one of the array callback functions to iterate over the strings or array entries. Perhaps `array_reduce()`?
But I can't wrap my head round how to compare N strings against each other and keep only the parts that don't vary.
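`array_reduce()` can probably do this if the strings are first split into words rather than characters. In the sketch below (assuming whitespace-separated words, PHP 7.4 or later for arrow functions, and at least one title), the first title's word list is the starting value, and each step intersects the running result with the next title's words, so only words present in every title remain.

```php
<?php
$titles = [
    'Super red Megaman T-shirt',
    'Super Megaman blue T-shirt',
    'Super black Megaman T-shirt',
    'Super Megaman T-shirt - aquamarine',
];

// Split every title into words (assumes whitespace-separated words).
$wordLists = array_map(
    fn(string $t): array => preg_split('/\s+/', trim($t), -1, PREG_SPLIT_NO_EMPTY),
    $titles
);

// Start from the first title's words (assumes $titles is not empty) and
// narrow them down by intersecting with each remaining title in turn.
$first = array_shift($wordLists);
$common = array_reduce(
    $wordLists,
    fn(array $carry, array $words): array => array_intersect($carry, $words),
    $first
);

echo implode(' ', $common), PHP_EOL;
```

Because `array_intersect()` keeps the values (and order) of its first argument, the surviving words stay in the order they had in the first title.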
> It also has the drawback that I need to hard-code the arrays to compare

...I don't think you do: 3v4l.org/RvFFP