
Let's say I have a bunch of product strings that could vary slightly in format, e.g.:

  • Super red Megaman T-shirt
  • Super Megaman blue T-shirt
  • Super black Megaman T-shirt
  • Super Megaman T-shirt - aquamarine

I want to compare them all and remove any words/symbols that don't appear in every string, ideally to leave:

  • Super Megaman T-shirt

This is so I can streamline making product multipacks from a bunch of existing products, and give users an automatic product title for the bundle as a starting point.

I appreciate it won't be perfect and, depending on how strict the original product strings are, it might go a bit wonky (e.g. in the above example I might get the trailing hyphen in the output), but this is just a baseline best guess to save a bit of typing/copying/pasting.

The surrounding string content won't always be T-shirts or follow a particular pattern. The number of strings to compare is variable. It likely won't be more than six, but an efficient general-purpose solution would be great.

I thought maybe PHP's array_diff() or array_intersect() might help, but they only compare whole string entries. Likewise, strcmp() only tells me whether one string is the same as another.

My only (failed) approach has been to split the strings into characters and use array_intersect() to do a character-by-character comparison, like this:

$test = array(
   'Super red Megaman T-shirt',
   'Super Megaman blue T-shirt',
   'Super black Megaman T-shirt',
   'Super Megaman T-shirt - aquamarine',
);

$prodCompare = array();
foreach ($test as $item) {
   $prodCompare[] = str_split($item);
}

var_dump(join(array_intersect($prodCompare[0], $prodCompare[1], $prodCompare[2], $prodCompare[3])));

That gets close:

Super re Megaman T-shirt

but results vary depending on whether characters from the differing words happen to appear in the other strings, which string I start with, and how long the first string is.

It also has the drawback that I need to hard-code the arrays to compare, which means complex logic when I don't know how many strings there are going to be.
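A minimal sketch of how the hard-coded indexes could be avoided, assuming argument unpacking with the `...` operator (available in function calls since PHP 5.6, so it also runs on PHP 7). It produces the same character-level result as above, so it only removes the fixed number of arrays; it does not fix the "wonky" output.

```php
<?php
// Same character-level comparison as above, but without hard-coding
// $prodCompare[0] ... $prodCompare[3].
$test = array(
   'Super red Megaman T-shirt',
   'Super Megaman blue T-shirt',
   'Super black Megaman T-shirt',
   'Super Megaman T-shirt - aquamarine',
);

$prodCompare = array();
foreach ($test as $item) {
   // One array of single characters per product title.
   $prodCompare[] = str_split($item);
}

// Argument unpacking spreads however many character arrays there are
// into array_intersect(). On PHP 7 at least two arrays are required.
$common = array_intersect(...$prodCompare);

echo join($common);
```

With the four sample titles this prints `Super re Megaman T-shirt`, matching the hard-coded version.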

I keep thinking I'm missing something obvious and there has to be a clever approach to this, say by XORing the strings, or by using one of the array callback functions to iterate over the strings or array entries. Perhaps array_reduce()?

But I can't wrap my head round how to compare N strings with N other strings to leave only the bits that don't vary.

  • It also has the drawback that I need to hard-code the arrays to compare...I don't think you do: 3v4l.org/RvFFP
    – ADyson
    Commented Jul 29 at 21:18
  • @ADyson Thanks, yes: if the client is using PHP 8.2+, sure. This product supports PHP 7 too, for now, and I think the client isn't up-to-date yet for various (plugin-based) reasons. Part of the reason I'm doing this is to help the client wrangle the product database into a suitable shape so reliance on some older tech isn't necessary and allow the server and CMS to be upgraded. Commented Jul 29 at 21:26
  • Only an idea: split strings into words, find words that are in all strings, create a new string, assign/use the new string.
    – Wiimm
    Commented Jul 29 at 21:43
  • @Wiimm Yes, that's the kind of approach supplied as the accepted answer. Thank you for chipping in. It's a great idea. I got hung up on checking characters/strings rather than words, and couldn't see the wood for the trees. Commented Jul 29 at 21:49

1 Answer


I came up with this solution:

<?php

$productTitles = array(
   'Super red Megaman T-shirt',
   'Super Megaman blue T-shirt',
   'Super black Megaman T-shirt',
   'Super Megaman T-shirt - aquamarine',
);

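// Use the last title's words as the starting set.
$productWords = explode(' ', array_pop($productTitles));
foreach ($productTitles as $productTitle) {
   // Keep only the words that also appear in this title.
   $productWords = array_intersect($productWords, explode(' ', $productTitle));
}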

echo implode(' ', $productWords);

It returns:

Super Megaman T-shirt

The trick is to split each title into words and use array_intersect() repeatedly, so only the words shared by every title survive.

Demo: https://3v4l.org/0vlcL

This is a basic example. You would probably need to check whether there is only one title and skip it in that case. It will also only work on whole "words", not parts of words.
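A sketch of the same word-intersection idea with the edge cases mentioned above handled. `commonTitleWords()` is a hypothetical helper name; returning an empty string for no titles and the title itself for a single title are assumed behaviours, not part of the original answer. It also splits on runs of whitespace with preg_split() instead of explode(' '), so double spaces don't produce empty "words".

```php
<?php
/**
 * Hypothetical helper: returns the words common to every title,
 * in the order they appear in the first title.
 */
function commonTitleWords(array $titles): string
{
   // Nothing to compare: return an empty string (assumed behaviour).
   if (count($titles) === 0) {
      return '';
   }
   // A single title has nothing to intersect with, so return it unchanged.
   if (count($titles) === 1) {
      return trim(reset($titles));
   }

   $wordLists = array();
   foreach ($titles as $title) {
      // Split on runs of whitespace, dropping empty pieces.
      $wordLists[] = preg_split('/\s+/', trim($title), -1, PREG_SPLIT_NO_EMPTY);
   }

   // At least two arrays are passed here, so this also works on PHP 7.
   return implode(' ', array_intersect(...$wordLists));
}

echo commonTitleWords(array(
   'Super red Megaman T-shirt',
   'Super Megaman blue T-shirt',
   'Super black Megaman T-shirt',
   'Super Megaman T-shirt - aquamarine',
));
```

Because the lone `-` in the last title is treated as its own word and doesn't occur in the others, it is dropped, and this prints `Super Megaman T-shirt`.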

  • Way cool. That's clever to iterate over the items by splitting on words and using array_intersect() on itself. I never would have thought of that. Thank you so much. I'm going to run it on a bunch of products in the database and see what happens, but from a cursory test, this looks like it hits the spot and is simple and performant. Brilliant! Commented Jul 29 at 21:42
  • @StefDawson OK, let me know how it goes. Remember, the last title will determine the word order in the result, but by exchanging the arguments of array_intersect() you can make that the first title. Commented Jul 30 at 8:13
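On the word-order point, and on the question's array_reduce() idea: a sketch (not part of the original answer) that seeds array_reduce() with the first title's words via array_shift(), so the first title, rather than the last, determines the order of the result. It assumes at least one title. Closures are used instead of arrow functions (`fn`, PHP 7.4+) so it also runs on older PHP 7 releases.

```php
<?php
$productTitles = array(
   'Super red Megaman T-shirt',
   'Super Megaman blue T-shirt',
   'Super black Megaman T-shirt',
   'Super Megaman T-shirt - aquamarine',
);

// Split every title into an array of words.
$wordLists = array_map(function ($title) {
   return explode(' ', $title);
}, $productTitles);

// Seed the reduction with the first title's words, then keep only the
// words that also occur in each following title.
$first = array_shift($wordLists);
$commonWords = array_reduce($wordLists, function ($carry, $words) {
   return array_intersect($carry, $words);
}, $first);

echo implode(' ', $commonWords);
```

This prints `Super Megaman T-shirt`, in the order the words appear in the first title.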
