AI and product matching: a 20-fold increase in productivity with Mercio
Product matching, or chaining, consists of linking distinct products that consumers perceive as comparable. As catalogs become increasingly differentiated (private labels, exclusives), this process becomes extremely time-consuming and tedious. Yet it is essential for analyzing the competition and adjusting prices. Semantic analysis and AI make it possible to automate this task while leaving special cases to humans. This combined approach ensures accurate matching, which is essential for effective pricing and price image management. In this article, we give you an overview of our latest developments in this field.
Product matching: what does it mean for a retailer?
Bringing together distinct products that consumers consider equivalent
Product matching, sometimes called "horizontal chaining" or simply "chaining", consists of bringing together two distinct products (i.e. with different product codes) that the end customer nevertheless considers comparable.
Indeed, while some products are sold by multiple retailers in the same market (e.g. a bottle of Coca-Cola or a jar of Nutella), others can only be found on the shelves of a single chain.
This applies not only to private label products specific to each chain, but also to national-brand products for which some exclusivity has been negotiated with manufacturers (new products, special formats).
Examples of products that are strictly identical or merely similar, which customers will compare.
The financial stakes are high: the share of products that are not strictly identical to those found elsewhere on the market varies by market and chain, but it always represents a significant proportion of the retailer's sales, and an even larger proportion of its margin.
The trend is toward more of these chain-specific products, driven in particular by the private label development targets announced by many chains and by increasingly complex comparability between chains, which gives them greater latitude in setting prices and managing their price image. In the UK, private labels account for around 60% of the grocery market, with retailers such as Tesco, Sainsbury's, Asda and Morrisons offering extensive private label ranges. This non-comparability also makes it harder to position prices relative to competitors, which is why many retailers display only internal IDs, not product EANs, on their websites.
In recent years, retailers have been playing a game of hide-and-seek, with EAN resets and exclusive releases. The aim is to offer attractive prices that are visible to consumers but hard for competitors to spot, and thus to throw competitors off in the race for low prices.
An exhaustive, reliable database of matching links is useful for several reasons
For assortment management, it enables you to identify "holes" in your product catalog or that of your competitors, for catalog development or rationalization.
For pricing, it enables you to position your products' prices and monitor your price positioning (calculation of your price indices), as customers compare products that meet the same needs. Without matching links, you will be steering your pricing blindly on a significant proportion of your sales.
A complete and up-to-date list of matching links is therefore a must for retail pricing!
Setting up and maintaining a database of matching links: easier said than done!
The workload for pricers or product managers is huge
One of the obstacles to reliable product matching is the workload involved. A national food retailer has on average between 5,000 and 8,000 active private label references, to be matched against 7 other retailers. That means at least 35,000 links to maintain (5,000 × 7), and up to 56,000. Even this is a low estimate: it does not count exclusives, some sectors (e.g. DIY) have even larger catalogs, and new references appear regularly (new products, EAN resets...).
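The arithmetic behind this estimate can be checked quickly. The sketch below assumes, as the paragraph does, one link per private label reference per competitor; it ignores exclusives and new references, so it gives a floor rather than a full count.

```python
def link_count(private_label_refs: int, competitors: int) -> int:
    """Links needed if each private label reference is matched once per competitor."""
    return private_label_refs * competitors

low = link_count(5_000, 7)   # lower end of the private label range
high = link_count(8_000, 7)  # upper end of the private label range
print(f"{low:,} to {high:,} links to maintain")  # 35,000 to 56,000 links to maintain
```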
A non-automated matching process entails a heavy workload, even for the largest teams. All the more so as matching is only one responsibility among many, regardless of the team in charge.
Product heterogeneity and poor data quality make the task more complicated
The great diversity of products in a retailer's catalog adds another layer of complexity to the matching task. A characteristic can be crucial to the value consumers perceive in one unit of need (a group of products meeting the same consumer need), yet not exist at all for another type of product.
For example, protection duration is a deciding factor for deodorant buyers, while flavour matters for yoghurts and cooking time for pasta. Characteristics therefore vary from one product family to another, and their values are not standardized.
Product characteristics vary according to product type, and are not standardized in catalogs.
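To illustrate why characteristics cannot be handled with a single schema, here is a minimal sketch in which each product family declares its own key attributes and free-text values are normalized before comparison. The family names, attribute names and parsing rule (`KEY_ATTRIBUTES`, `parse_hours`, `key_values`) are hypothetical, chosen only to mirror the examples above.

```python
import re
from typing import Optional

# Hypothetical: the characteristics that drive comparability differ by product family.
KEY_ATTRIBUTES = {
    "deodorant": ["protection_hours"],
    "yoghurt": ["flavour"],
    "pasta": ["cooking_time_min"],
}

def parse_hours(raw: str) -> Optional[int]:
    """Normalize free-text durations such as '48h', '48 H' or '48 hours' to a number of hours."""
    match = re.search(r"(\d+)\s*h", raw.lower())
    return int(match.group(1)) if match else None

def key_values(family: str, attributes: dict) -> dict:
    """Keep only the attributes that matter for this product family."""
    return {name: attributes.get(name) for name in KEY_ATTRIBUTES.get(family, [])}

deo_a = {"protection_hours": parse_hours("48h"), "scent": "fresh"}
deo_b = {"protection_hours": parse_hours("48 hours")}
print(key_values("deodorant", deo_a) == key_values("deodorant", deo_b))  # True
```

Without the normalization step, "48h" and "48 hours" would look like different values even though they describe the same characteristic.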
An exhaustive, structured database of product characteristics is not available; building one would require curating every product on the market (including private labels and exclusives, for matching purposes).
Semantic analysis: a proven AI application
Word embeddings have developed spectacularly in recent years
The first research aiming to model human language mathematically dates back to the 1950s, but technological advances (in particular greater computing and storage capacity) and algorithmic developments have led to significant progress in recent years.
Word embeddings encode discrete information (a word or a set of words) as a vector of numbers in a high-dimensional space; if you're interested in this subject, we recommend reading this presentation from Stanford University. LLMs (ChatGPT and others) rely on these vector representations. The vectors encode different aspects of language, so a properly trained model can "learn" the semantic relationships between words.
Illustration of a classic example showing how semantics are preserved in the vector space: v(king) - v(man) + v(woman) ≈ v(queen)
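The analogy in the caption can be reproduced with toy numbers. The sketch below uses hand-crafted 3-dimensional vectors (not the output of a trained model, which would have hundreds of dimensions), measures closeness with cosine similarity and, as is usual for this kind of analogy test, excludes the three input words from the candidates.

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 for vectors pointing the same way, lower as they diverge."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy vectors; the dimensions loosely stand for "royalty", "gender" and "age".
vectors = {
    "king":   [0.9, 0.7, 0.6],
    "queen":  [0.9, -0.7, 0.6],
    "man":    [0.1, 0.7, 0.5],
    "woman":  [0.1, -0.7, 0.5],
    "prince": [0.8, 0.7, 0.1],
}

def analogy(a, b, c):
    """Return the word whose vector is closest to v(a) - v(b) + v(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("king", "man", "woman"))  # queen
```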
Recent advances in AI models are directly applicable to product matching
Product labels are excellent candidates for the word embeddings described above:
- all products have a label (cash register or ERP), enabling complete catalog coverage.
- the main characteristics often appear in the label, or in the label enriched with nomenclature information, so products are well separated in the resulting vector space.
Encoding labels places their characteristics in the vector space, which makes it possible to accurately detect "close" products, i.e. those with the same characteristics (or almost: in practice, similar own-brand products share the same characteristics except for the brand).
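Detecting "close" products can be sketched as a nearest-neighbour search over label vectors. Since the article does not say which model Mercio uses, the sketch swaps in a deliberately simple stand-in, counts of character trigrams, in place of a learned embedding; what it illustrates is the ranking logic (encode, score with cosine similarity, keep the top candidates). Labels and function names are illustrative.

```python
import math
from collections import Counter

def embed(label: str, n: int = 3) -> Counter:
    """Stand-in for a learned embedding: counts of character n-grams in the padded, lower-cased label."""
    text = f" {label.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[gram] for gram, count in u.items())  # missing n-grams count as 0
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def best_matches(own_label, competitor_labels, top_k=2):
    """Rank competitor labels by similarity to one of our own labels."""
    query = embed(own_label)
    scored = [(cosine(query, embed(label)), label) for label in competitor_labels]
    return sorted(scored, reverse=True)[:top_k]

competitors = ["Yoghurt strawberry 4x125g", "Plain yoghurt 4x125g", "Penne pasta 500g"]
for score, label in best_matches("Strawberry yoghurt 4x125g", competitors):
    print(f"{score:.2f}  {label}")
```

A learned embedding would also capture synonyms and abbreviations that share no characters, which trigram counts cannot.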
*Note on label quality: The choice of which type of label to use depends on several factors. Web labels are often of better quality, but are missing on parts of the catalog. Internal labels have the advantage of being available throughout the catalog, but are of poorer quality, with numerous abbreviations to compensate for character limitations. Using a combination of the different types of label leads to the best results.
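One possible way to combine label types, sketched under assumptions: prefer the web label when it exists, otherwise expand the abbreviations of the internal label, and prefix the nomenclature path for extra context. The abbreviation table and the `enriched_label` helper are hypothetical; a real table would be built from the retailer's own labelling conventions.

```python
from typing import Optional

# Hypothetical abbreviations found in length-limited internal (till/ERP) labels.
ABBREVIATIONS = {"yog": "yoghurt", "strwb": "strawberry", "choc": "chocolate"}

def expand(label: str) -> str:
    """Lower-case the label and replace known abbreviations token by token."""
    return " ".join(ABBREVIATIONS.get(token, token) for token in label.lower().split())

def enriched_label(web: Optional[str], internal: str, nomenclature: str) -> str:
    """Web label if available, expanded internal label otherwise, prefixed with the nomenclature."""
    base = web.lower() if web else expand(internal)
    return f"{nomenclature.lower()} | {base}"

print(enriched_label(None, "YOG STRWB 4X125G", "Dairy > Yoghurts"))
# dairy > yoghurts | yoghurt strawberry 4x125g
```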
How Mercio meets the challenge of product matching with the latest AI models
The approach described above is used by Mercio to algorithmically generate link recommendations, with the dual goal of boosting user productivity and providing a detailed view of positioning in relation to the competition.
Mercio's technical teams met two challenges in the operational implementation of the algorithm:
- Selecting the right model, and training and parameterizing it.
The LLM revolution and initiatives to make pre-trained models available have reduced the barrier to entry in this field. However, a good understanding of the underlying workings is still required for fine-tuning.
- Integrating the algorithm into the application's data lifecycle.
A naive approach is to train and run the algorithm independently of the application. This often leads to duplicated data, more data exchanges and a significant increase in complexity, and therefore in maintenance. The latest data warehouse technologies make it possible to run these algorithms directly where the data is stored, and the complete overhaul of our solution architecture now enables us to take full advantage of these advances.
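The idea of running the algorithm where the data is stored can be illustrated with Python's built-in `sqlite3` module, which lets you register a Python function and call it from SQL. SQLite here is only a stand-in for a data warehouse (the article does not name the technology Mercio uses), and the word-overlap score is a simple placeholder for embedding similarity; table and column names are hypothetical.

```python
import sqlite3

def token_jaccard(a: str, b: str) -> float:
    """Placeholder similarity: share of words two labels have in common."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

conn = sqlite3.connect(":memory:")
conn.create_function("similarity", 2, token_jaccard)  # callable from SQL as similarity(x, y)
conn.executescript("""
    CREATE TABLE own_products (id INTEGER, label TEXT);
    CREATE TABLE competitor_products (id INTEGER, label TEXT);
    INSERT INTO own_products VALUES (1, 'strawberry yoghurt 4x125g');
    INSERT INTO competitor_products VALUES (10, 'yoghurt strawberry 4x125g'), (11, 'penne pasta 500g');
""")

# Candidate links are scored inside the database: no export, no duplicated copy of the catalog.
rows = conn.execute("""
    SELECT o.id, c.id, similarity(o.label, c.label) AS score
    FROM own_products AS o CROSS JOIN competitor_products AS c
    WHERE similarity(o.label, c.label) >= 0.5
    ORDER BY score DESC
""").fetchall()
print(rows)  # [(1, 10, 1.0)]
```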
What role do operational teams play in this new product matching process?
Augmented operational teams
We have observed that the development of AI rarely removes humans from the process outright; rather, it redistributes roles between humans and machines. This is also the case with product matching automation, which takes a large share of low value-added work off people's hands while significantly improving matching quality and coverage.
Our tuned product matching algorithm achieves validation rates of over 95%, enabling pricers and category managers to focus efficiently on special cases.
In the case of matching, there are two reasons why humans will remain useful:
- Data problems
Operational data is never perfect, and incorrect or incomplete labels, although rare, are always possible.
- The intuitive nature of certain links
Indeed, while some products are clearly comparable (same characteristics), pricers or buyers may disagree on whether other product pairs are comparable. For example, customers will compare two products whose volumes differ only slightly (all other characteristics being identical), but if the volume difference becomes significant, the products are too far apart to be compared. Where to draw this line is the subject of debate, sometimes heated, among operational staff.
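Where to draw the line can at least be made explicit and configurable. A minimal sketch, assuming both volumes are expressed in the same unit and the gap is measured relative to the larger of the two; the 10% default is a hypothetical value, not a recommendation.

```python
def comparable_volumes(vol_a: float, vol_b: float, max_relative_gap: float = 0.10) -> bool:
    """True when the volume gap, relative to the larger volume, stays within the tolerance.

    The 10% default is hypothetical; each team would set its own threshold per category.
    """
    gap = abs(vol_a - vol_b) / max(vol_a, vol_b)
    return gap <= max_relative_gap

print(comparable_volumes(1000, 950))  # True: 5% gap
print(comparable_volumes(1000, 750))  # False: 25% gap
print(comparable_volumes(1000, 750, max_relative_gap=0.30))  # True with a looser threshold
```

Keeping the threshold as a parameter lets each category team encode its own answer to the debate, while the algorithm applies it consistently.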
We also see other operational reasons for controlling the deployment of chaining: "I gradually chain this category, as I can make investments. Because if I chain everything immediately, my index will explode and I'll get my knuckles rapped..." - an anonymous user 😉.
How can we ensure good man-machine collaboration?
The challenge in this man-machine coordination is to precisely define the scope of each party (based on the strengths and weaknesses of each), and to keep the number of clicks in the application interface to an absolute minimum.
An optimized review process for operational teams
The high precision of the computed recommendations, combined with an optimized user experience, enables users to create 15 links per minute, i.e. 900 links per hour.
And while the machine makes the human's work easier by proposing link recommendations, the human improves the algorithm's accuracy by validating the correct proposals: the information generated during link review is then used to train the algorithm on concrete examples. A true collaboration, in which human and machine enrich each other.
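Applied to the 35,000-link low estimate, this rate gives an order of magnitude for the review effort. The sketch assumes every link is reviewed at a constant 15 links per minute, which real sessions will not match exactly.

```python
LINKS_PER_MINUTE = 15

def review_hours(n_links: int, links_per_minute: float = LINKS_PER_MINUTE) -> float:
    """Hours needed to review n_links at a constant pace."""
    return n_links / links_per_minute / 60

print(LINKS_PER_MINUTE * 60)           # 900 links per hour
print(round(review_hours(35_000), 1))  # 38.9
```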
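The feedback loop can be sketched as turning review decisions into labeled examples. This assumes, hypothetically, that rejected proposals are recorded alongside validated ones, so that the model sees negative as well as positive pairs; the `Review` record and helper names are illustrative, and the validation rate here is simply the share of proposals accepted.

```python
from dataclasses import dataclass

@dataclass
class Review:
    own_id: int
    competitor_id: int
    accepted: bool  # True if the reviewer validated the proposed link

def to_training_pairs(reviews):
    """Labeled examples for retraining: 1 for a validated link, 0 for a rejected proposal."""
    return [((r.own_id, r.competitor_id), int(r.accepted)) for r in reviews]

def validation_rate(reviews):
    """Share of proposed links that reviewers accepted."""
    return sum(r.accepted for r in reviews) / len(reviews) if reviews else 0.0

reviews = [Review(1, 10, True), Review(2, 11, True), Review(3, 12, False)]
print(to_training_pairs(reviews))          # [((1, 10), 1), ((2, 11), 1), ((3, 12), 0)]
print(round(validation_rate(reviews), 2))  # 0.67
```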
Complete, up-to-date product chaining is within the reach of even the largest retail catalogs!
Conclusion: A revolution in retail product matching
Product matching is a complex and time-consuming task, but one that is essential for optimizing assortment management, pricing and price image. Thanks to advances in artificial intelligence, it is now possible to automate a large part of this process, while leaving the most complex cases to human intervention.
With high-performance algorithms and an optimized user interface, complete and precise matching becomes possible, even for the most extensive catalogs. Retailers who adopt these tools can improve their competitiveness and performance, through greater efficiency and precision. A controlled process is a strategic lever for retail!
If you're interested in consumer price perception, don't hesitate to read more about cognitive biases in pricing.
And if you'd like to discuss pricing, technologies and algorithms, get in touch!