
By CrowdFlower Admin, July 18, 2016

Improving Your Search Relevance Algorithm With Human Curated Data

 

Join CrowdFlower, Adobe, and Etsy on July 19 to discuss how to improve search relevance, business outcomes, and user experience using human curated data.

 

As you set out to improve your search algorithm, it’s important to know which tools you have at your disposal. Ahead of tomorrow’s webinar, “How to Improve Search Relevance, Business Outcomes and User Experience,” we’re previewing some of the tools and tactics our guests from Adobe and Etsy use every day to enhance their search algorithms.

In our previous post, we outlined the difference between click data and human curated data metrics. Today, we’ll dig deeper into the use cases that specifically require human curated data.

Why You Need Human Curated Data

Tapping individual contributors to evaluate search results gives you explicit relevance judgments, which are a higher-quality signal to optimize for than clicks alone. For example, Etsy turned to CrowdFlower to help it solve for brand affinity: it wanted the products most aligned with the Etsy brand (the most “Etsy-ness,” if you will) to appear first in its search results. This was a problem that required human judgment, because the nature of Etsy’s platform means typical click data doesn’t suffice. For one thing, Etsy is simply fun to browse. If a user clicks from page to page of search results, that doesn’t necessarily mean they can’t find what they’re looking for; they may just be enjoying browsing.

That’s where human curated data comes in. Etsy used CrowdFlower to create a better filtered search, taking the burden of labeling products off its independent sellers and handing the job to CrowdFlower contributors instead. With an ecosystem of more than 40 million products, this was no small task.

When setting up a relevance scoring system for human curated data, we recommend first having individual contributors score your current search algorithm as-is to establish a baseline. Then make changes based on the metrics that are right for you and your site, have contributors score the query-result pairs the new algorithm produces for the same random set of queries, and compare those scores against your old algorithm’s.
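To make the baseline-then-retest workflow concrete, here is a minimal Python sketch. It assumes contributor scores have already been collected as numbers for each query (for instance on one of the point scales described in the list that follows), and the helpers `sample_queries`, `per_query_means`, and `compare` are hypothetical illustrations, not part of any CrowdFlower API.

```python
import random
from statistics import mean


def sample_queries(query_log, k, seed=0):
    """Draw a reproducible random sample of distinct queries.

    Reusing the same seed gives both the old and new algorithm the same queries.
    """
    rng = random.Random(seed)
    return rng.sample(sorted(set(query_log)), k)


def per_query_means(judgments):
    """Collapse contributor scores into one mean score per query.

    `judgments` maps each query to the list of scores contributors gave
    that query's results (a hypothetical input format).
    """
    return {query: mean(scores) for query, scores in judgments.items()}


def compare(baseline, candidate):
    """Compare per-query mean scores for two algorithms on their shared queries."""
    shared = sorted(baseline.keys() & candidate.keys())
    diffs = [candidate[q] - baseline[q] for q in shared]
    return {
        "baseline_mean": mean(baseline[q] for q in shared),
        "candidate_mean": mean(candidate[q] for q in shared),
        "mean_lift": mean(diffs),
        "improved": sum(d > 0 for d in diffs),
        "worse": sum(d < 0 for d in diffs),
        "unchanged": sum(d == 0 for d in diffs),
    }


# Example with made-up scores on a 0-2 scale for the same three queries.
old = per_query_means({"wool scarf": [2, 1], "ceramic mug": [1, 1], "leather wallet": [0, 2]})
new = per_query_means({"wool scarf": [2, 2], "ceramic mug": [1, 0], "leather wallet": [1, 1]})
report = compare(old, new)  # one query improved, one got worse, one unchanged
```

Fixing the random seed keeps the query sample identical across rounds, so a change in `mean_lift` reflects the algorithm change rather than a different set of queries. With a small sample, you would likely also want a significance test before declaring the new algorithm a winner.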

Here’s how you can tell whether your new algorithm is an improvement or whether you should make further changes.

Ways Contributors Can Improve Your Algorithm:

  • Score Query-Result Pairs: One of the most effective ways to use contributors is to have them score query-result pairs for relevance. To establish this metric, you must design a numerical scale (our customers typically use a 2-, 3-, or 5-point scale) that contributors use to score each query-result pair. This gives you a high-level picture of how well your search relevance algorithm is performing, as well as a number to beat in later relevance testing.
  • Additional Tagging: Item metadata can significantly increase search relevance. Contributors, working on their own or alongside automated, machine learning-enabled tagging, can quickly fill a product database with new tags.
  • Data Cleaning and Product Categorization: Product databases get messy. Manufacturers may use different wording for similar products, different distributors may describe or title identical products in different ways, and sometimes a single product has several associated images with no reliable way to tell which is best. Contributors can reconcile these discrepancies by judging which listings match and which image best represents the product.
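The query-result scoring in the first item above can be sketched in Python as follows. This is an illustration under stated assumptions: the 0-2 labels in `SCALE`, the `Judgment` record, and the helpers `pair_scores` and `relevance_score` are hypothetical, and averaging is only one of several ways to combine multiple contributors’ scores for the same pair.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical 3-point scale; the post notes 2-, 3-, and 5-point scales are common.
SCALE = {0: "irrelevant", 1: "somewhat relevant", 2: "highly relevant"}


@dataclass(frozen=True)
class Judgment:
    """One contributor's score for one query-result pair."""
    query: str
    result_id: str
    contributor: str
    score: int


def pair_scores(judgments, scale=SCALE):
    """Average contributor scores per (query, result_id), rejecting off-scale scores."""
    by_pair = {}
    for j in judgments:
        if j.score not in scale:
            raise ValueError(f"score {j.score} is not on the scale {sorted(scale)}")
        by_pair.setdefault((j.query, j.result_id), []).append(j.score)
    return {pair: mean(scores) for pair, scores in by_pair.items()}


def relevance_score(judgments, scale=SCALE):
    """Headline metric: mean pair score rescaled to [0, 1] so different scales compare."""
    low, high = min(scale), max(scale)
    return (mean(pair_scores(judgments, scale).values()) - low) / (high - low)


judgments = [
    Judgment("wool scarf", "item-1", "c1", 2),
    Judgment("wool scarf", "item-1", "c2", 2),
    Judgment("wool scarf", "item-2", "c1", 0),
    Judgment("wool scarf", "item-2", "c2", 1),
]
print(relevance_score(judgments))  # → 0.625
```

Rescaling to [0, 1] is a design choice rather than something the post prescribes; it lets a score gathered on a 3-point scale sit next to one gathered on a 5-point scale when you track the number you are trying to beat.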

Conclusion

Human curated data is the key to elevating your search relevance algorithm from good to great. To get real-life relevance scoring examples from the data science leaders of CrowdFlower, Adobe, and Etsy, sign up for our webinar tomorrow, July 19.