When you’re searching for something on the Web, does it matter whether you use the singular or plural version of a word in your search?

For example, let’s say that you are looking for a new pair of sneakers to go jogging in, and you want to find the right combination of comfort and support, so you decide to look into the best sneakers for running. Does it make a difference in search results when you type in running shoes or running shoe in a search box?

If a search engine just returned results to you based upon your choice of a singular or plural query, would you get the best results? Should a search engine explore both versions, and try to provide you with a mix of what it believes are the best results, after looking at results from the singular and plural queries?

A quick look at the top ten results at Yahoo and Google for both “running shoe” and “running shoes” (both searches without the quotation marks) showed some overlap in the pages returned for the singular and plural versions at each search engine, but the vast majority of the results seemed to favor the plural version of the word over the singular version.

So it does seem like both Yahoo and Google are looking at both singular and plural queries when someone enters one or the other.

Are the search engines performing queries for both the singular and plural versions, and showing a mix of the most relevant results? Are they giving more weight to one set of results than the other when they present those results? Are they looking at singular and plural versions of all words in a query (running or runnings and shoe or shoes), or somehow picking out only certain words whose plural and non-plural forms they consider when presenting search results?

I performed searches at Google and Yahoo with the singular and plural versions of shoe, and also separate searches for those terms with quotation marks around them, which the advanced search pages at Google and Yahoo tell us should return matches for the exact phrase searched for.

Using the quotation marks provides an “exact” search result at Google or Yahoo. Without the quotation marks, the search engine returns a “findall” or “find all” set of results. The difference between exact and findall search results is that the query terms in a findall search might appear on a page, but not as a phrase (for instance, “I went running for my shoes”). However, it’s still interesting to compare exact and findall results, as well as singular and plural results.
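The distinction between the two kinds of matching can be sketched in a few lines of Python. This is only an illustration of the idea, not how Google or Yahoo actually match pages; the function names `findall_match` and `exact_match` are made up for this example, and real search engines do far more than compare lowercase tokens.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def findall_match(query, page_text):
    """True if every query term appears somewhere on the page, in any order."""
    page_terms = set(tokenize(page_text))
    return all(term in page_terms for term in tokenize(query))

def exact_match(query, page_text):
    """True if the query terms appear on the page as one contiguous phrase."""
    q = tokenize(query)
    p = tokenize(page_text)
    return any(p[i:i + len(q)] == q for i in range(len(p) - len(q) + 1))

page = "I went running for my shoes"
print(findall_match("running shoes", page))  # True
print(exact_match("running shoes", page))    # False
```

The sample sentence contains both query terms, so it counts as a findall match, but the terms never sit next to each other, so it is not an exact match.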

Here are the URLs that I received in my searches for running shoe, “running shoe”, running shoes, and “running shoes” at Yahoo and Google.

Yahoo – running shoe

www.runningshoes.com
www.roadrunnersports.com
answers.yahoo.com/question/index?qid=20060726101554AAIrZIx
www.runnersworld.com/topic/0,7122,s6-240-400-0-0,00.html
www.runnersworld.com/topic/0,7122,s6-240-319-0-0,00.html
answers.yahoo.com/question/index?qid=20060817165921AAPdd7e
www.aapsm.org/runshoe.html
www.brooksrunning.com
www.posetech.com/runningshoes
www.runningcenters.com

Yahoo – “running shoe”

www.runningshoes.com
www.roadrunnersports.com
answers.yahoo.com/question/index?qid=20071121220728AAtZ3a6
answers.yahoo.com/question/index?qid=20070322072938AAIrngn
www.aapsm.org/runshoe.html
www.ehow.com/how_2041199_select-running-shoe.html
www.runnersworld.com/topic/0,7122,s6-240-400-0-0,00.html
http://www.runnersworld.com/cda/shoelabshoefinder/0,7154,s6-240-325-329-0-0-0-0-0,00.html
www.epodiatry.com/running-shoes.htm
www.therunningadvisor.com/running_shoes.html

Yahoo – running shoes

www.roadrunnersports.com
www.runningshoes.com
answers.yahoo.com/question/index?qid=20080324203225AANacNU
answers.yahoo.com/question/index?qid=20060817165921AAPdd7e
www.runnersworld.com/topic/0,7122,s6-240-319-0-0,00.html
www.runnersworld.com/topic/0,7122,s6-240-400-0-0,00.html
www.holabirdsports.com
www.finishline.com/running-shoes.shtml
www.posetech.com/runningshoes
www.finishline.com

Yahoo – “running shoes”

www.roadrunnersports.com
www.runningshoes.com
answers.yahoo.com/question/index?qid=20080324203225AANacNU
answers.yahoo.com/question/index?qid=20060817165921AAPdd7e
www.runnersworld.com/topic/0,7122,s6-240-319-0-0,00.html
www.runnersworld.com/topic/0,7122,s6-240-400-0-0,00.html
www.holabirdsports.com
www.finishline.com/running-shoes.shtml
www.posetech.com/runningshoes
www.finishline.com

Google – running shoe

www.runningshoes.com/
www.roadrunnersports.com/
www.runnersworld.co.uk/quicklinks/shoechoice/
www.runnersworld.com/topic/0,7122,s6-240-400-0-0,00.html
www.runnersworld.com/article/0,7120,s6-240-400–12623-1-1X2X3X4X5X6-6,00.html
www.runningwarehouse.com/
www.epodiatry.com/running-shoes.htm
www.aapsm.org/selectingshoes.html
www.runningunlimited.com/
www.milanrunning.com/

Google – “running shoe”

www.runnersworld.co.uk/quicklinks/shoechoice/
www.runningshoes.com/
www.roadrunnersports.com/
www.runnersworld.com/topic/0,7122,s6-240-400-0-0,00.html
www.aapsm.org/selectingshoes.html
www.aapsm.org/runshoe.html
www.runningwarehouse.com/
www.epodiatry.com/running-shoes.htm
www.therunnershigh.com/shoes/
www.ehow.com/how_2041199_select-running-shoe.html

Google – running shoes

www.runningshoes.com/
www.runningwarehouse.com/
www.roadrunnersports.com/
www.runnersworld.com/topic/0,7122,s6-240-400-0-0,00.html
www.epodiatry.com/running-shoes.htm
www.mpire.com/running-shoes-s.htm
www.holabirdsports.com/
www.brooksrunning.com/
www.aapsm.org/runshoe.html
www.milanrunning.com/

Google – “running shoes”

www.runningshoes.com/
www.runnersworld.com/topic/0,7122,s6-240-400-0-0,00.html
www.runningwarehouse.com/
www.roadrunnersports.com/
www.epodiatry.com/running-shoes.htm
www.brooksrunning.com/
www.holabirdsports.com/
www.milanrunning.com/
www.aapsm.org/runshoe.html
www.telarun.com/
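For anyone who wants to compare these lists more carefully than by eye, here is a small Python sketch that counts the URLs two result lists have in common. It uses the two unquoted Yahoo lists from above as sample data; the helper names `normalize` and `overlap` are my own, and treating URLs as equal after dropping the scheme and a trailing slash is a simplifying assumption.

```python
def normalize(url):
    """Strip the scheme and any trailing slash so equivalent URLs compare equal."""
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url.rstrip("/")

def overlap(results_a, results_b):
    """Return the URLs found in both result lists, in the order of results_a."""
    seen_b = {normalize(u) for u in results_b}
    return [normalize(u) for u in results_a if normalize(u) in seen_b]

# Yahoo results for: running shoe (no quotation marks)
yahoo_singular = [
    "www.runningshoes.com",
    "www.roadrunnersports.com",
    "answers.yahoo.com/question/index?qid=20060726101554AAIrZIx",
    "www.runnersworld.com/topic/0,7122,s6-240-400-0-0,00.html",
    "www.runnersworld.com/topic/0,7122,s6-240-319-0-0,00.html",
    "answers.yahoo.com/question/index?qid=20060817165921AAPdd7e",
    "www.aapsm.org/runshoe.html",
    "www.brooksrunning.com",
    "www.posetech.com/runningshoes",
    "www.runningcenters.com",
]

# Yahoo results for: running shoes (no quotation marks)
yahoo_plural = [
    "www.roadrunnersports.com",
    "www.runningshoes.com",
    "answers.yahoo.com/question/index?qid=20080324203225AANacNU",
    "answers.yahoo.com/question/index?qid=20060817165921AAPdd7e",
    "www.runnersworld.com/topic/0,7122,s6-240-319-0-0,00.html",
    "www.runnersworld.com/topic/0,7122,s6-240-400-0-0,00.html",
    "www.holabirdsports.com",
    "www.finishline.com/running-shoes.shtml",
    "www.posetech.com/runningshoes",
    "www.finishline.com",
]

shared = overlap(yahoo_singular, yahoo_plural)
print(len(shared), "of", len(yahoo_singular), "URLs appear in both lists")
# 6 of 10 URLs appear in both lists
```

The same function can be pointed at any other pair of lists, such as a quoted and an unquoted search at the same engine.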

A new patent application from Yahoo explores how a search engine might handle the singular and plural queries, and convert those query terms to plural or non-plural forms to provide the most relevant results while also limiting how much computation a search engine has to do to return those results.

Word pluralization handling in query for web search
Invented by Fuchun Peng, Nawaaz Ahmed, Xin Li, and Yumao Lu
Assigned to Yahoo
US Patent Application 20080189262
Published August 7, 2008
Filed: February 1, 2007

Abstract

Techniques for determining when and how to transform words in a query to its plural or non-plural form to provide the most relevant search results while minimizing computational overhead are provided.

A dictionary is generated based upon the words used in a specified number of previous most frequent search queries and comprises lists of transformations from plural queries to singular queries and singular to plural.

Unnecessary transformations are removed from the dictionary based upon language modeling. The word to transform is determined by finding the last non-stop re-writable word of the query.

The context of the transformed word is confirmed in the search documents and a version of the query is executed using both the original form of the word and the transformation of the word.

The authors of the patent filing tell us that:

Up to 50% of queries directed to web search engines possess at least one term in the search query that may be transformed either from singular to plural form or plural to singular form.

However, among this 50% of queries, only 25% would benefit from pluralization or de-pluralization.

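Taken together, those two figures mean that at most about 12.5% of all queries (25% of 50%) stand to benefit from such a transformation. So, it seems that sometimes providing results that are singular or plural will provide more relevant results for a searcher than if the search engine had just returned results for the version that a searcher entered into a search box.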

Determining when and how to transform an original query term to its plural or singular form is important to obtain the most relevant search results with minimal overhead.

1) First, a dictionary is generated, based upon the most frequent previous search queries.

2) Once a query is received from the user, in this example “running shoes”, a determination is made to find the particular word to transform.

3) That determination is made by finding the headword of the query, and in this example, the headword is “shoes”.

4) The selected headword is examined in the dictionary to find the transformed non-plural form of the word. The dictionary may or may not contain the transformation because transformations may be removed if they are found not to be relevant.

5) Finally, a version of the query is created using the transformed word and the original form of the word. To the user, this transformation is not visible and only the original submitted query is observed.
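The five steps above can be sketched in Python. This is a rough illustration under several assumptions of mine, not the method from the patent filing: the stop word list, the `naive_variant` helper that simply adds or strips a trailing “s”, and the rule of keeping a transformation only when both forms show up in the query log (a crude stand-in for the language modeling the abstract mentions) are all invented for the example. The headword is taken to be the last non-stop, re-writable word, following the wording of the abstract.

```python
from collections import Counter

# Hypothetical stop words, chosen only for this example.
STOP_WORDS = {"a", "an", "the", "for", "of", "in", "to", "and"}

def naive_variant(word):
    """Toy singular/plural toggle: strip or add a trailing 's' (an assumption)."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word + "s"

def build_dictionary(query_log, top_n=1000):
    """Step 1: map words from the top_n most frequent queries to their other form.

    A transformation is kept only if the other form also occurs in the log,
    which stands in for the pruning of unnecessary transformations.
    """
    most_frequent = [q for q, _ in Counter(query_log).most_common(top_n)]
    vocabulary = {w for q in most_frequent for w in q.lower().split()}
    return {w: naive_variant(w) for w in vocabulary if naive_variant(w) in vocabulary}

def find_headword(terms, dictionary):
    """Steps 2-3: index of the last non-stop word the dictionary can rewrite."""
    for i in range(len(terms) - 1, -1, -1):
        if terms[i] not in STOP_WORDS and terms[i] in dictionary:
            return i
    return None

def expand_query(query, dictionary):
    """Steps 4-5: return the original query plus a version with the headword transformed."""
    terms = query.lower().split()
    i = find_headword(terms, dictionary)
    if i is None:
        return [" ".join(terms)]  # no usable transformation, search as entered
    rewritten = terms[:i] + [dictionary[terms[i]]] + terms[i + 1:]
    return [" ".join(terms), " ".join(rewritten)]

# A made-up query log, purely for demonstration.
query_log = [
    "running shoes", "running shoe", "running shoes",
    "shoes for running", "best running shoe",
]
dictionary = build_dictionary(query_log)

print(expand_query("running shoes", dictionary))
# ['running shoes', 'running shoe']
print(expand_query("shoes for running", dictionary))
# ['shoes for running', 'shoe for running']
```

In the second query, “running” is the last word but has no entry in the dictionary, and “for” is a stop word, so the sketch falls back to “shoes” as the word to transform. The searcher would only ever see the query as originally typed.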

The authors also collaborated on a paper titled Context-Sensitive Stemming for Web Search (pdf), and it provides a slightly different look at issues involving pluralization, and other variations of words.