Rewriting a Query Using Entity Detection
Google supports several special search operators that you can add to a query to narrow your results.
One of those is the “site” operator, which restricts a search to a specific domain or website.
Example “site search” queries:
site:www.seobythesea.com google patents
site:www.espn.go.com Derek Jeter
A newly granted patent from Google assumes that a searcher may want to see results from a specific site alongside results from other pages on the Web. The patent attempts to compensate for searchers who do not use the “site” operator in their searches. As the patent tells us:
Some search engines permit a user to restrict a search to a set of related documents, such as documents associated with the same website, by including special characters or terms in the search query. Oftentimes, however, users forget to include these special characters/terms or do not know about them.
The process behind this patent looks for what the inventors call “entities” as part of the search query. An entity can be “anything that can be tagged as being associated with certain documents.” For example, entities can include:
- News sources,
- Online stores,
- Product categories,
- Brands or manufacturers,
- Specific product models,
- Condition (such as new, used, refurbished, etc.),
- Authors,
- Artists,
- People,
- Places, and
- Organizations.
Some entity names are unambiguous and unique, while many others are somewhat ambiguous or generic. If an entity name can be identified, a searcher’s query might be rewritten based upon that entity name. Results for that rewritten query may be included in the search results shown to a searcher, or a link to the “site” search results may be provided instead.
The entity names may be found on the Web in directories, lists, and other places and may be associated with a particular set of pages.
Entity Detection Example
The term “MSNBC” may be identified as an entity associated with the set of pages at the domain http://www.msnbc.msn.com/. If someone were to search for [george bush msnbc], Google might rewrite that search to be [George Bush site:www.msnbc.msn.com/], and include those results within the set of search results for [george bush msnbc], possibly near or at the top of those results. Or, it may include a link to results for that “site” search at the top of the results. It’s also possible that, since the entity “MSNBC” is a news source, news results blended into the Web search results may focus upon site search results from http://www.msnbc.msn.com/.
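The rewrite in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not the method described in the patent: the `ENTITY_SITES` table and the `rewrite_query` function are hypothetical names, and the sketch assumes entity names are single words matched exactly.

```python
# Hypothetical entity table mapping an entity name to the site it is
# associated with. The patent does not prescribe this data structure.
ENTITY_SITES = {
    "msnbc": "www.msnbc.msn.com",
}


def rewrite_query(query):
    """Return a site-restricted version of the query if it contains a
    known entity name, or None if no entity name is found."""
    terms = query.lower().split()
    for i, term in enumerate(terms):
        site = ENTITY_SITES.get(term)
        if site is not None:
            # Drop the entity name and append the "site" restriction.
            remaining = terms[:i] + terms[i + 1:]
            return " ".join(remaining + ["site:" + site])
    return None


print(rewrite_query("george bush msnbc"))
```

A query with no recognized entity name, such as [george bush], returns None here, meaning the search would run unchanged.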
The patent is:
Query rewriting with entity detection
Invented by Hong Zhou, Krishna Bharat, Michael Schmitt, Michael Curtiss, and Marissa Mayer
Assigned to Google
US Patent 7,536,382
Granted May 19, 2009
Filed: March 31, 2004
Abstract
A system receives a search query, determines whether the received search query includes an entity name, and determines whether the entity name is associated with a common word or phrase. When the entity name is associated with a common word or phrase, the system generates a link to a rewritten query, performs a search based on the received search query to obtain the first search results, and provides the first search results and the link to the rewritten query.
When the entity name is not associated with a common word or phrase, the system rewrites the received search query to include a restricted identifier associated with the entity name, generates a link to the received search query, performs a search based on the rewritten search query to obtain second search results, and provides the second search results and the link to the received search query.
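The two branches in the abstract can be sketched as follows. This is a rough reading of the abstract rather than Google's implementation: the `Entity` and `Response` classes, the `ENTITIES` table, the `is_common_word` flag, and the sample entries (including the “time” entry and its site) are all assumptions made for illustration, and the `search` argument stands in for whatever search backend is used.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Entity:
    name: str
    site: str
    is_common_word: bool  # True if the name is also an ordinary word or phrase


@dataclass
class Response:
    searched_query: str          # the query that was actually run
    results: List[str]           # results for searched_query
    link_query: Optional[str]    # the alternative query offered as a link


# Hypothetical entity table; entries are illustrative assumptions.
ENTITIES = {
    "msnbc": Entity("msnbc", "www.msnbc.msn.com", is_common_word=False),
    "time": Entity("time", "www.time.com", is_common_word=True),
}


def handle_query(query: str, search: Callable[[str], List[str]]) -> Response:
    terms = query.lower().split()
    entity = next((ENTITIES[t] for t in terms if t in ENTITIES), None)
    if entity is None:
        # No entity name detected: run the query as received.
        return Response(query, search(query), None)

    rest = [t for t in terms if t != entity.name]
    rewritten = " ".join(rest + ["site:" + entity.site])

    if entity.is_common_word:
        # Ambiguous name: search the received query, link to the rewrite.
        return Response(query, search(query), rewritten)
    # Unambiguous name: search the rewrite, link back to the received query.
    return Response(rewritten, search(rewritten), query)


def fake_search(q: str) -> List[str]:
    # Stand-in backend used only to make the sketch runnable.
    return ["result for " + q]


print(handle_query("george bush msnbc", fake_search))
print(handle_query("time travel movies", fake_search))
```

In both branches the searcher keeps a way back to the other interpretation, which is what lets the system rewrite confidently for unambiguous names while staying conservative for names that double as common words.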
Entity Detection Conclusion
The patent provides more details on how the inclusion of an entity name may influence the search results you see and how Google might identify entity names and associate them with specific pages.
One potential impact of this query rewriting process is that when a query includes a brand name, business name, product name, or any of the other kinds of “entity names,” pages associated with that entity may appear at the top of the search results.