How Search Engines Might Use Knowledge Base Information for Underserved Queries

If I were to tell you that the major search engines have a bigger and richer database than their World Wide Web index, would you believe me? The chances are that you’re one of the people who helped build it. The information that Google, Bing, and Yahoo collect about the searches, query sessions, and clicks that searchers perform on the Web covers an incredible number of searches a day. When Google introduced their Knowledge Graph this past May, they gave us a hint of the scope and usage of this database:

For example, the information we show for Tom Cruise answers 37 percent of the next queries that people ask about him. In fact, some of the most serendipitous discoveries I’ve made using the Knowledge Graph are through the magical “People also search for” feature.

When someone searches for a query that doesn’t produce many results at Google or Bing, the search engines might remove some of the query terms to provide more results, or they might look for synonyms that might help fill the same or a similar informational need. But the chances are that such approaches still might not produce the kinds of results that searchers want to see.
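To make the term-dropping idea concrete, here is a minimal sketch of how a query might be relaxed until it returns enough results. Everything in it is an assumption made for illustration: the `relax_query` and `toy_search` names, the tiny `DOCS` index, and the `min_results` threshold are hypothetical, and nothing here reflects how Google or Bing actually implement this.

```python
from itertools import combinations

# Hypothetical toy index: document id -> text
DOCS = {
    1: "vintage bicycle repair guide",
    2: "bicycle repair shop hours",
    3: "vintage camera repair",
}

def toy_search(terms):
    """Return ids of documents that contain every term (stand-in for a real engine)."""
    return [doc_id for doc_id, text in DOCS.items()
            if all(t in text.split() for t in terms)]

def relax_query(terms, search, min_results=2):
    """Drop as few terms as possible until the search returns enough results."""
    for keep in range(len(terms), 0, -1):
        for subset in combinations(terms, keep):
            results = search(subset)
            if len(results) >= min_results:
                return list(subset), results
    # Nothing reached the threshold; fall back to the original query.
    return list(terms), search(tuple(terms))

kept, results = relax_query(["vintage", "bicycle", "repair", "tandem"], toy_search)
print(kept, results)
```

In this toy run the relaxed query ends up as "vintage repair", which pulls in a camera repair page. That is the weakness described above: more results, but not necessarily the ones the searcher wanted.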

In How Google Might Suggest Topics for You to Write About, I covered a patent application (now granted; the Google patents are 8,037,063 and 7,668,823) co-authored by Jeffrey D. Oldham, Hal R. Varian, Matthew D. Cutts, and Matt Rosencrantz, which describes how Google might identify queries that don’t produce any relevant results, and what Google might do to increase the number of relevant results shown for them.

A Microsoft patent granted last week covers similar ground. The patent describes how it might take advantage of a knowledge base made up of information gathered from query logs and search history to broaden the results available to searchers. It also describes a marketplace where those underserved queries might become accessible to content creators to enable them to produce relevant content for a fee. This echoes in many ways the approach described in the Google patents.

In short, Bing might identify underserved queries, meaning queries that return fewer than a certain threshold number of search results. It might then find other related queries that are similarly underserved and group them into a taxonomy category with a set of associated attributes. The content shown in search results might then match not only the original query terms but also the associated queries in those related categories.
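The threshold step might look something like the sketch below. It is only an illustration under stated assumptions: the `QUERY_LOG` rows, the `underserved_queries` function, and the `max_results` and `min_searches` values are all hypothetical, since the patent does not publish specific numbers or data formats.

```python
from collections import Counter

# Hypothetical log rows: (query text, number of results the engine returned)
QUERY_LOG = [
    ("repair 1970s tandem bicycle hub", 3),
    ("repair 1970s tandem bicycle hub", 3),
    ("tom cruise movies", 120000),
    ("restore vintage tandem bicycle frame", 5),
    ("tom cruise movies", 120000),
]

def underserved_queries(log, max_results=10, min_searches=1):
    """Return queries people actually search for that return too few results."""
    demand = Counter(query for query, _ in log)        # how often each query was asked
    result_counts = {query: n for query, n in log}     # results returned per query
    return sorted(
        query for query, n in result_counts.items()
        if n < max_results and demand[query] >= min_searches
    )

print(underserved_queries(QUERY_LOG))
```

The `min_searches` parameter is there because a query with few results only matters if people are really asking it, which is why query logs, and not just the web index, are the raw material here.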

We did get a glimpse of how Bing might generate and include knowledge base results within search results from a Microsoft patent application I wrote about in Should You be Doing Concept Research Instead of Keyword Research? The types of knowledge base results described in that approach would definitely make those search results much more useful by including information about the kinds of things that people commonly search for that might be related to the queries originally searched for. Bing Snapshot Results help expand search results in many useful ways. But how well do they help with queries that might be long-tail queries?

The Microsoft patent is:

Generating content to satisfy underserved search queries
Invented by Mark Looi
Assigned to Microsoft
US Patent 8,311,996
Granted November 13, 2012
Filed: January 18, 2008

Abstract

Generating content to satisfy search engine queries is described. First, a knowledge base including a plurality of prior search queries for a search engine and corresponding prior search results provided by the search engine is accessed, and a plurality of underserved search queries are identified, wherein each of the underserved search queries comprises a search query pattern having a below a threshold number of search results.

Each of the underserved search queries is heuristically related to one another. The plurality of underserved search queries is aggregated into a taxonomy category having a set of associated attributes, the attributes descriptive of the plurality of underserved search queries. Targeted content is generated based on the attributes, wherein the targeted content is tailored to satisfy the underserved search queries.
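The abstract doesn't spell out the heuristic that relates queries to each other or how the attributes are chosen. As a rough sketch only, the code below assumes the simplest thing I can think of: queries are related when they share at least two terms, and a category's attributes are the terms that appear in at least two of its queries. The function names, the stopword list, and both thresholds are my own assumptions, not anything taken from the patent.

```python
from collections import Counter

STOPWORDS = {"a", "the", "for", "of", "to", "in"}  # hypothetical, deliberately tiny

def terms(query):
    """Lowercased, de-duplicated query terms without stopwords."""
    return {w for w in query.lower().split() if w not in STOPWORDS}

def group_queries(queries, min_shared=2):
    """Greedy grouping: a query joins the first group it shares enough terms with."""
    groups = []
    for q in queries:
        for group in groups:
            if any(len(terms(q) & terms(other)) >= min_shared for other in group):
                group.append(q)
                break
        else:
            groups.append([q])  # no related group found, so start a new category
    return groups

def category_attributes(group, min_queries=2):
    """Terms that describe the category: those used by at least min_queries queries."""
    counts = Counter(t for q in group for t in terms(q))
    return sorted(t for t, c in counts.items() if c >= min_queries)

queries = [
    "repair 1970s tandem bicycle hub",
    "restore vintage tandem bicycle frame",
    "tandem bicycle hub replacement parts",
    "gluten free sourdough starter altitude",
]
for group in group_queries(queries):
    print(category_attributes(group), group)
```

With these sample queries, the three tandem bicycle queries land in one category described by "bicycle", "hub", and "tandem", which is the kind of attribute set a content creator could write targeted content against. The sourdough query sits alone and gets no attributes yet.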

Both the Microsoft and the Google approaches to underserved queries mention an easy-to-find way of letting people know which queries are underserved, and both describe setting up some kind of marketplace where people could even be paid to create content for those queries. In some ways, both remind me of what Demand Media patented to try to capture underserved queries, as I described in How Demand Media May Target Keywords for Profitability.

I’m not sure that we will ever see either Google or Bing create such a marketplace. Still, I can see how each might provide better results by using its query logs and web search histories as a knowledge base to provide related search information, locating different aspects of long-tail queries that might include information in similar related categories.