I’ve faced a difficult decision in choosing the last patent, or patent family, to include in this series of posts about the search-related patents most important to people who promote sites on the Web. I find I just can’t choose one.
Synonyms
For the last few weeks, I’ve been arguing with myself over a choice between at least two sets of patents. One patent that I wanted to include involves responding to informational needs by going beyond matching keywords, expanding query terms to include synonyms and pages about related concepts. Google has been granted several related patents that describe how the search engine might identify synonyms, and it’s worth spending some time with all of them:
- Search queries improved based on query semantic information
- Identifying a synonym with N-gram agreement for a query phrase
- Determining query term synonyms within query context
- Identifying common co-occurring elements in lists
- Longest-common-subsequence detection for common synonyms
- Document-based synonym generation
- Machine Translation for Query Expansion
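To make query expansion a little more concrete, here is a minimal Python sketch. It is not taken from any of the patents above: the synonym table, the `SYNONYMS` dictionary, and the `expand_query` and `matches` functions are all hypothetical. The context check loosely echoes the idea in "Determining query term synonyms within query context" that a substitution may only be valid alongside certain other query terms.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical synonym table: term -> [(synonym, required context terms)].
# An empty context set means the synonym applies regardless of the rest of the query.
SYNONYMS: Dict[str, List[Tuple[str, Set[str]]]] = {
    "car": [("automobile", set())],
    "cheap": [("inexpensive", set())],
    # "gm" only means "general motors" next to terms like "stock" or "cars".
    "gm": [("general motors", {"stock", "cars"})],
}


def expand_query(query: str) -> List[Set[str]]:
    """Return, for each query term, the set of the term plus its accepted synonyms."""
    terms = query.lower().split()
    expanded: List[Set[str]] = []
    for term in terms:
        alternatives = {term}
        context = set(terms) - {term}
        for synonym, required in SYNONYMS.get(term, []):
            # Accept a synonym only if its required context appears elsewhere in the query.
            if not required or required & context:
                alternatives.add(synonym)
        expanded.append(alternatives)
    return expanded


def matches(document: str, expanded: List[Set[str]]) -> bool:
    """True if every query term, or one of its synonyms, occurs in the document.

    Uses a naive substring check to keep the sketch short.
    """
    text = document.lower()
    return all(any(alt in text for alt in alts) for alts in expanded)


# "cheap car" can now match a page that only says "inexpensive automobile".
print(matches("Inexpensive automobile deals", expand_query("cheap car")))  # True
```

Tying each synonym to a required context is what keeps a query like "gm foods" from being expanded to "general motors foods" in this toy example.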
Large Data Sets
Another patent, or really a group of patents, that I kept coming back to focuses on “large data sets.” The patents use that phrase multiple times, including in their titles. And when they write “large,” they mean really, really big.
A couple of years back, a search engineer from one of the top commercial search engines told me that the search engines collect so much data about how people search, use search engines, and browse the Web that their difficulty wasn’t so much gathering the data as figuring out how to use it.
If you’ve purchased books at Amazon.com, you’ve likely experienced their recommendation engine, which suggests books that were viewed or purchased by people who viewed or purchased some of the same books you did.
Imagine a search engine building a model that might look at a combination of data from and about users, the queries they used, and the documents they did or didn’t select. Each of these combinations is referred to as an “instance.” An instance is a “triple” of data: (u, q, d), where u is user information, q is query data from the user, and d is document information relating to pages returned for that query.
This would be a prediction model that ranks pages based on the likelihood that a particular page, or another kind of document, would be selected by a particular searcher at a particular time and day from a certain location.
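As a rough illustration of instances and selection prediction, here is a minimal Python sketch. It is an assumption, not the method described in the patents: the `Instance` fields, the feature crosses, and the use of plain logistic regression trained by stochastic gradient descent are hypothetical stand-ins for whatever model a search engine might actually use. It turns each (u, q, d) triple into sparse features, learns from whether the document was selected, and ranks candidate documents by predicted probability of selection.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instance:
    """A hypothetical (u, q, d) triple: user info, query, and document info."""
    user: Dict[str, str]
    query: str
    document: Dict[str, str]


def features(inst: Instance) -> List[str]:
    """Turn a triple into sparse string features, including simple crosses."""
    feats = [f"q={inst.query}"]
    feats += [f"d.{k}={v}" for k, v in inst.document.items()]
    feats += [f"u.{k}={v}" for k, v in inst.user.items()]
    # Crosses let the model learn which documents suit a query or a user context.
    feats += [f"q={inst.query}|d.{k}={v}" for k, v in inst.document.items()]
    feats += [f"u.{uk}={uv}|d.{dk}={dv}"
              for uk, uv in inst.user.items() for dk, dv in inst.document.items()]
    return feats


class SelectionModel:
    """Logistic regression estimating P(document selected | user, query)."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.weights: Dict[str, float] = defaultdict(float)
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, inst: Instance) -> float:
        z = self.bias + sum(self.weights[f] for f in features(inst))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, data: List[Tuple[Instance, int]], epochs: int = 20) -> None:
        for _ in range(epochs):
            for inst, selected in data:
                # Log-loss gradient step for logistic regression: (label - prediction).
                error = selected - self.predict(inst)
                self.bias += self.learning_rate * error
                for f in features(inst):
                    self.weights[f] += self.learning_rate * error

    def rank(self, user: Dict[str, str], query: str,
             documents: List[Dict[str, str]]) -> List[Dict[str, str]]:
        """Order candidate documents by predicted probability of selection."""
        scored = [(self.predict(Instance(user, query, d)), d) for d in documents]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in scored]


# Toy, made-up selection data: 1 = selected, 0 = shown but not selected.
evening_us = {"location": "US", "hour": "evening"}
morning_uk = {"location": "UK", "hour": "morning"}
recipes = {"site": "recipes.example"}
news = {"site": "news.example"}
data = [
    (Instance(evening_us, "dinner ideas", recipes), 1),
    (Instance(evening_us, "dinner ideas", news), 0),
    (Instance(morning_uk, "dinner ideas", news), 1),
    (Instance(morning_uk, "dinner ideas", recipes), 0),
]
model = SelectionModel()
model.train(data)
print([d["site"] for d in model.rank(evening_us, "dinner ideas", [news, recipes])])
```

Because each user-document cross appears in only one toy instance, the model learns that this hypothetical evening US searcher tends to pick the recipe site while the morning UK searcher picks the news site. A real system would need vastly more data, plus regularization, to generalize.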
I did cover a couple of those patents in a fair amount of detail in my post, Google and Large Scale Data Models Like Panda. Rather than going over those again, I’d recommend visiting that post.
Why I Look at Patents
So instead of recommending one last patent or set of patents, I’d rather use this last post to point out the reasons why I spend a lot of time looking at search-related patents:
- Search-related patents provide insights into many of the assumptions that search engines and search engineers hold about search, searchers, and the Web.
- They sometimes predict, and provide a preview of, things that the search engines might launch.
- They sometimes give us previously unknown details about things that the search engines have been doing.
- They describe some of the research conducted by people at search engines, even if some of the methods and processes they describe may never have been implemented.
- They give us a glimpse of search engines as businesses: their desire to improve the quality of the services they provide, their methods for measuring and testing what they do, and the different directions they might take their offerings.
- They raise questions worth asking, and experimenting with, about how search works or might work.
I spend a few hours every week looking for the holy grail of search: the patent that explains the latest and greatest changes to the algorithms that power Google, Yahoo, and Bing. Most patents don’t rise to that level; instead, they offer tantalizing hints of a jagged bigger picture, like jigsaw puzzle pieces that don’t always fit together.
The search engines don’t patent every idea they have, and some patents may even mislead by their very existence, pointing down paths the search engines may never follow. I’m often left with more questions than answers after reading a patent, at least with the best of them. Those are the patents that force me to ask myself questions like:
- What would it mean if the search engines did this?
- How could I tell that the search engines aren’t doing this?
- What might the search engines be doing instead?
- How might people attempt to abuse this method?
- Does the technology exist to do this yet?
You don’t have to read through lots and lots of patents to do SEO. But I find it helps me.
I hope you have enjoyed this series. Thanks for reading.
All parts of the 10 Most Important SEO Patents series:
- Part 1 – The Original PageRank Patent Application
- Part 2 – The Original Historical Data Patent Filing and its Children
- Part 3 – Classifying Web Blocks with Linguistic Features
- Part 4 – PageRank Meets the Reasonable Surfer
- Part 5 – Phrase Based Indexing
- Part 6 – Named Entity Detection in Queries
- Part 7 – Sets, Semantic Closeness, Segmentation, and Webtables
- Part 8 – Assigning Geographic Relevance to Web Pages
- Part 9 – From Ten Blue Links to Blended and Universal Search
- Part 10 – Just the Beginning