Last Thursday, the Wall Street Journal published a couple of articles that point to a new direction for Google: With Semantic Search, Google Eyes Competitors, and Google Gives Search a Refresh. On Friday, Barry Schwartz reported at Search Engine Land that Google’s Head of Spam, Matt Cutts, announced that Google was working on an “Over Optimization” penalty for websites that were stuffed with too many links and had excessive links pointed to them, in the post Too Much SEO? Google’s Working On An “Over-Optimization” Penalty For That.
Thursday evening I visited the Philadelphia offices of Seer Interactive to give a free presentation, hosted by Wil Reynolds and the Seer Interactive team, on some of the changes in search and social activities involving SEO. The possible changes I pointed out included more emphasis on search as a knowledge base, with more Q&A results, and a greater emphasis on information extraction around entities, as described in the Wall Street Journal article.
Search and social patents for 2012 and beyond (presentation from SEO by the Sea)
There was a nice turnout, and the hospitality of the Philadelphia search community was tremendous. Seer Interactive is hoping to help Philadelphia become better known as a centerpoint of the search community, and with Seer’s Search Church office as a meeting place for future presentations, I think that’s a very real possibility.
Here are some of the points from the presentation that you might not get from viewing the slides by themselves. In the third slide, I mentioned that I had an epiphany in second grade that stuck with me. I went to a basketball game with my father and noticed that I couldn’t see the score on the scoreboard, even though everyone else could. I didn’t know what an “epiphany” was (I was in second grade), but I realized that not everyone had the same perspective and that our vision shapes the world around us.
In that slide, I linked to a blog post of mine (other slides link to more blog posts from SEO by the Sea and other sites, and to some patents as well). The post describes a patent application that I looked at and wrote about back in 2004, and that gave me a different perspective on the importance of geography in local search. I tried several things described in that patent filing, and they made a difference. With that success, I started looking at many other patents.
In the presentation, I point towards the tremendous growth of Google’s patent portfolio over the past few years, from 187 granted patents assigned to Google at the USPTO in October of 2008, to 809 granted patents assigned to them in February of 2011, to 4,163 granted patents assigned to them on March 14th, 2012. Many of those patents were acquired from other companies such as IBM, Xerox, and Verizon, but many were also developed in-house.
While a good number of the new patents point to hardware including game controllers, desktop and network computer architecture, fiber optics networking, and so on, many also point to new approaches to search, including recently acquired patents from Xerox on scoring document quality.
New Approaches to Rankings
Representatives from Google have repeatedly told us that, over the past few years, the company has been averaging about 500 changes a year to its core search ranking algorithms.
One place where this has been increasingly visible is in Google’s ongoing redefinition of “relevance” and what it means to present relevant results to searchers. At one point, matching keywords in a query to pages on the Web that contained those keywords was what search engines specialized in. Google brought some advances to what earlier search engines such as AltaVista and Excite and Lycos were showing us by using link analysis methods like PageRank to try to show us the most important (or at least most popular) of those pages at the tops of results.
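To make the link analysis idea a little more concrete, here is a minimal sketch of a PageRank-style calculation on a tiny made-up link graph. It is the textbook power-iteration version, not a description of what Google runs in production; the `pagerank` function, the `toy_web` graph, and the damping value of 0.85 are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    All names and values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "hub", and "hub" links to "a".
toy_web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
    "hub": ["a"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, the page with the most links pointing to it ends up with the highest score, and the two pages with no incoming links end up at the bottom, which is the "most important (or at least most popular)" effect described above.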
In one of the slides within my presentation, titled “Expanding Relevance,” I show some of the results of a search for the term “Wilco.” Those include web page results (among them the home page for the band, and one for a business with that name), videos in case searchers want to listen to the band, a tour schedule with links to ticket sales, news results for the band, and links to albums by the band. Relevance has expanded beyond just finding and ranking webpages that include the query terms to providing different ways to meet the intent of searchers.
The Wall Street Journal articles hint at a time when Google might better understand different attributes associated with specific entities and show results that might be more relevant. The results for the band Wilco are a good example of what we might see with other types of entities in the future. Google recognized that Wilco is a named entity and that there are attributes associated with it that it might be good to show in search results, such as [wilco tickets], [wilco schedule], [wilco videos], and [wilco albums].
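One simple way to picture how attributes such as [wilco tickets] or [wilco albums] could be associated with an entity is to count what searchers type after the entity's name. The sketch below does only that, on a made-up list of queries. The `attribute_counts` function and the sample queries are hypothetical, and nothing here is taken from a Google patent or paper; a real system would presumably use much richer signals.

```python
from collections import Counter


def attribute_counts(queries, entity):
    """Count the terms that follow an entity name at the start of a query.

    Hypothetical helper for illustration: a query such as "wilco tickets"
    contributes one count to the attribute "tickets" for the entity "wilco".
    """
    entity_tokens = entity.lower().split()
    counts = Counter()
    for query in queries:
        tokens = query.lower().split()
        starts_with_entity = tokens[:len(entity_tokens)] == entity_tokens
        if starts_with_entity and len(tokens) > len(entity_tokens):
            attribute = " ".join(tokens[len(entity_tokens):])
            counts[attribute] += 1
    return counts


# Made-up query log for illustration only.
sample_queries = [
    "wilco tickets",
    "Wilco tickets",
    "wilco schedule",
    "wilco videos",
    "wilco albums",
    "wilco",
    "hawaii weather",
]
wilco_attributes = attribute_counts(sample_queries, "wilco")
print(wilco_attributes.most_common())
```

The bare query "wilco" and the unrelated "hawaii weather" contribute nothing, while the remaining queries yield the attributes tickets, schedule, videos, and albums for the entity.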
Expect that in the future, when our searches include other named entities, or specific people, places, and things, we might see a wider range of results that cover attributes associated with those entities. A search for Hawaii, for example, might include information about travel and tickets to Hawaii, weather, recent news, history, and other types of attributes that searchers might associate with searches related to Hawaii. What this might mean for someone creating pages on the Web about Hawaii is that it might be harder to rank well for general pages about Hawaii, and easier to show up in search results for different attributes people might be interested in related to Hawaii.
This type of information extraction, to understand specific entities and extract facts about them, is something people at Google have been pursuing for years, and has been part of what they’ve been focusing upon at least as long as they’ve been using systems like PageRank. An early paper from Sergey Brin from his days at Stanford shows an interest in this kind of approach that goes back more than a decade: Extracting Patterns and Relations from the World Wide Web (pdf).
The presentation covers many other approaches to search and rankings, including phrase-based indexing, concept-based indexing, using triples of data from large data sets involving users, queries, and sites to predict pages people might want to see, building a knowledge base of aspects about entities, and a planet-scale distributed data system that can include both global and regional results. The growth of systems like these can mean less reliance upon exactly matching keywords on documents and upon excessive links.
The presentation also looks at some of the patents and papers that might be behind Google’s increasing use of social signals in ranking pages in both social search and eventually web search itself. Again, the use of signals like these can mean that some of the signals that Google has relied upon in the past might not carry as much value in the future as they do now.
Over Optimization
Matt Cutts’ statement suggests that Google may come up with an “over-optimization” penalty in the future, to help sites that aren’t showing up as high in search results because other sites have excessive links pointed to them or repeat specific keywords more often. Looking at many of the patent filings and whitepapers from the company, you can get the sense that this is something Google has been aiming at for years.
For instance, one patent that I wrote about in October of last year described how Google might identify when site owners take over other sites and use them to create links to their sites, using pages from the acquired sites as doorway pages. That can include links that might be part of private blog networks, or from individual pages that aren’t part of such networks.
Google’s Phrase-Based indexing approach also includes a method that might help to identify web spam based upon a statistically unusually high amount of related co-occurring phrases appearing upon a page.
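As a rough illustration of what "statistically unusually high" could mean in practice, the sketch below flags pages whose count of related phrases sits far above the average for a collection, measured in standard deviations. This is a generic outlier check, not the method from the phrase-based indexing patents; the `flag_phrase_outliers` function, the threshold of 3.0, and the sample counts are all assumptions.

```python
from statistics import mean, pstdev


def flag_phrase_outliers(related_phrase_counts, z_threshold=3.0):
    """Return pages whose related-phrase count is unusually high.

    `related_phrase_counts` maps a page to the number of related,
    co-occurring phrases found on it. Both the function and the
    threshold are illustrative assumptions, not values from a patent.
    """
    counts = list(related_phrase_counts.values())
    average = mean(counts)
    spread = pstdev(counts)
    if spread == 0:
        # Every page has the same count, so nothing stands out.
        return []
    return sorted(
        page
        for page, count in related_phrase_counts.items()
        if (count - average) / spread > z_threshold
    )


# Made-up data: thirty ordinary pages with 8 to 12 related phrases each,
# plus one page carrying 200 of them.
page_counts = {f"page-{i}": 8 + (i % 5) for i in range(30)}
page_counts["spammy-page"] = 200

flagged = flag_phrase_outliers(page_counts)
print(flagged)
```

Only the page with 200 related phrases is flagged here. Note that a simple check like this needs a reasonably large collection to work, since a single extreme page also inflates the standard deviation it is measured against.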
Another Google patent that I wrote about a couple of years ago, Google’s Affiliated Page Link Patent, described how Google might limit the amount of PageRank that flowed from pages on one site to pages on another that appeared to be related in some manner, such as being under the same ownership or having some other close relationship.
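The idea of limiting how much value flows between related sites can be sketched as a weight applied to each link before any link analysis is run. In the sketch below, links between different hosts that share an owner count for only a fraction of a normal link. How affiliation is actually detected is not covered here, so the `owner_of` mapping, the `link_weight` function, the 0.1 weight, and the example hostnames are all hypothetical.

```python
from urllib.parse import urlparse


def link_weight(source_url, target_url, owner_of, affiliated_weight=0.1):
    """Return a weight for a link, reduced when the two sites look affiliated.

    `owner_of` maps a hostname to an owner label and stands in for whatever
    affiliation signals a search engine might use. Everything here is an
    assumption for illustration.
    """
    source_host = urlparse(source_url).hostname
    target_host = urlparse(target_url).hostname

    if source_host == target_host:
        # Internal links are outside the scope of this sketch: full weight.
        return 1.0

    source_owner = owner_of.get(source_host)
    if source_owner is not None and source_owner == owner_of.get(target_host):
        # Different sites, same owner: pass along only a fraction.
        return affiliated_weight

    return 1.0


# Hypothetical ownership data using reserved example domains.
owners = {
    "shop.example.com": "acme",
    "blog.example.net": "acme",
    "news.example.org": "independent",
}

print(link_weight("http://blog.example.net/post", "http://shop.example.com/", owners))
print(link_weight("http://news.example.org/story", "http://shop.example.com/", owners))
```

The first link, between two sites with the same owner, gets the reduced weight, while the second, from an unrelated site, keeps its full weight.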
An aim of good SEO is to improve the quality, relevance, and usability of pages for visitors so that the objectives of the owners of those pages are furthered, and people looking for what is offered on those pages are more likely to find those pages. Optimization, as a term, means to make something the best that you can, and in SEO usually aims at making a page the best that one can in terms of satisfying people using a query term that the page is about, to meet their informational or situational or transactional needs.
Some people promoting web pages attempt to use tactics like overstuffing a page with a particular keyword, or pointing as many links to it as possible that use that keyword in the anchor text, without necessarily attempting to make that particular page one that will satisfy visitors’ needs. If you listen to the audio from the Search Engine Land post that I linked to in the first paragraph of this post, those types of activities, aimed at making a page appear more relevant than it is for a specific term or phrase, appear to be what Matt Cutts is discussing when he talks about an “over-optimization” penalty.
So a penalty like this might do things like ignore the value of anchor text in blog comments or forum signatures pointing to pages, lessen the value of links between sites that are related in some manner, lessen the value of keywords or related terms that appear on the same page at a very high rate, or apply some other similar approaches.
That doesn’t mean that the value of thoughtfully created, high-quality pages that follow SEO best practices will be harmed. The goals of that type of SEO align with the goals of search engines in helping people find pages that help meet their needs.