Is Ranking Search Results the Panda Patent?

Added – 3/4/2022 – I have written a follow-up to this post about this patent and the Google Panda Update, which I recently updated. I will edit this post some more too, but I wanted to make the updates from that post available here. It is essential to look at what Navneet Panda means by “Ranking Search Results” and at the ratio behind measuring site quality, which looks at the reference queries that pages on the site are optimized for and the implied links (not actual links) mentioned on other sites that may point towards the sites covered in this patent.

The other post is at: Google’s Panda Granted a Patent on Ranking Search Results

This post has a lot of comments, and some of the commenters do grasp that the point behind the patent is to understand one of the earliest approaches to determining site quality.

Historically, the “Panda” update’s focus has been viewed as penalizing written content and not links to the site. However, under the “Farmer approach,” some sites were seen as targeting an extensive range of very similar queries, such as “how to tie a knot” and “how to knot a tie.” Because of that, this patent focuses on which queries pages are being optimized for, in search engines or in the page content, and on how those pages are referred to in implied links. The ratio that the patent points at is what is used to decide whether a site is of low quality and whether higher-quality pages should outrank its pages, thus the “Ranking Search Results” name.

Search Quality vs. Web Spam

Many patent filings I’ve written about from Google address webspam. They also look at how search engines may follow approaches to keep search results from manipulation. An early example is the patent from 2003 titled Methods and systems for identifying manipulated articles.

Is this Google's Panda?

But many patents I’ve written about involve Google trying to improve the quality of search results for searchers.

One of Google’s patents (remember that PageRank was Stanford’s patent and not Google’s) described reranking results in response to a query, boosting some for that query if they are linked to by other top-ranking results for the same query. That patent, Ranking search results by reranking the results based on local inter-connectivity, was aimed at improving the quality of the top-ranking results.

Google’s Phrase-Based Indexing patents involve meaningful words and phrases that tend to co-occur in the search results for a specific query. The rankings of those pages could be boosted when those phrases appear, or be influenced by how much weight is passed along through anchor text using one of those related co-occurring terms. These are search-quality patents.

There are several phrase-based indexing patents. At least one of those also addresses web spam by checking to see if there is a statistically abnormal amount of co-occurring words from the results on a page. The phrase-based indexing approach included a way to detect web spam.

Focus On Quality

A patent granted to Navneet Panda and Vladimir Ofitserov, Ranking search results, aims at improving search results rather than penalizing sites or identifying attempts to manipulate search results.

The Ranking Search Results patent lists only one “advantage” to following the process it describes:

Search results identifying low-quality resources can be demoted in a presentation order of search results returned in response to a user’s query. Thus, the user experience can be improved because search results higher in the presentation order will better match the user’s informational needs.

Just before the Panda update, there had been many public criticisms of the quality of search results showing up in searches at Google.

Here are a few examples:

December 13, 2009 – Dishwashers, and How Google Eats Its Tail – Paul Kedrosky

Google has become a snake that too readily consumes its own keyword tail. Identify some words that show up in profitable searches – from appliances, to mesothelioma suits, to kayak lessons – churn out content cheaply and regularly, and you’re done. On the web, no one knows you’re a content-grinder.

December 13, 2009 – Content Farms: Why Media, Blogs & Google Should Get Worried – Richard MacManus

From my analysis of Demand Media and similar sites, such content is very generic and lacks depth. While I wouldn’t go as far as wikiHow founder Jack Herrick and say that it “lacks soul,” it certainly lacks passion and often also lacks knowledge of the topic at hand. Arrington’s analogy with fast food is apt – it is content produced quickly and made to order.

January 2, 2011 – On the increasing uselessness of Google….. – Alan Patrick

But this year it hit home how Google’s systems have gotten spammed, as typically anything on Page 1 of the search results was some form of SEO spam – most commonly a site that doesn’t sell you anything, points to other sites (often doing the same thing) while slipping you some Ads (no doubt sold as “relevant”). The other primary scam site type copies part of the relevant Wikipedia entry and throws lots of Ads at you.

January 3, 2011 – Trouble In the House of Google – Jeff Atwood

Like any sane person, I’m rooting for Google in this battle. I’d love nothing more than for Google to tweak a few algorithmic knobs and make this entire blog entry moot. Still, this is the first time since 2000 that I can recall Google search quality ever declining. It has inspired some somewhat heretical thoughts in me – are we seeing the first signs that algorithmic search has failed as a strategy? Is the next generation of inquiry destined to be less algorithmic and more social?

It’s a scary thing to even entertain, but maybe gravity has broken.

January 27th, 2011 – Google Search Quality Decline or Elitism? – AJ Kohn

Google could certainly do that. They could stand up and say that fast food content from Demand Media wouldn’t gain prime SERP real estate. Google could optimize for better instead of good enough. They could pick fine dining over fast food.

But is that what the ‘user’ wants?

Improving Quality

As you can see from those quotes, there was a sense that Google’s results were broken, with results more focused on matching queries than on returning quality.

These criticisms were heard, even at the Googleplex. In February of 2011, the Official Google Blog told us of an update in Finding more high-quality sites in search. This change affected a fair number of queries and was aimed at surfacing higher quality content:

But in the last day or so, we launched a pretty significant algorithmic improvement to our rankings, a change that impacts 11.8% of our queries, and we wanted to let people know what’s going on. This update is designed to reduce rankings for low-quality sites, which are low-value add for users, copy content from other websites, or are sites that are not very useful. At the same time, it will provide better rankings for high-quality sites with original content and information such as research, in-depth reports, thoughtful analysis, and so on.

After watching the Panda update, reading a lot of threads in forums and other places about sites impacted by Panda, and working on sites that were, the question is whether the patent from Navneet Panda describes the update and its attempts to improve the quality of search results.

Here’s a quick summary, from the patent, of the process it describes:

Determining, for each of a plurality of groups of resources, a count of independent incoming links to resources in the group
Deciding, for each of the plurality of groups of resources, a count of reference queries
Choosing, for each of the plurality of groups of resources, a respective group-specific modification factor, wherein the group-specific modification factor for each group is based on the count of independent links and the count of reference queries for the group
Associating, with each of the plurality of groups of resources, the respective group-specific modification factor for the group, wherein the respective group-specific modification for the group modifies initial scores generated for resources in the group in response to received search queries

So the patent has many parts which work together.

The first involves looking at the links pointing to the pages of a site and removing all the backlinks that look like they might be affiliated (under co-ownership or control) with the site, or reducing the number of independent links to account for things like sitewide links. This may give a sense of how many different pages and sites are linking to the pages of this site. More independent links from more sources could be seen as a sign of quality.

The second is an analysis of whether pages appear to be targeted at specific referring queries. While it’s not unusual for someone doing SEO to try to make every page on a site a potential landing page, many of the sites that we refer to as content farms often use every page to target commercial queries and many variations of those queries. So a content-farm type of site might include many pages that each attempt to be the target of many queries.

The independent link count and the referring query count, for the different groups that a site might be broken into, are looked at as a ratio: independent link count over referring query count. If there are a lot of independent links and few referring queries, this number could be over one. If there are few independent links and lots of referring queries, the number could be a fraction of one.
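To make the arithmetic concrete, here is a minimal Python sketch of that ratio. The function name and the handling of a zero query count are my own assumptions; nothing quoted in this post says how the patent treats that case.

```python
def link_to_query_ratio(independent_links: int, reference_queries: int) -> float:
    """Ratio of independent links to reference queries for one group."""
    if reference_queries == 0:
        # Assumption: with no reference queries, leave scores unchanged.
        return 1.0
    return independent_links / reference_queries


# Many independent links, few reference queries: ratio above one.
print(link_to_query_ratio(200, 50))   # 4.0

# Few independent links, many reference queries: a fraction of one.
print(link_to_query_ratio(10, 400))   # 0.025
```

A group with a ratio above one would be a candidate for a boost, and a group with a small fraction would be a candidate for demotion.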

This number, based upon the links and the queries, would then be multiplied by a score modified by whether each page is seen as a navigational type of result for a query term or phrase. The more it is like a navigational term or phrase, the higher the score. The final score could boost ranking scores for some results and diminish scores for others.
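The effect of a group-level factor on a result list can be sketched roughly as below. This is only an illustration under my own assumptions: the `Result` type, the `rerank` function, the sample URLs, and the factor values are all hypothetical, and the navigational adjustment is left out because the post does not give a formula for it.

```python
from typing import NamedTuple


class Result(NamedTuple):
    url: str
    group: str            # the group of resources this result belongs to
    initial_score: float  # score before any group-level modification


def rerank(results, group_factors):
    """Multiply each initial score by its group's factor, then sort descending.

    Groups without a known factor keep their initial score (an assumption).
    """
    adjusted = [
        (r.url, r.initial_score * group_factors.get(r.group, 1.0))
        for r in results
    ]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


results = [
    Result("http://farm.example/how-to-tie-a-knot", "farm.example", 0.90),
    Result("http://guide.example/knots", "guide.example", 0.80),
]
# Hypothetical factors: the content-farm group is demoted, the other boosted.
factors = {"farm.example": 0.5, "guide.example": 1.2}

for url, score in rerank(results, factors):
    print(url, round(score, 2))
```

In this toy example the page that started with the higher initial score ends up second, because its group carries a factor below one.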

Groups Rather Than Pages

Instead of targeting specific pages or sites as many ranking algorithms do, the patent tells us that it looks at “groups” of resources. A group might be defined in several ways, and a resource can only be included in one group.

A group might be address-based so that all the resources within the group are in the same domain name, such as http://www.example.com, or all in the same hostname on a domain, such as http://host1.example.com or http://host2.example.com.
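As a rough illustration of address-based grouping, the sketch below groups URLs either by hostname or by domain using Python’s standard `urllib.parse`. The function names are mine, and the “last two labels” rule for finding a domain is a simplifying assumption that would not hold for suffixes such as `.co.uk`.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_hostname(urls):
    """Put each URL into exactly one group keyed by its hostname."""
    groups = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname or ""
        groups[host].append(url)
    return dict(groups)


def group_by_domain(urls):
    """Put each URL into exactly one group keyed by a naive domain name.

    Assumption: the domain is the last two dot-separated labels of the host.
    """
    groups = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname or ""
        domain = ".".join(host.split(".")[-2:])
        groups[domain].append(url)
    return dict(groups)


urls = [
    "http://www.example.com/",
    "http://host1.example.com/a",
    "http://host2.example.com/b",
]
print(sorted(group_by_hostname(urls)))  # three hostname groups
print(sorted(group_by_domain(urls)))    # one domain group
```

Either way, each resource lands in only one group, which matches the constraint described above.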

Groups of resources might be partitioned by a count of reference queries for each of the groups “so that each partition includes groups of resources whose counts of reference queries are within a respective range of counts of reference queries.”
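Partitioning by ranges of reference query counts could look something like the following sketch. The boundary values, group names, and function name are hypothetical; the patent language quoted above only says that each partition covers a range of counts.

```python
from bisect import bisect_right


def partition_groups(query_counts, boundaries):
    """Assign each group to a partition based on its reference query count.

    query_counts: dict mapping group name -> count of reference queries.
    boundaries:   sorted list of range edges. With [100, 1000], partition 0
                  holds counts below 100, partition 1 holds 100 to 999, and
                  partition 2 holds 1000 and above.
    """
    partitions = {}
    for group, count in query_counts.items():
        index = bisect_right(boundaries, count)
        partitions.setdefault(index, []).append(group)
    return partitions


counts = {"a.example": 50, "b.example": 500, "c.example": 5000, "d.example": 100}
print(partition_groups(counts, [100, 1000]))
```

Groups with similar numbers of reference queries end up in the same partition, so they can be compared with each other rather than with much larger or smaller groups.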

Under this approach, one website might be broken into more than one group or might be part of a group that contains more than one website. To rank the pages within these groups, the ratio of independent links to reference queries might be multiplied by a score involving navigational signals to determine a final rank.

Independent Links in Ranking Search Results

If the purpose behind this patent is to rank pages higher that are higher quality, one way to do that is to look at the number of independent links to those pages or groups of pages.

For each group of resources, the patent might count the number of links to those groups, but not all the links. And not just express links, the links that you can click on to get to another page. It might also count implied links, which sound more like what we often refer to as citations. An express link can be used to navigate to a page, whereas an implied link can’t be clicked on to bring a person to the target of that link.

Why doesn’t the patent mention PageRank? PageRank is a quality signal, but not every ranking from Google includes PageRank. This reliance on independent links eliminates the benefit of having a site with lots and lots of pages that are linked to from the same site, from sites under the same ownership or control, or sitewide from other sites.

An independent link is one where the source and the target are determined to be independent of each other. The source group that a link is in and the target group can be checked to see if they are separate.

Determining that links from one group to another are not independent can also involve determining that those resources are likely to be related, such as owned, hosted, or created by the same entity.

If the resources have similar or identical content, images, formatting, or so on, their similarity is another sign that the resources are not independent.

There may be many links from one resource to a targeted group, and only one link might get counted as an independent link. Though it’s not said in the patent, this could keep sitewide links from getting counted more than once.
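The counting rules described above can be sketched as follows. The ownership lookup is a stand-in I made up for whatever relatedness signals (ownership, hosting, similar content) a real system might use, and the function and group names are hypothetical.

```python
def count_independent_links(links, target_group, owner_of):
    """Count independent links pointing at one target group.

    links:    iterable of (source_group, target_group) pairs.
    owner_of: dict mapping group -> owner name (hypothetical relatedness signal).
    """
    independent_sources = set()
    target_owner = owner_of.get(target_group)
    for source, target in links:
        if target != target_group:
            continue  # link points somewhere else
        if source == target_group:
            continue  # internal link within the same group
        source_owner = owner_of.get(source)
        if source_owner is not None and source_owner == target_owner:
            continue  # source and target appear to share an owner
        # A set means many links from one source group count only once,
        # which would also collapse sitewide links.
        independent_sources.add(source)
    return len(independent_sources)


links = [
    ("blog.example", "shop.example"),
    ("blog.example", "shop.example"),    # repeated (e.g. sitewide) link
    ("shop.example", "shop.example"),    # internal link
    ("sister.example", "shop.example"),  # same owner as the target
    ("news.example", "shop.example"),
]
owners = {"shop.example": "acme", "sister.example": "acme"}
print(count_independent_links(links, "shop.example", owners))  # 2
```

Only the links from the two unrelated source groups are counted, and the repeated link from the first of them counts once.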

Reference Queries

Besides analyzing the links pointing to the different groups, this process looks at the site’s pages and the queries that each might target. How well do those pages satisfy those queries?

If a query includes the term “example.com,” it might refer to the home page. If it contains words that searchers use to refer to the pages of a site, it might be said to be a referring query that refers to those pages. The patent provides another example by telling us that:

…if the terms “example sf” and “esf” are often used by searchers to refer to the resource whose URL is “http://www.sf.example.com,” queries that contain the terms “example sf” or “esf,” e.g., the queries “example sf news” and “esf restaurant reviews,” can be counted as reference queries for the group that includes the resource whose URL is “http://www.sf.example.com.”
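Using the patent’s own example terms, a simple sketch of counting reference queries might look like this. The whole-word matching rule and the function name are my assumptions; the patent excerpt only says that queries containing the terms can be counted.

```python
def count_reference_queries(queries, reference_terms):
    """Count queries that contain at least one reference term as whole words."""
    count = 0
    for query in queries:
        # Pad with spaces so "esf" does not match inside a longer word.
        padded_query = f" {query.lower()} "
        if any(f" {term.lower()} " in padded_query for term in reference_terms):
            count += 1
    return count


terms = ["example sf", "esf"]
queries = [
    "example sf news",
    "esf restaurant reviews",
    "sf weather",       # no reference term
    "esfahan hotels",   # contains "esf" only inside another word
]
print(count_reference_queries(queries, terms))  # 2
```

The resulting count is the denominator of the ratio of independent links to reference queries for the group.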

Navigational Queries

In the post, How Google May Identify Navigational Queries and Resources, I wrote about how Google used a document classification approach to determine whether a page was one that searchers entered a query for, expecting to find a specific page, such as the official homepage of the product or service included within the query.

To a degree, this kind of inquiry isn’t too different from the set of questions raised in Amit Singhal’s Official Google Blog post, More guidance on building high-quality sites. Such questions could be worked into an analysis of a site at a stage like this one, though the patent doesn’t refer to them specifically.

The Ranking Search Results patent is:

Ranking search results
Invented by Navneet Panda and Vladimir Ofitserov
Assigned to Google
United States Patent 8,682,892
Granted: March 25, 2014
Filed: September 28, 2012

Abstract

Methods, systems, and apparatus for ranking search results, including computer programs encoded on computer storage media.

The methods include:

Determining, for each of a plurality of groups of resources, a respective count of independent incoming links to resources in the group

Deciding, for each of the plurality of groups of resources, a respective count of reference queries

Selecting, for each of the plurality of groups of resources, a respective group-specific modification factor, wherein the group-specific modification factor for each group is based on the count of independent links and the count of reference queries for the group

Associating, with each of the plurality of groups of resources, the respective group-specific modification factor for the group, wherein the respective group-specific modification for the group modifies initial scores generated for resources in the group in response to received search queries

Ranking Search Results Observations

The chances are that Google tweaked and changed the Panda algorithm in the weeks and months after it was first applied, and it may have made many changes to the process described in the patent after an initial beta period.

I’ve seen several denials from people about this particular patent describing the Panda update since I wrote about finding it last week in Google’s Panda Granted a Patent on Ranking Search Results. These denials were based upon the existence of a link analysis described within the patent, without looking more at the actual process involved, and claimed that the patent more likely detailed the Penguin approach than the Panda approach.

But the link analysis here involving independent links and referring queries is more of an attempt to gauge the quality of a site than the backlink profile of that site. The “navigational” query analysis that could involve issues such as the example 23 questions that Amit Singhal provided us with also attempts to understand the quality of pages.

I changed the title of this post to stress that it is the Panda patent. I’m open to the possibility that the Panda Updates followed a somewhat different course as they were implemented and tested.

There have been many patents co-authored by Navneet Panda. He hasn’t been prolific, but he has created some interesting inventions:

3/25/2014 – Google’s Panda Granted a Patent on Ranking Search Results
4/14/2015 – Early Panda and Concept Templates
5/12/2015 – How Google May Calculate a Site Quality Score (from Navneet Panda)
6/27/2017 – A Panda Patent on Website and Category Visit Duration
6/28/2017 – Click a Panda: High Quality Search Results based on Repeat Clicks and Visit Duration

Last Updated March 4, 2022.