These two patents, granted to Google today, look like they describe some interesting approaches to ranking pages in search results using large amounts of data about pages, queries, and user interactions.

Ranking documents based on large data sets
Invented by Jeremy Bem, Georges R. Harik, Joshua L. Levenberg, Noam Shazeer, and Simon Tong
Assigned to Google
US Patent 7,231,399
Granted June 12, 2007
Filed: November 14, 2003

Abstract

A system ranks documents based, at least in part, on a ranking model. The ranking model may be generated to predict the likelihood that a document will be selected. The system may receive a search query and identify documents relating to the search query. The system may then rank the documents based, at least in part, on the ranking model and form search results for the search query from the ranked documents.
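To make the idea in that abstract a little more concrete, here is a minimal sketch of ranking candidate documents by a predicted likelihood of selection. It is not the method from the patent: the logistic form, the feature names (`query_term_in_title`, `prior_click_rate`), the weights, and the bias are all assumptions made up for illustration, and a real model would presumably learn its parameters from very large logs of queries and clicks.

```python
import math

# Hypothetical feature weights and bias, invented for this sketch.
# A real ranking model would learn these from logged user selections.
WEIGHTS = {"query_term_in_title": 1.5, "prior_click_rate": 3.0}
BIAS = -2.0


def predict_selection(features):
    """Logistic estimate of the probability that a user selects the document."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def rank_documents(candidates):
    """Sort (doc_id, features) pairs by predicted selection probability, highest first."""
    return sorted(candidates, key=lambda c: predict_selection(c[1]), reverse=True)


# Documents already identified as relating to the query (made-up values).
candidates = [
    ("doc_a", {"query_term_in_title": 0.0, "prior_click_rate": 0.10}),
    ("doc_b", {"query_term_in_title": 1.0, "prior_click_rate": 0.40}),
    ("doc_c", {"query_term_in_title": 1.0, "prior_click_rate": 0.05}),
]

ranked_ids = [doc_id for doc_id, _ in rank_documents(candidates)]
print(ranked_ids)
```

The sketch keeps the two steps from the abstract separate: candidate documents are identified first, and the model is only used to order them when forming the search results.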

Method and apparatus for learning a probabilistic generative model for text
Invented by Georges Harik and Noam Shazeer
Assigned to Google
US Patent 7,231,393
Granted June 12, 2007
Filed: February 26, 2004

Abstract

One embodiment of the present invention provides a system that learns a generative model for textual documents. During operation, the system receives a current model, which contains terminal nodes representing random variables for words and cluster nodes representing clusters of conceptually related words. Within the current model, nodes are coupled together by weighted links, so that if a cluster node in the probabilistic model fires, a weighted link from the cluster node to another node causes the other node to fire with a probability proportionate to the link weight. The system also receives a set of training documents, wherein each training document contains a set of words. Next, the system applies the set of training documents to the current model to produce a new model.
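The firing behavior described in that abstract can be sketched roughly as follows. This is only an illustration of the generative side, under some loud assumptions: the node names, the link weights, and the choice to treat each link weight directly as a firing probability are all hypothetical, and the sketch says nothing about how the system applies training documents to produce a new model.

```python
import random

# Hypothetical model, invented for this sketch. Each cluster node maps to
# (target node, link weight) pairs. Targets may be terminal word nodes or
# other cluster nodes.
LINKS = {
    "cluster:pets": [("dog", 0.9), ("cat", 0.8), ("cluster:pet_care", 0.5)],
    "cluster:pet_care": [("vet", 0.7), ("leash", 0.4)],
}


def fire(node, links, rng, fired=None):
    """Fire `node`, then fire each linked node with probability equal to its link weight.

    Assumption: the weight is used directly as a probability in [0, 1].
    """
    if fired is None:
        fired = set()
    if node in fired:
        return fired  # each node fires at most once
    fired.add(node)
    for target, weight in links.get(node, []):
        if rng.random() < weight:
            fire(target, links, rng, fired)
    return fired


def generate_words(root, links, rng):
    """Return the terminal (word) nodes that fired when starting from `root`."""
    return {n for n in fire(root, links, rng) if not n.startswith("cluster:")}


rng = random.Random(0)  # seeded so repeated runs give the same sample
words = generate_words("cluster:pets", LINKS, rng)
print(sorted(words))
```

Running `generate_words` many times from the same cluster would tend to produce sets of words that co-occur, which is one way to picture what "clusters of conceptually related words" means here.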