Does Having Keywords in URLs Make a Difference to Rankings?
Is there any value in using keywords in URLs of web pages? Would a search engine look at keywords that you might include in the addresses of your pages and associate those keywords with the content of your pages in the search engine’s index?
If so, how would a search engine go about looking at the addresses of your pages and breaking them down into meaningful parts to identify keywords?
Breaking URLs down into parts may also play a role in how a search engine might crawl the pages of a website.
A newly published Yahoo patent application gives us some ideas on how a search engine might extract keywords from the URLs of pages, rank those keywords, and use information uncovered in the process to determine which pages of a website to crawl first.
Techniques for Tokenizing URLs
Invented by Krishna Leela Poola and Arun Ramanujapuram
Assigned to Yahoo
US Patent Application 20090083266
Published March 26, 2009
Filed November 6, 2007
A search engine will look at many different signals to determine what a page on the Web is about and attempt to rank pages based upon keywords that might state the subject matter or content of those pages.
Many of those keywords can be extracted from the content of the pages themselves. But a search engine can also look at other information associated with pages, such as the addresses of the pages.
Keywords may also be extracted from the URLs of pages by an algorithm that breaks a URL into components, works out the structure of the URL, and extracts candidate keywords from the different parts found within it. Keywords in URLs may not be a powerful ranking signal, but they may be something that can influence rankings.
Parts of URLs
The patent application defines different parts of URLs:
Scheme – This section of a URL identifies the internet protocol used to access a resource, such as HTTP or FTP.
Authority – The part of a URL that identifies the host server where the documents or resources are located, such as a domain name.
Path – This is the information following the slash character after the authority or domain name, and it identifies the specific page or resource.
Query arguments – A string that may appear in a path and that can be broken down into name and value pairs, such as “category=shirts.”
Fragments – A fragment identifies a subsection within a page that might be pointed to in a URL, usually starting with the “#” symbol.
An example of these five different components from the patent filing:
http://www.yahoo.com:80/shopping/search?kw=blaupunkt#desc
In this URL, the scheme is “http.”
The authority is “www.yahoo.com:80,” which shows the domain and also includes a port number of “80” in this instance.
The path is technically everything after that first single slash: “shopping/search?kw=blaupunkt#desc.”
A query argument shown in this example is “kw=blaupunkt.”
A fragment from this URL is “#desc.”
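If you would like to see how one of your own URLs splits into these parts, here is a minimal sketch using Python's standard urllib.parse module. It is not the method from the patent filing, and the helper name split_url is my own. Note one difference in convention: urllib.parse follows the common standard of reporting the path ("/shopping/search") separately from the query string and the fragment, rather than treating everything after the first slash as the path.

```python
from urllib.parse import urlsplit, parse_qsl


def split_url(url):
    """Split a URL into the five parts discussed above (illustrative helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,          # e.g. "http"
        "authority": parts.netloc,       # host plus optional port
        "host": parts.hostname,          # host without the port
        "port": parts.port,              # integer port, or None if absent
        "path": parts.path,              # path only, without query or fragment
        "query_arguments": parse_qsl(parts.query),  # list of (name, value) pairs
        "fragment": parts.fragment,      # text after "#", without the "#"
    }


components = split_url("http://www.yahoo.com:80/shopping/search?kw=blaupunkt#desc")
for name, value in components.items():
    print(f"{name}: {value}")
```

For the example URL, the authority comes back as "www.yahoo.com:80", the query arguments as a single pair ("kw", "blaupunkt"), and the fragment as "desc".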
Tokenizing URLs for Keywords and Web Crawling
The patent application describes how a search engine might break URLs down into parts, or components, to extract keywords from them. Those keywords could be used to categorize pages for web search and to understand what pages are about when providing advertisements.
This breaking down of URLs into components, and into even smaller parts, is referred to as “tokenizing URLs.” In addition to helping a search engine find keywords in URLs, it can have an impact on the indexing of the pages of a website:
The tokens generated by URL tokenization may also get assigned with features of the web document to improve the efficiency of a web search. Tokenizing URLs is also the first step when clustering URLs of a website. Clustering URLs allows the identification of portions of a web document that hold more relevance. Thus, when a search engine crawls a website, some web documents may be white-listed and crawled, while other portions may be black-listed and should not get crawled. This leads to more efficient web crawling.
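To make the idea of tokens and clusters a little more concrete, here is a rough sketch in Python. It is my own simplification, not the algorithm from the patent filing: it assumes tokens are simply the runs of letters and digits found in each component, and it assumes URLs can be clustered by host and first path segment. The function names tokenize_url and cluster_by_first_segment, and the example.com addresses, are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

# Assumption: anything that is not a letter or digit separates tokens.
SEPARATORS = re.compile(r"[^A-Za-z0-9]+")


def tokenize_url(url):
    """Return (component, token) pairs for a URL (simplified illustration)."""
    parts = urlsplit(url)
    tokens = []
    # Authority: split the host name on dots.
    for label in (parts.hostname or "").split("."):
        if label:
            tokens.append(("authority", label))
    # Path: split on slashes, then on hyphens, underscores, and so on.
    for segment in parts.path.split("/"):
        for word in SEPARATORS.split(segment):
            if word:
                tokens.append(("path", word.lower()))
    # Query arguments: keep names and values apart.
    for name, value in parse_qsl(parts.query):
        tokens.append(("query_name", name.lower()))
        for word in SEPARATORS.split(value):
            if word:
                tokens.append(("query_value", word.lower()))
    if parts.fragment:
        tokens.append(("fragment", parts.fragment.lower()))
    return tokens


def cluster_by_first_segment(urls):
    """Group URLs by (host, first path segment), a crude stand-in for clustering."""
    clusters = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s]
        key = (parts.hostname, segments[0] if segments else "")
        clusters[key].append(url)
    return dict(clusters)


tokens = tokenize_url("http://www.yahoo.com:80/shopping/search?kw=blaupunkt#desc")
clusters = cluster_by_first_segment([
    "http://www.example.com/shopping/search?kw=blaupunkt",
    "http://www.example.com/shopping/cart",
    "http://www.example.com/help/contact",
])
```

A crawler could then treat each cluster as a unit, for instance crawling everything under "shopping" first, though how clusters actually get white-listed or black-listed is something the sketch does not attempt to model.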
Conclusion
Yahoo provides a fair amount of detail in the patent filing on how URLs can be broken down into components, how keywords can be extracted from those components, and how those keywords can be given different rankings. If you’re interested in how the URLs of your site might get treated under this process, it’s worth spending some time with the patent filing itself to grasp the technical details. Keep in mind that the processes described in this patent application may not be the ones that Yahoo is using at this time.
A cautionary note: changing the URLs of your pages, especially if those URLs have been around for a while and have been indexed by search engines, is an undertaking that shouldn’t be started without careful consideration and a cautious approach that keeps the risk behind such a change to a minimum. (This is a change that I have made.) Such an approach can include using proper redirects (permanent 301 redirects) to the new URLs so that external links pointing to pages of the site still work, changing the URLs in internal links on the site itself to the new addresses, and other technical methods that might help a site retain its rankings in search engines. How a search engine reacts to changes to the URLs of a site can vary from one search engine to another, and traffic to the pages of a site may be negatively impacted for a period of time regardless of how carefully such a change is implemented.
ps. Nice introduction to keyword research here: How To Choose Keywords and Variations of Keyword Phrases – SEO Basics (Sorry – no longer available.)
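One small thing that can help when planning a change like this is checking your list of old and new addresses before any redirects go live. The sketch below is only an illustration in Python, and the paths and function names are made up. It assumes the redirects are kept as a simple mapping from old path to new path, flattens any chains so that every old address points straight at its final destination, and rewrites a list of internal link targets the same way.

```python
def resolve_redirect(url, redirects):
    """Follow a redirect mapping until reaching a URL that is not redirected."""
    seen = set()
    while url in redirects:
        if url in seen:
            # A loop would leave visitors and crawlers bouncing forever.
            raise ValueError(f"redirect loop involving {url!r}")
        seen.add(url)
        url = redirects[url]
    return url


def flatten_redirects(redirects):
    """Point every old URL directly at its final target (no chains)."""
    return {old: resolve_redirect(old, redirects) for old in redirects}


def rewrite_internal_links(links, redirects):
    """Replace internal link targets with their final destinations."""
    return [resolve_redirect(link, redirects) for link in links]


# Hypothetical paths: an older redirect that now chains into a newer one.
redirects = {
    "/old-shirts": "/shirts.php?id=7",
    "/shirts.php?id=7": "/shirts/blue-oxford",
}
flat = flatten_redirects(redirects)
links = rewrite_internal_links(["/old-shirts", "/about"], redirects)
```

Here both old addresses end up pointing directly at "/shirts/blue-oxford", and the internal link to "/old-shirts" is rewritten to the new address while "/about" is left alone. The actual 301 responses would still need to be configured on the web server itself.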