One of the fastest growing and most controversial sites on the Web is the Wikipedia. As of a few moments ago, the front page of the site indicated that it had 1,416,076 articles in English alone. There are many more articles in other languages. The Wikipedia:Multilingual coordination page indicates that there are presently “somewhat active” Wiki encyclopedias in 114 languages.
Can studying the Wikipedia tell us things about the growth and structure of the World Wide Web?
A paper to appear in the Proceedings of the Web Intelligence Conference in Hong Kong this coming December takes a look at the structure of links between pages of the Wikipedia, which it calls a wikigraph.
The paper is Temporal Analysis of the Wikigraph (pdf), from the Dipartimento di Informatica e Sistemistica at Università di Roma “La Sapienza”.
What makes this interesting is that there are timestamps associated with changes to the Wikipedia. As the authors note:
The Wikigraph differs from other Web graphs studied in the literature by the fact that there are timestamps associated with each node. The timestamps indicate the creation and update dates of each page, and this allows us to do a detailed analysis of the Wikipedia evolution over time.
Can a study of updates and changes to the Wikipedia, as reflected in these timestamps, provide some insights into other “webgraphs” where such time information isn’t usually available? That’s one of the questions the paper explores.
In addition to providing some insight into the structure of the web, the paper provides some tidbits about the Wikipedia itself. For instance:
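To make the idea of a timestamped link graph a little more concrete, here is a minimal sketch in Python. It is not the authors' method or data format. The toy pages, the `snapshot` and `average_outlinks` helpers, and the layout of the revision history are all assumptions made up for illustration. It simply shows how, once every page revision carries a date, you can rebuild the graph as it looked on any given day and measure something like the average number of out-links.

```python
from datetime import date

# Hypothetical toy revision history (not real Wikipedia data):
# page title -> list of (revision date, out-links in that revision), oldest first.
revisions = {
    "Rome": [
        (date(2002, 1, 10), ["Italy"]),
        (date(2004, 6, 1), ["Italy", "Tiber"]),
    ],
    "Italy": [
        (date(2001, 11, 5), []),
        (date(2002, 1, 10), ["Rome"]),
    ],
    "Tiber": [
        (date(2004, 5, 20), ["Rome"]),
    ],
}


def snapshot(revisions, as_of):
    """Rebuild the link graph as it stood on `as_of`.

    For each page, keep the out-links of its latest revision dated on or
    before `as_of`. Pages with no revision by that date are left out.
    Links to pages that do not exist yet are kept as they are.
    """
    graph = {}
    for title, history in revisions.items():
        current = None
        for rev_date, outlinks in history:
            if rev_date <= as_of:
                current = outlinks  # later revisions overwrite earlier ones
        if current is not None:
            graph[title] = list(current)
    return graph


def average_outlinks(graph):
    """Mean number of out-links per page, or 0.0 for an empty graph."""
    if not graph:
        return 0.0
    return sum(len(links) for links in graph.values()) / len(graph)


early = snapshot(revisions, date(2003, 1, 1))
late = snapshot(revisions, date(2005, 1, 1))
print(average_outlinks(early), average_outlinks(late))
```

Comparing snapshots taken at different dates is the kind of measurement that is hard to make on an ordinary web crawl, where creation and update dates for each page are usually missing.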
Only about 7.5% of the articles on the Wikipedia have a single editor.
About 50% have more than 7 people involved.
Around 5% have had more than 50 editors.
The average number of updates per user has dropped by about 30% in the last two years.
The average number of out-links per article has grown from 7 to 16 over the past two and a half years.
There are also some interesting statistics about vandalism and the amount of time it takes to address acts of vandalism. For instance, when someone vandalizes a page by a mass deletion of content, there’s often a correction made within three minutes of that act – really an incredible figure.
Great paper.