The online encyclopedia Wikipedia is an impressive example of a global collective intelligence at work. Since its inception in January 2001, Wikipedia has grown to encompass 6.40 million articles in 250 languages generated from 236 million edits by 5.77 million contributors, as of this writing. Its growth has been exponential in key metrics such as number of editors and number of articles (Voss, 2005).

A number of methods for the automatic assessment of the quality of Wikipedia's articles have also been proposed. Lih (2004) suggested the number of edits and the number of unique editors to an article as metrics for quality, but provided no justification. Other characteristics, such as factual accuracy (Giles, 2005; Encyclopaedia Britannica, 2006; Nature, 2006), credibility (Chesney, 2006), revert times (Viégas et al., 2004), and formality of language (Emigh and Herring, 2005), have been used to assess small samples of Wikipedia's articles and, in some cases, to compare them with articles from traditional encyclopedias. It is doubtful that encyclopedia quality can be assessed using a single metric (e.g., Crawford, 2001), while complex combinations of metrics (Stvilia et al., 2005) depend on rather arbitrary parameter choices.

In this paper we first show that Wikipedia articles accrete edits according to a simple stochastic mechanism resulting in a population of disproportionally highly edited articles. We then demonstrate a strong correlation between number of edits and article quality. Topics of particular interest or relevance are thus naturally brought to the forefront of quality. This is significant because Wikipedia is frequently used as a source of information, and because other large collaborative efforts such as software development (Brooks, 1975), industrial design (Allen, 1966) and cooperative problem solving (Clearwater et al., 1991) are known to produce ambiguous results as the size of the project increases.

Article growth

While individual users exhibit highly variable editing activity, the overall pattern by which articles accrete edits is well described by the following simple stochastic mechanism. Consider the number of new edits Δn(t) made to an article between time t and time t + dt, an interval of perhaps several hours. Complicated fluctuations in human behavior and activity cause this number to vary randomly, but we claim that Δn(t) is on average proportional to the total number of previous edits. This is expressed mathematically as

Δn(t) = [a + ξ(t)] n(t) dt,   (1)

where n(t) is the total number of edits to a given article up until time t, a is a constant (average) rate of edit accretion, and ξ(t) is a mean-zero random term accounting for fluctuations. The total number of edits at time t + dt is thus given by

n(t + dt) = n(t) + Δn(t) = [1 + a dt + ξ(t) dt] n(t).   (2)

Because of the random nature of human activity embodied by ξ(t), the number of edits to a given article at a given time can be predicted only within a range of values specified by a probability distribution. Previous work on similar processes, such as World Wide Web traffic (Huberman and Adamic, 1999) and many others (e.g., Ross, 1996), has shown that the distribution resulting from equation (1) is lognormal and given by

P(n, t) = (1 / (n √(2π σ² t))) exp( −(ln n − at)² / (2σ²t) ),   (3)

where σ² is the variance of ξ(t). This equation shows that the distribution parameters μ = at and σt² = σ²t are linearly related to the age t of the article. μ and σt² represent the mean and variance, respectively, of the logarithm of the number of edits.
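The multiplicative accretion mechanism above can be checked numerically. The sketch below, which discretizes equation (1) with dt = 1 and uses illustrative parameter values (not fitted to Wikipedia data), simulates many articles and verifies that the logarithm of the resulting edit counts has mean close to at and variance close to σ²t, as the lognormal form predicts.

```python
import math
import random

def simulate_edit_counts(num_articles=5000, steps=500, a=0.002, sigma=0.02, seed=42):
    """Multiplicative edit accretion: each step, n <- n * (1 + a + xi),
    with xi drawn from N(0, sigma^2).

    This is a dt = 1 discretization of equation (1); the parameter
    values are illustrative, not estimates from Wikipedia data.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(num_articles):
        n = 1.0  # each simulated article starts from a single edit
        for _ in range(steps):
            n *= 1.0 + a + rng.gauss(0.0, sigma)
        counts.append(n)
    return counts

counts = simulate_edit_counts()
logs = [math.log(n) for n in counts]
mean_log = sum(logs) / len(logs)
var_log = sum((x - mean_log) ** 2 for x in logs) / len(logs)

# For a lognormal population, ln(n) is approximately normal with
# mean ~ a*t (up to a small drift correction of order sigma^2/2 per
# step) and variance ~ sigma^2 * t, where t = steps here.
print(f"mean of log counts: {mean_log:.3f}  (a*t = {0.002 * 500:.3f})")
print(f"variance of log counts: {var_log:.3f}  (sigma^2*t = {0.02**2 * 500:.3f})")
```

Because the growth is multiplicative, a small early advantage compounds, which is what produces the heavy-tailed population of disproportionally highly edited articles described in the text.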