- "The Wisdom of Polarized Crowds"
- Reviewed by FULBERT
While politics in the United States appears to be increasingly polarized around extremes in political discourse, it was unclear how this affected the open, collective production of knowledge that is Wikipedia.
The researchers used a data dump of English Wikipedia from December 1, 2016, including all edits made since the site's start within the domains of politics, social issues, and science. They used the "American liberalism" and "American conservatism" categories and their sub-categories to delimit political articles, and followed the social issues and science category trees down to four levels below their roots. The researchers also reached out to the Wikipedia community, Wikimedia staff, and people who contacted them through the project page they created on Meta-Wiki, receiving 118 survey responses overall. They then analyzed user edits to determine each editor's political alignment based on their contributions to conservative or liberal articles.
The researchers found that "articles attracting more attention tend to have more balanced engagement from editors along the conservative-liberal spectrum" (p. 4). They then measured article quality using ORES, a tool developed by Wikimedia research staff, and found that higher political polarization among an article's editors was associated with higher article quality. These analyses served the study's goal of exploring how diversity of political alignment relates to article quality and bias. Their statistical analysis indicated that the quality of Wikipedia articles improves when editors on both sides of politically polarized issues work together toward consensus. Although the research focused on politically related topics, it points to the need both for political diversity and for motivated contributors.
(Cf. related earlier coverage: "Being Wikipedian is more important than the political affiliation", "Cross-language study of conflict on Wikipedia")
The study of controversy
- "Computing controversy: Formal model and algorithms for detecting controversy on Wikipedia and in search queries"
- Reviewed by Barbara Page and Tilman Bayer
This paper presents a "method for automatic detection of controversial articles and categories in Wikipedia", based on three data sources:
- Ratings submitted by readers via the Article Feedback Tool (AFT) in 2011 and 2012
- The list at Wikipedia:List of controversial issues (manually maintained by Wikipedia editors)
- A sample of 512 sections drawn randomly from the talk pages of articles on that list ("Surprisingly, only 19.5% of the sections turned out to be controversial").
The researchers argue that applying a mathematical model to Wikipedia talk page controversies could allow a 'controversy' metric to be incorporated into web searches. This would give people searching for information a quick way to recognize whether a topic is controversial. Wikipedia gives researchers access to an accessible, historical record of controversial discussions. The authors further describe their work: "[Assessing] the controversy should offer [readers] a chance to see the 'wider picture' rather than letting [them] obtain one-sided views." The authors conclude: "Our approach can be also applied in Wikipedia or other knowledge bases for supporting the detection of controversy and content maintenance. Finally, we believe that our results could be useful for...understanding the complex nature of controversy..."
Students edit but still doubt the value of Wikipedia
- "Wikipedia in higher education: Changes in perceived value through content contribution"
- Reviewed by Barbara Page
Students are a convenient group to study, especially if being studied is part of the syllabus. The 240 students in this study readily admitted to using Wikipedia as a resource even though they did not consider it to be 'reliable and trustworthy'. Using Wikipedia as a resource does not necessarily encourage students to contribute content. Moreover, when the students in this study actually added content, their perceptions of Wikipedia's reliability and usefulness did not change.
(For coverage of various other papers studying the use and perception of Wikipedia by students, see also our 2017 special issue on Wikipedia in Education)
Researching the research using Wikipedia as a corpus
- "Excavating the mother lode of human-generated text: A systematic review of research that uses the Wikipedia corpus"
- Reviewed by Barbara Page
The amount of research that uses Wikipedia as a source of data continues to grow, and enough of it now exists to support systematic reviews. Computer science in particular has been quick to see the potential of this 'mother lode' for studying information retrieval, natural language processing, and ontology building. The article's reference section makes interesting reading in itself, if only as an overview of the data sets and other research that exist and continue to expand.
(See also our earlier coverage of literature reviews, some involving the same authors: "A systematic review of the Wikipedia literature", "'Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership'", "Literature reviews of Wikipedia's inputs, processes, and outputs")
Sneaky editing and masking bias
- "Persistent Bias on Wikipedia: Methods and Responses"
- Reviewed by Barbara Page
Apparently, Wikipedia editors are not the only ones who have observed biased editing. The author of this research article (already mentioned in a previous issue) used the Wikipedia article about himself as a case study of biased editing. It is no surprise that an editor can 'nominally' follow editing guidelines while maintaining their bias. Here is the 'how to' of such behavior:
- deleting positive material
- adding negative material
- using a one-sided selection of sources
- exaggerating the significance of references and topics
Biased editors sometimes sustain their edits even in 'the face of resistance'. They do this by:
- reverting edits
- selectively invoking Wikipedia rules
- overruling (bullying?) resistant editors
Strategies for challenging such bias include making complaints, 'mobilizing counterediting', and exposing the bias. The author's stinging conclusion speaks for itself: "It is worthwhile becoming aware of persistent bias and developing ways to counter it in order for Wikipedia to move closer to its goal of providing accurate and balanced information."
- "Information Fortification: An Online Citation Behavior"
- Reviewed by Barbara Page
This study is a rebuttal to a 2005 position paper by Forte (one of the authors) and Bruckman, which had drawn "on Latour’s sociology of science and citation to explain citation in Wikipedia with a focus on credibility seeking". The study links citing sources to other issues of bias and identifies patterns in how sources are cited to encourage, and even fabricate, controversy. It was limited to non-scientific topics and used data derived from edit logs, interviews, and text analysis. "[I]nformation fortification [is] a concept that explains online citation activity that arises from both naturally occurring and manufactured forms of controversy."
Anti-vandalism on Wikidata
- "Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017"
- Reviewed by Barbara Page
Vandalism on Wikidata can significantly disrupt the use of its data, leading to flawed analyses. Collaborative efforts continue to address these concerns and have included some friendly 'competitions'. Current strategies for 'fighting' vandalism include manual review, community feedback, and analysis of reverting patterns. Other vandalism-fighting tools are in development. Of particular interest is the discussion of efforts to use "psychologically motivated features capturing a user’s personality and state of mind..."
Wikipedia's one-way relationships with Reddit and Stack Overflow
- "Examining Wikipedia With a Broader Lens: Quantifying the Value of Wikipedia's Relationships with Other Large-Scale Online Communities"
- Reviewed by Steve Jankowski
There is a growing body of literature that examines Wikipedia's role in creating value for other websites as part of a media ecosystem. Adding to these studies is the work of Vincent, Johnson & Hecht, who examined the value created in both directions between Wikipedia and two communities, Reddit and Stack Overflow. Conceptually, the authors distinguished between two sets of metrics to define this value. For Reddit and Stack Overflow, they understood value as a function of user engagement (score/votes, comments, page views), contextualized by potential revenue. For Wikipedia, value is likewise seen as user engagement, characterized by edit count, editors gained, editors retained, and article page views, but it is not contextualized by revenue (p.4).
Based on this operationalization of value, the authors used associative and causal analyses to assess the amount of content and links created. They found that Wikipedia provided substantial value to Stack Overflow and Reddit. Most clearly, they illustrated this by showing that posts containing Wikipedia links gained engagement estimated to be worth $100K per year (p.2). However, the engagement did not flow in the other direction: the authors found "negligible increases" (p.2) in the number of edits and editor signups. Based on these results, the authors observed that the relationship between Wikipedia and the two communities was "one-way", with Wikipedia providing more value than it received in return.
This new direction in studying Wikipedia merits a few comments. The first is the obvious care the authors displayed in their methods. For example, they adjusted their analyses to account for the skew introduced by current events, and reported inter-rater agreement for the qualitative analysis this adjustment required. Second, using revenue as a metric for value created "between communities" is a conceptual mismatch, since the communities themselves do not receive any profit. Future research in this area might distinguish more finely between types of relationships: community-to-community, owner-to-owner, and community-to-owner.
Despite this terminological slippage, this research adds specific details to van Dijck's analysis of the social media ecosystem, in which she described the relationship between Google and Wikipedia within a for-profit context. Likewise, the article further supports the conclusions of an earlier study by McMahon, Johnson & Hecht. That paper showed that Google's Knowledge Graph results, which reuse Wikipedia content, sent less click-through traffic to Wikipedia when the link to Wikipedia was removed. As the authors of both papers agree, viewing Wikipedia as part of an ecosystem is important for understanding and assessing how external relationships can be adapted to support Wikipedia's sustainability.
A 2015 study confined to the subreddit /r/todayilearned (TIL) found "strong statistical evidence suggesting Reddit threads affect Wikipedia viewership levels in a non-trivial manner", but did not examine effects on editor activity.
Articles receiving the most attention (by editors) overall lack the depth of quality found in featured articles
- "Knowledge categorization affects popularity and quality of Wikipedia articles"
- Reviewed by FULBERT
This empirical research paper explored how knowledge categorization – common in classification systems within the information sciences – works as a scientific and social process in how editors attend to Wikipedia articles. Categorization nests information under major topics, and the further down a hierarchy an article sits, the less editing attention it appears to garner. Articles higher in the hierarchy are referred to as coarse-grained, and while these receive the most attention, their quality has not been the focus of previous studies.
The researchers analyzed a database dump of the English-language Wikipedia from October 20, 2016, considering all articles that were members of at least one category (n=5,006,601). They defined an article's granularity as the length of the shortest path from the root (main) category, which averaged 7.59 across all articles. They then compared granularity to the number of article edits (related to the perception of higher article quality), article importance (as rated individually by WikiProjects), quality (measured by featured-article status), and return on effort (the quality of an article relative to the amount of work editors put into it). They ran both parametric and non-parametric statistical analyses on numerous variables derived from the article records in the dump.
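The granularity measure described above amounts to a shortest-path computation over the category graph. Below is a minimal sketch, assuming the category tree is available as a plain dictionary mapping each category to its subcategories (a hypothetical input format, not the authors' actual pipeline). A breadth-first search from the root yields each category's shortest distance, and it also copes with the cycles that occur in Wikipedia's category graph. Counting an article as one step below its shallowest category is an illustrative convention assumed here, not necessarily the paper's exact definition.

```python
from collections import deque


def category_depths(subcats, root):
    """Breadth-first search from the root category.

    Returns the shortest distance (in edges) from the root to every
    reachable category. `subcats` is a hypothetical dict mapping a
    category name to a list of its subcategory names.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        for child in subcats.get(cat, ()):
            # In BFS the first visit is along a shortest path; skipping
            # already-seen categories also prevents looping on cycles.
            if child not in depth:
                depth[child] = depth[cat] + 1
                queue.append(child)
    return depth


def article_granularity(article_cats, depth):
    """Shortest path from the root to an article (assumed convention).

    Taken here as 1 + the depth of the article's shallowest reachable
    category; returns None if none of its categories is reachable.
    """
    reachable = [depth[c] for c in article_cats if c in depth]
    return 1 + min(reachable) if reachable else None
```

For example, with a toy hierarchy `{"Main": ["Science", "Society"], "Science": ["Physics"], "Physics": ["Optics"], "Society": ["Politics"]}`, an article filed under both "Optics" and "Politics" gets granularity 3, because its shallowest category ("Politics") sits two steps below "Main". Averaging this value over all categorized articles would give a corpus-wide figure comparable in kind to the 7.59 reported in the paper.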
The findings spanned many levels, the main one being that articles in coarse-grained categories (those nearest the top of the hierarchies) received the most edits and attention from editors, yet were the least likely to be featured (highest-quality) articles. This seemed to surprise the authors, as it means that the articles receiving the most attention from editors lack the depth of quality found in featured articles, most of which sit further down the hierarchy.