Detecting Trends in Social Bookmarking Systems: A del.icio.us Endeavor

Robert Wetzker, Carsten Zimmermann, Christian Bauckhage
Copyright: © 2010 |Pages: 20
DOI: 10.4018/jdwm.2010090803

Abstract

The authors present and evaluate an approach to trend detection in social bookmarking systems using a probabilistic generative model in combination with smoothing techniques. Social bookmarking systems are gaining major interest among researchers in the areas of data mining and Web intelligence, since they provide a large amount of user-generated annotations and reflect the interests of millions of people. Based on a vast corpus of approximately 150 million bookmarks found at del.icio.us, the authors analyze bookmarking and tagging patterns and discuss evidence that social bookmarking systems are vulnerable to spamming. They present a method to limit the impact of spam on a trend detector and provide conclusions as well as directions for future research.

Introduction

Social bookmarking systems, such as del.icio.us, StumbleUpon or CiteULike, have been very successful in the recent past. Their success originated from members’ ability to centrally store bookmarks on the web. However, as these services came of age, the perceived value of social bookmarking systems shifted towards the underlying social aspects, such as trend indication, advanced web search or recommendation functionality. These services are an invaluable source of information, since they provide a vast amount of user-generated annotations, such as tags, and reflect the interests of millions of users. One social aspect of these systems derives from the fact that resources, generally web pages, are tagged by the community rather than by the content creator alone, as is the case in services like Flickr or YouTube (Marlow, Naaman, Boyd, & Davis, 2006). This characteristic, called collaborative tagging, was shown to provide relevant metadata (Heymann, Koutrika, & Garcia-Molina, 2008) and is expected to boost the semantic quality of labels (Surowiecki, 2004).

One of the first and most popular social bookmarking systems is del.icio.us. Because of its early acceptance in the market, its vast growth over the past five years and its easy data accessibility, del.icio.us represents a suitable case for analyzing the characteristics of social bookmarking communities. Figure 1 shows the del.icio.us main page, which lists the currently popular bookmarks and tags. Though this article mainly examines del.icio.us, we conjecture that the results presented here also apply to other social bookmarking services and to collaborative tagging systems in general.

Figure 1. The del.icio.us main page (September 2008)

As shown by Heymann, Koutrika, and Garcia-Molina (2008), trends within the del.icio.us tagging and bookmarking behavior strongly correlate with real-world events. This characteristic makes bookmarking services a valuable source for trend detection and creates new opportunities in areas such as product tracking or marketing. We will investigate the nature of trends within social bookmarking services and present a probabilistic method for the automated detection of trends within the del.icio.us community. To succeed in the trend detection task, we further present a method that limits the influence of spam users, since their frequent appearance in the data and their anomalous behavior would otherwise strongly interfere with any trend patterns.

The purpose of this work is threefold: First, we investigate the underlying bookmarking and tagging dynamics of social bookmarking systems using the example of del.icio.us. Second, we discuss evidence that social bookmarking systems are highly vulnerable to spam and that their data hence needs to be preprocessed before any sophisticated analysis can take place. Finally, we show that trends within the bookmarking community can be successfully detected using a probabilistic generative model combined with smoothing techniques, where trends are considered statistical anomalies. For a comprehensive study, we collected a corpus of 142,341,551 del.icio.us bookmarks, which, to the best of our knowledge, is the biggest dataset of its kind analyzed to date.
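To make the idea of treating trends as statistical anomalies more concrete, the following is a minimal illustrative sketch and not the model used in this article. It assumes that tag occurrences are grouped into two consecutive time windows, estimates a smoothed tag distribution for each window, and scores every tag by its contribution to the Kullback-Leibler divergence between the two. The function names, the additive smoothing with parameter `alpha`, and the toy tag lists are all assumptions made for illustration.

```python
import math
from collections import Counter


def smoothed_probs(counts, vocabulary, alpha=1.0):
    """Additive (Laplace) smoothing: every tag in the vocabulary gets a
    non-zero probability, so unseen tags do not cause division by zero.
    The choice of additive smoothing is an assumption of this sketch."""
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {tag: (counts.get(tag, 0) + alpha) / total for tag in vocabulary}


def trend_scores(previous_tags, current_tags, alpha=1.0):
    """Rank tags by how anomalous their current frequency is relative to
    the previous window. Each score is the tag's term in the
    Kullback-Leibler divergence KL(current || previous)."""
    prev_counts = Counter(previous_tags)
    curr_counts = Counter(current_tags)
    vocabulary = set(prev_counts) | set(curr_counts)

    p_prev = smoothed_probs(prev_counts, vocabulary, alpha)
    p_curr = smoothed_probs(curr_counts, vocabulary, alpha)

    scores = {
        tag: p_curr[tag] * math.log(p_curr[tag] / p_prev[tag])
        for tag in vocabulary
    }
    # Highest score first: tags whose share grew the most.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one list entry per tag assignment in a window.
previous = ["python"] * 50 + ["web"] * 50
current = ["python"] * 40 + ["web"] * 40 + ["olympics"] * 20
ranked = trend_scores(previous, current)
```

In this toy example, `ranked[0][0]` is `"olympics"`, the only tag whose share of assignments grew between the two windows. Counting users instead of raw tag assignments would be one conceivable way to dampen the effect of heavily posting spam accounts, though that is again an assumption rather than a description of the method presented here.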

The paper is structured as follows: We start with an introduction to the specifics of social bookmarking systems using the example of del.icio.us. We then present our method of data mining and analyze the bookmarking and tagging patterns within the retrieved corpus. We also show that bookmarking systems are vulnerable to different forms of spam. We present behavioral patterns that help characterize spam users and propose a method that limits their impact on trend detection without requiring high computational effort. Applying this method, we further investigate the possibility of detecting trends within the del.icio.us community using probabilistic measures. We present the results of our trend detector and conclude by discussing our findings and directions for future research.
