Home Contact

DIP3 Project

Computer Science Department, University of Ioannina

College of Computing, Georgia Institute of Technology

DIP3

FP7 People IOF Action
The 3Ps of Distributed Information Delivery: Preferences, Privacy and Performance

Related work

Today there is an abundance of information on line. The grand challenge is turning this huge amount of data to knowledge useful to the individual users of the Internet. DiP3 will address this challenge by tackling one form of data processing, often referred to as .push. data delivery. In push data delivery, instead of explicitly searching for information, users get notified when relevant data becomes available. Examples of such systems include RSS feeds, news alerts and aggregators. The success of such systems lies on how .useful. the pushed information is to the users receiving it.

Push-based systems most commonly rely on a publication/subscription (pub/sub) model of operation. In the pub/sub model, users, called subscribers, register to the system either by indicating topics of interest (topic-based pub/sub) or by specifying conditions that the data items must satisfy (content-based pub/sub). As new data items, or publications, become available, they are delivered to the users whose subscriptions match their content. In the majority of pub/sub systems [EFGK03], matching subscriptions with publications is a binary process: a publication either matches a subscription or it does not. Depending on the specificity of the subscriptions and the content of the publications, this may result in delivering to the users overwhelming or insufficient amounts of data. Our objective in DIP3 is to control both the amount and the quality of information received by the users. To this end, we propose ranked push-based data delivery, allowing users to receive only the most relevant data. To achieve ranked data delivery DIP3 will explore user preferences.

Ranked pub/sub delivery is a novel concept. We are aware of only two very recent approaches, namely [MVGS08] and [PPZA08], that tackle it. However, none of them is based on preferences. Furthermore, [MVGS08] considers the reverse problem: sending each publication to its most relevant subscribers. In contrast our approach is user-centric in that we are interested in sending to each subscriber the most relevant publications. [PPZA08] assumes that publications are already ordered by relevance, thus avoiding tackling the difficult problem that DIP3 addresses, that is determining relevance. Finally, DIP3 will also explore different ways of delivering ranked data including skylines, top-k and iceberg queries, as well as their variations and possible combinations.

Preference specification has been extensively studied. There are two fundamental approaches for expressing preferences, a quantitative and a qualitative one. With the quantitative approach (for example [AW00]), users employ scoring functions that associate a numeric score with specific pieces of data to indicate their interest in them. With the qualitative approach (for example [K02, C03]) users express explicitly their preferences between pair of items trough binary relations. However, there is little previous research work on incorporating any type of preferences in Internet-scale distributed systems. In particular, relating preference with push-data delivery is a novel research problem. The large scale nature of the problem, involving millions of users and gigabytes of heterogeneous data, makes this a challenging problem.

DIP3 will go even beyond preference specifications by individual users; it will further enhance information delivery by leveraging the rich opportunities offered by Web 2.0. The past few years have seen the explosion of Web-based social networks (such as MySpace and Facebook), online social media sites (such as YouTube and Flickr), large-scale information and bookmarking sharing communities (such as Wikipedia, del.icio.us and Citeulike). MySpace alone has grown from 1 million user accounts in 2004 to an astonishing 250 million accounts. One of our goals is to exploit the inherent social connections between users as expressed through social networks, social tagging, and other community-based features to enhance ranked information delivery. Recently, such information (often, called collective intelligence or wisdom of the crowds) has been used to enhance the quality of search results (for example, [NXW+07, HGG08]) and in recommendation systems (for example, [DDGR08]). Our objective in DIP3 is to enhance preferences from an individual to a social level towards improving the quality of ranked information delivery.

However, using individual preferences as well as collective preferences raises issues of privacy. Privacy has attracted a lot of attention recently. However, ensuring privacy in publish/subscribe systems and in social preferences is novel. DIP3 will consider the application of classic techniques such as k-anonymity (that guarantees the inability to distinguish an individual record from at least k-1 other records [S02]) and various extensions proposed to address its shortcomings with regards to attribute disclosure including l-diversity [MKGV08] and t-closeness [LLV07]. Building on previous work by the outgoing host on location privacy [CL08], special emphasis will be placed on personalized privacy that allows users to express individual privacy requirements.

Besides privacy, DIP3 will consider trust (avoid propagating malicious or unintended misleading information) and diversity (provide users with non-overlapping and varying information).

A foremost requirement is performance: being able to provide users with timely responses. This raises many challenges mainly due to the scale of the system (in terms of the distribution, number of users involved and the volume of data), diversity (especially with regards to differences in preferences among users and the heterogeneity of publications) and dynamicity. To address performance, DIP3 will exploit caching and replication at various system levels. Although caching and replication are well studied, the above challenges introduce new problems that are envisioned to lead to new solutions. It will also consider clustering and indexing of subscriptions and preferences. This is expected to lead to novel data structures and algorithms.

References

  • [EFGK03] P. Th. Eugster, P. Felber, R. Guerraoui, A-M. Kermarrec: .The Many Faces of Publish/Subscribe.. ACM Computing Surveys 35(2), 2003
  • [MVGS08] A. Machanavajjhala, E. Vee, M. Garofalakis, J. Shanmugasundaram: .Scalable Ranked Publish/Subscribe., 34th Very Large Database Conference, (VLDB08), September 2008, To appear.
  • [PPZA08] K. Pripuzic, I. Podnar Zarko, K. Aberer: .Top-k/w Publish/Subscribe: Finding k Most Relevant Publications in Sliding Time Window w.. 2nd Intern. Conference on Distributed Event-Based Systems (DEBS 2008), June 2008
  • [AW00] R. Agrawal, E. L. Wimmers: .A Framework for Expressing and Combining Preferences.. SIGMOD 2000
  • [K02] W.Kie.ling: .Foundations of Preferences in Database Systems.. VLDB 2002
  • [C03] J.Chomicki: .Preference Formulas in Relational Queries.. ACM Transaction on Database Systems 28(4), 2003
  • [NXW+07] S. Bao, G-R. Xue, X. Wu, Y. Yu, B.Fei, Z. Su: .Optimizing Web Search using Social Annotations.. WWW 2007
  • [HGG08] P.Heymann, G. Koutrika, H.Garcia-Molina: .Can Social Bookmarking Improve Web Search? WSDM 2008
  • [DDGR07] A. Das, M. Datar, A. Garg, S. Rajaram: .Google News Personalization: Scalable Online Collaborative Filtering.. WWW 2007
  • [S02] L. Sweeney: .Achieving k-anonymity Privacy Protection using Generalization and Suppression.. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10 (5), 2002; 571-588.
  • [MKGV08] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam: .L-diversity: Privacy beyond k-anonymity.. ACM Transactions on Knowledge Discovery from Data, TKDD 1(1): (2007)
  • [LLV07] N. Li, T. Li, S. Venkatasubramanian: .t-Closeness: Privacy Beyond k-Anonymity and l-Diversity.. ICDE 2007
  • [CL08] B. Gedik, L. Liu: .Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms.. IEEE Transaction on Mobile Computing 7(1): 1-18 (2008)

Valid CSS! Valid XHTML 1.0 Strict