
RUE 2012 - Workshop on Recommendation Utility Evaluation: Beyond RMSE

Date: 2012-09-09

Deadline: 2012-06-08

Venue: Dublin, Ireland


Website: https://recsys.acm.org/2012

Topics/Call for Papers

Measuring the error in rating value prediction has been by far the dominant evaluation methodology in the Recommender Systems literature. Yet there is a general consensus that this criterion alone is far from sufficient to assess the practical effectiveness of a recommender system in matching user needs. End users of recommendations receive lists of items rather than rating values, so recommendation accuracy metrics, as surrogates of the evaluated task, should target the quality of the item selection rather than the numeric system scores that determine this selection. Gaps in the adoption of ranking evaluation methodologies (e.g. IR metrics), however, result in methodological divergences that hinder the interpretation and comparability of empirical observations by different authors.
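As a minimal, purely illustrative sketch of this distinction (the toy ratings, the relevance threshold of 4.0, and the cutoff k are assumptions, not workshop material), the following snippet computes RMSE over predicted ratings and precision@k over the ranking those predictions induce:

```python
import numpy as np

# Toy example: predicted vs. true ratings for one user's candidate items (assumed data).
true_ratings = np.array([5.0, 2.0, 4.0, 1.0, 3.0])
pred_ratings = np.array([4.4, 2.5, 3.9, 1.2, 4.1])

# Rating-prediction view: RMSE over all predicted rating values.
rmse = np.sqrt(np.mean((pred_ratings - true_ratings) ** 2))

# Ranking view: precision@k over the top-k items ranked by predicted score,
# treating items with a true rating >= 4.0 as relevant (threshold is an assumption).
k = 2
relevant = true_ratings >= 4.0
top_k = np.argsort(-pred_ratings)[:k]
precision_at_k = relevant[top_k].mean()

print(f"RMSE = {rmse:.3f}, precision@{k} = {precision_at_k:.2f}")
```

A system can score well on one view and poorly on the other, which is precisely why the choice and standardization of ranking-oriented methodologies matters.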
On the other hand, accuracy is only one among several relevant dimensions of recommendation effectiveness. Novelty and diversity, for instance, have been recognized as key aspects of recommendation utility in many application domains. From the business point of view, the value added by recommendation can be measured more directly in terms of clickthrough, conversion rate, order size, returning customers, increased revenue, etc. Furthermore, web portals and social networks commonly face multi-objective optimization problems related to user engagement, which require appropriate evaluation methodologies for optimizing along the entire recommendation funnel. Other potentially relevant dimensions of effective recommendations for consumers and providers include confidence, coverage, risk, cost, robustness, ease of use, etc.
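For concreteness only, the sketch below shows one common way such dimensions can be operationalized; the popularity-based self-information definition of novelty, the cosine-distance definition of intra-list diversity, and all the data are assumptions for illustration, not prescribed by the workshop:

```python
import numpy as np

def novelty(recommended_ids, item_popularity, n_users):
    """Mean self-information -log2(popularity / n_users) of the recommended items."""
    p = np.array([item_popularity[i] / n_users for i in recommended_ids])
    return float(np.mean(-np.log2(p)))

def intra_list_diversity(item_vectors):
    """Average pairwise cosine distance among the recommended items' feature vectors."""
    v = item_vectors / np.linalg.norm(item_vectors, axis=1, keepdims=True)
    sims = v @ v.T
    iu = np.triu_indices(len(v), k=1)
    return float(np.mean(1.0 - sims[iu]))

# Hypothetical data: three recommended items, their popularity counts, and feature vectors.
popularity = {101: 500, 102: 40, 103: 5}
vectors = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
print(novelty([101, 102, 103], popularity, n_users=1000))
print(intra_list_diversity(vectors))
```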
While the need for further extension, formalization, clarification and standardization of evaluation methodologies is recognized in the community, this need remains to a large extent unmet. When engaging in evaluation work, researchers and practitioners still often face experimental design questions for which precise, consensual answers are not always available. RUE 2012 aims to gather researchers and practitioners interested in developing better, clearer, and/or more complete evaluation methodologies for recommender systems, or simply seeking clear guidelines for their experimental needs. The workshop aims to provide an informal setting for exchanging and discussing ideas, sharing experiences and viewpoints, and advancing the consolidation and convergence of experimental methods and practice.
Issues of interest
Specific questions that the workshop aims to address include the following:
What are the unmet needs and challenges for evaluation in the RS field? What changes would we like to see? How could we speed up progress?
Which recommendation utility and quality dimensions are relevant and should be accounted for? How can they be captured and measured?
How can metrics be more clearly and/or formally related to the tasks, contexts and goals for which a recommender application is deployed?
How should IR metrics be applied to recommendation tasks? What aspects require adjustment or further clarification? What further methodologies should we draw from other disciplines (HCI, Machine Learning, etc.)?
What biases and sources of noise should experimental designs typically watch for?
Can we predict the success of a recommendation algorithm from offline experiments? Which offline metrics correlate best with online success, and under which conditions?
What are the scope and limitations of offline evaluation? How can online and offline experiments complement each other?
What type of public datasets and benchmarks would we want to have available, and how can they be built?
How can the effect of recommendations be traced in business outcomes?
How can academic evaluation methodologies improve their relevance and usefulness for industrial settings?
How do we envision the evaluation of recommender systems in the future?
