Title: Indexing Shared Content in Information Retrieval Systems
Speaker: Marcus Fontoura (Yahoo! Research)

Abstract: In many corpora, documents share some content in full or in part. However, most information retrieval systems process each document separately, causing shared content to be indexed multiple times. In this paper, we describe a new document representation model where related documents are organized as a tree, which facilitates single indexing of shared content. We show how this representation can be encoded in an inverted index and we devise algorithms for evaluating free-text queries based on this encoding. We also show how our representation applies to web, email, and newsgroup search. Finally, we present experimental results that indicate that our method can provide a significant reduction in the size of an inverted index as well as in the time to build and query it.

Short-Bio: Marcus Fontoura joined Yahoo! Research in November 2005. Prior to this, he was a Research Staff Member at the IBM Almaden Research Center in the Computer Science Department. He also had research posts at the Computer Systems Group, University of Waterloo and at Princeton University. His primary interests are web search, data management, algorithms and data structures for large data sets. He obtained his PhD in Computer Science from PUC-Rio, Brazil in July 1999.
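
Illustration: the abstract describes organizing related documents as a tree so that shared content is indexed only once. The Python sketch below is one possible reading of that idea, not the encoding or query algorithms presented in the talk. It assumes that a document consists of its own text plus the text of all its ancestors (for example, an email reply that quotes the message it answers). The class name SharedContentIndex, its methods, and the sample messages are hypothetical.

```python
from collections import defaultdict


class SharedContentIndex:
    """Toy index over a document tree; each node's own text is indexed once.

    Assumption (not taken from the abstract): a document's full content is
    its own text plus the text of every ancestor in the tree.
    """

    def __init__(self):
        self.parent = {}                  # node id -> parent id (None for a root)
        self.postings = defaultdict(set)  # term -> ids of nodes whose own text has it

    def add_node(self, node_id, parent_id, text):
        # Only the text that is new at this node is tokenized and indexed.
        self.parent[node_id] = parent_id
        for term in text.lower().split():
            self.postings[term].add(node_id)

    def _path(self, node_id):
        # Yield the node itself, then each ancestor up to the root.
        while node_id is not None:
            yield node_id
            node_id = self.parent[node_id]

    def search(self, query):
        # Conjunctive query: a document matches if every term occurs
        # somewhere on the path from the document up to the root.
        terms = query.lower().split()
        hits = []
        for node_id in self.parent:
            path = set(self._path(node_id))
            if all(self.postings.get(t, set()) & path for t in terms):
                hits.append(node_id)
        return sorted(hits)


# Hypothetical email thread: m2 and m3 both reply to (and quote) m1.
index = SharedContentIndex()
index.add_node("m1", None, "budget meeting moved to friday")
index.add_node("m2", "m1", "works for me")
index.add_node("m3", "m1", "can we do thursday instead")

print(index.search("budget thursday"))  # ['m3']
print(index.search("budget"))           # ['m1', 'm2', 'm3']
```

In this sketch the quoted text of m1 is stored in the postings once, yet queries still match the replies that contain it. The linear scan over all nodes is only for clarity; a practical system would merge posting lists rather than visit every document.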