ACM SIGMOD/PODS 2003 Conference

San Diego, California
June 9-12, 2003

SIGMOD Accepted Tutorials

Below is a list of the tutorials accepted for the ACM SIGMOD 2003 conference, to be held as part of the Federated Computing Research Conference (FCRC), in San Diego, California, USA, June 10-12, 2003.

The final titles and author lists are subject to change and will be posted after the camera-ready copy is due in mid-March 2003.


Tutorial 1
Chair: Surajit Chaudhuri
Data Quality and Data Cleaning: An Overview
Theodore Johnson and Tamraparni Dasu, AT&T Labs Research

Data quality is a serious concern in any data-driven enterprise: poor data often produces misleading findings during data mining and causes process disruptions in operational databases. The manifestations of data quality problems can be very expensive: "losing" customers, "misplacing" billions of dollars' worth of equipment, misallocating resources because of glitched forecasts, and so on. Solving data quality problems typically requires a very large investment of time and energy; often 80% to 90% of a data analysis project is spent making the data reliable enough that the results can be trusted.

In this tutorial, we present a multidisciplinary approach to data quality problems. We start by discussing the meaning of data quality and the sources of data quality problems. We then show how these problems can be addressed by combining techniques from management science, statistics, database research, and metadata management. Next, we present an updated definition of data quality metrics and illustrate their application with a case study. We conclude with a survey of recent database research that is relevant to data quality problems and suggest directions for future research.

Tutorial 2
Chair (Part 1): Sihem Amer-Yahia
Chair (Part 2): Dan Suciu
XQuery: A Query Language for XML
Don Chamberlin, IBM Almaden Research Center

XQuery is the XML query language currently under development in the World Wide Web Consortium (W3C). XQuery specifications have been published in a series of W3C working drafts, and several reference implementations of the language are already available on the Web. If successful, XQuery has the potential to be one of the most important new computer languages to be introduced in several years. This tutorial will provide an overview of the syntax and semantics of XQuery, as well as insight into the principles that guided the design of the language.

The speaker is Don Chamberlin, one of IBM's representatives on the XML Query Working Group, and co-author of the Quilt language proposal that influenced the basic design of XQuery.

Tutorial 3
Chair: Alin Deutsch
Data Grid Management Systems
Arun Jagatheesan and Arcot Rajasekar, San Diego Supercomputer Center

Data grids are being built across the world as next-generation data-handling systems to manage petabytes of inter-organizational data and storage space. A data grid (datagrid) is a logical name space consisting of storage resources and digital entities, created by the cooperation of autonomous organizations and their users and based on the coordination of local and global policies. Data Grid Management Systems (DGMSs) provide services that bring organizations together and manage inter-organizational data and resources in the datagrid.

The objective of the tutorial is to provide an introduction to the opportunities and challenges of this emerging technology. Both novices and experts will benefit. The tutorial will cover an introduction, use cases, design philosophies, architecture, research issues, existing technologies, and demonstrations. Hands-on sessions in which participants try out the existing technologies may be offered, depending on the availability of Internet connections.

