Improving distributed query processing in OGSA-DAI

Primary mentor: Ally Hume (ahume [at] staffmail [dot] ed [dot] ac [dot] uk)

Secondary mentor: Amy Krause (a [dot] krause [at] epcc [dot] ed [dot] ac [dot] uk)

Background

OGSA-DAI is an open source framework for data access and integration. One of OGSA-DAI's key features is Distributed Query Processing (DQP), which allows clients to query multiple distributed relational databases as if they were a single database. To do this, OGSA-DAI must construct and optimise a distributed query plan to execute the query. To make good optimisation choices, OGSA-DAI must be able to accurately estimate the number of tuples produced by each stage of the query plan. This is termed cardinality estimation.

Project Goals

The aim of this project is to improve OGSA-DAI's cardinality estimation and use the more accurate estimates to choose the most appropriate join algorithms. Some work may also be required to model the performance of OGSA-DAI's various distributed join algorithms in order to support the optimisation task.

Project Description

The project will start by developing code to extract physical database statistics from a variety of database systems. Currently OGSA-DAI extracts only the number of records in a table. Better cardinality estimation requires more detailed statistics on the distribution of values for each attribute.
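
As an illustration only, one common form of per-attribute statistic is an equi-width histogram. The sketch below is not OGSA-DAI code: the class name AttributeHistogram and its methods are hypothetical, and it assumes a numeric attribute with known minimum and maximum values and a uniform spread of values inside each bucket.

```java
/** Hypothetical equi-width histogram over one numeric attribute (not part of OGSA-DAI). */
final class AttributeHistogram {
    private final double min;
    private final double max;
    private final long[] bucketCounts;
    private long total;

    AttributeHistogram(double min, double max, int buckets) {
        if (buckets <= 0 || !(max > min)) {
            throw new IllegalArgumentException("need max > min and at least one bucket");
        }
        this.min = min;
        this.max = max;
        this.bucketCounts = new long[buckets];
    }

    /** Records one attribute value; values outside [min, max] are clamped to the end buckets. */
    void add(double value) {
        int index = (int) ((value - min) / (max - min) * bucketCounts.length);
        index = Math.max(0, Math.min(bucketCounts.length - 1, index));
        bucketCounts[index]++;
        total++;
    }

    long totalRows() {
        return total;
    }

    /**
     * Estimates how many recorded values are strictly below the bound,
     * assuming values are spread uniformly inside each bucket.
     */
    double estimateLessThan(double bound) {
        if (bound <= min) {
            return 0.0;
        }
        if (bound >= max) {
            return total;
        }
        double width = (max - min) / bucketCounts.length;
        double position = (bound - min) / width;
        int fullBuckets = (int) position;
        double estimate = 0.0;
        for (int i = 0; i < fullBuckets; i++) {
            estimate += bucketCounts[i];
        }
        // Take a proportional share of the bucket that contains the bound.
        estimate += bucketCounts[fullBuckets] * (position - fullBuckets);
        return estimate;
    }
}
```

With such a structure the optimiser could estimate the result size of a range predicate without scanning the table. A real implementation would more likely read the statistics that each database system already maintains than rebuild them from the data.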

The project will then use these statistics to produce better cardinality estimates and propagate them through the various relational operators that make up OGSA-DAI's distributed query plans.
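
To make the idea of propagation concrete, the sketch below applies two textbook estimation formulas: an equality selection keeps roughly rows / distinct-values tuples, and an equi-join produces roughly |R| * |S| / max(V(R,a), V(S,b)) tuples. These formulas assume uniformly distributed values and that the join keys of one input are contained in the other. The class and method names are hypothetical and are not taken from OGSA-DAI.

```java
/** Hypothetical helper applying textbook cardinality formulas (not part of OGSA-DAI). */
final class CardinalityEstimator {

    private CardinalityEstimator() {
    }

    /** Estimated output rows of "attribute = constant", assuming uniformly distributed values. */
    static double equalitySelection(double inputRows, long distinctValues) {
        if (distinctValues <= 0) {
            return 0.0;
        }
        return inputRows / distinctValues;
    }

    /**
     * Estimated output rows of an equi-join, assuming uniform values and that the
     * keys of the input with fewer distinct values all appear in the other input.
     */
    static double equiJoin(double leftRows, long leftDistinct,
                           double rightRows, long rightDistinct) {
        long maxDistinct = Math.max(leftDistinct, rightDistinct);
        if (maxDistinct <= 0) {
            return 0.0;
        }
        return leftRows * rightRows / maxDistinct;
    }
}
```

For example, selecting one value of an attribute with 50 distinct values from a 1000-row table gives an estimate of 20 rows. Joining those 20 rows (10 distinct keys) with a 5000-row table (100 distinct keys) gives an estimate of 1000 rows, which then becomes the input estimate for the next operator in the plan.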

The final stage of the project will attempt to use these more accurate cardinality estimates to influence OGSA-DAI's choice of appropriate join algorithms. This stage may involve studying and constructing a model of the performance of OGSA-DAI's various join algorithms.
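
A performance model of this kind could be as simple as a cost function per join algorithm, with the optimiser picking the cheapest. The sketch below is purely illustrative: the two strategies, their names, the cost formulas and the constants are all invented for this example and do not describe OGSA-DAI's actual join algorithms.

```java
/** Hypothetical join strategies, invented for illustration (not OGSA-DAI's algorithms). */
enum JoinStrategy {
    SHIP_BOTH_INPUTS,
    BATCHED_KEY_LOOKUP
}

/** Hypothetical cost-based chooser; cost is a rough count of tuples moved plus overheads. */
final class JoinChooser {
    // Assumed constants: a real model would be calibrated by measurement.
    static final int BATCH_SIZE = 100;
    static final double OVERHEAD_PER_BATCH = 50.0;

    private JoinChooser() {
    }

    static double cost(JoinStrategy strategy, double leftRows, double rightRows,
                       double joinRows) {
        switch (strategy) {
            case SHIP_BOTH_INPUTS: {
                // Both inputs are transferred in full to the site performing the join.
                return leftRows + rightRows;
            }
            case BATCHED_KEY_LOOKUP: {
                // Left rows are read, their keys are sent in batches to the right-hand
                // database, and only the matching rows come back.
                double batches = Math.ceil(leftRows / BATCH_SIZE);
                return leftRows + joinRows + batches * OVERHEAD_PER_BATCH;
            }
            default:
                throw new AssertionError("unknown strategy: " + strategy);
        }
    }

    /** Returns the strategy with the lowest estimated cost for the given cardinalities. */
    static JoinStrategy choose(double leftRows, double rightRows, double joinRows) {
        JoinStrategy best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (JoinStrategy candidate : JoinStrategy.values()) {
            double candidateCost = cost(candidate, leftRows, rightRows, joinRows);
            if (candidateCost < bestCost) {
                bestCost = candidateCost;
                best = candidate;
            }
        }
        return best;
    }
}
```

Under these assumed costs, a small left input joined with a very large right table favours the lookup strategy, while two inputs of similar size favour shipping both. The choice depends directly on the cardinality estimates, which is why their accuracy matters.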

Project Requirements

OGSA-DAI is written in Java, so a good knowledge of Java is required. Some knowledge of relational database theory would be an advantage but is not a requirement for an enthusiastic student.

Further information

OGSA-DAI project website: http://www.ogsadai.org.uk/
OGSA-DAI SourceForge website: http://sourceforge.net/projects/ogsa-dai/

Background reading on OGSA-DAI's DQP support:
http://rsta.royalsocietypublishing.org/content/368/1926/4133
(Email the primary mentor for the paper if this link does not give you access to the full content)

The following PhD thesis starts with a good state-of-the-art analysis of data statistics. This may give you some background on the types of statistics one could collect:
http://research.microsoft.com/pubs/74108/thesis.pdf
 
