Query planning for the grid: adapting to dynamic resource availability.
The availability of massive datasets, comprising sensor measurements or the results of scientific simulations, has had a significant impact on the methodology of scientific reasoning. Scientists require storage, bandwidth and computational capacity to query and analyze these datasets, to understand physical phenomena or to test hypotheses. This paper addresses the challenge of identifying and selecting resources to develop an evaluation plan for large scale data analysis queries when data processing capabilities and datasets are dispersed across nodes in one or more computing and storage clusters. We show that generating an optimal plan is hard, and we propose heuristic techniques to find a good choice of resources. We also consider heuristics to cope with dynamic resource availability; in this situation we have stale information about reusable cached results (datasets) and the load on various nodes. We develop a simulation tool for Distributed Data Analysis (DDA-Sim) and we model a large scale remote sensing application (Kronos). We report on the behavior of the query planning heuristics for both simulations and experiments of Kronos.