Modeling Errors in Vector Data Using Stochastic Simulation
Abstract
Vector data, represented by discrete points and lines that are often topologically structured, are important components of geographical information systems (GISs). It is well known that various errors exist in vector data and their geo-processing: positional errors are of major concern for well-defined objects, whereas attribute errors deserve the focus otherwise. Research on GIS errors is oriented to describing, modeling and visualizing them for spatial decision support, and error modeling is a key issue. One method is to use analytic tools such as variance propagation to derive mathematical formulae for specific data sets or geo-processing operations. Error modeling is carried out more effectively and flexibly by simulating alternative, equally probable realizations of a data set, so that it is possible to analyze the propagation of errors from source data sets such as polygon coverages to an overlaid coverage. Based on error modeling, both data producers and data users are able to assess the fitness of a particular data product for a certain purpose. In geostatistics, spatially distributed phenomena are conceived as random variables, which are assumed to take values drawn from a population conforming to a specific distribution. The notion of random variables conveys spatial variability, which is central to capturing spatially varying errors in spatial data. Geostatistics is particularly useful for research on GIS error issues in at least two respects: one is spatial interpolation by Kriging, which produces variance surfaces as by-products along with interpolated surfaces; the other is stochastic simulation, which generates alternative, equally probable surfaces. The latter approach has been widely used for error modeling in environmental and geographical problem-solving. However, its application to vector data has been rare. This paper presents a novel use of geostatistical simulation for modeling positional errors in vector data.
Successive application of the conditional probability relation shows that drawing an N-variate sample from the joint conditional distribution can be done in N successive steps, each using a univariate conditional cumulative distribution function (CCDF).
An Edinburgh suburb was chosen as the test site, with 1:24 000 scale aerial photographs being used to generate test data and 1:5 000 scale aerial photographs to provide reference data, for which the coordinates of a certain point or vertex located at x are denoted x(x) and X(x) respectively. The positional error at this location, ε(x), can then be expressed as ε(x) = X(x) − x(x). The underlying rationale is that, with photogrammetric techniques applied in urban areas for increased efficiency, aerial photographs at large and medium scales are normally used for topographic and thematic mapping. While it is common practice to use the ε(x) values at check points to derive error measures such as RMSE in position and elevation, those error descriptors are not adequate for use in vector error models that predict, by means of variance propagation, the accuracy of derivative data products such as line lengths and polygon areas, unless homogeneity and spatial independence of positional errors among points are assumed.
The method used in this experiment was to apply conditional simulation to generate equally probable ε(x) values and, in turn, alternative versions of the test data (the x(x) values), in order to model errors in the source data and assess the consequences of using them in a certain map operation. Stochastic simulation was performed using the sequential Gaussian simulation program SGSIM provided in GSLIB. The parameter file was supplied with suitable data including grid cell size (8 by metres) and the ranges, sills and nugget effects describing the semivariogram models. Ten realizations were created with SGSIM for X and Y independently and were added as new data items to the Arc/Info PAT file mentioned above. Ten versions of the vector data were generated from the expanded PAT file.
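The successive univariate draws and the perturbation of vertices described above can be illustrated with a minimal Python sketch. It is not the SGSIM program: it assumes a zero-mean Gaussian error field with an exponential covariance, uses simple kriging on all previously known values (no search neighbourhood, no normal-score transform), and requires at least one check point. The polygon, check-point locations, error values, sill and range are all hypothetical, and the function names are introduced here for illustration only.

```python
import numpy as np

def exp_cov(h, sill, a):
    # Exponential covariance with practical range a (an assumed model).
    return sill * np.exp(-3.0 * np.asarray(h) / a)

def sgs_realization(targets, cond_xy, cond_val, sill, a, rng):
    """One conditional realization of a zero-mean Gaussian error field.

    Needs at least one conditioning point; targets must not coincide
    with conditioning points.
    """
    pts = [np.asarray(p, float) for p in cond_xy]
    vals = [float(v) for v in cond_val]
    out = np.empty(len(targets))
    for i in rng.permutation(len(targets)):   # random visiting path
        p = np.asarray(targets[i], float)
        P, V = np.array(pts), np.array(vals)
        # Simple kriging on all data known so far defines the
        # univariate Gaussian CCDF at this node.
        d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
        C = exp_cov(d, sill, a) + 1e-10 * sill * np.eye(len(P))
        c0 = exp_cov(np.linalg.norm(P - p, axis=1), sill, a)
        w = np.linalg.solve(C, c0)
        mean = w @ V
        var = max(sill - w @ c0, 0.0)
        out[i] = rng.normal(mean, np.sqrt(var))
        # The drawn value conditions all later nodes.
        pts.append(p)
        vals.append(out[i])
    return out

def shoelace_area(xy):
    # Area of a simple polygon from its ordered vertices.
    x, y = xy[:, 0], xy[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

rng = np.random.default_rng(0)
# Hypothetical test-data polygon (metres) and check-point errors eps = X - x.
poly = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 100.0], [0.0, 100.0]])
check_xy = np.array([[10.0, 10.0], [90.0, 20.0], [50.0, 95.0]])
eps_x = np.array([0.8, -0.5, 0.3])
eps_y = np.array([-0.4, 0.6, 0.1])

areas = []
for _ in range(10):   # ten realizations, X and Y simulated independently
    ex = sgs_realization(poly, check_xy, eps_x, sill=1.0, a=60.0, rng=rng)
    ey = sgs_realization(poly, check_xy, eps_y, sill=1.0, a=60.0, rng=rng)
    areas.append(shoelace_area(poly + np.column_stack([ex, ey])))
areas = np.array(areas)
print("mean area:", areas.mean(), "std:", areas.std(ddof=1))
```

The spread of the simulated areas gives an empirical picture of how spatially correlated positional errors propagate to a derived quantity, without assuming independence among vertices.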
The results confirm that spatial variability in positional errors can be usefully explored via geostatistics, and that any desired number of simulated vector data sets can be generated using the conditional simulation approach supported in public-domain geostatistical software systems such as GSLIB, which are, unfortunately, often packaged independently of GIS platforms. It would thus be desirable to integrate error modeling functions such as those described above into mainstream GIS software systems so that information on spatial data errors is accessible to general GIS end users. Further research should be directed towards fundamental issues related to uncertain vector data modeling, such as the simulation of vector data under geometric and topological constraints, which could be breached unduly by simulation artifacts.