GBIF Data Portal Design : DPSDataValidationServices


Data Validation Services Overview (from GBIF Data Portal Strategy)


Purpose

Data Validation Services will be developed for use by the whole biodiversity informatics community. They will allow any user to run a range of checks against any XML data served to the network.

Data Validation Services will be developed as a software library which may be incorporated into a range of portals and tools (including provider software), but which will also be made accessible through a set of central web services.

Data Validation Services will support checks such as the following against XML data:

1. Confirmation that data elements accord with rules for acceptable content (format and/or values).
2. Comparison of georeference data against expected values based on locality data or other parameters.
3. Verification of well-formedness of scientific name strings.
4. Verification that scientific name strings match names known from a referenced authority list (e.g. IPNI).
5. Comparison of higher taxonomy with taxonomy known from a referenced authority.
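Checks 1, 2 and 3 above can be illustrated with a minimal Python sketch. It is not a planned implementation: the name pattern is a deliberately simplified assumption about what "well-formed" means, and `REGION_BOUNDS` is a hypothetical gazetteer with placeholder values. A real service would rely on proper nomenclatural rules and real geographic boundary data.

```python
import re
from typing import Dict, Optional, Tuple

# Check 3 (assumed, simplified definition of "well-formed"): a capitalised
# genus, optionally followed by one or two lower-case epithets (hyphens
# allowed). Authorship, hybrid signs and ranks such as "var." are NOT handled.
NAME_PATTERN = re.compile(r"[A-Z][a-z]+(?: [a-z]+(?:-[a-z]+)?){0,2}")


def is_well_formed_name(name: str) -> bool:
    """Return True if the whole string matches the simplified name pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


# Check 1: acceptable values for decimal-degree coordinates.
def coordinates_in_range(lat: float, lon: float) -> bool:
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


# Check 2: hypothetical gazetteer of bounding boxes
# (lat_min, lat_max, lon_min, lon_max). The region name and values are
# placeholders, not real data; boxes crossing the antimeridian are not handled.
REGION_BOUNDS: Dict[str, Tuple[float, float, float, float]] = {
    "Example Region": (50.0, 60.0, 0.0, 10.0),
}


def within_region(lat: float, lon: float, region: str) -> Optional[bool]:
    """Return True/False for a known region, or None if it cannot be checked."""
    bounds = REGION_BOUNDS.get(region)
    if bounds is None:
        return None  # unknown region name: no basis for comparison
    lat_min, lat_max, lon_min, lon_max = bounds
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
```

Returning `None` for an unknown region keeps "could not check" distinct from "checked and failed", which matters when results are reported back to providers.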

Data Validation Services will be developed as an extensible library capable of supporting new classes of data and data formats and of accepting new rules for validating data.
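One possible way to make the library accept new validation rules is a rule registry. The following Python sketch is an assumption, not a planned design: rules are plain functions registered by name, so new checks can be added without modifying the core `validate` function. The `Issue` class, the `register` decorator and the `CatalogNumber` element name are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Issue:
    rule: str      # name of the rule that raised the issue
    message: str   # human-readable description of the problem


# A rule inspects one record element and returns zero or more issues.
Rule = Callable[[ET.Element], List[Issue]]

# Hypothetical registry: each rule is stored under a unique name.
RULES: Dict[str, Rule] = {}


def register(name: str) -> Callable[[Rule], Rule]:
    """Decorator that adds a rule function to the registry under `name`."""
    def decorator(func: Rule) -> Rule:
        RULES[name] = func
        return func
    return decorator


@register("catalog-number-present")
def catalog_number_present(record: ET.Element) -> List[Issue]:
    # findtext returns None if the element is absent, "" if it has no text.
    value = record.findtext("CatalogNumber")
    if value is None or not value.strip():
        return [Issue("catalog-number-present", "CatalogNumber is missing or empty")]
    return []


def validate(record: ET.Element) -> List[Issue]:
    """Run every registered rule against the record and collect the issues."""
    issues: List[Issue] = []
    for rule in RULES.values():
        issues.extend(rule(record))
    return issues
```

Adding a new check means decorating another function with `@register(...)`; `validate` picks it up without any change to existing code.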

Interfaces

Data Validation Services will be used by other components:

1. The DataIndex will use Data Validation Services during indexing to check records for any identifiable issues. It may store the issues found as metadata alongside each record and report them to the data provider and to users viewing the affected records.
2. External Clients of Biodiversity Data (Portals, Workbench Applications, etc.) may use Data Validation Services to assure themselves that data are fit for their intended usage.
3. Data Providers (DiGIR/BioCASe/TAPIR, etc.) may use Data Validation Services to report to data administrators on potential issues with data sets prior to making them public.

Status

Data Validation Services are proposed as an area for research and development starting in 2005.

Functions

The following items are planned for the Data Validation Services.

Category | Item | Notes | Timeline
---------|------|-------|---------
Validation | Validate XML by schema | Report on validity of XML records (e.g. well-formedness, codepages, schema validation). | 4Q 2005
Validation | Validate element content model | Check that the content of a particular element matches expected formats (e.g. regular expression matching, rejection of null values, correct data formats, latitudes and longitudes within the expected range). | 4Q 2005
Validation | Locate scientific names in taxon name/concept resource | Use a specific taxon name/concept data service (potentially the GBIF central index as an aggregated view) to identify scientific name elements containing unknown names. | 4Q 2005
Validation | Compare taxonomic hierarchy with taxon name/concept service | Use a specific taxon name/concept data service (potentially the GBIF central index as an aggregated view) to identify taxonomic hierarchies that include unexpected entries. | 4Q 2005
Validation | Compare georeference data with geographic region names | Verify that latitude and longitude values fall within the boundaries of any identifiable geographic units (continent, ocean, country, national park) referenced in the record. | 1Q 2006
Validation | Open model to support new validation tests | The Data Validation Services should be developed as an open software framework that can readily be extended with new tests. Wherever possible, test components should be parameterised to minimise the need for fresh development (e.g. only one test component need be written to check any possible element against any possible regular expression). | 4Q 2005
Validation | Validate data from URL | As well as supporting the validation of XML data passed directly to the Data Validation Services, they should support validation of data from a specified URL (which could, for example, be a query URL against a DiGIR/BioCASe/TAPIR provider). | 4Q 2005
Reporting | Report validation results | All validation tests should generate report elements which may be handled programmatically or returned as an XML document. | 4Q 2005
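Several of the items in the table (well-formedness checking, parameterised element content tests and XML reporting) fit together naturally. The following Python sketch shows one way they might combine, under stated assumptions: the `RegexElementTest` class, the `validationReport`/`issue` report vocabulary and the `DecimalLatitude` element name are hypothetical, and only well-formedness is checked, since the Python standard library does not perform XML Schema validation.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional


def make_issue(test_id: str, path: str, status: str, value: Optional[str]) -> ET.Element:
    """Build one report element; the element and attribute names are hypothetical."""
    issue = ET.Element("issue", test=test_id, path=path, status=status)
    issue.text = value
    return issue


class RegexElementTest:
    """One parameterised test component: any element path against any regex."""

    def __init__(self, test_id: str, path: str, pattern: str, required: bool = True):
        self.test_id = test_id
        self.path = path                    # ElementTree path expression
        self.pattern = re.compile(pattern)
        self.required = required            # report an issue if no element matches

    def run(self, root: ET.Element) -> List[ET.Element]:
        issues = []
        elements = root.findall(self.path)
        if not elements and self.required:
            issues.append(make_issue(self.test_id, self.path, "missing", None))
        for element in elements:
            value = element.text or ""      # an empty element is treated as a null value
            if self.pattern.fullmatch(value) is None:
                issues.append(make_issue(self.test_id, self.path, "mismatch", value))
        return issues


def validate_document(xml_text: str, tests: List[RegexElementTest]) -> ET.Element:
    """Check well-formedness, then run each content test, collecting an XML report."""
    report = ET.Element("validationReport")
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Not well-formed: report the parser error and skip the content tests.
        report.append(make_issue("well-formed", "/", "error", str(err)))
        return report
    for test in tests:
        report.extend(test.run(root))
    return report


# Example instance: checks the format of latitude values only; the
# -90..90 range would need a separate numeric test.
LATITUDE_FORMAT = RegexElementTest("latitude-format", ".//DecimalLatitude", r"-?\d{1,2}(?:\.\d+)?")
```

The report is built as an `Element`, so a caller can either inspect it programmatically or serialise it with `ET.tostring(report, encoding="unicode")` to return it as an XML document.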



