Purpose
Data Validation Services will be developed for use by the whole biodiversity informatics community. Their purpose is to allow any user to run a range of checks against any XML data served to the network.
Data Validation Services will be developed as a software library which may be incorporated into a range of portals and tools (including provider software), but which will also be made accessible through a set of central web services.
Data Validation Services will support checks such as the following against XML data:
1. Confirmation that data elements accord with rules for acceptable content (format and/or values).
2. Comparison of georeference data against expected values based on locality data or other parameters.
3. Verification of well-formedness of scientific name strings.
4. Verification that scientific name strings match names known from a referenced authority list (e.g. IPNI).
5. Comparison of higher taxonomy with taxonomy known from a referenced authority.
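As an illustration of checks 3 and 4, the following is a minimal sketch rather than a proposed implementation. The regular expression is a deliberately rough assumption that covers only simple one- to three-part names (no authorship, hybrids or rank markers), and `KNOWN_NAMES` is a hypothetical in-memory stand-in for a referenced authority list such as IPNI.

```python
import re

# Rough assumption: a capitalised genus followed by up to two lowercase epithets.
# Real name strings (authorship, hybrids, rank markers) need a fuller grammar.
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+(?: [a-z][a-z-]+){0,2}$")

# Hypothetical stand-in for an authority list; a real service would query
# IPNI or a similar resource instead of a hard-coded set.
KNOWN_NAMES = {"Quercus robur", "Puma concolor"}


def is_well_formed_name(name: str) -> bool:
    """Check 3: does the string look like a simple scientific name?"""
    return NAME_PATTERN.match(name.strip()) is not None


def is_known_name(name: str, authority=KNOWN_NAMES) -> bool:
    """Check 4: is the name present in the referenced authority list?"""
    return name.strip() in authority


for candidate in ["Quercus robur", "quercus Robur", "Quercus alba"]:
    print(candidate, is_well_formed_name(candidate), is_known_name(candidate))
```

Keeping the two checks separate lets a report distinguish a malformed string from a well-formed name that the authority simply does not list.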
Data Validation Services will be developed as an extensible library capable of supporting new classes of data and data formats and of accepting new rules for validating data.
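One way such an extensible library might accept new rules is sketched below, under stated assumptions. The rule factories `matches` and `in_range`, the `RULES` registry and the element names are all hypothetical and chosen only for illustration; records are modelled as plain dictionaries rather than XML to keep the sketch short.

```python
import re
from typing import Callable, Dict, List, Optional

# A rule takes an element's text and returns an error message, or None if it passes.
Rule = Callable[[Optional[str]], Optional[str]]


def matches(pattern: str) -> Rule:
    """Build a rule that requires the whole value to match a regular expression."""
    compiled = re.compile(pattern)

    def rule(value):
        if value is None or compiled.fullmatch(value) is None:
            return f"value {value!r} does not match {pattern!r}"
        return None

    return rule


def in_range(low: float, high: float) -> Rule:
    """Build a rule that requires a numeric value between low and high inclusive."""

    def rule(value):
        try:
            number = float(value)
        except (TypeError, ValueError):
            return f"value {value!r} is not a number"
        if not low <= number <= high:
            return f"value {number} is outside [{low}, {high}]"
        return None

    return rule


# Hypothetical registry keyed by element name; supporting a new element or a
# new rule means adding an entry here, not writing a new code path.
RULES: Dict[str, List[Rule]] = {
    "DecimalLatitude": [in_range(-90, 90)],
    "DecimalLongitude": [in_range(-180, 180)],
    "CountryCode": [matches(r"[A-Z]{2}")],
}


def validate(record: Dict[str, Optional[str]]) -> List[str]:
    """Apply every registered rule to the matching element of a record."""
    problems = []
    for element, rules in RULES.items():
        for rule in rules:
            message = rule(record.get(element))
            if message is not None:
                problems.append(f"{element}: {message}")
    return problems


print(validate({"DecimalLatitude": "95.2", "DecimalLongitude": "12.5", "CountryCode": "dk"}))
```

Because each rule is built from parameters, a single component can serve many elements, and a missing element is reported in the same way as an invalid value.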
Interfaces
Data Validation Services will be used by other components:
1. The DataIndex will use Data Validation Services to check records for any identifiable issues during indexing and may store such information as metadata alongside each record and report it back to the data provider and to users viewing the records concerned.
2. External Clients of Biodiversity Data (Portals, Workbench Applications, etc.) may use Data Validation Services to assure themselves that data are fit for their intended usage.
3. Data Providers (DiGIR/BioCASe/TAPIR, etc.) may use Data Validation Services to report to data administrators on potential issues with data sets prior to making them public.
Status
Data Validation Services are proposed as an area for research and development starting in 2005.
Functions
The following items are planned for the Data Validation Services.
| Category | Item | Notes | Timeline |
| Validation | Validate XML by schema | Report on validity of XML records (e.g. well-formedness, codepages, schema validation) | 4Q 2005 |
| Validation | Validate element content model | Check that the content of a particular element matches expected formats (e.g. regular expression matching, rejection of null values, correct data formats, latitudes and longitudes within expected range). | 4Q 2005 |
| Validation | Locate scientific names in taxon name/concept resource | Use a specific taxon name/concept data service (potentially the GBIF central index as an aggregated view) to identify scientific name elements with unknown names. | 4Q 2005 |
| Validation | Compare taxonomic hierarchy with taxon name/concept service | Use a specific taxon name/concept data service (potentially the GBIF central index as an aggregated view) to identify taxonomic hierarchies which include unexpected entries. | 4Q 2005 |
| Validation | Compare georeference data with geographic region names | Verify that latitude and longitude values fall within the boundaries of any identifiable geographic units (continent, ocean, country, national park) referenced in the record. | 1Q 2006 |
| Validation | Open model to support new validation tests | The Data Validation Services should be developed as an open software framework that can readily be extended with new tests. Wherever possible, test components should be parameterised to minimise the need for fresh development (e.g. only one test component need be written to check any possible element against any possible regular expression). | 4Q 2005 |
| Validation | Validate data from URL | As well as validating XML data passed to them directly, the Data Validation Services should support validation of data from a specified URL (which could, for example, be a query URL against a DiGIR/BioCASe/TAPIR provider). | 4Q 2005 |
| Reporting | Report validation results | All validation tests should generate report elements which may be handled programmatically or returned as an XML document. | 4Q 2005 |
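The well-formedness and reporting items above could be combined roughly as follows. This is a sketch under assumptions, not a specified design: the `Finding` tuple, the function names and the `validationReport` layout are hypothetical, and only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Each finding is (test name, passed?, message); the names are illustrative only.
Finding = Tuple[str, bool, str]


def check_well_formed(xml_text: str) -> Finding:
    """Report whether the input parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return ("well-formedness", False, str(error))
    return ("well-formedness", True, "document parsed")


def build_report(findings: List[Finding]) -> str:
    """Serialise findings as a small XML report (hypothetical layout)."""
    root = ET.Element("validationReport")
    for test, passed, message in findings:
        status = "pass" if passed else "fail"
        item = ET.SubElement(root, "result", test=test, status=status)
        item.text = message
    return ET.tostring(root, encoding="unicode")


findings = [
    check_well_formed("<record><name>Puma concolor</name></record>"),
    check_well_formed("<record><name>Puma concolor</record>"),
]
print(build_report(findings))
```

The list of findings can be handled programmatically as it stands, while `build_report` gives the same results as an XML document. Data from a URL could be validated in the same way by fetching the response body first (for example with `urllib.request.urlopen`) and passing the decoded text to `check_well_formed`.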