METADATA DEVELOPMENT BASED ON ISO / TC 211 STANDARDS

This paper reviews the present status and major problems of the existing ISO standards related to imagery metadata. An imagery metadata model is proposed to facilitate the development of imagery metadata on the basis of conformance to these standards and combination with other ISO standards related to imagery. The model presents an integrated metadata structure and content description for any imagery data for finding data and data integration. Using the application of satellite data integration in CEOP as an example, satellite imagery metadata is developed, and the resulting satellite metadata list is given.


INTRODUCTION
Within the geospatial environment and its applications, imagery data are important information sources and products. To apply various imagery data, it is necessary to represent or reconstruct exactly the geometric conditions under which these data were observed or acquired. Information that describes or reconstructs the observation conditions is usually represented by imagery metadata.
For imagery metadata development, two primary uses can be identified: (1) finding data and (2) integrating data. In finding data, users try to identify data that may be useful for their own purposes. This usually requires two general categories of information about imagery data: suitability for the user's intended purpose and methods for understanding and interpreting the image. Therefore, metadata are required to provide a minimum amount of information to support users' judgment or evaluation of the usability of the data. In integrating data, users spatially and temporally overlay the data so that the spatio-temporal relationships among data and/or pixels can be explicitly reconstructed. For this purpose, metadata are required to include enough information to let users overlay the retrieved data. For example, if satellite data are not geometrically and radiometrically corrected, metadata should provide users with enough information to do their own geometric and radiometric correction, such as satellite orbit, sensor attitude, and calibration parameters.
Standardization of imagery metadata is being conducted by several international organizations such as ISO/TC 211 (Geographic information/Geomatics). ISO/TC 211 is now producing a series of extensible metadata specifications to effectively manage data for users' retrieval, including ISO 19115 geographic information - metadata (ISO/TC 211, 2002) and ISO 19115 geographic information - metadata - part 2: extension for imagery and gridded data (ISO/TC 211, 2005). However, these standards do not provide all the metadata information needed to describe imagery data or to meet all the requirements of applications. Further, the current metadata standards are still in draft phase.
The main problems are: (1) To apply imagery data to geographic information, geolocation information must be added to the imagery metadata, so that metadata and image processing together are sufficient to let users directly extract the information needed to georeference the data. Present standards do not provide the information necessary for geolocation of an image. (2) Although these standards provide a metadata schema for geographical data, these metadata are not rich enough for all types of thematic requirements specific to the applications. New metadata items are required for the specific needs of applications.
(3) There is currently no imagery metadata model. An integrated imagery metadata model is required to describe imagery data. It is necessary to apply the ISO metadata standards (ISO 19115 and ISO 19115 part 2) and to combine them with information from other ISO standards related to imagery, such as ISO 19130, while resolving issues of overlap. It is also necessary to consider extensions for professional applications and diverse communities. Imagery metadata can then be developed to provide users with enough information for finding data and for data integration.
In this paper, an imagery metadata model is proposed that establishes an integrated metadata structure defining the essential components for imagery data based on the existing ISO standards and imagery-specific needs. The rest of the paper is organized as follows. Section 2 reviews the present status of the existing ISO standards related to metadata and imagery. Section 3 proposes the imagery metadata model and uses the model for metadata design. An example using the CEOP satellite imagery data is given in section 4. Finally, further discussion and conclusions are presented in section 5.

ISO 19115 − metadata
The international community approved ISO 19115 in 2003 as a tool to define metadata in the field of geographic information. It defines schemas required for describing geographic information and services and provides information about identification, extent, quality, spatial and temporal schemas, spatial reference, and distribution of geographic data. The document includes metadata UML models, core metadata, metadata packages, a data dictionary, descriptions of extensions, and profiles.
ISO 19115 is designed to be the general metadata standard applicable to all data sets with geographic information. It identifies a set of core metadata derived from the many metadata elements it defines. It also specifies the conditions under which metadata elements are mandatory, conditional, or optional. Although there is some service metadata in ISO 19115, particularly in the area of identification, much of the service metadata is defined in ISO 19119 (ISO/TC 211, 2001). The metadata in ISO 19115 provides only limited information about the spatial and temporal schemas: the extents of both and the spatial representation information.

ISO 19115 part 2 − metadata extension for imagery and gridded data
When ISO 19115 was drafted, it included some provisions for metadata models for imagery and gridded data, but it did not provide all the metadata elements needed to describe imagery. ISO 19115 part 2 extends the metadata defined in ISO 19115 and identifies additional metadata required to describe imagery and gridded data, such as data quality, spatial representation, content, and acquisition information. It provides information about the properties of the measuring equipment used to acquire data, the geometry of the measuring process employed by the equipment, and the production process used to digitize the raw data. A committee draft (CD) for ISO 19115 part 2 has been published.

ISO 19130 − sensor data model for imagery and gridded data
To apply imagery data to geographic information, the geolocation information is very important.

IMAGERY METADATA MODEL
Imagery metadata is a description of the content, quality, condition, and characteristics of imagery datasets.
Figure 1 presents a proposed structure and content for imagery metadata. A full set of imagery metadata contains one or more imagery metadata classes and elements. A metadata class is defined as a set of metadata elements describing the same aspects of data. A logically primitive item of metadata is called a metadata element, which is defined as a discrete unit of metadata. The layout of the imagery metadata classes is described in Figure 1(a), which includes 8 main metadata classes: metadata set information, identification information, data quality information, spatial representation information, geolocation information, reference system information, content information, and specific information.
Metadata elements can be organized into three types of components as shown in Figure 1(b): core elements, extended elements, and specific elements. Core elements define the basic minimum number of metadata elements that must be reported for all applications of a geographical dataset. Extended elements define those element extensions necessary for proper characterization of an imagery dataset. Specific elements include professional elements and organizational elements. Professional elements define conditional or optional metadata for specific professional (discipline) needs. Organizational elements define conditional or optional metadata for the specific needs of different research communities.
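The three component types can be pictured as a tag attached to each metadata element. The following Python fragment is a minimal sketch only: the names `ElementKind` and `MetadataElement` are hypothetical and not part of any ISO standard, and treating `endian::CEOP_Endian` as an organizational element is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ElementKind(Enum):
    """Component types of Figure 1(b); 'specific' is split into its two subtypes."""
    CORE = "core"
    EXTENDED = "extended"
    PROFESSIONAL = "professional"      # specific: discipline needs
    ORGANIZATIONAL = "organizational"  # specific: community needs


@dataclass(frozen=True)
class MetadataElement:
    name: str            # element name, e.g. "dateStamp"
    metadata_class: str  # owning metadata class, e.g. "MD_Metadata"
    kind: ElementKind

    @property
    def is_specific(self) -> bool:
        # Specific elements are the professional and organizational ones.
        return self.kind in (ElementKind.PROFESSIONAL, ElementKind.ORGANIZATIONAL)

    @property
    def qualified_name(self) -> str:
        # Same "element::class" notation used in the element definitions.
        return f"{self.name}::{self.metadata_class}"


date_stamp = MetadataElement("dateStamp", "MD_Metadata", ElementKind.CORE)
# Assumption: a CEOP element is treated here as organizational.
endian = MetadataElement("endian", "CEOP_Endian", ElementKind.ORGANIZATIONAL)
```

Keeping the kind on each element, rather than in separate lists, lets one element list be filtered by component type when a metadata structure is assembled.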

Core metadata elements
Even though metadata standards define an extensive set of metadata elements, typically only a subset of the full number of elements is used. It is essential to define the minimum set of metadata elements, called core metadata elements, needed for all application datasets, without which the application data set is not well described. ISO 19115 defines 5 core metadata classes with 13 core metadata elements. These are the mandatory metadata components that should be included in our metadata design. Figure 2 shows a UML model of these elements and their relationships. The definition of each imagery metadata element follows.

Specific metadata elements
The specific metadata elements identify metadata specifically required for the selected use cases. They include professional metadata elements and organizational metadata elements.
(1) Professional metadata elements. One of the significant enhancements of metadata is the ability to extend it for specific professional needs. Professional metadata elements define a set of metadata classes and their elements that are used by a specific discipline or profession. These extended elements are outside the ISO standards and must meet the following rules.
• Each metadata element is unique and not already defined in the model.
• Each metadata class is unique and not previously used.
(2) Organizational metadata elements. To meet specialized requirements of nations, regions, and organizations, organizational metadata elements can be defined by a dataset producer or a user community. These extended elements are also outside the ISO standards but are needed by the dataset producers or communities. Such elements must meet the same rules as professional metadata elements.
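The two uniqueness rules above can be checked mechanically when a community adds its own classes and elements. The sketch below is illustrative only: `ExtensionRegistry` is a hypothetical helper, and it assumes that element uniqueness is judged within the owning class (the `element::class` pair), which is one possible reading of the rules.

```python
class ExtensionRegistry:
    """Tracks metadata classes and their elements, rejecting duplicates."""

    def __init__(self, model):
        # model: mapping of class name -> iterable of element names already defined
        self._model = {cls: set(names) for cls, names in model.items()}

    def add_class(self, metadata_class):
        # Rule: each metadata class is unique and not previously used.
        if metadata_class in self._model:
            raise ValueError(f"class {metadata_class!r} is already used")
        self._model[metadata_class] = set()

    def add_element(self, metadata_class, name):
        if metadata_class not in self._model:
            raise KeyError(f"unknown class {metadata_class!r}")
        # Rule: each metadata element is unique and not already defined
        # (assumption: uniqueness is checked per class).
        if name in self._model[metadata_class]:
            raise ValueError(f"{name}::{metadata_class} is already defined")
        self._model[metadata_class].add(name)

    def elements(self, metadata_class):
        return frozenset(self._model[metadata_class])


registry = ExtensionRegistry({"MD_Band": ["scaleFactor", "offset"]})
registry.add_class("CEOP_Format")
registry.add_element("CEOP_Format", "name")
registry.add_element("MD_Band", "valueForMissingData")
```

Attempting `registry.add_element("MD_Band", "offset")` or `registry.add_class("MD_Band")` raises `ValueError`, which is the behaviour the rules call for.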
CEOP (Coordinated Enhanced Observation Period) is currently establishing an integrated global observing system of the water and energy cycle for scientific and social needs. Taking the CEOP application of satellite data integration as an example, Figure 3 shows the UML model of specific metadata elements related to CEOP metadata, which include EOP3 and 4 resampling data, CEOP distribution information, and imagery content information. The definition of each metadata element is described as follows.
valueForMissingData::MD_Band − whether data are not captured though a grid-cell is in the observation area
outOfObservation::MD_Band − description of a grid-cell when the grid-cell is out of the observation area
observationAreaRatio::MD_ImageDescription − observation area ratio
endian::CEOP_Endian − byte order (endianness), which varies among the organizations and systems that generated the satellite geocoded image product
fromNorth::CEOP_OrderOfDataRecording and fromSouth::CEOP_OrderOfDataRecording − order of data recording: some data products have pixels starting from the north, while others have pixels starting from the south; special attributes for the CEOP application
name::CEOP_Format − name of the data transfer format
version::CEOP_Format − version of the format (date, number, etc.)
blank::CEOP_Format − blank, special attribute for the CEOP application
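To illustrate why byte order, recording order, and the missing-data value need to be carried in the metadata, the following sketch decodes a small raw grid. It is not the CEOP reader: the function `decode_grid`, the 16-bit signed cell type, and the missing value -9999 are all assumptions chosen for the example.

```python
import struct


def decode_grid(raw, rows, cols, endian, from_north, missing_value):
    """Decode 16-bit signed cells into rows ordered from north to south.

    endian: "big" or "little" (cf. endian::CEOP_Endian)
    from_north: True if the first recorded row is the northernmost one
    missing_value: cell value that marks missing data (returned as None)
    """
    prefix = {"big": ">", "little": "<"}[endian]
    values = struct.unpack(f"{prefix}{rows * cols}h", raw)
    grid = [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]
    if not from_north:
        # Data recorded from the south: flip so that row 0 is the north edge.
        grid.reverse()
    return [[None if v == missing_value else v for v in row] for row in grid]


# Two rows of two cells, big-endian, recorded from the south.
raw = struct.pack(">4h", 1, 2, -9999, 4)
grid = decode_grid(raw, rows=2, cols=2, endian="big",
                   from_north=False, missing_value=-9999)
```

Without these three attributes the same bytes could be read with swapped byte order, upside down, or with the missing-data marker mistaken for a measurement, so products from different agencies could not be overlaid reliably.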

Identification information
This information provides an outline of the content and format of the imagery data. ISO 19115 defines general metadata elements of identification information that apply to any geographic data. The classes relevant to imagery are MD_DataIdentification, MD_BrowseGraphic, and MD_Keywords. MD_DataIdentification describes information required to identify a dataset. MD_BrowseGraphic describes a graphic that provides an illustration of the dataset. MD_Keywords provides keywords, their type, and reference source. Figure 5 shows the UML model of identification information for imagery. The definition of each metadata element is described as follows.

Data quality information
This information provides metadata classes and elements that describe the accuracy of the data and how the data presented in the current dataset are derived from the original measurement. Imagery data are not raw measurements but are obtained after processing. Metadata on the processing procedure are often of particular importance to imagery. LI_Lineage, LI_Source, and LI_ProcessStep carry the information that describes the algorithm used to derive the imagery and the processing used to create it. This information is defined in ISO 19115. However, data quality information defined in ISO 19115 is currently very general, as it must apply to any geographic data. This information can be extended to include more detailed elements about processing imagery data, such as LE_Processing, LE_ProcessStepReport, and LE_Algorithm, which can be found in ISO 19115-2. In addition, DQ_Element in ISO 19115 defines all values obtained from applying a data quality measure or the outcome of evaluating the obtained value. Of these, however, only the DQ_PositionAccuracy class is essential for describing the positional accuracy of an image. Figure 6 shows the UML model of data quality information for imagery. The definition of each metadata element is described as follows.

Spatial representation information
This information provides the spatial representation for imagery. MD_GridSpatialRepresentation, defined in ISO 19115, is essential to the description of any geolocated imagery data. MD_Georectified applies to a grid whose cells are regularly spaced in a geographic coordinate system and can be geolocated from the grid coordinate, grid origin, cell spacing, and orientation. MD_Georeferenceable applies to a grid whose cells are irregularly spaced in the geographic coordinate system and can be geolocated using geolocation information supplied with the data but not from the grid properties. This information can be extended to provide further details needed to reference imagery data to a geographic coordinate system, which can be found in ISO 19115-2, such as MI_Georectified and MI_GeorefencingDescription. MI_Georectified is the subclass of MD_Georectified that contains additional information used to further specify georectification details of the imagery data. MI_GeorefencingDescription is the subclass of MD_Georeferenceable that contains additional information used to support georectification of the imagery data. Figure 7 shows the UML model of spatial representation information for imagery. The definition of each metadata element is described as follows.
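For a georectified grid, the location of a cell follows directly from the grid properties named above. The sketch below shows one possible formulation and is not taken from the standards: the function `cell_center`, the choice of the origin at the center of the lower-left cell, rows counted upward from the origin, and a counterclockwise rotation angle are all assumptions.

```python
import math


def cell_center(row, col, origin_x, origin_y, dx, dy, rotation_deg=0.0):
    """Return (x, y) of a cell center in the reference coordinate system.

    origin_x, origin_y: coordinates of the center of cell (row 0, col 0)
    dx, dy: cell spacing along the grid's column and row axes
    rotation_deg: counterclockwise rotation of the grid axes (assumed convention)
    """
    theta = math.radians(rotation_deg)
    x = origin_x + col * dx * math.cos(theta) - row * dy * math.sin(theta)
    y = origin_y + col * dx * math.sin(theta) + row * dy * math.cos(theta)
    return x, y


# A north-up grid with 0.25 degree spacing and origin at (100.0 E, 10.0 N).
lon, lat = cell_center(row=2, col=3, origin_x=100.0, origin_y=10.0,
                       dx=0.25, dy=0.25)
```

A georeferenceable grid has no such closed form; each cell must instead be located from the geolocation information supplied with the data.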

Geolocation information
This information determines the geographic location corresponding to an image. ISO 19130 defines the sensor model and external information required for geolocation of an image. It specifies that the geolocation information should be provided in SD_GCPCollection or SD_SensorModel. SD_GCPCollection defines a collection of ground control points. SD_SensorModel describes the sensor model information, consisting of SD_SensorParameters and SD_PlatformParameters. SD_SensorParameters defines basic sensor parameters including the position, orientation, and operational mode at a given time. SD_PlatformParameters defines basic identification information for the platform. Figure 8 shows the UML model of geolocation information for imagery. The definition of each metadata element is described as follows.

Reference system information
This information describes the spatial and temporal reference system, important information required for any image that is geographically located. In ISO 19115, MD_CRS describes metadata about a coordinate system consisting of MD_ProjectionParameters and MD_EllipsoidParameters. MD_ProjectionParameters includes a set of parameters that describe the projection, and MD_EllipsoidParameters includes a set of parameters that describe the ellipsoid. Figure 9 shows the UML model of reference system information for imagery. The definition of each metadata element is described as follows.

Content information
This information contains the metadata classes and metadata elements required to describe attributes of the imagery content and how they are represented. ISO 19115 defines metadata elements of content information for any geographic data. MD_CoverageDescription is required, which contains information about the content of a grid data cell. The MD_RangeDimension class identifies the range of each dimension of a cell measurement value, and the MD_Band class describes the range of wavelengths in the electromagnetic spectrum used.
The MD_ImageDescription class provides metadata about the image's suitability for use. ISO 19115 describes content information for geographical data only in very general terms. This information can be extended to include more detailed information for imagery data, such as MI_Band, which is described in ISO 19115-2 and defines additional attributes for specifying properties of individual wavelength bands in the dataset. Figure 10 shows the UML model of content information for imagery. The definition of each metadata element is described as follows.

Metadata elements design
Figure 11 shows how to determine metadata elements in applications. When designing metadata elements, it is first necessary to answer the following questions about the user requirements: (1) What: Does a dataset on a specific topic exist? (2) Where: For which specific place? (3) When: For which specific date or period? (4) Whom: Who is the point of contact to learn more about or to order the dataset? Answering these questions is the basic step in collecting metadata items for an application.
For a specific application, metadata elements, including core elements, extended elements, and specific elements, should be identified before the metadata structure is established. All core metadata elements defined in the imagery metadata model should be included in the design. To identify the extended elements, the application-specific items must be compared to the imagery metadata model. Some items are reflected in the model, and some are not. The items reflected in the model are determined to be extended elements. The items not reflected in the model can be added as specific elements for the requirements of the application. These additional items will help users decide on the availability and interoperability of datasets.
After all application-specific metadata elements have been identified, they can then be organized into metadata classes according to the structure of imagery metadata shown in Figure 1(a).
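The identification step described above amounts to a three-way classification of application items against the model. The following is a minimal sketch of that comparison; the function `classify_items` and the sample element names passed to it are hypothetical and stand in for a real model and application item list.

```python
def classify_items(application_items, core_elements, model_elements):
    """Split application items into core, extended, and specific elements.

    core_elements: elements that every design must include
    model_elements: non-core elements defined in the imagery metadata model
    """
    result = {
        "core": sorted(core_elements),  # always included in the design
        "extended": [],
        "specific": [],
    }
    for item in application_items:
        if item in core_elements:
            continue                          # already covered by the core set
        if item in model_elements:
            result["extended"].append(item)   # reflected in the model
        else:
            result["specific"].append(item)   # not reflected: application-specific
    return result


design = classify_items(
    application_items=["dateStamp", "scaleFactor", "observationAreaRatio"],
    core_elements={"dateStamp", "title"},
    model_elements={"scaleFactor", "offset"},
)
```

Here `scaleFactor` is found in the model and becomes an extended element, while `observationAreaRatio` is not and is kept as a specific element; the core set is carried over in full regardless of what the application lists.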

Metadata elements identification
Satellite products are supplied by JAXA, NASA, ESA, and EUMETSAT and include full scenes of data over the CEOP focused research areas, level 3 global gridded data, and subset scene data (JAXA, 2005). Satellite products contain two major parts: the header and the data. The data part records the observed data, position data, etc. The header part follows a metadata standard for satellite imagery data and describes the outline of the satellite product and its main characteristics. It is taken from Header Information of Each Data File (CEOP, 2003), which includes 21 necessary attributes: filename, sensor, product, observation date and time, image size, data type, data unit, scale factor, observation channel, reference site, latitude/longitude in center of lower left pixel, grid size, missing value, observation area ratio, subset software version, processing date, processing center, input original filename, original file processing center, HDF library version, and blank. This information is the most important source for deciding the metadata structure and content. The CEOP satellite metadata can be designed from this satellite data file header.

CEOP satellite metadata element
We can define the mapping between the CEOP items in the header and the proposed imagery metadata model. From the mapping results, CEOP metadata elements are designed as shown in Table 1. Some additional metadata elements are also required to describe CEOP-specific information about the metadata set, identification, citation, and responsible party. Table 2 gives the additional items required.

Figure 1. Imagery metadata structure and content

Figure 2. UML model of core metadata elements (from ISO 19115)

contact::MD_Metadata − party responsible for the metadata information
dateStamp::MD_Metadata − date when the metadata was created
citation::MD_Identification − citation data for the resources
abstract::MD_Identification − brief summary of the content of the resources
language::MD_DataIdentification − language(s) used within the dataset
topicCategory::MD_DataIdentification − main themes of the dataset
title::CI_Citation − name by which the cited resource is known
date::CI_Citation − reference date for the cited resource
individualName::CI_ResponsibleParty − name of the responsible person
organisationName::CI_ResponsibleParty − name of the responsible organization
positionName::CI_ResponsibleParty − role or position of the responsible person
contactInfo::CI_ResponsibleParty − address of the responsible party
role::CI_ResponsibleParty − function performed by the responsible party
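Because the core elements are meant to be present in every design, a metadata record can be checked against them before it is distributed. The sketch below is illustrative: the nested-dictionary record layout and the function `missing_core_elements` are assumptions, and all 13 elements are treated as required for simplicity.

```python
# The 5 core classes and 13 core elements listed with Figure 2.
CORE_ELEMENTS = {
    "MD_Metadata": ("contact", "dateStamp"),
    "MD_Identification": ("citation", "abstract"),
    "MD_DataIdentification": ("language", "topicCategory"),
    "CI_Citation": ("title", "date"),
    "CI_ResponsibleParty": ("individualName", "organisationName",
                            "positionName", "contactInfo", "role"),
}


def missing_core_elements(record):
    """Return the core elements that are absent or empty in a record.

    record: mapping of class name -> mapping of element name -> value
    (a hypothetical layout chosen for this example).
    """
    missing = []
    for metadata_class, names in CORE_ELEMENTS.items():
        present = record.get(metadata_class, {})
        for name in names:
            if present.get(name) in (None, ""):
                missing.append(f"{name}::{metadata_class}")
    return missing


record = {
    "MD_Metadata": {"contact": "data center", "dateStamp": "2005-01-01"},
    "CI_Citation": {"title": "sample product", "date": ""},
}
gaps = missing_core_elements(record)
```

For the sample record, `gaps` lists every core element except `contact`, `dateStamp`, and `title`, including `date::CI_Citation`, whose value is empty.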

Figure 3. UML model of specific metadata elements (an example)

Figure 4 presents the UML model of extended metadata elements for imagery, which is proposed based on the ISO 19115 metadata standard, the ISO 19115-2 metadata extension for imagery, and the ISO 19130 sensor model. It includes six main metadata classes as follows. MD_Metadata defines the imagery metadata. MD_Identification identifies the imagery data, including a graphic overview of the data, keywords describing the resource, and data identification. DQ_DataQuality describes an assessment of the quality of the dataset as well as sources and production processes used in producing a dataset. MD_SpatialRepresentation concerns the mechanisms used to represent spatial information in a dataset. MD_ReferenceSystem describes the spatial and temporal reference systems used in a dataset. MD_Content identifies the feature catalogue used and content of the coverage dataset. SD_GeolocationInformation defines the geographic location corresponding to image location.

Figure 5. UML model of identification information for imagery

Figure 6. UML model of data quality information for imagery

Figure 7. UML model of spatial representation information for imagery

Figure 8. UML model of geolocation information for imagery

Figure 9. UML model of reference system information for imagery

attributeDescription::MD_CoverageDescription − description of the attribute described by the measurement value
contentType::MD_CoverageDescription − type of information represented by the cell value
sequenceIdentifier::MD_RangeDimension − number that uniquely identifies instances of bands of wavelengths on which a sensor operates
descriptor::MD_RangeDimension − description of the range of a cell measurement value
maxValue::MD_Band − longest wavelength that sensor is capable of collecting within a designated band
minValue::MD_Band − shortest wavelength that sensor is capable of collecting within a designated band
units::MD_Band − units in which sensor wavelengths are expressed
peakResponse::MD_Band − wavelength at which the response is the highest
bitsPerValue::MD_Band − maximum number of significant bits in the uncompressed representation for the value in each band of each pixel
toneGradation::MD_Band − number of discrete numerical values in the grid data
scaleFactor::MD_Band − scale factor which has been applied to the cell value
offset::MD_Band − the physical value corresponding to a cell value of zero
bandBoundaryDefinition::MI_Band − designation of criterion for defining maximum and minimum wavelengths for a spectral band
nominalSpatialResolution::MI_Band − smallest distance between which separate points can be distinguished
polarisation::MI_Band − polarisation of the transmitter or detector
illuminationElevationAngle::MD_ImageDescription − illumination elevation measured in degrees
illuminationAzimuthAngle::MD_ImageDescription − illumination azimuth measured in degrees
imagingCondition::MD_ImageDescription − conditions affecting the image
imageQualityCode::MD_ImageDescription − specification of the image quality
cloudCoverPercentage::MD_ImageDescription − area of the dataset obscured by clouds
processingLevelCode::MD_ImageDescription − image distributor's code that identifies the level of radiometric and geometric processing
compressionGenerationQuantity::MD_ImageDescription − count of the number of lossy compression cycles performed on the image
triangulationIndicator::MD_ImageDescription − indication of whether or not triangulation has been performed upon the image
radiometricCalibrationDataAvailability::MD_ImageDescription − indication of availability of the radiometric calibration information for generating the radiometrically calibrated standard data product
cameraCalibrationInformationAvailability::MD_ImageDescription − indication of availability of constants that allow for camera calibration corrections
filmDistortionInformationAvailability::MD_ImageDescription − indication of availability of Calibration Reseau information
lensDistortionInformationAvailability::MD_ImageDescription − indication of availability of lens

Figure 10. UML model of content information for imagery

Figure 11. Metadata elements design

4 CASE STUDY: CEOP SATELLITE METADATA
CEOP (Toshio, 2005) is currently establishing an integrated global observing system of water and energy cycles for both scientific and social needs, which necessitates integrating various CEOP satellite imagery data. To facilitate the accessibility of the data collected from information sources and maximize their retrieval and

ISO 19115 and ISO 19115 part 2 do not provide the information required for geolocation of an image. ISO 19130 specifies the information required to support geolocation and sensor properties of georeferenced imagery. It defines how the sensor measurements and the geolocation information are logically associated. The georeferencing information in ISO 19130 is a subset of the georeferencing description of ISO 19115 part 2, and that area of ISO 19130 should be associated with ISO 19115 part 2. In order to develop a full set of imagery metadata, it is necessary to combine the relevant parts of the ISO metadata standards (ISO 19115, ISO 19115 part 2) with geolocation information or sensor properties from the ISO imagery standard (ISO 19130), or both.

Table 1. CEOP metadata elements for satellite imagery data

Table 2. Additional metadata elements for CEOP satellite imagery metadata

Based on Table 1 and Table 2, the metadata element list can be further developed. As given below, this list is provided in a tabbed-outline format. It presents the hierarchical structure of CEOP satellite metadata.
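As an illustration of the tabbed-outline format only, the following sketch renders a nested class and element hierarchy as tab-indented lines. The function `tabbed_outline` is hypothetical, and the small hierarchy passed to it is an illustrative subset of element names from the model, not the CEOP metadata element list.

```python
def tabbed_outline(node, depth=0):
    """Render a nested mapping as lines indented with one tab per level."""
    lines = []
    for name, children in node.items():
        lines.append("\t" * depth + name)
        lines.extend(tabbed_outline(children, depth + 1))
    return lines


# Illustrative subset only; leaves are represented by empty mappings.
sample = {
    "MD_Metadata": {
        "contact": {},
        "dateStamp": {},
        "MD_Identification": {"citation": {}, "abstract": {}},
    }
}
outline = "\n".join(tabbed_outline(sample))
```

Each level of nesting adds one tab, so the class to which an element belongs can be read directly from its indentation.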