Unified Geomagnetic Database from Different Observation Networks for Geomagnetic Hazard Assessment Tasks

This paper presents the results of the creation of a geomagnetic data storage system that combines raw observation data from different geomagnetic observation networks along with derived indices and indicators of geomagnetic activity. Geomagnetic data, provided by observation networks, undergo a series of data quality validation procedures. The implemented instruments and procedures facilitate the creation of a uniform database from multiple sources.


Introduction
Observations of the Earth's magnetic field parameters are essential for fundamental research into solar-terrestrial interactions. Another significant application is supporting the stable operation of technical systems and industrial facilities and assessing the risks of their exposure to extreme space weather events. Disturbances in the Earth's magnetosphere and ionosphere are known to cause failures in radio transmission infrastructure and satellite navigation systems, as well as damage to industrial structures such as power and communication lines, power stations, oil and gas pipelines, and railroad equipment. Therefore, evaluating the possible characteristics of geomagnetic disturbances and geomagnetically induced currents (GICs) for a given region is an important task. Pirjola (2002) describes the occurrence of erroneous traffic signals on railways caused by GICs. Other studies have reported the effects of GICs on electric power grids (Belakhovsky et al. 2019), trans-oceanic cables (Lanzerotti et al. 1995) and pipelines (Pulkkinen et al. 2001).
To determine the various parameters of geomagnetic activity for given territories, access to long time series of geomagnetic measurements of sufficiently good quality is required. Geomagnetic observatories and stations are included in various observational networks such as INTERMAGNET, SuperMAG, IMAGE, etc. They have different spatial coverage and provide data of varying quality in diverse formats with different orientations of magnetometric instruments. These aspects complicate combining data into a single database in a uniform format. This article is dedicated to solving this problem. It describes a unified database that integrates data from different geomagnetic observation networks and brings them into a uniform format that is convenient for further analysis.
In addition to direct observational data, another significant source of information on the Earth's magnetosphere is the set of geomagnetic activity indices (e.g., Kp, Dst, AE, SYM). These global indices, however, are insufficient to estimate GICs. It is necessary to construct regional indices that more accurately account for geomagnetic field disturbances in the regions of interest (Viljanen, Wintoft & Wik 2015). This, in turn, requires integrating information from multiple observational networks located within the studied region into a single database, and hence adequate tools for aggregating observational data into a unified data storage. Besides the routine conversion between the data formats used in different repositories, for certain networks it is also necessary to transform the reference orientation of the magnetic observations, which are recorded as orthogonal components of the magnetic field vector.
Estimating possible GIC values requires constructing regional indicators of geomagnetic activity based on geomagnetic observations. It is also necessary to select, from the various indicators, those that allow the best estimation of the emerging currents. To facilitate the calculation of such indicators from the initial data, it is convenient to organize data storage in a relational database instead of the file storage systems still used in many geomagnetic data repositories. In this article we present solutions to the problems described above.

Data Sources
For more convenient interaction with data on the Earth's magnetic field, it is important to have access to a uniform database of initial geomagnetic data (magnetograms), which consist of measurements of the orthogonal components of the magnetic field vector. Magnetometers installed at geomagnetic observatories or stations can be oriented with reference to the geographic (X: North; Y: East; Z: Vertical) or the geomagnetic (H: Magnetic North; E: Magnetic East; Z: Vertical) coordinate system. The latter depends on the location of the geomagnetic pole, which changes with time. Data from magnetic observatories, where the full magnetic field vector components are determined, can be easily transformed between the two systems. Magnetic stations provide only variations of the magnetic vector components, and in this case the transformation of observations requires additional calculations. To facilitate further processing and analysis of data from both observatories and stations, it is convenient to store them in the geographic reference frame, which is stable in time.
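The rotation between the two horizontal frames can be illustrated with a minimal Python sketch. It assumes that the angle between magnetic north and geographic north at the site (the declination) is known and can be treated as constant over the interval being converted; the function name and argument names are hypothetical and do not reproduce the interface of any particular library.

```python
import math

def hez_to_xyz(h, e, z, declination_deg):
    """Rotate geomagnetic (H, E, Z) components into geographic (X, Y, Z).

    declination_deg is an assumed, externally supplied angle (degrees,
    positive eastward) between magnetic north and geographic north.
    """
    d = math.radians(declination_deg)
    x = h * math.cos(d) - e * math.sin(d)  # northward component
    y = h * math.sin(d) + e * math.cos(d)  # eastward component
    return x, y, z  # the vertical component is unchanged by the rotation

# Illustrative values only (nT); not taken from any observatory record.
x, y, z = hez_to_xyz(15000.0, 20.0, 50000.0, declination_deg=9.0)
```

For stations that report only variations, the same rotation can be applied to the variations themselves, provided a reasonable estimate of the declination is available; obtaining that estimate is the additional calculation mentioned above.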
The spatial coverage of the database described in this paper is the territory of Russia and neighboring countries.
The main source of observatory geomagnetic data is the international network of geomagnetic observatories INTERMAGNET (INTERMAGNET 2020). We selected 27 observatories from this network located in Russia and neighboring countries. Data ingested before quality control are considered 'preliminary'. These data include missing values and disturbances caused by man-made interference. Observational data that have undergone baseline correction, removal of disturbances, and filling of data gaps are considered 'definitive'. Where available, definitive data are preferred. For the selected INTERMAGNET observatories, definitive data from 1991 to 2019 were obtained.
The SuperMAG network was chosen as the main source of data for magnetic stations located in the northern regions (Gjerloev 2012; SuperMAG 2020). We selected 76 SuperMAG stations located in Russia and neighboring countries. A map of the selected stations (Figure 1) is published on a GIS server and is available on the Internet as a cartographic web service (Map of Geomagnetic Observatories and Stations 2018). For the selected stations, data for the entire available observation period from 1980 to 2019 were used.

Database Structure
The database was created in two versions: a file storage of initial data and a MySQL relational database. The standard IAGA-2002 file format (IAGA-2002 2016) and the XYZ orientation of magnetic field components were chosen as the data format. Files obtained from different geomagnetic observation networks in different formats and orientations were converted to the IAGA-2002 file format and, where necessary, the magnetic field component variations were converted to the XYZ coordinate system. For data conversion, the freely available Geomag Algorithms library (Geomag Algorithms 2020) was used. We added to the library the functionality for reading the required data formats and for converting the coordinate system to XYZ. After conversion, the data were loaded into the MySQL relational database.
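To make the target format concrete, the following Python sketch reads a small IAGA-2002 fragment into memory. It is a simplified illustration written for this text, not the reader implemented in the Geomag Algorithms library: it assumes the usual fixed-width header layout (label in the first 24 characters, a closing "|"), treats the sentinel values 99999 (missing) and 88888 (not recorded) as gaps, and uses made-up sample values.

```python
from datetime import datetime

GAP_SENTINEL = 88888.0  # IAGA-2002: 88888 = not recorded, 99999 = missing

def parse_iaga2002(text):
    """Return (header dict, component column names, list of (time, values))."""
    header, columns, rows = {}, [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.rstrip().endswith("|"):          # header or column-title line
            body = line.rstrip()[:-1]
            if body.startswith("DATE"):
                columns = body.split()[3:]       # skip DATE, TIME, DOY
            elif not body.lstrip().startswith("#"):
                header[body[:24].strip()] = body[24:].strip()
            continue
        parts = line.split()
        stamp = datetime.strptime(parts[0] + " " + parts[1],
                                  "%Y-%m-%d %H:%M:%S.%f")
        values = [None if float(v) >= GAP_SENTINEL else float(v)
                  for v in parts[3:]]
        rows.append((stamp, dict(zip(columns, values))))
    return header, columns, rows

# Illustrative fragment with invented values.
sample = "\n".join([
    f" {'Format':<23}{'IAGA-2002':<45}|",
    f" {'IAGA CODE':<23}{'NUR':<45}|",
    "DATE       TIME         DOY     NURX      NURY      NURZ      NURF   |",
    "2015-03-17 17:35:00.000 076     14500.10   2500.20  50000.30  99999.00",
])
header, columns, rows = parse_iaga2002(sample)
```

Mapping the sentinel values to None at read time keeps gaps explicit, so later stages do not mistake them for field values.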
The structure of the developed relational database rests upon one of the data storage systems described by Gvishiani et al. (2016). But in contrast to its predecessor versions it has significant alterations that allow storing data from different observation networks. Evaluation of the completeness of observational data was also performed. Initial geomagnetic measurements are stored along with geomagnetic activity indices and indicators. Geomagnetic activity indices were obtained from external sources and geomagnetic activity indicators were calculated using initial measurements. The database structure is shown in Figure 2.
The database consists of the following tables:
• ref: reference information,
• datasource_types: types of data sources,
• pre_min: preliminary 1-minute data,
• pre_10sec: preliminary 10-second data,
• def_min: definitive 1-minute data,
• pre_min_avail: time intervals of availability of preliminary 1-minute data,
• pre_10sec_avail: time intervals of availability of preliminary 10-second data,
• def_min_avail: time intervals of availability of definitive 1-minute data,
• index_min: indices of geomagnetic activity in packed binary format,
• index_min_plain: indices of geomagnetic activity without packing,
• ind: indicators of geomagnetic activity,
• ind_types: types of indicators of geomagnetic activity,
• grades: graded scales of geomagnetic activity indicators,
• sq: values of solar quiet variation,
• files_log: source data files loaded into the database,
• test_fc: test information about the number of missing values in the data.
The data themselves are contained in the tables pre_min, pre_10sec and def_min. To increase query speed, data are stored in an hourly-packed binary format. An important element of the storage design is the datasource field, which holds the data source code. This makes it possible to store data from different sources for the same observatory or station. A user can then either access all data for an observatory or station regardless of source, or request data from one specific source only. This is a distinctive feature of the proposed data storage system. The data sources themselves are listed in the datasource_types table.
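The two ideas in this paragraph, hourly packing and the datasource field, can be sketched in Python as follows. The sketch is illustrative only: the actual binary layout and column names of the database are not specified here, so the 60 x float64 record, the table name demo_min, its columns other than datasource, and the source codes 1 and 2 are all assumptions, and an in-memory SQLite database stands in for MySQL.

```python
import sqlite3
import struct

# Assumed layout: one hour of 1-minute data = 60 little-endian float64 values.
HOUR = struct.Struct("<60d")

def pack_hour(values):
    """Pack 60 one-minute values into a single binary blob."""
    return HOUR.pack(*values)

def unpack_hour(blob):
    """Restore the 60 one-minute values from a packed blob."""
    return list(HOUR.unpack(blob))

conn = sqlite3.connect(":memory:")  # SQLite used here in place of MySQL
conn.execute(
    "CREATE TABLE demo_min ("
    " station TEXT, datasource INTEGER, hour_start TEXT, x BLOB,"
    " PRIMARY KEY (station, datasource, hour_start))"
)

hour = [14500.0 + 0.5 * i for i in range(60)]  # invented X values, nT
for source_code in (1, 2):                     # hypothetical source codes
    conn.execute(
        "INSERT INTO demo_min VALUES (?, ?, ?, ?)",
        ("NUR", source_code, "2015-03-17T17:00", pack_hour(hour)),
    )

# All sources for one station, or one specific source only.
all_sources = conn.execute(
    "SELECT datasource, x FROM demo_min WHERE station = ? ORDER BY datasource",
    ("NUR",),
).fetchall()
one_source = conn.execute(
    "SELECT x FROM demo_min WHERE station = ? AND datasource = ?",
    ("NUR", 2),
).fetchall()
```

Because the source code is part of the key, the same station and hour can be stored once per network without conflict, and a query for a whole day touches 24 rows rather than 1440.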
Along with the initial observational data, the database also hosts indices and indicators of geomagnetic activity based on these data. To store indices, which are ready-made from other data sources, the index_min and index_min_plain tables are used to store packed and unpacked data, respectively. To store the indicators of geomagnetic activity, which are calculated in the system itself, the ind table is used. Indicator types are stored in the ind_types table. The following types of geomagnetic activity indicators are calculated and stored in the database: amplitude, rate of change dB/dt, the measure of anomalousness (Soloviev, Agayan & Bogoutdinov 2016). For each indicator and each observatory or station, an individual scale of geomagnetic activity is calculated using the available data. The scales calculated in this way are stored in the grades table.
The presence of such scales makes it possible to single out moments of geomagnetic activity uniformly and simultaneously at different observation points and for different indicators.
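A minimal Python sketch of two of these indicators and of a station-specific scale is given below. The amplitude and dB/dt definitions are the usual ones (range within a window and finite difference between consecutive samples); the way the scale is built here, from quantiles of the station's own history, is an assumption made for illustration and is not necessarily the procedure used in the database. The input is assumed to be free of gaps.

```python
def rate_of_change(values, step_minutes=1.0):
    """Finite-difference dB/dt (nT/min) between consecutive samples."""
    return [(b - a) / step_minutes for a, b in zip(values, values[1:])]

def amplitude(values):
    """Range of the component within the window (nT)."""
    return max(values) - min(values)

def build_scale(history, quantiles=(0.5, 0.9, 0.99)):
    """Thresholds taken from quantiles of a station's own indicator history.

    The quantile levels are hypothetical defaults chosen for this sketch.
    """
    ordered = sorted(history)
    last = len(ordered) - 1
    return [ordered[min(last, int(q * len(ordered)))] for q in quantiles]

def grade(value, scale):
    """Number of thresholds reached: 0 means quiet, len(scale) means extreme."""
    return sum(value >= threshold for threshold in scale)

# Invented one-minute X values (nT) for a short window.
window = [14500.0, 14503.0, 14498.0, 14510.0]
rates = [abs(r) for r in rate_of_change(window)]
scale = build_scale(rates, quantiles=(0.5,))
grades = [grade(r, scale) for r in rates]
```

Since each station is graded against its own history, a given grade marks a comparably unusual disturbance at a high-latitude station and at a mid-latitude one, even though the absolute values differ.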
Since we do not have permission to provide online access to data from all used observational networks yet, only the online service for determining the availability of data stored in the database is now available (RSF4 2020). In the future, after obtaining the necessary permissions, it is planned to organize online access to the database similar to RUGDC (2020).

Data Quality Assessment
To assess the quality of data from the different observation networks integrated into the database, the data sources have been ranked according to the information on processing and verification of the geomagnetic observations. The evaluation of the data source was based on the following criteria:
1. Use of automatic algorithms to control and correct data quality (computer processing: 1 point).
2. Carrying out manual verification of data by qualified specialists (expert processing: 1 point).
3. Availability of a document describing the recommendations on data quality (data requirements: 1 point).
4. Data verification by an independent expert (data verification: 1-2 points).
The values of the criteria for the INTERMAGNET network are obtained from INTERMAGNET (2020) and St-Louis et al. (2012). The INTERMAGNET data is evaluated in three stages by independent experts from around the world. The data quality requirements of the IMAGE network are described by Viljanen & Hakkinen (1997). Table 1 shows the data sources used to create the database, with their evaluation according to the above criteria.
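The scoring scheme amounts to a simple sum over the four criteria, which can be written down in Python as follows. The class and field names are hypothetical, and the example instance is invented to show the arithmetic; it does not reproduce the scores of any network in Table 1.

```python
from dataclasses import dataclass

@dataclass
class SourceAssessment:
    computer_processing: bool      # criterion 1: 1 point
    expert_processing: bool        # criterion 2: 1 point
    data_requirements: bool        # criterion 3: 1 point
    verification_points: int = 0   # criterion 4: 0 if absent, otherwise 1 or 2

    def score(self):
        if self.verification_points not in (0, 1, 2):
            raise ValueError("verification_points must be 0, 1 or 2")
        return (int(self.computer_processing)
                + int(self.expert_processing)
                + int(self.data_requirements)
                + self.verification_points)

# Invented example: automatic and manual checks, no documented
# requirements, single-stage independent verification.
example = SourceAssessment(True, True, False, verification_points=1)
total = example.score()
```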
Lack of expert processing and verification of data may lead to both omissions of abnormal perturbations in the data and removal of periods containing natural field changes. Figure 3 shows a comparison between the data provided by SuperMAG and the definitive INTERMAGNET data for the horizontal X component of the Nurmijärvi (NUR) Observatory. As can be seen from the figure, there is a gap in SuperMAG data from 17:35 to 17:40 March 17, 2015. This gap is most likely associated with the false detection of man-made disturbances by the automatic SuperMAG algorithm.
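Gaps of this kind can be located automatically. The Python sketch below scans a one-minute series in which missing samples are represented as None and returns the first and last timestamp of each gap; it could serve equally for comparing sources and for filling availability tables. The function name is hypothetical and the series is synthetic, constructed only to mimic a gap of the kind described above.

```python
from datetime import datetime, timedelta

def find_gaps(samples):
    """Return (first_missing, last_missing) timestamps for each run of None."""
    gaps, start, last = [], None, None
    for stamp, value in samples:
        if value is None:
            if start is None:
                start = stamp      # a new gap begins here
            last = stamp
        elif start is not None:
            gaps.append((start, last))
            start = None
    if start is not None:          # series ends inside a gap
        gaps.append((start, last))
    return gaps

# Synthetic series: 17:30 to 17:45 with samples missing from 17:35 to 17:40.
t0 = datetime(2015, 3, 17, 17, 30)
series = []
for i in range(16):
    stamp = t0 + timedelta(minutes=i)
    series.append((stamp, None if 5 <= i <= 10 else 14500.0 + i))

gaps = find_gaps(series)
```

Running the same scan on two sources for the same station and interval, and comparing the resulting lists, shows the periods where one source has removed data that the other retains.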

Conclusion
When working with geomagnetic data, it is necessary to have the data in a uniform format and a uniform coordinate system. However, existing geomagnetic observation networks have different spatial coverage and use different data formats and coordinate systems. The quality of data provided by different networks also varies considerably. Therefore, we have created a uniform geomagnetic database that combines data from different observation networks in a single format and coordinate system. The database also makes it possible to differentiate data from the different networks, so that the quality of the data they provide can be taken into account.