The Study of Time Series Using the DMA Methods and Geophysical Applications

Abstract

The discrete mathematical analysis (DMA) is a series of algorithms aimed at the solution of basic problems of data analysis: clustering and tracing in multidimensional arrays, morphological analysis of reliefs, search for anomalies and trends in records etc. All the DMA algorithms are of universal nature, joined by the same formal foundation, based, in its turn, on fuzzy logic (FL) and fuzzy mathematics (FM).

The current study finalizes the search for the anomalies in one-dimensional time series within the scope of DMA: here the initial concept of an interpreter’s logic gets its additional development. First, the formal expert’s opinions are more fully expressed, and this is realized with the more complex measures of activity (the concept of straightenings (Gvishiani et al. 2003; Gvishiani et al. 2004; Zlotnicki et al. 2005) is replaced by the measures of activity which come to the fore): second, for the junction of anomalies, a recently created DPS (Discrete Perfect Sets) algorithm is used DPS (Discrete Perfect Sets) (Agayan et al. 2011; Agayan et al. 2014).

DOI: http://doi.org/10.5334/dsj-2016-016
Accepted on 10 Oct 2016            Submitted on 20 Jun 2016

A short summary of the study

We will now describe, in a semi-formal way, what the study is dedicated to, and what will be carried out in it.

Let there be a time series f, defined on a finite, in common case, irregular set of nodes T = {t1 < … < tN}. The time series f is analyzed by an expert ε.

It is supposed that the expert ε has a point of view on the dynamic pattern of f in every node tT, and a number of questions is posed at this time. Let us formulate the first two questions:

Question 1: To what extent μεf(t) ∈ [0, 1] is the dynamic pattern of the time series f the expert ε is interested in, expressed (how active is it) in the node t?

Question 2: How is the dynamic pattern of the time series f, the ε is interested in, alternating at the time moment t is the measure μεf increasing (decreasing) in the node t or passing through an extremal transition?

The answers for this questions form the basis of what is called a monitoring of activity of the time series f with the expert ε point of view. The measure of activity μεf gives the key to the spot ε-anomalies: ε-anomalous moments of f are the moments t where μεf(t) ≥ 0.75 (if some feature is expressed with the scale of [0, 1], then it’s a matter of course to consider its occurrences within the interval [0.75, 1] as anomalous).

Let us denote LA(μεf) the set of ε-anomalous nodes:

$LA\text{\hspace{0.17em}}\left(\text{\hspace{0.17em}}\mu {\epsilon }_{f}\right)=\text{{}t\in T:\mu {\epsilon }_{f}\left(t\right)\ge 0.75\right\}$

Forming of LA(μεf) is also included into the problem of activity monitoring.

Due to the randomness of the time series f, its local anomalous nodes LA can comprise a subset in T which demands extra work (preliminary topological filtering (Gvishiani et al. 2003) and further clustering (Mikhailov et al. 2003; Agayan et al. 2004; Soloviev et al. 2005)). The point is that the connected side-by-side and sufficiently large excesses of the measure μεf over the anomaly level 0.75 are of the main interest as it is them what should be considered expressions (traces) of global anomalies of f from the perspective of ε.

Let us denote A(μεf) their plurality:

$A\left(\mu {\epsilon }_{f}\right)=\left\{{A}_{1},\dots ,{A}_{L}\right\}$

The transition LA(μεf) → A(μεf) comes with the help of the DPS. algorithm. This is how the answer is obtained for the question 3.

Question 3: Which fragments of the time series f are considered interesting from the perspective of the expert ε?

The last question is formulated this way:

Question 4: How to compare and rank the ε-anomalies of f?

DMA includes a corresponding implement which makes it possible to compare the fragments to each other and rank them (Gvishiani et al. 2008; Gvishiani et al. 2008), it will be told about that further. Here we shall list a couple of examples for the quantitative indicators of the anomalies Ai, inducing their ranking.

Example 1 Massiveness (continuity) mAi of an anomaly Ai: mAi = |Supp Ai| number of nodes in the carrier of Ai.

Example 2 Activity μAi of an anomaly

The summary will follow the scheme displayed on Fig. 1, in the simplest suppositions, however, fully keeping the thoroughness of the analysis of a time series announced above.

Figure 1

The scheme of the analysis of a time series using the FL methods.

A finite time series f (initial data)

Period of observation T – the finite regular number of nodes

Function f – a time series defined on T

Localization of space T in the node t is realized with the fuzzy structures δt on T, normalized at t and decreasing with the increase of the distance from t:

The δt structures are called the measures of closeness on T to the node t. They center T around t by assigning the weight functions ${\delta }_{t}\left(\overline{t}\right)$ to the nodes $\overline{t}$, act as surroundings for t in T, and formalize the view of the expert ε on T at t (Gvishiani et al. 2003; Gvishiani et al. 2004).

In the examples given below, p stand for a non-negative parameter responsible for the scale of δ-localization: the higher p is, the stronger the localization is: in particular, p = 0 results in the global point of view on T, the same for each node.

Example 3

1. The global measure of closeness (Fig. 2)
2. The local measure of closeness (Fig. 3)

Figure 2

The global measure of closeness.

Figure 3

The local measure of closeness.

An expert ε (formalization)

Dynamical indicator D – a non-negative functional on T, parametrized by T

where 𝔉(T) is a function space on T.

The values of the function D(f, t) are denoted Ff(t) and regarded as quantitative estimates of the behavior of the function f in a node tT according to the approach D to its dynamics.

There exists an open to supplementation set of basic indicators which showed a good performance in practice. Let us list some of them.

Example 4

1. Energy
(1)
${E}_{f}\left(t|\delta \right)=\frac{\sum {}_{\overline{t}\in T}|f\left(\overline{t}\right)-{M}_{f}\left(t|\delta \right)|{\delta }_{t}\left(\overline{t}\right)}{\sum {}_{\overline{t}\in T}{\delta }_{t}\left(\overline{t}\right)}$

where
${M}_{f}\left(t|\delta \right)=\frac{\sum {}_{\overline{t}\in T}f\left(\overline{t}\right){\delta }_{t}\left(\overline{t}\right)}{\sum {}_{\overline{t}\in T}{\delta }_{t}\left(\overline{t}\right)}$
2. Jaggedness
(2)
${L}_{f}\left(t|\delta \right)=\frac{\sum {}_{\overline{t}\in T}|{f}_{g}^{\prime }\left(\overline{t}\right)|{\delta }_{t}\left(\overline{t}\right)}{\sum {}_{\overline{t}\in T}{\delta }_{t}\left(\overline{t}\right)}$

where ${f}_{g}^{\prime }\left(\overline{t}\right)$ is the f regression derivative with respect to the closeness δ
${f}_{g}^{\prime }\left(\overline{t}\right)=\frac{|\begin{array}{cc}\sum {}_{\stackrel{˜}{t}\in T}{\delta }_{\overline{t}}\left(\stackrel{˜}{t}\right)f\left(\stackrel{˜}{t}\right)\stackrel{˜}{t}& \sum {}_{\stackrel{˜}{t}\in T}{\delta }_{\overline{t}}\left(\stackrel{˜}{t}\right)\stackrel{˜}{t}\\ \sum {}_{\stackrel{˜}{t}\in T}{\delta }_{\overline{t}}\left(\stackrel{˜}{t}\right)\stackrel{˜}{t}& \sum {}_{\stackrel{˜}{t}\in T}{\delta }_{\overline{t}}\left(\stackrel{˜}{t}\right)\end{array}|}{|\begin{array}{cc}\sum {}_{\stackrel{˜}{t}\in T}{\delta }_{\overline{t}}\left(\stackrel{˜}{t}\right){\stackrel{˜}{t}}^{2}& \sum {}_{\stackrel{˜}{t}\in T}{\delta }_{\overline{t}}\left(\stackrel{˜}{t}\right)\stackrel{˜}{t}\\ \sum {}_{\stackrel{˜}{t}\in T}{\delta }_{\overline{t}}\left(\stackrel{˜}{t}\right)\stackrel{˜}{t}& \sum {}_{\stackrel{˜}{t}\in T}{\delta }_{\overline{t}}\left(\stackrel{˜}{t}\right)\end{array}|}$
3. Scatterness
(3)

where q > 0.

The dynamical indicators are related to the very natural and important but the simplest points of view of the expert ε on the dynamics of the time series f. The general case is combined from the simple ones, plus, the composition can’t be straight and mechanical. The point is that, initially, the dynamical indicators Df for different D are not comparable to each other and their junction additional efforts by means of the measures of maximality or dynamical activity of the indicator Df.

Measure of maximality mesmax (constructions)

Measure of dynamical activity μDf is a function membership to the fuzzy concept of the “activity of the function f in the node t from the perspective of the indicator D” on T (Gvishiani et al. 2008; Gvishiani et al. 2008; Gvishiani et al. 2008c). The measure of activity μDf(t) results from the Df(t) indicator using one or another construction of the measure of maximality mesmax.

The measure of maximality mesmaxB(x) is a fuzzy structure on ℝ, which answers the following question for a finite weighted arrangement $B=\left\{\left({b}_{k},{\omega }_{k}\right){|}_{k=1}^{M},{\omega }_{k}>0\right\}$: “To what degree the number x is big modulo B?” In DMA, for a mesmax measure there are four constructions: “fuzzy comparisons”, “Kolmogorov averages”, “fuzzy means”, “iteration bounds” (Gvishiani et al. 2004; Zlotnicki et al. 2005; Gvishiani et al. 2008).

For a measure μDf(t), the (Im Df, δt), which is an image of Df centered in the node t with the help of the measure of closeness δt, acts as B:

and

(4)

Example 5

• 1. The “fuzzy comparison” construction: binary version.
• 2. The “fuzzy comparison” construction: gravitational version.

where
$\overline{{D}_{f}\left(t\right)}=\frac{\sum {}_{\overline{t}\in T}{D}_{f}\left(\overline{t}\right){\delta }_{t}\left(\overline{t}\right)}{\sum {}_{\overline{t}\in T}{\delta }_{t}\left(\overline{t}\right)}$
• 3. The “fuzzy comparison” construction: sigma version.
$\mu {D}_{f}\left(t\right)=\frac{{\sigma }^{r}\left({D}_{f}\left(t\right)\right)-{\sigma }^{l}\left({D}_{f}\left(t\right)\right)}{{\sigma }^{r}\left({D}_{f}\left(t\right)\right)+{\sigma }^{l}\left({D}_{f}\left(t\right)\right)}\text{\hspace{0.17em}}$

where
$\begin{array}{l}{\sigma }^{l}\left({D}_{f}\left(t\right)\right)=\sum {}_{\overline{t}\in T}\left({D}_{f}\left(t\right)-{D}_{f}\left(\overline{t}\right)\right){\delta }_{t}\left(\overline{t}\right):{D}_{f}\left(\overline{t}\right)<{D}_{f}\left(t\right)\\ {\sigma }^{r}\left({D}_{f}\left(t\right)\right)=\sum {}_{\overline{t}\in T}\left({D}_{f}\left(\overline{t}\right)-{D}_{f}\left(t\right)\right){\delta }_{t}\left(\overline{t}\right):{D}_{f}\left(\overline{t}\right)<{D}_{f}\left(t\right)\end{array}$
• 4. The “Kolmogorov averages” construction.

where p* is the solution of the equation
${\left(\frac{\sum {}_{\overline{t}\in T}f{\left(\overline{t}\right)}^{p*}{\delta }_{t}\left(\overline{t}\right)}{\sum {}_{\overline{t}\in T}{\delta }_{t}\left(\overline{t}\right)}\right)}^{1/p*}={D}_{f}\left(t\right)$

Remark 1 The listed measures of activity μDf possess the values within [–1, 1]. The proper measures of activity possess the values within [0, 1] and result from the transformation$\mu {D}_{f}\to \frac{\mu {D}_{f}+1}{2}$.

Measure of activity μεf (constructions)

The transition DfμDf translates the function f analysis into the fuzzy logic and fuzzy mathematics language (Gvishiani et al. 2003): the measures of activity μDf for different D possess the values within the unified scale of the interval [0, 1] and can be combined in any proportions and quantities using numerous FL operations and averagings of any type which will be denoted *. It becomes possible to instill the content to the complicated activity of f by the plurality of dynamical indicators 𝔇 in the node tT:

(5)
$\mu {\mathcal{D}}_{f}\left(t\right)={*}_{D\in \mathcal{D}}\left(\mu {D}_{f}\left(t\right)\right)$

It is the construction (1) by which the view of expert ε on the dynamics of f is commonly modeled (Gvishiani et al. 2004). Such modeling is an art in many respects, and it involves selection of basic dynamical indicators 𝔇(ε) and their appropriate junction *(ε):

(6)
$\mu {\epsilon }_{f}\left(t\right)=*{\left(\epsilon \right)}_{D\in \mathcal{D}\left(\epsilon \right)}\left(\mu {D}_{f}\left(t\right)\right)$

Example 6

1. Junctions * using the fuzzy conjunctionmake it possible to consider all the basis dynamics from 𝔇(ε) simultaneously:

(7)
$\mu {\epsilon }_{f}\left(t\right)={\top }_{D\in \mathcal{D}\left(\epsilon \right)}\left(\mu {D}_{f}\left(t\right)\right)$

2. Junctions * using the fuzzy disjunctionmake it possible for every basic dynamic from 𝔇(ε) to approve itself:

(8)
$\mu {\epsilon }_{f}\left(t\right)={\perp }_{D\in \mathcal{D}\left(\epsilon \right)}\left(\mu {D}_{f}\left(t\right)\right)$

3. Let us suppose that the expert ε gives initial precedence on the basic dynamics from 𝔇(E) in the form of weights ω(D), then one of possible junctions *, able to consider this fact will be the weighted averaging

(9)
$\mu {\epsilon }_{f}\left(t\right)=\frac{{\sum }_{D\in \mathcal{D}\left(\epsilon \right)}\omega \left(D\right)\mu {D}_{f}\left(t\right)}{{\sum }_{D\in \mathcal{D}\left(\epsilon \right)}\omega \left(D\right)}$

Monitoring of activity μεf (concept)

Analysis of the measure μεf and its basic components $\mu {D}_{f}^{m}$, Dm ∈ 𝔇(ε) is the essence of the monitoring of f by an expert ε. It includes:

1. Direct analysis of the μεf activity which makes it possible to understand to what extent the dynamic of interest for ε is expressed on f (Fig. 4) for every moment tT. The result is the disjunctive partition of T in three pieces
(10)
$T=L\text{\hspace{0.17em}}B\text{\hspace{0.17em}}\left(\mu {\epsilon }_{f}\right)\text{\hspace{0.17em}\hspace{0.17em}}\vee \text{\hspace{0.17em}\hspace{0.17em}}L\text{\hspace{0.17em}}P\text{\hspace{0.17em}}A\left(\mu {\epsilon }_{f}\right)\vee \text{\hspace{0.17em}}L\text{\hspace{0.17em}}A\text{\hspace{0.17em}}\left(\mu {\epsilon }_{f}\right)$
1. LB(μεf): locally background nodes
(10.1)
$L\text{\hspace{0.17em}}B\text{\hspace{0.17em}}\left(\mu {\epsilon }_{f}\right)\text{\hspace{0.17em}\hspace{0.17em}}=\left\{\text{\hspace{0.17em}}t\text{\hspace{0.17em}}\in \text{\hspace{0.17em}\hspace{0.17em}}T\text{\hspace{0.17em}}:\text{\hspace{0.17em}}\mu {\epsilon }_{f}\left(t\right)\text{\hspace{0.17em}\hspace{0.17em}}\in \text{\hspace{0.17em}\hspace{0.17em}}\left[0;\text{\hspace{0.17em}\hspace{0.17em}}0.5\right)\right\}$
2. LPA(μεf): locally potentially anomalous nodes
(10.2)
$L\text{\hspace{0.17em}}P\text{\hspace{0.17em}}A\text{\hspace{0.17em}}\left(\mu {\epsilon }_{f}\right)\text{\hspace{0.17em}\hspace{0.17em}}=\left\{\text{\hspace{0.17em}}t\text{\hspace{0.17em}}\in \text{\hspace{0.17em}\hspace{0.17em}}T\text{\hspace{0.17em}}:\text{\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}}\mu {\epsilon }_{f}\left(t\right)\text{\hspace{0.17em}\hspace{0.17em}}\in \text{\hspace{0.17em}\hspace{0.17em}}\left[0.5;\text{\hspace{0.17em}\hspace{0.17em}}0.75\right)\right\}$
3. LA(μεf): locally anomalous nodes
(10.3)
$L\text{\hspace{0.17em}}A\text{\hspace{0.17em}}\left(\mu {\epsilon }_{f}\right)\text{\hspace{0.17em}\hspace{0.17em}}=\left\{\text{\hspace{0.17em}}t\text{\hspace{0.17em}}\in \text{\hspace{0.17em}\hspace{0.17em}}T\text{\hspace{0.17em}}:\text{\hspace{0.17em}}\mu {\epsilon }_{f}\left(t\right)\text{\hspace{0.17em}\hspace{0.17em}}\in \text{\hspace{0.17em}\hspace{0.17em}}\left[\text{\hspace{0.17em}}0.75;\text{\hspace{0.17em}}1\right)\right\}$
2. Trend analysis of the activity of μεf based on regression derivatives: it helps to understand how is the f dynamics of ε’s interest t alternating at every moment tT: is it increasing, decreasing or passing through an extremal transition.
3. statistical analysis of the total of the basic activities $\left\{\mu {D}_{f}^{m},{D}^{m}\in \mathcal{D}\left(\epsilon \right)\right\}$ using mean and covariance characteristics $\overline{\mu {D}_{f}^{m}}$ and Cov$\mu {D}_{f}^{{m}_{1}},\mu {D}_{f}^{{m}_{2}}$. The result is the coding of the behavior of f on T from the perspective of ε and the possibility of comparison with the behavior of another time series $\overline{f}$ on, generally speaking, another period $\overline{T}$ from the perspective of the same ε.
Figure 4

The function under examination.

Remark 2 Real partitions (10) have close thresholds but not always equal, to ideal thresholds 0.5 and 0.75. It is not principal for further studies and therefore in the present paper is convenient to consider monitoring of activity to be ideal.

Measure and monitoring of activity (examples)

For a record f (Fig. 4), the simple measures of activity μLf (Fig. 2) and μOf (Fig. 3), were constructed. They correspond to the dynamical indicators of jaggedness (2) and scatterness (3). Afterwards, they are joined in the complex measures of activity using three approaches (7)–(9). On the constructed measures (Figs. 5 and 7) and the initial record (Fig. 6 and 8), a partition is shown (10): blue color for the background nodes (10.1), green for the potentially anomalous nodes (10.2), and red for the anomalous nodes (10.3).

Figure 5

The measures of activity μDf(t) : a) based on (2), b) based on (3).

Figure 6

Anomalies marked out using the measures of activity (Fig. 5).

Figure 7

The measures of activity μεf(t): a) based on (7), b) based on (8), c) based on (9).

Figure 8

Anomalies marked out using the measures of activity (Fig. 7).

On the simple measures of activity μLf (Fig. 2) and μOf (Fig. 3) respectively, 6 and 5 anomalous areas were identified and marked red. On the basis of simple measures, three complex measures of activity were constructed: using the fuzzy conjunction (7) (Fig. 7a), using the fuzzy disjunction (8) (Fig. 7b), using the weighted average (9) (Fig. 7c). Correspondingly, 3, 8 and 5 anomalous areas were identified, respectively, whose configuration and location are different from the anomalous areas identified by the simple measures of activity. This proves the disjunctive, conjunctive and compromise character of the complex measures of activity (7–9) (Zlotnicki et al. 2005).

On Fig. 9a and Fig. 10a, the trend analysis of the μLf are shown, based on the regression derivatives with respect to the sufficiently global measure of closeness δ. The result of the analysis is the partition of the μLf in increasing (red), decreasing (green) and flat areas (blue). As a result, a morphological separation of anomalies (Fig. 6a) trend components was carried out (Figs. 9b, 10b) (Gvishiani et al. 2008).

Figure 9

a) Areas of monotoneness (red is for increase, green is for decrease); b) Reduction to the initial record.

Figure 10

A fragment of initial record (Fig. 9): a) Areas of monotoneness (red is for increase, green is for decrease); b) Reduction to the initial record.

Global ε-anomalies (conception)

Globally anomalous fragments of a time series f from the perspective of the expert ε (ε-anomalies) are the uplifts of the relief μεf

The second part of the study is dedicated to the search for uplifts on an arbitrary non-negative relief. Their search is realized using the DPS algorithm in two stages.

The DPS algorithm

Let X be a finite set, and A, B, … and x, y, … are the subsets and the points in it, respectively.

The density P on a subset X is a mapping of 2X × X into the interval [0, 1], increasing with respect to the first argument:

The value of PA(x) is defined as the density of the subset A in the point x.

The subset A is called the α-perfect in X for α ∈ [0, 1], if

The process of construction of the maximal α-perfect subset X(α) in X is called the algorithm DPS (Agayan et al. 2011; Agayan et al. 2014):

$X\left(\alpha \right)\text{\hspace{0.17em}\hspace{0.17em}}=\text{\hspace{0.17em}\hspace{0.17em}DPS\hspace{0.17em}(}X\text{|}P,\text{\hspace{0.17em}\hspace{0.17em}}\alpha \text{)}$

The subset X(α) results from the intersection

where

${X}^{k+1}\left(\alpha \right)=\left\{x\in {X}^{k}\left(\alpha \right)\text{\hspace{0.17em}\hspace{0.17em}}:\text{\hspace{0.17em}\hspace{0.17em}}{P}_{{X}^{k}\left(\alpha \right)\text{\hspace{0.17em}\hspace{0.17em}}}\left(x\right)\ge \alpha \text{\hspace{0.17em}}\right\}$

The intersection ${\cap }_{k=1}^{\infty }$ is necessarily achieved because it’s always finite due to finiteness of X and the multiplicity of Xk+1(α) into Xk(α) for all k = 0, 1, …

The search for ε-anomalies: stage 1 – functional clustering

The search for ε-anomalies will be illustrated for the function f (Fig. 11) and the simple measure of activity μLf (Fig. 12) corresponding to the dynamical indicator of jaggedness (2).

Figure 11

The initial time series.

Figure 12

The corresponding activity.

Now we define the density P on T by the measure of dynamic activity μDf: let Δ > h be the local observation parameter, A is a subset in T, t – node from T, then

${P}_{A}\left(t\right)=\text{\hspace{0.17em}}\sum \frac{\mu {\epsilon }_{f}\left(\stackrel{-}{t}\right)}{\text{|}T\text{|}}\text{\hspace{0.17em}}:\text{\hspace{0.17em}\hspace{0.17em}}\stackrel{-}{t}\text{\hspace{0.17em}}\in \text{\hspace{0.17em}\hspace{0.17em}}\left[\text{\hspace{0.17em}}t-\Delta ,\text{\hspace{0.17em}\hspace{0.17em}}t\text{\hspace{0.17em}}+\Delta \right]\text{\hspace{0.17em}}\cap \text{\hspace{0.17em}}A$

The result of the DPS algorithm operation with such density on T with a fixed α is the first part of the search for anomalous nodes for the function f on the period of observation T from the perspective of the expert ε (Fig. 13):

$T\left(\alpha \right)=\text{\hspace{0.17em}\hspace{0.17em}DPS\hspace{0.17em}(}T\text{|}P,\text{\hspace{0.17em}\hspace{0.17em}}\alpha ,\text{\hspace{0.17em}\hspace{0.17em}}\mu {\epsilon }_{f}\text{)}$

Figure 13

DPS: step 1. Results of identification of anomalies on the indicator of activity.

The set T(α) is assumed as a disjunctive union of the intervals [ak, bk], k = 1, …, m. Every [ak, bk] is located in T and has no gaps inside (Fig. 13):

$T\left(\alpha \right)=\text{\hspace{0.17em}\hspace{0.17em}}{\vee }_{k=1}^{m}\left[{a}_{k},\text{\hspace{0.17em}\hspace{0.17em}}{b}_{k}\right]$

The search for ε-anomalies: stage 2 – interval clustering

Let X(α) be the totality of the intervals $\left\{\left[{a}_{k},{b}_{k}\right]{|}_{1}^{m}\right\}$, and let us introduce the measure of closeness ρ between two intervals from X(α):

The closeness p helps to construct the density 𝒫 on X(α): if SX(α), [ak, bk] ∈ X(α), then

${\mathcal{P}}_{S}=\underset{\left[{a}_{\overline{k}},{b}_{\overline{k}}\right]\in S\\left[{a}_{k},{b}_{k}\right]}{\mathrm{max}}\rho \left(\left[{a}_{k},{b}_{k}\right],\left[{a}_{\overline{k}},{b}_{\overline{k}}\right]\right)$

The result of the algorithm operation DPS with such density on X(α) with a fixed β is the second part of the search for anomalous points for the function f on the period of observation T from the perspective of the expert ε (Figs. 1415)

$X\left(\alpha \right)\text{\hspace{0.17em}}\left(\beta \right)=\text{DPS\hspace{0.17em}\hspace{0.17em}(}X\left(\alpha \right)\text{\hspace{0.17em}|}\mathcal{P},\text{\hspace{0.17em}\hspace{0.17em}}\beta ,\text{\hspace{0.17em}\hspace{0.17em}}\mu {\epsilon }_{f}\text{)}$

Figure 14

DPS: step 2. Clustering of the intervals.

Figure 15

DPS: final result. Reduction to the initial function.

On the second step, the DPS algorithm indicates which intervals in X(α) should additionally be joined into one anomaly – these are exactly the intervals from X(α)(β) (Agayan et al. 2014).

Thus, the finally anomalous intervals for the function f, from the perspective of the expert ε, would be the fairly isolated ones from X(α) which are not included into X(α)(β), and also the new intervals which result from uniting the intervals from X(α)(β).

Let us denote the totality of all the anomalies 𝒜 = 𝒜(μεf):

$\mathcal{A}=\left\{\text{\hspace{0.17em}}{A}_{l}=\text{\hspace{0.17em}\hspace{0.17em}}{A}_{l}\left(\mu {\epsilon }_{f}\right)|{}_{l=1}{}^{L}\right\}$

Studying the ε-anomalies

Let us denote $\mathcal{A}\left(\mu {\epsilon }_{f}\right)=\left\{{A}_{l}{|}_{/=1}^{L}\right\}$ the totality of all the ε-anomalies detected by DPS. Any ε-anomaly Al defines the fragment fl = f|Al of the time series f and could be studied according to the program listed above with respect to the measure μεf (the internal Al-monitoring of activity, superanomalies inside Al, and so on). It is natural to call such study of Al the internal study. As an ambivalent an external study of the anomaly Al is supposed, characterizing Al among the other anomalies from 𝒜(μεf). Let’s talk about it more in detail.

There are many characteristics of ε-anomalies, each of them ranks the ε-anomalies in 𝒜(μεf) in its own way. The characteristics can be simple and complex (an analogy to the simple and complex measures of activity). Keeping in mind a compromise between simplicity and thoroughness, mentioned in the beginning, we shall list five natural characteristics and display how to make the complex ones from them.

The DPS algorithm identified 44 anomalies on the record f (Fig. 15). Let us construct the simple characteristics from them:

1. Massiveness (continuity) of the anomaly Al (Fig. 16):
$m\text{\hspace{0.17em}}{A}_{l}=\text{|}{A}_{l}\text{|}$
2. Mean activity of the anomaly Al (Fig. 17):
$\mu {A}_{l}=\frac{\text{\hspace{0.17em}}\sum \mu {\epsilon }_{f}\left(t\right)\text{\hspace{0.17em}\hspace{0.17em}}:\text{\hspace{0.17em}\hspace{0.17em}}t\in {A}_{l}}{\text{|}{A}_{l}\text{|}}$
3. Mean energy of the anomaly Al (Fig. 18):
$E{A}_{l}=\frac{\text{\hspace{0.17em}}\sum {E}_{f}\left(t\right)\text{\hspace{0.17em}\hspace{0.17em}}:\text{\hspace{0.17em}\hspace{0.17em}}t\in {A}_{l}}{\text{|}{A}_{l}\text{|}}$
4. Mean jaggedness of the anomaly Al (Fig. 19):
$L{A}_{l}=\frac{\text{\hspace{0.17em}}\sum {L}_{f}\left(t\right)\text{\hspace{0.17em}\hspace{0.17em}}:\text{\hspace{0.17em}\hspace{0.17em}}t\in {A}_{l}}{\text{|}{A}_{l}\text{|}}$
5. Mean scatterness of the anomaly Al (Fig. 20):
$O{A}_{l}=\frac{\sum {O}_{f}\left(t\right)\text{\hspace{0.17em}\hspace{0.17em}}:\text{\hspace{0.17em}}t\in {A}_{l}}{\text{|}{A}_{l}\text{|}}$
Figure 16

The massiveness of anomalies.

Figure 17

Mean activity of the anomalies.

Figure 18

Mean energy of the anomalies.

Figure 19

Mean jaggedness of the anomalies.

Figure 20

Mean scatterness of the anomalies.

As a result, five totalities occur:

$\begin{array}{c}m\mathcal{A}\left(\mu {\epsilon }_{f}\right)=\left\{m{A}_{l}{|}_{l=1}^{L}\right\};\mu \mathcal{A}\left(\mu {\epsilon }_{f}\right)=\left\{\mu {A}_{l}{|}_{l=1}^{L}\right\};\\ E\mathcal{A}\left(\mu {\epsilon }_{f}\right)=\left\{E{A}_{l}{|}_{l=1}^{L}\right\};L\mathcal{A}\left(\mu {\epsilon }_{f}\right)=\left\{L{A}_{l}{|}_{l=1}^{L}\right\};O\mathcal{A}\left(\mu {\epsilon }_{f}\right)=\left\{O{A}_{l}{|}_{l=1}^{L}\right\}\end{array}$

The comparison of every anomaly A with the corresponding indicators of other anomalies from 𝒜(μεf) based on the measure of maximality mesmax gives five indices of the anomaly A, similar to dynamical indicators:

1. ${\text{ind}}_{m}A=\underset{m\mathcal{A}\left(\mu {\epsilon }_{f}\right)}{\text{mes}\mathrm{max}}mA$ (Fig. 21)
2. ${\text{ind}}_{\mu }A=\underset{\mu \mathcal{A}\left(\mu {\epsilon }_{f}\right)}{\text{mes}\mathrm{max}}\mu A$ (Fig. 22)
3. ${\text{ind}}_{E}A=\underset{E\mathcal{A}\left(\mu {\epsilon }_{f}\right)}{\text{mes}\mathrm{max}}EA$ (Fig. 23)
4. ${\text{ind}}_{L}A=\underset{L\mathcal{A}\left(\mu {\epsilon }_{f}\right)}{\text{mes}\mathrm{max}}LA$ (Fig. 24)
5. ${\text{ind}}_{O}A=\underset{O\mathcal{A}\left(\mu {\epsilon }_{f}\right)}{\text{mes}\mathrm{max}}OA$ (Fig. 25)
Figure 21

Index of anomalies based on massiveness.

Figure 22

Index of anomalies based on mean activity.

Figure 23

Index of anomalies based on mean energy.

Figure 24

Index of anomalies based on mean jaggedness.

Figure 25

Index of anomalies based on mean Scatterness.

Simple indices, using the FL * operations, produce the new complex integral indices of anomalies $\stackrel{˜}{\text{ind}}$.

Example 7 A complex index based on two simple indices (Fig. 26)

(11)
$\stackrel{˜}{\text{ind}}A=\frac{{\text{ind}}_{L}A+{\text{ind}}_{O}A}{2}$

Figure 26

Complex index of anomalies.

Example 8 On Fig. 27a, five of 44 anomalies identified using the indLA index are displayed, on Fig. 27b anomalies identified using the indOA are displayed, and on Fig. 27c anomalies marked out using the $\stackrel{˜}{\text{ind}}A$ index (11) are shown.

Figure 27

Anomalies marked out using: a) indLA; b) indOA; c)$\stackrel{˜}{\text{ind}}A$.

Measure and monitoring of geomagnetic activity

Let us analyze the geomagnetic monitoring (10–10.3) on the basis of the measure of activity μ = μLf (4) constructed according to the dynamical indicator of irregularity Lf (2). The application of this measure to the ground-based observation data from the global geomagnetic observatory network brought an opportunity of spatial-temporal analysis of geomagnetic activity and the estimation of the dynamics of magnetic storm propagation in realtime (Soloviev et al. 2013; Gvishiani et al. 2014; Soloviev et al. 2016). The usage of the measure makes it possible to estimate the geomagnetic activity in various regions of the Earth within a single scale, taking its regional specification into account. The developed measure allows the analysis of the internal fine structure of magnetic storms, the dynamics of their development both around the globe and within particular regions. When estimating the geomagnetic activity, the whole range of initial data is efficiently considered in, as the temporal resolution of the measure matches the initial data sampling frequency.

The example of application of the developed measure of anomality to the geophysical data analysis was the study of magnetic activity using the time series from the magnetic observatory “Saint Petersburg” (IAGA code SPG). The initial observatory data is available at the Russian-Ukrainian Geomagnetic Data Center website (http://geomag.gcras.ru). For illustration purposes, the data for the period March 28 – April 8, 2016 was taken, as it contains the fragments of both increased (April 2–3, April 7–8) and decreased (March 31 – April 1, April 5) magnetic activity.

According to the planetary Kp index data (Fig. 28), during the given time period the magnetic conditions varied from calm (the Kp index values were from 0 to 2) and slightly disturbed (Kp = 2 … 3) to disturbed (the Kp index values were up to 5+). The disturbed data fragments corresponded to the magnetic storms which occurred from April 7, 2016, 17:00 UT, to April 8, 2016, 02:00 UT, and the increased magnetic activity on April 2–3, 2016.

Figure 28

Planetary Kp index values for the period from 28.03.2016 to 08.04.2016.

The magnetic disturbance, which lasted from April 2, 2016, 15:00 UT to April 3, 2016, 03:00 UT, was characterized first by the increase of the amplitude of variations of the components X, Z and the total intensity F by 70–80 nT, after which the values of these components smoothly decreased to –180… –200 nT from their initial levels. This minimum took place approximately at midnight on April 2–3, next the values of the components reached the initial values by 03:00 UT, April 3, 2016. The magnetic storm which took place on April 7–8 was characterized by rapid evolution. During the sudden commencement period (about 17:11 UT) the amplitude of the variations of the X component at the Saint Petersburg magnetic observatory increased respectively to its background value from 14565 to 14580 nT. During the main phase of the magnetic storm, which evolved approximately from 19:00 to 21:53 UT, the decrease by about 150 nT for the X component, 180 nT for the Y component, and more than 200 nT for the Z component took place. According to the data records of several Russian magnetic observatories located in the same latitudinal belt, the main phase was accompanied by the decrease, which varied between 100–150 and 250 nT for the X and Y components, and was up to 300 nT for the Z component. The recovery phase of the magnetic storm lasted from about April 7, 2016, 22:00 UT to April 8, 2016, 02:00–02:30 UT. During this phase, some oscillations in the magnetic field intensity with a period of about 1 hour and the maximal amplitude of about 140–150 nT were observed. By the end of the recovery phase, the magnetic field at the observatory reached the values of the same level, which was registered before the storm onset.

The monitoring of geomagnetic activity for its Northern component X was performed using the measure of activity μ = μLf . Its results are displayed on Fig. 29. The geomagnetic events listed above were considered anomalous according to the measure μ and marked red on the plot. The data registered during the period of less disturbed magnetic field (e.g., for the period of 28–29 March), were recognized as potentially anomalous and marked green and purple on the plot (the potentially anomalous fragments are divided into two parts by the threshold value 0.61). Finally, for the period of quiet magnetic field, March 31–April 1, the values of the measure μ practically do not exceed the threshold value 0.5, and no data fragment for this period was classified as anomalous.

Figure 29

The results of classification of initial data into background (blue), weakly anomalous (green), anomalous (purple) and strongly anomalous (red) fragments using the measure of anomality. On the upper plot the initial record is shown, on the bottom plot the measure of anomality is displayed, calculated for this record.

Discussion

The comparison of the result of monitoring with the planetary Kp index data demonstrated that the classification of magnetic activity using the described measure does not contradict the classical methods for its estimation. At that, the principal advantages of the measure of anomality should be mentioned:

1. the matching of the temporal resolution of the measure to the initial sampling frequency of the time series;
2. the estimation of anomality using the records from different observatories in a unified and normalized scale.

The first advantage gives an opportunity of application of the measure to the monitoring of magnetic disturbance with minimal temporal delay in realtime mode. At that, the estimation of the magnetic activity is performed less raw than using the traditional 3-hour K index. The second advantage makes it possible to study the magnetic disturbance regardless of latitudinal location of the observatory, taking the regional specifications into account.

Conclusions

An exponential growth of digital observations of the Earth’s environment acts as a challenge for the development of adequate mathematical methods of data mining and systems analysis of input data. It is the usage of these methods that provides an opportunity operational extraction of useful information, such as specific geomagnetic events of different origin, from the large volumes of continuously collected data. The approach to the studying of time series based on fuzzy logic, proposed in the paper, is largely focused on this challenge and finds its practical application. In particular, several algorithms based on DMA (Soloviev et al. 2009; Bogoutdinov et al. 2010; Soloviev et al. 2012; Sidorov et al. 2012; Zelinskiy et al. 2014) are integrated into Russian-Ukrainian Geomagnetic Data Center (http://geomag.gcras.ru) and used for continuous automated detection of anthropogenic disturbances and geomagnetic pulsations in observatory data.

Further research plans include, first of all, the expansion of analysis from a single time series f to a network S of time series fs(t), sS. This will allow to define the monitoring of activity in network, anomalous events in network, and more.

The results presented in this paper rely on data collected at magnetic observatories. We thank the national institutes that support them and INTERMAGNET for promoting high standards of magnetic observatory practice (www.intermagnet.org).

The research has been carried out in the framework of the Federal target program of the Ministry of Education and Science of the Russian Federation, agreement No. 14.607.21.0058, project unique identifier RFMEFI60714X0058, and is a part of research aimed at the creation of the experimental prototype of the hardware and software complex for monitoring and detection of extreme geomagnetic phenomena using ground-based and satellite data.

Competing Interests

The authors have no competing interests to declare.