## 1 Introduction

Corrosion severely shortens the service life of materials and can cause enormous economic losses and serious hazards to society. Natural environment corrosion is one of the most common corrosion phenomena. It is highly complex and is governed by a large number of factors. It mainly comprises atmospheric corrosion and marine corrosion.

Atmospheric corrosion is the corrosion of materials exposed to the air and its pollutants. Both electrochemical and chemical corrosion can occur in the atmospheric environment, but electrochemical corrosion is the more important of the two (Morcillo et al., 2013). Under atmospheric exposure, once a critical humidity level is reached, the metal surface adsorbs an ultra-thin water film that acts as an electrolyte (Melchers, 2013; Wall et al., 2005). In the presence of this thin-film electrolyte, atmospheric corrosion proceeds through balanced anodic and cathodic reactions: the anodic oxidation reaction dissolves the metal, while the cathodic reaction is commonly oxygen reduction. Under thin-film conditions, oxygen from the atmosphere is readily supplied to the electrolyte. Seawater is an electrolyte of very high salinity and one of the most corrosive natural agents, so electrochemical corrosion of metals is more severe in the marine environment (Traverso et al., 2014; Shifler, 2005).

Over the last 16 years, research supported by the Major Program of the National Natural Science Foundation of China has accumulated a large amount of data on the natural environment corrosion of materials (Xiao et al., 2002; Zhu et al., 2002). In this work, an analysis architecture is built to make full use of these data. The architecture extracts the patterns inherent in the data and yields information on how natural environmental factors influence corrosion and how corrosion damage evolves, thereby supporting research on natural environment corrosion.

## 2 Statistical Analysis Modelling for Environmental Factors

The framework for the module of statistical analysis modelling for environmental factors is shown in Figure 1. The module builds the atmospheric environment and marine environment databases. All environmental factors at the different atmospheric and marine test stations are modelled monthly by statistical analysis. The Shapiro-Wilk method (W method) and the Kolmogorov-Smirnov method (K-S method) are used to test whether a sample follows the normal, logarithmic normal, Weibull, or uniform distribution. The probability distribution function, the expected value and variance of the population, and their confidence intervals are then estimated.

Figure 1

Framework for the module of statistical analysis modelling for environmental factors.

The atmospheric environment database covers seventeen atmospheric environmental factors at nine atmospheric test stations (Beijing, Wuhan, Jiangjin, Wanning, Qingdao, Guangzhou, Shenyang, Qionghai, and Hailar): average temperature, average relative humidity, rainfall, rainfall hours, sunshine duration, the deposition of SO2, NO2, H2S, HCl, SO3, Cl⁻, and NH3, water-soluble dustfall, water-insoluble dustfall, the pH value of rainwater, and the SO4²⁻ and Cl⁻ contents of rainwater. The marine environment database covers nine marine environmental factors at four marine test stations (Qingdao, Zhoushan, Xiamen, and Yulin): seawater temperature, salinity, dissolved oxygen, pH value, marine atmospheric temperature, humidity, rainfall, sunshine, and wind speed.

In the statistical analysis module, the normal and logarithmic normal distributions of the sample are checked by the W method. The W test is a small-sample test (3 ≤ n ≤ 50) that can check only the normal distribution (Jurečková et al., 2007; Srivastava et al., 1987), which makes it suitable for the small samples of atmospheric and marine environmental data. The W statistic is given by Eq. (1)

(1)
$W=\frac{{\left(\sum _{i=1}^{k}{a}_{i}\left({x}_{n+1-i}-{x}_{i}\right)\right)}^{2}}{\sum _{i=1}^{n}{\left({x}_{i}-\mu \right)}^{2}}$

where n is the sample size, k = n/2 when n is even and k = (n − 1)/2 when n is odd, µ is the sample mean, xi is the ith observation after the sample has been sorted in ascending order, and ai are tabulated constants that depend on n. Wα is the α quantile (critical value) of the W statistic and α is the significance level; if W > Wα, the sample is accepted as normally distributed. To check the logarithmic normal distribution, the sample is log-transformed before Eq. (1) is applied.
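To make Eq. (1) concrete, the following Python sketch computes W for one sample. It assumes the caller supplies the tabulated constants a_1 to a_k for the given n from a W-test table (they are not computed here); the function name `shapiro_wilk_w` and its `log` flag, which applies the logarithm conversion used for the logarithmic normal check, are illustrative and not part of the original module.

```python
import math
from typing import Sequence


def shapiro_wilk_w(sample: Sequence[float], coeffs: Sequence[float],
                   log: bool = False) -> float:
    """W statistic of Eq. (1).

    coeffs: tabulated Shapiro-Wilk constants a_1..a_k for this sample size,
    with k = n // 2; they must be taken from a W-test table (assumption:
    supplied by the caller).
    log: if True, log-transform the sample first (logarithmic normal check);
    all values must then be positive.
    """
    x = [math.log(v) for v in sample] if log else [float(v) for v in sample]
    n = len(x)
    if not 3 <= n <= 50:
        raise ValueError("the W test applies to 3 <= n <= 50")
    k = n // 2  # n/2 for even n, (n - 1)/2 for odd n
    if len(coeffs) != k:
        raise ValueError(f"expected {k} coefficients for n = {n}")
    x.sort()  # Eq. (1) uses the ordered sample x_1 <= ... <= x_n
    mu = sum(x) / n
    # numerator (sum_i a_i (x_{n+1-i} - x_i))^2, written with 0-based indices
    num = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(coeffs)) ** 2
    den = sum((v - mu) ** 2 for v in x)
    return num / den
```

For n = 3 the tabulated constant is a_1 = 0.7071, so `shapiro_wilk_w([1, 2, 4], [0.7071])` returns W ≈ 0.964, which would then be compared with the tabulated Wα. If SciPy is available, `scipy.stats.shapiro` computes W with its own coefficient approximation and reports a p-value instead of a critical value.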

The probability density functions of the normal and logarithmic normal distributions are shown in Eq. (2)

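(2)
$f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right],\qquad f(x)=\frac{1}{x\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln x-\mu)^{2}}{2\sigma^{2}}\right]\ (x>0)$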

where σ is the standard deviation of the sample; for the logarithmic normal distribution, µ and σ are the mean and standard deviation of the log-transformed sample. For the normal distribution, the estimates of the population expected value and variance are E(x) = µ and D(x) = σ². For the logarithmic normal distribution, they are E(X) = exp(µ + σ²/2) and D(X) = (exp(σ²) − 1) exp(2µ + σ²).
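The point estimates above reduce to a few lines of code. The Python sketch below assumes that the sample variance uses the n − 1 divisor (the text does not say which divisor the module uses) and that a logarithmic normal sample contains only positive values; the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def normal_estimates(sample: Sequence[float]) -> Tuple[float, float]:
    """E(x) = mu and D(x) = sigma^2 for a sample accepted as normal."""
    # statistics.variance uses the n - 1 divisor (assumed, not stated in the text)
    return statistics.fmean(sample), statistics.variance(sample)


def lognormal_moments(mu: float, sigma2: float) -> Tuple[float, float]:
    """E(X), D(X) of a lognormal variable whose log has mean mu, variance sigma2."""
    e = math.exp(mu + sigma2 / 2)
    d = (math.exp(sigma2) - 1) * math.exp(2 * mu + sigma2)
    return e, d


def lognormal_estimates(sample: Sequence[float]) -> Tuple[float, float]:
    """Estimates for a sample accepted as logarithmic normal (values > 0)."""
    logs = [math.log(v) for v in sample]
    return lognormal_moments(statistics.fmean(logs), statistics.variance(logs))
```

Because µ and σ² are computed on the logarithms, `lognormal_estimates` and `normal_estimates` differ only in that transformation and in the moment formulas applied afterwards.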

The K-S test is a general goodness-of-fit test that can check both normal and nonnormal distributions (Vlček et al., 2009; Schröer et al., 1995). For small samples, however, it checks normality less reliably than the W test. In the statistical analysis module, the Weibull and uniform distributions of the sample are checked by the K-S method. Let F0(x) be the probability distribution function of the theoretical distribution and Fn(x) the cumulative frequency (empirical distribution) function of the sample; the K-S statistic is

(3)
$D=\sup_{x}\left|F_{n}(x)-F_{0}(x)\right|$

If D < Dα, the sample is accepted as following the theoretical distribution, where Dα is the critical value of the K-S test and α is the significance level.
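A possible implementation of Eq. (3) evaluates D at the sorted sample points, where the empirical function Fn jumps. The Weibull CDF 1 − exp(−(x/λ)^k) and the uniform CDF (x − a)/(b − a) used here as F0 follow the parameterization of Eq. (4); the helper names are assumptions made for this sketch, and the resulting D still has to be compared with a tabulated Dα.

```python
import math
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """D = sup_x |Fn(x) - F0(x)| for a fully specified theoretical CDF F0."""
    x = sorted(sample)
    n = len(x)
    d = 0.0
    for i, v in enumerate(x, start=1):
        f0 = cdf(v)
        # Fn steps from (i - 1)/n to i/n at v, so check both sides of the step
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


def weibull_cdf(lam: float, k: float) -> Callable[[float], float]:
    """F0(x) = 1 - exp(-(x/lam)^k) for x > 0 (scale lam, shape k)."""
    def f(x: float) -> float:
        return 1.0 - math.exp(-((x / lam) ** k)) if x > 0 else 0.0
    return f


def uniform_cdf(a: float, b: float) -> Callable[[float], float]:
    """F0(x) = (x - a)/(b - a) on [a, b], clipped to [0, 1] outside."""
    def f(x: float) -> float:
        return min(max((x - a) / (b - a), 0.0), 1.0)
    return f
```

Tabulated Dα values assume that F0 is fully specified in advance; when λ and k are estimated from the same sample, the test with these critical values tends to be conservative.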

The probability density functions of the Weibull and uniform distributions are shown in Eq. (4)

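(4)
$f(x)=\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}\exp\left[-\left(\frac{x}{\lambda}\right)^{k}\right]\ (x\ge 0),\qquad f(x)=\frac{1}{b-a}\ (a\le x\le b)$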

where λ > 0 is the scale parameter and k > 0 is the shape parameter of the Weibull distribution. The parameters are estimated by the probability weighted moment method (Zhang, 1994), as given in Eq. (5)

(5)

where Γ is the gamma function. For the Weibull distribution, the estimates of the population expected value and variance are E(x) = λΓ(1 + 1/k) and D(x) = λ²[Γ(1 + 2/k) − Γ²(1 + 1/k)]. For the uniform distribution, a and b are the interval parameters, and the estimates are E(x) = (a + b)/2 and D(x) = (b − a)²/12.
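The moment formulas above can be checked numerically with Python's `math.gamma`. This sketch assumes λ and k have already been estimated (for example by the probability weighted moment method of Eq. (5), which is not reproduced here); the function names are illustrative.

```python
import math
from typing import Tuple


def weibull_moments(lam: float, k: float) -> Tuple[float, float]:
    """E(x) = lam*Gamma(1+1/k), D(x) = lam^2 [Gamma(1+2/k) - Gamma(1+1/k)^2]."""
    g1 = math.gamma(1 + 1 / k)
    g2 = math.gamma(1 + 2 / k)
    return lam * g1, lam ** 2 * (g2 - g1 ** 2)


def uniform_moments(a: float, b: float) -> Tuple[float, float]:
    """E(x) = (a + b)/2, D(x) = (b - a)^2/12."""
    return (a + b) / 2, (b - a) ** 2 / 12
```

As a sanity check, k = 1 reduces the Weibull distribution to the exponential distribution, whose mean is λ and whose variance is λ², so `weibull_moments(2.0, 1.0)` gives `(2.0, 4.0)`.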

A confidence interval expresses the reliability of an interval estimate of a population parameter (Huwang et al., 2002). Suppose that θ is an unknown population parameter. If, for a given 0 < α < 1, a random interval [θ1, θ2] satisfies P{θ1 ≤ θ ≤ θ2} = 1 − α, then [θ1, θ2] is the confidence interval of θ at the confidence level 1 − α. For a small sample (n < 30) from a normal population, the confidence interval for the population expected value is $\mu \pm \frac{\sigma}{\sqrt{n}}z_{\alpha/2}$ when the population variance σ² is known and $\mu \pm \frac{\sigma}{\sqrt{n}}t_{\alpha/2,n-1}$ when it is unknown (σ is then the sample standard deviation). For a large sample, the interval $\mu \pm \frac{\sigma}{\sqrt{n}}z_{\alpha/2}$ holds approximately regardless of the distribution. For a normal population, the confidence interval for the population variance is $\left[\frac{(n-1)\sigma^{2}}{\chi_{\alpha/2,n-1}^{2}},\frac{(n-1)\sigma^{2}}{\chi_{1-\alpha/2,n-1}^{2}}\right]$. Here $z_{\alpha/2}$ is the two-sided (upper α/2) quantile of the standard normal distribution at the significance level α, $t_{\alpha/2,n-1}$ is the corresponding quantile of the t-distribution with n − 1 degrees of freedom, and $\chi_{\alpha/2,n-1}^{2}$ and $\chi_{1-\alpha/2,n-1}^{2}$ are the upper α/2 and upper 1 − α/2 quantiles of the chi-square distribution with n − 1 degrees of freedom.
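The interval formulas can be sketched as follows. Only the normal quantile is computed, using `statistics.NormalDist` from the Python standard library; the t and chi-square quantiles are assumed to come from tables and are passed in as arguments. The function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def z_quantile(alpha: float) -> float:
    """Two-sided critical value z_{alpha/2} of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci(mean: float, sd: float, n: int, q: float) -> Tuple[float, float]:
    """mu +/- (sigma / sqrt(n)) * q, where q is z_{alpha/2} or t_{alpha/2,n-1}."""
    half = q * sd / math.sqrt(n)
    return mean - half, mean + half


def variance_ci(s2: float, n: int, chi2_upper: float,
                chi2_lower: float) -> Tuple[float, float]:
    """[(n-1)s2/chi2_{alpha/2,n-1}, (n-1)s2/chi2_{1-alpha/2,n-1}], normal population.

    Upper-tail convention: chi2_upper is the upper alpha/2 quantile and
    chi2_lower the upper 1-alpha/2 quantile, so chi2_upper > chi2_lower.
    """
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower
```

For n = 10 and α = 0.05, the table values $t_{0.025,9}=2.262$, $\chi^{2}_{0.025,9}=19.023$ and $\chi^{2}_{0.975,9}=2.700$ would be passed as `q`, `chi2_upper` and `chi2_lower`. With SciPy installed, `scipy.stats.t.ppf(1 - alpha / 2, n - 1)`, `scipy.stats.chi2.ppf(1 - alpha / 2, n - 1)` and `scipy.stats.chi2.ppf(alpha / 2, n - 1)` return the same quantiles.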

## 3 Assessment of Corrosion Damage

The framework for the module of the assessment of corrosion damage is shown in Figure 2. The module establishes the atmospheric and marine corrosion databases. Grey relational analysis is used to determine the main environmental factors of atmospheric and marine corrosion for each type of steel. A BP artificial neural network is then used to build atmospheric and marine corrosion damage models that relate the main corrosion environmental factors and the exposure time to the corrosion damage (corrosion rate and pit depth) of each steel type. After the evolution of the corrosion pit depth has been predicted by the neural network, the residual life of a structure containing a corrosion pit is evaluated by a fracture mechanics calculation.

Figure 2

Framework for the module of the assessment of corrosion damage.

The atmospheric corrosion database contains the average corrosion rates, measured over 16 years at the atmospheric test stations, of seventeen carbon and low-alloy steels (A3, 3C, 20, 08Al, 16Mn, 16MnQ, D36, 15MnMoVN, 14MnMoNbB, 09MnNb, 09CuPTiRE, 10CrMoAl, 10CrCuSiV, 12CrMnCu, 09CuPCrNi, 09CuPCrNiA, and 06CuPCrNiMo) and five stainless steels (2Cr13, 00Cr17AlTi, 1Cr18Ni9Ti, 00Cr19Ni10, and 0000Cr18Mo2). The marine corrosion database contains the average corrosion rate, the average pit depth, the maximum pit depth, and the maximum crevice corrosion depth of these steels over 16 years at the marine test stations.

Many environmental factors affect the corrosion process, so identifying the main corrosion environmental factors is important: it simplifies the corrosion damage modelling and improves the model's reliability. In the corrosion assessment module, grey relational analysis is used to obtain the main environmental factors of atmospheric and marine corrosion for each carbon steel, low-alloy steel, and stainless steel. A grey system is one in which some information is known and some is not. The grey relation describes the uncertain association between things, or between system factors and the main behavioural factors (Deng, 1989). Grey relational analysis, an important component of grey system theory, is based on the degree of similarity or difference in the development trends of an alternative and the ideal alternative (Wang et al., 2006). The grey relational grade is calculated by Eq. (6)

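(6)
$\xi_{0i}(k)=\frac{\Delta_{\min}+\delta\Delta_{\max}}{\Delta_{0i}(k)+\delta\Delta_{\max}},\qquad r_{i}=\frac{1}{n}\sum_{k=1}^{n}\xi_{0i}(k)$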

where ri is the grey relational grade of the ith factor (the mean of ξ0i(k) over the n points of the sequence), ξ0i(k) is the grey relational coefficient, δ ∈ (0, 1) is the resolution coefficient, $\Delta_{0i}(k)=|X_{0}'(k)-X_{i}'(k)|$, $\Delta_{\min}=\min_{i}\min_{k}|X_{0}'(k)-X_{i}'(k)|$, and $\Delta_{\max}=\max_{i}\max_{k}|X_{0}'(k)-X_{i}'(k)|$. X0'(k) and Xi'(k) are the reference sequence and the comparative sequence after standardization, $X_{0}'(k)=X_{0}(k)/\mu_{0}$ and $X_{i}'(k)=X_{i}(k)/\mu_{i}$, where µ0 and µi are the means of the respective original sequences. The larger the grey relational grade of a corrosion environmental factor, the greater its influence on corrosion damage.
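A minimal sketch of the grey relational analysis follows. It assumes that each sequence is standardized by dividing by its own non-zero mean and uses the commonly chosen resolution coefficient δ = 0.5 as a default (an assumption, not a value from the text); the reference sequence would hold the corrosion data and each comparative sequence one environmental factor. The function name is illustrative.

```python
from typing import List, Sequence


def grey_relational_grades(reference: Sequence[float],
                           comparatives: Sequence[Sequence[float]],
                           delta: float = 0.5) -> List[float]:
    """Grey relational grade r_i of each comparative sequence.

    Each sequence is divided by its own mean (assumed non-zero);
    delta = 0.5 is an assumed default resolution coefficient.
    """
    def standardize(seq: Sequence[float]) -> List[float]:
        m = sum(seq) / len(seq)
        return [v / m for v in seq]

    x0 = standardize(reference)
    if any(len(s) != len(x0) for s in comparatives):
        raise ValueError("all sequences must have the same length")
    # Delta_0i(k) = |X0'(k) - Xi'(k)| for every factor i and point k
    diffs = [[abs(a - b) for a, b in zip(x0, standardize(s))]
             for s in comparatives]
    d_min = min(min(row) for row in diffs)
    d_max = max(max(row) for row in diffs)
    if d_max == 0:
        raise ValueError("all sequences coincide; grades are undefined")
    grades = []
    for row in diffs:
        # grey relational coefficients xi_0i(k), then their mean r_i
        xi = [(d_min + delta * d_max) / (d + delta * d_max) for d in row]
        grades.append(sum(xi) / len(xi))
    return grades
```

Sorting the factors by grade in descending order, for example `sorted(range(len(g)), key=g.__getitem__, reverse=True)`, gives the ranking from which the main corrosion environmental factors are selected.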

The artificial neural network (ANN) is an information processing model inspired by biological nervous systems and composed of a large number of highly interconnected neurons. It can learn the correlation between input and output data without an explicit model and can use the learned association to predict outputs for input data not used in training (Carpenter et al., 1995). Of the many ANN models, the most widely applied is the multilayer feedforward network trained by error back-propagation (BP neural network).

In the corrosion assessment module, a three-layer BP neural network is used to build the atmospheric and marine corrosion damage models. It consists of an input layer, where data are presented to the network; a hidden layer, which processes the internal signals; and an output layer, which holds the network response to a given input. The ANN input variables are the main corrosion environmental factors determined by grey relational analysis, together with the exposure time. The output variables are the average corrosion rate for atmospheric corrosion and the average corrosion rate, the average pit depth, and the maximum pit depth for marine corrosion. Before training, the corrosion data are scaled to the interval [0, 1], a standard preprocessing step known as data normalization or homogenization (Cai et al., 1999). To keep training from being trapped in a local minimum and to reduce the network's sensitivity to local features of the error surface, a momentum term is added to the training. The number of hidden nodes, the connection weights, and the thresholds are the most important determinants of the forecast accuracy of a BP network (Wang et al., 2006). However, there is still no established theory for selecting the number of hidden nodes, so it is chosen by comparing the results obtained with different numbers of hidden nodes. The connection weights and thresholds are adjusted by the gradient descent algorithm:

(7)
$\Delta w_{ij}^{(l)}=-\eta\,\delta_{j}^{(l)}O_{i}^{(l-1)},\qquad \Delta a_{j}^{(l)}=\eta\,\delta_{j}^{(l)},\qquad \delta_{j}^{(l)}=\frac{\partial E_{j}^{(l)}}{\partial O_{j}^{(l)}}f'\left(I_{j}^{(l)}\right)$

where $\Delta w_{ij}^{(l)}$ is the increment of the connection weight between node i of layer l − 1 and node j of layer l, $\Delta a_{j}^{(l)}$ is the threshold increment of node j of layer l, η is the learning rate, $E_{j}^{(l)}=\left(O_{j}^{*(l)}-O_{j}^{(l)}\right)^{2}/2$ is the output error, $O_{j}^{*(l)}$ is the real (measured) output value, $O_{j}^{(l)}$ is the calculated output value, $I_{j}^{(l)}$ is the net input of node j of layer l, and f(x) = 1/(1 + exp(−x)) is the sigmoid activation function.
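The update rules of Eq. (7), together with the momentum term and the [0, 1] scaling described above, can be sketched as a small three-layer BP network in plain Python. This is an illustration under stated assumptions, not the authors' implementation: it uses a single output node, per-sample (online) updates, thresholds subtracted from the weighted input sum (which yields the sign of Δa in Eq. (7)), and random initial weights in [−0.5, 0.5]. The default learning rate 0.3 and momentum 0.9 are the values quoted for the A3 steel example; all class and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Activation f(x) = 1 / (1 + exp(-x))."""
    if x < -500:  # avoid math.exp overflow for very negative inputs
        return 0.0
    return 1.0 / (1.0 + math.exp(-x))


def minmax_scale(values: Sequence[float]) -> List[float]:
    """Scale data to [0, 1] before training (data normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant data cannot be scaled")
    return [(v - lo) / (hi - lo) for v in values]


class BPNetwork:
    """Three-layer BP network (inputs -> hidden -> one output) with momentum."""

    def __init__(self, n_in: int, n_hidden: int, eta: float = 0.3,
                 momentum: float = 0.9, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.eta, self.mom = eta, momentum
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.a1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.a2 = rng.uniform(-0.5, 0.5)
        # previous increments, reused by the momentum term
        self.dw1 = [[0.0] * n_in for _ in range(n_hidden)]
        self.da1 = [0.0] * n_hidden
        self.dw2 = [0.0] * n_hidden
        self.da2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        # net input I_j = sum_i w_ij O_i - a_j (threshold subtracted)
        hidden = [sigmoid(sum(w * v for w, v in zip(ws, x)) - a)
                  for ws, a in zip(self.w1, self.a1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) - self.a2)
        return hidden, out

    def train_step(self, x: Sequence[float], target: float) -> float:
        hidden, out = self.forward(x)
        # delta = dE/dO * f'(I), E = (target - out)^2 / 2, f'(I) = f (1 - f)
        d_out = (out - target) * out * (1.0 - out)
        d_hid = [d_out * w * h * (1.0 - h) for w, h in zip(self.w2, hidden)]
        # Eq. (7) increments plus momentum: dw(t) = -eta*delta*O + mom*dw(t-1)
        for j, h in enumerate(hidden):
            self.dw2[j] = -self.eta * d_out * h + self.mom * self.dw2[j]
            self.w2[j] += self.dw2[j]
        self.da2 = self.eta * d_out + self.mom * self.da2
        self.a2 += self.da2
        for j, dj in enumerate(d_hid):
            for i, v in enumerate(x):
                self.dw1[j][i] = -self.eta * dj * v + self.mom * self.dw1[j][i]
                self.w1[j][i] += self.dw1[j][i]
            self.da1[j] = self.eta * dj + self.mom * self.da1[j]
            self.a1[j] += self.da1[j]
        return 0.5 * (target - out) ** 2


def train(net: BPNetwork, samples: Sequence[Tuple[Sequence[float], float]],
          epochs: int) -> List[float]:
    """Return the mean output error of each epoch."""
    history = []
    for _ in range(epochs):
        err = sum(net.train_step(x, t) for x, t in samples) / len(samples)
        history.append(err)
    return history
```

Each input vector would hold the scaled main environmental factors and the exposure time, and the target the scaled corrosion quantity. Training several networks with different `n_hidden` values and comparing their final errors corresponds to the trial-and-error choice of the number of hidden nodes described above.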

To check the validity of the ANN algorithm, the corrosion data for A3 steel in the marine splash zone are used as an example. The number of hidden nodes, the learning rate, and the momentum coefficient are set to 4, 0.3, and 0.9, respectively. Figure 3(a) shows that the mean output error drops sharply in the first epochs and then decreases very slowly; after 100000 training epochs it reaches 9.7 × 10⁻⁴. Figure 3(b) compares the calculated and real output values after 100000 epochs. Points for which the calculated value equals the real value lie on the line y = x, and all data points in Figure 3(b) lie close to this line.

Figure 3

(a) Evolution of the mean square error; (b) comparison between the calculated and real output values.

Fracture mechanics predicts the failure of structures containing cracks. It uses analytical solid mechanics to calculate the driving force on a crack and experimental solid mechanics to characterize the material's resistance to fracture (Zhu et al., 2012). After the evolution of the corrosion pit depth has been predicted by the BP network, a fracture mechanics model is used to evaluate the residual life of the structure. The structure is assumed to be a plate of infinite length and finite thickness with a semicircular surface crack at the center of the plate. Taking the plastic zone into account,

(8)

where KI is the stress intensity factor, σ is the tensile stress, M is a correction coefficient, Q is the crack shape parameter, σy is the yield strength, α is the equivalent crack length, d is the corrosion pit depth, and L is the plate thickness.

Let d1 be the corrosion pit depth at the current time t1. If the pit depth reaches d2 at the time t2 at which $K_{I2}=K_{IC}$, where $K_{I2}$ is the stress intensity factor at t2 and $K_{IC}$ is the fracture toughness, then d2 is the critical corrosion pit depth and the residual life of the structure is t2 − t1.
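The search for the critical pit depth can be sketched as a root-finding problem: find t2 at which K_I, evaluated at the predicted pit depth, reaches K_IC. Because Eq. (8) is not reproduced here, the sketch takes two hypothetical callables, `pit_depth(t)` (for example, the trained network's prediction) and `stress_intensity(d)` (standing in for Eq. (8)), and assumes both increase monotonically so that bisection applies. The function name `residual_life` is illustrative.

```python
from typing import Callable


def residual_life(pit_depth: Callable[[float], float],
                  stress_intensity: Callable[[float], float],
                  k_ic: float, t1: float, t_max: float,
                  tol: float = 1e-9) -> float:
    """Return t2 - t1, where K_I(d(t2)) = K_IC, found by bisection on [t1, t_max].

    pit_depth: hypothetical pit-depth prediction d(t), assumed non-decreasing.
    stress_intensity: hypothetical stand-in for Eq. (8), assumed increasing in d.
    """
    def excess(t: float) -> float:
        return stress_intensity(pit_depth(t)) - k_ic

    if excess(t1) >= 0:
        return 0.0  # the critical pit depth has already been reached
    if excess(t_max) < 0:
        raise ValueError("K_IC is not reached before t_max")
    lo, hi = t1, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi - t1
```

For instance, with the toy functions d(t) = t and K_I(d) = √d, K_IC = 3 and t1 = 4, the critical depth is reached at t2 = 9 and the residual life is 5 time units.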

## 4 Conclusions

This paper introduces the overall architecture for the analysis of natural environment corrosion. Atmospheric and marine environmental data are analyzed by statistical methods. For small samples, the W method checks the normal distribution more reliably than the K-S method, but it applies only to the normal distribution (the logarithmic normal case is handled by a logarithm conversion) and cannot check other distributions. The K-S test is a universal test that can check any distribution whose distribution function is known. Although the expected values and variances of the population can be estimated from the distribution function, confidence intervals at different confidence levels give a more reliable estimate. For the atmospheric and marine corrosion data, the main corrosion environmental factors are identified by grey relational analysis and used as the input variables of the BP artificial neural network, which simplifies the network model and improves its reliability. After the relationship between corrosion pit depth and time has been predicted by the BP network, the residual life of the structure is obtained from a fracture mechanics calculation of the critical corrosion pit depth.