Damage detection in structural health monitoring using hybrid convolution neural network and recurrent neural network

ABSTRACT. The process of damage identification in Structural Health Monitoring (SHM) provides practical information about the current status of the inspected structure. The goal of the process is to detect damage by processing data collected from sensors and then identifying the differences between the damaged and the undamaged states. Various machine learning techniques have been applied to extract features or knowledge from vibration data; however, they require prior knowledge about the factors affecting the structure. In this paper, a novel method of structural damage detection is proposed using a hybrid convolution neural network and recurrent neural network. The convolution neural network is used to extract deep features, while the recurrent neural network is trained to learn the long-term historical dependencies in time series data. The proposed method, which combines two types of features (spatial and temporal), increases discrimination ability compared with a method that uses deep features only. Finally, the neural network is applied to categorize the time series into two states: undamaged and damaged. The accuracy of the proposed method was tested on a benchmark dataset from the Z24 bridge (Switzerland). The results show that the hybrid method provides a high level of accuracy in damage identification of the tested structure.


INTRODUCTION
Structural health monitoring (SHM) has proved to be an effective system that plays a vital role in ensuring the integrity and safety of a structure, as well as in detecting the development of structural damage in order to predict the remaining life cycle of civil infrastructure. Before implementing SHM, it is important to understand what it does: it processes experimental data that can be used to identify and classify structural damage levels. In the literature, existing vibration-based damage detection methods can be categorized as either data-driven or model-based [1,2]. Traditional model-based methods use dynamic characteristics of the structure, such as natural frequencies, mode shapes, or damping ratios, to detect damage [3,4,5,6]. However, to detect damage correctly, complicating factors that affect the structure, such as ambient vibration and temperature, have to be considered. In data-driven methods, there are two main approaches to the damage detection problem. The classic approach is based on solving inverse problems [7], and the more advanced one is based on pattern recognition or machine learning [8,9].

In recent decades, different applications of machine learning in SHM have been proposed [8,10,11,12]. All of them focus on the selection and extraction of damage-sensitive features, as well as on training the system to learn optimal classifiers. For example, Tran-Ngoc et al. [13] proposed employing the global search capacity of the Cuckoo Search (CS) algorithm to deal with the local minimum problems of the Artificial Neural Network (ANN) when detecting damage in a laboratory beam and a large-scale truss bridge. The core idea is that the proposed approach, referred to as ANNCS, combines the advantages of both: the fast convergence of the gradient descent technique used by ANN and the avoidance of local minima provided by the stochastic search technique of CS. Moreover, in their work, a vectorization technique was applied to reduce the dimension of the data. The results showed that ANNCS not only outperformed traditional ANN and metaheuristic algorithms in terms of accuracy but also significantly reduced the computational cost in comparison with CS. Auto-regressive models have been applied to time series data to extract damage-sensitive characteristics from their model coefficients [11]. Another popular machine learning method is the Support Vector Machine (SVM), which has been shown to give highly accurate results when dealing with small sets of data [10,15]. Other researchers used statistical features (which can be obtained from measurement data and used as inputs to an artificial neural network) and combined them with several algorithms to classify the damage levels [16,17].

Recently, with the rapid development of technology, a huge quantity of data, such as image data, text-based data and vibration data, can be mined. Moreover, modern algorithms such as deep learning can help to solve big data problems efficiently, owing to their ability to learn from and analyze massive amounts of data (for example, vibration data in structural health monitoring) without the need for manual feature extraction [18,19]. The convolution neural network (CNN) is an ANN algorithm that has been used for various tasks such as image classification [20], speech recognition [21], object recognition [22] and damage detection [23]. In the damage detection problem, the first step is to encode the time series data into images, followed by applying a CNN to localize and classify damage levels [19]. However, in the 1D problem, although applying a CNN can help to learn the inner characteristics of time series data, it cannot take into account the correlations among the different measured time series. Hence, this paper proposes a novel method combining CNN and RNN to overcome this limitation of CNN in identifying optimal features.

Convolution Neural Network for categorization of Time-Series data
Convolution Neural Network (CNN) was first proposed by LeCun et al. [20] as a deep learning model. A CNN architecture has two main types of layers, convolution and pooling [22], which can be connected to further fully connected layers. From these two layers, the feature maps of the CNN can be extracted in the form of 2D matrices. One of the main advantages of CNN is its ability to learn pertinent characteristics from the provided data together with parameter sharing; hence the computational cost of a CNN is significantly lower than that of other classes of neural network. Commonly, a CNN has a 2D matrix as input data, but a modified model called 1D CNN has been proposed in image processing. It can use a one-dimensional array as input data while retaining all the existing advantages of CNN. Recently, many researchers have shown that 1D CNN can be advantageous when dealing with time series data in SHM, reducing the computational cost because 1D CNN only uses array operations for the calculation of forward and backward propagation. Moreover, a shallow 1D CNN architecture can be trained and implemented easily and effectively to learn the required function in time series problems.

The 1D CNN architecture for damage detection used in this paper is adopted from [24]. It consists of two main layers, the convolution and pooling layers, for the extraction of the relevant features. The extracted features are then classified by fully connected layers as desired. For each layer, the forward propagation is calculated by Eqn. (1). The back-propagation algorithm is used to train the network by computing the gradient of the loss function $E(y)$ with respect to the weights of the CNN. The derivative of the error with respect to each weight is calculated by Eqn. (2). Each weight is then updated from the computed gradients of the layers, scaled by the learning rate, to give the weight $w_{ik}^{l*}$ of the next iteration. Details of the calculation can be found in [23].
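As an illustration of the operations described in this section, the following minimal sketch applies one 1D convolution, a ReLU activation and max pooling to a short signal, followed by one gradient descent weight update. It is not the architecture of [24]: the function names, the toy signal, the kernel, and the gradient values are all hypothetical and chosen only for readability.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation form, as is usual in CNNs)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise nonlinear activation."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def sgd_update(weights, grads, learning_rate):
    """Gradient descent step: new weight = weight - learning_rate * gradient."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]


# Toy vibration-like signal and kernel (hypothetical values).
signal = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0]
kernel = [1.0, 0.0, -1.0]

feature_map = relu(conv1d(signal, kernel))   # convolution + activation
pooled = max_pool1d(feature_map, 2)          # pooling layer

# One weight update with made-up gradients, standing in for back propagation.
new_kernel = sgd_update(kernel, [0.5, 0.0, -0.5], learning_rate=0.1)
```

Because the same kernel is reused at every position of the signal, the number of trainable weights does not depend on the length of the time series, which is the parameter sharing referred to above.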

Recurrent Neural Network
Recurrent neural network (RNN) is a class of ANN in which the outputs from neurons are used as feedback to the neurons of the previous layer. RNN has been shown to have several advantages in data processing: it can process input data of any length; the model size does not increase as the number of inputs increases; the calculation can make use of past information; and the weights are shared throughout the processing. Fig. 1 presents a common RNN structure. In an RNN, the hidden state at time $t$, $h_t$, is calculated by Eqn. (4), where $h_t$ is the hidden state at time $t$, $x_t$ is the input at time $t$, and $f$ is a nonlinear function such as the hyperbolic tangent (tanh) or ReLU. For the first hidden state, the initial $h_{t-1}$ is set to zero. $o_t$ is the output at time $t$, computed from the hidden state. Since the RNN has only one output, that output is able to learn from all the information fed in through the given inputs. The RNN simply combines the state information from the previous time step with the input from the current time step to generate the state information and output for the current time step.

Weights and biases are updated according to the gradient of the loss function. The gradients are calculated recursively from the output layer towards the input layer, so the gradient at the input layer is the product of the gradients of the subsequent layers. If the values of those gradients are small, the gradient at the input layer (which is the product of multiple small values) will be much smaller still, resulting in insignificant updates to the weights and biases of the initial layers of the RNN and effectively halting the learning process.
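The recurrence and the shrinking gradient described above can be sketched with a scalar RNN. The snippet assumes the common Elman-style form $h_t = \tanh(w_x x_t + w_h h_{t-1} + b)$, which may differ in detail from Eqn. (4); the weights `w_x`, `w_h`, the bias `b` and the sinusoidal input are hypothetical.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Scalar recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), with h_0 = 0."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def gradient_through_time(states, w_h):
    """Derivative of the last hidden state with respect to the initial state.

    By the chain rule each step contributes a factor w_h * (1 - h_t**2),
    so the result is a product of one factor per time step.
    """
    grad = 1.0
    for h in states:
        grad *= w_h * (1.0 - h * h)
    return grad


# Hypothetical weights and a toy input sequence.
w_x, w_h, b = 0.8, 0.5, 0.0
short_seq = [math.sin(0.3 * t) for t in range(5)]
long_seq = [math.sin(0.3 * t) for t in range(50)]

grad_short = gradient_through_time(rnn_forward(short_seq, w_x, w_h, b), w_h)
grad_long = gradient_through_time(rnn_forward(long_seq, w_x, w_h, b), w_h)
```

With these values every factor is at most 0.5 in magnitude, so `grad_long` is many orders of magnitude smaller than `grad_short`, which is the vanishing gradient effect described in the text.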

Proposed method
In the proposed method, vibration-based time series data are used as the input to the architecture. As explained above, when time series data are the input, CNN alone is unable to identify the correlations in the measured data. To solve this problem, the CNN extracts the required features while the classification of the features is performed by the RNN. The framework of the method is illustrated in the accompanying figure.

The input time-series data are given to the convolution neural network to extract the required features automatically. The CNN is built from 1D layers for training and testing, obtained by replacing all the two-dimensional layers with one-dimensional ones. In the network, the convolution is performed by sliding the kernel along the time series and multiplying the kernel's elements by the corresponding samples. The products are summed, and a nonlinear activation function is then applied to the resulting value. This generates activation maps, which are the spatial features of the data. However, once passed through fully connected layers, these features are time-independent, so on their own they are not suited to time-series data. This is why the RNN is applied, thanks to its ability to capture time-dependent features. These features are then fed to the softmax activation function for the classification of damage states.
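The data flow of the hybrid approach (1D convolution, recurrent layer, softmax) can be sketched as below. This is an untrained toy forward pass under stated assumptions, not the network used in this paper: the number of kernels, the hidden size, the random weights, and helper names such as `classify` and `rand_matrix` are hypothetical.

```python
import math
import random


def conv_relu_pool(signal, kernel, pool):
    """One 1D convolution channel followed by ReLU and max pooling."""
    k = len(kernel)
    conv = [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]
    act = [max(0.0, v) for v in conv]
    return [max(act[i:i + pool]) for i in range(0, len(act) - pool + 1, pool)]


def rnn_last_state(steps, w_in, w_rec):
    """Run a tanh RNN over the feature vectors and return the final hidden state."""
    hidden = len(w_rec)
    h = [0.0] * hidden
    for x in steps:
        h = [math.tanh(sum(w_in[u][c] * x[c] for c in range(len(x)))
                       + sum(w_rec[u][v] * h[v] for v in range(hidden)))
             for u in range(hidden)]
    return h


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(signal, kernels, w_in, w_rec, w_out, pool=2):
    """CNN features -> RNN over pooled time steps -> softmax over two states."""
    channels = [conv_relu_pool(signal, k, pool) for k in kernels]
    steps = [list(col) for col in zip(*channels)]  # one feature vector per time step
    h = rnn_last_state(steps, w_in, w_rec)
    logits = [sum(w * v for w, v in zip(row, h)) for row in w_out]
    return softmax(logits)


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


n_kernels, hidden = 3, 4
kernels = rand_matrix(n_kernels, 5)
w_in = rand_matrix(hidden, n_kernels)
w_rec = rand_matrix(hidden, hidden)
w_out = rand_matrix(2, hidden)            # two classes: undamaged, damaged

signal = [math.sin(0.2 * t) for t in range(64)]   # toy vibration record
probs = classify(signal, kernels, w_in, w_rec, w_out)
```

The key design point is that the pooled feature maps are kept as a sequence over time and handed to the recurrent layer, rather than being flattened into a single time-independent vector.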

EXPERIMENTAL RESULTS
To validate the proposed method, two sets of experimental data are used: one from the Z24 Bridge in Bern, Switzerland, and the other from Los Alamos National Laboratory (LANL) for a three-floor structure.

Z24 Bridge
The first set of data used for testing the proposed method is taken from the monitoring of the Z24 Bridge in Switzerland. The bridge was part of the road connection between Bern and Zürich. Z24 is a prestressed bridge with a main span of 30 m and two side spans of 14 m (Fig. 3). The bridge abutments consisted of triple concrete columns connected to the girder with concrete hinges. Both intermediate supports were concrete piers clamped into the girder. The monitoring data of Z24 are considered a benchmark example and have been widely used in the literature, for example in [27,28].

For our study, two states of the structure are considered: undamaged and damaged. Progressive damage tests were performed on the bridge, and five damage scenarios from these tests are chosen as the main set of data for training. Firstly, a hinge is added to one pier to create varying damage in the pier foundation settlement. The pier is then lowered by 95 mm, which causes visible cracking in the pier structure. The hinge is then removed, followed by simulated spalling of concrete over 24 m² in the east abutment. A cut in the concrete connection between one pier column and the box girder is made to create failure of the concrete hinge. In the last scenario, four anchor heads of the pretension system are removed completely. Further details of the damage tests can be found in [25]. The descriptions of the five damage scenarios are summarized in Tab. 1.

The chosen data are randomly separated into two sets: training and testing. The training set consists of 70% of all the data, and the remaining 30% are used for testing. During the training process, features and labels are given for the training data set. The goal is to capture the relationship between the features and the class labels. In total, the CNN consists of three layers and takes 10,000 time points of time series data as input. The layers consist of 512, 256 and 128 nodes, respectively [20].
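A random 70/30 separation of this kind can be sketched as follows. The helper `train_test_split`, the fixed seed, and the 100 placeholder samples are assumptions for illustration; the actual splitting procedure used for the Z24 data is not specified beyond the proportions.

```python
import random


def train_test_split(samples, labels, train_fraction=0.7, seed=0):
    """Shuffle indices once, then cut them into training and testing parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(indices)))
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]
    return train, test


# Placeholder data: 100 hypothetical records, label 0 = undamaged, 1 = damaged.
samples = [[float(i)] for i in range(100)]
labels = [i % 2 for i in range(100)]

train_set, test_set = train_test_split(samples, labels)
```

Shuffling indices rather than the data keeps each sample paired with its label, and fixing the seed makes the split reproducible between runs.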
The performance of the combined CNN-RNN network for scenario 1 is shown in Fig. 5. In Fig. 5, the training accuracy is stable, but the test accuracy is unstable. Specifically, in the first 60 epochs the test accuracy fluctuates strongly, whereas in the subsequent epochs it becomes more stable. The final model is stable and the accuracy has converged: the training accuracy reaches 68% and the test accuracy reaches 67%. The model is evaluated on the testing set. Positive and negative detections are used to evaluate the accuracy of the method. The accuracy of the model is the proportion of correctly classified samples over the total number of samples, as in Eqn. (6) below:

$$\text{Accuracy} = \frac{\text{TruePositive} + \text{TrueNegative}}{\text{TruePositive} + \text{TrueNegative} + \text{FalsePositive} + \text{FalseNegative}} \tag{6}$$
When the data are unbalanced, the results can also be evaluated using the F-measure [26], which is computed from Recall and Precision, where Recall = TruePositive/(TruePositive + FalseNegative) and Precision = TruePositive/(TruePositive + FalsePositive).

The remaining 30% of the data used for testing consist of 174 samples for each scenario, of which 96 samples are in the undamaged state and 78 samples are in the damaged state. There are five damage scenarios in total. The accuracy obtained for each scenario is shown in Tab. 2.

The results in Tab. 2, calculated using Eqn. (6), show that the accuracy of damage detection is very high for scenarios 1, 2 and 4, while it is low for scenarios 3 and 5, although in the latter cases the accuracy is high for the undamaged state. This shows that one model cannot be applied to all the data. The results also validate that the integration of RNN into CNN helps to capture the temporal features, thus significantly improving the effectiveness and accuracy of damage detection.
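For reference, the accuracy of Eqn. (6) and the F-measure can be computed from the four detection counts as in the sketch below. The F-measure is assumed here to be the usual harmonic mean of Precision and Recall (the F1 score). The counts are hypothetical: they are chosen only to match the stated class sizes of 96 undamaged and 78 damaged test samples and are not results from Tab. 2.

```python
def accuracy(tp, tn, fp, fn):
    """Correctly classified samples over all samples, as in Eqn. (6)."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Share of predicted-damaged samples that are truly damaged."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of truly damaged samples that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (assumed F1 form)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2.0 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, with "damaged" treated as the positive class:
# 78 damaged samples = 70 detected + 8 missed,
# 96 undamaged samples = 90 correct + 6 false alarms.
tp, fn, tn, fp = 70, 8, 90, 6

acc = accuracy(tp, tn, fp, fn)
f1 = f_measure(tp, fp, fn)
```

Unlike accuracy, the F-measure ignores true negatives, which is why it is more informative when one class dominates the test set.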

Three-floor structure from Los Alamos National Laboratory (LANL)
The second set of data to be tested is from measurements of a three-floor structure at Los Alamos National Laboratory (LANL). The structure consists of aluminum columns and plates connected by bolted joints to form three stories, with built-in rails for slides that allow horizontal translation. Different damage scenarios are simulated by adjusting the column in the middle of the top floor. On each floor there are five sensors to measure the vibration during the experimental test. The details of the structure and the data used in our case study are described in [29].

The CNN model not only detects the damage state in the data; it can also extract spatial features, which can indicate the level of damage [30]. Fig. 8 presents the feature maps learned by the CNN. The color in the figure represents the level of damage. Channels 1 to 3 are mostly blue, which indicates a low level of damage, whereas channels 4 and 5 appear in hot colors, which indicates a higher level of damage. This agrees with the experiment, since sensors 4 and 5 are located near the damage source.

As in the analysis of the Z24 bridge data above, Eqns. (6) and (7) are used to evaluate the accuracy of the proposed method. The results show that the proposed method accurately classified a total of 97 undamaged samples and 86 damaged samples, while 38 undamaged samples and 34 damaged samples were misidentified. This results in an accuracy of 75% and an F-measure of 76%. For this set of data, a comparison was made between the traditional CNN method and the proposed method. Fig. 9 shows the differences in testing accuracy between the CNN and CNN-RNN methods in undamaged (left) and damaged (right) detection. As shown in Fig. 9, the CNN method accurately identified a total of 122 undamaged samples and 82 damaged samples. Our proposed method therefore gives a more accurate result than the existing CNN in the damaged scenario, which can be attributed to the fact that the CNN method is unable to learn the temporal relations in time-series data. However, it is not as effective in the undamaged scenario.

CONCLUSION
In this paper, a novel damage detection method using a combination of convolution neural network and recurrent neural network is proposed. We exploited the advantages of CNN to extract features from time series data and to generate deep features automatically by using convolution and pooling layers. The RNN learns the correlations among the features extracted by the CNN in order to classify the data as desired. Two sets of experimental data, from the Z24 Bridge and LANL, were used to test the validity of the proposed method. For the Z24 bridge, five damage scenarios consisting of lowering of the pier, hinge restored, spalling of concrete at soffit, failure of the concrete hinge, and failure of four anchor heads are taken into account, whereas in the LANL experiment, 126 samples of the damaged state and 129 samples of the undamaged state are considered. The results show that our proposed method is able to identify damage with a high level of accuracy. However, this work has not yet addressed the determination of damage locations or damage levels in the structure. Further research can also be conducted to increase the effectiveness of this method.