Wavelet-based multiresolution analysis coupled with deep learning to efficiently monitor cracks in concrete

ABSTRACT. This paper proposes an efficient methodology to monitor the formation of cracks in concrete using non-destructive ultrasonic testing of a structure. The objective is to automatically detect the initiation of cracks early enough, i.e. well before they are visible on the concrete surface, so that adequate maintenance actions can be implemented on civil engineering structures. The key element of this original approach is the wavelet-based multiresolution analysis of the ultrasonic signal received from a sample or specimen of the studied material subjected to several types of loading. This analysis is then coupled to an automatic scheme for identifying the types of cracks, based on artificial neural networks (ANNs) and in particular on deep learning with convolutional neural networks (CNNs), a technology currently at the cutting edge of machine learning, particularly for pattern recognition applications. Wavelet-based multiresolution analysis does not add any value in detecting fractures in concrete that are already visible by optical inspection. However, the results of its implementation coupled with different CNN architectures show that cracks in concrete can be identified at an early stage with very high accuracy, i.e. around 98%, and a loss function of less than 0.1, regardless


INTRODUCTION
Since its creation, concrete, the flagship material of civil engineering structures, has been the basis of construction techniques, whether industrial (factories, warehouses, ...) or hydraulic (dams, dikes, ...), but also of infrastructure such as transportation (bridges, tunnels, ...) or urban infrastructure (aqueducts, ...). This success is due to several factors: concrete is an economical material, easy to work with, resistant to compressive stress, durable, sound and heat insulating, and it contributes to architecture through the shapes, textures and colors it provides [1]. When concrete is combined with reinforcement, usually made of steel, the resulting reinforced concrete compensates for the low tensile strength of plain concrete.
The service life of reinforced concrete structures is conditioned by their response to chemical (e.g. carbonation, corrosion), physical (e.g. freeze-thaw cycles) and mechanical (e.g. overloading) attack from the environment, as well as by the capacity of the constituent materials to protect themselves against such attack. In this work, we are interested in the corrosion of reinforcement, which is one of the main causes of degradation of reinforced concrete structures [2]. This corrosion induces a modification of the steel-concrete bond, a reduction in the cross-section of the steel bars, a decrease in the ductility of the steel, and peripheral damage to the concrete due to the pressure of the corrosion products. More precisely, Bhaskar et al. showed that when the porous zone at the interface between steel and concrete (a zone whose volume depends on the surface of the reinforcement, the water/cement mass ratio and the degree of hydration) is completely filled by the corrosion products, pressure is exerted on the concrete cover and can generate cracks [3]. As shown in Figure 1, monitoring the propagation of these cracks is therefore essential to stop them from reaching critical sizes that could reduce the bearing capacity of the concrete or reinforced concrete structure, or even lead to its failure [4][5][6][7][8]. This research topic is still of great interest, and recent studies have shown that it is possible to limit interfacial micro-cracks in concrete and its composites subjected to dynamic loads, for instance by adding fly ash and/or silica fume at a rate of a few tenths of the weight of the cement [7,8].
Visual inspection of civil engineering structures is now complemented by high-performance, high-definition scanner or photogrammetry surveys, and artificial intelligence (AI), in particular deep neural networks, can detect defects, classify them and propose a diagnosis. The Internet of Things (IoT) and new generations of sensors (e.g.
fissurometers, inclinometers) make it possible to instrument infrastructure and continuously monitor a number of structural health indicators from 24/7 control centers [6]. There are many methods to evaluate the detection of anomalies in materials or components of a civil engineering structure. Non-destructive testing (NDT), which is a long-standing, common and mandatory practice in many industries such as aeronautics or aerospace, is an important category [10]. NDT methods consist in causing a disturbance in the material to be studied, here of an ultrasonic nature, and recording its response. This response to the ultrasonic excitation is a function of the state of the material or the component of the structure to be controlled. These techniques are therefore important tools to help detect cracks, characterize the degree of corrosion of reinforcement in reinforced concrete, determine the thickness of concrete slabs, etc. [11,12]. However, a critical point of this type of method is the extraction of relevant information on the state of the material from its response. Multiresolution analysis (MRA) is the original method implemented in this work to extract this key information by decomposing the signals at different levels of resolution. In particular, we will use wavelets, i.e. an extension of Fourier analysis, as an analytical tool to mathematically describe the increment of information required to move from a coarser approximation of the material response to a higher resolution approximation. Wavelet-based multiresolution analysis, which has received significant attention in recent years in various fields, is therefore a powerful tool for efficiently representing signals and images at multiple levels of detail [13,14]. The last key point of the work described in this paper is to build a classifier to detect cracks from the images obtained at the spatial scale. 
In this regard, an automatic crack type identification scheme, based on artificial neural networks (ANNs), is proposed. Crack detection techniques based on deep ANNs, i.e. deep learning, are currently under active research due to their renowned outstanding performance [15]. In particular, some authors have recently proposed improved convolutional neural networks (CNNs) that can extract crack patches in an image with 99% accuracy [16]. The structure of this article is as follows. First, we will point out the fundamental concepts, as well as the experimental procedures, associated with each of the three key points of the method proposed here: non-destructive ultrasonic testing to obtain an ultrasonic signal identifying the defect; multiresolution wavelet-based analysis to preserve the important elements of the signal, i.e. the cracks, at high resolution and produce a scalogram localizing the defect; and finally classification by CNNs. The results obtained will then be analyzed and discussed. Finally, we will emphasize the originality of this work, namely the multiresolution analysis based on wavelets, as input to the deep neural network, which allows us to obtain a high level of classification accuracy, independently of the chosen CNN architecture.

Research significance
This work addresses the important problem of detecting the onset of cracks inside concrete structures. These cracks are invisible from the outside and may propagate unexpectedly until the structure fails. In sensitive infrastructure such as nuclear power plants, dams or bridges, a concrete failure can lead to very serious disasters. Although such disasters remain unusual, each occurrence can have serious human, environmental and technical consequences. This is why it is important to have a protocol for detecting and monitoring cracks in their early stages in order to secure structures of vital interest. The cost of the proposed protocol is also relevant for evaluating its practical implementation in the field. This cost is low, since all that is required is an on-site portable ultrasonic device and an ordinary processor, either a DSP board or a laptop computer, because the deployed architecture has already been trained to detect and track possible internal cracks or crack initiation. The instrument used is the Pundit PL-200, which performs ultrasonic pulse velocity tests to examine the quality of concrete, i.e. to estimate its compressive strength or to measure the surface velocity and the depth of cracks. Our software supports settings that are directly accessible in real time from the measurement screen, and it runs either on an electronic board with a DSP or directly on a laptop. The overall cost of this instrumentation is about $5,000.

Methodology
The objective of the proposed methodology is the detection and monitoring of internal cracks in concrete structures. Such cracks are detected by ultrasonic NDT and analyzed by the wavelet transform, which provides a spatially scaled image that localizes the crack in space at each resolution. The resulting multiresolution image is then subjected to a deep learning-based crack/non-crack classification process (AlexNet, VGG16). The methodology comprises three main steps (see Figure 2), which are detailed in the rest of this section. The first step consists of performing non-destructive ultrasonic testing (NDT) on aging concrete samples of different compositions in order to collect the received ultrasonic signal. In the second step, a wavelet-based multiresolution analysis is conducted on the received ultrasonic signal. This is the key step of the work presented in this article, since it highlights the initiation of cracks within the material. From the multiresolution analysis, a B-scan mapping is then obtained. Finally, the obtained image is the input of a deep learning algorithm based on convolutional neural networks (CNNs). Two well-known architectures from the literature, namely AlexNet and VGG16, are tested. AlexNet won the ImageNet competition in 2012, and VGG16 was among the top-ranked entries of the same competition in 2014. These two networks were used in our experiments because they were instrumental in the rapid emergence of deep learning, and they form the basis for evaluating the performance of our approach for crack detection in concrete structures. Other neural networks used in pattern recognition are as powerful as AlexNet and VGG16, and possibly more so. However, the goal of this work is not to find the network that gives the best accuracy in detecting an internal crack in concrete from optical images. Rather, the aim is to demonstrate that, with wavelet-based multiresolution analysis, the detection of a crack in concrete at an early stage is very accurate independently of the type of deep learning architecture used.

EXPERIMENTAL PROCEDURE
Preparation of concrete specimens
In this section, the four main steps in the preparation of concrete specimens are detailed, i.e. fabrication, casting, curing and end grinding. As shown in Figure 3a, the following raw materials were used to manufacture the concrete specimens:
• cement (Sour El-Ghozlane cement plant in Algeria) dosed at 350 kg/m³;
• sand (Bou-Sâada sand in Algeria) characterized by a grain size between 200 µm and 500 µm;
• three types of gravel, i.e. grain sizes 3/8 mm, 8/15 mm and 15/25 mm.
Five batches of standardized size specimens, corresponding to five different concrete mixes (see Table 1), were made. Each batch is composed of one cubic specimen of 150 mm side and six cylindrical specimens of 160 mm diameter and 320 mm height. Each of the thirty-five specimens dedicated to this study was cast in a galvanized steel mold on a vibrating table (see Figure 3b). The final mass of a concrete specimen is 15 kg. As shown in Figure 3c, each specimen was stored at 20 °C and 98% humidity in a curing chamber for 28 days to simulate the fabrication of concrete columns. Finally, as shown in Figure 3d, a "Deluxe Hi-Kenma TSURU-TSURU" end grinding machine from the manufacturer MARUI & CO., LTD. was used so that each concrete specimen had a smooth surface, i.e. low roughness after machining.

Ultrasonic tests on concrete specimens
There are many standardized NDT processes, which can be classified into two categories: the first makes it possible to evaluate the strength and its variation over time; the second evaluates characteristics other than strength (e.g. dimensions of structural elements, corrosion, dampness, etc.) [10], [17][18][19][20]. In the first category, we find sclerometric methods [21] (whether static or dynamic), acoustic methods [22,23] (e.g. ultrasound) and pull-off methods [24], which are semi-destructive. In the second category, the techniques are much more numerous: we can mention, among others, acoustic methods [25,26] (e.g. acoustic emission, echo, ultrasonic, impact-echo), electromagnetic methods [27] (e.g. continuous wave eddy current testing), physical methods [28] (e.g. methods based on measuring thermal properties, electric methods such as linear polarization resistance) and radiological methods [29] (e.g. techniques based on X-rays). Unfortunately, civil engineering cannot benefit from all the technical advances of NDT in the mechanical industries, because the nature of the materials used and the concerns differ. Unlike metals, concrete is a composite material that inherently contains a large number of defects in the form of small cavities, pores and gaps. It is also a material whose mechanical properties are not rigorously reproducible, even under the best conditions. Moreover, these properties degrade more or less rapidly over time due to increased service loads, climatic conditions, alkali-aggregate reaction, etc. Therefore, only acoustic techniques, infrared thermography, penetrant testing and corrosion rate measurement (e.g. linear polarization) methods are generally used in civil and mechanical engineering. Many authors have designated acoustic NDT methods, used individually (e.g. ultrasonic tomography or impact-echo) or in combination (e.g. impulse response combined with impact-echo), as particularly well suited for testing structures or building materials, especially concrete.
In this article, we have implemented an acoustic method based on the measurement of the ultrasonic pulse velocity. This type of method is suitable for the study of concrete consistency, discontinuities, cracks and crack depth, and for the determination of Poisson's ratio and Young's modulus with reasonable accuracy, but it is not reliable on its own for strength determination [30]. The compressive strength can nevertheless be estimated by correlating it with the ultrasonic pulse velocity. The equipment used to perform the tests is the Pundit® PL-200 from Proceq (see Figure 4). Two P-wave ultrasonic pulse velocity transducers with a frequency of 54 kHz are used. The pulse voltage is between 100 Vpp and 400 Vpp. The pulse echo range is from 0.1 µs to 1,200 µs. A 7-inch, high-resolution 800 × 480 pixel touch screen is available with the equipment to analyze the measured waveforms.
To determine the compressive strength, a 2,000-3,000 kN one-piece compression testing machine from 3R is used (see Figure 5).
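The pulse-velocity measurement described in this section reduces to simple arithmetic, sketched below under stated assumptions. The function names, the 80 µs transit time, the density (2,400 kg/m³) and the Poisson's ratio (0.2) are hypothetical values chosen for illustration, not measurements from this study; the modulus relation is the standard one for P-waves in an isotropic elastic solid.

```python
def pulse_velocity(path_length_m, transit_time_us):
    """Ultrasonic pulse velocity (m/s) from path length (m) and transit time (µs)."""
    if transit_time_us <= 0:
        raise ValueError("transit time must be positive")
    return path_length_m / (transit_time_us * 1e-6)


def dynamic_modulus(velocity_m_s, density_kg_m3, poisson_ratio):
    """Dynamic Young's modulus (Pa) from the P-wave velocity.

    Uses E = rho * V^2 * (1 + nu) * (1 - 2 nu) / (1 - nu), valid for an
    isotropic, homogeneous elastic medium (an idealisation for concrete).
    """
    nu = poisson_ratio
    return density_kg_m3 * velocity_m_s ** 2 * (1 + nu) * (1 - 2 * nu) / (1 - nu)


# Hypothetical reading across the 320 mm height of a cylindrical specimen.
v = pulse_velocity(0.320, 80.0)      # assumed transit time of 80 µs
E = dynamic_modulus(v, 2400.0, 0.2)  # assumed density and Poisson's ratio
print(f"velocity = {v:.0f} m/s, dynamic modulus = {E / 1e9:.2f} GPa")
```

A drop in the measured velocity along a fixed path is the quantity of interest for monitoring, since a crack lengthens the effective travel path of the pulse.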

Wavelet-based multiresolution analysis
Wavelets are an analytical tool for describing mathematically the increment of information required to go from a coarse approximation to a higher-resolution approximation. Through a multiresolution analysis (MRA), a signal can be decomposed and reconstructed as a series of approximations of decreasing scale, completed by a series of details [13]. To illustrate this concept, consider an image built from a succession of approximations; the details enrich this image. Thus, thanks to wavelet-based MRA, the coarse view becomes finer and more precise. Engineers, practitioners and researchers are confronted daily with increasingly difficult technological problems at multiple scales of analysis, in terms of classification, segmentation, detection of contours or parameters of interest, noise reduction or elimination, compression for transmission or storage, synthesis or reconstruction, etc. This concerns many fields such as astrophysics [31], finance [32], fluid mechanics [33], thermodynamics [34], medicine and biology [35][36][37], multimedia [38], telecommunications [39,40], signal and image processing, and of course, the monitoring of cracks and detection of fractures in materials [41][42][43][44][45][46]. Wavelet-based MRA can thus become an essential tool for solving the difficulties encountered in these fields. This tool, sometimes described as miraculous, produces an immediate, easily interpretable and exploitable result. However, for specific applications requiring the extraction of targeted information, advanced methods will have to be developed and "merged" in order to make effective use of existing techniques or to optimize the analyses (for example in compression) by taking into account edges or contours, using second- and third-generation wavelets such as ridgelets, curvelets [47], contourlets [48], bandelets [49], etc.
Indeed, these anisotropic wavelets are automatically oriented and elongated to follow the geometry of a given edge or contour. This conceptualization of MRA is comparable to a camera that moves closer to a subject, or uses a zoom lens, to distinguish details, and moves further away to capture larger features: the famous concept of the mathematical microscope. The principle of wavelet-based MRA is illustrated in Figure 6. Three levels of resolution are considered here. At the first level of resolution, the signal $S$ is decomposed into an approximation $A_1$ and a detail $D_1$. At the second level of resolution, the approximation $A_1$ is decomposed into an approximation $A_2$ and a detail $D_2$. Finally, at the third level of resolution, the approximation $A_2$ is in turn decomposed into an approximation $A_3$ and a detail $D_3$. Thus, the signal $S$ can be expressed as shown in (1):
$$S = A_3 + D_3 + D_2 + D_1 \qquad (1)$$
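The three-level decomposition just described can be checked numerically. The sketch below is a minimal illustration that assumes the Haar wavelet (the text does not state which wavelet family was used) and a short hypothetical signal; all function names are ours. Each component is brought back to the original resolution so that the additive relation between the signal, its level-3 approximation and its three details can be verified sample by sample.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_analysis(x):
    """One cascade step: Haar low-pass / high-pass filtering, then decimation by 2."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_synthesis(approx, detail):
    """Inverse of haar_analysis: rebuild the signal at the next finer resolution."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def to_full_resolution(coeffs, level, is_detail):
    """Reconstruct the contribution of one coefficient set at the original length."""
    zeros = [0.0] * len(coeffs)
    x = haar_synthesis(zeros, coeffs) if is_detail else haar_synthesis(coeffs, zeros)
    for _ in range(level - 1):
        x = haar_synthesis(x, [0.0] * len(x))
    return x


def mra_components(signal, levels=3):
    """Return [A_levels, D_levels, ..., D_1], each as long as the input signal."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_analysis(approx)
        details.append(d)  # details[j - 1] holds the level-j coefficients
    comps = [to_full_resolution(approx, levels, False)]
    for j in range(levels, 0, -1):
        comps.append(to_full_resolution(details[j - 1], j, True))
    return comps


S = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # hypothetical samples
A3, D3, D2, D1 = mra_components(S)
rebuilt = [a + b + c + d for a, b, c, d in zip(A3, D3, D2, D1)]
print([round(value, 6) for value in rebuilt])
```

With the Haar wavelet, the coarsest approximation of an eight-sample signal is simply its mean, which makes the example easy to verify by hand; smoother wavelets would give smoother approximations but the same additive structure.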
Let $\psi(t)$ denote a reference pattern called the mother wavelet. It is generally required that $\psi(t)$ has jointly highly concentrated time and frequency supports. $\psi(t)$ satisfies (2), where $n$ controls the number of oscillations of $\psi(t)$:
$$\int_{-\infty}^{+\infty} t^{k}\,\psi(t)\,dt = 0, \qquad 0 \le k < n \qquad (2)$$
This relation means that $\psi(t)$ is orthogonal to polynomial components of degree less than $n$.
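The wavelet transform $W_X(u, s)$ of a signal $X$ at time $u$ and scale $s$ is defined by (3), where $\psi^{*}$ denotes the complex conjugate of $\psi$:
$$W_X(u, s) = \frac{1}{\sqrt{s}} \int_{-\infty}^{+\infty} X(t)\,\psi^{*}\!\left(\frac{t - u}{s}\right) dt \qquad (3)$$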
Looking closely at equations (2) […] Clearly, to reduce or eliminate redundancy, the family $\{\psi_{j,k}\}_{(j,k) \in \mathbb{Z}^2}$ must be an orthonormal basis of $L^2(\mathbb{R})$, where $L^2(\mathbb{R})$ denotes the vector space of one-dimensional measurable, square-integrable functions. This property of the wavelet makes it possible to obtain a fast wavelet transform. The fast wavelet transform is calculated by a cascade of low-pass filtering by $h$ and high-pass filtering by $g$, followed by a downsampling (or decimation) by a factor of 2 (see Figure 7). In Figure 7 […]
In this study, the scalogram of the investigative ultrasonic signal is used to determine and analyze cracks in concrete. The scalogram of the signal $x(t)$ can be defined using (5). Figure 9 shows an example of the scalogram of a signal representing three cracks in a concrete specimen, one of which (the central crack) is in an advanced state that could lead to an imminent rupture.
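To make the notion of a scalogram concrete, the sketch below computes one for a synthetic signal. It is only an illustration under stated assumptions: the scalogram is taken as the squared modulus of the wavelet transform, which is the usual convention; the mother wavelet is a real Mexican-hat function and the signal (a slow oscillation plus a sharp event at sample 40) is hypothetical; the wavelet actually used in this study is not specified in the text.

```python
import math


def mexican_hat(t):
    """Real-valued Mexican-hat mother wavelet (unnormalised), assumed for illustration."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt(x, scales):
    """Discretised wavelet transform W(u, s) for a unit sampling step.

    The wavelet is real, so the complex conjugate is the wavelet itself.
    Returns one row per scale and one column per position u.
    """
    n = len(x)
    return [
        [sum(x[k] * mexican_hat((k - u) / s) for k in range(n)) / math.sqrt(s)
         for u in range(n)]
        for s in scales
    ]


def scalogram(x, scales):
    """Energy map |W(u, s)|^2 over positions and scales."""
    return [[w * w for w in row] for row in cwt(x, scales)]


# Hypothetical signal: slow background oscillation plus a sharp event.
n = 64
signal = [0.1 * math.sin(2.0 * math.pi * k / n) for k in range(n)]
signal[40] += 1.0

scales = [1, 2, 4, 8]
energy = scalogram(signal, scales)
peak = max(range(n), key=lambda u: energy[0][u])
print("finest-scale energy peaks at sample", peak)
```

At the finest scale the energy concentrates at the position of the sharp event, while the coarser scales respond mainly to the slow oscillation; this separation is what allows a defect to be localized in the position-scale plane.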

Detecting cracks in concrete using deep neural networks
In recent years, artificial intelligence has become indispensable because of its groundbreaking innovations in many areas, including pattern recognition in construction and structural engineering [50]. Deep learning methods, which use consecutive hidden layers of information processing organized in a hierarchical manner, have become essential for representation, learning and classification. Ranked today among the ten most efficient and flexible deep learning techniques, convolutional neural networks (CNNs) are particularly well suited for tasks such as image recognition, image analysis, image segmentation, video analysis or natural language processing [51,52]. However, this type of machine learning requires a sufficiently large input database for training and testing to ensure the highest possible accuracy of the recognition process [53]. A CNN architecture is typically characterized by the presence of multiple convolutional blocks (each consisting of a convolution layer, an activation function and a pooling layer) and a fully connected layer [54]. A convolutional layer, which is a key element of the method, performs a convolution operation on the output of the previous layers using a set of filters, also called kernels, to extract the features that are important for classification, i.e. in this case the "crack" and "non-crack" classes.
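The convolutional block described above (convolution, activation, pooling) can be sketched in a few lines of plain Python. The image and the kernel below are hypothetical: a 6 × 6 image with a vertical edge and a hand-written edge-detecting kernel, whereas in a trained CNN the kernel values are learned. As in most deep learning libraries, the "convolution" is implemented as a cross-correlation without padding.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b]
             for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Rectified linear unit: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max-pooling: keep the maximum of each size x size window."""
    return [
        [max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(feature_map[0]) - size + 1, size)]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 6 x 6 image: dark left half, bright right half.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
# Hand-written kernel that responds to dark-to-bright vertical edges.
kernel = [[-1.0, 0.0, 1.0] for _ in range(3)]

features = max_pool(relu(conv2d_valid(image, kernel)))
print(features)
```

The 6 × 6 input becomes a 4 × 4 feature map after the 3 × 3 convolution and a 2 × 2 map after pooling, which illustrates how each block trades spatial resolution for a more compact description of the features.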
There are many CNN architectures, including AlexNet, VGG16, Inception and ResNet, and their performances are regularly compared by many authors [55,56]. In this article, two different pre-trained CNN models, i.e. AlexNet and VGG16, are evaluated. The objective is to demonstrate that the wavelet-based MRA is the key component of the proposed approach, which guarantees a very high level of classification accuracy independently of the type of CNN architecture used. Since its introduction in 2012, in the framework of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012), the AlexNet architecture has become very popular and has obtained good results in many computer vision applications [51], [57]. AlexNet is not a complex architecture when compared to other major CNN architectures, such as ResNet, that have emerged in recent years [58]. It is also easy to implement with TensorFlow and Keras. As shown in Figure 2, AlexNet consists of five convolutional layers that use kernels to scan the input image by performing convolution operations. The first two convolution layers use filters of size (11 × 11) and (5 × 5), respectively; the last three layers each use a (3 × 3) kernel. Some of these convolutional layers (i.e. the first two layers and the last layer) are followed by max-pooling (i.e. a subsampling operation usually applied after a convolutional layer, where the maximum values are taken). At this stage, the model is composed of more than 1.7 million parameters. Each convolution layer uses the rectified linear unit (ReLU) activation function. Unlike the sigmoid activation function, which is frequently used for a binary classification network, ReLU increases the non-linear properties of the decision function and of the overall network without affecting the receptive fields of the convolution layers.
At the output of the convolutional layers, a flattening step is necessary to create a single vector containing the main characteristics of the crack to be identified. AlexNet was initially intended to classify 1,000 categories; we have modified the architecture so that it handles only two possible classes, i.e. images with and without cracks. The architecture ends with three fully connected layers (the first two have 4,096 outputs each, and the last has only two) with a softmax classifier which, with only two possible labels, reduces in our work to a simple logistic regression. The three fully connected layers alone account for more than 18.8 million parameters. The network, with more than 20 million parameters in total, can then be trained. The Adam optimizer was used. Dropout is also applied: during training, each neuron is removed from the network with a probability of 0.5. Although dropout roughly doubles the number of iterations required for convergence, this step is essential to prevent overfitting of AlexNet.
VGG16 (the number 16 meaning that the architecture is composed of 16 weight layers) is a convolutional neural network model proposed by Simonyan and Zisserman [59] that achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. This architecture improves on AlexNet by replacing the large kernel filters (11 × 11 and 5 × 5 in the first and second convolutional layers, respectively) with multiple 3 × 3 kernel filters, one after another. As shown in Figure 2, the image to be classified goes through a stack of convolutional layers that use filters of size 3 × 3. The spatial padding of the input to the convolution layers is such that the spatial resolution is preserved after convolution. The spatial pooling is performed by five max-pooling layers, which follow some of the convolution layers (not all convolution layers are followed by max-pooling).
As with the AlexNet architecture, the ReLU activation function is used in the convolution steps. Three fully connected layers follow the stack of convolutional layers. The last layer is a softmax layer used to classify each image into the "crack" or "non-crack" class. Although the VGG16 architecture is very large and requires nearly eight times more parameters to be trained than the AlexNet architecture, it is easy to implement in current open-source software libraries for artificial neural networks.
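The remark that a two-class softmax classifier reduces to a logistic regression can be verified numerically. In the sketch below, the two logits are hypothetical outputs of the last fully connected layer; the softmax probability of the first class equals the sigmoid of the difference between the two logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function used in binary logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical logits produced by the final two-output layer.
z_first, z_second = 2.0, -1.0
probabilities = softmax([z_first, z_second])
equivalent = sigmoid(z_first - z_second)
print(probabilities[0], equivalent)
```

Only the difference between the two logits matters, which is why a single sigmoid output unit is an equivalent way to build a two-class classifier.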
To conduct this study, we used two computers connected in parallel, each equipped with a 9th-generation Intel Core i7 hexa-core microprocessor. Each computer is also equipped with a high-end NVIDIA GeForce RTX 2080 graphics processing unit (GPU) with the following main memory features: GDDR6 type; 8 gigabyte (GB) capacity; 14 Gbit/s speed; 448 GB/s bandwidth. Its throughput of 60 TOPS (tera operations per second) makes it possible to process the very large number of operations (up to a few billion for each image) required to compute neural networks. As for software tools, the open-source machine learning tool TensorFlow, developed by the Google Brain team, was used. This is now an essential tool for machine learning applications such as neural networks [60]. The convolutional neural network algorithms were implemented with the Keras library, using the Python programming language. Keras, which is used here as an interface for TensorFlow, was chosen for the ease with which it implements many functions and procedures, its modularity, and its extension capabilities.

MAIN RESULTS AND DISCUSSION
First, we tested the methodology on available image datasets of visually or optically observable cracks on the surface of concrete samples. For this purpose, we used a public database containing 4,800 manually labeled images of cracked and non-cracked concrete bridge decks [61]. 80% of these images were allocated to the training phase and 20% to validation. Regardless of the deep learning architecture implemented, the wavelet-based MRA does not add anything at this stage, as the accuracy levels obtained are those found in the literature. We then sought to demonstrate the relevance of wavelet-based multiresolution analysis for identifying the initiation of cracks in concrete, i.e. well before the fracture is visible on the surface of the material. For this purpose, a private database of B-scan mappings obtained by wavelet-based MRA was constructed from the 35 concrete specimens that we fabricated and aged through compression tests. For each concrete specimen, this database contains 40 images without cracks and 100 images representing several stages of aging, i.e. from the initiation of cracks in the core of the material, through their propagation, to the fracture of the specimen itself. In total, 4,900 images are available, each with dimensions of 120 pixels × 120 pixels × 3 color channels. For each of the two classes, i.e. "crack" and "non-crack", 80% of the images are assigned to the training phase and 20% to validation. Before training and validation, the images are normalized by subtracting their mean in order to have centered data. This ensures similar image characteristics and avoids uncontrolled gradients of the loss function with respect to the neural network weights during backpropagation. Figure 10 gives an illustrative example of the impact of wavelet-based multiresolution analysis combined with a simple deep learning architecture to automatically detect cracks in concrete long before they are visible by optical inspection.
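The dataset arithmetic and preprocessing described above can be sketched as follows. This is a minimal illustration with hypothetical function names; the per-class 80/20 split uses a seeded random shuffle, and the centering is done per image, which is one possible reading of "subtracting their mean" (the text does not say whether the mean is taken per image or over the whole dataset). Labels follow the convention 0 for "crack" and 1 for "non-crack".

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split indices so that each class keeps train_fraction of its images for training."""
    rng = random.Random(seed)
    train, validation = [], []
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        validation.extend(indices[cut:])
    return train, validation


def center(image):
    """Subtract the image mean so that pixel values are centered on zero."""
    flat = [p for row in image for p in row]
    mean = sum(flat) / len(flat)
    return [[p - mean for p in row] for row in image]


# 35 specimens, each with 100 "crack" images (label 0) and 40 "non-crack" images (label 1).
labels = [0] * (35 * 100) + [1] * (35 * 40)
train_idx, val_idx = stratified_split(labels)
print(len(labels), len(train_idx), len(val_idx))

centered = center([[1.0, 2.0], [3.0, 6.0]])  # tiny hypothetical single-channel image
print(centered)
```

Splitting per class keeps the 100:40 ratio between "crack" and "non-crack" images identical in the training and validation sets.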
We have of course tested both architectures shown in Figure 2. In Figure 10, the graph on the left shows the smoothed accuracy of the recognition of a crack as a function of the epochs.
The graph on the right shows the evolution of the associated cost function. The cross-entropy loss function, also known as the logarithmic loss function, is one of the most commonly used cost functions when adjusting model weights during training. We have used the binary cross-entropy loss function. It consists of comparing each predicted class probability with the desired 0 or 1 output, identifying "crack" or "non-crack" respectively, for the actual class. A score is then computed, penalizing the probability according to the distance between it and the expected value. In the case that we have implemented, the penalty function is logarithmic, which gives a high score for large differences close to 1 and a low score for small differences tending towards 0. For both training and validation, the results in Figure 10 show that fewer than 25 epochs are required for the smoothed accuracy and the loss function to converge. More precisely, the smoothed accuracy reaches a maximum of 97% during the training phase and 98% during the validation phase, thus confirming the great importance of the wavelet-based multiresolution analysis. The loss function, on the other hand, reaches a minimum lower than 0.1 during both training and validation, which means that the model is adequately fine-tuned.
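The logarithmic penalty described above can be illustrated with a short sketch of the binary cross-entropy. The function name, the clipping constant and the example probabilities are hypothetical; they are not taken from the experiments reported here.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


targets = [0, 1, 0, 1]
confident = [0.1, 0.9, 0.1, 0.9]   # close to the targets: small penalty
hesitant = [0.4, 0.6, 0.4, 0.6]    # far from the targets: larger penalty
print(binary_cross_entropy(targets, confident))
print(binary_cross_entropy(targets, hesitant))
```

Because the loss is the mean of the negative log-probabilities assigned to the correct class, a mean loss below 0.1 corresponds to a geometric mean of those probabilities above exp(-0.1), i.e. about 0.905.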

CONCLUSIONS
This paper reports on an original methodology that was implemented to efficiently detect crack initiation using non-destructive ultrasonic testing of elements of a concrete civil engineering structure. Compared to existing related studies in the literature, our main contributions are as follows:
• Proposal of a detection method at an early stage, i.e. well before the concrete fracture is visible on the surface, in order to implement appropriate maintenance actions and thus avoid the failure of the structure.
• A key element of this method is the wavelet-based multiresolution analysis (MRA) of the ultrasonic signal received from a sample or a concrete specimen subjected to several types of loading. The received ultrasonic signal is analyzed at each resolution (or scale) by wavelet transformation.
• The resulting image is squared to serve as input to an automatic crack type identification system based on deep learning by convolutional neural networks (CNNs).
Two architectures, chosen both for their ease of implementation in open-source platforms and libraries dedicated to machine learning and to limit the computational load, were tested. The purpose was not to optimize CNN architectures. If this had been the case, we would have chosen modular structures (e.g. ResNeXt, Xception, Channel Boosted CNN, etc.) based on auxiliary learners that use spatial information, feature map information or input channels to improve classification performance. The objective was rather to show that, with a multiresolution analysis based on wavelets, it is possible to detect crack initiation in concrete and that the accuracy of this detection is independent of the chosen CNN architecture. After aging concrete specimens in compression tests, we built a database containing nearly 5,000 B-scan mappings from wavelet-based MRA of specimens with and without crack initiation and propagation.
For both architectures implemented, the results show that the accuracy is around 98%. The loss function reaches values of less than 0.1, which means that both models are finely tuned. All these results support the relevance and efficiency of the approach described in this paper. It would be interesting if the non-destructive methodology proposed in this paper could be implemented on all or part of civil engineering structures, such as suspension bridges, reinforced concrete bridges with central cantilever spans or masonry railway viaducts, that require permanent remote monitoring in order to prevent failures that would jeopardize the safety and performance of the structure itself. Remote monitoring should not in any case replace visual or optical surveillance of structures, which remains the basis of monitoring. However, deep learning algorithms are of indisputable relevance for remote monitoring, especially when many images or videos showing the state of health of structures are captured, because any cracks can in this case be detected very quickly and automatically.