Article

Deep Learning-Powered Optical Microscopy for Steel Research

Šárka Mikmeková, Martin Zouhar, Jan Čermák, Ondřej Ambrož, Patrik Jozefovič, Ivo Konvalina, Eliška Materna Mikmeková and Jiří Materna

1 Institute of Scientific Instruments of the Czech Academy of Sciences, Královopolská 147, 612 00 Brno, Czech Republic
2 Machine Learning College s.r.o., Chrlická 787/56, 620 00 Brno, Czech Republic
* Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2024, 6(3), 1579-1596; https://doi.org/10.3390/make6030076
Submission received: 28 May 2024 / Revised: 28 June 2024 / Accepted: 5 July 2024 / Published: 11 July 2024
(This article belongs to the Section Learning)

Abstract

The success of machine learning (ML) models in object or pattern recognition naturally leads to ML being employed in the classification of the microstructure of steel surfaces. Light optical microscopy (LOM) is the traditional imaging process in this field. However, the increasing use of ML to extract or relate more aspects of these materials, together with the limitations of LOM, motivated us to improve the established image acquisition process. In essence, we perform style transfer from LOM to scanning electron microscopy (SEM) combined with "intelligent" upscaling. This is achieved by employing an ML model trained on a multimodal dataset to generate an SEM-like image from the corresponding LOM image. In our opinion, corroborated by a detailed analysis of the source, target and prediction images, this transformation successfully pushes the limits of LOM in the case of steel surfaces. The expected consequence is an improvement in the precise characterization of the structure of advanced multiphase steels based on these transformed LOM images.

1. Introduction

In the steel industry, challenges include producing alloys with tailored properties and minimizing waste. The mechanical properties for a given chemical composition are determined by the micro-/nanostructure of the steel, i.e., the presence of different steel phases, their size and their shape. This fact implies the need to analyze the structure and possibly correlate (the preparation process and) the structural data and properties. The correlation can be "studied" using machine learning (ML) models, sometimes also referred to by the fancier term "artificial intelligence" (AI) in the broader context; see, e.g., the recent review paper [1]. The aforementioned challenges and the (potential) use of ML put restrictions on imaging techniques and push for high-quality and detailed information about the micro-/nanostructure.
The traditional and well-established tool for acquiring microstructure data is the light optical microscope (LOM) [2,3,4,5].
The physical limitations, in particular the resolution, of this imaging method can be partially circumvented by using state-of-the-art machines or advanced "super-resolution microscopy" techniques; see, e.g., Ref. [6] for a review of some of them. Unfortunately, these techniques, utilized very often in biology, may not always be applicable to a given objective. Quite often, LOM is used together with other methods (see, e.g., Refs. [7,8,9,10]) to acquire more complete information about the system studied. Sometimes, its limited capabilities complicate the task at hand [11,12]. Of course, one can improve the phase contrast by using special etching and carefully adapting other steps in the sample preparation before the actual imaging of the sample in question.
However, can something else be done to "push towards" higher-quality images when using LOM as the single input? We believe so, and we introduce a new application that is primarily aimed at the precise characterization of advanced multiphase steels. We employ a multimodal approach, well established in biology, by imaging the same field of view with different techniques. The core of the multimodal approach is to use several probes (light and electrons) and/or detection techniques to acquire more complete information about the system investigated and then train an ML model on the data to transform a LOM image. Here, we use this approach to successfully push the limits of LOM in the case of steel surfaces.
We are well aware that it is fundamentally impossible to create high-resolution images with more information than was stored in the original data without "hallucinating" the details. However, some portion of the higher-resolution information may only be hidden from human perception by the image blurriness, deformations and noise. We propose that the rest of the missing information can be completed from the knowledge of general properties of the investigated materials and their high-resolution images. This hypothesis is validated in experiments transforming LOM images into SEM-like images using deep learning techniques trained on an extensive image dataset of steels. The training data consist of LOM–SEM pairs of images of the same field of view. It is implicitly assumed that the ML model is able to generalize the necessary general properties. This is a natural consequence of the fact that ML finds statistical patterns that generalize beyond the training dataset. Thus, the abovementioned software-based transformation of the LOM images represents more than a simple "style transfer".
We tested different models and trained each of them to distill all of the important information from low-resolution images in order to combine it with general knowledge of the investigated materials, generalized by the model during its training, and thus generate high-resolution images. We started with a so-called U-Net neural network architecture [13] but eventually switched to a model based on generative adversarial networks (GANs) [14]. This allowed us to achieve high precision and consistency in the generation process. Please see Section 2.5 for more details. Extra care was taken to prevent the model(s) from creating any details not present in the original low-quality data. This makes the resulting pseudo-SEM images suitable for further processing such as segmentation or phase classification. To the best of our knowledge, there are no publicly available GAN-based models for converting LOM images to SEM images. Thus, we expect the present findings to be of high potential value to both experimentalists and the steel industry.

2. Materials and Methods

2.1. Materials

Four types of steel were investigated. Their chemical composition is shown in Table 1. The first two chemical compositions were measured on an optical emission spectrometer; the third is declared by the manufacturer [15] and the fourth by the corresponding standard [16].
Each steel was wet-cut on a Struers Secotom-60 precision cut-off machine to the final sample dimensions with a maximum area of 100 mm². The samples were then hot-mounted in a Struers CitoPress-1 press. The mounting of the samples preceded wet grinding on a Struers Tegramin-20 on MD Piano diamond discs of 220, 500 and 1200 grit and SiC abrasive foils of 2000 and 4000 grit for 3 to 5 min. Mechanical polishing followed on the same apparatus on MD Dac cloth using Struers diamond paste with a grain size of 3 μm and on MD Nap cloth using Struers diamond paste with grain sizes of 1 and 0.25 μm. Cooling of the pad was achieved by the addition of isopropyl alcohol. All samples were chemically etched in a solution of 100 mL ethanol and 4 mL concentrated nitric acid (Nital 4%) for 3 s for visualization of the structure and removal of the deformed layer after mechanical preparation. The S355J2 sample was etched for 6 s instead of 3 s. We intentionally selected a rather standard preparation procedure (without special etching chemicals that may enhance contrast) in order to ensure the procedure can be easily reproduced in the largest number of laboratories.

2.2. Microscope Equipment

LOM images were acquired on a fully automated Zeiss Axio Observer 7 inverted light microscope for materials, equipped with EC Epiplan-Neofluar objectives. The objective with which the bright-field images were collected has a 100× magnification, a numerical aperture of 0.9 and a working distance of 1 mm. All LOM images were taken at a 1000× magnification. The microscope is equipped with a microLED illuminator with a color temperature of 5700 K and a color-rendering index greater than 90. Image quality is ensured by a 5-megapixel Zeiss Axiocam 305 color camera with CMOS Global Shutter technology [17]. Autofocus was turned on in most of the cases.
Confocal laser scanning microscopy (CLSM) images were acquired on a VK-X1000 microscope by KEYENCE (Mechelen, Belgium). It is equipped with an X1100 head unit with a 404 nm violet semiconductor laser. The Nikon CF IC EPI Plan ApoDeluxe objective, with which the laser images were collected, has a magnification of 150×, a numerical aperture of 0.95 and a working distance of 0.2 mm. All CLSM images were acquired at a magnification of 1500× [18].
In order to obtain high-resolution images, we performed a series of measurements of our test samples on a scanning electron microscope (SEM). We used an ultra-high-resolution Magellan 400 FEG SEM with an Elstar column (Thermo Fisher Scientific Inc., Waltham, MA, USA). The microscope is equipped with several in-lens and out-lens detectors and can operate in ultra-high-resolution, high-resolution (HR) and beam deceleration modes. Our experiments utilized a circular backscatter segmented (CBS) detector, which was placed under the objective lens. The CBS detector is an annular detector (see Figure 1). Additional data from the Everhart–Thornley detector (ETD) were also acquired simultaneously with the CBS data. We imaged the sample with the following parameters: primary beam energy $E_{\mathrm{P}} = 5$ keV, beam current $I_{\mathrm{P}} = 0.8$ nA, working distance WD = 8 mm, signal from all segments of the CBS detector, HR mode.

2.3. Data Collecting

The task of locating specific regions of interest in a sample area measuring just a few micrometers by utilizing various imaging techniques can be notably difficult. This challenge is particularly pronounced when employing instruments from various manufacturers, as previously described. By adopting a colocalization grid, we developed a method to systematically capture a large volume of images from targeted regions using our equipment. We opted to simplify the navigation process by introducing a grid onto the finely etched metallographic sample. Initially, we glued a TEM grid to the sample surface (this pertains to the TRIP2 dataset only, the earliest data). After several iterations, an improved method for colocalization using an engraved navigation grid was utilized instead (this pertains to the TRIP1, USIBOR and S355J2 datasets). See Figure 2 for examples of the two navigation grids, which were used independently of each other.
An auxiliary grid—either of the two types described above—facilitates correlative mapping of relatively large microstructural areas, with each grid cell measuring approximately 500 × 500  μ m. This enables detailed examination at the required magnification across different microscopy modalities. The areas of interest were captured using images that partially overlapped—ranging from 10% to 20% overlap, depending on the technique—to facilitate subsequent processing, as outlined in the dataset workflow. For illustration, a single exemplary grid cell was mapped using 20 LOM images at a magnification of 1000×, 48 CLSM images at 1500× magnification and 48 SEM images at 1200× magnification. Further details are provided in Table 2.

2.4. Creating Datasets from Raw Images

The images obtained from each microscope present challenges in alignment due to their varying fields of view, aspect ratios, resolutions or degrees of overlap. In order to address these challenges and to simplify the alignment process, we adopted the workflow outlined in Figure 3, as detailed in our publication [19]. While this method facilitates rapid alignment, discrepancies may still occur because of the significant differences across imaging modalities, even after fine-tuning the alignment parameters. As a result, a meticulous review of the compiled dataset by an expert was indispensable.
Due to factors such as microstructure complexity, imaging conditions, stitching artifacts or contamination, a significant portion of the images in the USIBOR dataset required thorough scrutiny. In contrast, nearly 90% of the images registered in the TRIP2 dataset were deemed satisfactory. Table 3 contains the quantity and distribution of training examples in the final dataset.
In order to demonstrate the model's capabilities, an ETD dataset was prepared for the TRIP1 steel. The workflow used for image registration was identical to that for the CBS images, with the exception that intensity-inverted ETD images replaced the CBS images. Intensity inversion was employed to achieve a better alignment, because the significant visual differences between the grayscale LOM images and the original ETD images hindered proper automatic registration of the as-is images. After alignment, the intensity of the inverted ETD images was reverted to its original values.
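Since the inversion of an 8-bit image is its own inverse, applying it once before registration and once after restores the original contrast. A minimal sketch, assuming the ETD tiles are plain 8-bit NumPy arrays (the tile below is a random placeholder):

```python
import numpy as np

def invert_8bit(image: np.ndarray) -> np.ndarray:
    """Invert an 8-bit grayscale image; applying the function twice is a no-op."""
    return 255 - image

etd_tile = np.random.randint(0, 256, (1024, 1024), dtype=np.uint8)  # placeholder ETD tile
inverted = invert_8bit(etd_tile)    # used for registration against the LOM data
restored = invert_8bit(inverted)    # reverted to the original contrast afterwards
assert np.array_equal(etd_tile, restored)
```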

2.5. Method (ML Models)

We used two models—a “vanilla” U-Net and a GAN-based model. A simplified description of the U-Net architecture is shown in Figure 4, and more details are present in Section 2.5.1. An illustration of the architecture of the GAN is displayed in Figure 5.
Traditional GANs are generative models that learn a mapping from a noise vector z to an image y, $G: z \to y$ [14]. Another well-known architecture called a conditional GAN [20] is trained to map an input image x and a noise vector z to an output image y, $G: \{x, z\} \to y$. The purpose of having the noise vector z in the input is to achieve "creativity" of the generative process. Some image-to-image models (see, e.g., Ref. [21]) attempting to achieve a good balance between creativity and determinism propose to train just a mapping from one image to another, $G: x \to y$, and compensate for the lack of stochasticity by adding dropout layers [22]. For our purposes, we need to avoid creativity (i.e., "hallucinations") and stochasticity as much as possible; hence, we skip both the noise vector and dropout and train $G: x \to y$ without noise. This results in less diverse but more consistent and predictable results.
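As a toy illustration of this design choice (not the actual generator), the sketch below builds a small deterministic network without noise inputs or dropout, so repeated inference on the same LOM tile yields the same output:

```python
import numpy as np
import tensorflow as tf

# Stand-in for G: x -> y with no noise vector z and no dropout layers.
generator = tf.keras.Sequential([
    tf.keras.Input(shape=(512, 512, 1)),
    tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(1, 3, padding="same", activation="tanh"),
])

x = np.random.rand(1, 512, 512, 1).astype("float32")  # placeholder LOM tile
y1 = generator.predict(x, verbose=0)
y2 = generator.predict(x, verbose=0)
assert np.allclose(y1, y2)  # deterministic: identical inputs give identical outputs
```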

2.5.1. Generator

Our generator G is based on the U-Net neural network architecture with an altered output layer. U-Net [13] is a fully convolutional neural network [23] that uses skip connections [24] to avoid the gradient vanishing problem. Our implementation of a U-Net consists of 10 convolutional layers in the contracting path and 10 convolutional layers in the expanding path. In contrast to the original implementation of the U-Net, we use a combination of up-sampling and convolution instead of single up-convolution layers, which turns out to be more resistant against creating so-called checkerboard artifacts in the output images [25].
The most important difference between our implementation and the original U-Net is in the last layer. Instead of the sigmoid activation function with the binary cross-entropy loss function designed for the segmentation task, we use the hyperbolic tangent activation function with the mean absolute error (MAE) as the loss function. The reason for using MAE instead of the more commonly used mean squared error (MSE) is that it typically produces sharper output images [26].
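A minimal Keras sketch of these two choices, i.e., upsampling followed by convolution in the expanding path and a tanh output head trained with MAE, is given below; the filter counts and kernel sizes are illustrative and not the exact configuration used in this work:

```python
from tensorflow.keras import layers

def decoder_block(x, skip, filters):
    """Upsample and then convolve (instead of a single transposed convolution),
    which is less prone to checkerboard artifacts, then merge the skip connection."""
    x = layers.UpSampling2D(size=2, interpolation="nearest")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Concatenate()([x, skip])
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def output_head(x):
    """1-channel tanh output matching data standardized to [-1, 1]."""
    return layers.Conv2D(1, 1, padding="same", activation="tanh")(x)

# The standalone U-Net variant is then compiled with MAE rather than MSE:
# model.compile(optimizer="adam", loss="mae")
```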

2.5.2. Discriminator

Discriminators in GAN architectures are typically neural network-based binary classifiers trained to distinguish fake (generated) images from real ones. In our model, we used a deep convolutional network consisting of 5 convolutional layers with batch normalization [27] and a leaky ReLU (LReLU) activation function, which turned out to be a better choice than standard ReLU [27]. We also employed the idea of a PatchGAN discriminator [21], which applies the discriminator on small patches of the investigated image and then computes the average loss. This method helps to improve the quality of the resulting images when working with a high resolution.
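A sketch of such a PatchGAN-style conditional discriminator in Keras follows; the filter counts, strides and the final one-channel projection are our assumptions rather than the exact architecture used here:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_patch_discriminator(shape=(512, 512, 1)):
    lom = layers.Input(shape=shape)   # conditioning LOM image x
    sem = layers.Input(shape=shape)   # real SEM image y or generated G(x)
    h = layers.Concatenate()([lom, sem])
    for filters in (64, 128, 256, 512, 512):  # five conv layers; counts are illustrative
        h = layers.Conv2D(filters, 4, strides=2, padding="same")(h)
        h = layers.BatchNormalization()(h)
        h = layers.LeakyReLU(0.2)(h)
    # An m x n grid of per-patch "real vs. fake" scores instead of a single scalar.
    patch_scores = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(h)
    return tf.keras.Model([lom, sem], patch_scores)
```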

2.5.3. GAN Objective

The generator and the discriminator are trained in an adversarial way, i.e., in two steps. For each batch of training examples, we first optimize the loss function of the discriminator:
$$\mathcal{L}_{\mathrm{disc}} = \tfrac{1}{2}\, \mathcal{L}_{\mathrm{BCE}}\!\left[ D(x, y),\ \mathbf{1}_{m \times n} \right] + \tfrac{1}{2}\, \mathcal{L}_{\mathrm{BCE}}\!\left[ D(x, G(x)),\ \mathbf{0}_{m \times n} \right],$$
where $\mathcal{L}_{\mathrm{BCE}}$ stands for the binary cross-entropy loss function [28] and $m \times n$ are the dimensions of the patch matrix. The formula can be broken down into two parts: the first part, where the discriminator is optimized to predict ones for true output images y, and the second part, where the discriminator is optimized to predict zeros for fake (generated) output images.
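In code, one discriminator update step can be sketched as follows, where the patch tensors are the discriminator outputs for the real and generated pairs (a sketch assuming a sigmoid-output discriminator):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def discriminator_loss(real_patches: tf.Tensor, fake_patches: tf.Tensor) -> tf.Tensor:
    """0.5 * BCE(D(x, y), 1) + 0.5 * BCE(D(x, G(x)), 0), averaged over the patch matrix."""
    real_loss = bce(tf.ones_like(real_patches), real_patches)    # push real pairs towards 1
    fake_loss = bce(tf.zeros_like(fake_patches), fake_patches)   # push generated pairs towards 0
    return 0.5 * real_loss + 0.5 * fake_loss
```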
Then we freeze the weights of the discriminator and optimize the loss function of the generator:
$$\mathcal{L}_{\mathrm{gen}} = \mathcal{L}_{\mathrm{MSE}}\!\left[ D(x, G(x)),\ \mathbf{1}_{m \times n} \right] + \lambda \cdot \mathcal{L}_{\mathrm{MAE}}\!\left[ G(x),\ y \right], \tag{1}$$
where $\mathcal{L}_{\mathrm{MAE}}$ is the pixel-wise mean absolute error and $\lambda$ is a scalar coefficient. The first part of the loss function is designed such that the generator is trained to fool the discriminator. In the second part, we train the generator to minimize the MAE of the generated and true output images. The coefficient $\lambda$ controls the relative importance of these two parts.
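The corresponding generator update can be sketched analogously; the value of λ below is a placeholder, as the coefficient actually used is not reported here:

```python
import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()
mae = tf.keras.losses.MeanAbsoluteError()

def generator_loss(fake_patches: tf.Tensor, generated: tf.Tensor, target: tf.Tensor,
                   lam: float = 100.0) -> tf.Tensor:
    """MSE(D(x, G(x)), 1) + lam * MAE(G(x), y), cf. Equation (1); lam is an assumed value."""
    adversarial = mse(tf.ones_like(fake_patches), fake_patches)  # fool the (frozen) discriminator
    reconstruction = mae(target, generated)                      # stay close to the real SEM image
    return adversarial + lam * reconstruction
```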

2.6. Experiments

A large dataset of LOM–SEM image pairs was used in the experiments. We collected 847 grayscale image pairs with a resolution of 1024 × 1024 pixels for the CBS detector and 206 image pairs in the same resolution for the ETD detector.
For testing the CBS detector, we used various types of steel materials. For testing the ETD detector, only TRIP 1 steel was used. The constitution of both the CBS and ETD datasets is described in Table 3.
Before training, the dataset was randomly split into training and validation datasets with proportions of 90% for training and 10% for validation. To increase the diversity of the training dataset, we applied several augmentation techniques. The augmentation procedure is defined as follows:
  • Choose a random image pair from the training dataset.
  • Perform a random crop of the paired images, resulting in an image size of 512 × 512 pixels.
  • With a probability of 0.5 , apply a horizontal flip.
  • With a probability of 0.5 , apply a vertical flip.
This procedure is applied to obtain all the samples forming each training batch.
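A minimal NumPy sketch of this augmentation applied to one registered LOM–SEM pair is given below; the function and variable names are ours, and the same crop and flips are applied to both images so that the pair stays aligned:

```python
import numpy as np

rng = np.random.default_rng()

def augment_pair(lom: np.ndarray, sem: np.ndarray, crop: int = 512):
    """Shared random 512 x 512 crop, then horizontal and vertical flips with p = 0.5 each."""
    h, w = lom.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    lom = lom[top:top + crop, left:left + crop]
    sem = sem[top:top + crop, left:left + crop]
    if rng.random() < 0.5:                      # horizontal flip
        lom, sem = np.fliplr(lom), np.fliplr(sem)
    if rng.random() < 0.5:                      # vertical flip
        lom, sem = np.flipud(lom), np.flipud(sem)
    return lom, sem
```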
We trained the CBS and ETD models separately. All models were trained for 10,000 epochs with a batch size of 8 on a Tesla V100 GPU unit with 16 gigabytes of internal memory. The training of each model took approximately 10 days.
The pixel grayscale 8-bit values were normalized to the interval $[-1, 1]$ before training and validation. The standardization mapping $S_b$ and its inverse (employed when finalizing the predicted data) are presented in Equation (2).
$$S_b: x \mapsto \frac{x}{s_b} - 1, \qquad S_b^{-1}: x \mapsto (x + 1)\, s_b, \qquad s_b = \frac{2^b - 1}{2}, \tag{2}$$
where the scale $s_b$ is half of the maximal achievable value $2^b - 1$ in the original 0-based integer data corresponding to the bit depth $b = 8$.
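A minimal sketch of Equation (2) for b = 8 follows; the rounding and clipping in the inverse are our practical additions for writing the predictions back as 8-bit images and are not part of the equation itself:

```python
import numpy as np

def standardize(x: np.ndarray, b: int = 8) -> np.ndarray:
    """S_b: map 0 .. 2**b - 1 integer grayscale values onto [-1, 1]."""
    s_b = (2 ** b - 1) / 2.0
    return x.astype(np.float32) / s_b - 1.0

def destandardize(x: np.ndarray, b: int = 8) -> np.ndarray:
    """S_b^{-1}: map [-1, 1] values back to 0 .. 2**b - 1 (rounded and clipped for saving)."""
    s_b = (2 ** b - 1) / 2.0
    return np.clip(np.rint((x + 1.0) * s_b), 0, 2 ** b - 1).astype(np.uint8)
```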

3. Results and Discussion

Quantitative evaluation of the level of image improvement is quite complicated. We used a standard evaluation metric called root mean squared error (RMSE), originally designed for the evaluation of regression models. Specifically, we measured the root mean of the square differences between the pixels of the SEM and predicted images.
Using an 8-bit depth optimizes the batch size in GPU RAM, and we are convinced that a 16-bit depth would be unnecessarily large.
In order to demonstrate the benefits of the GAN architecture, we compared a vanilla U-Net model with the GAN model for the CBS dataset. The vanilla U-Net model has exactly the same architecture as the standalone generator from the GAN model. We obtained RMSE = 0.2109 for the vanilla U-Net and RMSE = 0.2059 for the GAN model after 10,000 epochs of training. Both RMSE values correspond to the standardized data (see Equation (2)).
Preliminary visual examination of the CBS predictions, with only a selection displayed in the below images, shows that the GAN model significantly outperforms the vanilla U-Net model on the CBS data. A similar procedure was repeated for the ETD data but only in the case of the GAN model.
Pearlite is composed of alternating layers of ferrite and cementite that form a lamellar structure. The lamellar structure is very fine and invisible (or hardly visible) in our light optical microscope; see Figure 6. The lamellas in the LOM micrographs blend into each other, and the result is only a dark, blurred area. Reliable identification of the pearlite phase in the LOM micrographs is impossible. The U-Net prediction slightly improves the visibility of the pearlite internal structure, but the lamellas are still invisible. The GAN predictions are markedly realistic, and they enable us to identify the pearlite phase. Let us note that the pearlite structure is visible in the LOM images reported in Ref. [11]. However, the details of data acquisition are not explicitly mentioned there, and we conclude that a coarser pearlite structure can be imaged using LOM.
Obviously, the LOM micrographs are hardly suitable for visualization of the complex microstructure of TRIP steel, which consists of a ferrite–bainite matrix and secondary phases arising from the matrix (as a consequence of selective etching), such as martensite, retained austenite and martensite–austenite constituents. The secondary phases are very fine and hence partly blurred in the LOM. The U-Net and GAN predictions are able to depict the secondary phases and better define the matrix properties. The GAN pictures present the structure more realistically than the U-Net ones; see, e.g., the region marked by the second arrow from the top in Figure 7.
The insufficiency of the simple U-Net model is clearly seen in Figure 8 and Figure 9, where the U-Net manages to approximate the boundaries among the phases but visual contrast among the phases is significantly suppressed. On the other hand, the GAN model retains the visual distinction of the secondary phases and at the same time represents the phase boundaries in a better way, which can be observed in the finer features.
The displayed region of the USIBOR sample, see Figure 10, consists mostly of the martensitic phase. We see that this pure martensite is the most difficult to describe for the models presented here.
In order to demonstrate a real-life application of LOM micrograph enhancement, the original LOM image was transformed into a CBS-like image using a GAN. The input RGB LOM image was automatically converted to an 8-bit grayscale format and then upscaled from its original size to a resolution approximately matching that of an SEM image, using bilinear interpolation. The upscaled image was then divided into 1024 × 1024 px tiles, which were suitable as input for model prediction. After the prediction of all tiles, they were stitched together using the OpenCV library to match the field of view of the original LOM image but with the pixel resolution of an SEM image. The original RGB LOM image and the CBS-like prediction can be compared in Figure 11, which illustrates the possible output of our model. We note that this LOM field of view has no corresponding SEM data measured.
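A simplified sketch of this inference pipeline using OpenCV is shown below. The scale factor, the model.predict_tile call and the naive non-overlapping tiling are our assumptions; the paper does not spell out the exact stitching procedure:

```python
import cv2
import numpy as np

def lom_to_cbs_like(lom_bgr: np.ndarray, model, scale: float, tile: int = 1024) -> np.ndarray:
    """Grayscale conversion, bilinear upscaling towards SEM pixel density, 1024 x 1024 px
    tiling, per-tile prediction and naive stitching; borders smaller than a tile are skipped."""
    gray = cv2.cvtColor(lom_bgr, cv2.COLOR_BGR2GRAY)      # assumes cv2.imread (BGR) input
    up = cv2.resize(gray, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    h, w = up.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            # `model.predict_tile` is a hypothetical helper returning an 8-bit CBS-like tile.
            out[y:y + tile, x:x + tile] = model.predict_tile(up[y:y + tile, x:x + tile])
    return out
```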
The preparation of the dataset for training of the ML model revealed that, e.g., a high value of MS-SSIM does not necessarily guarantee well-aligned images. We decided not to base the discussion of the transformed LOM images solely on quantitative results of metrics measuring their “distance” from the target SEM (either CBS or ETD) images. We describe the transformation in terms of a steel microstructure analysis by the naked eye of an expert, one of the coauthors. Nevertheless, the metrics are still a useful tool in the postprocessing of the images, e.g., they clearly indicate that the GAN model performs better than the U-Net one.
Let us comment on the metrics and their use in more detail. As already described in Section 2.6, the image data were standardized to the interval $[-1, 1]$ before training using the simple linear transformation in Equation (2). Consider a metric proportional to a power of the absolute value of the difference of pixel values, i.e., of the following type:
$$f_{p,q}(x_1, x_2) = C \times \left\| (x_1 - x_2)^{p} \right\|_1^{1/q} = C \times \left( \sum_{i=1}^{N_{\mathrm{pixels}}} \left| x_{1,i} - x_{2,i} \right|^{p} \right)^{1/q}, \tag{3}$$
where the power is understood element-wise, $\|\cdot\|_1$ denotes the sum of the absolute values of the elements and $C$ is a constant of proportionality which may be related to the pixel count $N_{\mathrm{pixels}}$. Such a general case covers both the (R)MSE ($p = 2$) and MAE ($q = p = 1$) metrics.
Consider a linear transformation of the independent variables, specified by means of two scalar coefficients $a$ and $b$ (such as the standardization mapping $S_b$ and its inverse $S_b^{-1}$ in Equation (2)). Then it follows that
$$\left. f_{p,q}(x_1, x_2) \right|_{x \to a x + b} = C \times \left\| \left( [a x_1 + b] - [a x_2 + b] \right)^{p} \right\|_1^{1/q} = C \times \left\| \left( a [x_1 - x_2] \right)^{p} \right\|_1^{1/q} = |a|^{p/q}\, C \times \left\| (x_1 - x_2)^{p} \right\|_1^{1/q} = |a|^{p/q} \times f_{p,q}(x_1, x_2).$$
The above considerations show that the same linear transformation applied to both images affects some of the metrics only by an overall multiplicative factor. This means that it does not alter the performance order of the different models when evaluated by metrics of the type in Equation (3). The above statement is not valid in the case of several other metrics tested. Some of them are not implemented in the case of noninteger input data (namely, MS-SSIM), and some produce a different order of the models for as-is and standardized values.
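A small numerical check of this scaling property, assuming the reconstruction of Equation (3) given above, reads:

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.random(1000), rng.random(1000)   # two synthetic "images" as flat arrays

def f(x1, x2, p, q, C=1.0):
    """Metric of the type in Equation (3)."""
    return C * np.sum(np.abs(x1 - x2) ** p) ** (1.0 / q)

a, b = 127.5, 127.5                            # an affine rescaling akin to S_b^{-1}
for p, q in [(1, 1), (2, 1), (2, 2)]:          # MAE-, MSE- and RMSE-like cases
    lhs = f(a * x1 + b, a * x2 + b, p, q)
    rhs = abs(a) ** (p / q) * f(x1, x2, p, q)
    assert np.isclose(lhs, rhs)                # same factor |a|**(p/q), so model ranking is preserved
```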
Let us consider the images displayed in Figure 6 through Figure 10 as a small sample of the test dataset, with each image containing some 262,000 pixels. We use several of the reasonable metrics as implemented in the Python libraries scikit-image [29], version 0.22, and sewar [30], version 0.4.6. We calculate the values of the metrics for all the SEM-based predictions and the as-is LOM image. The results are presented in Appendix A for the images displayed in this paper, with Figure 9 excluded since the TRIP2 steel is already represented.
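As an illustration of how such full-reference metrics can be evaluated with the two libraries, a short sketch follows. The arrays are random placeholders for a reference SEM image and a prediction, and the exact function set and signatures should be checked against the installed versions of scikit-image and sewar:

```python
import numpy as np
from skimage.metrics import (mean_squared_error, normalized_mutual_information,
                             structural_similarity)
from sewar.full_ref import ergas, rase, sam, scc, uqi, vifp

# Placeholder 8-bit grayscale images: `sem` is the reference (CBS or ETD), `pred` the prediction.
sem = np.random.randint(0, 256, (512, 512), dtype=np.uint8)
pred = np.random.randint(0, 256, (512, 512), dtype=np.uint8)

scores = {
    "MSE":   mean_squared_error(sem, pred),
    "SSIM":  structural_similarity(sem, pred, data_range=255),
    "NMI":   normalized_mutual_information(sem, pred),
    "UQI":   uqi(sem, pred),
    "ERGAS": ergas(sem, pred),
    "SCC":   scc(sem, pred),
    "RASE":  rase(sem, pred),
    "SAM":   sam(sem, pred),
    "VIFP":  vifp(sem, pred),
}
print(scores)
```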
Using these data only, we find that the (R)MSE and universal quality index (UQI) metrics prefer the U-Net model except on the TRIP2 steel (512-10_T5-15-TRIP2). See Table A1 and Table A2 for the detailed values. The differences in RMSE values are less than 10%. Quite surprisingly, the MAE has the lowest values in the case of as-is LOM images, which we disregard, followed by the GAN-CBS model. We attribute the better performance of GAN-CBS models over U-Net-CBS to the fact that the MAE was used in the generator training; see Equation (1).
Thus far, we have discussed metrics of the type described in Equation (3), with the exception of the UQI. Let us now consider metrics that take into account features other than the differences in individual pixels. Such metrics include, as their names suggest, SSIM and MS-SSIM; these two do prefer the GAN-CBS model except in the case of the TRIP1 steel, and the differences in the SSIM values do not exceed 6%.
Thus, the differences between the U-Net-CBS and GAN-CBS models as measured by the above-discussed metrics do not seem to be very large. Two SSIM-based metrics that take into account "the larger picture" and not only the differences in individual pixels prefer the GAN-based model. In other words, these two metrics indicate that the predictions of GAN-based models are somewhat better.
The above-described conclusion based on the metric values was visually corroborated when examining the images in Figure 6 through Figure 10 in detail. This is illustrated in the following three figures, described below. This means that the GAN performs better in the style-transfer part of the processing.
A zoom of a pearlite-heavy region is displayed in Figure 12. It shows that the boundaries between the pearlite and the ferritic matrix are sharper in the case of both the U-Net and GAN models than in the original LOM image. Both the LOM image and the U-Net prediction miss information about pearlite's inner structure, i.e., the cementite laths are invisible. In the GAN prediction, on the other hand, an indication of this inner structure is present, although its orientation is mostly incorrect. Nevertheless, this indication is enough to provide a hint that pearlite is observed. This shows that, apart from more training data for pearlite, a separate model may be needed.
Figure 13 shows that the ETD signal is reproduced accurately by the GAN model.
Furthermore, Figure 14 clearly shows the superiority of the GAN model over U-Net; the former presents improvement in the contrast and visibility of the secondary phases. The secondary phases become easier to separate from the matrix, and their shape is more precise and closer to SEM micrographs.
The USIBOR material is clearly the hardest to describe; see Figure 10. This is due to two factors. First, this mostly martensitic steel has the richest microstructure. Second, the dataset is not fully balanced, as indicated in Table 3; the USIBOR steel represents the smallest part of the training dataset. We decided not to artificially increase the augmentation of this particular material in order to avoid performance degradation on the other, more frequently occurring materials.
Now, let us close this section by discussing the robustness of the presented ML model. We intentionally used the "simplest" etching that is widely available. Furthermore, the range of imaging settings used for LOM and SEM was rather narrow, though the use of the autofocus tool in the case of the LOM images ensured a certain degree of variability in the imaging conditions. On the one hand, this means the model is not very robust against such changes. On the other hand, it implies that employing the typical standard sample preparation procedure, widely available at low cost, should provide the best results. We believe that this represents a fair trade-off between the demands on laboratory costs and operator skills (including but not limited to knowledge of which etching to use on which material to achieve the best contrast among the phases) and the applicability of the model.

4. Conclusions and Outlook

We presented a software-based transformation of LOM images trained on pairs of LOM and corresponding high-resolution SEM images acquired after a standard sample preparation technique (polishing and chemical etching with Nital). The resulting output of the neural network exceeds a simple "style transfer" by making some features, previously obscured in the as-acquired LOM images, more pronounced in the predicted output, i.e., it can be regarded as super-resolution (pixel upscaling) of the original LOM images. The quality of the style transfer was measured with three relevant metrics (MAE, NRMSE and SSIM), comparing the predictions to the corresponding testing SEM data, which indicated that the vanilla U-Net performs worse. Furthermore, the data were analyzed by the naked eye of experts, and the findings clearly indicate improvements such as deblurring and denoising of the phase boundaries.
Thus, we are confident that the reported GAN-based transformation can improve any subsequent processing of the resulting transformed images provided the sample preparation procedure and imaging settings are reasonably close to those described in this paper. This, of course, includes semantic segmentation. As a result, we expect improvements in techniques such as machine learning-based prediction of material properties utilizing datasets combining knowledge of both the microstructure (analysis of surface micrographs) and mechanical properties of the samples. Because we kept the steel processing to a common standard, most notably etching with Nital, we believe the presented model could be successfully applied to LOM data measured by a wide range of metallographic laboratories.
A possible continuation of this work can include exploring different sample preparation techniques, attempting to improve the transformation model itself or training the model on a larger dataset when more data are acquired. A natural extension of the work presented here is to proceed with the semantic segmentation (our original motivation) and to compare the results from as-acquired LOM images to those from the predicted transformed images.

Author Contributions

Conceptualization, Š.M.; methodology, J.Č., O.A., J.M., I.K. and E.M.M.; software, J.Č., J.M. and M.Z.; validation, I.K. and E.M.M.; formal analysis, M.Z. and J.Č.; investigation, O.A., P.J., Š.M. and J.M.; writing—original draft preparation, M.Z., J.Č., J.M., O.A. and Š.M.; writing—review and editing, M.Z.; visualization, J.Č., P.J., J.M. and I.K.; supervision, Š.M.; funding acquisition, Š.M. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge funding from the Lumina Quaeruntur fellowship established by the Czech Academy of Sciences (LQ100652201).

Data Availability Statement

The data are available upon reasonable request.

Acknowledgments

Some computational resources used by M.Z. were provided by the e-INFRA CZ project (ID: 90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BSE    Backscattered electron
CBS    Circular backscatter segmented
ETD    Everhart–Thornley detector
GAN    Generative adversarial network
CLSM   Confocal laser scanning microscope/microscopy
LOM    Light optical microscope/microscopy
SEM    Scanning electron microscope/microscopy
TEM    Transmission electron microscope/microscopy
ML     Machine learning
MAE    Mean absolute error
MSE    Mean squared error
RMSE   Root MSE
SSIM   Structure similarity index measure
UQI    Universal quality index

Appendix A. Several Metrics Calculated on the Example Figures

We present values of metrics calculated for figures in Figure 6 through Figure 10 (except for Figure 9). The reference image is always the target modality, i.e., SEM—either CBS or ETD.
Table A1. Values of selected metrics in the case of the CBS-based images; the data correspond to Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10.

S355J       CBS        GAN-CBS    GAN-SSIM-CBS   LOM        U-Net-CBS
MSE         0.0000     1413.4103  1530.2152      5181.6819  1225.4711
RMSE        0.0000     37.5953    39.1180        71.9839    35.0067
UQI         1.0000     0.9279     0.9240         0.8334     0.9347
ERGAS       0.0000     6.1486     6.6714         9.1791     5.8335
SCC         1.0000     0.0182     0.0154         0.0017     0.0180
RASE        0.0000     1482.4335  1609.8404      2323.2832  1387.1829
SAM         0.0000     0.2418     0.2643         0.2551     0.2325
VIFP        1.0000     0.0611     0.0371         0.0705     0.0971
PSNR(-B)    n/a        16.6281    16.2833        10.9861    17.2478
MAE         0.0000     93.8957    110.5815       83.4367    102.5156
SSIM        1.0000     0.2500     0.2306         0.2142     0.2493
NMI         2.0000     1.0272     1.0203         1.0307     1.0320

USIBOR      CBS        GAN-CBS    GAN-SSIM-CBS   LOM        U-Net-CBS
MSE         0.0000     1202.9259  1649.5680      5379.0297  1100.1092
RMSE        0.0000     34.6832    40.6149        73.3419    33.1679
UQI         1.0000     0.9600     0.9441         0.8542     0.9645
ERGAS       0.0000     5.8666     7.0141         8.6471     5.7163
SCC         1.0000     0.0028     0.0072         0.0009     0.0061
RASE        0.0000     1450.9247  1785.5065      2126.7761  1395.8939
SAM         0.0000     0.2303     0.2705         0.2287     0.2194
VIFP        1.0000     0.0483     0.0246         0.0634     0.0874
PSNR(-B)    n/a        17.3284    15.9571        10.8238    17.7164
MAE         0.0000     130.0179   128.5432       72.2570    139.1927
SSIM        1.0000     0.1526     0.1344         0.1314     0.1519
NMI         2.0000     1.0093     1.0072         1.0105     1.0134

TRIP1       CBS        GAN-CBS    GAN-SSIM-CBS   LOM        U-Net-CBS
MSE         0.0000     1714.1705  2082.5065      6480.1903  1425.8503
RMSE        0.0000     41.4025    45.6345        80.4996    37.7604
UQI         1.0000     0.9249     0.9119         0.7907     0.9355
ERGAS       0.0000     7.4016     8.4751         10.3441    7.1666
SCC         1.0000     0.0362     0.0362         0.0028     0.0401
RASE        0.0000     1792.8091  2125.0509      2539.0422  1760.6165
SAM         0.0000     0.2936     0.3307         0.3088     0.2834
VIFP        1.0000     0.0738     0.0551         0.0593     0.0898
PSNR(-B)    n/a        15.7903    14.9449        10.0149    16.5901
MAE         0.0000     104.7441   111.8847       84.9302    120.1928
SSIM        1.0000     0.2304     0.2371         0.1375     0.2366
NMI         2.0000     1.0179     1.0158         1.0164     1.0203

TRIP2       CBS        GAN-CBS    GAN-SSIM-CBS   LOM        U-Net-CBS
MSE         0.0000     1395.1641  1908.1480      3221.6828  1420.6772
RMSE        0.0000     37.3519    43.6824        56.7599    37.6919
UQI         1.0000     0.9660     0.9503         0.9268     0.9637
ERGAS       0.0000     5.6028     6.9035         6.6220     5.9237
SCC         1.0000     0.0396     0.0425         0.0024     0.0396
RASE        0.0000     1334.9829  1676.3173      1591.7851  1432.8856
SAM         0.0000     0.2080     0.2379         0.2188     0.1941
VIFP        1.0000     0.0648     0.0489         0.0470     0.0916
PSNR(-B)    n/a        16.6846    15.3247        13.0500    16.6058
MAE         0.0000     148.6040   159.7932       83.1193    158.8045
SSIM        1.0000     0.2464     0.2343         0.1728     0.2406
NMI         2.0000     1.0189     1.0166         1.0155     1.0163
Table A2. Values of selected metrics in the case of the ETD-based images; the data correspond to Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10.

S355J       ETD        GAN-ETD     LOM
MSE         0.0000     1364.2305   13,368.6097
RMSE        0.0000     36.9355     115.6227
UQI         1.0000     0.9197      0.5922
ERGAS       0.0000     8.7869      14.7437
SCC         1.0000     −0.0031     0.0004
RASE        0.0000     2004.9993   3562.5937
SAM         0.0000     0.3503      0.4120
VIFP        1.0000     0.0240      0.0203
PSNR(-B)    n/a        16.7819     6.8699
MAE         0.0000     89.9326     122.6471
SSIM        1.0000     0.2030      0.1431
NMI         2.0000     1.0116      1.0205

USIBOR      ETD        GAN-ETD     LOM
MSE         0.0000     2762.5560   17,876.9607
RMSE        0.0000     52.5600     133.7048
UQI         1.0000     0.8211      0.4843
ERGAS       0.0000     12.4827     15.7640
SCC         1.0000     0.0038      −0.0002
RASE        0.0000     2951.8036   3896.2233
SAM         0.0000     0.5053      0.5131
VIFP        1.0000     0.0231      0.0328
PSNR(-B)    n/a        13.7177     5.6079
MAE         0.0000     96.8956     131.4236
SSIM        1.0000     0.0830      0.0515
NMI         2.0000     1.0046      1.0110

TRIP1       ETD        GAN-ETD     LOM
MSE         0.0000     2339.0107   9082.4975
RMSE        0.0000     48.3633     95.3021
UQI         1.0000     0.8978      0.7217
ERGAS       0.0000     11.4541     12.2462
SCC         1.0000     0.0006      −0.0038
RASE        0.0000     2503.5621   2925.4410
SAM         0.0000     0.3824      0.4173
VIFP        1.0000     0.0438      0.0190
PSNR(-B)    n/a        14.4405     8.5488
MAE         0.0000     154.9834    107.7843
SSIM        1.0000     0.2197      0.1152
PSNR        n/a        14.4405     8.5488
NMI         2.0000     1.0161      1.0251

TRIP2       ETD        GAN-ETD     LOM
MSE         0.0000     744.8763    16,487.5409
RMSE        0.0000     27.2924     128.4038
UQI         1.0000     0.9519      0.5109
ERGAS       0.0000     6.6815      14.9804
SCC         1.0000     0.0096      0.0065
RASE        0.0000     1200.4835   3703.5935
SAM         0.0000     0.2338      0.1805
VIFP        1.0000     0.0347      0.0132
PSNR(-B)    n/a        19.4100     5.9592
MAE         0.0000     74.7704     125.0576
SSIM        1.0000     0.5126      0.3538
PSNR        n/a        19.4100     5.9592
NMI         2.0000     1.0144      1.0220

References

  1. Pan, G.; Wang, F.; Shang, C.; Wu, H.; Wu, G.; Gao, J.; Wang, S.; Gao, Z.; Zhou, X.; Mao, X. Advances in machine learning- and artificial intelligence-assisted material design of steels. Int. J. Miner. Metall. Mater. 2023, 30, 1003–1024. [Google Scholar] [CrossRef]
  2. Ozdem, S.; Orak, I.M. A novel method based on deep learning algorithms for material deformation rate detection. J. Intell. Manuf. 2024. [Google Scholar] [CrossRef]
  3. Pantilimon, M.C.; Berbecaru, A.C.; Coman, G.; Sohaciu, M.G.; Dumitrescu, R.E.; Ciuca, S.; Gherghescu, I.A.; Predescu, C. Preliminary structures assessment of some TRIP steels. Arch. Metall. Mater. 2023, 68, 491–498. [Google Scholar] [CrossRef]
  4. Soliman, M.; Weidenfeller, B.; Palkowski, H. Metallurgical Phenomena during Processing of Cold Rolled TRIP Steel. Steel Res. Int. 2009, 80, 57–65. [Google Scholar] [CrossRef]
  5. Wendler, M.; Weiß, A.; Krüger, L.; Mola, J.; Franke, A.; Kovalev, A.; Wolf, S. Effect of Manganese on Microstructure and Mechanical Properties of Cast High Alloyed CrMnNi-N Steels. Adv. Eng. Mater. 2013, 15, 558–565. [Google Scholar] [CrossRef]
  6. Yamanaka, M.; Smith, N.I.; Fujita, K. Introduction to super-resolution microscopy. Microscopy 2014, 63, 177–192. [Google Scholar] [CrossRef] [PubMed]
  7. Bachmann, B.I.; Müller, M.; Britz, D.; Durmaz, A.R.; Ackermann, M.; Shchyglo, O.; Staudt, T.; Mücklich, F. Efficient reconstruction of prior austenite grains in steel from etched light optical micrographs using deep learning and annotations from correlative microscopy. Front. Mater. 2022, 9, 1033505. [Google Scholar] [CrossRef]
  8. Radwanski, K. Structural characterization of low-carbon multiphase steels merging advanced research methods with light optical microscopy. Arch. Civ. Mech. Eng. 2016, 16, 282–293. [Google Scholar] [CrossRef]
  9. Rosenauer, A.; Krammer, K.; Stadler, M.; Turk, C.; Schnitzer, R. Influence of Ausforming on the Micro- and Nanostructure of PH 13-8 Mo Maraging Steels. Steel Res. Int. 2024. [Google Scholar] [CrossRef]
  10. Elramady, A.; Sullivan, E.; Sham, K.; O’Brien, L.; Liu, S. Characterization of steel weld metal in multi-pass submerged arc welds after post-weld heat treatment using electron backscatter diffraction. Weld. World 2022, 66, 195–211. [Google Scholar] [CrossRef]
  11. Azimi, S.M.; Britz, D.; Engstler, M.; Fritz, M.; Muecklich, F. Advanced Steel Microstructural Classification by Deep Learning Methods. Sci. Rep. 2018, 8, 2128. [Google Scholar] [CrossRef] [PubMed]
  12. Li, X.; Ramazani, A.; Prahl, U.; Bleck, W. Quantification of complex-phase steel microstructure by using combined EBSD and EPMA measurements. Mater. Charact. 2018, 142, 179–186. [Google Scholar] [CrossRef]
  13. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597. [Google Scholar]
  14. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661. [Google Scholar] [CrossRef]
  15. ArcelorMittal Automotive. Steels for Hot Stamping—Usibor and Ductibor. Available online: https://automotive.arcelormittal.com/products/flat/PHS/usibor_ductibor (accessed on 29 February 2024).
  16. EN 10025-2:2004; Hot Rolled Products of Structural Steels, Part 2. Technical Report 10025-2; European Committee for Standardization: Brussels, Belgium, 2004.
  17. ZEISS. Axio Observer 7 Materials. Available online: https://www.zeiss.com/microscopy/en/products/light-microscopes/widefield-microscopes/axio-observer-for-materials.html (accessed on 30 March 2023).
  18. KEYENCE Corporation. VK-X1000 3D Laser Scanning Confocal Microscope. Available online: https://www.keyence.com/products/microscope/laser-microscope/vk-x100_x200/models/vk-x1000/ (accessed on 30 March 2023).
  19. Čermák, J.; Ambrož, O.; Zouhar, M.; Jozefovič, P.; Mikmeková, V. Methodology for Collecting and Aligning Correlative SEM, CLSM and LOM Images of Bulk Material Microstructure to Create a Large Machine Learning Training Dataset. Microsc. Microanal. 2023, 29, 2016–2018. [Google Scholar] [CrossRef] [PubMed]
  20. Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
  21. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. arXiv 2016, arXiv:1611.07004. [Google Scholar]
  22. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar] [CrossRef]
  23. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. arXiv 2016, arXiv:1605.06211. [Google Scholar] [CrossRef]
  24. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
  25. Odena, A.; Dumoulin, V.; Olah, C. Deconvolution and Checkerboard Artifacts. Distill 2016. [Google Scholar] [CrossRef]
  26. Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context Encoders: Feature learning by inpainting. arXiv 2016, arXiv:1604.07379. [Google Scholar]
  27. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
  28. Good, I.J. Rational Decisions. J. R. Stat. Soc. Ser. B 1952, 14, 107–114. [Google Scholar] [CrossRef]
  29. van der Walt, S.; Schönberger, J.L.; Nunez-Iglesias, J.; Boulogne, F.; Warner, J.D.; Yager, N.; Gouillart, E.; Yu, T.; The Scikit-Image Contributors. Scikit-Image: Image processing in Python. PeerJ 2014, 2, e453. [Google Scholar] [CrossRef]
  30. Khalel, A. sewar (a Python Library). 2023. Available online: https://pypi.org/project/sewar/ (accessed on 1 March 2024).
Figure 1. Schematics of the CBS and ETD detectors' arrangement in the HR mode for collecting backscattered electrons (BSEs) emitted from the sample.
Figure 2. Illustrations of different navigation grids include (a) a TEM grid that was glued onto the sample (early data only), and (b) a picosecond laser-engraved navigation grid, with a subgrid and one of its individual cells highlighted in the top-left corner; a single square subgrid and its elemental square are highlighted by red color. Republished from Ref. [19] with permission.
Figure 3. This schematic illustrates the step-by-step workflow that was employed to generate a final dataset of correlative images of a bulk metallographic sample captured using SEM, CLSM and LOM modalities. The process begins with the engraving of a navigation grid and culminates in the creation of the final dataset. Republished from Ref. [19] with permission.
Figure 4. The U-Net neural network architecture. Both input and output are grayscale images and their three dimensions—pixels in both directions and the number of channels—are explicitly provided. All other single values beside individual components represent the number of features/filters in the corresponding convolutional layers except where otherwise noted. The number of pixels is clear from the operations performed; convolution is always paired with padding (using TensorFlow's function Conv2D with its parameter padding set to value 'same') at the edges to prevent pixel-count reduction. Two inputs are joined by a simple layer concatenation indicated by a circle.
Figure 5. Visualization of the discriminator part of the GAN architecture. The U-Net serves as the generator (see Figure 4) and a basic CNN (displayed) as the discriminator. The meaning of symbols is as in Figure 4. The "zoom" in the left-most part indicates an internal batching process.
Figure 6. The first row displays as-measured preprocessed data; the second row comprises transformed LOM images. As the labels indicate, the first column represents the U-Net results and the rest are GAN results. We marked some occurrences of several phases which may include ferrite (red "F"), pearlite (red "P") and (other) secondary phases. The displayed field of view represents one corner-aligned 512 × 512 tile, a quarter of a single 1024 × 1024 image in the dataset. The material is construction steel S355J2.
Figure 7. The same as Figure 6 in the case of TRIP1 steel. Secondary phases are marked with a yellow "SP"; each highlighted example is indicated by an arrow. We marked some occurrences of several phases, which may include bainite (red "B") and ferrite (red "F").
Figure 8. The same as Figure 6 in the case of TRIP2 steel. Secondary phases are marked with a yellow "SP"; each highlighted example is indicated by an arrow.
Figure 9. The same as Figure 6 in the case of TRIP2 steel (another dataset). Secondary phases are marked with a yellow "SP"; each highlighted example is indicated by an arrow.
Figure 10. The same as Figure 6 in the case of boron steel for hot-stamping USIBOR.
Figure 11. Comparison of an original RGB LOM image (2464 × 2056 pixels) with the CBS-like prediction (8069 × 6745 pixels). Both images were cropped to the same field of view in order to remove minor artifacts in the prediction (due to inelastic stitching of individual predicted tiles).
Figure 12. Zoom of a region of interest in the case of S355J2 steel, top-right corner of segments in Figure 6.
Figure 13. The same as Figure 12 in the case of TRIP1 steel, top-right corner of segments in Figure 7. The arrows highlight visual improvements over LOM. We note that pure U-Net ETD prediction is missing as this model was not trained on ETD data.
Figure 14. The same as Figure 12 in the case of TRIP2 steel, slightly below the center in Figure 9. The arrows highlight visual improvements over LOM.
Table 1. Chemical composition [wt. %] of the steel samples considered.

Element   TRIP2    TRIP1    USIBOR 1500   S355J2
C         0.20     0.23     ≤0.25         ≤0.20
Si        1.49     1.46     ≤0.40         ≤0.55
Mn        2.09     2.02     ≤1.40         ≤1.60
P         0.010    0.010    ≤0.030        ≤0.025
S         0.0005   0.0006   ≤0.010        ≤0.025
Table 2. Description of the as-measured data. We note the final dataset consists of grayscale (GS) images, i.e., of a single channel, of size 1024 × 1024 pixels with bit depth equal to 8.

Modality   No. of Channels [1]   Size [px × px]   Field of View [μm × μm]   Linear Pixel Density [1/μm]   Bit Depth [1]
LOM        3                     2464 × 2056      164.3 × 137               15                            8
SEM-ETD    1                     6144 × 4096      124 × 82.67               49.5                          16
SEM-CBS    1                     6144 × 4096      124 × 82.67               49.5                          16
CLSM       3                     2048 × 1536      97.18 × 72.88             21                            8
Table 3. Distribution of the final training dataset (847 grayscale 8-bit image pairs (LOM/CBS) of 1024 × 1024 px and 206 grayscale 8-bit image pairs (LOM/ETD) of 1024 × 1024 px): the absolute and relative numbers of LOM–SEM pairs for the two modes, CBS and ETD.

Counts              S355J2   TRIP1   TRIP2   USIBOR   Total
CBS, absolute       229      339     178     101      847
CBS, relative [%]   27       40      21      12       100
ETD, absolute       0        206     0       0        206
ETD, relative [%]   0        100     0       0        100