Article

OptimalNN: A Neural Network Architecture to Monitor Chemical Contamination in Cancer Alley

by Uchechukwu Leo Udeji 1,* and Martin Margala 2,*

1 Department of Electrical and Computer Engineering, University of Massachusetts Lowell, Lowell, MA 01854, USA
2 Department of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, LA 70504, USA
* Authors to whom correspondence should be addressed.
J. Low Power Electron. Appl. 2024, 14(2), 33; https://doi.org/10.3390/jlpea14020033
Submission received: 17 March 2024 / Revised: 26 May 2024 / Accepted: 1 June 2024 / Published: 10 June 2024

Abstract: The detrimental impact of toxic chemicals, gas, and oil spills in aquatic environments poses a severe threat to plant, animal, and human life. Regions such as Cancer Alley exemplify the profound consequences of inadequately controlled chemical spills, which significantly affect the local community. Given the far-reaching effects of these spills, it has become imperative to devise an efficient method for early monitoring, estimation, and cleanup, utilizing affordable and effective techniques. In this research, we explore the application of the U-shaped Neural Network (UNET) and the U-shaped neural network transformer (UNETR) models for the image segmentation of chemical and oil spills. Our models undergo training using the Commonwealth Scientific and Industrial Research Organisation (CSIRO) dataset and the Oil Spill Detection dataset, employing a specialized filtering technique to enhance detection accuracy. After 50 epochs of training, we achieved training accuracies of 95.35% and 91% by applying UNET to the Oil Spill and CSIRO datasets, respectively, and a training accuracy of 75% by applying UNETR to the Oil Spill dataset. Additionally, we integrated mixed precision to expedite the model training process, thus maximizing data throughput. To further accelerate our implementation, we propose the utilization of the Field Programmable Gate Array (FPGA) architecture. The results obtained from our study demonstrate improvements in inference latency on FPGA.

1. Introduction

Cancer Alley spans 85 miles in southeastern Louisiana, stretching from New Orleans to Baton Rouge along the Mississippi River, with a population of approximately 45,000 [1]. This region hosts around 150 plastic plants, chemical facilities, and oil refineries, and this number continues to grow despite the evident environmental impact. The air in Cancer Alley is characterized by toxic emissions and ranks among the most polluted in the United States [1]. Approximately 50 toxic chemicals, including benzene, formaldehyde, ethylene oxide, and chloroprene, contribute to air pollution, with chloroprene being particularly concerning. The pollution in Cancer Alley has severe consequences for residents, many of whom eventually require nebulizers for survival. The recent coronavirus pandemic further exacerbated their plight because of their compromised health. Despite efforts by the Environmental Protection Agency (EPA) to regulate industries in the area and enhance the living standards of its residents, individuals in this region still face a 95% higher risk of cancer from air pollution compared with the rest of America [2,3]. The potential for catastrophic damage to land, marine, and coastal ecosystems underscores the importance of early detection and cleaning of oil and chemical spills in Cancer Alley to minimize environmental harm [4,5].
Chemical spill incidents exhibit a distinctive appearance in satellite images generated by Synthetic Aperture Radar (SAR) technology. This distinctiveness arises from the short gravity waves they induce, altering radar backscatter intensity and creating unique dark formations in SAR images [6,7,8]. Exploiting this characteristic enables the segmentation of the resulting SAR images and facilitates the training of a neural network model on the acquired data. The field of segmentation, as outlined in the literature [9], encompasses various types, including foreground segmentation, panoptic segmentation, semantic segmentation, and instance segmentation. Real-time segmentation primarily aims to predict masks over objects within an image frame with low latency.
Image segmentation is the process of dividing an image into distinct segments, thus enhancing the ease of analysis and comprehension [9]. This technique finds applications in critical fields such as healthcare, transportation, and pattern recognition. Image segmentation algorithms fall into categories such as basic threshold-based, graph-based, morphological-based, edge-based, clustering-based, Bayesian-based, and neural network-based segmentation. Each of these algorithms comes with its own set of advantages and disadvantages, tailored to specific applications. Numerous studies have explored the topic, including the recent publication by A. Kirillov et al. [9], which introduces the “segment anything” framework by Meta AI. Their study implements a prompt-based segmentation tool trained on the most extensive segmentation dataset to date, utilizing 256 GPUs. The SA-1B dataset, created using Meta’s custom data engine, comprises 1 billion masks and 11 million images collected from various countries and continents worldwide. Models trained on the SA-1B dataset demonstrate a capacity to generalize across a wide range of data. However, the model, a transformer model, has notable drawbacks, requiring a substantial amount of energy for training. Additionally, the study reveals that training accuracy improves with larger datasets, indicating a need for more data to train more precise models [10]. In another study, Ronneberger et al. [11] developed the U-NET, a specialized neural network for image segmentation tasks. Their training strategy relies on data augmentation, enabling effective training on a minimal number of images compared with the previous study [10]. The U-NET achieves accuracies of approximately 92% and 77.56% when trained on the PhC-U373 and DIC-HeLa datasets, respectively, for image segmentation tasks. Subsequently, Oktay et al. introduced the Attention U-Net [12], a more energy-efficient adaptation of the U-NET that efficiently learns to focus on target structures of varying shapes and sizes within the dataset, maintaining prediction accuracy without significant energy costs. In addition to these segmentation-focused studies, the work documented in [13] developed neural network models specifically for monitoring oil spills, while [14,15] delve into neural network models for healthcare-related applications.
In this research, we employed neural network models based on the U-NET architecture for image segmentation, specifically targeting chemical and oil spills. Our models were trained using the CSIRO dataset and the Oil Spill Detection dataset. Furthermore, we introduced mixed precision to streamline the model training process, optimizing data throughput on both the CPU and GPU platforms. As an additional acceleration strategy, we advocate the adoption of FPGA architectures, leveraging frameworks like the Xilinx FINN framework [16,17] and HLS4ML [18] to quickly synthesize bitstreams for machine learning models. The structure of this manuscript unfolds as follows: Section 2 provides an expansive description of the segmentation approach employed in this study. Section 3 delves into the neural networks used, while Section 4 explains the concept of mixed precision. Section 5 describes the methodology applied in training the neural network models under investigation. Section 6 presents the preliminary simulation results obtained on both the CPU (central processing unit) and GPU (graphics processing unit). Following that, Section 7 outlines our FPGA optimizer architectures and presents the corresponding simulation results on the FPGA platform. Section 8 discusses the results and challenges of this study. Finally, Section 9 concludes the study.

2. Segmentation of Chemical Spills

Segmentation is a computer vision task that assigns each pixel within an image to a class. Because the task involves extensive pixel-based processing, a thorough understanding of the data is necessary before selecting an appropriate model. Preceding the model training phase, we therefore conducted a detailed examination of the images to enhance our comprehension of the datasets used in this study. Figure 1 illustrates the color distribution of a randomly selected open-source RGB image and a sample from the Oil Spill dataset across both the RGB and HSV color spaces. As shown in Figure 1, color spaces offer insight into how content is dispersed or concentrated across the color channels of our images, and this understanding guides the choice of technique for segmenting different components of the image. In Figure 1, the distribution of the Oil Spill dataset sample follows a linear pattern, differing from that of the golden fish image, and displays varying color intensities, offering guidance on the optimal approach for crafting a segmentation model that captures the various segments of the images.
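As a minimal sketch of this kind of color-space analysis (the file name is hypothetical; OpenCV and Matplotlib are assumed), the snippet below plots per-channel histograms of one sample in both RGB and HSV space, mirroring the comparison in Figure 1:

```python
import cv2
import matplotlib.pyplot as plt

# Load a sample image (hypothetical path); OpenCV reads images as BGR.
bgr = cv2.imread("oil_spill_sample.png")
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for img, names, ax in [(rgb, "RGB", axes[0]), (hsv, "HSV", axes[1])]:
    for i, channel in enumerate(names):
        # 256-bin histogram of one color channel
        hist = cv2.calcHist([img], [i], None, [256], [0, 256])
        ax.plot(hist, label=channel)
    ax.set_title(f"{names} channel distribution")
    ax.legend()
plt.show()
```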
Semantic segmentation and instance segmentation stand out as the two predominant forms of segmentation used today. In instance segmentation [9], individual objects are identified and segmented within an image, with each instance assigned a unique label or color. Semantic segmentation, on the other hand, categorizes each pixel in an image into one of several predefined classes, where objects belonging to the same class share the same label or color. This contrasts with instance segmentation, which treats each object instance as a separate entity within the image. For this study, semantic segmentation is employed.

3. Neural Network

In this research, we employed neural networks based on the U-NET architecture for the segmentation task. The U-NET architecture, as described in previous works [11,12], utilizes techniques such as data augmentation, convolution, pooling, upscaling, and downscaling to achieve its distinctive U-shaped network structure. Because of its ability to attain high training accuracy in a shorter time, the U-NET is well-suited for large-scale oil and chemical spill detection, offering a more power-efficient training process compared with transformer models.
Presently, various neural network models are utilized in segmentation tasks, including the robust SegmentAnything transformer model by Meta AI, the Vanilla U-NET model, the Attention U-NET model, and others. However, transformer models are less power-efficient for this specific task, requiring extensive training on large datasets to achieve comparable accuracy to the U-NET, which can achieve satisfactory results with minimal training. The U-NET implementation in this study is tailored to accommodate the distinct datasets, adapting to variations in image sizes across the datasets.

Architecture

The U-NET architecture adopted in this study closely resembles the configuration described in the previous work [11]. However, unlike the previous work [11], our study employs oil spill datasets for training the model. Illustrated in Figure 2, the U-NET consists of both a contracting and expansive path. The contracting path iteratively employs convolution, followed by rectified linear unit (ReLU) and max-pooling operations. In our architecture, the convolution layers’ feature channels extract features in the form of feature maps, which are subsequently propagated down the network.
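To make one contracting step concrete, the short PyTorch sketch below follows the double convolution + ReLU + max-pooling pattern of [11]; the channel widths are illustrative, not our exact configuration:

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """One contracting step: two 3x3 convolutions + ReLU, then 2x2 max pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        features = self.conv(x)        # feature map kept for the skip connection
        return self.pool(features), features

# Example: first level of the contracting path (3 input channels -> 64 feature maps)
down1 = DownBlock(3, 64)
pooled, skip = down1(torch.randn(1, 3, 256, 256))
```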
The transformer variation of UNET is named UNETR [19]. In UNETR, the downscaling (encoding) portion of the network is replaced with a transformer encoder, while the upscaling (decoding) portion maintains the U-shape, as shown in Figure 3. The UNETR transformer encoder is directly connected to the decoder via skip connections, instead of an attention layer, at different resolutions to compute the final three-dimensional (3D) semantic segmentation output. Skip connections, just as in the UNET model, help the network preserve information about features from the original input at each convolution level. Unlike convolutional neural networks (CNNs), which model images locally, transformers encode images as a sequence of 1D patch embeddings and utilize self-attention modules to learn the weighted sum of values calculated in the hidden layers.
The encoder, shown in Figure 3 below and in Figure 1 of [19], comprises a positional encoding layer and a stack of encoding layers containing the attention and feed-forward layers. The encoder reads input signals and generates representations of the input data once it has learned the sequence representations of the input volume and effectively captured the global multi-scale information. The decoder, on the other hand, generates output token by token, based on the output signal representation, in the form of tokenized patches generated by the encoder. The vanilla decoder comprises a stack of decoder layers and a positional encoder; the decoder layers contain global self-attention/cross-attention and feed-forward layers. Together, the encoder and decoder build the transformer model. In this work, we replace the decoder section with a U-NET decoder.
Transformers used for image recognition tasks are commonly called vision transformers; our UNETR architecture in Figure 3 can therefore be more precisely referred to as a vision UNETR model. Figure 4 shows how images are encoded by vision transformers [20] for image-related classification tasks. The input image is split into patches that constitute a linear sequence of tokens, similar to words in the case of the Bidirectional Encoder Representations from Transformers (BERT) model [21]. The Multiheaded Self-Attention (MSA) block computes self-attention on each head and finally concatenates the results, as shown in (1) to (13). The computation on each head can be parallelized; observing this data structure of the transformer guided the design of our hardware accelerator.
$X \in \mathbb{R}^{H \times W \times D \times C}$ (1)
$X_v \in \mathbb{R}^{N \times (P^3 \cdot C)}$ (2)
$N = (H \cdot W \cdot D)/P^3$ (3)
$E_{pos} \in \mathbb{R}^{N \times K}$ (4)
$E \in \mathbb{R}^{(P^3 \cdot C) \times K}$ (5)
$z_0 = [x_v^1 E;\, x_v^2 E;\, \dots;\, x_v^N E] + E_{pos}$ (6)
$z'_i = \mathrm{MSA}(\mathrm{Norm}(z_{i-1})) + z_{i-1}, \quad i = 1 \dots L$ (7)
$z_i = \mathrm{MLP}(\mathrm{Norm}(z'_i)) + z'_i, \quad i = 1 \dots L$ (8)
$A = \mathrm{Softmax}\!\left(\frac{q k^{T}}{\sqrt{K_h}}\right)$ (9)
$K_h = K/n$ (10)
$\mathrm{SA}(z) = A\,v$ (11)
$\mathrm{MSA}(z) = [\mathrm{SA}_1(z);\, \mathrm{SA}_2(z);\, \dots;\, \mathrm{SA}_n(z)]\, W_{msa}$ (12)
$W_{msa} \in \mathbb{R}^{(n \cdot K_h) \times K}$ (13)
In the equations above, X in (1) represents the input, R denotes the real numbers, H, W, and D denote the height, width, and depth of our image frames, and C represents the number of input image channels. Xv in (2) represents the flattened, uniform, non-overlapping patches of X. P in (2) denotes the resolution of each patch, and N is the length of the sequence, estimated using (3). Epos in (4) denotes the learnable positional embedding, while E in (5) represents the projected patch embedding; K in (4) and (5) denotes the size/dimension of the embedding space. Z in (6)–(8) represents the output sequence computed from the query (q) and the corresponding key (k) and value (v) pairs. A in (9) represents the attention weights/scores. Kh in (10) is the scaling factor, equal to K/n for n heads, which keeps the number of parameters constant as the number of heads varies. In (11), v denotes the values of the input sequence and is used to calculate the self-attention SA of sequence z. MSA denotes multiheaded self-attention, given by (12), and Wmsa in (13) is the matrix of multiheaded trainable parameter weights.
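To make the per-head computation in (9)–(13) concrete, the sketch below is an illustrative plain-PyTorch reimplementation (not our accelerator code); dimension names follow the equations, and the toy weights are random stand-ins:

```python
import torch

def self_attention(q, k, v):
    """Eqs. (9) and (11): A = Softmax(q k^T / sqrt(K_h)), SA(z) = A v."""
    K_h = q.shape[-1]                                   # per-head dimension, K_h = K / n
    A = torch.softmax(q @ k.transpose(-2, -1) / K_h**0.5, dim=-1)
    return A @ v

def multi_head_self_attention(z, Wq, Wk, Wv, W_msa, n_heads):
    """Eqs. (12)-(13): run SA on each head, concatenate, project with W_msa."""
    N, K = z.shape
    K_h = K // n_heads                                  # eq. (10)
    heads = []
    for h in range(n_heads):                            # heads are independent -> parallelizable
        s = slice(h * K_h, (h + 1) * K_h)
        heads.append(self_attention(z @ Wq[:, s], z @ Wk[:, s], z @ Wv[:, s]))
    return torch.cat(heads, dim=-1) @ W_msa             # [SA_1(z); ...; SA_n(z)] W_msa

# Toy usage: N = 16 patch tokens, embedding size K = 64, n = 4 heads
N, K, n = 16, 64, 4
z = torch.randn(N, K)
Wq, Wk, Wv, W_msa = (torch.randn(K, K) for _ in range(4))
out = multi_head_self_attention(z, Wq, Wk, Wv, W_msa, n)   # shape (16, 64)
```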

4. Mixed Precision Architecture

The mixed precision architecture is an optimization technique harnessing the computational power of GPU cores, resulting in 2 to 4 times faster computation and a 50% reduction in memory usage. This approach creates a potent compute engine without necessitating alterations to the hardware architecture [22]. Specifically, Volta cores in NVIDIA GPUs, with a data throughput of 123 teraflops, experience significant benefits from this architecture [22]. By employing 16-bit precision instead of 32-bit precision, computing throughput in Volta cores can be enhanced by a factor of 8, memory throughput can be doubled, and the data unit input size can be halved [22].
In our implementation, we opted for mixed precision over a constant 16-bit precision to address potential imprecision in weight updates associated with FP16. This precision choice is crucial, as cumulative errors could significantly impact the final predictions. Mixed precision allows us to achieve nearly the same training and prediction accuracy as FP32 without altering hyperparameters. NVIDIA libraries, optimized for tensor cores, derive significant advantages from this architecture [22].
Major machine learning frameworks like PyTorch and TensorFlow have seamlessly integrated the mixed precision feature into their frameworks, facilitating the implementation of automatic mixed precision with just a few lines of code, as illustrated in Figure 5 and Figure 6. For further customization, the mixed precision method can be manually added to different sections or lines of code.
In our framework, we utilized the APEX AMP (automatic mixed precision) PyTorch extension to implement mixed precision seamlessly with minimal code on an NVIDIA A100 GPU. Figure 6 illustrates the concept of mixed precision, where FP16 and FP32 values are cast to preserve accuracy. A scale factor of 128, commonly used for loss scaling, is employed to maintain values and accuracy, and serves as a constant in our study.
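A minimal sketch of this APEX AMP pattern is shown below; the model, data, and loss are stand-ins for our actual training loop, and the fixed loss scale of 128 matches the constant used in this study:

```python
import torch
import torch.nn as nn
from apex import amp  # NVIDIA APEX automatic mixed precision

# Stand-in model and data; the real loop uses our UNET and data loader.
model = nn.Conv2d(3, 5, kernel_size=3, padding=1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# O1 patches ops to run in FP16 where numerically safe; loss_scale=128.0
# fixes the scale factor instead of using dynamic loss scaling.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1", loss_scale=128.0)

images = torch.randn(2, 3, 64, 64).cuda()
target = torch.randint(0, 5, (2, 64, 64)).cuda()

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(images), target)
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # gradients flow through the scaled loss
optimizer.step()
```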

5. Training the Model

This phase of the project is the most resource-intensive. We conceptualized the U-NET model, conducted a profiling analysis of our model utilizing the PyTorch profiling library, implemented mixed precision, and scrutinized the resource utilization on both the CPU and GPU.
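For the profiling step, a sketch of the kind of PyTorch profiler pass we ran is shown below (the model and input are stand-ins for our UNET and data):

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())  # stand-in model
x = torch.randn(1, 3, 256, 256)

# Record operator-level time and memory (add ProfilerActivity.CUDA on GPU)
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    model(x)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```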

5.1. Datasets

In this study, we utilized the CSIRO Sentinel dataset and the Oil Spill Detection dataset. The Oil Spill dataset [23], freely available for non-commercial use, has been widely adopted in numerous studies owing to its well-organized structure and ease of use in model training [24,25]. In contrast, the CSIRO Sentinel dataset [26] is expansive and open source but lacks pre-segmented ground truth labels, adding a layer of complexity to our task. The CSIRO dataset comprises 5630 binary images categorized into two classes, denoted as “0” and “1”, where “0” represents images without any oil features (resembling clean seas), while “1” includes images featuring oil. The images in the CSIRO dataset have dimensions of 400 × 400 pixels, distinguishing them from those in the Oil Spill dataset.
The CSIRO dataset is generated via synthetic aperture radar (SAR) sensors [27]. These sensors are active microwave satellite instruments that operate day and night in any weather conditions, with wide swaths (>100 km) that can cover large areas of the ocean. They transmit short, regular pulses of radio waves roughly every 100 microseconds and record the strength, phase, and travel time of the returning signal. Oil spill signatures in the generated image typically appear as dark patches because of the decreased radar backscatter compared with the much brighter surrounding seawater. These images can be used to assess the frequency and spatial distribution of oil spills. The SAR imagery used to create these data was found to be far superior to optical and thermal satellite imagery [27].

5.2. Preprocessing

Before initiating the model training, we generated ground truth labels for the CSIRO Sentinel dataset, a notable undertaking given the dataset’s substantial size. As the dataset comprised unlabeled images, we employed the LabKit software [28] to label the images manually. LabKit, an open-source tool, facilitates semi-manual image segmentation through selected samples and a random forest algorithm. The segmentation output from LabKit is in TIFF format, which prompted us to extract cropped images (masks) of the segmented samples. Subsequently, we resized these cropped images in Paint to obtain the final output, as illustrated in Figure 7.
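A sketch of this mask extraction and resizing step is given below; the folder names are hypothetical, Pillow stands in for the manual Paint step, and the 400 × 400 target follows the CSIRO image size:

```python
from pathlib import Path
from PIL import Image

SRC = Path("labkit_masks")      # hypothetical folder of LabKit TIFF outputs
DST = Path("ground_truth")
DST.mkdir(exist_ok=True)

for tif in SRC.glob("*.tif"):
    mask = Image.open(tif)
    # Nearest-neighbour resampling keeps label values intact while
    # matching the 400x400 CSIRO image dimensions.
    mask = mask.resize((400, 400), resample=Image.NEAREST)
    mask.save(DST / f"{tif.stem}.png")
```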
While exploring alternative tools, we experimented with LabelStudio [29] and QuPath [30]. LabelStudio proved less suitable for this task, generating multiple images for various segments and making the output challenging to utilize.

5.3. Feature Extraction

The U-NET neural network leverages convolution and augmentation to extract features. To enhance its performance, we apply a Gaussian filter that effectively reduces background noise, making it easier to distinguish spill pixels from the surrounding background.
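A sketch of this denoising step is shown below; the kernel size, sigma, and file name are illustrative, and OpenCV stands in for any Gaussian filter implementation:

```python
import cv2

image = cv2.imread("sar_sample.png", cv2.IMREAD_GRAYSCALE)  # hypothetical SAR frame

# A 5x5 Gaussian kernel with sigma=1.0 smooths speckle-like background
# noise before the frame is fed to the network.
denoised = cv2.GaussianBlur(image, ksize=(5, 5), sigmaX=1.0)
cv2.imwrite("sar_sample_denoised.png", denoised)
```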

5.4. Classification

Given the similarity between the Oil Spill dataset and the CSIRO dataset, we adhered to the color labeling standards employed in various studies for SAR images [6,24,31], as depicted in Figure 8. Although there are emerging datasets [32,33], one of the objectives of this research is to establish a standardized color representation. This standardization aims to provide a consistent framework across diverse datasets, enabling researchers in the field to access a more extensive and uniformly labeled dataset for training purposes.

6. Simulation Results

After training our models for a total of 50 epochs using an NVIDIA A100 GPU on Google Colab, we obtained training accuracies of 95.35% and 91% on the Oil Spill and CSIRO datasets, respectively, as shown in Table 1 below.
The results in Table 1 indicate a training accuracy of approximately 75% with the UNETR model, which is noticeably lower than the approximate 95% accuracy achieved with the UNET model. Vision transformer models typically demand a greater volume of data samples to match the accuracy levels of deep neural network models such as UNET. Therefore, employing a larger dataset might yield noticeable improvements in the training accuracy of the UNETR model. A limitation of the Oil Spill dataset that could have influenced the training and testing accuracies may lie in the composition of the available samples. The dataset prominently features images containing ocean, oil look-alike, and land samples. The emphasis on these features may have skewed the inference results towards more positive outcomes for these three classes.
In Figure 9 and Figure 10, we plot the training and testing accuracies for the Oil Spill and CSIRO datasets over 50 epochs.

7. FPGA Accelerator

Integrating an FPGA accelerator serves two purposes: power optimization and improved streaming efficiency. The SegmentAnything model, for instance, employs 256 GPUs during training, resulting in significant power consumption. In this study, we save our model in ONNX format for compatibility with FINN. The FINN framework [16] incorporates the Brevitas library, allowing the generation of FPGA accelerators from pre-trained models. ONNX, an open-source format, is used to represent machine learning models. The FINN framework takes the ONNX file and generates an FPGA model for each layer of the network, establishing communication between layers through AXI streams.
The Brevitas framework [34], which works with the FINN builder, is used in the development of an FPGA accelerator for our model. Brevitas is a PyTorch library for neural network quantization, with support for both post-training quantization (PTQ) and quantization-aware training (QAT) [34]. It offers quantized implementations of the most common PyTorch layers used in deep neural networks (DNNs) under brevitas.nn, including QuantConv1d, QuantConv2d, QuantConvTranspose1d, QuantConvTranspose2d, QuantMultiheadAttention, QuantRNN, QuantLSTM, and several others. For each of these layers, the quantization of different tensors (input, weight, bias, outputs, and other factors) can be individually tuned according to a wide range of quantization settings [34]. Brevitas thus enables fine-grained quantization-aware training [16].
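For example, a quantized convolution can be declared as in the sketch below (the bit width and layer sizes are illustrative; the layer is a drop-in replacement for nn.Conv2d):

```python
import torch
import brevitas.nn as qnn

# 8-bit weight quantization on a standard 3x3 convolution
qconv = qnn.QuantConv2d(
    in_channels=3,
    out_channels=64,
    kernel_size=3,
    padding=1,
    weight_bit_width=8,
)
y = qconv(torch.randn(1, 3, 256, 256))
```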
Another tool used to generate our accelerator is ONNX Runtime [35], which provides integration with standard ONNX-based toolchains. ONNX export is built into PyTorch as torch.onnx, and the Hugging Face transformers.onnx package converts transformer models to ONNX-format models. Open Neural Network eXchange (ONNX) is an open standard format for representing machine learning models. The torch.onnx module captures the computation graph from a native PyTorch torch.nn.Module and converts it into an ONNX graph, which can be exported and consumed by the many runtimes that support ONNX. The ONNX standard supports quantization down to 8 bits, while the Quantized ONNX (QONNX) variant can express quantization down to 1 bit for both weights and activations [16].
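The export itself is a single call, sketched below with a stand-in model and a hypothetical output file name:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())  # stand-in for the trained UNET
dummy_input = torch.randn(1, 3, 256, 256)  # fixes the graph's input shape

# Trace the computation graph and write it to an ONNX file
torch.onnx.export(model, dummy_input, "unet.onnx", opset_version=13)
```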
Finally, the FINN framework [16] is a quantization-aware framework for generating custom FPGA dataflow accelerators or register-transfer level (RTL) models. It is designed to work with the ONNX model format: FINN uses ONNX as its intermediate representation for neural networks, so almost every FINN component uses ONNX and its Python API. FINN supports two specialized variants of ONNX, namely, QONNX and FINN-ONNX. FINN also provides a ModelWrapper class, a thin wrapper around the ONNX model that makes it easier to analyze and manipulate ONNX graphs. This wrapper provides many helper functions while still giving full access to the ONNX protobuf representation.
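As an illustration of the ModelWrapper workflow (import paths vary across FINN/QONNX releases, so this sketch follows the pattern in the FINN documentation rather than a specific version):

```python
from finn.core.modelwrapper import ModelWrapper        # newer releases: qonnx.core.modelwrapper
from finn.transformation.infer_shapes import InferShapes  # newer releases: qonnx.transformation

model = ModelWrapper("unet.onnx")        # thin wrapper over the ONNX protobuf
model = model.transform(InferShapes())   # example graph transformation
for node in model.graph.node:            # full access to the underlying graph
    print(node.op_type)
model.save("unet_shapes.onnx")
```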
FINN supports three types of mem_mode attributes for the node MatrixVectorActivation [16]. This mode controls how the weight values are accessed during the execution phase. The mode setting has a direct influence on the resulting circuit. The three settings for the mem_mode supported in FINN are “const”, “decoupled”, and “external”. Each comes with its own advantages and disadvantages. Figure 11 shows the design flow employed in the design of our accelerator.
A significant challenge encountered during our model development was ensuring the proper functioning of the software stack. We attempted to utilize HLS4ML [18], which is primarily designed for Keras, as an alternative to FINN, but faced similar compatibility issues. Both frameworks exhibited instability during accelerator development. However, they hold promise for significantly enhancing the speed of FPGA bitstream development for machine learning models in the future, given their ongoing development. To overcome the hurdles associated with developing and verifying bitstreams using FINN and HLS4ML, we opted to design accelerators for the UNET and UNETR models from scratch using High-Level Synthesis (HLS). The resulting accelerators are depicted in Figure 12 and Figure 13.

FPGA Inference Results

We generated an FPGA design for our model via HLS and verified the design on the Pynq Z1 board. The resource usage of our FPGA design indicates low power consumption on the Pynq Z1 board [36]. Table 2 shows the resource usage profile of our UNET and UNETR models, as well as the inference latency achieved.

8. Discussion

The semantic segmentation technique utilized in this study assigns a class label to each pixel in the image samples from our dataset. Figure 8 illustrates the names of the various classes present in our dataset, while Figure 14 reveals that the ocean (background) constitutes most of the samples. This distribution suggests that our models are more likely to predict class 0 (ocean) accurately because of its predominance among the samples compared with the other classes.
In Figure 15, we evaluate our classification model’s performance using a confusion matrix, specifically focusing on the UNET model. The results indicate that all classes perform well except for class 3 (ship). The UNET model struggles to distinguish between the ocean (class 0) and ships (class 3), frequently misclassifying ships as the ocean. This issue can be attributed to the fact that class 3 has the fewest samples (22,981) in the dataset, as shown in Figure 14, which may be insufficient for the model to generalize effectively with only 50 training epochs. Figure 16 illustrates the differences between the test mask and the predicted mask after training the model for 50 epochs. Finally, we performed inference on FPGA and displayed the results in Figure 17. In the real world, the impacts of chemical spills and contamination are not only prevalent in Cancer Alley but also in other less-developed parts of the world. The results of this study will have far-reaching implications and reduce the cost of monitoring contamination and effectively detecting chemical spills.
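The per-pixel evaluation behind Figure 15 can be reproduced with a few lines, sketched below with random arrays standing in for the flattened ground-truth and predicted masks:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Stand-ins: flattened per-pixel class labels for ground truth and prediction
y_true = np.random.randint(0, 5, size=10_000)   # 5 classes, as in the Oil Spill dataset
y_pred = np.random.randint(0, 5, size=10_000)

cm = confusion_matrix(y_true, y_pred, labels=range(5))
# Row-normalize so each row shows how pixels of one true class were predicted
cm_norm = cm / cm.sum(axis=1, keepdims=True)
print(cm_norm.round(2))
```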
To compare the performance of our model, we examined related studies that applied neural network models to image segmentation of an oil spill dataset or a related dataset, as shown in Table 3. C. Li et al. [37] perform image segmentation using a dual-stream U-NET (DS-UNET) on two datasets, namely, the Palsar and Sentinel datasets. Their study measures model performance using three metrics: the dice similarity coefficient (DSC), the average Hausdorff distance (HD), and the F1 score. Another study, by A.V. Maria Anto et al. [38], uses a convolutional neural network (CNN) for oil spill detection and achieves 85% testing accuracy. The study by J. Fan and C. Liu [39] addresses two problems: the scarcity of sufficient oil spill data and the difficulty of detecting oil spills in an environment containing oil spill look-alikes. Their study uses multitask generative adversarial networks (MTGANs) to detect and semantically segment oil spill data, applied to three datasets, namely, the Sentinel-1, ERS-1/2, and GF-3 satellite datasets. In [40], X. Kang et al. use a self-supervised spectral–spatial transformer network (SSTNet) for feature extraction on custom hyperspectral oil spill database (HOSD) data; their training technique involves a large number of epochs to achieve a model that generalizes with high accuracy. J. Fan et al. [41] built a framework using a multi-feature semantic complementation network (MFSCNet) for oil spill localization and segmentation of SAR images obtained from Sentinel-1 satellite data. The study by Mahmoud, A.S. et al. [42] applies a deep learning UNET model based on the Dual Attention Model (DAM). This model, named DAM-UNet, integrates a dual attention mechanism, consisting of a channel attention map and a position attention map, to selectively highlight the relevant and discriminative global and local characteristics of oil spills in SAR images. Finally, Dong et al. [43] propose three deep learning-based marine oil spill detection methods: a direct detection method based on a transformer and UNet, a detection method based on the Fast and Flexible Denoising CNN (FFDNet) and TransUNet with denoising before detection, and a detection method based on integrated multi-model learning. The performance benefits of their proposed methods are verified by comparison with semantic segmentation models such as UNet, SegNet, and DeepLabV3+. Compared with our work, these approaches mostly require more training epochs to reach comparable accuracy, as shown in Table 3.
Apart from FPGAs, other hardware used to perform inference includes various Application-Specific Integrated Circuits (ASICs) and neuromorphic hardware for event-based datasets. Since this study focuses on CPUs, GPUs, and FPGAs, Table 4 compares results from related FPGA implementations for image segmentation using UNET or other related networks.
When performing machine learning inference on images using specialized non-reconfigurable hardware, latency and throughput can present significant challenges. FPGAs address these issues effectively because of their ability to be reconfigured and programmed with different architectures, thereby enhancing inference performance without requiring new hardware purchases. Additionally, FPGAs consume less power compared with GPUs and CPUs. These benefits make FPGAs the preferred choice for resource-intensive tasks like image segmentation, as demonstrated in this study.

9. Conclusions

In this study, we utilized the UNET and UNETR neural network architectures to perform semantic segmentation of chemical spills, leveraging two distinct datasets. This study has a profound real-world application, as it can be challenging to distinguish oil spill look-alikes from actual oil spills in the field. A notable aspect of our work is the development of reusable labeled ground truth images specifically tailored for the CSIRO dataset, a task previously unexplored. Our implementation integrates mixed precision techniques to enhance computational efficiency across both CPU and GPU platforms. Furthermore, we engineered an FPGA optimizer for the neural networks using High-Level Synthesis (HLS). Despite initial setbacks with tools like FINN and HLS4ML, we successfully devised a custom FPGA implementation using Vivado HLS. Our findings reveal a significant discrepancy in resource utilization between the UNETR and UNET models, primarily because of their divergent sizes. Consequently, the implementation of UNETR necessitates targeting alternative Pynq (software)-compatible FPGA boards boasting ample LUT and DSP resources, such as the ZCU 102 and Alveo boards. Ultimately, our experiments demonstrate that the UNET model surpasses the UNETR model in terms of prediction accuracy on both CPU and GPU platforms. Moreover, owing to its more efficient resource utilization, the UNET model emerges as the preferred choice for this task. Finally, the results obtained from our study demonstrate improvements in inference latency on FPGA, with ~94% prediction accuracy using UNET and ~77% using UNETR.

Author Contributions

Conceptualization, U.L.U. and M.M.; methodology, U.L.U. and M.M.; software, U.L.U.; validation, U.L.U.; formal analysis, U.L.U.; investigation, U.L.U.; resources, U.L.U. and M.M.; writing—original draft preparation, U.L.U.; writing—review and editing, U.L.U. and M.M.; supervision, M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Endowed Chair Fund.

Data Availability Statement

Data used in this research will be available here: https://github.com/Leoudeji/OptimalNN_JLPEA, accessed on 16 March 2024.

Acknowledgments

I appreciate the support provided by my advisor during this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Friese, Z. Fight or Flight: A Story of Survival and Justice in Cancer Alley. Women Lead. Chang. 2023, 7, 3–18. [Google Scholar]
  2. Terrell, K.A.; Julien, G.S. Discriminatory outcomes of industrial air permitting in Louisiana, United States. Environ. Chall. 2023, 10, 100672. [Google Scholar] [CrossRef]
  3. James, W.; Jia, C.; Kedia, S. Uneven magnitude of disparities in cancer risks from air toxics. Int. J. Environ. Res. Public Health 2012, 9, 4365–4385. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  4. Terrell, K.A.; Julien, G.S. Air pollution is linked to higher cancer rates among black or impoverished communities in Louisiana. Environ. Res. Lett. 2022, 17, 014033. [Google Scholar] [CrossRef]
  5. Bonatesta, F.; Emadi, C.; Price, E.R.; Wang, Y.; Greer, J.B.; Xu, E.G.; Schlenk, D.; Grosell, M.; Mager, E.M. The developing zebrafish kidney is impaired by Deepwater Horizon crude oil early-life stage exposure: A molecular to whole-organism perspective. Sci. Total Environ. 2022, 808, 151988. [Google Scholar] [CrossRef] [PubMed]
  6. Rousso, R.; Katz, N.; Sharon, G.; Glizerin, Y.; Kosman, E.; Shuster, A. Automatic Recognition of Oil Spills Using Neural Networks and Classic Image Processing. Water 2022, 14, 1127. [Google Scholar] [CrossRef]
  7. Radeta, M.; Zuniga, A.; Motlagh, N.H.; Liyanage, M.; Freitas, R.; Youssef, M.; Tarkoma, S.; Flores, H.; Nurmi, P. Deep Learning and the Oceans. Computer 2022, 55, 39–50. [Google Scholar] [CrossRef]
  8. Incardona, J.P.; Carls, M.G.; Teraoka, H.; Sloan, C.A.; Collier, T.K.; Scholz, N.L. Aryl hydrocarbon receptor-independent toxicity of weathered crude oil during fish development. Environ. Health Perspect. 2005, 113, 1755–1762. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  9. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything; Meta AI: New York, NY, USA, 2023. [Google Scholar]
  10. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 6000–6010. [Google Scholar]
  11. Ronneberger, O.; Fischer, P.; Brox, T. U-NET: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; University of Freiburg: Freiburg, Germany, 2015. [Google Scholar] [CrossRef]
  12. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.P.; Misawa, K.; Mori, K.; McDonagh, S.G.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
  13. Štepec, D.; Martinčič, T.; Skočaj, D. Automated System for Ship Detection from Medium Resolution Satellite Optical Imagery. In Proceedings of the OCEANS 2019 MTS/IEEE SEATTLE, Seattle, WA, USA, 27–31 October 2019; pp. 1–10. [Google Scholar] [CrossRef]
  14. Bai, P.; Yang, K.; Min, X.; Guo, Z.; Li, C.; Fu, Y.; Han, C.; Lu, X.; Liu, Q. A Novel Framework for Improving Pulse-Coupled Neural Networks with Fuzzy Connectedness for Medical Image Segmentation. IEEE Access 2020, 8, 138129–138140. [Google Scholar] [CrossRef]
  15. Trang, K.; Nguyen, H.A.; TonThat, L.; Do, H.N.; Vuong, B.Q. An Ensemble Voting Method of Pre-Trained Deep Learning Models for Skin Disease Identification. In Proceedings of the 2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom), Malang, Indonesia, 16–18 June 2022; pp. 445–450. [Google Scholar]
  16. FINN. Available online: https://finn.readthedocs.io/en/latest/getting_started.html (accessed on 24 February 2024).
  17. FINN. Available online: https://github.com/Xilinx/finn/blob/main/docs/finn/internals.rst (accessed on 24 February 2024).
  18. HLS4ML. Available online: https://github.com/fastmachinelearning/hls4ml (accessed on 3 March 2024).
  19. Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B.; Roth, H.R.; Xu, D. UNETR: Transformers for 3D Medical Image Segmentation. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 1748–1758. [Google Scholar] [CrossRef]
  20. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the ICLR 2021, Virtual, 3–7 May 2021. [Google Scholar]
  21. Topal, M.O.; Bas, A.; van Heerden, I. Exploring Transformers in Natural Language Generation: GPT, BERT, and XLNet. arXiv 2021, arXiv:2102.08036. [Google Scholar]
  22. Narang, S.; Diamos, G.; Elsen, E.; Micikevicius, P.; Alben, J.; Garcia, D.; Ginsburg, B.; Houston, M.; Kuchaiev, O.; Venkatesh, G.; et al. Mixed Precision Training. arXiv 2018, arXiv:1710.03740. [Google Scholar]
  23. Oil Spill detection Dataset. Available online: https://m4d.iti.gr/oil-spill-detection-dataset/ (accessed on 26 August 2023).
  24. Krestenitis, M.; Orfanidis, G.; Ioannidis, K.; Avgerinakis, K.; Vrochidis, S.; Kompatsiaris, I. Oil Spill Identification from Satellite Images Using Deep Neural Networks. Remote Sens. 2019, 11, 1762. [Google Scholar] [CrossRef]
  25. Krestenitis, M.; Orfanidis, G.; Ioannidis, K.; Avgerinakis, K.; Vrochidis, S.; Kompatsiaris, I. Early Identification of Oil Spills in Satellite Images Using Deep CNNs. In International Conference on Multimedia Modeling; Springer: Cham, Switzerland, 2019; pp. 424–435. [Google Scholar]
  26. CSIRO Sentinel-1 SAR Image Dataset of Oil and Non-Oil Features. Available online: https://data.csiro.au/collection/csiro:57430 (accessed on 3 September 2023).
  27. Blondeau-Patissier, D.; Schroeder, T.; Irving, P.; Witte, C.; Steven, A. Satellite Detection of Oil Spills in the Great Barrier Reef Using the Sentinel-1, -2 and -3 Satellite Constellations—A Technical Assessment of a Synergistic Approach Using SAR, Optical and Thermal Information; The Commonwealth Scientific and Industrial Research Organisation: Canberra, Australia, 2019. [Google Scholar]
  28. Labkit. Available online: https://imagej.net/downloads (accessed on 20 December 2023).
  29. Label Studio. Available online: https://labelstud.io/ (accessed on 20 December 2023).
  30. Qupath. Available online: https://qupath.github.io/ (accessed on 20 December 2023).
  31. Tysiąc, P.; Strelets, T.; Tuszyńska, W. The Application of Satellite Image Analysis in Oil Spill Detection. Appl. Sci. 2022, 12, 4016. [Google Scholar] [CrossRef]
  32. Weber, E.; Papadopoulos, D.P.; Lapedriza, A.; Ofli, F.; Imran, M.; Torralba, A. Incidents1M: A Large-Scale Dataset of Images with Natural Disasters, Damage, and Incidents. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 4768–4781. [Google Scholar] [CrossRef] [PubMed]
  33. Tate, C.; Fries, D.P.; Vignati, M.; Francis, K. Using Model-Free Reinforcement Learning Combined with Underwater Mass Spectrometer and Material Archiving Coupled to Lab Analysis for Autonomous Chemical Source Verifications. In Proceedings of the OCEANS 2021: San Diego—Porto, San Diego, CA, USA, 20–23 September 2021; pp. 1–9. [Google Scholar]
  34. Brevitas. Available online: https://xilinx.github.io/brevitas/setup.html (accessed on 24 February 2024).
  35. ONNX Runtime. Available online: https://onnxruntime.ai/ (accessed on 24 February 2024).
  36. Pynq Z1. Available online: https://digilent.com/shop/pynq-z1-python-productivity-for-zynq-7000-arm-fpga-soc/ (accessed on 26 February 2024).
  37. Li, C.; Wang, M.; Yang, X.; Chu, D. DS-UNet: Dual-Stream U-Net for Oil Spill Detection of SAR Image. IEEE Geosci. Remote Sens. Lett. 2023, 20, 4014905. [Google Scholar] [CrossRef]
  38. Anto, A.V.M.; Eswar, B.V.; C, T.; Subash, N.; Thoufiq, K.R. Liquid Petroleum Hydrocarbon Ocean Coastal Water Pollution Identification Using Deep Neural Network. In Proceedings of the 2023 9th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 17–18 March 2023; pp. 1609–1613. [Google Scholar] [CrossRef]
  39. Fan, J.; Liu, C. Multitask GANs for Oil Spill Classification and Semantic Segmentation Based on SAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 2532–2546. [Google Scholar] [CrossRef]
  40. Kang, X.; Deng, B.; Duan, P.; Wei, X.; Li, S. Self-Supervised Spectral–Spatial Transformer Network for Hyperspectral Oil Spill Mapping. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5507410. [Google Scholar] [CrossRef]
  41. Fan, J.; Zhang, S.; Wang, X.; Xing, J. Multifeature Semantic Complementation Network for Marine Oil Spill Localization and Segmentation Based on SAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 3771–3783. [Google Scholar] [CrossRef]
  42. Mahmoud, A.S.; Mohamed, S.A.; El-Khoriby, R.A.; AbdelSalam, H.M.; El-Khodary, I.A. Oil Spill Identification based on Dual Attention UNet Model Using Synthetic Aperture Radar Images. J. Indian Soc. Remote Sens. 2023, 51, 121–133. [Google Scholar] [CrossRef]
  43. Dong, X.; Li, J.; Li, B.; Jin, Y.; Miao, S. Marine Oil Spill Detection from Low-Quality SAR Remote Sensing Images. J. Mar. Sci. Eng. 2023, 11, 1552. [Google Scholar] [CrossRef]
  44. Jia, W.; Cui, J.; Zheng, X.; Wu, Q. Design and Implementation of Real-time Semantic Segmentation Network Based on FPGA. In Proceedings of the 2021 7th International Conference on Computing and Artificial Intelligence (ICCAI’21), Tianjin, China, 23–26 April 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 321–325. [Google Scholar] [CrossRef]
  45. Basalama, S.; Sohrabizadeh, A.; Wang, J.; Guo, L.; Cong, J. FlexCNN: An End-to-end Framework for Composing CNN Accelerators on FPGA. ACM Trans. Reconfigurable Technol. Syst. 2023, 16, 23. [Google Scholar] [CrossRef]
  46. Chen, J.; Wang, B.; He, S.; Xing, Q.; Su, X.; Liu, W.; Gao, G. HISP: Heterogeneous Image Signal Processor Pipeline Combining Traditional and Deep Learning Algorithms Implemented on FPGA. Electronics 2023, 12, 3525. [Google Scholar] [CrossRef]
Figure 1. RGB (Red, Green, Blue) image and dataset sample across the RGB and HSV (Hue, Saturation, Value) color spaces.
Figure 2. U-Net architecture.
Figure 3. UNETR architecture.
Figure 4. Architecture of the vision transformer.
Figure 5. Diagram showing the implementation of mixed precision using the PyTorch apex amp framework.
Figure 6. Diagram showing the concept of the mixed precision architecture.
Figure 7. Preprocessing architecture.
Figure 8. Labeling convention used to create labels for the dataset.
Figure 9. Plot of training vs. testing accuracy using the UNET model on the Oil Spill (left) and CSIRO datasets (right).
Figure 10. Plot of training vs. testing accuracy using the UNETR model on the Oil Spill (left) and CSIRO datasets (right).
Figure 11. Diagram showing the generated FPGA architecture using Vivado HLS.
Figure 12. Architectures to handle the transformer component of our UNETR model.
Figure 13. Architecture that handles the CNN components of our UNETR model.
Figure 14. Distribution of data classes within the Oil Spill dataset.
Figure 15. Confusion matrix showing the performance of our UNET model.
Figure 16. Test mask and the predicted mask.
Figure 17. FPGA setup.
Table 1. Simulation results.

Study | Dataset | Training Accuracy | Testing Accuracy
Mario et al. [20] | Oil Spill dataset | 96.43% (600 epochs) | -
Rousso et al. [6] | Oil Spill dataset | 96.78% (491 epochs) | -
Our study (UNETR) | Oil Spill dataset | ~75% (50 epochs) | ~77%
Our study (UNET) | Oil Spill dataset | 95.35% (50 epochs) | 94.2%
Our study (UNET) | CSIRO dataset | 91% (50 epochs) | 87.3%
Table 2. FPGA resource usage and power consumption.

Model | LUT | BRAM | DSP | Power (W) | Latency (s)
UNET | 13,255 | N/A | 109 | 3.91 | 0.67
UNETR | 372 k | N/A | 2421 | - | -

“N/A” denotes that the amount of memory used varies.
Table 3. Related studies.

Study | Dataset | Model | Train Accuracy | Test Accuracy | # of Epochs
C. Li et al. [37] | Palsar and Sentinel datasets | DS-UNET | ~73% | ~73% | -
A. V. Maria Anto et al. [38] | Oil Spill | CNN | - | 85% | -
J. Fan and C. Liu [39] | Multiple Oil Spill datasets | MTGANs | - | 97.47% | 300 (classification), 50 (segmentation)
X. Kang et al. [40] | Hyperspectral Oil Spill data | SSTNet | - | 95.96% | 510 (pretext task), 30 (downstream task)
J. Fan et al. [41] | Oil Spill dataset (Sentinel-1) | MFSCNet | - | 99.41% | 100
Mahmoud, A.S. et al. [42] | EG-Oil Spill dataset | DAM-UNet | - | 94.2% | 300
Dong et al. [43] | Deep SAR Oil Spill dataset (SOS) | Transformer-UNet, FFDNet, TransUNet | - | 93.58% | -
This study (UNET) | Oil Spill and CSIRO datasets | UNET | 95.35% / 91% | 94% | 50
This study (UNETR) | Oil Spill dataset | UNETR | 75% | 77% | 50
Table 4. Related FPGA studies.

Study | Model | LUT | BRAM | DSP | Power (W) | Latency (s)
W. Jia et al. [44] | E-Net | 62,599 | 257 | 689 | - | -
S. Basalama et al. [45] | FlexCNN | - | - | - | - | 0.066 (66 ms)
Chen et al. [46] | UNet3 | 2696 | - | 7 | 1.496 | 15 (15,139 ms)
This study (UNET) | UNET | 13,255 | N/A | 109 | 3.91 | 0.67
This study (UNETR) | UNETR | 372 k | N/A | 2421 | - | -
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
