Editorial

Recent Advances in Computer Vision: Technologies and Applications

Mingliang Gao, Guofeng Zou, Yun Li and Xiangyu Guo
1 School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo 255000, China
2 Nanyang Science and Technology Museum, Nanyang 473000, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(14), 2734; https://doi.org/10.3390/electronics13142734
Submission received: 22 May 2024 / Revised: 29 June 2024 / Accepted: 4 July 2024 / Published: 12 July 2024
(This article belongs to the Special Issue Recent Advances in Computer Vision: Technologies and Applications)

1. Introduction

Computer vision plays a pivotal role in modern society, transforming fields such as healthcare, transportation, entertainment, and manufacturing by enabling machines to interpret and understand visual information. In recent years, the theory and technology of computer vision have advanced significantly, driven by rapid growth in computing power and intelligent learning algorithms. This progress has led to substantial achievements in areas including object detection and tracking, image analysis and understanding, object recognition, and smart cities.
Despite this remarkable progress, several challenges persist. Chief among them is the robustness and generalization of models, particularly in complex and dynamic environments. Current models often struggle to handle occlusions and variations in lighting conditions, and to interpret context in ambiguous situations. In addition, the interpretability and explainability of deep learning models in computer vision remain areas of active research, as understanding model decisions is crucial for building trust and ensuring ethical deployment.
The future of computer vision lies in developing multi-modal, large-scale models that can effectively integrate information from multiple sources, such as images, text, and audio. This trend toward multimodal approaches will enable a more comprehensive understanding and interpretation of visual content, improving performance in high-level vision tasks such as image captioning, visual question answering, and image synthesis. Advances in self-supervised learning, few-shot learning, and meta-learning are likewise expected to further enhance the capabilities of computer vision systems, making them more adaptable, efficient, and intelligent in diverse real-world scenarios.
Therefore, this Special Issue aims to highlight cutting-edge research and innovative approaches in computer vision, including but not limited to multimodal learning, large-scale model development, and novel applications across domains. By bringing together researchers and experts in the field, it seeks to foster collaboration, inspire new ideas, and drive the future development of computer vision toward more robust, versatile, and impactful solutions.

2. The Present Issue

This Special Issue includes 10 papers: 8 present novel studies and approaches to open problems in computer vision, and 2 are reviews offering overviews of image segmentation and of multi-modal information fusion, respectively. A brief description of these 10 papers is provided below.
Liu et al. (contribution 1) compared 2D and 3D convolution in stereo matching and highlighted that 2D convolution is more computationally efficient and faster. However, 2D convolution tends to lack rich semantic information, which impacts the robustness and accuracy of disparity maps under different illumination conditions. To address this issue, they introduced a multi-scale cost attention and adaptive fusion stereo matching network (MCAFNet) to enrich the cost volume information. Experimental results show that the proposed MCAFNet achieves a lower error-matching rate.
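For readers unfamiliar with the representation at the heart of this comparison, the sketch below builds a simple correlation-based cost volume, the kind of volume that 2D convolutions can aggregate cheaply. It is a generic, minimal illustration on toy inputs, not MCAFNet itself; the function and variable names are ours.

```python
# A minimal sketch (not MCAFNet itself) of a correlation-based cost volume,
# the representation that lets stereo networks aggregate matching cost with
# inexpensive 2D convolutions; all names and shapes here are illustrative.
import numpy as np

def correlation_cost_volume(feat_left, feat_right, max_disp):
    """feat_*: (C, H, W) feature maps; returns a (max_disp, H, W) cost volume."""
    C, H, W = feat_left.shape
    cost = np.zeros((max_disp, H, W), dtype=feat_left.dtype)
    for d in range(max_disp):
        # Correlate each left pixel with the right pixel shifted d columns left.
        cost[d, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, :W - d]).mean(axis=0)
    return cost

# Toy usage: random features, winner-take-all disparity from the volume.
rng = np.random.default_rng(0)
feat_l, feat_r = rng.standard_normal((2, 32, 24, 48)).astype(np.float32)
volume = correlation_cost_volume(feat_l, feat_r, max_disp=16)
disparity = volume.argmax(axis=0)  # (H, W) integer disparity map
```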
Zhao et al. (contribution 2) proposed a robust method to enhance low-light images, especially in extremely dark scenes. The approach involves constructing a multi-exposure image sequence, fusing it using luminance and gradient weights, and suppressing noise with a bootstrap filter. Evaluation on various metrics demonstrates superior accuracy and robustness compared with conventional algorithms.
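As a rough illustration of the kind of pipeline described (not the authors' exact method), the sketch below fuses a multi-exposure sequence with per-pixel weights derived from luminance and gradient magnitude; the filtering-based noise-suppression step is omitted, and the parameters are illustrative.

```python
# A minimal sketch, in the spirit of the pipeline described above (not the
# authors' exact method): fuse a multi-exposure sequence with per-pixel
# weights built from luminance (favoring well-exposed pixels) and gradient
# magnitude (favoring detailed regions). The noise-suppression filtering
# step is omitted; sigma and the toy gains are illustrative.
import numpy as np

def fuse_exposures(images, sigma=0.2):
    """images: list of grayscale float arrays in [0, 1]; returns a fused image."""
    stack = np.stack(images)                               # (N, H, W)
    # Luminance weight: Gaussian centered on mid-gray (well-exposed pixels).
    w_lum = np.exp(-((stack - 0.5) ** 2) / (2 * sigma ** 2))
    # Gradient weight: spatial gradient magnitude of each exposure.
    gy, gx = np.gradient(stack, axis=(1, 2))
    weights = w_lum * (1.0 + np.sqrt(gx ** 2 + gy ** 2))
    weights /= weights.sum(axis=0, keepdims=True) + 1e-12  # normalize over N
    return (weights * stack).sum(axis=0)

# Toy usage: three synthetic exposures of the same gradient scene.
base = np.linspace(0, 1, 64 * 64).reshape(64, 64)
fused = fuse_exposures([np.clip(base * g, 0, 1) for g in (0.5, 1.0, 2.0)])
```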
Jocovic et al. (contribution 3) examined and classified the types of questions used in tests, assessed current automated test-grading systems, and developed an AI-based automated test-grading system that outperformed existing tools. They comprehensively evaluated the system on a variety of digitally formatted paper tests and showed that it achieved a satisfactory recognition success rate.
Hachaj et al. (contribution 4) introduced the Frangi neuron for high-level Hessian-based image processing, which can be trained with minimal data. Their experiments demonstrated its effectiveness in image segmentation, showing that it outperforms the U-net on medical datasets. Additionally, the Frangi network exhibited superior performance in ROC AUC compared to both U-net and the Frangi algorithm, while being significantly faster than non-GPU implementations. This neuron can also be integrated into other networks to detect specific second-order features in two-dimensional images.
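The trainable Frangi neuron itself is not reproduced here, but the classic Frangi vesselness filter it builds on is available in scikit-image; the minimal sketch below shows that baseline on a synthetic ridge image, with an arbitrary threshold chosen purely for illustration.

```python
# A minimal sketch of the classic (non-learned) Frangi vesselness filter
# that the Frangi neuron builds on, using scikit-image; the trainable
# neuron itself is not reproduced here, and the threshold is arbitrary.
import numpy as np
from skimage.filters import frangi

# Synthetic image: a bright, tilted curvilinear ridge on a dark background.
y, x = np.mgrid[0:128, 0:128]
image = np.exp(-((y - 0.5 * x - 20.0) ** 2) / 8.0)

# Hessian eigenvalues at several scales are combined into a "vesselness"
# score that responds to tube-like, second-order image structures.
vesselness = frangi(image, sigmas=(1, 2, 3), black_ridges=False)
mask = vesselness > 0.5 * vesselness.max()  # crude segmentation of the ridge
```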
Yu et al. (contribution 5) integrated virtual simulation technology into fashion talent development by creating a platform for virtual clothing design. The platform employs three-dimensional models to simulate changes in clothing structure on human body models, helping designers create realistic designs. They also developed a 3D-scanning garment generation and editing system, together with algorithms for three-dimensional garment simulation and offline point-cloud generation. These advancements collectively enhance the design process, making it more efficient and accurate.
Wang et al. (contribution 6) proposed the Hierarchical PPCA method for learning from datasets with numerous classes without the need for hierarchical annotation. This method models image classes as Gaussian distributions using Probabilistic PCA, which significantly reduces per-observation classification time. It also includes an efficient training procedure that clusters image classes instead of features, making it suitable for unbalanced data. Experimental results demonstrate that the method can speed up classification without sacrificing accuracy, which shows great promise for large-scale image classification tasks.
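As a minimal, flat (non-hierarchical) sketch of the core idea, the snippet below models each class as a probabilistic-PCA Gaussian using scikit-learn's PCA, whose score_samples method returns log-likelihoods under the Tipping-Bishop probabilistic PCA model, and classifies a query by maximum likelihood; the paper's hierarchical clustering of classes is not reproduced, and the synthetic data are for illustration only.

```python
# A minimal, flat (non-hierarchical) sketch of the core idea: model each
# class as a probabilistic-PCA Gaussian and classify by maximum likelihood.
# The paper's hierarchical clustering of classes is not reproduced here,
# and the synthetic data and dimensions are illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_classes, dim, n_per_class = 5, 64, 100
means = 3.0 * rng.standard_normal((n_classes, dim))
train = {c: means[c] + rng.standard_normal((n_per_class, dim))
         for c in range(n_classes)}

# One PPCA model per class; PCA.score_samples returns the log-likelihood
# under the Tipping-Bishop probabilistic PCA model.
models = {c: PCA(n_components=8).fit(X) for c, X in train.items()}

def classify(x):
    scores = {c: m.score_samples(x[None, :])[0] for c, m in models.items()}
    return max(scores, key=scores.get)

query = means[3] + rng.standard_normal(dim)
print(classify(query))  # 3, with high probability
```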
Chen et al. (contribution 7) introduced GMIW-Pose for estimating the relative camera pose between two views. It combines global matching with a weighted eight-point algorithm to address the scale-ambiguity problem of pose-regression-based methods, incorporates a Transformer to enhance matching robustness in complex scenes, and uses a ConvGRU-based weight-updating module to refine the matching weights. Evaluation on the TartanAir and KITTI datasets demonstrated the effectiveness of GMIW-Pose.
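The learned matching and ConvGRU weight updates are beyond a short sketch, but the classical weighted eight-point solve that such learned weights feed into can be written compactly; the function below is a generic illustration under our own naming, not the GMIW-Pose implementation.

```python
# A minimal sketch of a weighted eight-point solve, the classical core that
# learned matching weights feed into; this is a generic illustration under
# our own naming, not the GMIW-Pose implementation.
import numpy as np

def weighted_eight_point(p1, p2, w):
    """p1, p2: (N, 2) matched pixel coords; w: (N,) weights. Returns F (3, 3)."""
    x1, y1 = p1[:, 0], p1[:, 1]
    x2, y2 = p2[:, 0], p2[:, 1]
    # Each row encodes the epipolar constraint p2^T F p1 = 0.
    A = np.stack([x2 * x1, x2 * y1, x2, y2 * x1, y2 * y1, y2,
                  x1, y1, np.ones_like(x1)], axis=1)
    A *= w[:, None]                    # down-weight unreliable correspondences
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)           # least-squares null vector
    U, S, Vt = np.linalg.svd(F)        # enforce the rank-2 constraint
    return U @ np.diag([S[0], S[1], 0.0]) @ Vt

# Toy usage: eight correspondences under a pure horizontal translation.
rng = np.random.default_rng(0)
p1 = 100.0 * rng.random((8, 2))
F = weighted_eight_point(p1, p1 + np.array([5.0, 0.0]), np.ones(8))
```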
Wang et al. (contribution 8) proposed an Adaptive Self-Distillation (ASD) module to address the intra-class gap in few-shot segmentation (FSS). They also introduced a Self-Support Background Prototype (SBP) module that learns feature comparison between irrelevant-class prototypes and query features, mitigating the impact of background features on the teacher prototypes.
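The ASD and SBP modules themselves are not reproduced here, but the prototype machinery that prototype-based FSS methods share is simple to sketch: masked average pooling of support features into a class prototype, followed by cosine similarity against query features. Everything below is a generic illustration with synthetic data.

```python
# A minimal sketch of the prototype machinery that prototype-based FSS
# methods share: masked average pooling of support features into a class
# prototype, then cosine similarity against query features. The ASD and
# SBP modules themselves are not reproduced; all data here are synthetic.
import numpy as np

def masked_average_prototype(feat, mask):
    """feat: (C, H, W) support features; mask: (H, W) binary foreground mask."""
    return (feat * mask).sum(axis=(1, 2)) / (mask.sum() + 1e-12)  # (C,)

def cosine_similarity_map(feat_q, proto):
    """feat_q: (C, H, W) query features; proto: (C,). Returns (H, W) scores."""
    norm = np.linalg.norm(feat_q, axis=0) * np.linalg.norm(proto) + 1e-12
    return np.tensordot(proto, feat_q, axes=1) / norm

rng = np.random.default_rng(0)
feat_s, feat_q = rng.standard_normal((2, 32, 16, 16))
mask = (rng.random((16, 16)) > 0.7).astype(feat_s.dtype)
similarity = cosine_similarity_map(feat_q, masked_average_prototype(feat_s, mask))
```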
Yu et al. (contribution 9) systematically reviewed the advancement of image segmentation methods, focusing on three important stages: classic segmentation, collaborative segmentation, and deep-learning-based semantic segmentation. They elaborated on the main algorithms and key techniques of each stage, comparing and summarizing the advantages and drawbacks of the different segmentation models. Finally, they discussed the applicability of these models and analyzed the main challenges and development trends in image segmentation techniques.
Orynbay et al. (contribution 10) presented a comprehensive review of image-to-audio conversion techniques, a field merging computer vision (CV) and natural language processing (NLP). They explored tasks such as generating textual descriptions of images, transforming those descriptions into auditory representations, and synthesizing speech from text. The review covered a range of approaches, from basic encoder-decoder architectures to advanced methods such as Transformers. They also discussed the generation of visual content from language descriptions and emphasized the critical role of datasets. Finally, the review highlighted future directions, including more natural audio descriptions and intuitive auditory representations of visual content through direct image-to-speech tasks.

3. Future Prospects

These studies highlight the importance of computer vision in addressing a wide range of challenges, from low-level to high-level visual tasks and from single-modal to multi-modal settings. Researchers continue to push the boundaries of the field by exploring large multimodal models to enhance performance and generalization. We expect the field to focus increasingly on multimodal approaches, particularly on enabling effective information exchange between images and text and on leveraging large pre-trained models to further advance computer vision tasks. With ongoing technological advances, we anticipate that interdisciplinary, multi-modal, and large-model research will accelerate to tackle complex real-world problems.

Acknowledgments

We thank all the authors for submitting their work to this Special Issue. We also thank all the reviewers who helped improve the quality of the papers through their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

List of Contributions

1. Liu, Z.; Li, Z.; Ao, W.; Zhang, S.; Liu, W.; He, Y. Multi-Scale Cost Attention and Adaptive Fusion Stereo Matching Network. Electronics 2023, 12, 1594.
2. Zhao, W.; Jiang, C.; An, Y.; Yan, X.; Dai, C. Study on a Low-Illumination Enhancement Method for Online Monitoring Images Considering Multiple-Exposure Image Sequence Fusion. Electronics 2023, 12, 2654.
3. Jocovic, V.; Nikolic, B.; Bacanin, N. Software System for Automatic Grading of Paper Tests. Electronics 2023, 12, 4080.
4. Hachaj, T.; Piekarczyk, M. High-Level Hessian-Based Image Processing with the Frangi Neuron. Electronics 2023, 12, 4159.
5. Yu, Q.; Zhu, G. Digital Restoration and 3D Virtual Space Display of Hakka Cardigan Based on Optimization of Numerical Algorithm. Electronics 2023, 12, 4190.
6. Wang, B.; Barbu, A. Hierarchical Classification for Large-Scale Learning. Electronics 2023, 12, 4646.
7. Chen, F.; Wu, Y.; Liao, T.; Zeng, H.; Ouyang, S.; Guan, J. GMIW-Pose: Camera Pose Estimation via Global Matching and Iterative Weighted Eight-Point Algorithm. Electronics 2023, 12, 4689.
8. Wang, J.Y.; Liu, S.K.; Guo, S.C.; Jiang, C.Y.; Zheng, W.M. PCNet: Leveraging Prototype Complementarity to Improve Prototype Affinity for Few-Shot Segmentation. Electronics 2024, 13, 142.
9. Yu, Y.; Wang, C.; Fu, Q.; Kou, R.; Huang, F.; Yang, B.; Yang, T.; Gao, M. Techniques and Challenges of Image Segmentation: A Review. Electronics 2023, 12, 1199.
10. Orynbay, L.; Razakhova, B.; Peer, P.; Meden, B.; Emeršič, Ž. Recent Advances in Synthesis and Interaction of Speech, Text, and Vision. Electronics 2024, 13, 1726.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
