subscribe to arXiv mailings

Evaluating AI Group Fairness: a Fuzzy Logic Perspective

Authors: Emmanouil Krasanakis, Symeon Papadopoulos

Abstract: Artificial intelligence systems often address fairness concerns by evaluating and mitigating measures of group discrimination, for example that indicate biases against certain genders or races. However, what constitutes group fairness depends on who is asked and the social context, whereas definitions are often relaxed to accept small deviations from the statistical constraints they set out to imp… ▽ More Artificial intelligence systems often address fairness concerns by evaluating and mitigating measures of group discrimination, for example that indicate biases against certain genders or races. However, what constitutes group fairness depends on who is asked and the social context, whereas definitions are often relaxed to accept small deviations from the statistical constraints they set out to impose. Here we decouple definitions of group fairness both from the context and from relaxation-related uncertainty by expressing them in the axiomatic system of Basic fuzzy Logic (BL) with loosely understood predicates, like encountering group members. We then evaluate the definitions in subclasses of BL, such as Product or Lukasiewicz logics. Evaluation produces continuous instead of binary truth values by choosing the logic subclass and truth values for predicates that reflect uncertain context-specific beliefs, such as stakeholder opinions gathered through questionnaires. Internally, it follows logic-specific rules to compute the truth values of definitions. We show that commonly held propositions standardize the resulting mathematical formulas and we transcribe logic and truth value choices to layperson terms, so that anyone can answer them. We also use our framework to study several literature definitions of algorithmic fairness, for which we rationalize previous expedient practices that are non-probabilistic and show how to re-interpret their formulas and parameters in new contexts. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: preprint, 32 pages, 7 figures, 2 theorems, 6 appendices

arXiv:2405.19022 [pdf, other]

Towards Standardizing AI Bias Exploration

Authors: Emmanouil Krasanakis, Symeon Papadopoulos

Abstract: Creating fair AI systems is a complex problem that involves the assessment of context-dependent bias concerns. Existing research and programming libraries express specific concerns as measures of bias that they aim to constrain or mitigate. In practice, one should explore a wide variety of (sometimes incompatible) measures before deciding which ones warrant corrective action, but their narrow scop… ▽ More Creating fair AI systems is a complex problem that involves the assessment of context-dependent bias concerns. Existing research and programming libraries express specific concerns as measures of bias that they aim to constrain or mitigate. In practice, one should explore a wide variety of (sometimes incompatible) measures before deciding which ones warrant corrective action, but their narrow scope means that most new situations can only be examined after devising new measures. In this work, we present a mathematical framework that distils literature measures of bias into building blocks, hereby facilitating new combinations to cover a wide range of fairness concerns, such as classification or recommendation differences across multiple multi-value sensitive attributes (e.g., many genders and races, and their intersections). We show how this framework generalizes existing concepts and present frequently used blocks. We provide an open-source implementation of our framework as a Python library, called FairBench, that facilitates systematic and extensible exploration of potential bias concerns. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: Workshop on AI bias: Measurements, Mitigation, Explanation Strategies (AIMMES 2024)

arXiv:2405.11320 [pdf, other]

Sampling Strategies for Mitigating Bias in Face Synthesis Methods

Authors: Emmanouil Maragkoudakis, Symeon Papadopoulos, Iraklis Varlamis, Christos Diou

Abstract: Synthetically generated images can be used to create media content or to complement datasets for training image analysis models. Several methods have recently been proposed for the synthesis of high-fidelity face images; however, the potential biases introduced by such methods have not been sufficiently addressed. This paper examines the bias introduced by the widely popular StyleGAN2 generative m… ▽ More Synthetically generated images can be used to create media content or to complement datasets for training image analysis models. Several methods have recently been proposed for the synthesis of high-fidelity face images; however, the potential biases introduced by such methods have not been sufficiently addressed. This paper examines the bias introduced by the widely popular StyleGAN2 generative model trained on the Flickr Faces HQ dataset and proposes two sampling strategies to balance the representation of selected attributes in the generated face images. We focus on two protected attributes, gender and age, and reveal that biases arise in the distribution of randomly sampled images against very young and very old age groups, as well as against female faces. These biases are also assessed for different image quality levels based on the GIQA score. To mitigate bias, we propose two alternative methods for sampling on selected lines or spheres of the latent space to increase the number of generated samples from the under-represented classes. The experimental results show a decrease in bias against underrepresented groups and a more uniform distribution of the protected features at different levels of image quality. △ Less

Submitted 18 May, 2024; originally announced May 2024.

Comments: Accepted to the BIAS 2023 ECML-PKDD Workshop

MSC Class: 68T99 ACM Class: I.2; I.5

arXiv:2404.18971 [pdf, other]

doi 10.1145/3643491.3660278

Credible, Unreliable or Leaked?: Evidence Verification for Enhanced Automated Fact-checking

Authors: Zacharias Chrysidis, Stefanos-Iordanis Papadopoulos, Symeon Papadopoulos, Panagiotis C. Petrantonakis

Abstract: Automated fact-checking (AFC) is garnering increasing attention by researchers aiming to help fact-checkers combat the increasing spread of misinformation online. While many existing AFC methods incorporate external information from the Web to help examine the veracity of claims, they often overlook the importance of verifying the source and quality of collected "evidence". One overlooked challeng… ▽ More Automated fact-checking (AFC) is garnering increasing attention by researchers aiming to help fact-checkers combat the increasing spread of misinformation online. While many existing AFC methods incorporate external information from the Web to help examine the veracity of claims, they often overlook the importance of verifying the source and quality of collected "evidence". One overlooked challenge involves the reliance on "leaked evidence", information gathered directly from fact-checking websites and used to train AFC systems, resulting in an unrealistic setting for early misinformation detection. Similarly, the inclusion of information from unreliable sources can undermine the effectiveness of AFC systems. To address these challenges, we present a comprehensive approach to evidence verification and filtering. We create the "CREDible, Unreliable or LEaked" (CREDULE) dataset, which consists of 91,632 articles classified as Credible, Unreliable and Fact checked (Leaked). Additionally, we introduce the EVidence VERification Network (EVVER-Net), trained on CREDULE to detect leaked and unreliable evidence in both short and long texts. EVVER-Net can be used to filter evidence collected from the Web, thus enhancing the robustness of end-to-end AFC systems. We experiment with various language models and show that EVVER-Net can demonstrate impressive performance of up to 91.5% and 94.4% accuracy, while leveraging domain credibility scores along with short or long texts, respectively. Finally, we assess the evidence provided by widely-used fact-checking datasets including LIAR-PLUS, MOCHEG, FACTIFY, NewsCLIPpings+ and VERITE, some of which exhibit concerning rates of leaked and unreliable evidence. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18649 [pdf, other]

Towards Quantitative Evaluation of Explainable AI Methods for Deepfake Detection

Authors: Konstantinos Tsigos, Evlampios Apostolidis, Spyridon Baxevanakis, Symeon Papadopoulos, Vasileios Mezaris

Abstract: In this paper we propose a new framework for evaluating the performance of explanation methods on the decisions of a deepfake detector. This framework assesses the ability of an explanation method to spot the regions of a fake image with the biggest influence on the decision of the deepfake detector, by examining the extent to which these regions can be modified through a set of adversarial attack… ▽ More In this paper we propose a new framework for evaluating the performance of explanation methods on the decisions of a deepfake detector. This framework assesses the ability of an explanation method to spot the regions of a fake image with the biggest influence on the decision of the deepfake detector, by examining the extent to which these regions can be modified through a set of adversarial attacks, in order to flip the detector's prediction or reduce its initial prediction; we anticipate a larger drop in deepfake detection accuracy and prediction, for methods that spot these regions more accurately. Based on this framework, we conduct a comparative study using a state-of-the-art model for deepfake detection that has been trained on the FaceForensics++ dataset, and five explanation methods from the literature. The findings of our quantitative and qualitative evaluations document the advanced performance of the LIME explanation method against the other compared ones, and indicate this method as the most appropriate for explaining the decisions of the utilized deepfake detector. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: Accepted for publication, 3rd ACM Int. Workshop on Multimedia AI against Disinformation (MAD'24) at ACM ICMR'24, June 10, 2024, Phuket, Thailand. This is the "accepted version"

arXiv:2404.18552 [pdf, other]

SIDBench: A Python Framework for Reliably Assessing Synthetic Image Detection Methods

Authors: Manos Schinas, Symeon Papadopoulos

Abstract: The generative AI technology offers an increasing variety of tools for generating entirely synthetic images that are increasingly indistinguishable from real ones. Unlike methods that alter portions of an image, the creation of completely synthetic images presents a unique challenge and several Synthetic Image Detection (SID) methods have recently appeared to tackle it. Yet, there is often a large… ▽ More The generative AI technology offers an increasing variety of tools for generating entirely synthetic images that are increasingly indistinguishable from real ones. Unlike methods that alter portions of an image, the creation of completely synthetic images presents a unique challenge and several Synthetic Image Detection (SID) methods have recently appeared to tackle it. Yet, there is often a large gap between experimental results on benchmark datasets and the performance of methods in the wild. To better address the evaluation needs of SID and help close this gap, this paper introduces a benchmarking framework that integrates several state-of-the-art SID models. Our selection of integrated models was based on the utilization of varied input features, and different network architectures, aiming to encompass a broad spectrum of techniques. The framework leverages recent datasets with a diverse set of generative models, high level of photo-realism and resolution, reflecting the rapid improvements in image synthesis technology. Additionally, the framework enables the study of how image transformations, common in assets shared online, such as JPEG compression, affect detection performance. SIDBench is available on https://github.com/mever-team/sidbench and is designed in a modular manner to enable easy inclusion of new datasets and SID models. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.17255 [pdf, other]

SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes

Authors: Georgia Baltsou, Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos

Abstract: AI systems rely on extensive training on large datasets to address various tasks. However, image-based systems, particularly those used for demographic attribute prediction, face significant challenges. Many current face image datasets primarily focus on demographic factors such as age, gender, and skin tone, overlooking other crucial facial attributes like hairstyle and accessories. This narrow f… ▽ More AI systems rely on extensive training on large datasets to address various tasks. However, image-based systems, particularly those used for demographic attribute prediction, face significant challenges. Many current face image datasets primarily focus on demographic factors such as age, gender, and skin tone, overlooking other crucial facial attributes like hairstyle and accessories. This narrow focus limits the diversity of the data and consequently the robustness of AI systems trained on them. This work aims to address this limitation by proposing a methodology for generating synthetic face image datasets that capture a broader spectrum of facial diversity. Specifically, our approach integrates a systematic prompt formulation strategy, encompassing not only demographics and biometrics but also non-permanent traits like make-up, hairstyle, and accessories. These prompts guide a state-of-the-art text-to-image model in generating a comprehensive dataset of high-quality realistic images and can be used as an evaluation set in face analysis systems. Compared to existing datasets, our proposed dataset proves equally or more challenging in image classification tasks while being much smaller in size. △ Less

Submitted 29 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.15385 [pdf, ps, other]

Sum of Group Error Differences: A Critical Examination of Bias Evaluation in Biometric Verification and a Dual-Metric Measure

Authors: Alaa Elobaid, Nathan Ramoly, Lara Younes, Symeon Papadopoulos, Eirini Ntoutsi, Ioannis Kompatsiaris

Abstract: Biometric Verification (BV) systems often exhibit accuracy disparities across different demographic groups, leading to biases in BV applications. Assessing and quantifying these biases is essential for ensuring the fairness of BV systems. However, existing bias evaluation metrics in BV have limitations, such as focusing exclusively on match or non-match error rates, overlooking bias on demographic… ▽ More Biometric Verification (BV) systems often exhibit accuracy disparities across different demographic groups, leading to biases in BV applications. Assessing and quantifying these biases is essential for ensuring the fairness of BV systems. However, existing bias evaluation metrics in BV have limitations, such as focusing exclusively on match or non-match error rates, overlooking bias on demographic groups with performance levels falling between the best and worst performance levels, and neglecting the magnitude of the bias present. This paper presents an in-depth analysis of the limitations of current bias evaluation metrics in BV and, through experimental analysis, demonstrates their contextual suitability, merits, and limitations. Additionally, it introduces a novel general-purpose bias evaluation measure for BV, the ``Sum of Group Error Differences (SEDG)''. Our experimental results on controlled synthetic datasets demonstrate the effectiveness of demographic bias quantification when using existing metrics and our own proposed measure. We discuss the applicability of the bias evaluation metrics in a set of simulated demographic bias scenarios and provide scenario-based metric recommendations. Our code is publicly available under \url{https://github.com/alaaobeid/SEDG}. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2403.12229 [pdf, other]

Fusion Transformer with Object Mask Guidance for Image Forgery Analysis

Authors: Dimitrios Karageorgiou, Giorgos Kordopatis-Zilos, Symeon Papadopoulos

Abstract: In this work, we introduce OMG-Fuser, a fusion transformer-based network designed to extract information from various forensic signals to enable robust image forgery detection and localization. Our approach can operate with an arbitrary number of forensic signals and leverages object information for their analysis -- unlike previous methods that rely on fusion schemes with few signals and often di… ▽ More In this work, we introduce OMG-Fuser, a fusion transformer-based network designed to extract information from various forensic signals to enable robust image forgery detection and localization. Our approach can operate with an arbitrary number of forensic signals and leverages object information for their analysis -- unlike previous methods that rely on fusion schemes with few signals and often disregard image semantics. To this end, we design a forensic signal stream composed of a transformer guided by an object attention mechanism, associating patches that depict the same objects. In that way, we incorporate object-level information from the image. Each forensic signal is processed by a different stream that adapts to its peculiarities. A token fusion transformer efficiently aggregates the outputs of an arbitrary number of network streams and generates a fused representation for each image patch. We assess two fusion variants on top of the proposed approach: (i) score-level fusion that fuses the outputs of multiple image forensics algorithms and (ii) feature-level fusion that fuses low-level forensic traces directly. Both variants exceed state-of-the-art performance on seven datasets for image forgery detection and localization, with a relative average improvement of 12.1% and 20.4% in terms of F1. Our model is robust against traditional and novel forgery attacks and can be expanded with new signals without training from scratch. Our code is publicly available at: https://github.com/mever-team/omgfuser △ Less

Submitted 27 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2402.19091 [pdf, other]

Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection

Authors: Christos Koutlis, Symeon Papadopoulos

Abstract: The recently developed and publicly available synthetic image generation methods and services make it possible to create extremely realistic imagery on demand, raising great risks for the integrity and safety of online information. State-of-the-art Synthetic Image Detection (SID) research has led to strong evidence on the advantages of feature extraction from foundation models. However, such extra… ▽ More The recently developed and publicly available synthetic image generation methods and services make it possible to create extremely realistic imagery on demand, raising great risks for the integrity and safety of online information. State-of-the-art Synthetic Image Detection (SID) research has led to strong evidence on the advantages of feature extraction from foundation models. However, such extracted features mostly encapsulate high-level visual semantics instead of fine-grained details, which are more important for the SID task. On the contrary, shallow layers encode low-level visual information. In this work, we leverage the image representations extracted by intermediate Transformer blocks of CLIP's image-encoder via a lightweight network that maps them to a learnable forgery-aware vector space capable of generalizing exceptionally well. We also employ a trainable module to incorporate the importance of each Transformer block to the final prediction. Our method is compared against the state-of-the-art by evaluating it on 20 test datasets and exhibits an average +10.6% absolute performance improvement. Notably, the best performing models require just a single epoch for training (~8 minutes). Code available at https://github.com/mever-team/rine. △ Less

Submitted 8 July, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: ECCV 2024

arXiv:2311.10476 [pdf, other]

FRCSyn Challenge at WACV 2024:Face Recognition Challenge in the Era of Synthetic Data

Authors: Pietro Melzi, Ruben Tolosana, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Ivan DeAndres-Tame, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Weisong Zhao, Xiangyu Zhu, Zheyu Yan, Xiao-Yu Zhang, Jinlin Wu, Zhen Lei, Suvidha Tripathi, Mahak Kothari, Md Haider Zama, Debayan Deb, Bernardo Biesseck, Pedro Vidal, Roger Granada, Guilherme Fickel, Gustavo Führ , et al. (22 additional authors not shown)

Abstract: Despite the widespread adoption of face recognition technology around the world, and its remarkable performance on current benchmarks, there are still several challenges that must be covered in more detail. This paper offers an overview of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first international challenge aiming to explore the use… ▽ More Despite the widespread adoption of face recognition technology around the world, and its remarkable performance on current benchmarks, there are still several challenges that must be covered in more detail. This paper offers an overview of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first international challenge aiming to explore the use of synthetic data in face recognition to address existing limitations in the technology. Specifically, the FRCSyn Challenge targets concerns related to data privacy issues, demographic biases, generalization to unseen scenarios, and performance limitations in challenging scenarios, including significant age disparities between enrollment and testing, pose variations, and occlusions. The results achieved in the FRCSyn Challenge, together with the proposed benchmark, contribute significantly to the application of synthetic data to improve face recognition technology. △ Less

Submitted 17 November, 2023; originally announced November 2023.

Comments: 10 pages, 1 figure, WACV 2024 Workshops

arXiv:2311.09939 [pdf, other]

RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection

Authors: Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Panagiotis C. Petrantonakis

Abstract: Online misinformation is often multimodal in nature, i.e., it is caused by misleading associations between texts and accompanying images. To support the fact-checking process, researchers have been recently developing automatic multimodal methods that gather and analyze external information, evidence, related to the image-text pairs under examination. However, prior works assumed all external info… ▽ More Online misinformation is often multimodal in nature, i.e., it is caused by misleading associations between texts and accompanying images. To support the fact-checking process, researchers have been recently developing automatic multimodal methods that gather and analyze external information, evidence, related to the image-text pairs under examination. However, prior works assumed all external information collected from the web to be relevant. In this study, we introduce a "Relevant Evidence Detection" (RED) module to discern whether each piece of evidence is relevant, to support or refute the claim. Specifically, we develop the "Relevant Evidence Detection Directed Transformer" (RED-DOT) and explore multiple architectural variants (e.g., single or dual-stage) and mechanisms (e.g., "guided attention"). Extensive ablation and comparative experiments demonstrate that RED-DOT achieves significant improvements over the state-of-the-art (SotA) on the VERITE benchmark by up to 33.7%. Furthermore, our evidence re-ranking and element-wise modality fusion led to RED-DOT surpassing the SotA on NewsCLIPings+ by up to 3% without the need for numerous evidence or multiple backbone encoders. We release our code at: https://github.com/stevejpapad/relevant-evidence-detection △ Less

Submitted 7 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2310.13746 [pdf, other]

FairBranch: Fairness Conflict Correction on Task-group Branches for Fair Multi-Task Learning

Authors: Arjun Roy, Christos Koutlis, Symeon Papadopoulos, Eirini Ntoutsi

Abstract: The generalization capacity of Multi-Task Learning (MTL) becomes limited when unrelated tasks negatively impact each other by updating shared parameters with conflicting gradients, resulting in negative transfer and a reduction in MTL accuracy compared to single-task learning (STL). Recently, there has been an increasing focus on the fairness of MTL models, necessitating the optimization of both a… ▽ More The generalization capacity of Multi-Task Learning (MTL) becomes limited when unrelated tasks negatively impact each other by updating shared parameters with conflicting gradients, resulting in negative transfer and a reduction in MTL accuracy compared to single-task learning (STL). Recently, there has been an increasing focus on the fairness of MTL models, necessitating the optimization of both accuracy and fairness for individual tasks. Similarly to how negative transfer affects accuracy, task-specific fairness considerations can adversely influence the fairness of other tasks when there is a conflict of fairness loss gradients among jointly learned tasks, termed bias transfer. To address both negative and bias transfer in MTL, we introduce a novel method called FairBranch. FairBranch branches the MTL model by assessing the similarity of learned parameters, grouping related tasks to mitigate negative transfer. Additionally, it incorporates fairness loss gradient conflict correction between adjoining task-group branches to address bias transfer within these task groups. Our experiments in tabular and visual MTL problems demonstrate that FairBranch surpasses state-of-the-art MTL methods in terms of both fairness and accuracy. △ Less

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2307.10334 [pdf, other]

Mitigating Viewer Impact from Disturbing Imagery using AI Filters: A User-Study

Authors: Ioannis Sarridis, Jochen Spangenberg, Olga Papadopoulou, Symeon Papadopoulos

Abstract: Exposure to disturbing imagery can significantly impact individuals, especially professionals who encounter such content as part of their work. This paper presents a user study, involving 107 participants, predominantly journalists and human rights investigators, that explores the capability of Artificial Intelligence (AI)-based image filters to potentially mitigate the emotional impact of viewing… ▽ More Exposure to disturbing imagery can significantly impact individuals, especially professionals who encounter such content as part of their work. This paper presents a user study, involving 107 participants, predominantly journalists and human rights investigators, that explores the capability of Artificial Intelligence (AI)-based image filters to potentially mitigate the emotional impact of viewing such disturbing content. We tested five different filter styles, both traditional (Blurring and Partial Blurring) and AI-based (Drawing, Colored Drawing, and Painting), and measured their effectiveness in terms of conveying image information while reducing emotional distress. Our findings suggest that the AI-based Drawing style filter demonstrates the best performance, offering a promising solution for reducing negative feelings (-30.38%) while preserving the interpretability of the image (97.19%). Despite the requirement for many professionals to eventually inspect the original images, participants suggested potential strategies for integrating AI filters into their workflow, such as using AI filters as an initial, preparatory step before viewing the original image. Overall, this paper contributes to the development of a more ethically considerate and effective visual environment for professionals routinely engaging with potentially disturbing imagery. △ Less

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2307.10011 [pdf, other]

Towards Fair Face Verification: An In-depth Analysis of Demographic Biases

Authors: Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou

Abstract: Deep learning-based person identification and verification systems have remarkably improved in terms of accuracy in recent years; however, such systems, including widely popular cloud-based solutions, have been found to exhibit significant biases related to race, age, and gender, a problem that requires in-depth exploration and solutions. This paper presents an in-depth analysis, with a particular… ▽ More Deep learning-based person identification and verification systems have remarkably improved in terms of accuracy in recent years; however, such systems, including widely popular cloud-based solutions, have been found to exhibit significant biases related to race, age, and gender, a problem that requires in-depth exploration and solutions. This paper presents an in-depth analysis, with a particular emphasis on the intersectionality of these demographic factors. Intersectional bias refers to the performance discrepancies w.r.t. the different combinations of race, age, and gender groups, an area relatively unexplored in current literature. Furthermore, the reliance of most state-of-the-art approaches on accuracy as the principal evaluation metric often masks significant demographic disparities in performance. To counter this crucial limitation, we incorporate five additional metrics in our quantitative analysis, including disparate impact and mistreatment metrics, which are typically ignored by the relevant fairness-aware approaches. Results on the Racial Faces in-the-Wild (RFW) benchmark indicate pervasive biases in face recognition systems, extending beyond race, with different demographic factors yielding significantly disparate outcomes. In particular, Africans demonstrate an 11.25% lower True Positive Rate (TPR) compared to Caucasians, while only a 3.51% accuracy drop is observed. Even more concerning, the intersections of multiple protected groups, such as African females over 60 years old, demonstrate a +39.89% disparate mistreatment rate compared to the highest Caucasians rate. By shedding light on these biases and their implications, this paper aims to stimulate further research towards developing fairer, more equitable face recognition and verification systems. △ Less

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2306.09489 [pdf, other]

The 2023 Video Similarity Dataset and Challenge

Authors: Ed Pizzi, Giorgos Kordopatis-Zilos, Hiral Patel, Gheorghe Postelnicu, Sugosh Nagavara Ravindra, Akshay Gupta, Symeon Papadopoulos, Giorgos Tolias, Matthijs Douze

Abstract: This work introduces a dataset, benchmark, and challenge for the problem of video copy detection and localization. The problem comprises two distinct but related tasks: determining whether a query video shares content with a reference video ("detection"), and additionally temporally localizing the shared content within each video ("localization"). The benchmark is designed to evaluate methods on t… ▽ More This work introduces a dataset, benchmark, and challenge for the problem of video copy detection and localization. The problem comprises two distinct but related tasks: determining whether a query video shares content with a reference video ("detection"), and additionally temporally localizing the shared content within each video ("localization"). The benchmark is designed to evaluate methods on these two tasks, and simulates a realistic needle-in-haystack setting, where the majority of both query and reference videos are "distractors" containing no copied content. We propose a metric that reflects both detection and localization accuracy. The associated challenge consists of two corresponding tracks, each with restrictions that reflect real-world settings. We provide implementation code for evaluation and baselines. We also analyze the results and methods of the top submissions to the challenge. The dataset, baseline methods and evaluation code is publicly available and will be discussed at a dedicated CVPR'23 workshop. △ Less

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2304.14252 [pdf, other]

FLAC: Fairness-Aware Representation Learning by Suppressing Attribute-Class Associations

Authors: Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou

Abstract: Bias in computer vision systems can perpetuate or even amplify discrimination against certain populations. Considering that bias is often introduced by biased visual datasets, many recent research efforts focus on training fair models using such data. However, most of them heavily rely on the availability of protected attribute labels in the dataset, which limits their applicability, while label-u… ▽ More Bias in computer vision systems can perpetuate or even amplify discrimination against certain populations. Considering that bias is often introduced by biased visual datasets, many recent research efforts focus on training fair models using such data. However, most of them heavily rely on the availability of protected attribute labels in the dataset, which limits their applicability, while label-unaware approaches, i.e., approaches operating without such labels, exhibit considerably lower performance. To overcome these limitations, this work introduces FLAC, a methodology that minimizes mutual information between the features extracted by the model and a protected attribute, without the use of attribute labels. To do that, FLAC proposes a sampling strategy that highlights underrepresented samples in the dataset, and casts the problem of learning fair representations as a probability matching problem that leverages representations extracted by a bias-capturing classifier. It is theoretically shown that FLAC can indeed lead to fair representations, that are independent of the protected attributes. FLAC surpasses the current state-of-the-art on Biased MNIST, CelebA, and UTKFace, by 29.1%, 18.1%, and 21.9%, respectively. Additionally, FLAC exhibits 2.2% increased accuracy on ImageNet-A consisting of the most challenging samples of ImageNet. Finally, in most experiments, FLAC even outperforms the bias label-aware state-of-the-art methods. △ Less

Submitted 27 April, 2023; originally announced April 2023.

arXiv:2304.14133 [pdf, other]

VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias

Authors: Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Panagiotis C. Petrantonakis

Abstract: Multimedia content has become ubiquitous on social media platforms, leading to the rise of multimodal misinformation (MM) and the urgent need for effective strategies to detect and prevent its spread. In recent years, the challenge of multimodal misinformation detection (MMD) has garnered significant attention by researchers and has mainly involved the creation of annotated, weakly annotated, or s… ▽ More Multimedia content has become ubiquitous on social media platforms, leading to the rise of multimodal misinformation (MM) and the urgent need for effective strategies to detect and prevent its spread. In recent years, the challenge of multimodal misinformation detection (MMD) has garnered significant attention by researchers and has mainly involved the creation of annotated, weakly annotated, or synthetically generated training datasets, along with the development of various deep learning MMD models. However, the problem of unimodal bias has been overlooked, where specific patterns and biases in MMD benchmarks can result in biased or unimodal models outperforming their multimodal counterparts on an inherently multimodal task; making it difficult to assess progress. In this study, we systematically investigate and identify the presence of unimodal bias in widely-used MMD benchmarks, namely VMU-Twitter and COSMOS. To address this issue, we introduce the "VERification of Image-TExt pairs" (VERITE) benchmark for MMD which incorporates real-world data, excludes "asymmetric multimodal misinformation" and utilizes "modality balancing". We conduct an extensive comparative study with a Transformer-based architecture that shows the ability of VERITE to effectively address unimodal bias, rendering it a robust evaluation framework for MMD. Furthermore, we introduce a new method -- termed Crossmodal HArd Synthetic MisAlignment (CHASMA) -- for generating realistic synthetic training data that preserve crossmodal relations between legitimate images and false human-written captions. By leveraging CHASMA in the training process, we observe consistent and notable improvements in predictive performance on VERITE; with a 9.2% increase in accuracy. We release our code at: https://github.com/stevejpapad/image-text-verification △ Less

Submitted 18 October, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

arXiv:2304.12053 [pdf, other]

doi 10.1145/3592572.3592846

Improving Synthetically Generated Image Detection in Cross-Concept Settings

Authors: Pantelis Dogoulis, Giorgos Kordopatis-Zilos, Ioannis Kompatsiaris, Symeon Papadopoulos

Abstract: New advancements for the detection of synthetic images are critical for fighting disinformation, as the capabilities of generative AI models continuously evolve and can lead to hyper-realistic synthetic imagery at unprecedented scale and speed. In this paper, we focus on the challenge of generalizing across different concept classes, e.g., when training a detector on human faces and testing on syn… ▽ More New advancements for the detection of synthetic images are critical for fighting disinformation, as the capabilities of generative AI models continuously evolve and can lead to hyper-realistic synthetic imagery at unprecedented scale and speed. In this paper, we focus on the challenge of generalizing across different concept classes, e.g., when training a detector on human faces and testing on synthetic animal images - highlighting the ineffectiveness of existing approaches that randomly sample generated images to train their models. By contrast, we propose an approach based on the premise that the robustness of the detector can be enhanced by training it on realistic synthetic images that are selected based on their quality scores according to a probabilistic quality estimation model. We demonstrate the effectiveness of the proposed approach by conducting experiments with generated images from two seminal architectures, StyleGAN2 and Latent Diffusion, and using three different concepts for each, so as to measure the cross-concept generalization ability. Our results show that our quality-based sampling method leads to higher detection performance for nearly all concepts, improving the overall effectiveness of the synthetic image detectors. △ Less

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2304.07395 [pdf, other]

Investigation of ensemble methods for the detection of deepfake face manipulations

Authors: Nikolaos Giatsoglou, Symeon Papadopoulos, Ioannis Kompatsiaris

Abstract: The recent wave of AI research has enabled a new brand of synthetic media, called deepfakes. Deepfakes have impressive photorealism, which has generated exciting new use cases but also raised serious threats to our increasingly digital world. To mitigate these threats, researchers have tried to come up with new methods for deepfake detection that are more effective than traditional forensics and h… ▽ More The recent wave of AI research has enabled a new brand of synthetic media, called deepfakes. Deepfakes have impressive photorealism, which has generated exciting new use cases but also raised serious threats to our increasingly digital world. To mitigate these threats, researchers have tried to come up with new methods for deepfake detection that are more effective than traditional forensics and heavily rely on deep AI technology. In this paper, following up on encouraging prior work for deepfake detection with attribution and ensemble techniques, we explore and compare multiple designs for ensemble detectors. The goal is to achieve robustness and good generalization ability by leveraging ensembles of models that specialize in different manipulation categories. Our results corroborate that ensembles can achieve higher accuracy than individual models when properly tuned, while the generalization ability relies on access to a large number of training data for a diverse set of known manipulations. △ Less

Submitted 14 April, 2023; originally announced April 2023.

Comments: 16 pages, 4 figures

arXiv:2304.03378 [pdf, other]

Self-Supervised Video Similarity Learning

Authors: Giorgos Kordopatis-Zilos, Giorgos Tolias, Christos Tzelepis, Ioannis Kompatsiaris, Ioannis Patras, Symeon Papadopoulos

Abstract: We introduce S$^2$VS, a video similarity learning approach with self-supervision. Self-Supervised Learning (SSL) is typically used to train deep models on a proxy task so as to have strong transferability on target tasks after fine-tuning. Here, in contrast to prior work, SSL is used to perform video similarity learning and address multiple retrieval and detection tasks at once with no use of labe… ▽ More We introduce S$^2$VS, a video similarity learning approach with self-supervision. Self-Supervised Learning (SSL) is typically used to train deep models on a proxy task so as to have strong transferability on target tasks after fine-tuning. Here, in contrast to prior work, SSL is used to perform video similarity learning and address multiple retrieval and detection tasks at once with no use of labeled data. This is achieved by learning via instance-discrimination with task-tailored augmentations and the widely used InfoNCE loss together with an additional loss operating jointly on self-similarity and hard-negative similarity. We benchmark our method on tasks where video relevance is defined with varying granularity, ranging from video copies to videos depicting the same incident or event. We learn a single universal model that achieves state-of-the-art performance on all tasks, surpassing previously proposed methods that use labeled data. The code and pretrained models are publicly available at: https://github.com/gkordo/s2vs △ Less

Submitted 16 June, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

arXiv:2304.02906 [pdf, other]

MemeFier: Dual-stage Modality Fusion for Image Meme Classification

Authors: Christos Koutlis, Manos Schinas, Symeon Papadopoulos

Abstract: Hate speech is a societal problem that has significantly grown through the Internet. New forms of digital content such as image memes have given rise to spread of hate using multimodal means, being far more difficult to analyse and detect compared to the unimodal case. Accurate automatic processing, analysis and understanding of this kind of content will facilitate the endeavor of hindering hate s… ▽ More Hate speech is a societal problem that has significantly grown through the Internet. New forms of digital content such as image memes have given rise to spread of hate using multimodal means, being far more difficult to analyse and detect compared to the unimodal case. Accurate automatic processing, analysis and understanding of this kind of content will facilitate the endeavor of hindering hate speech proliferation through the digital world. To this end, we propose MemeFier, a deep learning-based architecture for fine-grained classification of Internet image memes, utilizing a dual-stage modality fusion module. The first fusion stage produces feature vectors containing modality alignment information that captures non-trivial connections between the text and image of a meme. The second fusion stage leverages the power of a Transformer encoder to learn inter-modality correlations at the token level and yield an informative representation. Additionally, we consider external knowledge as an additional input, and background image caption supervision as a regularizing component. Extensive experiments on three widely adopted benchmarks, i.e., Facebook Hateful Memes, Memotion7k and MultiOFF, indicate that our approach competes and in some cases surpasses state-of-the-art. Our code is available on https://github.com/ckoutlis/memefier. △ Less

Submitted 7 April, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: 8 pages, 2 figures, ICMR 2023

arXiv:2303.08157 [pdf, other]

Graph Neural Network Surrogates of Fair Graph Filtering

Authors: Emmanouil Krasanakis, Symeon Papadopoulos

Abstract: Graph filters that transform prior node values to posterior scores via edge propagation often support graph mining tasks affecting humans, such as recommendation and ranking. Thus, it is important to make them fair in terms of satisfying statistical parity constraints between groups of nodes (e.g., distribute score mass between genders proportionally to their representation). To achieve this while… ▽ More Graph filters that transform prior node values to posterior scores via edge propagation often support graph mining tasks affecting humans, such as recommendation and ranking. Thus, it is important to make them fair in terms of satisfying statistical parity constraints between groups of nodes (e.g., distribute score mass between genders proportionally to their representation). To achieve this while minimally perturbing the original posteriors, we introduce a filter-aware universal approximation framework for posterior objectives. This defines appropriate graph neural networks trained at runtime to be similar to filters but also locally optimize a large class of objectives, including fairness-aware ones. Experiments on a collection of 8 filters and 5 graphs show that our approach performs equally well or better than alternatives in meeting parity constraints while preserving the AUC of score-based community member recommendation and creating minimal utility loss in prior diffusion. △ Less

Submitted 16 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: 40 pages, 5 figures, 5 papers

arXiv:2303.01544 [pdf, other]

Exciton-assisted electron tunneling in van der Waals heterostructures

Authors: Lujun Wang, Sotirios Papadopoulos, Fadil Iyikanat, Jian Zhang, Jing Huang, Kenji Watanabe, Takashi Taniguchi, Michel Calame, Mickael L. Perrin, F. Javier García de Abajo, Lukas Novotny

Abstract: The control of elastic and inelastic electron tunneling relies on materials with well defined interfaces. Van der Waals materials made of two-dimensional constituents form an ideal platform for such studies. Signatures of acoustic phonons and defect states have been observed in current-to-voltage ($I-V$) measurements. These features can be explained by direct electron-phonon or electron-defect int… ▽ More The control of elastic and inelastic electron tunneling relies on materials with well defined interfaces. Van der Waals materials made of two-dimensional constituents form an ideal platform for such studies. Signatures of acoustic phonons and defect states have been observed in current-to-voltage ($I-V$) measurements. These features can be explained by direct electron-phonon or electron-defect interactions. Here, we use a novel tunneling process that involves excitons in transition metal dichalcogenides (TMDs). We study tunnel junctions consisting of graphene and gold electrodes separated by hexagonal boron nitride (hBN) with an adjacent TMD monolayer and observe prominent resonant features in $I-V$ measurements. These resonances appear at bias voltages that correspond to TMD exciton energies. By placing the TMD outside of the tunneling pathway, we demonstrate that this phonon-exciton mediated tunneling process does not require any charge injection into the TMD. This work demonstrates the appearance of optical modes in electrical transport measurements and introduces a new functionality for optoelectronic devices based on van der Waals materials. △ Less

Submitted 2 March, 2023; originally announced March 2023.

Comments: 26 pages, 23 figures, 77 references

arXiv:2303.01217 [pdf, other]

Synthetic Misinformers: Generating and Combating Multimodal Misinformation

Authors: Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Panagiotis C. Petrantonakis

Abstract: With the expansion of social media and the increasing dissemination of multimedia content, the spread of misinformation has become a major concern. This necessitates effective strategies for multimodal misinformation detection (MMD) that detect whether the combination of an image and its accompanying text could mislead or misinform. Due to the data-intensive nature of deep neural networks and the… ▽ More With the expansion of social media and the increasing dissemination of multimedia content, the spread of misinformation has become a major concern. This necessitates effective strategies for multimodal misinformation detection (MMD) that detect whether the combination of an image and its accompanying text could mislead or misinform. Due to the data-intensive nature of deep neural networks and the labor-intensive process of manual annotation, researchers have been exploring various methods for automatically generating synthetic multimodal misinformation - which we refer to as Synthetic Misinformers - in order to train MMD models. However, limited evaluation on real-world misinformation and a lack of comparisons with other Synthetic Misinformers makes difficult to assess progress in the field. To address this, we perform a comparative study on existing and new Synthetic Misinformers that involves (1) out-of-context (OOC) image-caption pairs, (2) cross-modal named entity inconsistency (NEI) as well as (3) hybrid approaches and we evaluate them against real-world misinformation; using the COSMOS benchmark. The comparative study showed that our proposed CLIP-based Named Entity Swapping can lead to MMD models that surpass other OOC and NEI Misinformers in terms of multimodal accuracy and that hybrid approaches can lead to even higher detection accuracy. Nevertheless, after alleviating information leakage from the COSMOS evaluation protocol, low Sensitivity scores indicate that the task is significantly more challenging than previous studies suggested. Finally, our findings showed that NEI-based Synthetic Misinformers tend to suffer from a unimodal bias, where text-only MMDs can outperform multimodal ones. △ Less

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2303.00363 [pdf, other]

Overbias photon emission from light-emitting devices based on monolayer transition metal dichalcogenides

Authors: Shengyu Shan, Jing Huang, Sotirios Papadopoulos, Ronja Khelifa, Takashi Taniguchi, Kenji Watanabe, Lujun Wang, Lukas Novotny

Abstract: Tunneling light-emitting devices (LEDs) based on transition metal dichalcogenides (TMDs) and other 2D materials are a new platform for on-chip optoelectronic integration. Some of the physical processes underlying this LED architecture are not fully understood, especially the emission at photon energies higher than the applied electrostatic potential, so-called overbias emission. Here we report ove… ▽ More Tunneling light-emitting devices (LEDs) based on transition metal dichalcogenides (TMDs) and other 2D materials are a new platform for on-chip optoelectronic integration. Some of the physical processes underlying this LED architecture are not fully understood, especially the emission at photon energies higher than the applied electrostatic potential, so-called overbias emission. Here we report overbias emission for potentials that are near half of the optical bandgap energy in TMD-based tunneling LEDs. We show that this emission is not thermal in nature, but consistent with exciton generation via a two-electron coherent tunneling process. △ Less

Submitted 1 March, 2023; originally announced March 2023.

Comments: 6 pages, 4 figures

arXiv:2212.07187 [pdf, other]

Design-time Fashion Popularity Forecasting in VR Environments

Authors: Stefanos-Iordanis Papadopoulos, Christos Koutlis, Anastasios Papazoglou-Chalikias, Symeon Papadopoulos, Spiros Nikolopoulos

Abstract: Being able to forecast the popularity of new garment designs is very important in an industry as fast paced as fashion, both in terms of profitability and reducing the problem of unsold inventory. Here, we attempt to address this task in order to provide informative forecasts to fashion designers within a virtual reality designer application that will allow them to fine tune their creations based… ▽ More Being able to forecast the popularity of new garment designs is very important in an industry as fast paced as fashion, both in terms of profitability and reducing the problem of unsold inventory. Here, we attempt to address this task in order to provide informative forecasts to fashion designers within a virtual reality designer application that will allow them to fine tune their creations based on current consumer preferences within an interactive and immersive environment. To achieve this we have to deal with the following central challenges: (1) the proposed method should not hinder the creative process and thus it has to rely only on the garment's visual characteristics, (2) the new garment lacks historical data from which to extrapolate their future popularity and (3) fashion trends in general are highly dynamical. To this end, we develop a computer vision pipeline fine tuned on fashion imagery in order to extract relevant visual features along with the category and attributes of the garment. We propose a hierarchical label sharing (HLS) pipeline for automatically capturing hierarchical relations among fashion categories and attributes. Moreover, we propose MuQAR, a Multimodal Quasi-AutoRegressive neural network that forecasts the popularity of new garments by combining their visual features and categorical features while an autoregressive neural network is modelling the popularity time series of the garment's category and attributes. Both the proposed HLS and MuQAR prove capable of surpassing the current state-of-the-art in key benchmark datasets, DeepFashion for image classification and VISUELLE for new garment sales forecasting. △ Less

Submitted 14 December, 2022; originally announced December 2022.

arXiv:2212.01128 [pdf, other]

A Multi-Stream Fusion Network for Image Splicing Localization

Authors: Maria Siopi, Giorgos Kordopatis-Zilos, Polychronis Charitidis, Ioannis Kompatsiaris, Symeon Papadopoulos

Abstract: In this paper, we address the problem of image splicing localization with a multi-stream network architecture that processes the raw RGB image in parallel with other handcrafted forensic signals. Unlike previous methods that either use only the RGB images or stack several signals in a channel-wise manner, we propose an encoder-decoder architecture that consists of multiple encoder streams. Each st… ▽ More In this paper, we address the problem of image splicing localization with a multi-stream network architecture that processes the raw RGB image in parallel with other handcrafted forensic signals. Unlike previous methods that either use only the RGB images or stack several signals in a channel-wise manner, we propose an encoder-decoder architecture that consists of multiple encoder streams. Each stream is fed with either the tampered image or handcrafted signals and processes them separately to capture relevant information from each one independently. Finally, the extracted features from the multiple streams are fused in the bottleneck of the architecture and propagated to the decoder network that generates the output localization map. We experiment with two handcrafted algorithms, i.e., DCT and Splicebuster. Our proposed approach is benchmarked on three public forensics datasets, demonstrating competitive performance against several competing methods and achieving state-of-the-art results, e.g., 0.898 AUC on CASIA. △ Less

Submitted 2 December, 2022; originally announced December 2022.

Comments: Accepted to the International Conference on MultiMedia Modeling (MMM 2023)

arXiv:2212.00668 [pdf, other]

Leveraging Large-scale Multimedia Datasets to Refine Content Moderation Models

Authors: Ioannis Sarridis, Christos Koutlis, Olga Papadopoulou, Symeon Papadopoulos

Abstract: The sheer volume of online user-generated content has rendered content moderation technologies essential in order to protect digital platform audiences from content that may cause anxiety, worry, or concern. Despite the efforts towards developing automated solutions to tackle this problem, creating accurate models remains challenging due to the lack of adequate task-specific training data. The fac… ▽ More The sheer volume of online user-generated content has rendered content moderation technologies essential in order to protect digital platform audiences from content that may cause anxiety, worry, or concern. Despite the efforts towards developing automated solutions to tackle this problem, creating accurate models remains challenging due to the lack of adequate task-specific training data. The fact that manually annotating such data is a highly demanding procedure that could severely affect the annotators' emotional well-being is directly related to the latter limitation. In this paper, we propose the CM-Refinery framework that leverages large-scale multimedia datasets to automatically extend initial training datasets with hard examples that can refine content moderation models, while significantly reducing the involvement of human annotators. We apply our method on two model adaptation strategies designed with respect to the different challenges observed while collecting data, i.e. lack of (i) task-specific negative data or (ii) both positive and negative data. Additionally, we introduce a diversity criterion applied to the data collection process that further enhances the generalization performance of the refined models. The proposed method is evaluated on the Not Safe for Work (NSFW) and disturbing content detection tasks on benchmark datasets achieving 1.32% and 1.94% accuracy improvements compared to the state of the art, respectively. Finally, it significantly reduces human involvement, as 92.54% of data are automatically annotated in case of disturbing content while no human intervention is required for the NSFW task. △ Less

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.10996 [pdf, other]

MINTIME: Multi-Identity Size-Invariant Video Deepfake Detection

Authors: Davide Alessandro Coccomini, Giorgos Kordopatis Zilos, Giuseppe Amato, Roberto Caldelli, Fabrizio Falchi, Symeon Papadopoulos, Claudio Gennaro

Abstract: In this paper, we introduce MINTIME, a video deepfake detection approach that captures spatial and temporal anomalies and handles instances of multiple people in the same video and variations in face sizes. Previous approaches disregard such information either by using simple a-posteriori aggregation schemes, i.e., average or max operation, or using only one identity for the inference, i.e., the l… ▽ More In this paper, we introduce MINTIME, a video deepfake detection approach that captures spatial and temporal anomalies and handles instances of multiple people in the same video and variations in face sizes. Previous approaches disregard such information either by using simple a-posteriori aggregation schemes, i.e., average or max operation, or using only one identity for the inference, i.e., the largest one. On the contrary, the proposed approach builds on a Spatio-Temporal TimeSformer combined with a Convolutional Neural Network backbone to capture spatio-temporal anomalies from the face sequences of multiple identities depicted in a video. This is achieved through an Identity-aware Attention mechanism that attends to each face sequence independently based on a masking operation and facilitates video-level aggregation. In addition, two novel embeddings are employed: (i) the Temporal Coherent Positional Embedding that encodes each face sequence's temporal information and (ii) the Size Embedding that encodes the size of the faces as a ratio to the video frame size. These extensions allow our system to adapt particularly well in the wild by learning how to aggregate information of multiple identities, which is usually disregarded by other methods in the literature. It achieves state-of-the-art results on the ForgeryNet dataset with an improvement of up to 14% AUC in videos containing multiple people and demonstrates ample generalization capabilities in cross-forgery and cross-dataset settings. The code is publicly available at https://github.com/davide-coccomini/MINTIME-Multi-Identity-size-iNvariant-TIMEsformer-for-Video-Deepfake-Detection. △ Less

Submitted 7 December, 2022; v1 submitted 20 November, 2022; originally announced November 2022.

arXiv:2209.11641 [pdf, other]

Energy transfer from tunneling electrons to excitons

Authors: Sotirios Papadopoulos, Lujun Wang, Takashi Taniguchi, Kenji Watanabe, Lukas Novotny

Abstract: Excitons in optoelectronic devices have been generated through optical excitation, external carrier injection, or employing pre-existing charges. Here, we reveal a new way to electrically generate excitons in transition metal dichalcogenides (TMDs). The TMD is placed on top of a gold-hBN-graphene tunnel junction, outside of the tunneling pathway. This electrically driven device features a photoemi… ▽ More Excitons in optoelectronic devices have been generated through optical excitation, external carrier injection, or employing pre-existing charges. Here, we reveal a new way to electrically generate excitons in transition metal dichalcogenides (TMDs). The TMD is placed on top of a gold-hBN-graphene tunnel junction, outside of the tunneling pathway. This electrically driven device features a photoemission spectrum with a distinct peak at the exciton energy of the TMD. We interpret this observation as exciton generation by energy transfer from tunneling electrons, which is further supported by a theoretical model based on inelastic electron tunneling. Our findings introduce a new paradigm for exciton creation in van der Waals heterostructures and provide inspiration for a new class of optoelectronic devices in which the optically active material is separated from the electrical pathway. △ Less

Submitted 30 September, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

arXiv:2209.08309 [pdf, other]

AdaCC: Cumulative Cost-Sensitive Boosting for Imbalanced Classification

Authors: Vasileios Iosifidis, Symeon Papadopoulos, Bodo Rosenhahn, Eirini Ntoutsi

Abstract: Class imbalance poses a major challenge for machine learning as most supervised learning models might exhibit bias towards the majority class and under-perform in the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, formulated typically via a user-defined fixed misclassification cost matrix provided as input to the learner. Such parameter tuning is… ▽ More Class imbalance poses a major challenge for machine learning as most supervised learning models might exhibit bias towards the majority class and under-perform in the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, formulated typically via a user-defined fixed misclassification cost matrix provided as input to the learner. Such parameter tuning is a challenging task that requires domain knowledge and moreover, wrong adjustments might lead to overall predictive performance deterioration. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to model's performance instead of using a fixed misclassification cost matrix. Our method, called AdaCC, is parameter-free as it relies on the cumulative behavior of the boosting model in order to adjust the misclassification costs for the next boosting round and comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art cost-sensitive boosting approaches exhibiting consistent improvements in different measures, for instance, in the range of [0.3%-28.56%] for AUC, [3.4%-21.4%] for balanced accuracy, [4.8%-45%] for gmean and [7.4%-85.5%] for recall. △ Less

Submitted 17 September, 2022; originally announced September 2022.

Comments: 30 pages

arXiv:2207.13458 [pdf, other]

VICTOR: Visual Incompatibility Detection with Transformers and Fashion-specific contrastive pre-training

Authors: Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Ioannis Kompatsiaris

Abstract: For fashion outfits to be considered aesthetically pleasing, the garments that constitute them need to be compatible in terms of visual aspects, such as style, category and color. Previous works have defined visual compatibility as a binary classification task with items in a garment being considered as fully compatible or fully incompatible. However, this is not applicable to Outfit Maker applica… ▽ More For fashion outfits to be considered aesthetically pleasing, the garments that constitute them need to be compatible in terms of visual aspects, such as style, category and color. Previous works have defined visual compatibility as a binary classification task with items in a garment being considered as fully compatible or fully incompatible. However, this is not applicable to Outfit Maker applications where users create their own outfits and need to know which specific items may be incompatible with the rest of the outfit. To address this, we propose the Visual InCompatibility TransfORmer (VICTOR) that is optimized for two tasks: 1) overall compatibility as regression and 2) the detection of mismatching items and utilize fashion-specific contrastive language-image pre-training for fine tuning computer vision neural networks on fashion imagery. We build upon the Polyvore outfit benchmark to generate partially mismatching outfits, creating a new dataset termed Polyvore-MISFITs, that is used to train VICTOR. A series of ablation and comparative analyses show that the proposed architecture can compete and even surpass the current state-of-the-art on Polyvore datasets while reducing the instance-wise floating operations by 88%, striking a balance between high performance and efficiency. We release our code at https://github.com/stevejpapad/Visual-InCompatibility-Transformer △ Less

Submitted 8 September, 2022; v1 submitted 27 July, 2022; originally announced July 2022.

Comments: 14 pages, 6 figures

arXiv:2205.13268 [pdf, other]

MemeTector: Enforcing deep focus for meme detection

Authors: Christos Koutlis, Manos Schinas, Symeon Papadopoulos

Abstract: Image memes and specifically their widely-known variation image macros, is a special new media type that combines text with images and is used in social media to playfully or subtly express humour, irony, sarcasm and even hate. It is important to accurately retrieve image memes from social media to better capture the cultural and social aspects of online phenomena and detect potential issues (hate… ▽ More Image memes and specifically their widely-known variation image macros, is a special new media type that combines text with images and is used in social media to playfully or subtly express humour, irony, sarcasm and even hate. It is important to accurately retrieve image memes from social media to better capture the cultural and social aspects of online phenomena and detect potential issues (hate-speech, disinformation). Essentially, the background image of an image macro is a regular image easily recognized as such by humans but cumbersome for the machine to do so due to feature map similarity with the complete image macro. Hence, accumulating suitable feature maps in such cases can lead to deep understanding of the notion of image memes. To this end, we propose a methodology, called Visual Part Utilization, that utilizes the visual part of image memes as instances of the regular image class and the initial image memes as instances of the image meme class to force the model to concentrate on the critical parts that characterize an image meme. Additionally, we employ a trainable attention mechanism on top of a standard ViT architecture to enhance the model's ability to focus on these critical parts and make the predictions interpretable. Several training and test scenarios involving web-scraped regular images of controlled text presence are considered for evaluating the model in terms of robustness and accuracy. The findings indicate that light visual part utilization combined with sufficient text presence during training provides the best and most robust model, surpassing state of the art. Source code and dataset are available at https://github.com/mever-team/memetector. △ Less

Submitted 20 January, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

arXiv:2205.10003 [pdf, other]

InDistill: Information flow-preserving knowledge distillation for model compression

Authors: Ioannis Sarridis, Christos Koutlis, Giorgos Kordopatis-Zilos, Ioannis Kompatsiaris, Symeon Papadopoulos

Abstract: In this paper we introduce InDistill, a model compression approach that combines knowledge distillation and channel pruning in a unified framework for the transfer of the critical information flow paths from a heavyweight teacher to a lightweight student. Such information is typically collapsed in previous methods due to an encoding stage prior to distillation. By contrast, InDistill leverages a p… ▽ More In this paper we introduce InDistill, a model compression approach that combines knowledge distillation and channel pruning in a unified framework for the transfer of the critical information flow paths from a heavyweight teacher to a lightweight student. Such information is typically collapsed in previous methods due to an encoding stage prior to distillation. By contrast, InDistill leverages a pruning operation applied to the teacher's intermediate layers reducing their width to the corresponding student layers' width. In that way, we force architectural alignment enabling the intermediate layers to be directly distilled without the need of an encoding stage. Additionally, a curriculum learning-based training scheme is adopted considering the distillation difficulty of each layer and the critical learning periods in which the information flow paths are created. The proposed method surpasses state-of-the-art performance on three standard benchmarks, i.e. CIFAR-10, CUB-200, and FashionMNIST by 3.08%, 14.27%, and 1% mAP, respectively, as well as on more challenging evaluation settings, i.e. ImageNet and CIFAR-100 by 1.97% and 5.65% mAP, respectively. △ Less

Submitted 16 June, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

arXiv:2204.12902 [pdf, other]

A Graph Diffusion Scheme for Decentralized Content Search based on Personalized PageRank

Authors: Nikolaos Giatsoglou, Emmanouil Krasanakis, Symeon Papadopoulos, Ioannis Kompatsiaris

Abstract: Decentralization is emerging as a key feature of the future Internet. However, effective algorithms for search are missing from state-of-the-art decentralized technologies, such as distributed hash tables and blockchain. This is surprising, since decentralized search has been studied extensively in earlier peer-to-peer (P2P) literature. In this work, we adopt a fresh outlook for decentralized sear… ▽ More Decentralization is emerging as a key feature of the future Internet. However, effective algorithms for search are missing from state-of-the-art decentralized technologies, such as distributed hash tables and blockchain. This is surprising, since decentralized search has been studied extensively in earlier peer-to-peer (P2P) literature. In this work, we adopt a fresh outlook for decentralized search in P2P networks that is inspired by advancements in dense information retrieval and graph signal processing. In particular, we generate latent representations of P2P nodes based on their stored documents and diffuse them to the rest of the network with graph filters, such as personalized PageRank. We then use the diffused representations to guide search queries towards relevant content. Our preliminary approach is successful in locating relevant documents in nearby nodes but the accuracy declines sharply with the number of stored documents, highlighting the need for more sophisticated techniques. △ Less

Submitted 27 April, 2022; originally announced April 2022.

Comments: 7 pages, 3 figures, to be published in the Decentralized Internet, Networks, Protocols, and Systems (DINPS 2022) workshop hosted at the 42nd IEEE International Conference on Distributed Computing Systems (ICDCS 2022) - https://research.protocol.ai/sites/dinps/

arXiv:2204.12816 [pdf, other]

The MeVer DeepFake Detection Service: Lessons Learnt from Developing and Deploying in the Wild

Authors: Spyridon Baxevanakis, Giorgos Kordopatis-Zilos, Panagiotis Galopoulos, Lazaros Apostolidis, Killian Levacher, Ipek B. Schlicht, Denis Teyssou, Ioannis Kompatsiaris, Symeon Papadopoulos

Abstract: Enabled by recent improvements in generation methodologies, DeepFakes have become mainstream due to their increasingly better visual quality, the increase in easy-to-use generation tools and the rapid dissemination through social media. This fact poses a severe threat to our societies with the potential to erode social cohesion and influence our democracies. To mitigate the threat, numerous DeepFa… ▽ More Enabled by recent improvements in generation methodologies, DeepFakes have become mainstream due to their increasingly better visual quality, the increase in easy-to-use generation tools and the rapid dissemination through social media. This fact poses a severe threat to our societies with the potential to erode social cohesion and influence our democracies. To mitigate the threat, numerous DeepFake detection schemes have been introduced in the literature but very few provide a web service that can be used in the wild. In this paper, we introduce the MeVer DeepFake detection service, a web service detecting deep learning manipulations in images and video. We present the design and implementation of the proposed processing pipeline that involves a model ensemble scheme, and we endow the service with a model card for transparency. Experimental results show that our service performs robustly on the three benchmark datasets while being vulnerable to Adversarial Attacks. Finally, we outline our experience and lessons learned when deploying a research system into production in the hopes that it will be useful to other academic and industry teams. △ Less

Submitted 27 April, 2022; originally announced April 2022.

Comments: 10 pages, 6 figures

arXiv:2204.04014 [pdf, other]

Multimodal Quasi-AutoRegression: Forecasting the visual popularity of new fashion products

Authors: Stefanos I. Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Ioannis Kompatsiaris

Abstract: Estimating the preferences of consumers is of utmost importance for the fashion industry as appropriately leveraging this information can be beneficial in terms of profit. Trend detection in fashion is a challenging task due to the fast pace of change in the fashion industry. Moreover, forecasting the visual popularity of new garment designs is even more demanding due to lack of historical data. T… ▽ More Estimating the preferences of consumers is of utmost importance for the fashion industry as appropriately leveraging this information can be beneficial in terms of profit. Trend detection in fashion is a challenging task due to the fast pace of change in the fashion industry. Moreover, forecasting the visual popularity of new garment designs is even more demanding due to lack of historical data. To this end, we propose MuQAR, a Multimodal Quasi-AutoRegressive deep learning architecture that combines two modules: (1) a multi-modal multi-layer perceptron processing categorical, visual and textual features of the product and (2) a quasi-autoregressive neural network modelling the "target" time series of the product's attributes along with the "exogenous" time series of all other attributes. We utilize computer vision, image classification and image captioning, for automatically extracting visual features and textual descriptions from the images of new products. Product design in fashion is initially expressed visually and these features represent the products' unique characteristics without interfering with the creative process of its designers by requiring additional inputs (e.g manually written texts). We employ the product's target attributes time series as a proxy of temporal popularity patterns, mitigating the lack of historical data, while exogenous time series help capture trends among interrelated attributes. We perform an extensive ablation analysis on two large scale image fashion datasets, Mallzee and SHIFT15m to assess the adequacy of MuQAR and also use the Amazon Reviews: Home and Kitchen dataset to assess generalisability to other domains. A comparative study on the VISUELLE dataset, shows that MuQAR is capable of competing and surpassing the domain's current state of the art by 4.65% and 4.8% in terms of WAPE and MAE respectively. △ Less

Submitted 5 May, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

Comments: 14 pages, 4 figures

arXiv:2111.14837 [pdf, other]

doi 10.1109/ACCESS.2022.3159688

p2pGNN: A Decentralized Graph Neural Network for Node Classification in Peer-to-Peer Networks

Authors: Emmanouil Krasanakis, Symeon Papadopoulos, Ioannis Kompatsiaris

Abstract: In this work, we aim to classify nodes of unstructured peer-to-peer networks with communication uncertainty, such as users of decentralized social networks. Graph Neural Networks (GNNs) are known to improve the accuracy of simple classifiers in centralized settings by leveraging naturally occurring network links, but graph convolutional layers are challenging to implement in decentralized settings… ▽ More In this work, we aim to classify nodes of unstructured peer-to-peer networks with communication uncertainty, such as users of decentralized social networks. Graph Neural Networks (GNNs) are known to improve the accuracy of simple classifiers in centralized settings by leveraging naturally occurring network links, but graph convolutional layers are challenging to implement in decentralized settings when node neighbors are not constantly available. We address this problem by employing decoupled GNNs, where base classifier predictions and errors are diffused through graphs after training. For these, we deploy pre-trained and gossip-trained base classifiers and implement peer-to-peer graph diffusion under communication uncertainty. In particular, we develop an asynchronous decentralized formulation of diffusion that converges to centralized predictions in distribution and linearly with respect to communication rates. We experiment on three real-world graphs with node features and labels and simulate peer-to-peer networks with uniformly random communication frequencies; given a portion of known labels, our decentralized graph diffusion achieves comparable accuracy to centralized GNNs with minimal communication overhead (less than 3% of what gossip training already adds). △ Less

Submitted 15 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: 12 pages, 4 figures, 2 tables, accepted manuscript, IEEE Access

arXiv:2111.11952 [pdf, other]

Leveraging Selective Prediction for Reliable Image Geolocation

Authors: Apostolos Panagiotopoulos, Giorgos Kordopatis-Zilos, Symeon Papadopoulos

Abstract: Reliable image geolocation is crucial for several applications, ranging from social media geo-tagging to fake news detection. State-of-the-art geolocation methods surpass human performance on the task of geolocation estimation from images. However, no method assesses the suitability of an image for this task, which results in unreliable and erroneous estimations for images containing no geolocatio… ▽ More Reliable image geolocation is crucial for several applications, ranging from social media geo-tagging to fake news detection. State-of-the-art geolocation methods surpass human performance on the task of geolocation estimation from images. However, no method assesses the suitability of an image for this task, which results in unreliable and erroneous estimations for images containing no geolocation clues. In this paper, we define the task of image localizability, i.e. suitability of an image for geolocation, and propose a selective prediction methodology to address the task. In particular, we propose two novel selection functions that leverage the output probability distributions of geolocation models to infer localizability at different scales. Our selection functions are benchmarked against the most widely used selective prediction baselines, outperforming them in all cases. By abstaining from predicting non-localizable images, we improve geolocation accuracy from 27.8% to 70.5% at the city-scale, and thus make current geolocation models reliable for real-world applications. △ Less

Submitted 23 November, 2021; originally announced November 2021.

Comments: Accepted to the 28th International Conference on MultiMedia Modeling (MMM' 22)

arXiv:2110.09274 [pdf, other]

pygrank: A Python Package for Graph Node Ranking

Authors: Emmanouil Krasanakis, Symeon Papadopoulos, Ioannis Kompatsiaris, Andreas Symeonidis

Abstract: We introduce pygrank, an open source Python package to define, run and evaluate node ranking algorithms. We provide object-oriented and extensively unit-tested algorithm components, such as graph filters, post-processors, measures, benchmarks and online tuning. Computations can be delegated to numpy, tensorflow or pytorch backends and fit in back-propagation pipelines. Classes can be combined to d… ▽ More We introduce pygrank, an open source Python package to define, run and evaluate node ranking algorithms. We provide object-oriented and extensively unit-tested algorithm components, such as graph filters, post-processors, measures, benchmarks and online tuning. Computations can be delegated to numpy, tensorflow or pytorch backends and fit in back-propagation pipelines. Classes can be combined to define interoperable complex algorithms. Within the context of this paper we compare the package with related alternatives and demonstrate its flexibility and ease of use with code examples. △ Less

Submitted 18 October, 2021; originally announced October 2021.

Comments: 6 pages, 1 figure, 2 tables, 3 code snippets

arXiv:2108.12397 [pdf, other]

Prior Signal Editing for Graph Filter Posterior Fairness Constraints

Authors: Emmanouil Krasanakis, Symeon Papadopoulos, Ioannis Kompatsiaris, Andreas Symeonidis

Abstract: Graph filters are an emerging paradigm that systematizes information propagation in graphs as transformation of prior node values, called graph signals, to posterior scores. In this work, we study the problem of mitigating disparate impact, i.e. posterior score differences between a protected set of sensitive nodes and the rest, while minimally editing scores to preserve recommendation quality. To… ▽ More Graph filters are an emerging paradigm that systematizes information propagation in graphs as transformation of prior node values, called graph signals, to posterior scores. In this work, we study the problem of mitigating disparate impact, i.e. posterior score differences between a protected set of sensitive nodes and the rest, while minimally editing scores to preserve recommendation quality. To this end, we develop a scheme that respects propagation mechanisms by editing graph signal priors according to their posteriors and node sensitivity, where a small number of editing parameters can be tuned to constrain or eliminate disparate impact. We also theoretically explain that coarse prior editing can locally optimize posteriors objectives thanks to graph filter robustness. We experiment on a diverse collection of 12 graphs with varying number of nodes, where our approach performs equally well or better than previous ones in minimizing disparate impact and preserving posterior AUC under fairness constraints. △ Less

Submitted 27 August, 2021; originally announced August 2021.

Comments: 40 pages, 4 figures, 9 tables

arXiv:2107.07919 [pdf, other]

A Survey on Bias in Visual Datasets

Authors: Simone Fabbrizzi, Symeon Papadopoulos, Eirini Ntoutsi, Ioannis Kompatsiaris

Abstract: Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks. Nonetheless, it may result in significant discrimination if not handled properly as CV systems highly depend on the data they are fed with and can learn and amplify biases within such data. Thus, the problems of understanding and discovering biases are of utmost importance. Yet, there is no comprehensive s… ▽ More Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks. Nonetheless, it may result in significant discrimination if not handled properly as CV systems highly depend on the data they are fed with and can learn and amplify biases within such data. Thus, the problems of understanding and discovering biases are of utmost importance. Yet, there is no comprehensive survey on bias in visual datasets. Hence, this work aims to: i) describe the biases that might manifest in visual datasets; ii) review the literature on methods for bias discovery and quantification in visual datasets; iii) discuss existing attempts to collect bias-aware visual datasets. A key conclusion of our study is that the problem of bias discovery and quantification in visual datasets is still open, and there is room for improvement in terms of both methods and the range of biases that can be addressed. Moreover, there is no such thing as a bias-free dataset, so scientists and practitioners must become aware of the biases in their datasets and make them explicit. To this end, we propose a checklist to spot different types of bias during visual dataset collection. △ Less

Submitted 23 June, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

arXiv:2106.13266 [pdf, other]

doi 10.1007/s11263-022-01651-3

DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval

Authors: Giorgos Kordopatis-Zilos, Christos Tzelepis, Symeon Papadopoulos, Ioannis Kompatsiaris, Ioannis Patras

Abstract: In this paper, we address the problem of high performance and computationally efficient content-based video retrieval in large-scale datasets. Current methods typically propose either: (i) fine-grained approaches employing spatio-temporal representations and similarity calculations, achieving high performance at a high computational cost or (ii) coarse-grained approaches representing/indexing vide… ▽ More In this paper, we address the problem of high performance and computationally efficient content-based video retrieval in large-scale datasets. Current methods typically propose either: (i) fine-grained approaches employing spatio-temporal representations and similarity calculations, achieving high performance at a high computational cost or (ii) coarse-grained approaches representing/indexing videos as global vectors, where the spatio-temporal structure is lost, providing low performance but also having low computational cost. In this work, we propose a Knowledge Distillation framework, called Distill-and-Select (DnS), that starting from a well-performing fine-grained Teacher Network learns: a) Student Networks at different retrieval performance and computational efficiency trade-offs and b) a Selector Network that at test time rapidly directs samples to the appropriate student to maintain both high retrieval performance and high computational efficiency. We train several students with different architectures and arrive at different trade-offs of performance and efficiency, i.e., speed and storage requirements, including fine-grained students that store/index videos using binary representations. Importantly, the proposed scheme allows Knowledge Distillation in large, unlabelled datasets -- this leads to good students. We evaluate DnS on five public datasets on three different video retrieval tasks and demonstrate a) that our students achieve state-of-the-art performance in several cases and b) that the DnS framework provides an excellent trade-off between retrieval performance, computational speed, and storage space. In specific configurations, the proposed method achieves similar mAP with the teacher but is 20 times faster and requires 240 times less storage space. The collected dataset and implementation are publicly available: https://github.com/mever-team/distill-and-select. △ Less

Submitted 5 August, 2022; v1 submitted 24 June, 2021; originally announced June 2021.

Journal ref: International Journal of Computer Vision (2022)

arXiv:2105.07645 [pdf, ps, other]

Leveraging EfficientNet and Contrastive Learning for Accurate Global-scale Location Estimation

Authors: Giorgos Kordopatis-Zilos, Panagiotis Galopoulos, Symeon Papadopoulos, Ioannis Kompatsiaris

Abstract: In this paper, we address the problem of global-scale image geolocation, proposing a mixed classification-retrieval scheme. Unlike other methods that strictly tackle the problem as a classification or retrieval task, we combine the two practices in a unified solution leveraging the advantages of each approach with two different modules. The first leverages the EfficientNet architecture to assign i… ▽ More In this paper, we address the problem of global-scale image geolocation, proposing a mixed classification-retrieval scheme. Unlike other methods that strictly tackle the problem as a classification or retrieval task, we combine the two practices in a unified solution leveraging the advantages of each approach with two different modules. The first leverages the EfficientNet architecture to assign images to a specific geographic cell in a robust way. The second introduces a new residual architecture that is trained with contrastive learning to map input images to an embedding space that minimizes the pairwise geodesic distance of same-location images. For the final location estimation, the two modules are combined with a search-within-cell scheme, where the locations of most similar images from the predicted geographic cell are aggregated based on a spatial clustering scheme. Our approach demonstrates very competitive performance on four public datasets, achieving new state-of-the-art performance in fine granularity scales, i.e., 15.0% at 1km range on Im2GPS3k. △ Less

Submitted 17 May, 2021; originally announced May 2021.

arXiv:2105.05515 [pdf, other]

Operation-wise Attention Network for Tampering Localization Fusion

Authors: Polychronis Charitidis, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Kompatsiaris

Abstract: In this work, we present a deep learning-based approach for image tampering localization fusion. This approach is designed to combine the outcomes of multiple image forensics algorithms and provides a fused tampering localization map, which requires no expert knowledge and is easier to interpret by end users. Our fusion framework includes a set of five individual tampering localization methods for… ▽ More In this work, we present a deep learning-based approach for image tampering localization fusion. This approach is designed to combine the outcomes of multiple image forensics algorithms and provides a fused tampering localization map, which requires no expert knowledge and is easier to interpret by end users. Our fusion framework includes a set of five individual tampering localization methods for splicing localization on JPEG images. The proposed deep learning fusion model is an adapted architecture, initially proposed for the image restoration task, that performs multiple operations in parallel, weighted by an attention mechanism to enable the selection of proper operations depending on the input signals. This weighting process can be very beneficial for cases where the input signal is very diverse, as in our case where the output signals of multiple image forensics algorithms are combined. Evaluation in three publicly available forensics datasets demonstrates that the performance of the proposed approach is competitive, outperforming the individual forensics techniques as well as another recently proposed fusion framework in the majority of cases. △ Less

Submitted 13 May, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

Comments: 6 pages, 3 figures, cbmi

arXiv:2010.08737 [pdf, ps, other]

Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning

Authors: Pavlos Avgoustinakis, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Andreas L. Symeonidis, Ioannis Kompatsiaris

Abstract: In this work, we address the problem of audio-based near-duplicate video retrieval. We propose the Audio Similarity Learning (AuSiL) approach that effectively captures temporal patterns of audio similarity between video pairs. For the robust similarity calculation between two videos, we first extract representative audio-based video descriptors by leveraging transfer learning based on a Convolutio… ▽ More In this work, we address the problem of audio-based near-duplicate video retrieval. We propose the Audio Similarity Learning (AuSiL) approach that effectively captures temporal patterns of audio similarity between video pairs. For the robust similarity calculation between two videos, we first extract representative audio-based video descriptors by leveraging transfer learning based on a Convolutional Neural Network (CNN) trained on a large scale dataset of audio events, and then we calculate the similarity matrix derived from the pairwise similarity of these descriptors. The similarity matrix is subsequently fed to a CNN network that captures the temporal structures existing within its content. We train our network following a triplet generation process and optimizing the triplet loss function. To evaluate the effectiveness of the proposed approach, we have manually annotated two publicly available video datasets based on the audio duplicity between their videos. The proposed approach achieves very competitive results compared to three state-of-the-art methods. Also, unlike the competing methods, it is very robust to the retrieval of audio duplicates generated with speed transformations. △ Less

Submitted 11 January, 2021; v1 submitted 17 October, 2020; originally announced October 2020.

arXiv:2009.00945 [pdf, other]

doi 10.1016/j.asoc.2020.106685

LAVARNET: Neural Network Modeling of Causal Variable Relationships for Multivariate Time Series Forecasting

Authors: Christos Koutlis, Symeon Papadopoulos, Manos Schinas, Ioannis Kompatsiaris

Abstract: Multivariate time series forecasting is of great importance to many scientific disciplines and industrial sectors. The evolution of a multivariate time series depends on the dynamics of its variables and the connectivity network of causal interrelationships among them. Most of the existing time series models do not account for the causal effects among the system's variables and even if they do the… ▽ More Multivariate time series forecasting is of great importance to many scientific disciplines and industrial sectors. The evolution of a multivariate time series depends on the dynamics of its variables and the connectivity network of causal interrelationships among them. Most of the existing time series models do not account for the causal effects among the system's variables and even if they do they rely just on determining the between-variables causality network. Knowing the structure of such a complex network and even more specifically knowing the exact lagged variables that contribute to the underlying process is crucial for the task of multivariate time series forecasting. The latter is a rather unexplored source of information to leverage. In this direction, here a novel neural network-based architecture is proposed, termed LAgged VAriable Representation NETwork (LAVARNET), which intrinsically estimates the importance of lagged variables and combines high dimensional latent representations of them to predict future values of time series. Our model is compared with other baseline and state of the art neural network architectures on one simulated data set and four real data sets from meteorology, music, solar activity, and finance areas. The proposed architecture outperforms the competitive architectures in most of the experiments. △ Less

Submitted 2 September, 2020; originally announced September 2020.

arXiv:2006.08361 [pdf, other]

An Unsupervised Machine Learning Approach to Assess the ZIP Code Level Impact of COVID-19 in NYC

Authors: Fadoua Khmaissia, Pegah Sagheb Haghighi, Aarthe Jayaprakash, Zhenwei Wu, Sokratis Papadopoulos, Yuan Lai, Freddy T. Nguyen

Abstract: New York City has been recognized as the world's epicenter of the novel Coronavirus pandemic. To identify the key inherent factors that are highly correlated to the Increase Rate of COVID-19 new cases in NYC, we propose an unsupervised machine learning framework. Based on the assumption that ZIP code areas with similar demographic, socioeconomic, and mobility patterns are likely to experience simi… ▽ More New York City has been recognized as the world's epicenter of the novel Coronavirus pandemic. To identify the key inherent factors that are highly correlated to the Increase Rate of COVID-19 new cases in NYC, we propose an unsupervised machine learning framework. Based on the assumption that ZIP code areas with similar demographic, socioeconomic, and mobility patterns are likely to experience similar outbreaks, we select the most relevant features to perform a clustering that can best reflect the spread, and map them down to 9 interpretable categories. We believe that our findings can guide policy makers to promptly anticipate and prevent the spread of the virus by taking the right measures. △ Less

Submitted 18 September, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: Presented at ICML 2020 Workshop on the Healthcare Systems, Population Health, and the Role of Health-Tech

arXiv:2006.07084 [pdf, other]

Investigating the Impact of Pre-processing and Prediction Aggregation on the DeepFake Detection Task

Authors: Polychronis Charitidis, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Kompatsiaris

Abstract: Recent advances in content generation technologies (widely known as DeepFakes) along with the online proliferation of manipulated media content render the detection of such manipulations a task of increasing importance. Even though there are many DeepFake detection methods, only a few focus on the impact of dataset preprocessing and the aggregation of frame-level to video-level prediction on model… ▽ More Recent advances in content generation technologies (widely known as DeepFakes) along with the online proliferation of manipulated media content render the detection of such manipulations a task of increasing importance. Even though there are many DeepFake detection methods, only a few focus on the impact of dataset preprocessing and the aggregation of frame-level to video-level prediction on model performance. In this paper, we propose a pre-processing step to improve the training data quality and examine its effect on the performance of DeepFake detection. We also propose and evaluate the effect of video-level prediction aggregation approaches. Experimental results show that the proposed pre-processing approach leads to considerable improvements in the performance of detection models, and the proposed prediction aggregation scheme further boosts the detection efficiency in cases where there are multiple faces in a video. △ Less

Submitted 19 October, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

Showing 1–50 of 61 results for author: Papadopoulos, S