Showing 1–50 of 87 results for author: Dev, S

  1. Interpreting a Semantic Segmentation Model for Coastline Detection

    Authors: Conor O'Sullivan, Seamus Coveney, Xavier Monteys, Soumyabrata Dev

    Abstract: We interpret a deep-learning semantic segmentation model used to classify coastline satellite images into land and water. This is to build trust in the model and gain new insight into the process of coastal water body extraction. Specifically, we seek to understand which spectral bands are important for predicting segmentation masks. This is done using a permutation importance approach. Results sh…

    Submitted 19 May, 2024; originally announced May 2024.

    Journal ref: 2023 Photonics & Electromagnetics Research Symposium (PIERS)

  2. The Effectiveness of Edge Detection Evaluation Metrics for Automated Coastline Detection

    Authors: Conor O'Sullivan, Seamus Coveney, Xavier Monteys, Soumyabrata Dev

    Abstract: We analyse the effectiveness of RMSE, PSNR, SSIM and FOM for evaluating edge detection algorithms used for automated coastline detection. Typically, the accuracy of detected coastlines is assessed visually. This can be impractical on a large scale leading to the need for objective evaluation metrics. Hence, we conduct an experiment to find reliable metrics. We apply Canny edge detection to 95 coas…

    Submitted 19 May, 2024; originally announced May 2024.

    Journal ref: 2023 Photonics & Electromagnetics Research Symposium (PIERS)

  3. Automated Coastline Extraction Using Edge Detection Algorithms

    Authors: Conor O'Sullivan, Seamus Coveney, Xavier Monteys, Soumyabrata Dev

    Abstract: We analyse the effectiveness of edge detection algorithms for the purpose of automatically extracting coastlines from satellite images. Four algorithms (Canny, Sobel, Scharr and Prewitt) are compared visually and using metrics. With an average SSIM of 0.8, Canny detected edges that were closest to the reference edges. However, the algorithm had difficulty distinguishing noisy edges, e.g. due to de…

    Submitted 19 May, 2024; originally announced May 2024.

  4. arXiv:2404.14695  [pdf, other]

    cs.CL

    MisgenderMender: A Community-Informed Approach to Interventions for Misgendering

    Authors: Tamanna Hossain, Sunipa Dev, Sameer Singh

    Abstract: Content Warning: This paper contains examples of misgendering and erasure that could be offensive and potentially triggering. Misgendering, the act of incorrectly addressing someone's gender, inflicts serious harm and is pervasive in everyday technologies, yet there is a notable lack of research to combat it. We are the first to address this lack of research into interventions for misgendering b…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NAACL 2024

  5. arXiv:2404.05866  [pdf, other]

    cs.CL

    GeniL: A Multilingual Dataset on Generalizing Language

    Authors: Aida Mostafazadeh Davani, Sagar Gubbi, Sunipa Dev, Shachi Dave, Vinodkumar Prabhakaran

    Abstract: LLMs are increasingly transforming our digital ecosystem, but they often inherit societal biases learned from their training data, for instance stereotypes associating certain attributes with specific identity groups. While whether and how these biases are mitigated may depend on the specific use cases, being able to effectively detect instances of stereotype perpetuation is a crucial first step.…

    Submitted 8 April, 2024; originally announced April 2024.

  6. arXiv:2403.05696  [pdf, other]

    cs.CL cs.CV

    SeeGULL Multilingual: a Dataset of Geo-Culturally Situated Stereotypes

    Authors: Mukul Bhutani, Kevin Robinson, Vinodkumar Prabhakaran, Shachi Dave, Sunipa Dev

    Abstract: While generative multilingual models are rapidly being deployed, their safety and fairness evaluations are largely limited to resources collected in English. This is especially problematic for evaluations targeting inherently socio-cultural phenomena such as stereotyping, where it is important to build multi-lingual resources that reflect the stereotypes prevalent in respective language communitie…

    Submitted 8 March, 2024; originally announced March 2024.

  7. arXiv:2401.06935  [pdf, other]

    cs.CL cs.CY

    MiTTenS: A Dataset for Evaluating Misgendering in Translation

    Authors: Kevin Robinson, Sneha Kudugunta, Romina Stella, Sunipa Dev, Jasmijn Bastings

    Abstract: Misgendering is the act of referring to someone in a way that does not reflect their gender identity. Translation systems, including foundation models capable of translation, can produce errors that result in misgendering harms. To measure the extent of such potential harms when translating into and out of English, we introduce a dataset, MiTTenS, covering 26 languages from a variety of language f…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: GitHub repository https://github.com/google-research-datasets/mittens

  8. arXiv:2401.06310  [pdf, other]

    cs.CV cs.CL cs.CY

    ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation

    Authors: Akshita Jha, Vinodkumar Prabhakaran, Remi Denton, Sarah Laszlo, Shachi Dave, Rida Qadri, Chandan K. Reddy, Sunipa Dev

    Abstract: Recent studies have shown that Text-to-Image (T2I) model generations can reflect social stereotypes present in the real world. However, existing approaches for evaluating stereotypes have a noticeable lack of coverage of global identity groups and their associated stereotypes. To address this gap, we introduce the ViSAGe (Visual Stereotypes Around the Globe) dataset to enable the evaluation of kno…

    Submitted 14 July, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Association for Computational Linguistics (ACL) 2024

  9. arXiv:2311.17259  [pdf, other]

    cs.LG cs.CY

    SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata

    Authors: Mark Díaz, Sunipa Dev, Emily Reif, Emily Denton, Vinodkumar Prabhakaran

    Abstract: The unstructured nature of data used in foundation model development is a challenge to systematic analyses for making data use and documentation decisions. From a Responsible AI perspective, these decisions often rely upon understanding how people are represented in data. We propose a framework designed to guide analysis of human representation in unstructured data and identify downstream risks. W…

    Submitted 1 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  10. arXiv:2307.10514  [pdf, other]

    cs.CL cs.AI cs.HC

    Building Socio-culturally Inclusive Stereotype Resources with Community Engagement

    Authors: Sunipa Dev, Jaya Goyal, Dinesh Tewari, Shachi Dave, Vinodkumar Prabhakaran

    Abstract: With rapid development and deployment of generative language models in global settings, there is an urgent need to also scale our measurements of harm, not just in the number and types of harms covered, but also how well they account for local cultural contexts, including marginalized identities and the social biases experienced by them. Current evaluation paradigms are limited in their abilities…

    Submitted 19 July, 2023; originally announced July 2023.

  11. arXiv:2306.03950  [pdf, other]

    cs.CL

    MISGENDERED: Limits of Large Language Models in Understanding Pronouns

    Authors: Tamanna Hossain, Sunipa Dev, Sameer Singh

    Abstract: Content Warning: This paper contains examples of misgendering and erasure that could be offensive and potentially triggering. Gender bias in language technologies has been widely studied, but research has mostly been restricted to a binary paradigm of gender. It is essential also to consider non-binary gender identities, as excluding them can cause further harm to an already marginalized group.…

    Submitted 7 July, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: Accepted at ACL 2023 as a long paper

  12. arXiv:2305.11840  [pdf, other]

    cs.CL cs.CY

    SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models

    Authors: Akshita Jha, Aida Davani, Chandan K. Reddy, Shachi Dave, Vinodkumar Prabhakaran, Sunipa Dev

    Abstract: Stereotype benchmark datasets are crucial to detect and mitigate social stereotypes about groups of people in NLP models. However, existing datasets are limited in size and coverage, and are largely restricted to stereotypes prevalent in the Western society. This is especially problematic as language technologies gain hold across the globe. To address this gap, we present SeeGULL, a broad-coverage…

    Submitted 19 May, 2023; originally announced May 2023.

  13. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  14. arXiv:2303.17204  [pdf, other]

    cs.DS cs.CG

    A Subquadratic Time Algorithm for the Weighted $k$-Center Problem on Cactus Graphs

    Authors: Binay Bhattacharya, Sandip Das, Subhadeep Ranjan Dev

    Abstract: The weighted $k$-center problem in graphs is a classical facility location problem where we place $k$ centers on the graph, which minimize the maximum weighted distance of a vertex to its nearest center. We study this problem when the underlying graph is a cactus with $n$ vertices and present an $O(n \log^2 n)$ time algorithm for the same. This time complexity improves upon the $O(n^2)$ time algor…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: Submitted to Theoretical Computer Science

    MSC Class: 68W01; ACM Class: F.2.0

  15. arXiv:2211.11206  [pdf, other]

    cs.CL cs.AI cs.CY

    Cultural Re-contextualization of Fairness Research in Language Technologies in India

    Authors: Shaily Bhatt, Sunipa Dev, Partha Talukdar, Shachi Dave, Vinodkumar Prabhakaran

    Abstract: Recent research has revealed undesirable biases in NLP data and models. However, these efforts largely focus on social disparities in the West, and are not directly portable to other geo-cultural contexts. In this position paper, we outline a holistic research agenda to re-contextualize NLP fairness research for the Indian context, accounting for Indian societal context, bridging technological gap…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted to NeurIPS Workshop on "Cultures in AI/AI in Culture". This is a non-archival short version, to cite please refer to our complete paper: arXiv:2209.12226

  16. arXiv:2211.08742  [pdf, other]

    cs.LG cs.AI cs.CL

    Auditing Algorithmic Fairness in Machine Learning for Health with Severity-Based LOGAN

    Authors: Anaelia Ovalle, Sunipa Dev, Jieyu Zhao, Majid Sarrafzadeh, Kai-Wei Chang

    Abstract: Auditing machine learning-based (ML) healthcare tools for bias is critical to preventing patient harm, especially in communities that disproportionately face health inequities. General frameworks are becoming increasingly available to measure ML fairness gaps between groups. However, ML for health (ML4H) auditing principles call for a contextual, patient-centered approach to model assessment. Ther…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: v1

  17. arXiv:2211.08283  [pdf, other]

    cs.DM cs.DS math.CO

    The RED-BLUE SEPARATION problem on graphs

    Authors: Subhadeep Ranjan Dev, Sanjana Dey, Florent Foucaud, Ralf Klasing, Tuomo Lehtilä

    Abstract: We introduce the Red-Blue Separation problem on graphs, where we are given a graph $G=(V,E)$ whose vertices are colored either red or blue, and we want to select a (small) subset $S \subseteq V$, called red-blue separating set, such that for every red-blue pair of vertices, there is a vertex $s \in S$ whose closed neighborhood contains exactly one of the two vertices of the pair. We study the comp…

    Submitted 15 November, 2022; originally announced November 2022.

    Journal ref: Theoretical Computer Science 970:114061, 2023

  18. arXiv:2211.01885  [pdf, other]

    eess.IV cs.CV q-bio.QM

    Using U-Net Network for Efficient Brain Tumor Segmentation in MRI Images

    Authors: Jason Walsh, Alice Othmani, Mayank Jain, Soumyabrata Dev

    Abstract: Magnetic Resonance Imaging (MRI) is the most commonly used non-intrusive technique for medical image acquisition. Brain tumor segmentation is the process of algorithmically identifying tumors in brain MRI scans. While many approaches have been proposed in the literature for brain tumor segmentation, this paper proposes a lightweight implementation of U-Net. Apart from providing real-time segmentat…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: Published in Healthcare Analytics, 2022

  19. arXiv:2210.16050  [pdf, other]

    cs.DB

    Link Climate: An Interoperable Knowledge Graph Platform for Climate Data

    Authors: Jiantao Wu, Fabrizio Orlandi, Declan O'Sullivan, Soumyabrata Dev

    Abstract: Climate science has become more ambitious in recent years as global awareness about the environment has grown. To better understand climate, historical climate (e.g. archived meteorological variables such as temperature, wind, water, etc.) and climate-related data (e.g. geographical features and human activities) are widely used by today's climate research to derive models for an explainable clima…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Published in Computers and Geosciences, 2022

  20. arXiv:2210.10040  [pdf, other]

    cs.CL cs.CY cs.LG cs.SI

    The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks

    Authors: Nikil Roashan Selvam, Sunipa Dev, Daniel Khashabi, Tushar Khot, Kai-Wei Chang

    Abstract: How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given language model? In this work, we study this question by contrasting social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye. To do so, we empirically simulate various alternat…

    Submitted 16 June, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: ACL 2023

  21. arXiv:2209.12226  [pdf, other]

    cs.CL cs.CY

    Re-contextualizing Fairness in NLP: The Case of India

    Authors: Shaily Bhatt, Sunipa Dev, Partha Talukdar, Shachi Dave, Vinodkumar Prabhakaran

    Abstract: Recent research has revealed undesirable biases in NLP data and models. However, these efforts focus on social disparities in the West, and are not directly portable to other geo-cultural contexts. In this paper, we focus on NLP fairness in the context of India. We start with a brief account of the prominent axes of social disparities in India. We build resources for fairness evaluation in the Indian…

    Submitted 21 November, 2022; v1 submitted 25 September, 2022; originally announced September 2022.

    Comments: Accepted to AACL-IJCNLP 2022

  22. arXiv:2208.10265  [pdf, other]

    cs.AI cs.IR cs.LG

    A semantic web approach to uplift decentralized household energy data

    Authors: Jiantao Wu, Fabrizio Orlandi, Tarek AlSkaif, Declan O'Sullivan, Soumyabrata Dev

    Abstract: In a decentralized household energy system comprised of various devices such as home appliances, electric vehicles, and solar panels, end-users are able to dig deeper into the system's details and further achieve energy sustainability if they are presented with data on the electric energy consumption and production at the granularity of the device. However, many databases in this field are siloed…

    Submitted 26 August, 2022; v1 submitted 18 August, 2022; originally announced August 2022.

    Comments: Published in Sustainable Energy, Grids and Networks (SEGAN) 2022

  23. arXiv:2206.07176  [pdf, other]

    cs.SD cs.CL eess.AS

    Frequency-centroid features for word recognition of non-native English speakers

    Authors: Pierre Berjon, Rajib Sharma, Avishek Nag, Soumyabrata Dev

    Abstract: The objective of this work is to investigate complementary features which can aid the quintessential Mel frequency cepstral coefficients (MFCCs) in the task of closed, limited set word recognition for non-native English speakers of different mother-tongues. Unlike the MFCCs, which are derived from the spectral energy of the speech signal, the proposed frequency-centroids (FCs) encapsulate the spec…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: Published in IEEE Irish Signals & Systems Conference (ISSC), 2022

  24. arXiv:2206.03239  [pdf, other]

    cs.LG eess.SP

    Analyzing the impact of feature selection on the accuracy of heart disease prediction

    Authors: Muhammad Salman Pathan, Avishek Nag, Muhammad Mohisn Pathan, Soumyabrata Dev

    Abstract: Heart Disease has become one of the most serious diseases that has a significant impact on human life. It has emerged as one of the leading causes of mortality among the people across the globe during the last decade. In order to prevent patients from further damage, an accurate diagnosis of heart disease on time is an essential factor. Recently we have seen the usage of non-invasive medical proce…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Published in Healthcare Analytics, 2022

  25. arXiv:2205.12617  [pdf, other]

    cs.CL cs.AI cs.CV

    DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation

    Authors: Jingnong Qu, Liunian Harold Li, Jieyu Zhao, Sunipa Dev, Kai-Wei Chang

    Abstract: Disinformation has become a serious problem on social media. In particular, given their short format, visual attraction, and humorous nature, memes have a significant advantage in dissemination among online communities, making them an effective vehicle for the spread of disinformation. We present DisinfoMeme to help detect disinformation memes. The dataset contains memes mined from Reddit covering…

    Submitted 25 May, 2022; originally announced May 2022.

  26. arXiv:2205.01583  [pdf, other]

    cs.MM cs.HC

    An Explore of Virtual Reality for Awareness of the Climate Change Crisis: A Simulation of Sea Level Rise

    Authors: Zixiang Xu, Abraham G. Campbell, Soumyabrata Dev, Yuan Liang

    Abstract: Virtual Reality (VR) technology has been shown to achieve remarkable results in multiple fields. Due to the nature of the immersive medium of Virtual Reality, it logically follows that it can be used as a high-quality educational tool as it offers potentially a higher bandwidth than other mediums such as text, pictures and videos. This short paper illustrates the development of a climate change edu…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: Published in 8th International Conference of the Immersive Learning Research Network (iLRN 2022)

  27. arXiv:2204.06454  [pdf, other]

    cs.CV

    DMCNet: Diversified Model Combination Network for Understanding Engagement from Video Screengrabs

    Authors: Sarthak Batra, Hewei Wang, Avishek Nag, Philippe Brodeur, Marianne Checkley, Annette Klinkert, Soumyabrata Dev

    Abstract: Engagement is an essential indicator of the Quality-of-Learning Experience (QoLE) and plays a major role in developing intelligent educational interfaces. The number of people learning through Massively Open Online Courses (MOOCs) and other online resources has been increasing rapidly because they provide us with the flexibility to learn from anywhere at any time. This provides a good learning exp…

    Submitted 13 April, 2022; originally announced April 2022.

    Comments: Published in Systems and Soft Computing, 2022

  28. arXiv:2204.02311  [pdf, other]

    cs.CL

    PaLM: Scaling Language Modeling with Pathways

    Authors: Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, et al. (42 additional authors not shown)

    Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Tran…

    Submitted 5 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

  29. arXiv:2203.08118  [pdf, other]

    cs.CL

    Representation Learning for Resource-Constrained Keyphrase Generation

    Authors: Di Wu, Wasi Uddin Ahmad, Sunipa Dev, Kai-Wei Chang

    Abstract: State-of-the-art keyphrase generation methods generally depend on large annotated datasets, limiting their performance in domains with limited annotated data. To overcome this challenge, we design a data-oriented approach that first identifies salient information using retrieval-based corpus-level statistics, and then learns a task-specific intermediate representation based on a pre-trained langua…

    Submitted 21 October, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: EMNLP 2022 (Findings)

  30. arXiv:2203.00497  [pdf, other]

    cs.LG

    A predictive analytics approach for stroke prediction using machine learning and neural networks

    Authors: Soumyabrata Dev, Hewei Wang, Chidozie Shamrock Nwosu, Nishtha Jain, Bharadwaj Veeravalli, Deepu John

    Abstract: The negative impact of stroke in society has led to concerted efforts to improve the management and diagnosis of stroke. With an increased synergy between technology and medical diagnosis, caregivers create opportunities for better patient management by systematically mining and archiving the patients' medical records. Therefore, it is vital to study the interdependency of these risk factors in pa…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: Published in Healthcare Analytics, 2022

  31. arXiv:2112.06263  [pdf, other]

    cs.DC

    Sage: Leveraging ML to Diagnose Unpredictable Performance in Cloud Microservices

    Authors: Yu Gan, Mingyu Liang, Sundar Dev, David Lo, Christina Delimitrou

    Abstract: Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely-coupled microservices. Despite their advantages, microservices also introduce cascading QoS violations in cloud applications, which are difficult to diagnose and correct. We present Sage, an ML-driven root cause analysis system for interactive cloud microservices. Sage leverages unsupervised…

    Submitted 12 December, 2021; originally announced December 2021.

  32. arXiv:2110.11039  [pdf, other]

    cs.DB

    Automated Climate Analyses Using Knowledge Graph

    Authors: Jiantao Wu, Huan Chen, Fabrizio Orlandi, Yee Hui Lee, Declan O'Sullivan, Soumyabrata Dev

    Abstract: The FAIR (Findable, Accessible, Interoperable, Reusable) data principles are fundamental for climate researchers and all stakeholders in the current digital ecosystem. In this paper, we demonstrate how relational climate data can be "FAIR" and modeled using RDF, in line with Semantic Web technologies and our Climate Analysis ontology. Thus, heterogeneous climate data can be stored in graph databas…

    Submitted 21 October, 2021; originally announced October 2021.

    Comments: Accepted in Proc. IEEE AP-S Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting, 2021

  33. arXiv:2110.10152  [pdf, other]

    cs.LG

    Identifying Stroke Indicators Using Rough Sets

    Authors: Muhammad Salman Pathan, Jianbiao Zhang, Deepu John, Avishek Nag, Soumyabrata Dev

    Abstract: Stroke is widely considered as the second most common cause of mortality. The adverse consequences of stroke have led to global interest and work for improving the management and diagnosis of stroke. Various techniques for data mining have been used globally for accurate prediction of occurrence of stroke based on the risk factors that are associated with the electronic health care records (EHRs)…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted in IEEE Access, 2020

  34. arXiv:2110.09797  [pdf, other]

    cs.DB

    An Interoperable Open Data Portal for Climate Analysis

    Authors: Jiantao Wu, Huan Chen, Fabrizio Orlandi, Yee Hui Lee, Declan O'Sullivan, Soumyabrata Dev

    Abstract: This work proposes an open interoperable data portal that offers access to a Web-wide climate domain knowledge graph created for Ireland and England's NOAA climate daily data. There are three main components contributing to this data portal: the first is the upper layer schema of the knowledge graph -- the climate analysis (CA) ontology -- the second is an ad hoc SPARQL server by which to store th…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted in Proc. IEEE AP-S Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting, 2021

  35. arXiv:2110.09764  [pdf, other]

    cs.CV

    Detecting Blurred Ground-based Sky/Cloud Images

    Authors: Mayank Jain, Navya Jain, Yee Hui Lee, Stefan Winkler, Soumyabrata Dev

    Abstract: Ground-based whole sky imagers (WSIs) are being used by researchers in various fields to study atmospheric events. These ground-based sky cameras capture visible-light images of the sky at regular intervals of time. Owing to atmospheric interference and camera sensor noise, the captured images often exhibit noise and blur. This may pose a problem in subsequent image processing stages. Ther…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted in Proc. IEEE AP-S Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting, 2021

  36. arXiv:2110.09209  [pdf, other]

    physics.ao-ph cs.LG

    Graph-based Local Climate Classification in Iran

    Authors: Neda Akrami, Koorush Ziarati, Soumyabrata Dev

    Abstract: In this paper, we introduce a novel graph-based method to classify the regions with similar climate in a local area. We refer to our proposed method as the Graph Partition Based Method (GPBM). Our proposed method attempts to overcome the shortcomings of the current state-of-the-art methods in the literature. It has no limit on the number of variables that can be used and also preserves the nature of clim…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: Accepted in International Journal of Climatology, 2021

  37. arXiv:2110.09179  [pdf, other]

    cs.CL

    Analysis of French Phonetic Idiosyncrasies for Accent Recognition

    Authors: Pierre Berjon, Avishek Nag, Soumyabrata Dev

    Abstract: Speech recognition systems have made tremendous progress over the last few decades. They have developed significantly in identifying the speech of the speaker. However, there is scope for improvement in speech recognition systems in identifying the nuances and accents of a speaker. It is known that any specific natural language may possess at least one accent. Despite the identical word phonemic…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: Accepted in Soft Computing Letters, 2021

  38. arXiv:2110.07871  [pdf, ps, other]

    cs.CL

    Socially Aware Bias Measurements for Hindi Language Representations

    Authors: Vijit Malik, Sunipa Dev, Akihiro Nishi, Nanyun Peng, Kai-Wei Chang

    Abstract: Language representations are efficient tools used across NLP applications, but they are rife with encoded societal biases. These biases are studied extensively, but with a primary focus on English language representations and biases common in the context of Western society. In this work, we investigate biases present in Hindi language representations with focuses on caste and religion-associated…

    Submitted 9 May, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: 12 Pages (5 Pages main content+ 2 pages for references + 5 Pages Appendix)

  39. arXiv:2108.12084  [pdf, other]

    cs.CL cs.AI cs.LG

    Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies

    Authors: Sunipa Dev, Masoud Monajatipoor, Anaelia Ovalle, Arjun Subramonian, Jeff M Phillips, Kai-Wei Chang

    Abstract: Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understandin…

    Submitted 10 September, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

    Journal ref: EMNLP 2021

  40. arXiv:2108.03362  [pdf, other]

    cs.CL cs.CY

    On Measures of Biases and Harms in NLP

    Authors: Sunipa Dev, Emily Sheng, Jieyu Zhao, Aubrie Amstutz, Jiao Sun, Yu Hou, Mattie Sanseverino, Jiin Kim, Akihiro Nishi, Nanyun Peng, Kai-Wei Chang

    Abstract: Recent studies show that Natural Language Processing (NLP) technologies propagate societal biases about demographic groups associated with attributes such as gender, race, and nationality. To create interventions and mitigate these biases and associated harms, it is vital to be able to detect and measure such biases. While existing works propose bias evaluation and mitigation methods for various t…

    Submitted 13 October, 2022; v1 submitted 7 August, 2021; originally announced August 2021.

  41. arXiv:2108.01504  [pdf, other]

    cs.DB

    Ontology Modeling for Decentralized Household Energy Systems

    Authors: Jiantao Wu, Fabrizio Orlandi, Tarek AlSkaif, Declan O'Sullivan, Soumyabrata Dev

    Abstract: In a decentralized household energy system consisting of various devices such as washing machines, heat pumps, and solar panels, understanding the electric energy consumption and production data at the granularity of the device helps end-users be closer to the system and further achieve the sustainability of energy use. However, many datasets in this area are isolated from other domains with recor…

    Submitted 3 August, 2021; originally announced August 2021.

    Comments: Published in IEEE International Conference on Smart Energy Systems and Technologies (SEST), 2021

  42. arXiv:2108.01433  [pdf, other]

    cs.CY

    Impact of Load Demand Dataset Characteristics on Clustering Validation Indices

    Authors: Mayank Jain, Mukta Jain, Tarek AlSkaif, Soumyabrata Dev

    Abstract: With the inclusion of smart meters, electricity load consumption data can be fetched for individual consumer buildings at high temporal resolutions. The availability of such data has made it possible to study the daily load demand profiles of households. Clustering households based on their demand profiles is a primary and key component of such analysis. While many clustering algorithms/fr…

    Submitted 3 August, 2021; originally announced August 2021.

    Comments: Published in IEEE International Conference on Smart Energy Systems and Technologies (SEST), 2021

  43. arXiv:2106.03085  [pdf, other]

    cs.DB

    An Ontology Model for Climatic Data Analysis

    Authors: Jiantao Wu, Fabrizio Orlandi, Declan O'Sullivan, Soumyabrata Dev

    Abstract: Recently, ontologies have been exploited in a wide range of research areas for data modeling and data management. They greatly assist in defining the semantic model of the underlying data combined with domain knowledge. In this paper, we propose the Climate Analysis (CA) Ontology to model climate datasets used by remote sensing analysts. We use the data published by National Oceanic and Atmospheri…

    Submitted 6 June, 2021; originally announced June 2021.

    Comments: Published in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2021

  44. arXiv:2106.03064  [pdf, other]

    cs.CV eess.IV

    Using GANs to Augment Data for Cloud Image Segmentation Task

    Authors: Mayank Jain, Conor Meegan, Soumyabrata Dev

    Abstract: While cloud/sky image segmentation has extensive real-world applications, a large amount of labelled data is needed to train highly accurate models to perform the task. The scarcity of such volumes of cloud/sky images with corresponding ground-truth binary maps makes it highly difficult to train such complex image segmentation models. In this paper, we demonstrate the effectiveness of using Generati…

    Submitted 6 June, 2021; originally announced June 2021.

    Comments: Published in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2021

  45. arXiv:2105.10831  [pdf, other]

    cs.CV

    Stereo Matching Based on Visual Sensitive Information

    Authors: Hewei Wang, Muhammad Salman Pathan, Soumyabrata Dev

    Abstract: The area of computer vision is one of the most discussed topics amongst many scholars, and stereo matching is one of its most important subfields. After the parallax map is transformed into a depth map, it can be applied to many intelligent fields. In this paper, a stereo matching algorithm based on visual sensitive information is proposed by using standard images from the Middlebury dataset. Aiming at the…

    Submitted 22 May, 2021; originally announced May 2021.

    Comments: Published in 6th IEEE International Conference on Image, Vision and Computing (ICIVC), 2021

  46. arXiv:2105.08537  [pdf, other]

    cs.LG cs.CY

    A Clustering Framework for Residential Electric Demand Profiles

    Authors: Mayank Jain, Tarek AlSkaif, Soumyabrata Dev

    Abstract: The availability of residential electric demand profile data, enabled by the large-scale deployment of smart metering infrastructure, has made it possible to perform more accurate analysis of electricity consumption patterns. This paper analyses the electric demand profiles of individual households located in the city of Amsterdam, the Netherlands. A comprehensive clustering framework is defined to…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Published in Proc. International Conference on Smart Energy Systems and Technologies (SEST), 2020

  47. arXiv:2105.08390  [pdf, other]

    cs.HC

    3D Displays: Their Evolution, Inherent Challenges & Future Perspectives

    Authors: Xingyu Pan, Xuanhui Xu, Soumyabrata Dev, Abraham G Campbell

    Abstract: The popularity of 3D displays has risen drastically over the past few decades, but these displays are still merely a novelty compared to their true potential. Development has mostly focused on Head Mounted Displays (HMDs) for Virtual Reality and has in general ignored non-HMD 3D displays. This is due to the inherent difficulty in the creation of these displays and their impracticability…

    Submitted 18 May, 2021; originally announced May 2021.

    Comments: Published in Future Technologies Conference (FTC), 2021

  48. arXiv:2104.02797  [pdf, other]

    cs.CL cs.HC

    VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations

    Authors: Archit Rathore, Sunipa Dev, Jeff M. Phillips, Vivek Srikumar, Yan Zheng, Chin-Chia Michael Yeh, Junpeng Wang, Wei Zhang, Bei Wang

    Abstract: Word vector embeddings have been shown to contain and amplify biases in the data they are extracted from. Consequently, many techniques have been proposed to identify, mitigate, and attenuate these biases in word representations. In this paper, we utilize interactive visualization to increase the interpretability and accessibility of a collection of state-of-the-art debiasing techniques. To aid this,…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 11 pages

  49. arXiv:2101.00267  [pdf, other]

    cs.DC cs.PF

    Sage: Using Unsupervised Learning for Scalable Performance Debugging in Microservices

    Authors: Yu Gan, Mingyu Liang, Sundar Dev, David Lo, Christina Delimitrou

    Abstract: Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely-coupled microservices. Despite the advantages of modularity and elasticity microservices offer, they also complicate cluster management and performance debugging, as dependencies between tiers introduce backpressure and cascading QoS violations. We present Sage, a machine learning-driven root…

    Submitted 1 January, 2021; originally announced January 2021.

  50. arXiv:2011.12465  [pdf, other]

    cs.CL cs.AI cs.CG cs.DS

    The Geometry of Distributed Representations for Better Alignment, Attenuated Bias, and Improved Interpretability

    Authors: Sunipa Dev

    Abstract: High-dimensional representations for words, text, images, knowledge graphs and other structured data are commonly used in different paradigms of machine learning and data mining. These representations have different degrees of interpretability, with efficient distributed representations coming at the cost of the loss of feature to dimension mapping. This implies that there is obfuscation in the wa…

    Submitted 24 November, 2020; originally announced November 2020.

    Comments: PhD thesis, University of Utah (2020)