-
e-Health CSIRO at "Discharge Me!" 2024: Generating Discharge Summary Sections with Fine-tuned Language Models
Authors:
Jinghui Liu,
Aaron Nicolson,
Jason Dowling,
Bevan Koopman,
Anthony Nguyen
Abstract:
Clinical documentation is an important aspect of clinicians' daily work and often demands a significant amount of time. The BioNLP 2024 Shared Task on Streamlining Discharge Documentation (Discharge Me!) aims to alleviate this documentation burden by automatically generating discharge summary sections, including brief hospital course and discharge instruction, which are often time-consuming to synthesize and write manually. We approach the generation task by fine-tuning multiple open-sourced language models (LMs), including both decoder-only and encoder-decoder LMs, with various configurations on input context. We also examine different setups for decoding algorithms, model ensembling or merging, and model specialization. Our results show that conditioning on the content of the discharge summary prior to the target sections is effective for the generation task. Furthermore, we find that smaller encoder-decoder LMs can work as well as or even slightly better than larger decoder-based LMs fine-tuned through LoRA. The model checkpoints from our team (aehrc) are openly available.
Submitted 2 July, 2024;
originally announced July 2024.
-
Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation
Authors:
Xinyu Mao,
Shengyao Zhuang,
Bevan Koopman,
Guido Zuccon
Abstract:
The goal of screening prioritisation in systematic reviews is to identify relevant documents with high recall and rank them in early positions for review. This saves reviewing effort if paired with a stopping criterion, and speeds up review completion if performed alongside downstream tasks. Recent studies have shown that neural models have good potential on this task, but their time-consuming fine-tuning and inference discourage their widespread use for screening prioritisation. In this paper, we propose an alternative approach that still relies on neural models, but leverages dense representations and relevance feedback to enhance screening prioritisation, without the need for costly model fine-tuning and inference. This method exploits continuous relevance feedback from reviewers during document screening to efficiently update the dense query representation, which is then applied to rank the remaining documents to be screened. We evaluate this approach across the CLEF TAR datasets for this task. Results suggest that the investigated dense query-driven approach is more efficient than directly using neural models and shows promising effectiveness compared to previous methods developed on the considered datasets. Our code is available at https://github.com/ielab/dense-screening-feedback.
Submitted 30 June, 2024;
originally announced July 2024.
-
The Simons Observatory: Deployment and current configuration of the Observatory Control System for SAT-MF1 and data access software systems
Authors:
Sanah Bhimani,
Jack Lashner,
Simone Aiola,
Kevin T. Crowley,
Nicholas Galitzki,
Kathleen Harrington,
Matthew Hasselfield,
Alyssa Johnson,
Brian J. Koopman,
Hironobu Nakata,
Laura Newburgh,
David V. Nguyen,
Michael J. Randall,
Max Silva-Feaver
Abstract:
The Simons Observatory (SO) is a Cosmic Microwave Background experiment located in the Atacama Desert in Chile. SO consists of three small aperture telescopes (SATs) and one large aperture telescope (LAT) with a total of 60,000 detectors in six frequency bands. As an observatory, SO encompasses hundreds of hardware components simultaneously running at different readout rates, all separate from its 60,000 detectors on-sky and their metadata. We provide an overview of commissioning SO's data acquisition software system for SAT-MF1, the first SAT deployed to the Atacama site. Additionally, we share insights from deploying data access software for all four telescopes, detailing how performance limitations affected data loading and quality investigations, which led to site-compatible software improvements.
Submitted 27 June, 2024;
originally announced June 2024.
-
The Simons Observatory: Deployment of the observatory control system and supporting infrastructure
Authors:
Brian J. Koopman,
Sanah Bhimani,
Nicholas Galitzki,
Matthew Hasselfield,
Jack Lashner,
Hironobu Nakata,
Laura Newburgh,
David V. Nguyen,
Tai Sakuma,
Kyohei Yamada
Abstract:
The Simons Observatory (SO) is a cosmic microwave background (CMB) observatory consisting of three small aperture telescopes and one large aperture telescope. SO is located in the Atacama Desert in Chile at an elevation of 5180 m. Distributed among the four telescopes are over 60,000 transition-edge sensor (TES) bolometers across six spectral bands centered between 27 and 280 GHz. A large collection of ancillary hardware devices, which produce lower-rate 'housekeeping' data, is used to support the detector data collection.
We developed a distributed control system, which we call the observatory control system (ocs), to coordinate data collection among all systems within the observatory. ocs is a core component of the deployed site software, interfacing with all on-site hardware. Alongside ocs we utilize a combination of internally and externally developed open source projects to enable remote monitoring, data management, observation coordination, and data processing.
Deployment of a majority of the software is done using Docker containers. The deployment of software packages is partially done via automated Ansible scripts, utilizing a GitOps based approach for updating infrastructure on site. We describe an overview of the software and computing systems deployed within SO, including how those systems are deployed and interact with each other. We also discuss the timing distribution system and its configuration as well as lessons learned during the deployment process and where we plan to make future improvements.
Submitted 21 June, 2024;
originally announced June 2024.
-
The Simons Observatory: Alarms and Detector Quality Monitoring
Authors:
David V. Nguyen,
Sanah Bhimani,
Nicholas Galitzki,
Brian J. Koopman,
Jack Lashner,
Laura Newburgh,
Max Silva-Feaver,
Kyohei Yamada
Abstract:
The Simons Observatory (SO) is a group of modern telescopes dedicated to observing the polarized cosmic microwave background (CMB), transients, and more. The Observatory consists of four telescopes and instruments, with over 60,000 superconducting detectors in total, located at ~5,200 m altitude in the Atacama Desert of Chile. During observations, it is important to ensure the detectors, telescope platforms, calibration and receiver hardware, and site hardware are within operational bounds. To facilitate rapid response when problems arise with any part of the system, it is essential that alerts are generated and distributed to appropriate personnel if components exceed these bounds. Similarly, alerts are generated if the quality of the data has become degraded. In this paper, we describe the SO alarm system we developed within the larger Observatory Control System (OCS) framework, including the data sources, alert architecture, and implementation. We also present results from deploying the alarm system during the commissioning of the SO telescopes and receivers.
Submitted 20 June, 2024;
originally announced June 2024.
-
The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It
Authors:
Aaron Nicolson,
Shengyao Zhuang,
Jason Dowling,
Bevan Koopman
Abstract:
This study investigates the integration of diverse patient data sources into multimodal language models for automated chest X-ray (CXR) report generation. Traditionally, CXR report generation relies solely on CXR images and limited radiology data, overlooking valuable information from patient health records, particularly from emergency departments. Utilising the MIMIC-CXR and MIMIC-IV-ED datasets, we incorporate detailed patient information such as aperiodic vital signs, medications, and clinical history to enhance diagnostic accuracy. We introduce a novel approach to transform these heterogeneous data sources into embeddings that prompt a multimodal language model, significantly enhancing the diagnostic accuracy of generated radiology reports. Our comprehensive evaluation demonstrates the benefits of using a broader set of patient data, underscoring the potential for enhanced diagnostic capabilities and better patient outcomes through the integration of multimodal data in CXR report generation.
Submitted 18 June, 2024;
originally announced June 2024.
-
Simons Observatory: Observatory Scheduler and Automated Data Processing
Authors:
Yilun Guan,
Kathleen Harrington,
Jack Lashner,
Sanah Bhimani,
Kevin T. Crowley,
Nicholas Galitzki,
Ken Ganga,
Matthew Hasselfield,
Adam D. Hincks,
Brian Keating,
Brian J. Koopman,
Laura Newburgh,
David V. Nguyen,
Max Silva-Feaver
Abstract:
The Simons Observatory (SO) is a next-generation ground-based telescope located in the Atacama Desert in Chile, designed to map the cosmic microwave background (CMB) with unprecedented precision. The observatory consists of three small aperture telescopes (SATs) and one large aperture telescope (LAT), each optimized for distinct but complementary scientific goals. To achieve these goals, optimized scan strategies have been defined for both the SATs and LAT. This paper describes a software system deployed in SO that effectively translates high-level scan strategies into realistic observing scripts executable by the telescope, taking into account realistic observational constraints. The data volume of SO also necessitates a scalable software infrastructure to support its daily data processing needs. This paper also outlines an automated workflow system for managing data packaging and daily data reduction at the site.
Submitted 16 June, 2024;
originally announced June 2024.
-
A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking
Authors:
Ferdinand Schlatt,
Maik Fröbe,
Harrisen Scells,
Shengyao Zhuang,
Bevan Koopman,
Guido Zuccon,
Benno Stein,
Martin Potthast,
Matthias Hagen
Abstract:
Cross-encoders distilled from large language models (LLMs) are often more effective re-rankers than cross-encoders fine-tuned on manually labeled data. However, the distilled models usually do not reach their teacher LLM's effectiveness. To investigate whether best practices for fine-tuning cross-encoders on manually labeled data (e.g., hard-negative sampling, deep sampling, and listwise loss functions) can help to improve LLM ranker distillation, we construct and release a new distillation dataset: Rank-DistiLLM. In our experiments, cross-encoders trained on Rank-DistiLLM reach the effectiveness of LLMs while being orders of magnitude more efficient. Our code and data are available at https://github.com/webis-de/msmarco-llm-distillation.
Submitted 16 June, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
The Simons Observatory: Design, integration, and testing of the small aperture telescopes
Authors:
Nicholas Galitzki,
Tran Tsan,
Jake Spisak,
Michael Randall,
Max Silva-Feaver,
Joseph Seibert,
Jacob Lashner,
Shunsuke Adachi,
Sean M. Adkins,
Thomas Alford,
Kam Arnold,
Peter C. Ashton,
Jason E. Austermann,
Carlo Baccigalupi,
Andrew Bazarko,
James A. Beall,
Sanah Bhimani,
Bryce Bixler,
Gabriele Coppi,
Lance Corbett,
Kevin D. Crowley,
Kevin T. Crowley,
Samuel Day-Weiss,
Simon Dicker,
Peter N. Dow
, et al. (55 additional authors not shown)
Abstract:
The Simons Observatory (SO) is a cosmic microwave background (CMB) survey experiment that includes small-aperture telescopes (SATs) observing from an altitude of 5,200 m in the Atacama Desert in Chile. The SO SATs will cover six spectral bands between 27 and 280 GHz to search for primordial B-modes to a sensitivity of $σ(r)=0.002$, with quantified systematic errors well below this value. Each SAT is a self-contained cryogenic telescope with a 35$^\circ$ field of view, 42 cm diameter optical aperture, 40 K half-wave plate, 1 K refractive optics, and $<0.1$ K focal plane that holds $>12,000$ TES detectors. We describe the nominal design of the SATs and present details about the integration and testing for one operating at 93 and 145 GHz.
Submitted 10 May, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval
Authors:
Shengyao Zhuang,
Xueguang Ma,
Bevan Koopman,
Jimmy Lin,
Guido Zuccon
Abstract:
Utilizing large language models (LLMs) for zero-shot document ranking is done in one of two ways: 1) prompt-based re-ranking methods, which require no further training but are only feasible for re-ranking a handful of candidate documents due to computational costs; and 2) unsupervised contrastive trained dense retrieval methods, which can retrieve relevant documents from the entire corpus but require a large amount of paired text data for contrastive training. In this paper, we propose PromptReps, which combines the advantages of both categories: no need for training and the ability to retrieve from the whole corpus. Our method only requires prompts to guide an LLM to generate query and document representations for effective document retrieval. Specifically, we prompt the LLMs to represent a given text using a single word, and then use the last token's hidden states and the corresponding logits associated with the prediction of the next token to construct a hybrid document retrieval system. The retrieval system harnesses both dense text embedding and sparse bag-of-words representations given by the LLM. We further explore variations of this core idea that consider the generation of multiple words, and representations that rely on multiple embeddings and sparse distributions. Our experimental evaluation on the MSMARCO, TREC deep learning and BEIR zero-shot document retrieval datasets illustrates that this simple prompt-based LLM retrieval method can achieve a similar or higher retrieval effectiveness than state-of-the-art LLM embedding methods that are trained with large amounts of unsupervised data, especially when using a larger LLM.
Submitted 16 June, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders
Authors:
Ferdinand Schlatt,
Maik Fröbe,
Harrisen Scells,
Shengyao Zhuang,
Bevan Koopman,
Guido Zuccon,
Benno Stein,
Martin Potthast,
Matthias Hagen
Abstract:
Existing cross-encoder re-rankers can be categorized as pointwise, pairwise, or listwise models. Pair- and listwise models allow passage interactions, which usually makes them more effective than pointwise models but also less efficient and less robust to input order permutations. To enable efficient permutation-invariant passage interactions during re-ranking, we propose a new cross-encoder architecture with inter-passage attention: the Set-Encoder. In Cranfield-style experiments on TREC Deep Learning and TIREx, the Set-Encoder is as effective as state-of-the-art listwise models while improving efficiency and robustness to input permutations. Interestingly, a pointwise model is similarly effective, but when additionally requiring the models to consider novelty, the Set-Encoder is more effective than its pointwise counterpart and retains its advantageous properties compared to other listwise models. Our code and models are publicly available at https://github.com/webis-de/set-encoder.
Submitted 16 June, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Optical modeling of systematic uncertainties in detector polarization angles for the Atacama Cosmology Telescope
Authors:
Colin C. Murphy,
Steve K. Choi,
Rahul Datta,
Mark J. Devlin,
Matthew Hasselfield,
Brian J. Koopman,
Jeff McMahon,
Sigurd Naess,
Michael D. Niemack,
Lyman A. Page,
Suzanne T. Staggs,
Robert Thornton,
Edward J. Wollack
Abstract:
We present an estimate of the Atacama Cosmology Telescope (ACT) detector polarization angle systematic uncertainty from optics perturbation analysis using polarization-sensitive ray tracing in CODE V optical design software. Uncertainties in polarization angle calibration in CMB measurements can limit constraints on cosmic birefringence and other cosmological measurements. Our framework estimates the angle calibration systematic uncertainties from possible displacements in lens positions and orientations, and anti-reflection coating (ARC) thicknesses and refractive indices. With millimeter displacements in lens positions and percent-level perturbations in ARC thicknesses and indices from design, we find the total systematic uncertainty for three ACT detector arrays operating between 90 and 220 GHz to be at the tenth-of-a-degree scale. Reduced lens position and orientation uncertainties from physical measurements could lead to a reduction in the systematic uncertainty estimated with the framework presented here. This optical modeling can inform polarization angle systematic uncertainties for current and future microwave polarimeters, such as the CCAT Observatory, Simons Observatory, and CMB-S4.
Submitted 1 March, 2024;
originally announced March 2024.
-
Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems
Authors:
Shengyao Zhuang,
Bevan Koopman,
Xiaoran Chu,
Guido Zuccon
Abstract:
The introduction of Vec2Text, a technique for inverting text embeddings, has raised serious privacy concerns within dense retrieval systems utilizing text embeddings, including those provided by OpenAI and Cohere. This threat comes from the ability for a malicious attacker with access to text embeddings to reconstruct the original text.
In this paper, we investigate various aspects of embedding models that could influence the recoverability of text using Vec2Text. Our exploration involves factors such as distance metrics, pooling functions, bottleneck pre-training, training with noise addition, embedding quantization, and embedding dimensions, aspects not previously addressed in the original Vec2Text paper. Through a thorough analysis of these factors, our aim is to gain a deeper understanding of the critical elements impacting the trade-offs between text recoverability and retrieval effectiveness in dense retrieval systems. This analysis provides valuable insights for practitioners involved in designing privacy-aware dense retrieval systems. Additionally, we propose a straightforward fix for embedding transformation that ensures equal ranking effectiveness while mitigating the risk of text recoverability.
Furthermore, we extend the application of Vec2Text to the separate task of corpus poisoning, where, theoretically, Vec2Text presents a more potent threat compared to previous attack methods. Notably, Vec2Text does not require access to the dense retriever's model parameters and can efficiently generate numerous adversarial passages.
In summary, this study highlights the potential threat posed by Vec2Text to existing dense retrieval systems, while also presenting effective methods to patch and strengthen such systems against such risks.
Submitted 20 February, 2024;
originally announced February 2024.
-
ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search
Authors:
Shuai Wang,
Shengyao Zhuang,
Bevan Koopman,
Guido Zuccon
Abstract:
Federated search, which involves integrating results from multiple independent search engines, will become increasingly pivotal in the context of Retrieval-Augmented Generation pipelines empowering LLM-based applications such as chatbots. These systems often distribute queries among various search engines, ranging from specialized (e.g., PubMed) to general (e.g., Google), based on the nature of user utterances. A critical aspect of federated search is resource selection: the selection of appropriate resources prior to issuing the query to ensure high-quality and rapid responses, and to contain costs associated with calling the external search engines. However, current SOTA resource selection methodologies primarily rely on feature-based learning approaches. These methods often involve the labour-intensive and expensive creation of training labels for each resource. In contrast, LLMs have exhibited strong effectiveness as zero-shot methods across NLP and IR tasks. We hypothesise that, in the context of federated search, LLMs can assess the relevance of resources without the need for extensive predefined labels or features. In this paper, we propose ReSLLM. Our ReSLLM method exploits LLMs to drive the selection of resources in federated search in a zero-shot setting. In addition, we devise an unsupervised fine-tuning protocol, the Synthetic Label Augmentation Tuning (SLAT), where the relevance of previously logged queries and snippets from resources is predicted using an off-the-shelf LLM and then in turn used to fine-tune ReSLLM with respect to resource selection. Our empirical evaluation and analysis detail the factors influencing the effectiveness of LLMs in this context. The results showcase the merits of ReSLLM for resource selection: not only competitive effectiveness in the zero-shot setting, but also large gains when fine-tuned using the SLAT protocol.
Submitted 31 January, 2024;
originally announced January 2024.
-
TPRF: A Transformer-based Pseudo-Relevance Feedback Model for Efficient and Effective Retrieval
Authors:
Chuting Yu,
Hang Li,
Ahmed Mourad,
Bevan Koopman,
Guido Zuccon
Abstract:
This paper considers Pseudo-Relevance Feedback (PRF) methods for dense retrievers in a resource constrained environment such as that of cheap cloud instances or embedded systems (e.g., smartphones and smartwatches), where memory and CPU are limited and GPUs are not present. For this, we propose a transformer-based PRF method (TPRF), which has a much smaller memory footprint and faster inference time compared to other deep language models that employ PRF mechanisms, with a marginal effectiveness loss. TPRF learns how to effectively combine the relevance feedback signals from dense passage representations. Specifically, TPRF provides a mechanism for modelling relationships and weights between the query and the relevance feedback signals. The method is agnostic to the specific dense representation used and thus can be generally applied to any dense retriever.
Submitted 24 January, 2024;
originally announced January 2024.
-
A Reproducibility Study of Goldilocks: Just-Right Tuning of BERT for TAR
Authors:
Xinyu Mao,
Bevan Koopman,
Guido Zuccon
Abstract:
Screening documents is a tedious and time-consuming aspect of high-recall retrieval tasks, such as compiling a systematic literature review, where the goal is to identify all relevant documents for a topic. To help streamline this process, many Technology-Assisted Review (TAR) methods leverage active learning techniques to reduce the number of documents requiring review. BERT-based models have shown high effectiveness in text classification, leading to interest in their potential use in TAR workflows. In this paper, we investigate recent work that examined the impact of further pre-training epochs on the effectiveness and efficiency of a BERT-based active learning pipeline. We first report that we could replicate the original experiments on two specific TAR datasets, confirming some of the findings: importantly, that further pre-training is critical to high effectiveness, but requires attention in terms of selecting the correct training epoch. We then investigate the generalisability of the pipeline on a different TAR task, that of medical systematic reviews. In this context, we show that there is no need for further pre-training if a domain-specific BERT backbone is used within the active learning pipeline. This finding provides practical implications for using the studied active learning pipeline within domain-specific TAR tasks.
Submitted 15 January, 2024;
originally announced January 2024.
-
Zero-shot Generative Large Language Models for Systematic Review Screening Automation
Authors:
Shuai Wang,
Harrisen Scells,
Shengyao Zhuang,
Martin Potthast,
Bevan Koopman,
Guido Zuccon
Abstract:
Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions. Conducting such reviews is often resource- and time-intensive, especially in the screening phase, where abstracts of publications are assessed for inclusion in a review. This study investigates the effectiveness of using zero-shot large language models (LLMs) for automatic screening. We evaluate the effectiveness of eight different LLMs and investigate a calibration technique that uses a predefined recall threshold to determine whether a publication should be included in a systematic review. Our comprehensive evaluation using five standard test collections shows that instruction fine-tuning plays an important role in screening, that calibration renders LLMs practical for achieving a targeted recall, and that combining both with an ensemble of zero-shot models saves significant screening time compared to state-of-the-art approaches.
Submitted 31 January, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
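The recall-threshold calibration mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each publication already has a numeric inclusion score and that a small set of known-relevant publications is available for calibration; the function names `calibrate_threshold` and `screen` are hypothetical, and this is not necessarily the procedure the authors use.

```python
import math


def calibrate_threshold(relevant_scores, target_recall):
    """Pick the highest score cut-off that still keeps at least
    `target_recall` of the known-relevant calibration items."""
    if not relevant_scores:
        raise ValueError("need at least one relevant score")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    ranked = sorted(relevant_scores, reverse=True)
    # Number of relevant items that must score at or above the cut-off.
    needed = max(1, math.ceil(target_recall * len(ranked)))
    return ranked[needed - 1]


def screen(scores, threshold):
    """Include a publication when its score reaches the calibrated cut-off."""
    return [score >= threshold for score in scores]


# Hypothetical scores for four known-relevant calibration abstracts.
threshold = calibrate_threshold([0.9, 0.8, 0.4, 0.2], target_recall=0.75)
decisions = screen([0.5, 0.3], threshold)
```

Lowering the cut-off trades screening savings for recall, which is why the threshold is fixed from the recall target rather than chosen by hand.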
-
Team IELAB at TREC Clinical Trial Track 2023: Enhancing Clinical Trial Retrieval with Neural Rankers and Large Language Models
Authors:
Shengyao Zhuang,
Bevan Koopman,
Guido Zuccon
Abstract:
We describe team ielab from CSIRO and The University of Queensland's approach to the 2023 TREC Clinical Trials Track. Our approach was to use neural rankers but to utilise Large Language Models to overcome the issue of lack of training data for such rankers. Specifically, we employ ChatGPT to generate relevant patient descriptions for randomly selected clinical trials from the corpus. This synthetic dataset, combined with human-annotated training data from previous years, is used to train both dense and sparse retrievers based on PubmedBERT. Additionally, a cross-encoder re-ranker is integrated into the system. To further enhance the effectiveness of our approach, we prompt GPT-4 as a TREC annotator to provide judgments on our run files. These judgments are subsequently employed to re-rank the results. This architecture tightly integrates strong PubmedBERT-based rankers with the aid of SOTA Large Language Models, demonstrating a new approach to clinical trial retrieval.
Submitted 3 January, 2024;
originally announced January 2024.
-
Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking
Authors:
Shengyao Zhuang,
Bing Liu,
Bevan Koopman,
Guido Zuccon
Abstract:
In the field of information retrieval, Query Likelihood Models (QLMs) rank documents based on the probability of generating the query given the content of a document. Recently, advanced large language models (LLMs) have emerged as effective QLMs, showcasing promising ranking capabilities. This paper focuses on investigating the genuine zero-shot ranking effectiveness of recent LLMs, which are solely pre-trained on unstructured text data without supervised instruction fine-tuning. Our findings reveal the robust zero-shot ranking ability of such LLMs, highlighting that additional instruction fine-tuning may hinder effectiveness unless a question generation task is present in the fine-tuning dataset. Furthermore, we introduce a novel state-of-the-art ranking system that integrates LLM-based QLMs with a hybrid zero-shot retriever, demonstrating exceptional effectiveness in both zero-shot and few-shot scenarios. We make our codebase publicly available at https://github.com/ielab/llm-qlm.
Submitted 19 October, 2023;
originally announced October 2023.
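The query likelihood idea described in the abstract above, ranking documents by the probability of generating the query from each document, can be sketched with a toy model. This sketch swaps the LLM for an add-alpha smoothed unigram model, so it only illustrates the scoring principle; the names `query_log_likelihood` and `rank` are hypothetical and the paper's own code lives in its repository.

```python
import math
from collections import Counter


def query_log_likelihood(query, document, vocab_size, alpha=1.0):
    """Log-probability of the query under a smoothed unigram model of the document."""
    doc_tokens = document.lower().split()
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    score = 0.0
    for term in query.lower().split():
        # Add-alpha smoothing keeps unseen query terms from zeroing the score.
        score += math.log((counts[term] + alpha) / (total + alpha * vocab_size))
    return score


def rank(query, documents):
    """Order documents by decreasing query log-likelihood."""
    vocab = {t for d in documents for t in d.lower().split()}
    vocab |= set(query.lower().split())
    scored = [(query_log_likelihood(query, d, len(vocab)), d) for d in documents]
    return [d for _, d in sorted(scored, key=lambda pair: pair[0], reverse=True)]


docs = [
    "telescopes observe the microwave sky",
    "aspirin reduces heart attack risk",
]
ranking = rank("aspirin heart", docs)
```

With an LLM in place of the unigram model, the per-term probabilities would come from the model's token likelihoods for the query conditioned on the document text.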
-
A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models
Authors:
Shengyao Zhuang,
Honglei Zhuang,
Bevan Koopman,
Guido Zuccon
Abstract:
We propose a novel zero-shot document ranking approach based on Large Language Models (LLMs): the Setwise prompting approach. Our approach complements existing prompting approaches for LLM-based zero-shot ranking: Pointwise, Pairwise, and Listwise. Through the first-of-its-kind comparative evaluation within a consistent experimental framework and considering factors like model size, token consumption, latency, among others, we show that existing approaches are inherently characterised by trade-offs between effectiveness and efficiency. We find that while Pointwise approaches score high on efficiency, they suffer from poor effectiveness. Conversely, Pairwise approaches demonstrate superior effectiveness but incur high computational overhead. Our Setwise approach, instead, reduces the number of LLM inferences and the amount of prompt token consumption during the ranking procedure, compared to previous methods. This significantly improves the efficiency of LLM-based zero-shot ranking, while also retaining high zero-shot ranking effectiveness. We make our code and results publicly available at https://github.com/ielab/llm-rankers.
Submitted 30 May, 2024; v1 submitted 14 October, 2023;
originally announced October 2023.
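Why comparing a set of documents per inference reduces the number of LLM calls, as the abstract above claims, can be seen in a rough sketch. It assumes a hypothetical `pick_best` callable that stands in for one LLM inference returning the most relevant item of a small group; the selection loop is an illustration under that assumption, not the authors' algorithm.

```python
def select_top(docs, pick_best, set_size=3):
    """Find the best document by repeatedly comparing small sets.

    `pick_best` is a stand-in for a single LLM call that returns the most
    relevant item from the group it is given (an assumption of this sketch).
    Returns the winner and the number of calls made.
    """
    if not docs:
        raise ValueError("docs must not be empty")
    if set_size < 2:
        raise ValueError("set_size must be at least 2")
    best = docs[0]
    calls = 0
    i = 1
    while i < len(docs):
        # The current winner competes against the next (set_size - 1) documents.
        group = [best] + docs[i:i + set_size - 1]
        best = pick_best(group)
        calls += 1
        i += set_size - 1
    return best, calls


# Toy relevance: larger numbers are "more relevant", so max() plays the LLM.
docs = list(range(10))
pairwise_best, pairwise_calls = select_top(docs, max, set_size=2)
setwise_best, setwise_calls = select_top(docs, max, set_size=4)
```

In this toy setting both runs find the same winner, but the set size of 4 needs 3 calls where the pairwise setting needs 9, since each call eliminates `set_size - 1` candidates.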
-
The Simons Observatory: Cryogenic Half Wave Plate Rotation Mechanism for the Small Aperture Telescopes
Authors:
K. Yamada,
B. Bixler,
Y. Sakurai,
P. C. Ashton,
J. Sugiyama,
K. Arnold,
J. Begin,
L. Corbett,
S. Day-Weiss,
N. Galitzki,
C. A. Hill,
B. R. Johnson,
B. Jost,
A. Kusaka,
B. J. Koopman,
J. Lashner,
A. T. Lee,
A. Mangu,
H. Nishino,
L. A. Page,
M. J. Randall,
D. Sasaki,
X. Song,
J. Spisak,
T. Tsan, et al. (2 additional authors not shown)
Abstract:
We present the requirements, design and evaluation of the cryogenic continuously rotating half-wave plate (CHWP) for the Simons Observatory (SO). SO is a cosmic microwave background (CMB) polarization experiment at Parque Astronómico Atacama in northern Chile that covers a wide range of angular scales using both small (0.42 m) and large (6 m) aperture telescopes. In particular, the small aperture telescopes (SATs) focus on large angular scales for primordial B-mode polarization. To this end, the SATs employ a CHWP to modulate the polarization of the incident light at 8 Hz, suppressing atmospheric $1/f$ noise and mitigating systematic uncertainties that would otherwise arise due to the differential response of detectors sensitive to orthogonal polarizations. The CHWP consists of a 505 mm diameter achromatic sapphire HWP and a cryogenic rotation mechanism, both of which are cooled down to $\sim$50 K to reduce detector thermal loading. Under normal operation the HWP is suspended by a superconducting magnetic bearing and rotates with a constant 2 Hz frequency, controlled by an electromagnetic synchronous motor. The rotation angle is detected through an angular encoder with a noise level of $0.07\,\mu\mathrm{rad}\sqrt{\mathrm{s}}$. During a cooldown, the rotor is held in place by a grip-and-release mechanism that serves as both an alignment device and a thermal path. In this paper we provide an overview of the SO SAT CHWP: its requirements, hardware design, and laboratory performance.
Submitted 26 September, 2023;
originally announced September 2023.
-
ChatGPT Hallucinates when Attributing Answers
Authors:
Guido Zuccon,
Bevan Koopman,
Razia Shaik
Abstract:
Can ChatGPT provide evidence to support its answers? Does the evidence it suggests actually exist and does it really support its answer? We investigate these questions using a collection of domain-specific knowledge-based questions, specifically prompting ChatGPT to provide both an answer and supporting evidence in the form of references to external sources. We also investigate how different prompts impact answers and evidence. We find that ChatGPT provides correct or partially correct answers in about half of the cases (50.6% of the time), but its suggested references exist only 14% of the time. We further provide insights on the generated references that reveal common traits among the references that ChatGPT generates, and show that even if a reference provided by the model does exist, this reference often does not support the claims ChatGPT attributes to it. Our findings are important because (1) they are the first systematic analysis of the references created by ChatGPT in its answers; (2) they suggest that the model may leverage good quality information in producing correct answers, but is unable to attribute real evidence to support its answers. Prompts, raw result files and manual analysis are made publicly available.
Submitted 17 September, 2023;
originally announced September 2023.
-
Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation
Authors:
Shuai Wang,
Harrisen Scells,
Martin Potthast,
Bevan Koopman,
Guido Zuccon
Abstract:
Screening prioritisation in medical systematic reviews aims to rank the set of documents retrieved by complex Boolean queries. Prioritising the most important documents ensures that subsequent review steps can be carried out more efficiently and effectively. The current state of the art uses the final title of the review as a query to rank the documents using BERT-based neural rankers. However, the final title is only formulated at the end of the review process, which makes this approach impractical as it relies on ex post facto information. At the time of screening, only a rough working title is available, with which the BERT-based ranker performs significantly worse than with the final title. In this paper, we explore alternative sources of queries for prioritising screening, such as the Boolean query used to retrieve the documents to be screened and queries generated by instruction-based generative large-scale language models such as ChatGPT and Alpaca. Our best approach is not only viable based on the information available at the time of screening, but also has similar effectiveness to the final title.
Submitted 23 November, 2023; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation
Authors:
Aaron Nicolson,
Jason Dowling,
Bevan Koopman
Abstract:
Radiologists face high burnout rates, partially due to the increasing volume of Chest X-rays (CXRs) requiring interpretation and reporting. Automated CXR report generation holds promise for reducing this burden and improving patient care. While current models show potential, their diagnostic accuracy is limited. Our proposed CXR report generator integrates elements of the radiologist workflow and introduces a novel reward for reinforcement learning. Our approach leverages longitudinal data from a patient's prior CXR study and effectively handles cases where no prior study exists, thus mirroring the radiologist's workflow. In contrast, existing models typically lack this flexibility, often requiring prior studies for the model to function optimally. Our approach also incorporates all CXRs from a patient's study and distinguishes between report sections through section embeddings. Our reward for reinforcement learning leverages CXR-BERT, which forces our model to learn the clinical semantics of radiology reporting. We conduct experiments on publicly available datasets -- MIMIC-CXR and Open-i IU X-ray -- with metrics shown to more closely correlate with radiologists' assessment of reporting. Results from our study demonstrate that the proposed model generates reports that are more aligned with radiologists' reports than state-of-the-art models, such as those utilising large language models, reinforcement learning, and multi-task learning. The proposed model improves the diagnostic accuracy of CXR report generation, which could one day reduce radiologists' workload and enhance patient care. Our Hugging Face checkpoint (https://huggingface.co/aehrc/cxrmate) and code (https://github.com/aehrc/cxrmate) are publicly available.
Submitted 18 June, 2024; v1 submitted 19 July, 2023;
originally announced July 2023.
-
The Atacama Cosmology Telescope: High-resolution component-separated maps across one-third of the sky
Authors:
William R. Coulton,
Mathew S. Madhavacheril,
Adriaan J. Duivenvoorden,
J. Colin Hill,
Irene Abril-Cabezas,
Peter A. R. Ade,
Simone Aiola,
Tommy Alford,
Mandana Amiri,
Stefania Amodeo,
Rui An,
Zachary Atkins,
Jason E. Austermann,
Nicholas Battaglia,
Elia Stefano Battistelli,
James A. Beall,
Rachel Bean,
Benjamin Beringue,
Tanay Bhandarkar,
Emily Biermann,
Boris Bolliet,
J Richard Bond,
Hongbo Cai,
Erminia Calabrese,
Victoria Calafut, et al. (129 additional authors not shown)
Abstract:
Observations of the millimeter sky contain valuable information on a number of signals, including the blackbody cosmic microwave background (CMB), Galactic emissions, and the Compton-$y$ distortion due to the thermal Sunyaev-Zel'dovich (tSZ) effect. Extracting new insight into cosmological and astrophysical questions often requires combining multi-wavelength observations to spectrally isolate one component. In this work, we present a new arcminute-resolution Compton-$y$ map, which traces out the line-of-sight-integrated electron pressure, as well as maps of the CMB in intensity and E-mode polarization, across a third of the sky (around 13,000 sq. deg.). We produce these through a joint analysis of data from the Atacama Cosmology Telescope (ACT) Data Release 4 and 6 at frequencies of roughly 93, 148, and 225 GHz, together with data from the Planck satellite at frequencies between 30 GHz and 545 GHz. We present detailed verification of an internal linear combination pipeline implemented in a needlet frame that allows us to efficiently suppress Galactic contamination and account for spatial variations in the ACT instrument noise. These maps provide a significant advance, in noise levels and resolution, over the existing Planck component-separated maps and will enable a host of science goals including studies of cluster and galaxy astrophysics, inferences of the cosmic velocity field, primordial non-Gaussianity searches, and gravitational lensing reconstruction of the CMB.
Submitted 3 July, 2023;
originally announced July 2023.
-
The Atacama Cosmology Telescope: DR6 Gravitational Lensing Map and Cosmological Parameters
Authors:
Mathew S. Madhavacheril,
Frank J. Qu,
Blake D. Sherwin,
Niall MacCrann,
Yaqiong Li,
Irene Abril-Cabezas,
Peter A. R. Ade,
Simone Aiola,
Tommy Alford,
Mandana Amiri,
Stefania Amodeo,
Rui An,
Zachary Atkins,
Jason E. Austermann,
Nicholas Battaglia,
Elia Stefano Battistelli,
James A. Beall,
Rachel Bean,
Benjamin Beringue,
Tanay Bhandarkar,
Emily Biermann,
Boris Bolliet,
J Richard Bond,
Hongbo Cai,
Erminia Calabrese, et al. (134 additional authors not shown)
Abstract:
We present cosmological constraints from a gravitational lensing mass map covering 9400 sq. deg. reconstructed from CMB measurements made by the Atacama Cosmology Telescope (ACT) from 2017 to 2021. In combination with BAO measurements (from SDSS and 6dF), we obtain the amplitude of matter fluctuations $σ_8 = 0.819 \pm 0.015$ at 1.8% precision, $S_8\equivσ_8({Ω_{\rm m}}/0.3)^{0.5}=0.840\pm0.028$ and the Hubble constant $H_0= (68.3 \pm 1.1)\, \text{km}\,\text{s}^{-1}\,\text{Mpc}^{-1}$ at 1.6% precision. A joint constraint with CMB lensing measured by the Planck satellite yields even more precise values: $σ_8 = 0.812 \pm 0.013$, $S_8\equivσ_8({Ω_{\rm m}}/0.3)^{0.5}=0.831\pm0.023$ and $H_0= (68.1 \pm 1.0)\, \text{km}\,\text{s}^{-1}\,\text{Mpc}^{-1}$. These measurements agree well with $Λ$CDM-model extrapolations from the CMB anisotropies measured by Planck. To compare these constraints to those from the KiDS, DES, and HSC galaxy surveys, we revisit those data sets with a uniform set of assumptions, and find $S_8$ from all three surveys are lower than that from ACT+Planck lensing by varying levels ranging from 1.7-2.1$σ$. These results motivate further measurements and comparison, not just between the CMB anisotropies and galaxy lensing, but also between CMB lensing probing $z\sim 0.5-5$ on mostly-linear scales and galaxy lensing at $z\sim 0.5$ on smaller scales. We combine our CMB lensing measurements with CMB anisotropies to constrain extensions of $Λ$CDM, limiting the sum of the neutrino masses to $\sum m_ν < 0.12$ eV (95% c.l.), for example. Our results provide independent confirmation that the universe is spatially flat, conforms with general relativity, and is described remarkably well by the $Λ$CDM model, while paving a promising path for neutrino physics with gravitational lensing from upcoming ground-based CMB surveys.
Submitted 11 April, 2023;
originally announced April 2023.
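The percentage precisions quoted in the abstract above follow directly from the stated central values and uncertainties. A small check, using only the numbers given in the abstract (the helper name `percent_precision` is hypothetical):

```python
def percent_precision(value, uncertainty):
    """Fractional 1-sigma uncertainty expressed as a percentage of the value."""
    return 100.0 * uncertainty / abs(value)


# Values quoted in the abstract for ACT lensing combined with BAO.
sigma8_precision = percent_precision(0.819, 0.015)
h0_precision = percent_precision(68.3, 1.1)

print(f"sigma_8 precision: {sigma8_precision:.1f}%")
print(f"H_0 precision: {h0_precision:.1f}%")
```

Rounded to one decimal place, these reproduce the 1.8% and 1.6% figures stated for $σ_8$ and $H_0$.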
-
The Atacama Cosmology Telescope: A Measurement of the DR6 CMB Lensing Power Spectrum and its Implications for Structure Growth
Authors:
Frank J. Qu,
Blake D. Sherwin,
Mathew S. Madhavacheril,
Dongwon Han,
Kevin T. Crowley,
Irene Abril-Cabezas,
Peter A. R. Ade,
Simone Aiola,
Tommy Alford,
Mandana Amiri,
Stefania Amodeo,
Rui An,
Zachary Atkins,
Jason E. Austermann,
Nicholas Battaglia,
Elia Stefano Battistelli,
James A. Beall,
Rachel Bean,
Benjamin Beringue,
Tanay Bhandarkar,
Emily Biermann,
Boris Bolliet,
J Richard Bond,
Hongbo Cai,
Erminia Calabrese, et al. (133 additional authors not shown)
Abstract:
We present new measurements of cosmic microwave background (CMB) lensing over $9400$ sq. deg. of the sky. These lensing measurements are derived from the Atacama Cosmology Telescope (ACT) Data Release 6 (DR6) CMB dataset, which consists of five seasons of ACT CMB temperature and polarization observations. We determine the amplitude of the CMB lensing power spectrum at $2.3\%$ precision ($43σ$ significance) using a novel pipeline that minimizes sensitivity to foregrounds and to noise properties. To ensure our results are robust, we analyze an extensive set of null tests, consistency tests, and systematic error estimates and employ a blinded analysis framework. The baseline spectrum is well fit by a lensing amplitude of $A_{\mathrm{lens}}=1.013\pm0.023$ relative to the Planck 2018 CMB power spectra best-fit $Λ$CDM model and $A_{\mathrm{lens}}=1.005\pm0.023$ relative to the $\text{ACT DR4} + \text{WMAP}$ best-fit model. From our lensing power spectrum measurement, we derive constraints on the parameter combination $S^{\mathrm{CMBL}}_8 \equiv σ_8 \left({Ω_m}/{0.3}\right)^{0.25}$ of $S^{\mathrm{CMBL}}_8= 0.818\pm0.022$ from ACT DR6 CMB lensing alone and $S^{\mathrm{CMBL}}_8= 0.813\pm0.018$ when combining ACT DR6 and Planck NPIPE CMB lensing power spectra. These results are in excellent agreement with $Λ$CDM model constraints from Planck or $\text{ACT DR4} + \text{WMAP}$ CMB power spectrum measurements. Our lensing measurements from redshifts $z\sim0.5$--$5$ are thus fully consistent with $Λ$CDM structure growth predictions based on CMB anisotropies probing primarily $z\sim1100$. We find no evidence for a suppression of the amplitude of cosmic structure at low redshifts.
Submitted 28 May, 2024; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Dr ChatGPT, tell me what I want to hear: How prompt knowledge impacts health answer correctness
Authors:
Guido Zuccon,
Bevan Koopman
Abstract:
Generative pre-trained language models (GPLMs) like ChatGPT encode in the model's parameters knowledge the models observe during the pre-training phase. This knowledge is then used at inference to address the task specified by the user in their prompt. For example, for the question-answering task, the GPLMs leverage the knowledge and linguistic patterns learned at training to produce an answer to a user question. Aside from the knowledge encoded in the model itself, answers produced by GPLMs can also leverage knowledge provided in the prompts. For example, a GPLM can be integrated into a retrieve-then-generate paradigm where a search engine is used to retrieve documents relevant to the question; the content of the documents is then transferred to the GPLM via the prompt. In this paper we study the differences in answer correctness generated by ChatGPT when leveraging the model's knowledge alone vs. in combination with the prompt knowledge. We study this in the context of consumers seeking health advice from the model. Aside from measuring the effectiveness of ChatGPT in this context, we show that the knowledge passed in the prompt can overturn the knowledge encoded in the model and this is, in our experiments, to the detriment of answer correctness. This work has important implications for the development of more robust and transparent question-answering systems based on generative pre-trained language models.
Submitted 23 February, 2023;
originally announced February 2023.
-
Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?
Authors:
Shuai Wang,
Harrisen Scells,
Bevan Koopman,
Guido Zuccon
Abstract:
Systematic reviews are comprehensive reviews of the literature for a highly focused research question. These reviews are often treated as the highest form of evidence in evidence-based medicine, and are the key strategy to answer research questions in the medical field. To create a high-quality systematic review, complex Boolean queries are often constructed to retrieve studies for the review topic. However, it often takes a long time for systematic review researchers to construct a high-quality systematic review Boolean query, and often the resulting queries are far from effective. Poor queries may lead to biased or invalid reviews, because they fail to retrieve key evidence, or to a large increase in review costs, because they retrieve too many irrelevant studies. Recent advances in Transformer-based generative models have shown great potential to effectively follow instructions from users and generate answers based on those instructions. In this paper, we investigate the effectiveness of the latest of such models, ChatGPT, in generating effective Boolean queries for systematic review literature search. Through a number of extensive experiments on standard test collections for the task, we find that ChatGPT is capable of generating queries that lead to high search precision, although at the cost of recall. Overall, our study demonstrates the potential of ChatGPT in generating effective Boolean queries for systematic review literature search. The ability of ChatGPT to follow complex instructions and generate queries with high precision makes it a valuable tool for researchers conducting systematic reviews, particularly for rapid reviews where time is a constraint and trading lower recall for higher precision is often acceptable.
Submitted 9 February, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
AgAsk: An Agent to Help Answer Farmer's Questions From Scientific Documents
Authors:
Bevan Koopman,
Ahmed Mourad,
Hang Li,
Anton van der Vegt,
Shengyao Zhuang,
Simon Gibson,
Yash Dang,
David Lawrence,
Guido Zuccon
Abstract:
Decisions in agriculture are increasingly data-driven; however, valuable agricultural knowledge is often locked away in free-text reports, manuals and journal articles. Specialised search systems are needed that can mine agricultural information to provide relevant answers to users' questions. This paper presents AgAsk -- an agent able to answer natural language agriculture questions by mining scientific documents.
We carefully survey and analyse farmers' information needs. On the basis of these needs we release an information retrieval test collection comprising real questions, a large collection of scientific documents split in passages, and ground truth relevance assessments indicating which passages are relevant to each question.
We implement and evaluate a number of information retrieval models to answer farmers' questions, including two state-of-the-art neural ranking models. We show that neural rankers are highly effective at matching passages to questions in this context.
Finally, we propose a deployment architecture for AgAsk that includes a client based on the Telegram messaging platform and retrieval model deployed on commodity hardware.
The test collection we provide is intended to stimulate more research in methods to match natural language to answers in scientific documents. While the retrieval models were evaluated in the agriculture domain, they are generalisable and of interest to others working on similar problems.
The test collection is available at: https://github.com/ielab/agvaluate.
Submitted 20 December, 2022;
originally announced December 2022.
-
Neural Rankers for Effective Screening Prioritisation in Medical Systematic Review Literature Search
Authors:
Shuai Wang,
Harrisen Scells,
Bevan Koopman,
Guido Zuccon
Abstract:
Medical systematic reviews typically require assessing all the documents retrieved by a search. The reason is two-fold: the task aims for "total recall"; and documents retrieved using Boolean search are an unordered set, and thus it is unclear how an assessor could examine only a subset. Screening prioritisation is the process of ranking the (unordered) set of retrieved documents, allowing assessors to begin the downstream processes of the systematic review creation earlier, leading to earlier completion of the review, or even avoiding screening documents ranked least relevant.
Screening prioritisation requires highly effective ranking methods. Pre-trained language models are state-of-the-art on many IR tasks but have yet to be applied to systematic review screening prioritisation. In this paper, we apply several pre-trained language models to the systematic review document ranking task, both directly and fine-tuned. An empirical analysis compares the effectiveness of neural methods with that of traditional methods for this task. We also investigate different types of document representations for neural methods and their impact on ranking performance.
Our results show that BERT-based rankers outperform the current state-of-the-art screening prioritisation methods. However, BERT rankers and existing methods can actually be complementary, and thus, further improvements may be achieved if used in conjunction.
Submitted 18 December, 2022;
originally announced December 2022.
-
Automated MeSH Term Suggestion for Effective Query Formulation in Systematic Reviews Literature Search
Authors:
Shuai Wang,
Harrisen Scells,
Bevan Koopman,
Guido Zuccon
Abstract:
High-quality medical systematic reviews require comprehensive literature searches to ensure the recommendations and outcomes are sufficiently reliable. Indeed, searching for relevant medical literature is a key phase in constructing systematic reviews and often involves domain (medical researchers) and search (information specialists) experts in developing the search queries. Queries in this context are highly complex, based on Boolean logic, include free-text terms and index terms from standardised terminologies (e.g., the Medical Subject Headings (MeSH) thesaurus), and are difficult and time-consuming to build. The use of MeSH terms, in particular, has been shown to improve the quality of the search results. However, identifying the correct MeSH terms to include in a query is difficult: information experts are often unfamiliar with the MeSH database and unsure about the appropriateness of MeSH terms for a query. As a result, the full value of the MeSH terminology is often not exploited. This article investigates methods to suggest MeSH terms based on an initial Boolean query that includes only free-text terms. In this context, we devise lexical methods and methods based on pre-trained language models. These methods promise to automatically identify highly effective MeSH terms for inclusion in a systematic review query. Our study contributes an empirical evaluation of several MeSH term suggestion methods. We further contribute an extensive analysis of MeSH term suggestions for each method and how these suggestions impact the effectiveness of Boolean queries.
Submitted 18 September, 2022;
originally announced September 2022.
-
The Simons Observatory: Antenna control software integration and implementation
Authors:
Lauren J. Saunders,
Matthew Hasselfield,
Brian J. Koopman,
Laura Newburgh
Abstract:
The Simons Observatory (SO) is a ground-based cosmic microwave background survey experiment that consists of three 0.5 m small-aperture telescopes and one 6 m large-aperture telescope, sited at an elevation of 5200 m in the Atacama Desert in Chile. SO will study the polarization and temperature anisotropies of the Cosmic Microwave Background (CMB). The observatory will require well-understood telescope pointing and scanning. Good antenna control will allow us to execute the scan strategy devised to optimize sensitivity to our scientific goals, calibrate the system with celestial targets, and make maps. To achieve this, we integrate the data acquisition and control of the telescopes' Antenna Control Units (ACUs) within the software framework of the SO Observatory Control System (OCS). We present here the current status of the software integration for the ACUs, as well as measurements of the Small Aperture Telescope platforms' responsiveness to software commanding in the factory, plans for in situ measurements, and prospects for implementation on the Large Aperture Telescope.
Submitted 18 July, 2022;
originally announced July 2022.
-
How does Feedback Signal Quality Impact Effectiveness of Pseudo Relevance Feedback for Passage Retrieval?
Authors:
Hang Li,
Ahmed Mourad,
Bevan Koopman,
Guido Zuccon
Abstract:
Pseudo-Relevance Feedback (PRF) assumes that the top results retrieved by a first-stage ranker are relevant to the original query and uses them to improve the query representation for a second round of retrieval. This assumption however is often not correct: some or even all of the feedback documents may be irrelevant. Indeed, the effectiveness of PRF methods may well depend on the quality of the feedback signal and thus on the effectiveness of the first-stage ranker. This aspect however has received little attention before.
In this paper we control the quality of the feedback signal and measure its impact on a range of PRF methods, including traditional bag-of-words methods (Rocchio), and dense vector-based methods (learnt and not learnt). Our results show the important role the quality of the feedback signal plays on the effectiveness of PRF methods. Importantly, and surprisingly, our analysis reveals that not all PRF methods are the same when dealing with feedback signals of varying quality. These findings are critical to gain a better understanding of the PRF methods and of which and when they should be used, depending on the feedback signal quality, and set the basis for future research in this area.
Submitted 12 May, 2022;
originally announced May 2022.
-
From Little Things Big Things Grow: A Collection with Seed Studies for Medical Systematic Review Literature Search
Authors:
Shuai Wang,
Harrisen Scells,
Justin Clark,
Bevan Koopman,
Guido Zuccon
Abstract:
Medical systematic review query formulation is a highly complex task done by trained information specialists. Complexity comes from the reliance on lengthy Boolean queries, which express a detailed research question. To aid query formulation, information specialists use a set of exemplar documents, called `seed studies', prior to query formulation. Seed studies help verify the effectiveness of a query prior to the full assessment of retrieved studies. Beyond this use of seeds, specific IR methods can exploit seed studies for guiding both automatic query formulation and new retrieval models. One major limitation of work to date is that these methods exploit `pseudo seed studies' through retrospective use of included studies (i.e., relevance assessments). However, we show pseudo seed studies are not representative of real seed studies used by information specialists. Hence, we provide a test collection with real world seed studies used to assist with the formulation of queries. To support our collection, we provide an analysis, previously not possible, on how seed studies impact retrieval and perform several experiments using seed-study based methods to compare the effectiveness of using seed studies versus pseudo seed studies. We make our test collection and the results of all of our experiments and analysis available at http://github.com/ielab/sysrev-seed-collection
Submitted 24 April, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Snowmass 2021 CMB-S4 White Paper
Authors:
Kevork Abazajian,
Arwa Abdulghafour,
Graeme E. Addison,
Peter Adshead,
Zeeshan Ahmed,
Marco Ajello,
Daniel Akerib,
Steven W. Allen,
David Alonso,
Marcelo Alvarez,
Mustafa A. Amin,
Mandana Amiri,
Adam Anderson,
Behzad Ansarinejad,
Melanie Archipley,
Kam S. Arnold,
Matt Ashby,
Han Aung,
Carlo Baccigalupi,
Carina Baker,
Abhishek Bakshi,
Debbie Bard,
Denis Barkats,
Darcy Barron,
Peter S. Barry
, et al. (331 additional authors not shown)
Abstract:
This Snowmass 2021 White Paper describes the Cosmic Microwave Background Stage 4 project CMB-S4, which is designed to cross critical thresholds in our understanding of the origin and evolution of the Universe, from the highest energies at the dawn of time through the growth of structure to the present day. We provide an overview of the science case, the technical design, and project plan.
Submitted 15 March, 2022;
originally announced March 2022.
-
Improving Chest X-Ray Report Generation by Leveraging Warm Starting
Authors:
Aaron Nicolson,
Jason Dowling,
Bevan Koopman
Abstract:
Automatically generating a report from a patient's Chest X-Rays (CXRs) is a promising solution to reducing clinical workload and improving patient care. However, current CXR report generators -- which are predominantly encoder-to-decoder models -- lack the diagnostic accuracy to be deployed in a clinical setting. To improve CXR report generation, we investigate warm starting the encoder and decoder with recent open-source computer vision and natural language processing checkpoints, such as the Vision Transformer (ViT) and PubMedBERT. To this end, each checkpoint is evaluated on the MIMIC-CXR and IU X-Ray datasets. Our experimental investigation demonstrates that the Convolutional vision Transformer (CvT) ImageNet-21K and the Distilled Generative Pre-trained Transformer 2 (DistilGPT2) checkpoints are best for warm starting the encoder and decoder, respectively. Compared to the state-of-the-art ($\mathcal{M}^2$ Transformer Progressive), CvT2DistilGPT2 attained an improvement of 8.3\% for CE F-1, 1.8\% for BLEU-4, 1.6\% for ROUGE-L, and 1.0\% for METEOR. The reports generated by CvT2DistilGPT2 have a higher similarity to radiologist reports than previous approaches. This indicates that leveraging warm starting improves CXR report generation. Code and checkpoints for CvT2DistilGPT2 are available at https://github.com/aehrc/cvt2distilgpt2.
Submitted 12 July, 2023; v1 submitted 23 January, 2022;
originally announced January 2022.
-
Semantic Search for Large Scale Clinical Ontologies
Authors:
Duy-Hoa Ngo,
Madonna Kemp,
Donna Truran,
Bevan Koopman,
Alejandro Metke-Jimenez
Abstract:
Finding concepts in large clinical ontologies can be challenging when queries use different vocabularies. A search algorithm that overcomes this problem is useful in applications such as concept normalisation and ontology matching, where concepts can be referred to in different ways, using different synonyms. In this paper, we present a deep learning based approach to build a semantic search system for large clinical ontologies. We propose a Triplet-BERT model and a method that generates training data directly from the ontologies. The model is evaluated using five real benchmark data sets and the results show that our approach achieves high results on both free text to concept and concept to concept searching tasks, and outperforms all baseline methods.
Submitted 1 January, 2022;
originally announced January 2022.
-
The Atacama Cosmology Telescope: Constraints on Pre-Recombination Early Dark Energy
Authors:
J. Colin Hill,
Erminia Calabrese,
Simone Aiola,
Nicholas Battaglia,
Boris Bolliet,
Steve K. Choi,
Mark J. Devlin,
Adriaan J. Duivenvoorden,
Jo Dunkley,
Simone Ferraro,
Patricio A. Gallardo,
Vera Gluscevic,
Matthew Hasselfield,
Matt Hilton,
Adam D. Hincks,
Renee Hlozek,
Brian J. Koopman,
Arthur Kosowsky,
Adrien La Posta,
Thibaut Louis,
Mathew S. Madhavacheril,
Jeff McMahon,
Kavilan Moodley,
Sigurd Naess,
Umberto Natale
, et al. (18 additional authors not shown)
Abstract:
The early dark energy (EDE) scenario aims to increase the value of the Hubble constant ($H_0$) inferred from cosmic microwave background (CMB) data over that found in $Λ$CDM, via the introduction of a new form of energy density in the early universe. The EDE component briefly accelerates cosmic expansion just prior to recombination, which reduces the physical size of the sound horizon imprinted in the CMB. Previous work has found that non-zero EDE is not preferred by Planck CMB power spectrum data alone, which yield a 95% confidence level (CL) upper limit $f_{\rm EDE} < 0.087$ on the maximal fractional contribution of the EDE field to the cosmic energy budget. In this paper, we fit the EDE model to CMB data from the Atacama Cosmology Telescope (ACT) Data Release 4. We find that a combination of ACT, large-scale Planck TT (similar to WMAP), Planck CMB lensing, and BAO data prefers the existence of EDE at $>99.7$% CL: $f_{\rm EDE} = 0.091^{+0.020}_{-0.036}$, with $H_0 = 70.9^{+1.0}_{-2.0}$ km/s/Mpc (both 68% CL). From a model-selection standpoint, we find that EDE is favored over $Λ$CDM by these data at roughly $3σ$ significance. In contrast, a joint analysis of the full Planck and ACT data yields no evidence for EDE, as previously found for Planck alone. We show that the preference for EDE in ACT alone is driven by its TE and EE power spectrum data. The tight constraint on EDE from Planck alone is driven by its high-$\ell$ TT power spectrum data. Understanding whether these differing constraints are physical in nature, due to systematics, or simply a rare statistical fluctuation is of high priority. The best-fit EDE models to ACT and Planck exhibit coherent differences across a wide range of multipoles in TE and EE, indicating that a powerful test of this scenario is anticipated with near-future data from ACT and other ground-based experiments.
Submitted 24 June, 2022; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls
Authors:
Hang Li,
Ahmed Mourad,
Shengyao Zhuang,
Bevan Koopman,
Guido Zuccon
Abstract:
Pseudo Relevance Feedback (PRF) is known to improve the effectiveness of bag-of-words retrievers. At the same time, deep language models have been shown to outperform traditional bag-of-words rerankers. However, it is unclear how to integrate PRF directly with emergent deep language models. In this article, we address this gap by investigating methods for integrating PRF signals into rerankers and dense retrievers based on deep language models. We consider text-based and vector-based PRF approaches, and investigate different ways of combining and scoring relevance signals. An extensive empirical evaluation was conducted across four different datasets and two task settings (retrieval and ranking). Text-based PRF results show that the use of PRF had a mixed effect on deep rerankers across different datasets. We found that the best effectiveness was achieved when (i) directly concatenating each PRF passage with the query, searching with the new set of queries, and then aggregating the scores; (ii) using Borda to aggregate scores from PRF runs. Vector-based PRF results show that the use of PRF enhanced the effectiveness of deep rerankers and dense retrievers over several evaluation metrics. We found that higher effectiveness was achieved when (i) the query retains either the majority or the same weight within the PRF mechanism, and (ii) a shallower PRF signal (i.e., a smaller number of top-ranked passages) was employed, rather than a deeper signal. Our vector-based PRF method is computationally efficient; thus this represents a general PRF method others can use with deep rerankers and dense retrievers.
Submitted 30 June, 2022; v1 submitted 25 August, 2021;
originally announced August 2021.
-
The mass and galaxy distribution around SZ-selected clusters
Authors:
T. Shin,
B. Jain,
S. Adhikari,
E. J. Baxter,
C. Chang,
S. Pandey,
A. Salcedo,
D. H. Weinberg,
A. Amsellem,
N. Battaglia,
M. Belyakov,
T. Dacunha,
S. Goldstein,
A. V. Kravtsov,
T. N. Varga,
T. M. C. Abbott,
M. Aguena,
A. Alarcon,
S. Allam,
A. Amon,
F. Andrade-Oliveira,
J. Annis,
D. Bacon,
K. Bechtol,
M. R. Becker
, et al. (114 additional authors not shown)
Abstract:
We present measurements of the radial profiles of the mass and galaxy number density around Sunyaev-Zel'dovich-selected clusters using both weak lensing and galaxy counts. The clusters are selected from the Atacama Cosmology Telescope Data Release 5 and the galaxies from the Dark Energy Survey Year 3 dataset. With signal-to-noise of 62 (43) for galaxy (weak lensing) profiles over scales of about $0.2-20h^{-1}$ Mpc, these are the highest precision measurements for SZ-selected clusters to date. Because SZ selection closely approximates mass selection, these measurements enable several tests of theoretical models of the mass and light distribution around clusters. Our main findings are: 1. The splashback feature is detected at a consistent location in both the mass and galaxy profiles and its location is consistent with predictions of cold dark matter N-body simulations. 2. The full mass profile is also consistent with the simulations; hence it can constrain alternative dark matter models that modify the mass distribution of clusters. 3. The shapes of the galaxy and lensing profiles are remarkably similar for our sample over the entire range of scales, from well inside the cluster halo to the quasilinear regime. This can be used to constrain processes such as quenching and tidal disruption that alter the galaxy distribution inside the halo, and scale-dependent features in the transition regime outside the halo. We measure the dependence of the profile shapes on the galaxy sample, redshift and cluster mass. We extend the Diemer \& Kravtsov model for the cluster profiles to the linear regime using perturbation theory and show that it provides a good match to the measured profiles. We also compare the measured profiles to predictions of the standard halo model and simulations that include hydrodynamics. Applications of these results to cluster mass estimation and cosmology are discussed.
Submitted 12 May, 2021;
originally announced May 2021.
-
The Atacama Cosmology Telescope: Microwave Intensity and Polarization Maps of the Galactic Center
Authors:
Yilun Guan,
Susan E. Clark,
Brandon S. Hensley,
Patricio A. Gallardo,
Sigurd Naess,
Cody J. Duell,
Simone Aiola,
Zachary Atkins,
Erminia Calabrese,
Steve K. Choi,
Nicholas F. Cothard,
Mark Devlin,
Adriaan J. Duivenvoorden,
Jo Dunkley,
Rolando Dünner,
Simone Ferraro,
Matthew Hasselfield,
John P. Hughes,
Brian J. Koopman,
Arthur B. Kosowsky,
Mathew S. Madhavacheril,
Jeff McMahon,
Federico Nati,
Michael D. Niemack,
Lyman A. Page
, et al. (8 additional authors not shown)
Abstract:
We present arcminute-resolution intensity and polarization maps of the Galactic center made with the Atacama Cosmology Telescope (ACT). The maps cover a 32 deg$^2$ field at 98, 150, and 224 GHz with $\vert l\vert\le4^\circ$, $\vert b\vert\le2^\circ$. We combine these data with Planck observations at similar frequencies to create coadded maps with increased sensitivity at large angular scales. With the coadded maps, we are able to resolve many known features of the Central Molecular Zone (CMZ) in both total intensity and polarization. We map the orientation of the plane-of-sky component of the Galactic magnetic field inferred from the polarization angle in the CMZ, finding significant changes in morphology in the three frequency bands as the underlying dominant emission mechanism changes from synchrotron to dust emission. Selected Galactic center sources, including Sgr A*, the Brick molecular cloud (G0.253+0.016), the Mouse pulsar wind nebula (G359.23-0.82), and the Tornado supernova remnant candidate (G357.7-0.1), are examined in detail. These data illustrate the potential for leveraging ground-based Cosmic Microwave Background polarization experiments for Galactic science.
Submitted 14 September, 2021; v1 submitted 11 May, 2021;
originally announced May 2021.
-
Atacama Cosmology Telescope measurements of a large sample of candidates from the Massive and Distant Clusters of WISE Survey: Sunyaev-Zeldovich effect confirmation of MaDCoWS candidates using ACT
Authors:
John Orlowski-Scherer,
Luca Di Mascolo,
Tanay Bhandarkar,
Alex Manduca,
Tony Mroczkowski,
Stefania Amodeo,
Nick Battaglia,
Mark Brodwin,
Steve K. Choi,
Mark Devlin,
Simon Dicker,
Jo Dunkley,
Anthony H. Gonzalez,
Dongwon Han,
Matt Hilton,
Kevin Huffenberger,
John P. Hughes,
Amanda MacInnis,
Kenda Knowles,
Brian J. Koopman,
Ian Lowe,
Kavilan Moodley,
Federico Nati,
Michael D. Niemack,
Lyman A. Page
, et al. (13 additional authors not shown)
Abstract:
Galaxy clusters are an important tool for cosmology, and their detection and characterization are key goals for current and future surveys. Using data from the Wide-field Infrared Survey Explorer (WISE), the Massive and Distant Clusters of WISE Survey (MaDCoWS) located 2,839 significant galaxy overdensities at redshifts $0.7\lesssim z\lesssim 1.5$, which included extensive follow-up imaging from the Spitzer Space Telescope to determine cluster richnesses. Concurrently, the Atacama Cosmology Telescope (ACT) has produced large area mm-wave maps in three frequency bands along with a large catalog of Sunyaev-Zeldovich (SZ) selected clusters, as part of its Data Release 5 (DR5). Using the maps and cluster catalog from DR5, we explore the scaling between SZ mass and cluster richness. We use complementary radio survey data from the Very Large Array, submillimeter data from Herschel, and ACT 224~GHz data to assess the impact of contaminating sources on the SZ signals. We then use a hierarchical Bayesian model to fit the mass-richness scaling relation. We find that MaDCoWS clusters have submillimeter contamination which is consistent with a gray-body spectrum, while the ACT clusters are consistent with no submillimeter emission on average. We find the best fit ACT SZ mass vs. MaDCoWS richness scaling relation has a slope of $κ= 1.84^{+0.15}_{-0.14}$, with the slope defined by $M\propto λ_{15}^κ$, where $λ_{15}$ is the richness. Additionally, we find the approximate level of in-fill of the ACT and MaDCoWS cluster SZ signals to be at the percent level.
Submitted 30 June, 2021; v1 submitted 30 April, 2021;
originally announced May 2021.
-
The Atacama Cosmology Telescope: A search for Planet 9
Authors:
Sigurd Naess,
Simone Aiola,
Nick Battaglia,
Richard J. Bond,
Erminia Calabrese,
Steve K. Choi,
Nicholas F. Cothard,
Mark Halpern,
J. Colin Hill,
Brian J. Koopman,
Mark Devlin,
Jeff McMahon,
Simon Dicker,
Adriaan J. Duivenvoorden,
Jo Dunkley,
Alexander Van Engelen,
Valentina Fanfani,
Simone Ferraro,
Patricio A. Gallardo,
Yilun Guan,
Dongwon Han,
Matthew Hasselfield,
Adam D. Hincks,
Kevin Huffenberger,
Arthur B. Kosowsky
, et al. (15 additional authors not shown)
Abstract:
We use Atacama Cosmology Telescope (ACT) observations at 98 GHz (2015--2019), 150 GHz (2013--2019) and 229 GHz (2017--2019) to perform a blind shift-and-stack search for Planet 9. The search explores distances from 300 AU to 2000 AU and velocities up to 6.3 arcmin per year, depending on the distance. For a 5 Earth-mass Planet 9 the detection limit varies from 325 AU to 625 AU, depending on the sky location. For a 10 Earth-mass planet the corresponding range is 425 AU to 775 AU. The search covers the whole 18,000 square degrees of the ACT survey, though a slightly deeper search is performed for the parts of the sky consistent with Planet 9's expected orbital inclination. No significant detections are found, which is used to place limits on the mm-wave flux density of Planet 9 over much of its orbit. Overall we eliminate roughly 17% and 9% of the parameter space for a 5 and 10 Earth-mass Planet 9 respectively. We also provide a list of the 10 strongest candidates from the search for possible follow-up. More generally, we exclude (at 95% confidence) the presence of an unknown Solar system object within our survey area brighter than 4--12 mJy (depending on position) at 150 GHz with current distance $300 \text{ AU} < r < 600 \text{ AU}$ and heliocentric angular velocity $1.5'/\text{yr} < v \cdot \frac{500 \text{ AU}}{r} < 2.3'/\text{yr}$, corresponding to low-to-moderate eccentricities. These limits worsen gradually beyond 600 AU, reaching 5--15 mJy by 1500 AU.
Submitted 11 May, 2021; v1 submitted 20 April, 2021;
originally announced April 2021.
-
The Simons Observatory: the Large Aperture Telescope (LAT)
Authors:
Zhilei Xu,
Shunsuke Adachi,
Peter Ade,
J. A. Beall,
Tanay Bhandarkar,
J. Richard Bond,
Grace E. Chesmore,
Yuji Chinone,
Steve K. Choi,
Jake A. Connors,
Gabriele Coppi,
Nicholas F. Cothard,
Kevin D. Crowley,
Mark Devlin,
Simon Dicker,
Bradley Dober,
Shannon M. Duff,
Nicholas Galitzki,
Patricio A. Gallardo,
Joseph E. Golec,
Jon E. Gudmundsson,
Saianeesh K. Haridas,
Kathleen Harrington,
Carlos Hervias-Caimapo,
Shuay-Pwu Patty Ho
, et al. (35 additional authors not shown)
Abstract:
The Simons Observatory (SO) is a Cosmic Microwave Background (CMB) experiment to observe the microwave sky in six frequency bands from 30GHz to 290GHz. The Observatory -- at $\sim$5200m altitude -- comprises three Small Aperture Telescopes (SATs) and one Large Aperture Telescope (LAT) at the Atacama Desert, Chile. This research note describes the design and current status of the LAT along with its future timeline.
Submitted 29 April, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
The Atacama Cosmology Telescope: Summary of DR4 and DR5 Data Products and Data Access
Authors:
Maya Mallaby-Kay,
Zachary Atkins,
Simone Aiola,
Stefania Amodeo,
Jason E. Austermann,
James A. Beall,
Daniel T. Becker,
J. Richard Bond,
Erminia Calabrese,
Grace E. Chesmore,
Steve K. Choi,
Kevin T. Crowley,
Omar Darwish,
Edward V. Denison,
Mark J. Devlin,
Shannon M. Duff,
Adriaan J. Duivenvoorden,
Jo Dunkley,
Simone Ferraro,
Kyra Fichman,
Patricio A. Gallardo,
Joseph E. Golec,
Yilun Guan,
Dongwon Han,
Matthew Hasselfield
, et al. (35 additional authors not shown)
Abstract:
Two recent large data releases for the Atacama Cosmology Telescope (ACT), called DR4 and DR5, are available for public access. These data include temperature and polarization maps that cover nearly half the sky at arcminute resolution in three frequency bands; lensing maps and component-separated maps covering ~ 2,100 deg^2 of sky; derived power spectra and cosmological likelihoods; a catalog of over 4,000 galaxy clusters; and supporting ancillary products including beam functions and masks. The data and products are described in a suite of ACT papers; here we provide a summary. In order to facilitate ease of access to these data we present a set of Jupyter IPython notebooks developed to introduce users to DR4, DR5, and the tools needed to analyze these data. The data products (excluding simulations) and the set of notebooks are publicly available on the NASA Legacy Archive for Microwave Background Data Analysis (LAMBDA); simulation products are available on the National Energy Research Scientific Computing Center (NERSC).
Submitted 29 April, 2021; v1 submitted 4 March, 2021;
originally announced March 2021.
-
The Simons Observatory Large Aperture Telescope Receiver
Authors:
Ningfeng Zhu,
Tanay Bhandarkar,
Gabriele Coppi,
Anna M. Kofman,
John L. Orlowski-Scherer,
Zhilei Xu,
Shunsuke Adachi,
Peter Ade,
Simone Aiola,
Jason Austermann,
Andrew O. Bazarko,
James A. Beall,
Sanah Bhimani,
J. Richard Bond,
Grace E. Chesmore,
Steve K. Choi,
Jake Connors,
Nicholas F. Cothard,
Mark Devlin,
Simon Dicker,
Bradley Dober,
Cody J. Duell,
Shannon M. Duff,
Rolando Dünner,
Giulio Fabbian
, et al. (46 additional authors not shown)
Abstract:
The Simons Observatory (SO) Large Aperture Telescope Receiver (LATR) will be coupled to the Large Aperture Telescope located at an elevation of 5,200 m on Cerro Toco in Chile. The resulting instrument will produce arcminute-resolution millimeter-wave maps of half the sky with unprecedented precision. The LATR is the largest cryogenic millimeter-wave camera built to date with a diameter of 2.4 m and a length of 2.6 m. It cools 1200 kg of material to 4 K and 200 kg to 100 mK, the operating temperature of the bolometric detectors with bands centered around 27, 39, 93, 145, 225, and 280 GHz. Ultimately, the LATR will accommodate 13 optics tubes of 40 cm diameter, each with three detector wafers, for a total of 62,000 detectors. The LATR design must simultaneously maintain the optical alignment of the system, control stray light, provide cryogenic isolation, limit thermal gradients, and minimize the time to cool the system from room temperature to 100 mK. The interplay between these competing factors poses unique challenges. We discuss the trade studies involved with the design, the final optimization, the construction, and ultimate performance of the system.
Submitted 3 March, 2021;
originally announced March 2021.
-
The Integration and Testing Program for the Simons Observatory Large Aperture Telescope Optics Tubes
Authors:
Kathleen Harrington,
Carlos Sierra,
Grace Chesmore,
Shreya Sutariya,
Aamir M. Ali,
Steve K. Choi,
Nicholas F. Cothard,
Simon Dicker,
Nicholas Galitzki,
Shuay-Pwu Patty Ho,
Anna M. Kofman,
Brian J. Koopman,
Jack Lashner,
Jeff McMahon,
Michael D. Niemack,
John Orlowski-Scherer,
Joseph Seibert,
Max Silva-Feaver,
Eve M. Vavagiakis,
Zhilei Xu,
Ningfeng Zhu
Abstract:
The Simons Observatory (SO) will be a cosmic microwave background (CMB) survey experiment with three small-aperture telescopes and one large-aperture telescope, which will observe from the Atacama Desert in Chile. In total, SO will field over 60,000 transition-edge sensor (TES) bolometers in six spectral bands centered between 27 and 280 GHz in order to achieve the sensitivity necessary to measure or constrain numerous cosmological quantities, as outlined in The Simons Observatory Collaboration et al. (2019). The 6 m Large Aperture Telescope (LAT), which will target the smaller angular scales of the CMB, utilizes a cryogenic receiver (LATR) designed to house up to 13 individual optics tubes. Each optics tube comprises three silicon lenses, IR blocking filters, and three dual-polarization, dichroic TES detector wafers. The scientific objectives of the SO project require these optics tubes to achieve high-throughput optical performance while maintaining exquisite control of systematic effects. We describe the integration and testing program for the SO LATR optics tubes that will verify the design and assembly of the optics tubes before they are shipped to the SO site and installed in the LATR cryostat. The program includes a quick turn-around test cryostat that is used to cool single optics tubes and validate the cryogenic performance and detector readout assembly. We discuss the optical design specifications the optics tubes must meet to be deployed on sky and the suite of optical test equipment that is prepared to measure these requirements.
Submitted 3 February, 2021;
originally announced February 2021.
-
The Atacama Cosmology Telescope: Detection of the Pairwise Kinematic Sunyaev-Zel'dovich Effect with SDSS DR15 Galaxies
Authors:
Victoria Calafut,
Patricio A. Gallardo,
Eve M. Vavagiakis,
Stefania Amodeo,
Simone Aiola,
Jason E. Austermann,
Nicholas Battaglia,
Elia S. Battistelli,
James A. Beall,
Rachel Bean,
J. Richard Bond,
Erminia Calabrese,
Steve K. Choi,
Nicholas F. Cothard,
Mark J. Devlin,
Cody J. Duell,
S. M. Duff,
Adriaan J. Duivenvoorden,
Jo Dunkley,
Rolando Dunner,
Simone Ferraro,
Yilun Guan,
J. Colin Hill,
Matt Hilton,
Renee Hlozek
, et al. (27 additional authors not shown)
Abstract:
We present a 5.4$σ$ detection of the pairwise kinematic Sunyaev-Zel'dovich (kSZ) effect using Atacama Cosmology Telescope (ACT) and $\it{Planck}$ CMB observations in combination with Luminous Red Galaxy samples from the Sloan Digital Sky Survey (SDSS) DR15 catalog. Results are obtained using three ACT CMB maps: co-added 150 GHz and 98 GHz maps, combining observations from 2008-2018 (ACT DR5), which overlap with SDSS DR15 over 3,700 sq. deg., and a component-separated map using night-time only observations from 2014-2015 (ACT DR4), overlapping with SDSS DR15 over 2,089 sq. deg. Comparisons of the results from these three maps provide consistency checks in relation to potential frequency-dependent foreground contamination. A total of 343,647 galaxies are used as tracers to identify and locate galaxy groups and clusters from which the kSZ signal is extracted using aperture photometry. We consider the impact of various aperture photometry assumptions and covariance estimation methods on the signal extraction. Theoretical predictions of the pairwise velocities are used to obtain best-fit, mass-averaged, optical depth estimates for each of five luminosity-selected tracer samples. A comparison of the kSZ-derived optical depth measurements obtained here to those derived from the thermal SZ effect for the same sample is presented in a companion paper.
Submitted 24 August, 2021; v1 submitted 20 January, 2021;
originally announced January 2021.
-
The Atacama Cosmology Telescope: Probing the Baryon Content of SDSS DR15 Galaxies with the Thermal and Kinematic Sunyaev-Zel'dovich Effects
Authors:
Eve M. Vavagiakis,
Patricio A. Gallardo,
Victoria Calafut,
Stefania Amodeo,
Simone Aiola,
Jason E. Austermann,
Nicholas Battaglia,
Elia S. Battistelli,
James A. Beall,
Rachel Bean,
J. Richard Bond,
Erminia Calabrese,
Steve K. Choi,
Nicholas F. Cothard,
Mark J. Devlin,
Cody J. Duell,
S. M. Duff,
Adriaan J. Duivenvoorden,
Jo Dunkley,
Rolando Dunner,
Simone Ferraro,
Yilun Guan,
J. Colin Hill,
Matt Hilton,
Renee Hlozek
, et al. (27 additional authors not shown)
Abstract:
We present high signal-to-noise measurements (up to 12$σ$) of the average thermal Sunyaev-Zel'dovich (tSZ) effect from optically selected galaxy groups and clusters and estimate their baryon content within a 2.1$^\prime$ radius aperture. Sources from the Sloan Digital Sky Survey (SDSS) Baryon Oscillation Spectroscopic Survey (BOSS) DR15 catalog overlap with 3,700 sq. deg. of sky observed by the Atacama Cosmology Telescope (ACT) from 2008 to 2018 at 150 and 98 GHz (ACT DR5), and 2,089 sq. deg. of internal linear combination component-separated maps combining ACT and $\it{Planck}$ data (ACT DR4). The corresponding optical depths, $\barτ$, which depend on the baryon content of the halos, are estimated using results from cosmological hydrodynamic simulations assuming an AGN feedback radiative cooling model. We estimate the mean mass of the halos in multiple luminosity bins, and compare the tSZ-based $\barτ$ estimates to theoretical predictions of the baryon content for a Navarro-Frenk-White profile. We do the same for $\barτ$ estimates extracted from fits to pairwise baryon momentum measurements of the kinematic Sunyaev-Zel'dovich effect (kSZ) for the same data set obtained in a companion paper. We find that the $\barτ$ estimates from the tSZ measurements in this work and the kSZ measurements in the companion paper agree within $1σ$ for two out of the three disjoint luminosity bins studied, while they differ by 2-3$σ$ in the highest luminosity bin. The optical depth estimates account for one third to all of the theoretically predicted baryon content in the halos across luminosity bins. Potential systematic uncertainties are discussed. The tSZ and kSZ measurements are a step towards empirical Compton-$\bar{y}$-$\barτ$ relationships that can provide new tests of cluster formation and evolution models.
Submitted 24 August, 2021; v1 submitted 20 January, 2021;
originally announced January 2021.