Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models

Authors: Sunny Duan, Mikail Khona, Abhiram Iyer, Rylan Schaeffer, Ila R Fiete

Abstract: The proliferation of large language models has revolutionized natural language processing tasks, yet it raises profound concerns regarding data privacy and security. Language models are trained on extensive corpora including potentially sensitive or proprietary information, and the risk of data leakage -- where the model response reveals pieces of such information -- remains inadequately understood. This study examines susceptibility to data leakage by quantifying the phenomenon of memorization in machine learning models, focusing on the evolution of memorization patterns over training. We investigate how the statistical characteristics of training data influence the memories encoded within the model by evaluating how repetition influences memorization. We reproduce findings that the probability of memorizing a sequence scales logarithmically with the number of times it is present in the data. Furthermore, we find that sequences which are not apparently memorized after the first encounter can be uncovered throughout the course of training even without subsequent encounters. The presence of these latent memorized sequences presents a challenge for data privacy since they may be hidden at the final checkpoint of the model. To this end, we develop a diagnostic test for uncovering these latent memorized sequences by considering their cross entropy loss.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.12785 [pdf, other]

In-Context Learning of Energy Functions

Authors: Rylan Schaeffer, Mikail Khona, Sanmi Koyejo

Abstract: In-context learning is a powerful capability of certain machine learning models that arguably underpins the success of today's frontier AI models. However, in-context learning is critically limited to settings where the in-context distribution of interest $p_θ^{ICL}(x|\mathcal{D})$ can be straightforwardly expressed and/or parameterized by the model; for instance, language modeling relies on expressing the next-token distribution as a categorical distribution parameterized by the network's output logits. In this work, we present a more general form of in-context learning without such a limitation that we call \textit{in-context learning of energy functions}. The idea is to instead learn the unconstrained and arbitrary in-context energy function $E_θ^{ICL}(x|\mathcal{D})$ corresponding to the in-context distribution $p_θ^{ICL}(x|\mathcal{D})$. To do this, we use classic ideas from energy-based modeling. We provide preliminary evidence that our method empirically works on synthetic data. Interestingly, our work contributes (to the best of our knowledge) the first example of in-context learning where the input space and output space differ from one another, suggesting that in-context learning is a more-general capability than previously realized.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Proceedings of the 1st Workshop on In-Context Learning at the 41st International Conference on Machine Learning, Vienna, Austria. 2024. arXiv admin note: text overlap with arXiv:2402.10202

arXiv:2406.10229 [pdf, other]

Quantifying Variance in Evaluation Benchmarks

Authors: Lovish Madaan, Aaditya K. Singh, Rylan Schaeffer, Andrew Poulton, Sanmi Koyejo, Pontus Stenetorp, Sharan Narang, Dieuwke Hupkes

Abstract: Evaluation benchmarks are the cornerstone of measuring capabilities of large language models (LLMs), as well as driving progress in said capabilities. Originally designed to make claims about capabilities (or lack thereof) in fully pretrained models, evaluation benchmarks are now also extensively used to decide between various training choices. Despite this widespread usage, we rarely quantify the variance in our evaluation benchmarks, which dictates whether differences in performance are meaningful. Here, we define and measure a range of metrics geared towards measuring variance in evaluation benchmarks, including seed variance across initialisations, and monotonicity during training. By studying a large number of models -- both openly available and pretrained from scratch -- we provide empirical estimates for a variety of variance metrics, with considerations and recommendations for practitioners. We also evaluate the utility and tradeoffs of continuous versus discrete performance measures and explore options for better understanding and reducing this variance. We find that simple changes, such as framing choice tasks (like MMLU) as completion tasks, can often reduce variance for smaller scale ($\sim$7B) models, while more involved methods inspired from human testing literature (such as item analysis and item response theory) struggle to meaningfully reduce variance. Overall, our work provides insights into variance in evaluation benchmarks, suggests LM-specific techniques to reduce variance, and more generally encourages practitioners to carefully factor in variance when comparing models.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09366 [pdf, other]

Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

Authors: Rylan Schaeffer, Victor Lecomte, Dhruv Bhandarkar Pai, Andres Carranza, Berivan Isik, Alyssa Unell, Mikail Khona, Thomas Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo

Abstract: Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to improve our understanding and our utilization of MMCR. To better understand MMCR, we leverage tools from high dimensional probability to demonstrate that MMCR incentivizes alignment and uniformity of learned embeddings. We then leverage tools from information theory to show that such embeddings maximize a well-known lower bound on mutual information between views, thereby connecting the geometric perspective of MMCR to the information-theoretic perspective commonly discussed in MVSSL. To better utilize MMCR, we mathematically predict and experimentally confirm non-monotonic changes in the pretraining loss akin to double descent but with respect to atypical hyperparameters. We also discover compute scaling laws that enable predicting the pretraining loss as a function of gradient steps, batch size, embedding dimension and number of views. We then show that MMCR, originally applied to image data, is performant on multimodal image-text data. By more deeply understanding the theoretical and empirical behavior of MMCR, our work reveals insights on improving MVSSL methods.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.04391 [pdf, other]

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Authors: Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda, Gabriel Mukobi, Varun Madan, Adam Ibrahim, Herbie Bradley, Stella Biderman, Sanmi Koyejo

Abstract: Predictable behavior from scaling advanced AI systems is an extremely desirable property. Although a well-established literature exists on how pretraining performance scales, the literature on how particular downstream capabilities scale is significantly muddier. In this work, we take a step back and ask: why has predicting specific downstream capabilities with scale remained elusive? While many factors are certainly responsible, we identify a new factor that makes modeling scaling behavior on widely used multiple-choice question-answering benchmarks challenging. Using five model families and twelve well-established multiple-choice benchmarks, we show that downstream performance is computed from negative log likelihoods via a sequence of transformations that progressively degrade the statistical relationship between performance and scale. We then reveal the mechanism causing this degradation: downstream metrics require comparing the correct choice against a small number of specific incorrect choices, meaning accurately predicting downstream capabilities requires predicting not just how probability mass concentrates on the correct choice with scale, but also how probability mass fluctuates on specific incorrect choices with scale. We empirically study how probability mass on the correct choice co-varies with probability mass on incorrect choices with increasing compute, suggesting that scaling laws for incorrect choices might be achievable. Our work also explains why pretraining scaling laws are commonly regarded as more predictable than downstream capabilities and contributes towards establishing scaling-predictable evaluations of frontier AI models.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2404.01413 [pdf, other]

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

Abstract: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration until fitted models become useless. However, those studies largely assumed that new data replace old data over time, where an arguably more realistic assumption is that data accumulate over time. In this paper, we ask: what effect does accumulating data have on model collapse? We empirically study this question by pretraining sequences of language models on text corpora. We confirm that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse, then demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse; these results hold across a range of model sizes, architectures, and hyperparameters. We obtain similar results for deep generative models on other types of real data: diffusion models for molecule conformation generation and variational autoencoders for image generation. To understand why accumulating data can avoid model collapse, we use an analytically tractable framework introduced by prior work in which a sequence of linear models are fit to the previous models' outputs. Previous work used this framework to show that if data are replaced, the test error increases with the number of model-fitting iterations; we extend this argument to prove that if data instead accumulate, the test error has a finite upper bound independent of the number of iterations, meaning model collapse no longer occurs.

Submitted 29 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2402.10202 [pdf, other]

Bridging Associative Memory and Probabilistic Modeling

Authors: Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang Truong, Yilun Du, Mitchell Ostrow, Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo

Abstract: Associative memory and probabilistic modeling are two fundamental topics in artificial intelligence. The first studies recurrent neural networks designed to denoise, complete and retrieve data, whereas the second studies learning and sampling from probability distributions. Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log likelihoods, we build a bridge between the two that enables useful flow of ideas in both directions. We showcase four examples: First, we propose new energy-based models that flexibly adapt their energy functions to new in-context datasets, an approach we term \textit{in-context learning of energy functions}. Second, we propose two new associative memory models: one that dynamically creates new memories as necessitated by the training data using Bayesian nonparametrics, and another that explicitly computes proportional memory assignments using the evidence lower bound. Third, using tools from associative memory, we analytically and numerically characterize the memory capacity of Gaussian kernel density estimators, a widespread tool in probabilistic modeling. Fourth, we study a widespread implementation choice in transformers -- normalization followed by self attention -- to show it performs clustering on the hypersphere. Altogether, this work urges further exchange of useful ideas between these two continents of artificial intelligence.

Submitted 13 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

arXiv:2401.06059 [pdf, other]

Investigating Data Contamination for Pre-training Language Models

Authors: Minhao Jiang, Ken Ziyu Liu, Ming Zhong, Rylan Schaeffer, Siru Ouyang, Jiawei Han, Sanmi Koyejo

Abstract: Language models pre-trained on web-scale corpora demonstrate impressive capabilities on diverse downstream tasks. However, there is increasing concern whether such capabilities might arise from evaluation datasets being included in the pre-training corpus -- a phenomenon known as \textit{data contamination} -- in a manner that artificially increases performance. There has been little understanding of how this potential contamination might influence LMs' performance on downstream tasks. In this paper, we explore the impact of data contamination at the pre-training stage by pre-training a series of GPT-2 models \textit{from scratch}. We highlight the effect of both text contamination (\textit{i.e.}\ input text of the evaluation samples) and ground-truth contamination (\textit{i.e.}\ the prompts asked on the input and the desired outputs) from evaluation data. We also investigate the effects of repeating contamination for various downstream tasks. Additionally, we examine the prevailing n-gram-based definitions of contamination within current LLM reports, pinpointing their limitations and inadequacy. Our findings offer new insights into data contamination's effects on language model capabilities and underscore the need for independent, comprehensive contamination assessments in LLM studies.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 16 pages, 5 figures

arXiv:2312.03954 [pdf, other]

Disentangling Fact from Grid Cell Fiction in Trained Deep Path Integrators

Authors: Rylan Schaeffer, Mikail Khona, Sanmi Koyejo, Ila Rani Fiete

Abstract: Work on deep learning-based models of grid cells suggests that grid cells generically and robustly arise from optimizing networks to path integrate, i.e., track one's spatial position by integrating self-velocity signals. In previous work, we challenged this path integration hypothesis by showing that deep neural networks trained to path integrate almost always do so, but almost never learn grid-like tuning unless separately inserted by researchers via mechanisms unrelated to path integration. In this work, we restate the key evidence substantiating these insights, then address a response by authors of one of the path integration hypothesis papers. First, we show that the response misinterprets our work, indirectly confirming our points. Second, we evaluate the response's preferred "unified theory for the origin of grid cells" in trained deep path integrators and show that it is at best "occasionally suggestive," not exact or comprehensive. We finish by considering why assessing model quality through prediction of biological neural activity by regression of activity in deep networks can lead to the wrong conclusions.

Submitted 16 December, 2023; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: arXiv admin note: text overlap with arXiv:2311.16295

arXiv:2312.03096 [pdf, other]

What Causes Polysemanticity? An Alternative Origin Story of Mixed Selectivity from Incidental Causes

Authors: Victor Lecomte, Kushal Thaman, Rylan Schaeffer, Naomi Bashkansky, Trevor Chow, Sanmi Koyejo

Abstract: Polysemantic neurons -- neurons that activate for a set of unrelated features -- have been seen as a significant obstacle towards interpretability of task-optimized deep networks, with implications for AI safety. The classic origin story of polysemanticity is that the data contains more ``features" than neurons, such that learning to perform a task forces the network to co-allocate multiple unrelated features to the same neuron, endangering our ability to understand networks' internal processing. In this work, we present a second and non-mutually exclusive origin story of polysemanticity. We show that polysemanticity can arise incidentally, even when there are ample neurons to represent all features in the data, a phenomenon we term \textit{incidental polysemanticity}. Using a combination of theory and experiments, we show that incidental polysemanticity can arise due to multiple reasons including regularization and neural noise; this incidental polysemanticity occurs because random initialization can, by chance alone, initially assign multiple features to the same neuron, and the training dynamics then strengthen such overlap. Our paper concludes by calling for further research quantifying the performance-polysemanticity tradeoff in task-optimized deep neural networks to better understand to what extent polysemanticity is avoidable.

Submitted 13 February, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.16295 [pdf, other]

Testing Assumptions Underlying a Unified Theory for the Origin of Grid Cells

Authors: Rylan Schaeffer, Mikail Khona, Adrian Bertagnoli, Sanmi Koyejo, Ila Rani Fiete

Abstract: Representing and reasoning about physical space is fundamental to animal survival, and the mammalian lineage expresses a wealth of specialized neural representations that encode space. Grid cells, whose discovery earned a Nobel prize, are a striking example: a grid cell is a neuron that fires if and only if the animal is spatially located at the vertices of a regular triangular lattice that tiles all explored two-dimensional environments. Significant theoretical work has gone into understanding why mammals have learned these particular representations, and recent work has proposed a ``unified theory for the computational and mechanistic origin of grid cells," claiming to answer why the mammalian lineage has learned grid cells. However, the Unified Theory makes a series of highly specific assumptions about the target readouts of grid cells - putatively place cells. In this work, we explicitly identify what these mathematical assumptions are, then test two of the critical assumptions using biological place cell data. At both the population and single-cell levels, we find evidence suggesting that neither of the assumptions is likely true in biological neural representations. These results call the Unified Theory into question, suggesting that biological grid cells likely have a different origin than those obtained in trained artificial neural networks.

Submitted 27 November, 2023; originally announced November 2023.

Comments: Accepted at NeurIPS 2023 Workshops: UniReps, NeurReps, AI4Science

arXiv:2311.02316 [pdf, other]

Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells

Authors: Rylan Schaeffer, Mikail Khona, Tzuhsuan Ma, Cristóbal Eyzaguirre, Sanmi Koyejo, Ila Rani Fiete

Abstract: To solve the spatial problems of mapping, localization and navigation, the mammalian lineage has developed striking spatial representations. One important spatial representation is the Nobel-prize winning grid cells: neurons that represent self-location, a local and aperiodic quantity, with seemingly bizarre non-local and spatially periodic activity patterns of a few discrete periods. Why has the mammalian lineage learnt this peculiar grid representation? Mathematical analysis suggests that this multi-periodic representation has excellent properties as an algebraic code with high capacity and intrinsic error-correction, but to date, there is no satisfactory synthesis of core principles that lead to multi-modular grid cells in deep recurrent neural networks. In this work, we begin by identifying key insights from four families of approaches to answering the grid cell question: coding theory, dynamical systems, function optimization and supervised deep learning. We then leverage our insights to propose a new approach that combines the strengths of all four approaches. Our approach is a self-supervised learning (SSL) framework - including data, data augmentations, loss functions and a network architecture - motivated from a normative perspective, without access to supervised position information or engineering of particular readout representations as needed in previous approaches. We show that multiple grid cell modules can emerge in networks trained on our SSL framework and that the networks and emergent representations generalize well outside their training distribution. This work contains insights for neuroscientists interested in the origins of grid cells as well as machine learning researchers interested in novel SSL frameworks.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2309.08632 [pdf, other]

Pretraining on the Test Set Is All You Need

Authors: Rylan Schaeffer

Abstract: Inspired by recent work demonstrating the promise of smaller Transformer-based language models pretrained on carefully curated data, we supercharge such approaches by investing heavily in curating a novel, high quality, non-synthetic data mixture based solely on evaluation benchmarks. Using our novel dataset mixture consisting of less than 100 thousand tokens, we pretrain a 1 million parameter transformer-based LLM \textbf{phi-CTNL} (pronounced ``fictional") that achieves perfect results across diverse academic benchmarks, strictly outperforming all known foundation models. \textbf{phi-CTNL} also beats power-law scaling and exhibits a never-before-seen grokking-like ability to accurately predict downstream evaluation benchmarks' canaries.

Submitted 13 September, 2023; originally announced September 2023.

Comments: 3 pages, satire

arXiv:2307.10573 [pdf, other]

Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

Authors: Rylan Schaeffer, Kateryna Pistunova, Samar Khanna, Sarthak Consul, Sanmi Koyejo

Abstract: Language models can be prompted to reason through problems in a manner that significantly improves performance. However, \textit{why} such prompting improves performance is unclear. Recent work showed that using logically \textit{invalid} Chain-of-Thought (CoT) prompting improves performance almost as much as logically \textit{valid} CoT prompting, and that editing CoT prompts to replace problem-specific information with abstract information or out-of-distribution information typically doesn't harm performance. Critics have responded that these findings are based on too few and too easily solved tasks to draw meaningful conclusions. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically \textit{invalid} reasoning prompts do indeed achieve similar performance gains on BBH tasks as logically valid reasoning prompts. We also discover that some CoT prompts used by previous works contain logical errors. This suggests that covariates beyond logically valid reasoning are responsible for performance improvements.

Submitted 22 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: ICML 2023 Workshop: Knowledge and Logical Reasoning in the Era of Data-driven Learning

arXiv:2307.10569 [pdf, ps, other]

Deceptive Alignment Monitoring

Authors: Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo

Abstract: As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a seemingly reasonable manner, while secretly and subtly modifying its behavior for ulterior reasons is often referred to as deceptive alignment in the AI Safety & Alignment communities. Consequently, we call this new direction Deceptive Alignment Monitoring. In this work, we identify emerging directions in diverse machine learning subfields that we believe will become increasingly important and intertwined in the near future for deceptive alignment monitoring, and we argue that advances in these fields present both long-term challenges and new research opportunities. We conclude by advocating for greater involvement by the adversarial machine learning community in these emerging directions.

Submitted 25 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: Accepted as BlueSky Oral to 2023 ICML AdvML Workshop

arXiv:2307.10563 [pdf, other]

FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation

Authors: Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo

Abstract: We present FACADE, a novel probabilistic and geometric framework designed for unsupervised mechanistic anomaly detection in deep neural networks. Its primary goal is advancing the understanding and mitigation of adversarial attacks. FACADE aims to generate probabilistic distributions over circuits, which provide critical insights into their contribution to changes in the manifold properties of pseudo-classes, or high-dimensional modes in activation space, yielding a powerful tool for uncovering and combating adversarial attacks. Our approach seeks to improve model robustness and enhance scalable model oversight, and it demonstrates promising applications in real-world deployment settings.

Submitted 20 July, 2023; originally announced July 2023.

Comments: Accepted as BlueSky Poster at 2023 ICML AdvML Workshop

arXiv:2306.11698 [pdf, other]

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Authors: Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li

Abstract: Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance -- where mistakes can be costly. To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives -- including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially because GPT-4 follows (misleading) instructions more precisely. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
Our benchmark is publicly available at https://decodingtrust.github.io/ ; our dataset can be previewed at https://huggingface.co/datasets/AI-Secure/DecodingTrust ; a concise version of this work is at https://openreview.net/pdf?id=kaHpo8OZw2 .

Submitted 26 February, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023 Outstanding Paper (Datasets and Benchmarks Track)

arXiv:2304.15004 [pdf, other]

Are Emergent Abilities of Large Language Models a Mirage?

Authors: Rylan Schaeffer, Brando Miranda, Sanmi Koyejo

Abstract: Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher's choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous, predictable changes in model performance. We present our alternative explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities; (2) make, test and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how to choose metrics to produce never-before-seen seemingly emergent abilities in multiple vision tasks across diverse deep networks.
Via all three analyses, we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.

Submitted 22 May, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

arXiv:2303.14151 [pdf, other]

Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Authors: Rylan Schaeffer, Mikail Khona, Zachary Robertson, Akhilan Boopathy, Kateryna Pistunova, Jason W. Rocks, Ila Rani Fiete, Oluwasanmi Koyejo

Abstract: Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data undersampled) regime. This drop in test error flies against classical learning theory on overfitting and has arguably underpinned the success of large models in machine learning. This non-monotonic behavior of test loss depends on the number of data, the dimensionality of the data and the number of model parameters. Here, we briefly describe double descent, then provide an explanation of why double descent occurs in an informal and approachable manner, requiring only familiarity with linear algebra and introductory probability. We provide visual intuition using polynomial regression, then mathematically analyze double descent with ordinary linear regression and identify three interpretable factors that, when simultaneously all present, together create double descent. We demonstrate that double descent occurs on real data when using ordinary linear regression, then demonstrate that double descent does not occur when any of the three factors are ablated. We use this understanding to shed light on recent observations in nonlinear models concerning superposition and double descent. Code is publicly available.

Submitted 24 March, 2023; originally announced March 2023.

arXiv:2205.01212 [pdf, other]

Streaming Inference for Infinite Non-Stationary Clustering

Authors: Rylan Schaeffer, Gabrielle Kaili-May Liu, Yilun Du, Scott Linderman, Ila Rani Fiete

Abstract: Learning from a continuous stream of non-stationary data in an unsupervised manner is arguably one of the most common and most challenging settings facing intelligent agents. Here, we attack learning under all three conditions (unsupervised, streaming, non-stationary) in the context of clustering, also known as mixture modeling. We introduce a novel clustering algorithm that endows mixture models with the ability to create new clusters online, as demanded by the data, in a probabilistic, time-varying, and principled manner. To achieve this, we first define a novel stochastic process called the Dynamical Chinese Restaurant Process (Dynamical CRP), which is a non-exchangeable distribution over partitions of a set; next, we show that the Dynamical CRP provides a non-stationary prior over cluster assignments and yields an efficient streaming variational inference algorithm. We conclude with experiments showing that the Dynamical CRP can be applied on diverse synthetic and real data with Gaussian and non-Gaussian likelihoods.

Submitted 2 May, 2022; originally announced May 2022.

Comments: Published at the Workshop on Agent Learning in Open-Endedness (ALOE) at ICLR 2022

Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:19366-19387, 2022

arXiv:2202.10889 [pdf, other]

doi 10.1038/s41467-022-33154-7

Resolving molecular diffusion and aggregation of antibody proteins with megahertz X-ray free-electron laser pulses

Authors: Mario Reiser, Anita Girelli, Anastasia Ragulskaya, Sudipta Das, Sharon Berkowicz, Maddalena Bin, Marjorie Ladd-Parada, Mariia Filianina, Hanna-Friederike Poggemann, Nafisa Begam, Mohammad Sayed Akhundzadeh, Sonja Timmermann, Lisa Randolph, Yuriy Chushkin, Tilo Seydel, Ulrike Boesenberg, Jörg Hallmann, Johannes Möller, Angel Rodriguez-Fernandez, Robert Rosca, Robert Schaffer, Markus Scholz, Roman Shayduk, Alexey Zozulya, Anders Madsen , et al. (4 additional authors not shown)

Abstract: X-ray free-electron lasers (XFELs) with megahertz repetition rate can provide novel insights into structural dynamics of biological macromolecule solutions. However, very high dose rates can lead to beam-induced dynamics and structural changes due to radiation damage. Here, we probe the dynamics of dense antibody protein (Ig-PEG) solutions using megahertz X-ray photon correlation spectroscopy (MHz-XPCS) at the European XFEL. By varying the total dose and dose rate, we identify a regime for measuring the motion of proteins in their first coordination shell, quantify XFEL-induced effects such as driven motion, and map out the extent of agglomeration dynamics. The results indicate that for average dose rates below $1.06\,\mathrm{kGy}\,\mathrm{μs}^{-1}$ in a time window up to $10\,\mathrm{μs}$, it is possible to capture the protein dynamics before the onset of beam-induced aggregation. We refer to this approach as correlation before aggregation and demonstrate that MHz-XPCS bridges an important spatio-temporal gap in measurement techniques for biological samples.

Submitted 5 October, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: 22 pages, 6 figures

Journal ref: Nat Commun 13, 5528 (2022)

arXiv:2202.06892 [pdf, other]

DeCorus: Hierarchical Multivariate Anomaly Detection at Cloud-Scale

Authors: Bruno Wassermann, David Ohana, Ronen Schaffer, Robert Shahla, Elliot K. Kolodner, Eran Raichstein, Michal Malka

Abstract: Multivariate anomaly detection can be used to identify outages within large volumes of telemetry data for computing systems. However, developing an efficient anomaly detector that can provide users with relevant information is a challenging problem. We introduce our approach to hierarchical multivariate anomaly detection called DeCorus, a statistical multivariate anomaly detector which achieves linear complexity. It extends standard statistical techniques to improve their ability to find relevant anomalies within noisy signals and makes use of types of domain knowledge that system operators commonly possess to compute system-level anomaly scores. We describe the implementation of DeCorus, an online log anomaly detection tool for network device syslog messages deployed at a cloud service provider. We use real-world data sets that consist of $1.5$ billion network device syslog messages and hundreds of incident tickets to characterize the performance of DeCorus and compare its ability to detect incidents with five alternative anomaly detectors. While DeCorus outperforms the other anomaly detectors, all of them are challenged by our data set. We share how DeCorus provides value in the field and how we plan to improve its incident detection accuracy.

Submitted 14 February, 2022; originally announced February 2022.

Comments: 11 pages, 4 figures, draft

arXiv:2111.03745 [pdf, other]

An Algorithmic Theory of Metacognition in Minds and Machines

Authors: Rylan Schaeffer

Abstract: Humans sometimes choose actions that they themselves can identify as sub-optimal, or wrong, even in the absence of additional information. How is this possible? We present an algorithmic theory of metacognition based on a well-understood trade-off in reinforcement learning (RL) between value-based RL and policy-based RL. To the cognitive (neuro)science community, our theory answers the outstanding question of why information can be used for error detection but not for action selection. To the machine learning community, our proposed theory creates a novel interaction between the Actor and Critic in Actor-Critic agents and notes a novel connection between RL and Bayesian Optimization. We call our proposed agent the Metacognitive Actor Critic (MAC). We conclude by showing how to create metacognition in machines by implementing a deep MAC and showing that it can detect (some of) its own suboptimal actions without external information or delay.

Submitted 5 November, 2021; originally announced November 2021.

arXiv:1706.07982 [pdf, ps, other]

doi 10.1063/1.4986291

The nature of three-body interactions in DFT: exchange and polarization effects

Authors: Michał Hapka, Łukasz Rajchel, Marcin Modrzejewski, Rainer Schäffer, Grzegorz Chałasiński, Małgorzata M. Szczęśniak

Abstract: We propose a physically motivated decomposition of DFT 3-body nonadditive interaction energies into the exchange and density-deformation (polarization) components. The exchange component represents the effect of the Pauli exclusion in the wave function of the trimer and is found to be challenging for density functional approximations (DFAs). The remaining density-deformation nonadditivity is less dependent upon the DFAs. Numerical demonstration is carried out for rare gas atom trimers, Ar$_2$-HX (X = F, Cl) complexes, and small hydrogen-bonded and van der Waals molecular systems. None of the tested semilocal, hybrid, and range-separated DFAs properly accounts for the nonadditive exchange in dispersion-bonded trimers. By contrast, for hydrogen-bonded systems range-separated hybrids achieve a qualitative agreement to within 20% of the reference exchange energy. A reliable performance for all systems is obtained only when the monomers interact through the Hartree-Fock potential in the dispersion-free Pauli Blockade scheme. Additionally, we identify the nonadditive second-order exchange-dispersion energy as an important but overlooked contribution in force-field-like dispersion corrections. Our results suggest that range-separated functionals do not include this component although semilocal and global hybrid DFAs appear to imitate it in the short range.

Submitted 10 August, 2017; v1 submitted 24 June, 2017; originally announced June 2017.

arXiv:1703.09220 [pdf, other]

doi 10.1103/PhysRevB.96.165117

Fermionic spin liquid analysis of the paramagnetic state in Volborthite

Authors: Li Ern Chern, Robert Schaffer, Sopheak Sorn, Yong Baek Kim

Abstract: Recently, a thermal Hall effect has been observed in the paramagnetic state of Volborthite, which consists of distorted Kagome layers with $S=1/2$ local moments. Despite the appearance of a magnetic order below $1 \, \mathrm{K}$, the response to external magnetic field and unusual properties of the paramagnetic state above $1 \, \mathrm{K}$ suggest possible realization of exotic quantum phases. Motivated by these discoveries, we investigate possible spin liquid phases with fermionic spinon excitations in a non-symmorphic version of the Kagome lattice, which belongs to the two-dimensional crystallographic group $p2gg$. This non-symmorphic structure is consistent with the spin model obtained in the density functional theory (DFT) calculation. Using projective symmetry group (PSG) analysis and fermionic parton mean field theory, we identify twelve distinct $\mathbb{Z}_2$ spin liquid states, four of which are found to have correspondence in the eight Schwinger boson spin liquid states we classified earlier. We focus on the four fermionic states with bosonic counterparts and find that the spectra of their corresponding root $U(1)$ states feature a spinon Fermi surface. The existence of a spinon Fermi surface in candidate spin liquid states may offer a possible explanation of the finite thermal Hall conductivity observed in Volborthite.

Submitted 27 March, 2017; originally announced March 2017.

Comments: 16 pages, 8 figures, 3 tables

Journal ref: Phys. Rev. B 96, 165117 (2017)

arXiv:1605.05322 [pdf, other]

doi 10.1103/PhysRevB.95.054410

Quantum Spin Liquid in a Breathing Kagome Lattice

Authors: Robert Schaffer, Yejin Huh, Kyusung Hwang, Yong Baek Kim

Abstract: Motivated by recent experiments on the vanadium oxyfluoride material DQVOF, we examine possible spin liquid phases on a breathing kagome lattice of S=1/2 spins. By performing a projective symmetry group analysis, we determine the possible phases for both fermionic and bosonic $\mathbb{Z}_2$ spin liquids on this lattice, and establish the correspondence between the two. The nature of the ground state of the Heisenberg model on the isotropic kagome lattice is a hotly debated topic, with both $\mathbb{Z}_2$ and U(1) spin liquids argued to be plausible ground states. Using variational Monte Carlo techniques, we show that a gapped $\mathbb{Z}_2$ spin liquid emerges as the clear ground state in the presence of this breathing anisotropy. Our results suggest that the breathing anisotropy helps to stabilize this spin liquid ground state, which may aid us in understanding the results of experiments and help to direct future numerical studies on these systems.

Submitted 17 May, 2016; originally announced May 2016.

Journal ref: Phys. Rev. B 95, 054410 (2017)

arXiv:1512.02224 [pdf, other]

doi 10.1088/0034-4885/79/9/094504

Recent progress on correlated electron systems with strong spin-orbit coupling

Authors: Robert Schaffer, Eric Kin-Ho Lee, Bohm-Jung Yang, Yong Baek Kim

Abstract: Emergence of novel quantum ground states in correlated electron systems with strong spin-orbit coupling has been a recent subject of intensive studies. While it has been realized that spin-orbit coupling can provide non-trivial band topology in weakly interacting electron systems, as in topological insulators and semi-metals, the role of electron-electron interaction in strongly spin-orbit coupled systems has not been fully understood. The availability of new materials with significant electron correlation and strong spin-orbit coupling now makes such investigations possible. Many of these materials contain 5d or 4d transition metal elements; the prominent examples are iridium oxides or iridates. In this review, we succinctly discuss recent theoretical and experimental progress on this subject. After providing a brief overview, we focus on pyrochlore iridates and three-dimensional honeycomb iridates. In pyrochlore iridates, we discuss the quantum criticality of the bulk and surface states, and the relevance of the surface/boundary states in a number of topological and magnetic ground states, both in the bulk and thin film configurations. Experimental signatures of these boundary and bulk states are discussed. Domain wall formation and strongly-direction-dependent magneto-transport are also discussed. Regarding the three-dimensional honeycomb iridates, we consider possible quantum spin liquid phases and unusual magnetic orders in theoretical models with strongly bond-dependent interactions.
These theoretical ideas and results are discussed in light of recent resonant X-ray scattering experiments on three-dimensional honeycomb iridates. We also contrast these results with the situation in two-dimensional honeycomb iridates. We conclude with the outlook on other related systems.

Submitted 7 December, 2015; originally announced December 2015.

Comments: 19 pages plus references, submitted to Reports on Progress in Physics

Journal ref: Reports on Progress in Physics 79, 9, 094504 (2016)

arXiv:1409.5125 [pdf, other]

doi 10.1103/PhysRevLett.114.116803

Topological spinon semimetals and gapless boundary states in three dimensions

Authors: Robert Schaffer, Eric Kin-Ho Lee, Yuan-Ming Lu, Yong Baek Kim

Abstract: Recently there has been much effort in understanding topological phases of matter with gapless bulk excitations, which are characterized by topological invariants and protected intrinsic boundary states. Here we show that topological semimetals of Majorana fermions arise in exactly solvable Kitaev spin models on a series of three dimensional lattices. The ground states of these models are quantum spin liquids with gapless nodal spectra of bulk Majorana fermion excitations. It is shown that these phases are topologically stable as long as certain discrete symmetries are protected. The corresponding topological indices and the gapless boundary states are explicitly computed to support these results. In contrast to previous studies of non-interacting systems, the phases discussed in this work are novel examples of gapless topological phases in interacting spin systems.

Submitted 18 March, 2015; v1 submitted 17 September, 2014; originally announced September 2014.

Comments: 5 pages and 2 figures. Supplemental material: 2 pages

Journal ref: Phys. Rev. Lett. 114, 116803 (2015)

arXiv:1308.6592 [pdf, other]

doi 10.1103/PhysRevB.89.045117

Heisenberg-Kitaev model on hyperhoneycomb lattice

Authors: Eric Kin-Ho Lee, Robert Schaffer, Subhro Bhattacharjee, Yong Baek Kim

Abstract: Motivated by recent experiments on $β$-Li$_2$IrO$_3$, we study the phase diagram of the Heisenberg-Kitaev model on a three dimensional lattice of tri-coordinated Ir$^{4+}$, dubbed the hyperhoneycomb lattice by Takagi et al. The lattice geometry of this material, along with Ir$^{4+}$ ions carrying $J_{\rm eff}=1/2$ moments, suggests that the Heisenberg-Kitaev model may effectively capture the low energy spin-physics of the system in the strong-coupling limit. Using a combination of semiclassical analysis, exact solution and slave-fermion mean field theory, we find, in addition to the spin-liquid, four different magnetically ordered phases depending on the parameter regime. All four magnetic phases (the Néel, the polarized ferromagnet, the skew-stripy and the skew-zig-zag) have collinear spin ordering. The three dimensional Z$_2$ spin liquid, which extends over an extended parameter regime around the exactly solvable Kitaev point, has a gapless Majorana mode with a deformed Fermi-circle (co-dimensions, $d_c=2$). We discuss the effect of the magnetic field and finite temperature on different phases that may be relevant for future experiments.

Submitted 20 January, 2014; v1 submitted 29 August, 2013; originally announced August 2013.

Comments: 11 pages, 15 figures; added new appendix, corrected typos, updated journal reference

Journal ref: Phys. Rev. B 89, 045117 (2014)

arXiv:1304.2766 [pdf, other]

doi 10.1103/PhysRevB.88.174405

Spin-orbital liquids in non-Kramers magnet on Kagome lattice

Authors: Robert Schaffer, Subhro Bhattacharjee, Yong Baek Kim

Abstract: Localized magnetic moments with crystal-field doublet or pseudo-spin 1/2 may arise in correlated insulators with even number of electrons and strong spin-orbit coupling. Such a non-Kramers pseudo-spin 1/2 is the consequence of crystalline symmetries as opposed to the Kramers doublet arising from time-reversal invariance, and is necessarily a composite of spin and orbital degrees of freedom. We investigate possible spin-orbital liquids with fermionic spinons for such non-Kramers pseudo-spin 1/2 systems on the Kagome lattice. Using the projective symmetry group analysis, we find ten new phases that are not allowed in the corresponding Kramers systems. These new phases are allowed due to unusual action of the time reversal operation on non-Kramers pseudo-spins. We compute the spin-spin dynamic structure factor that shows characteristic features of these non-Kramers spin-orbital liquids arising from their unusual coupling to neutrons, which is therefore relevant for neutron scattering experiments. We also point out possible anomalous broadening of Raman scattering intensity that may serve as a signature experimental feature for gapless non-Kramers spin-orbital liquids.

Submitted 9 April, 2013; originally announced April 2013.

Comments: 11 pages, 4 figures

Journal ref: Phys. Rev. B 88, 174405 (2013)

arXiv:1206.5814 [pdf, other]

doi 10.1103/PhysRevB.86.224417

Quantum Phase Transition in Heisenberg-Kitaev Model

Authors: Robert Schaffer, Subhro Bhattacharjee, Yong Baek Kim

Abstract: We explore the nature of the quantum phase transition between a magnetically ordered state with collinear spin pattern and a gapless $Z_2$ spin liquid in the Heisenberg-Kitaev model. We construct a slave particle mean field theory for the Heisenberg-Kitaev model in terms of complex fermionic spinons. It is shown that this theory, formulated in the appropriate basis, is capable of describing the Kitaev spin liquid as well as the transition between the gapless $Z_2$ spin liquid and the so-called stripy antiferromagnet. In particular, within a mean field theory, we have a discontinuous transition from the $Z_2$ spin liquid to the stripy antiferromagnet. We argue, however, that subtle spinon confinement effects, associated with the instability of gapped U(1) spin liquid in two spatial dimensions, are playing an important role at the transition. The possibility of an exotic continuous transition is briefly addressed.

Submitted 13 March, 2013; v1 submitted 25 June, 2012; originally announced June 2012.

Comments: 12 pages, 6 figures

Journal ref: Phys. Rev. B 86, 224417 (2012)

arXiv:1107.2946 [pdf, ps, other]

doi 10.1088/0004-637X/740/2/51

Average Heating Rate of Hot Atmospheres in Distant Clusters by Radio AGN: Evidence for Continuous AGN Heating

Authors: C. -J. Ma, B. R. McNamara, P. E. J. Nulsen, R. Schaffer, A. Vikhlinin

Abstract: We examine atmospheric heating by radio active galactic nuclei (AGN) in distant X-ray clusters by cross correlating clusters selected from the 400 Square Degree (400SD) X-ray Cluster survey with radio sources in the NRAO VLA Sky Survey. Roughly 30% of the clusters show radio emission above a flux threshold of 3 mJy within a projected radius of 250 kpc. The radio emission is presumably associated with the brightest cluster galaxy. The mechanical jet power for each radio source was determined using scaling relations between radio power and cavity (mechanical) power determined for nearby clusters, groups, and galaxies with hot atmospheres containing X-ray cavities. The average jet power of the central radio AGN is approximately $2\times 10^{44}$ erg s$^{-1}$. We find no significant correlation between radio power, hence mechanical jet power, and the X-ray luminosities of clusters in the redshift range 0.1 -- 0.6. This implies that the mechanical heating rate per particle is higher in lower mass, lower X-ray luminosity clusters. The jet power averaged over the sample corresponds to an atmospheric heating of approximately 0.2 keV per particle within R$_{500}$. Assuming the current AGN heating rate does not evolve but remains constant to redshifts of 2, the heating rate per particle would rise by a factor of two. We find that the energy injected from radio AGN contributes substantially to the excess entropy in hot atmospheres needed to break self-similarity in cluster scaling relations. The detection frequency of radio AGN is inconsistent with the presence of strong cooling flows in 400SD clusters, but does not exclude weak cooling flows. It is unclear whether central AGN in 400SD clusters are maintained by feedback at the base of a cooling flow. Atmospheric heating by radio AGN may retard the development of strong cooling flows at early epochs.

Submitted 16 August, 2011; v1 submitted 14 July, 2011; originally announced July 2011.

Comments: ApJ in press

arXiv:0904.3751 [pdf, ps, other]

doi 10.1063/1.3151844

Quantum fields on curved spacetimes and a new look at the Unruh effect

Authors: Ugo Moschella, Richard Schaeffer

Abstract: We describe a new viewpoint on canonical quantization of linear fields on a general curved background that encompasses and generalizes the standard treatment of canonical QFT given in textbooks. Our method permits the construction of pure states and mixed states with the same technique. We apply our scheme to the study of Rindler QFT and we present a new derivation of the Unruh effect based on invariance arguments.

Submitted 23 April, 2009; originally announced April 2009.

Journal ref: AIP Conf.Proc.1132:303-332,2009

arXiv:0811.1224 [pdf, ps, other]

doi 10.1103/PhysRevB.80.014503

Superfluid and supersolid phases of lattice bosons with ring-exchange interaction

Authors: Robert Schaffer, Anton A. Burkov, Roger G. Melko

Abstract: We examine the superfluid phase of a hard-core boson model with nearest-neighbor exchange J and four-particle ring-exchange K at half-filling on the square lattice. At zero temperature we find that the superfluid in the pure-J model is quickly destroyed by the inclusion of negative-K ring-exchange interactions, favoring a state with a (pi,pi) ordering wavevector. Minimization of the mean-field energy suggests that a supersolid state with coexisting superfluidity, charge-density wave, and valence-bond-like order is formed. We also study the behavior of the finite-T Kosterlitz-Thouless phase transition in the superfluid phase, by forcing the Nelson-Kosterlitz universal jump condition on the finite-T spin wave superfluid density. Away from the pure J point, T_{KT} decreases rapidly for negative K, while for positive K, T_{KT} reaches a maximum at some K \neq 0 in agreement with recent quantum Monte Carlo simulations.

Submitted 7 November, 2008; originally announced November 2008.

Comments: 7 pages, 5 figures

Journal ref: Phys. Rev. B 80, 014503 (2009)

arXiv:0802.2447 [pdf, ps, other]

doi 10.1088/1475-7516/2009/02/033

A note on canonical quantization of fields on a manifold

Authors: Ugo Moschella, Richard Schaeffer

Abstract: We propose a general construction of quantum states for linear canonical quantum fields on a manifold, which encompasses and generalizes the "standard" procedures existing in textbooks. Our method provides pure and mixed states on the same footing. A large class of examples finds a simple and unified treatment in our approach. Applications discussed here include thermodynamical equilibrium states for Minkowski fields and quantum field theory in the Rindler's and in the open de Sitter universes. Our approach puts the above examples into perspective and unravels new possibilities for quantization. We call our generalization "extended canonical quantization" as it is suited to attack cases not directly covered by the standard canonical approach.

Submitted 1 April, 2009; v1 submitted 18 February, 2008; originally announced February 2008.

Journal ref: JCAP 0902:033,2009

arXiv:0709.2795 [pdf, ps, other]

doi 10.1088/0264-9381/24/14/003

Quantum Theory on Lobatchevski Spaces

Authors: Ugo Moschella, Richard Schaeffer

Abstract: In this paper we set up a general formalism to deal with quantum theories on a Lobatchevski space, i.e. a spatial manifold that is homogeneous, isotropic and has negative curvature. The heart of our approach is the construction of a suitable basis of plane waves which are eigenfunctions of the Laplace-Beltrami operator relative to the geometry of the curved space. These functions were previously introduced in the mathematical literature in the context of group theory; here we revisit and adapt the formalism in a way specific for quantum mechanics. Our developments render dealing with Lobatchevski spaces, which used to be quite difficult and source of controversies, easily tractable. Applications to the Milne and de Sitter universes are discussed as examples.

Submitted 18 September, 2007; originally announced September 2007.

Journal ref: Class.Quant.Grav.24:3571-3602,2007

arXiv:astro-ph/0410591 [pdf, ps, other]

doi 10.1051/0004-6361:20042238

Constraints on Dark Matter interactions from structure formation: Damping lengths

Authors: Celine Boehm, Richard Schaeffer

Abstract: (Shortened) Weakly Interacting Massive Particles are often said to be the best Dark Matter candidates. Studies have shown however that rather large Dark Matter-photon or Dark Matter-baryon interactions could be allowed by cosmology. Here we address the question of the role of the Dark Matter interactions in more detail to determine to what extent Dark Matter necessarily has to be weakly interacting. To this purpose, we compute the collisional damping (and free-streaming) lengths of generic interacting Dark Matter candidates and compare them to the scale of the smallest primordial structures known to exist in the Universe. We obtain necessary conditions that any candidate must satisfy. We point out the existence of new Dark Matter scenarios and exhibit new damping regimes. For example, an interacting candidate may undergo damping similar to that of collisionless Warm Dark Matter particles. The main difference is due to the Dark Matter coupling to interacting (or even freely-propagating) species. Our approach yields a general classification of Dark Matter candidates which extends the definitions of the usual Cold, Warm and Hot Dark Matter scenarios when interactions, weak or strong, are considered.

Submitted 25 October, 2004; originally announced October 2004.

Comments: 35p

arXiv:astro-ph/0212449 [pdf, ps, other]

doi 10.1046/j.1365-8711.2003.06781.x

The phase-diagram of the IGM and the entropy floor of groups and clusters: are clusters born warm?

Authors: P. Valageas, R. Schaeffer, J. Silk

Abstract: We point out that two problems of observational cosmology, the facts i) that > 60% of the baryonic content of the universe is not observed at z=0 and ii) that the properties of small clusters do not agree with simple expectations, could be closely related. As shown by recent studies, the shock-heating associated with the formation of large-scale structures heats the intergalactic medium (IGM) and leads to a ``warm IGM'' component for the gas. In the same spirit, we suggest the intracluster medium (ICM) to be a mixture of galaxy-recycled, metal enriched gas and intergalactic gas, shock-heated by the collapsing much larger scales. This could be obtained through two processes: 1) the late infalling gas from the external warm IGM is efficiently mixed within the halo and brings some additional entropy, or 2) the shocks generated by larger non-linear scales are also present within clusters and can heat the ICM. We show that if assumption (1) holds, the entropy brought by the warm IGM is sufficient to explain the observed properties of clusters, in particular the entropy floor and the LX-T relation. On the other hand, we briefly note that the scenario (2) would require a stronger shock-heating because of the larger density of the ICM as compared with filaments. Our scenario of clusters being "born warm" can be checked through the predicted redshift evolution of the entropy floor.

Submitted 29 September, 2003; v1 submitted 19 December, 2002; originally announced December 2002.

Comments: 8 pages, final version published in MNRAS

Journal ref: Mon.Not.Roy.Astron.Soc. 344 (2003) 53

arXiv:astro-ph/0205406 [pdf, ps, other]

Constraining the strength of Dark Matter Interactions from Structure Formation

Authors: Celine Boehm, Pierre Fayet, Richard Schaeffer

Abstract: We discuss the damping of primordial dark matter fluctuations, taking into account explicitly the interactions of dark matter - whatever their intensity - both with itself and with other particle species. Relying on a general classification of dark matter particle candidates, our analysis provides, from structure formation, a new set of constraints on the dark matter particle mass and interaction rates (in particular with photons and neutrinos). This determines up to which cross sections the dark matter interactions may effectively be disregarded, and when they start playing an essential role, either through collisional damping or through an enhancement of the free-streaming scale. It leads us to extend the notions of Cold, Warm and Hot Dark Matter scenarios when dark matter interactions are no longer taken to be negligible. It also suggests the possibility of new scenarios of Collisional Warm Dark Matter, with moderate damping induced by dark matter interactions.

Submitted 23 May, 2002; originally announced May 2002.

Comments: 12 pages. Invited talk at DARK 2002, 4th Int. Conf. on Dark Matter in Astro and Particle Physics, Cape Town, Feb. 2002

Report number: LPTENS-02/34

arXiv:astro-ph/0112522 [pdf, ps, other]

doi 10.1103/PhysRevD.66.083505

Interacting Dark Matter disguised as Warm Dark Matter

Authors: Celine Boehm, Alain Riazuelo, Steen H. Hansen, Richard Schaeffer

Abstract: We explore some of the consequences of Dark Matter-photon interactions on structure formation, focusing on the evolution of cosmological perturbations and performing both an analytical and a numerical study. We compute the cosmic microwave background anisotropies and matter power spectrum in this class of models. We find, as the main result, that when Dark Matter and photons are coupled, Dark Matter perturbations can experience a new damping regime in addition to the usual collisional Silk damping effect. Such Dark Matter particles (having quite large photon interactions) behave like Cold Dark Matter or Warm Dark Matter as far as the cosmic microwave background anisotropies or matter power spectrum are concerned, respectively. These Dark Matter-photon interactions leave specific imprints at sufficiently small scales on both of these two spectra, which may allow to put new constraints on the acceptable photon-Dark Matter interactions. Under the conservative assumption that the abundance of $10^{12} M_\odot$ galaxies is correctly given by Cold Dark Matter, and without any knowledge of the abundance of smaller objects, we obtain the limit on the ratio of the Dark Matter-photon cross section to the Dark Matter mass $\sigma_{\gamma-\mathrm{DM}} / m_{\mathrm{DM}} < 10^{-6} \sigma_{\mathrm{Thomson}} / 100\ \mathrm{GeV} \sim 6 \times 10^{-33}\ \mathrm{cm^2\ GeV^{-1}}$.

Submitted 11 September, 2002; v1 submitted 21 December, 2001; originally announced December 2001.

Comments: 14 pages, 5 figures, to appear in PRD

Report number: SPhT-Saclay t01/147

Journal ref: Phys.Rev. D66 (2002) 083505

arXiv:astro-ph/0112273 [pdf, ps, other]

doi 10.1051/0004-6361:20020548

The phase-diagram of cosmological baryons

Authors: P. Valageas, R. Schaeffer, J. Silk

Abstract: We investigate the behaviour of cosmological baryons at low redshifts $z<5$ after reionization through analytic means. In particular, we study the density-temperature phase-diagram which describes the history of the gas. We show how the location of the matter in this $(ρ,T)$ diagram expresses the various constraints implied by usual hierarchical scenarios. This yields robust model-independent results which agree with numerical simulations. The IGM is seen to be formed via two phases: a ``cool'' photo-ionized component and a ``warm'' component governed by shock-heating. We also briefly describe how the remainder of the matter is distributed over galaxies, groups and clusters. We recover the fraction of matter and the spatial clustering computed by numerical simulations. We also check that the soft X-ray background due to the ``warm'' IGM component is consistent with observations. We find in the present universe a baryon fraction of 7% in hot gas, 24% in the warm IGM, 38% in the cool IGM, 9% within star-like objects and, as a still un-observed component, 22% of dark baryons associated with collapsed structures, with a relative uncertainty no larger than 30% on these numbers.

Submitted 9 April, 2002; v1 submitted 12 December, 2001; originally announced December 2001.

Comments: 17 pages, accepted by A&A. This final version contains a more detailed discussion of the physics of the IGM and of the properties of the Warm IGM

Journal ref: Astron.Astrophys. 388 (2002) 741

arXiv:astro-ph/0012504 [pdf, ps, other]

doi 10.1016/S0370-2693(01)01060-7

Constraining Dark Matter candidates from structure formation

Authors: C. Boehm, P. Fayet, R. Schaeffer

Abstract: We show that collisional damping of adiabatic primordial fluctuations yields constraints on the possible range of mass and interaction rates of Dark Matter particles. Our analysis relies on a general classification of Dark Matter candidates, that we establish independently of any specific particle theory or model. From a relation between the collisional damping scale and the Dark Matter interaction rate, we find that Dark Matter candidates must have cross-sections at decoupling smaller than $ 10^{-33} \frac{m_{dm}}{1 MeV} cm^2$ with photons and $10^{-37} \frac{m_{dm}}{1 MeV} cm^2$ with neutrinos, to explain the observed primordial structures of $10^9$ Solar mass. These damping constraints are particularly relevant for Warm Dark Matter candidates. They also leave open less known regions of parameter space corresponding to particles having rather high interaction rates with other species than neutrinos and photons.

Submitted 1 April, 2001; v1 submitted 27 December, 2000; originally announced December 2000.

Comments: 9 pages, 1 figure. Our results on induced-damping were initially expressed in terms of momentum-weighted average cross-sections. We specify how these are related to ordinary cross-sections

Journal ref: Phys.Lett.B518:8-14,2001

arXiv:hep-th/0003098 [pdf, ps, other]

doi 10.1016/S0550-3213(00)00280-7

Decomposing Quantum Fields on Branes

Authors: M. Bertola, J. Bros, V. Gorini, U. Moschella, R. Schaeffer

Abstract: We provide a method to decompose the two-point function of a quantum field on a warped manifold in terms of fields living on a lower-dimensional manifold. We discuss explicit applications to Minkowski, de Sitter and anti-de Sitter quantum field theories. This decomposition presents a remarkable analogy with the holography principle, in the sense that physics in d+1 dimensions may be encoded into the physics in one dimension less. Moreover in a context a la Randall--Sundrum, the method outlined here allows a mechanism of generation of mass-spectra in the 3-brane (or more generally a d-1-brane).

Submitted 13 March, 2000; originally announced March 2000.

Comments: 25 pages

Journal ref: Nucl.Phys. B581 (2000) 575-603

arXiv:astro-ph/0001207 [pdf, ps, other]

doi 10.1051/0004-6361:20000351

The redshift evolution of bias and baryonic matter distribution

Authors: P. Valageas, J. Silk, R. Schaeffer

Abstract: We study the distribution of baryonic and luminous matter within the framework of a hierarchical scenario. Using an analytical model for structure formation which has already been checked against observations for galaxies, Lyman-$α$ clouds, clusters and reionization processes, we present its predictions for the bias of these objects. We describe its dependence on the luminosity (for galaxies or quasars) or the column density (for Lyman-$α$ absorbers) of the considered objects. We also study its redshift evolution, which can exhibit an intricate behaviour. These astrophysical objects do not trace the dark matter density field, the Lyman-$α$ forest clouds being undercorrelated and the bright galaxies overcorrelated, while the intermediate class of Lyman-limit systems is seen to sample the matter field quite well. We also present the distribution of baryonic matter over these various objects. We show that light does not trace baryonic mass, since bright galaxies which contain most of the stars only form a small fraction of the mass associated with virialized and cooled halos. We consider two cosmologies: a critical density universe and an open universe. In both cases, our results agree with observations and show that hierarchical scenarios provide a good model for structure formation and can describe a wide range of objects which spans at least the seven orders of magnitude in mass for which data exist. More detailed observations, in particular of the clustering evolution of galaxies, will constrain the astrophysical models involved.

Submitted 21 February, 2001; v1 submitted 12 January, 2000; originally announced January 2000.

Comments: 13 pages, final version published in A&A

Journal ref: Astron.Astrophys. 366 (2001) 363

arXiv:astro-ph/9909370 [pdf, ps, other]

Multiplicity Functions and X-ray emission of Clusters and Groups versus Galaxies and Quasars

Authors: P. Valageas, R. Schaeffer

Abstract: We use a unified analytical formulation for the multiplicity functions of clusters and galaxies which is free from the cloud-in-cloud problem encountered in earlier approaches and well adapted to the description of the non-linear clustering features. It is especially suited to simultaneously describe rich clusters, groups and galaxies, consistently with the hierarchical picture of gravitational clustering and their evolution in time. Using a simple model for the X-ray luminosity (taking into account entropy considerations), we obtain the X-ray luminosity distribution of groups and clusters. Then, using the same formalism we derive the galaxy and quasar multiplicity functions. In particular, we show that the use of the standard Press-Schechter prescription leads to erroneous conclusions at low redshifts while our approach provides a reasonable agreement with observations in a natural fashion because it is able to distinguish galactic halos from groups or clusters. Thus, we obtain a global and consistent picture of the X-ray emissions from all structures. In particular, we show that future observations (e.g., from AXAF) could provide interesting information on galaxy evolution. Indeed, they will constrain the importance of a possible hot diffuse gaseous phase in galactic halos and they could reveal massive galaxies which are just being formed, through the X-ray emission of their cooling gas.

Submitted 29 August, 2000; v1 submitted 22 September, 1999; originally announced September 1999.

Comments: 23 pages, final version published in A&A. Improved modeling of the temperature - X-ray luminosity relation for clusters. More detailed discussion of the need to properly distinguish galactic halos from simple "just-virialized" objects in order to draw meaningful conclusions for the luminosity functions of galaxies and QSOs

Journal ref: A&A (2000), 359, 821

arXiv:hep-th/9908140 [pdf, ps, other]

AdS/CFT correspondence for n-point functions

Authors: M. Bertola, J. Bros, U. Moschella, R. Schaeffer

Abstract: We provide a new general setting for scalar interacting fields on the covering of a d+1-dimensional AdS spacetime. The formalism is used at first to construct a one-parameter family of field theories, each living on a corresponding spacetime submanifold of AdS, which is a cylinder $R\times S_{d-1}$. We then introduce a limiting procedure which directly produces Luescher-Mack CFT's on the covering of the AdS asymptotic cone. Our AdS/CFT correspondence is generally valid for interacting fields, and is illustrated by a complete treatment of two-point functions, the case of Klein-Gordon fields appearing as particularly simple in our context. We also show how the Minkowskian representation of these boundary CFT's can be directly generated by an alternative limiting procedure involving Minkowskian theories in horocyclic sections (nowadays called (d-1)-branes, 3-branes for AdS_5). These theories are restrictions to the brane of the ambient AdS field theory considered. This provides a more general correspondence between the AdS field theory and a Poincaré invariant QFT on the brane, satisfying all the Wightman axioms. The case of two-point functions is again studied in detail from this viewpoint as well as the CFT limit on the boundary.

Submitted 21 February, 2000; v1 submitted 20 August, 1999; originally announced August 1999.

Comments: 26 pages, changed abstract, added references, added introductory considerations, some other changes

arXiv:hep-th/9906035 [pdf, ps, other]

doi 10.1016/S0370-2693(99)00927-2

Correspondence between Minkowski and de Sitter Quantum Field Theory

Authors: Marco Bertola, Vittorio Gorini, Ugo Moschella, Richard Schaeffer

Abstract: In this letter we show that the ``preferred'' Klein-Gordon Quantum Field Theories (QFT's) on a d-dimensional de Sitter spacetime can be obtained from a Klein-Gordon QFT on a (d+1)-dimensional ``ambient'' Minkowski spacetime satisfying the spectral condition and, conversely, that a Klein-Gordon QFT on a (d+1)-dimensional ``ambient'' Minkowski spacetime satisfying the spectral condition can be obtained as superposition of d-dimensional de Sitter Klein-Gordon fields in the preferred vacuum. These results establish a correspondence between QFT's living on manifolds having different dimensions. The method exposed here can be applied to study other situations and notably QFT on Anti de Sitter spacetime.

Submitted 15 September, 1999; v1 submitted 4 June, 1999; originally announced June 1999.

Comments: 7 pages, no figures, typos corrected, added one reference

Journal ref: Phys.Lett.B462:249-253,1999

arXiv:astro-ph/9903388 [pdf, ps, other]

The redshift evolution of Lyman-$α$ absorbers

Authors: P. Valageas, R. Schaeffer, J. Silk

Abstract: We present a model for the Lyman-alpha absorbers that treats all objects (from the low-density forest clouds to the dense damped systems) in a unified description. This approach is consistent with an earlier model of galaxies (luminosity function, metallicity) but also with the known description of the density field in the small-scale non-linear regime. We consider two cosmological models: a critical universe $Ω=1$ with a CDM power-spectrum, and an open CDM universe with $Ω_0=0.3$, $Λ=0$. We reproduce the available data on column density distribution as a function of redshift, the value of the main new parameter, the background ionizing UV flux, being consistent with the observed limits. This allows a quantitatively trustable analytical description of the opacity, mass, size, velocity dispersion and metallicity of these absorbers, over a range of column densities spanning 10 orders of magnitude. Moreover, together with an earlier model of galaxy formation this draws a unified picture of the redshift evolution of structures in the universe, from underdense clouds to massive high density galaxies, from weak to very deep potential wells.

Submitted 25 March, 1999; originally announced March 1999.

Comments: 23 pages, accepted for publication by A&A

Journal ref: Astron.Astrophys. 345 (1999) 691

arXiv:astro-ph/9903387 [pdf, ps, other]

Halo correlations in nonlinear cosmic density fields

Authors: F. Bernardeau, R. Schaeffer

Abstract: The question we address in this paper is the determination of the correlation properties of the dark matter halos appearing in cosmic density fields once they have undergone a strongly nonlinear evolution induced by gravitational dynamics. Assuming that the high-order correlation functions of the matter field behave as products of two-body correlation functions, we derive the correlation properties of the halos, which are assumed to represent the correlation properties of galaxies or clusters. The hierarchical pattern originally induced by gravity is shown to be conserved for the halos. The strength of their correlations at any order varies, however, but is found to depend only on their internal properties, namely on the parameter x=m/r^(3-gamma) where m is the mass of the halo, r its size and gamma is the power law index of the two-body correlation function. We were able to derive the explicit form of the generating function of the moments of the halo count probability distribution function. In particular we show explicitly that, generically, S_p(x) -> p^(p-2) in the rare halo limit. Various illustrations of our general results are presented. As a function of the properties of the underlying matter field, we construct the count probabilities for halos and in particular discuss the halo void probability. We also evaluate the dependence of the halo mass function on the environment. We found that within clusters, hierarchical clustering implies that the higher masses are favored. We stress that this bias is naturally induced by gravity.

Submitted 25 March, 1999; originally announced March 1999.

Comments: 35 pages; Submitted to Astronomy and Astrophysics

Journal ref: Astron.Astrophys.349:697-728,1999

arXiv:astro-ph/9902320 [pdf, ps, other]

doi 10.1046/j.1365-8711.2000.03054.x

Scaling laws in gravitational clustering for counts-in-cells and mass functions

Authors: P. Valageas, C. Lacey, R. Schaeffer

Abstract: We present in this article an analysis of some of the properties of the density field realized in numerical simulations for power-law initial power-spectra in the case of a critical density universe. We compare our numerical results in the non-linear regime with the predictions of a specific scaling model, focusing on its much wider range of applicability, which is one of its main advantages over the standard Press-Schechter approximation. We first check that the two-point correlation functions agree with the stable-clustering ansatz. Next we show that the statistics of the counts-in-cells obey the scaling law predicted by our scaling model. Then, we turn to mass functions of overdense and underdense regions. We first consider the mass function of "just collapsed" objects defined by a density threshold $Δ \sim 177$. We note that the usual Press-Schechter prescription agrees reasonably well with the simulations (although there are some discrepancies) while the numerical results are also consistent with the scaling model. Then, we consider more general mass functions defined by different density thresholds which can even be negative. This is out of reach of the Press-Schechter approach while our scaling model can handle these mass functions and it shows a reasonably good agreement with numerical results. Finally, we consider objects defined by a constant radius condition. Thus, we find that the scaling model allows one to study many different classes of objects and it clarifies the links between various statistical tools.

Submitted 11 February, 2000; v1 submitted 23 February, 1999; originally announced February 1999.

Comments: 17 pages, final version published in MNRAS

Journal ref: Mon.Not.Roy.Astron.Soc. 311 (2000) 234

Showing 1–50 of 61 results for author: Schaeffer, R