A link between the size and composition of comets

James E. Robinson,1 Uri Malamud,2 Cyrielle Opitom,1 Hagai Perets,2 and Jürgen Blum3
1Institute for Astronomy, University of Edinburgh, Edinburgh EH9 3HJ, UK
2Department of Physics, Technion - Israel Institute of Technology, Technion City, 3200003 Haifa, Israel
3Institute for Geophysics and extraterrestrial Physics, Technische Universität Braunschweig, Mendelssohnstr. 3, D-38106, Braunschweig, Germany
E-mail: james.robinson@ed.ac.uk
J. E. Robinson and U. Malamud contributed equally to this work (Robinson: observational data compilation; Malamud: initiative and theory).
(Accepted 2024 March 21. Received 2024 March 7; in original form 2023 October 17)
Abstract

All cometary nuclei that formed in the early Solar System incorporated radionuclides and were therefore subject to internal radiogenic heating. Previous work predicts that if comets have a pebble-pile structure, internal temperature build-up is enhanced by their very low thermal conductivity, leading to internal differentiation. An internal thermal gradient causes widespread sublimation and migration of either ice condensates or gases released from amorphous ice hosts during their crystallisation. Overall, the models predict that the degree of differentiation and re-distribution of volatile species to a shallower near-surface layer depends primarily on nucleus size. Hence, we hypothesise that cometary activity should reveal a correlation between the abundance of volatile species and the size of the nucleus. To explore this hypothesis we have conducted a thorough literature search for measurements of the composition and size of cometary nuclei, compiling these into a unified database. We report a statistically significant correlation between the measured abundance of CO/H2O and the size of cometary nuclei. We further recover the measured slope of abundance as a function of size, using a theoretical model based on our previous thermophysical models, invoking re-entrapment of outward migrating high volatility gases in the near-surface pristine amorphous ice layers. This model replicates the observed trend and supports the theory of internal differentiation of cometary nuclei by early radiogenic heating. We make our database available for future studies, and we advocate for collection of more measurements to allow more precise and statistically significant analyses to be conducted in the future.

keywords:
comets: general – astronomical data bases: miscellaneous
pubyear: 2024

1 Introduction

In a recent study of the long-term evolution of comets (Malamud et al., 2022), it was found, by combining various empirical laboratory results, that pebble-made comets have an extremely low thermal conductivity. As a result, radiogenic heating allows comets to attain higher temperatures during their evolution than those typically considered in past models. This implies that internal transport of various volatile species is ubiquitous in comets.

Comet nuclei consist of various volatile species. Water and probably also carbon dioxide are currently viewed as primary volatile species, that exist as amorphous solids (Rubin et al., 2023), and can trap other high-volatility species during their incipient formation (Bar-Nun et al., 1985; Simon et al., 2023). We cannot know for certain which high-volatility species exist inside the comet as pure ice condensates, and which ones were co-deposited within the amorphous ice hosts, or in what precise proportion. Regardless, as was envisaged in Malamud et al. (2022), due to the internal temperature gradient in the comet, volatiles must migrate towards the surface upon their direct sublimation or else upon the phase transitions of their amorphous ice hosts. While migrating through an intact matrix of amorphous ice, the volatiles may become sequentially entrapped in the amorphous ice again, even though local temperatures are otherwise still too high for their deposition as pure ices (Bar-Nun et al., 1985; Laufer et al., 1987; Bar-Nun et al., 1987, 1988; Collings et al., 2003; Kumi et al., 2006; Gálvez et al., 2007; Maté et al., 2008; Gálvez et al., 2008; Herrero et al., 2010; Carmack et al., 2023). Amorphous ice hosts can thus become highly enriched in entrapped gases, since they have a very large uptake (storage capacity) of high-volatility species, amounting to a few tens of % of their own mass (see Carmack et al., 2023, and references therein). Even if sequential deposition were to be ignored, hyper- and super-volatile species will still generally flow outwards and eventually freeze according to their respective deposition temperatures, forming an onion-like internal stratification (De Sanctis et al., 2001; Choi et al., 2002; Davidsson, 2021).

Not only might this process differentiate hyper- and super-volatile species from the bulk of the comet to a much narrower layer, closer to the surface, but the degree of differentiation could be strongly dependent on the comet’s size. While Malamud et al. (2022) have established a correlation between the degree of differentiation and several factors, such as the assumed composition, the time of formation, the pore-space permeability and the pebble size, the dependence on the nucleus size involves fewer uncertainties, as follows.

Internal temperatures are generally governed by the interplay between the rate at which internal heat is released and the rate at which it can diffuse out, whether through conduction, advection or radiative transport. The amount of radiogenic heat release depends on the radionuclide abundances, which in turn depend on the comet’s formation time (for short-lived radionuclides) and the assumed composition of refractories, which sets the initial radiogenic abundances at $t=0$. Arguments were made in Malamud et al. (2022) that for outer Solar System objects the abundances do not necessarily adhere to meteoritic levels, which are usually invoked in thermophysical models.

The effectiveness of advective flow and thus of differentiation, depends on the permeability of gas within the porous matrix in which it travels. This parameter is unconstrained in pebble media and has a potentially large range (Gundlach et al., 2011, 2020; Schweighart et al., 2021; Güttler et al., 2023). Radiative transport strongly depends on temperature as well as pebble size (Hu et al., 2019; Bischoff et al., 2021). Finally, heat conduction out of the interior, which is the primary mode of heat transport, depends quadratically on the characteristic length scale of the object. Hence, all else being equal (radionuclide abundances, internal permeability, thermal conductivity, heat capacity etc.), there is no question that the size of the nucleus dictates how much heat the nucleus can retain. The larger the comet, the greater bulk migration of hyper- and super-volatiles we might expect, sweeping them outwards and depositing them in differentiated layers of either gas laden amorphous ice or as pure ice. The former is much more likely for hyper-volatiles, as discussed next.
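As a rough illustration of the quadratic size dependence, the standard order-of-magnitude estimate of the conductive cooling timescale can be written as below. The symbols ($R$ for the nucleus radius, $K$ for the thermal conductivity, $\rho$ for the bulk density, $c_p$ for the specific heat capacity and $\kappa$ for the thermal diffusivity) are introduced here for illustration only and are not notation taken from the text.

```latex
\tau_{\mathrm{cond}} \sim \frac{R^{2}}{\kappa}, \qquad \kappa = \frac{K}{\rho\, c_p}
```

Under this estimate, doubling the radius at fixed material properties quadruples the time needed for heat to conduct out of the interior.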

The survival of pure ices of hyper-volatile materials is not impossible, however probably unlikely in most present-day observed comets. Since comets are expected to lose their hyper-volatile content either in the contemporary Kuiper Belt (Lisse et al., 2021) or even in the primordial disc (Davidsson, 2021) relatively quickly, they must be emplaced onto distant Oort-cloud orbits early enough in the Solar System formation history in order to avoid this fate. At least some fraction of the outer nucleus must be kept below the threshold temperature of incipient sublimation of such ices, unaffected by both external insolation and internal radiogenic heating. C/2016 R2 might be an example of a rare, hyper-volatile rich, yet water and dust poor comet, belonging to this category (McKay et al., 2019). Even without early emplacement onto an Oort-cloud-like orbit, several super-volatile species as well as amorphous water and carbon dioxide ice would remain sufficiently cold and thus safe against insolation in the outer Solar System, as evident from both theory and observations (Prialnik et al., 1987; Jewitt, 2009; Li et al., 2020; Parhi & Prialnik, 2023). If perturbed into the inner Solar System for the first time, more vigorous activity is expected, however erosion limits the penetration of a heat wave to the interior (Capria et al., 2017), thus shielding the inner nucleus. The bulk composition of the eroded surface reflects the early rather than contemporary orbital state.

Based on the aforementioned arguments, and in light of the elevated internal temperatures suggested by the pebble nucleus model of Malamud et al. (2022), the following predictions have instigated this study:

1. Only extremely small nuclei cool effectively enough in order to prevent any internal differentiation, whereas increased nucleus size correlates with increased hyper- and super-volatile differentiation and concentration near the surface, triggered by internal radiogenic evolution. Thus, active comets will appear to have greater abundances of high-volatility species as a function of their size.

2. More dynamically evolved short period active comets will, as a group, be more eroded and therefore expose deeper layers compared to long period comets, which might be reflected in their size-dependent volatile abundances.

3. If prolonged activity of comets strictly requires gas laden amorphous ice, then small comets might outlive larger comets, because the latter might concentrate amorphous ice in a thinner outer layer. In turn, a testable prediction is that dynamically evolved active comets are smaller than their long period counterparts, because large inner Solar System comets have had more orbits over which to erode and become dormant.

In this paper we present the first evidence that comet observations partly support the above predictions, and in particular the size-composition dependence for the hyper-volatile CO. Future efforts are certainly needed in order to increase the quantity of currently available data and help verify our predictions. In what follows we first carefully describe the criteria for assembling our data set, consisting of comets for which both the size and the coma composition are reliably known (Section 2). We then present the observational evidence in Section 3. A discussion of our findings is given in Section 4. The paper is concluded in Section 5.

2 The data set

In this section, we describe the data used for this study. We aimed to gather as much size and composition data as possible, given what was available in the literature at the time of writing.

Cometary activity allows us to measure the composition and abundance of gases in the coma that are being released from the nucleus via sublimation of volatile ices. This provides an opportunity to gain insight into the composition of the ices contained in cometary nuclei. Many radiative processes take place in cometary atmospheres, which can be observed across a range of wavelengths (Biver et al., 2022a; Bodewits et al., 2022). Spectroscopic observations of comets have been used to measure the composition of their coma through detection of emission bands or lines from a variety of molecules. This complements much rarer data from direct mass spectroscopy measurements following flyby of a comet by a space mission. In this study, we used mass spectroscopy data only for comet 67P/C-G (Rubin et al., 2019). We have gathered measurements from a large number of sources covering a range of molecules, from relatively complex ones to small radicals.

The other key information for this study is the size of comet nuclei, which is difficult to measure. Indeed, comets far from the Sun are faint and challenging to observe because of their small sizes and low geometric albedo. As they move inwards, solar radiation increases and causes the formation of the coma surrounding and obscuring the nucleus. Comets are most often discovered/observed while active, as this is when they are brightest.

In this study we generally refer to two broad classes of comets: nearly isotropic comets (NIC), which possess a fairly uniform inclination distribution, long orbital periods and a Tisserand parameter $T<2$; and ecliptic comets (EC), also known as short period comets, which are sub-classified into Jupiter family comets (with a Tisserand parameter $2<T<3$ and periods below 20 years) and Chiron-type comets (with $T>3$ and periods in the range 20-200 years) (Levison, 1996).
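The classification above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to build the database: it uses the standard expression for the Tisserand parameter with respect to Jupiter, classifies on $T$ alone (ignoring the period criteria), and the function names, the adopted value of Jupiter's semi-major axis and the treatment of the exact boundaries $T=2$ and $T=3$ are assumptions made here.

```python
import math

# Semi-major axis of Jupiter in au (approximate value, assumed for this sketch).
A_JUPITER_AU = 5.2044


def tisserand_jupiter(a_au, e, inc_deg):
    """Tisserand parameter with respect to Jupiter for a bound orbit (e < 1)."""
    if not 0.0 <= e < 1.0 or a_au <= 0.0:
        raise ValueError("this sketch only handles bound orbits (0 <= e < 1, a > 0)")
    inc = math.radians(inc_deg)
    ratio = a_au / A_JUPITER_AU
    return 1.0 / ratio + 2.0 * math.cos(inc) * math.sqrt(ratio * (1.0 - e ** 2))


def classify_comet(t_j):
    """Map a Tisserand parameter onto the classes named in the text.

    Boundary values (exactly 2 or 3) are assigned to the higher class;
    that choice is arbitrary and not specified by the source.
    """
    if t_j < 2.0:
        return "NIC"
    if t_j < 3.0:
        return "EC (Jupiter family)"
    return "EC (Chiron-type)"


# Approximate, illustrative orbital elements (a in au, e, i in degrees).
short_period_like = tisserand_jupiter(3.46, 0.641, 7.04)
long_period_like = tisserand_jupiter(17.83, 0.967, 162.26)
print(classify_comet(short_period_like), classify_comet(long_period_like))
```

A retrograde, highly eccentric orbit gives a negative $\cos i$ term and therefore a low $T$, which is why such objects fall into the NIC class.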

2.1 Comet Nuclei Sizes

Different methods are used to measure the size of a comet nucleus, including optical photometric observations of inactive comets, observations of thermal emission from the nucleus, and direct measurements by spacecraft. These techniques have different levels of accuracy and usually rely on different assumptions. We briefly describe the techniques that were used for the objects in this study but refer the reader to Lamy et al. (2004) and Knight et al. (2023) for a more complete description alongside the advantages/shortfalls of the different techniques.

2.1.1 Observing reflected light

This is one of the most commonly used techniques to determine the size of cometary nuclei. When a comet is at large heliocentric distances ($\gtrsim 4$ au for short period comets, further away for other types of comets), the heating from the Sun is insufficient to efficiently drive sublimation of water ice and there is little to no coma obscuring the nucleus. The nucleus can then be observed directly, or its flux estimated once a small coma contribution is modelled and subtracted. The flux measured for the nucleus is then used to compute the nucleus size. This technique presents some difficulties, as it relies on observing faint objects far from the Sun. With a limited spatial resolution, it can also be impossible to ascertain that the object is truly inactive vs having an unresolved coma. For most objects, when these parameters are unknown, assumptions also need to be made concerning the geometric albedo of the target and its phase curve properties, introducing uncertainties in the nucleus size obtained.
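To make the role of the assumed albedo and phase curve concrete, the conversion from a nucleus magnitude to a radius can be sketched as follows. This is a simplified illustration that uses the standard small-body relation between absolute magnitude, geometric albedo and diameter (V band assumed) together with a linear phase function. The default albedo (0.04) and phase coefficient (0.04 mag/deg) are commonly assumed values chosen here for illustration; they are not taken from the text, and the individual studies cited may use different conventions.

```python
import math


def absolute_magnitude(m_app, r_h_au, delta_au, phase_deg, beta=0.04):
    """Reduce an apparent nucleus magnitude to unit distances and zero phase.

    Assumes a linear phase function with coefficient beta (mag/deg),
    which is an assumption of this sketch.
    """
    return m_app - 5.0 * math.log10(r_h_au * delta_au) - beta * phase_deg


def radius_km(h_mag, albedo=0.04):
    """Effective radius (km) of a spherical nucleus from its absolute magnitude.

    Uses the standard relation D = 1329 km / sqrt(p) * 10**(-H / 5).
    """
    diameter_km = 1329.0 / math.sqrt(albedo) * 10.0 ** (-0.2 * h_mag)
    return 0.5 * diameter_km


# Hypothetical example: nucleus observed at r_h = 4.5 au, delta = 3.8 au, phase = 8 deg.
h = absolute_magnitude(22.5, 4.5, 3.8, 8.0)
print(f"H = {h:.2f}, R = {radius_km(h):.2f} km")
```

Because the radius scales as the inverse square root of the albedo, halving the assumed albedo increases the derived radius by about 41 per cent, which is one reason the text notes that these assumptions introduce uncertainty.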

2.1.2 Observing thermal emission

At longer wavelengths, the thermal emission of comet nuclei can be detected, which can be linked to the nucleus size. This generally requires observing at infrared wavelengths, often with a space telescope. Similar to observations of reflected light at optical wavelengths, in cases where the object is active the coma contribution has to be modelled to retrieve the contribution of the nucleus. Thermal modelling of the nucleus is then used to determine its size (e.g. NEATM; Harris, 1998). In most cases, assumptions must be made on the rotation period of the object, its shape (often, the derived size is an effective radius, assuming a spherical nucleus), albedo, or thermal inertia. Likewise, interferometric measurements can be made of the submillimetre continuum component of the thermal emission, from which nucleus size can also be determined through thermal modelling (e.g. Altenhoff et al., 1999; Boissier et al., 2011, 2013). At these wavelengths dust in the coma contributes much less to the total emission than in the IR and visible ranges, as such one expects such observations to be dominated by thermal emission from the nucleus. However, the submillimetre flux from the nucleus is lower than the IR flux making these observations challenging except for bright and/or nearby comets. Furthermore, in cases where thermal emission and visible photometry can be measured simultaneously it is possible to solve for the nucleus radius and albedo independently (Lamy et al., 2004).

2.1.3 Radar observations

Radar observations, where a burst of microwaves is sent towards the nucleus of a comet and the reflected echo measured, can accurately constrain the shape and size of comets and asteroids that pass very close to the Earth. However, this technique is limited by a relatively small number of comet nuclei with near-Earth orbits.

2.1.4 Space-based size measurements from flyby / rendezvous

The most accurate determination of the shape and size of any small Solar System body is made by direct observation during a spacecraft flyby/rendezvous. Naturally only a handful of comets have been visited by a spacecraft. These accurate size measurements are invaluable, alongside detailed information on the shapes (e.g. bilobate) and terrain (topography) of comet nuclei.

2.1.5 Other size-measuring techniques

In this study, we have gathered comet size measurements determined using the techniques mentioned above. Other techniques have been used to measure nucleus sizes, but either the comets they targeted did not have composition information and were not useful for this study or they were judged to be less reliable. We investigated the size estimates inferred by Jewitt (2022) using the water production rate and non-gravitational acceleration of long period comets. The former model assumes the production rate is linked to the sublimating area and therefore nucleus size, and the latter model estimates the nucleus mass/size from the magnitude of non-gravitational accelerations on the comet orbit. These methods are generally less accurate than those described above; they make use of simplified models of cometary activity and assume physical parameters such as active surface area and nucleus density. Furthermore, photometric/thermal techniques generally provide an upper limit on nucleus size whereas the techniques of Jewitt (2022) could either overestimate the size of hyperactive comets (when icy grains in the coma enhance activity) or underestimate the size depending on the accuracy of the model and choice of physical parameters. At the time of writing this source provides the only available literature size estimates for a number of NICs (8 comets for which we also found compositional information) therefore we considered using these comets in our analysis. We found that our overall results were not significantly changed by the inclusion of sizes from Jewitt (2022) therefore we elected not to include these sizes in the final results to avoid potential biases. However, we welcome the efforts to broaden the dataset of comet sizes in this manner and perhaps future work can incorporate such size estimates into a more complete analysis.

2.1.6 Selection criteria

In addition to searching the literature, we used the Small-Body Database Lookup tool (https://ssd.jpl.nasa.gov/tools/sbdb_lookup.html) to query all comets listed in the MPC comet list (https://www.minorplanetcenter.net/iau/MPCORB/CometEls.txt; via astroquery, accessed 23/06/2022) and checked the original references. Furthermore we searched for additional size measurements in the Properties of Comet Nuclei v2.0 PDS dataset (https://pds.nasa.gov/; Barnes et al., 2010). The studies used for our final selection of comet nucleus sizes are listed in Table 5.

Many comets had multiple measurements from different sources and obtained with different techniques. Previous compilations of comet nuclei sizes have sometimes taken the average of multiple measurements and used their variation to estimate the uncertainty (e.g. Combi et al., 2019, for 46P and 96P). In this work we chose to pick a single size for each comet, considering the reliability of each source and the methods used. We do this as one would expect the measurement of nucleus size to be more frequently biased towards larger sizes due to additional signal from unresolved activity. We applied the following guidelines to make our selection:

  • Use spacecraft rendezvous/flyby measurements (a direct measurement of size) where available.

  • Choose the smallest measured radius as some studies only provided an upper limit on size.

  • Prefer more modern sources which generally apply more well established techniques on mostly higher quality data.

  • Select a measurement with a directly calculated uncertainty if available. (When no uncertainty is provided we use the radius-uncertainty relation of the literature data to assign an uncertainty estimate (Figure 8). For some of these objects the uncertainty on the radius is expressed as an upper and lower limit. In these cases, for simplicity, we have taken the mean of these values to obtain a single uncertainty.)

For further details about size selection for particular comet nuclei, please refer to Appendix A.
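The guidelines above can be expressed as a small selection routine. The sketch below is illustrative only: the source presents these as guidelines applied with judgement rather than as a strict algorithm, so the priority order used here (spacecraft first, then smallest radius, then presence of an uncertainty, then recency) is an assumption, as are the `SizeMeasurement` class and its field names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SizeMeasurement:
    """One literature radius for a comet (hypothetical record layout)."""
    radius_km: float
    year: int
    from_spacecraft: bool = False
    err_plus: Optional[float] = None
    err_minus: Optional[float] = None

    def symmetric_error(self) -> Optional[float]:
        """Mean of the upper and lower uncertainties, or None if missing."""
        if self.err_plus is None or self.err_minus is None:
            return None
        return 0.5 * (self.err_plus + self.err_minus)


def select_size(measurements: List[SizeMeasurement]) -> SizeMeasurement:
    """Pick a single radius following an assumed ordering of the guidelines."""
    spacecraft = [m for m in measurements if m.from_spacecraft]
    pool = spacecraft if spacecraft else measurements
    # Smallest radius first; ties broken by having an uncertainty, then by recency.
    return min(
        pool,
        key=lambda m: (m.radius_km, m.symmetric_error() is None, -m.year),
    )


# Hypothetical measurements for a single comet.
candidates = [
    SizeMeasurement(2.4, 1998),
    SizeMeasurement(1.9, 2009, err_plus=0.3, err_minus=0.1),
    SizeMeasurement(1.9, 2004),
]
best = select_size(candidates)
print(best.radius_km, best.year, best.symmetric_error())
```

In practice a manual review, as described in Appendix A, would override such a rule for individual objects.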

2.1.7 Comet fragments

Comets have often been observed to split into multiple fragments due to tidal forces while passing too close to the Sun or a planet (e.g. the Kreutz sungrazers and the 1994 Shoemaker-Levy encounter with Jupiter) or for underdetermined reasons such as rotational spin-up, impacts or gas pressure (Boehnhardt, 2002). As such in this study we must be wary of when the nucleus size was measured relative to the observation of its composition. A famous example is comet 73P/Schwassmann-Wachmann 3, which fragmented in 1995 (and possibly earlier as well - Schuller & Struve, 1930). This is one of the only cases where the composition of different fragments of a split comet could be measured separately. The compositions of fragments B and C have been measured by Fink (2009), Dello Russo et al. (2016) and Lippi et al. (2021) and were determined to be similar. However, in this work we decided to use only pre-fragmentation sizes as this is more representative of the initial size of the primordial comet nucleus. We were able to find composition information of nuclei/fragments for a number of comets including 73P, 51P/Harrington, C/1996 B2 (Hyakutake) and C/2001 A2 (LINEAR), however we have excluded them from this study as they do not have reliable size determination prior to the splitting events. Likewise for other known split comets (listed in Boehnhardt, 2004), either no composition information was available or the size pre-splitting was unknown so they were not used in this study.

2.2 Comet Compositions

2.2.1 Observed species

Over the years, various techniques have been used to measure the compositions of comets by observing their atmosphere. Biver et al. (2022a) present an overview of the current state of our knowledge about comet composition and how it is measured. The simplest measurements to obtain are made at optical wavelengths, using spectroscopy or narrow-band filters to measure the abundance of a set of radicals, such as CN, C2, C3, OH, NH (see for example A’Hearn et al., 1995). While databases containing comets observed at optical wavelengths are the largest available, the species they sample are what we call product or daughter species. They are not present as such in the nucleus ices but are instead produced in the coma by the photo-dissociation of larger molecules. Therefore they might not directly represent the composition of nucleus ices. Molecules produced directly by the sublimation of nucleus ices, called parent species, tend to emit at infrared and radio wavelengths (H2O, CO, CO2, HCN, NH3, CH4, …). These molecules are harder to detect, and their abundances can typically only be measured for relatively bright comets. In this study, we have included both types of species, in order to gather the largest sample possible. We have also included abundances measured in situ using mass spectroscopy in the coma of comet 67P/C-G by the ROSINA mass spectrometer onboard the Rosetta spacecraft (Rubin et al., 2019). We note that there are composition measurements of 67P made using other instruments on Rosetta (e.g. Bockelée-Morvan et al., 2016; Feldman et al., 2018; Biver et al., 2019), however we used the measurements of Rubin et al. (2019) as they report abundances for a large number of species together.

Comparing the abundance of species from different studies, sometimes derived from observations at different wavelengths and with different techniques, can be a challenge. Indeed, the size of the field of view and the model parameters used can have a significant effect on the abundances measured. For example, in hyperactive comets sublimation of icy grains in the coma can be a significant source of volatile gas compared to production from the nucleus alone (e.g. 103P, Kelley & Kolokolova, 2014). This could in principle lead to changes in measured abundance depending on where in the coma the measurement was made. However, the goal of this study is to find broad trends among comets rather than performing detailed comparison between a small number of comets. Given the limited number of targets for which we could find both composition and size measurements, we collected all the sources we could find for abundance measurements. We focused first on large-scale studies, and then complemented our database using works focused on individual comets.

For observations of radicals at optical wavelengths, we considered mainly the following studies. Among the largest available studies is the one by A’Hearn et al. (1995). They published the results of a survey of 85 comets observed from the 1970s until 1992 using narrow-band photometry at optical wavelengths. They sample a range of ECs and NICs. This was later updated by Schleicher (2008). We queried that dataset from PDS (Lowell Observatory Cometary Database - Production Rates, Osip et al., 2003). Cochran et al. (2012) obtained abundances from optical spectroscopy of 130 comets from 1980 - 2008 while Langland-Shula & Smith (2011) observed 26 comets using the Kast double spectrograph at Lick Observatory (primarily in the 300-600 nm range). Finally, Fink (2009) present abundances for 50 comets with significant enough detections to derive reliable production rates from observations made in the wavelength range 520-1040 nm at the Catalina Site telescope (Fink & Hicks, 1996). In this particular study, there are no exact uncertainties published, only a subjective quality grade. We thus converted the quality grade into an uncertainty using the suggested percentage errors. Most of these studies contain measurements of the Afρ parameter, a proxy for the dust production, in addition to the radical production rates. For completeness and as an estimate of the dust-to-gas ratio we have included Afρ measurements in this study.

For observations of parent species, we focused mainly on the following studies. Dello Russo et al. (2016) present high resolution IR spectroscopy of 30 comets observed between 1997 and 2013. We complemented this with data from Lippi et al. (2021). When a target was available in both data sets, we used data from Dello Russo et al. (2016) as it is the larger dataset. Ootsubo et al. (2012) present CO2 production rates for a sample of 18 comets with the AKARI satellite, and Reach et al. (2013) measured abundances of CO and CO2 with the Spitzer space telescope for 23 comets. The production rates of CO and CO2 were further complemented by an existing compilation by Harrington Pinto et al. (2022), which contains the results of Ootsubo et al. (2012) & Reach et al. (2013) alongside additional sources. Table 1 summarises the number of comets included in the largest of these studies, as well as the species they measured.

Source | Method | Wavelength | Number of Comets | Species
A’Hearn et al. (1995) | Narrowband photometry | Visible | 85 | CN, C2, C3, NH, Afρ, OH
Fink (2009) | Spectroscopy | Visible | 92 (50) | C2, NH2, CN, Afρ, H2O
Langland-Shula & Smith (2011) | Spectroscopy | Visible | 26 | CN, C2, C3, NH, NH2, Afρ
Cochran et al. (2012) | Spectroscopy | Visible | 130 (110) | CN, NH, C3, CH, C2, NH2, OH
Ootsubo et al. (2012) | Spectroscopy | IR | 18 (17) | H2O, CO2, CO
Reach et al. (2013) | Spectroscopy | IR | 23 (20) | OH, CO2, Afρ
Dello Russo et al. (2016) | Spectroscopy | IR | 30 | CH3OH, HCN, NH3, H2CO, C2H2, C2H6, CH4, H2O
Lippi et al. (2021) | Spectroscopy | IR | 20 | CH3OH, HCN, NH3, H2CO, C2H2, C2H6, CH4, CO, H2O
Table 1: Table of the main compositional surveys considered in this work. The source is listed along with the observational method and wavelength. The number of comets considered in each study is listed, where the number in brackets is the number for which some or all of the species were detected. The species targeted by each survey are also listed, including Afρ which is a proxy for dust production.

The rest of the data comes from publications focused on individual comets by Biver et al. (1999, 2007, 2011a, 2011b, 2012, 2021a, 2021b, 2022b); Bockelée-Morvan et al. (2000, 2004, 2010, 2022); Bodewits et al. (2011); Bonev et al. (2021); Dello Russo et al. (2020); Faggi et al. (2019); Moulane et al. (2018); Opitom et al. (2016); Roth et al. (2018, 2020); Rubin et al. (2019).

2.2.2 Selection criteria

To select the final abundance measurements presented in Section 3, we applied the following methodology:

  • We collected data from all sources and identified the available species. We selected only data sets that also had available production rates for OH/H2O or CN (we considered abundances relative to water or CN when looking for correlation between composition and size).

  • For each data set, if multiple measurements were provided for a comet we calculated the mean heliocentric distance and date of the observations and the corresponding average for each production rate if required. We used a weighted average if and when the source provided the corresponding weights.

  • If multiple sources were available for a comet, we selected production rate ratios from the largest available dataset, to prioritise using larger homogeneous datasets.

  • For improved reliability, preference was given to sources that published production rates with an associated uncertainty.

By selecting composition from a single source we attempt to avoid difficulties in combining abundance measurements from multiple observational techniques and/or observational circumstances. Assessing the changes in compositional abundance as a function of methodology, viewing geometry and other circumstances such as outburst events for all comets analysed here is beyond the scope of this work. We note that determining the true bulk abundance of species in a cometary nucleus from remote observations alone will always be a difficult problem, and we attempted to minimise these effects by selecting from larger homogeneous datasets where possible.

As mentioned above, we focused mostly on abundances of other species relative to water. However, observations at optical (and sometimes radio) wavelengths only provide measurements of the OH production rate. Since OH is produced by the photo-dissociation of water, it is possible to convert between H2O and OH production rates. Several ways to do the conversion have been presented in the literature and we decided to use the conversion ratio of OH = 0.85 × H2O based on the photo-dissociation rate of water into OH and H (Harris et al., 2002). As H2O and OH are hard to observe at optical wavelengths, abundance is sometimes reported relative to CN. We thus included these observations as well for completeness.
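As a worked illustration of this conversion, the snippet below turns an OH production rate into an equivalent water production rate and then forms an abundance relative to water. The 0.85 factor is the one quoted above; the function names and the example production rates are hypothetical.

```python
# Conversion factor quoted in the text: Q(OH) = 0.85 * Q(H2O).
OH_PER_H2O = 0.85


def water_production_from_oh(q_oh):
    """Equivalent H2O production rate (molecules/s) from an OH production rate."""
    return q_oh / OH_PER_H2O


def abundance_relative_to_water(q_species, q_oh):
    """Mixing ratio of a species relative to water, given an OH production rate."""
    return q_species / water_production_from_oh(q_oh)


# Hypothetical production rates in molecules per second.
q_oh = 8.5e27
q_cn = 2.0e25
print(water_production_from_oh(q_oh))
print(abundance_relative_to_water(q_cn, q_oh))
```

Because the same factor is applied to every OH-based measurement, it shifts all such abundances by a constant multiplicative amount and does not by itself change the slope of any trend with nucleus size in logarithmic space.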

Most studies present composition measurements for a comet in a distinct time window, e.g. around its perihelion passage or the date range over which it was observable/observing time was available. If not provided in the original study, we determined the mean heliocentric distance and date of the observation for each comet from each source. In most cases this accurately captures the mean epoch at which the comet composition was measured, however in certain long running observing campaigns the mean might not reflect the true range of observing conditions (e.g. A’Hearn et al., 1995).

Whenever a source provided several measurements for the same target, and if a final summary table was not provided by the authors, we took a mean of the compositions for that source to get a single mixing ratio for each object (uncertainties were propagated forward when available), unless a large gap in heliocentric distance/time was present between the observations. In that case, the measurements closest to perihelion were selected. Some sources provide an upper and lower uncertainty estimate on composition/mixing ratio. For simplicity, with these measurements we took the mean value of the upper and lower limits to be the uncertainty.
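A minimal sketch of this averaging step is given below. It assumes an unweighted mean with independent uncertainties added in quadrature, which is one common way to propagate errors; the exact propagation applied to each source may differ, and the function names are hypothetical.

```python
import math


def symmetrise_uncertainty(upper, lower):
    """Collapse asymmetric (+upper / -lower) uncertainties into one value
    by taking the mean of the two limits, as described in the text."""
    return 0.5 * (abs(upper) + abs(lower))


def mean_mixing_ratio(values, sigmas=None):
    """Unweighted mean of repeated measurements from a single source.

    Assumes independent uncertainties, so the uncertainty on the mean is
    sqrt(sum(sigma_i^2)) / N. Returns (mean, sigma); sigma is None when
    the source reported no uncertainties.
    """
    n = len(values)
    mean = sum(values) / n
    if sigmas is None:
        return mean, None
    return mean, math.sqrt(sum(s * s for s in sigmas)) / n


# Hypothetical mixing ratios (per cent relative to water) and uncertainties.
print(mean_mixing_ratio([10.0, 14.0], [3.0, 4.0]))
print(symmetrise_uncertainty(0.4, 0.2))
```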

As a sanity check we considered the range of heliocentric distances used to calculate each mean composition. For the comet compositions where this range is large ($> 1$ au) we inspected the individual measurements to ensure they were consistent. 10P and 103P displayed significant variation in their CO2/H2O abundance as reported by Reach et al. (2013), therefore for these two comets we excluded the production rates measured at $> 2$ au, where ices more volatile than H2O begin to dominate activity. Likewise, there was large variation in the measurements of CO2/H2O for C/1995 O1 compiled by Harrington Pinto et al. (2022). The reported observations spanned a wide range of heliocentric distances, with some taken while the comet was at $> 3$ au. As such, we selected the measurement with the lowest heliocentric distance ($r_h = 2.93$ au) for our analysis. Furthermore, we note here that Cochran et al. (2012) provided average production rates with respect to CN scaled to 1 au; as such we set all measurements from this source to heliocentric distance $r_h = 1$ au when incorporating their results into our dataset.

We have taken the steps described above, applying a consistent methodology when selecting which source to use for a given composition measurement, in order to utilise the wide range of literature measurements in a reliable manner. We note that choice of source will greatly affect the outcome of an analysis such as ours. For example, there is significant variation in the CO2/H2O production rates of comets such as 19P between Ootsubo et al. (2012) and Reach et al. (2013). We make available the full data table with all comet composition and size measurements (and their literature sources) so that in all cases the provenance of the data is clear. Furthermore we hope that this data collection is of value to future studies investigating the size and/or compositions of comets. A sample of the dataset is displayed in Table B and the full dataset is available at this link.

3 Observational findings

3.1 Raw data analysis

Figure 1 presents the abundance relative to water of a range of species commonly considered to be parent species as a function of the nucleus radius, for our entire data set. Each point represents a single comet. Different symbols are used for the ecliptic comets (ECs) and nearly isotropic comets (NICs), and the heliocentric distance of the comet when the abundance measurement was performed is indicated by the colour scale. Figure 9 shows the same information but for daughter species (and the proxy for dust production rate $Af\rho$). Only species for which a significant number of measurements were available are shown. Additional figures displaying abundances relative to CN are presented in Figure 10. For each species we assess the possible presence of a correlation between the relative production rate of that species and the comet nuclear size. In order to do so we calculated the Pearson correlation coefficient ($\gamma$) of these data (in log-log space) to measure the degree of linear correlation. When the data have a strong linear correlation, $\gamma$ approaches $\pm 1$ (signifying positive or negative correlation). In order to test the statistical significance of $\gamma$, a $p$-value is also calculated, which represents the probability of obtaining a correlation at least as strong as the one measured, assuming that the null hypothesis (no correlation) is true. Therefore, when we measure large absolute values of $\gamma$ with a corresponding small value of $p$ we can assume that the correlation is statistically significant, where a significance level of $p < 0.05$ is the de facto threshold often used in the literature. Table 2 presents the values of the Pearson correlation and $p$-values for all species for ECs and NICs separately as well as for the full sample.
The exact $p$-value for a significant result can be a somewhat arbitrary choice, therefore in Table 2 we have highlighted different ranges of $p$-value. We select thresholds that are analogous to 3-, 2-, 1-sigma significances, i.e. $p \leq 0.003$ (strong significance), $0.003 < p \leq 0.05$ (moderate significance), and $0.05 < p \leq 0.32$ (marginal significance), respectively.
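The correlation test and the significance thresholds above can be sketched as follows. This is a dependency-free illustration, not our analysis code: the function names are hypothetical, the input values are placeholders, and the $p$-value is estimated here with a permutation test, which approximates but is not identical to the analytic $p$-value of a standard Pearson test.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient; assumes both inputs have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def loglog_correlation(radii_km, abundances, n_perm=2000, seed=1):
    """Return (gamma, p) for log10(abundance) against log10(radius).

    p is a two-sided permutation estimate (an assumption of this sketch).
    """
    x = [math.log10(r) for r in radii_km]
    y = [math.log10(a) for a in abundances]
    gamma = pearson_r(x, y)
    rng = random.Random(seed)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= abs(gamma):
            hits += 1
    return gamma, (hits + 1) / (n_perm + 1)


def significance_label(p):
    """Map a p-value onto the thresholds used in Table 2."""
    if p <= 0.003:
        return "strong (3-sigma)"
    if p <= 0.05:
        return "moderate (2-sigma)"
    if p <= 0.32:
        return "marginal (1-sigma)"
    return "not significant"


# Placeholder radii (km) and abundances, not taken from our database.
gamma, p = loglog_correlation([0.6, 1.1, 2.0, 4.5, 9.0, 30.0],
                              [0.4, 1.5, 1.0, 6.0, 5.0, 20.0])
print(gamma, p, significance_label(p))
```

Applied to the tabulated values, `significance_label` would place $p = 0.0015$ (CO/H2O, ECs) in the strong class and $p = 0.0281$ (HCN/H2O, all comets) in the moderate class.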

The strongest correlation by far is seen for CO at the 3-sigma level. CO is one of the most volatile ices in cometary nuclei. The trend is stronger for ECs than NICs. One potential bias to keep in mind is that short period comets tend to have lower CO abundances due to repeated passages close to the Sun (Dello Russo et al., 2016). The effect of the heliocentric distance on the CO abundance measurements in comets and how it could affect these results is discussed below in Section 3.2. For more detail on the compilation of CO/H2O abundances from the literature, please refer to Appendix D.

We also see a trend at the 1-sigma level for HCN for ECs and no significant trend for the NICs; however, the correlation significance increases to 2-sigma when the whole sample is considered. As noted by Biver et al. (2022b), HCN production rates derived from millimetre observations can differ from production rates derived from infrared observations by typically a factor of two, which complicates the interpretation of the trend for HCN. We do not see any trend for CN but this is not entirely surprising. While CN was originally thought to be produced by the photo-dissociation of HCN, evidence indicates that another source is needed to account for the CN abundance and morphology in comets. This other source could be another parent species (C2N2, HC3N, or CH3CN), or the sublimation of dust grains, salts, or macro-molecules (Biver et al., 2022b). The importance of this additional source could vary from comet to comet and explain the different trends seen for CN and HCN.

We see a trend at the 2-sigma level for H2CO for the ECs and at 1-sigma for the sample as a whole. For CO2, we see a 1-sigma level trend for the ECs and the full sample. We do not see any correlation for any of the other typical parent species. However, this does not mean that a correlation is not present, but most likely that the current data available are insufficient to draw strong conclusions. This should improve in the future when full composition and size measurements become available for a larger number of comets. The only exception might be methanol, for which the number of data points available is similar to that for CO, CO2, and HCN, but no trend can be seen.

With the possible exception of CS, we generally do not see strong correlations for daughter species. However, we do note a moderately significant anti-correlation for NH which is driven by the ECs. This correlation is surprising given the lack of correlation for ECs for NH2 and NH3. However, the scale lengths used to compute NH production rates with the Haser model are difficult to constrain, which could influence the results. We thus disregard the NH trend for the rest of the discussion.

In Figure 9, we also see a 2-sigma correlation for $Af\rho$ for NICs, with higher dust-to-gas ratios for larger comets. An increase of the $Af\rho$/OH ratio at large heliocentric distances has been noticed by A’Hearn et al. (1995) and Langland-Shula & Smith (2011), which has been explained either by a selection effect (high dust-to-gas ratio comets have higher visual magnitudes), by the presence of large grains less volatile than water, or the build-up of a crust on the surface of the nucleus. Since larger comets tend to be more active, and thus brighter, they can be observed farther from the Sun. This is particularly true for NICs, which are more likely to be observed far from the Sun. The trend of higher dust-to-gas ratio at larger distances from the Sun could thus bias our correlation between $Af\rho$/H2O and the nucleus size.

Figure 1: Log scale plots showing the relation between comet composition (of various parent species relative to H2O) and radius of the nucleus. Marker shape denotes the dynamical class of each comet, either an Ecliptic Comet (EC, square markers) or Nearly Isotropic Comet (NIC, triangular markers). Marker colour indicates the heliocentric distance of the comet when the composition was measured. The error bars denote the uncertainty in the measured composition or size (when this was available). We indicate the correlation of the radius-composition data with a linear fit (in log-log space) for the whole dataset (solid line), ECs (dotted line) and NICs (dashed line). The Pearson correlation coefficients for each parent species are given in Table 2.
Species     Ecliptic Comets            Nearly Isotropic Comets    All Comets
            N    γ         p-value     N    γ         p-value     N    γ         p-value
C2H2/H2O    6    0.7796    0.0675      4    0.9537    0.0463      10   0.3421    0.3332
C2H6/H2O    11   -0.3479   0.2944      5    0.0177    0.9775      16   0.0862    0.7508
CH3CN/H2O   4    -0.4120   0.5880      2    1.0000    1.0000      6    -0.0430   0.9356
CH3OH/H2O   14   -0.2295   0.4300      7    -0.1155   0.8053      21   0.0933    0.6874
CH4/H2O     5    -0.5520   0.3347      5    0.0376    0.9521      10   0.2667    0.4564
CO2/H2O     19   0.2628    0.2770      10   0.2205    0.5405      29   0.3003    0.1135
CO/H2O      8    0.9143    0.0015      9    0.5740    0.1060      17   0.7880    0.0002
H2CO/H2O    8    0.7575    0.0295      5    0.5971    0.2877      13   0.3832    0.1962
H2S/H2O     6    -0.4590   0.3599      3    0.6947    0.5111      9    0.2737    0.4761
HCN/H2O     15   0.4108    0.1283      7    0.3957    0.3796      22   0.4678    0.0281
NH3/H2O     10   0.1611    0.6565      4    0.1050    0.8950      14   -0.2537   0.3815

Table 2: Table showing the results of the Pearson correlation tests between abundance of parent species and nucleus size, described in Section 3. For each species abundance (with respect to H2O) we present the correlations for dynamical subsets of the data, ecliptic comets and nearly isotropic comets, as well as the results for all comets. For each group we state the number of comets analysed (N), the Pearson correlation coefficient ($\gamma$) and the associated $p$-value of the correlation test. We highlight the species and dynamical groups in order of Pearson correlation significance, i.e. by $p$-value. The strongest significance correlations ($p \leq 0.003$, equivalent to a 3-sigma threshold) are indicated by dark grey cells with white text. Moderate significance ($0.003 < p \leq 0.05$, 2-sigma) results are indicated by grey cells with white text. Results with only marginal significance ($0.05 < p \leq 0.32$, 1-sigma) are shaded light grey. All other results have been deemed to be statistically insignificant in this analysis (plain white cells).

3.2 The critical influence of heliocentric distance

Cometary activity is driven by solar heating and is therefore strongly correlated with the heliocentric distance. This is why we have assessed the abundance ratio of each species relative to a common volatile such as \chH2O rather than considering production rates directly. However, more volatile species are able to drive activity at lower temperatures and greater heliocentric distances. As such we must assess whether the correlations reported above are driven primarily by comet size or by heliocentric distance of the measurements. It is for example expected that the abundances of CO/H2O and CO2/H2O increase for comets past 2-3 au as the water sublimation becomes less efficient (Dello Russo et al., 2016; Ootsubo et al., 2012). For other species, Langland-Shula & Smith (2011) reported a trend of decreasing C2/CN ratio as comets moved away from the Sun whereas A’Hearn et al. (1995); Cochran et al. (2012); Fink (2009) did not report a similar trend. Dello Russo et al. (2016) report increases in the abundance of H2CO, NH3, and C2H2 within heliocentric distances of 0.8 au compared to the abundances measured between 1 and 2 au, which might be caused by an additional contribution from extended sources.

On the plots shown in Figure 1 the colour of the data points reflects the heliocentric distance of the observation. For most species, there is no obvious heliocentric trend in the plots. However, for some of the species which show the strongest correlations between abundance and size (CO, CO2, HCN) there are strong indications of a compositional dependence on heliocentric distance. In order to test this additional correlation, in Figure 2 we plot the abundance of these species (relative to H2O) against heliocentric distance. This is the same data as shown in Figure 1, with the exception of an outlying HCN/H2O measurement for C/2002 X5 made at $r_h = 0.21$ au, which we have excluded as an outlier in terms of heliocentric distance. There is strong positive correlation between abundance of these species and heliocentric distance as indicated by the corresponding values of $\gamma$ and $p$ provided in Figure 2. In addition, we highlight these trends with a linear fit in log-log space.

In order to test the strength of the heliocentric distance dependence we considered the composition-size correlations for subsets of the data that have been limited to measurements made at heliocentric distances of $r_h < 2$ au. This restricted range is selected to remove observations at large $r_h$ where the changes in the production rate of H2O due to reduced solar heating may skew the measured abundance of a particular species (see further discussion in Section 4). However, it must be noted that this selection criterion preferentially excludes some of the largest comets, primarily due to observational biases. Figure 3 and Table 3 show the results of this test and we see that the subset of CO/H2O abundances still displays a statistically significant Pearson correlation coefficient ($\gamma = 0.750$, $p = 0.0031$) at the 2-sigma level (albeit on the 3-sigma boundary). This implies that the correlation between CO/H2O and comet size dominates over the $r_h$ dependence. In contrast, the positive correlations of CO2/H2O and HCN/H2O disappear or are reduced in significance ($\gamma = -0.295$, $p = 0.352$ and $\gamma = 0.312$, $p = 0.207$ respectively), implying that the correlations seen in Figure 1 are driven primarily by heliocentric distance effects for these species.

To further test the robustness of this correlation we repeated the analysis with statistical resampling of the 13 comets measured at $r_h < 2$ au in our CO/H2O dataset. We conducted a bootstrap resampling, i.e. sampling with replacement, and found that over the course of 10,000 repeats 52% of the resulting correlations were of 3-sigma significance. 93% of tests had a significance of 2-sigma or stronger. We also conducted a jack-knife resampling, where the test is repeated with a given data point dropped in turn. In this test the correlation had 3-sigma significance 23% of the time and all permutations resulted in at least a 2-sigma correlation. The weakest correlation occurred when C/1995 O1 was excluded, as would be expected given that this is the largest comet in the dataset; however, the overall correlation was still moderate ($\gamma = 0.68$, $p = 0.016$). Overall these resampling tests show that the correlation between CO/H2O and nucleus size is relatively robust for the given dataset and is not overly dominated by a particular comet. However, we acknowledge that our dataset is limited by its small size and the inherent difficulties in accurately determining the size and composition of cometary nuclei. The veracity of this correlation would be greatly strengthened with more measurements and improved estimates of the CO abundances in particular, given the variation in literature values for some comets.
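The two resampling schemes can be sketched as below. This is an illustrative outline under stated assumptions rather than our analysis code: the function names and the synthetic log-space values are hypothetical, and the sketch returns the resampled correlation coefficients only. Counting the fraction of resamples at a given sigma level, as quoted above, would additionally require a $p$-value for each resample.

```python
import math
import random


def pearson_r(x, y):
    """Pearson coefficient; returns NaN if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def jackknife_r(x, y):
    """Correlation recomputed with each data point dropped in turn."""
    x, y = list(x), list(y)
    return [pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
            for i in range(len(x))]


def bootstrap_r(x, y, n_repeats=10000, seed=0):
    """Correlations of samples drawn with replacement.

    Degenerate draws (zero variance) are skipped, so the returned list
    can be slightly shorter than n_repeats.
    """
    rng = random.Random(seed)
    n = len(x)
    results = []
    for _ in range(n_repeats):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(r):
            results.append(r)
    return results


# Synthetic log10(radius) and log10(abundance) values, for illustration only.
log_radius = [-0.2, 0.0, 0.3, 0.5, 0.7, 1.0, 1.5]
log_abundance = [-0.5, 0.1, -0.1, 0.6, 0.4, 0.9, 1.3]
print(min(jackknife_r(log_radius, log_abundance)))
print(len(bootstrap_r(log_radius, log_abundance, n_repeats=1000)))
```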

Figure 2: Plots of the CO/H2O, CO2/H2O and HCN/H2O abundance as a function of the mean heliocentric distance of the observations for each comet. The datasets consist of the same comets as the corresponding panels in Figure 1 (except for HCN/H2O where we have excluded an outlying measurement for C/2002 X5 at $r_h = 0.21$ au). As in Figure 1, square and triangle markers denote ECs and NICs, respectively, and here the marker colour indicates the comet nucleus size. The Pearson correlation coefficients for abundance vs $r_h$ are $\gamma = 0.810$ ($p = 8 \times 10^{-5}$), $\gamma = 0.410$ ($p = 2.7 \times 10^{-2}$), $\gamma = 0.709$ ($p = 2 \times 10^{-4}$) for CO/H2O, CO2/H2O, HCN/H2O respectively.
Figure 3: Log-scale plots of parent species abundance (relative to H2O) as a function of radius of the nucleus. These plots are similar to Figure 1, but consider only compositional measurements with heliocentric distances $r_h < 2$ au, as observations at large heliocentric distances may not accurately reflect the comet composition. The radius-composition correlation for all comets in the sample is indicated by a linear trend line fit (solid line); the corresponding Pearson correlation coefficients are provided in Table 3.
Species     Ecliptic Comets            Nearly Isotropic Comets    All Comets
            N    γ         p-value     N    γ         p-value     N    γ         p-value
C2H2/H2O    5    0.8573    0.0633      4    0.9537    0.0463      9    0.4411    0.2347
C2H6/H2O    10   -0.6085   0.0619      4    0.6838    0.3162      14   0.0340    0.9083
CH3CN/H2O   4    -0.4120   0.5880      2    1.0000    1.0000      6    -0.0430   0.9356
CH3OH/H2O   13   -0.2809   0.3526      5    0.2691    0.6616      18   -0.0162   0.9493
CH4/H2O     5    -0.5520   0.3347      4    0.8068    0.1932      9    0.4084    0.2752
CO2/H2O     10   -0.0354   0.9227      2    1.0000    1.0000      12   -0.2947   0.3524
CO/H2O      6    0.6573    0.1560      7    0.5990    0.1553      13   0.7503    0.0031
H2CO/H2O    8    0.7575    0.0295      5    0.5971    0.2877      13   0.3832    0.1962
H2S/H2O     6    -0.4590   0.3599      2    1.0000    1.0000      8    0.0712    0.8669
HCN/H2O     13   -0.1625   0.5958      5    0.9446    0.0155      18   0.3124    0.2069
NH3/H2O     9    0.1157    0.7669      4    0.1050    0.8950      13   -0.2562   0.3981

Table 3: Similar to Table 2, here we show the Pearson correlation coefficients for the composition-radius relations for the parent species shown in Figure 1. As described in Section 3.2 only compositions with $r_h < 2$ au were considered.
Species     Ecliptic Comets            Nearly Isotropic Comets    All Comets
            N    γ         p-value     N    γ         p-value     N    γ         p-value
C2H2/H2O    2    -1.0000   1.0000      4    0.9537    0.0463      6    0.4825    0.3324
C2H6/H2O    7    0.3974    0.3774      4    0.6838    0.3162      11   0.7194    0.0126
CH3CN/H2O   2    1.0000    1.0000      2    1.0000    1.0000      4    0.4053    0.5947
CH3OH/H2O   9    0.3317    0.3832      5    0.2691    0.6616      14   0.3563    0.2112
CH4/H2O     4    -0.1136   0.8864      4    0.8068    0.1932      8    0.6835    0.0616
CO2/H2O     6    0.7001    0.1214      2    1.0000    1.0000      8    0.1737    0.6809
CO/H2O      4    -0.1458   0.8542      7    0.5990    0.1553      11   0.6136    0.0447
H2CO/H2O    6    0.7043    0.1182      5    0.5971    0.2877      11   0.2525    0.4537
H2S/H2O     4    0.0417    0.9583      2    1.0000    1.0000      6    0.4129    0.4158
HCN/H2O     9    0.0371    0.9245      5    0.9446    0.0155      14   0.5512    0.0411
NH3/H2O     6    0.5948    0.2131      4    0.1050    0.8950      10   -0.1333   0.7135

Table 4: Similar to the results shown in Table 3 we display the Pearson correlation coefficients for the relation between parent species abundance and nucleus radius. In this analysis we have considered only measurements with $r_h < 2$ au and made a further cut to exclude comets with radii $< 1$ km; these smaller objects are more likely to be collisional fragments as opposed to primordial nuclei.

3.3 Potential significance of a minimum size cutoff

3.3.1 Dependence on unknown fragmentation history

An important assumption in this work is that the cometary nuclei in the dataset have not had their internal composition and structure significantly altered since their formation, e.g. by collisional or tidal events. However it has been proposed that many small Solar System bodies could be fragments of larger, primordial parent bodies, either through past collisions (Morbidelli & Rickman, 2015), or tidal/rotational breakup (Boehnhardt, 2002). Due to the limited size of our dataset, and the difficulties in definitively determining the history of small bodies without detailed in situ analysis, we have thus far only excluded the comets with a known fragmentation history from our analysis.

Here we perform a brief test to determine the influence of potential unknown fragmentation history in the data. This could distort the abundances displayed by the smallest comets in our dataset and therefore skew the results, altering the real composition-size correlation by reflecting the abundance of the larger primordial parent objects instead. In order to check for this possibility, we have repeated the Pearson correlation analysis from the previous section, but in addition to the $r_h < 2$ au restriction we also removed any comets with radius $< 1$ km. The 1 km cutoff was chosen arbitrarily to represent exceptionally small nuclei.

The results of this test are displayed in Table 4. In comparison to Table 3 we see that the removal of the smallest comets in the dataset leads to more significant correlations for some abundances, e.g. C2H6/H2O with a decrease in $p$-value of $0.908 \rightarrow 0.012$. However, the significance of the correlation for other species is reduced when these objects are excluded, e.g. for CO/H2O the $p$-value increased from $0.003 \rightarrow 0.045$. Given that the sample now consists of a smaller number of comets, a reduction of statistical significance is generally to be expected. Despite this, the correlation became moderately significant (2-sigma, according to the thresholds of Table 2) for ethane, which is among the species with the lowest sublimation temperatures after methane.

3.3.2 Dependence on volatility

For species that are much less volatile than CO, e.g. CO2 and HCN, the thermophysical models of Malamud et al. (2022) predict that significant migration and differentiation requires higher temperatures and thus larger cometary nuclei. As such, a correlation between these abundances and size might only be relevant beyond a certain size threshold.

In order to test this hypothesis we consider the correlation of subsets of the dataset limited by comet radius for CO2/H2O and HCN/H2O. In each test we select only the comets in the dataset with radius $> 0, 1, 2, \ldots, 9$ km and determine the Pearson correlation coefficient as before, the results of which are shown in Figure 4. For CO2/H2O the correlation stays approximately the same and the $p$-value increases as more data points are removed; the Pearson correlation test on fewer data points produces less significant results, as one would expect. Therefore we cannot easily determine if this correlation is driven primarily by the comet nuclear size, or if it is being influenced by the more complicated size-heliocentric distance observational bias (Figure 2). For HCN/H2O the results of this correlation test fluctuate significantly, indicating that this dataset is strongly influenced by the specific objects being considered. This test implies that both datasets would benefit greatly from being larger and having improved coverage across the range of comet sizes, with more accurate measurements taken in a uniform manner and preferably from observations at distances less than 2 au.
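The radius-cutoff scan described above can be sketched as follows. The function names, the `min_points` guard and the example values are assumptions made for this illustration; only the correlation coefficient is computed here, whereas Figure 4 also shows the corresponding $p$-values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; assumes non-zero variance in x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def cutoff_scan(radii_km, abundances, cutoffs=range(0, 10), min_points=3):
    """For each minimum radius, correlate log10(abundance) with log10(radius).

    Returns a list of (cutoff_km, n_comets, gamma); gamma is None when fewer
    than min_points comets survive the cut (a threshold assumed here).
    """
    results = []
    for r_min in cutoffs:
        kept = [(math.log10(r), math.log10(a))
                for r, a in zip(radii_km, abundances) if r > r_min]
        if len(kept) < min_points:
            results.append((r_min, len(kept), None))
            continue
        xs, ys = zip(*kept)
        results.append((r_min, len(kept), pearson_r(xs, ys)))
    return results


# Placeholder radii (km) and abundances, not taken from our database.
for row in cutoff_scan([0.6, 1.4, 2.2, 3.0, 5.5, 12.0, 30.0],
                       [3.0, 8.0, 5.0, 12.0, 9.0, 20.0, 25.0]):
    print(row)
```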

Figure 4: Testing the composition-size correlation as a function of comet sizes for the abundances of CO2/H2O and HCN/H2O. Upper panel: The inverse cumulative distribution of number of comets greater than some radius. Middle panel: The Pearson correlation coefficient of comet composition and size (in log-log space) as a function of the smallest comet radius included in the data subset. Lower panel: The $p$-value corresponding to each correlation test in the middle panel. The results for CO2/H2O and HCN/H2O are indicated by dotted and dashed lines respectively.

4 Discussion

4.1 A model for explaining the correlation between CO and size

The findings in Section 3 show that the clearest indication of correlation between size and composition exists within the activity of CO, confirming our first prediction in Section 1. In what follows we attempt to reconcile the composition-size trend with the theoretical model of Malamud et al. (2022).

In Section 1 we described the likely mode of transport of hyper-volatile gases within the nucleus. In particular, internal CO might have initially existed as either a pure ice condensate, or as a trapped gas within an amorphous ice host. In most comets the second option might be the more likely, as also newly indicated by the activity of comet 67P/C-G (Rubin et al., 2023), but regardless of which of the two options is correct, internal radiogenic heating would drive migration outwards. Beyond a certain temperature threshold, CO would either sublimate or be released due to the phase transitions of amorphous hosts such as CO2 or H2O. While migrating out, it would encounter cold matrices of pristine amorphous ice. It can therefore become re-incorporated in the amorphous ice host as trapped gas. Multiple lab experiments have demonstrated this to occur. Unlike co-deposition, which is the entrapment of high-volatility gases during the deposition of the amorphous host itself, the process we refer to above is often called sequential deposition (sometimes also entrapment via gas-flow or gas-streaming). It relates to gas which was streamed into an already existing, pre-deposited amorphous ice host. Seminal lab studies have shown that it is an effective way to trap high-volatility species (Bar-Nun et al., 1985; Laufer et al., 1987; Bar-Nun et al., 1987, 1988).

Unfortunately, the Malamud et al. (2022) code was not explicitly designed to treat the incorporation or release of high-volatility gases into or out of amorphous water ice. It can currently only handle the sublimation and deposition of some species such as CO or CH4, but only as pure condensates. We shall therefore only employ an approximate calculation, giving us a rough estimation of how much CO should be re-incorporated into amorphous ice and in turn quantify the degree of near-surface amorphous ice CO-enrichment, as a consequence.

For the initial state of the comet, prior to any heating, we assume that the internal composition is uniform. Consider that some small fraction of CO is trapped within the amorphous ice hosts – CO2 or H2O – and is released during their respective crystallisation phase transitions. For simplicity, here we neglect the possibility of CO initially present as pure ice condensate, because these two scenarios are related. Then, following the aforementioned phase transitions, released CO migrates out towards the surface, and it can become re-trapped within the ice in its path when the temperature is sufficiently cold, but still higher than the temperature of its deposition as pure condensate. Sequential deposition of CO leads to enrichment of the CO fraction stored in the amorphous ice.

Assuming that the comets we observe probe the CO/H2O ratio of trapped hyper-volatiles as envisioned above, we can attempt to interpret the enrichment pattern of larger comet nuclei using the theoretical model of Malamud et al. (2022). In their work, Figures 5 through 13 showed the distribution of internal temperatures in the comet as a function of various realisations of the model parameters. The two most important model parameters were the nucleus size and the comet formation time. The temperatures correlated with the nucleus size and anti-correlated with formation time. The internal temperatures within different volume fractions in the nucleus were depicted according to a colour scheme. The black colour in those figures represented relatively pristine material heated below 70 K (all amorphous ices are stable against phase transitions); red depicted temperatures in the range 70 K < $T$ < 100 K (allowing CO2 amorphous ice to release trapped CO gas when undergoing a phase transition); orange depicted temperatures in the range 100 K < $T$ < 170 K (allowing H2O amorphous ice to likewise release its trapped gases); yellow and white correspond to even higher temperature thresholds (also corresponding to full release of all trapped CO). We assume that the released CO gas flows toward the pristine (black coloured) volume fraction, which is closer to the surface. As already pointed out, this layer is sufficiently cold to keep its amorphous ices in their pristine state, while allowing the excess CO gas to become re-entrapped there, enriching the CO abundance. For simplicity, we assume that these layers are enriched with CO uniformly. Given the aforementioned dependence on model parameters, it immediately follows that the degree of enrichment is greater for large comets and/or comets with an earlier formation time.
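The temperature classes above can be written as a small lookup, shown here as a sketch only. The function names and shell values are hypothetical, and the treatment of temperatures exactly at 70, 100 or 170 K is an assumption of this sketch; the enrichment formula itself is not reproduced here.

```python
def thermal_regime(peak_temperature_k):
    """Classify material by the peak temperature it reached (in K)."""
    if peak_temperature_k < 70.0:
        return "pristine"              # black: amorphous ices remain stable
    if peak_temperature_k < 100.0:
        return "CO2 host releases CO"  # red: amorphous CO2 phase transition
    if peak_temperature_k < 170.0:
        return "H2O host releases CO"  # orange: amorphous H2O phase transition
    return "all trapped CO released"   # yellow and white


def pristine_mass_fraction(shell_masses, shell_peak_temperatures):
    """Fraction of the total mass that stayed in the pristine regime."""
    total = sum(shell_masses)
    pristine = sum(m for m, t in zip(shell_masses, shell_peak_temperatures)
                   if thermal_regime(t) == "pristine")
    return pristine / total


# Hypothetical shells (arbitrary mass units, peak temperature in K).
print(pristine_mass_fraction([1.0, 1.0, 2.0], [60.0, 90.0, 150.0]))
```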

In order to compare the model enrichment to the observations, we will derive a simple mathematical formula based on the assumptions above. In Figures 5, 6 & 7 we plot the model predictions alongside the observed data points. For the comparison we use observed data points from Figure 3. It must be noted that Figure 1 considers the CO/H2O mass ratio of comets up to an observed distance of 6 au. However, comets that are observed beyond 2 au are not able to directly sublimate significant amounts of water ice. In contrast, they are certainly able to expel trapped CO gas, released through crystallisation of the amorphous ice hosts. While water can still be expelled to some extent as a byproduct of hyper-volatile activity, we can expect the CO/H2O ratio to be enhanced. Therefore, beyond 2 au, this ratio should not be indicative of the intrinsic mass fraction of entrapped CO within the amorphous host ice at the surface. Indeed, Figure 1 shows that the peak observed CO/H2O ratios are in excess of 1. This is only possible due to the large observed distance, because the amorphous ice host cannot contain more trapped gas than matrix (Carmack et al., 2023). Figure 3 on the other hand, shows only the comets whose distance from the Sun at the time of observation is less than 2 au. It should therefore be more indicative of the intrinsic properties of the amorphous ices, which is why we have chosen to use it for the comparison.

In order to plot the theoretical enrichment curves, we first express Figures 5-13 of Malamud et al. (2022) in terms of mass rather than volume. We then define the following free parameters with respect to mass: $f_{\rm H2O}$ - the mass fraction of amorphous H2O ice in the nucleus; $f_{\rm CO2}$ - the mass fraction of amorphous CO2 ice in the nucleus; and $f_{\rm trapCO}$ - the initial mass fraction of trapped CO in the amorphous H2O or CO2 host ices (assumed to be equal for simplicity).

Comets are presently regarded as highly refractory-rich bodies, having refractory to ice mass ratios in approximately the range of 3-5 (Rotundi et al., 2015; Fulle et al., 2016, 2017, 2019; Choukroun et al., 2020), and comet 67P/C-G, the most well-studied cometary archetype (Fulle et al., 2016; Filacchione et al., 2019; Groussin et al., 2019), has a refractory to ice mass ratio of about 4. We therefore use a combined ice mass fraction of $f_{\rm H2O}+f_{\rm CO2}=0.2$ to comply with these estimates. For the mass ratio between H2O and CO2 we also rely on estimates from comet 67P/C-G, with a respective ratio of $\sim$15 (Rubin et al., 2023). For our fiducial parameter set we thus have: $f_{\rm H2O}=0.1875$ and $f_{\rm CO2}=0.0125$.
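The fiducial values follow from simple arithmetic on the two quoted ratios. A minimal sketch is given below, assuming that the refractory to ice ratio and the H2O to CO2 ratio are both mass ratios and that H2O and CO2 are the only ices counted; the function name is hypothetical.

```python
def ice_mass_fractions(refractory_to_ice=4.0, h2o_to_co2=15.0):
    """Return (f_h2o, f_co2), the nucleus mass fractions of the two host ices."""
    # Total ice mass fraction: ice / (ice + refractories).
    f_ice = 1.0 / (1.0 + refractory_to_ice)
    # Split the ice between H2O and CO2 according to their mass ratio.
    f_co2 = f_ice / (1.0 + h2o_to_co2)
    f_h2o = f_ice - f_co2
    return f_h2o, f_co2


f_h2o, f_co2 = ice_mass_fractions()
print(f"f_H2O = {f_h2o:.4f}, f_CO2 = {f_co2:.4f}")
```

With the default arguments this gives a combined ice fraction of 0.2, split as 0.1875 and 0.0125. Varying `refractory_to_ice` over the quoted range of 3-5 changes the combined ice fraction between 0.25 and about 0.17.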

The choice of $f_{\rm trapCO}$ is motivated by the raw observed data. Figure 3 shows that the smallest CO/H2O number fraction, in the smallest comet nucleus, is around 0.003. Were comets completely pristine, this would give us the approximate value of $f_{\rm trapCO}$ (amorphous water ice near the surface sublimates along with its entrapped CO, which in that case is uniformly distributed throughout the whole nucleus). However, Figures 5-13 in Malamud et al. (2022) show that even comets with radii as small as 0.5 km can still attain temperatures in excess of 70 K deep beneath the surface, despite their small size (but only when minimising their formation time). Therefore, even in small nuclei some degree of migration and enrichment of CO is possible, and in such cases the incipient CO/H2O could be slightly smaller than 0.003. To account for this, we choose a round value of CO/H2O=0.001, slightly lower than, yet characteristic of, the 0.003 minimum observed. From this we obtain the mass fraction $f_{\rm trapCO}$ (multiplying by the molecular weight ratio - see below), capturing the right order of magnitude based on the minimum observed CO fraction.

Using $F1$, $F2$ and $F3$ to denote the mass fractions of layers within comet nuclei that have $T<70$ K, $70<T<100$ K and $T>100$ K, respectively, obtained from Malamud et al. (2022), we can calculate the fraction of CO released from the bulk of the comet, denoted as

$${\rm CO}_{\rm bulk}=F3\cdot f_{\rm H2O}\cdot f_{\rm trapCO}+(F2+F3)\cdot f_{\rm CO2}\cdot f_{\rm trapCO}$$ (1)

Using Eq. 1, the fraction of CO newly trapped inside the pristine amorphous water ice, i.e. the degree of its enrichment, denoted by $f_{\rm enrichCO}$, is approximately given by

$$f_{\rm enrichCO}\cong\left(\frac{F1\cdot(f_{\rm H2O}+f_{\rm CO2})\cdot f_{\rm trapCO}+{\rm CO}_{\rm bulk}}{F1\cdot f_{\rm H2O}}\right)\frac{m_{\rm H2O}}{m_{\rm CO}}$$ (2)

where $m_{\rm H2O}=18$ and $m_{\rm CO}=28$ are the molecular weights of H2O and CO molecules. The molecular weight ratio is required in order to go from mass fraction to number fraction, as in our reported values from observations. Note that the actual ratios in the coma also depend on the relative lifetimes of the molecules in the coma, an effect which we do not consider here. Therefore, Eq. 2 has to be taken only as a first-order approximation, but this approximation is good enough to capture the trend in the data. Recall again that $f_{\rm trapCO}$ denotes the initial uniform fraction of trapped CO within the amorphous ice matrices (assumed equal for H2O and CO2), whereas $f_{\rm enrichCO}$ denotes the final enriched ratio in the remaining H2O amorphous ice.
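As an illustration, Eqs. 1 and 2 can be evaluated with a few lines of code. This is a sketch under stated assumptions rather than the code used to produce the figures: the function names are hypothetical, and the layer mass fractions in the example call (0.5, 0.2, 0.3) are invented placeholders, not values taken from Malamud et al. (2022). The initial mass fraction of trapped CO is obtained from the adopted number ratio CO/H2O = 0.001 by multiplying by $m_{\rm CO}/m_{\rm H2O}$.

```python
M_H2O = 18.0  # molecular weight of H2O
M_CO = 28.0   # molecular weight of CO


def co_bulk(F2, F3, f_h2o, f_co2, f_trap_co):
    """Eq. 1: mass fraction of CO released from the heated interior."""
    return F3 * f_h2o * f_trap_co + (F2 + F3) * f_co2 * f_trap_co


def enriched_co_to_h2o(F1, F2, F3, f_h2o=0.1875, f_co2=0.0125,
                       initial_number_ratio=0.001):
    """Eq. 2: CO/H2O number ratio in the pristine layer after enrichment."""
    if F1 <= 0.0:
        raise ValueError("Eq. 2 needs a non-zero pristine mass fraction F1")
    # Convert the adopted initial number ratio into a mass fraction.
    f_trap_co = initial_number_ratio * M_CO / M_H2O
    # CO already present in the pristine layer plus CO arriving from below.
    co_total = F1 * (f_h2o + f_co2) * f_trap_co + co_bulk(
        F2, F3, f_h2o, f_co2, f_trap_co)
    # Divide by the pristine water mass and convert back to a number ratio.
    return co_total / (F1 * f_h2o) * M_H2O / M_CO


# Unheated nucleus: everything is pristine, so nothing migrates.
print(enriched_co_to_h2o(F1=1.0, F2=0.0, F3=0.0))
# Placeholder layer fractions for a partly heated nucleus (illustrative only).
print(enriched_co_to_h2o(F1=0.5, F2=0.2, F3=0.3))
```

In the unheated limit the result is slightly above the adopted 0.001, because Eq. 2 counts the CO held in the CO2 host against the water mass alone. As $F1$ shrinks, the same released CO is concentrated into less pristine ice and the predicted ratio rises.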

Figures 5-7 show the CO/H2O abundances predicted by Eq. 2. Different lines depict different formation times for each comet (quicker formation corresponds to greater heating by short-lived radionuclides), and the observations are marked by the full circles, for comparison. A detailed explanation of all the model parameters is given in Malamud et al. (2022). Here we provide a brief explanation. The mineral fraction is introduced to the model since radionuclides are only incorporated into refractory silicate minerals, and not organics. Such minerals might not be present in comets in the same proportion as in meteorites, from which the radionuclide abundances are derived (we therefore also consider 50% and 5% of the meteoritic fraction). The pebble radius controls heat and mass transport inside the comet, and we take a binary selection for the pebble radii of 1 mm and 1 cm. This choice roughly represents the lower and upper limits expected in the literature. The permeability coefficient $b$ is related to the Knudsen diffusivity and in turn to gas permeability and flow within the comet. Here too we consider lower and upper limit values.

It is encouraging that a physical interpretation, however approximate, nicely captures the observed CO/H2O trend. If some comets were to form earlier than others, the various curves span the desired range of the observations. We note that only one set of parameters is adopted for $f_{\rm H2O}$, $f_{\rm CO2}$ and $f_{\rm trapCO}$ in these plots; however, our choice of parameters was physically motivated by 67P, as explained above. We also experimented with changing the $f_{\rm H2O}$:$f_{\rm CO2}$ mass ratio considerably, and we always find that as long as H2O is the dominant amorphous ice host, the model keeps capturing the trend with some small variations. It is indeed expected that water is much more prevalent than carbon dioxide in (the bulk of) comet nuclei.
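The weak dependence on the H2O:CO2 mass ratio can be seen directly from Eq. 2. Dividing the enriched ratio by the initial one removes both the trapped CO fraction and the total ice fraction, leaving a factor that depends only on the layer mass fractions and on the H2O:CO2 ratio. The sketch below evaluates this factor; the function name is hypothetical and the layer fractions are invented placeholders, not output of the Malamud et al. (2022) model.

```python
def enrichment_factor(F1, F2, F3, h2o_to_co2):
    """Enriched CO/H2O divided by the initial CO/H2O, following Eq. 2.

    The host-ice fractions enter only through their ratio r = f_H2O / f_CO2,
    so they are expressed here in units of f_CO2.
    """
    r = h2o_to_co2
    # CO originally in the pristine layer (both hosts).
    pristine = F1 * (r + 1.0)
    # CO released from the H2O host (T > 100 K) and the CO2 host (T > 70 K).
    released = F3 * r + (F2 + F3)
    return (pristine + released) / (F1 * r)


# Placeholder layer fractions for a partly heated nucleus (illustrative only).
F1, F2, F3 = 0.5, 0.2, 0.3
for ratio in (5.0, 15.0, 50.0):
    print(ratio, enrichment_factor(F1, F2, F3, ratio))
```

For these placeholder fractions the factor changes by only a few tens of per cent when the H2O:CO2 ratio is varied by an order of magnitude, in line with the small variations noted above.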

Figure 5: CO/H2O ratio: observed data points (full circles) versus theoretical curves predicted by Malamud et al. (2022) (different lines depict different formation times - see legend). The mineral fraction is 1. Panels: (a) pebble radius = 0.1 cm, permeability $b=1$; (b) pebble radius = 0.1 cm, permeability $b=7$; (c) pebble radius = 1 cm, permeability $b=1$; (d) pebble radius = 1 cm, permeability $b=7$. Parameters are explained in the main text.

Figure 6: CO/H2O ratio: observed data points (full circles) versus theoretical curves predicted by Malamud et al. (2022) (different lines depict different formation times - see legend). The mineral fraction is 0.5. Panels: (a) pebble radius = 0.1 cm, permeability $b=1$; (b) pebble radius = 0.1 cm, permeability $b=7$; (c) pebble radius = 1 cm, permeability $b=1$; (d) pebble radius = 1 cm, permeability $b=7$. Parameters are explained in the main text.

Figure 7: CO/H2O ratio: observed data points (full circles) versus theoretical curves predicted by Malamud et al. (2022) (different lines depict different formation times - see legend). The mineral fraction is 0.05. Panels: (a) pebble radius = 0.1 cm, permeability $b=1$; (b) pebble radius = 0.1 cm, permeability $b=7$; (c) pebble radius = 1 cm, permeability $b=1$; (d) pebble radius = 1 cm, permeability $b=7$. Parameters are explained in the main text.

Based on Figures 5-7 one might also speculate further that the peak CO/H2O mass ratio found for mid-to-large sized comets can be more readily explained (a) by early formation; or (b) if the mineral fraction is not as small as 0.05. In addition, the spread in CO/H2O ratios at each size bin may indicate that comets are formed over an extended period of time and/or have a large variation in mineral fraction. Current data does not provide an easy way to differentiate between various options, but there is indeed some indication that comet nuclei have varied mineral fractions. Recently, Spitzer remote observations of cometary dust revealed a wide range of amorphous carbon mass fractions spanning 10-90 %, based on a large set of a few dozen comets (Harker et al., 2023). While the mean value of 54% indicates that the mass ratio of silicate minerals to organics is, on average, around 1:1, i.e. very similar to comets 67P/C-G (Bardyn et al., 2017), C/2013 US10 (Catalina) (Woodward et al., 2021) and Halley (Fomenkova & Chang, 1993), a spread in the mineral fraction is currently supported. We think that this point is very important and we strongly advocate for future study of the ratio of silicates/organics in cometary dust.

An additional point is that low CO/H2O ratios in certain comets should simply be a sign that the CO-enriched layers have been largely removed already, indicating that these comets are more dynamically evolved than their high CO/H2O ratio counterparts. A prominent example is the comet 2P/Encke, which we know has been active for at least a few centuries (Marsden & Sekanina, 1974). Thus, at each size bin, if a comet was observed to have a CO/H2O ratio in the upper part of the spread, it might also be considered a sign of having a relatively fresh dynamical origin.

These plots also reveal that the pebble size and the $b$ coefficient are of lesser importance compared to other parameters, echoing the conclusions already suggested by Malamud et al. (2022).

4.2 Other species less volatile than CO

An intriguing result is the presence of a significant correlation between size and CO/H2O ratio, while observing a lack of correlation, or a weaker correlation among the ratios of some other volatiles. There are two possible explanations. The first might be a simple lack of observations, given that we have a rather small statistical sample for many species. The second explanation is much more fundamental.

We hypothesise that most volatiles released from their amorphous ice hosts would be buried deeper inside the comet, hiding from our sight as only the outermost surface layers are being eroded through activity. Of the many hyper- and super-volatiles that we consider in this work (incorporated into our sample because it is possible to observe them via telescope surveys), only CO and CH4 have extremely low sublimation/deposition temperatures (Womack et al., 2017). That being the case, the other hyper-volatiles would encounter temperatures that should lead to their re-incorporation into amorphous ice before CO and CH4, and the super-volatiles could even deposit as pure ice condensates, when they have characteristic sublimation temperatures that are higher than the crystallisation temperatures of the host amorphous ice. The resulting outcome is that they are buried deeper within the comet, somewhere between the surface and the location of their release by radiogenic heating.

The explanation is qualitatively straightforward and depends on temperature. The surface temperature of an active comet in the inner Solar System is determined by its exact heliocentric distance. However, below the skin depth the temperature is much colder – and is a relic of its previous location before it was perturbed into the inner Solar System. For example, if it came from the Kuiper belt, this temperature is certainly lower than the temperature of crystallisation of amorphous ice, but often not lower than the deposition temperature of CO and CH4 (Lisse et al., 2021; Parhi & Prialnik, 2023). We therefore expect the main volatiles which are not CO or CH4 to be buried deeper below the surface of the comet. Their release location within the comet corresponds to the inner cubic-amorphous ice boundary. In this context, inner refers to the boundary that forms as a result of an internal temperature gradient due to radiogenic heating from within. It should not be confused with the (outer) amorphous-cubic boundary that might form externally by a heat wave propagating inwards from the insolated comet surface (triggering crystallisation). Their exact burial location is a function of the characteristic deposition temperature (e.g., HCN would be buried deeper than C2H6).

One interpretation of our results therefore might be that most active comets are still eroding their outermost layers, which are not significantly enriched by any of these less volatile gases. It should be noted that some species, such as HCN (see Figure 3), do exhibit a much more moderate slope, in contrast to the steep slope we obtained for CO. This could still be reconciled with our hypothesis, since comet nuclei are not spherically symmetric, nor is their surface eroded homogeneously. In reality only small fractions of the comet surface might be eroded to expose deeper layers, so the integrated result over the entire comet surface is that larger nuclei also release more of these gases, but the slope is moderated by these geometrical factors.

This raises the question, however: why does CH4 not give us the same slope as CO? While CH4 indeed sublimates at a slightly higher temperature, the small difference is not a likely explanation. A more robust explanation involves the trapping efficiency of these two gases in amorphous ice. Bar-Nun et al. (1988) have shown that when both CO and CH4 gases are streamed into a pre-deposited amorphous ice, the entrapment of CH4 is 150 times more efficient than that of CO (see footnote 5). For comets this would mean that sequential deposition of these two gases, when they are released together from the underlying crystallising ice, leads to the preferential trapping of CH4, while CO continues to flow until it sees no competition from CH4. The expectation is therefore that CO lies closer to the surface, while CH4 is buried deeper. A final and trivial explanation is that it is simply due to the small amount of data available for CH4, only 10 comets in total, and the larger dispersion of the measurements. This latter comment is however true, in general, for virtually all the other species as well.

Footnote 5: This result is for a temperature of 50 K and when ample CO and CH4 gas is used in the experiment (CH4 and CO molecules differ in both their size and energy of interaction with the host ice due to their polarity, leading to the greater ease of trapping of CH4).

4.3 A note about cometary outbursts

We briefly note that cometary outbursts at large heliocentric distances, post perihelion passage, are often associated with a heat wave propagating inwards from an insolated surface, to a deeply buried crystalline-amorphous ice boundary. Upon reaching this boundary with sufficient energy, crystallisation releases latent heat as well as entrapped gases. Latent heat is important because it can trigger further crystallisation, a process which however cannot continue indefinitely, since the eventual sublimation of ice absorbs a large amount of energy. Trapped gases are important since they are effectively the cause of the outbursts. On release, these gases lead to a build-up of pressure, and only when this pressure exceeds the tensile strength of the ambient solid materials does it lead to cracking and possibly more rapid expulsion of gas (Prialnik & Bar-Nun, 1987, 1992).

The current study does not alter this basic picture in any way. However, we have envisioned here the formation of localised spots of amorphous ice, highly enriched with various high-volatility species. These are buried at various distances from the surface, based on the sublimation properties of each particular volatile, and after having concentrated them from the bulk of the comet. For a spherically-symmetric nucleus, an onion-like stratification might be expected. Yet in reality comet nuclei have irregular shapes, which means that these pockets might be rather more sporadically placed. The significance in relation to our study, is simply to justify a large fraction of high-volatility species required for an enhanced outburst.

4.4 Other predictions

In Section 1 we also presented two additional predictions besides the general size-composition correlation. We suggested that different dynamical classes of comets might exhibit significant differences in their size-composition correlation, as a result of differences in their erosional state. Our findings in Section 3 however cannot confirm this hypothesis. We believe that this could be due to insufficient statistics to draw a conclusion.

We additionally speculated that dynamically evolved active comets are smaller than their long period counterparts. Figure 3 shows that this is strictly correct (the triangle positions tend to be located more to the right, whereas the squares are positioned more to the left, within each sub-plot). However, we caution that this result might also simply be an observational bias: ECs are generally less active and can be observed inactive closer than NICs, and therefore it is preferentially easier to work out the size for them. At this time we require more data to confirm all our other predictions.

4.5 Future observations

In order to improve comparisons between models and comet composition in the future, more data are necessary. In particular, we need larger homogeneous datasets for comet sizes but also comet composition information where the abundance of species is measured simultaneously (or as close to that as possible), at different distances from the Sun, and in a consistent way in terms of observational techniques and models used to derive production rates. With CO and CO2 being the most abundant volatiles in comets, abundance measurements of these species for a larger number of comets for which we have size information are particularly critical. Additional measurements of CH4 and N2 abundances would also be very valuable, as these species have sublimation temperatures close to that of CO. N2 abundances were not presented in this work as they are extremely difficult to measure. This species was only detected in situ in the coma of comet 67P/C-G by the ROSINA instrument onboard the Rosetta mission (Rubin et al., 2015). While N2 itself is difficult to detect, ${\rm N}_2^+$ can be observed at optical wavelengths from the ground and has been observed in a handful of comets (Korsun et al., 2008; Ivanova et al., 2018; Cochran & McKay, 2018; Opitom et al., 2019). This can then be used to infer the abundance of N2 in comets. More N2 or ${\rm N}_2^+$ measurements would be very valuable in the future. In general, this type of work would particularly benefit from a more substantial sample of large comets for which composition information is available. Indeed, this database only contains a handful of comets larger than 5 km with composition information.

5 Conclusions

In this manuscript, following predictions of a model from Malamud et al. (2022), we gathered a large number of literature data to search for correlations between the size and composition of comets.

  • For the dataset we have gathered we found a statistically significant correlation between the CO/H2O abundance ratios and the sizes of both ecliptic and nearly isotropic comets. This trend persists even when selecting for comets observed within 2 au from the Sun, indicating it is not driven by changes in the abundance ratios with heliocentric distance.

  • A weaker correlation was also observed for some other volatile species, however further tests indicate that our analysis would critically benefit from obtaining a bigger statistical sample in the future.

  • We do not see any strong correlations for daughter species.

  • We do not see a similarly strong correlation for CH4, in spite of having a comparable sublimation temperature to that of CO.

We develop a simple theoretical framework based on the Malamud et al. (2022) model, with which we rather accurately obtain the CO/H2O abundance-to-size trend in our observed data. In this framework we consider CO to migrate from the bulk of the nucleus outwards, becoming entrapped within its outer amorphous ice layers, and in turn enhancing their CO-enrichment as a function of the nucleus size.

We emphasise that the correlation between CO/H2O abundance and size appears to be robust for the dataset we have presented, where we have gathered together a wide range of measurements from the available literature. However, this dataset is ultimately limited by its size and also by the intrinsic difficulties in accurately determining the physical properties of cometary nuclei from a variety of observations and techniques. This study would have benefited from, and therefore strongly motivates, a larger homogeneous set of composition measurements in the future, in particular for highly volatile species like CO, CH4, or N2. State of the art observatories, e.g. JWST and the upcoming Vera C. Rubin Observatory and ELT, could provide more opportunities to characterise the physical properties of cometary nuclei, especially the sizes of long period comets which are otherwise sparse in the literature.

Acknowledgements

The authors would like to thank the referee for a thorough review that helped to improve the work. We wish to thank Diana Laufer for providing information about the relative entrapment efficiency of CO versus CH4 in amorphous water ice. We also thank Rosita Kokotanekova for valuable input on the selection of comet sizes from literature sources. UM and HBP acknowledge support by the Niedersächsisches Vorab in the framework of the research cooperation between Israel and Lower Saxony under grant ZN 3630 and grant by MOST-space. CO and JR acknowledge the support of the Royal Society. This work made use of the NASA SBDB service and PDS datasets ear-c-phot-5-rdr-lowell-comet-db-pr-v1.0 (Osip et al., 2003) and urn:nasa:pds:compil-comet:nuc_properties::1.0 (Barnes et al., 2010). The following software packages were used in this work: matplotlib (Hunter, 2007), numpy (Harris et al., 2020), scipy (Virtanen et al., 2020), pandas (McKinney, 2010), pds3 (Kelley, 2021), pds4_tools (Nagdimunov, 2021), astropy (Astropy Collaboration et al., 2022), astroquery (Ginsburg et al., 2019), sbpy (Mommert et al., 2019) and camelot (Mehta, 2021). For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.

Data Availability

The dataset constructed in this work is available online as Supplementary data and also from the University of Edinburgh DataShare repository (https://doi.org/10.7488/ds/7723). The data compiled in this work may also be obtained via reasonable e-mail request to the lead authors.

References

  • A’Hearn et al. (1995) A’Hearn M. F., Millis R. C., Schleicher D. O., Osip D. J., Birch P. V., 1995, Icarus, 118, 223
  • Altenhoff et al. (1999) Altenhoff W. J., et al., 1999, Astronomy and Astrophysics, 348, 1020
  • Astropy Collaboration et al. (2022) Astropy Collaboration et al., 2022, The Astrophysical Journal, 935, 167
  • Bar-Nun et al. (1985) Bar-Nun A., Herman G., Laufer D., Rappaport M. L., 1985, Icarus, 63, 317
  • Bar-Nun et al. (1987) Bar-Nun A., Dror J., Kochavi E., Laufer D., 1987, Phys. Rev. B, 35, 2427
  • Bar-Nun et al. (1988) Bar-Nun A., Kleinfeld I., Kochavi E., 1988, Phys. Rev. B, 38, 7749
  • Bardyn et al. (2017) Bardyn A., et al., 2017, MNRAS, 469, S712
  • Barnes et al. (2010) Barnes T. F., A’Hearn M. F., Kolokolova L., 2010, Properties of Comet Nuclei, Version 2.0, doi:10.26007/CSR5-JW43
  • Bauer et al. (2017) Bauer J. M., et al., 2017, The Astronomical Journal, 154, 53
  • Bischoff et al. (2021) Bischoff D., Gundlach B., Blum J., 2021, MNRAS, 508, 4705
  • Biver et al. (1999) Biver N., et al., 1999, The Astronomical Journal, 118, 1850
  • Biver et al. (2007) Biver N., et al., 2007, Icarus, 187, 253
  • Biver et al. (2011a) Biver N., et al., 2011a, in EPSC-DPS Joint Meeting 2011. p. 938
  • Biver et al. (2011b) Biver N., Bockelée-Morvan D., Colom P., Crovisier J., Paubert G., Weiss A., Wiesemeyer H., 2011b, Astronomy & Astrophysics, 528, A142
  • Biver et al. (2012) Biver N., et al., 2012, Astronomy & Astrophysics, 539, A68
  • Biver et al. (2019) Biver N., et al., 2019, Astronomy & Astrophysics, 630, A19
  • Biver et al. (2021a) Biver N., et al., 2021a, Astronomy & Astrophysics, 648, A49
  • Biver et al. (2021b) Biver N., et al., 2021b, Astronomy & Astrophysics, 651, A25
  • Biver et al. (2022a) Biver N., Russo N. D., Opitom C., Rubin M., 2022a, Chemistry of Comet Atmospheres (arxiv:2207.04800), doi:10.48550/arXiv.2207.04800
  • Biver et al. (2022b) Biver N., Boissier J., Bockelée-Morvan D., Crovisier J., Cottin H., Cordiner M. A., Roth N. X., Moreno R., 2022b, Astronomy & Astrophysics, 668, A171
  • Bockelée-Morvan et al. (2000) Bockelée-Morvan D., et al., 2000, Astronomy and Astrophysics, 353, 1101
  • Bockelée-Morvan et al. (2004) Bockelée-Morvan D., et al., 2004, Icarus, 167, 113
  • Bockelée-Morvan et al. (2010) Bockelée-Morvan D., et al., 2010, Astronomy and Astrophysics, 518, L149
  • Bockelée-Morvan et al. (2016) Bockelée-Morvan D., et al., 2016, Monthly Notices of the Royal Astronomical Society, 462, S170
  • Bockelée-Morvan et al. (2022) Bockelée-Morvan D., et al., 2022, Astronomy & Astrophysics, 664, A95
  • Bodewits et al. (2011) Bodewits D., Kelley M. S., Li J.-Y., Landsman W. B., Besse S., A’Hearn M. F., 2011, The Astrophysical Journal, 733, L3
  • Bodewits et al. (2022) Bodewits D., Bonev B. P., Cordiner M. A., Villanueva G. L., 2022, Radiative Processes as Diagnostics of Cometary Atmospheres (arxiv:2209.02616)
  • Boehnhardt (2002) Boehnhardt H., 2002, Earth Moon and Planets, 89, 91
  • Boehnhardt (2004) Boehnhardt H., 2004, in Comets II. University of Arizona Press, p. 301
  • Boehnhardt et al. (1999) Boehnhardt H., Rainer N., Birkle K., Schwehm G., 1999, Astronomy and Astrophysics, 341, 912
  • Boehnhardt et al. (2002) Boehnhardt H., et al., 2002, Astronomy & Astrophysics, 387, 1107
  • Boehnhardt et al. (2008) Boehnhardt H., Tozzi G. P., Bagnulo S., Muinonen K., Nathues A., Kolokolova L., 2008, Astronomy & Astrophysics, 489, 1337
  • Boissier et al. (2011) Boissier J., et al., 2011, Astronomy & Astrophysics, 528, A54
  • Boissier et al. (2013) Boissier J., et al., 2013, Astronomy & Astrophysics, 557, A88
  • Bonev et al. (2021) Bonev B. P., et al., 2021, The Planetary Science Journal, 2, 45
  • Buratti et al. (2004) Buratti B., Hicks M., Soderblom L., Britt D., Oberst J., Hillier J., 2004, Icarus, 167, 16
  • Capria et al. (2017) Capria M. T., et al., 2017, Monthly Notices of the Royal Astronomical Society, 469, S685
  • Carmack et al. (2023) Carmack R. A., Tribbett P. D., Loeffler M. J., 2023, ApJ, 942, 1
  • Choi et al. (2002) Choi Y.-J., Cohen M., Merk R., Prialnik D., 2002, Icarus, 160, 300
  • Choukroun et al. (2020) Choukroun M., et al., 2020, Space Sci. Rev., 216, 44
  • Cochran & McKay (2018) Cochran A. L., McKay A. J., 2018, ApJ, 854, L10
  • Cochran et al. (2012) Cochran A., Barker E., Gray C., 2012, Icarus, 218, 144
  • Collings et al. (2003) Collings M. P., Dever J. W., Fraser H. J., McCoustra M. R. S., Williams D. A., 2003, ApJ, 583, 1058
  • Combi et al. (2019) Combi M., Mäkinen T., Bertaux J.-L., Quémerais E., Ferron S., 2019, Icarus, 317, 610
  • Combi et al. (2021) Combi M. R., Mäkinen T., Bertaux J.-L., Quémerais E., Ferron S., 2021, The Astrophysical Journal Letters, 907, L38
  • Davidsson (2021) Davidsson B. J. R., 2021, Monthly Notices of the Royal Astronomical Society, 505, 5654
  • De Sanctis et al. (2001) De Sanctis M. C., Capria M. T., Coradini A., 2001, The Astronomical Journal, 121, 2792
  • Dello Russo et al. (2016) Dello Russo N., Kawakita H., Vervack R. J., Weaver H. A., 2016, Icarus, 278, 301
  • Dello Russo et al. (2020) Dello Russo N., et al., 2020, Icarus, 335, 113411
  • DiSanti et al. (2014) DiSanti M. A., Villanueva G. L., Paganini L., Bonev B. P., Keane J. V., Meech K. J., Mumma M. J., 2014, Icarus, 228, 167
  • DiSanti et al. (2017) DiSanti M., et al., 2017, Central Bureau Electronic Telegrams, 4357, 1
  • Drozdovskaya et al. (2023) Drozdovskaya M. N., et al., 2023, Astronomy & Astrophysics, 677, A157
  • Eisner et al. (2019) Eisner N. L., Knight M. M., Snodgrass C., Kelley M. S. P., Fitzsimmons A., Kokotanekova R., 2019, The Astronomical Journal, 157, 186
  • Faggi et al. (2019) Faggi S., Mumma M. J., Villanueva G. L., Paganini L., Lippi M., 2019, The Astronomical Journal, 158, 254
  • Faggi et al. (2021) Faggi S., Lippi M., Camarca M., Buzard C. F., Villanueva G. L., Doppmann G. W., Blake G. A., Mumma M. J., 2021, The Astronomical Journal, 162, 178
  • Farnham et al. (2017) Farnham T., Kelley M. S., Bodewits D., Bauer J. M., 2017, in AAS/Division for Planetary Sciences Meeting. p. 403.01
  • Feaga et al. (2013) Feaga L. M., et al., 2013, The Astronomical Journal, 147, 24
  • Feldman et al. (2018) Feldman P. D., et al., 2018, The Astronomical Journal, 155, 9
  • Fernández et al. (2013) Fernández Y. R., et al., 2013, Icarus, 226, 1138
  • Filacchione et al. (2019) Filacchione G., et al., 2019, Space Science Reviews, 215, 19
  • Fink (2009) Fink U., 2009, Icarus, 201, 311
  • Fink & Hicks (1996) Fink U., Hicks M. D., 1996, ApJ, 459, 729
  • Fomenkova & Chang (1993) Fomenkova M., Chang S., 1993, in Lunar and Planetary Science Conference. p. 501
  • Fomenkova et al. (1995) Fomenkova M. N., Jones B., Pina R., Puetter R., Sarmecanic J., Gehrz R., Jones T., 1995, The Astronomical Journal, 110, 1866
  • Fulle et al. (2016) Fulle M., et al., 2016, ApJ, 821, 19
  • Fulle et al. (2017) Fulle M., et al., 2017, MNRAS, 469, S45
  • Fulle et al. (2019) Fulle M., et al., 2019, MNRAS, 482, 3326
  • Gálvez et al. (2007) Gálvez O., Ortega I. K., Maté B., Moreno M. A., Martín-Llorente B., Herrero V. J., Escribano R., Gutiérrez P. J., 2007, Astronomy and Astrophysics, 472, 691
  • Gálvez et al. (2008) Gálvez Ó., Maté B., Herrero V. J., Escribano R., 2008, Icarus, 197, 599
  • Gicquel et al. (2015) Gicquel A., et al., 2015, The Astrophysical Journal, 807, 19
  • Ginsburg et al. (2019) Ginsburg A., et al., 2019, The Astronomical Journal, 157, 98
  • Groussin et al. (2010) Groussin O., Lamy P., Jorda L., 2010, Planetary and Space Science, 58, 904
  • Groussin et al. (2019) Groussin O., et al., 2019, Space Science Reviews, 215, 29
  • Gundlach et al. (2011) Gundlach B., Skorov Y. V., Blum J., 2011, Icarus, 213, 710
  • Gundlach et al. (2020) Gundlach B., Fulle M., Blum J., 2020, Monthly Notices of the Royal Astronomical Society, 493, 3690
  • Güttler et al. (2023) Güttler C., et al., 2023, MNRAS, 524, 6114
  • Harker et al. (2023) Harker D. E., Wooden D. H., Kelley M. S. P., Woodward C. E., 2023, The Planetary Science Journal, 4, 242
  • Harmon et al. (1997) Harmon J. K., et al., 1997, Science, 278, 1921
  • Harmon et al. (2008) Harmon J. K., Nolan M. C., Howell E. S., Giorgini J. D., 2008, International Astronomical Union Circular, 8909, 1
  • Harrington Pinto et al. (2022) Harrington Pinto O., Womack M., Fernandez Y., Bauer J., 2022, The Planetary Science Journal, 3, 247
  • Harris (1998) Harris A. W., 1998, Icarus, 131, 291
  • Harris et al. (2002) Harris W. M., Scherb F., Mierkiewicz E., Oliversen R., Morgenthaler J., 2002, The Astrophysical Journal, 578, 996
  • Harris et al. (2020) Harris C. R., et al., 2020, Nature, 585, 357
  • Herrero et al. (2010) Herrero V. J., Gálvez Ó., Maté B., Escribano R., 2010, Physical Chemistry Chemical Physics, 12, 3164
  • Hu et al. (2019) Hu X., Gundlach B., von Borstel I., Blum J., Shi X., 2019, Astronomy and Astrophysics, 630, A5
  • Hunter (2007) Hunter J. D., 2007, Computing in Science & Engineering, 9, 90
  • Ivanova et al. (2018) Ivanova O. V., Picazzio E., Luk’yanyk I. V., Cavichia O., Andrievsky S. M., 2018, Planet. Space Sci., 157, 34
  • Jewitt (2009) Jewitt D., 2009, The Astronomical Journal, 137, 4296
  • Jewitt (2022) Jewitt D., 2022, The Astronomical Journal, 164, 158
  • Jorda et al. (2016) Jorda L., et al., 2016, Icarus, 277, 257
  • Kelley (2021) Kelley M., 2021, pds3, https://github.com/mkelley/pds3
  • Kelley & Kolokolova (2014) Kelley M., Kolokolova L., 2014, in Proceedings of Asteroids, Comets, Meteors 2014. p. 262
  • Knight et al. (2023) Knight M. M., Kokotanekova R., Samarasinha N. H., 2023, Physical and Surface Properties of Comet Nuclei from Remote Observations, doi:10.48550/arXiv.2304.09309
  • Korsun et al. (2008) Korsun P. P., Ivanova O. V., Afanasiev V. L., 2008, Icarus, 198, 465
  • Kumi et al. (2006) Kumi G., Malyk S., Hawkins S., Reisler H., Wittig C., 2006, Journal of Physical chemistry A, 110, 2097
  • Lamy et al. (2004) Lamy P. L., Toth I., Fernandez Y. R., Weaver H. A., 2004, in Comets II. p. 223
  • Lamy et al. (2009) Lamy P. L., Toth I., Weaver H. A., A’Hearn M. F., Jorda L., 2009, Astronomy & Astrophysics, 508, 1045
  • Lamy et al. (2011) Lamy P. L., Toth I., Weaver H. A., A’Hearn M. F., Jorda L., 2011, Monthly Notices of the Royal Astronomical Society, 412, 1573
  • Langland-Shula & Smith (2011) Langland-Shula L. E., Smith G. H., 2011, Icarus, 213, 280
  • Laufer et al. (1987) Laufer D., Kochavi E., Bar-Nun A., 1987, Phys. Rev. B, 36, 9219
  • Lejoly et al. (2022) Lejoly C., et al., 2022, The Planetary Science Journal, 3, 17
  • Levison (1996) Levison H. F., 1996, in Rettig T., Hahn J. M., eds, Astronomical Society of the Pacific Conference Series Vol. 107, Completing the Inventory of the Solar System. pp 173–191
  • Li et al. (2020) Li J., Jewitt D., Mutchler M., Agarwal J., Weaver H., 2020, The Astronomical Journal, 159, 209
  • Lippi et al. (2021) Lippi M., Villanueva G. L., Mumma M. J., Faggi S., 2021, The Astronomical Journal, 162, 74
  • Lis et al. (2019) Lis D. C., et al., 2019, Astronomy & Astrophysics, 625, L5
  • Lisse et al. (2021) Lisse C. M., et al., 2021, Icarus, 356, 114072
  • Malamud et al. (2022) Malamud U., Landeck W. A., Bischoff D., Kreuzig C., Perets H. B., Gundlach B., Blum J., 2022, Monthly Notices of the Royal Astronomical Society, 514, 3366
  • Marsden & Sekanina (1974) Marsden B. G., Sekanina Z., 1974, AJ, 79, 413
  • Maté et al. (2008) Maté B., Gálvez O., Martín-Llorente B., Moreno M. A., Herrero V. J., Escribano R., Artacho E., 2008, Journal of Physical chemistry A, 112, 457
  • Mazzotta Epifani et al. (2008) Mazzotta Epifani E., Palumbo P., Capria M. T., Cremonese G., Fulle M., Colangeli L., 2008, Monthly Notices of the Royal Astronomical Society, 390, 265
  • McKay et al. (2015) McKay A. J., et al., 2015, Icarus, 250, 504
  • McKay et al. (2019) McKay A. J., et al., 2019, The Astronomical Journal, 158, 128
  • McKinney (2010) McKinney W., 2010, in Python in Science Conference. Austin, Texas, pp 56–61, doi:10.25080/Majora-92bf1922-00a
  • Mehta (2021) Mehta V., 2021, camelot, https://pypi.org/project/camelot-py/
  • Mommert et al. (2019) Mommert M., et al., 2019, Journal of Open Source Software, 4, 1426
  • Morbidelli & Rickman (2015) Morbidelli A., Rickman H., 2015, Astronomy & Astrophysics, 583, A43
  • Moulane et al. (2018) Moulane Y., Jehin E., Opitom C., Pozuelos F. J., Manfroid J., Benkhaldoun Z., Daassou A., Gillon M., 2018, Astronomy & Astrophysics, 619, A156
  • Mumma et al. (2005) Mumma M. J., et al., 2005, Science, 310, 270
  • Nagdimunov (2021) Nagdimunov L., 2021, pds4-tools, https://pypi.org/project/pds4-tools/
  • Ootsubo et al. (2012) Ootsubo T., et al., 2012, The Astrophysical Journal, 752, 15
  • Opitom et al. (2016) Opitom C., et al., 2016, Astronomy & Astrophysics, 589, A8
  • Opitom et al. (2019) Opitom C., et al., 2019, A&A, 624, A64
  • Osip et al. (2003) Osip D. J., A’Hearn M., Raugh A. C., 2003, Lowell Observatory Cometary Database - Production Rates, doi:10.26007/0A3F-R875
  • Paganini et al. (2012) Paganini L., Mumma M. J., Villanueva G. L., DiSanti M. A., Bonev B. P., Lippi M., Boehnhardt H., 2012, The Astrophysical Journal, 748, L13
  • Parhi & Prialnik (2023) Parhi A., Prialnik D., 2023, MNRAS, 522, 2081
  • Pittichová et al. (2008) Pittichová J., Woodward C. E., Kelley M. S., Reach W. T., 2008, The Astronomical Journal, 136, 1127
  • Prialnik & Bar-Nun (1987) Prialnik D., Bar-Nun A., 1987, ApJ, 313, 893
  • Prialnik & Bar-Nun (1992) Prialnik D., Bar-Nun A., 1992, A&A, 258, L9
  • Prialnik et al. (1987) Prialnik D., Bar-Nun A., Podolak M., 1987, The Astrophysical Journal, 319, 993
  • Reach et al. (2013) Reach W. T., Kelley M. S., Vaubaillon J., 2013, Icarus, 226, 777
  • Rosser et al. (2018) Rosser J. D., et al., 2018, The Astronomical Journal, 155, 164
  • Roth et al. (2018) Roth N. X., Gibb E. L., Bonev B. P., DiSanti M. A., Dello Russo N., Vervack Jr. R. J., McKay A. J., Kawakita H., 2018, The Astronomical Journal, 156, 251
  • Roth et al. (2020) Roth N. X., et al., 2020, The Astronomical Journal, 159, 42
  • Rotundi et al. (2015) Rotundi A., et al., 2015, Science, 347, 3905
  • Rubin et al. (2015) Rubin M., et al., 2015, Science, 348, 232
  • Rubin et al. (2019) Rubin M., et al., 2019, Monthly Notices of the Royal Astronomical Society, 489, 594
  • Rubin et al. (2023) Rubin M., et al., 2023, Monthly Notices of the Royal Astronomical Society, 526, 4209
  • Schleicher (2008) Schleicher D. G., 2008, The Astronomical Journal, 136, 2204
  • Schuller & Struve (1930) Schuller Fr., Struve G., 1930, International Astronomical Union Circular, 288, 2
  • Schweighart et al. (2021) Schweighart M., Macher W., Kargl G., Gundlach B., Capelo H. L., 2021, Monthly Notices of the Royal Astronomical Society, 504, 5513
  • Scotti (1994) Scotti J. V., 1994, in American Astronomical Society Meeting Abstracts. p. 43.06
  • Sekanina et al. (2004) Sekanina Z., Brownlee D. E., Economou T. E., Tuzzolino A. J., Green S. F., 2004, Science, 304, 1769
  • Simon et al. (2023) Simon A., Rajappan M., Oberg K., 2023, ApJ, 955
  • Tancredi et al. (2000) Tancredi G., Fernández J. A., Rickman H., Licandro J., 2000, Astronomy and Astrophysics Supplement Series, 146, 73
  • Thomas et al. (2013a) Thomas P., et al., 2013a, Icarus, 222, 453
  • Thomas et al. (2013b) Thomas P. C., et al., 2013b, Icarus, 222, 550
  • Villanueva et al. (2012) Villanueva G., Mumma M., DiSanti M., Bonev B., Paganini L., Blake G., 2012, Icarus, 220, 291
  • Virtanen et al. (2020) Virtanen P., et al., 2020, Nature Methods, 17, 261
  • Weaver et al. (2011) Weaver H. A., Feldman P. D., A’Hearn M. F., Russo N. D., Stern S. A., 2011, The Astrophysical Journal Letters, 734, L5
  • Weissman et al. (2008) Weissman P. R., Choi Y. J., Lowry S. C., 2008, in AAS/Division for Planetary Sciences Meeting Abstracts #40. p. 2.03
  • Womack et al. (2017) Womack M., Sarid G., Wierzchos K., 2017, Publications of the Astronomical Society of the Pacific, 129, 031001
  • Woodward et al. (2021) Woodward C. E., Wooden D. H., Harker D. E., Kelley M. S. P., Russell R. W., Kim D. L., 2021, The Planetary Science Journal, 2, 25

Supporting Information

Supplementary data are available at MNRAS online.

Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Appendix A Comet Radius Sources

| Source | Number | Method |
|---|---|---|
| Lamy et al. (2004) | 22 | Compilation (Photometric/Thermal) |
| Bauer et al. (2017) | 20 | Thermal |
| Fernández et al. (2013) | 15 | Thermal |
| Tancredi et al. (2000) | 8 | Photometric |
| Lamy et al. (2009) | 7 | Photometric |
| Lamy et al. (2011) | 2 | Photometric |
| Rosser et al. (2018) | 2 | Thermal |
| Scotti (1994) | 1 | Photometric |
| Weissman et al. (2008) | 1 | Photometric |
| Boehnhardt et al. (1999) | 1 | Photometric |
| Boehnhardt et al. (2008) | 1 | Photometric |
| Boehnhardt et al. (2002) | 1 | Photometric |
| Eisner et al. (2019) | 1 | Photometric |
| Mazzotta Epifani et al. (2008) | 1 | Photometric |
| Harmon et al. (2008) | 1 | Radar |
| Harmon et al. (1997) | 1 | Radar |
| Lejoly et al. (2022) | 1 | Radar |
| Buratti et al. (2004) | 1 | Spacecraft |
| Farnham et al. (2017) | 1 | Spacecraft |
| Jorda et al. (2016) | 1 | Spacecraft |
| Sekanina et al. (2004) | 1 | Spacecraft |
| Thomas et al. (2013a) | 1 | Spacecraft |
| Thomas et al. (2013b) | 1 | Spacecraft |
| J. Bauer (unpubl. data) | 1 | Thermal |
| Boissier et al. (2013) | 1 | Thermal |
| Fomenkova et al. (1995) | 1 | Thermal |
| Groussin et al. (2010) | 1 | Thermal |
| Pittichová et al. (2008) | 1 | Thermal |
Table 5: Table summarising the sources used for comet radii in the combined composition-size dataset. For each source we state the number of comet size measurements used in this study and also the general method by which these sizes were obtained. These counts include the objects with only limits on size.

In general it was a simple procedure to apply our guidelines for comet size selection from a range of literature sources (Section 2.1.6). However, for a small number of objects, conflicting measurements or unusual circumstances complicated the size selection. We describe these cases here.

For comets 7P, 17P, 37P, 64P, 109P and 116P we rejected smaller nucleus sizes measured by photometric methods in favour of larger sizes from thermal observations (primarily from Fernández et al., 2013; Bauer et al., 2017). With the exception of the thermal size of 109P by Fomenkova et al. (1995), these measurements made use of more modern data than the earlier photometric observations. Our selected sources also provided uncertainties on the nucleus size, whereas this was not always the case for the photometric measurements (most of which were from Lamy et al., 2004). Often the lower photometric nucleus size was consistent with the larger thermal estimate when the uncertainties were taken into account. Furthermore, for most of these objects our selected size is the same as that selected by the literature compilation of Knight et al. (2023).

Likewise, C/2009 P1 has a radius measurement of $r = 13.5 \pm 2.5\ \mathrm{km}$ (Bauer et al., 2017), which is in conflict with an upper limit of $r < 5.6\ \mathrm{km}$ from a non-detection with IRAM on 04/03/2012 by Boissier et al. (2013). These measurements could be explained if the nucleus of C/2009 P1 is elongated and presented a smaller cross-section during the IRAM observations. As such, we combine the two measurements into a single size estimate by taking the mean and using the range to define the uncertainty, such that $r = 9.6 \pm 4\ \mathrm{km}$.

A thermal size of $r = 2.465 \pm 0.135\ \mathrm{km}$ was measured for comet 10P from NEOWISE data (Bauer et al., 2017); however, this is smaller than a photometric size of $r = 5.98 \pm 0.04\ \mathrm{km}$ from HST observations (Lamy et al., 2011). It would appear that the comet was active and trailed in the NEOWISE data, which may have led to a smaller size estimate (R. Kokotanekova, personal communication). Therefore we followed Knight et al. (2023) in selecting the larger photometric size.

The nucleus of 45P was imaged by radar and found to have a radius in the range $r = 0.6$–$0.65\ \mathrm{km}$ (DiSanti et al., 2017; Lejoly et al., 2022). In order to include this object in our analysis we used the midpoint of this range for the radius and the lower/upper bounds for the uncertainty, resulting in $r = 0.625 \pm 0.025\ \mathrm{km}$. In addition, a personal communication mentioned in Lejoly et al. (2022) describes a radar diameter of $1.4\ \mathrm{km}$ for 46P. Although we consider radar measurements to be preferable to other remote observations, we could find no further details of this measurement in the available literature. Therefore we used the radius of $r = 0.56 \pm 0.04\ \mathrm{km}$ from Boehnhardt et al. (2002), which was also selected by Knight et al. (2023).
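The two range-based estimates above (45P and C/2009 P1) can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the quoted values are simply the midpoint and half-width of the interval spanned by the two bounding values; the function name `midpoint_and_half_range` is a hypothetical helper introduced here for illustration and is not taken from the authors' code.

```python
def midpoint_and_half_range(low, high):
    """Return (centre, half-width) of the interval spanned by two values."""
    if low > high:
        low, high = high, low
    centre = 0.5 * (low + high)
    half_width = 0.5 * (high - low)
    return centre, half_width


# Radar radius range quoted for 45P, in km
r_45p, sigma_45p = midpoint_and_half_range(0.6, 0.65)

# C/2009 P1: IRAM upper limit (5.6 km) and thermal radius (13.5 km)
r_p1, sigma_p1 = midpoint_and_half_range(5.6, 13.5)

print(f"45P:       r = {r_45p:.3f} +/- {sigma_45p:.3f} km")
print(f"C/2009 P1: r = {r_p1:.2f} +/- {sigma_p1:.2f} km")
```

Under this assumption the C/2009 P1 result is 9.55 ± 3.95 km before rounding, consistent with the adopted $r = 9.6 \pm 4\ \mathrm{km}$.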

In the compilation of Lis et al. (2019) the nucleus radius of comet 73P is given as $r = 1.10 \pm 0.03\ \mathrm{km}$. This measurement was ultimately derived from photometric observations by Boehnhardt et al. (1999), which were made before the fragmentation of the comet nucleus in 1995. However, that work clearly shows that comet 73P was active at the time of these observations, and Boehnhardt et al. (1999) stated that the nuclear radius of 73P must be $< 1.1\ \mathrm{km}$. We did not account for size limits in our methods, and given the likelihood of an earlier fragmentation event (Schuller & Struve, 1930) this object was not included in our analysis.

In Table 5 we present a summary of the different sources used to obtain comet radii for this analysis. Several sources present comet radii without formal uncertainties; however, there is a clear power-law relation between radius and the associated uncertainty (Figure 8). We use this relation to assign approximate uncertainties to those radius measurements in the compiled dataset that lack them.

Figure 8: Relation between the reported comet nucleus radius and associated uncertainty from the sources searched in this work. We have fit a linear relation in log-log space to allow us to assign approximate uncertainties to radius measurements missing them.
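A linear fit in log-log space, as described for Figure 8, is equivalent to fitting a power law $\sigma_r = 10^{b}\, r^{a}$. The sketch below illustrates the idea with an ordinary least-squares fit written in plain Python. The input values are synthetic and purely illustrative (they are not the measurements behind Figure 8), and the function names are hypothetical.

```python
import math


def fit_power_law(radii, sigmas):
    """Least-squares fit of log10(sigma) = a * log10(r) + b."""
    x = [math.log10(r) for r in radii]
    y = [math.log10(s) for s in sigmas]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_sigma(radius, a, b):
    """Approximate uncertainty for a radius that was published without one."""
    return 10 ** (a * math.log10(radius) + b)


# Synthetic (radius, uncertainty) pairs in km, constructed so sigma = 0.1 * r
radii = [0.5, 1.0, 2.0, 4.0, 8.0]
sigmas = [0.05, 0.1, 0.2, 0.4, 0.8]

a, b = fit_power_law(radii, sigmas)
sigma_new = predict_sigma(3.0, a, b)
print(f"slope a = {a:.3f}, intercept b = {b:.3f}, sigma(3 km) = {sigma_new:.3f} km")
```

Fitting in log space weights fractional rather than absolute deviations, which suits radii that span more than an order of magnitude.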

Appendix B Composition - Size Data Table

Here we present a sample of the complete dataset used in this study, compiled as described in Section 2. Each row contains a single abundance measurement of species X for a particular comet, where abundance is given with respect to either H2O or CN. The circumstances of the compositional observation are provided, and the literature sources of both the composition and size measurement are stated. The dataset contains 909 unique species measurements for 96 unique comets with sizes; this includes measurements of composition/size with limits and comets with a known fragmentation history. In our analysis we rejected limits and split comets and were left with 710 composition measurements for 69 comets. Table 6 gives a sample of selected rows and columns from the full dataset, which is available online as Supplementary data and at this link.

| Type | Designation | Number | Name | Date (MJD) | $r_h$ (au) | X | X/H2O | $\sigma_{\mathrm{X/H2O}}$ | Composition Source | $r$ (km) | $\sigma_r$ (km) | Radius Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P | | 49 | Arend-Rigaux | 46035.0 | 1.56 | Afρ | 5.881e-26 | 4.7e-27 | A’Hearn et al. (1995) | 3.21 | 0.37 | Bauer et al. (2017) |
| P | | 49 | Arend-Rigaux | 46035.0 | 1.56 | C2 | 1.817e-03 | 1.8e-04 | A’Hearn et al. (1995) | 3.21 | 0.37 | Bauer et al. (2017) |
| P | | 49 | Arend-Rigaux | 46035.0 | 1.56 | C3 | 2.567e-04 | 2.6e-05 | A’Hearn et al. (1995) | 3.21 | 0.37 | Bauer et al. (2017) |
| P | | 49 | Arend-Rigaux | 46035.0 | 1.56 | CN | 2.087e-03 | 1.3e-04 | A’Hearn et al. (1995) | 3.21 | 0.37 | Bauer et al. (2017) |
| P | | 49 | Arend-Rigaux | 46035.0 | 1.56 | NH | 1.620e-03 | 2.6e-03 | A’Hearn et al. (1995) | 3.21 | 0.37 | Bauer et al. (2017) |
| P | | 59 | Kearns-Kwee | 45599.0 | 1.00 | C2 | 3.626e-03 | 3.1e-03 | Cochran et al. (2012) | 0.79 | 0.03 | Lamy et al. (2009) |
| P | | 59 | Kearns-Kwee | 45599.0 | 1.00 | C3 | 4.461e-04 | 4.0e-04 | Cochran et al. (2012) | 0.79 | 0.03 | Lamy et al. (2009) |
| P | | 59 | Kearns-Kwee | 45599.0 | 1.00 | NH | 1.046e-02 | 7.0e-03 | Cochran et al. (2012) | 0.79 | 0.03 | Lamy et al. (2009) |
| P | | 59 | Kearns-Kwee | 46578.0 | 2.23 | Afρ | 3.710e-26 | 3.3e-27 | A’Hearn et al. (1995) | 0.79 | 0.03 | Lamy et al. (2009) |
| P | | 59 | Kearns-Kwee | 46578.0 | 2.23 | CN | 1.735e-03 | 1.7e-04 | A’Hearn et al. (1995) | 0.79 | 0.03 | Lamy et al. (2009) |
| P | | 65 | Gunn | 46547.0 | 2.64 | Afρ | 9.537e-27 | 1.7e-27 | A’Hearn et al. (1995) | 4.80 | 1.02 | Bauer et al. (2017) |
| P | | 65 | Gunn | 46547.0 | 2.64 | C2 | 1.735e-04 | 4.7e-05 | A’Hearn et al. (1995) | 4.80 | 1.02 | Bauer et al. (2017) |
| P | | 65 | Gunn | 46547.0 | 2.64 | C3 | 2.135e-04 | 1.1e-04 | A’Hearn et al. (1995) | 4.80 | 1.02 | Bauer et al. (2017) |
| P | | 65 | Gunn | 46547.0 | 2.64 | CN | 4.780e-04 | 8.6e-05 | A’Hearn et al. (1995) | 4.80 | 1.02 | Bauer et al. (2017) |
| P | | 65 | Gunn | 46547.0 | 2.64 | NH | 8.698e-04 | 3.3e-03 | A’Hearn et al. (1995) | 4.80 | 1.02 | Bauer et al. (2017) |
| P | | 88 | Howell | 44728.0 | 2.09 | Afρ | 5.363e-26 | 4.3e-27 | A’Hearn et al. (1995) | 1.00 | | Tancredi et al. (2000) |
| P | | 88 | Howell | 44728.0 | 2.09 | C2 | 2.396e-03 | 2.4e-04 | A’Hearn et al. (1995) | 1.00 | | Tancredi et al. (2000) |
| P | | 88 | Howell | 44728.0 | 2.09 | C3 | 1.583e-04 | 1.6e-05 | A’Hearn et al. (1995) | 1.00 | | Tancredi et al. (2000) |
| P | | 88 | Howell | 44728.0 | 2.09 | CN | 2.880e-03 | 2.3e-04 | A’Hearn et al. (1995) | 1.00 | | Tancredi et al. (2000) |
| P | | 88 | Howell | 55015.1 | 1.74 | CO2 | 2.495e-01 | 5.0e-02 | Ootsubo et al. (2012) | 1.00 | | Tancredi et al. (2000) |
| C | 1983 J1 | | Sugano-Saigusa-Fujikawa | 45455.0 | 0.74 | Afρ | 3.797e-27 | 1.9e-28 | A’Hearn et al. (1995) | 0.37 | | Lamy et al. (2004) |
| C | 1983 J1 | | Sugano-Saigusa-Fujikawa | 45455.0 | 0.74 | C2 | 5.881e-03 | 1.8e-04 | A’Hearn et al. (1995) | 0.37 | | Lamy et al. (2004) |
| C | 1983 J1 | | Sugano-Saigusa-Fujikawa | 45455.0 | 0.74 | C3 | 6.018e-05 | 3.6e-06 | A’Hearn et al. (1995) | 0.37 | | Lamy et al. (2004) |
| C | 1983 J1 | | Sugano-Saigusa-Fujikawa | 45455.0 | 0.74 | CN | 2.947e-03 | 8.8e-05 | A’Hearn et al. (1995) | 0.37 | | Lamy et al. (2004) |
| C | 1983 J1 | | Sugano-Saigusa-Fujikawa | 45455.0 | 0.74 | NH | 2.627e-03 | 1.8e-04 | A’Hearn et al. (1995) | 0.37 | | Lamy et al. (2004) |
| C | 2006 W3 | | Christensen | 54909.9 | 3.40 | CO2 | 7.204e-01 | 1.5e-01 | Ootsubo et al. (2012) | 21.88 | 4.20 | Bauer et al. (2017) |
| C | 2006 W3 | | Christensen | 54909.9 | 3.40 | CO | 2.296e+00 | 4.6e-01 | Ootsubo et al. (2012) | 21.88 | 4.20 | Bauer et al. (2017) |
| C | 2006 W3 | | Christensen | 55073.3 | 3.22 | CH3OH | 3.355e-02 | 6.7e-03 | Bockelée-Morvan et al. (2010) | 21.88 | 4.20 | Bauer et al. (2017) |
| C | 2006 W3 | | Christensen | 55073.3 | 3.22 | CS | 1.118e-03 | 4.5e-04 | Bockelée-Morvan et al. (2010) | 21.88 | 4.20 | Bauer et al. (2017) |
| C | 2006 W3 | | Christensen | 55073.3 | 3.22 | H2S | 2.237e-02 | 2.2e-03 | Bockelée-Morvan et al. (2010) | 21.88 | 4.20 | Bauer et al. (2017) |
| C | 2006 W3 | | Christensen | 55073.3 | 3.22 | HCN | 3.579e-03 | 8.5e-04 | Bockelée-Morvan et al. (2010) | 21.88 | 4.20 | Bauer et al. (2017) |
Table 6: A sample of selected rows and columns from the comet composition-size dataset used in this work. The columns presented are the comet identifiers (type, designation and number), details of the compositional measurement (date and heliocentric distance $r_h$) and the abundance of species X relative to water (X/H2O and the corresponding uncertainty $\sigma_{\mathrm{X/H2O}}$ if available) with the source of the compositional measurement. Size information is provided as radius, $r$, with uncertainty $\sigma_r$ (if available) alongside the literature source of the measurement. The full table with all rows and additional columns is available online as Supplementary data and at this link.

Appendix C Additional composition data

This appendix presents additional figures of composition against nucleus radius for the daughter species in our comet sample, complementing the parent species considered in the main analysis. Figure 9 shows the daughter species abundance (relative to H2O) as a function of comet radius. The full results of the Pearson correlation tests for each daughter species, and the dynamical sub-populations in the dataset, are provided in Table 7.

In addition we have considered the daughter species abundance relative to CN instead of H2O (Figure 10). The results of the Pearson correlation analysis for this dataset are provided in Table 8. We note that a moderately significant correlation for CH/CN is visible in these data, while no correlation was seen for CH/H2O. However, this is likely a small-number statistics effect, as we have only a handful of measurements for CH/H2O.

Figure 9: Log scale plots showing the relation between comet composition (of various daughter species relative to H2O) and radius of the nucleus. The meanings of the markers and lines are the same as Figure 1 and the corresponding Pearson correlation coefficients are provided in Table 7.
| Species | Number (Ecliptic Comets) | Correlation (Ecliptic Comets) | $p$-value (Ecliptic Comets) | Number (Nearly Isotropic Comets) | Correlation (Nearly Isotropic Comets) | $p$-value (Nearly Isotropic Comets) | Number (All Comets) | Correlation (All Comets) | $p$-value (All Comets) |
|---|---|---|---|---|---|---|---|---|---|
| Afρ/H2O | 38 | 0.2028 | 0.2221 | 10 | 0.6396 | 0.0464 | 48 | 0.1885 | 0.1994 |
| C2/H2O | 40 | 0.0209 | 0.8981 | 11 | 0.0345 | 0.9198 | 51 | 0.1114 | 0.4366 |
| C3/H2O | 35 | -0.0829 | 0.6360 | 11 | 0.6139 | 0.0445 | 46 | 0.0995 | 0.5107 |
| CH/H2O | 7 | 0.1965 | 0.6728 | 2 | -1.0000 | 1.0000 | 9 | 0.3612 | 0.3395 |
| CN/H2O | 38 | -0.0668 | 0.6902 | 11 | -0.0737 | 0.8294 | 49 | -0.0819 | 0.5757 |
| CS/H2O | 6 | 0.7663 | 0.0755 | 3 | -0.1783 | 0.8859 | 9 | 0.6955 | 0.0375 |
| NH2/H2O | 17 | -0.0381 | 0.8846 | 5 | -0.4985 | 0.3926 | 22 | -0.0758 | 0.7373 |
| NH/H2O | 32 | -0.3465 | 0.0520 | 11 | -0.0019 | 0.9955 | 43 | -0.2190 | 0.1583 |
Table 7: Table showing the Pearson test results for daughter species abundance relative to H2O, including the number of comets tested, correlation coefficients and associated $p$-values. Similar to Table 2, results are shown for the ecliptic comets, nearly isotropic comets and all objects when considered together.
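The correlation coefficients in Table 7 measure how strongly abundance varies with nucleus radius. The sketch below computes a Pearson coefficient by hand, assuming the test is applied to the logarithms of radius and abundance (the figures use log scales, but this is an assumption made for illustration). The $p$-value here comes from a simple permutation test, which is a substitute for whatever analytic Pearson $p$-value the tabulated results use. The data values are hypothetical and are not drawn from the dataset.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: fraction of shuffles with |r| at least as large."""
    rng = random.Random(seed)
    r_obs = abs(pearson_r(x, y))
    shuffled = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= r_obs - 1e-12:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical radii (km) and abundance ratios, for illustration only
radius_km = [0.6, 1.0, 2.0, 3.2, 4.8, 10.0, 21.9]
abundance = [0.004, 0.01, 0.008, 0.03, 0.05, 0.09, 0.4]

log_r = [math.log10(v) for v in radius_km]
log_a = [math.log10(v) for v in abundance]

r_value = pearson_r(log_r, log_a)
p_value = permutation_p_value(log_r, log_a)
print(f"r = {r_value:.3f}, permutation p = {p_value:.4f}")
```

With only two points the coefficient is always exactly +1 or -1 and carries no information, which is why the CH/H2O entry for nearly isotropic comets (two objects) shows a correlation of -1.0000 alongside a $p$-value of 1.0000.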
Figure 10: Log scale plots showing the relation between comet composition (of various daughter species relative to CN) and radius of the nucleus. Marker shape, colour and plotted lines have the same meanings as Figure 1. The results of the Pearson correlation tests are provided in Table 8.
| Species | Number (Ecliptic Comets) | Correlation (Ecliptic Comets) | $p$-value (Ecliptic Comets) | Number (Nearly Isotropic Comets) | Correlation (Nearly Isotropic Comets) | $p$-value (Nearly Isotropic Comets) | Number (All Comets) | Correlation (All Comets) | $p$-value (All Comets) |
|---|---|---|---|---|---|---|---|---|---|
| Afρ/CN | 38 | 0.2307 | 0.1636 | 11 | 0.3078 | 0.3571 | 49 | 0.2826 | 0.0491 |
| C2/CN | 45 | 0.1089 | 0.4763 | 11 | 0.2958 | 0.3771 | 56 | 0.2203 | 0.1028 |
| C3/CN | 42 | -0.0025 | 0.9872 | 11 | 0.2747 | 0.4136 | 53 | 0.1623 | 0.2456 |
| CH/CN | 14 | -0.0085 | 0.9771 | 7 | 0.4812 | 0.2743 | 21 | 0.3369 | 0.1354 |
| NH2/CN | 21 | -0.2368 | 0.3014 | 9 | -0.1134 | 0.7715 | 30 | 0.0669 | 0.7254 |
| NH/CN | 32 | -0.2019 | 0.2678 | 10 | 0.0736 | 0.8399 | 42 | -0.0413 | 0.7953 |
| OH/CN | 13 | -0.2573 | 0.3960 | 5 | -0.2853 | 0.6418 | 18 | 0.0146 | 0.9541 |
Table 8: Table showing species abundance relative to CN, number of comets for which we have an abundance measurement, Pearson correlation coefficients for the abundance vs nucleus size and associated $p$-values. Results are shown for the ecliptic comets (EC), nearly isotropic comets (NIC) and all objects for which a radius and composition estimate are available. Similar to Table 2 the strong, moderate and marginal significance correlations are highlighted.

Appendix D CO/H2O data

In Table 9 we present the complete data for CO/H2O abundance ratios and sizes of the comets used in our correlation analysis and the detailed discussion in Section 4.1. In this subset of the full composition-size dataset (a sample of which is presented in Table 6) we have already rejected measurements that are only limits, and any comets with a known fragmentation history prior to measurement. We followed the methodology described in Section 2.2.2 to select a specific source for the abundance ratio when there were multiple measurements available. These steps preferentially selected the source with the largest number of measurements for unique comets and species. We did this to try to compile as homogeneous a dataset as possible given the range of literature sources available.
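The filtering and source-selection steps described above can be sketched as follows. This is one possible reading of the procedure rather than the authors' implementation: the full rules are given in Section 2.2.2, and here we assume only that limits and fragmented comets are dropped first and that, for each comet and species, the source covering the most unique comet-species pairs wins. All field names, source names and values below are hypothetical.

```python
from collections import defaultdict


def select_measurements(rows):
    """Pick one measurement per (comet, species) from the largest source."""
    # Drop upper/lower limits and comets flagged as previously fragmented
    usable = [r for r in rows if not r["is_limit"] and not r["fragmented"]]

    # Coverage of each source: unique (comet, species) pairs it provides
    coverage = defaultdict(set)
    for r in usable:
        coverage[r["source"]].add((r["comet"], r["species"]))

    selected = {}
    for r in usable:
        key = (r["comet"], r["species"])
        current = selected.get(key)
        if current is None or (
            len(coverage[r["source"]]) > len(coverage[current["source"]])
        ):
            selected[key] = r
    return selected


def row(comet, source, value, is_limit=False, fragmented=False):
    """Build a hypothetical CO/H2O measurement record."""
    return {"comet": comet, "species": "CO", "source": source,
            "value": value, "is_limit": is_limit, "fragmented": fragmented}


rows = [
    row("Comet A", "Survey 1", 0.05),
    row("Comet A", "Paper 2", 0.06),
    row("Comet B", "Survey 1", 0.02),
    row("Comet C", "Paper 2", 0.10, is_limit=True),
    row("Comet D", "Paper 3", 0.20, fragmented=True),
]

selected = select_measurements(rows)
for (comet, species), r in sorted(selected.items()):
    print(comet, species, r["source"], r["value"])
```

Preferring the largest source keeps as many values as possible on a common reduction and calibration, at the cost of sometimes passing over a more recent single-object measurement.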

Most of the CO/H2O abundance ratios were selected from the large-scale study by Dello Russo et al. (2016). This work is a compilation of the abundance of 8 volatile molecules for 30 comets, determined from a database of high-resolution infrared spectroscopy taken from 1997 to 2013 using a variety of telescopes and instruments. In our methodology this source was frequently selected due to its large size and the number of different species measured in a consistent manner, thus helping to increase the consistency across our literature-compiled dataset. Furthermore, we note that for many comets with multiple sources the abundance ratios are of a similar value to that of Dello Russo et al. (2016). We repeated our analysis while selecting abundances from different sources and found that the variation in the logarithm of the abundance was small, so there was little change to the trends discussed in Section 3.

Within the dataset of Dello Russo et al. (2016) we highlight some notable abundance ratios. The observations from which CO/H2O was determined for 9P were taken shortly after the collision of the Deep Impact spacecraft with the nucleus of 9P on 04/07/2005. There were no pre-impact measurements of the CO abundance for direct comparison; however, Biver et al. (2007) did not observe significant changes in the abundance of HCN and only a possible increase for CH3OH. Likewise, Mumma et al. (2005) found no changes in the abundances of HCN and CH3OH, but they did observe a significant increase for C2H6. For comet 9P, measurements of CO/H2O were also available from Biver et al. (2007), Lippi et al. (2021) and the literature compilation of Harrington Pinto et al. (2022). However, these sources either published upper limits on the abundance ratio or did not include uncertainties; therefore the value from Dello Russo et al. (2016) was selected. Likewise, for comet 103P additional measurements of CO/H2O are available from Harrington Pinto et al. (2022), but with no uncertainty, and from Lippi et al. (2021). However, the latter measurement is identical to that of Dello Russo et al. (2016), as they both obtained this value from UV spectroscopic observation with HST at the time of the NASA EPOXI flyby of 04/11/2010 (Weaver et al., 2011, the only non-IR measurements included in this work). In any case, following our methodology the measurement was selected from Dello Russo et al. (2016) as it was the larger study.

The hyperbolic comet C/2009 P1 demonstrated unusual behaviour in the observed production rates of \chCO during its perihelion passage of December 2011. For most comets volatile production rates are expected to peak around perihelion and then decrease. This was the case for the production of \chH2O by C/2009 P1; however, the observed production of \chCO continued to increase past the perihelion passage (see Figure 9 of Feaga et al., 2013). This resulted in a large variation of the measured \chCO/\chH2O abundance across the perihelion passage; as such we assessed the available literature in an attempt to determine a suitable value of \chCO/\chH2O for our investigation. The \chCO/\chH2O abundance of C/2009 P1 as presented in Dello Russo et al. (2016) is the weighted mean of abundances from the following sources: Paganini et al. (2012), Villanueva et al. (2012), DiSanti et al. (2014), and McKay et al. (2015). In addition we retrieved the abundance ratios presented by Gicquel et al. (2015). Furthermore, the largest abundance ratio for this comet, \chCO/\chH2O = $0.630\pm 0.206$, was derived by Feaga et al. (2013) from remote observations from the Deep Impact Flyby spacecraft when C/2009 P1 was at $r_{h}=2.00-2.06\ \mathrm{au}$ (abundance value and uncertainty retrieved from Harrington Pinto et al., 2022). For consistency with our methodology we excluded measurements with $r_{h}\geq 2\ \mathrm{au}$ and took the mean abundance, getting a similar value to the composition presented in Dello Russo et al. (2016): \chCO/\chH2O = $0.084\pm 0.076$, where we reflect the large variation in abundance by assigning an uncertainty derived from the range of measured values, that is, $\pm$ (max(\chCO/\chH2O) - min(\chCO/\chH2O)) / 2. It should be noted that we repeated our analysis using the much larger estimate of Feaga et al. (2013) and we found no significant changes in the overall strength of the composition - size correlation presented in Table 3. This is in line with the bootstrap/jack-knife resampling tests described in Section 3.2, which demonstrated that the correlation for this dataset does not depend strongly on any one object.
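
The selection and uncertainty rule described above for C/2009 P1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for this work: the function name `mean_with_half_range`, the `(r_h, ratio)` tuple layout and the numbers in the usage example are hypothetical placeholders, not measurements taken from the literature. An unweighted mean is assumed.

```python
from statistics import mean


def mean_with_half_range(measurements, rh_max=2.0):
    """Mean abundance of measurements with r_h < rh_max, plus a half-range error.

    measurements: iterable of (r_h in au, CO/H2O ratio) pairs (assumed layout).
    Returns (mean_ratio, sigma) with sigma = (max - min) / 2 of the kept ratios.
    """
    # Apply the heliocentric distance cut: drop anything at r_h >= rh_max.
    kept = [ratio for r_h, ratio in measurements if r_h < rh_max]
    if not kept:
        raise ValueError("no measurements inside the heliocentric distance cut")
    # Unweighted mean (an assumption) and half of the measured range.
    return mean(kept), (max(kept) - min(kept)) / 2


# Placeholder values for illustration only (NOT real measurements).
example = [(1.6, 0.02), (1.8, 0.10), (2.03, 0.60)]
value, sigma = mean_with_half_range(example)
print(f"CO/H2O = {value:.3f} +/- {sigma:.3f}")
```

The third placeholder entry lies beyond the cut and is excluded, so the result is driven only by the two inner measurements.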

Abundance ratios for 29P, C/2006 W3 and C/2008 Q3 were selected from Ootsubo et al. (2012), a survey of \chCO, \chCO2 and \chH2O for 18 comets using NIR spectroscopy from the AKARI spacecraft. We note that the observations for C/2006 W3 and 29P were taken at large heliocentric distances of $r_{h}>3$ and $6\ \mathrm{au}$, respectively. This is much greater than the typical $r_{h}\approx 1-2\ \mathrm{au}$ for other comets in the \chCO dataset and may be another explanation for the higher than average abundance of \chCO relative to the less volatile \chH2O. We attempt to address this in our analysis using the additional tests described in Section 3.2.

We selected the \chCO/\chH2O abundance of periodic comet 17P/Holmes from the population study by Lippi et al. (2021) as this was the only source available for this comet. This work reports abundances for 20 comets based on reanalysis of an archive of high resolution infrared spectroscopy from NIRSPEC at the Keck Observatory.

In addition to the large-scale surveys described above we searched for literature describing the composition of individual comets. C/2020 F3 was observed by Biver et al. (2022b) with IRAM/NOEMA in July/August 2020, under generally poor weather conditions in both runs, which limited the detection of more complex molecules. We note that there were relatively few observations of this comet, presumably due to pandemic restrictions during its apparition, although similar abundances were also measured by Faggi et al. (2021). C/2020 F3 has a low \chCO/\chH2O abundance ratio compared to other comets; in the IRAM observations \chCO was only marginally detected. The reference water production rates were derived from interpolation of SOHO-SWAN observations of Lyman-$\alpha$ hydrogen emission (Combi et al., 2021) and observations of the 18 cm \chOH line at the Green Bank Telescope and Nançay Radio Telescope (Drozdovskaya et al., 2023). For comet 2P Encke, the \chCO/\chH2O abundance was measured during its 2017 apparition by Roth et al. (2018) using iSHELL at IRTF. These observations were made shortly after perihelion passage under favourable conditions, with 2P at a geocentric distance of only $\sim 0.75\ \mathrm{au}$. This allowed the detection of the hyper-volatiles \chCO and \chCH4, which are usually difficult to measure for ecliptic comets from ground-based observations with low geocentric velocities.

67P Churyumov-Gerasimenko was the target of the Rosetta mission and its production rates were measured in situ by the ROSINA mass spectrometer instrument in May 2015 (Rubin et al., 2019). We selected these in situ measurements as we expect them to be more accurate and precise than remote observations. The Rosetta observations used in this study were taken while 67P was in a period of strong outgassing on the approach to perihelion. These detailed measurements revealed that the abundance ratios of volatile species varied over the course of the mission, related in a complex way to the heliocentric distance, the nucleus spin axis orientation and the position of Rosetta relative to the nucleus. This highlights that instantaneous measurements of abundance ratios may not necessarily reflect the true abundance ratios within the bulk nucleus; however, such detailed analysis is impossible for remotely observed comets.

The remaining abundance ratios for comets 1P and 45P were selected from the literature compilation by Harrington Pinto et al. (2022). This study gathered production rates for \chCO, \chCO2 (and \chH2O where available) for 25 comets from a wide range of published sources using both space and ground-based observations. They selected sources where \chCO and/or \chCO2 production rates were measured contemporaneously with \chH2O, and for some comets they collated multiple measurements of the abundance ratio. Following our methodology we calculated the mean abundance, date and heliocentric distance for each comet to use in our own dataset. However, as the measurements collected by Harrington Pinto et al. (2022) are from multiple sources, we selected abundance ratios from the larger homogeneous studies (e.g. Dello Russo et al., 2016; Ootsubo et al., 2012) where possible.
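
The per-comet averaging step described above (mean abundance, date and heliocentric distance) could look like the following sketch. The record layout `(comet, ratio, mjd, r_h)`, the function name `average_per_comet` and the use of a simple unweighted mean are assumptions made for illustration. The labels and numbers in the usage lines are placeholders, not values from any of the cited compilations.

```python
from collections import defaultdict
from statistics import mean


def average_per_comet(records):
    """Average ratio, date (MJD) and r_h over all records of each comet.

    records: iterable of (comet, ratio, mjd, r_h) tuples (assumed layout).
    Returns {comet: (mean_ratio, mean_mjd, mean_r_h)}.
    """
    grouped = defaultdict(list)
    for comet, ratio, mjd, r_h in records:
        grouped[comet].append((ratio, mjd, r_h))
    # zip(*rows) turns the list of rows into three columns, averaged in turn.
    return {
        comet: tuple(mean(column) for column in zip(*rows))
        for comet, rows in grouped.items()
    }


# Placeholder records for illustration only (NOT real measurements).
records = [
    ("comet A", 0.10, 100.0, 1.0),
    ("comet A", 0.30, 102.0, 2.0),
    ("comet B", 0.50, 50.0, 1.5),
]
averaged = average_per_comet(records)
```

A weighted mean could be substituted where the individual measurements carry uncertainties.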

| Type | Designation | Number | Name | Date (MJD) | $r_{h}$ (au) | \chCO/\chH2O | $\sigma_{\textrm{CO/H2O}}$ | Composition Source | $r$ (km) | $\sigma_{r}$ (km) | Radius Source |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P | | 1 | Halley | 46495.0 | 0.79 | 0.110 | 0.0160 | Harrington Pinto et al. (2022) | 5.50 | 0.53 | Lamy et al. (2004) |
| P | | 2 | Encke | 57834.3 | 0.48 | 0.004 | 0.0004 | Roth et al. (2018) | 2.43 | 0.06 | Boehnhardt et al. (2008) |
| P | | 8 | Tuttle | 54487.7 | 1.05 | 0.004 | 0.0008 | Dello Russo et al. (2016) | 2.25 | 0.50 | Harmon et al. (2008) |
| P | | 9 | Tempel 1 | 53547.5 | 1.52 | 0.043 | 0.0100 | Dello Russo et al. (2016) | 2.83 | 0.10 | Thomas et al. (2013a) |
| P | | 17 | Holmes | 54401.5 | 2.46 | 0.088 | 0.0270 | Lippi et al. (2021) | 2.40 | 0.53 | Bauer et al. (2017) |
| P | | 21 | Giacobini-Zinner | 51455.7 | 1.12 | 0.022 | 0.0150 | Dello Russo et al. (2016) | 1.82 | 0.05 | Pittichová et al. (2008) |
| P | | 29 | Schwassmann-Wachmann 1 | 55153.5 | 6.18 | 4.645 | 1.0187 | Ootsubo et al. (2012) | 23.00 | 6.50 | Bauer et al. (2017) |
| P | | 45 | Honda-Mrkos-Pajdusakova | 57761.0 | 0.56 | 0.005 | 0.0010 | Harrington Pinto et al. (2022) | 0.62 | 0.03 | Lejoly et al. (2022) |
| P | | 67 | Churyumov-Gerasimenko | 57152.0 | 1.66 | 0.031 | 0.0090 | Rubin et al. (2019) | 1.65 | 0.01 | Jorda et al. (2016) |
| P | | 103 | Hartley 2 | 55498.0 | 1.13 | 0.003 | 0.0015 | Dello Russo et al. (2016) | 0.58 | 0.02 | Thomas et al. (2013b) |
| C | 1995 O1 | | Hale-Bopp | 50594.6 | 1.14 | 0.262 | 0.0070 | Dello Russo et al. (2016) | 30.00 | 10.00 | Lamy et al. (2004) |
| C | 2006 W3 | | Christensen | 54909.9 | 3.40 | 2.296 | 0.4648 | Ootsubo et al. (2012) | 21.88 | 4.20 | Bauer et al. (2017) |
| C | 2007 N3 | | Lulin | 54870.9 | 1.31 | 0.022 | 0.0009 | Dello Russo et al. (2016) | 6.10 | 0.25 | Bauer et al. (2017) |
| C | 2008 Q3 | | Garradd | 55018.0 | 1.81 | 0.243 | 0.0494 | Ootsubo et al. (2012) | 3.35 | 0.50 | Bauer et al. (2017) |
| C | 2009 P1 | | Garradd | 55943.5 | 1.71 | 0.084 | 0.0750 | See appendix D | 9.60 | 4.00 | Boissier et al. (2013) & Bauer et al. (2017) |
| C | 2010 G2 | | Hill | 55935.5 | 2.50 | 0.910 | 0.2300 | Dello Russo et al. (2016) | 4.01 | 1.04 | Bauer et al. (2017) |
| C | 2020 F3 | | NEOWISE | 59047.3 | 0.80 | 0.032 | 0.0120 | Biver et al. (2022b) | 2.50 | 0.22 | J. Bauer (unpubl. data) |

Table 9: The \chCO/\chH2O abundance ratio and nucleus radius for each comet used in our analysis. This table is a subset of the full composition - size dataset (which is sampled in Table 6) but is presented here for convenience.
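
As an illustration of how the values in Table 9 can be checked for a composition - size trend, the sketch below computes a rank correlation between nucleus radius and \chCO/\chH2O, then repeats it with each comet left out in turn. Spearman's coefficient and the simple leave-one-out loop are illustrative choices made here, not necessarily the statistics or resampling scheme used for Table 3 and Section 3.2. Measurement uncertainties are ignored, and all function names are hypothetical.

```python
from math import sqrt

# (radius in km, CO/H2O) pairs copied from Table 9, in table order.
TABLE9 = [
    (5.50, 0.110), (2.43, 0.004), (2.25, 0.004), (2.83, 0.043),
    (2.40, 0.088), (1.82, 0.022), (23.00, 4.645), (0.62, 0.005),
    (1.65, 0.031), (0.58, 0.003), (30.00, 0.262), (21.88, 2.296),
    (6.10, 0.022), (3.35, 0.243), (9.60, 0.084), (4.01, 0.910),
    (2.50, 0.032),
]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(pairs):
    """Spearman coefficient, computed as the Pearson correlation of the ranks."""
    xs = average_ranks([p[0] for p in pairs])
    ys = average_ranks([p[1] for p in pairs])
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def leave_one_out(pairs):
    """Recompute the coefficient with each object removed in turn."""
    return [spearman(pairs[:i] + pairs[i + 1:]) for i in range(len(pairs))]


rho_all = spearman(TABLE9)
rho_drop = leave_one_out(TABLE9)
print(f"all comets: rho = {rho_all:.3f}")
print(f"leave-one-out range: {min(rho_drop):.3f} to {max(rho_drop):.3f}")
```

A narrow leave-one-out range would indicate that no single comet dominates the trend, which is the property the resampling tests are designed to probe.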