A link between the size and composition of comets

James E. Robinson,¹*† Uri Malamud,² Cyrielle Opitom,¹ Hagai Perets,² and Jürgen Blum³
¹Institute for Astronomy, University of Edinburgh, Edinburgh EH9 3HJ, UK
²Department of Physics, Technion - Israel Institute of Technology, Technion City, 3200003 Haifa, Israel
³Institute for Geophysics and extraterrestrial Physics, Technische Universität Braunschweig, Mendelssohnstr. 3, D-38106, Braunschweig, Germany
*E-mail: james.robinson@ed.ac.uk
†These authors contributed equally to this work. Robinson: observational data compilation; Malamud: initiative and theory.

(Accepted 2024 March 21. Received 2024 March 7; in original form 2023 October 17)

Abstract

All cometary nuclei that formed in the early Solar System incorporated radionuclides and therefore were subject to internal radiogenic heating. Previous work predicts that if comets have a pebble-pile structure, internal temperature build-up is enhanced due to very low thermal conductivity, leading to internal differentiation. An internal thermal gradient causes widespread sublimation and migration of either ice condensates, or gases released from amorphous ice hosts during their crystallisation. Overall, the models predict that the degree of differentiation and re-distribution of volatile species to a shallower near-surface layer depends primarily on nucleus size. Hence, we hypothesise that cometary activity should reveal a correlation between the abundance of volatile species and the size of the nucleus. To explore this hypothesis we have conducted a thorough literature search for measurements of the composition and size of cometary nuclei, compiling these into a unified database. We report a statistically significant correlation between the measured abundance of CO/H₂O and the size of cometary nuclei. We further recover the measured slope of abundance as a function of size, using a theoretical model based on our previous thermophysical models, invoking re-entrapment of outward migrating high volatility gases in the near-surface pristine amorphous ice layers. This model replicates the observed trend and supports the theory of internal differentiation of cometary nuclei by early radiogenic heating. We make our database available for future studies, and we advocate for collection of more measurements to allow more precise and statistically significant analyses to be conducted in the future.

keywords:

comets: general – astronomical data bases: miscellaneous


1 Introduction

In a recent study of the long-term evolution of comets (Malamud et al., 2022), it was found, by combining results from various empirical laboratory works, that pebble-made comets have an extremely low thermal conductivity. As a result, radiogenic heating allows comets to attain higher temperatures during their evolution than those typically considered in past models. This implies that internal transport of various volatile species is ubiquitous in comets.

Comet nuclei consist of various volatile species. Water and probably also carbon dioxide are currently viewed as primary volatile species, that exist as amorphous solids (Rubin et al., 2023), and can trap other high-volatility species during their incipient formation (Bar-Nun et al., 1985; Simon et al., 2023). We cannot know for certain which high-volatility species exist inside the comet as pure ice condensates, and which ones were co-deposited within the amorphous ice hosts, or in what precise proportion. Regardless, as was envisaged in Malamud et al. (2022), due to the internal temperature gradient in the comet, volatiles must migrate towards the surface upon their direct sublimation or else upon the phase transitions of their amorphous ice hosts. While migrating through an intact matrix of amorphous ice, the volatiles may become sequentially entrapped in the amorphous ice again, even though local temperatures are otherwise still too high for their deposition as pure ices (Bar-Nun et al., 1985; Laufer et al., 1987; Bar-Nun et al., 1987, 1988; Collings et al., 2003; Kumi et al., 2006; Gálvez et al., 2007; Maté et al., 2008; Gálvez et al., 2008; Herrero et al., 2010; Carmack et al., 2023). Amorphous ice hosts can thus become highly enriched in entrapped gases, since they have a very large uptake (storage capacity) of high-volatility species, amounting to a few tens of % of their own mass (see Carmack et al., 2023, and references therein). Even if sequential deposition were to be ignored, hyper- and super-volatile species will still generally flow outwards and eventually freeze according to their respective deposition temperatures, forming an onion-like internal stratification (De Sanctis et al., 2001; Choi et al., 2002; Davidsson, 2021).

Not only might this process differentiate hyper- and super-volatile species from the bulk of the comet to a much narrower layer, closer to the surface, but the degree of differentiation could be strongly dependent on the comet’s size. While Malamud et al. (2022) have established a correlation between the degree of differentiation and several factors, such as the assumed composition, the time of formation, the pore-space permeability and the pebble size, the dependence on the nucleus size involves fewer uncertainties, as follows.

Internal temperatures are generally governed by the interplay between the rate at which internal heat is released and the rate at which it can diffuse out, whether through conduction, advection or radiative transport. The amount of radiogenic heat release depends on the radionuclide abundances, which in turn depend on the comet’s formation time (for short-lived radionuclides) and the assumed composition of refractories, which sets the initial radiogenic abundances at $t=0$. Arguments were made in Malamud et al. (2022) that for outer Solar System objects the abundances do not necessarily adhere to meteoritic levels, which are usually invoked in thermophysical models.

The effectiveness of advective flow, and thus of differentiation, depends on the permeability of gas within the porous matrix in which it travels. This parameter is unconstrained in pebble media and has a potentially large range (Gundlach et al., 2011, 2020; Schweighart et al., 2021; Güttler et al., 2023). Radiative transport strongly depends on temperature as well as pebble size (Hu et al., 2019; Bischoff et al., 2021). Finally, heat conduction out of the interior, which is the primary mode of heat transport, operates on a timescale that depends quadratically on the characteristic length scale of the object. Hence, all else being equal (radionuclide abundances, internal permeability, thermal conductivity, heat capacity etc.), the size of the nucleus dictates how much heat the nucleus can retain. The larger the comet, the greater the bulk migration of hyper- and super-volatiles we might expect, sweeping them outwards and depositing them in differentiated layers of either gas laden amorphous ice or pure ice. The former is much more likely for hyper-volatiles, as discussed next.
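The quadratic size dependence can be made explicit with the standard order-of-magnitude estimate for conductive cooling. The symbols below ($R$ for the nucleus radius, $K$ for the thermal conductivity, $\rho$ for the bulk density, $c$ for the specific heat capacity and $\kappa$ for the thermal diffusivity) are introduced here for illustration only and are not taken from the models of Malamud et al. (2022):

```latex
\tau_{\rm cond} \sim \frac{R^{2}}{\kappa},
\qquad
\kappa = \frac{K}{\rho c}
\quad\Longrightarrow\quad
\frac{\tau_{\rm cond}(2R)}{\tau_{\rm cond}(R)} = 4 .
```

Under this scaling, and with all other parameters held fixed, doubling the radius quadruples the time over which conductive losses can remove the radiogenic heat.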

The survival of pure ices of hyper-volatile materials is not impossible, but is probably unlikely in most present-day observed comets. Since comets are expected to lose their hyper-volatile content relatively quickly, either in the contemporary Kuiper Belt (Lisse et al., 2021) or even in the primordial disc (Davidsson, 2021), they must be emplaced onto distant Oort-cloud orbits early enough in the Solar System formation history to avoid this fate. At least some fraction of the outer nucleus must be kept below the threshold temperature of incipient sublimation of such ices, unaffected by both external insolation and internal radiogenic heating. C/2016 R2 might be an example of a rare, hyper-volatile rich, yet water and dust poor comet, belonging to this category (McKay et al., 2019). Even without early emplacement onto an Oort-cloud-like orbit, several super-volatile species as well as amorphous water and carbon dioxide ice would remain sufficiently cold and thus safe against insolation in the outer Solar System, as evident from both theory and observations (Prialnik et al., 1987; Jewitt, 2009; Li et al., 2020; Parhi & Prialnik, 2023). If a comet is perturbed into the inner Solar System for the first time, more vigorous activity is expected; however, erosion limits the penetration of a heat wave to the interior (Capria et al., 2017), thus shielding the inner nucleus. The bulk composition of the eroded surface reflects the early rather than contemporary orbital state.

Based on the aforementioned arguments, and in light of the elevated internal temperatures suggested by the pebble nucleus model of Malamud et al. (2022), the following predictions have instigated this study:

1. Only extremely small nuclei cool effectively enough in order to prevent any internal differentiation, whereas increased nucleus size correlates with increased hyper- and super-volatile differentiation and concentration near the surface, triggered by internal radiogenic evolution. Thus, active comets will appear to have greater abundances of high-volatility species as a function of their size.

2. More dynamically evolved short period active comets will, as a group, be more eroded and therefore expose deeper layers compared to long period comets, which might be reflected in their size-dependent volatile abundances.

3. If prolonged activity of comets strictly requires gas laden amorphous ice, then small comets might outlive larger comets, because the latter might concentrate amorphous ice in a thinner outer layer. In turn, a testable prediction is that dynamically evolved active comets are smaller than their long period counterparts, because large inner Solar System comets have had more orbits over which to erode and become dormant.

In this paper we present the first evidence that comet observations partly support the above predictions, and in particular the size-composition dependence for the hyper-volatile CO. Future efforts are certainly needed in order to increase the quantity of currently available data and help verify our predictions. In what follows we first carefully describe the criteria for assembling our data set, consisting of comets for which both the size and the coma composition are reliably known (Section 2). We then present the observational evidence in Section 3. A discussion of our findings is given in Section 4. The paper is concluded in Section 5.

2 The data set

In this section, we describe the data used for this study. We aimed to gather as much size and composition data as possible, given what was available in the literature at the time of writing.

Cometary activity allows us to measure the composition and abundance of gases in the coma that are being released from the nucleus via sublimation of volatile ices. This provides an opportunity to gain insight into the composition of the ices contained in cometary nuclei. Many radiative processes take place in cometary atmospheres, which can be observed across a range of wavelengths (Biver et al., 2022a; Bodewits et al., 2022). Spectroscopic observations of comets have been used to measure the composition of their coma through detection of emission bands or lines from a variety of molecules. This complements much rarer data from direct mass spectroscopy measurements following flyby of a comet by a space mission. In this study, we used mass spectroscopy data only for comet 67P/C-G (Rubin et al., 2019). We have gathered measurements from a large number of sources covering a range of molecules, from relatively complex ones to small radicals.

The other key information for this study is the size of comet nuclei, which is difficult to measure. Indeed, comets far from the Sun are faint and challenging to observe because of their small sizes and low geometric albedo. As they move inwards, solar radiation increases and causes the formation of the coma surrounding and obscuring the nucleus. Comets are most often discovered/observed while active, as this is when they are brightest.

In this study we generally refer to two broad classes of comets: nearly isotropic comets (NIC), which possess a fairly uniform inclination distribution, long orbital periods and a Tisserand parameter $T<2$; and ecliptic comets (EC), also known as short period comets, which are sub-classified into Jupiter family comets (with a Tisserand parameter $2<T<3$ and periods below 20 years) and Chiron-type comets (with $T>3$ and periods in the range 20-200 years) (Levison, 1996).
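The Tisserand cuts quoted above can be sketched in a few lines of Python. This is an illustrative sketch rather than the procedure used to build the database: the function names, the adopted semi-major axis of Jupiter and the handling of the boundary values are assumptions, and the orbital-period criteria are deliberately left out.

```python
import math

A_JUPITER_AU = 5.2044  # semi-major axis of Jupiter in au (assumed value)


def tisserand_jupiter(a_au, e, inc_deg, a_jup_au=A_JUPITER_AU):
    """Tisserand parameter with respect to Jupiter.

    T = a_J/a + 2 cos(i) sqrt((a/a_J) (1 - e^2)).
    Not defined for exactly parabolic orbits (infinite a).
    """
    inc = math.radians(inc_deg)
    return a_jup_au / a_au + 2.0 * math.cos(inc) * math.sqrt(
        (a_au / a_jup_au) * (1.0 - e * e)
    )


def classify_comet(t_j):
    """Apply only the Tisserand cuts quoted in the text (period cuts omitted)."""
    if t_j < 2.0:
        return "NIC"
    if 2.0 < t_j < 3.0:
        return "EC: Jupiter family"
    if t_j > 3.0:
        return "EC: Chiron-type"
    return "boundary"  # T exactly 2 or 3 is not assigned by the text


# Illustrative elements, roughly those of a typical Jupiter family comet.
print(classify_comet(tisserand_jupiter(a_au=3.46, e=0.64, inc_deg=7.0)))
```

A retrograde, highly eccentric orbit gives a small or negative value of $T$ and therefore falls in the NIC class under these cuts.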

2.1 Comet Nuclei Sizes

Different methods are used to measure the size of a comet nucleus, including optical photometric observations of inactive comets, observations of thermal emission from the nucleus, and direct measurements by spacecraft. These techniques have different levels of accuracy and usually rely on different assumptions. We briefly describe the techniques that were used for the objects in this study but refer the reader to Lamy et al. (2004) and Knight et al. (2023) for a more complete description alongside the advantages/shortfalls of the different techniques.

2.1.1 Observing reflected light

This is one of the most commonly used techniques to determine the size of cometary nuclei. When a comet is at large heliocentric distances ($\gtrsim 4$ au for short period comets, further away for other types of comets), the heating from the Sun is insufficient to efficiently drive sublimation of water ice and there is little to no coma obscuring the nucleus. The nucleus can then be observed directly, or its flux estimated once a small coma contribution is modelled and subtracted. The flux measured for the nucleus is then used to compute the nucleus size. This technique presents some difficulties, as it relies on observing faint objects far from the Sun. With limited spatial resolution, it can also be impossible to ascertain whether the object is truly inactive or has an unresolved coma. For most objects the geometric albedo and phase curve properties are unknown and must be assumed, introducing uncertainties in the nucleus size obtained.
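As a minimal sketch of the final step, the widely used relation between V-band absolute magnitude $H$, geometric albedo $p_V$ and diameter, $D = 1329\,\mathrm{km}\; p_V^{-1/2}\, 10^{-H/5}$, can be coded as follows. The function name and the default albedo of 0.04 are assumptions made for illustration; $H$ is assumed to be already corrected for coma contribution, distance and phase angle, which is where most of the uncertainty discussed above enters.

```python
import math


def nucleus_radius_km(abs_mag_v, geometric_albedo=0.04):
    """Effective radius (km) from V-band absolute magnitude and albedo.

    Uses the standard relation D = 1329 km / sqrt(p_V) * 10**(-H/5).
    The default albedo of 0.04 is an assumption, not a measured value.
    """
    if geometric_albedo <= 0:
        raise ValueError("geometric albedo must be positive")
    diameter_km = 1329.0 / math.sqrt(geometric_albedo) * 10.0 ** (-abs_mag_v / 5.0)
    return diameter_km / 2.0


# Same absolute magnitude, two assumed albedos: the darker surface
# requires a larger nucleus to reflect the same flux.
for albedo in (0.02, 0.04, 0.06):
    print(albedo, round(nucleus_radius_km(15.0, albedo), 2))
```

Because the radius scales as $p_V^{-1/2}$, a factor of two uncertainty in the assumed albedo translates into a roughly 40 per cent uncertainty in the derived radius.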

2.1.2 Observing thermal emission

At longer wavelengths, the thermal emission of comet nuclei can be detected, which can be linked to the nucleus size. This generally requires observing at infrared wavelengths, often with a space telescope. As with observations of reflected light at optical wavelengths, when the object is active the coma contribution has to be modelled to retrieve the contribution of the nucleus. Thermal modelling of the nucleus is then used to determine its size (e.g. NEATM; Harris, 1998). In most cases, assumptions must be made about the rotation period of the object, its shape (often, the derived size is an effective radius, assuming a spherical nucleus), albedo, or thermal inertia. Likewise, interferometric measurements can be made of the submillimetre continuum component of the thermal emission, from which nucleus size can also be determined through thermal modelling (e.g. Altenhoff et al., 1999; Boissier et al., 2011, 2013). At these wavelengths dust in the coma contributes much less to the total emission than in the IR and visible ranges, so these observations are expected to be dominated by thermal emission from the nucleus. However, the submillimetre flux from the nucleus is lower than the IR flux, making these observations challenging except for bright and/or nearby comets. Furthermore, in cases where thermal emission and visible photometry can be measured simultaneously it is possible to solve for the nucleus radius and albedo independently (Lamy et al., 2004).

2.1.3 Radar observations

Radar observations, where a burst of microwaves is sent towards the nucleus of a comet and the reflected echo measured, can accurately constrain the shape and size of comets and asteroids that pass very close to the Earth. However, this technique is limited by a relatively small number of comet nuclei with near-Earth orbits.

2.1.4 Space-based size measurements from flyby / rendezvous

The most accurate determination of the shape and size of any small Solar System body is made by direct observation during a spacecraft flyby/rendezvous. Naturally only a handful of comets have been visited by a spacecraft. These accurate size measurements are invaluable, alongside detailed information on the shapes (e.g. bilobate) and terrain (topography) of comet nuclei.

2.1.5 Other size-measuring techniques

In this study, we have gathered comet size measurements determined using the techniques mentioned above. Other techniques have been used to measure nucleus sizes, but either the comets they targeted did not have composition information and were not useful for this study or they were judged to be less reliable. We investigated the size estimates inferred by Jewitt (2022) using the water production rate and non-gravitational acceleration of long period comets. The former model assumes the production rate is linked to the sublimating area and therefore nucleus size, and the latter model estimates the nucleus mass/size from the magnitude of non-gravitational accelerations on the comet orbit. These methods are generally less accurate than those described above; they make use of simplified models of cometary activity and assume physical parameters such as active surface area and nucleus density. Furthermore, photometric/thermal techniques generally provide an upper limit on nucleus size, whereas the techniques of Jewitt (2022) could either overestimate the size of hyperactive comets (when icy grains in the coma enhance activity) or underestimate the size, depending on the accuracy of the model and choice of physical parameters. At the time of writing this source provides the only available literature size estimates for a number of NICs (8 comets for which we also found compositional information); we therefore considered using these comets in our analysis. We found that our overall results were not significantly changed by the inclusion of sizes from Jewitt (2022), and so we elected not to include these sizes in the final results to avoid potential biases. However, we welcome the efforts to broaden the dataset of comet sizes in this manner and perhaps future work can incorporate such size estimates into a more complete analysis.

2.1.6 Selection criteria

In addition to searching the literature, we used the Small-Body Database Lookup tool (https://ssd.jpl.nasa.gov/tools/sbdb_lookup.html) to query all comets listed in the MPC comet list (https://www.minorplanetcenter.net/iau/MPCORB/CometEls.txt; via astroquery, accessed 23/06/2022) and checked the original references. Furthermore, we searched for additional size measurements in the Properties of Comet Nuclei v2.0 PDS (https://pds.nasa.gov/) dataset (Barnes et al., 2010). The studies used for our final selection of comet nucleus sizes are listed in Table 5.

Many comets had multiple measurements from different sources and obtained with different techniques. Previous compilations of comet nuclei sizes have sometimes taken the average of multiple measurements and used their variation to estimate the uncertainty (e.g. Combi et al., 2019, for 46P and 96P). In this work we chose to pick a single size for each comet, considering the reliability of each source and the methods used. We do this as one would expect the measurement of nucleus size to be more frequently biased towards larger sizes due to additional signal from unresolved activity. We applied the following guidelines to make our selection:

• Use spacecraft rendezvous/flyby measurements (a direct measurement of size) where available.
• Choose the smallest measured radius, as some studies only provided an upper limit on size.
• Prefer more modern sources, which generally apply better-established techniques to mostly higher quality data.
• Select a measurement with a directly calculated uncertainty if available.

When no uncertainty is provided we use the radius-uncertainty relation of the literature data to assign an uncertainty estimate (Figure 8). For some of these objects the uncertainty on the radius is expressed as an upper and lower limit. In these cases, for simplicity, we have taken the mean of these values to obtain a single uncertainty.

For further details about size selection for particular comet nuclei, please refer to Appendix A.
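The guidelines above can be sketched as a simple selection routine. This is an illustration under stated assumptions, not the procedure actually applied to the database: the `SizeMeasurement` record and its field names are hypothetical, and the order in which the last three guidelines break ties (smallest radius, then most recent source, then presence of an uncertainty) is assumed, since the text does not rank them and individual cases were handled as described in Appendix A.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SizeMeasurement:
    """Hypothetical record for one literature radius of one comet."""
    radius_km: float
    year: int                      # publication year of the source
    from_spacecraft: bool = False  # flyby/rendezvous measurement
    sigma_km: Optional[float] = None
    source: str = ""


def select_size(measurements: Sequence[SizeMeasurement]) -> SizeMeasurement:
    """Pick a single radius for a comet following the listed guidelines."""
    if not measurements:
        raise ValueError("no size measurements supplied")
    # Guideline 1: spacecraft measurements take precedence when present.
    direct = [m for m in measurements if m.from_spacecraft]
    pool = direct if direct else list(measurements)
    # Guidelines 2-4, applied here in an ASSUMED order of priority:
    # smallest radius, then most recent source, then quoted uncertainty.
    return min(
        pool,
        key=lambda m: (m.radius_km, -m.year, m.sigma_km is None),
    )


# Made-up numbers for illustration only.
candidates = [
    SizeMeasurement(2.5, 2004, source="photometry"),
    SizeMeasurement(2.0, 2013, sigma_km=0.1, source="thermal"),
]
print(select_size(candidates).source)
```

Encoding the guidelines as a sort key keeps the priority order visible in one place, which makes it easy to test how sensitive a result is to a different ranking.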

2.1.7 Comet fragments

Comets have often been observed to split into multiple fragments due to tidal forces while passing too close to the Sun or a planet (e.g. the Kreutz sungrazers and the 1994 Shoemaker-Levy encounter with Jupiter) or for underdetermined reasons such as rotational spin-up, impacts or gas pressure (Boehnhardt, 2002). As such in this study we must be wary of when the nucleus size was measured relative to the observation of its composition. A famous example is comet 73P/Schwassmann-Wachmann 3, which fragmented in 1995 (and possibly earlier as well - Schuller & Struve, 1930). This is one of the only cases where the composition of different fragments of a split comet could be measured separately. The compositions of fragments B and C have been measured by Fink (2009), Dello Russo et al. (2016) and Lippi et al. (2021) and were determined to be similar. However, in this work we decided to use only pre-fragmentation sizes as this is more representative of the initial size of the primordial comet nucleus. We were able to find composition information of nuclei/fragments for a number of comets including 73P, 51P/Harrington, C/1996 B2 (Hyakutake) and C/2001 A2 (LINEAR), however we have excluded them from this study as they do not have reliable size determination prior to the splitting events. Likewise for other known split comets (listed in Boehnhardt, 2004), either no composition information was available or the size pre-splitting was unknown so they were not used in this study.

2.2 Comet Compositions

2.2.1 Observed species

Over the years, various techniques have been used to measure the compositions of comets by observing their atmosphere. Biver et al. (2022a) present an overview of the current state of our knowledge about comet composition and how it is measured. The simplest measurements to obtain are made at optical wavelengths, using spectroscopy or narrow-band filters to measure the abundance of a set of radicals, such as CN, C₂, C₃, OH, NH (see for example A’Hearn et al., 1995). While databases containing comets observed at optical wavelengths are the largest available, the species they sample are what we call product or daughter species. They are not present as such in the nucleus ices but are instead produced in the coma by the photo-dissociation of larger molecules. Therefore they might not directly represent the composition of nucleus ices. Molecules produced directly by the sublimation of nucleus ices, called parent species, tend to emit at infrared and radio wavelengths (H₂O, CO, CO₂, HCN, NH₃, CH₄, …). These molecules are harder to detect, and their abundances can typically only be measured for relatively bright comets. In this study, we have included both types of species, in order to gather the largest sample possible. We have also included abundances measured in situ in the coma of comet 67P/C-G by the ROSINA mass spectrometer onboard the Rosetta spacecraft (Rubin et al., 2019). We note that there are composition measurements of 67P made using other instruments on Rosetta (e.g. Bockelée-Morvan et al., 2016; Feldman et al., 2018; Biver et al., 2019); however, we used the measurements of Rubin et al. (2019) as they report abundances for a large number of species together.

Comparing the abundance of species from different studies, sometimes derived from observations at different wavelengths and with different techniques, can be a challenge. Indeed, the size of the field of view and the model parameters used can have a significant effect on the abundances measured. For example, in hyperactive comets sublimation of icy grains in the coma can be a significant source of volatile gas compared to production from the nucleus alone (e.g. 103P, Kelley & Kolokolova, 2014). This could in principle lead to changes in measured abundance depending on where in the coma the measurement was made. However, the goal of this study is to find broad trends among comets rather than performing detailed comparison between a small number of comets. Given the limited number of targets for which we could find both composition and size measurements, we collected all the sources we could find for abundance measurements. We focused first on large-scale studies, and then complemented our database using works focused on individual comets.

For observations of radicals at optical wavelengths, we considered mainly the following studies. Among the largest available studies is the one by A’Hearn et al. (1995). They published the results of a survey of 85 comets observed from the 1970s until 1992 using narrow-band photometry at optical wavelengths. They sample a range of ECs and NICs. This was later updated by Schleicher (2008). We queried that dataset from PDS (Lowell Observatory Cometary Database - Production Rates, Osip et al., 2003). Cochran et al. (2012) obtained abundances from optical spectroscopy of 130 comets from 1980 - 2008 while Langland-Shula & Smith (2011) observed 26 comets using the Kast double spectrograph at Lick observatory (primarily in the $300-600$ nm range). Finally, Fink (2009) present abundances for 50 comets with significant enough detections to derive reliable production rates from observations made in the wavelength range $520-1040$ nm at the Catalina Site telescope (Fink & Hicks, 1996). In this particular study, there are no exact uncertainties published, only a subjective quality grade. We thus converted the quality grade into an uncertainty using the suggested percentage errors. Most of these studies contain measurements of the $\mathrm{Af\rho}$ parameter, a proxy for the dust production, in addition to the radical production rates. For completeness and as an estimate of the dust-to-gas ratio we have included $\mathrm{Af\rho}$ measurements in this study.

For observations of parent species, we focused mainly on the following studies. Dello Russo et al. (2016) present high resolution IR spectroscopy of 30 comets observed between 1997 and 2013. We complemented this with data from Lippi et al. (2021). When a target was available in both data sets, we used data from Dello Russo et al. (2016) as it is the larger dataset. Ootsubo et al. (2012) present CO₂ production rates for a sample of 18 comets with the AKARI satellite, and Reach et al. (2013) measured abundances of CO and CO₂ with the Spitzer space telescope for 23 comets. The production rates of CO and CO₂ were further complemented by an existing compilation by Harrington Pinto et al. (2022), which contains the results of Ootsubo et al. (2012) & Reach et al. (2013) alongside additional sources. Table 1 summarises the number of comets included in the largest of these studies, as well as the species they measured.

Source	Method	Wavelength	Number of Comets	Species
A’Hearn et al. (1995)	Narrowband photometry	Visible	85	CN, C₂, C₃, NH, $\mathrm{Af\rho}$, OH
Fink (2009)	Spectroscopy	Visible	92 (50)	C₂, NH₂, CN, $\mathrm{Af\rho}$, H₂O
Langland-Shula & Smith (2011)	Spectroscopy	Visible	26	CN, C₂, C₃, NH, NH₂, $\mathrm{Af\rho}$
Cochran et al. (2012)	Spectroscopy	Visible	130 (110)	CN, NH, C₃, CH, C₂, NH₂, OH
Ootsubo et al. (2012)	Spectroscopy	IR	18 (17)	H₂O, CO₂, CO
Reach et al. (2013)	Spectroscopy	IR	23 (20)	OH, CO₂, $\mathrm{Af\rho}$
Dello Russo et al. (2016)	Spectroscopy	IR	30	CH₃OH, HCN, NH₃, H₂CO, C₂H₂, C₂H₆, CH₄, H₂O
Lippi et al. (2021)	Spectroscopy	IR	20	CH₃OH, HCN, NH₃, H₂CO, C₂H₂, C₂H₆, CH₄, CO, H₂O

Table 1: Table of the main compositional surveys considered in this work. The source is listed along with the observational method and wavelength. The number of comets considered in each study is listed, where the number in brackets is the number for which some or all of the species were detected. The species targeted by each survey are also listed, including $\mathrm{Af\rho}$, which is a proxy for dust production.

The rest of the data comes from publications focused on individual comets by Biver et al. (1999, 2007, 2011a, 2011b, 2012, 2021a, 2021b, 2022b); Bockelée-Morvan et al. (2000, 2004, 2010, 2022); Bodewits et al. (2011); Bonev et al. (2021); Dello Russo et al. (2020); Faggi et al. (2019); Moulane et al. (2018); Opitom et al. (2016); Roth et al. (2018); Roth et al. (2020); Rubin et al. (2019).

2.2.2 Selection criteria

To select the final abundance measurements presented in Section 3, we applied the following methodology:

• We collected data from all sources and identified the available species. We selected only data sets that also had available production rates for OH/H₂O or CN (we considered abundances relative to water or CN when looking for correlation between composition and size).
• For each data set, if multiple measurements were provided for a comet we calculated the mean heliocentric distance and date of the observations and the corresponding average for each production rate if required. We used a weighted average if and when the source provided the corresponding weights.
• If multiple sources were available for a comet, we selected production rate ratios from the largest available dataset, to prioritise using larger homogeneous datasets.
• For improved reliability, preference was given to sources that published production rates with an associated uncertainty.

By selecting composition from a single source we attempt to avoid difficulties in combining abundance measurements from multiple observational techniques and/or observational circumstances. Assessing the changes in compositional abundance as a function of methodology, viewing geometry and other circumstances such as outburst events for all comets analysed here is beyond the scope of this work. We note that determining the true bulk abundance of species in a cometary nucleus from remote observations alone will always be a difficult problem, and we attempted to minimise these effects by selecting from larger homogeneous datasets where possible.

As mentioned above, we focused mostly on abundances of other species relative to water. However, observations at optical (and sometimes radio) wavelengths only provide measurements of the OH production rate. Since OH is produced by the photo-dissociation of water, it is possible to convert between H₂O and OH production rates. Several ways to do the conversion have been presented in the literature and we decided to use the conversion ratio of $\mathrm{OH}=0.85\times\mathrm{H_2O}$ based on the photo-dissociation rate of water into OH and H (Harris et al., 2002). As H₂O and OH are hard to observe at optical wavelengths, abundance is sometimes reported relative to CN. We thus included these observations as well for completeness.
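The conversion amounts to dividing the OH production rate by 0.85. A minimal sketch follows; the function names are hypothetical, and scaling the uncertainty by the same factor assumes that the conversion ratio itself carries no error, which is a simplification made here rather than a statement about the original analysis.

```python
OH_PER_H2O = 0.85  # conversion ratio adopted in the text (Harris et al., 2002)


def water_production_from_oh(q_oh, sigma_q_oh=None):
    """Convert an OH production rate (and optional 1-sigma error) to H2O.

    ASSUMPTION: the conversion factor is treated as exact, so the
    uncertainty is simply rescaled by the same factor.
    """
    q_h2o = q_oh / OH_PER_H2O
    sigma = None if sigma_q_oh is None else sigma_q_oh / OH_PER_H2O
    return q_h2o, sigma


def abundance_relative_to_water(q_species, q_oh):
    """Mixing ratio Q(species)/Q(H2O) when only Q(OH) was measured."""
    q_h2o, _ = water_production_from_oh(q_oh)
    return q_species / q_h2o


# Made-up production rates in molecules per second, for illustration only.
print(abundance_relative_to_water(q_species=1.0e26, q_oh=8.5e27))
```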

Most studies present composition measurements for a comet in a distinct time window, e.g. around its perihelion passage or the date range over which it was observable/observing time was available. If not provided in the original study, we determined the mean heliocentric distance and date of the observation for each comet from each source. In most cases this accurately captures the mean epoch at which the comet composition was measured, however in certain long running observing campaigns the mean might not reflect the true range of observing conditions (e.g. A’Hearn et al., 1995).

Whenever a source provided several measurements for the same target, and if a final summary table was not provided by the authors, we took a mean of the compositions for that source to get a single mixing ratio for each object (uncertainties were propagated forward when available), unless a large gap in heliocentric distance/time was present between the observations. In that case, the measurements closest to perihelion were selected. Some sources provide an upper and lower uncertainty estimate on composition/mixing ratio. For simplicity, with these measurements we took the mean value of the upper and lower limits to be the uncertainty.
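The averaging step described above can be sketched as follows. This is a minimal illustration rather than the code used to build the database: the function names are hypothetical, the input values are invented, and the propagation rule (independent errors on an unweighted mean) is an assumption, since the text does not specify how uncertainties were propagated.

```python
import math

def symmetric_uncertainty(upper, lower):
    """Collapse an asymmetric (+upper / -lower) uncertainty into one value by averaging."""
    return 0.5 * (abs(upper) + abs(lower))

def mean_mixing_ratio(values, sigmas):
    """Unweighted mean of several mixing ratios with a propagated 1-sigma uncertainty.

    Assumption: the individual errors are independent, so that
    sigma_mean = sqrt(sum(sigma_i^2)) / N.
    """
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum(s * s for s in sigmas)) / n
    return mean, sigma

# Invented measurements of one comet from a single source (per cent relative to H2O).
values = [4.0, 5.0, 6.0]
# The second measurement has an asymmetric uncertainty of +0.5 / -0.3.
sigmas = [0.3, symmetric_uncertainty(0.5, 0.3), 0.4]

mean, sigma = mean_mixing_ratio(values, sigmas)
print(f"{mean:.2f} +/- {sigma:.2f}")
```

A weighted mean would be an equally reasonable choice; the unweighted form is used here only because the text describes taking "a mean".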

As a sanity check we considered the range of heliocentric distances used to calculate each mean composition. For the comet compositions where this range is large ($>1~\mathrm{au}$) we inspected the individual measurements to ensure they were consistent. 10P and 103P displayed significant variation in their \chCO2/\chH2O abundance as reported by Reach et al. (2013), therefore for these two comets we excluded the production rates measured at $>2~\mathrm{au}$ where ices more volatile than \chH2O begin to dominate activity. Likewise, there was large variation in the measurements of \chCO2/\chH2O for C/1995 O1 compiled by Harrington Pinto et al. (2022). The reported observations spanned a wide range of heliocentric distances, with some taken while the comet was at $>3\ \mathrm{au}$. As such, we selected the measurement with the lowest heliocentric distance ($r_{h}=2.93\ \mathrm{au}$) for our analysis. Furthermore, we note here that Cochran et al. (2012) provided average production rates with respect to \chCN scaled to $1\ \mathrm{au}$; as such we set all measurements from this source to heliocentric distance $r_{h}=1\ \mathrm{au}$ when incorporating their results into our dataset.

We have taken the steps described above, applying a consistent methodology when selecting which source to use for a given composition measurement, in order to utilise the wide range of literature measurements in a reliable manner. We note that choice of source will greatly affect the outcome of an analysis such as ours. For example there is significant variation in the \chCO2/\chH2O production rates of comets such as 19P between Ootsubo et al. (2012) & Reach et al. (2013). We make available the full data table with all comet composition and size measurements (and their literature sources) so that in all cases the provenance of the data is clear. Furthermore we hope that this data collection is of value to future studies investigating the size and/or compositions of comets. A sample of the dataset is displayed in Table B and the full dataset is available at this link.

3 Observational findings

3.1 Raw data analysis

Figure 1 presents the abundance relative to water of a range of species commonly considered to be parent species as a function of the nucleus radius, for our entire data set. Each point represents a single comet. Different symbols are used for the ecliptic comets (ECs) and nearly isotropic comets (NICs), and the heliocentric distance of the comet when the abundance measurement was performed is indicated by the colour scale. Figure 9 shows the same information but for daughter species (and the proxy for dust production rate $\mathrm{Af\rho}$). Only species for which a significant number of measurements were available are shown. Additional figures displaying abundances relative to CN are presented in Figure 10. For each species we assess the possible presence of a correlation between the relative production rate of that species and the size of the comet nucleus. In order to do so we calculated the Pearson correlation coefficient ($\gamma$) of these data (in log-log space) to measure the degree of linear correlation. When the data are strongly linearly correlated, $\gamma$ approaches $\pm 1$ (signifying positive or negative correlation). In order to test the statistical significance of $\gamma$, a $p$-value is also calculated, which represents the probability of obtaining a correlation at least as strong as the one observed if the null hypothesis (no correlation) is true. Therefore, when we measure large values of $\gamma$ with a corresponding small value of $p$ we can assume that the correlation is statistically significant, where a significance level of $p<0.05$ is the de facto threshold in the literature. Table 2 presents the values of the Pearson correlation and $p$-values for all species for ECs and NICs separately as well as for the full sample. The exact $p$-value for a significant result can be a somewhat arbitrary choice, therefore in Table 2 we have highlighted different ranges of $p$-value. We select thresholds that are analogous to 3-, 2-, 1-sigma significances, i.e. $p\leq 0.003$ (strong significance), $0.003<p\leq 0.05$ (moderate significance), and $0.05<p\leq 0.32$ (marginal significance), respectively.
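The correlation test described above can be sketched with the Python standard library alone. This is an illustrative sketch, not the analysis code: the function names are hypothetical and the data are synthetic. The two-sided $p$-value is the standard Student-$t$ result for a Pearson coefficient with $n-2$ degrees of freedom, evaluated here by numerical integration (an implementation choice made for this example).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_p(r, n, steps=2000):
    """Two-sided p-value of r under the null hypothesis of no correlation.

    Student-t tail with nu = n - 2 degrees of freedom, rewritten as
    p = 2C * integral from asin|r| to pi/2 of cos^(nu-1)(theta) d(theta)
    and evaluated with Simpson's rule (steps must be even).
    """
    if n < 3:
        return 1.0  # the test is undefined for fewer than three points
    nu = n - 2
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    a, b = math.asin(min(1.0, abs(r))), math.pi / 2
    h = (b - a) / steps
    f = lambda t: math.cos(t) ** (nu - 1)
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return min(1.0, 2 * c * total * h / 3)

def significance(p):
    """Map a p-value onto the sigma-analogue bins used in Table 2."""
    if p <= 0.003:
        return "strong (3-sigma)"
    if p <= 0.05:
        return "moderate (2-sigma)"
    if p <= 0.32:
        return "marginal (1-sigma)"
    return "not significant"

# Synthetic, purely illustrative values (NOT taken from the database).
radius_km = [0.6, 1.1, 1.8, 2.5, 4.0, 7.0, 30.0]
co_h2o = [0.004, 0.011, 0.009, 0.030, 0.025, 0.080, 0.200]

# Correlate in log-log space, as described in the text.
log_radius = [math.log10(v) for v in radius_km]
log_abund = [math.log10(v) for v in co_h2o]
gamma = pearson_r(log_radius, log_abund)
p = pearson_p(gamma, len(log_radius))
print(f"gamma = {gamma:.3f}, p = {p:.4f}, {significance(p)}")
```

As a consistency check, feeding the tabulated coefficient and sample size for \chCO/\chH2O in ecliptic comets ($\gamma=0.9143$, 8 comets) into `pearson_p` gives a value that rounds to the tabulated $p=0.0015$.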

The strongest correlation by far is seen for CO at the 3-sigma level. CO is one of the most volatile ices in cometary nuclei. The trend is stronger for ECs than NICs. One potential bias to keep in mind is that short period comets tend to have lower CO abundances due to repeated passages close to the Sun (Dello Russo et al., 2016). The effect of the heliocentric distance on the CO abundance measurements in comets and how it could affect these results is discussed below in Section 3.2. For more detail on the compilation of \chCO/\chH2O abundances from the literature, please refer to Appendix D.

We also see a trend at the 1-sigma level for HCN for ECs and no significant trend for the NICs; however, the correlation significance increases to 2-sigma when the whole sample is considered. As noted by Biver et al. (2022b), HCN production rates derived from millimetre observations can differ from production rates derived from infrared observations by typically a factor of two, which complicates the interpretation of the trend for HCN. We do not see any trend for CN but this is not entirely surprising. While CN was originally thought to be produced by the photo-dissociation of HCN, evidence indicates that another source is needed to account for the CN abundance and morphology in comets. This other source could be another parent species (C₂N₂, HC₃N, or CH₃CN), or the sublimation of dust grains, salts, or macro-molecules (Biver et al., 2022b). The importance of this additional source could vary from comet to comet and explain the different trends seen for CN and HCN.

We see a trend at the 2-sigma level for H₂CO for the ECs and at 1-sigma for the sample as a whole. For CO₂, we see a 1-sigma level trend for the ECs and the full sample. We do not see any correlation for any of the other typical parent species. However, this does not mean that the correlation is not present, but most likely that the current data available are insufficient to draw strong conclusions. This should improve in the future when full composition and size measurements become available for a larger number of comets. The only exception might be methanol, for which the number of data points available is similar to that for CO, CO₂, and HCN, but no trend can be seen.

With the possible exception of \chCS, we generally do not see strong correlations for daughter species. However, we do note a moderately significant anti-correlation for NH which is driven by the ECs. This correlation is surprising given the lack of correlation for ECs for NH₂ and NH₃. However, the scale lengths used to compute NH production rates using the Haser model are difficult to constrain, which could influence the results. We thus disregard the NH trend for the rest of the discussion.

In Figure 9, we also see a 2-sigma correlation for the $\mathrm{Af\rho}$ for NICs, with higher dust to gas ratios for larger comets. An increase of the $\mathrm{Af\rho}$ /OH ratio at large heliocentric distances has been noticed by A’Hearn et al. (1995) and Langland-Shula & Smith (2011), which has been explained either by a selection effect (high dust to gas ratio comets have higher visual magnitudes), by the presence of large grains less volatile than water, or the build-up of a crust on the surface of the nucleus. Since larger comets tend to be more active, and thus brighter, they can be observed farther from the Sun. This is particularly true for NICs, which are more likely to be observed far from the Sun. The trend of higher dust-to-gas ratio at larger distances from the Sun could thus bias our correlation between $\mathrm{Af\rho}$ /H₂O and the nucleus size.

Figure 1: Log scale plots showing the relation between comet composition (of various parent species relative to \chH2O) and radius of the nucleus. Marker shape denotes the dynamical class of each comet, either an Ecliptic Comet (EC, square markers) or Nearly Isotropic Comet (NIC, triangular markers). Marker colour indicates the heliocentric distance of the comet when the composition was measured. The error bars denote the uncertainty in the measured composition or size (when this was available). We indicate the correlation of the radius-composition data with a linear fit (in log-log space) for the whole dataset (solid line), ECs (dotted line) and NICs (dashed line). The Pearson correlation coefficients for each parent species are given in Table 2.

	Ecliptic Comets			Nearly Isotropic Comets			All Comets
Species	Number	Correlation	$p$ -value	Number	Correlation	$p$ -value	Number	Correlation	$p$ -value
\chC2H2/H2O	6	0.7796	0.0675	4	0.9537	0.0463	10	0.3421	0.3332
\chC2H6/H2O	11	-0.3479	0.2944	5	0.0177	0.9775	16	0.0862	0.7508
\chCH3CN/H2O	4	-0.4120	0.5880	2	1.0000	1.0000	6	-0.0430	0.9356
\chCH3OH/H2O	14	-0.2295	0.4300	7	-0.1155	0.8053	21	0.0933	0.6874
\chCH4/H2O	5	-0.5520	0.3347	5	0.0376	0.9521	10	0.2667	0.4564
\chCO2/H2O	19	0.2628	0.2770	10	0.2205	0.5405	29	0.3003	0.1135
\chCO/H2O	8	0.9143	0.0015	9	0.5740	0.1060	17	0.7880	0.0002
\chH2CO/H2O	8	0.7575	0.0295	5	0.5971	0.2877	13	0.3832	0.1962
\chH2S/H2O	6	-0.4590	0.3599	3	0.6947	0.5111	9	0.2737	0.4761
\chHCN/H2O	15	0.4108	0.1283	7	0.3957	0.3796	22	0.4678	0.0281
\chNH3/H2O	10	0.1611	0.6565	4	0.1050	0.8950	14	-0.2537	0.3815

Table 2: Table showing the results of the Pearson correlation tests between abundance of parent species and nucleus size, described in Section 3. For each species abundance (with respect to \chH2O) we present the correlations for dynamical subsets of the data, ecliptic comets and nearly isotropic comets, as well as the results for all comets. For each group we state the number of comets analysed, the Pearson correlation coefficient and the associated $p$-value of the correlation test. We highlight the species and dynamical groups in order of Pearson correlation significance, i.e. by $p$-value. The strongest significance correlations ($p\leq 0.003$, equivalent to a 3-sigma threshold) are indicated by dark grey cells with white text. Moderate significance ($0.003<p\leq 0.05$, 2-sigma) results are indicated by grey cells with white text. Results with only marginal significance ($0.05<p\leq 0.32$, 1-sigma) are shaded light grey. All other results have been deemed to be statistically insignificant in this analysis (plain white cells).

3.2 The critical influence of heliocentric distance

Cometary activity is driven by solar heating and is therefore strongly correlated with the heliocentric distance. This is why we have assessed the abundance ratio of each species relative to a common volatile such as \chH2O rather than considering production rates directly. However, more volatile species are able to drive activity at lower temperatures and greater heliocentric distances. As such we must assess whether the correlations reported above are driven primarily by comet size or by heliocentric distance of the measurements. It is for example expected that the abundances of CO/H₂O and CO₂/H₂O increase for comets past 2-3 au as the water sublimation becomes less efficient (Dello Russo et al., 2016; Ootsubo et al., 2012). For other species, Langland-Shula & Smith (2011) reported a trend of decreasing C₂/CN ratio as comets moved away from the Sun, whereas A’Hearn et al. (1995), Cochran et al. (2012) and Fink (2009) did not report a similar trend. Dello Russo et al. (2016) report increases in the abundance of H₂CO, NH₃, and C₂H₂ within heliocentric distances of 0.8 au compared to the abundances measured between 1 and 2 au, which might be caused by an additional contribution from extended sources.

On the plots shown in Figure 1 the colour of the data points reflects the heliocentric distance of the observation. For most species, there is no obvious heliocentric trend in the plots. However, for some of the species which show the strongest correlations between abundance and size (\chCO, \chCO2, \chHCN) there are strong indications of a compositional dependence on heliocentric distance. In order to test this additional correlation, in Figure 2 we plot the abundance of these species (relative to \chH2O) against heliocentric distance. This is the same data as shown in Figure 1, with the exception of an outlying \chHCN/H2O measurement for C/2002 X5 made at $r_{h}=0.21\ \mathrm{au}$, which we have excluded as an outlier in terms of heliocentric distance. There is strong positive correlation between abundance of these species and heliocentric distance as indicated by the corresponding values of $\gamma$ and $p$ provided in Figure 2. In addition, we highlight these trends with a linear fit in log-log space.

In order to test the strength of the heliocentric distance dependence we considered the composition-size correlations for subsets of the data that have been limited to measurements made with heliocentric distances of $r_{h}<2\ \mathrm{au}$. This restricted range is selected to remove observations at large $r_{h}$ where the changes in the production rate of \chH2O due to reduced solar heating may skew the measured abundance of a particular species (see further discussion in Section 4). However, it must be noted that this selection criterion preferentially excludes some of the largest comets, primarily due to observational biases. Figure 3 and Table 3 show the results of this test and we see that the subset of \chCO/\chH2O abundances still displays a statistically significant Pearson correlation coefficient ($\gamma=0.750$, $p=0.0031$) at the 2-sigma level (albeit on the 3-sigma boundary). This implies that the correlation between \chCO/\chH2O and comet size dominates over the $r_{h}$ dependence. In contrast, the positive correlations of \chCO2/\chH2O and \chHCN/\chH2O disappear or are reduced in significance ($\gamma=-0.295$, $p=0.352$ and $\gamma=0.312$, $p=0.207$ respectively), implying that the correlations seen in Figure 1 are driven primarily by heliocentric distance effects for these species.

To further test the robustness of this correlation we repeated the analysis with statistical resampling of the 13 comets measured at $r_{h}<2\ \mathrm{au}$ in our \chCO/\chH2O dataset. We conducted a bootstrap resampling, i.e. sampling with replacement, and found that over the course of 10,000 repeats 52% of the resulting correlations were of 3-sigma significance. 93% of tests had a significance of 2-sigma or stronger. We also conducted a jack-knife resampling, where the test is repeated with a given data point dropped in turn. In this test the correlation had 3-sigma significance 23% of the time and all permutations resulted in at least a 2-sigma correlation. The weakest correlation occurred when C/1995 O1 was excluded, as would be expected given that this is the largest comet in the dataset, however the overall correlation was still moderate ($\gamma=0.68$, $p=0.016$). Overall these resampling tests show that the correlation between \chCO/\chH2O and nucleus size is relatively robust for the given dataset and is not overly dominated by a particular comet. However we acknowledge that our dataset is limited by its small size and the inherent difficulties in accurately determining the size and composition of cometary nuclei. The veracity of this correlation would be greatly strengthened with more measurements and improved estimates of the \chCO abundances in particular, given the variation in literature values for some comets.
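The two resampling tests can be sketched as below. This is a minimal standard-library illustration under stated assumptions: the 13 data points are synthetic (not the database values), the function names are hypothetical, and the $p$-value uses the same Student-$t$ form as a standard Pearson test, integrated numerically.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0 for a degenerate (constant) sample."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # resample with no spread: treat as no correlation
    return sxy / math.sqrt(sxx * syy)

def pearson_p(r, n, steps=200):
    """Two-sided p-value (Student-t, n - 2 degrees of freedom) via Simpson's rule."""
    if n < 3:
        return 1.0
    nu = n - 2
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    a, b = math.asin(min(1.0, abs(r))), math.pi / 2
    h = (b - a) / steps
    f = lambda t: math.cos(t) ** (nu - 1)
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return min(1.0, 2 * c * total * h / 3)

def bootstrap_fractions(x, y, repeats=10000, seed=0):
    """Fraction of resamples (with replacement) reaching the 3- and 2-sigma thresholds."""
    rng = random.Random(seed)
    n = len(x)
    counts = {0.003: 0, 0.05: 0}
    for _ in range(repeats):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        p = pearson_p(pearson_r(xs, ys), n)
        for alpha in counts:
            counts[alpha] += p <= alpha
    return {alpha: c / repeats for alpha, c in counts.items()}

def jackknife_p_values(x, y):
    """p-value of the correlation with each data point dropped in turn."""
    out = []
    for k in range(len(x)):
        xs, ys = x[:k] + x[k + 1:], y[:k] + y[k + 1:]
        out.append(pearson_p(pearson_r(xs, ys), len(xs)))
    return out

# Synthetic log10(radius / km) and log10(CO/H2O) for 13 hypothetical comets.
radius_km = [0.4, 0.6, 0.9, 1.2, 1.6, 2.0, 2.4, 3.0, 4.5, 5.6, 8.0, 12.0, 30.0]
co_h2o = [0.003, 0.012, 0.005, 0.020, 0.008, 0.040, 0.015,
          0.010, 0.060, 0.030, 0.045, 0.120, 0.230]
log_radius = [math.log10(v) for v in radius_km]
log_abund = [math.log10(v) for v in co_h2o]

fractions = bootstrap_fractions(log_radius, log_abund, repeats=2000)
jackknife = jackknife_p_values(log_radius, log_abund)
print("bootstrap fraction at 3-sigma:", fractions[0.003])
print("bootstrap fraction at 2-sigma or stronger:", fractions[0.05])
print("weakest jack-knife p-value:", max(jackknife))
```

The fixed seed makes the bootstrap reproducible; the fractions it prints depend entirely on the invented inputs and are not comparable to the percentages quoted in the text.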

	Ecliptic Comets			Nearly Isotropic Comets			All Comets
Species	Number	Correlation	$p$ -value	Number	Correlation	$p$ -value	Number	Correlation	$p$ -value
\chC2H2/H2O	5	0.8573	0.0633	4	0.9537	0.0463	9	0.4411	0.2347
\chC2H6/H2O	10	-0.6085	0.0619	4	0.6838	0.3162	14	0.0340	0.9083
\chCH3CN/H2O	4	-0.4120	0.5880	2	1.0000	1.0000	6	-0.0430	0.9356
\chCH3OH/H2O	13	-0.2809	0.3526	5	0.2691	0.6616	18	-0.0162	0.9493
\chCH4/H2O	5	-0.5520	0.3347	4	0.8068	0.1932	9	0.4084	0.2752
\chCO2/H2O	10	-0.0354	0.9227	2	1.0000	1.0000	12	-0.2947	0.3524
\chCO/H2O	6	0.6573	0.1560	7	0.5990	0.1553	13	0.7503	0.0031
\chH2CO/H2O	8	0.7575	0.0295	5	0.5971	0.2877	13	0.3832	0.1962
\chH2S/H2O	6	-0.4590	0.3599	2	1.0000	1.0000	8	0.0712	0.8669
\chHCN/H2O	13	-0.1625	0.5958	5	0.9446	0.0155	18	0.3124	0.2069
\chNH3/H2O	9	0.1157	0.7669	4	0.1050	0.8950	13	-0.2562	0.3981

Table 3: Similar to Table 2, here we show the Pearson correlation coefficients for the composition-radius relations for the parent species shown in Figure 1. As described in Section 3.2 only compositions with $r_{h}<2\ \mathrm{au}$ were considered.

	Ecliptic Comets			Nearly Isotropic Comets			All Comets
Species	Number	Correlation	$p$ -value	Number	Correlation	$p$ -value	Number	Correlation	$p$ -value
\chC2H2/H2O	2	-1.0000	1.0000	4	0.9537	0.0463	6	0.4825	0.3324
\chC2H6/H2O	7	0.3974	0.3774	4	0.6838	0.3162	11	0.7194	0.0126
\chCH3CN/H2O	2	1.0000	1.0000	2	1.0000	1.0000	4	0.4053	0.5947
\chCH3OH/H2O	9	0.3317	0.3832	5	0.2691	0.6616	14	0.3563	0.2112
\chCH4/H2O	4	-0.1136	0.8864	4	0.8068	0.1932	8	0.6835	0.0616
\chCO2/H2O	6	0.7001	0.1214	2	1.0000	1.0000	8	0.1737	0.6809
\chCO/H2O	4	-0.1458	0.8542	7	0.5990	0.1553	11	0.6136	0.0447
\chH2CO/H2O	6	0.7043	0.1182	5	0.5971	0.2877	11	0.2525	0.4537
\chH2S/H2O	4	0.0417	0.9583	2	1.0000	1.0000	6	0.4129	0.4158
\chHCN/H2O	9	0.0371	0.9245	5	0.9446	0.0155	14	0.5512	0.0411
\chNH3/H2O	6	0.5948	0.2131	4	0.1050	0.8950	10	-0.1333	0.7135

Table 4: Similar to the results shown in Table 3 we display the Pearson correlation coefficients for the relation between parent species abundance and nucleus radius. In this analysis we have considered only measurements with $r_{h}<2\ \mathrm{au}$ and made a further cut to exclude comets with radii $<1\ \mathrm{km}$; these smaller objects are more likely to be collisional fragments as opposed to primordial nuclei.

3.3 Potential significance of a minimum size cutoff

3.3.1 Dependence on unknown fragmentation history

An important assumption in this work is that the cometary nuclei in the dataset have not had their internal composition and structure significantly altered since their formation, e.g. by collisional or tidal events. However it has been proposed that many small Solar System bodies could be fragments of larger, primordial parent bodies, either through past collisions (Morbidelli & Rickman, 2015), or tidal/rotational breakup (Boehnhardt, 2002). Due to the limited size of our dataset, and the difficulties in definitively determining the history of small bodies without detailed in situ analysis, we have thus far only excluded the comets with a known fragmentation history from our analysis.

Here we perform a brief test to determine the influence of potential unknown fragmentation history in the data. This could distort the abundances displayed by the smallest comets in our dataset and therefore skew the results, altering the real composition-size correlation by reflecting the abundance of the larger primordial parent objects instead. In order to check for this possibility, we have repeated the Pearson correlation analysis from the previous section, but in addition to the $r_{h}<2\ \mathrm{au}$ restriction we also removed any comets with radius $<1\ \mathrm{km}$. The 1 km cutoff was chosen arbitrarily to represent exceptionally small nuclei.

The results of this test are displayed in Table 4. In comparison to Table 3 we see that the removal of the smallest comets in the dataset leads to more significant correlations for some abundances, e.g. \chC2H6/\chH2O with a decrease in $p$-value of $0.908\rightarrow 0.012$. However, the significance of the correlation for other species is reduced when these objects are excluded. For example, for \chCO/\chH2O the $p$-value increased from $0.003\rightarrow 0.045$. Given that the sample now consists of a smaller number of comets, a reduction of statistical significance is generally to be expected. Despite this, the correlation became moderately significant (2-sigma according to the thresholds of Section 3.1) for ethane, which is among the species with the lowest sublimation temperature beyond that of methane.

3.3.2 Dependence on volatility

For species that are much less volatile than \chCO, e.g. \chCO2 and \chHCN, the thermophysical models of Malamud et al. (2022) predict that significant migration and differentiation requires higher temperatures and thus larger cometary nuclei. As such, a correlation between these abundances and size might only be relevant beyond a certain size threshold.

In order to test this hypothesis we consider the correlation of subsets of the dataset limited by comet radius for \chCO2/\chH2O and \chHCN/H2O. In each test we select only the comets in the dataset with radius $>0,1,2,\ldots,9\ \mathrm{km}$ and determine the Pearson correlation coefficient as before, the results of which are shown in Figure 4. For \chCO2/\chH2O the correlation stays approximately the same and the $p$-value increases as more data points are removed; the Pearson correlation test on fewer data points produces less significant results as one would expect. Therefore we cannot easily determine if this correlation is driven primarily by the size of the comet nucleus, or if it is being influenced by the more complicated size-heliocentric distance observational bias (Figure 2). For \chHCN/H2O the results of this correlation test fluctuate significantly, indicating that this dataset is strongly influenced by the specific objects being considered. This test implies that both datasets would benefit greatly from being larger and having improved coverage across the range of comet sizes, with more accurate measurements taken in a uniform manner and preferentially from observations at distances less than 2 au.

4 Discussion

4.1 A model for explaining the correlation between CO and size

The findings in Section 3 show that the clearest indication of correlation between size and composition exists within the activity of CO, confirming our first prediction in Section 1. In what follows we attempt to reconcile the composition-size trend with the theoretical model of Malamud et al. (2022).

In Section 1 we described the likely mode of transport of hyper-volatile gases within the nucleus. In particular, internal CO might have initially existed as either a pure ice condensate, or as a trapped gas within an amorphous ice host. In most comets the second option might be the more likely, as also newly indicated by the activity of comet 67P/C-G (Rubin et al., 2023), but regardless of which of the two options is correct, internal radiogenic heating would drive migration outwards. Beyond a certain temperature threshold, CO would either sublimate or be released due to the phase transitions of amorphous hosts such as CO₂ or H₂O. While migrating out, it would encounter cold matrices of pristine amorphous ice. It can therefore become re-incorporated in the amorphous ice host as trapped gas. Multiple lab experiments have demonstrated this to occur. Unlike co-deposition, which is the entrapment of high-volatility gases during the deposition of the amorphous host itself, the process we refer to above is often called sequential deposition (sometimes also entrapment via gas-flow or gas-streaming). It relates to gas which was streamed into an already existing, pre-deposited amorphous ice host. Seminal lab studies have shown that it is an effective way to trap high-volatility species (Bar-Nun et al., 1985; Laufer et al., 1987; Bar-Nun et al., 1987, 1988).

Unfortunately, the Malamud et al. (2022) code was not explicitly designed to treat the incorporation or release of high-volatility gases into or out of amorphous water ice. It can currently only handle the sublimation and deposition of some species such as CO or CH₄, but only as pure condensates. We shall therefore only employ an approximate calculation, giving us a rough estimation of how much CO should be re-incorporated into amorphous ice and in turn quantify the degree of near-surface amorphous ice CO-enrichment, as a consequence.

For the initial state of the comet, prior to any heating, we assume that the internal composition is uniform. Consider that some small fraction of CO is trapped within the amorphous ice hosts – CO₂ or H₂O – and is released during their respective crystallisation phase transitions. For simplicity, here we neglect the possibility of CO initially present as pure ice condensate, because these two scenarios are related. Then, following the aforementioned phase transitions, released CO migrates out towards the surface, and it can become re-trapped within the ice in its path when the temperature is sufficiently cold, but still higher than the temperature of its deposition as pure condensate. Sequential deposition of CO leads to enrichment of the CO fraction stored in the amorphous ice.

Assuming that the comets we observe probe the CO/H₂O ratio of trapped hyper-volatiles as envisioned above, we can attempt to interpret the enrichment pattern of larger comet nuclei using the theoretical model of Malamud et al. (2022). In their work, Figures 5 through 13 showed the distribution of internal temperatures in the comet as a function of various realisations of the model parameters. The two most important model parameters were the nucleus size and the comet formation time. The temperatures correlated with the nucleus size and anti-correlated with formation time. The internal temperatures within different volume fractions in the nucleus were depicted according to a colour scheme. The black colour in those figures represented relatively pristine material heated below 70 K (all amorphous ices are stable against phase transitions); red depicted temperatures in the range $70\,\mathrm{K}<T<100\,\mathrm{K}$ (allowing CO₂ amorphous ice to release trapped CO gas when undergoing a phase transition); orange depicted temperatures in the range $100\,\mathrm{K}<T<170\,\mathrm{K}$ (allowing H₂O amorphous ice to likewise release its trapped gases); yellow and white correspond to even higher temperature thresholds (also corresponding to full release of all trapped CO). We assume that the released CO gas flows toward the pristine (black coloured) volume fraction, which is closer to the surface. As already pointed out, this layer is sufficiently cold to keep its amorphous ices in their pristine state, while allowing the excess CO gas to become re-entrapped there, enriching the CO abundance. For simplicity, we assume that these layers are enriched with CO uniformly. Given the aforementioned dependence on model parameters, it immediately follows that the degree of enrichment is greater for large comets and/or comets with an earlier formation time.

In order to compare the model enrichment to the observations, we will derive a simple mathematical formula based on the assumptions above. In Figures 5, 6 & 7 we plot the model predictions alongside the observed data points. For the comparison we use observed data points from Figure 3. It must be noted that Figure 1 considers the CO/H₂O mass ratio of comets up to an observed distance of 6 au. However, comets that are observed beyond 2 au are not able to directly sublimate significant amounts of water ice. In contrast, they are certainly able to expel trapped CO gas, released through crystallisation of the amorphous ice hosts. While water can still be expelled to some extent as a byproduct of hyper-volatile activity, we can expect the CO/H₂O ratio to be enhanced. Therefore, beyond 2 au, this ratio should not be indicative of the intrinsic mass fraction of entrapped CO within the amorphous host ice at the surface. Indeed, Figure 1 shows that the peak observed CO/H₂O ratios are in excess of 1. This is only possible due to the large observed distance, because the amorphous ice host cannot contain more trapped gas than matrix (Carmack et al., 2023). Figure 3 on the other hand, shows only the comets whose distance from the Sun at the time of observation is less than 2 au. It should therefore be more indicative of the intrinsic properties of the amorphous ices, which is why we have chosen to use it for the comparison.

In order to plot the theoretical enrichment curves, we first express Figures 5-13 of Malamud et al. (2022) in terms of mass rather than volume. We then define the following free parameters with respect to mass: $f_{\rm H2O}$ - the mass fraction of amorphous H₂O ice in the nucleus; $f_{\rm CO2}$ - the mass fraction of amorphous CO₂ ice in the nucleus; and $f_{\rm trapCO}$ - the initial mass fraction of trapped CO in the amorphous H₂O or CO₂ host ices (assumed to be equal for simplicity).

Comets are presently regarded to be highly refractory-rich bodies, with refractory-to-ice mass ratios in the approximate range of 3-5 (Rotundi et al., 2015; Fulle et al., 2016, 2017, 2019; Choukroun et al., 2020), and comet 67P/C-G, the most well-studied cometary archetype (Fulle et al., 2016; Filacchione et al., 2019; Groussin et al., 2019), has a refractory-to-ice mass ratio of about 4. We therefore use a combined ice mass fraction of $f_{\rm H2O}+f_{\rm CO2}=0.2$ to comply with these estimates. For the mass ratio between H₂O and CO₂ we also rely on estimates from comet 67P/C-G, with a respective ratio of $\sim$ 15 (Rubin et al., 2023). For our fiducial parameter set we thus have: $f_{\rm H2O}=0.1875$ and $f_{\rm CO2}=0.0125$.
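The fiducial values follow directly from the two adopted ratios. As a worked check, taking the refractory-to-ice mass ratio as exactly 4 and the H₂O to CO₂ mass ratio as exactly 15 (the rounded values quoted above):

```latex
\begin{align}
f_{\rm H2O}+f_{\rm CO2} &= \frac{1}{1+4} = 0.2, \\
f_{\rm CO2} &= \frac{0.2}{1+15} = 0.0125, \\
f_{\rm H2O} &= 15\,f_{\rm CO2} = 0.1875 .
\end{align}
```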

The choice of $f_{\rm trapCO}$ is motivated by the raw observed data. Figure 3 shows that the smallest CO/H₂O number fraction, in the smallest comet nucleus, is around 0.003. Were comets to be completely pristine, this would have given us the approximate value of $f_{\rm trapCO}$ (amorphous water ice near the surface sublimates along with its entrapped CO, which is uniformly distributed throughout the whole nucleus). However, Figures 5-13 in Malamud et al. (2022) show that even comets with radii as small as 0.5 km can still attain temperatures in excess of 70 K deep beneath the surface, despite their small size (but only when minimising their formation time). Therefore, even in small nuclei some degree of migration and enrichment of CO is possible, and in such cases the incipient CO/H₂O could be slightly smaller than 0.003. To account for this, we choose a round value of CO/H₂O=0.001, slightly lower than yet characteristic of the 0.003 minimum observed. From this we obtain the mass fraction $f_{\rm trapCO}$ (multiplying by the molecular weight ratio - see below), capturing the right order of magnitude based on the minimum observed CO fraction.

Using $F1$, $F2$ and $F3$ to denote the mass fractions of layers within comet nuclei that have $T<70$ K, $70<T<100$ K and $T>100$ K respectively, obtained from Malamud et al. (2022), we can calculate the fraction of CO released from the bulk of the comet, denoted as

$${\rm CO}_{\rm bulk}=F3\cdot f_{\rm H2O}\cdot f_{\rm trapCO}+(F2+F3)\cdot f_{\rm CO2}\cdot f_{\rm trapCO} \tag{1}$$

Using Eq. 1, the fraction of CO newly trapped inside the pristine amorphous water ice, i.e. the degree of its enrichment, denoted by $f_{\rm enrichCO}$ , is approximately given by

$$f_{\rm enrichCO}\cong\left(\frac{F1\cdot(f_{\rm H2O}+f_{\rm CO2})\cdot f_{\rm trapCO}+{\rm CO}_{\rm bulk}}{F1\cdot f_{\rm H2O}}\right)\frac{m_{\rm H2O}}{m_{\rm CO}} \tag{2}$$

where $m_{\rm H2O}=18$ and $m_{\rm CO}=28$ are the molecular weights of H₂O and CO molecules. The molecular weight ratio is required in order to go from mass fraction to number fraction, as in our reported values from observations. Note that the actual ratios in the coma also depend on the relative lifetimes of the molecules in the coma, an effect which we do not consider here. Therefore, Eq. 2 has to be taken only as a first-order approximation, but this approximation is good enough to capture the trend in the data. Recall that $f_{\rm trapCO}$ denotes the initial uniform fraction of trapped CO within the amorphous ice matrices (assumed equal for H₂O and CO₂), whereas $f_{\rm enrichCO}$ denotes the final enriched ratio in the remaining H₂O amorphous ice.
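Eqs. 1 and 2 can be evaluated with a short sketch such as the one below. It is not the authors' code: the function names are illustrative, and the layer fractions used in the second example (0.5, 0.2, 0.3) are hypothetical placeholders rather than outputs of the Malamud et al. (2022) model. The defaults are the fiducial $f_{\rm H2O}$, $f_{\rm CO2}$ and the CO/H₂O=0.001 number fraction quoted in the text.

```python
M_H2O = 18.0  # molecular weight of H2O
M_CO = 28.0   # molecular weight of CO

def co_bulk(F2, F3, f_h2o, f_co2, f_trap_co):
    """Eq. 1: CO released from the heated bulk of the nucleus."""
    return F3 * f_h2o * f_trap_co + (F2 + F3) * f_co2 * f_trap_co

def f_enrich_co(F1, F2, F3, f_h2o=0.1875, f_co2=0.0125,
                f_trap_co=0.001 * M_CO / M_H2O):
    """Eq. 2: enriched CO/H2O number fraction in the pristine layer.

    F1, F2, F3 are the mass fractions of the layers with T < 70 K,
    70 < T < 100 K and T > 100 K.
    """
    if F1 <= 0.0:
        raise ValueError("Eq. 2 needs a non-zero pristine layer (F1 > 0)")
    retained = F1 * (f_h2o + f_co2) * f_trap_co
    released = co_bulk(F2, F3, f_h2o, f_co2, f_trap_co)
    return (retained + released) / (F1 * f_h2o) * M_H2O / M_CO

# Fully pristine nucleus: nothing has migrated.
pristine = f_enrich_co(1.0, 0.0, 0.0)
# Hypothetical heated nucleus (placeholder fractions, for illustration only).
heated = f_enrich_co(0.5, 0.2, 0.3)
print(f"pristine: {pristine:.5f}, heated: {heated:.5f}")
```

In the pristine limit the result stays close to the adopted incipient CO/H₂O value, and it grows as a larger share of the nucleus is heated above the release temperatures, which is the size dependence the model relies on.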

Figures 5-7 show the CO/H₂O abundances predicted by Eq. 2. Different lines depict different formation times for each comet (quicker formation corresponds to greater heating by short-lived radionuclides), and the observations are marked by the full circles, for comparison. A detailed explanation of all the model parameters is given in Malamud et al. (2022). Here we provide a brief explanation. The mineral fraction is introduced to the model since radionuclides are only incorporated into refractory silicate minerals, and not organics. Silicate minerals might not be present in comets in the same proportion as in meteorites, from which the radionuclide information is derived; we therefore also consider 50% and 5% of the meteoritic fraction. The pebble radius controls heat and mass transport inside the comet, and we take a binary selection for the pebble radii of 1 mm and 1 cm. This choice roughly represents the lower and upper limits expected in the literature. The permeability $b$ coefficient is related to the Knudsen diffusivity and in turn gas permeability and flow within the comet. For this coefficient we also consider lower and upper limit values.

It is encouraging that a physical interpretation, however approximate, captures the observed CO/H₂O trend well. If some comets formed earlier than others, the various curves span the range of the observations. We note that only one set of parameters is adopted for $f_{\rm H2O}$, $f_{\rm CO2}$ and $f_{\rm trapCO}$ in these plots; however, our choice of parameters was physically motivated by 67P, as explained above. We also experimented with changing the $f_{\rm H2O}$:$f_{\rm CO2}$ mass ratio considerably, and we always find that as long as H₂O is the dominant amorphous ice host, the model keeps capturing the trend with some small variations. It is indeed expected that water is much more prevalent than carbon dioxide in (the bulk of) comet nuclei.

Based on Figures 5-7 one might also speculate further that the peak CO/H₂O ratio found for mid-to-large sized comets can be more readily explained (a) by early formation; or (b) if the mineral fraction is not as small as 0.05. In addition, the spread in CO/H₂O ratios at each size bin may indicate that comets formed over an extended period of time and/or have a large variation in mineral fraction. Current data do not provide an easy way to differentiate between the various options, but there is indeed some indication that comet nuclei have varied mineral fractions. Recently, Spitzer remote observations of cometary dust revealed a wide range of amorphous carbon mass fractions spanning 10-90%, based on a sample of a few dozen comets (Harker et al., 2023). While the mean value of 54% indicates that the mass ratio of silicate minerals to organics is, on average, around 1:1, i.e. very similar to comets 67P/C-G (Bardyn et al., 2017), C/2013 US₁₀ (Catalina) (Woodward et al., 2021) and Halley (Fomenkova & Chang, 1993), a spread in the mineral fraction is currently supported. We think that this point is very important and we strongly advocate for future study of the ratio of silicates/organics in cometary dust.

An additional point is that low CO/H₂O ratios in certain comets should simply be a sign that the CO-enriched layers have been largely removed already, indicating that these comets are more dynamically evolved than their high CO/H₂O ratio counterparts. A prominent example is the comet 2P/Encke, which we know has been active for at least a few centuries (Marsden & Sekanina, 1974). Thus, at each size bin, if a comet was observed to have a CO/H₂O ratio in the upper part of the spread, it might also be considered a sign of having a relatively fresh dynamical origin.

These plots also reveal that the pebble size and the $b$ coefficient are of lesser importance compared to other parameters, echoing the conclusions already suggested by Malamud et al. (2022).

4.2 Other species less volatile than CO

An intriguing result is the presence of a significant correlation between size and CO/H₂O ratio, alongside a lack of correlation, or a weaker correlation, for the ratios of some other volatiles. There are two possible explanations. The first might be a simple lack of observations, given that we have a rather small statistical sample for many species. The second explanation is much more fundamental.

We hypothesise that most volatiles released from their amorphous ice hosts would be buried deeper inside the comet, hiding from our sight as only the outermost surface layers are being eroded through activity. Of the many hyper- and super-volatiles that we consider in this work (incorporated into our sample because it is possible to observe them via telescope surveys), only CO and CH₄ have extremely low sublimation/deposition temperatures (Womack et al., 2017). That being the case, the other hyper-volatiles would encounter temperatures that should lead to their re-incorporation into amorphous ice before CO and CH₄, and the super-volatiles could even deposit as pure ice condensates, when they have characteristic sublimation temperatures that are higher than the crystallisation temperatures of the host amorphous ice. The resulting outcome is that they are buried deeper within the comet, somewhere between the surface and the location of their release by radiogenic heating.

The explanation is qualitatively straightforward and depends on temperature. The surface temperature of an active comet in the inner Solar System is determined by its exact heliocentric distance. However, below the skin depth the temperature is much lower, and is a relic of the comet's previous location before it was perturbed into the inner Solar System. For example, if it came from the Kuiper belt, this temperature is certainly lower than the temperature of crystallisation of amorphous ice, but often not lower than the deposition temperature of CO and CH₄ (Lisse et al., 2021; Parhi & Prialnik, 2023). We therefore expect the main volatiles which are not CO or CH₄ to be buried deeper below the surface of the comet. Their release location within the comet corresponds to the inner cubic-amorphous ice boundary. In this context, inner refers to the boundary that forms as a result of an internal temperature gradient due to radiogenic heating from within. It should not be confused with the (outer) amorphous-cubic boundary that might form externally by a heat wave propagating inwards from the insolated comet surface (triggering crystallisation). Their exact burial location is a function of the characteristic deposition temperature (e.g., HCN would be buried deeper than C₂H₆).

One interpretation of our results therefore might be that most active comets are still eroding their outermost layers, which are not significantly enriched by any of these less volatile gases. It should be noted that some species, such as HCN (see Figure 3), do exhibit a much more moderate slope, in contrast to the steep slope we obtained for CO. This could still be reconciled with our hypothesis, since comet nuclei are not spherically symmetric, nor is their surface eroded homogeneously. In reality only small fractions of the comet surface might be eroded to expose deeper layers, so the integrated result for the entire comet circumference gives the outcome that larger nuclei also release more of these gases, but the slope is moderated by these geometrical factors.

This raises the question, however, of why CH₄ does not give us the same slope as CO. While CH₄ indeed sublimates at a slightly higher temperature, the small difference is not a likely explanation. A more robust explanation involves the trapping efficiency of these two gases in amorphous ice. Bar-Nun et al. (1988) have shown that when both CO and CH₄ gases are streamed into a pre-deposited amorphous ice, the entrapment of CH₄ is 150 times more efficient than that of CO. (This result is for a temperature of 50 K and when ample CO and CH₄ gas is used in the experiment; CH₄ and CO molecules differ in both their size and energy of interaction with the host ice due to their polarity, leading to the greater ease of trapping of CH₄.) For comets this would mean that sequential deposition of these two gases, when they are released together from the underlying crystallising ice, leads to the preferential trapping of CH₄, while CO continues to flow until it sees no competition from CH₄. The expectation is therefore that CO lies closer to the surface, while CH₄ is buried deeper. A final and trivial explanation is the small amount of data available for CH₄, only 10 comets in total, and the larger dispersion of the measurements. This latter comment is, however, true in general for virtually all the other species as well.

4.3 A note about cometary outbursts

We briefly note that cometary outbursts at large heliocentric distances, post perihelion passage, are often associated with a heat wave propagating inwards from an insolated surface to a deeply buried crystalline-amorphous ice boundary. Upon reaching this boundary with sufficient energy, crystallisation releases latent heat as well as entrapped gases. Latent heat is important because it can trigger further crystallisation, a process which, however, cannot continue indefinitely, since the eventual sublimation of ice absorbs a large amount of energy. Trapped gases are important since they are effectively the cause of the outbursts. On release, these gases lead to a build-up of pressure, and only when this pressure exceeds the tensile strength of the ambient solid materials does it lead to cracking and possibly more rapid expulsion of gas (Prialnik & Bar-Nun, 1987, 1992).

The current study does not alter this basic picture in any way. However, we have envisioned here the formation of localised spots of amorphous ice, highly enriched with various high-volatility species. These are buried at various distances from the surface, based on the sublimation properties of each particular volatile, after having been concentrated from the bulk of the comet. For a spherically symmetric nucleus, an onion-like stratification might be expected. Yet in reality comet nuclei have irregular shapes, which means that these pockets might be rather more sporadically placed. The significance in relation to our study is simply to justify the large fraction of high-volatility species required for an enhanced outburst.

4.4 Other predictions

In Section 1 we also presented two additional predictions besides the general size-composition correlation. We suggested that different dynamical classes of comets might exhibit significant differences in their size-composition correlation, as a result of differences in their erosional state. Our findings in Section 3, however, cannot confirm this hypothesis. We believe that this could certainly be due to insufficient statistics to draw a conclusion.

We additionally speculated that dynamically evolved active comets are smaller than their long period counterparts. Figure 3 shows that this is strictly correct (the triangles tend to be located more to the right, whereas the squares are positioned more to the left, within each sub-plot). However, we caution that this result might also simply be an observational bias: ECs are generally less active and can be observed while inactive at smaller distances than NICs, and it is therefore easier to determine their sizes. At this time we require more data to confirm all our other predictions.

4.5 Future observations

In order to improve comparisons between models and comet composition in the future, more data are necessary. In particular, we need larger homogeneous datasets for comet sizes but also comet composition information where the abundance of species is measured simultaneously (or as close to that as possible), at different distances from the Sun, and in a consistent way in terms of observational techniques and models used to derive production rates. With CO and CO₂ being the most abundant volatiles in comets, abundance measurements of these species for a larger number of comets for which we have size information are particularly critical. Additional measurements of CH₄ and N₂ abundances would also be very valuable, as these species have sublimation temperatures close to that of CO. N₂ abundances were not presented in this work as they are extremely difficult to measure. This species was only detected in situ in the coma of comet 67P/C-G by the ROSINA instrument onboard the Rosetta mission (Rubin et al., 2015). While N₂ itself is difficult to detect, N$_2^+$ can be observed at optical wavelengths from the ground and has been observed in a handful of comets (Korsun et al., 2008; Ivanova et al., 2018; Cochran & McKay, 2018; Opitom et al., 2019). This can then be used to infer the abundance of N₂ in comets. More N₂ or N$_2^+$ measurements would be very valuable in the future. In general, this type of work would particularly benefit from a more substantial sample of large comets for which composition information is available. Indeed, this database only contains a handful of comets larger than 5 km with composition information.

5 Conclusions

In this manuscript, following predictions of a model from Malamud et al. (2022), we gathered a large number of literature data to search for correlations between the size and composition of comets.

• For the dataset we have gathered we found a statistically significant correlation between the CO/H₂O abundance ratios and the sizes of both ecliptic and nearly isotropic comets. This trend persists even when selecting for comets observed within 2 au from the Sun, indicating it is not driven by changes in the abundance ratios with heliocentric distance.

• A weaker correlation was also observed for some other volatile species, however further tests indicate that our analysis would critically benefit from obtaining a bigger statistical sample in the future.

• We do not see any strong correlations for daughter species.

• We do not see a similarly strong correlation for CH₄, in spite of having a comparable sublimation temperature to that of CO.

We developed a simple theoretical framework based on the Malamud et al. (2022) model, with which we reproduce the CO/H₂O abundance-to-size trend in our observed data. In this framework we consider CO to migrate from the bulk of the nucleus outwards, becoming entrapped within its outer amorphous ice layers, and in turn enhancing their CO-enrichment as a function of the nucleus size.

We emphasise that the correlation between CO/H₂O abundance and size appears to be robust for the dataset we have presented, where we have gathered together a wide range of measurements from the available literature. However, this dataset is ultimately limited by its size and also by the intrinsic difficulties in accurately determining the physical properties of cometary nuclei from a variety of observations and techniques. This study would have benefited from, and therefore strongly motivates, a larger homogeneous set of composition measurements in the future, in particular for highly volatile species like CO, CH₄, or N₂. State of the art observatories, e.g. JWST and the upcoming Vera C. Rubin Observatory and ELT, could provide more opportunities to characterise the physical properties of cometary nuclei, especially the sizes of long period comets which are otherwise sparse in the literature.

Acknowledgements

The authors would like to thank the referee for a thorough review that helped to improve the work. We wish to thank Diana Laufer for providing information about the relative entrapment efficiency of CO versus CH₄ in amorphous water ice. We also thank Rosita Kokotanekova for valuable input on the selection of comet sizes from literature sources. UM and HBP acknowledge support by the Niedersächsisches Vorab in the framework of the research cooperation between Israel and Lower Saxony under grant ZN 3630 and a grant by MOST-space. CO and JR acknowledge the support of the Royal Society. This work made use of the NASA SBDB service and PDS datasets ear-c-phot-5-rdr-lowell-comet-db-pr-v1.0 (Osip et al., 2003) and urn:nasa:pds:compil-comet:nuc_properties::1.0 (Barnes et al., 2010). The following software packages were used in this work: matplotlib (Hunter, 2007), numpy (Harris et al., 2020), scipy (Virtanen et al., 2020), pandas (McKinney, 2010), pds3 (Kelley, 2021), pds4_tools (Nagdimunov, 2021), astropy (Astropy Collaboration et al., 2022), astroquery (Ginsburg et al., 2019), sbpy (Mommert et al., 2019) and camelot (Mehta, 2021). For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.

Data Availability

The dataset constructed in this work is available online as Supplementary data and also from the University of Edinburgh DataShare repository (https://doi.org/10.7488/ds/7723). The data compiled in this work may also be obtained via reasonable e-mail request to the lead authors.
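As an illustration of how a size-composition correlation might be re-tested as the database grows, the sketch below computes a Spearman rank correlation with an exact permutation p-value using only the Python standard library. This is a swapped-in, generic technique and not necessarily the statistical test used in this work (see Section 3). The radii and CO/H₂O values are invented placeholders, not entries from the database, and the function names are hypothetical.

```python
from itertools import permutations

def ranks(values):
    """Rank values from 1..n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    """Spearman rank correlation coefficient for tie-free data."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def permutation_p_value(x, y):
    """Exact two-sided p-value: the share of all orderings of y whose
    |rho| is at least as large as the observed one (small samples only)."""
    observed = abs(spearman_rho(x, y))
    n_total = n_extreme = 0
    for shuffled in permutations(y):
        n_total += 1
        if abs(spearman_rho(x, shuffled)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total

# Placeholder values for illustration only (NOT taken from the database).
radius_km = [0.5, 1.0, 2.0, 3.0, 5.0, 10.0, 30.0]
co_h2o = [0.003, 0.01, 0.006, 0.03, 0.05, 0.04, 0.2]

rho = spearman_rho(radius_km, co_h2o)
p_value = permutation_p_value(radius_km, co_h2o)
print(f"rho = {rho:.3f}, p = {p_value:.4f}")
```

A rank-based statistic is unchanged by taking logarithms of either quantity, which suits data spanning several orders of magnitude. The exhaustive permutation loop is only practical for samples of about ten objects or fewer; larger samples would need random permutations or an analytic approximation.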

References

A’Hearn et al. (1995) A’Hearn M. F., Millis R. L., Schleicher D. G., Osip D. J., Birch P. V., 1995, Icarus, 118, 223
Altenhoff et al. (1999) Altenhoff W. J., et al., 1999, Astronomy and Astrophysics, 348, 1020
Astropy Collaboration et al. (2022) Astropy Collaboration et al., 2022, The Astrophysical Journal, 935, 167
Bar-Nun et al. (1985) Bar-Nun A., Herman G., Laufer D., Rappaport M. L., 1985, Icarus, 63, 317
Bar-Nun et al. (1987) Bar-Nun A., Dror J., Kochavi E., Laufer D., 1987, Phys. Rev. B, 35, 2427
Bar-Nun et al. (1988) Bar-Nun A., Kleinfeld I., Kochavi E., 1988, Phys. Rev. B, 38, 7749
Bardyn et al. (2017) Bardyn A., et al., 2017, MNRAS, 469, S712
Barnes et al. (2010) Barnes T. F., A’Hearn M. F., Kolokolova L., 2010, Properties of Comet Nuclei, Version 2.0, doi:10.26007/CSR5-JW43
Bauer et al. (2017) Bauer J. M., et al., 2017, The Astronomical Journal, 154, 53
Bischoff et al. (2021) Bischoff D., Gundlach B., Blum J., 2021, MNRAS, 508, 4705
Biver et al. (1999) Biver N., et al., 1999, The Astronomical Journal, 118, 1850
Biver et al. (2007) Biver N., et al., 2007, Icarus, 187, 253
Biver et al. (2011a) Biver N., et al., 2011a, in EPSC-DPS Joint Meeting 2011. p. 938
Biver et al. (2011b) Biver N., Bockelée-Morvan D., Colom P., Crovisier J., Paubert G., Weiss A., Wiesemeyer H., 2011b, Astronomy & Astrophysics, 528, A142
Biver et al. (2012) Biver N., et al., 2012, Astronomy & Astrophysics, 539, A68
Biver et al. (2019) Biver N., et al., 2019, Astronomy & Astrophysics, 630, A19
Biver et al. (2021a) Biver N., et al., 2021a, Astronomy & Astrophysics, 648, A49
Biver et al. (2021b) Biver N., et al., 2021b, Astronomy & Astrophysics, 651, A25
Biver et al. (2022a) Biver N., Russo N. D., Opitom C., Rubin M., 2022a, Chemistry of Comet Atmospheres (arxiv:2207.04800), doi:10.48550/arXiv.2207.04800
Biver et al. (2022b) Biver N., Boissier J., Bockelée-Morvan D., Crovisier J., Cottin H., Cordiner M. A., Roth N. X., Moreno R., 2022b, Astronomy & Astrophysics, 668, A171
Bockelée-Morvan et al. (2000) Bockelée-Morvan D., et al., 2000, Astronomy and Astrophysics, 353, 1101
Bockelée-Morvan et al. (2004) Bockelée-Morvan D., et al., 2004, Icarus, 167, 113
Bockelée-Morvan et al. (2010) Bockelée-Morvan D., et al., 2010, Astronomy and Astrophysics, 518, L149
Bockelée-Morvan et al. (2016) Bockelée-Morvan D., et al., 2016, Monthly Notices of the Royal Astronomical Society, 462, S170
Bockelée-Morvan et al. (2022) Bockelée-Morvan D., et al., 2022, Astronomy & Astrophysics, 664, A95
Bodewits et al. (2011) Bodewits D., Kelley M. S., Li J.-Y., Landsman W. B., Besse S., A’Hearn M. F., 2011, The Astrophysical Journal, 733, L3
Bodewits et al. (2022) Bodewits D., Bonev B. P., Cordiner M. A., Villanueva G. L., 2022, Radiative Processes as Diagnostics of Cometary Atmospheres (arxiv:2209.02616)
Boehnhardt (2002) Boehnhardt H., 2002, Earth Moon and Planets, 89, 91
Boehnhardt (2004) Boehnhardt H., 2004, in Comets II. University of Arizona Press, p. 301
Boehnhardt et al. (1999) Boehnhardt H., Rainer N., Birkle K., Schwehm G., 1999, Astronomy and Astrophysics, 341, 912
Boehnhardt et al. (2002) Boehnhardt H., et al., 2002, Astronomy & Astrophysics, 387, 1107
Boehnhardt et al. (2008) Boehnhardt H., Tozzi G. P., Bagnulo S., Muinonen K., Nathues A., Kolokolova L., 2008, Astronomy & Astrophysics, 489, 1337
Boissier et al. (2011) Boissier J., et al., 2011, Astronomy & Astrophysics, 528, A54
Boissier et al. (2013) Boissier J., et al., 2013, Astronomy & Astrophysics, 557, A88
Bonev et al. (2021) Bonev B. P., et al., 2021, The Planetary Science Journal, 2, 45
Buratti et al. (2004) Buratti B., Hicks M., Soderblom L., Britt D., Oberst J., Hillier J., 2004, Icarus, 167, 16
Capria et al. (2017) Capria M. T., et al., 2017, Monthly Notices of the Royal Astronomical Society, 469, S685
Carmack et al. (2023) Carmack R. A., Tribbett P. D., Loeffler M. J., 2023, ApJ, 942, 1
Choi et al. (2002) Choi Y.-J., Cohen M., Merk R., Prialnik D., 2002, Icarus, 160, 300
Choukroun et al. (2020) Choukroun M., et al., 2020, Space Sci. Rev., 216, 44
Cochran & McKay (2018) Cochran A. L., McKay A. J., 2018, ApJ, 854, L10
Cochran et al. (2012) Cochran A., Barker E., Gray C., 2012, Icarus, 218, 144
Collings et al. (2003) Collings M. P., Dever J. W., Fraser H. J., McCoustra M. R. S., Williams D. A., 2003, ApJ, 583, 1058
Combi et al. (2019) Combi M., Mäkinen T., Bertaux J.-L., Quémerais E., Ferron S., 2019, Icarus, 317, 610
Combi et al. (2021) Combi M. R., Mäkinen T., Bertaux J.-L., Quémerais E., Ferron S., 2021, The Astrophysical Journal Letters, 907, L38
Davidsson (2021) Davidsson B. J. R., 2021, Monthly Notices of the Royal Astronomical Society, 505, 5654
De Sanctis et al. (2001) De Sanctis M. C., Capria M. T., Coradini A., 2001, The Astronomical Journal, 121, 2792
Dello Russo et al. (2016) Dello Russo N., Kawakita H., Vervack R. J., Weaver H. A., 2016, Icarus, 278, 301
Dello Russo et al. (2020) Dello Russo N., et al., 2020, Icarus, 335, 113411
DiSanti et al. (2014) DiSanti M. A., Villanueva G. L., Paganini L., Bonev B. P., Keane J. V., Meech K. J., Mumma M. J., 2014, Icarus, 228, 167
DiSanti et al. (2017) DiSanti M., et al., 2017, Central Bureau Electronic Telegrams, 4357, 1
Drozdovskaya et al. (2023) Drozdovskaya M. N., et al., 2023, Astronomy & Astrophysics, 677, A157
Eisner et al. (2019) Eisner N. L., Knight M. M., Snodgrass C., Kelley M. S. P., Fitzsimmons A., Kokotanekova R., 2019, The Astronomical Journal, 157, 186
Faggi et al. (2019) Faggi S., Mumma M. J., Villanueva G. L., Paganini L., Lippi M., 2019, The Astronomical Journal, 158, 254
Faggi et al. (2021) Faggi S., Lippi M., Camarca M., Buzard C. F., Villanueva G. L., Doppmann G. W., Blake G. A., Mumma M. J., 2021, The Astronomical Journal, 162, 178
Farnham et al. (2017) Farnham T., Kelley M. S., Bodewits D., Bauer J. M., 2017, in AAS/Division for Planetary Sciences Meeting. p. 403.01
Feaga et al. (2013) Feaga L. M., et al., 2013, The Astronomical Journal, 147, 24
Feldman et al. (2018) Feldman P. D., et al., 2018, The Astronomical Journal, 155, 9
Fernández et al. (2013) Fernández Y. R., et al., 2013, Icarus, 226, 1138
Filacchione et al. (2019) Filacchione G., et al., 2019, Space Science Reviews, 215, 19
Fink (2009) Fink U., 2009, Icarus, 201, 311
Fink & Hicks (1996) Fink U., Hicks M. D., 1996, ApJ, 459, 729
Fomenkova & Chang (1993) Fomenkova M., Chang S., 1993, in Lunar and Planetary Science Conference. p. 501
Fomenkova et al. (1995) Fomenkova M. N., Jones B., Pina R., Puetter R., Sarmecanic J., Gehrz R., Jones T., 1995, The Astronomical Journal, 110, 1866
Fulle et al. (2016) Fulle M., et al., 2016, ApJ, 821, 19
Fulle et al. (2017) Fulle M., et al., 2017, MNRAS, 469, S45
Fulle et al. (2019) Fulle M., et al., 2019, MNRAS, 482, 3326
Gálvez et al. (2007) Gálvez Ó., Ortega I. K., Maté B., Moreno M. A., Martín-Llorente B., Herrero V. J., Escribano R., Gutiérrez P. J., 2007, Astronomy and Astrophysics, 472, 691
Gálvez et al. (2008) Gálvez Ó., Maté B., Herrero V. J., Escribano R., 2008, Icarus, 197, 599
Gicquel et al. (2015) Gicquel A., et al., 2015, The Astrophysical Journal, 807, 19
Ginsburg et al. (2019) Ginsburg A., et al., 2019, The Astronomical Journal, 157, 98
Groussin et al. (2010) Groussin O., Lamy P., Jorda L., 2010, Planetary and Space Science, 58, 904
Groussin et al. (2019) Groussin O., et al., 2019, Space Science Reviews, 215, 29
Gundlach et al. (2011) Gundlach B., Skorov Y. V., Blum J., 2011, Icarus, 213, 710
Gundlach et al. (2020) Gundlach B., Fulle M., Blum J., 2020, Monthly Notices of the Royal Astronomical Society, 493, 3690
Güttler et al. (2023) Güttler C., et al., 2023, MNRAS, 524, 6114
Harker et al. (2023) Harker D. E., Wooden D. H., Kelley M. S. P., Woodward C. E., 2023, The Planetary Science Journal, 4, 242
Harmon et al. (1997) Harmon J. K., et al., 1997, Science, 278, 1921
Harmon et al. (2008) Harmon J. K., Nolan M. C., Howell E. S., Giorgini J. D., 2008, International Astronomical Union Circular, 8909, 1
Harrington Pinto et al. (2022) Harrington Pinto O., Womack M., Fernandez Y., Bauer J., 2022, The Planetary Science Journal, 3, 247
Harris (1998) Harris A. W., 1998, Icarus, 131, 291
Harris et al. (2002) Harris W. M., Scherb F., Mierkiewicz E., Oliversen R., Morgenthaler J., 2002, The Astrophysical Journal, 578, 996
Harris et al. (2020) Harris C. R., et al., 2020, Nature, 585, 357
Herrero et al. (2010) Herrero V. J., Gálvez Ó., Maté B., Escribano R., 2010, Physical Chemistry Chemical Physics, 12, 3164
Hu et al. (2019) Hu X., Gundlach B., von Borstel I., Blum J., Shi X., 2019, Astronomy and Astrophysics, 630, A5
Hunter (2007) Hunter J. D., 2007, Computing in Science & Engineering, 9, 90
Ivanova et al. (2018) Ivanova O. V., Picazzio E., Luk’yanyk I. V., Cavichia O., Andrievsky S. M., 2018, Planet. Space Sci., 157, 34
Jewitt (2009) Jewitt D., 2009, The Astronomical Journal, 137, 4296
Jewitt (2022) Jewitt D., 2022, The Astronomical Journal, 164, 158
Jorda et al. (2016) Jorda L., et al., 2016, Icarus, 277, 257
Kelley (2021) Kelley M., 2021, pds3, https://github.com/mkelley/pds3
Kelley & Kolokolova (2014) Kelley M., Kolokolova L., 2014, in Proceedings of Asteroids, Comets, Meteors 2014. p. 262
Knight et al. (2023) Knight M. M., Kokotanekova R., Samarasinha N. H., 2023, Physical and Surface Properties of Comet Nuclei from Remote Observations, doi:10.48550/arXiv.2304.09309
Korsun et al. (2008) Korsun P. P., Ivanova O. V., Afanasiev V. L., 2008, Icarus, 198, 465
Kumi et al. (2006) Kumi G., Malyk S., Hawkins S., Reisler H., Wittig C., 2006, Journal of Physical Chemistry A, 110, 2097
Lamy et al. (2004) Lamy P. L., Toth I., Fernandez Y. R., Weaver H. A., 2004, in Comets II. p. 223
Lamy et al. (2009) Lamy P. L., Toth I., Weaver H. A., A’Hearn M. F., Jorda L., 2009, Astronomy & Astrophysics, 508, 1045
Lamy et al. (2011) Lamy P. L., Toth I., Weaver H. A., A’Hearn M. F., Jorda L., 2011, Monthly Notices of the Royal Astronomical Society, 412, 1573
Langland-Shula & Smith (2011) Langland-Shula L. E., Smith G. H., 2011, Icarus, 213, 280
Laufer et al. (1987) Laufer D., Kochavi E., Bar-Nun A., 1987, Phys. Rev. B, 36, 9219
Lejoly et al. (2022) Lejoly C., et al., 2022, The Planetary Science Journal, 3, 17
Levison (1996) Levison H. F., 1996, in Rettig T., Hahn J. M., eds, Astronomical Society of the Pacific Conference Series Vol. 107, Completing the Inventory of the Solar System. pp 173–191
Li et al. (2020) Li J., Jewitt D., Mutchler M., Agarwal J., Weaver H., 2020, The Astronomical Journal, 159, 209
Lippi et al. (2021) Lippi M., Villanueva G. L., Mumma M. J., Faggi S., 2021, The Astronomical Journal, 162, 74
Lis et al. (2019) Lis D. C., et al., 2019, Astronomy & Astrophysics, 625, L5
Lisse et al. (2021) Lisse C. M., et al., 2021, Icarus, 356, 114072
Malamud et al. (2022) Malamud U., Landeck W. A., Bischoff D., Kreuzig C., Perets H. B., Gundlach B., Blum J., 2022, Monthly Notices of the Royal Astronomical Society, 514, 3366
Marsden & Sekanina (1974) Marsden B. G., Sekanina Z., 1974, AJ, 79, 413
Maté et al. (2008) Maté B., Gálvez Ó., Martín-Llorente B., Moreno M. A., Herrero V. J., Escribano R., Artacho E., 2008, Journal of Physical Chemistry A, 112, 457
Mazzotta Epifani et al. (2008) Mazzotta Epifani E., Palumbo P., Capria M. T., Cremonese G., Fulle M., Colangeli L., 2008, Monthly Notices of the Royal Astronomical Society, 390, 265
McKay et al. (2015) McKay A. J., et al., 2015, Icarus, 250, 504
McKay et al. (2019) McKay A. J., et al., 2019, The Astronomical Journal, 158, 128
McKinney (2010) McKinney W., 2010, in Python in Science Conference. Austin, Texas, pp 56–61, doi:10.25080/Majora-92bf1922-00a
Mehta (2021) Mehta V., 2021, camelot, https://pypi.org/project/camelot-py/
Mommert et al. (2019) Mommert M., et al., 2019, Journal of Open Source Software, 4, 1426
Morbidelli & Rickman (2015) Morbidelli A., Rickman H., 2015, Astronomy & Astrophysics, 583, A43
Moulane et al. (2018) Moulane Y., Jehin E., Opitom C., Pozuelos F. J., Manfroid J., Benkhaldoun Z., Daassou A., Gillon M., 2018, Astronomy & Astrophysics, 619, A156
Mumma et al. (2005) Mumma M. J., et al., 2005, Science, 310, 270
Nagdimunov (2021) Nagdimunov L., 2021, pds4-tools, https://pypi.org/project/pds4-tools/
Ootsubo et al. (2012) Ootsubo T., et al., 2012, The Astrophysical Journal, 752, 15
Opitom et al. (2016) Opitom C., et al., 2016, Astronomy & Astrophysics, 589, A8
Opitom et al. (2019) Opitom C., et al., 2019, A&A, 624, A64
Osip et al. (2003) Osip D. J., A’Hearn M., Raugh A. C., 2003, Lowell Observatory Cometary Database - Production Rates, doi:10.26007/0A3F-R875
Paganini et al. (2012) Paganini L., Mumma M. J., Villanueva G. L., DiSanti M. A., Bonev B. P., Lippi M., Boehnhardt H., 2012, The Astrophysical Journal, 748, L13
Parhi & Prialnik (2023) Parhi A., Prialnik D., 2023, MNRAS, 522, 2081
Pittichová et al. (2008) Pittichová J., Woodward C. E., Kelley M. S., Reach W. T., 2008, The Astronomical Journal, 136, 1127
Prialnik & Bar-Nun (1987) Prialnik D., Bar-Nun A., 1987, ApJ, 313, 893
Prialnik & Bar-Nun (1992) Prialnik D., Bar-Nun A., 1992, A&A, 258, L9
Prialnik et al. (1987) Prialnik D., Bar-Nun A., Podolak M., 1987, The Astrophysical Journal, 319, 993
Reach et al. (2013) Reach W. T., Kelley M. S., Vaubaillon J., 2013, Icarus, 226, 777
Rosser et al. (2018) Rosser J. D., et al., 2018, The Astronomical Journal, 155, 164
Roth et al. (2018) Roth N. X., Gibb E. L., Bonev B. P., DiSanti M. A., Dello Russo N., Vervack Jr. R. J., McKay A. J., Kawakita H., 2018, The Astronomical Journal, 156, 251
Roth et al. (2020) Roth N. X., et al., 2020, The Astronomical Journal, 159, 42
Rotundi et al. (2015) Rotundi A., et al., 2015, Science, 347, 3905
Rubin et al. (2015) Rubin M., et al., 2015, Science, 348, 232
Rubin et al. (2019) Rubin M., et al., 2019, Monthly Notices of the Royal Astronomical Society, 489, 594
Rubin et al. (2023) Rubin M., et al., 2023, Monthly Notices of the Royal Astronomical Society, 526, 4209
Schleicher (2008) Schleicher D. G., 2008, The Astronomical Journal, 136, 2204
Schuller & Struve (1930) Schuller Fr., Struve G., 1930, International Astronomical Union Circular, 288, 2
Schweighart et al. (2021) Schweighart M., Macher W., Kargl G., Gundlach B., Capelo H. L., 2021, Monthly Notices of the Royal Astronomical Society, 504, 5513
Scotti (1994) Scotti J. V., 1994, in American Astronomical Society Meeting Abstracts. p. 43.06
Sekanina et al. (2004) Sekanina Z., Brownlee D. E., Economou T. E., Tuzzolino A. J., Green S. F., 2004, Science, 304, 1769
Simon et al. (2023) Simon A., Rajappan M., Oberg K., 2023, ApJ, 955
Tancredi et al. (2000) Tancredi G., Fernández J. A., Rickman H., Licandro J., 2000, Astronomy and Astrophysics Supplement Series, 146, 73
Thomas et al. (2013a) Thomas P., et al., 2013a, Icarus, 222, 453
Thomas et al. (2013b) Thomas P. C., et al., 2013b, Icarus, 222, 550
Villanueva et al. (2012) Villanueva G., Mumma M., DiSanti M., Bonev B., Paganini L., Blake G., 2012, Icarus, 220, 291
Virtanen et al. (2020) Virtanen P., et al., 2020, Nature Methods, 17, 261
Weaver et al. (2011) Weaver H. A., Feldman P. D., A’Hearn M. F., Russo N. D., Stern S. A., 2011, The Astrophysical Journal Letters, 734, L5
Weissman et al. (2008) Weissman P. R., Choi Y. J., Lowry S. C., 2008, in AAS/Division for Planetary Sciences Meeting Abstracts #40. p. 2.03
Womack et al. (2017) Womack M., Sarid G., Wierzchos K., 2017, Publications of the Astronomical Society of the Pacific, 129, 031001
Woodward et al. (2021) Woodward C. E., Wooden D. H., Harker D. E., Kelley M. S. P., Russell R. W., Kim D. L., 2021, The Planetary Science Journal, 2, 25

Supporting Information

Supplementary data are available at MNRAS online.

Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Appendix A Comet Radius Sources

Source	Number	Method
Lamy et al. (2004)	22	Compilation (Photometric/Thermal)
Bauer et al. (2017)	20	Thermal
Fernández et al. (2013)	15	Thermal
Tancredi et al. (2000)	8	Photometric
Lamy et al. (2009)	7	Photometric
Lamy et al. (2011)	2	Photometric
Rosser et al. (2018)	2	Thermal
Scotti (1994)	1	Photometric
Weissman et al. (2008)	1	Photometric
Boehnhardt et al. (1999)	1	Photometric
Boehnhardt et al. (2008)	1	Photometric
Boehnhardt et al. (2002)	1	Photometric
Eisner et al. (2019)	1	Photometric
Mazzotta Epifani et al. (2008)	1	Photometric
Harmon et al. (2008)	1	Radar
Harmon et al. (1997)	1	Radar
Lejoly et al. (2022)	1	Radar
Buratti et al. (2004)	1	Spacecraft
Farnham et al. (2017)	1	Spacecraft
Jorda et al. (2016)	1	Spacecraft
Sekanina et al. (2004)	1	Spacecraft
Thomas et al. (2013a)	1	Spacecraft
Thomas et al. (2013b)	1	Spacecraft
J. Bauer (unpubl. data)	1	Thermal
Boissier et al. (2013)	1	Thermal
Fomenkova et al. (1995)	1	Thermal
Groussin et al. (2010)	1	Thermal
Pittichová et al. (2008)	1	Thermal

Table 5: Table summarising the sources used for comet radii in the combined composition-size dataset. For each source we state the number of comet size measurements used in this study and also the general method by which these sizes were obtained. These counts include the objects with only limits on size.

In general it was a simple procedure to apply our guidelines for comet size selection from a range of literature sources (Section 2.1.6). However, for a small number of objects there were conflicting measurements or more detailed circumstances that complicated the size selection, which we describe here.

For comets 7P, 17P, 37P, 64P, 109P and 116P we rejected smaller nucleus sizes measured by photometric methods in favour of larger sizes from thermal observations (primarily from Fernández et al., 2013; Bauer et al., 2017). With the exception of the thermal size of 109P by Fomenkova et al. (1995) these measurements made use of more modern data than the earlier photometric observations. Furthermore our selected sources provided uncertainties on the nucleus size whereas this was not always the case for the photometric measurements (most of which were from Lamy et al., 2004). Often the lower photometric nucleus size was consistent with the larger thermal estimate when the uncertainties were taken into account. Furthermore, for most of these objects our selected size is the same as that selected by the literature compilation of Knight et al. (2023).

Likewise, C/2009 P1 has a radius measurement of $r=13.5\pm 2.5\ \mathrm{km}$ (Bauer et al., 2017) which is in conflict with an upper limit of $r<5.6\ \mathrm{km}$ from a non-detection with IRAM on 04/03/2012 by Boissier et al. (2013). These measurements could be explained if the nucleus of C/2009 P1 is elongated and presented a smaller cross-section during the IRAM observations. As such we combine the two measurements into a single size estimate by taking the mean and using the range to define the uncertainty such that $r=9.6\pm 4\ \mathrm{km}$.

A thermal size of $r=2.465\pm 0.135\ \mathrm{km}$ was measured for comet 10P from NEOWISE data (Bauer et al., 2017), however, this is smaller than a photometric size of $r=5.98\pm 0.04\ \mathrm{km}$ from HST observations (Lamy et al., 2011). It would appear that the comet was active and trailed in the NEOWISE data which may have led to a smaller size estimate (R. Kokotanekova, personal communication). Therefore we followed Knight et al. (2023) in selecting the larger photometric size.

The nucleus of 45P was imaged by radar and found to have a radius in the range $r=0.6-0.65\ \mathrm{km}$ (DiSanti et al., 2017; Lejoly et al., 2022). In order to include this object in our analysis we used the centroid of this range for the radius and the lower/upper bounds for the uncertainty, resulting in $r=0.625\pm 0.025\ \mathrm{km}$. In addition, a personal communication mentioned in Lejoly et al. (2022) describes a radar diameter of $1.4\ \mathrm{km}$ for 46P. Although we consider radar measurements to be preferable to other remote observations we could find no further details of this measurement in the available literature. Therefore we used the radius of $r=0.56\pm 0.04\ \mathrm{km}$ from Boehnhardt et al. (2002) which was also selected by Knight et al. (2023).
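
The two combined estimates described above (C/2009 P1 and 45P) both reduce to taking the midpoint of a range as the radius and half the width of the range as the uncertainty. The short Python sketch below illustrates that arithmetic only. The function name is our own and is not taken from the analysis code, and for C/2009 P1 it uses the central value of the thermal measurement and the IRAM upper limit as the two bounds, as described in the text.

```python
def midpoint_with_half_range(low, high):
    """Return (centre, half_width) of the interval [low, high].

    Illustrative helper (hypothetical name): the centre is used as the
    radius and the half-width as its uncertainty.
    """
    if high < low:
        raise ValueError("high must not be smaller than low")
    centre = 0.5 * (low + high)
    half_width = 0.5 * (high - low)
    return centre, half_width


# C/2009 P1: IRAM upper limit (5.6 km) and thermal radius (13.5 km).
r_p1, sig_p1 = midpoint_with_half_range(5.6, 13.5)

# 45P: radar radius range 0.60-0.65 km.
r_45p, sig_45p = midpoint_with_half_range(0.60, 0.65)

print(f"C/2009 P1: r = {r_p1:.2f} +/- {sig_p1:.2f} km")
print(f"45P:       r = {r_45p:.3f} +/- {sig_45p:.3f} km")
```

For C/2009 P1 the unrounded midpoint and half-range are 9.55 and 3.95 km, which the text quotes as $r=9.6\pm 4\ \mathrm{km}$.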

In the compilation of Lis et al. (2019) the nucleus radius of comet 73P is given as $r=1.10\pm 0.03\ \mathrm{km}$. This measurement was ultimately derived from photometric observations by Boehnhardt et al. (1999) which were made before fragmentation of the comet nucleus in 1995. However, that work clearly shows that comet 73P was active at the time of these observations, and Boehnhardt et al. (1999) stated that the nuclear radius of 73P must be $<1.1\ \mathrm{km}$. We did not account for size limits in our methods, and given the likelihood of an earlier fragmentation event (Schuller & Struve, 1930) this object was not included in our analysis.

In Table 5 we present a summary of the different sources used to obtain comet radii for this analysis. Several sources present comet radii without formal uncertainties; however, there is a clear power-law relation between radius and the associated uncertainty (Figure 8). We use this relation to assign approximate uncertainties to those radius measurements in the compiled dataset that lack them.
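
As an illustration of how such a relation can be used, the sketch below fits a power law $\sigma_{r}=a\,r^{b}$ by ordinary least squares in log-log space and evaluates it for radii that have no quoted uncertainty. It is a minimal sketch under stated assumptions: the four (radius, uncertainty) pairs are simply the rows of Table 6 that list both values, so the coefficients are not those of the relation shown in Figure 8, and the unweighted fit and the function names are our own choices for this example.

```python
import math

# (radius [km], uncertainty [km]) pairs that appear with both values in
# Table 6. Illustrative input only; not the full sample behind Figure 8.
pairs = [(3.21, 0.37), (0.79, 0.03), (4.80, 1.02), (21.88, 4.20)]


def fit_power_law(data):
    """Least-squares fit of log10(sigma) = intercept + slope * log10(r)."""
    xs = [math.log10(r) for r, _ in data]
    ys = [math.log10(s) for _, s in data]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def estimate_sigma(radius_km, slope, intercept):
    """Approximate uncertainty predicted by the fitted power law."""
    return 10.0 ** (intercept + slope * math.log10(radius_km))


slope, intercept = fit_power_law(pairs)

# Radii listed without uncertainties in Table 6 (88P and C/1983 J1).
for name, radius in [("88P", 1.00), ("C/1983 J1", 0.37)]:
    sigma = estimate_sigma(radius, slope, intercept)
    print(f"{name}: r = {radius:.2f} km, assigned sigma ~ {sigma:.3f} km")
```

Fitting in logarithmic space keeps the estimated uncertainty positive for any radius, which is convenient when the relation is extrapolated to small nuclei.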

Appendix B Composition - Size Data Table

Here we present a sample of the complete dataset used in this study, compiled as described in Section 2. Each row contains a single abundance measurement of species X for a particular comet, where abundance is given with respect to either \chH2O or \chCN. The circumstances of the compositional observation are provided, and the literature sources of both the composition and size measurement are stated. The dataset contains 909 unique species measurements for 96 unique comets with sizes; this includes measurements of composition/size with limits and comets with a known fragmentation history. In our analysis we rejected limits and split comets and were left with 710 composition measurements for 69 comets. Table 6 gives a sample of selected rows and columns from the full dataset, which is available online as Supplementary data and at this link.

Type	Designation	Number	Name	Date(MJD)	$r_{h}$ (au)	X	X/\chH2O	$\sigma_{\textrm{X/H2O}}$	Composition Source	$r$ (km)	$\sigma_{r}$ (km)	Radius Source
P		49	Arend-Rigaux	46035.0	1.56	\chAf $\rho$	5.881e-26	4.7e-27	A’Hearn et al. (1995)	3.21	0.37	Bauer et al. (2017)
P		49	Arend-Rigaux	46035.0	1.56	\chC2	1.817e-03	1.8e-04	A’Hearn et al. (1995)	3.21	0.37	Bauer et al. (2017)
P		49	Arend-Rigaux	46035.0	1.56	\chC3	2.567e-04	2.6e-05	A’Hearn et al. (1995)	3.21	0.37	Bauer et al. (2017)
P		49	Arend-Rigaux	46035.0	1.56	\chCN	2.087e-03	1.3e-04	A’Hearn et al. (1995)	3.21	0.37	Bauer et al. (2017)
P		49	Arend-Rigaux	46035.0	1.56	\chNH	1.620e-03	2.6e-03	A’Hearn et al. (1995)	3.21	0.37	Bauer et al. (2017)
P		59	Kearns-Kwee	45599.0	1.00	\chC2	3.626e-03	3.1e-03	Cochran et al. (2012)	0.79	0.03	Lamy et al. (2009)
P		59	Kearns-Kwee	45599.0	1.00	\chC3	4.461e-04	4.0e-04	Cochran et al. (2012)	0.79	0.03	Lamy et al. (2009)
P		59	Kearns-Kwee	45599.0	1.00	\chNH	1.046e-02	7.0e-03	Cochran et al. (2012)	0.79	0.03	Lamy et al. (2009)
P		59	Kearns-Kwee	46578.0	2.23	\chAf $\rho$	3.710e-26	3.3e-27	A’Hearn et al. (1995)	0.79	0.03	Lamy et al. (2009)
P		59	Kearns-Kwee	46578.0	2.23	\chCN	1.735e-03	1.7e-04	A’Hearn et al. (1995)	0.79	0.03	Lamy et al. (2009)
P		65	Gunn	46547.0	2.64	\chAf $\rho$	9.537e-27	1.7e-27	A’Hearn et al. (1995)	4.80	1.02	Bauer et al. (2017)
P		65	Gunn	46547.0	2.64	\chC2	1.735e-04	4.7e-05	A’Hearn et al. (1995)	4.80	1.02	Bauer et al. (2017)
P		65	Gunn	46547.0	2.64	\chC3	2.135e-04	1.1e-04	A’Hearn et al. (1995)	4.80	1.02	Bauer et al. (2017)
P		65	Gunn	46547.0	2.64	\chCN	4.780e-04	8.6e-05	A’Hearn et al. (1995)	4.80	1.02	Bauer et al. (2017)
P		65	Gunn	46547.0	2.64	\chNH	8.698e-04	3.3e-03	A’Hearn et al. (1995)	4.80	1.02	Bauer et al. (2017)
P		88	Howell	44728.0	2.09	\chAf $\rho$	5.363e-26	4.3e-27	A’Hearn et al. (1995)	1.00		Tancredi et al. (2000)
P		88	Howell	44728.0	2.09	\chC2	2.396e-03	2.4e-04	A’Hearn et al. (1995)	1.00		Tancredi et al. (2000)
P		88	Howell	44728.0	2.09	\chC3	1.583e-04	1.6e-05	A’Hearn et al. (1995)	1.00		Tancredi et al. (2000)
P		88	Howell	44728.0	2.09	\chCN	2.880e-03	2.3e-04	A’Hearn et al. (1995)	1.00		Tancredi et al. (2000)
P		88	Howell	55015.1	1.74	\chCO2	2.495e-01	5.0e-02	Ootsubo et al. (2012)	1.00		Tancredi et al. (2000)
…	…	…	…	…	…	…	…	…	…	…	…	…
C	1983 J1		Sugano-Saigusa-Fujikawa	45455.0	0.74	\chAf $\rho$	3.797e-27	1.9e-28	A’Hearn et al. (1995)	0.37		Lamy et al. (2004)
C	1983 J1		Sugano-Saigusa-Fujikawa	45455.0	0.74	\chC2	5.881e-03	1.8e-04	A’Hearn et al. (1995)	0.37		Lamy et al. (2004)
C	1983 J1		Sugano-Saigusa-Fujikawa	45455.0	0.74	\chC3	6.018e-05	3.6e-06	A’Hearn et al. (1995)	0.37		Lamy et al. (2004)
C	1983 J1		Sugano-Saigusa-Fujikawa	45455.0	0.74	\chCN	2.947e-03	8.8e-05	A’Hearn et al. (1995)	0.37		Lamy et al. (2004)
C	1983 J1		Sugano-Saigusa-Fujikawa	45455.0	0.74	\chNH	2.627e-03	1.8e-04	A’Hearn et al. (1995)	0.37		Lamy et al. (2004)
C	2006 W3		Christensen	54909.9	3.40	\chCO2	7.204e-01	1.5e-01	Ootsubo et al. (2012)	21.88	4.20	Bauer et al. (2017)
C	2006 W3		Christensen	54909.9	3.40	\chCO	2.296e+00	4.6e-01	Ootsubo et al. (2012)	21.88	4.20	Bauer et al. (2017)
C	2006 W3		Christensen	55073.3	3.22	\chCH3OH	3.355e-02	6.7e-03	Bockelée-Morvan et al. (2010)	21.88	4.20	Bauer et al. (2017)
C	2006 W3		Christensen	55073.3	3.22	\chCS	1.118e-03	4.5e-04	Bockelée-Morvan et al. (2010)	21.88	4.20	Bauer et al. (2017)
C	2006 W3		Christensen	55073.3	3.22	\chH2S	2.237e-02	2.2e-03	Bockelée-Morvan et al. (2010)	21.88	4.20	Bauer et al. (2017)
C	2006 W3		Christensen	55073.3	3.22	\chHCN	3.579e-03	8.5e-04	Bockelée-Morvan et al. (2010)	21.88	4.20	Bauer et al. (2017)

Table 6: A sample of selected rows and columns from the comet composition-size dataset used in this work. The columns presented are the comet identifiers (type, designation and number), details of the compositional measurement (date and heliocentric distance $r_{h}$) and the abundance of species X relative to water (X/\chH2O and the corresponding uncertainty $\sigma_{\textrm{X/H2O}}$ if available) with the source of the compositional measurement. Size information is provided as radius, $r$, with uncertainty $\sigma_{r}$ (if available) alongside the literature source of the measurement. The full table with all rows and additional columns is available online as Supplementary data and at this link.

Appendix C Additional composition data

This annex presents additional figures for the composition vs radius of our sample of comets for daughter species, compared to the parent species considered in the main analysis. Figure 9 shows the daughter species abundance (relative to \chH2O) as a function of comet radius. The full results of the Pearson correlation tests for each daughter species, and the dynamical sub-populations in the dataset, are provided in Table 7.

In addition we have considered the daughter species abundance relative to \chCN instead of \chH2O (Figure 10). The results of the Pearson correlation analysis for this dataset are provided in Table 8. We note that a moderately significant correlation for \chCH/\chCN is visible in these data, while no correlation was seen for \chCH/\chH2O. However, this is likely a small number statistics effect as we have only a handful of measurements for \chCH/\chH2O.

	Ecliptic Comets			Nearly Isotropic Comets			All Comets
Species	Number	Correlation	$p$ -value	Number	Correlation	$p$ -value	Number	Correlation	$p$ -value
\chAf $\rho$ /H2O	38	0.2028	0.2221	10	0.6396	0.0464	48	0.1885	0.1994
\chC2/H2O	40	0.0209	0.8981	11	0.0345	0.9198	51	0.1114	0.4366
\chC3/H2O	35	-0.0829	0.6360	11	0.6139	0.0445	46	0.0995	0.5107
\chCH/H2O	7	0.1965	0.6728	2	-1.0000	1.0000	9	0.3612	0.3395
\chCN/H2O	38	-0.0668	0.6902	11	-0.0737	0.8294	49	-0.0819	0.5757
\chCS/H2O	6	0.7663	0.0755	3	-0.1783	0.8859	9	0.6955	0.0375
\chNH2/H2O	17	-0.0381	0.8846	5	-0.4985	0.3926	22	-0.0758	0.7373
\chNH/H2O	32	-0.3465	0.0520	11	-0.0019	0.9955	43	-0.2190	0.1583

Table 7: Table showing the Pearson test results for daughter species abundance relative to \chH2O, including the number of comets tested, correlation coefficients and associated $p$-values. Similar to Table 2, results are shown for the ecliptic comets, nearly isotropic comets and all objects when considered together.

	Ecliptic Comets			Nearly Isotropic Comets			All Comets
Species	Number	Correlation	$p$ -value	Number	Correlation	$p$ -value	Number	Correlation	$p$ -value
\chAf $\rho$ /CN	38	0.2307	0.1636	11	0.3078	0.3571	49	0.2826	0.0491
\chC2/CN	45	0.1089	0.4763	11	0.2958	0.3771	56	0.2203	0.1028
\chC3/CN	42	-0.0025	0.9872	11	0.2747	0.4136	53	0.1623	0.2456
\chCH/CN	14	-0.0085	0.9771	7	0.4812	0.2743	21	0.3369	0.1354
\chNH2/CN	21	-0.2368	0.3014	9	-0.1134	0.7715	30	0.0669	0.7254
\chNH/CN	32	-0.2019	0.2678	10	0.0736	0.8399	42	-0.0413	0.7953
\chOH/CN	13	-0.2573	0.3960	5	-0.2853	0.6418	18	0.0146	0.9541

Table 8: Table showing species abundance relative to \chCN, number of comets for which we have an abundance measurement, Pearson correlation coefficients for the abundance vs nucleus size and associated $p$-values. Results are shown for the ecliptic comets (EC), nearly isotropic comets (NIC) and all objects for which a radius and composition estimate are available. Similar to Table 2, the strong, moderate and marginal significance correlations are highlighted.

Appendix D \chCO/\chH2O data

In Table 9 we present the complete data for \chCO/\chH2O abundance ratios and sizes of the comets used in our correlation analysis and the detailed discussion in Section 4.1. In this subset of the full composition - size dataset (a sample of which is presented in Table 6) we have already rejected measurements that are only limits, and any comets with a known fragmentation history prior to measurement. We followed the methodology described in Section 2.2.2 to select a specific source for the abundance ratio when there were multiple measurements available. These steps preferentially selected the source with the largest number of measurements for unique comets and species. We did this to try to compile as homogeneous a dataset as possible given the range of literature sources available.

Most of the \chCO/\chH2O abundance ratios were selected from the large scale study by Dello Russo et al. (2016). This work is a compilation of the abundance of 8 volatile molecules for 30 comets determined from a database of high resolution infrared spectroscopy taken from 1997 - 2013 using a variety of telescopes/instruments. In our methodology this source was frequently selected due to its large size and number of different species measured in a consistent manner, thus helping to increase the consistency across our literature-compiled dataset. Furthermore, we note that for many comets with multiple sources the abundance ratios are of a similar value to that of Dello Russo et al. (2016). We repeated our analysis while selecting abundances from different sources and found that the variation in the logarithm of the abundance was small, so there was little change to the trends discussed in Section 3.

Within the dataset of Dello Russo et al. (2016) we highlight some notable abundance ratios. The observations from which \chCO/\chH2O was determined for 9P were taken shortly after the collision of the Deep Impact spacecraft with the nucleus of 9P on 04/07/2005. There were no pre-impact measurements of the \chCO abundance for direct comparison; however, Biver et al. (2007) did not observe significant changes in the abundance of \chHCN and only a possible increase for \chCH3OH. Likewise Mumma et al. (2005) found no changes in the abundances of \chHCN and \chCH3OH, but they did observe a significant increase for \chC2H6. For comet 9P measurements of \chCO/\chH2O were also available from Biver et al. (2007), Lippi et al. (2021) and the literature compilation of Harrington Pinto et al. (2022). However, these sources either published upper limits on the abundance ratio or did not include uncertainties; therefore the value from Dello Russo et al. (2016) was selected. Likewise, for comet 103P additional measurements of \chCO/\chH2O are available from Harrington Pinto et al. (2022), but with no uncertainty, and from Lippi et al. (2021). However, the latter measurement is identical to that of Dello Russo et al. (2016) as they both obtained this value from UV spectroscopic observations with HST at the time of the NASA EPOXI flyby of 04/11/2010 (Weaver et al., 2011, the only non-IR measurements included in this work). In any case, following our methodology the measurement was selected from Dello Russo et al. (2016) as it was the larger study.

The hyperbolic comet C/2009 P1 demonstrated unusual behaviour in the observed production rates of \chCO during its perihelion passage of December 2011. For most comets volatile production rates are expected to peak sometime around perihelion approach and then decrease. This was the case for the production of \chH2O by C/2009 P1; however, the observed production of \chCO continued to increase past the perihelion passage (see Figure 9 of Feaga et al., 2013). This resulted in a large variation of the measured \chCO/\chH2O abundance across the perihelion passage; as such we assessed the available literature in an attempt to determine a suitable value of \chCO/\chH2O for our investigation. The \chCO/\chH2O abundance of C/2009 P1 as presented in Dello Russo et al. (2016) is the weighted mean of abundances from the following sources: Paganini et al. (2012), Villanueva et al. (2012), DiSanti et al. (2014), and McKay et al. (2015). In addition we retrieved the abundance ratios presented by Gicquel et al. (2015). Furthermore, the largest abundance ratio for this comet, \chCO/\chH2O = $0.630\pm 0.206$, was derived by Feaga et al. (2013) from remote observations from the Deep Impact Flyby spacecraft when C/2009 P1 was at $r_{h}=2.00-2.06\ \mathrm{au}$ (abundance value and uncertainty retrieved from Harrington Pinto et al., 2022). For consistency with our methodology we excluded measurements with $r_{h}\geq 2\ \mathrm{au}$ and took the mean abundance, getting a similar value to the composition presented in Dello Russo et al. (2016): \chCO/\chH2O = $0.084\pm 0.076$, where we reflect the large variation in abundance by assigning an uncertainty derived from the range of measured values, i.e. $\pm$ (max(\chCO/\chH2O) - min(\chCO/\chH2O)) / 2. It should be noted that we repeated our analysis using the much larger estimate of Feaga et al. (2013) and we found no significant changes in the overall strength of the composition - size correlation presented in Table 3. This is in line with the bootstrap/jack-knife resampling tests described in Section 3.2, which demonstrated that the correlation for this dataset does not depend strongly on any one object.

Abundance ratios for 29P, C/2006 W3 and C/2008 Q3 were selected from Ootsubo et al. (2012), a survey of \chCO, \chCO2 and \chH2O for 18 comets using NIR spectroscopy from the AKARI spacecraft. We note that the observations for C/2006 W3 and 29P were taken at large heliocentric distances of $r_{h}>3\ \mathrm{au}$ and $r_{h}>6\ \mathrm{au}$ respectively. This is much greater than the typical $r_{h}\approx 1-2\ \mathrm{au}$ for other comets in the \chCO dataset, which may be another explanation for the higher than average abundance of \chCO relative to less volatile \chH2O. We attempt to address this in our analysis using the additional tests described in Section 3.2.

We selected the \chCO/\chH2O abundance of periodic comet 17P/Holmes from the population study by Lippi et al. (2021) as this was the only source available for this comet. This work reports abundances for 20 comets based on reanalysis of an archive of high resolution infrared spectroscopy from NIRSPEC at the Keck Observatory.

In addition to the large scale surveys described above we searched for literature describing the composition of individual comets. C/2020 F3 was observed by Biver et al. (2022b) with IRAM/NOEMA in July/August 2020 with generally poor weather conditions in both runs, which limited detection of more complex molecules. We note that there were relatively few observations of this comet, presumably due to pandemic restrictions during its apparition, although similar abundances were also measured by Faggi et al. (2021). C/2020 F3 has a low \chCO/\chH2O abundance ratio compared to other comets; in the IRAM observations \chCO was only marginally detected. The reference water production rates were derived from interpolation of SOHO-SWAN observations of Lyman-$\alpha$ hydrogen emission (Combi et al., 2021) and observations of the 18 cm \chOH line at the Green Bank Telescope and Nançay Radio Telescope (Drozdovskaya et al., 2023). For comet 2P Encke, the \chCO/\chH2O abundance was measured during its 2017 apparition by Roth et al. (2018) using iSHELL at IRTF. These observations were made shortly after perihelion passage under favourable conditions, with 2P at a geocentric distance of only $\sim 0.75\ \mathrm{au}$. This allowed the detection of the hyper-volatiles \chCO and \chCH4, which are usually difficult to measure for ecliptic comets from ground-based observations with low geocentric velocities.

67P Churyumov-Gerasimenko was the target of the Rosetta mission and its production rates were measured in situ by the ROSINA mass spectrometer instrument in May 2015 (Rubin et al., 2019). We selected these in situ measurements as we expect them to be more accurate and precise than remote observations. The Rosetta observations used in this study were taken while 67P was in a period of strong outgassing on the approach to perihelion. These detailed measurements revealed that the abundance ratios of volatile species varied over the course of the mission, related in a complex way to the heliocentric distance, nucleus spin axis orientation and the relative position of Rosetta to the nucleus. This highlights that instantaneous measurements of abundance ratios may not necessarily reflect the true abundance ratios within the bulk nucleus, however, such detailed analysis is impossible for remotely observed comets.

The remaining abundance ratios for comets 1P and 45P were selected from the literature compilation by Harrington Pinto et al. (2022). This study gathered production rates for \chCO, \chCO2 (and \chH2O where available) for 25 comets from a wide range of published sources using both space and ground-based observations. They selected sources where \chCO and/or \chCO2 production rates were measured contemporaneously with \chH2O and for some comets they have collated multiple measurements for abundance ratio. Following our methodology we calculated the mean abundance, date and heliocentric distance for each comet to use in our own dataset. However, as the measurements collected by Harrington Pinto et al. (2022) are from multiple sources we selected abundance ratios from the larger homogeneous studies (e.g. Dello Russo et al., 2016; Ootsubo et al., 2012) where possible.
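
The per-comet averaging step described above can be sketched as follows. This is a minimal illustration that assumes a simple unweighted arithmetic mean of each quantity. The comet labels and numbers are made-up placeholders, not values from Harrington Pinto et al. (2022) or from our dataset, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Placeholder measurements: (comet label, date [MJD], r_h [au], CO/H2O).
# These numbers are invented purely to demonstrate the bookkeeping.
measurements = [
    ("comet A", 50000.0, 1.0, 0.10),
    ("comet A", 50010.0, 1.2, 0.14),
    ("comet B", 51000.0, 0.8, 0.02),
]


def average_per_comet(rows):
    """Return {comet: (mean MJD, mean r_h, mean abundance)}."""
    grouped = defaultdict(list)
    for comet, mjd, r_h, abundance in rows:
        grouped[comet].append((mjd, r_h, abundance))
    # zip(*values) transposes the list of tuples into three columns.
    return {
        comet: tuple(mean(column) for column in zip(*values))
        for comet, values in grouped.items()
    }


averaged = average_per_comet(measurements)
for comet, (mjd, r_h, abundance) in averaged.items():
    print(f"{comet}: MJD {mjd:.1f}, r_h {r_h:.2f} au, CO/H2O {abundance:.3f}")
```

Averaging the date and heliocentric distance alongside the abundance keeps a single representative epoch attached to each combined measurement.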

Type	Designation	Number	Name	Date(MJD)	$r_{h}$ (au)	\chCO/\chH2O	$\sigma_{\textrm{CO/H2O}}$	Composition Source	$r$ (km)	$\sigma_{r}$ (km)	Radius Source
P		1	Halley	46495.0	0.79	0.110	0.0160	Harrington Pinto et al. (2022)	5.50	0.53	Lamy et al. (2004)
P		2	Encke	57834.3	0.48	0.004	0.0004	Roth et al. (2018)	2.43	0.06	Boehnhardt et al. (2008)
P		8	Tuttle	54487.7	1.05	0.004	0.0008	Dello Russo et al. (2016)	2.25	0.50	Harmon et al. (2008)
P		9	Tempel 1	53547.5	1.52	0.043	0.0100	Dello Russo et al. (2016)	2.83	0.10	Thomas et al. (2013a)
P		17	Holmes	54401.5	2.46	0.088	0.0270	Lippi et al. (2021)	2.40	0.53	Bauer et al. (2017)
P		21	Giacobini-Zinner	51455.7	1.12	0.022	0.0150	Dello Russo et al. (2016)	1.82	0.05	Pittichová et al. (2008)
P		29	Schwassmann-Wachmann 1	55153.5	6.18	4.645	1.0187	Ootsubo et al. (2012)	23.00	6.50	Bauer et al. (2017)
P		45	Honda-Mrkos-Pajdusakova	57761.0	0.56	0.005	0.0010	Harrington Pinto et al. (2022)	0.62	0.03	Lejoly et al. (2022)
P		67	Churyumov-Gerasimenko	57152.0	1.66	0.031	0.0090	Rubin et al. (2019)	1.65	0.01	Jorda et al. (2016)
P		103	Hartley 2	55498.0	1.13	0.003	0.0015	Dello Russo et al. (2016)	0.58	0.02	Thomas et al. (2013b)
C	1995 O1		Hale-Bopp	50594.6	1.14	0.262	0.0070	Dello Russo et al. (2016)	30.00	10.00	Lamy et al. (2004)
C	2006 W3		Christensen	54909.9	3.40	2.296	0.4648	Ootsubo et al. (2012)	21.88	4.20	Bauer et al. (2017)
C	2007 N3		Lulin	54870.9	1.31	0.022	0.0009	Dello Russo et al. (2016)	6.10	0.25	Bauer et al. (2017)
C	2008 Q3		Garradd	55018.0	1.81	0.243	0.0494	Ootsubo et al. (2012)	3.35	0.50	Bauer et al. (2017)
C	2009 P1		Garradd	55943.5	1.71	0.084	0.0750	See appendix D	9.60	4.00	Boissier et al. (2013) & Bauer et al. (2017)
C	2010 G2		Hill	55935.5	2.50	0.910	0.2300	Dello Russo et al. (2016)	4.01	1.04	Bauer et al. (2017)
C	2020 F3		NEOWISE	59047.3	0.80	0.032	0.0120	Biver et al. (2022b)	2.50	0.22	J. Bauer (unpubl. data)

Table 9: The \chCO/\chH2O abundance ratio and nucleus radius for each comet used in our analysis. This table is a subset of the full composition - size dataset (which is sampled in Table 6) but is presented here for convenience.
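
For readers who wish to experiment with the values in Table 9, the sketch below computes a Pearson correlation coefficient between radius and \chCO/\chH2O and repeats the calculation with each comet left out in turn, in the spirit of the jack-knife test mentioned above. It is a minimal sketch and not the analysis code of this work: taking the base-10 logarithm of both quantities is our assumption here, the measurement uncertainties are ignored, and no $p$-value is computed (a routine such as `scipy.stats.pearsonr` could be used for that).

```python
import math

# (comet, radius [km], CO/H2O) copied from Table 9.
table9 = [
    ("1P", 5.50, 0.110), ("2P", 2.43, 0.004), ("8P", 2.25, 0.004),
    ("9P", 2.83, 0.043), ("17P", 2.40, 0.088), ("21P", 1.82, 0.022),
    ("29P", 23.00, 4.645), ("45P", 0.62, 0.005), ("67P", 1.65, 0.031),
    ("103P", 0.58, 0.003), ("C/1995 O1", 30.00, 0.262),
    ("C/2006 W3", 21.88, 2.296), ("C/2007 N3", 6.10, 0.022),
    ("C/2008 Q3", 3.35, 0.243), ("C/2009 P1", 9.60, 0.084),
    ("C/2010 G2", 4.01, 0.910), ("C/2020 F3", 2.50, 0.032),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Assumption: correlate in log-log space.
log_r = [math.log10(r) for _, r, _ in table9]
log_ab = [math.log10(ab) for _, _, ab in table9]

r_all = pearson(log_r, log_ab)

# Leave-one-out: recompute the coefficient with each comet removed.
r_loo = [
    pearson(log_r[:i] + log_r[i + 1:], log_ab[:i] + log_ab[i + 1:])
    for i in range(len(table9))
]

print(f"All {len(table9)} comets: r = {r_all:.3f}")
print(f"Leave-one-out range: {min(r_loo):.3f} to {max(r_loo):.3f}")
```

A narrow leave-one-out range indicates that the coefficient is not driven by a single object, which is the property the resampling tests are intended to check.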