Abstract

We present a new catalogue of cool supergiants in a section of the Perseus arm, most of which had not been previously identified. To generate it, we have used a set of well-defined photometric criteria to select a large number of candidates (637) that were later observed at intermediate resolution in the infrared calcium triplet spectral range, using a long-slit spectrograph. To separate red supergiants from luminous red giants, we used a statistical method, developed in previous works and improved in the present paper. We present a method to assign probabilities of being a red supergiant to a given spectrum and use the properties of a population to generate clean samples, without contamination from lower luminosity stars. We compare our identification with a classification done using classical criteria and discuss their respective efficiencies and contaminations as identification methods. We confirm that our method is as efficient at finding supergiants as the best classical methods, but with a far lower contamination by red giants than any other method. The result is a catalogue with 197 cool supergiants, 191 of which did not appear in previous lists of red supergiants. This is the largest coherent catalogue of cool supergiants in the Galaxy.

1 INTRODUCTION

The section of the Perseus arm visible from the Northern hemisphere is a Galactic region rich in young stars, with many OB associations and young open clusters (Humphreys 1978). Given its proximity to the Sun (with typical distances ranging between 3 kpc at l ∼ 100° and 2 kpc at l ∼ 140°; Choi et al. 2014), it offers important advantages for the study of stellar populations over other Galactic regions. Located towards the outskirts of the Milky Way (MW), it presents a moderately low reddening, which makes young blue stars easily accessible. In consequence, the high-mass population in Perseus has been widely studied for decades (e.g. Humphreys 1978). Among the young stars in Perseus, there are also many red supergiant (RSG) stars (>70; Humphreys 1978; Levesque et al. 2005). These stars possess moderately high mass (∼10 to ∼40 M⊙), high luminosity (log (L/L⊙) ∼ 4.5–5.8; Humphreys & Davidson 1979), low temperature, and late (K or M) spectral type (SpT). Although they have evolved off the main sequence, RSGs are still young stars (with ages between ∼8 and ∼25 Ma; Ekström et al. 2013). In consequence, they are associated with regions of recent stellar formation.

The correct characterization of the RSG phase plays a major role in the understanding of the evolution and final fate of high-mass stars (e.g. Ekström et al. 2013). Despite this pivotal position, there are still many critical questions about them that remain without definitive answers: among them, the definition of a temperature scale and its relation with luminosity, as discussed in Dorda et al. (2016a, from now on Paper II). To bring some light to these questions, we started an ambitious observational programme on RSGs, aimed at characterizing their properties by using statistically significant samples. In González-Fernández et al. (2015, from now on Paper I), we presented the largest spectroscopic sample to date of cool supergiants (CSGs) from the Magellanic Clouds (MCs). By combining this large sample with an important number of well-characterized MW RSGs, in Paper II we could present firm statistical confirmation of a correlation between SpT and temperature, or the relation between SpT, luminosity, and mass-loss. Taking advantage of this sample, in Dorda, González-Fernández & Negueruela (2016b, Paper III), we developed an automated method for the identification of CSGs using the atomic and molecular features in the spectral range around the infrared calcium triplet (CaT). Finally, Tabernero et al. (submitted) have calculated the effective temperatures for the sample in Paper I and studied the temperature scales of the RSGs from the MCs.

The present work is the next step in our study of CSGs. After analysing the CSG population from the MCs, we extend our study to the MW population of CSGs. As many of the properties of a given CSG population (e.g. its typical SpT and temperatures) depend on its metallicity (Elias, Frogel & Humphreys 1985), we selected a specific region of the Galaxy where we can expect rather uniform (typically solar) metallicities: the section of the Perseus arm between l = 97° and 150°, with Galactocentric distances in the ∼8 to 10 kpc range. This region was chosen because of the many RSGs that were previously known and well characterized, but also because its CSGs are bright (i.e. they have low apparent magnitudes) and can be observed efficiently with long-slit spectrographs. A systematic search for CSGs in an area that is considered well studied allows a good estimation of the incompleteness of previous samples. Moreover, as the extinction towards the Perseus arm is relatively low, its blue population is well known. In consequence, the relation between OB stars and CSGs can be studied. This analysis would be especially interesting, because many clusters and OB associations in the Perseus arm have total masses and ages consistent with the presence of CSGs.

In this paper, we apply the methods developed in Paper III to a sample of candidate RSGs from the Perseus arm, to test their reliability and obtain a statistically significant sample of CSGs in the area. In addition, we develop a method to compute the likelihood that a given star is indeed an SG and estimate the reliability of our identification. We also study some basic properties of the CSG population at solar metallicities, such as its SpT distribution and its relation with the luminosity class (LC). In a future work, we will carry out a deeper study of the astrophysical properties of the CSG sample found here, analysing its spatial and kinematic distributions, as well as its connection to the known population of high-mass stars close to the main sequence.

2 OBSERVATIONS AND MEASUREMENTS

2.1 Target selection

To identify RSG candidates in the Perseus arm, we performed a comprehensive photometric search in the Galactic plane (b = +6° to −6°, and l = 97° to 150°). We used as a guide the works of Humphreys (1970, 1978). The selection is the result of the following steps.

  • From Humphreys (1978), we selected those regions with detected RSGs and distance moduli (DM) coherent with being part of the Perseus arm.

  • Using these DM, along with the measured AV, we selected from 2MASS those sources with K-band magnitudes bright enough to be an RSG, assuming a lower limit for their intrinsic brightness at MK = −5. This may seem a very low limit, as for example in Paper I there are no CSGs below MK ∼ −7, but it allows for large errors in DM and/or extinction while keeping the CSG candidate sample as complete as possible. This step gets rid of most of the foreground and background undesired populations, as the expected density profile of the Galaxy along this line of sight allows us to adopt a low-luminosity threshold without risking too much contamination (more distant RSGs will likely be also included, but they are expected to be rare in the outer reaches of the Galaxy and will be of interest for future studies). This leaves only nearby dwarfs and giants with types later than M3 as main interlopers.

  • The filtered sample was then cross-correlated with well-known catalogues of optical photometry, such as USNO-B1 (Monet et al. 2003) and UCAC3 (Zacharias et al. 2010), obtaining I-band magnitudes and proper motions. Candidates are required to have (I − KS)0 > 2 (roughly, the colour of a K0 star) and proper motions similar to those of the blue and red supergiants already known in the field. This step cleans the sample of most of the foreground stars, as they have higher motions.

  • The remaining catalogue was then submitted to SIMBAD and all the stars with confirmed SpTs were removed, although we kept 51 previously studied RSGs, for a number of reasons: check spectral variations, test the efficiency of our methods, and provide a comparison sample. In fact, 43 of these objects with reliable SpT or marked as Morgan-Keenan standards were used for the calibration sample used in Paper III. In consequence, we are not considering these 43 SGs as part of the test sample, but we include them to calculate the efficiency of the photometric selection in Section 4.2.
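As an illustration of the K-band brightness cut described in the second step above, the apparent-magnitude limit could be sketched as follows. This is only a minimal sketch, not the selection code actually used: the function names are hypothetical, and the ratio A_K/A_V ≈ 0.11 is an assumed, commonly quoted value that is not given in the text.

```python
def apparent_k_limit(distance_modulus, a_v, abs_k_limit=-5.0, ak_over_av=0.11):
    """Faintest apparent K magnitude still compatible with M_K = abs_k_limit.

    ak_over_av is an ASSUMED extinction ratio (not taken from the paper).
    """
    a_k = ak_over_av * a_v              # K-band extinction from A_V
    return abs_k_limit + distance_modulus + a_k


def is_bright_enough(k_mag, distance_modulus, a_v):
    """True if a source is at least as bright as the adopted M_K limit."""
    return k_mag <= apparent_k_limit(distance_modulus, a_v)


# Hypothetical field with DM = 12.0 and A_V = 2.0
print(apparent_k_limit(12.0, 2.0))
print(is_bright_enough(6.5, 12.0, 2.0))
```

A fainter (numerically larger) limit keeps more candidates, which is the sense in which the adopted M_K = −5 threshold tolerates errors in DM and extinction.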

2.2 Observations

The targets were observed during two different campaigns. The first one was done in 2011, on the nights of October 16–18. The second campaign was carried out in 2012, from September 3rd to 7th. We used the Intermediate Dispersion Spectrograph, mounted on the 2.5 m Isaac Newton Telescope (INT) in La Palma (Spain). We used the Red+2 CCD with its 4096 pixel axis along the wavelength direction. The grating employed was R1200R, which covers an unvignetted spectral range 572 Å wide, centred on 8500 Å (i.e. the spectral region around the infrared CaT). This configuration, together with a slit width of 1 arcsec, provides a resolving power of R ∼ 10 500 in the spectral region observed. This R is very similar to the resolution of the data used in Paper I (R ∼ 11 000). The reduction was carried out in the standard manner, using the IRAF facility.

In total, we observed 637 unique targets, 102 in 2011 and 535 in 2012, without any overlap between epochs. As discussed above, 43 of them are CSGs with well-determined SpTs (all but one observed in the 2012 run) that were included in the calibration sample of Paper III (see appendix B in that work). These objects are not considered part of the Perseus sample studied here. This leaves 594 targets in our sample, which are detailed in Table A1.

2.3 Manual classification and spectral measurements

We performed a visual classification for all the stars in the sample, using the classical criteria for the CaT spectral region explained in Negueruela et al. (2012). All the carbon stars found (46) were marked and removed from later calculations. Thus, we do not use them in the present work, but they are included in our complete catalogue (see Table A1). Without the carbon stars, our sample has 548 targets.

For the analysis of our sample, we used the principal component analysis (PCA) method described in Paper III. This method begins with the automated measurement of the main spectral features in the CaT spectral region. We measured all the features needed to calculate the principal components (PCs) of our stars (i.e. those marked as shortened input list in table C.1 from Paper III). The method to measure these features is the same as for the calibration sample in Paper III. Although the resolution of our sample is not exactly the same as in the calibration sample, it is close enough not to introduce any significant difference in the result, as explained in Paper III.

Finally, we linearly combined the PCA coefficients (tables D.1 and D.2 in Paper III) with the spectral measurements of each star in our sample, obtaining their corresponding PCs. We also calculated their uncertainties, propagating the uncertainties of the equivalent widths (EWs) and PC coefficients through the linear combination.
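The combination and its error propagation can be illustrated with a short sketch. It assumes independent Gaussian uncertainties on both the measurements and the coefficients, and it ignores any centring or scaling of the inputs that the actual PCA of Paper III may apply; the function name and the numbers are hypothetical.

```python
import numpy as np


def principal_component(features, feature_err, coeffs, coeff_err):
    """Return one PC and its 1-sigma uncertainty.

    PC = sum_i c_i * x_i, with first-order propagation assuming that the
    errors on x_i and c_i are independent (an assumption of this sketch).
    """
    x = np.asarray(features, dtype=float)
    sx = np.asarray(feature_err, dtype=float)
    c = np.asarray(coeffs, dtype=float)
    sc = np.asarray(coeff_err, dtype=float)
    pc = np.sum(c * x)
    variance = np.sum((c * sx) ** 2 + (x * sc) ** 2)
    return pc, np.sqrt(variance)


# Two made-up spectral measurements and coefficients
pc, pc_err = principal_component([1.0, 2.0], [0.1, 0.2], [0.5, -0.5], [0.0, 0.0])
print(pc, pc_err)
```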

3 ANALYSIS

3.1 Estimating the probability of being a CSG

In Paper III, we revisited the main criteria classically used to identify RSGs, discussing the advantages and limitations of each one. We also proposed an original method, based on the PCs calculated through a large calibration sample and the use of support vector machines (SVM). All the classical criteria, as well as the PCA method, use boundaries between the SGs and non-SGs as separators (our method uses many boundaries in a multidimensional space, but it is qualitatively the same in concept). Thus, they provide a binary classification for the targets (each of them is classified as either SG or non-SG), but without any direct estimation of the reliability of their classifications.

In Paper III, we also defined two useful concepts for our analysis: efficiency and contamination. Efficiency is the fraction of all SGs that is identified as such by a given criterion, while contamination is the fraction of the stars selected as SGs by a given criterion that are not really SGs. Efficiencies and contaminations obtained for the calibration sample are based on the statistics of the whole sample, and give a good idea of the reliability of each method when it is applied to a large number of candidates. However, they are not a good measurement of the reliability of the individual classification of each target: the result is the same for a star that lies close to the boundary as for one that is far away from it. In consequence, we wanted to measure the reliability of each individual identification. For this, we used a Monte Carlo process that delivers the individual probability of each target being an SG (P(SG)). We detail the process and the results for the calibration sample in Section 3.1.1. Later, after testing the method in the calibration sample, we calculate the probabilities for the test sample of this work in Section 3.2.
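The two definitions above translate directly into a few lines of code. The following sketch uses hypothetical boolean arrays (true class and selection flag per star) purely for illustration.

```python
import numpy as np


def efficiency_and_contamination(is_true_sg, is_selected):
    """Efficiency: fraction of all true SGs that were selected.
    Contamination: fraction of selected stars that are not true SGs.
    Returns NaN for a quantity whose denominator is zero.
    """
    true_sg = np.asarray(is_true_sg, dtype=bool)
    selected = np.asarray(is_selected, dtype=bool)
    n_true = true_sg.sum()
    n_selected = selected.sum()
    efficiency = (true_sg & selected).sum() / n_true if n_true else float("nan")
    contamination = (~true_sg & selected).sum() / n_selected if n_selected else float("nan")
    return efficiency, contamination


# Five made-up stars: three true SGs, three selected (one of them a non-SG)
eff, cont = efficiency_and_contamination(
    [True, True, True, False, False],
    [True, True, False, True, False],
)
print(eff, cont)
```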

3.1.1 Calculation

For each one of the three classification methods described in the following paragraph, we obtained uncertainties through a Monte Carlo process using each target in the calibration sample from Paper III. We took the variables needed for each method and their errors, and we drew a new value for the variable from a random normal distribution, with the original measurement as centre and the error as its standard deviation. For each target, we sampled 1000 draws, and so we obtained 1000 different sets of derived variables. To these, we applied the corresponding classification methods, and checked how many times the target was classified as an SG or not in each draw. The P(SG)method of a target is the fraction of realizations which resulted in a positive identification.

For what we call the PCA method (P(SG)PCA), we used the first 15 PCs (which contain 98 per cent of the accumulated variance), and the SVM calculation defined in Paper III (using a putative boundary at M0; see Paper III), obtaining the P(SG)PCA for each target. The results of this procedure are shown in Fig. 1. We also calculated the P(SG)CaT for the criterion based on the strength of the CaT (a target is identified as an SG if the sum of the EWs of its three Ca lines is equal to or higher than 9 Å), and P(SG)Ti/Fe for the Ti/Fe method [which uses as boundary the line EW(8514.1) = 0.37 · EW(8518.1) + 0.388 in the Fe I 8514 Å versus Ti I 8518 Å diagram]. The results are shown in Figs 2 and 3. The other classical criteria considered in Paper III, based on the strength of the blend at 8468 Å and the EWs of only the two strongest lines of the CaT, have not been used here because of their low efficiency or high contamination.
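To make the Monte Carlo procedure concrete, the following sketch applies it to the CaT criterion (summed EW of the three Ca lines ≥ 9 Å). It is an illustration under stated assumptions, not the code used for this work: the function names and the example EWs are hypothetical, and the errors are treated as independent Gaussians.

```python
import numpy as np

CAT_THRESHOLD = 9.0  # Å; summed EW of the three CaT lines, as quoted in the text


def cat_criterion(ew_cat_lines):
    """True where the summed EW of the three CaT lines is >= 9 Å.

    Accepts an array whose last axis holds the three EWs.
    """
    return np.sum(ew_cat_lines, axis=-1) >= CAT_THRESHOLD


def p_sg_monte_carlo(values, errors, classify, n_draws=1000, rng=None):
    """Fraction of Gaussian realizations classified as SG."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    errors = np.asarray(errors, dtype=float)
    # One row per realization, centred on the measurement, sigma = its error
    draws = rng.normal(loc=values, scale=errors, size=(n_draws,) + values.shape)
    return float(np.mean(classify(draws)))


rng = np.random.default_rng(42)
# Made-up EWs (Å): far above, far below, and exactly on the boundary
print(p_sg_monte_carlo([4.0, 4.0, 4.0], [0.1, 0.1, 0.1], cat_criterion, rng=rng))
print(p_sg_monte_carlo([2.0, 2.0, 2.0], [0.1, 0.1, 0.1], cat_criterion, rng=rng))
print(p_sg_monte_carlo([3.0, 3.0, 3.0], [0.2, 0.2, 0.2], cat_criterion, rng=rng))
```

Any other classifier (for example a boundary in the Fe I versus Ti I diagram, or an SVM acting on the PCs) could be passed as `classify` in the same way, provided it accepts an array of realizations and returns one boolean per realization.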

PC1 versus PC3 diagram for the calibration sample. The shapes indicate their origin: circles are from the SMC survey, squares are from the LMC survey, diamonds are Galactic standard stars, and inverted triangles are the stars from the Perseus arm survey used as part of the calibration sample (see Section 2.3). The cross indicates the median uncertainties, which have been calculated by propagating the uncertainties through the linear combination of the input data (EWs and bandheads) with the coefficients calculated in Paper III. Left: the colour indicates LC (identical to fig. 7b in Paper III). Right: the colour indicates the probability of being an SG (see Section 3.1.1).
Figure 1.


Depth of TiO bandhead at 8859 Å versus total EW of the CaT (8498, 8542, and 8662 Å), for the calibration sample. The strength of the TiO 8859 Å bandhead is simply an indicator of the spectral sequence for early- to mid-M stars (see section 4.3.4 in Paper III) and is included here simply to display the measurements in a 2D graph, so that the CaT criterion is easily visualized. Symbol shapes are the same as in Fig. 1. The black cross indicates the median uncertainties. In these panels, the probability of being an SG (see Section 3.1) can be compared to the actual LC classification. Left: the colour indicates LC. Right: the colour indicates the probability of being an SG (see Section 3.1.1).
Figure 2.


EWs of the lines Fe I 8514 Å and Ti I 8518 Å for the calibration sample. Symbol shapes are the same as in Fig. 1. The cross indicates the median uncertainties. In these panels, the probability of being an SG (see Section 3.1) can be compared with the actual LC classification. Left: the colour indicates LC (equivalent to fig. 12b from Paper III). Right: the colour indicates the probability of being an SG (see Section 3.1.1).
Figure 3.


3.1.2 Identification based on individual probabilities

With the classical criteria studied, based on the CaT and on the Ti/Fe ratio, a large fraction of the SGs (>0.85 and >0.70, respectively) in the sample have P(SG) = 1 and most non-SGs have P(SG) = 0. Only those stars close to the boundary used by these methods present intermediate values of P(SG). Since the boundaries between SGs and non-SGs in these diagrams are straight lines, a given star can be identified as an SG if it has P(SG) ≥ 0.5 – this is equivalent to the simple assignment to one of the two categories. On the other hand, in the PCA method, there are not many targets with their P(SG) equal to 1 or to 0. This is because the PCA uses many boundaries in the multidimensional space of the PCs, not a single boundary in a two-dimensional diagram, as is the case for the classical criteria. Thus, it is more difficult to stay far away from every boundary and the probabilities tend to have intermediate values.

To illustrate this, and also to evaluate the application of this method to the identification of SGs, we calculated how many targets have their individual probability Pi equal to or higher than a given P(SG) value. As the SGs from each galaxy in the calibration sample have different typical SpTs (Levesque 2013, Paper II), we performed this calculation for six different subsamples taken from the calibration sample: SGs from the Small Magellanic Cloud (SMC), from the Large Magellanic Cloud (LMC), from the MW, all SGs, all non-SGs, and the whole sample. We present the results for each of these subsamples as fractions (F(Pi ≥ P(SG))) with respect to their corresponding total size, in Figs 4–6. For all three classification criteria, the SGs from both MCs present very similar behaviours, but the SGs from the MW present slightly lower probabilities. This small difference is likely due to the lower efficiency of all criteria towards later subtypes, as it is well known that SGs in the MW tend to have later subtypes than those in the MCs (Levesque 2013).
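The fraction of a subsample with individual probability at or above a given value can be computed as in the following minimal sketch (the function name and the example probabilities are hypothetical).

```python
import numpy as np


def fraction_at_or_above(p_sg, thresholds):
    """For each threshold t, the fraction of targets with P_i >= t.

    The fraction is taken with respect to the size of the subsample passed in.
    """
    p = np.asarray(p_sg, dtype=float)
    t = np.asarray(thresholds, dtype=float)
    # Compare every threshold (rows) with every target (columns)
    return (p[None, :] >= t[:, None]).mean(axis=1)


# Four made-up targets evaluated at three thresholds
print(fraction_at_or_above([0.1, 0.5, 0.9, 1.0], [0.0, 0.5, 0.95]))
```

Calling the function separately on each subsample (e.g. SMC SGs, MW SGs, non-SGs) gives one curve per subsample, each normalized to its own size.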

Fraction of the calibration sample that has a probability of being an SG (calculated through the PCA method) equal to or higher than the corresponding x-axis value. The colours indicate the subsample: black for whole sample, red for non-SGs, blue for all SGs, magenta for SMC SGs, cyan for LMC SGs, and green for MW SGs. Each fraction is calculated with respect to the size of its own subsample.
Figure 4.


Fraction of the calibration sample that has a probability of being an SG (calculated through the CaT method) equal to or higher than the corresponding x-axis value. The colours indicate the subsample, as explained in Fig. 4. Each fraction is calculated with respect to the size of its own subsample.
Figure 5.


Fraction of the calibration sample that has a probability of being an SG (calculated through the ratio of the Fe I 8514 Å to Ti I 8518 Å lines) equal to or higher than the corresponding x-axis value. The colours indicate the subsample, as explained in Fig. 4. Each fraction is calculated with respect to the size of its own subsample.
Figure 6.


The CaT and the Ti/Fe criteria result in a large fraction of SGs with high values of P(SG), but there are non-SGs with probabilities as high as P(SG) = 1. Thus, these methods provide a quick way to identify most SGs in the sample, but at the price of having a significant contamination. Of these two methods, the CaT one is less strict, finding more SGs, but also including more non-SGs with high P(SG) values.

The PCA method finds a very small fraction of SGs with P(SG) > 0.9 (and this fraction is significantly higher for SMC SGs than for MW ones, as can be seen in Fig. 4). However, non-SGs present significantly lower values of P(SG), with none of them having P(SG) > 0.75. For this value, the fraction of SGs identified is about 0.90 ± 0.04 (∼0.80 ± 0.13 for the SGs from the MW). Therefore, using this value as a threshold, the vast majority of SGs can be identified without any contamination. In addition, it is also possible to identify a group of likely SGs with a relatively low contamination, by taking the targets whose P(SG) lies within the interval between P(SG) = 0.75 and a lower limit set at convenience (depending on the level of contamination that may be considered acceptable).

For a new sample, such as the Perseus arm sample in this paper, it is possible to estimate the value of this lower limit of P(SG) that results in an optimal selection of potential SGs. In such a sample, the only information available will be the shape of the P(SG) fraction curve (the black line in our figures). This curve, however, will always have an inflexion point at the P(SG) value where most SGs have already been selected, while most non-SGs have lower values of P(SG). Thus, from this point towards lower probabilities, the addition of extra targets to the selection becomes dominated by non-SGs. Therefore, this inflexion point can be used as a lower boundary for the group of potential SGs, and can be easily estimated for any sample under study, as we do for the Perseus sample in the next section.
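The text does not specify how the inflexion point is estimated. One possible numerical approach, shown here only as a sketch, is to look for a sign change in the second derivative of the fraction curve. It assumes a smooth (or previously smoothed) curve sampled on a grid; a raw step-like empirical curve would need smoothing first. The function name and the synthetic test curve are hypothetical.

```python
import numpy as np


def inflexion_points(x, y):
    """Locate sign changes of the numerical second derivative of y(x).

    Assumes y is smooth; noisy curves should be smoothed beforehand.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d2 = np.gradient(np.gradient(y, x), x)
    # Skip two points at each edge, where one-sided differences are less accurate
    idx = np.arange(2, len(x) - 3)
    crossing = idx[d2[idx] * d2[idx + 1] < 0]
    x0, x1 = x[crossing], x[crossing + 1]
    # Linear interpolation of the zero of the second derivative
    return x0 - d2[crossing] * (x1 - x0) / (d2[crossing + 1] - d2[crossing])


# Synthetic cubic with a known inflexion at x = 0.605
grid = np.linspace(0.0, 1.0, 101)
print(inflexion_points(grid, (grid - 0.605) ** 3))
```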

In the calibration sample, the inflexion point is at P(SG) ∼ 0.60. Taking this value as a lower boundary, the efficiency of the resultant selection is higher than 0.95 ± 0.04 (∼0.90 ± 0.13 for SGs from the MW), while the contamination is only 0.03 ± 0.04 (0.08 ± 0.13 in the case of the MW sample). Note that the contaminations were calculated for the total number of stars tagged as SGs, i.e. all those having P(SG) ≥ 0.60. For similar efficiencies in the CaT and Ti/Fe ratio criteria, the contaminations are slightly higher (∼0.08 ± 0.04 in both cases). These values become slightly worse in the case of MW SGs, with contaminations of 0.17 ± 0.13 for the Ti/Fe ratio criterion and 0.20 ± 0.13 for the CaT one. In Paper III, we found that the PCA method provides a higher quality method to identify SGs than the other two, because it has a significantly lower contamination. In this work, we found another advantage: the possibility to identify a large fraction of SGs without any contamination.

3.2 Probabilities for the Perseus sample

Before the analysis of our Perseus sample, we must stress that the SGs from the MW typically have M subtypes. We may thus expect our sample to be dominated by these subtypes. Moreover, most of the interlopers found in the manual classification are red giants with M types. In consequence, the diagrams obtained for the Perseus sample have their data points concentrated in the regions typical of M-type stars, and look quite different from the distributions seen in the calibration sample (see Figs 1–3), whose SpT range spans from G0 to late-M subtypes. For further details about the calibration sample and their SpT distribution, see Paper III and figs 7a, 9, and 12a therein.

We calculated the individual probabilities of being an SG for each target in the Perseus sample, following the same method described for the calibration sample (Section 3.1). Using the PCs previously obtained for our targets, P(SG)PCA was calculated through a Monte Carlo process (generating 1000 new sets of PCs per target). The results are given in Table A1 and represented in a PC1-to-PC3 diagram in Fig. 7.

PC1 versus PC3 diagram for the Perseus sample. The shapes indicate epoch: 2011 circles, 2012 squares. The black cross indicates the median uncertainties, which have been calculated by propagating the uncertainties through the linear combination of the input data (EWs and bandheads) with the coefficients calculated. The colour indicates P(SG)PCA. The plot is shown at the same scale as Fig. 1, to ease comparison. The differences in the target distribution with respect to the calibration sample are due to the different ranges of SpTs.
Figure 7.


Although the PCA method provides significantly better results than classical criteria, we also calculated the probabilities for them (CaT and Ti/Fe). We include these criteria because they are useful for a quick estimate despite their limitations. In addition, this is the first time that these criteria are systematically applied to a very large sample at solar metallicity: more than 500 targets, instead of the ∼100 MW stars from the calibration sample. The results are given in Table A1, and presented in Figs 8 and 9.

Depth of the TiO bandhead at 8859 Å with respect to the sum of the EWs of the CaT lines. The shapes indicate epoch: 2011 circles, 2012 squares. The black cross indicates the median uncertainties. The colour indicates P(SG)CaT. Note again the difference in SpT distribution with respect to the calibration sample (Fig. 2).
Figure 8.


EWs of the Fe I 8514 Å and Ti I 8518 Å lines. The shapes indicate epoch: 2011 circles, 2012 squares. The black cross indicates the median uncertainties. The colour indicates P(SG)Ti/Fe. Comparison to Fig. 3 highlights the lack of stars with G and K SpTs.
Figure 9.


4 RESULTS

4.1 SGs identified

When we studied the distribution of P(SG)PCA among the components of the calibration sample, we found that only true SGs present values higher than P(SG)PCA = 0.75 (see Section 3.1.2). Thus, we were able to obtain a group of SGs a priori free from any non-SG (the ‘reliable SGs’ set). In addition, it is possible to define an interval of probabilities between P(SG)PCA = 0.75 and a lower limit, that increases the selection of SGs, while keeping the contamination very low (the ‘probable SGs’ set). The optimal lower limits for the Galactic samples were selected through the diagram shown in Fig. 10, by the estimation of the inflexion point in the corresponding curve. For the Perseus sample, we estimated it at P(SG)PCA ∼ 0.55. The number of SGs found by these cuts is indicated in Table 1.
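The two cuts can be expressed as a small tagging routine. This is a sketch only: whether each boundary value is inclusive is an assumption made here, the function name is hypothetical, and the example probabilities are invented. The fraction uncertainty follows the 1/√n prescription quoted for the tables.

```python
import numpy as np


def tag_targets(p_sg_pca, lower_limit=0.55, reliable_limit=0.75):
    """Count 'reliable' (P > reliable_limit) and 'probable'
    (lower_limit <= P <= reliable_limit) SGs and their fractions.

    Boundary inclusivity is an assumption of this sketch.
    """
    p = np.asarray(p_sg_pca, dtype=float)
    reliable = p > reliable_limit
    probable = (p >= lower_limit) & ~reliable
    n = p.size
    return {
        "n_reliable": int(reliable.sum()),
        "n_probable": int(probable.sum()),
        "frac_reliable": reliable.sum() / n,
        "frac_probable": probable.sum() / n,
        "frac_err": 1.0 / np.sqrt(n),  # 2-sigma uncertainty, as in the table captions
    }


# Eight made-up probabilities
print(tag_targets([0.9, 0.8, 0.6, 0.55, 0.3, 0.1, 0.76, 0.75]))
```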

Fraction of the Perseus sample that has a probability of being an SG (calculated through the PCA method) equal to or higher than the corresponding x-axis value.
Figure 10.


Table 1.

Number of targets tagged as ‘reliable SGs’ or ‘probable SGs’ (see Section 4.1) through the analysis of P(SG)PCA. The LC was assigned through the manual classification. We also show the fraction that these groups represent with respect to the number of total targets in the sample (594). The 2σ uncertainties for the given fractions are equal to 1/√n, where n is the total number of targets. Thus, the uncertainty of both fractions is equal to ±0.04.

              Number                          Fraction
Reliable SGs  Probable SGs  Total   Reliable SGs  Probable SGs  Total
116           75            191     0.20          0.13          0.33

Classical methods are based on a linear boundary in a two-dimensional space. In consequence, when curves of P(SG) are plotted for them (see Section 3.1.2), there is no hint of a threshold value for ‘reliable SGs’ as in the case of P(SG)PCA. Thus, the only reasonable minimum value, given the two-dimensional nature of the boundary, is P(SG) = 0.5. The number of targets identified as SGs is given in Table 2.

Table 2. Number of SGs found by different methods, and the fraction that they represent with respect to the total number of targets observed (594). For the classical criteria, we used a threshold of P(SG) = 0.5; for the PCA method, we adopted a threshold of P(SG) = 0.55 (see Section 3.2). The 2σ uncertainties of the fractions are equal to $1/\sqrt{n}$, where n is the total number of targets.

| Criterion | Number of SGs | Fraction |
|---|---|---|
| CaT | 304 | 0.51 ± 0.04 |
| Ti/Fe | 238 | 0.40 ± 0.04 |
| PCA | 193 | 0.32 ± 0.04 |

Table 3. Number of targets from the Perseus sample tagged as SGs through the manual classification that were also identified as such by the different methods considered. Note that we found 241 SGs through the manual classification. Among them, 90 were classified as Ia or Iab, 85 as Ib, and 66 as Ib–II. Thus, the efficiencies and their uncertainties (that are equal to $1/\sqrt{n}$) are calculated with respect to these values, and modified by the definition of efficiency (an efficiency >1 is not possible).

| Method | SGs found: All | SGs found: Ia to Iab | SGs found: Ib | SGs found: Ib–II | Efficiency: All | Efficiency: Ia to Iab | Efficiency: Ib | Efficiency: Ib–II |
|---|---|---|---|---|---|---|---|---|
| PCA | 182 | 86 | 83 | 13 | 0.76 ± 0.06 | $0.96^{+0.04}_{-0.11}$ | $0.98^{+0.02}_{-0.11}$ | 0.20 ± 0.12 |
| CaT | 204 | 86 | 81 | 37 | 0.85 ± 0.06 | $0.96^{+0.04}_{-0.11}$ | $0.95^{+0.02}_{-0.11}$ | 0.56 ± 0.12 |
| Ti/Fe | 194 | 83 | 80 | 31 | 0.80 ± 0.06 | $0.92^{+0.08}_{-0.11}$ | $0.94^{+0.06}_{-0.11}$ | 0.47 ± 0.12 |

The targets tagged as SGs through P(SG)PCA represent a significant fraction (almost one third) of the Perseus sample. Moreover, most of them (∼66 per cent) are tagged as ‘reliable SGs’; we can thus confidently consider this group free of interlopers. The number of SGs found through the PCA method is, however, significantly lower than the numbers found through the CaT and Ti/Fe criteria. We must be cautious with the results obtained using these two criteria, as their contaminations were higher (0.17 ± 0.13 for Ti/Fe and 0.20 ± 0.13 for CaT) than for the PCA (0.08 ± 0.13) among MW stars in the calibration sample (see Paper III). The difference in the expected contamination is not enough to explain the difference in the number of stars tagged as SGs, but it seems clear that the higher the contamination of a method, the larger the number of stars it identifies as SGs. Moreover, we have to take into account that the Galactic set from the calibration sample is limited in two ways. First, the subsample was relatively small, which causes high uncertainties in our fractions (±0.13). Secondly, this sample is not comparable to any observed sample, because it was intentionally created by assembling a similar number of well-known SGs and non-SGs. Thus, it is not at all representative of the number of non-SG stars that one may expect to find as interlopers when using photometric criteria to select SG candidates in the Galactic plane. In view of these limitations, to study the efficiency and contamination of our methods in the Perseus sample, we resort to a direct calculation in the next section.

4.2 Efficiency of the photometric selection

The most important source of contaminants in the photometric selection comes from the magnitude/distance degeneracy. In this case, we are interested in structures relatively close to Earth, and in stars that are intrinsically bright, so we can enforce strict limits in apparent magnitude that will filter out most of the intrinsically dimmer populations along the line of sight. The overall efficiency of the selection criteria outlined in Section 2.1 is 47 per cent. This includes the 43 MK standards mentioned in Section 2.2, as these were not included a posteriori but picked up by the selection algorithm.

As can be seen in Fig. 11, the efficiency decays with magnitude: at $m_{K_{\mathrm{S}}}\sim 4.5$, most of the observed stars turn out to be interlopers. This agrees roughly with Paper I, as at the low end of the brightness distribution of SGs, the selected sample is dominated by bright giants. Similarly, while the fraction of SGs is more or less homogeneous with colour, the red end of the distribution (stars with (J − KS) ≥ 1.7) is mostly composed of bright carbon stars. These results for an MW sample confirm those obtained in the MCs, in Paper I, and will also be useful for future photometric selections. However, we must caution that such a red cut-off can only be used to discriminate carbon stars in fields of low (such as the MCs) or moderate (like the present sample) extinction. For the high extinctions (AV ≳ 5 mag) found in many lines of sight towards the inner MW, M-type stars would be shifted to very high values of (J − KS) and other discriminants must be found.
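The trends described above amount to computing, in bins of magnitude or colour, the fraction of observed targets that turn out to be SGs. A minimal sketch of that bookkeeping follows. Only the colour limit (J − KS) ≥ 1.7 is taken from the text; the function names, the bin edges, and the example values are hypothetical and do not correspond to the actual sample.

```python
CARBON_STAR_COLOUR = 1.7  # (J - Ks) limit quoted in the text


def is_probable_carbon_star(j_mag, ks_mag, colour_limit=CARBON_STAR_COLOUR):
    """Flag targets at the red end; meaningful only for low or moderate extinction."""
    return (j_mag - ks_mag) >= colour_limit


def sg_fraction_per_bin(values, is_sg, edges):
    """Fraction of SGs in each half-open bin [edges[i], edges[i+1]).

    Returns None for bins that contain no targets.
    """
    fractions = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        flags = [sg for v, sg in zip(values, is_sg) if lo <= v < hi]
        fractions.append(sum(flags) / len(flags) if flags else None)
    return fractions


# Hypothetical targets: Ks magnitude and whether the target was confirmed as an SG.
ks = [2.1, 2.8, 3.4, 3.9, 4.4, 4.6]
sg = [True, True, True, False, False, False]
fractions = sg_fraction_per_bin(ks, sg, edges=[2.0, 3.0, 4.0, 5.0])
```

The same function can be applied to (J − KS) instead of the magnitude to obtain the colour dependence.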

Figure 11. Fraction of SGs found in the target sample as a function of apparent KS magnitude and colour. The dashed line marks the total average fraction, 47 per cent. Of these detected SGs, ∼ 5 per cent were already known.

4.3 Efficiency and contamination in the PCA method

4.3.1 Efficiency

To estimate directly the efficiency of our survey in the Perseus arm, we used the manual classification previously performed. We note that this classification is not a priori more reliable than our automated methods. The manual classification was done before we developed the automated process detailed in Paper III. For the manual classification, we used classical criteria, such as the EW of the CaT, the ratio between nearby Ti and Fe lines (Fe i 8514 Å and Ti i 8518 Å among others), and the EW of the blend at 8468 Å. In Paper III, we demonstrated that the criteria based on these features have an efficiency slightly worse (at best) than our automated method. The manual classification can be somewhat better than these methods at identifying SGs, as it is a global process (like our PCA method), not based on any single spectral feature. Thus, the efficiency found in this work is useful to estimate the average quality of the classification methods under study with respect to a manual classification done following the classical criteria for the CaT range.

In the first place, we calculated the efficiency for each method (see Table 3). The efficiency in this case is the fraction of all SGs found through the manual classification, which were also tagged as such by a given automated criterion. The PCA method has the lowest global efficiency. It is similar to the value for the Ti/Fe criterion, but significantly lower than the efficiency of the CaT criterion. Nevertheless, when the LC of the targets is taken into account, the results can be seen in a very different light.
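The efficiency defined above, together with the truncated uncertainties listed in Table 3, can be reproduced with a short calculation. The sketch below is illustrative: the function name `efficiency` is hypothetical, and the truncation rule (upper error limited to 1 − efficiency) is one reading of the statement in the caption of Table 3 that an efficiency above 1 is not possible.

```python
import math


def efficiency(n_recovered, n_manual):
    """Return (efficiency, upper error, lower error).

    The error is 1/sqrt(n_manual); the upper error is truncated so that
    efficiency + upper error never exceeds 1 (assumed rule).
    """
    if n_manual <= 0:
        raise ValueError("n_manual must be positive")
    eff = n_recovered / n_manual
    err = 1.0 / math.sqrt(n_manual)
    return eff, min(err, 1.0 - eff), err


# Counts from Table 3: PCA method, Ia to Iab SGs (86 recovered out of 90).
eff, up, down = efficiency(86, 90)
print(f"{eff:.2f} +{up:.2f} -{down:.2f}")
```

For these counts the result is 0.96 with an upper error of 0.04 and a lower error of 0.11, as listed in Table 3.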

The calibration sample (see Paper III for details) is dominated by high- and mid-luminosity SGs (Ia and Iab), with only a small fraction of Ib or less luminous SGs (LC Ib–II). In consequence, our PCA method is optimized to find Ia and Iab SGs. In view of this, in the Perseus sample, we considered the efficiency for different LCs separately. The efficiencies of the PCA and CaT criteria for high-luminosity SGs are the same, 0.96 ± 0.11, and comparable to those found for the calibration sample. The efficiencies for low-luminosity SGs (Ib) are also similar in both methods, and compatible with the results obtained for Ia and Iab stars. However, for the Ib–II stars, the efficiencies are significantly different depending on the criterion used. The higher efficiency of the CaT method in the Ib–II group stems from the fact that this criterion is much less strict than the PCA one, but at the price of being more susceptible to contamination by red giants (see the following subsection). As the Ib–II subclass is the boundary between SGs (LC I) and bright giants (LC II), the morphology of the objects with this tag is intermediate. Moreover, there are asymptotic giant branch (AGB) stars, which are not high-mass stars, whose spectra are quite similar to those of low-luminosity SGs (Ib). The perfect example of this is α Her. This star is the high-luminosity MK standard with the latest SpT available (M5 Ib–II; Keenan & McNeil 1989). However, Moravveji et al. (2013) show that this star is not a high-mass star (M* ≳ 10 M⊙), but an AGB star with a mass around 3 M⊙, even though its spectral morphology is very close to that of an SG. In view of this, through the manual classification we probably identified as SGs some stars that are not really SGs, but are morphologically very similar to them. The PCA criterion, instead, is more restrictive, and only selects as SGs those objects similar enough to the luminous (high-mass) SGs (those having LC Ia and Iab) used to calibrate it.

Our methods, and especially the PCA method, are very efficient for mid- to high-luminosity SGs (Iab to Ia), and also for lower luminosity SGs (Ib). However, there are also a small number of stars (6) manually classified between Ia and Ib that were not identified as SGs by the PCA. All these six stars have mid- to late-M types. All but one of them are M5 or later, with most of them (four) having very late SpTs (M7 or M7.5). In fact, these stars are the majority of the RSGs with SpTs M5 or later in the whole Perseus sample, as there are only two other M5 Ib stars (which were correctly identified by the PCA method). The only star earlier than M5 (it was classified as M3) that was not identified as an SG is S Per, an extreme RSG (ERSG). The reason why this object was not correctly identified is clear: its lines are weakened by veiling, an effect that may appear in ERSG stars which has been reported before for S Per (Humphreys 1974). For more details about ERSGs and veiling, see section 4.4 from Paper III and references therein.

Just like the PCA method, the CaT and the Ti/Fe criteria fail for mid- to late-M SGs. They failed to identify the same true SGs that were not found by the PCA. In addition, they also failed for a group of Ia to Ib stars with slightly earlier SpTs (M3 and M4). The obvious conclusion is that all methods fail almost completely in the identification of mid- to late-M RSGs. However, the PCA method provides significantly better results for mid-M SGs (up to M5) than the other criteria. This, in turn, cannot be considered a major drawback, as the number of mid- to late-M RSGs is very small, with only a handful of SGs presenting SpTs later than M5 (and most of them presenting spectral variability).

4.3.2 Contamination

The three identification methods studied above have similar efficiencies for mid- to high-luminosity subsamples. The advantage of the PCA method over the other two is that it provides significantly lower contamination, at least for the calibration sample. Therefore, we estimated the contamination obtained through each method for the Perseus sample. The contamination in this case is the fraction of the stars selected as SGs by a given automated criterion that were not identified as real SGs through the manual classification. The results are shown in Table 4.

Table 4. Contaminations obtained through different methods for the Perseus sample. As the contamination is the fraction of targets tagged as SGs that actually are not SGs, its 2σ uncertainty is equal to $1/\sqrt{n}$, where n is the number of objects identified as SGs.

| Method | Number of targets tagged as SGs | Number of non-SGs wrongly identified | Contamination |
|---|---|---|---|
| PCA | 193 | 11 | 0.06 ± 0.07 |
| CaT | 304 | 100 | 0.33 ± 0.06 |
| Ti/Fe | 238 | 43 | 0.18 ± 0.07 |
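In terms of object lists, the contamination is a set difference: the objects tagged by the automated criterion that are absent from the manual list of SGs, divided by the number of tagged objects. The sketch below illustrates this with synthetic integer identifiers chosen only to reproduce the PCA counts of Table 4 (193 tagged, 11 wrongly identified); the function name `contamination` and the identifiers are hypothetical.

```python
import math


def contamination(auto_sgs, manual_sgs):
    """Fraction of automatically tagged SGs that are absent from the manual SG list.

    Returns (contamination, 1/sqrt(n)), with n the number of tagged objects.
    """
    auto_sgs, manual_sgs = set(auto_sgs), set(manual_sgs)
    if not auto_sgs:
        raise ValueError("no objects tagged as SGs")
    wrong = auto_sgs - manual_sgs
    return len(wrong) / len(auto_sgs), 1.0 / math.sqrt(len(auto_sgs))


# Synthetic identifiers (not real targets): 241 manual SGs, of which 182 are
# recovered by the automated method, plus 11 interlopers.
manual = set(range(241))
auto = set(range(182)) | set(range(1000, 1011))
cont, err = contamination(auto, manual)
print(f"{cont:.2f} +/- {err:.2f}")
```

With these counts the sketch returns 0.06 ± 0.07, the PCA value of Table 4.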

The method with the lowest contamination is by far PCA. All the non-SGs wrongly selected by the P(SG) have LC II in the manual classification, and therefore their spectra are very similar morphologically to those of low-luminosity RSGs. Indeed, we cannot dismiss a priori the possibility that they may be low-luminosity SGs wrongly identified in the manual classification. The Ti/Fe criterion has a significantly higher contamination, but the CaT criterion works significantly worse than the other two in this respect. This is not completely unexpected, as the strength of the CaT lines is not only a function of luminosity, but also effective temperature and metallicity (Diaz, Terlevich & Terlevich 1989).

The contamination found in the Perseus sample through the PCA method (0.06 ± 0.07) is compatible with those obtained for the calibration sample (0.03 ± 0.04) and its MW subset (0.08 ± 0.13) in Paper III. In the case of the CaT and Ti/Fe methods, their contaminations when applied to the MW subset of the calibration sample are 0.17 ± 0.13 for the Ti/Fe criterion and 0.20 ± 0.13 for the CaT criterion, which are again compatible with those obtained in this work for these methods (see Table 4). Therefore, the results for the Perseus sample corroborate the conclusions that we reached based on the subsample of MW stars in the calibration sample in Paper III, this time for a significantly larger sample.
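The text does not define ‘compatible’ quantitatively. One common convention, assumed here purely for illustration, is that two values are compatible when they differ by no more than the quadrature sum of their quoted uncertainties. Under that assumption, the comparison above can be checked as follows; the function name `compatible` is hypothetical, and the numbers are those quoted in the text for the Perseus sample and for the MW subset of the calibration sample.

```python
import math


def compatible(value_a, err_a, value_b, err_b):
    """True if the two values differ by no more than their combined uncertainty.

    Assumed criterion: |a - b| <= sqrt(err_a**2 + err_b**2).
    """
    return abs(value_a - value_b) <= math.hypot(err_a, err_b)


# (Perseus sample, MW subset of the calibration sample), as quoted in the text.
pairs = {
    "PCA": ((0.06, 0.07), (0.08, 0.13)),
    "CaT": ((0.33, 0.06), (0.20, 0.13)),
    "Ti/Fe": ((0.18, 0.07), (0.17, 0.13)),
}
results = {method: compatible(*a, *b) for method, (a, b) in pairs.items()}
```

Under this criterion all three pairs are compatible, with the CaT pair the closest to the limit.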

4.4 The population of CSGs in Perseus

As explained in Section 3.2, with the values proposed for the P(SG)PCA, we identified 191 targets as SGs in Perseus (86 of them having LC Ia or Iab according to the manual classification), while our manual identification found 258 (96 of them having LC Ia or Iab), including all the 191 PCA SGs. The difference between both sets is mainly due to Ib–II stars, which, as discussed above, may in fact not be true SGs, but bright giants. The rest of the difference is due to the late-M stars, which are not correctly selected by any of the automated criteria studied, even though their SG nature is very likely. Thus, for the present analysis we decided to adopt the PCA selection, but also include the five SGs (Ia to Ib) with late subtypes (M5 to M7) that were identified through manual classification, as well as S Per, which is a well-known ERSG (see Section 4.3.1).

The SG content of the Perseus arm was studied by Humphreys (1970, 1978), who found more than 60 CSGs in this region. Later, Levesque et al. (2005) studied the RSG population of the Galaxy, adding a handful of new stars to the list of known RSGs in the Perseus arm. We also took into account a small number of CSG standards from Keenan & McNeil (1989) located in the Perseus arm. Using these works and crossing their lists, we obtained a list of 77 previously known CSGs in the Perseus arm. Among the 197 CSGs we found, there are only six that were included in this list. Thus, our work increases the number of CSGs known in Perseus by 191 stars, more than trebling the size of previous compilations (from 77 to 268 CSGs).
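The bookkeeping behind these numbers is simple, but it combines counts from two paragraphs, so it is spelled out below. All values are taken from the text; the variable names are hypothetical.

```python
n_pca = 191            # SGs selected through P(SG)PCA
n_late_manual = 6      # five late-M SGs plus S Per, added from the manual classification
n_known = 77           # previously known CSGs in the Perseus arm
n_recovered_known = 6  # catalogue stars that were already in the previous list

n_catalogue = n_pca + n_late_manual      # size of the present catalogue
n_new = n_catalogue - n_recovered_known  # CSGs not previously listed
n_global = n_known + n_new               # global population of known CSGs
growth = n_global / n_known              # factor by which the census grows

print(n_catalogue, n_new, n_global, round(growth, 2))
```

The growth factor is close to 3.5, consistent with the statement that the census is more than trebled.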

This large number of CSGs allows us to study statistically the population of CSGs in the Perseus arm with unprecedented significance. Indeed, this sample permits a direct comparison of the CSG population in the Perseus arm and those in the MCs studied in Paper II. For this analysis, we used the SpT and LC given through the manual classification for the CSGs in our Perseus sample, and the classification given in the literature for the rest of the Perseus SGs that had gone to the calibration sample. Unfortunately, the distances to many of these stars still have significant uncertainties, which do not allow us to compare absolute magnitudes. However, in the near future, Gaia will provide reliable and homogeneous distances for almost all of them. We will then use these distances together with the radial velocities obtained from our spectra (which can be compared to the Gaia/RVS radial velocities to detect binarity) to study in detail the spatial and luminosity distributions for the CSG population in the Perseus arm. In the present work, we only analyse the SpT and LC distributions.

When previous works have analysed a given population of RSGs, they have typically found their SpTs to be distributed around a central subtype with maximum frequency. In all populations, the frequency of the subtypes is lower, the farther away from the central value the subtype is. The central subtype is related to the typical metallicity of the population, with later types for higher metallicities (Humphreys 1979; Elias et al. 1985). This effect has been confirmed by recent works for different low-metallicity environments (Levesque & Massey 2012). In Paper II, we confirmed this effect for very large samples in both MCs.

The SpT distribution of the Perseus CSGs found in the present work (the PCA selection plus the six late RSGs visually identified) is shown in Fig. 12. The median SpT of this sample is M1. We also studied the global population (268 CSGs), which includes all the previously known RSGs from the Perseus arm together with all our newly found CSGs. Its histogram is shown in Fig. 13(a). The addition of the set of previously known RSGs not included in our own sample (see Fig. 13a) shifts the median type very slightly, to M1.5. Both median types are slightly earlier than the values typically given for the MW in the literature (M2; Elias et al. 1985; Levesque 2013). However, the difference is not large enough to be truly significant, given the typical uncertainty of one subtype in our manual classifications. We can thus consider our results consistent with the value found in the literature. Despite this, we note that our sample is intrinsically different from any previous sample of Galactic RSGs. With the possible exception of a few background RSGs (which could be present given our magnitude cut, but should be very rare, because of the steeply falling density of young stars towards the outer MW), our sample is volume limited; it represents the total RSG population for a section of a Galactic arm. Previous works are mostly magnitude limited and therefore tend to include an over-representation of later-type M SGs, as these objects tend to be intrinsically brighter (see Paper II and references therein).
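A median spectral type requires placing the subtypes on a numerical scale. The sketch below shows one way of doing so; the scale itself (K0 to K5 mapped to 0 to 5, with M0 following K5 directly) is an assumption made for illustration and is not necessarily the one used for the values quoted above. The function names and the example list are hypothetical.

```python
import statistics


def spt_to_number(spt):
    """Map 'K0'...'K5' to 0...5 and 'M0', 'M1.5', ... to 6, 7.5, ... (assumed scale)."""
    letter, subtype = spt[0].upper(), float(spt[1:])
    offsets = {"K": 0.0, "M": 6.0}
    if letter not in offsets:
        raise ValueError(f"unsupported spectral type: {spt}")
    return offsets[letter] + subtype


def number_to_spt(value):
    """Inverse of spt_to_number for values on the same assumed scale."""
    letter, subtype = ("M", value - 6.0) if value >= 6.0 else ("K", value)
    return f"{letter}{subtype:g}"


def median_spt(spectral_types):
    """Median spectral type of a list such as ['K2', 'M1', 'M3.5']."""
    return number_to_spt(statistics.median(spt_to_number(s) for s in spectral_types))


# Hypothetical sample, for illustration only (not the catalogue).
example_median = median_spt(["K2", "M0", "M1", "M1.5", "M3"])
```

For a list with an even number of entries, `statistics.median` averages the two central values, so the result can fall between tabulated subtypes.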

Figure 12. Distribution of SpTs for the targets identified as SGs using the PCA method.

Figure 13. Left: distribution of SpTs for Perseus CSGs (our sample plus previous identifications). Right: the same sample as in the left-hand panel, but split by LC, with red for Ia, blue for Iab, green for Ib, and black for Ib–II.

The SpT distribution shown presents a clear asymmetry due to the presence of a local maximum at early-K types. This local maximum was not detected by Elias et al. (1985), but is present in Levesque (2013), in their fig. 1. The SGs considered in Elias et al. (1985) were mainly of LC Ia and Iab, while most of the early-K SGs used in Levesque (2013) are of Ib class. This is also the case in our sample; most early-K (K0–K3) SGs present low-luminosity classes (Ib or less, see Fig. 13b). Studies of similar stars in open clusters (e.g. Negueruela & Marco 2012; Alonso-Santiago et al. 2017) show that these low-luminosity SGs with early-K types are in general intermediate-mass stars (of 6–8 M⊙), with typical ages (∼50 Ma) much older than luminous RSGs (typically between 10 and 25 Ma). Therefore, despite their morphological classification as SGs, these stars should not be considered as true SGs, because they are not quite high-mass stars. These stars are not very numerous in our sample (we have 19 stars with early-K types and LC Ib or less luminous) nor in the total population (23 stars). Therefore, our median types do not change if we do not consider these stars as part of the CSG population. It is worthwhile stressing that there are very few K-type true SGs in the MW, to the point that the original list of MK standards contains only one such object (the K3 Iab standard o1 CMa, later moved to K2.5 Iab; Morgan & Keenan 1973), as opposed to five K Ib stars, representative of the lower mass population discussed above (see Johnson & Morgan 1953). This absence of K-type SGs represents the main difference between the present catalogue and those from the MCs, as illustrated by Fig. 13(b).

In Paper II, we found that RSGs in the MCs present a relation between SpTs and LCs, with later typical types for Ia than for Iab stars. As a consequence, we found an earlier typical SpT for each MC than in previous works by a few subtypes. This difference was caused by the inclusion in our survey of a large number of Iab CSGs, while previous studies were centred on the brightest RSGs, mostly Ia (see section 4.2 of Paper II). In contrast, when we analyse the different LC subsamples in Perseus, we do not find any significant difference between Ia and Iab stars, as both groups have the same median SpT: M1 (see Fig. 13b). When we consider the global population, Iab SGs have a median type of M1.5, but a difference of half a subtype cannot be considered significant. These results contrast strongly with the trends found in the MCs. It is unclear, though, if we can derive any reliable conclusions from this difference, because the number of Ia stars in the Perseus sample is too low compared to the number of Iab stars: seven Ia against 83 Iab in our sample; 19 Ia against 116 Iab in the global population.

There are a number of factors to consider before attempting any interpretation. First, there are four early K-type Ia SGs pushing the median type to early types. As mentioned, these SpTs are rare in the MW, and many of these objects present unusual characteristics, such as evidence for binary interaction or heavy mass-loss. Due to the small size of the Ia sample, these rare objects may have a disproportionate impact on the average type. Moreover, we may be biasing our sample because of a classification issue: there are no MK SG standards for SpTs later than M4 (except for α Her, mentioned above, which is not a true SG). At these SpTs, luminosity indicators are strongly affected by the molecular bands, especially TiO bands. In fact, for types later than M3, many luminosity indicators (e.g. the Ca Triplet) do not separate RSGs from red giants (Dorda, Negueruela & González-Fernández 2013, Paper II, and Paper III). Our sample contains a number of RSGs with mid to late types, which were given a generic I classification, as it was not possible to give a more accurate luminosity subclass (see discussion in Negueruela et al. 2012). For calculation purposes, these objects have been assigned to the intermediate luminosity Iab. This could be incorrect, as the few late-M RSGs found in open clusters tend to have much higher luminosities than earlier RSGs in the same clusters (Marco & Negueruela 2013; Negueruela et al. 2013).

Within our sample, we have an interesting example of the situation explained above in the cluster NGC 7419. This rich cluster contains five RSG members; four of them have M0 to M2 Iab types, while the last one, MY Cep, is M7.5 I (Marco & Negueruela 2013). As can be seen in fig. 13 of Marco & Negueruela (2013), MY Cep is about one and a half magnitudes more luminous than the other four RSGs. As MY Cep was the only comparison star available for the manual classification of the late RSGs in our sample, it is reasonable to expect that the three stars classified as M7 I could also be high-luminosity RSGs, as MY Cep is. Four other Ia stars present types M3 to M4. One of them is S Per, a known spectral variable that can present types as late as M7, according to Fawley (1977). In view of this, it is highly likely that we are underestimating the number of late-M Ia RSGs. Even though these are also rare objects, given the small size of the Ia sample, they could move the median to later types. In this context, it is important to note that the MC populations studied in Paper II include very few mid- or late-M SGs. Most MC Ia RSGs were M3 or earlier, allowing their LC classification without the complications that affect luminosity indicators at later types. In addition, the distance to the RSGs in the MCs is well known, allowing direct knowledge of the actual luminosity. In the Perseus sample, we have to resort only to morphological characteristics in most cases, at least until accurate distances are provided by Gaia.

The low number of Ia SGs may be meaningful in itself. On one side, magnitude-limited samples will always have a bias towards intrinsically bright stars that is not present in the Perseus sample. On the other side, the sample of CSGs in the SMC presented in Paper I, which may not be complete, but is at least representative, has a much higher fraction of Ia SGs with respect to the Iab cohort. As discussed in Paper II, there may be two different pathways leading to high-luminosity CSGs. Since stellar evolutionary models (Ekström et al. 2012; Georgy et al. 2013; Brott et al. 2011) indicate that evolution from the hot to the cool side of the Hertzsprung-Russell diagram happens at approximately constant luminosity, the brightest CSGs should be descended from more massive stars (with masses ∼25 M⊙ and up to ∼40 M⊙). On the other hand, observations of open clusters (Negueruela et al. 2013; Beasor & Davies 2016) suggest that less massive stars (with masses between 10 and ∼20 M⊙) could evolve from typical Iab CSGs towards higher luminosities and cooler temperatures at some point in their lives. This idea is suggested by the presence in massive clusters of some RSGs with significantly later SpTs and much higher luminosities than most of the other RSGs in the same cluster (as in the example of NGC 7419 mentioned above).

The low fraction of Ia CSGs in the Perseus arm may shed some light on these issues. Although there are some very young star clusters and associations (mainly Cep OB1 and Cas OB6) in the area surveyed, most of the clusters and OB associations are not young enough to still have any RSGs with high masses (≳ 20 M⊙). The most massive clusters included in the sample region have ages around 15 Ma, with main-sequence turn-offs at B1 V. This is the case of NGC 7419 (Marco & Negueruela 2013) or the double Perseus cluster, the core of the Perseus OB1 association (Slesnick, Hillenbrand & Massey 2002), while the clusters in Cas OB8 are even older. For an age ∼15 Ma, according to Geneva evolutionary models (Ekström et al. 2012), RSGs should be descended from stars with an initial mass ∼15 M⊙ and not be much more luminous than Mbol ∼ −7. As can be seen in fig. 16 of Paper II, most Ia RSGs are more luminous than this value. Therefore, the scarcity of Ia RSGs in Perseus can be interpreted as a direct consequence of the lack of high-mass RSGs, which supports the idea that Ia CSGs come mainly from stars with initial masses between 20 and 40 M⊙. However, there is still a significant fraction (0.07 ± 0.06) of Ia stars, which are not directly related to any very young cluster. For example, continuing with the case of Per OB1, this association contains the well-known ERSG S Per (Humphreys 1978), which has been observed to vary from M4 to M7 Ia. This suggests that indeed some intermediate-mass RSGs may increase their luminosity up to LC Ia from lower luminosities. Their low number in the sample agrees with the small fraction of very luminous RSGs found in massive open clusters.

4.5 Candidates to ERSGs

In Paper III, we proposed the use of two diagrams to detect RSGs affected by veiling, a characteristic effect that ERSGs present at some points in their spectral variation (for details about veiling, see Humphreys 1974, and section 4.4 in Paper III). In Fig. 14, we include the location of the two veiled ERSGs, UY Sct and S Per (which indeed is one of the stars in the Perseus sample), that were available to us. They indicate the typical region where veiled ERSGs seem to lie. For the Perseus sample, we found only one star close to them, outside the main band of giant and supergiant stars. This object, PER433, was rejected as an SG by the P(SG)PCA (and also by the other methods), but given the effect of veiling on atomic lines, this rejection cannot be considered conclusive. In the literature, this object, known as V627 Cas, has been identified as some kind of symbiotic star (Kolotilov et al. 1996). We checked its spectrum and found that it shows the O i line at 8448 Å in emission, which is usual in Be stars, but not expected in ERSGs, since it requires higher temperatures. Its CaT lines are also partially filled in by emission, which explains why this star shows an EW(CaT) much smaller than expected for a giant star. Therefore, we can conclude that this star is not an ERSG.

Figure 14. Depth of the TiO bandhead at 8859 Å with respect to the sum of the EWs of the CaT lines, for the Perseus sample. The colour indicates P(SG)PCA, and the shapes indicate epoch (2011 circles, 2012 squares), except for the two star symbols, which mark the reference ERSGs: the green star is S Per and the red star is UY Sct. Both ERSGs are shown with their own error bars. The black cross indicates the median uncertainties of the sample. The scale used in this figure is the same as in fig. 14(a) of Paper III, which shows the same diagram for the calibration sample, to ease the comparison.

5 CONCLUSIONS AND FUTURE WORK

In Paper III, we proposed a method for using PCA in the identification of CSGs. In the present work, we have developed it further, obtaining a way to estimate the probability that a given spectrum is a CSG, instead of just giving a binary result (‘SG’ or ‘non-SG’). We have then applied this method to a large sample of Galactic stars selected to be part of the Perseus arm. We also compared the results obtained through the PCA method with two other classical criteria studied in Paper III (those based on the CaT and Ti/Fe criteria). In summary, from the analysis presented in this work, we can conclude the following.

  • We find that the efficiencies of all three automated methods are similarly high ( > 90 per cent) for objects that were visually classified as certain CSGs (Ia to Ib), and compatible with those obtained for the calibration sample in Paper III. The results are much worse in the case of those targets visually classified as Ib–II for the three methods, and especially for the PCA one. However, this group of LC Ib–II objects is probably formed mostly by non-SGs, and the automated methods could be simply pointing this out. Finally, we find that the efficiency is almost zero for stars visually identified as SGs having subtypes later than M5, independently of the method used.

  • Although the efficiencies are similarly good in the three cases, the contaminations are very different for each method, when manual classification is used as a reference. As in the case of the MCs, the PCA method provides the cleanest sample of SGs, with a contamination fraction as low as 0.06 ± 0.07, against 0.33 ± 0.06 and 0.18 ± 0.07 for the CaT and Ti/Fe criteria. The contamination found for the PCA method is compatible with that obtained for the calibration sample of Paper III. However, the other two methods result in significantly higher values, probably because the Perseus sample has a larger fraction of bright M giants than our MC samples, since in this case we are observing through the Galactic plane.

  • Using the PCA method, we identified 191 targets as CSGs, plus six RSGs with late SpTs that were identified through the manual classification. These 197 CSGs are a significant fraction of the total sample (0.33 ± 0.04), demonstrating that the photometric selection criteria used have a very high efficiency at moderate reddenings. This sample represents the largest catalogue of CSGs in the MW observed homogeneously, increasing the census of catalogued CSGs in the Perseus arm dramatically: to the 77 CSGs contained in previous lists, this catalogue adds 191 more objects. The list of stars observed, with their corresponding probabilities of being an SG through different methods, is given in Table A1.
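
The efficiency and contamination figures quoted in the list above can be illustrated with a minimal sketch. The definitions used here are assumptions made for illustration: efficiency is taken as the fraction of reference (manually classified) SGs that a method selects, contamination as the fraction of selected objects that are not reference SGs, and the uncertainty as a simple binomial standard error. The probability threshold of 0.5 and the function names are hypothetical; the selection actually used in this work is described in Section 3 and may differ.

```python
import math


def fraction_with_error(k, n):
    """Return k/n and a simple binomial standard error (an assumed estimator)."""
    if n == 0:
        raise ValueError("empty sample")
    f = k / n
    return f, math.sqrt(f * (1.0 - f) / n)


def efficiency_and_contamination(p_sg, is_reference_sg, threshold=0.5):
    """Compare a probability-based selection with a reference classification.

    p_sg            : probabilities of being an SG from one method
    is_reference_sg : booleans from the reference (e.g. manual) classification
    threshold       : hypothetical cut on the probability
    """
    selected = [p >= threshold for p in p_sg]
    n_reference = sum(is_reference_sg)
    n_selected = sum(selected)
    recovered = sum(s and r for s, r in zip(selected, is_reference_sg))
    efficiency = fraction_with_error(recovered, n_reference)
    contamination = fraction_with_error(n_selected - recovered, n_selected)
    return efficiency, contamination


if __name__ == "__main__":
    # Made-up numbers for illustration only; they are not taken from Table A1.
    probabilities = [0.9, 0.8, 0.6, 0.3, 0.2]
    reference = [True, True, False, True, False]
    eff, cont = efficiency_and_contamination(probabilities, reference)
    print("efficiency    = %.2f +/- %.2f" % eff)
    print("contamination = %.2f +/- %.2f" % cont)
```

Keeping the two quantities separate makes the trade-off discussed above explicit: a method can recover nearly all reference SGs and still admit many giants into the selected sample.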

The final catalogue, with almost 200 CSGs, is the largest coherent sample of CSGs observed to date in the Galaxy. In the future, we will use this sample to study both the CSG population and its relation to the structure of the Perseus arm. We will use the radial velocities that we can obtain from our spectra, along with Gaia distances (which will be available for these stars by mid-2018), to study the spatial distribution of the CSGs in the Perseus arm and their relation with nearby clusters and OB associations. In addition, we will also analyse the physical properties of these stars, deriving them from their spectra by using the method that we are developing (Tabernero et al., in preparation). Finally, it is our intention to extend the study of CSG populations towards the inner Galaxy, where we should find higher metallicities, but will also have to contend with much higher extinction and stellar densities.

Acknowledgements

We thank the referee, Professor Roberta Humphreys, for the swiftness of her response. The INT is operated on the island of La Palma by the Isaac Newton Group in the Spanish Observatorio del Roque de Los Muchachos of the Instituto de Astrofísica de Canarias. This research is partially supported by the Spanish Government Ministerio de Economía y Competitividad (MINECO/FEDER) under grant AYA2015-68012-C2-2-P. This research has made use of the SIMBAD, Vizier, and Aladin services developed at the Centre de Données Astronomiques de Strasbourg, France. This research has made use of the WEBDA data base, operated at the Department of Theoretical Physics and Astrophysics of the Masaryk University. It also makes use of data products from the Two Micron All-Sky Survey, which is a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center/California Institute of Technology, funded by the National Aeronautics and Space Administration and the National Science Foundation. This research has made use of the SIMBAD data base, operated at CDS, Strasbourg, France.

Footnotes

1

The temperature scale of RSGs is still an open question. Over the last decade, different works (Levesque et al. 2007, Davies et al. 2013, and Tabernero et al., submitted) have reported quite different temperature ranges. In all cases, though, the effective temperatures of these stars are well below 4500 K.

2

‘Cool supergiants’ is a denomination that includes all red and some yellow supergiants (SGs). In Paper I, we showed that G-type SGs in the SMC (and presumably other low-metallicity environments) are part of the same population as RSGs. This is not the case in the MW, but a few luminous G-type SGs are part of our calibration sample. Thus, we use the term CSG to make reference to the present sample. Despite this, the term RSG is used in many cases, in reference to the samples of K and M SGs studied in previous works (e.g. Humphreys 1978; Levesque et al. 2005).

3

IRAF is distributed by the National Optical Astronomy Observatories, which are operated by the Association of Universities for Research in Astronomy, Inc., under cooperative agreement with the National Science Foundation.

REFERENCES

Alonso-Santiago J., Negueruela I., Marco A., Tabernero H. M., González-Fernández C., Castro N., 2017, MNRAS, 469, 1330

Beasor E. R., Davies B., 2016, MNRAS, 463, 1269

Brott I. et al., 2011, A&A, 530, A115

Choi Y. K., Hachisuka K., Reid M. J., Xu Y., Brunthaler A., Menten K. M., Dame T. M., 2014, ApJ, 790, 99

Davies B. et al., 2013, ApJ, 767, 3

Diaz A. I., Terlevich E., Terlevich R., 1989, MNRAS, 239, 325

Dorda R., Negueruela I., González-Fernández C., 2013, in Kervella P., Le Bertre T., Perrin G., eds, EAS Publ. Ser. Vol. 60, Spectral Classification of Very Late Luminous Stars in the Gaia Region. EDP Sciences, France, p. 299

Dorda R., Negueruela I., González-Fernández C., Tabernero H. M., 2016a, A&A, 592, A16 (Paper II)

Dorda R., González-Fernández C., Negueruela I., 2016b, A&A, 595, A105 (Paper III)

Ekström S. et al., 2012, A&A, 537, A146

Ekström S., Georgy C., Meynet G., Groh J., Granada A., 2013, in Kervella P., Le Bertre T., Perrin G., eds, EAS Publ. Ser. Vol. 60, Red Supergiants and Stellar Evolution. EDP Sciences, France, p. 31

Elias J. H., Frogel J. A., Humphreys R. M., 1985, ApJS, 57, 91

Fawley W. M., 1977, ApJ, 218, 181

Georgy C. et al., 2013, A&A, 558, A103

González-Fernández C., Dorda R., Negueruela I., Marco A., 2015, A&A, 578, A3 (Paper I)

Humphreys R. M., 1970, ApJ, 160, 1149

Humphreys R. M., 1974, ApJ, 188, 75

Humphreys R. M., 1978, ApJS, 38, 309

Humphreys R. M., 1979, ApJ, 231, 384

Humphreys R. M., Davidson K., 1979, ApJ, 232, 409

Johnson H. L., Morgan W. W., 1953, ApJ, 117, 313

Keenan P. C., McNeil R. C., 1989, ApJS, 71, 245

Kolotilov E. A., Munari U., Yudin B. F., Tatarnikov A. M., 1996, Astron. Rep., 40, 812

Levesque E. M., 2013, in Kervella P., Le Bertre T., Perrin G., eds, EAS Publ. Ser. Vol. 60, Red Supergiants in the Local Group. EDP Sciences, France, p. 269

Levesque E. M., Massey P., 2012, AJ, 144, 2

Levesque E. M., Massey P., Olsen K. A. G., Plez B., Josselin E., Maeder A., Meynet G., 2005, ApJ, 628, 973

Levesque E. M., Massey P., Olsen K. A. G., Plez B., 2007, ApJ, 667, 202

Marco A., Negueruela I., 2013, A&A, 552, A92

Monet D. G. et al., 2003, AJ, 125, 984

Moravveji E., Guinan E. F., Khosroshahi H., Wasatonic R., 2013, AJ, 146, 148

Morgan W. W., Keenan P. C., 1973, ARA&A, 11, 29

Negueruela I., Marco A., 2012, AJ, 143, 46

Negueruela I., Marco A., González-Fernández C., Jiménez-Esteban F., Clark J. S., Garcia M., Solano E., 2012, A&A, 547, A15

Negueruela I., González-Fernández C., Dorda R., Marco A., Clark J. S., 2013, in Kervella P., Le Bertre T., Perrin G., eds, EAS Publ. Ser. Vol. 60, The Population of M-type Supergiants in the Starburst Cluster Stephenson 2. EDP Sciences, France, p. 279

Slesnick C. L., Hillenbrand L. A., Massey P., 2002, ApJ, 576, 880

Zacharias N. et al., 2010, AJ, 139, 2184

SUPPORTING INFORMATION

Supplementary data are available at MNRAS online.

Table A1. Small sample of the stars we observed in the Perseus arm.

Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

APPENDIX: SAMPLE OBSERVED

Table A1.

Small sample of the stars we observed in the Perseus arm. We include for each target the observation epoch, the manual classification done, and the calculated probabilities of being an SG (obtained through the PCA method as well as the CaT or Ti/Fe criteria). For more details, see Section 3.

ID      RA (J2000)  Dec. (J2000)  l (deg)  b (deg)   Epoch  Visual classification  P(PCA)  P(CaT)  P(Ti/Fe)
PER001  0:00:10.00  +62:27:36.0   117.044   0.17579  2011   M6.0 II                0.429   0.035   0.251
PER002  0:00:18.00  +60:21:02.0   116.644  −1.89536  2011   M4.5 Ib–II             0.513   0.345   0.299
PER003  0:01:44.20  +62:11:23.8   117.171  −0.12456  2012   M3.0 II                0.595   0.998   0.5
PER004  0:01:46.90  +64:16:36.8   117.575   1.92282  2012   M6.0 II–III            0.364   0.0     0.073
PER005  0:02:20.00  +57:02:14.1   116.261  −5.19714  2012   M8.0 III               0.117   0.0     0.0
PER006  0:02:59.00  +61:22:05.0   117.160  −0.95948  2011   M3.0 Ib–II             0.39    0.998   0.636
PER007  0:04:10.80  +60:55:22.3   117.220  −1.42367  2012   C star
PER008  0:06:39.00  +58:02:18.0   117.015  −4.31797  2011   M5.0 Ib–II             0.357   0.157   0.277
PER009  0:08:58.40  +62:42:57.0   118.087   0.24419  2012   C star
PER010  0:09:26.30  +63:57:14.0   118.341   1.45716  2012   M2.0 Iab               0.879   1.0     1.0
