
Further author information: (Send correspondence to Jackson A. Zariski)
E-mail: jzariski@arizona.edu, Telephone: 1 425 295 4790

Deep learning solutions to telescope pointing and guiding

Jackson Zariski: Steward Observatory, University of Arizona (United States); Department of Applied Mathematics, University of Arizona (United States)
Kaitlin M. Kratter: Steward Observatory, University of Arizona (United States)
Sarah E. Logsdon: NSF National Optical-Infrared Astronomy Research Lab. (United States)
Chad Bender: Steward Observatory, University of Arizona (United States)
Dan Li: NSF National Optical-Infrared Astronomy Research Lab. (United States)
Heidi Schweiker: NSF National Optical-Infrared Astronomy Research Lab. (United States)
Jayadev Rajagopal: NSF National Optical-Infrared Astronomy Research Lab. (United States)
Bill McBride: NSF National Optical-Infrared Astronomy Research Lab. (United States)
Emily Hunting: NSF National Optical-Infrared Astronomy Research Lab. (United States)

Abstract

The WIYN 3.5 m Telescope at Kitt Peak National Observatory hosts a suite of optical and near-infrared instruments, including NEID, an extreme-precision optical spectrograph built for exoplanet radial velocity studies. To achieve sub-m s$^{-1}$ precision, NEID has strict requirements on survey efficiency, stellar image positioning, and guiding performance, which have exceeded the native capabilities of the telescope’s original pointing and tracking system. To improve the operational efficiency of the telescope, we have developed a novel telescope pointing system, built on a recurrent neural network, that does not rely on the usual pointing models (TPoint or other quasi-physical bases). We discuss the development of this system and how the intrinsic properties of the pointing problem inform our network design, and we show preliminary results from our best models. We also discuss plans for generalizing this framework so that it can be applied at other sites.

keywords:
Machine-Learning, Regression, NEID, WIYN, pointing, guiding

1 Introduction

The WIYN telescope$^{1}$, located at Kitt Peak National Observatory in Arizona, has a primary mirror measuring 3.5 m in diameter, with a total weight of approximately 35 tons. It currently hosts several facility instruments, including NEID, a high-precision optical fiber-fed spectrograph built to conduct an exoplanet radial velocity survey [5, 13]. To meet its survey goals, NEID requires a high-efficiency observing strategy in order to observe an average of approximately 15 targets per night of operation. At present, survey efficiency is hampered by an imprecise, multi-stage target acquisition process. Targets are first acquired through a telescope-mounted Star Tracker camera, then centered on the NEID science fiber through multiple iterations with a secondary guider camera. This process not only takes time and operator input, but occasionally fails entirely when a target cannot be located in the wide-field Star Tracker camera.

$^{1}$ The WIYN Observatory is a joint facility of the NSF’s National Optical-Infrared Astronomy Research Laboratory, Indiana University, the University of Wisconsin-Madison, Pennsylvania State University, Purdue University and Princeton University.

Figure 1: A historical look at how acquisition errors at the WIYN depend on mount azimuth. Each point represents the error, measured in arcminutes, between the expected target $(\alpha,\delta)$ and that measured by the acquisition camera based on an astrometric database. Colors (see legend) indicate different time epochs in the data. We can see noticeable, systematic errors at certain coordinates that shift predictably in time.

The WIYN, like many professional telescopes, currently employs proprietary software, TPoint [15], to generate a full-sky pointing solution. This regression model requires re-calibration multiple times per year, with telescope operators spending up to half a night acquiring benchmark stars across the sky. Re-calibration is required because pointing solutions drift due to myriad effects, such as motor wear, instrument re-mounting, and thermal expansion and contraction [14]. Moreover, the pointing model has highly non-uniform, time-dependent errors. In Fig. 1, we show historical pointing errors as a function of telescope azimuth. Consequently, on any given night, targets that happen to fall at high-error azimuths will likely be difficult to acquire. In the current implementation, these errors are neither systematically tracked nor reported to operators and engineers to alert them to possible acquisition challenges (or mechanical issues).

To improve the WIYN’s overall operating efficiency and in particular enable NEID to meet its design targets, we are developing a novel method for generating an adaptive, accurate pointing model using various neural network architectures. We have built a prototype model based on historical telemetry that can accurately predict telescope positioning within roughly $0.05^{\circ}$, compared to a typical error closer to $0.2^{\circ}$ with the current system. Moreover, such a machine-learning informed model does not require that any additional observing time be dedicated to data collection or re-calibration, as requisite data is gained through normal operation, and model re-calibration can be done offline during daylight hours on a daily basis.

In Sec. 2 we provide an accessible summary of the machine-learning models we employ. In Sec. 3 we describe the development of our prototype pointing model using nearly a decade of historical telemetry. We then continue in Sec. 4 to describe ongoing efforts to generate a predictive model for target tracking during an observing sequence to reduce unnecessary telescope slews due to inaccuracies in the underlying pointing solution. We conclude in Sec. 5 with a discussion of future directions for our work.

2 Background to Deep Learning

To introduce some of the primary machine-learning concepts that we employ, consider just the acquisition problem in the following simplified context. Suppose we have a celestial target, written in celestial coordinates as $(\alpha,\delta)$, in addition to a set of values $W$ describing weather and atmospheric conditions and a set of values $P$ describing telescope-specific variables (guiding camera settings, time of observation, etc.). Our ultimate goal then is to numerically generate some function $f(\alpha,\delta,W,P)\to H$, where $H$ is the mathematical space of horizon coordinates bounded by the minimum and maximum azimuth and elevation values that can be obtained by the telescope. The function $f$ should seek to minimize the difference between the pixel-center of the acquisition system and the acquired image of the celestial object. To find said function numerically, a variety of learning techniques exist, which can generally be separated into general iterative regression techniques and those requiring deep learning.

2.1 Shallow Versus Deep Learning

When discussing common machine-learning techniques, and deep learning in particular, we must first distinguish between modern neural networks and traditional statistical methods. In our context, the term ‘shallow regression’ will refer to models with just two layers, an input layer and an output layer, though this nomenclature is not standard. Regression in this sense can take a variety of forms, but it generally entails developing a function as described above from some linear combination of weighted features. Using the previous notation, we would generate the function $f$ as

$$f(\mathbf{x}) = a(\ell_{i} x_{i}), \qquad \mathbf{x} \in \cup\{\alpha,\delta,W,P\}. \tag{1}$$

Here $a(\cdot)$ is some function (called an activation), often a linear combination in the case of most linear regression schemes, though polynomial regression is also common for situations requiring more complexity. Deriving the coefficients $\ell_{i}$ that generate a function for the best possible fit of the data is a non-trivial task with a variety of possible avenues for ‘training’. As an example, the current system at the WIYN and many other telescopes (TPoint) uses a proprietary fitting method to determine various coefficients for a pre-defined function [15]. In addition, the TPoint solution is revised through a few ‘test’ acquisitions following re-calibration, a process that requires substantial time on sky (roughly half a night, multiple times per year).

To contextualize the deep learning process, we first briefly describe the process for general iterative methods, which are trained and tested separately as follows. First, the available data is split into a training set and a testing set. Following this, a loss function $L$ is chosen to quantify the error in the model; mean-squared loss is a common choice. A set of weights $\{\ell^{0}_{i}\}$ is initialized and a loss is calculated following a pass of the training data; then, through some differentiation scheme (auto-diff, finite differences, etc.), the values of $\frac{\partial L}{\partial \ell_{i}}$ are derived. Finally, the initialized weights are updated through the operation

$$\ell_{i}^{k+1} = \ell_{i}^{k} - \eta \frac{\partial L}{\partial \ell_{i}^{k}}, \tag{2}$$

where $k$ is the current iterative step (epoch) and $\eta$ is some learning rate, which can be either constant or adjusted during training. Following a certain number of iterations, training ceases and the function $f(\mathbf{x})$ is evaluated on the testing set with the current weights to measure performance.
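As a minimal sketch of the procedure above, the following NumPy example trains a shallow linear model with the update rule of Eq. (2) and a mean-squared loss, then evaluates it on a held-out testing set. The toy data, the learning rate, the epoch count, and all function names are assumptions chosen for illustration; they are not taken from the WIYN pipeline.

```python
import numpy as np

def train_shallow_regression(X, y, eta=0.1, epochs=500, seed=0):
    """Fit f(x) = sum_i l_i x_i by gradient descent on a mean-squared loss."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=0.01, size=X.shape[1])  # initial weights l_i^0
    for _ in range(epochs):
        residual = X @ weights - y              # model output minus target
        grad = 2.0 * X.T @ residual / len(y)    # dL/dl_i for mean-squared loss
        weights = weights - eta * grad          # update rule of Eq. (2)
    return weights

def mse(X, y, weights):
    """Mean-squared loss of the linear model on a data set."""
    return float(np.mean((X @ weights - y) ** 2))

# Hypothetical toy data: three features and a known linear relationship.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_weights = np.array([0.5, -1.0, 2.0])
y = X @ true_weights + rng.normal(scale=0.01, size=200)

# Split into training and testing sets, train, then measure test performance.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]
weights = train_shallow_regression(X_train, y_train)
test_loss = mse(X_test, y_test, weights)
```

Here the gradient is written out analytically because the model is linear; for deeper models an automatic differentiation scheme would take its place.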

Figure 2: An example of a shallow network. The network accepts an input array $\mathbf{x}$ comprised of three features, and generates a predictive function $f(\mathbf{x})$. The activation used in this network is a simple linear combination. The values $w_{i}$ are the weights of the network.

What differentiates the above learning scheme from the deep learning techniques that we currently employ is the inclusion of one or multiple ‘hidden layers’ in the model. Now, instead of inputs to the network being weighted, activated and then directly returned as the output, they are instead passed to other layers, where further activations occur and the process continues. Gradient calculations for the loss function with respect to the weights are found through a procedure called back-propagation [2]. The partials for the weights connecting the output layer are computed first; then, through a clever implementation of auto-differentiation and dynamic programming, gradient calculation is propagated backwards towards the weights directly connecting the input layer to the first hidden layer. To apply this using the notation from above, we can represent the state of the $n$th singular neuron in a deep network through some function $g_{n}$ where

$$g_{n}(\mathbf{x}) = a_{n}(\cdots L_{2} a_{1}(L_{1} a_{0}(L_{0}\mathbf{x}))), \qquad \mathbf{x} \in \cup\{\alpha,\delta,W,P\}. \tag{3}$$

Here, each $L_{i}$ is the set of weights connecting to the corresponding previous neuron. So, $L_{0}$ is the set of weights directly connecting the input sequence to the first hidden layer, $L_{1}$ connects to the second, and so forth. Thus, deep learning clearly offers substantially more complexity than more traditional shallow methods, though with this comes an increase in time and memory requirements for both the training program and functional evaluation at runtime. The architecture described above is known as a simple feed-forward, densely connected network. While useful for certain tasks, our problem, which we can represent as a time-series forecast, benefits from a more complex architecture.
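The nested composition in Eq. (3) can be sketched as a loop over layers. This is an illustrative NumPy example only: the layer sizes, the random weights, and the choice of hyperbolic tangent for the hidden activations are assumptions, and bias terms are omitted to match the notation of Eq. (3).

```python
import numpy as np

def forward(x, layers):
    """Evaluate a_n(... L_1 a_0(L_0 x)) for a list of (weights, activation) pairs."""
    state = x
    for weights, activation in layers:
        state = activation(weights @ state)  # weight the inputs, then activate
    return state

def identity(z):
    """Linear output activation, a(x) = x."""
    return z

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), np.tanh),   # L_0, a_0: three inputs to four hidden nodes
    (rng.normal(size=(4, 4)), np.tanh),   # L_1, a_1: second hidden layer
    (rng.normal(size=(2, 4)), identity),  # L_2, a_2: two continuous outputs
]
x = np.array([0.1, -0.3, 0.7])            # hypothetical three-feature input
prediction = forward(x, layers)
```

Each additional `(weights, activation)` pair adds one more level of nesting, which is the source of both the extra expressiveness and the extra cost noted above.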

2.2 A Deeper Dive into Deep Learning: Recurrent Neural Networks

While feed-forward neural networks are perfectly adept at modeling nonlinear relationships, when it comes to sequential data, the recurrent neural network stands out as a clear improvement. In essence, data is fed into a recurrent network grouped into sequences (such as sentences in a natural language processing model, or in our case, as sequential target acquisitions). Here, we classify an observation as a set of positional/environmental data points that is recorded at a single time-step. The first observation in the input sequence is weighted just like before; however, the output is used as a factor for the next observation in the sequence, in conjunction with the same weights. This continues throughout the sequence, and down through the network. Hence, a recurrent neural network acts as a ‘folded-up’ feed-forward network that when unfurled, has both temporal weights and weights connecting neurons. While these extra temporal weights allow the model to learn connections between observations in a sequence (these networks can even be bidirectional where they learn from both directions), some problems can arise. For sequences of large length, the gradient is being multiplied by some repeated weights a large number of times. If these weights are small, $\ell<1$, the gradient will tend to vanish and approach zero, meaning that reaching a minimum for the loss function will take exponentially longer, or will be impossible. Meanwhile, with $\ell>1$, one encounters the exploding gradient problem, where the back-propagation process consistently over-shoots the minimum, again making convergence extremely difficult [2]. These problems can also arise in feed-forward networks (especially very deep ones), though due to the temporally linked nature of recurrent networks the problems are much more prevalent. However, some modifications to the recurrent unit exist to help rectify this issue.
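A deliberately simplified illustration of this effect treats the recurrent weight as a single scalar and ignores activation derivatives, so that the gradient contribution after a sequence is just a repeated product. The weights 0.9 and 1.1 and the sequence length of 100 are arbitrary values chosen for the example.

```python
def repeated_weight_factor(weight, sequence_length):
    """Product of the same recurrent weight applied once per time step."""
    factor = 1.0
    for _ in range(sequence_length):
        factor *= weight
    return factor

# A weight slightly below one shrinks the factor towards zero (vanishing),
# while a weight slightly above one makes it grow rapidly (exploding).
vanishing = repeated_weight_factor(0.9, 100)
exploding = repeated_weight_factor(1.1, 100)
```

With these values the first factor is of order $10^{-5}$ and the second of order $10^{4}$, even though both weights differ from one by only 0.1.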

Figure 3: An example of a deep network with two additional hidden layers. The network again accepts an input array $\mathbf{x}$ comprised of three features, and generates a predictive function $f(\mathbf{x})$. This time, however, the input is propagated through nodes in the hidden layers, each with their own activations that add extra non-linearity to the model. Note that in general the choice of activation stays the same amongst nodes in the same layer.

The problem with recurrent units described above is a problem of good ‘short-term’ memory and poor ‘long-term’ memory. Shorter sequences are exposed to fewer weight multiplications, and thus are less prone to vanishing or exploding gradients. Longer sequences, however, are more exposed to this problem. Hence, we need a way to reliably train recurrent neural networks for sequences of any reasonable length. This is where we see the benefit of the Long Short-Term Memory (LSTM) unit, which has some extra variables in place to better mitigate this issue. Essentially, the LSTM cell contains a few running values (that are updated as sequences are propagated forward through the unit) which decide how much information to pass along without relying on compounding multiplications. In addition to the LSTM unit, a more recent adaptation is the Gated Recurrent Unit (GRU). With fewer operations, the GRU is less computationally expensive than the LSTM unit [4], though it can still be very effective. In general, during the training of a recurrent neural network, we treat the recurrent unit as a hyperparameter to examine, and try training the model using both types.
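One rough way to see the difference in cost is to count trainable parameters. In the standard textbook formulations, an LSTM unit has four gate/candidate blocks and a GRU has three, each with input weights, recurrent weights, and a bias. The sketch below assumes those formulations; the feature and unit counts are hypothetical, and some library implementations add extra bias terms, so exact counts can differ.

```python
def gated_unit_parameters(n_features, n_units, n_blocks):
    """Weights and biases of a gated recurrent layer with n_blocks gate/candidate blocks."""
    # Each block has input weights, recurrent weights, and one bias per unit.
    per_block = n_units * (n_features + n_units) + n_units
    return n_blocks * per_block

# Hypothetical layer: 12 input features and 64 recurrent units.
lstm_params = gated_unit_parameters(n_features=12, n_units=64, n_blocks=4)
gru_params = gated_unit_parameters(n_features=12, n_units=64, n_blocks=3)
```

Under these assumptions the GRU layer carries three quarters of the parameters of the LSTM layer of the same width.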

2.3 Tuning Hyperparameters

The word ‘hyperparameter’ is very broad: it is used to reference anything from the number of hidden layers in a network to the type of units that compose the layers themselves. While there is no exact science to tuning hyperparameters (though automatic tuning is possible with a variety of different methods), one can glean some intuition by analyzing how each parameter affects the network. Hyperparameters that we commonly tune for our purposes include:

  • Dropout (Regular and Recurrent): Dropout is one of the primary ways to reduce over-fitting in a neural network. Regular dropout is included as a ‘layer’ in the network, and assigned some probability $p$. During the forward step of training, neurons in the hidden layer following the dropout layer are ‘dropped’ with probability $1-p$. This process thins out the layer, and helps prevent the network from fitting too closely to the training data while making the model more general. Recurrent dropout works in a similar fashion, but is applied to the recurrent connections in a recurrent cell rather than the nodes themselves.

  • Batch-Size: This hyperparameter determines how we feed our training data through the network. If our batch-size is $b$, the network takes $b$ samples from the training set, runs them through the network, and then performs back-propagation to complete a part of training. This is then repeated for the next $b$ samples, and so on until all of the data is used. Note that larger batch sizes thus entail fewer weight updates per pass through the data, and make the process less computationally expensive. However, they tend to add to the over-fitting problem, as they attract local minima solutions to the loss function, rather than global ones [9].

  • Activation Function: This is the function applied to output values from a layer in the network. These functions are used for a variety of reasons, with one important reason being the mechanism to allow the network to output the correct class of data. In our networks, our final goal is to predict some continuous variable representing a desired targeting input for the telescope. As such, the values determined by our network must be continuous. In our case regarding time-series regression, the activation function in the output layer is simply $a(x)=x$, a linear activation. In addition to their use in the output layer, activation functions in the hidden layers of the network can add non-linearity to the model. In our case specifically, we utilize the Hyperbolic Tangent function (a popular choice for recurrent networks) and the Swish function, $\mathrm{swish}(x)=\frac{x}{1+e^{-\beta x}},~\beta>0$, to achieve this goal. These hidden layer activation functions are tuned as hyperparameters, since the function in the output layer is predetermined based on the purpose of the model.
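The three hyperparameters listed above can each be sketched in a few lines of NumPy. This is an illustration rather than the implementation used for our networks: the function names are hypothetical, $p$ is treated as the probability of keeping a unit (so units are dropped with probability $1-p$), and the rescaling of kept units by $1/p$ is a common convention assumed here, not something specified above.

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish activation, x / (1 + exp(-beta * x)), with beta > 0."""
    return x / (1.0 + np.exp(-beta * x))

def dropout(activations, keep_prob, rng):
    """Keep each unit with probability keep_prob, drop it otherwise."""
    mask = rng.random(activations.shape) < keep_prob
    # Rescale kept units so the expected activation is unchanged (assumed convention).
    return activations * mask / keep_prob

def iterate_batches(n_samples, batch_size):
    """Yield (start, stop) index pairs covering the training set in batches of size b."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

# Example: ten samples with batch-size four give three weight updates per pass.
batches = list(iterate_batches(10, 4))
thinned = dropout(np.ones(1000), keep_prob=0.5, rng=np.random.default_rng(0))
```

Doubling the batch size in `iterate_batches` halves the number of updates per pass through the data, which is the cost trade-off described in the batch-size item.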

3 Acquisition

Figure 4: Feature and procedure diagram for our acquisition network.

The WIYN presents a very well-suited environment for the development of a deep learning pointing solution, as we have both positional and environmental data logs dating back to 2014, as well as an already established targeting framework in TPoint to use as a baseline for improvement. Our system is built around the Star Tracker camera, an Allied Vision Technologies GT1920 with a Tele-Xenar 2.2/70 lens fitted with an IR filter and a focal plane of 1936 × 1456 pixels. The field of view of the camera is approximately $7^{\circ}\times 5^{\circ}$, with a pixel size of $13^{\prime\prime}$. The camera’s effective bandpass is 400-700 nm [12]. Mounted at the top end of the telescope, on a frame supporting the secondary mirror, the camera system uses a World Coordinate System to determine pointing by matching stars in its field to known objects in astrometric catalogs. However, this ‘real’ pointing location is only derived after the fact, when the telescope mount has already moved into position. Hence, we use the celestial coordinate center calculated from the Star Tracker camera (in $\alpha,\delta$) as the ground-truth positions that we aim to predict with our neural network. These coordinates are hereafter referred to as the AST (astronomical) coordinates. With the Star Tracker camera as a base, our system represents the pointing center of the telescope as the coordinate center of the image reported by the camera. Note that ultimately we desire a system that is trained to predict the horizon coordinates measured by the mount encoders, taking celestial coordinates as input. However, the historical dataset does not include the requisite data to analyze the model performance and errors in this direction, so to validate our method we choose to predict the celestial coordinates.

Our training data consists of roughly 50,000 target acquisitions over the course of eight years of observations. In addition to the initial horizon coordinates from the mount controller, we use features such as the angles of various port rotators on the telescope, the parallactic angle of the object we’re observing, and the current pixel center on the camera. In addition, to bolster our model’s ability to detect underlying dependencies of the pointing solution on weather, we include information on the dewpoint, temperature, wind-speed, and other climate-related features. Our target outputs are trained on the astrometrically determined celestial coordinates of the pixel center of the star tracking camera.

Though the acquisition data is not a uniform time series (targets are not acquired/reacquired at evenly spaced intervals), over the period of a night we do not expect too much deviation in the engineering and temperature variables we hope to capture in our network. In addition, some of these variables are included as features in our network as well, namely those that are quickly measurable relating to atmospheric conditions. Fig. 4 shows how the inputs to our recurrent network can be unraveled into individual observations.

Table 1: Median Absolute Differences with AST Celestial Coordinates during Target Acquisition

Coordinate           TCS System   Recurrent Neural Network (NN)
AST RA (Degrees)     0.2173       0.0704
AST Dec (Degrees)    0.2321       0.0416
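For reference, a median absolute difference of the kind reported in Table 1 can be computed as in the sketch below. The coordinate values are made up for illustration, and the optional wrap of right ascension differences across the $0^{\circ}/360^{\circ}$ boundary is an assumption about how such differences might be handled, not a description of how the tabulated values were produced.

```python
import numpy as np

def median_abs_difference(predicted_deg, reference_deg, wrap=False):
    """Median of |predicted - reference| in degrees, optionally wrapped to [-180, 180)."""
    diff = np.asarray(predicted_deg) - np.asarray(reference_deg)
    if wrap:
        # Map differences such as 359.9 degrees to the equivalent -0.1 degrees.
        diff = (diff + 180.0) % 360.0 - 180.0
    return float(np.median(np.abs(diff)))

# Hypothetical right ascension predictions near the 0/360 degree boundary.
ra_pred = np.array([359.95, 359.98, 120.10])
ra_ref = np.array([0.05, 0.01, 120.00])
wrapped = median_abs_difference(ra_pred, ra_ref, wrap=True)
unwrapped = median_abs_difference(ra_pred, ra_ref, wrap=False)
```

Without the wrap, two of the three toy samples would register errors of nearly a full circle, which is why the boundary needs care for right ascension but not for declination.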

As noted above, because of the nature of the current data available to us, we use the mount encoder horizon coordinates to predict celestial coordinates, in order to analyze the performance of our model relative to the existing telescope pointing solution. Summary metrics for our model’s performance are shown in Table 1. To program these neural networks, we used the TensorFlow package in Python, which allows for the construction of networks with back-propagation done internally on the back-end [1]. We have validated that our model functions in reverse (i.e. using celestial coordinates to predict mount encoder horizon coordinates), but cannot generate a robust comparison with the existing pointing model in this direction. We illustrate the current performance of our model in Fig. 6. Here, NN is the shorthand nomenclature for our recurrent network, TCS refers to the current pointing system [10], and the ‘astropy’ coordinates are shown for reference as the predictions from the Python Astropy coordinate transformation package [3, 11]. In addition, Fig. 5 shows how our deep learning framework is less prone to systematic errors in the azimuthal direction. Instead we see more uniform errors, and overall improved accuracy in comparison to the existing pointing solution.

While the errors in our neural network model are still relatively large in absolute terms, for example compared to the maximal precision afforded by the resolution of the star tracker camera ($\sim 6-7^{\prime\prime}$), the historical data logs are missing some relevant data that might lead to an improved solution. For example we do not have data on which instrument was being used for each target, nor which instruments were mounted at the time, which affect the weight and balance of the telescope. Finally, while the Star Tracker camera is aligned to the telescope azimuth and elevation, it is not directly aligned to every instrument’s focal plane (some of which have field derotators), which could introduce additional errors. These features could all be included in future telemetry recorded with an eye towards developing new pointing models, and then easily ingested into the neural network training.

Figure 5: A comparison of how acquisition error of both the current WIYN system (TCS) and our neural network (NN) depends on azimuth. The orange line indicates the median error value for the given system and coordinate. Other than some clear outliers, the deep learning model is less dependent on azimuth, and instead has more uniform accuracy throughout the entire coordinate interval.

4 Tracking–Synthetically-Generated Data

4.1 Tracking Corrections with a Recurrent Neural Network

Figure 6: The cumulative distribution function representing the fraction of observations on the test set that fall below a given absolute error in each coordinate, shown with a logarithmic scale. Here, the astropy measurements (blue) come directly from the coordinate transform done via the astropy software[3]. This accounts for phenomena such as precession, nutation, etc. The TCS measurement (orange) comes from the current system at the WIYN and NN (green) is our neural network. Our network displays a steep rise at lower absolute errors compared to the current system and astropy coordinates, indicating a substantial improvement in accuracy for most targets. Thus we conclude that deep learning can be an effective tool for target acquisition.

In order to maintain the target on the fiber, NEID requires a pointing precision during observations of $0.05^{\prime\prime}$ for stars of $V = 12$ mag or brighter under median seeing and wind conditions ($0.2^{\prime\prime}$ for $12 < V < 16$ mag) [8, 6]. To maintain this accuracy, NEID employs two feedback loops while on target. The inner, high-cadence loop (27 Hz) measures target offsets in the guider camera and adjusts a tip-tilt mirror at the image pupil for re-centering. The outer loop, with a cadence of 0.5 Hz, sends offsets measured on the guider camera to the telescope control system, where pixel offsets are converted to celestial coordinates and subsequently to horizon coordinates using the existing pointing solution [7]. We highlight two potential sources of error, which we aim to address with predictive machine-learning models. First, both the inner and outer loop adjustments are inherently corrective, rather than predictive. By the time the telescope slew commands have been processed, the offset will be different from what has been computed. In addition, the multiple coordinate conversions (pixel to celestial to horizon) add errors into the model due to underlying inaccuracies in the pointing solution. We instead aim to train a predictive model that directly outputs the required adjustments in horizon coordinates, at the correct future timestamp when the telescope mount will execute the command, including latency in model compute time and system control. Crucially, as we discuss below, we design a self-correcting, or auto-regressive, network, which is not only trained on historical data, but is informed and corrected by measuring pointing errors during tracking of each individual target.

Because implementation of this system requires specialized telemetry collection during scientific observations, we begin by validating this technique using a synthetically generated dataset. Using the acquisition points recorded in the Star Tracker log as a base set of astronomical targets, we generate synthetic tracking data for each target with observing lengths ranging from a few minutes to half an hour [5], with an exposure cadence of 1 Hz. We use astropy [3, 11] to convert the target $(\alpha,\delta)$ from the Star Tracker log into a sequence of horizon coordinates as a function of the putative observing time. We then modulate this data to introduce both systematic and random errors to mimic deviations due to e.g. weather, telescope sag, load balance, and motor wear.

Our synthetic data takes the following form. First, define the random variable $X\sim\mathrm{Uniform}(u_{1},u_{2})$ (where $u_{1}$ and $u_{2}$ can be adjusted) with $n$-length vector $\mathbf{X}$ sampled from this distribution, and a cumulative sum where

$$\mathbf{S} = \left(S_{0},\cdots,S_{n}\right), \qquad S_{i} = \sum_{k=0}^{i}\mathbf{X}_{k}. \tag{4}$$

Next, let F be a tensor of synthetic atmospheric features, where initial values are taken from a normal distribution and then propagated over a tracking period with some added noise to more closely mimic a stable yet slightly unpredictable pattern. Define the function T:F:𝑇FT:\textbf{F}\to\mathbb{R}italic_T : F → blackboard_R, which we generally write as a polynomial combination of synthetic features for smoothness purposes, though this can be changed to investigate other problems. In addition, we also include some systematic offsets at various intervals of horizon coordinate values to mimic machine and motor error. Calling this function M𝑀Mitalic_M, then, the altered coordinates in our path take the form:

$$(\textbf{Az},\textbf{El})=(\textbf{S}_{1},\textbf{S}_{2})+\epsilon_{1}T(\textbf{F})+\epsilon_{2}M(\textbf{Az},\textbf{El}), \tag{5}$$

where $\epsilon_{1},\epsilon_{2}\in\mathbb{R}$ depend on the magnitude of the features for the problem at hand and the desired amount of noise. These pairs of altered coordinates act as the output we train to predict. Henceforth, we will refer to the sequence of observations making up a tracking path in our data as $P$.
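The construction in Eq. (5) can be sketched as follows. The specific polynomial used for T, the coordinate intervals and offsets used for M, the drift scale of the features, and the choice to evaluate M on the unperturbed coordinates are all assumptions made for this example; they are not the values used to produce the results reported here.

```python
import random


def propagate_feature(n, rng, step_sigma=0.01):
    """One synthetic feature: a normal initial value followed by small Gaussian steps."""
    value = rng.gauss(0.0, 1.0)
    series = []
    for _ in range(n):
        series.append(value)
        value += rng.gauss(0.0, step_sigma)
    return series


def T(features):
    """Hypothetical polynomial combination of two features at one time-step."""
    f1, f2 = features
    return f1 + 0.5 * f2 ** 2


def M(az, el):
    """Hypothetical systematic offsets that switch on in fixed coordinate intervals."""
    offset_az = 0.01 if 90.0 <= az % 360.0 < 180.0 else 0.0
    offset_el = -0.005 if el > 60.0 else 0.0
    return offset_az, offset_el


def alter_path(base_az, base_el, features, eps1, eps2):
    """Apply the two error terms of Eq. (5) to each (Az, El) sample of a path."""
    altered = []
    for az, el, f in zip(base_az, base_el, features):
        t = T(f)
        m_az, m_el = M(az, el)
        altered.append((az + eps1 * t + eps2 * m_az, el + eps1 * t + eps2 * m_el))
    return altered


# Example: a short path with two propagated features per time-step.
rng = random.Random(1)
n = 50
feats = list(zip(propagate_feature(n, rng), propagate_feature(n, rng)))
base_az = [100.0 + 0.004 * i for i in range(n)]
base_el = [70.0 + 0.002 * i for i in range(n)]
path = alter_path(base_az, base_el, feats, eps1=0.001, eps2=1.0)
```

Setting both coefficients to zero returns the unperturbed path, which is a convenient sanity check when tuning the amount of injected error.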

In order to mimic how the tracking process would function on sky, we split up our training and testing data into input and output sequences separated by a time lag. If one piece of a synthetic path $P$ can be represented as the ordered set $P_{a}^{b}=\{\textbf{p}_{i}\}_{i=a}^{b}$ where each $\textbf{p}_{i}$ is an observation array of $k$ features at time $i$, we separate our data in a sliding window with

$$I=\bigcup_{i=0}^{n-b}P_{i}^{i+b},~~~~~O=\bigcup_{i=s+b}^{n}(Az,El)_{i}, \tag{6}$$

where $I$ and $O$ are the input and output data sets respectively. Here, $b$ is the length of the sequences of observations we use as inputs into our model, while $s$ is the delay between the end of the input sequence and the time-step for which we are trying to predict the optimal horizon coordinates. With this system, we are able to use outputs and corrected measurements from previous observations as features to predict future time-steps. While the NEID system currently has latency of order $s<5$, we conservatively set $s=b=5$ to better support generalization to other systems. This delay accounts for the combined latency of measuring offsets from the NEID guide camera, the model processing time, and the deployment of an additional corrective layer that incorporates recent model error to correct the pre-trained neural network. This additional layer is auto-regressive, a term for a deep learning framework in which previous predictions are used to inform future outputs. Note that this differs from the recurrent architectures described in Sec. 2, where "recurrent" refers only to the inputs being sequences of observations. Auto-regressive techniques are often used in conjunction with recurrent networks, as we do here. Fig. 7 shows the overall outline of this network in more detail.
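The windowing of Eq. (6) can be sketched as below. The sketch follows the inclusive indexing of the equation, so each window spans indices i through i+b, and it keeps only those windows for which a target at index i+b+s exists; that pairing, along with the function and variable names, is an assumption of this example.

```python
def make_windows(path, targets, b, s):
    """Pair each input window path[i..i+b] with the target observed s steps after it ends.

    path    : list of per-time-step observation arrays p_i
    targets : list of (Az, El) pairs aligned with path
    b, s    : window span and prediction delay, in time-steps
    """
    if len(path) != len(targets):
        raise ValueError("path and targets must have the same length")
    inputs, outputs = [], []
    last_start = len(path) - 1 - b - s  # last i with a valid target index
    for i in range(last_start + 1):
        inputs.append(path[i:i + b + 1])    # indices i .. i+b inclusive
        outputs.append(targets[i + b + s])  # first target sits at index s+b
    return inputs, outputs


# Example with 20 time-steps and s = b = 5.
demo_path = [[float(i)] for i in range(20)]
demo_targets = [(float(i), float(i) + 0.5) for i in range(20)]
windows, labels = make_windows(demo_path, demo_targets, b=5, s=5)
```

In this example the first window covers time-steps 0 through 5 and is paired with the coordinates at time-step 10, matching the lower limit of the output union in Eq. (6).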

Figure 7: Architecture of the Tracking Network
Figure 8: A visualization of our model for predictive tracking using one path sample in our testing set. These examples indicate the success of our model at forecasting complex patterns. Again, the uncorrected azimuth (green) comes directly from the coordinate transform done with astropy. The "true" azimuth (orange) is that of the target with the additional systematic error introduced by our synthetic features, and the NN azimuth (blue) is our network's forecasted prediction.

To gauge the effectiveness of these models, we look at the 95th percentile of the percentage error in the azimuth and elevation offsets. This figure tracks overall performance while being minimally biased by outliers. Table 2 displays these percentile values for the recurrent networks on the testing set. In addition, we compare with the astropy coordinate transforms, which serve as our telescope-independent benchmark. Finally, we show the performance of a shallow linear model. For a detailed look at one example, Fig. 8 displays a subset of a single "observation" or tracking path, illustrating the performance of our predictive GRU network. We show the performance on three scales: degrees, arcminutes, and arcseconds. The intentionally introduced "systematic" errors and those relating to atmospheric conditions in our synthetic data result in shifts from the baseline astropy coordinate transforms at the level of several arcminutes (as indicated in the first row of Table 2). The synthetic data in our model includes some random noise on the order of a few arcseconds (depending on the length of the path), so we expect our accuracy to be of this same order.

Table 2: Recurrent Network Performance Statistics

| Model | Azimuth $P_{95}$ Percent Error | Elevation $P_{95}$ Percent Error | Az (′′) | El (′′) |
|---|---|---|---|---|
| Astro Baseline | 6.5075 | 6.4752 | 234.27 | 233.10 |
| Linear Baseline | 0.1655 | 0.1312 | 5.96 | 4.72 |
| LSTM | 0.0850 | 0.0887 | 3.06 | 3.19 |
| GRU | 0.0610 | 0.0702 | 2.20 | 2.53 |
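A 95th-percentile offset statistic of the kind reported above can be sketched as follows. The exact error definition behind the tabulated values is not reproduced here: this example assumes the offset is the absolute difference between predicted and true coordinates, expressed in arcseconds, and uses linear interpolation between order statistics. Azimuth wrap-around at 360 degrees is not handled.

```python
def percentile(values, q):
    """q-th percentile (0 <= q <= 100) with linear interpolation between order statistics."""
    data = sorted(values)
    if not data:
        raise ValueError("values must be non-empty")
    rank = (len(data) - 1) * q / 100
    lo = int(rank)                      # rank is non-negative, so int() floors
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (rank - lo) * (data[hi] - data[lo])


def p95_offset_arcsec(predicted_deg, true_deg):
    """95th percentile of |predicted - true| for one coordinate, in arcseconds."""
    offsets = [abs(p - t) * 3600.0 for p, t in zip(predicted_deg, true_deg)]
    return percentile(offsets, 95)


# Example: offsets growing from 0 to 100 arcseconds over 101 samples.
true_az = [i / 3600.0 for i in range(101)]
pred_az = [0.0] * 101
p95 = p95_offset_arcsec(pred_az, true_az)
```

Because the statistic depends only on the ordering of the offsets below the 95th percentile, a handful of very large excursions in the top 5% of samples does not change it.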

4.2 Tracking Corrections with Other Architectures

In addition to the recurrent neural networks described above, we also conduct some preliminary testing of other network architectures. These include the temporal convolutional network (TCN), an extension of the convolutional network commonly used in image processing, and the transformer network, whose self-attention mechanisms have gained increasing popularity in recent years [2]. Table 3 shows the results of these other architectures on the same testing set used in the recurrent case.

Table 3: Other Network Performance Statistics

| Model | Azimuth $P_{95}$ Percent Error | Elevation $P_{95}$ Percent Error | Az (′′) | El (′′) |
|---|---|---|---|---|
| Astro Baseline | 6.5075 | 6.4752 | 234.27 | 233.10 |
| Linear Baseline | 0.1655 | 0.1312 | 5.96 | 4.72 |
| TCN | 0.2924 | 0.1260 | 10.53 | 4.54 |
| Transformer | 0.3098 | 0.2747 | 11.15 | 9.89 |
| GRU-TCN | 0.0817 | 0.0919 | 2.94 | 3.31 |
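For readers unfamiliar with the TCN, its basic building block is a causal, dilated one-dimensional convolution, in which each output depends only on the current and earlier time-steps. The sketch below shows that single operation in plain Python; it is a generic illustration with made-up weights, not the network configuration tested here.

```python
def causal_dilated_conv(x, weights, dilation=1):
    """y[t] = sum_j weights[j] * x[t - j * dilation], treating samples before t = 0 as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:  # causal: never look at future samples
                acc += w * x[idx]
        y.append(acc)
    return y


# Example: a two-tap filter that adds each sample to the one two steps earlier.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
filtered = causal_dilated_conv(signal, weights=[1.0, 1.0], dilation=2)
```

Stacking such layers with increasing dilation widens the span of past time-steps that influences each output, which is how a TCN covers a long input sequence without recurrence.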

5 Summary and Future Work

We have developed a target acquisition system, built upon a recurrent neural network, that improves upon the WIYN telescope's current pointing solution when trained and tested on historical data recorded from the Star Tracking camera. Our system can be trained and re-calibrated during non-observational hours, using only science observations, which reduces both the time and the cost needed to obtain accurate pointing. We have also developed a model for predictive target tracking, using a similar deep learning architecture. We evaluated its performance on synthetically generated data that mimics the errors, trends, and noise expected in real NEID data. Overall, our system is able to improve upon WIYN's current pointing framework through the use of novel deep learning techniques. We are in the process of testing the predictive tracking system using real tracking logs from ongoing NEID observations. In addition, in the upcoming year we hope to fully deploy and test our system alongside TPoint on sky, as we work towards a more generalized model that may be applied to other instruments and telescopes.

References

  • [1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
  • [2] Charu C. Aggarwal. Neural Networks and Deep Learning. Springer International Publishing, 2018.
  • [3] Astropy Collaboration, T. P. Robitaille, E. J. Tollerud, P. Greenfield, M. Droettboom, E. Bray, T. Aldcroft, M. Davis, A. Ginsburg, A. M. Price-Whelan, W. E. Kerzendorf, A. Conley, N. Crighton, K. Barbary, D. Muna, H. Ferguson, F. Grollier, M. M. Parikh, P. H. Nair, H. M. Günther, C. Deil, J. Woillez, S. Conseil, R. Kramer, J. E. H. Turner, L. Singer, R. Fox, B. A. Weaver, V. Zabalza, Z. I. Edwards, K. Azalee Bostroem, D. J. Burke, A. R. Casey, S. M. Crawford, N. Dencheva, J. Ely, T. Jenness, K. Labrie, P. L. Lim, F. Pierfederici, A. Pontzen, A. Ptak, B. Refsdal, M. Servillat, and O. Streicher. Astropy: A community Python package for astronomy. A&A, 558:A33, October 2013.
  • [4] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014.
  • [5] Arvind F. Gupta, Jason T. Wright, Paul Robertson, Samuel Halverson, Jacob Luhn, Arpita Roy, Suvrath Mahadevan, Eric B. Ford, Chad F. Bender, Cullen H. Blake, Fred Hearty, Shubham Kanodia, Sarah E. Logsdon, Michael W. McElwain, Andrew Monson, Joe P. Ninan, Christian Schwab, Gudhmundur Stefánsson, and Ryan C. Terrien. Target Prioritization and Observing Strategies for the NEID Earth Twin Survey. AJ, 161(3):130, March 2021.
  • [6] Samuel Halverson, Ryan Terrien, Suvrath Mahadevan, Arpita Roy, Chad Bender, Gudmundur K. Stefánsson, Andrew Monson, Eric Levi, Fred Hearty, Cullen Blake, Michael McElwain, Christian Schwab, Lawrence Ramsey, Jason Wright, Sharon Wang, Qian Gong, and Paul Robertson. A comprehensive radial velocity error budget for next generation Doppler spectrometers. In Christopher J. Evans, Luc Simard, and Hideki Takami, editors, Ground-based and Airborne Instrumentation for Astronomy VI, volume 9908 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, page 99086P, August 2016.
  • [7] Dan Li, Sarah E. Logsdon, William McBride, Jayadev Rajagopal, Marsha J. Wolf, Jeffrey W. Percival, Kurt P. Jaehnig, Michael P. Smith, Qian Gong, Michael W. McElwain, Heidi Schweiker, Eli Golub, Jesus Higuera, Jessica Klusmeyer, Emily Hunting, Erik Timmermann, Mark Everett, Wilson Liu, Susan Ridgway, Ming Liang, Christian Schwab, and Suvrath Mahadevan. The NEID port adapter at WIYN: on-sky fast guiding performance. In Christopher J. Evans, Julia J. Bryant, and Kentaro Motohara, editors, Ground-based and Airborne Instrumentation for Astronomy IX, volume 12184 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, page 121844O, August 2022.
  • [8] Sarah E. Logsdon, Michael W. McElwain, Qian Gong, Ming Liang, Fernando Santoro, Christian Schwab, Chad Bender, Cullen Blake, Samuel Halverson, Fred Hearty, Emily Hunting, Kurt P. Jaehnig, Suvrath Mahadevan, Andrew J. Monson, Jeffrey W. Percival, Jayadev Rajagopal, Lawrence Ramsey, Arpita Roy, Michael P. Smith, Ryan C. Terrien, Erik Timmermann, Phil Willems, Marsha J. Wolf, and Jason Wright. The NEID precision radial velocity spectrometer: port adapter overview, requirements, and test plan. In Christopher J. Evans, Luc Simard, and Hideki Takami, editors, Ground-based and Airborne Instrumentation for Astronomy VII, volume 10702 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, page 1070267, July 2018.
  • [9] Dominic Masters and Carlo Luschi. Revisiting small batch training for deep neural networks, 2018.
  • [10] Jeffrey W. Percival. Telescope pointing machine specification version 2.2. 2005.
  • [11] A. M. Price-Whelan, B. M. Sipőcz, H. M. Günther, P. L. Lim, S. M. Crawford, S. Conseil, D. L. Shupe, M. W. Craig, N. Dencheva, A. Ginsburg, J. T. VanderPlas, L. D. Bradley, D. Pérez-Suárez, M. de Val-Borro, (Primary Paper Contributors), T. L. Aldcroft, K. L. Cruz, T. P. Robitaille, E. J. Tollerud, (Astropy Coordination Committee), C. Ardelean, T. Babej, Y. P. Bach, M. Bachetti, A. V. Bakanov, S. P. Bamford, G. Barentsen, P. Barmby, A. Baumbach, K. L. Berry, F. Biscani, M. Boquien, K. A. Bostroem, L. G. Bouma, G. B. Brammer, E. M. Bray, H. Breytenbach, H. Buddelmeijer, D. J. Burke, G. Calderone, J. L. Cano Rodríguez, M. Cara, J. V. M. Cardoso, S. Cheedella, Y. Copin, L. Corrales, D. Crichton, D. D’Avella, C. Deil, É. Depagne, J. P. Dietrich, A. Donath, M. Droettboom, N. Earl, T. Erben, S. Fabbro, L. A. Ferreira, T. Finethy, R. T. Fox, L. H. Garrison, S. L. J. Gibbons, D. A. Goldstein, R. Gommers, J. P. Greco, P. Greenfield, A. M. Groener, F. Grollier, A. Hagen, P. Hirst, D. Homeier, A. J. Horton, G. Hosseinzadeh, L. Hu, J. S. Hunkeler, Ž. Ivezić, A. Jain, T. Jenness, G. Kanarek, S. Kendrew, N. S. Kern, W. E. Kerzendorf, A. Khvalko, J. King, D. Kirkby, A. M. Kulkarni, A. Kumar, A. Lee, D. Lenz, S. P. Littlefair, Z. Ma, D. M. Macleod, M. Mastropietro, C. McCully, S. Montagnac, B. M. Morris, M. Mueller, S. J. Mumford, D. Muna, N. A. Murphy, S. Nelson, G. H. Nguyen, J. P. Ninan, M. Nöthe, S. Ogaz, S. Oh, J. K. Parejko, N. Parley, S. Pascual, R. Patil, A. A. Patil, A. L. Plunkett, J. X. Prochaska, T. Rastogi, V. Reddy Janga, J. Sabater, P. Sakurikar, M. Seifert, L. E. Sherbert, H. Sherwood-Taylor, A. Y. Shih, J. Sick, M. T. Silbiger, S. Singanamalla, L. P. Singer, P. H. Sladen, K. A. Sooley, S. Sornarajah, O. Streicher, P. Teuben, S. W. Thomas, G. R. Tremblay, J. E. H. Turner, V. Terrón, M. H. van Kerkwijk, A. de la Vega, L. L. Watkins, B. A. Weaver, J. B. Whitmore, J. Woillez, V. Zabalza, and (Astropy Contributors). The Astropy Project: Building an Open-science Project and Status of the v2.0 Core Package. AJ, 156:123, September 2018.
  • [12] Jayadev K. Rajagopal, Daniel R. Harbeck, Charles Corson, Behzad Abareshi, Heidi Schweiker, Wilson Liu, Eric J. Hooper, Jeffrey W. Percival, and Kurt P. Jaehnig. Improving the WIYN Telescope’s pointing and tracking performance with a star tracker camera. In Gianluca Chiozzi and Nicole M. Radziwill, editors, Software and Cyberinfrastructure for Astronomy III, volume 9152 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, page 91521Z, July 2014.
  • [13] C. Schwab, A. Rakich, Q. Gong, S. Mahadevan, S. P. Halverson, A. Roy, R. C. Terrien, P. M. Robertson, F. R. Hearty, E. I. Levi, A. J. Monson, J. T. Wright, M. W. McElwain, C. F. Bender, C. H. Blake, J. Stürmer, Y. V. Gurevich, A. Chakraborty, and L. W. Ramsey. Design of NEID, an extreme precision Doppler spectrograph for WIYN. In Christopher J. Evans, Luc Simard, and Hideki Takami, editors, Ground-based and Airborne Instrumentation for Astronomy VI, volume 9908 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, page 99087H, August 2016.
  • [14] Patrick Wallace. Ten reasons why accurate pointing is non-trivial. In Metrology and Control of Large Telescopes, page 4, September 2016.
  • [15] Patrick Wallace. Telescope pointing, 2022. Accessed: 2023-08-22.