NL2KQL: From Natural Language to Kusto Query

Authors: Amir H. Abdi, Xinye Tang, Jeremias Eichelbaum, Mahan Das, Alex Klein, Nihal Irmak Pakis, William Blum, Daniel L Mace, Tanvi Raja, Namrata Padmanabhan, Ye Xing

Abstract: Data is growing rapidly in volume and complexity. Proficiency in database query languages is pivotal for crafting effective queries. As coding assistants become more prevalent, there is significant opportunity to enhance database query languages. The Kusto Query Language (KQL) is a widely used query language for large semi-structured data such as logs, telemetries, and time-series for big data analytics platforms. This paper introduces NL2KQL, an innovative framework that uses large language models (LLMs) to convert natural language queries (NLQs) to KQL queries. The proposed NL2KQL framework includes several key components: the Schema Refiner, which narrows down the schema to its most pertinent elements; the Few-shot Selector, which dynamically selects relevant examples from a few-shot dataset; and the Query Refiner, which repairs syntactic and semantic errors in KQL queries. Additionally, this study outlines a method for generating large datasets of synthetic NLQ-KQL pairs that are valid within a specific database context. To validate NL2KQL's performance, we utilize an array of online (based on query execution) and offline (based on query parsing) metrics. Through ablation studies, the significance of each framework component is examined, and the datasets used for benchmarking are made publicly available. This work is the first of its kind and is compared with available baselines to demonstrate its effectiveness.

Submitted 15 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2305.16444 [pdf, other]

Don't Retrain, Just Rewrite: Countering Adversarial Perturbations by Rewriting Text

Authors: Ashim Gupta, Carter Wood Blum, Temma Choji, Yingjie Fei, Shalin Shah, Alakananda Vempala, Vivek Srikumar

Abstract: Can language models transform inputs to protect text classifiers against adversarial attacks? In this work, we present ATINTER, a model that intercepts and learns to rewrite adversarial inputs to make them non-adversarial for a downstream text classifier. Our experiments on four datasets and five attack mechanisms reveal that ATINTER is effective at providing better adversarial robustness than existing defense approaches, without compromising task accuracy. For example, on sentiment classification using the SST-2 dataset, our method improves the adversarial accuracy over the best existing defense approach by more than 4% with a smaller decrease in task accuracy (0.5% vs 2.5%). Moreover, we show that ATINTER generalizes across multiple downstream tasks and classifiers without having to explicitly retrain it for those settings. Specifically, we find that when ATINTER is trained to remove adversarial perturbations for the sentiment classification task on the SST-2 dataset, it even transfers to a semantically different task of news classification (on AGNews) and improves the adversarial robustness by more than 10%.

Submitted 25 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023

arXiv:2303.02161 [pdf]

Exploring Fundamental Particle Acceleration and Loss Processes in Heliophysics through an Orbiting X-ray Instrument in the Jovian System

Authors: W. Dunn, G. Berland, E. Roussos, G. Clark, P. Kollmann, D. Turner, C. Feldman, T. Stallard, G. Branduardi-Raymont, E. E. Woodfield, I. J. Rae, L. C. Ray, J. A. Carter, S. T. Lindsay, Z. Yao, R. Marshall, A. N. Jaynes A., Y. Ezoe, M. Numazawa, G. B. Hospodarsky, X. Wu, D. M. Weigt, C. M. Jackman, K. Mori, Q. Nénon, et al. (19 additional authors not shown)

Abstract: Jupiter's magnetosphere is considered to be the most powerful particle accelerator in the Solar System, accelerating electrons from eV to 70 MeV and ions to GeV energies. How electromagnetic processes drive energy and particle flows, producing and removing energetic particles, is at the heart of Heliophysics. Particularly, the 2013 Decadal Strategy for Solar and Space Physics was to "Discover and characterize fundamental processes that occur both within the heliosphere and throughout the universe". The Jovian system offers an ideal natural laboratory to investigate all of the universal processes highlighted in the previous Decadal. The X-ray waveband has been widely used to remotely study plasma across astrophysical systems. The majority of astrophysical emissions can be grouped into 5 X-ray processes: fluorescence, thermal/coronal, scattering, charge exchange and particle acceleration. The Jovian system offers perhaps the only system that presents a rich catalog of all of these X-ray emission processes and can also be visited in-situ, affording the special possibility to directly link fundamental plasma processes with their resulting X-ray signatures. This offers invaluable ground-truths for astrophysical objects beyond the reach of in-situ exploration (e.g. brown dwarfs, magnetars or galaxy clusters that map the cosmos).
Here, we show how coupling in-situ measurements with in-orbit X-ray observations of Jupiter's radiation belts, Galilean satellites, Io Torus, and atmosphere addresses fundamental heliophysics questions with wide-reaching impact across helio- and astrophysics. New developments like miniaturized X-ray optics and radiation-tolerant detectors provide compact, lightweight, wide-field X-ray instruments perfectly suited to the Jupiter system, enabling this exciting new possibility.

Submitted 2 March, 2023; originally announced March 2023.

Comments: A White Paper for the 2024-2033 Solar and Space Physics (Heliophysics) Decadal Survey

arXiv:1802.10583 [pdf, other]

Reducing Lambda Terms with Traversals

Authors: William Blum

Abstract: We introduce a method to evaluate untyped lambda terms by combining the theory of traversals, a term-tree traversing technique inspired by Game Semantics, with judicious use of the eta-conversion rule of the lambda calculus. The traversal theory of the simply-typed lambda calculus relies on the eta-long transform to ensure that when traversing an application, there is a subterm representing every possible operator's argument. In the untyped setting, we instead exhibit the missing operand via ad-hoc instantiation of the eta-expansion rule, which allows the traversal to proceed as if the operand existed in the original term. This gives rise to a more generic concept of traversals for lambda terms. A notable improvement, in addition to handling untyped terms, is that no preliminary transformation is required: the original unaltered lambda term is traversed. We show that by bounding the non-determinism of the traversal rule for free variables, one can effectively compute a set of traversals characterizing the paths in the tree representation of the beta-normal form, when it exists. This yields an evaluation algorithm for untyped lambda-terms. We prove correctness by showing that traversals implement leftmost linear reduction, a generalization of the head linear reduction of Danos et al.

Submitted 28 February, 2018; originally announced February 2018.

ACM Class: F.4.1; F.1.1

arXiv:1711.04596 [pdf, other]

Not all bytes are equal: Neural byte sieve for fuzzing

Authors: Mohit Rajpal, William Blum, Rishabh Singh

Abstract: Fuzzing is a popular dynamic program analysis technique used to find vulnerabilities in complex software. Fuzzing involves presenting a target program with crafted malicious input designed to cause crashes, buffer overflows, memory errors, and exceptions. Crafting malicious inputs in an efficient manner is a difficult open problem and often the best approach to generating such inputs is through applying uniform random mutations to pre-existing valid inputs (seed files). We present a learning technique that uses neural networks to learn patterns in the input files from past fuzzing explorations to guide future fuzzing explorations. In particular, the neural models learn a function to predict good (and bad) locations in input files to perform fuzzing mutations based on the past mutations and corresponding code coverage information. We implement several neural models including LSTMs and sequence-to-sequence models that can encode variable length input files. We incorporate our models in the state-of-the-art AFL (American Fuzzy Lop) fuzzer and show significant improvements in terms of code coverage, unique code paths, and crashes for various input formats including ELF, PNG, PDF, and XML.

Submitted 9 November, 2017; originally announced November 2017.

arXiv:1701.02118 [pdf, ps, other]

Type homogeneity is not a restriction for safe recursion schemes

Authors: William Blum

Abstract: Knapik et al. introduced the safety restriction which constrains both the types and syntax of the production rules defining a higher-order recursion scheme. This restriction gives rise to an equi-expressivity result between order-n pushdown automata and order-n safe recursion schemes, when such devices are used as tree generators. We show that the typing constraint of safety, called homogeneity, is unnecessary in the sense that imposing the syntactic restriction alone is sufficient to prove the equi-expressivity result for trees.

Submitted 9 January, 2017; originally announced January 2017.

Comments: The result presented in this paper was privately circulated for the first time in 2009 and shared on my personal website but was never published in a journal or conference

arXiv:1604.02259 [pdf, other]

doi 10.1109/23.940070

Construction and Test of the Precision Drift Chambers for the ATLAS Muon Spectrometer

Authors: F. Bauer, W. Blum, U. Bratzler, H. Dietl, S. Kotov, H. Kroha, Th. Lagouri, A. Manz, A. Ostapchuk, R. Richter, S. Schael, S. Chouridou, M. Deile, O. Kortner, A. Staude, R. Stroehmer, T. Trefzger

Abstract: The Monitored Drift Tube (MDT) chambers for the muon spectrometer of the ATLAS detector at the Large Hadron Collider (LHC) consist of 3-4 layers of pressurised drift tubes on either side of a space frame carrying an optical deformation monitoring system. The chambers have to provide a track position resolution of 40 microns with a single-tube resolution of at least 80 microns and a sense wire positioning accuracy of 20 microns (rms). The feasibility was demonstrated with the full-scale prototype of one of the largest MDT chambers with 432 drift tubes of 3.8 m length. For the ATLAS muon spectrometer, 88 chambers of this type have to be built. The first chamber has been completed with a wire positioning accuracy of 14 microns (rms).

Submitted 8 April, 2016; originally announced April 2016.

Report number: MPI-PhE/2000-029, MPP-2016-073

Journal ref: IEEE Transactions on Nuclear Science, Vol. 48, No. 3 (2001) 302

arXiv:0901.2399 [pdf, ps, other]

doi 10.2168/LMCS-5(1:3)2009

The Safe Lambda Calculus

Authors: William Blum, C. -H. Luke Ong

Abstract: Safety is a syntactic condition of higher-order grammars that constrains occurrences of variables in the production rules according to their type-theoretic order. In this paper, we introduce the safe lambda calculus, which is obtained by transposing (and generalizing) the safety condition to the setting of the simply-typed lambda calculus. In contrast to the original definition of safety, our calculus does not constrain types (to be homogeneous). We show that in the safe lambda calculus, there is no need to rename bound variables when performing substitution, as variable capture is guaranteed not to happen. We also propose an adequate notion of beta-reduction that preserves safety. In the same vein as Schwichtenberg's 1976 characterization of the simply-typed lambda calculus, we show that the numeric functions representable in the safe lambda calculus are exactly the multivariate polynomials; thus conditional is not definable. We also give a characterization of representable word functions. We then study the complexity of deciding beta-eta equality of two safe simply-typed terms and show that this problem is PSPACE-hard. Finally we give a game-semantic analysis of safety: We show that safe terms are denoted by 'P-incrementally justified strategies'. Consequently pointers in the game semantics of safe lambda-terms are only necessary from order 4 onwards.

Submitted 19 February, 2009; v1 submitted 16 January, 2009; originally announced January 2009.

ACM Class: F.3.2; F.4.1

Journal ref: Logical Methods in Computer Science, Volume 5, Issue 1 (February 19, 2009) lmcs:1145

arXiv:0901.0512 [pdf]

Expected Performance of the ATLAS Experiment - Detector, Trigger and Physics

Authors: The ATLAS Collaboration, G. Aad, E. Abat, B. Abbott, J. Abdallah, A. A. Abdelalim, A. Abdesselam, O. Abdinov, B. Abi, M. Abolins, H. Abramowicz, B. S. Acharya, D. L. Adams, T. N. Addy, C. Adorisio, P. Adragna, T. Adye, J. A. Aguilar-Saavedra, M. Aharrouche, S. P. Ahlen, F. Ahles, A. Ahmad, H. Ahmed, G. Aielli, T. Akdogan, et al. (2587 additional authors not shown)

Abstract: A detailed study is presented of the expected performance of the ATLAS detector. The reconstruction of tracks, leptons, photons, missing energy and jets is investigated, together with the performance of b-tagging and the trigger. The physics potential for a variety of interesting physics processes, within the Standard Model and beyond, is examined. The study comprises a series of notes based on simulations of the detector and physics processes, with particular emphasis given to the data expected from the first years of operation of the LHC at CERN.

Submitted 14 August, 2009; v1 submitted 28 December, 2008; originally announced January 2009.

Showing 1–9 of 9 results for author: Blum, W