About
Activity
-
Super excited to announce I will be starting at Aberdean Consulting LLC on Tuesday as a Network Intern! I'm really thankful for the opportunity that…
Liked by Kyle Shaffer
-
Just finished up Introduction to Network Analysis free course on securityblueteam.com. This is probably one of the more difficult but also rewarding…
Liked by Kyle Shaffer
-
Large Language Models work better in English, simple as that. The fact that they can perform multilingual tasks, like translation, is almost…
Liked by Kyle Shaffer
Experience & Education
Publications
-
AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature
EACL 2023
Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the linguistic relations of these alignments, and create automated models to predict these relations as well as to generate abridgements for new texts. Our findings establish abridgement as a challenging task, motivating future resources and research. The dataset is available at this http URL.
-
Language Clustering for Multilingual Named Entity Recognition
Findings of EMNLP 2021
Recent work in multilingual natural language processing has shown progress in various tasks such as natural language inference and joint multilingual translation. Despite success in learning across many languages, challenges arise where multilingual training regimes often boost performance on some languages at the expense of others. For multilingual named entity recognition (NER) we propose a simple technique that groups similar languages together by using embeddings from a pre-trained masked language model, and automatically discovering language clusters in this embedding space. Specifically, we fine-tune an XLM-Roberta model on a language identification task, and use embeddings from this model for clustering. We conduct experiments on 15 diverse languages in the WikiAnn dataset and show our technique largely outperforms three baselines: (1) training a multilingual model jointly on all available languages, (2) training one monolingual model per language, and (3) grouping languages by linguistic family. We also conduct analyses showing meaningful multilingual transfer for low-resource languages (Swahili and Yoruba), despite being automatically grouped with other seemingly disparate languages.
-
Beyond Fine-Tuning: Adding Capacity to Leverage Few Labels
31st Conference on Neural Information Processing Systems (Limited Labeled Data Workshop)
In this paper we present a technique to train neural network models on small amounts of data. Current methods for training neural networks on small amounts of rich data typically rely on strategies such as fine-tuning a pre-trained neural network or the use of domain-specific hand-engineered features. Here we take the approach of treating network layers, or entire networks, as modules and combine pre-trained modules with untrained modules, to learn the shift in distributions between data sets. The central impact of using a modular approach comes from adding new representations to a network, as opposed to replacing representations via fine-tuning. Using this technique, we are able to surpass results using standard fine-tuning transfer learning approaches, and we are also able to significantly increase performance over such approaches when using smaller amounts of data.
-
Intrinsic and Extrinsic Evaluation of Spatiotemporal Text Representations in Twitter Streams
2nd Workshop on Representation Learning for NLP
-
Predicting Speech Acts in MOOC Forum Posts
Proceedings of the 9th International Conference on Weblogs and Social Media, ICWSM 2015
Students in a Massive Open Online Course (MOOC) interact with each other and the course staff through online discussion forums. While discussion forums play a central role in MOOCs, they also pose a challenge for instructors. The large number of student posts makes it difficult for instructors to know where to intervene to answer questions, resolve issues, and provide feedback.
In this work, we focus on automatically predicting speech acts in MOOC forum posts. Our speech act categories describe the purpose or function of the post in the ongoing discussion. Specifically, we address three main research questions. First, we investigate whether crowdsourced workers can reliably label MOOC forum posts using our speech act definitions. Second, we investigate whether our speech acts can help predict instructor interventions and assignment completion and performance. Finally, we investigate which types of features (derived from the post content, author, and surrounding context) are most effective for predicting our different speech act categories.
-
Using Natural Language Processing to Facilitate Medical Record Abstraction in Epidemiological Studies
American Medical Informatics Association Annual Symposium (Poster)
The Atherosclerosis Risk in Communities (ARIC) study conducts ongoing surveillance of hospitalized cardiovascular health events and death in 4 communities in the United States (NC, MI, MN and MD). Diagnostic criteria for heart failure (HF) have been manually abstracted from medical records since 2005, including the presence of symptoms consistent with HF decompensation (new onset or worsening shortness of breath, edema, paroxysmal nocturnal dyspnea, and orthopnea) during patients' hospitalizations. The manual chart abstraction process has high repeatability under a stringent quality control protocol, but is time-consuming and costly. The goal of this study is to develop and test natural language processing (NLP) tools to extract information on complex symptoms from free-text electronic medical records.
More activity by Kyle
-
Well, it didn’t take long to ruin ChatGPT did it? A jailbreak called Do Anything Now (affectionately known as Dan) prompts ChatGPT to create content…
Liked by Kyle Shaffer
-
Hello LinkedIn! I've been on a consulting journey in the data space since the beginning of the year and I'm looking to pick up more clients. I have…
Liked by Kyle Shaffer
-
Looking forward to attending #EACL2023 in Dubrovnik next week to present our paper "AbLit: A Resource for Analyzing and Generating Abridged Versions…
Liked by Kyle Shaffer
-
What would Nietzsche have thought of the chatbot revolution? He’d have thought it was meaningless, obviously. The Machine Intelligence Research…
Liked by Kyle Shaffer
-
Completed the final module of the Metasploitable room on TryHackMe. Really fun, learned a ton, a few minor problems with virtual machines but other…
Liked by Kyle Shaffer
-
Well ChatGPT escalated quickly... #chatgpt #artficialintelligence #Microsoft Want to learn more about ChatGPT and how it can be used in…
Liked by Kyle Shaffer
-
Was going to start a short AWS Cloud Security certificate training, but ya know what sounded like more fun? Setting a home lab so I can experiment…
Liked by Kyle Shaffer
-
Another module in the books! This one was on cross-site scripting and although this gave just a very base level overview I enjoyed it. The more I do…
Liked by Kyle Shaffer
-
I just completed Authentication Bypass on TryHackMe. This module gave me a lot of new tools to research like curl and ffuf. It also introduced Logic…
Liked by Kyle Shaffer
-
Attended DC608 monthly meetup last night for Stuart McIntosh talk on "Threat Hunting Your Alerts", met a lot of very nice people and had some great…
Liked by Kyle Shaffer
-
Finished Content Discovery portion of Jr. Penetration Tester Path on Tryhackme, now onto Subdomain Enumeration.
Liked by Kyle Shaffer