-
Detecting Asks in SE attacks: Impact of Linguistic and Structural Knowledge
Authors:
Bonnie J. Dorr,
Archna Bhatia,
Adam Dalton,
Brodie Mather,
Bryanna Hebenstreit,
Sashank Santhanam,
Zhuo Cheng,
Samira Shaikh,
Alan Zemel,
Tomek Strzalkowski
Abstract:
Social engineers attempt to manipulate users into undertaking actions such as downloading malware by clicking links or providing access to money or sensitive information. Natural language processing, computational sociolinguistics, and media-specific structural clues provide a means for detecting both the ask (e.g., buy gift card) and the risk/reward implied by the ask, which we call framing (e.g., lose your job, get a raise). We apply linguistic resources such as Lexical Conceptual Structure to tackle ask detection and also leverage structural clues such as links and their proximity to identified asks to improve confidence in our results. Our experiments indicate that the performance of ask detection, framing detection, and identification of the top ask is improved by linguistically motivated classes coupled with structural clues such as links. Our approach is implemented in a system that informs users about social engineering risk situations.
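The idea of using link proximity to raise confidence in a detected ask can be illustrated with a small sketch. This is not the authors' system. The verb set `ASK_VERBS` is a hypothetical stand-in for the Lexical Conceptual Structure classes, and the `window`, `base`, and `boost` values are arbitrary assumptions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical ask lexicon; a stand-in for LCS-derived verb classes, not the authors' resource.
ASK_VERBS = {"buy", "click", "download", "send", "transfer", "verify", "provide"}
URL_RE = re.compile(r"https?://\S+")


@dataclass
class Ask:
    verb: str
    position: int      # token index of the ask verb
    confidence: float  # heuristic score, not a calibrated probability


def detect_asks(text: str, window: int = 5, base: float = 0.5, boost: float = 0.3) -> list[Ask]:
    """Flag ask verbs; raise confidence when a link occurs within `window` tokens (assumed heuristic)."""
    tokens = text.split()
    link_positions = [i for i, tok in enumerate(tokens) if URL_RE.match(tok)]
    asks = []
    for i, tok in enumerate(tokens):
        word = re.sub(r"[^a-z]", "", tok.lower())  # strip punctuation before lexicon lookup
        if word in ASK_VERBS:
            near_link = any(abs(i - j) <= window for j in link_positions)
            asks.append(Ask(word, i, base + boost if near_link else base))
    return asks


def top_ask(asks: list[Ask]) -> Ask | None:
    """Pick the highest-confidence ask, or None if no ask was found."""
    return max(asks, key=lambda a: a.confidence, default=None)


if __name__ == "__main__":
    msg = "Please buy a gift card today. Then click https://example.com/pay to confirm."
    for ask in detect_asks(msg):
        print(ask)
    print("top ask:", top_ask(detect_asks(msg)))
```

In the example message, "click" sits next to a link and therefore outranks "buy"; a real system would also need framing detection and a proper lexical resource.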
Submitted 25 February, 2020;
originally announced February 2020.
-
Studying the Effects of Cognitive Biases in Evaluation of Conversational Agents
Authors:
Sashank Santhanam,
Alireza Karduni,
Samira Shaikh
Abstract:
Humans frequently interact with conversational agents. Rapid advances in generative language modeling with neural networks have helped advance the creation of intelligent conversational agents. Researchers typically evaluate the output of their models through crowdsourced judgments, but there are no established best practices for conducting such studies. Moreover, it is unclear whether cognitive biases in decision-making affect crowdsourced workers' judgments when they undertake these tasks. To investigate, we conducted a between-subjects study with 77 crowdsourced workers to understand the role of cognitive biases, specifically anchoring bias, when humans are asked to evaluate the output of conversational agents. Our results provide insight into how best to evaluate conversational agents. We find that increased consistency in ratings across two experimental conditions may be a result of anchoring bias. We also determine that external factors such as time and prior experience in similar tasks affect inter-rater consistency.
Submitted 26 February, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
-
'Alexa, Do You Know Anything?' The Impact of an Intelligent Assistant on Team Interactions and Creative Performance Under Time Scarcity
Authors:
Sonia Jawaid Shaikh,
Ignacio Cruz
Abstract:
Human-AI collaboration is on the rise with the deployment of AI-enabled intelligent assistants (e.g., Amazon Echo, Cortana, Siri) across organizational contexts. It is claimed that intelligent assistants can help people achieve more in less time (Personal Digital Assistant - Cortana, n.d.). However, despite the increasing presence of intelligent assistants in collaborative settings, there is a gap in the literature on how the deployment of this technology intersects with time scarcity to affect team behaviors and performance. To fill this gap, we collected behavioral data from 56 teams who participated in a between-subjects 2 (Intelligent Assistant: Available vs. Not Available) × 2 (Time: Scarce vs. Not Scarce/Control) lab experiment. The results show that teams with an intelligent assistant had significantly fewer interactions among their members than teams without an intelligent assistant. Teams facing time scarcity also sought the intelligent assistant's help more often during task completion than those in the control condition. Lastly, teams with an intelligent assistant underperformed on a creative task compared to those without the device. We discuss implications of this technology from theoretical, empirical, and practical perspectives.
Submitted 30 December, 2019;
originally announced December 2019.
-
Some properties of Sadik transform and its applications of fractional-order dynamical systems in control theory
Authors:
Saleh S. Redhwan,
Sadikali L. Shaikh,
Mohammed S. Abdo
Abstract:
In this paper, we study some new properties of the Sadik transform, such as integration, time delay, the initial value theorem, and the final value theorem. Moreover, we prove a theorem on the Sadik transform of the Caputo fractional derivative, and we establish sufficient conditions for the existence of the Sadik transform of Caputo fractional derivatives. Finally, we discuss fractional-order dynamical systems in control theory as an application of this transform and present numerical examples that support our results.
Submitted 24 December, 2019;
originally announced December 2019.
-
Natural Language Generation Using Reinforcement Learning with External Rewards
Authors:
Vidhushini Srinivasan,
Sashank Santhanam,
Samira Shaikh
Abstract:
We propose an approach to natural language generation using a bidirectional encoder-decoder that incorporates external rewards through reinforcement learning (RL). We use an attention mechanism and adopt maximum mutual information as the initial objective function for RL. Using a two-part training scheme, we train an external reward analyzer to predict the external rewards and then use the predicted rewards to maximize the expected rewards (both internal and external). We evaluate the system on two standard dialogue corpora: the Cornell Movie Dialog Corpus and the Yelp Restaurant Review Corpus. We report standard evaluation metrics, including BLEU, ROUGE-L, and perplexity, as well as human evaluation to validate our approach.
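To make "maximize the expected rewards (both internal and external)" concrete, here is a minimal REINFORCE (likelihood-ratio policy gradient) sketch. It is a deliberate simplification and not the authors' model: the "policy" is a softmax over a fixed set of candidate responses (a one-step bandit rather than a sequence decoder), the reward values are made up, and the linear mix controlled by `weight` is an assumed way of combining internal and predicted external rewards.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_reward(internal, external, weight=0.5):
    # Assumed linear mix of an internal reward (e.g., MMI-style) and a predicted external reward.
    return weight * internal + (1.0 - weight) * external


def reinforce(internal_rewards, external_rewards, steps=2000, lr=0.1, weight=0.5, seed=0):
    """Toy REINFORCE over candidate responses; returns the final policy probabilities."""
    rng = random.Random(seed)
    n = len(internal_rewards)
    logits = [0.0] * n
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(n), weights=probs)[0]  # sample an action from the policy
        r = combined_reward(internal_rewards[a], external_rewards[a], weight)
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r  # moving-average baseline to reduce variance
        for i in range(n):
            grad_log_pi = (1.0 if i == a else 0.0) - probs[i]  # d log pi(a) / d logit_i
            logits[i] += lr * advantage * grad_log_pi
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical rewards for three candidate responses.
    internal = [0.2, 0.5, 0.4]
    external = [0.1, 0.3, 0.9]
    print([round(p, 3) for p in reinforce(internal, external)])
```

With these made-up numbers the combined rewards are 0.15, 0.4, and 0.65, so the policy should concentrate on the third candidate even though its internal reward alone is not the highest.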
Submitted 26 November, 2019;
originally announced November 2019.
-
Emotional Neural Language Generation Grounded in Situational Contexts
Authors:
Sashank Santhanam,
Samira Shaikh
Abstract:
Emotional language generation is one of the keys to human-like artificial intelligence. Humans use different types of emotions depending on the situation of the conversation. Emotions also play an important role in mediating the level of engagement with conversational partners. However, current conversational agents do not effectively account for emotional content in the language generation process. To address this problem, we develop a language modeling approach that generates affective content when the dialogue is situated in a given context. We use the recently released Empathetic-Dialogues corpus to build our models. Through detailed experiments, we find that our approach outperforms the state-of-the-art method on the perplexity metric by about 5 points and achieves a higher BLEU score.
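Perplexity, the metric on which the improvement above is reported, is the exponential of the mean negative log-likelihood a model assigns to the reference tokens, so lower is better. A minimal sketch, assuming the per-token probabilities have already been obtained from some language model:

```python
import math


def perplexity(token_probs):
    """exp(mean negative log-likelihood) over the model's probabilities for the reference tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


if __name__ == "__main__":
    # A model that always assigns 1/4 to the correct token is "as confused as" a 4-way uniform choice.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Equivalently, perplexity is the geometric mean of 1/p over the tokens, which is why a uniform 1/4 guess yields a perplexity of about 4.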
Submitted 25 November, 2019;
originally announced November 2019.
-
Theory of Nonlinear Caputo-Katugampola Fractional Differential Equations
Authors:
Saleh S. Redhwan,
Sadikali L. Shaikh,
Mohammed S. Abdo
Abstract:
This manuscript investigates the existence and uniqueness of solutions to a first-order fractional anti-periodic boundary value problem involving the Caputo-Katugampola (CK) derivative. The analysis relies on the equivalent integral equation of the given problem together with the fixed point theorems of Leray–Schauder, Krasnoselskii, and Banach. Examples illustrating the obtained results are also presented.
Submitted 13 November, 2019;
originally announced November 2019.
-
Towards Best Experiment Design for Evaluating Dialogue System Output
Authors:
Sashank Santhanam,
Samira Shaikh
Abstract:
To overcome the limitations of automated metrics (e.g., BLEU, METEOR) for evaluating dialogue systems, researchers typically use human judgments to provide convergent evidence. While it has been demonstrated that human judgments can suffer from inconsistent ratings, extant research has also found that the design of the evaluation task affects the consistency and quality of human judgments. We conduct a between-subjects study to understand the impact of four experimental conditions on human ratings of dialogue system output. In addition to discrete and continuous scale ratings, we also experiment with a novel application of Best-Worst scaling to dialogue evaluation. Through our systematic study with 40 crowdsourced workers in each task, we find that continuous scales yield more consistent ratings than Likert-scale or ranking-based experiment designs. Additionally, we find that factors such as the time taken to complete the task and having no prior experience of participating in similar studies of rating dialogue system output positively impact consistency and agreement among raters.
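Best-Worst scaling asks raters to pick the best and worst item from a small tuple rather than scoring each item on a scale. A common way to turn those choices into per-item scores is the counting method, score = (times chosen best − times chosen worst) / times shown, which lies in [−1, 1]. The sketch below implements that counting method; whether the study used this exact scoring is an assumption, and the response IDs are hypothetical.

```python
from collections import Counter


def bws_scores(trials):
    """Counting-based Best-Worst Scaling scores.

    Each trial is (items_shown, best_item, worst_item); scores lie in [-1, 1].
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, b, w in trials:
        if b not in items or w not in items or b == w:
            raise ValueError("best and worst must be distinct items from the shown tuple")
        shown.update(items)  # count one appearance per item in the tuple
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}


if __name__ == "__main__":
    # Hypothetical 4-tuples of dialogue responses r1..r5 with one rater's best/worst picks.
    trials = [
        (("r1", "r2", "r3", "r4"), "r1", "r4"),
        (("r1", "r2", "r3", "r5"), "r1", "r5"),
        (("r2", "r3", "r4", "r5"), "r3", "r5"),
    ]
    for item, score in sorted(bws_scores(trials).items(), key=lambda kv: -kv[1]):
        print(item, round(score, 3))
```

Because each judgment only requires a relative choice within the tuple, the method avoids asking raters to calibrate an absolute scale.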
Submitted 22 September, 2019;
originally announced September 2019.
-
I Stand With You: Using Emojis to Study Solidarity in Crisis Events
Authors:
Sashank Santhanam,
Vidhushini Srinivasan,
Shaina Glass,
Samira Shaikh
Abstract:
We study how emojis are used to express solidarity on social media in the context of two major crisis events: a natural disaster, Hurricane Irma in 2017, and the terrorist attacks that occurred in Paris in November 2015. Using annotated corpora, we first train a recurrent neural network model to classify expressions of solidarity in text. Next, we use these expressions of solidarity to characterize human behavior in online social networks through the temporal and geospatial diffusion of emojis. Our analysis reveals that emojis are a powerful indicator of sociolinguistic behaviors (solidarity) exhibited on social media as crisis events unfold.
Submitted 18 July, 2019;
originally announced July 2019.
-
A Survey of Natural Language Generation Techniques with a Focus on Dialogue Systems - Past, Present and Future Directions
Authors:
Sashank Santhanam,
Samira Shaikh
Abstract:
One of the hardest problems in Natural Language Processing and Artificial Intelligence is automatically generating language that is coherent and understandable to humans. Teaching machines to converse as humans do falls under the broad umbrella of Natural Language Generation. Recent years have seen unprecedented growth in the number of research articles on this subject published in conferences and journals by both academic and industry researchers. Several workshops dedicated specifically to this problem have also been organized alongside top-tier NLP conferences. All this activity makes it hard to clearly define the state of the field and reason about its future directions. In this work, we provide an overview of this important and thriving area, covering traditional approaches, statistical approaches, and approaches that use deep neural networks. We provide a comprehensive review of work towards building open-domain dialogue systems, an important application of natural language generation. We find that the approaches for building dialogue systems predominantly use seq2seq or language-model architectures. Notably, we identify three important areas of further research towards building more effective dialogue systems: 1) incorporating larger context, including conversation context and world knowledge; 2) adding personae or personality to the NLG system; and 3) overcoming dull and generic responses that affect the quality of system-produced responses. We provide pointers on how to tackle these open problems through the use of cognitive architectures that mimic human language understanding and generation capabilities.
Submitted 2 June, 2019;
originally announced June 2019.
-
Joint Bayesian analysis of large angular scale CMB temperature anomalies
Authors:
Shabbir Shaikh,
Suvodip Mukherjee,
Santanu Das,
Benjamin D. Wandelt,
Tarun Souradeep
Abstract:
Cosmic microwave background (CMB) measurements agree with the concordance cosmology model except for a few notable anomalies: Power Suppression, the lack of large-scale power in the temperature data compared to what is expected in the concordance model, and Cosmic Hemispherical Asymmetry, a dipolar breakdown of statistical isotropy. An expansion of the CMB covariance in Bipolar Spherical Harmonics naturally parametrizes both of these large-scale anomalies, allowing us to perform an exhaustive, fully Bayesian joint analysis of the power spectrum and violations of statistical isotropy up to the dipole level. Our analysis sheds light on the scale dependence of the Cosmic Hemispherical Asymmetry. Assuming a scale-dependent dipole modulation model with a two-parameter power-law form, we explore the posterior PDF of the amplitude $A(l = 16)$ and the power-law index $\alpha$ and find the maximum a posteriori values $A_*(l = 16) = 0.064 \pm 0.022$ and $\alpha_* = -0.92 \pm 0.22$. The maximum a posteriori direction associated with the Cosmic Hemispherical Asymmetry is $(l, b) = (247.8^\circ, -19.6^\circ)$ in Galactic coordinates, consistent with previous analyses. We evaluate the Bayes factor $B_{SI-DM}$ to compare the Cosmic Hemispherical Asymmetry model with the isotropic model. The data prefer but do not substantially favor the anisotropic model ($B_{SI-DM}=0.4$). We consider several priors and find that this evidence ratio is robust to the choice of prior. The large-scale power suppression does not soften when jointly inferring both the isotropic power spectrum and the parameters of the asymmetric model, indicating no evidence that these anomalies are coupled.
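As a rough illustration of what the quoted MAP values imply, the sketch below assumes the two-parameter power law takes the form $A(l) = A(16)\,(l/16)^{\alpha}$ (the exact pivot convention is an assumption) and uses the standard real-space dipole-modulation picture $T(\hat n) = T_{\rm iso}(\hat n)\,[1 + A\,\hat p \cdot \hat n]$, with $\hat p$ the quoted MAP direction. In the scale-dependent model the modulation acts multipole by multipole, so the single-amplitude real-space factor is only illustrative.

```python
import math

# MAP values quoted in the abstract.
A16_MAP = 0.064
ALPHA_MAP = -0.92
PIVOT_ELL = 16


def dipole_amplitude(ell, a_pivot=A16_MAP, alpha=ALPHA_MAP, pivot=PIVOT_ELL):
    """Assumed power law A(l) = A(pivot) * (l / pivot) ** alpha."""
    if ell <= 0:
        raise ValueError("multipole l must be positive")
    return a_pivot * (ell / pivot) ** alpha


def galactic_unit_vector(l_deg, b_deg):
    """Cartesian unit vector for Galactic longitude l and latitude b, in degrees."""
    l, b = math.radians(l_deg), math.radians(b_deg)
    return (math.cos(b) * math.cos(l), math.cos(b) * math.sin(l), math.sin(b))


# MAP asymmetry direction quoted in the abstract.
DIPOLE_DIR = galactic_unit_vector(247.8, -19.6)


def modulation_factor(direction, amplitude, dipole_dir=DIPOLE_DIR):
    """Multiplicative factor 1 + A * (p_hat . n_hat) of the dipole-modulation model."""
    return 1.0 + amplitude * sum(d * p for d, p in zip(direction, dipole_dir))


if __name__ == "__main__":
    # A negative alpha means the asymmetry weakens toward smaller angular scales (larger l).
    for ell in (8, 16, 32, 64):
        print(ell, round(dipole_amplitude(ell), 4))
```

Along $\hat p$ the fluctuations are boosted by a factor $1 + A$, and in the opposite hemisphere they are damped by $1 - A$.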
Submitted 19 August, 2019; v1 submitted 26 February, 2019;
originally announced February 2019.
-
Experimental Comparison of Hardware-Amenable Spike Detection Algorithms for iBMIs
Authors:
Shoeb Shaikh,
Rosa So,
Camilo Libedinsky,
Arindam Basu
Abstract:
This paper presents an experiment-based comparison of absolute threshold (AT) and non-linear energy operator (NEO) spike detection algorithms in intracortical brain-machine interfaces (iBMIs). Results show an average increase in decoding performance of approximately 5% in monkey A across 28 sessions recorded over 6 days, and approximately 2% in monkey B across 35 sessions recorded over 8 days, when using NEO instead of AT. To the best of our knowledge, this is the first reported comparison of spike detection algorithms in an iBMI experimental framework involving two monkeys. Based on the improvements observed in an experimental setting, backed by previously reported improvements in simulation studies, we advocate switching from the state-of-the-art spike detection technique, AT, to NEO.
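The two detectors being compared can be sketched in a few lines. AT flags samples whose magnitude exceeds a fixed threshold. NEO first computes $\psi[n] = x[n]^2 - x[n-1]\,x[n+1]$, which emphasizes signals that are both large and high-frequency, and then thresholds $\psi$. Setting the NEO threshold to a multiple of the mean of $\psi$ (here `scale=8.0`) is a commonly used heuristic and an assumption of this sketch, not necessarily the paper's setting; the test signal is synthetic.

```python
def rising_edges(mask):
    """Indices where a boolean mask switches from False to True (one event per crossing)."""
    return [n for n, on in enumerate(mask) if on and (n == 0 or not mask[n - 1])]


def absolute_threshold_detect(x, threshold):
    """AT: flag samples whose absolute amplitude exceeds a fixed threshold."""
    return rising_edges([abs(v) > threshold for v in x])


def neo(x):
    """Nonlinear energy operator psi[n] = x[n]**2 - x[n-1] * x[n+1]; endpoints set to 0."""
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


def neo_detect(x, scale=8.0):
    """NEO: threshold psi at `scale` times its mean (assumed heuristic). Expects a non-empty signal."""
    psi = neo(x)
    threshold = scale * sum(psi) / len(psi)
    return rising_edges([p > threshold for p in psi])


if __name__ == "__main__":
    # Synthetic signal: flat baseline with one positive and one negative spike.
    signal = [0.0] * 60
    signal[20] = 1.0
    signal[40] = -1.2
    print("AT :", absolute_threshold_detect(signal, threshold=0.5))
    print("NEO:", neo_detect(signal))
```

Both operate sample by sample with only a few multiplications and comparisons, which is why they are considered hardware-amenable; on real recordings the AT threshold is usually derived from a noise estimate rather than fixed by hand.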
Submitted 11 December, 2018;
originally announced December 2018.
-
Real-time Closed Loop Neural Decoding on a Neuromorphic Chip
Authors:
Shoeb Shaikh,
Rosa So,
Tafadzwa Sibindi,
Camilo Libedinsky,
Arindam Basu
Abstract:
This paper presents, for the first time, a real-time closed-loop intracortical brain-machine interface (iBMI) driven by a neuromorphic decoder chip, in a non-human primate (NHP) experimental setup. Decoded results show trial success rates and mean times to target comparable to those obtained with a hand-controlled joystick. Neural control achieved trial success rates of approximately 96% of those obtained with the hand-controlled joystick. Neural control also achieved mean target reach speeds of approximately 85% of those obtained with the hand-controlled joystick. These results pave the way for fast, accurate, and fully implantable neuromorphic neural decoders in iBMIs.
Submitted 10 December, 2018;
originally announced December 2018.
-
GMRT Archive Processing Project
Authors:
Shubhankar Deshpande,
Yogesh Wadadekar,
Huib Intema,
B. Ratnakumar,
Lijo George,
Rathin Desai,
Archit Sakhadeo,
Shadab Shaikh,
C. H. Ishwara-Chandra,
Divya Oberoi
Abstract:
The GMRT Online Archive now houses over 120 terabytes of interferometric observations obtained with the GMRT since the observatory began operating as a facility in 2002. The utility of this vast data archive, likely the largest of any Indian telescope, can be significantly enhanced if first-look (and, where possible, science-ready) processed images can be made available to the user community. We have initiated a project to pipeline-process GMRT images in the 150, 240, 325 and 610 MHz bands. The thousands of processed continuum images that we will produce will prove useful in studies of distant galaxy clusters and radio AGN, as well as nearby galaxies and star-forming regions. Besides the scientific returns, a uniform data processing pipeline run on a large volume of data can be used in other interesting ways. For example, we will be able to systematically measure various performance characteristics of the GMRT telescope and their dependence on waveband, time of day, RFI environment, backend, Galactic latitude, etc. A variety of data products, such as calibrated UVFITS data, sky images, and AIPS processing logs, will be delivered to users via a web-based interface. Data products will be compatible with standard Virtual Observatory protocols.
Submitted 6 December, 2018;
originally announced December 2018.
-
Multiple rooks of chess - a generic integral field unit deployment technique
Authors:
Sabyasachi Chattopadhyay,
A. N. Ramaprakash,
Pravin Khodade,
Kabir Chakrabarty,
Shabbir Shaikh,
Haeun Chung,
Sungwook E. Hong
Abstract:
A new field reconfiguration technique, Multiple Rooks of Chess (MRC), has been developed for multiple deployable Integral Field Spectrographs. The method involves a mechanical geometry as well as an optimized deployment algorithm. The geometry is found to be simple to implement mechanically. The algorithm first assigns the IFUs to the target objects and then devises the movement sequence based on the current and desired IFU positions. Using suitable actuators running at 20 cm/s, the reconfiguration time is found to be at most 25 seconds for the circular DOTIFS focal plane (180 mm diameter). The Geometry Algorithm Combination (GAC) has been tested on several million mock target configurations with the object-to-IFU ratio (τ) varying from 0.25 to 16. The MRC method is found to be efficient in target acquisition in terms of field revisit and deployment time, without any collision or entanglement of the fiber bundles. The efficiency of the technique is not affected by an increase in the number density of target objects. The technique is compared with other available methods based on sky coverage, flexibility, and overhead time. The proposed geometry and algorithm combination is found to have an advantage in all of these aspects.
Submitted 2 October, 2018;
originally announced October 2018.
-
Extracting Fairness Policies from Legal Documents
Authors:
Rashmi Nagpal,
Chetna Wadhwa,
Mallika Gupta,
Samiulla Shaikh,
Sameep Mehta,
Vikram Goyal
Abstract:
The machine learning community has recently been exploring the implications of bias and fairness in AI applications. The definition of fairness for such applications varies with their domain of application. The policies governing the use of such machine learning systems in a given context are defined by the constitutional laws of nations and by the regulatory policies enforced by the organizations involved in their use. Fairness-related laws and policies are often spread across large documents such as constitutions, agreements, and organizational regulations. These legal documents contain long, complex sentences written to achieve rigor and robustness. Automatic extraction of fairness policies, or more generally of any specific kind of policy, from a large legal corpus can be very useful for studying bias and fairness in the context of AI applications.
We attempted to automatically extract fairness policies from publicly available law documents using two approaches based on semantic relatedness. The experiments reveal how classical WordNet-based similarity and vector-based similarity differ in addressing this task. We show that similarity based on word vectors beats the classical approach by a large margin, whereas other vector representations of senses and sentences fail to even match the classical baseline. Further, we present a thorough error analysis and reasoning to explain the results, with appropriate examples from the dataset for deeper insight.
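Word-vector semantic relatedness is often computed by averaging the word vectors of each sentence and comparing the averages with cosine similarity. The sketch below applies that generic recipe to retrieve sentences related to a fairness query. The tiny 3-dimensional vectors in `TOY_VECTORS`, the example sentences, and the `min_sim` cutoff are all hypothetical; a real system would load pretrained embeddings, and this is not claimed to be the paper's exact pipeline.

```python
import math

# Toy 3-d "embeddings", purely illustrative; a real system would load pretrained word vectors.
TOY_VECTORS = {
    "discrimination": (0.9, 0.1, 0.0),
    "equal": (0.8, 0.2, 0.1),
    "race": (0.7, 0.3, 0.0),
    "treatment": (0.6, 0.2, 0.2),
    "tax": (0.0, 0.1, 0.9),
    "revenue": (0.1, 0.0, 0.95),
}


def sentence_vector(sentence):
    """Average the vectors of in-vocabulary words; None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_sentences(query, sentences, min_sim=0.8):
    """Return sentences whose averaged vector is at least `min_sim` cosine-similar to the query."""
    q = sentence_vector(query)
    scored = []
    for s in sentences:
        v = sentence_vector(s)
        if q is None or v is None:
            continue
        sim = cosine(q, v)
        if sim >= min_sim:
            scored.append((sim, s))
    return [s for _, s in sorted(scored, reverse=True)]


if __name__ == "__main__":
    corpus = ["no discrimination on grounds of race", "tax revenue shall be collected"]
    print(rank_sentences("equal treatment discrimination", corpus))
```

A WordNet-based baseline would instead score word pairs through the sense hierarchy. The averaged-vector approach needs no sense disambiguation, which is one plausible reason it can be more robust on long legal sentences.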
Submitted 12 September, 2018;
originally announced September 2018.
-
Vulnerable to Misinformation? Verifi!
Authors:
Alireza Karduni,
Isaac Cho,
Ryan Wesslen,
Sashank Santhanam,
Svitlana Volkova,
Dustin Arendt,
Samira Shaikh,
Wenwen Dou
Abstract:
We present Verifi2, a visual analytic system to support the investigation of misinformation on social media. On the one hand, social media platforms empower individuals and organizations by democratizing the sharing of information. On the other hand, even well-informed and experienced social media users are vulnerable to misinformation. To address the issue, various models and studies have emerged from multiple disciplines to detect and understand the effects of misinformation. However, there is still a lack of intuitive and accessible tools that help social media users distinguish misinformation from verified news. In this paper, we present Verifi2, a visual analytic system that uses state-of-the-art computational methods to highlight salient features from text, social network, and images. By exploring news on a source level through multiple coordinated views in Verifi2, users can interact with the complex dimensions that characterize misinformation and contrast how real and suspicious news outlets differ on these dimensions. To evaluate Verifi2, we conduct interviews with experts in digital media, journalism, education, psychology, and computing who study misinformation. Our interviews show promising potential for Verifi2 to serve as an educational tool on misinformation. Furthermore, our interview results highlight the complexity of the problem of combating misinformation and call for more work from the visualization community.
Submitted 17 March, 2019; v1 submitted 25 July, 2018;
originally announced July 2018.
-
Anchored in a Data Storm: How Anchoring Bias Can Affect User Strategy, Confidence, and Decisions in Visual Analytics
Authors:
Ryan Wesslen,
Sashank Santhanam,
Alireza Karduni,
Isaac Cho,
Samira Shaikh,
Wenwen Dou
Abstract:
Cognitive biases have been shown to lead to faulty decision-making. Recent research has demonstrated that the effect of cognitive biases, anchoring bias in particular, transfers to information visualization and visual analytics. However, it is still unclear how users of visual interfaces can be anchored and how anchoring affects user performance and the decision-making process. To investigate, we performed two rounds of between-subjects, in-laboratory experiments with 94 participants to analyze the effect of visual anchors and strategy cues on decision-making with a visual analytic system that employs a coordinated multiple-view design. The decision-making task is identifying misinformation from Twitter news accounts. Participants were randomly assigned to one of three treatment groups (including control) in which the participant training processes were modified. Our findings reveal that strategy cues and visual anchors (scenario videos) can significantly affect user activity, speed, confidence, and, under certain circumstances, accuracy. We discuss the implications of our experimental results for training users to use a newly developed visual interface. We call for more careful consideration of how visualization designers and researchers train users, to avoid unintentionally anchoring users and thus affecting the end result.
Submitted 7 June, 2018;
originally announced June 2018.
-
Surface texturing of Ti6Al4V alloy using femtosecond laser for superior antibacterial performance
Authors:
Shazia Shaikh,
Sunita Kedia,
Deepti Singh,
Mahesh Subramanian,
Sucharita Sinha
Abstract:
Titanium and its alloys are the most widely used implant materials in the dental and orthopaedic fields. However, infections occurring during implantation lead to implant failure in most cases. Here, we demonstrate the antibacterial behavior of Ti6Al4V alloy achieved by surface modification with a femtosecond laser beam. After laser treatment, conical microstructures were observed on the Ti6Al4V alloy surface. Different sub-oxide phases of titanium dioxide were detected on the laser-treated samples using X-ray diffraction and X-ray photoelectron spectroscopy. The wettability of the Ti6Al4V alloy surface changed significantly after interaction with the laser. Adhesion and growth of two Gram-positive bacteria (Staphylococcus aureus and Streptococcus mutans) and one Gram-negative bacterium (Pseudomonas aeruginosa) were explored on pristine as well as laser-textured Ti6Al4V alloy surfaces. In vitro investigation on agar plates showed inhibition of bacterial growth on most of the laser-treated surfaces. Greater surface roughness and the occurrence of Magnéli phases of titanium dioxide on the laser-treated surfaces were probably responsible for the antibacterial behavior exhibited by the laser-treated samples. Therefore, femtosecond laser surface treatment of Ti6Al4V alloy could find potential application in the development of infection-free medical implants for dental and orthopaedic use.
Submitted 21 February, 2018;
originally announced February 2018.
-
Using Naive Bayes Algorithm to Students' bachelor Academic Performances Analysis
Authors:
Fahad Razaque,
Nareena Soomro,
Shoaib Ahmed Shaikh,
Safeeullah Soomro,
Javed Ahmed Samo,
Natesh Kumar,
Huma Dharejo
Abstract:
Academic data mining is an emerging field that examines student details through various elements, such as previous semester marks, attendance, assignments, discussions, and lab work, which are used to improve the academic performance of bachelor students and to overcome the difficulties of low-ranking bachelor students. Useful knowledge was extracted from bachelor students' academic data collected from the Department of Computing. After preprocessing, data mining techniques were applied to the data to discover classifications and clusters. In this study, a classification method based on the naive Bayes algorithm is described and used for academic data mining. It supports both students and lecturers in evaluating academic performance, and it serves as a cautionary method that helps students improve their study performance.
Submitted 5 February, 2018;
originally announced February 2018.
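For readers unfamiliar with the classifier named above, here is a minimal categorical naive Bayes sketch with Laplace (add-one) smoothing. The feature names (`attendance`, `prev_marks`), the pass/fail labels, and the four toy records are hypothetical; this is not the authors' data or implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(records, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[label][feature][value] -> number of training records
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)  # distinct values seen per feature
    for record, label in zip(records, labels):
        for feature, value in record.items():
            value_counts[label][feature][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values

def predict(model, record):
    """Return the class maximizing log P(class) + sum of log P(value | class)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for feature, value in record.items():
            # Laplace smoothing keeps unseen values from zeroing the product.
            numerator = value_counts[label][feature][value] + 1
            denominator = count + len(feature_values[feature])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy records, for illustration only.
records = [
    {"attendance": "high", "prev_marks": "good"},
    {"attendance": "high", "prev_marks": "average"},
    {"attendance": "low", "prev_marks": "poor"},
    {"attendance": "low", "prev_marks": "average"},
]
labels = ["pass", "pass", "fail", "fail"]
model = train_naive_bayes(records, labels)
print(predict(model, {"attendance": "high", "prev_marks": "good"}))  # pass
```

Working in log space avoids underflow when many features are multiplied together.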
-
An End-To-End Machine Learning Pipeline That Ensures Fairness Policies
Authors:
Samiulla Shaikh,
Harit Vishwakarma,
Sameep Mehta,
Kush R. Varshney,
Karthikeyan Natesan Ramamurthy,
Dennis Wei
Abstract:
In consequential real-world applications, machine learning (ML) based systems are expected to provide fair and non-discriminatory decisions on candidates from groups defined by protected attributes such as gender and race. These expectations are set via policies or regulations governing data usage and decision criteria (sometimes explicitly calling out decisions by automated systems). Often, the data creator, the feature engineer, the author of the algorithm and the user of the results are different entities, making the task of ensuring fairness in an end-to-end ML pipeline challenging. Manually understanding the policies and ensuring fairness in opaque ML systems is time-consuming and error-prone, thus necessitating an end-to-end system that can: 1) understand policies written in natural language, 2) alert users to policy violations during data usage, and 3) log each activity performed using the data in an immutable storage so that policy compliance or violation can be proven later. We propose such a system to ensure that data owners and users are always in compliance with fairness policies.
Submitted 18 October, 2017;
originally announced October 2017.
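Requirement 3 above (logging every data activity so that compliance can be proven later) is often approximated with a tamper-evident, hash-chained append-only log. The sketch below shows that general idea using Python's standard `hashlib` and `json`; the class name, entry fields, and activities are hypothetical, and this is not the system proposed in the paper.

```python
import hashlib
import json

class HashChainLog:
    """Append-only log in which each entry commits to the hash of its predecessor."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash, activity):
        payload = json.dumps({"prev": prev_hash, "activity": activity}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, activity):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"activity": activity, "prev": prev_hash,
                 "hash": self._digest(prev_hash, activity)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every hash; editing any earlier entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["activity"]):
                return False
            prev_hash = entry["hash"]
        return True

log = HashChainLog()
log.append({"user": "analyst", "action": "read", "column": "gender"})
log.append({"user": "analyst", "action": "train_model"})
print(log.verify())                             # True
log.entries[0]["activity"]["column"] = "race"   # tamper with history
print(log.verify())                             # False
```

A hash chain only makes tampering detectable; in practice the latest hash would also be stored somewhere the data user cannot rewrite.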
-
Towards a disaster response system based on cognitive radio ad hoc networks
Authors:
Noman Islam,
Ghazala Shafi Shaikh
Abstract:
This paper presents an approach to disaster management based on cognitive radio ad hoc networks. Despite growing interest in cognitive radio ad hoc networks, little work has been reported on using them for disaster management. This paper discusses opportunities for disaster management based on such networks. In this direction, the paper presents a novel technique for disaster detection based on an Artificial Neural Network (ANN) trained with the backpropagation algorithm. An ANN-based spectrum sensing scheme is also presented. Finally, a service discovery scheme is presented for coordination during a disaster. The proposed approach has been simulated in the NS-2 simulator, and the proposed disaster detection system shows a very low false-negative alarm rate. The spectrum switching time of the spectrum sensing scheme and the latency of the proposed service discovery scheme are also analyzed.
Submitted 3 October, 2017;
originally announced October 2017.
-
Simultaneous Detection and Quantification of Retinal Fluid with Deep Learning
Authors:
Dustin Morley,
Hassan Foroosh,
Saad Shaikh,
Ulas Bagci
Abstract:
We propose a new deep learning approach for automatic detection and segmentation of fluid within retinal OCT images. The proposed framework utilizes both ResNet and Encoder-Decoder neural network architectures. When training the network, we apply a novel data augmentation method called myopic warping together with standard rotation-based augmentation to increase the training set size to 45 times the original amount. Finally, the network output is post-processed with an energy minimization algorithm (graph cut) along with a few other knowledge-guided morphological operations to finalize the segmentation. Based on OCT imaging data and its ground truth from the RETOUCH challenge, the proposed system achieves Dice indices of 0.522, 0.682, and 0.612, and average absolute volume differences of 0.285, 0.115, and 0.156 mm$^3$ for intraretinal fluid, subretinal fluid, and pigment epithelial detachment, respectively.
Submitted 17 August, 2017;
originally announced August 2017.
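The Dice index reported above measures the overlap between a predicted segmentation A and the ground truth B as 2|A ∩ B| / (|A| + |B|), so 1 means perfect overlap and 0 means none. A minimal sketch on binary masks flattened to lists follows; the masks are made-up toy values, and returning 1.0 when both masks are empty is an assumed convention.

```python
def dice_index(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two equal-length binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total

pred = [0, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 0, 1, 0]
print(dice_index(pred, truth))  # 2*2 / (3 + 3) ≈ 0.667
```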
-
Femtosecond laser induced surface modification for prevention of bacterial adhesion on 45S5 bioactive glass
Authors:
Shazia Shaikh,
Deepti Singh,
Mahesh Subramanian,
Sunita Kedia,
Anil Kumar Singh,
Kulwant Singh,
Nidhi Gupta,
Sucharita Sinha
Abstract:
Bacterial attachment and biofilm formation on implant surfaces have been a major concern in hospital and industrial environments. Preventing bacterial infection of implant surfaces through surface treatment could be a potential solution, and this has therefore become a key area of research. In the present study, the antibacterial and biocompatible properties of femtosecond laser surface-treated 45S5 bioactive glass (BG) have been investigated. Adhesion and sustainability of both gram-positive S. aureus and gram-negative P. aeruginosa and E. coli nosocomial bacteria on untreated and laser-treated BG samples have been explored. An imprint method has been used to visualize the growth of bacteria on the sample surface. We observed complete bacterial rejection on the laser-treated surface, potentially reducing the risk of biofilm formation. This was correlated with the surface roughness, wettability, and surface chemical composition of the samples before and after laser treatment. Biocompatibility of the laser-treated BG was demonstrated by studying the anchoring and growth of the human cervix cell line INT407. Our results demonstrate that laser surface modification of BG enables enhanced bacterial rejection without affecting its biocompatibility towards the growth of human cells on it. These results open a potentially significant approach to using lasers to impart desirable characteristics to BG-based bio-implants and devices.
Submitted 20 June, 2017;
originally announced June 2017.
-
Software Model Checking: A Promising Approach to Verify Mobile App Security
Authors:
Irina Mariuca Asavoae,
Hoang Nga Nguyen,
Markus Roggenbach,
Siraj Ahmed Shaikh
Abstract:
In this position paper we advocate software model checking as a technique suitable for security analysis of mobile apps. Our recommendation is based on promising results that we achieved on analysing app collusion in the context of the Android operating system. Broadly speaking, app collusion appears when, in performing a threat, several apps are working together, i.e., they exchange information which they could not obtain on their own. In this context, we developed the Kandroid tool, which provides an encoding of the Android/Smali code semantics within the K framework. Kandroid allows for software model checking of Android APK files. Though our experience so far is limited to collusion, we believe the approach to be applicable to further security properties as well as other mobile operating systems.
Submitted 15 June, 2017;
originally announced June 2017.
-
A mathematical Study of Magnetohydrodynamic Casson Fluid via Special Functions with Heat and Mass Transfer embedded in Porous Plate
Authors:
Kashif Ali Abro,
Hina Saeed Shaikh,
Ilyas Khan
Abstract:
This article investigates the impacts of heat and mass transfer on magnetohydrodynamic Casson fluid embedded in a porous medium. Generalized solutions have been traced out for the temperature distribution, mass concentration, and velocity profiles in the presence and absence of a transverse magnetic field, permeability, and porosity. The corresponding solutions for the temperature distribution, mass concentration, and velocity profiles are expressed in terms of the newly defined generalized Robotnov-Hartley function, the Wright function, and the Mittag-Leffler function, respectively. All the corresponding solutions also fulfill the necessary initial, natural, and boundary conditions. The Caputo fractionalized solutions have been converted to ordinary solutions by substituting ζ=1. Some similar solutions for the temperature distribution, mass concentration, and velocity profiles have been particularized from the generalized solutions. Owing to the rheology of the problem, the effects of distinct parameters are discussed in detail through graphical illustrations produced with Mathcad software (15).
Submitted 8 June, 2017;
originally announced June 2017.
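The two-parameter Mittag-Leffler function mentioned above is defined by the power series E_{α,β}(z) = Σ_{k≥0} z^k / Γ(αk + β) for α, β > 0; it reduces to e^z when α = β = 1, and E_{2,1}(−x²) = cos(x). The truncated-series sketch below is only suitable for real z of moderate magnitude; the tolerance and term cap are arbitrary assumptions, and the paper's own numerical procedure (Mathcad) is not reproduced here.

```python
import math

def mittag_leffler(z, alpha, beta=1.0, tol=1e-16, max_terms=1000):
    """Two-parameter Mittag-Leffler function via its power series.

    E_{alpha,beta}(z) = sum_{k>=0} z**k / Gamma(alpha*k + beta), alpha > 0, beta > 0.
    Suitable only for real z of moderate magnitude: for large |z| (especially
    negative z) the alternating series loses precision to cancellation.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    total = 1.0 / math.gamma(beta)  # k = 0 term
    if z == 0:
        return total
    log_abs_z = math.log(abs(z))
    sign = -1.0 if z < 0 else 1.0
    for k in range(1, max_terms):
        # Work in log space so Gamma(alpha*k + beta) never overflows.
        term = sign ** k * math.exp(k * log_abs_z - math.lgamma(alpha * k + beta))
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

print(mittag_leffler(1.0, 1.0))   # ≈ 2.718281828 (e)
print(mittag_leffler(-1.0, 2.0))  # ≈ 0.540302306 (cos 1)
```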
-
Surface Treatment of 45S5 Bio-glass using Femtosecond Laser to Achieve Superior Growth of Hydroxyapatite
Authors:
Shazia Shaikh,
Sunita Kedia,
Anil Kumar Singh,
Kuldeep Sharma,
Sucharita Sinha
Abstract:
45S5 Hench bio-glass (BG) has attracted research interest because of its potential clinical applications. Several in-vivo and in-vitro studies have been in progress to improve the bio-integration efficiency of this glass. In the present contribution, the surface of Hench BG has been modified using a femtosecond (fs) laser beam, resulting in an increased effective surface area of the sample. These surface-modified samples were subsequently immersed in simulated body fluid for varying numbers of days and characterized using scanning electron microscopy, energy-dispersive X-ray analysis, X-ray diffraction (XRD), and micro-Raman spectroscopy. In-vitro studies indicated superior growth of a hydroxyapatite (HAP) layer on the laser-treated samples compared with the untreated samples. The presence of strong XRD peaks confirmed faster growth of HAP on the laser-treated samples. Raman peaks that were five times more intense and relatively narrower indicated higher crystallinity of the hydroxyapatite layer on laser-treated BG.
Submitted 18 October, 2016;
originally announced October 2016.
-
Constraining stochastic gravitational wave background from weak lensing of CMB B-modes
Authors:
Shabbir Shaikh,
Suvodip Mukherjee,
Aditya Rotti,
Tarun Souradeep
Abstract:
A stochastic gravitational wave background (SGWB) will affect the CMB anisotropies via weak lensing. Unlike weak lensing due to large-scale structure, which only deflects photon trajectories, an SGWB has the additional effect of rotating the polarization vector along the trajectory. We study the relative importance of these two effects, deflection and rotation, specifically in the context of the E-mode to B-mode power transfer caused by weak lensing due to an SGWB. Using weak lensing distortion of the CMB as a probe, we derive constraints on the spectral energy density ($Ω_{GW}$) of the SGWB, sourced at different redshifts, without assuming any particular model for its origin. We present these bounds on $Ω_{GW}$ for different power-law models characterizing the SGWB, indicating the threshold above which observable imprints of the SGWB must be present in the CMB.
Submitted 20 September, 2016; v1 submitted 28 June, 2016;
originally announced June 2016.
-
Towards Automated Android App Collusion Detection
Authors:
Irina Mariuca Asavoae,
Jorge Blasco,
Thomas M. Chen,
Harsha Kumara Kalutarage,
Igor Muttik,
Hoang Nga Nguyen,
Markus Roggenbach,
Siraj Ahmed Shaikh
Abstract:
Android OS supports multiple communication methods between apps. This opens the possibility of carrying out threats in a collaborative fashion, cf. the Soundcomber example from 2011. In this paper we provide a concise definition of collusion and report on a number of automated detection approaches, developed in co-operation with Intel Security.
Submitted 7 March, 2016;
originally announced March 2016.
-
Direction dependence of cosmological parameters due to cosmic hemispherical asymmetry
Authors:
Suvodip Mukherjee,
Pavan K. Aluri,
Santanu Das,
Shabbir Shaikh,
Tarun Souradeep
Abstract:
Persistent evidence for a cosmic hemispherical asymmetry in the temperature field of the cosmic microwave background (CMB), as observed by both WMAP and PLANCK, increases the possibility of its cosmological origin. The presence of this signal may lead to different values of the standard model cosmological parameters in different directions, which can have significant implications for other studies where they are used. We investigate the effect of this cosmic hemispherical asymmetry on cosmological parameters using non-isotropic Gaussian random simulations injected with both scale-dependent and scale-independent modulation strengths. Our analysis shows that $A_s$ and $n_s$ are the parameters most susceptible to acquiring position dependence across the sky for the kind of isotropy-breaking phenomena under study. As expected, we find that the maximum variation arises for scale-independent modulation of CMB anisotropies. We find that a scale-dependent modulation profile as seen in PLANCK data could lead to only a $1.25σ$ deviation in $A_s$ compared with its estimate from an isotropic CMB sky.
Submitted 25 June, 2016; v1 submitted 1 October, 2015;
originally announced October 2015.
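For reference, hemispherical asymmetry is commonly modelled in the literature as a dipolar modulation of an otherwise statistically isotropic temperature field, where $\hat{n}$ is the observing direction, $\hat{p}$ the preferred direction, and $A$ the modulation amplitude. Letting the amplitude vary with angular scale gives the scale-dependent case discussed above. This is the standard phenomenological form, stated here as an assumption; the paper's exact parametrization may differ.

$$
\Delta T(\hat{n}) = \left(1 + A\,\hat{p}\cdot\hat{n}\right)\Delta T_{\mathrm{iso}}(\hat{n})
$$

Setting $A = 0$ recovers the isotropic sky $\Delta T_{\mathrm{iso}}$, against which the direction-dependent parameter estimates are compared.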
-
Document clustering using graph based document representation with constraints
Authors:
Muhammad Rafi,
Farnaz Amin,
Mohammad Shahid Shaikh
Abstract:
Document clustering is an unsupervised approach in which a large collection of documents (corpus) is subdivided into smaller, meaningful, identifiable, and verifiable sub-groups (clusters). Meaningful representation of documents, and implicit identification of the patterns on which this separation is performed, is the challenging part of document clustering. We propose a document clustering technique using graph-based document representation with constraints. A graph data structure can easily capture non-linear relationships between nodes; since a document contains various feature terms that can be non-linearly connected, a graph can easily represent this information. Constraints are explicit conditions for document clustering in which background knowledge is used to set the direction for Linking or Not-Linking a set of documents for target clusters, thus guiding the clustering process. We regard clustering as an ill-defined problem for which many clustering results are possible; background knowledge can be used to drive the clustering algorithm in the right direction. We propose three different types of constraints: instance-level, corpus-level, and cluster-level constraints. A new algorithm, Constrained HAC, is also proposed, which incorporates instance-level constraints as prior knowledge to guide the clustering process towards better results. An extensive set of experiments has been performed on both synthetic and standard document clustering datasets, and the results are compared on standard clustering measures such as purity, entropy, and F-measure. The results clearly establish that our proposed approach improves cluster quality.
Submitted 4 December, 2014;
originally announced December 2014.
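To illustrate how instance-level Linking/Not-Linking constraints can steer agglomerative clustering, here is a minimal single-linkage sketch in which must-link pairs are merged first and cannot-link pairs veto any merge that would place them together. The distance function, the toy one-dimensional points, and this particular merge policy are assumptions for illustration; it is not the paper's Constrained HAC algorithm or its graph-based document representation.

```python
from itertools import combinations

def constrained_hac(points, k, distance, must_link=(), cannot_link=()):
    """Single-linkage agglomerative clustering down to k clusters under
    instance-level constraints given as pairs of point indices."""
    clusters = [{i} for i in range(len(points))]

    def find(i):
        return next(c for c in clusters if i in c)

    def merge(a, b):
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)

    def violates(a, b):
        return any((i in a and j in b) or (i in b and j in a) for i, j in cannot_link)

    # Must-link pairs are merged before any distance-based merge.
    for i, j in must_link:
        a, b = find(i), find(j)
        if a is not b:
            if violates(a, b):
                raise ValueError(f"constraints conflict on pair ({i}, {j})")
            merge(a, b)

    while len(clusters) > k:
        best = None
        for a, b in combinations(clusters, 2):
            if violates(a, b):
                continue  # cannot-link veto
            d = min(distance(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        if best is None:  # every remaining merge is vetoed
            break
        merge(best[1], best[2])
    return clusters

points = [0, 1, 2, 50, 51]

def dist(x, y):
    return abs(x - y)

print(constrained_hac(points, 2, dist))                          # [{0, 1, 2}, {3, 4}]
print(constrained_hac(points, 2, dist, cannot_link=[(0, 2)]))    # [{0, 1}, {2, 3, 4}]
```

The second call shows a single cannot-link pair overriding geometric proximity, which is the kind of guidance background knowledge is meant to provide.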
-
An improved semantic similarity measure for document clustering based on topic maps
Authors:
Muhammad Rafi,
Mohammad Shahid Shaikh
Abstract:
A major computational burden in document clustering is calculating the similarity measure between pairs of documents. A similarity measure is a function that assigns a real number between 0 and 1 to a pair of documents, depending upon the degree of similarity between them. A value of zero means that the documents are completely dissimilar, whereas a value of one indicates that the documents are practically identical. Traditionally, vector-based models have been used for computing document similarity. Vector-based models represent several features present in documents, but these approaches to similarity measurement generally cannot account for the semantics of the document. Documents written in human languages contain contexts, and the words used to describe these contexts are generally semantically related. Motivated by this fact, many researchers have proposed semantic-based similarity measures by utilizing text annotation through external thesauri like WordNet (a lexical database). In this paper, we define a semantic similarity measure based on documents represented in topic maps. Topic maps are rapidly becoming an industrial standard for knowledge representation with a focus on later search and extraction. The documents are transformed into topic-map-based coded knowledge, and the similarity between a pair of documents is represented as a correlation between their common patterns (sub-trees). Experimental studies on text mining datasets reveal that this new similarity measure is more effective than commonly used similarity measures in text clustering.
Submitted 17 March, 2013;
originally announced March 2013.
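The 0-to-1 scale described above is exactly what the traditional vector-based baseline produces: cosine similarity between non-negative term-frequency vectors always lies in [0, 1]. A minimal sketch of that baseline follows, with lowercase whitespace tokenization as a simplifying assumption; the paper's topic-map measure is not reproduced here.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between raw term-frequency vectors of two documents."""
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)

print(cosine_similarity("topic maps represent knowledge",
                        "topic maps represent knowledge"))       # 1.0
print(cosine_similarity("car engine", "automobile motor"))       # 0.0
```

The second call illustrates the limitation the abstract points to: related meanings expressed with different words score zero, which is what semantic measures aim to fix.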
-
A comparison of SVM and RVM for Document Classification
Authors:
Muhammad Rafi,
Mohammad Shahid Shaikh
Abstract:
Document classification is the task of assigning a new, unclassified document to one of a predefined set of classes. Content-based document classification uses the content of the document, with some weighting criteria, to assign it to one of the predefined classes. It is a major task in library science, electronic document management systems, and information sciences. This paper investigates document classification using two different classification techniques: (1) Support Vector Machine (SVM) and (2) Relevance Vector Machine (RVM). SVM is a supervised machine learning technique that can be used for classification tasks. In its basic form, SVM represents the data instances in a space and tries to separate the distinct classes by the widest possible gap (a hyperplane). RVM, on the other hand, uses a probabilistic measure to define this separation space: it uses Bayesian inference to obtain a succinct solution, and thus uses significantly fewer basis functions. Experimental studies on three standard text classification datasets reveal that although RVM takes more training time, its classification is much better than that of SVM.
Submitted 13 January, 2013;
originally announced January 2013.
-
A Framework for Analysing Driver Interactions with Semi-Autonomous Vehicles
Authors:
Siraj Shaikh,
Padmanabhan Krishnan
Abstract:
Semi-autonomous vehicles are increasingly serving critical functions in various settings from mining to logistics to defence. A key characteristic of such systems is the presence of the human (driver) in the control loop. To ensure safety, the driver needs to be aware of the autonomous aspects of the vehicle, and the automated features of the vehicle need to be built to enable safer control. In this paper we propose a framework to combine empirical models describing human behaviour with the environment and system models. We then analyse, via model checking, the interaction between the models for desired safety properties. The aim is to analyse the design for safe vehicle-driver interaction. We demonstrate the applicability of our approach using a case study involving semi-autonomous vehicles in which driver fatigue is a factor critical to a safe journey.
Submitted 31 December, 2012;
originally announced January 2013.
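The abstract does not detail its models, but the core of checking a safety property can be sketched as explicit-state model checking: a breadth-first search over all reachable states that returns a shortest counterexample trace if some state violates the invariant. The two-variable driver/vehicle model, its transitions, and the property below are invented for illustration and are not the paper's models or tool.

```python
from collections import deque

def check_invariant(initial, successors, invariant):
    """Explicit-state safety check by breadth-first search of reachable states.

    Returns None if the invariant holds in every reachable state, otherwise a
    shortest counterexample path from the initial state to a violating state.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Hypothetical toy model: state = (driver condition, vehicle mode).
def successors(state):
    driver, mode = state
    moves = []
    if driver == "alert":
        moves.append(("tired", mode))      # fatigue may set in at any time
    if mode == "auto":
        moves.append((driver, "manual"))   # automation hands control back
    else:
        moves.append((driver, "auto"))     # driver engages automation
    return moves

def no_tired_manual_driving(state):
    """Safety property: a tired driver is never in manual control."""
    return state != ("tired", "manual")

print(check_invariant(("alert", "auto"), successors, no_tired_manual_driving))
# [('alert', 'auto'), ('tired', 'auto'), ('tired', 'manual')]
```

The returned trace shows that, in this toy model, the automation can hand control back to a fatigued driver, the kind of design flaw such an analysis is meant to expose.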
-
Content-based Text Categorization using Wikitology
Authors:
Muhammad Rafi,
Sundus Hassan,
Mohammad Shahid Shaikh
Abstract:
A major computational burden in document clustering is calculating the similarity measure between pairs of documents. A similarity measure is a function that assigns a real number between 0 and 1 to a pair of documents, depending upon the degree of similarity between them. A value of zero means that the documents are completely dissimilar, whereas a value of one indicates that the documents are practically identical. Traditionally, vector-based models have been used for computing document similarity. Vector-based models represent several features present in documents, but these approaches to similarity measurement generally cannot account for the semantics of the document. Documents written in human languages contain contexts, and the words used to describe these contexts are generally semantically related. Motivated by this fact, many researchers have proposed semantic-based similarity measures by utilizing text annotation through external thesauri like WordNet (a lexical database). In this paper, we define a semantic similarity measure based on documents represented in topic maps. Topic maps are rapidly becoming an industrial standard for knowledge representation with a focus on later search and extraction. The documents are transformed into topic-map-based coded knowledge, and the similarity between a pair of documents is represented as a correlation between their common patterns. Experimental studies on text mining datasets reveal that this new similarity measure is more effective than commonly used similarity measures in text clustering.
Submitted 17 August, 2012;
originally announced August 2012.
-
Association Rule Mining Based On Trade List
Authors:
Sanober Shaikh,
Madhuri rao
Abstract:
In this paper, a new mining algorithm based on frequent itemsets is defined. The Apriori algorithm scans the database every time it looks for frequent itemsets, so it is very time-consuming, and at each step it generates candidate itemsets; for large databases, it therefore takes a lot of space to store the candidate itemsets. The undirected itemset graph is an improvement on Apriori, but it takes time and space to generate the tree. The defined algorithm scans the database only once, at the start, and from the scanned database it generates the Trade List, which contains the information of the whole database. Using the minimum support, it finds the frequent itemsets, and using the minimum confidence, it generates the association rules. If the database and minimum support are changed, the new algorithm finds the new frequent itemsets by scanning the Trade List. That is why its execution efficiency is distinctly improved compared with the traditional algorithm.
Submitted 22 February, 2012;
originally announced February 2012.
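The single-scan idea can be illustrated by reading the transactions once into an in-memory vertical index (item → set of transaction IDs), then computing the support of an itemset as the size of the intersection of its items' ID sets, and the confidence of a rule X → Y as support(X ∪ Y) / support(X). Treating the paper's Trade List as such a transaction-ID index is an assumption, as are the toy transactions; the level-wise search below only illustrates support and confidence and is not the authors' algorithm.

```python
from itertools import combinations

def build_tid_lists(transactions):
    """Single pass over the database: item -> set of transaction ids containing it."""
    tids = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault(item, set()).add(tid)
    return tids

def frequent_itemsets(tids, n_transactions, min_support):
    """Level-wise search; support = |intersection of the items' tid sets| / n."""
    frequent = {}
    level = [frozenset([item]) for item in tids]
    while level:
        for itemset in level:
            covering = set.intersection(*(tids[i] for i in itemset))
            support = len(covering) / n_transactions
            if support >= min_support:
                frequent[itemset] = support
        current = [s for s in level if s in frequent]
        # Join frequent k-itemsets that differ in one item into (k+1)-candidates.
        next_candidates = {a | b for a, b in combinations(current, 2)
                           if len(a | b) == len(a) + 1}
        level = list(next_candidates)
    return frequent

def association_rules(frequent, min_confidence):
    """Rules antecedent -> consequent with confidence = support(all) / support(antecedent)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), confidence))
    return rules

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
index = build_tid_lists(transactions)          # the only pass over the data
freq = frequent_itemsets(index, len(transactions), min_support=0.5)
for antecedent, consequent, confidence in association_rules(freq, min_confidence=0.8):
    print(sorted(antecedent), "->", sorted(consequent), confidence)
# ['butter'] -> ['bread'] 1.0
```

Changing the minimum support only reruns `frequent_itemsets` on the existing index, with no new pass over the transactions.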
-
Comparing SVM and Naive Bayes classifiers for text categorization with Wikitology as knowledge enrichment
Authors:
Sundus Hassan,
Muhammad Rafi,
Muhammad Shahid Shaikh
Abstract:
The activity of labeling documents according to their content is known as text categorization. Many experiments have been carried out to enhance text categorization by adding background knowledge to the document using knowledge repositories like WordNet, Open Project Directory (OPD), Wikipedia, and Wikitology. In our previous work, we carried out intensive experiments by extracting knowledge from Wikitology and evaluating them with a Support Vector Machine using 10-fold cross-validation. The results clearly indicate that Wikitology is far better than other knowledge bases. In this paper, we compare Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers under text enrichment through Wikitology. We validated the results with 10-fold cross-validation and showed that, compared with the baseline results, NB gives an improvement of +28.78%, whereas SVM gives an improvement of +6.36%. The Naïve Bayes classifier is the better choice when external enrichment through any external knowledge base is used.
Submitted 18 February, 2012;
originally announced February 2012.
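Both results above are validated with 10-fold cross-validation. A minimal sketch of how k folds partition a set of labelled documents follows: every document lands in exactly one test fold, and the classifier is trained on the remaining k − 1 folds each time. The round-robin fold assignment and the fixed shuffling seed are assumptions; stratification by class, often used for text categorization, is omitted for brevity.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds over n_samples items."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    # Round-robin assignment: fold sizes differ by at most one.
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))   # 22 3 for the first five folds, then 23 2
```

The reported score is then the average of the per-fold scores obtained by training on `train` and evaluating on `test`.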
-
Document Clustering based on Topic Maps
Authors:
Muhammad Rafi,
M. Shahid Shaikh,
Amir Farooq
Abstract:
The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents such as the World Wide Web (WWW). The next challenge lies in clustering documents based on their semantic content. The problem of document clustering has two main components: (1) representing the document in a form that inherently captures the semantics of the text, which may also help to reduce the dimensionality of the document, and (2) defining a similarity measure based on the semantic representation that assigns higher numerical values to document pairs with a stronger semantic relationship. The feature space of documents can be very challenging for document clustering. A document may contain multiple topics, a large set of class-independent general words, and a handful of class-specific core words. With these features in mind, traditional agglomerative clustering algorithms, which are based on either the Document Vector Model (DVM) or the Suffix Tree model (STC), are less efficient at producing results with high cluster quality. This paper introduces a new approach for document clustering based on the Topic Map representation of documents. Each document is transformed into a compact form, and a similarity measure is proposed based upon the information inferred from topic map data and structures. The suggested method is implemented using agglomerative hierarchical clustering and tested on standard information retrieval (IR) datasets. The comparative experiment reveals that the proposed approach is effective in improving cluster quality.
Submitted 28 December, 2011;
originally announced December 2011.