Questions tagged [sequence-analysis]
Sequence analysis (in the social sciences) is the analysis of how people or other units of study move from one state to another (for example, single-->married-->widowed, unemployed-->employed-->retired) over the course of their lifespan.
36 questions
1 vote, 0 answers, 45 views
Comparing log-loss values for a probabilistic suffix tree?
In the PST package one can estimate the prediction quality of individual sequences using the log-loss, e.g.:
R> ex2 <- c("a-a-b", "a-b-a-a-b", "b-b-b-b-a")
R> ex2 <- seqdef(ex2)
R> ...
1 vote, 1 answer, 86 views
Meaning of lag parameter in PST?
In the pmine() function in PST you can use lags. What is this lag? Does it mean that the first lag positions in the sequence are ignored? Or does it mean that you allow for lags within the subsequences?...
1 vote, 1 answer, 55 views
What is the meaning of alpha in the context of an information gain pruning function?
In the PST package we use the value C as a cut-off for the information gain function used to prune the tree. The C value for an alpha of 0.05 is calculated as follows:
C95 <- qchisq(0.95, 1) / 2
...
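A minimal sketch of how alpha relates to the 0.95 in the line above, assuming only base R's qchisq(); the variable name alpha is introduced here for illustration, and df = 1 is copied from the excerpt rather than derived:

```r
# Cut-off for information-gain pruning at significance level alpha.
# df = 1 mirrors the excerpt above; it is not derived here.
alpha <- 0.05
C95 <- qchisq(1 - alpha, df = 1) / 2  # same call as qchisq(0.95, 1) / 2
print(C95)
```

With alpha = 0.05 the quantile is taken at 1 - alpha = 0.95, which gives a cut-off of about 1.92.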
2 votes, 1 answer, 118 views
Fitting a VLMC to very long sequences
I am trying to fit a VLMC to a dataset where the longest sequence is 296 states. I do it as shown below:
# Load libraries
library(PST)
library(RCurl)
library(TraMineR)
# Load and transform data
x &...
2 votes, 1 answer, 98 views
Predicting conditional probabilities based on contexts with only 1 state
It seems that PST cannot predict the conditional probabilities of the next state after contexts which consist of a single state, e.g. EX-EX.
Consider this code:
# Load libraries
library(RCurl)
...
2 votes, 1 answer, 57 views
Calculate lift for context-state relationship in a probabilistic suffix tree?
PST gives me probabilities and conditional probabilities for various contexts and following states. However, it would be very helpful to be able to calculate the lift (and its significance) of the ...
2 votes, 1 answer, 273 views
Where in the sequence of a Probabilistic Suffix Tree does "e" occur?
In my data, missing data (*) occur only on the right side of the sequences. That means that no sequence starts with * and no sequence has any other markers after *. Despite this the PST (...
2 votes, 2 answers, 107 views
Getting log-likelihood from probabilistic suffix tree
Here is my code:
library(RCurl)
library(TraMineR)
library(PST)
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/...
0 votes, 2 answers, 701 views
Python/Biopython: How to search for a reference sequence (string) in a sequence with gaps?
I am facing the following problem and have not found a solution yet:
I am working on a tool for sequence analysis which uses a file with reference sequences and tries to find one of these reference ...
0 votes, 1 answer, 99 views
Sequence Analysis and Predicting the Next Label
I have recorded a dataset of about 1000 entries in the following format.
TimeStamp | Action | UserId
2015-02-05 | Action1 | XXX
2015-02-06 | Action2 | YYY
2015-02-07 | Action2 | XXX
...
I try to ...
1 vote, 1 answer, 121 views
R - need help putting a matrix into basket or transaction form
  Server      Epoch A B C D E
1   C301 1420100400 1 0 1 0 0
2   C301 1420100700 0 0 0 0 0
3   C301 1420152000 0 1 0 0 0
4   C301 1420238100 1 1 1 0 0
5   C301 1420324500 1 1 1 1 1
I need ...
1 vote, 0 answers, 1k views
Sequence Mining using arulesSequence package in R
I am trying to learn about Sequence Mining, and I ran the following code from wikibooks as an example. The cspade function has taken over 30 minutes to run (and is still running) when the example ...
6 votes, 4 answers, 1k views
Pattern in continuous sequence data
Suppose I have a list of events. For example A, D, T, H, U, A, B, F, H, ....
What I need is to find frequent patterns that occur in the complete sequence. In this problem we cannot use traditional ...
2 votes, 1 answer, 61 views
Detecting sequencing using regexes
Imagine I have multiple character strings in a list like this:
[[1]]
[1] "1-FA-1-I2-1-I2-1-I2-1-EX-1-I2-1-I3-1-FA-1-"
[2] "-1-I2-1-TR-1-"
[3] "-1-I2-1-FA-1-I3-1-" ...
1 vote, 2 answers, 888 views
TraMineR substitution cost
I have a logical problem with the transition cost matrix. I am working on sequence dissimilarity using the R package TraMineR.
I will try to give you a simple example (very simple, but I hope useful to ...