Questions tagged [sequence-analysis]

Sequence analysis (in the social sciences) is the analysis of how people or other units of study move from one state to another (for example, single-->married-->widowed, unemployed-->employed-->retired) over the course of their lifespan.

sequence-analysis
1 vote
0 answers
45 views

Comparing log-loss values for a probabilistic suffix tree?

In the PST package one can estimate the prediction quality of individual sequences using the log-loss, e.g.:
R> ex2 <- c("a-a-b", "a-b-a-a-b", "b-b-b-b-a")
R> ex2 <- seqdef(ex2)
R> ...
histelheim
  • 5,018
1 vote
1 answer
86 views

Meaning of lag parameter in PST?

In the pmine() function in PST you can use lags. What is this lag? Does it mean that the first lag positions in the sequence are ignored? Or does it mean that lags are allowed within the subsequences?...
histelheim
  • 5,018
1 vote
1 answer
55 views

What is the meaning of alpha in the context of an information gain pruning function?

In the PST package we use the value C as a cut-off for the information gain function used to prune the tree. The C value for an alpha of 0.05 is calculated as follows: C95 <- qchisq(0.95, 1) / 2 ...
histelheim
  • 5,018
2 votes
1 answer
118 views

Fitting a VLMC to very long sequences

I am trying to fit a VLMC to a dataset where the longest sequence is 296 states. I do it as shown below:
# Load libraries
library(PST)
library(RCurl)
library(TraMineR)
# Load and transform data
x &...
histelheim
  • 5,018
2 votes
1 answer
98 views

Predicting conditional probabilities based on contexts with only 1 state

It seems that PST cannot predict the conditional probabilities of the next state after contexts which consist of a single state, e.g. EX-EX. Consider this code:
# Load libraries
library(RCurl)
...
histelheim
  • 5,018
2 votes
1 answer
57 views

Calculate lift for context-state relationship in a probabilistic suffix tree?

PST gives me probabilities and conditional probabilities for various contexts and following states. However, it would be very helpful to be able to calculate the lift (and its significance) of the ...
histelheim
  • 5,018
2 votes
1 answer
273 views

Where in the sequence of a Probabilistic Suffix Tree does "e" occur?

In my data there are only missing data (*) on the right side of the sequences. That means that no sequence starts with * and no sequence has any other markers after *. Despite this, the PST (...
histelheim
  • 5,018
2 votes
2 answers
107 views

Getting log-likelihood from probabilistic suffix tree

Here is my code:
library(RCurl)
library(TraMineR)
library(PST)
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/...
histelheim
  • 5,018
0 votes
2 answers
701 views

Python/Biopython: How to search for a reference sequence (string) in a sequence with gaps?

I am facing the following problem and have not found a solution yet: I am working on a tool for sequence analysis which uses a file with reference sequences and tries to find one of these reference ...
Sefu
  • 5
0 votes
1 answer
99 views

Sequence Analysis and Predicting the Next Label

I have recorded a dataset of about 1000 entries in the following format.
TimeStamp | Action | UserId
2015-02-05 | Action1 | XXX
2015-02-06 | Action2 | YYY
2015-02-07 | Action2 | XXX
...
I try to ...
nor0x
  • 1,213
1 vote
1 answer
121 views

R - need help putting a matrix into basket or transaction form

  Server Epoch      A B C D E
1 C301   1420100400 1 0 1 0 0
2 C301   1420100700 0 0 0 0 0
3 C301   1420152000 0 1 0 0 0
4 C301   1420238100 1 1 1 0 0
5 C301   1420324500 1 1 1 1 1
I need ...
qman
  • 11
1 vote
0 answers
1k views

Sequence Mining using arulesSequence package in R

I am trying to learn about sequence mining, and I ran the following code from Wikibooks as an example. The cspade function has taken over 30 minutes to run (and is still running) when the example ...
orangeteam2
6 votes
4 answers
1k views

Pattern in continuous sequence data

Suppose I have a list of events. For example A, D, T, H, U, A, B, F, H, .... What I need is to find frequent patterns that occur in the complete sequence. In this problem we cannot use traditional ...
Haris
  • 12.2k
2 votes
1 answer
61 views

Detecting sequencing using regexes

Imagine I have multiple character strings in a list like this:
[[1]]
[1] "1-FA-1-I2-1-I2-1-I2-1-EX-1-I2-1-I3-1-FA-1-"
[2] "-1-I2-1-TR-1-"
[3] "-1-I2-1-FA-1-I3-1-"
...
histelheim
  • 5,018
1 vote
2 answers
888 views

TraMineR substitution cost

I have a logical problem with the transition cost matrix. I am working on sequence dissimilarity using the R package TraMineR. I will try to give a simple example (very simple, but I hope useful to ...
Giampiero
