Questions tagged [sequence-analysis]
Sequence analysis (in the social sciences) is the analysis of how people or other units of study move from one state to another (for example, single-->married-->widowed, unemployed-->employed-->retired) over the course of their lifespan.
36
questions
1
vote
2
answers
51
views
Adjusting the legend in TraMineR plots
I am new to using TraMineR and I just cannot seem to figure out how to arrange the legend in any of the plot types. The legend keeps being cut off by the plots. I have tried to use the seqlegend ...
1
vote
1
answer
72
views
Sequence alignment for hierarchical cluster analysis on categorical sequence data
I have a dataset of short-term behaviors displayed by 30 individuals.
#Load packages
library(TraMineR)
# Function to generate a random non-numerical sequence
generate_random_sequence <- function(...
1
vote
2
answers
120
views
Convert long data.frame to sequence in TraMineR
I have a data.frame in long format, that I want to convert to a TraMineR sequence object.
set.seed(1)
df <- data.frame(year = rep(1990:2010, 3),
id = rep(1:3, each = 21),
...
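A minimal sketch of one possible approach, assuming the long data has one row per id and year plus a state column. The column name `state` and the toy values are hypothetical, since the excerpt above is truncated. Base R's reshape() turns the long table into the one-row-per-sequence (wide) layout that TraMineR's seqdef() accepts.

```r
# Toy long-format data: 3 individuals observed over 5 years.
# `state` is a hypothetical column name used only for illustration.
set.seed(1)
long <- data.frame(
  id    = rep(1:3, each = 5),
  year  = rep(2001:2005, 3),
  state = sample(c("A", "B", "C"), 15, replace = TRUE),
  stringsAsFactors = FALSE
)

# One row per id, one column per year (state.2001, ..., state.2005).
wide <- reshape(long, idvar = "id", timevar = "year", direction = "wide")
print(wide)

# With TraMineR loaded, the state columns could then be passed on, e.g.
# seqdef(wide[, -1]); that call is left out so the sketch runs without the package.
```

The wide table keeps `id` as its first column, so dropping it leaves only the yearly state columns.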
3
votes
1
answer
48
views
Regression tree size in the context of state sequence analysis using TraMineR in R
I am building a regression tree using state sequence analysis and I want the image output to have the dimensions of Letter-size paper (landscape).
When I use the code I include the regression tree ...
2
votes
3
answers
246
views
TraMineR in R for sequence analysis: how to account for state order besides spell length?
I'm doing sequence analysis with TraMineR in R and I would like to take into account only the order of different spells over time.
For instance, I would like the sequence A-B-A to be ...
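To illustrate what "order only" means here, a small base R sketch that collapses each run of identical consecutive states into a single element, so that spell durations are ignored. The helper name `collapse_spells` is hypothetical; TraMineR offers seqdss() for the same idea on sequence objects, which is not called here so that the sketch runs without the package.

```r
# Keep only the order of distinct successive states, dropping spell lengths.
# rle() computes run lengths; its $values holds one entry per run.
collapse_spells <- function(states) {
  rle(states)$values
}

long_spells  <- c("A", "A", "A", "B", "A")
short_spells <- c("A", "B", "B", "B", "A", "A")

# Both sequences reduce to the same ordering of spells.
print(collapse_spells(long_spells))
print(collapse_spells(short_spells))
```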
1
vote
1
answer
58
views
Extracting a portion of the generated Representative Sequences
So, I have a set of 893 sequences of varying lengths with max sequence length = 152. There are 10 unique states across all of them. These sequences are split into two groups: Promoted and Not Promoted....
2
votes
1
answer
254
views
R: TraMineR conversion between sequence formats, SPELL to STS, without dates?
I am trying to study the volunteer trajectories of a group of individuals. My data looks something like this.
ID Program Area Impact Area Hours Served Organization Served
1 Tutoring ...
1
vote
1
answer
538
views
Is TraMineR appropriate for data with different sequence length?
My data has the sequence of each student's page visit behaviors during a learning session. For example (below) Student 1 read instructions, visited three pages ("Visit-Visit-Visit"), and ...
1
vote
1
answer
466
views
Remove missing data state ‘%’ when using TraMineR’s seqpcplot() function
I am trying to conduct event sequence analysis on longitudinal survey data. I want to create a plot which looks like this (pg. 44 of https://www.researchgate.net/publication/...
1
vote
1
answer
99
views
Sequence analysis clustering CHI2 EUCLID error
I am quite new to sequence analysis and trying to identify clusters in an aggregated sequence matrix, focusing on the state duration. However, when using method='CHI2'/'EUCLID' combined with step=1 (...
1
vote
1
answer
132
views
Setting the "tpow" and "expcost" arguments in TraMineR::seqdist
I'm currently working on the pathways of inpatients during their hospital stay. These pathways are represented as state sequences (the current medical unit at each time unit) and I'm trying to find ...
1
vote
1
answer
469
views
How to compute dissimilarities between sequences when sequences contain gaps?
I want to cluster sequences with optimal matching with TraMineR::seqdist() from data that contains missing values, i.e. sequences containing gaps.
library(TraMineR)
data(ex1)
sum(is.na(ex1))
# [1] 38
sq <- ...
2
votes
3
answers
182
views
How to get the largest possible column sequence with the least possible row NAs from a huge matrix?
I want to select columns from a data frame so that the resulting continuous column-sequences are as long as possible, while the number of rows with NAs is as small as possible, because they have to be ...
1
vote
1
answer
81
views
How to introduce noise into sequence data using TraMineR?
I want to randomly change states in a sequence dataset for the purposes of simulation. The goal is to see how different measures of cluster quality behave with different degrees of structure in the ...
1
vote
1
answer
651
views
How to test if two lift values are significantly different from each other?
Consider this code:
# Load libraries
library(RCurl)
library(TraMineR)
library(PST)
# Get data
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/...
1
vote
0
answers
45
views
Comparing log-loss values for a probabilistic suffix tree?
In the PST package one can estimate the prediction quality of individual sequences using the log-loss, e.g.:
R> ex2 <- c("a-a-b", "a-b-a-a-b", "b-b-b-b-a")
R> ex2 <- seqdef(ex2)
R> ...
1
vote
1
answer
86
views
Meaning of lag parameter in PST?
In the pmine() function in PST you can use lags. What is this lag? Does it mean that it ignores the lag first positions in the sequence? Or does it mean that you allow for lags within the subsequences?...
1
vote
1
answer
55
views
What is the meaning of alpha in the context of an information gain pruning function?
In the PST package we use the value C as a cut-off for the information gain function used to prune the tree. The C value, for an alpha of 0.05 is calculated as follows:
C95 <- qchisq(0.95, 1) / 2
...
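As a small illustration of the formula quoted above, the following base R sketch computes the cut-off for a few alpha levels. The helper name `gain_cutoff` and the `df` argument are assumptions introduced for illustration and are not part of the PST package; with `df = 1` and `alpha = 0.05` it reproduces the `C95` expression from the question.

```r
# Cut-off C = qchisq(1 - alpha, df) / 2, as in the quoted C95 expression.
# `gain_cutoff` is a hypothetical helper, not a PST function.
gain_cutoff <- function(alpha, df = 1) {
  qchisq(1 - alpha, df = df) / 2
}

alphas  <- c(0.10, 0.05, 0.01)
cutoffs <- sapply(alphas, gain_cutoff)
names(cutoffs) <- c("C90", "C95", "C99")

# A smaller alpha gives a larger cut-off, i.e. stricter pruning.
print(round(cutoffs, 3))
```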
2
votes
1
answer
118
views
Fitting a VLMC to very long sequences
I am trying to fit a VLMC to a dataset where the longest sequence is 296 states. I do it as shown below:
# Load libraries
library(PST)
library(RCurl)
library(TraMineR)
# Load and transform data
x <- ...
2
votes
1
answer
98
views
Predicting conditional probabilities based on contexts with only 1 state
It seems that PST cannot predict the conditional probabilities of the next state after contexts which consist of a single state, e.g. EX-EX
Consider this code:
# Load libraries
library(RCurl)
...
2
votes
1
answer
57
views
Calculate lift for context-state relationship in a probabilistic suffix tree?
PST gives me probabilities and conditional probabilities for various contexts and following states. However, it would be very helpful to be able to calculate the lift (and its significance) of the ...
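For orientation, lift is commonly defined as P(state | context) / P(state). A minimal base R sketch of that ratio follows. The probabilities are made up for illustration and do not come from any fitted PST, and the sketch does not address significance testing.

```r
# Hypothetical probabilities, for illustration only.
p_state           <- c(A = 0.50, B = 0.30, C = 0.20)  # marginal P(state)
p_state_given_ctx <- c(A = 0.25, B = 0.60, C = 0.15)  # P(state | context)

# Lift > 1: the state is more likely after the context than overall.
# Lift < 1: the state is less likely after the context than overall.
lift <- p_state_given_ctx / p_state
print(lift)
```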
2
votes
1
answer
273
views
Where in the sequence of a Probabilistic Suffix Tree does "e" occur?
In my data there are only missing data (*) on the right side of the sequences. That means that no sequence starts with * and no sequence has any other markers after *. Despite this the PST (...
2
votes
2
answers
107
views
Getting log-likelihood from probabilistic suffix tree
Here is my code:
library(RCurl)
library(TraMineR)
library(PST)
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/...
0
votes
2
answers
701
views
Python/Biopython: How to search for a reference sequence (string) in a sequence with gaps?
I am facing the following problem and have not found a solution yet:
I am working on a tool for sequence analysis which uses a file with reference sequences and tries to find one of these reference ...
0
votes
1
answer
99
views
Sequence Analysis and Predicting the Next Label
I have recorded a dataset of about 1000 entries in the following format.
TimeStamp | Action | UserId
2015-02-05 | Action1 | XXX
2015-02-06 | Action2 | YYY
2015-02-07 | Action2 | XXX
...
I try to ...
1
vote
1
answer
121
views
R - need help putting matrix into basket or transaction form
  Server      Epoch A B C D E
1   C301 1420100400 1 0 1 0 0
2   C301 1420100700 0 0 0 0 0
3   C301 1420152000 0 1 0 0 0
4   C301 1420238100 1 1 1 0 0
5   C301 1420324500 1 1 1 1 1
I need ...
1
vote
0
answers
1k
views
Sequence Mining using arulesSequence package in R
I am trying to learn about Sequence Mining, and I ran the following code from wikibooks as an example. The cspade function has taken over 30 minutes to run (and is still running) when the example ...
6
votes
4
answers
1k
views
Pattern in continuous sequence data
Suppose I have a list of events. For example A, D, T, H, U, A, B, F, H, ....
What I need is to find frequent patterns that occur in the complete sequence. In this problem we cannot use traditional ...
2
votes
1
answer
61
views
Detecting sequencing using regexes
Imagine I have multiple character strings in a list like this:
[[1]]
[1] "1-FA-1-I2-1-I2-1-I2-1-EX-1-I2-1-I3-1-FA-1-"
[2] "-1-I2-1-TR-1-"
[3] "-1-I2-1-FA-1-I3-1-" ...
1
vote
2
answers
888
views
TraMineR substitution cost
I have a logical problem with the transition cost matrix. I am working on sequence dissimilarity using the R package TraMineR.
I will try to give you a simple example (very simple, but I hope useful to ...
1
vote
1
answer
197
views
How to address void elements overwhelming analysis?
I'm conducting some analysis on sequence data with very different lengths using TraMineR. What ends up happening is that the void elements (%) used to make the sequences equally long end up ...
1
vote
1
answer
346
views
TraMineR:::seqerules help page?
Is there a help page for TraMineR:::seqerules? I cannot seem to find it, either in the package or online. The lack of this help page makes the output somewhat difficult to interpret. For example what ...
1
vote
1
answer
113
views
Format of output for seqecmpgroup() function?
The seqecmpgroup() function returns a table that, among other things, includes frequencies for each of the specified groups. However, when I run this it generates frequencies below 1 (e.g. 0.00035). ...
1
vote
2
answers
1k
views
seqinr dotplot - change axis
I have two datasets: seq1 and seq2 (DNA sequences). I wanted to do a dotplot, comparing the two sequences and placing a dot where the two sequences match. I was able to accomplish this using seqinr's ...
2
votes
2
answers
184
views
How to identify sequences within each leaf from a regression tree?
Using the biofam dataset
library(TraMineR)
data(biofam)
lab <- c("P","L","M","LM","C","LC","LMC","D")
biofam.seq <- seqdef(biofam[,10:25], states=lab)
head(biofam.seq)
Sequence ...
3
votes
1
answer
348
views
TraMineR Using Weights
I am still new to TraMineR; therefore, my problem might be very simple for most of you.
I am working on some sequence plots with my data and would like to see the results with the survey weights and ...