Questions tagged [sequence-analysis]

Sequence analysis (in the social sciences) is the analysis of how people or other units of study move from one state to another (for example, single-->married-->widowed, unemployed-->employed-->retired) over the course of their lifespan.

1 vote
2 answers
51 views

Adjusting the legend in TraMineR plots

I am new to using TraMineR and I just cannot seem to figure out how to arrange the legend in any of the plot types. The legend keeps being cut off by the plots. I have tried to use the seqlegend ...
user25809482
1 vote
1 answer
72 views

Sequence alignment for hierarchical cluster analysis on categorical sequence data

I have a dataset of short-term behaviors displayed by 30 individuals.
# Load packages
library(TraMineR)
# Function to generate a random non-numerical sequence
generate_random_sequence <- function(...
JatNTU
1 vote
2 answers
120 views

Convert long data.frame to sequence in TraMineR

I have a data.frame in long format that I want to convert to a TraMineR sequence object.
set.seed(1)
df <- data.frame(year = rep(1990:2010, 3), id = rep(1:3, each = 21), ...
Maël
3 votes
1 answer
48 views

Regression tree size in the context of state sequence analysis using TraMineR in R

I am building a regression tree using state sequence analysis and I want the image output to have the dimensions of letter-size paper (landscape). When I use the code I include the regression tree ...
idborquez
2 votes
3 answers
246 views

TraMineR in R for sequence analysis: how to account for state order besides spell length?

I'm doing sequence analysis with TraMineR in R and I would like to take into account only the order of the different spells over time. For instance, I would like the sequence A-B-A to be ...
ggg
1 vote
1 answer
58 views

Extracting a portion of the generated Representative Sequences

So, I have a set of 893 sequences of varying lengths with max sequence length = 152. There are 10 unique states across all of them. These sequences are split into two groups: Promoted and Not Promoted....
Anand Vamsi
2 votes
1 answer
254 views

R: TraMineR conversion between sequence formats (SPELL to STS) without dates?

I am trying to study the volunteer trajectories of a group of individuals. My data looks something like this: ID Program Area Impact Area Hours Served Organization Served 1 Tutoring ...
r_mpp
1 vote
1 answer
538 views

Is TraMineR appropriate for data with different sequence length?

My data has the sequence of each student's page visit behaviors during a learning session. For example (below) Student 1 read instructions, visited three pages ("Visit-Visit-Visit"), and ...
jakeM
1 vote
1 answer
466 views

Remove missing data state ‘%’ when using TraMineR’s seqpcplot() function

I am trying to conduct event sequence analysis on longitudinal survey data. I want to create a plot which looks like this (pg. 44 of https://www.researchgate.net/publication/...
Misc584
1 vote
1 answer
99 views

Sequence analysis clustering CHI2 EUCLID error

I am quite new to sequence analysis and trying to identify clusters in an aggregated sequence matrix, focusing on the state duration. However, when using method='CHI2'/'EUCLID' combined with step=1 (...
Rico
1 vote
1 answer
132 views

Setting the "tpow" and "expcost" arguments in TraMineR::seqdist

I'm currently working on the pathways of inpatients during their hospital stay. These pathways are represented as state sequences (the current medical unit at each time unit) and I'm trying to find ...
L. Trutt
1 vote
1 answer
469 views

How to compute dissimilarities between sequences when sequences contain gaps?

I want to cluster sequences with optimal matching using TraMineR::seqdist() on data that contains missing values, i.e. sequences containing gaps.
library(TraMineR)
data(ex1)
sum(is.na(ex1))
# [1] 38
sq ...
jay.sf
2 votes
3 answers
182 views

How to get the largest possible column sequence with the least possible row NAs from a huge matrix?

I want to select columns from a data frame so that the resulting continuous column-sequences are as long as possible, while the number of rows with NAs is as small as possible, because they have to be ...
jay.sf
1 vote
1 answer
81 views

How to introduce noise into sequence data using TraMineR?

I want to randomly change states in a sequence dataset for the purposes of simulation. The goal is to see how different measures of cluster quality behave with different degrees of structure in the ...
Kenji
1 vote
1 answer
651 views

How to test if two lift values are significantly different from each other?

Consider this code:
# Load libraries
library(RCurl)
library(TraMineR)
library(PST)
# Get data
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/...
histelheim
1 vote
0 answers
45 views

Comparing log-loss values for a probabilistic suffix tree?

In the PST package one can estimate the prediction quality of individual sequences using the log-loss, e.g.:
R> ex2 <- c("a-a-b", "a-b-a-a-b", "b-b-b-b-a")
R> ex2 <- seqdef(ex2)
R> ...
histelheim
1 vote
1 answer
86 views

Meaning of lag parameter in PST?

In the pmine() function in PST you can use lags. What is this lag? Does it mean that it ignores the lag first positions in the sequence? Or does it mean that you allow for lags within the subsequences?...
histelheim
1 vote
1 answer
55 views

What is the meaning of alpha in the context of an information gain pruning function?

In the PST package we use the value C as a cut-off for the information gain function used to prune the tree. The C value, for an alpha of 0.05 is calculated as follows: C95 <- qchisq(0.95, 1) / 2 ...
histelheim
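For reference, the cut-off quoted in this excerpt is half the 0.95 quantile of a chi-squared distribution with one degree of freedom (the case alpha = 0.05), which is what qchisq(0.95, 1) / 2 evaluates to:

$$
C_{0.95} = \frac{\chi^2_{1,\,0.95}}{2} \approx \frac{3.841}{2} \approx 1.921
$$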
2 votes
1 answer
118 views

Fitting a VLMC to very long sequences

I am trying to fit a VLMC to a dataset where the longest sequence is 296 states. I do it as shown below:
# Load libraries
library(PST)
library(RCurl)
library(TraMineR)
# Load and transform data
x ...
histelheim
2 votes
1 answer
98 views

Predicting conditional probabilities based on contexts with only 1 state

It seems that PST cannot predict the conditional probabilities of the next state after contexts which consist of a single state, e.g. EX-EX.
Consider this code:
# Load libraries
library(RCurl) ...
histelheim
2 votes
1 answer
57 views

Calculate lift for context-state relationship in a probabilistic suffix tree?

PST gives me probabilities and conditional probabilities for various contexts and following states. However, it would be very helpful to be able to calculate the lift (and its significance) of the ...
histelheim
2 votes
1 answer
273 views

Where in the sequence of a Probabilistic Suffix Tree does "e" occur?

In my data there are only missing data (*) on the right side of the sequences. That means that no sequence starts with * and no sequence has any other markers after *. Despite this the PST (...
histelheim
2 votes
2 answers
107 views

Getting log-likelihood from probabilistic suffix tree

Here is my code:
library(RCurl)
library(TraMineR)
library(PST)
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/...
histelheim
0 votes
2 answers
701 views

Python/Biopython: How to search for a reference sequence (string) in a sequence with gaps?

I am facing the following problem and have not found a solution yet: I am working on a tool for sequence analysis which uses a file with reference sequences and tries to find one of these reference ...
Sefu
0 votes
1 answer
99 views

Sequence Analysis and Predicting the Next Label

I have recorded a dataset of about 1000 entries in the following format.
TimeStamp | Action | UserId
2015-02-05 | Action1 | XXX
2015-02-06 | Action2 | YYY
2015-02-07 | Action2 | XXX
...
I try to ...
nor0x
1 vote
1 answer
121 views

R: need help putting a matrix into basket or transaction form

  Server      Epoch A B C D E
1   C301 1420100400 1 0 1 0 0
2   C301 1420100700 0 0 0 0 0
3   C301 1420152000 0 1 0 0 0
4   C301 1420238100 1 1 1 0 0
5   C301 1420324500 1 1 1 1 1
I need ...
qman
1 vote
0 answers
1k views

Sequence mining using the arulesSequences package in R

I am trying to learn about sequence mining, and I ran the following code from Wikibooks as an example. The cspade function has taken over 30 minutes to run (and is still running) when the example ...
orangeteam2
6 votes
4 answers
1k views

Pattern in continuous sequence data

Suppose I have a list of events. For example A, D, T, H, U, A, B, F, H, .... What I need is to find frequent patterns that occur in the complete sequence. In this problem we cannot use traditional ...
Haris
2 votes
1 answer
61 views

Detecting sequencing using regexes

Imagine I have multiple character strings in a list like this:
[[1]]
[1] "1-FA-1-I2-1-I2-1-I2-1-EX-1-I2-1-I3-1-FA-1-"
[2] "-1-I2-1-TR-1-"
[3] "-1-I2-1-FA-1-I3-1-"
...
histelheim
1 vote
2 answers
888 views

TraMineR substitution cost

I have a logical problem with the transition cost matrix. I am working on sequence dissimilarity using the R package TraMineR. I will try to give you a simple example (very simple, but I hope useful to ...
Giampiero
1 vote
1 answer
197 views

How to address void elements overwhelming analysis?

I'm conducting some analysis on sequence data with very different lengths using TraMineR. What ends up happening is that the void elements (%) used to make the sequences equally long end up ...
histelheim
1 vote
1 answer
346 views

TraMineR:::seqerules help page?

Is there a help page for TraMineR:::seqerules? I cannot seem to find it, either in the package or online. The lack of this help page makes the output somewhat difficult to interpret. For example, what ...
histelheim
1 vote
1 answer
113 views

Format of output for seqecmpgroup() function?

The seqecmpgroup() function returns a table that, among other things, includes frequencies for each of the specified groups. However, when I run it, it generates frequencies below 1 (e.g. 0.00035). ...
histelheim
1 vote
2 answers
1k views

seqinr dotplot - change axis

I have two datasets: seq1 and seq2 (DNA sequences). I wanted to make a dot plot comparing the two sequences, placing a dot where the two sequences match. I was able to accomplish this using seqinr's ...
Re-l
2 votes
2 answers
184 views

How to identify sequences within each leaf from a regression tree?

Using the biofam dataset:
library(TraMineR)
data(biofam)
lab <- c("P","L","M","LM","C","LC","LMC","D")
biofam.seq <- seqdef(biofam[,10:25], states=lab)
head(biofam.seq)
Sequence ...
histelheim
3 votes
1 answer
348 views

TraMineR Using Weights

I am still new to TraMineR; therefore, my problem might be very simple for most of you. I am working on some sequence plots with my data and would like to see the results with the survey weights and ...
user3355411