My data contains the sequence of each student's page-visit behaviors during a learning session. In the example below, Student 1 read the instructions, visited three pages ("Visit-Visit-Visit"), and revisited one of the pages ("Revisit"). Student 2 read the instructions and visited two pages without any revisit.

Student 1: Instructions-Visit-Visit-Visit-Revisit

Student 2: Instructions-Visit-Visit

Student 3: Instructions-Visit-Visit-Visit-Visit-Visit-Visit-Visit-Visit-Visit-Visit-Visit

My question is whether the TraMineR package is appropriate for this type of data, where different individuals have different sequence lengths (Student 1 has 5, Student 2 has 3, etc.). The sample data "mvad" discussed in the TraMineR vignette (https://cran.r-project.org/web/packages/TraMineR/vignettes/TraMineR-state-sequence.pdf) has state information captured over a fixed time period (Jul.93 through Jun.99), which means that the sequence length is the same for all individuals. Given this difference, I am not sure whether it is okay to use TraMineR for analyzing my data.

I tried a couple of TraMineR functions on my data (seqdef, seqfplot, etc.). The results make sense to me so far, but I want to make sure before going further (clustering analysis, etc.). If anyone has experience using TraMineR for this type of data, I would appreciate your input. If TraMineR is not appropriate here, are there any suggestions for an alternative approach? My goal is to identify and visualize major patterns of behavior in the data, possibly using clustering analysis. Thanks in advance!

1 Answer

Yes, you can use TraMineR to analyze data with different sequence lengths, as TraMineR is a collection of sequence analysis tools.

What matters when you have sequences of unequal length is which distance algorithm you use. Optimal Matching (OM), the commonly used standard (selected with method = "OM" in seqdist), accepts sequences of unequal length because it uses indel (insert/delete) operations to "make" the sequences the same length. Other distance algorithms, such as the Hamming distances (HAM or DHD), do not allow sequences of unequal length. These algorithms are often used when timing is important, and inserting states to equalize the lengths would distort the timing.
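To make the difference concrete, here is a minimal sketch in plain Python. It is not TraMineR's implementation; the function names `om_distance` and `hamming_distance` are made up for illustration, and the costs (indel = 1, substitution = 2) are assumed values picked for the example. It only shows why an indel-based distance copes with unequal lengths while a position-by-position comparison does not.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Edit distance with insert/delete and substitution costs.

    Illustrative only: constant costs, no normalization.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    # Turning a prefix into an empty sequence costs one indel per state.
    for i in range(1, rows):
        d[i][0] = i * indel
    for j in range(1, cols):
        d[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            match_cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(
                d[i - 1][j] + indel,           # delete a state from a
                d[i][j - 1] + indel,           # insert a state into a
                d[i - 1][j - 1] + match_cost,  # keep or substitute
            )
    return d[-1][-1]


def hamming_distance(a, b):
    """Count positions that differ; only defined for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


# The three students from the question.
student1 = ["Instructions", "Visit", "Visit", "Visit", "Revisit"]
student2 = ["Instructions", "Visit", "Visit"]
student3 = ["Instructions"] + ["Visit"] * 11

print(om_distance(student1, student2))
print(om_distance(student2, student3))

try:
    hamming_distance(student1, student2)
except ValueError as err:
    print("Hamming failed:", err)
```

With these assumed costs, Student 1 and Student 2 are separated by two deletions, so the OM-style distance is driven largely by the difference in length. That is worth keeping in mind when interpreting clusters built on such distances.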

So the short answer is yes, but make sure to read up on the distance algorithm you are using, so that you understand what it is measuring and how that will affect your interpretation.
