I tried (unsuccessfully) to identify the individual corresponding to a representative sequence, using the function seqrep() from the R package TraMineR.

I read Gabadinho, A., and G. Ritschard (2013), "Searching for typical life trajectories applied to childbirth histories", In Levy, R. & Widmer, E. (eds) Gendered life courses - Between individualization and standardization. A European approach applied to Switzerland, pp. 287-312. Vienna: LIT.

I was able to visualize the representative sequence(s) of my sequence data using seqrplot(), with different values of the criterion argument of seqrep() ("freq", "density", ...).

My aim is to identify, in the associated survey database, the individual(s) to whom the representative sequence(s) correspond(s), in order to describe their characteristics (e.g. social ones).

I wasn't able to do this step.

Thank you for your help. Best regards, Jacques-Antoine

1 Answer

If I understand correctly, you want the indexes of the representative sequences. Since the representative sequences are taken from the dataset itself, each of them matches at least one case in it. We can identify these cases by searching for the sequences that are at a distance 0 from the representative. However, a representative sequence may occur several times in the same dataset, i.e., there may be more than one sequence at a distance 0 from the representative.
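To make the "distance 0" idea concrete before turning to real data, here is a minimal base R sketch on a made-up matrix. The matrix `toy.dist` and its NA pattern are assumptions for illustration only (one row per sequence, one column per representative, NA where a sequence is not compared with that representative); it is not TraMineR output.

```r
## Made-up distance-to-representative matrix: 5 sequences (rows),
## 2 representatives (columns). NA = not compared with that representative.
toy.dist <- matrix(c(0,  NA,
                     2,  NA,
                     NA, 0,
                     0,  NA,
                     NA, 3),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(paste0("case", 1:5), c("rep1", "rep2")))

## For each representative, the row positions of the sequences at distance 0.
## which() treats NA as FALSE, so NA cells are simply skipped.
zero.idx <- lapply(seq_len(ncol(toy.dist)),
                   function(j) unname(which(toy.dist[, j] == 0)))
zero.idx
```

In this toy case the first representative occurs twice (rows 1 and 4) and the second once (row 3), which is the situation described above: one representative, several matching cases.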

Here I show, using the biofam data, how you can identify the index of the first occurrence of each representative in the dataset.

library(TraMineR)
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam, 10:25, labels=biofam.lab)
## Computing the distance matrix
costs <- seqsubm(biofam.seq, method="TRATE")
biofam.om <- seqdist(biofam.seq, method="OM", sm=costs)
## Representative set using the neighborhood density criterion
biofam.rep <- seqrep(biofam.seq, diss=biofam.om, criterion="density")

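## Distances of the sequences to the representatives (one column per representative)
rep.dist <- attr(biofam.rep, "Distances")
## Representative assigned to each sequence (column with the smallest distance)
rep.grp <- apply(rep.dist, 1, which.min)
## Distance of each sequence to its assigned representative
dist.to.rep <- apply(rep.dist, 1, min, na.rm=TRUE)

## For each representative, collect the cases at distance 0 from it
nrep <- ncol(rep.dist)
idx.rep <- integer(length=nrep)  # first occurrence of each representative
idx.rep.list <- list()           # all occurrences of each representative
for (i in seq_len(nrep)) {
  idx.rep.list[[i]] <- which(rep.grp==i & dist.to.rep==0)
  idx.rep[i] <- idx.rep.list[[i]][1]
}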
idx.rep
## [1]  60  31   1 163

The values are row positions in the dataset: the first occurrence of the first representative sequence is case 60, that of the second representative is case 31, that of the third is case 1, and that of the fourth is case 163.

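Each representative sequence occurs more than once. For example, the occurrences of the first representative are listed below; in the output, the upper row gives the row names of the cases and the lower row their positions in the dataset: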

idx.rep.list[1]
## [[1]]
## 1692 2530 2302  416 1386 1921  323 1857  908  746 2379 2404 1403 1893  348 1688 1629 1799  139 1987 
##   60  128  130  152  235  312  389  494  534  537  621  965 1024 1035 1231 1458 1459 1573 1607 1752
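
Once the positions are known, describing the corresponding individuals amounts to subsetting the survey data frame by those positions. The sketch below shows one way to do this in base R; the helper `rep_members()`, the toy data frame and the toy index list are all hypothetical names and made-up values for illustration, not part of TraMineR.

```r
## Hypothetical helper: stack the survey rows of all occurrences of each
## representative, adding a column that says which representative they match.
rep_members <- function(survey, idx.list, vars) {
  out <- lapply(seq_along(idx.list), function(i) {
    data.frame(representative = rep(i, length(idx.list[[i]])),
               survey[idx.list[[i]], vars, drop = FALSE])
  })
  do.call(rbind, out)
}

## Made-up survey table and occurrence list, for illustration only
toy.survey <- data.frame(sex = c("man", "woman", "woman", "man", "woman"),
                         birthyr = c(1940, 1952, 1948, 1939, 1955))
toy.idx.list <- list(c(1, 4), 3)

members <- rep_members(toy.survey, toy.idx.list, c("sex", "birthyr"))
members
## Cross-tabulate a characteristic by representative
table(members$representative, members$sex)
```

With the objects computed above, the analogous call would be something like rep_members(biofam, idx.rep.list, c("sex", "birthyr")), and biofam[idx.rep, ] would return only the first occurrence of each representative. Both assume that the sequence object was built from the survey data frame without dropping or reordering rows, so that positions in the sequence object and in the data frame coincide.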
