How can I find out which sequences are represented by each "representative sequence"?
For instance, in the example below, is there a way to retrieve the 627 original sequences represented by "r1"?
```r
library(TraMineR)
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam, 10:25, labels = biofam.lab)
## Computing the distance matrix
costs <- seqsubm(biofam.seq, method = "TRATE")
biofam.om <- seqdist(biofam.seq, method = "OM", sm = costs)
## Representative set using the neighborhood density criterion
biofam.rep <- seqrep(biofam.seq, diss = biofam.om, criterion = "density")
biofam.rep
summary(biofam.rep)
```
Output of `summary(biofam.rep)`:

```
[>] criterion: density
[>] 2000 sequence(s) in the original data set
[>] 4 representative sequences
[>] overall quality: 0.08113734
[>] statistics for the representative set:

        na na(%)  nb nb(%)    SD   MD    DC    V      Q
r1     627  31.4 225 11.25  4566 7.28  4856 4.73   5.97
r2     577  28.8 123  6.15  4305 7.46  5175 5.05  16.81
r3     411  20.5 115  5.75  2658 6.47  2394 4.34 -11.04
r4     385  19.2  93  4.65  3006 7.81  3393 5.57  11.42
Total 2000 100.0 556 27.80 14535 7.27 15818 7.91   8.11

na: number of assigned objects
nb: number of objects in the neighborhood
SD: sum of the na distances to the representative
MD: mean of the na distances to the representative
DC: sum of the na distances to the center of the complete set
V: discrepancy of the subset
Q: quality of the representative
```
A related question: could someone explain in more detail how "na" and "nb" should be read and interpreted? For example, do the 4 representative sequences (r1, r2, r3, r4) represent all 2000 sequences, or only the 556 sequences?
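To make my current reading of `na` and `nb` concrete, here is a minimal base-R sketch of what I assume these columns count, on a tiny made-up distance matrix. `diss`, `rep_idx` and `radius` are hypothetical placeholders, not TraMineR objects. My assumption is that every object is assigned to its closest representative, so that `na` adds up to the whole data set (627 + 577 + 411 + 385 = 2000), while `nb` counts only the assigned objects lying within the neighborhood radius (225 + 123 + 115 + 93 = 556). Whether `nb` really excludes objects assigned to another representative is also an assumption on my part.

```r
## Hypothetical toy data: symmetric distances between 6 objects
diss <- matrix(c(
  0, 1, 2, 8, 9, 7,
  1, 0, 3, 7, 8, 9,
  2, 3, 0, 6, 9, 8,
  8, 7, 6, 0, 2, 5,
  9, 8, 9, 2, 0, 3,
  7, 9, 8, 5, 3, 0), nrow = 6, byrow = TRUE)

rep_idx <- c(1, 5)  # hypothetical: rows playing the role of r1 and r2
radius  <- 1.5      # hypothetical neighborhood radius

## Distance of every object to each representative (n x k matrix)
d_rep <- diss[, rep_idx, drop = FALSE]

## ASSUMPTION: each object is assigned to its closest representative
## (which.min keeps the first representative in case of ties)
group <- apply(d_rep, 1, which.min)

## Distance of each object to the representative it is assigned to
d_own <- d_rep[cbind(seq_len(nrow(d_rep)), group)]

k  <- length(rep_idx)
na <- tabulate(group, nbins = k)                   # assigned objects
nb <- tabulate(group[d_own <= radius], nbins = k)  # ASSUMPTION: assigned AND within radius
SD <- vapply(seq_len(k), function(j) sum(d_own[group == j]), numeric(1))
MD <- SD / na

print(data.frame(na = na, nb = nb, SD = SD, MD = MD,
                 row.names = paste0("r", seq_len(k))))

## Row indices of the objects represented by r1
which(group == 1)
```

If this reading is right, the same `which(group == 1)` logic applied to `biofam.om` would give the 627 sequences of r1, but I have not found where (or whether) `seqrep` stores the indices of the representatives or the radius; presumably somewhere in `attributes(biofam.rep)`.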
I have already tried to find the answers to these questions myself.