
Tensor Representations in Signal Processing and Machine Learning (Tutorial at APSIPA-ASC 2020)

Tatsuya Yokota

July 02, 2024

Transcript

  1. 2 Organization of This Tutorial Basics Applications Advances This tutorial

    consists of 3 parts. Each part has about 45 minutes of presentation plus 5-10 minutes for Q&A and a break
  2. 4 Variables having indices Generalization of scalars, vectors, and matrices

    Ex: The number of indices is called the “order”: a scalar is a 0th order tensor, a vector is a 1st order tensor, a matrix is a 2nd order tensor, … Tensors
  3. 5 Multi-way array 0th order tensors : 1st order tensors

    : 2nd order tensors : 3rd order tensors : Tensor data Vector → array of scalars Matrix → array of vectors 3rd order Tensor → array of matrices
  4. 6 Let us imagine a 3rd order tensor as a “cube”

    A 4th order tensor is a block 3rd order tensor whose elements are vectors, a block matrix whose elements are matrices, or a block vector whose elements are 3rd order tensors. A 5th order tensor is a block 4th order tensor whose elements are vectors, a block 3rd order tensor whose elements are matrices, a block matrix whose elements are 3rd order tensors, or a block vector whose elements are 4th order tensors. An Nth order tensor is a block vector whose elements are (N-1)th order tensors. Recursive representations of an Nth order tensor.
  5. 7 Tensor representation has benefits when tensors have “low-rank structures”

    or “smooth structures” Many applications Dimensionality Reduction (Compression), Pattern Recognition, Blind Source Separation, Signal Restoration, Deep Neural Networks, etc. Why tensors
  6. 8 Psychometrics (1960s-) Linguistics (1970s-) Chemometrics (1980s-) Food industry (1990s-)

    Computer Science / Data Engineering (1990s-) Multi-media signal processing (audio, speech, image, video) Bio-medical signal processing (EEG, MEG, functional MRI, diffusion MRI) Bio-informatics (gene, protein) Remote sensing (hyperspectral images) Web/data mining (SNS analysis, recommender systems) Non-linear time series analysis Wireless communications Deep learning etc. Fields of applications
  7. 9 “Tensor Component Analysis” by Yipeng Liu in APSIPA ASC

    2019 “Cause-and-Effect in a Tensor Framework” by L. De Lathauwer, M. Alex O. Vasilescu, and J. Kossaifi in CVPR 2019 “Tensor Factorizations, Data Fusion & Applications” by E. Acar in LVA/ICA 2018 “Blind Audio Source Separation on Tensor Representation” by H. Sawada, N. Ono, H. Kameoka, and D. Kitamura in ICASSP 2018 “Tensor Decomposition for Signal Processing and Machine Learning” by N.D. Sidiropoulos, L. De Lathauwer, X. Fu, and E.E. Papalexakis in ICASSP 2017 “A New Tensor Algebra” by L. Horesh and M. Kilmer in CVPR 2017 Related Tutorials / Workshops
  8. 10 “Tensor Decompositions for Signal Processing Applications: From Two-Way to

    Multiway Component Analysis” received the SPM Best Paper Award at ICASSP 2019.
  9. 11 Intermediate Summary A tensor is a multi-dimensional array, a generalization

    of vectors, matrices, … Many kinds of data can be represented using tensors. Tensor processing has many applications and has attracted attention (workshops/tutorials, awards). So, let's learn about tensors!!
  10. 13 Mathematical notations. Scalar variables : lowercase/uppercase letter; Vector variables :

    lowercase bold letter; Matrix variables : uppercase bold letter; Higher order tensor variables (3rd, 4th, …, Nth) : calligraphic bold letter.
  11. 14 Here, we call the axes of a tensor “modes”

    Modes of tensor 1st mode 2nd mode 3rd mode
  12. 15 The value of the (i, j, k) element of a

    tensor X is denoted by x_ijk (a lowercase letter with subscripts). Elements of a tensor
  13. 16 Fibers of a tensor are denoted by using “

    : ” — e.g., x_:jk is a fiber along the 1st mode, x_i:k along the 2nd mode, and x_ij: along the 3rd mode. Fibers of a tensor
  14. 17 Slices of a tensor are also denoted by using

    “ : ” — e.g., X_i:: is a slice of the 1st mode, X_:j: of the 2nd mode, and X_::k of the 3rd mode. Slices of a tensor
  15. 18 Matrix transposition unique Switch the 1st and 2nd modes

    Transposition / Mode permutation Tensor transposition several choices permute selected modes A case of 3rd order tensor
  16. 19 Vectorization is an operation that extracts all fibers from

    a tensor and constructs a very long vector by concatenating all fibers. Vectorization
  17. 20 Note the order of elements of a vector transformed

    from a tensor Vectorization (cont’d)
  18. 21 A motivation of vectorization It is easier to understand

    some functions that do not depend on whether the argument is a tensor or a vector: the l2 norm of a tensor (Frobenius norm) and the inner product between two tensors (a small sketch follows below).
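A minimal numpy sketch (an assumed example, not taken from the slides) of this point: after vectorization, the Frobenius norm and the tensor inner product are just the ordinary l2 norm and dot product of long vectors. Note that numpy's default reshape is row-major; use order='F' if a column-major vec convention is required.

```python
import numpy as np

X = np.random.rand(3, 4, 5)
Y = np.random.rand(3, 4, 5)

vec_X = X.reshape(-1)              # vectorization (row-major element order)
vec_Y = Y.reshape(-1)

frob_norm = np.linalg.norm(vec_X)  # l2 norm of vec(X) = Frobenius norm of X
inner = float(vec_X @ vec_Y)       # <X, Y> = vec(X)^T vec(Y)

assert np.isclose(frob_norm, np.sqrt((X * X).sum()))
assert np.isclose(inner, (X * Y).sum())
```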
  19. 22 n-th mode unfolding is an operation that extracts all

    fibers along the n-th mode and constructs a horizontally long matrix by concatenation. Unfolding / Matricization
  20. 23 Meaning of “n-th mode unfolding” We should carefully understand

    the meaning of Ex) 3rd mode unfolding of 5th order tensor n-th mode unfolding unfold all modes except the n-th mode keep the 3rd mode unfold the 1st, 2nd, 4th, 5th modes
  21. 24 A motivation of n-th mode unfolding n-th mode unfolding

    is useful to see the linear characteristics of a tensor from each mode. For example, the Tucker rank or multilinear rank of a tensor can be defined using the matrix rank of the n-th mode unfolding (see the sketch below).
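Below is a minimal numpy sketch (assumed helper names, not from the slides) of mode-n unfolding and the multilinear (Tucker) rank obtained from the matrix ranks of the unfoldings. The convention here puts mode n first and flattens the remaining modes in their original order.

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding: move mode n to the front, then flatten the remaining modes."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold: reshape back and move mode n to its original position."""
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape([shape[n]] + rest), 0, n)

X = np.random.rand(4, 5, 6)
multilinear_rank = [np.linalg.matrix_rank(unfold(X, n)) for n in range(X.ndim)]
print(multilinear_rank)   # e.g. [4, 5, 6] for a generic random tensor
assert np.allclose(fold(unfold(X, 1), 1, X.shape), X)
```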
  22. 25 Folding / Tensorization Vectorization and matricization are generalized as

    “unfolding”. It sometimes helps us to understand the linear characteristics of tensors. On the other hand, “folding” can be considered, too.
  23. 26 An example of folding : vector to matrix

    Folding can be considered as a “grouping” of elements. Ex: a triplet is extracted for each column vector; the grouping is not unique. How do we decide which folding is better? ➔ Low-rankness can be one criterion: one folding of the same 9 values gives a rank-1 matrix while another gives a rank-2 matrix (a sketch follows below).
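A hedged toy illustration (the numbers are chosen here and are not copied from the slide) of why low-rankness can guide the choice of folding: the same 9 values grouped two different ways give matrices of different rank.

```python
import numpy as np

v = np.outer([1, 2, 3], [3, 2, 1]).reshape(-1)    # [3, 2, 1, 6, 4, 2, 9, 6, 3]

A = v.reshape(3, 3)                                # folding 1: consecutive triplets as rows
snake = np.concatenate([v[0:3], v[3:6][::-1], v[6:9]])
B = snake.reshape(3, 3)                            # folding 2: a "snake" grouping of the triplets

print(np.linalg.matrix_rank(A))   # 1 -> this folding reveals the hidden rank-1 structure
print(np.linalg.matrix_rank(B))   # 2 -> a less fortunate grouping needs rank 2
```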
  24. 27 An interesting tensor representation of images Image data can

    be naturally represented by a matrix, but folding has some benefits for compression: the (256,256) image and its folding into a 16th order (2,2,…,2) tensor are each approximated with low rank using #param = 2048.
  25. 28 Higher order tensor representations of images More beneficial tensor

    representation? The (256,256) image is folded in two ways, Folding 1 into a 16th order (2,2,…,2) tensor and Folding 2 into an 8th order (4,4,…,4) tensor, each approximated with low rank using #param = 2048.
  26. 29 Vectorization and matricization are special cases of unfolding/folding Folding

    and unfolding are linear operations in general. Folding / unfolding
  27. 31 Addition of two tensors ➔ element-wise addition Scaling A

    tensor multiplied by a scalar. Addition & Scaling
  28. 32 Elementwise multiplication (Hadamard product) and division are denoted by

    a circled asterisk (⊛) and a circled slash (⊘), respectively. Elementwise multiplication / division
  29. 33 Inner product of two vectors is a summation of

    all elements of their elementwise (Hadamard) product. It naturally generalizes to tensors as the inner product of the two vectors obtained by vectorizing the two tensors. Inner product
  30. 34 Outer product of two vectors outputs a matrix Outer

    product of 3 vectors outputs a 3rd order tensor. Outer product
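A small numpy sketch (assumed example) of the outer product: two vectors give a matrix, three vectors give a 3rd order tensor.

```python
import numpy as np

a, b, c = np.random.rand(3), np.random.rand(4), np.random.rand(5)

M = np.outer(a, b)                     # M[i, j]    = a[i] * b[j]         -> 3 x 4 matrix
T = np.einsum('i,j,k->ijk', a, b, c)   # T[i, j, k] = a[i] * b[j] * c[k]  -> 3 x 4 x 5 tensor

assert np.allclose(T, a[:, None, None] * b[None, :, None] * c[None, None, :])
```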
  31. 35 Kronecker product outputs a large matrix from two small

    matrices. Kronecker product
  32. 36 Kronecker product and outer product are essentially the same. Kronecker

    product (cont’d): vectorizing each block of the outer product and reshaping yields the Kronecker product (see the sketch below).
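A short numpy sketch of this equivalence (an assumed example using numpy's row-major conventions): the Kronecker product is the outer product with its indices grouped and reshaped.

```python
import numpy as np

a, b = np.random.rand(3), np.random.rand(4)
assert np.allclose(np.kron(a, b), np.outer(a, b).reshape(-1))   # vectors: flatten the outer product

A, B = np.random.rand(2, 3), np.random.rand(4, 5)
outer_AB = np.einsum('ij,kl->ikjl', A, B)                        # 4th order outer product, modes regrouped
assert np.allclose(np.kron(A, B), outer_AB.reshape(2 * 4, 3 * 5))
```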
  33. 37 Khatri-Rao product is defined between two matrices that have the same

    number of columns. Khatri-Rao product
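A minimal sketch of the Khatri-Rao product as the column-wise Kronecker product (the helper name is mine; recent SciPy versions also provide scipy.linalg.khatri_rao).

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of two matrices with the same number of columns."""
    assert A.shape[1] == B.shape[1]
    return np.column_stack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])])

A, B = np.random.rand(3, 4), np.random.rand(5, 4)
print(khatri_rao(A, B).shape)   # (15, 4)
```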
  34. 38 Matrix product is defined between two matrices where the number of columns of the

    first equals the number of rows of the second. Matrix product is a combination of inner products and outer products: an inner product w.r.t. r and an outer product w.r.t. i and j. Matrix product
  35. 39 Matrix product is defined between two matrices where the number of columns of the

    first equals the number of rows of the second. Matrix product is a combination of inner products and outer products, and the Khatri-Rao product is a sub-part of the matrix product. Matrix product (cont’d)
  36. 40 Tensor-Matrix product Tensor-Matrix product is defined between a tensor

    and a matrix for which the number of columns of the matrix equals the size of one mode of the tensor: an inner product w.r.t. r and an outer product w.r.t. i and (j, k), implemented via fold / unfold.
  37. 41 Tensor-Matrix product (cont’d) Tensor-Matrix product is defined between a

    tensor and a matrix whose number of columns equals the size of one mode of the tensor. It is also called the n-th mode product, and it is easier to understand through the n-th mode unfolding (see the sketch below).
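A sketch of the n-th mode (tensor-matrix) product through unfolding and folding, using the unfold/fold helpers from the earlier sketch (assumed names): Y = X x_n U is equivalent to Y_(n) = U X_(n).

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape([shape[n]] + rest), 0, n)

def mode_n_product(X, U, n):
    """Y = X x_n U, i.e. Y_(n) = U @ X_(n)."""
    new_shape = list(X.shape)
    new_shape[n] = U.shape[0]
    return fold(U @ unfold(X, n), n, new_shape)

X = np.random.rand(4, 5, 6)
U = np.random.rand(7, 5)                            # multiplies the 2nd mode (size 5)
Y = mode_n_product(X, U, 1)
print(Y.shape)                                      # (4, 7, 6)
assert np.allclose(Y, np.einsum('ijk,rj->irk', X, U))
```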
  38. 42 Tensor-Matrix product (cont’d) all modes An Nth order tensor

    multiplied by N matrices from all modes inner prod
  39. 44 We denote a single tensor as following diagram (it

    is very helpful). Using a node and edges, the number of edges equals the order of the tensor, and the inner product of tensors can be denoted by linking all edges. Tensor Diagram (vector, matrix, 3rd order tensor, 4th order tensor, …)
  40. 45 Matrix multiplication General tensor multiplication The tensor multiplication is

    essentially connecting the edges (modes) of tensors; there is some freedom in how to connect them. Tensor Diagram (cont’d): e.g., the matrix product of an (I×R) matrix and an (R×J) matrix gives an (I×J) matrix by connecting the shared edge R.
  41. 46 Summary of Part I Important concepts in tensor processing

    are explained: notations, fiber / slice, unfolding (vectorization, matricization, etc.), inner product, outer product, special products (Kronecker, Khatri-Rao, etc.), general tensor multiplication, and the tensor diagram. I think it is nice to geometrically imagine how tensors are unfolded, folded, and transformed, so this time I tried to explain them using figures. Q&A
  42. 48 Outline of Part II Tensor decompositions CP decomposition Tucker

    decomposition TT decomposition Others Applications Dimensionality Reduction Pattern Recognition Blind Source Separation Signal Restoration Deep Neural Networks
  43. 49 Tensor decomposition is a higher order generalization of matrix

    decomposition. A general definition may be “a tensor represented by multiple tensors”; a specific decomposition is given by a parametric model f and its parameter tensors (e.g., a core tensor and factor matrices). Tensor decomposition / Matrix decomposition
  44. 50 Canonical Polyadic (CP) decomposition. The Canonical Polyadic (CP) decomposition/model

    is a natural extension of low-rank matrix decomposition in terms of the Khatri-Rao product (unfold a tensor, extend the low-rank matrix decomposition, and fold back). CP decomposition
  45. 51 The optimization problem is as follows. The Alternating Least Squares (ALS)

    algorithm is typical and widely used: alternately update each factor matrix with a least-squares solution while freezing the other factors. An algorithm for CP decomposition. Initialize:
  46. 52 Least-squares update rule in CP: unfolding the

    cost function, taking its derivative, and setting it to zero gives the update for each factor (note: a CP-ALS sketch follows below).
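The following is a standard CP-ALS sketch in numpy, not the exact derivation on the slide: each factor is updated with the least-squares solution X_(n) K (K^T K)^+, where K is the Khatri-Rao product of the other factors and K^T K is the Hadamard product of their Gram matrices. Helper names and the synthetic test are assumptions.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def khatri_rao(mats):
    K = mats[0]
    for M in mats[1:]:
        K = (K[:, None, :] * M[None, :, :]).reshape(-1, K.shape[1])
    return K

def cp_als(X, R, n_iter=200):
    A = [np.random.rand(I, R) for I in X.shape]           # random initialization
    for _ in range(n_iter):
        for n in range(X.ndim):
            others = [A[m] for m in range(X.ndim) if m != n]
            K = khatri_rao(others)                        # ordering matches unfold()
            V = np.ones((R, R))
            for M in others:
                V *= M.T @ M                              # Hadamard product of Gram matrices
            A[n] = unfold(X, n) @ K @ np.linalg.pinv(V)   # least-squares update
    return A

# Sanity check on a synthetic rank-2 tensor (ALS may occasionally need a restart)
A0 = [np.random.rand(I, 2) for I in (4, 5, 6)]
X = np.einsum('ir,jr,kr->ijk', *A0)
A = cp_als(X, R=2)
Xhat = np.einsum('ir,jr,kr->ijk', *A)
print(np.linalg.norm(X - Xhat) / np.linalg.norm(X))       # should be small
```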
  47. 53 Tucker decomposition Tucker decomposition/model is a natural extension of

    low-rank matrix decomposition in terms of the Kronecker product (unfold a tensor, extend the low-rank matrix decomposition, and fold back). Tucker decomposition
  48. 54 The optimization problem is as follows. First, we unfold the cost

    function; taking the partial derivative with respect to the core g, the parameter g can be eliminated. An algorithm for Tucker decomposition (note).
  49. 56 The optimization problem is as follows. The Alternating Least Squares (ALS)

    algorithm is typical and widely used: alternately update each factor matrix with a least-squares solution while freezing the other factors. An algorithm for Tucker decomposition (cont’d). Initialize:
  50. 57 Least-squares update rule in Tucker: unfolding the

    cost function and, because of the orthogonality constraint, considering the Lagrange function with a Lagrange multiplier (note: a sketch of the resulting ALS/HOOI procedure follows below).
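Below is a compact HOOI-style ALS sketch for the Tucker decomposition (HOSVD initialization, then orthogonal factors from truncated SVDs of the projected unfoldings); the function names and the random test are assumptions, not the slide's exact algorithm.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def tucker_hooi(X, ranks, n_iter=20):
    N = X.ndim
    # HOSVD initialization: leading left singular vectors of each unfolding
    U = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :ranks[n]] for n in range(N)]
    for _ in range(n_iter):
        for n in range(N):
            Y = X
            for m in range(N):
                if m != n:
                    Y = mode_n_product(Y, U[m].T, m)        # project all the other modes
            U[n] = np.linalg.svd(unfold(Y, n), full_matrices=False)[0][:, :ranks[n]]
    G = X
    for n in range(N):
        G = mode_n_product(G, U[n].T, n)                    # core tensor
    return G, U

X = np.random.rand(10, 12, 14)
G, U = tucker_hooi(X, (3, 4, 5))
Xhat = G
for n in range(3):
    Xhat = mode_n_product(Xhat, U[n], n)
print(np.linalg.norm(X - Xhat) / np.linalg.norm(X))         # relative approximation error
```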
  51. 58 Tensor-Train (TT) decomposition Tensor-Train (TT) decomposition [Oseledets, 2011] is

    a relatively recent model. It has the benefit of an extremely light parameterization for huge high-order tensors, and each element of the tensor can be calculated by multiplying slices/fibers of the core tensors. TT decomposition
  52. 60 ALS update for TT-decomposition: ⓪ TT-SVD initialization (a diagonal matrix between left-

    orthogonal and right-orthogonal cores), then steps ① to ⑩ apply successive SVDs sweeping along the train (a TT-SVD sketch follows below).
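A compact TT-SVD sketch (the ⓪ initialization step: sequential truncated SVDs of reshaped remainders), following the general recipe of [Oseledets, 2011]; max_rank is an assumed user parameter and the reconstruction helper is mine.

```python
import numpy as np

def tt_svd(X, max_rank):
    shape = X.shape
    cores, r_prev, C = [], 1, X
    for n in range(len(shape) - 1):
        C = C.reshape(r_prev * shape[n], -1)
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, len(s))                         # truncate (or use an error tolerance)
        cores.append(U[:, :r].reshape(r_prev, shape[n], r))
        C = s[:r, None] * Vt[:r]                          # carry the remainder to the next step
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores

def tt_to_full(cores):
    X = cores[0]
    for G in cores[1:]:
        X = np.tensordot(X, G, axes=(-1, 0))              # contract the shared TT rank
    return X.reshape([G.shape[1] for G in cores])

X = np.random.rand(4, 5, 6, 7)
cores = tt_svd(X, max_rank=20)                            # rank large enough -> near-exact here
print([G.shape for G in cores])
print(np.linalg.norm(X - tt_to_full(cores)) / np.linalg.norm(X))
```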
  53. 61 Tensor-Ring (TR) decomposition and others Tensor-Ring (TR) decomposition is

    a generalization of the TT decomposition, and many other networks can be considered. Nowadays, tensor decompositions (including CP, Tucker, etc.) are generally called tensor networks: CP, Tucker, Hierarchical Tucker (HT), TT, Projected Entangled-Pair States (PEPS).
  54. 64 In general, the number of elements/parameters to describe the

    tensor data grows exponentially with the tensor order N. Tensor decomposition works to reduce the number of parameters. For example, CP decomposition with I = 10, R = 10: the number of parameters (original tensor vs CP with R=10) is 1,000 vs 300 for N=3; 10,000 vs 400 for N=4; 100,000 vs 500 for N=5; and 1,000,000 vs 600 for N=6. Dimensionality reduction
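A tiny check of those parameter counts (I = R = 10): a full Nth order tensor needs I**N values, while a rank-R CP model needs N*I*R.

```python
I, R = 10, 10
for N in range(3, 7):
    print(N, I**N, N * I * R)   # N=3: 1,000 vs 300 ... N=6: 1,000,000 vs 600
```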
  55. 65 A simple experiment: image reconstruction with smaller #parameters Dimensionality

    reduction (cont’d): a (128,128,128) tensor is compressed by Unfold & SVD (a 128 × 128*128 matrix truncated to rank R), by CP decomposition (rank R), and by Tucker decomposition (rank (R,R,R)).
  56. 66 Reconstructed MR images. Dimensionality reduction (cont’d): figure comparing CP, Tucker, and SVD reconstructions

    at increasing ranks (scalar ranks from R=1 up to R=1600, and Tucker ranks from R=(1,1,1) up to R=(120,120,120)).
  57. 67 SNR vs #parameters. Dimensionality reduction (cont’d): the plot marks SVD with R=10, Tucker with

    R=[50, 50, 50], and CP with R=400.
  58. 68 Advances in dimensionality reduction Sparse coding of tensor [Caiafa+,

    2013 (in NECO)] [Caiafa+, 2013 (in WIRES)] Tensor networks (TNs) for dimensionality reduction [Zhao+, 2016] [Cichocki+, 2017] Genetic algorithm for TN model selection [Li+, 2020 (in ICML)]
  59. 69 Pattern recognition Linear Discriminant Analysis (LDA) is a typical

    method to obtain the best linear projection that separates patterns from different categories (maximize the between-class scatter, minimize the within-class scatter).
  60. 70 Pattern recognition (cont’d) LDA can be naturally extended by

    using a multi-linear projection instead of a linear projection (Multiway LDA), where the same maximize/minimize criterion is applied to data stacked as an (N+1)-th order tensor.
  61. 71 Pattern recognition (cont’d) LDA has a closed form solution

    using eigenvalue decomposition (EVD). Multi-way LDA is obtained by alternating optimization: repeat { for n = 1, …, N: update the n-th projection by EVD } until convergence.
  62. 72 Human face identification [Ye+, 2005 (in NeurIPS)] Pattern recognition

    (cont’d): ORL face images are 112×92, so vectorized LDA requires the EVD of a roughly 10^4 × 10^4 matrix, whereas (multi-way) 2DLDA only needs EVDs of 112×112 and 92×92 matrices, which are updated alternately.
  63. 73 Human face identification [Ye+, 2005 (in NeurIPS)] Two stage

    feature extraction using 2DLDA + LDA is proposed: 2DLDA reduces each 112×92 image to 10×10, which is vectorized (length 100), reduced to 40 dimensions by LDA, and classified by nearest neighbor. The running time of 2DLDA was much shorter than that of LDA, and the accuracy of 2DLDA+LDA was slightly better than LDA. Pattern recognition (cont’d)
  64. 74 Gait recognition [Tao+, 2007 (in TPAMI)] A gait image is

    represented as a 4th order tensor by convolving with 5×8 Gabor functions, and the Gabor tensor is discriminated by (multi-way) Generalized LDA (GLDA) followed by LDA and a simple classifier. Pattern recognition (cont’d)
  65. 75 Blind source separation (BSS) is a method to estimate

    original sources from multiple mixture signals Blind source separation
  66. 76 BSS on Tensor Representation [Sawada+, Tutorial in ICASSP2018] Independent

    low-rank matrix analysis (ILRMA) [Kitamura+, 2016]. Blind source separation (cont’d): the multichannel time signal is converted by STFT into a (frequency × time × channel) tensor; ① the reconstructed tensor is parameterized using demixing matrices for each frequency, and ② a nonnegative CP decomposition is applied to the power tensor.
  67. 77 Tensorizing BSS problem using “Segmentation” [Bousse+, 2017 (in IEEE-TSP)]

    Segmentation is one way of tensorization (folding); this strategy works well when the sources have low-rank structure in their matrix/tensor representation. Blind source separation (cont’d)
  68. 78 It is a technique to recover corrupted

    signals. Signal restoration: noise reduction, deconvolution, inpainting, super-resolution.
  69. 79 Signal restoration (cont’d) Tensor decomposition can be used for

    recovering signals that have low-rank structure: the tensor representation is kept consistent with the observed data while being modeled by a CP, Tucker, TT, or other decomposition.
  70. 80 Signal restoration (cont’d) Low-rank tensor completion Corrupted image is

    directly represented as a 3rd order tensor (height × width × RGB) and recovered by a smooth CP decomposition [Yokota+, 2016 (in IEEE-TSP)], whose rank-1 terms are built from smooth components.
  71. 81 Signal restoration (cont’d) Non-local filtering for image denoising BM3D

    [Dabov+, 2007 (in IEEE-TIP)], GSR [Zhang+, 2014 (in IEEE-TIP)], Truncated HOSVD [Yokota+, 2017 (in IEEE-TSP)] http://www.cs.tut.fi/~foi/GCF-BM3D/ ① Extract similar patches and represent them as a tensor; ② the tensor is approximated by a low-rank tensor to reduce the noise component.
  72. 82 Signal restoration (cont’d) There are many studies of low-rank

    based tensor data recovery: Low-n-rank tensor recovery [Gandy+, 2011 (in Inverse Problems)], Low rank tensor completion [Liu+, 2012 (in IEEE-TPAMI)], Bayesian CP decomposition [Zhao+, 2015 (in IEEE-TPAMI)], t-SVD [Zhang+, 2017 (in IEEE-TSP)], Low-rank and smooth tensor recovery [Yokota+, 2017 (in CVPR)] [Yokota+, 2019 (in IEEE-Access)], Tensor ring completion [Wang+, 2017 (in ICCV)]
  73. 83 Currently, DNNs are playing central roles in various signal

    and information processing studies. In particular, convolutional neural networks (CNNs) are widely employed for their efficiency. Deep neural networks (DNNs): recognition, segmentation, enhancement, restoration, and synthesis for audio, speech, language, image, and bio-signal data; a CNN stacks convolution and activation layers from input to output.
  74. 84 A 2D convolutional layer in CNNs is a mapping

    from a 3rd order tensor space (height × width × channels) to another 3rd order tensor space, and its parameter is a 4th order tensor. DNNs (cont’d)
  75. 85 Tensor decomposition in DNNs Convolutional layer [Lebedev+, 2015 (in

    ICLR)], fully connected layer [Novikov+, 2015 (in NeurIPS)]. DNNs (cont’d): the weights are folded/tensorized and then approximated by a CP or TT decomposition.
  76. 86 Tensor decomposition in DNNs Convolutional layer [Lebedev+, 2015 (in

    ICLR)]: the full convolution (a 4th order kernel tensor, e.g., of size (9,9,48,128)) is approximated by a CP model and becomes sequential convolutions with fibers. Experiments: evaluated on image recognition (large speedup with only a small accuracy drop). DNNs (cont’d)
  77. 87 Tensor decomposition in DNNs Fully connected (FC) layer [Novikov+,

    2015 (in NeurIPS)]. Experiments: 1000-class ImageNet using VGG16 and VGG19, which have three FC layers: FC1 (25088, 4096), FC2 (4096, 4096), FC3 (4096, 1000). For TT, the dimensions are factorized as 25088 = 2*7*8*8*7*4 and 4096 = 4*4*4*4*4*4 with TT-rank 1, 2, or 4; the baselines are no compression and a matrix decomposition with rank 1, 5, or 50. Efficient compression with little degradation!! DNNs (cont’d)
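A rough, hedged parameter-count comparison for the TT-matrix format of FC1, using the factorizations quoted above and an assumed uniform TT-rank of 4 (the exact ranks, core ordering, and counts in [Novikov+, 2015] may differ).

```python
import numpy as np

m = [2, 7, 8, 8, 7, 4]        # factorization of the input size 25088
n = [4, 4, 4, 4, 4, 4]        # factorization of the output size 4096
r = [1, 4, 4, 4, 4, 4, 1]     # assumed TT-ranks (boundary ranks are 1)

tt_params = sum(r[k] * m[k] * n[k] * r[k + 1] for k in range(len(m)))
full_params = np.prod(m) * np.prod(n)
print(tt_params, full_params, full_params // tt_params)   # ~2,000 vs ~10^8 parameters
```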
  78. 88 DNNs (cont’d) Other references in machine learning Compression of

    CNN with Tucker [Kim+, 2016 (in ICLR)], Supervised Learning with Tensor Networks [Stoudenmire+, 2016 (in NeurIPS)], Exponential Machines [Novikov+, 2017 (in ICLR workshop)], Compact RNNs with Block-term Tensor Decomposition [Ye+, 2018 (in CVPR)], Compressing RNNs with Tensor Ring [Pan+, 2019 (in AAAI)], Tensor network decomposition of CNN kernels [Hayashi+, 2019 (in NeurIPS)], For understanding DNNs [Li+, 2019 (in ICASSP)] [Li+, 2020 (in AISTATS)]
  79. 89 Summary of Part II Tensor decomposition has various applications

    A common concept is to represent signals as high-order tensors that have low-rank structure (direct representation, multi-scale Gabor representation, time-frequency representation, similar-patch grouping, high-order tensorization) and then apply a tensor decomposition. Other perspectives in tensor studies: decomposition models, optimization algorithms, model selection, rank estimation, etc. (signals → tensors → decomposition; Dimensionality Reduction, Pattern Recognition, Blind Source Separation, Signal Restoration, Deep Neural Networks)
  80. 90 Some Discussions: the main flow was tensorization + decomposition. However,

    in a stricter sense we need to consider how to select the decomposition model, and it is also important how to obtain the decomposition efficiently through optimization. Model selection: usually, the tensor decomposition model is selected manually by researchers/users; however, there are many candidate decompositions nowadays, and determining the model from given data is a very difficult task that usually requires professional knowledge of both the application and the tensor fields. Evolutionary Topology Search for Tensor Networks [Li+, 2020 (in ICML)]. Optimization: it is quite important to study how to efficiently solve the complicated optimization problems behind tensor (network) decompositions. Many methods exist: gradient method, Newton method, alternating least squares, hierarchical alternating least squares, stochastic gradient, …; Randomized SVD [Minster+, 2020] [Ahmadi-Asl+, 2020]; TensorSketch [Malik+, 2018 (in NeurIPS)]; GPU acceleration [Choi+, 2018]
  81. 92 Introduction: so far, I have shown tensor operations, decompositions, and

    their applications. A common concept was to “tensorize” and “decompose”. A question here is how to tensorize the signals; generally, some specific knowledge about the signals is necessary to design the tensors. Here, I introduce multi-way Hankelization / delay embedding for tensors as an advanced topic in tensor processing. (signals → tensors → decomposition)
  82. 93 Outline Hankelization / delay-embedding for time-series signals Multi-way Hankelization

    / delay-embedding for tensors Applications Open problems and perspectives
  83. 94 Limitation of direct representation Revisiting signal restoration using tensor

    decomposition: when whole slices are missing from a tensor, low-rank tensor decomposition does not work. Randomly missing pixels can be recovered, but missing slices cannot, because slice missing deflates the matrix/tensor rank and the deflated component cannot be recovered. (Figure: slices & random missing vs recovery by low-rank decomposition.)
  84. 95 Approach using the Hankel representation: here, we introduce the idea of

    applying low-rank tensor decomposition not in the direct tensor space but in an embedded (Hankel) tensor space: delay embedding / Hankelization turns the incomplete tensor into a high-order incomplete tensor, low-rank tensor completion is applied there, and the inverse operation gives the completed output.
  85. 96 Here, I introduce a technique from dynamical system analysis.

    For a time series x(t), t = 1, …, T, we consider the state space s(t) = (x(t), x(t+1), …, x(t+τ-1)) for t = 1, …, T-τ+1. This is called “delay embedding (DE)”. What is delay embedding (DE)?
  86. 97 DE is a transformation from a vector to a

    Hankel matrix (i.e., Hankelization). Hankel matrix: a matrix whose anti-diagonal elements are the same. Delay embedding (DE) is Hankelization.
  87. 99 Inverse of delay-embedding (DE) Can we define the inverse

    of DE? There is no exact inverse of DE since it is essentially a duplication, so we only consider the pseudo-inverse of DE: the forward map duplicates entries (via unfolding/folding), and the pseudo-inverse averages the duplicated entries.
  88. 100 SVD of Hankel matrix Seeing singular values of Hankel

    matrix for various values of τ: the Hankel matrix has a low-rank structure. (Figure: 3D plot of DE(x, τ=3) and its principal patterns; singular values in decreasing order.)
  89. 101 SVD of Hankel matrix (cont’d) Seeing left- and right-

    singular vectors of Hankel matrix Left singular vectors are very similar to discrete cosine transform (DCT) basis
  90. 102 Time-series recovery using truncated SVD: a missing & noisy signal is delay-embedded into a missing & noisy

    Hankel matrix, a low-rank approximation is computed, and the inverse D.E. gives the recovered signal (completion); a sketch of this pipeline follows below.
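A self-contained sketch of this pipeline on a 1-D toy signal (all names and numbers are assumptions): delay embedding (Hankelization), truncated SVD, and the pseudo-inverse of the embedding, which averages the duplicated entries. In this toy version the signal is only noisy; for missing data the low-rank step would be replaced by a completion method.

```python
import numpy as np

def delay_embed(x, tau):
    """Hankel matrix whose columns are the length-tau sliding windows of x."""
    T = len(x)
    return np.stack([x[t:t + tau] for t in range(T - tau + 1)], axis=1)   # (tau, T - tau + 1)

def inverse_delay_embed(H, T):
    """Pseudo-inverse of delay_embed: average all duplicated copies of each x[t]."""
    tau = H.shape[0]
    acc, cnt = np.zeros(T), np.zeros(T)
    for t in range(H.shape[1]):
        acc[t:t + tau] += H[:, t]
        cnt[t:t + tau] += 1
    return acc / cnt

# Toy example: noisy sinusoid -> Hankelize -> rank-2 truncation -> de-Hankelize
t = np.linspace(0, 8 * np.pi, 400)
x = np.sin(t) + 0.3 * np.random.randn(len(t))
H = delay_embed(x, tau=50)
U, s, Vt = np.linalg.svd(H, full_matrices=False)
H_lr = (U[:, :2] * s[:2]) @ Vt[:2]                # a noiseless sinusoid is (nearly) rank 2
x_rec = inverse_delay_embed(H_lr, len(x))
print(np.linalg.norm(np.sin(t) - x_rec) / np.linalg.norm(np.sin(t)))   # denoising error
```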
  91. 104 We have three choices of delay-embedding for matrix data

    (a) delay embedding in only the column direction, (b) delay embedding in only the row direction, and (c) delay embedding in both directions; (a) and (b) give 3rd order tensors, while (c) gives a 4th order tensor. Delay embedding for a matrix.
  92. 105 Considering “both” case for matrix delay-embedding Each column is

    Hankelized by DE, which yields a time series of Hankel matrices; a further DE yields a “block Hankel matrix”. A block Hankel matrix is a block matrix that has Hankel structure and whose blocks are themselves Hankel matrices. Block Hankel structure.
  93. 106 DE = linear duplication + folding Multiway delay-embedding =

    multilinear duplication + folding. We call it the “multiway delay-embedding transform (MDT)”. MDT is a generalization of DE for tensors: a vector of length I becomes a τ × (I-τ+1) Hankel matrix, and a 3rd order tensor (I1, I2, I3) becomes a 6th order tensor (a sketch follows below).
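A minimal MDT sketch for a 3rd order tensor using sliding windows along every mode (an assumption of mine built on numpy >= 1.20's sliding_window_view; the result matches the block-Hankel construction up to a permutation of the six modes). The inverse MDT would average the duplicated entries, as in the 1-D case.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mdt(X, taus):
    """Multiway delay embedding: (I1, I2, I3) -> (I1-t1+1, I2-t2+1, I3-t3+1, t1, t2, t3)."""
    return sliding_window_view(X, taus).copy()

X = np.random.rand(32, 32, 3)
H = mdt(X, (8, 8, 1))      # no embedding along the RGB mode (tau = 1)
print(H.shape)             # (25, 25, 3, 8, 8, 1)
```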
  94. 107 DE vs inverse DE, and MDT vs inverse MDT: the inverse

    MDT maps the 6th order tensor back to the 3rd order tensor (I1, I2, I3) by applying the pseudo-inverses of the duplication operators in each mode.
  95. 108 MDT of Image and its high order SVD Seeing

    the components of Hankel tensor [Yokota+, 2018 (in APSIPA)] Very similar to DCT Bases
  96. 109 Application to Tensor Completion Incomplete tensor Incomplete block Hankel

    tensor → complete block Hankel tensor → complete tensor. Low-rank Tucker decomposition in the embedded space [Yokota+, 2018 (in CVPR)]: ① MDT of the given incomplete tensor of original size (Nth order ➔ 2Nth order); ② Tucker decomposition of the incomplete block Hankel tensor; ③ inverse MDT of the complete block Hankel tensor (2Nth order ➔ Nth order).
  97. 110 Tucker decomposition of complete tensor was explained before Here,

    I explain the method for Tucker decomposition of an incomplete tensor. The optimization problem uses the incomplete tensor and a binary tensor of the same size (0 represents a missing entry, 1 an observed entry), and it can be solved by an auxiliary-function approach with an ALS algorithm: alternately impute the missing entries and update the factors until convergence (each update step is the same as Tucker for a complete tensor; a sketch follows below).
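A hedged sketch of that idea (impute the missing entries with the current reconstruction, run a complete-data Tucker step, and repeat); for simplicity it uses a plain HOSVD step rather than the full ALS inner loop, and all names and the toy data are assumptions.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def tucker_step(X, ranks):
    """One HOSVD pass: leading singular vectors of each unfolding, then the core."""
    U = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :ranks[n]] for n in range(X.ndim)]
    G = X
    for n in range(X.ndim):
        G = mode_n_product(G, U[n].T, n)
    return G, U

def tucker_completion(X, mask, ranks, n_outer=100):
    """mask: binary tensor of the same size, 1 = observed, 0 = missing."""
    Z = X * mask                                       # initialize missing entries with zeros
    for _ in range(n_outer):
        G, U = tucker_step(Z, ranks)
        Xhat = G
        for n in range(Xhat.ndim):
            Xhat = mode_n_product(Xhat, U[n], n)
        Z = mask * X + (1 - mask) * Xhat               # keep observed data, impute the rest
    return Xhat

# Toy example: a low multilinear-rank tensor with roughly 40% of its entries missing
G0 = np.random.rand(3, 3, 3)
U0 = [np.random.rand(I, 3) for I in (15, 16, 17)]
X = G0
for n in range(3):
    X = mode_n_product(X, U0[n], n)
mask = (np.random.rand(*X.shape) > 0.4).astype(float)
Xhat = tucker_completion(X, mask, (3, 3, 3))
print(np.linalg.norm((1 - mask) * (X - Xhat)) / np.linalg.norm((1 - mask) * X))
```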
  98. 111 Experimental results: a (256,256,3) image is mapped by MDT to a (32,225,32,225,3) tensor, Tucker-compressed to (R1,R2,R3,R4,3), and mapped back to (256,256,3), with (32,32) → (R1,R3)

    and (225,225) → (R2,R4); candidate ranks such as 4, 8, 16, 32, 64 were tested. Slice missing was recovered with an appropriate rank of the decomposition ➔ but the best rank is unknown.
  99. 112 Adaptive rank increment algorithm To be free from the

    rank estimation problem in tensor completion, we propose to use rank increment algorithm
  100. 117 Other works Here, I showed an original work of

    MDT, the Tucker decomposition with MDT [Yokota+, 2018 (in CVPR)]. We are extending it to others: TT decomposition with MDT [Sedighin+, 2020 (in IEEE-SPL)], time series analysis with MDT [Shi+, 2020 (in AAAI)], and manifold modeling with MDT [Yokota+, 2020 (in IEEE-TNNLS)].
  101. 118 Relation between Hankelization and Convolution Actually, Hankelization is a

    part of convolution. Hankelization can be applied to incomplete data ☺, whereas convolution (STFT, wavelet) cannot be applied to incomplete data.
  102. 119 Open problems: how to decide the value of τ (some

    criteria would be helpful), and the computational cost: the number of elements of the Hankel tensor increases dramatically with τ. Ex: for MDT of a (128,128,128) tensor, τ = 8 gives a (8, 121, 8, 121, 8, 121) Hankel tensor (approx. 7 GB), τ = 16 gives (16, 113, 16, 113, 16, 113) (approx. 47 GB), τ = 32 gives (32, 97, 32, 97, 32, 97) (approx. 239 GB), and τ = 64 gives (64, 65, 64, 65, 64, 65) (approx. 575 GB).
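A quick way to reproduce those memory figures (double precision, 8 bytes per element):

```python
for tau in (8, 16, 32, 64):
    shape = (tau, 128 - tau + 1) * 3                 # MDT of a (128, 128, 128) tensor
    n_elem = 1
    for s in shape:
        n_elem *= s
    print(tau, shape, f"{n_elem * 8 / 1e9:.0f} GB")  # approx. 7, 47, 239, 576 GB
```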
  103. 120 Other perspectives More applications would be interesting Network data

    analysis, Medical imaging, Hyperspectral images, etc Hankel representation in analysis of CNNs Convolutional sparse coding [Papyan+, 2017] Neural tangent denoiser [Tachella+, 2020] Dual convex formulation of CNNs [Anonymous, 2020]
  104. 121 Closing tutorial I’d like to show my appreciation to

    organizers of APSIPA ASC 2020 and all participants!! This tutorial is an introduction to tensor methods in signal processing and machine learning; I tried to avoid cutting-edge, difficult content as much as possible and to make the tutorial friendly to students and beginners. The community of tensor studies is growing, and everyone with any interest in theory, algorithms, software development, or applications is welcome to these fields!! Basics / Applications / Advances
  105. 122 Acknowledgements Andrzej Cichocki (Skoltech) Qibin Zhao (RIKEN AIP) Chao

    Li (RIKEN AIP) Cesar F. Caiafa (CONICET) Hidekata Hontani (Nitech) Burak Erem (TrueMotion) Simon K. Warfield (Harvard)