
Tensor Representations in Signal Processing and Machine Learning (Tutorial at APSIPA-ASC 2020)

Tatsuya Yokota

July 02, 2024

Transcript

  1. 2 Organization of This Tutorial Basics Applications Advances This tutorial

    consists of 3 parts. Each part has about 45 minutes of presentation plus 5-10 minutes for Q&A and a break
  2. 4 Variables having indices Generalization of scalars, vectors, and matrices

    Ex: The number of indices is called the “order”: a scalar is a 0th order tensor, a vector is a 1st order tensor, a matrix is a 2nd order tensor, … Tensors
  3. 5 Multi-way array 0th order tensors : 1st order tensors

    : 2nd order tensors : 3rd order tensors : Tensor data Vector → array of scalars Matrix → array of vectors 3rd order Tensor → array of matrices
  4. 6 Let us imagine a 3rd order tensor as a “cube”

    A 4th order tensor is a block 3rd order tensor whose elements are vectors, a block matrix whose elements are matrices, or a block vector whose elements are 3rd order tensors. A 5th order tensor is a block 4th order tensor whose elements are vectors, a block 3rd order tensor whose elements are matrices, a block matrix whose elements are 3rd order tensors, or a block vector whose elements are 4th order tensors. An Nth order tensor is a block vector whose elements are (N-1)th order tensors. Recursive representations of an Nth order tensor.
  5. 7 Tensor representation has benefits when tensors have “low-rank structures”

    or “smooth structures” Many applications Dimensionality Reduction (Compression), Pattern Recognition, Blind Source Separation, Signal Restoration, Deep Neural Networks, etc. Why tensors
  6. 8 Psychometrics (1960s-) Linguistics (1970s-) Chemometrics (1980s-) Food industry (1990s-)

    Computer Science / Data Engineering (1990s-) Multi-media signal processing (audio, speech, image, video) Bio-medical signal processing (EEG, MEG, functional MRI, diffusion MRI) Bio-informatics (gene, protein) Remote sensing (hyperspectral images) Web/data mining (SNS analysis, recommender systems) Non-linear time series analysis Wireless communications Deep learning etc. Fields of applications
  7. 9 “Tensor Component Analysis” by Yipeng Liu in APSIPA ASC

    2019 “Cause-and-Effect in a Tensor Framework” by L. De Lathauwer, M. Alex O. Vasilescu, and J. Kossaifi in CVPR 2019 “Tensor Factorizations, Data Fusion & Applications” by E. Acar in LVA/ICA 2018 “Blind Audio Source Separation on Tensor Representation” by H. Sawada, N. Ono, H. Kameoka, and D. Kitamura in ICASSP 2018 “Tensor Decomposition for Signal Processing and Machine Learning” by N.D. Sidiropoulos, L. De Lathauwer, X. Fu, and E.E. Papalexakis in ICASSP 2017 “A New Tensor Algebra” by L. Horesh and M. Kilmer in CVPR 2017 Related Tutorials / Workshops
  8. 10 “Tensor Decompositions for Signal Processing Applications: From Two-Way to

    Multiway Component Analysis” received the SPM Best Paper Award at ICASSP 2019.
  9. 11 Intermediate Summary A tensor is a multi-dimensional array, a generalization

    of vectors, matrices, … Many kinds of data can be represented using tensors. Tensor processing has many applications and has attracted attention (workshops/tutorials, awards). So, let's learn about tensors!!
  10. 13 Mathematical notations. Scalar variables : lowercase/uppercase letter; Vector variables :

    lowercase bold letter; Matrix variables : uppercase bold letter; Higher order tensor variables (3rd, 4th, …, Nth) : calligraphic bold letter.
  11. 14 Here, we call the axes of a tensor “modes”

    Modes of tensor 1st mode 2nd mode 3rd mode
  12. 15 The value of the (i, j, k) element of a

    tensor X is denoted by x_ijk (a lowercase letter with subscripts). Elements of a tensor
  13. 16 Fibers of a tensor are denoted by using “

    : ” — e.g., x_:jk is a fiber along the 1st mode, x_i:k along the 2nd mode, and x_ij: along the 3rd mode. Fibers of a tensor
  14. 17 Slices of a tensor are also denoted by using

    “ : ” — e.g., X_i:: is a slice of the 1st mode, X_:j: of the 2nd mode, and X_::k of the 3rd mode. Slices of a tensor
  15. 18 Matrix transposition unique Switch the 1st and 2nd modes

    Transposition / Mode permutation Tensor transposition several choices permute selected modes A case of 3rd order tensor
  16. 19 Vectorization is an operation that extracts all fibers from

    a tensor and constructs a very long vector by concatenating all fibers. Vectorization
  17. 20 Note the order of elements of a vector transformed

    from a tensor Vectorization (cont’d)
  18. 21 A motivation of vectorization It is easier to understand

    some functions that do not depend on whether the argument is a tensor or a vector: the l2 norm of a tensor (Frobenius norm) and the inner product between two tensors (a small sketch follows below).
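A minimal numpy sketch (an assumed example, not taken from the slides) of this point: after vectorization, the Frobenius norm and the tensor inner product are just the ordinary l2 norm and dot product of long vectors. Note that numpy's default reshape is row-major; use order='F' if a column-major vec convention is required.

```python
import numpy as np

X = np.random.rand(3, 4, 5)
Y = np.random.rand(3, 4, 5)

vec_X = X.reshape(-1)              # vectorization (row-major element order)
vec_Y = Y.reshape(-1)

frob_norm = np.linalg.norm(vec_X)  # l2 norm of vec(X) = Frobenius norm of X
inner = float(vec_X @ vec_Y)       # <X, Y> = vec(X)^T vec(Y)

assert np.isclose(frob_norm, np.sqrt((X * X).sum()))
assert np.isclose(inner, (X * Y).sum())
```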
  19. 22 n-th mode unfolding is an operation that extracts all

    fibers along the n-th mode and constructs a horizontally long matrix by concatenation. Unfolding / Matricization
  20. 23 Meaning of “n-th mode unfolding” We should carefully understand

    the meaning of Ex) 3rd mode unfolding of 5th order tensor n-th mode unfolding unfold all modes except the n-th mode keep the 3rd mode unfold the 1st, 2nd, 4th, 5th modes
  21. 24 A motivation of n-th mode unfolding n-th mode unfolding

    is useful to see the linear characteristics of a tensor from each mode. For example, the Tucker rank or multilinear rank of a tensor can be defined using the matrix rank of the n-th mode unfolding (see the sketch below).
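Below is a minimal numpy sketch (assumed helper names, not from the slides) of mode-n unfolding and the multilinear (Tucker) rank obtained from the matrix ranks of the unfoldings. The convention here puts mode n first and flattens the remaining modes in their original order.

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding: move mode n to the front, then flatten the remaining modes."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold: reshape back and move mode n to its original position."""
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape([shape[n]] + rest), 0, n)

X = np.random.rand(4, 5, 6)
multilinear_rank = [np.linalg.matrix_rank(unfold(X, n)) for n in range(X.ndim)]
print(multilinear_rank)   # e.g. [4, 5, 6] for a generic random tensor
assert np.allclose(fold(unfold(X, 1), 1, X.shape), X)
```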
  22. 25 Folding / Tensorization Vectorization and matricization are generalized as

    “unfolding”. It sometimes helps us to understand the linear characteristics of tensors. On the other hand, “folding” can be considered, too.
  23. 26 An example of folding : vector to matrix

    Folding can be considered as a “grouping” of elements. Ex: a triplet is extracted for each column vector; the grouping is not unique. How do we decide which folding is better? ➔ Low-rankness can be one criterion: one folding of the same 9 values gives a rank-1 matrix while another gives a rank-2 matrix (a sketch follows below).
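A hedged toy illustration (the numbers are chosen here and are not copied from the slide) of why low-rankness can guide the choice of folding: the same 9 values grouped two different ways give matrices of different rank.

```python
import numpy as np

v = np.outer([1, 2, 3], [3, 2, 1]).reshape(-1)    # [3, 2, 1, 6, 4, 2, 9, 6, 3]

A = v.reshape(3, 3)                                # folding 1: consecutive triplets as rows
snake = np.concatenate([v[0:3], v[3:6][::-1], v[6:9]])
B = snake.reshape(3, 3)                            # folding 2: a "snake" grouping of the triplets

print(np.linalg.matrix_rank(A))   # 1 -> this folding reveals the hidden rank-1 structure
print(np.linalg.matrix_rank(B))   # 2 -> a less fortunate grouping needs rank 2
```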
  24. 27 An interesting tensor representation of images Image data can

    be naturally represented by a matrix, but folding has some benefits for compression: the (256,256) image and its folding into a 16th order (2,2,…,2) tensor are each approximated with low rank using #param = 2048.
  25. 28 Higher order tensor representations of images More beneficial tensor

    representation? The (256,256) image is folded in two ways, Folding 1 into a 16th order (2,2,…,2) tensor and Folding 2 into an 8th order (4,4,…,4) tensor, each approximated with low rank using #param = 2048.
  26. 29 Vectorization and matricization are special cases of unfolding/folding Folding

    and unfolding are linear operations in general. Folding / unfolding
  27. 31 Addition of two tensors ➔ element-wise addition Scaling A

    tensor multiplied by a scalar. Addition & Scaling
  28. 32 Elementwise multiplication (Hadamard product) and division are denoted by

    a circled asterisk (⊛) and a circled slash (⊘), respectively. Elementwise multiplication / division
  29. 33 Inner product of two vectors is a summation of

    all elements of their elementwise (Hadamard) product. It naturally generalizes to tensors as the inner product of the two vectors obtained by vectorizing the two tensors. Inner product
  30. 34 Outer product of two vectors outputs a matrix Outer

    product of 3 vectors outputs a 3rd order tensor. Outer product
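A small numpy sketch (assumed example) of the outer product: two vectors give a matrix, three vectors give a 3rd order tensor.

```python
import numpy as np

a, b, c = np.random.rand(3), np.random.rand(4), np.random.rand(5)

M = np.outer(a, b)                     # M[i, j]    = a[i] * b[j]         -> 3 x 4 matrix
T = np.einsum('i,j,k->ijk', a, b, c)   # T[i, j, k] = a[i] * b[j] * c[k]  -> 3 x 4 x 5 tensor

assert np.allclose(T, a[:, None, None] * b[None, :, None] * c[None, None, :])
```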
  31. 35 Kronecker product outputs a large matrix from two small

    matrices. Kronecker product
  32. 36 Kronecker product and outer product are essentially the same. Kronecker

    product (cont’d): vectorizing each block of the outer product and reshaping yields the Kronecker product (see the sketch below).
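A short numpy sketch of this equivalence (an assumed example using numpy's row-major conventions): the Kronecker product is the outer product with its indices grouped and reshaped.

```python
import numpy as np

a, b = np.random.rand(3), np.random.rand(4)
assert np.allclose(np.kron(a, b), np.outer(a, b).reshape(-1))   # vectors: flatten the outer product

A, B = np.random.rand(2, 3), np.random.rand(4, 5)
outer_AB = np.einsum('ij,kl->ikjl', A, B)                        # 4th order outer product, modes regrouped
assert np.allclose(np.kron(A, B), outer_AB.reshape(2 * 4, 3 * 5))
```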
  33. 37 Khatri-Rao product is defined between two matrices that have the same

    number of columns. Khatri-Rao product
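A minimal sketch of the Khatri-Rao product as the column-wise Kronecker product (the helper name is mine; recent SciPy versions also provide scipy.linalg.khatri_rao).

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of two matrices with the same number of columns."""
    assert A.shape[1] == B.shape[1]
    return np.column_stack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])])

A, B = np.random.rand(3, 4), np.random.rand(5, 4)
print(khatri_rao(A, B).shape)   # (15, 4)
```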
  34. 38 Matrix product is defined between two matrices where the number of columns of the

    first equals the number of rows of the second. Matrix product is a combination of inner products and outer products: an inner product w.r.t. r and an outer product w.r.t. i and j. Matrix product
  35. 39 Matrix product is defined between two matrices where the number of columns of the

    first equals the number of rows of the second. Matrix product is a combination of inner products and outer products, and the Khatri-Rao product is a sub-part of the matrix product. Matrix product (cont’d)
  36. 40 Tensor-Matrix product Tensor-Matrix product is defined between a tensor

    and a matrix for which the number of columns of the matrix equals the size of one mode of the tensor: an inner product w.r.t. r and an outer product w.r.t. i and (j, k), implemented via fold / unfold.
  37. 41 Tensor-Matrix product (cont’d) Tensor-Matrix product is defined between a

    tensor and a matrix whose number of columns equals the size of one mode of the tensor. It is also called the n-th mode product, and it is easier to understand through the n-th mode unfolding (see the sketch below).
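A sketch of the n-th mode (tensor-matrix) product through unfolding and folding, using the unfold/fold helpers from the earlier sketch (assumed names): Y = X x_n U is equivalent to Y_(n) = U X_(n).

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape([shape[n]] + rest), 0, n)

def mode_n_product(X, U, n):
    """Y = X x_n U, i.e. Y_(n) = U @ X_(n)."""
    new_shape = list(X.shape)
    new_shape[n] = U.shape[0]
    return fold(U @ unfold(X, n), n, new_shape)

X = np.random.rand(4, 5, 6)
U = np.random.rand(7, 5)                            # multiplies the 2nd mode (size 5)
Y = mode_n_product(X, U, 1)
print(Y.shape)                                      # (4, 7, 6)
assert np.allclose(Y, np.einsum('ijk,rj->irk', X, U))
```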
  38. 42 Tensor-Matrix product (cont’d) all modes An Nth order tensor

    multiplied by N matrices from all modes inner prod
  39. 44 We denote a single tensor as following diagram (it

    is very helpful). Using a node and edges, the number of edges equals the order of the tensor, and the inner product of tensors can be denoted by linking all edges. Tensor Diagram (vector, matrix, 3rd order tensor, 4th order tensor, …)
  40. 45 Matrix multiplication General tensor multiplication The tensor multiplication is

    essentially connecting the edges (modes) of tensors; there is some freedom in how to connect them. Tensor Diagram (cont’d): e.g., the matrix product of an (I×R) matrix and an (R×J) matrix gives an (I×J) matrix by connecting the shared edge R.
  41. 46 Summary of Part I Important concepts in tensor processing

    are explained: notations, fiber / slice, unfolding (vectorization, matricization, etc.), inner product, outer product, special products (Kronecker, Khatri-Rao, etc.), general tensor multiplication, and the tensor diagram. I think it is nice to geometrically imagine how tensors are unfolded, folded, and transformed, so this time I tried to explain them using figures. Q&A
  42. 48 Outline of Part II Tensor decompositions CP decomposition Tucker

    decomposition TT decomposition Others Applications Dimensionality Reduction Pattern Recognition Blind Source Separation Signal Restoration Deep Neural Networks
  43. 49 Tensor decomposition is a higher order generalization of matrix

    decomposition. A general definition may be “a tensor represented by multiple tensors”; a specific decomposition is given by a parametric model f and its parameter tensors (e.g., a core tensor and factor matrices). Tensor decomposition / Matrix decomposition
  44. 50 Canonical Polyadic (CP) decomposition. The Canonical Polyadic (CP) decomposition/model

    is a natural extension of low-rank matrix decomposition in terms of the Khatri-Rao product (unfold a tensor, extend the low-rank matrix decomposition, and fold back). CP decomposition
  45. 51 The optimization problem is as follows. The Alternating Least Squares (ALS)

    algorithm is typical and widely used: alternately update each factor matrix with a least-squares solution while freezing the other factors. An algorithm for CP decomposition. Initialize:
  46. 52 Least-squares update rule in CP: unfolding the

    cost function, taking its derivative, and setting it to zero gives the update for each factor (note: a CP-ALS sketch follows below).
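The following is a standard CP-ALS sketch in numpy, not the exact derivation on the slide: each factor is updated with the least-squares solution X_(n) K (K^T K)^+, where K is the Khatri-Rao product of the other factors and K^T K is the Hadamard product of their Gram matrices. Helper names and the synthetic test are assumptions.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def khatri_rao(mats):
    K = mats[0]
    for M in mats[1:]:
        K = (K[:, None, :] * M[None, :, :]).reshape(-1, K.shape[1])
    return K

def cp_als(X, R, n_iter=200):
    A = [np.random.rand(I, R) for I in X.shape]           # random initialization
    for _ in range(n_iter):
        for n in range(X.ndim):
            others = [A[m] for m in range(X.ndim) if m != n]
            K = khatri_rao(others)                        # ordering matches unfold()
            V = np.ones((R, R))
            for M in others:
                V *= M.T @ M                              # Hadamard product of Gram matrices
            A[n] = unfold(X, n) @ K @ np.linalg.pinv(V)   # least-squares update
    return A

# Sanity check on a synthetic rank-2 tensor (ALS may occasionally need a restart)
A0 = [np.random.rand(I, 2) for I in (4, 5, 6)]
X = np.einsum('ir,jr,kr->ijk', *A0)
A = cp_als(X, R=2)
Xhat = np.einsum('ir,jr,kr->ijk', *A)
print(np.linalg.norm(X - Xhat) / np.linalg.norm(X))       # should be small
```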
  47. 53 Tucker decomposition Tucker decomposition/model is a natural extension of

    low-rank matrix decomposition in terms of the Kronecker product (unfold a tensor, extend the low-rank matrix decomposition, and fold back). Tucker decomposition
  48. 54 The optimization problem is as follows. First, we unfold the cost

    function; taking the partial derivative with respect to the core g, the parameter g can be eliminated. An algorithm for Tucker decomposition (note).
  49. 56 The optimization problem is as follows. The Alternating Least Squares (ALS)

    algorithm is typical and widely used: alternately update each factor matrix with a least-squares solution while freezing the other factors. An algorithm for Tucker decomposition (cont’d). Initialize:
  50. 57 Least-squares update rule in Tucker: unfolding the

    cost function and, because of the orthogonality constraint, considering the Lagrange function with a Lagrange multiplier (note: a sketch of the resulting ALS/HOOI procedure follows below).
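Below is a compact HOOI-style ALS sketch for the Tucker decomposition (HOSVD initialization, then orthogonal factors from truncated SVDs of the projected unfoldings); the function names and the random test are assumptions, not the slide's exact algorithm.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def tucker_hooi(X, ranks, n_iter=20):
    N = X.ndim
    # HOSVD initialization: leading left singular vectors of each unfolding
    U = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :ranks[n]] for n in range(N)]
    for _ in range(n_iter):
        for n in range(N):
            Y = X
            for m in range(N):
                if m != n:
                    Y = mode_n_product(Y, U[m].T, m)        # project all the other modes
            U[n] = np.linalg.svd(unfold(Y, n), full_matrices=False)[0][:, :ranks[n]]
    G = X
    for n in range(N):
        G = mode_n_product(G, U[n].T, n)                    # core tensor
    return G, U

X = np.random.rand(10, 12, 14)
G, U = tucker_hooi(X, (3, 4, 5))
Xhat = G
for n in range(3):
    Xhat = mode_n_product(Xhat, U[n], n)
print(np.linalg.norm(X - Xhat) / np.linalg.norm(X))         # relative approximation error
```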
  51. 58 Tensor-Train (TT) decomposition Tensor-Train (TT) decomposition [Oseledets, 2011] is

    a relatively recent model. It has the benefit of an extremely light parameterization for huge high-order tensors, and each element of the tensor can be calculated by multiplying slices/fibers of the core tensors. TT decomposition
  52. 60 ALS update for TT-decomposition: ⓪ TT-SVD initialization (a diagonal matrix between left-

    orthogonal and right-orthogonal cores), then steps ① to ⑩ apply successive SVDs sweeping along the train (a TT-SVD sketch follows below).
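A compact TT-SVD sketch (the ⓪ initialization step: sequential truncated SVDs of reshaped remainders), following the general recipe of [Oseledets, 2011]; max_rank is an assumed user parameter and the reconstruction helper is mine.

```python
import numpy as np

def tt_svd(X, max_rank):
    shape = X.shape
    cores, r_prev, C = [], 1, X
    for n in range(len(shape) - 1):
        C = C.reshape(r_prev * shape[n], -1)
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, len(s))                         # truncate (or use an error tolerance)
        cores.append(U[:, :r].reshape(r_prev, shape[n], r))
        C = s[:r, None] * Vt[:r]                          # carry the remainder to the next step
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores

def tt_to_full(cores):
    X = cores[0]
    for G in cores[1:]:
        X = np.tensordot(X, G, axes=(-1, 0))              # contract the shared TT rank
    return X.reshape([G.shape[1] for G in cores])

X = np.random.rand(4, 5, 6, 7)
cores = tt_svd(X, max_rank=20)                            # rank large enough -> near-exact here
print([G.shape for G in cores])
print(np.linalg.norm(X - tt_to_full(cores)) / np.linalg.norm(X))
```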
  53. 61 Tensor-Ring (TR) decomposition and others Tensor-Ring (TR) decomposition is

    a generalization of the TT decomposition, and many other networks can be considered. Nowadays, tensor decompositions (including CP, Tucker, etc.) are generally called tensor networks: CP, Tucker, Hierarchical Tucker (HT), TT, Projected Entangled-Pair States (PEPS).
  54. 64 In general, the number of elements/parameters to describe the

    tensor data grows exponentially with the tensor order N. Tensor decomposition works to reduce the number of parameters. For example, CP decomposition with I = 10, R = 10: the number of parameters (original tensor vs CP with R=10) is 1,000 vs 300 for N=3; 10,000 vs 400 for N=4; 100,000 vs 500 for N=5; and 1,000,000 vs 600 for N=6. Dimensionality reduction
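A tiny check of those parameter counts (I = R = 10): a full Nth order tensor needs I**N values, while a rank-R CP model needs N*I*R.

```python
I, R = 10, 10
for N in range(3, 7):
    print(N, I**N, N * I * R)   # N=3: 1,000 vs 300 ... N=6: 1,000,000 vs 600
```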
  55. 65 A simple experiment: image reconstruction with smaller #parameters Dimensionality

    reduction (cont’d): a (128,128,128) tensor is compressed by Unfold & SVD (a 128 × 128*128 matrix truncated to rank R), by CP decomposition (rank R), and by Tucker decomposition (rank (R,R,R)).
  56. 66 Reconstructed MR images. Dimensionality reduction (cont’d): figure comparing CP, Tucker, and SVD reconstructions

    at increasing ranks (scalar ranks from R=1 up to R=1600, and Tucker ranks from R=(1,1,1) up to R=(120,120,120)).
  57. 67 SNR vs #parameters. Dimensionality reduction (cont’d): the plot marks SVD with R=10, Tucker with

    R=[50, 50, 50], and CP with R=400.
  58. 68 Advances in dimensionality reduction Sparse coding of tensor [Caiafa+,

    2013 (in NECO)] [Caiafa+, 2013 (in WIRES)] Tensor networks (TNs) for dimensionality reduction [Zhao+, 2016] [Cichocki+, 2017] Genetic algorithm for TN model selection [Li+, 2020 (in ICML)]
  59. 69 Pattern recognition Linear Discriminant Analysis (LDA) is a typical

    method to obtain the best linear projection that separates patterns from different categories (maximize the between-class scatter, minimize the within-class scatter).
  60. 70 Pattern recognition (cont’d) LDA can be naturally extended by

    using a multi-linear projection instead of a linear projection (Multiway LDA), where the same maximize/minimize criterion is applied to data stacked as an (N+1)-th order tensor.
  61. 71 Pattern recognition (cont’d) LDA has a closed form solution

    using eigenvalue decomposition (EVD). Multi-way LDA is obtained by alternating optimization: repeat { for n = 1, …, N: update the n-th projection by EVD } until convergence.
  62. 72 Human face identification [Ye+, 2005 (in NeurIPS)] Pattern recognition

    (cont’d): ORL face images are 112×92, so vectorized LDA requires the EVD of a roughly 10^4 × 10^4 matrix, whereas (multi-way) 2DLDA only needs EVDs of 112×112 and 92×92 matrices, which are updated alternately.
  63. 73 Human face identification [Ye+, 2005 (in NeurIPS)] Two stage

    feature extraction using 2DLDA + LDA is proposed: 2DLDA reduces each 112×92 image to 10×10, which is vectorized (length 100), reduced to 40 dimensions by LDA, and classified by nearest neighbor. The running time of 2DLDA was much shorter than that of LDA, and the accuracy of 2DLDA+LDA was slightly better than LDA. Pattern recognition (cont’d)
  64. 74 Gait recognition [Tao+, 2007 (in TPAMI)] A gait image is

    represented as a 4th order tensor by convolving with 5×8 Gabor functions, and the Gabor tensor is discriminated by (multi-way) Generalized LDA (GLDA) followed by LDA and a simple classifier. Pattern recognition (cont’d)
  65. 75 Blind source separation (BSS) is a method to estimate

    original sources from multiple mixture signals Blind source separation
  66. 76 BSS on Tensor Representation [Sawada+, Tutorial in ICASSP2018] Independent

    low-rank matrix analysis (ILRMA) [Kitamura+, 2016]. Blind source separation (cont’d): the multichannel time signal is converted by STFT into a (frequency × time × channel) tensor; ① the reconstructed tensor is parameterized using demixing matrices for each frequency, and ② a nonnegative CP decomposition is applied to the power tensor.
  67. 77 Tensorizing BSS problem using “Segmentation” [Bousse+, 2017 (in IEEE-TSP)]

    Segmentation is one way of tensorization (folding); this strategy works well when the sources have low-rank structure in their matrix/tensor representation. Blind source separation (cont’d)
  68. 78 It is a technique to recover corrupted

    signals. Signal restoration: noise reduction, deconvolution, inpainting, super-resolution.
  69. 79 Signal restoration (cont’d) Tensor decomposition can be used for

    recovering signals that have low-rank structure: the tensor representation is kept consistent with the observed data while being modeled by a CP, Tucker, TT, or other decomposition.
  70. 80 Signal restoration (cont’d) Low-rank tensor completion Corrupted image is

    directly represented as a 3rd order tensor (height × width × RGB) and recovered by a smooth CP decomposition [Yokota+, 2016 (in IEEE-TSP)], whose rank-1 terms are built from smooth components.
  71. 81 Signal restoration (cont’d) Non-local filtering for image denoising BM3D

    [Dabov+, 2007 (in IEEE-TIP)], GSR [Zhang+, 2014 (in IEEE-TIP)], Truncated HOSVD [Yokota+, 2017 (in IEEE-TSP)] http://www.cs.tut.fi/~foi/GCF-BM3D/ ① Extract similar patches and represent them as a tensor; ② the tensor is approximated by a low-rank tensor to reduce the noise component.
  72. 82 Signal restoration (cont’d) There are many studies of low-rank

    based tensor data recovery: Low-n-rank tensor recovery [Gandy+, 2011 (in Inverse Problems)], Low rank tensor completion [Liu+, 2012 (in IEEE-TPAMI)], Bayesian CP decomposition [Zhao+, 2015 (in IEEE-TPAMI)], t-SVD [Zhang+, 2017 (in IEEE-TSP)], Low-rank and smooth tensor recovery [Yokota+, 2017 (in CVPR)] [Yokota+, 2019 (in IEEE-Access)], Tensor ring completion [Wang+, 2017 (in ICCV)]
  73. 83 Currently, DNNs are playing central roles in various signal

    and information processing studies. In particular, convolutional neural networks (CNNs) are widely employed for their efficiency. Deep neural networks (DNNs): recognition, segmentation, enhancement, restoration, and synthesis for audio, speech, language, image, and bio-signal data; a CNN stacks convolution and activation layers from input to output.
  74. 84 A 2D convolutional layer in CNNs is a mapping

    from a 3rd order tensor space (height × width × channels) to another 3rd order tensor space, and its parameter is a 4th order tensor. DNNs (cont’d)
  75. 85 Tensor decomposition in DNNs Convolutional layer [Lebedev+, 2015 (in

    ICLR)], fully connected layer [Novikov+, 2015 (in NeurIPS)]. DNNs (cont’d): the weights are folded/tensorized and then approximated by a CP or TT decomposition.
  76. 86 Tensor decomposition in DNNs Convolutional layer [Lebedev+, 2015 (in

    ICLR)]: the full convolution (a 4th order kernel tensor, e.g., of size (9,9,48,128)) is approximated by a CP model and becomes sequential convolutions with fibers. Experiments: evaluated on image recognition (large speedup with only a small accuracy drop). DNNs (cont’d)
  77. 87 Tensor decomposition in DNNs Fully connected (FC) layer [Novikov+,

    2015 (in NeurIPS)]. Experiments: 1000-class ImageNet using VGG16 and VGG19, which have three FC layers: FC1 (25088, 4096), FC2 (4096, 4096), FC3 (4096, 1000). For TT, the dimensions are factorized as 25088 = 2*7*8*8*7*4 and 4096 = 4*4*4*4*4*4 with TT-rank 1, 2, or 4; the baselines are no compression and a matrix decomposition with rank 1, 5, or 50. Efficient compression with little degradation!! DNNs (cont’d)
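A rough, hedged parameter-count comparison for the TT-matrix format of FC1, using the factorizations quoted above and an assumed uniform TT-rank of 4 (the exact ranks, core ordering, and counts in [Novikov+, 2015] may differ).

```python
import numpy as np

m = [2, 7, 8, 8, 7, 4]        # factorization of the input size 25088
n = [4, 4, 4, 4, 4, 4]        # factorization of the output size 4096
r = [1, 4, 4, 4, 4, 4, 1]     # assumed TT-ranks (boundary ranks are 1)

tt_params = sum(r[k] * m[k] * n[k] * r[k + 1] for k in range(len(m)))
full_params = np.prod(m) * np.prod(n)
print(tt_params, full_params, full_params // tt_params)   # ~2,000 vs ~10^8 parameters
```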
  78. 88 DNNs (cont’d) Other references in machine learning Compression of

    CNN with Tucker [Kim+, 2016 (in ICLR)], Supervised Learning with Tensor Networks [Stoudenmire+, 2016 (in NeurIPS)], Exponential Machines [Novikov+, 2017 (in ICLR workshop)], Compact RNNs with Block-term Tensor Decomposition [Ye+, 2018 (in CVPR)], Compressing RNNs with Tensor Ring [Pan+, 2019 (in AAAI)], Tensor network decomposition of CNN kernels [Hayashi+, 2019 (in NeurIPS)], For understanding DNNs [Li+, 2019 (in ICASSP)] [Li+, 2020 (in AISTATS)]
  79. 89 Summary of Part II Tensor decomposition has various applications

    A common concept is to represent signals as high-order tensors that have low-rank structure (direct representation, multi-scale Gabor representation, time-frequency representation, similar-patch grouping, high-order tensorization) and then apply a tensor decomposition. Other perspectives in tensor studies: decomposition models, optimization algorithms, model selection, rank estimation, etc. (signals → tensors → decomposition; Dimensionality Reduction, Pattern Recognition, Blind Source Separation, Signal Restoration, Deep Neural Networks)
  80. 90 Some Discussions: the main flow was tensorization + decomposition. However,

    in a stricter sense we need to consider how to select the decomposition model, and it is also important how to obtain the decomposition efficiently through optimization. Model selection: usually, the tensor decomposition model is selected manually by researchers/users; however, there are many candidate decompositions nowadays, and determining the model from given data is a very difficult task that usually requires professional knowledge of both the application and the tensor fields. Evolutionary Topology Search for Tensor Networks [Li+, 2020 (in ICML)]. Optimization: it is quite important to study how to efficiently solve the complicated optimization problems behind tensor (network) decompositions. Many methods exist: gradient method, Newton method, alternating least squares, hierarchical alternating least squares, stochastic gradient, …; Randomized SVD [Minster+, 2020] [Ahmadi-Asl+, 2020]; TensorSketch [Malik+, 2018 (in NeurIPS)]; GPU acceleration [Choi+, 2018]
  81. 92 Introduction: so far, I have shown tensor operations, decompositions, and

    their applications. A common concept was to “tensorize” and “decompose”. A question here is how to tensorize the signals; generally, some specific knowledge about the signals is necessary to design the tensors. Here, I introduce multi-way Hankelization / delay embedding for tensors as an advanced topic in tensor processing. (signals → tensors → decomposition)
  82. 93 Outline Hankelization / delay-embedding for time-series signals Multi-way Hankelization

    / delay-embedding for tensors Applications Open problems and perspectives
  83. 94 Limitation of direct representation Revisiting signal restoration using tensor

    decomposition: when whole slices are missing from a tensor, low-rank tensor decomposition does not work. Randomly missing pixels can be recovered, but missing slices cannot, because slice missing deflates the matrix/tensor rank and the deflated component cannot be recovered. (Figure: slices & random missing vs recovery by low-rank decomposition.)
  84. 95 Approach using the Hankel representation: here, we introduce the idea of

    applying low-rank tensor decomposition not in the direct tensor space but in an embedded (Hankel) tensor space: delay embedding / Hankelization turns the incomplete tensor into a high-order incomplete tensor, low-rank tensor completion is applied there, and the inverse operation gives the completed output.
  85. 96 Here, I introduce a technique from dynamical system analysis.

    For a time series x(t), t = 1, …, T, we consider the state space s(t) = (x(t), x(t+1), …, x(t+τ-1)) for t = 1, …, T-τ+1. This is called “delay embedding (DE)”. What is delay embedding (DE)?
  86. 97 DE is a transformation from a vector to a

    Hankel matrix (i.e., Hankelization). Hankel matrix: a matrix whose anti-diagonal elements are the same. Delay embedding (DE) is Hankelization.
  87. 99 Inverse of delay-embedding (DE) Can we define the inverse

    of DE? There is no exact inverse of DE since it is essentially a duplication, so we only consider the pseudo-inverse of DE: the forward map duplicates entries (via unfolding/folding), and the pseudo-inverse averages the duplicated entries.
  88. 100 SVD of Hankel matrix Seeing singular values of Hankel

    matrix for various values of τ: the Hankel matrix has a low-rank structure. (Figure: 3D plot of DE(x, τ=3) and its principal patterns; singular values in decreasing order.)
  89. 101 SVD of Hankel matrix (cont’d) Seeing left- and right-

    singular vectors of Hankel matrix Left singular vectors are very similar to discrete cosine transform (DCT) basis
  90. 102 Time-series recovery using truncated SVD: a missing & noisy signal is delay-embedded into a missing & noisy

    Hankel matrix, a low-rank approximation is computed, and the inverse D.E. gives the recovered signal (completion); a sketch of this pipeline follows below.
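A self-contained sketch of this pipeline on a 1-D toy signal (all names and numbers are assumptions): delay embedding (Hankelization), truncated SVD, and the pseudo-inverse of the embedding, which averages the duplicated entries. In this toy version the signal is only noisy; for missing data the low-rank step would be replaced by a completion method.

```python
import numpy as np

def delay_embed(x, tau):
    """Hankel matrix whose columns are the length-tau sliding windows of x."""
    T = len(x)
    return np.stack([x[t:t + tau] for t in range(T - tau + 1)], axis=1)   # (tau, T - tau + 1)

def inverse_delay_embed(H, T):
    """Pseudo-inverse of delay_embed: average all duplicated copies of each x[t]."""
    tau = H.shape[0]
    acc, cnt = np.zeros(T), np.zeros(T)
    for t in range(H.shape[1]):
        acc[t:t + tau] += H[:, t]
        cnt[t:t + tau] += 1
    return acc / cnt

# Toy example: noisy sinusoid -> Hankelize -> rank-2 truncation -> de-Hankelize
t = np.linspace(0, 8 * np.pi, 400)
x = np.sin(t) + 0.3 * np.random.randn(len(t))
H = delay_embed(x, tau=50)
U, s, Vt = np.linalg.svd(H, full_matrices=False)
H_lr = (U[:, :2] * s[:2]) @ Vt[:2]                # a noiseless sinusoid is (nearly) rank 2
x_rec = inverse_delay_embed(H_lr, len(x))
print(np.linalg.norm(np.sin(t) - x_rec) / np.linalg.norm(np.sin(t)))   # denoising error
```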
  91. 104 We have three choices of delay-embedding for matrix data

    (a) delay embedding in only the column direction, (b) delay embedding in only the row direction, and (c) delay embedding in both directions; (a) and (b) give 3rd order tensors, while (c) gives a 4th order tensor. Delay embedding for a matrix.
  92. 105 Considering “both” case for matrix delay-embedding Each column is

    Hankelized by DE, which yields a time series of Hankel matrices; a further DE yields a “block Hankel matrix”. A block Hankel matrix is a block matrix that has Hankel structure and whose blocks are themselves Hankel matrices. Block Hankel structure.
  93. 106 DE = linear duplication + folding Multiway delay-embedding =

    multilinear duplication + folding. We call it the “multiway delay-embedding transform (MDT)”. MDT is a generalization of DE for tensors: a vector of length I becomes a τ × (I-τ+1) Hankel matrix, and a 3rd order tensor (I1, I2, I3) becomes a 6th order tensor (a sketch follows below).
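A minimal MDT sketch for a 3rd order tensor using sliding windows along every mode (an assumption of mine built on numpy >= 1.20's sliding_window_view; the result matches the block-Hankel construction up to a permutation of the six modes). The inverse MDT would average the duplicated entries, as in the 1-D case.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mdt(X, taus):
    """Multiway delay embedding: (I1, I2, I3) -> (I1-t1+1, I2-t2+1, I3-t3+1, t1, t2, t3)."""
    return sliding_window_view(X, taus).copy()

X = np.random.rand(32, 32, 3)
H = mdt(X, (8, 8, 1))      # no embedding along the RGB mode (tau = 1)
print(H.shape)             # (25, 25, 3, 8, 8, 1)
```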
  94. 107 DE vs inverse DE, and MDT vs inverse MDT: the inverse

    MDT maps the 6th order tensor back to the 3rd order tensor (I1, I2, I3) by applying the pseudo-inverses of the duplication operators in each mode.
  95. 108 MDT of Image and its high order SVD Seeing

    the components of Hankel tensor [Yokota+, 2018 (in APSIPA)] Very similar to DCT Bases
  96. 109 Application to Tensor Completion Incomplete tensor Incomplete block Hankel

    tensor → complete block Hankel tensor → complete tensor. Low-rank Tucker decomposition in the embedded space [Yokota+, 2018 (in CVPR)]: ① MDT of the given incomplete tensor of original size (Nth order ➔ 2Nth order); ② Tucker decomposition of the incomplete block Hankel tensor; ③ inverse MDT of the complete block Hankel tensor (2Nth order ➔ Nth order).
  97. 110 Tucker decomposition of complete tensor was explained before Here,

    I explain the method for Tucker decomposition of an incomplete tensor. The optimization problem uses the incomplete tensor and a binary tensor of the same size (0 represents a missing entry, 1 an observed entry), and it can be solved by an auxiliary-function approach with an ALS algorithm: alternately impute the missing entries and update the factors until convergence (each update step is the same as Tucker for a complete tensor; a sketch follows below).
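A hedged sketch of that idea (impute the missing entries with the current reconstruction, run a complete-data Tucker step, and repeat); for simplicity it uses a plain HOSVD step rather than the full ALS inner loop, and all names and the toy data are assumptions.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def tucker_step(X, ranks):
    """One HOSVD pass: leading singular vectors of each unfolding, then the core."""
    U = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :ranks[n]] for n in range(X.ndim)]
    G = X
    for n in range(X.ndim):
        G = mode_n_product(G, U[n].T, n)
    return G, U

def tucker_completion(X, mask, ranks, n_outer=100):
    """mask: binary tensor of the same size, 1 = observed, 0 = missing."""
    Z = X * mask                                       # initialize missing entries with zeros
    for _ in range(n_outer):
        G, U = tucker_step(Z, ranks)
        Xhat = G
        for n in range(Xhat.ndim):
            Xhat = mode_n_product(Xhat, U[n], n)
        Z = mask * X + (1 - mask) * Xhat               # keep observed data, impute the rest
    return Xhat

# Toy example: a low multilinear-rank tensor with roughly 40% of its entries missing
G0 = np.random.rand(3, 3, 3)
U0 = [np.random.rand(I, 3) for I in (15, 16, 17)]
X = G0
for n in range(3):
    X = mode_n_product(X, U0[n], n)
mask = (np.random.rand(*X.shape) > 0.4).astype(float)
Xhat = tucker_completion(X, mask, (3, 3, 3))
print(np.linalg.norm((1 - mask) * (X - Xhat)) / np.linalg.norm((1 - mask) * X))
```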
  98. 111 Experimental results: a (256,256,3) image is mapped by MDT to a (32,225,32,225,3) tensor, Tucker-compressed to (R1,R2,R3,R4,3), and mapped back to (256,256,3), with (32,32) → (R1,R3)

    and (225,225) → (R2,R4); candidate ranks such as 4, 8, 16, 32, 64 were tested. Slice missing was recovered with an appropriate rank of the decomposition ➔ but the best rank is unknown.
  99. 112 Adaptive rank increment algorithm To be free from the

    rank estimation problem in tensor completion, we propose to use rank increment algorithm
  100. 117 Other works Here, I showed an original work of

    MDT, the Tucker decomposition with MDT [Yokota+, 2018 (in CVPR)]. We are extending it to others: TT decomposition with MDT [Sedighin+, 2020 (in IEEE-SPL)], time series analysis with MDT [Shi+, 2020 (in AAAI)], and manifold modeling with MDT [Yokota+, 2020 (in IEEE-TNNLS)].
  101. 118 Relation between Hankelization and Convolution Actually, Hankelization is a

    part of convolution. Hankelization can be applied to incomplete data ☺, whereas convolution (STFT, wavelet) cannot be applied to incomplete data.
  102. 119 Open problems: how to decide the value of τ (some

    criteria would be helpful), and the computational cost: the number of elements of the Hankel tensor increases dramatically with τ. Ex: for MDT of a (128,128,128) tensor, τ = 8 gives a (8, 121, 8, 121, 8, 121) Hankel tensor (approx. 7 GB), τ = 16 gives (16, 113, 16, 113, 16, 113) (approx. 47 GB), τ = 32 gives (32, 97, 32, 97, 32, 97) (approx. 239 GB), and τ = 64 gives (64, 65, 64, 65, 64, 65) (approx. 575 GB).
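A quick way to reproduce those memory figures (double precision, 8 bytes per element):

```python
for tau in (8, 16, 32, 64):
    shape = (tau, 128 - tau + 1) * 3                 # MDT of a (128, 128, 128) tensor
    n_elem = 1
    for s in shape:
        n_elem *= s
    print(tau, shape, f"{n_elem * 8 / 1e9:.0f} GB")  # approx. 7, 47, 239, 576 GB
```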
  103. 120 Other perspectives More applications would be interesting Network data

    analysis, Medical imaging, Hyperspectral images, etc Hankel representation in analysis of CNNs Convolutional sparse coding [Papyan+, 2017] Neural tangent denoiser [Tachella+, 2020] Dual convex formulation of CNNs [Anonymous, 2020]
  104. 121 Closing tutorial I’d like to show my appreciation to

    organizers of APSIPA ASC 2020 and all participants!! This tutorial is an introduction to tensor methods in signal processing and machine learning; I tried to avoid cutting-edge, difficult content as much as possible and to make the tutorial friendly to students and beginners. The community of tensor studies is growing, and everyone with any interest in theory, algorithms, software development, or applications is welcome to these fields!! Basics / Applications / Advances
  105. 122 Acknowledgements Andrzej Cichocki (Skoltech) Qibin Zhao (RIKEN AIP) Chao

    Li (RIKEN AIP) Cesar F. Caiafa (CONICET) Hidekata Hontani (Nitech) Burak Erem (TrueMotion) Simon K. Warfield (Harvard)