6. REFERENCES
[1] Shawn Hershey, Sourish Chaudhuri, Daniel PW Ellis, Jort F Gemmeke, Aren Jansen, R Channing Moore, Manoj Plakal, Devin Platt, Rif A Saurous, Bryan Seybold, et al., “CNN architectures for large-scale audio classification,” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, pp. 131–135.
[2] Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen, “A multi-device dataset for urban acoustic scene classification,” Submitted to DCASE2018 Workshop, 2018.
[3] Yuma Sakashita and Masaki Aono, “Acoustic scene classification by ensemble of spectrograms based on adaptive temporal divisions,” Tech. Rep., DCASE2018 Challenge, September 2018.
[4] Matthias Dorfer, Bernhard Lehner, Hamid Eghbal-zadeh, Christoph Heindl, Fabian Paischer, and Gerhard Widmer, “Acoustic scene classification with fully convolutional neural networks and I-vectors,” Tech. Rep., DCASE2018 Challenge, September 2018.
[5] Hossein Zeinali, Lukas Burget, and Honza Cernocky, “Convolutional neural networks and x-vector embedding for DCASE2018 acoustic scene classification challenge,” Tech. Rep., DCASE2018 Challenge, September 2018.
[6] Karol J Piczak, “ESC: Dataset for environmental sound classification,” in Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 2015, pp. 1015–1018.
[7] Hardik B Sailor, Dharmesh M Agrawal, and Hemant A Patil, “Unsupervised filterbank learning using convolutional restricted Boltzmann machine for environmental sound classification,” Proc. Interspeech 2017, pp. 3107–3111, 2017.
[8] Yuji Tokozume, Yoshitaka Ushiku, and Tatsuya Harada, “Learning from between-class examples for deep sound recognition,” in International Conference on Learning Representations, 2018.
[9] Anurag Kumar, Maksim Khadkevich, and Christian Fügen, “Knowledge transfer from weakly labeled audio using convolutional neural network for sound events and scenes,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 326–330.
[10] Rishabh N Tak, Dharmesh M Agrawal, and Hemant A Patil, “Novel phase encoded mel filterbank energies for environmental sound classification,” in International Conference on Pattern Recognition and Machine Intelligence. Springer, 2017, pp. 317–325.
[11] Michael Deisher and Andrzej Polonski, “Implementation of efficient, low power deep neural networks on next-generation Intel client platforms,” IEEE SigPort, 2017.
[12] Mircea Horea Ionica and David Gregg, “The Movidius Myriad architecture’s potential for scientific computing,” IEEE Micro, vol. 35, no. 1, pp. 6–14, 2015.
[13] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz, “mixup: Beyond empirical risk minimization,” arXiv preprint arXiv:1710.09412, 2017.
[14] Yusuf Aytar, Carl Vondrick, and Antonio Torralba, “SoundNet: Learning sound representations from unlabeled video,” in Advances in Neural Information Processing Systems, 2016, pp. 892–900.
[15] Wei Dai, Chia Dai, Shuhui Qu, Juncheng Li, and Samarjit Das, “Very deep convolutional neural networks for raw waveforms,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 421–425.
[16] Yuji Tokozume and Tatsuya Harada, “Learning environmental sounds with end-to-end convolutional neural network,” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, pp. 2721–2725.
[17] Song Han, Huizi Mao, and William J Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” arXiv preprint arXiv:1510.00149, 2015.
[18] Jian Xue, Jinyu Li, Dong Yu, Mike Seltzer, and Yifan Gong, “Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network,” in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 6359–6363.
[19] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[20] Karen Simonyan and Andrew Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[21] Ankit Shah, Anurag Kumar, Alexander G Hauptmann, and Bhiksha Raj, “A closer look at weak label learning for audio events,” CoRR, vol. abs/1804.09288, 2018.