A computational model of auditory analysis is described, inspired by psychoacoustical and neurophysiological findings in early and central stages of the auditory system. The model provides a unified multiresolution representation of the spectral and temporal features likely critical in the perception of sound. Simplified, more specifically tailored versions of this model have already been validated by successful application in the assessment of speech intelligibility [Elhilali et al., Speech Commun. 41(2–3), 331–348 (2003); Chi et al., J. Acoust. Soc. Am. 106, 2719–2732 (1999)] and in explaining the perception of monaural phase sensitivity [R. Carlyon and S. Shamma, J. Acoust. Soc. Am. 114, 333–348 (2003)]. Here we provide a more complete mathematical formulation of the model, illustrating how complex signals are transformed through its various stages, and relating it to comparable existing models of auditory processing. Furthermore, we outline several reconstruction algorithms that resynthesize the sound from the model output, so as to evaluate the fidelity of the representation and the contribution of different features and cues to the sound percept.

1. Akansu, A. N., and Haddad, R. A. (1992). Multiresolution Signal Decomposition (Academic, Boston).
2. Amagai, S., Dooling, R., Shamma, S., Kidd, T., and Lohr, B. (1999). “Detection of modulation in spectral envelopes and linear-rippled noises by budgerigars,” J. Acoust. Soc. Am. 105, 2029–2035.
3. Arai, T., Pavel, M., Hermansky, H., and Avendano, C. (1996). “Intelligibility of speech with filtered time trajectories of spectral envelopes,” Proc. ICSLP, pp. 2490–2492.
4. Atal, B. S. (1974). “Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification,” J. Acoust. Soc. Am. 55, 1304–1312.
5. Atlas, L., and Shamma, S. (2003). “Joint acoustic and modulation frequency,” EURASIP J. Appl. Signal Process. 7, 668–675.
6. Bacon, S. P., and Grantham, D. W. (1989). “Modulation masking: Effects of modulation frequency, depth, and phase,” J. Acoust. Soc. Am. 85, 2575–2580.
7. Baer, T., and Moore, B. C. J. (1993). “Effects of spectral smearing on the intelligibility of sentences in noise,” J. Acoust. Soc. Am. 94, 1229–1241.
8. Bates, R. H. T. (1984). “Uniqueness of solutions to two-dimensional Fourier phase problems for localized and positive images,” Comput. Vis. Graph. Image Process. 25, 205–217.
9. Calhoun, B., and Schreiner, C. (1995). “Spectral envelope coding in cat primary auditory cortex,” J. Aud. Neurosci. 1, 39–61.
10. Carlyon, R., and Shamma, S. (2003). “An account of monaural phase sensitivity,” J. Acoust. Soc. Am. 114, 333–348.
11. Carney, L. H. (1993). “A model for the responses of low-frequency auditory-nerve fibers in cat,” J. Acoust. Soc. Am. 93, 401–417.
12. Chi, T. (2003). “Computational Spectro-temporal Auditory Model with Applications to Acoustical Information Processing,” Ph.D. thesis, University of Maryland, College Park, MD.
13. Chi, T., Gao, Y., Guyton, C. G., Ru, P., and Shamma, S. (1999). “Spectro-temporal modulation transfer functions and speech intelligibility,” J. Acoust. Soc. Am. 106, 2719–2732.
14. Cohen, J. R. (1989). “Application of an auditory model to speech recognition,” J. Acoust. Soc. Am. 85, 2623–2633.
15. Dau, T., Kollmeier, B., and Kohlrausch, A. (1997a). “Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers,” J. Acoust. Soc. Am. 102, 2892–2905.
16. Dau, T., Kollmeier, B., and Kohlrausch, A. (1997b). “Modeling auditory processing of amplitude modulation. II. Spectral and temporal integration,” J. Acoust. Soc. Am. 102, 2906–2919.
17. Dau, T., Puschel, D., and Kohlrausch, A. (1996). “A quantitative model of the effective signal processing in the auditory system. I. Model structure,” J. Acoust. Soc. Am. 99, 3615–3622.
18. deCharms, R. C., Blake, D. T., and Merzenich, M. M. (1998). “Optimizing sound features for cortical neurons,” Science 280(5368), 1439–1443.
19. Depireux, D., Simon, J., Klein, D., and Shamma, S. (2001). “Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex,” J. Neurophysiol. 85(3), 1220–1234.
20. deRibaupierre, F., and Rouiller, E. (1981). “Temporal coding of repetitive clicks: presence of rate selective units in the cat’s medial geniculate body (MGB),” J. Physiol. (London) 318, 23–24.
21. Drullman, R. (1995). “Temporal envelope and fine structure cues for speech intelligibility,” J. Acoust. Soc. Am. 97, 585–592.
22. Drullman, R., Festen, J., and Plomp, R. (1994). “Effect of temporal envelope smearing on speech reception,” J. Acoust. Soc. Am. 95, 1053–1064.
23. Edamatsu, H., Kawasaki, M., and Suga, N. (1989). “Distribution of combination-sensitive neurons in the ventral fringe area of the auditory cortex of the mustached bat,” J. Neurophysiol. 61(1), 202–207.
24. Eggermont, J. J. (2002). “Temporal modulation transfer functions in cat primary auditory cortex: Separating stimulus effects from neural mechanisms,” J. Neurophysiol. 87, 305–321.
25. Elhilali, M., Chi, T., and Shamma, S. A. (2003). “A spectro-temporal modulation index (STMI) for assessment of speech intelligibility,” Speech Commun. 41(2–3), 331–348.
26. Elhilali, M., Fritz, J. B., Klein, D. J., Simon, J. Z., and Shamma, S. A. (2004). “Dynamics of precise spike timing in primary auditory cortex,” J. Neurosci. 24(5), 1159–1172.
27. Ewert, S. D., and Dau, T. (2000). “Characterizing frequency selectivity for envelope fluctuations,” J. Acoust. Soc. Am. 108, 1181–1196.
28. Fienup, J. R. (1982). “Phase retrieval algorithms: a comparison,” Appl. Opt. 21, 2758–2769.
29. Fienup, J. R., and Wackerman, C. C. (1987). “Phase-retrieval stagnation problems and solutions,” J. Opt. Soc. Am. A 3(11), 1897–1907.
30. Fu, Q.-J., and Shannon, R. V. (2000). “Effect of stimulation rate on phoneme recognition by Nucleus-22 cochlear implant listeners,” J. Acoust. Soc. Am. 107, 589–597.
31. Gerchberg, R. W., and Saxton, W. O. (1972). “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik (Jena) 35, 237–246.
32. Ghitza, O. (2001). “On the upper cutoff frequency of the auditory critical-band envelope detectors in the context of speech perception,” J. Acoust. Soc. Am. 110, 1628–1640.
33. Green, D. M. (1986). “Frequency and the detection of spectral shape change,” in Auditory Frequency Selectivity (Plenum, New York), pp. 351–359.
34. Greenberg, S., and Kingsbury, B. (1997). “The modulation spectrogram: In pursuit of an invariant representation of speech,” in Proc. ICASSP, pp. 1647–1650.
35. Greenberg, S., Arai, T., and Silipo, R. (1998). “Speech intelligibility derived from exceedingly sparse spectral information,” in Proc. of the Intl. Conf. on Spoken Language Processing, Sydney, pp. 2803–2806.
36. Grimault, N., Bacon, S. P., and Micheyl, C. (2002). “Auditory stream segregation on the basis of amplitude-modulation rate,” J. Acoust. Soc. Am. 111, 1340–1348.
37. Hansen, M., and Kollmeier, B. (1999). “Continuous assessment of time-varying speech quality,” J. Acoust. Soc. Am. 106, 2888–2899.
38. Hayes, M. H. (1982). “The reconstruction of a multidimensional sequence from the phase or magnitude of its Fourier transform,” IEEE Trans. Acoust., Speech, Signal Process. ASSP-30(2), 140–154.
39. Hayes, M. H. (1987). “The unique reconstruction of multidimensional sequences from Fourier transform magnitude or phase,” in Image Recovery: Theory and Application, edited by H. Stark (Academic, San Diego), pp. 195–230.
40. Hayes, M. H., Lim, J. S., and Oppenheim, A. V. (1980). “Signal reconstruction from phase or magnitude,” IEEE Trans. Acoust., Speech, Signal Process. ASSP-28(6), 672–680.
41. Hermansky, H., and Morgan, N. (1994). “RASTA processing of speech,” IEEE Trans. Speech Audio Process. 2(4), 578–589.
42. Houtgast, T. (1989). “Frequency selectivity in amplitude-modulation detection,” J. Acoust. Soc. Am. 85(4), 1676–1680.
43. Houtgast, T., Steeneken, H. J. M., and Plomp, R. (1980). “Predicting speech intelligibility in rooms from the modulation transfer function. I. General room acoustics,” Acustica 46, 60–72.
44. Irino, T., and Kawahara, H. (1993). “Signal reconstruction from modified auditory wavelet transform,” IEEE Trans. Signal Process. 41(12), 3549–3554.
45. ITU-T (2001). “Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs,” ITU-T Recommendation P.862, February.
46. Jones, J. P., and Palmer, L. A. (1987). “An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex,” J. Neurophysiol. 58(6), 1233–1258.
47. Joris, P., and Yin, T. C. (1992). “Responses to amplitude-modulated tones in the auditory nerve of the cat,” J. Acoust. Soc. Am. 91, 215–232.
48. Klein, D. J., Depireux, D. A., Simon, J. Z., and Shamma, S. A. (2000). “Robust spectro temporal reverse correlation for the auditory system: Optimizing stimulus design,” J. Comput. Neurosci. 9, 85–111.
49. Kleinschmidt, M., Tchorz, J., and Kollmeier, B. (2001). “Combining speech enhancement and auditory feature extraction for robust speech recognition,” Speech Commun. 34(1–2), 75–91.
50. Kowalski, N., Depireux, D., and Shamma, S. A. (1996). “Analysis of dynamic spectra in ferret primary auditory cortex: I. Characteristics of single unit responses to moving ripple spectra,” J. Neurophysiol. 76(5), 3503–3523.
51. Kryter, K. (1962). “Methods for the calculation and use of the articulation index,” J. Acoust. Soc. Am. 34, 1689–2147.
52. Langner, G. (1992). “Periodicity coding in the auditory system,” Hear. Res. 60, 115–142.
53. Langner, G., and Schreiner, C. E. (1988). “Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms,” J. Neurophysiol. 60(6), 1799–1822.
54. Levi, A., and Stark, H. (1983). “Signal restoration from phase by projections onto convex sets,” J. Opt. Soc. Am. 73(6), 810–822.
55. Levi, A., and Stark, H. (1984). “Image restoration by the method of generalized projections with application to restoration from magnitude,” J. Opt. Soc. Am. A 1(9), 932–943.
56. Lu, T., Liang, L., and Wang, X. (2001). “Temporal and rate representations of time-varying signals in the auditory cortex of awake primates,” Nat. Neurosci. 11, 1131–1138.
57. Lyon, R., and Shamma, S. (1996). “Auditory representations of timbre and pitch,” in Auditory Computation, edited by H. Hawkins, E. T. McMullen, A. Popper, and R. Fay (Springer Verlag, New York), pp. 221–270.
58. Meddis, R., Hewitt, M. J., and Shackleton, T. M. (1990). “Implementation details of a computation model of the inner hair-cell/auditory-nerve synapse,” J. Acoust. Soc. Am. 87, 1813–1816.
59. Mesgarani, N., and Shamma, S. (2005). “Speech enhancement based on filtering the spectrotemporal modulations,” in Proc. ICASSP, Vol. 1, pp. 1105–1108.
60. Mesgarani, N., Slaney, M., and Shamma, S. (2004). “Discrimination of speech from non-speech based on multiscale spectro-temporal modulations,” IEEE Trans. Speech Audio Process. (accepted for publication).
61. Miller, L. M., Escabi, M. A., Read, H. L., and Schreiner, C. E. (2002). “Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex,” J. Neurophysiol. 87(1), 516–527.
62. Mou-yan, Z., and Unbehauen, R. (1997). “Methods for reconstruction of 2-D sequences from Fourier transform magnitude,” IEEE Trans. Image Process. 6(2), 222–233.
63. Nelken, I., and Versnel, H. (2000). “Responses to linear and logarithmic frequency-modulated sweeps in ferret primary auditory cortex,” Eur. J. Neurosci. 12(2), 549–562.
64. Pan, D. (1995). “A tutorial on MPEG audio compression,” IEEE Multimedia 2(2), 60–74.
65. Papoulis, A. (1975). “A new algorithm in spectral analysis and band-limited extrapolation,” IEEE Trans. Circuits Syst. CAS-22(9), 735–742.
66. Pfeiffer, R. R., and Kim, D. O. (1975). “Cochlear nerve fiber responses: distributing along the cochlear partition,” J. Acoust. Soc. Am. 58, 867–869.
67. Pitton, J. W., Wang, K., and Juang, B.-H. (1996). “Time-frequency analysis and auditory modeling for automatic recognition of speech,” Proc. IEEE 84(9), 1199–1215.
68. Roberts, B., Glasberg, B. R., and Moore, B. C. J. (2002). “Primitive stream segregation of tone sequences without differences in fundamental frequency or passband,” J. Acoust. Soc. Am. 112(5), 2074–2085.
69. Rosen, S. (1992). “Temporal information in speech: acoustic, auditory, and linguistic aspects,” Philos. Trans. R. Soc. London, Ser. B 336(10), 367–373.
70. Ru, P. (2000). “Perception-Based Multi-resolution Auditory Processing of Acoustic Signal,” Ph.D. thesis, University of Maryland, College Park, MD.
71. Ru, P., and Shamma, S. A. (1997). “Presentation of musical timbre in the auditory cortex,” J. New Music Res. 26(2), 154–169.
72. Schreiner, C. E., and Urbas, J. V. (1988a). “Representation of amplitude modulation in the auditory cortex of the cat. I: The anterior field,” Hear. Res. 21, 227–241.
73. Schreiner, C. E., and Urbas, J. V. (1988b). “Representation of amplitude modulation in the auditory cortex of the cat. II: Comparison between cortical fields,” Hear. Res. 32, 49–63.
74. Seldin, J. H., and Fienup, J. R. (1990). “Numerical investigation of the uniqueness of phase retrieval,” J. Opt. Soc. Am. A 7(3), 412–427.
75. Shamma, S. (2003). “Physiological foundations of temporal integration in the perception of speech,” J. Phonetics 31, 495–501.
76. Shamma, S., Chadwick, R., Wilbur, J., Morrish, K., and Rinzel, J. (1986). “A biophysical model of cochlear processing: Intensity dependence of pure tone responses,” J. Acoust. Soc. Am. 80, 133–145.
77. Shamma, S. A. (1985a). “Speech processing in the auditory system I: The representation of speech in the response of the auditory nerve,” J. Acoust. Soc. Am. 78, 1612–1621.
78. Shamma, S. A. (1985b). “Speech processing in the auditory system II: Lateral inhibition and the central processing of speech evoked activity in the auditory nerve,” J. Acoust. Soc. Am. 78, 1622–1632.
79. Shamma, S. A. (1989). “Spatial and temporal processing in central auditory networks,” in Methods in Neuronal Modeling, edited by C. Koch and I. Segev (MIT, Cambridge, MA), pp. 247–289.
80. Shamma, S. A., Versnel, H., and Kowalski, N. (1995). “Ripple analysis in the ferret auditory cortex: I. Response characteristics of single units to sinusoidally rippled spectra,” J. Aud. Neurosci. 1(2), 233–254.
81. Shamma, S. A., Fleshman, J. W., Wiser, P. R., and Versnel, H. (1993). “Organization of the response areas in ferret primary auditory cortex,” J. Neurophysiol. 69(2), 367–383.
82. Shannon, R. V., Zeng, F.-G., Wygonski, J., Kamath, V., and Ekelid, M. (1995). “Speech recognition with primarily temporal cues,” Science 270, 303–304.
83. Sheft, S., and Yost, W. (1990). “Temporal integration in amplitude modulation detection,” J. Acoust. Soc. Am. 88, 796–805.
84. Slaney, M. (1998). “Auditory toolbox: Version 2,” Technical Report 1998-010, Interval Research Corporation.
85. Slaney, M., Naar, D., and Lyon, R. F. (1994). “Auditory model inversion for sound separation,” in Proc. ICASSP, Vol. II, pp. 77–80.
86. Smith, Z. M., Delgutte, B., and Oxenham, A. J. (2002). “Chimaeric sounds reveal dichotomies in auditory perception,” Nature (London) 416(6876), 87–90.
87. Tchorz, J., and Kollmeier, B. (1999). “A model of auditory perception as front end for automatic speech recognition,” J. Acoust. Soc. Am. 106, 2040–2050.
88. ter Keurs, M., Festen, J. M., and Plomp, R. (1992). “Effect of spectral envelope smearing on speech reception. I,” J. Acoust. Soc. Am. 91, 2872–2880.
89. Ulanovsky, N., Las, L., and Nelken, I. (2003). “Processing of low-probability sounds by cortical neurons,” Nat. Neurosci. 6, 391–398.
90. Viemeister, N. F. (1979). “Temporal modulation transfer functions based upon modulation thresholds,” J. Acoust. Soc. Am. 66, 1364–1380.
91. Wang, K., and Shamma, S. A. (1994). “Self-normalization and noise-robustness in early auditory representations,” IEEE Trans. Speech Audio Process. 2(3), 421–435.
92. Wang, K., and Shamma, S. A. (1995). “Representation of spectral profiles in primary auditory cortex,” IEEE Trans. Speech Audio Process. 3(5), 382–395.
93. Watson, A. B., and Ahumada, A. J. (1985). “Model of human visual-motion sensing,” J. Opt. Soc. Am. A 2(2), 322–342.
94. Westerman, L. A., and Smith, R. L. (1984). “Rapid and short term adaptation in auditory nerve responses,” Hear. Res. 15, 249–260.
95. Yang, X., Wang, K., and Shamma, S. A. (1992). “Auditory representations of acoustic signals,” IEEE Trans. Inf. Theory 38(2), 824–839.