Multiresolution spectrotemporal analysis of complex sounds

2.

Amagai

,

S.

,

Dooling

,

R.

,

Shamma

,

S.

,

Kidd

,

T.

, and

Lohr

,

B.

(

1999

). “

Detection of modulation in spectral envelopes and linear-rippled noises by budgerigars

,”

J. Acoust. Soc. Am.

105

,

2029

–

2035

.

3.

Arai

,

T.

,

Pavel

,

M.

,

Hermansky

,

H.

, and

Avendano

,

C.

(

1996

). “

Intelligibility of speech with filtered time trajectories of spectral envelopes

,”

Proc. ICSLP

, pp.

2490

–

2492

.

4.

Atal

,

B. S.

(

1974

). “

Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identifucation and verification

,”

J. Acoust. Soc. Am.

55

,

1304

–

1312

.

5.

Atlas

,

L.

, and

Shamma

,

S.

(

2003

). “

Joint acoustic and modulation frequency

,”

EURASIP J. Appl. Signal Process.

7

,

668

–

675

.

6.

Bacon

,

S. P.

, and

Grantham

,

D. W.

(

1989

). “

Modulation masking: Effects of modulation frequency, depth, and phase

,”

J. Acoust. Soc. Am.

85

,

2575

–

2580

.

7.

Baer

,

T.

, and

Moore

,

B. C. J.

(

1993

). “

Effects of spectral smearing on the intelligibility of sentences in noise

,”

J. Acoust. Soc. Am.

94

,

1229

–

1241

.

8.

Bates

,

R. H. T.

(

1984

). “

Uniqueness of solutions to two-dimensional fourier phase problems for localized and positive images

,”

Comput. Vis. Graph. Image Process.

25

,

205

–

217

.

9.

Calhoun

,

B.

, and

Schreiner

,

C.

(

1995

). “

Spectral envelope coding in cat primary auditory cortex

,”

J. Aud. Neuroscie.

1

,

39

–

61

.

10.

Carlyon

,

R.

, and

Shamma

,

S.

(

2003

). “

An account of monaural phase sensitivity

,”

J. Acoust. Soc. Am.

114

,

333

–

348

.

11.

Carney

,

L. H.

(

1993

). “

A model for the responses of low-frequency auditory-nerve fibers in cat

,”

J. Acoust. Soc. Am.

93

,

401

–

417

.

12.

Chi

,

T.

(

2003

). “

Computational Spectro-temporal Auditory Model with Applications to Acoustical Information Processing

,” Ph.D. thesis,

University of Maryland

, College Park, MD.

13.

Chi

,

T.

,

Gao

,

Y.

,

Guyton

,

C. G.

,

Ru

,

P.

, and

Shamma

,

S.

(

1999

). “

Spectro-temporal modulation transfer functions and speech intelligibility

,”

J. Acoust. Soc. Am.

106

,

2719

–

2732

.

14.

Cohen

,

J. R.

(

1989

). “

Application of an auditory model to speech recognition

,”

J. Acoust. Soc. Am.

85

,

2623

–

2633

.

15.

Dau

,

T.

,

Kollmeier

,

B.

, and

Kohlrausch

,

A.

(

1997a

). “

Modeling auditory processing of amplitude modulation. i. detection and masking with narrow-band carriers

,”

J. Acoust. Soc. Am.

102

,

2892

–

2905

.

16.

Dau

,

T.

,

Kollmeier

,

B.

, and

Kohlrausch

,

A.

(

1997b

). “

Modeling auditory processing of amplitude modulation. ii. spectral and temporal integration

,”

J. Acoust. Soc. Am.

102

,

2906

–

2919

.

17.

Dau

,

T.

,

Puschel

,

D.

, and

Kohlrausch

,

A.

(

1996

). “

A quantitative model of the effective signal processing in the auditory system. I. Model structure

,”

J. Acoust. Soc. Am.

99

,

3615

–

3622

.

18.

deCharms

,

R. C.

,

Blake

,

D. T.

, and

Merzenich

,

M. M.

(

1998

). “

Optimizing sound features for cortical neurons

,”

Science

280

(

5368

),

1439

–

1443

.

19.

Depireux

,

D.

,

Simon

,

J.

,

Klein

,

D.

, and

Shamma

,

S.

(

2001

). “

Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex

,”

J. Neurophysiol.

85

(

3

),

1220

–

1234

.

20.

deRibaupierre

,

F.

, and

Rouiller

,

E.

(

1981

). “

Temporal coding of repetitive clicks: presence of rate selective units in the cat’s medial geniculate body (mgb)

,”

J. Physiol. (London)

318

,

23

–

24

.

21.

Drullman

,

R.

(

1995

). “

Temporal envelope and fine structure cues for speech intelligibility

,”

J. Acoust. Soc. Am.

97

,

585

–

592

.

22.

Drullman

,

R.

,

Festen

,

J.

, and

Plomp

,

R.

(

1994

). “

Effect of temporal envelope smearing on speech reception

,”

J. Acoust. Soc. Am.

95

,

1053

–

1064

.

23.

Edamatsu

,

H.

,

Kawasaki

,

M.

, and

Suga

,

N.

(

1989

). “

Distribution of combination-sensitive neurons in the ventral fringe area of the auditory cortex of the mustached bat

,”

J. Neurophysiol.

61

(

1

),

202

–

207

.

24.

Eggermont

,

J. J.

(

2002

). “

Temporal modulation transfer functions in cat primary auditory cortex: Separating stimulus effects from neural mechanisms

,”

J. Neurophysiol.

87

,

305

–

321

.

25.

Elhilali

,

M.

,

Chi

,

T.

, and

Shamma

,

S. A.

(

2003

). “

A spectro-temporal modulation index (stmi) for assessment of speech intelligibility

,”

Speech Commun.

41

(

2–3

),

331

–

348

.

26.

Elhilali

,

M.

,

Fritz

,

J. B.

,

Klein

,

D. J.

,

Simon

,

J. Z.

, and

Shamma

,

S. A.

(

2004

). “

Dynamics of precise spike timing in primary auditory cortex

,”

J. Neurosci.

24

(

5

),

1159

–

1172

.

27.

Ewert

,

S. D.

, and

Dau

,

T.

(

2000

). “

Characterizing frequency selectivity for envelope fluctuations

,”

J. Acoust. Soc. Am.

108

,

1181

–

1196

.

28.

Fienup

,

J. R.

(

1982

). “

Phase retrieval algorithms: a comparison

,”

Appl. Opt.

21

,

2758

–

2769

.

29.

Fienup

,

J. R.

, and

Wackerman

,

C. C.

(

1987

). “

Phase-retrieval stagnation problems and solutions

,”

J. Opt. Soc. Am. A

3

(

11

),

1897

–

1907

.

30.

Fu

,

Q.-J.

, and

Shannon

,

R. V.

(

2000

). “

Effect of stimulation rate on phoneme recognition by nucleus-22 cochlear implant listeners

,”

J. Acoust. Soc. Am.

107

,

589

–

597

.

31.

Gerchberg

,

R. W.

, and

Saxton

,

W. O.

(

1972

). “

A practical algorithm for the determination of phase from image and diffraction plane pictures

,”

Optik (Jena)

35

,

237

–

246

.

32.

Ghitza

,

O.

(

2001

). “

On the upper cutoff frequency of the auditory critical-band envelope detectors in the context of speech perception

,”

J. Acoust. Soc. Am.

110

,

1628

–

1640

.

33.

Green

,

D. M.

(

1986

). “

Frequency and the detection of spectral shape change

,” in

Auditory Frequency Selectivity

(

Plenum

, New York), pp.

351

–

359

.

34.

Greenberg

,

S.

, and

Kingsbury

,

B.

(

1997

). “

The modulation spectrogram: In pursuit of an invariant representation of speech

,” in

Proc. ICASSP

, pp.

1647

–

1650

.

35.

Greenberg

,

S.

,

Arai

,

T.

, and

Silipo

,

R.

(

1998

). “

Speech intelligibility derived from exceedingly sparse spectral information

,” in

Proc. of the Intl. Conf. on Spoken Language Processing

, Sydney, pp.

2803

–

2806

.

36.

Grimault

,

N.

,

Bacon

,

S. P.

, and

Micheyl

,

C.

(

2002

). “

Auditory stream segregation on the basis of amplitude-modulation rate

,”

J. Acoust. Soc. Am.

111

,

1340

–

1348

.

37.

Hansen

,

M.

, and

Kollmeier

,

B.

(

1999

). “

Continuous assessment of time-varying speech quality

,”

J. Acoust. Soc. Am.

106

,

2888

–

2899

.

38.

Hayes

,

M. H.

(

1982

). “

The reconstruction of a multidimensional sequence from the phase or magnitude of its fourier transform

,”

IEEE Trans. Acoust., Speech, Signal Process.

ASSP-30

(

2

),

140

–

154

.

39.

Hayes

,

M. H.

(

1987

). “

The unique reconstruction of multidimensional sequences from fourier transform magnitude or phase

,” in

Image Recovery: Theory and Application

, edited by

H.

Stark

(

Academic

, San Diego), pp.

195

–

230

.

40.

Hayes

,

M. H.

,

Lim

,

J. S.

, and

Oppenheim

,

A. V.

(

1980

). “

Signal reconstruction from phase or magnitude

,”

IEEE Trans. Acoust., Speech, Signal Process.

ASSP-28

(

6

),

672

–

680

.

41.

Hermansky

,

H.

, and

Morgan

,

N.

(

1994

). “

Rasta processing of speech

,”

IEEE Trans. Speech Audio Process.

2

(

4

),

578

–

589

.

42.

Houtgast

,

T.

(

1989

). “

Frequency selectivity in amplitude-modulation detection

,”

J. Acoust. Soc. Am.

85

(

4

),

1676

–

1680

.

43.

Houtgast

,

T.

,

Steeneken

,

H. J. M.

, and

Plomp

,

R.

(

1980

). “

Predicting speech intelligibility in rooms from the modulation transfer function. i. general room acoustics

,”

Acustica

46

,

60

–

72

.

44.

Irino

,

T.

, and

Kawahara

,

H.

(

1993

). “

Signal reconstruction from modified auditory wavelet transform

,”

IEEE Trans. Signal Process.

41

(

12

),

3549

–

3554

.

45.

ITU-T (

2001

). “

Perceptual evaluation of speech quality (pesq): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs

,” ITU-T Recommendation P.862, February.

46.

Jones

,

J. P.

, and

Palmer

,

L. A.

(

1987

). “

An evaluation of the two-dimensional gabor filter model of simple receptive fields in cat striate cortex

,”

J. Neurophysiol.

58

(

6

),

1233

–

1258

.

47.

Joris

,

P.

, and

Yin

,

T. C.

(

1992

). “

Responses to amplitude-modulated tones in the auditory nerve of the cat

,”

J. Acoust. Soc. Am.

91

,

215

–

232

.

48.

Klein

,

D. J.

,

Depireux

,

D. A.

,

Simon

,

J. Z.

, and

Shamma

,

S. A.

(

2000

). “

Robust spectro temporal reverse correlation for the auditory system: Optimizing stimulus design

,”

J. Comput. Neurosci.

9

,

85

–

111

.

49.

Kleinschmidt

,

M.

,

Tchorz

,

J.

, and

Kollmeier

,

B.

(

2001

). “

Combining speech enhancement and auditory feature extraction for robust speech recognition

,”

Speech Commun.

34

(

1–2

),

75

–

91

.

50.

Kowalski

,

N.

,

Depireux

,

D.

, and

Shamma

,

S. A.

(

1996

). “

Analysis of dynamic spectra in ferret primary auditory cortex: I. Characteristics of single unit responses to moving ripple spectra

,”

J. Neurophysiol.

76

(

5

),

3503

–

3523

.

51.

Kryter

,

K.

(

1962

). “

Methods for the calculation and use of the articulation index

,”

J. Acoust. Soc. Am.

34

,

1689

–

2147

.

52.

Langner

,

G.

(

1992

). “

Periodicity coding in the auditory system

,”

Hear. Res.

60

,

115

–

142

.

53.

Langner

,

G.

, and

Schreiner

,

C. E.

(

1988

). “

Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms

,”

J. Neurophysiol.

60

(

6

),

1799

–

1822

.

54.

Levi

,

A.

, and

Stark

,

H.

(

1983

). “

Signal restoration from phase by projections onto convex sets

,”

J. Opt. Soc. Am.

73

(

6

),

810

–

822

.

55.

Levi

,

A.

, and

Stark

,

H.

(

1984

). “

Image restoration by the method of generalized projections with application to restoration from magnitude

,”

J. Opt. Soc. Am. A

1

(

9

),

932

–

943

.

56.

Lu

,

T.

,

Liang

,

L.

, and

Wang

,

X.

(

2001

). “

Temporal and rate representations of time-varying signals in the auditory cortex of awake primates

,”

Nat. Neurosci.

11

,

1131

–

1138

.

57.

Lyon

,

R.

, and

Shamma

,

S.

(

1996

). “

Auditory representations of timbre and pitch

,” in

Auditory Computation

, edited by

H.

Hawkins

,

E. T.

McMullen

,

A.

Popper

, and

R.

Fay

(

Springer Verlag

, New York), pp.

221

–

270

.

58.

Meddis

,

R.

,

Hewitt

,

M. J.

, and

Shackleton

,

T. M.

(

1990

). “

Implementation details of a computation model of the inner hair-cell/auditory-nerve synapse

,”

J. Acoust. Soc. Am.

87

,

1813

–

1816

.

59.

Mesgarani

,

N.

, and

Shamma

,

S.

(

2005

). “

Speech enhancement based on filtering the spectrotemporal modulations

,” in

Proc. ICASSP

. Vol. 1, pp.

1105

–

1108

.

60.

Mesgarani

,

N.

,

Slaney

,

M.

, and

Shamma

,

S.

(

2004

). “

Discrimination of speech from non-speech based on multiscale spectro-temporal modulations

,”

IEEE Trans. Speech Audio Process.

(accepted for publication).

61.

Miller

,

L. M.

,

Escabi

,

M. A.

,

Read

,

H. L.

, and

Schreiner

,

C. E.

(

2002

). “

Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex

,”

J. Neurophysiol.

87

(

1

),

516

–

527

.

62.

Mou-yan

,

Z.

, and

Unbehauen

,

R.

(

1997

). “

Methods for reconstruction of 2-d sequences from fourier transform magnitude

,”

IEEE Trans. Image Process.

6

(

2

),

222

–

233

.

63.

Nelken

,

I.

, and

Versnel

,

H.

(

2000

). “

Responses to linear and logarithmic frequency-modulated sweeps in ferret primary auditory cortex

,”

Eur. J. Neurosci.

12

(

2

),

549

–

562

.

64.

Pan

,

D.

(

1995

). “

A tutorial on mpeg audio compression

,”

IEEE Multimedia

2

(

2

),

60

–

74

.

65.

Papoulis

,

A.

(

1975

). “

A new algorithm in spectral analysis and band-limited extrapolation

,”

IEEE Trans. Circuits Syst.

CAS-22

(

9

),

735

–

742

.

66.

Pfeiffer

,

R. R.

, and

Kim

,

D. O.

(

1975

). “

Cochlear nerve fiber responses: distributing along the cochlear partition

,”

J. Acoust. Soc. Am.

58

,

867

–

869

.

67.

Pitton

,

J. W.

,

Wang

,

K.

, and

Juang

,

B.-H.

(

1996

). “

Time-frequency analysis and auditory modeling for automatic recognition of speech

,”

Proc. IEEE

84

(

9

),

1199

–

1215

.

68.

Roberts

,

B.

,

Glasberg

,

B. R.

, and

Moore

,

B. C. J.

(

2002

). “

Primitive stream segregation of tone sequences without differences in fundamental frequency or passband

,”

J. Acoust. Soc. Am.

112

(

5

),

2074

–

2085

.

69.

Rosen

,

S.

(

1992

). “

Temporal information in speech: acoustic, auditory, and linguistic aspects

,”

Philos. Trans. R. Soc. London, Ser. B

336

(

10

),

367

–

373

.

70.

Ru

,

P.

(

2000

). “

Perception-Based Multi-resolution Auditory Processing of Acoustic Signal

,” Ph.D. thesis,

University of Maryland

, College Park, MD.

71.

Ru

,

P.

, and

Shamma

,

S. A.

(

1997

). “

Presentation of musical timbre in the auditory cortex

,”

J. New Music Res.

26

(

2

),

154

–

169

.

72.

Schreiner

,

C. E.

, and

Urbas

,

J. V.

(

1988a

). “

Representation of amplitude modulation in the auditory cortex of the cat. i: The anterior field

,”

Hear. Res.

21

,

227

–

241

.

73.

Schreiner

,

C. E.

, and

Urbas

,

J. V.

(

1988b

). “

Representation of amplitude modulation in the auditory cortex of the cat. ii: Comparison between cortical fields

,”

Hear. Res.

32

,

49

–

63

.

74.

Seldin

,

J. H.

, and

Fienup

,

J. R.

(

1990

). “

Numerical investigation of the uniqueness of phase retrieval

,”

J. Opt. Soc. Am. A

7

(

3

),

412

–

427

.

75.

Shamma

,

S.

(

2003

). “

Physiological foundations of temporal integration in the perception of speech

,”

J. Phonetics

31

,

495

–

501

.

76.

Shamma

,

S.

,

Chadwick

,

R.

,

Wilbur

,

J.

,

Morrish

,

K.

, and

Rinzel

,

J.

(

1986

). “

A biophysical model of cochlear processing: Intensity dependence of pure tone responses

,”

J. Acoust. Soc. Am.

80

,

133

–

145

.

77.

Shamma

,

S. A.

(

1985a

). “

Speech processing in the auditory system I: The representation of speech in the response of the auditory nerve

,”

J. Acoust. Soc. Am.

78

,

1612

–

1621

.

78.

Shamma

,

S. A.

(

1985b

). “

Speech processing in the auditory system II: Lateral inhibition and the central processing of speech evoked activity in the auditory nerve

,”

J. Acoust. Soc. Am.

78

,

1622

–

1632

.

79.

Shamma

,

S. A.

(

1989

). “

Spatial and temporal processing in central auditory networks

,” in

Methods in Neuronal Modeling

, edited by

C.

Koch

and

I.

Segev

(

MIT

, Cambridge, MA), pp.

247

–

289

.

80.

Shamma

,

S. A.

,

Versnel

,

H.

, and

Kowalski

,

N.

(

1995

). “

Ripple analysis in the ferret auditory cortex: I. Response characteristics of single units to sinusoidally rippled spectra

,”

J. Aud. Neurosci.

1

(

2

),

233

–

254

.

81.

Shamma

,

S. A.

,

Fleshman

,

J. W.

,

Wiser

,

P. R.

, and

Versnal

,

H.

(

1993

). “

Organization of the response areas in ferret primary auditory cortex

,”

J. Neurophysiol.

69

(

2

),

367

–

383

.

82.

Shannon

,

R. V.

,

Zeng

,

F.-G.

,

Wygonski

,

J.

,

Kamath

,

V.

, and

Ekelid

,

M.

(

1995

). “

Speech recognition with primarily temporal cues

,”

Science

270

,

303

–

304

.

83.

Sheft

,

S.

, and

Yost

,

W.

(

1990

). “

Temporal integration in amplitude modulation detection

,”

J. Acoust. Soc. Am.

88

,

796

–

805

.

84.

Slaney

,

M.

(

1998

). “

Auditory toolbox: Version 2

,” Technical Report 1998-010, Interval Research Corporation.

85.

Slaney

,

M.

,

Naar

,

D.

, and

Lyon

,

R. F.

(

1994

). “

Auditory model inversion for sound separation

,” in

Proc. ICASSP

, Vol.

II

, pp.

77

–

80

.

86.

Smith

,

Z. M.

,

Delgutte

,

B.

, and

Oxenham

,

A. J.

(

2002

). “

Chimaeric sounds reveal dichotomies in auditory perception

,”

Nature (London)

416

(

6876

),

87

–

90

.

87.

Tchorz

,

J.

, and

Kollmeier

,

B.

(

1999

). “

A model of auditory perception as front end for automatic speech recognition

,”

J. Acoust. Soc. Am.

106

,

2040

–

2050

.

88.

ter Keurs

,

M.

,

Festen

,

J. M.

, and

Plomp

,

R.

(

1992

). “

Effect of spectral envelope smearing on speech reception. I

,”

J. Acoust. Soc. Am.

91

,

2872

–

2880

.

89.

Ulanovsky

,

N.

,

Las

,

L.

, and

Nelken

,

I.

(

2003

). “

Processing of low-probability sounds by cortical neurons

,”

Nat. Neurosci.

6

,

391

–

398

.

90.

Viemeister

,

N. F.

(

1979

). “

Temporal modulation transfer functions based upon modulation thresholds

,”

J. Acoust. Soc. Am.

66

,

1364

–

1380

.

91.

Wang

,

K.

, and

Shamma

,

S. A.

(

1994

). “

Self-normalization and noise-robustness in early auditory representations

,”

IEEE Trans. Speech Audio Process.

2

(

3

),

421

–

435

.

92.

Wang

,

K.

, and

Shamma

,

S. A.

(

1995

). “

Representation of spectral profiles in primary auditory cortex

,”

IEEE Trans. Speech Audio Process.

3

(

5

),

382

–

395

.

93.

Watson

,

A. B.

, and

Ahumada

,

A. J.

(

1985

). “

Model of human visual-motion sensing

,”

J. Opt. Soc. Am. A

2

(

2

),

322

–

342

.

94.

Westerman

,

L. A.

, and

Smith

,

R. L.

(

1984

). “

Rapid and short term adaptation in auditory nerve responses

,”

Hear. Res.

15

,

249

–

260

.

95.

Yang

,

X.

,

Wang

,

K.

, and

Shamma

,

S. A.

(

1992

). “

Auditory representations of acoustic signals

,”

IEEE Trans. Inf. Theory

38

(

2

),

824

–

839

.