Speaker Identification using Data-Driven Score Classification

Hock Gan, Iosif Mporas, Saeid Safavi, Reza Sotudeh

Abstract


We present a comparative evaluation of classification algorithms for a fusion engine used in a speaker identity selection task. The fusion engine combines the scores from a number of classifiers, each of which uses the GMM-UBM approach to match speaker identity. The performance of the evaluated classification algorithms was examined in both the text-dependent and text-independent operation modes. The experimental results indicated a significant improvement in speaker identification accuracy of approximately 7% and 14.5% for the text-dependent and text-independent scenarios, respectively. Based on these findings, we suggest using fusion with a discriminative algorithm such as a Support Vector Machine in real-world speaker identification applications where the text-independent scenario predominates.
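As a rough illustration of score-level fusion, the sketch below trains a small classifier on vectors of per-speaker scores and compares it with simply picking the highest score. It is a minimal sketch under stated assumptions, not the system evaluated in the paper: the score values are invented stand-ins for GMM-UBM outputs, a multiclass perceptron is swapped in for the Support Vector Machine to keep the example dependency-free, and all names (`SCORES`, `train_perceptron`, `predict`) are hypothetical.

```python
# Hypothetical score vectors: entry k is the score that speaker k's model
# assigns to one utterance. The values are made up for illustration; model 2
# is deliberately over-confident so that a plain argmax fails.
SCORES = [
    [3.0, 0.1, 3.6], [2.8, -0.2, 3.4],   # utterances from speaker 0
    [0.2, 3.1, 3.6], [-0.1, 2.7, 3.5],   # utterances from speaker 1
    [0.1, 0.0, 6.5], [0.3, -0.1, 6.3],   # utterances from speaker 2
]
LABELS = [0, 0, 1, 1, 2, 2]
N_SPEAKERS = 3


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda k: values[k])


def fused_scores(weights, x):
    """One linear score per speaker, computed from the raw score vector."""
    xb = list(x) + [1.0]  # append a constant bias input
    return [sum(w * v for w, v in zip(row, xb)) for row in weights]


def predict(weights, x):
    """Speaker identity chosen by the learned fusion classifier."""
    return argmax(fused_scores(weights, x))


def train_perceptron(X, y, n_classes, max_epochs=1000):
    """Multiclass perceptron; an assumed stand-in for an SVM."""
    dim = len(X[0]) + 1
    weights = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            pred = predict(weights, x)
            if pred != label:
                xb = list(x) + [1.0]
                for j in range(dim):
                    weights[label][j] += xb[j]  # pull the true class up
                    weights[pred][j] -= xb[j]   # push the wrong class down
                mistakes += 1
        if mistakes == 0:  # every training vector is classified correctly
            break
    return weights


def accuracy(predictions, labels):
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


# Baseline: pick the speaker whose model gave the highest raw score.
baseline_acc = accuracy([argmax(x) for x in SCORES], LABELS)

# Data-driven fusion: learn the decision from the score vectors themselves.
fusion_weights = train_perceptron(SCORES, LABELS, N_SPEAKERS)
fused_acc = accuracy([predict(fusion_weights, x) for x in SCORES], LABELS)

print(f"argmax baseline: {baseline_acc:.2f}, learned fusion: {fused_acc:.2f}")
```

On this toy data the raw argmax is misled by the over-confident model, while the learned fusion separates the training vectors. Both figures are training-set accuracies on invented numbers and say nothing about the improvements reported above.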



Copyright (c) 2016 IMAGE PROCESSING & COMMUNICATIONS