PLDA-based speaker recognition. State-of-the-art speaker recognition techniques rely on generative pairwise models [8]. Index terms: x-vectors, PLDA, neural PLDA, soft detection cost, speaker veri... Introduction: speaker recognition accepts or rejects a claimed identity of a speaker based on speech input. An accurate estimation of speaker and channel subspaces from a multilin... In order to do speaker verification, the embeddings are extracted and used in a standard backend, e... I-vector/PLDA variants for text-dependent speaker recognition. Speaker recognition: until recently, most state-of-the-art speaker recognition systems were based on i-vectors [2]. A language-independent PLDA training algorithm has been proposed to improve the performance of text-independent speaker recognition under multilingual trial conditions. In this study, we use PLDA to transform speaker characteristics in the i-vector space.
The proposed model, termed neural PLDA (NPLDA), is initialized using the generative PLDA model parameters. In these evaluations, the canonical speaker detection task has always prescribed trials. In speaker or face recognition, PLDA factorizes the variability of the observations for a... Discriminative scoring for speaker recognition based on i-vectors. (Prince, 2007) Given a pair of i-vectors (w1, w2), the label d(w1, w2) = 1 means the two vectors come from the same speaker and 0 means they come from different speakers. I-vector feature representation with probabilistic linear discriminant analysis (PLDA) scoring in speaker recognition systems has recently... This paper studies the problem of speaker recognition for multi-speaker conversations using a modern DNN embedding-based system. PLDA-based speaker recognition on short utterances (QUT). Compensating inter-dataset variability in PLDA hyper... State-of-the-art speaker recognition for telephone and video.
This paper proposes a generalized framework for domain adaptation of probabilistic linear discriminant analysis (PLDA) in speaker recognition. We assume that the phrase labels are given for all utterances in PLDA training, speaker enrollment, and testing. Modified-prior PLDA and score calibration for duration... The DNNs most often found in speaker recognition are trained as acoustic models for automatic speech recognition (ASR), and are then used to enhance phonetic modeling in the i-vector UBM. The proposed approach takes advantage of multilingual utterances by bilingual speakers to improve speaker recognition in multilingual scenarios. Key method: in this paper, we propose a system that incorporates probabilistic linear discriminant analysis (PLDA) for i-vector scoring, a method already frequently utilized in speaker recognition tasks, and uses unsupervised calibration of the PLDA scores to determine the clustering stopping... Introduction: the series of NIST speaker recognition evaluations [1] has had a strong in...
Text-dependent speaker recognition using PLDA with uncertainty propagation, T. Moreover, because the utterances are short, the uncertainty in the i-vector estimates should be taken into account in the PLDA model. A PLDA approach for language- and text-independent speaker... An i-vector extractor suitable for speaker recognition with... The i-vector space gives a low-dimensional representation of a speech segment and of the training data of a PLDA model, which offers greater robustness under different conditions. This study proposes a language-independent PLDA training algorithm in order to reduce the effect of language on speaker recognition performance. In this work we investigate the application of one of these techniques, supervised PLDA MAP adaptation [6], to adapting a telephony speaker recognition system to microphone-channel speech. I-vector transformation and scaling for PLDA-based speaker... Recently we introduced a method named inter-dataset variability compensation (IDVC) in the context of speaker recognition in a mismatched dataset.
Introduction: automatic speaker recognition technology aims to distinguish the target speaker from the impostor by two main processing... Apr 18, 2018. [1] Anna Silnova, Mireia Diez, Oldrich Plchot, Pavel Matejka, Lukas Burget, "End-to-end DNN based speaker recognition inspired by i-vector and PLDA," IEEE SigPort, 2018. The well-known i-vector representation of speech segments has the convenient property... PLDA subsystem: among state-of-the-art speaker verification systems, leading positions are occupied by PLDA systems [3, 4], working in the...
Matejka, and Lukas Burget, "End-to-end DNN based speaker recognition inspired by i-vector and PLDA," arXiv e-prints, arXiv. Channel compensation for speaker recognition using MAP... PLDA-based speaker recognition on short utterances, by Ahilan Kanagasundaram, Robert J... The system combining i-vectors and probabilistic linear discriminant analysis (PLDA) has been applied with great success in the speaker recognition task. Local training for PLDA in speaker verification (arXiv). Proceedings of the Speaker and Language Recognition Workshop. Deep discriminant analysis for i-vector based robust speaker...
State-of-the-art speaker recognition techniques are usually considered for these representations, including Gaussian mixture models, JFA, i-vectors, and PLDA. Ideally, however, the NNs should be trained directly for the speaker verification task, i... This paper proposes to estimate parametric nonlinear transformations of i-vectors for speaker recognition systems based on probabilistic linear discriminant analysis (PLDA) classification. The i-vector/PLDA technique and its variants have also been successfully used in text-dependent speaker recognition tasks [8, 9, 10]. A PLDA approach for language- and text-independent speaker recognition, Abbas Khosravani, Mohammad Mehdi Homayounpour, Dijana Petrovska-Delacrétaz. Speaker recognition with random digit strings using... Although DNNs have been very successful in automatic speech recognition (ASR), a direct transition to speaker recognition is much more challenging. IDVC compensates dataset shifts in the i-vector space by constraining the shifts to a low... This is a big advantage of PLDA in speaker recognition, since in most situations only very few utterances are available for enrollment. The availability of more than one enrollment utterance for a speaker allows a variety of con... The G-PLDA model introduced in [3] then assumes that each i-vector can be decomposed as [...] (2). In the jargon of speaker recognition, the model comprises two parts.
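The decomposition itself did not survive in the text above, so the following is only a minimal sketch of one common way a two-part Gaussian PLDA model is scored, not the formulation of any cited paper. It assumes a simplified two-covariance form in which the i-vectors have already been centered and projected so that the within-speaker covariance is the identity and the between-speaker covariance is diagonal. The function name plda_llr and the parameter between_var are hypothetical.

```python
import math


def plda_llr(enroll, test, between_var):
    """Log-likelihood ratio that `enroll` and `test` share one speaker.

    Assumption (not from the source): both vectors live in a space where
    within-speaker covariance is the identity and between-speaker
    covariance is diagonal with entries `between_var`.
    """
    llr = 0.0
    for e, t, psi in zip(enroll, test, between_var):
        # Posterior mean of the speaker factor after one enrollment vector.
        post_mean = psi / (psi + 1.0) * e
        # Predictive variance: posterior variance plus unit within-speaker noise.
        pred_var = 1.0 + psi / (psi + 1.0)
        # Same-speaker hypothesis: test vector predicted from the enrollment.
        same = math.log(pred_var) + (t - post_mean) ** 2 / pred_var
        # Different-speaker hypothesis: marginal density of the test vector.
        diff = math.log(psi + 1.0) + t ** 2 / (psi + 1.0)
        # The 2*pi constants cancel in the ratio.
        llr += -0.5 * (same - diff)
    return llr


if __name__ == "__main__":
    psi = [4.0, 1.0]
    print(plda_llr([2.0, 1.0], [2.0, 1.0], psi))    # matching vectors
    print(plda_llr([2.0, 1.0], [-2.0, -1.0], psi))  # opposed vectors
```

A positive score favors the same-speaker hypothesis and a negative score favors different speakers. Because only one enrollment vector is needed to form the posterior, the sketch also reflects the point made above about enrollment with very few utterances.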
In this paper, we apply and enhance the i-vector/PLDA paradigm to text-dependent speaker recognition. In [1], the i-vector features were tested on the 2008 NIST Speaker Recognition Evaluation (SRE) telephone data. A PLDA model for text-dependent speaker recognition: in this section, we describe the phrase-dependent version of PLDA which we used in experimenting with the RSR data. The likelihood ratio score of the generative PLDA model is posed as a discriminative similarity function, and the learnable parameters of the score function are optimized using a veri... Nonlinear i-vector transformations for PLDA-based speaker... The i-vectors are smaller in size, which reduces the execution time of the recognition task while maintaining... A generalized framework for domain adaptation of PLDA in... PLDA, which is closely related to joint factor analysis (JFA) [15] used for speaker recognition, is a probabilistic extension of linear discriminant analysis (LDA).
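The sentence above on optimizing the score function is cut off, so the exact objective is not recoverable from this text. As a hedged illustration of what a differentiable ("soft") detection cost can look like, the sketch below replaces the hard miss and false-alarm counts with sigmoids of the score margin. The function names, the warping factor alpha, and the weight beta = 99 (which corresponds to unit costs and a target prior of 0.01) are assumptions, not values taken from the source.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_detection_cost(scores, labels, threshold, alpha=10.0, beta=99.0):
    """Smooth approximation of a normalized detection cost.

    scores    : verification scores, one per trial
    labels    : 1 for target (same-speaker) trials, 0 for non-target trials
    threshold : decision threshold applied to the scores
    alpha     : sigmoid sharpness (hypothetical default)
    beta      : weight of false alarms relative to misses (hypothetical default)
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    # Soft miss rate: target trials whose score falls below the threshold.
    soft_miss = sum(
        lab * (1.0 - sigmoid(alpha * (s - threshold)))
        for s, lab in zip(scores, labels)
    ) / n_target
    # Soft false-alarm rate: non-target trials scoring above the threshold.
    soft_fa = sum(
        (1 - lab) * sigmoid(alpha * (s - threshold))
        for s, lab in zip(scores, labels)
    ) / n_nontarget
    return soft_miss + beta * soft_fa
```

As alpha grows, the sigmoids approach step functions and the value approaches the usual hard-count detection cost, which is why such a surrogate can be minimized with gradient-based training.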
This package contains scripts that run the fast and scalable PLDA [1] and two-stage PLDA [2]. Unsupervised adaptation of PLDA models for broadcast... In past studies, neural networks have been investigated for speaker recognition [11, 12]. Many factors affect the variability of an i-vector extracted from a speech segment, such as the acoustic content, segment duration... I-vector transformation and scaling for PLDA-based speaker recognition, Sandro Cumani and Pietro Laface. Language-aware PLDA for multilingual speaker recognition. A big part of this improvement has been the availability of large quantities of speaker-labeled data from telephone recordings. It not only includes several existing supervised and unsupervised domain adaptation methods but also makes possible more flexible usage of available data in different domains. The Gaussian PLDA model assumes that the i-vectors are distributed according to the standard normal distribution.
Analysis of i-vector length normalization in speaker... Besides the original formulation in [7], there are other... Deep learning for i-vector speaker and language recognition. Section 2 describes the training, development, and evaluation data. Full-posterior PLDA in speaker recognition, technical... Also, research has shown that it is possible to recover biometric samples from templates for other modalities such as... PLDA for speaker verification with utterances of arbitrary duration. The vectors in the low-dimensional space are called i-vectors. On behaviour of PLDA models in the task of speaker recognition.
Introduction: the earliest successful approach to speaker recognition used Gaussian mixture modeling (GMM) of the training data followed by adaptation using the maximum a posteriori (MAP) rule [1]. Probabilistic linear discriminant analysis (PLDA) with... Mar 25, 2015: this package contains scripts that run the fast and scalable PLDA [1] and two-stage PLDA [2]. In this area, neural networks also contribute with solutions such as [21, 22]. Introduction: the impressive gains in performance obtained using deep neural networks (DNNs) for automatic speech recognition (ASR) [1] have motivated the application of DNNs to other speech technologies such as speaker recognition (SR) and language recognition (LR) [2-10]. A MATLAB toolbox for speaker recognition research, version 1. Deep neural networks for small-footprint text-dependent... Mixture of PLDA models in i-vector space for gender...
Over several decades, speaker recognition performance has steadily improved for applications using telephone speech. Speaker diarization via unsupervised i-vector clustering has gained popularity in recent years. PLDA baseline on both long- and short-duration utterances. These are joint factor analysis, its modified version known as the i-vector concept, and probabilistic linear discriminant analysis (PLDA). PLDA, still the performance of speaker recognition is affected under cross-source evaluation conditions. Due to its origin in text-independent speaker recognition, this paradigm does not make use of the phonetic content of each utterance. We should note that all i-vectors of the test set must be whitened. Nowadays, factor-analysis-based techniques have become part of state-of-the-art speaker recognition (SR) systems. STC speaker recognition system for the NIST i-vector...
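The text notes that test i-vectors must be whitened but does not say how. The following is a rough sketch of one possible preprocessing chain, assuming the mean and covariance are estimated on a separate training set, whitening uses a Cholesky factor of that covariance, and length normalization (projection onto the unit sphere) follows. All function names are hypothetical, and pure standard-library Python is used to keep the example self-contained; a numerical library would normally be preferred.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def covariance(vectors, mean):
    """Unbiased sample covariance matrix as a list of rows."""
    n, dim = len(vectors), len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        c = [x - m for x, m in zip(v, mean)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += c[i] * c[j] / (n - 1)
    return cov


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a positive definite a."""
    dim = len(a)
    low = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def whiten(vector, mean, low):
    """Solve L z = (vector - mean) by forward substitution."""
    c = [x - m for x, m in zip(vector, mean)]
    z = []
    for i in range(len(c)):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((c[i] - s) / low[i][i])
    return z


def length_normalize(vector):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]
```

In use, the mean and the Cholesky factor would be computed once from training i-vectors and then applied unchanged to every enrollment and test i-vector, so that both sides of a trial pass through the same transform before PLDA scoring.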