Articulatory Feature Based Automatic Speech Recognition Using Neural Network

Ifrat, Kazi; Israt, Kazi; Saimun, Imran Hossain; Akter, Fahima


articulatory feature based ASR system.pdf (3.785Mb)

Date

2018-07-06



Abstract

Many automatic speech recognition (ASR) systems based on hidden Markov models (HMMs) have been developed in recent years. Most of them use the mel-frequency cepstral coefficients (MFCCs) of the speech signal, which capture the time-frequency distribution of signal energy. The main purpose of this research is to improve the performance of ASR systems by introducing articulatory information, which describes how speech is produced rather than the properties of the acoustic signal.
Articulatory information is represented in terms of articulatory features (AFs). A phoneme can be identified by its unique AF set, which comprises the manner of articulation (vocalic, consonantal, continuant, etc.) and the place of articulation (tongue position: high, low, front, back, etc.). The use of AFs in ASR has been investigated previously and has been actively discussed in recent years. However, AFs are not widely used in place of MFCCs as ASR features, because AFs alone do not provide sufficient recognition performance. This thesis presents a method to extract distinctive phonetic features (DPFs), the AFs used in this work, with multilayer neural networks (MLNs). Because AFs are designed with full consideration of speech production, an AF space represents the distances among phonemes according to their articulatory differences, which suggests that DPFs are efficient feature parameters for ASR. In the AF extractor, the MLN takes 39-dimensional MFCC vectors as input and outputs 22-dimensional AF vectors.
In this work, a new Bangla speech corpus with proper transcriptions has been developed, and various acoustic feature extraction methods have been investigated to find an effective way to integrate them into a Bangla ASR system. The use of multiple acoustic features of the speech signal is considered for Bangla speech recognition. The features are usually a sequence of representative vectors extracted from the speech signal, and the classes are either words or subword units such as phonemes.
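The abstract fixes only the input and output sizes of the AF extractor: 39-dimensional MFCC vectors in, 22-dimensional AF vectors out. The following is a minimal sketch of such an MLN forward pass, assuming two sigmoid hidden layers of 256 units and one sigmoid output per AF (read as how strongly that feature is present in a frame). The layer sizes, the initialisation, and the names `init_mln` and `extract_af` are assumptions, not details from the thesis, and the weights are untrained.

```python
import numpy as np

MFCC_DIM = 39     # input size, from the abstract
AF_DIM = 22       # output size, from the abstract
HIDDEN_DIM = 256  # assumption: hidden layer size is not given in the source


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_mln(rng, hidden_dims=(HIDDEN_DIM, HIDDEN_DIM)):
    """Randomly initialised (untrained) weights for a hypothetical 39 -> ... -> 22 MLN."""
    dims = [MFCC_DIM, *hidden_dims, AF_DIM]
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        # Scale weights by 1/sqrt(fan-in) so sigmoid units start away from saturation.
        W = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_out))
        b = np.zeros(d_out)
        layers.append((W, b))
    return layers


def extract_af(layers, mfcc_frames):
    """Map a (T, 39) array of MFCC frames to a (T, 22) array of AF scores in (0, 1)."""
    h = np.asarray(mfcc_frames, dtype=float)
    for W, b in layers:
        h = sigmoid(h @ W + b)  # every layer, including the output, uses a sigmoid
    return h


rng = np.random.default_rng(0)
layers = init_mln(rng)
mfcc = rng.normal(size=(100, MFCC_DIM))  # random stand-in for 100 frames of real MFCCs
af = extract_af(layers, mfcc)
print(af.shape)  # (100, 22)
```

Per-output sigmoids (rather than a softmax) are used because several AFs can be active in the same frame. In a working extractor the weights would presumably be trained on frames labelled with the AF targets of their phonemes; this sketch shows only the shape of the mapping.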
The Bangla ASR system developed in this work models the probability distribution of feature vectors using HMMs and adopts the Viterbi algorithm for classification. Experimental results are presented for a medium-sized database of female speech samples, and the performance of the individual methods is compared. The results show that our proposed methods outperform the existing standard MFCC-based method.
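The abstract does not state the HMM topology, the number of states, or the emission densities, so the following is only a minimal sketch of HMM-based classification with the Viterbi algorithm. It assumes one HMM per word with diagonal-Gaussian state emissions, scored in the log domain to avoid underflow. The names `viterbi`, `diag_gaussian_loglik`, `recognize`, and the `(log_pi, log_A, means, variances)` model layout are hypothetical; the input frames could be MFCC vectors, AF vectors, or a combination of the two.

```python
import numpy as np


def viterbi(log_pi, log_A, log_B):
    """Most likely state path through an HMM.

    log_pi: (N,) log initial state probabilities
    log_A:  (N, N) log transitions, log_A[i, j] = log P(state j | state i)
    log_B:  (T, N) log emission likelihoods, log_B[t, j] = log p(o_t | state j)
    Returns (path, log probability of that path).
    """
    T, N = log_B.shape
    delta = np.empty((T, N))             # best log score ending in each state at time t
    psi = np.zeros((T, N), dtype=int)    # back-pointers
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: arrive at j from i
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + log_B[t]
    best_last = int(np.argmax(delta[-1]))
    best_score = float(delta[-1, best_last])
    path = [best_last]
    for t in range(T - 1, 0, -1):        # follow back-pointers to time 0
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, best_score


def diag_gaussian_loglik(frames, means, variances):
    """log N(o_t; mu_j, diag(var_j)) for every frame t and state j, shape (T, N)."""
    x = np.asarray(frames, dtype=float)[:, None, :]              # (T, 1, D)
    maha = ((x - means[None]) ** 2 / variances[None]).sum(axis=2)  # (T, N)
    log_norm = np.log(2.0 * np.pi * variances).sum(axis=1)        # (N,)
    return -0.5 * (maha + log_norm[None])


def recognize(frames, word_models):
    """Return the word whose HMM gives the highest Viterbi score for the frames."""
    best_word, best_score = None, -np.inf
    for word, (log_pi, log_A, means, variances) in word_models.items():
        log_B = diag_gaussian_loglik(frames, means, variances)
        _, score = viterbi(log_pi, log_A, log_B)
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

Using the Viterbi score of the single best path, rather than the full forward likelihood, is a common approximation in isolated-word recognition; the recognized word is simply the model with the highest score.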

URI

http://dspace.uiu.ac.bd/handle/52243/321

Collections

B.Sc Thesis/Project