Articulatory Feature Based Automatic Speech Recognition Using Neural Network
Date
2018-07-06
Author
Ifrat, Kazi
Israt, Kazi
Saimun, Imran Hossain
Akter, Fahima
Abstract
Many ASR systems based on the hidden Markov model (HMM) have been developed in recent
years. Most of these use mel-frequency cepstral coefficients (MFCCs) of the speech signal,
which capture the time-frequency distribution of signal energy. The main purpose of this
research is to improve the performance of ASR systems by introducing articulatory
information. Articulatory information describes the features of speech production rather
than the features of the acoustic signal, and is represented in terms of articulatory
features (AFs). A phoneme can be identified by its unique AF set, which comprises the
manner of articulation (vocalic, consonantal, continuant, etc.) and the place of
articulation (tongue position: high, low, front, back, etc.). The use of AFs in ASR has been
investigated previously and has been actively discussed in recent years. However, AFs are
not widely used in place of MFCCs as ASR features because AFs alone do not provide
sufficient performance. This thesis presents a method to extract DPFs using multilayer
neural networks (MLNs). Since AFs are designed with full consideration of speech
production, an AF space represents well the distances among phonemes corresponding to
their articulatory differences. This suggests that DPFs are efficient feature parameters
for ASR. In the AF extractor, the MLN takes 39-dimensional MFCC vectors as input and
outputs 22-dimensional AF vectors.
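The input and output sizes of the AF extractor can be illustrated with a minimal
sketch. Only the 39-dimensional input and the 22-dimensional output come from the
abstract. The single hidden layer, its width (N_HIDDEN), the sigmoid activations, the
random untrained weights, and all function names are assumptions made for
illustration, not the thesis's actual network.

```python
import math
import random

N_MFCC = 39    # input size stated in the abstract
N_AF = 22      # output size stated in the abstract
N_HIDDEN = 64  # hypothetical hidden-layer width (not given in the abstract)


def sigmoid(x):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def layer_forward(x, weights, biases):
    """Apply one fully connected layer followed by a sigmoid."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def extract_af(mfcc_frame, hidden, output):
    """Map one 39-dimensional MFCC frame to a 22-dimensional AF vector."""
    if len(mfcc_frame) != N_MFCC:
        raise ValueError("expected a 39-dimensional MFCC frame")
    h = layer_forward(mfcc_frame, *hidden)
    return layer_forward(h, *output)


# Untrained network and a synthetic frame, used only to show the shapes.
rng = random.Random(0)
hidden = init_layer(N_MFCC, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_AF, rng)
frame = [rng.gauss(0.0, 1.0) for _ in range(N_MFCC)]
af = extract_af(frame, hidden, output)
print(len(af))
```

Each output lies between 0 and 1 and could be read as a score for one articulatory
feature. In practice the weights would be trained on frames labelled with the AF set of
the corresponding phoneme.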
In this work, a new Bangla speech corpus with proper transcriptions has been developed,
and various acoustic feature extraction methods have been investigated in order to
integrate them effectively into a Bangla ASR system. The use of multiple acoustic features
of the speech signal is considered for Bangla speech recognition. The features are usually
a sequence of representative vectors extracted from speech signals, and the classes are
either words or sub-word units such as phonemes.
The Bangla automatic speech recognition system developed in this work models the
probability distribution of feature vectors using hidden Markov models (HMMs) and adopts
the Viterbi algorithm for classification. Experimental results are presented for a
medium-sized database of female speech samples, and the performance of the individual
methods is compared. The proposed methods are found to outperform the existing standard
MFCC-based method.
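The Viterbi step mentioned above can be sketched as follows. This is a generic
log-domain Viterbi decoder over a toy two-state HMM, not the thesis's recognizer. The
function name, the transition values, and the per-frame likelihoods are all made up
for illustration. In a real system the per-frame scores would come from the HMM's
output distributions evaluated on the feature vectors.

```python
import math


def viterbi(log_init, log_trans, log_obs):
    """Return the most likely state path and its log score.

    log_init[s]     : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_obs[t][s]   : log likelihood of frame t under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at time t.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_obs[t][s])
            ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


def logs(values):
    """Convert a list of probabilities to log probabilities."""
    return [math.log(v) for v in values]


# Toy model: two states that tend to persist (illustrative values only).
log_init = logs([0.5, 0.5])
log_trans = [logs([0.9, 0.1]), logs([0.1, 0.9])]
# Hypothetical per-frame likelihoods for four frames.
log_obs = [logs([0.8, 0.2]), logs([0.7, 0.3]),
           logs([0.2, 0.8]), logs([0.1, 0.9])]

path, log_score = viterbi(log_init, log_trans, log_obs)
print(path)
```

Working in the log domain avoids numerical underflow when many frame likelihoods are
multiplied together, which matters for utterances with hundreds of frames.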
Collections
- B.Sc Thesis/Project [82]