VOICE RECOGNITION SYSTEM using a microcontroller

A voice recognition system is a biometric technology. It is becoming popular for security purposes and for electronics projects among engineering students. Individuals are easily identified through it, and the chances of theft and fraud are reduced. Other methods of biometric identification include iris/eye scans, fingerprints, face scans, hand prints, voice prints and handwriting. Through a biometric voice recognition system, the unique voice characteristics of an individual can be recognized. This security system has a wide range of applications, for example for ATM manufacturers, automobile manufacturers and in cell phone security access systems, to eliminate theft and fraud. It also has many uses in embedded applications.


Voice recognition system

A voice recognition system is a device's capacity to understand spoken instructions; it is actually a type of embedded system. When used with a computer, an ADC converts the varying analog voice signal into digital pulses that the computer can easily process. The forms of speech are already stored on the hard drive. The incoming voice signal is decoded and checked against these stored forms. Sometimes, due to the presence of other voices and background noise, the output is not accurate.


In order to convert speech into a computer command, the computer performs several complex steps. First, the analog to digital converter converts the voice signal into a digital signal by sampling the sound wave at frequent intervals and taking precise measurements. The digitized sound is then filtered to remove noise and to separate it into different frequency bands. The sound is also normalized: different people speak at different speeds, so the signal is adjusted so that it can match the speed of the sound templates stored in the system's memory.
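The digitize-and-normalize step above can be sketched in a few lines of Python. This is only an illustration: a sine wave stands in for the analog voice signal, and the sample rate and duration are illustrative values, not taken from any particular hardware.

```python
import math

SAMPLE_RATE = 8000          # samples per second (8 kHz is common for speech)
DURATION = 0.01             # seconds of "speech" to digitize

def sample_signal(freq_hz):
    """Digitize an analog tone by measuring it at frequent, precise intervals."""
    n = int(SAMPLE_RATE * DURATION)
    return [0.3 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

def normalize(samples):
    """Scale the samples so the peak amplitude is 1.0, compensating for
    loudness differences between speakers."""
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]

samples = sample_signal(440.0)    # quiet 440 Hz tone standing in for speech
norm = normalize(samples)
print(max(abs(s) for s in norm))  # peak amplitude after normalization: 1.0
```

A real system would read the samples from the microcontroller's ADC instead of synthesizing them, but the normalization step is the same.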

The next step is to divide the signal into smaller segments, a few hundredths or thousandths of a second long. These segments are then matched against known phonemes. A phoneme is the smallest element of a language; the English language has approximately 40 phonemes, and different languages have different numbers of phonemes.
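The segmentation step can be sketched as follows. The 20 ms frame length and the dummy signal are illustrative choices, not values from the article.

```python
SAMPLE_RATE = 8000
FRAME_SECONDS = 0.02        # each segment is a few hundredths of a second

def split_into_frames(samples):
    """Divide the digitized signal into short, fixed-length segments."""
    size = int(SAMPLE_RATE * FRAME_SECONDS)   # samples per segment
    return [samples[i:i + size] for i in range(0, len(samples), size)]

signal = [0.0] * 800                  # 0.1 s of silence as a stand-in
frames = split_into_frames(signal)
print(len(frames), len(frames[0]))    # 5 segments of 160 samples each
```

Each of these short frames would then be compared against the phoneme models.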


Next comes the most difficult step in speech recognition. Each phoneme is examined in the context of the phonemes around it. A complex statistical model processes this contextual phoneme plot and compares it with a large library of words, phrases and sentences. The program then determines the words being said by the user and displays the output as text or issues a command. Earlier speech recognition systems applied a set of syntactical and grammatical rules: if the spoken words followed those rules, the words could be determined. But human language cannot be modeled by just a set of rules.

Voice recognition system models

Modern speech recognition systems use complicated and powerful statistical modeling. Different mathematical functions and probability techniques are used to determine the correct word or sentence. John Garofolo has described two models used for this:

  1. Hidden Markov Model
  2. Neural Networks

The hidden Markov model is the more widely used. In this model, a phoneme is treated as a link in a chain, and the completed chain represents a word. To determine the next phoneme, the chain branches into the different sounds that could come next, and each branched-off phoneme is given a probability score based on a built-in dictionary. The complete word is then determined from the highest-scoring chain.
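The chain-and-branches idea can be illustrated with a toy example. The tiny phoneme "dictionary", the branch probabilities, and the two candidate words below are all invented for illustration; a real recognizer would use thousands of entries and hidden states.

```python
# Branch probabilities: after each phoneme, which phonemes may come next.
NEXT_PHONEME = {
    "k":  {"ae": 0.7, "ih": 0.3},
    "ae": {"t": 0.8, "p": 0.2},
    "ih": {"t": 0.9, "k": 0.1},
}

# Completed chains and the words they represent.
WORDS = {("k", "ae", "t"): "cat", ("k", "ih", "t"): "kit"}

def score_chain(chain):
    """Multiply the branch probabilities along a chain of phonemes."""
    p = 1.0
    for a, b in zip(chain, chain[1:]):
        p *= NEXT_PHONEME.get(a, {}).get(b, 0.0)
    return p

best = max(WORDS, key=score_chain)
print(WORDS[best])    # the highest-scoring chain gives the word: cat
```

Here "cat" wins because 0.7 x 0.8 = 0.56 beats "kit"'s 0.3 x 0.9 = 0.27, which is exactly how the highest-scoring chain determines the complete word.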


HARDWARE DESIGN of Voice recognition system

The hardware design of a very basic voice recognition security system involves three main elements:

  1. A microphone circuit.
  2. A microcontroller circuit.
  3. LCD Display.

The microphone circuit is connected to the ADC of the microcontroller. A set of words and phrases is stored in the microcontroller's memory. When a word is spoken into the microphone, the ADC converts it into a digital signal, which passes through digital filters; finally, the LCD connected to the microcontroller displays the recognized word.




One type of voice recognition sensor uses ultrasonic processing, which is similar to radar. An ultra-high-frequency acoustic tone is directed at a moving object, and the reflections produced are recorded by a receiver. The frequency of the reflected tone is governed by the Doppler effect, which can be expressed as:

f = f0(1 + v/c)

where f0 = emitted tone frequency

f = reflected tone frequency

v = velocity of the reflecting surface towards the emitter

c = speed of sound

Thus, if the reflecting surface is moving away from the emitter, the recorded tone frequency will be lower, and vice versa. The reflected signal consists of a sum of sinusoids with different strengths and frequencies. When a person is talking, the motion of the articulators during speech production causes these reflections, and the resulting time-frequency patterns can be used to differentiate between speech sounds.
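A quick numeric check of the Doppler relation f = f0(1 + v/c) above. The 40 kHz tone and the 0.5 m/s articulator speed are illustrative example values.

```python
C_SOUND = 343.0      # speed of sound in air at room temperature, m/s

def reflected_frequency(f0, v):
    """Doppler relation f = f0 * (1 + v/c).
    v > 0: surface moving toward the emitter (frequency rises);
    v < 0: surface moving away from it (frequency falls)."""
    return f0 * (1 + v / C_SOUND)

print(reflected_frequency(40000.0, 0.5))    # slightly above 40 kHz
print(reflected_frequency(40000.0, -0.5))   # slightly below 40 kHz
```

The sign of v directly reproduces the conclusion above: motion away from the emitter lowers the recorded frequency, motion toward it raises it.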


Another sensor is a physiological sensor developed at the Army Research Laboratory. Worn around the throat, it physically couples to the wearer and records medical information such as heartbeat and respiration.

It is useful in very noisy environments. The words spoken by a person into a microphone are compared with the data obtained from the physiological sensor attached to the person's neck, and the words or sentences are then determined more easily.

Earlier, it was difficult to recognize speech using this sensor with IBM's ViaVoice because of distortions in the speech. Later, the Rockwell Science Center developed a hidden Markov model based speech recognizer to be used with the physiological sensor.

These are the two common sensors that have been developed. Research and development on other types of voice recognition sensors is already under way.


There are four types of voice recognition systems.


  1. ISOLATED VOICE RECOGNITION SYSTEM: This system requires a brief pause between spoken words.
  2. CONTINUOUS VOICE RECOGNITION SYSTEM: As the name suggests, this system does not require any pause between words.
  3. SPEAKER DEPENDENT VOICE RECOGNITION SYSTEM: This system identifies the speech of a single speaker only; only that speaker can get past the system.
  4. SPEAKER INDEPENDENT VOICE RECOGNITION SYSTEM: This system can identify the speech of any person.



A voice controlled robotic car is designed to be operated through human-machine interfacing. The robotic vehicle performs its operations based on voice commands given by a human. This is achieved by using an 89C51 microcontroller with a voice recognition module such as the HM2007. Voice commands or push buttons control the direction of the robotic car. The voice commands are sent over an RF link from the transmitter end to the receiver end. The car can move left, right, forward or backward depending on the command given to it.

Transmitter block diagram of robotic car

Two motors interfaced with the 89C51 microcontroller control the movement of the robotic car. The commands are encoded into digital data and sent by the RF transmitter. The receiver circuit in the vehicle receives the data and decodes it, passing it to another microcontroller that drives the DC motors. A motor driver IC controls the direction and movement of the car according to the decoded voice command data.
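The receiver-side decoding described above can be sketched as a lookup from a received command code to the two motor directions a driver IC would apply. The command codes and motor states here are hypothetical, not the actual HM2007/89C51 protocol.

```python
# Hypothetical command codes as decoded from the RF receiver.
COMMANDS = {
    0x01: "forward",
    0x02: "backward",
    0x03: "left",
    0x04: "right",
}

def motor_states(code):
    """Map a decoded command code to (left_motor, right_motor) directions."""
    word = COMMANDS.get(code, "stop")     # unknown codes stop the car
    return {
        "forward":  ("fwd", "fwd"),
        "backward": ("rev", "rev"),
        "left":     ("rev", "fwd"),       # turning: wheels spin opposite ways
        "right":    ("fwd", "rev"),
        "stop":     ("off", "off"),
    }[word]

print(motor_states(0x03))    # left turn -> ('rev', 'fwd')
```

On real hardware, each returned direction would set the corresponding input pins of the motor driver IC.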

Receiver block diagram of voice controlled robotic car

This voice controlled vehicle can also be operated using DTMF (dual-tone multi-frequency) technology for very long range communication. Through DTMF, the car can be controlled from a mobile phone.
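In DTMF, each phone key is a pair of tones (one low, one high frequency), so the receiver only has to detect the pair to know which key was pressed. The tone pairs below are the standard DTMF frequencies; the key-to-command mapping is an illustrative assumption.

```python
# Standard DTMF tone pairs (low Hz, high Hz) for a few keypad keys.
DTMF_FREQS = {
    "2": (697, 1336), "8": (852, 1336),
    "4": (770, 1209), "6": (770, 1477),
}

# Hypothetical mapping of keypad keys to car commands.
KEY_COMMANDS = {"2": "forward", "8": "backward", "4": "left", "6": "right"}

def decode_key(low, high):
    """Turn a detected tone pair into a driving command."""
    for key, pair in DTMF_FREQS.items():
        if pair == (low, high):
            return KEY_COMMANDS[key]
    return "stop"    # unrecognized tones stop the car

print(decode_key(770, 1209))    # key '4' -> 'left'
```

Because the tones travel over an ordinary phone call, this gives the car a range limited only by mobile network coverage.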
