Audio Input for Voice Recognition
By Dr. Jim on Sep 3, 2009 | In Welcome
Dr. Jim has created an audio filter for his voice recognition software. It takes raw vocal data, amplifies it (the first picture is the amplifier schematic), then passes it through a circuit he built that strips out the vibration of the vocal cords, which he calls the "carrier wave", so that all that is left is the modulation waveform. Analog-to-digital conversion takes place in this circuit after the carrier wave has been stripped off.
He is then able to pass this data, the modulation waveform, to his machine intelligence technology so that our robot can remember and learn from what it hears. We have effectively given our android synthetic "ears".

Here is Dr. Jim's video explaining this.
Voice Recognition Analysis
By Dr. Jim on Aug 20, 2009 | In Welcome, Science, Applications
The computer world has struggled with voice recognition and its correct implementation for many years. Only primitive and very poor attempts have been made thus far.
We will look at this problem starting from empirical observation of the human as a model.
You have vocal cords and a pharynx, a face and an oral cavity, and a tongue and teeth that form the words of human speech.
You have to be able to strike a note vibrating at a specific frequency in order to speak. Some people have lower or higher voices, that is, lower or higher frequencies with which they are most comfortable. That will be the voice with which you speak, the one most comfortable to you.
When you whisper, you have the rush of air out of your lungs and oral cavity, over the tongue and teeth, and out through the mouth. No vibration of vocal cords occurs with whispering.
It should be noted that words can be formed understandably with either vocal cords or by whispering. Again, whispering can be understood. This is a clue that everybody in voice recognition misses. So, obviously, the vocal cords only serve to project and increase the volume and distance that your words carry.
There are some inflections that can be added to your speech that increase or decrease volume only. These intonations are not necessary to understand speech. If these were necessary, you could not speak while whispering.
The vocal cords provide a carrier wave for the spoken word. When you whisper, that carrier wave is eliminated, yet speech can still be understood. All you have left is merely the rush of air through the oral cavity.
When you whisper, all you are modulating with your facial muscles is the rush of air from the lungs to the throat, tongue, teeth, and lips. There is no carrier wave, i.e. no vibration of the vocal cords.
Without the carrier wave that the vocal cords generate, you can still be understood just as well, as long as you can be heard.
We assume that the microphone (audio pickup) has sufficient sensitivity to pick up a whisper.
What all this means is that we can extract and discard the carrier wave, because it is not necessary for speech. That is the key. All you have left is the modulation. In a whisper, all you have is the modulation, nothing else. The rush of air in a whisper is the carrier, but it is nothing more than white noise in the audio spectrum.
Therefore, we have established that to do speaker-independent voice recognition, it is absolutely imperative that you remove the carrier whether it be the vibration of vocal cords or white noise, and analyze nothing but the modulation absent any carrier.
Would that not be step one in the creation of speaker-independent voice recognition, the removal of the carrier wave and the analysis of only the modulations?
This is in contrast to the current technology of voice recognition, where the carrier wave is always considered. This needlessly and exponentially increases the complexity and computation power necessary for voice recognition, not to mention for speaker independence.
By empirical observation alone, we have exponentially decreased the complexity of speech recognition.
My approach to speech recognition is to filter out all vestiges of the carrier wave leaving nothing but the modulation waveform.
On the oscilloscope, all you have left is the modulation wave, a far less complex waveform to analyze.
My voice recognition uses nothing but the modulation.
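As a rough illustration of what "nothing but the modulation" could look like in software, here is a minimal sketch of a classic envelope detector: rectify the signal, then smooth it with a moving average. This is a stand-in under stated assumptions, not the analog circuit described in this post. It assumes the modulation is the amplitude envelope, that the audio has already been digitized as unsigned 8-bit samples centered on 128, and that the names `envelope`, `window`, and `midpoint` are hypothetical.

```python
def envelope(samples, window=16, midpoint=128):
    """Rectify unsigned 8-bit samples around `midpoint`, then smooth them.

    Assumption: the "modulation" is the amplitude envelope. Only integer
    arithmetic is used, and the rectifier never produces a negative value.
    """
    # Full-wave rectification: distance of each sample from the midpoint.
    rectified = [s - midpoint if s >= midpoint else midpoint - s
                 for s in samples]

    # Moving average over the last `window` samples acts as a crude low-pass
    # filter that removes the fast carrier and keeps the slow envelope.
    out = []
    running = 0
    for i, r in enumerate(rectified):
        running += r
        if i >= window:
            running -= rectified[i - window]
        out.append(running // min(i + 1, window))
    return out


# Demo: a square-wave "carrier" (period 4 samples) whose amplitude
# jumps from 10 to 100 halfway through.
quiet = [128 + 10 if i % 4 < 2 else 128 - 10 for i in range(64)]
loud = [128 + 100 if i % 4 < 2 else 128 - 100 for i in range(64)]
env = envelope(quiet + loud)
print(env[63], env[127])  # prints: 10 100
```

The fast up-and-down of the carrier disappears and only the slow change in loudness survives, which is the kind of low-frequency waveform the steps below digitize.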
Here are all the steps I take to create my voice recognition software, except for the actual algorithms I use, which I will never divulge in source code, but will provide for sale in binary format only.
Step 1: We have to design a circuit to filter out the carrier wave component of speech, regardless of what that is, whether the vibration of vocal cords in normal speech, or the rush of air in a whisper. This should be done before digitization of the audio input and should leave nothing but the modulation waveforms.
Step 2: Now we digitize the raw modulation information. This is a relatively low frequency component. A sampling rate of one to two kSPS (thousand samples per second) should be more than sufficient to yield a good tracking of the modulation information.
Since most words require less than one second to pronounce, the raw digitized data will be less than or equal to 2000 bytes, where 1 byte = 1 sample. Eight bits per sample is more than sufficient to track this modulation waveform.
Step 3: We convert the 2000 samples, as an example, into a delta modulation waveform, which yields approximately 1 bit per sample. This, then, is a valid compression technique for the modulation component. This technique yields 250 bytes to represent the original 2000 bytes, or an 8:1 compression ratio of the original data.
Step 4: We pass the 250 bytes of information through the neural synaptic generator which then yields one synaptic interconnection to represent the original spoken word.
Step 5: Repeat steps 1 – 4 by speaking the word 10 – 20 times, which will represent 10 – 20 variations on the word being taught. This will yield a 95 – 98% recognition rate of the spoken word in a speaker-independent form.
This recognition percentage can be increased by merely increasing the number of teachers with differing dialects, such as northern, southern, eastern, and western and perhaps English and Australian dialects as well. The machine can then be allowed to make a “best guess” at the word, which further increases its accuracy.
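Steps 2 and 3 above can be sketched in a few lines. This is only an illustration of textbook delta modulation and of the 2000-byte to 250-byte arithmetic, not the actual software. The function names, the step size of 4, and the starting estimate of 128 are assumptions chosen for the example.

```python
def delta_modulate(samples, step=4, start=128):
    """Encode unsigned 8-bit samples as one bit per sample.

    A running estimate tracks the input: emit 1 and step the estimate up
    when the sample is above it, otherwise emit 0 and step it down.
    `step` and `start` are illustrative values, not taken from the post.
    """
    estimate = start
    bits = []
    for s in samples:
        if s > estimate:
            bits.append(1)
            estimate = min(255, estimate + step)
        else:
            bits.append(0)
            # Clamp at zero so the estimate never goes negative.
            estimate = estimate - step if estimate >= step else 0
    return bits


def pack_bits(bits):
    """Pack bits into bytes, most significant bit first (zero-padded)."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        byte <<= 8 - len(chunk)  # pad a short final chunk with zeros
        out.append(byte)
    return bytes(out)


# Demo: one second of a slow triangle wave at 2 kSPS, 1 byte per sample.
samples = [abs((i % 200) - 100) + 78 for i in range(2000)]
bits = delta_modulate(samples)
packed = pack_bits(bits)
print(len(samples), len(bits), len(packed))  # prints: 2000 2000 250
```

At one bit per sample, 2000 samples become 2000 bits, and 2000 / 8 = 250 bytes, which is the 8:1 ratio given in Step 3.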
These algorithms require no floating point or signed arithmetic to implement and can be implemented using an extremely simple instruction set, i.e. a RISC (reduced instruction set computer) environment.
This is why we don't need a powerful mainframe architecture; we merely need the speed, and that only in moderation.
By following these engineering axioms, a highly sensitive, speaker-independent speech recognition system can be implemented on a low-cost platform.
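To give a feel for how a "best guess" can be made with nothing but unsigned integer operations, here is a minimal sketch that compares a bit-packed pattern against stored examples by counting differing bits (Hamming distance). This is a hypothetical stand-in, not the neural synaptic generator or any of the undisclosed algorithms. The names `hamming_distance`, `best_guess`, and `templates`, and the tiny 4-byte patterns, are invented for the example.

```python
def hamming_distance(a, b):
    """Count the differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    # XOR marks the differing bits; counting the 1s needs no signed math.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def best_guess(pattern, templates):
    """Return (label, distance) of the stored example closest to `pattern`.

    `templates` maps each taught word to the list of examples recorded
    for it (for instance the 10 to 20 repetitions from Step 5).
    """
    best_label, best_dist = None, None
    for label, examples in templates.items():
        for example in examples:
            d = hamming_distance(pattern, example)
            if best_dist is None or d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist


# Demo with made-up 4-byte patterns (real ones would be about 250 bytes).
templates = {
    "yes": [bytes([0xFF] * 4), bytes([0xF0] * 4)],
    "no": [bytes([0x00] * 4)],
}
heard = bytes([0xFE, 0xFF, 0xFF, 0xFF])
print(best_guess(heard, templates))  # prints: ('yes', 1)
```

Adding more teachers simply adds more examples per word, so an unfamiliar voice has a better chance of landing near one of them.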
Let us compare this to the brain. In the neurobiological model (the brain) there are no adders, subtracters, floating-point units, or signed arithmetic units, and yet a child can learn to recognize one or several spoken languages with great accuracy.
The key to my version of voice recognition is, as Einstein stated, “empirical observation.”
Dr. Jim
Machine Intelligence Technologies
support@machineinteltech.com
Wiffiti Enabled MIT Home Page
By Dr. Jim on Aug 14, 2009 | In Welcome
Link: http://www.machineinteltech.com/
The Machine Intelligence Technologies home page has now been Wiffiti enabled. It is live and interactive. You may text @wif6005 + your message to 87884 to see your responses from web or mobile phone. Normal text rates apply.
Free KISS OS to Current Customers
By Dr. Jim on Aug 11, 2009 | In Welcome, Tools
Dr. Gouge asked me to do some data scrubbing and find out all the email addresses of our current customers who have purchased the SRAM Memory Expansion board from us.
We will be sending you an early edition of the KISS OS for free. Thank you for your continuing support, and have fun with KISS OS! We will be distributing it on Friday, August 14th, so keep an eye on your email inbox.
As the KISS OS needs the expanded memory to run, only those who have purchased our SRAM Memory Expansion board will be sent the operating system.
Thanks for everything,
Mark Allred
Machine Intelligence Technologies
support@machineinteltech.com
KISS OS Release
By Dr. Jim on Aug 8, 2009 | In Welcome, Tools
We have found some bugs in the KISS OS that we are currently working to fix. This is the reason we have not yet released it.
We are also working on the voice recognition component at the same time. What may happen is the voice recognition will be included in the first release. That is still up for debate, but it is getting closer while the bugs are being fixed in the OS, so if the voice recognition portion is not released with the OS, it will be soon after.
Thanks for your patience with this. Dr. Gouge is working very hard to bring this to fruition.
Mark Allred
Machine Intelligence Technologies