Parallel processing, with its tremendous ability to speed up computationally expensive problems, has always been a fascinating topic to me. I believe that, just like 3D graphics, speech recognition could benefit greatly from parallel processing via dedicated hardware. To support complex interfaces on embedded devices, dedicated hardware is an ideal way to go because it frees up the main CPU for other tasks, keeps power consumption down relative to equivalent solutions, and can also keep costs down if there is enough volume. My search, while not complete, pretty much led me to research papers. Then, on the advice of a co-worker, I decided to look for general-purpose parallel processing hardware and found that NVidia, a leader in graphics cards, has a program that enables developers to harness the parallel nature of their devices. The program is called CUDA, and there are a lot of cool applications people have built with it. Basically you just need a special driver, a compatible graphics card (an extra cost, I assume), and their developer kit, which comes with a custom C compiler.
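To give a feel for what that custom C dialect looks like, here is a minimal sketch (my own illustrative example, not taken from NVidia's materials) of a CUDA program that adds two vectors on the graphics card. Each GPU thread handles one element, which is the basic idea behind the massive parallelism:

```
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: runs on the GPU, one thread per array element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float ha[1024], hb[1024], hc[1024];
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    // Allocate GPU memory and copy the inputs over.
    float *da, *db, *dc;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch: enough 256-thread blocks to cover n elements.
    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

    // Copy the result back to the CPU and print one element.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("hc[10] = %f\n", hc[10]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```

The interesting part is the `<<<blocks, threads>>>` launch syntax: that is the extension the custom compiler (nvcc) handles, and it is what fans a single function call out across hundreds of hardware threads at once.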
Here is a detailed whitepaper describing how it all works: