AI Assistant: Speech to Text

 

My AI Assistant project, Project Calcifer, is one of my longer-term projects. It will consist of many interworking AI components. As such, it will require extensive study and expertise, as well as general advances in the field of AI, before it can be fully realized.

Main Focus

The goal of this project is to create a conversational digital assistant that gives my computer a human-level ability to communicate and to assist with all of my work. The main challenge is that the underlying technology is still being developed and is improving rapidly. What exists today has massive utility and potential for indie developers, but it is raw, with little documentation or support. I am approaching each tool individually so that, over time, I can build up the suite of capabilities an AI assistant needs.

The first step in development is implementing speech to text so that I can speak naturally and have my words transcribed and organized.

The Process

As the first step towards a larger AI Assistant, I used Microsoft Azure Cognitive Services to create a speech-to-text script that takes audio files and transcribes them into written documents. I have accumulated over 300 hours of spoken audio, which I am in the process of converting.
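A minimal sketch of what such a script could look like, assuming the Azure Speech SDK for Python (the azure-cognitiveservices-speech package) and WAV input. The function names, the folder layout, and the key and region parameters are illustrative assumptions, not the actual project code. Continuous recognition is used because a single recognize_once() call only returns the first utterance of a file.

```python
import threading
from pathlib import Path
from typing import Callable, List


def transcribe_with_azure(wav_path: Path, key: str, region: str) -> str:
    """Transcribe one WAV file using continuous recognition (illustrative)."""
    # Imported lazily so the batch helper below can be used without the SDK.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=str(wav_path))
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    phrases: List[str] = []
    done = threading.Event()

    def on_recognized(evt):
        # Fired once per finalized phrase; keep only successful recognitions.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            phrases.append(evt.result.text)

    recognizer.recognized.connect(on_recognized)
    # Either event signals that the file has been fully processed (or failed).
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait()
    recognizer.stop_continuous_recognition()
    return " ".join(phrases)


def transcribe_folder(
    audio_dir: Path, out_dir: Path, transcribe: Callable[[Path], str]
) -> List[Path]:
    """Write one .txt transcript per .wav file, skipping files already done."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for wav in sorted(audio_dir.glob("*.wav")):
        target = out_dir / (wav.stem + ".txt")
        if target.exists():
            continue  # already transcribed on an earlier run
        target.write_text(transcribe(wav), encoding="utf-8")
        written.append(target)
    return written
```

A call might look like transcribe_folder(Path("audio"), Path("transcripts"), lambda p: transcribe_with_azure(p, key, region)), with the key and region read from configuration. Skipping files that already have a transcript keeps a large backlog from being sent to the service twice, and passing the transcriber in as a function makes it easy to swap Azure for another engine later.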

The experience working with Azure’s speech to text has been very positive. The transcription is very accurate and the API is easy to work with.

The biggest problem with Azure is cost: because the speech to text runs through their cloud service, transcribing this much audio ends up costing quite a bit of money. I am exploring alternatives such as Nvidia’s speech to text offering and other open-source options.

Conclusion

Working on this project has given me a lot of hands-on experience with Azure tools and has familiarized me with the broader set of Azure offerings. I am continuing to use Azure to build the AI Assistant while expanding my knowledge of the other AI offerings that could support its further development.

 
Christopher DiCarlo