How voice assistants work | We explain it!

How are the user interfaces of voice assistants structured?

Conversational user interfaces (CUI) or voice user interfaces (VUI) replace the graphical user interface (GUI) in voice assistants. For these systems to respond to a wide variety of questions and phrasings, they must keep track of the context of use. This is the fundamental difference between a CUI and a GUI. In a GUI, the windows, menus and dialog areas form a context that the program can recognize easily. With language, especially dialogue-oriented language, the context emerges from the sentences themselves. The program must therefore use NLP (natural language processing) to react to context changes far more flexibly than a classic graphical interface has to.
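To illustrate how context emerges from the sentences themselves, here is a minimal Python sketch. It assumes a simple keyword lookup in place of real NLP, and the names DialogContext and extract_slots are hypothetical, chosen only for this example. It shows how a follow-up question such as "And tomorrow?" can only be answered by carrying information over from the previous turn.

```python
from dataclasses import dataclass, field


@dataclass
class DialogContext:
    """Remembers values from earlier turns (hypothetical structure)."""
    slots: dict = field(default_factory=dict)

    def update(self, new_slots: dict) -> dict:
        # Values from the current sentence override remembered ones;
        # anything not mentioned is carried over from previous turns.
        self.slots.update(new_slots)
        return dict(self.slots)


def extract_slots(utterance: str) -> dict:
    """Toy extraction by keyword lookup; a real system would use NLP."""
    text = utterance.lower()
    slots = {}
    for city in ("berlin", "munich"):
        if city in text:
            slots["city"] = city
    for day in ("today", "tomorrow"):
        if day in text:
            slots["day"] = day
    return slots


context = DialogContext()
first = context.update(extract_slots("What is the weather in Berlin today?"))
second = context.update(extract_slots("And tomorrow?"))
print(first)
print(second)
```

The second turn never mentions a city, yet the merged context still contains it. A GUI gets this information for free from the window or form that is currently open.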

How do voice-controlled dialogue systems work?

Dialog systems use NLP to analyze language in text form and to understand the user’s concern (intent). Once the intent has been recognized, the required action is carried out via the underlying API connections and the result is returned to the user as feedback. NLP always operates on text, whatever the language. To convert spoken language, STT (speech-to-text) engines are used, which generate text from speech that NLP can process. For output, either the text is used directly or it is converted back into spoken language by a TTS (text-to-speech) engine.
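The processing chain can be sketched in a few lines of Python: speech is converted to text, the intent is recognized, an action is carried out, and the reply is converted back to speech. This is only an illustration under simplifying assumptions: the speech engines are placeholders that pass text through as bytes, intent recognition is a keyword match, and all function names, intents and replies are made up for this example.

```python
from typing import Callable


def speech_to_text(audio: bytes) -> str:
    """Placeholder STT: the 'audio' here is just UTF-8 encoded text."""
    return audio.decode("utf-8")


def text_to_speech(text: str) -> bytes:
    """Placeholder TTS: returns the text as bytes instead of real audio."""
    return text.encode("utf-8")


def detect_intent(text: str) -> str:
    """Toy intent recognition by keyword; real systems use trained models."""
    lowered = text.lower()
    if "open" in lowered:
        return "opening_hours"
    if "order" in lowered:
        return "order_status"
    return "fallback"


# Each intent maps to an action; in practice this would call a backend API.
ACTIONS: dict[str, Callable[[], str]] = {
    "opening_hours": lambda: "We are open from 9 am to 6 pm.",
    "order_status": lambda: "Your order is on its way.",
    "fallback": lambda: "Sorry, I did not understand that.",
}


def handle_voice_request(audio: bytes) -> bytes:
    text = speech_to_text(audio)    # spoken input -> text
    intent = detect_intent(text)    # text -> intent
    reply = ACTIONS[intent]()       # intent -> action result
    return text_to_speech(reply)    # reply text -> spoken output


print(handle_voice_request(b"When are you open?").decode("utf-8"))
```

For a pure chatbot, the first and last steps are simply skipped and the text is passed straight through.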

Possible access and output media for virtual assistants

In principle, there are many possible entry points to virtual assistants. The advantage is that all of them work with language, so only the input nodes need to be defined, while processing and information retrieval remain the same. This saves development time and budget, especially when many different access routes are involved.

Telephone / dial-in
Microphones at the POA / POS
Apps & websites as chatbots or voice systems
Text messages (SMS, WhatsApp, Telegram)
Voice messages (WhatsApp, Facebook)
Social Media (Facebook / Twitter / Instagram)
Letters and faxes via OCR
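The idea that only the input nodes differ while processing stays the same can be sketched as follows. This is a simplified Python illustration: the channel names, payload fields and the answer function are assumptions made for this example and do not correspond to any real messaging API.

```python
from typing import Callable


def answer(text: str) -> str:
    """Shared processing; stands in for NLP plus information retrieval."""
    if "price" in text.lower():
        return "The price is 10 euros."
    return "How can I help you?"


# One small adapter per entry point. Each one only turns the raw input
# of its channel into plain text (payload fields are hypothetical).
def from_sms(payload: dict) -> str:
    return payload["body"]


def from_chat(payload: dict) -> str:
    return payload["message"].strip()


def from_phone(payload: dict) -> str:
    # Assumes an STT engine has already produced a transcript.
    return payload["transcript"]


ADAPTERS: dict[str, Callable[[dict], str]] = {
    "sms": from_sms,
    "chat": from_chat,
    "phone": from_phone,
}


def handle(channel: str, payload: dict) -> str:
    text = ADAPTERS[channel](payload)  # channel-specific input node
    return answer(text)                # identical for every channel


print(handle("sms", {"body": "What is the price?"}))
print(handle("phone", {"transcript": "Tell me the price please"}))
```

Adding a further entry point then means writing one more adapter, while the shared processing remains untouched.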

Other examples of voice assistants: Google Home Mini and Siri

How does natural language processing (NLP) work?

All natural language processing approaches share one trait: they consider the hierarchies that determine the relationships between individual words. This is difficult because many words have multiple meanings: “net”, for example, can be used in a financial context (“What was your net gain for the year?”), but also in an entirely different one (“He caught the fish in a net.”). For this reason, natural language processing is one of the most complicated areas in computer science. Language is often ambiguous, and understanding it requires extensive knowledge of the context in which it is used. To teach computers natural language, computational linguists draw on knowledge from several areas of linguistics:

Morphology deals with the composition of words and their relationships to other words.
Syntax defines how words are put together to form sentences.
Semantics is about the meaning of words and groups of words.
Pragmatics takes the context of linguistic utterances into account.
Finally, phonology deals with the acoustic structure of spoken language and is important for speech recognition.
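The ambiguity of "net" described above can be made concrete with a toy example. The following Python sketch picks a meaning by counting which context words appear in the same sentence. The cue word lists are invented for this illustration, and the approach is far simpler than what production NLP systems do.

```python
import re

# Hypothetical cue words per meaning of "net", chosen for this example.
SENSE_CUES = {
    "finance": {"gain", "income", "profit", "year", "tax"},
    "fishing": {"fish", "caught", "boat", "sea"},
}


def disambiguate_net(sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # Without any cue word in the sentence, the meaning stays undecided.
    return best if scores[best] > 0 else "unknown"


print(disambiguate_net("What was your net gain for the year?"))
print(disambiguate_net("He caught the fish in a net."))
```

The sketch only works because the surrounding words are taken into account, which is exactly why context matters so much for understanding language.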

Today, NLP is also very often supported by deep learning systems, which learn language rules, context and usage from large collections of data and make them usable for computer systems.

Common questions about voice assistants

Do you always need an online connection for voice assistants?

There are also systems (TTS, STT and NLP) that work without an online connection and with limited computing power. We have already carried out a number of projects in such setups, with satisfactory results.

Is there a data protection problem with voice assistants?

In principle, voice assistants have to react to certain words that activate the voice interaction (e.g. “Ok, Google”, “Alexa”, …). To do this, the assistant must continuously analyze the ambient sound, unless a button or gesture is used to activate it. If a room is monitored permanently, data protection problems are very likely, especially in public spaces such as public transport. Data protection must therefore be considered and checked at an early stage when deploying voice assistants.
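One way to limit the data that is processed is to forward only what follows an activation word and to discard everything else immediately. The Python sketch below illustrates this principle on text snippets. This is a simplification, since real assistants detect the activation word in the audio signal itself, and the function names are hypothetical.

```python
import re
from typing import Optional

# Matches an activation word and captures whatever is said after it.
WAKE_PATTERN = re.compile(
    r"\b(?:ok,? google|alexa)\b[\s,.!?]*(.*)", re.IGNORECASE
)


def extract_command(snippet: str) -> Optional[str]:
    """Return the command after an activation word, or None."""
    match = WAKE_PATTERN.search(snippet)
    if match is None:
        return None
    command = match.group(1).strip(" ,.!?")
    return command or None


def filter_stream(snippets: list) -> list:
    # Snippets without an activation word are dropped right here and are
    # neither stored nor passed on to further processing.
    return [cmd for s in snippets if (cmd := extract_command(s)) is not None]


heard = ["a private conversation", "Alexa, turn on the light.", "Ok Google"]
print(filter_stream(heard))
```

Whether such filtering is sufficient from a legal point of view depends on the specific deployment and has to be checked in each case.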

Our conclusion on voice assistants

Natural input based on human language is a great way to interact with computers. People who are less technically savvy in particular can work with software in a very targeted and efficient manner. In customer care, eCommerce, building control, trade fairs and many other areas, the current possibilities of voice assistants already offer considerable added value for companies and consumers.

Are you planning a voice assistant?

Then get in touch right away. We have experience in the design and implementation of voice assistants and will be happy to advise you without obligation.

Together we can create great things. Shall we talk about it?

hello@thisisdmg.com

You can find more about our services in the area of voice assistance on our service page Chatbot & Voice Interface

How do voice assistants work?

Google Home, Amazon's Alexa, and Apple's Siri are some examples of successful conversational voice assistants. In this article, we explain the individual technical components and provide insights into how voice assistants and voice-based dialog systems work.