Let's start by introducing general categories of data used or generated by voice assistants. I will apply these to our SABOT, which is designed as a voice assistant for machines. In the context of operator-machine interaction, SABOT processes the following types of data:
1. Operator-related data
Operator speech is one of the most obvious types of data. The voice interface receives the operator's speech through an audio input, for example a headset microphone or a microphone on the machine itself.
2. Machine-related data
Machine data can include a wide range of information, from machine manuals to data from IoT systems. We can divide it into three basic groups:
3. Business data
There may also be other data that influences the conversation between operators and machines, such as data from planning tools. For example, operators could ask for the daily schedule for the machine and change it according to the current circumstances.
4. External data
Examples of external data are weather forecasts or event calendars that can influence the number of customers visiting the store. The voice assistant makes requests to public services to obtain such information.
5. Data generated by the voice assistant
The voice assistant provides responses based on input from the operator or in reaction to updates from the machine or any related external system. These responses must be curated to ensure that they contain correct information and comply with privacy and legal obligations.
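To make the five categories above easier to reason about, here is a minimal Python sketch of how they could be modelled in code. This is not SABOT's actual data model; the names `DataCategory`, `DataItem` and `group_by_category`, as well as the sample sources and payloads, are assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class DataCategory(Enum):
    """The five data categories described in this post."""
    OPERATOR = "operator-related data"
    MACHINE = "machine-related data"
    BUSINESS = "business data"
    EXTERNAL = "external data"
    GENERATED = "data generated by the voice assistant"


@dataclass(frozen=True)
class DataItem:
    """One piece of data handled during a conversation (hypothetical shape)."""
    category: DataCategory
    source: str   # illustrative label, e.g. "headset microphone"
    payload: str  # simplified to text for this sketch


def group_by_category(items):
    """Return a dict mapping every category to the items that belong to it."""
    grouped = {category: [] for category in DataCategory}
    for item in items:
        grouped[item.category].append(item)
    return grouped


# Illustrative sample data, invented for this sketch.
sample_items = [
    DataItem(DataCategory.OPERATOR, "headset microphone", "What is today's schedule?"),
    DataItem(DataCategory.BUSINESS, "planning tool", "daily schedule for the machine"),
    DataItem(DataCategory.EXTERNAL, "public weather service", "weather forecast"),
    DataItem(DataCategory.GENERATED, "voice assistant", "spoken response to the operator"),
]

grouped = group_by_category(sample_items)
for category, items in grouped.items():
    print(f"{category.value}: {len(items)} item(s)")
```

Tagging each item with its category up front could later make it simpler to apply different rules per category, for example stricter handling of operator speech than of public external data.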
In addition to the data described above, we also need to consider other aspects of data processing, such as data protection, data persistence and the lifespan of data. I will deal with these topics in the following posts.