For several years, Sabo has been interested in using artificial intelligence and machine learning to support software solutions in industry. We invented and are continuously developing Sabot (Smart Assistant Bot), our Intelligent Voice Assistant (IVA) solution for controlling machines such as the MIWE bakery ovens. The user can not only control the device with voice commands, but also use the intelligent capabilities of the assistant itself.
For the purpose of the trade show, we wanted to wrap the voice user interface in an attractive package, and the cute NAO robot seemed like an ideal candidate.
There were three main modes of operating the NAO:
The following paragraphs describe our experience with implementing the Service mode.
In the Service mode, the NAO robot acted as a barman. A visitor could ask questions, let the robot explain what was on offer, add or remove items, and submit the order. The robot was able to speak English and German.
The texts below are transcripts. Conversations took place in the form of natural speech.
Robot: How can I assist you with your order?
Visitor: What refreshments do you offer?
Robot: We offer hot drinks, cold drinks, and snacks. How can I assist you with your order?
Visitor: Hot drinks.
Robot: Here are the hot drinks we offer: Caffé Crema, Espresso, Cappuccino, Latte Macchiato, Hot Chocolate, Tea.
Visitor: OK. Please give me an espresso.
Robot: Sure! I've added an Espresso to your order. Would you like a single or double shot?
Visitor: Single with milk.
Robot: Got it! I've added a single shot Espresso with milk to your order. Is there anything else you would like to add?
Visitor: And something to eat please.
Robot: Certainly! We have a variety of snacks available. Here are some options: Swiss Platter, Sweet Snack, Savory Snack, Pretzel, Croissant.
Visitor: Two croissants.
Robot: Great choice! I've added 2 Croissants to your order. Is there anything else you would like to add?
Visitor: No thanks.
Robot: You're welcome! Your order has been submitted. If you need any further assistance, feel free to ask.
Visitor: A single espresso with a milk and two croissants.
Robot: Sure! I've added a single shot Espresso with milk and 2 Croissants to your order. Is there anything else you would like to add or modify?
Once an order had been collected and submitted, it was shown by a web application on a large display nearby. The visitor could see their order there and wait a minute for the real refreshments to be prepared.
The NAO robot is equipped with a range of sensors and I/O devices. It comes with a graphical IDE, Choregraphe, which supports the graphical language of NAOqi. With this language you can build scenarios consisting of multiple steps. A step can contain code written in Python that uses the NAO SDK.
NAO Audio Provider is a component hosted directly on the NAO robot that provides audio from its microphone.
NAO Audio Source Service is a dedicated Python application hosted on a computer connected to the NAO robot. Its only responsibility is routing audio data fragments from the NAO microphone via a socket to the NAO Backend Service.
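The routing role described above can be illustrated with a minimal Python sketch. It is not the actual service code: the function name `forward_audio` and the idea of passing the fragments as an iterable of byte strings are assumptions made for illustration, and any framing or audio format details are left out.

```python
import socket
from typing import Iterable


def forward_audio(chunks: Iterable[bytes], sock: socket.socket) -> int:
    """Send each audio fragment to the backend socket; return the bytes sent.

    `chunks` stands in for the fragments received from the NAO microphone
    (hypothetical input; the real source is the NAO Audio Provider).
    """
    total = 0
    for chunk in chunks:
        if not chunk:
            continue  # skip empty fragments instead of sending nothing
        sock.sendall(chunk)  # sendall blocks until the whole fragment is written
        total += len(chunk)
    return total
```

Keeping this component free of any processing logic means the audio handling can change on the backend without touching the code that talks to the robot.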
NAO Backend Service is a .NET 6 service orchestrating these main sub-tasks:
NAO Text Response Handler is a Python component hosted directly on the NAO robot. It continuously polls the NAO Backend Service for a textual response. When a response is available, the NAO robot passes the text to its built-in text-to-speech processor.
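A polling loop of this kind could look like the following sketch. The names `poll_responses`, `fetch` and `speak` are hypothetical: `fetch` stands for whatever call asks the backend for pending text, and `speak` for a wrapper around the robot's text-to-speech. The `max_polls` parameter exists only so the loop can be stopped in a test.

```python
import time
from typing import Callable, Optional


def poll_responses(
    fetch: Callable[[], Optional[str]],
    speak: Callable[[str], None],
    interval_s: float = 0.5,
    max_polls: Optional[int] = None,
) -> int:
    """Poll the backend and speak every non-empty response.

    Returns the number of responses spoken. Runs forever when
    max_polls is None, which is the expected mode on the robot.
    """
    spoken = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        text = fetch()
        if text:
            speak(text)
            spoken += 1
        else:
            time.sleep(interval_s)  # nothing pending, wait before asking again
    return spoken
```

Polling keeps the robot-side component simple, at the cost of a small delay of up to one interval before the robot starts speaking.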
NAO Order Visualizer is a web application that listens to a socket on the NAO Backend Service. Once a new order appears, it is displayed as a visual confirmation for the visitor.
We built the ordering solution on ChatGPT (GPT-3.5) from OpenAI, provided through the Azure OpenAI Service. The integration can be logically divided into a few sub-tasks:
I am not surprised that ChatGPT prompt engineering has become a field of its own in software development. Tuning the behavior of a chat bot is not a trivial task. A single changed word in the prompt can affect the behavior significantly. We experimented with various versions and wordings until we had a solution that met our expectations.
You can check fragments from our barman prompt:
ChatGPT functions are a powerful feature supported by some language models, starting with gpt-3.5-turbo-0613. I cannot imagine achieving the ordering functionality without them.
We defined three functions named action_ItemAddedIntoOrder, action_ItemRemovedFromOrder and action_OrderSubmitted. A function's name, description and parameters all matter and can affect the behavior of the bot.
The function for adding an item to the order was defined as follows:
Name: action_ItemAddedIntoOrder
Description: A product item added to the order.
Parameters [name, description]:
[Position, Position of the product item in the order starting from 1.]
[Quantity, Quantity of the product. Default is 1.]
[Product, The product item's name added to the order.]
…
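For readers who have not used ChatGPT functions, the listed fields could be written in the JSON schema form that function definitions take, shown here as a Python dictionary. This is a sketch, not our original definition: the parameter types and the `required` list are assumptions, and only the three parameters listed above are included.

```python
import json

# Sketch of the function definition as a JSON-schema style dictionary.
# Names and descriptions come from the listing above; the types and the
# "required" list are assumptions made for illustration.
ADD_ITEM_FUNCTION = {
    "name": "action_ItemAddedIntoOrder",
    "description": "A product item added to the order.",
    "parameters": {
        "type": "object",
        "properties": {
            "Position": {
                "type": "integer",
                "description": "Position of the product item in the order starting from 1.",
            },
            "Quantity": {
                "type": "integer",
                "description": "Quantity of the product. Default is 1.",
            },
            "Product": {
                "type": "string",
                "description": "The product item's name added to the order.",
            },
        },
        "required": ["Product"],
    },
}

# The definition must be serializable, because it is sent with the request.
ADD_ITEM_FUNCTION_JSON = json.dumps(ADD_ITEM_FUNCTION, ensure_ascii=False)
```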
ChatGPT can detect the point in the conversation at which the function should be invoked. For example:
Visitor: A single espresso with a milk.
is interpreted as a call:
When handling the function's arguments, some validation logic is necessary. The response returned to the model must be adjusted according to the required behavior.
What also helped us a lot was mentioning the function name in the prompt for some specific cases. For example, this rule helped us to properly handle adding another item of the same product to the order:
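As an illustration of such validation, here is a minimal Python sketch (our backend is a .NET service, so this is not the production code). The handler name `handle_item_added`, the error texts, and the in-memory `order` list are assumptions; the menu items are taken from the transcript above.

```python
import json
from typing import Any, Dict, List

# Menu items as they appear in the transcript above.
MENU = (
    "Caffé Crema", "Espresso", "Cappuccino", "Latte Macchiato",
    "Hot Chocolate", "Tea", "Swiss Platter", "Sweet Snack",
    "Savory Snack", "Pretzel", "Croissant",
)


def handle_item_added(arguments_json: str, order: List[Dict[str, Any]]) -> str:
    """Validate the model's arguments and return the text sent back to it."""
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError:
        return "Error: arguments are not valid JSON."
    if not isinstance(args, dict):
        return "Error: arguments must be a JSON object."

    # Match the product name case-insensitively against the menu.
    product = str(args.get("Product", "")).strip()
    match = next((m for m in MENU if m.lower() == product.lower()), None)
    if match is None:
        return f"Error: '{product}' is not on the menu."

    # Quantity defaults to 1 and must be a positive whole number.
    quantity = args.get("Quantity", 1)
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 1:
        return "Error: Quantity must be a positive whole number."

    order.append({"product": match, "quantity": quantity})
    return f"OK: added {quantity} x {match}."
```

Returning an error text instead of raising an exception lets the model see what went wrong and ask the visitor a follow-up question.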
- If there is anything added into the order even if the added item is similar to the previous one, call `action_ItemAddedIntoOrder` again.
The main component abstracting the order bot logic was designed as follows:
The implementation of the abstraction consisted of the orchestrating class OrderBotController and several sub-components, each responsible for a part of the business logic.
There were more than 60 integration tests for OrderBotController. ChatGPT responses are inherently nondeterministic, so it is hard for tests to assert the exact wording of the assistant's replies. However, once the ChatGPT functions are introduced, you can check their invocations, including their parameters. The tests give very quick and valuable feedback on any change in the function handling as well as in the prompt definition. With this approach, the tuning was safer and more efficient.
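The idea of asserting on function invocations rather than on reply text can be sketched in Python as follows. The `FunctionCall` record and the `assert_called` helper are hypothetical names, not part of our test suite; in a real integration test the list of calls would be collected while OrderBotController processes a visitor's utterance.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FunctionCall:
    """One recorded function invocation requested by the model."""
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


def assert_called(calls: List[FunctionCall], name: str, **expected: Any) -> None:
    """Fail unless a recorded call has this name and the expected arguments.

    Only the listed arguments are compared, so the test stays stable when
    the model adds other arguments or words its reply differently.
    """
    for call in calls:
        if call.name == name and all(
            call.arguments.get(key) == value for key, value in expected.items()
        ):
            return
    raise AssertionError(f"No call to {name} with {expected}; recorded: {calls}")
```

A test for the utterance "Two croissants" would then check for a call to action_ItemAddedIntoOrder with Product "Croissant" and Quantity 2, and ignore the exact sentence the robot speaks.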
The NAO robot was very well received. It was the cutest barman at the fair.
We developed the first version, and it was a very exciting journey. We also collected many ideas and suggestions for improving the solution in the next version, and we will build on that experience. The technologies related to ChatGPT are evolving very quickly, so new features appear often. Projects like this give us a better understanding of the potential and constraints of these technologies.