Host Speech Recognition :: QUARC Targets Library

Host Speech Recognition

Recognizes spoken commands on the host, rather than the target.

Library

Description

The Host Speech Recognition block recognizes spoken commands on the host and outputs the command identifier and any additional values corresponding to format specifiers in the commands. The Host Speech Recognition block must be used in connection with a Host Initialize block. The Host name parameter must be set to the identity of the Host Initialize block. The Host Initialize block determines how the speech recognition on the host will communicate with the real-time code running on the target, which may be remote. Refer to the documentation for the Host Initialize block for details.

A list of commands may be entered in the block's configuration dialog. Each command entered is assigned an identifier equal to the command's position in the list. When a particular spoken command is recognized, the integer identifier is output from the cmd output of the block.
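The numbering rule above can be illustrated with a minimal Python sketch. It assumes identifiers start at 1, which is consistent with the port label example later on this page, where "3:%d" refers to the third command. The function name and the first two command strings are hypothetical.

```python
def assign_command_ids(commands):
    """Map each command to its position in the list (assumed to be 1-based)."""
    return {position: command for position, command in enumerate(commands, start=1)}


# Hypothetical command list, in the order it would appear in the dialog.
commands = ["stop", "go forward", "turn %3d degrees"]
command_ids = assign_command_ids(commands)

for identifier, command in command_ids.items():
    print(identifier, command)
```

Because the identifier is tied to the position, rearranging the list changes the value that appears at the cmd output for a given phrase.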

Format specifiers may be included in a command and will cause the Host Speech Recognition block to recognize and output the corresponding value as part of the command. For example, a command of "turn %3d degrees" will recognize spoken phrases such as "turn 30 degrees" or "turn minus 225 degrees". The "%d" indicates a signed integer and "%3d" limits the signed integer to three digits. Limiting the number of digits to the number actually required results in better speech recognition since there are fewer possibilities for the speech recognition engine to evaluate.

The new output indicates when a new command has been recognized. The output is high for one sampling instant whenever a new command is recognized. The new output can be fed to a Triggered Subsystem to only respond when a command is spoken.

For a detailed description of the format specifiers, refer to Format Strings for Scanning. Not all possible variations are currently supported. The interpretation of the format specifiers is also slightly different. The following table outlines the differences.

Format Specifier	Comment
Alternate forms (#)	The '#' flag is ignored. There are currently no alternate spoken forms.
Assignment suppression (*)	Assignment suppression is supported. A phrase corresponding to the given format must be spoken but is not output from the block.
Grouping (')	The digit grouping flag is ignored. Spoken numbers do not require digit separators.
Field width	The field width limits the number of digits in the integral portion of spoken numeric values. For example, "%2d" recognizes a signed integer whose magnitude is less than 100 (two digits or less). For strings, the field width determines the dimensions of the output vector.
Modifier	All the modifiers are supported.
Argument type	All the argument types except 'k', 'n', 'p' and '[' are supported. Binary numbers should be spoken as a series of "zero" and "one" digits. Similarly, octal and hexadecimal numbers must be spoken as a series of individual digits. Integers and real numbers are spoken in natural form, such as "two hundred and thirty five". Real numbers support an optional fractional part and exponent. The fractional part is spoken as a series of individual digits. For example, the number "31.45" should be spoken as "thirty one point four five", not "thirty one point forty-five". Similarly, the number "3.14e6" would be spoken as "three point one four e six" or "three point one four times ten to the power of 6" or "three point one four e to the six". The field width restricts the number of digits in the integral portion of the number and is highly recommended. The string argument types ('s' and 'S') recognize a single arbitrary word and can thus be used for general dictation. For example, a command such as "My first name is %s" would recognize the phrase "My first name is Dan" and the word "Dan" would be output by the block. A field width should be specified for strings in order to provide the dimensions of the output vector. The character argument types ('c' and 'C') only recognize printable ASCII characters, including some control characters such as "tab", "newline", "carriage return" and "space".
Dimensions	Dimensions are not currently supported.
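As an illustration of the spoken forms described in the Argument type row, the following Python sketch converts a decimal literal into the phrase a speaker would say: the integral part in natural form and the fractional part digit by digit. It is not part of QUARC. The function names are hypothetical, only integral parts up to 999 are handled, and exponents are not covered.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def integer_words(n):
    """Spell out an integer in the range 0..999 in natural spoken form."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    if rest:
        words += " and " + integer_words(rest)
    return words


def spoken_real(text):
    """Convert a literal such as '31.45' or '-225' to its spoken form."""
    sign = ""
    if text.startswith("-"):
        sign = "minus "
        text = text[1:]
    integral, _, fraction = text.partition(".")
    words = sign + integer_words(int(integral))
    if fraction:
        # Fractional digits are spoken one at a time.
        words += " point " + " ".join(ONES[int(digit)] for digit in fraction)
    return words


print(spoken_real("31.45"))  # thirty one point four five
print(spoken_real("-225"))   # minus two hundred and twenty five
```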

While a field width is not required for numeric format specifiers, it is recommended. The field width determines the number of digits recognized in the integral portion of the number. For example, "%2d" will recognize a signed integer up to two digits, while "%3f" will recognize a real number in which the integer portion (prior to any exponent being applied) is limited to three digits. Strings should have a field width or maximum number of code units because it determines the dimensions of the string output. Variable field widths are not supported.
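The effect of the field width on numeric specifiers can be sketched in Python as follows. This only illustrates the arithmetic (a width of w allows integral magnitudes below 10^w) and is not the block's parser. The function name is hypothetical, and the pattern deliberately handles only the "%d" and "%f" forms used as examples on this page.

```python
import re

# Simplified pattern: '%', an optional field width, then 'd' or 'f'.
# Flags and modifiers are intentionally left out of this sketch.
_NUMERIC_SPEC = re.compile(r"%(\d*)([df])")


def max_integral_magnitude(spec):
    """Return the largest integral magnitude the width allows, or None if no width is given."""
    match = _NUMERIC_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"unsupported specifier in this sketch: {spec!r}")
    width = match.group(1)
    if not width:
        return None  # no field width: the integral part is not limited
    return 10 ** int(width) - 1


print(max_integral_magnitude("%2d"))  # 99
print(max_integral_magnitude("%3f"))  # 999
print(max_integral_magnitude("%d"))   # None
```

A smaller bound means fewer candidate phrases for the engine to evaluate, which is why specifying the width is recommended.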

Helpful Hints

Specifying the confidence required for a word to be recognized

The Host Speech Recognition block also provides a means of specifying the confidence required to recognize a word. Place a '+' immediately before a word to require greater confidence in order to recognize the word. Place a '-' immediately before a word to allow less confidence when matching the word. For example, "+This -is -a +test" will work harder to match the words "This" and "test", but will be less stringent in matching "is" and "a".
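To make the marker syntax concrete, the Python sketch below splits a command into words and records which confidence marker, if any, precedes each word. How the block itself parses commands is not documented here. The function name and the +1/-1/0 encoding are assumptions made for illustration only.

```python
def parse_confidence_markers(command):
    """Return (word, adjustment) pairs: +1 for a '+' prefix, -1 for '-', 0 otherwise."""
    parsed = []
    for token in command.split():
        if token.startswith("+") and len(token) > 1:
            parsed.append((token[1:], 1))    # require greater confidence
        elif token.startswith("-") and len(token) > 1:
            parsed.append((token[1:], -1))   # allow less confidence
        else:
            parsed.append((token, 0))        # default confidence
    return parsed


print(parse_confidence_markers("+This -is -a +test"))
```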

Improving recognition accuracy

Recognition can also be improved by adding a generic "%*50s" to the bottom of the list that acts as a "catch-all" for unrecognized words. Doing so forces the speech recognition engine to have more confidence in the other commands in order to recognize them. The Minimum confidence parameter may also be used to reject recognitions that do not have sufficient statistical confidence.

Installation Requirements

Microsoft Speech Recognition

Before using the Host Speech Recognition block, configure the default Microsoft speech recognition engine that comes with Windows 7 and above in Control Panel. Otherwise, the speech recognition engine will ask for settings when the model is first run.

Input Ports

This block has no input ports.

Output Ports

new

This output is high in any sampling instant in which the data at the other outputs is new. This output may go high when the other outputs don't appear to change, although new values will have been sent from the Host Peripheral Client. For example, host peripheral drivers typically send the initial state of their device when the Host Peripheral Client first connects, but this initial state may match the default outputs.

cmd

A 32-bit integer equal to the identifier of the last command recognized. This output holds its value until a new spoken command is recognized. Use the new output to determine when a command has actually been spoken.
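The relationship between the new and cmd outputs can be sketched in Python by treating the block's outputs as a sequence of (new, cmd) samples. This stands in for what a Triggered Subsystem would do in the model. The function name, the sample data and the initial cmd value of 0 are hypothetical.

```python
def handle_samples(samples, on_command):
    """Call on_command(cmd) only at sampling instants where new is high."""
    for new, cmd in samples:
        if new:
            on_command(cmd)


# Command 2 is spoken twice. cmd holds the value 2 in between, so only the
# new pulse distinguishes the second utterance from the held value.
samples = [(False, 0), (True, 2), (False, 2), (False, 2), (True, 2), (False, 2)]

handled = []
handle_samples(samples, handled.append)
print(handled)  # [2, 2]
```

Watching cmd for changes alone would miss the second utterance, because the value does not change between the two recognitions.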

...

Additional ports are created for each format specifier in each command string. The data type and dimensions of each output are determined by the format specifier. The port labels take the following form:

<command identifier>:<format>

The <command identifier> indicates the command to which the output corresponds, and the <format> indicates the format specifier within the command. For example, "3:%d" would represent the signed integer format specifier "%d" within the third command.
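The labelling scheme can be sketched in Python as below. This is an illustration under several assumptions, not the block's actual logic: command identifiers are taken to be 1-based, the label reuses the specifier text exactly as written in the command, specifiers with assignment suppression (*) are assumed to create no port, and modifiers are not handled. The function name and most command strings are hypothetical.

```python
import re

# Simplified specifier pattern: '%', optional flags (# ' *), optional field
# width, then a single conversion character. Modifiers are not handled.
_SPECIFIER = re.compile(r"%([#'*]*)(\d*)([A-Za-z])")


def port_labels(commands):
    """Build '<command identifier>:<format>' labels for each output-producing specifier."""
    labels = []
    for identifier, command in enumerate(commands, start=1):
        for match in _SPECIFIER.finditer(command):
            if "*" in match.group(1):
                continue  # assumption: suppressed values have no output port
            labels.append(f"{identifier}:{match.group(0)}")
    return labels


commands = ["stop", "My first name is %20s", "turn %d degrees", "%*50s"]
print(port_labels(commands))  # ['2:%20s', '3:%d']
```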

Parameters and Dialog Box

Host Speech Recognition

Host name

The identity of the associated Host Initialize block. The Host Speech Recognition block must be associated with a Host Initialize block.

Commands

The list of commands to recognize. Commands may be added, modified or removed from the list, as well as rearranged, using the "+", "=", "-" and arrow buttons respectively at the bottom of the list. Type new commands in the edit box below the list and press "+" to add that command to the list. To modify a command, select the command in the list, change the text in the edit box, and then click "=" to apply the changes to the command. To remove a command, select it in the list and press the "-" button. To move a command, select it in the list and use the arrow buttons to move it up or down.

Minimum confidence (tunable offline)

The minimum confidence required for a command to be recognized. This parameter is used to make recognition more stringent by rejecting commands that have been recognized by the speech engine but with lower confidence. Specify a value of zero to put no restrictions on the confidence level. Higher values are more restrictive. The range is specific to the speech engine. Microsoft's speech engine that comes pre-packaged with Windows uses a scale from 0 to 1. Hence, a value of 0.95 may be a reasonable choice to obtain better accuracy.
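The effect of this parameter can be sketched in Python as a simple threshold on the confidence reported by the engine. The function name and the recognition data are hypothetical, and whether the block compares with "greater than or equal" or strictly "greater than" is an assumption.

```python
def accept_recognitions(recognitions, minimum_confidence):
    """Keep (command id, confidence) pairs whose confidence meets the minimum."""
    return [(command_id, confidence)
            for command_id, confidence in recognitions
            if confidence >= minimum_confidence]


# Hypothetical recognitions on a 0-to-1 confidence scale.
recognitions = [(1, 0.97), (2, 0.60), (3, 0.95)]

print(accept_recognitions(recognitions, 0.95))  # [(1, 0.97), (3, 0.95)]
print(accept_recognitions(recognitions, 0))     # all three are kept
```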

Sample time

The sampling period (in seconds) at which spoken commands are output from the Host Speech Recognition block. A sample time of 0 indicates that the block will be treated as a continuous time block. A positive sample time indicates that the block is a discrete time block with the given sample time.

A sample time of -1 indicates that the block inherits its sample time. Since this is a source block, only inherit the sample time when the block is placed in a conditionally executed subsystem, like a Triggered or Enabled Subsystem, or in a referenced model.

The default sample time is set to qc_get_step_size, which is a QUARC function that returns the fundamental sampling time of the model. Hence, the default sample time is a discrete sample time with the same sampling time as the fixed step size of the model.

Enable Windows voice commands (tunable offline)

Check this option to allow Windows voice commands to be used at the same time. This option also allows the speech recognition engine to be shared with other applications. If this option is not checked, then the Host Speech Recognition block acquires exclusive access to the speech recognition engine and audio input source.

Enabled (tunable offline)

Check this option to enable the block. If this option is not checked then the outputs will be set to default values. This checkbox is convenient for disabling the block when use of speech recognition is not desired.

Targets

Target Name	Compatible*	Model Referencing	Comments
QUARC Win32 Target	Yes	Yes
QUARC Win64 Target	Yes	Yes
QUARC Linux Nvidia Target	Yes	Yes
QUARC Linux QBot Platform Target	Yes	Yes
QUARC Linux QCar 2 Target	Yes	Yes
QUARC Linux QDrone 2 Target	Yes	Yes
QUARC Linux Raspberry Pi 3 Target	Yes	Yes
QUARC Linux Raspberry Pi 4 Target	Yes	Yes
QUARC Linux RT ARMv7 Target	Yes	Yes
QUARC Linux x64 Target	Yes	Yes
QUARC Linux DuoVero Target	Yes	Yes
QUARC Linux DuoVero 2016 Target	Yes	Yes
QUARC Linux Verdex Target	Yes	Yes
QUARC QNX x86 Target	Yes	Yes	Last fully supported in QUARC 2018.
Rapid Simulation (RSIM) Target	Yes	Yes
S-Function Target	No	N/A	Old technology. Use model referencing instead.
Normal simulation	Yes	Yes

* Compatible means that the block can be compiled for the target.