Format Strings for Scanning


Format strings for scanning are much like the format strings used by the C library function scanf. A format string generally contains literal text as well as format specifiers. Format specifiers describe how the input data is parsed and are preceded by the '%' character. For example, the format string "X: %f" contains one format specifier, "%f", which indicates that a single-precision floating-point value should be scanned. Hence, the input in this case might look like "X: 3.14".
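Because the syntax follows the C library scanf, the "X: %f" example can be tried directly with the standard C sscanf function. The following is a minimal sketch using only the C standard library, not the QUARC API; parse_x is a hypothetical helper name.

```c
#include <stdio.h>

/* Parse input such as "X: 3.14" with the format string "X: %f".
   Returns 1 if a value was assigned to *x, otherwise 0. */
int parse_x(const char *input, float *x)
{
    /* "X:" must match literally, the space matches any amount of
       whitespace, and %f scans a single-precision value into *x. */
    return sscanf(input, "X: %f", x) == 1;
}
```

For instance, parse_x("X: 3.14", &x) stores roughly 3.14 in x, while parse_x("Y: 3.14", &x) fails because the literal "X:" does not match.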

The general form of a format specifier is:

%<flags><.><field width><modifier><argument type>[<code units>](<dimension>)

The '<' and '>' characters delimit the name of a component of the format specifier and are not included in the actual format string. All other characters are literals that are included in the format string, including '[', ']', '(' and ')'. All components are optional except the argument type, which is required. Note also that the ability to specify the maximum number of code units and an array dimension goes beyond the capability of the C library scanf function.

The components of a format specifier are:

flags:

* = suppress assignment to an input argument. Parse but ignore this field.

# = allow alternate form.

%b - allows input to be prefixed with 0b.

%d - allows input to be prefixed with 0b, 0x or 0.

%i - allows input to be prefixed with 0b, 0x or 0.

%o - allows input to be prefixed with 0.

%p - allows input to be prefixed with 0x.

%x - allows input to be prefixed with 0x.

' = ignore separator (comma) between digits grouped into thousands.
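Of these flags, only '*' has a direct counterpart in ISO C sscanf; the '#' and ''' flags described here are not part of standard C scanf. The sketch below therefore illustrates assignment suppression only, using plain C rather than QUARC; first_and_third is a hypothetical name.

```c
#include <stdio.h>

/* Read the first and third integers from input such as "10 20 30",
   discarding the second. "%*d" parses the field but assigns it to
   nothing, so it consumes no argument and is not counted in the
   return value. */
int first_and_third(const char *input, int *first, int *third)
{
    return sscanf(input, "%d %*d %d", first, third);
}
```

With "10 20 30" the function returns 2 and stores 10 and 30.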

'.'

An optional '.' may be entered prior to the field width. This character is only necessary for specifying a variable field width so that the variable field width indicator, '*', is not confused with the assignment suppression flag. For example, "%*d" indicates that an integer should be scanned but not assigned to an input argument, while "%.*d" indicates that an integer should be scanned and assigned to an input argument, and that the field width is specified by an additional input argument. In the first case, the '*' is the assignment suppression flag. In the second case, the '*' is a variable field width indicator.

field width:

The maximum width in characters (not code units) of the input field. Specify a positive integer. The input will not be scanned past the maximum field width. To specify a variable field width, write a '.' followed by a '*' character in place of the number. In this case, an extra input is required, which allows the field width to be specified dynamically via the input. The '.' is necessary to avoid confusion with '*' as the assignment suppression flag.
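A fixed field width behaves the same way in standard C. ISO C sscanf has no variable field width, however, so the second function below emulates "%.*d" by building the format string at run time. This is a sketch under that assumption, not how QUARC implements it; split_digits and scan_int_width are hypothetical names.

```c
#include <stdio.h>

/* Fixed field width: "%3d" reads at most 3 characters, leaving the
   remaining digits for the next conversion. */
int split_digits(const char *input, int *head, int *tail)
{
    return sscanf(input, "%3d%d", head, tail);
}

/* Emulate a variable field width by formatting the width into the
   format string, e.g. width 2 produces "%2d". */
int scan_int_width(const char *input, int width, int *value)
{
    char fmt[32];

    if (width <= 0)
        return 0;
    snprintf(fmt, sizeof fmt, "%%%dd", width);
    return sscanf(input, fmt, value);
}
```

With "12345", split_digits yields 123 and 45; scan_int_width("98765", 2, &v) yields 98.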

modifier:

hh = integer argument is a char.

h = integer argument is a short. Character argument is a char. String argument is a multibyte string.

I8 = integer argument is an 8-bit integer (t_int8). Character argument is a UTF-8 character (t_uint8). String argument is a UTF-8 string.

I16 = integer argument is a 16-bit integer (t_int16). Character argument is a UTF-16 character (t_uint16). String argument is a UTF-16 string.

I32 = integer argument is a 32-bit integer (t_int32). Floating-point argument is a 32-bit float (t_single). Character argument is a UTF-32 character (t_uint32). String argument is a UTF-32 string.

I64 = integer argument is a 64-bit integer (t_int64). Floating-point argument is a 64-bit float (t_double).

j = integer argument is an intmax_t.

l = integer argument is a long. Floating-point argument is double. Character argument is a wchar_t. String argument is a wide string.

ll = integer argument is a long long (64-bit integer).

L = floating-point argument is a long double.

q = integer argument is a long long (64-bit integer).

t = integer argument is a ptrdiff_t.

w = character argument is a wchar_t. String argument is a wide string.

z = integer argument is a size_t.
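The modifiers hh, h, l, ll, j, z and L also exist in ISO C, so their effect can be seen with plain sscanf; the I8/I16/I32/I64, q and w modifiers are not part of ISO C and are omitted. In this sketch (scan_sizes is a hypothetical name), each modifier tells sscanf the size of the object its pointer refers to.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* hh -> signed char, h -> short, l -> long, ll -> long long,
   z -> size_t, j -> intmax_t, and l with f -> double. */
int scan_sizes(const char *input, signed char *c, short *s, long *l,
               long long *ll, size_t *z, intmax_t *j, double *d)
{
    return sscanf(input, "%hhd %hd %ld %lld %zu %jd %lf",
                  c, s, l, ll, z, j, d);
}
```

Passing a pointer whose type does not match the modifier (for example an int pointer with "%hhd") is undefined behavior in C, which is why the modifier must match the argument.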

argument type:

b = scan the unsigned integer argument in binary, alternate form allows 0b as a prefix.

B = scan the unsigned integer argument in binary, alternate form allows 0B as a prefix.

c = scan a character.

C = scan a wide character.

d = scan a signed integer value. Alternate form allows the prefixes 0b, 0x and 0, with the value interpreted accordingly (binary, hexadecimal or octal). If no prefix is specified then the value is assumed to be decimal.

e = scan a floating-point value. Any form of valid floating-point value is recognized.

E = scan a floating-point value. Any form of valid floating-point value is recognized.

f = scan a floating-point value. Any form of valid floating-point value is recognized.

g = scan a floating-point value. Any form of valid floating-point value is recognized.

G = scan a floating-point value. Any form of valid floating-point value is recognized.

i = scan a signed integer value. The prefixes 0b, 0x and 0 are always recognized and the value interpreted accordingly (binary, hexadecimal or octal). If no prefix is specified then the value is assumed to be decimal.

k = scan a pointer value.

n = return the number of code units, not characters, read so far from the input.

o = scan an unsigned integer argument in octal, alternate form allows a 0 prefix.

p = scan a pointer value, alternate form allows 0x prefix.

s = scan a string (delimited by whitespace).

S = scan a wide string (delimited by whitespace).

[...] = scan a string containing only characters in the given character set (or only characters outside the character set if [^...] is specified). The '...' is any string of characters to be included in the set. To include a ']' in the set, specify it as the first character, i.e. []...] or [^]...]. To specify a range of characters, use a hyphen, e.g. [a-z]. To include a hyphen, place it as the first or last character in the set, e.g. [-abc] or [abc-]. To include both a hyphen and ']' in the set, specify []-...] or []...-]. To ignore input to the end of a line, specify "%*[^\r\n]".

u = scan an unsigned integer value. Alternate form allows the prefixes 0b, 0x and 0, with the value interpreted accordingly (binary, hexadecimal or octal). If no prefix is specified then the value is assumed to be decimal.

x = scan an unsigned integer argument in hexadecimal, alternate form allows a 0x prefix.

X = scan an unsigned integer argument in hexadecimal, alternate form allows a 0X prefix.
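Several of these argument types behave the same way in ISO C sscanf: %i interprets the 0x and 0 prefixes, %[...] scans a character set, %*[^\r\n] skips the rest of a line, and %n reports how much input has been consumed (in characters for standard C, rather than QUARC's code units). The 0b prefix is a QUARC feature and is not used here. This is a plain C sketch with hypothetical function names.

```c
#include <stdio.h>

/* %i picks the base from the prefix: "0x" hexadecimal, "0" octal,
   otherwise decimal. */
int scan_prefixed(const char *input, int *hex, int *oct, int *dec)
{
    return sscanf(input, "%i %i %i", hex, oct, dec);
}

/* Scan a lowercase word with a character set, then record with %n how
   many characters were consumed (%n assigns but is not counted).
   Note: ranges such as a-z are implementation-defined in ISO C, though
   common C libraries support them. */
int scan_word(const char *input, char word[16], int *consumed)
{
    return sscanf(input, "%15[a-z]%n", word, consumed);
}

/* Skip the rest of the current line, then read an integer. The space
   skips the line terminator. Fails if the first line is empty, because
   %[ must match at least one character. */
int skip_line_then_int(const char *input, int *value)
{
    return sscanf(input, "%*[^\r\n] %d", value);
}
```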

code units:

Specify a positive integer to use a fixed maximum number of code units. Specify a '*' character to indicate a variable number of code units as the maximum. In this case, an extra input is required which allows the maximum number of code units to be specified dynamically via the input. The code units are interpreted differently according to the argument type:

%c = the maximum number of code units in a character after conversion (defaults to 1).

%s = the maximum number of code units in the string.
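Standard C has no code-unit component; the nearest analogue is a field width on %s, which counts input characters (bytes for a narrow string) and must leave room for the terminating null. Since ISO C sscanf also has no '*' for this, the sketch below builds the width from the buffer size at run time, loosely mirroring a variable maximum such as "%s[*]". It is an illustration in plain C, and scan_bounded_word is a hypothetical name.

```c
#include <stdio.h>
#include <stddef.h>

/* Read one whitespace-delimited word into buf without overflowing it.
   The %s width excludes the terminating null, so it is bufsize - 1;
   e.g. bufsize 8 produces the format "%7s". */
int scan_bounded_word(const char *input, char *buf, size_t bufsize)
{
    char fmt[32];

    if (bufsize < 2)
        return 0;
    snprintf(fmt, sizeof fmt, "%%%zus", bufsize - 1);
    return sscanf(input, fmt, buf);
}
```

With an 8-byte buffer and the input "  quanser_quarc", only "quanser" is stored.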

dimension:

If specified, the argument is treated as an array with the specified dimension (length). Array elements must be separated in the formatted input by commas. For the 'c' and 'C' argument types, characters are not separated by commas but are scanned as a string containing exactly the number of characters indicated by the dimension. Whitespace is not ignored in this case. Each array element is assumed to be formatted according to the given field width, modifier, etc. There is no capacity for specifying the total field width of the entire array, only the field width for each element scanned.

Specify a positive integer to use a fixed array dimension. Specify a '*' character to indicate a variable array dimension. In this case, the size of the array is determined by the size of the corresponding input port for Simulink blocks, and by an extra input argument for C code. Unlike the maximum number of code units, specifying a '*' for the dimension does not cause another input port to be created for Simulink blocks.
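Array dimensions have no equivalent in standard C scanf. To make the comma-separated layout concrete, the following hypothetical helper imitates a "%lf(*)"-style specifier in plain C by scanning one element at a time and requiring a comma between elements. It is only an assumption about the input layout described above, not QUARC's implementation.

```c
#include <stdio.h>
#include <stddef.h>

/* Read up to n comma-separated doubles, e.g. "1.5, 2, -3".
   Returns the number of elements stored in out. */
size_t scan_double_array(const char *input, double *out, size_t n)
{
    size_t count = 0;

    while (count < n) {
        int used = 0;

        /* Every element uses the same conversion; %n reports how far
           sscanf read so the next element starts after it. */
        if (sscanf(input, "%lf%n", &out[count], &used) != 1)
            break;
        input += used;
        count++;

        if (count < n) {
            /* Require a comma separator, allowing whitespace before it;
               used stays 0 if no comma is found. */
            used = 0;
            (void)sscanf(input, " ,%n", &used);
            if (used == 0)
                break;
            input += used;
        }
    }
    return count;
}
```

The dimension n plays the role of the variable dimension passed as an extra argument in C code.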

Whitespace in the format string matches any number of whitespace characters in the input (including no whitespace).

The numeric and string format specifiers (such as e, E, f, F, g, G, u and s) skip leading whitespace. This whitespace is not included when checking the maximum field width. For example, the format specifier "%3u" skips leading whitespace and then scans the input for an unsigned integer value of up to 3 digits in length, regardless of how much whitespace was present before the first digit.
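Both whitespace rules (whitespace in the format matches any amount of input whitespace, and leading whitespace does not count toward the field width) also hold for ISO C sscanf, so they can be checked with a short sketch in plain C; the function names are hypothetical.

```c
#include <stdio.h>

/* "%3u" skips leading whitespace, then reads at most 3 digits; the
   skipped whitespace does not count toward the width. */
int scan_three_digits(const char *input, unsigned *value)
{
    return sscanf(input, "%3u", value);
}

/* The space before ',' matches any amount of whitespace, including none. */
int scan_pair(const char *input, int *a, int *b)
{
    return sscanf(input, "%d ,%d", a, b);
}
```

Given "      12345", scan_three_digits stores 123; scan_pair accepts both "7,8" and "7   ,8".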

Note that Simulink uses double-precision floating-point by default, so format specifiers will generally require the 'l' modifier.

In scanf, the format string "%10c" scans 10 characters. In QUARC, the field width is ignored for the 'c' argument type. Instead, use the format string "%c(10)" to accomplish the same thing. You may also specify a variable dimension, such as "%c(*)", to specify the dimension as an extra input argument, allowing a variable number of characters to be scanned.
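For comparison, this is the standard C behavior that "%c(10)" replaces. In ISO C, "%10c" reads 10 characters without skipping leading whitespace and does not append a terminating null, so the caller must add one. The sketch below (plain C, hypothetical name scan_ten_chars) also checks the count with %n, because some C libraries accept a shorter final field at the end of the input.

```c
#include <stdio.h>

/* Copy exactly 10 characters (whitespace included) into out and
   null-terminate them. Returns 1 on success, 0 otherwise. */
int scan_ten_chars(const char *input, char out[11])
{
    int used = 0;

    if (sscanf(input, "%10c%n", out, &used) != 1 || used != 10)
        return 0;
    out[10] = '\0'; /* %c never adds the terminator itself */
    return 1;
}
```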

Use the modifiers to match the size of each input argument to the desired data type. Format specifiers support a broad range of modifiers and thus a wide variety of input and output data types.

Any characters other than a format specifier are matched exactly in the input. Hence, a format string of "X: %lf" requires that the string "X:" appear in the input, followed by any number of whitespace characters, followed by a double-precision floating-point value.
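The same literal matching applies in standard C, and the 'l' modifier is what makes "%lf" store a double. A minimal plain C sketch (parse_x_double is a hypothetical name) shows that the literal is case-sensitive and that the space accepts any amount of whitespace, including none.

```c
#include <stdio.h>

/* "X:" must appear exactly; the space then matches zero or more
   whitespace characters before the double-precision value. */
int parse_x_double(const char *input, double *x)
{
    return sscanf(input, "X: %lf", x) == 1;
}
```

"X:   2.5" and "X:2.5" both succeed, whereas "x: 2.5" fails on the lowercase 'x'.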

Special Characters

For Simulink blocks, the special characters '\n', '\r', '\t', '\b', '\f' and '\v' are recognized in format strings as the newline, carriage return, tab, backspace, formfeed and vertical tab respectively. For example, the format string "Hello world\n" consists of the text "Hello world" followed by a newline character. For C code, all the standard escape sequences are recognized.

When scanning formatted text from Windows HyperTerminal, note that lines of text are terminated by the '\r' character rather than the '\n' character. Both characters can be skipped simply by putting a space character in the format string, since a space matches any amount of whitespace.
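A plain C sketch of this idea (split_cr_lines is a hypothetical name, and the input is assumed to be already in memory): the leading space in " %31[^\r\n]" skips any pending '\r' or '\n' before each line is read, so lines ending in "\r" or "\r\n" are handled alike. As a side effect it also skips leading spaces and tabs on each line, and lines longer than 31 characters are split.

```c
#include <stdio.h>

/* Copy successive lines of text into lines[], accepting "\r" or "\r\n"
   as terminators. Returns the number of lines copied. */
int split_cr_lines(const char *text, char lines[][32], int max_lines)
{
    int count = 0;
    int used = 0;

    while (count < max_lines &&
           sscanf(text, " %31[^\r\n]%n", lines[count], &used) == 1) {
        text += used; /* continue after the characters just consumed */
        count++;
    }
    return count;
}
```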