In use

Simple cmds_scanf() v2 example

struct {

unsigned int l_data;

unsigned char * p_data;

} cmd;

// 'BC' encoded as a v1: 024243

// 'BC' encoded as a v2: 00024243

// 'BC' encoded as a v3: 0000024243

// 'BC' encoded as a v4: 000000024243

// In all these cases, 'cmds_scanf()' will populate l_data with '00000002',

// and p_data will point at the byte with 0x42 in it.

// Note the big-endian encodings for the length bytes.
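The decoding described in the comments above can be sketched in plain C. This is only an illustration under stated assumptions: `parse_v_field` and `struct v_field` are hypothetical names, not part of any CryptoServer API, and the real cmds_scanf() may behave differently in detail. The sketch reads a big-endian length prefix of the given width, stores the length by value, and points p_data into the input buffer without copying.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the 'cmd' struct from the example above. */
struct v_field {
    unsigned int         l_data;  /* length, copied by value        */
    const unsigned char *p_data;  /* points INTO the input buffer   */
};

/*
 * Illustrative helper (an assumption, not a CryptoServer function):
 * decode one 'v<width>' field whose length prefix is 'width'
 * big-endian bytes. Returns 0 on success, -1 if the buffer is short.
 */
static int parse_v_field(const unsigned char *buf, size_t buf_len,
                         unsigned int width, struct v_field *out)
{
    unsigned int len = 0;
    unsigned int i;

    assert(width >= 1 && width <= 4);
    if (buf_len < width)
        return -1;                    /* length prefix is truncated */

    for (i = 0; i < width; i++)
        len = (len << 8) | buf[i];    /* most significant byte first */

    if (buf_len - width < len)
        return -1;                    /* unexpected end of input */

    out->l_data = len;
    out->p_data = buf + width;        /* no copy: alias into buf */
    return 0;
}
```

With the input bytes 00 02 42 43 and a width of 2, this sketch sets l_data to 2 and leaves p_data pointing at the byte 0x42.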

:warning:

All integer values (whether data or length values) should be encoded/entered into the serialized data as big-endian values.
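On the host side, writing integers big-endian regardless of the host CPU's byte order might look like the following sketch. The helpers `put_be` and `put_v2` are hypothetical names introduced here for illustration; they are not libcsx functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative helper (an assumption, not a libcsx function): write
 * 'value' into 'out' as 'width' big-endian bytes. Returns bytes written.
 */
static size_t put_be(unsigned char *out, unsigned long value, size_t width)
{
    size_t i;

    assert(width >= 1 && width <= 4);
    for (i = 0; i < width; i++)
        out[i] = (unsigned char)((value >> (8 * (width - 1 - i))) & 0xFF);
    return width;
}

/* Serialize a payload as a 'v2' field: 2-byte big-endian length, then data. */
static size_t put_v2(unsigned char *out, const unsigned char *data, size_t len)
{
    size_t n, i;

    assert(len <= 0xFFFF);            /* must fit in a 2-byte length */
    n = put_be(out, (unsigned long)len, 2);
    for (i = 0; i < len; i++)
        out[n + i] = data[i];
    return n + len;
}
```

Shifting and masking, rather than copying the integer's memory directly, keeps the output independent of whether the host is little-endian or big-endian.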

Here is a carefully crafted cmd struct, designed to demonstrate what can go wrong:

Simple example of a Problematic C Struct

struct { // Command struct for sfc 0

unsigned int flags; // u1

unsigned int mech; // u2

unsigned int spec; // u4

} cmd;

A PATTERN for this structure (as commented) is u1u2u4. An equally correct pattern is u2u1u4. Or u4u2u1. Each of these patterns has the correct number of tokens, and the input buffer size will match what the tokens tell cmds_scanf to expect.

To the cmds_scanf command, these three patterns match the cmd struct given. What will differ is how the incoming buffer was serialized on the host, how p_cmd is interpreted, and what the effective result values in the cmd struct are. The call may return no error -- but the populated struct would have incorrect information in it.
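To make the silent-corruption case concrete, here is a toy stand-in for a pattern made only of 'u' tokens. It is a sketch, not the real parser: `scan_u3` and `struct cmd3` are hypothetical, and it assumes tokens fill the struct members in declaration order. The same seven bytes parse without error under both u1u2u4 and u4u2u1, but the resulting field values differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical copy of the problematic struct: three unsigned ints. */
struct cmd3 { unsigned int flags, mech, spec; };

/*
 * Toy parser (an assumption for illustration): widths[] holds the
 * byte width of each 'u' token. Returns 0 only if the buffer is
 * consumed exactly, -1 otherwise.
 */
static int scan_u3(const unsigned char *buf, size_t len,
                   const unsigned int widths[3], struct cmd3 *out)
{
    unsigned int *dst[3];
    size_t pos = 0;
    int t;

    dst[0] = &out->flags;
    dst[1] = &out->mech;
    dst[2] = &out->spec;

    for (t = 0; t < 3; t++) {
        unsigned int v = 0, i;

        assert(widths[t] >= 1 && widths[t] <= 4);
        if (len - pos < widths[t])
            return -1;                 /* unexpected end of input */
        for (i = 0; i < widths[t]; i++)
            v = (v << 8) | buf[pos++]; /* big-endian accumulate */
        *dst[t] = v;
    }
    return pos == len ? 0 : -1;        /* trailing bytes are an error */
}
```

For the bytes 01 02 03 04 05 06 07, the widths {1, 2, 4} give flags = 0x01, mech = 0x0203, spec = 0x04050607, while the widths {4, 2, 1} give flags = 0x01020304, mech = 0x0506, spec = 0x07. Both calls return 0.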

The cmds_scanf method detects when the length of p_cmd differs from what the pattern tells it to expect. In the above example, if data is supplied for the terminal token ('v1' or '\*'), the pattern is accepted. If no data is supplied (which is valid for '*'), the 'v1' pattern generates a deserialization error (unexpected end of input).

If the host programmer is misinformed about, or does not understand, how data is parsed on the CryptoServer, they can easily supply the correct data yet serialize it incorrectly. An incorrect serialization results in either the parse failing or, worse, the parse succeeding with corrupted data.

Common failure modes:

  • Adding a length value into the data when the pattern expects a 'remainder' mark, '*',

  • Using an incorrect length mark size (serialized v4, expected v2),

  • Two integers like `u4u4` that are encoded on the host as "flags, then specifier" but interpreted on the HSM as "specifier, then flags".

  • ...
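The length-mark mismatch from the list above can be reproduced with a toy parser for a "v2, then remainder" layout. This is a sketch under assumptions: `scan_v2_rest` and `struct v2_rest` are hypothetical, and the way a remainder token populates the struct (a length plus a pointer) is assumed for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result struct for a toy "v2 then remainder" pattern. */
struct v2_rest {
    unsigned int         l_a;     /* length of the v2 field          */
    const unsigned char *p_a;     /* v2 payload, aliasing the input  */
    unsigned int         l_rest;  /* length of the remainder         */
    const unsigned char *p_rest;  /* remainder, aliasing the input   */
};

/* Returns 0 on success, -1 on unexpected end of input. */
static int scan_v2_rest(const unsigned char *buf, size_t len,
                        struct v2_rest *out)
{
    unsigned int l_a;

    assert(buf != NULL && out != NULL);
    if (len < 2)
        return -1;                         /* no room for the length */
    l_a = ((unsigned int)buf[0] << 8) | buf[1];
    if (len - 2 < l_a)
        return -1;                         /* payload is truncated */

    out->l_a    = l_a;
    out->p_a    = buf + 2;
    out->l_rest = (unsigned int)(len - 2 - l_a);
    out->p_rest = buf + 2 + l_a;           /* everything left over */
    return 0;
}
```

If the host wrongly writes a 4-byte length (00 00 00 02 42 43 99), this sketch still returns 0: the v2 field comes back empty and the remainder swallows the misplaced length bytes along with the payload. If the host wrongly writes a 1-byte length (02 42 43), the first two bytes are read as a length of 0x0242 and the parse fails.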

The cmds_scanf method is a "limited copy" parser. It is "limited copy" because integers (the length fields of 'v' tokens, or 'u' tokens) are copied by value, while data fields (for the 'v' and '*' tokens) are referenced by pointer.

:warning:

Do not attempt to modify data pointed to by a field in the cmd struct, as populated by cmds_scanf -- the sub-function code does not 'own' that data.

Because data fields are pointers, subsequent code can use the struct members for read-only access to data within the passed-in p_cmd data. However, if the code tries to modify the pointed-to memory, the results are indeterminate (the p_cmd data is 'owned' by the parent context, not by the sub-function method).
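When a sub-function does need to change the bytes, one option is to take a private copy first and modify that. The sketch below is an assumption-laden illustration: `dup_field` is a hypothetical helper, and it uses the standard C `malloc` only for demonstration; code running on the HSM would use whatever allocator the platform provides.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative helper (hypothetical): return a heap copy of a parsed
 * data field that the caller owns and must free(). Returns NULL if
 * the allocation fails. The source bytes are never written to.
 */
static unsigned char *dup_field(const unsigned char *p_data,
                                unsigned int l_data)
{
    unsigned char *copy;

    assert(p_data != NULL || l_data == 0);
    copy = malloc(l_data ? l_data : 1);   /* avoid a zero-size request */
    if (copy != NULL && l_data != 0)
        memcpy(copy, p_data, l_data);
    return copy;
}
```

Declaring the struct's data pointers as `const unsigned char *` in your own code is a complementary measure: the compiler then rejects accidental writes through them.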

Again, the programmer is expected to understand how pattern scanning works on the HSM, what the interface requires (based on each field within the pattern), and the order in which the fields are serialized. The programmer is also expected to know host-side best practice for assembling the serialized data in a given language, taking into account endianness, length markers for 'v'-type fields, etc. Finally, the programmer is expected to understand that the CryptoServer is big-endian, that a command byte array (or response) is limited to 256Kb of data (less some overhead), etc.

There are, in short, several different failure modes that automation could check for, or ensure do not happen.

Although a cmds_scanf-like facility is provided on the host, it is available only in the low-level libcsx (C) API, not in the high-level (cryptosystem) APIs.