Overview¶
Background¶
Python applications typically do not care about the memory layout of their variables or objects. That is rarely a problem when parsing text-based data such as JSON or XML. For binary data, however, the Python language and standard library offer only limited support.
The pycstruct library solves this problem by allowing the user to define the memory layout of an “object”. Once the memory layout has been defined, data can be serialized to or deserialized from simple Python dictionaries.
Why and when does the memory layout matter?¶
Strict memory layout is required when reading and writing binary data, such as:
- Binary file formats
- Binary network data
Structs¶
The memory layout of an object is defined using a StructDef() object. For example:
import pycstruct

myStruct = pycstruct.StructDef()
myStruct.add('int8', 'mySmallInteger')
myStruct.add('uint32', 'myUnsignedInteger')
myStruct.add('float32', 'myFloatingPointNumber')
The above example corresponds to the following layout:
| Size in bytes | Type | Name |
|---|---|---|
| 1 | Signed integer | mySmallInteger |
| 4 | Unsigned integer | myUnsignedInteger |
| 4 | Floating point number | myFloatingPointNumber |
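As a cross-check on the table, the same sizes can be reproduced with Python's standard `struct` module. This is only a sketch: it assumes the members are packed back to back with no padding and fixes the byte order to little-endian so the result does not depend on the machine. The format string `<bIf` is chosen here for illustration and is not part of pycstruct.

```python
import struct

# Assumption: packed layout (no alignment padding), little-endian.
# b = int8 (1 byte), I = uint32 (4 bytes), f = float32 (4 bytes)
LAYOUT = "<bIf"

total_size = struct.calcsize(LAYOUT)
print(total_size)  # 9
```

If the real layout used alignment padding, the total would be larger than the sum of the member sizes.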
Once the layout has been defined, you can write binary data using ordinary Python dictionaries.
myDict = {}
myDict['mySmallInteger'] = -4
myDict['myUnsignedInteger'] = 12345
myDict['myFloatingPointNumber'] = 3.1415
myByteArray = myStruct.serialize(myDict)
myByteArray is now a byte array that can, for example, be written to a file or transmitted over a network.
The reverse process looks like this (assuming data is stored in the file myDataFile.dat):
with open('myDataFile.dat', 'rb') as f:
    inbytes = f.read()
myDict2 = myStruct.deserialize(inbytes)
myDict2 will now be a dictionary with the fields mySmallInteger, myUnsignedInteger and myFloatingPointNumber.
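A complete write-then-read round trip can be sketched with the standard library alone. The snippet below uses `struct` as a stand-in for the serialize and deserialize steps (it does not call pycstruct), assumes a packed little-endian layout, and writes to a temporary directory rather than the working directory. The names `LAYOUT`, `FIELDS`, `original` and `restored` are illustrative only.

```python
import os
import struct
import tempfile

LAYOUT = "<bIf"  # assumed packed little-endian layout: int8, uint32, float32
FIELDS = ("mySmallInteger", "myUnsignedInteger", "myFloatingPointNumber")

original = {
    "mySmallInteger": -4,
    "myUnsignedInteger": 12345,
    "myFloatingPointNumber": 3.1415,
}

# Dictionary -> bytes, in field order
payload = struct.pack(LAYOUT, *(original[name] for name in FIELDS))

# Write the bytes to a file and read them back
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "myDataFile.dat")
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        inbytes = f.read()

# Bytes -> dictionary
restored = dict(zip(FIELDS, struct.unpack(LAYOUT, inbytes)))
```

The integers come back unchanged. The floating point value is stored with 32 bits of precision, so it may differ slightly from the 64-bit Python float that was written; compare such values with a tolerance rather than with `==`.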
Arrays¶
Arrays are added like this:
myStruct = pycstruct.StructDef()
myStruct.add('int32', 'myArray', length=100)
Now myArray will be an array with 100 elements.
myDict = {}
myDict['myArray'] = []
myDict['myArray'].append(32)
myDict['myArray'].append(12)
myByteArray = myStruct.serialize(myDict)
Note that you don’t have to provide all elements of the array in the dictionary. Elements not defined will be set to 0 during serialization.
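The zero-filling behaviour described above can be illustrated without pycstruct. The following sketch pads a short list to the declared array length and packs it with the standard `struct` module; `pad_with_zeros` is a hypothetical helper written for this example, and the little-endian byte order is an assumption.

```python
import struct

ARRAY_LENGTH = 100  # matches length=100 in the definition above


def pad_with_zeros(values, length):
    """Hypothetical helper: return exactly `length` items, filling the tail with 0."""
    if len(values) > length:
        raise ValueError("too many elements for an array of length %d" % length)
    return list(values) + [0] * (length - len(values))


padded = pad_with_zeros([32, 12], ARRAY_LENGTH)

# 100 int32 values, assumed little-endian: 100 * 4 = 400 bytes
payload = struct.pack("<%di" % ARRAY_LENGTH, *padded)
```

Only the first two elements carry data; the remaining 98 are zero, and the serialized size is fixed by the declared length, not by the number of elements supplied.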
Strings¶
Strings are always encoded as UTF-8. UTF-8 is backwards compatible with ASCII, thus ASCII strings are also supported.
myStruct = pycstruct.StructDef()
myStruct.add('utf-8', 'myString', length=50)
Now myString will be a string of 50 bytes. Note that:
- Non-ASCII characters occupy more than one byte, so the number of characters may be smaller than the specified length (which is in bytes, not characters).
- The last byte is used for null termination and should not be used for character data.
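The difference between character count and byte count is easy to check in plain Python. The sketch below uses a hypothetical helper, `fits_in_field`, which reserves one byte for the null terminator as described in the second note; it is not a pycstruct function.

```python
FIELD_LENGTH = 50  # bytes, as in the definition above


def fits_in_field(text, field_length):
    """Hypothetical helper: True if the UTF-8 bytes plus a null terminator fit."""
    encoded = text.encode("utf-8")
    return len(encoded) <= field_length - 1


ascii_text = "this is a string"   # 16 characters, 16 bytes
accented_24 = "\u00e9" * 24       # 24 characters, 48 bytes in UTF-8
accented_25 = "\u00e9" * 25       # 25 characters, 50 bytes in UTF-8

ascii_fits = fits_in_field(ascii_text, FIELD_LENGTH)
accented_24_fits = fits_in_field(accented_24, FIELD_LENGTH)
accented_25_fits = fits_in_field(accented_25, FIELD_LENGTH)
```

Here 25 accented characters already exceed the field, even though 25 is half the declared length, because each character takes two bytes and the last byte is reserved.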
To write a string:
myDict = {}
myDict['myString'] = "this is a string"
myByteArray = myStruct.serialize(myDict)
If you need an encoding other than UTF-8 or ASCII, it is recommended that you define your element as an array of uint8. You can then encode or decode the array in any format you want.
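As an illustration of the uint8 approach, the sketch below converts a Latin-1 string to a zero-padded list of integers (the form an array member would take in the dictionary) and back. The helper names and the field length of 8 are assumptions made for this example. Stripping trailing zero bytes is only safe for encodings that never produce zero bytes inside the text, which holds for single-byte encodings such as Latin-1.

```python
FIELD_LENGTH = 8  # hypothetical length of the uint8 array


def encode_to_uint8_list(text, encoding, length):
    """Hypothetical helper: encode text and zero-pad it to `length` integers."""
    raw = text.encode(encoding)
    if len(raw) > length:
        raise ValueError("encoded text does not fit in %d bytes" % length)
    return list(raw) + [0] * (length - len(raw))


def decode_from_uint8_list(values, encoding):
    """Hypothetical helper: drop zero padding and decode the remaining bytes."""
    return bytes(values).rstrip(b"\x00").decode(encoding)


values = encode_to_uint8_list("caf\u00e9", "latin-1", FIELD_LENGTH)
text = decode_from_uint8_list(values, "latin-1")
```

In Latin-1 the accented character occupies a single byte (233), whereas UTF-8 would need two.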
Embedding Structs¶
Embedding structs in other structs is simple:
myChildStruct = pycstruct.StructDef()
myChildStruct.add('int8', 'myChildInteger')
myParentStruct = pycstruct.StructDef()
myParentStruct.add('int8', 'myParentInteger')
myParentStruct.add(myChildStruct, 'myChild')
Now myParentStruct includes myChildStruct.
myChildDict = {}
myChildDict['myChildInteger'] = 7
myParentDict = {}
myParentDict['myParentInteger'] = 45
myParentDict['myChild'] = myChildDict
myByteArray = myParentStruct.serialize(myParentDict)
Note that you can also make an array of child structs by setting the length argument when adding the element.
Bitfields¶
The struct definition requires that the size of each member is 1, 2, 4 or 8 bytes. BitfieldDef() allows you to define members of any size between 1 and 64 bits.
myBitfield = pycstruct.BitfieldDef()
myBitfield.add("myBit", 1)
myBitfield.add("myTwoBits", 2)
myBitfield.add("myFourSignedBits", 4, signed=True)
The above bitfield will allocate one byte with the following layout:
| Bit 7 | Bits 6-3 | Bits 2-1 | Bit 0 |
|---|---|---|---|
| Unused | myFourSignedBits | myTwoBits | myBit |
To add myBitfield to a struct def:
myStruct = pycstruct.StructDef()
myStruct.add(myBitfield, 'myBitfieldChild')
To set the members of myBitfield:
myBitfieldDict = {}
myBitfieldDict['myBit'] = 0
myBitfieldDict['myTwoBits'] = 3
myBitfieldDict['myFourSignedBits'] = -1
myDict = {}
myDict['myBitfieldChild'] = myBitfieldDict
myByteArray = myStruct.serialize(myDict)
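To make the bit layout concrete, the packing can be reproduced by hand with shifts and masks. This sketch assumes the layout shown in the table (the first member added occupies the least significant bit) and two's complement for the signed member; `pack_bitfield` and `unpack_bitfield` are illustrative helpers, not pycstruct functions.

```python
def pack_bitfield(my_bit, my_two_bits, my_four_signed_bits):
    """Pack three members into one byte, following the table's bit positions."""
    value = my_bit & 0x1                       # bit 0
    value |= (my_two_bits & 0x3) << 1          # bits 1-2
    value |= (my_four_signed_bits & 0xF) << 3  # bits 3-6, two's complement
    return value                               # bit 7 stays unused (0)


def unpack_bitfield(value):
    """Reverse of pack_bitfield, sign-extending the 4-bit signed member."""
    four = (value >> 3) & 0xF
    if four & 0x8:   # sign bit of the 4-bit member is set
        four -= 16
    return {
        "myBit": value & 0x1,
        "myTwoBits": (value >> 1) & 0x3,
        "myFourSignedBits": four,
    }


packed = pack_bitfield(0, 3, -1)
unpacked = unpack_bitfield(packed)
print(hex(packed))  # 0x7e
```

With myBit = 0, myTwoBits = 3 and myFourSignedBits = -1, bits 1 through 6 are all set, giving 0b01111110.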
Enum¶
EnumDef() allows you to define a signed integer of size 1, 2, 3, … or 8 bytes with a defined set of values (constants):
myEnum = pycstruct.EnumDef()
myEnum.add('myConstantM3',-3)
myEnum.add('myConstant0',0)
myEnum.add('myConstant5',5)
myEnum.add('myConstant44',44)
To add an enum to a struct:
myStruct = pycstruct.StructDef()
myStruct.add(myEnum, 'myEnumChild')
The constants are accessed as strings:
myDict = {}
myDict['myEnumChild'] = 'myConstant5'
myByteArray = myStruct.serialize(myDict)
Setting myEnumChild to a value not defined in the EnumDef will result in an exception.
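The name-to-value mapping behind an enum can be sketched with a plain dictionary. The code below is an illustration only: the helper names are hypothetical, the use of `ValueError` is an assumption (the text above does not say which exception pycstruct raises), and the one-byte little-endian encoding is chosen because all four constants fit in a signed byte.

```python
import struct

# Same constants as in the EnumDef example above
MY_ENUM = {
    "myConstantM3": -3,
    "myConstant0": 0,
    "myConstant5": 5,
    "myConstant44": 44,
}


def enum_to_int(name):
    """Hypothetical helper: look up a constant by name, rejecting unknown names."""
    try:
        return MY_ENUM[name]
    except KeyError:
        raise ValueError("undefined enum constant: %r" % name) from None


def int_to_enum(value):
    """Hypothetical helper: reverse lookup, rejecting unknown values."""
    for name, constant in MY_ENUM.items():
        if constant == value:
            return name
    raise ValueError("no enum constant with value %r" % value)


# Assumed encoding: one signed byte
payload = struct.pack("<b", enum_to_int("myConstant5"))
name = int_to_enum(struct.unpack("<b", payload)[0])
```

Rejecting unknown names before packing keeps invalid values out of the binary data in the first place.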
Byte order¶
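Structs, bitfields and enums are by default read and written in the native byte order. However, you can always override the default by providing the default_byteorder argument when creating a struct, or the byteorder argument when adding a member or creating a bitfield or enum.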
myStruct = pycstruct.StructDef(default_byteorder='big')
myStruct.add('int16', 'willBeBigEndian')
myStruct.add('int32', 'willBeBigEndianAlso')
myStruct.add('int32', 'willBeLittleEndian', byteorder='little')
myBitfield = pycstruct.BitfieldDef(byteorder='little')
myEnum = pycstruct.EnumDef(byteorder='big')
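What the byte order setting changes in the serialized data can be seen with the standard `struct` module, used here as a stand-in for pycstruct. The sketch packs the same 16-bit integer in big-endian, little-endian and native order.

```python
import struct
import sys

value = 1

big = struct.pack(">h", value)      # most significant byte first
little = struct.pack("<h", value)   # least significant byte first
native = struct.pack("=h", value)   # whatever this machine uses

# sys.byteorder is either 'little' or 'big'
native_matches = native == (little if sys.byteorder == "little" else big)
```

Relying on native order is convenient for data that never leaves the machine; file formats and network protocols normally specify a fixed byte order, so an explicit setting is the safer choice there.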