# Communication Between Heterogeneous Systems

> [!quote]
> _I’m a huge proponent of designing your code around the data, rather than the other way around. I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships._
>
> Linus Torvalds

Imagine a simple digital device like a thermostat. The thermostat is composed of a small microprocessor and a temperature sensor. The temperature sensor is connected to the processor using an [[Physical Layer#I2C|I2C]] interface. The circuit is shown in the figure below, with the processor marked as U2 and the temperature sensor marked as U1 (not the best schematic capture practice, but oh well).

![A simple thermostat; a world separates these two things](site/Resources/media/image18.png)

> [!Figure]
> _A simple thermostat. A world separates these two things._

Now, imagine we wrote a simple piece of bare-metal C code to read data from the temperature sensor:

```C
#include <stdint.h>

// Define the I2C address of the sensor
#define SENSOR_I2C_ADDRESS 0x48

// Define a struct to hold temperature data, threshold, and hysteresis
typedef struct {
    uint16_t value;      // Temperature value
    uint16_t threshold;  // Pre-programmed threshold value
    uint16_t hysteresis; // Hysteresis value
} TemperatureData;

// Function prototypes
void I2C_Init(void);
TemperatureData I2C_ReadTemperature(void);
void ThermostatControl(TemperatureData tempData);

int main(void) {
    // Initialize I2C communication
    I2C_Init();

    while (1) {
        // Read temperature, threshold, and hysteresis from the sensor
        TemperatureData tempData = I2C_ReadTemperature();

        // Control logic based on the temperature, threshold, and hysteresis
        ThermostatControl(tempData);
    }

    return 0;
}

void I2C_Init(void) {
    // Initialize I2C hardware (specific to your microcontroller)
    // ...
}

TemperatureData I2C_ReadTemperature(void) {
    TemperatureData tempData;
    tempData.value = 0;
    tempData.threshold = 0;
    tempData.hysteresis = 0;

    // Implement I2C read for your specific sensor
    // Example:
    // I2C_Read(SENSOR_I2C_ADDRESS, &(tempData.value), sizeof(tempData.value));
    // I2C_Read(SENSOR_I2C_ADDRESS, &(tempData.threshold), sizeof(tempData.threshold));
    // I2C_Read(SENSOR_I2C_ADDRESS, &(tempData.hysteresis), sizeof(tempData.hysteresis));
    // ...

    return tempData;
}

void ThermostatControl(TemperatureData tempData) {
    // Example control logic using temperature, threshold, and hysteresis
    if (tempData.value < tempData.threshold - tempData.hysteresis) {
        // Turn on heating
    } else if (tempData.value > tempData.threshold + tempData.hysteresis) {
        // Turn on cooling
    } else {
        // Maintain current state
    }
}
```

We see that our code has defined a structure called ```TemperatureData``` with three member variables: ```value```, ```threshold```, and ```hysteresis```. On the other hand, as per the hypothetical temperature sensor datasheet, we know the temperature values reside in volatile registers in the device, whose content must be transferred through the I2C link to the processor so we can use them in our firmware routine (see figure below).

![Two endpoints of a data interface will most likely have dissimilar data representations](site/Resources/media/image19.png)

> [!Figure]
> _Two endpoints of a data interface managing dissimilar data representations_

When we see these two simple digital systems linked together, we can clearly see they work with dissimilar syntaxes and worldviews. For instance, the temperature sensor, as per its datasheet, stores the values of interest in 16-bit registers of which only 9 bits are meaningful (see figure above, right-hand side). On the microprocessor side, the values are stored in memory in a C ```struct``` whose members are 16 bits wide. Moreover, the data interface in between (I2C) works with 8-bit transactions.
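The conversions involved can be sketched in a few lines of Python. The snippet below is only an illustration and is not taken from any datasheet: it assumes the sensor returns a register as two bytes, most significant first, with the 9 meaningful bits left-justified as a two's-complement value in steps of 0.5 °C. The function name `decode_temperature` and that bit layout are assumptions made for the example.

```python
def decode_temperature(msb: int, lsb: int) -> float:
    """Rebuild a temperature in degrees Celsius from two 8-bit I2C transfers.

    Assumed layout (hypothetical sensor): 16-bit register, upper 9 bits
    meaningful, two's complement, 0.5 degrees C per count.
    """
    raw = ((msb & 0xFF) << 8) | (lsb & 0xFF)  # two 8-bit transfers -> one 16-bit word
    count = raw >> 7                           # drop the 7 unused low bits -> 9-bit field
    if count & 0x100:                          # bit 8 set means a negative value
        count -= 0x200                         # sign-extend the 9-bit two's complement
    return count * 0.5


print(decode_temperature(0x19, 0x00))  # 25.0
print(decode_temperature(0xE7, 0x00))  # -25.0
```

Three widths meet in those few lines: 8-bit bus transfers, a 9-bit field inside a 16-bit register, and whatever type the application finally stores.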
As you can see, even in such a simple system there are multiple data types and bit widths involved, and transferring data back and forth needs a good dose of type casting and conversion. ==In this diversity lies one of the fundamental problems of communication in digital systems: moving data that originates from dissimilar, distant endpoints while it is serialized through interfaces that impose their own syntaxes and constraints. This turns digital communication into an error-prone, time-consuming process of data manipulation and conversion whose goal is to ensure that the data structures at the endpoints are populated with the right data at the right time.==

> [!Note]
> We spend most of our lives as embedded software developers making separated data structures match. That is, our brain energy is mostly spent seeking coherence between two or more data structures that are supposed to hold the same data but are formatted differently and are remote from each other, separated either by noisy channels (if they sit on different systems) or by data buses and memory maps (if they sit on the same system).

Networks and computers constitute a very heterogeneous world: computers can represent the data they store in different ways, with differences in encoding, precision, endianness, and so on. This heterogeneity of processor architectures comes on top of a great diversity of programming languages. As if that were not enough, two identical processors running software written in the same programming language will still show design diversity. For example, among programs written in C, one may represent the data as an array of integers, another as a loose collection of integers, and a third as a ```struct```.

The thermostat example discussed above is rather trivial; the C structure has only one level, so the conversion between what arrives on the I2C port and what is written in the struct is quite straightforward.
With more complex data types (for instance, nested structs), the issue gets substantially more difficult.

![Data diversity between two systems (C on the left, JSON on the right)](site/Resources/media/image20.png)

> [!Figure]
> _Data diversity between two systems (C on the left, JSON on the right)_

## ASN.1: Concrete syntax, abstract syntax, transfer syntax

Now that the problem has been presented, let's introduce some definitions. We will call *concrete syntax* the representation, in a given programming language, of the data structures to be transferred. This syntax respects the lexical and grammatical rules of a language (for instance, C). It is called _concrete_ because it is actually handled by applications (implemented in a language) and it complies with the restrictions of the machine architecture. Two examples of concrete syntaxes are shown in the figure above, with C on the left and JSON on the right.

How should we proceed if we want to break free of this diversity of concrete syntaxes? The data structures to be transmitted should be described regardless of the programming languages used on the transmitting and receiving sides. This description should also respect the lexical and grammatical rules of a certain language, but it should remain independent of programming languages and never be directly implemented on a machine. For all these reasons, we will call such a description an _abstract syntax_, and the language in which it is written an Abstract Syntax Notation, or ASN.

Though independent of programming languages, the abstract syntax notation should be at least as powerful as any language's datatype formalism. That is, it should be a recursive notation that allows building complex, nested data types from basic types (equivalent to the C types char, int, and float, for instance) and type constructors (equivalent to ```struct```, ```union```, and so on in C).
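To make the idea of basic types combined through constructors more tangible, here is a toy notation written as plain Python data. It is not ASN.1, and it is not how ASN.1 tools work internally; the names `INTEGER`, `BOOLEAN`, `SEQUENCE`, `SEQUENCE_OF`, and `conforms` are invented for this sketch and only loosely borrow ASN.1 vocabulary.

```python
# Basic types are plain strings; constructors return tuples that contain other types.
INTEGER = "INTEGER"
BOOLEAN = "BOOLEAN"


def SEQUENCE(**fields):
    """Constructor: an ordered set of named fields (like a C struct)."""
    return ("SEQUENCE", fields)


def SEQUENCE_OF(element_type):
    """Constructor: a list whose elements all share one type (like an array)."""
    return ("SEQUENCE_OF", element_type)


def conforms(value, type_):
    """Recursively check that a value respects a type description."""
    if type_ == INTEGER:
        return isinstance(value, int) and not isinstance(value, bool)
    if type_ == BOOLEAN:
        return isinstance(value, bool)
    kind, detail = type_
    if kind == "SEQUENCE":
        return (isinstance(value, dict) and value.keys() == detail.keys()
                and all(conforms(value[k], t) for k, t in detail.items()))
    if kind == "SEQUENCE_OF":
        return isinstance(value, list) and all(conforms(v, detail) for v in value)
    return False


# A nested description built from the pieces above
Reading = SEQUENCE(value=INTEGER, threshold=INTEGER, hysteresis=INTEGER)
Report = SEQUENCE(sensor_ok=BOOLEAN, readings=SEQUENCE_OF(Reading))

message = {"sensor_ok": True,
           "readings": [{"value": 50, "threshold": 44, "hysteresis": 2}]}
print(conforms(message, Report))                           # True
print(conforms({"sensor_ok": 1, "readings": []}, Report))  # False: 1 is not a BOOLEAN
```

The description of `Report` says nothing about C, Python, or byte layouts, yet it is precise enough for a program to check messages against it.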
Many different messages can be exchanged between applications, and the abstract syntax describes these messages in a more condensed way. To do so, the abstract syntax must be defined by means of a grammar that the data to be transferred should respect. One should therefore bear in mind the two levels of grammar: first, the grammar of the abstract syntax notation (ASN) itself; second, the abstract syntax, which is also a grammar and is defined using the former. The abstract syntax notation must be formal, to prevent ambiguity when it is interpreted and handled by computing tools and parsers.

The abstract syntax precisely defines the data structure but says nothing about the associated semantics, that is, the interpretation of the data by the application. What meaning should we associate with a TRUE Boolean value? What am I supposed to do if no value is assigned to a field? Many such questions remain unanswered because they do not fall within the scope of a data transfer; they are resolved when the data is used by the application. The semantic part is described, if necessary, using comments within the abstract syntax or through documentation associated with it.

The advantages of a data transfer that is independent of the machine architectures are clear. As for the data received as streams of bytes or bits, they comply with a syntax called _transfer syntax_ so that these streams can be properly recognized by the interface peripherals in the peer machine. Of course, this transfer syntax depends on the abstract syntax, since it sets out how the data should be transmitted according to this abstract syntax. In fact, the transfer syntax structures and orders the bytes that are sent to the other machine. This process is called "serialization", but it should not be confused with serialization in the sense discussed later under [[Semiconductors#Field Programmable Gate Arrays (FPGA)#SerDes and High-Speed Transceivers|SerDes]].
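As a small illustration of what a transfer syntax pins down, the sketch below serializes the three thermostat values with Python's `struct` module under two different byte orders. The function names are made up for the example, and neither encoding is one of the standardized ASN.1 encoding rules; the point is only that the same abstract content can travel as different byte streams, so sender and receiver must agree on which one is in use.

```python
import struct


def encode_big_endian(value, threshold, hysteresis):
    """Three unsigned 16-bit integers, most significant byte first."""
    return struct.pack(">HHH", value, threshold, hysteresis)


def encode_little_endian(value, threshold, hysteresis):
    """The same three integers, least significant byte first."""
    return struct.pack("<HHH", value, threshold, hysteresis)


reading = (0x0032, 0x002C, 0x0002)
be = encode_big_endian(*reading)
le = encode_little_endian(*reading)
print(be.hex())  # 0032002c0002
print(le.hex())  # 32002c000200

# Decoding with the matching rule recovers the data
assert struct.unpack(">HHH", be) == reading

# Decoding with the wrong rule silently yields different numbers
print(struct.unpack("<HHH", be))  # (12800, 11264, 512)
```

Nothing in the byte stream itself reveals which rule was used, which is why the transfer syntax has to be agreed on beforehand.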
Unlike the abstract syntax, however, the transfer syntax is a more physical matter and must therefore take into account the ordering of the bytes, the weight of the bits, and so on. Different transfer syntaxes can be associated with a single abstract syntax. This is particularly relevant when the throughput increases and makes more complex [[Semiconductors#Field Programmable Gate Arrays (FPGA)#SerDes and High-Speed Transceivers#Encoding Schemes|encoding]] necessary; in such a case, it is possible to change the transfer syntax without changing the abstract syntax.

When you have two systems exchanging data, there are always data structures somewhere waiting to be populated with the data traversing the channels. ASN.1 is used extensively in 5G, in particular in the RRC protocol.

![The Syntax Triad](site/Resources/media/image21.png)

> [!Figure]
> _The Syntax Triad. Source: #ref/Dub_

More about ASN can be found [here](https://luca.ntop.org/Teaching/Appunti/asn1.html), and the classic reference is #ref/Dubuisson.

## Foreign Function Interfaces

Again, imagine two machines. One runs application software written in C, whose data structures are defined in a set of header files. This application sends messages that contain data originating from these structures. Different messages correspond to different structures, and the software uses a different ID for each type of message and sends it over a socket. The other machine runs the client program in Python, which is supposed to ingest the incoming messages and load its own data structures accordingly.

In the previous section, we discussed the scenario where programs written in dissimilar languages send data to each other, and how this required the transmitter and receiver to use an abstract syntax to transfer data.
We saw that, by using this abstract syntax, the receiver can reconstruct data structures because it can correctly interpret the incoming data. This approach, of course, requires both ends to have ASN interpreters that transform the abstract syntax into concrete syntax. Without ASN interpreters, the receiving end must use the same concrete syntax as the transmitting end, despite being written in a different language. This is possible, and the mechanism for achieving it is called a Foreign Function Interface, or FFI.

A Foreign Function Interface (FFI) is a way of allowing a program written in one programming language to use services written in another. At the heart of FFI is the ability to describe the data types of the foreign language in a way that the host language can understand. This is handy because different programming languages represent data types differently.

The use of FFI can introduce some complexities, though. It requires a good understanding of both the host and foreign languages, and there are potential risks associated with it. For example, if the foreign functions are written in a lower-level language like C or C++, improper use of FFI can lead to memory management issues, security vulnerabilities, and inelegant crashes.

In the Python ecosystem, ```ctypes``` and ```cffi``` are examples of libraries that provide FFI capabilities, allowing Python objects to interact with C functions and data structures. We will use ```ctypes``` in this example. Using ```ctypes```, one can create, manipulate, and inspect C data types directly in Python. This makes it a powerful tool for low-level system programming and for creating Python bindings to software written in C.

The figure below captures the idea of the scenario we are discussing. Note how the header must be present on both sides of the "channel".
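Before the full example, here is a minimal hand-written sketch of what creating C data types in Python looks like with `ctypes`. The class mirrors the `TemperatureData` struct from the thermostat example; the payload bytes are invented test data, and the use of `LittleEndianStructure` assumes a little-endian sender.

```python
import ctypes
import struct


class TemperatureData(ctypes.LittleEndianStructure):
    """Python twin of the C struct: three uint16_t members, in the same order."""
    _fields_ = [
        ("value", ctypes.c_uint16),
        ("threshold", ctypes.c_uint16),
        ("hysteresis", ctypes.c_uint16),
    ]


# Bytes as they might arrive from the C side (made-up values)
payload = struct.pack("<HHH", 50, 44, 2)

data = TemperatureData.from_buffer_copy(payload)
print(data.value, data.threshold, data.hysteresis)  # 50 44 2
print(ctypes.sizeof(TemperatureData))               # 6

# A payload shorter than the struct is rejected with a ValueError
try:
    TemperatureData.from_buffer_copy(payload[:4])
except ValueError as err:
    print("rejected:", err)
```

The twin only works because its field order and types repeat what the C header declares, which is why both sides need access to the same definition.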
![Data serialization using Foreign Function Interface features](site/Resources/media/image22.png)

> [!Figure]
> _Data serialization using Foreign Function Interface features_

Note that serializing is easier when the data types are not nested, that is, when the structs being serialized are composed of primitive data types. Un-nesting data structures for serialization is possible but substantially more cumbersome.

The example below implements serialization using FFI, relying on a common understanding of the data structures involved on both sides of the communication channel. It works like this: the receiver is passed a struct name and a path to the header files. The receiver searches the path and finds the struct definition in a header file. It then parses the struct definition and creates a twin C structure in Python using ```ctypes```, which is populated with the data as the message arrives. The code is shown below.

```python
# Concrete Syntax Interpreter using Foreign Function Interface (FFI) features in languages
# This code takes a frame as an input and the name of a data structure as defined in the
# source, and takes care of the rest: it fetches the definition of the structure in the
# source, creates a mirror struct in ctypes, and parses the incoming frame as per the structure.
# Note that frame structures with non-standard data types might not be supported.
# Note also that support of data types is a work in progress.
# Author: IC (2023)
import binascii
import ctypes
import re
import socket
import struct
import subprocess
import sys

obj = None
Payload = b""

# Create a UDP socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# Put here the IP and port where the frames are received from
sock.bind(('127.0.0.1', 2235))

# Fragments of C type names and the ctypes types they map to
CTYPE_NAMES = {
    "uint8": "ctypes.c_ubyte",
    "uint16": "ctypes.c_uint16",
    "uint32": "ctypes.c_uint32",
    "float": "ctypes.c_float",
    "long": "ctypes.c_long",
}


# Here we extract the whole struct definition from the header file and iterate over it
# to construct a ctypes twin
def parse_struct(struct_name, directory_name):
    command = ["pcregrep", "-rM", r"--include=.*\.[hc]$",
               rf"typedef struct\s*\n?\s*\{{\s*[^{{}}]*\n\s*\}}\s*{struct_name}\s*;",
               directory_name]
    result = subprocess.run(command, stdout=subprocess.PIPE, universal_newlines=True,
                            stderr=subprocess.DEVNULL)
    # Keep only member declarations: lines that start with a letter and contain ';'
    lines = []
    for line in result.stdout.splitlines():
        line = line.lstrip()
        if line and line[0].isalpha() and ";" in line:
            print(line)
            lines.append(line)

    # Tidy things up a bit: drop the semicolon and split into [type, name, ...]
    res = []
    for line in lines:
        el = line.replace(';', '').split()
        print(el)
        res.append(el)

    # Here we start to write the code for manufacturing the twin ctypes struct
    struct_code = f"class {struct_name}(ctypes.Structure):\n"
    struct_code += "    _fields_ = [\n"
    dim_pattern = re.compile(r'(?<=\[)[^\]]+')
    for member in res:
        varType = member[0]
        # Strip any array suffix from the member name
        match = re.search(r'^(\w+)(?=\[)', member[1])
        varName = match.group(1) if match else member[1]
        # Collect the array dimensions, if any; sizes given as macros are looked up
        dims = []
        for content in dim_pattern.findall(member[1]):
            if content.isdigit():
                dims.append(int(content))
            else:
                dims.append(search_define(content, directory_name)[0])
        rows = dims[0] if len(dims) > 0 else 0
        cols = dims[1] if len(dims) > 1 else 0
        print(varType, varName, rows, cols)
        for c_name, ctype_name in CTYPE_NAMES.items():
            if c_name in varType:
                if cols > 0:      # matrix: T name[rows][cols]
                    struct_code += f"        ('{varName}', ({ctype_name}*{cols})*{rows}),\n"
                elif rows > 0:    # array: T name[rows]
                    struct_code += f"        ('{varName}', {ctype_name}*{rows}),\n"
                else:             # scalar
                    struct_code += f"        ('{varName}', {ctype_name}),\n"
                break
    struct_code += "    ]\n"
    print(struct_code)
    exec(struct_code, globals())
    return globals()[struct_name]


# Here we create a ctypes structure class from an explicit list of (name, ctypes type) pairs
def create_ctypes_struct(struct_name, member_list):
    struct_code = f"class {struct_name}(ctypes.Structure):\n"
    struct_code += "    _fields_ = [\n"
    for member_name, member_ctype in member_list:
        struct_code += f"        ('{member_name}', {member_ctype}),\n"
    struct_code += "    ]\n"
    exec(struct_code, globals())
    print(struct_code)
    return globals()[struct_name]


# Here we go and find a define in order to expand arrays and matrices
def search_define(pattern, path):
    command = ["pcregrep", "-rM", "-s", r"--include=.*\.[hc]$", f"#define {pattern}", path]
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    output = result.stdout
    if len(output) > 0:
        # Keep what follows the '#', which drops the file name pcregrep prints first
        output = output.decode().split("#")[1]
    else:
        return None
    value = re.findall(r'\d+', output)
    return [int(i) for i in value]


def print_ctypes_structure(structure):
    for field_name, field_type in structure._fields_:
        field_value = getattr(structure, field_name)
        print(field_name, field_value)
        # Check if field_value is an array
        if isinstance(field_value, ctypes.Array):
            if len(field_value) > 0 and isinstance(field_value[0], ctypes.Array):
                # 2D array
                for i in range(len(field_value)):
                    for j in range(len(field_value[i])):
                        print(f"  {i},{j}: {field_value[i][j]}")
            else:
                # 1D array
                for i in range(len(field_value)):
                    print(f"  {i}: {field_value[i]}")


def parseFrame(FRAME_ID, structName, directoryName, offset=16):
    # Receive data from the socket as per frame ID
    global obj
    global Payload
    struct_cls = parse_struct(structName, directoryName)
    obj = struct_cls()
    while True:
        data, address = sock.recvfrom(1024)
        # The first two bytes carry the message ID, big-endian
        MID = struct.unpack('>H', data[:2])[0]
        Payload = data[offset:]
        if MID == FRAME_ID:
            print(binascii.hexlify(Payload))
            try:
                obj = struct_cls.from_buffer_copy(Payload)
            except ValueError:
                print("Error: Most likely the parsed struct differs in size with the original. Check data types")
                sys.exit(1)
            print_ctypes_structure(obj)
            return obj


if __name__ == "__main__":
    print(f"Arguments count: {len(sys.argv)}")
    for i, arg in enumerate(sys.argv):
        print(f"Argument {i:>6}: {arg}")
    if len(sys.argv) > 4:
        frame = parseFrame(int(sys.argv[1], 0), sys.argv[2], sys.argv[3], int(sys.argv[4]))
    else:
        frame = parseFrame(int(sys.argv[1], 0), sys.argv[2], sys.argv[3])
```

You can see that the code relies heavily on regular expressions. The scariest one, used inside the ```parse_struct``` function, is easily explained: it matches a specific format of a C struct definition. In short, this regex searches for a C typedef that looks something like this:

```C
typedef struct
{
    // struct fields here
} StructName;
```

The other regular expressions are simpler and aim to find and parse all members of the structure of interest. If the members happen to be arrays or matrices whose sizes are defined using ```#define``` macros, the code also searches for the definition of those macros and sizes the ```ctypes``` arrays and matrices accordingly. Finally, as the data arrives, the twin structure is populated, mirroring the data structure on the remote end.

What this example ultimately shows is that abstract syntax interpreters are not always obligatory, and there might be simpler solutions as long as both ends of the channel have access to the concrete syntax used to format the messages being sent.
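The same idea can be condensed into a self-contained sketch that needs no external tools or network. It is a simplified variant, not the code above: it swaps `pcregrep` for Python's `re` module, builds the class with an ordinary `class` statement instead of `exec`, handles only scalar members of a few fixed-width types, assumes a little-endian sender, and feeds in a hand-built frame instead of a UDP datagram. The header text, the `0x0101` message ID, and the helper names are invented for illustration; the frame layout (big-endian ID in the first two bytes, payload at offset 16) follows the receiver above.

```python
import ctypes
import re
import struct

HEADER_TEXT = """
#include <stdint.h>
typedef struct
{
    uint16_t value;      // Temperature value
    uint16_t threshold;  // Pre-programmed threshold value
    uint16_t hysteresis; // Hysteresis value
} TemperatureData;
"""

C_TO_CTYPES = {"uint8_t": ctypes.c_uint8, "uint16_t": ctypes.c_uint16,
               "uint32_t": ctypes.c_uint32, "float": ctypes.c_float}


def build_twin(struct_name, header_text):
    """Find 'typedef struct { ... } struct_name;' and mirror it as a ctypes class."""
    pattern = rf"typedef\s+struct\s*\{{([^{{}}]*)\}}\s*{re.escape(struct_name)}\s*;"
    match = re.search(pattern, header_text)
    if match is None:
        raise LookupError(f"struct {struct_name} not found")
    # Each member looks like '<type> <name>;' (scalars only in this sketch)
    members = re.findall(r"(\w+)\s+(\w+)\s*;", match.group(1))
    fields = [(name, C_TO_CTYPES[c_type]) for c_type, name in members]

    class Twin(ctypes.LittleEndianStructure):
        _fields_ = fields

    Twin.__name__ = struct_name
    return Twin


def parse_frame(frame, expected_id, twin, offset=16):
    """Return a populated twin if the frame carries the expected message ID."""
    (message_id,) = struct.unpack(">H", frame[:2])
    if message_id != expected_id:
        return None
    return twin.from_buffer_copy(frame[offset:])


Twin = build_twin("TemperatureData", HEADER_TEXT)
# Hand-built frame: 2-byte ID, 14 bytes of unused header, then the payload
frame = struct.pack(">H", 0x0101) + bytes(14) + struct.pack("<HHH", 50, 44, 2)
obj = parse_frame(frame, 0x0101, Twin)
print({name: getattr(obj, name) for name, _ in Twin._fields_})
# {'value': 50, 'threshold': 44, 'hysteresis': 2}
```

Frames with a different message ID are ignored (`parse_frame` returns `None`), mirroring the ID check in the full receiver.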