AI Tutorial: Understanding the basics of X-CUBE-AI
Abstract
In our last lab, we used TensorFlow Lite for Microcontrollers (TFLM). TFLM acts as an interpreter: it loads your model at runtime and figures out which math operations to run. This is flexible but adds overhead.
Today, we switch to STM32 X-CUBE-AI. This tool works more like a compiler: it takes your Keras model (.h5 file), analyzes it, and converts the neural network directly into optimized C code. No interpreter logic occupies your RAM, only the math required for your specific layers. This often results in faster inference and lower memory usage.
In this lab, we will deploy the same HAR (Human Activity Recognition) Keras model onto the B-U585I-IOT02A board using the STM32 professional toolchain.
1. Introduction to Compiler-Based Edge AI
1.1 The Interpreter vs. The Compiler Paradigm
In Lab 1, we successfully deployed a foundational Neural Network using TensorFlow Lite for Microcontrollers (TFLM). TFLM operates as an Interpreter: it loads a standardized model file at runtime and uses a generic set of highly optimized kernels (like CMSIS-NN) to execute the graph. This approach is highly flexible and portable across different hardware platforms (STM32, ESP32, Pico).
However, TFLM introduces a necessary overhead—memory is required for the interpreter logic, the model structure, and the large Tensor Arena.
In this lab, we explore a different, often more efficient, deployment method driven by a Compiler. We will use STM32 X-CUBE-AI, a proprietary expansion package from STMicroelectronics. This tool takes your trained Keras model, analyzes its specific layers, and converts the entire network directly into highly optimized, layer-specific C code. The result is a lighter memory footprint and typically faster inference, as there is no runtime interpreter overhead.
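To make the interpreter/compiler contrast concrete, here is a toy sketch in plain C. It is not TFLM or X-CUBE-AI code: the `op_t` type, the three operations, and `demo_graph` are hypothetical names invented for this illustration. The first function walks a graph description at runtime; the second is what the same graph could look like once "compiled" into fixed code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy illustration only: neither function is TFLM or X-CUBE-AI code. */
typedef enum { OP_SCALE, OP_ADD, OP_RELU } op_kind_t;

typedef struct {
    op_kind_t kind;
    float     param;   /* unused for OP_RELU */
} op_t;

/* Interpreter style: the graph is data, walked and dispatched at runtime. */
float run_interpreted(const op_t *ops, size_t n_ops, float x)
{
    assert(ops != NULL);
    for (size_t i = 0; i < n_ops; i++) {
        switch (ops[i].kind) {
        case OP_SCALE: x *= ops[i].param; break;
        case OP_ADD:   x += ops[i].param; break;
        case OP_RELU:  if (x < 0.0f) { x = 0.0f; } break;
        }
    }
    return x;
}

/* Compiler style: the same graph emitted as fixed code, with no dispatch loop
 * and no graph description kept in memory. */
float run_compiled(float x)
{
    x = x * 2.0f + 1.0f;
    return (x < 0.0f) ? 0.0f : x;
}

/* Hypothetical graph: y = relu(2 * x + 1) */
const op_t demo_graph[] = {
    { OP_SCALE, 2.0f }, { OP_ADD, 1.0f }, { OP_RELU, 0.0f }
};
```

Both functions compute the same result; the difference is that the interpreted version needs the graph table and the dispatch logic at runtime, while the compiled version only contains the arithmetic for this one graph.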
1.2 The Application: Human Activity Recognition (HAR)
Transitioning from the theoretical Sine Wave, this lab introduces a genuine, real-world problem: Human Activity Recognition (HAR).
HAR is the task of classifying an individual’s physical state (e.g., standing, walking, running) based on time-series data from a sensor. We will use the integrated Inertial Measurement Unit (IMU) on the B-U585I-IOT02A Discovery Kit to feed 3-axis accelerometer data into a 1D Convolutional Neural Network (1D CNN). This exercise demonstrates how to handle continuous, multi-dimensional sensor input—a core requirement for nearly all TinyML projects.
1.3 Navigating the Professional Toolchain
Mastering TinyML for production requires familiarity with the professional-grade toolchains favored by major chip manufacturers.
This lab will guide you through the STMicroelectronics Ecosystem:
- STM32CubeIDE/CubeMX: Used for hardware configuration, clock setup, peripheral initialization (I2C for the IMU, UART for debugging), and code generation.
- X-CUBE-AI: The dedicated compiler that integrates the model as a self-contained C-library into your CubeIDE project.
By the end of this lab, you will not only have a running HAR model but a profound understanding of the differences between Interpreter-based (TFLM) and Compiler-based (X-CUBE-AI) embedded AI deployment methods.
1.4 Lab Objectives
This laboratory exercise is broken down into four key stages:
- Model Review: Understanding the pre-trained 1D CNN HAR model and its data normalization requirements.
- Hardware Configuration: Setting up the STM32U5 board peripherals (IMU I2C interface and interrupt handling) using STM32CubeMX.
- Model Compilation: Importing the Keras model into the X-CUBE-AI GUI to generate the optimized C-library source files.
- Application Coding: Integrating the sensor reading loop with the generated AI library, normalizing the data, and performing real-time inference.
2. Prerequisites
- Hardware: B-U585I-IOT02A (STM32U5 Discovery Kit IoT Node).
- Software:
- STM32CubeIDE (Integrated Development Environment).
- STM32CubeMX (Configuration Tool, usually built into CubeIDE).
- X-CUBE-AI Expansion Pack: You must install this inside CubeMX (Software Packs -> Manage Software Packs -> STMicroelectronics -> X-CUBE-AI).
- Google Colab or another Python notebook environment.
- Files: The .h5 model file provided in this article.
3. Model Explanation
To focus on the embedded side, this article uses ST's notebook for the HAR model and the provided dataset, which has only 3 classes: standing, walking, and running. Everything is available on GitHub: stm32ai-wiki/AI_resources/HAR at master · STMicroelectronics/stm32ai-wiki
HAR is a time-series problem that a 1D CNN, among many other approaches, handles well.
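As a rough illustration of what one 1D convolution filter does with 3-axis time-series data, the sketch below slides a small kernel along the time axis and applies a ReLU. It is not the notebook's model or X-CUBE-AI's generated kernel: the function name `conv1d_relu`, the time-major buffer layout, and the "valid" padding are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define N_AXES 3   /* x, y, z from the accelerometer */

/*
 * One output channel of a "valid" 1D convolution followed by ReLU.
 * in:      n_steps x N_AXES samples, time-major
 * kernel:  k_len x N_AXES weights
 * out:     n_steps - k_len + 1 values
 * Returns the number of output values written.
 */
size_t conv1d_relu(const float *in, size_t n_steps,
                   const float *kernel, size_t k_len,
                   float bias, float *out)
{
    assert(in && kernel && out);
    assert(k_len > 0 && k_len <= n_steps);

    size_t n_out = n_steps - k_len + 1;
    for (size_t t = 0; t < n_out; t++) {
        float acc = bias;
        for (size_t k = 0; k < k_len; k++) {
            for (size_t c = 0; c < N_AXES; c++) {
                acc += in[(t + k) * N_AXES + c] * kernel[k * N_AXES + c];
            }
        }
        out[t] = (acc > 0.0f) ? acc : 0.0f;   /* ReLU */
    }
    return n_out;
}
```

A real layer runs many such filters in parallel, each with its own kernel and bias, and later layers reduce the result to one score per activity class.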
The notebook can be automatically opened with this link: Human Activity Recognition – Colab
Quickly going through the notebook: the framework used is TensorFlow and this is the first thing imported in it (the notebook is quite old, but still works fine, even if the versions are updated to the latest):
The code then loads the dataset, which is fully available on GitHub; the notebook downloads and unzips it. The dataset contains 92 samples divided into 3 classes:
After that, the dataset is loaded into memory and this creates two lists:
- x_recordings, which contains all the data; each entry holds the [x, y, z] values from the accelerometer
- y_recordings, which contains the class labels
The code also provides a short preview of what the data looks like:
The code then normalizes the data by dividing each sample by the accelerometer's full-scale value. This means the accelerometer can be configured for a different range, as long as the readings are divided by that range's full scale. On the embedded side, we will also use ±4 g.
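The normalization step can be sketched in C as follows. This is a minimal example assuming readings in milli-g and a ±4 g full scale, which matches the 4000.0f divisor used later in main.c; the `axes_mg_t` type and `normalize_axes` function are hypothetical stand-ins, not part of the ISM330DHCX driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACC_FULL_SCALE_MG 4000.0f   /* +-4 g range expressed in milli-g */

/* Hypothetical stand-in for the driver's axes structure (values in mg). */
typedef struct { int32_t x, y, z; } axes_mg_t;

/* Map each axis from [-4000 mg; +4000 mg] to [-1; 1]. */
void normalize_axes(const axes_mg_t *raw, float out[3])
{
    assert(raw != NULL && out != NULL);
    out[0] = (float)raw->x / ACC_FULL_SCALE_MG;
    out[1] = (float)raw->y / ACC_FULL_SCALE_MG;
    out[2] = (float)raw->z / ACC_FULL_SCALE_MG;
}
```

If the sensor were configured for another range, only `ACC_FULL_SCALE_MG` would change; the model still sees values in [-1; 1].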
There are several ways to split the data into training and test sets; this implementation uses the simplest one, called hold-out:
Finally, the model is defined using a 1D CNN approach:
Here is a visual representation of the model:
The last step before saving the model is to evaluate it in a clear and reliable way. In this case, a confusion matrix is used.
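For readers who want to see how a confusion matrix is tallied, here is a small C sketch. It is not the notebook's evaluation code (which runs in Python); the function names and the row/column convention (rows are true classes, columns are predicted classes) are assumptions chosen for this example. The same tally could be used to validate predictions collected from the board.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_CLASSES 3   /* the three activity classes */

/* cm[t][p] counts samples whose true class is t and predicted class is p. */
void confusion_matrix(const uint8_t *y_true, const uint8_t *y_pred,
                      size_t n, uint32_t cm[N_CLASSES][N_CLASSES])
{
    assert(y_true && y_pred && cm);
    for (size_t i = 0; i < N_CLASSES; i++) {
        for (size_t j = 0; j < N_CLASSES; j++) {
            cm[i][j] = 0;
        }
    }
    for (size_t i = 0; i < n; i++) {
        assert(y_true[i] < N_CLASSES && y_pred[i] < N_CLASSES);
        cm[y_true[i]][y_pred[i]]++;
    }
}

/* Accuracy = sum of the diagonal / total number of samples. */
float accuracy_from_cm(uint32_t cm[N_CLASSES][N_CLASSES])
{
    uint32_t correct = 0, total = 0;
    for (size_t i = 0; i < N_CLASSES; i++) {
        for (size_t j = 0; j < N_CLASSES; j++) {
            total += cm[i][j];
            if (i == j) {
                correct += cm[i][j];
            }
        }
    }
    return (total == 0) ? 0.0f : (float)correct / (float)total;
}
```

Off-diagonal entries show which classes get confused with each other, which a single accuracy number hides.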
With the model validated, the notebook saves it as an *.h5 file, and it is from this point that we start the embedded development using X-CUBE-AI:
4. Basic Hardware Usage
According to the board schematic, PH4/PH5 are used for I2C and PE11 for the data-ready interrupt. The UART can be used to display the data on a terminal:
5. Step 1: The CubeMX Configuration
Unlike Arduino, we do not write the model into a header file manually. We use the GUI.
Start Project: Open STM32CubeIDE and start a new STM32 project. Select the STM32U585AII6Q:
In the newly created project, add the X-CUBE-MEMS1 software pack, which provides the ISM330DHCX driver. Then configure I2C2 on the pins mentioned in section 4, and EXTI11 on PE11:
Add the UART for debugging and monitoring purposes:
As a checkpoint, generate the code and, in STM32CubeIDE, add the following code. The entire main.c is provided:
/* USER CODE BEGIN Header */
/**
******************************************************************************
* @file : main.c
* @brief : Main program body
******************************************************************************
* @attention
*
* Copyright (c) 2025 STMicroelectronics.
* All rights reserved.
*
* This software is licensed under terms that can be found in the LICENSE file
* in the root directory of this software component.
* If no LICENSE file comes with this software, it is provided AS-IS.
*
******************************************************************************
*/
/* USER CODE END Header */
/* Includes ------------------------------------------------------------------*/
#include "main.h"
/* Private includes ----------------------------------------------------------*/
/* USER CODE BEGIN Includes */
#include "ism330dhcx.h"
#include "custom_bus.h"
#include <stdio.h>
/* USER CODE END Includes */
/* Private typedef -----------------------------------------------------------*/
/* USER CODE BEGIN PTD */
ISM330DHCX_Object_t MotionSensor;
volatile uint32_t dataRdyIntReceived;
/* USER CODE END PTD */
/* Private define ------------------------------------------------------------*/
/* USER CODE BEGIN PD */
/* USER CODE END PD */
/* Private macro -------------------------------------------------------------*/
/* USER CODE BEGIN PM */
/* USER CODE END PM */
/* Private variables ---------------------------------------------------------*/
CRC_HandleTypeDef hcrc;
UART_HandleTypeDef huart1;
/* USER CODE BEGIN PV */
static void MEMS_Init(void);
/* USER CODE END PV */
/* Private function prototypes -----------------------------------------------*/
void SystemClock_Config(void);
static void MX_GPIO_Init(void);
static void MX_USART1_UART_Init(void);
static void MX_CRC_Init(void);
static void MX_ICACHE_Init(void);
/* USER CODE BEGIN PFP */
/* USER CODE END PFP */
/* Private user code ---------------------------------------------------------*/
/* USER CODE BEGIN 0 */
ISM330DHCX_Axes_t acc_axes;
/* USER CODE END 0 */
/**
* @brief The application entry point.
* @retval int
*/
int main(void)
{
/* USER CODE BEGIN 1 */
/* USER CODE END 1 */
/* MCU Configuration--------------------------------------------------------*/
/* Reset of all peripherals, Initializes the Flash interface and the Systick. */
HAL_Init();
/* USER CODE BEGIN Init */
/* USER CODE END Init */
/* Configure the system clock */
SystemClock_Config();
/* USER CODE BEGIN SysInit */
/* USER CODE END SysInit */
/* Initialize all configured peripherals */
MX_GPIO_Init();
MX_USART1_UART_Init();
MX_CRC_Init();
MX_ICACHE_Init();
/* USER CODE BEGIN 2 */
dataRdyIntReceived = 0;
MEMS_Init();
/* USER CODE END 2 */
/* Infinite loop */
/* USER CODE BEGIN WHILE */
while (1)
{
/* USER CODE END WHILE */
/* USER CODE BEGIN 3 */
if (dataRdyIntReceived != 0) {
dataRdyIntReceived = 0;
ISM330DHCX_ACC_GetAxes(&MotionSensor, &acc_axes);
printf("% 5d, % 5d, % 5d\r\n", (int) acc_axes.x, (int) acc_axes.y, (int) acc_axes.z);
}
}
/* USER CODE END 3 */
}
/**
* @brief System Clock Configuration
* @retval None
*/
void SystemClock_Config(void)
{
RCC_OscInitTypeDef RCC_OscInitStruct = {0};
RCC_ClkInitTypeDef RCC_ClkInitStruct = {0};
/** Configure the main internal regulator output voltage
*/
if (HAL_PWREx_ControlVoltageScaling(PWR_REGULATOR_VOLTAGE_SCALE1) != HAL_OK)
{
Error_Handler();
}
/** Initializes the CPU, AHB and APB buses clocks
*/
RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_MSI;
RCC_OscInitStruct.MSIState = RCC_MSI_ON;
RCC_OscInitStruct.MSICalibrationValue = RCC_MSICALIBRATION_DEFAULT;
RCC_OscInitStruct.MSIClockRange = RCC_MSIRANGE_0;
RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON;
RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_MSI;
RCC_OscInitStruct.PLL.PLLMBOOST = RCC_PLLMBOOST_DIV4;
RCC_OscInitStruct.PLL.PLLM = 3;
RCC_OscInitStruct.PLL.PLLN = 10;
RCC_OscInitStruct.PLL.PLLP = 2;
RCC_OscInitStruct.PLL.PLLQ = 2;
RCC_OscInitStruct.PLL.PLLR = 1;
RCC_OscInitStruct.PLL.PLLRGE = RCC_PLLVCIRANGE_1;
RCC_OscInitStruct.PLL.PLLFRACN = 0;
if (HAL_RCC_OscConfig(&RCC_OscInitStruct) != HAL_OK)
{
Error_Handler();
}
/** Initializes the CPU, AHB and APB buses clocks
*/
RCC_ClkInitStruct.ClockType = RCC_CLOCKTYPE_HCLK|RCC_CLOCKTYPE_SYSCLK
|RCC_CLOCKTYPE_PCLK1|RCC_CLOCKTYPE_PCLK2
|RCC_CLOCKTYPE_PCLK3;
RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_PLLCLK;
RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1;
RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV1;
RCC_ClkInitStruct.APB2CLKDivider = RCC_HCLK_DIV1;
RCC_ClkInitStruct.APB3CLKDivider = RCC_HCLK_DIV1;
if (HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_4) != HAL_OK)
{
Error_Handler();
}
}
/**
* @brief CRC Initialization Function
* @param None
* @retval None
*/
static void MX_CRC_Init(void)
{
/* USER CODE BEGIN CRC_Init 0 */
/* USER CODE END CRC_Init 0 */
/* USER CODE BEGIN CRC_Init 1 */
/* USER CODE END CRC_Init 1 */
hcrc.Instance = CRC;
hcrc.Init.DefaultPolynomialUse = DEFAULT_POLYNOMIAL_ENABLE;
hcrc.Init.DefaultInitValueUse = DEFAULT_INIT_VALUE_ENABLE;
hcrc.Init.InputDataInversionMode = CRC_INPUTDATA_INVERSION_NONE;
hcrc.Init.OutputDataInversionMode = CRC_OUTPUTDATA_INVERSION_DISABLE;
hcrc.InputDataFormat = CRC_INPUTDATA_FORMAT_BYTES;
if (HAL_CRC_Init(&hcrc) != HAL_OK)
{
Error_Handler();
}
/* USER CODE BEGIN CRC_Init 2 */
/* USER CODE END CRC_Init 2 */
}
/**
* @brief ICACHE Initialization Function
* @param None
* @retval None
*/
static void MX_ICACHE_Init(void)
{
/* USER CODE BEGIN ICACHE_Init 0 */
/* USER CODE END ICACHE_Init 0 */
/* USER CODE BEGIN ICACHE_Init 1 */
/* USER CODE END ICACHE_Init 1 */
/** Enable instruction cache (default 2-ways set associative cache)
*/
if (HAL_ICACHE_Enable() != HAL_OK)
{
Error_Handler();
}
/* USER CODE BEGIN ICACHE_Init 2 */
/* USER CODE END ICACHE_Init 2 */
}
/**
* @brief USART1 Initialization Function
* @param None
* @retval None
*/
static void MX_USART1_UART_Init(void)
{
/* USER CODE BEGIN USART1_Init 0 */
/* USER CODE END USART1_Init 0 */
/* USER CODE BEGIN USART1_Init 1 */
/* USER CODE END USART1_Init 1 */
huart1.Instance = USART1;
huart1.Init.BaudRate = 115200;
huart1.Init.WordLength = UART_WORDLENGTH_8B;
huart1.Init.StopBits = UART_STOPBITS_1;
huart1.Init.Parity = UART_PARITY_NONE;
huart1.Init.Mode = UART_MODE_TX_RX;
huart1.Init.HwFlowCtl = UART_HWCONTROL_NONE;
huart1.Init.OverSampling = UART_OVERSAMPLING_16;
huart1.Init.OneBitSampling = UART_ONE_BIT_SAMPLE_DISABLE;
huart1.Init.ClockPrescaler = UART_PRESCALER_DIV1;
huart1.AdvancedInit.AdvFeatureInit = UART_ADVFEATURE_NO_INIT;
if (HAL_UART_Init(&huart1) != HAL_OK)
{
Error_Handler();
}
if (HAL_UARTEx_SetTxFifoThreshold(&huart1, UART_TXFIFO_THRESHOLD_1_8) != HAL_OK)
{
Error_Handler();
}
if (HAL_UARTEx_SetRxFifoThreshold(&huart1, UART_RXFIFO_THRESHOLD_1_8) != HAL_OK)
{
Error_Handler();
}
if (HAL_UARTEx_DisableFifoMode(&huart1) != HAL_OK)
{
Error_Handler();
}
/* USER CODE BEGIN USART1_Init 2 */
/* USER CODE END USART1_Init 2 */
}
/**
* @brief GPIO Initialization Function
* @param None
* @retval None
*/
static void MX_GPIO_Init(void)
{
GPIO_InitTypeDef GPIO_InitStruct = {0};
/* USER CODE BEGIN MX_GPIO_Init_1 */
/* USER CODE END MX_GPIO_Init_1 */
/* GPIO Ports Clock Enable */
__HAL_RCC_GPIOH_CLK_ENABLE();
__HAL_RCC_GPIOA_CLK_ENABLE();
__HAL_RCC_GPIOE_CLK_ENABLE();
/*Configure GPIO pin : PE11 */
GPIO_InitStruct.Pin = GPIO_PIN_11;
GPIO_InitStruct.Mode = GPIO_MODE_IT_RISING;
GPIO_InitStruct.Pull = GPIO_NOPULL;
HAL_GPIO_Init(GPIOE, &GPIO_InitStruct);
/* EXTI interrupt init*/
HAL_NVIC_SetPriority(EXTI11_IRQn, 0, 0);
HAL_NVIC_EnableIRQ(EXTI11_IRQn);
/* USER CODE BEGIN MX_GPIO_Init_2 */
/* USER CODE END MX_GPIO_Init_2 */
}
/* USER CODE BEGIN 4 */
static void MEMS_Init(void)
{
ISM330DHCX_IO_t io_ctx;
uint8_t id;
ISM330DHCX_AxesRaw_t axes;
/* Link I2C functions to the ISM330DHCX driver */
io_ctx.BusType = ISM330DHCX_I2C_BUS;
io_ctx.Address = ISM330DHCX_I2C_ADD_H;
io_ctx.Init = BSP_I2C2_Init;
io_ctx.DeInit = BSP_I2C2_DeInit;
io_ctx.ReadReg = BSP_I2C2_ReadReg;
io_ctx.WriteReg = BSP_I2C2_WriteReg;
io_ctx.GetTick = BSP_GetTick;
ISM330DHCX_RegisterBusIO(&MotionSensor, &io_ctx);
/* Read the ISM330DHCX WHO_AM_I register */
ISM330DHCX_ReadID(&MotionSensor, &id);
if (id != ISM330DHCX_ID) {
Error_Handler();
}
/* Initialize the ISM330DHCX sensor */
ISM330DHCX_Init(&MotionSensor);
/* Configure the ISM330DHCX accelerometer (ODR, scale and interrupt) */
ISM330DHCX_ACC_SetOutputDataRate(&MotionSensor, 26.0f); /* 26 Hz */
ISM330DHCX_ACC_SetFullScale(&MotionSensor, 4); /* [-4000mg; +4000mg] */
ISM330DHCX_Set_INT1_Drdy(&MotionSensor, ENABLE); /* Enable DRDY */
ISM330DHCX_ACC_GetAxesRaw(&MotionSensor, &axes); /* Clear DRDY */
/* Start the ISM330DHCX accelerometer */
ISM330DHCX_ACC_Enable(&MotionSensor);
}
void HAL_GPIO_EXTI_Rising_Callback(uint16_t GPIO_Pin)
{
if (GPIO_Pin == GPIO_PIN_11) {
dataRdyIntReceived++;
}
}
int _write(int fd, char * ptr, int len)
{
HAL_UART_Transmit(&huart1, (uint8_t *) ptr, len, HAL_MAX_DELAY);
return len;
}
/* USER CODE END 4 */
/**
* @brief This function is executed in case of error occurrence.
* @retval None
*/
void Error_Handler(void)
{
/* USER CODE BEGIN Error_Handler_Debug */
/* User can add his own implementation to report the HAL error return state */
__disable_irq();
while (1)
{
}
/* USER CODE END Error_Handler_Debug */
}
#ifdef USE_FULL_ASSERT
/**
* @brief Reports the name of the source file and the source line number
* where the assert_param error has occurred.
* @param file: pointer to the source file name
* @param line: assert_param error line source number
* @retval None
*/
void assert_failed(uint8_t *file, uint32_t line)
{
/* USER CODE BEGIN 6 */
/* User can add his own implementation to report the file name and line number,
ex: printf("Wrong parameters value: file %s on line %d\r\n", file, line) */
/* USER CODE END 6 */
}
#endif /* USE_FULL_ASSERT */
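Build the project, program the board, and open your preferred terminal, such as Tera Term, with settings that match the USART1 configuration above (115200 baud, 8 data bits, no parity, 1 stop bit):
The three accelerometer axes should then be displayed as follows: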
6. Step 2: Add the model via X-CUBE-AI
Note: Code generation with X-CUBE-AI often deletes syscalls.c, so keep a copy of it and restore it after the code has been generated.
Go back to CubeMX and enable X-CUBE-AI:
- Click Software Packs -> Select Components.
- Find X-CUBE-AI. Check the box for Core.

- Close the window. You will see “Software Packs” in the left category pane.
Import Model:
- Click on X-CUBE-AI in the left pane.
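- Click Add Network. Keep the default name, network: the code in section 7 includes network.h and calls the ai_network_* functions, and X-CUBE-AI derives these names from the network name (naming it har_network would generate har_network.h and ai_har_network_* instead).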
- Set “Model Kind” to Keras.
- Browse and select your h5 file.

The “Magic” Button (Analyze):
- Click the Analyze button.
- X-CUBE-AI will compile your model virtually and report the exact RAM and Flash usage.
- Tip: If the RAM usage exceeds your chip’s limit, you can enable “Compression” here.
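The feasibility check behind the Analyze step can be thought of as a simple budget comparison. The sketch below is only an illustration under stated assumptions: the struct and function names are hypothetical, weights are assumed to live in flash and activations plus I/O buffers in RAM, and the numbers in the usage note are placeholders rather than values reported for this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what an analysis report provides. */
typedef struct {
    uint32_t weights_bytes;      /* read-only parameters, assumed in flash */
    uint32_t activations_bytes;  /* scratch buffer, assumed in RAM */
    uint32_t io_bytes;           /* input + output buffers, assumed in RAM */
} model_footprint_t;

/* Memory left for the model after the rest of the application. */
typedef struct {
    uint32_t flash_bytes_free;
    uint32_t ram_bytes_free;
} target_budget_t;

/* Returns true when both the flash and RAM requirements fit the budget. */
bool model_fits(const model_footprint_t *m, const target_budget_t *t)
{
    assert(m != NULL && t != NULL);
    uint64_t ram_needed = (uint64_t)m->activations_bytes + m->io_bytes;
    return (m->weights_bytes <= t->flash_bytes_free) &&
           (ram_needed <= t->ram_bytes_free);
}
```

For example, a placeholder model with 20000 bytes of weights, 3000 bytes of activations, and 400 bytes of I/O fits a budget of 100000 bytes of flash and 4000 bytes of RAM, but not one with only 3200 bytes of RAM.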

Generate Code: Click Project -> Generate Code. This creates a Core/Src and Core/Inc folder structure with your AI library generated in C.
7. Step 3: Coding the Application (main.c)
CubeMX generated the “Engine,” but we need to write the “Driver.” Open Core/Src/main.c.
A. Includes and Buffers
Add the generated headers and allocate memory buffers for the AI.
B. Initialization
Inside main(), before the while(1) loop, initialize the AI library.
C. The Inference Loop
Read the sensor, normalize the data, and run the inference.
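The buffering part of this loop can be sketched on its own, independent of the HAL and the AI library. The example below accumulates normalized samples until a full window is available and then, optionally, keeps the newest samples so that consecutive windows overlap. All names and sizes here are hypothetical: `WINDOW_STEPS` and `HOP_STEPS` are small placeholder values, whereas the real window length follows from `AI_NETWORK_IN_1_SIZE` in the generated code, and the main.c in this article uses non-overlapping windows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define N_AXES       3
#define WINDOW_STEPS 4                         /* placeholder window length */
#define WINDOW_SIZE  (WINDOW_STEPS * N_AXES)
#define HOP_STEPS    2                         /* placeholder: new samples per inference (50 % overlap) */

typedef struct {
    float  data[WINDOW_SIZE];
    size_t write_index;   /* next free float slot in data[] */
} window_t;

/* Append one normalized sample; returns true when data[] holds a full window. */
bool window_push(window_t *w, const float sample[N_AXES])
{
    assert(w != NULL && sample != NULL);
    assert(w->write_index + N_AXES <= WINDOW_SIZE);
    memcpy(&w->data[w->write_index], sample, N_AXES * sizeof(float));
    w->write_index += N_AXES;
    return w->write_index == WINDOW_SIZE;
}

/* After an inference: move the newest (WINDOW_STEPS - HOP_STEPS) samples
 * to the front so the next window overlaps the previous one. */
void window_slide(window_t *w)
{
    assert(w != NULL && w->write_index == WINDOW_SIZE);
    const size_t keep = (size_t)(WINDOW_STEPS - HOP_STEPS) * N_AXES;
    memmove(w->data, &w->data[WINDOW_SIZE - keep], keep * sizeof(float));
    w->write_index = keep;
}
```

In the application, `window_push` would be called once per data-ready interrupt with the normalized axes; when it returns true, the inference runs on `data[]`, followed by `window_slide`. Setting `HOP_STEPS` equal to `WINDOW_STEPS` gives the non-overlapping behavior.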
The entire main.c code is provided below:
/* USER CODE BEGIN Header */
/**
******************************************************************************
* @file : main.c
* @brief : Main program body
******************************************************************************
* @attention
*
* Copyright (c) 2025 STMicroelectronics.
* All rights reserved.
*
* This software is licensed under terms that can be found in the LICENSE file
* in the root directory of this software component.
* If no LICENSE file comes with this software, it is provided AS-IS.
*
******************************************************************************
*/
/* USER CODE END Header */
/* Includes ------------------------------------------------------------------*/
#include "main.h"
/* Private includes ----------------------------------------------------------*/
/* USER CODE BEGIN Includes */
#include "ism330dhcx.h"
#include "custom_bus.h"
#include <stdio.h>
#include "ai_platform.h"
#include "network.h"
#include "network_data.h"
/* USER CODE END Includes */
/* Private typedef -----------------------------------------------------------*/
/* USER CODE BEGIN PTD */
ISM330DHCX_Object_t MotionSensor;
volatile uint32_t dataRdyIntReceived;
ai_handle network;
float aiInData[AI_NETWORK_IN_1_SIZE];
float aiOutData[AI_NETWORK_OUT_1_SIZE];
ai_u8 activations[AI_NETWORK_DATA_ACTIVATIONS_SIZE];
const char* activities[AI_NETWORK_OUT_1_SIZE] = {
"stationary", "walking", "running"
};
ai_buffer * ai_input;
ai_buffer * ai_output;
uint32_t write_index;
/* USER CODE END PTD */
/* Private define ------------------------------------------------------------*/
/* USER CODE BEGIN PD */
/* USER CODE END PD */
/* Private macro -------------------------------------------------------------*/
/* USER CODE BEGIN PM */
/* USER CODE END PM */
/* Private variables ---------------------------------------------------------*/
CRC_HandleTypeDef hcrc;
UART_HandleTypeDef huart1;
/* USER CODE BEGIN PV */
static void MEMS_Init(void);
static void AI_Init(void);
static void AI_Run(float *pIn, float *pOut);
static uint32_t argmax(const float * values, uint32_t len);
/* USER CODE END PV */
/* Private function prototypes -----------------------------------------------*/
void SystemClock_Config(void);
static void MX_GPIO_Init(void);
static void MX_USART1_UART_Init(void);
static void MX_CRC_Init(void);
static void MX_ICACHE_Init(void);
/* USER CODE BEGIN PFP */
/* USER CODE END PFP */
/* Private user code ---------------------------------------------------------*/
/* USER CODE BEGIN 0 */
ISM330DHCX_Axes_t acc_axes;
/* USER CODE END 0 */
/**
* @brief The application entry point.
* @retval int
*/
int main(void)
{
/* USER CODE BEGIN 1 */
/* USER CODE END 1 */
/* MCU Configuration--------------------------------------------------------*/
/* Reset of all peripherals, Initializes the Flash interface and the Systick. */
HAL_Init();
/* USER CODE BEGIN Init */
/* USER CODE END Init */
/* Configure the system clock */
SystemClock_Config();
/* USER CODE BEGIN SysInit */
/* USER CODE END SysInit */
/* Initialize all configured peripherals */
MX_GPIO_Init();
MX_USART1_UART_Init();
MX_CRC_Init();
MX_ICACHE_Init();
/* USER CODE BEGIN 2 */
dataRdyIntReceived = 0;
MEMS_Init();
AI_Init();
/* USER CODE END 2 */
/* Infinite loop */
/* USER CODE BEGIN WHILE */
while (1)
{
/* USER CODE END WHILE */
/* USER CODE BEGIN 3 */
if (dataRdyIntReceived != 0) {
dataRdyIntReceived = 0;
ISM330DHCX_ACC_GetAxes(&MotionSensor, &acc_axes);
// printf("% 5d, % 5d, % 5d\r\n", (int) acc_axes.x, (int) acc_axes.y, (int) acc_axes.z);
/* Normalize data to [-1; 1] and accumulate into input buffer */
/* Note: window overlapping can be managed here */
aiInData[write_index + 0] = (float) acc_axes.x / 4000.0f;
aiInData[write_index + 1] = (float) acc_axes.y / 4000.0f;
aiInData[write_index + 2] = (float) acc_axes.z / 4000.0f;
write_index += 3;
if (write_index == AI_NETWORK_IN_1_SIZE) {
write_index = 0;
printf("Running inference\r\n");
AI_Run(aiInData, aiOutData);
/* Output results */
for (uint32_t i = 0; i < AI_NETWORK_OUT_1_SIZE; i++) {
printf("%8.6f ", aiOutData[i]);
}
uint32_t class = argmax(aiOutData, AI_NETWORK_OUT_1_SIZE);
printf(": %d - %s\r\n", (int) class, activities[class]);
}
}
}
/* USER CODE END 3 */
}
/**
* @brief System Clock Configuration
* @retval None
*/
void SystemClock_Config(void)
{
RCC_OscInitTypeDef RCC_OscInitStruct = {0};
RCC_ClkInitTypeDef RCC_ClkInitStruct = {0};
/** Configure the main internal regulator output voltage
*/
if (HAL_PWREx_ControlVoltageScaling(PWR_REGULATOR_VOLTAGE_SCALE1) != HAL_OK)
{
Error_Handler();
}
/** Initializes the CPU, AHB and APB buses clocks
*/
RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_MSI;
RCC_OscInitStruct.MSIState = RCC_MSI_ON;
RCC_OscInitStruct.MSICalibrationValue = RCC_MSICALIBRATION_DEFAULT;
RCC_OscInitStruct.MSIClockRange = RCC_MSIRANGE_0;
RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON;
RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_MSI;
RCC_OscInitStruct.PLL.PLLMBOOST = RCC_PLLMBOOST_DIV4;
RCC_OscInitStruct.PLL.PLLM = 3;
RCC_OscInitStruct.PLL.PLLN = 10;
RCC_OscInitStruct.PLL.PLLP = 2;
RCC_OscInitStruct.PLL.PLLQ = 2;
RCC_OscInitStruct.PLL.PLLR = 1;
RCC_OscInitStruct.PLL.PLLRGE = RCC_PLLVCIRANGE_1;
RCC_OscInitStruct.PLL.PLLFRACN = 0;
if (HAL_RCC_OscConfig(&RCC_OscInitStruct) != HAL_OK)
{
Error_Handler();
}
/** Initializes the CPU, AHB and APB buses clocks
*/
RCC_ClkInitStruct.ClockType = RCC_CLOCKTYPE_HCLK|RCC_CLOCKTYPE_SYSCLK
|RCC_CLOCKTYPE_PCLK1|RCC_CLOCKTYPE_PCLK2
|RCC_CLOCKTYPE_PCLK3;
RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_PLLCLK;
RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1;
RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV1;
RCC_ClkInitStruct.APB2CLKDivider = RCC_HCLK_DIV1;
RCC_ClkInitStruct.APB3CLKDivider = RCC_HCLK_DIV1;
if (HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_4) != HAL_OK)
{
Error_Handler();
}
}
/**
* @brief CRC Initialization Function
* @param None
* @retval None
*/
static void MX_CRC_Init(void)
{
/* USER CODE BEGIN CRC_Init 0 */
/* USER CODE END CRC_Init 0 */
/* USER CODE BEGIN CRC_Init 1 */
/* USER CODE END CRC_Init 1 */
hcrc.Instance = CRC;
hcrc.Init.DefaultPolynomialUse = DEFAULT_POLYNOMIAL_ENABLE;
hcrc.Init.DefaultInitValueUse = DEFAULT_INIT_VALUE_ENABLE;
hcrc.Init.InputDataInversionMode = CRC_INPUTDATA_INVERSION_NONE;
hcrc.Init.OutputDataInversionMode = CRC_OUTPUTDATA_INVERSION_DISABLE;
hcrc.InputDataFormat = CRC_INPUTDATA_FORMAT_BYTES;
if (HAL_CRC_Init(&hcrc) != HAL_OK)
{
Error_Handler();
}
/* USER CODE BEGIN CRC_Init 2 */
/* USER CODE END CRC_Init 2 */
}
/**
* @brief ICACHE Initialization Function
* @param None
* @retval None
*/
static void MX_ICACHE_Init(void)
{
/* USER CODE BEGIN ICACHE_Init 0 */
/* USER CODE END ICACHE_Init 0 */
/* USER CODE BEGIN ICACHE_Init 1 */
/* USER CODE END ICACHE_Init 1 */
/** Enable instruction cache (default 2-ways set associative cache)
*/
if (HAL_ICACHE_Enable() != HAL_OK)
{
Error_Handler();
}
/* USER CODE BEGIN ICACHE_Init 2 */
/* USER CODE END ICACHE_Init 2 */
}
/**
* @brief USART1 Initialization Function
* @param None
* @retval None
*/
static void MX_USART1_UART_Init(void)
{
/* USER CODE BEGIN USART1_Init 0 */
/* USER CODE END USART1_Init 0 */
/* USER CODE BEGIN USART1_Init 1 */
/* USER CODE END USART1_Init 1 */
huart1.Instance = USART1;
huart1.Init.BaudRate = 115200;
huart1.Init.WordLength = UART_WORDLENGTH_8B;
huart1.Init.StopBits = UART_STOPBITS_1;
huart1.Init.Parity = UART_PARITY_NONE;
huart1.Init.Mode = UART_MODE_TX_RX;
huart1.Init.HwFlowCtl = UART_HWCONTROL_NONE;
huart1.Init.OverSampling = UART_OVERSAMPLING_16;
huart1.Init.OneBitSampling = UART_ONE_BIT_SAMPLE_DISABLE;
huart1.Init.ClockPrescaler = UART_PRESCALER_DIV1;
huart1.AdvancedInit.AdvFeatureInit = UART_ADVFEATURE_NO_INIT;
if (HAL_UART_Init(&huart1) != HAL_OK)
{
Error_Handler();
}
if (HAL_UARTEx_SetTxFifoThreshold(&huart1, UART_TXFIFO_THRESHOLD_1_8) != HAL_OK)
{
Error_Handler();
}
if (HAL_UARTEx_SetRxFifoThreshold(&huart1, UART_RXFIFO_THRESHOLD_1_8) != HAL_OK)
{
Error_Handler();
}
if (HAL_UARTEx_DisableFifoMode(&huart1) != HAL_OK)
{
Error_Handler();
}
/* USER CODE BEGIN USART1_Init 2 */
/* USER CODE END USART1_Init 2 */
}
/**
* @brief GPIO Initialization Function
* @param None
* @retval None
*/
static void MX_GPIO_Init(void)
{
GPIO_InitTypeDef GPIO_InitStruct = {0};
/* USER CODE BEGIN MX_GPIO_Init_1 */
/* USER CODE END MX_GPIO_Init_1 */
/* GPIO Ports Clock Enable */
__HAL_RCC_GPIOH_CLK_ENABLE();
__HAL_RCC_GPIOA_CLK_ENABLE();
__HAL_RCC_GPIOE_CLK_ENABLE();
/*Configure GPIO pin : PE11 */
GPIO_InitStruct.Pin = GPIO_PIN_11;
GPIO_InitStruct.Mode = GPIO_MODE_IT_RISING;
GPIO_InitStruct.Pull = GPIO_NOPULL;
HAL_GPIO_Init(GPIOE, &GPIO_InitStruct);
/* EXTI interrupt init*/
HAL_NVIC_SetPriority(EXTI11_IRQn, 0, 0);
HAL_NVIC_EnableIRQ(EXTI11_IRQn);
/* USER CODE BEGIN MX_GPIO_Init_2 */
/* USER CODE END MX_GPIO_Init_2 */
}
/* USER CODE BEGIN 4 */
static void MEMS_Init(void)
{
ISM330DHCX_IO_t io_ctx;
uint8_t id;
ISM330DHCX_AxesRaw_t axes;
/* Link I2C functions to the ISM330DHCX driver */
io_ctx.BusType = ISM330DHCX_I2C_BUS;
io_ctx.Address = ISM330DHCX_I2C_ADD_H;
io_ctx.Init = BSP_I2C2_Init;
io_ctx.DeInit = BSP_I2C2_DeInit;
io_ctx.ReadReg = BSP_I2C2_ReadReg;
io_ctx.WriteReg = BSP_I2C2_WriteReg;
io_ctx.GetTick = BSP_GetTick;
ISM330DHCX_RegisterBusIO(&MotionSensor, &io_ctx);
/* Read the ISM330DHCX WHO_AM_I register */
ISM330DHCX_ReadID(&MotionSensor, &id);
if (id != ISM330DHCX_ID) {
Error_Handler();
}
/* Initialize the ISM330DHCX sensor */
ISM330DHCX_Init(&MotionSensor);
/* Configure the ISM330DHCX accelerometer (ODR, scale and interrupt) */
ISM330DHCX_ACC_SetOutputDataRate(&MotionSensor, 26.0f); /* 26 Hz */
ISM330DHCX_ACC_SetFullScale(&MotionSensor, 4); /* [-4000mg; +4000mg] */
ISM330DHCX_Set_INT1_Drdy(&MotionSensor, ENABLE); /* Enable DRDY */
ISM330DHCX_ACC_GetAxesRaw(&MotionSensor, &axes); /* Clear DRDY */
/* Start the ISM330DHCX accelerometer */
ISM330DHCX_ACC_Enable(&MotionSensor);
}
void HAL_GPIO_EXTI_Rising_Callback(uint16_t GPIO_Pin)
{
if (GPIO_Pin == GPIO_PIN_11) {
dataRdyIntReceived++;
}
}
int _write(int fd, char * ptr, int len)
{
HAL_UART_Transmit(&huart1, (uint8_t *) ptr, len, HAL_MAX_DELAY);
return len;
}
static void AI_Init(void)
{
ai_error err;
/* Create a local array with the addresses of the activations buffers */
const ai_handle act_addr[] = { activations };
/* Create an instance of the model */
err = ai_network_create_and_init(&network, act_addr, NULL);
if (err.type != AI_ERROR_NONE) {
printf("ai_network_create error - type=%d code=%d\r\n", err.type, err.code);
Error_Handler();
}
ai_input = ai_network_inputs_get(network, NULL);
ai_output = ai_network_outputs_get(network, NULL);
}
static void AI_Run(float *pIn, float *pOut)
{
ai_i32 batch;
ai_error err;
/* Update IO handlers with the data payload */
ai_input[0].data = AI_HANDLE_PTR(pIn);
ai_output[0].data = AI_HANDLE_PTR(pOut);
batch = ai_network_run(network, ai_input, ai_output);
if (batch != 1) {
err = ai_network_get_error(network);
printf("AI ai_network_run error - type=%d code=%d\r\n", err.type, err.code);
Error_Handler();
}
}
static uint32_t argmax(const float * values, uint32_t len)
{
float max_value = values[0];
uint32_t max_index = 0;
for (uint32_t i = 1; i < len; i++) {
if (values[i] > max_value) {
max_value = values[i];
max_index = i;
}
}
return max_index;
}
/* USER CODE END 4 */
/**
* @brief This function is executed in case of error occurrence.
* @retval None
*/
void Error_Handler(void)
{
/* USER CODE BEGIN Error_Handler_Debug */
/* User can add his own implementation to report the HAL error return state */
__disable_irq();
while (1)
{
}
/* USER CODE END Error_Handler_Debug */
}
#ifdef USE_FULL_ASSERT
/**
* @brief Reports the name of the source file and the source line number
* where the assert_param error has occurred.
* @param file: pointer to the source file name
* @param line: assert_param error line source number
* @retval None
*/
void assert_failed(uint8_t *file, uint32_t line)
{
/* USER CODE BEGIN 6 */
/* User can add his own implementation to report the file name and line number,
ex: printf("Wrong parameters value: file %s on line %d\r\n", file, line) */
/* USER CODE END 6 */
}
#endif /* USE_FULL_ASSERT */
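Before building, go to the project settings and enable float support for printf (in STM32CubeIDE, the MCU settings option "Use float with printf from newlib-nano (-u _printf_float)"):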
Build the application, load it into your STM32 board and you should see the AI model running:
- At idle, when the board is at rest, the serial output should display “stationary”.
- If you move the board up and down slowly to moderately fast, the serial output should display “walking”.
- If you shake the board quickly, the serial output should display “running”.
8. Lab Recap
We have now successfully deployed the same Neural Network using two different methods.
TFLM vs. X-CUBE-AI
| Feature | TensorFlow Lite (Lab 1) | X-CUBE-AI (Lab 2) |
| Deployment | Load Flatbuffer (.tflite) | Generated C code compiled into the firmware |
| Flexibility | High (Update model without recompiling code) | Low (Model is hard-coded into firmware) |
| RAM Usage | Higher (Needs interpreter overhead) | Typically lower (No interpreter, static allocation) |
| Speed | Good (Uses CMSIS-NN) | Typically faster (Layer-specific optimized C code) |
| Workflow | Python -> Header File | CubeMX GUI |
Why use X-CUBE-AI?
If you are using an STM32 chip, X-CUBE-AI allows you to squeeze larger models into smaller chips. The “Analyze” feature in CubeMX is invaluable for feasibility studies—telling you instantly if your model is too big for your hardware before you write a single line of code.


