Raspberry Pi Pico: Dual-core Synchronization and Booting M33 and RISC-V cores

Raspberry Pi Pico Intermediate: Booting the M33 and RISC-V Cores on the RP2350

Abstract

The Raspberry Pi Pico 2 W, built around the RP2350 microcontroller, represents a significant step up in embedded processing power, featuring a dual-core complex that can run Cortex-M33 and/or RISC-V Hazard3 cores at up to 150 MHz. For low-level embedded development, this architecture necessitates a fundamental shift in the boot process. This guide provides a bare-metal deep-dive into the critical challenges of the RP2350: satisfying the mandatory bootrom by embedding image metadata (IMAGE_DEF), coordinating two cores that run asynchronously, and establishing a robust Inter-Processor Communication (IPC) link using shared memory and atomic operations.

1. Introduction

The Raspberry Pi Pico 2 W, featuring the RP2350 microcontroller, is a watershed moment for high-performance embedded systems. Moving beyond the RP2040’s dual Cortex-M0+ architecture, the RP2350 introduces a sophisticated dual-core complex that can utilize two ARM Cortex-M33 cores or a combination of one Cortex-M33 and one RISC-V Hazard3 processor, all running up to 150 MHz. A quick comparison between the RP2040 and RP2350 is shown below:

ARM Core Comparison: RP2040 vs. RP2350

Feature	RP2040	RP2350 (ARM Configuration)
CPU Architecture	ARM Cortex-M0+ (ARMv6-M instruction set)	ARM Cortex-M33 (ARMv8-M instruction set)
Core Count	Dual-core (2 cores total)	Dual-core (2 cores active at a time)
Max Clock Speed	133 MHz (overclockable)	150 MHz (overclockable)
Floating Point Unit (FPU)	No	Yes (single-precision hardware FPU, double-precision coprocessor)
DSP Instructions	No	Yes
Security Features	None	Arm TrustZone, secure boot, SHA-256 acceleration, hardware TRNG
SRAM	264 KB	520 KB

For the embedded hacker, this enhanced architecture—coupled with new security mandates—transforms the boot sequence from a straightforward jump to a complex, multi-stage choreography. True mastery of the RP2350 requires bypassing the Software Development Kit (SDK) and addressing the silicon directly. This guide details the two fundamental challenges of bare-metal development on the RP2350: satisfying the mandatory Secure Bootrom and achieving asynchronous core synchronization and communication.

2. Bootloader Concept

Like the RP2040 before it, the RP2350 incorporates a fixed bootrom that executes immediately upon power-up. On the RP2350, however, this bootrom is not simply a jump-starter; it is the enforcer of the chip’s security architecture, managing features like Secure Boot, versioning, and potential A/B updates.

This security foundation dictates a crucial, non-negotiable requirement for bare-metal developers: a simple C binary compiled with standard tools will not execute on the RP2350. The system first executes its bootrom, which then validates user firmware before proceeding to execution.

3. ARM Cortex M33 and RISC-V Cores

The RP2350 introduces a duality of processing options: developers may configure both cores as Cortex-M33, or they may utilize one Cortex-M33 and one RISC-V Hazard3 core. This potential heterogeneity complicates core synchronization beyond the simpler, identical dual Cortex-M0+ architecture of the RP2040.

The two cores run entirely asynchronously from the moment the second core is released from reset. This necessitates managing two potentially dissimilar instruction sets (ARMv8-M vs. RISC-V) and ensuring that the two compiled binary images (or sections of a single image) are tailored for their respective targets.

The RP2350 microcontroller provides a choice between ARM Cortex-M33 and Hazard3 RISC-V cores, selectable via software or at boot time. Here’s a comparison of the key features of the two core options within the RP2350:

Feature	ARM Cortex-M33	Hazard3 RISC-V
Architecture	Proprietary Armv8-M instruction set	Open-source RISC-V (RV32IMAC+) instruction set
Clock Speed	Up to 150 MHz	Up to 150 MHz
Floating Point	Has dedicated hardware single-precision FPU (Floating Point Unit) and DSP instructions, plus a double-precision coprocessor for add/subtract/multiply/divide/square root	No hardware FPU; floating-point arithmetic is performed in software
Performance	Generally faster in floating-point-heavy tasks due to the dedicated FPU	Can be the more performant choice for integer code depending heavily on compiler optimizations (e.g., `-O3`)
Security	Supports Arm TrustZone for Cortex-M, allowing for isolation of code and data	Operates with the same security/privilege levels and global bus filtering as the ARM core, but TrustZone features are specific to the ARM architecture
Ecosystem	Benefits from the mature, widely adopted ARM ecosystem and commercial support	Has a growing but less mature ecosystem and toolchain compared to ARM
Licensing	Proprietary, requiring licensing fees for chip designers	Open-standard, open-source core design (developed by a Raspberry Pi engineer in his spare time)

4. Bootloader Image

To proceed, the user application binary must contain specific, predefined image metadata blocks, known as IMAGE_DEF. This metadata allows the bootrom to interpret the image, validate its integrity, and locate the user code’s entry point. Without this block, the bootrom will fail validation and refuse to execute the application.

4.1 The Requirement for IMAGE_DEF

The mandatory requirement for specific image metadata blocks (IMAGE_DEF) embedded within the compiled binary is a critical technical hurdle for bare-metal developers. The minimum viable image metadata required is described in the RP2350 Datasheet section 5.9.5.

4.2 Custom Linker Scripts

The fix for the mandatory metadata requirement involves two steps: defining the metadata structure and ensuring the linker places it in the correct memory section, exactly where the bootrom expects to find it.

4.3 Assembly Definition of the Metadata Block

The minimum requirement is defining a data block that conforms to the IMAGE_DEF structure (detailed in the RP2350 datasheet). This is typically achieved using embedded assembly code to create a specific data section, often named .embedded_block.

A simplified representation of this assembly inclusion, often managed by the SDK’s internal files, looks like this in a bare-metal context:

// In an assembly file (.S): embedded_metadata.S

.section .embedded_block, "a"
.p2align 2
.global embedded_block_start
.global embedded_block_end

embedded_block_start:
    // Define mandatory fields of IMAGE_DEF here, including:
    // - Structure magic numbers
    // - Image length and offset pointers
    // - Version information
    // - Optional signature fields (for Secure Boot)
embedded_block_end:
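
For readers who prefer to keep the metadata in C, the same block can be expressed as a constant array. The sketch below encodes what is understood to be the minimal Arm IMAGE_DEF from datasheet section 5.9.5. The five word values, the array name image_def_block, and the IMAGE_DEF_ATTR macro are assumptions made for illustration and should be checked against the datasheet before flashing anything.

```c
#include <stdint.h>

/* Place the array in .embedded_block on the target only; host builds
   (used for testing) leave it in the default data section. */
#if defined(__arm__) || defined(__riscv)
#define IMAGE_DEF_ATTR __attribute__((section(".embedded_block"), used, aligned(4)))
#else
#define IMAGE_DEF_ATTR
#endif

/* Assumed marker values; verify against datasheet section 5.9.5. */
#define BLOCK_MARKER_START 0xffffded3u
#define BLOCK_MARKER_END   0xab123579u

IMAGE_DEF_ATTR const uint32_t image_def_block[] = {
    BLOCK_MARKER_START,
    0x10210142u, /* IMAGE_TYPE item (0x42), 1 word: executable, Secure, Arm, RP2350 */
    0x000001ffu, /* LAST item (0xff): the preceding items total 1 word */
    0x00000000u, /* link to the next block: 0 means the block loop points to itself */
    BLOCK_MARKER_END,
};
```

A RISC-V image would need different flags in the second word (0x11010142 is the value recalled from the same datasheet section, again to be verified). The linker script still has to keep the .embedded_block section and place it where the bootrom looks for it.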

4.4 Custom Linker Configuration

The linker script (.ld file) is then customized so that the .embedded_block section lands near the start of the image in QSPI flash, inside the region the bootrom scans for metadata (per the datasheet, the first 4 KB of the image). By defining the memory regions and explicitly placing .embedded_block at the beginning of the application’s code space, the developer satisfies the bootrom’s image check, after which the bootrom hands control to the application on Core 0.
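
Placement is easy to get wrong, so it can help to check the finished binary on the host before flashing. The following is a minimal sketch of such a check, not the bootrom's actual algorithm: it only looks for a start marker inside an assumed 4 KB search window and a matching end marker later in the image. The function name find_block_offset, the window size, and the marker constants are assumptions to verify against the datasheet.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values; verify against the RP2350 datasheet. */
#define BLOCK_MARKER_START  0xffffded3u
#define BLOCK_MARKER_END    0xab123579u
#define SEARCH_WINDOW_BYTES 4096u

/* Returns the byte offset of the first start marker that lies inside the
   search window and is followed by an end marker, or -1 if none is found. */
long find_block_offset(const uint32_t *image, size_t n_words)
{
    size_t window = SEARCH_WINDOW_BYTES / sizeof(uint32_t);
    if (window > n_words) {
        window = n_words;
    }
    for (size_t i = 0; i < window; i++) {
        if (image[i] != BLOCK_MARKER_START) {
            continue;
        }
        for (size_t j = i + 1; j < n_words; j++) {
            if (image[j] == BLOCK_MARKER_END) {
                return (long)(i * sizeof(uint32_t));
            }
        }
    }
    return -1;
}
```

A build script could load the linked .bin file into a word array and fail the build when this function returns -1, which catches a linker script that silently dropped or misplaced the section.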

5. Core Wakeup

Once the bootrom has validated the image and executed the primary core, Core 0 (typically the Cortex-M33) takes control. This core is responsible for establishing the fundamental operating environment and initiating the execution of the second core.

5.1 Core 0 Initialization: Clock and Memory Setup

The RP2350’s memory map is fixed in hardware, so Core 0 does not create it; instead, the startup code and linker script must agree with it, covering the 520 KB of on-chip SRAM, the 4 MB of QSPI flash, and all peripheral registers. Core 0 must also initialize the system clocks (PLLs), which are often configured to run at the maximum 150 MHz operating frequency. In bare-metal C, this means using volatile pointers to directly access the hardware registers for control, such as:

// Example: Direct Register Access for Clock Setup (Illustrative)
#include <stdint.h>

// CLOCKS block base and CLK_PERI_CTRL offset for the RP2350;
// verify both against the datasheet before use.
#define CLOCKS_BASE          0x40010000u
#define CLK_PERI_CTRL_REG    (*(volatile uint32_t *)(CLOCKS_BASE + 0x48u))
#define CLK_PERI_CTRL_ENABLE (1u << 11)

void initialize_clocks(void) {
    // Select clk_sys (AUXSRC field = 0) as the peripheral clock source and enable it
    CLK_PERI_CTRL_REG = CLK_PERI_CTRL_ENABLE;
}
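
The 150 MHz figure mentioned above comes out of the PLL divider arithmetic, which is worth working through once by hand. The helper below is a small sketch of that calculation, assuming the 12 MHz crystal fitted to Pico boards; the function name pll_output_hz and the divider values in the usage note are illustrative, and the permitted VCO range should be checked in the datasheet.

```c
#include <stdint.h>

/* PLL output frequency:
   f_out = (f_ref / refdiv) * fbdiv / (postdiv1 * postdiv2) */
uint32_t pll_output_hz(uint32_t f_ref_hz, uint32_t refdiv, uint32_t fbdiv,
                       uint32_t postdiv1, uint32_t postdiv2)
{
    /* The intermediate VCO frequency can exceed 32 bits for large fbdiv. */
    uint64_t vco_hz = (uint64_t)(f_ref_hz / refdiv) * fbdiv;
    return (uint32_t)(vco_hz / (postdiv1 * postdiv2));
}
```

For example, pll_output_hz(12000000, 1, 125, 5, 2) gives a 1500 MHz VCO divided by 10, which is 150 MHz.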

5.2 Manual Core Handoff

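Core 0 must manually wake up Core 1. After reset, Core 1 waits inside the bootrom for a launch sequence, which Core 0 sends through the inter-core mailbox FIFOs in the SIO block:

Send the Handshake: Write the words 0, 0, 1 to the FIFO, draining stale data and signalling an event before each zero. Core 1 echoes every word it accepts; a mismatched echo means the sequence must restart from the beginning.
Send the Launch Parameters: Write the vector table address, the initial stack pointer, and the starting address of Core 1’s main function, in that order. This function is often called core1_entry().

Once the final word has been echoed, Core 1 loads the supplied stack pointer, jumps to the specified entry point, and begins execution, running entirely asynchronously from Core 0.

Example: Simplified Core Handoff (Conceptual)

// SIO mailbox FIFO register definitions (Must be verified against datasheet)
#include <stdint.h>

#define SIO_BASE    0xd0000000u
#define SIO_FIFO_ST (*(volatile uint32_t *)(SIO_BASE + 0x50u))
#define SIO_FIFO_WR (*(volatile uint32_t *)(SIO_BASE + 0x54u))
#define SIO_FIFO_RD (*(volatile uint32_t *)(SIO_BASE + 0x58u))
#define FIFO_ST_VLD (1u << 0) // Receive FIFO holds data
#define FIFO_ST_RDY (1u << 1) // Transmit FIFO has space

extern uint32_t __vector_table[]; // Assumed symbol name from the startup code
static uint32_t core1_stack[256] __attribute__((aligned(8))); // Core 1's own stack

// Entry function for Core 1
void core1_entry(void) {
    // Core 1-specific initialization (e.g., dedicated peripheral setup)
    while (1) {
        // Core 1's main loop
    }
}

void main_core0(void) {
    // 1. Core 0 Initialization (Clocks, Memory Map, etc.)

    // 2. Launch sequence: handshake words, then vector table, stack top, entry
    const uint32_t seq[6] = {0, 0, 1, (uint32_t)__vector_table,
                             (uint32_t)&core1_stack[256], (uint32_t)core1_entry};
    for (unsigned i = 0; i < 6;) {
        if (seq[i] == 0) { // Before each zero: drain stale words, wake Core 1
            while (SIO_FIFO_ST & FIFO_ST_VLD) (void)SIO_FIFO_RD;
            __asm volatile ("sev");
        }
        while (!(SIO_FIFO_ST & FIFO_ST_RDY)) { }
        SIO_FIFO_WR = seq[i];
        __asm volatile ("sev");
        // 3. Core 1 echoes each accepted word; restart on a mismatch
        while (!(SIO_FIFO_ST & FIFO_ST_VLD)) __asm volatile ("wfe");
        i = (SIO_FIFO_RD == seq[i]) ? i + 1 : 0;
    }
}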

6. Inter Core Communication

With both cores running asynchronously, a mechanism for safe data sharing is essential. The RP2350’s generous 520 KB of high-speed SRAM is the ideal medium for this Inter-Processor Communication (IPC).

6.1 Shared Memory Structures

A critical challenge is data consistency: ensuring both cores see the same, correct version of shared data. The strategy involves defining a dedicated segment within the SRAM for shared data structures, such as a messaging queue or a status buffer. Objects in this segment are often declared volatile so that the compiler does not cache them in registers or optimize accesses away. Note that volatile alone provides neither atomicity nor ordering, so it does not remove race conditions; it must be combined with the synchronization primitives described in the next section.
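
As an illustration of such a shared structure, the sketch below shows a single-producer, single-consumer message queue built on C11 atomics rather than bare volatile. It assumes exactly one core pushes and the other pops; the names ipc_queue_t, ipc_queue_push, and ipc_queue_pop are hypothetical, and a real project would still need to place the object in the shared SRAM segment through the linker script.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u /* must be a power of two */

typedef struct {
    _Atomic uint32_t head;       /* advanced only by the producer core */
    _Atomic uint32_t tail;       /* advanced only by the consumer core */
    uint32_t slots[QUEUE_SLOTS]; /* message payloads */
} ipc_queue_t;

/* Producer side: returns false when the queue is full. */
bool ipc_queue_push(ipc_queue_t *q, uint32_t msg)
{
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == QUEUE_SLOTS) {
        return false;
    }
    q->slots[head % QUEUE_SLOTS] = msg;
    /* Release: the slot write becomes visible before the new head value. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the queue is empty. */
bool ipc_queue_pop(ipc_queue_t *q, uint32_t *msg)
{
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail) {
        return false;
    }
    *msg = q->slots[tail % QUEUE_SLOTS];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}
```

Because each index has a single writer, no lock is needed for this particular pattern; structures with multiple writers require the locking approach described in the next section.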

6.2 Atomic Synchronization Primitives

To prevent data corruption when both cores attempt to read or write to the same shared memory location simultaneously, synchronization primitives are mandatory. In a bare-metal environment, this means using hardware-level atomic instructions or dedicated hardware spinlocks.

The bare-metal implementation involves:

Hardware Spinlocks: Utilizing a dedicated hardware spinlock peripheral available on the RP2350 (similar to its predecessor).
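Atomic Operations: Using the ARMv8-M instruction set’s exclusive load/store instructions (e.g., LDREX/STREX) to build atomic read-modify-write sequences for tasks like incrementing a shared counter or flipping a flag. On a Hazard3 core, the atomic instructions of the RISC-V "A" extension serve the same purpose.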

By wrapping all shared memory access within a critical section protected by a spinlock, the developer guarantees that only one core can manipulate the shared resource at any given time, thus ensuring data integrity and coordinated parallelism.
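
The critical-section pattern can be sketched in portable C. The example below deliberately swaps in a C11 atomic_flag software spinlock in place of the hardware spinlock peripheral, so that it can be compiled and tested on a host; on the target, the compiler lowers the flag operations to the architecture's own atomic instructions. All names here (sw_spinlock_t, shared_counter_increment, and so on) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    atomic_flag locked;
} sw_spinlock_t;

/* Returns true if the lock was free and is now held by the caller. */
bool sw_spinlock_try_lock(sw_spinlock_t *l)
{
    /* test_and_set returns the previous state: false means it was free. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

void sw_spinlock_lock(sw_spinlock_t *l)
{
    while (!sw_spinlock_try_lock(l)) {
        /* spin until the other core releases the lock */
    }
}

void sw_spinlock_unlock(sw_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

static sw_spinlock_t counter_lock = { ATOMIC_FLAG_INIT };
static uint32_t shared_counter;

/* Critical section: only one core at a time can update the counter. */
uint32_t shared_counter_increment(void)
{
    sw_spinlock_lock(&counter_lock);
    uint32_t value = ++shared_counter;
    sw_spinlock_unlock(&counter_lock);
    return value;
}
```

If an interrupt handler on the same core can take the same lock, interrupts would also need to be masked while the lock is held, otherwise the handler could spin forever on a lock its own core owns.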

7. Firmware Division

The RP2350’s core options, M33/M33 or M33/RISC-V, require explicit firmware division at the compilation and linking stage. The dual-core execution model demands that the developer manage two conceptually distinct execution environments:

Core 0 (Primary, M33): This core handles initialization tasks (clocks, memory map, peripheral setup), manages the boot sequence validation, and explicitly launches Core 1. It often runs the main control loop and higher-level application logic.
Core 1 (Secondary, M33 or RISC-V): This core runs dedicated, often time-critical or computationally intensive tasks. If Core 1 is the RISC-V Hazard3 core, its firmware must be compiled using a separate RISC-V toolchain, and its linker script must place code and data so that it reaches shared peripherals and SRAM at the same addresses Core 0 uses.

This division dictates that the application is not monolithic but a coordinated pair of binaries or sections, bound together by custom boot logic and the Inter-Core Communication mechanism.
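
One way to keep such a pair of binaries coordinated is a small header compiled by both toolchains. The sketch below is only an illustration under that assumption: it pins down the layout of a hypothetical shared mailbox structure with compile-time checks and uses the compilers' predefined macros (__riscv and __arm__) to report which target a given build is for. The names shared_mailbox_t and core_build_arch are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout shared by both firmware images; fixed-width fields only. */
typedef struct {
    uint32_t command;
    uint32_t argument;
    uint32_t status;
} shared_mailbox_t;

/* Fail the build if either toolchain lays the structure out differently. */
_Static_assert(sizeof(shared_mailbox_t) == 12, "shared_mailbox_t size changed");
_Static_assert(offsetof(shared_mailbox_t, status) == 8, "status offset changed");

/* Reports which instruction set this translation unit was compiled for. */
const char *core_build_arch(void)
{
#if defined(__riscv)
    return "riscv";
#elif defined(__arm__)
    return "arm";
#else
    return "host";
#endif
}
```

Including this header in both the Cortex-M33 and the Hazard3 build means a layout mismatch shows up as a compile error instead of as corrupted messages at run time.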

8. Conclusion

Mastering this low-level initialization and synchronization process, from appeasing the mandatory secure bootrom to managing two potentially different CPU instruction sets, is the necessary prerequisite for unlocking the full performance and security potential of the Raspberry Pi Pico 2 W. The complexity introduced by the IMAGE_DEF requirement and the manual orchestration of core wakeup and inter-core communication transforms bare-metal development on the RP2350 into a high-value skill for the professional embedded hacker 😉