Exciting news: NET4EXA was proudly represented at SC2025 in St. Louis, Missouri, from 16–21 November. From presentations and workshops to demonstrations of cutting-edge HPC technologies, the event was a great opportunity to showcase our project’s progress and engage with the high-performance computing community.Discover all our activities at SC2025 in this article!
The project’s presence included several highlights: Étienne Walter (Eviden) contributed to a Birds of a Feather (BoF) session organized by ETP4HPC, showcasing the project, while our partners Daniele De Sensi (Sapienza Università di Roma) presented his scientific publication in the “Collective Operations and Communication” session, and Michele Martinelli (INFN) took part in the Workshop on Accelerator Programming and Directives (WACCPD 2025). These contributions highlighted the latest research advances within NET4EXA.

NET4EXA at SC2025 – Birds of a Feather with ETP4HPC
One of the highlights for NET4EXA at SC2025 was our participation in the Birds of a Feather (BoF) session organized by ETP4HPC, the European Technology Platform for HPC. The session, held on 19 November in Room 275, focused on European collaboration for next-generation HPC technologies under the theme: “Where could Europe add value?”.
During this session, Etienne Walter showcased NET4EXA’s contributions, providing attendees with an overview of the project’s objectives, achievements, and its role within the broader European HPC ecosystem. The BoF offered an excellent platform to exchange ideas, strengthen collaborations, and learn more about key EuroHPC Joint Undertaking (EuroHPC JU) projects shaping the future of HPC in Europe.
We are grateful to ETP4HPC for the invitation and the opportunity to share our work with the European HPC community.

Advancing Collective Operations with Bine Trees
Another highlight from NET4EXA at SC2025 was Daniele De Sensi’s presentation on optimizing collective operations through innovative network communication strategies. His work introduces Bine Trees, a new approach to improving communication locality and performance in high-performance computing, and demonstrates the project’s commitment to tackling key challenges in next-generation HPC networks.
Daniele De Sensi presented his work on optimizing collective operations through innovative network communication strategies. He was invited to present his research at Supercomputing during the “Collective Operations and Communication” session. His article, titled “Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality” introduced Bine Trees, a new family of collective algorithms designed to address oversubscribed network topologies.
Bine Trees reduce traffic on oversubscribed links by up to 33%, leading to significant performance improvements. The algorithms’ performance was assessed on four supercomputers, three of which were procured by EuroHPC, demonstrating up to 80% performance gain. This work is directly related to the activities carried out within the NET4EXA project, focusing on optimizing communication locality for high-performance computing applications.
He was also invited to present at the Broadcom booth his work on the design of congestion-tolerant collective operations, covering both Bine Trees and in-network compute, which UNIROMA1 will further investigate in relation to the design of future BXI versions.
Read the full publication to dive deeper into Daniele’s research on Bine Trees >>> Here

Enabling Ultra-Low-Latency GPU-FPGA Communication
The final highlight from NET4EXA at SC2025 featured Michele’s presentation at the Workshop on Accelerator Programming and Directives (WACCPD 2025). His work focuses on developing ultra-low-latency communication mechanisms between GPUs and PCIe-based FPGA devices, a key enabler for next-generation HPC and AI workloads.
Michele took part in the Workshop on Accelerator Programming and Directives (WACCPD 2025), where he presented the publication titled:“Bridging FPGA and GPU over PCIe: A Low-Latency Communication Path using AVX-512.”
During his presentation, Michele highlighted the key results of the work, which introduces a new communication mechanism enabling ultra-low-latency data transfers between GPUs and PCIe-based FPGA devices.
Unlike traditional approaches relying on DMA engines, this method leverages Programmed I/O (PIO) operations to achieve sub-2µs one-way latency for small messages, when the FPGA acts as a network interface card (NIC).
The prototype was validated on APEnetX, a custom FPGA-based NIC, with a CPU engine capable of injecting data directly into the FPGA memory-mapped region using AVX-512 instructions, from either the host CPU or GPU memory. Microbenchmarks show that this approach achieves lower latency than classical RDMA for small packets, while maintaining a simpler software stack. Importantly, the method is generalizable to any NIC exposing a PCIe-mapped control aperture.
Low-latency communication is essential for scalable network architectures and will play a decisive role in the evolution of future BXI versions. As high-performance computing increasingly supports AI workloads and data-intensive simulations, having a fast, reliable network is critical for performance and scalability. This work demonstrates a key step toward enabling the next generation of HPC and AI-optimized networks within the NET4EXA project.
Read the full publication to dive deeper into Michele’s research on low-latency FPGA-GPU communication >>> Here

