2025 OCP APAC Summit: Full Schedule

The talk explored Azure’s purpose-built infrastructure—featuring advanced accelerators, scalable networking, and robust orchestration—and its journey of innovation through partnerships, including the Mount Diablo project with Meta/Google. Emphasis was placed on overcoming challenges in power delivery, cooling, and energy efficiency, with a call to reimagine system architecture and embrace high-voltage DC solutions to sustainably scale next-generation AI workloads.

Speakers

Kelvin Huang

Microsoft

Wednesday August 6, 2025 9:30am - 9:50am PDT
TaiNEX 2 - 701 E

AI Clusters

9:50am PDT

Catalina- Meta's Latest AI/ML System Overview

Wednesday August 6, 2025 9:50am - 10:10am PDT

TaiNEX 2 - 701 E

- Meta's latest AI/ML rack design, Catalina (GB200), Meta's latest AI system features a compute tray that serves as the primary CPU+GPU components. To expedite time-to-market, we leveraged industry solutions while implementing targeted customizations to optimize integration within Meta's infrastructure.
- The increasing power density of AI hardware poses significant challenges, including the need for liquid cooling, which introduces complexities in leak detection, system response, reliability, and safety. With multiple hardware platforms in rapid development, there is a pressing need for adaptable hardware that can manage these new interfaces and controls.
- Our solution, the RMC (Rack Management Controller) tray, addresses these challenges by providing a 1OU device that handles all leak detection and hardware response to leaks. The RMC offers flexible integration into upcoming AI platforms and interfaces with various systems, including Air-Assisted Liquid Cooling (AALC), Facility Liquid Cooling (FLC), and all leak sensors. The RMC provides a robust and reliable solution for managing liquid cooling across Meta’s multiple platforms.

Speakers

LJ Chen

10:10am PDT

AI Computing Design Trends for LLMs in the Generative AI Era

Wednesday August 6, 2025 10:10am - 10:30am PDT

TaiNEX 2 - 701 E

Large Language Models (LLMs) have demonstrated exceptional performance across numerous generative AI applications, but require large model parameter sizes. These parameters range from several billion to trillions, leading to significant computational demands for both AI training and inference. The growth rate of these computational requirements significantly outpaces advancements in semiconductor process technology. Consequently, innovative IC and system design techniques are essential to address challenges related to computing power, memory, bandwidth, energy consumption, and thermal management to meet AI computing needs.

In this talk, we will explore the evolution of LLMs in the generative AI era and their influence on AI computing design trends. For AI computing in data centers, both scale-up and scale-out strategies are employed to deliver the huge computational power required by LLMs. Conversely, even smaller LLM models for edge devices demand more resources than previous generations without LLMs. Moreover, edge devices may also act as orchestrators in device-cloud collaboration. These emerging trends will significantly shape the design of future computing architectures and influence the advancement of circuit and system designs.

Speakers

Bor-Sung (BS) Liang

MediaTek

Wednesday August 6, 2025 10:10am - 10:30am PDT
TaiNEX 2 - 701 E

AI Clusters

10:45am PDT

Role of Ethernet in the Next Generation AI System Architectures

Wednesday August 6, 2025 10:45am - 11:00am PDT

TaiNEX 2 - 701 E

The relentless demand for AI is driving hyperscalers to deploy ever-increasing clusters of GPUs and custom accelerators. As these deployments scale, system architectures must evolve to balance cost, performance, power, and reliability. A critical aspect of this evolution is the high-speed signaling that connects the various components. This presentation delves into the high-speed protocols such as PCIe, CXL, UALink, Ethernet, and Ultra Ethernet – exploring their intended use cases and evaluating where these protocols are complementary or competitive. Additionally, the presentation will address the evolving Scale-Up and Scale-Out architecture, highlighting their respective protocols and interconnect solutions. Special attention will be given to the adoption of Ethernet as a problem-solving technology in AI-driven environments. Through this discussion, we aim to provide a comprehensive overview of the options available and justify their use in modern cloud service architectures.

Speakers

Susmita Joshi

Product Line Manager, Astera

Wednesday August 6, 2025 10:45am - 11:00am PDT
TaiNEX 2 - 701 E

AI Clusters

11:00am PDT

High LLM Density for Sustainable AI Computing Now and Coming Years

Wednesday August 6, 2025 11:00am - 11:15am PDT

TaiNEX 2 - 701 E

FuriosaAI's technology demonstrates to infra/datacenter AI deployment professionals that rapid and more powerful advancements to GPUs are great for hyperscalers but poorly matched for typical data center (Leveraging OCP for Sovereign AI Plans - presented by Supermicro shows over 70% of data centers are 50kW - 0.5Mw.) The ability to openly choose compute projects that are designed to make computers more sustainable are the cornerstone of the OCP.

We will introduce the Tensor Contraction Processor (TCP), a novel architecture that reconceptualizes tensor contraction as the central computational primitive, enabling a broader class of operations beyond traditional matrix multiplication. And how it unlocks designing AI inference chips that can achieve the performance, programmability, and power efficiency trifecta for data centers.

Given the power constraints of data centers and the wide variation in rack power capacities, we are learning to evaluate total token generation throughput across AI accelerators within the same rack power budget, which is a metric that resonates strongly with our early enterprise and AI compute provider partners.

Speakers

Alex Liu

Furiosa AI

Wednesday August 6, 2025 11:00am - 11:15am PDT
TaiNEX 2 - 701 E

AI Clusters

11:30am PDT

Elastic Management Framework for the AI Cluster

Wednesday August 6, 2025 11:30am - 11:45am PDT

TaiNEX 2 - 701 E

In this presentation, we will showcase a new data center architecture based on OCP Rack with liquid-cooling equipment. Under such a new AI Custer, how to collect, store, analyze, and visualize data to provide data center managers with the ability to effectively manage such a new architecture. We also provide a mechanism to effectively cooperate with the existing operating support system to seamless integrate new AI cluster architecture into legacy datacenter management. Going further, we will propose a new approach on how to use AI methodology to manage AI Clusters as Wiwynn's future work.

Speakers

Karl Chiang

Director, Wiwynn

Wednesday August 6, 2025 11:30am - 11:45am PDT
TaiNEX 2 - 701 E

AI Clusters

11:45am PDT

AI Application Concepts and Hardware Optimization by Workload

Wednesday August 6, 2025 11:45am - 12:00pm PDT

TaiNEX 2 - 701 E

Inference tasks vary widely in complexity, data size, latency requirements, and parallelism, and each workload type interacts differently with CPU capabilities. Understanding this relationship allows for more effective hardware selection and optimization strategies tailored to specific use cases.

Key Learning Areas
-AI Model Architecture
-Types of Inference Workloads
-Quantization: Balancing Accuracy and Efficiency
-Data Throughput and Bandwidth
-Benchmarking Inference Performance
-Frameworks and Libraries Impact Performance

Speakers

Gerry Juan

System Architect, Intel

Wednesday August 6, 2025 11:45am - 12:00pm PDT
TaiNEX 2 - 701 E

AI Clusters

1:00pm PDT

Technical Deep Dive of UEC 1.0

Wednesday August 6, 2025 1:00pm - 1:15pm PDT

TaiNEX 2 - 701 E

The recent publication of Ultra Ethernet 1.0 is an ambitious project of tuning the Ethernet stack to accommodate AI and HPC workloads. It covers everything from physical layer to software APIs. What makes it different? How does it work? This session explains the whys, hows, and whats of UEC 1.0 and describes the future of high-performance networking.

Speakers

J Metz

Technical Director, Advanced Networking and Storage Strategy, AMD

Wednesday August 6, 2025 1:00pm - 1:15pm PDT
TaiNEX 2 - 701 E

AI Clusters

1:15pm PDT

UALink Overview

Wednesday August 6, 2025 1:15pm - 1:30pm PDT

TaiNEX 2 - 701 E

Speakers

Kurtis Bowman

UALink Consortium Board Chair, AMD

Wednesday August 6, 2025 1:15pm - 1:30pm PDT
TaiNEX 2 - 701 E

AI Clusters

1:30pm PDT

Scale Up Ethernet Overview

Wednesday August 6, 2025 1:30pm - 1:45pm PDT

TaiNEX 2 - 701 E

Speakers

Hasan Siraj

Broadcom

Wednesday August 6, 2025 1:30pm - 1:45pm PDT
TaiNEX 2 - 701 E

AI Clusters

2:45pm PDT

Cooling Solutions for AI Racks and Clusters

Wednesday August 6, 2025 2:45pm - 3:00pm PDT

TaiNEX 2 - 701 E

According to the development trend of power consumption and heat dissipation in the general sever and AI server, the evolution of cooling solutions has changed from air cooling to hybrid cooling, then to full liquid cooling. In response to this development trend, we proposed an integrated liquid cooling solution for the building blocks of the AI clusters, including the AI IT rack, High Power Cabinet, and the Cooling Cabinet.

Speakers

Chris Pai

Engineering VP, Foxconn/Ingrasys

Wednesday August 6, 2025 2:45pm - 3:00pm PDT
TaiNEX 2 - 701 E

AI Clusters

3:15pm PDT

Liquid Cooling Solutions for Large-Scale AI Clusters

Wednesday August 6, 2025 3:15pm - 3:30pm PDT

TaiNEX 2 - 701 E

Speakers

Daniel Kapesa

Supermicro

Wednesday August 6, 2025 3:15pm - 3:30pm PDT
TaiNEX 2 - 701 E

AI Clusters

3:30pm PDT

Innovative Liquid-Cooled Bus Bars: Enhancing Power Management in High-Density Data Centers

Wednesday August 6, 2025 3:30pm - 3:45pm PDT

TaiNEX 2 - 701 E

Exploring liquid-cooled bus bars addresses the increasing power demands in modern data centers, particularly those exceeding 150kW per rack with AI and HPC workloads. Traditional bus bar designs struggle with current limitations, hindering efficient power management. Liquid-cooled bus bars integrate cooling channels to enhance heat dissipation, maintaining optimal temperatures and improving system safety and reliability. This approach mitigates thermal runaway risks and ensures compliance with industry standards, while boosting efficiency by minimizing energy losses associated with high current densities. Implementing liquid-cooled bus bars signifies a significant advancement in data center infrastructure, enabling higher power densities, superior thermal management, and overall improved performance.

Speakers

Nrupathunga Chakravarthy SR

Technology Manager, Molex

Anumalasetty Sai Surya Teja

Technology Engineer, Molex

Wednesday August 6, 2025 3:30pm - 3:45pm PDT
TaiNEX 2 - 701 E

AI Clusters

3:45pm PDT

Advancing the AI Factory by Doing More with More

Wednesday August 6, 2025 3:45pm - 4:00pm PDT

TaiNEX 2 - 701 E

ChatGPT began AI's watershed moment that triggered IT infrastructure's tectonic shift and race in extraordinary and lasting commitments to AI Factory. Many governments and enterprises alike are making enormous capital and people investments to not be left behind the AI boom. Corporate boardrooms are evaluating purposeful infrastructure plans. What is the best architectural decision - retrofitting, built from scratch or adopt a wait-and-see? This fork in the road has given pause and decision paralysis to some infrastructure decision makers. Our talk examines the AI Factory Spectrum to identify solutions that advance the infrastructure challenge sustainably.

Speakers

Chih-Tsung Huang

Senior Director, Cisco

Wei-Jen Huang

Distinguished Engineer, Cisco

Wednesday August 6, 2025 3:45pm - 4:00pm PDT
TaiNEX 2 - 701 E

AI Clusters