2025 OCP APAC Summit
Venue: TaiNEX2 - 701 F
Tuesday, August 5
 

1:00pm PDT

Architecting the AI Fabric: Scalable Networking for Next-Generation AI Servers, Racks, and Clusters
Tuesday August 5, 2025 1:00pm - 1:20pm PDT
AI workloads demand unprecedented levels of bandwidth, low latency, and deterministic communication across increasingly dense compute infrastructures. This work focuses on emerging network architectures tailored for AI servers, racks, and clusters—highlighting trends such as high-radix topologies, RDMA over converged Ethernet (RoCE), optical interconnects, and in-network compute. It examines how networking shapes system performance, scalability, and efficiency, and outlines architectural strategies to address bottlenecks in collective communication, model parallelism, and distributed training at hyperscale.
TaiNEX2 - 701 F

1:20pm PDT

Revisit RoCEv2 issues in large-scale deployment and the future that UEC promises
Tuesday August 5, 2025 1:20pm - 1:40pm PDT
RoCEv2 is being widely deployed as a result of the GenAI trend, and there is a growing need to mix AI and HPC workloads to maximize the efficiency of infrastructure investment. RoCEv2, which was developed about a decade ago for simpler workloads, is starting to show its limits in hyperscale deployments. This has led to the development of UEC, the Ultra Ethernet Consortium.
Speakers

PoWen Tsai

Director, Technical Sales, Edgecore

Suleman Azeem

Technical Product Management Executive, AMD
TaiNEX2 - 701 F

1:40pm PDT

Evolving FBOSS to support Generative AI network workloads
Tuesday August 5, 2025 1:40pm - 2:00pm PDT
FBOSS is Meta’s own Software Stack for managing Network Switches deployed in Meta’s data centers. It is one of the largest services in Meta in terms of the number of instances deployed.

Network Traffic in AI Fabric presents unique challenges such as “elephant flows” (a small number of extremely large, continuous flows), and low entropy (limited variation in flow characteristics, increasing likelihood of hash collisions).

At OCP 2024, we showcased how we evolved FBOSS to tackle these challenges. This solution is capable of building non-blocking clusters for up to 4K GPUs. However, generative AI use cases demand significantly larger non-blocking clusters. This can be solved by interconnecting multiple 4K GPU clusters into a single, larger cluster using traditional Routing and ECMP. In this design, intra-cluster traffic benefits from non-blocking I/O, but inter-cluster traffic continues to suffer from poor network performance due to the aforementioned elephant flows and low entropy.

In this talk, we will share our journey evolving FBOSS for generative AI workloads. We will discuss the hierarchical design that enables us to build significantly larger non-blocking clusters, the unique challenges we encountered in scaling both the dataplane and control plane, and the solutions we developed to overcome them. Additionally, we will highlight the SAI enhancements that were instrumental in adapting FBOSS to support the demands of generative AI.
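As a rough illustration of the low-entropy problem described in this abstract, the sketch below hashes a handful of elephant flows onto ECMP uplinks and counts how many links stay idle. It is not FBOSS code: the hash function (SHA-256 truncated to 32 bits), the addresses, and the helper names `ecmp_link` and `link_loads` are assumptions chosen for illustration. Only the RoCEv2 UDP destination port (4791) is a fixed value.

```python
import hashlib
from collections import Counter


def ecmp_link(flow, num_links):
    """Pick an uplink from a hash of the flow 5-tuple (static ECMP).

    The hash is an illustrative stand-in, not a real switch ASIC hash.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_links


def link_loads(flows, num_links):
    """Return the number of flows placed on each link, idle links included."""
    counts = Counter(ecmp_link(flow, num_links) for flow in flows)
    return [counts.get(link, 0) for link in range(num_links)]


# Hypothetical workload: 8 long-lived flows over 8 uplinks.
# Tuple layout: (src IP, dst IP, IP protocol, src port, dst port).
flows = [
    ("10.0.0.%d" % i, "10.0.1.%d" % i, 17, 49152, 4791)
    for i in range(8)
]
loads = link_loads(flows, 8)
print("flows per link:", loads)
print("idle links:", loads.count(0), "busiest link carries:", max(loads))
```

With so few flows, any link that receives two of them carries twice the load of its neighbors while other links sit idle, which is why per-flow hashing alone tends to underuse a fabric dominated by elephant flows.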
TaiNEX2 - 701 F

2:00pm PDT

New paradigm for lossless DCI interconnect in the AI/ML era
Tuesday August 5, 2025 2:00pm - 2:20pm PDT
As AI/ML clusters continue to scale and outgrow the boundaries of a single physical location in terms of both size and power, the need to scale across and interconnect different locations becomes ever more crucial.

When the challenges of interconnecting locations are extended to these use cases, a few requirements have to be met:
- Allowing high bandwidth to be used effectively between geographically dispersed locations across varying distances
- Support for lossless RDMA traffic
- A simple and condensed interconnection layer

The presentation will focus on how Broadcom’s Jericho product line meets these requirements with innovations throughout the stack, from physical connectivity all the way to intelligent load balancing.
Speakers

Amir Krayden

Sr Director Marketing, Broadcom
TaiNEX2 - 701 F

2:20pm PDT

Dynamic ECN Threshold Testing Methodology and the Importance of qp-fairness
Tuesday August 5, 2025 2:20pm - 2:40pm PDT
This presentation will focus on an innovative dynamic Explicit Congestion Notification (ECN) threshold testing methodology, emphasizing the design rationale for test cases and the observational analysis of experimental results. We will explore how designed test cases trigger ECN threshold changes in dynamic network environments, ensuring comprehensive and effective testing.

A key insight from our research is the critical role of qp-fairness (Queue Pair fairness) in collective benchmarking, alongside traditional metrics like algorithmic bandwidth and bus bandwidth. Through comparative analysis of real-world test data, we demonstrate how maintaining qp-fairness under dynamic conditions significantly enhances the stability of ECN mechanisms and ensures equitable allocation of network resources.
By aligning theoretical insights with practical implementations, we hope to provide actionable insights for advancing research and applications in dynamic ECN technologies.
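The abstract does not define how qp-fairness is measured. One plausible way to quantify it, assumed here for illustration only, is Jain's fairness index over per-queue-pair throughput, reported next to the bus bandwidth that collective benchmarks such as nccl-tests derive from algorithmic bandwidth for all-reduce. The function names and sample numbers below are hypothetical.

```python
def jain_fairness(throughputs):
    """Jain's fairness index: 1.0 means all QPs get equal throughput,
    1/n means a single QP takes everything."""
    if not throughputs:
        raise ValueError("need at least one queue pair")
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if squares == 0:
        raise ValueError("all queue pairs report zero throughput")
    return (total * total) / (len(throughputs) * squares)


def allreduce_bandwidths(size_bytes, seconds, num_ranks):
    """Return (algorithmic bandwidth, bus bandwidth) in bytes per second.

    For all-reduce, bus bandwidth = algorithmic bandwidth * 2 * (n - 1) / n.
    """
    algbw = size_bytes / seconds
    busbw = algbw * 2 * (num_ranks - 1) / num_ranks
    return algbw, busbw


# Hypothetical per-QP throughput samples in Gb/s.
balanced = [100, 100, 100, 100]
skewed = [250, 50, 50, 50]
print("balanced fairness:", jain_fairness(balanced))
print("skewed fairness:", jain_fairness(skewed))
print("algbw, busbw:", allreduce_bandwidths(8e9, 1.0, 8))
```

Two runs can report the same aggregate bandwidth while differing sharply in fairness, which is the kind of gap a per-QP metric is meant to expose.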
Speakers

Eric Yu

Solution Architect, Keysight
TaiNEX2 - 701 F

2:40pm PDT

e-Tube Technology - Breaking Interconnect Barriers To Accelerate AI Cluster Scale Up
Tuesday August 5, 2025 2:40pm - 3:00pm PDT
Decades-old copper and optical interconnect technologies limit AI cluster compute efficiency. The presentation will showcase e-Tube Technology - RF data transmission over plastic waveguide - and how it breaks the barriers of these legacy technologies by providing near-zero latency and 3x better energy efficiency than optics at a cost structure similar to copper. e-Tube is an ideal replacement for copper for terabit interconnect to scale up next-generation AI clusters.
Speakers

David Kuo

VP of Product Marketing and Business Development, Point2 Technology
TaiNEX2 - 701 F

3:15pm PDT

Panel: End-to-End Observability Across Network and Compute Layers for AI Workload Optimization
Tuesday August 5, 2025 3:15pm - 3:45pm PDT
Traditional network infrastructure observability tools fall short in AI environments, where interdependence between networking and computing layers directly impacts inference latency and throughput. Modern AI workloads—particularly large language models and computer vision pipelines—demand synchronized visibility across the data transport path (RDMA/GPU-to-GPU) and GPU execution stack to ensure performance consistency, avoid bottlenecks, and support real-time SLAs.

Our panelists will share their views and real-world learnings on the observability paradigm shifts required in open networking, in terms of architecture design, telemetry stack, policy engine, and so on, that drive closed-loop observability.
Moderators

Tim Zhou

Accton
Speakers

Stefan Bokaie

CTO, Dorado Software
Stefan is a growth-focused and dynamic executive with extensive experience in leading all facets of technical operations. Stefan is currently serving as CTO of Dorado Software, a leading provider of Fabric Orchestration and Management for Enterprise, Cloud and Telco. Stefan's prior...

William Chiang

Edgecore Networks

Hasan Siraj

Broadcom

Amir Elbaz

Beyond Edge Networks
TaiNEX2 - 701 F

3:45pm PDT

Proactive Link Management in AI Networks: Lessons from Meta
Tuesday August 5, 2025 3:45pm - 4:05pm PDT
In the realm of AI networks, the health of physical links is paramount to ensuring optimal performance and reliability. At Meta, we recognize that robust physical connectivity is crucial for the seamless operation of AI workloads, which demand high-speed and reliable data transmission. This presentation will delve into Meta's comprehensive strategy for maintaining healthy physical links within our AI networks.

We will explore the significance of link health in AI networks, emphasizing how it impacts overall system efficiency and performance. Meta employs advanced physical layer diagnostics, including Pseudo-Random Binary Sequence (PRBS) and Forward Error Correction (FEC) diagnostics, to rigorously test and validate link integrity before deployment into production. These diagnostics help identify potential issues, ensuring only healthy links are operational.

Furthermore, we will discuss Meta's proactive approach to managing link health in production environments. Unhealthy links are swiftly removed from service, and an automated triage pipeline is employed to facilitate effective repairs. This pipeline not only enhances the speed and accuracy of link restoration but also minimizes downtime, thereby maintaining the high reliability standards expected in AI network operations.
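To make the PRBS diagnostic mentioned above concrete, here is a minimal sketch of the idea: a transmitter sends a known pseudo-random pattern, and a checker regenerates the same pattern and counts mismatches to estimate the bit error rate. It uses the common PRBS7 polynomial (x^7 + x^6 + 1) and assumes the checker is already aligned with the transmitter; production links typically use longer patterns and hardware checkers that synchronize themselves. This is not Meta's tooling, and the function names are hypothetical.

```python
from itertools import islice


def prbs7(seed=0x7F):
    """Yield PRBS7 bits from the x^7 + x^6 + 1 LFSR (period 127)."""
    if not 0 < seed <= 0x7F:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    while True:
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        yield bit


def bit_error_rate(received, seed=0x7F):
    """Compare received bits against a locally generated reference pattern."""
    if not received:
        raise ValueError("no bits received")
    reference = prbs7(seed)
    errors = sum(rx != next(reference) for rx in received)
    return errors / len(received)


# Send ten pattern periods, then flip two bits to imitate a marginal link.
tx = list(islice(prbs7(), 1270))
rx = tx.copy()
rx[5] ^= 1
rx[900] ^= 1
print("clean link BER:", bit_error_rate(tx))
print("marginal link BER:", bit_error_rate(rx))
```

A link whose measured error rate exceeds an operator-chosen threshold would be held back from production or pulled from service, in the spirit of the workflow the abstract describes.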
TaiNEX2 - 701 F

4:05pm PDT

Adaptive Multi-Tenant Orchestration of AI Fabric
Tuesday August 5, 2025 4:05pm - 4:25pm PDT
This presentation delves into the challenges and opportunities for AIaaS providers in efficiently deploying and managing multi-tenant AI fabrics and clusters. Deploying SONiC AI infrastructure with optimal tuning, especially for an AIaaS provider or an enterprise supporting inference at the edge, can be a complex and daunting task. We will present the features required in a controller to simplify deployment of backend AI SONiC fabrics. Tuning fabrics that support AI must take into consideration factors such as the AI job type and its sensitivity to latency, the tier of the tenant scheduling the job, and the tuning capabilities of the underlying SONiC platforms, and must implement an adaptive solution.
The presentation introduces the concept of AI tenancy and how tenancy could be considered when orchestrating and tuning the underlying infrastructure.
Speakers

Stefan Bokaie

CTO, Dorado Software
TaiNEX2 - 701 F

4:25pm PDT

Scale-up AI Networking Alternatives - Comparing UALink, SUE and NVLink
Tuesday August 5, 2025 4:25pm - 5:00pm PDT
Speakers

Sharada Yeluri

Astera Labs
TaiNEX2 - 701 F
 
Wednesday, August 6
 

9:00am PDT

Open Chiplet Economy: Bridging Taiwan and Silicon Valley
Wednesday August 6, 2025 9:00am - 9:30am PDT
Speakers

Cliff Grossner

Chief Innovation Officer, Open Compute Project Foundation (OCP)

Jawad Nasrullah

Open Compute Project Foundation (OCP)
TaiNEX2 - 701 F

9:30am PDT

Meeting AI Workload Demands with Arm CSA and Chiplet
Wednesday August 6, 2025 9:30am - 9:45am PDT
The growing scale and specialization of AI workloads are reshaping infrastructure design. Arm Chiplet System Architecture (CSA) enables custom silicon and chiplets to meet market-specific needs. In this talk, we explore how chiplet-based designs optimize performance and lower total cost of ownership. Learn how standards, compute subsystems, and a maturing ecosystem are reshaping the datacenter at scale.
TaiNEX2 - 701 F

9:45am PDT

10:00am PDT

Integrated Photonics for Optical Interconnects
Wednesday August 6, 2025 10:00am - 10:15am PDT
Speakers

Erik Chen

Artilux
TaiNEX2 - 701 F

10:15am PDT

11:15am PDT

Extending the Frontier: Heterogeneous Integration of Chiplet Designs
Wednesday August 6, 2025 11:15am - 11:30am PDT
Speakers

Dr. CT Kao

Cadence
TaiNEX2 - 701 F

1:00pm PDT

AI-Driven Multiphysics Analysis for Silicon-to-System Advanced Packaging
Wednesday August 6, 2025 1:00pm - 1:15pm PDT
The rapid evolution of semiconductor technology and the growing demand for heterogeneous integration have positioned advanced packaging as a critical enabler of next-generation electronic systems. As devices become more compact and functionally dense, traditional single-die analysis methods are no longer sufficient. Instead, a system-level approach that spans from silicon to full system integration is essential to ensure performance and reliability.
This talk explores how advanced packaging technologies such as 2.5D/3D IC and chiplets serve as the foundation for silicon-to-system multiphysics analysis. We delve into the multi-scale, multi-domain simulation challenges, including thermal, mechanical, electrical, and optical interactions, and examine how state-of-the-art simulation tools and methodologies are bridging the gap between design abstraction levels.
Finally, we present an AI-driven thermal analysis that illustrates how complex chiplet designs influence floorplanning decisions. The proposed approach accelerates design space exploration, enhances prediction accuracy, and enables optimization of packaging architectures, from chiplet interconnects to full-system integration.
TaiNEX2 - 701 F

1:15pm PDT

AMD Advanced Packaging - Past, Present, and Future
Wednesday August 6, 2025 1:15pm - 1:30pm PDT
TaiNEX2 - 701 F

1:45pm PDT

From Edge to Cloud: Custom ASIC Unleashing Datacenter AI Innovation
Wednesday August 6, 2025 1:45pm - 2:00pm PDT
Speakers

CK Peng

MediaTek
TaiNEX2 - 701 F

2:15pm PDT

Chiplets Based HPC And AI Product Test Challenges
Wednesday August 6, 2025 2:15pm - 2:30pm PDT
AI and HPC solutions are increasingly built with heterogeneous 2.5D and 3D packaging.
Chiplets and HBM stacks make it possible to develop products faster and to optimize them for the required performance.
Until now, most chiplet-based integration has been homogeneous (the same kind of chiplet, designed within one company, with only the HBM stack coming from a third-party vendor), but the industry has started to move to heterogeneous integration (different chiplets from different vendors).
In complex package designs, differing CTE (coefficient of thermal expansion) becomes a key consideration in material selection and design, because the resulting physical effects can affect the physical and electrical behavior of the device and hence final test yield, reliability, and field returns.
A well-thought-out strategy spanning "Design For Test" through final ATE test (wafer and package) is required to optimize test cost, performance, and product reliability, because a defect in even a single chiplet can lead to costly failures at the system level.
Speakers

Yogan Senthilkumar

Vice President - Engineering, Tessolve
TaiNEX2 - 701 F

2:45pm PDT

Lifecycle System Monitoring with Arm’s System Monitoring Control Framework (SMCF) and proteanTecs
Wednesday August 6, 2025 2:45pm - 3:00pm PDT
As system complexity grows, ensuring reliability, power efficiency, and performance is critical. proteanTecs, a leader in electronics monitoring, has integrated its deep data monitoring with Arm’s System Monitoring Control Framework (SMCF), enhancing Arm Neoverse CSS solutions with predictive analytics and lifecycle insights. SMCF offers a modular framework for telemetry, diagnostics, and control. By embedding proteanTecs' in-chip agents and software, the integration boosts system visibility, enabling optimized power/performance, improved reliability, and faster diagnostics. This collaboration empowers semiconductor manufacturers and system operators to meet evolving demands with scalable, architecture-agnostic solutions. The presentation will highlight key applications such as predictive maintenance, defect detection, and power optimization for next-gen high-performance compute environments.
Speakers

Dragon Hsu

Director Application Engineering, proteanTecs
TaiNEX2 - 701 F

3:15pm PDT

Panel: Chiplet Technology in the AI Era: Opportunities and Challenges
Wednesday August 6, 2025 3:15pm - 4:00pm PDT
Moderators

Eric Huang

DIGITIMES
TaiNEX2 - 701 F
 