AI workloads are reshaping the architecture and demands of modern data centers, calling for high-performance, scalable, and energy-efficient infrastructure. This presentation explores how AI-driven transformation is impacting data center design and operations, and highlights how Delta leverages its expertise in power and thermal solutions to meet these demands. Delta’s integrated systems play a crucial role in ensuring reliable, intelligent, and sustainable operations in the age of AI.
Nowadays, data centers use dielectric fluids as a coolant to prevent damage and downtime when leakage occurs. However, dielectric fluids typically have high viscosity and low specific heat, resulting in poor cooling performance. To improve performance while retaining the benefits of dielectric fluids, Superfluid technology has emerged and been investigated. Superfluid technology introduces air into the coolant, forming bubbles that reduce the frictional resistance in the movement of the coolant. This results in a lower boundary layer thickness and enhances the heat convection coefficient. When using a specific dielectric fluid with superfluid technology, the heat transfer capacity can achieve 66% of that of water (compared to 55% with dielectric fluid alone). This paper implements superfluid technology on an AI server with a cold plate solution as a test platform and explores the improvements it brings.
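As a rough framing (not taken from the paper itself): in single-phase convection the removable heat follows Newton's law of cooling, and the convection coefficient scales roughly with the fluid conductivity divided by the thermal boundary-layer thickness, which is why thinning the boundary layer with entrained bubbles raises heat transfer.

    q = h \, A \, (T_{surface} - T_{fluid}), \qquad h \approx k_{fluid} / \delta_{thermal}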
The growing scale and specialization of AI workloads are reshaping infrastructure design. The Arm Chiplet System Architecture enables custom silicon and chiplets to meet market-specific needs. In this talk, we explore how chiplet-based designs optimize performance and lower total cost of ownership. Learn how standards, compute subsystems, and a maturing ecosystem are reshaping the datacenter at scale.
I will go through the topic from single-server-node design to data-centre-level design under economy of scale, in terms of mechanical, thermal, power, and other aspects where we can differentiate ourselves.
The talk explored Azure’s purpose-built infrastructure—featuring advanced accelerators, scalable networking, and robust orchestration—and its journey of innovation through partnerships, including the Mount Diablo project with Meta/Google. Emphasis was placed on overcoming challenges in power delivery, cooling, and energy efficiency, with a call to reimagine system architecture and embrace high-voltage DC solutions to sustainably scale next-generation AI workloads.
Liteon will share its latest advancements in power solutions for AI infrastructure, focusing on high-efficiency, high-density designs for GPU-centric systems. This session will explore how Liteon's integrated architectures support scalable deployment in modern data centers, addressing the growing demands of performance and energy optimization.
As data centers evolve to meet increasing demands for energy efficiency, operational safety, and environmental sustainability, cooling technologies play a pivotal role in enabling this transformation. This presentation explores how synthetic ester coolants offer a versatile and eco-friendly solution to address the diverse thermal management needs of modern data centers.
As technological progress and innovation continue to shape server products, Wiwynn introduces a reinforced chassis with a novel embossed pattern design to reduce material consumption and carbon footprint. This paper presents the development process, from pattern optimization using Finite Element Analysis (FEA) to real-world static and dynamic mechanical testing for verification. Through this approach, Wiwynn successfully developed an embossed pattern, enabling the replacement of the original heavy chassis with a thinner and lighter design. In current applications, this innovation has reduced material usage by at least 16.7% and lowered carbon emissions by approximately 15.9%, while achieving a 4.2% cost reduction. This lightweight, cost-effective, and sustainable chassis design reinforces Wiwynn’s commitment to sustainable server solutions and offers potential for further development.
- Catalina (GB200), Meta's latest AI/ML rack design, features a compute tray that houses the primary CPU and GPU components. To expedite time-to-market, we leveraged industry solutions while implementing targeted customizations to optimize integration within Meta's infrastructure.
- The increasing power density of AI hardware poses significant challenges, including the need for liquid cooling, which introduces complexities in leak detection, system response, reliability, and safety. With multiple hardware platforms in rapid development, there is a pressing need for adaptable hardware that can manage these new interfaces and controls.
- Our solution, the RMC (Rack Management Controller) tray, addresses these challenges by providing a 1OU device that handles all leak detection and hardware response to leaks. The RMC offers flexible integration into upcoming AI platforms and interfaces with various systems, including Air-Assisted Liquid Cooling (AALC), Facility Liquid Cooling (FLC), and all leak sensors. The RMC provides a robust and reliable solution for managing liquid cooling across Meta’s multiple platforms.
Data center operators and silicon providers are aligning on a durable coolant temperature of 30℃ to meet long-term roadmaps. There is also interest in supporting higher coolant temperatures for heat reuse and lower temperatures for the extreme density required for AI workloads. To understand coolant temperature requirements, thermal resistance from silicon to the environment will be discussed. In addition, areas of thermal performance to be investigated by the industry will be reviewed.
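As a hedged sketch of the stack-up being referred to (notation ours, not the presenter's): the silicon-to-environment path can be treated as thermal resistances in series, which ties the allowable coolant temperature to the junction limit and package power. For illustration only, a 1000 W device with a junction limit of 85 °C and a junction-to-coolant resistance of 0.055 °C/W needs coolant supplied at or below 30 °C.

    R_{j-env} = \frac{T_j - T_{env}}{P} = R_{junction-case} + R_{case-coldplate} + R_{coldplate-coolant} + R_{coolant-env}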
Nepal is in the early stages of digitisation. After the political reform, infrastructure, digitisation, and e-governance projects are scaling widely; however, the infrastructure that was supposed to scale has not kept pace with demand. Sustainable solutions and scalability have not been a priority, as awareness is still needed, yet basic data center design around fintech and healthcare technology is scaling widely. Working closely with ministries and government projects, we see requirements within the government, but the right direction and roadmap are needed; to that end, a national-level blueprint document covering AI, healthcare, fintech, and the national interoperability project is in the pipeline. The interoperability layer requires substantial resources to build; a health-related interoperability layer has been built, and we are now looking at the national IOL layer. OpenMRS, OpenStack, OpenHIM, OpenHIE, Ubuntu, Nutanix, and Dell are key players.
As AI servers rapidly scale in performance and density, traditional data centers face increasing challenges in meeting low PUE (Power Usage Effectiveness) targets and cooling high TDP (Thermal Design Power) components due to infrastructure limitations. Cold plate liquid cooling has emerged as a mainstream solution with its high thermal efficiency. However, the risk of coolant leakage — potentially damaging AI systems — remains a significant concern. While existing mechanisms (e.g. leak detection) offer a partial safeguard, they still do not address the root cause. To resolve this, Intel introduces a game-changing approach by replacing conventional coolants with dielectric fluids, inherently eliminating the threat of electrical damage from leaks. Recognizing the thermal performance limitations of dielectric fluids compared to water, Intel integrates superfluid technology into the CDU to dramatically enhance heat dissipation capabilities. This innovation not only fortifies cold plate cooling systems but also paves the way for extending the benefits to single-phase immersion cooling, redefining the technical boundaries of liquid cooling in data centers.
Large Language Models (LLMs) have demonstrated exceptional performance across numerous generative AI applications, but require large model parameter sizes. These parameters range from several billion to trillions, leading to significant computational demands for both AI training and inference. The growth rate of these computational requirements significantly outpaces advancements in semiconductor process technology. Consequently, innovative IC and system design techniques are essential to address challenges related to computing power, memory, bandwidth, energy consumption, and thermal management to meet AI computing needs.
In this talk, we will explore the evolution of LLMs in the generative AI era and their influence on AI computing design trends. For AI computing in data centers, both scale-up and scale-out strategies are employed to deliver the huge computational power required by LLMs. Conversely, even smaller LLM models for edge devices demand more resources than previous generations without LLMs. Moreover, edge devices may also act as orchestrators in device-cloud collaboration. These emerging trends will significantly shape the design of future computing architectures and influence the advancement of circuit and system designs.
As the MHS standard continues to grow, the need to complete the remaining elements in the solution becomes critical. Intel and UNEEC have been following the Edge-MHS standardization and working on developing off-the-shelf chassis solutions that can easily enable the Edge-MHS building blocks.
The relentless demand for AI is driving hyperscalers to deploy ever-increasing clusters of GPUs and custom accelerators. As these deployments scale, system architectures must evolve to balance cost, performance, power, and reliability. A critical aspect of this evolution is the high-speed signaling that connects the various components. This presentation delves into the high-speed protocols such as PCIe, CXL, UALink, Ethernet, and Ultra Ethernet – exploring their intended use cases and evaluating where these protocols are complementary or competitive. Additionally, the presentation will address the evolving Scale-Up and Scale-Out architecture, highlighting their respective protocols and interconnect solutions. Special attention will be given to the adoption of Ethernet as a problem-solving technology in AI-driven environments. Through this discussion, we aim to provide a comprehensive overview of the options available and justify their use in modern cloud service architectures.
With the rise of AI computing, traditional air cooling methods are no longer sufficient to handle the thermal challenges in high-performance computing (HPC) systems. Liquid cooling has emerged as a reliable and efficient alternative to dissipate heat at kilowatt levels. In this presentation, we will introduce the liquid cooling technologies developed by TAIWAN MICROLOOPS, including the Cooling Distribution Unit (CDU) and various types of cold plates. Standard and customized CDUs are designed to meet refrigeration capacity demands ranging from several kilowatts to hundreds of kilowatts. We will also demonstrate both single-phase and two-phase cold plates. These solutions are designed to enhance thermal management efficiency and meet the increasing demands of AI-driven data centers.
This presentation outlines the evolving requirements and technical considerations for next-generation Open Rack V3 (ORv3) Power Supply Units (PSUs) and power shelves, with a focus on the transition from ORv3 to High Power Rack (HPR) and HPR2 architectures. It highlights significant advancements such as increased power density from 33kW to 72kW and enhanced support for AI-driven pulse load demand. An HVDC architecture is also introduced for quick adaptation, addressing the bus bar challenge as power demand from AI keeps increasing.
Since the number of CPU cores has grown significantly, the demand for hardware partitioning has become evident. Hardware partitioning can improve the security, multi-tasking ability, and resource efficiency of each CPU. In this paper, we share Wiwynn’s concept of a Hardware Partitioning (HPAR) architecture, which can be implemented in a multi-CPU system with a single DC-SCM. With the help of an assistant BMC, the BMC has access to each CPU, and a dual-socket system can boot up as either a single node or dual nodes. The HPAR method creates strict boundaries between sockets, which reduces the risk of unauthorized access or data leakage between partitions. Also, each partition can perform different tasks on one system simultaneously, optimizing hardware utilization by segmenting workloads.
FuriosaAI's technology demonstrates to infra/datacenter AI deployment professionals that rapid and ever more powerful advancements in GPUs are great for hyperscalers but poorly matched to typical data centers (the Supermicro presentation "Leveraging OCP for Sovereign AI Plans" shows over 70% of data centers are 50 kW to 0.5 MW). The ability to openly choose compute projects designed to make computing more sustainable is a cornerstone of the OCP.
We will introduce the Tensor Contraction Processor (TCP), a novel architecture that reconceptualizes tensor contraction as the central computational primitive, enabling a broader class of operations beyond traditional matrix multiplication. We will also show how it unlocks AI inference chip designs that achieve the performance, programmability, and power-efficiency trifecta for data centers.
Given the power constraints of data centers and the wide variation in rack power capacities, we have learned to evaluate total token generation throughput across AI accelerators within the same rack power budget, a metric that resonates strongly with our early enterprise and AI compute provider partners.
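A minimal sketch of how such a comparison might be computed, assuming hypothetical accelerator names and per-device figures (none of these numbers come from the abstract):

    # Sketch: compare rack-level token throughput under a fixed rack power budget.
    # Accelerator names and per-device figures are illustrative assumptions only.
    RACK_POWER_BUDGET_W = 50_000

    accelerators = {
        "accel_a": {"device_power_w": 700, "tokens_per_s_per_device": 900},
        "accel_b": {"device_power_w": 350, "tokens_per_s_per_device": 520},
    }

    for name, spec in accelerators.items():
        devices = RACK_POWER_BUDGET_W // spec["device_power_w"]   # devices that fit the budget
        rack_tokens_per_s = devices * spec["tokens_per_s_per_device"]
        rack_power_w = devices * spec["device_power_w"]
        print(f"{name}: {devices} devices, {rack_tokens_per_s} tokens/s, "
              f"{rack_tokens_per_s / rack_power_w:.3f} tokens/s per watt")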
Energy efficiency is one of the main contributors to reaching the Paris Agreement. By optimizing the world’s energy consumption, and being able to produce more from less, we can meet our increased energy demand and reduce CO2 emissions at the same time. In fact, according to the International Energy Agency, increased efficiency could account for more than 40% of emissions reductions in the next 20 years. As much as 50% of data center potential for energy saving comes from waste heat recovery, and 30% can be achieved in data center buildings. And the solutions to enable these energy efficiency improvements already exist! We have decades of experience developing plate heat exchanger technologies that support our customers to optimize energy use in their processes. Our unique thermal solutions make it possible to save dramatic amounts of energy and electric power and thereby reduce carbon emissions!
The shift to +/-400V DC power systems is crucial to meet the rising power demands of AI/ML applications, supporting rack densities of >140 kW. This transition introduces significant challenges for power distribution within datacenters. Critical components like bus bars, connectors, and cables must meet stringent requirements for power handling, thermal management, reliability, safety, and density. This paper explores design solutions for electromechanical interconnects in these high-power environments, drawing parallels with mature ecosystems in industries like Electric Vehicles. Innovative approaches to bus bar design and connector technology offer the performance and space savings needed for next-gen AI/ML infrastructure. The discussion addresses crucial safety aspects, including arc flash mitigation, insulation systems, and touch-safe designs. By overcoming these challenges, the industry can accelerate the transition to higher voltages, unlocking AI/ML platforms' full potential.
When planning and operating an Internet Data Center (IDC), PUE (Power Usage Effectiveness) is a critical metric for licensing and energy performance. While technologies like direct liquid cooling and immersion cooling are effective, they often require high capital investments.
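For reference (the standard definition, not specific to this session): PUE is the ratio of total facility energy to the energy delivered to IT equipment, so a facility drawing 1.3 MW to support a 1.0 MW IT load operates at a PUE of 1.3.

    \mathrm{PUE} = \frac{E_{total\,facility}}{E_{IT\,equipment}}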
We propose an efficient and scalable solution: Turbo Blowers + Free Cooling + Heat Reuse System
- Introduce outdoor air via high-efficiency turbo blowers to remove heat from hot aisles.
- Capture and reuse the exhausted heat for drying, building heating, or hot water systems.
- Proven performance: Microsoft applied free cooling with a PUE around 1.22 in 2021.
Trans-Inductor Voltage Regulator (TLVR) technology is a new onboard xPU power delivery solution proposed by Google at IEEE APEC 2020.
■ TLVR is an innovative fast-transient onboard voltage regulator (VR) solution for xPUs. This VR topology provides increased VR bandwidth, faster transient response, and a potential reduction in decoupling capacitors.
■ TLVR has been widely used in recent years since it offers good transient performance with reduced equivalent output transient inductance. However, existing TLVR has not been optimized for power efficiency and density.
■ One of the limitations is that each trans-inductor has to be designed for the peak load current in terms of magnetic core saturation.
■ Zero Bias TLVR was introduced to address this limitation. It moves one phase from the primary side to the secondary side.
■ By doing so, the secondary-side phase is able to drive the TLVR secondary winding with a current equal in magnitude and opposite in direction to the primary winding current, for both DC and transients.
The future of artificial intelligence (AI) continually demands higher performance, greater efficiency, and increasing scalability in modern data centers. As a designer of advanced server CPUs and specialized AI accelerators, AMD plays a crucial role in addressing these priorities. AMD delivers leading high-performance computing solutions, from advanced chiplet architecture and server design to rack and data center infrastructure, to meet AI market demands.
In this presentation, we will showcase a new data center architecture based on the OCP Rack with liquid-cooling equipment. Under such a new AI cluster, we show how to collect, store, analyze, and visualize data to give data center managers the ability to effectively manage this new architecture. We also provide a mechanism to cooperate with the existing operating support system to seamlessly integrate the new AI cluster architecture into legacy data center management. Going further, we will propose a new approach for using AI methodology to manage AI clusters as Wiwynn's future work.
This session delves into the critical design considerations and emerging challenges associated with immersion cooling for high-speed signals in data centers. Key topics include the electrical characterization of cooling liquids, the performance benefits of improved thermal environments, and the impact of immersion fluids on high-speed interconnects—from individual components to entire signal channels. The discussion also covers design optimization strategies tailored for submerged environments. Finally, the session highlights the current state of industry readiness and the technical hurdles that must be addressed to ensure reliable high-speed signaling under immersion cooling conditions.
Inference tasks vary widely in complexity, data size, latency requirements, and parallelism, and each workload type interacts differently with CPU capabilities. Understanding this relationship allows for more effective hardware selection and optimization strategies tailored to specific use cases.
Key Learning Areas
- AI Model Architecture
- Types of Inference Workloads
- Quantization: Balancing Accuracy and Efficiency
- Data Throughput and Bandwidth
- Benchmarking Inference Performance
- How Frameworks and Libraries Impact Performance
With the rapid development of AI, the demand for performance in data centers and computing infrastructure continues to rise, bringing significant challenges in energy consumption and heat dissipation. This paper discusses the application of AI in infrastructure and thermal management solutions, focusing on how Auras products integrate advanced intelligent cooling systems and temperature control technologies. By leveraging AI-driven monitoring and control, energy efficiency is significantly improved. Looking ahead, as AI technology advances, intelligent infrastructure and innovative thermal management will become key drivers for high-performance computing and green energy saving.
As memory capacity and bandwidth demands continue to rise, system designs are pushing toward higher memory density—particularly in dual-socket server platforms. This session will explore the thermal design challenges and considerations involved in supporting a 2-socket, 32-DIMM configuration on the latest Intel® Xeon® platform within a standard 19-inch rack chassis. In such configurations, DIMM pitch is constrained to 0.25"–0.27", significantly increasing the complexity of memory cooling. We will present thermal evaluation results based on Intel-developed CPU and DDR5 Thermal Test Vehicles (TTVs), which simulate real-world heat profiles and airflow interactions.
The recent publication of Ultra Ethernet 1.0 is an ambitious project of tuning the Ethernet stack to accommodate AI and HPC workloads. It covers everything from physical layer to software APIs. What makes it different? How does it work? This session explains the whys, hows, and whats of UEC 1.0 and describes the future of high-performance networking.
The rapid evolution of semiconductor technology and the growing demand for heterogeneous integration have positioned advanced packaging as a critical enabler of next-generation electronic systems. As devices become more compact and functionally dense, traditional single-die analysis methods are no longer sufficient. Instead, a system-level approach—spanning from silicon to full system integration—is essential to ensure performance and reliability. This talk explores how advanced packaging technologies such as 2.5D/3D IC and chiplets serve as the foundation for silicon-to-system multiphysics analysis. We delve into the multi-scale, multi-domain simulation challenges—including thermal, mechanical, electrical, and optical interactions—and examine how state-of-the-art simulation tools and methodologies are bridging the gap between design abstraction levels. Finally, we present an AI-driven thermal analysis that illustrates how complex chiplet designs influence floorplanning decisions. The proposed approach accelerates design space exploration, enhances prediction accuracy, and enables optimization of packaging architectures—from chiplet interconnects to full-system integration.
The Universal Quick-Disconnect (UQD) has played a significant role in the cooling ecosystem for GPUs and genAI. In order to scale, and to further enable the adoption of liquid throughout the ecosystem, a workstream was established at the end of 2024 to develop a UQD Version 2. The purpose of this workstream is to update the UQD/UQDB v1 specification such that gaps in requirements and performance are resolved, ambiguity is removed, and true interoperability is defined and achievable. Key deliverables include unification of the UQD and UQDB as a singular specification, defined performance and interoperability testing requirements, and realization of a new mating configuration. Progress updates with relevant performance attributes and technical detail of the v2 proposal will be discussed, as well as plans for official release and deployment.
Beth Langer is the Lead Technical Engineer in the Thermal Management Business Unit at CPC, where all connectors manufactured for liquid cooling applications meet or exceed established criteria.
For decades the motherboard ecosystem has toiled in the service of the steady tick/tock beat of server processor roadmaps. That was then - this is now! Today there are multiple processor lines from a larger set of processor makers than ever before in the server industry. The complexity of server processor complexes has skyrocketed, increasing board layers, design rules, and all manner of motherboard attributes.
The DC-MHS standards come at the right time. Motherboards (transformed now to HPMs) can be much more efficiently produced when originated by the processor manufacturers. The advent of the HPM reduces costs, increases diversity of systems and generally allows the ecosystem to innovate around the processor complex including baseboard management. This comes at exactly the time when the design aperture seemed to be closing on server system vendors. DC-MHS standards have created a whole new opportunity to build thriving horizontal ecosystems.
Sean Varley leads the Solutions group at Ampere Computing. His group is responsible for building out vertical solutions on Ampere server platforms which includes strategic business relationships, business planning and solution definition in the rapidly evolving Cloud and Edge server...
As AI workloads push rack power demands well beyond the ~30 kW limits of Open Rack v3, the industry has defined a High-Power Rack (HPR) standard that delivers over 200 kW per rack. This talk explains how liquid-cooled vertical busbars integrate coolant channels around copper conductors to dramatically improve heat removal and reduce I²R losses, all while fitting into existing ORv3 form factors. It also covers modular power-whip assemblies for simplified maintenance, upgraded high-voltage PSUs and battery backup units for resilience, and how OCP member companies collaborate on safety, interoperability, and scalability. Together, these innovations form an end-to-end ecosystem enabling next-generation AI data centers to meet extreme power, thermal, and reliability requirements.
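As an illustrative back-of-the-envelope (numbers assumed, not from the talk): conduction loss scales with the square of the busbar current, and copper resistance rises with temperature, so keeping the conductor near coolant temperature directly trims loss. Assuming a 48 V busbar delivering 200 kW, the current is roughly 4.2 kA; with an assumed 50 µΩ of busbar resistance that is about 870 W of I²R loss, and every 10 °C of copper temperature rise adds roughly 4% to it.

    P_{loss} = I^2 R, \qquad R(T) \approx R_{20} \, [1 + 0.0039 \, (T - 20)] \quad (T \text{ in } ^{\circ}\mathrm{C})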
Listeners will gain a clear understanding of the differences between single-phase (1P) and two-phase (2P) direct liquid cooling (DLC) technologies, including the thermal mechanisms, benefits, and limitations of each. The paper offers practical insights into real-world challenges of implementing 2P DLC, such as pressure drop effects, series vs. parallel configurations, and flow imbalance. A new method for calculating thermal resistance in 2P systems is introduced, enabling fair comparison to 1P systems. Listeners will also learn about economic and operational barriers to 2P adoption, including refrigerant costs and high system pressure. By the end, they will understand why 1P DLC is currently more viable for mass deployment and what advancements are needed for 2P DLC to become practical for data centers.
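The paper's own calculation method is not reproduced here; one hedged way to keep the comparison consistent is to reference each technology to the coolant temperature the chip actually sees: the inlet temperature for single-phase, and the saturation temperature (set by loop pressure) for two-phase, since the boiling fluid stays nearly isothermal.

    R_{th,1P} = \frac{T_{case} - T_{coolant,in}}{Q}, \qquad R_{th,2P} = \frac{T_{case} - T_{sat}(p)}{Q}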
This talk presents the integration of OpenBMC with Arm Fixed Virtual Platforms (FVP) to prototype manageability features aligned with SBMR compliance. It showcases lessons from virtual platform development, sensor telemetry, and Redfish-based remote management, enabling early-stage validation without physical hardware.
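A minimal sketch of the kind of early-stage check this enables, assuming an OpenBMC Redfish service reachable from the host running the FVP; the address, port, and credentials below are placeholders, and the exact resources exposed depend on the OpenBMC build:

    import requests

    BMC = "https://127.0.0.1:2443"   # placeholder: forwarded HTTPS port of the virtual BMC
    AUTH = ("root", "0penBmc")       # placeholder credentials for a default OpenBMC image
    VERIFY_TLS = False               # virtual platforms typically use a self-signed certificate

    # Walk the Redfish chassis collection and print temperature sensor readings.
    chassis = requests.get(f"{BMC}/redfish/v1/Chassis", auth=AUTH, verify=VERIFY_TLS).json()
    for member in chassis.get("Members", []):
        thermal = requests.get(f"{BMC}{member['@odata.id']}/Thermal",
                               auth=AUTH, verify=VERIFY_TLS).json()
        for reading in thermal.get("Temperatures", []):
            print(reading.get("Name"), reading.get("ReadingCelsius"))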
As compute densities soar and chip thermal loads rise, data centers are under pressure to deliver efficient, scalable cooling without extensive retrofits. 2-phase liquid cooling, integrated into modular sidecar systems, offers a high-performance, energy-efficient solution that meets this need while maintaining compatibility with existing infrastructure. The presentation will dive into how sidecar architectures—deployed alongside standard racks—leverage the latent heat of vaporization to manage extreme heat loads with minimal coolant flow. By maintaining constant temperatures across cold plates, 2-phase cooling ensures thermal uniformity for processors with varying power profiles, preventing hot spots and throttling. Key takeaways will include how 2-phase sidecars enable efficient, localized cooling without facility water; deployment strategies for retrofitting existing data centers without major disruption; and environmental benefits such as reduced energy use and a lower carbon footprint.
As AI and ML power demands increase, driving rack power levels to 140 kW and necessitating higher voltages like +/-400V DC, optimizing bus bar systems becomes crucial for efficient, reliable power delivery. Bus bars, ideal for high-current applications, face unique challenges in high-density AI/ML racks, including thermal management, space optimization, structural rigidity, and safety. This paper explores advanced design techniques for future AI/ML power architectures, covering material selection (e.g., copper, aluminum), cross-section optimization, insulation strategies, and terminal methods. Thermal and mechanical simulations ensure performance and durability. Critical safety features, such as touch protection and creepage distances, are integrated. These solutions aim to develop robust power infrastructure for next-gen AI/ML data centers.
This presentation will showcase the current two-phase CDU design and the two-phase cold plate samples. The test results of the two-phase cold plate samples are compared with those of the same samples filled with PG25 as the working fluid. Based on the comparison, the potential of the two-phase cold plates can be discovered. Without significantly altering the existing single-phase architecture, the two-phase coolant can be distributed freely to various racks, providing a solution for the chips with locally higher heat flux. Lastly, the future role of the pumped two-phase solution in the cooling environment, and the forthcoming business model will be discussed.
As data center power densities surge, traditional air cooling increasingly fails to meet thermal demands efficiently. This presentation explores the evolution of Direct Liquid Cooling (DLC), tracing its progression from single-phase to two-phase technologies. We begin by examining single-phase DLC, where coolant absorbs heat without phase change, offering reliable yet limited performance. We then transition to two-phase DLC, where phase change enables significantly higher heat flux dissipation through latent heat transfer. Key distinctions in efficiency, system complexity, and deployment readiness are analyzed. The session concludes with emerging trends such as low-GWP dielectric fluids and 3D chip cooling that position two-phase DLC as a critical enabler for next-generation high-performance computing and AI workloads.
Google contributed the Advanced PCIe Enclosure Compatible (APEC) Form Factor to OCP in 2024. APEC is an electromechanical interface standard intended to advance the PCIe CEM standard with increased PCIe lane count, bandwidth, power, and management capability for use cases that need more advanced capabilities. This session will go deeper into the progress we have made, including the test methodology and challenges, as well as our next steps to keep this moving forward. To make this happen, Google has developed end-to-end testing modules to qualify the signals at both the PCIe root complex and endpoint based on APEC. We will guide you through how the test module was designed, from SI and layout routing considerations toward the goal of test efficiency and automation.
■ This talk traces the evolution of 48V power delivery architectures for datacenter applications, commencing with Google's introduction of a tray-level, two-stage approach at OCP in 2016.
■ Subsequent advancements in topologies and ecosystems have paved the way for collaborative standardization efforts.
■ In 2024, Google, Microsoft, and Meta jointly presented an updated 48V Onboard Power Specification and Qualification Framework, leading to the formation of an OCP workstream aimed at finalizing and implementing comprehensive 48V power module solutions and qualification protocols.
■ This talk will outline critical design principles to mitigate challenges associated with 48V two-stage power delivery, encompassing power failure mechanisms in complex 48V environments; explore the challenges of high power density and physical limitations; and provide detailed electrical specifications and qualification requirements for data center applications.
The dimensions of the Intel next platform have increased compared to the preceding ones, primarily due to the higher pin count needed to improve the signal-to-noise ratio in both PCI Express 6.0 and DDR5. This change makes it difficult to arrange two processors, each with 16 DDR5 channels, on a standard 19-inch rack. In response to this issue, Intel has embarked on a strategic initiative to accommodate this challenge, which involves a proposal to reduce the distance between DDR5 connectors (a.k.a. DIMM pitch) as well as the processor’s keep-out zone. To increase the DDR routing space underneath the DIMM connector’s pin-field area after shrinking the DIMM-to-DIMM pitch, VIPPO (Via-In-Pad Plated Over) PCB (Printed Circuit Board) technology is used. These technologies significantly enhance signal quality when embracing the next-generation MCRDIMM (Multiplexer Combined Ranks DIMM).
In this presentation, we will look at the requirements of next-generation higher-power ORv3 power supplies and HVDC power shelves, which will help increase rack payload and power density yet again, while supporting key design requirements ranging from hot swappability to battery backup. Among the topics covered during the session will be an update on key design specifications and design considerations, as well as the most recent ORv3 technologies – including power supplies, power shelves, shelf controllers and battery backup solutions. We will also explore the next-generation Rack and Power roadmaps.
AI/HPC solutions are being addressed by heterogeneous 2.5D and 3D packaging. Chiplets and HBM stacks are finding their way into products, enabling quicker development optimised for the required performance. Until now, chiplet-based integration has been mostly homogeneous (the same kind of chiplet designed within one company, with only the HBM stack from a third-party vendor), but the industry has started to move toward heterogeneous integration (different chiplets from different vendors). Complex package design with differing CTE (coefficient of thermal expansion) becomes a key aspect to address in material selection and design, as the physical phenomena can impact the physical and electrical behaviour of the device and hence final test yield, reliability, and field returns. A well-thought-out Design-for-Test through final ATE test strategy (wafer and package) is required to optimise test cost, performance, and product reliability, as a defect in even a single chiplet can lead to costly failures at the system level.
Wiwynn collaborates with Intel through the Open IP program to integrate a 1OU computing server into Intel’s single-phase immersion cooling tank, following the OCP ORv3 standard. The system uses Perstorp’s Synmerse DC synthetic ester coolant to thoroughly evaluate thermal performance under high-power workloads. In this study, CPUs are stressed up to 550W TDP, while researchers examine how variables such as CDU pumping frequency, inlet coolant temperature, and different heatsink types impact cooling effectiveness. Results are compared to those of traditional air cooling systems under similar operating conditions. The goal of this analysis is to optimize immersion cooling approaches, providing valuable insights for improving thermal management in high-performance computing and modern data centers.
According to the development trend of power consumption and heat dissipation in general servers and AI servers, the evolution of cooling solutions has progressed from air cooling to hybrid cooling, and then to full liquid cooling. In response to this trend, we propose an integrated liquid cooling solution for the building blocks of AI clusters, including the AI IT rack, High Power Cabinet, and Cooling Cabinet.
As system complexity grows, ensuring reliability, power efficiency, and performance is critical. proteanTecs, a leader in electronics monitoring, has integrated its deep data monitoring with Arm’s System Monitoring Control Framework (SMCF), enhancing Arm Neoverse CSS solutions with predictive analytics and lifecycle insights. SMCF offers a modular framework for telemetry, diagnostics, and control. By embedding proteanTecs' in-chip agents and software, the integration boosts system visibility, enabling optimized power/performance, improved reliability, and faster diagnostics. This collaboration empowers semiconductor manufacturers and system operators to meet evolving demands with scalable, architecture-agnostic solutions. The presentation will highlight key applications such as predictive maintenance, defect detection, and power optimization for next-gen high-performance compute environments.
The key focus of this presentation is on the safety requirements for liquid cooling systems, particularly regarding pressurized liquid-filled components (LFCs), as addressed in Annex G.15 of IEC 62368-1. By analyzing the construction and testing requirements specified in the standard, this presentation offers insights into designing safe and reliable liquid cooling solutions aimed at mitigating risks associated with leaks, preventing hardware damage, and ensuring global regulatory compliance in AI- and ML-driven data centers.
As the power consumption of each high-density AI server rack goes higher and higher, the design of the cabinet can no longer consider only a single AI server rack, but must also take the power cabinet and even the cooling cabinet into consideration. This presentation will introduce a rack architecture that integrates the AI server rack with the power loop and cooling loop.
Enabling Direct Liquid Cooled (DLC) IT solutions in data center environments requires a comprehensive understanding of the facility design, Coolant Distribution Units (CDU), and the IT solutions. There are many interdependencies and design considerations when integrating and commissioning DLC solutions in data center environments. The Open Compute Project (OCP) Community has many workgroups which are addressing various aspects of the DLC solution enablement.
The ORV3 OCP ecosystem currently lacks robust protection for the rack-loaded lifecycle in ship-loadable packaging. This presentation will highlight the innovative packaging solution developed to ensure safe transport of a fully-loaded ORV3 system. We will delve into the design considerations that maintain both rack protection and cost-efficiency. Additionally, we will provide an overview of the extensive testing conducted to validate the system’s resilience and ensure the protection of the rack and equipment from transportation-related impacts.
This study investigates galvanic corrosion in heterogeneous metal materials utilized in cold plate assemblies for single-phase liquid cooling systems. The galvanic corrosion behavior (Tafel plot) of pure copper, stainless steel 304, stainless steel 316, and nickel-based brazing fillers (BNi2 and BNi6) immersed in PG25 working fluid was measured on days 0, 7, and 60. Furthermore, accelerated aeration experiments were conducted on PG25 to assess its chemical stability, and its electrochemical properties were subsequently analyzed after 30 days of aeration using electrochemical methods.
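For readers unfamiliar with the measurement: a Tafel plot shows electrode overpotential against the logarithm of current density, and each linear branch follows the Tafel relation below (with β the Tafel slope), from which the corrosion current, and hence a corrosion rate, is extrapolated at the intersection of the anodic and cathodic branches.

    \eta = \beta \, \log_{10}\!\left( \frac{i}{i_{corr}} \right)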
This presentation offers a comprehensive overview of key accessories in the ORv3 ecosystem, highlighting two main areas: the 19” adapter and cabling & airflow management solutions. We will introduce essential components, including the 19” adapter rail, cable management arm, blanking panels, side skirts, and side expanders, detailing their design and benefits for the community. Additionally, the session will explore the extensive testing conducted on these accessories. These solutions are crucial for modern data centres, offering flexible, efficient, and organized approaches to infrastructure management.
Exploring liquid-cooled bus bars addresses the increasing power demands in modern data centers, particularly those exceeding 150kW per rack with AI and HPC workloads. Traditional bus bar designs struggle with current limitations, hindering efficient power management. Liquid-cooled bus bars integrate cooling channels to enhance heat dissipation, maintaining optimal temperatures and improving system safety and reliability. This approach mitigates thermal runaway risks and ensures compliance with industry standards, while boosting efficiency by minimizing energy losses associated with high current densities. Implementing liquid-cooled bus bars signifies a significant advancement in data center infrastructure, enabling higher power densities, superior thermal management, and overall improved performance.
We cool data centers in a very energy-efficient way, and we recover and reuse the excess heat produced within the data centers. This is what we consider green digitalization!
ChatGPT marked AI's watershed moment, triggering a tectonic shift in IT infrastructure and a race of extraordinary and lasting commitments to the AI Factory. Many governments and enterprises alike are making enormous capital and people investments so as not to be left behind in the AI boom. Corporate boardrooms are evaluating purposeful infrastructure plans. What is the best architectural decision: retrofitting, building from scratch, or adopting a wait-and-see approach? This fork in the road has given pause and decision paralysis to some infrastructure decision makers. Our talk examines the AI Factory Spectrum to identify solutions that advance the infrastructure challenge sustainably.
This study explores the long-term stability of immersion cooling fluids through accelerated aging experiments designed to replicate more severe operational conditions. As immersion cooling becomes a vital solution in high-performance and data-intensive systems, understanding fluid deterioration behavior under thermal and metal-induced decay is essential for ensuring system reliability. By subjecting the fluids to sustained thermal stress over time in the presence of metals, we continuously monitor key aging indicators such as flash point decline, dielectric constant and loss tangent shift, viscosity change, acid number increase, and oxide accumulation. These metrics are then used to construct predictive models that define the fluid's "stability window" under real-world conditions. The resulting approach enables manufacturers and system integrators to determine quality assurance periods more accurately, facilitating better maintenance planning and formulation design.
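The authors' predictive model is not reproduced here; a common hedged starting point for relating accelerated thermal aging to service life is an Arrhenius acceleration factor between the test temperature and the operating temperature (both in kelvin), with E_a the activation energy of the dominant degradation reaction and k_B the Boltzmann constant.

    AF = \exp\!\left[ \frac{E_a}{k_B} \left( \frac{1}{T_{use}} - \frac{1}{T_{test}} \right) \right]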