In 2023, we witnessed the explosion of artificial intelligence (AI), which is changing the way people work, live, and interact with technology. Generative AI, represented by ChatGPT, attracted a great deal of attention over the past year thanks to its remarkable progress and wide range of applications. As AI continues to develop and mature, it has the potential to revolutionize industries ranging from healthcare, finance, and manufacturing to transportation and entertainment. The huge demand for AI is driving the development of new chip and server technologies, and these changes will bring disruptive challenges to data center construction, power demand, water consumption, power supply and distribution, and cooling technology and architecture. How to deal with these challenges will be a central concern for the industry in the new year.
As a global leader in data center infrastructure and digital services for key industry applications, Schneider Electric has released its "Visible Future: New Trends and New Breakthroughs in the Data Center Industry" series of insights at the start of each year since 2018, now for the seventh consecutive year, pioneering the forward-looking interpretation of industry trends and injecting strong momentum into the data center industry. Based on deep industry insight and practice, Schneider Electric is committed to revealing what will change in the data center industry in the new year, what these changes and trends mean for data center operators, and its own perception of and value proposition for these industry changes. The following are the 2024 forecasts from Schneider Electric's Global Data Center Research Center.
Trend one
The intelligent computing center will lead the construction of data centers
Over the past decade, cloud computing has been a major driver of data center construction and development, providing society with the general-purpose computing power needed for digital transformation. However, the explosion of AI has brought a huge demand for computing power, and to meet the training and inference needs of large AI models, a large number of intelligent computing centers must be built.
Based on data such as global data center power consumption and shipments of GPU chips and AI servers, Schneider Electric estimates that the current power demand of global intelligent computing centers is 4.5 GW, or 8% of the 57 GW data center total, and predicts that it will grow at a compound annual growth rate of 26-36% through 2028, eventually reaching 14.0-18.7 GW, or 15-20% of a 93 GW total. This growth rate is two to three times the compound annual growth rate of traditional data centers (4-10%). The distribution of computing power will also migrate from today's largely centralized deployment (95% centralized vs. 5% at the edge) toward the edge (50% vs. 50%), which means that intelligent computing centers will lead the trend of data center construction. According to the planning of China's Ministry of Industry and Information Technology, intelligent computing power will account for 35% of the country's total by 2025, a compound annual growth rate of more than 30%.
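The compound-growth arithmetic behind these estimates is simple to reproduce; the sketch below applies the quoted 26-36% CAGR to the 4.5 GW baseline. The five-year 2023-2028 horizon is an assumption, and the quoted 18.7 GW upper bound corresponds to an effective rate of roughly 33% per year over that window.

```python
# Compound-growth sketch for the AI ("intelligent computing") power-demand estimate.
# Baseline and growth rates are the figures quoted above; the 5-year horizon
# (2023 -> 2028) is an assumption.

def project(base_gw: float, cagr: float, years: int) -> float:
    """Compound annual growth: base * (1 + cagr) ** years."""
    return base_gw * (1 + cagr) ** years

BASE_2023_GW = 4.5
YEARS = 5

low = project(BASE_2023_GW, 0.26, YEARS)    # ~14.3 GW, close to the quoted 14.0 GW
high = project(BASE_2023_GW, 0.33, YEARS)   # ~18.7 GW, matching the quoted upper bound

print(f"Projected 2028 AI load: {low:.1f}-{high:.1f} GW")
print(f"Share of a 93 GW total: {low/93:.0%}-{high/93:.0%}")
```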
Schneider Electric view:
Compared with traditional data centers, intelligent computing centers must be built for sustainability and with greater foresight while still ensuring high energy efficiency and high availability: minimizing environmental impact and, in particular, improving adaptability to the requirements of future IT technology (higher-power chips and servers).
Trend two
AI will drive the power density of the cabinet to increase rapidly
Cabinet power density has a great impact on the design and cost of data centers, including power supply and distribution, cooling, and the layout of IT rooms, and has always been one of the design parameters of data centers.
Uptime's findings over the past few years show that the power density of server cabinets has been climbing steadily but slowly. Average cabinet power density is usually below 6 kW, and most operators have no cabinets exceeding 20 kW. The reasons include Moore's Law keeping chip thermal design power relatively low (around 150 watts) and the practice of spreading high-density servers across different cabinets to reduce infrastructure requirements. The explosion of AI will change this trend.
Schneider Electric view:
The power density of AI training cabinets can be as high as 30-100 kW (depending on the chip type and server configuration). The reasons for this high density are many: the rapidly rising thermal design power of CPUs and GPUs, now 200-400 watts for CPUs and 400-700 watts for GPUs and set to increase further; AI servers typically consuming around 10 kilowatts; and, because GPUs work in parallel, the need to deploy AI servers compactly in clusters to reduce network latency between chips and storage. This sudden jump in cabinet power density poses a huge challenge to the design of data center physical infrastructure.
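A quick back-of-envelope calculation shows how the roughly 10 kW per server quoted above scales to the 30-100 kW cabinet range; the servers-per-rack counts below are illustrative assumptions.

```python
# Rack-density arithmetic using the ~10 kW-per-AI-server figure quoted above.
# The servers-per-rack counts are illustrative assumptions.

AI_SERVER_KW = 10.0  # typical AI training server power, per the text

def rack_density_kw(servers_per_rack: int, server_kw: float = AI_SERVER_KW) -> float:
    return servers_per_rack * server_kw

for servers in (3, 4, 8, 10):
    print(f"{servers} servers per rack -> ~{rack_density_kw(servers):.0f} kW")
# 3-10 servers per rack spans roughly the 30-100 kW range cited above.
```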
Trend three
Data centers are transitioning from air-cooled to liquid-cooled
Air cooling has long been the mainstream approach to cooling data center IT rooms and, if properly designed, can support cabinet power densities of more than a dozen kilowatts or even higher. But in the constant pursuit of AI training performance, chip developers keep raising thermal design power, and air cooling these chips becomes impractical. Some server vendors continue to push the limits of air cooling by redesigning chip heat sinks, increasing server airflow and the inlet-outlet temperature difference, and configuring 40-50 kW air-cooled AI cabinets, but this dramatically increases fan power consumption: fans in AI servers can consume up to 25% of server power, versus a typical 8% in traditional servers.
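To put that fan overhead in perspective, the sketch below compares the two percentages quoted above; the 500 W traditional server and 10 kW AI server power values are assumptions for illustration.

```python
# Fan-overhead comparison using the percentages quoted above (~8% for a
# traditional server, up to ~25% for an air-cooled AI server). Server power
# values are illustrative assumptions.

def fan_power_w(server_w: float, fan_fraction: float) -> float:
    return server_w * fan_fraction

print(f"Traditional 500 W server: ~{fan_power_w(500, 0.08):.0f} W of fan power")
print(f"Air-cooled 10 kW AI server: ~{fan_power_w(10_000, 0.25)/1000:.1f} kW of fan power")
```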
Schneider Electric view:
Chip cooling is the main driver of liquid cooling, and a cabinet power density of 20 kW is a reasonable dividing line between air cooling and liquid cooling. When AI cabinet power density exceeds this value, liquid-cooled servers should be considered.
Liquid cooling also offers a number of benefits over air cooling, including improved processor reliability and performance, better energy efficiency, reduced water consumption, and lower noise levels. Currently, vendors typically offer both air-cooled and liquid-cooled versions of high-density AI servers, but for next-generation GPUs, liquid cooling will be the only option.
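A trivial decision sketch captures the 20 kW dividing line described above; the threshold is the figure quoted in the text, not a universal rule.

```python
# Simple decision helper for the 20 kW air/liquid dividing line discussed above.

def cooling_recommendation(rack_kw: float, threshold_kw: float = 20.0) -> str:
    if rack_kw > threshold_kw:
        return "consider liquid-cooled servers"
    return "air cooling is typically sufficient"

for rack in (6, 18, 36, 80):
    print(f"{rack} kW rack: {cooling_recommendation(rack)}")
```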
Trend four
The safety and reliability of power distribution is more important in the intelligent computing center
For traditional data centers, the probability of different workloads peaking at the same time is extremely low; the peak-to-average ratio of a typical large data center is usually 1.5-2.0 or higher. In an intelligent computing center, however, the AI training load varies little (its peak-to-average ratio is close to 1.0), and the workload can run at peak power for hours, days, or even weeks. The result is a higher likelihood of tripping large upstream circuit breakers, and with it the risk of downtime. At the same time, the higher cabinet power densities require circuit breakers, row-level distribution cabinets (RPPs), busways, and other equipment with higher current ratings; their lower resistance allows larger fault currents to pass, which increases arc flash risk in the IT room, so the safety of staff working in these areas must be addressed.
Schneider Electric view:
In the design phase, simulation software should be used to assess arc flash risk in the power system, analyze the fault currents that can be generated, and analyze reliability, in order to design the best solution for the specific site.
This analysis must extend from the medium-voltage switchgear down to the cabinet level. It is recommended that if the AI training workload in the IT room of a new data center exceeds 60-70%, the main circuit breaker should be sized based on the sum of the downstream feeder circuit breakers, with no diversity (simultaneity) factor applied in the design.
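The sketch below illustrates the sizing rule just described, comparing the traditional diversity-factor approach with summing the feeder ratings directly; the feeder ratings and the 0.6 diversity factor are illustrative assumptions, not values from the text.

```python
# Main-breaker sizing sketch for an AI-dominated IT room, per the rule above.
# Feeder ratings and the diversity factor are illustrative assumptions.

feeder_breakers_a = [250, 250, 400, 400, 630]   # downstream feeder ratings (A)

# Traditional practice: apply a diversity factor because loads rarely peak together.
diversity_factor = 0.6
traditional_main_a = sum(feeder_breakers_a) * diversity_factor

# AI training practice (peak-to-average ratio near 1.0): assume all feeders
# can run at rated load simultaneously, so no diversity factor is applied.
ai_main_a = sum(feeder_breakers_a)

print(f"Sum of feeders: {sum(feeder_breakers_a)} A")
print(f"Traditional sizing (x{diversity_factor}): {traditional_main_a:.0f} A")
print(f"AI-dominated sizing (no diversity factor): {ai_main_a} A")
```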
Trend five
Standardization will be key to advancing liquid cooling
Cold plate liquid cooling and immersion liquid cooling are the two main liquid cooling methods in data centers. Which method to choose and how to deploy it quickly has long been a hot topic in the industry.
As more and more AI servers adopt cold plate liquid cooling, which is also more compatible with traditional air cooling systems, it is favored by many data center operators. However, the variety of server manufacturers' design approaches, compatibility problems with quick connectors, blind-mate connectors, and manifolds, and blurred boundaries of responsibility between IT and facility infrastructure have greatly limited the acceptance and adoption of liquid cooling in data centers.
Compared with cold plate liquid cooling, immersion liquid cooling using fluorocarbon fluids is not only relatively expensive; many fluorocarbon compounds are synthetic chemicals harmful to the environment and face growing regulatory and policy pressure. Apart from oil-based coolants, therefore, fewer and fewer fluorocarbon fluids will remain available.
Schneider Electric view:
IT manufacturers should provide more standardized design solutions, covering fluid temperature, pressure, flow rate, equipment interfaces, and so on, and define clearer boundaries of responsibility.
Schneider Electric will release a liquid cooling white paper in the first quarter to help data centers better deploy liquid cooling technology.
Trend six
Data centers will focus more on WUE
Water scarcity is becoming a serious problem in many regions, and understanding and reducing data center water consumption is increasingly important. An important reason data center water consumption has received little attention until now is that the cost of water is often negligible compared with electricity; many data centers even use more water in order to improve energy efficiency. However, data center water use has attracted the attention of many local governments, especially in water-scarce areas, which are introducing policies to limit and optimize it, including making WUE a design indicator and adopting dual controls on water and electricity. Reducing water consumption will therefore be a key focus for many data center operators in the future.
Schneider Electric view:
A data center WUE between 0.3 and 0.45 L/kWh is relatively good. Schneider Electric recommends finding a balance between electricity and water based on water availability at the data center's location, the local climate conditions, and the type of data center.
The industry can adopt technological innovations such as adiabatic evaporation, indirect evaporative cooling, and liquid cooling to reduce direct water consumption. Data center operators should make WUE part of their sustainability goals, report on water use and savings, and also pay attention to the indirect water consumption associated with electricity use.
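For reference, WUE is simply annual site water consumption divided by IT energy. The sketch below shows the arithmetic with illustrative load and water figures (assumptions), landing inside the 0.3-0.45 L/kWh band mentioned above.

```python
# WUE arithmetic: WUE = site water consumed (litres) / IT energy (kWh).
# The IT load and annual water figures below are illustrative assumptions.

def wue(water_litres: float, it_energy_kwh: float) -> float:
    return water_litres / it_energy_kwh

it_load_kw = 1_000                    # assumed average IT load
it_energy_kwh = it_load_kw * 8760     # one year of IT energy
annual_water_litres = 3_500_000       # assumed annual water consumption

print(f"WUE = {wue(annual_water_litres, it_energy_kwh):.2f} L/kWh")  # ~0.40, within 0.3-0.45
```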
Trend seven
Improving the power distribution capacity will become a new demand of the intelligent computing center
In the intelligent computing center, as cabinet power density rises and AI cabinets are deployed in clusters, power distribution in the IT room faces the challenge of insufficient rated capacity. In the past, for example, a 300 kW power distribution module could support dozens or even hundreds of cabinets; today, the same module cannot support even a minimal NVIDIA DGX SuperPOD AI cluster (a single row of 10 cabinets at 358 kW, roughly 36 kW per cabinet). When power distribution module ratings are too small, using several of them not only wastes IT space but becomes impractical, and multiple modules also cost more than a single high-capacity module. Returning to the fundamentals of distribution, the main way to increase distribution capacity is to increase current.
Schneider Electric view:
Select power distribution modules with higher ratings so they can be deployed flexibly and meet future power distribution requirements, ensuring that at least one row of an AI cluster can be supported.
For example, at rated voltage, the 800 A power distribution module is the current standard capacity size for all three distribution types (PDU, RPP, and busway) and can deliver 576 kW (461 kW after derating). Busways (small bus bars) can be used for terminal distribution, avoiding the need for custom cabinet PDUs with a rated current greater than 63 A. Where space permits, multiple standardized cabinet PDUs can be used as a transition.
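The capacity figures above follow from the three-phase power formula P = √3 × V × I. The sketch below reproduces them, assuming a 415 V line voltage and an 80% continuous-load derating; both values are assumptions consistent with, but not stated in, the text.

```python
import math

# Three-phase capacity check for the 800 A module example above.
# The 415 V line voltage and 80% derating are assumptions consistent with
# the quoted 576 kW / 461 kW figures.

def three_phase_kw(volts_ll: float, amps: float, power_factor: float = 1.0) -> float:
    return math.sqrt(3) * volts_ll * amps * power_factor / 1000

nameplate = three_phase_kw(415, 800)   # ~575 kW (the text quotes 576 kW)
derated = nameplate * 0.8              # ~460 kW (the text quotes 461 kW)

row_kw = 358  # one NVIDIA DGX SuperPOD row, per the text
print(f"800 A module: {nameplate:.0f} kW nameplate, {derated:.0f} kW derated")
print(f"Supports one {row_kw} kW row: {derated >= row_kw}")
```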