Show HN: DomainTasker – avoid losing domains and surprise renewals
Comments
🇺🇸 미국 · IT/기술 · "AVOID" · 총 18건
필터 보기현재 지수
50.0
0 = 부정 우세
50 = 중립
100 = 긍정 우세
최근 7일 기준 11,320건을 분석한 결과, 뉴스 심리지수는 50.0(균형)입니다. 긍정 1건(0.0%)·중립 11,318건(100.0%)·부정 1건(0.0%)이며, 중립 비중이 뚜렷하게 높습니다. 성향 지수는 종합 19.2(중도 균형)입니다.
Comments
C.C. Wei told shareholders the company is working hard to avoid becoming a bottleneck as AI demand overwhelms capacity across the supply chain
It's almost impossible to avoid seeing AI-generated content online, but it doesn't have to be this way. YouTube, Instagram, TikTok, and more have ramped up content authentication efforts over the last year, with many now automatically applying labels to distinguish AI-generated images, videos, and music from those made by real, human creators. That's all very […]
Walmart's viral Code Puppy AI tool helps avoid vendor lock-in, cut costs, and reduce dependence on Claude Code and Codex.
The Dreame L20 Ultra isn’t the company’s newest model, but it’s still a great robovac / mop hybrid that offers strong performance while requiring very little day-to-day maintenance thanks to its included trash bin and AI obstacle avoidance. Verge readers can get for its best-ever price right now. Originally $1,400 when it launched in 2023, […]
The MAI model family spans reasoning, coding, image, voice, and transcription — and lets Microsoft avoid paying third parties like OpenAI
A glass screen protector is one of a few essential accessories that I strongly recommend to every Switch 2 owner. In fact, it should be a priority to stick one onto the console’s screen as soon as possible to avoid accidental scratches. To test the candidates below, I installed and removed Switch 2 screen protectors […]
Amazon is a murky mess of ads, unknown sellers, misleading sales, and specious information. Stay safe while shopping on Prime Day and beyond with these tips and tricks.
Microsoft has delayed its upcoming Fable reboot once again. The game was set to launch in autumn 2026, but Microsoft now says that Fable will come out in February 2027. However, it will show a "new look" at the game at its Xbox Games Showcase on June 7th. "This is year is packed with incredible […]
Comments
Anthropic is releasing Claude Opus 4.8 on Thursday, and the company is touting the model's "honesty." According to Anthropic, it trains "all [its] models to be honest - for instance, to avoid making claims that they can't support." But it notes that "a general problem with AI models is that they sometimes jump to conclusions, […]
Comments
Go into the office, embrace 'messy jobs,' and other dos and don'ts for the future workplace in the age of AI, advises Tyler Cowen.
Learn how to fact check AI with tips and techniques to verify accuracy, avoid hallucinations, and ensure reliable information from tools like ChatGPT.
So why didn’t Swatch do anything to avoid it?
For years, the field of robotics has used the terms “dull, dirty, and dangerous” (DDD) to describe the types of tasks or jobs where robots might be useful—by doing work that’s undesirable for people. A classic example of a DDD job is one of “repetitive physical labor on a steaming hot factory floor involving heavy machinery that threatens life and limb.” But determining which human activities fit into these categories is not as straightforward as it seems. What exactly is a “dull” task, and who makes that assumption? Is “dirty” work just about needing to wash your hands afterwards, or is there also an aspect of social stigma? What data can we rely on to classify jobs as “dangerous?” Our recent work (which was not dull at all) tackles these questions and proposes a framework to help roboticists understand the job context for our technology. First, we did an empirical analysis of robotics publications between 1980 and 2024 that mention DDD and found that only 2.7 percent define DDD and only 8.7 percent provide examples of tasks or jobs. The definitions vary, and many of the examples aren’t particularly specific (for example, “industrial manufacturing,” “home care”). Next, we reviewed the social science literature in anthropology, economics, political science, psychology, and sociology to develop better definitions for “dull,” “dirty,” and “dangerous” work. Again, while it might seem intuitive which tasks to put into these buckets, it turns out that there are some underlying social, economic, and cultural factors that matter. Dangerous Work: Occupations or tasks that result in injury or risk of harm It’s possible to measure the danger of a task or job by using reported information. There are administrative records and surveys that provide numbers on occupational injury rates and hazardous risk factors. While that seems straightforward, it’s important to understand how this data was collected, reported, and verified. First, occupational injuries tend to be underreported, with some studies estimating up to 70 percent of cases missing in administrative databases. Second, injuries and risk factors are rarely disaggregated by characteristics like gender, migration status, formal/informal employment, and work activities. For example, because most personal protective equipment—such as masks, vests, and gloves—are sized for men, women in dangerous work environments face increased safety risks. These caveats are an opportunity for robotics to be helpful. If we went out and looked for it, we could probably find some less obviously dangerous work where robotics might be an important intervention, not to mention some groups that are disproportionately affected and would benefit from more workplace safety. Dirty Work: Occupations or tasks that are physically, socially, or morally tainted Colloquially, most people might think of dirty work as involving physical dirtiness, such as trash removal, cleaning, or dealing with hazardous substances. But social science literature makes clear that dirty work is also about stigma. Socially tainted jobs are often servile or involve interacting with stigmatized groups (for example, correctional officers), and morally tainted jobs include tasks that people commonly perceive as sinful, deceptive, or otherwise defying norms of civility (like a stripper or a collection agent). “Dirty work” is a social construct that can vary across time (like tattoo industry stigma in the United States) and culture (such as nursing in the U.S. versus in Bangladesh). One way to measure whether work is “dirty” is by using the closely related concept of occupational prestige, captured through quantitative surveys where people rank jobs. Another way to measure it is through qualitative data, like ethnographies and interviews. Similar to “dangerous,” we see some hidden opportunities for robotics in “dirty” work. But one of our more interesting takeaways from the data is that a lower-ranked job can be something that the workers themselves enjoy or find immense pride and meaning in. If we care about what tasks are truly undesirable, understanding this worker perspective is important. Dull Work: Occupations or tasks that are repetitive and lacking in autonomy When it comes to defining dull work, what matters most is workers’ own experiences. Outsiders can make a lot of false assumptions about what tasks have value and meaning. Sometimes things that seem boring or routine create the right conditions for developing skills and competence, such as the concentration needed for woodworking, or for socializing and support, when tasks are done alongside others. Instead of assuming that repetitive work is negative, it’s important to examine qualitative data on how people experience the work and what purpose it serves for them. DDD: An actionable framework In our paper, we propose a framework to help the robotics community explore how automation impacts individual jobs. For each term—dull, dirty, and dangerous—the framework gathers key pieces of information to reflect on what physical or social aspects of the task are, in fact, DDD. Worker perspective is an important part of all three considerations. The framework also emphasizes awareness of context—meaning the physical and social environment of an occupation and industry that can influence the DDD nature of a task. Our corresponding worksheet suggests existing data sources to draw on and encourages us to seek out multiple perspectives and consider potential sources of bias in the information. What makes tasks dull, dirty, or dangerous depends on the perspective of the humans doing those tasks.RAI Let’s take, for example, the waste and recycling industry. The world generates over 2 billion tonnes of waste annually, and this figure is expected to rise to nearly 4 billion tonnes by 2050. Intuitively, trash collection seems like a job that hits all the Ds. Going through our worksheet, we confirm that globally, workers in this industry face significant health hazards (dangerous), and waste collection is ranked as a low-status job (dirty), although interestingly, many workers take pride in providing this essential service. The job is also repetitive, but there are aspects that make it not dull. Specifically, workers cite the day-to-day interaction with their coworkers (which includes extensive insider vocabulary, work hacks, and mutual aid groups) and task variety as two of the most enjoyable aspects of the job. Task variety includes inspecting their vehicle and equipment, driving their truck, coordinating with crew members, lifting bins and bags, detecting incorrect sorting of waste, and unloading at the end destination. This finding matters because some types of robotic solutions will eliminate the parts of the job that workers most appreciate. For instance, the National Institute for Occupational Safety and Health (NIOSH) recommends the adoption of automated side loader trucks and collision avoidance systems. This innovation increases safety, which is great, but it also results in a sole worker operating a joystick in a cab, surrounded by sensor and camera surveillance. Instead, we should challenge ourselves to think of solutions that make jobs safer without making them terrible in a different way. To do this, we need to understand all aspects of what makes a job dull, dirty, or dangerous (or not). Our framework aims to facilitate this understanding. Finally, it’s important to note that DDD is only one of many possible approaches to classify what work might be better served by robots. There are lots of ways we could think about which types of tasks or jobs to automate (for example, economic impact or environmental sustainability). Given the popularity of DDD in robotics, we chose this common phrase as a starting point. We would love to see more work in this space, whether it’s data collection on DDD itself or the creation of other frameworks. At RAI, we believe that the fusion of robotics and social sciences opens a whole new world of information, perspectives, opportunities, and value. It fosters a culture of curiosity and mutual learning, and allows us to create actionable tools for anyone in robotics who cares about societal impact. Dull, Dirty, Dangerous: Understanding the Past, Present, and Future of a Key Motivation for Robotics, by Nozomi Nakajima, Pedro Reynolds-Cuéllar, Caitrin Lynch, and Kate Darling from the RAI Institute, was presented at the 21st ACM/IEEE International Conference on Human-Robot Interaction (HRI) in Edinburgh, Scotland.
This sponsored article is brought to you by Ampace. As AI workloads grow to gigascale levels, the global data center industry has hit a hidden physical wall. The real bottleneck is no longer just the thermal limit of the chip or the capacity of the cooling system — it is the dynamic resilience of the power chain. Modern AI computing clusters, driven by massive GPU clusters, generate high-frequency, abrupt, and synchronized spikey pulse loads. As rack densities soar beyond 100 kW, these fluctuations are amplified into a “power paradox”: while the digital logic of AI is moving faster than ever, the physical infrastructure supporting it remains tethered to legacy response capabilities. The power usage of these gigascale sites and their drastic, high frequency, abrupt load surges from the AI GPU clusters can trigger transient voltage events and frequency instability, risking the entire local grid. The grid itself is not robust enough to support these loads. This leads to the infrastructure gap: The utility is not robust enough and traditional backup sources, such as diesel generators and gas turbines, simply cannot react to millisecond-level power spikes in output. This will often force operators into a cycle of costly infrastructure over sizing just to buffer the volatility. AI infrastructure requires energy systems capable of instantaneous response while safeguarding continuity and reliability. The industry has explored various mitigations — from rack-level BBUs to 800V DC architectures — yet the mature, high volume, traditional UPS system remains the most viable and scalable foundation for gigawatt-level facilities. Consequently, the UPS-integrated battery system has emerged as the critical “physical buffer” to neutralize these pulses at the source. At Data Center World 2026 in Washington, D.C., Ampace led a pivotal technical dialogue with Eaton during the session “Powering Giga-scale AI.” Their exchange unveiled a fundamental paradigm shift: To bridge the AI power gap, energy storage must evolve from a passive insurance policy into an active, high-speed stabilizer. By aligning Ampace’s semi-solid-state battery innovation with Eaton’s proven system intelligence, we are moving beyond simple backup to solve the physical paradox of the AI era. To move beyond simple backup and solve the physical paradox of the AI era, Ampace is aligning its semi-solid-state battery innovation with Eaton’s proven system intelligence.Ampace The “Shock Absorber” physics: semi-solid chemistry for AI pulses Conventional power systems were designed for steady-state loads, not the rapid heartbeat of a massive AI GPU cluster. When thousands of GPUs synchronize their computing cycles, they generate high-frequency, abrupt pulse loads that can lead to voltage sags, frequency oscillations, and potential interruptions of critical AI training. Ampace’s PU Series semi-solid and low-electrolyte cells address this challenge by acting as high-speed “shock absorbers.” Leveraging ultra-low internal resistance (DCR) and high cycle capability, these batteries neutralize millisecond-level power spikes at the source, stabilizing the local power loop before disturbances propagate upstream to the grid or on-site generators. These high-rate cells enable 100 kW+ racks to maintain peak performance without transmitting instability across the power chain. This capability aligns closely with Eaton’s matured UPS architectures, such as double-conversion topologies and advanced power electronics upgrades, which have long prioritized rapid load responsiveness and high system stability. Together, these approaches embody a shared industry philosophy: AI infrastructure requires energy systems capable of instantaneous response while safeguarding continuity and reliability. Ampace’s semi-solid state chemistry minimizes liquid electrolyte, greatly reducing the risk of leakage and thermal runaway under continuous AI high-load conditions.Ampace Algorithmic intelligence: synchronizing energy and control Hardware alone cannot solve the AI power paradox; the system also requires intelligent coordination between energy storage and power management. Sophisticated battery management systems (BMS) like Ampace’s high-precision design track state-of-charge (SOC) with high-speed sampling, even during rapid, shallow cycling typical in AI workloads. Complementary algorithmic approaches in modern UPS platforms — such as ramp-rate control and average power management — effectively suppress sub-synchronous oscillations and optimize load smoothing. In large-scale AI training environments, where thousands of GPUs can trigger millisecond-level power pulses, these intelligent layers ensure that batteries buffer high-frequency fluctuations without compromising the mandatory emergency backup reserves. By transforming energy storage from passive “standby insurance” into active, schedulable assets, the system simultaneously safeguards continuous AI training and maintains the long-term health of the data center infrastructure. In practical terms, this means that even during peak compute bursts, the infrastructure remains stable, training cycles continue uninterrupted, and operators avoid costly oversizing or grid stress. Eaton’s dual-layer algorithms serve as a valuable benchmark in this space, demonstrating how advanced control logic can achieve similar objectives, reinforcing Ampace’s approach and philosophy within the broader data center power ecosystem. Economic scalability: optimizing AI infrastructure efficiently One of the largest costs in deploying AI infrastructure is “oversizing”: procuring transformers, generators, and UPS systems to handle brief peak spikes. This traditional approach inflates the Total Cost of Ownership (TCO) and leads to wasted capital on underutilized hardware. Ampace’s turn-key cabinet design developed by its independent R&D is engineered for seamless compatibility with mature, high volume UPS systems. By leveraging Eaton’s double-conversion UPS topologies alongside intelligent ramp-rate and average power management algorithms, AI data centers can scale dynamically without requiring costly infrastructure redesigns. This approach allows the UPS and batteries to act as active load-shapers, smoothing AI-driven pulses while strictly maintaining mandatory emergency backup capacity. By utilizing energy storage as an active, schedulable asset, operators can right-size their infrastructure, avoid unnecessary grid upgrades, and deploy gigascale AI clusters with unprecedented efficiency. Safety First: Protecting AI Infrastructure While Enabling Innovation In high-density AI facilities, safety is non-negotiable. Ampace’s semi-solid state chemistry minimizes liquid electrolyte, greatly reducing the risk of leakage and thermal runaway under continuous AI high-load conditions. Ampace’s turn-key cabinet design developed by its independent R&D is engineered for seamless compatibility with mature, high volume UPS systems. Ampace At the same time, Eaton’s UPS design emphasizes system-level energy scheduling that never sacrifices mandatory emergency backup reserves, ensuring thermal safety and uninterrupted operation. This “safety-first” approach ensures that infrastructure can sustain aggressive performance targets without compromising the physical integrity of the facility. Coupled with over a decade of proven high-cycle life operation and design under shallow pulse conditions, these systems can extend operational lifespan, reduce replacement requirements, and provide operators with confidence that safety and reliability remain uncompromised as compute density continues to grow. To remain the scalable backbone of AI data centers As AI computing scales over the next two to three years, the industry will face stricter grid requirements and even more demanding pulse load characteristics. This evolution demands a forward-looking design philosophy that harmonizes UPS, battery, and grid compatibility. Ampace views current low-electrolyte semi-solid technologies as the optimal transitional step toward a fully solid-state future — one that promises ultimate safety and performance. Ampace remains committed to this long-term technological roadmap. We view current low-electrolyte semi-solid technologies as the optimal transitional step toward a fully solid-state future — one that promises ultimate safety and performance. Whether through rack-level BBU, integrated UPS systems, or containerized storage, the universal core of the AI era remains constant: high-speed response, long shallow-cycle life, and refined energy management. By engaging in deep technical exchanges with Eaton and leading energy innovators, Ampace ensures that its solutions not only meet today’s AI pulse challenges but also harmonize with broader infrastructure strategies and shared industry best practices. Ultimately, as traditional diesel generators gradually give way to diversified alternatives, the integrated UPS-plus-energy-storage system will become the fundamental infrastructure standard. The dialogue has just begun. Ampace will continue to engage in strategic exchanges with global industrial automation leaders and digital energy pioneers, co-authoring the playbook for a safer, more efficient, and more resilient AI-ready world.
When it comes to AI models, size matters. Even though some artificial-intelligence experts warn that scaling up large language models (LLMs) is hitting diminishing performance returns, companies are still coming out with ever larger AI tools. Meta’s latest Llama release had a staggering 2 trillion parameters that define the model. As models grow in size, their capabilities increase. But so do the energy demands and the time it takes to run the models, which increases their carbon footprint. To mitigate these issues, people have turned to smaller, less capable models and using lower-precision numbers whenever possible for the model parameters. But there is another path that may retain a staggeringly large model’s high performance while reducing the time it takes to run an energy footprint. This approach involves befriending the zeros inside large AI models. For many models, most of the parameters—the weights and activations—are actually zero, or so close to zero that they could be treated as such without losing accuracy. This quality is known as sparsity. Sparsity offers a significant opportunity for computational savings: Instead of wasting time and energy adding or multiplying zeros, these calculations could simply be skipped; rather than storing lots of zeros in memory, one need only store the nonzero parameters. Unfortunately, today’s popular hardware, like multicore CPUs and GPUs, do not naturally take full advantage of sparsity. To fully leverage sparsity, researchers and engineers need to rethink and re-architect each piece of the design stack, including the hardware, low-level firmware, and application software. In our research group at Stanford University, we have developed the first (to our knowledge) piece of hardware that’s capable of calculating all kinds of sparse and traditional workloads efficiently. The energy savings varied widely over the workloads, but on average our chip consumed one-seventieth the energy of a CPU, and performed the computation on average eight times as fast. To do this, we had to engineer the hardware, low-level firmware, and software from the ground up to take advantage of sparsity. We hope this is just the beginning of hardware and model development that will allow for more energy-efficient AI. What is sparsity? Neural networks, and the data that feeds into them, are represented as arrays of numbers. These arrays can be one-dimensional (vectors), two-dimensional (matrices), or more (tensors). A sparse vector, matrix, or tensor has mostly zero elements. The level of sparsity varies, but when zeroes make up more than 50 percent of any type of array, it can stand to benefit from sparsity-specific computational methods. In contrast, an object that is not sparse—that is, it has few zeros compared with the total number of elements—is called dense. Sparsity can be naturally present, or it can be induced. For example, a social-network graph will be naturally sparse. Imagine a graph where each node (point) represents a person, and each edge (a line segment connecting the points) represents a friendship. Since most people are not friends with one another, a matrix representing all possible edges will be mostly zeros. Other popular applications of AI, such as other forms of graph learning and recommendation models, contain naturally occurring sparsity as well. Beyond naturally occurring sparsity, sparsity can also be induced within an AI model in several ways. Two years ago, a team at Cerebras showed that one can set up to 70 to 80 percent of parameters in an LLM to zero without losing any accuracy. Cerebras demonstrated these results specifically on Meta’s open-source Llama 7B model, but the ideas extend to other LLM models like ChatGPT and Claude. The case for sparsity Sparse computation’s efficiency stems from two fundamental properties: the ability to compress away zeros and the convenient mathematical properties of zeros. Both the algorithms used in sparse computation and the hardware dedicated to them leverage these two basic ideas. First, sparse data can be compressed, making it more memory efficient to store “sparsely”—that is, in something called a sparse data type. Compression also makes it more energy efficient to move data when dealing with large amounts of it. This is best understood by an example. Take a four-by-four matrix with three nonzero elements. Traditionally, this matrix would be stored in memory as is, taking up 16 spaces. This matrix can also be compressed into a sparse data type, getting rid of the zeros and saving only the nonzero elements. In our example, this results in 13 memory spaces as opposed to 16 for the dense, uncompressed version. These savings in memory increase with increased sparsity and matrix size. In addition to the actual data values, compressed data also requires metadata. The row and column locations of the nonzero elements also must be stored. This is usually thought of as a “fibertree”: The row labels containing nonzero elements are listed and linked to the column labels of the nonzero elements, which are then linked to the values stored in those elements. In memory, things get a bit more complicated still: The row and column labels for each nonzero value must be stored as well as the “segments” that indicate how many such labels to expect, so the metadata and data can be clearly delineated from one another. In a dense, noncompressed matrix data type, values can be accessed either one at a time or in parallel, and their locations can be calculated directly with a simple equation. However, accessing values in sparse, compressed data requires looking up the coordinates of the row index and using that information to “indirectly” look up the coordinates of the column index before finally reaching the value. Depending on the actual locations of the sparse data values, these indirect lookups can be extremely random, making the computation data-dependent and requiring the allocation of memory lookups on the fly. Second, two mathematical properties of zero let software and hardware skip a lot of computation. Multiplying any number by zero will result in a zero, so there’s no need to actually do the multiplication. Adding zero to any number will always return that number, so there’s no need to do the addition either. In matrix-vector multiplication, one of the most common operations in AI workloads, all computations except those involving two nonzero elements can simply be skipped. Take, for example, the four-by-four matrix from the previous example and a vector of four numbers. In dense computation, each element of the vector must be multiplied by the corresponding element in each row and then added together to compute the final vector. In this case, that would take 16 multiplication operations and 16 additions (or four accumulations). In sparse computation, only the nonzero elements of the vector need be considered. For each nonzero vector element, indirect lookup can be used to find any corresponding nonzero matrix element, and only those need to be multiplied and added. In the example shown here, only two multiplication steps will be performed, instead of 16. The trouble with GPUs and CPUs Unfortunately, modern hardware is not well suited to accelerating sparse computation. For example, say we want to perform a matrix-vector multiplication. In the simplest case, in a single CPU core, each element in the vector would be multiplied sequentially and then written to memory. This is slow, because we can do only one multiplication at a time. So instead people use CPUs with vector support or GPUs. With this hardware, all elements would be multiplied in parallel, greatly speeding up the application. Now, imagine that both the matrix and vector contain extremely sparse data. The vectorized CPU and GPU would spend most of their efforts multiplying by zero, performing completely ineffectual computations. Newer generations of GPUs are capable of taking some advantage of sparsity in their hardware, but only a particular kind, called structured sparsity. Structured sparsity assumes that two out of every four adjacent parameters are zero. However, some models benefit more from unstructured sparsity—the ability for any parameter (weight or activation) to be zero and compressed away, regardless of where it is and what it is adjacent to. GPUs can run unstructured sparse computation in software, for example, through the use of the cuSparse GPU library. However, the support for sparse computations is often limited, and the GPU hardware gets underutilized, wasting energy-intensive computations on overhead. Petra Péterffy When doing sparse computations in software, modern CPUs may be a better alternative to GPU computation, because they are designed to be more flexible. Yet, sparse computations on the CPU are often bottlenecked by the indirect lookups used to find nonzero data. CPUs are designed to “prefetch” data based on what they expect they’ll need from memory, but for randomly sparse data, that process often fails to pull in the right stuff from memory. When that happens, the CPU must waste cycles calling for the right data. Apple was the first to speed up these indirect lookups by supporting a method called an array-of-pointers access pattern in the prefetcher of their A14 and M1 chips. Although innovations in prefetching make Apple CPUs more competitive for sparse computation, CPU architectures still have fundamental overheads that a dedicated sparse computing architecture would not, because they need to handle general-purpose computation. Other companies have been developing hardware that accelerates sparse machine learning as well. These include Cerebras’s Wafer Scale Engine and Meta’s Training and Inference Accelerator (MTIA). The Wafer Scale Engine, and its corresponding sparse programming framework, have shown incredibly sparse results of up to 70 percent sparsity on LLMs. However, the company’s hardware and software solutions support only weight sparsity, not activation sparsity, which is important for many applications. The second version of the MTIA claims a sevenfold sparse compute performance boost over the MTIA v1. However, the only publicly available information regarding sparsity support in the MTIA v2 is for matrix multiplication, not for vectors or tensors. Although matrix multiplications take up the majority of computation time in most modern ML models, it’s important to have sparsity support for other parts of the process. To avoid switching back and forth between sparse and dense data types, all of the operations should be sparse. Onyx Instead of these halfway solutions, our team at Stanford has developed a hardware accelerator, Onyx, that can take advantage of sparsity from the ground up, whether it’s structured or unstructured. Onyx is the first programmable accelerator to support both sparse and dense computation; it’s capable of accelerating key operations in both domains. To understand Onyx, it is useful to know what a coarse-grained reconfigurable array (CGRA) is and how it compares with more familiar hardware, like CPUs and field-programmable gate arrays (FPGAs). CPUs, CGRAs, and FPGAs represent a trade-off between efficiency and flexibility. Each individual logic unit of a CPU is designed for a specific function that it performs efficiently. On the other hand, since each individual bit of an FPGA is configurable, these arrays are extremely flexible, but very inefficient. The goal of CGRAs is to achieve the flexibility of FPGAs with the efficiency of CPUs. CGRAs are composed of efficient and configurable units, typically memory and compute, that are specialized for a particular application domain. This is the key benefit of this type of array: Programmers can reconfigure the internals of a CGRA at a high level, making it more efficient than an FPGA but more flexible than a CPU. The Onyx chip, built on a coarse-grained reconfigurable array (CGRA), is the first (to our knowledge) to support both sparse and dense computations. Olivia Hsu Onyx is composed of flexible, programmable processing element (PE) tiles and memory (MEM) tiles. The memory tiles store compressed matrices and other data formats. The processing element tiles operate on compressed matrices, eliminating all unnecessary and ineffectual computation. The Onyx compiler handles conversion from software instructions to CGRA configuration. First, the input expression—for instance, a sparse vector multiplication—is translated into a graph of abstract memory and compute nodes. In this example, there are memories for the input vectors and output vectors, a compute node for finding the intersection between nonzero elements, and a compute node for the multiplication. The compiler figures out how to map the abstract memory and compute nodes onto MEMs and PEs on the CGRA, and then how to route them together so that they can transfer data between them. Finally, the compiler produces the instruction set needed to configure the CGRA for the desired purpose. Since Onyx is programmable, engineers can map many different operations, such as vector-vector element multiplication, or the key tasks in AI, like matrix-vector or matrix-matrix multiplication, onto the accelerator. We evaluated the efficiency gains of our hardware by looking at the product of energy used and the time it took to compute, called the energy-delay product (EDP). This metric captures the trade-off of speed and energy. Minimizing just energy would lead to very slow devices, and minimizing speed would lead to high-area, high-power devices. Onyx achieves up to 565 times as much energy-delay product over CPUs (we used a 12-core Intel Xeon CPU) that utilize dedicated sparse libraries. Onyx can also be configured to accelerate regular, dense applications, similar to the way a GPU or TPU would. If the computation is sparse, Onyx is configured to use sparse primitives, and if the computation is dense, Onyx is reconfigured to take advantage of parallelism, similar to how GPUs function. This architecture is a step toward a single system that can accelerate both sparse and dense computations on the same silicon. Just as important, Onyx enables new algorithmic thinking. Sparse acceleration hardware will not only make AI more performance- and energy efficient but also enable researchers and engineers to explore new algorithms that have the potential to dramatically improve AI. The future with sparsity Our team is already working on next-generation chips built off of Onyx. Beyond matrix multiplication operations, machine learning models perform other types of math, like nonlinear layers, normalization, the softmax function, and more. We are adding support for the full range of computations on our next-gen accelerator and within the compiler. Since sparse machine learning models may have both sparse and dense layers, we are also working on integrating the dense and sparse accelerator architecture more efficiently on the chip, allowing for fast transformation between the different data types. We’re also looking at ways to manage memory constraints by breaking up the sparse data more effectively so we can run computations on several sparse accelerator chips. We are also working on systems that can predict the performance of accelerators such as ours, which will help in designing better hardware for sparse AI. Longer term, we’re interested in seeing whether high degrees of sparsity throughout AI computation will catch on with more model types, and whether sparse accelerators become adopted at a larger scale. Building the hardware to unstructured sparsity and optimally take advantage of zeros is just the beginning. With this hardware in hand, AI researchers and engineers will have the opportunity to explore new models and algorithms that leverage sparsity in novel and creative ways. We see this as a crucial research area for managing the ever-increasing runtime, costs, and environmental impact of AI.