News | 🇺🇸 United States | IT/Tech | "ADDING"

Business Insider

Centrist

IT/Tech

The hot woman in that Facebook Marketplace listing might be AI.

Sellers are adding AI-generated babes to listings for cars, boats, and more to draw attention. It's a new spin on the old adage of "sex sells."

The Verge

Centrist

IT/Tech

The Sonos Era 100 speaker is down to its lowest price in months

Whether you’re considering starting a Sonos speaker setup, or adding to an existing group, the Sonos Era 100 is worth picking up. The compact, capable smart speaker is currently marked down to $189 ($30 off) at a variety of retailers, including Amazon, Best Buy, and directly from Sonos. If you want an even lower price, […]

Fortune

Centrist

IT/Tech

AI may already be adding hundreds of billions to the economy—without showing up in the data

A new policy brief argues AI may already be adding hundreds of billions to the global economy—but official statistics aren’t built to see it.

IEEE Spectrum

Centrist

IT/Tech

IEEE President’s Note: Designing a Safer Digital World for Kids

Children born after 2013 are the first generation to grow up fully immersed in digital systems, which weren’t designed with them in mind. One‑third of the world’s Internet users are younger than 18, according to UNICEF, yet these systems shaping their daily lives were built for adults. They were optimized for engagement and designed long before people understood how profoundly digital environments influence children. For engineers and technical professionals, online safety is not an abstract policy debate. It is a design challenge that demands rigor, systems thinking, and ethical foresight. Governments around the world are also beginning to recognize the problem. Policymakers from across Australia, Brazil, the European Union, Indonesia, and the United States are responding to risks engineers have long understood: Addictive features, inappropriate content, opaque data practices, and algorithmic systems shape user behavior in ways that their creators did not fully predict. For years, technology moved faster than governance. Now governance is trying to catch up.

[Sidebar] Supporting National Digital Ambitions

In Athens this year I met with senior leaders of Greek government agencies and key national research institutions. Greece is moving quickly on digital transformation and responsible technology governance, and our discussions reinforced IEEE’s role as a trusted, neutral collaborator. We focused on supporting Greece’s ambitions in digital modernization and public‑sector innovation. We also discussed responsible AI and age-appropriate digital design in Europe and elsewhere. These engagements, grounded in shared values and long‑term commitment, strengthened IEEE’s presence within the European ecosystem and opened new pathways for collaboration on trustworthy AI and child‑focused digital well‑being.

Global Shift Toward Design Reform
The European Union and the United Kingdom have been among the first to act, embedding age‑appropriate digital design into their broader children’s rights agenda. Drawing on IEEE expertise and global best practices, Indonesia is the first country in Asia, and Brazil is the first country in Latin America, to adopt age-appropriate design regulation. Australia is aiming to limit access to harmful content and addictive design features through age restrictions on certain platforms. And in the United States, in addition to federal efforts, states including California, New York, and Utah are enacting approaches including age-appropriate design principles. Across these efforts, a shared realization is emerging. Protecting children online is not simply about filtering content or adding parental controls. It requires rethinking the architecture of digital systems: how data is collected, how algorithms make decisions, how interfaces influence attention, and how AI interacts with the developing minds of young users. Engineers and technical professionals understand that design choices are never neutral. They encode values, incentives, and assumptions. When the user is a child, those choices carry greater weight. This is where IEEE’s work becomes essential.

Protecting Children Online

For more than a decade, IEEE has been building technical and ethical foundations for safer digital experiences. The first IEEE standard on age-appropriate design in 2021 marked a turning point. It offers a structured, principled approach to designing with children’s rights in mind. The Institute’s 2022 article “Use a New IEEE Standard to Design a Safer Digital World for Kids” highlights how the standard helps translate those principles into engineering practice. Today the IEEE Standards Association’s (SA) Trustworthy Digital Experiences portfolio provides a practical, technically grounded framework for governments and industry.
Spanning ethical design, data governance, algorithmic transparency, and child‑focused digital well‑being, it has already initiated discussions with government stakeholders around the world. This work helps bridge the gap between engineering realities and policy ambitions. No single country can solve these challenges alone. Many policymakers lack access to the combined expertise in technology, governance, and children’s rights needed to act quickly and effectively. This collaborative effort helps close that gap. The stakes are high. Without coordinated action, public policy will continue to lag behind technology, leaving children exposed to risks that could have been mitigated through thoughtful design. But with the right frameworks, governments can ensure digital systems respect children’s rights, support healthy development, and promote well‑being. IEEE’s emerging standards and collaborative technology policy work offer a path forward. By grounding national efforts in evidence‑based, rights-aligned design principles, IEEE is helping governments move from reactive regulation to proactive, coherent, and globally informed strategies for protecting children online. Safeguarding childhood in the digital age is both a moral imperative and an engineering challenge. And IEEE is helping to lead the way.

—Mary Ellen Randall, IEEE president and CEO

Please share your thoughts with me: president@ieee.org.

This article appears in the June 2026 print issue.

Hacker News Front Page

Centrist

IT/Tech

Adding Linux support back for the BASIC (free) version of Vivado

Comments

Hacker News Front Page

Centrist

IT/Tech

Show HN: TV Explorer. Adding advanced UI to free online TV

Comments

IEEE Spectrum

Centrist

IT/Tech

Make a Soft Digital Clock Tick With Millifluidics

Electrons are great. We use them to move vehicles, illuminate cities, and, of course, compute. But computation is not confined to the world of electronics. And shifting to alternative nonelectronic realms can unlock unique advantages: Photonic chips, for instance, process information with light while generating little heat. Another compelling alternative is fluidics, which uses pressurized gases or liquids to build logic circuits. Pioneered in the 1960s but sidelined by microchips, the field reemerged in the 1990s as “microfluidics.” This approach aims to shrink laboratories onto a single chip by creating microscopic fluid channels with integrated micropneumatic control systems. Today, there is a second fluidic revival, this time in the domain of soft robotics. Scaling microfluidic designs up to the millimeter-scale range (millifluidics) enables the higher flow rates necessary to drive robotic actuators. These robots exploit the nonlinear behaviors of soft materials to create lifelike motion and safer interactions, often utilizing pressurized air. By building systems that “think” with the same air that powers them, we can drastically reduce the need for bulky electronic-to-pneumatic interfaces. This is the focus of my Soiboi Studio robotics lab. With millifluidic logic, I have steadily scaled the complexity of my designs. What began with a simple oscillator has most recently evolved into a clock featuring a soft, four-digit, seven-segment display.

What Is Millifluidics?

Building on microfluidics research from the early 2000s and recent developments from the Grover Lab at the University of California, Riverside, I’ve developed millifluidic devices using standard 3D printing and silicone casting. The basic architecture is simple: A flexible membrane is sandwiched between rigid layers embedded with networks of air channels.
Just as electronics rely on differing voltage potentials, these fluidic circuits operate on the pressure difference between atmospheric pressure (logical 0) and a near-vacuum at around −60 kilopascals of relative pressure (logical 1). Using negative pressure means the membrane is pulled into openings. This creates robust seals that allow me to replicate electronic building blocks.

[Figure: A cast silicone membrane forms the face of the clock (top), while behind it sit 3D-printed millifluidic blocks (middle rows). An Arduino Uno controls driver boards that operate solenoids, which are connected to valves that are attached to a vacuum pump (bottom row). Credit: James Provost]

While fluidic resistors are easily realized by adjusting the channel geometry, the heart of the system is a valve that mimics a metal-oxide-semiconductor field-effect transistor, or MOSFET. This vacuum “transistor” features a flow layer with two chambers (the source and drain) divided by a central valve seat and a control layer containing a cavity (the gate). A membrane runs between the control and flow layers and normally prevents airflow between the source and drain chambers. To switch the transistor on, a vacuum is applied to the gate chamber, sucking the membrane into the cavity and lifting it off the seat. This opens a path for airflow, equivalent to closing an electric circuit. By adding a small aperture to the membrane, I created a check valve—the fluidic equivalent of a diode. By combining transistors and resistive “pull-down” channels, I can build a full suite of logic gates. The original microfluidic designs that inspired me were fabricated from etched glass and milled acrylic. Adapting them for a standard 3D printer required reengineering the logic elements and mastering two critical fabrication techniques. First, I needed airtight prints, yet printed plastic is notoriously porous. By printing at elevated temperatures, at slow speeds, and with slight overextrusion, I was able to fill microscopic gaps.
When you’re using transparent filament, there’s a handy visual indicator: The more transparent the plastic appears, the lower its porosity. Second, I used glass for my print bed. By printing the upper and lower chambers directly against this bed, I got the interface surface to become mirror smooth. This finish is essential for creating reliable, airtight seals. A 0.3-millimeter silicone membrane is placed between the layers and secured with screws.

How Does the Soft Clock Work?

The clockface is a cast silicone membrane. Each digit segment is formed by a small underlying cavity. When air is evacuated from this cavity, the membrane is sucked inward to create a concave hollow; when atmospheric pressure is restored, the silicone pops back flush with the surface. The result is a mesmerizing, organic motion. The “brain” of the clock is an Arduino Uno, while the fluidics significantly reduce the hardware footprint. A four-digit, seven-segment display with two separator dots would require 29 solenoid valves to control directly. My clock needs just 11 valves.

[Figure: A pneumatic transistor is off when its upper control chamber is at atmospheric pressure (top). When air is removed from the control chamber, it lifts a membrane, which allows air to flow between lower flow chambers and turns the transistor on (bottom). Credit: James Provost]

To understand how it works, consider a standard electronic four-digit, seven-segment LED display. This also uses 11 pins to drive its digits. (In clockface displays, an additional pin is required to drive the separator dots.) Every digit is connected to a shared data bus with seven lines, one per segment. The four control lines select individual digits. Only one digit is illuminated at a time, and strobing the digits at least 50 times per second creates the illusion that all four are simultaneously illuminated. Such high-speed switching is not possible with air. Instead, I rely on memory.
Each segment acts like a capacitor: By evacuating its cavity (logic 1), you “charge” the segment; by restoring atmospheric pressure (logic 0), you discharge it. Hence, each digit acts as an independent 7-bit memory. If the system is sufficiently airtight, the segments maintain their state for several seconds. Like the electronic display, the system utilizes a seven-line data bus. Each line connects to a solenoid valve that provides either vacuum or atmospheric pressure. To selectively address the individual digits, I placed a fluidic transistor between each segment and its data line. All the transistors’ control inputs for a given digit are combined into one “write enable” line connected to its own solenoid valve. Activating this valve allows me to write data into the corresponding digit’s memory. The clock updates one digit per second, meaning a full cycle across the face takes 4 seconds. This cycle also drives the separator dots: A set of fluidic diodes connects the enable lines to the dots’ cavities. Consequently, as each digit is addressed, the dots pulse automatically. This display is more than a clock; it is a soft robot that happens to tell time. By offloading computation to the same air that powers movement, the clock approaches a new class of machines that are simpler, lighter, and more integrated. I’m now developing a guide for getting started with vacuum-powered logic and may release a refined version of this clock in the future. Watching the silicone skin morph serves as a fascinating reminder that not all logic needs silicon; sometimes, all you need is flexible silicone and a flow of air. This article appears in the June 2026 print issue as “The Soft Clock.”
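The addressing scheme described above (a shared seven-line data bus, one write-enable line per digit, and segments that hold their state) can be sketched in a few lines of Python. This is only an illustrative model, not the author's Arduino firmware: the `SoftDisplay` class, the function names, and the segment ordering a through g are assumptions made for the example, and the "+1" valve for the separator dots in the direct-drive count is one reading of the article's figure of 29.

```python
# Illustrative model of the multiplexed pneumatic display; names are hypothetical.
NUM_DIGITS = 4
NUM_SEGMENTS = 7

# Standard seven-segment patterns, ordered a..g (1 = vacuum applied, segment pulled in).
SEGMENTS = {
    0: (1, 1, 1, 1, 1, 1, 0), 1: (0, 1, 1, 0, 0, 0, 0),
    2: (1, 1, 0, 1, 1, 0, 1), 3: (1, 1, 1, 1, 0, 0, 1),
    4: (0, 1, 1, 0, 0, 1, 1), 5: (1, 0, 1, 1, 0, 1, 1),
    6: (1, 0, 1, 1, 1, 1, 1), 7: (1, 1, 1, 0, 0, 0, 0),
    8: (1, 1, 1, 1, 1, 1, 1), 9: (1, 1, 1, 1, 0, 1, 1),
}


def valves_direct():
    """One valve per segment, plus (assumed) one shared valve for the two dots."""
    return NUM_DIGITS * NUM_SEGMENTS + 1


def valves_multiplexed():
    """Seven data-bus valves plus one write-enable valve per digit."""
    return NUM_SEGMENTS + NUM_DIGITS


class SoftDisplay:
    """Each digit behaves like a 7-bit memory that holds its last written state."""

    def __init__(self):
        self.memory = [[0] * NUM_SEGMENTS for _ in range(NUM_DIGITS)]

    def write(self, data_bus, enable):
        # A digit's fluidic transistors pass the bus state only while its
        # write-enable line is active; every other digit keeps its old state.
        for digit, enabled in enumerate(enable):
            if enabled:
                self.memory[digit] = list(data_bus)


def show_time(display, hhmm):
    """Write the four digits one at a time, as the clock does once per second."""
    for position, char in enumerate(hhmm):
        enable = [int(i == position) for i in range(NUM_DIGITS)]
        display.write(SEGMENTS[int(char)], enable)


display = SoftDisplay()
show_time(display, "1234")
print(valves_direct(), valves_multiplexed())
print(display.memory)
```

The model ignores leakage, so a stored digit never fades; in the real clock the segments hold their state for only several seconds, which is why the face is refreshed continuously.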

Fortune

Centrist

IT/Tech

Adding AI ‘employees’ is backfiring by creating new office scapegoats and making human workers sloppier and lazier

Research from Boston Consulting Group found that human staff become less accountable, blaming their new bot colleagues for their mistakes.

TechCrunch

Centrist

IT/Tech

Payroll startup Remote says it grew revenue 50% per employee without adding headcount

Payroll service provider Remote recently surpassed $300 million in annual recurring revenue (ARR) and became cash-flow positive, thanks to a 50% increase in revenue per employee resulting from AI adoption.

IEEE Spectrum

Centrist

IT/Tech

Meet NASA Low Outgassing Standards With Adhesives for Aerospace and Optical Systems

This sponsored article is brought to you by Master Bond.

Outgassing is the release of volatile substances from a cured adhesive over time. These released materials, which may include residual solvents, unreacted monomers, or other chemical species, can deposit on nearby surfaces, causing contamination that interferes with sensitive components.

What Is Outgassing and How Is It Measured?

The industry standard for measuring outgassing is ASTM E595, developed by NASA. This test exposes a cured sample to 125 °C at high vacuum (10⁻⁵ to 10⁻⁶ torr) for 24 hours, measuring Total Mass Loss (TML) and Collected Volatile Condensable Materials (CVCM). To meet NASA low outgassing requirements, materials must exhibit less than 1 percent TML and less than 0.1 percent CVCM.

Key Applications

Low outgassing adhesives are essential wherever contamination could compromise performance, which is particularly relevant for space and satellite systems. Optical assemblies, including cameras, telescopes, and laser systems, need contamination-free bonding that prevents fogging of the optics to maintain clarity. High-vacuum scientific equipment, semiconductor manufacturing tools, and aerospace electronics also demand low outgassing materials. Even terrestrial optical devices benefit from reduced outgassing to ensure long-term reliability.

[Figure: EP30-2 is a versatile system that can be used in a variety of applications in aerospace, electronic, optical and specialty OEM industries, especially when optical clarity and low outgassing are important criteria. Credit: Master Bond]

Ensuring Low Outgassing Performance Through Proper Handling

Achieving specified outgassing performance requires attention to storage, mixing, and curing.
For two-part systems, use the correct mix ratio and mix thoroughly to ensure complete reaction. Follow recommended cure schedules — adding heat, even at modest temperatures of 150-200 °F, significantly improves cross-linking and reduces outgassing. For UV-curable adhesives, ensure complete cure by using the correct lamp wavelength (typically 365 nm), adequate intensity, and proper exposure time with no shadowed areas.

Troubleshooting Outgassing Issues

If contamination appears on optical surfaces or outgassing test results are higher than expected, an incomplete cure might be one of the root causes. The first step is to verify that the adhesive has fully hardened to its specified Shore hardness. The next step is to consider adding or extending heat cure to improve cross-linking.

Master Bond Product Recommendations

Master Bond offers a range of adhesives meeting NASA low outgassing requirements. EP30-2 and EP21TCHT-1 are some examples of two-part epoxy systems that have been successfully deployed in demanding vacuum applications, including ultra-high vacuum environments. For applications requiring UV cure, Master Bond provides specialty UV formulations such as UV16 meeting ASTM E595, as well as dual-cure systems (UV plus heat) such as UV22DC80-10F for assemblies where shadows prevent complete UV exposure. These dual-cure products initiate with UV light and complete curing with heat as low as 180 °F (80 °C).
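The pass criteria quoted in this article (TML below 1 percent, CVCM below 0.1 percent) reduce to two ratios against the sample's initial mass. The short Python sketch below shows that arithmetic. It is a simplified illustration, not the full ASTM E595 procedure: the function names are hypothetical, and the sample masses in the usage lines are invented for the example rather than taken from any Master Bond data sheet.

```python
def percent_of_initial(mass_g, initial_mass_g):
    """Express a mass as a percentage of the sample's initial mass."""
    if initial_mass_g <= 0:
        raise ValueError("initial mass must be positive")
    return 100.0 * mass_g / initial_mass_g


def check_low_outgassing(initial_mass_g, final_mass_g, condensate_mass_g):
    """Return (TML %, CVCM %, passes) using the thresholds quoted in the article.

    TML  = mass lost by the sample during the test, as a percentage.
    CVCM = mass of condensed volatiles on the collector, as a percentage.
    """
    tml = percent_of_initial(initial_mass_g - final_mass_g, initial_mass_g)
    cvcm = percent_of_initial(condensate_mass_g, initial_mass_g)
    return tml, cvcm, (tml < 1.0 and cvcm < 0.1)


# Hypothetical sample masses in grams, for illustration only.
print(check_low_outgassing(0.2500, 0.2490, 0.0001))  # low mass loss
print(check_low_outgassing(0.2500, 0.2460, 0.0001))  # mass loss above 1 percent
```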

TechCrunch

Centrist

IT/Tech

OpenAI is making it easier to check if an image was made by their models

OpenAI announced two new measures to help detect AI-generated imagery: joining the open C2PA standard and adding Google's SynthID to its products.

IEEE Spectrum

Centrist

IT/Tech

Better Hardware Could Turn Zeros into AI Heroes

When it comes to AI models, size matters. Even though some artificial-intelligence experts warn that scaling up large language models (LLMs) is hitting diminishing performance returns, companies are still coming out with ever larger AI tools. Meta’s latest Llama release had a staggering 2 trillion parameters that define the model. As models grow in size, their capabilities increase. But so do the energy demands and the time it takes to run the models, which increases their carbon footprint. To mitigate these issues, people have turned to smaller, less capable models and to using lower-precision numbers whenever possible for the model parameters. But there is another path that may retain a staggeringly large model’s high performance while reducing both the time it takes to run and its energy footprint. This approach involves befriending the zeros inside large AI models. For many models, most of the parameters—the weights and activations—are actually zero, or so close to zero that they could be treated as such without losing accuracy. This quality is known as sparsity. Sparsity offers a significant opportunity for computational savings: Instead of wasting time and energy adding or multiplying zeros, these calculations could simply be skipped; rather than storing lots of zeros in memory, one need only store the nonzero parameters. Unfortunately, today’s popular hardware, like multicore CPUs and GPUs, does not naturally take full advantage of sparsity. To fully leverage sparsity, researchers and engineers need to rethink and re-architect each piece of the design stack, including the hardware, low-level firmware, and application software. In our research group at Stanford University, we have developed the first (to our knowledge) piece of hardware that’s capable of calculating all kinds of sparse and traditional workloads efficiently.
The energy savings varied widely over the workloads, but on average our chip consumed one-seventieth the energy of a CPU, and performed the computation on average eight times as fast. To do this, we had to engineer the hardware, low-level firmware, and software from the ground up to take advantage of sparsity. We hope this is just the beginning of hardware and model development that will allow for more energy-efficient AI.

What is sparsity?

Neural networks, and the data that feeds into them, are represented as arrays of numbers. These arrays can be one-dimensional (vectors), two-dimensional (matrices), or more (tensors). A sparse vector, matrix, or tensor has mostly zero elements. The level of sparsity varies, but when zeros make up more than 50 percent of any type of array, it can stand to benefit from sparsity-specific computational methods. In contrast, an object that is not sparse—that is, it has few zeros compared with the total number of elements—is called dense. Sparsity can be naturally present, or it can be induced. For example, a social-network graph will be naturally sparse. Imagine a graph where each node (point) represents a person, and each edge (a line segment connecting the points) represents a friendship. Since most people are not friends with one another, a matrix representing all possible edges will be mostly zeros. Other popular applications of AI, such as other forms of graph learning and recommendation models, contain naturally occurring sparsity as well. Beyond naturally occurring sparsity, sparsity can also be induced within an AI model in several ways. Two years ago, a team at Cerebras showed that one can set 70 to 80 percent of parameters in an LLM to zero without losing any accuracy. Cerebras demonstrated these results specifically on Meta’s open-source Llama 7B model, but the ideas extend to other LLMs, such as those behind ChatGPT and Claude.
The case for sparsity

Sparse computation’s efficiency stems from two fundamental properties: the ability to compress away zeros and the convenient mathematical properties of zeros. Both the algorithms used in sparse computation and the hardware dedicated to them leverage these two basic ideas. First, sparse data can be compressed, making it more memory efficient to store “sparsely”—that is, in something called a sparse data type. Compression also makes it more energy efficient to move data when dealing with large amounts of it. This is best understood by an example. Take a four-by-four matrix with three nonzero elements. Traditionally, this matrix would be stored in memory as is, taking up 16 spaces. This matrix can also be compressed into a sparse data type, getting rid of the zeros and saving only the nonzero elements. In our example, this results in 13 memory spaces as opposed to 16 for the dense, uncompressed version. These savings in memory increase with increased sparsity and matrix size. In addition to the actual data values, compressed data also requires metadata. The row and column locations of the nonzero elements also must be stored. This is usually thought of as a “fibertree”: The row labels containing nonzero elements are listed and linked to the column labels of the nonzero elements, which are then linked to the values stored in those elements. In memory, things get a bit more complicated still: The row and column labels for each nonzero value must be stored as well as the “segments” that indicate how many such labels to expect, so the metadata and data can be clearly delineated from one another. In a dense, noncompressed matrix data type, values can be accessed either one at a time or in parallel, and their locations can be calculated directly with a simple equation.
However, accessing values in sparse, compressed data requires looking up the coordinates of the row index and using that information to “indirectly” look up the coordinates of the column index before finally reaching the value. Depending on the actual locations of the sparse data values, these indirect lookups can be extremely random, making the computation data-dependent and requiring the allocation of memory lookups on the fly. Second, two mathematical properties of zero let software and hardware skip a lot of computation. Multiplying any number by zero will result in a zero, so there’s no need to actually do the multiplication. Adding zero to any number will always return that number, so there’s no need to do the addition either. In matrix-vector multiplication, one of the most common operations in AI workloads, all computations except those involving two nonzero elements can simply be skipped. Take, for example, the four-by-four matrix from the previous example and a vector of four numbers. In dense computation, each element of the vector must be multiplied by the corresponding element in each row and then added together to compute the final vector. In this case, that would take 16 multiplication operations and 16 additions (or four accumulations). In sparse computation, only the nonzero elements of the vector need be considered. For each nonzero vector element, indirect lookup can be used to find any corresponding nonzero matrix element, and only those need to be multiplied and added. In the example shown here, only two multiplication steps will be performed, instead of 16.

The trouble with GPUs and CPUs

Unfortunately, modern hardware is not well suited to accelerating sparse computation. For example, say we want to perform a matrix-vector multiplication. In the simplest case, in a single CPU core, each element in the vector would be multiplied sequentially and then written to memory. This is slow, because we can do only one multiplication at a time.
So instead people use CPUs with vector support or GPUs. With this hardware, all elements would be multiplied in parallel, greatly speeding up the application. Now, imagine that both the matrix and vector contain extremely sparse data. The vectorized CPU and GPU would spend most of their efforts multiplying by zero, performing completely ineffectual computations. Newer generations of GPUs are capable of taking some advantage of sparsity in their hardware, but only a particular kind, called structured sparsity. Structured sparsity assumes that two out of every four adjacent parameters are zero. However, some models benefit more from unstructured sparsity—the ability for any parameter (weight or activation) to be zero and compressed away, regardless of where it is and what it is adjacent to. GPUs can run unstructured sparse computation in software, for example, through the use of the cuSPARSE GPU library. However, the support for sparse computations is often limited, and the GPU hardware gets underutilized, wasting energy-intensive computations on overhead.

[Illustration credit: Petra Péterffy]

When doing sparse computations in software, modern CPUs may be a better alternative to GPU computation, because they are designed to be more flexible. Yet, sparse computations on the CPU are often bottlenecked by the indirect lookups used to find nonzero data. CPUs are designed to “prefetch” data based on what they expect they’ll need from memory, but for randomly sparse data, that process often fails to pull in the right stuff from memory. When that happens, the CPU must waste cycles calling for the right data. Apple was the first to speed up these indirect lookups by supporting a method called an array-of-pointers access pattern in the prefetcher of their A14 and M1 chips.
Although innovations in prefetching make Apple CPUs more competitive for sparse computation, CPU architectures still have fundamental overheads that a dedicated sparse computing architecture would not, because they need to handle general-purpose computation. Other companies have been developing hardware that accelerates sparse machine learning as well. These include Cerebras’s Wafer Scale Engine and Meta’s Training and Inference Accelerator (MTIA). The Wafer Scale Engine, and its corresponding sparse programming framework, have demonstrated sparsity of up to 70 percent on LLMs. However, the company’s hardware and software solutions support only weight sparsity, not activation sparsity, which is important for many applications. The second version of the MTIA claims a sevenfold sparse compute performance boost over the MTIA v1. However, the only publicly available information regarding sparsity support in the MTIA v2 is for matrix multiplication, not for vectors or tensors. Although matrix multiplications take up the majority of computation time in most modern ML models, it’s important to have sparsity support for other parts of the process. To avoid switching back and forth between sparse and dense data types, all of the operations should be sparse.

Onyx

Instead of these halfway solutions, our team at Stanford has developed a hardware accelerator, Onyx, that can take advantage of sparsity from the ground up, whether it’s structured or unstructured. Onyx is the first programmable accelerator to support both sparse and dense computation; it’s capable of accelerating key operations in both domains. To understand Onyx, it is useful to know what a coarse-grained reconfigurable array (CGRA) is and how it compares with more familiar hardware, like CPUs and field-programmable gate arrays (FPGAs). CPUs, CGRAs, and FPGAs represent a trade-off between efficiency and flexibility.
Each individual logic unit of a CPU is designed for a specific function that it performs efficiently. On the other hand, since each individual bit of an FPGA is configurable, these arrays are extremely flexible, but very inefficient. The goal of CGRAs is to achieve the flexibility of FPGAs with the efficiency of CPUs. CGRAs are composed of efficient and configurable units, typically memory and compute, that are specialized for a particular application domain. This is the key benefit of this type of array: Programmers can reconfigure the internals of a CGRA at a high level, making it more efficient than an FPGA but more flexible than a CPU.

[Figure: The Onyx chip, built on a coarse-grained reconfigurable array (CGRA), is the first (to our knowledge) to support both sparse and dense computations. Credit: Olivia Hsu]

Onyx is composed of flexible, programmable processing element (PE) tiles and memory (MEM) tiles. The memory tiles store compressed matrices and other data formats. The processing element tiles operate on compressed matrices, eliminating all unnecessary and ineffectual computation. The Onyx compiler handles conversion from software instructions to CGRA configuration. First, the input expression—for instance, a sparse vector multiplication—is translated into a graph of abstract memory and compute nodes. In this example, there are memories for the input vectors and output vectors, a compute node for finding the intersection between nonzero elements, and a compute node for the multiplication. The compiler figures out how to map the abstract memory and compute nodes onto MEMs and PEs on the CGRA, and then how to route them together so that they can transfer data between them. Finally, the compiler produces the instruction set needed to configure the CGRA for the desired purpose.
Since Onyx is programmable, engineers can map many different operations, such as vector-vector element multiplication, or the key tasks in AI, like matrix-vector or matrix-matrix multiplication, onto the accelerator. We evaluated the efficiency gains of our hardware by looking at the product of energy used and the time it took to compute, called the energy-delay product (EDP). This metric captures the trade-off of speed and energy. Minimizing just energy would lead to very slow devices, and minimizing just delay would lead to high-area, high-power devices. Onyx achieves up to a 565-fold improvement in energy-delay product over CPUs (we used a 12-core Intel Xeon CPU) that utilize dedicated sparse libraries. Onyx can also be configured to accelerate regular, dense applications, similar to the way a GPU or TPU would. If the computation is sparse, Onyx is configured to use sparse primitives, and if the computation is dense, Onyx is reconfigured to take advantage of parallelism, similar to how GPUs function. This architecture is a step toward a single system that can accelerate both sparse and dense computations on the same silicon. Just as important, Onyx enables new algorithmic thinking. Sparse acceleration hardware will not only make AI more performant and energy efficient but also enable researchers and engineers to explore new algorithms that have the potential to dramatically improve AI.

The future with sparsity

Our team is already working on next-generation chips built off of Onyx. Beyond matrix multiplication operations, machine learning models perform other types of math, like nonlinear layers, normalization, the softmax function, and more. We are adding support for the full range of computations on our next-gen accelerator and within the compiler.
Since sparse machine learning models may have both sparse and dense layers, we are also working on integrating the dense and sparse accelerator architecture more efficiently on the chip, allowing for fast transformation between the different data types. We’re also looking at ways to manage memory constraints by breaking up the sparse data more effectively so we can run computations on several sparse accelerator chips.
We are also working on systems that can predict the performance of accelerators such as ours, which will help in designing better hardware for sparse AI. Longer term, we’re interested in seeing whether high degrees of sparsity throughout AI computation will catch on with more model types, and whether sparse accelerators become adopted at a larger scale.
Building the hardware to support unstructured sparsity and optimally take advantage of zeros is just the beginning. With this hardware in hand, AI researchers and engineers will have the opportunity to explore new models and algorithms that leverage sparsity in novel and creative ways. We see this as a crucial research area for managing the ever-increasing runtime, costs, and environmental impact of AI.

IEEE Spectrum

Centrist leaning

IT/Technology

How AI Will Change Chip Design

The end of Moore’s Law is looming. Engineers and designers can do only so much to miniaturize transistors and pack as many of them as possible into chips. So they’re turning to other approaches to chip design, incorporating technologies like AI into the process. Samsung, for instance, is adding AI to its memory chips to enable processing in memory, thereby saving energy and speeding up machine learning. Speaking of speed, Google’s TPU V4 AI chip has doubled its processing power compared with that of its previous version. But AI holds still more promise and potential for the semiconductor industry. To better understand how AI is set to revolutionize chip design, we spoke with Heather Gorr, senior product manager for MathWorks’ MATLAB platform.
How is AI currently being used to design the next generation of chips?
Heather Gorr: AI is such an important technology because it’s involved in most parts of the cycle, including the design and manufacturing process. There’s a lot of important applications here, even in the general process engineering where we want to optimize things. I think defect detection is a big one at all phases of the process, especially in manufacturing. But even thinking ahead in the design process, [AI now plays a significant role] when you’re designing the light and the sensors and all the different components. There’s a lot of anomaly detection and fault mitigation that you really want to consider.
Then, thinking about the logistical modeling that you see in any industry, there is always planned downtime that you want to mitigate; but you also end up having unplanned downtime. So, looking back at that historical data of when you’ve had those moments where maybe it took a bit longer than expected to manufacture something, you can take a look at all of that data and use AI to try to identify the proximate cause or to see something that might jump out even in the processing and design phases.
We think of AI oftentimes as a predictive tool, or as a robot doing something, but a lot of times you get a lot of insight from the data through AI.
What are the benefits of using AI for chip design?
Gorr: Historically, we’ve seen a lot of physics-based modeling, which is a very intensive process. We want to do a reduced order model, where instead of solving such a computationally expensive and extensive model, we can do something a little cheaper. You could create a surrogate model, so to speak, of that physics-based model, use the data, and then do your parameter sweeps, your optimizations, your Monte Carlo simulations using the surrogate model. That takes a lot less time computationally than solving the physics-based equations directly. So, we’re seeing that benefit in many ways, including the efficiency and economy that are the results of iterating quickly on the experiments and the simulations that will really help in the design.
So it’s like having a digital twin in a sense?
Gorr: Exactly. That’s pretty much what people are doing, where you have the physical system model and the experimental data. Then, in conjunction, you have this other model that you could tweak and tune and try different parameters and experiments that let you sweep through all of those different situations and come up with a better design in the end.
So, it’s going to be more efficient and, as you said, cheaper?
Gorr: Yeah, definitely. Especially in the experimentation and design phases, where you’re trying different things. That’s obviously going to yield dramatic cost savings if you’re actually manufacturing and producing [the chips]. You want to simulate, test, experiment as much as possible without making something using the actual process engineering.
We’ve talked about the benefits. How about the drawbacks?
Gorr: The [AI-based experimental models] tend to not be as accurate as physics-based models. Of course, that’s why you do many simulations and parameter sweeps.
But that’s also the benefit of having that digital twin, where you can keep that in mind: it’s not going to be as accurate as that precise model that we’ve developed over the years.
Both chip design and manufacturing are system intensive; you have to consider every little part. And that can be really challenging. It’s a case where you might have models to predict something and different parts of it, but you still need to bring it all together.
One of the other things to think about too is that you need the data to build the models. You have to incorporate data from all sorts of different sensors and different sorts of teams, and so that heightens the challenge.
How can engineers use AI to better prepare and extract insights from hardware or sensor data?
Gorr: We always think about using AI to predict something or do some robot task, but you can use AI to come up with patterns and pick out things you might not have noticed before on your own. People will use AI when they have high-frequency data coming from many different sensors, and a lot of times it’s useful to explore the frequency domain and things like data synchronization or resampling. Those can be really challenging if you’re not sure where to start.
One of the things I would say is, use the tools that are available. There’s a vast community of people working on these things, and you can find lots of examples [of applications and techniques] on GitHub or MATLAB Central, where people have shared nice examples, even little apps they’ve created. I think many of us are buried in data and just not sure what to do with it, so definitely take advantage of what’s already out there in the community. You can explore and see what makes sense to you, and bring in that balance of domain knowledge and the insight you get from the tools and AI.
What should engineers and designers consider when using AI for chip design?
Gorr: Think through what problems you’re trying to solve or what insights you might hope to find, and try to be clear about that. Consider all of the different components, and document and test each of those different parts. Consider all of the people involved, and explain and hand off in a way that is sensible for the whole team.
How do you think AI will affect chip designers’ jobs?
Gorr: It’s going to free up a lot of human capital for more advanced tasks. We can use AI to reduce waste, to optimize the materials, to optimize the design, but then you still have that human involved whenever it comes to decision-making. I think it’s a great example of people and technology working hand in hand. It’s also an industry where all people involved, even on the manufacturing floor, need to have some level of understanding of what’s happening, so this is a great industry for advancing AI because of how we test things and how we think about them before we put them on the chip.
How do you envision the future of AI and chip design?
Gorr: It’s very much dependent on that human element: involving people in the process and having that interpretable model. We can do many things with the mathematical minutiae of modeling, but it comes down to how people are using it, how everybody in the process is understanding and applying it. Communication and involvement of people of all skill levels in the process are going to be really important. We’re going to see less of those superprecise predictions and more transparency of information, sharing, and that digital twin, not only using AI but also using our human knowledge and all of the work that many people have done over the years.
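The surrogate-model workflow Gorr describes earlier in the interview (sample an expensive physics-based model a few times, fit something cheaper, then run parameter sweeps and Monte Carlo simulations on the cheap version) can be sketched in a few lines. The Python below is only an illustration under assumptions: `expensive_model` is a made-up stand-in for a real simulation, the piecewise-linear interpolant is one simple choice of surrogate among many, and all names and numbers are hypothetical rather than taken from MathWorks tools or from the interview.

```python
import bisect
import math
import random


def expensive_model(x):
    """Stand-in for a costly physics-based simulation (hypothetical formula)."""
    return math.sin(x) + 0.1 * x * x


def build_surrogate(model, lo, hi, n_samples):
    """Run the expensive model n_samples times and return a cheap interpolant."""
    xs = [lo + (hi - lo) * k / (n_samples - 1) for k in range(n_samples)]
    ys = [model(x) for x in xs]

    def surrogate(x):
        # Clamp to the sampled range, then interpolate between the two neighbors.
        x = min(max(x, xs[0]), xs[-1])
        k = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
        t = (x - xs[k]) / (xs[k + 1] - xs[k])
        return ys[k] + t * (ys[k + 1] - ys[k])

    return surrogate


def monte_carlo_mean(model, lo, hi, n_trials, seed=0):
    """Average model output over uniformly random inputs (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(model(rng.uniform(lo, hi)) for _ in range(n_trials)) / n_trials


# 31 expensive evaluations up front; everything afterward uses the surrogate.
surrogate = build_surrogate(expensive_model, 0.0, 3.0, 31)

# Parameter sweep on the surrogate to find the input with the lowest output.
sweep = [(x / 10, surrogate(x / 10)) for x in range(31)]
best_x, best_y = min(sweep, key=lambda point: point[1])

# Monte Carlo on the surrogate versus on the expensive model, same random inputs.
mc_surrogate = monte_carlo_mean(surrogate, 0.0, 3.0, 10_000)
mc_expensive = monte_carlo_mean(expensive_model, 0.0, 3.0, 10_000)
print(best_x, best_y, mc_surrogate, mc_expensive)
```

The two Monte Carlo estimates agree closely but not exactly, which matches the drawback raised in the interview: the surrogate trades some accuracy for the ability to run many more experiments.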

News

Timeline keywords

The hot woman in that Facebook Marketplace listing might be AI.

The Sonos Era 100 speaker is down to its lowest price in months

AI may already be adding hundreds of billions to the economy—without showing up in the data

IEEE President’s Note: Designing a Safer Digital World for Kids

Adding Linux support back for the BASIC (free) version of Vivado

Show HN: TV Explorer. Adding advanced UI to free online TV

Make a Soft Digital Clock Tick With Millifluidics

Adding AI ‘employees’ is backfiring by creating new office scapegoats and making human workers sloppier and lazier

Payroll startup Remote says it grew revenue 50% per employee without adding headcount

Meet NASA Low Outgassing Standards With Adhesives for Aerospace and Optical Systems

OpenAI is making it easier to check if an image was made by their models

Better Hardware Could Turn Zeros into AI Heroes

How AI Will Change Chip Design