Variant raises $222 million for new fund with a thesis of AI, crypto and ‘autonomy’
A VC who helped Andreessen Horowitz design its early crypto strategy has big ideas on the next era of decentralization.
Syeda Irtizaali, Netflix’s recently appointed U.K. director of unscripted, told an audience at SXSW London on Tuesday she doesn’t want to use AI in unscripted television. Describing Netflix as a “tech-forward company,” she said that she was “very relaxed about people using AI as a tool to bring ideas in,” describing AI as a “really […]
Slots & Daggers, a low-key, fantasy-themed slot machine roguelike, was one of my favorite games last year. That may sound like a complicated description, but the game mixes ideas from deckbuilding roguelikes with slot machines to create an engrossing loop, and there's steady meta-progression that helps you push further with just about every run. Perhaps […]
Dads are traditionally tough to shop for—let me help with these handpicked gift ideas for fathers with great taste.
AI-fueled delusions can happen when chatbots respond to grandiose, paranoid or imaginary ideas with affirmation or encouragement.
The IEEE Communications Society (ComSoc)’s Research Collaboration Pitch Session initiative is proving to be a catalyst for meaningful engagement between academic researchers and industry innovators. Launched last year, the program connects promising researchers with industry leaders who can offer them funding, mentorship, and connections to bring interesting ideas closer to real-world deployment. Rather than relying on chance encounters at conferences, the pitch sessions create a focused environment. Five academic presenters share their work with five industry representatives, known as “innovation scouts”: senior leaders chosen primarily from ComSoc’s Corporate Program partner companies, such as Ericsson, Intel, Keysight, and Nokia. The curated format ensures that each idea receives dedicated attention from professionals who are seeking new concepts aligned with their organization’s priorities. The initiative was launched in November at the IEEE Middle East Conference on Communications and Networking (MECOM) in Cairo and appeared in December at the IEEE Global Communications Conference (GLOBECOM) in Taipei, Taiwan.
AI-driven communication network
One of the most compelling outcomes came from the inaugural session in Cairo. Angela Waithaka, an IEEE student member studying biomedical engineering at Kenyatta University, in Nairobi, Kenya, presented her paper “AI-Driven Predictive Communication Networks for Enhanced Performance in Resource-Constrained Environments.” You can view her presentation, along with others, on IEEE.tv. Waithaka’s research tackles a critical challenge: Next-generation communication systems increasingly rely on artificial intelligence and machine learning, yet most existing architectures consume substantial computational and energy resources, which are not always available in developing regions.
Waithaka proposed lightweight, adaptive AI and machine learning models capable of delivering predictive, reliable communication performance even under tight resource constraints. Her vision resonated with Ruiqi “Richie” Liu, a master researcher at ZTE in China. ZTE is a global leader in integrated information and communication technology solutions. Liu says he recognized the relevance of Waithaka’s proposal to his company’s work with the International Telecommunication Union (ITU). He invited her to establish an ITU account so she could participate in the organization’s meetings on global telecommunications standardization projects, which would elevate her work to an international stage.
Simplifying data center protocols
The momentum continued at GLOBECOM. Among the presenters was Nirmala Shenoy, a professor at the Rochester Institute of Technology, in New York. Shenoy, an IEEE member, spoke on simplifying data center network protocols. She highlighted the growing complexity of these critical networks, which underpin cloud services, enterprise IT, and emerging AI workloads. Shenoy’s focus on reducing protocol complexity while maintaining scalability, resilience, and low latency caught the attention of an innovation scout from Nokia who heads its eXtended Reality Lab in Madrid. He identified the right person at Nokia for Shenoy to connect with about her research, and the introduction led her to record a video for the company detailing her approach and its potential applications.
A model for accelerating innovation
The early success stories demonstrate the power of intentional, structured engagement. By bringing researchers and industry leaders together in a format designed for discovery, ComSoc is helping accelerate innovation and expand opportunities for collaboration. The pitch sessions are not merely conference events; they are becoming a bridge between academic creativity and industry implementation.
This year, sessions will be held during the IEEE International Conference on Communications in Glasgow from 24 to 28 May, the IEEE International Mediterranean Conference on Communications and Networking in Sardinia from 6 to 9 July, and GLOBECOM in Macau from 7 to 11 December. As the program continues to grow, it could become a signature ComSoc initiative, one that strengthens the research ecosystem, supports emerging talent, and ensures that promising ideas find pathways to real-world impact.
This sponsored article is brought to you by Applied Materials. At pivotal moments in history, progress has required more than individual brilliance. The most consequential breakthroughs, such as those achieved under the Human Genome Project, required a new operating paradigm: Concentrate the world’s best talent around a single mission, establish a common platform, share critical infrastructure, and collapse feedback loops. When stakes are high and timelines are compressed, sequential and siloed innovation simply cannot keep pace. Today’s AI era is creating an engineering race with similar demands. Every company is pushing to deliver higher-performance AI systems, faster. But performance is no longer defined by compute alone. AI workloads are increasingly dominated by the movement of data: In many cases, moving bits consumes as much energy as the compute itself, or more. As a result, reducing energy per bit can extend system‑level performance alongside gains in peak compute. The path to energy‑efficient AI therefore runs through system‑level engineering, spanning three tightly interconnected domains:
Logic, where performance per watt depends on efficient transistor switching and on low‑loss power and signal delivery through dense wiring stacks.
Memory, where surging bandwidth and capacity demands expose the memory wall, with processor capability advancing faster than memory access.
Advanced packaging, where 3D integration, chiplet architectures, and high‑density interconnects bring compute and memory closer together, enabling system designs that monolithic scaling can no longer sustain.
These domains can no longer be optimized independently. Gains in logic efficiency stall without sufficient memory bandwidth. Advances in memory bandwidth fall short if packaging cannot deliver proximity within thermal and mechanical constraints. Packaging, in turn, is constrained by the precision of both front‑end device fabrication and back‑end integration processes.
In the angstrom era, the hardest problems arise at the boundaries: between compute and memory in the package, between front‑end and back‑end integration, and among the tightly coupled process steps needed for precise 3D fabrication. It is precisely this boundary‑driven complexity that breaks the traditional innovation model.
The Traditional R&D Workflow Is Too Slow for Angstrom‑Era AI
For decades, the semiconductor industry’s R&D model has resembled a relay race. Capabilities are developed in one part of the ecosystem, handed off downstream through integration and manufacturing, evaluated by chip and system designers, and only then fed back for the next iteration. That model worked when progress was dominated by relatively modular steps that could be scaled independently and simply dropped into the manufacturing flow. But the AI timeline has upended these rules. At angstrom‑scale dimensions, the physics enforces inescapable coupling across the entire stack: materials choices shape integration schemes; integration defines design rules; design rules dictate power delivery; wiring sets thermal budgets; and thermals ultimately constrain packaging scaling. System architects simply cannot wait 10–15 years for each major semiconductor technology inflection to mature. A long‑term perspective is essential to align materials innovation with emerging device architectures and to develop the tools and processes required to integrate both with manufacturable precision. At Applied Materials, together with our customers, we are charting a course across the next 3–4 generations, extending as far as 10 years down the roadmap. The angstrom era demands that we break down silos and bring together the industry’s best minds, from leading companies to leading academic institutions. If the problem is coupled, the solution must be coupled.
If the timeline is compressed, the learning loop must be compressed. It’s not enough to just innovate; we must innovate how we innovate.
EPIC: A Center and Platform for High‑Velocity Co‑Innovation
This is the challenge that the Applied Materials EPIC Center is designed to solve. Representing a roughly US $5 billion investment, EPIC is the largest commitment to advanced semiconductor equipment R&D in U.S. history. When it opens in 2026, it will deliver state‑of‑the‑art cleanroom capabilities built from the ground up to shorten the path from early‑stage research to full‑scale manufacturing. But the facilities are only one component of the model. EPIC is also a platform, an operating system for high-velocity co‑innovation that revolutionizes how ideas move from the lab to the fab. The EPIC model compresses the traditional workflow. Customer engineers work side‑by‑side with Applied technologists from day one, moving beyond isolated process optimization and downstream handoffs. Within a shared, secure environment, EPIC tightly integrates atomistic modeling, test vehicles, process development, validation, and metrology feedback. Constraints that once surfaced late in development are identified and addressed early. The result is a potentially 2x faster path that benefits the entire ecosystem under one roof:
Chipmakers gain earlier access to Applied’s R&D portfolio, faster learning cycles, and accelerated transfer of next‑generation technologies into high‑volume manufacturing.
Ecosystem partners gain earlier access to advanced manufacturing technology and collaboration opportunities that expand what is possible through materials innovation.
Academic institutions gain opportunities to strengthen the lab‑to‑fab pipeline and help develop future semiconductor talent.
Building on decades of co‑development, we are reinventing the innovation pipeline with our partners across logic, memory, and advanced packaging to deliver the next leap in energy‑efficient AI.
Accelerating Advanced Logic
Logic remains the engine of AI compute. In the angstrom era, however, system‑level gains are increasingly constrained by power and energy. Extending AI performance now depends on architectures that deliver more performance per watt, accelerating the move to 3D devices such as gate‑all‑around (GAA) transistors, which boost density within a compact footprint while preserving power efficiency. These architectural shifts are unfolding at unprecedented scale, with the logic roadmap already extending beyond first‑generation GAA toward more advanced designs. One key example is GAA with backside power delivery, which relocates thick power lines to the backside of the wafer, reducing resistive losses and freeing front‑side routing for tighter logic cell integration. Another example brings adjacent GAA PMOS and NMOS transistors closer together while inserting a dielectric isolation wall between them to minimize electrical interference. Further out, complementary FETs (CFETs) push density scaling even more by stacking PMOS and NMOS devices directly atop one another. While these architectures deliver compelling gains in performance per watt and logic density without relying solely on tighter lithography, they significantly raise integration complexity. Manufacturing a single GAA device today can involve more than 2,000 tightly interdependent process steps. At the same time, wiring stacks continue to grow taller and denser to connect these advanced logic devices.
Modern leading‑edge GPUs now in development pack more than 300 billion transistors into an area little larger than a postage stamp, interconnected by over 2,000 miles of wiring. At this level of complexity, the process steps used to create these precise 3D devices and wiring stacks cannot be optimized independently. Design and process must evolve in lockstep, and materials innovation and fabrication methods must advance alongside device architecture. EPIC’s co‑innovation model is designed to accelerate exactly this convergence, enabling logic compute to continue advancing the frontiers of AI at the pace the roadmap demands.
Powering the Memory Roadmap
At the same time, the AI computing era is fundamentally reshaping how data is generated, moved, and processed, making memory technologies, especially DRAM, central to delivering the energy‑efficient performance AI systems require. As models grow larger and more data‑hungry, the DRAM roadmap is shifting toward architectures that deliver higher density, greater bandwidth, and faster access per watt. At the DRAM cell level, this shift is driving a transition from 6F² buried‑channel array transistors (BCAT) to more compact 4F² architectures, which orient the transistor vertically to boost density and reduce chip area. Looking beyond 4F², sustaining gains in performance per watt will require moving past what 2D scaling alone can deliver. The industry is therefore turning to 3D DRAM, stacking memory cells vertically to add capacity within a constrained footprint.
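To make the 6F² to 4F² step concrete, here is the cell-area arithmetic in units of the minimum feature size F, the conventional way DRAM cell sizes are quoted. This worked comparison is our own illustration rather than a figure from Applied Materials, and it ignores peripheral-circuit overhead:

```latex
\frac{A_{\text{cell}}(4F^2)}{A_{\text{cell}}(6F^2)} = \frac{4F^2}{6F^2} = \frac{2}{3},
\qquad
\frac{\text{cells per unit area at } 4F^2}{\text{cells per unit area at } 6F^2} = \frac{6F^2}{4F^2} = 1.5
```

At the same F, a 4F² cell occupies two-thirds the area of a 6F² cell, so the array could hold up to 1.5 times as many cells in the same footprint. Chip-level gains will generally be smaller once periphery and other overheads are counted.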
As these 3D DRAM structures grow taller and aspect ratios intensify, high-mobility materials engineering in three dimensions becomes increasingly critical to performance and reliability. Beyond the memory cell array, another powerful lever for DRAM scaling is shrinking the peripheral circuitry, which includes logic transistors and interconnect wiring. One emerging approach places select periphery functions beneath the DRAM array by bonding two wafers, one optimized for the DRAM cells and the other for CMOS logic, using multiple wiring layers. In parallel, DRAM performance is being extended by leveraging logic‑proven enhancers in the memory periphery. These include mobility boosters such as embedded silicon germanium and stress films, along with wiring upgrades like improved low‑k dielectrics and advanced copper interconnects. Memory manufacturers are also transitioning periphery transistors from planar devices to FinFET architectures, following the logic roadmap to further improve I/O speed. These valuable inflections are central to EPIC’s mission, where they can be co-developed and rapidly validated for next‑generation memory systems.
Driving System Scaling With Advanced Packaging
As data movement becomes the dominant energy cost in AI systems, advanced packaging has emerged as a critical lever for improving system‑level efficiency: shortening interconnect distances, increasing bandwidth density, and reducing the power required to move data between logic and memory. High‑bandwidth memory (HBM) marks a major inflection along this path.
By stacking DRAM dies (scaling to 16 layers and beyond) and placing memory much closer to the processor, HBM enables rapid access to ever‑larger working datasets. This delivers step‑function gains in both bandwidth and energy efficiency. More broadly, the rise of 3D packages such as HBM underscores why advanced packaging is becoming central to the AI era. Packaging now addresses system‑level constraints that logic and memory device scaling alone can no longer overcome. It also enables a move away from monolithic systems‑on‑chip toward chiplet‑based architectures, as AI workloads increasingly demand flexible designs that combine logic, memory, and specialized accelerators optimized for specific tasks. A vital technology powering this roadmap is hybrid bonding. With interconnect pitches approaching those of on‑chip wiring, conventional bumps and microbumps run into fundamental limits in density, power, and signal integrity. Hybrid bonding removes these barriers by allowing dramatically higher interconnect and I/O density, supporting a broad range of chiplet architectures, from memory stacking to tighter compute‑memory integration. As bonded structures like HBM stacks grow larger and more complex, warpage control, die placement, stack alignment, and thermal management become first‑order challenges. EPIC tackles these and other high‑value advanced‑packaging challenges through early, parallel co‑innovation across materials, integration, and manufacturing.
Bringing It All Together
Across logic, memory, and advanced packaging, our industry faces an ambitious roadmap that promises significant gains in energy efficiency for AI systems.
But realizing that potential demands breakthrough materials innovation at a time when feature sizes are shrinking, interfaces are multiplying, and process interdependencies are escalating. These challenges cannot be solved on 10–15‑year timelines under the traditional relay‑race model. We must break down silos, align earlier across the ecosystem, and parallelize learning to keep pace with AI’s demands. In the AI era, progress will be defined by the speed at which lightbulb moments turn into manufacturing and commercialization reality. The only viable path forward is a new innovation model — and EPIC is how we are driving it.
More than 30 years ago, in the mountain village of Mbem in northwest Cameroon, the moon and stars were the only light young Jude Numfor knew after sunset. Electricity had not yet reached his rural community. “There was one person in the village with a petrol generator and a small television,” Numfor says. “When he turned it on, all the children would run to his house and peep through the window.” That memory became the spark for Numfor’s mission: to bring electricity to rural communities like his hometown. To accomplish his goal, in 2006 he cofounded Wireless Light and Power, since renamed Renewable Energy Innovators Cameroon, and he serves as its CEO. REI Cameroon designs, installs, and maintains solar minigrids for rural electrification. The minigrids use photovoltaic panels and battery energy storage systems to generate electricity at 50 hertz, which is distributed to customers through smart meters. In 2017, the company received a grant from IEEE Smart Village to fund the expansion of REI’s minigrid operations and refine its business model. Smart Village supports projects and organizations bringing electricity, along with educational and employment opportunities, to remote communities worldwide. The program is supported by IEEE societies and donations to the IEEE Foundation. The partnership has led to a collaboration on open-source metering, a free, community-driven way of tracking energy usage. Unlike proprietary utility meters, the system allows users, researchers, and utilities to view, customize, and verify how data is collected, ensuring transparency in billing, consumption tracking, and grid management. Smart Village’s support has been pivotal, Numfor says: “It’s not just about money. We share ideas, we get advice, and we have made friends.
Entrepreneurship is lonely, but with the [Smart Village] community, it is different.”
From teenage tinkerer to entrepreneur
Numfor’s first experience of life with electricity came in 2001, after he moved in with a missionary family in the small village of Allat. They used solar panels to power their whole home, an unimaginable luxury in Mbem. “I could watch TV, eat ice cream, and turn on lights,” he says. “It made me wish my brothers in Mbem had the same opportunity.” Numfor’s curiosity about electricity was ignited when a motion-sensor solar light in the family’s home stopped working. He tinkered with the device to find out why. “My missionary family told me to play with it like a toy,” he says, laughing. “I replaced the dead battery with a motorcycle battery and was able to bring the power back for the night.” His missionary parents encouraged Numfor to study technology and engineering on his own, as none of the country’s universities offered solar energy educational programs at the time. They built him a library and stocked it with books on engineering, management, and entrepreneurship. In 2006, armed with his new knowledge, Numfor launched Wireless Light and Power with a friend, Ludwig Teichgraber. The nonprofit aimed to replace hazardous kerosene lamps, known locally as “bush lamps,” with rechargeable solar lanterns. These solar lanterns, called “light packs,” were built locally by Numfor and a team of 11 young Cameroonians using PVC pipes, nickel-metal hydride batteries, and LED bulbs. Families rented the lamps for a small fee, swapping discharged lamps for fully charged ones at solar-powered charging kiosks when they ran out of power. The kiosks then recharged the depleted lamps, making them available for the next swap.
“The solar lantern was safer and cleaner, plus it gave children a chance to read at night,” Numfor explains. “People loved them.” Between 2006 and 2010, his team replicated the model across several villages. But when the global financial crisis hit in 2008, donor support dwindled, forcing the organization to evolve. “We pivoted from being an NGO to a commercial venture,” he says. “That’s how REI was born.”
Building solar minigrids to serve community needs
The new company’s goal was to move away from the lanterns and toward full electrification of communities. Villagers’ aspirations had changed, Numfor says: They now wanted to power their TVs, music systems, and mobile phones. In response, in 2010, REI developed one of the first solar minigrids in West Africa. Built from locally procured components, the prototype supplied steady power to six households. The minigrid system used twelve 123-watt solar photovoltaic panels manufactured by Sharp, sixteen 12-volt, 100-ampere-hour absorbent glass mat (AGM) lead-acid batteries, and a Xantrex charge controller and inverter. Locally sourced wooden light poles were erected to distribute electricity throughout the village. REI charged each household a fee for the electricity. “It was a product-market-fit moment,” Numfor says. “People immediately asked, ‘When can we get this, too?’” The word-of-mouth, grassroots growth caught the attention of global partners. Numfor connected with Smart Village, and in 2017 REI Cameroon received its first seed grant from the program. With that funding, Numfor was able to grow organically and attract additional grants, including one from the U.S. Trade and Development Agency (USTDA), in partnership with the U.S. Department of Energy’s National Renewable Energy Laboratory. REI has since expanded to six villages, providing power to more than 1,000 households and businesses.
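The nameplate ratings of the 2010 prototype allow a quick sizing check. The sketch below multiplies out the quoted panel and battery ratings. It reports nominal values only: rated panel output and total stored battery energy. The per-household figure is a simple average we added for illustration. Usable battery energy is lower in practice, because lead-acid batteries are normally not discharged fully.

```python
# Nameplate sizing of REI's 2010 prototype minigrid, using only the ratings
# quoted in the article. These are nominal figures, not measured output.

PANELS = 12
PANEL_WATTS = 123        # rated peak output per photovoltaic panel
BATTERIES = 16
BATTERY_VOLTS = 12
BATTERY_AMP_HOURS = 100
HOUSEHOLDS = 6


def array_peak_watts(panels: int, watts_each: int) -> int:
    """Total rated array output under standard test conditions."""
    return panels * watts_each


def bank_nominal_wh(count: int, volts: int, amp_hours: int) -> int:
    """Nominal stored energy: volts x amp-hours per battery, summed over the bank.

    Series vs. parallel wiring changes bank voltage and capacity, but not
    this total (ignoring losses).
    """
    return count * volts * amp_hours


peak_w = array_peak_watts(PANELS, PANEL_WATTS)
bank_wh = bank_nominal_wh(BATTERIES, BATTERY_VOLTS, BATTERY_AMP_HOURS)

print(f"Array peak: {peak_w} W, about {peak_w / HOUSEHOLDS:.0f} W per household")
print(f"Battery bank: {bank_wh / 1000:.1f} kWh nominal")
```

Running it prints a 1476 W peak array (about 246 W per household) and a 19.2 kWh nominal battery bank.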
With a dedicated team of 16 people, the company operates in multiple regions of the country, each with unique terrain, languages, and cultural dynamics. “It wasn’t easy,” he acknowledges. “I’m not an academic person; I had to learn everything by doing. [Smart Village] helped me structure the project and grow as an entrepreneur.” Today, Numfor pays it forward by sharing his Smart Village experience and mentoring new entrepreneurs.
Launching a coalition for smart metering
Minigrids can’t operate efficiently without clear operating rules that guarantee service quality and consumer protection, or without reliable, effective monitoring of the system, Numfor says. “We need to know how power is being used, detect problems early, and manage the minigrid from a distance,” he explains. Existing commercial smart-meter providers offer limited, proprietary solutions. One major provider left the market, rendering its technology infrastructure obsolete. “It’s risky for an entire sector to depend on a few companies for such a critical technology,” Numfor says. In 2025, with the help of the Smart Village technical community, Numfor convened a consortium of open-source power advocates, including the Africa Mini-Grid Developers Association, EnAccess, Energy IOT, and NESL. The goal was to develop an open smart metering system that is accessible, transparent, and sustainable for all energy providers. “These organizations are collaborating as Open Advanced Metering Infrastructure [OpenAMI], which is about giving control back to the people who deliver the energy,” he says.
Scaling for impact
Numfor’s passion has grown from bringing light to local rural communities to bringing light to his entire country. Just 54 percent of Cameroon’s citizens have access to electricity, according to the International Energy Agency. For Numfor, the challenge is not just technological; it’s social and economic as well.
“Electricity is the most important enabler of education and economic growth today,” he says. “When you have power, you unlock everything else.” Across the villages where REI has installed sustainable electricity solutions, small businesses are flourishing. Barbershops hum with community chatter, food vendors can preserve perishables, and entrepreneurs run businesses such as phone-charging stations and small mills. “Some villages even have laundromats now,” Numfor says proudly. “Electricity creates jobs and changes mindsets.” Still, it has been a bumpy journey. It wasn’t until 2025 that REI obtained its official authorization (license) from Cameroon’s government to produce and distribute electricity in off-grid areas using solar minigrids. This was a major milestone because REI is one of the first private enterprises in the country to receive such authorization. “We were stuck between pilot projects and growth,” he explains. “Our projects were successful, and there was community demand for more, but to grow, we needed investors who require legal guarantees before committing funds. Now we can scale up and attract investors.” REI plans to expand its reach dramatically, beginning with 134 new villages identified through a feasibility study supported by the USTDA. Its long-term goal is to electrify 760 villages across Cameroon by 2031. While authorization opens doors, financing remains one of REI’s biggest challenges. “The minigrid space doesn’t attract venture capitalists easily,” Numfor notes. “Our return on investment is under 15 percent, so it’s not a typical tech startup model. The real return here is the impact” on the community. He hopes to attract investors who understand that access to electricity drives education, health care, and entrepreneurship. “There are people out there who want to make meaningful change,” he says.
“We just need to connect with them. When you electrify a village, you never know who the next innovator will be. Maybe it’s another kid like me, looking through a window, dreaming.” Finding skilled staff is another challenge, Numfor says. To address this, REI developed an intensive recruitment and training process. “It used to take years to find the right people,” he says. “Now, we can identify who fits our company culture within six months.” Numfor’s wife, Angela Taliklong, who joined the venture in 2010, now oversees administration and human resources.
A brighter Cameroon and beyond
Numfor offers simple words of advice to other impact-driven entrepreneurs: Keep moving. “One of my mistakes early on was trying to be perfect,” he says. “I was spending time improving prototypes instead of increasing the number of our project installations and scaling how many communities we could electrify. You must keep momentum. Don’t wait until everything is perfect before you move forward.” That mindset, rooted in resilience and experimentation, has defined his journey. Rajan Kapur, president of Smart Village, says Numfor is a “shining example” of the program’s vision: “scalable and enduring impact through local entrepreneurs, local procurement, and community engagement based on the use of IEEE technology in underserved communities.” With the ongoing Smart Village partnership, Numfor is determined to bring light and opportunity to every corner of Cameroon and beyond. He has already launched REI Nigeria. “Electricity changed my life,” he says. “Now I want to make sure every child can grow up with that same light.”
When it comes to AI models, size matters. Even though some artificial-intelligence experts warn that scaling up large language models (LLMs) is hitting diminishing performance returns, companies are still coming out with ever larger AI tools. Meta’s latest Llama release had a staggering 2 trillion parameters that define the model. As models grow in size, their capabilities increase. But so do their energy demands and the time it takes to run them, which increases their carbon footprint. To mitigate these issues, people have turned to smaller, less capable models and to lower-precision numbers for model parameters whenever possible. But there is another path that may retain a staggeringly large model’s high performance while reducing both the time it takes to run and its energy footprint. This approach involves befriending the zeros inside large AI models. For many models, most of the values (the weights and activations) are actually zero, or so close to zero that they could be treated as such without losing accuracy. This quality is known as sparsity. Sparsity offers a significant opportunity for computational savings: Instead of wasting time and energy adding or multiplying zeros, these calculations could simply be skipped; rather than storing lots of zeros in memory, one need only store the nonzero values. Unfortunately, today’s popular hardware, like multicore CPUs and GPUs, does not naturally take full advantage of sparsity. To fully leverage sparsity, researchers and engineers need to rethink and re-architect each piece of the design stack, including the hardware, low-level firmware, and application software. In our research group at Stanford University, we have developed the first (to our knowledge) piece of hardware that’s capable of calculating all kinds of sparse and traditional workloads efficiently.
The energy savings varied widely over the workloads, but on average our chip consumed one-seventieth the energy of a CPU and performed the computation eight times as fast. To do this, we had to engineer the hardware, low-level firmware, and software from the ground up to take advantage of sparsity. We hope this is just the beginning of hardware and model development that will allow for more energy-efficient AI.
What is sparsity?
Neural networks, and the data that feeds into them, are represented as arrays of numbers. These arrays can be one-dimensional (vectors), two-dimensional (matrices), or more (tensors). A sparse vector, matrix, or tensor has mostly zero elements. The level of sparsity varies, but when zeros make up more than 50 percent of an array of any type, it can benefit from sparsity-specific computational methods. In contrast, an object that is not sparse—that is, it has few zeros compared with the total number of elements—is called dense. Sparsity can be naturally present, or it can be induced. For example, a social-network graph will be naturally sparse. Imagine a graph where each node (point) represents a person, and each edge (a line segment connecting the points) represents a friendship. Since most people are not friends with one another, a matrix representing all possible edges will be mostly zeros. Other popular applications of AI, such as other forms of graph learning and recommendation models, contain naturally occurring sparsity as well. Beyond naturally occurring sparsity, sparsity can also be induced within an AI model in several ways. Two years ago, a team at Cerebras showed that one can set 70 to 80 percent of the parameters in an LLM to zero without losing any accuracy. Cerebras demonstrated these results specifically on Meta’s open-source Llama 7B model, but the ideas extend to other LLMs, such as those behind ChatGPT and Claude.
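To make the idea of a sparsity level concrete, here is a minimal pure-Python sketch. It measures the fraction of zeros in a small, made-up friendship graph, and it induces unstructured sparsity by zeroing the smallest-magnitude weights of a random matrix. Magnitude pruning is a common, generic way to induce sparsity and is used here only as an illustration; it is an assumption, not necessarily the method Cerebras used, and the names `sparsity` and `magnitude_prune` are hypothetical.

```python
import random

def sparsity(matrix: list[list[float]]) -> float:
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total

def magnitude_prune(matrix: list[list[float]], target: float) -> list[list[float]]:
    """Return a copy with the smallest-magnitude `target` fraction of entries set to zero.

    Generic unstructured magnitude pruning (an illustrative assumption,
    not a specific published recipe).
    """
    positions = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    k = round(target * len(positions))  # how many entries to zero out
    smallest = sorted(positions, key=lambda ij: abs(matrix[ij[0]][ij[1]]))[:k]
    pruned = [list(row) for row in matrix]
    for i, j in smallest:
        pruned[i][j] = 0.0
    return pruned

# Naturally sparse: a made-up friendship graph of six people and four friendships.
friendships = [(0, 1), (1, 2), (3, 4), (4, 5)]
adjacency = [[0.0] * 6 for _ in range(6)]
for a, b in friendships:
    adjacency[a][b] = adjacency[b][a] = 1.0  # friendship is mutual
print(f"graph sparsity: {sparsity(adjacency):.2f}")  # 28 of 36 entries are zero

# Induced sparsity: prune 70 percent of a random 10-by-10 "weight" matrix.
rng = random.Random(0)
weights = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(10)]
pruned = magnitude_prune(weights, 0.7)
print(f"pruned sparsity: {sparsity(pruned):.2f}")
```

Sorting every entry is fine for a toy matrix; a production framework would more likely pick a magnitude threshold or use a partial selection instead.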
The case for sparsity
Sparse computation’s efficiency stems from two fundamental properties: the ability to compress away zeros and the convenient mathematical properties of zeros. Both the algorithms used in sparse computation and the hardware dedicated to them leverage these two basic ideas. First, sparse data can be compressed, making it more memory efficient to store “sparsely”—that is, in something called a sparse data type. Compression also makes it more energy efficient to move data when dealing with large amounts of it. This is best understood through an example. Take a four-by-four matrix with three nonzero elements. Traditionally, this matrix would be stored in memory as is, taking up 16 spaces. This matrix can also be compressed into a sparse data type, getting rid of the zeros and saving only the nonzero elements. In our example, once the metadata that locates those elements is counted, this results in 13 memory spaces as opposed to 16 for the dense, uncompressed version. These savings in memory grow with increasing sparsity and matrix size. In addition to the actual data values, compressed data also requires metadata: The row and column locations of the nonzero elements must also be stored. This is usually thought of as a “fibertree”: The row labels containing nonzero elements are listed and linked to the column labels of the nonzero elements, which are then linked to the values stored in those elements. In memory, things get a bit more complicated still: The row and column labels for each nonzero value must be stored as well as the “segments” that indicate how many such labels to expect, so the metadata and data can be clearly delineated from one another. In a dense, noncompressed matrix data type, values can be accessed either one at a time or in parallel, and their locations can be calculated directly with a simple equation.
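The 13-versus-16 count can be reproduced with a small sketch of a doubly compressed fibertree. The text does not say where the three nonzeros sit or give the exact memory layout, so this assumes they occupy two rows and that each level stores a segment array plus a coordinate array; the matrix values, class, and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CompressedMatrix:
    row_seg: list[int]   # [start, end) into row_crd: how many nonempty rows to expect
    row_crd: list[int]   # indices of rows that hold at least one nonzero
    col_seg: list[int]   # per stored row, [start, end) into col_crd and vals
    col_crd: list[int]   # column index of each nonzero
    vals: list[float]    # the nonzero values themselves

    def storage_slots(self) -> int:
        """Total memory spaces used by metadata plus values."""
        arrays = (self.row_seg, self.row_crd, self.col_seg, self.col_crd, self.vals)
        return sum(len(a) for a in arrays)

def compress(dense: list[list[float]]) -> CompressedMatrix:
    row_crd, col_seg, col_crd, vals = [], [0], [], []
    for i, row in enumerate(dense):
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        if not nonzeros:
            continue  # rows that are entirely zero are not stored at all
        row_crd.append(i)
        for j, v in nonzeros:
            col_crd.append(j)
            vals.append(v)
        col_seg.append(len(col_crd))  # end of this row's column fiber
    return CompressedMatrix([0, len(row_crd)], row_crd, col_seg, col_crd, vals)

def dense_offset(i: int, j: int, n_cols: int) -> int:
    """In an uncompressed row-major matrix, an element's address is a simple formula."""
    return i * n_cols + j

# Hypothetical 4-by-4 matrix with three nonzeros spread over two rows.
A = [
    [0.0, 5.0, 0.0, 7.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
m = compress(A)
print(m)
print("dense slots:", sum(len(row) for row in A), "compressed slots:", m.storage_slots())
```

Under this layout the exact savings depend on the pattern: if the three nonzeros sat in three different rows, the same scheme would need 15 slots. The `dense_offset` formula is the direct address calculation the dense format enjoys, which the compressed format lacks.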
However, accessing values in sparse, compressed data requires looking up the coordinates of the row index and using that information to “indirectly” look up the coordinates of the column index before finally reaching the value. Depending on the actual locations of the sparse data values, these indirect lookups can be extremely random, making the computation data-dependent and requiring memory lookups to be issued on the fly. Second, two mathematical properties of zero let software and hardware skip a lot of computation. Multiplying any number by zero will result in a zero, so there’s no need to actually do the multiplication. Adding zero to any number will always return that number, so there’s no need to do the addition either. In matrix-vector multiplication, one of the most common operations in AI workloads, all computations except those involving two nonzero elements can simply be skipped. Take, for example, the four-by-four matrix from the previous example and a vector of four numbers. In dense computation, each element of the vector must be multiplied by the corresponding element in each row and then added together to compute the final vector. In this case, that would take 16 multiplication operations and 16 additions (four accumulations of four products each). In sparse computation, only the nonzero elements of the vector need to be considered. For each nonzero vector element, indirect lookup can be used to find any corresponding nonzero matrix element, and only those need to be multiplied and added. In our example, only two multiplication steps will be performed, instead of 16.
The trouble with GPUs and CPUs
Unfortunately, modern hardware is not well suited to accelerating sparse computation. For example, say we want to perform a matrix-vector multiplication. In the simplest case, on a single CPU core, each element in the vector would be multiplied sequentially and then written to memory. This is slow, because we can do only one multiplication at a time.
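The zero-skipping matrix-vector multiplication described above can be sketched in a few lines of Python. The matrix is stored column by column so that each nonzero vector element can be used to look up only the matching nonzero matrix entries, as the text describes. The matrix, the vector, and the function names are illustrative assumptions, chosen so that the sparse version performs two multiplications instead of 16.

```python
def dense_matvec(A: list[list[float]], x: list[float]) -> tuple[list[float], int]:
    """Multiply every matrix element by its vector element, zeros included."""
    y = [0.0] * len(A)
    mults = 0
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            y[i] += a * x[j]
            mults += 1
    return y, mults

def compress_columns(A: list[list[float]]) -> dict[int, list[tuple[int, float]]]:
    """Map each column index to its (row, value) nonzeros."""
    cols: dict[int, list[tuple[int, float]]] = {}
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            if a != 0:
                cols.setdefault(j, []).append((i, a))
    return cols

def sparse_matvec(cols: dict[int, list[tuple[int, float]]], x: list[float],
                  n_rows: int) -> tuple[list[float], int]:
    """Touch only pairs in which both the matrix and vector elements are nonzero."""
    y = [0.0] * n_rows
    mults = 0
    for j, xj in enumerate(x):
        if xj == 0:
            continue  # zero times anything is zero: skip the whole column
        for i, a in cols.get(j, []):  # indirect lookup of column j's nonzeros
            y[i] += a * xj
            mults += 1
    return y, mults

A = [
    [0.0, 5.0, 0.0, 7.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
x = [0.0, 2.0, 4.0, 0.0]
print(dense_matvec(A, x))                        # 16 multiplications
print(sparse_matvec(compress_columns(A), x, 4))  # same result, 2 multiplications
```

Both calls return the same output vector; only the operation count differs. Column 2 has no nonzeros, so the nonzero vector element at index 2 costs only a lookup.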
So instead people use CPUs with vector support or GPUs. With this hardware, all elements would be multiplied in parallel, greatly speeding up the application. Now, imagine that both the matrix and vector contain extremely sparse data. The vectorized CPU and GPU would spend most of their efforts multiplying by zero, performing completely ineffectual computations. Newer generations of GPUs can take some advantage of sparsity in hardware, but only a particular kind, called structured sparsity. Structured sparsity assumes that two out of every four adjacent parameters are zero. However, some models benefit more from unstructured sparsity—the ability for any value (weight or activation) to be zero and compressed away, regardless of where it is and what it is adjacent to. GPUs can run unstructured sparse computation in software, for example, through the cuSPARSE GPU library. However, the support for sparse computations is often limited, and the GPU hardware gets underutilized, spending energy on overhead rather than useful computation. When doing sparse computations in software, modern CPUs may be a better alternative to GPUs, because they are designed to be more flexible. Yet, sparse computations on the CPU are often bottlenecked by the indirect lookups used to find nonzero data. CPUs are designed to “prefetch” data based on what they expect they’ll need from memory, but for randomly sparse data, that process often fails to pull in the right data from memory. When that happens, the CPU must waste cycles fetching the right data. Apple was the first to speed up these indirect lookups by supporting an array-of-pointers access pattern in the prefetcher of its A14 and M1 chips.
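The structured-sparsity constraint mentioned above can be checked and enforced with a short sketch. It assumes the common "2:4" reading of that constraint (at least two zeros in every aligned group of four adjacent values) and keeps the two largest-magnitude values in each group. That selection rule is a widely used heuristic and an assumption here, not any vendor's exact procedure; the function names are hypothetical.

```python
def is_2_of_4(values: list[float]) -> bool:
    """True if every aligned group of four values contains at least two zeros."""
    if len(values) % 4:
        raise ValueError("length must be a multiple of 4")
    return all(
        sum(1 for v in values[k:k + 4] if v == 0) >= 2
        for k in range(0, len(values), 4)
    )

def prune_2_of_4(values: list[float]) -> list[float]:
    """Keep the two largest-magnitude values in each group of four; zero the others."""
    if len(values) % 4:
        raise ValueError("length must be a multiple of 4")
    out: list[float] = []
    for k in range(0, len(values), 4):
        group = values[k:k + 4]
        keep = sorted(range(4), key=lambda t: abs(group[t]), reverse=True)[:2]
        out.extend(v if t in keep else 0.0 for t, v in enumerate(group))
    return out

# Half of these values are zero, but the zeros are unevenly placed.
unstructured = [1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 4.0]
print(is_2_of_4(unstructured))  # False: the first group has only one zero
weights = [0.1, -3.0, 2.0, 0.5, 4.0, 0.0, -0.2, 1.0]
print(prune_2_of_4(weights))
```

The first example shows why unstructured sparsity does not map directly onto such hardware: 50 percent of its values are zero, yet it violates the two-in-four pattern.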
Although innovations in prefetching make Apple CPUs more competitive for sparse computation, CPU architectures still have fundamental overheads that a dedicated sparse computing architecture would not, because they need to handle general-purpose computation. Other companies have been developing hardware that accelerates sparse machine learning as well. These include Cerebras’s Wafer Scale Engine and Meta’s Training and Inference Accelerator (MTIA). The Wafer Scale Engine and its corresponding sparse programming framework have demonstrated up to 70 percent sparsity on LLMs. However, the company’s hardware and software solutions support only weight sparsity, not activation sparsity, which is important for many applications. Meta claims that the second version of the MTIA delivers a sevenfold sparse compute performance boost over the MTIA v1. However, the only publicly available information regarding sparsity support in the MTIA v2 is for matrix multiplication, not for vectors or tensors. Although matrix multiplications take up the majority of computation time in most modern ML models, it’s important to have sparsity support for other parts of the process. To avoid switching back and forth between sparse and dense data types, all of the operations should support sparsity.
Onyx
Instead of these halfway solutions, our team at Stanford has developed a hardware accelerator, Onyx, that can take advantage of sparsity from the ground up, whether it’s structured or unstructured. Onyx is the first programmable accelerator to support both sparse and dense computation; it’s capable of accelerating key operations in both domains. To understand Onyx, it is useful to know what a coarse-grained reconfigurable array (CGRA) is and how it compares with more familiar hardware, like CPUs and field-programmable gate arrays (FPGAs). CPUs, CGRAs, and FPGAs represent a trade-off between efficiency and flexibility.
Each individual logic unit of a CPU is designed for a specific function that it performs efficiently. On the other hand, since an FPGA is configurable down to the individual bit, these arrays are extremely flexible but comparatively inefficient. The goal of CGRAs is to achieve the flexibility of FPGAs with the efficiency of CPUs. CGRAs are composed of efficient, configurable units, typically memory and compute, that are specialized for a particular application domain. This is the key benefit of this type of array: Programmers can reconfigure the internals of a CGRA at a high level, making it more efficient than an FPGA but more flexible than a CPU. Onyx is composed of flexible, programmable processing element (PE) tiles and memory (MEM) tiles. The memory tiles store compressed matrices and other data formats. The processing element tiles operate on compressed matrices, eliminating all unnecessary and ineffectual computation. The Onyx compiler handles conversion from software instructions to CGRA configuration. First, the input expression—for instance, an elementwise sparse vector multiplication—is translated into a graph of abstract memory and compute nodes. In this example, there are memories for the input vectors and output vectors, a compute node for finding the intersection between nonzero elements, and a compute node for the multiplication. The compiler figures out how to map the abstract memory and compute nodes onto MEMs and PEs on the CGRA, and then how to route connections between them so that they can transfer data. Finally, the compiler produces the configuration instructions needed to set up the CGRA for the desired computation.
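The compilation steps above (build an abstract graph, place its nodes on MEM and PE tiles, route the edges) can be illustrated with a toy model. Everything here, including the node names, the greedy placement, and the tile labels, is a hypothetical simplification and not the Onyx compiler's actual intermediate representation or algorithms. The `intersect` function shows what an intersection node computes for an elementwise product of two sparse vectors stored as sorted coordinate and value lists.

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class Node:
    name: str
    kind: str  # "mem" for memory nodes, "compute" for compute nodes

# Abstract graph for z = x * y (elementwise) on sparse vectors.
NODES = [
    Node("x", "mem"), Node("y", "mem"),
    Node("intersect", "compute"), Node("multiply", "compute"),
    Node("z", "mem"),
]
EDGES = [("x", "intersect"), ("y", "intersect"), ("intersect", "multiply"), ("multiply", "z")]

def place(nodes: list[Node], mem_tiles: list[str], pe_tiles: list[str]) -> dict[str, str]:
    """Greedily put memory nodes on MEM tiles and compute nodes on PE tiles."""
    free = {"mem": list(mem_tiles), "compute": list(pe_tiles)}
    placement = {}
    for node in nodes:
        if not free[node.kind]:
            raise RuntimeError(f"no free tile for {node.name}")
        placement[node.name] = free[node.kind].pop(0)
    return placement

def route(edges: list[tuple[str, str]], placement: dict[str, str]) -> list[tuple[str, str]]:
    """Turn abstract edges into tile-to-tile connections."""
    return [(placement[src], placement[dst]) for src, dst in edges]

def intersect(x_crd: list[int], x_val: list[float],
              y_crd: list[int], y_val: list[float]) -> Iterator[tuple[int, float, float]]:
    """Two-pointer walk over sorted coordinates; emits only where both inputs are nonzero."""
    i = j = 0
    while i < len(x_crd) and j < len(y_crd):
        if x_crd[i] == y_crd[j]:
            yield x_crd[i], x_val[i], y_val[j]
            i += 1
            j += 1
        elif x_crd[i] < y_crd[j]:
            i += 1
        else:
            j += 1

def multiply(stream: Iterator[tuple[int, float, float]]) -> list[tuple[int, float]]:
    return [(crd, a * b) for crd, a, b in stream]

placement = place(NODES, ["MEM0", "MEM1", "MEM2"], ["PE0", "PE1"])
print(route(EDGES, placement))
z = multiply(intersect([0, 2, 5], [1.0, 2.0, 3.0], [2, 3, 5], [10.0, 20.0, 30.0]))
print(z)  # coordinates 2 and 5 are the only ones nonzero in both inputs
```

Because the intersection emits nothing for coordinates present in only one input, the multiply stage never sees a zero operand.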
Since Onyx is programmable, engineers can map many different operations, such as elementwise vector-vector multiplication, or key AI tasks like matrix-vector and matrix-matrix multiplication, onto the accelerator. We evaluated the efficiency gains of our hardware by looking at the product of the energy used and the time it took to compute, called the energy-delay product (EDP). This metric captures the trade-off between speed and energy: Minimizing energy alone would lead to very slow devices, and minimizing delay alone would lead to high-area, high-power devices. Onyx achieves up to a 565-fold improvement in energy-delay product over CPUs (we used a 12-core Intel Xeon CPU) running dedicated sparse libraries. Onyx can also be configured to accelerate regular, dense applications, much as a GPU or TPU would. If the computation is sparse, Onyx is configured to use sparse primitives, and if the computation is dense, Onyx is reconfigured to take advantage of parallelism, similar to how GPUs function. This architecture is a step toward a single system that can accelerate both sparse and dense computations on the same silicon. Just as important, Onyx enables new algorithmic thinking. Sparse acceleration hardware will not only make AI faster and more energy efficient but also enable researchers and engineers to explore new algorithms that have the potential to dramatically improve AI.
The future with sparsity
Our team is already working on next-generation chips built on Onyx. Beyond matrix multiplication operations, machine learning models perform other types of math, like nonlinear layers, normalization, the softmax function, and more. We are adding support for the full range of computations on our next-gen accelerator and within the compiler.
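The energy-delay product used in the evaluation is simple arithmetic, and a short sketch shows why it balances the two failure modes described in the evaluation. The three candidate designs and their numbers are made up, in arbitrary units. The last lines show that a hypothetical workload using one-seventieth the energy and running eight times as fast would see a 70 × 8 = 560-fold lower EDP; that illustrates how the metric compounds and is not a derivation of the reported figures, which come from per-workload measurements.

```python
def edp(energy: float, delay: float) -> float:
    """Energy-delay product: lower is better."""
    return energy * delay

# Hypothetical designs: (energy per task, time per task), arbitrary units.
designs = {
    "slow_low_power": (0.5, 40.0),
    "balanced": (1.0, 1.0),
    "fast_power_hungry": (20.0, 0.5),
}
best_energy = min(designs, key=lambda d: designs[d][0])
best_delay = min(designs, key=lambda d: designs[d][1])
best_edp = min(designs, key=lambda d: edp(*designs[d]))
print(best_energy, best_delay, best_edp)

def edp_improvement(baseline: tuple[float, float], candidate: tuple[float, float]) -> float:
    """How many times lower the candidate's EDP is than the baseline's."""
    return edp(*baseline) / edp(*candidate)

# Hypothetical single workload: the baseline uses 70x the energy and 8x the time.
print(edp_improvement((70.0, 8.0), (1.0, 1.0)))
```

Optimizing energy alone picks the slow design and optimizing delay alone picks the power-hungry one; the EDP picks the design that balances both.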
Since sparse machine learning models may have both sparse and dense layers, we are also working on integrating the dense and sparse accelerator architectures more efficiently on the chip, allowing for fast transformation between the different data types. We’re also looking at ways to manage memory constraints by breaking up the sparse data more effectively so we can run computations on several sparse accelerator chips. We are also working on systems that can predict the performance of accelerators such as ours, which will help in designing better hardware for sparse AI. Longer term, we’re interested in seeing whether high degrees of sparsity throughout AI computation will catch on with more model types, and whether sparse accelerators will be adopted at a larger scale. Building hardware that supports unstructured sparsity and optimally takes advantage of zeros is just the beginning. With this hardware in hand, AI researchers and engineers will have the opportunity to explore new models and algorithms that leverage sparsity in novel and creative ways. We see this as a crucial research area for managing the ever-increasing runtime, costs, and environmental impact of AI.
Many of the world’s most advanced electronic systems—including Internet routers, wireless base stations, medical imaging scanners, and some artificial intelligence tools—depend on field-programmable gate arrays. FPGAs are computer chips whose internal hardware circuits can be reconfigured after manufacturing. On 12 March, an IEEE Milestone plaque recognizing the first FPGA was dedicated at the Advanced Micro Devices campus in San Jose, Calif., the former Xilinx headquarters and the birthplace of the technology. The FPGA earned the Milestone designation because it introduced iteration to semiconductor design. Engineers could redesign hardware repeatedly without fabricating a new chip, dramatically reducing development risk and enabling faster innovation at a time when semiconductor costs were rising rapidly. The ceremony, which was organized by the IEEE Santa Clara Valley Section, brought together professionals from across the semiconductor industry and IEEE leadership. Speakers at the event included Stephen Trimberger, an IEEE and ACM Fellow whose technical contributions helped shape modern FPGA architecture. Trimberger reflected on how the invention enabled software-programmable hardware.
Solving computing’s flexibility-performance tradeoff
FPGAs emerged in the 1980s to address a core limitation in computing. A microprocessor executes software instructions sequentially, making it flexible but sometimes too slow for workloads requiring many operations at once. At the other extreme, application-specific integrated circuits (ASICs) are chips designed to do only one task. ASICs achieve high efficiency but require lengthy development cycles and nonrecurring engineering costs, which are large, upfront investments. Expenses include designing the chip and preparing it for manufacturing—a process that involves creating detailed layouts, building masks for the fabrication machines, and setting up production lines to handle the tiny circuits.
“ASICs can deliver the best performance, but the development cycle is long and the nonrecurring engineering cost can be very high,” says Jason Cong, an IEEE Fellow and professor of computer science at the University of California, Los Angeles. “FPGAs provide a sweet spot between processors and custom silicon.” Cong’s foundational work in FPGA design automation and high-level synthesis transformed how reconfigurable systems are programmed. For example, he developed synthesis tools that translate C/C++ into hardware designs. At the heart of his work is an underlying principle first espoused by electrical engineer Ross Freeman: By configuring hardware using programmable memory embedded inside the chip, FPGAs combine hardware-level speed with the adaptability traditionally associated with software.
Silicon Valley origins: the first FPGA
The FPGA architecture originated in the mid-1980s at Xilinx, a Silicon Valley company founded in 1984. The invention is widely credited to Freeman, a Xilinx cofounder and the startup’s CTO. He envisioned a chip with circuitry that could be configured after fabrication rather than fixed permanently during creation. Articles about the history of the FPGA emphasize that he saw it as a deliberate break from conventional chip design. At the time, semiconductor engineers treated transistors as scarce resources. Custom chips were carefully optimized so that nearly every transistor served a specific purpose. Freeman proposed a different approach. He figured Moore’s Law would soon change chip economics. That principle holds that the number of transistors on a chip roughly doubles every two years, making computing cheaper and more powerful. Freeman posited that as transistors became abundant, flexibility would matter more than perfect efficiency. He envisioned a device composed of programmable logic blocks connected through configurable routing—a chip filled with what he described as “open gates,” ready to be defined by users after manufacturing.
Instead of fixing hardware in silicon permanently, engineers could configure and reconfigure circuits as requirements evolved. Freeman sometimes compared the concept to a blank cassette tape: Manufacturers would supply the medium, while engineers determined its function. The analogy captured a profound shift in who controls the technology, moving hardware design flexibility from chip fabrication facilities to the system designers themselves. In 1985 Xilinx introduced the first FPGA for commercial sale: the XC2064. The device contained 64 configurable logic blocks—small digital circuits capable of performing logical operations—arranged in an 8-by-8 grid. Programmable routing channels allowed engineers to define how signals moved between blocks, effectively wiring a custom circuit with software. Fabricated using a 2-micrometer process (meaning that 2 µm was the minimum size of the features that could be patterned onto silicon using photolithography), the XC2064 implemented a few thousand logic gates. Modern FPGAs can contain hundreds of millions of gates, enabling vastly more complex designs. Yet the XC2064 established a design workflow still used today: Engineers describe the hardware behavior digitally and then “compile the design,” a process that automatically translates the plans into the instructions the FPGA needs to set its logic blocks and wiring, according to AMD. Engineers then load that configuration onto the chip.
The breakthrough: hardware defined by memory
Earlier programmable logic devices, such as erasable programmable read-only memory, or EPROM, allowed limited customization but relied on largely fixed wiring structures that did not scale well as circuits grew more complex, Cong says. FPGAs introduced programmable interconnects—networks of electronic switches controlled by memory cells distributed across the chip. When powered on, the device loads a bitstream configuration file that determines how its internal circuits behave.
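The idea that stored bits define both the logic and the wiring can be shown with a toy model of two configurable logic blocks. Each block is a 4-input lookup table (16 configuration bits) plus routing selections that name which signal drives each input pin. The classes, signal names, and evaluation order are hypothetical simplifications; a real FPGA bitstream format is vendor-specific and far more detailed. Changing only the stored bits changes the circuit, with no new silicon.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class LogicBlock:
    mask: int           # 16 stored bits: the truth table of a 4-input function
    sources: list[str]  # routing bits, abstracted: which signal drives each input pin

    def evaluate(self, signals: dict[str, int]) -> int:
        index = 0
        for pin, source in enumerate(self.sources):
            index |= (signals[source] & 1) << pin  # input pins form the table address
        return (self.mask >> index) & 1

def truth_table(fn) -> int:
    """'Compile' a Python function of four bits into a 16-bit LUT mask."""
    mask = 0
    for index in range(16):
        pins = [(index >> pin) & 1 for pin in range(4)]
        if fn(*pins):
            mask |= 1 << index
    return mask

def run(config: dict[str, LogicBlock], inputs: dict[str, int]) -> dict[str, int]:
    """Evaluate blocks in listed order (assumed to be dependency order)."""
    signals = {"GND": 0, **inputs}
    for name, block in config.items():
        signals[name] = block.evaluate(signals)
    return signals

# Configure (a AND b) XOR c with two logic blocks.
config = {
    "lut0": LogicBlock(truth_table(lambda a, b, _c, _d: a & b), ["a", "b", "GND", "GND"]),
    "lut1": LogicBlock(truth_table(lambda p, c, _x, _y: p ^ c), ["lut0", "c", "GND", "GND"]),
}
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, run(config, {"a": a, "b": b, "c": c})["lut1"])

# Reprogram lut0 to OR by rewriting only its stored bits; nothing else changes.
config["lut0"].mask = truth_table(lambda a, b, _c, _d: a | b)
```

After the last line, the same two blocks compute (a OR b) XOR c, which is the toy equivalent of loading a new configuration onto the same chip.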
“As process technology improved and transistor counts increased, the cost of programmability became much less significant,” Cong says.
From “glue logic” to essential infrastructure
“Initially, FPGAs were used as what engineers called glue logic,” Cong says. Glue logic refers to simple circuits that connect processors, memory, and peripheral devices so the system works reliably, according to PC Magazine. In other words, it “glues” different components together, especially when interfaces change frequently. Early adopters recognized the advantage of hardware that could adapt as standards evolved. In “The History, Status, and Future of FPGAs,” published in Communications of the ACM, engineers at Xilinx and organizations such as Bell Labs, Fairchild Semiconductor, IBM, and Sun Microsystems said the earliest uses of FPGAs were for prototyping ASICs. FPGAs were also used to validate complex systems by running their software before fabrication, and they allowed companies to deploy specialized products manufactured in modest volumes. Those uses revealed a broader shift: Hardware no longer needed to remain fixed once deployed.
Semiconductor economics changed the equation
The rise of FPGAs closely followed changes in semiconductor economics, Cong says. Developing a custom chip requires a large upfront investment before production begins. As fabrication costs increased, products had to ship in large quantities to make ASIC development economically viable, according to a post published by AnySilicon. FPGAs allowed designers to move forward without that larger monetary commitment.
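The volume argument can be made concrete with a simple break-even calculation: total cost is the nonrecurring engineering (NRE) cost plus unit cost times volume, and the ASIC wins only above the volume where the two totals cross. All numbers below are hypothetical placeholders, not figures from Cong or AnySilicon, and the model ignores time to market and other factors.

```python
def total_cost(nre: float, unit_cost: float, volume: float) -> float:
    """Upfront nonrecurring engineering cost plus per-unit manufacturing cost."""
    return nre + unit_cost * volume

def break_even_volume(asic_nre: float, asic_unit: float,
                      fpga_nre: float, fpga_unit: float) -> float:
    """Volume v* where asic_nre + asic_unit * v* == fpga_nre + fpga_unit * v*."""
    if fpga_unit <= asic_unit:
        raise ValueError("no crossover: the FPGA must cost more per unit than the ASIC")
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)

# Hypothetical numbers, in dollars.
ASIC_NRE, ASIC_UNIT = 2_000_000.0, 5.0
FPGA_NRE, FPGA_UNIT = 50_000.0, 45.0

v = break_even_volume(ASIC_NRE, ASIC_UNIT, FPGA_NRE, FPGA_UNIT)
print(f"break-even at {v:,.0f} units")
for volume in (10_000, 100_000):
    asic = total_cost(ASIC_NRE, ASIC_UNIT, volume)
    fpga = total_cost(FPGA_NRE, FPGA_UNIT, volume)
    print(volume, "ASIC" if asic < fpga else "FPGA", "is cheaper")
```

Raising the ASIC's NRE, as rising fabrication costs did, pushes the break-even volume up and widens the range in which an FPGA is the cheaper choice.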
ASIC development typically requires 18 to 24 months from conception to silicon, while FPGA implementations often can be completed within three to six months using modern design tools, Cong says. The shorter cycle and the ability to reconfigure the hardware enabled startups, universities, and equipment manufacturers to experiment with advanced architectures that were previously accessible mainly to large chip companies.
Lookup tables and the rise of reconfigurable computing
A popular technique for implementing mathematical functions in hardware is the lookup table (LUT). A LUT is a small memory element that stores the results of logical operations, according to “LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs,” a paper selected for presentation next month at the 34th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM). Instead of repeatedly recalculating outcomes, the chip retrieves answers directly from memory. Cong compares the approach to consulting multiplication tables rather than recomputing the arithmetic each time. Research led by Cong and others helped develop efficient methods for mapping digital circuits onto LUT-based architectures, shaping routing and layout strategies used in modern devices. As transistor budgets expanded, FPGA vendors integrated memory blocks, digital signal-processing units, high-speed communication interfaces, cryptographic engines, and embedded processors, transforming the devices into versatile computing platforms.
Why the gate arrays are distinct from CPUs, GPUs, and ASICs
FPGAs coexist with other processors because each one optimizes different priorities. Central processing units excel at general computing. Graphics processing units, designed to perform many calculations simultaneously, dominate large parallel workloads such as AI training. ASICs provide maximum efficiency when designs remain stable and production volumes are high.
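Cong's multiplication-table analogy maps directly onto code. This sketch precomputes every product of two 2-bit operands into a 16-entry table, then splits the table into one single-output lookup table per result bit, which is how a multi-bit function can be spread across 4-input LUTs. The 2-bit width and the function names are assumptions chosen to keep the table small; this is not the method of the LUT-LLM paper.

```python
BITS = 2  # operand width; 2-bit x 2-bit keeps each table to 16 entries

def build_product_table(bits: int = BITS) -> list[int]:
    """Precompute a * b once; entry (a << bits) | b holds the product."""
    size = 1 << bits
    return [a * b for a in range(size) for b in range(size)]

def lookup_multiply(table: list[int], a: int, b: int, bits: int = BITS) -> int:
    """'Multiply' by reading memory instead of doing arithmetic."""
    return table[(a << bits) | b]

def output_bit_masks(table: list[int]) -> list[int]:
    """One truth-table mask per result bit: each mask is a single-output LUT."""
    width = max(table).bit_length()
    return [
        sum(((value >> k) & 1) << index for index, value in enumerate(table))
        for k in range(width)
    ]

table = build_product_table()
masks = output_bit_masks(table)
print(lookup_multiply(table, 3, 2))  # 6
print([hex(m) for m in masks])       # four 16-bit masks, i.e. four 4-input LUTs
```

The largest product, 3 × 3 = 9, needs four bits, so the whole 2-bit multiplier fits in four 16-entry tables; wider operands grow the tables exponentially, which is why real designs combine LUTs with dedicated arithmetic blocks.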
“FPGAs are not replacements for CPUs or GPUs,” Cong says. “They complement those processors in heterogeneous computing systems.” Modern computing platforms increasingly combine multiple types of processors to balance flexibility, performance, and energy efficiency.
A Milestone for an idea, not just a device
This IEEE Milestone recognizes more than a successful semiconductor product. It also acknowledges a shift in how engineers innovate. Reconfigurable hardware allows designers to test ideas quickly, refine architectures, and deploy systems while standards and markets evolve. “Without FPGAs,” Cong says, “the pace of hardware innovation would likely be much slower.” Four decades after the first FPGA appeared, the technology’s enduring legacy reflects Freeman’s insight: Hardware did not need to remain fixed. By accepting a small amount of unused silicon in exchange for adaptability, engineers transformed chips from static products into platforms for continuous experimentation—turning silicon itself into a medium engineers could rewrite. Among those who attended the Milestone ceremony were 2025 IEEE President Kathleen Kramer; 2024 IEEE President Tom Coughlin; Avery Lu, chair of the IEEE Santa Clara Valley Section; and Brian Berg, history and milestones chair of IEEE Region 6. They joined AMD’s chief executive, Lisa Su, and Salil Raje, senior vice president and general manager of adaptive and embedded computing at AMD. The IEEE Milestone plaque honoring the field-programmable gate array reads: “The FPGA is an integrated circuit with user-programmable Boolean logic functions and interconnects.
FPGA inventor Ross Freeman cofounded Xilinx to productize his 1984 invention, and in 1985 the XC2064 was introduced with 64 programmable 4-input logic functions. Xilinx’s FPGAs helped accelerate a dramatic industry shift wherein ‘fabless’ companies could use software tools to design hardware while engaging ‘foundry’ companies to handle the capital-intensive task of manufacturing the software-defined hardware.” Administered by the IEEE History Center and supported by donors, the IEEE Milestone program recognizes outstanding technical developments worldwide that are at least 25 years old. Check out Spectrum’s History of Technology channel to read more stories about key engineering achievements.