AI and bot internet activity outpaces human use for first time in history: ‘On the other side now’
Traffic from automated AI programs for human users also rose by about 8,000% last year, CNBC reported.
Current index: 50.3 (scale: 0 = predominantly negative, 50 = neutral, 100 = predominantly positive)
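Based on an analysis of 85,132 items from the past 7 days, the news sentiment index is 50.3 (balanced). There were 4,306 positive items (5.1%), 78,811 neutral items (92.6%), and 2,015 negative items (2.4%), so the neutral share is markedly high. The overall leaning index is 15.0 (centrist, balanced).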
Automated bot and AI agent traffic has overtaken human-generated web traffic for the first time ever, reaching this milestone earlier than industry experts anticipated, according to data from major internet hosting service Cloudflare. The post Cloudflare: Web Traffic from AI & Bots Surpasses Human Internet Activity for First Time in History appeared first on Breitbart.
Tech biz teaching AI to use computers by slurping staff activity
"It’s a reminder of how human activity is changing the natural world in unanticipated ways.”
Opera makers have always engaged with the latest inventions while also preserving historic crafts. I believe it’s possible to look both forwards and backwards in this fast-evolving landscape. The disquiet and distrust surrounding artificial intelligence among artists and creatives remain real and consequential, and the language used by leading arts commentators is often apocalyptic: AI will decimate the arts, it is evil, it is the devil. Like many emerging technologies, AI has been driven by the corporations at the forefront of its creation. Introduced to the public at a rapid rate and continuously evolving, machine learning has become closely entwined with fear, antipathy and foreboding. At the same time, its powers and possibilities are expanding exponentially, becoming embedded in almost every aspect of human activity. The upcoming RBO/SHIFT festival at the Royal Opera House aims to interrogate all sides of this fast-evolving landscape to enable artists, performers, creatives and audiences to think deeply and widely about where we are now, and where we may be tomorrow. Machine learning represents a seismic shift, both in society and in the arts, and we need storytellers, artists, teachers and thinkers in this space to help determine the direction of that shift and help us navigate this unfamiliar territory.
Wall Street stocks posted modest gains on Monday as investors watched developments in U.S.-Iran peace negotiations and cheered the unveiling of a new computer chip that promises to bring artificial intelligence to personal computing. Tech shares boosted the Nasdaq and the S&P 500 to their latest in a series of record closing highs. U.S. President Donald Trump said talks with Iran continue. Earlier, Iran's news agency announced Tehran is halting indirect negotiations with Washington after a new round of strikes threatened to derail diplomatic efforts to end the war, now in its fourth month. The intensification of hostilities sent crude prices jumping, along with worries over the extent to which a protracted war could result in heightened, intransitory inflation. "We don't really know where things stand," said Thomas Martin, senior portfolio manager at GLOBALT in Atlanta. "The market seems to think that something's going to get done at some point, but we don't have very good information to go on, like what the Iranians really want and what Trump is willing to settle for." Stocks added to their gains after Trump said no Israeli troops would go into Beirut, citing a call with Israeli Prime Minister Benjamin Netanyahu. Nvidia jumped after the company unveiled a new chip that puts AI capabilities directly into personal computers. The chip is the result of a three-year partnership with Microsoft to "reinvent the PC" for the AI era, Nvidia CEO Jensen Huang said. Microsoft shares rose. The reaction among semiconductor stocks was mixed. Qualcomm tumbled, and Intel also fell. Micron shares rose sharply, breaching the $1,000 mark for the first time. The Philadelphia SE Semiconductor Index advanced. In economic news, U.S. factory activity expanded in May for the fifth consecutive month as goods-makers navigate tariff and geopolitical crosswinds. Investors will turn to Friday's jobs report ahead of Kevin Warsh's debut policy meeting as chairman of the U.S. Federal Reserve this month, amid fears of rising inflation linked to the Iran war that could upend the stock market rally. According to preliminary data, the S&P 500 gained 20.19 points, or 0.27%, to end at 7,600.03 points, while the Nasdaq Composite gained 114.75 points, or 0.43%, to 27,087.37. The Dow Jones Industrial Average rose 44.70 points, or 0.09%, to 51,076.85. Software stocks rebounded from heavy selling earlier this year on AI disruption fears. ServiceNow and IBM rose sharply. The software services index advanced. "On the software side, companies that hadn't been doing very well, but now are doing well today," Martin added. "Some of that has been attributed to Nvidia comments that software is part of the solution, so the market's coming back to" software stocks. Cadence Design Systems jumped after launching an Nvidia-powered AI agent for chip design. Broadcom's earnings, due on Wednesday, will be closely parsed in the wake of solid results from Dell last week, which signaled strong AI server demand.
“Not in my backyard” is the rallying cry of citizens everywhere resisting projects proposed for their locality. Whether it’s affordable housing, a waste treatment plant, or a new data center, they may recognize the benefit of the activity. They just don’t want it near them. And the roots of that resistance differ from place to place. When it comes to the ongoing transition from fossil fuels to renewables, companies and policymakers need to know where, exactly, people are coming from. The Italian island of Sardinia is a textbook example. As IEEE Spectrum’s power and energy editor Emily Waltz discovered when she traveled there last October, Sardinian opposition to wind and solar projects runs deep. It spurred a quarter of the voting population to queue up in public squares in 2024 to sign a petition banning all construction of renewable energy. Waltz was surprised. She went there to see a promising new grid-scale energy storage system that uses domes inflated with carbon dioxide. While reporting on that project, she interviewed residents, engineers, activists, and professors about their attitudes toward climate change and the Italian government’s grand plans for renewable energy on the island. And Waltz soon learned of Sardinians’ profound antipathy toward renewable energy and its deep ties to a history of invasion, occupation, and exploitation stretching back 2,700 years. It started with the Phoenicians and then extended through the Romans, the Byzantines, and the Iberians. Sardinia was absorbed into a newly unified Italy in 1861, and it became an autonomous region of Italy in 1948. The island’s population is justifiably suspicious of outsiders, including the Italian government. “When you’re in Sardinia, the weight of history—you can feel it like in the air,” Waltz told me. 
“And it gets passed down from one generation to the next.” Now, Italy needs Sardinia to produce even more power to meet the country’s climate goals—something that Sardinians see as Rome’s problem, not theirs. “Sardinia already exports about 30 percent of its electricity. It’s not like they need more,” Waltz says. “So it’s hard to make the case to build, build, build.” The result of Waltz’s old-fashioned shoe leather reporting is this month’s cover story. She notes that the Sardinians she talked to aren’t climate-change deniers, and they don’t object to renewables per se. They just don’t like the way corporations and Italian policymakers are trying to plug into Sardinia like it’s one giant battery rather than the home of an ancient and proud people. “I think Sardinians would be more receptive to renewable projects if it was more of a ground-up, grassroots approach,” Waltz says. Indeed, this homegrown approach is already working in some places in Sardinia. She knows of more than 50 projects, called energy communities, where the residents are deploying renewables themselves. The idea also holds promise for other places struggling to get locals to buy into the renewable-energy transition. The Sardinian experience is both a cautionary tale and a blueprint. Ignore the weight of history that communities carry and your project risks failure. Meet the people where they are and you might just get somewhere. The same lesson applies whether you’re in Sulawesi or sub-Saharan Africa. You just have to show up to learn it.
Thanks to the newly detailed FROST technique, telltale SSD activity can be measured in the browser using simple JavaScript.
US-listed cryptocurrency exchange Coinbase is allowing users in India to make trades using the rupee, marking a key expansion of its services in Asia’s third-largest economy. Customers can deposit and withdraw rupees through the so-called immediate payment service channel, the company said on Monday. They will also have access to spot trading across a range of assets, alongside perpetual futures contracts covering major crypto assets, the company said. Coinbase, which had discontinued its services in India in 2023, resumed crypto trading last year after registering with the Financial Intelligence Unit. “India has long been one of the most important markets in crypto: in terms of developer talent, trading activity, and the broader adoption of blockchain technology,” said John O’Loghlen, Coinbase’s regional managing director for Asia Pacific. India requires crypto exchanges to comply with its anti-money laundering rules. The country levies a 30 per cent tax on crypto trading gains, among the highest globally, but has yet to outline regulations for the asset class.
“The demand for bitcoin to naira conversion is no longer driven mainly by trading activity. More users now rely on crypto for payments, business transactions and remote work income.” The post Bitcoin to naira demand rises as Nigerian freelancers seek faster crypto payouts appeared first on Premium Times Nigeria.
Telltale SSD activity can be measured in the browser using simple JavaScript.
The OnCampus program, administered by IEEE Educational Activities, last year expanded its engineering experiences from two to seven universities. Part of TryEngineering, the program is held at universities around the world, offering preuniversity students hands-on opportunities to solve engineering problems. The IEEE Innovation Committee provided funding for the additional locations.

New participating institutions

The electrical engineering and computing faculty at the University of Zagreb, in Croatia, hosted a two-day program in June. Twenty-five children ages 10 to 14 participated in lectures and workshops on artificial intelligence, computer science, robotics, and astronomy. Tomislav Jagušt, an IEEE senior member and the chair of the IEEE preuniversity coordinating committee, led the program. In September the Arab Academy for Science, Technology, and Maritime Transport’s engineering college held a two-day session at its Abu Kir, Egypt, campus. Fifty students participated in hands-on activities on Ohm’s law, radio communications, and circuit building. They also learned from professors about engineering careers and job opportunities. Also in September, the Majan University College, in Muscat, Oman, hosted 40 high school students who competed in six challenges to design and build circuits. These included an IoT design and an LED brightness control using a potentiometer, a three-terminal, manually adjustable resistor that functions as a variable voltage divider. The program also highlighted AI and quantum computing technologies and introduced students to job opportunities in the fields. The workshop transformed curiosity into creation, empowering students with technical skills and confidence in emerging technologies. In November at the Universiti Malaysia Perlis, in Arau, 50 students explored the fundamentals of quantum computational intelligence and AI through hands-on activities and interactive simulations.
IEEE Senior Member Mohd Hafiz Ismail, a professor of electronic engineering and technology, gave an introduction to quantum computing intelligence technology. The Hellenic Robotics Center of Excellence at the National Technical University of Athens hosted a two-day session in December. Twenty-five students explored robotics and AI through hands-on design challenges such as TryEngineering’s AI and machine learning methods. They also toured the university’s research facilities.

Hong Kong and Greek universities participate again

The City University and St. Francis University in Hong Kong, and the University of Ioannina, Arta campus, Greece, participated in the program for a second year. Under the leadership of IEEE Senior Member Paulina Chan and volunteers from the IEEE Hong Kong Section, the City and St. Francis universities jointly held the program in July. They welcomed 55 students ages 12 to 18 from 41 schools. The students attended tutorials on foundational concepts and theories of AI. They worked in small teams on projects using AI-generated images, voice, and music manipulations. They were coached by students from St. Francis and Imperial College London. The participants presented their projects to judges, teachers, and parents. The students also visited a nearby semiconductor equipment manufacturer to learn about technology careers from engineers working there. The results of a post-program survey showed strong satisfaction with OnCampus, with nearly 75 percent of participants giving it a rating of 4 or higher out of 5. “I enjoyed getting to know about deep learning and its application,” one student participant said. “The content of the activity matched my interest, and I gained new knowledge.” “OnCampus is led by a strong team with lots of experts in the field,” another said.
“It’s a rare chance for students to use software, learn about the theory behind how deep learning works, and get a glance at future possibilities.” The University of Ioannina hosted the program in Arta in July with support from IEEE Senior Member Stamatis Dragoumanos and IEEE members Nikos Giannakeas and Eleftheria Kallinikou. Nearly 50 students, ages 12 to 16, attended the seven-day event, supported by 17 instructors and six volunteers from the university’s IEEE student branch. The students learned about AI, augmented reality, microchip design, microcontrollers, and 3D printing. They also attended presentations by engineers from the industry. To give the students exposure to real-world engineering, they visited two hydroelectric power plants and a green data center. At the end of the program, students presented their projects and showcased the technical skills they had developed. Those involved in the TryEngineering OnCampus program are proud of the impactful experiences students have gained. The opportunities are possible because universities open their doors, share their expertise, and invest in the next generation of innovators. The University of Zagreb, the Arab Academy for Science, Technology, and Maritime Transport, the Majan University College, and The City University and St. Francis University will be participating again this year. To learn how you can bring the OnCampus program to your educational institution, send a request to tryengineering@ieee.org.
In the late 1940s—when computer engineers were grappling with unreliable hardware and noisy transmission environments—a team of engineers inside a modest lab at the University of Manchester, England, confronted a problem so fundamental that it threatened the viability of digital computing itself. Machines could generate bits, but they could not reliably read them back. The inconsistent reading back of memory data did not initially present itself as a grand theoretical challenge. It showed up as something more mundane: inconsistent computing results. Engineers including Frederic C. Williams, Tom Kilburn, and G. E. (Tommy) Thomas traced the failures not to logic errors but to the physical behavior of the machines themselves. The team devised a technique for keeping a transmitter and a receiver synchronized without relying on a separate clock signal. Their innovation, known as Manchester code or phase encoding, encoded each bit with a transition in the middle of the bit period, embedding timing information directly into the data stream and making the signal self-clocking. So, even if the signal degraded or the timing drifted slightly, the receiver could continually keep time based on those regular transitions. By eliminating the need for separate clocks and reducing synchronization errors, Manchester code made data transfer more robust across cables and circuits. Those qualities later made it a natural fit for technologies such as Ethernet and early data storage systems. Its self-clocking nature helped standardize how machines communicate, and it laid the groundwork for modern networking and digital communication protocols. On 13 April 2026, this breakthrough was honored with an IEEE Milestone plaque during a ceremony at the University of Manchester. Dignitaries from IEEE and the university attended the ceremony.
Embedding timing in signals

Those 1940s Manchester University engineers were working on systems that fed into the Manchester Mark I, one of the first practical stored-program machines. When troubles arose, they used oscilloscopes to probe signals. They found that electrical pulses did not arrive with consistent timing. Memory signals also blurred over time, making them harder to read, and when long runs of identical bits occurred, the waveform flattened into stretches with no transitions. That led to a crucial insight: The problem was not just detecting whether a signal was high or low; the system also lost track of when to sample the signal. Without reliable timing markers, even correctly formed signals were misread. Bits could effectively be lost or miscounted because the system fell out of sync. At first, the engineers tried to tame the hardware. They experimented with stabilizing circuits and more consistent pulse generation, attempting to impose a regular rhythm on an inherently unstable system. But the fixes proved fragile, and the electronics of the day could not maintain the required precision. So the Manchester group took a different approach. If the hardware could not provide a dependable clock, the signal itself would have to carry one. Instead of representing data as static levels, each bit changed state, with a guaranteed transition in the middle. Embedding timing in the signal reduced erratic behavior. Machines were suddenly able to reliably transmit, store, and read back data—an essential step toward practical stored-program computing.

Making signals unmistakable

The Manchester code addressed several issues at once. Regular transitions allowed continuous timing recovery. Transitions proved easier to detect than static levels, and long runs of identical bits no longer produced flat, ambiguous waveforms. Rather than fighting the imperfections of early electronics, the design worked with them.
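The encoding described above can be sketched in a few lines of Python. This is an illustrative sketch, not the Manchester team's circuit: the function names are hypothetical, each bit is modeled as two half-bit line levels, and the polarity is an assumption (the IEEE 802.3 convention, where 0 is high-then-low and 1 is low-then-high; G. E. Thomas's original convention is the inverse).

```python
from typing import List


def manchester_encode(bits: List[int]) -> List[int]:
    """Return two half-bit line levels per data bit.

    Assumed polarity (IEEE 802.3 style): 0 -> high, low; 1 -> low, high.
    Every bit therefore has a transition in the middle of its period.
    """
    levels: List[int] = []
    for bit in bits:
        levels.extend((0, 1) if bit else (1, 0))
    return levels


def manchester_decode(levels: List[int]) -> List[int]:
    """Recover data bits from half-bit levels, checking the mid-bit transition."""
    if len(levels) % 2:
        raise ValueError("expected an even number of half-bit samples")
    bits: List[int] = []
    for i in range(0, len(levels), 2):
        first, second = levels[i], levels[i + 1]
        if first == second:
            # A flat bit period means the timing reference has been lost.
            raise ValueError(f"no mid-bit transition at bit {i // 2}")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits


def longest_flat_run(levels: List[int]) -> int:
    """Length of the longest stretch of identical consecutive levels."""
    if not levels:
        return 0
    longest = run = 1
    for prev, cur in zip(levels, levels[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


if __name__ == "__main__":
    data = [0, 0, 0, 0, 0, 0, 0, 0]          # a long run of identical bits
    line = manchester_encode(data)
    print("raw flat run:    ", longest_flat_run(data))
    print("encoded flat run:", longest_flat_run(line))
    print("round trip ok:   ", manchester_decode(line) == data)
```

The cost of the guaranteed transition is that the line carries two levels per data bit, so the signal needs roughly twice the bandwidth of the raw bit stream.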
From lab curiosity to a global standard

What began as a local solution in Manchester shaped digital communication systems for decades, including early Ethernet technology, for which timing and shared-medium communication were central challenges. According to Robert Metcalfe, a member of the team that built the first Ethernet system at Xerox PARC in 1973, he and his colleagues relied on Manchester code. “Manchester code solved a fundamental problem for us: timing,” Metcalfe says, explaining that each bit carried its own clock and removed the need for a global synchronized signal. That self-clocking property wasn’t the only benefit provided by the encoding scheme. On a shared coaxial cable, Manchester encoding did more than provide timing. Each transceiver left the medium undriven—effectively “off”—most of the time, allowing packets from other machines to pass without interference. Even during transmission, a station drove the signal only about half the time, leaving the line undriven during the other half of each bit cycle. This distinction—between a driven signal and an undriven line, rather than simple 1s and 0s—allowed receivers to recover both data and clock timing while also monitoring the cable for other activity. If a transceiver detected a signal when it expected the line to be undriven, the signal indicated that another station was transmitting at the same time. In other words, the system could detect collisions in real time and respond accordingly. The idea has proven durable far beyond local networks. Manchester code is being used aboard the Voyager spacecraft, which are now cruising through interstellar space—underscoring its reliability in extreme environments. The code also has found its way into everyday consumer electronics. Infrared remote controls for televisions and audio equipment commonly rely on Manchester code through protocols such as RC-5, developed by Philips in the early 1980s.
The protocol encodes commands as timed infrared signals transmitted by a handset’s integrated circuit and LED, allowing devices to reliably interpret button presses even through noise and signal distortion. Manufacturers across Europe—and many in the United States—adopted the approach, extending Manchester code into the home.

Why the Milestone matters

An IEEE Milestone designation recognizes technologies with enduring impact. Manchester code qualifies because it solved a foundational timing problem at a critical moment in computing history. Without a way to embed timing in the data itself, early digital systems would have remained fragile and unreliable. Manchester code helped transform them into dependable machines, and it enabled much of today’s digital communication. Key participants at the plaque dedication ceremony included Tom Coughlin, 2024 IEEE president; Duncan Ivison, University of Manchester president and vice chancellor; and Nagham Saeed, chair of the IEEE U.K. and Ireland Section. Talks by Kees Schouhamer Immink (the 2017 IEEE Medal of Honor laureate probably best known for his work that made compact discs and other high-density digital media practical) and Peter Green (Manchester’s deputy dean for the engineering faculty) highlighted the code’s lasting impact on digital data storage and communications. The IEEE Milestone plaque for the Manchester code reads: “At this site in 1948–1949, Manchester code was invented for reliably encoding digital data stored on the Manchester Mark I computer’s magnetic drum. It became a standard for computer magnetic tapes and floppy disks and was used in digital communications, including the Voyager 1 and 2 spacecraft and early Ethernet networks.
It found wide use in domestic remote controllers, radio frequency identification (RFID) tags, and many control network standards.” Administered by the IEEE History Center and supported by donors, the Milestone program recognizes outstanding technical developments worldwide. The IEEE U.K. and Ireland Section sponsored the nomination.
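For readers curious how a remote-control protocol such as RC-5 applies Manchester code, here is a rough Python sketch. The frame layout and timing are not given in the article; they are assumptions taken from commonly published descriptions of RC-5 (14 bits: two start bits, a toggle bit, a 5-bit address, and a 6-bit command, with a half-bit time of about 889 microseconds and a logical 1 sent as "IR off, then IR on"). The function names are hypothetical.

```python
from typing import List

# Assumed nominal RC-5 half-bit duration in microseconds (not from the article).
HALF_BIT_US = 889


def rc5_frame_bits(address: int, command: int, toggle: int) -> List[int]:
    """Build the 14 logical bits of an assumed classic RC-5 frame, MSB first."""
    if not 0 <= address < 32:
        raise ValueError("address must fit in 5 bits")
    if not 0 <= command < 64:
        raise ValueError("command must fit in 6 bits")
    bits = [1, 1, toggle & 1]                                # two start bits, toggle
    bits += [(address >> i) & 1 for i in range(4, -1, -1)]   # 5 address bits
    bits += [(command >> i) & 1 for i in range(5, -1, -1)]   # 6 command bits
    return bits


def rc5_half_bits(bits: List[int]) -> List[int]:
    """Manchester-encode logical bits into IR-on (1) / IR-off (0) half-bits.

    Assumed polarity: 1 -> off, on; 0 -> on, off.
    """
    half_bits: List[int] = []
    for bit in bits:
        half_bits.extend((0, 1) if bit else (1, 0))
    return half_bits


if __name__ == "__main__":
    # Address 0 and command 12 are arbitrary illustrative values.
    frame = rc5_frame_bits(address=0, command=12, toggle=0)
    signal = rc5_half_bits(frame)
    print("logical bits:", frame)
    print("half-bits:   ", signal)
    print("frame time (us):", len(signal) * HALF_BIT_US)
```

The toggle bit flips on each new key press, which lets a receiver tell a held button from a repeated press; that detail is also part of the assumed RC-5 description rather than something stated in the article.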
Andrew Ng has serious street cred in artificial intelligence. He pioneered the use of graphics processing units (GPUs) to train deep learning models in the late 2000s with his students at Stanford University, cofounded Google Brain in 2011, and then served for three years as chief scientist for Baidu, where he helped build the Chinese tech giant’s AI group. So when he says he has identified the next big shift in artificial intelligence, people listen. And that’s what he told IEEE Spectrum in an exclusive Q&A. Ng’s current efforts are focused on his company Landing AI, which built a platform called LandingLens to help manufacturers improve visual inspection with computer vision. He has also become something of an evangelist for what he calls the data-centric AI movement, which he says can yield “small data” solutions to big issues in AI, including model efficiency, accuracy, and bias. The great advances in deep learning over the past decade or so have been powered by ever-bigger models crunching ever-bigger amounts of data. Some people argue that that’s an unsustainable trajectory. Do you agree that it can’t go on that way? Andrew Ng: This is a big question. We’ve seen foundation models in NLP [natural language processing]. I’m excited about NLP models getting even bigger, and also about the potential of building foundation models in computer vision. I think there’s lots of signal to still be exploited in video: We have not been able to build foundation models yet for video because of compute bandwidth and the cost of processing video, as opposed to tokenized text. So I think that this engine of scaling up deep learning algorithms, which has been running for something like 15 years now, still has steam in it.
Having said that, it only applies to certain problems, and there’s a set of other problems that need small data solutions. When you say you want a foundation model for computer vision, what do you mean by that? Ng: This is a term coined by Percy Liang and some of my friends at Stanford to refer to very large models, trained on very large data sets, that can be tuned for specific applications. For example, GPT-3 is an example of a foundation model [for NLP]. Foundation models offer a lot of promise as a new paradigm in developing machine learning applications, but also challenges in terms of making sure that they’re reasonably fair and free from bias, especially if many of us will be building on top of them. What needs to happen for someone to build a foundation model for video? Ng: I think there is a scalability problem. The compute power needed to process the large volume of images for video is significant, and I think that’s why foundation models have arisen first in NLP. Many researchers are working on this, and I think we’re seeing early signs of such models being developed in computer vision. But I’m confident that if a semiconductor maker gave us 10 times more processor power, we could easily find 10 times more video to build such models for vision. Having said that, a lot of what’s happened over the past decade is that deep learning has happened in consumer-facing companies that have large user bases, sometimes billions of users, and therefore very large data sets. While that paradigm of machine learning has driven a lot of economic value in consumer software, I find that that recipe of scale doesn’t work for other industries. It’s funny to hear you say that, because your early work was at a consumer-facing company with millions of users. Ng: Over a decade ago, when I proposed starting the Google Brain project to use Google’s compute infrastructure to build very large neural networks, it was a controversial step.
One very senior person pulled me aside and warned me that starting Google Brain would be bad for my career. I think he felt that the action couldn’t just be in scaling up, and that I should instead focus on architecture innovation. I remember when my students and I published the first NeurIPS workshop paper advocating using CUDA, a platform for processing on GPUs, for deep learning—a different senior person in AI sat me down and said, “CUDA is really complicated to program. As a programming paradigm, this seems like too much work.” I did manage to convince him; the other person I did not convince. I expect they’re both convinced now. Ng: I think so, yes. Over the past year as I’ve been speaking to people about the data-centric AI movement, I’ve been getting flashbacks to when I was speaking to people about deep learning and scalability 10 or 15 years ago. In the past year, I’ve been getting the same mix of “there’s nothing new here” and “this seems like the wrong direction.” How do you define data-centric AI, and why do you consider it a movement? Ng: Data-centric AI is the discipline of systematically engineering the data needed to successfully build an AI system. For an AI system, you have to implement some algorithm, say a neural network, in code and then train it on your data set. The dominant paradigm over the last decade was to download the data set while you focus on improving the code. Thanks to that paradigm, over the last decade deep learning networks have improved significantly, to the point where for a lot of applications the code—the neural network architecture—is basically a solved problem.
So for many practical applications, it’s now more productive to hold the neural network architecture fixed, and instead find ways to improve the data. When I started speaking about this, there were many practitioners who, completely appropriately, raised their hands and said, “Yes, we’ve been doing this for 20 years.” This is the time to take the things that some individuals have been doing intuitively and make it a systematic engineering discipline. The data-centric AI movement is much bigger than one company or group of researchers. My collaborators and I organized a data-centric AI workshop at NeurIPS, and I was really delighted at the number of authors and presenters that showed up. You often talk about companies or institutions that have only a small amount of data to work with. How can data-centric AI help them? Ng: You hear a lot about vision systems built with millions of images—I once built a face recognition system using 350 million images. Architectures built for hundreds of millions of images don’t work with only 50 images. But it turns out, if you have 50 really good examples, you can build something valuable, like a defect-inspection system. In many industries where giant data sets simply don’t exist, I think the focus has to shift from big data to good data. Having 50 thoughtfully engineered examples can be sufficient to explain to the neural network what you want it to learn. When you talk about training a model with just 50 images, does that really mean you’re taking an existing model that was trained on a very large data set and fine-tuning it? Or do you mean a brand new model that’s designed to learn only from that small data set? Ng: Let me describe what Landing AI does. When doing visual inspection for manufacturers, we often use our own flavor of RetinaNet. It is a pretrained model. Having said that, the pretraining is a small piece of the puzzle. 
What’s a bigger piece of the puzzle is providing tools that enable the manufacturer to pick the right set of images [to use for fine-tuning] and label them in a consistent way. There’s a very practical problem we’ve seen spanning vision, NLP, and speech, where even human annotators don’t agree on the appropriate label. For big data applications, the common response has been: If the data is noisy, let’s just get a lot of data and the algorithm will average over it. But if you can develop tools that flag where the data’s inconsistent and give you a very targeted way to improve the consistency of the data, that turns out to be a more efficient way to get a high-performing system. For example, if you have 10,000 images where 30 images are of one class, and those 30 images are labeled inconsistently, one of the things we do is build tools to draw your attention to the subset of data that’s inconsistent. So you can very quickly relabel those images to be more consistent, and this leads to improvement in performance. Could this focus on high-quality data help with bias in data sets? If you’re able to curate the data more before training? Ng: Very much so. Many researchers have pointed out that biased data is one factor among many leading to biased systems. There have been many thoughtful efforts to engineer the data. At the NeurIPS workshop, Olga Russakovsky gave a really nice talk on this. At the main NeurIPS conference, I also really enjoyed Mary Gray’s presentation, which touched on how data-centric AI is one piece of the solution, but not the entire solution. New tools like Datasheets for Datasets also seem like an important piece of the puzzle. One of the powerful tools that data-centric AI gives us is the ability to engineer a subset of the data.
Imagine training a machine-learning system and finding that its performance is okay for most of the data set, but its performance is biased for just a subset of the data. If you try to change the whole neural network architecture to improve the performance on just that subset, it’s quite difficult. But if you can engineer a subset of the data you can address the problem in a much more targeted way.
When you talk about engineering the data, what do you mean exactly?
Ng: In AI, data cleaning is important, but the way the data has been cleaned has often been in very manual ways. In computer vision, someone may visualize images through a Jupyter notebook and maybe spot the problem, and maybe fix it. But I’m excited about tools that allow you to have a very large data set, tools that draw your attention quickly and efficiently to the subset of data where, say, the labels are noisy. Or to quickly bring your attention to the one class among 100 classes where it would benefit you to collect more data. Collecting more data often helps, but if you try to collect more data for everything, that can be a very expensive activity. For example, I once figured out that a speech-recognition system was performing poorly when there was car noise in the background. Knowing that allowed me to collect more data with car noise in the background, rather than trying to collect more data for everything, which would have been expensive and slow.
What about using synthetic data, is that often a good solution?
Ng: I think synthetic data is an important tool in the tool chest of data-centric AI. At the NeurIPS workshop, Anima Anandkumar gave a great talk that touched on synthetic data. I think there are important uses of synthetic data that go beyond just being a preprocessing step for increasing the data set for a learning algorithm. I’d love to see more tools to let developers use synthetic data generation as part of the closed loop of iterative machine learning development.
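As a rough illustration of the kind of tooling Ng describes above (surfacing the examples whose labels are inconsistent, and finding the one slice of data, such as recordings with car noise, where collecting more data would pay off), here is a minimal Python sketch. It is not Landing AI's implementation: the function names, the annotation format, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def flag_inconsistent_labels(annotations):
    """Return the ids of examples whose annotators did not all agree.

    `annotations` is a hypothetical format: a dict mapping an example id
    to the list of labels assigned by different annotators.
    """
    return sorted(
        ex_id for ex_id, labels in annotations.items() if len(set(labels)) > 1
    )


def accuracy_by_slice(records):
    """Compute accuracy separately for each named slice of the data.

    `records` is an iterable of (slice_name, true_label, predicted_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for slice_name, truth, pred in records:
        total[slice_name] += 1
        correct[slice_name] += int(truth == pred)
    return {name: correct[name] / total[name] for name in total}


def weakest_slice(records):
    """Name the slice with the lowest accuracy: a candidate for more data."""
    scores = accuracy_by_slice(records)
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Toy annotations (assumed data): img_002 has conflicting labels.
    annotations = {
        "img_001": ["scratch", "scratch", "scratch"],
        "img_002": ["pit_mark", "dent", "pit_mark"],
        "img_003": ["dent", "dent"],
    }
    print("Relabel these:", flag_inconsistent_labels(annotations))

    # Toy evaluation results (assumed data), tagged by recording condition.
    records = [
        ("quiet", "yes", "yes"),
        ("quiet", "no", "no"),
        ("car_noise", "yes", "no"),
        ("car_noise", "no", "no"),
        ("car_noise", "yes", "no"),
    ]
    print("Accuracy by slice:", accuracy_by_slice(records))
    print("Collect more data for:", weakest_slice(records))
```

The point of the sketch is the workflow rather than the code: instead of relabeling or collecting data across the board, the checks narrow attention to a small, specific subset that is worth fixing first.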
Do you mean that synthetic data would allow you to try the model on more data sets?
Ng: Not really. Here’s an example. Let’s say you’re trying to detect defects in a smartphone casing. There are many different types of defects on smartphones. It could be a scratch, a dent, pit marks, discoloration of the material, other types of blemishes. If you train the model and then find through error analysis that it’s doing well overall but it’s performing poorly on pit marks, then synthetic data generation allows you to address the problem in a more targeted way. You could generate more data just for the pit-mark category.
Synthetic data generation is a very powerful tool, but there are many simpler tools that I will often try first. Such as data augmentation, improving labeling consistency, or just asking a factory to collect more data.
To make these issues more concrete, can you walk me through an example? When a company approaches Landing AI and says it has a problem with visual inspection, how do you onboard them and work toward deployment?
Ng: When a customer approaches us we usually have a conversation about their inspection problem and look at a few images to verify that the problem is feasible with computer vision. Assuming it is, we ask them to upload the data to the LandingLens platform. We often advise them on the methodology of data-centric AI and help them label the data. One of the foci of Landing AI is to empower manufacturing companies to do the machine learning work themselves. A lot of our work is making sure the software is fast and easy to use.
Through the iterative process of machine learning development, we advise customers on things like how to train models on the platform, when and how to improve the labeling of data so the performance of the model improves. Our training and software supports them all the way through deploying the trained model to an edge device in the factory.
How do you deal with changing needs? If products change or lighting conditions change in the factory, can the model keep up?
Ng: It varies by manufacturer. There is data drift in many contexts. But there are some manufacturers that have been running the same manufacturing line for 20 years now with few changes, so they don’t expect changes in the next five years. Those stable environments make things easier. For other manufacturers, we provide tools to flag when there’s a significant data-drift issue. I find it really important to empower manufacturing customers to correct data, retrain, and update the model. Because if something changes and it’s 3 a.m. in the United States, I want them to be able to adapt their learning algorithm right away to maintain operations. In the consumer software Internet, we could train a handful of machine-learning models to serve a billion users. In manufacturing, you might have 10,000 manufacturers building 10,000 custom AI models. The challenge is, how do you do that without Landing AI having to hire 10,000 machine learning specialists?
So you’re saying that to make it scale, you have to empower customers to do a lot of the training and other work.
Ng: Yes, exactly! This is an industry-wide problem in AI, not just in manufacturing. Look at health care. Every hospital has its own slightly different format for electronic health records. How can every hospital train its own custom AI model? Expecting every hospital’s IT personnel to invent new neural-network architectures is unrealistic.
The only way out of this dilemma is to build tools that empower the customers to build their own models by giving them tools to engineer the data and express their domain knowledge. That’s what Landing AI is executing in computer vision, and the field of AI needs other teams to execute this in other domains.
Is there anything else you think it’s important for people to understand about the work you’re doing or the data-centric AI movement?
Ng: In the last decade, the biggest shift in AI was a shift to deep learning. I think it’s quite possible that in this decade the biggest shift will be to data-centric AI. With the maturity of today’s neural network architectures, I think for a lot of the practical applications the bottleneck will be whether we can efficiently get the data we need to develop systems that work well. The data-centric AI movement has tremendous energy and momentum across the whole community. I hope more researchers and developers will jump in and work on it.
This article appears in the April 2022 print issue as “Andrew Ng, AI Minimalist.”