WHAT IS LF EDGE

LF Edge is an umbrella organization that aims to establish an open, interoperable framework for edge computing independent of hardware, silicon, cloud, or operating system. By bringing together industry leaders, LF Edge will create a common framework for hardware and software standards and best practices critical to sustaining current and future generations of IoT and edge devices.

We are fostering collaboration and innovation across the multiple industries including industrial manufacturing, cities and government, energy, transportation, retail, home and building automation, automotive, logistics and health care — all of which stand to be transformed by edge computing.

What is LF Edge

Project EVE Promotes Cloud-Native Approach to Edge Computing

The LF Edge umbrella organization for open source edge computing, announced by The Linux Foundation last week, includes two new projects: Samsung Home Edge and Project EVE. We don’t know much about Samsung’s project for home automation, but we found out more about Project EVE, which is based on Zededa’s edge virtualization technology. We spoke with Zededa co-founder Roman Shaposhnik about Project EVE, which provides a cloud-native virtualization engine for developing and deploying containers for industrial edge computers (see below).

LF Edge aims to establish “an open, interoperable framework for edge computing independent of hardware, silicon, cloud, or operating system.” It is built around The Linux Foundation’s telecom-oriented Akraino Edge Stack, as well as its EdgeX Foundry, an industrial IoT middleware project.

Like the mostly proprietary cloud-to-edge platforms emerging from Google (Google Cloud IoT Edge), Amazon (AWS IoT), Microsoft (Azure Sphere), and most recently Baidu (Open Edge), among others, LF Edge envisions a world where software running on IoT gateway and edge devices evolves top down from the cloud rather than from the ground up with traditional embedded platforms.

The Linux Foundation also supports numerous “ground up” embedded projects such as the Yocto Project and IoTivity, but with LF Edge it has taken a substantial step toward the cloud-centric paradigm. The touted benefits of a cloud-native approach for embedded include easier software development, especially when multiple apps are needed, and improved security via virtualized, regularly updated container apps. Cloud-native edge computing should also enable more effective deployment of cloud-based analytics on the edge while reducing expensive, high-latency cloud communications.

None of the four major cloud operators listed above are currently members of LF Edge, which poses a challenge for the organization. However, there’s already a deep roster of companies onboard, including Arm, AT&T, Dell EMC, Ericsson, HPE, Huawei, IBM, Intel, Nokia Solutions, Qualcomm, Radisys, Red Hat, Samsung, Seagate, and WindRiver (see the LF Edge announcement for the full list.)

With developers coming at the edge computing problem from both the top-down and bottom-up perspectives, often with limited knowledge of the opposite realm, the first step is agreeing on terminology. Back in June, the Linux Foundation launched an Open Glossary of Edge Computing project to address this issue. Now part of LF Edge, the Open Glossary effort “seeks to provide a concise collection of terms related to the field of edge computing.”

There’s no mention of Linux in the announcements for the LF Edge projects, all of which propose open source, OS-agnostic approaches to edge computing. Yet there’s no question that Linux will be the driving force here.

Project EVE aims to be the Android of edge computing

Project EVE is developing an “open, agnostic and standardized architecture unifying the approach to developing and orchestrating cloud-native applications across the enterprise edge,” says the Linux Foundation. Built around an open source EVE (Edge Virtualization Engine) version of the proprietary Edge Virtualization X (EVx) engine from Santa Clara startup Zededa, Project EVE aims to reinvent embedded using Docker containers and other open source cloud-native software such as Kubernetes. Cloud-native edge computing’s “simple, standardized orchestration” will enable developers to “extend cloud applications to edge devices safely without the need for specialized engineering tied to specific hardware platforms,” says the project.

Earlier this year, Zededa joined the EdgeX Foundry project, and its technology similarly targets the industrial realm. However, Project EVE primarily concerns the higher application level rather than middleware. The project’s cloud-native approach to edge software also connects it to another LF project: the Cloud Native Computing Foundation.

In addition to its lightweight virtualization engine, Project EVE also provides a zero-trust security framework. In conversation with Linux.com, Zededa co-founder Roman Shaposhnik proposed to consign the word “embedded” to the lower levels of simple, MCU-based IoT devices that can’t run Linux. “To learn embedded you have to go back in time, which is no longer cutting it,” said Shaposhnik. “We have millions of cloud-native software developers who can drive edge computing. If you are familiar with cloud-native, you should have no problem in developing edge-native applications.”

If Shaposhnik is critical of traditional, ground-up embedded development, with all its complexity and lack of security, he is also dismissive of the proprietary cloud-to-edge solutions. “It’s clear that building silo’d end-to-end integration cloud applications is not really flying,” he says, noting the dangers of vendor lock-in and lack of interoperability and privacy.

To achieve the goals of edge computing, what’s needed is a standardized, open source approach to edge virtualization that can work with any cloud, says Shaposhnik. Project EVE can accomplish this, he says, by being the edge computing equivalent of Android.

“The edge market today is where mobile was in the early 2000s,” said Shaposhnik, referring to an era when early mobile OSes such as Palm, BlackBerry, and Windows Mobile created proprietary silos. The iPhone changed the paradigm with apps and other advanced features, but it was the far more open Android that really kicked the mobile world into overdrive.

“Project EVE is doing with edge what Android has done with mobile,” said Shaposhnik. The project’s standardized edge virtualization technology is the equivalent of Android package management and Dalvik VM for Java combined, he added. “As a mobile developer you don’t think about what driver is being used. In the same way our technology protects the developer from hardware complexity.”

Project EVE is based on Zededa’s EVx edge virtualization engine, which currently runs on edge hardware from partners including Advantech, Lanner, SuperMicro, and Scalys. Zededa’s customers are mostly large industrial or energy companies that need timely analytics, which increasingly requires multiple applications.

“We have customers who want to optimize their wind turbines and need predictive maintenance and vibration analytics,” said Shaposhnik. “There are a half dozen machine learning and AI companies that could help, but the only way they can deliver their product is by giving them a new box, which adds to cost and complexity.”

A typical edge computer may need only a handful of different apps rather than the hundreds found on a typical smartphone. Yet, without an application management solution such as virtualized containers, there’s no easy way to host them. Other open source cloud-to-edge solutions that use embedded container technology to provide apps include Balena’s IoT fleet management solution (the company was formerly known as Resin.io) and Canonical’s container-like Ubuntu Core distribution.

Right now, the focus is on getting the open source version of EVx out the door. Project EVE plans to release a 1.0 version of EVE in the second quarter along with an SDK for developing EVE edge containers. An app store platform will follow later in the year.

Whether or not Edge computing serves as the backbone of mission-critical business worldwide depends on the success of the underlying network.

Linux Foundation's Project EVE: a Cloud-Native Edge Computing Platform

Recognizing the edge’s potential and the urgency of supporting edge networks, The Linux Foundation earlier this year created LF Edge, an umbrella organization dedicated to creating an open, agnostic and interoperable framework for edge computing. Similar to what the Cloud Native Computing Foundation (CNCF) has done for cloud development, LF Edge aims to enhance cooperation among key players so that the industry as a whole can advance more quickly.

By 2021, Gartner forecasts that there will be approximately 25 billion IoT devices in use around the world. Each of those devices, in turn, has the capacity to produce immense volumes of valuable data. Much of this data could be used to improve business-critical operations — but only if we’re able to analyze it in a timely and efficient manner. As mentioned above, it’s this combination of factors that has led to the rise of edge computing as one of the most rapidly developing technology spaces today.

This idea of interoperability at the edge is particularly important because the hardware that makes up edge devices is so diverse — much more so than servers in a data center. Yet for edge computing to succeed, we need to be able to run applications right on local gateway devices to analyze and respond to IoT and Industry 4.0 data in near-real time. How do you design applications that are compatible with a huge variety of hardware and capable of running without a reliable cloud connection? This is the challenge that LF Edge is helping to solve.

Part of the solution is Project EVE, an Edge Virtualization Engine donated to LF Edge by ZEDEDA last month. I think of EVE as doing for the edge what Android did for mobile phones and what VMware did for data centers: decoupling software from hardware to make application development and deployment easier.

This curious (and somewhat unexpected) interplay between mobile and server computing requirements is exactly what makes edge so exciting. As an open source project, EVE now has a unique opportunity to blend the best parts of building blocks from projects as diverse as Android, ChromeOS, CoreOS, Qubes OS, Xen, Linuxkit, Linuxboot, Docker, Kubernetes and unikernels (AKA library operating systems — out of which AtmanOS is our favorite). And if you are still not convinced that all of these projects have much in common, simply consider this:

Today’s edge hardware is nothing like the underpowered, specialized embedded hardware of yesterday. All of these boxes typically come with a few gigabytes of RAM, dozens (if not hundreds) of GBs of flash and modern, high-speed CPUs with the latest features (like virtualization extensions) available by default. In short, they are very capable of supporting exactly the same cloud-native software abstractions developers now take for granted in any public cloud: containers, immutable infrastructure, 12-factor apps and continuous delivery software pipelines. From this perspective, edge hardware starts to look very much like servers in a data center (be it a public cloud or a private colo). At the same time:

These boxes are deployed out in the wild. Which means when it comes to security and network requirements, they exist in a world that looks nothing like a traditional data center. In fact, it looks a lot like the world mobile computing platforms have evolved in. Just like iPhones, these boxes get stolen, disassembled and hacked all the time in the hopes that secrets inside of them can be revealed and used as attack vectors. On the networking side, the similarity is even more striking: the way our smartphones have to constantly cope with ill-defined, flaky and heterogeneous networks (hopping between WiFi and LTE, for example) sets up a really good model for how to approach edge computing networking.

There’s no denying that EVE stands on the shoulders of all these open source giants that came before it and yet it has plenty of its own open source development to be done. In the remainder of this article, I’ll cover some of the technical details of Project EVE.

Project EVE overview

Fundamentally, EVE is a replacement for traditional (or even some of the real-time) operating systems (Linux, Windows, VxWorks, etc.) that are commonplace today in IoT and edge deployments. EVE takes control right after UEFI/BIOS and we have future plans around Linuxboot to have EVE actually replace your UEFI/BIOS altogether.

There are three key components of EVE: a type-1 hypervisor, running directly on bare metal; an Edge Container runtime that allows you to run applications in either a virtual machine or container; and a hardened root-of-trust implementation for security. A full list of hardware that EVE was tested on is available on the project’s Wiki page, but we expect EVE to run on most modern edge computing hardware (including products from major companies like Advantech and Supermicro, as well as architectures from ARM and Intel).

Project EVE Introduction

Once the EVE instance is up and running, the first thing it does is contact a pre-defined controller and receive instructions from the controller on how to configure itself and what workloads to start executing. The controller builds these instruction manifests for every EVE-enabled device that it knows about, based on the overall orchestration requests it receives from the DevOps team rolling out a given deployment.

The API that EVE uses to talk to the controller is part of the LF Edge standardization efforts, and we fully expect it to evolve into the de facto industry standard for how edge virtualization infrastructure is controlled and monitored. You can see the current version of the API and documentation in EVE’s GitHub repository.
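To make the device-to-controller flow concrete, here is a minimal sketch in Go of a device periodically polling a controller for its configuration manifest. The endpoint path and the DeviceConfig shape are illustrative assumptions for this sketch, not the actual EVE controller API (which lives in EVE's GitHub repository).

```go
// Minimal sketch (not the actual EVE API): a device polling a controller
// for its configuration manifest over HTTPS, as described above.
// The endpoint path and the DeviceConfig shape are illustrative assumptions.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

// DeviceConfig is a hypothetical stand-in for the manifest the controller
// builds per device (workloads to run, network settings, and so on).
type DeviceConfig struct {
	Workloads []string `json:"workloads"`
	Version   string   `json:"version"`
}

func fetchConfig(controller, deviceID string) (*DeviceConfig, error) {
	url := fmt.Sprintf("%s/api/v1/device/%s/config", controller, deviceID)
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var cfg DeviceConfig
	if err := json.NewDecoder(resp.Body).Decode(&cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	// Poll periodically; a real deployment would authenticate with device
	// certificates rooted in the TPM, which is omitted in this sketch.
	for {
		cfg, err := fetchConfig("https://controller.example.com", "device-0001")
		if err != nil {
			log.Printf("controller unreachable, retrying: %v", err)
		} else {
			log.Printf("received config %s with %d workloads", cfg.Version, len(cfg.Workloads))
		}
		time.Sleep(time.Minute)
	}
}
```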

The kinds of workloads that a DevOps team will be deploying to all EVE-enabled devices are packaged as Edge Containers. Edge Containers are meant to be an extension of traditional OCI containers, and the effort around their standardization will be ongoing in LF Edge in the coming months. The idea behind Edge Container extensions is to allow for seamless integration between virtual machine, unikernel and container workloads through a single packaging and distribution format.

Continuing with our Android analogy, one may say that while EVE is trying to do for the edge what Android has done for mobile, Edge Containers are meant to be the APKs of the edge.

All of EVE’s functionality is provided by a series of individual Go microservices that run in full isolation from each other, similar to the pioneering ideas of radical isolation introduced by Qubes OS. Our ultimate goal is to make each of those microservices a standalone unikernel running directly on top of a type-1 hypervisor without requiring any operating system at all. We are planning to leverage the excellent work done by the AtmanOS community in order to achieve that.

All of EVE’s microservices and infrastructure elements (think boot loader, Linux kernel, etc.) are tied together into a Linuxkit-like distribution that allows us to provide bootable EVE images ready to be deployed on Intel- and ARM-based edge hardware.

Our root-of-trust architecture leverages TPM and TEE hardware elements and provides a solid foundation for implementing flexible secret management, data encryption and measured boot capabilities without burdening application developers with any of that complexity.
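As a rough illustration of how that complexity can be hidden from application developers, the hypothetical Go sketch below shows a platform-level secret store that applications program against while the TPM/TEE details stay inside the platform. The interface and names are invented for this sketch and are not EVE's actual API.

```go
// Hypothetical sketch only: a platform-level secret store that hides the
// TPM/TEE details from edge applications. The interface and the in-memory
// stand-in below are illustrative, not EVE's real API.
package secrets

import "errors"

// SecretStore is what an edge application would program against. A platform
// implementation would seal data to the hardware root of trust (e.g. a TPM)
// and refuse to unseal it if measured boot fails.
type SecretStore interface {
	Seal(name string, plaintext []byte) error
	Unseal(name string) ([]byte, error)
}

// memStore is a toy, non-secure stand-in used only to keep the sketch
// self-contained.
type memStore struct {
	data map[string][]byte
}

func NewMemStore() SecretStore {
	return &memStore{data: make(map[string][]byte)}
}

func (m *memStore) Seal(name string, plaintext []byte) error {
	m.data[name] = append([]byte(nil), plaintext...)
	return nil
}

func (m *memStore) Unseal(name string) ([]byte, error) {
	p, ok := m.data[name]
	if !ok {
		return nil, errors.New("secret not found")
	}
	return append([]byte(nil), p...), nil
}
```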

Finally, on the connectivity side, EVE offers flexible networking capabilities to its Edge Containers through transparent integration of the LISP protocol and crypto-routing. That way, EVE can provide SD-WAN and mesh networking functionality right out of the box, without requiring additional integration efforts.
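To picture what "out of the box" connectivity could look like from a workload's point of view, here is a hypothetical Go sketch of a declarative, per-workload network request: join a mesh overlay, or steer traffic into a particular cloud, with the engine deciding how behind a virtual NIC. The field names are invented for illustration and are not EVE's actual configuration schema.

```go
// Hypothetical sketch: declarative per-workload network configuration of the
// kind EVE's connectivity model implies (mesh overlay, or traffic steered
// into a specific cloud), expressed as Go types. Field names are invented
// for illustration and are not EVE's actual schema.
package netintent

type OverlayNetwork struct {
	Name string // e.g. "factory-mesh"
	// Workloads on the same overlay can reach each other directly,
	// regardless of NATs or firewalls in between.
}

type CloudUplink struct {
	Provider string // e.g. "aws", "azure", "gcp"
	VPCID    string // target VPC/VNet; credentials are handled by the platform
}

// WorkloadNetwork describes what an Edge Container asks for; the engine
// decides *how* (overlay, IPsec tunnel, etc.) behind a virtual NIC.
type WorkloadNetwork struct {
	Overlay *OverlayNetwork // join a mesh overlay, if non-nil
	Uplink  *CloudUplink    // route northbound traffic into a cloud, if non-nil
}
```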

Putting it all together, those are the main pieces that make up EVE’s internal architecture.

While this architecture may seem complex and daunting at times, we’re rapidly investing in documenting it and making it more flexible to work with. The EVE community shares the spirit of the Apache Way and believes in “Community over Code.” We welcome any and all types of contributions that benefit the community at large, not just code contributions:

  • Providing user feedback;
  • Sharing your use cases;
  • Evangelizing or collaborating with related products and technologies;
  • Maintaining our wiki;
  • Improving documentation;
  • Contributing test scenarios and test code;
  • Adding or improving hardware support;
  • Fixing bugs and adding new features.

The most important part of Project EVE is that it’s an open standard for the community, designed to make it easier for others to create and deploy applications for the edge. Now that the code is officially open sourced through LF Edge, it’s also available for anyone to contribute to and explore.

Shaposhnik: I think through the introduction, it's pretty clear who I am. If you're interested in talking to me about some of the other things that I do in the open source, feel free to do that. I happen to be very involved in Apache Software Foundation and Linux Foundation.

Today, we will be talking about edge computing. Let's start by defining the term, what is edge computing? I think we started a long time ago with IoT, Internet of Things. Then Cisco introduced this term called fog computing, which was telco-ish, IoT view. I think edge computing to me is very simple. It is basically cloud native IoT. It is when the small devices, I call them computers outside of data centers, they start to be treated by developers in a very cloud native way. People say, "We've been doing it for years. What's different?" The difference is it's all of the APIs and all of the things that we take for granted in the cloud and even in the private data center today. That actually took time to develop. We didn't start with Kubernetes, and Docker, and orchestration tools, and mesh networks. We started with individual machines. We started with individual rackable servers. That's basically what IoT is still, individual machines. The whole hope is that we can make it much better and much more exciting by applying some of the cloud native paradigms like liquid software, pipeline delivery, CI/CD, DevOps, that type of thing, but with the software running outside of your data center.

When I talk about edge, let's actually be very specific, because there are different types of edge. I will cover the edge I will not be talking about. Very specifically, let's talk about the edge that's very interesting to me, and I think it should be interesting to all of you. These are the type of devices that some people called deep edge, some people call enterprise edge. These are basically computers that are attached to some physical object. That physical object could be a moving vehicle. It could be a big turbine generating electricity. It could be a construction site. The point being is that something is happening in the real world and you either need to capture data about that something, or you need to drive the process of that something. Manufacturing is a really good example. You have your pipeline. You're manufacturing your product. You need to control that process. You have a computer that is typically called industrial PC attached to it. Same deal with a construction site, or even your local McDonald's. In McDonald's, you want to orchestrate the experience of your customers. You have a little computer that's attached to the cash register. You have a little computer that's attached to the display, and all of that needs to be orchestrated.

What I'm not talking about, I'm not actually talking about two things. I'm not talking about Raspberry Pis. There's definitely a lot of excitement about Raspberry Pis. It's interesting because if you think about the original motivation for the Raspberry Pi, it was to give underprivileged kids access to computing. It was basically to replace your personal laptop or desktop with essentially a very inexpensive device. The fact that Raspberry Pis now find their way into pretty much every single personal IoT project, there's almost a byproduct of how they designed the thing. I am yet to see Raspberry Pis being used for business, most of the time they just stop at the level of you personally doing something, or maybe you doing something with your friends, your hackerspace. Today, we'll not be talking about any of that. The reason we're not talking about that is because just like with container orchestration and Docker, you don't really need those tools unless you actually do some level of production. You don't really need those tools if you're just tinkering. You don't need Kubernetes to basically run your application if you're just writing an application for yourself. You only need Kubernetes if that is something that actually generates some business. We will not be talking about Raspberry Pis. We'll not be talking about telco edge, edge of the network, all of that.

Even this slice of the edge computing alone, given various estimations, represents a huge total addressable market. The biggest reason for that is the size of the data. These computers are connected to something that is in the real world. The data originates in the real world. The previous presentation today about self-driving vehicle from Uber is a perfect example of that. There's so much data that the vehicle is gathering, even if it was legally allowed, it is completely impossible to transfer all of that data to the big cloud in the sky for any processing. You have to orchestrate that behavior on the edge. As practitioners, we actually have to figure out how to do that. I was a little bit underwhelmed that Uber is focusing more on the machine learning. I understand why, but I'm an infrastructure guy. Today, I will be talking to you about infrastructure, how to make those types of applications easily deployable.

The good news is the total addressable market. The bad news is that it's a little bit of a situation like building the airplane while it's in flight. I think it would be fair to say that edge computing today is where cloud computing was in 2006. 2006, Amazon was starting to introduce EC2. Everybody was saying, it's crazy, it will never work. People at Netflix started doing microservices. Everybody says it's crazy, it will never work. The rest is history. Edge computing is a little bit of that. My goal today is to give you enough understanding of the space, to give you enough understanding of the challenges in this space but also the opportunities in this space. Also, explain maybe a little bit of the vocabulary of this space so you can orient yourself. I cannot give you the tools. I cannot really give you something that you will be immediately productive at your workplace, the same way that I can talk about Kubernetes, or Kafka, or any other tool that's fairly mature. Edge computing is just happening in front of our eyes. To me, that's what makes it exciting.

In a way, when I say cloud native, to me, edge computing represents basically one final cloud that we're building, because we've built a lot of the public clouds. There's Google. There is Microsoft. There is obviously Amazon. All of these are essentially in the business of getting all of the applications that don't have to have any physicality attached to them. What we're trying to do is we're trying to basically build a distributed cloud from the API perspective that will be executing on the equipment that doesn't belong to the same people who run public clouds. Edge computing is where ownership belongs to somebody else, not the infrastructure provider. From any other perspective, it's just the cloud. People always ask me, "If edge is just another cloud, can we actually reuse all of the software that we developed for the cloud and run it on these small computers"?

Project EVE Architecture Overview

It used to be a challenge even to do that, because those computers used to be really small. The good news now is that the whole space of IoT bifurcated. The only constraint that you have from now on is power budget. It might still be the case that you have to count every single milliamp. If you're in that type of a business, you're doing essentially snowflakes and bespoke things all the time. There's really no commonality that I can give you because everything has to be so super tightly integrated, because you're really in a very constrained power budget. For everything else, where power is not a problem, silicon cost used to be a problem, but that's not the case anymore. Thanks to the economy of scale, you can basically get Raspberry Pi class devices for essentially a couple dozen bucks. It actually costs more to encase them in a way that would make them weatherproof than to actually produce the silicon.

The computers are actually pretty powerful. These are the type of computers we used to have in our data centers five years ago. Five years ago, public cloud existed. Five years ago, Kubernetes already existed. Docker definitely existed. The temptation is to take that software and run it at the edge. There have been numerous attempts to rub some Kubernetes on it because, obviously, that's what we do. We try to reuse as much as possible. Pretty much every attempt of reusing the implementation that I know of failed. I can talk in greater details of why that is. APIs are still very useful. If you're taking the implementation that Kubernetes gives you today, that will not work for two reasons. First of all, it will not work because of the network issues. All of those devices happen to be offline more than they are online. Kubernetes is not happy about that type of situation. Second of all, and this is where you need to start appreciating the differences of why edge is different, interestingly enough, in the data center, the game that Kubernetes and all of these orchestration technologies play is essentially a game of workload consolidation. You're trying to run as many containers on as few servers as possible. The scalability requirements that we're building the Kubernetes-like platforms with are essentially not as many servers and tons of containers and applications. On the edge, it's exactly the reverse. On the edge, you basically have maybe half a dozen applications on each box, because boxes are ok, but they're still 4, 8 gigs of memory. It's not like your rackable server, but you have a lot of them.

Here's one data point that was given to us by one of our biggest customers. There's an industrial company called Siemens. That industrial company is in the business of managing and supporting industrial PCs that are attached to all things. Today, they have a challenge of managing 10 million of those industrial PCs. By various estimations, total number of servers inside of all of the Amazon data centers is single digit millions. That gives you a feel for what scale we should actually be building this for.

Finally, the economics of the edge is not the same as with the data center. All of these challenges essentially make you think: we can reuse some of the principles that made cloud so successful and so developer friendly nowadays, but we actually have to come up with slightly different implementations. My thesis is that edge computing will be this really interesting, weird mix of traditional data center requirements and mobile requirements. Because the original edge computing, I would argue, is the Microsoft Xbox. With it we really got our first taste of what an edge computing-like platform could look like. All of the things that made it so, the platforms, Android or iOS, the mobile device management approaches, cloud, Google Play Store or Google services, all of that will actually find its way into the edge. We have to think about what that will look like. We also need to think about traditional data center architectures, like operating systems, hypervisors, all of that. I will try to outline and map out how the Linux Foundation is trying to approach this space.

Open Source Edge Computing Platforms - Overview

Edge is actually pretty diverse, not just in terms of the ownership, but also in terms of the hardware and applications. Today, let's take industrial PCs. Pretty much all of them are running Windows. They're all x86 based hardware running Windows. When I say Windows, I actually mean Windows XP. Yes, it exists. A lot of SCADA applications are still based on Windows XP. If you show up as a developer and start razzle-dazzling these customers with your cloud native microservices-based architectures, the first question that they're going to ask you is, "It's all great. This is the new stuff. What about my old stuff? I want to keep running my old stuff. Can you give me a platform that would be able to support my old stuff, while I am slowly rebuilding it in this new next-generation architecture?" That becomes one of the fundamental requirements.

Scale, we already talked about the geographic aspect of it and deployments and the maintenance. The security is also interesting. Edge computing, unlike data center is much closer to this. Because edge computing is physical, which means you cannot really rely on physical security to protect it. It's not like there is a guy holding a machine gun in front of a data center, you cannot put that guy in front of every single edge computing device. You basically have to build your platform, very similarly to how iOS and Android are protecting all of your personal data. That's not something that data center people are even thinking about, because in a data center, you have your physical security and you have your network security. We are done with that. On a perimeter, you pay a lot of attention to it, but within the data center, not so much.

Also, interestingly enough, what I like about edge is that edge is probably the hardest one to really succumb to a vendor lock-in. Because the diversity is such that not a single vendor like a big cloud provider can actually handle it all. Edge is driven a lot by system integrator companies, SIs. SIs are typically pretty vertical. There may be an SI that is specializing in industrial, in retail, this and that. That diversity is actually good news for us as developers because we will not see the same concentration of power like we're seeing in the public cloud today, so I think it's good for us.

A lot of what I will be covering in this talk relates to this other talk that was just made publicly available, which I wanted to pitch. This is the first time ever that the Microsoft Xbox team talked about how they developed the platform for Xbox. That was done about a month ago, maybe two months ago, first time ever. A lot of the same principles apply, which makes me happy because we thought about them independently. The tricks that they played are really fascinating. The challenges they faced are very similar to the edge. If you want to hear from somebody who can claim that they successfully developed an edge platform, listen to those guys. I'm talking about a platform that's still being developed. Mine can still fail; theirs is pretty successful.


Enterprise Edge Computing with Project EVE - Jason Shepherd, Zededa

Let's switch gears a little bit and talk about how Linux Foundation got involved in all of this. I shouldn't be the one to tell you that Cloud Native Compute Foundation has been super successful. In a way, I would say that Kubernetes was the first Google project that was successful precisely because of CNCF. I love Google, but they have a tendency of just throwing their open-source project over the wall and basically say, "If you like it, use it, if you don't, not our problem." Kubernetes was the first one where they actively tried to build a community. The fact that they went and donated it to Linux Foundation, and that was the anchor tenant for the Cloud Native Compute Foundation, I think made all the difference. Obviously, Linux Foundation itself was pretty happy about this outcome. They would like to do more of it.

The thought process went exactly like what I was talking about. When I say inside of data centers, I mean public cloud or your private data center. It doesn't matter. It's just a computer inside of a data center. For all of that, there's basically a forum of technologists that can decide, what is the common set of best practices that we all need to apply to the space to be more productive, more effective? That's CNCF, Cloud Native Compute Foundation. For all of the computers outside of data centers, it feels like we at least need to provide that type of forum even if we don't really have an anchor tenant like Kubernetes still. We need to give people a chance to talk among themselves, because otherwise there is really no way for them to synchronize on how the technology gets developed. That's LF EDGE.

Linux Foundation Edge Initiative was announced, not that long ago, actually, this year. It was announced in January, February this year. My company, ZEDEDA, we ended up being one of the founding members. We donated our project. There are a lot of companies in the space that are now part of the LF EDGE, so if you're interested, you can go to this lfedge.org website. The membership is pretty vast at this point. These are the premium members. There are also tons of general members. A lot of the good discussions are already happening within LF EDGE.

To give you a complete picture, what does LF EDGE cover? LF EDGE basically covers all of the computers outside of data centers. It starts with what we consider to be partial edge. A partial edge would be a quasi data center. It's not quite a data center, but it looks almost like a data center if you squint. A good example of that would be a telco central office, a telco CO. It's not really built to the same specification that a telco data center or a hyperscale data center would be built for, but a lot of technologies still apply. That's definitely in scope for LF EDGE. Then we basically go to telco access points. These are already physical devices. We're talking base stations. We're talking 5G deployments. These are all of the things in the CD infrastructure, or any infrastructure that would have to run some compute on them. That's definitely in scope for LF EDGE. Both of these are pretty dominated by telcos today, for good reason, because they're probably the best example of that type of an edge computing.

Then there are two other examples of edge. One that I will spend a lot of time talking about, we call it, for now, enterprise edge. This is basically all of those industrial PCs, IoT gateways. An example of the enterprise edge would be also a self-driving vehicle. Uber or Tesla building it would be also an example. Finally, there's obviously consumer edge. This is all of your washers, and dryers, and your refrigerators, all of that is in scope for LF EDGE. Every single one of these areas basically has a project that was donated by one of the founding companies. HomeEdge is from Samsung, which is not surprising because they're making all of these devices that you buy. Enterprise edge is us, ZEDEDA, and a few big enterprise companies like Dell, those types of guys. There's project Akraino that's dominated by telcos.

Interestingly enough, I have a friend of mine from Dell, Jason Shepherd, who keeps joking that this edge thing, it's very similar to how this country was settled. Because it feels we're now running away from the big hyperscale cloud providers, just like in the good old days people were running away for big businesses on the East Coast. The only place for us to actually build this exciting technology now is on the edge because everything else is dominated, and you have to join Google or Facebook to have a play in there. Go West, young man, go Edge.

These are the projects. I will be specifically talking about one of them, Edge Virtualization Engine. Check out the rest on the Linux Foundation website. I think you will find it very useful. Edge Virtualization Engine is what was donated by my company, ZEDEDA. We're actually working very closely with Fledge. Fledge is a middleware that runs on top of the project EVE. EVE stands for Edge Virtualization Engine.

Specifically, what requirements does EVE try to address? We basically approach looking at these boxes essentially from the ground up. We feel that we have to take control pretty much from the BIOS level up. I will talk about why that is important, because a lot of the technology that you would find at the BIOS and board management level in the data center simply doesn't exist on the edge. For those of you who know BMCs and iLOs, those things are not present on the edge for obvious reasons, because the control plane is not really to be had on the edge. Who are you going to talk to even if you have a BMC? Which creates an interesting challenge for how you can cut down on BIOS, and things like that. We feel that we need to start supporting hardware from the ground up. The hardware at the same time has to be zero touch. The experience of actually deploying the edge computing device should be as much similar to you buying a mobile device as possible. You get a device with an Android pre-installed. You turn it on, and you can run any applications that are compatible with an Android platform, so zero touch deployment.

We also feel that we need to run legacy applications. The legacy applications would include Windows XP. For Windows XP, you actually have to make sure that the application can access a floppy drive. That's a requirement. You also need to run real-time operating systems for control processes. You need to basically do hard partitioning of the hardware to guarantee the real-time SLAs on these applications. You need to build it at IoT scale, but what it really means is it needs to be at the same scale that all of the services that support your mobile devices operate at. What it means is that when you talk about edge computing, just building a service, a control plane in a single data center is not good enough, because your customers will be all over the place, sometimes even in Antarctica, or in the middle of the ocean. That also happens. You have to figure that one out. The platform has to be built with zero trust, absolutely zero trust, because we all know the stories of hacks that happened at uranium enrichment plant at Iranian facilities. The attack vector was very simple. It was a physical attack vector. Those things will keep happening unless we secure the platforms, and make them trustworthy as much as possible.

Finally, and that's where all of you come in, those platforms have to be made cloud native, in a sense that what APIs we give to developers to actually provide applications on top of them. Because if you look at the state of the industry today, and I already scared you at least a little bit with my Windows XP story, but Windows XP is actually a good story. The rest of the industry is still stuck in the embedded mindset. It's not a good embedded mindset. It's not like using Yocto or something. It's using some god-awful, embedded operating system that the company purchased 12, 15, 20 years ago, where people cannot even use modern GCC to compile the binary. That's the development experience in the edge and IoT today. I think it is only if we allow the same developers who built the cloud to actually develop for these platforms, it's only then that edge computing will actually take off. Because we are artificially restricting the number of innovative people that can come to the platform by not allowing the same tools that allowed us to make cloud as successful as it is today.

I talked a lot about various things that we plan to tackle. As developers, when I talk about cloud native, people tend to really just focus and assume app deployments. They're like, "Give me app deployments, and I'm done." The trouble is, app deployments, the way we think about them in a data center is just the tip of the iceberg on the edge. My favorite example that I give to everyone is, even if you assume virtualization, on the edge you basically have to solve the following problem. Suppose you decided on Docker containers, and now there is one Docker container that needs to drive a certain process, and another Docker container that needs to get a certain set of data. The process and the data happened to be connected to the single GPIO. This is a single physical device that basically has a pin out. Now you're in business of making sure that one container gets these two pins, and the other container gets those two pins. It's not something that would even come up as a problem in a data center. Because in a data center, all of your IO is basically restricted to networking, maybe a little bit of GPU. That's about it. Edge, is all about IO. All of that data that we're trying to get access to and unlock, that is the data that we can only access through a reasonable IO.
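To make that IO-plumbing problem concrete, here is a minimal Go sketch, under invented names, of workloads declaring the GPIO pins they need and the platform checking that no pin is handed to two containers. It illustrates the problem described above rather than EVE's actual mechanism.

```go
// Hypothetical sketch: declaring GPIO pin assignments per workload and
// checking that no pin is given to two containers. This illustrates the
// IO-plumbing problem described above; it is not EVE's actual mechanism.
package main

import "fmt"

type Workload struct {
	Name     string
	GPIOPins []int // physical pins this container needs exclusive access to
}

// validate returns an error if two workloads claim the same pin.
func validate(workloads []Workload) error {
	owner := map[int]string{}
	for _, w := range workloads {
		for _, pin := range w.GPIOPins {
			if prev, taken := owner[pin]; taken {
				return fmt.Errorf("pin %d claimed by both %s and %s", pin, prev, w.Name)
			}
			owner[pin] = w.Name
		}
	}
	return nil
}

func main() {
	workloads := []Workload{
		{Name: "process-control", GPIOPins: []int{17, 18}},
		{Name: "vibration-analytics", GPIOPins: []int{22, 23}},
	}
	if err := validate(workloads); err != nil {
		fmt.Println("invalid deployment:", err)
		return
	}
	fmt.Println("pin assignment is conflict-free")
}
```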

There are a lot of interesting plumbing challenges that need to be solved first before we can even start deploying our Docker containers. Docker containers are great. I think the thesis that we have at LF EDGE, at least within the project EVE, is basically very similar to what you would see in a data center, but with a certain set of specific details attached to it. We feel that the edge needs to be treated exactly like you treat your Kubernetes cluster. The physical nodes, like your pods, will be out there. There will be a controller sitting typically in the cloud, or it can sit on-prem, either one. All of these devices will basically talk to the controller just like your pods talk to the Kubernetes controller. Then somebody deploying the applications would talk to the controller, typically through a Kubernetes-like API. It is very much guaranteed to be a Kubernetes-like API. I think the API itself is great. That's very familiar to all of you. The question is, how do we build the layer that actually makes it all possible? That's where the project EVE comes in.

If I were to go through EVE's architecture, high level view, very quickly. It all starts with the hardware. Actually, it starts with the physical devices that you attach to the hardware. Then there needs to be some operating system that would allow you to do all of the above. That operating system needs to be open source. It needs to be Android of the edge type of an offering. That operating system will talk to the control plane. The control plane will sit in the cloud. On top of that offering of an operating system, you would be running your applications just like you do today in a data center, so a very typical, very familiar architecture.

Typically, your applications will talk to the big clouds in the sky from time to time, because that's where the data ends up anyway. You need to help them do that. Because a lot of times, people will talk to me and say, "I'm deploying my edge application today using Docker." I'm like, "That's great." They're like, "Now we need to make sure that the traffic flows into this particular Amazon VPC. How can we do that?" It just so happens that now you have to read a lot of documentation, because there's strongSwan involved, there's IPsec. It's not really configured by default. It's like, how can we actually connect the big cloud in the sky with this last cloud that we're building called edge computing? That has to come out of the box. These are essentially the requirements. That's the high-level architecture. I will deep dive into one specific component, which is EVE today.

State of the Edge: Exploring the Intersection of IoT, AI, 5G and Edge Computing

What we're trying to accomplish is, at the open-source layer, we need to standardize on two components. One is the runtime itself. The other one is the notion of an application. An application we're now trying to standardize we're calling that standard edge containers. The runtime is project EVE. At the top you basically have catalogs, and you have control planes. That's where companies can innovate and monetize. I would expect a lot of big cloud providers to basically join LF EDGE and essentially start building their controller offerings. Just like Amazon today gives you a lot of managed services, that will be one of the services that they would give you.

Deep diving into project EVE. EVE today is based on the type-1 hypervisor, currently Xen. We actually just integrated patches for ACRN. ACRN is Intel's type-1 hypervisor. It's a pretty simple layered cake, very traditional virtualization architecture. I will explain why virtualization is involved. It's hardware, a hypervisor, then there's a bunch of microservices that are running on that hypervisor. Finally, you get to run your containers.

That is to say that we're building the very same architecture that Android had to build for the mobile. The biggest difference being that Android built it in 2003. They essentially answered the same questions that we're answering just in a different way, because those were different times. The hardware was different. The questions are still the same. The questions are, how can you do application and operating system sandboxing because you don't want your applications to affect the operating system and vice versa? How do you do application bundling? How do you do application deployment? What hardware do you support? We are answering it more closely to a traditional virtualization play. Android basically did it through the sandboxing on top of JVM, because it made sense at the time. At the end of the day, I think Android also had this idea in mind that mobile platforms will only be successful if we invite all of the developers to actually develop for them. At the time developing for mobile was painful. It was that type of an embedded development experience. It's god-awful compilers, tool chains from the '80s. One of the key pieces of innovation of Android was like, let's actually pick a language that everybody understands and can program in called Java. We're essentially doing the same, but we're saying, language nowadays doesn't matter because we have this technology called Docker container. Language can be anything. It's the same idea of opening it up to the biggest amount of people who can actually bring their workloads to the platform.

EVE happens to be a post-, post-modern operating system. When I say it like that, I've built a couple of operating systems. I used to work at Sun Microsystems for a long time. I've built a couple of those. I used to hack on Plan 9. I spent a bit of time doing that. All throughout my career, an operating system wanted to be a point of aggregation for anything that you do, hence packaging, shared libraries. An operating system wanted to be that point, that skeleton on which you hang everything. What happened a few years ago with basically the help of virtualization and technologies like unikernels, and things like that, is that we no longer view an operating system as that central aggregation point. An operating system these days is basically just enough operating system to run my Docker engine. I don't actually update my operating system, hence CoreOS. I don't really care about my operating system that much. I care about it running a certain type of workload. That's about it. That's what I mean by post-, post-modern operating system. It is an operating system in support of a certain type of workload. In the case of EVE, that workload happens to be the edge container.

Testing Challenges and Approaches in Edge Computing

Inside of EVE, there is a lot of moving parts. I will be talking about a few of those today. If you're interested, we actually have a really good documentation, which I'm proud of, because most of the open source projects lack that aspect of it. Go to our GitHub if you want to read some of the other stuff, so it's LF EDGE EVE, and click on the docs folder. There's the whole design and implementation of EVE that would be available to you. Let's quickly cover a few interesting bits and pieces. Here, I'm doing this hopefully to explain to you that what we're building is legit, but also maybe generate some interest so you can help us build it. If anything like that sounds interesting to you just talk to me after the presentation, we can figure out what pull request and GitHub issues I can assign to you.

EVE was inspired by a few operating systems that I had privilege to be associated with, one is Qubes OS. How many of you do know about Qubes OS? That's surprisingly few. You absolutely should check out Qubes OS. Qubes OS is the only operating system that Edward Snowden trusts. That's what he's running on his laptop, because that is the only one that he trusts. When he was escaping, his whole journey was Qubes OS that was running on his laptop. It's not perfect, but it's probably the best in terms of security thinking that I have seen in a long while.

Then there is Chrome OS. It's basically this idea that you can take an operating system and make it available on devices that you don't really manage. SmartOS was like Chrome OS or CoreOS, but derived from Solaris. EVE today is based on the type-1 hypervisor. People always ask me, why type-1? Why is KVM not allowed? The answer is simple. It's that requirement for the real-time workloads. Yes, patches for the real-time Linux kernel exist. They are really tricky. If you're talking about a pretty heterogeneous set of hardware, it's actually really tricky to maintain this single view of guaranteeing that your scheduler in the Linux kernel would really be real-time. We use type-1 hypervisors, with Xen and ACRN as our choices today. We're running containers. We're running VMs. We're running unikernels. Basically, everything gets partitioned into its own domain by the hypervisor, but those domains can be super lightweight. With projects like Firecracker, that becomes faster and faster and pretty much indistinguishable from just starting a container.

DomU, basically where all of the microservices run, that is based on LinuxKit. LinuxKit is one of the most exciting projects in building specialized Linux-based distributions that I found in the last five years. It came out of Docker. It basically came out of Docker trying to build Docker Desktop. LinuxKit is how Docker manages that VM that happens to give you all of the Docker Desktop Services. It's also based on Alpine Linux. We get a lot of Alpine Linux dependencies.

We're driving towards unikernel architecture. Every single instance of a service will be running in its own domain. All of our stuff is implemented in Go. One of the really interesting projects that we're looking at is called AtmanOS, which basically allows you to do this, see that line, GOOS equals Xen, and you just do, go build. AtmanOS figured out that you can create very little infrastructure to allow binary run without an operating system, because it so happens that Go is actually pretty good about sandboxing you. Go needs a few services from an operating system, like memory management, scheduling, and that's about it. All of those services are provided directly by the hypervisor. You can actually do, go build, with GOOS Xen and have a binary that's a unikernel.
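As a sketch of the developer experience being described, an ordinary Go service like the one below is, under the AtmanOS approach, meant to be cross-compiled into a unikernel with something like GOOS=xen go build; the exact invocation is as quoted in the talk, and the program here is just a placeholder.

```go
// A trivial Go service. Under a unikernel-capable toolchain such as the
// AtmanOS work described above, the idea is that an ordinary program like
// this could be built with something along the lines of:
//
//   GOOS=xen go build -o hello-unikernel .
//
// and booted directly on the hypervisor with no general-purpose OS.
// (The exact GOOS value is as quoted in the talk, not verified here.)
package main

import (
	"fmt"
	"time"
)

func main() {
	for {
		fmt.Println("hello from a would-be unikernel")
		time.Sleep(5 * time.Second)
	}
}
```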

Edge Computing Standardisation and Initiatives

Finally, we're actually trying to standardize edge containers, which is pretty exciting. We are trying to truly extend the OCI specification. There have been a few areas in the OCI that we're looking at. Image specification itself doesn't require much of a change. The biggest focus that we have is on registry support. We don't actually need runtime specification because OCI had this problem that they needed to integrate with other tools. Remember, classical operating system, when all the classical operating systems were black box execution engine. We don't need to integrate with anything but ourselves, hence runtime specification is not really needed. Good news is that there are actually a lot of the parallel efforts of extending the OCI into supporting different types of containers. Two that I would mention are Kata Containers, which are more traditional OCI, but also Singularity Containers, which came more from HPC and giving you access to hardware. Weaveworks is doing some of that same thing. Check them out. Obviously, Firecracker is pretty cool as a container execution environment that also gives you isolation of the hypervisor.

Top three goals that we have for edge containers are, basically allow you not only file system level composition, which is what a traditional container gives you. You can basically compose layers. We happen to be glorified tarball. You do everything at the level of the file system. You add this file, you remove that file. We're also allowing you block-level composition. You can basically compose block-level devices, which allows you then to manage disks, VMs, unikernels, this and that. We allow you hardware mapping. You can basically associate how the hardware maps to a given container, not at the runtime level, but at the container level itself.
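Putting those three goals together, a hypothetical Edge Container manifest might look like the Go types below. This is a sketch of the idea only; the real specification is still being worked out in LF Edge.

```go
// Hypothetical sketch of an Edge Container manifest covering the three
// goals above: file-system layers (as in OCI today), block-level
// composition, and hardware mapping. Illustrative only; the real
// specification is still being worked out in LF Edge.
package edgecontainer

type Manifest struct {
	// Ordinary OCI-style file-system layers (content-addressed blobs).
	Layers []string

	// Block-level composition: whole disk images for VMs or unikernels.
	Disks []Disk

	// Hardware mapping: which physical resources this workload gets.
	Hardware []HardwareAssignment
}

type Disk struct {
	Image    string // e.g. a raw/qcow2 image reference in the registry
	SizeMB   int
	ReadOnly bool
}

type HardwareAssignment struct {
	Kind string // e.g. "gpio", "serial", "usb", "gpu"
	ID   string // platform-specific identifier for the physical resource
}
```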

We still feel that the registry is the best thing that ever happened to Docker. The fact that you can produce a container is not interesting enough. The fact that you can share that container with everybody else, that is interesting. We feel that the registry basically has to take onto an ownership of managing as many artifacts as possible, which seems to be the trajectory of OCI anyway. Things like Helm charts and all the other things that you need for orchestration, I would love for them to exist in the registry. Because that becomes my single choke point for any deployment that then happens throughout my enterprise.

EVE's networking is intent based. You will find it very familiar to any type of networking architecture that exists in VMware or any virtualization product, with a couple of exceptions. One is the cloud network, where the intent is, literally, connect me to that cloud. I don't care how. I'm willing to give you my credentials, but I need my traffic to flow into the Google, Amazon, or Microsoft cloud. Just make it happen. The way we do make it happen is that each container or each VM, because everything is virtualized, basically gets a virtualized NIC, a network interface card. What happens on the other side of that NIC? Think of it as one glorified sidecar, but instead of a sidecar that has to communicate through the operating system, we communicate through the hypervisor. Basically, the VM is none the wiser of what happens to the traffic. All of that is configured by the system, which allows us really interesting tricks, like networking that Windows XP into the Amazon cloud. Otherwise, it would be impossible. You can install IPsec on Windows XP, but it's super tricky. With Windows XP that just communicates over the virtualized NIC while the traffic happens to flow through IPsec and into the Amazon cloud, that Windows XP instance is none the wiser.

Another cool thing that we do networking-wise is called the mesh network. It is basically based on the standard called LISP, which is RFC 6830. It allows you to have a flat IPv6 overlay namespace where anything can see anything else. That IPv6 is a true overlay. It doesn't change if you move the device. What it allows you to do is basically bypass all of the NAT boxes, and all of the things that may be in between this edge device and that edge device, so that they can directly communicate with each other. Think about it as one gigantic Skype or peer-to-peer system that allows everything to basically have a service mesh that is based on IPv6 instead of some interesting service discovery. That's networking.
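One way to picture a stable overlay identity is to derive it from the device's public key, so the address stays the same wherever the box is plugged in. The Go sketch below shows such a derivation under invented parameters (a locally administered IPv6 prefix and a SHA-256 hash); it illustrates the idea rather than EVE's actual EID scheme.

```go
// Hypothetical sketch: derive a stable IPv6 overlay address from a device's
// public key, so the identity never changes when the device moves networks.
// The prefix and SHA-256 derivation are invented for illustration; this is
// not EVE's actual EID scheme.
package main

import (
	"crypto/sha256"
	"fmt"
	"net"
)

func overlayAddress(pubKey []byte) net.IP {
	sum := sha256.Sum256(pubKey)
	ip := make(net.IP, net.IPv6len)
	ip[0] = 0xfd // locally administered prefix, for the sketch only
	copy(ip[1:], sum[:net.IPv6len-1])
	return ip
}

func main() {
	key := []byte("example device public key bytes")
	fmt.Println("overlay address:", overlayAddress(key))
}
```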

On trust, we're basically building everything through the root-of-trust that's rooted at the hardware element. On Intel most of the time it happens to be TPM. TPMs exist in pretty much every single system I've seen. Yet nobody but Microsoft seems to be using them. Why? Because the developer support still sucks. On Linux, it actually takes a lot of time to enable TPM and configure TPM. We're virtualizing the TPM. We use it internally, but then the applications, the edge containers get the virtualized view of the TPM. We also deal with a lot of the crap that exists today in the modern x86 based system. Because a lot of people don't realize it but there is a lot of processors and software that runs on your x86 system that you don't know about. Your operating system, even your hypervisor is not the only piece of software. We're trying to either disable it or make it manageable. Our management starts from the BIOS level up. Thanks to Qubes for pioneering this. Everything runs in its own domain. We're even disaggregating device drivers. If you have a device driver for Bluetooth and it gets compromised, since it's running in its own domain, that will not compromise the rest of the system. Stuff like that.

EVE's software update model is super easy for applications. It's your traditional cloud-native deployment: you push to the cloud and the application happens to run. If you don't like it, you push the next version. You can do canary deployments. You can do all of the stuff that you expect to see from Kubernetes. EVE itself needs to be updated too, and that's where ideas from Chrome OS and CoreOS kick in. It's pretty similar to what happens on your cell phone: it's dual partitioned, with multiple levels of fallback and lots of burn-in testing that we do. We're trying to avoid the need for physical contact with edge nodes as much as possible, which means that a lot of things that would normally have you press a key have to be simulated by us. That's a whole tricky area, and it's something that we also do in EVE. We are really big fans of the open-source BIOS reimplementation from coreboot, and especially u-root on top of coreboot. That allows us to have a complete open-source stack from the BIOS level up.
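As a rough illustration of the dual-partition idea, here is a minimal Python state machine in the spirit of Chrome OS / CoreOS style A/B updates. It is a sketch of the general pattern, not EVE's actual updater.

# Minimal sketch of a dual-partition (A/B) update with fallback.
class ABUpdater:
    def __init__(self):
        self.partitions = {"A": "v1", "B": None}
        self.active, self.standby = "A", "B"
        self.pending_confirm = False

    def stage_update(self, image: str) -> None:
        """Write the new image to the standby partition only."""
        self.partitions[self.standby] = image

    def reboot_into_standby(self) -> None:
        """Boot the new image, but keep the old one until it is confirmed good."""
        self.active, self.standby = self.standby, self.active
        self.pending_confirm = True

    def report_health(self, healthy: bool) -> str:
        if healthy:
            self.pending_confirm = False
            return f"commit {self.partitions[self.active]}"
        # Health / burn-in check failed: fall back to the previous partition.
        self.active, self.standby = self.standby, self.active
        return f"rollback to {self.partitions[self.active]}"

u = ABUpdater()
u.stage_update("v2")
u.reboot_into_standby()
print(u.report_health(healthy=False))   # -> rollback to v1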

The most interesting work that we're doing with the TPM, and I have to plug it because I get excited every single time, is that we're trying to do a hardware-protected vTPM, something that hasn't been done before, even in the data center. There's a group of us doing it; if you're interested, you can contact any one of us. TrenchBoot is the name of the project. There's Dave Smith, and LF Edge in general.

EVE itself is actually super easy to develop. That's the demo that I wanted to give, because it's not a QCon without a demo. EVE is based on LinuxKit. There is a little Makefile infrastructure that allows you to do all of the traditional operating-system-developer things. Basically, typing make run lets you run and manage the operating system. The only reason I'm mentioning this is that people get afraid a lot of times when I talk about operating system development and design, because there's a little bit of a stigma. It's like, "I need a real device. I need some JTAG connector. I need a serial port to debug it." No, with EVE, you can actually debug it all in the comfort of your terminal window on macOS.

The entire build system is Docker based. Basically, all of the artifacts in EVE get packaged as Docker containers, so it's actually super easy to develop within a single artifact. Because we're developing edge containers in parallel, we are planning to start using the same tooling for unikernel development as well, which might, interestingly enough, bifurcate and become its own project. I think that when it comes to unikernels, developers still don't really have the tools. There are a few available, like UniK and others, but there isn't really the same level of usefulness that Docker Desktop just gives me. We're looking into that as well.

Edge computing today is where the public cloud was in '06. Sorry, I cannot give you ready-made tools, but I can invite you to build the tools with me and with us at the Linux Foundation. Edge computing is the one final cloud that's left, and I think it's the cloud that will never be taken away from us. By us, I mean the people who actually run the physical hardware. As you can tell, I'm an infrastructure guy. It sucked when people stopped buying servers and operating systems and everything just moved to the cloud. My refuge is the edge. Edge computing is a huge total addressable market. As a founder of a startup company, I can assure you that there is a tremendous amount of VC activity in the space. It's a good place to be if you're trying to build a company. Kubernetes as an implementation is dead, but long live Kubernetes as an API. That stays with us. Edge computing is a lot of fun. Just help us build either EVE, a super exciting project, or one of the other projects to pick from in LF Edge in general.

Participant 1: We see the clouds, AWS, Azure, and all that. Is there L2 connectivity? Are you using, for example, the AWS Direct Connect APIs, and for Azure, ExpressRoute? That's what you're doing?

Shaposhnik: Yes, exactly.

Participant 1: I belong to Aconex, and we are delving into a similar thing; we already allow people to connect to the cloud. We'll look deeper into this.

Shaposhnik: Absolutely. That's exactly right. That's why I'm saying it's a different approach to running an operating system, because I see a lot of companies still trying to integrate with Linux, which is great. There is a lot of business in that. What we're saying is that Linux itself doesn't matter anymore. It's the Docker container that matters, and we're extending it into the edge container. A Docker container is an edge container. It almost doesn't matter what the operating system is; we're replacing all layers of it with this very built-for-purpose engine. It's still an absolutely valid approach to say, "I need Yocto," or some traditional Linux distribution that integrates with that. I think my only call to action would be: let's build tools that are applicable in both scenarios. That way we can help each other grow.

Participant 2: I know in your presentation you mentioned that edge is going to be more diverse. What's your opinion on cloud providers extending to the edge through projects like Azure Sphere and Azure IoT Edge?

Shaposhnik: They will be doing it, no question about it. I think they will come from the cloud side. Remember that long range of what's edge and what's not edge. They will basically start addressing the issues at the CO, the central office. They will start addressing the issues maybe at the MEC access points. I don't see them completely flipping and running on the deep edge. The reason for that is that, business-wise, they're not set up to do that. The only company that I see that potentially can do that is Microsoft, because if you want to run on the deep edge, you need to develop and foster your ecosystem, the same way that Microsoft developed and fostered the ecosystem that made every single PC run Windows. Amazon and ecosystem don't go together in the same sentence. Google is just confused. If anybody tackles it, it would be Microsoft, but they are distracted by so much low-hanging fruit in front of them, just moving their traditional customers into the cloud, that I don't see them applying effort in that space. It may happen in five years, but for now, running this company, at least, I don't see any of that happening.

Participant 3: What about drivers for sensors on these edge devices? It seems EVE abstracts the OS away from you, but in industrial, for instance, you need to detect things, so you need peripherals.

Shaposhnik: Correct. What about drivers? Because it's a hypervisor-based architecture, we can just assign the hardware directly to you. If you want to have that Windows XP based VM drive your hardware, we can do that. That's not that interesting, though, because we need software abstractions that make it easier for developers to not think about it. That work is still very nascent: how do you provide software abstractions for a lot of things that we took for granted, like there being a file in /dev someplace that I do something with through Yocto? Now we're flipping it around and saying, "If I'm running a Docker container, what would be the most natural abstraction for a particular hardware resource?" A lot of times, surprisingly to me, that abstraction happens to be a network socket. We can manage the driver on the other side of the hypervisor. Again, we will still run the driver in its own domain, and to all of the containers that want to use it, we will present a nice software abstraction such as a network socket.
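Here is an illustrative Python sketch of what "the abstraction is a network socket" can look like in practice: a driver-domain process serves sensor readings over TCP, and the edge container simply opens a socket. The port number and the JSON payload format are invented for the example; this is not EVE code.

# Illustrative only: a driver domain exposes a sensor as a plain network
# socket, so an edge container needs no /dev node or kernel driver,
# just a TCP connection. Port and payload format are made up.
import json, socket, threading, time

def sensor_server(host="127.0.0.1", port=9999):
    """Pretend driver domain: serve one temperature reading per connection."""
    srv = socket.create_server((host, port))
    while True:
        conn, _ = srv.accept()
        with conn:
            reading = {"sensor": "temp0", "celsius": 21.5, "ts": time.time()}
            conn.sendall(json.dumps(reading).encode())

def read_sensor(host="127.0.0.1", port=9999) -> dict:
    """What the edge container does: open a socket, read JSON, done."""
    with socket.create_connection((host, port)) as s:
        return json.loads(s.recv(4096).decode())

if __name__ == "__main__":
    threading.Thread(target=sensor_server, daemon=True).start()
    time.sleep(0.2)                      # give the server a moment to bind
    print(read_sensor())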

More Information:

https://www.infoq.com/presentations/linux-eve/

https://landscape.lfedge.org/card-mode?license=apache-license-2-0

https://www.lfedge.org/resources/publications/

https://www.lfedge.org/#

https://www.lfedge.org/news-events/blog/

https://www.lfedge.org/2021/03/12/state-of-the-edge-2021-report/

https://www.linux.com/topic/networking/project-eve-promotes-cloud-native-approach-edge-computing/

https://zededa.com/product/

https://thenewstack.io/how-the-linux-foundations-eve-can-replace-windows-linux-for-edge-computing/


Tesla Dojo and Hydranet and AI and Deep Learning with New Super Computer Dojo and D1 Chip


Tesla Has Done Something No Other Automaker Has: Assumed The Mantle Of Moore’s Law

Steve Jurvetson shared on Twitter that Tesla now holds the mantle of Moore's Law, in the same manner that NVIDIA took leadership from Intel a decade ago. He noted that the substrates have shifted several times, but humanity's capacity to compute has compounded for 122 years. He shared a log-scale chart with the details.

https://www.flickr.com/photos/jurvetson/51391518506/

The link Jurvetson shared included a detailed article explaining how Tesla holds the mantle of Moore's Law. Tesla introduced its D1 chip for the Dojo supercomputer, and he said:


“This should not be a surprise, as Intel ceded leadership to NVIDIA a decade ago, and further handoffs were inevitable. The computational frontier has shifted across many technology substrates over the past 120 years, most recently from the CPU to the GPU to ASICs optimized for neural networks (the majority of new compute cycles).”


“Of all of the depictions of Moore’s Law, this is the one I find to be most useful, as it captures what customers actually value — computation per $ spent (note: on a log scale, so a straight line is an exponential; each y-axis tick is 100x).”

“Humanity’s capacity to compute has compounded for as long as we can measure it, exogenous to the economy, and starting long before Intel co-founder Gordon Moore noticed a refraction of the longer-term trend in the belly of the fledgling semiconductor industry in 1965.”

Project Dojo: Check out Tesla Bot AI chip! (full presentation)


“In the modern era of accelerating change, it is hard to find even five-year trends with any predictive value, let alone trends that span the centuries. I would go further and assert that this is the most important graph ever conceived (my earlier blog post on its origins and importance).”

“Why the transition within the integrated circuit era? Intel lost to NVIDIA for neural networks because the fine-grained parallel compute architecture of a GPU maps better to the needs of deep learning. There is a poetic beauty to the computational similarity of a processor optimized for graphics processing and the computational needs of a sensory cortex, as commonly seen in neural networks today. A custom chip (like the Tesla D1 ASIC) optimized for neural networks extends that trend to its inevitable future in the digital domain. Further advances are possible in analog in-memory compute, an even closer biomimicry of the human cortex. The best business planning assumption is that Moore’s Law, as depicted here, will continue for the next 20 years as it has for the past 120.”

In the detailed description of the chart, Jurvetson pointed out that the popular perception of Moore's Law is that computer chips are compounding in their complexity at near-constant per-unit cost. He explained that this is one of many abstractions of the law: Moore's Law is both a prediction and an abstraction, and this particular abstraction relates to the compounding of transistor density in two dimensions. Others relate to speed or computational power.

He also added:

“What Moore observed in the belly of the early IC industry was a derivative metric, a refracted signal, from a longer-term trend, a trend that begs various philosophical questions and predicts mind-bending futures.”

“Ray Kurzweil’s abstraction of Moore’s Law shows computational power on a logarithmic scale, and finds a double exponential curve that holds over 120 years! A straight line would represent a geometrically compounding curve of progress.”



He explained that, through five paradigm shifts, the computational power that $1,000 buys has doubled every two years, and it has been doubling every year for the past 30 years. In this graph, he explained, each dot represents the frontier of the computational price-performance of the day. He gave these examples: one machine was used in the 1890 Census, one cracked the Nazi Enigma cipher in WW2, and one predicted Eisenhower's win in the 1956 presidential election.

He also pointed out that each dot represents a human drama, and that before Moore's first paper in 1965, none of them realized that they were on a predictive curve. The dots represent attempts to build the best computer with the tools of the day, he explained, and we then use those creations to make better design software and manufacturing control algorithms.

“Notice that the pace of innovation is exogenous to the economy. The Great Depression and the World Wars and various recessions do not introduce a meaningful change in the long-term trajectory of Moore’s Law. Certainly, the adoption rates, revenue, profits, and economic fates of the computer companies behind the various dots on the graph may go through wild oscillations, but the long-term trend emerges nevertheless.”

Tesla now holds the mantle of Moore’s Law, with the D1 chip introduced last night for the DOJO supercomputer (video, news summary).

Tesla’s BREAKTHROUGH DOJO Supercomputer Hardware Explained

This should not be a surprise, as Intel ceded leadership to NVIDIA a decade ago, and further handoffs were inevitable. The computational frontier has shifted across many technology substrates over the past 120 years, most recently from the CPU to the GPU to ASICs optimized for neural networks (the majority of new compute cycles). The ASIC approach is being pursued by scores of new companies; Google TPUs have now been added to the chart by popular request (see note below for methodology), as has the Mythic analog M.2.

By taking on the mantle of Moore’s Law, Tesla is achieving something that no other automaker has achieved. I used the term “automaker” since Tesla is often referred to as such by the media, friends, family, and those who don’t really follow the company closely. Tesla started out as an automaker and that’s what people remember most about it: “a car for rich people,” one of my close friends told me. (She was shocked when I told her how much a Model 3 cost. She thought it was over $100K for the base model.)

Jurvetson’s post is very technical, but it reflects the truth: Tesla has done something unique for the auto industry. Tesla has pushed forward an industry that was outdated and challenged the legacy OEMs to evolve. This is a hard thing for them to do, as there hasn’t been any revolutionary technology introduced to this industry since Henry Ford moved humanity from the horse and buggy to automobiles.

Sure, over the years, the designs of vehicles changed, along with pricing, specs, and other details, but until Tesla, none of these changes affected the industry as a whole. None of these changes made the industry so uncomfortable that it laughed at the idea before later getting scared of being left behind. The only company to have done this is Tesla, and now new companies are trying to be the next Tesla or create competing cars — and do whatever they can to keep up with Tesla’s lead.

Teaching a Car to Drive Itself by Imitation and Imagination (Google I/O'19)

For the auto industry, Tesla represents a jump in evolution, and not many people understand this. I think most automakers have figured this out, though. Ford and VW especially.


For those unfamiliar with this chart, here is a more detailed description:

Moore's Law is both a prediction and an abstraction

Moore’s Law is commonly reported as a doubling of transistor density every 18 months. But this is not something the co-founder of Intel, Gordon Moore, has ever said. It is a nice blending of his two predictions; in 1965, he predicted an annual doubling of transistor counts in the most cost effective chip and revised it in 1975 to every 24 months. With a little hand waving, most reports attribute 18 months to Moore’s Law, but there is quite a bit of variability. The popular perception of Moore’s Law is that computer chips are compounding in their complexity at near constant per unit cost. This is one of the many abstractions of Moore’s Law, and it relates to the compounding of transistor density in two dimensions. Others relate to speed (the signals have less distance to travel) and computational power (speed x density).
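Since reports attribute anything from an annual doubling to a 24-month doubling, it is worth seeing how much that assumption matters. Here is a quick back-of-the-envelope in Python (the 10-year horizon is arbitrary, chosen only for illustration):

# How much the assumed doubling period matters: growth over 10 years
# under 12-, 18-, and 24-month doubling times.
years = 10
for months in (12, 18, 24):
    doublings = years * 12 / months
    print(f"doubling every {months} months -> x{2 ** doublings:,.0f} in {years} years")
# 12 months: ~x1,024;  18 months: ~x102;  24 months: x32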

Unless you work for a chip company and focus on fab-yield optimization, you do not care about transistor counts. Integrated circuit customers do not buy transistors. Consumers of technology purchase computational speed and data storage density. When recast in these terms, Moore’s Law is no longer a transistor-centric metric, and this abstraction allows for longer-term analysis.

Tesla’s MIND BLOWING Dojo AI Chip (changes everything)

What Moore observed in the belly of the early IC industry was a derivative metric, a refracted signal, from a longer-term trend, a trend that begs various philosophical questions and predicts mind-bending futures.

Ray Kurzweil’s abstraction of Moore’s Law shows computational power on a logarithmic scale, and finds a double exponential curve that holds over 120 years! A straight line would represent a geometrically compounding curve of progress. 

Through five paradigm shifts – such as electro-mechanical calculators and vacuum tube computers – the computational power that $1000 buys has doubled every two years. For the past 35 years, it has been doubling every year. 

Each dot is the frontier of computational price performance of the day. One machine was used in the 1890 Census; one cracked the Nazi Enigma cipher in World War II; one predicted Eisenhower’s win in the 1956 Presidential election. Many of them can be seen in the Computer History Museum. 

Each dot represents a human drama. Prior to Moore’s first paper in 1965, none of them even knew they were on a predictive curve. Each dot represents an attempt to build the best computer with the tools of the day. Of course, we use these computers to make better design software and manufacturing control algorithms. And so the progress continues.

Notice that the pace of innovation is exogenous to the economy. The Great Depression and the World Wars and various recessions do not introduce a meaningful change in the long-term trajectory of Moore’s Law. Certainly, the adoption rates, revenue, profits and economic fates of the computer companies behind the various dots on the graph may go through wild oscillations, but the long-term trend emerges nevertheless.

Any one technology, such as the CMOS transistor, follows an elongated S-shaped curve of slow progress during initial development, upward progress during a rapid adoption phase, and then slower growth from market saturation over time. But a more generalized capability, such as computation, storage, or bandwidth, tends to follow a pure exponential – bridging across a variety of technologies and their cascade of S-curves.

In the modern era of accelerating change in the tech industry, it is hard to find even five-year trends with any predictive value, let alone trends that span the centuries. I would go further and assert that this is the most important graph ever conceived.

Why is this the most important graph in human history?

A large and growing set of industries depends on continued exponential cost declines in computational power and storage density. Moore’s Law drives electronics, communications and computers and has become a primary driver in drug discovery, biotech and bioinformatics, medical imaging and diagnostics. As Moore’s Law crosses critical thresholds, a formerly lab science of trial-and-error experimentation becomes a simulation science, and the pace of progress accelerates dramatically, creating opportunities for new entrants in new industries. Boeing used to rely on wind tunnels to test novel aircraft design performance. Ever since CFD modeling became powerful enough, design has moved to the rapid pace of iterative simulations, and the nearby wind tunnels of NASA Ames lie fallow. The engineer can iterate at a rapid rate while simply sitting at their desk.

Tesla unveils "Dojo" Computer Chip | Tesla AI Day 

Every industry on our planet is going to become an information business. Consider agriculture. If you ask a farmer in 20 years’ time about how they compete, it will depend on how they use information, from satellite imagery driving robotic field optimization to the code in their seeds. It will have nothing to do with workmanship or labor. That will eventually percolate through every industry as IT innervates the economy.

Non-linear shifts in the marketplace are also essential for entrepreneurship and meaningful change. Technology’s exponential pace of progress has been the primary juggernaut of perpetual market disruption, spawning wave after wave of opportunities for new companies. Without disruption, entrepreneurs would not exist.

Moore’s Law is not just exogenous to the economy; it is why we have economic growth and an accelerating pace of progress. At Future Ventures, we see that in the growing diversity and global impact of the entrepreneurial ideas that we see each year. The industries impacted by the current wave of tech entrepreneurs are more diverse, and an order of magnitude larger than those of the 90’s — from automobiles and aerospace to energy and chemicals.

At the cutting edge of computational capture is biology; we are actively reengineering the information systems of biology and creating synthetic microbes whose DNA is manufactured from bare computer code and an organic chemistry printer. But what to build? So far, we largely copy large tracts of code from nature. But the question spans across all the complex systems that we might wish to build, from cities to designer microbes, to computer intelligence.

Reengineering engineering

As these systems transcend human comprehension, we will shift from traditional engineering to evolutionary algorithms and iterative learning algorithms like deep learning and machine learning. As we design for evolvability, the locus of learning shifts from the artifacts themselves to the process that created them. There is no mathematical shortcut for the decomposition of a neural network or genetic program, no way to "reverse evolve" with the ease that we can reverse engineer the artifacts of purposeful design. The beauty of compounding iterative algorithms (evolution, fractals, organic growth, art) derives from their irreducibility. And it empowers us to design complex systems that exceed human understanding.

Tesla AI Day

Why does progress perpetually accelerate?

All new technologies are combinations of technologies that already exist. Innovation does not occur in a vacuum; it is a combination of ideas from before. In any academic field, the advances of today are built on a large edifice of history. This is why major innovations tend to be 'ripe' and tend to be discovered at nearly the same time by multiple people. The compounding of ideas is the foundation of progress, something that was not so evident to the casual observer before the age of science. Science tuned the process parameters for innovation and became the best method for a culture to learn.

From this conceptual base comes the origin of economic growth and accelerating technological change, as the combinatorial explosion of possible idea pairings grows exponentially as new ideas come into the mix (on the order of 2^n possible groupings, per Reed’s Law). It explains the innovative power of urbanization and networked globalization. And it explains why interdisciplinary ideas are so powerfully disruptive; it is like the differential immunity of epidemiology, whereby islands of cognitive isolation (e.g., academic disciplines) are vulnerable to disruptive memes hopping across, much like South America was to smallpox from Cortés and the Conquistadors. If disruption is what you seek, cognitive island-hopping is a good place to start, mining the interstices between academic disciplines.

Predicting cut-ins (Andrej Karpathy)

It is the combinatorial explosion of possible innovation pairings that creates economic growth, and it’s about to go into overdrive. In recent years, we have begun to see the global innovation effects of a new factor: the internet. People can exchange ideas like never before. Long ago, people were not communicating across continents; ideas were partitioned, and so the success of nations and regions pivoted on their own innovations. Richard Dawkins states that in biology it is genes that really matter, and we as people are just vessels for the conveyance of genes. It’s the same with ideas, or “memes”. We are the vessels that hold and communicate ideas, and now that pool of ideas percolates on a global basis more rapidly than ever before.

In the next six years, three billion minds will come online for the first time to join this global conversation (via inexpensive smartphones in the developing world). This rapid influx of three billion people into the global economy is unprecedented in human history, and so too will be the pace of idea pairings and progress.

We live in interesting times, at the cusp of the frontiers of the unknown and breathtaking advances. But, it should always feel that way, engendering a perpetual sense of future shock.

Is the ‘D1’ AI chip speeding Tesla towards full autonomy?

The company has designed a super powerful and efficient chip for self-driving, but it can be used for many other things

Tesla, at its AI Day, unveiled a custom chip for training artificial intelligence networks in data centers

The D1 chip, part of Tesla’s Dojo supercomputer system, uses a 7 nm manufacturing process and delivers 362 teraflops of processing power

The chips can help train models to recognize items from camera feeds inside Tesla vehicles

Will the just-announced Tesla Bot make future working optional for humans - or obsolete?

Elon Musk says Tesla robot will make physical work a ‘choice’

Back at Tesla’s 2019 Autonomy Day, CEO Elon Musk unveiled the company’s first custom artificial intelligence (AI) chip, which promised to propel the company toward its goal of full autonomy. The automaker then started producing cars with its custom AI chip within the same year. This year, as the world grapples with a chip shortage, the company presented its in-house D1 chip — the processor that will power its Dojo supercomputer.

Tesla's Dojo Supercomputer, Full Analysis (Part 1/2)

Tesla's Dojo Supercomputer, Full Analysis (Part 2/2)


The D1 is the second semiconductor designed internally by Tesla, following the in-car supercomputer released in 2019. According to Tesla officials, each D1 packs 362 teraflops (TFLOPs) of processing power, meaning it can perform 362 trillion floating-point operations per second.

Tesla combines 25 chips into a training tile and links 120 training tiles together across several server cabinets. In simple terms, each training tile clocks in at 9 petaflops, meaning Dojo will boast over 1 exaflop of computing power. In other words, Dojo can easily be the most powerful AI training machine in the world.
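Those figures are easy to sanity-check with a quick back-of-the-envelope in Python, using the numbers quoted above (362 BF16/CFP8 TFLOPs per D1, 25 chips per tile, 120 tiles):

# Quick sanity check on the tile and cluster figures quoted above.
d1_tflops = 362
chips_per_tile = 25
tiles = 120

tile_pflops = d1_tflops * chips_per_tile / 1_000   # ~9.05 PFLOPs per training tile
total_eflops = tile_pflops * tiles / 1_000         # ~1.086 EFLOPs across 120 tiles
print(f"{tile_pflops:.2f} PFLOPs per tile, {total_eflops:.3f} EFLOPs total")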

The company believes that AI has limitless possibilities and that its systems are getting smarter than the average human. Tesla announced that its D1 Dojo custom application-specific integrated circuit (ASIC) for AI training, presented during this year’s AI Day held last week, will be of great use in speeding up AI software workloads.

Although many companies, including tech giants like Amazon, Baidu, Intel and NVIDIA, are building ASICs for AI workloads, not everyone has the right formula or satisfies each workload perfectly. Experts reckon this is why Tesla opted to develop its own ASIC for AI training purposes.

Tesla and its foray into AI

The system, which is called the D1, forms part of the Dojo supercomputer used to train AI models inside Tesla’s headquarters. It is fair to note that the chip is manufactured by Taiwan’s TSMC using the 7 nm semiconductor node. The chip is reportedly packed with over 50 billion transistors and boasts a huge die size of 645 mm².

AI training requires two things: massive amounts of data, and a powerful supercomputer that can use that data to train deep neural nets. With over one million Autopilot-enabled EVs on the road, Tesla already has a vast dataset edge over other automakers. Now, with the introduction of an exascale supercomputer that management says will be operational next year, Tesla has reinforced that advantage.

All this work comes two years after Tesla began producing vehicles containing AI chips it built in-house. Those chips help the car’s onboard software make decisions very quickly in response to what’s happening on the road. This time, Musk noted that its latest supercomputer tech can be used for many other things and that Tesla is willing to open it up to other automakers and tech companies who are interested.


“At first it seems improbable — how could it be that Tesla, which has never designed a chip before, would design the best chip in the world? But that is objectively what has occurred. Not best by a small margin, best by a huge margin. It’s in the cars right now,” Musk said. With that, his newest big prediction is that Tesla will have self-driving cars on the road next year — without humans inside — operating in a so-called robo-taxi fleet.


Tesla introduced the Tesla D1, a new chip designed specifically for artificial intelligence that is capable of delivering a power of 362 TFLOPs in BF16 / CFP8. This was announced at Tesla’s recent AI Day event.

The Tesla D1 contains a total of 354 training nodes that form a network of functional units, which are interconnected to create a massive chip. Each functional unit comes with a quad-core, 64-bit ISA CPU that uses a specialized, custom design for transpositions, compilations, broadcasts, and link traversal. This CPU adopts a superscalar implementation (4-wide scalar and 2-wide vector pipelines).

This new Tesla silicon is manufactured on a 7 nm process, has a total of 50 billion transistors, and occupies an area of 645 mm², which means that it is smaller than the GA100 GPU used in the NVIDIA A100 accelerator, which measures 826 mm².

Each functional unit has 1.25 MB of SRAM and 512 GB/s of bandwidth in each direction on the on-chip network. The chips are joined in multichip configurations of 25 D1 units (the training tiles described above), which connect to host systems through what Tesla calls "Dojo Interface Processors" (DIPs).
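As a rough check, here is the aggregate on-die SRAM implied by those per-node numbers (354 nodes is the count given above; the total is approximate):

# Aggregate SRAM implied by the per-node figures quoted above.
nodes = 354
sram_mb_per_node = 1.25
print(f"~{nodes * sram_mb_per_node:.1f} MB of SRAM per D1 chip")   # ~442.5 MB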



Tesla claims its Dojo chip will process computer vision data four times faster than existing systems, enabling the company to bring its self-driving system to full autonomy. However, the two most difficult technological feats have not been accomplished by Tesla yet: the tile-to-tile interconnect and the software. Each tile has more external bandwidth than the highest-end networking switches. To achieve this, Tesla developed custom interconnects. Tesla says the first Dojo cluster will be running by next year.

The same technology that undergirds Tesla’s cars will drive the forthcoming Tesla Bot, which is intended to perform mundane tasks like grocery shopping or assembly-line work. Its design spec calls for 45-pound carrying capacity, “human-level hands,” and a top speed of 5 miles per hour (so humans can outrun it).

IBM’s Telum processor is the company’s latest chip and a competitor to the Tesla D1. The Telum is IBM’s first commercial processor to contain on-chip AI acceleration, allowing clients to use deep-learning inference at scale. IBM claims that the on-chip acceleration empowers the system to conduct inference at great speed.

IBM’s Telum is aimed at fraud detection in the early stages of transaction processing, while Tesla’s Dojo is mainly aimed at computer vision for self-driving cars using cameras. And while Telum is a conventional silicon wafer chip, Dojo has gone against industry standards: its chips are designed to connect without any glue.

The most powerful supercomputer in the world, Fugaku, lives at the RIKEN Center for Computational Science in Japan. At its tested limit it is capable of 442,010 teraflops, and theoretically it could perform up to 537,212 teraflops. Dojo, Tesla said, could end up breaking the exaflop barrier, something that no supercomputing company, university or government has been capable of doing.

Tesla unveils "Dojo" Computer Chip | Tesla AI Day

Dojo is made up of a mere 10 cabinets and is thus also the smallest supercomputer in the world in physical size. Fugaku, on the other hand, is made up of 256 cabinets. If Tesla were to add 54 cabinets to Dojo V1 for a total of 64 cabinets, Dojo would surpass Fugaku in compute performance.

All along, Tesla seemed positioned to gain an edge in artificial intelligence. Sure, Elon Musk’s Neuralink — along with SpaceX and The Boring Company — are separately held companies from Tesla, but certainly seepage among the companies occurs. So, at the Tesla AI event last month, when the company announced it would be designing its own silicon chips, more than ever it seemed Tesla had an advantage.

The AI event culminated with a dancing human posing as a humanoid robot, previewing the Tesla Bot the company intends to build. But the more immediate and important reveal was the custom AI chip “D1,” which would be used for training the machine-learning algorithm behind Tesla’s Autopilot self-driving system. Tesla has a keen focus on this technology, with a single giant neural network known as a “transformer” receiving input from 8 cameras at once.

“We are effectively building a synthetic animal from the ground up,” Tesla’s AI chief, Andrej Karpathy, said during the August, 2021 event. “The car can be thought of as an animal. It moves around autonomously, senses the environment, and acts autonomously.”

CleanTechnica‘s Johnna Crider, who attended the AI event, shared that, “At the very beginning of the event, Tesla CEO Musk said that Tesla is much more than an electric car company, and that it has ‘deep AI activity in hardware on the inference level and on the training level.’” She concluded that, “by unveiling the Dojo supercomputer plans and getting into the details of how it is solving computer vision problems, Tesla showed the world another side to its identity.”

Tesla’s Foray into Silicon Chips

Tesla is the latest nontraditional chipmaker, as described in a recent Wired analysis. Intel Corporation is the world’s largest semiconductor chip maker, based on its 2020 sales. It is the inventor of the x86 series of microprocessors found in most personal computers today. Yet, as AI gains prominence and silicon chips become essential ingredients in technology-integrated manufacturing, many others, including Google, Amazon, and Microsoft, are now designing their own chips.

Tesla FSD chip explained! Tesla vs Nvidia vs Intel chips

For Tesla, the key to silicon chip success will be deriving optimal performance out of the computer system used to train the company’s neural network. “If it takes a couple of days for a model to train versus a couple of hours,” CEO Elon Musk said at the AI event, “it’s a big deal.”

Initially, Tesla relied on Nvidia hardware for its silicon chips. That changed in 2019, when Tesla turned in-house to design chips that interpret sensor input in its cars. However, manufacturing the chips needed to train AI algorithms — moving the creative process from vision to execution — is quite a sophisticated, costly, and demanding endeavor.

The D1 chip, part of Tesla’s Dojo supercomputer system, uses a 7-nanometer manufacturing process, with 362 teraflops of processing power, said Ganesh Venkataramanan, senior director of Autopilot hardware. Tesla places 25 of these chips onto a single “training tile,” and 120 of these tiles come together across several server cabinets, amounting to over an exaflop of power. “We are assembling our first cabinets pretty soon,” Venkataramanan disclosed.

CleanTechnica‘s Chanan Bos deconstructed the D1 chip intricately in a series of articles (in case you missed them) and related that, per its specifications, the D1 chip boasts 50 billion transistors. When it comes to processors, that absolutely beats the current record held by AMD’s Epyc Rome chip of 39.54 billion transistors.


Tesla says on its website that the company believes “that an approach based on advanced AI for vision and planning, supported by efficient use of inference hardware, is the only way to achieve a general solution for full self-driving and beyond.” To do so, the company will:

Build silicon chips that power the full self-driving software from the ground up, taking every small architectural and micro-architectural improvement into account while pushing hard to squeeze maximum silicon performance-per-watt;

Perform floor-planning, timing, and power analyses on the design;

Write robust, randomized tests and scoreboards to verify functionality and performance;

Implement compilers and drivers to program and communicate with the chip, with a strong focus on performance optimization and power savings; and,

Validate the silicon chip and bring it to mass production.

“We should have Dojo operational next year,” CEO Elon Musk affirmed.

Keynote - Andrej Karpathy, Tesla


The Tesla Neural Network & Data Training

Tesla’s approach to full self-driving is grounded in its neural network. Most companies that are developing self-driving technology look to lidar, which is an acronym for “Light Detection and Ranging.” It’s a remote sensing method that uses light in the form of a pulsed laser to measure ranges — i.e., variable distances — to objects. These light pulses are combined with other data recorded by the sensing system to generate precise, 3-dimensional information about the shape of the surrounding environment and its surface characteristics.

PyTorch at Tesla - Andrej Karpathy, Tesla

Tesla, however, rejected lidar, partially due to its high cost and the amount of technology required per vehicle. Instead, it interprets scenes by using its neural network algorithm to dissect input from its cameras and radar. Chris Gerdes, director of the Center for Automotive Research at Stanford, says this approach is “computationally formidable. The algorithm has to reconstruct a map of its surroundings from the camera feeds rather than relying on sensors that can capture that picture directly.”

Tesla explains on its website the protocols it has embraced to develop its neural networks:

Apply cutting-edge research to train deep neural networks on problems ranging from perception to control;

Per-camera networks analyze raw images to perform semantic segmentation, object detection, and monocular depth estimation;

Birds-eye-view networks take video from all cameras to output the road layout, static infrastructure, and 3D objects directly in the top-down view;

Networks learn from the most complicated and diverse scenarios in the world, iteratively sourced from a fleet of nearly 1M vehicles in real time; and,

A full build of Autopilot neural networks involves 48 networks that take 70,000 GPU hours to train, and, together, output 1,000 distinct tensors (predictions) at each timestep.
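For a rough sense of scale on the 70,000-GPU-hour figure above, here is a quick back-of-the-envelope of the wall-clock time at a few cluster sizes. The cluster sizes are assumptions chosen purely for illustration, not Tesla's actual setup.

# Wall-clock time implied by 70,000 GPU hours at assumed cluster sizes.
gpu_hours = 70_000
for gpus in (1, 100, 1_000):
    days = gpu_hours / gpus / 24
    print(f"{gpus:>5} GPUs -> {days:,.1f} days")
# 1 GPU: ~2,917 days; 100 GPUs: ~29 days; 1,000 GPUs: ~2.9 days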

Training Teslas via Videofeeds

Tesla gathers more training data than other car companies. Each of the more than 1 million Teslas on the road sends the video feeds from its 8 cameras back to the company. The Hardware 3 onboard computer processes more than 40 times the data compared to Tesla’s previous-generation system. The company employs 1,000 people who label those images — noting cars, trucks, traffic signs, lane markings, and other features — to help train the large transformer.


At the August event, Tesla also said it can automatically select which images to prioritize in labeling to make the process more efficient. This is one of the many pieces that sets Tesla apart from its competitors.

Conclusion

Tesla has an advantage over Waymo (and other competitors) in three key areas thanks to its fleet of roughly 500,000 vehicles:

  • Computer vision
  • Prediction
  • Path planning/driving policy

Concerns about collecting the right data, paying people to label it, or paying for bandwidth and storage don’t obviate these advantages. These concerns are addressed by designing good triggers, using data that doesn’t need human labelling, and using abstracted representations (replays) instead of raw video.

The majority view among business analysts, journalists, and the general public appears to be that Waymo is far in the lead with autonomous driving, and Tesla isn’t close. This view doesn’t make sense when you look at the first principles of neural networks.

Wafer-Scale Hardware for ML and Beyond

What’s more, AlphaStar is a proof of concept of large-scale imitation learning for complex tasks. If you are skeptical that Tesla’s approach is the right one, or that path planning/driving policy is a tractable problem, you have to explain why imitation learning worked for StarCraft but won’t work for driving.

I predict that – barring a radical move by Waymo to increase the size of its fleet – in the next 1-3 years, the view that Waymo is far in the lead and Tesla is far behind will be widely abandoned. People have been focusing too much on demos that don’t inform us about system robustness, deeply limited disengagement metrics, and Google/Waymo’s access to top machine learning engineers and researchers. They have been focusing too little on training data, particularly for rare objects and behaviours where Waymo doesn’t have enough data to do machine learning well, or at all.

Wafer-scale AI for science and HPC (Cerebras)

Simulation isn’t an advantage for Waymo because Tesla (like all autonomous vehicle companies) also uses simulation. More importantly, a simulation can’t generate rare objects and rare behaviours that the simulation’s creators can’t anticipate or don’t know how to model accurately.

Pure reinforcement learning didn’t work for AlphaStar because the action space of StarCraft is too large for random exploration to hit upon good strategies. So, DeepMind had to bootstrap with imitation learning. This shows a weakness in the supposition that, as with AlphaGo Zero, pure simulated experience will solve any problem. Especially when it comes to a problem like driving where anticipating the behaviour of humans is a key component. Anticipating human behaviour requires empirical information about the real world.

Compiler Construction for Hardware Acceleration: Challenges and Opportunities

Observers of the autonomous vehicles space may be underestimating Tesla’s ability to attract top machine learning talent. A survey of tech workers found that Tesla is the 2nd most sought-after company in the Bay Area, one rank behind Google. It also found Tesla is the 4th most sought-after company globally, two ranks behind Google at 2nd place. (Shopify is in 3rd place globally, and SpaceX is in 1st.) It also bears noting that fundamental advances in machine learning are often shared openly by academia, OpenAI, and corporate labs at Google, Facebook, and DeepMind. The difference between what Tesla can do and what Waymo can do may not be that big.

2020 LLVM in HPC Workshop: Keynote: MLIR: an Agile Infrastructure for Building a Compiler Ecosystem

The big difference between the two companies is data. As Tesla’s fleet grows to 1 million vehicles, its monthly mileage will be about 1 billion miles, 1000x more than Waymo’s monthly rate of about 1 million miles. What that 1000x difference implies for Tesla is superior detection for rare objects, superior prediction for rare behaviours, and superior path planning/driving policy for rare situations. The self-driving challenge is more about handling the 0.001% of miles that contain rare edge cases than the 99.999% of miles that are unremarkable. So, it stands to reason that the company that can collect a large number of training examples from this 0.001% of miles will do better than the companies that can’t.

More Information:

https://www.datacenterdynamics.com/en/news/tesla-detail-pre-dojo-supercomputer-could-be-up-to-80-petaflops/

https://www.allaboutcircuits.com/news/a-circuit-level-assessment-teslas-proposed-supercomputer-dojo/

https://heartbeat.fritz.ai/computer-vision-at-tesla-cd5e88074376

https://towardsdatascience.com/teslas-deep-learning-at-scale-7eed85b235d3

https://www.autopilotreview.com/teslas-andrej-karpathy-details-autopilot-inner-workings/

https://phucnsp.github.io/blog/self-taught/2020/04/30/tesla-nn-in-production.html

https://asiliconvalleyinsider.com/2020/03/08/waymo-chauffeurnet-versus-telsa-hydranet/

https://www.infoworld.com/article/3597904/why-enterprises-are-turning-from-tensorflow-to-pytorch.html

https://cleantechnica.com/2021/08/22/teslas-dojo-supercomputer-breaks-all-established-industry-standards-cleantechnica-deep-dive-part-3/

https://semianalysis.com/the-tesla-dojo-chip-is-impressive-but-there-are-some-major-technical-issues/

https://www.tweaktown.com/news/81229/teslas-insane-new-dojo-d1-ai-chip-full-transcript-of-its-unveiling/index.html

https://www.inverse.com/innovation/tesla-full-self-driving-release-date-ai-day

https://videocardz.com/newz/tesla-d1-chip-features-50-billion-transistors-scales-up-to-1-1-exaflops-with-exapod

https://cleantechnica.com/2021/09/15/what-advantage-will-tesla-gain-by-making-its-own-silicon-chips/


NeuroMorphic Photonic Computing and Better AI


Taking Neuromorphic Computing to the Next Level with Loihi 2

Intel Labs’ new Loihi 2 research chip outperforms its predecessor by up to 10x and comes with an open-source, community-driven neuromorphic computing framework

Today, Intel introduced Loihi 2, its second-generation neuromorphic research chip, and Lava, an open-source software framework for developing neuro-inspired applications. Their introduction signals Intel’s ongoing progress in advancing neuromorphic technology.

“Loihi 2 and Lava harvest insights from several years of collaborative research using Loihi. Our second-generation chip greatly improves the speed, programmability, and capacity of neuromorphic processing, broadening its usages in power and latency constrained intelligent computing applications. We are open sourcing Lava to address the need for software convergence, benchmarking, and cross-platform collaboration in the field, and to accelerate our progress toward commercial viability.”

–Mike Davies, director of Intel’s Neuromorphic Computing Lab

Why It Matters: Neuromorphic computing, which draws insights from neuroscience to create chips that function more like the biological brain, aspires to deliver orders of magnitude improvements in energy efficiency, speed of computation and efficiency of learning across a range of edge applications: from vision, voice and gesture recognition to search retrieval, robotics, and constrained optimization problems.

Neuromorphic Chipsets - Industry Adoption Analysis


Applications Intel and its partners have demonstrated to date include robotic arms, neuromorphic skins and olfactory sensing.

About Loihi 2: The research chip incorporates learnings from three years of use with the first-generation research chip and leverages progress in Intel’s process technology and asynchronous design methods.

Advances in Loihi 2 allow the architecture to support new classes of neuro-inspired algorithms and applications, while providing up to 10 times faster processing, up to 15 times greater resource density with up to 1 million neurons per chip, and improved energy efficiency. Benefitting from a close collaboration with Intel’s Technology Development Group, Loihi 2 has been fabricated with a pre-production version of the Intel 4 process, which underscores the health and progress of Intel 4. The use of extreme ultraviolet (EUV) lithography in Intel 4 has simplified the layout design rules compared to past process technologies. This has made it possible to rapidly develop Loihi 2.

The Lava software framework addresses the need for a common software framework in the neuromorphic research community. As an open, modular, and extensible framework, Lava will allow researchers and application developers to build on each other’s progress and converge on a common set of tools, methods, and libraries. Lava runs seamlessly on heterogeneous architectures across conventional and neuromorphic processors, enabling cross-platform execution and interoperability with a variety of artificial intelligence, neuromorphic and robotics frameworks. Developers can begin building neuromorphic applications without access to specialized neuromorphic hardware and can contribute to the Lava code base, including porting it to run on other platforms.

Architectures for Accelerating Deep Neural Nets

"Investigators at Los Alamos National Laboratory have been using the Loihi neuromorphic platform to investigate the trade-offs between quantum and neuromorphic computing, as well as implementing learning processes on-chip,” said Dr. Gerd J. Kunde, staff scientist, Los Alamos National Laboratory. “This research has shown some exciting equivalences between spiking neural networks and quantum annealing approaches for solving hard optimization problems. We have also demonstrated that the backpropagation algorithm, a foundational building block for training neural networks and previously believed not to be implementable on neuromorphic architectures, can be realized efficiently on Loihi. Our team is excited to continue this research with the second generation Loihi 2 chip."

About Key Breakthroughs: Loihi 2 and Lava provide tools for researchers to develop and characterize new neuro-inspired applications for real-time processing, problem-solving, adaptation and learning. Notable highlights include:

  • Faster and more general optimization: Loihi 2’s greater programmability will allow a wider class of difficult optimization problems to be supported, including real-time optimization, planning, and decision-making from edge to datacenter systems.
  • New approaches for continual and associative learning: Loihi 2 improves support for advanced learning methods, including variations of backpropagation, the workhorse algorithm of deep learning. This expands the scope of adaptation and data efficient learning algorithms that can be supported by low-power form factors operating in online settings.
  • Novel neural networks trainable by deep learning: Fully programmable neuron models and generalized spike messaging in Loihi 2 open the door to a wide range of new neural network models that can be trained in deep learning. Early evaluations suggest over 60 times fewer ops per inference on Loihi 2 compared to standard deep networks running on the original Loihi, without loss in accuracy.
  • Seamless integration with real-world robotics systems, conventional processors, and novel sensors: Loihi 2 addresses a practical limitation of Loihi by incorporating faster, more flexible, and more standard input/output interfaces. Loihi 2 chips will support Ethernet interfaces, glueless integration with a wider range of event-based vision sensors, and larger meshed networks of Loihi 2 chips.

More details may be found in the Loihi 2/Lava technical brief.

About the Intel Neuromorphic Research Community: The Intel Neuromorphic Research Community (INRC) has grown to nearly 150 members, with several new additions this year, including Ford, Georgia Institute of Technology, Southwest Research Institute (SwRI) and Teledyne-FLIR. New partners join a robust community of academic, government and industry partners that are working with Intel to drive advances in real-world commercial usages of neuromorphic computing. (Read what our partners are saying about Loihi technology.)

“Advances like the new Loihi 2 chip and the Lava API are important steps forward in neuromorphic computing,” said Edy Liongosari, chief research scientist and managing director at Accenture Labs. “Next-generation neuromorphic architecture will be crucial for Accenture Labs’ research on brain-inspired computer vision algorithms for intelligent edge computing that could power future extended-reality headsets or intelligent mobile robots. The new chip provides features that will make it more efficient for hyper-dimensional computing and can enable more advanced on-chip learning, while the Lava API provides developers with a simpler and more streamlined interface to build neuromorphic systems.”

Deep learning: Hardware Landscape

About the Path to Commercialization: Advancing neuromorphic computing from laboratory research to commercially viable technology is a three-pronged effort. It requires continual iterative improvement of neuromorphic hardware in response to the results of algorithmic and application research; development of a common cross-platform software framework so developers can benchmark, integrate, and improve on the best algorithmic ideas from different groups; and deep collaborations across industry, academia and governments to build a rich, productive neuromorphic ecosystem for exploring commercial use cases that offer near-term business value.

Today’s announcements from Intel span all these areas, putting new tools into the hands of an expanding ecosystem of neuromorphic researchers engaged in re-thinking computing from its foundations to deliver breakthroughs in intelligent information processing.

What’s Next: Intel currently offers two Loihi 2-based neuromorphic systems through the Neuromorphic Research Cloud to engaged members of the INRC: Oheo Gulch, a single-chip system for early evaluation, and Kapoho Point, an eight-chip system that will be available soon.

Introduction

Recent breakthroughs in AI have swelled our appetite for intelligence in computing devices at all scales and form factors. This new intelligence ranges from recommendation systems, automated call centers, and gaming systems in the data center, to autonomous vehicles and robots, to more intuitive and predictive interfacing with our personal computing devices, to smart city and road infrastructure that immediately responds to emergencies.

Meanwhile, as today’s AI technology matures, a clear view of its limitations is emerging. While deep neural networks (DNNs) demonstrate a near limitless capacity to scale to solve large problems, these gains come at a very high price in computational power and pre-collected data. Many emerging AI applications—especially those that must operate in unpredictable real-world environments with power, latency, and data constraints—require fundamentally new approaches.

Neuromorphic computing represents a fundamental rethinking of computer architecture at the transistor level, inspired by the form and function of the brain’s biological neural networks. Despite many decades of progress in computing, biological neural circuits remain unrivaled in their ability to intelligently process, respond to, and learn from real-world data at microwatt power levels and millisecond response times. Guided by the principles of biological neural computation, neuromorphic computing intentionally departs from the familiar algorithms and programming abstractions of conventional computing so it can unlock orders of magnitude gains in efficiency and performance compared to conventional architectures. The goal is to discover a computer architecture that is inherently suited for the full breadth of intelligent information processing that living brains effortlessly support.
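For readers new to the field, the basic computational unit these chips implement is the spiking neuron. The Python below is a generic, textbook-style leaky integrate-and-fire model, included only to illustrate the computing style; it is not Loihi's actual neuron model, and the parameters are arbitrary.

# Generic leaky integrate-and-fire neuron: integrate input, leak toward zero,
# emit a spike when the membrane potential crosses threshold, then reset.
# Illustrative only; not Loihi's neuron model or parameters.
def simulate_lif(input_current, threshold=1.0, leak=0.9):
    v, spikes = 0.0, []
    for t, i in enumerate(input_current):
        v = leak * v + i          # leaky integration of the input
        if v >= threshold:        # threshold crossing -> spike event
            spikes.append(t)
            v = 0.0               # reset after the spike
    return spikes

# A constant drive produces a regular spike train; stronger drive spikes sooner.
print(simulate_lif([0.3] * 20))   # spikes at steps 3, 7, 11, 15, 19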

Advances in neuromorphic computing technology

Three Years of Loihi Research

Intel Labs is pioneering research that drives the evolution of compute and algorithms toward next-generation AI. In 2018, Intel Labs launched the Intel Neuromorphic Research Community (Intel NRC) and released the Loihi research processor for external use. The Loihi chip represented a milestone in the neuromorphic research field. It incorporated self-learning capabilities, novel neuron models, asynchronous spike-based communication, and many other properties inspired by neuroscience modeling, with leading silicon integration scale and circuit speeds. Over the past three years, Intel NRC members have evaluated Loihi in a wide range of application demonstrations. Some examples include:

• Adaptive robot arm control

• Visual-tactile sensory perception

• Learning and recognizing new odors and gestures

• Drone motor control with state-of-the-art latency in response to visual input

• Fast database similarity search

• Modeling diffusion processes for scientific computing applications

• Solving hard optimization problems such as railway scheduling

In most of these demonstrations, Loihi consumes far less than 1 watt of power, compared to the tens to hundreds of watts that standard CPU and GPU solutions consume.

With relative gains often reaching several orders of magnitude, these Loihi demonstrations represent breakthroughs in energy efficiency.1 Furthermore, for the best applications, Loihi simultaneously demonstrates state-of-the-art response times to arriving data samples, while also adapting and learning from incoming data streams. 

This combination of low power and low latency, with continuous adaptation, has the potential to bring new intelligent functionality to power- and latency-constrained systems at a scale and versatility beyond what any other programmable architecture supports today. Loihi has also exposed limitations and weaknesses found in today’s neuromorphic computing approaches.

While Loihi has one of the most flexible feature sets of any neuromorphic chip, many of the more promising applications stretch the range of its capabilities, such as its supported neuron models and learning rules. Interfacing with conventional sensors, processors, and data formats proved to be a challenge and often a bottleneck for performance. 

While Loihi applications show good scalability in large-scale systems such as the 768-chip Pohoiki Springs system, with gains often increasing relative to conventional solutions at larger scales, congestion in inter-chip links limited application performance. Loihi’s integrated compute-and-memory architecture foregoes off-chip DRAM memory, so scaling up workloads requires increasing the number of Loihi chips in an application. This means the economic viability of the technology depends on achieving significant improvements in the resource density of neuromorphic chips to minimize the number of required chips in commercial deployments. 

Wei Lu (U Mich) Neuromorphic Computing Based on Memristive Materials and Devices

Photonics for Computing: from Optical Interconnects to Neuromorphic Architectures

One of the biggest challenges holding back the commercialization of neuromorphic technology is the lack of software maturity and convergence. Since neuromorphic architecture is fundamentally incompatible with standard programming models, including today’s machine-learning and AI frameworks in wide use, neuromorphic software and application development is often fragmented across research teams, with different groups taking different approaches and often reinventing common functionality. 

Yet to emerge is a single, common software framework for neuromorphic computing that supports the full range of approaches pursued by the research community and presents compelling and productive abstractions to application developers.

The Nx SDK software developed by Intel Labs for programming Loihi focused on low-level programming abstractions and did not attempt to address the larger community’s need for a more comprehensive and open neuromorphic software framework that runs on a wide range of platforms and allows contributions from throughout the community. This changes with the release of Lava.



Loihi 2: A New Generation of Neuromorphic Computing Architecture 

Building on the insights gained from the research performed on the Loihi chip, Intel Labs introduces Loihi 2. A complete tour of the new features, optimizations, and innovations of this chip is provided in the final section. Here are some highlights:

 • Generalized event-based messaging. Loihi originally supported only binary-valued spike messages. Loihi 2 permits spikes to carry integer-valued payloads with little extra cost in either performance or energy. These generalized spike messages support event-based messaging, preserving the desirable sparse and time-coded communication properties of spiking neural networks (SNNs), while also providing greater numerical precision.

 • Greater neuron model programmability. Loihi was specialized for a specific SNN model. Loihi 2 now implements its neuron models with a programmable pipeline in each neuromorphic core to support common arithmetic, comparison, and program control flow instructions. Loihi 2’s programmability greatly expands its range of neuron models without compromising performance or efficiency compared to Loihi, thereby enabling a richer space of use cases and applications.

 • Enhanced learning capabilities. Loihi primarily supported two-factor learning rules on its synapses, with a third modulatory term available from nonlocalized “reward” broadcasts. Loihi 2 allows networks to map localized “third factors” to specific synapses. This provides support for many of the latest neuroinspired learning algorithms under study, including approximations of the error backpropagation algorithm, the workhorse of deep learning. While Loihi was able to prototype some of these algorithms in proof-of-concept demonstrations, Loihi 2 will be able to scale these examples up, for example, so new gestures can be learned faster with a greater range of presented hand motions. 

 • Numerous capacity optimizations to improve resource density. Loihi 2 has been fabricated with a preproduction version of the Intel 4 process to address the need to achieve greater application scales within a single neuromorphic chip. Loihi 2 also incorporates numerous architectural optimizations to compress and maximize the efficiency of each chip’s neural memory resources. Together, these innovations improve the overall resource density of Intel’s neuromorphic silicon architecture from 2x to over 160x, depending on properties of the programmed networks. 

 • Faster circuit speeds. Loihi 2’s asynchronous circuits have been fully redesigned and optimized, improving on Loihi down to the lowest levels of pipeline sequencing. This has provided gains in processing speeds from 2x for simple neuron state updates to 5x for synaptic operations to 10x for spike generation.2 Loihi 2 supports minimum chip-wide time steps under 200ns; it can now process neuromorphic networks up to 5000x faster than biological neurons. 

 • Interface improvements. Loihi 2 offers more standard chip interfaces than Loihi. These interfaces are both faster and higher-radix. Loihi 2 chips support 4x faster asynchronous chip-to-chip signaling bandwidths,3 a destination spike broadcast feature that reduces interchip bandwidth utilization by 10x or more in common networks,4 and three-dimensional mesh network topologies with six scalability ports per chip. Loihi 2 supports glueless integration with a wider range of both standard chips, over its new Ethernet interface, as well as emerging event-based vision (and other) sensor devices. 

Photonic reservoir computing for high-speed neuromorphic computing applications - A.Lugnan

Using these enhancements, Loihi 2 now supports a new deep neural network (DNN) implementation known as the Sigma-Delta Neural Network (SDNN) that provides large gains in speed and efficiency compared to the rate-coded spiking neural network approach commonly used on Loihi. SDNNs compute graded activation values in the same way that conventional DNNs do, but they only communicate significant changes as they happen in a sparse, event-driven manner. Simulation characterizations show that SDNNs on Loihi 2 can improve on Loihi’s rate-coded SNNs for DNN inference workloads by over 10x in both inference speeds and energy efficiency.
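
To make the sigma-delta idea concrete, here is a toy NumPy sketch (an illustration of the coding scheme only, not Lava or Loihi code): the sender transmits an event only when its activation has drifted by more than a threshold, and the receiver reconstructs the signal by accumulating those sparse deltas.

```python
import numpy as np

def sigma_delta_encode(signal, threshold=0.05):
    """Send an event only when the activation drifts more than `threshold`
    away from the last transmitted value."""
    events, last_sent = [], 0.0
    for t, x in enumerate(signal):
        delta = x - last_sent
        if abs(delta) > threshold:
            events.append((t, delta))
            last_sent += delta
    return events

def sigma_delta_decode(events, length):
    """Reconstruct the activation by accumulating the sparse deltas."""
    out, value, i = np.zeros(length), 0.0, 0
    for t in range(length):
        while i < len(events) and events[i][0] == t:
            value += events[i][1]
            i += 1
        out[t] = value
    return out

if __name__ == "__main__":
    activation = np.sin(np.linspace(0, 2 * np.pi, 200))   # slowly varying activation
    events = sigma_delta_encode(activation)
    recon = sigma_delta_decode(events, len(activation))
    print(f"{len(events)} events sent instead of {len(activation)} dense samples")
    print(f"max reconstruction error: {np.max(np.abs(recon - activation)):.3f}")
```

The reconstruction error stays bounded by the threshold, while the number of transmitted events scales with how quickly the activation changes rather than with the number of time steps.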

A First Tour of Loihi 2 

 Loihi 2 has the same base architecture as its predecessor Loihi, but comes with several improvements to extend its functionality, improve its flexibility, increase its capacity, accelerate its performance, and make it easier to both scale and integrate into a larger system (see Figure 1). 

Base Architecture

Building on the strengths of its predecessor, each Loihi 2 chip consists of microprocessor cores and up to 128 fully asynchronous neuron cores connected by a network-on-chip (NoC). The neuron cores are optimized for neuromorphic workloads, each implementing a group of spiking neurons, including all synapses connecting to the neurons. All communication between neuron cores is in the form of spike messages. The number of embedded microprocessor cores has doubled from three in Loihi to six in Loihi 2. Microprocessor cores are optimized for spike-based communication and execute standard C code to assist with data I/O as well as network configuration, management, and monitoring. Parallel I/O interfaces extend the on-chip mesh across multiple chips—up to 16,384—with direct pin-to-pin wiring between neighbors.

Programmable Photonic Integrated Circuits for Quantum Information Processing and Machine Learning

New Functionality

Loihi 2 supports fully programmable neuron models with graded spikes. Each neuron model takes the form of a program, which is a short sequence of microcode instructions describing the behavior of a single neuron. The microcode instruction set supports bitwise and basic math operations in addition to conditional branching, memory access, and specialized instructions for spike generation and probing.
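
As a rough picture of what “a neuron model as a short program” means, the sketch below interprets a miniature, made-up instruction list that implements a leaky integrate-and-fire neuron. The opcodes and encoding are invented for illustration and are not Loihi 2’s actual microcode instruction set.

```python
# Hypothetical micro-program: these opcodes are invented for illustration and
# are NOT Loihi 2's real instruction set.
LIF_PROGRAM = [
    ("MUL_STATE", "v", 0.9),   # leak: v *= 0.9
    ("ADD_INPUT", "v"),        # integrate the summed synaptic input
    ("CMP_GE", "v", 1.0),      # compare membrane potential against a threshold
    ("SPIKE_IF",),             # emit a spike if the comparison was true
    ("RESET_IF", "v", 0.0),    # reset the state if a spike was emitted
]

def run_neuron(program, inputs):
    """Interpret a tiny neuron 'micro-program' over a stream of input values."""
    state = {"v": 0.0}
    flag = False
    spikes = []
    for x in inputs:
        fired = False
        for instr in program:
            op = instr[0]
            if op == "MUL_STATE":
                state[instr[1]] *= instr[2]
            elif op == "ADD_INPUT":
                state[instr[1]] += x
            elif op == "CMP_GE":
                flag = state[instr[1]] >= instr[2]
            elif op == "SPIKE_IF":
                fired = flag
            elif op == "RESET_IF" and fired:
                state[instr[1]] = instr[2]
        spikes.append(1 if fired else 0)
    return spikes

if __name__ == "__main__":
    print(run_neuron(LIF_PROGRAM, [0.3] * 20))
```

Swapping out the instruction list changes the neuron model without changing the interpreter, which is the essence of the programmable-pipeline idea.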

The second-generation “Loihi” processor from Intel has been made available to advance research into neuromorphic computing approaches that more closely mimic the behavior of biological cognitive processes. Loihi 2 outperforms the previous chip version in terms of density, energy efficiency, and other factors. This is part of an effort to create semiconductors that are more like a biological brain, which might lead to significant improvements in computer performance and efficiency.

Intel Announces Loihi 2, Lava Software Framework For Advancing Neuromorphic  Computing - Phoronix

The first generation of artificial intelligence was built on the foundation of defining rules and emulating classical logic to arrive at rational conclusions within a narrowly defined problem domain. It was ideal for monitoring and optimizing operations. The second generation is dominated by the use of deep learning networks to examine the contents and data that were mostly concerned with sensing and perception. The third generation of AI focuses on drawing similarities to human cognitive processes, like interpretation and autonomous adaptation. 

This is achieved by simulating neurons firing in the same way as humans’ nervous systems do, a method known as neuromorphic computing.

Neuromorphic computing is not a new concept. It was initially suggested in the 1980s by Carver Mead, who coined the phrase “neuromorphic engineering.” Mead had spent more than four decades building analog systems that mimicked human senses and processing mechanisms, including sensing, seeing, hearing, and thinking. Neuromorphic computing is a subset of neuromorphic engineering that focuses on the “thinking” and “processing” capabilities of these human-like systems. Today, neuromorphic computing is gaining traction as the next milestone in artificial intelligence technology.

Intel Rolls Out New Loihi 2 Neuromorphic Chip: Built on Early Intel 4  Process

In 2017, Intel released the first-generation Loihi chip, a 14-nanometer chip with a 60 mm² die. It has more than 2 billion transistors, three orchestrating Lakemont cores, 128 neuromorphic cores, and a configurable microcode engine for on-chip training of asynchronous spiking neural networks. The use of spiking neural networks lets Loihi be entirely asynchronous and event-driven rather than updating on a synchronized clock signal. When charge builds up in a neuron, “spikes” are sent along active synapses. These spikes are largely time-based, with timing recorded as part of the data. A core fires its own spikes to its linked neurons when the spikes accumulating in a neuron over a period of time reach a certain threshold.
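
The accumulate-and-fire behaviour described above can be mimicked in a few lines of NumPy. The sketch below is a simplified software model (not Intel’s implementation): input spikes deposit charge on neurons, and any neuron whose charge crosses a threshold fires a spike to the neurons it connects to and resets.

```python
import numpy as np

rng = np.random.default_rng(0)

n_neurons = 5
threshold = 1.0
leak = 0.95                                                # charge decays each step
weights = rng.uniform(0.0, 0.6, (n_neurons, n_neurons))    # weights[i, j]: i -> j
np.fill_diagonal(weights, 0.0)

potential = np.zeros(n_neurons)

for step in range(20):
    incoming = np.zeros(n_neurons)
    if step % 3 == 0:
        incoming[0] = 1.0                  # external spike drives neuron 0

    potential = potential * leak + incoming
    fired = potential >= threshold         # neurons whose charge crossed threshold
    if fired.any():
        # fired neurons deposit charge on the neurons they connect to, then reset
        potential += weights[fired].sum(axis=0)
        potential[fired] = 0.0
        print(f"step {step:2d}: spikes from neurons {np.flatnonzero(fired)}")
```

Nothing happens on steps where no neuron crosses its threshold, which is what makes the scheme event-driven rather than clock-driven.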

Loihi 2 still has 128 neuromorphic cores, but each core now supports 8 times as many neurons and synapses. Each of the 128 cores has 192 KB of flexible memory, and each neuron may now be assigned up to 4,096 states depending on the model, compared to the previous limit of 24. The neuron model is now fully programmable, similar to an FPGA, which gives the chip more versatility and allows for new kinds of neuromorphic applications.

One of the drawbacks of Loihi was that spike signals were not programmable and had no context or range of values. Loihi 2 addresses all of these issues while also providing 2-10x (2X for neuron state updates, up to 10X for spike generation) faster circuits, eight times more neurons, and four times more link bandwidth for increased scalability.

Loihi 2 was created using the Intel 4 pre-production process and benefited from that node’s use of EUV technology. The Intel 4 process allowed Intel to halve the die size from 60 mm² to 31 mm², while the transistor count rose to 2.3 billion. In comparison to previous process technologies, the use of extreme ultraviolet (EUV) lithography in Intel 4 has simplified the layout design guidelines. This has allowed Loihi 2 to be developed quickly.

Programmable Photonic Circuits: a flexible way of manipulating light on chips

Support for three-factor learning rules has been added to the Loihi 2 architecture, as well as improved synaptic (internal interconnections) compression for quicker internal data transmission. Loihi 2 also features parallel off-chip connections (that enable the same types of compression as internal synapses) that may be utilized to extend an on-chip mesh network across many physical chips to create a very powerful neuromorphic computer system. Loihi 2 also features new approaches for continual and associative learning. Furthermore, the chip features 10GbE, GPIO, and SPI interfaces to make it easier to integrate Loihi 2 with traditional systems.

Loihi 2 further improves flexibility by integrating faster, standardized I/O interfaces that support Ethernet connections, vision sensors, and bigger mesh networks. These improvements are intended to improve the chip’s compatibility with robots and sensors, which have long been a part of Loihi’s use cases.

Another significant change is in the portion of the processor that assesses the state of a neuron before deciding whether or not to transmit a spike. In the original processor, that decision could only be expressed with a simple bit of arithmetic; in Loihi 2, a programmable pipeline lets developers perform comparisons and control instruction flow as well.

ESA+ Colloquium - Programmable Photonics - Wim Bogaerts - 3 May 2021

Intel claims Loihi 2’s enhanced architecture allows it to support back-propagation, a key component of many AI models, which may help accelerate the commercialization of neuromorphic chips. Loihi 2 has also been shown to execute inference calculations with up to 60 times fewer operations per inference compared to Loihi, without any loss in accuracy. Inference is what AI models perform to interpret given data.

The Neuromorphic Research Cloud is presently offering two Loihi 2-based neuromorphic devices to researchers. These are:

Oheo Gulch is a single-chip add-in card that includes an Intel Arria 10 FPGA for interfacing with Loihi 2; it will be used for early evaluation.

Kapoho Point, an 8-chip system board that mounts eight Loihi 2 chips in a 4×4-inch form factor, will be available shortly. It will have GPIO pins along with “standard synchronous and asynchronous interfaces” that allow it to be used with sensors and actuators for embedded robotics applications.

These systems will be available via a cloud service to members of the Intel Neuromorphic Research Community (INRC), while Lava is freely available on GitHub.

Intel has also created Lava to address the need for software convergence, benchmarking, and cross-platform collaboration in the field of neuromorphic computing. As an open, modular, and extensible framework, it will enable researchers and application developers to build on one another’s efforts and eventually converge on a common set of tools, techniques, and libraries.


Lava operates on a range of conventional and neuromorphic processor architectures, allowing for cross-platform execution and compatibility with a variety of artificial intelligence, neuromorphic, and robotics frameworks. Users can get the Lava Software Framework for free on GitHub.

Edy Liongosari, chief research scientist and managing director at Accenture Labs, believes that advances like the new Loihi 2 chip and the Lava API will be crucial to the future of neuromorphic computing. “Next-generation neuromorphic architecture will be crucial for Accenture Labs’ research on brain-inspired computer vision algorithms for intelligent edge computing that could power future extended-reality headsets or intelligent mobile robots,” says Liongosari.

For now, Loihi 2 has piqued the interest of the Queensland University of Technology. The institute is looking to work on more sophisticated neural modules to aid in the implementation of biologically inspired navigation and map formation algorithms. The first-generation Loihi is already being used at Los Alamos National Lab to study tradeoffs between quantum and neuromorphic computing, as well as to study the backpropagation algorithm, which is used to train neural networks.

Intel has unveiled its second-generation neuromorphic computing chip, Loihi 2, the first chip to be built on its Intel 4 process technology. Designed for research into cutting-edge neuromorphic neural networks, Loihi 2 brings a range of improvements. They include a new instruction set for neurons that provides more programmability, allowing spikes to have integer values beyond just 1 and 0, and the ability to scale into three-dimensional meshes of chips for larger systems.

The chipmaker also unveiled Lava, an open-source software framework for developing neuro-inspired applications. Intel hopes to engage neuromorphic researchers in development of Lava, which when up and running will allow research teams to build on each other’s work.

Loihi is Intel’s version of what neuromorphic hardware, designed for brain-inspired spiking neural networks (SNNs), should look like. SNNs are used in event-based computing, in which the timing of input spikes encodes the information. In general, spikes that arrive sooner have more computational effect than those arriving later.
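
A simple way to picture this time coding is latency (time-to-first-spike) coding, sketched below under the assumption that stronger inputs spike earlier and that the readout favors the earliest spike. This is an illustration of the principle, not how any particular Loihi application encodes its data.

```python
def encode_latency(values, t_max=10):
    """Latency coding: stronger inputs spike earlier (time = t_max * (1 - value))."""
    return {name: round(t_max * (1.0 - v)) for name, v in values.items()}

def earliest_wins(spike_times):
    """A toy readout in which the earliest spike dominates the decision."""
    return min(spike_times, key=spike_times.get)

if __name__ == "__main__":
    intensities = {"A": 0.9, "B": 0.4, "C": 0.7}  # normalized input strengths
    times = encode_latency(intensities)
    print("spike times:", times)                  # A spikes first
    print("winner:", earliest_wins(times))
```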

Karlheinz meier - How neuromorphic computing may affect our future life HBP

Intel’s Loihi 2 second-generation neuromorphic processor. (Source: Intel)

Among the key differences between neuromorphic hardware and standard CPUs is fine-grained distribution of memory, meaning Loihi’s memory is embedded into individual cores. Since Loihi’s spikes rely on timing, the architecture is asynchronous.

“In neuromorphic computing, the computation is emerging through the interaction between these dynamical elements,” explained Mike Davies, director of Intel’s Neuromorphic Computing Lab. “In this case, it’s neurons that have this dynamical property of adapting online to the input it receives, and the programmer may not know the precise trajectory of steps that the chip will go through to arrive at an answer.

“It goes through a dynamical process of self-organizing its states and it settles into some new condition. That final fixed point as we call it, or equilibrium state, is what is encoding the answer to the problem that you want to solve,” Davies added. “So it’s very fundamentally different from how we even think about computing in other architectures.”

First-generation Loihi chips have thus far been demonstrated in a variety of research applications, including adaptive robot arm control, where the motion adapts to changes in the system, reducing friction and wear on the arm. Loihi is able to adapt its control algorithm to compensate for errors or unpredictable behavior, enabling robots to operate with the desired accuracy. Loihi has also been used in a system that recognizes different smells. In this scenario, it can learn and detect new odors much more efficiently than a deep learning-based equivalent. A project with Deutsche Bahn also used Loihi for train scheduling. The system reacted quickly to changes such as track closures or stalled trains.

Second-gen features

Built on a pre-production version of the Intel 4 process, Loihi 2 aims to increase programmability and performance without compromising energy efficiency. Like its predecessor, it typically consumes around 100 mW (up to 1 W).

An increase in resource density is one of the most important changes; while the chip still incorporates 128 cores, the neuron count jumps by a factor of eight.

“Getting to a higher amount of storage, neurons and synapses in a single chip is essential for the commercial viability… and commercializing them in a way that makes sense for customer applications,” said Davies.

Loihi 2 features. (Source: Intel)

With Loihi 1, workloads would often map onto the architecture in non-optimal ways. For example, the neuron count would often max out while free memory was still available. The amount of memory in Loihi 2 is similar in total, but has been broken up into memory banks that are more flexible. Additional compression has been added to network parameters to minimize the amount of memory required for larger models. This frees up memory that can be reallocated for neurons.

The upshot is that Loihi 2 can tackle larger problems with the same amount of memory, delivering a roughly 15-fold increase in neural network capacity per square millimeter of chip area, bearing in mind that die area is halved overall by the new process technology.

Neuron programmability

Programmability is another important architectural modification. Neurons that were previously fixed-function, though configurable, in Loihi 1 gain a full instruction set in Loihi 2. The instruction set includes common arithmetic, comparison and program control flow instructions. That level of programmability would allow varied SNN types to be run more efficiently.

“This is a kind of microcode that allows us to program almost arbitrary neuron models,” Davies said. “This covers the limits of Loihi [1], and where generally we’re finding more application value could be unlocked with even more complex and richer neuron models, which is not what we were expecting at the beginning of Loihi. But now we can actually encompass that full extent of neuron models that our partners are trying to investigate, and what the computational neuroscience domain [is] proposing and characterizing.”

The Loihi 2 die is the first to be fabricated on a pre-production version of Intel 4 process technology. (Source: Intel)

Programmable Photonic Circuits

For Loihi 2, the idea of spikes has also been generalized. Loihi 1 employed strict binary spikes to mirror what is seen in biology, where spikes have no magnitude. All information is represented by spike timing, and earlier spikes would have greater computational effect than later spikes. In Loihi 2, spikes carry a configurable integer payload available to the programmable neuron model. While biological brains don’t do this, Davies said it was relatively easy for Intel to add to the silicon architecture without compromising performance.

“This is an instance where we’re departing from the strict biological fidelity, specifically because we understand what the importance is, the time-coding aspect of it,” he said. “But [we realized] that we can do better, and we can solve the same problems with fewer resources if we have this extra magnitude that can be sent alongside with this spike.”

Generalized event-based messaging is key to Loihi 2’s support of a deep neural network called the sigma-delta neural network (SDNN), which is much faster than the timing approach used on Loihi 1. SDNNs compute graded-activation values in the same way that conventional DNNs do, but only communicate significant changes as they happen in a sparse, event-driven manner.

3D Scaling

Loihi 2 is billed as up to 10 times faster than its predecessor at the circuit level; combined with functional improvements, Davies claimed, the design can deliver overall processing gains of up to 10X. Loihi 2 supports minimum chip-wide time steps under 200ns; it can also process neuromorphic networks up to 5,000 times faster than biological neurons.

Programmable Photonics - Wim Bogaerts - Stanford

The new chip also features scalability ports which allow Intel to scale neural networks into the third dimension. Without external memory on which to run larger neural networks, Loihi 1 required multiple devices (such as in Intel’s 768-Loihi chip system, Pohoiki Springs). Planar meshes of Loihi 1 chips become 3D meshes in Loihi 2. Meanwhile, chip-to-chip bandwidth has been improved by a factor of four, with compression and new protocols providing one-tenth the redundant spike traffic sent between chips. Davies said the combined capacity boost is around 60-fold for most workloads, avoiding bottlenecks caused by inter-chip links.

Also supported is three-factor learning, which is popular in cutting-edge neuromorphic algorithm research. The same modification, which maps third factors to specific synapses, can be used to approximate back-propagation, the training method used in deep learning. That creates new ways of learning via Loihi.
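
A generic three-factor rule can be written in a few lines. The sketch below multiplies a pre-synaptic trace, a post-synaptic trace, and a per-synapse third factor (for example, a locally delivered error or reward signal) to produce the weight update; it is a conceptual simplification, not Loihi 2’s actual learning engine.

```python
import numpy as np

def three_factor_update(weights, pre_trace, post_trace, third_factor, lr=0.01):
    """Generic three-factor rule: dw = lr * pre * post * third_factor.

    pre_trace:    recent activity of pre-synaptic neurons, shape (n_pre,)
    post_trace:   recent activity of post-synaptic neurons, shape (n_post,)
    third_factor: modulatory signal mapped to each synapse, shape (n_pre, n_post)
    """
    dw = lr * np.outer(pre_trace, post_trace) * third_factor
    return weights + dw

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.normal(0.0, 0.1, (4, 3))
    pre = np.array([1.0, 0.0, 0.5, 0.2])      # pre-synaptic spike traces
    post = np.array([0.8, 0.1, 0.0])          # post-synaptic spike traces
    error = rng.normal(0.0, 1.0, (4, 3))      # per-synapse "third factor"
    print(np.round(three_factor_update(w, pre, post, error) - w, 4))
```

When the third factor approximates a per-synapse error signal, updates of this shape start to resemble the backpropagation-style learning the text mentions.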

Loihi 2 will be available to researchers as a single-chip board for developing edge applications (Oheo Gulch). It will also be offered as an eight-chip board intended to scale for more demanding applications. (Source: Intel)

Lava

The Lava software framework rounds out the Loihi enhancements. The open-source project is available to the neuromorphic research community.

“Software continues to hold back the field,” Davies said. “There hasn’t been a lot of progress, not at the same pace as the hardware over the past several years. And there hasn’t been an emergence of a single software framework, as we’ve seen in the deep learning world where we have TensorFlow and PyTorch gathering huge momentum and a user base.”

While Intel has a portfolio of applications demonstrated for Loihi, code sharing among development teams has been limited. That makes it harder for developers to build on progress made elsewhere.

Promoted as a new project, not a product, Davies said Lava is intended as a way to build a framework that supports Loihi researchers working on a range of algorithms. While Lava is aimed at event-based asynchronous message passing, it will also support heterogeneous execution. That allows researchers to develop applications that initially run on CPUs. With access to Loihi hardware, researchers can then map parts of the workload onto the neuromorphic chip. The hope is that approach would help lower the barrier to entry.

“We see a need for convergence and a communal development here towards this greater goal which is going to be necessary for commercializing neuromorphic technology,” Davies said.

Loihi 2 will be used by researchers developing advanced neuromorphic algorithms. Oheo Gulch, a single-chip system for lab testing, will initially be available to researchers, followed by Kapoho Point, an eight-chip Loihi 2 version of Kapoho Bay. Kapoho Point includes an Ethernet interface designed to allow boards to be stacked for applications such as robotics requiring more computing power.

More Information:

https://www.youtube.com/c/PhotonicsResearchGroupUGentimec/videos

https://ecosystem.photonhub.eu/trainings/product/?action=view&id_form=7&id_form_data=14

https://aip.scitation.org/doi/10.1063/5.0047946

https://www.intel.com/content/www/us/en/research/neuromorphic-computing.html

https://www.intel.com/content/www/us/en/newsroom/resources/press-kits-neuromorphic-computing.html

https://www.photonics.com/Articles/Neuromorphic_Processing_Set_to_Propel_Growth_in_AI/a66821

https://www.embedded.com/intel-offers-loihi-2-neuromorphic-chip-and-software-framework/

https://github.com/Linaro/lava




Achieving Scalability in Quantum Computing



As the path to build a quantum computer continues, challenges from across industries await solutions from this new computational power. One of the many examples of high-impact problems that can be solved on a quantum computer is developing a new alternative to fertilizer production. Making fertilizer requires a notable percentage of the world’s annual production of natural gas. This implies high cost, high energy waste, and substantial greenhouse emissions. Quantum computers can help identify a new alternative by analyzing nitrogenase, an enzyme in plants that converts nitrogen to ammonia naturally. To address this problem, a quantum computer would require at least 200 fault-free qubits—far beyond the small quantum systems of today. In order to find a solution, quantum computers must scale up. The challenge, however, is that scaling a quantum computer isn’t simply a matter of adding more qubits.

Building a quantum computer differs greatly from building a classical computer. The underlying physics, the operating environment, and the engineering each pose their own obstacles. With so many unique challenges, how can a quantum computer scale in a way that makes it possible to solve some of the world’s most challenging problems?

Experience quantum impact with Azure Quantum

Navigating obstacles

Most quantum computers require temperatures colder than those found in deep space. To reach these temperatures, all the components and hardware are contained within a dilution refrigerator—highly specialized equipment that cools the qubits to just above absolute zero. Because standard electronics don’t work at these temperatures, a majority of quantum computers today use room-temperature control. With this method, controls on the outside of the refrigerator send signals through cables, communicating with the qubits inside. The challenge is that this method ultimately reaches a roadblock: the heat created by the sheer number of cables limits the output of signals, restraining the number of qubits that can be added.

As more control electronics are added, more effort is needed to maintain the very low temperature the system requires. Increasing both the size of the refrigerator and the cooling capacity is a potential option, however, this would require additional logistics to interface with the room temperature electronics, which may not be a feasible approach.

Another alternative would be to break the system into separate refrigerators. Unfortunately, this isn’t ideal either because the transfer of quantum data between the refrigerators is likely to be slow and inefficient.

At this stage in the development of quantum computers, size is therefore limited by the cooling capacity of the specialized refrigerator. Given these parameters, the electronics controlling the qubits must be as efficient as possible.

Physical qubits, logical qubits, and the role of error correction

By nature, qubits are fragile. They require a precise environment and state to operate correctly, and they’re highly prone to outside interference. This interference is referred to as ‘noise’, which is a consistent challenge and a well-known reality of quantum computing. As a result, error correction plays a significant role.

As a computation begins, the initial set of qubits in the quantum computer are referred to as ‘physical qubits’. Error correction works by grouping many of these fragile physical qubits, which creates a smaller number of usable qubits that can remain immune to noise long enough to complete the computation. These stronger, more stable qubits used in the computation are referred to as ‘logical qubits’.

In classical computing, noisy bits are corrected through redundancy (for example, parity and Hamming codes), which fixes errors as they occur. A similar process occurs in quantum computing, but it is more difficult to achieve and results in significantly more physical qubits than the number of logical qubits required for the computation. The ratio of physical to logical qubits is influenced by two factors: 1) the type of qubits used in the quantum computer, and 2) the overall size of the quantum computation performed. Because scaling the system size is so difficult, reducing the ratio of physical to logical qubits is critical. This means that instead of just aiming for more qubits, it is crucial to aim for better qubits.

Quantum Algorithms Landscape

Stability and scale with a topological qubit

The topological qubit is a type of qubit that offers more immunity to noise than many traditional types of qubits. Topological qubits are more robust against outside interference, meaning fewer total physical qubits are needed when compared to other quantum systems. With this improved performance, the ratio of physical to logical qubits is reduced, which in turn, creates the ability to scale.

As we know from Schrödinger’s cat, outside interactions can destroy quantum information. Any interaction from a stray particle, such as an electron, a photon, a cosmic ray, etc., can cause the quantum computer to decohere.

There is a way to prevent this: parts of the electron can be separated, creating an increased level of protection for the information stored. This is a form of topological protection known as a Majorana quasi-particle. The Majorana quasi-particle was predicted in 1937 and was detected for the first time in the Microsoft Quantum lab in the Netherlands in 2012. This separation of the quantum information creates a stable, robust building block for a qubit. The topological qubit provides a better foundation with lower error rates, reducing the ratio of physical to logical qubits. With this reduced ratio, more logical qubits are able to fit inside the refrigerator, creating the ability to scale.

If topological qubits were used in the example of nitrogenase simulation, the required 200 logical qubits would be built out of thousands of physical qubits. However, if more traditional types of qubits were used, tens or even hundreds of thousands of physical qubits would be needed to achieve 200 logical qubits. The topological qubit’s improved performance causes this dramatic difference; fewer physical qubits are needed to achieve the logical qubits required.
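
The arithmetic behind that comparison is just logical qubits multiplied by an error-correction overhead. The sketch below uses illustrative overhead values (assumed placeholders, not measured figures) together with the 200 logical qubits from the nitrogenase example.

```python
def physical_qubits_needed(logical_qubits, overhead_per_logical):
    """Total physical qubits = logical qubits x error-correction overhead."""
    return logical_qubits * overhead_per_logical

if __name__ == "__main__":
    logical = 200  # logical qubits cited above for the nitrogenase simulation
    # Overheads below are illustrative placeholders, not measured figures.
    scenarios = {
        "topological qubits (assumed ~25 physical per logical)": 25,
        "conventional qubits (assumed ~500 physical per logical)": 500,
    }
    for label, overhead in scenarios.items():
        total = physical_qubits_needed(logical, overhead)
        print(f"{label}: ~{total:,} physical qubits")
```

Even with made-up overheads, the difference between thousands and hundreds of thousands of physical qubits falls directly out of the ratio.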

Developing a topological qubit is extremely challenging and is still underway, but these benefits make the pursuit well worth the effort.

A solid foundation to tackle problems unsolved by today’s computers

A significant number of logical qubits are required to address some of the important problems currently unsolvable by today’s computers. Yet common approaches to quantum computing require massive numbers of physical qubits in order to reach these quantities of logical qubits—creating a huge roadblock to scalability. Instead, a topological approach to quantum computing requires far fewer physical qubits than other quantum systems, making scalability much more achievable.

Providing a more solid foundation, the topological approach offers robust, stable qubits, and helps to bring the solutions to some of our most challenging problems within reach.

Myth vs. reality: a practical perspective on quantum computing

There’s a lot of speculation about the potential for quantum computing, but to get a clearer vision of the future impact, we need to disentangle myth from reality. At this week’s virtual Q2B conference, we take a pragmatic perspective to cut through the hype and discuss the practicality of quantum computers, how to future-proof quantum software development, and the real value obtained today through quantum-inspired solutions on classical computers.

Azure Quantum (Italian Video)

Achieving practical quantum advantage

Dr. Matthias Troyer, Distinguished Scientist with Microsoft Quantum, explains what will be needed for quantum computing to be better and faster than classical computing in his talk Disentangling Hype from Reality: Achieving Practical Quantum Advantage. People talk about many potential problems they hope quantum computers can help with, including fighting cancer, forecasting the weather, or countering climate change. Having a pragmatic approach to determining real speedups will enable us to focus the work on the areas that will deliver impact.

For example, quantum computers have limited I/O capability and will thus not be good at big data problems. However, the area where quantum does excel is large compute problems on small data. This includes chemistry and materials science, for game-changing solutions like designing better batteries, new catalysts, quantum materials, or countering climate change. But even for compute-intensive problems, we need to take a closer look. Troyer explains that each operation in a quantum algorithm is slower by more than 10 orders of magnitude compared to a classical computer. This means we need a large speedup advantage in the algorithm to overcome the slowdowns intrinsic to the quantum system; we need superquadratic speedups.
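
The following back-of-the-envelope calculation shows why a merely quadratic speedup is not enough. It assumes an illustrative classical O(N²) algorithm at 1 ns per operation and a quantum O(N) algorithm whose operations are ten orders of magnitude slower; the crossover point then sits at problem sizes whose runtimes are measured in millennia.

```python
# Back-of-the-envelope: when does a quadratic quantum speedup actually pay off?
# Illustrative assumptions following the order-of-magnitude argument above:
#   - classical algorithm:  N**2 operations at 1 ns per operation
#   - quantum algorithm:    N operations (a quadratic speedup), but each
#                           operation is 1e10 times slower than a classical one
CLASSICAL_OP_S = 1e-9
QUANTUM_OP_S = CLASSICAL_OP_S * 1e10  # ten orders of magnitude slower

def classical_time_s(n):
    return n ** 2 * CLASSICAL_OP_S

def quantum_time_s(n):
    return n * QUANTUM_OP_S

if __name__ == "__main__":
    crossover = int(QUANTUM_OP_S / CLASSICAL_OP_S)  # N where the two times are equal
    years = classical_time_s(crossover) / (3600 * 24 * 365)
    print(f"crossover problem size: N = {crossover:.0e}")
    print(f"runtime at the crossover: ~{years:,.0f} years on either machine")
```

With these assumed constants the quantum machine only pulls ahead at problem sizes that take thousands of years either way, which is why superquadratic speedups are needed for practical advantage.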

Troyer is optimistic about the potential for quantum computing but brings a realistic perspective to what is needed to get to practical quantum advantage: small data/big compute problems, superquadratic speedup, fault-tolerant quantum computers scaling to millions of qubits and beyond, and the tools and systems to develop the algorithms to run the quantum systems.

Experiencing Quantum impact with Microsoft today | Julie Love | Microsoft


Future-proofing quantum development

Developers and researchers want to ensure they invest in languages and tools that will adapt to the capabilities of more powerful quantum systems in the future. Microsoft’s open-source Quantum Intermediate Representation (QIR) and the Q# programming language provide developers with a flexible foundation that protects their development investments.

QIR is a new Microsoft-developed intermediate representation for quantum programs that is hardware and language agnostic, so it can be a common interface between many languages and target quantum computation platforms. Based on the popular open-source LLVM intermediate language, QIR is designed to enable the development of a broad and flexible ecosystem of software tools for quantum development.

As quantum computing capabilities evolve, we expect large-scale quantum applications will take full advantage of both classical and quantum computing resources working together. QIR provides full capabilities for describing rich classical computation fully integrated with quantum computation. It’s a key layer in achieving a scaled quantum system that can be programmed and controlled for general algorithms.

In his presentation at the Q2B conference, Future-Proofing Your Quantum Development with Q# and QIR, Microsoft Senior Software Engineer Stefan Wernli explains to a technical audience why QIR and Q# are practical investments for long-term quantum development. Learn more about QIR in our recent Quantum Blog post.

Quantum-inspired optimization solutions today

At the same time, there are ways to get practical value today through “quantum-inspired” solutions that apply quantum principles for increased speed and accuracy to algorithms running on classical computers.

We are already seeing how quantum-inspired optimization solutions can solve complex transportation and logistics challenges. An example is Microsoft’s collaboration with Trimble Transportation to optimize its transportation supply chain, presented at the Q2B conference in Freight for the Future: Quantum-Inspired Optimization for Transportation by Anita Ramanan, Microsoft Quantum Software Engineer, and Scott Vanselous, VP Digital Supply Chain Solutions at Trimble.

Trimble’s Vanselous explains how today’s increased dependence on e-commerce and shipping has fundamentally raised expectations across the supply chain. However, there was friction in the supply chain because of siloed data between shippers, carriers, and brokers; limited visibility; and a focus on task optimization vs. system optimization. Trimble and Microsoft are designing quantum-inspired load matching algorithms for a platform that enables all supply chain members to increase efficiency, minimize costs, and take advantage of newly visible opportunities. 

EdX Grover's Search Algorithm

Many industries—automotive, aerospace, healthcare, government, finance, manufacturing, and energy—have tough optimization problems where these quantum-inspired solutions can save time and money. And these solutions will only get more valuable when scaled quantum hardware becomes available and provides further acceleration.
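
To give a flavor of what a quantum-inspired optimizer does, the sketch below applies plain simulated annealing to a toy load-matching problem (assigning loads to carriers at minimum cost). It is a stand-in for illustration only, far simpler than the solvers Microsoft and Trimble actually use, and all names and numbers are made up.

```python
import math
import random

random.seed(7)

# Toy load-matching problem: cost[i][j] = cost of assigning load i to carrier j.
N_LOADS, N_CARRIERS = 6, 3
cost = [[random.uniform(1, 10) for _ in range(N_CARRIERS)] for _ in range(N_LOADS)]

def total_cost(assignment):
    return sum(cost[i][c] for i, c in enumerate(assignment))

def anneal(steps=5000, t_start=5.0, t_end=0.01):
    """Plain simulated annealing: accept worse moves with a shrinking probability."""
    current = [random.randrange(N_CARRIERS) for _ in range(N_LOADS)]
    current_cost = total_cost(current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)   # geometric cooling
        candidate = list(current)
        candidate[random.randrange(N_LOADS)] = random.randrange(N_CARRIERS)
        delta = total_cost(candidate) - current_cost
        if delta < 0 or random.random() < math.exp(-delta / t):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost

if __name__ == "__main__":
    assignment, final_cost = anneal()
    print("best assignment (load -> carrier):", assignment)
    print("total cost:", round(final_cost, 2))
```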

Building a bridge to the future of supercomputing with quantum acceleration

Using supercomputing and new tools for understanding quantum algorithms in advance of scaled hardware gives us a view of what may be possible in a future with scaled quantum computing. Microsoft’s new Quantum Intermediate Representation (QIR), designed to bridge different languages and different target quantum computation platforms, is bringing us closer to that goal. Several Department of Energy (DOE) national laboratories are using this Microsoft technology in their research at the new National Quantum Initiative (NQI) quantum research centers.

As quantum computing capabilities mature, we expect most large-scale quantum applications will take full advantage of both classical and quantum computing resources working together. QIR provides a vital bridge between these two worlds by providing full capabilities for describing rich classical computation fully integrated with quantum computation.

QIR is central to a new collaboration between Microsoft and DOE’s Pacific Northwest National Laboratory (PNNL) born out of NQI’s Quantum Science Center (QSC) led by DOE’s Oak Ridge National Laboratory (ORNL). The goal of the PNNL project is to measure the impact of noisy qubits on the accuracy of quantum algorithms, specifically the Variational Quantum Eigensolver (VQE). In order to run it in simulation on the supercomputer, they needed a language to write the algorithm, and another representation to map it to run on the supercomputer. PNNL used Microsoft’s Q# language to write the VQE algorithm and then QIR provides the bridge, allowing easy translation and mapping to the supercomputer for the simulation.

The PNNL team is showcasing the simulation running on ORNL’s Summit supercomputer at this week’s virtual International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20). You can view their presentation here: Running Quantum Programs at Scale through an Open-Source, Extensible Framework.

Q# and QIR are also helping to advance research at ORNL, which is accelerating progress by enabling the use of the Q# language for all QSC members, including four national labs, three industry partners, and nine universities. ORNL is integrating Q# and QIR into its existing quantum computing framework, so ORNL researchers can run Q# code on a wide variety of targets including both supercomputer-based simulators and actual hardware devices. Supporting Q# is important to ORNL’s efforts to encourage experimentation with quantum programming in high-level languages.

The ORNL team is using QIR to develop quantum optimizations that work for multiple quantum programming languages. Having a shared intermediate representation allows the team to write optimizations and transformations that are independent of the original programming language. ORNL chose to use QIR because, being based on the popular LLVM suite, it integrates seamlessly with ORNL’s existing platform and provides a common platform that can support all of the different quantum and hybrid quantum/classical programming paradigms.

Shifting left to scale up: shortening the road to scalable quantum computing | Quantum Week 2021

Since QIR is based on the open source LLVM intermediate language, it will enable the development of a broad ecosystem of software tools around the Q# language. The community can use QIR to experiment and develop optimizations and code transformations that will be crucial for unlocking quantum computing.

Microsoft technology is playing a crucial role in DOE’s NQI initiative connecting experts in industry, national labs, and academia to accelerate our nation’s progress towards a future with scaled quantum computing.

Learn more about the latest developments in quantum computing from Microsoft and our QSC national lab partner PNNL in these virtual SC20 conference sessions.

Complex quantum programs will require programming frameworks with many of the same features as classical software development, including tools to visualize the behavior of programs and diagnose issues. The Microsoft Quantum team presents new visualization tools being added to the Microsoft Quantum Development Kit (QDK) for visualizing the execution flow of a quantum program at each step during its execution. These tools are valuable for experienced developers and researchers as well as students and newcomers to the field who want to explore and understand quantum algorithms interactively.

Dr. Krysta Svore, Microsoft’s General Manager of Quantum Systems and Software, is on this year’s exotic system panel. The SC20 panel will discuss predictions from past year sessions, what actually happened, and predict what will be available for computing systems in 2025, 2030 and 2035.

As quantum computers evolve, simulations of quantum programs on classical computers will be essential in validating quantum algorithms, understanding the effect of system noise and designing applications for future quantum computers. In a paper presented at SC20, PNNL researchers first propose a new multi-GPU programming methodology which constructs a virtual BSP machine on top of modern multi-GPU platforms, and apply this methodology to build a multi-GPU density matrix quantum simulator. Their simulator is more than 10x faster than a corresponding state-vector quantum simulator on various platforms.
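
Part of what makes such simulations demanding is sheer memory: a state vector needs 2^n complex amplitudes while a density matrix needs 4^n. The quick calculation below assumes 16 bytes per complex amplitude; the figures are illustrative and are not taken from the PNNL paper.

```python
BYTES_PER_AMPLITUDE = 16  # one complex double: 2 x 8 bytes (assumed)

def state_vector_bytes(n_qubits):
    """A pure-state simulator stores 2**n complex amplitudes."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE

def density_matrix_bytes(n_qubits):
    """A density-matrix simulator stores a 2**n x 2**n matrix, i.e. 4**n entries."""
    return (4 ** n_qubits) * BYTES_PER_AMPLITUDE

def human(nbytes):
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if nbytes < 1024:
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1024
    return f"{nbytes:.1f} EB"

if __name__ == "__main__":
    for n in (10, 20, 30):
        print(f"{n} qubits: state vector {human(state_vector_bytes(n))}, "
              f"density matrix {human(density_matrix_bytes(n))}")
```

The density-matrix representation captures noise effects that a pure state vector cannot, which is exactly why it is both useful and so much more memory-hungry.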

Full stack ahead: Pioneering quantum hardware allows for controlling up to thousands of qubits at cryogenic temperatures

Quantum computing offers the promise of solutions to previously unsolvable problems, but in order to deliver on this promise, it will be necessary to preserve and manipulate information that is contained in the most delicate of resources: highly entangled quantum states. One thing that makes this so challenging is that quantum devices must be ensconced in an extreme environment in order to preserve quantum information, but signals must be sent to each qubit in order to manipulate this information—requiring, in essence, an information superhighway into this extreme environment. Both of these problems must, moreover, be solved at a scale far beyond that of present-day quantum device technology.

Microsoft’s David Reilly, leading a team of Microsoft and University of Sydney researchers, has developed a novel approach to the latter problem. Rather than employing a rack of room-temperature electronics to generate voltage pulses to control qubits in a special-purpose refrigerator whose base temperature is 20 times colder than interstellar space, they invented a control chip, dubbed Gooseberry, that sits next to the quantum device and operates in the extreme conditions prevalent at the base of the fridge. They’ve also developed a general-purpose cryo-compute core that operates at the slightly warmer temperatures comparable to that of interstellar space, which can be achieved by immersion in liquid Helium. This core performs the classical computations needed to determine the instructions that are sent to Gooseberry which, in turn, feeds voltage pulses to the qubits. These novel classical computing technologies solve the I/O nightmares associated with controlling thousands of qubits.

Quantum Algorithms for Hamiltonian Simulation | Quantum Colloquium

Quantum computing could impact chemistry, cryptography, and many more fields in game-changing ways. The building blocks of quantum computers are not just zeroes and ones but superpositions of zeroes and ones. These foundational units of quantum computation are known as qubits (short for quantum bits). Combining qubits into complex devices and manipulating them can open the door to solutions that would take lifetimes for even the most powerful classical computers.

Despite the unmatched potential computing power of qubits, they have an Achilles’ heel: great instability. Since quantum states are easily disturbed by the environment, researchers must go to extraordinary lengths to protect them. This involves cooling them nearly down to absolute zero temperature and isolating them from outside disruptions, like electrical noise. Hence, it is necessary to develop a full system, made up of many components, that maintains a regulated, stable environment. But all of this must be accomplished while enabling communication with the qubits. Until now, this has necessitated a bird’s nest-like tangle of cables, which could work for limited numbers of qubits (and, perhaps, even at an “intermediate scale”) but not for large-scale quantum computers.

Azure Quantum Developer Workshop | Part 3

Microsoft Quantum researchers are playing the long game, using a holistic approach to aim for quantum computers at the larger scale needed for applications with real impact. Aiming for this bigger goal takes time, forethought, and a commitment to looking toward the future. In that context, the challenge of controlling large numbers of qubits looms large, even though quantum computing devices with thousands of qubits are still years in the future.

Enter the team of Microsoft and University of Sydney researchers, headed by Dr. David Reilly, who have developed a cryogenic quantum control platform that uses specialized CMOS circuits to take digital inputs and generate many parallel qubit control signals—allowing scaled-up support for thousands of qubits—a leap ahead from previous technology. The chip powering this platform, called Gooseberry, resolves several issues with I/O in quantum computers by operating at 100 milliKelvin (mK) while dissipating sufficiently low power so that it does not exceed the cooling power of a standard commercially-available research refrigerator at these temperatures. This sidesteps the otherwise insurmountable challenge of running thousands of wires into a fridge.

Harnessing the problem-solving power of quantum computing

Their work is detailed in a paper published in Nature this month, called “A Cryogenic Interface for Controlling Many Qubits.” They’ve also extended this research to create the first-of-its-kind general-purpose cryo-compute core, one step up the quantum stack. This operates at around 2 Kelvin (K), a temperature that can be reached by immersing it in liquid Helium. Although this is still very cold, it is 20 times warmer than the temperatures at which Gooseberry operates and, therefore, 400 times as much cooling power is available. With the luxury of dissipating 400 times as much heat, the core is capable of general-purpose computing. Both visionary pieces of hardware are critical advances toward large-scale quantum computer processes and are the result of years of work.

Both chips help manage communication between different parts of a large-scale quantum computer—and between the computer and its user. They are the key elements of a complex “nervous system” of sorts to send and receive information to and from every qubit, but in a way that maintains a stable cold environment, which is a significant challenge for a large-scale commercial system with tens of thousands of qubits or more. The Microsoft team has navigated many hurdles to accomplish this feat.

The big picture: Topological quantum computing and the quantum stack

Quantum computing devices are often measured by how many qubits they contain. However, all qubits are not created equal, so these qubit counts are often apples-to-oranges comparisons. Microsoft Quantum researchers are pioneering the development of topological qubits, which have a high level of error protection built in at the hardware level. This reduces the overhead needed for software-level error correction and enables meaningful computations to be done with fewer physical qubits.

Although this is one of the unique features of Microsoft’s approach, it is not the only one. In the quantum stack, qubits make up its base. The quantum plane (at the bottom of Figure 1) is made up of a series of topological qubits (themselves made up of semiconductors, superconductors, and dielectrics), gates, wiring, and other packaging that help to process information from raw qubits. The vital processes of communication occur in the next layer higher in the stack (labeled “Quantum-Classical Interface” in Figure 1 above). The Gooseberry chip and cryo-compute core work together to bookend this communication. The latter sits at the bottom of the “Classical Compute” portion of the stack, and Gooseberry is unique relative to other control platforms in that it sits right down with the qubits at the same temperature as the quantum plane—able to convert classical instructions from the cryo-compute core into voltage signals sent to the qubits.

Play it cool: Dissipating heat in a CMOS-based control platform

Why does it matter where the Gooseberry chip sits? It is partly an issue of heat. When the wires that connect the control chip to the qubits are long (as they would have to be if the control chip were at room temperature), significant heat can be generated inside the fridge. Putting a control chip near the qubits avoids this problem. The tradeoff is that the chip is now near the qubits, and the heat generated by the chip could potentially warm up the qubits. Gooseberry navigates these competing effects by putting the control chip near, but not too near, the qubits. By putting Gooseberry in the refrigerator but thermally isolated from the qubits, heat created by the chip is drawn away from the qubits and into the mixing chamber. (See Figure 2 below).

Placing the chip near the qubits at the quantum plane solves one set of problems with temperature but creates another. To operate a chip where the qubits are, it needs to function at the same temperature as the qubits—100 mK. Operating standard bulk CMOS chips at this temperature is challenging, so this chip uses fully-depleted silicon-on-insulator (FDSOI) technology, which optimizes the system for operation at cryogenic temperatures. It has a back-gate bias, with transistors having a fourth terminal that can be used to compensate for changes in temperature. This system of transistors and gates allows qubits to be calibrated individually, and the transistors send individualized voltages to each qubit.

Gates galore: No need for separate control lines from room temperature to every qubit

Another advantage of Gooseberry is that the chip is designed in such a way that the electrical gates controlling the qubits are charged from a single voltage source that cycles through the gates in a “round-robin” fashion, charging as necessary. Previous qubit controllers required one-to-one cables from multiple voltage sources at room temperature or 4K, compromising the ability to operate qubits at large scale. The design pioneered by Dr. Reilly’s team greatly reduces the heat dissipated by such a controller. The cryogenic temperatures also come into play here to make this possible—the extreme cold allows capacitors to hold their charge longer. This means that the gates need to be charged less frequently and produce less heat and other disruptions to qubit stability.

Azure Quantum Developer Workshop | July 2020

The Gooseberry chip is made up of both digital and analog blocks. Coupled digital logic circuits perform communication, waveform memory, and autonomous operation of the chip through a finite-state machine (FSM), and the digital part of the chip also includes a master oscillator (see Figure 3). The chip also uses a Serial Peripheral Interface (SPI) for easy communication higher up the quantum stack. The analog component of the chip is a series of cells, called “charge-lock fast-gate” (CLFG) cells, that perform two functions. First, the charge-lock function is the process for charging gates, as described above. The voltage stored on each gate is tailored to individual qubits. Information is processed in qubits by changing the voltages on the gate, and that happens in the second function, “fast-gating.” This creates pulses that physically manipulate the qubits, ultimately directing the processing of information in the qubits.

Benchmarking results of the cryo-CMOS control with a quantum dot chip

Low power dissipation is a key challenge when it comes to communicating with qubits efficiently via these pulses. There are three variables that impact power dissipation: voltage level, frequency, and capacitance. The voltage needed in this case is set by the qubit, and the frequency is set by both the qubit and clock rate of the quantum plane. This leaves capacitance as the only variable you can adjust to create low power dissipation when charging gates and sending pulses—low capacitance means low dissipation. The capacitors in this system are tiny, spaced close together, and are very near the quantum plane, so they require as little power as possible to shuffle charge between capacitors to communicate with the qubits.
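The tradeoff described here is the familiar switched-capacitance relation for dynamic power, roughly P ≈ N·C·V²·f for N gates. The short sketch below is only a back-of-the-envelope illustration of that scaling; the component values are hypothetical placeholders, not figures from the paper.

```python
# Back-of-the-envelope dynamic power estimate for charging qubit gates.
# P ≈ N * C * V^2 * f  (switched-capacitance dissipation).
# All numbers below are illustrative placeholders, not values from the paper.

def gate_drive_power(num_gates: int, capacitance_f: float,
                     voltage_v: float, frequency_hz: float) -> float:
    """Estimate dissipation (watts) for charging `num_gates` gate capacitors."""
    return num_gates * capacitance_f * voltage_v**2 * frequency_hz

if __name__ == "__main__":
    # Hypothetical values: 1,000 gates, 10 fF per gate, 0.1 V swing, 1 MHz refresh.
    p = gate_drive_power(num_gates=1_000, capacitance_f=10e-15,
                         voltage_v=0.1, frequency_hz=1e6)
    print(f"Estimated dissipation: {p * 1e9:.2f} nW")
    # Cooling power at 100 mK is scarce, so keeping C (and hence P) small
    # is what makes operating the controller next to the qubits viable.
```

Since voltage and frequency are dictated by the qubits, shrinking the capacitance is the one lever left, which is why the design packs tiny capacitors close to the quantum plane.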

Disentangling hype from reality: Achieving practical quantum advantage

The researchers tested the Gooseberry chip to see how it would perform by connecting it to a GaAs-based quantum dot (QD) device. Some of the gates in the quantum dot device were connected to a digital-to-analog converter (DAC) at room temperature to compare the results with standard control approaches. Power leakage from the CLFG cells is measured by a second quantum dot in the device, and measurements of the QD conductance provide a way to monitor the charge-locking process. The temperature of all the components of the chip is measured as the control chip is powered up, revealing that it stays below 100 mK within the necessary range of frequencies, or clock speeds (see Figure 4). See the paper for more details on the benchmarking process.

Extrapolating these results, the researchers estimated the total system power needed for the Gooseberry control chip as a function of frequency and the number of output gates. These results take into account both the clock speed and temperature needed for topological qubits, and Figure 5 shows that this chip is able to operate within the acceptable limits while communicating with thousands of qubits. This CMOS-based control approach also appears feasible for qubit platforms based on electron spins or gatemons.

Proof of principle that general-purpose compute is possible at cryogenic temperatures

The general-purpose cryo-compute core is a more recent development that continues the progress made with Gooseberry. It is a general-purpose CPU operating at cryogenic temperatures. At present, the core operates at approximately 2 K, and it handles some triggering as well as manipulation and handling of data. With fewer limitations from temperature, it also deals with branching decision logic, which requires more digital circuit blocks and transistors than Gooseberry has. The core acts as an intermediary between Gooseberry and executable code written by developers, allowing for software-configurable communication between the qubits and the outside world. This technology proves it is possible to compile and run many different types of code (written with current tools) in a cryogenic environment, greatly expanding what can be accomplished with qubits controlled by the Gooseberry chip.

Journey before destination: The zen behind the Microsoft approach to quantum computers

Trapped-ion qubit, the maglev train of a quantum computer

There’s no doubt that both Gooseberry and the cryo-compute core represent big steps forward for quantum computing, and having these concepts peer-reviewed and validated by other scientists is another leap ahead. But there are still many more leaps needed by researchers before a meaningful quantum computer can be realized. This is one of the reasons Microsoft has chosen to focus on the long game. While it might be nice to ramp up one aspect of quantum computers—such as the number of qubits—there are many concepts to be developed beyond the fundamental building blocks of quantum computers, and researchers at Microsoft Quantum and the University of Sydney aren’t stopping with these results.

The Transmon qubit | QuTech Academy

 

Projects like the Gooseberry chip and cryo-compute core take years to develop, but these researchers aren’t waiting to put new quantum projects into motion. The idea is to keep scaffolding prior work with new ideas so that all of the components necessary for quantum computing at large scale will be in place, enabling Microsoft to deliver solutions to many of the world’s most challenging problems.

More Information

https://cloudblogs.microsoft.com/quantum/2018/05/16/achieving-scalability-in-quantum-computing/

https://cloudblogs.microsoft.com/quantum/2018/06/06/the-microsoft-approach-to-quantum-computing/

https://cloudblogs.microsoft.com/quantum/2021/10/07/the-azure-quantum-ecosystem-expands-to-welcome-qiskit-and-cirq-developer-community/

https://news.microsoft.com/europe/2018/09/24/microsoft-and-the-university-of-copenhagen-are-building-the-worlds-first-scalable-quantum-computer/

https://www.microsoft.com/en-us/research/research-area/quantum-computing/?facet%5Btax%5D%5Bmsr-research-area%5D%5B%5D=243138&facet%5Btax%5D%5Bmsr-content-type%5D%5B%5D=post

https://azure.microsoft.com/en-us/resources/whitepapers/search/?term=quantum

https://azure.microsoft.com/en-us/solutions/quantum-computing/#news-blogs

https://sc20.supercomputing.org/

https://www.microsoft.com/en-us/research/research-area/quantum-computing/?facet%5Btax%5D%5Bmsr-research-area%5D%5B0%5D=243138&sort_by=most-recent

https://www.microsoft.com/en-us/research/blog/state-of-the-art-algorithm-accelerates-path-for-quantum-computers-to-address-climate-change/

https://www.microsoft.com/en-us/research/blog/full-stack-ahead-pioneering-quantum-hardware-allows-for-controlling-up-to-thousands-of-qubits-at-cryogenic-temperatures/

https://arxiv.org/abs/2007.14460

https://www.microsoft.com/en-us/research/publication/quantum-computing-enhanced-computational-catalysis/

https://ionq.com/

https://www.honeywell.com/us/en/company/quantum

https://www.honeywell.com/us/en/news/2020/06/quantum-scientific-papers












Linux Kernel 30 Years


 

The Linux Kernel celebrates its 30th anniversary and it still has a lot to give

At the beginning of the month we published a note on the 30th anniversary of the first website, a milestone that undoubtedly made history and one I have always related to Linux, since the publication of the first website and the first prototype of the Linux kernel go hand in hand: both appeared in the same year.

On August 25, 1991, after five months of development, 21-year-old student Linus Torvalds announced in the comp.os.minix newsgroup that he was working on a prototype of a new operating system, Linux, for which the porting of bash 1.08 and gcc 1.40 had already been completed. The first public version of the Linux kernel was released on September 17.

Kernel 0.0.1 was 62 KB in compressed form and contained about 10 thousand lines of source code; today’s Linux kernel, by comparison, has more than 28 million lines of code.

According to a study commissioned by the European Union in 2010, developing a project similar to a modern Linux kernel from scratch would have cost more than a billion dollars (calculated when the kernel had 13 million lines of code); another estimate puts the figure at more than 3 billion.

30 Years of Linux 1991-2021

A bit about Linux

The Linux kernel was inspired by the MINIX operating system, whose restrictive license Linus disliked. Later, when Linux became a famous project, detractors tried to accuse Linus of directly copying code from some MINIX subsystems.

The accusation was rebutted by MINIX’s author, Andrew Tanenbaum, who commissioned a student to make a detailed comparison of the MINIX code with the first public versions of Linux. The study found only four negligible matching code blocks, all attributable to POSIX and ANSI C requirements.

Linus originally thought of calling the kernel Freax, from free, freak, and X (Unix). But the kernel got the name "Linux" thanks to Ari Lemmke, who, at Linus's request, put the kernel on the university's FTP server and named the directory containing the file not "freax," as Torvalds had asked, but "linux."

Notably, an enterprising businessman, William Della Croce, managed to trademark the name Linux and initially wanted to collect royalties, but later changed his mind and transferred all rights to the trademark to Linus. The official mascot of the Linux kernel, the Tux penguin, was selected through a competition held in 1996. The name Tux stands for Torvalds UniX.

Regarding the growth of the Kernel during the last 30 years:

  • 0.0.1 - September 1991, 10 thousand lines of code
  • 1.0.0 - March 1994, 176 thousand lines
  • 1.2.0 - March 1995, 311 thousand lines
  • 2.0.0 - June 1996, 778 thousand lines
  • 2.2.0 - January 1999, 1.8 million lines
  • 2.4.0 - January 2001, 3.4 million lines
  • 2.6.0 - December 2003, 5.9 million lines
  • 2.6.28 - December 2008, 10.2 million lines
  • 2.6.35 - August 2010, 13.4 million lines
  • 3.0 - August 2011, 14.6 million lines
  • 3.5 - July 2012, 15.5 million lines
  • 3.10 - July 2013, 15.8 million lines
  • 3.16 - August 2014, 17.5 million lines
  • 4.1 - June 2015, 19.5 million lines
  • 4.7 - July 2016, 21.7 million lines
  • 4.12 - July 2017, 24.1 million lines
  • 4.18 - August 2018, 25.3 million lines
  • 5.2 - July 2019, 26.55 million lines
  • 5.8 - August 2020, 28.4 million lines
  • 5.13 - June 2021, 29.2 million lines

While for the part of development and news:

  • September 1991: Linux 0.0.1, first public release that only supports i386 CPU and boots from floppy disk.
  • January 1992: Linux 0.12, the code began to be distributed under the GPLv2 license
  • March 1992: Linux 0.95, provided the ability to run the X Window System, support for virtual memory and partition swapping, and the first SLS and Yggdrasil distributions appeared.
  • In the summer of 1993, the Slackware and Debian projects were founded.
  • March 1994: Linux 1.0, first officially stable version.
  • March 1995: Linux 1.2, significant increase in the number of drivers, support for Alpha, MIPS and SPARC platforms, expansion of network stack capabilities, appearance of a packet filter, NFS support.
  • June 1996: Linux 2.0, support for multiprocessor systems.
  • January 1999: Linux 2.2, increased memory management system efficiency, added support for IPv6, implementation of a new firewall, introduced a new sound subsystem
  • February 2001: Linux 2.4, support for 8-processor systems and 64 GB of RAM, Ext3 file system, USB, ACPI support.
  • December 2003: Linux 2.6, SELinux support, automatic kernel tuning tools, sysfs, redesigned memory management system.
  • September 2008: the first version of the Android platform, based on the Linux kernel, appeared.
  • July 2011: after 10 years of development of the 2.6.x branch, the project switched to 3.x numbering.
  • 2015: Linux 4.0; the number of git objects in the repository reached 4 million.
  • April 2018: the repository passed the 6 million git objects mark.
  • January 2019: the Linux 5.0 kernel branch was formed.
  • August 2020: kernel 5.8 was the largest release in the project's history in terms of the number of changes.
  • 2021: code enabling drivers to be written in the Rust language was added to the linux-next branch.

The Linux kernel is one of the most popular operating system kernels in the world. Less than 30 years after its humble beginnings in 1991, the Linux kernel now underpins modern computing infrastructure, with 2019 estimates of the number of running Linux kernels ranging upwards of twenty billion. To put that in perspective: There are about 3 Linux kernels for every living person.



The Linux kernel powers household appliances, smartphones, industrial automation, Internet data centers, almost all of the cloud, financial services, and supercomputers. It even powers a few percent of the world’s desktop systems, including the one that I am typing these words into. But the year of the Linux desktop continues its decades-long tradition of being next year.

But it wasn’t always that way.

A brave choice, big commitment in Linux’s early days

The Linux kernel was still a brave choice when IBM joined the Linux community in the late 1990s. IBM began its Linux-kernel work with a skunkworks port of Linux to the IBM mainframe and a corporate commitment that resulted in IBM investing $1B in Linux in 2001. Linux was ported to all IBM servers, and even to IBM Research’s Linux-powered wristwatch. Linux soon enjoyed widespread use within IBM’s hardware, software, and services.

Of course, IBM wasn’t the only player betting on Linux. For example, an IBM sales team spent much time preparing to convince a long-standing, technically conservative client to start moving towards Linux. When the team went to give their pitch, the client opened the discussion with: “We have decided that we are going with Linux. Will you be coming with us?” Although this destroyed untold hours of preparation, it produced a result beyond the sales team’s wildest imaginations.

And it wasn’t an isolated incident.

Keynote: Linus Torvalds in conversation with Dirk Hohndel

Setting Linux up for success

This widespread enthusiasm motivated IBM not only to make substantial contributions to Linux, but also to come to its defense. First, we committed, in the form of patent pledges, not to attack Linux. We took it a step further and opted to co-found the Open Invention Network, which helped defend open source projects such as the Linux kernel against attacks by patent holders. We made numerous visits to the courtroom to defend ourselves against a lawsuit related to our Linux involvement, and co-founded several umbrella organizations to facilitate open source projects, perhaps most notably helping to found the Linux Foundation.

IBM is also a strong technical contributor to the Linux kernel, ranking in the top ten corporate contributors and having maintainers for a wide range of Linux-kernel subsystems. Of course, IBM contributes heavily to support its own offerings, but it is also a strong contributor in the areas of scalability, robustness, security, and other areas that benefit the Linux ecosystem.

Of course, not everything that IBM attempted worked out. IBM’s scalability work in the scheduler was never accepted into the Linux kernel. Although its journaling filesystem (JFS) was accepted and remains part of the Linux kernel, it seems safe to say that JFS never achieved the level of popularity that IBM had hoped for. Nevertheless, it seems likely that IBM’s efforts helped to inspire the work leading to the Linux kernel’s excellent scalability, features, and functionality in its filesystems and scheduler.

In addition, these experiences taught IBM to work more closely with the community, paving the way to later substantial contributions. One example is the CPU-groups feature of the community’s scheduler that now underpins containers technologies such as Docker, along with the virtio feature that plays into the more traditional hypervisor-based virtualization. Another example is numerous improvements leading up to the community’s EXT4 filesystem. A final example is the device-tree hardware specification feature, originally developed for IBM’s Power servers but now also used by many embedded Linux systems.

Celebrating 30 Years of Open

Achieving impossible results

It has also been a great privilege for IBM to be involved in a number of Linux-kernel efforts that produced results widely believed to be impossible.

First, at the time that IBM joined the Linux kernel community, the kernel could scale to perhaps two or four CPUs. At the time there was a large patchset from SGI that permitted much higher scalability, but this patchset primarily addressed HPC workloads. About ten years of hard work across the community changed this situation dramatically, so that, despite the naysayers, the same Linux-kernel source code supports both deep embedded systems and huge servers with more than one thousand CPUs and terabytes of memory.

Second, it was once common knowledge that achieving sub-millisecond response times required a special-purpose, real-time operating system. In other words, sub-millisecond response times certainly could not be achieved by a general-purpose operating system such as the Linux kernel. IBM was an important part of a broad community effort that proved this wrong, as part of an award-winning effort including Raytheon and the US Navy. Although the real-time patchset has not yet been fully integrated into the mainline Linux kernel, it achieves not merely deep sub-millisecond response times, but deep sub-hundred-microsecond response times. And these response times are achieved not only on single-CPU embedded systems, but also on systems with thousands of CPUs.

Third, only about a decade ago, it was common knowledge that battery-powered embedded systems required special-purpose operating systems. You might be surprised that IBM would be involved in kernel work in support of such systems. One reason for IBM’s involvement was that some of the same code that improves battery lifetime also improves the Linux kernel’s virtualization capabilities — capabilities important to the IBM mainframe. A second reason for IBM’s involvement was the large volume of ARM chips then produced by its semiconductor technology partners. This latter reason motivated IBM to cofound the Linaro consortium, which improved Linux support for ARM’s processor families. The result, as billions of Android smartphone users can attest, is that the Linux kernel has added battery-powered systems to its repertoire.

Fourth and finally, version 5.2 of the Linux kernel comprises 13,600 changes from 1,716 kernel developers. The vast majority of these changes were applied during the two-week merge window immediately following the release of version 5.1, with only a few hundred new changes appearing in each of the release candidates that appear at the end of each weekend following the merge window. This represents a huge volume of changes from a great many contributors, and with little formal coordination. Validating these changes is both a challenge and a first-class concern.
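Figures like these can be reproduced directly from the kernel’s git history. The sketch below assumes it is run inside a local clone of the mainline repository with the v5.1 and v5.2 tags available; raw commit counts may not exactly match the community’s own "changes" statistics, which are tallied differently.

```python
# Count changes and contributors between two kernel releases.
# Assumes this script runs inside a clone of the mainline Linux repository
# that has the v5.1 and v5.2 tags available.
import subprocess

def git(*args: str) -> str:
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout.strip()

commit_count = git("rev-list", "--count", "v5.1..v5.2")
contributors = git("shortlog", "-s", "-n", "v5.1..v5.2").splitlines()

print(f"Commits between v5.1 and v5.2: {commit_count}")
print(f"Distinct authors: {len(contributors)}")
print("Top five contributors:")
for line in contributors[:5]:
    print("  " + line.strip())
```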

One of IBM’s contributions to validation is “-next” integration testing, which checks for conflicts among the contributions intended for the next merge window. The effect of -next integration testing, combined with a number of other much-appreciated efforts throughout the community, has not been subtle. Ten years ago, serious kernel testing had to wait for the third or fourth release candidate due to bugs introduced during the preceding merge window. Today, serious kernel testing can almost always proceed with the first release candidate that comes out immediately at the close of the merge window.

But is Linux done yet?

Not yet.

Happy Birthday Linux! Plus Highlights of DLN MegaFest Celebrating 30 Years of Linux!

A continuing effort

Although developers should be proud of the great increases in the stability of the Linux kernel over the years, the kernel still has bugs, some of which are exploitable. There is a wide range of possible improvements, from more aggressively applying tried-and-true testing techniques to over-the-horizon research topics such as formal verification. In addition, existing techniques are finding new applications, so that CPU hotplug (which was spearheaded by IBM in the early 2000s) has recently been used to mitigate hardware side-channel attack vectors.

The size of hardware systems is still increasing, which will require additional work on scalability. Many of these larger systems will be used in various cloud-computing environments, some of which will pose new mixed-workload challenges. Changing hardware devices, including accelerators and non-volatile memory, will require additional support from Linux, as well as from hardware features such as IBM’s Power Systems servers’ support of the NVLink and CAPI interconnects.

Finally, there is more to security than simply fixing bugs faster than attackers can exploit them (though that would be challenge enough!). Although there is a great deal of security work needed in a great many areas, one important advance is Pervasive Encryption for IBM Z.

IBM congratulates the Linux kernel community on its excellent progress over the decades, and looks forward to being part of future efforts overturning yet more morsels of common wisdom!

Linux Kernel Internals

What 30 Years of Linux Taught the Software Industry

Linux has become the largest collaborative development project in the history of computing over the last 30 years. Reflecting on what made this possible and how its open source philosophy finally imposed itself in the industry can offer software vendors valuable lessons from this amazing success story.

The web may not have reached full adulthood yet, but it has already crafted its own mythology.

August 25, 1991: Linus Torvalds, a 21-year-old university student from Finland, writes a post to a Usenet group: “Hello everybody out there using minix — I’m doing a (free) operating system (just a hobby, won’t be big and professional like gnu) for 386 (486) AT clones […]”. A few weeks later, the project, which will eventually be known as Linux, is published for the first time.

This is the starting point of an epic that few could have foreseen.

Fast-forward 30 years and the Linux kernel isn’t only running on most of the web servers and smartphones around the globe, but it also supports virtually all of the much more recent cloud infrastructure. Without open source programs like Linux, cloud computing wouldn’t have happened.

Among the major factors that propelled Linux to success is security. Today, the largest software companies in the world are taking open source security to new levels, but the Linux project was one of the first to emphasize this.

HotCloud '14 - Cloudy with a Chance of …

How Linux Became the Backbone of the Modern IT World

Brief History

Open source predates the Linux project by many years and is arguably as old as software itself. Yet it is the success of the latter that propelled the movement in the 1990s. When Torvalds first published it in 1991, the Linux kernel was the GNU project’s ‘missing link’ to a completely free software operating system, one that could be distributed and even sold without restrictions. In the following years, as the project started to incorporate proprietarily licensed components and grow in popularity, a clarification of the meaning of “free software” became necessary.

This led to the coining of the term “open source” as we use it today, thanks in part to Eric Raymond’s seminal paper The Cathedral and the Bazaar, a “reflective analysis of the hacker community and free software principles.” Open source was chosen to qualify software in which the source code is available to the general public for use or modification from its original design, depending on the terms of the license. People may then download, modify and publish their version of source code (fork) back to the community.

Open source projects started gaining traction in the late nineties thanks to the popularity of software like Apache HTTP Server, MySQL and PHP to run the first dynamic websites on the internet.

Facts and Figures

Today, not only is Linux powering most of the digital era, but open source has become the leading model for how we build and ship software. Though most people don’t realize it, much of the technology we rely on every day runs on free and open source software (FOSS). Phones, cars, planes and even many cutting-edge artificial intelligence programs use open source software. According to the Linux Foundation, 96.3% of the world’s top one million servers run on Linux and 95% of all cloud infrastructure operates on it. Other infrastructure also relies on open source: 70% of global mobile subscribers use devices running on networks built using ONAP (Open Network Automation Platform).

Linux adoption is very high in professional IT, where it’s become a de facto standard, especially with the advent of the cloud era. In fact, 83.1% of developers said Linux is the platform they prefer to work on. This success is due, in large part, to the community that contributed to its source code since its creation: More than 15,000 developers from more than 1,500 companies. Linux went on to become, arguably, the biggest success story of the free software movement, proving that open source could lead to the creation of software as powerful as any sold by a corporation.

The Linux Foundation, a non-profit technology consortium founded in 2000 to support the collaborative development of Linux and open source software projects, is itself a big success. It now has more than 100 projects under its umbrella, spread across technology sectors like artificial intelligence, autonomous vehicles, networking, and security. Several subset foundations have also emerged over the years, including the Cloud Foundry Foundation, the influential Cloud Native Computing Foundation, and the recently announced Open Source Security Foundation. The Foundation estimates the total shared value created from the collective contributions of its community at a whopping $54.1 billion.

All these achievements may not have been possible without the embrace of open source by the enterprise world, which may represent its biggest win.

New Generation of Mainframers - John Mertic, The Linux Foundation & Len Santalucia, Vicom Infinity

Enterprise Adoption

Companies began to realize that many open source projects were easier and cheaper to implement than asking their developers to build the basic pieces of an internet business over and over again from scratch.

Twenty years ago, most businesses ran atop proprietary software from Microsoft, Oracle and IBM, and the idea of collaborating on big software projects might have sounded laughable to them. Today, these companies, along with relative newcomers such as Google, Facebook and Amazon, are not only employing thousands of full-time contributors to work on open source projects like Linux, they also regularly choose to open source some of their state-of-the-art projects; from Google Brain’s machine learning platform TensorFlow and container orchestration platform Kubernetes to Facebook’s React.

There’s no question that open source software created a new wave of business opportunities. As more companies took an interest in open source projects, they realized they didn’t necessarily have the in-house expertise to manage those projects themselves and turned to startups and larger companies for help.

Even Microsoft, which famously warred against the very concept of Linux for nearly a decade, made a strategic shift to embrace open source in the 2010s, led by CEO Satya Nadella. The IT giant finally joined the Linux Foundation in 2016 and acquired GitHub, the largest host for open source projects, two years later. It has since become one of the biggest sponsors of open source projects.

As a consequence, the stakes have been raised for open source software, which is the engine powering the shift toward the cloud for virtually every company. In this context, security is becoming a topic of the utmost importance, and the commitment to secure the open source ecosystem is growing fast.

LINUX Kernel

Setting a Standard for Security and Trust

Open Source Security Issues

Following the OSS adoption boom, the sustainability, stability and security of these software packages is now a major concern for every company that uses them.

The Census II report on structural and security complexities in the modern-day supply chain “where open source is pervasive but not always understood” revealed two concerning trends that could make FOSS more vulnerable to security breaches. First, the report said it is common to see popular packages published under individual developers’ accounts, raising the issue of security and reliability. Second, it is very common to see outdated versions of open source programs in use, meaning they contain fewer security patches.

The 2021 OSSRA report agrees: “98% of the codebases audited over the past year contain at least one open source component, with open source comprising 75% of the code overall.” The report also noted that 85% of the audited codebases contained components “more than four years out of date”.

This highlights the mounting security risk posed by “unmanaged” open source: “84% of audited codebases containing open source components with known security vulnerabilities, up from 75% the previous year. Similarly, 60% of the codebases contained high-risk vulnerabilities, compared to 49% just 12 months prior.” Not only is the security posture affected, but there are also compliance issues that can arise from unsupervised integration of open source content because licenses can be conflicting or even absent.

Because large corporations are now a big part of the open source ecosystem, their sponsorship is a welcome source of financing for many people whose work had been done for free until now, yet it may not be enough. The open source community is well-known for its commitment to independence, its sense of belonging and its self-sufficiency, and expecting contributors to voluntarily address security issues is unlikely to succeed.

This is where the experience of building Linux over 30 years and coordinating the work of thousands of individual contributors may be an example to follow.

Linux on IBM Z and LinuxONE: What's New

Linux Foundations

In Linux kernel development, security is taken very seriously. Because the kernel is an underlying layer for so many public and private software ‘bricks’ in the digital world, any mistake can cost businesses millions, if not lives. Since the beginning, the project has adopted a decentralized development approach with a large number of contributors collaborating continuously, and it has consolidated a strong peer-review process as the community development effort grew and expanded.

The last stable release at the time of writing is 5.14, released on August 29th, 2021, only a few days before the 30th birthday of the project. The most important features in the release are security-related: One is intended to help mitigate processor-level vulnerabilities like Spectre and Meltdown and the other concerns system memory protection, which is a primary attack surface to exploit. Each Linux kernel release sees close to 100 new fixes per week committed by individuals and professionals from the likes of Intel, AMD, IBM, Oracle and Samsung.

With such broad adoption and a long history, the Linux project has reached a level of maturity that few, if any, other FOSS projects have seen. The review process and release model have built confidence among numerous downstream vendors. Although the world is not perfect and it is arguably difficult for them to keep up with such a high rate of change, they can at least benefit from strong security enforcement mechanisms and adapt their security posture in accordance with their “risk appetite”: vendors are able to do the calculus of determining how old a kernel they can tolerate exposing users to.

A maintainable, scalable, and verifiable SW architectural design model

Pushing the Boundaries of Open Source Security

Heartbleed and the Fragility of OS Security

In April 2014, a major security incident affecting the OpenSSL cryptography library was disclosed as “Heartbleed.” The developer who introduced the bug, who was working on the project with a handful of other engineers, acknowledged:

“I am responsible for the error because I wrote the code and missed the necessary validation by an oversight. Unfortunately, this mistake also slipped through the review process and therefore made its way into the released version.” OpenSSL, an open source project, is widely used to implement the Transport Layer Security (TLS) protocol. In other words, it’s a fundamental piece used to secure a large part of the web.

Open source was seen as fundamentally secure for a long time because the more people examine a line of code, the better the chances of spotting any weakness. Additionally, this model prevents “security by obscurity,” whereby the bulk of the protection comes from people not knowing how the security software works—which can result in the whole edifice tumbling down if that confidential information is released or discovered externally.

This incident was a major turning point for a large share of the biggest web corporations: They realized that many open source technologies underpinning their core operations could not be “assumed to be secure” anymore. Any human error could have huge implications; therefore, a specific effort had to be made to improve the security in this specific space.

Linux Kernel Development

A New Era for Open Source

As we advance in an era where open source is omnipresent in codebases, tooling, networks and infrastructure and is even in fields other than software, security awareness is starting to take hold. But it needs a lot more work.

A big part of the challenge, to begin with, is for the industry to understand the scope of the problem.

Google just announced that it will be committing “$100 million to support third-party foundations that manage open source security priorities and help fix vulnerabilities.”

The Secure Open Source (SOS) pilot program, run by the Linux Foundation, will reward developers for enhancing the security of critical open source projects that we all depend on.

In doing so, Google leads the way in enlarging the financial sponsorship of big players like companies and governments — which are increasingly sponsoring open source both directly and indirectly. However, they also recommend that organizations “understand the impact they have on the future of the FOSS ecosystem and follow a few guiding principles.”

What could these principles look like?

Modernizing on IBM Z Made Easier With Open Source Software

A Roadmap to Safely Use and Contribute to Open Source

The Linux Foundation has proposed a specific Trust and Security Initiative, which describes a collection of eight best practices (with three degrees of maturity) that open source teams should use to secure the software they produce, and that a larger audience can adopt to “raise the collective security bar.” Here they are:

  • Clarify roles and responsibilities, making sure everyone is aware of their security responsibilities across the organization.
  • Set up a security policy for everyone; in other words, a clear north star for all members of the organization.
  • “Know your contributors”: a set of practices for making risk-based decisions about whom to trust and for countering offensive techniques such as the poisoning of upstream code.
  • Lock down the software supply chain: it has become a preferred target, as adversaries clearly understand that they can have a bigger and more effective impact with less effort than by targeting individual systems.
  • Provide technical security guidance to narrow potential solutions down to the most appropriate ones in terms of security.
  • Deploy security playbooks that define how to run specific security processes, particularly incident response and vulnerability management, such as creating roles and responsibilities or publishing security policies. This may feel formal, antiquated, and old-school, but having pre-defined playbooks means teams can focus on shipping software rather than learning how to do security at the least convenient and most stressful time.

Securing Linux VM boot with AMD SEV measurement - Dov Murik & Hubertus Franke, IBM Research

  • Develop security testing techniques, with automated testing strongly recommended since it scales better, creates less friction and cost for teams, and aligns well with modern continuous delivery pipelines (a minimal example of such a pipeline gate follows below).
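To make that last recommendation concrete, here is a minimal, hedged sketch of a pipeline gate that runs two widely used open source scanners (Bandit for static analysis of Python code and pip-audit for known-vulnerable dependencies) and fails the build if either reports findings. The tool choices and the src path are illustrative assumptions, not part of the Linux Foundation guidance.

```python
# Minimal CI gate: run a SAST scan and a dependency audit, fail on findings.
# Tool names and paths are illustrative; substitute your organization's scanners.
import subprocess
import sys

CHECKS = [
    ["bandit", "-r", "src"],   # static analysis of Python sources under ./src
    ["pip-audit"],             # check installed dependencies for known CVEs
]

def main() -> int:
    failed = False
    for cmd in CHECKS:
        print(f"==> running: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed = True
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())
```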

However, the authors of the guide are aware that some major challenges are still facing the industry and, as such, need to be addressed. They mention:

  • The lack of open source security testing tools
  • The fact that open source package distribution is broken
  • The fact that the CVE format for vulnerability disclosure is also broken
  • The lack of a standard for a security build certificate, which would allow any consumer to transparently verify that a product or component complies with the announced specifications

“The types of verification can and should include the use of automated security tools like SAST, DAST and SCA, as well as verification of security processes like the presence of security readmes in repos and that security response emails are valid.”

A scheme like this could have a significant and lasting effect on the security quality of open source software and the internet at large.

The Linux project, born 30 years ago, is present in all layers of the modern software stack today. It is used by all the largest server clusters powering the modern web and any business going digital will use it at some point. This unparalleled longevity and success have demonstrated that the open source model was compatible with the requirements of enterprise-grade services and economically viable. Now that open source is all the rage in the software industry, a consensus and action plan on how to ensure the sustainability of this ecosystem becomes urgent. The top priority for businesses that depend on it is to adopt strong application security guidelines, like the ones promoted by the Linux Foundation, which have proven their value.

One last note on the nature of open source: As businesses are now much more bound by the common use of open source components to build upon, they should not fall into the “tragedy of the commons” trap. This would mean waiting until others take action; for instance, to improve the global software security landscape. This might be one of the biggest challenges confronting our highly collaborative industry.

More Information:

https://www.linuxadictos.com/en/hoy-el-kernel-de-linux-cumple-su-30-aniversario-y-aun-le-queda-mucho-por-dar.html

https://developer.ibm.com/blogs/ibm-and-the-linux-kernel/

https://devops.com/what-30-years-of-linux-taught-the-software-industry/

https://www.howtogeek.com/754345/linux-turns-30-how-a-hobby-project-conquered-the-world/


Cyber Ark Security and Ansible Automation


 


CyberArk offers enhanced end-to-end security for critical assets


As an established leader in privileged access management and identity security capabilities, CyberArk helps the world’s leading organizations secure their most critical digital assets. CyberArk partnered to build Red Hat certified integrations, which offer joint customers end-to-end security for Red Hat OpenShift Container Platform and Red Hat Ansible Automation Platform. This unique offering allows Red Hat and CyberArk to increase revenue and grow their accounts through more efficient and secure business solutions. 

Benefits

  • Increased revenue year over year
  • Opened access to customer’s wider IT organization to build new relationships
  • Enhanced security for users by automating credentials management
Protecting critical digital assets worldwide

For more than a decade, the world’s leading organizations have trusted CyberArk to help them secure their most critical digital assets. Today, the growing software security company protects more than 6,600 global businesses—including most of the Fortune 500—and a majority of the leading banks, insurance, pharmaceutical, energy, and manufacturing companies rely on CyberArk.

Red Hat Ansible Security Automation Overview

With its U.S. headquarters in Massachusetts and main office in Illinois, CyberArk offers customers solutions focused on privileged access management (PAM), identity security, and DevSecOps. More than 2,000 staff members, located in offices around the globe, help security leaders get ahead of cyber threats, specifically cyberattacks against an organization’s most critical assets. “Typically the most privileged users inside an organization have access to the most sensitive information,” said John Walsh, Senior Product Marketing Manager at CyberArk. But the lines between the trusted insider, third-party vendor, and outsiders have started to blur and even disappear as sophisticated supply chain attacks like SolarWinds materialize. “It’s zero-trust—you really can’t tell who the outsiders and the insiders are anymore.”

CyberArk is helping customers move their critical strategies forward more securely. Work from home dynamics and demand for efficiency have motivated companies to accelerate their digital transformations and cloud migration plans. And, with that, customers have a heightened sense of urgency around CyberArk’s PAM and identity solutions. 

Offering customers end-to-end security for Ansible and OpenShift

A Red Hat partner since 2016, CyberArk is in the top four strategic security partners, and one of the highest revenue generating in the Global Partner Alliances (GPA) program, specifically in the security segment. GPA helps Red Hat partners build, expand, and sell software applications. As a result of the continued collaboration between the two organizations, CyberArk was recently awarded the Collaboration Technology Independent Software Vendor (ISV) Partner Of The Year, announced at Red Hat Summit 2021.

The partnership is not localized to a specific region—it covers North America, Europe, the Middle East and Africa, and Asia Pacific and Japan, across a range of sectors. Red Hat elevated CyberArk to Globally Managed Partner in 2018. “We have a dedicated Red Hat resource,” said Joanne Wu, VP of Business Development at CyberArk. “When Red Hat runs campaigns or events, or goes to market, we are often, if not always, one of the top partners approached for these invitation-only strategic initiatives.”

The partnership offers Red Hat customers enhanced security for Red Hat OpenShift and Red Hat Ansible Automation Platform. “Just like any good partnership, we complement and support each other as Red Hat is a market leader in container management and automation,” said Walsh, “while CyberArk is a market leader in privileged access management and identity security. Together, we not only help each other, but we also offer a better solution to our customers.” 

Red Hat and CyberArk collaborate on rich content such as whitepapers, a hands-on workshop that shows how the technologies integrate, and videos to increase customer skill levels.

CyberArk Secrets Management in Red Hat OpenShift

Integrating leading solutions

Red Hat and CyberArk work together on Red Hat certified integrations to offer a solution that secures secrets and credentials in  Red Hat Ansible Automation Platform and within the DevOps environments of Red Hat OpenShift. Red Hat OpenShift is an enterprise-ready Kubernetes container platform with full-stack automated operations to manage hybrid cloud, multicloud, and edge deployments. “CyberArk secures application secrets and the access they provide for Red Hat technologies, rotating them, auditing, and authenticating access according to best practices,” said Walsh.

CyberArk’s Conjur provides a comprehensive, centralized solution for securing credentials and secrets for applications, containers, and continuous integration and continuous delivery (CI/CD) tools across native cloud and DevOps environments. CyberArk Conjur integrates with Red Hat OpenShift to provide ways to simplify and strengthen security by safeguarding the credentials used by applications running in OpenShift containers.

AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer and OpenSCAP

CyberArk and Red Hat provide more than 10 integrations to enhance security and protect automation environments for Red Hat OpenShift and Red Hat Ansible Automation Platform. CyberArk makes these available as certified integrations on its marketplace, empowering DevOps and security teams to automatically secure and manage the credentials and secrets used by IT resources and CI/CD tools. 

These integrations simplify how operations teams write and use playbooks to more securely access credentials. Credentials are centrally managed and secured by CyberArk. Secrets used by Ansible Playbooks are automatically secured and rotated by CyberArk based on the organization’s policy.
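To make the run-time flow concrete, the sketch below shows how an automation task might fetch a credential from a Conjur-style REST endpoint at execution time instead of hardcoding it in a playbook. The host name, account, machine identity, and variable path are invented placeholders, and in practice joint customers would use the certified Ansible collection or lookup plugin rather than raw HTTP calls; this is only an illustration of the pattern.

```python
# Minimal sketch: fetch a secret from a Conjur-style REST API at run time
# instead of hardcoding it in a playbook. The host, account, login, and
# variable path below are illustrative placeholders, not a real deployment.
import base64
import os
import urllib.parse

import requests

CONJUR_URL = "https://conjur.example.com"   # placeholder appliance URL
ACCOUNT = "myorg"                           # placeholder Conjur account
LOGIN = "host/ansible/demo"                 # placeholder machine identity
API_KEY = os.environ["CONJUR_API_KEY"]      # injected by the environment, never committed

def get_secret(variable_id: str) -> str:
    # 1. Authenticate: exchange the API key for a short-lived access token.
    auth_url = (f"{CONJUR_URL}/authn/{ACCOUNT}/"
                f"{urllib.parse.quote(LOGIN, safe='')}/authenticate")
    token_resp = requests.post(auth_url, data=API_KEY, timeout=10)
    token_resp.raise_for_status()
    token_b64 = base64.b64encode(token_resp.content).decode()

    # 2. Retrieve the secret value, presenting the token in the header.
    secret_url = (f"{CONJUR_URL}/secrets/{ACCOUNT}/variable/"
                  f"{urllib.parse.quote(variable_id, safe='')}")
    resp = requests.get(secret_url,
                        headers={"Authorization": f'Token token="{token_b64}"'},
                        timeout=10)
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    # Use the value immediately; never write it to disk or logs.
    db_password = get_secret("ansible/db/password")
    print("Retrieved a secret of length", len(db_password))
```

Because the secret is resolved only at run time and rotated centrally, the playbook itself never needs to embed a credential.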

Building a strong alliance: Red Hat and CyberArk increase revenue through partnership

Increased revenue year over year

Red Hat and CyberArk increase revenue for each other through their partnership, with revenue growing year over year. “CyberArk influences a Red Hat deal being closed and, vice versa, Red Hat helps CyberArk to find opportunities and close deals,” said Wu. “Both companies benefit from the value proposition. It’s a true win-win.” 

By mutually developing their pipeline over the years, both Red Hat and CyberArk have witnessed exponential growth in the number of accounts where they jointly present their value proposition.

Shifting Security Left: Streamlining Enterprise Secrets Management With CyberArk & Red Hat OpenShift

Opened access to the wider organization

Red Hat helps CyberArk gain access to the DevOps team, and CyberArk helps Red Hat gain access to security teams. “CyberArk is mostly speaking to the security teams, all the way up to the CSO [Chief Security Officer],” said Wu. “Red Hat has given us visibility to the infrastructure side of the house.”

Most importantly, the partnership with Red Hat helps CyberArk build relationships with DevOps teams using Ansible Automation Platform for their CI/CD pipeline, and looking for security solutions. CyberArk is then able to include security solutions with those DevOps projects. “Red Hat has helped CyberArk reach the IT organization,” said Walsh. “Red Hat enables CyberArk to provide our security solutions and Red Hat integrations as a stronger solution, to raise awareness, and to expand our market reach.”

Stayed aware of the latest developments

CyberArk’s close relationship with Red Hat means it is always fully informed about how Red Hat technologies are evolving, and, with that, it can ensure its security solutions are always fully aligned with new Red Hat features and products. “Having visibility into the Red Hat Ansible Automation Platform roadmap means we can stay ahead while developing our integrations,” said Wu.

When Red Hat released Ansible security automation, CyberArk was one of the first ISVs to develop an integration. And when Ansible Automation Platform first included collections, CyberArk quickly packaged its collection to ensure it was available on Ansible Automation Hub.

Container Technologies and Transformational value

Enhanced security for users

The partnership ensures customers get a more efficient and hardened implementation, whether with Red Hat OpenShift or Red Hat Ansible Automation Platform. 

Joint customers can find CyberArk’s Red Hat Certified integrations on the Red Hat Ecosystem Catalog and Ansible Automation Hub. CyberArk also has native integration with Ansible Automation Platform, built in at the product level.

The integrations are not only free but also jointly supported by both Red Hat and CyberArk. Customers do not need to invest any development resources because the integrations do not require any code.

Expanding on successes with Red Hat

Looking to the future, CyberArk is planning to build on its already strong partnership with Red Hat. “We’ve had a tremendous co-selling effort in the U.S. and EMEA [Europe, Middle East, and Africa], and I’d like to see that expand even more so to APJ [Asia Pacific and Japan] and South America,” said Wu. “And we’re also planning to get closer and increase reach in the public sector.” 

The security solutions company is also eager to expand its Red Hat Ansible Automation Platform integrations. CyberArk will soon be the first partner to develop a reference architecture with Ansible Automation Platform.

CyberArk is a leader in PAM and identity security. Red Hat is a leader in DevOps and hybrid cloud technology. Their strong alliance offers significant benefits and value for customers. “It’s really rewarding to see this win-win partnership between Red Hat and CyberArk that truly benefits both companies—and their customers,” said Wu. 

(OCB) Identity, Access and Security Management for DevOps: RedHat and CyberArk

The Inside Playbook

Automating Security with CyberArk and Red Hat Ansible Automation Platform

Proper privilege management is crucial with automation. Automation has the power to perform multiple functions across many different systems. When automation is deployed enterprise-wide, across sometimes siloed teams and functions, enterprise credential management can simplify adoption of automation: even complex authentication processes can be integrated into the setup seamlessly, while adding security to how those credentials are managed and handled.

Depending on how they are defined, Ansible Playbooks can require access to credentials and secrets that grant wide access to organizational systems. These credentials are necessary for the playbooks to reach systems and IT resources and accomplish their automation tasks, but they’re also a very attractive target for bad actors. In particular, they are tempting targets for advanced persistent threat (APT) intruders. Gaining access to these credentials could give an attacker the keys to the entire organization.

Introduction to Red Hat Ansible Automation Platform

Most breaches involve stolen credentials, and APT intruders prefer to leverage privileged accounts like administrators, service accounts with domain privileges, and even local admin or privileged user accounts.

You’re probably familiar with the traditional attack flow: compromise an environment, escalate privilege, move laterally, continue to escalate, then own and exfiltrate. It works, but it also requires a lot of work and a lot of time. According to the Mandiant Report, median dwell time for an exploit, while well down from over 400 days in 2011, remained over 50 days in 2019. However, if you can steal privileged passwords or the API keys to a cloud environment, the next step is complete compromise. Put yourself into an attacker’s shoes: what would be more efficient? 

While Ansible Tower, one of the components of Red Hat Ansible Automation Platform, introduced built-in credentials and secret management capabilities, some may have the need for tighter integration with the enterprise management strategy. CyberArk works with Ansible Automation Platform, automating privileged access management (PAM), which involves the policies, processes and tools that monitor and protect privileged users and credentials.

Getting Started with OpenShift 4 Security

Why Privileged Access Management Matters

Technologies like cloud infrastructure, virtualization and containerization are being adopted by organizations and their development teams alongside DevOps practices that make the need for security practices based on identity and access management critical. Identity and access management isn't just about employees; it includes managing secrets and access granted to applications and infrastructure resources as well.

A PAM solution ideally handles the following key tasks for your organization:

  • Continuously scan an environment to detect privileged accounts and credentials.
  • Add accounts to a pending list to validate privileges.
  • Perform automated discovery of privileged accounts and credentials.
  • Provide protected control points to prevent credential exposure and isolate critical assets.
  • Record privileged sessions for audit and forensic purposes.
  • View privileged activity by going directly to specified activities and even keystrokes.
  • Detect anomalous behavior aiming to bypass or circumvent privileged controls, and alert SOC and IT admins to such anomalies.
  • Suspend or terminate privileged sessions automatically based on risk score and activity type.
  • Initiate automatic credential rotation based on risk in the case of compromise or theft.

The common theme in the preceding functions is automation. There’s a reason for that: Automation is not just a “nice to have” feature. It’s absolutely essential to PAM. Large organizations may have thousands of resources that need privileged access, and tens of thousands of employees who may need various levels of privilege to get their work done. Even smaller organizations need to monitor and scale privileged access as they grow. Automated PAM solutions handle the trivial aspects of identity and access management so your team can focus on business goals and critical threats. 

WebLogic Continuous Deployment with Red Hat Ansible Automation Platform

Automation is what you use to:

  • Onboard and discover powerful secrets, where you auto-discover secrets, put them in a designated vault and trigger rotation, just to be on the safe side.
  • Apply compliance standards, such as auto-disabling certain network interfaces. 
  • Harden devices via OS- and network-level controls, like blocking SSH connections as root (see the sketch after this list).
  • Track and maintain configurations.
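
To ground the hardening item above, here is a minimal sketch of an Ansible play that disables direct root logins over SSH; the play structure and host group are illustrative assumptions, while the lineinfile and service modules and the sshd_config path follow standard Ansible and OpenSSH conventions.

- name: Harden SSH by disabling direct root logins (illustrative sketch)
  hosts: all
  become: true
  tasks:
    - name: Ensure PermitRootLogin is set to "no"
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PermitRootLogin'
        line: 'PermitRootLogin no'
        validate: /usr/sbin/sshd -t -f %s   # reject the change if the config would break sshd
      notify: Restart sshd
  handlers:
    - name: Restart sshd
      ansible.builtin.service:
        name: sshd
        state: restarted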

And, of course, automation becomes indispensable in the remediation and response (R&R) stage. When you’re under attack, the absolute worst-case scenario is having to undertake manual R&R. We’ve seen many times, as you probably have, that it puts security and operations teams at odds with each other, and makes both teams look at development as a source of continuous trouble. 

Security can, and should, exist as code. Integrating Ansible with CyberArk implements security-as-code, which allows security, operations and developers to work in sync as your “first responder” group, giving them the time and peace of mind to meaningfully respond to the threat — and likely to find a way to prevent it from recurring.

Automatically Respond to Threats

For most teams, keeping a constant watch on every detail of privileged access is unsustainable and hard to scale. The default reaction is often to simply lock down access, making growth and development difficult. PAM automation can make responding to threats much more scalable. Your team can focus on setting identity and access parameters, and let automated tools apply those rules to daily access needs. 

For example, Ansible Automation Platform, working with CyberArk Response Manager (CARM), can respond to threats automatically by managing users, security policies and credentials based on preconfigured parameters. CARM is part of the CyberArk PAM Collection, developed as part of the Ansible security automation initiative. 

At a high level, the CARM algorithm works like this:

1. An event is detected. For example:
   • A user leaves the company
   • User credentials get compromised
   • An email address gets compromised
2. An automated workflow is triggered.
3. A credential is retrieved to authenticate to CyberArk.
4. The relevant module is invoked:
   • cyberark_user
   • cyberark_policy
   • cyberark_account
   • cyberark_credential
5. A remediation is performed through the module.

Depending on the specifics of the detected threat and the CyberArk platform configuration, the security action might be, for example, to:

  • Reset a user’s credentials or disable the user so that the user must reset their password.
  • Enhance or relax a security policy or workflow.
  • Trigger a credential rotation, in which a vaulted credential is rotated.

As your environment goes about its daily business of deploying, testing and updating payloads, as well as processing and maintaining data, security operators can use Ansible to call CARM automatically, and CARM then carries out the required security actions. 

Incident Response and Incident Remediation | E5: Ask CyberArk Podcast

Automating threat responses that previously required human intervention now serves as the basis for proactive defense in depth.
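
To make that flow concrete, the following is a minimal sketch of a remediation task that disables a user flagged as compromised, using the cyberark_user module from the cyberark.pas Collection; the user name is hypothetical, the disabled flag is assumed from the module documentation, and the session is established as shown further below.

- name: Disable a user flagged as compromised (illustrative sketch)
  cyberark_user:
    username: "jdoe"                           # hypothetical account raised by the detection event
    disabled: true                             # assumed module flag that disables the user in the Vault
    state: present
    cyberark_session: "{{ cyberark_session }}"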

Credential retrieval is the first step in many scenarios using Ansible and CARM. This step is performed by the cyberark_credential module of the cyberark.pas Collection. The module can receive credentials from the Central Credential Provider. That way, we can obviate the need to hard code the credential in the environment:

- name: credential retrieval basic
  cyberark_credential:
    api_base_url: "http://10.10.0.1"
    app_id: "TestID"
    query: "Safe=test;UserName=admin"

As can be seen in this example, a target URL needs to be provided in addition to the application ID authorized for retrieving the credential. 

The central parameter is the query: it contains the details of the object actually being queried, in this case the “UserName” and “Safe”. The query parameters depend on the use case, and possible values are “Folder”, “Object”, “Address”, “Database” and “PolicyID”. 

If you are more familiar with the CyberArk API, here is the actual URI request that is created out of these parameter values:

{ api_base_url }"/AIMWebService/api/Accounts?AppId="{ app_id }"&Query="{ query }
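
Plugging in the values from the basic example above, the module would issue a request along these lines (before any URL encoding):

http://10.10.0.1/AIMWebService/api/Accounts?AppId=TestID&Query=Safe=test;UserName=admin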

The return value of the module contains — among other information — the actual credentials, and can be reused in further automation steps.
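
For example, a minimal sketch of registering the return value and reusing it in a follow-on task might look like this; the uri task and its endpoint are hypothetical, and the exact key under which the password is returned (assumed here to be result.Content) should be checked against the module documentation for your version.

- name: credential retrieval basic
  cyberark_credential:
    api_base_url: "http://10.10.0.1"
    app_id: "TestID"
    query: "Safe=test;UserName=admin"
  register: cyberark_response
  no_log: true   # keep the retrieved secret out of task logs

- name: Use the retrieved credential in a follow-on task (illustrative)
  ansible.builtin.uri:
    url: "https://internal-api.example.com/health"       # hypothetical endpoint
    user: "admin"
    password: "{{ cyberark_response.result.Content }}"    # assumed location of the secret in the return value
    force_basic_auth: true
  no_log: true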

A more production-level approach is to also encrypt the communication to the API via client certificates:

- name: credential retrieval advanced
  cyberark_credential:
    api_base_url: "https://components.cyberark.local"
    validate_certs: yes
    client_cert: /etc/pki/ca-trust/source/client.pem
    client_key: /etc/pki/ca-trust/source/priv-key.pem
    app_id: "TestID"
    query: "Safe=test;UserName=admin"
    connection_timeout: 60
    query_format: Exact
    fail_request_on_password_change: True
    reason: "requesting credential for Ansible deployment"

Now, let’s look at an example where the detected “bad” event requires rotation of account credentials. With the help of the cyberark_account module, we can change the credentials of the compromised account. The module supports account object creation, deletion and modification using the PAS Web Services SDK.

- name: Rotate credential via reconcile and provide new password
  cyberark_account:
    identified_by: "address,username"
    safe: "Domain_Admins"
    address: "prod.cyberark.local"
    username: "admin"
    platform_id: WinDomain
    platform_account_properties:
      LogonDomain: "PROD"
    secret_management:
      new_secret: "Ama123ah12@#!Xaamdjbdkl@#112"
      management_action: "reconcile"
      automatic_management_enabled: true
    state: present
    cyberark_session: "{{ cyberark_session }}"

In this example, we changed the password for the user “admin”. Note that the authentication is handled via the cyberark_session value, which is usually obtained from the cyberark_authentication module.
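
For completeness, a session like the one referenced above is typically established with a logon task along these lines; the URL and user names are placeholders, and the exact behaviour (the module exposing a cyberark_session value on success) should be verified against the cyberark.pas documentation for your version.

- name: Logon to the CyberArk Vault (illustrative sketch)
  cyberark_authentication:
    api_base_url: "https://components.cyberark.local"
    validate_certs: true
    username: "ansible_api_user"              # hypothetical API user
    password: "{{ vault_logon_password }}"    # supplied securely, for example via Ansible Vault
  no_log: true
# On success the cyberark_session value becomes available to later tasks such as
# the cyberark_account example above; a matching logoff task should run at the end of the play.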

Ansible Automates 2021: Session 1 - Modern Governance - John Willis

More Information:

https://www.redhat.com/en/resources/cyberark-partner-case-study

https://www.redhat.com/en/technologies/management/ansible

https://www.redhat.com/en/technologies/cloud-computing/openshift/container-platform

https://www.redhat.com/en/technologies/management/ansible/automation-execution-environments

https://www.redhat.com/en/technologies/management/ansible/features


IBM z16 and the Telum Chip



The Other IBM Big Iron That Is On The Horizon

The Hot Chips conference is underway this week, held virtually, as it was last year, rather than at its traditional home at Stanford University, thanks to the coronavirus pandemic. There are a lot of new chips being discussed in detail, and one of them is not the forthcoming Power10 chip from IBM, which is expected to make its debut sometime in September and which was one of the hot items at last year’s Hot Chips event.

Previewing IBM Telum Processor

The one processor that IBM is talking about, however, is the “Telum” z16 processor for System z mainframes, and unlike in times past, IBM is revealing the latest of its epically long line of mainframe central processing units (1964 through 2022, and counting) before they are launched in systems rather than after. We happen to think IBM had hoped to be able to ship the Telum processors and their System z16 machines before the end of 2021, and that the transition from 14 nanometer processes at former foundry partner GlobalFoundries to 7 nanometer processes at current foundry partner Samsung has delayed the z16 introduction from its usual cadence. As it stands, the z16 chip will come out in early 2022, after the Power10 chips with fat cores (meaning eight threads per core and only 15 cores per chip) come to market. The skinny Power10 cores (four threads per core but 30 cores on a die) used in so-called “scale out” systems are not expected until the second quarter of 2022. It is rough to change foundries and processes and microarchitectures all at the same time, so a delay from the original plan for both z16 and Power10 is to be expected.

It will be up to a judge to accept IBM’s lawsuit against GlobalFoundries, which we talked about back in June, or not accept it, and it will be up to a jury to decide economic damages should Big Blue prevail and win its case in the courts. Or, Intel could buy GlobalFoundries and settle the case and have IBM as its foundry partner. There are a lot of possible scenarios here. The good news is that IBM and Samsung have been able to get the z16 and Power10 processors designed and are ramping production on the Samsung 7 nanometer process, trying to drive up yields. If IBM could not deliver these chips in the near term, it would not be saying anything at this point. Like when the process shrinks with the Power6+ or the Power7+ were not panning out, for instance.

The Telum z16 processor is interesting from a technology standpoint because it shows what IBM can do and what it might do with future Power processors, and it is important from an economic standpoint because the System z mainframe still accounts for a large percentage of IBM’s revenues and an even larger share of its profits. (It is hard to say with any precision.) As the saying goes around here, anything that makes IBM Systems stronger helps IBM i last longer.

Besides, it is just plain fun to look at enterprise server chips. So, without further ado, take a gander at the Telum z16 processor:

According to Ross Mauri, the general manager of the System z product line, “Telum” refers to one of the weapons sported by Artemis, the Greek goddess of the hunt, known for bow hunting but also for her skill with the javelin. This particular javelin has to hit its enterprise target and help Big Blue maintain its mainframe customer base and make them enthusiastic about investing in new servers. The secret sauce in the Telum chip, as it turns out, will be an integrated AI accelerator chip that was developed by IBM Research and that has been modified and incorporated into the design, thus allowing for machine learning inference algorithms to be run natively and in memory alongside production data and woven into mainframe applications.

This is important, and bodes well for the Power10 chip, which is also getting its own native machine learning inference acceleration, albeit of a different variety. The z16 chip has an on-die mixed-precision accelerator for floating point and integer data, while the Power10 chip has a matrix math overlay for its vector math units. The net result is the same, however: Machine learning inference can stay within the compute and memory footprint of the server, and that means it will not be offloaded to external systems or external GPUs or other kinds of ASICs and will therefore be inside the bulwark of legendary mainframe security. There will be no compliance or regulatory issues because customer data that is feeding the machine learning inference and the response or recommendation from that inference will all be in the same memory space. For this reason, we have expected a lot of machine learning inference to stay on the CPUs on enterprise servers, while machine learning training will continue to be offloaded to GPUs and sometimes other kinds of ASICs or accelerators. (FPGAs are a good alternative for inference.)

Partner Preview of Telum 7nm Processor

https://cdnapisec.kaltura.com/index.php/extwidget/preview/partner_id/1773841/uiconf_id/27941801/entry_id/1_zkn3b6gd/embed/dynamic

The Telum chip measures 530 square millimeters in area and weighs in at about 22.5 billion transistors. By Power standards, the z16 cores are big fat ones, with lots of registers, branch target table entries, and such, which is why IBM can only get eight fat cores on that die. The Power10 chip, which we have nick-named “Cirrus” because IBM had a lame name for it, using the same 7 nanometer transistors can get sixteen fat cores (and 32 skinny cores) on a die that weighs in at 602 square millimeters but has only 18 billion transistors. The Telum chip will have a base clock speed of more than 5 GHz, which is normal for recent vintages of mainframe CPUs.

A whole bunch of things have changed with the Telum design compared to the z14 and z15 designs. IBM has used special versions of the chip called Service Processors, or SPs, to act as external I/O processors, offloading from Central Processors, or CPs, which actually do the compute. With the Telum design, IBM is doing away with this approach and tightly coupling the chips together with on-die interconnects, much as it has done with Power processors for many generations. Mainframe processors in recent years also had lots of dedicated L3 cache and an external L4 cache that also housed the system interconnect bus (called the X-Bus). The z15 chip implemented in GlobalFoundries 14 nanometer processes had a dozen cores and 256 MB of L3 cache, plus 4 MB of L2 data cache and 4 MB of L2 instruction cache allocated for each core. Each core had 128 KB of L1 instruction cache and 128 KB of L1 data cache. It ran at 5.2 GHz, and supported up to 40 TB of RAID-protected DDR4 main memory across a complex of 190 active compute processors.

With the z16 design, the cache is being brought way down. Each core has only 32 MB of L2 cache, which is made possible in part because the branch predictors on the front end of the chip have been redesigned. The core has four pipelines and supports SMT2 multithreading, but it doesn’t have physical L3 cache or physical L4 cache any more. Rather, according to Christian Jacobi, distinguished engineer and chief architect of the z16 processor, it implements a virtual 256 MB L3 cache across those physical L2 caches and a virtual 2 GB cache across eight chips in a system drawer. How this cache is all carved up is interesting, and it speaks to the idea that caches often are inclusive anyway (meaning everything in L1 is in L2, everything in L2 is in L3, and everything in L3 is in L4), which is a kind of dark silicon. Why not determine the hierarchy on the fly based on actual workload needs?

To make this virtual L3 cache work, there is a pair of 320 GB/sec rings. Two chips are linked together in a single package using a synchronous transfer interface, shown at the bottom two thirds of the Telum chip, and four sockets of these dual-chip modules (DCMs) are interlinked in a flat, all-to-all topology through on-drawer interfaces and fabric controllers, which run across the top of the Telum chip. At the bottom left is the AI Accelerator, which has more than 6 teraflops of mixed precision integer and floating point processing power that is accessible through the z16 instruction set and is not using a weird offload model as is the case with CPUs that offload machine learning inference to GPUs, FPGAs, or custom ASICs. This accelerator, says Jacobi, takes up a little less real estate on the chip than a core does. And clearly, if IBM wanted to raise the ratio, it could add more accelerators. This ratio is interesting in that it shows how much AI inference IBM expects – and that its customers expect – to be woven into their applications.

That is the key insight here.

This on-chip AI Accelerator has 128 compute tiles that can do 8-way half precision (FP16) floating point SIMD operations, which is optimized for matrix multiplication and convolutions used in neural network training. The AI Accelerator also has 32 compute tiles that implement 8-way FP16/FP32 units that are optimized for activation functions and more complex operations. The accelerator also has what IBM calls an intelligent prefetcher and write-back block, which can move data to an internal scratchpad at more than 120 GB/sec and that can store data out to the processor caches at more than 80 GB/sec. The two collections of AI math units have what IBM calls an intelligent data mover and formatter that prepares incoming data for compute and then write-back after it has been chewed on by the math units, and this has an aggregate of 600 GB/sec of bandwidth.

That’s an impressive set of numbers for a small block of chips, and a 32-chip complex (four sets of four-sockets of DCMs) can deliver over 200 teraflops of machine learning inference performance. (There doesn’t seem to be INT8 or INT4 integer support on this device, but don’t be surprised if IBM turns it on eventually, thereby doubling and quadrupling the inference performance for some use cases that have relatively coarse data.)

Jacobi says that a z16 socket with an aggregate of 16 cores will deliver 40 percent more performance than a z15 socket, which had 12 cores. If you do the math, 33 percent of that increase came from the core count increase; the rest comes from microarchitecture tweaks and process shrinks. We don’t expect the clock speed to be much more than a few hundred megahertz more than the frequencies used in the z15 chip, in fact. There may be some refinements in the Samsung 7 nanometer process further down the road that allow IBM to crank it up and boost performance with some kickers. The same thing could happen with Power10 chips, by the way.
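
A quick back-of-the-envelope check using only the figures quoted above:

\[
\frac{16}{12} \approx 1.33 \quad\text{(gain from the extra cores alone)}, \qquad \frac{1.40}{1.33} \approx 1.05,
\]

so the microarchitecture tweaks and the process shrink are left accounting for roughly 5 percent more performance per core.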

One final thought, and it is where the rubber hits the road with this AI Accelerator. A customer in the financial services industry worked with IBM to adapt its recurrent neural network (RNN) to the AI Accelerator, allowing it to do inference on the machine for a credit card fraud model. This workload was simulated on a z16 system simulator, so take it with a grain of salt. It illustrates the principle:

With only one chip, the simulated System z16 machine could handle 116,000 inferences per second with an average latency of 1.1 milliseconds, which is acceptable throughput and latency for a financial transaction not to be stalled by the fraud detection and for it to be done in real time rather than after the fact. With 32 chips in a full System z16 machine, the AI Accelerator could scale linearly, yielding 3.5 million inferences per second with an average latency of 1.2 milliseconds. That’s a scalability factor of 94.3 percent of perfect linear scaling, and we think this has as much to do with the flat, fast topology in the new z16 interconnect and with the flatter cache hierarchy as it has to do with the robustness of the AI Accelerator.
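
The quoted scaling factor follows directly from those throughput numbers:

\[
\frac{3{,}500{,}000}{32 \times 116{,}000} = \frac{3{,}500{,}000}{3{,}712{,}000} \approx 0.943,
\]

or about 94.3 percent of perfect linear scaling.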

IBM updates its mainframe processor to help AI

IBM's Telum processor will have on-chip acceleration for artificial intelligence inferencing.

IBM has introduced a new CPU for its Z Series mainframe that’s designed for transactions like banking, trading, insurance, customer interactions, and fraud detection.

The Telum processor was unveiled at the annual Hot Chips conference and has been in development for three years to provide high-volume, real-time inferencing needed for artificial intelligence.

The Telum design is very different from its System z15 predecessor. It features 8 CPU cores, on-chip workload accelerators, and 32MB of what IBM calls Level 2 semi-private cache. The L2 cache is called semi-private because it is used to build a shared virtual 256MB L3 connection between the cores on the chip. This is a 1.5x growth in cache size over the z15.

The CPU comes in a module design that includes two closely coupled Telum processors, so you get 16 cores per socket running at 5GHz. IBM Z systems pack their processors in what are known as drawers, with four sockets per drawer. The Telum processor will be manufactured by Samsung using a 7nm process, as compared to the 14nm process used for the z15 processor.

Stopping Fraud

IBM mainframes are still heavily used in online transaction processing (OLTP) and one of the problems that bedevils OLTP is that fraud usually isn’t caught until after it is committed.

Doing real-time analysis on millions of transactions is just not doable, particularly when fraud analysis and detection is conducted far away from mission-critical transactions and data, IBM says. AI could help, but AI workloads have much larger computational requirements than operational workloads.

“Due to latency requirements, complex fraud detection often cannot be completed in real-time—meaning a bad actor could have already successfully purchased goods with a stolen credit card before the retailer is aware fraud has taken place,” the company said in a blog post announcing Telum.

So the new chip is designed for real-time, AI-specific financial workloads. Just how it will work is not exactly known. Telum-based z16 mainframes are not expected until the second half of 2022.

A brief overview of IBM’s new 7 nm Telum mainframe CPU

A typical Telum-powered mainframe offers 256 cores at a base clock of 5+GHz.

From the perspective of a traditional x86 computing enthusiast—or professional—mainframes are strange, archaic beasts. They're physically enormous, power-hungry, and expensive by comparison to more traditional data-center gear, generally offering less compute per rack at a higher cost.

IBM MainFrame Life Cycle History

This raises the question, "Why keep using mainframes, then?" Once you hand-wave the cynical answers that boil down to "because that's how we've always done it," the practical answers largely come down to reliability and consistency. As AnandTech's Ian Cutress points out in a speculative piece focused on the Telum's redesigned cache, "downtime of these [IBM Z] systems is measured in milliseconds per year." (If true, that's at least seven nines.)

IBM's own announcement of the Telum hints at just how different mainframe and commodity computing's priorities are. It casually describes Telum's memory interface as "capable of tolerating complete channel or DIMM failures, and designed to transparently recover data without impact to response time."

When you pull a DIMM from a live, running x86 server, that server does not "transparently recover data"—it simply crashes.

IBM Z-series architecture

Telum is designed to be something of a one-chip-to-rule-them-all for mainframes, replacing a much more heterogeneous setup in earlier IBM mainframes.

The 14 nm IBM z15 CPU that Telum is replacing features five total processors—two pairs of 12-core Compute Processors and one System Controller. Each Compute Processor hosts 256MiB of L3 cache shared between its 12 cores, while the System Controller hosts a whopping 960MiB of L4 cache shared between the four Compute Processors.

Five of these z15 processors (four Compute Processors and one System Controller) constitute a "drawer." Four drawers come together in a single z15-powered mainframe.

Although the concept of multiple processors to a drawer and multiple drawers to a system remains, the architecture inside Telum itself is radically different—and considerably simplified.

Telum architecture

Telum is somewhat simpler at first glance than z15 was—it's an eight-core processor built on Samsung's 7nm process, with two processors combined on each package (similar to AMD's chiplet approach for Ryzen). There is no separate System Controller processor—all of Telum's processors are identical.

From here, four Telum CPU packages combine to make one four-socket "drawer," and four of those drawers go into a single mainframe system. This provides 256 total cores on 32 CPUs. Each core runs at a base clockrate over 5 GHz—providing more predictable and consistent latency for real-time transactions than a lower base with higher turbo rate would.

Pockets full of cache

Doing away with the central System Processor on each package meant redesigning Telum's cache, as well—the enormous 960MiB L4 cache is gone, as well as the per-die shared L3 cache. In Telum, each individual core has a private 32MiB L2 cache—and that's it. There is no hardware L3 or L4 cache at all.

This is where things get deeply weird—while each Telum core's 32MiB L2 cache is technically private, it's really only virtually private. When a line from one core's L2 cache is evicted, the processor looks for empty space in the other cores' L2. If it finds some, the evicted L2 cache line from core x is tagged as an L3 cache line and stored in core y's L2.

OK, so we have a virtual, shared up-to-256MiB L3 cache on each Telum processor, composed of the 32MiB "private" L2 cache on each of its eight cores. From here, things go one step further—that 256MiB of shared "virtual L3" on each processor can, in turn, be used as shared "virtual L4" among all processors in a system.

Telum's "virtual L4" works largely the same way its "virtual L3" did in the first place—evicted L3 cache lines from one processor look for a home on a different processor. If another processor in the same Telum system has spare room, the evicted L3 cache line gets retagged as L4 and lives in the virtual L3 on the other processor (which is made up of the "private" L2s of its eight cores) instead.

AnandTech's Ian Cutress goes into more detail on Telum's cache mechanisms. He eventually sums them up by answering "How is this possible?" with a simple "magic."

IBM Telum Processor brings deep learning inference to enterprise workloads

AI inference acceleration

IBM's Christian Jacobi briefly outlines Telum's AI acceleration in this two-minute clip.

Telum also introduces a 6TFLOPS on-die inference accelerator. It's intended to be used for—among other things—real-time fraud detection during financial transactions (as opposed to shortly after the transaction).

In the quest for maximum performance and minimal latency, IBM threads several needles. The new inference accelerator is placed on-die, which allows for lower latency interconnects between the accelerator and CPU cores—but it's not built into the cores themselves, a la Intel's AVX-512 instruction set.

The problem with in-core inference acceleration like Intel's is that it typically limits the AI processing power available to any single core. A Xeon core running an AVX-512 instruction only has the hardware inside its own core available to it, meaning larger inference jobs must be split among multiple Xeon cores to extract the full performance available.

Telum's accelerator is on-die but off-core. This allows a single core to run inference workloads with the might of the entire on-die accelerator, not just the portion built into itself.

In a major refresh of its Z Series chips, IBM is adding on-chip AI acceleration capabilities to allow enterprise customers to perform deep learning inferencing while transactions are taking place to capture business insights and fight fraud in real-time.

IBM is set to unveil the latest Z chip Aug. 23 (Monday) at the annual Hot Chips 33 conference, which is being held virtually due to the ongoing COVID-19 pandemic. The company provided advance details in a media pre-briefing last week.

This will be the first Z chip, used in IBM's System Z mainframes, that won't follow a traditional numeric naming pattern used in the past. Instead of following the previous z15 chip with a z16 moniker, the new processor is being called IBM Telum.

This will be IBM's first processor to include on-chip AI acceleration, according to the company. Designed for customers across a wide variety of uses, including banking, finance, trading, insurance applications and customer interactions, the Telum processors have been in development for the past three years. The first Telum-based systems are planned for release in the first half of 2022.

Previewing IBM Telum Processor

Ross Mauri of IBM

One of the major strengths of the new Telum chips is that they are designed to enable applications to run efficiently where their data resides, giving enterprises more flexibility with their most critical workloads, Ross Mauri, the general manager of IBM Z, said in a briefing with reporters on the announcement before Hot Chips.

"From an AI point of view, I have been listening to our clients for several years and they are telling me that they can't run their AI deep learning inferences in their transactions the way they want to," said Mauri. "They really want to bring AI into every transaction. And the types of clients I am talking to are running 1,000, 10,000, 50,000 transactions per second. We are talking high volume, high velocity transactions that are complex, with multiple database reads and writes and full recovery for … transactions in banking, finance, retail, insurance and more."

By integrating mid-transaction AI inferencing into the Telum chips, it will be a huge breakthrough for fraud detection, said Mauri.

"You hear about fraud detection all the time," he said. "Well, I think we are going to be able to move from fraud detection to fraud prevention. I think this is a real game changer when it comes to the business world, and I am really excited about that."

Inside the Telum Architecture

Christian Jacobi of IBM

Christian Jacobi, an IBM distinguished engineer and the chief architect for IBM Z processor design, said that the Telum chip design is specifically optimized to run these kinds of mission critical, heavy duty transaction processing and batch processing workloads, while ensuring top-notch security and availability.

"We have designed this accelerator using the AI core coming from the IBM AI research center," in cooperation with the IBM Research team, the IBM Z chip design team and the IBM Z firmware and software development team, said Jacobi. It is the first IBM chip created using technology from the IBM Research AI hardware center.

"The goal for this processor and the AI accelerator was to enable embedding AI with super low latency directly into the transactions without needing to send the data off-platform, which brings all sorts of latency inconsistencies and security concerns," said Jacobi. "Sending data over a network, oftentimes personal and private data, requires cryptography and auditing of security standards that creates a lot of complexity in an enterprise environment. We have designed this accelerator to operate directly on the data using virtual address mechanisms and the data protection mechanisms that naturally apply to any other thing on the IBM Z processor."

To achieve the low latencies, IBM engineers directly connected the accelerator to the on-chip cache infrastructure, which can directly access model data and transaction data from the caches, he explained. "It enables low batch counts so that we do not have to wait for multiple transactions to arrive of the same model type. All of that is geared towards enabling millisecond range inference tasks so that they can be done as part of the transaction processing without impacting their service level agreements … and have the results available to be able to utilize the AI inference result as part of the transaction processing."

The new Telum chips are manufactured for IBM by Samsung using a 7nm Extreme Ultraviolet Lithography (EUV) process.

The chips have a new design compared to IBM's existing z15 chips, according to Jacobi. Today's z15 chips use dual chip modules, but the Telum chips will use a single module.

"Four of those modules will be plugged into one drawer, basically a motherboard with four sockets," he said. "In the prior generations, we used two different chip types – a processor chip and a system control chip that contained a physical L4 cache, and that system control hub chip also acted as a hub whenever two processor chips needed to communicate with each other."

Dropping the system control chip enabled the designers to reduce the latencies, he said. "When one chip needs to talk to another chip, we can implement a flat topology, where each of the eight chips in a drawer has a direct connection to the seven other chips in the drawer. That optimizes latencies for cache interactions and memory access across the drawer."

Jacobi said that the Telum chips will provide a 40 percent performance improvement at the socket level over the company's previous z15 chips, which is boosted in part by its extra cores.

Each Telum processor contains eight processor cores and uses a deep super-scalar, out-of-order instruction pipeline. Running at a clock frequency of more than 5GHz, the redesigned cache and chip-interconnection infrastructure provides 32MB of cache per core and can scale to 32 Telum chips. The dual-chip module design contains 22 billion transistors and 19 miles of wire on 17 metal layers.

Introducing the new IBM z15 T02

An Analyst Weighs In

Karl Freund, analyst

Karl Freund, founder and principal analyst of Cambrian AI Research, told EnterpriseAI that the upcoming Telum chips directly target enterprise users with a wide range of useful capabilities.

"In addition to a new cache architecture that will significantly pump-up performance, the Telum processor will provide Z customers with the ability to run real-time AI on the same platform that conducts the mission critical transactions and analyses on which enterprises depend," said Freund. "Until Telum, Z applications could run machine learning on the Z cores, or offload data to another platform for deep learning analysis. The latter introduces unacceptable latencies and significant costs, as well as introducing entirely new and now unnecessary security risk."

For customers, these could be compelling improvements, said Freund.

"I believe Telum provides the biggest reason we have seen in a while for applications to remain in the Z fold," he said. "Honestly, enterprise AI is about to get very real with the Telum processor. After all, the Z is the custodian of some of the most important data in an enterprise. Being able to run deep learning models directly on the Z will unlock tremendous value for Z clients, value that has been hidden until Telum."

What still needs to be determined, he said, is how it all will perform when the development work is completed. "We need to see whether the small accelerator at 'only' 6 TFLOPS can provide adequate processing power to eight very fast cores needing AI services. However, since the data involved here is highly structured numerical floating point or decimal data, instead of images, voice or video, I suspect the accelerator will prove up to the task."

More Details to Come Later

Jacobi said that IBM is not yet providing any additional system performance specifications until the company's engineers complete further optimizations of its firmware and software stacks that will run on the systems.

"We will provide more performance gains out of optimization across those layers of the entire stack when we get to that point next year," he said.

"I will add that one of the unique things about our design … is that every core when it does AI … are performing a hybrid – they are performing the complex transaction work including the databases and the business logic, and then switch over to perform AI as part of the transaction," said Jacobi. "When they switch over, they can use the entire capacity of the AI accelerator and not just a sliver that is assigned to the core. It is a dedicated piece of silicon on the chip. And the core can access that piece of silicon and use the entire compute capacity of that AI accelerator. That is important for us to achieve the low latencies that we need to make it all fit within the transaction response budget."

About That Z Series Name Change

The upcoming Telum chips will show up in IBM's Z Series and LinuxONE systems as the main processor chip for both product lines, said IBM's Mauri. They are not superseding the Z Series chips, he said.

So, does that mean that z16, z17 and other future Z chips are no longer on the roadmap?

"No, said Mauri. "It just means that we never named our chips before. We are naming it this time because we are proud of the innovation and breakthroughs and think that it is unique in what it does. And that is the only thing. I think z15 will still be around and there will be future generations, many future generations."

Still to Come: New z/OS

To prepare for the arrival of the Telum chips, IBM has already slated the debut of the next version of its updated z/OS 2.5 operating system for Z Series hardware sometime in September, according to an earlier report by The Register. The mainframe operating system is expected to get more AI and hybrid cloud features, as well as expanded co-processor support, based on a preview of the OS that was unveiled in March.

More Information

https://www.ibm.com/blogs/systems/ibm-telum-processor-the-next-gen-microprocessor-for-ibm-z-and-ibm-linuxone/

https://research.ibm.com/blog/telum-processor

https://www.extremetech.com/computing/326402-ibms-new-system-z-cpu-offers-40-percent-more-performance-per-socket-integrated-ai

Restricted Boltzmann Machine and Boltzmann Brain in AI and Learning Explained


Restricted Boltzmann machine (RBM)

A restricted Boltzmann machine (RBM) is a type of artificial neural network (ANN) for machine learning of probability distributions. An artificial neural network is a system of hardware and/or software patterned after the operation of neurons in the human brain.

Created by Geoffrey Hinton, RBM algorithms are useful for dimensionality reduction, classification, regression, collaborative filtering, feature learning and topic modeling. Like perceptrons, they are a relatively simple type of neural network.

RBMs fall into the categories of stochastic and generative models of artificial intelligence. Stochastic refers to anything based on probabilities and generative means that it uses AI to produce (generate) a desired output. Generative models contrast with discriminative models, which classify existing data.

Like all multi-layer neural networks, RBMs have layers of artificial neurons, in their case two. The first layer is the input layer. The second is a hidden layer that only accepts what the first layer passes on. The restriction spoken of in RBM is that the different neurons within the same layer can’t communicate with one another. Instead, neurons can only communicate with other layers. (In a standard Boltzmann machine, neurons in the hidden layer intercommunicate.) Each node within a layer performs its own calculations. After performing its calculations, the node then makes a stochastic decision about whether to pass its output on to the next layer.
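
To make that structure explicit, the standard textbook formulation (summarized here; it is not spelled out in the source above) assigns an energy to every joint configuration of visible units v and hidden units h, and the absence of within-layer connections is exactly what makes the conditional distributions factorize:

\[
E(\mathbf{v},\mathbf{h}) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j,
\qquad
P(\mathbf{v},\mathbf{h}) = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{Z},
\]
\[
P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big),
\qquad
P(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big),
\]

where sigma is the logistic function and Z is the partition function; the factorized conditionals are what make sampling, and hence training, tractable.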

Though RBMs are still sometimes used, they have mostly been replaced by generative adversarial networks or variational autoencoders.

Boltzmann Machine -A Probabilistic Graphical Models

Geoffrey Hinton, often called the “Godfather of Deep Learning,” introduced the Boltzmann machine in 1985. A well-known figure in the deep learning community, Hinton is also a professor at the University of Toronto.

A Boltzmann machine is a kind of recurrent neural network that is normally interpreted as a probabilistic graphical model. Put concisely, it is a fully connected neural network consisting of visible and hidden units, and it operates asynchronously with stochastic updates for each of its units.

These machines can also be viewed as probability distributions on high-dimensional binary vectors. A Boltzmann machine is a generative, unsupervised model that learns a probability distribution from an original dataset. It is a very demanding tool in terms of computation power, but by restricting its network topology the behaviour can be controlled.

It is an algorithm useful for dimensionality reduction, classification, regression, collaborative filtering, feature learning and topic modelling. Like other neural networks, these machines (both BM and RBM) have an input layer, referred to as the visible layer, and one or several hidden layers.

Restricted Boltzmann Machines

Boltzmann machines are probability distributions on high dimensional binary vectors which are analogous to Gaussian Markov Random Fields in that they are fully determined by first and second-order moments.

It is used for pattern storage and retrieval. As Wikipedia puts it, "A Boltzmann machine (also called a stochastic Hopfield network with hidden units) is a type of stochastic recurrent neural network and Markov random field." The RBM itself has many applications, some of which are listed below:

  • Collaborative filtering
  • Multiclass classification
  • Information retrieval
  • Motion capture modelling
  • Segmentation
  • Modelling natural images

Deep belief nets use the Boltzmann machine, and especially the restricted Boltzmann machine, as a key building block, trained with simple first-order weight updates.
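
The weight update alluded to here can be read as the standard contrastive divergence rule, which is first order in the sense that it only uses pairwise statistics of visible and hidden units (again a textbook formulation, not something stated in the source above):

\[
\Delta W_{ij} \;\propto\; \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}},
\]

with the model expectation approximated by a few steps of Gibbs sampling started from the data.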

Lecture 10 Boltzmann machine

Limitations of neural networks grow clearer in business

AI often means neural networks, but intensive training requirements are prompting enterprises to look for alternatives to neural networks that are easier to implement.

The rise in prominence of AI today can be credited largely to improvements in one algorithm category: the neural network. But experts say that the limitations of neural networks mean enterprises will need to embrace a fuller lineup of algorithms to advance AI.

"With neural networks, there's this huge complication," said David Barrett, founder and CEO Expensify Inc. "You end up with trillions dimensions. If you want to change something, you need to start entirely from scratch. The more we tried [neural networks], we still couldn't get them to work."

Neural network technology is seen as cutting-edge today, but the underlying algorithms are nothing new. They were proposed as theoretically possible decades ago.

What's new is that we now have the massive stores of data needed to train algorithms and the robust compute power to process all this data in a reasonable period of time. As neural networks have moved from theoretical to practical, they've come to power some of the most advanced AI applications, like computer vision, language translation and self-driving cars.

Tom Goldstein: "An empirical look at generalization in neural nets"

Training requirements for neural networks are too high

But the problem, as Barrett and others see it, is that neural networks simply require too much brute force. For example, if you show the algorithm a billion example images containing certain objects, it will learn to classify those objects in new images effectively. But that's a high bar for training, and meeting that requirement is sometimes impossible.

That was the case for Barrett and his team. At the 2018 Artificial Intelligence Conference in New York, he described how Expensify is using natural language processing to automate customer service for its expense reporting software. Neural networks weren't a good fit for Expensify because the San Francisco company didn't have the corpus of historical data necessary.

Expensify's customer inquiries are often esoteric, Barrett said. Even when customers' concerns map to common problems, their phrasing is unique and, therefore, hard to classify using a system that demands many training examples.

So, Barrett and his team developed their own approach. He didn't identify the specific type of algorithms their tool is based on, but he said it compares pieces of conversations to conversations that have proceeded successfully in the past. It doesn't need to classify queries with precision like a neural network would because it's more focused on moving the conversation along a path rather than delivering the right response to a given query. This gives the bot a chance to ask clarifying questions that reduce ambiguity.

"The challenge of AI is it's built to answer perfectly formed questions," Barrett said. "The challenge of the real world is different."

Deep Boltzmann Machines

A 'broad church' of algorithms is needed in AI

Part of the reason for the enthusiasm around neural network technology is that many people are just finding out about it, said Zoubin Ghahramani, chief scientist at Uber. But for those that have known about and used it for years, the limitations of neural networks are well known.

That doesn't mean it's time for people to ignore neural networks, however. Instead, Ghahramani said it comes down to using the right tool for the right job. He described an approach to incorporating Bayesian inference, in which the estimated probability of something occurring is updated when more evidence becomes available, into machine learning models.

"To have successful AI applications that solve challenging real-world problems, you have to have a broad church of methods," he said in a press conference. "You can't come in with one hammer trying to solve all problems."

Another alternative to neural network technology is deep reinforcement learning, which is optimized to achieve a goal over many steps by incentivizing effective steps and penalizing unfavorable steps. The AlphaGo program, which beat human champions at the game Go, used a combination of neural networks and deep reinforcement learning to learn the game.

Deep reinforcement learning algorithms essentially learn through trial and error, whereas neural networks learn through example. This means deep reinforcement requires less labeled training data upfront.

Kathryn Hume, vice president of product and strategy at Integrate.ai Inc., a Toronto-based software company that helps enterprises integrate AI into existing business processes, said any type of model that reduces the reliance on labeled training data is important. She mentioned Bayesian parametric models, which assess the probability of an occurrence based on existing data rather than requiring some minimum threshold of prior examples, one of the primary limitations of neural networks.

"We need not rely on just throwing a bunch of information into a pot," she said. "It can move us away from the reliance on labeled training data when we can infer the structure of data," rather than using algorithms like neural networks, which require millions or billions of examples of labeled training before they can make predictions.

What is a Neural Network and How Does it Work?

Research on artificial neural networks was motivated by the observation that human intelligence emerges from highly parallel networks of relatively simple, non-linear neurons that learn by adjusting the strengths of their connections. This observation leads to a central computational question: How is it possible for networks of this general kind to learn the complicated internal representations that are required for difficult tasks such as recognizing objects or understanding language? Deep learning seeks to answer this question by using many layers of activity vectors as representations and learning the connection strengths that give rise to these vectors by following the stochastic gradient of an objective function that measures how well the network is performing. It is very surprising that such a conceptually simple approach has proved to be so effective when applied to large training sets using huge amounts of computation and it appears that a key ingredient is depth: shallow networks simply do not work as well.
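
Concretely, following the stochastic gradient of an objective function means repeatedly nudging the connection strengths, collected in a parameter vector theta, against the gradient of the loss measured on a small minibatch B of training examples:

\[
\theta \leftarrow \theta - \eta \,\nabla_\theta \, \frac{1}{|B|} \sum_{(x,y)\in B} L\big(f_\theta(x),\, y\big),
\]

where eta is the learning rate and L measures how badly the network's output f_theta(x) matches the target y.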


We reviewed the basic concepts and some of the breakthrough achievements of deep learning several years ago [63]. Here we briefly describe the origins of deep learning, describe a few of the more recent advances, and discuss some of the future challenges. These challenges include learning with little or no external supervision, coping with test examples that come from a different distribution than the training examples, and using the deep learning approach for tasks that humans solve by using a deliberate sequence of steps which we attend to consciously—tasks that Kahneman [56] calls system 2 tasks as opposed to system 1 tasks like object recognition or immediate natural language understanding, which generally feel effortless.

From Hand-Coded Symbolic Expressions to Learned Distributed Representations

There are two quite different paradigms for AI. Put simply, the logic-inspired paradigm views sequential reasoning as the essence of intelligence and aims to implement reasoning in computers using hand-designed rules of inference that operate on hand-designed symbolic expressions that formalize knowledge. The brain-inspired paradigm views learning representations from data as the essence of intelligence and aims to implement learning by hand-designing or evolving rules for modifying the connection strengths in simulated networks of artificial neurons.

In the logic-inspired paradigm, a symbol has no meaningful internal structure: Its meaning resides in its relationships to other symbols which can be represented by a set of symbolic expressions or by a relational graph. By contrast, in the brain-inspired paradigm the external symbols that are used for communication are converted into internal vectors of neural activity and these vectors have a rich similarity structure. Activity vectors can be used to model the structure inherent in a set of symbol strings by learning appropriate activity vectors for each symbol and learning non-linear transformations that allow the activity vectors that correspond to missing elements of a symbol string to be filled in. This was first demonstrated in Rumelhart et al. [74] on toy data and then by Bengio et al. [14] on real sentences. A very impressive recent demonstration is BERT [22], which also exploits self-attention to dynamically connect groups of units, as described later.

The main advantage of using vectors of neural activity to represent concepts and weight matrices to capture relationships between concepts is that this leads to automatic generalization. If Tuesday and Thursday are represented by very similar vectors, they will have very similar causal effects on other vectors of neural activity. This facilitates analogical reasoning and suggests that immediate, intuitive analogical reasoning is our primary mode of reasoning, with logical sequential reasoning being a much later development [56], which we will discuss.

BOLTZMANN MACHINES

The Rise of Deep Learning

Deep learning re-energized neural network research in the early 2000s by introducing a few elements which made it easy to train deeper networks. The emergence of GPUs and the availability of large datasets were key enablers of deep learning and they were greatly enhanced by the development of open source, flexible software platforms with automatic differentiation such as Theano [16], Torch [25], Caffe [55], TensorFlow [1], and PyTorch [71]. This made it easy to train complicated deep nets and to reuse the latest models and their building blocks. But the composition of more layers is what allowed more complex non-linearities and achieved surprisingly good results in perception tasks, as summarized here.

Why depth? Although the intuition that deeper neural networks could be more powerful pre-dated modern deep learning techniques, it was a series of advances in both architecture and training procedures [15, 35, 48] which ushered in the remarkable advances which are associated with the rise of deep learning. But why might deeper networks generalize better for the kinds of input-output relationships we are interested in modeling? It is important to realize that it is not simply a question of having more parameters, since deep networks often generalize better than shallow networks with the same number of parameters. The practice confirms this. The most popular class of convolutional net architecture for computer vision is the ResNet family of which the most common representative, ResNet-50, has 50 layers. Other ingredients not mentioned in this article but which turned out to be very useful include image deformations, drop-out, and batch normalization.

We believe that deep networks excel because they exploit a particular form of compositionality in which features in one layer are combined in many different ways to create more abstract features in the next layer.

For tasks like perception, this kind of compositionality works very well and there is strong evidence that it is used by biological perceptual systems.

Unsupervised pre-training. When the number of labeled training examples is small compared with the complexity of the neural network required to perform the task, it makes sense to start by using some other source of information to create layers of feature detectors and then to fine-tune these feature detectors using the limited supply of labels. In transfer learning, the source of information is another supervised learning task that has plentiful labels. But it is also possible to create layers of feature detectors without using any labels at all by stacking auto-encoders.

Deep Learning Lecture 10.3 - Restricted Boltzmann Machines

First, we learn a layer of feature detectors whose activities allow us to reconstruct the input. Then we learn a second layer of feature detectors whose activities allow us to reconstruct the activities of the first layer of feature detectors. After learning several hidden layers in this way, we then try to predict the label from the activities in the last hidden layer and we backpropagate the errors through all of the layers in order to fine-tune the feature detectors that were initially discovered without using the precious information in the labels. The pre-training may well extract all sorts of structure that is irrelevant to the final classification but, in the regime where computation is cheap and labeled data is expensive, this is fine so long as the pre-training transforms the input into a representation that makes classification easier.
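To make this procedure concrete, here is a minimal PyTorch sketch of greedy layer-wise pre-training with stacked auto-encoders followed by supervised fine-tuning with backpropagation. The layer sizes, the random stand-in data, and the training schedule are illustrative assumptions rather than anything prescribed in the text.

```python
# Greedy layer-wise pre-training with stacked auto-encoders, then fine-tuning.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(512, 784)              # unlabeled inputs (e.g., flattened images)
y = torch.randint(0, 10, (512,))      # a small labeled subset for fine-tuning

sizes = [784, 256, 64]                # visible -> hidden1 -> hidden2
encoders = []

# 1) Unsupervised pre-training: each layer learns to reconstruct the layer below.
inputs = X
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    enc, dec = nn.Linear(d_in, d_out), nn.Linear(d_out, d_in)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(200):
        h = torch.sigmoid(enc(inputs))
        recon = torch.sigmoid(dec(h))
        loss = nn.functional.mse_loss(recon, inputs)
        opt.zero_grad(); loss.backward(); opt.step()
    encoders.append(enc)
    inputs = torch.sigmoid(enc(inputs)).detach()   # feed activities to the next layer

# 2) Supervised fine-tuning: add a classifier on top and backpropagate through all layers.
model = nn.Sequential(encoders[0], nn.Sigmoid(), encoders[1], nn.Sigmoid(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    loss = nn.functional.cross_entropy(model(X), y)
    opt.zero_grad(); loss.backward(); opt.step()
```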

In addition to improving generalization, unsupervised pre-training initializes the weights in such a way that it is easy to fine-tune a deep neural network with backpropagation. The effect of pre-training on optimization was historically important for overcoming the accepted wisdom that deep nets were hard to train, but it is much less relevant now that people use rectified linear units (see next section) and residual connections.43 However, the effect of pre-training on generalization has proved to be very important. It makes it possible to train very large models by leveraging large quantities of unlabeled data, for example, in natural language processing, for which huge corpora are available. The general principle of pre-training and fine-tuning has turned out to be an important tool in the deep learning toolbox, for example, when it comes to transfer learning or even as an ingredient of modern meta-learning.

The mysterious success of rectified linear units. The early successes of deep networks involved unsupervised pre-training of layers of units that used the logistic sigmoid nonlinearity or the closely related hyperbolic tangent. Rectified linear units had long been hypothesized in neuroscience29 and already used in some variants of RBMs70 and convolutional neural networks.54 It was an unexpected and pleasant surprise to discover35 that rectifying non-linearities (now called ReLUs, with many modern variants) made it easy to train deep networks by backprop and stochastic gradient descent, without the need for layerwise pre-training. This was one of the technical advances that enabled deep learning to outperform previous methods for object recognition, as outlined here.

Breakthroughs in speech and object recognition. An acoustic model converts a representation of the sound wave into a probability distribution over fragments of phonemes. Heroic efforts by Robinson using transputers and by Morgan et al. using DSP chips had already shown that, with sufficient processing power, neural networks were competitive with the state of the art for acoustic modeling. In 2009, two graduate students68 using Nvidia GPUs showed that pre-trained deep neural nets could slightly outperform the SOTA on the TIMIT dataset. This result reignited the interest of several leading speech groups in neural networks. In 2010, essentially the same deep network was shown to beat the SOTA for large vocabulary speech recognition without requiring speaker-dependent training and by 2012, Google had engineered a production version that significantly improved voice search on Android. This was an early demonstration of the disruptive power of deep learning.

Dr. Meir Shimon - ARE YOU A BOLTZMANN BRAIN?

At about the same time, deep learning scored a dramatic victory in the 2012 ImageNet competition, almost halving the error rate for recognizing a thousand different classes of object in natural images.60 The keys to this victory were the major effort by Fei-Fei Li and her collaborators in collecting more than a million labeled images31 for the training set and the very efficient use of multiple GPUs by Alex Krizhevsky. Current hardware, including GPUs, encourages the use of large mini-batches in order to amortize the cost of fetching a weight from memory across many uses of that weight. Pure online stochastic gradient descent which uses each weight once converges faster and future hardware may just use weights in place rather than fetching them from memory.

The deep convolutional neural net contained a few novelties such as the use of ReLUs to make learning faster and the use of dropout to prevent over-fitting, but it was basically just a feed-forward convolutional neural net of the kind that Yann LeCun and his collaborators had been developing for many years.64,65 The response of the computer vision community to this breakthrough was admirable. Given this incontrovertible evidence of the superiority of convolutional neural nets, the community rapidly abandoned previous hand-engineered approaches and switched to deep learning.

Recent Advances

Here we selectively touch on some of the more recent advances in deep learning, clearly leaving out many important subjects, such as deep reinforcement learning, graph neural networks and meta-learning.

Soft attention and the transformer architecture. A significant development in deep learning, especially when it comes to sequential processing, is the use of multiplicative interactions, particularly in the form of soft attention.7,32,39,78 This is a transformative addition to the neural net toolbox, in that it changes neural nets from purely vector transformation machines into architectures which can dynamically choose which inputs they operate on, and can store information in differentiable associative memories. A key property of such architectures is that they can effectively operate on different kinds of data structures including sets and graphs.

Soft attention can be used by modules in a layer to dynamically select which vectors from the previous layer they will combine to compute their outputs. This can serve to make the output independent of the order in which the inputs are presented (treating them as a set) or to use relationships between different inputs (treating them as a graph).

The transformer architecture,85 which has become the dominant architecture in many applications, stacks many layers of "self-attention" modules. Each module in a layer uses a scalar product to compute the match between its query vector and the key vectors of other modules in that layer. The matches are normalized to sum to 1, and the resulting scalar coefficients are then used to form a convex combination of the value vectors produced by the other modules in the previous layer. The resulting vector forms an input for a module of the next stage of computation. Modules can be made multi-headed so that each module computes several different query, key and value vectors, thus making it possible for each module to have several distinct inputs, each selected from the previous stage modules in a different way. The order and number of modules do not matter in this operation, making it possible to operate on sets of vectors rather than single vectors as in traditional neural networks. For instance, a language translation system, when producing a word in the output sentence, can choose to pay attention to the corresponding group of words in the input sentence, independently of their position in the text. While multiplicative gating is an old idea for such things as coordinate transforms and powerful forms of recurrent networks, its recent forms have made it mainstream. Another way to think about attention mechanisms is that they make it possible to dynamically route information through appropriately selected modules and combine these modules in potentially novel ways for improved out-of-distribution generalization.
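As a concrete illustration of the query/key/value computation described above, here is a small single-head NumPy sketch of scaled dot-product self-attention. The dimensions and the random projection matrices are illustrative assumptions; a real transformer learns the projections and uses multiple heads.

```python
# Scaled dot-product self-attention over a set of input vectors.
import numpy as np

rng = np.random.default_rng(0)
n, d_model, d_k = 5, 16, 8              # 5 input vectors (one per "module")

X = rng.normal(size=(n, d_model))       # outputs of the previous layer
W_q = rng.normal(size=(d_model, d_k))   # learned projections (random here)
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Match each query against every key, scale, and normalize the matches to sum to 1.
scores = Q @ K.T / np.sqrt(d_k)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the other modules

# Each output is a convex combination of the value vectors.
output = weights @ V
print(output.shape)   # (5, 8); permuting the rows of X simply permutes the rows of output
```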

How a Boltzmann machine models data

Transformers have produced dramatic performance improvements that have revolutionized natural language processing,27,32 and they are now being used routinely in industry. These systems are all pre-trained in a self-supervised manner to predict missing words in a segment of text.

Perhaps more surprisingly, transformers have been used successfully to solve integral and differential equations symbolically.62 A very promising recent trend uses transformers on top of convolutional nets for object detection and localization in images with state-of-the-art performance.19 The transformer performs post-processing and object-based reasoning in a differentiable manner, enabling the system to be trained end-to-end.

Unsupervised and self-supervised learning. Supervised learning, while successful in a wide variety of tasks, typically requires a large amount of human-labeled data. Similarly, when reinforcement learning is based only on rewards, it requires a very large number of interactions. These learning methods tend to produce task-specific, specialized systems that are often brittle outside of the narrow domain they have been trained on. Reducing the number of human-labeled samples or interactions with the world that are required to learn a task and increasing the out-of-domain robustness is of crucial importance for applications such as low-resource language translation, medical image analysis, autonomous driving, and content filtering.

Humans and animals seem to be able to learn massive amounts of background knowledge about the world, largely by observation, in a task-independent manner. This knowledge underpins common sense and allows humans to learn complex tasks, such as driving, with just a few hours of practice. A key question for the future of AI is how do humans learn so much from observation alone?

In supervised learning, a label for one of N categories conveys, on average, at most log2(N) bits of information about the world. In model-free reinforcement learning, a reward similarly conveys only a few bits of information. In contrast, audio, images and video are high-bandwidth modalities that implicitly convey large amounts of information about the structure of the world. This motivates a form of prediction or reconstruction called self-supervised learning which is training to "fill in the blanks" by predicting masked or corrupted portions of the data. Self-supervised learning has been very successful for training transformers to extract vectors that capture the context-dependent meaning of a word or word fragment and these vectors work very well for downstream tasks.

For text, the transformer is trained to predict missing words from a discrete set of possibilities. But in high-dimensional continuous domains such as video, the set of plausible continuations of a particular video segment is large and complex and representing the distribution of plausible continuations properly is essentially an unsolved problem.

Contrastive learning. One way to approach this problem is through latent variable models that assign an energy (that is, a badness) to examples of a video and a possible continuation.a

Given an input video X and a proposed continuation Y, we want a model to indicate whether Y is compatible with X by using an energy function E(X, Y) which takes low values when X and Y are compatible, and higher values otherwise.

E(X, Y) can be computed by a deep neural net which, for a given X, is trained in a contrastive way to give a low energy to values Y that are compatible with X (such as examples of (X, Y) pairs from a training set), and high energy to other values of Y that are incompatible with X. For a given X, inference consists in finding a Ŷ that minimizes E(X, Ŷ), or perhaps sampling from the Ys that have low values of E(X, Y). This energy-based approach to representing the way Y depends on X makes it possible to model a diverse, multi-modal set of plausible continuations.

The key difficulty with contrastive learning is to pick good "negative" samples: suitable points Y whose energy will be pushed up. When the set of possible negative examples is not too large, we can just consider them all. This is what a softmax does, so in this case contrastive learning reduces to standard supervised or self-supervised learning over a finite discrete set of symbols. But in a real-valued high-dimensional space, there are far too many ways a vector Ŷ could be different from Y, and to improve the model we need to focus on those Ŷs that should have high energy but currently have low energy. Early methods to pick negative samples were based on Monte-Carlo methods, such as contrastive divergence for restricted Boltzmann machines48 and noise-contrastive estimation.
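The following PyTorch sketch illustrates contrastive training of an energy function E(X, Y): energies of observed pairs are pushed down while energies of mismatched pairs are pushed up to a margin. The small network, the toy data, and the choice of "other continuations in the batch" as negatives are illustrative assumptions, not the specific methods cited above.

```python
# Contrastive training of a scalar energy function E(x, y).
import torch
import torch.nn as nn

torch.manual_seed(0)

class EnergyNet(nn.Module):
    def __init__(self, dx=32, dy=32, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dx + dy, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    def forward(self, x, y):                      # one scalar energy per (x, y) pair
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

E = EnergyNet()
opt = torch.optim.Adam(E.parameters(), lr=1e-3)
margin = 1.0

for _ in range(200):
    x = torch.randn(64, 32)
    y = x + 0.1 * torch.randn(64, 32)             # toy "compatible" continuations
    y_neg = y[torch.randperm(64)]                 # negatives: continuations of other Xs

    e_pos, e_neg = E(x, y), E(x, y_neg)
    # Hinge loss: low energy for positives, at least `margin` higher for negatives.
    loss = e_pos.mean() + torch.relu(margin + e_pos - e_neg).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```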

The Deep Learning Revolution

Generative Adversarial Networks (GANs)36 train a generative neural net to produce contrastive samples by applying a neural network to latent samples from a known distribution (for example, a Gaussian). The generator trains itself to produce outputs Ŷ to which the model gives low energy E(Ŷ). The generator can do so using backpropagation to get the gradient of E(Ŷ) with respect to Ŷ. The generator and the model are trained simultaneously, with the model attempting to give low energy to training samples, and high energy to generated contrastive samples.
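Here is a minimal PyTorch GAN sketch on one-dimensional toy data that illustrates the adversarial setup: the generator maps Gaussian latents to samples, while the discriminator (playing the role of the scoring model) is trained to separate real from generated data, and the generator is trained against it through backpropagation. The architectures, data distribution, and hyperparameters are illustrative assumptions.

```python
# A tiny GAN on 1-D data drawn from N(3, 0.5).
import torch
import torch.nn as nn

torch.manual_seed(0)
real_dist = lambda n: torch.randn(n, 1) * 0.5 + 3.0

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    # Train the discriminator: real samples -> 1, generated samples -> 0.
    real = real_dist(64)
    fake = G(torch.randn(64, 8)).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train the generator to fool the discriminator (gradients flow through D).
    fake = G(torch.randn(64, 8))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())   # should drift toward roughly 3.0
```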

GANs are somewhat tricky to optimize, but adversarial training ideas have proved extremely fertile, producing impressive results in image synthesis, and opening up many new applications in content creation and domain adaptation34 as well as domain or style transfer.87

Making representations agree using contrastive learning. Contrastive learning provides a way to discover good feature vectors without having to reconstruct or generate pixels. The idea is to learn a feed-forward neural network that produces very similar output vectors when given two different crops of the same image10 or two different views of the same object17 but dissimilar output vectors for crops from different images or views of different objects. The squared distance between the two output vectors can be treated as an energy, which is pushed down for compatible pairs and pushed up for incompatible pairs.

A series of recent papers that use convolutional nets for extracting representations that agree have produced promising results in visual feature learning. The positive pairs are composed of different versions of the same image that are distorted through cropping, scaling, rotation, color shift, blurring, and so on. The negative pairs are similarly distorted versions of different images, which may be cleverly picked from the dataset through a process called hard negative mining or may simply be all of the distorted versions of other images in a minibatch. The hidden activity vector of one of the higher-level layers of the network is subsequently used as input to a linear classifier trained in a supervised manner. This Siamese net approach has yielded excellent results on standard image recognition benchmarks.6 Very recently, two Siamese net approaches have managed to eschew the need for contrastive samples. The first one, dubbed SwAV, quantizes the output of one network to train the other network,20 while the second one, dubbed BYOL, smooths the weight trajectory of one of the two networks, which is apparently enough to prevent a collapse.
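The PyTorch sketch below captures the Siamese setup described above: two "views" of each example are encoded and trained to agree, with the other examples in the mini-batch serving as negatives (an NT-Xent / InfoNCE-style loss). The simple encoder, the additive-noise stand-in for image augmentations, and the temperature value are illustrative assumptions rather than the exact recipes of the cited papers.

```python
# Siamese contrastive learning with in-batch negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
tau = 0.1                                          # temperature

for _ in range(200):
    x = torch.randn(32, 128)                       # a mini-batch of "images"
    view1 = x + 0.1 * torch.randn_like(x)          # stand-ins for crops / color shifts
    view2 = x + 0.1 * torch.randn_like(x)

    z1 = F.normalize(encoder(view1), dim=-1)
    z2 = F.normalize(encoder(view2), dim=-1)
    logits = z1 @ z2.T / tau                       # cosine similarities / temperature
    labels = torch.arange(32)                      # the positive pair sits on the diagonal
    loss = F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)
    opt.zero_grad(); loss.backward(); opt.step()
```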

Restricted Boltzmann machine - definition

Variational auto-encoders. A popular recent self-supervised learning method is the Variational Auto-Encoder (VAE).58 This consists of an encoder network that maps the image into a latent code space and a decoder network that generates an image from a latent code. The VAE limits the information capacity of the latent code by adding Gaussian noise to the output of the encoder before it is passed to the decoder. This is akin to packing small noisy spheres into a larger sphere of minimum radius. The information capacity is limited by how many noisy spheres fit inside the containing sphere. The noisy spheres repel each other because a good reconstruction error requires a small overlap between codes that correspond to different samples. Mathematically, the system minimizes a free energy obtained through marginalization of the latent code over the noise distribution. However, minimizing this free energy with respect to the parameters is intractable, and one has to rely on variational approximation methods from statistical physics that minimize an upper bound of the free energy.
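A minimal PyTorch VAE sketch makes the mechanics above explicit: the encoder outputs a mean and a log-variance, Gaussian noise is added via the reparameterization trick, and the training objective is the variational bound (a reconstruction term plus a KL term). The layer sizes and the random stand-in data are illustrative assumptions.

```python
# A minimal Variational Auto-Encoder with the reparameterization trick.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d_x, d_z = 784, 16

enc = nn.Sequential(nn.Linear(d_x, 256), nn.ReLU(), nn.Linear(256, 2 * d_z))
dec = nn.Sequential(nn.Linear(d_z, 256), nn.ReLU(), nn.Linear(256, d_x))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for _ in range(200):
    x = torch.rand(64, d_x)                                   # stand-in for images in [0, 1]
    mu, logvar = enc(x).chunk(2, dim=-1)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # add Gaussian noise to the code
    x_hat = torch.sigmoid(dec(z))

    recon = F.binary_cross_entropy(x_hat, x, reduction="sum") / x.size(0)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    loss = recon + kl          # variational (upper) bound on the free energy
    opt.zero_grad(); loss.backward(); opt.step()
```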

The Future of Deep Learning

The performance of deep learning systems can often be dramatically improved by simply scaling them up. With a lot more data and a lot more computation, they generally work a lot better. The language model GPT-3,18 with 175 billion parameters (which is still tiny compared with the number of synapses in the human brain), generates noticeably better text than GPT-2, with only 1.5 billion parameters. The chatbots Meena2 and BlenderBot73 also keep improving as they get bigger. Enormous effort is now going into scaling up, and it will improve existing systems a lot, but there are fundamental deficiencies of current deep learning that cannot be overcome by scaling alone, as discussed here.

Comparing human learning abilities with current AI suggests several directions for improvement:

Supervised learning requires too much labeled data and model-free reinforcement learning requires far too many trials. Humans seem to be able to generalize well with far less experience.

Current systems are not as robust to changes in distribution as humans, who can quickly adapt to such changes with very few examples.

Current deep learning is most successful at perception tasks and generally what are called system 1 tasks. Using deep learning for system 2 tasks that require a deliberate sequence of steps is an exciting area that is still in its infancy.

What needs to be improved. From the early days, theoreticians of machine learning have focused on the iid assumption, which states that the test cases are expected to come from the same distribution as the training examples. Unfortunately, this is not a realistic assumption in the real world: just consider the non-stationarities due to actions of various agents changing the world, or the gradually expanding mental horizon of a learning agent which always has more to learn and discover. As a practical consequence, the performance of today's best AI systems tends to take a hit when they go from the lab to the field.

Our desire to achieve greater robustness when confronted with changes in distribution (called out-of-distribution generalization) is a special case of the more general objective of reducing sample complexity (the number of examples needed to generalize well) when faced with a new task—as in transfer learning and lifelong learning81—or simply with a change in distribution or in the relationship between states of the world and rewards. Current supervised learning systems require many more examples than humans (when having to learn a new task) and the situation is even worse for model-free reinforcement learning23 since each rewarded trial provides less information about the task than each labeled example. It has already been noted61,76 that humans can generalize in a way that is different and more powerful than ordinary iid generalization: we can correctly interpret novel combinations of existing concepts, even if those combinations are extremely unlikely under our training distribution, so long as they respect high-level syntactic and semantic patterns we have already learned. Recent studies help us clarify how different neural net architectures fare in terms of this systematic generalization ability. How can we design future machine learning systems with these abilities to generalize better or adapt faster out-of-distribution?

Lecture 12.3 — Restricted Boltzmann Machines — [ Deep Learning | Geoffrey Hinton | UofT ]

From homogeneous layers to groups of neurons that represent entities. Evidence from neuroscience suggests that groups of nearby neurons (forming what is called a hyper-column) are tightly connected and might represent a kind of higher-level vector-valued unit able to send not just a scalar quantity but rather a set of coordinated values. This idea is at the heart of the capsules architectures,47,59 and it is also inherent in the use of soft-attention mechanisms, where each element in the set is associated with a vector, from which one can read a key vector and a value vector (and sometimes also a query vector). One way to think about these vector-level units is as representing the detection of an object along with its attributes (like pose information, in capsules). Recent papers in computer vision are exploring extensions of convolutional neural networks in which the top level of the hierarchy represents a set of candidate objects detected in the input image, and operations on these candidates is performed with transformer-like architectures. Neural networks that assign intrinsic frames of reference to objects and their parts and recognize objects by using the geometric relationships between parts should be far less vulnerable to directed adversarial attacks,79 which rely on the large difference between the information used by people and that used by neural nets to recognize objects.

Multiple time scales of adaptation. Most neural nets have only two timescales: the weights adapt slowly over many examples and the activities change rapidly with each new input. Adding an overlay of rapidly adapting and rapidly decaying "fast weights"49 introduces interesting new computational abilities. In particular, it creates a high-capacity, short-term memory,4 which allows a neural net to perform true recursion, in which the same neurons can be reused in a recursive call because their activity vector in the higher-level call can be reconstructed later using the information in the fast weights. Multiple time scales of adaptation also arise in learning to learn, or meta-learning.

Higher-level cognition. When thinking about a new challenge, such as driving in a city with unusual traffic rules, or even imagining driving a vehicle on the moon, we can take advantage of pieces of knowledge and generic skills we have already mastered and recombine them dynamically in new ways. This form of systematic generalization allows humans to generalize fairly well in contexts that are very unlikely under their training distribution. We can then further improve with practice, fine-tuning and compiling these new skills so they do not need conscious attention anymore. How could we endow neural networks with the ability to adapt quickly to new settings by mostly reusing already known pieces of knowledge, thus avoiding interference with known skills? Initial steps in that direction include Transformers32 and Recurrent Independent Mechanisms.

It seems that our implicit (system 1) processing abilities allow us to guess potentially good or dangerous futures, when planning or reasoning. This raises the question of how system 1 networks could guide search and planning at the higher (system 2) level, maybe in the spirit of the value functions which guide Monte-Carlo tree search for AlphaGo.77

Machine learning research relies on inductive biases or priors in order to encourage learning in directions which are compatible with some assumptions about the world. The nature of system 2 processing and cognitive neuroscience theories for them5,30 suggests several such inductive biases and architectures,11,45 which may be exploited to design novel deep learning systems. How do we design deep learning architectures and training frameworks which incorporate such inductive biases?

The ability of young children to perform causal discovery37 suggests this may be a basic property of the human brain, and recent work suggests that optimizing out-of-distribution generalization under interventional changes can be used to train neural networks to discover causal dependencies or causal variables. How should we structure and train neural nets so they can capture these underlying causal properties of the world?

How are the directions suggested by these open questions related to the symbolic AI research program from the 20th century? Clearly, this symbolic AI program aimed at achieving system 2 abilities, such as reasoning, being able to factorize knowledge into pieces which can easily be recombined in a sequence of computational steps, and being able to manipulate abstract variables, types, and instances. We would like to design neural networks which can do all these things while working with real-valued vectors so as to preserve the strengths of deep learning, which include efficient large-scale learning using differentiable computation and gradient-based adaptation, grounding of high-level concepts in low-level perception and action, handling uncertain data, and using distributed representations.

Introduction to Restricted Boltzmann Machines.

Invented by Geoffrey Hinton (sometimes referred to as the Godfather of Deep Learning), a Restricted Boltzmann Machine is an algorithm useful for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling.

Before moving forward, let us first understand what Boltzmann Machines are.

What are Boltzmann Machines?

A Boltzmann machine is a stochastic (non-deterministic), generative deep learning model that has only visible (input) and hidden nodes.

The image below presents ten nodes, all of which are inter-connected and are often referred to as states. The brown ones represent hidden nodes (h) and the blue ones represent visible nodes (v). If you are already familiar with artificial, convolutional, and recurrent neural networks, you will notice that their input nodes are never connected to each other, whereas Boltzmann Machines do connect their inputs, and that is what makes them fundamentally unconventional. All of these nodes exchange information among themselves and self-generate subsequent data, which is why the Boltzmann Machine is termed a generative deep model.

There is no output node in this model. Unlike our usual classifiers, we therefore cannot make the model learn a 1 or 0 from a target variable in the training dataset by applying gradient descent or stochastic gradient descent (SGD); the same holds for regression, where there is no target variable whose pattern could be learned. These attributes make the model non-deterministic. So how does this model learn and predict?

Here, visible nodes are what we measure and hidden nodes are what we do not measure. When we input data, the nodes learn the parameters, their patterns, and the correlations between them on their own, forming an efficient system; this is why the Boltzmann Machine is termed an unsupervised deep learning model. The trained model can then monitor and flag abnormal behavior based on what it has learned.

Hinton once used a nuclear power plant as an illustration for understanding Boltzmann Machines. This is a complex topic, so we shall proceed slowly, building intuition for each concept with a minimum of mathematics and physics.

So in the simplest introductory terms, Boltzmann Machines are primarily divided into two categories: Energy-based Models (EBMs) and Restricted Boltzmann Machines (RBMs). When these RBMs are stacked on top of each other, they are known as Deep Belief Networks (DBNs).

What are Restricted Boltzmann Machines?

A Restricted Boltzmann Machine (RBM) is a generative, stochastic, and 2-layer artificial neural network that can learn a probability distribution over its set of inputs.

Stochastic means “randomly determined”, and in RBMs, the coefficients that modify inputs are randomly initialized.

The first layer of the RBM is called the visible, or input layer, and the second is the hidden layer. Each circle represents a neuron-like unit called a node. Each node in the input layer is connected to every node of the hidden layer.

The restriction in a Restricted Boltzmann Machine is that there is no intra-layer communication (nodes of the same layer are not connected). This restriction allows for more efficient training algorithms than what is available for the general class of Boltzmann machines, in particular, the gradient-based contrastive divergence algorithm. Each node is a locus of computation that processes input and begins by making stochastic decisions about whether to transmit that input or not.
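The bipartite restriction is what makes contrastive divergence practical: with no intra-layer connections, all hidden units can be sampled in parallel given the visible units, and vice versa. The NumPy sketch below shows one step of contrastive divergence (CD-1) training; the layer sizes, learning rate, and toy binary data are illustrative assumptions.

```python
# Training an RBM with one step of contrastive divergence (CD-1).
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 64, 32, 0.05
W = 0.01 * rng.normal(size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)                 # visible biases
b_h = np.zeros(n_hidden)                  # hidden biases

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

V = (rng.random((500, n_visible)) < 0.3).astype(float)    # toy binary training data

for epoch in range(20):
    for v0 in V:
        # Positive phase: sample all hidden units in parallel given the data.
        p_h0 = sigmoid(v0 @ W + b_h)
        h0 = sample(p_h0)
        # Negative phase (one Gibbs step): reconstruct visibles, then recompute hiddens.
        p_v1 = sigmoid(h0 @ W.T + b_v)
        v1 = sample(p_v1)
        p_h1 = sigmoid(v1 @ W + b_h)
        # CD-1 update: data-driven statistics minus reconstruction-driven statistics.
        W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
        b_v += lr * (v0 - v1)
        b_h += lr * (p_h0 - p_h1)
```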

RBMs received a lot of attention after being proposed as the building blocks of multi-layer learning architectures called Deep Belief Networks (DBNs), which are formed by stacking RBMs on top of each other.

Difference between Autoencoders & RBMs

An autoencoder is a simple 3-layer neural network whose output units are directly connected back to the input units. Typically, the number of hidden units is much smaller than the number of visible ones. The task of training is to minimize a reconstruction error, i.e., to find the most efficient compact representation of the input data.

Working of Restricted Boltzmann Machine

One aspect that distinguishes RBMs from other neural networks is that they have two biases:

  • The hidden bias helps the RBM produce the activations on the forward pass.
  • The visible layer’s biases help the RBM learn the reconstructions on the backward pass.

The reconstructed input is always different from the actual input as there are no connections among visible nodes and therefore, no way of transferring information among themselves.

The above image shows the first step in training an RBM with multiple inputs. The inputs are multiplied by the weights and then added to the bias. The result is then passed through a sigmoid activation function and the output determines if the hidden state gets activated or not. Weights will be a matrix with the number of input nodes as the number of rows and the number of hidden nodes as the number of columns. The first hidden node will receive the vector multiplication of the inputs multiplied by the first column of weights before the corresponding bias term is added to it.
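The short NumPy sketch below mirrors that forward pass: multiply the inputs by the weight matrix (rows = input nodes, columns = hidden nodes), add the hidden bias, squash through a sigmoid, and use the result as the probability that each hidden unit switches on. The sizes and the random weights are illustrative assumptions.

```python
# Forward pass of an RBM for a single input vector.
import numpy as np

rng = np.random.default_rng(1)
v = rng.integers(0, 2, size=6).astype(float)      # one visible (input) vector
W = 0.1 * rng.normal(size=(6, 3))                 # rows = input nodes, cols = hidden nodes
b_h = np.zeros(3)                                 # hidden bias

p_hidden = 1.0 / (1.0 + np.exp(-(v @ W + b_h)))   # sigmoid(inputs x weights + bias)
hidden_state = (rng.random(3) < p_hidden).astype(float)   # stochastic activation
print(p_hidden, hidden_state)
```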

More Information:

https://medium.com/edureka/restricted-boltzmann-machine-tutorial-991ae688c154

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6997788/

https://www.frontiersin.org/articles/10.3389/fphar.2019.01631/full

https://cacm.acm.org/magazines/2021/7/253464-deep-learning-for-ai/fulltext

https://www.theaidream.com/post/introduction-to-restricted-boltzmann-machines-rbms

https://onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog0901_7

https://vinodsblog.com/2020/07/28/deep-learning-introduction-to-boltzmann-machines/

http://www.cs.toronto.edu/~hinton/papers.html

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6997788/pdf/fphar-10-01631.pdf

https://research.google.com/pubs/GeoffreyHinton.html?source=post_page---------------------------



Azure DDOS Security and DDOS Attack Prevention


Azure DDoS Protection | Distributed Denial of Service

Azure Security Center

Fundamental best practices

The following sections give prescriptive guidance to build DDoS-resilient services on Azure.

Design for security

Ensure that security is a priority throughout the entire lifecycle of an application, from design and implementation to deployment and operations. Applications can have bugs that allow a relatively low volume of requests to use an inordinate amount of resources, resulting in a service outage.

To help protect a service running on Microsoft Azure, you should have a good understanding of your application architecture and focus on the five pillars of software quality. You should know typical traffic volumes, the connectivity model between the application and other applications, and the service endpoints that are exposed to the public internet.

Ensuring that an application is resilient enough to handle a denial of service that's targeted at the application itself is most important. Security and privacy are built into the Azure platform, beginning with the Security Development Lifecycle (SDL). The SDL addresses security at every development phase and ensures that Azure is continually updated to make it even more secure.

Microsoft Security Virtual Training Day: Security, Compliance and Identity Fundamentals 1

Design for scalability

Scalability is how well a system can handle increased load. Design your applications to scale horizontally to meet the demand of an amplified load, specifically in the event of a DDoS attack. If your application depends on a single instance of a service, it creates a single point of failure. Provisioning multiple instances makes your system more resilient and more scalable.

For Azure App Service, select an App Service plan that offers multiple instances. For Azure Cloud Services, configure each of your roles to use multiple instances. For Azure Virtual Machines, ensure that your virtual machine (VM) architecture includes more than one VM and that each VM is included in an availability set. We recommend using virtual machine scale sets for autoscaling capabilities.

Defense in depth

The idea behind defense in depth is to manage risk by using diverse defensive strategies. Layering security defenses in an application reduces the chance of a successful attack. We recommend that you implement secure designs for your applications by using the built-in capabilities of the Azure platform.

For example, the risk of attack increases with the size (surface area) of the application. You can reduce the surface area by using an approval list to close down the exposed IP address space and listening ports that are not needed on the load balancers (Azure Load Balancer and Azure Application Gateway). Network security groups (NSGs) are another way to reduce the attack surface. You can use service tags and application security groups to minimize complexity for creating security rules and configuring network security, as a natural extension of an application’s structure.
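As an illustration of closing down unneeded exposure, the following hedged Python sketch adds an NSG rule that allows HTTPS only from an approved address range, using the Azure SDK for Python (azure-identity and azure-mgmt-network). The subscription ID, resource group, NSG name, and approved prefix are placeholders, and the exact SDK method and property names should be verified against the current azure-mgmt-network documentation.

```python
# Lock down a listening port with a network security group rule (sketch).
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

# Allow HTTPS only from an approved address range; everything else keeps the
# NSG's default deny behavior, shrinking the exposed surface area.
rule = {
    "protocol": "Tcp",
    "direction": "Inbound",
    "access": "Allow",
    "priority": 100,
    "source_address_prefix": "203.0.113.0/24",    # approval list (example range)
    "source_port_range": "*",
    "destination_address_prefix": "*",
    "destination_port_range": "443",
}
client.security_rules.begin_create_or_update(
    "my-resource-group", "my-nsg", "allow-https-from-approved-range", rule
).result()
```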

You should deploy Azure services in a virtual network whenever possible. This practice allows service resources to communicate through private IP addresses. Azure service traffic from a virtual network uses public IP addresses as source IP addresses by default. Using service endpoints will switch service traffic to use virtual network private addresses as the source IP addresses when they're accessing the Azure service from a virtual network.

We often see customers' on-premises resources getting attacked along with their resources in Azure. If you're connecting an on-premises environment to Azure, we recommend that you minimize exposure of on-premises resources to the public internet. You can use the scale and advanced DDoS protection capabilities of Azure by deploying your well-known public entities in Azure. Because these publicly accessible entities are often a target for DDoS attacks, putting them in Azure reduces the impact on your on-premises resources.

Azure DDoS Protection | Distributed Denial of Service

Azure DDoS Protection Standard features

The following sections outline the key features of the Azure DDoS Protection Standard service.

Always-on traffic monitoring

DDoS Protection Standard monitors actual traffic utilization and constantly compares it against the thresholds defined in the DDoS Policy. When the traffic threshold is exceeded, DDoS mitigation is initiated automatically. When traffic returns below the thresholds, the mitigation is stopped.

Azure DDoS Protection Standard Mitigation

During mitigation, traffic sent to the protected resource is redirected by the DDoS protection service and several checks are performed, such as:

Ensure packets conform to internet specifications and are not malformed.

Interact with the client to determine whether the traffic is potentially spoofed (for example, via SYN Auth or SYN cookies, or by dropping a packet for the source to retransmit it).

Rate-limit packets, if no other enforcement method can be performed.

DDoS protection drops attack traffic and forwards the remaining traffic to its intended destination. Within a few minutes of attack detection, you are notified using Azure Monitor metrics. By configuring logging on DDoS Protection Standard telemetry, you can write the logs to available options for future analysis. Metric data in Azure Monitor for DDoS Protection Standard is retained for 30 days.

Adaptive real time tuning

The complexity of attacks (for example, multi-vector DDoS attacks) and the application-specific behaviors of tenants call for per-customer, tailored protection policies. The service accomplishes this by using two insights:

Automatic learning of per-customer (per-Public IP) traffic patterns for Layer 3 and 4.

Minimizing false positives, considering that the scale of Azure allows it to absorb a significant amount of traffic.

Diagram of how DDoS Protection Standard works, with "Policy Generation" circled

DDoS Protection telemetry, monitoring, and alerting

DDoS Protection Standard exposes rich telemetry via Azure Monitor. You can configure alerts for any of the Azure Monitor metrics that DDoS Protection uses. You can integrate logging with Splunk (Azure Event Hubs), Azure Monitor logs, and Azure Storage for advanced analysis via the Azure Monitor Diagnostics interface.

DDoS mitigation policies

In the Azure portal, select Monitor > Metrics. In the Metrics pane, select the resource group, select a resource type of Public IP Address, and select your Azure public IP address. DDoS metrics are visible in the Available metrics pane.

DDoS Protection Standard applies three autotuned mitigation policies (TCP SYN, TCP, and UDP) for each public IP of the protected resource, in the virtual network that has DDoS enabled. You can view the policy thresholds by selecting the metric Inbound packets to trigger DDoS mitigation.

Available metrics and metrics chart

The policy thresholds are autoconfigured via machine learning-based network traffic profiling. DDoS mitigation occurs for an IP address under attack only when the policy threshold is exceeded.
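The following purely conceptual Python sketch illustrates the idea of profiling normal traffic to auto-tune a per-IP threshold, for example as a high quantile over a rolling window with some headroom. It is an illustration of the concept only, not Azure's actual machine-learning policy algorithm, and the window size, quantile, and headroom values are invented for the example.

```python
# Conceptual per-IP adaptive threshold derived from observed traffic.
from collections import deque

class AdaptiveThreshold:
    def __init__(self, window=1440, quantile=0.99, headroom=2.0):
        self.samples = deque(maxlen=window)   # e.g., per-minute packets-per-second samples
        self.quantile = quantile
        self.headroom = headroom

    def observe(self, pps: float) -> None:
        self.samples.append(pps)

    def threshold(self) -> float:
        ordered = sorted(self.samples)
        idx = int(self.quantile * (len(ordered) - 1))
        return ordered[idx] * self.headroom    # trigger only well above normal peaks

    def under_attack(self, pps: float) -> bool:
        return len(self.samples) > 60 and pps > self.threshold()

profile = AdaptiveThreshold()
for minute_pps in [1200, 1350, 1100, 1500, 900] * 20:
    profile.observe(minute_pps)
print(profile.threshold(), profile.under_attack(250000))
```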

Metric for an IP address under DDoS attack

If the public IP address is under attack, the value for the metric Under DDoS attack or not changes to 1 as DDoS Protection performs mitigation on the attack traffic.

We recommend configuring an alert on this metric. You'll then be notified when there’s an active DDoS mitigation performed on your public IP address.
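For ad hoc checks, the metric can also be read programmatically. The hedged sketch below polls the public IP's DDoS metric through the Azure Monitor REST API using azure-identity and requests; the resource ID is a placeholder, and the metric name ("IfUnderDDoSAttack") and API version are assumptions to verify against the Azure Monitor metrics documentation. In production you would normally configure an Azure Monitor metric alert rather than polling.

```python
# Poll the "Under DDoS attack or not" metric for a public IP address (sketch).
import requests
from azure.identity import DefaultAzureCredential

resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<rg>"
    "/providers/Microsoft.Network/publicIPAddresses/<public-ip-name>"
)
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

resp = requests.get(
    f"https://management.azure.com{resource_id}/providers/microsoft.insights/metrics",
    params={
        "metricnames": "IfUnderDDoSAttack",   # assumed metric name; verify in the docs
        "aggregation": "Maximum",
        "api-version": "2018-01-01",
    },
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
for series in resp.json().get("value", []):
    for ts in series.get("timeseries", []):
        for point in ts.get("data", []):
            if point.get("maximum") == 1:
                print("Active DDoS mitigation detected on this public IP")
```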

For more information, see Manage Azure DDoS Protection Standard using the Azure portal.

Azure Network Security: DDoS Protection

Web application firewall for resource attacks

Specific to resource attacks at the application layer, you should configure a web application firewall (WAF) to help secure web applications. A WAF inspects inbound web traffic to block SQL injections, cross-site scripting, DDoS, and other Layer 7 attacks. Azure provides WAF as a feature of Application Gateway for centralized protection of your web applications from common exploits and vulnerabilities. There are other WAF offerings available from Azure partners that might be more suitable for your needs via the Azure Marketplace.

Even web application firewalls are susceptible to volumetric and state exhaustion attacks. We strongly recommend enabling DDoS Protection Standard on the WAF virtual network to help protect from volumetric and protocol attacks. For more information, see the DDoS Protection reference architectures section.

Protection Planning

Planning and preparation are crucial to understand how a system will perform during a DDoS attack. Designing an incident management response plan is part of this effort.

If you have DDoS Protection Standard, make sure that it's enabled on the virtual network of internet-facing endpoints. Configuring DDoS alerts helps you constantly watch for any potential attacks on your infrastructure.

Monitor your applications independently. Understand the normal behavior of an application. Prepare to act if the application is not behaving as expected during a DDoS attack.

Receiving Distributed Denial of Service (DDoS) attack threats?

DDoS threats have seen a significant rise in frequency lately, and Microsoft stopped numerous large-scale DDoS attacks last year. This guide provides an overview of what Microsoft provides at the platform level, information on recent mitigations, and best practices.

Getting started with Azure DDoS Protection - Azure Network Security webinar

Microsoft DDoS platform

Microsoft provides robust protection against layer three (L3) and layer four (L4) DDoS attacks, which include TCP SYN, new connections, and UDP/ICMP/TCP floods.

Microsoft DDoS Protection utilizes Azure’s global deployment scale, is distributed in nature, and offers 60Tbps of global attack mitigation capacity.

All Microsoft services (including Microsoft 365, Azure, and Xbox) are protected by platform-level DDoS protection. Microsoft's cloud services are intentionally built to support high loads, which helps to protect against application-level DDoS attacks.

All Azure public endpoint VIPs (Virtual IP Address) are guarded at platform safe thresholds. The protection extends to traffic flows inbound from the internet, outbound to the internet, and from region to region.

Microsoft uses standard detection and mitigation techniques such as SYN cookies, rate limiting, and connection limits to protect against DDoS attacks. To support automated protections, a cross-workload DDoS incident response team identifies the roles and responsibilities across teams, the criteria for escalations, and the protocols for incident handling across affected teams.

Microsoft also takes a proactive approach to DDoS defense. Botnets are a common source of command and control for conducting DDoS attacks to amplify attacks and maintain anonymity. The Microsoft Digital Crimes Unit (DCU) focuses on identifying, investigating, and disrupting malware distribution and communications infrastructure to reduce the scale and impact of botnets.

At Microsoft, despite the evolving challenges in the cyber landscape, the Azure DDoS Protection team was able to successfully mitigate some of the largest DDoS attacks ever, both in Azure and in the course of history.

In October 2021, Microsoft reported on a 2.4 terabit per second (Tbps) DDoS attack in Azure that was successfully mitigated. Since then, three larger attacks have been mitigated.

In November 2021, Microsoft mitigated a DDoS attack with a throughput of 3.47 Tbps and a packet rate of 340 million packets per second (pps), targeting an Azure customer in Asia. As of February 2022, this is believed to be the largest attack ever reported in history. It was a distributed attack originating from approximately 10,000 sources and from multiple countries across the globe, including the United States, China, South Korea, Russia, Thailand, India, Vietnam, Iran, Indonesia, and Taiwan.

Azure Network Security webinar: Getting started with Azure DDoS Protection

Protect your applications in Azure against DDoS attacks in three steps:

Customers can protect their Azure workloads by onboarding to Azure DDoS Protection Standard. For web workloads, we recommend using a web application firewall in conjunction with DDoS Protection Standard for extensive L3-L7 protection.

1. Evaluate risks for your Azure applications. This is the time to understand the scope of your risk from a DDoS attack if you haven’t done so already.

a. If there are virtual networks with applications exposed over the public internet, we strongly recommend enabling DDoS Protection on those virtual networks. Resources in a virtual network that require protection against DDoS attacks include Azure Application Gateway and Azure Web Application Firewall (WAF), Azure Load Balancer, virtual machines, Bastion, Kubernetes, and Azure Firewall. Review “DDoS Protection reference architectures” for more details on how to protect resources in virtual networks against DDoS attacks.

Enabling DDOS Protection Standard on a VNET

2. Validate your assumptions. Planning and preparation are crucial to understanding how a system will perform during a DDoS attack. You should be proactive to defend against DDoS attacks and not wait for an attack to happen and then act.

a. It is essential that you understand the normal behavior of an application and prepare to act if the application is not behaving as expected during a DDoS attack. Have monitors configured for your business-critical applications that mimic client behavior and notify you when relevant anomalies are detected. Refer to monitoring and diagnostics best practices to gain insights on the health of your application.

b. Azure Application Insights is an extensible application performance management (APM) service for web developers on multiple platforms. Use Application Insights to monitor your live web application. It automatically detects performance anomalies. It includes analytics tools to help you diagnose issues and to understand what users do with your app. It's designed to help you continuously improve performance and usability.

c. Finally, test your assumptions about how your services will respond to an attack by generating traffic against your applications to simulate DDoS attack. Don’t wait for an actual attack to happen! We have partnered with Ixia, a Keysight company, to provide a self-service traffic generator (BreakingPoint Cloud) that allows Azure DDoS Protection customers to simulate DDoS test traffic against their Azure public endpoints.

3. Configure alerts and attack analytics. Azure DDoS Protection identifies and mitigates DDoS attacks without any user intervention.

a. To get notified when there’s an active mitigation for a protected public IP, we recommend configuring an alert on the metric Under DDoS attack or not. DDoS attack mitigation alerts are automatically sent to Microsoft Defender for Cloud.

b. You should also configure attack analytics to understand the scale of the attack, traffic being dropped, and other details.

Microsoft Azure Security Overview

DDOS attack analytics

Best practices to be followed

Provision enough service capacity and enable auto-scaling to absorb the initial burst of a DDoS attack.

Reduce attack surfaces; reevaluate the public endpoints and decide whether they need to be publicly accessible.

If applicable, configure Network Security Group to further lock-down surfaces.

If IIS (Internet Information Services) is used, leverage IIS Dynamic IP Address Restrictions to control traffic from malicious IPs.

Setup monitoring and alerting if you have not done so already.

Some of the counters to monitor:

  • TCP connection established
  • Web current connections
  • Web connection attempts

Optionally, use third-party security offerings, such as web application firewalls or inline virtual appliances, from the Azure Marketplace for additional L7 protection that is not covered via Azure DDoS Protection and Azure WAF (Azure Web Application Firewall).

When to contact Microsoft support

  • During a DDoS attack, you find that the performance of the protected resource is severely degraded, or the resource is not available. Review step two above on configuring monitors to detect resource availability and performance issues.

  • You think your resource is under DDoS attack, but the DDoS Protection service is not mitigating the attack effectively.

  • You're planning a viral event that will significantly increase your network traffic.

Microsoft denial-of-service Defense Strategy

Denial-of-service Defense Strategy

Microsoft's strategy to defend against network-based distributed denial-of-service (DDoS) attacks is unique due to a large global footprint, allowing Microsoft to utilize strategies and techniques that are unavailable to most other organizations. Additionally, Microsoft contributes to and draws from collective knowledge aggregated by an extensive threat intelligence network, which includes Microsoft partners and the broader internet security community. This intelligence, along with information gathered from online services and Microsoft's global customer base, continuously improves Microsoft's DDoS defense system that protects all of Microsoft online services' assets.

The cornerstone of Microsoft's DDoS strategy is global presence. Microsoft engages with Internet providers, peering providers (public and private), and private corporations all over the world. This engagement gives Microsoft a significant Internet presence and enables Microsoft to absorb attacks across a large surface area.

As Microsoft's edge capacity has grown over time, the significance of attacks against individual edges has substantially diminished. Because of this decrease, Microsoft has separated the detection and mitigation components of its DDoS prevention system. Microsoft deploys multi-tiered detection systems at regional datacenters to detect attacks closer to their saturation points while maintaining global mitigation at the edge nodes. This strategy ensures that Microsoft services can handle multiple simultaneous attacks.

One of the most effective and low-cost defenses employed by Microsoft against DDoS attacks is reducing service attack surfaces. Unwanted traffic is dropped at the network edge instead of analyzing, processing, and scrubbing the data inline.

At the interface with the public network, Microsoft uses special-purpose security devices for firewall, network address translation, and IP filtering functions. Microsoft also uses global equal-cost multi-path (ECMP) routing. Global ECMP routing is a network framework to ensure that there are multiple global paths to reach a service. With multiple paths to each service, DDoS attacks are limited to the region from which the attack originates. Other regions should be unaffected by the attack, as end users would use other paths to reach the service in those regions. Microsoft has also developed internal DDoS correlation and detection systems that use flow data, performance metrics, and other information to rapidly detect DDoS attacks.

Use Azure Security Center to prevent, detect, and respond to threats

To further protect cloud services, Microsoft uses Azure DDoS Protection, a DDoS defense system built into Microsoft Azure's continuous monitoring and penetration-testing processes. Azure DDoS Protection is designed not only to withstand external attacks, but also attacks from other Azure tenants. Azure uses standard detection and mitigation techniques such as SYN cookies, rate limiting, and connection limits to protect against DDoS attacks. To support automated protections, a cross-workload DDoS incident response team identifies the roles and responsibilities across teams, the criteria for escalations, and the protocols for incident handling across affected teams.

Most DDoS attacks launched against targets are at the Network (L3) and Transport (L4) layers of the Open Systems Interconnection (OSI) model. Attacks directed at the L3 and L4 layers are designed to flood a network interface or service with attack traffic to overwhelm resources and deny the ability to respond to legitimate traffic. To guard against L3 and L4 attacks, Microsoft's DDoS solutions use traffic sampling data from datacenter routers to safeguard the infrastructure and customer targets. Traffic sampling data is analyzed by a network monitoring service to detect attacks. When an attack is detected, automated defense mechanisms kick in to mitigate the attack and ensure that attack traffic directed at one customer does not result in collateral damage or diminished network quality of service for other customers.

Microsoft also takes an offensive approach to DDoS defense. Botnets are a common source of command and control for conducting DDoS attacks to amplify attacks and maintain anonymity. The Microsoft Digital Crimes Unit (DCU) focuses on identifying, investigating, and disrupting malware distribution and communications infrastructure to reduce the scale and impact of botnets.

Azure Network Security webinar: Safeguards for Successful Azure DDoS Protection Standard Deployment

Application-level Defenses

Microsoft's cloud services are intentionally built to support high loads, which help to protect against application-level DDoS attacks. Microsoft's scaled-out architecture distributes services across multiple global datacenters with regional isolation and workload-specific throttling features for relevant workloads.

Each customer's country or region, which the customer's administrator identifies during the initial configuration of the services, determines the primary storage location for that customer's data. Customer data is replicated between redundant datacenters according to a primary/backup strategy. A primary datacenter hosts the application software along with all the primary customer data running on the software. A backup datacenter provides automatic failover. If the primary datacenter ceases to function for any reason, requests are redirected to the copy of the software and customer data in the backup datacenter. At any given time, customer data may be processed in either the primary or the backup datacenter. Distributing data across multiple datacenters reduces the affected surface area in case one datacenter is attacked. Furthermore, the services in the affected datacenter can be quickly redirected to the secondary datacenter to maintain availability during an attack and redirected back to the primary datacenter once an attack has been mitigated.

As another mitigation against DDoS attacks, individual workloads include built-in features that manage resource utilization. For example, the throttling mechanisms in Exchange Online and SharePoint Online are part of a multi-layered approach to defending against DDoS attacks.

Azure SQL Database has an extra layer of security in the form of a gateway service called DoSGuard that tracks failed login attempts based on IP address. If the threshold for failed login attempts from the same IP is reached, DoSGuard blocks the address for a pre-determined amount of time.
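To make the gateway idea tangible, here is a conceptual Python sketch of that kind of logic: count failed logins per source IP and block the address for a fixed period once a threshold is reached. It illustrates the idea only; the thresholds and timings are invented and this is not DoSGuard's actual implementation.

```python
# Conceptual failed-login throttling per source IP.
import time
from collections import defaultdict

FAIL_LIMIT = 10          # illustrative threshold
BLOCK_SECONDS = 900      # illustrative block duration (15 minutes)

failed_attempts = defaultdict(int)
blocked_until = {}

def record_failed_login(ip: str, now: float = None) -> None:
    now = time.time() if now is None else now
    failed_attempts[ip] += 1
    if failed_attempts[ip] >= FAIL_LIMIT:
        blocked_until[ip] = now + BLOCK_SECONDS
        failed_attempts[ip] = 0

def is_blocked(ip: str, now: float = None) -> bool:
    now = time.time() if now is None else now
    return blocked_until.get(ip, 0) > now

for _ in range(10):
    record_failed_login("198.51.100.7")
print(is_blocked("198.51.100.7"))   # True for the next 15 minutes
```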

Resources

  • Azure DDoS Protection Standard overview
  • Azure DDoS Protection Standard fundamental best practices
  • Components of a DDoS response strategy

Azure DDoS Protection Service preview

Announcing DDoS Protection preview for Azure

Microsoft Azure

Distributed Denial of Service (DDoS) attacks are one of the top availability and security concerns voiced by customers moving their applications to the cloud. These concerns are justified as the number of documented DDoS attacks grew 380% in Q1 2017 over Q1 2016 according to data from Nexusguard. In October 2016, a number of popular websites were impacted by a massive cyberattack consisting of multiple denial of service attacks. It’s estimated that up to one third of all Internet downtime incidents are related to DDoS attacks.

As the types and sophistication of network attacks increase, Azure is committed to providing our customers with solutions that continue to protect the security and availability of applications on Azure. Security and availability in the cloud are a shared responsibility. Azure provides platform-level capabilities and design best practices for customers to adopt and apply in their application designs to meet their business objectives.

Today we're excited to announce the preview of Azure DDoS Protection Standard. This service is integrated with Virtual Networks and provides protection for Azure applications from the impacts of DDoS attacks. It enables additional application-specific tuning, alerting, and telemetry features beyond the basic DDoS Protection, which is included automatically in the Azure platform.

Azure DDoS Protection Service offerings

Azure DDoS Protection Basic service

Basic protection is integrated into the Azure platform by default and at no additional cost. The full scale and capacity of Azure’s globally deployed network provides defense against common network layer attacks through always on traffic monitoring and real-time mitigation. No user configuration or application changes are required to enable DDoS Protection Basic.

Global DDoS Mitigation Presence

Azure DDoS Protection Standard service

Azure DDoS Protection Standard is a new offering which provides additional DDoS mitigation capabilities and is automatically tuned to protect your specific Azure resources. Protection is simple to enable on any new or existing Virtual Network and requires no application or resource changes. Standard utilizes dedicated monitoring and machine learning to configure DDoS protection policies tuned to your Virtual Network. This additional protection is achieved by profiling your application’s normal traffic patterns, intelligently detecting malicious traffic and mitigating attacks as soon as they are detected. DDoS Protection Standard provides attack telemetry views through Azure Monitor, enabling alerting when your application is under attack. Integrated Layer 7 application protection can be provided by Application Gateway WAF.

Azure Network Security | EMEA Security Days April 11-12, 2022

Azure DDoS Protection Standard service features

Native Platform Integration

Azure DDoS Protection is natively integrated into Azure and includes configuration through the Azure Portal and PowerShell when you enable it on a Virtual Network (VNet).
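The text above mentions the Azure Portal and PowerShell; as an alternative illustration, the same configuration can be scripted with the Azure SDK for Python. This is a rough sketch: the subscription ID, resource group, and resource names are placeholders, and method names such as begin_create_or_update can vary slightly between azure-mgmt-network versions.

```python
# pip install azure-identity azure-mgmt-network
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"   # placeholder
resource_group = "rg-ddos-demo"         # placeholder
location = "westeurope"

client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

# 1. Create a DDoS protection plan.
plan = client.ddos_protection_plans.begin_create_or_update(
    resource_group,
    "ddos-plan-demo",
    {"location": location},
).result()

# 2. Create (or update) a VNet with DDoS Protection Standard enabled,
#    referencing the plan created above.
vnet = client.virtual_networks.begin_create_or_update(
    resource_group,
    "vnet-demo",
    {
        "location": location,
        "address_space": {"address_prefixes": ["10.0.0.0/16"]},
        "enable_ddos_protection": True,
        "ddos_protection_plan": {"id": plan.id},
    },
).result()

print(f"DDoS Protection enabled on {vnet.name}: {vnet.enable_ddos_protection}")
```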

Turn Key Protection

Simplified provisioning immediately protects all resources in a Virtual Network with no additional application changes required.


Always on monitoring

When DDoS Protection is enabled, your application traffic patterns are continuously monitored for indicators of attacks.

Adaptive tuning

DDoS Protection understands your resources and resource configuration and customizes the DDoS protection policy to your Virtual Network. Machine learning algorithms set and adjust protection policies as traffic patterns change over time. Protection policies define protection limits, and mitigation is performed when actual network traffic exceeds the policy thresholds.
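Azure's tuning logic is proprietary, but the concept of learning a per-resource baseline and mitigating when traffic exceeds it can be illustrated with a simple exponentially weighted moving average. The smoothing factor and multiplier below are illustrative assumptions, not Azure's actual policy values.

```python
class AdaptiveThreshold:
    """Toy baseline learner: illustrates the concept, not Azure's algorithm."""

    def __init__(self, alpha: float = 0.05, multiplier: float = 4.0):
        self.alpha = alpha            # smoothing factor for the learned baseline
        self.multiplier = multiplier  # how far above baseline triggers mitigation
        self.baseline = None          # learned packets-per-second baseline

    def observe(self, packets_per_second: float) -> bool:
        """Feed one traffic sample; return True if mitigation should start."""
        if self.baseline is None:
            self.baseline = packets_per_second
            return False
        threshold = self.baseline * self.multiplier
        attack = packets_per_second > threshold
        if not attack:
            # Only learn from traffic that looks normal.
            self.baseline += self.alpha * (packets_per_second - self.baseline)
        return attack

detector = AdaptiveThreshold()
for pps in [1000, 1100, 950, 1200, 50000]:   # last sample simulates a flood
    print(pps, "mitigate" if detector.observe(pps) else "ok")
```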


L3 to L7 Protection with Application Gateway

The Azure DDoS Protection service, in combination with the Application Gateway web application firewall, provides protection against common web vulnerabilities and attacks, including:

  • Request rate-limiting
  • HTTP Protocol Violations
  • HTTP Protocol Anomalies
  • SQL Injection
  • Cross-site scripting

DDoS Protection telemetry, monitoring & alerting

Rich telemetry is exposed via Azure Monitor, including detailed metrics for the duration of a DDoS attack. Alerting can be configured for any of the Azure Monitor metrics exposed by DDoS Protection. Logging can be further integrated with Splunk (via Azure Event Hubs), OMS Log Analytics, and Azure Storage for advanced analysis via the Azure Monitor Diagnostics interface.
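As a hedged illustration of pulling that telemetry programmatically, the azure-monitor-query package can read DDoS metrics for a protected public IP. The resource ID below is a placeholder, and the metric name "IfUnderDDoSAttack" should be confirmed against the metrics reference for your API version.

```python
# pip install azure-identity azure-monitor-query
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

# Resource ID of a public IP protected by DDoS Protection Standard (placeholder).
resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/rg-ddos-demo"
    "/providers/Microsoft.Network/publicIPAddresses/pip-demo"
)

client = MetricsQueryClient(DefaultAzureCredential())

# "IfUnderDDoSAttack" flags an active attack on the public IP
# (1 = under attack, 0 = not); verify the name for your API version.
result = client.query_resource(
    resource_id,
    metric_names=["IfUnderDDoSAttack"],
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=5),
)

for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.maximum or point.average)
```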

Cost protection

When the DDoS Protection service reaches general availability (GA), Cost Protection will provide resource credits for scale-out during a documented attack.

Azure DDoS Protection Standard service availability

Azure DDoS Protection is now available for preview in select regions in US, Europe, and Asia. For details, see DDoS Protection.

Azure DDoS Protection Standard overview

Distributed denial of service (DDoS) attacks are some of the largest availability and security concerns facing customers that are moving their applications to the cloud. A DDoS attack attempts to exhaust an application's resources, making the application unavailable to legitimate users. DDoS attacks can be targeted at any endpoint that is publicly reachable through the internet.

Azure DDoS Protection Standard, combined with application design best practices, provides enhanced DDoS mitigation features to defend against DDoS attacks. It is automatically tuned to help protect your specific Azure resources in a virtual network. Protection is simple to enable on any new or existing virtual network, and it requires no application or resource changes.

Features

- Native platform integration: Natively integrated into Azure. Includes configuration through the Azure portal. DDoS Protection Standard understands your resources and resource configuration.

- Turnkey protection: Simplified configuration immediately protects all resources on a virtual network as soon as DDoS Protection Standard is enabled. No intervention or user definition is required.

- Always-on traffic monitoring: Your application traffic patterns are monitored 24 hours a day, 7 days a week, looking for indicators of DDoS attacks. DDoS Protection Standard instantly and automatically mitigates the attack, once it is detected.

- Adaptive tuning: Intelligent traffic profiling learns your application's traffic over time, and selects and updates the profile that is the most suitable for your service. The profile adjusts as traffic changes over time.

- Multi-Layered protection: When deployed with a web application firewall (WAF), DDoS Protection Standard protects both at the network layer (Layer 3 and 4, offered by Azure DDoS Protection Standard) and at the application layer (Layer 7, offered by a WAF). WAF offerings include Azure Application Gateway WAF SKU as well as third-party web application firewall offerings available in the Azure Marketplace.

- Extensive mitigation scale: Over 60 different attack types can be mitigated, with global capacity, to protect against the largest known DDoS attacks.

- Attack analytics: Get detailed reports in five-minute increments during an attack, and a complete summary after the attack ends. Stream mitigation flow logs to Microsoft Sentinel or an offline security information and event management (SIEM) system for near real-time monitoring during an attack.

- Attack metrics: Summarized metrics from each attack are accessible through Azure Monitor.

- Attack alerting: Alerts can be configured at the start and stop of an attack, and over the attack's duration, using built-in attack metrics. Alerts integrate into your operational software like Microsoft Azure Monitor logs, Splunk, Azure Storage, Email, and the Azure portal.

- DDoS Rapid Response: Engage the DDoS Protection Rapid Response (DRR) team for help with attack investigation and analysis. To learn more, see DDoS Rapid Response.

- Cost guarantee: Receive data-transfer and application scale-out service credit for resource costs incurred as a result of documented DDoS attacks.



More Information:

https://azure.microsoft.com/en-us/services/ddos-protection

https://docs.microsoft.com/en-us/learn/modules/perimeter-security

https://blog.sflow.com/2014/07/ddos-mitigation-with-cumulus-linux.html

https://azure.microsoft.com/en-us/blog/azure-ddos-protection-service-preview

https://docs.microsoft.com/en-us/azure/ddos-protection/manage-ddos-protection

https://docs.microsoft.com/en-us/azure/ddos-protection/ddos-protection-reference-architectures

https://docs.microsoft.com/en-us/azure/ddos-protection/ddos-protection-standard-features

https://azure.microsoft.com/en-us/blog/microsoft-ddos-protection-response-guide

https://azure.microsoft.com/en-us/blog/azure-ddos-protection-2021-q3-and-q4-ddos-attack-trends

https://docs.microsoft.com/en-us/compliance/assurance/assurance-microsoft-dos-defense-strategy


What's new in Red Hat Enterprise Linux 9


 


RHEL 9’s Release Indicates a Turning Point for Red Hat

Can RHEL 9 put the company's CentOS debacle behind it? The future's at least more secure.

The latest release of Red Hat Enterprise Linux version 9 into production marks a significant point in the company’s history. It’s the first version preceded in developmental terms by CentOS Stream, an OS that’s production-ready yet is something of a late-stage testing ground for new-ish features (trialed initially in Fedora) that will eventually percolate to grown-up Red Hat Enterprise Linux.

Red Hat Enterprise Linux 9: Stable anywhere. Available everywhere.


The biggest changes in RHEL 9 are in security and compliance, the latter in particular for so long the ugly step-sister of enterprise Linux, yet increasingly becoming a core pillar on which businesses can operate legally and more securely.

To help companies do more than engage in box-ticking exercises for governance like PCI-DSS compliance, security options now include smartcard authentication, more detailed SSSD logging, use of OpenSSL 3 by default, and disabling of SSH root login with a password by default. Kernel patches can now be applied to servers without rebooting, from a sysadmin's web console, and there are built-in checks against hardware-layer vulnerabilities like Meltdown and Spectre.

There are improvements for Red Hat-flavored containers in Podman, and UBI images have been updated in their standard, micro, mini, and init forms. Container validation is improved, so there’s less danger of time-poor developers pulling rogue containers from spoofed domains.

Red Hat’s official press releases of RHEL version 9 stress the edge capabilities of the OS under the hood, making it easier for organizations to create canonical images that can be rolled out quickly at scale. There’s also a Podman roll-back capability that detects if new containers won’t start and will quietly replace the new with the (working) old.

To developers, of interest are newer versions of Python (3.9) and GCC (11) by default, plus there are the latest versions of Rust and Go. Applications in Flatpaks are fully welcomed (the current vogue for immutable Linux distributions takes another step towards mainstream), but RPMs are clearly not going anywhere just yet.

Red Hat’s other significant turning point is that RHEL 9 might just draw to a close the absolute class-A public relations SNAFU the company presided over when CentOS was discontinued. Or, to be more particular, when it was transitioned from an OS running in parallel with RHEL to a leading-edge, semi-rolling version of the more stable, licensed, production-ready RHEL OS.


The phrase “mis-communication” tends to cover up any number of mistakes in business environments, ranging from a misdirected email, to, in Red Hat’s case, a full-on mishandling of product announcements that had incendiary effects in the business technology community.

But “mis-communications” aside, the Red Hat stable’s lineup appears to be more settled and accepted than a year ago. Registered users can run RHEL on a dozen or so instances without forking out for license fees, and Stream 8 is gradually finding itself in production too. The company’s Matthew Hicks (executive VP for products and technologies) said, “[…] Red Hat Enterprise Linux 9 extends wherever needed […], pairing the trusted backbone of enterprise Linux with the innovative catalysts of open source communities.” The community’s “innovative catalysts” have only just finished licking the wounds inflicted by a Red Hat marketing division that many would expect to have experienced a few personnel changes of late.

Red Hat Insights Overview

Red Hat Enterprise Linux (RHEL) 9 is now generally available (GA). This release is designed to meet the needs of the hybrid cloud environment, and is ready for you to develop and deploy from the edge to the cloud. It can run your code efficiently whether deployed on physical infrastructure, in a virtual machine, or in containers built from Red Hat Universal Base Images (UBIs).

RHEL 9 can be downloaded for free as part of the Red Hat Developer program subscription. In this article, you'll learn some of the ways that RHEL 9 can improve the developer experience.

Get access to the latest language runtimes and tools

Red Hat Enterprise Linux 9 is built with a number of the latest runtimes and compilers, including GCC 11.2 and updated versions of LLVM (12.0.1), Rust (1.54.0), and Go (1.16.6), enabling developers to modernize their applications.

RHEL 9 ships with updated versions of core developer toolchains such as GCC (11.2), glibc (2.34), and binutils (2.35). The new features in the GCC compiler help users better track code flow, improve debugging options, and write optimized code for efficient hardware usage. The new GCC compiler comes with modifications for C and C++ code compilation, along with new debugging messages for logs. That gives developers a better handle on how their code performs.

With next-generation application streams, developers will have more choices when it comes to versions of popular languages and tools. Red Hat Enterprise Linux 9 improves the application streams experience by providing initial application stream versions that can be installed as RPM packages using the traditional yum install command. Developers can select from multiple versions of user-space components as application streams that are easy to update, providing greater flexibility to customize RHEL for their development environment. Application stream contents also include tools and applications that move very fast and are updated frequently. These application streams, called rolling streams, are fully supported for the full life of RHEL 9.

In the Clouds (E23) | Red Hat Enterprise Linux 9 preview

Red Hat Enterprise Linux 9 extends RHEL 8's module packaging features. With RHEL 9, all packaging methods, such as Red Hat Software Collections, Flatpaks, and traditional RPMs, have been incorporated into application streams, making it easier for developers to use their preferred packages.

Support for newer versions of language runtimes

Python 3.9 gets lifetime support in Red Hat Enterprise Linux 9 and comes with a host of new features, including timezone-aware timestamps, new string prefix and suffix methods, dictionary union operations, high-performance parsers, multiprocessing improvements, and more. These features will help developers modernize their applications easily.
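A few of those Python 3.9 additions in action, runnable on RHEL 9's default python3:

```python
from datetime import datetime
from zoneinfo import ZoneInfo   # timezone support added to the standard library in 3.9

# Timezone-aware timestamps without third-party libraries.
release = datetime(2022, 5, 18, 9, 0, tzinfo=ZoneInfo("America/New_York"))
print(release.astimezone(ZoneInfo("UTC")))

# New string prefix/suffix helpers.
package = "rhel9-httpd"
print(package.removeprefix("rhel9-"))   # -> "httpd"
print(package.removesuffix("-httpd"))   # -> "rhel9"

# Dictionary union operator.
defaults = {"workers": 4, "timeout": 30}
overrides = {"timeout": 60}
print(defaults | overrides)             # -> {'workers': 4, 'timeout': 60}
```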

Node.js 16 provides changes that include an upgrade of the V8 engine to version 9.2, a new Timer Promises API, a new experimental web streams API, and support for npm package manager version 7.20.3. Node.js is now compatible with OpenSSL 3.0.

Ruby 3.0.2 provides several performance improvements, along with bug and security fixes. Some of the important improvements include concurrency and parallelism, static analysis, pattern matching with case/in expressions, redesigned one-line pattern matching, and find pattern matching.

Perl 5.32 provides a number of bug fixes and enhancements, including Unicode version 13, a new experimental infix operator, faster feature checks, and more.

PHP 8.0 provides several bug fixes and enhancements, such as structured metadata syntax (attributes), named arguments that are order-independent, improved performance through Just-In-Time compilation, and more.

High Performance Computing (HPC) with Red Hat


Build Red Hat Enterprise Linux images for development and testing

Image builder is a tool that allows users to create custom RHEL system images in a variety of formats for major and minor releases. These images are compatible with major cloud providers and virtualization technologies popular in the market. This enables users to quickly spin up customized RHEL development environments on local, on-premise, or cloud platforms.

With image builder, custom filesystem configurations can be specified in blueprints to create images with a specific disk layout, instead of using the default layout configuration.

Image builder can also be used to create bootable ISO installer images. These images contain a tarball of the root filesystem that you can install directly onto a bare-metal server, which is ideal for bringing up test hardware for edge deployments.

Monitor and maintain Red Hat Enterprise Linux environments

The Red Hat Enterprise Linux 9 web console has an enhanced performance metrics page that helps identify potential causes of high CPU, memory, disk, and network resource usage spikes. In addition, subsystem metrics can be easily exported to a Grafana server.

RHEL 9 also now supports kernel live patching via the web console. The latest critical kernel security patches and updates can be applied immediately without any need for scheduled downtime, and without disrupting ongoing development or production applications.

Build containers with Universal Base Images

Red Hat Enterprise Linux 9 ships with control groups version 2 (cgroup v2) enabled by default and a recent release of Podman with improved defaults. Signature and container short-name validation are enabled by default, and containerized applications can be tested on the out-of-the-box RHEL 9 configuration.

The RHEL 9 UBI is available in standard, micro, minimal or init image configurations, which range in size from as small as 7.5MB up to 80MB. Learn more about how to build, run, and manage containers.

Identity and security

With Red Hat Enterprise Linux 9, root user authentication with a password over SSH has been disabled by default. The OpenSSH default configuration disallows root user login with a password, thereby preventing attackers from gaining access through brute-force password attacks. Instead of using the root password, developers can access remote development environments using SSH keys to log in.
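The same principle applies to scripted access: authenticate as an unprivileged user with an SSH key pair instead of a root password. A small sketch using the third-party paramiko library, with the hostname, user, and key path as placeholders:

```python
# pip install paramiko
import paramiko

HOST = "rhel9.example.com"                     # placeholder hostname
USER = "developer"                             # unprivileged account, not root
KEY_FILE = "/home/developer/.ssh/id_ed25519"   # private key path, placeholder

client = paramiko.SSHClient()
client.load_system_host_keys()
# Reject unknown host keys instead of blindly trusting them.
client.set_missing_host_key_policy(paramiko.RejectPolicy())

client.connect(HOST, username=USER, key_filename=KEY_FILE)
stdin, stdout, stderr = client.exec_command("cat /etc/redhat-release")
print(stdout.read().decode().strip())
client.close()
```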

OpenSSL 3.0 adds a provider concept, a new versioning scheme, and an improved HTTP(S) client. Providers are collections of algorithm implementations, and applications can load the providers they need based on their requirements. Built-in RHEL utilities have been recompiled to utilize OpenSSL 3, allowing users to take advantage of new security ciphers for encrypting and protecting information.

We are excited to announce the availability of Red Hat Enterprise Linux 9 (RHEL 9), the latest release of the world’s leading enterprise Linux platform. RHEL 9 provides a more flexible and stable foundation to support hybrid cloud innovation and a faster, more consistent experience for deploying applications and critical workloads across physical, virtual, private and public cloud and edge deployments.

Proactive Threat Hunting in Red Hat Environments With CrowdStrike


What’s new?

RHEL 9 includes features and enhancements to help achieve long-term IT success by using a common, flexible foundation to support innovation and accelerate time to market.

Primary features and benefits

Here are a few highlights of what’s included in RHEL 9.

A new platform for developers today and in the future

- Completing the migration to Python 3, version 3.9 will be the default Python for the life of RHEL 9. Python 3.9 brings several new enhancements, including timezone-aware timestamps, the new string prefix and suffix methods, and dictionary union operations, to help developers modernize existing apps.

- RHEL 9 is also built with GCC 11 and the latest versions of LLVM, Rust and Go compilers. RHEL 9 is based on glibc 2.34 for 10+ years of enterprise-class platform stability.

- And finally, for the first time in RHEL, Link Time Optimization (LTO) will be enabled by default in userspace for deeper optimization of application code to help build smaller, more efficient executables.

Easy contribution path to future versions of RHEL

Organizations can now develop, test and contribute to a continuously-delivered distribution that tracks just ahead of RHEL. CentOS Stream, an upstream open source development platform, provides a seamless contribution path to the next minor release. RHEL 9 is the first RHEL major release built from CentOS Stream, and the RHEL 9 Beta was first available as CentOS Stream 9. All future RHEL 9 releases will be built from CentOS Stream.

Next-generation application streams

Building on the introduction of application streams and module packaging in RHEL 8, all packaging methods in RHEL 9 are incorporated into application streams, including modules, SCLs, Flatpaks and traditional RPMs, making them much easier to use.

Continuing commitment to multiple architecture support

Open source software gives users greater control over their digital future by preventing workloads from being locked into a specific vendor. RHEL extends this control beyond the source code by enabling diverse CPU architectures for users that need an evolving business environment. Whether you're running your workload on x86_64, aarch64, IBM POWER9, Power10, or IBM Z, we have you covered.

Container improvements

If you're building applications with universal base image (UBI) container images, you'll want to check out the RHEL 9 UBI images. The standard UBI image is available, as are micro, minimal and the init image. To get the entire experience, test the UBI images on a fully subscribed RHEL 9 container host, allowing you to pull additional RPMs from the RHEL 9 repositories.

Ansible 101: An introduction to Automating Everything with Red Hat Training.

RHEL for edge

RHEL 9 introduces automatic container updates and rollbacks, which expands the capacity to update container images automatically. Podman can now detect if an updated container fails to start and automatically roll the configuration back. Together with existing OS-level rollbacks, this provides new levels of reliability for applications.
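A rough sketch of how that workflow is usually wired together, driven from Python here only to stay consistent with the other examples; the image name is a placeholder, and auto-update/rollback defaults differ between Podman releases, so check podman auto-update --help on your host.

```python
import subprocess

IMAGE = "registry.example.com/acme/edge-app:latest"   # placeholder image

def run(cmd):
    """Print and execute a command, raising on failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Label the container so Podman knows it is eligible for auto-updates.
run([
    "podman", "run", "-d", "--name", "edge-app",
    "--label", "io.containers.autoupdate=registry",
    IMAGE,
])

# 2. Generate a systemd unit so the container is managed (and restartable)
#    by systemd, which auto-update relies on.
run(["podman", "generate", "systemd", "--new", "--files", "--name", "edge-app"])

# 3. Later (usually from a systemd timer): pull newer images and restart.
#    Recent Podman versions roll back automatically if the new container
#    fails to start; older versions expose this via --rollback.
run(["podman", "auto-update"])
```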

Image Builder as-a-Service

Enhancements to Image Builder in RHEL 9 help organizations save time and drive system consistency at scale. With the new Image Builder as-a-Service, organizations can now build a standardized and optimized operating system image through our hosted service and deploy it to a cloud provider of choice.

Identity and security

New capabilities added to RHEL 9 help simplify how organizations manage security and compliance when deploying new systems or managing existing infrastructure. RHEL 9 now offers Integrity Measurement Architecture (IMA) to dynamically verify the integrity of the OS to detect if it has been compromised. RHEL 9 has also been enhanced to include digital signatures and hashes that help organizations detect rogue modifications across the infrastructure.

Automation and management

Organizations now have access to the enhanced performance metrics page in the RHEL 9 web console to help identify potential causes of high CPU, memory, disk and network resource usage spikes. In addition, customers can more easily export metrics to a Grafana server. Kernel live patch management is also available via the web console to significantly reduce the complexity of performing critical maintenance. The console also adds a simplified interface for applying kernel updates without using command line tooling. 

Why You Should Migrate to Red Hat Linux from CentOS

Predictive analytics

Red Hat Insights now encompasses Resource Optimization, which enables right-sizing RHEL in the public cloud. Resource Optimization does this by evaluating performance metrics to identify workload utilization. Insights then provides visibility and recommendations for optimizing to a more suitable instance for the workload needs. Insights also adds Malware Detection, a security assessment that analyzes RHEL systems across the enterprise for known malware signatures and provides detailed visibility into the risk.

Red Hat Enterprise Linux (RHEL) has been the Linux for business for a generation now. Today, RHEL touches more than $13 trillion of the global economy. Remember when people used to think Linux couldn't handle big business? Ha! With the release of RHEL 9 at the Red Hat Summit in Boston, Red Hat improved its offerings from the open hybrid cloud to bare metal servers to cloud providers and the farthest edge of enterprise networks. 

Customers want better security, and RHEL 9 delivers it. Beyond the usual RHEL hardening, testing, and vulnerability scanning, RHEL 9 incorporates features that help address hardware-level security vulnerabilities like Spectre and Meltdown. This includes capabilities to help user-space processes create memory areas that are inaccessible to potentially malicious code. The platform provides readiness for customer security requirements as well, supporting PCI-DSS, HIPAA, and more.

Specific security features:

- Smart Card authentication: Users can make use of smart card authentication to access remote hosts through the RHEL web console (Sudo, SSH, etc.).

- Additional security profiles: You can improve your security intelligence gathering and remediation services such as Red Hat Insights and Red Hat Satellite with security standards such as PCI-DSS and HIPAA.

- Detailed SSSD logging: SSSD, the enterprise single-sign-on framework, now includes more details for event logging. This includes time to complete tasks, errors, authentication flow, and more. New search capabilities also enable you to analyze performance and configuration issues.

- Integrated OpenSSL 3: RHEL 9 supports the new OpenSSL 3 cryptographic framework, and RHEL's built-in utilities have been recompiled to utilize it.

- SSH root password login disabled by default: Yes, I know you ssh into your server with root passwords all the time. But it's never been a smart idea. By default, RHEL won't let you do this. Yes, this is annoying, but it's even more annoying to hackers trying to log in as `root` using brute force password attacks. All-in-all, this is a win in my book.

In this release, Red Hat also introduces Integrity Measurement Architecture (IMA) digital hashes and signatures. With IMA, users can verify the integrity of the operating system with digital signatures and hashes. With this, you can detect rogue infrastructure modifications, so you can stop system compromises in their tracks.

MEC: Multi-access Edge Computing: state of play from ETSI MEC and network automation perspectives

Red Hat is also adopting, via Kubernetes, Sigstore for signing artifacts and verifying signatures. Sigstore is a free software signing service that improves software supply chain security by making it easy to sign release files, container images, and binaries cryptographically. Once signed, the signing record is kept in a tamper-proof public log. Sigstore will be free to use by all developers and software providers. This gives software artifacts a safer chain of custody that can be secured and traced back to their source. Looking ahead, Red Hat will adopt Sigstore in OpenShift, Podman, and other container technologies.

This release has many new edge features. These include:

- Comprehensive edge management, delivered as a service, to oversee and scale remote deployments with greater control and security functionality, encompassing zero-touch provisioning, system health visibility and more responsive vulnerability mitigations all from a single interface.

- Automatic container roll-back with Podman, RHEL's integrated container management technology. This automatically detects if a newly-updated container fails to start. In this case, it then rolls the container back to the previous working version.

- The new RHEL also includes an expanded set of RHEL Roles. These enable you to create specific system configurations automatically. So, for instance, if you need RHEL set up just for Postfix, high-availability clusters, firewall, Microsoft SQL Server, or a web console, you're covered.

- Besides roles, RHEL 9 makes it easier to build new images: you can build RHEL 8 and RHEL 9 images via a single build node. It also includes better support for customized file systems (non-LVM mount points) and bare-metal deployments.

- If you're building Universal Base Image (UBI) containers, you can create them not only with standard UBI images but with micro, minimal, and init images as well. You'll need a fully subscribed RHEL 9 container host to do this. This enables you to pull additional RPMs from the RHEL 9 repositories.

- RHEL now uses cgroup2 containers by default: Podman, Red Hat's drop-in daemonless container engine replacement for Docker, uses signature and short-name (e.g., ubi8 instead of registry.access.redhat.com/ubi8/ubi) validation by default when pulling container images. 

- And, of course, Red Hat being Red Hat, RHEL 9 ships with GCC 11 and the latest versions of the LLVM, Rust, and Go compilers. Python 3.9 is RHEL 9's default version of Python.

Thinking of the console, the new RHEL also supports kernel live patching from the console. With this, you can apply patches across large, distributed system deployments without having to write a shell program. And, since it's live patching, your RHEL instances can keep running even as they're being patched.

Put it all together, and you get a solid business Linux for any purpose. Usually, we wait before moving from one major release to another. This time you may want to go ahead and jump to RHEL 9 sooner than later. 

More Information:

https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux/try-it

https://developers.redhat.com/articles/2022/05/18/whats-new-red-hat-enterprise-linux-9

https://www.redhat.com/en/blog/hot-presses-red-hat-enterprise-linux-9

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9

https://www.zdnet.com/article/red-hat-enterprise-linux-9-security-baked-in/

https://www.unixsysadmin.com/rhel-9-resources/


IBM Hybrid Multi-Cloud Strategy


 


Hybrid MultiCloud

IBM Hybrid MultiCloud Strategy

From time to time, we invite industry thought leaders to share their opinions and insights on current technology trends to the IBM Systems IT Infrastructure blog. The opinions in these posts are their own, and do not necessarily reflect the views of IBM.

New technologies breed new buzzwords and terminology, and sometimes it can be difficult to keep up with what it all means. For example, I’m sure you’ve heard the term “hybrid multicloud,” but have you ever really stopped to think about what it means and what it implies for IT in your organizations?

Developing Secure Multi-Cloud Kubernetes Applications


What does it mean?

First let’s take a moment to break down the term Hybrid Multicloud.

Hybrid implies something heterogeneous in origin or composition. In other words, it is something that is composed of multiple other things. Multicloud is pretty simple, and refers to using more than one cloud computing service.

So, when you use the term “hybrid” in conjunction with “multicloud,” it implies an IT infrastructure that uses a mix of on premises and/or private / public cloud from multiple providers.

This is a sensible approach for many organizations because it enables you to maintain and benefit from the systems and data that you have built over time. And, to couple it with current best practices for reducing cost and scaling with cloud services where and when it makes sense.

No one single system or technology is the right solution for every project. No matter what the prognosticators are saying, we will not be moving everything to the cloud and abandoning every enterprise computing system we ever built in the past. But the cloud offers economies of scale and flexibility that make it a great addition to the overall IT infrastructure for companies of all sizes.

With a hybrid multicloud approach, you can choose what makes sense for each component, task, and project that you tackle. Maintain existing platforms to benefit from their rich heritage and integrate them with new capabilities and techniques when appropriate.

Another way of saying this is that you utilize the appropriate platform and technology for the task at hand.

The mainframe is a vital component

For large enterprises, the mainframe has been a vital cog in their IT infrastructure for more than 50 years. Mainframes continue to drive a significant portion of mission critical workload for big business.

Mainframes house more of the world’s structured enterprise data than any other platform. A large percentage of all enterprise transactions run on or interact with the mainframe to conduct business. The mainframe is used by 44 out of the top 50 worldwide banks, 10 out of the top 10 insurers and 18 out of the top 25 retailers.[1]

Clearly the mainframe continues to be an important platform for hosting and developing critical business applications. As such, it is a critical component that should be considered for enterprise hybrid multicloud implementations.

Application-Level Data Protection on Kubernetes

Challenges of system change

As we embark on our hybrid multicloud journey, we must embrace the challenges that are involved in integrating, managing, and utilizing a complex heterogeneous system of different platforms and technologies.

The goal is to bring order, control and insight to disparate environments. This means building resiliency and business continuity into the applications and systems. An outage anywhere in the hybrid multicloud should not cause transactions and business to cease operating.

Furthermore, security and data protection must be part of your strategy. Your customers do not care about the technology you use–they expect to be able to access your systems easily and for their data to be protected. Furthermore, with regulations like HIPAA, PCI-DSS, GDPR and more, your hybrid multicloud must be secure.

It is also challenging to manage systems that rely on multiple cloud service providers. Each provider will have different configuration and security requirements, along with separate development and deployment techniques and requirements.

And let’s not forget that we are integrating many disparate components in a hybrid multicloud infrastructure, not just cloud providers. These are typically implemented, managed, and monitored in different ways, using different technologies. It is imperative that you build and acquire management solutions that can be used to manage and orchestrate the activities and projects across your environment with minimal disruption.

A rigorous plan for choosing multicloud management solutions that understand the cloud providers and on-premises technology that you use can be the difference between success and failure. Plan wisely!

The bottom line

Tackling modern technology is not as simple as “throw out the old and bring in the new.” You have to integrate the old and the new in order to continue to build business value. That means adopting a hybrid multicloud approach. This can deliver the most value to your organization, but it also requires being cognizant of the challenges and making plans to overcome them for your business.

To learn more about IT infrastructure for your hybrid multicloud environment, read this Forrester paper, Assess The Pain-Gain Tradeoff of Multicloud Strategies.

https://www.ibm.com/it-infrastructure/us-en/resources/hybrid-multicloud-infrastructure-strategy/

Multi-Cloud Strategy

What is a Multi-Cloud Strategy?

Why use a Multi-Cloud Strategy?

What are the Benefits of Multi-Cloud Strategy?

How does a Multi-Cloud Strategy enable Digital Transformation?

How do you Develop a Multi-Cloud Strategy?

What are the Key Success Factors for a Multi-Cloud Strategy?

What is a Multi-Cloud Strategy?

A multi-cloud strategy is the utilization of two or more cloud computing services from any number of cloud providers that are compatible with and extend an organization's private cloud capabilities. Generally, this means consuming Infrastructure-as-a-Service (IaaS) offerings provided by more than one cloud vendor as well as by on-premises or private cloud infrastructure.

Many organizations adopt a multi-cloud strategy for redundancy or to prevent vendor lock-in, while others adopt a multi-cloud approach for the best fit for purpose to meet application needs, for example to take advantage of capacity or features available from a particular cloud provider, or to utilize services offered in a particular geography.

Why use a Multi-Cloud Strategy?

Organizations adopt an enterprise multi-cloud strategy for a number of reasons. Utilizing multiple cloud services from a variety of providers offers these advantages, amongst others:

Modernization: As organizations increasingly adopt cloud-native applications based on containers, microservices, and APIs, a multi-cloud strategy gives access to the broadest array of services while composing new applications.

Flexibility and Scalability: Using multiple cloud providers can prevent vendor lock-in, can provide leverage during vendor negotiations, and can also expose the organization to new capabilities unique to a second or third provider. Additionally, as demand varies, multi-cloud providers can support increases or decreases in capacity virtually instantaneously.

Enhance Best Practices: Leverage best practices learned working with one cloud to other public and private clouds.

Regulatory Compliance: Not all cloud providers provide services or store data in every geography. A multi-cloud strategy can help ensure that an organization is in compliance with the broad range of regulatory and governance mandates, such as GDPR in Europe.

Deploy resilient and secure Kubernetes apps across multi-cloud

What are the Benefits of a Multi-Cloud Strategy?

Agility and Choice: Organizations adopting a multi-cloud strategy can support the needs of an entire application portfolio, and overcome challenges of legacy infrastructure and limited in-house capacity to achieve agility and flexibility needed to remain competitive in their markets. A solid multi-cloud strategy enables organizations to methodically migrate workloads and modernize their application portfolio with cloud-specific services best suited for each application.

Utilize Best of Breed Services: Organizations can pick the best cloud platform that offers the best possible technology solution at the most attractive price. Organizations can select from the physical location, database, service level agreement, pricing, and performance characteristics of each provider while crafting an overall cloud solution to meet pressing business needs.

Modernization and Innovation: Modern orchestration tools can automate management of a multi-cloud strategy, including cloud and on-premises workloads. This can free up valuable IT resources to focus on code modernization and innovation based on new services, products, and platforms that become available on a continual basis.

Enhanced Security: Multi-cloud strategies often include adopting a zero-trust approach to cloud security, which can help ensure the security of every cloud transaction and interaction. Although every major cloud provider offers state of the art physical security, logical security remains the responsibility of each organization using cloud providers for their IaaS platforms.

Price Negotiations: Utilizing multiple cloud providers offers pricing leverage to organizations, as providers are increasingly under competitive pressure to offer IaaS services to an increasingly savvy customer base. Organizations can compare different providers to secure the best possible price and payment terms for each contract.

Risk Reduction: Utilizing multiple cloud providers helps protect against infrastructure failure or cyber-attack. Organizations can rapidly failover workloads from one cloud provider to another and fail them back once the problem is solved.

Introduction to VMware Multi-Cloud Architecture and Strategy

How does a Multi-Cloud Strategy enable Digital Transformation?

Digital Transformation is achieved by utilizing applications to deliver services to customers, and to optimize business processes and supply chain operations. As organizations undertake their digital transformation journey, application modernization and a multi-cloud strategy supports the needs of applications – new and old. Digital transformation and application modernization is an ongoing process, not a one-time task, and so new services and products offered by a range of cloud providers will factor into the continuous improvement of enterprise applications as digital transformation evolves into digital maturity.

IT organizations may find that certain workloads perform better on a given platform, while others work better with a service that is uniquely offered by a specific vendor. A multi-cloud strategy enables the development of the best possible platform for a given function.

How do you Develop a Multi-Cloud Strategy?

Organizations should start on their multi-cloud strategy by first taking an assessment of application needs, as well as technical and business requirements – both cloud and on-premises based - to understand the motivation for adopting a multi-cloud strategy. Popular motivators include:

  • Lowering overall infrastructure costs by migration of workloads to the cloud provider with the most aggressive pricing models
  • Speeding application delivery by provisioning development resources when needed
  • Driving IT efficiency by freeing up manpower formerly utilized managing on-premises resources
  • Moving to OpEx from CapEx by eliminating in-house infrastructure entirely.

Once needs are assessed, organizations should plan which cloud services will best fill those needs. A multi-cloud strategy should consider:

  • Existing applications, and whether they currently reside in a cloud provider
  • Unique benefits of each cloud provider and how they map to current needs
  • Overall relationship with existing cloud provider portfolio
  • Whether there are concerns regarding vendor lock-in
  • Strategic or business benefits from a multi-cloud strategy, such as compliance or governance issues that would be solved or addressed.

It is important to consider what roadblocks could impede a multi-cloud strategy. One of the major issues is siloed data that is locked into standalone databases, data warehouses, or data lakes with both structured and unstructured data, and block storage used for persistent volumes, all of which can be difficult to migrate. Organizations must also ensure that there is no more than one authoritative instance of any data set; otherwise it will be impossible to determine which copy is the 'source of truth' and which is an echo. Also, different cloud providers have different architectures and constructs that prevent simple migration of workloads, unless there is an abstraction layer that provides a consistent infrastructure environment.

Organizations should plan on implementing a multi-cloud governance strategy to ensure that policies are applied uniformly enterprise-wide and that business units are not utilizing ‘shadow IT’ resources instead of utilizing sanctioned platforms.

In this manner, IT becomes more of a broker than developer, making cloud resources available and applying policies and best practices to ensure that each instance and deployment adhere to defined policies.

A major issue to avoid is utilizing older offerings or platform-as-a-Service (PaaS) when simple compute is required. Although PaaS offers many benefits, most offerings are not easily portable between cloud providers and should be avoided. Since many organizations utilize a multi-cloud strategy as part of an overall modernization effort, PaaS deployments should be migrated to containerized applications which inherently support multi-cloud strategies.

Finally, when selecting services, avoid the need to find the exact perfect match for every application or function. Platforms that meet all the defined needs are all an organization needs; searching for the ultimate cloud provider offering for a given application can lead to adoption of a number of one-off providers when the job could have been done just as well with existing cloud partner offerings. The old adage that ‘99 percent done is done’ should be applied.

Organizations should then develop multi-cloud pilots to gain competency in taking a multi-cloud strategy from planning to execution, including the necessary training and education for all stakeholders on what will change in their day-to-day activities.

Normalizing Multi-cloud Security Notifications

What are the Key Success Factors for a Multi-Cloud Strategy?

Know the Why of multi-cloud. Organizations must keep their objectives top of mind, whether it is modernization, cost savings, reducing vendor lock-in or eliminating on-premises IT infrastructure. This also should include buy-in from all stakeholders including executives.

Keep an eye on costs. Cloud platforms are different. Without an abstraction layer or a way to create consistent operations, the costs of operations, security, and governance can grow with the addition of each cloud.

Plan for needed skills. Multi-cloud adds complexity – perhaps two to three times more than a traditional single-sourced cloud environment. Although management tools can mitigate some of this complexity, new skills will be required to manage a multi-cloud environment and to take advantage of the benefits of cloud-native application strategies. Whether these skills come from training existing teams, hiring from outside, or leveraging integration partners, they will be required to get a multi-cloud strategy off the ground.

Measure Progress. Organization leaders will want to determine if a multi-cloud strategy is achieving its stated goals. Look for ways to measure the payback of this approach, either through return on investment (ROI) or by demonstrating reduced total cost of ownership (TCO) for IT over a given timeframe.

Document and report on outcomes and share the reports with stakeholders to grow confidence in the strategy enterprise-wide.

Think Modernization. If achieving modern, cloud-native operations is a goal, embrace modernization and encourage thinking outside the box as development, DevOps and deployment times all accelerate. Innovation that leads to better employee and customer engagement can pay off in improved revenue and profits, so embrace new methods of interacting such as chatbots and mobile applications.

Mastering the Hybrid Multicloud World

It’s Critical that On Premises and Cloud Work Together

It’s easy to see why today’s organizations are flocking to the cloud. Hyperscalers give software developers access to a wide scope of resources for creating and managing applications. They also enable rapid scaling, and foster innovation by making it easy for developers to incorporate new features. As millions of customers provide feedback, new iterations are constantly being built and deployed. 

For organizations, it makes sense to take advantage of the cloud’s innovations and scaling capabilities by using SaaS applications for standard business functions such as Customer Relationship Management (CRM), office productivity software, and video conferencing. Fifty-four percent of businesses say they have moved on-premises applications to the cloud, and 46% have created applications expressly built for cloud use, according to the IDG 2020 Cloud Computing Survey.

In the multi-tenant public cloud, organizations also avoid the heavy capital expenses of purchasing infrastructure and pay-as-you-go pricing also allows them to avoid spending money on unused capacity.

Still, many organizations prefer to hold the more individualized and sensitive parts of their business processes – applications controlling finance, manufacturing, or supply chain operations, for example – in the data center.  This hybrid cloud model allows IT to focus on hosting internally the services that make the company unique.

Hybrid- and Multi-Cloud by design - IBM Cloud and your journey to Cloud

From Hybrid Cloud to Multicloud

The norm for today's enterprises is multicloud. Fifty-five percent of companies are using at least two public clouds in addition to their own data centers, the IDG survey found.

One reason is that AWS, Microsoft Azure, and the Google Cloud each have different features and pricing structures. IT managers make choices based on the performance and services a platform offers, which vary according to application type. IT leaders also optimize costs by selecting the storage options best suited to their needs.

And because the public cloud is a dynamic environment, with providers continually creating new services, a multicloud strategy allows organizations to avoid vendor lock-in and take advantage of these innovations as they are introduced.

Multi-Cloud Kubernetes Management and Operations


Management and Data Challenges

The sprawling multi-cloud-and-on-premises environment gives IT leaders a wide array of choices for managing resources and data. While having more options is a boon, 46% of technology leaders in the survey said it has also increased management complexity.

IT teams must constantly evaluate the environment and decide where it is best to locate workloads. Some decisions are relatively straightforward. Security or compliance regulations keep certain applications on premises. Another big issue is lag. Kim Stevenson, Senior Vice President and General Manager of NetApp Foundational Data Services, points out that “Some applications don’t tolerate even a nanosecond of latency.”

But for many applications, decisions aren’t so clear-cut. Technology leaders must weigh their options, running calculations to determine the advantages and disadvantages of on premises versus Cloud A or Cloud B.

Sometimes it makes sense to move applications permanently to the cloud. Other times, it may be better to shuttle them between cloud and data center as the organization grows, tries out new services, or responds to changing demands.

“If you’re in retail, you need to do a lot more credit card processing on Black Friday. If you’re an accounting firm, you need to do a lot of tax application processing in the first quarter. At the end of the fiscal year, you may want to tier off older data to object storage,” Stevenson says.

But applications and data don’t always move smoothly between the on-premises environment and the cloud. Inconsistent data formatting can lead to confusion and errors.

For example, dates can be expressed in several different formats, making data containing them difficult to transfer.  Customer records may contain 16 characters in some data stores and 20 in others. If a company moves them from a 20-character to a 16-character format, IT must pause to determine whether any important information will be lost, and if so, what to do about it.
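A hedged illustration of the date problem: before records move between stores, every representation gets normalized to one canonical format. The list of formats below is an example set, not an exhaustive inventory.

```python
from datetime import datetime

# Formats commonly found across source systems (illustrative, not exhaustive).
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y", "%d %b %Y"]

def normalize_date(raw: str) -> str:
    """Return the date in ISO 8601 (YYYY-MM-DD) or raise if unrecognized."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

for value in ["2022-05-18", "18/05/2022", "18 May 2022"]:
    print(value, "->", normalize_date(value))
```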

Because data about application use and costs is scattered across public clouds and the data center, it’s tough for IT to see the big picture. Different clouds use different management tools, making it even harder to have visibility into actual IT resource usage and spend forecast predictability.

Taming Multi-Cloud, Hybrid Cloud, Docker, and Kubernetes

Improving Operations with Unified Management

Today’s technology makes managing the multicloud, hybrid environment much easier. Solutions such as NetApp ONTAP standardize data architecture, so companies can move applications at will and automatically tier off old data to cheaper storage without worrying about quality control. They have strong and consistent security protections surrounding their data wherever it goes.

IT leaders can also see and manage infrastructure both at home and across multiple public clouds – all from one central control plane. A unified management platform also enables cloud features like automation and advanced AI algorithms to be extended to applications in the data center.

“A single management console helps you do two things,” Stevenson says. “It diagnoses problems and shows you where they’re located, and it gives you the tools to solve them.”

Administrators can manage everything with a single toolset, making training easier and avoiding the confusion that can arise when switching among on-premises and public clouds.

Managers can view resources across the entire organization or parse them according to business unit or service type. This unparalleled visibility enables them to avoid guesswork when creating a technology strategy, as well as make informed decisions based on reliable and timely operational data.

The state of the cloud csa survey webinar

Businesses can also increase agility by scaling compute and storage resources separately, helping them respond better to shifting workloads and customer demands. Remote teams can collaborate seamlessly using data from both on-premises storage and the cloud.

Making Better Choices

The hybrid, multicloud environment gives companies choices, but without a coherent framework, conflicts and inefficiencies are bound to arise.

Today’s technology allows IT leaders to literally see what they’re doing and judge how one move on the chessboard will affect other pieces of the business. Whether they’re building their own private clouds or deploying resources in public ones, they can make sound, data-driven decisions about operations, costs, scaling, and services. By bringing the best capabilities of the cloud to the data center, IT leaders can finally achieve their elusive goal of aligning IT strategy with business strategy.

“The cloud and the on-premises environments will continue to coexist for a long time,” Stevenson says. “Organizations that enable them to work well together will realize the full benefits of both, giving them a competitive edge.”

Multi-Cloud Connectivity and Security Needs of Kubernetes Applications

Application initiatives are driving better business outcomes, an elevated customer experience, innovative digital services, and the anywhere workforce. Organizations surveyed by VMware report that 90% of app initiatives are focused on modernization(1). Using a container-based microservices architecture and Kubernetes, app modernization enables rapid feature releases, higher resiliency, and on-demand scalability. This approach can break apps into thousands of microservices deployed across a heterogeneous and often distributed environment. VMware research also shows 80% of surveyed customers today deploy applications in a distributed model across data center, cloud, and edge(2).

Enterprises are deploying their applications across multiple clusters in the data center and across multiple public or private clouds (as an extension of on-premises infrastructure) to support disaster avoidance, cost reduction, regulatory compliance, and more.

 Applications Deployed in a Distributed Model

Fig 1: Drivers for Multi-Cloud Transformation 


The Challenges in Transitioning to Modern Apps 

While app teams can quickly develop and validate Kubernetes applications in dev environments, a very different set of security, connectivity, and operational considerations awaits networking and operations teams deploying applications to production environments. These teams face new challenges as they transition to production with existing applications — even more so when applications are distributed across multiple infrastructures, clusters, and clouds. Some of these challenges include:

Application connectivity across multi-cluster, multi-cloud, and VM environments 

Application teams developing new applications using a microservices architecture need to be concerned about how to enable connectivity between microservices deployed as containers and distributed across multiple clouds and hybrid environments (data centers and public clouds).

Private cloud in the Hybrid Era

Additionally, some of these application components reside in VM environments. For example, a new eCommerce app designed with a microservices architecture may need to contact a database running in a VMware vSphere environment or in the cloud. The lack of seamless connectivity between these heterogeneous environments (container-based vs. VM-based) is one of the reasons that prevent enterprises from meeting time-to-market requirements and slows down their app modernization initiatives, as they are unable to re-use their existing application components.

Consistent end-to-end security policies and access controls 

With heterogeneous application architectures and infrastructure environments, the trusted perimeter has dissolved, and enterprises are seeing breaches that continue to grow via exploits, vulnerabilities, phishing attacks, and more. Modern applications raise several security challenges, such as how to secure connectivity not only from end-users into Kubernetes clusters, but across clusters, availability zones, and sites and between containerized and virtual machine environments.


Fig 2: Increased Attack Surface 

Teams need to more effectively ensure that users are given the right access permissions to applications; that application components are properly ring-fenced; and that communications across hybrid infrastructures and workloads are secured. Identity based on IP addresses, and intent based on ports, are insufficient for modern applications. What is needed is end-to-end deep visibility from end-users to apps to data, and an extension of the principles of zero trust network access (ZTNA) to these modern applications.

Operational complexity — multiple disjointed products, no end-to-end observability 

The responsibility for secure, highly available production rollouts of Kubernetes falls on application platform teams. However, they are confronted with a vast array of open-source components that must be stitched together to achieve connectivity, availability, security, and observability — including global and local load balancers, ingress controllers, WAF, IPAM, DNS, sidecar proxies, policy frameworks, identity frameworks, and more.


Fig 3: Multiple components need to be managed separately 

Platform teams need a way to centrally control traffic management and security policies across the full application operating environment. They also need a way to gain end-to-end visibility across multiple K8s environments and entire application topologies, including application dependencies, metrics, traces, and logs. The end result of this complexity is usually a compromise of partial visibility, automation, and scalability, one that causes many projects to fail.
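One small piece of that observability puzzle can be sketched with OpenTelemetry: the snippet below (assuming the opentelemetry-sdk package) emits a trace span that a central backend could correlate across clusters. The service name, cluster attribute, and console exporter are stand-ins for whatever a platform team actually runs.

# Minimal sketch: emit a trace span that a central observability backend could
# correlate across clusters. Assumes the `opentelemetry-sdk` package; the console
# exporter stands in for whatever collector a platform team actually operates.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

resource = Resource.create({"service.name": "checkout", "deployment.cluster": "eks-us-east-1"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout.tracer")

def place_order(order_id: str) -> None:
    # Each hop in the application topology would add its own span; the backend
    # stitches them into an end-to-end view of dependencies and latency.
    with tracer.start_as_current_span("place_order") as span:
        span.set_attribute("order.id", order_id)
        # ... call payments, inventory, shipping services here ...

if __name__ == "__main__":
    place_order("A-1001")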

All these challenges and more are driving us to further evolve our networking and security thinking for modern apps. We simply cannot afford to continue to rely solely on the network architectures of the last decade. More versatile and flexible models are needed to address connectivity, security, and operational requirements in this rapidly evolving world.

VMware Modern Apps Connectivity Solution   

VMware is introducing a new solution that brings together the advanced capabilities of Tanzu Service Mesh and VMware NSX Advanced Load Balancer (formerly Avi Networks) to address today’s unique enterprise challenges.

The VMware Modern Apps Connectivity solution offers a rich set of integrated application delivery services through unified policies, monitoring, visualizations, and observability. These services include enterprise-grade L4 load balancing, ingress controller, global load balancing (GSLB), web application security, integrated IPAM and DNS, end-to-end service visibility and encryption, and an extensible policy framework for intelligent traffic management and security. Through the integrated solution, operators can centrally manage end-to-end application traffic routing, resiliency, and security policies via Tanzu Service Mesh.

This solution speeds the path to app modernization with connectivity and better security across hybrid environments and hybrid app architectures. It is built on cloud-native principles and enables a set of important use-cases that automates the process of connecting, observing, scaling, and better-securing applications across multi-site environments and clouds.

VMware Modern Apps Connectivity Solution  

The VMware Modern App Connectivity solution works with VMware Tanzu, Amazon EKS, and upstream Kubernetes today, and is in preview with Red Hat OpenShift, Microsoft Azure AKS, and Google GKE(3). As a leader in delivering the Virtual Cloud Network, VMware understands the challenges of creating operationally simple models for modern app connectivity and security. The solution closes the dev-to-production gap caused by the do-it-yourself approach forced on many networking teams who are under pressure to launch reliable, business-critical services that work consistently across heterogeneous architectures and environments.

More Information:

https://blogs.vmware.com/networkvirtualization/2021/05/multi-cloud-connectivity-security-kubernetes.html/

https://open-security-summit.org/sessions/2021/mini-summits/nov/kubernetes/developing-secure-multi-cloud-applications/

https://www.redhat.com/en/about/press-releases/red-hat-extends-foundation-multicloud-transformation-and-hybrid-innovation-latest-version-red-hat-enterprise-linux

https://www.ibm.com/cloud/blog/distributed-cloud-vs-hybrid-cloud-vs-multicloud-vs-edge-computing-part-1

https://www.ibm.com/blogs/systems/hybrid-multicloud-a-mouthful-but-the-right-approach/

https://www.ibm.com/cloud/architecture/architectures/public-cloud/

 https://www.ibm.com/blogs/think/2019/11/how-a-hybrid-multicloud-strategy-can-overcome-the-cloud-paradox/


“Supercloud” to the Rescue? New Architecture Could Make Cloud Computing More Secure



Super Cloud The New Multi Cloud 


Recent reports show hybrid cloud adoption growth tripled in the last year, and 80 percent of all IT budgets are expected to be committed to cloud solutions in the next 15 months. The message is pretty clear: cloud computing isn’t going anywhere and will only become more widespread.

The same could be said, however, for the hesitations many have about transferring valuable data – everything from employee information to financial records – to cloud-service providers. The high maintenance costs of private data centers are causing many providers to employ distributed cloud computing systems, which can pose reliability and security issues.

Could a “Supercloud” save the day? One team of European researchers thinks so and presents a solution in their IEEE article, “User-Centric Security and Dependability in the Clouds-of-Clouds”.

User-focused and self-managed, the Supercloud is the team’s new vision of security and dependability management for distributed cloud computing.

Challenges of Distributed Cloud Computing Systems

Distributed cloud computing systems are complex and under the complete control of providers. Users have no influence over the security, pricing or reliability of their clouds. Furthermore, the researchers say having one provider host the cloud causes it to be more vulnerable to hacking, and cloud services become less stable if the user isn’t located near one of the provider’s data centers.

The multi-cloud architecture of the Supercloud uses the distributed cloud systems of multiple providers. This better ensures users have their data hosted in the nearest data centers for all providers in the system, not just the nearest data center of a single provider.

As part of the Supercloud’s multi-cloud architecture, a security layer provides separation between the customer’s cloud and the provider-controlled cloud. This layer allows the Supercloud architecture to host user-centric clouds, or “U-Clouds,” which are specifically encrypted for each individual user, whether it is a person or corporation.

U-Clouds can be hosted through the same public or private provider, but are separated from other U-Clouds through U-Cloud boundaries created by the Supercloud security layer. As seen in the figure below, this increases the reliability of each cloud. If a U-Cloud is functioning incorrectly or is infected by a virus, the isolation of each U-Cloud prevents them from affecting others that use the same provider.

Figure 1: Diagram of the Supercloud concept


“This à la carte approach to cloud security enables full protection customizability as the customer can choose which security and availability services to deploy in his or her own cloud,” said Marc Lacoste, a Senior Research Scientist at Orange Labs.
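As a toy illustration of that per-user isolation, the sketch below gives each U-Cloud its own data key, so ciphertext written by one tenant cannot be read by another even when both sit on the same provider. It uses the Python cryptography package; key management, network isolation, and resilience, which the real SUPERCLOUD framework also covers, are out of scope here.

# Toy sketch of per-U-Cloud isolation through per-tenant keys: data written by one
# user cloud cannot be decrypted with another's key, even on the same provider.
# Uses the `cryptography` package; key storage and management are out of scope, and
# the SUPERCLOUD framework itself handles much more (networking, resilience, etc.).
from cryptography.fernet import Fernet, InvalidToken

class UCloud:
    """A minimal stand-in for one user-centric cloud with its own data key."""
    def __init__(self, tenant: str) -> None:
        self.tenant = tenant
        self._key = Fernet.generate_key()   # in reality: user-controlled key management
        self._fernet = Fernet(self._key)

    def store(self, plaintext: bytes) -> bytes:
        return self._fernet.encrypt(plaintext)

    def read(self, ciphertext: bytes) -> bytes:
        return self._fernet.decrypt(ciphertext)

hospital_a = UCloud("hospital-a")
hospital_b = UCloud("hospital-b")

blob = hospital_a.store(b"patient imaging metadata")
print(hospital_a.read(blob))                 # b'patient imaging metadata'
try:
    hospital_b.read(blob)                    # the boundary holds: wrong key
except InvalidToken:
    print("hospital-b cannot read hospital-a's U-Cloud data")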

The Supercloud architecture has four different planes (see below in red) designed to use various security and dependability services maintained by the service providers. These planes would give users control of their personal clouds, rather than surrendering control to providers. While it requires some knowledge from users, they control who can access their data and can easily transfer their data between different providers in the Supercloud architecture.

Figure 2: Sample Supercloud workflow

New Business Opportunities

The researchers believe the design of the Supercloud will create business opportunities for providers and many companies. Any establishment handling sensitive data would have incentive to transition to U-Clouds, since they provide the ability to better protect data from outside parties.

Additionally, the Supercloud architecture would allow cloud providers to offer new services as users would have more trust in cloud computing security. “The Supercloud offers business opportunities across dimensions, out of which cloud brokerage is probably the most immediate,” said Marko Vukolić, a Research Staff Member at IBM Research in Zurich. “Technology developed in the Supercloud allows the creation of value-added services that bring together resources from several, possibly untrusted, cloud providers to give users better service, more security and dependability guarantees.”

Healthcare is one example of an industry that can benefit from the Supercloud. As the volume of diagnostic imaging continues to rise, a cloud solution would keep data storage costs down. The Supercloud could provide a hospital with a secure online archive of all its images and ensure even the hospital’s provider didn’t have access to the data in their private cloud.

Though still in development, the research team has been implementing different components of the Supercloud architecture to achieve integrated proofs of concepts. Beyond healthcare, other predicted applications and business domains include cloud brokerage, blockchain and smart home security.

Amidst the development of cloud computing and the growing concern of data breaches, the Supercloud will soon be able to offer individual consumers and businesses a unique multi-cloud architecture that is more secure and dependable than other cloud computing systems.

Learn more about multi-cloud architecture in IEEE Xplore.

Follow the research team’s progress in developing the Supercloud. See their latest results at https://supercloud-project.eu/.

Supercloud: a new approach to security in a multi-cloud environment with automation

Multicloud is really not about the public clouds it is built on. The use of multiple clouds is both an intentional strategy and the inevitable result of the nature of early cloud adoption. New terms are beginning to emerge to describe what comes next, such as supercloud, distributed cloud, meta cloud, abstract cloud, and cloud-native. The supercloud is an innovation that enables organizations to benefit from “on-demand” security services for multi-cloud environments. Security and dependability of data and services in a multicloud environment are now possible thanks to the supercloud project.

Automation has become a requirement in any field. Data in the cloud cannot be manually manipulated, controlled, and operated; customers expect a seamless experience. Automation eliminates waste with cost policies that alert on cost anomalies and take automated actions on idle and underused resources. The supercloud also improves automation across clouds, application functionality, and the efficiency of certain workflows, such as the ability to conduct analytics without data movement, while reducing the risk of failures or cybersecurity attacks.
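To make the cost-automation point concrete, here is a provider-agnostic sketch of such a policy. The resource records, thresholds, and "stop" action are illustrative and not tied to any vendor's API; a real system would pull utilization from each cloud's monitoring service and call its management API at exactly those points.

# Provider-agnostic sketch of a cost policy: flag resources whose average CPU has
# stayed below a threshold and recommend an automated action. The records, the
# thresholds, and the "stop" action are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class Resource:
    resource_id: str
    cloud: str
    avg_cpu_percent_7d: float
    monthly_cost_usd: float

def idle_resource_actions(resources, cpu_threshold=5.0, min_cost=20.0):
    """Yield (resource, action) pairs for resources that look idle and non-trivial in cost."""
    for r in resources:
        if r.avg_cpu_percent_7d < cpu_threshold and r.monthly_cost_usd >= min_cost:
            yield r, "stop"            # or "resize", "notify-owner", depending on policy

inventory = [
    Resource("vm-001", "aws",   1.2, 310.0),
    Resource("vm-002", "azure", 47.8, 150.0),
    Resource("vm-003", "gcp",   0.4,  95.0),
]

for resource, action in idle_resource_actions(inventory):
    print(f"{resource.cloud}:{resource.resource_id} -> {action} "
          f"(saves ~${resource.monthly_cost_usd:.0f}/month)")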

From Multicloud to Supercloud:

The meta cloud concept will be the single focus for the next 5 to 10 years as we begin to put public clouds to work. Having a collection of cloud services managed with abstraction and automation is much more valuable than attempting to leverage each public cloud provider on its terms rather than yours. It provides a decentralized alternative to the traditional cloud computing paradigm.

Through abstract interfaces, users access specific services from public cloud providers, such as storage, compute, artificial intelligence, and data. This enables freedom from the barriers set by current public or private clouds, combining the deployment security found in private clouds with the scaling flexibility of public clouds.

Cloud-spanning technology allows us to use those services more effectively. It is a type of cloud delivery model in which an application is deployed and executed over multiple simultaneous cloud platforms and infrastructure. And it enables a cloud application to distribute its computations and components across one or more cloud environments. It also reduces vendor lock-in by combining several cloud solutions to form an enterprise cloud solution.

Breaking Analysis: Supercloud is becoming a thing


However, managing a multi-cloud environment can be complex, challenging, and expensive without proper planning and execution. A meta cloud removes the complexity that multicloud brings these days. This user-centric architecture enables the user to choose, autonomously and on-demand, their protection requirements and the security services necessary to guarantee them. The medical field is far from being the only area that could benefit from this new security model in multi-cloud environments: supercloud uses are virtually limitless.

Datacenter providers, such as CoreSite, automate access from primarily enterprise customers to multiple clouds; this translates into better speeds and control for customers, more responsiveness, and an overall better customer experience. These solutions can be leveraged by similar companies that operate within their vertical markets, and are often made available as free, open-source tools. The goal is to deliver value above and beyond what’s currently available within an existing cloud platform while making it easier for teams to get what they need out of these powerful resources.

Dell Technologies is using multi-cloud strategies to place its workloads and application development where they would be most effective.

Dell Technologies has announced multi-cloud capabilities that provide a uniform experience regardless of where programs and data are stored. Dell infrastructure also adds services and tools that assist developers in picking the ideal cloud environment while maintaining the security, support, and predictable pricing of Dell infrastructure.

The multi-cloud framework has enabled most enterprises to make use of the cloud’s benefits without being bound to a single provider while keeping prices reasonable. While multi-cloud gives them more freedom in managing their workflows and other tasks, there is no disputing that its appealing pricing is a big reason why companies choose it. Although choosing multi-cloud is less expensive, the complexity that comes with this option is something most firms only realize after making the decision. Multi-cloud has numerous benefits and may be a game-changer for companies on their transformation journey. Getting all of the data and workloads housed on various servers, however, is a challenge.

Operational complexity, inconsistency, and various siloed IT environments are all common trade-offs in the multi-cloud paradigm. These issues jeopardize some of the cloud’s primary benefits, posing efficiency, financial, security, and competitive threats. As a result, IT organizations want solutions that enable them to offset these trade-offs to provide a genuinely integrated experience across all of their environments, allowing them to pick where and how they operate based on what is best for the company rather than the operational location.

“As data becomes more spread across on-premises and colocation data centers, various public clouds and edge settings, today’s multi-cloud reality is difficult,” said Jeff Boudreau, president of the Infrastructure Solutions Group, Dell Technologies. “We have the industry’s widest technology portfolio, consistent tools, expertise developing open ecosystems, and industry-leading data storage capabilities, services, and supply chain. All of this puts Dell in a unique position to assist clients to manage their multi-cloud strategy.”

Michael Dell, CEO of Dell Technologies, stated in his keynote talk at Dell Technologies World 2022 (DTW) that the argument between on-prem and off-prem is over. Multi-cloud, he believes, is the way of the future, with workloads and data moving effortlessly across the whole ecosystem. According to Dell, 90 percent of customers now have both on-premises and public cloud environments. And seventy-five percent use three or more distinct clouds.

Organizations require solutions to address the challenges of different environments and organizational requirements. Here are three things to think about if you want to maximize the benefits of multi-cloud:

Consider going beyond public clouds to bring the cloud’s speed and agility to any location

Organizations are increasingly turning to cloud and services-based infrastructure because of the speed and agility it provides in meeting changing business demands and responding rapidly to change. Furthermore, many are managing both on-premises and cloud systems, with an increasing number of applications in co-located or edge locations. Organizations require the capacity to swiftly grow to meet demand and aim to enhance their operational flexibility by employing pay-as-you-go services so they can pivot quickly in response to rapidly changing company demands or market shifts. Most crucially, they require adaptability regardless of location, whether in a private or public cloud, a co-lo, or at the edge. Many businesses see multi-cloud as a way to take advantage of new technologies.

With a consistent toolkit, you can manage the complexity of multi-cloud

The majority of businesses currently use several public and private cloud platforms, with 92 percent claiming to have a multi-cloud strategy and 82 percent using a hybrid cloud. While this might lead to more agile and speedier development, it can also lead to more complexity. As a result, businesses may face a slew of issues, including policy enforcement, security, compliance, cost management, and maintaining service levels. With multiple platforms to handle, organizations confront the problems of managing and operating different systems, as well as keeping specialist personnel trained and up to date. When all control planes and toolsets are centralized, teams only have to learn one tool and apply it everywhere, reducing complexity and operational strain.

IBM Cloud Pak for Data and IBM Performance Server: The Next Generation Netezza

Defend against single points of failure in the future

As more businesses rely on the public cloud, the risk of unplanned events—outages, natural disasters, criminal actors, or just plain old human error—increases. Protecting against single points of failure is more important than ever. Organizations desire the freedom and flexibility to run their workloads wherever they choose, which might include using several public or private clouds for redundancy and mobility. This also adds to the protection against vendor lock-in. While some firms have found it straightforward to migrate their data to the cloud, others have found it difficult to repatriate data or relocate to another cloud.

Dell Technologies is also improving software throughout its industry-leading storage portfolio, resulting in improved intelligence, automation, cyber resiliency, and multi-cloud flexibility.

Customers desire the convenience and agility of the cloud experience everywhere they operate, not just in the public cloud, according to Sam Grocott, Senior Vice President of Dell Technologies Business Unit (DTBU) Marketing in a blog post. As a result, the DTW announcements will provide enterprises the option to choose, allowing them to fully realize the multi-cloud potential.

Dell has extended its cooperation with AWS by announcing CyberSense for Dell PowerProtect Cyber Recovery for AWS. Adaptive analytics that scan metadata and whole files, along with machine learning and forensic tools, will enable organizations to find, diagnose, and speed up data recovery. It can also monitor files and databases for signs of a cyberattack and retrieve the last known uncorrupted copy of data for a faster, more secure, and confidential recovery.

DELL TECHNOLOGIES

SUPERCLOUD is an innovation that enables organizations to benefit from “on-demand” security services for multi-cloud environments.

Why SUPERCLOUD?

After adopting the Cloud to increase productivity and agility and reduce operational costs, companies now rely on multiple cloud service providers, five on average according to Gartner (2018); we call this a multi-cloud strategy. This evolution is driven by the desire to no longer depend on a single provider, and also to benefit from the most agile and most innovative solutions on the market, with higher levels of availability. Multi-cloud, however, brings new security issues. In a context where, according to the Going Hybrid study (carried out for NTT Communications in March 2018), 84% of European businesses have adopted a multi-cloud approach, guaranteeing interoperability and flexibility in exploiting data, services and communication, and therefore their security and dependability, has become a major issue. This is the challenge that the SUPERCLOUD project has taken up thanks to the European Union’s largest research and innovation programme, “Horizon 2020”, aimed at stimulating Europe’s economic competitiveness.

“Horizon 2020”: a synergy of skills at the service of innovation

The aim of the actions of “Horizon 2020” is to foster collaboration between public and private sectors so as to develop research and innovation. With its partners from the consortium selected and financed by this programme, Orange is at the origin of a three-year project aiming to create a new and unprecedented secure Cloud infrastructure: the open source SUPERCLOUD framework, presented to the European Commission during its final review on 15th March this year in Brussels.

Tech Data Reboot CP4D Webinar

It is dedicated to the development of “on-demand” security services for multi-cloud environments. The SUPERCLOUD consortium unites nine organisations: industrial partners, research institutes, SMEs, and universities *, coming from six European countries and widely recognised in their respective scientific and technological fields. Each partner contributes its leading expertise, for example data protection for IBM, system security for Darmstadt University, security policies and network security for the Institut Mines Télécom, network virtualisation for the University of Lisbon, or the medical field for pilot projects carried out for Philips Healthcare and Electronics and for Maxdata Software. Furthermore, Technikon managed the administrative coordination of the project. Marc Lacoste, researcher in security at Orange specifies: “Orange’s teams, in synergy with the partners, defined a scientific vision of the project, and piloted it from a technical viewpoint”. Plus, Orange provided its expertise to design and develop the SUPERCLOUD technology through production of several components of the framework, for example those linked to virtualisation for multi-cloud, advanced cryptography for flexible data protection, or supervision of virtualised network security.

A user-centric Cloud or “U-Cloud”

In a multi-cloud environment, the general lack of interoperability and flexibility poses security and dependability problems. Furthermore, as each provider imposes its own security services – the “lock-in” phenomenon – it is difficult to configure them to closely adapt to user needs.

The SUPERCLOUD framework thus proposes a new approach to the management of security and of the availability of multi-cloud environments. This user-centric architecture enables the user to choose, autonomously and on-demand, their protection requirements and the security services necessary to guarantee these. In this way, the user defines “U-Clouds”, or isolated sets of services and data operating in multi-clouds. Their security is ensured thanks to the SUPERCLOUD framework, or security layer, deployed over existing public or private Clouds, separating user Clouds from those of providers.

“The SUPERCLOUD vision is built around four requirements: security must be in self-service mode, i.e. completely at the hand of the user; it must also be guaranteed end-to-end, i.e. transversely to all of the systems; equally, it must be automated, so self-managed; and finally, it must guarantee resilience, meaning resisting failures”.

The Commission’s assessors congratulated the consortium for the progress made, and in particular for the very high scientific and technical level of its solution. Thanks to this experiment, Orange brought to the fore the excellence of its research in the field of security. Notably via the dissemination of over 40 project articles in distinguished international publications, the coordination of a “Vision Paper” published at the IEEE, and even the co-organisation of several workshops such as during the ACM EuroSys 2017 conference.

Cutting-edge Data Analysis, Visualization, and Simulation in the Cloud

SUPERCLOUD for all, an example in the medical imaging field

Who can benefit from this new approach? All Cloud providers and companies that handle sensitive customer data and who nevertheless wish to benefit from the advantages of the Cloud. “The SUPERCLOUD enables freedom from the barriers set by current public or private Clouds, and the combining of the deployment security that one finds in private Clouds with the upscaling flexibility of public Clouds”.

To prove this, the consortium completed, inter alia, a pilot project of a distributed medical imaging platform. Medical imaging is used more and more to perform telemedicine or diagnostic assistance for example. Hospitals store these images in several Clouds, and often need to send this data to each other, in a completely secure manner. To do this, the consortium deployed a distributed platform implementing the SUPERCLOUD framework. Hospitals can thus manage their imaging data exchanges thanks to an infrastructure that is powerful enough to guarantee both data dependability – for it to be accessible from any place at any time, and its security – to prevent it from being taken or used by unauthorised persons.

The medical field is far from being the only area that could benefit from this new security model in multi-cloud environments: “SUPERCLOUD uses are virtually limitless. As the framework is open, it can easily be used by all professions with specific security needs, like those of the financial sector for example, or the automobile industry, and of course all Cloud operators themselves”.

Orange is a major player in network security performance research, serving the development of emerging technologies that will be included in the innovative uses of tomorrow. To achieve this, the company initiates and carries out pioneering scientific research, teaming up with the best researchers in their fields. Its major participation in the SUPERCLOUD project is a superb demonstration of this.

Its a Cloud.. its a SuperComputer.. no, its SuperCloud!

The rise of the supercloud

Last week’s AWS re:Invent conference underscored the degree to which cloud computing generally and Amazon Web Services Inc. specifically have disrupted the technology landscape. From making infrastructure deployment simpler to accelerating the pace of innovation to the formation of the world’s most active and vibrant technology ecosystem, it’s clear that AWS has been the No. 1 force for industry change in the last decade.

Going forward, we see three high-level contributions from AWS that will drive the next 10 years of innovation: 1) the degree to which data will play a defining role in determining winners and losers; 2) the knowledge assimilation effect of AWS’ cultural processes such as two-pizza teams, customer obsession and working backwards; and 3) the rise of superclouds – that is, clouds built on top of hyperscale infrastructure that focus not only on information technology transformation, but on deeper business integration and digital transformation of entire industries.

In this Breaking Analysis, we’ll review some of the takeaways from the 10th annual AWS re:Invent conference and focus on how we see the rise of superclouds will have big impacts on the future of virtually all industries.

It’s happening

AWS re:Invent 2021 was the most important hybrid tech event of the year. No one really knew what the crowd would be like, but well over 20,000 people came to re:Invent, and probably an additional 5,000 to 10,000 folks came without badges to have meetings and do networking off the expo floor. So somewhere well north of 25,000 people physically attended the event, with 200,000-plus more online — huge for this year.

One of the most telling moments at re:Invent was a conversation with Steve Mullaney, chief executive of networking company Aviatrix Systems Inc. Just before we went on theCUBE, Nick Sturiale, managing partner at Ignition Partners, one of Aviatrix’ venture capital backers, looked at Steve and said: “It’s happening.”

What Sturiale meant by “It’s happening” is that the next era of cloud innovation is here and is beginning in earnest. The cloud is expanding out to the edge: AWS is bringing its operating model, application programming interfaces, primitives and services to more and more locations. Companies such as Aviatrix, and many others, are building capabilities on top of the cloud that don’t exist from the cloud providers today. And their strategy is to move fast in their respective domains to stay ahead and add value.

Yes, data and machine learning are critical – we talk about that all the time — but the ecosystem flywheel was so evident at this year’s re:Invent. Partners were charged up. There wasn’t nearly as much chatter about AWS competing with them. Rather, there was much more excitement around the value these partners are creating on top of AWS’ massive platform.

CloudLightning - MultiClouds: Challenges and Current Solutions

Despite aggressive marketing from competitive hyperscalers, other cloud providers and as-a-service on-premises and hybrid offerings, AWS’ lead appears to be accelerating. A notable example is AWS’ efforts around custom silicon. Far more companies, especially independent software vendors, are tapping into AWS’ silicon advancements.

We saw the announcement of Graviton3 and new chips for training and inference. As we’ve reported extensively, AWS is on a curve that will outpace Intel Corp.’s x86 chips in performance, price/performance, cost, power consumption and speed of innovation. And its Nitro platform is giving AWS and its partners the greatest degree of optionality in the industry – from central processing units, graphics processing units, Intel, Advanced Micro Devices Inc., Nvidia Corp. and, very importantly, Arm-based custom silicon springing from AWS’ acquisition of Annapurna Labs.

AWS started its custom silicon journey in 2008 and has invested massive resources into this effort. Other hyperscalers, which have the scale economics to justify such efforts, have only recently announced initiatives in this area. Others that don’t have the scale will rely on third-party silicon providers, a perfectly reasonable strategy.

But because AWS has control of the entire software and hardware stack, we believe it has a strategic advantage in this respect. Silicon especially is a domain where – to quote Amazon.com Inc. CEO and former AWS CEO Andy Jassy – there is no compression algorithm for experience. Being on the curve matters. A lot.

And the biggest trend in our view this past week was the clear emergence of superclouds.

Rise of the supercloud

In his 2020 book “Rise of the Data Cloud,” written with Steve Hamm, Snowflake Inc. CEO Frank Slootman laid out the premise for the emergence of data cloud, a title that we’ve stolen in part for this Breaking Analysis — thank you, Frank Slootman. In his book, he made a case for companies to put data at the center of their organizations – rather than organizing just around people, for example.

The idea is to create data networks. Although people are of course critical, organizing around data and enabling people to access and share data will lead to the democratization of data and network effects will kick in – kicking off a renaissance in business productivity.

Essentially this is Metcalfe’s Law for data. Bob Metcalfe, the inventor of Ethernet, put forth the premise when we both worked for Pat McGovern at IDG. It states that the value of a network is proportional to the square of the number of its users on the network. Thought of another way, the first connection isn’t so valuable but the billionth is really valuable.

Slootman’s Law, if you will, says the more people that have access to the data (governed of course) and the more data connections that can be shared, the more value will be realized from that data. Exponential value, in fact.
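Written out as formulas (a simple restatement, not from the book), Metcalfe's Law and its data analogue look like this:

% Metcalfe's Law: the value of a network grows roughly with the square of its users,
% because the number of possible pairwise connections is n(n-1)/2.
V_{\text{network}} \;\propto\; \binom{n}{2} \;=\; \frac{n(n-1)}{2} \;\approx\; \frac{n^{2}}{2}

% The data-cloud analogue sketched above: with n governed users able to reach and
% combine d shared datasets, the number of potential user-to-data connections is
V_{\text{data}} \;\propto\; n \cdot d \quad\text{(tending toward } n^{2} \text{ as users also share with each other).}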

What is a supercloud?

Supercloud is a term we first referenced in the post led by John Furrier prior to re:Invent. Supercloud describes an architecture that taps the underlying services and primitives of hyperscale clouds to deliver additional value above and beyond what’s available from public cloud providers. A supercloud delivers capabilities through software, consumed as services, and can run on a single hyperscale cloud or span multiple clouds.

In fact, to the degree that a supercloud can span multiple clouds — and on-premises workloads — and hide the underlying complexity of the infrastructure supporting this work, the more adoption and value will be realized.
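A deliberately oversimplified sketch of that idea appears below: one interface, multiple hyperscaler backends, and placement decided by policy rather than by provider. Every class and method name is invented for illustration; real superclouds add governance, identity, billing, and far more on top of the hyperscaler primitives.

# Highly simplified sketch of the supercloud idea: a single service interface whose
# backends map onto different hyperscalers, so callers never see which cloud serves
# them. All names here are invented for illustration only.
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class AwsBackend(ObjectStore):
    def put(self, key: str, data: bytes) -> None:
        print(f"[aws] put {key}")        # a real backend would call S3 here
    def get(self, key: str) -> bytes:
        print(f"[aws] get {key}")
        return b""

class AzureBackend(ObjectStore):
    def put(self, key: str, data: bytes) -> None:
        print(f"[azure] put {key}")      # a real backend would call Blob Storage here
    def get(self, key: str) -> bytes:
        print(f"[azure] get {key}")
        return b""

class SupercloudStore(ObjectStore):
    """Routes requests by policy (residency, latency, cost) and hides the provider."""
    def __init__(self, backends: dict[str, ObjectStore], placement):
        self._backends = backends
        self._placement = placement      # function: key -> backend name

    def put(self, key: str, data: bytes) -> None:
        self._backends[self._placement(key)].put(key, data)
    def get(self, key: str) -> bytes:
        return self._backends[self._placement(key)].get(key)

store = SupercloudStore(
    {"eu": AzureBackend(), "us": AwsBackend()},
    placement=lambda key: "eu" if key.startswith("eu/") else "us",
)
store.put("eu/patient-42/scan.dcm", b"...")   # lands in the EU-resident backend
store.put("us/orders/1001.json", b"...")      # lands in the US backend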

We’ve listed some examples above of what we consider to be superclouds in the making. Snowflake is one of our favorites to cite and we use it frequently. It’s building a data cloud that spans multiple clouds and supports distributed data, but governs that data centrally – somewhat consistent with the data mesh approach.

Goldman Sachs Group Inc. announced at re:Invent this year a new data management cloud — the “Goldman Sachs Financial Cloud for Data with AWS.” We’ll come back to that later in more detail, but it’s a prime example of an industry supercloud.

Nasdaq CEO Adena Friedman spoke at the day one keynote and talked about the supercloud it’s building. Dish Network Corp. is building a supercloud to power 5G wireless networks. United Airlines Inc. is, at this time, focused on porting applications to AWS as part of its digital transformation, but eventually we predict it will start building out a supercloud travel platform. What was most significant about the United effort is the best practices it’s borrowing from AWS, such as small teams and moving fast. AWS is teaching customers how to build a culture to support the buildout of superclouds.

But many others that we’ve listed above are on a supercloud journey. Just some of the folks we talked to at re:Invent that are building clouds on top of clouds are shown. Cohesity Inc. is building out a data management supercloud focused on data protection and governance. HashiCorp announced its IPO at a $13 billion valuation and is building an IT automation supercloud. So are Databricks Inc., ChaosSearch Inc., Zscaler Inc., which is building a security supercloud, and many others that we spoke with at the event.

Albert Reuther - Enabling Interactive, On-Demand HPC for Rapid Prototyping and ML

Castles in the cloud

We want to take a moment to talk about “Castles in the Cloud.” It’s a premise put forth by Jerry Chen and the team at Greylock. It’s a really important piece of work that is building out a dataset and categorizing the various cloud services to understand better where the cloud giants are investing and where startups can participate, how companies can play in the castles built by the hyperscalers, how they can cross the moats that have been built and where the innovation opportunities exist.

Superclouds are strong examples of companies leveraging the castles and crossing the moats built by hyperscalers.

Only four players have built hyperscale castles

Frequently, we’re challenged about our statements that there are only four hyperscalers – AWS, Microsoft Corp., Google LLC and Alibaba Group Holding Ltd. Although we recognize that certain companies, Oracle Corp. in particular, have done a good job of improving and building out their clouds, we don’t consider companies such as IBM Corp. and other smaller managed service providers to be hyperscalers. One of the main data points we use to defend our thinking is capital investment. There are many other key performance indicators such as size of ecosystem, partner acceleration and enablement and feature sets, but capital expenditure investment is a big factor in our thinking.

Above is a chart from Platformonomics LLC, a firm that’s obsessed with CapEx, showing annual CapEx spend for five cloud companies – Amazon, Google, Microsoft, IBM and Oracle. This data runs through 2019 and we’ve superimposed the direction each company is headed. Amazon spent more than $40 billion on CapEx in 2020 and will spend more than $50 billion this year. Sure, there are warehouses and other capital expenses in those numbers, but the vast majority is spent on building out its cloud infrastructure.

Same with Google and Microsoft. Oracle is at least increasing its CapEx to $4 billion — but it’s de minimis compared with the cloud giants. IBM is headed in the other direction – choosing to invest $34 billion in acquiring Red Hat instead of putting its capital into cloud infrastructure. It’s a reasonable strategy, but it underscores the gap and to us strongly supports the premise.

Update: According to Charles Fitzgerald, author of Platformonomics, 60% to 70% of Google and Microsoft capital spending goes toward data centers. His estimate for Amazon is under 50% — perhaps as low as 30% because of the large investment in warehouses and logistics. These revised assumptions would place each of these cloud providers in the $15 billion annual capex range, still substantially larger than the other firms cited above.  

IaaS revenue as an indicator

Another key metric we track is infrastructure-as-a-service revenue.

Above is an updated chart from the one we showed last month, which at the time excluded Alibaba’s most recent quarter. The change was not material, but the four hyperscalers, which invested more than $100 billion in CapEx last year, will together generate more than $120 billion in revenue this year. And they’re growing at 41% collectively. That is remarkable for such a large base of revenue. And for AWS, the rate of revenue growth is accelerating.

The point is, if you’re going to build a supercloud, why wouldn’t you start by building on top of these platforms (notwithstanding concerns about China with respect to Alibaba)?

How some early superclouds are performing

Supercloud is not a category within the ETR taxonomy. But we can evaluate some of the companies we’ve been following that we see as building superclouds by looking at ETR survey data. The chart above plots Net Score or spending momentum on the vertical axis and Market Share or presence in the ETR data set on the horizontal axis. Most every name on the chart is building a supercloud of some sort.

Let’s start by calling out AWS and Azure. They stand alone as the cloud leaders. You can debate what’s included in Azure and our previous chart on revenue attempts to strip out the Microsoft SaaS business, but this is a customer view. Customers see Microsoft as a cloud leader – which it is – so that’s why its presence is larger than AWS even though its IaaS revenue is significantly smaller. But they both have strong momentum on the vertical axis as shown by that red horizontal line. Remember, anything above that is considered elevated.

Google cloud, as you see, is well behind those two leaders.

Snowflake’s data cloud as supercloud

Look at Snowflake. We realize we repeat this often, but Snowflake continues to hold a Net Score in the mid-to-high 70’s and, at 165 mentions – which you can see in the inserted table – continues to expand its market presence.

Of all the technology companies we track, we feel Snowflake’s vision and execution on its data cloud strategy is the most prominent example of a supercloud. Truly, every tech company should be paying attention to Snowflake’s moves and carving out unique value propositions for their customers by standing on the shoulders of cloud giants (as ChaosSearch CEO Ed Walsh likes to say as his company contemplates its supercloud buildout).

Accelerate and Assure the Adoption of Cloud Data Platforms

On-prem dependency creates a wide range of supercloud maturity

In general, the more cloud-native the firm’s offering, the further along they’ll be toward building a supercloud — and the greater the momentum they’ll have. But typically these firms are coming from a smaller installed base and have foreclosed on the on-premises opportunity (for now).

On the left hand side of the chart above, you can see a number of companies we spoke with that are in various stages of building out their superclouds. Databricks, ThoughtSpot Inc., DataRobot Inc., Zscaler, HashiCorp, Elastic N.V., Confluent Inc. – all above the 40% line. And somewhat below that line but still respectable are those with a significant on-prem presence – VMware Inc. with Tanzu, Cohesity Inc., Rubrik Inc. and Veeam Software Inc. And there are many others that we didn’t necessarily talk to at re:Invent or that don’t show up in the ETR data set.

Cisco, Dell, HPE and IBM

We’ve called out Cisco Systems Inc., Dell Technologies Inc., Hewlett Packard Enterprise Co. and IBM on the chart because they all have large on-premises installed bases and different points of view. To varying degrees they are each building superclouds.

But to be frank, these large companies are first protecting their respective on-prem turf. You can’t blame them. They are all adding as-a-service offerings, which is cloudlike. They will rightly fight hard and compete on their respective portfolios, channels and vastly improved simplicity.

But when speaking to customers at re:Invent – and these are not just startups we talk to, since we’re talking about customers of enterprise tech companies like these – the customers want to build on AWS. They will fully admit they can’t or won’t move everything out of their data centers, but the vast majority of customers we spoke with have much more momentum around moving toward AWS.

Yes, of course there’s some recency bias because we just got back from re:Invent and it’s a conference full of Amazon customers, but the pace of play, the business savvy and the transformative mindset of these customers is obvious. And the numbers we shared earlier simply don’t lie. These customers are among the firms consuming technology and transforming their business at the most rapid pace.

Regarding these four players, they are starting to move in the supercloud direction — but they are late to the party. Nonetheless, a big strategic advantage is that they have more credibility around multicloud than the hyperscalers. But on balance, AWS’ overall lead is accelerating in our opinion; the gap is not closing.

A new breed of tech companies is emerging

In and around 2010 and 2011 we collaborated with two individuals who shaped our thinking about the big-data market. Peter Goldmacher at the time was a sell-side analyst at Cowen & Co. and Abhi Mehta was with Bank of America, transforming the bank’s data operations. Goldmacher said at the time that it was the buyers of big-data technologies – and those that applied it to their operations —  that would create the most value. He posited that they would create far more value than Cloudera Inc. or Hortonworks Inc., for example, and a collection of other big data players. Clearly he was right.

Mehta was a shining example of that premise and he posited on theCUBE that ecosystems would evolve within vertical industries around data — and the confluence of data and technology and machine intelligence would power the next generation of value creation.

Fast forward and apply this thinking to 2021….

Superclouds form around industries and data

Just after the first re:Invent, we published a post on Wikibon about the making of a new gorilla – AWS. And we said the way to compete would be to take an industry focus and become best-of-breed within your industry. We aligned with Mehta’s view that industry ecosystems would evolve around data and offer opportunities for nonhyperscalers to add value. What we didn’t predict at the time – but are seeing clearly emerge – is that these superclouds will be built on top of AWS and other clouds.

Goldman’s Financial Cloud for data is taking a page out of Amazon’s retail business: pointing its proprietary data, algorithms, tools and processes at its clients and making these assets available as a service on top of the AWS cloud. It’s a supercloud for financial services. It is relying on AWS for infrastructure, compute, storage, networking, security and services around machine learning to power its supercloud.

Nasdaq and Dish are similarly bringing forth their unique value. As we said earlier, United Airlines will in our view eventually evolve from migrating its apps to the cloud to building out a supercloud for travel.

This trend is taking shape in virtually every industry and geography and will establish a new breed of disruptive winners. Incumbents that move fast and capitalize on this trend will thrive in our view.

What about your logo? What is your supercloud strategy? We’re sure you’ve been thinking about it. Or perhaps you’re already well down the road. We’d love to hear how you’re doing it and whether you see the trends the same or differently.

Answering the top 10 questions about SuperCloud

More Information:

https://hellofuture.orange.com/en/supercloud-new-approach-security-multi-cloud-environment/

https://siliconangle.com/2021/12/07/the-rise-of-the-supercloud/

https://www.analyticsinsight.net/dell-technologies-future-will-highly-rely-on-its-multi-cloud-services/

https://www.analyticsinsight.net/moving-from-multicloud-to-supercloud-automation-takes-charge/

https://innovate.ieee.org/innovation-spotlight/multi-cloud-architecture-supercloud-u-cloud/


Orchestrated objective reduction (Orch OR)


 

Orchestrated objective reduction (Orch OR) - Theory

The Penrose–Hameroff theory of orchestrated objective reduction (Orch OR) claims that quantum computations in the brain account for consciousness [1]. The communication among neurons by the secretion of neurotransmitters is based on synaptic vesicles distributed along their axons. The neuronal cytoskeleton has a key role in the dynamics of these vesicles. In the 1990s, Stuart Hameroff, anesthesiologist and psychologist at the University of Arizona, Tucson, USA, and Roger Penrose, mathematical physicist at the University of Oxford, Oxford, UK, proposed that microtubules, the smallest units of the cytoskeleton, are channels for the transfer of the integrated quantum information responsible for consciousness.

The cell cytoplasm is like an overcrowded dance floor at a disco. The cytoskeleton strongly interacts with water molecules, metabolites, and moving proteins (like kinesins). These interactions are structural, involved in signaling, and sometimes serve to orient the internal cytoskeleton. There is no known mechanism for protecting microtubules (rigid tubes made of the tubulin protein) from decoherence, the environmentally induced destruction of quantum coherence due to the unavoidable coupling of a quantum system with its environment. Quantum computing requires quantum coherence in order to use superpositions of quantum states to solve certain problems much more quickly than its classical counterpart. Without a protecting mechanism, the role of quantum computation in microtubules in the emergence of consciousness recalls to me water memory, Benveniste’s proposal to explain the mechanism by which homeopathic remedies work [2].

Figure 2. A tentatively proposed picture of a conscious event by quantum computing in one microtubule. | Credit: Hameroff & Penrose (2014).

Hameroff’s ideas in the hands of Penrose have developed almost to absurdity. There is no justification for incorporating into the Orch OR theory of consciousness the Diósi–Penrose scheme for objective reduction of the quantum state [3][4]. The tentative role of gravity in quantum state reduction (the so-called wavefunction collapse), by means of the Schrödinger–Newton equation, only introduces noise into the presentation of the Orch OR theory and distracts from its most important points. I will not discuss here the ideas of Diósi–Penrose explaining quantum measurements by means of the instability of quantum superpositions involving significant mass displacements.

The prevalent scientific view is that consciousness emerged as a property of biological organisms during the course of evolution. It is a beneficial adaptation that confers a survival advantage to conscious species. However, Orch OR theory claims that consciousness is an intrinsic feature of the action of the non-computable universe. Because humans are capable of knowing the truth of Gödel-unprovable statements, the Penrose–Lucas argument states that human thought is necessarily non-computable [5]. However, the computational power of a quantum computer is exactly the same as that of a classical one, as proved in 1985 by Oxford University physicist David Deutsch. Quantum Turing machines are equivalent to (classical) Turing machines, even if certain problems, such as integer factoring, can be solved more efficiently using quantum algorithms. In my opinion, to resort to the ‘magic’ of non-computability is not the best route to a scientific solution of the problem of consciousness.

Microtubules are part of the cytoskeleton of all eukaryotic cells; however, consciousness is the result of neurons in the cerebral cortex. Microtubules are cylindrical polymers 25 nanometers in diameter made of tubulin dimers, composed of alpha and beta monomers arranged in a helical pathway. In 1982, Hameroff and Watt [6] suggested that tubulin dimers act as dipoles representing classical bits of information, so that microtubules act like two-dimensional Boolean switching matrices in a cellular automaton. Early versions of Orch OR theory proposed a quantum version of these ideas: tubulin dimers acting as qubits (quantum bits). A beautiful theory killed by an ugly fact.
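To illustrate only that classical 1982 picture, here is a toy two-dimensional Boolean lattice whose "dipole" sites update from their neighbors. The majority-vote rule is a generic cellular-automaton stand-in for intuition, not Hameroff and Watt's actual model.

# Toy illustration of the classical 1982 picture: a 2D lattice of "dipole" bits that
# update from their neighbors, i.e. a Boolean cellular automaton. This is a generic
# majority-vote rule for intuition only, NOT Hameroff and Watt's actual update rule.
import random

ROWS, COLS = 8, 13   # loosely evoking the 13 protofilaments of a microtubule

def step(lattice):
    """One synchronous update: each site follows the majority of its 4 neighbors."""
    nxt = [[0] * COLS for _ in range(ROWS)]
    for r in range(ROWS):
        for c in range(COLS):
            neighbors = [lattice[(r - 1) % ROWS][c], lattice[(r + 1) % ROWS][c],
                         lattice[r][(c - 1) % COLS], lattice[r][(c + 1) % COLS]]
            total = sum(neighbors)
            nxt[r][c] = 1 if total > 2 else 0 if total < 2 else lattice[r][c]
    return nxt

lattice = [[random.randint(0, 1) for _ in range(COLS)] for _ in range(ROWS)]
for _ in range(5):
    lattice = step(lattice)
    print("".join("↑" if s else "↓" for s in lattice[0]))   # show the first ring each step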

Figure 3. Upward (left) and downward (right) dipole ‘5-start’ helix in a microtubule. Their quantum superposition is the proposed qubit. |Credit: Hameroff & Penrose (2014).

Experimental results by Jeffrey R. Reimers et al. [7], and others [8], have shown that microtubules can neither sustain long-lived quantum states nor support quantum information processing associated with tubulin dimers as qubits. The whole set of original ideas by Hameroff and Penrose has been killed by Nature. There is no quantum coherence over the required time scale. Electronic motion in tubulin dimers is in the range of 10 fs to 30 ps, while Orch OR theory needs quantum coherence on the 25 ms timescale. Without a decoherence protection system, similar to the one used in photosynthesis, quantum computing in microtubules is not plausible.
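For reference, a simple check of that gap using the numbers just quoted (taking the most favorable end of the range) gives almost nine orders of magnitude:

% Gap between measured electronic timescales in tubulin and the coherence Orch OR requires:
\frac{\tau_{\text{required}}}{\tau_{\text{observed}}}
  \;=\; \frac{25\ \text{ms}}{30\ \text{ps}}
  \;=\; \frac{2.5\times 10^{-2}\ \text{s}}{3\times 10^{-11}\ \text{s}}
  \;\approx\; 8\times 10^{8}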

If your theoretical ideas are refuted by experiments, don’t worry, be happy: you only have to change your theory in order to escape the current evidence. Hameroff and Penrose concede that the early version of Orch OR theory, published mainly from 1996 to 1998, is only a schematic cartoon version where tubulin dipoles act as qubits [9]. In an effort to perpetuate the Orch OR model against experimental evidence, the new version of the Orch OR theory [1] uses as qubits the so-called ‘quantum channels’ (helical dipole pathways within the microtubule lattices). The helical pathway, akin to a ‘topological qubit’, was introduced into Orch OR theory in 2002, after the structure of tubulin was elucidated by electron crystallography. Topological quantum computation in the brain appears to be a very suggestive approach to the problem of consciousness. But the analogy between the quantum braids proposed in topological quantum computation in 1997 and the helical pathways of Orch OR is extremely difficult to see for a physicist and computer scientist like me.

In my opinion, the new version of Orch OR theory, using as qubits the mesoscopic helical pathways of many tubulins in microtubule lattices, is even more unbelievable than the early version. Extraordinary claims require extraordinary evidence. Reimers et al. [7] state that Orch OR theory cannot be treated seriously without a precise description of the quantum states of the qubits, how these states become entangled, and a means of achieving quantum coherence over the required time scale.

Figure 4. A 2D sheet of tubulin dimers folds into a single microtubule (top). STM images (bottom) of a single tubulin dimer (left) and a single microtubule (right). | Credit: Satyajit Sahu et al. (2013).

Like the claimed scientific evidence supporting homeopathy, Hameroff and Penrose [1] highlight the apparent quantum coherence of up to 100 microseconds in single microtubules measured at warm temperature by the research group of Anirban Bandyopadhyay at the National Institute for Materials Science in Tsukuba, Japan [10][11]. They claim that these results provide the first experimental validation of the Orch OR theory. However, Bandyopadhyay et al. [12] have measured quantum coherence in microtubule nanowires with and without water inside the channel. The changes in coherence time between the two cases are so small that I suspect they are the result of systematic errors not taken into account. Moreover, the claim of Hameroff and Penrose [1] that the measured coherence time is 250 times briefer than the 25 ms invoked for Orch OR events appears to be pure numerology. Recall that the 25 ms signal corresponds to the 40 Hz gamma waves recorded in neural activity using electroencephalography during some conscious perception experiments. I must recognize that my opinion is biased, but the yet-to-be-reproduced results of Bandyopadhyay’s group remind me of Benveniste’s water memory.
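For reference, the arithmetic behind those figures is easy to reproduce (a simple check, not from the original article):

% The 25 ms figure is just the period of 40 Hz gamma activity:
T_{\gamma} \;=\; \frac{1}{f_{\gamma}} \;=\; \frac{1}{40\ \text{Hz}} \;=\; 25\ \text{ms}
% and dividing it by 250 recovers the reported coherence time:
\frac{25\ \text{ms}}{250} \;=\; 100\ \mu\text{s}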

The crucial validation or falsification of Orch OR theory must come from experimentation. The current “gold standard” in neuroscience is fMRI (functional Magnetic Resonance Imaging), but its spatial and temporal resolutions are not sufficient. Orch OR theory has been criticized repeatedly since its inception. Hameroff and Penrose admit that their proposal, placing the brain’s microtubules at the interface between neurophysiology and quantum gravity, is very speculative. They explicitly write that “the actual mechanisms underlying the production of consciousness in a human brain will be very much more sophisticated than any that we can put forward at the present time, and would be likely to differ in many important respects from any that we would be in a position to anticipate in our current proposals” [1].

Quantum biology is a hot topic, but its role in light harvesting in photosynthesis, magnetoreception, enzyme catalysis, or even DNA mutations, is far removed from its proposed role in Orch OR theory. To be a detailed, testable, falsifiable, and reasonably rigorous approach to a theory of consciousness, a new and mature version of the theory is needed. In my opinion, Orch OR is not a promising route to the nature of consciousness. Life is born out of “warm, wet & noisy” systems. Consciousness is like the Schrödinger’s cat of neuroscience.

Orchestrated objective reduction (Orch OR) is a theory which postulates that consciousness originates at the quantum level inside neurons, rather than the conventional view that it is a product of connections between neurons.

A review and update of a controversial 20-year-old theory of consciousness published in Physics of Life Reviews claims that consciousness derives from deeper level, finer scale activities inside brain neurons. The recent discovery of quantum vibrations in "microtubules" inside brain neurons corroborates this theory, according to review authors Stuart Hameroff and Sir Roger Penrose. They suggest that EEG rhythms (brain waves) also derive from deeper level microtubule vibrations, and that from a practical standpoint, treating brain microtubule vibrations could benefit a host of mental, neurological, and cognitive conditions.

The theory, called "orchestrated objective reduction" ('Orch OR'), was first put forward in the mid-1990s by eminent mathematical physicist Sir Roger Penrose, FRS, Mathematical Institute and Wadham College, University of Oxford, and prominent anesthesiologist Stuart Hameroff, MD, Anesthesiology, Psychology and Center for Consciousness Studies, The University of Arizona, Tucson. They suggested that quantum vibrational computations in microtubules were "orchestrated" ("Orch") by synaptic inputs and memory stored in microtubules, and terminated by Penrose "objective reduction" ('OR'), hence "Orch OR." Microtubules are major components of the cell structural skeleton.

Google Quantum AI Update 2022

Orch OR was harshly criticized from its inception, as the brain was considered too "warm, wet, and noisy" for seemingly delicate quantum processes. However, evidence has now shown warm quantum coherence in plant photosynthesis, bird brain navigation, our sense of smell, and brain microtubules. The recent discovery of warm-temperature quantum vibrations in microtubules inside brain neurons by the research group led by Anirban Bandyopadhyay, PhD, at the National Institute for Materials Science in Tsukuba, Japan (and now at MIT), corroborates the pair's theory and suggests that EEG rhythms also derive from deeper-level microtubule vibrations. In addition, work from the laboratory of Roderick G. Eckenhoff, MD, at the University of Pennsylvania, suggests that anesthesia, which selectively erases consciousness while sparing non-conscious brain activities, acts via microtubules in brain neurons.

"The origin of consciousness reflects our place in the universe, the nature of our existence. Did consciousness evolve from complex computations among brain neurons, as most scientists assert? Or has consciousness, in some sense, been here all along, as spiritual approaches maintain?" ask Hameroff and Penrose in the current review. "This opens a potential Pandora's Box, but our theory accommodates both these views, suggesting consciousness derives from quantum vibrations in microtubules, protein polymers inside brain neurons, which both govern neuronal and synaptic function, and connect brain processes to self-organizing processes in the fine scale, 'proto-conscious' quantum structure of reality."

After 20 years of skeptical criticism, "the evidence now clearly supports Orch OR," continue Hameroff and Penrose. "Our new paper updates the evidence, clarifies Orch OR quantum bits, or "qubits," as helical pathways in microtubule lattices, rebuts critics, and reviews 20 testable predictions of Orch OR published in 1998 -- of these, six are confirmed and none refuted."

An important new facet of the theory is introduced. Microtubule quantum vibrations (e.g. in megahertz) appear to interfere and produce much slower EEG "beat frequencies." Despite a century of clinical use, the underlying origins of EEG rhythms have remained a mystery. Clinical trials of brief brain stimulation aimed at microtubule resonances with megahertz mechanical vibrations using transcranial ultrasound have reported improvements in mood, and such stimulation may prove useful against Alzheimer's disease and brain injury in the future.

Lead author Stuart Hameroff concludes, "Orch OR is the most rigorous, comprehensive and successfully-tested theory of consciousness ever put forth. From a practical standpoint, treating brain microtubule vibrations could benefit a host of mental, neurological, and cognitive conditions."

The review is accompanied by eight commentaries from outside authorities, including an Australian group of Orch OR arch-skeptics. To all, Hameroff and Penrose respond robustly.

Penrose, Hameroff and Bandyopadhyay will explore their theories during a session on "Microtubules and the Big Consciousness Debate" at the Brainstorm Sessions, a public three-day event at the Brakke Grond in Amsterdam, the Netherlands, January 16-18, 2014. They will engage skeptics in a debate on the nature of consciousness, and Bandyopadhyay and his team will couple microtubule vibrations from active neurons to play Indian musical instruments. "Consciousness depends on anharmonic vibrations of microtubules inside neurons, similar to certain kinds of Indian music, but unlike Western music which is harmonic," Hameroff explains.

Orchestrated objective reduction (Orch OR) is a theory which postulates that consciousness originates at the quantum level inside neurons, rather than the conventional view that it is a product of connections between neurons. The mechanism is held to be a quantum process called objective reduction that is orchestrated by cellular structures called microtubules. It is proposed that the theory may answer the hard problem of consciousness and provide a mechanism for free will.[1] The hypothesis was first put forward in the early 1990s by Nobel laureate in physics Roger Penrose, and anaesthesiologist and psychologist Stuart Hameroff. The hypothesis combines approaches from molecular biology, neuroscience, pharmacology, philosophy, quantum information theory, and quantum gravity.[2][3]

While mainstream theories assert that consciousness emerges as the complexity of the computations performed by cerebral neurons increases,[4][5] Orch OR posits that consciousness is based on non-computable quantum processing performed by qubits formed collectively on cellular microtubules, a process significantly amplified in the neurons. The qubits are based on oscillating dipoles forming superposed resonance rings in helical pathways throughout lattices of microtubules. The oscillations are either electric, due to charge separation from London forces, or magnetic, due to electron spin—and possibly also due to nuclear spins (that can remain isolated for longer periods) that occur in gigahertz, megahertz and kilohertz frequency ranges.[2][6] Orchestration refers to the hypothetical process by which connective proteins, such as microtubule-associated proteins (MAPs), influence or orchestrate qubit state reduction by modifying the spacetime-separation of their superimposed states.[7] The latter is based on Penrose's objective-collapse theory for interpreting quantum mechanics, which postulates the existence of an objective threshold governing the collapse of quantum-states, related to the difference of the spacetime curvature of these states in the universe's fine-scale structure.[8]

Three Quantum Principles: the fundamentals of quantum physics and their relation to mind

Orchestrated objective reduction has been criticized from its inception by mathematicians, philosophers,[9][10][11][12][13] and scientists.[14][15][16] The criticism concentrated on three issues: Penrose's interpretation of Gödel's theorem; Penrose's abductive reasoning linking non-computability to quantum events; and the brain's unsuitability to host the quantum phenomena required by the theory, since it is considered too "warm, wet and noisy" to avoid decoherence.

In 1931, mathematician and logician Kurt Gödel proved that any effectively generated theory capable of proving basic arithmetic cannot be both consistent and complete. In other words, a mathematically sound theory lacks the means to prove itself. An analogous statement has been used to show that humans are subject to the same limits as machines.[17] However, in his first book on consciousness, The Emperor's New Mind (1989), Roger Penrose argued that Gödel-unprovable results are provable by human mathematicians.[18] He takes this disparity to mean that human mathematicians are not describable as formal proof systems, and are therefore running a non-computable algorithm.

If correct, the Penrose–Lucas argument leaves the question of the physical basis of non-computable behaviour open. Most physical laws are computable, and thus algorithmic. However, Penrose determined that wave function collapse was a prime candidate for a non-computable process. In quantum mechanics, particles are treated differently from the objects of classical mechanics. Particles are described by wave functions that evolve according to the Schrödinger equation. Non-stationary wave functions are linear combinations of the eigenstates of the system, a phenomenon described by the superposition principle. When a quantum system interacts with a classical system—i.e. when an observable is measured—the system appears to collapse to a random eigenstate of that observable from a classical vantage point.

If collapse is truly random, then no process or algorithm can deterministically predict its outcome. This provided Penrose with a candidate for the physical basis of the non-computable process that he hypothesized to exist in the brain. However, he disliked the random nature of environmentally induced collapse, as randomness was not a promising basis for mathematical understanding. Penrose proposed that isolated systems may still undergo a new form of wave function collapse, which he called objective reduction (OR).[7]

Penrose sought to reconcile general relativity and quantum theory using his own ideas about the possible structure of spacetime.[18][19] He suggested that at the Planck scale curved spacetime is not continuous, but discrete. He further postulated that each separated quantum superposition has its own piece of spacetime curvature, a blister in spacetime. Penrose suggests that gravity exerts a force on these spacetime blisters, which become unstable above the Planck scale of 10⁻³⁵ m and collapse to just one of the possible states. The rough threshold for OR is given by Penrose's indeterminacy principle: τ ≈ ħ/E_G, where τ is the time until OR occurs, E_G is the gravitational self-energy (the degree of spacetime separation given by the superposed mass), and ħ is the reduced Planck constant.
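As a rough illustration of how this threshold works, here is a minimal Python sketch of the τ ≈ ħ/E_G relation. The 25 ms input is chosen only because it is a commonly quoted gamma-synchrony period elsewhere in this text; it is illustrative arithmetic, not a value derived by Penrose or Hameroff.

```python
# Minimal sketch of Penrose's objective-reduction threshold, tau ~ hbar / E_G.
# The numerical inputs below are illustrative only, not values from the theory's authors.

HBAR = 1.054571817e-34  # reduced Planck constant, J*s

def collapse_time(e_gravity_joules: float) -> float:
    """Time until objective reduction for a superposition with
    gravitational self-energy E_G (larger separations collapse sooner)."""
    return HBAR / e_gravity_joules

def required_self_energy(tau_seconds: float) -> float:
    """Gravitational self-energy needed for OR to occur within tau."""
    return HBAR / tau_seconds

if __name__ == "__main__":
    # Example: what E_G would give a 25 ms collapse time (a gamma-synchrony period)?
    e_g = required_self_energy(0.025)
    print(f"E_G for tau = 25 ms: {e_g:.2e} J")
    # And the converse: collapse time for that self-energy, back in milliseconds.
    print(f"tau for that E_G:   {collapse_time(e_g) * 1e3:.1f} ms")
```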

An essential feature of Penrose's theory is that the choice of states when objective reduction occurs is selected neither randomly (as are choices following wave function collapse) nor algorithmically. Rather, states are selected by a "non-computable" influence embedded in the Planck scale of spacetime geometry. Penrose claimed that such information is Platonic, representing pure mathematical truths, which relates to Penrose's ideas concerning the three worlds: the physical, the mental, and the Platonic mathematical world. In Shadows of the Mind (1994), Penrose briefly indicates that this Platonic world could also include aesthetic and ethical values, but he does not commit to this further hypothesis.[19]

The Penrose–Lucas argument was criticized by mathematicians,[20][21][22] computer scientists,[12] and philosophers,[23][24][9][10][11] and the consensus among experts in these fields is that the argument fails,[25][26][27] with different authors attacking different aspects of the argument.[27][28] Minsky argued that because humans can believe false ideas to be true, human mathematical understanding need not be consistent and consciousness may easily have a deterministic basis.[29] Feferman argued that mathematicians do not progress by mechanistic search through proofs, but by trial-and-error reasoning, insight and inspiration, and that machines do not share this approach with humans.[21]

Orch OR

Penrose outlined a predecessor to Orch OR in The Emperor's New Mind, coming to the problem from a mathematical viewpoint and in particular Gödel's theorem, but lacked a detailed proposal for how quantum processes could be implemented in the brain. Stuart Hameroff separately worked in cancer research and anesthesia, which gave him an interest in brain processes. Hameroff read Penrose's book and suggested to him that microtubules within neurons were suitable candidate sites for quantum processing, and ultimately for consciousness.[30][31] Throughout the 1990s, the two collaborated on the Orch OR theory, which Penrose published in Shadows of the Mind (1994).[19]

Hameroff's contribution to the theory derived from his study of the neural cytoskeleton, and particularly on microtubules.[31] As neuroscience has progressed, the role of the cytoskeleton and microtubules has assumed greater importance. In addition to providing structural support, microtubule functions include axoplasmic transport and control of the cell's movement, growth and shape.[31]

Orch OR combines the Penrose–Lucas argument with Hameroff's hypothesis on quantum processing in microtubules. It proposes that when condensates in the brain undergo an objective wave function reduction, their collapse connects noncomputational decision-making to experiences embedded in spacetime's fundamental geometry. The theory further proposes that the microtubules both influence and are influenced by the conventional activity at the synapses between neurons.

Microtubule computation

A: An axon terminal releases neurotransmitters through a synapse, which are received by microtubules in a neuron's dendritic spine.

B: Simulated microtubule tubulins switch states.[1]

Hameroff proposed that microtubules were suitable candidates for quantum processing.[31] Microtubules are made up of tubulin protein subunits. The tubulin protein dimers of the microtubules have hydrophobic pockets that may contain delocalized π electrons. Tubulin has other, smaller non-polar regions, for example 8 tryptophans per tubulin, which contain π electron-rich indole rings distributed throughout tubulin with separations of roughly 2 nm. Hameroff claims that this is close enough for the tubulin π electrons to become quantum entangled.[32] During entanglement, particle states become inseparably correlated. Hameroff originally suggested in the fringe Journal of Cosmology that the tubulin-subunit electrons would form a Bose–Einstein condensate.[33] He then proposed a Fröhlich condensate, a hypothetical coherent oscillation of dipolar molecules. However, this too was rejected by Reimers' group.[34] Hameroff responded to Reimers: "Reimers et al have most definitely NOT shown that strong or coherent Frohlich condensation in microtubules is unfeasible. The model microtubule on which they base their Hamiltonian is not a microtubule structure, but a simple linear chain of oscillators." Hameroff reasoned that such condensate behavior would magnify nanoscopic quantum effects to have large scale influences in the brain.

Hameroff then proposed that condensates in microtubules in one neuron can link with microtubule condensates in other neurons and glial cells via the gap junctions of electrical synapses.[35][36] Hameroff proposed that the gap between the cells is sufficiently small that quantum objects can tunnel across it, allowing them to extend across a large area of the brain. He further postulated that the action of this large-scale quantum activity is the source of 40 Hz gamma waves, building upon the much less controversial theory that gap junctions are related to the gamma oscillation.[37]

Related experimental results

In April 2022, the results of two related experiments were presented at The Science of Consciousness conference. In a study Hameroff was part of, Jack Tuszyński of the University of Alberta demonstrated that anesthetics hasten the decay of delayed luminescence, a process in which microtubules and tubulins re-emit trapped light. Tuszyński suspects that the phenomenon has a quantum origin, with superradiance being investigated as one possibility. In the second experiment, Gregory D. Scholes and Aarat Kalra of Princeton University used lasers to excite molecules within tubulins, causing a prolonged excitation to diffuse through microtubules further than expected, which did not occur when repeated under anesthesia.[38][39] However, diffusion results have to be interpreted carefully, since even classical diffusion can be very complex due to the wide range of length scales in the fluid-filled extracellular space.[40]

Dr Lucien Hardy: Thought, Matter, and Quantum Theory

Criticism


Orch OR has been criticized both by physicists[14][41][34][42][43] and neuroscientists[44][45][46] who consider it to be a poor model of brain physiology. Orch OR has also been criticized for lacking explanatory power; the philosopher Patricia Churchland wrote, "Pixie dust in the synapses is about as explanatorily powerful as quantum coherence in the microtubules."[47]

Decoherence in living organisms

In 2000, Max Tegmark claimed that any quantum coherent system in the brain would undergo effective wave function collapse due to environmental interaction long before it could influence neural processes (the "warm, wet and noisy" argument, as it later came to be known).[14] He determined the decoherence timescale of microtubule entanglement at brain temperatures to be on the order of femtoseconds, far too brief for neural processing. Christof Koch and Klaus Hepp also agreed that quantum coherence does not play, or does not need to play, any major role in neurophysiology.[15][16] Koch and Hepp concluded that "The empirical demonstration of slowly decoherent and controllable quantum bits in neurons connected by electrical or chemical synapses, or the discovery of an efficient quantum algorithm for computations performed by the brain, would do much to bring these speculations from the 'far-out' to the mere 'very unlikely'."[15]

In response to Tegmark's claims, Hagan, Tuszynski and Hameroff claimed that Tegmark did not address the Orch OR model, but instead a model of his own construction. This involved superpositions of quanta separated by 24 nm rather than the much smaller separations stipulated for Orch OR. As a result, Hameroff's group claimed a decoherence time seven orders of magnitude greater than Tegmark's, although still far below 25 ms. Hameroff's group also suggested that the Debye layer of counterions could screen thermal fluctuations, and that the surrounding actin gel might enhance the ordering of water, further screening noise. They also suggested that incoherent metabolic energy could further order water, and finally that the configuration of the microtubule lattice might be suitable for quantum error correction, a means of resisting quantum decoherence.[48][49]
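The gap the two camps are arguing over is easiest to see as plain order-of-magnitude arithmetic. The Python sketch below only restates numbers already quoted in this section (a femtosecond estimate, a seven-order-of-magnitude revision, and a 25 ms window); it is an illustration of the disagreement, not a decoherence calculation.

```python
# Order-of-magnitude sketch of the decoherence-timescale dispute described above.
# All figures come from the surrounding text; this is illustrative arithmetic only.

tegmark_estimate_s = 1e-15                      # "on the order of femtoseconds"
hagan_revision_s = tegmark_estimate_s * 1e7     # "seven orders of magnitude greater"
orch_or_window_s = 25e-3                        # ~25 ms gamma-synchrony window

print(f"Tegmark decoherence estimate:        {tegmark_estimate_s:.0e} s")
print(f"Hagan/Tuszynski/Hameroff revision:   {hagan_revision_s:.0e} s")
print(f"Remaining shortfall vs. 25 ms:       {orch_or_window_s / hagan_revision_s:.0e}x")
```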

In 2009, Reimers et al. and McKemmish et al. published critical assessments. Earlier versions of the theory had required tubulin electrons to form either Bose–Einstein or Fröhlich condensates, and the Reimers group noted the lack of empirical evidence that such condensation could occur. Additionally, they calculated that microtubules could only support weak 8 MHz coherence. McKemmish et al. argued that aromatic molecules cannot switch states because they are delocalised, and that changes in tubulin protein conformation driven by GTP conversion would result in a prohibitive energy requirement.[41][34][42]

In 2022, a group of Italian researchers performed several experiments that falsified a related hypothesis by physicist Lajos Diósi.[50][51]

Neuroscience


Sir Roger Penrose - AI, Consciousness, Computation, and Physical Law

Hameroff frequently writes: "A typical brain neuron has roughly 10⁷ tubulins (Yu and Baas, 1994)", yet this is Hameroff's own invention, which should not be attributed to Yu and Baas.[52] Hameroff apparently misunderstood that Yu and Baas actually "reconstructed the microtubule (MT) arrays of a 56 μm axon from a cell that had undergone axon differentiation" and this reconstructed axon "contained 1430 MTs ... and the total MT length was 5750 μm."[52] A direct calculation shows that 10⁷ tubulins (to be precise 9.3 × 10⁶ tubulins) correspond to this MT length of 5750 μm inside the 56 μm axon.
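For readers who want to reproduce that back-of-the-envelope figure, the sketch below assumes the textbook microtubule geometry of 13 protofilaments and an 8 nm tubulin dimer repeat along each protofilament; those two values are assumptions of this illustration, not numbers reported by Yu and Baas.

```python
# Sketch of the tubulin-count arithmetic behind the ~10^7 figure discussed above,
# assuming 13 protofilaments per microtubule and an 8 nm tubulin dimer repeat
# (standard textbook geometry, used here as illustrative assumptions).

total_mt_length_um = 5750      # total microtubule length reconstructed by Yu and Baas
protofilaments = 13            # protofilaments per microtubule (assumed)
dimer_length_um = 8e-3         # 8 nm dimer repeat, expressed in micrometres (assumed)

dimers = total_mt_length_um / dimer_length_um * protofilaments
print(f"Estimated tubulin dimers in the 56 um axon: {dimers:.2e}")  # ~9.3e6, i.e. ~10^7
```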

Hameroff's 1998 hypothesis required that cortical dendrites contain primarily 'A' lattice microtubules,[53] but in 1994 Kikkawa et al. showed that all in vivo microtubules have a 'B' lattice and a seam.[54][55]

Orch OR also required gap junctions between neurons and glial cells,[53] yet Binmöller et al. proved in 1992 that these don't exist in the adult brain.[56] In vitro research with primary neuronal cultures shows evidence for electrotonic (gap junction) coupling between immature neurons and astrocytes obtained from rat embryos extracted prematurely through Cesarean section;[57] however, the Orch OR claim is that mature neurons are electrotonically coupled to astrocytes in the adult brain. Therefore, Orch OR contradicts the well-documented electrotonic decoupling of neurons from astrocytes in the process of neuronal maturation, which is stated by Fróes et al. as follows: "junctional communication may provide metabolic and electrotonic interconnections between neuronal and astrocytic networks at early stages of neural development and such interactions are weakened as differentiation progresses."[57]

Other biology-based criticisms have been offered, including a lack of explanation for the probabilistic release of neurotransmitter from presynaptic axon terminals[58][59][60] and an error in the calculated number of the tubulin dimers per cortical neuron.[52]

In 2014, Penrose and Hameroff published responses to some criticisms and revisions to many of the theory's peripheral assumptions, while retaining the core hypothesis.[2][6]

How quantum brain biology can rescue conscious free will

Stuart Hameroff¹,²*

¹Department of Anesthesiology, Center for Consciousness Studies, University of Arizona, Tucson, AZ, USA

²Department of Psychology, Center for Consciousness Studies, University of Arizona, Tucson, AZ, USA

Conscious “free will” is problematic because (1) brain mechanisms causing consciousness are unknown, (2) measurable brain activity correlating with conscious perception apparently occurs too late for real-time conscious response, consciousness thus being considered “epiphenomenal illusion,” and (3) determinism, i.e., our actions and the world around us seem algorithmic and inevitable. The Penrose–Hameroff theory of “orchestrated objective reduction (Orch OR)” identifies discrete conscious moments with quantum computations in microtubules inside brain neurons, e.g., 40/s in concert with gamma synchrony EEG. Microtubules organize neuronal interiors and regulate synapses. In Orch OR, microtubule quantum computations occur in integration phases in dendrites and cell bodies of integrate-and-fire brain neurons connected and synchronized by gap junctions, allowing entanglement of microtubules among many neurons. Quantum computations in entangled microtubules terminate by Penrose “objective reduction (OR),” a proposal for quantum state reduction and conscious moments linked to fundamental spacetime geometry. Each OR reduction selects microtubule states which can trigger axonal firings, and control behavior. The quantum computations are “orchestrated” by synaptic inputs and memory (thus “Orch OR”). If correct, Orch OR can account for conscious causal agency, resolving problem 1. Regarding problem 2, Orch OR can cause temporal non-locality, sending quantum information backward in classical time, enabling conscious control of behavior. Three lines of evidence for brain backward time effects are presented. Regarding problem 3, Penrose OR (and Orch OR) invokes non-computable influences from information embedded in spacetime geometry, potentially avoiding algorithmic determinism. In summary, Orch OR can account for real-time conscious causal agency, avoiding the need for consciousness to be seen as epiphenomenal illusion. Orch OR can rescue conscious free will.

Introduction: Three Problems with Free Will

We have the sense of conscious control of our voluntary behaviors, of free will, of our mental processes exerting causal actions in the physical world. But such control is difficult to scientifically explain for three reasons:

Consciousness and Causal Agency

What is meant, exactly, by “we” (or “I”) exerting conscious control? The scientific basis for consciousness, and “self,” are unknown, and so a mechanism by which conscious agency may act in the brain to exert causal effects in the world is also unknown.

Does Consciousness Come Too Late?

Brain electrical activity correlating with conscious perception of a stimulus apparently can occur after we respond to that stimulus, seemingly consciously. Accordingly, science and philosophy generally conclude that we act non-consciously, and have subsequent false memories of conscious action, and thus cast consciousness as epiphenomenal and illusory (e.g., Dennett, 1991; Wegner, 2002).

Beyond algorithm based AI concepts: consciousness and its generation on machines

Determinism

Even if consciousness and a mechanism by which it exerts real-time causal action came to be understood, those specific actions could be construed as entirely algorithmic and inevitably pre-ordained by our deterministic surroundings, genetics and previous experience.

We do know that causal behavioral action and other cognitive functions derive from brain neurons, and networks of brain neurons, which integrate inputs to thresholds for outputs as axonal firings, which then collectively control behavior. Such actions may be either (seemingly, at least) conscious/voluntary, or non-conscious (i.e., reflexive, involuntary, or “auto-pilot”). The distinction between conscious and non-conscious activity [the “neural correlate of consciousness (NCC)”] is unknown, but often viewed as higher order emergence in computational networks of integrate-and-fire neurons in cortex and other brain regions (Scott, 1995). Cortical-cortical, cortical-thalamic, brainstem and limbic networks of neurons connected by chemical synapses are generally seen as neurocomputational frameworks for conscious activity, (e.g., Baars, 1988; Crick and Koch, 1990; Edelman and Tononi, 2000; Dehaene and Naccache, 2001), with pre-frontal and pre-motor cortex considered to host executive functions, planning and decision making.

But even if specific networks, neurons, membrane, and synaptic activities involved in consciousness were completely known, questions would remain. Aside from seemingly occurring too late for conscious control, neurocomputational activity fails to: (1) distinguish between conscious and non-conscious (“auto-pilot”) cognition, (2) account for long-range gamma synchrony electro-encephalography (“EEG”), the best measurable NCC (Singer and Gray, 1995), for which gap junction electrical synapses are required, (3) account for “binding” of disparate activities into unified percepts, (4) consider scale-invariant (“fractal-like,” “1/f”) brain dynamics and structure, and (5) explain the “hard problem” of subjective experience (e.g., Chalmers, 1996). A modified type of neuronal network can resolve some of these issues, but to fully address consciousness and free will, something else is needed. Here I propose the missing ingredient is finer scale, deeper order, molecular-level quantum effects in cytoskeletal microtubules inside brain neurons.

In particular, the Penrose–Hameroff “Orch OR” model suggests that quantum computations in microtubules inside brain neurons process information and regulate membrane and synaptic activities. Microtubules are lattice polymers of subunit proteins called “tubulin.” Orch OR proposes tubulin states in microtubules act as interactive information “bits,” and also as quantum superpositions of multiple possible tubulin states (e.g., quantum bits or qubits). During integration phases, tubulin qubits interact by entanglement, evolve and compute by the Schrödinger equation, and then reduce, or collapse to definite states, e.g., after 25 ms in gamma synchrony. The quantum state reduction is due to an objective threshold [“objective reduction (OR)”] proposed by Penrose, accompanied by a moment of conscious awareness. Synaptic inputs and other factors “orchestrate” the microtubule quantum computations, hence “orchestrated objective reduction (Orch OR).”

Orch OR directly addresses conscious causal agency. Each reduction/conscious moment selects particular microtubule states which regulate neuronal firings, and thus control conscious behavior. Regarding consciousness occurring “too late,” quantum state reductions seem to involve temporal non-locality, able to refer quantum information both forward and backward in what we perceive as time, enabling real-time conscious causal action. Quantum brain biology and Orch OR can thus rescue free will.

Consciousness, Brain, and Causality

Consciousness involves awareness, phenomenal experience (composed of what philosophers term “qualia”), sense of self, feelings, apparent choice and control of actions, memory, a model of the world, thought, language, and, e.g., when we close our eyes, or meditate, internally-generated images and geometric patterns. But what consciousness actually is remains unknown.

Most scientists and philosophers view consciousness as an emergent property of complex computation among networks of the brain's 100 billion “integrate-and-fire” neurons. In digital computers, discrete voltage levels represent information units (e.g., “bits”) in silicon logic gates. McCulloch and Pitts (1943) arranged logic gates as integrate-and-fire silicon neurons, leading to “perceptrons” (Rosenblatt, 1962; Figure 1) and self-organizing “artificial neural networks” capable of learning and self-organized behavior. Similarly, according to the standard “Hodgkin and Huxley” (1952) model, biological neurons are “integrate-and-fire” threshold logic devices in which multiple branched dendrites and a cell body (soma) receive and integrate synaptic inputs as membrane potentials. The integrated potential is then compared to a threshold potential at the axon hillock, or axon initiation segment (AIS). When AIS threshold is reached by the integrated potential, an all-or-none action potential “firing,” or “spike,” is triggered as output, conveyed along the axon to the next synapse. Axonal firings can manifest will and behavior, e.g., causing other neurons to move muscles or speak words.
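A minimal leaky integrate-and-fire sketch in Python captures the integrate-to-threshold-then-fire behavior described above. It is a toy model with arbitrary illustrative parameters, not the full Hodgkin–Huxley formulation.

```python
import numpy as np

# Toy leaky integrate-and-fire neuron: integrate input toward a threshold,
# emit an all-or-none "spike" when the threshold is reached, then reset.
# All parameter values are arbitrary and illustrative.

def simulate_lif(input_current, dt=1e-4, tau_m=0.02, r_m=1e7,
                 v_rest=-0.070, v_thresh=-0.054, v_reset=-0.075):
    """Integrate synaptic input as a membrane potential; record a spike time
    whenever the potential reaches threshold at the modelled trigger zone (AIS)."""
    v = v_rest
    spike_times = []
    for step, i_in in enumerate(input_current):
        # Leaky integration of the input current toward threshold
        v += (-(v - v_rest) + r_m * i_in) * dt / tau_m
        if v >= v_thresh:              # threshold reached: all-or-none firing
            spike_times.append(step * dt)
            v = v_reset                # reset after the spike
    return spike_times

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 0.5 s of noisy excitatory input (nanoamp scale, arbitrary)
    current = 1.8e-9 + 0.5e-9 * rng.standard_normal(5000)
    spikes = simulate_lif(current)
    print(f"{len(spikes)} spikes in 0.5 s of simulated input")
```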



Figure 1. Three characterizations of integrate-and-fire neurons. Top: A biological neuron with multiple dendrites and one cell body (soma) receives and integrates synaptic inputs as membrane potentials, which are compared to a threshold at the axon initiation segment (AIS). If threshold is met, axonal spikes/firings are triggered along a single axon which branches distally to convey outputs. Middle: A computer-based artificial neuron (e.g., a “perceptron,” Rosenblatt, 1962) with multiple weighted inputs and a single branched output. Bottom: A model neuron (see subsequent figures) showing the same essential features, with three inputs on one dendrite and a single axonal output which branches distally.

Some contend that consciousness emerges from axonal firing outputs, “volleys,” or “explosions” from complex neurocomputation (Koch, 2004; Malach, 2007). But coherent axonal firings are preceded and caused by synchronized dendritic/somatic integrations, suggesting consciousness involves neuronal dendrites and cell bodies/soma, i.e., in integration phases of “integrate-and-fire” sequences (Pribram, 1991; Eccles, 1992; Woolf and Hameroff, 2001; Tononi, 2004). Integration implies merging and consolidation of multiple input sources to one output, e.g., chemical synaptic inputs integrated toward threshold for firing, commonly approximated as linear summation of dendritic/somatic membrane potentials. However actual integration is active, not passive, and involves complex logic and signal processing in dendritic spines, branch points and local regions, amplification of distal inputs, and changing firing threshold at the AIS trigger zone (Shepherd, 1996; Sourdet and Debanne, 1999; Poirazi and Mel, 2001). Dendrites and soma are primary sources of EEG, and sites of anesthetic action which erase consciousness with little or no effects on axonal firing capabilities. Arguably, dendritic/somatic integration is closely related to consciousness, with axonal firings the outputs of conscious (or non-conscious) processes. Nonetheless, according to the Hodgkin–Huxley model, integration is assumed to be completely algorithmic and deterministic (Figure 2A), leaving no apparent room for conscious free will.

Figure 2. Integrate-and-fire neuronal behaviors. (A) The Hodgkin–Huxley model predicts that membrane potentials integrated in dendrites and soma reach a specific, narrow threshold potential at the proximal axon (AIS) and trigger firing with very low temporal variability (small tb–ta) for given inputs. (B) Recordings from cortical neurons in awake animals (Naundorf et al., 2006) show a large variability in effective firing threshold and timing. Some unknown “x-factor” (related to consciousness?) exerts causal influence on firing and behavior. Here, quantum temporal non-locality results in backward time referral, suggested as the “x-factor” modulating firing threshold.

However, Naundorf et al. (2006) showed that firing threshold in cortical neurons in brains of awake animals (compared to neurons in slice preparations) varies widely on a spike-to-spike, firing-to-firing basis. Some factor other than the integrated AIS membrane potential contributes to firing, or not firing (Figure 2B). Firings control behavior. This “x-factor,” modulating integration and adjusting firing threshold and timing, is perfectly positioned for causal action, for conscious free will. What might it involve? Figure 2B indicates possible modification of integration and firing threshold by backward time referral.

Anatomically, a source for integration and firing threshold modification comes from lateral connections among neurons via gap junctions, or electrical synapses (Figure 3). Gap junctions are membrane protein complexes in adjacent neurons (or glia) which fuse the two cells and synchronize their membrane polarization states e.g., in gamma synchrony EEG (Dermietzel, 1998; Draguhn et al., 1998; Galarreta and Hestrin, 1999; Bennett and Zukin, 2004; Fukuda, 2007), the best measurable NCC (Gray and Singer, 1989; Fries et al., 2002; Kokarovtseva et al., 2009). Gap junction-connected cells also have continuous intracellular spaces, as open gap junctions between cells act like windows, or doors between adjacent rooms. Neurons connected by dendritic-dendritic gap junctions have synchronized local field potentials (EEG) in integration phase, but not necessarily synchronous axonal firing outputs. Thus gap junction synchronized dendritic networks can collectively integrate inputs, and provide an x-factor in selectively controlling firing outputs (Hameroff, 2010). Gap junction dynamics may also enable mobile agency in the brain. As gap junctions open and close, synchronized zones of collective integration and conscious causal agency can literally move through the brain, modulating integration, firing thresholds and behavior (Figure 4; Hameroff, 2010; Ebner and Hameroff, 2011). As consciousness can occur in different brain locations at different times, the NCC may be a mobile zone exerting conscious causal agency in various brain regions at different times.

Figure 3. (A) Dendrites of adjacent neurons linked by a gap junction which remains closed. The gap junction connection is “sideways,” lateral to the flow of synaptic information. (B) Dendritic-dendritic gap junction open, synchronizing (vertical stripes) electrophysiology and enabling collective integration among gap junction-connected neurons.

Figure 4. Two timesteps in a neurocomputational network of integrate-and-fire neurons. Inputs come from left, outputs go to top, bottom and right. Dendritic-dendritic gap junctions may open, e.g., between striped dendrites and soma, to form “synchronized webs.” As gap junctions open and close, the synchronized web can move through the network, e.g., Step 1, Step 2. Mobile webs are candidates for the neural correlates of consciousness (NCC). Outputs marked by * reflect collective integration and suggest conscious causal agency.

But why would such causal agency be conscious? And with membranes synchronized, how do gap junction-connected neurons share and integrate information? Evidence points to the origins of behavior and consciousness at a deeper order, finer scale within neurons, e.g., in cytoskeletal structures such as microtubules which organize cell interiors.

Quantum supremacy: Benchmarking the Sycamore processor (QuantumCasts)

A Finer Scale?

Single cell organisms like Paramecium swim about, avoid obstacles and predators, find food and mates, and have sex, all without any synaptic connections. They utilize cytoskeletal structures such as microtubules (in protruding cilia and within their internal cytoplasm) for sensing and movement. The single cell slime mold Physarum polycephalum sends out numerous tendrils composed of bundles of microtubules, forming patterns which, seeking food, can solve problems and escape a maze (e.g., Adamatzky, 2012). Observing the purposeful behavior of single cell creatures, neuroscientist Charles Sherrington (1957) remarked: “of nerve there is no trace, but perhaps the cytoskeleton might serve.”

Interiors of animal cells are organized by the cytoskeleton, a scaffolding-like protein network of microtubules, microtubule-associated proteins (MAPs), actin and intermediate filaments (Figure 5A). Microtubules are cylindrical polymers 25 nm (1 nm = 10⁻⁹ m) in diameter, composed usually of 13 longitudinal protofilaments, each a chain of the peanut-shaped protein tubulin (Figure 5B). Microtubules self-assemble from tubulin, a protein with a ferroelectric dipole, arranged within microtubules in two types of hexagonal lattices (A-lattice and B-lattice; Tuszynski et al., 1995), each slightly twisted, resulting in differing neighbor relationships among each subunit and its six nearest neighbors. Contiguous tubulins in the A-lattice form helical pathways which repeat every 3, 5, and 8 rows on any protofilament (the Fibonacci series; Figure 5B).

Figure 5. (A) Axon terminal (left) with two internal microtubules releasing neurotransmitters into the synapse and onto receptors in the membrane of a dendritic spine. Actin filaments (as well as soluble second messengers, not shown) connect to cytoskeletal microtubules in the main dendrite. Dendritic microtubules (right) are arranged in local networks, interconnected by microtubule-associated proteins (MAPs). (B) Larger scale showing two types of microtubule information processing. Top row: four timesteps in a microtubule automata simulation, each tubulin holding a bit state, switching e.g., at 10 megahertz (Rasmussen et al., 1990; Sahu et al., 2012). Bottom row: four topological bits in a microtubule. Information is represented as specific helical pathways of conductance and information transfer. Microtubule mechanical resonances come into play (Hameroff et al., 2002; Sahu et al., 2012).

Each tubulin may differ from its neighbors by genetic variability, post-translational modifications, binding of ligands and MAPs, and moment-to-moment dipole state transitions. Thus microtubules have enormous capacity for complex information representation and processing, are particularly prevalent in neurons (10⁹ tubulins/neuron), and are uniquely stable and configured in dendrites and cell bodies (Craddock et al., 2012a). Microtubules in axons (and non-neuronal cells) are arrayed radially, extending continuously (all with the same polarity) from the centrosome near the nucleus, outward toward the cell membrane. However, microtubules in dendrites and cell bodies are interrupted, of mixed polarity, stabilized, and arranged in local recursive networks suitable for learning and information processing (Figure 5A; Rasmussen et al., 1990).

Neuronal microtubules regulate synapses in several ways. They serve as tracks and guides for motor proteins (dynein and kinesin) which transport synaptic precursors from cell body to distal synapses, encountering, and choosing among several dendritic branch points and many microtubules. The guidance mechanism for such delivery, choosing the proper path, is unknown, but seems to involve the MAP tau as a traffic signal (placement of tau at specific sites on microtubules being the critical feature). In Alzheimer's disease, tau is hyperphosphorylated and dislodged from destabilized microtubules. Disruption of microtubules and formation of neurofibrillary tangles composed of free, hyperphosphorylated tau correlates with memory loss in Alzheimer's disease (Matsuyama and Jarvik, 1989; Craddock et al., 2012b), and post-anesthetic cognitive dysfunction (Craddock et al., 2012c).

Due to their lattice structure and direct involvement in organizing cellular functions, microtubules have been suggested to function as information processing devices. After Sherrington's (1957) broad observation about cytoskeletal information processing, Atema (1973) proposed that tubulin conformational changes propagate as signals along microtubules. Hameroff and Watt (1982) suggested that microtubule lattices act as two-dimensional Boolean computational switching matrices with input/output occurring via MAPs. Microtubule information processing has also been viewed in the context of cellular (“molecular”) automata in which tubulin states interact with hexagonal lattice neighbor tubulin states by dipole couplings, synchronized by biomolecular coherence as proposed by Fröhlich (1968, 1970, 1975); (Smith et al., 1984; Rasmussen et al., 1990). Simulations of microtubule automata based on tubulin states show rapid information integration and learning. Recent evidence indicates microtubules have resonances at frequency ranges from 10 kHz to 10 MHz, and possibly higher (Sahu et al., 2012). Topological computing can also occur in which helical pathways through the skewed hexagonal lattice are the relevant states, or bits (Figure 2B, bottom). Particular resonance frequencies may correlate with specific helical pathways.
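To make the automaton idea concrete, here is a toy Python sketch of tubulin “bits” on a cylindrical 13-protofilament lattice, each updated from its lattice neighbors. The majority-vote rule, lattice size, and periodic boundaries are arbitrary illustrative choices of this sketch, not the rules used in the published simulations cited above.

```python
import numpy as np

# Toy cellular automaton loosely in the spirit of the microtubule automata described
# above: each "tubulin" holds a bit and updates synchronously from its neighbours.
# The majority-vote rule and dimensions are illustrative, not the published models.

PROTOFILAMENTS = 13   # circumference of the cylinder
RINGS = 40            # length of the toy lattice

def step(lattice: np.ndarray) -> np.ndarray:
    """One synchronous update: each tubulin adopts the majority state of its four
    lattice neighbours; a 2-2 tie keeps the current state. Periodic boundaries
    are used in both directions for simplicity."""
    up = np.roll(lattice, 1, axis=0)
    down = np.roll(lattice, -1, axis=0)
    left = np.roll(lattice, 1, axis=1)    # wraps around the 13 protofilaments
    right = np.roll(lattice, -1, axis=1)
    neighbour_sum = up + down + left + right
    return np.where(neighbour_sum > 2, 1,
                    np.where(neighbour_sum < 2, 0, lattice)).astype(np.int8)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    tubulins = rng.integers(0, 2, size=(RINGS, PROTOFILAMENTS), dtype=np.int8)
    for _ in range(10):
        tubulins = step(tubulins)
    print("Fraction of tubulins in state 1 after 10 steps:", tubulins.mean())
```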

With roughly 10⁹ tubulins per neuron switching at, e.g., 10 MHz (10⁷ switches per second), the potential capacity for microtubule-based information processing is 10¹⁶ operations/s per neuron. Integration in microtubules (influenced by encoded memory), synchronized into collective integration by gap junctions, may be an x-factor in altering firing threshold and exerting causal agency in sets of synchronized neurons. But even a deeper order, finer scale microtubule-based process in a self-organizing zone of conscious agency would still be algorithmic and deterministic, and would fail to completely address the problems of consciousness and free will.
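That capacity figure is simply the product of the two order-of-magnitude estimates quoted above; the one-line calculation is shown here for transparency.

```python
# The per-neuron capacity estimate above is just the product of two
# order-of-magnitude figures quoted in the text.

tubulins_per_neuron = 1e9      # ~10^9 tubulins per neuron
switch_rate_hz = 1e7           # ~10 MHz switching per tubulin
ops_per_second = tubulins_per_neuron * switch_rate_hz
print(f"Potential microtubule operations per neuron: {ops_per_second:.0e} ops/s")  # ~1e16
```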

Quantum Computing with Neutral Atoms | Seminar Series with Ivan Deutsch

And another problem looms.

Is Consciousness Too Late?

Several lines of evidence suggest that real time conscious action is an illusion, that we act non-consciously and have belated, false impressions of conscious causal action. This implies that free will does not exist, that consciousness is epiphenomenal, and that we are, as Huxley (1893/1986) bleakly summarized, “merely helpless spectators.” Apparent evidence against real-time conscious action includes the following:

Sensory Consciousness Comes Too Late for Conscious Response

Neural correlates of conscious perception occur 150–500 ms after impingement on our sense organs, apparently too late for causal efficacy in seemingly conscious perceptions and willful actions, often initiated or completed within 100 ms after sensory impingement. Velmans (1991, 2000) listed a number of examples: analysis of sensory inputs and their emotional content, phonological, and semantic analysis of heard speech and preparation of one's own spoken words and sentences, learning and formation of memories, and choice, planning and execution of voluntary acts. Consequently, the subjective feeling of conscious control of these behaviors is deemed illusory (Dennett, 1991; Wegner, 2002).

In speech, evoked potentials (EPs) indicating conscious word recognition occur about 400 ms after auditory input; however, semantic meaning is appreciated (and a response initiated) after only 200 ms. As Velmans points out, only two phonemes are heard by 200 ms, and an average of 87 words share their first two phonemes. Even when contextual effects are considered, semantic processing and initiation of response occur before conscious recognition (Van Petten et al., 1999).

Gray (2004) observes that in tennis “The speed of the ball after a serve is so great, and the distance over which it has to travel so short, that the player who receives the serve must strike it back before he has had time consciously to see the ball leave the server's racket. Conscious awareness comes too late to affect his stroke.” McCrone (1999): “[for] tennis players … facing a fast serve … even if awareness were actually instant, it would still not be fast enough ….” Nonetheless tennis players claim to see the ball consciously before they attempt to return it.

Control of transmon qubits using a cryogenic CMOS integrated circuit (QuantumCasts)

Readiness Potentials

Kornhuber and Deecke (1965) recorded brain electrical activity over pre-motor cortex in subjects who were asked to move their finger randomly, at no prescribed time. They found that brain electrical activity preceded finger movement by ~800 ms, calling this activity the readiness potential (“RP,” Figure 6A). Libet and colleagues (1983) repeated the experiment, except they also asked subjects to note precisely when they consciously decided to move their finger. (To do so, and to avoid delays caused by verbal report, Libet et al. used a rapidly moving clock and asked subjects to note when on the clock they consciously decided to move their finger). This conscious decision came ~200 ms before actual finger movement, hundreds of milliseconds after onset of the RP. Libet and many authorities concluded that the RP represented non-conscious determination of movement, that many seemingly conscious actions are actually initiated by nonconscious processes, and that conscious intent was an illusion. Consciousness apparently comes too late. However, as shown in Figure 6B, temporal non-locality enabling backward time referral of (quantum) information from the moment of conscious intent can account for necessary RP preparation.

Figure 6. The “readiness potential (RP)” (Libet et al., 1983). (A) Cortical potentials recorded from a subject instructed to move his/her hand whenever he/she feels ready, and to note when the decision was made (Conscious intent), followed quickly by the finger actually moving. (The time between Conscious intent and the finger moving is fixed.) The readiness potential, RP, preceding Conscious intent is generally interpreted as representing the Non-conscious choice to move the finger, with Conscious intent being illusion. (B) Assuming RP is necessary preparation for conscious finger movement, Actual conscious intent could initiate the earlier RP by (quantum) temporal non-locality and backward time referral, enabling preparation while preserving real-time conscious intent and control.

And yet we feel as though we act consciously in real time. To account for this paradox, Dennett (1991; cf. Dennett and Kinsbourne, 1992) described real-time conscious perception and action as retrospective construction, as illusion. His multiple drafts model proposed that sensory inputs and cognitive processing produce tentative contents under continual revision, with the definitive, final edition only later inserted into memory, overriding previous drafts (“Orwellian Revisionism,” after George Orwell's fictional, retroactive “Ministry of Truth” in the novel 1984). Perceptions are edited and revised over hundreds of milliseconds, with a final version inserted into memory. In this view (more or less the standard in modern philosophy and neuroscience) the brain retrospectively creates content or judgment, e.g., of real-time conscious control, which is recorded in memory as veridical truth. In other words, we act non-consciously in real time, but then falsely remember acting consciously. Consciousness, in this view, is an epiphenomenal illusion occurring after the fact. We are living in the past.

For example in the “color phi” effect (Kolers and von Grunau, 1976) a red spot appears briefly on the left side of a screen, followed after a pause by a green spot on the right side. Conscious observers report one spot moving back and forth, changing to green halfway across the screen, the brain seemingly “filling in” (Figure 7). Yet after a sequence of such observations, if the spot on the right is suddenly red (instead of green), the subject is not fooled and fills in continuously with red halfway across. Does the brain know in advance to which color the dot will change? No, says Dennett. The brain fills in the proper color in a subsequent draft, and belatedly imprints it into conscious memory. Consciousness occurs after the fact (Figure 7A). Any conscious response to the color change would occur well after presentation, dooming free will. However a quantum explanation with temporal non-locality and backward time referral enables constructive “filling in” from near future brain activity, allowing real time conscious perception (Figure 7B). Is there any evidence for backward time effects in the brain?

IN THE “COLOR PHI” PHENOMENON (KOLERS AND VON GRUNAU, 1976). A red circle appears on the left side of a screen, disappears, and then, a fraction of a second later, a green circle appears on the right side. An observer consciously “sees” a red circle moving continuously from left to right, changing to green halfway across. (A) According to Dennett's “Orwellian Revisionism,” the brain constructs, or fills in the movement and transition after the fact, and inserts a constructed perception into memory. Real-time perception is not conscious. (B) In a “Quantum Explanation,” temporal non-locality and backward time referral allow real-time, veridical conscious perception.

Max Velmans on How to Understand Causal Interactions between Consciousness and Brain


Backward Time Effects in the Brain? Three Lines of Evidence

Libet's “Open Brain” Sensory Experiments

In addition to volitional studies (moving a finger), Libet and colleagues studied the timing of conscious sensory experience in awake, cooperative patients undergoing brain surgery with local anesthesia (e.g., Libet et al., 1964, 1979; Libet, 2004). With his neurosurgical colleagues, Libet was able in these patients to record from, and stimulate, specific areas of somatosensory cortex (e.g., those corresponding to the skin of each patient's hand) and the hand itself (Figures 8 and 9), as well as communicate with the conscious patients.

Figure 8. Cortical potentials in Libet's sensory experiments. (A) Peripheral stimulation, e.g., at the hand, results in near-immediate conscious experience of the stimulation, an evoked potential (EP) at ~30 ms in the “hand area” of somatosensory cortex, and several 100 ms of ongoing cortical electrical activity. (B) Direct stimulation of the somatosensory cortical hand area for several 100 ms results in no EP, ongoing cortical activity, and conscious sensory experience of the hand, but only after ~500 ms. Libet termed the 500 ms of cortical activity resulting in conscious experience “neuronal adequacy.”

Figure 9. Libet's sensory experiments, continued. (A) Libet et al. stimulated the medial lemniscus of the thalamus in the sensory pathway to produce an EP (~30 ms) in somatosensory cortex, but applied only brief post-EP stimulation, resulting in only brief cortical activity. There was no apparent “neuronal adequacy,” and no conscious experience. An EP and several 100 ms of post-EP cortical activity (neuronal adequacy) were required for conscious experience at the time of the EP. (B) To account for his findings, Libet concluded that subjective information was referred backward in time from neuronal adequacy (~500 ms) to the EP.

As depicted in Figure 8A, peripheral stimulus, e.g., of the skin of the hand, resulted in an “EP” spike in the somatosensory cortical area for the hand ~30 ms after skin contact, consistent with the time required for a neuronal signal to travel from hand to spinal cord, thalamus, and brain. The stimulus also caused several 100 ms of ongoing cortical activity following the EP. Subjects reported conscious experience of the stimulus (using Libet's rapidly moving clock) near-immediately, e.g., at the time of the EP at 30 ms.

Libet also stimulated the “hand area” of subjects' brain somatosensory cortex directly (Figure 8B). This type of stimulation did not cause an EP spike, but did result in ongoing brain electrical activity. Conscious sensation referred to (“felt in”) the hand occurred, but only after stimulation and ongoing brain activity lasting up to 500 ms (Figure 8B). This requirement of ongoing, prolonged electrical activity (what Libet termed “neuronal adequacy”) to produce conscious experience (“Libet's 500 ms”) was subsequently confirmed by Amassian et al. (1991), Ray et al. (1999), Pollen (2004) and others.

But if hundreds of milliseconds of brain activity are required for neuronal adequacy, how can conscious sensory experience occur at 30 ms? To address this issue, Libet also performed experiments in which stimulation of thalamus resulted in an EP at 30 ms, but only brief ongoing activity, i.e., without neuronal adequacy (Figure 9A). No conscious experience occurred. Libet concluded that for real-time conscious perception (e.g., at the 30 ms EP), two factors were necessary: an EP at 30 ms, and several 100 ms of ongoing cortical activity (neuronal adequacy) after the EP. Somehow, apparently, the brain seems to know what will happen after the EP. Libet concluded the hundreds of milliseconds of ongoing cortical activity (“neuronal adequacy”) is the sine qua non for conscious experience—the NCC, even if it occurs after the conscious experience. To account for his results, he further concluded that subjective information is referred backwards in time from the time of neuronal adequacy to the time of the EP (Figure 9B). Libet's backward time assertion was disbelieved and ridiculed (e.g., Churchland, 1981; Pockett, 2002), but never refuted (Libet, 2002, 2003).

The neural basis of consciousness

Pre-Sentiment and Pre-Cognition

Electrodermal activity measures skin impedance, usually with a probe wrapped around a finger, as an index of autonomic, sympathetic neuronal activity causing changes in blood flow and sweating, in turn triggered by emotional response in the brain. Over many years, researchers (Bierman and Radin, 1997; Bierman and Scholte, 2002; Radin, 2004) have published a number of well-controlled studies using electrodermal activity, fMRI and other methods to look for emotional responses, e.g., to viewing images presented at random times on a computer screen. They found, not surprisingly, that highly emotional (e.g., violent, sexual) images elicited greater responses than neutral, non-emotional images. But surprisingly, the changes occurred half a second to two seconds before the images appeared. They termed the effect pre-sentiment because the subjects were not consciously aware of the emotional feelings. Non-conscious emotional sentiment (i.e., feelings) appeared to be referred backward in time. These studies were published in the parapsychology literature, as mainstream academic journals refused to consider them.

Bem (2012) published “Feeling the future: experimental evidence for anomalous retroactive influences on cognition and affect” in the mainstream J. Pers. Soc. Psychol. The article reported on eight studies showing statistically significant backward time effects, most involving non-conscious influence of future emotional effects (e.g., erotic or threatening stimuli) on cognitive choices. Studies by others have reported both replication, and failure to replicate, the controversial results.

Quantum Delayed Choice Experiments

In the famous “double slit experiment,” quantum entities (e.g., photons, electrons) can behave as either waves, or particles, depending on the method chosen to measure them. Wheeler (1978) described a thought experiment in which the measurement choice (by a conscious human observer) was delayed until after the electron or other quantum entity passed though the slits, presumably as either wave or particle. Wheeler suggested the observer's delayed choice could retroactively influence the behavior of the electrons, e.g., as waves or particles. The experiment was eventually performed (Kim et al., 2000) and confirmed Wheeler's prediction; conscious choices can affect previous events, as long as the events had not been consciously observed in the interim.

In “delayed choice entanglement swapping,” originally a thought experiment proposed by Asher Peres (2000); Ma et al. (2012) went a step further. Entanglement is a characteristic feature of quantum mechanics in which unified quantum particles are separated but remain somehow connected, even over distance. Measurement or perturbation of one separated-but-still-entangled particle instantaneously affects the other, what Einstein referred to (mockingly) as “spooky action at a distance.” Despite its bizarre nature, entanglement has been demonstrated repeatedly, and is the foundation for quantum cryptography, quantum teleportation and quantum computing (Deutsch, 1985). In entanglement swapping, two pairs of unified/entangled particles are separated, and one from each pair is sent to two measurement devices, each associated with a conscious observer (“Alice” and “Bob,” as is the convention in such quantum experiments). The other entangled particle from each pair is sent to a third observer, “Victor.” How Victor decides to measure the two particles (as an entangled pair, or as separable particles) determines whether Alice and Bob observe them as entangled (showing quantum correlations) or separable (showing classical correlations). This happens even if Victor decides after Alice's and Bob's devices have measured them (but before Alice and Bob consciously view the results). Thus, conscious choice affects behavior of previously measured, but unobserved, events.

How can backward time effects be explained scientifically? The problem may be related to our perception of time in classical (non-quantum) physics. Anton Zeilinger, senior author on the Ma et al. study, said: “Within a naïve classical worldview, quantum mechanics can even mimic an influence of future actions on past events.”

Time and Conscious Moments

What is time? St. Augustine remarked that when no one asked him, he knew what time was; however when someone asked him, he did not. The (“naïve”) worldview according to classical Newtonian physics is that time is either a process which flows, or a dimension in 4-dimensional space-time along which processes occur. But if time flows, it would do so in some medium or dimension (e.g., minutes per what?). If time is a dimension, why would processes occur unidirectionally in time? Yet we consciously perceive a unidirectional time-like reality. An alternative explanation is that time does not exist as process or dimension, but as a collage of discrete configurations of the universe, connected in some way by consciousness and memory (Barbour, 1999). This follows Leibniz “monads” (e.g., Rescher, 1991; c.f. Spinoza, 1677), momentary, snapshot-like arrangements of spatiotemporal reality based on Mach's principle that the universe has an underlying structure related to mass distribution (also a foundation of Einstein's general relativity). Whitehead (1929, 1933) expounded on Leibniz monads, conferring mental aspects to occasions occurring in a wider field of “proto-conscious experience” (“occasions of experience”). These views from philosophy and physics link consciousness to discrete events in the fine structure of physical reality.

Consciousness has also been seen as discrete events in psychology, e.g., James's (1890) “specious present, the short duration of which we are immediately and incessantly sensible” (though James was vague about duration, and also described a continual “stream of consciousness”). The “perceptual moment” theory of Stroud (1956) described consciousness as a series of discrete events, like sequential frames of a movie (modern film and video present 24–72 frames per second, i.e., 24–72 Hz). Periodicities for perception and reaction times are in the range of 20–50 ms, i.e., gamma synchrony EEG (30–90 Hz). Slower periods, e.g., 4–7 Hz theta frequency, with nested gamma waves, may correspond with saccades and visual gestalts (Woolf and Hameroff, 2001; VanRullen and Koch, 2003).

Support for consciousness as sequences of discrete events is also found in Buddhism, with trained meditators describing distinct “flickerings” in their experience of pure undifferentiated awareness (Tart, 1995, pers. communication). Buddhist texts portray consciousness as “momentary collections of mental phenomena,” and as “distinct, unconnected and impermanent moments which perish as soon as they arise.” Buddhist writings even quantify the frequency of conscious moments. For example, the Sarvaastivaadins (von Rospatt, 1995) described 6,480,000 “moments” in 24 h (an average of one “moment” per 13.3 ms, i.e., 75 Hz), and some Chinese Buddhist traditions describe one “thought” per 20 ms (50 Hz), both in the gamma synchrony range.

Long-range gamma synchrony in the brain is the best measurable NCC. In surgical patients undergoing general anesthesia, gamma synchrony between frontal and posterior cortex is the specific marker which disappears with loss of consciousness and returns upon awakening (Hameroff, 2006). In what may be considered enhanced or optimized levels of consciousness, high frequency (more than 80 Hz) phase coherent gamma synchrony was found spanning cortical regions in meditating Tibetan monks, at the highest amplitude ever recorded (Lutz et al., 2004). Faster rates of conscious moments may correlate with subjective perception of slower time flow, e.g., as in a car accident, or altered state. But what are conscious moments? Shimony (1993) recognized that Whitehead's occasions were compatible with quantum state reductions, or “collapses of the wave function.” Several lines of evidence suggest consciousness could be identified with sequences of quantum state reductions. What exactly are quantum state reductions?

A theory of deep learning: explaining the approximation, optimization and generalization puzzles

Consciousness and Quantum State Reduction

Reality is described by quantum physical laws which appear to reduce to classical rules (e.g., Newton's laws of motion) at certain scale limits, though those limits are unknown. According to quantum physical laws:

- Objects/particles may exist in two or more places or states simultaneously—more like waves than particles and governed by a quantum wavefunction. This property of multiple coexisting possibilities is known as quantum superposition.

- Multiple objects/particles can be unified, acting as a single coherent object governed by one wavefunction. If a component is perturbed, others feel it and react, e.g., in Bose-Einstein condensation.

- If unified objects are spatially separated they remain unified. This non-locality is also known as quantum entanglement.

- But we don't see quantum superpositions in our macroscale world. How and why do quantum laws reduce to classical behavior? Various interpretations of quantum mechanics address this issue:

Copenhagen and the conscious observer: In the early days of quantum mechanics, Bohr (1934/1987) and colleagues recognized that quantum superpositions persist until measured by a device (the “Copenhagen interpretation”, after Bohr's Danish origin). Wigner (1961) and von Neumann (1932/1955) further stipulated that the superposition continues in the device until the results are observed by a conscious human, that conscious observation “collapses the wave function.” These interpretations enabled quantum experiments to flourish, but put consciousness outside science, and failed to account for fundamental reality. Schrödinger (1935) took exception, posing his famous (“Schrödinger's cat”) thought experiment in which the fate of a cat in a box is tied to a quantum superposition, reasoning that, according to the Wigner and von Neumann interpretation, the cat would remain both dead and alive until the box is opened and observed by a conscious human. Despite the absurdity, limitations on quantum superposition remain unknown.

The multiple worlds view suggests each superposition is a separation in reality, evolving to a new universe (Everett, 1957). There is no collapse, but an infinity of realities (and conscious minds) is required.

David Bohm's interpretation (Bohm and Hiley, 1993) avoids reduction/collapse by postulating another layer of reality. Matter exists as objects guided by complex “pilot” waves of possibility.

Henry Stapp (1993) views the universe as a single quantum wave function. Reduction within the brain is a conscious moment (akin to Whitehead's “occasion of experience”—Whitehead, 1929, 1933). Reduction/collapse is consciousness, but its cause and distinction between universal wave function and that within the brain are unclear.

In decoherence theory (e.g., Zurek, 2003) any interaction (loss of isolation) of a quantum superposition with a classical system (e.g., through heat, direct interaction or information exchange) erodes the quantum system. But (1) the fate of isolated superpositions is not addressed, (2) no quantum system is ever truly isolated, (3) decoherence doesn't actually disrupt superposition, just buries it in noise, and (4) some quantum processes are enhanced by heat and/or noise.

An objective threshold for quantum state reduction (OR) exists due to, e.g., the number of superpositioned particles (GRW theory, Ghirardi et al., 1986) or a factor related to quantum gravity or underlying properties of spacetime geometry, as in the OR proposals of Károlyházy et al. (1986), Diósi (1989), and Penrose (1989, 1996). Penrose OR also includes consciousness, each OR event being associated with a moment of conscious experience.

Penrose (1989, 1994) uniquely brings consciousness into physics, and directly approaches superpositioned objects as actual separations in underlying reality at its most basic level (fundamental space-time geometry at the Planck scale of 10⁻³³ cm). Separation is akin to the multiple worlds view in which each possibility branches to form and evolve its own universe. However according to Penrose the space-time separations are unstable and (instead of branching off) spontaneously reduce (self-collapse) to one particular space-time geometry or another. This OR self-collapse occurs at a threshold given by E = ħ/t, where E is the magnitude (gravitational self-energy) of the superposition, e.g., the number of tubulins (E is also proportional to intensity of conscious experience), ħ is Planck's constant (over 2π), and t the time interval at which superposition E will self-reduce by OR, choosing classical states in a moment of consciousness (Figure 10).

FIGURE 10. LOCATION OR STATE OF A PARTICLE/OBJECT IS EQUIVALENT TO CURVATURE IN UNDERLYING SPACETIME GEOMETRY. From left, a superposition develops over time, e.g., a particle separating from itself, shown as simultaneous curvatures in opposite directions. The magnitude of the separation is related to E, the gravitational self-energy. At a particular time t, E reaches threshold by E = ħ/t, and spontaneous OR occurs, one particular curvature is selected. This OR event is accompanied by a moment of conscious experience (“NOW”), its intensity proportional to E. Each OR event also results in temporal non-locality, referring quantum information backward in classical time (curved arrows).

Penrose E = ħ/t is related to the Heisenberg “uncertainty principle” which asserts a fundamental limit to the precision with which values for certain pairs of physical properties can be simultaneously known. The most common examples are uncertainty in position (x) and momentum (p) of a particle, given by their standard deviations (σₓ and σₚ), whose product σₓσₚ is the uncertainty which must meet or exceed a fundamental limit related to ħ, Planck's constant over 2π. The uncertainty principle is thus usually written as σₓσₚ ≥ ħ/2. Uncertainty can pertain to properties other than position and momentum, and Penrose equated superposition/separation to uncertainty in the underlying structure of space-time itself. Heisenberg's uncertainty principle imposes a limit, causing quantum state reduction.

Space-time uncertainty is expressed as the gravitational self-energy E, the energy required for an object of mass m and radius r (or its equivalent spacetime geometry) to separate from itself by a distance a. For Orch OR, E was calculated for superposition/separation of tubulin proteins at three levels, with three sets of m, r, and a. E was calculated for separation at the level of (1) the entire tubulin protein, (2) atomic nuclei within tubulin, and (3) nucleons (protons and neutrons) within tubulin atomic nuclei. Separation at the level of atomic nuclei (femtometers) was found to dominate, and was used to calculate E (in terms of number of tubulins) for various values of time t corresponding with neurophysiology, e.g., 25 ms for gamma synchrony at 40 Hz. For a conscious event occurring at 25 ms, superposition/separation of 2 × 10¹⁰ tubulins is required, involving microtubules in roughly tens of thousands of neurons (Hameroff and Penrose, 1996a).
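To make the E = ħ/t relation concrete, here is a minimal numerical sketch (Python, illustrative only). It simply evaluates the threshold self-energy for a few candidate durations of a conscious moment; it does not reproduce the actual Orch OR gravitational self-energy calculation for superposed tubulins, which depends on the mass-separation details summarized above. The point is only the inverse relation between E and t.

```python
# Toy numerical check of the Penrose OR threshold relation E = hbar / t.
# Orders of magnitude only; the full Orch OR estimate requires the
# gravitational self-energy of superposed tubulin mass distributions,
# which is not reproduced here.

hbar = 1.054571817e-34  # reduced Planck constant, J*s

def threshold_energy(t_seconds: float) -> float:
    """Gravitational self-energy E (joules) needed for self-reduction after time t."""
    return hbar / t_seconds

for t_ms in (500, 100, 25, 10):   # candidate durations of a conscious moment
    E = threshold_energy(t_ms * 1e-3)
    print(f"t = {t_ms:3d} ms  ->  E = {E:.2e} J")

# Shorter times t demand a larger superposed self-energy E, i.e., more
# entangled tubulins/neurons per conscious moment, consistent with the
# 2 x 10^10 tubulin estimate quoted above for t = 25 ms.
```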

Particular states are chosen in OR due to (1) algorithmic quantum computing by the Schrödinger equation evolving toward E = ħ/t, and (2) influence in the OR process at the moment of E = ħ/t. According to Penrose, this influence, unlike randomness associated with measurement and decoherence, reflects “non-computable values” intrinsic to spacetime geometry. Thus conscious choices in OR (and Orch OR) are neither random nor algorithmically deterministic.

Quantum state reductions are essential to quantum computing which involves superposition of information states, e.g., both 1 and 0 (quantum bits, or “qubits”). Superpositioned qubits entangle and compute (by the Schrödinger equation) until reduction/collapse of each qubit to classical values (“bits”) occurs as the solution. In technological quantum computers, reduction occurs by measurement/observation, introducing a component of randomness. Superposition, entanglement and reduction are also essential to quantum cryptography and quantum teleportation technologies (Bennett and Wiesner, 1992; Bouwmeester et al., 1997; Macikic et al., 2002). Entanglement implies non-locality, e.g., that complementary quantum particles (electrons in coupled spin-up and spin-down pairs) remain somehow connected when spatially (or temporally) separated, each pair member reacting instantaneously to perturbation of its separated partner. Einstein initially objected to entanglement, as it would appear to require signaling faster than light, and thus violate special relativity. He famously termed it “spooky action at a distance,” and described a thought experiment (“Einstein, Podolsky, and Rosen (EPR)”; Einstein et al., 1935) in which each member of an entangled pair of superpositioned electrons (“EPR pairs”) would be sent in different directions, each remaining in superposition and entangled. When one electron was measured at its destination and, say, spin-up was observed, its entangled twin miles away would, according to the prediction, correspondingly reduce instantaneously to spin-down when measured. The issue was unresolved at the time of Einstein's death, but since the early 1980s (Aspect et al., 1982; Tittel et al., 1998) this type of experiment has been repeatedly confirmed through wires, fiber optic cables and via microwave beams through atmosphere. Strange as it seems, EPR entanglement is a fundamental feature of quantum mechanics and reality. How can it be explained?

Penrose (1989; 2004, cf. Bennett and Wiesner, 1992) suggested quantum entanglements are not mediated in a normal causal way, that non-local entanglement (quantum information, or “quanglement,” as Penrose terms it) should be thought of as able to propagate in either direction in time (into the past or into the future). Along similar lines, Aharonov and Vaidman (1990) also proposed that quantum state reductions send quantum information both forward and backward in what we perceive as time, “temporal non-locality.” However it is generally agreed that quantum information going backward in time cannot, by itself, communicate or signal ordinary classical information; it is “acausal.” This restriction is related to elimination of possible causal paradox (e.g., signaling backward in time to kill one's ancestor, paradoxically preventing one's birth). Indeed quantum information going forward in time is also considered acausal, unable to signal classical information either. In quantum cryptography and teleportation, acausal quantum information can only influence or correlate with classical information, but nonetheless greatly enhance capabilities of causal, classical processes.

Penrose suggested acausal backward time effects used in conjunction with classical channels could influence classical results in a way unattainable by classical, future-directed means alone, and that temporal non-locality and acausal backward time effects were essential features of entanglement. He suggested that in EPR (Figure 11), quantum information/quanglement from the measurement/state reduction moves backward in (what we “naively” perceive as classical) time to the unified pair, then to the complementary twin, influencing and correlating its state when measured. Can quantum backward referral happen in the brain?

FIGURE 11. BACKWARD TIME IN EPR ENTANGLEMENT. The Einstein-Podolsky-Rosen (EPR) experiment verified by Aspect et al. (1982), Tittel et al. (1998), and many others. On the left is an isolated, entangled pair of superpositioned complementary quantum particles, e.g., two electrons in spin up and spin down states. The pair is separated and sent to two different, spatially-separated locations/measuring devices. The single electron at the top (in superposition of both spin up and spin down states) is measured, and reduces to a single classical state (e.g., spin down). Instantaneously its spatially-separated twin reduces to the complementary state of spin up (or vice versa). The effect is instantaneous over significant distance, hence appears to be transmitted faster than the speed of light. According to Penrose (2004; cf. Bennett and Wiesner, 1992), measurement/reduction of the electron at the top sends quantum information backward in time to the origin of the unified entanglement, then onward to the twin electron. No other reasonable explanation has been put forth.

Open symposium on scientific theories of consciousness: Integrated Information Theory

Orchestrated Objective Reduction (Orch OR)

Penrose put forth OR as a mechanism for consciousness in physical science (the first, and still only specific proposal). For neurobiological implementation of OR, the Penrose–Hameroff model of “Orch OR” proposed quantum computations terminated by OR in microtubules within brain neurons, “orchestrated” by synaptic inputs, memory and other factors, hence “Orch OR” (Penrose and Hameroff, 1995, 2011; Hameroff and Penrose, 1996a,b; Hameroff, 1998, 2007). Starting with classical microtubule automata (e.g., Rasmussen et al., 1990) in which tubulins in microtubule lattices convey interactive bit states, e.g., of 1 or 0, and are thus capable of classical information processing (Figure 5B), Orch OR also proposed that quantum superpositioned tubulin bits, or “qubits,” e.g., of both 1 AND 0 compute via entanglement with tubulins in the same neuron, and also those in neighboring and distant neurons via gap junctions (Figure 12). The quantum computations evolve by the Schrödinger equation in entangled microtubules in dendrites and cell bodies during integration phases of gap junction-connected integrate-and-fire neurons. Entangled superpositions contribute to increasing gravitational self-energy E. When threshold is met by E = ħ/t, a conscious moment occurs as entangled tubulin qubits simultaneously undergo OR to classical tubulin states which then proceed to trigger (or not trigger) axonal firings, and adjust synapses. Microtubule quantum computations can thus be the “x-factor” in integration regulating axonal firing threshold. Compatible with known neurophysiology, Orch OR can account for conscious causal control of behavior.

FIGURE 12. THREE TOY NEURONS IN AN INPUT/INTEGRATION LAYER. Adjacent dendrites are connected by gap junction electrical synapses in a “dendritic web,” showing internal cytoskeletal microtubules connected by microtubule-associated proteins. Insert: communication/correlation between microtubules through gap junctions by electromagnetic or quantum entanglement, enabling collective integration among gap junction-connected, synchronized neurons and glia.

Entangled superpositions leading to OR and moments of consciousness by E = ħ/t are seen as sequential, only one “consciousness” occurring in the brain at any one time (except perhaps for “split-brain” patients, or those with other cognitive disorders). Superpositions outside the largest, most rapidly evolving gap junction-connected web may decohere randomly, or continue and participate in a subsequent moment of consciousness. The results of each Orch OR conscious moment set initial conditions for the next.

By E = ħ/t, superposition of about 2 × 10¹⁰ tubulins would reach threshold at t = 25 ms, as in 40 Hz gamma synchrony, 40 conscious moments/s. Depending on the percentage of tubulins involved per neuron, this would entail thousands to hundreds of thousands of gap junction-connected neurons per conscious moment at 40 Hz as the NCC (Figure 12). With specific neuronal distributions and brain regions defined by gap junction openings and closings, synchronized “dendritic webs” as the NCC can move and redistribute moment to moment. Within the NCC, consciousness by E = ħ/t may occur on a spectrum of frequencies, at different fractal-like scales of brain activity (He et al., 2010), with deeper order, finer scale entangled processes in microtubules correlating with high frequency, high intensity experience, and larger proportions of brain involvement.

Proteins can act as quantum levers, able to amplify quantum effects into particular classical states (Conrad, 1994). Orch OR suggests that tubulin states and superpositions are initiated by electron cloud dipoles (van der Waals London forces) in clusters of aromatic resonance rings (e.g., in amino acids tryptophan, phenylalanine, tyrosine, Figures 13A–C). London force dipoles are inherently quantum mechanical, tending to superposition. They also mediate effects of general anesthetic gases which act in aromatic clusters (“hydrophobic pockets”) in neuronal proteins including tubulin to selectively erase consciousness (Hameroff, 2006). This suggests a deeper order, finer scale component of the NCC.

FIGURE 13. (A) A microtubule, a cylindrical lattice of peanut-shaped tubulin proteins, with molecular model of enlarged single tubulin with C-termini tails (Craddock et al., 2012c). (B) Tubulin dimer, lower C terminus tail visible. Interior blowup shows aromatic rings clustered in a linear groove, and further blowup of ring structures. (C) Approximate locations of resonance rings suggesting trans-tubulin alignments (see Figure 14A).

Electron movements of one nanometer, e.g., in a London force dipole oscillation, displace atomic nuclei by one Fermi length, 10⁻¹⁵ m, the diameter of a carbon atom nucleus (Sataric et al., 1998), and also the superposition separation distance required for gravitational self-energy E in Orch OR (Hameroff and Penrose, 1996a,b). Thus London forces can induce superposition of an entire protein/tubulin mass, albeit by an extremely tiny separation distance. Nonetheless the protein-level (rather than electron only) superposition separation engenders significant gravitational self-energy E, and thus by E = ħ/t, usefully brief durations of time t for conscious moments and actions.

Orch OR has been criticized on the basis of decoherence in the “warm, wet and noisy” brain, preventing superposition long enough to reach threshold (Tegmark, 2000; cf. Hagan et al., 2001). But photosynthetic proteins have subsequently been shown to routinely use quantum electron superposition in converting light into chemical energy (Engel et al., 2007). Further research has demonstrated warm quantum effects in bird brain navigation (Gauger et al., 2011), ion channels (Bernroider and Roy, 2005), sense of smell (Turin, 1996), DNA (Rieper et al., 2011), protein folding (Luo and Lu, 2011), and biological water (Reiter et al., 2011). Microtubules (Sahu et al., 2012) appear to have kilohertz and megahertz resonances related to enhanced (possibly quantum) conductance through spiral pathways.

Conductance pathways through aromatic ring arrays in each tubulin, aligned with neighboring tubulin arrays following spiral geometry in microtubule lattices, allow helical macroscopic “quantum highways” through microtubules (Figure 14A) suitable for topological quantum computing (Kitaev, 1997; Hameroff et al., 2002; Penrose and Hameroff, 2011). With particular spiral pathways as topological qubits (“braids”) rather than individual tubulins, overall microtubule information capacity is reduced, each topological bit/qubit pathway requiring many tubulins (Figure 14B, Bottom). But topological qubits are robust, resist decoherence, and reduce to classical helical pathways (or combinations) which can, with each conscious moment, regulate synapses and trigger axonal firings.

FIGURE 14. (A) Alignment of aromatic ring structures in tubulins and through the microtubule lattice suggests different helical pathways, possible macroscopic “quantum highways,” e.g., following the Fibonacci sequence in the A lattice. (B) Top: superpositioned tubulins (gray) increase through first three steps (neuronal integration) until threshold is met by E = ħ/t, resulting in Orch OR, a conscious moment, and selection of classical tubulin states which may trigger axonal firing. (B) Bottom: same as (A), but with topological qubits, i.e., different helical pathways represent information. One particular pathway is selected in the Orch OR conscious moment.

 Two Orch OR conscious moments underlie gamma synchrony electrophysiology in an integrate-and-fire neuron. Quantum superposition E evolves during integration, increasing with time until threshold is met at E = ħ/t (t = 25 ms), at which instant an Orch OR conscious moment occurs (intensity proportional to E), and classical states of tubulin are selected which can trigger (or not trigger) axonal firings which control actions and behavior (as well as regulate synaptic strength and record memory).

TWO ORCH OR EVENTS (SOLID LINES) UNDERLIE INTEGRATE-AND-FIRE ELECTROPHYSIOLOGY (DOTTED LINES) IN NEURONS. Orch OR and conscious moments occur here at t = 25 ms (gamma synchrony), with E then equivalent to superposition of approximately 2 × 10¹⁰ tubulins. Each Orch OR moment occurs with conscious experience, and selects tubulin states which can then trigger axonal firings. Each Orch OR event can also send quantum information backward in perceived time.

Each Orch OR quantum state reduction also causes temporal non-locality, sending quantum information/quanglement (with gravitational self-energy E) backward in what we perceive as classical time, integrating with forward-going E to help reach E = ħ/t, perhaps earlier than would otherwise occur (Figure 2B). As described previously, Orch OR temporal non-locality and backward time referral of quantum information can provide real-time conscious causal control of voluntary actions (Figure 6; cf. Wolf, 1998; Sarfatti, 2011).

Do backward time effects risk causal paradox? In classical physics, the cause of an effect must precede it. But backward-going quanglement is acausal, only able to influence or correlate with information in a classical channel, e.g., as occurs in quantum entanglement, cryptography and teleportation. And according to some quantum interpretations, backward time effects can't violate causality if they only alter past events whose subsequent effects had not been consciously observed (“If a tree falls ….”). In the experimental studies cited here (Libet, pre-sentiment/Bem, delayed choice) backward referral itself is non-conscious (though Libet refers to it as “subjective experience”) until reduction occurs in the present. There is no causal paradox.

If conscious experience is indeed rooted in Orch OR, with OR relating the classical to the quantum world, then temporal non-locality and referral of acausal quantum information backward in time is to be expected (Penrose and Hameroff, 2011). Temporal non-locality and backward time referral can rescue causal agency and conscious free will.

David Baker, University of Washington | Machine Learning Workshop

Conclusion: How Quantum Brain Biology can Rescue Conscious Free Will

Problems regarding conscious “free will” include: (1) the need for a neurobiological mechanism to account for consciousness and causal agency, (2) conscious perceptions apparently occurring too late for real-time conscious responses, and (3) determinism. Penrose–Hameroff “Orch OR” is a theory in which moments of conscious choice and experience are identified with quantum state reductions in microtubules inside neurons. Orch OR can help resolve the three problematic issues in the following ways.

A Mechanism for Consciousness and Causal Agency

Orch OR is based on sequences of quantum computations in microtubules during integration phases in dendrites and cell bodies of integrate-and-fire brain neurons linked by gap junctions. Each Orch OR quantum computation terminates in a moment of conscious experience, and selects a particular set of tubulin states which then trigger (or do not trigger) axonal firings, the latter exerting causal behavior. Orch OR can in principle account for conscious causal agency.

Does Consciousness Come Too Late?

Brain electrical activity appearing to correlate with conscious perception of a stimulus can occur after we respond to that stimulus, seemingly consciously. Accordingly, consciousness is deemed epiphenomenal and illusory (Dennett, 1991; Wegner, 2002). However, evidence for backward time effects in the brain (Libet et al., 1983; Bem, 2012; Ma et al., 2012), and in quantum physics (e.g., to explain entanglement; Penrose, 1989, 2004; Aharonov and Vaidman, 1990; Bennett and Wiesner, 1992), suggests that quantum state reductions in Orch OR can send quantum information backward in (what we perceive as) time, on the order of hundreds of milliseconds. This enables consciousness to regulate axonal firings and behavioral actions in real time, when conscious choice is felt to occur (and actually does occur), thus rescuing consciousness from necessarily being an epiphenomenal illusion.

Determinism

Is the universe unfolding (in which case free will is possible), or does it exist as a “block universe” with pre-determined world-lines, our actions pre-determined by algorithmic processes? In Orch OR, consciousness unfolds the universe. The selection of states, according to Penrose, is influenced by a non-computable factor, a bias due to fine scale structure of spacetime geometry. According to Orch OR, conscious choices are not entirely algorithmic.

Orch OR is a testable quantum brain biological theory compatible with known neuroscience and physics, and able to account for conscious free will.

More Information:

https://mappingignorance.org/2015/06/17/on-the-quantum-theory-of-consciousness/

https://www.frontiersin.org/articles/10.3389/fnint.2012.00093/full

https://en.wikipedia.org/wiki/Orchestrated_objective_reduction

https://www.quantamagazine.org/a-new-spin-on-the-quantum-brain-20161102/

https://thenextweb.com/news/reverse-engineering-consciousness-is-the-brain-a-quantum-computer

https://www.nature.com/articles/s41586-021-03588-y

https://www.science.org/stoken/author-tokens/st-191/full

https://quantumai.google

https://www.tensorflow.org/quantum

https://quantumai.google/team

https://quantumai.google/research/conferences

https://github.com/tensorflow/quantum/tree/research


Superconducting Transmon Qubit


What are transmon qubits

In quantum computing, and more specifically in superconducting quantum computing, a transmon is a type of superconducting charge qubit that was designed to have reduced sensitivity to charge noise. The transmon was developed by Robert J. Schoelkopf, Michel Devoret, Steven M. Girvin, and their colleagues at Yale University in 2007.[1][2] Its name is an abbreviation of the term transmission line shunted plasma oscillation qubit, one which consists of a Cooper-pair box "where the two superconductors are also capacitively shunted in order to decrease the sensitivity to charge noise, while maintaining a sufficient anharmonicity for selective qubit control".[3]

The transmon achieves its reduced sensitivity to charge noise by significantly increasing the ratio of the Josephson energy to the charging energy. This is accomplished through the use of a large shunting capacitor. The result is energy level spacings that are approximately independent of offset charge. Planar on-chip transmon qubits have T1 coherence times ~ 30 μs to 40 μs.[5] By replacing the superconducting transmission line cavity with a three-dimensional superconducting cavity, recent work on transmon qubits has shown significantly improved T1 times, as long as 95 μs.[6][7] These results demonstrate that previous T1 times were not limited by Josephson junction losses. Understanding the fundamental limits on the coherence time in superconducting qubits such as the transmon is an active area of research.
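To see why the large ratio of Josephson energy E_J to charging energy E_C matters, here is a minimal numerical sketch (not tied to any particular device; parameter values are illustrative, and the symbols E_J, E_C, n_g follow the standard Cooper-pair-box convention rather than anything defined in the text above). It diagonalizes the standard Cooper-pair-box Hamiltonian in the charge basis and prints how much the 0-to-1 transition energy shifts as the offset charge n_g moves between 0 and 0.5; as E_J/E_C grows, that charge dispersion shrinks rapidly, which is the transmon's defense against charge noise.

```python
# Sketch of why a large E_J/E_C ratio suppresses charge-noise sensitivity.
# Diagonalize the standard Cooper-pair-box Hamiltonian in the charge basis:
#   H = sum_n 4*E_C*(n - n_g)^2 |n><n|  -  (E_J/2) * (|n><n+1| + |n+1><n|)
# Energies are in arbitrary units; values are illustrative, not a device model.
import numpy as np

def transition_energy(EJ, EC, ng, ncut=20):
    n = np.arange(-ncut, ncut + 1)
    H = np.diag(4.0 * EC * (n - ng) ** 2)          # charging term
    off = -0.5 * EJ * np.ones(len(n) - 1)          # Josephson tunneling term
    H += np.diag(off, 1) + np.diag(off, -1)
    evals = np.linalg.eigvalsh(H)
    return evals[1] - evals[0]                     # qubit (0 -> 1) transition energy

EC = 1.0
for EJ_over_EC in (1, 10, 50):                     # from charge-qubit to transmon regime
    EJ = EJ_over_EC * EC
    e_mid = transition_energy(EJ, EC, ng=0.0)
    e_sweet = transition_energy(EJ, EC, ng=0.5)
    print(f"EJ/EC = {EJ_over_EC:3d}: charge dispersion ~ {abs(e_sweet - e_mid):.3e}")
```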


The transmon qubit | QuTech Academy

What are Transmon Qubits?

TRANSMON QUBIT - Small-scale quantum processors, built on the superconducting transmon qubit, demonstrated already the anticipated quantum speed-up. [1] In this work, we have realized a quantum metamaterial consisting of eight individually controllable superconducting transmon qubits, which are coupled to the mode continuum of a one-dimensional coplanar waveguide. [2] The device consists of a superconducting transmon qubit coupled to the open end of a transmission line. [3] Here, we investigate the impact of the intrinsic properties of two-dimensional transmon qubits on quasiparticle tunneling (QPT) and discuss how we can use quasiparticle dynamics to gain critical information about the quality of JJ barrier and device performance. [4] A simplified model of a transmon qubit coupled to a cavity resonator is used to demonstrate a quantum circuit. [5] The superconducting transmon qubit is currently a leading qubit modality for quantum computing, but gate performance in quantum processor with transmons is often insufficient to support running complex algorithms for practical applications. [6] We show how recent advances in circuit quantum electrodynamics, specifically, the realization of galvanic coupling of a transmon qubit to a high-impedance transmission line, allows the observation of inelastic collisions of single microwave photons with instantons (phase slips). [7] In this paper, we present simulation results of “hexagonal” transmon qubit in a superconducting coplanar waveguide (CPW) resonator. [8] Particularly for superconducting transmon qubits, this leakage opens a path to errors that are correlated in space and time. [9] The superconducting transmon qubit is a leading platform for quantum computing and quantum science. [10] We investigate transmon qubits made from semiconductor nanowires with a fully surrounding superconducting shell. [11] We find quantum memory, particularly resonant cavities with transmon qubits arranged in a 2. [12] We have demonstrated a novel type of superconducting transmon qubit in which a Josephson junction has been engineered to act as its own parallel shunt capacitor. [13] Quantum heat transfer through a generic superconducting set-up consisting of a tunable transmon qubit placed between resonators that are termined by thermal reservoirs is explored. [14] This work begins to address this by providing a new mathematical description of a commonly used circuit QED system, a transmon qubit coupled to microwave transmission lines. [15] Here, we use the quarton to yield purely nonlinear coupling between two linearly decoupled transmon qubits. [16] By modulating the flux through a transmon qubit, we realize a swap between the qubit and its readout resonator that suppresses the excited state population to 0. [17] To assess scalability, we identify the types of “frequency collisions” that will impair a transmon qubit and cross-resonance gate architecture. [18] As an example, we simulate a 2D square array of 100 superconducting transmon qubits. [19] Here, we combine measurements of transmon qubit relaxation times (T1) with spectroscopy and microscopy of the polycrystalline niobium films used in qubit fabrication. [20] Here, we show a hybrid device, consisting of a superconducting transmon qubit and a mechanical resonator coupled using the magnetic-flux. [21] There was a recent demonstration of a microwave quantum memory using microwave cavities coupled with a transmon qubit. 
[22] Doing so, we measure a transmon qubit using a single, chip-scale device to provide both parametric amplification and isolation from the bulk of amplifier backaction. [23] As an illustration of our method, we show results of the optimization of a two-qubit gate using transmon qubits in the circuit QED architecture. [24] Specifically, we consider using transition edge sensors colocated on silicon substrates hosting superconducting qubits to monitor for energy injection from ionizing radiation, which has been demonstrated to increase decoherence in transmon qubits. [25] 

IQM - Unimon qubit

Here, we demonstrate a novel coupling architecture for transmon qubits that circumvents the standard relationship between desired and undesired interaction rates. [26] Specifically, we investigate the intermediate bistable regime of the generalized Jaynes-Cummings Hamiltonian (GJC), realized by a circuit quantum electrodynamics (cQED) system consisting of a transmon qubit coupled to a microwave cavity. [27] Superconducting transmon qubits are the leading platform in solid-state quantum computing and quantum simulation applications. [28] We characterize highly coherent transmon qubits fabricated with a direct-write photolithography system. [29] We experimentally test this entropic uncertainty relation with strong and weak measurements of a superconducting transmon qubit. [30] Advanced optical lithography is an alternative patterning method, and we report on the development of transmon qubits patterned solely with optical lithography. [31] We treat the example of a transmon qubit coupled to a stripline resonator. [32] Here we employ numerically exact methods to study realistic implementations of a transmon qubit embedded in electromagnetic environments focusing on the most important system-reservoir correlation effects such as the Lamb shift and entanglement. [33] Such features make our pulses an effective alternative as the spectral hole-burning pulses to reduce the number of repetitions of pulses, and can also be applied to initialize the multiple qubits where the driving frequency varies with time or position in superconducting transmon qubit systems. [34] We theoretically study the transmission properties and the PT -symmetry in a hybrid quantum electromechanical system consisting of a coplanar-waveguide (CPW) microwave cavity, a nanomechanical resonator (NAMR) and a superconducting transmon qubit, where the qubit works in the large detuning condition with both the CPW microwave cavity and the NAMR. [35] Based on such junctions, MoS2 transmon qubits are engineered and characterized in a bulk superconducting microwave resonator for the first time. [36] A transmon qubit in the 3-dimensional microwave cavity is a versatile system for various circuit QED experiments. [37] Most importantly this kind of transducer gives us new possibilities for quantum information processing, since it can be used as a superconducting transmon qubit, which can be simultaneously coupled to two acoustic cavities. [38] We theoretically investigate resonant dipole-dipole interaction (RDDI) between artificial atoms in a 1D geometry, implemented by N transmon qubits coupled through a transmission line. [39] Here we present an experimental approach to fabricate mm-wave superconducting resonators that could be combined with transmon qubits and used in future microwave-mm-wave converters that distribute entanglement at a high rate in low-loss quantum networks. [40] The theory agrees well with experimental results for continuous measurement of a transmon qubit. [41] We argue that this need not always be the case, and consider a modification to a leading quantum sampling problem-- time evolution in an interacting Bose-Hubbard chain of transmon qubits [Neill et al, Science 2018] -- where each site in the chain has a driven coupling to a lossy resonator and particle number is no longer conserved. [42] The first theme concerns the experimental realisation of a tuneable coupling scheme, giving rise to different interactions with adjustable ratios, between two transmon qubits. 
[43] We find that defects at circuit interfaces are responsible for about 60% of the dielectric loss in the investigated transmon qubit sample. [44] We implement the quantum approximate optimization algorithm on our hardware platform, consisting of two superconducting transmon qubits and one parametrically modulated coupler. [45] Superconducting transmon qubits are of great interest for quantum computing and quantum simulation. [46] We refine the method of delayed vectors, adapted from classical chaos theory to quantum systems, and apply it remotely on the IBMQ platform -- a quantum computer composed of transmon qubits. [47] Here, we use the cross-resonance interaction to implement a gate between two superconducting transmon qubits with a direct static dispersive coupling. [48] A common approach to realize conditional-phase (CZ) gates in transmon qubits relies on flux control of the qubit frequency to make computational states interact with non-computational ones using a fast-adiabatic trajectory to minimize leakage. [49] We use four transmon qubits coupled to individual rectangular cavities which are aperture-coupled to a common rectangular waveguide feedline. [50]


An alternative superconducting qubit achieves high performance for quantum computing

Quantum computers, devices that exploit quantum phenomena to perform computations, could eventually help tackle complex computational problems faster and more efficiently than classical computers. These devices are commonly based on basic units of information known as quantum bits, or qubits.

Researchers at Alibaba Quantum Laboratory, a unit of Alibaba Group's DAMO research institute, have recently developed a quantum processor using fluxonium qubits, which have so far not been the preferred choice of industry teams developing quantum computers. Their paper, published in Physical Review Letters, demonstrates the potential of fluxonium for developing high-performing superconducting circuits.

"This work is a critical step for us in advancing our quantum computing research," Yaoyun Shi, Director of Alibaba's Quantum Laboratory, told Phys.org. "When we started our research program, we decided to explore fluxonium as the building block for future quantum computers, deviating from the mainstream choice of the transmon qubit. We believe that this relatively new type of superconducting qubit could go much further than transmon."

While some past studies had already explored the potential of quantum processors based on fluxonium qubits, most of them primarily offered proofs of concept, which were realized in university labs. For these "artificial atoms" to be implemented in real quantum computers and compete with transmons (i.e., widely used qubits), however, they would need to demonstrate a high performance on a wide range of operations, within a single device. This is precisely the key objective of this work.

Fluxonium qubits have two characteristics that set them apart from transmons: their energy levels are far more uneven (i.e., "anharmonic"), and they use a large inductor in place of the capacitor used in the transmon. Both contribute to fluxonium's advantage, at least theoretically, in being more resilient to errors, leading to better "coherence," i.e., holding quantum information for a longer time, and "higher fidelity," i.e., accuracy, in realizing elementary operations.

"One can picture the energy levels forming a ladder," Chunqing Deng, who led the study, explained. "The energy gaps are important, because each quantum instruction has a 'pitch,' or frequency, and it triggers transitions between two levels when the pitch matches their energy gaps."

Essentially, when the first two energy gaps between levels are close in size, as they are in the transmon, a "call" for the transition between the first two energy levels (i.e., the "0" and "1" states) can accidentally also trigger a transition between the second and third levels. This can bring the state outside of the valid computational space, leading to what is known as a leakage error. In fluxonium, on the other hand, the gap separating the second and third energy "steps" is very different from the first, which reduces the risk of leakage errors.

"In principle, the design of fluxonium is simple: it consists of two elementary components—a 'Josephson junction' shunted with a large inductor, which is similar, in fact, to that of a transmon, which is a Josephson junction shunted with a capacitor," Chunqing said. "The Josephson junction is the magical component that creates anharmonicity in the first place. The large inductor is often, as in our case as well, implemented by a large number (in our work, 100) of Josephson junctions."

Replacing the capacitor with an inductor in fluxonium removes the "islands" resulting from the electrodes and the source of "charge noises" caused by electron charge fluctuations, thus making fluxonium more error-proof. This is, however, at the expense of much more demanding engineering, due to the large array of Josephson junctions.
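For readers who want to see where the anharmonicity comes from, here is a hedged numerical sketch of one standard single-mode fluxonium Hamiltonian (charging term plus inductive term plus Josephson term), discretized in the phase basis with finite differences. The parameter values are round, illustrative numbers, not those of the Alibaba device; the takeaway is simply that the 0-to-1 and 1-to-2 transition energies come out very different, unlike the nearly evenly spaced transmon ladder.

```python
# Illustrative sketch of a single-mode fluxonium spectrum (one common gauge):
#   H = 4*E_C*n^2 + (E_L/2)*phi^2 - E_J*cos(phi - phi_ext)
# diagonalized by finite differences in the phase basis. Parameters are
# order-of-magnitude placeholders, not a model of any specific device.
import numpy as np

def fluxonium_levels(EC, EJ, EL, phi_ext, npts=1501, phi_max=5 * np.pi):
    phi = np.linspace(-phi_max, phi_max, npts)
    h = phi[1] - phi[0]
    # Kinetic term 4*E_C*n^2 = -4*E_C*d^2/dphi^2 via a finite-difference Laplacian
    main = np.full(npts, 2.0 / h**2)
    off = np.full(npts - 1, -1.0 / h**2)
    T = 4.0 * EC * (np.diag(main) + np.diag(off, 1) + np.diag(off, -1))
    V = np.diag(0.5 * EL * phi**2 - EJ * np.cos(phi - phi_ext))
    return np.linalg.eigvalsh(T + V)[:3]

# Evaluate at the half-flux "sweet spot" phi_ext = pi with illustrative energies.
E0, E1, E2 = fluxonium_levels(EC=1.0, EJ=4.0, EL=0.8, phi_ext=np.pi)
print(f"0->1 transition: {E1 - E0:.3f}   1->2 transition: {E2 - E1:.3f}")
# The two transitions differ strongly (large anharmonicity), so driving the
# 0->1 transition is far less likely to excite 1->2 by accident.
```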

Introduction to Fluxonium-like devices & Blochnium - Vladimir Manucharyan

Fluxonium's advantage in coherence can be greatly amplified into high gate fidelities if the gates are fast. Such fast gates are achieved through the "tunability" feature demonstrated by the researchers: the energy gap, or "frequency," between the "0" and "1" states can be rapidly changed, so that two qubits can quickly be brought "into resonance," that is, made to have the same frequency. When in resonance, the two qubits evolve together to realize the most critical building block of a quantum computer: two-qubit gates.

In initial tests, the quantum platform designed by Chunqing and his colleagues was found to attain an average single-qubit gate fidelity of 99.97% and a two-qubit gate fidelity of up to 99.72%. These values are comparable to some of the best results achieved by quantum processors in previous studies. Besides single- and two-qubit gates, the team also integrated, in a robust manner, other basic operations needed for a digital quantum computer—reset and readout.

The 2-qubit processor developed by this team of researchers could open new possibilities for the use of fluxonium in quantum computing, as it significantly outperformed other proof-of-concept processors introduced in the past. Their work could inspire other teams to develop similar designs, substituting transmon with fluxonium qubits.

"Our study introduces an alternative choice to the widely adopted transmon," Chunqing said. "We hope that our work will inspire more interest in exploring fluxonium, so that its full potential can be unlocked to achieve a significantly higher performance in fidelity, which will in turn significantly reduce the overhead of realizing fault-tolerant quantum computing. What this means is that, for the same computational task, a higher-fidelity fluxonium quantum computer may need a significantly smaller number of qubits."

Essentially, Chunqing and his colleagues showed that fluxonium-based processors could perform far more powerful computations than transmon-based ones, using the same number of physical qubits. In their next studies, the team would like to scale up their system and try to make it fault-tolerant while retaining a high fidelity.

"We now plan to validate our hypothesis that fluxonium is indeed a much better qubit than transmon and then march towards the community's next major milestone of realizing fault tolerance, using ultra-high-fidelity fluxonium qubits," Yaoyun added. "We believe fluxonium has the potential to be more widely recognized, as we are not even close to any theoretical limit of high-fidelity operation yet. It is important to keep pushing this direction."

Introduction to Transmon Physics

Introduction to Transmon Qubits and Qiskit Pulses

IBM held a Quantum Computing Challenge from May 20th to May 27th. This year was special, as it marked the 40th anniversary of the Physics of Computation Conference and also the 5th anniversary of IBM Quantum putting a quantum computer on the cloud. The best thing about participating in IBM challenges is undoubtedly the wealth of learning and experience that comes with them. This is my first Medium blog, and I couldn't think of anything but Quantum Computing to start with. All the challenge questions can be seen here. I would like to mention that a lot of my understanding is based on Zlatko Minev's lectures on this topic at the 2020 Qiskit Global Summer School. There were many things he covered which I am yet to understand well, but those lectures are absolutely amazing. You can find them here. A lot of images in this blog have been taken from the slides presented at QGSS 2020.

Through this blog, I am attempting to explain, to the extent I have understood it, some simple Physics behind transmon qubits, and then dive into Qiskit Pulses. I would like to urge you to pitch in with comments in case some aspects could be explained better, and I am definitely looking forward to feedback on how I could improve this blog. Challenge 4 from IBM Quantum Challenge 2021 was about Qiskit Pulses. Coincidentally, I got very interested in understanding different quantum hardware architectures a couple of months back, when IBM and many other companies published blogs on how they want users to understand and contribute to quantum hardware. Honestly, I had never imagined I would get this interested in Quantum Physics before.

I would like to mention that, specifically to do this challenge, all the details I am going to write down in this blog might not necessarily be required. My attempt here is to explain details that might help us get more insight, or at least a big picture, of what superconducting (transmon) qubits are and what Qiskit Pulses help us achieve, with code. The blog will be mostly theory in the beginning, and then we will get to some code. I expect it to be a lengthy blog, but I will try my best to make it as interesting for you as it is for me. So, let's get started:

So far, most of us have seen quantum circuits at a very high level. With Qiskit, we build circuits using pre-defined gates like the Pauli gates (X, Y, Z), rotations (rx, ry, rz) and the Hadamard gate for a single qubit, and controlled gates for 2 or more qubits, e.g., CNOT, CU, Toffoli (which you also saw in Challenge 1). Simply speaking, Qiskit Pulse helps us do these operations at the level of this fancy-looking device below, through an interface:

I will be covering some minor details about this device, but for now, what I mean is that though some properties are intrinsic to the materials and components used in this fridge, what a user essentially does with Qiskit Pulse is control what happens in the wires inside the fridge. You could think of it as being given access to some control knobs, and you decide how to move those knobs based on what signals you want to pass through those wires. So if you want to rotate a qubit by some angle, you pass a signal with certain attributes (frequency, amplitude, duration) so that the qubit actually rotates by that angle. Qubits don't exist ready-made in nature; we are building a system in which we can mimic quantum behavior and realize a qubit. So at the hardware level, we will soon see what it actually means to rotate a qubit on a Bloch sphere, but for that we will have to get into some Physics. I am attempting to explain without assuming much prior knowledge of Physics. Some Physics you might have learnt in high school would help, but it is not required.
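As a taste of what that "control knob" access looks like in code, here is a minimal Qiskit Pulse sketch. The exact builder API has shifted across Qiskit versions, and the frequency, amplitude and duration below are placeholders rather than calibrated values for any real backend; the point is only that a qubit rotation is expressed as "play this envelope, at this frequency, for this long, on this drive channel."

```python
# Minimal Qiskit Pulse sketch (API details vary by Qiskit version; the numbers
# are placeholders, not calibrated values for any real backend). It expresses
# what the text describes: choose a frequency, an amplitude and a duration,
# and "play" that signal on the drive line of qubit 0.
from qiskit import pulse
from qiskit.pulse.library import Gaussian

drive_chan = pulse.DriveChannel(0)           # the control "wire" going to qubit 0

with pulse.build(name="rotate_qubit_0") as schedule:
    pulse.set_frequency(5.0e9, drive_chan)   # assumed/placeholder qubit frequency (Hz)
    pulse.play(                              # the microwave envelope itself
        Gaussian(duration=160,               # length in backend time steps (dt)
                 amp=0.2,                    # amplitude sets the rotation angle
                 sigma=40),                  # width of the Gaussian envelope
        drive_chan)

print(schedule)                              # inspect the resulting pulse schedule
```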

Fluxonium qubits for ultra-high-fidelity and scalable quantum processors - Chunqing Deng

Building Intuition for Quantum Systems and Quantum Harmonic Oscillator

Let’s deviate from the device for now. Have you heard about a Simple Harmonic Oscillator or Simple Harmonic Motion? If you studied Physics in high school, you might have come across it. Even if you haven’t, it doesn’t matter and you could still continue reading. Oscillator, as most of us might have understood, is just something that oscillates. The most famous example of a simple harmonic motion can be seen in the gif below, a mass tied to a spring moving back and forth. Why is it called simple harmonic? It is simple because we are not considering any external forces that usually come into picture, for eg: friction of the surface. If friction acted, you might have guessed that the movement of the spring would eventually get restricted and stop after a point. Typically, some external force like you or I would pull the string to an end and release it, after which the oscillations begin.

Simple Harmonic Oscillator (Spring experiment)

You might be wondering why we are talking about Simple Harmonic Oscillator. The idea is to build our intuition for something called Quantum Harmonic Oscillator. Let’s understand the Simple Harmonic Oscillator.

Spring experiment (Potential Energy)

Ep in the image above is called Potential Energy. You can think of potential energy as the energy an object contains due to its position and arrangement. A simple example is a ball held up in the air: the higher it is above the ground, the greater its potential energy (PE). Another important type of energy is Kinetic Energy (KE), which an object possesses due to its motion. In the equation describing the PE of our spring system, Ep = ½kx², k is the spring constant (a property of the spring; in Science there are many such constants, mostly determined through experiments). The equation should remind you of a famous curve in Mathematics. If you guessed parabola, you are right: the relation between PE and the displacement x of the object from its natural (equilibrium) position forms a parabola. You could plot PE for different values of x and you should get a curve like the one below. One point to note in the above image is the direction of the force (F) relative to the displacement (x): the force acts in the opposite direction to x, since the spring naturally resists displacement from equilibrium, a common phenomenon you might have observed in nature.

Relation between Displacement and P.E

Well, the PE changes, but we also discussed KE above. This brings me to a very important law in Physics called the Conservation of Energy, according to which the total energy of an isolated system remains constant. In reality, there are many types of energy besides PE and KE associated with a system, but in our spring-and-mass example we can take PE and KE to be the only ones. So, total energy of our system = PE + KE. Energy can neither be created nor destroyed; it can only be transformed from one type to the other. So in our experiment, PE is highest at the maximum displacement from equilibrium, where KE is momentarily zero; as the spring pulls the object back toward equilibrium, KE increases and reaches its maximum at the equilibrium position, where PE drops to zero. Again, why am I discussing all this? It is important for understanding a central concept in quantum systems, the Hamiltonian operator. The Hamiltonian keeps coming up in Quantum Mechanics and even in Quantum Computing; you will see that it is a very important concept in Challenge 5 as well. Simply speaking, the Hamiltonian is the total energy of the system, but we will discuss more about it later in this blog.

Total Energy is conserved (Classical system, Spring)

Quantum Systems and Quantum Harmonic Oscillator

Now, let's finally discuss quantum systems. One thing to note is that many things are simpler to understand in Quantum Mechanics when we can think of analogies in Classical Physics, and that's the reason we started with a Simple Harmonic Oscillator.

When we talk about quantum systems, we are talking about systems consisting of objects like atoms and subatomic particles (electrons, protons etc.). You might have heard about atoms and subatomic particles in high school if you studied Physics and Chemistry then. Atoms are considered the smallest units of any matter. Subatomic particles, as the name suggests, are constituents of an atom. Electrons are negatively charged and protons are positively charged. Charge is a fundamental physical property of atoms and matter; you can think of it as a property because of which nature exists the way it does. The design of an atom (its structure and its constituents like electrons and protons) enables it to react with other atoms in the most stable way, forming molecules, which form matter and literally everything in nature.

These atoms and subatomic particles oscillate, and their motion is called quantum harmonic motion. How is quantum harmonic motion different from simple harmonic motion? It turns out that the Potential Energy (PE) of the quantum system relates to something called magnetic flux (I will come to this) in the same way as the PE of our spring-object system relates to displacement. That relationship is again plotted as a parabola, but there is one major difference: energy in quantum systems is quantized! Let me explain this.

Transmons aren’t forever

When scientists wanted to understand atoms, they conducted an experiment on the atomic emission of light. They filled a tube with hydrogen atoms and excited them, and, as is the case with everything in nature, the atoms wanted to return to their lower, stable states. So they started emitting energy in the form of another quantum particle called the photon. Since energy is always conserved, a photon carries exactly as much energy as the difference between the excited state and the lower energy state of the atom (the experiment was carried out in such a way that there were minimal external factors interfering with the setup). A photon is also defined as a quantum of light, and its energy is a function of its frequency (think of it as the frequency of a wave). The scientists passed the photons emitted from the tube through lenses and prisms, and found that photons of only specific frequencies were being emitted (there were distinct colors of light after passing them through the prism). This made them realize that the atoms can only take a few specific, discrete energy states. This is what we mean when we say energy is quantized.

I earlier mentioned the analogy between the simple harmonic oscillator and the quantum harmonic oscillator in terms of the parabolic plot. Here is what the corresponding relation between PE and magnetic flux (I will come to this soon) looks like:

Quantum Harmonic Oscillator (atoms) energy landscape

There are a couple of interesting points to note here. We see waves at each energy state, and each state is written in what we know well as the braket notation of quantum states. In terms of 0 and 1, |3> would mean |11>. Every quantum state is a wave function. The |0> state can be thought of as a discrete wave over the two possible positions (0 and 1), with amplitude 1 at 0 and no amplitude at 1, which means there is a 100% chance of finding the system at 0. The best way to describe a quantum state is as a probability distribution, and waves help us do this because all we need to do to know the probability at a point is square the amplitude of the wave there (so the valid states are only those for which we get valid probabilities). The waves at each state in the image above are best explained by the famous Schrodinger's Equation. This equation simply tells us how the wave function of a state evolves with time, and the evolution depends on the Hamiltonian operator (H).

Schrodinger’s Equation

Another quick point to note is the 'classically forbidden region'. What does that even mean? Remember that in the spring experiment there was a maximum displacement; it was not physically possible to stretch the spring beyond it. Strangely and surprisingly, quantum objects can be found beyond this classically defined maximum displacement. That's the reason you see the wave functions extending beyond the borders of the parabola.

Another point to note is that the Quantum Harmonic oscillator has its subsequent energy levels or states at equal distances, pictorially like below. The three red arrows are equal in length.

Transmon Qubits

Now, I think we know enough to get into the details of the design of transmon qubits. When we say qubits, we essentially want quantum analogs of bits, which take only 0 and 1 as values. What this means is that we somehow need to ensure that the atom or quantum particle we are dealing with remains within the |0> and |1> energy states. Theoretically, what do you think we need to change in the above image of the quantum harmonic oscillator to achieve this? One answer is to bring in some non-linearity, and by non-linearity I mean that the energy states are no longer equidistant the way they are in the image above. This non-linearity is called 'Anharmonicity'. We will get into the details around this soon, but before that, I would like to mention some things about the energy states and the Hamiltonian. Schrodinger's Equation is sometimes also written as:
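
(The original equation image is not reproduced here; this is the standard time-independent form it refers to.)

\hat{H}\,|\Psi\rangle = E\,|\Psi\rangle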

This should remind you of something in Linear Algebra. Remember, if for a matrix A we can write A|x> = b|x>, where b is some complex number and A and |x> can contain complex numbers, then |x> is an eigenstate (eigenvector) of A and b is the corresponding eigenvalue. So, in the equation above, the Hamiltonian H is a matrix (remember, it is the operator representing the total energy of the system), Ψ is an eigenstate of H, and E (the energy value) is the corresponding eigenvalue. So, all the energy states we have been showing in the images above are eigenstates of the Hamiltonian of the quantum system we are dealing with.

A quick note about operators in general: every operator in quantum mechanics relates to something we want to measure about a quantum system, and the measurement is with respect to the orthonormal basis formed by the eigenstates of that operator. For example, we can have a position operator, a momentum operator and so on. In the case of the Hamiltonian, it is the total energy and, more importantly, its split into PE and KE. What we have been seeing so far in the images above are plots of PE with respect to magnetic flux (soon to be covered) in the case of quantum systems. The concept of measurement might seem a little confusing here because, so far, when we dealt with quantum circuits at a high level, we thought of measurements in terms of |0> and |1> for every qubit, and the eigenstates of an arbitrary operator need not be |0> and |1>. The fact is that, technically, measurements can be done with respect to any orthonormal basis. In Quantum Computing we usually convert measurements to the Z-basis (|0> and |1>, which are the eigenstates of the Pauli Z matrix) because, practically, we want to see our outputs in terms of 0s and 1s only, but I think it's worth noting that every operation we view as a rotation about the Bloch sphere is also the same as measuring something with respect to an operator.

Coming back to 'Anharmonicity': we discussed that we would like a setup where the energy states are not equidistant. This is facilitated by something called a 'Josephson Junction'. Adding a Josephson junction results in a similar ladder of energy states as the quantum harmonic oscillator, but with different energy values (eigenvalues) compared to the harmonic counterpart. In terms of the energy diagram, we want something like below:

How would the above ensure we stay within |0> and |1>? The lighter red arrow is equal in length to the darker red arrow, but its tip does not land on any energy state. Since the energy states are discrete, the quantum system has no option but to fall back to the state it was excited from (in this case |1>). The only way to reach an energy state is for the arrow to land more or less exactly on it. Note that even though the lighter red arrow seems to cross |2>, the system falls back to |1> because that is where it was excited from in this example.

We will discuss more about this soon, but let’s get back to our fancy looking quantum computer. Now, we will dive a little deeper into its components.

Control of transmon qubits using a cryogenic CMOS integrated circuit (QuantumCasts)

The Quantum chip, chips placed at the bottom.

You see, the quantum chips are actually placed at the bottom of this fridge. One thing to note is that, as with the atomic emission of light we discussed, the photons involved here have very low frequencies, and to deal with such low energy levels we need to keep the chip at an extremely low temperature, with as little external disturbance (noise) as possible. The chip also has readout resonators for the purpose of measuring the quantum circuits. Both the qubit and the readout resonator are constructed in a very similar way, except that the qubit has a 'Josephson Junction'. I will come to the classical electronic circuit design and the corresponding qubit design soon, but let's also discuss some other parts of the fridge that will be helpful when we work with Qiskit Pulse. Somewhere in the middle of the fridge there are many types of wires or channels connected to different parts of the chip. Pulses are passed through some of these channels to perform the operations we want on the qubits. So, with Qiskit Pulse, we decide what kind of pulses to pass in order to implement the operation we intend to do. Below is how we describe the different wires or channels in Qiskit Pulse:
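
(The original figure is not reproduced; as a rough sketch, these are the channel objects qiskit.pulse provides for the wiring described above, using qubit index 0 as an example.)

from qiskit import pulse

drive   = pulse.DriveChannel(0)    # carries the pulses that drive single-qubit operations
control = pulse.ControlChannel(0)  # used for multi-qubit (e.g. cross-resonance style) drives
measure = pulse.MeasureChannel(0)  # carries the measurement stimulus to the readout resonator
acquire = pulse.AcquireChannel(0)  # where the digitized readout signal is collected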

Now, let's dive deeper into the design of a transmon qubit. Consider a simple circuit like the one below; you might have seen this in high school. The physics and circuits from here on might seem a little intimidating, but if you observe closely, they are not that complicated:

The capacitor is a component that ensures a steady flow of current even when the power is turned off. Since it sustains the flow of current, the capacitor's energy is analogous to the KE in the spring-object setup. The inductor is a component that resists a change in current, much like how the spring resists getting stretched. The way an inductor works is that it resists the current initially. While some current still passes through it, a magnetic field (I will briefly mention what a magnetic field is later) develops around it (fact: when current flows through a wire, a magnetic field is created perpendicular to the flow of current). Eventually the resistance reduces and the magnetic field keeps getting stronger. When the power is turned off, the inductor resists that change as well: it converts the magnetic field back into electric current and passes it to the device we want to light up. The inductor behaves much like the spring at maximum displacement from equilibrium, and so its energy is analogous to the PE of the spring-object setup. On the quantum chip, the above setup looks like below.

The two silver metal plates in the above image form a capacitor. The small yellow link between them is the inductor. In the case of a qubit, that inductor is replaced by a 'Josephson Junction'. You can think of a Josephson junction (J) as an inductor with the special property that, as the current through it increases, its resistance decreases at a much slower rate than a normal inductor's. This causes the non-linearity, or anharmonicity. Q refers to the net charge on a plate (+ or -). With the Josephson junction, the simple classical circuit we considered would look like below. Note that the normal inductor is replaced by the Josephson junction J; L and E are the inductance and energy of J, and C is the capacitance. All components are made of superconducting material.

A classical circuit analogous to a single transmon qubit

I mentioned earlier that the readout resonator is built similarly to a qubit, except that it has a normal inductor, unlike the qubit, which has a Josephson junction. In terms of classical circuits, this setup would look like below:

Classical circuit representation: Transmon qubit (left circuit in pink) coupled with readout resonator (right circuit in green) with a capacitor

You can think of the above in terms of the image of the transmon qubit. The readout resonator would look similar, except that its small yellow link would be a normal inductor. The readout resonator and the qubit are coupled with two metallic pads representing a capacitor. Similarly, qubits are coupled to each other, and the corresponding classical circuit would look like below:

Classical circuit representation: Two transmon qubits coupled with a capacitor

We will now dive deeper into the Hamiltonian of a single transmon qubit. As usual, the classical circuits will help us get an idea and intuition of the qubit Hamiltonian.

The transmon qubit | QuTech Academy

Before going further: we have come across magnetic flux and magnetic field a couple of times already, and this is the right time to explain what they mean. Around a magnet, the magnetic field is just the region in which the magnetic forces of attraction and repulsion are felt. Typically, the intensity of the magnetic force is expressed in terms of imaginary magnetic field lines. Magnetic flux is the number of magnetic field lines passing through a given area. In our case, a magnetic field is generated perpendicular to the inductor when charges move through the Josephson junction, like in the image below:

Magnetic Field around J in transmon qubit

Q(t) is the charge on a plate, v(t) is the voltage (an electric current or a charge passes because of a non-zero voltage), i(t) is the electric current, ϕ(t) is the magnetic flux. The following relationships in the image below will be useful to get the right expression for the energies at the capacitor and at the inductor. If you think about them, it is intuitive why these quantities are related, at least by their definitions.
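
(The relationships referred to above are the standard circuit definitions; they are reconstructed here since the image is not shown.)

i(t) = \dot{Q}(t), \qquad v(t) = \dot{\phi}(t), \qquad Q(t) = C\,v(t) \ \text{(capacitor)}, \qquad \phi(t) = L\,i(t) \ \text{(linear inductor)}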

Note that the dot over a quantity represents its derivative w.r.t time. It turns out that after some simple derivations using some simple laws, the Hamiltonian of the classical circuit with a normal inductor (not Josephson junction) is:
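
(Reconstructing the missing equation: for the LC circuit, the total energy is the capacitor energy plus the inductor energy.)

H = \frac{Q^2}{2C} + \frac{\phi^2}{2L}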

From high school, we might remember that KE is equal to p²/2m, where p is momentum and m is the mass of the object. From the spring experiment, we know that PE is 1/2·k·x². This should make you realize that there is a one-to-one correspondence between the variables in the above Hamiltonian and the Hamiltonian of the spring-object setup.

Now it should make sense why we were plotting Potential Energy of the quantum system with respect to magnetic flux. It is exactly analogous to why we plotted PE of spring-object setup w.r.t displacement.

In the classical circuit's Hamiltonian equation above, C and L are constants and are properties of the given capacitor and inductor, respectively. This means that Q (charge) and ϕ (flux) vary at a particular energy level, and this should remind you of a circle. For different levels of energy, the locus of the classical system with varying Q and ϕ would look like below. The concentric circles represent different levels of energy: the bigger the radius, the higher the energy level.

The above is as far as the classical circuit system is concerned. A quantum system also has wave behavior, resulting in a probability distribution over its position. This gives not the sharply defined paths above, but something like below:

The yellow circular region showing the possible area of finding the quantum system for the specific energy state.

Now, let’s think about the same Hamiltonian that we described for the classical circuit, a little differently. We had two metal plates as components of our transmon qubit and we had charge moving between the two plates. The moving of charges is a type of operation. So, there are two operations taking place at the same time, an addition of a charge on one plate (creation operator, a†) and removal of a charge from another plate (annihilation operator, a).

The operators a and a† do not commute; it turns out that the difference a·a† − a†·a is 1 (the identity). Also, there is something called a resonance frequency (ω0). In the classical circuit it is defined as 1/sqrt(LC), where L and C are the inductance and capacitance, respectively. In the case of the transmon qubit, it is the qubit frequency. Actually, we will see later that there will be an offset between the qubit frequency and the drive frequency, but for now, let's take it to be the qubit frequency. When we discussed the atomic emission of light experiment, we said that the energy of the emitted photon is directly proportional to its frequency and that it represents the energy difference between energy states. Based on all that we said in the last couple of paragraphs, it is intuitive that the Hamiltonian of the classical circuit and of the harmonic quantum oscillator can also be written as:

Also, from the definitions of the operators a and a†, it is easily seen that a†·a |n> = n|n>. One can think of the state |n> as a state in which n units of charge have been moved from one metallic plate to the other on the transmon qubit, or as the number of photons required to excite n charges so that they move from one plate to the other: we saw in the atomic emission of light experiment that a photon with the appropriate frequency can excite an atom to a particular energy level, and an excitation to one energy level corresponds to the excitation of a single charge or electron (these states form what is known as a Fock space). Using this and the fact that a†·a and a·a† do not commute and their difference is 1, we can say that:
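
(Reconstructing the equation that the next sentence refers to; this is the standard harmonic-oscillator result.)

H = \hbar\,\omega_0\left(a^\dagger a + \tfrac{1}{2}\right)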

The 1/2 in the equation above is the zero point energy, which means that even if no charges are getting transferred from one plate to the other, there is some energy (total energy) in the system.

Let’s discuss a little more about ‘Zero point fluctuations (ZPF)’.

Quantum Harmonic Oscillator

In the image above, the wave at the |0> level looks like a Gaussian distribution. The standard deviation of that distribution can be thought of as a combination of the fluctuations in charge (Q) and magnetic flux (ϕ) at that state, with the mean of both these operators being 0, which is easy to prove.

The above relations for charge (Q) and magnetic flux (ϕ) are good to know. With these relations, it is easy to prove that the means of Q and ϕ at the |0> level are 0, by showing that <0|Q|0> and <0|ϕ|0> are 0.

So far, we have discussed everything for a Quantum Harmonic Oscillator, but the transmon qubit needs to be an anharmonic oscillator as we discussed before in this blog and the anharmonicity is created using a special type of an inductor called the Josephson junction. Here is how things change when we bring in the anharmonicity or non-linearity:

Energy in the Josephson junction is given by:

The PE vs flux plot would now look like:

So, to arrive at the Hamiltonian for this quantum system, we just have to adjust the Hamiltonian we got for the quantum harmonic oscillator to take the non-linearity into account. We do this by using the Taylor series expansion of the cosine in the energy equation of J and by making some approximations. To understand this approximation at a high level, one thing to note is that we can write the operator a as

This is because of the circular trajectory of the quantum harmonic oscillator we discussed earlier: on a circle x² + y² = r², every point can be written as x + iy. Since the Hamiltonian of the quantum harmonic oscillator traced a circle in terms of Q(t) and ϕ(t), and we could also express it in terms of a, it is easier to see why a can be written as above at a particular energy level or state. So, in the expansion, we can write the operators a and a† in the above form. As a result, some terms contain a rotation (a factor of e raised to some power) and some terms do not. The approximation then involves removing the rotating terms and retaining the ones that don't rotate, because the rotating terms do not contribute much to the total energy and hence to the Hamiltonian. This is called the Rotating Wave Approximation (RWA). Such approximations are very common, and the idea is to make the Hamiltonian simpler to deal with. I am omitting some derivations because the main idea is to get the intuition and the bigger picture. With the technique we used for the quantum harmonic oscillator, the final Hamiltonian for an anharmonic oscillator would look like:
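
(The equation image is missing; a commonly quoted form of the result is the Duffing-oscillator Hamiltonian below, with α < 0 the anharmonicity. The original figure may have used a slightly different convention.)

H \approx \hbar\,\omega_0\, a^\dagger a + \frac{\hbar\,\alpha}{2}\, a^\dagger a^\dagger a\, a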

But for the transmon qubit, we only need to deal with the |0> and |1> states. Even if we need to jump to higher states, we only ever hop between two consecutive levels. This means that we need a smaller, two-level version of the operators a and a†. Pauli matrices help here.
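
(One common convention for this two-level restriction, written out here since no equation is shown:)

a \;\to\; |0\rangle\langle 1| = \tfrac{1}{2}(\sigma_x + i\,\sigma_y), \qquad a^\dagger \;\to\; |1\rangle\langle 0| = \tfrac{1}{2}(\sigma_x - i\,\sigma_y)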

"Next generation superconducting qubits for quantum computing" presented by Jens Koch, Northwestern


Introduction to Pulses: Theory for single qubit

So far, everything we discussed was building up to defining the Hamiltonian of a single transmon qubit. In order to perform operations on these qubits, we need to apply a desired external push in the form of signals or waves (photons of appropriate frequencies; a single photon of the right frequency for a single-level excitation), so that what we apply interacts with the Hamiltonian and the energy structure of the transmon qubit and eventually gives us the operation we want. Here's what this would look like:

This means that as users of a quantum computer, we might not be able to do much at the level of a transmon qubit itself, but we can control Ω(t) and this is what Qiskit pulses help us do through a driver channel. For multi qubit gates, there are control channels, but we won’t be getting into that in this blog. A nice pictorial representation of what I just mentioned is below:

What we pass as a drive signal is discrete, which is why the signal on the extreme left looks the way it does. The centre signal is the qubit's wave function. The combination of the two gives a result that we would like to optimize to perform the desired operation. You should realize that with Ω(t) the overall Hamiltonian is now affected; it is no longer just the transmon's Hamiltonian. Typically in Qiskit, Ω(t) is taken to be a simple sine wave, defined as

a sine wave is simply defined as below.
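
(The formula image is not reproduced; a generic form, with A the amplitude, f the frequency and φ the phase, is:)

\Omega(t) = A\,\sin(2\pi f t + \varphi), \qquad 0 \le t < \text{duration}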

Keynote: Superconducting qubits for quantum computation: transmon vs fluxonium

This makes things easy because, when deciding what to pass through the drive channel, we just need to choose the right amplitude, frequency, phase, and duration t. We don't need to worry much about the shape of the wave because it is designed to be a sine wave.

Since we are passing a signal through a drive channel which could be in the form of photons or charge (current), the Hamiltonian of the drive should also include the annihilation and creation operators. It looks like below:

Note that in the above, we have used the equivalents of the annihilation and creation operators in terms of Pauli matrices.

Combining the qubit hamiltonian and drive hamiltonian, we get:

Delta is the offset or difference between qubit frequency and drive frequency:
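
(Reconstructing the missing expression from the sentence above:)

\Delta = \omega_{\text{qubit}} - \omega_{\text{drive}}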

With all this theory, we are now finally ready to understand Qiskit Pulses practically:

Pulses: Practical usage and Challenge 4

Before getting into the challenge, let’s code something to get an idea of what we will be doing:

In the first cell, I have just imported some modules and packages; in the second cell, I am defining some style for plotting. The third cell is where we build pulses, that is, we pass some type of signal through each of the drive channels. Every drive channel is connected to a qubit on the chip, so you can think of this as passing signals or waves to 5 qubits, each connected to a drive channel. Let's see what this gives as an output:
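
(The notebook cells themselves are not reproduced above; a hedged reconstruction of the pulse-building cell might look like this, with arbitrary 0/1 samples.)

import numpy as np
from qiskit import pulse

# Play a toy waveform of 0/1 samples on each of five drive channels.
with pulse.build(name="five_channel_demo") as sched:
    for q in range(5):
        samples = np.random.randint(0, 2, 64).astype(complex)   # just samples of 0s and 1s
        pulse.play(pulse.Waveform(samples, name=f"wf_d{q}"), pulse.DriveChannel(q))

sched.draw()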

You should realize that what we are passing through a channel is a waveform. The above is just an example; note that these signals don't carry any frequency, they are just samples of 0s and 1s. Though we can call them waveforms in general, they are practically not useful for performing anything on the quantum chip. We would also prefer something similar to a sine wave, as described in the theory part of pulses. In Qiskit, there are some pre-defined parametric waveforms like Gaussian, GaussianSquare, Drag, etc. These are essentially waveform shapes based on familiar probability-distribution curves; depending on the parameters we pass, Qiskit Pulse generates discrete values (samples) which constitute the waveform. We can create custom waveforms, use the built-in ones I mentioned, or combine several built-in ones to get a customized waveform. This has lots of benefits: the backends already have calibrated pulses for various native gates, and someone could play around with these pulses and come up with something that is less susceptible to external noise. This would make the quantum hardware more capable of handling noise.
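
(For reference, a few of the built-in parametric waveforms can be constructed like this; the parameter values are illustrative, not calibrated.)

from qiskit import pulse

gaussian = pulse.Gaussian(duration=160, amp=0.1, sigma=40)
flat_top = pulse.GaussianSquare(duration=400, amp=0.1, sigma=40, width=240)
drag     = pulse.Drag(duration=160, amp=0.1, sigma=40, beta=1.0)
constant = pulse.Constant(duration=160, amp=0.1)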

If we are building pulses, we need to work with actual hardware. For the challenge, we were given access to a backend called ibm_jakarta. Here's how an account can be connected to an IBM backend, provided the account has been given access to it:
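
(A hedged sketch; the hub/group/project below are the public defaults, not the special challenge provider that granted access to ibm_jakarta.)

from qiskit import IBMQ

IBMQ.load_account()
provider = IBMQ.get_provider(hub='ibm-q', group='open', project='main')
backend = provider.get_backend('ibmq_armonk')   # swap in 'ibm_jakarta' if you have access to it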

Next, let's find out a little about our backend:
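
(A hedged sketch of this inspection; the attribute names are standard Qiskit ones, and backend comes from the previous snippet.)

config = backend.configuration()
defaults = backend.defaults()

print("number of qubits:", config.n_qubits)
print("sampling time dt (s):", config.dt)
print("native (basis) gates:", config.basis_gates)
print("qubit 0 frequency estimate (GHz):", defaults.qubit_freq_est[0] / 1e9)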

For the challenge, we dealt with the qubit connected to the drive channel indexed 0 (d0), so we take the qubit index as 0. You will notice we found something called the sampling time (dt). That is an attribute of the backend: it is the time difference between two samples of the generated waveform. You will soon see that durations are given in units of dt, which means the number of samples in a waveform is equal to its duration; in fact, you can already see that in the first simple example. Next, we find the native gates for which the backend already has calibrated pulses. The set we see here on ibm_jakarta is usually the one calibrated on all backends.

Towards next-generation physical and logical qubits in superconducting circuits, by Chen Wang

Next in the challenge notebook, we found the calibrated pulses for the measure gate. The calibrated pulses for measure appear on the measurement channel when we draw the schedule. Measurement channels are connected to the readout resonators on the chip, as we discussed in this blog. Here's the code for it:
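
(The original snippet isn't shown; a hedged reconstruction that pulls the calibrated measurement schedule from the backend defaults would be:)

inst_map = backend.defaults().instruction_schedule_map
meas_sched = inst_map.get('measure', qubits=list(range(backend.configuration().n_qubits)))
meas_sched.draw()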

Let’s understand this drawing a bit. In general, the pulse schedule diagrams look like below:

d refers to drive channels, m to measurement channels, u to control channels and a to acquire channels. For each type of channel, there is a specific color coding of the pulses passed through it (no pulses are passed on acquire channels; they are just sets of wires connected to the readout resonators for the purpose of storing and digitizing data). You will also notice shades of colors. For example, I see a bright blue and a dark blue pulse on d1. The samples of a waveform are complex numbers, which means the waveform can have a real and an imaginary component: the bright shade shows the real part and the dark shade shows the imaginary part. In the above case, the pulses on d1 happen to have only a real part or only an imaginary part; in general, the shades can overlap on the same pulse, which means it has both real and imaginary parts.

Let's see what we can understand from the calibrated pulses of the Pauli X gate. For this purpose, I used the ibmq_armonk backend, which is available to everyone, since I no longer have access to ibm_jakarta (in the challenge, we used ibm_jakarta).

Though the waveform is of the type Drag, it looks similar to a Gaussian. If we go by the documentation of Drag pulses, it is indeed meant to be a Gaussian pulse with some adjustments (https://qiskit.org/documentation/stubs/qiskit.pulse.Drag.html). If we want to look at the samples of this waveform, we can create a Drag pulse with the same parameters. Let's do that.
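
(Placeholder parameters below; the real duration, amp, sigma and beta come from the calibrated x-gate schedule on the backend.)

from qiskit import pulse

drag = pulse.Drag(duration=320, amp=0.2, sigma=80, beta=-1.0)
samples = drag.get_waveform().samples
print(len(samples))   # the number of samples matches the duration in units of dt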

The number of samples is the same as the duration, as we expected.

Now, just for a while, let’s go and look at the Hamiltonian we got earlier which comprised of the qubit and drive Hamiltonian taken together:

The first term in the Hamiltonian corresponds to the qubit, but its frequency is offset by the drive frequency (delta = qubit frequency - drive frequency), and the second term corresponds to the drive Hamiltonian. The Hamiltonian can more simply be written as:
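
(The equation image is missing; up to sign and factor-of-two conventions, the simplified form being described is:)

\tilde{H} \approx \frac{\hbar\,\Delta}{2}\,\sigma_z + \frac{\hbar\,\Omega(t)}{2}\,\sigma_x, \qquad \Delta = \omega_{\text{qubit}} - \omega_{\text{drive}}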

This means that if the delta in the first term is 0 (which happens only when the qubit frequency equals the drive frequency), then the second term alone just causes a rotation about the x-axis. This condition is referred to as 'on resonance': the drive frequency is on resonance with the qubit frequency. If delta is not 0, the first term contributes a rotation about the z-axis, which is just a phase shift, as you can see from the 2x2 matrix corresponding to the rz operation. This means that the drive frequency plays a key role in deciding the type of rotation. For any rotation about the x-axis, we require the drive frequency to be the same as the qubit frequency, and finding this frequency is crucial for building appropriate pulses.

An important point to note here: when we say rotation of a qubit, a rotation is not literally happening on the quantum chip. Recall that when we came up with the Hamiltonian, we used Pauli matrices to get smaller versions of the matrices corresponding to the annihilation and creation operators. The annihilation and creation operators just represent transfer of charge from or to one of the two metallic plates in our qubit design. So the pulses we pass cause excitations of the charges in order to perform an annihilation or creation. The charge can only be in its discrete energy states, but the pulses can still cause partial excitations, which we denote through superpositions, and the superpositions matter because what we are usually interested in is the discrete energy state the charge ends up in after a sequence of pulses, each causing some excitation. Luckily for us, since these kinds of excitations can be expressed as matrices, it becomes easier to think in terms of linear algebra and rotations around the Bloch sphere. So one can think of excitations of the charge along a certain axis as a rotation about the x-axis, and similarly for a rotation about the z-axis (just as rz causes phase shifts, in terms of the transmon qubit that is simply a change in the orientation of the charge).

Chunqing Deng, Fluxonium Qubits for Ultra high fidelity and Scalable Quantum Processors

Let's also look at the calibrated pulses for the other native gates. The proposed gate set is only x, sx, cx and rz (which you also saw in challenge 1). However, the backends still consider u1, u2, u3 as native gates. u1 is the same as rz, and u2, u3 can easily be written in terms of rz and sx, which is also shown in the challenge notebook.

Both sx and x are calibrated with Drag pulses. You will notice that the only differences between the sx and x pulses are the amplitude and the beta parameter (which adjusts the pulse shape). This means that the amplitude is also an important factor in deciding the rotation that is performed.

Note that a rotation about Y axis can be achieved by a combination of rotations about x and z axis. So, we won’t need to consider that separately.

An rz rotation doesn't really involve a pulse; it is just a phase shift. The phase shift is created using a technique called frame change, which implements a virtual rz. I am yet to learn more about that, but on the schedule plot a phase shift looks like a circular arrow, like below:

Note the schedule of the u3 gate. We know how it can be implemented in terms of rz and sx. The pulse for each subsequent gate starts at the end of the duration of the previous gate. For example, the schedule starts with an rz (phase shift) at 0. Since a phase shift is not a pulse and has no duration, the following sx gate also starts at 0 and has a duration of 320. The subsequent rz starts at 320, and so on. This means that while building schedules, the start and end points of each pulse matter.

Let’s get into the challenge now:

The challenge notebook starts with a solved question on finding the right frequency for the |0> to |1> transition for the Pauli X gate. Since the Pauli X gate is a rotation by an angle of pi about the x-axis, all we want is for the drive frequency to be the same as the qubit frequency, as we discussed above. This ends up being an optimization problem, where we 'sweep' over a set of frequencies. Typically in an optimization problem it is important to choose an appropriate set of values to sweep over, but in our case this was made simpler by a helper module; all we needed to provide was the center frequency we already got from the backend in one of the code snippets above. See the entire code for this below:
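
(The notebook's exact code isn't reproduced; a hedged sketch of such a sweep, with names like center_freq, drive_amp, drive_sigma and drive_duration being my own placeholders, could look like this.)

import numpy as np
from qiskit import pulse

center_freq = backend.defaults().qubit_freq_est[0]
freqs = center_freq + np.linspace(-20e6, 20e6, 41)          # sweep +/- 20 MHz around the estimate

drive_duration, drive_sigma, drive_amp = 640, 160, 0.05     # placeholder pulse parameters
schedules = []
for f in freqs:
    with pulse.build(backend=backend, name="freq_sweep") as sched:
        d0 = pulse.drive_channel(0)
        pulse.set_frequency(f, d0)                          # try this drive frequency
        pulse.play(pulse.Gaussian(duration=drive_duration, amp=drive_amp, sigma=drive_sigma), d0)
        pulse.measure(qubits=[0])
    schedules.append(sched)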

The frequencies obtained from the helper module look like below; they are close to the specified center frequency, as expected:

The schedules are executed as jobs on the backend, after which we need to fit a function to the results. But what function do we fit? Often a Lorentzian function is chosen. If you have studied Statistics, you might have heard about the Cauchy distribution; the Lorentzian function is the Cauchy distribution. The function for our use case looks like:
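
(An assumed parametrization of the Lorentzian line shape; the helper module's exact signature may differ.)

def lorentzian(f, f0, gamma, A, B):
    # Peak of height A at f0, half-width gamma, vertical offset B.
    return A * gamma**2 / ((f - f0)**2 + gamma**2) + B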

The code and the results are below. The helper module helped us again.

High fidelity quantum gate via accelerated adiabatic evolution

Why do we want a Cauchy distribution? The idea is that we want to find the frequency for which the probability of transition from |0> to |1> is maximum, given the amplitude. Why are we so sure that this will give the desired frequency? It is because we start with an amplitude and a guessed frequency that are already almost correct for an X gate. You might be thinking that a lot of things here start from good guesses, and that we also adjusted the duration of the Gaussian pulse based on what we got for the calibrated Drag pulses, so it might seem like we are not doing much. I think the idea is that we improve on what already exists and is known, and that makes sense (although here we are not trying to improve the X pulses, that is the motivation for tasks where the goal is to improve). This is a simpler task, but there are more complicated gates, like the Swap gate, and how to improve gates like these is an active area of research. Even there, I think the initial idea would be to use a combination of existing calibrations of native gates, try improving on the existing ways in which pulses are scheduled for swap gates, and only later try fully customized pulses. Below is a check of how accurate our new frequency from the fit is.

Once we get the right frequency, we would like to get the right amplitude. Again, we sweep over a set of amplitudes around an initial, almost-correct guess. The function we fit here is a simple sinusoid (this is a Rabi experiment). We want the amplitude at which the qubit has rotated by an angle of pi, which corresponds to half the period of the fitted oscillation.
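
(A hedged sketch of the fit; the model and the names drive_amps and measured_signal are my placeholders.)

import numpy as np
from scipy.optimize import curve_fit

def rabi_curve(amp, A, rabi_period, phi, B):
    # Measured signal oscillates sinusoidally as the drive amplitude is swept.
    return A * np.cos(2 * np.pi * amp / rabi_period + phi) + B

# popt, _ = curve_fit(rabi_curve, drive_amps, measured_signal, p0=[0.5, 0.5, 0.0, 0.5])
# pi_amp = abs(popt[1]) / 2    # a pi rotation corresponds to half a Rabi period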

The actual challenge was to find the right frequency for a transition from |1> to |2>, so that we get a qutrit instead of a qubit. We already have the right frequency and amplitude for the Gaussian pulse that takes us from |0> to |1>. For the transition from |1> to |2>, we again perform a frequency sweep, after first applying the Gaussian pulse that moves the qubit from |0> to |1>. What helps us here is the fact that we can set a frequency multiple times in a schedule: after a frequency is set, all subsequent pulses on that channel use it, until we set another frequency within the schedule. Code snippets below:
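
(Again a hedged sketch; pi_pulse_01, cal_freq_01 and the pulse parameters are placeholders for the calibrated values obtained in the previous steps, and the anharmonicity guess is a typical transmon figure, not the backend's actual value.)

import numpy as np
from qiskit import pulse

anharm_guess = -0.3e9                                        # transmons are typically around -300 MHz
freqs_12 = cal_freq_01 + anharm_guess + np.linspace(-20e6, 20e6, 41)

schedules_12 = []
for f in freqs_12:
    with pulse.build(backend=backend) as sched:
        d0 = pulse.drive_channel(0)
        pulse.set_frequency(cal_freq_01, d0)
        pulse.play(pi_pulse_01, d0)                          # move the qubit from |0> to |1>
        pulse.set_frequency(f, d0)                           # retune the channel for the 1 -> 2 probe
        pulse.play(pulse.Gaussian(duration=drive_duration, amp=drive_amp, sigma=drive_sigma), d0)
        pulse.measure(qubits=[0])
    schedules_12.append(sched)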

We would again fit a Lorentzian function:

With this, I end this blog. I hope you liked reading it. I would like to mention that we only covered a simple Hamiltonian for a single qubit; there are many other types of Hamiltonians that take into account qubit-qubit interactions, qubit-readout-resonator interactions, and so on.

For more details on transmon qubits, you could check out lectures 16 to 21 and labs 6 and 7 from Qiskit Global Summer School 2020 (https://qiskit.org/learn/intro-qc-qh/). You could also check out Qiskit Metal : https://qiskit.org/metal/ and https://youtube.com/playlist?list=PLOFEBzvs-VvqHl5ZqVmhB_FcSqmLufsjb. For more on pulses, you could go through chapter 6 from qiskit textbook: https://qiskit.org/textbook/ch-quantum-hardware/index-pulses.html

More Information:

https://qiskit.org/textbook/ch-quantum-hardware/transmon-physics.html

https://www.qutube.nl/quantum-computer-12/the-transmon-qubit-160

https://towardsdatascience.com/the-ultimate-beginners-guide-to-quantum-computing-and-its-applications-5b43c8fbcd8f

https://slideplayer.com/slide/3361519/

https://www.paperdigest.org/2021/02/highlights-of-quantum-information-dqi-talks-aps-2021-march-meeting/

https://sambader.net/wp-content/uploads/2013/12/Transmon_Paper.pdf

https://www.paperdigest.org/2022/02/highlights-of-quantum-information-dqi-talks-aps-2022-march-meeting/

https://epjquantumtechnology.springeropen.com/articles/10.1140/epjqt/s40507-021-00107-w

https://www.qutube.nl/courses-10/quantum-computer-12/nv-center-qubits-173

https://patents.google.com/patent/US10847705B2/en

https://www.nature.com/articles/s41467-022-29287-4

https://ocw.tudelft.nl/course-lectures/4-2-1-the-transmon-qubit/


Neuromorphic Computing-How The Brain-Inspired Technology


Neuromorphic Computing: How The Brain-Inspired Technology and Intel Loihi 2

Memristors—From In-Memory Computing, Deep Learning Acceleration, and Spiking Neural Networks to the Future of Neuromorphic and Bio-Inspired Computing

Inspired by:

Adnan Mehonic, Abu Sebastian, Bipin Rajendran, Osvaldo Simeone, Eleni Vasilaki, Anthony J. Kenyon

Machine learning, particularly in the form of deep learning (DL), has driven most of the recent fundamental developments in artificial intelligence (AI). DL is based on computational models that are, to a certain extent, bio-inspired, as they rely on networks of connected simple computing units operating in parallel. The success of DL is supported by three factors: availability of vast amounts of data, continuous growth in computing power, and algorithmic innovations. The approaching demise of Moore's law, and the consequent expected modest improvements in computing power that can be achieved by scaling, raises the question of whether the progress will be slowed or halted due to hardware limitations. This article reviews the case for a novel beyond-complementary metal–oxide–semiconductor (CMOS) technology—memristors—as a potential solution for the implementation of power-efficient in-memory computing, DL accelerators, and spiking neural networks. Central themes are the reliance on non-von-Neumann computing architectures and the need for developing tailored learning and inference algorithms. To argue that lessons from biology can be useful in providing directions for further progress in AI, an example, reservoir computing, is briefly discussed. At the end, speculation is given on the "big picture" view of future neuromorphic and brain-inspired computing systems.

1 Introduction

Three factors are currently driving the main developments in artificial intelligence (AI): availability of vast amounts of data, continuous growth in computing power, and algorithmic innovations. Graphics processing units (GPUs) have been demonstrated as effective co-processors for the implementation of machine learning (ML) algorithms based on deep learning (DL). Solutions based on DL and GPU implementations have led to massive improvements in many AI tasks, but have also caused an exponential increase in demand for computing power. Recent analyses show that the demand for computing power has increased by a factor of 300 000 since 2012, and the estimate is that this demand will double every 3.4 months—at a much faster rate than improvements made historically through Moore's scaling (a sevenfold improvement over the same period of time).[1] At the same time, Moore's law has been slowing down significantly for the last few years,[2] as there are strong indications that we will not be able to continue scaling down complementary metal–oxide–semiconductor (CMOS) transistors. This calls for the exploration of alternative technology roadmaps for the development of scalable and efficient AI solutions.

Transistor scaling is not the only way to improve computing performance. Architectural innovations, such as GPUs, field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs), have all significantly advanced the ML field.[3] A common aspect of modern computing architectures for ML is a move away from the classical von-Neumann architecture that physically separates memory and computing. That separation creates a performance bottleneck that is often the main reason for both the energy and speed inefficiency of ML implementations on conventional hardware platforms, due to costly data movements. However, architectural developments alone are not likely to be sufficient. In fact, standard digital CMOS components are inherently not well suited for the implementation of a massive number of continuous weights/synapses in artificial neural networks (ANNs).


Neuromorphic Computing-How The Brain-Inspired Technology | Neuromorphic Artificial Intelligence |


1.1 The Promise of Memristors

There is a strong case to be made for the exploration of alternative technologies. Although memristor technology is still in development, it is a strong candidate for future non-CMOS and beyond-von-Neumann computing solutions.[4] Since its early development in 2008,[5] or even earlier under different names,[6] memristor technology has expanded remarkably to include many different material solutions, physical mechanisms, and novel computing approaches.[4] A single progress report cannot cover all the different approaches and fast-growing developments in the field; evaluations of the state of the art in memristor-based electronics can be found elsewhere.[7] Instead, in this article, we present and discuss a few representative case studies, showcasing the potential role of memristors in the expanding field of AI hardware. We present examples of how memristors are used for in-memory computing systems, DL accelerators, and spike-based computing. Finally, we discuss and speculate on the future of neuromorphic and bio-inspired computing paradigms and provide reservoir computing as an example.

For the last 15 years, memristors have been a focal point for many different research communities—mathematicians, solid-state physicists, experimental material scientists, electrical engineers, and, more recently, computer scientists and computational neuroscientists. The concept of the memristor was introduced almost 50 years ago, back in 1971,[8] and was then nearly forgotten for almost four decades. There are many different flavors of memristive technologies. Still, in their most popular implementation, memristors are simple two-terminal devices with the extraordinary property that their resistance depends on their history of electrical stimuli. In other words, memristors are resistors with memory. They promise high levels of integration, stable non-volatile resistance states, fast resistance switching, and excellent energy efficiency—all very desirable properties for the next generation of memory technologies.

The physical implementations of memristors are broad and arguably include many different technologies, such as redox-based resistive random-access memory (ReRAM), phase-change memories (PCMs), and magnetoresistive random-access memory (MRAM). Further differentiations within larger classes can be made, depending on physical mechanisms that govern the resistance change. Many excellent reviews cover the principles and switching mechanisms of memristor devices. Here, we will briefly mention two extensively studied types of memristive devices, namely, ReRAM and PCM.

Resistance switching is one of the most explored properties of memristive devices. A thin insulating film reversibly changes its electrical resistance—between an insulating state and a conducting state—under the application of an external electrical stimulus. For binary memory devices, two stable states are sought, typically called the high resistance state (HRS) and the low resistance state (LRS). The transition from the HRS to the LRS is called a SET process, whereas a RESET process describes the transition from the LRS to the HRS.

Basic memory cells of both types, in their most straightforward implementation, have three layers—two conductive electrodes and a thin switching layer sandwiched in-between. Local redox processes govern resistance switching in ReRAM devices. A broad classification can be made based on a distinction between the switching that happens as a result of intrinsic properties of the switching material (typically oxides) and switching that is the result of in-diffusion of metal ions (typically from one of the metallic electrodes). The former type is called intrinsic switching, and the latter is called extrinsic switching.[9] Alternatively, a classification can be made depending on the main driving force for the redox process (thermal or electrical), or the type of ions that move. The main three classes are electrochemical metallization cells (or conductive bridge) ReRAMs (ECM), valence change ReRAMs (VCM), and thermochemical ReRAMs (TCM).[4]

Many ReRAM devices require an electroforming step prior to resistance switching. This can be considered a soft breakdown of the insulating material. A conductive filament is produced inside the insulating film as a result of the applied electrical bias. Modification of conductive filaments, led by a local redox process, leads to the change of resistance. The diameter of the conductive filament is typically on the order of a few nanometers to a few tens of nanometers, and it does not depend on the size of the electrodes. Another, less common type is interface-type switching, which does not depend on creation and modification of conductive filaments, but can be driven by the formation of a tunnel or Schottky barrier across the whole interface between electrode and switching layer.

In the case of PCMs, the change of resistance is due to the crystallization and amorphization processes of phase change materials. The amplitude and duration of the applied voltage pulses control the phase transitions: the SET process changes the amorphous phase to a crystalline phase (HRS to LRS transition), and the RESET process changes the crystalline phase to an amorphous phase (LRS to HRS transition).

For many computing tasks, more than two states are required, and for most memristive devices, including ReRAMs and PCMs, many resistance states can be achieved. However, benchmarking of memristive devices for different applications, beyond pure digital memory, can be challenging and relies on many different parameters other than the number of different resistance states. We will discuss the main device properties in the context of different applications.

Neuromorphic computing with emerging memory devices


1.2 The Landscape of Different Approaches and Applications

In the context of this article, memristors can be used in applications beyond simple memory devices.[10] A "big picture" landscape of memristor-based approaches for AI is shown in Figure 1. There is more than one way that memristors can perform computing. A unique feature of memristor devices is the ability to co-locate memory and computing and to break the von-Neumann bottleneck at the lowest, nanometer-scale level. One such approach is the concept of in-memory computing, which uses memory not only to store the data but also to perform computation at the same physical location. Furthermore, memristors have long been considered for DL acceleration. In particular, memristive crossbar arrays physically represent the weights of ANNs as conductances at each cross-point. When voltages are applied on one side of the crossbar and currents are sensed on the orthogonal terminals, the array performs a vector-matrix multiplication in constant time using Kirchhoff's and Ohm's laws. Vector-matrix multiplications dominate most DL algorithms—hundreds of thousands are often needed during training and inference. When weights are implemented as memristor conductances, there is no need for the extensive power-hungry data movement required by conventional digital systems based on the von-Neumann architecture.

The landscape of memristor-based systems for AI. In-memory computing aims to eliminate the von-Neumann bottleneck by implementing compute directly within the memory. DL accelerators based on memristive crossbars are used to implement vector-matrix multiplication directly using Ohm's and Kirchhoff's laws. SNNs, a type of ANNs, are biologically more plausible and do not operate with continuous signals, but use spikes to process and transfer data. Memristor systems could provide a hardware platform to implement spike-based learning and inference. More complex functionalities (neuromorphic), beyond simple digital switching CMOS paradigm, directly implemented in memristive hardware primitives, might fuel the next wave of higher cognitive systems.

Other, more bio-realistic concepts are also being explored. These include schemes relying on spike-based communication. The central premise of this approach can be summarized with the motto "computing with time, not in time." It has been shown that memristors can directly implement some functions of biological neurons and synapses, most importantly synapse-like plasticity and neuron-like integration and spiking. In these solutions, the information is encoded and transferred in the form of voltage or current spikes. Memristor resistances are used as proxies for synaptic strengths. More importantly, the adjustment of the resistances is controlled according to local learning rules. One popular local learning rule is spike-timing-dependent plasticity (STDP), which adjusts a local state variable such as conductance dynamically based on the relative timing of spikes (see the small sketch after this paragraph). In a simple example, the conductance of a memristive "synapse" can be increased or decreased depending on the degree of overlap between pre- and post-synaptic voltage pulses. There also exist implementations that do not require overlapping pulses, instead utilizing the volatile internal dynamics of memristive devices. Spike-based computing promises further improvements in power efficiency, taking inspiration from the remarkable efficiency of the human brain. It is important to note that there are many challenges related to the full adoption of memristor technologies. We discuss some of the common device and system non-idealities and some proposed schemes to deal with them. In contrast, when carefully controlled, some of these non-idealities, such as stochastic switching, can be harnessed for probabilistic computing. We provide an overview of some algorithmic approaches to probabilistic spiking neural networks (SNNs) and provide references to hardware and memristor-based implementations.
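
(A minimal pair-based STDP rule as a sketch; this is a common textbook form rather than any specific memristor model, and the parameter values are illustrative.)

import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20e-3):
    # Pre-before-post spike pairs strengthen the synapse, post-before-pre pairs weaken it,
    # with an exponentially decaying dependence on the spike-time difference.
    dt = t_post - t_pre
    if dt > 0:
        dw = a_plus * np.exp(-dt / tau)      # potentiation
    else:
        dw = -a_minus * np.exp(dt / tau)     # depression
    return float(np.clip(w + dw, 0.0, 1.0))  # keep the "conductance" in a bounded range

w_new = stdp_update(0.5, t_pre=0.010, t_post=0.015)   # example: a 5 ms pre-before-post pair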

Finally, we speculate that, for future developments in AI, new knowledge and computational models from the field of computational neuroscience could play a crucial role. Virtually all recent developments in ML and DL are driven by the field of computer science, while the algorithmic inspiration from neuroscience is mostly based on old models established as early as the 1950s. Although we are still in the infancy of understanding the full working principles of the biological brain, novel brain-inspired architectural principles, beyond simple probabilistic DL approaches, could lead to higher-level cognitive functionalities. One such example is the concept of reservoir computing, which we discuss briefly in the article. It is unlikely that current digital CMOS transistor technology can be optimized for the implementation of much more dynamic and adaptive systems in an efficient way. In contrast, memristor-based systems, with their rich switching dynamics and many state variables, may provide a perfect substrate on which to build a new class of intelligent and efficient neuromorphic systems.


Neuromorphic computing with memristors: from device to system - Professor Huaqiang Wu


2 In-Memory Computing

In the von-Neumann architecture, which dates back to the 1940s, memory and processing units are physically separated, and large amounts of data need to be shuttled back and forth between them during the execution of various computational tasks. The latency and energy associated with accessing data from the memory units are key performance bottlenecks for a range of applications, in particular, for the increasingly prominent AI-related workloads.[11] The energy cost associated with moving data is a key challenge for both severely energy constrained mobile and edge computing as well as high-performance computing in a cloud environment due to cooling constraints. The current approaches, such as using hundreds of processors in parallel[12] or application-specific processors,[13] are not likely to fully overcome the challenge of data movement. It is getting increasingly clear that novel architectures need to be explored where memory and processing are better collocated.

In-memory computing is one such non-von-Neumann approach where certain computational tasks are performed in place in the memory itself organized as a computational memory unit.[14-17] As schematically shown in Figure 2, in-memory computing obviates the need to move data into a processing unit. Computing is performed by exploiting the physical attributes of the memory devices, their array-level organization, the peripheral circuitry, and the control logic. In this paradigm, the memory is an active participant in the computational task. Besides reducing latency and energy cost associated with data movement, in-memory computing also has the potential to improve the computational time complexity associated with certain tasks due to the massive parallelism afforded by a dense array of millions of nanoscale memory devices serving as compute units. By introducing physical coupling between the memory devices, there is also a potential for further reduction in computational time complexity.[18, 19] Memristive devices, such as PCM, ReRAM, and MRAM,[20, 21] are particularly well suited for in-memory computing.

In-memory computing. In a conventional computing system, when an operation f is performed on data D, D has to be moved into a processing unit. This incurs significant latency and energy cost and creates the well-known von-Neumann bottleneck. With in-memory computing, f(D) is performed within a computational memory unit by exploiting the physical attributes of the memory devices. This obviates the need to move D to the processing unit.

There are several key physical attributes that enable in-memory computing using memristive devices. First, the ability to store two levels of resistance/conductance in a non-volatile manner and to reversibly switch from one level to the other (binary storage capability) can be exploited for computing. Figure 3a shows the resistance values achieved upon repeated switching of a representative memristive device (a PCM device) between the LRS and the HRS. Because the LRS and HRS are non-volatile and distinguishable, resistance can serve as an additional logic state variable. In conventional CMOS, voltage serves as the single logic state variable: input signals are processed as voltage signals and are output as voltage signals. By combining CMOS circuitry with memristive devices, it is possible to exploit the additional resistance state variable. For example, the HRS could indicate logic “0,” and the LRS could denote logic “1.” This enables logical operations that rely on the interaction between the voltage and resistance state variables and could enable the seamless integration of processing and storage. This is the essential idea behind memristive logic, which is an active area of research.[24-26] Memristive logic has the potential to impact application areas such as image processing,[27] encryption, and database query.[28] Brain-inspired hyper-dimensional computing, which involves the manipulation of large binary vectors, has recently emerged as another promising application area for in-memory logic.[29, 30]

Going beyond binary storage, certain memristive devices can also be programmed to a continuum of resistance or conductance values (analog storage capability). For example, Figure 3b shows a continuum of resistance levels in a PCM device achieved by the application of programming pulses with varying amplitude. The device is first programmed to the fully crystalline state, after which RESET pulses are applied with progressively increasing amplitude; the device resistance is measured after the application of each RESET pulse. Thanks to this property, it is possible to program a memristive device to a desired resistance value through iterative programming, by applying several pulses in a closed-loop manner.[31]

Yet another physical attribute that enables in-memory computing is the accumulative behavior exhibited by certain memristive devices. In these devices, it is possible to progressively reduce the device resistance by the successive application of SET pulses with the same amplitude and, in certain cases, to progressively increase the resistance by the successive application of RESET pulses. An experimental measurement of this accumulative behavior in a PCM device is shown in Figure 3c. This accumulative behavior is central to applications such as training deep neural networks (DNNs), which is described later. The behavior is not limited to PCM devices: most memristor-based technologies show potential for multi-level switching and gradual resistance modulation. Therefore, the applications presented using this feature in PCM technology could, in principle, be achieved using most other memristor technologies. We refer readers to dedicated review articles and previous literature that cover the gradual resistance modulation obtained with different memristor technologies.[10] The intrinsic stochasticity associated with the switching behavior in memristive devices can also be exploited for in-memory computing.[32] Applications include stochastic computing[33] and physically unclonable functions.[34] We will discuss these concepts in more detail later in the text.

In-Memory Computing: Memory Devices & Applications (ETH Zürich, Fall 2020)

The key physical attributes of memristive devices that facilitate in-memory computing. a) Binary storage capability whereby the devices can be switched between high and low resistance values in a repeatable manner. Adapted with permission.[22] Copyright 2019, IOP Publishing. b) Multi-level storage capability whereby the devices can be programmed to a continuum of resistance values by the application of appropriate programming pulses. Adapted with permission.[23] Copyright 2018, American Institute of Physics. c) The accumulative behavior whereby the resistance of a device can be progressively decreased by the successive application of identical programming pulses. Reproduced with permission.[23] Copyright 2018, American Institute of Physics.

A very useful in-memory computing primitive enabled by the binary and analog non-volatile storage capability is matrix-vector multiplication (MVM).[7, 35] The physical laws that are exploited to perform this operation are Ohm's law and Kirchhoff's current summation laws. For example, to perform the operation Ax = b, the elements of A are mapped linearly to the conductance values of memristive devices organized in a crossbar configuration. The x values are mapped linearly to the amplitudes of read voltages and are applied to the crossbar along the rows. The result of the computation, b, will be proportional to the resulting current measured along the columns of the array. The concept is shown in Figure 4 and can be summarized by a simple equation that relates the voltage vector, V, the conductance matrix, G, and the current vector, I, as I = G V.

a) Memristor crossbar array. b) The applied (read) voltages and the conductances of the memristor devices in a crossbar array define the input vector and the input matrix, whereas the sensed currents provide the resulting vector of the matrix-vector multiplication.
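As a concrete illustration of the MVM principle just described, the following minimal Python/NumPy sketch models an ideal crossbar in which each matrix element is encoded as a differential pair of conductances and the input vector is encoded as read voltages; column currents then implement Ohm's law and Kirchhoff's current summation. The conductance range, read voltage, and function names are illustrative assumptions, not parameters of any specific device.

```python
import numpy as np

# Minimal sketch of an ideal memristive crossbar performing b = A @ x.
# G_MIN/G_MAX and V_READ are assumed, illustrative values.
G_MIN, G_MAX = 1e-6, 1e-4   # assumed conductance range (siemens)
V_READ = 0.2                # assumed read voltage amplitude (volts)

def crossbar_mvm(A, x):
    """Compute A @ x with one differential pair of conductances per weight."""
    w_scale = (G_MAX - G_MIN) / np.abs(A).max()
    # Positive and negative parts of A mapped to two arrays (rows = inputs).
    G_pos = G_MIN + w_scale * np.clip(A.T, 0, None)
    G_neg = G_MIN + w_scale * np.clip(-A.T, 0, None)
    v_scale = V_READ / np.abs(x).max()
    V = v_scale * x                        # input vector encoded as read voltages
    I_pos = V @ G_pos                      # column currents (Kirchhoff summation)
    I_neg = V @ G_neg
    return (I_pos - I_neg) / (w_scale * v_scale)   # rescale currents back to b

A = np.random.randn(4, 6)
x = np.random.randn(6)
print(np.allclose(crossbar_mvm(A, x), A @ x))      # True for an ideal array
```

The differential-pair mapping used here mirrors the two-device-per-weight configuration mentioned later for DL inference; in a real array, programming noise and wire resistance would perturb the result.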

Compressed sensing and recovery is one application that could benefit from an in-memory computing unit that performs MVMs. The objective behind compressed sensing is to acquire a large signal at a sub-Nyquist sampling rate and to subsequently reconstruct that signal accurately. Unlike most other compression schemes, sampling and compression are done simultaneously, with the signal getting compressed as it is sampled. Such techniques have widespread applications in medical imaging, security systems, and camera sensors. The compressed measurements can be thought of as a mapping of a signal x of length N to a measurement vector y of length M < N. If this process is linear, it can be modeled by an M × N measurement matrix. The idea is to store this measurement matrix in the in-memory computing unit, with memristive devices organized in a crossbar configuration (see Figure 5a). In this manner, the compression operation can be performed in O(1) time complexity. To recover the original signal from the compressed measurements, an approximate message passing (AMP) algorithm can be used; this iterative algorithm involves several MVMs on the very same measurement matrix and its transpose. In this way, the same matrix that was coded in the in-memory computing unit can also be used for the reconstruction, reducing the reconstruction complexity from O(M × N) to O(N). An experimental illustration of compressed sensing recovery in the context of image compression is shown in Figure 5b. A 128 × 128 pixel image was compressed by 50% and recovered using the measurement matrix elements encoded in a PCM array. The normalized mean square error (NMSE) associated with the recovered signal is plotted as a function of the number of iterations. A remarkable property of AMP is that its convergence rate is independent of the precision of the MVMs; the lack of precision only results in a higher error floor, which may be acceptable for many applications. Note that, in this application, the measurement matrix remains fixed, and hence the property of PCM that is exploited is the multi-level storage capability.

a) Compressed sensing involves one MVM. Data recovery is performed via an iterative scheme, using several MVMs on the very same measurement matrix and its transpose. b) An experimental illustration of compressed sensing recovery in the context of image compression is presented, showing 50% compression of a 128 × 128 pixel image. The NMSE associated with the reconstructed signal is plotted against the number of iterations.
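To make the role of the measurement matrix and its transpose concrete, here is a small self-contained sketch. It uses a simple iterative soft-thresholding (ISTA-style) recovery rather than AMP itself, which is more involved; the point is only that both the compression step and every recovery iteration are MVMs on the same matrix, which is exactly what an in-memory computing unit would accelerate. The sizes, sparsity, and regularization constant are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 256, 128, 10                               # signal length, measurements, sparsity
Phi = rng.standard_normal((M, N)) / np.sqrt(M)       # measurement matrix (stored on-chip)
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)

y = Phi @ x_true                                     # compression: a single MVM

def soft(v, t):
    """Soft-thresholding operator used by the iterative recovery."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Recovery: each iteration uses one MVM with Phi and one with Phi.T.
step = 1.0 / np.linalg.norm(Phi, 2) ** 2             # step size below 1/L for convergence
lam = 0.05                                           # illustrative sparsity weight
x_hat = np.zeros(N)
for _ in range(500):
    x_hat = soft(x_hat + step * Phi.T @ (y - Phi @ x_hat), step * lam)

print("NMSE:", np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2))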


Neuromorphic Computing by Prof. Shubham Sahay


3 DL Accelerators

DNNs, loosely inspired by biological neural networks, consist of parallel processing units called neurons interconnected by plastic synapses. By tuning the weights of these interconnections using millions of labeled examples, these networks are able to perform certain supervised learning tasks remarkably well. These networks are typically trained via a supervised learning algorithm based on gradient descent. During the training phase, the input data are forward propagated through the neuron layers with the synaptic networks performing multiply accumulate operations. The final layer responses are compared with input data labels, and the errors are backpropagated. Both steps involve sequences of MVMs. Subsequently, the synaptic weights are updated to reduce the error. This optimization approach can take multiple days or weeks to train state-of-the-art networks on conventional computers. Hence, there is a significant effort toward the design of custom ASICs based on reduced precision arithmetic and highly optimized dataflow.[13, 37] However, the need to shuttle millions of synaptic weight values between the memory and processing unit remains a key performance bottleneck, both for power and time efficiency, and, hence, in-memory computing is being explored as an alternative approach for both inference and training of DNNs.[38, 39] The essential idea is to map the various layers of a neural network to an in-memory computing unit where memristive devices are organized in a crossbar configuration (see Figure 6). The synaptic weights are stored in the conductance state of the memristive devices, and the propagation of data through each layer is performed in a single step by inputting the data to the crossbar rows and deciphering the results at the columns.

DL based on in-memory computing. The various layers of a neural network are mapped to a computational memory unit where memristive devices are organized in a crossbar configuration. The synaptic weights are stored in the conductance state of the memristive devices. A global communication network is used to send data from one array to another.

DL inference refers to just the forward propagation in a DNN once the weights have been learned. Both binary and analog storage capability of memristive devices can be exploited for the MVM operations associated with the inference operation. The key challenges are the inaccuracies associated with programming the devices to a specified synaptic weight as well as drift, noise, etc., associated with the conductance values.[40] Due to this programming noise, the synaptic weights that are obtained by training in high precision arithmetic (e.g., 32 bit floating point) cannot be mapped directly to computational memory. However, it can be shown that by customizing the training procedure to make it aware of these device-level non-idealities, it is possible to obtain synaptic weights that are suitable for being mapped to an in-memory computing unit.[39] For conductance drift, global scaling procedures or periodic calibration of the batch normalization parameters have been found to be very effective.[39] Figure 7 shows the mixed hardware/software experimental results using a prototype multi-level PCM chip. The synaptic weights are mapped to PCM devices organized in a two-PCM differential configuration (723 444 PCM devices in total). The differential configuration means that two memristors are used per synaptic weight, so both positive and negative weights can be represented. It can be seen that with a hardware-aware custom training scheme, it is possible to approach the floating-point baseline. The temporal decline in accuracy is attributed to the conductance drift exhibited by PCM devices.[41] However, with appropriate compensation schemes, it is possible to maintain software equivalent accuracies over a substantial period of time.

DL inference. Experimental results on ResNet-32 using the CIFAR-10 data set. The classification accuracies obtained via the direct mapping and custom training approaches are compared with the floating-point baseline.
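The programming-noise and drift effects mentioned above can be illustrated with a minimal sketch. The noise level, the power-law drift exponent, and the global compensation rule below are simplified assumptions applied directly to signed weights, not measured parameters of any PCM chip; in a real array the drift would also vary from device to device, so a single global scale only compensates the average behavior.

```python
import numpy as np

rng = np.random.default_rng(1)

def program_weights(W, sigma=0.03):
    """Add write (programming) noise proportional to the largest weight."""
    return W + sigma * np.abs(W).max() * rng.standard_normal(W.shape)

def apply_drift(W_prog, t, t0=1.0, nu=0.03):
    """Simplified drift model: power-law decay G(t) = G(t0) * (t/t0)^(-nu)."""
    return W_prog * (t / t0) ** (-nu)

def drift_compensate(W_drifted, W_prog):
    """Single global scale factor that restores the mean absolute weight."""
    alpha = np.abs(W_prog).mean() / np.abs(W_drifted).mean()
    return alpha * W_drifted

W = rng.standard_normal((64, 32))                    # "trained" weights
W_prog = program_weights(W)
W_day = apply_drift(W_prog, t=86400.0)               # one day after programming
W_comp = drift_compensate(W_day, W_prog)
for name, M in [("programmed", W_prog), ("drifted", W_day), ("compensated", W_comp)]:
    print(name, "mean |dW| vs. ideal:", np.abs(M - W).mean().round(4))
```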

In-memory computing can also be used in the context of supervised training of DNNs with backpropagation. When performing training of a DNN encoded in crossbar arrays, forward propagation is performed in the same way as inference described earlier. Next, backward propagation is performed by inputting the error gradient from the subsequent layer onto the columns of the current layer and deciphering the result from the rows. Subsequently, the error gradient is computed. Finally, the weight update is performed based on the outer product of activations and error gradients of each layer. This weight update relies on the accumulative behavior of memristive devices. Recent DL research shows that when training DNNs, it is possible to perform the forward and backward propagations rather imprecisely, whereas the gradients need to be accumulated in high precision.[42] This observation makes the DL training problem amenable to the mixed-precision in-memory computing approach that was recently proposed.[43] The in-memory compute unit is used to store the synaptic weights and to perform the forward and backward passes, whereas the weight changes are accumulated in high precision (Figure 8a).[45, 46] When the accumulated weight exceeds a certain threshold, pulses are applied to the corresponding memory devices to alter the synaptic weights. This approach was tested using the handwritten digit classification problem based on the MNIST data set. A two-layered neural network was used with two-PCM devices in differential configuration (≈400 000 devices) representing the synaptic weights. Resulting test accuracy after 20 epochs of training was ≈98% (Figure 8b). After training, inference on this network was performed for over a year with marginal reduction in the test accuracy. The crossbar topology also facilitates the estimation of the gradient and the in-place update of the resulting synaptic weight all in O(1) time complexity.[38, 47] By obviating the need to perform gradient accumulation externally, this approach could yield better performance than the mixed-precision approach. However, significant improvements to the memristive technology, in particular, the accumulative behavior, are needed to apply this to a wide range of DNNs.[48, 49]

DL training. a) Schematic illustration of the mixed-precision architecture for training DNNs. b) The synaptic weight distributions and classification accuracies are compared between the experiments and floating point baseline. Reproduced with permission.[44] Copyright 2020, Frontiers Media.
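The mixed-precision update rule described above can be summarized in a few lines of code: gradients are accumulated in a high-precision variable, and the analog device weight is only changed, in coarse steps mimicking programming pulses, once the accumulated update crosses the pulse granularity. The pulse granularity, learning rate, and variable names below are illustrative assumptions.

```python
import numpy as np

EPS = 0.01          # assumed conductance change per programming pulse

def mixed_precision_update(w_device, chi, grad, lr=0.1):
    """One mixed-precision step: accumulate in chi, program only coarse steps."""
    chi = chi - lr * grad                    # high-precision accumulation
    n_pulses = np.trunc(chi / EPS)           # integer number of pulses to apply
    w_device = w_device + EPS * n_pulses     # coarse, pulse-based device update
    chi = chi - EPS * n_pulses               # keep only the residual in the accumulator
    return w_device, chi

w, chi = np.zeros(4), np.zeros(4)
for _ in range(10):
    g = np.array([0.02, -0.03, 0.005, 0.0])  # illustrative gradient
    w, chi = mixed_precision_update(w, chi, g)
print("device weights:", w, "residual accumulator:", chi)
```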

Compared with the charge-based memory devices that are also used for in-memory computing,[50-52] a key advantage of memristive devices is the potential to be scaled to dimensions of a few nanometers.[53-57] Most memristive devices are also suitable for back-end-of-line integration, thus enabling their integration with a wide range of front-end CMOS technologies. Another key advantage is the non-volatility of these devices, which would obviate the need for computing systems to be constantly connected to a power supply. However, there are also challenges that need to be overcome. The significant inter-device and intra-device variability associated with the LRS and HRS states is a key challenge for applications where memristive devices are used for logical operations. For applications that rely on the analog storage capability, a significant challenge is programming variability, which captures the inaccuracies associated with programming an array of devices to desired conductance values. In ReRAM, this variability is attributed mostly to the stochastic nature of filamentary switching, and one prominent approach to counter it is to establish preferential paths for conductive filament formation.[58, 59] Representing single computational elements using multiple memory devices is another promising approach.[60] Yet another challenge is the temporal and temperature-induced variation of the programmed conductance values. The resistance “drift” in PCM devices, which is attributed to the intrinsic structural relaxation of the amorphous phase, is an example. The concept of projected PCM is a promising approach toward tackling drift.[61, 62] Other memristor technologies can have similar issues related to state retention or the occurrence of random telegraph noise. The requirements that memristive devices need to fulfill when used as computational memory are heavily application-dependent. For memristive logic, high cycling endurance (>10^12 cycles) and low device-to-device variability of the LRS/HRS resistance values are critical. For computational tasks involving read-only operations, such as MVM, the conductance states must remain relatively unchanged during execution. It is also desirable to have a gradual, analog-type switching characteristic for programming a continuum of resistance values in a single device. A linear and symmetric accumulative behavior is also required in applications where the device conductance needs to be incrementally updated, such as in DL training.[63] For stochastic computing applications, random device variability is not problematic, but any device degradation should be gradual and graceful to allow compensation for variations in switching voltages.[64] We further summarize some common device non-idealities and proposed approaches to minimize their adverse effects in Section 5.

4 SNNs and Memristors

As opposed to the DL networks discussed earlier, SNNs can more naturally incorporate the notion of time in signal encoding and processing. SNNs are typically modeled on the integrate-and-fire behavior of neurons in the brain. In this framework, neurons communicate with each other using binary signals or spikes. The arrival of a spike at a synapse triggers a current flow into the downstream neuron, with the magnitude of the current weighted by the effective conductance of the synapse. The incoming currents are integrated by the neuron to determine its membrane potential, and a spike is issued when the potential exceeds a threshold. This spiking behavior can be triggered in a deterministic or probabilistic manner. Once a spike is issued, the membrane potential is reset to a resting potential or decreased according to some predetermined rule. The integration is limited to a specific time window, or else a leak factor is incorporated in the integration, endowing the neuron model with a finite memory of past spiking events.
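A minimal sketch of the leaky integrate-and-fire dynamics just described (weighted input integration, leak, threshold, reset) is given below. The time constant, threshold, and input statistics are illustrative assumptions chosen only to make the example self-contained and runnable.

```python
import numpy as np

def lif_neuron(spike_trains, weights, dt=1e-3, tau=20e-3, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron driven by binary input spike trains."""
    v, spikes_out = v_reset, []
    decay = np.exp(-dt / tau)                      # leak factor per time step
    for t in range(spike_trains.shape[1]):
        i_in = weights @ spike_trains[:, t]        # synaptically weighted input
        v = decay * v + i_in                       # integrate with leak
        if v >= v_th:
            spikes_out.append(t)                   # emit a spike
            v = v_reset                            # reset the membrane potential
    return spikes_out

rng = np.random.default_rng(2)
inputs = (rng.random((10, 200)) < 0.05).astype(float)   # 10 Poisson-like spike trains
w = rng.random(10) * 0.5
print("output spike times:", lif_neuron(inputs, w))
```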

Compared with the realization of the second-generation DNNs (discussed in the previous section), SNNs can potentially offer significant improvements in efficiency. The first reason for this comes from the underlying signal encoding mechanism. The calculation of the output of a neuron involves the determination of the weighted sum of synaptic weights with the real-valued neuronal outputs of the previous layer. For a fully connected second-generation DNN with N neurons in each layer, this requires N² multiplications of real-valued numbers, typically stored in low-precision representations. In contrast, the forward propagation operation in an SNN only requires addition operations, as the input neuronal signals are binary spike signals. To elaborate, assume that the input signal is encoded as a spike train with duration T and a minimum inter-spike interval of Δt. If the probability of a spike at any instant of time is p, then, on average, NpT/Δt spikes have to be propagated through the synapses, and this requires N²pT/Δt addition operations. In most modern processors, the cost of multiplication, C_m, is 3–4 times higher than that of addition, C_a. Hence, provided the neuronal and synaptic variables required for computation are available in the processor, SNNs offer a path to more efficient computation if the inequality

C_a · p · (T/Δt) < C_m    (1)

holds.
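A quick back-of-the-envelope check of this inequality is sketched below, using an assumed 4x multiply-to-add cost ratio; the spike probabilities and window lengths are illustrative values, not measurements.

```python
# Check the efficiency condition C_a * p * (T/Δt) < C_m for a few assumed values.
C_a, C_m = 1.0, 4.0                  # relative costs of addition and multiplication
for p in (0.01, 0.05, 0.2):          # spike probability per time step
    for steps in (10, 50, 200):      # number of time steps T / Δt
        snn_cheaper = C_a * p * steps < C_m
        print(f"p={p:>4}, T/dt={steps:>3} -> SNN cheaper: {snn_cheaper}")
```

As the sketch makes obvious, sparse coding (small p) and short encoding windows are what keep the spiking approach on the favorable side of the inequality.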

Hence, it is important to develop algorithms for SNNs that minimize p and T/Δt to improve computational efficiency. This requires the use of sparse binary signal encoding schemes that go beyond the rate coding typically used in SNNs today. The following section will discuss strategies to develop general-purpose learning rules for SNNs that satisfy such constraints. The second potential for efficiency improvement of SNNs arises because of novel memristor-based processor architectures. While SNNs can be implemented using Si CMOS static random-access memory or dynamic random-access memory technologies, the advent of novel nanoscale memristive devices provides opportunities for significant improvements in overall computational efficiency.

As mentioned in the previous section, memristive devices can be integrated at the junctions of crossbar arrays to represent the weights of synapses, and CMOS circuits at the periphery can be designed to implement the neuronal integration and learning logic. The small form factor of the devices, coupled with the scalability of operating voltages and currents beyond what is possible with conventional CMOS, suggests that these architectures can deliver several orders of magnitude efficiency improvement over silicon-based implementations.[65, 66] However, apart from the already mentioned non-idealities of memristive devices, crossbar arrays with more than 2048 × 2048 devices present reliability issues due to the resistance drop along the wires and the sneak paths that corrupt the measurement and programming of synaptic states. One approach to mitigate these issues is to design neurosynaptic cores with smaller crossbars and associated neuron circuits, tile these cores on a 2D array, and provide communication fabrics between the cores.[67] Such tiled neurosynaptic core-based designs are particularly amenable to realizing SNNs, as only binary spikes corresponding to intermittently active spiking neurons need to be transported between cores, as opposed to real-valued neuronal variables that are active for all the neurons in the core in the case of DL networks. This is the second inherent advantage that SNNs have over DNNs for computational efficiency improvement.

Overcoming the reliability challenges mentioned earlier is essential for building reliable systems and will require the co-optimization of algorithms and architectures that are designed to mitigate or leverage these non-ideal behaviors for computation. Two kinds of systems can be envisioned based on the application mode. Inference engines, which do not support on-chip learning, can be designed based on memristive devices integrated on crossbars, where the devices are programmed to the desired conductance states based on the weights obtained from software training. However, as memristive devices support incremental conductance changes by the application of suitable electrical programming pulses, it is also possible to design learning systems where network weight updates are implemented on-chip in an event-driven manner.[68] There are also many recent examples where these devices have been engineered to mimic the integrate-and-fire characteristics of biological neurons,[69-71] potentially enabling the realization of all-memristor implementations of SNNs.[72] The field is still in its infancy and, so far, has only witnessed small proof-of-concept demonstrations. We now discuss some of the approaches that have been explored toward realizing memristor-based inference-only spiking networks as well as SNN-based learning networks.

Bio-inspired Computing with Memristors

4.1 Memristive SNNs for Inference

A common approach to developing SNNs is to start with a second-generation ANN trained using traditional backpropagation-based methods and then convert the resulting network to a spiking network in software. These solutions are based on weight-normalization schemes, so that the spike rates of the neurons in the SNN are proportional to the activations of the neurons in the ANN.[73, 74] While this should, in principle, result in SNNs with accuracies comparable to those of their second-generation counterparts, some device-aware re-training would typically be necessary when the network is implemented in hardware, due to the non-linearity and limited dynamic range of nanoscale devices.

One of the differentiating features of inference engines is that the nanoscale devices storing state variables are programmed only rarely, compared with the number of reads (potentially at every inference cycle). As higher energy programming cycles have a stronger effect in degrading device lifetimes compared with the lower energy read cycles, this mode of operation can have better overall system reliability compared with that of learning systems.

In a preliminary hardware demonstration leveraging this approach, Midya et al. used memristors based on SiOxNy:Ag to implement compact oscillatory neurons whose output voltage oscillation frequency is proportional to the input current.[75] In this proof-of-concept demonstration of a three-layer network, ANN to SNN conversion was limited to the last layer alone, but the approach could be extended to hidden layers as well.

4.2 Memristive SNNs for Unsupervised Learning and Adaptation

Most hardware demonstrations of SNNs using memristive devices have focused on the unsupervised learning paradigm, where the synaptic weights are modified in an unsupervised manner according to the biologically inspired spike-timing-dependent plasticity (STDP) rule.[76] The rule captures the experimental observation that when a synapse experiences multiple pre-before-post spike pairings, the effective synaptic strength increases, and conversely, multiple post-before-pre spike pairs result in an effective decrease in synaptic conductance.
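The pair-based form of this rule is simple enough to state in a few lines of code, as in the sketch below: pre-before-post pairs potentiate the synapse and post-before-pre pairs depress it, with an exponential dependence on the timing difference. The amplitudes and time constant are illustrative assumptions rather than values from any particular device or experiment.

```python
import numpy as np

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20e-3):
    """Pair-based STDP weight change for one pre/post spike pair (times in seconds)."""
    dt = t_post - t_pre
    if dt > 0:                                  # pre before post -> potentiation
        return a_plus * np.exp(-dt / tau)
    return -a_minus * np.exp(dt / tau)          # post before pre -> depression

w = 0.5
for t_pre, t_post in [(0.010, 0.015), (0.040, 0.035), (0.070, 0.071)]:
    w += stdp_dw(t_pre, t_post)                 # accumulate changes over spike pairs
print("weight after three spike pairs:", round(w, 4))
```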

It should be noted that while other biological mechanisms may also play a key role in learning and memory formation in the brain, as has been observed experimentally,[77, 78] STDP is a simple local learning rule that is especially straightforward to implement in hardware. While it is possible to implement timing-dependent plasticity rules using many-transistor CMOS circuits,[79] it was experimentally demonstrated early on that memristive devices can exhibit STDP-like weight adaptation behaviors upon the application of suitable waveforms.[68, 80, 81] Going beyond individual device demonstrations, IBM has also demonstrated an integrated neuromorphic core with 256 × 256 PCM synapses fabricated along with Si CMOS neuron circuits, capable of on-chip learning based on a simplified model of STDP for auto-associative pattern learning tasks.[82]

Boybat et al. used phase change memristive synapses to demonstrate temporal correlation detection through unsupervised learning based on a simplified form of STDP,[60] as shown in Figure 9. In their experiment, a multi-memristive architecture was introduced, where N PCM devices are used to represent one synapse: all devices within a synapse are read during spike transmission, but only one device, selected through an arbitration scheme, is programmed to update the synaptic weight. Software-equivalent accuracies could be obtained in the experiment with this scheme, even though the individual devices are plagued by several common non-ideal effects, such as programming non-linearity, read noise, and conductance drift. Note that with N = 1 device representing a synapse, the network accuracy was significantly lower than the software baseline; N = 7 devices were necessary to obtain close to ideal performance.

a) Unsupervised learning demonstration using a multi-memristive PCM architecture. The network consists of an integrate-and-fire neuron receiving inputs from 1000 multi-PCM synapses, with each synapse being excited by Poisson-generated binary spike streams. 10% of the synapses receive correlated inputs, whereas the rest receive uncorrelated inputs. The weights evolve based on the simplified STDP rule shown. b) With N = 7 PCM devices per synapse, the correlated and uncorrelated synaptic weights evolve to well-separated values, whereas with N = 1, the separation is corrupted by programming noise.
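A minimal sketch of the multi-memristive synapse idea is shown below: N devices jointly represent one synapse, all devices contribute to the read-out weight, but only one device, chosen here by a simple round-robin counter standing in for the arbitration scheme, is programmed per update. The step size, clipping range, and class name are illustrative assumptions.

```python
import numpy as np

class MultiMemristiveSynapse:
    """Toy model: N devices per synapse, read all, program one per update."""
    def __init__(self, n_devices=7, g_step=0.02):
        self.g = np.zeros(n_devices)     # per-device conductances
        self.g_step = g_step             # coarse conductance change per pulse
        self.counter = 0                 # arbitration: which device to program next

    def read(self):
        return self.g.sum()              # all devices contribute to the weight

    def update(self, sign):
        idx = self.counter % len(self.g)
        self.g[idx] = np.clip(self.g[idx] + sign * self.g_step, 0.0, 1.0)
        self.counter += 1                # rotate the device selection

syn = MultiMemristiveSynapse()
for s in [+1, +1, -1, +1, +1, +1]:       # sequence of potentiation/depression events
    syn.update(s)
print("effective synaptic weight:", round(syn.read(), 3))
```

Spreading updates across several devices in this way averages out per-device programming noise, which is the intuition behind the N = 7 result quoted above.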

Spiking networks can also be used for other unsupervised learning[83] and adaptation tasks. Recently, Fang et al. demonstrated that certain optimization problems could be solved by harnessing the coupled dynamics of ferroelectric field-effect transistor (FeFET)-based spiking neurons.[84] While there was no synaptic weight adaptation in this approach, the optimal solution to the problem is determined by the coupled interactions between the neurons, which modulate each other's membrane potentials in an event-driven manner.

All-memristive neuromorphic computing with level-tuned neurons


4.3 Memristive SNNs for Supervised Learning

Compared with the previous two approaches, implementing supervised learning in SNNs is a more challenging task, as the algorithm and the network must generate spikes at precise time instants based on the input excitation. As opposed to the backpropagation algorithm that is highly successful in training ANNs, supervised learning algorithms for SNNs are not well developed yet, due to the inherent difficulty in applying gradient descent methods for spiking neuron models with infinite discontinuities at the instants of spikes. Nevertheless, there have been several demonstrations of supervised learning algorithms for SNNs based on approximate forms of gradient descent for simple fully connected networks.[85-87]

Recently, Nandakumar et al. demonstrated a proof-of-concept realization of supervised learning in a two-layer SNN implemented using nanoscale PCM synapses based on the normalized approximate descent (NormAD) algorithm.[88] In the experiment, 132 spike streams representing spoken audio signals generated using a silicon cochlea chip were used as the input, and the network was trained to generate 168 spike streams whose arrival times indicate the pixel intensities corresponding to the spoken characters.[88] Compared with normal classification problems in deep networks, where the accuracy depends only on the relative magnitude of the responses of the output neurons, the SNN problem is harder, as the network is tasked with generating close to 1000 spikes at specific time instants over a period of 1250 ms from 168 spiking neurons that are excited by 132 input spike streams. The accuracy for spike placement obtained in the experiment was about 80%, compared with the software baseline accuracy of over 98%, despite using the same multi-memristive architecture described earlier. This experiment is, hence, illustrative of the need to develop more robust, event-driven learning algorithms for SNNs that can mitigate or even leverage the device non-idealities for designing computational systems (Figure 10).

a) SNN supervised learning experiment. A two-layer network is tasked with generating 1000 ms long spike streams from the 168 neurons at the output corresponding to the images of the spoken characters. The inputs to the network are 132 spike streams representing the characters subsampled from the output of a silicon cochlea chip. The weights are modified based on the NormAD learning rule. b) Using multi-PCM synapses, the accuracy of spike placement at the output is about 80%, compared with the FP64 accuracy of close to 98%. Reproduced with permission.[88] Copyright 2020, Springer Nature.

In summary, spike-based learning and inference are promising facets of the neuromorphic computing paradigm. Unlike conventional ML models, spike-based processing “computes with time, not in time.”

5 Some Challenges of Memristor Technologies

While memristor technologies show huge promise for a wide range of applications, several significant challenges need to be addressed to make them commercially viable. It is essential to recognize that different applications have different device requirements. There are general challenges related to reliability, fabrication, uniformity, and scalability that need to be addressed regardless of the application. The family of memristor technologies is diverse, and different technologies come with different kinds of device and system non-idealities. These are covered extensively in the available literature,[89] where scalability and reliability issues are discussed in the context of specific memristor technologies. While some of the stringent requirements typically necessary for non-volatile data storage and memory applications might be relaxed for analog and neuromorphic computing, other additional device properties might be specifically required. Here, we will discuss some key examples of typical non-idealities relevant to the types of computing applications covered in this report, as well as some approaches that may be used to mitigate them.

One of the most prominent issues relates to the non-linearity in the current/voltage characteristics seen in many memristive devices. This prevents accurate vector-matrix computation using memristive crossbars, as the output current does not depend linearly on the applied voltage, and the assumed linear relationship (I = G V) cannot be used over the whole voltage range. A few techniques have been proposed to deal with I/V non-linearities, such as a hot-forming step prior to programming[90] or the use of transistor selector elements (1T1R architectures).[91] By elevating the temperature to 150 °C during the electroforming step, it is possible to increase the number of oxygen vacancies generated in Ta2O5 ReRAM devices and form denser conductive filaments. Denser filaments formed at elevated temperatures exhibit much better I/V linearity than those formed at room temperature. Using a transistor as a selector element in the 1T1R architecture allows for much better control of the current compliance during the set and electroforming processes, which is a likely reason for the improved I/V linearity.

The next non-ideality concerns the cycle-to-cycle and device-to-device variability that is inherent to nanoscale devices. This can be addressed partially by improved materials engineering, such as the use of ultra-thin atomic-layer-deposited TiN buffer layers,[92] or the engineering of oxide–metal interfaces[93, 94] and oxide microstructure,[95] which improves the stability of the operating characteristics. These techniques restrict filament formation to particular sites and limit significant variations in filament configuration from cycle to cycle. Faulty devices can be accounted for by adapting mapping schemes (the way that weights in ANNs are mapped onto the memristor conductance states) and by using redundancy techniques.[96] The limited dynamic range (i.e., the separation between on- and off-state conductances) can be extended by using two or more devices whose conductances are assigned different significances to represent a single weight.[49] Adverse effects that arise due to the finite resistance of the metal wires used for the crossbar can be mitigated using advanced mapping or compensation schemes.[97]

Recently, a technology-agnostic technique called committee machines (CMs) has demonstrated significant potential for increasing the inference accuracy when dealing with several non-idealities.[98] The approach does not assume any prior knowledge of particular non-idealities or the use of customized training procedures. The main idea behind CMs is to use committees of smaller neural networks and to average their inference outputs. This leads to higher inference accuracy even with the same total number of devices used (the total number of weights from all members of the committee is equal to the number of weights in a single large neural network). This is shown in Figure 11. Put simply, it is advantageous to combine several smaller neural networks rather than to use a single large neural network when dealing with non-perfect memristor devices.

Effectiveness of CMs in dealing with different types of device and system non-idealities. a) Inference accuracy of committees (of size 5) as a function of the accuracy of the individual networks that constitute the committees. All data points above the dashed line indicate an improvement in accuracy. b) Median accuracy achieved by individual networks and by committees of networks, plotted against the total number of synapses required for these neural networks or their committees.
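The averaging idea behind CMs can be illustrated with a deliberately simplified sketch in which the committee members are copies of the same linear classifier, each corrupted by independent weight noise that stands in for device non-idealities; in the actual CM approach the members are distinct smaller networks, so this is only an intuition aid. All names and noise levels below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_features, n_classes, n_members = 32, 4, 5
W_ideal = rng.standard_normal((n_classes, n_features))   # "trained" weights
x = rng.standard_normal(n_features)                       # one input example
true_class = np.argmax(W_ideal @ x)

def noisy_scores(W, sigma=0.6):
    """Class scores computed with independently perturbed (non-ideal) weights."""
    return (W + sigma * rng.standard_normal(W.shape)) @ x

single = np.argmax(noisy_scores(W_ideal))                  # one noisy network
committee_scores = np.mean([noisy_scores(W_ideal) for _ in range(n_members)], axis=0)
committee = np.argmax(committee_scores)                    # averaged committee output
print("true:", true_class, "single:", single, "committee:", committee)
```

Averaging the members' scores reduces the variance of the noise-induced errors, which is why the committee tends to recover the correct class more often than a single noisy network.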

While the abovementioned requirements are sufficient for building reliable inference engines, on-chip training in DL accelerators also requires that the device conductance can be programmed in a linear fashion. Ideally, it would be possible to increase or decrease the device conductance linearly based on the number of identical (positive or negative) voltage pulses, independent of the current state. If the devices could be linearly tuned, simple programming schemes to implement on-chip learning could be used without requiring complex peripheral circuits. However, most memristor technologies exhibit highly non-linear programming characteristics: the resistance change depends not only on the pulse width/amplitude, but also on the current resistance state. Much work has been done to obtain devices that exhibit linear programmability, as well as to develop specific programming schemes that compensate for non-linear resistance modulation. One such scheme is state-dependent (SD) programming, which, instead of identical pulses, uses exponentially increasing pulse widths while keeping the amplitude fixed.[99] Excellent programming linearity can be obtained using this scheme; however, the drawback is that the complexity of the peripheral control circuitry increases, limiting the overall power-efficiency gains, as additional memory is required to record the current conductance states and to verify them before every programming pulse. Another approach is to use identical pulse pairs consisting of one programming pulse and one offset pulse of opposite polarity. This bipolar (BP) scheme does not require tracking of the current resistance state, as SD does, and its circuitry is less complex and limiting.[100] Materials optimization includes the use of bilayers that have been shown to improve programming linearity.[101]
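The non-linear, state-dependent programming behavior described above can be visualized with a toy model in which each identical pulse changes the conductance by an amount proportional to the remaining headroom, so the step size shrinks as the device approaches its bounds. The saturating update rule and all constants are assumptions for illustration only, not a measured device characteristic; program-and-verify loops or the SD/BP pulse schemes discussed above aim to counteract exactly this kind of behavior.

```python
import numpy as np

G_MAX, K = 1.0, 0.08          # assumed bounds and update gain of the toy model

def apply_pulses(g0, n, polarity=+1):
    """Apply n identical pulses under a saturating, state-dependent update rule."""
    g, traj = g0, [g0]
    for _ in range(n):
        if polarity > 0:
            g += K * (G_MAX - g)        # SET-like pulse: saturates near G_MAX
        else:
            g -= K * g                  # RESET-like pulse: saturates near 0
        traj.append(g)
    return np.array(traj)

up = apply_pulses(0.0, 30, +1)
down = apply_pulses(up[-1], 30, -1)
print("potentiation steps: first %.4f, last %.4f" % (up[1] - up[0], up[-1] - up[-2]))
print("depression steps:   first %.4f, last %.4f" % (down[0] - down[1], down[-2] - down[-1]))
```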

It is important to note that the majority of non-idealities have been studied in single devices or small arrays, and the available statistics are still limited. For memristor technology to reach full maturity, there is a great need for more extensive reliability studies, where adverse effects from lower-tail bits might become more apparent.

Small Molecule Memristors for Neuromorphic Computing by Aaron Cookson

6 Harnessing Hardware Randomness for Learning

As discussed in the previous sections, the implementation of standard deterministic systems may be severely impaired in hardware implementations whose components are inherently noisy. This is the case for memristive implementation of deterministic ANNs and SNNs in which the synaptic weights are defined by the conductance of memristors. As also seen, state-of-the-art solutions are predicated on the need to mitigate the adverse effects. In this section, we explore the idea that, if properly harnessed, native hardware randomness can be an asset, rather than a nuisance, for the purpose of developing learning and inference machines.

The main argument in favor of harnessing, rather than mitigating, randomness is that it enables the implementation of the primitive of sampling without the need for specialized hardware. Sampling, that is, drawing random numbers from a probability distribution, is a key step for the deployment of probabilistic models and of Bayesian learning and inference strategies.[102, 103] As we will discuss, probabilistic spiking neuron models have potential advantages over conventional deterministic counterparts, such as leaky integrate-and-fire, in facilitating the development of flexible learning rules. Furthermore, as we will also see, unlike standard learning strategies, Bayesian learning can quantify uncertainty, enhance generalization, and provide tools for a principled exploration of the parameter space.

There are generally two ways to inject noise: at the level of the activation of each neuron and at the level of the synaptic weights. The first approach enables the implementation of probabilistic spiking behavior. For example, the stochasticity of the switching process in ReRAM-based memristor devices has been utilized to this end.[104] Synaptic sampling, instead, allows the deployment of Bayesian learning and inference, and a number of different hardware platforms, including memristors, have been proposed for this purpose.[105-108]

In the following, we first review the role of probabilistic spiking models for learning and then provide a short discussion of Bayesian methods. This discussion is aimed at offering some guidelines on the development of suitable hardware platforms and on the exploration of properties of memristive devices that are typically seen as disadvantageous. Although these algorithmic concepts are currently less well explored in direct hardware implementations than standard deterministic methods, we believe that memristor technologies could be particularly well suited for their efficient implementation.

Brains Behind the Brains: Mike Davies and Neuromorphic Computing at Intel Labs | Intel

6.1 Probabilistic SNNs

As we have discussed, deterministic spiking neuron models such as leaky integrate-and-fire define non-differentiable functions of the synaptic weights: increasing or decreasing the synaptic weights of a spiking neuron may cause the membrane potential to cross, or step back from, the spiking threshold, causing an abrupt change in the output. The derivative with respect to the weights is, hence, zero except around the firing threshold, where it is undefined. As a result, standard gradient-based learning rules cannot be directly derived for deterministic models of SNNs. In probabilistic SNN models, a neuron spikes with a probability that increases with its membrane potential. Probabilistic SNN models solve the outlined problem by defining as the learning criterion a function of the probability of spiking according to the desired spatio-temporal patterns. This function is differentiable in the weight vectors.
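A minimal sketch of this idea is given below: the spike is a Bernoulli sample whose probability is a sigmoid of the membrane potential, so the log-likelihood of a target spike train is differentiable in the weights and can be improved by plain gradient ascent. The network size, input statistics, and learning rate are illustrative assumptions, and the toy task only fits the statistics of a random target train.

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

w = rng.standard_normal(8) * 0.1
inputs = (rng.random((8, 100)) < 0.2).astype(float)   # presynaptic spike trains
target = (rng.random(100) < 0.1).astype(float)        # desired output spike train

lr = 0.1
for _ in range(500):                                  # gradient ascent on log-likelihood
    u = w @ inputs                                    # membrane potential per time step
    p = sigmoid(u)                                    # spiking probability per time step
    grad = inputs @ (target - p)                      # d log-likelihood / d w
    w += lr * grad / inputs.shape[1]
print("mean spike probability:", sigmoid(w @ inputs).mean(), "target rate:", target.mean())
```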

To elaborate, in supervised and unsupervised learning, the learning problem can be formulated as the minimization of a loss function that measures the degree to which the spiking behavior of neurons in a readout layer conforms to the desired behavior dictated by the training set.[109, 110] This problem is differentiable when defined in terms of the probability of the spiking signals in the readout layer.[105] In reinforcement learning, the goal is to maximize a time-averaged reward signal obtained by the learning agent as it interacts with the environment, making observations and taking actions. A reinforcement learning agent is, hence, faced with the problem of balancing the need for exploration of the parameter space with that of exploiting its current knowledge to increase the reward signal. This can be done by optimizing over a probabilistic policy that chooses actions with a confidence that increases as training proceeds. The resulting optimization problem can be formulated in terms of the probability of the behavior of the neurons in the readout layer when the latter is converted into actions.[110]

Once a learning criterion is determined based on the problem under study, because of the differentiability of the function to be optimized, training can be carried out via stochastic gradient-based rules.

As a related note, we observe that another advantage of probabilistic SNN models is that they can be directly extended with minor conceptual and algorithmic difficulties to allow for multi-valued spikes or inter-neuron instantaneous connections or, equivalently, Winner Take All (WTA) circuits.[111] This is particularly important, because data produced by some neuromorphic sensors incorporate a sign to indicate a positive or negative change.[112] Furthermore, various decoding rules, such as first-to-spike, can be directly optimized for, instead of having to rely on surrogate target spiking sequences.[113]

To illustrate the potential advantages of probabilistic SNN models, we consider a standard reinforcement learning task, in which a learning agent acts in a grid world with the aim of finding the shortest way to an unknown goal location. Two approaches are compared. The first is the standard method of training an ANN model and converting the trained weights for use in an SNN with the same architecture. The alternative approach directly trains a probabilistic SNN as a stochastic policy that selects actions, i.e., moves in the grid world, by trading off exploration and exploitation. Figure 12 compares the performance of these two solutions as a function of the resolution of the input grid representation. The results clearly validate the intuition that directly training the stochastic policy as a probabilistic SNN is more effective, as well as more efficient in terms of the number of spikes, than using ANN-to-SNN conversion.

Consider the standard reinforcement learning task in which a learning agent aims at finding the shortest path to an unknown goal point in a grid world through episodic interactions with the environment. The figure shows the time steps needed to reach the goal and the number of spikes per episode for the standard approach of converting a pre-trained ANN and for the direct training of a probabilistic SNN.

Reproduced with permission.[110] Copyright 2019, IEEE.

6.2 Bayesian SNNs

As mentioned, synaptic sampling enables the implementation of Bayesian inference and learning.[105-108] Unlike the conventional approach considered thus far of identifying a single set of parameter vectors during learning, the Bayesian principle prescribes the inference of a probability distribution over the synaptic weights. In the presence of sufficient data, this distribution concentrates on the optimal weight configuration; when data are limited, the synaptic weight distribution provides a “credibility” profile over the parameter space. This, in turn, allows the assessment of uncertainty and of the confidence of the model's decisions or actions, as well as the principled exploration of the parameter space.
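The following sketch shows what synaptic sampling buys at inference time: instead of one weight vector, a (here Gaussian) distribution over weights is kept, a fresh weight sample is drawn for every forward pass, and the spread of the resulting predictions quantifies uncertainty. The posterior parameters are assumed rather than learned, and the classifier is a stand-in model, so this is an intuition aid rather than a Bayesian learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)
mu = rng.standard_normal(16) * 0.5        # assumed posterior mean of the weights
sigma = 0.3 * np.ones(16)                 # assumed posterior standard deviation
x = rng.standard_normal(16)               # one input example

def predict(w):
    """Toy binary classifier output for a given weight sample."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

# Synaptic sampling: draw a new weight vector per forward pass and average.
samples = [predict(rng.normal(mu, sigma)) for _ in range(200)]
print("mean prediction: %.3f, predictive std: %.3f" % (np.mean(samples), np.std(samples)))
```

In a memristive implementation, the per-read stochasticity of the devices would play the role of the `rng.normal` call, providing the weight samples without dedicated random-number circuitry.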

Efficient Bayesian learning methods rely on the capacity of the model to draw samples from the current weight distribution. This is important both during inference, to obtain a credibility profile over outputs, and during learning, to enable exploration. Initial efforts toward the implementation of Bayesian methods in hardware include references [105-108]. These papers implement synaptic sampling by including additional circuitry. In contrast, we envision that the inherent randomness of the switching processes in memristive devices could provide a source of randomness “for free.” Memristors may, hence, be the missing piece that will unlock the potential of spike-based computing.

Joshua Yang: Memristive Materials and Devices for Neuromorphic Computing

7 Future of Neuromorphic and Bio-Inspired Computing Systems

Taking a “big picture” view, current AI and ML methods have achieved astonishing results in every field they have been applied to and have become, or are becoming, standard tools for nearly every type of industry one can think of. This impressive expansion was mainly propelled by DL, which is loosely inspired by biological neural networks.

DL primarily refers to learning with ANNs of many layers and, fundamentally, is not different from what we knew about the field in the 1990s. Indeed, the key algorithm underlying the success of DL, backpropagation, is an old story: “Learning representations by back-propagating errors” by Rumelhart et al. was published in 1986.[114] The most commonly used neural networks are feedforward networks, and the convolutional neural networks used for image processing can be seen as inspired by our visual system; neither of these is a very new concept.

Backpropagation is, perhaps, the most fundamental method we can think of for parameter optimization. It is derived by differentiating an error function with respect to the learnable parameters, so in some ways it is not entirely surprising that the algorithm has existed for many years. What might be somewhat surprising is that we have not been able to move away much from this idea. While there has been recent progress, much of it has consisted of relatively small additions and tweaks, for instance, new ways to address the so-called “vanishing gradient” problem, the deterioration of the error signal as it is backpropagated from the output to the input of the network. Undoubtedly, there have been some fundamentally different architectures, smart techniques, and novel analyses, but, arguably, the key factor behind such success seems to be the vast availability of data and computational power.

In fact, recent advances from the neuroscience community are largely absent from today's neural networks. We do not want to argue that this, per se, is either good or bad, or to suggest that the next super-algorithms will be copies of nature. We only want to underline that, though only loosely inspired by biology, ANNs have their basis in neuroscience concepts and that there are many phenomena that have, perhaps, not been sufficiently explored within an AI context. For instance, biological neural networks have different learning rules for positive and negative connections; connections change on multiple time scales and show reversible dynamic behavior (known as short-term plasticity); and the brain itself has a structure where specific areas play different roles, just to name a few.

Instead, our progress has mainly been based on hardware improvements that made this success possible by allowing long training phases, an amount of training that is unrealistic for any human. While it is true that human intelligence also develops over years and that human learning involves many trials, for comparison, AlphaGo Zero, which surpassed human performance in the game of Go, was trained over 4.9 million games.[115] To match this number of games would require a human who lives for 90 years to complete one Go game every 10 min from the moment they are born. This realization tells us two things: 1) our machines do not learn the same way that humans do, and even if we think of our methods as bio-inspired, we likely still miss some key ingredients; and 2) executing that many games certainly requires considerable computational power and energy consumption.

As a consequence, training algorithms often have a high energy footprint due to excessive training times and the hyper-parameter tuning involved. Hyper-parameters are parameters of the system that are not (usually) adapted via the learning method itself; one such example is the learning rate, which indicates how fast the network should update its “knowledge.” Before rushing to say that a high learning rate is obviously desirable, note that such a learning rate could lead to oscillations, as, for instance, optimal solutions could be overlooked, or it could lead to forgetting previously obtained knowledge. Setting the learning rate right is not always trivial. In fact, the tuning of hyper-parameters was what originally made the ML community turn away from ANNs, and it was the performance of DL that brought the focus back. One may then wonder: at the end of the day, how energy inefficient can DL systems be? The reply is, perhaps, surprising: the estimated carbon emissions for training a standard natural language processing model are approximately five times higher than those of running a car for a lifetime.[116] This realization suggests that there is an urgent need to improve on both current hardware and learning models.

Given such energy concerns, systems based on low-power memristive devices are a highly promising alternative.[117, 118] Besides their low carbon footprint, many studies have demonstrated memristive devices that mimic neurons, synapses, and plasticity phenomena. Often, such approaches work well for off-line training. However, some of these attempts, particularly where plasticity is involved, are opportunistic (including our own work), and how they would scale to larger networks is not always obvious. Faithfully reproducing brain functionality, when neuroscience itself still has so many open questions, is challenging for any technology. Moreover, attempting this with technologies that potentially allow fewer possibilities for engineering than traditional methods (such as CMOS) might well be mission impossible. How far we can go, in terms of scalability, by reconstructing the brain neuron by neuron and synapse by synapse remains unclear. A more promising way might be to achieve a deeper understanding of the physics of the relevant materials and, based on this understanding, to co-develop the technology and the learning methods required for achieving AI.

In the meantime, we can in parallel explore simple bio-inspired approaches that harness the dynamics of the material and could prove useful for particular sets of problems. Here, we present one such example, which stems from the area of reservoir computing, an idea invented separately by Herbert Jaeger for the ML community,[119] under the name of echo state networks, and by Wolfgang Maass[120] for the computational neuroscience community, under the name of liquid state machines. We strongly suspect that both these methods were very much motivated by the difficulty of training recurrent networks with a generalization of backpropagation known as backpropagation through time. While feedforward networks can perform many tasks successfully, recurrences are required for memory, and moreover, the brain is clearly not only feedforward. If recurrences exist and are required, there must be a way to efficiently train such structures. As a side note, it is very difficult to imagine how a biological neural network would be able to implement backpropagation through time, and for this reason alternative approaches have recently made their appearance.[121]

Reservoir computing methods came up with a workaround to the problem of training recurrent networks: they do not train them but instead harness their properties. Common to echo state networks and liquid state machines is the idea of using a randomly connected recurrent network with fixed connectivity, so there is no need to resort to backpropagation through time. This recurrent network is called a reservoir. It provides memory and at the same time transforms the input data into a spatiotemporal representation of higher dimensionality. This enhanced representation can be used as input to single-layer perceptrons that are trained with a very simple learning method, so the only learnable parameters are the feedforward weights between the reservoir neurons and the output neurons. The key difference between echo state networks and liquid state machines is that the first approach uses recurrent artificial neuron dynamics, whereas the second uses recurrent SNNs, reflecting the mindset of their corresponding communities. The main principle of reservoir computing is shown in Figure 13. The input x(t) is projected into the higher dimensional feature space r(t) using the dynamical reservoir system. Only the weights connecting the internal states r(t) with the output y(t) need to be trained, whereas the rest of the system is fixed. The advantage of this approach is that it requires only a simple training method, whereas the ability to process complex and temporal data is retained.

Reservoir computing maps inputs x(t) to higher dimensional space, defined by the reservoir states r(t). Only weights connecting reservoir states r(t) and output y(t) need to be trained.
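A minimal echo state network sketch along these lines is shown below: a fixed random recurrent reservoir maps the input into a higher-dimensional state r(t), and only the linear readout is trained, here with ridge regression. The task (one-step-ahead prediction of a sine wave), the reservoir size, the spectral radius, and the regularization constant are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n_res, T = 100, 1000
W_in = rng.uniform(-0.5, 0.5, size=n_res)            # fixed input weights
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))       # scale spectral radius below 1

u = np.sin(0.2 * np.arange(T + 1))                    # input signal
target = u[1:]                                        # task: one-step-ahead prediction

r, states = np.zeros(n_res), []
for t in range(T):
    r = np.tanh(W @ r + W_in * u[t])                  # fixed, untrained reservoir dynamics
    states.append(r.copy())
R = np.array(states)

# Train only the readout weights with ridge regression.
ridge = 1e-6
W_out = np.linalg.solve(R.T @ R + ridge * np.eye(n_res), R.T @ target)
pred = R @ W_out
print("readout NMSE:", np.mean((pred - target) ** 2) / np.var(target))
```

In a physical realization, the loop marked as "reservoir dynamics" would be replaced by the intrinsic dynamics of a suitable material, with only the readout computed and trained electronically.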

Indeed, it might be surprising how much randomness can do from the point of view of computation: a random network can enrich data representations sufficiently, so that a linear method can separate the data into the desired classes. This approach is conceptually similar to the well-known method of support vector machines, which uses kernels to augment the dimensionality of the data, so that, again, only a simple linear method is sufficient to achieve data classification. In fact, a link between the purely statistical technique of support vector machines and the bio-inspired technique of reservoir computing has been formally established.[122] We can, perhaps, think of this link as a demonstration that biological inspiration and purely mathematical methodology might solve problems in a similar manner.

We claim that reservoir computing would benefit from appropriate hardware. When simulated, the convergence of the recurrent network takes time, because the continuous system must be discretized and run sequentially on a central processing unit. If instead we replace the reservoir with an appropriate material, this step could become both fast and energy efficient: the material could compute effortlessly using its physical properties. Reservoirs do not need overengineering, because no specific structure is required; we only need to produce dynamics that are complex enough but not chaotic. In fact, there has already been work exploiting memristors in this direction.[123]

Could ideas from biology still add value to existing methods? A recent augmentation of echo state networks,[124] inspired by the fruit fly brain, explores the concept of sparseness to improve the learning performance of reservoirs. In brains, contrary to typical ANNs, only a few neurons fire at a time, a fact that has been linked to memory capacity. Neuronal thresholds, appropriately initialized and updated with a slower time constant than that of the learnable feedforward weights, can modulate sparseness and lead to better performance in comparison not only with the non-sparse reservoir, but also with state-of-the-art methods on a set of benchmark problems. Because the sparseness leads to task-specific neurons, this bio-inspired technique can alleviate the problem of catastrophic forgetting. ML methods often suffer from the fact that once they learn a new task, they have forgotten the previous one. In the sparse reservoir network, a new task will likely recruit previously unused neurons, so learning a new skill does not completely override those previously learned. This simple method competes with and surpasses more complicated methods that are built specifically to address catastrophic forgetting. Most importantly, the formulation of the specific rule allows for completely replacing the network dynamics with any other dynamics, including material dynamics, that are suitable for the purpose (i.e., highly non-linear and not chaotic). Perhaps there are more such lessons to be learned from biology.

So, what can be done right now? To us, it is clear that a better understanding of the physics behind memristive devices is key for the progress of the field.[125] A deeper understanding will allow us to harness the properties of the system for brain-like computation rather than trying to fabricate some arbitrary brain behavior that may or may not be important in the context of a specific application or, worse, may not scale up. Rather than thinking at the level of mimicking neurons and synapses, we can take inspiration from biological systems, consider the dynamics required for neuronal processing, and use the material physics to reproduce them.

IBM’s Incredible TrueNorth Chip || Neuromorphic Computing

8 Conclusions

Memristor technologies have yet to realize the full potential that has been promoted over the last 15 years. Although they are predominantly seen as candidates to replace or augment our current digital memory technologies, the impact of memristor technologies on the broader fields of AI and cognitive computing platforms is likely to be even more significant. As discussed in this progress report, the versatility of memristor technologies has resulted in their use across a range of applications: from in-memory computing, DL accelerators, and SNNs, to more futuristic bio-inspired computing paradigms. These approaches should not be seen as solutions to the same problem, nor as technologies that are in direct competition among themselves or with current, very successful, CMOS systems. In addition, it is crucial to recognize that many of the discussed research areas are still at the very beginning of their development. Of these, the more mature approaches will likely produce industrially relevant solutions sooner. For example, greater power efficiency is an essential utility and a pressing issue that many engineers are trying to address, and in-memory computing and DL accelerators based on memristors represent an attractive proposition for extreme power efficiency.

There is also significant scope for more fundamental work. Development of new generations of bio-inspired algorithms would further boost advancements in hardware systems and platforms. The challenge and opportunity lie in the interdisciplinary nature of the research and the necessity to understand distinct methodologies and approaches. We believe that the community will benefit from the next generation of researchers being well educated across different traditional disciplines. For example, there is an undeniable link between the fields of computer science, more specifically, ML, and computational neuroscience. The two disciplines could co-exist separately and act independently with distinct goals; however, there are great benefits to be gained from a more holistic approach. A strong case for closer collaboration has been made recently.[126] Collaborations should be expanded to include researchers in solid-state physics, materials science, nanoelectronics, circuit/architecture design, and information theory. Memristors show great promise to be a fabric for producing brain-inspired building blocks,[127] and this progress report showcases different types of memristor-based applications. Memristor technologies are versatile enough to provide the perfect platform for different disciplines to strive together in pushing the frontiers of our current technologies in the most fundamental way. There are many specifics of memristive technologies that have not been covered in this progress report. For example, optical control of memristive devices could potentially bring additional benefits in terms of higher bandwidth, lower crosstalk and faster operational speeds, and could integrate sensing together with memory and processing.[128-131] Integration of multiple functionalities in a compact nanodevice could lead to even better power efficiency of neuromorphic systems, such as artificial retinas.[132]

The progress report presents a broader landscape of ways memristors could be utilized for future computing systems. We aim to provide a general overview of the main approaches currently being pursued. In addition, the report provides some speculation about what might be missing in the research field and what efforts could be fruitful in the future. As such, the report does not include all details of different memristor technologies, materials systems, or specific technology challenges. There are a number of excellent review articles that cover these specifics in much more detail, and we refer readers to those. More specifically, an excellent overview of memristor-based electronics that discusses future prospects and current challenges can be found in the previous study.[7] More details about in-memory computing, including digital and analog schemes, are covered in the previous study.[16] The use of PCMs for brain-inspired computing is specifically discussed in the previous study,[23] whereas the use of redox-based memristors can be found in another review.[133] Other types of memristive technologies not mentioned in this progress report, including ferroelectric memories, non-filamentary resistive random-access memory, and topological insulators, are covered in the recent guest editorial.[134] Integration of CMOS and memristive technologies for neuromorphic applications is discussed in the previous study.

Advances in neuromorphic computing technology




More Information:

https://onlinelibrary.wiley.com/doi/full/10.1002/aisy.202000105

https://www.intel.com/content/www/us/en/research/neuromorphic-computing-loihi-2-technology-brief.html

https://www.nature.com/articles/s41467-018-07565-4

https://www.nature.com/articles/s43588-021-00184-y

http://meseec.ce.rit.edu/551-projects/winter2011/2-5.pdf

https://www.frontiersin.org/articles/10.3389/fnins.2020.00551/full

https://download.intel.com/newsroom/2021/new-technologies/neuromorphic-computing-loihi-2-brief.pdf

https://www.youtube.com/playlist?list=PL5Q2soXY2Zi-Mnk1PxjEIG32HAGILkTOF

https://arxiv.org/pdf/2105.05956.pdf

https://www.cs.utah.edu/~rajeev/cs7960/notes/slides/19-7960-01.pptx



RRAM Compute-In-Memory Hardware For Edge Intelligence


Reconfigurable NeuRRAM-CIM architecture

A NeuRRAM chip consists of 48 CIM cores that can perform computation in parallel. A core can be selectively turned off through power gating when not actively used, whereas the model weights are retained by the non-volatile RRAM devices. Central to each core is a transposable neurosynaptic array (TNSA) consisting of 256 × 256 RRAM cells and 256 CMOS neuron circuits that implement analogue-to-digital converters (ADCs) and activation functions. Additional peripheral circuits along the edge provide inference control and manage RRAM programming.

The TNSA architecture is designed to offer flexible control of dataflow directions, which is crucial for enabling diverse model architectures with different dataflow patterns. For instance, in CNNs that are commonly applied to vision-related tasks, data flows in a single direction through layers to generate data representations at different abstraction levels; in LSTMs that are used to process temporal data such as audio signals, data travel recurrently through the same layer for multiple time steps; in probabilistic graphical models such as a restricted Boltzmann machine (RBM), probabilistic sampling is performed back and forth between layers until the network converges to a high-probability state. Besides inference, the error back-propagation during gradient-descent training of multiple AI models requires reversing the direction of dataflow through the network.

However, conventional RRAM-CIM architectures are limited to performing matrix-vector multiplication (MVM) in a single direction by hardwiring rows and columns of the RRAM crossbar array to dedicated circuits on the periphery to drive inputs and measure outputs. Some studies implement reconfigurable dataflow directions by adding extra hardware, which incurs substantial energy, latency and area penalties (Extended Data Fig. 2): executing bidirectional (forwards and backwards) dataflow requires either duplicating power-hungry and area-hungry ADCs at both ends of the RRAM array11,34 or dedicating a large area to routing both rows and columns of the array to shared data converters15; the recurrent connections require writing the outputs to a buffer memory outside of the RRAM array, and reading them back for the next time-step computation.

The TNSA architecture realizes dynamic dataflow reconfigurability with little overhead. Whereas in conventional designs, CMOS peripheral circuits such as ADCs connect at only one end of the RRAM array, the TNSA architecture physically interleaves the RRAM weights and the CMOS neuron circuits, and connects them along the length of both rows and columns. As shown in Fig. 2e, a TNSA consists of 16 × 16 of such interleaved corelets that are connected by shared bit-lines (BLs) and word-lines (WLs) along the horizontal direction and source-lines (SLs) along the vertical direction. Each corelet encloses 16 × 16 RRAM devices and one neuron circuit. The neuron connects to 1 BL and 1 SL out of the 16 BLs and the 16 SLs that pass through the corelet, and is responsible for integrating inputs from all the 256 RRAMs connecting to the same BL or SL. Sixteen of these RRAMs are within the same corelet as the neuron; and the other 240 are within the other 15 corelets along the same row or column. Specifically, Fig. 2f shows that the neuron within corelet (i, j) connects to the (16i + j)th BL and the (16j + i)th SL. Such a configuration ensures that each BL or SL connects uniquely to a neuron, while doing so without duplicating neurons at both ends of the array, thus saving area and energy.
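The interleaving rule can be checked with a few lines of Python. The snippet below simply enumerates the stated mapping (the neuron in corelet (i, j) driving BL 16i + j and SL 16j + i) and verifies that every one of the 256 BLs and 256 SLs is claimed by exactly one neuron; it is a sanity check of the wiring arithmetic described above, not chip code.

```python
# Verify that the TNSA corelet wiring assigns each bit-line (BL) and
# source-line (SL) to exactly one neuron: neuron (i, j) -> BL 16*i + j, SL 16*j + i.
bl_owners = {}
sl_owners = {}
for i in range(16):
    for j in range(16):
        bl, sl = 16 * i + j, 16 * j + i
        assert bl not in bl_owners and sl not in sl_owners
        bl_owners[bl] = (i, j)
        sl_owners[sl] = (i, j)

assert sorted(bl_owners) == list(range(256))
assert sorted(sl_owners) == list(range(256))
print("256 BLs and 256 SLs each map to a unique neuron")
```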

Circuit Design and Silicon Prototypes for Compute-in-Memory for Deep Learning Inference Engine


Moreover, a neuron uses its BL and SL switches for both its input and output: it not only receives the analogue MVM output coming from BL or SL through the switches but also sends the converted digital results to peripheral registers through the same switches. By configuring which switch to use during the input and output stages of the neuron, we can realize various MVM dataflow directions. Figure 2g shows the forwards, backwards and recurrent MVMs enabled by the TNSA. To implement forwards MVM (BL to SL), during the input stage, input pulses are applied to the BLs through the BL drivers, get weighted by the RRAMs and enter the neuron through its SL switch; during the output stage, the neuron sends the converted digital outputs to SL registers through its SL switch; to implement recurrent MVM (BL to BL), the neuron instead receives input through its SL switch and sends the digital output back to the BL registers through its BL switch.

Weights of most AI models take both positive and negative values. We encode each weight as difference of conductance between two RRAM cells on adjacent rows along the same column (Fig. 2h). The forwards MVM is performed using a differential input scheme, where BL drivers send input voltage pulses with opposite polarities to adjacent BLs. The backwards MVM is performed using a differential output scheme, where we digitally subtract outputs from neurons connecting to adjacent BLs after neurons finish analogue-to-digital conversions.

To maximize throughput of AI inference on 48 CIM cores, we implement a broad selection of weight-mapping strategies that allow us to exploit both model parallelism and data parallelism (Fig. 2a) through multi-core parallel MVMs. Using a CNN as an example, to maximize data parallelism, we duplicate the weights of the most computationally intensive layers (early convolutional layers) to multiple cores for parallel inference on multiple data; to maximize model parallelism, we map different convolutional layers to different cores and perform parallel inference in a pipelined fashion. Meanwhile, we divide the layers whose weight dimensions exceed the RRAM array size into multiple segments and assign them to multiple cores for parallel execution. A more detailed description of the weight-mapping strategies is provided in Methods. The intermediate data buffers and partial-sum accumulators are implemented by a field-programmable gate array (FPGA) integrated on the same board as the NeuRRAM chip. Although these digital peripheral modules are not the focus of this study, they will eventually need to be integrated within the same chip in production-ready RRAM-CIM hardware.

Efficient voltage-mode neuron circuit

Figure 1d and Extended Data Table 1 show that the NeuRRAM chip achieves 1.6-times to 2.3-times lower energy-delay product (EDP) and 7-times to 13-times higher computational density (measured by throughput per million RRAMs) at various MVM input and output bit-precisions than previous state-of-the-art RRAM-based CIM chips, despite being fabricated at an older technology node17,18,19,20,21,22,23,24,25,26,27,36. The reported energy and delay are measured for performing an MVM with a 256 × 256 weight matrix. It is noted that these numbers and those reported in previous RRAM-CIM work represent the peak energy efficiency achieved when the array utilization is 100% and do not account for energy spent on intermediate data transfer. Network-on-chip and program scheduling need to be carefully designed to achieve good end-to-end application-level energy efficiency.

Key to the NeuRRAM’s EDP improvement is a novel in-memory MVM output-sensing scheme. The conventional approach is to use voltage as input and measure the current as the result, based on Ohm’s law (Fig. 3a). Such a current-mode-sensing scheme cannot fully exploit the high-parallelism nature of CIM. First, simultaneously turning on multiple rows leads to a large array current. Sinking the large current requires peripheral circuits to use large transistors, whose area needs to be amortized by time-multiplexing between multiple columns, which limits ‘column parallelism’. Second, MVM results produced by different neural-network layers have drastically different dynamic ranges. Optimizing ADCs across such a wide dynamic range is difficult. To equalize the dynamic range, designs typically activate a fraction of input wires every cycle to compute a partial sum, and thus require multiple cycles to complete an MVM, which limits ‘row parallelism’.

NeuRRAM improves computation parallelism and energy efficiency by virtue of a neuron circuit implementing a voltage-mode sensing scheme. The neuron performs analogue-to-digital conversion of the MVM outputs by directly sensing the settled open-circuit voltage on the BL or SL line capacitance39 (Fig. 3b): voltage inputs are driven on the BLs whereas the SLs are kept floating, or vice versa, depending on the MVM direction. WLs are activated to start the MVM operation. The voltage on the output line settles to the weighted average of the voltages driven on the input lines, where the weights are the RRAM conductances. Upon deactivating the WLs, the output is sampled by transferring the charge on the output line to the neuron sampling capacitor (Csample in Fig. 3d). The neuron then accumulates this charge onto an integration capacitor (Cinteg) for subsequent analogue-to-digital conversion.

Such voltage-mode sensing obviates the need for power-hungry and area-hungry peripheral circuits to sink large current while clamping voltage, improving energy and area efficiency and eliminating output time-multiplexing. Meanwhile, the weight normalization owing to the conductance weighting in the voltage output (Fig. 3c) results in an automatic output dynamic range normalization for different weight matrices. Therefore, MVMs with different weight dimensions can all be completed within a single cycle, which significantly improves computational throughput. To eliminate the normalization factor from the final results, we pre-compute its value and multiply it back to the digital outputs from the ADC.
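A small numerical sketch helps to see why the conductance-weighted average normalizes the output range automatically. The snippet below is purely illustrative, with made-up conductances: it compares the current-mode sum with the settled voltage V_out = Σ V_i·G_i / Σ G_i for two weight columns whose conductances differ by an order of magnitude.

```python
import numpy as np

rng = np.random.default_rng(1)
v_in = rng.choice([-0.2, 0.0, 0.2], size=256)       # input pulses around Vref = 0

def column_outputs(scale):
    g = scale * rng.uniform(1e-6, 40e-6, size=256)   # RRAM conductances (S), illustrative
    i_out = np.sum(v_in * g)                         # current-mode result (A), scales with g
    v_out = np.sum(v_in * g) / np.sum(g)             # voltage-mode settled voltage (V)
    return i_out, v_out

for scale in (1.0, 10.0):
    i_out, v_out = column_outputs(scale)
    print(f"conductance scale x{scale:>4}:  I_out = {i_out:+.2e} A,  V_out = {v_out:+.4f} V")
```

The current-mode output grows with the overall conductance level, while the voltage-mode output stays within the same range; the normalization factor is pre-computed and multiplied back digitally, as described above.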

Neuromorphic NeuRRAM Chip AI Developed, Performs Computations in Memory Without Network Connectivity

Our voltage-mode neuron supports MVM with 1-bit to 8-bit inputs and 1-bit to 10-bit outputs. The multi-bit input is realized in a bit-serial fashion where charge is sampled and integrated onto Cinteg for 2n−1 cycles for the nth least significant bit (LSB) (Fig. 3e). For MVM inputs greater than 4 bits, we break the bit sequence into two segments, compute MVM for each segment separately and digitally perform a shift-and-add to obtain the final results (Fig. 3f). Such a two-phase input scheme improves energy efficiency and overcomes voltage headroom clipping at high-input precisions.

The multi-bit output is generated through a binary search process (Fig. 3g). Every cycle, neurons add or subtract CsampleVdecr amount of charge from Cinteg, where Vdecr is a bias voltage shared by all neurons. Neurons then compare the total charge on Cinteg with a fixed threshold voltage Vref to generate a 1-bit output. From the most significant bit (MSB) to the least significant bit (LSB), Vdecr is halved every cycle. Compared with other ADC architectures that implement a binary search, our ADC scheme eliminates the residue amplifier of an algorithmic ADC, and does not require an individual DAC for each ADC to generate reference voltages like a successive approximation register (SAR) ADC40. Instead, our ADC scheme allows sharing a single digital-to-analogue converter (DAC) across all neurons to amortize the DAC area, leading to a more compact design. The multi-bit MVM is validated by comparing ideal and measured results, as shown in Fig. 3h and Extended Data Fig. 5. More details on the multi-bit input and output implementation can be found in Methods.

The neuron can also be reconfigured to directly implement Rectified Linear Unit (ReLU)/sigmoid/tanh as activations when needed. In addition, it supports probabilistic sampling for stochastic activation functions by injecting pseudo-random noise generated by a linear-feedback shift register (LFSR) block into the neuron integrator. All the neuron circuit operations are performed by dynamically configuring a single amplifier in the neuron as either an integrator or a comparator during different phases of operations, as detailed in Methods. This results in a more compact design than other work that merges ADC and neuron activation functions within the same module12,13. Although most existing CIM designs use time-multiplexed ADCs for multiple rows and columns to amortize the ADC area, the compactness of our neuron circuit allows us to dedicate a neuron to each pair of BL and SL, and tightly interleave the neuron with RRAM devices within the TNSA architecture, as can be seen in the Extended Data figures.

Hardware-algorithm co-optimizations

The innovations on the chip architecture and circuit design bring superior efficiency and reconfigurability to NeuRRAM. To complete the story, we must ensure that AI inference accuracy can be preserved under various circuit and device non-idealities3,41. We developed a set of hardware-algorithm co-optimization techniques that allow NeuRRAM to deliver software-comparable accuracy across diverse AI applications. Importantly, all the AI benchmark results presented in this paper are obtained entirely from hardware measurements on complete datasets. Although most previous efforts (with a few exceptions8,17) have reported benchmark results using a mixture of hardware characterization and software simulation, for example, emulate the array-level MVM process in software using measured device characteristics3,5,21,24, such an approach often fails to model the complete set of non-idealities existing in realistic hardware. As shown in Fig. 4a, these non-idealities may include (1) Voltage drop on input wires (Rwire), (2) on RRAM array drivers (Rdriver) and (3) on crossbar wires (e.g. BL resistance RBL), (4) limited RRAM programming resolution, (5) RRAM conductance relaxation41, (6) capacitive coupling from simultaneously switching array wires, and (7) limited ADC resolution and dynamic range. Our experiments show that omitting certain non-idealities in simulation leads to over-optimistic prediction of inference accuracy. For example, the third and the fourth bars in Fig. 5a show a 2.32% accuracy difference between simulation and measurement for CIFAR-10 classification19, whereas the simulation accounts for only non-idealities (5) and (7), which are what previous studies most often modelled5,21.
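To make the point about over-optimistic simulations concrete, here is a hedged sketch of the kind of partial model referred to above: it emulates only non-idealities (5) and (7), that is, Gaussian conductance relaxation on the stored weights and a limited-resolution ADC on the MVM outputs, and deliberately leaves out voltage drops and coupling. The noise level and ADC range are illustrative assumptions.

```python
import numpy as np

def mvm_with_partial_nonidealities(W, x, rel_sigma=0.10, adc_bits=8, adc_range=4.0):
    """Emulate an MVM that models only conductance relaxation (5) and ADC quantization (7).

    W          : weight matrix already mapped to normalized differential conductances
    x          : input vector
    rel_sigma  : relaxation noise std, as a fraction of the max |weight| (assumption)
    adc_bits   : ADC resolution
    adc_range  : full-scale output range assumed for the ADC
    """
    rng = np.random.default_rng()
    w_noisy = W + rng.normal(0.0, rel_sigma * np.abs(W).max(), size=W.shape)
    y = w_noisy @ x
    # Quantize outputs to the ADC's resolution and clip to its range.
    lsb = adc_range / (2 ** adc_bits)
    y = np.clip(y, -adc_range / 2, adc_range / 2)
    return np.round(y / lsb) * lsb

W = np.random.default_rng(0).normal(0, 0.1, (64, 256))
x = np.random.default_rng(1).uniform(-1, 1, 256)
print("ideal vs partial-model RMSE:",
      np.sqrt(np.mean((W @ x - mvm_with_partial_nonidealities(W, x)) ** 2)))
```

A simulation built only from such a model misses the input-dependent effects of voltage drops and coupling, which is why it predicts higher accuracy than the chip actually delivers.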

Our hardware-algorithm co-optimization approach includes three main techniques: (1) model-driven chip calibration, (2) noise-resilient neural-network training and analogue weight programming, and (3) chip-in-the-loop progressive model fine-tuning. Model-driven chip calibration uses the real model weights and input data to optimize chip operating conditions such as input voltage pulse amplitude, and records any ADC offsets for subsequent cancellation during inference. Ideally, the MVM output voltage dynamic range should fully utilize the ADC input swing to minimize discretization error. However, without calibration, the MVM output dynamic range varies with network layers even with the weight normalization effect of the voltage-mode sensing. To calibrate MVM to the optimal dynamic range, for each network layer, we use a subset of training-set data as calibration input to search for the best operating conditions (Fig. 4b). Extended Data Fig. 6 shows that different calibration input distributions lead to different output distributions. To ensure that the calibration data can closely emulate the distribution seen at test time, it is therefore crucial to use training-set data as opposed to randomly generated data during calibration. It is noted that when performing MVM on multiple cores in parallel, those shared bias voltages cannot be optimized for each core separately, which might lead to sub-optimal operating conditions and additional accuracy loss (detailed in Methods).
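The calibration search can be pictured as a small loop like the one below: for each layer, a hypothetical measure_output_distribution callback runs a subset of training data through the chip at a candidate read-voltage amplitude, and we keep the amplitude whose outputs best fill the ADC input swing without clipping. The function names, candidate values and scoring heuristic are all illustrative assumptions, not the chip's API.

```python
import numpy as np

def calibrate_read_voltage(layer, calib_inputs, measure_output_distribution,
                           candidate_amplitudes=(0.1, 0.15, 0.2, 0.25, 0.3),
                           adc_swing=1.0, clip_penalty=10.0):
    """Pick the read-voltage amplitude whose MVM outputs best utilize the ADC swing.

    measure_output_distribution(layer, inputs, amplitude) is a hypothetical callback
    that returns chip-measured analogue outputs for a calibration batch.
    """
    best_amp, best_score = None, -np.inf
    for amp in candidate_amplitudes:
        outputs = measure_output_distribution(layer, calib_inputs, amp)
        utilization = np.percentile(np.abs(outputs), 99) / (adc_swing / 2)
        clipped = np.mean(np.abs(outputs) > adc_swing / 2)
        score = utilization - clip_penalty * clipped   # favour full swing, punish clipping
        if score > best_score:
            best_amp, best_score = amp, score
    return best_amp

# Example with a fake measurement model standing in for the chip:
fake = lambda layer, x, amp: amp * (x @ np.random.default_rng(0).normal(0, 0.01, (x.shape[1], 64)))
x_calib = np.random.default_rng(1).uniform(-1, 1, (128, 256))
print(calibrate_read_voltage(None, x_calib, fake))
```

Using training-set data as calib_inputs, rather than random data, is what makes the searched operating point match the distribution seen at test time, as noted above.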

Weier Wan's PhD Defense @ Stanford -- RRAM Compute-In-Memory Hardware For Edge Intelligence

Stochastic non-idealities such as RRAM conductance relaxation and read noises degrade the signal-to-noise ratio (SNR) of the computation, leading to an inference accuracy drop. Some previous work obtained a higher SNR by limiting each RRAM cell to store a single bit, and encoding higher-precision weights using multiple cells9,10,16. Such an approach lowers the weight memory density. Accompanying that approach, the neural network is trained with weights quantized to the corresponding precision. In contrast, we utilize the intrinsic analogue programmability of RRAM42 to directly store high-precision weights and train the neural networks to tolerate the lower SNR. Instead of training with quantized weights, which is equivalent to injecting uniform noise into weights, we train the model with high-precision weights while injecting noise with the distribution measured from RRAM devices. RRAMs on NeuRRAM are characterized to have a Gaussian-distributed conductance spread, caused primarily by conductance relaxation. Therefore, we inject a Gaussian noise into weights during training, similar to a previous study21. Figure 5a shows that the technique significantly improves the model’s immunity to noise, from a CIFAR-10 classification accuracy of 25.34% without noise injection to 85.99% with noise injection. After the training, we program the non-quantized weights to RRAM analogue conductances using an iterative write–verify technique, described in Methods. This technique enables NeuRRAM to achieve an inference accuracy equivalent to models trained with 4-bit weights across various applications, while encoding each weight using only two RRAM cells, which is two-times denser than previous studies that require one RRAM cell per bit.
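A minimal PyTorch-style sketch of the noise-injection idea is given below, built around a hypothetical NoisyLinear layer: during the forward pass of training, zero-mean Gaussian noise with a standard deviation set to a fraction of the layer's maximum absolute weight is added to the high-precision weights, while the clean weights are what ultimately get programmed. The exact layer types and noise fractions used in the actual experiments differ.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Linear):
    """Linear layer that injects weight noise during training to emulate
    RRAM conductance relaxation (hypothetical helper, not the paper's code)."""

    def __init__(self, in_features, out_features, noise_frac=0.10):
        super().__init__(in_features, out_features)
        self.noise_frac = noise_frac

    def forward(self, x):
        if self.training:
            sigma = self.noise_frac * self.weight.abs().max()
            noisy_w = self.weight + torch.randn_like(self.weight) * sigma
            return nn.functional.linear(x, noisy_w, self.bias)
        return super().forward(x)

layer = NoisyLinear(256, 10, noise_frac=0.20)   # train with more noise than expected at test time
x = torch.randn(8, 256)
layer.train();  y_train = layer(x)              # noisy forward pass
layer.eval();   y_test = layer(x)               # clean forward pass
print(y_train.shape, y_test.shape)
```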

By applying the above two techniques, we can already measure inference accuracy comparable to or better than software models with 4-bit weights on Google speech command recognition, MNIST image recovery and MNIST classification (Fig. 1e). For deeper neural networks, we found that the error caused by those non-idealities that have nonlinear effects on MVM outputs, such as voltage drops, can accumulate through layers and become more difficult to mitigate. In addition, multi-core parallel MVM leads to large instantaneous current, further exacerbating non-idealities such as voltage drop on input wires ((1) in Fig. 4a). As a result, when performing multi-core parallel inference on a deep CNN, ResNet-2043, the measured accuracy on CIFAR-10 classification (83.67%) is still 3.36% lower than that of a 4-bit-weight software model (87.03%).

To bridge this accuracy gap, we introduce a chip-in-the-loop progressive fine-tuning technique. Chip-in-the-loop training mitigates the impact of non-idealities by measuring training error directly on the chip44. Previous work has shown that fine-tuning the final layers using the back-propagated gradients calculated from hardware-measured outputs helped improve accuracy5. We find this technique to be of limited effectiveness in countering those nonlinear non-idealities. Such a technique also requires re-programming RRAM devices, which consumes additional energy. Our chip-in-the-loop progressive fine-tuning overcomes nonlinear model errors by exploiting the intrinsic nonlinear universal approximation capacity of the deep neural network45, and furthermore eliminates the need for weight re-programming. Figure 4d illustrates the fine-tuning procedure. We progressively program the weights one layer at a time onto the chip. After programming a layer, we perform inference using the training-set data on the chip up to that layer, and use the measured outputs to fine-tune the remaining layers that are still training in software. In the next time step, we program and measure the next layer on the chip. We repeat this process until all the layers are programmed. During the process, the non-idealities of the programmed layers can be progressively compensated by the remaining layers through training. Figure 5b shows the efficacy of this progressive fine-tuning technique. From left to right, each data point represents a new layer programmed onto the chip. The accuracy at each layer is evaluated by using the chip-measured outputs from that layer as inputs to the remaining layers in software. The cumulative CIFAR-10 test-set inference accuracy is improved by 1.99% using this technique. Extended Data Fig. 8a further illustrates the extent to which fine-tuning recovers the training-set accuracy loss at each layer, demonstrating the effectiveness of the approach in bridging the accuracy gap between software and hardware measurements.
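The fine-tuning loop can be summarized in the Python sketch below. The callbacks program_layer_on_chip, run_chip_up_to_layer and finetune_in_software, and the model object with an ordered layers list, are hypothetical placeholders for the hardware programming, on-chip inference and software training steps described above; the sketch only captures the control flow of programming one layer at a time and letting the layers still in software absorb the measured errors.

```python
def chip_in_the_loop_progressive_finetune(model, train_data,
                                          program_layer_on_chip,
                                          run_chip_up_to_layer,
                                          finetune_in_software):
    """Control-flow sketch of progressive chip-in-the-loop fine-tuning.

    All three callbacks are hypothetical placeholders:
      program_layer_on_chip(layer)           -> writes one layer's weights to RRAM
      run_chip_up_to_layer(k, train_data)    -> chip-measured activations after layer k
      finetune_in_software(model, k, acts)   -> retrains layers k+1.. on those activations
    """
    for k, layer in enumerate(model.layers):
        program_layer_on_chip(layer)                    # freeze layer k on hardware
        measured = run_chip_up_to_layer(k, train_data)  # on-chip inference up to layer k
        if k + 1 < len(model.layers):
            # Remaining software layers learn to compensate for the measured
            # non-idealities of the layers already programmed on the chip.
            finetune_in_software(model, k, measured)
    return model
```

Because each layer is programmed exactly once, no weight re-programming energy is spent, which is the advantage noted above over fine-tuning the final layers with back-propagated gradients.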

ML on the Edge for Industry 4.0 with Arm Ethos N-78 Neural Processing Unit | Embedded World 2021


Using the techniques described above, we achieve inference accuracy comparable to software models trained with 4-bit weights across all the measured AI benchmark tasks. Figure 1e shows that we achieve a 0.98% error rate on MNIST handwritten digit recognition using a 7-layer CNN, a 14.34% error rate on CIFAR-10 object classification using ResNet-20, a 15.34% error rate on Google speech command recognition using a 4-cell LSTM, and a 70% reduction of L2 image-reconstruction error compared with the original noisy images on MNIST image recovery using an RBM. Some of these numbers do not yet match the accuracies achieved by full-precision digital implementations. The accuracy gap mainly comes from low-precision (≤4-bit) quantization of inputs and activations, especially at the most sensitive input and output layers46. For instance, Extended Data Fig. 8b presents an ablation study showing that quantizing input images to 4-bit alone results in a 2.7% accuracy drop for CIFAR-10 classification. By contrast, the input layer accounts for only 1.08% of compute and 0.16% of weights of a ResNet-20 model. These sensitive layers can therefore be off-loaded to higher-precision digital compute units with little overhead. In addition, applying more advanced quantization techniques and optimizing training procedures such as data augmentation and regularization should further improve the accuracy for both quantized software models and hardware-measured results.

Table 1 summarizes the key features of each demonstrated model. Most of the essential neural-network layers and operations are implemented on the chip, including all the convolutional, fully connected and recurrent layers, neuron activation functions, batch normalization and the stochastic sampling process. Other operations such as average pooling and element-wise multiplications are implemented on an FPGA integrated on the same board as NeuRRAM (Extended Data Fig. 11a). Each of the models is implemented by allocating the weights to multiple cores on a single NeuRRAM chip. We developed a software toolchain to allow easy deployment of AI models on the chip47. The implementation details are described in Methods. Fundamentally, each of the selected benchmarks represents a general class of common edge AI tasks: visual recognition, speech processing and image de-noising. These results demonstrate the versatility of the TNSA architecture and the wide applicability of the hardware-algorithm co-optimization techniques.

The NeuRRAM chip simultaneously improves efficiency, flexibility and accuracy over existing RRAM-CIM hardware by innovating across the entire hierarchy of the design, from a TNSA architecture enabling reconfigurable dataflow direction, to an energy- and area-efficient voltage-mode neuron circuit, and to a series of algorithm-hardware co-optimization techniques. These techniques can be more generally applied to other non-volatile resistive memory technologies such as phase-change memory8,17,21,23,24, magnetoresistive RAM48 and ferroelectric field-effect transistors49. Going forwards, we expect NeuRRAM’s peak energy efficiency (EDP) to improve by another two to three orders of magnitude while supporting bigger AI models when scaling from 130-nm to 7-nm CMOS and RRAM technologies (detailed in Methods). Multi-core architecture design with network-on-chip that realizes efficient and versatile data transfers and inter-array pipelining is likely to be the next major challenge for RRAM-CIM37,38, which needs to be addressed by further cross-layer co-optimization. As resistive memory continues to scale towards offering tera-bits of on-chip memory50, such a co-optimization approach will equip CIM hardware on the edge with sufficient performance, efficiency and versatility to perform complex AI tasks that can only be done on the cloud today.

Methods
Core block diagram and operating modes

Fig. 1 shows the block diagram of a single CIM core. To support versatile MVM directions, most of the design is symmetrical in the row (BLs and WLs) and column (SLs) directions. The row and column register files store the inputs and outputs of MVMs, and can be written externally, by either a Serial Peripheral Interface (SPI) or a random-access interface that uses an 8-bit address decoder to select one register entry, or internally by the neurons. The SL peripheral circuits contain an LFSR block used to generate the pseudo-random sequences used for probabilistic sampling. It is implemented by two LFSR chains propagating in opposite directions; the registers of the two chains are XORed to generate spatially uncorrelated random numbers51. The controller block receives commands and generates control waveforms for the BL/WL/SL peripheral logic and the neurons. It contains a delay-line-based pulse generator with tunable pulse width from 1 ns to 10 ns. It also implements the clock-gating and power-gating logic used to turn off the core in idle mode. Each WL, BL and SL of the TNSA is driven by a driver consisting of multiple pass gates that supply different voltages. On the basis of the values stored in the register files and the control signals issued by the controller, the WL/BL/SL logic decides the state of each pass gate.

The core has three main operating modes: a weight-programming mode, a neuron-testing mode and an MVM mode (Extended Data Fig. 1). In the weight-programming mode, individual RRAM cells are selected for read and write. To select a single cell, the registers at the corresponding row and column are programmed to ‘1’ through random access with the help of the row and column decoder, whereas the other registers are reset to ‘0’. The WL/BL/SL logic turns on the corresponding driver pass gates to apply a set/reset/read voltage on the selected cell. In the neuron-testing mode, the WLs are kept at ground voltage (GND). Neurons receive inputs directly from BL or SL drivers through their BL or SL switch, bypassing RRAM devices. This allows us to characterize the neurons independently from the RRAM array. In the MVM mode, each input BL and SL is driven to Vref − Vread, Vref + Vread or Vref depending on the registers’ value at that row or column. If the MVM is in the BL-to-SL direction, we activate the WLs that are within the input vector length while keeping the rest at GND; if the MVM is in the SL-to-BL direction, we activate all the WLs. After neurons finish analogue-to-digital conversion, the pass gates from BLs and SLs to the registers are turned on to allow neuron-state readout.

Device fabrication

RRAM arrays in NeuRRAM are in a one-transistor–one-resistor (1T1R) configuration, where each RRAM device is stacked on top of and connects in series with a selector NMOS transistor that cuts off the sneak path and provides current compliance during RRAM programming and reading. The selector n-type metal-oxide-semiconductor (NMOS), CMOS peripheral circuits and the bottom four back-end-of-line interconnect metal layers are fabricated in a standard 130-nm foundry process. Owing to the higher voltage required for RRAM forming and programming, the selector NMOS and the peripheral circuits that directly interface with RRAM arrays use thick-oxide input/output (I/O) transistors rated for 5-V operation. All the other CMOS circuits in neurons, digital logic, registers and so on use core transistors rated for 1.8-V operations.

The RRAM device is sandwiched between the metal-4 and metal-5 layers, as shown in Fig. 2c. After the foundry completes the fabrication of the CMOS and the bottom four metal layers, we use a laboratory process to finish the fabrication of the RRAM devices, the metal-5 interconnect, and the top metal pad and passivation layers. The RRAM device stack consists of a titanium nitride (TiN) bottom-electrode layer, a hafnium oxide (HfOx) switching layer, a tantalum oxide (TaOx) thermal-enhancement layer52 and a TiN top-electrode layer. They are deposited sequentially, followed by a lithography step to pattern the lateral structure of the device array.

RRAM write–verify programming and conductance relaxation

Each neural-network weight is encoded by the differential conductance between two RRAM cells on adjacent rows along the same column. The first RRAM cell encodes positive weight, and is programmed to a low conductance state (gmin) if the weight is negative; the second cell encodes negative weight, and is programmed to gmin if the weight is positive. Mathematically, the conductances of the two cells are max(gmax·W/wmax, gmin) and max(−gmax·W/wmax, gmin), respectively, where gmax and gmin are the maximum and minimum conductance of the RRAMs, wmax is the maximum absolute value of the weights, and W is the unquantized high-precision weight.
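In code, the differential mapping above might look like the following sketch (conductances in microsiemens; the gmax, gmin and weight values are illustrative):

```python
import numpy as np

def weight_to_conductance_pair(W, g_max=40.0, g_min=1.0):
    """Map unquantized weights W to differential conductance pairs (in uS):
    g_pos = max(g_max * W / w_max, g_min), g_neg = max(-g_max * W / w_max, g_min)."""
    w_max = np.abs(W).max()
    g_pos = np.maximum(g_max * W / w_max, g_min)
    g_neg = np.maximum(-g_max * W / w_max, g_min)
    return g_pos, g_neg

W = np.random.default_rng(0).normal(0, 0.1, (4, 4))
g_pos, g_neg = weight_to_conductance_pair(W)
# The effective weight read back during MVM is proportional to g_pos - g_neg.
print("weights:            ", np.round(W[0], 3))
print("g_pos - g_neg (uS): ", np.round((g_pos - g_neg)[0], 2))
```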

To program an RRAM cell to its target conductance, we use an incremental-pulse write–verify technique42. Extended Data Fig. 3a,b illustrates the procedure. We start by measuring the initial conductance of the cell. If the value is below the target conductance, we apply a weak set pulse aiming to slightly increase the cell conductance. Then we read the cell again. If the value is still below the target, we apply another set pulse with the amplitude incremented by a small amount. We repeat such set–read cycles until the cell conductance is within an acceptance range of the target value or overshoots to the other side of the target. In the latter case, we reverse the pulse polarity to reset, and repeat the same procedure as with set. During the set/reset pulse train, the cell conductance is likely to bounce up and down multiple times until it eventually enters the acceptance range or reaches a time-out limit.
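A behavioural sketch of this incremental-pulse write–verify loop is shown below, with a toy device model standing in for the real RRAM cell. The ToyRRAMCell class and its stochastic response to set/reset pulses are invented for illustration; the pulse voltages, increment and acceptance range mirror the values quoted in the following paragraph.

```python
import random

class ToyRRAMCell:
    """Invented stand-in for an RRAM device: conductance (uS) moves up on set
    pulses and down on reset pulses, with a random response and read noise."""
    def __init__(self):
        self.g = random.uniform(1.0, 40.0)

    def pulse(self, polarity, amplitude):
        step = random.uniform(0.0, 2.0) * amplitude      # stochastic device response
        self.g = max(0.5, self.g + polarity * step)

    def read(self):
        return self.g + random.gauss(0.0, 0.2)           # read noise

def write_verify(cell, target, tol=1.0, max_reversals=30):
    """Incremental-pulse write-verify: alternate set/reset trains until the read
    conductance is within tol (uS) of the target or the reversal budget runs out."""
    v_set, v_reset, v_step = 1.2, 1.5, 0.1
    reversals = 0
    polarity = +1 if cell.read() < target else -1
    amplitude = v_set if polarity > 0 else v_reset
    while reversals < max_reversals:
        g = cell.read()
        if abs(g - target) <= tol:
            return True
        need = +1 if g < target else -1
        if need != polarity:                             # overshoot: reverse polarity
            polarity = need
            amplitude = v_set if polarity > 0 else v_reset
            reversals += 1
        cell.pulse(polarity, amplitude)
        amplitude += v_step                              # increment the pulse amplitude
    return False

cell = ToyRRAMCell()
print("converged:", write_verify(cell, target=20.0), " final g =", round(cell.read(), 2))
```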

There are a few trade-offs in selecting programming conditions. (1) A smaller acceptance range and a higher time-out limit improve programming precision, but require a longer time. (2) A higher gmax improves the SNR during inference, but leads to higher energy consumption and more programming failures for cells that cannot reach high conductance. In our experiments, we set the initial set pulse voltage to 1.2 V and the reset pulse voltage to 1.5 V, both with an increment of 0.1 V and a pulse width of 1 μs. An RRAM read takes 1–10 μs, depending on the cell conductance. The acceptance range is ±1 μS around the target conductance. The time-out limit is 30 set–reset polarity reversals. We used gmin = 1 μS for all the models, and gmax = 40 μS for CNNs and gmax = 30 μS for LSTMs and RBMs. With such settings, 99% of the RRAM cells can be programmed to the acceptance range within the time-out limit. On average, each cell requires 8.52 set/reset pulses. In the current implementation, the speed of such a write–verify process is limited by external control of the DAC and ADC. If everything were integrated into a single chip, such write–verify would take on average 56 µs per cell. Having multiple copies of the DAC and ADC to perform write–verify on multiple cells in parallel would further improve RRAM programming throughput, at the cost of more chip area.

Besides the longer programming time, another reason to not use an overly small write–verify acceptance range is RRAM conductance relaxation. RRAM conductance changes over time after programming. Most of the change happens within a short time window (less than 1 s) immediately following the programming, after which the change becomes much slower, as shown in Extended Data Fig. 3d. The abrupt initial change is called ‘conductance relaxation’ in the literature41. Its statistics follow a Gaussian distribution at all conductance states except when the conductance is close to gmin. Extended Data Fig. 3c,d shows the conductance relaxation measured across the whole gmin-to-gmax conductance range. We found that the loss of programming precision owing to conductance relaxation is much higher than that caused by the write–verify acceptance range. The average standard deviation across all levels of initial conductance is about 2.8 μS. The maximum standard deviation is about 4 μS, which is close to 10% of gmax.

To mitigate the relaxation, we use an iterative programming technique. We iterate over the RRAM array multiple times. In each iteration, we measure all the cells and re-program those whose conductance has drifted outside the acceptance range. Extended Data Fig. 3e shows that the standard deviation becomes smaller with more programming iterations. After 3 iterations, the standard deviation falls to about 2 μS, a 29% decrease compared with the initial value. We use 3 iterations in all our neural-network demonstrations and perform inference at least 30 min after the programming, so that the measured inference accuracy accounts for such conductance relaxation effects. By combining the iterative programming with our hardware-aware model training approach, the impact of relaxation can be largely mitigated.

Processing-In-Memory for Efficient AI Inference at the Edge

Implementation of MVM with multi-bit inputs and outputs

The neuron and the peripheral circuits support MVM at configurable input and output bit-precisions. An MVM operation consists of an initialization phase, an input phase and an output phase. Extended Data Fig. 4 illustrates the neuron circuit operation. During the initialization phase (Extended Data Fig. 4a), all BLs and SLs are precharged to Vref. The sampling capacitors Csample of the neurons are also precharged to Vref, whereas the integration capacitors Cinteg are discharged.

During the input phase, each input wire (either BL or SL depending on MVM direction) is driven to one of three voltage levels, Vref − Vread, Vref and Vref + Vread, through three pass gates, as shown in Fig. 3b. During forwards MVM, under differential-row weight mapping, each input is applied to a pair of adjacent BLs. The two BLs are driven to the opposite voltage with respect to Vref. That is, when the input is 0, both wires are driven to Vref; when the input is +1, the two wires are driven to Vref + Vread and Vref − Vread; and when the input is −1, to Vref − Vread and Vref + Vread. During backwards MVM, each input is applied to a single SL. The difference operation is performed digitally after neurons finish analogue-to-digital conversions.

After biasing the input wires, we then pulse those WLs that have inputs for 10 ns, while keeping the output wires floating. As the voltages of the output wires settle to V_j = Σ_i(V_i·G_ij) / Σ_i(G_ij), where G_ij represents the conductance of the RRAM at the i-th row and the j-th column, we turn off the WLs to stop all current flow. We then sample the charge remaining on the output wire parasitic capacitance onto Csample located within the neurons, followed by integrating the charge onto Cinteg, as shown in Extended Data Fig. 4b. The sampling pulse is 10 ns (limited by the 100-MHz external clock from the FPGA); the integration pulse is 240 ns, limited by the large integration capacitor (104 fF), which was chosen conservatively to ensure functional correctness and to allow testing different neuron operating conditions.

The multi-bit input digital-to-analogue conversion is performed in a bit-serial fashion. For the nth LSB, we apply a single pulse to the input wires, followed by sampling and integrating charge from output wires onto Cinteg for 2n−1 cycles. At the end of multi-bit input phase, the complete analogue MVM output is stored as charge on Cinteg. For example, as shown in Fig. 3e, when the input vectors are 4-bit signed integers with 1 sign-bit and 3 magnitude-bits, we first send pulses corresponding to the first (least significant) magnitude-bit to input wires, followed by sampling and integrating for one cycle. For the second and the third magnitude-bits, we again apply one pulse to input wires for each bit, followed by sampling and integrating for two cycles and four cycles, respectively. In general, for n-bit signed integer inputs, we need a total of n − 1 input pulses and 2n−1 − 1 sampling and integration cycles.
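The bit-serial arithmetic can be mirrored in a few lines of Python: each magnitude-bit contributes one analogue pulse whose sampled charge is integrated 2^(n−1) times, so the integrated total reproduces the full multi-bit dot product. The sketch below works in idealized normalized units and ignores all circuit non-idealities.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.uniform(0.0, 1.0, 256)                  # normalized column conductances
x = rng.integers(-7, 8, 256)                    # 4-bit signed inputs (1 sign + 3 magnitude bits)

signs = np.sign(x)
mags = np.abs(x)

integrated = 0.0
for n in range(1, 4):                           # magnitude bits, LSB first
    bit = (mags >> (n - 1)) & 1
    pulse_inputs = signs * bit                  # one +/- pulse per input wire for this bit
    analogue = np.sum(pulse_inputs * g) / np.sum(g)   # settled output voltage for this bit
    integrated += analogue * (2 ** (n - 1))     # sample-and-integrate 2^(n-1) cycles

ideal = np.sum(x * g) / np.sum(g)
print(np.isclose(integrated, ideal))            # True: the bit-serial sum equals the full MVM
```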

Such a multi-bit input scheme becomes inefficient for high-input bit-precision owing to the exponentially increasing sampling and integration cycles. Moreover, headroom clipping becomes an issue as charge integrated at Cinteg saturates with more integration cycles. The headroom clipping can be overcome by using lower Vread, but at the cost of a lower SNR, so the overall MVM accuracy might not improve when using higher-precision inputs. For instance, Extended Data Fig. 5a,c shows the measured root-mean-square error (r.m.s.e.) of the MVM results. Quantizing inputs to 6-bit (r.m.s.e. = 0.581) does not improve the MVM accuracy compared with 4-bit (r.m.s.e. = 0.582), owing to the lower SNR.

To solve both issues, we use a two-phase input scheme for inputs greater than 4 bits. Figure 3f illustrates the process. To perform MVM with 6-bit inputs and 8-bit outputs, we divide the inputs into two segments, the first containing the three MSBs and the second containing the three LSBs. We then perform MVM, including the output analogue-to-digital conversion, for each segment separately. For the MSBs, neurons (ADCs) are configured to output 8 bits; for the LSBs, neurons output 5 bits. The final results are obtained by shifting and adding the two outputs in the digital domain. Extended Data Fig. 5d shows that the scheme lowers the MVM r.m.s.e. from 0.581 to 0.519. Extended Data Fig. 12c–e further shows that such a two-phase scheme both extends the input bit-precision range and improves the energy efficiency.
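The digital shift-and-add that recombines the two segments can be sketched as follows, assuming 6-bit unsigned magnitudes split into a 3-bit MSB segment and a 3-bit LSB segment, and an idealized MVM function standing in for the on-chip pass; the per-segment ADC bit-widths are as described above.

```python
import numpy as np

def ideal_mvm(W, x):
    return W @ x                                # stand-in for one on-chip MVM pass

rng = np.random.default_rng(2)
W = rng.normal(0, 1, (16, 64))
x = rng.integers(0, 64, 64)                     # 6-bit unsigned inputs

msb = x >> 3                                    # top 3 bits
lsb = x & 0b111                                 # bottom 3 bits

# Two MVM passes, then digital recombination: result = 8 * MVM(msb) + MVM(lsb)
two_phase = 8 * ideal_mvm(W, msb) + ideal_mvm(W, lsb)
print(np.allclose(two_phase, ideal_mvm(W, x)))  # True in this idealized setting
```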

Finally, during the output phase, the analogue-to-digital conversion is again performed in a bit-serial fashion through a binary search process. First, to generate the sign-bit of outputs, we disconnect the feedback loop of the amplifier to turn the integrator into a comparator (Extended Data Fig. 4c). We drive the right side of Cinteg to Vref. If the integrated charge is positive, the comparator output will be GND, and supply voltage VDD otherwise. The comparator output is then inverted, latched and readout to the BL or SL via the neuron BL or SL switch before being written into the peripheral BL or SL registers.

To generate k magnitude-bits, we add or subtract charge from Cinteg (Extended Data Fig. 4d), followed by comparison and readout for k cycles. From MSB to LSB, the amount of charge added or subtracted is halved every cycle. Whether to add or to subtract is automatically determined by the comparison result stored in the latch from the previous cycle. Figure 3g illustrates such a process. A sign-bit of ‘1’ is first generated and latched in the first cycle, representing a positive output. To generate the most significant magnitude-bit, the latch turns on the path from Vdecr− = Vref − Vdecr to Csample. The charge sampled by Csample is then integrated on Cinteg by turning on the negative feedback loop of the amplifier, resulting in CsampleVdecr amount of charge being subtracted from Cinteg. In this example, CsampleVdecr is greater than the original amount of charge on Cinteg, so the total charge becomes negative, and the comparator generates a ‘0’ output. To generate the second magnitude-bit, Vdecr is reduced by half. This time, the latch turns on the path from Vdecr+ = Vref + 1/2Vdecr to Csample. As the total charge on Cinteg after integration is still negative, the comparator outputs a ‘0’ again in this cycle. We repeat this process until the least significant magnitude-bit is generated. It is noted that if the initial sign-bit is ‘0’, all subsequent magnitude-bits are inverted before readout.
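A behavioural model of the binary-search conversion is sketched below in normalized units: a shared decrement value is halved every cycle and either added to or subtracted from the integrated charge depending on the previous comparison, which amounts to a signed successive approximation without a per-ADC DAC. The concrete voltages and capacitor values are abstracted away.

```python
def binary_search_adc(charge, v_decr, magnitude_bits):
    """Behavioural model of the neuron's binary-search conversion (normalized units).

    charge          : integrated MVM result on Cinteg (signed, arbitrary units)
    v_decr          : initial decrement, i.e. half of the full-scale range
    magnitude_bits  : number of magnitude bits to resolve
    """
    sign_bit = 1 if charge > 0 else 0
    bits = [sign_bit]
    residue = charge
    for _ in range(magnitude_bits):
        # Subtract when the residue is positive, add when negative,
        # as decided by the previous comparison result.
        residue += -v_decr if residue > 0 else v_decr
        bit = 1 if residue > 0 else 0
        bits.append(bit if sign_bit else 1 - bit)   # invert magnitude bits for negative outputs
        v_decr /= 2                                 # halve the decrement every cycle
    return bits

print(binary_search_adc(charge=0.37, v_decr=0.5, magnitude_bits=4))  # -> [1, 0, 1, 0, 1]
```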

Such an output conversion scheme is similar to an algorithmic ADC or a SAR ADC in the sense that a binary search is performed for n cycles for a n-bit output. The difference is that an algorithmic ADC uses a residue amplifier, and a SAR ADC requires a multi-bit DAC for each ADC, whereas our scheme does not need a residue amplifier, and uses a single DAC that outputs 2 × (n − 1) different Vdecr+ and Vdecr− levels, shared by all neurons (ADCs). As a result, our scheme enables a more compact design by time-multiplexing an amplifier for integration and comparison, eliminating the residual amplifier, and amortizing the DAC area across all neurons in a CIM core. For CIM designs that use a dense memory array, such a compact design allows each ADC to be time-multiplexed by a fewer number of rows and columns, thus improving throughput.

To summarize, both the configurable MVM input and output bit-precisions and various neuron activation functions are implemented using different combinations of the four basic operations: sampling, integration, comparison and charge decrement. Importantly, all the four operations are realized by a single amplifier configured in different feedback modes. As a result, the design realizes versatility and compactness at the same time.

A Ternary-weight Compute-in-Memory RRAM Macro

Multi-core parallel MVM

NeuRRAM supports performing MVMs in parallel on multiple CIM cores. Multi-core MVM brings additional challenges to computational accuracy, because certain hardware non-idealities that do not manifest in single-core MVM become more severe with more cores. They include voltage drop on input wires, core-to-core variation and supply voltage instability. Voltage drop on input wires (non-ideality (1) in Fig. 4a) is caused by the large current drawn simultaneously from a shared voltage source by multiple cores. It makes the equivalent weights stored in each core vary with the applied inputs, and therefore has a nonlinear, input-dependent effect on MVM outputs. Moreover, as different cores are at different distances from the shared voltage source, they experience different amounts of voltage drop. Therefore, we cannot optimize the read-voltage amplitude separately for each core to make its MVM output occupy exactly the full neuron input dynamic range.

These non-idealities together degrade the multi-core MVM accuracy. Extended Data Fig. 5e,f shows that when performing convolution in parallel on the 3 cores, outputs of convolutional layer 15 are measured to have a higher r.m.s.e. of 0.383 compared with 0.318 obtained by performing convolution sequentially on the 3 cores. In our ResNet-20 experiment, we performed 2-core parallel MVMs for convolutions within block 1 (Extended Data Fig. 9a), and 3-core parallel MVMs for convolutions within blocks 2 and 3.

The voltage-drop issue can be partially alleviated by making the wires that carry large instantaneous current as low resistance as possible, and by employing a power delivery network with more optimized topology. But the issue will persist and become worse as more cores are used. Therefore, our experiments aim to study the efficacy of algorithm-hardware co-optimization techniques in mitigating the issue. Also, it is noted that for a full-chip implementation, additional modules such as intermediate result buffers, partial-sum accumulators and network-on-chip will need to be integrated to manage inter-core data transfers. Program scheduling should also be carefully optimized to minimize buffer size and energy spent at intermediate data movement. Although there are studies on such full-chip architecture and scheduling37,38,53, they are outside the scope of this study.

Noise-resilient neural-network training

During noise-resilient neural-network training, we inject noise into the weights of all fully connected and convolutional layers during the forwards pass of neural-network training to emulate the effects of RRAM conductance relaxation and read noise. The distribution of the injected noise is obtained by RRAM characterization. We used the iterative write–verify technique to program RRAM cells into different initial conductance states and measured their conductance relaxation after 30 min. Extended Data Fig. 3d shows that the measured conductance relaxation has an absolute mean of <1 μS (gmin) at all conductance states. The highest standard deviation is 3.87 μS, about 10% of gmax (40 μS), found at an initial conductance state of about 12 μS. Therefore, to simulate such conductance relaxation behaviour during inference, we inject a Gaussian noise with a zero mean and a standard deviation equal to 10% of the maximum weights of a layer.

We train models with different levels of noise injection from 0% to 40%, and select the model that achieves the highest inference accuracy at 10% noise level for on-chip deployment. We find that injecting a higher noise during training than testing improves models’ noise resiliency. Extended Data Fig. 7a–c shows that the best test-time accuracy in the presence of 10% weight noise is obtained with 20% training-time noise injection for CIFAR-10 image classification, 15% for Google voice command classification and 35% for RBM-based image reconstruction.

For CIFAR-10, the better initial accuracy obtained by the model trained with 5% noise is most likely due to the regularization effect of noise injection. A similar phenomenon has been reported in neural-network quantization literature where a model trained with quantization occasionally outperforms a full-precision model54,55. In our experiments, we did not apply additional regularization on top of noise injection for models trained without noise, which might result in sub-optimal accuracy.

For RBM, Extended Data Fig. 7d further shows how reconstruction errors reduce with the number of Gibbs sampling steps for models trained with different noises. In general, models trained with higher noises converge faster during inference. The model trained with 20% noise reaches the lowest error at the end of 100 Gibbs sampling steps.

Extended Data Fig. 7e shows the effect of noise injection on weight distribution. Without noise injection, the weights have a Gaussian distribution. The neural-network outputs heavily depend on a small fraction of large weights, and thus become vulnerable to noise injection. With noise injection, the weights distribute more uniformly, making the model more noise resilient.

To efficiently implement the models on NeuRRAM, inputs to all convolutional and fully connected layers are quantized to 4-bit or below. The input bit-precisions of all the models are summarized in Table 1. We perform the quantized training using the parameterized clipping activation technique46. The accuracies of some of our quantized models are lower than that of the state-of-the-art quantized model because we apply <4-bit quantization to the most sensitive input and output layers of the neural networks, which have been reported to cause large accuracy degradation and are thus often excluded from low-precision quantization46,54. To obtain better accuracy for quantized models, one can use higher precision for sensitive input and output layers, apply more advanced quantization techniques, and use more optimized data preprocessing, data augmentation and regularization techniques during training. However, the focus of this work is to achieve comparable inference accuracy on hardware and on software while keeping all these variables the same, rather than to obtain state-of-the-art inference accuracy on all the tasks. The aforementioned quantization and training techniques will be equally beneficial for both our software baselines and hardware measurements.

Chip-in-the-loop progressive fine-tuning

During the progressive chip-in-the-loop fine-tuning, we use the chip-measured intermediate outputs from a layer to fine-tune the weights of the remaining layers. Importantly, to fairly evaluate the efficacy of the technique, we do not use the test-set data (for either training or selecting checkpoint) during the entire process of fine-tuning. To avoid over-fitting to a small fraction of data, measurements should be performed on the entire training-set data. We reduce the learning rate to 1/100 of the initial learning rate used for training the baseline model, and fine-tune for 30 epochs, although we observed that the accuracy generally plateaus within the first 10 epochs. The same weight noise injection and input quantization are applied during the fine-tuning.

Marco Rios - Running efficiently CNNs on the Edge thanks to Hybrid SRAM-RRAM in-Memory Computing

Implementations of CNNs, LSTMs and RBMs

We use CNN models for the CIFAR-10 and MNIST image classification tasks. The CIFAR-10 dataset consists of 50,000 training images and 10,000 testing images belonging to 10 object classes. We perform image classification using the ResNet-2043, which contains 21 convolutional layers and 1 fully connected layer (Extended Data Fig. 9a), with batch normalizations and ReLU activations between the layers. The model is trained using the Keras framework. We quantize the input of all convolutional and fully connected layers to a 3-bit unsigned fixed-point format except for the first convolutional layer, where we quantize the input image to 4-bit because the inference accuracy is more sensitive to the input quantization. For the MNIST handwritten digits classification, we use a seven-layer CNN consisting of six convolutional layers and one fully connected layer, and use max-pooling between layers to down-sample feature map sizes. The inputs to all the layers, including the input image, are quantized to a 3-bit unsigned fixed-point format.

All the parameters of the CNNs are implemented on a single NeuRRAM chip including those of the convolutional layers, the fully connected layers and the batch normalization. Other operations such as partial-sum accumulation and average pooling are implemented on an FPGA integrated on the same board as the NeuRRAM. These operations amount to only a small fraction of the total computation and integrating their implementation in digital CMOS would incur negligible overhead; the FPGA implementation was chosen to provide greater flexibility during test and development.

The Extended Data figures illustrate the process of mapping a convolutional layer onto the chip. To implement the weights of a four-dimensional convolutional layer with dimensions H (height), W (width), I (number of input channels) and O (number of output channels) on two-dimensional RRAM arrays, we flatten the first three dimensions into a one-dimensional vector and append the bias term of each output channel to each vector. If the range of the bias values is B times the weight range, we evenly divide the bias values and implement them using B rows. Furthermore, we merge the batch-normalization parameters into the convolutional weights and biases after training (Extended Data Fig. 9b), and program the merged Wʹ and bʹ onto the RRAM arrays such that no explicit batch normalization needs to be performed during inference.
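A NumPy sketch of the flattening and batch-norm folding step is given below, assuming convolution weights stored as H × W × I × O and per-output-channel batch-norm parameters gamma, beta, mean and var; these layout assumptions are ours. It only illustrates the bookkeeping, and the subsequent scaling to differential conductances and the bias-row splitting are omitted.

```python
import numpy as np

def fold_bn_and_flatten(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    """Merge batch norm into conv weights/bias, then flatten HxWxI into rows.

    conv_w : (H, W, I, O) convolution weights
    conv_b : (O,) convolution bias
    gamma, beta, mean, var : (O,) batch-norm parameters
    Returns a (H*W*I + 1, O) matrix whose last row is the merged bias.
    """
    scale = gamma / np.sqrt(var + eps)                  # per-output-channel scale
    w_merged = conv_w * scale                           # broadcast over the O axis
    b_merged = (conv_b - mean) * scale + beta
    H, W, I, O = conv_w.shape
    flat = w_merged.reshape(H * W * I, O)               # one column per output channel
    return np.vstack([flat, b_merged[None, :]])

rng = np.random.default_rng(0)
H, W, I, O = 3, 3, 16, 32
mat = fold_bn_and_flatten(rng.normal(size=(H, W, I, O)), rng.normal(size=O),
                          rng.uniform(0.5, 1.5, O), rng.normal(size=O),
                          rng.normal(size=O), rng.uniform(0.5, 1.5, O))
print(mat.shape)    # (3*3*16 + 1, 32), before differential-row conductance mapping
```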

Under the differential-row weight-mapping scheme, the parameters of a convolutional layer are converted into a conductance matrix of size (2(HWI + B), O). If the conductance matrix fits into a single core, an input vector is applied to 2(HWI + B) rows and broadcast to O columns in a single cycle. HWIO multiply–accumulate (MAC) operations are performed in parallel. Most ResNet-20 convolutional layers have a conductance matrix height of 2(HWI + B) that is greater than the RRAM array length of 256. We therefore split them vertically into multiple segments, and map the segments either onto different cores that are accessed in parallel, or onto different columns within a core that are accessed sequentially. The details of the weight-mapping strategies are described in the next section.

The Google speech command dataset consists of 65,000 1-s-long audio recordings of voice commands, such as ‘yes’, ‘up’, ‘on’, ‘stop’ and so on, spoken by thousands of different people. The commands are categorized into 12 classes. Extended Data Fig. 9d illustrates the model architecture. We use the Mel-frequency cepstral coefficient encoding approach to encode every 40-ms piece of audio into a length-40 vector. With a hop length of 20 ms, we have a time series of 50 steps for each 1-s recording.

We build a model that contains four parallel LSTM cells. Each cell has a hidden state of length 112. The final classification is based on summation of outputs from the four cells. Compared with a single-cell model, the 4-cell model reduces the classification error (of an unquantized model) from 10.13% to 9.28% by leveraging additional cores on the NeuRRAM chip. Within a cell, in each time step, we compute the values of four LSTM gates (input, activation, forget and output) based on the inputs from the current step and hidden states from the previous step. We then perform element-wise operations between the four gates to compute the new hidden-state value. The final logit outputs are calculated based on the hidden states of the final time step.

Each LSTM cell has 3 weight matrices that are implemented on the chip: an input-to-hidden-state matrix with size 40 × 448, a hidden-state-to-hidden-state matrix with size 112 × 448 and a hidden-state-to-logits matrix with size 112 × 12. The element-wise operations are implemented on the FPGA. The model is trained using the PyTorch framework. The inputs to all the MVMs are quantized to 4-bit signed fixed-point formats. All the remaining operations are quantized to 8-bit.
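A PyTorch sketch of the four-parallel-cell model as described above (per-cell readouts summed into the final logits); the quantization steps are omitted, and the per-cell readout arrangement is inferred from the text rather than taken from the released code:

```python
import torch
import torch.nn as nn

class FourCellKWS(nn.Module):
    """Sketch of the 4-parallel-LSTM keyword-spotting model described above."""
    def __init__(self, n_mfcc=40, hidden=112, n_classes=12, n_cells=4):
        super().__init__()
        self.cells = nn.ModuleList(nn.LSTMCell(n_mfcc, hidden) for _ in range(n_cells))
        self.readouts = nn.ModuleList(nn.Linear(hidden, n_classes) for _ in range(n_cells))

    def forward(self, x):                      # x: (batch, 50 time steps, 40 MFCCs)
        batch, steps, _ = x.shape
        logits = 0
        for cell, readout in zip(self.cells, self.readouts):
            h = x.new_zeros(batch, cell.hidden_size)
            c = x.new_zeros(batch, cell.hidden_size)
            for t in range(steps):             # gate computations for each time step
                h, c = cell(x[:, t], (h, c))
            logits = logits + readout(h)       # classify from the final hidden state
        return logits

model = FourCellKWS()
print(model(torch.randn(2, 50, 40)).shape)     # torch.Size([2, 12])
```

Note that an nn.LSTMCell with input size 40 and hidden size 112 carries weight matrices of shape 448 × 40 and 448 × 112, matching (transposed) the 40 × 448 and 112 × 448 matrices described above.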

An RBM is a type of generative probabilistic graphical model. Instead of being trained to perform discriminative tasks such as classification, it learns the statistical structure of the data itself. Extended Data Fig. 9e shows the architecture of our image-recovery RBM. The model consists of 794 fully connected visible neurons, corresponding to 784 image pixels plus 10 one-hot encoded class labels, and 120 hidden neurons. We train the RBM in software using the contrastive divergence learning procedure.

During inference, we send 3-bit images with partially corrupted or blocked pixels to the model running on a NeuRRAM chip. The model then performs back-and-forth MVMs and Gibbs sampling between visible and hidden neurons for ten cycles. In each cycle, the neurons sample binary states h and v from the MVM outputs according to the probability distributions p(h_j = 1 | v) = σ(b_j + Σ_i v_i w_ij) and p(v_i = 1 | h) = σ(a_i + Σ_j h_j w_ij), where σ is the sigmoid function and a_i and b_j are the biases of the visible neurons (v) and the hidden neurons (h), respectively. After sampling, we reset the uncorrupted pixels (visible neurons) to their original pixel values. The final inference performance is evaluated by computing the average L2 reconstruction error between the original image and the recovered image. Extended Data Fig. 10 shows examples of the measured image recovery.
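A plain-software sketch of this Gibbs-sampling recovery loop (on the chip the MVMs are performed in analog with 3-bit inputs; here everything is ordinary floating point, and the weights are random placeholders rather than trained parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recover_image(v0, known_mask, W, a, b, n_cycles=10, seed=0):
    """Back-and-forth Gibbs sampling between visible and hidden units.

    v0: corrupted visible vector (pixels + one-hot label); known_mask: True where
    pixels are uncorrupted; W: (n_visible, n_hidden); a: visible bias; b: hidden bias.
    """
    rng = np.random.default_rng(seed)
    v = v0.copy()
    for _ in range(n_cycles):
        p_h = sigmoid(b + v @ W)                    # p(h_j = 1 | v)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        p_v = sigmoid(a + h @ W.T)                  # p(v_i = 1 | h)
        v = (rng.random(p_v.shape) < p_v).astype(float)
        v[known_mask] = v0[known_mask]              # clamp the uncorrupted pixels
    return v

n_vis, n_hid = 794, 120
v0 = np.zeros(n_vis); mask = np.zeros(n_vis, dtype=bool); mask[:392] = True
recovered = recover_image(v0, mask, 0.01 * np.random.randn(n_vis, n_hid),
                          np.zeros(n_vis), np.zeros(n_hid))
print(recovered.shape)   # (794,)
```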

When mapping the 794 × 120 weight matrix onto multiple cores of the chip, we try to keep the MVM output dynamic range of each core relatively consistent, so that the recovery performance does not rely too heavily on the computational accuracy of any single core. To achieve this, we assign adjacent pixels (visible neurons) to different cores, so that every core sees a down-sampled version of the whole image, as shown in Extended Data Fig. 9f. Utilizing the bidirectional MVM functionality of the TNSA, the visible-to-hidden MVM is performed in the SL-to-BL direction in each core, and the hidden-to-visible MVM is performed in the BL-to-SL direction.
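A tiny sketch of this interleaved pixel-to-core assignment; the number of cores used for the RBM here is an illustrative assumption:

```python
import numpy as np

def assign_pixels_to_cores(n_pixels=784, n_cores=4):
    """Interleave adjacent pixels across cores so that each core sees a
    down-sampled version of the whole image (illustrative only)."""
    return [np.arange(core, n_pixels, n_cores) for core in range(n_cores)]

for core, idx in enumerate(assign_pixels_to_cores()):
    print(f"core {core}: pixels {idx[:5]} ... ({idx.size} total)")
```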

Crossbar Demonstration of Artificial Intelligence Edge Computing Acceleration with ReRAM

Weight-mapping strategy onto multiple CIM cores

To implement an AI model on a NeuRRAM chip, we convert the weights, biases and other relevant parameters (for example, batch normalization) of each model layer into a single two-dimensional conductance matrix, as described in the previous section. If the height or the width of a matrix exceeds the RRAM array size of a single CIM core (256 × 256), we split the matrix into multiple smaller conductance matrices, each with a maximum height and width of 256.

We consider three factors when mapping these conductance matrices onto the 48 cores: resource utilization, computational load balancing and voltage drop. The top priority is to ensure that all conductance matrices of a model fit onto a single chip, so that no re-programming is needed during inference. If the total number of conductance matrices does not exceed 48, we can map each matrix onto a single core (case (1) in Fig. 2a) or onto multiple cores. There are two scenarios in which we map a single matrix onto multiple cores. (1) When a model has different computational intensities (the amount of computation per weight) for different layers (CNNs, for example, often have higher computational intensity in earlier layers owing to their larger feature-map dimensions), we duplicate the more computationally intensive matrices onto multiple cores and operate them in parallel, which increases throughput and balances the computational load across layers (case (2) in Fig. 2a). (2) Some models have 'wide' conductance matrices (output dimension >128), such as our image-recovery RBM. If the entire matrix were mapped onto a single core, each input driver would need to supply a large current to its connected RRAMs, causing a significant voltage drop on the driver and degrading inference accuracy. Therefore, when spare cores are available, we split such a matrix vertically into multiple segments and map them onto different cores to mitigate the voltage drop.

By contrast, if a model has more than 48 conductance matrices, we need to merge some matrices so that they all fit onto a single chip. Smaller matrices are merged diagonally so that they can be accessed in parallel (case (3) in Fig. 2a); bigger matrices are merged horizontally and accessed by time-multiplexing the input rows (case (4) in Fig. 2a). When selecting matrices to merge, we avoid those in the two categories described in the previous paragraph: (1) matrices with high computational intensity (for example, the early layers of ResNet-20), to minimize the impact on throughput; and (2) matrices with a 'wide' output dimension (for example, the late layers of ResNet-20, which have a large number of output channels), to avoid a large voltage drop. For instance, in our ResNet-20 implementation, among a total of 61 conductance matrices (Extended Data Fig. 9a: 1 from the input layer, 12 from block 1, 17 from block 2, 28 from block 3, 2 from shortcut layers and 1 from the final dense layer), we map each of the conductance matrices in blocks 1 and 3 onto a single core, and merge the remaining matrices to occupy the 8 remaining cores.
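The two merging schemes can be pictured with ordinary matrix operations: diagonal merging corresponds to a block-diagonal layout (independent row and column groups, driven in parallel), while horizontal merging concatenates column groups that share input rows and are read out by time-multiplexing. The matrix sizes below are illustrative only:

```python
import numpy as np
from scipy.linalg import block_diag

# Diagonal merge: each matrix gets its own rows and columns within one 256 x 256 core,
# so both can be accessed in parallel.
A = np.random.rand(100, 60)
B = np.random.rand(120, 40)
merged_diag = block_diag(A, B)          # shape (220, 100)

# Horizontal merge: matrices with the same input length share input rows and are
# read out by time-multiplexing their column groups.
C = np.random.rand(200, 64)
D = np.random.rand(200, 48)
merged_horiz = np.hstack([C, D])        # shape (200, 112)

print(merged_diag.shape, merged_horiz.shape)
```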

Table 1 summarizes the core usage of all the models. Note that for partially occupied cores, unused RRAM cells are either left unformed or programmed to a high-resistance state, and the WLs of unused rows are not activated during inference, so they do not consume additional energy.

The NeuRRAM chip simultaneously improves efficiency, flexibility and accuracy over existing RRAM-CIM hardware by innovating across the entire hierarchy of the design, from a TNSA architecture enabling reconfigurable dataflow direction, to an energy- and area-efficient voltage-mode neuron circuit, and to a series of algorithm-hardware co-optimization techniques. These techniques can be more generally applied to other non-volatile resistive memory technologies such as phase-change memory8,17,21,23,24, magnetoresistive RAM48 and ferroelectric field-effect transistors49. Going forwards, we expect NeuRRAM’s peak energy efficiency (EDP) to improve by another two to three orders of magnitude while supporting bigger AI models when scaling from 130-nm to 7-nm CMOS and RRAM technologies (detailed in Methods). Multi-core architecture design with network-on-chip that realizes efficient and versatile data transfers and inter-array pipelining is likely to be the next major challenge for RRAM-CIM37,38, which needs to be addressed by further cross-layer co-optimization. As resistive memory continues to scale towards offering tera-bits of on-chip memory50, such a co-optimization approach will equip CIM hardware on the edge with sufficient performance, efficiency and versatility to perform complex AI tasks that can only be done on the cloud today.

The core has three main operating modes: a weight-programming mode, a neuron-testing mode and an MVM mode (Extended Data Fig. 1). In the weight-programming mode, individual RRAM cells are selected for read and write. To select a single cell, the registers at the corresponding row and column are programmed to ‘1’ through random access with the help of the row and column decoder, whereas the other registers are reset to ‘0’. The WL/BL/SL logic turns on the corresponding driver pass gates to apply a set/reset/read voltage on the selected cell. In the neuron-testing mode, the WLs are kept at ground voltage (GND). Neurons receive inputs directly from BL or SL drivers through their BL or SL switch, bypassing RRAM devices. This allows us to characterize the neurons independently from the RRAM array. In the MVM mode, each input BL and SL is driven to Vref − Vread, Vref + Vread or Vref depending on the registers’ value at that row or column. If the MVM is in the BL-to-SL direction, we activate the WLs that are within the input vector length while keeping the rest at GND; if the MVM is in the SL-to-BL direction, we activate all the WLs. After neurons finish analogue-to-digital conversion, the pass gates from BLs and SLs to the registers are turned on to allow neuron-state readout.
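A small sketch of how the MVM-mode input voltages follow from the per-row register values, assuming a ternary encoding {-1, 0, +1} and illustrative bias voltages (the actual register encoding and voltage levels are not given here):

```python
import numpy as np

V_REF, V_READ = 0.9, 0.2    # illustrative bias points, not the chip's actual voltages

def mvm_input_voltages(reg_values):
    """Map per-row register values {-1, 0, +1} to the driven line voltage in MVM mode:
    Vref - Vread, Vref, or Vref + Vread, respectively."""
    reg = np.asarray(reg_values)
    return V_REF + reg * V_READ

print(mvm_input_voltages([+1, 0, -1, 0]))   # [1.1 0.9 0.7 0.9]
```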

The NeuRRAM chip is not only twice as energy efficient as the state of the art, it is also versatile and delivers results that are just as accurate as conventional digital chips.

NeuRRAM chip is twice as energy efficient and could bring the power of AI into tiny edge devices

Stanford engineers created a more efficient and flexible AI chip, which could bring the power of AI into tiny edge devices

AI-powered edge computing is already pervasive in our lives. Devices like drones, smart wearables, and industrial IoT sensors are equipped with AI-enabled chips so that computing can occur at the “edge” of the internet, where the data originates. This allows real-time processing and guarantees data privacy.

However, AI functionalities on these tiny edge devices are limited by the energy provided by a battery. Therefore, improving energy efficiency is crucial. In today’s AI chips, data processing and data storage happen at separate places – a compute unit and a memory unit. The frequent data movement between these units consumes most of the energy during AI processing, so reducing the data movement is the key to addressing the energy issue.

Stanford University engineers have come up with a potential solution: a novel resistive random-access memory (RRAM) chip that does the AI processing within the memory itself, thereby eliminating the separation between the compute and memory units. Their “compute-in-memory” (CIM) chip, called NeuRRAM, is about the size of a fingertip and does more work with limited battery power than what current chips can do.

“Having those calculations done on the chip instead of sending information to and from the cloud could enable faster, more secure, cheaper, and more scalable AI going into the future, and give more people access to AI power,” said H.-S Philip Wong, the Willard R. and Inez Kerr Bell Professor in the School of Engineering.

“The data movement issue is similar to spending eight hours in commute for a two-hour workday,” added Weier Wan, a recent graduate at Stanford leading this project. “With our chip, we are showing a technology to tackle this challenge.”

They presented NeuRRAM in a recent article in the journal Nature. While compute-in-memory has been around for decades, this chip is the first to actually demonstrate a broad range of AI applications on hardware, rather than through simulation alone.

Seminar in Advances in Computing-SRAM based In-Memory Computing for Energy-Efficient AI Systems

Putting computing power on the device

To overcome the data movement bottleneck, researchers implemented what is known as compute-in-memory (CIM), a novel chip architecture that performs AI computing directly within memory rather than in separate computing units. The memory technology that NeuRRAM used is resistive random-access memory (RRAM). It is a type of non-volatile memory – memory that retains data even once power is off – that has emerged in commercial products. RRAM can store large AI models in a small area footprint, and consume very little power, making them perfect for small-size and low-power edge devices.

Even though the concept of CIM chips is well established, and the idea of implementing AI computing in RRAM isn’t new, “this is one of the first instances to integrate a lot of memory right onto the neural network chip and present all benchmark results through hardware measurements,” said Wong, who is a co-senior author of the Nature paper.

The architecture of NeuRRAM allows the chip to perform analog in-memory computation at low power and in a compact-area footprint. It was designed in collaboration with the lab of Gert Cauwenberghs at the University of California, San Diego, who pioneered low-power neuromorphic hardware design. The architecture also enables reconfigurability in dataflow directions, supports various AI workload mapping strategies, and can work with different kinds of AI algorithms – all without sacrificing AI computation accuracy.

To show the accuracy of NeuRRAM’s AI abilities, the team tested how it performed on different tasks. They found that it is 99% accurate in handwritten-digit recognition on the MNIST dataset, 85.7% accurate on image classification on the CIFAR-10 dataset and 84.7% accurate on Google speech command recognition, and that it achieved a 70% reduction in image-reconstruction error on a Bayesian image-recovery task.

“Efficiency, versatility, and accuracy are all important aspects for broader adoption of the technology,” said Wan. “But to realize them all at once is not simple. Co-optimizing the full stack from hardware to software is the key.”

“Such full-stack co-design is made possible with an international team of researchers with diverse expertise,” added Wong.

Fueling edge computations of the future

Right now, NeuRRAM is a physical proof-of-concept but needs more development before it’s ready to be translated into actual edge devices.

But this combined efficiency, accuracy, and ability to do different tasks showcases the chip’s potential. “Maybe today it is used to do simple AI tasks such as keyword spotting or human detection, but tomorrow it could enable a whole different user experience. Imagine real-time video analytics combined with speech recognition all within a tiny device,” said Wan. “To realize this, we need to continue improving the design and scaling RRAM to more advanced technology nodes.”

“This work opens up several avenues of future research on RRAM device engineering, and on programming models and neural network design for compute-in-memory, to make this technology scalable and usable by software developers,” said Priyanka Raina, assistant professor of electrical engineering and a co-author of the paper.

If successful, RRAM compute-in-memory chips like NeuRRAM have almost unlimited potential. They could be embedded in crop fields to do real-time AI calculations for adjusting irrigation systems to current soil conditions. Or they could turn augmented reality glasses from clunky headsets with limited functionality to something more akin to Tony Stark’s viewscreen in the Iron Man and Avengers movies (without intergalactic or multiverse threats – one can hope).

If mass produced, these chips would be cheap enough, adaptable enough, and low power enough that they could be used to advance technologies already improving our lives, said Wong, like in medical devices that allow home health monitoring.

They can be used to solve global societal challenges as well: AI-enabled sensors would play a role in tracking and addressing climate change. “By having these kinds of smart electronics that can be placed almost anywhere, you can monitor the changing world and be part of the solution,” Wong said. “These chips could be used to solve all kinds of problems from climate change to food security.”

The NeuRRAM chip is the first compute-in-memory chip to demonstrate a wide range of AI applications at a fraction of the energy consumed by other platforms while maintaining equivalent accuracy

The NeuRRAM neuromorphic chip was developed by an international team of researchers co-led by UC San Diego engineers.

An international team of researchers has designed and built a chip that runs computations directly in memory and can run a wide variety of AI applications–all at a fraction of the energy consumed by computing platforms for general-purpose AI computing.

The NeuRRAM neuromorphic chip brings AI a step closer to running on a broad range of edge devices, disconnected from the cloud, where they can perform sophisticated cognitive tasks anywhere and anytime without relying on a network connection to a centralized server. Applications abound in every corner of the world and every facet of our lives, and range from smart watches, to VR headsets, smart earbuds, smart sensors in factories and rovers for space exploration.

The NeuRRAM chip is not only twice as energy efficient as the state-of-the-art “compute-in-memory” chips, an innovative class of hybrid chips that runs computations in memory, it also delivers results that are just as accurate as conventional digital chips. Conventional AI platforms are a lot bulkier and typically are constrained to using large data servers operating in the cloud.

In addition, the NeuRRAM chip is highly versatile and supports many different neural network models and architectures. As a result, the chip can be used for many different applications, including image recognition and reconstruction as well as voice recognition.

“The conventional wisdom is that the higher efficiency of compute-in-memory is at the cost of versatility, but our NeuRRAM chip obtains efficiency while not sacrificing versatility,” said Weier Wan, the paper’s first corresponding author and a recent Ph.D. graduate of Stanford University who worked on the chip while at UC San Diego, where he was co-advised by Gert Cauwenberghs in the Department of Bioengineering.

The research team, co-led by bioengineers at the University of California San Diego, presents their results in the Aug. 17 issue of Nature.

Processing-in-Memory Course: Lecture 14: Analyzing&Mitigating ML Inference Bottlenecks - Spring 2022

Currently, AI computing is both power hungry and computationally expensive. Most AI applications on edge devices involve moving data from the devices to the cloud, where the AI processes and analyzes it. Then the results are moved back to the device. That’s because most edge devices are battery-powered and as a result only have a limited amount of power that can be dedicated to computing.

By reducing power consumption needed for AI inference at the edge, this NeuRRAM chip could lead to more robust, smarter and accessible edge devices and smarter manufacturing. It could also lead to better data privacy as the transfer of data from devices to the cloud comes with increased security risks.

On AI chips, moving data from memory to computing units is one major bottleneck.

“It’s the equivalent of doing an eight-hour commute for a two-hour work day,” Wan said.

To solve this data transfer issue, researchers used what is known as resistive random-access memory, a type of non-volatile memory that allows for computation directly within memory rather than in separate computing units. RRAM and other emerging memory technologies used as synapse arrays for neuromorphic computing were pioneered in the lab of Philip Wong, Wan’s advisor at Stanford and a main contributor to this work. Computation with RRAM chips is not necessarily new, but generally it leads to a decrease in the accuracy of the computations performed on the chip and a lack of flexibility in the chip’s architecture.

"Compute-in-memory has been common practice in neuromorphic engineering since it was introduced more than 30 years ago,” Cauwenberghs said. “What is new with NeuRRAM is that the extreme efficiency now goes together with great flexibility for diverse AI applications with almost no loss in accuracy over standard digital general-purpose compute platforms."

A carefully crafted methodology was key to the work with multiple levels of “co-optimization” across the abstraction layers of hardware and software, from the design of the chip to its configuration to run various AI tasks. In addition, the team made sure to account for various constraints that span from memory device physics to circuits and network architecture.

“This chip now provides us with a platform to address these problems across the stack, from devices and circuits to algorithms,” said Siddharth Joshi, an assistant professor of computer science and engineering at the University of Notre Dame, who started working on the project as a Ph.D. student and postdoctoral researcher in Cauwenberghs’ lab at UC San Diego.

Chip performance

Researchers measured the chip’s energy efficiency by a metric known as the energy-delay product, or EDP. EDP combines the amount of energy consumed for every operation and the amount of time it takes to complete the operation. By this measure, the NeuRRAM chip achieves 1.6 to 2.3 times lower EDP (lower is better) and 7 to 13 times higher computational density than state-of-the-art chips.
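For concreteness, EDP is simply the energy per operation multiplied by the time per operation; the numbers below are illustrative, not measured values from the chip:

```python
# Energy-delay product (EDP) for a single MAC operation.
energy_per_op_J = 2e-12      # illustrative: 2 pJ per MAC
time_per_op_s = 1e-8         # illustrative: 10 ns per MAC
edp = energy_per_op_J * time_per_op_s
print(f"EDP = {edp:.2e} J*s per operation (lower is better)")
```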

Researchers ran various AI tasks on the chip. It achieved 99% accuracy on a handwritten-digit recognition task, 85.7% on an image classification task and 84.7% on a Google speech command recognition task. The chip also achieved a 70% reduction in image-reconstruction error on an image-recovery task. These results are comparable to those of existing digital chips that perform computation at the same bit precision, but with drastic savings in energy.

Researchers point out that one key contribution of the paper is that all the results featured were obtained directly on the hardware. In many previous works on compute-in-memory chips, AI benchmark results were often obtained partly through software simulation.

Next steps include improving architectures and circuits and scaling the design to more advanced technology nodes. Researchers also plan to tackle other applications, such as spiking neural networks.

“We can do better at the device level, improve circuit design to implement additional features and address diverse applications with our dynamic NeuRRAM platform,” said Rajkumar Kubendran, an assistant professor for the University of Pittsburgh, who started work on the project while a Ph.D. student in Cauwenberghs’ research group at UC San Diego.

In addition, Wan is a founding member of a startup that works on productizing the compute-in-memory technology. “As a researcher and an engineer, my ambition is to bring research innovations from labs into practical use,” Wan said.

 Intelligence on Silicon: From Deep Neural Network Accelerators to Brain-Mimicking AI-SoCs

New architecture

The key to NeuRRAM’s energy efficiency is an innovative method to sense output in memory. Conventional approaches use voltage as input and measure current as the result. But this leads to the need for more complex and more power hungry circuits. In NeuRRAM, the team engineered a neuron circuit that senses voltage and performs analog-to-digital conversion in an energy efficient manner. This voltage-mode sensing can activate all the rows and all the columns of an RRAM array in a single computing cycle, allowing higher parallelism.

In the NeuRRAM architecture, CMOS neuron circuits are physically interleaved with the RRAM weights. This differs from conventional designs, where CMOS circuits typically sit on the periphery of the RRAM arrays. The neuron’s connections to the RRAM array can be configured to serve as either the input or the output of the neuron. This allows neural network inference in various dataflow directions without incurring area or power overheads, which in turn makes the architecture easier to reconfigure.

To make sure that the accuracy of the AI computations can be preserved across various neural network architectures, researchers developed a set of hardware-algorithm co-optimization techniques. The techniques were verified on various neural networks, including convolutional neural networks, long short-term memory networks and restricted Boltzmann machines.

As a neuromorphic AI chip, NeuRRAM performs parallel distributed processing across 48 neurosynaptic cores. To simultaneously achieve high versatility and high efficiency, NeuRRAM supports data parallelism by mapping a layer of the neural network model onto multiple cores for parallel inference on multiple data. NeuRRAM also offers model parallelism by mapping different layers of a model onto different cores and performing inference in a pipelined fashion.

An international research team

The work is the result of an international team of researchers.

The UC San Diego team designed the CMOS circuits that implement the neural functions interfacing with the RRAM arrays to support the synaptic functions in the chip’s architecture, for high efficiency and versatility. Wan, working closely with the entire team, implemented the design; characterized the chip; trained the AI models; and executed the experiments. Wan also developed a software toolchain that maps AI applications onto the chip.

The RRAM synapse array and its operating conditions were extensively characterized and optimized at Stanford University.

The RRAM array was fabricated and integrated onto CMOS at Tsinghua University.

The team at Notre Dame contributed to the design and architecture of the chip and to the subsequent machine learning model design and training.

The research started as part of the National Science Foundation funded Expeditions in Computing project on Visual Cortex on Silicon at Penn State University, with continued funding support from the Office of Naval Research Science of AI program, the Semiconductor Research Corporation and DARPA JUMP program, and Western Digital Corporation.

A compute-in-memory chip based on resistive random-access memory

Published open-access in Nature, August 17, 2022.

Weier Wan, Rajkumar Kubendran, Stephen Deiss, Siddharth Joshi, Gert Cauwenberghs, University of California San Diego

Weier Wan, S. Burc Eryilmaz, Priyanka Raina, H-S Philip Wong, Stanford University

Clemens Schaefer, Siddharth Joshi, University of Notre Dame

Rajkumar Kubendran, University of Pittsburgh

Wenqiang Zhang, Dabin Wu, He Qian, Bin Gao, Huaqiang Wu, Tsinghua University


Corresponding authors: Wan, Gao, Joshi, Wu, Wong and Cauwenberghs


More Information:

https://www.nature.com/articles/s41586-022-04992-8

https://www.synopsys.com/designware-ip/technical-bulletin/the-dna-of-an-ai-soc-dwtb_q318.html

https://www.synopsys.com/designware-ip/ip-market-segments/artificial-intelligence.html#memory

https://innovationtoronto.com/2022/08/neurram-chip-is-twice-as-energy-efficient-and-could-bring-the-power-of-ai-into-tiny-edge-devices/

https://www.eurekalert.org/multimedia/946477

https://today.ucsd.edu/story/Nature_bioengineering_2022

https://www.eenewseurope.com/en/48-core-neuromorphic-ai-chip-uses-resistive-memory/

https://cacm.acm.org/news/263914-a-neuromorphic-chip-for-ai-on-the-edge/fulltext


https://www.quantamagazine.org/a-brain-inspired-chip-can-run-ai-with-far-less-energy-20221110/









Advanced Artificial Intelligence





DEFINING HUMAN INTELLIGENCE

Human intelligence is the ability to understand and use the full capacity of your mind. It helps us understand concepts, learn, and handle and adapt to unexpected situations. Studies show that intelligence can also be inherited and is partly determined by genes. There are many types of tests that measure intelligence; their diversity reflects different cultures and people. For example, some tests require knowledge of culture and vocabulary, while others are simpler and require only the recognition of symbols, shapes and colours. Research has shown that people can practise improving their intelligence and cognitive abilities. You can work to improve your general and emotional knowledge, creativity, reasoning and self-awareness, which in turn helps you solve problems more effectively. We’re just scratching the surface. Here’s a bit more in-depth information.

THE TRADITIONAL DEFINITION

There are many types and definitions of intelligence. Over time, views on the development of intelligence have changed. Debates about intelligence have been going on since the early 1900s and have been accompanied by the creation of various tests for individuals. The traditional view, or definition, is that we are born with a fixed amount of intelligence and that our level of intelligence cannot change over the course of life. It also holds that intelligence can only be measured with a few short-answer tests. In addition, teachers used the same learning materials and topics to try to teach every individual equally. Simply put, they treated everyone the same, with a single approach: there were no differences in testing methods or in how students were treated.

HOW DOES PSYCHOLOGY VIEW IT?

Many psychologists have explored the field of intelligence and how it is measured. According to their conclusions, there are different aspects from which we can observe a person’s intelligence. Psychologists view intelligence as the use of general knowledge to understand and manipulate the environment. They have focused heavily on cognitive processes such as learning, memory, perception and understanding, as well as problem-solving. Intelligence is seen as the effective possession and use of various combined abilities. Consensus among psychologists has not always been reached, and many developed different theories and statistics that guided their research. Here are some of the most notable theories:

Spearman’s General Intelligence or g factor

Thurstone’s Primary Mental Abilities

Gardner’s Multiple Intelligences

Triarchic Theory of Intelligence

HOW DO WE MEASURE IT?

Is there a way to find out if we are smart or not? It seems difficult to measure something that is not physically present, but today, intelligence is actually measured by various forms of tests. In fact, these tests are very helpful to people in discovering how far their intelligence goes. Most people are interested in their level of intelligence. You’re probably wondering how you can test your capabilities. Here are some of the most popular types of intelligence tests that are used today.

  • IQ test, which is one of the well known in society
  • Specific aptitude tests
  • Tests of logical reasoning
  • Creativity tests
  • Emotional Intelligence tests
  • Memory tests

IS YOUR INTELLIGENCE FIXED?

Research on whether intelligence is inherited or can be improved shows considerable disagreement among psychologists and sociologists. There are ongoing debates about whether it is possible to improve skills and work on your abilities to change your mindset. Some believe that intelligence is something we are born with, as a part of the body, and that it is closely tied to the genes we inherit from our parents. These strict beliefs date back to the 1900s, when intelligence could only be assessed through certain types of tests, which were the only indicator and measure of intelligence at the time. Science and research have since advanced, and opinions are divided. The other part of the scientific world now argues that intelligence is a cognitive ability that can be worked on throughout one’s life; they believe that anyone can learn how to upgrade their intelligence and that it can be improved through learning and the use of logic.

HOW DO YOU HELP IT DEVELOP?

The times we live in move much faster, and we have very little time to dedicate to working on ourselves. Luckily, here are some “brain exercises” that you can use to help develop your intelligence on your own.

THERE ARE FIVE EFFECTIVE WAYS TO DO IT.

LEARN SOMETHING NEW.

Learn something new at every opportunity. Recent studies have shown that learning new things can help improve your memory and mental performance. It keeps your mind active and stops it from taking it easy.

GO BILATERAL

The brain has two sides: one creative and one analytical. Train and exercise both. If you spend your day working with data, make sure you also do something creative, and vice versa.

IMPROVE YOUR LIFESTYLE

You should move regularly, stay hydrated and eat a nutrient-packed diet. The better you live your life, the healthier your brain will be, especially as you age.

BE CONSCIOUS OF AUTOPILOT

Routine is great, but it can make us lazy at times. Keep your mind energized and active by stepping out of your comfort zone daily.

LEARN TO MEDITATE

Lots of evidence supports meditation. One study from UCLA found that people who had been meditating for an average of 20 years had more grey-matter volume throughout the brain.


CONCLUSION

Overall, intelligence cannot be defined in one way. As you can see, opinions are divided. Too many theories exist, and they are still being researched today. But don't worry, as shown above, we can all work to try to improve our own intelligence.

Artificial Intelligence (AI) aims to build machines that are able to behave in ways we associate with human activity: perceiving and analysing our environment, taking decisions, communicating and learning. There are various approaches to achieving this. The most well-known, and arguably most advanced, is machine learning (ML), which itself has various broad approaches.


State of AI Report 2022


Show advancements in the past year

To mention just two approaches: in supervised learning, algorithms make associations between a given input and the desired output by learning on training sets comprising many correct input/output pairs. In reinforcement learning, the ML algorithm repeatedly chooses from a given set of actions in order to maximise a reward function, which should lead it to the desired result. A typical example is learning to play a game such as Go, chess or a video game, where the reward function is increasing the score or winning the game. Reinforcement learning is considered a promising strategy for addressing complex real-world problems.
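As a concrete toy example of supervised learning, the sketch below fits a linear model to labelled input/output pairs by gradient descent on the mean squared error; all values are synthetic:

```python
import numpy as np

# Minimal supervised learning: fit y ≈ w*x + b from labelled input/output pairs.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 3.0 * x + 0.5 + rng.normal(0, 0.1, 200)       # "correct" outputs for training

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):                               # gradient descent on mean squared error
    err = (w * x + b) - y
    w -= lr * (2 * err * x).mean()
    b -= lr * (2 * err).mean()

print(f"learned w={w:.2f}, b={b:.2f}")             # close to the true 3.0 and 0.5
```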

Machine learning algorithms have passed a number of impressive milestones in recent years. They identified objects by vision better than humans in 2015.1 The following year, they beat a Go champion and started playing complex video games.2 Autonomous cars have driven tens of millions of kilometres with very few accidents.3 Deep learning algorithms have become extraordinarily adept at mimicking traditionally human activities such as language processing, artistic creation and even scientific research.4 This rapid and impressive progress is primarily due to the increasing amount of available data and computing power. However, many applications require even more sophisticated skills, such as the ability to make sensible decisions in highly uncertain environments, transparency and traceability, the ability to combine data from highly heterogeneous sources, long-term memory and the inclusion of context.

Selection of GESDA best reads and key reports

There are several large-scale efforts to map the state of the art of artificial intelligence and to predict its evolution. Stanford’s “One Hundred Year Study on Artificial Intelligence” produces a summary of the major technological trends and applications by domains as well as legal, ethical and policy issues every five years.5 The “20-Year Community Roadmap for Artificial Intelligence Research in the US” from the Association for the Advancement of AI (AAAI) proposes detailed research roadmaps and recommendations about research infrastructures and education.6 The yearly State of AI Report summarises the main developments of AI of the past year in the field of research, industry and politics as well as education and experts.7 Other roadmaps focus on the opportunities and challenges of integrating AI in government, society and industry from European8 and Chinese9 perspectives.

Human- versus Artificial Intelligence

J. E. (Hans) Korteling*, G. C. van de Boer-Visschedijk, R. A. M. Blankendaal, R. C. Boonekamp and A. R. Eikelboom

TNO Human Factors, Soesterberg, Netherlands

AI is one of the most debated subjects of today, and there seems to be little common understanding of the differences and similarities between human intelligence and artificial intelligence. Discussions of many relevant topics, such as trustworthiness, explainability and ethics, are characterized by implicit anthropocentric and anthropomorphic conceptions, for instance the pursuit of human-like intelligence as the gold standard for artificial intelligence. In order to provide more agreement and to substantiate possible future research objectives, this paper presents three notions on the similarities and differences between human and artificial intelligence: 1) the fundamental constraints of human (and artificial) intelligence, 2) human intelligence as one of many possible forms of general intelligence, and 3) the high potential impact of multiple (integrated) forms of narrow-hybrid AI applications. For the time being, AI systems will have fundamentally different cognitive qualities and abilities than biological systems. A most prominent issue is therefore how we can use (and “collaborate” with) these systems as effectively as possible. For what tasks and under what conditions is it safe to leave decisions to AI, and when is human judgment required? How can we capitalize on the specific strengths of human and artificial intelligence? How can we deploy AI systems effectively to complement and compensate for the inherent constraints of human cognition (and vice versa)? Should we pursue the development of AI “partners” with human(-level) intelligence, or should we focus more on supplementing human limitations? In order to answer these questions, humans working with AI systems in the workplace or in policy making have to develop an adequate mental model of the underlying ‘psychological’ mechanisms of AI. So, in order to obtain well-functioning human-AI systems, Intelligence Awareness in humans should be addressed more vigorously. For this purpose, a first framework for educational content is proposed.

Watch Google's AI LaMDA program talk to itself at length (full conversation)

Introduction: Artificial and Human Intelligence, Worlds of Difference

Artificial General Intelligence at the Human Level

Recent advances in information technology and in AI may allow for more coordination and integration between humans and technology. Considerable attention has therefore been devoted to the development of Human-Aware AI, which aims at AI that adapts as a “team member” to the cognitive possibilities and limitations of the human team members. Metaphors like “mate,” “partner,” “alter ego,” “Intelligent Collaborator,” “buddy” and “mutual understanding” likewise emphasize a high degree of collaboration, similarity, and equality in “hybrid teams”. When human-aware AI partners operate like “human collaborators” they must be able to sense, understand, and react to a wide range of complex human behavioral qualities, like attention, motivation, emotion, creativity, planning, or argumentation (e.g. Krämer et al., 2012; van den Bosch and Bronkhorst, 2018; van den Bosch et al., 2019). These “AI partners,” or “team mates,” therefore have to be endowed with human-like (or humanoid) cognitive abilities enabling mutual understanding and collaboration (i.e. “human awareness”).

However, no matter how intelligent and autonomous AI agents become in certain respects, at least for the foreseeable future, they probably will remain unconscious machines or special-purpose devices that support humans in specific, complex tasks. As digital machines they are equipped with a completely different operating system (digital vs biological) and with correspondingly different cognitive qualities and abilities than biological creatures, like humans and other animals (Moravec, 1988; Klein et al., 2004; Korteling et al., 2018a; Shneiderman, 2020a). In general, digital reasoning- and problem-solving agents only compare very superficially to their biological counterparts, (e.g. Boden, 2017; Shneiderman, 2020b). Keeping that in mind, it becomes more and more important that human professionals working with advanced AI systems, (e.g. in military‐ or policy making teams) develop a proper mental model about the different cognitive capacities of AI systems in relation to human cognition. This issue will become increasingly relevant when AI systems become more advanced and are deployed with higher degrees of autonomy. Therefore, the present paper tries to provide some more clarity and insight into the fundamental characteristics, differences and idiosyncrasies of human/biological and artificial/digital intelligences. In the final section, a global framework for constructing educational content on this “Intelligence Awareness” is introduced. This can be used for the development of education and training programs for humans who have to use or “collaborate with” advanced AI systems in the near and far future.

With the application of AI systems with increasing autonomy more and more researchers consider the necessity of vigorously addressing the real complex issues of “human-level intelligence” and more broadly artificial general intelligence, or AGI, (e.g. Goertzel et al., 2014). Many different definitions of A(G)I have already been proposed, (e.g. Russell and Norvig, 2014 for an overview). Many of them boil down to: technology containing or entailing (human-like) intelligence, (e.g. Kurzweil, 1990). This is problematic. Most definitions use the term “intelligence”, as an essential element of the definition itself, which makes the definition tautological. Second, the idea that A(G)I should be human-like seems unwarranted. At least in natural environments there are many other forms and manifestations of highly complex and intelligent behaviors that are very different from specific human cognitive abilities (see Grind, 1997 for an overview). Finally, like what is also frequently seen in the field of biology, these A(G)I definitions use human intelligence as a central basis or analogy for reasoning about the—less familiar—phenomenon of A(G)I (Coley and Tanner, 2012). Because of the many differences between the underlying substrate and architecture of biological and artificial intelligence this anthropocentric way of reasoning is probably unwarranted. For these reasons we propose a (non-anthropocentric) definition of “intelligence” as: “the capacity to realize complex goals” (Tegmark, 2017). These goals may pertain to narrow, restricted tasks (narrow AI) or to broad task domains (AGI). Building on this definition, and on a definition of AGI proposed by Bieger et al. (2014) and one of Grind (1997), we define AGI here as: “Non-biological capacities to autonomously and efficiently achieve complex goals in a wide range of environments”. AGI systems should be able to identify and extract the most important features for their operation and learning process automatically and efficiently over a broad range of tasks and contexts. Relevant AGI research differs from the ordinary AI research by addressing the versatility and wholeness of intelligence, and by carrying out the engineering practice according to a system comparable to the human mind in a certain sense (Bieger et al., 2014).

It will be fascinating to create copies of ourselves that can learn iteratively by interacting with partners and thus become able to collaborate on the basis of common goals, mutual understanding and adaptation (e.g. Bradshaw et al., 2012; Johnson et al., 2014). This would be very useful, for example, when a high degree of social intelligence in AI contributes to more adequate interactions with humans, for instance in health care or for entertainment purposes (Wyrobek et al., 2008). True collaboration on the basis of common goals and mutual understanding necessarily implies some form of humanoid general intelligence. For the time being, this remains a goal on a far-off horizon. In the present paper we argue why, for most applications, it may not be very practical or necessary (and probably a bit misleading) to vigorously aim at, or anticipate, systems possessing “human-like” AGI or “human-like” abilities or qualities. The fact that humans possess general intelligence does not imply that new inorganic forms of general intelligence should comply with the criteria of human intelligence. In this connection, the present paper addresses the way we think about (natural and artificial) intelligence in relation to the most probable potentials (and real upcoming issues) of AI in the short- and mid-term future. This will provide food for thought in anticipation of a future that is difficult to predict for a field as dynamic as AI.

What Is “Real Intelligence”?

Implicit in our aspiration to construct AGI systems possessing humanoid intelligence is the premise that human (general) intelligence is the “real” form of intelligence. This is already implicitly articulated in the term “artificial intelligence”, as if it were not entirely real, that is, not real like non-artificial (biological) intelligence. Indeed, as humans we know ourselves as the entities with the highest intelligence ever observed in the Universe. And as an extension of this, we like to see ourselves as rational beings who are able to solve a wide range of complex problems under all kinds of circumstances using our experience and intuition, supplemented by the rules of logic, decision analysis and statistics. It is therefore not surprising that we have some difficulty accepting the idea that we might be a bit less smart than we keep telling ourselves, i.e., “the next insult for humanity” (van Belkom, 2019). This goes so far that the rapid progress in the field of artificial intelligence is accompanied by a recurring redefinition of what should be considered “real (general) intelligence.” The conceptualization of intelligence, that is, the ability to autonomously and efficiently achieve complex goals, is then continuously adjusted and further restricted to “those things that only humans can do.” In line with this, AI is then defined as “the study of how to make computers do things at which, at the moment, people are better” (Rich and Knight, 1991; Rich et al., 2009). This includes thinking of creative solutions, flexibly using contextual and background information, the use of intuition and feeling, the ability to really “think and understand,” or the inclusion of emotion in an (ethical) consideration. These are then cited as the specific elements of real intelligence (e.g. Bergstein, 2017). For instance, Facebook’s director of AI and a spokesman in the field, Yann LeCun, mentioned at a conference at MIT on the Future of Work that machines are still far from having “the essence of intelligence.” That includes the ability to understand the physical world well enough to make predictions about basic aspects of it: to observe one thing and then use background knowledge to figure out what other things must also be true. Another way of saying this is that machines don’t have common sense (Bergstein, 2017), like submarines that cannot swim (van Belkom, 2019). When exclusive human capacities become our pivotal navigation points on the horizon, we may miss some significant problems that need our attention first.

To make this point clear, we first will provide some insight into the basic nature of both human and artificial intelligence. This is necessary for the substantiation of an adequate awareness of intelligence (Intelligence Awareness), and adequate research and education anticipating the development and application of A(G)I. For the time being, this is based on three essential notions that can (and should) be further elaborated in the near future.

• With regard to cognitive tasks, we are probably less smart than we think. So why should we vigorously focus on human-like AGI?

• Many different forms of intelligence are possible and general intelligence is therefore not necessarily the same as humanoid general intelligence (or “AGI on human level”).

• AGI is often not necessary; many complex problems can also be tackled effectively using multiple narrow AI’s.1

We Are Probably Not so Smart as We Think

How intelligent are we actually? The answer to that question is determined to a large extent by the perspective from which this issue is viewed, and thus by the measures and criteria for intelligence that is chosen. For example, we could compare the nature and capacities of human intelligence with other animal species. In that case we appear highly intelligent. Thanks to our enormous learning capacity, we have by far the most extensive arsenal of cognitive abilities2 to autonomously solve complex problems and achieve complex objectives. This way we can solve a huge variety of arithmetic, conceptual, spatial, economic, socio-organizational, political, etc. problems. The primates—which differ only slightly from us in genetic terms—are far behind us in that respect. We can therefore legitimately qualify humans, as compared to other animal species that we know, as highly intelligent.

Powering Pharma Intelligence with Advanced Analytics Webinar


Limited Cognitive Capacity

However, we can also look beyond this “relative interspecies perspective” and try to qualify our intelligence in more absolute terms, i.e., using a scale ranging from zero to what is physically possible. For example, we could view the computational capacity of a human brain as a physical system (Bostrom, 2014; Tegmark, 2017). The prevailing notion in this respect among AI scientists is that intelligence is ultimately a matter of information and computation, and (thus) not of flesh and blood and carbon atoms. In principle, there is no physical law preventing that physical systems (consisting of quarks and atoms, like our brain) can be built with a much greater computing power and intelligence than the human brain. This would imply that there is no insurmountable physical reason why machines one day cannot become much more intelligent than ourselves in all possible respects (Tegmark, 2017). Our intelligence is therefore relatively high compared to other animals, but in absolute terms it may be very limited in its physical computing capacity, albeit only by the limited size of our brain and its maximal possible number of neurons and glia cells, (e.g. Kahle, 1979).

To further define and assess our own (biological) intelligence, we can also discuss the evolution and nature of our biological thinking abilities. As a biological neural network of flesh and blood, necessary for survival, our brain has undergone an evolutionary optimization process of more than a billion years. In this extended period, it developed into a highly effective and efficient system for regulating essential biological functions and performing perceptive-motor and pattern-recognition tasks, such as gathering food, fighting and flighting, and mating. Almost during our entire evolution, the neural networks of our brain have been further optimized for these basic biological and perceptual motor processes that also lie at the basis of our daily practical skills, like cooking, gardening, or household jobs. Possibly because of the resulting proficiency for these kinds of tasks we may forget that these processes are characterized by extremely high computational complexity, (e.g. Moravec, 1988). For example, when we tie our shoelaces, many millions of signals flow in and out through a large number of different sensor systems, from tendon bodies and muscle spindles in our extremities to our retina, otolithic organs and semi-circular channels in the head, (e.g. Brodal, 1981). This enormous amount of information from many different perceptual-motor systems is continuously, parallel, effortless and even without conscious attention, processed in the neural networks of our brain (Minsky, 1986; Moravec, 1988; Grind, 1997). In order to achieve this, the brain has a number of universal (inherent) working mechanisms, such as association and associative learning (Shatz, 1992; Bar, 2007), potentiation and facilitation (Katz and Miledi, 1968; Bao et al., 1997), saturation and lateral inhibition (Isaacson and Scanziani, 2011; Korteling et al., 2018a).

These kinds of basic biological and perceptual-motor capacities have been developed and set down over many millions of years. Much later in our evolution—actually only very recently—our cognitive abilities and rational functions have started to develop. These cognitive abilities, or capacities, are probably less than 100 thousand years old, which may be qualified as “embryonal” on the time scale of evolution, (e.g. Petraglia and Korisettar, 1998; McBrearty and Brooks, 2000; Henshilwood and Marean, 2003). In addition, this very thin layer of human achievement has necessarily been built on these “ancient” neural intelligence for essential survival functions. So, our “higher” cognitive capacities are developed from and with these (neuro) biological regulation mechanisms (Damasio, 1994; Korteling and Toet, 2020). As a result, it should not be a surprise that the capacities of our brain for performing these recent cognitive functions are still rather limited. These limitations are manifested in many different ways, for instance:

  • The amount of cognitive information that we can consciously process (our working memory, span or attention) is very limited (Simon, 1955). The capacity of our working memory is approximately 10–50 bits per second (Tegmark, 2017).
  • Most cognitive tasks, like reading text or calculation, require our full attention and we usually need a lot of time to execute them. Mobile calculators can perform millions times more complex calculations than we can (Tegmark, 2017).
  • Although we can process lots of information in parallel, we cannot simultaneously execute cognitive tasks that require deliberation and attention, i.e., “multi-tasking” (Korteling, 1994; Rogers and Monsell, 1995; Rubinstein, Meyer, and Evans, 2001).
  • Acquired cognitive knowledge and skills of people (memory) tend to decay over time, much more than perceptual-motor skills. Because of this limited “retention” of information we easily forget substantial portions of what we have learned (Wingfield and Byrnes, 1981).
LaMDA | Is google's AI sentient? | Full audio conversation between Blake Lemoine and LaMDA


Ingrained Cognitive Biases

Our limited processing capacity for cognitive tasks is not the only factor determining our cognitive intelligence. Beyond this overall limited processing capacity, human cognitive information processing shows systematic distortions. These are manifested in many cognitive biases (Tversky and Kahneman, 1973; Tversky and Kahneman, 1974). Cognitive biases are systematic, universally occurring tendencies, inclinations, or dispositions that skew or distort information processes in ways that make their outcome inaccurate, suboptimal, or simply wrong (e.g. Lichtenstein and Slovic, 1971; Tversky and Kahneman, 1981). Many biases occur in virtually the same way in many different decision situations (Shafir and LeBoeuf, 2002; Kahneman, 2011; Toet et al., 2016). The literature provides descriptions and demonstrations of over 200 biases. These tendencies are largely implicit and unconscious, and they feel quite natural and self-evident even when we are aware of these cognitive inclinations (Pronin et al., 2002; Risen, 2015; Korteling et al., 2018b). That is why they are often termed “intuitive” (Kahneman and Klein, 2009) or “irrational” (Shafir and LeBoeuf, 2002). Biased reasoning can result in quite acceptable outcomes in natural or everyday situations, especially when the time cost of reasoning is taken into account (Simon, 1955; Gigerenzer and Gaissmaier, 2011). However, people often deviate from rationality and/or the tenets of logic, calculation, and probability in inadvisable ways (Tversky and Kahneman, 1974; Shafir and LeBoeuf, 2002), leading to suboptimal decisions in terms of invested time and effort (costs) given the available information and expected benefits.

Biases are largely caused by inherent (or structural) characteristics and mechanisms of the brain as a neural network (Korteling et al., 2018a; Korteling and Toet, 2020). Basically, these mechanisms—such as association, facilitation, adaptation, or lateral inhibition—result in a modification of the original or available data and its processing, (e.g. weighting its importance). For instance, lateral inhibition is a universal neural process resulting in the magnification of differences in neural activity (contrast enhancement), which is very useful for perceptual-motor functions, maintaining physical integrity and allostasis, (i.e. biological survival functions). For these functions our nervous system has been optimized for millions of years. However, “higher” cognitive functions, like conceptual thinking, probability reasoning or calculation, have been developed only very recently in evolution. These functions are probably less than 100 thousand years old, and may, therefore, be qualified as “embryonal” on the time scale of evolution, (e.g. McBrearty and Brooks, 2000; Henshilwood and Marean, 2003; Petraglia and Korisettar, 2003). In addition, evolution could not develop these new cognitive functions from scratch, but instead had to build this embryonal, and thin layer of human achievement from its “ancient” neural heritage for the essential biological survival functions (Moravec, 1988). Since cognitive functions typically require exact calculation and proper weighting of data, data transformations—like lateral inhibition—may easily lead to systematic distortions, (i.e. biases) in cognitive information processing. Examples of the large number of biases caused by the inherent properties of biological neural networks are: Anchoring bias (biasing decisions toward previously acquired information, Furnham and Boo, 2011; Tversky and Kahneman, 1973, Tversky and Kahneman, 1974), the Hindsight bias (the tendency to erroneously perceive events as inevitable or more likely once they have occurred, Hoffrage et al., 2000; Roese and Vohs, 2012) the Availability bias (judging the frequency, importance, or likelihood of an event by the ease with which relevant instances come to mind, Tversky and Kahnemann, 1973; Tversky and Kahneman, 1974), and the Confirmation bias (the tendency to select, interpret, and remember information in a way that confirms one’s preconceptions, views, and expectations, Nickerson, 1998). In addition to these inherent (structural) limitations of (biological) neural networks, biases may also originate from functional evolutionary principles promoting the survival of our ancestors who, as hunter-gatherers, lived in small, close-knit groups (Haselton et al., 2005; Tooby and Cosmides, 2005). Cognitive biases can be caused by a mismatch between evolutionarily rationalized “heuristics” (“evolutionary rationality”: Haselton et al., 2009) and the current context or environment (Tooby and Cosmides, 2005). In this view, the same heuristics that optimized the chances of survival of our ancestors in their (natural) environment can lead to maladaptive (biased) behavior when they are used in our current (artificial) settings. 
Biases that have been considered examples of this kind of mismatch are the Action bias (preferring action even when there is no rational justification for it; Baron and Ritov, 2004; Patt and Zeckhauser, 2000), Social proof (the tendency to mirror or copy the actions and opinions of others; Cialdini, 1984), the Tragedy of the commons (prioritizing personal interests over the common good of the community; Hardin, 1968), and the Ingroup bias (favoring one’s own group above that of others; Taylor and Doria, 1981).

This hard-wired (neurally inherent and/or evolutionarily ingrained) character of biased thinking makes it unlikely that simple and straightforward methods like training interventions or awareness courses will be very effective in ameliorating biases. This difficulty of bias mitigation indeed seems to be supported by the literature (Korteling et al., 2021).

General Intelligence Is Not the Same as Human-like Intelligence

Fundamental Differences Between Biological and Artificial Intelligence

We often think and deliberate about intelligence with an anthropocentric conception of our own intelligence in mind as an obvious and unambiguous reference. We tend to use this conception as a basis for reasoning about other, less familiar phenomena of intelligence, such as other forms of biological and artificial intelligence (Coley and Tanner, 2012). This may lead to fascinating questions and ideas. An example is the discussion about how and when the point of “intelligence at human level” will be reached. For instance, Ackermann (2018) writes: “Before reaching superintelligence, general AI means that a machine will have the same cognitive capabilities as a human being.” So, researchers deliberate extensively about the point in time when we will reach general AI (e.g., Goertzel, 2007; Müller and Bostrom, 2016). We suppose that these kinds of questions are not quite on target. There are, in principle, many different possible types of (general) intelligence conceivable, of which human-like intelligence is just one. This means, for example, that the development of AI is determined by the constraints of physics and technology, and not by those of biological evolution. So, just as the intelligence of a hypothetical extraterrestrial visitor to planet Earth is likely to have a different (in)organic structure, with different characteristics, strengths, and weaknesses, than that of the human residents, the same will apply to artificial forms of (general) intelligence. Below we briefly summarize a few fundamental differences between human and artificial intelligence (Bostrom, 2014):

  • Basic structure: Biological (carbon) intelligence is based on neural “wetware,” which is fundamentally different from artificial (silicon-based) intelligence. As opposed to biological wetware, in silicon, or digital, systems “hardware” and “software” are independent of each other (Kosslyn and Koenig, 1992). When a biological system has learned a new skill, this skill remains bound to the system itself. In contrast, if an AI system has learned a certain skill, the constituting algorithms can be directly copied to all other similar digital systems.

  • Speed: Signals in AI systems propagate at almost the speed of light. In humans, nerve conduction proceeds at a speed of at most 120 m/s, which is extremely slow on the time scale of computers (Siegel and Sapru, 2005).
  • Connectivity and communication: People cannot directly communicate with each other. They communicate via language and gestures with limited bandwidth. This is slower and more difficult than the communication of AI systems that can be connected directly to each other. Thanks to this direct connection, they can also collaborate on the basis of integrated algorithms.
  • Updatability and scalability: AI systems have almost no constraints with regard to keeping them up to date or to upscaling and/or re-configuring them, so that they have the right algorithms and the data processing and storage capacities necessary for the tasks they have to carry out. This capacity for rapid, structural expansion and immediate improvement hardly applies to people.
  • Energy consumption: In contrast, biology does a lot with a little: organic brains are millions of times more efficient in energy consumption than computers. The human brain consumes less energy than a lightbulb, whereas a supercomputer with comparable computational performance uses enough electricity to power quite a village (Fischetti, 2011).

These kinds of differences in basic structure, speed, connectivity, updatability, scalability, and energy consumption will necessarily also lead to different qualities and limitations of human and artificial intelligence. Our response speed to simple stimuli is, for example, many thousands of times slower than that of artificial systems. Computer systems can very easily be connected directly to each other and as such can be part of one integrated system. This means that AI systems do not have to be seen as individual entities that may easily work alongside each other or have mutual misunderstandings. And if two AI systems are engaged in a task, they run a minimal risk of making a mistake because of miscommunication (think of autonomous vehicles approaching a crossroad). After all, they are intrinsically connected parts of the same system and the same algorithm (Gerla et al., 2014).

Google Just Put an A.I. Brain in a Robot [Research Breakthrough]


Complexity and Moravec’s Paradox

Because biological, carbon-based, brains and digital, silicon-based, computers are optimized for completely different kinds of tasks (e.g., Moravec, 1988; Korteling et al., 2018b), human and artificial intelligence show fundamental and probably far-reaching differences. Because of these differences, it may be very misleading to use our own mind as a basis, model, or analogy for reasoning about AI. This may lead to erroneous conceptions, for example about the presumed abilities of humans and AI to perform complex tasks. Resulting flaws concerning information processing capacities often emerge in the psychological literature, in which “complexity” and “difficulty” of tasks are used interchangeably (see for examples: Wood et al., 1987; McDowd and Craik, 1988). Task complexity is then assessed in an anthropocentric way, that is, by the degree to which we humans can perform or master it. So, we use the difficulty to perform or master a task as a measure of its complexity, and task performance (speed, errors) as a measure of skill and intelligence of the task performer. Although this may sometimes be acceptable in psychological research, it can be misleading if we strive to understand the intelligence of AI systems. For us it is much more difficult to multiply two random six-digit numbers than to recognize a friend in a photograph. But when it comes to counting or arithmetic operations, computers are thousands of times faster and better, while the same systems have only recently taken steps in image recognition (which only succeeded when deep learning technology, based on some principles of biological neural networks, was developed). In general: cognitive tasks that are relatively difficult for the human brain (and which we therefore find subjectively difficult) need not be computationally complex (e.g., in terms of objective arithmetic, logic, and abstract operations). And vice versa: tasks that are relatively easy for the brain (recognizing patterns, perceptual-motor tasks, well-trained tasks) need not be computationally simple. This phenomenon, in which what is easy for the ancient, neural “technology” of people is difficult for the modern, digital technology of computers (and vice versa), has been termed Moravec’s Paradox. Hans Moravec (1988) wrote: “It is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility.”

Human Superior Perceptual-Motor Intelligence

Moravec’s paradox implies that biological neural networks are intelligent in different ways than artificial neural networks. Intelligence is not limited to the problems or goals that we as humans, equipped with biological intelligence, find difficult (Grind, 1997). Intelligence, defined as the ability to realize complex goals or solve complex problems, is much more than that. According to Moravec (1988), high-level reasoning requires very little computation, but low-level perceptual-motor skills require enormous computational resources. If we express the complexity of a problem in terms of the number of elementary calculations needed to solve it, then our biological perceptual-motor intelligence is highly superior to our cognitive intelligence. Our organic perceptual-motor intelligence is especially good at associative processing of higher-order invariants (“patterns”) in the ambient information. These are computationally more complex and contain more information than the simple, individual elements (Gibson, 1966; Gibson, 1979). An example of our superior perceptual-motor abilities is the Object Superiority Effect: we perceive and interpret whole objects faster and more effectively than the (simpler) individual elements that make up these objects (Weisstein and Harris, 1974; McClelland, 1978; Williams and Weisstein, 1978; Pomerantz, 1981). Thus, letters are also perceived more accurately when presented as part of a word than when presented in isolation, i.e., the Word superiority effect (e.g., Reicher, 1969; Wheeler, 1970). So, the difficulty of a task does not necessarily indicate its inherent complexity. As Moravec (1988) puts it: “We are all prodigious Olympians in perceptual and motor areas, so good that we make the difficult look easy. Abstract thought, though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it. It is not all that intrinsically difficult; it just seems so when we do it.”

Webinar QueryPlanet - AI for EO at scale: Introduction

The Supposition of Human-like AGI

So, if there were AI systems with general intelligence that could be used for a wide range of complex problems and objectives, those AGI machines would probably have a completely different intelligence profile, including other cognitive qualities, than humans have (Goertzel, 2007). This will be so even if we manage to construct AI agents that display behavior similar to ours and enable them to adapt to our way of thinking and problem-solving in order to promote human-AI teaming. Unless we decide to deliberately degrade the capabilities of AI systems (which would not be very smart), the underlying capacities and abilities of humans and machines with regard to the collection and processing of information, data analysis, probability reasoning, logic, memory capacity, etc., will remain dissimilar. Because of these differences, we should focus on systems that effectively complement us, and that make the human-AI system stronger and more effective. Instead of pursuing human-level AI, it would be more beneficial to focus on autonomous machines and (support) systems that fill in, or extend on, the manifold gaps of human cognitive intelligence. For instance, whereas people are forced—by the slowness and other limitations of biological brains—to think heuristically in terms of goals, virtues, rules, and norms expressed in (fuzzy) language, AI has already established excellent capacities to process and calculate directly on highly complex data. Therefore, for the execution of specific (narrow) cognitive tasks (logical, analytical, computational), modern digital intelligence may be more effective and efficient than biological intelligence. AI may thus help to produce better answers for complex problems using large amounts of data, consistent sets of ethical principles and goals, and probabilistic and logical reasoning (e.g., Korteling et al., 2018b). We therefore conjecture that the development of AI systems for supporting human decision making may ultimately prove the most effective route toward better choices and better solutions for complex issues. So, the cooperation and division of tasks between people and AI systems will have to be determined primarily by their mutually specific qualities. For example, tasks or task components that appeal to capacities in which AI systems excel will have to be less (or less fully) mastered by people, so that less training will probably be required. AI systems are already much better than people at logically and arithmetically correct gathering (selecting) and processing (weighing, prioritizing, analyzing, combining) of large amounts of data. They do this quickly, accurately, and reliably. They are also more stable (consistent) than humans, have no stress and emotions, and have great perseverance and a much better retention of knowledge and skills than people. As machines, they serve people completely and without any “self-interest” or “own hidden agenda.” Based on these qualities, AI systems may effectively take over tasks, or task components, from people. However, it remains important that people continue to master those tasks to a certain extent, so that they can take over or take adequate action if the machine system fails.

In general, people are better suited than AI systems for a much broader spectrum of cognitive and social tasks under a wide variety of (unforeseen) circumstances and events (Korteling et al., 2018b). People are also, for the time being, better at social-psychological interaction. For example, it is difficult for AI systems to interpret human language and symbolism. This requires a very extensive frame of reference, which, at least for now and for the near future, is difficult to achieve within AI. As a result of all these differences, people are still better at responding (as a flexible team) to unexpected and unpredictable situations and at creatively devising possibilities and solutions in open and ill-defined tasks, across a wide range of different, and possibly unexpected, circumstances. People will have to make extra use of their specific human qualities (i.e., what people are relatively good at) and train to improve the relevant competencies. In addition, human team members will have to learn to deal well with the overall limitations of AIs. With such a proper division of tasks, capitalizing on the specific qualities and limitations of humans and AI systems, human decisional biases may be circumvented and better performance may be expected. This means that enhancing a team with intelligent machines that have fewer cognitive constraints and biases may have more surplus value than striving for collaboration between humans and AI that have developed the same (human) biases. Although cooperation in teams with AI systems may require extra training in order to deal effectively with this bias mismatch, this heterogeneity will probably be better and safer. This also opens up the possibility of combining high levels of meaningful human control AND high levels of automation, which is likely to produce the most effective and safe human-AI systems (Elands et al., 2019; Shneiderman, 2020a). In brief: human intelligence is not the gold standard for general intelligence; instead of aiming at human-like AGI, the pursuit of AGI should thus focus on effective digital/silicon AGI in conjunction with an optimal configuration and allocation of tasks.

Explainability and Trust

Developments in artificial learning, and deep (reinforcement) learning in particular, have been revolutionary. Deep learning simulates a network resembling the layered neural networks of our brain. Based on large quantities of data, the network learns to recognize patterns and links to a high level of accuracy and then connects them to courses of action without knowing the underlying causal links. This implies that it is difficult to provide deep learning AI with some kind of transparency about how or why it has made a particular choice, for example by expressing reasoning about its decision process that is intelligible to humans, like we do (e.g., Belkom, 2019). In addition, reasoning about decisions the way humans do is a very malleable and ad hoc process (at least in humans). Humans are generally unaware of their implicit cognitions or attitudes, and are therefore not able to report adequately on them. It is therefore rather difficult for many humans to introspectively analyze their mental states, insofar as these are conscious, and to attach the results of this analysis to verbal labels and descriptions (e.g., Nosek et al., 2011). First, the human brain hardly reveals how it creates conscious thoughts (e.g., Feldman-Barret, 2017). What it actually does is give us the illusion that its products reveal its inner workings. In other words: our conscious thoughts tell us nothing about the way in which these thoughts came about. There is also no subjective marker that distinguishes correct reasoning processes from erroneous ones (Kahneman and Klein, 2009). The decision maker therefore has no way to distinguish between correct thoughts, emanating from genuine knowledge and expertise, and incorrect ones following from inappropriate neuro-evolutionary processes, tendencies, and primal intuitions. So here we could ask the question: isn’t it more trustworthy to have a real black box than to listen to a confabulating one? In addition, according to Werkhoven et al. (2018), demanding explainability, observability, or transparency (Belkom, 2019; van den Bosch et al., 2019) may constrain the potential benefit of artificially intelligent systems for human society to what can be understood by humans.

Of course we should not blindly trust the results generated by AI. Like other fields of complex technology (e.g., Modeling & Simulation), AI systems need to be verified (meeting specifications) and validated (meeting the system’s goals) with regard to the objectives for which the system was designed. In general, when a system is properly verified and validated, it may be considered safe, secure, and fit for purpose. It therefore deserves our trust for (logically) comprehensible and objective reasons (although mistakes can still happen). Likewise, people trust the performance of airplanes and cell phones even though they are almost completely ignorant of their complex inner processes. Like our own brains, artificial neural networks are fundamentally non-transparent (Nosek et al., 2011; Feldman-Barret, 2017). Therefore, trust in AI should be based primarily on its objective performance. This forms a more solid basis than trust built on subjective (and easily manipulated) impressions, stories, or images aimed at belief and appeal to the user. Based on empirical validation research, developers and users can explicitly verify how well the system is doing with respect to the set of values and goals for which the machine was designed. At some point, humans may want to trust that goals can be achieved at lower cost and with better outcomes when we accept solutions even if they are less transparent to humans (Werkhoven et al., 2018).

Advanced Analytics Webinar: 2020 Trends in Enterprise Advanced Analytics

The Impact of Multiple Narrow AI Technology

AGI as the Holy Grail

AGI, like human general intelligence, would have many obvious advantages compared to narrow (limited, weak, specialized) AI. An AGI system would be much more flexible and adaptive. On the basis of generic training and reasoning processes it would understand autonomously how multiple problems in all kinds of different domains can be solved in relation to their context (e.g., Kurzweil, 2005). AGI systems would also require far fewer human interventions to accommodate the various loose ends among partial elements, facets, and perspectives in complex situations. AGI would really understand problems and would be capable of viewing them from different perspectives (as people, ideally, also can). A characteristic of current (narrow) AI tools is that they are skilled at a very specific task, on which they can often perform at superhuman levels (e.g., Goertzel, 2007; Silver et al., 2017). These specific tasks have been well defined and structured. Narrow AI systems are less suitable, or totally unsuitable, for tasks or task environments that offer little structure, consistency, rules, or guidance, and in which all sorts of unexpected, rare, or uncommon events (e.g., emergencies) may occur. Knowing and following fixed procedures usually does not lead to proper solutions in these varying circumstances. In the context of (unforeseen) changes in goals or circumstances, the adequacy of current AI is considerably reduced because it cannot reason from a general perspective and adapt accordingly (Lake et al., 2017; Horowitz, 2018). As with narrow AI systems, people are then needed to supervise these deviations in order to enable flexible and adaptive system performance. The quest for AGI may therefore be considered a search for a kind of holy grail.

Multiple Narrow AI is Most Relevant Now!

The potentially high prospects of AGI, however, do not imply that AGI will be the most crucial factor in future AI R&D, at least for the short and medium term. When reflecting on the great potential benefits of general intelligence, we tend to consider narrow AI applications as separate entities that can easily be outperformed by a broader AGI that presumably can deal with everything. But just as our modern world has evolved rapidly through a diversity of specific (limited) technological innovations, at the system level the total and wide range of emerging AI applications will also have a groundbreaking technological and societal impact (Peeters et al., 2020). This will be all the more relevant for the future world of big data, in which everything is connected to everything through the Internet of Things. So, it will be much more profitable and beneficial to develop and build (non-human-like) AI variants that excel in areas where people are inherently limited. It seems not too far-fetched to suppose that the multiple variants of narrow AI applications will also gradually become more broadly interconnected. In this way, a development toward an ever broader realm of integrated AI applications may be expected. In addition, it is already possible to train a language model AI (Generative Pre-trained Transformer 3, GPT-3) on a gigantic dataset and then have it learn various tasks based on a handful of examples, so-called one- or few-shot learning. GPT-3 (developed by OpenAI) can do this with language-related tasks, but there is no reason why this should not be possible with image and sound, or with combinations of the three (Brown, 2020).
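To make the idea of few-shot learning concrete, the sketch below shows the typical structure of a few-shot prompt: a task description followed by a handful of worked examples, with the model expected to complete the final, unfinished one. The complete() function is a hypothetical stand-in for a call to any large language model; no specific vendor API is assumed.

```python
# A hypothetical text-completion call; the prompt structure is the point here,
# not any particular vendor API.
def complete(prompt: str) -> str:
    raise NotImplementedError("stand-in for a call to a large language model")

# Few-shot prompt: the task is never trained explicitly; it is inferred
# from the examples embedded in the prompt itself (no gradient updates occur).
few_shot_prompt = """Translate English to French.

English: cheese
French: fromage

English: good morning
French: bonjour

English: sea otter
French:"""

# print(complete(few_shot_prompt))  # a capable model would answer "loutre de mer"
```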

Besides, Moravec’s Paradox implies that the development of AI “partners” with many kinds of human(-level) qualities will be very difficult to achieve, whereas their added value (i.e., beyond the boundaries of human capabilities) will be relatively low. The most fruitful AI applications will mainly involve supplementing human constraints and limitations. Given the present incentives for competitive technological progress, multiple forms of (connected) narrow AI systems will be the major driver of AI’s impact on our society for the short and medium term. For the near future, this may imply that AI applications will remain very different from, and in many respects almost incomparable with, human agents. This is likely to be true even if the hypothetical match of artificial general intelligence (AGI) with human cognition were to be achieved in the longer term. Intelligence is a multi-dimensional (quantitative, qualitative) concept. All dimensions of AI unfold and grow along their own paths with their own dynamics. Therefore, over time an increasing number of specific (narrow) AI capacities may gradually match, overtake, and transcend human cognitive capacities. Given the enormous advantages of AI, for example in the field of data availability and data processing capacities, the realization of AGI would probably at the same time outclass human intelligence in many ways. This implies that the hypothetical point in time at which human and artificial cognitive capacities match, i.e., human-level AGI, will probably be hard to define in a meaningful way (Goertzel, 2007).

So when AI will truly understand us as a “friend,” “partner,” “alter ego,” or “buddy,” as we do when we collaborate with other humans, it will at the same time surpass us in many areas (Moravec, 1998). It will have a completely different profile of capacities and abilities, and thus it will not be easy to really understand the way it “thinks” and comes to its decisions. In the meantime, however, as the capacities of robots expand and move from simple tools to more integrated systems, it is important to calibrate our expectations and perceptions of robots appropriately. So, we will have to enhance our awareness and insight concerning the continuous development and progression of multiple forms of (integrated) AI systems. This concerns, for example, the multi-faceted nature of intelligence. Different kinds of agents may have different combinations of intelligences at very different levels. An agent with general intelligence may, for example, be endowed with excellent abilities in the areas of image recognition, navigation, calculation, and logical reasoning, while at the same time being dull in the areas of social interaction and goal-oriented problem solving. This awareness of the multi-dimensional nature of intelligence also concerns the way we have to deal with (and capitalize on) anthropomorphism: the human tendency in human-robot interaction to characterize non-human artifacts that superficially look similar to us as possessing human-like traits, emotions, and intentions (e.g., Kiesler and Hinds, 2004; Fink, 2012; Haring et al., 2018). Insight into these human-factors issues is crucial to optimize the utility, performance, and safety of human-AI systems (Peeters et al., 2020).

From this perspective, the question whether or not “AGI at the human level” will be realized is not the most relevant question for the time being. According to most AI scientists, this will certainly happen, and the key question is not IF this will happen, but WHEN, (e.g., Müller and Bostrom, 2016). At a system level, however, multiple narrow AI applications are likely to overtake human intelligence in an increasingly wide range of areas.

UMaine Artificial Intelligence Webinar


Conclusions and Framework

The present paper focused on providing some more clarity and insight into the fundamental characteristics, differences, and idiosyncrasies of human and artificial intelligence. First we presented ideas and arguments to scale up and differentiate our conception of intelligence, whether human or artificial. Central to this broader, multi-faceted conception of intelligence is the notion that intelligence in itself is a matter of information and computation, independent of its physical substrate. However, the nature of this physical substrate (biological/carbon or digital/silicon) will substantially determine its potential envelope of cognitive abilities and limitations. The organic cognitive faculties of humans developed only very recently in the evolution of mankind. These “embryonal” faculties have been built on top of a biological neural network apparatus that has been optimized for allostasis and (complex) perceptual-motor functions. Human cognition is therefore characterized by various structural limitations and distortions in its capacity to process certain forms of non-biological information. Biological neural networks are, for example, not very capable of performing arithmetic calculations, for which a pocket calculator is millions of times better suited. These inherent and ingrained limitations, which are due to the biological and evolutionary origin of human intelligence, may be termed “hard-wired.”

In line with Moravec’s paradox, we argued that intelligent behavior is more than what we, as Homo sapiens, find difficult. So we should not confuse task difficulty (subjective, anthropocentric) with task complexity (objective). Instead, we advocated a versatile conceptualization of intelligence and an acknowledgment of its many possible forms and compositions. This implies a high variety of types of biological or other forms of high (general) intelligence, with a broad range of possible intelligence profiles and cognitive qualities (which may or may not surpass ours in many ways). This would make us better aware of the most probable potential of AI applications for the short- and medium-term future. For example, from this perspective, our primary research focus should be on those components of the intelligence spectrum that are relatively difficult for the human brain and relatively easy for machines. This primarily involves the cognitive components requiring calculation, arithmetic analysis, statistics, probability calculation, data analysis, logical reasoning, memorization, et cetera.

In line with this, we have advocated a modest, more humble view of our human general intelligence, which also implies that human-level AGI should not be considered the “gold standard” of intelligence (to be pursued with foremost priority). Because of the many fundamental differences between natural and artificial intelligence, human-like AGI will be very difficult to accomplish in the first place (and with relatively limited added value). If an AGI is accomplished in the (far) future, it will therefore probably have a completely different profile of cognitive capacities and abilities than we, as humans, have. When such an AGI has come so far that it is able to “collaborate” like a human, it will at the same time likely already function, in many respects, at levels highly superior to what we are able to do. For the time being, however, it will not be very realistic or useful to aim at AGI that includes the broad scope of human perceptual-motor and cognitive abilities. Instead, the most profitable AI applications for the short- and medium-term future will probably be based on multiple narrow AI systems. These multiple narrow AI applications may catch up with human intelligence in an increasingly broad range of areas.

From this point of view, we advocate not dwelling too intensively on the AGI question, on whether or when AI will outsmart us or take our jobs, or on how to endow it with all kinds of human abilities. Given the present state of the art, it may be wise to focus more on the whole system of multiple AI innovations with humans as a crucial connecting and supervising factor. This also implies the establishment and formalization of legal boundaries and proper (effective, ethical, safe) goals for AI systems (Elands et al., 2019; Aliman, 2020). So this human factor (legislator, user, “collaborator”) needs good insight into the characteristics and capacities of biological and artificial intelligence (under all sorts of tasks and working conditions). Both in the workplace and in policy making, the most fruitful AI applications will be those that complement and compensate for the inherent biological and cognitive constraints of humans. For this reason, prominent issues concern how to use AI intelligently: for what tasks and under what conditions is it safe to leave decisions to AI, and when is human judgment required? How can we capitalize on the strengths of human intelligence, and how can we deploy AI systems effectively to complement and compensate for the inherent constraints of human cognition? See Hoffman and Johnson (2019), Shneiderman (2020a), and Shneiderman (2020b) for recent overviews.

In summary: no matter how intelligent autonomous AI agents become in certain respects, at least for the foreseeable future they will remain unconscious machines. These machines have a fundamentally different operating system (biological vs. digital) and correspondingly different cognitive abilities and qualities than people and other animals. So, before a proper “team collaboration” can start, the human team members will have to understand these kinds of differences, i.e., how human information processing and intelligence differ from those of the many possible and specific variants of AI systems. Only when humans develop a proper understanding of these “interspecies” differences can they effectively capitalize on the potential benefits of AI in (future) human-AI teams. Given the high flexibility, versatility, and adaptability of humans relative to AI systems, the first challenge then becomes how to ensure human adaptation to the more rigid abilities of AI. In other words: how can we achieve a proper conception of the differences between human and artificial intelligence?

CSIAC Webinar - Deep Learning for Radio Frequency Target Classification

Framework for Intelligence Awareness Training

To address this question, the issue of Intelligence Awareness in human professionals needs to be addressed more vigorously. Next to computer tools for the distribution of relevant awareness information (Collazos et al., 2019) in human-machine systems, this requires better education and training on how to deal with the very new and different characteristics, idiosyncrasies, and capacities of AI systems. This includes, for example, a proper understanding of the basic characteristics, possibilities, and limitations of the AI’s cognitive system properties without anthropocentric and/or anthropomorphic misconceptions. In general, this “Intelligence Awareness” is highly relevant in order to better understand, investigate, and deal with the manifold possibilities and challenges of machine intelligence. This practical human-factors challenge could, for instance, be tackled by developing new, targeted, and easily configurable (adaptive) training forms and learning environments for human-AI systems. These flexible training forms and environments (e.g., simulations and games) should focus on developing knowledge, insight, and practical skills concerning the specific, non-human characteristics, abilities, and limitations of AI systems and how to deal with these in practical situations. People will have to understand the critical factors determining the goals, performance, and choices of AI. This may in some cases even include the simple notion that an AI is about as excited about achieving its goals as your refrigerator is about keeping your milk cold. They have to learn when and under what conditions decisions are safe to leave to AI, and when human judgment is required or essential. And, more generally: how does it “think” and decide? The relevance of this kind of knowledge, skills, and practices will only grow as the degree of autonomy (and generality) of advanced AI systems increases.

What does such an Intelligence Awareness training curriculum look like? It needs to include at least a module on the cognitive characteristics of AI, a subject similar to those included in curricula on human cognition. This broad module on the “Cognitive Science of AI” may involve a range of sub-topics, starting with a revision of the concept of “Intelligence” stripped of anthropocentric and anthropomorphic misunderstandings. In addition, this module should focus on providing knowledge about the structure and operation of the AI operating system, or the “AI mind.” This may be followed by subjects like: perception and interpretation of information by AI; AI cognition (memory, information processing, problem solving, biases); dealing with AI possibilities and limitations in “human” areas like creativity, adaptivity, autonomy, reflection, and (self-)awareness; dealing with goal functions (valuation of actions in relation to cost-benefit); AI ethics; and AI security. In addition, such a curriculum should include technical modules providing insight into the working of the AI operating system. Due to the enormous speed with which AI technology and its applications develop, the content of such a curriculum is also very dynamic and continuously evolving on the basis of technological progress. This implies that the curriculum, training aids, and environments should be flexible, experiential, and adaptive, which makes serious gaming an ideally suited working form. Below, we provide a global framework for the development of new educational curricula on AI awareness. These subtopics go beyond learning to effectively “operate,” “control,” or interact with specific AI applications (i.e., conventional human-machine interaction):

  • Understanding of underlying system characteristics of the AI (the “AI brain”). Understanding the specific qualities and limitations of AI relative to human intelligence.
  • Understanding the complexity of the tasks and of the environment from the perspective of AI systems.
  • Understanding the problem of biases in human cognition, relative to biases in AI.
  • Understanding the problems associated with the control of AI, predictability of AI behavior (decisions), building trust, maintaining situation awareness (complacency), dynamic task allocation, (e.g. taking over each other’s tasks) and responsibility (accountability).
  • How to deal with possibilities and limitations of AI in the field of “creativity”, adaptability of AI, “environmental awareness”, and generalization of knowledge.
  • Learning to deal with perceptual and cognitive limitations and possible errors of AI which may be difficult to comprehend.
  • Trust in the performance of AI (possibly in spite of limited transparency or ability to “explain”) based on verification and validation.
  • Learning to deal with our natural inclination to anthropocentrism and anthropomorphism (“theory of mind”) when reasoning about human-robot interaction.
  • How to capitalize on the powers of AI in order to deal with the inherent constraints of human information processing (and vice versa).
  • Understanding the specific characteristics and qualities of the human-machine system, and being able to decide when, for what, and how the integrated combination of human and AI faculties can best realize the overall system potential.

In conclusion: due to the enormous speed with which AI technology and its applications evolve, we need a more versatile conceptualization of intelligence and an acknowledgment of its many possible forms and combinations. A revised conception of intelligence also includes a good understanding of the basic characteristics, possibilities, and limitations of different (biological, artificial) cognitive system properties, without anthropocentric and/or anthropomorphic misconceptions. This “Intelligence Awareness” is highly relevant in order to better understand and deal with the manifold possibilities and challenges of machine intelligence, for instance to decide when to use or deploy AI in relation to tasks and their context. The development of educational curricula with new, targeted, and easily configurable training forms and learning environments for human-AI systems is therefore recommended. Further work should focus on training tools, methods, and content that are flexible and adaptive enough to keep up with the rapid changes in the field of AI and with the wide variety of target groups and learning goals.

More Information:

https://radar.gesda.global/topics/advanced-ai

https://www.stateof.ai

https://advancedinstitute.ai

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3181994/

https://www.frontiersin.org/articles/10.3389/frai.2021.622364/full

https://hmn24.com/blogs/articles/defining-human-intelligence










Impact of Quantum Computing on Security



A group of researchers has claimed that quantum computers can now crack the encryption we use to protect emails, bank accounts and other sensitive data. Although this has long been a theoretical possibility, existing quantum computers weren't yet thought to be powerful enough to threaten encryption.

Can quantum computers break 256-bit encryption?

AES-256 is considered resistant to attack by quantum computers: a 2019 Kryptera research paper estimated that a quantum computer capable of more than 6,600 logical, error-corrected qubits would be required to break AES-256 encryption.

QUANTUM ALGORITHMS: SHOR'S ALGORITHM

WHAT IT IS

Shor’s Algorithm is an algorithm for finding the prime factors of large numbers in polynomial time. In cybersecurity, a common encryption technique is RSA (Rivest–Shamir–Adleman). RSA uses a public key whose modulus is the product of two large prime numbers; the primes themselves are kept secret. RSA is based on the assumption that a computer won’t be able to factor a very large number into its prime components, as factoring is a very different kind of problem compared to addition or multiplication. Shor’s algorithm takes advantage of the quantum mechanical properties of superposition and interference to find the period of a modular exponentiation function far faster than any known classical method; the same overall procedure could in principle be carried out on a classical computer, but over a vastly longer time frame.
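To make the structure of Shor’s algorithm concrete, the sketch below shows the classical part of the procedure: given the period r of a^x mod N (the step a quantum computer accelerates), factors of N follow from two greatest-common-divisor computations. The period finding here is done by brute force purely for illustration, so it only works for tiny numbers; the function names are our own, not from any particular library.

```python
from math import gcd

def find_period(a, N):
    """Classical stand-in for the quantum subroutine: the smallest r > 0
    with a**r % N == 1. Brute force, so only feasible for tiny N."""
    r, x = 1, a % N
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

def shor_postprocessing(N, a=7):
    """Derive factors of N from the period of a modulo N (Shor's classical step)."""
    g = gcd(a, N)
    if g != 1:
        return g, N // g                      # lucky: a already shares a factor with N
    r = find_period(a, N)
    if r % 2 == 1 or pow(a, r // 2, N) == N - 1:
        return None                           # unlucky choice of a; retry with another a
    return gcd(pow(a, r // 2) - 1, N), gcd(pow(a, r // 2) + 1, N)

print(shor_postprocessing(15))                # the textbook example: 15 = 3 * 5
```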

For example, according to Thorsten Kleinjung of the University of Bonn, it would take 1 or 2 years to factor N = 13506641086599522334960321627880596993888147560566702752448514385152651060485953383394028715057190944179820728216447155137368041970396491743046496589274256239341020864383202110372958725762358509643110564073501508187510676594629205563685529475213500852879416377328533906109750544334999811150056977236890927563 on a 2.2 GHz Athlon 64 CPU PC with ≤ 2 GB memory.

Shor’s Algorithm is a powerful tool with the potential to factor the large numbers used in public-key cryptography into their prime components, granting its wielder the ability to break many current encryption schemes. While today’s NISQ quantum computers are not yet sufficient to break RSA encryption, experts estimate that within a few years this could be possible. Indeed, Shor’s algorithm sparked significant interest in quantum computers.

While some predict that Shor’s Algorithm will be able to run on quantum annealing devices - non-universal quantum computers with only specialized optimization applications - within 10 years, many factors go into this estimate. It assumes that the number of annealing qubits will double every other year, as has happened in the past, but we must not rely fully on this extrapolation. Annealing qubit advancements could be hindered by various roadblocks that slow this timeframe, or ongoing research in annealing or universal quantum computing could produce breakthroughs revealing better algorithms or technologies that speed the process up.

This uncertainty in the time frame, as well as the severity of RSA encryption being rendered useless, has drawn attention from adjacent fields and governments alike. A recent United States Executive Order establishing a timeline for creating standards and frameworks for post-quantum cryptography (PQC) and developing quantum-resistant algorithms demonstrates how serious this security breach could be. Where there is data, there is a need for security. CIOs and CTOs from enterprises all over the world are investing in quantum after learning of the impact of Shor’s Algorithm, oftentimes strengthening quantum competencies with other applications and gaining an advantage over those who wait to begin their quantum journey. For example, UnitedHealth Group has quantum teams on three continents, and they’ve begun investigating artificial intelligence applications with quantum machine learning after their start in quantum cryptography.




NIST announces first 4 Post Quantum Cryptographic Algorithms


Six years after it first announced its post-quantum cryptography standardization project, the National Institute of Standards and Technology (NIST) has revealed the first four algorithms to make the grade.

The PQC Timeline

Launched in 2016, the project was an open call to the world’s cryptographers to submit candidate algorithms that would be resistant to attacks by future quantum computers. The deadline for the original submissions was 30 November 2017, and by the end of that year NIST had announced it had accepted a total of 69 submissions.

In January 2019 NIST announced that 26 candidates had made it through to the second round of evaluation. By July 2020 this had been narrowed down to 7 third-round finalists and 8 alternates. With this month’s announcement we are one step closer to the final published standards, which are expected in 2024.

Commenting on the announcement, the US Secretary of Commerce had this to say:

“Today’s announcement is an important milestone in securing our sensitive data against the possibility of future cyberattacks from quantum computers. Thanks to NIST’s expertise and commitment to cutting-edge technology, we are able to take the necessary steps to secure electronic information so U.S. businesses can continue innovating while maintaining the trust and confidence of their customers.”

“Our post-quantum cryptography program has leveraged the top minds in cryptography, worldwide, to produce this first group of quantum-resistant algorithms that will lead to a standard and significantly increase the security of our digital information.”


The First Four


These algorithms are the first four of what will constitute the preliminary post-quantum cryptography standards. The primary algorithms, which NIST recommends be implemented in most cases, are based on module lattices. They comprise:

1. CRYSTALS-Kyber – an IND-CCA2-secure key encapsulation mechanism (KEM) based on the hardness of solving the Learning With Errors (LWE) problem over module lattices (see the usage sketch after this list).

2. CRYSTALS-Dilithium – a digital signature scheme, also based on the hardness of mathematical problems over module lattices.

Two other digital signature algorithms are also standardized:

3. FALCON – a lattice-based digital signature scheme that utilises the short integer solution problem over NTRU lattices. FALCON has smaller signature sizes and can be used when the size of the signature is an issue.

4. SPHINCS+ – a stateless, hash-based signature scheme. SPHINCS+ has an excellent security record. It provides a digital signature scheme based on a totally different hard problem. Its large signature size may restrict its use to specific cases.
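As an illustration of how a post-quantum KEM such as CRYSTALS-Kyber is used in practice, here is a minimal sketch based on the open-source liboqs library and its Python bindings (liboqs-python, imported as `oqs`). It assumes liboqs is installed and that the Kyber parameter set is exposed under a name like "Kyber768"; exact algorithm names and APIs vary between library versions, so treat this as a sketch rather than a drop-in snippet.

```python
# pip install liboqs-python   (assumes the liboqs C library is also available)
import oqs

KEM_ALG = "Kyber768"  # assumed algorithm name; newer releases may expose it as "ML-KEM-768"

# The receiver generates a key pair and publishes the public key.
with oqs.KeyEncapsulation(KEM_ALG) as receiver, oqs.KeyEncapsulation(KEM_ALG) as sender:
    public_key = receiver.generate_keypair()

    # The sender encapsulates a fresh shared secret against the public key.
    ciphertext, shared_secret_sender = sender.encap_secret(public_key)

    # The receiver decapsulates the ciphertext with its private key.
    shared_secret_receiver = receiver.decap_secret(ciphertext)

    # Both sides now hold the same symmetric key, e.g. for use with AES-256.
    assert shared_secret_sender == shared_secret_receiver
```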

In addition, NIST has launched a fourth round, in order to standardize at least one more algorithm for key exchange, which will not be based on lattices. The four algorithms selected for this fourth round are: BIKE, Classic McEliece, HQC and SIKE. This will ensure a variety of hard problems, in the unlikely case that lattice-based systems fail in the future. The case of Rainbow, which was one of the finalists of Round 3, but was recently broken, is a sobering reminder that the security of any new scheme is not absolute.

NIST has also announced a future new Call for Proposals for different digital signature algorithms. The aim is to reduce the size of the keys and increase the diversity of the possible schemes.


An Update of NIST's Post-Quantum Cryptography Standardization



Roland van Rijswijk-Deij (UTwente, NLnet Labs) – Quantum Prooving the Internet


Impact of quantum computing on security


Post-Quantum Cryptography: the Good, the Bad, and the Powerful



Quantum Computing: Random Number Generator & Quantum Safe Digital Certification

The threat of quantum to cyber security

One major threat is the breaking of RSA cryptography. Based on a 2048-bit number, the RSA encryption algorithm is widely utilised for sending sensitive information over the internet. According to industry experts, quantum computers would need around 70 million qubits to break that encryption.

How Quantum Computers Break Encryption | Shor's Algorithm Explained

How Quantum Computing Will Transform Cybersecurity


Quantum computing is based on quantum mechanics, which governs how nature works at the smallest scales. The smallest classical computing element is a bit, which can be either 0 or 1. The quantum equivalent is a qubit, which can also be 0 or 1 or in what's called a superposition — any combination of 0 and 1. Performing a calculation on two classical bits (which can be 00, 01, 10 and 11) requires four calculations. A quantum computer can perform calculations on all four states simultaneously. This scales exponentially: 1,000 qubits would, in some respects, be more powerful than the world's most powerful supercomputer.

The promise of quantum computing, however, is not speeding up conventional computing. Rather, it will deliver an exponential advantage for certain classes of problems, such as factoring very large numbers, with profound implications for cybersecurity.

Qubits, however, are inherently unstable. Interaction between a qubit and its surroundings degrades information in microseconds. Isolating qubits from the environment, for example, by cooling them close to absolute zero, is challenging and expensive. Noise increases with qubit count, requiring complex error correction approaches.

The other quantum concept central to quantum computing is entanglement, whereby qubits can become correlated such that they are described by a single quantum state. Measure one and you instantaneously know the state of the other. Entanglement is important in quantum cryptography and quantum communication.
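The sketch below illustrates both ideas with a two-qubit Bell state: a Hadamard gate puts one qubit into an equal superposition of 0 and 1, and a CNOT gate entangles the second qubit with it, so a measurement of either qubit immediately determines the other. It assumes the open-source Qiskit library is installed and uses its statevector simulation; gate and class names are as documented in recent Qiskit releases, but APIs do change between versions.

```python
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

# Build the Bell state (|00> + |11>) / sqrt(2).
bell = QuantumCircuit(2)
bell.h(0)       # superposition: qubit 0 becomes an equal mix of 0 and 1
bell.cx(0, 1)   # entanglement: qubit 1 is now perfectly correlated with qubit 0

state = Statevector.from_instruction(bell)
print(state.probabilities_dict())
# Expected output: {'00': 0.5, '11': 0.5} -- outcomes 01 and 10 never occur,
# the kind of correlation that entanglement-based quantum cryptography relies on.
```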

Quantum Computing: One Weird Trick to Break RSA Encryption


Cybersecurity Implications


Quantum computing, and more prosaic quantum technologies, promise to transform cybersecurity in four areas:

1. Quantum random number generation is fundamental to cryptography. Conventional random number generators typically rely on algorithms known as pseudo-random number generators, which are not truly random and thus potentially open to compromise. Companies such as Quantum Dice and IDQuantique are developing quantum random number generators that utilize quantum optics to generate sources of true randomness. These products are already seeing commercial deployment.

2. Quantum-secure communications, specifically quantum key distribution (QKD). Sharing cryptographic keys between two or more parties to allow them to privately exchange information is at the heart of secure communications. QKD utilizes aspects of quantum mechanics to enable the completely secret exchange of encryption keys and can even alert the parties to the presence of an eavesdropper. QKD is currently limited to fiber transmission over tens of kilometers, with proofs of concept via satellite over several thousand kilometers. KETS Quantum Security and Toshiba are two pioneers in this field (a simplified sketch of the key-sifting idea behind QKD follows after this list).

3. The most controversial application of QC is its potential for breaking public-key cryptography, specifically the RSA algorithm, which is at the heart of the nearly $4 trillion ecommerce industry. RSA relies on the fact that the product of two prime numbers is computationally challenging to factor. It would take a classical computer trillions of years to break RSA encryption. A quantum computer with around 4,000 error-free qubits could defeat RSA in seconds. However, this would require closer to 1 million of today's noisy qubits. The world's largest quantum computer is currently less than 100 qubits; however, IBM and Google have road maps to achieve 1 million by 2030. A million-qubit quantum computer may still be a decade away, but that time frame could well be compressed. Additionally, highly sensitive financial and national security data is potentially susceptible to being stolen today — only to be decrypted once a sufficiently powerful quantum computer becomes available. The potential threat to public-key cryptography has engendered the development of algorithms that are invulnerable to quantum computers. Companies like PQShield are pioneering this post-quantum cryptography.

4. Machine learning has revolutionized cybersecurity, enabling novel attacks to be detected and blocked. The cost of training deep models grows exponentially as data volumes and complexity increase. OpenAI's GPT-3 used as much carbon as a typical American would use in 17 years. The emerging field of quantum machine learning may enable exponentially faster, more time- and energy-efficient machine learning algorithms. This, in turn, could yield more effective algorithms for identifying and defeating novel cyberattack methods.
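To show the key-sifting idea behind QKD mentioned in point 2, here is a deliberately simplified, purely classical simulation in the spirit of the BB84 protocol: Alice encodes random bits in randomly chosen bases, Bob measures in his own random bases, and the two keep only the positions where their bases happened to agree. Real QKD relies on actual photons (so that eavesdropping disturbs the measurements); this sketch only models the sifting logic, and all names are ours.

```python
import secrets

def random_bits(n):
    return [secrets.randbits(1) for _ in range(n)]

n = 32
alice_bits  = random_bits(n)   # the raw key material Alice wants to share
alice_bases = random_bits(n)   # 0 = rectilinear basis, 1 = diagonal basis
bob_bases   = random_bits(n)   # Bob picks his measurement bases independently

# If Bob's basis matches Alice's, he reads her bit; otherwise his outcome is random.
bob_bits = [a if ab == bb else secrets.randbits(1)
            for a, ab, bb in zip(alice_bits, alice_bases, bob_bases)]

# Sifting: bases (not bits) are compared publicly; mismatched positions are discarded.
sifted_alice = [a for a, ab, bb in zip(alice_bits, alice_bases, bob_bases) if ab == bb]
sifted_bob   = [b for b, ab, bb in zip(bob_bits,   alice_bases, bob_bases) if ab == bb]

assert sifted_alice == sifted_bob          # on average about half the bits survive
print(f"shared key bits: {sifted_alice}")
```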

Quantum Computing Challenges


Quantum computing promises to transform cybersecurity, but there are substantial challenges to address and fundamental breakthroughs still to be made.

The most immediate challenge is to achieve sufficient numbers of fault-tolerant qubits to unleash quantum computing's computational promise. Companies such as IBM, Google, Honeywell and Amazon are investing in this problem.

Quantum computers are currently programmed from individual quantum logic gates, which may be acceptable for small quantum computers but is impractical once we get to thousands of qubits. Companies like IBM and Classiq are developing more abstracted layers in the programming stack, enabling developers to build powerful quantum applications to solve real-world problems.

Arguably, the key bottleneck in the quantum computing industry will be a lack of talent. While universities churn out computer science graduates at an accelerating pace, there is still too little being done to train the next generation of quantum computing professionals.

The United States' National Quantum Initiative Act is a step in the right direction and incorporates funding for educational initiatives. There are also some tremendous open-source communities that have developed around quantum computing — perhaps the most exciting and active being the IBM Qiskit community. It will take efforts from governments, universities, industry and the broader technology ecosystem to enable the level of talent development required to truly capitalize on quantum computing.

Preparing For The Quantum Future


The quantum revolution is upon us. Although the profound impact of large-scale fault-tolerant quantum computers may be a decade off, near-term quantum computers will still yield tremendous benefits. We are seeing substantial investment in solving the core problems around scaling qubit count, error correction and algorithms. From a cybersecurity perspective, while quantum computing may render some existing encryption protocols obsolete, it has the promise to enable a substantially enhanced level of communication security and privacy.

B’Envoy-age to Pre-Quantum Encryption - Daniel Rouhana, Emma Dickenson, Doron Podoleanu

Organizations must think strategically about the longer-term risks and benefits of quantum computing and technology and engage in a serious way today to be ready for the quantum revolution of tomorrow.


To put it in a nutshell, the impact of quantum computing on cybersecurity is profound and game-changing.

Quantum computing holds great promise in many areas, such as medical research, artificial intelligence, weather forecasting, etc. But it also poses a significant threat to cybersecurity, requiring a change in how we encrypt our data. Even though quantum computers don’t technically have the power to break most of our current forms of encryption yet, we need to stay ahead of the threat and come up with quantum-proof solutions now. If we wait until those powerful quantum computers start breaking our encryption, it will be too late. 

Another Reason to Act Now: Harvest Now, Decrypt Later
Regardless of when quantum computers will be commercially available, another reason to quantum-proof data now is the threat from nefarious actors scraping data. They are already stealing data and holding onto it until they can get their hands on a quantum computer to decrypt it. At that point, the data will have already been compromised. The only way to ensure the security of information, particularly information that needs to remain secure well into the future, is to safeguard it now with quantum-safe key delivery. 

The Quantum Threat to Cybersecurity

Quantum computers will be able to solve problems that are far too complex for classical computers to figure out. This includes breaking the algorithms behind the encryption keys that protect our data and the Internet’s infrastructure.

Much of today’s encryption is based on mathematical problems that would take today’s computers an impractically long time to solve. To simplify this, take two large prime numbers and multiply them together. It’s easy to come up with the product, but much harder to start with the product and factor it back into its two primes. A quantum computer, however, can easily factor those numbers and break the code. Peter Shor developed a quantum algorithm (aptly named Shor’s algorithm) that factors large numbers far more quickly than a classical computer. Since then, scientists have been working on developing quantum computers that can factor increasingly larger numbers.
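A tiny numerical illustration of that asymmetry, using deliberately small primes (the 10,000th and 100,000th) so the naive approach still finishes: multiplying is a single operation, while factoring by trial division already takes over a hundred thousand divisions, and the work explodes for numbers of RSA size. This is just an illustrative sketch, not how real factoring attacks (or Shor’s algorithm) work.

```python
from math import isqrt

p, q = 104_729, 1_299_709        # two known primes; tiny compared with RSA primes
n = p * q                        # multiplication is effectively instantaneous

def trial_division(n):
    """Naive factoring: try every candidate divisor up to sqrt(n)."""
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return d, n // d
    return None                  # n is prime

print(trial_division(n))         # (104729, 1299709) -- feasible here, hopeless at 2048 bits
```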

Today’s RSA encryption, a widely used form of encryption, particularly for sending sensitive data over the internet, is based on 2048-bit numbers. Experts estimate that a quantum computer would need to be as large as 70 million qubits to break that encryption. Considering the largest quantum computer today is IBM’s 53-qubit quantum computer, it could be a long time before we’re breaking that encryption.

As the pace of quantum research continues to accelerate, though, the development of such a computer within the next 3-5 years cannot be discounted. As an example, earlier this year, Google and the KTH Royal Institute of Technology in Sweden reportedly found “a more efficient way for quantum computers to perform the code-breaking calculations, reducing the resources they require by orders of magnitude.” Their work, highlighted in the MIT Technology Review, demonstrated that a 20 million-qubit computer could break a 2048-bit number in a mere 8 hours. That demonstration suggests that continued breakthroughs like this will keep shortening the timeline.

It’s worth noting that perishable sensitive data is not the main concern when it comes to the quantum encryption threat. The greater risk is the vulnerability of information that needs to retain its secrecy well into the future, such as national security-level data, banking data, privacy act data, etc. Those are the secrets that really need to be protected with quantum-proof encryption now, particularly in the face of bad actors who are stealing it while they wait for a quantum computer that can break the encryption.  

Adapting Cybersecurity to Address the Threat
Researchers have been working hard in the last several years to develop “quantum-safe” encryption. The American Scientist reported that the U.S. National Institute of Standards and Technology (NIST) is already evaluating 69 potential new methods for what it calls “post-quantum cryptography (PQC).” 

There are a lot of questions surrounding quantum computing, and scientists continue to work diligently to answer them. When it comes to the impact of quantum computing on cybersecurity, though, one thing is certain: it will pose a threat to cybersecurity and our current forms of encryption. To mitigate that threat we need to change how we keep our data secure and start doing it now. We need to approach the quantum threat as we do other security vulnerabilities: by deploying a defense-in-depth approach, one characterized by multiple layers of quantum-safe protection. Security-forward organizations understand this need for crypto agility and are seeking crypto-diverse solutions like those offered by Quantum Xchange to make their encryption quantum-safe now, and quantum-ready for tomorrow’s threats. 




THE IMPACT OF QUANTUM COMPUTING ON CRYPTOGRAPHY AND DATA

Business leaders thinking about the future of their companies’ data security need only look at the image attached to this article: a key with the potential to open the universe of digital 1s and 0s. The abundant research and development being applied to quantum computing promises to launch a whole new universe of computing security considerations. In this article, Joe Ghalbouni provides insight into what quantum computing is, quantum cryptography (and post-quantum cryptography), and when business leaders need to start treating this as a priority subject.

Quantum computing poses a threat to currently employed digital cryptography protocols
What is quantum computing?
Quantum computing has been at the heart of academic research ever since Richard Feynman first proposed the idea as a way to understand and simulate quantum mechanical systems efficiently. The core idea is to use a quantum mechanical system itself to perform calculations. Because this system obeys the laws of quantum mechanics, it can simulate another quantum system accurately, whereas a classical computer can only approximate it up to a certain error.

Beyond simulation, exploiting purely quantum phenomena such as superposition of states, quantum parallelism and entanglement unlocks a computational potential that allows us, for particular problems, to compute much faster. We are speaking here of orders of magnitude. The quantum bit, most commonly referred to as a qubit, is the analogue of the classical bit for quantum computers. Unlike its classical counterpart, it is not restricted to the states 0 or 1; it can exist in a superposition of both. This is what lets the computing power of a quantum system grow exponentially each time a new qubit is added.

Among the mathematically complex problems a quantum computer promises to solve more efficiently are precisely those on which cryptographic security is based. Quantum computers are therefore a looming threat to currently employed cryptography protocols. But how can we address this risk and choose a mitigation strategy? Let’s start with a review of cryptography.

Reviewing cryptography

Classical cryptography, as it is referred to in the quantum ecosystem, relies on the mathematical complexity of breaking encryption. While there are numerous cryptography protocols, they can be classified into three categories:

Asymmetric cryptography: each user holds what is commonly referred to as a public and a private key. The public key of a user A, as its name suggests, is available to everyone who wants to send an encrypted message to that user. Other users on the network use it to encrypt a message and send it to user A. The private key, as its name suggests, is private to user A and enables them to decrypt the encrypted message. The public key is derived from the private key and, in RSA, is based on a product of large primes. The reverse process of factorising that product to retrieve its primes, and thus deducing the private key from the public key, is considered computationally hard. Notable algorithms include RSA, ECDSA, Diffie-Hellman, etc. (A toy Python sketch of asymmetric encryption and hashing follows this list.)
Symmetric cryptography: each user holds a private key used for both encryption and decryption. For two users A and B to interact securely, a key must be agreed upon before messages are encrypted and exchanged. This key has to be transferred securely to avoid any eavesdropping. Guessing a private key requires trial after trial, so no attack is more efficient than brute force. Notable algorithms include AES, Blowfish, DES, etc.
Hashing functions: an input string of arbitrary length passes through a hashing function and comes out as an output of fixed length. The function is mathematically irreversible, and a given input always produces the same output, which lets us verify a value by comparing its hash against a stored one. Notable algorithms include SHA-2, SHA-3, MD5, etc.
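
A minimal Python sketch of the first and third categories, using textbook RSA with deliberately tiny, insecure parameters and SHA-256 from the standard library (everything below is illustrative only, not production cryptography):

    import hashlib

    # Asymmetric (textbook RSA with toy parameters -- insecure, for illustration only)
    p, q = 61, 53                        # two small primes; real keys use primes of ~1024 bits
    n = p * q                            # public modulus; factoring n reveals p and q
    e = 17                               # public exponent
    d = pow(e, -1, (p - 1) * (q - 1))    # private exponent, derived from the primes

    message = 42
    ciphertext = pow(message, e, n)          # anyone can encrypt with the public key (e, n)
    assert pow(ciphertext, d, n) == message  # only the private key (d, n) decrypts it

    # Hashing: fixed-size, one-way digest of an arbitrary-length input
    digest = hashlib.sha256(b"any input, of any length").hexdigest()
    print("SHA-256:", digest)                # the same input always yields the same 32-byte digest
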
Quantum Computers and the threat to Cryptography and Data
Quantum computers are believed to be more efficient than their classical counterparts at solving specific problems. Notable quantum algorithms such as Shor’s factoring algorithm and Grover’s search algorithm raise serious concerns about current digital cryptography protocols. But how exactly are these protocols affected, and which products are they tied to?

Protocols related to online banking and digital signatures: asymmetric cryptography, for example, becomes very vulnerable. Shor’s algorithm allows for efficient factorisation, finding the primes that form a number. This means that private keys can be easily deduced from public keys, and the encryption is no longer secure. This is the main concern for RSA and ECDSA. Online banking transactions will be at risk, as will digital signatures such as those used in cryptocurrencies to verify transaction ownership. According to recent scientific articles, Shor’s algorithm is expected to efficiently break RSA-2048 and ECDSA-160 on quantum processors of roughly 4096 and 1000 qubits respectively [1-3].
Protocols related to data and servers: symmetric cryptography will see its security level drop. Grover’s algorithm provides a more efficient search than any known classical search algorithm, and can considerably reduce brute-force attempts from roughly N/2 on average to about √N (N being the number of possibilities). Thus in the case of AES-128, where 2^128 key combinations exist, Grover’s algorithm would need about 2^64 operations to find the key instead of roughly 2^127 (2^128/2) classically. In effect, the bit-security level is halved. This endangers any data encrypted and stored on online servers, especially data that remains significant over time [1-3].
Protocols related to blockchain and cryptocurrency: hashing functions are irreversible, so quantum computers will not "decrypt" them; that is simply not feasible. However, collision and birthday attacks, which rely on trying a large set of inputs much like brute force, will become more powerful thanks to Grover’s algorithm. The speed-up in search makes it easier to find an input matching a given hash output. For example, SHA-256 maps to a 256-bit output, implying 2^256 possibilities and classically requiring on average about 2^255 (2^256/2) trials to find a match; Grover’s algorithm reduces this to roughly 2^128 trials [1-3]. While that is still an enormous number, it halves the bit-security level and is something to keep in mind. Hashing functions are used extensively in cryptocurrencies, and any vulnerability that can target blocks in the public ledger needs to be addressed, although in a distributed ledger any modification to a block can be corrected right away by the majority of nodes. (The short calculation after this list spells out these numbers.)
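
A minimal Python check of the Grover numbers quoted above (pure arithmetic, no quantum simulation involved):

    from math import isqrt, log2

    for name, bits in [("AES-128 key search", 128), ("SHA-256 preimage search", 256)]:
        classical = 2 ** (bits - 1)          # ~N/2 guesses on average
        grover = isqrt(2 ** bits)            # ~sqrt(N) quantum queries
        print(f"{name}: classical ~2^{log2(classical):.0f} vs Grover ~2^{log2(grover):.0f}")
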
Quantum Computing Timeline
What about the timeline? How much time do we have to prepare? Universal quantum computers that can run these powerful algorithms on inherently stabilised qubits may still be a couple of decades away, but that is a short window. Becoming quantum-ready requires training personnel at different levels (top management, IT, HR, …) as well as recruiting the right people for the right positions.

The quantum ecosystem is constantly evolving and maturing, and following the latest advances is crucial in order to invest both time and money in the right place at the right moment. Quantum algorithms are also being developed continuously and only await universal quantum computers to be put to the final test. Simulators already let us understand fully how these algorithms work on a few qubits, and scaling an algorithm to a larger number of qubits is, most of the time, fairly straightforward. Once fault-tolerant quantum computers reach the market, those algorithms can be operational right away.

On the practical side, what should businesses do?

Businesses should have quantum experts assess the risk to their data and cybersecurity. The resulting reports will indicate how safe their systems are and for roughly how long. From there, companies will have to put in place a long-term strategy towards one of two solutions, or a mix of both: quantum-proof algorithms or purely quantum protocols.

Quantum-proof algorithms emerged after the National Institute of Standards and Technology (NIST) launched an ongoing competition to select newly designed quantum-safe encryption protocols. At the time of writing, NIST has reached the final phase and recently released an FAQ on Quantum Computing and Cryptography [6].

With quantum algorithms constantly being developed, these quantum-proof algorithms might one day become obsolete. Quantum cryptography, which uses quantum phenomena for intrinsic security and allows us to detect the presence of an eavesdropper, might therefore be a more appropriate and safer solution. Quantum Key Distribution (QKD) provides a provably quantum-secure scheme for private key exchange. Not only does it move beyond the public/private key combination, it lets private keys be exchanged remotely, though this requires an appropriate quantum architecture based on optical fibres and/or satellites.

A hybrid solution could consist of injecting pure quantum randomness instead of classical pseudo-randomness when generating keys for RSA and ECDSA, or when generating passwords. When a whole list of keys or passwords is generated at once from a pseudo-random generator, an attacker who recovers a few elements may be able to deduce the underlying function and reconstruct the entire list. Adding quantum randomness through quantum random number generators (QRNGs) is a good solution, since quantum measurement is intrinsically random.
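
The underlying point can be illustrated classically in Python: the `random` module is a predictable pseudo-random generator and must never produce key material, while `secrets` draws from the operating system's entropy pool; a hardware QRNG would simply be a higher-quality entropy source plugged in at that same point. A minimal sketch:

    import random
    import secrets

    # Predictable: anyone who recovers the Mersenne Twister state can reproduce every "key".
    random.seed(1234)
    weak_key = random.getrandbits(256).to_bytes(32, "big")

    # Unpredictable: drawn from the OS entropy pool (a hardware QRNG would feed in here).
    strong_key = secrets.token_bytes(32)

    print("weak  :", weak_key.hex())
    print("strong:", strong_key.hex())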

IdQuantique is an example of a Swiss company offering QRNG and QKD solutions ready for implementation; similar products are offered by companies such as evolutionQ (Canada) and QuintessenceLabs.

Conclusion
Preparing to face, and benefit from, the second quantum revolution is a challenge. When well prepared, however, businesses can explore many new opportunities and fully embrace this new technology. Whether through correctly assessing every risk associated with quantum computing across the company’s different levels, or through a change in managerial approach, businesses should start addressing the quantum question right away, before it is too late.

Quantum technologies in defence & security
 
Given the potential implications of novel quantum technologies for defence and security, NATO has identified quantum as one of its key emerging and disruptive technologies. This article seeks to unpack some of the fascinating future applications of quantum technologies and their implications for defence and security.

Those who are not shocked when they first come across quantum theory cannot possibly have understood it.

Niels Bohr

If you think you understand quantum mechanics, you don’t understand quantum mechanics.

Richard Feynman

Not only is the Universe stranger than we think, it is stranger than we can think.

Werner Heisenberg

Three quotes from three famous quantum physicists. I guess it is safe to say that there is broad consensus that trying to understand quantum mechanics is not your average Sunday morning brain teaser. However, quantum mechanics is not just mind-boggling and food for vigorous thought. In fact, although we might not be able fully to comprehend it, technologies built upon our understanding of quantum mechanics are already all around us.

Transistors and semiconductors in our computers and communication infrastructures are examples of ‘first generation’ quantum technologies. But the best is still to come. Through a greater understanding of quantum phenomena such as ‘superposition’ and ‘entanglement’ (explained below), the ‘second quantum revolution’ is now taking place, enabling the development of novel and revolutionary quantum technologies.

As these technologies will bring profound new capabilities both for civilian and military purposes, quantum technologies have received significant interest from industry and governments in recent years. Big technology companies like IBM, Google and Microsoft are spending hundreds of millions of dollars on research and development in the area of quantum computing in their race for ‘quantum supremacy’. Similarly, governments have recognised the transformative potential and the geopolitical value of quantum technology applications and the United States, the European Union and China have each set up their own >1 billion dollar research programmes.

Principles underlying quantum technologies


Without going into a detailed explanation of quantum mechanics, a few key underlying principles are worth briefly discussing to help understand the potential applications of quantum technologies.

Quantum technologies exploit physical phenomena at the atomic and sub-atomic scale. Fundamental to quantum mechanics is that at this atomic scale, the world is ‘probabilistic’ as opposed to ‘deterministic’.

This notion of probability was the subject of a world-famous debate between Albert Einstein and Niels Bohr at the fifth Solvay Conference on Physics, held in October 1927 in Brussels. This conference gathered the 29 most notable physicists of the time (17 of them would later become Nobel Prize winners) to discuss the newly formulated quantum theory.

This photograph was taken in Leopold Park in Brussels during the Fifth Solvay Conference on Physics in 1927, and is often referred to as the “most intelligent photograph ever taken”.
Photo credit: Benjamin Couprie, Institut International de Physique de Solvay.

In the so-called “debate of the century” during the 1927 Solvay Conference, Niels Bohr defended the new quantum mechanics theory as formulated by Werner Heisenberg, whereas Albert Einstein tried to uphold the deterministic paradigm of cause and effect. Albert Einstein famously put forward that “God does not play dice”, after which Niels Bohr countered “Einstein, stop telling God what to do.”

Nowadays, the scientific community agrees that Niels Bohr won the debate. This means that our world does not have a fixed script based on cause and effect but is in fact subject to chance. In other words, you can know everything there is to know in the universe and still not know what will happen next.

This new probabilistic paradigm led the way to a better understanding of some key properties of quantum particles which underlie quantum technologies, most notably ‘superposition’ and ‘entanglement’. The improved understanding of these fundamental quantum principles is what has spurred the development of next-generation quantum technologies: quantum sensing, quantum communication and quantum computing.

Present and future applications


While quantum computing has received most of the hype around quantum technologies, a whole world of quantum sensing and quantum communication is out there, which is just as fascinating and promising.

Quantum sensing


Quantum sensors are based on ultra-cold atoms or photons, carefully manipulated using superposition or entanglement in specific ‘quantum states’. By exploiting the fact that quantum states are extremely sensitive to disturbances, quantum sensors are able to measure tiny differences in all kinds of different properties like temperature, acceleration, gravity or time.

Quantum sensing has transformative potential for our measurement and detection technology. Not only does it enable much more accurate and sensitive measurements, it also opens up possibilities to measure things we have never been able to measure before. To name a few, quantum sensors could allow us to find out exactly what lies under our feet through underground mapping; provide early-warning systems for volcanic eruptions; enable autonomous systems to ‘see’ around corners; and provide portable scanners that monitor a person’s brain activity (source: Scientific American).

While quantum technologies might seem to be technologies of the distant future, the first quantum sensors are actually already on the market (for example, atomic clocks and gravimeters). Looking ahead, we can expect more quantum sensing applications becoming available over the course of the coming five to seven years, with quantum Positioning Navigation and Timing (PNT) devices and quantum radar technologies as particular applications to look out for.

Quantum communication


The potential of quantum communication relies on its promise to enable ‘ultra-secure’ data communication, potentially even completely unhackable. Currently, our exchange of data relies on streams of electrical signals representing ‘1s’ and ‘0s’ running through optical fibre cables. A hacker who manages to tap into these cables can read and copy those bits as they travel through the cable. In quantum communication on the other hand, the transmitted information is encoded in a quantum particle in a superposition of ‘1’ and ‘0’, a so-called ‘qubit’. Because of the sensitivity of quantum states to external disturbances, whenever a hacker tries to capture what information is being transmitted, the qubit ‘collapses’ to either a ‘1’ or a ‘0’ – thereby destroying the quantum information and leaving a suspicious trail.

The first application of quantum communication is called ‘Quantum Key Distribution’ (QKD) which uses quantum particles for the exchange of cryptographic keys. In QKD, the actual data is transmitted over traditional communication infrastructure using normal bits, however, the cryptographic keys necessary to decrypt the data are transmitted separately using quantum particles. Extensive experimentation in QKD is already taking place, both using terrestrial communication as well as space-based communication. In 2016, China launched the world’s first quantum science satellite ‘Micius’, which has since then demonstrated intercontinental ground-to-satellite and satellite-to-ground QKD by securing a video conference meeting between Beijing and Vienna (source).
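
To see why eavesdropping shows up statistically, here is a toy classical simulation of the BB84 sifting step in Python (purely illustrative; real QKD manipulates photons, not pseudo-random numbers). With an intercept-and-resend eavesdropper, roughly a quarter of the sifted bits disagree, which is exactly the signature Alice and Bob look for.

    import random

    def bb84_error_rate(n_bits=20_000, eavesdrop=False):
        """Simulate BB84 basis sifting and return the error rate of the sifted key."""
        errors = kept = 0
        for _ in range(n_bits):
            bit, alice_basis = random.randint(0, 1), random.randint(0, 1)
            sent_bit, sent_basis = bit, alice_basis
            if eavesdrop:                       # intercept-and-resend attack
                eve_basis = random.randint(0, 1)
                sent_bit = bit if eve_basis == alice_basis else random.randint(0, 1)
                sent_basis = eve_basis
            bob_basis = random.randint(0, 1)
            measured = sent_bit if bob_basis == sent_basis else random.randint(0, 1)
            if alice_basis == bob_basis:        # keep only the bits where bases match
                kept += 1
                errors += measured != bit
        return errors / kept

    print("error rate without Eve:", bb84_error_rate())                # ~0.0
    print("error rate with Eve   :", bb84_error_rate(eavesdrop=True))  # ~0.25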

‘Quantum teleportation’ would be the next step in quantum communication. Whereas in QKD the cryptographic keys are distributed using quantum technology, with quantum teleportation it is the information itself that is being transmitted using entangled quantum pairs. The greatest distance over which quantum teleportation has been achieved so far over fibre-optic cable is 50 kilometres (source), and the challenge in the coming years is to scale quantum teleportation to enable secure communication over larger distances.

The ultimate goal in quantum communication is to create a ‘quantum internet’: a network of entangled quantum computers connected with ultra-secure quantum communication guaranteed by the fundamental laws of physics. However, a quantum internet not only requires quantum teleportation over very large distances, it would also require the further development of other crucial enabling technologies such as quantum processors, a comprehensive quantum internet stack including internet protocols, and quantum internet software applications. This really is a long-term endeavour and, while it is difficult to determine if and exactly when this technology will mature, most scholars refer to a time horizon of 10-15 years.

Quantum computing


Quantum computing will significantly increase our capacity to solve some of the most complex computational problems. In fact, quantum computing is said to be as different from classical computing, as a classical computer differs from the abacus.

As explained above, whereas classical computers perform calculations using binary digits (0 or 1), quantum computers represent information using quantum bits (qubits) which can be in a superposition of both states (0 and 1 at the same time).
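
Superposition and entanglement are easy to explore in simulation. The short Qiskit sketch below (assuming a recent Qiskit installation; the `Statevector` utility is used instead of real hardware) prepares a two-qubit Bell state: each qubit on its own looks random, yet the two measurement outcomes always agree.

    from qiskit import QuantumCircuit
    from qiskit.quantum_info import Statevector

    qc = QuantumCircuit(2)
    qc.h(0)      # Hadamard puts qubit 0 into an equal superposition of 0 and 1
    qc.cx(0, 1)  # CNOT entangles qubit 1 with qubit 0: the Bell state (|00> + |11>)/sqrt(2)

    state = Statevector.from_instruction(qc)
    print(state.probabilities_dict())        # {'00': 0.5, '11': 0.5} -- '01' and '10' never occur
    print(state.sample_counts(shots=1000))   # simulated measurements: roughly 500 each of '00' and '11'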

As qubits are extremely sensitive to external disturbances, in order to control, manipulate and exploit them, they need to be cooled down extremely close to absolute zero (zero kelvin), to around 15 millikelvin. That is colder than outer space! In fact, the inside of a quantum computer is one of the coldest places in the known universe.

Quantum computer built by IBM: the IBM Q System One (source: Forbes). Want to listen to it? Visit this link to listen to the sounds of a quantum computer’s heartbeat.

Qubits enable quantum computers to make multiple calculations at the same time, potentially resulting in an immense increase in computational efficiency as opposed to classical computers. There are a number of applications where quantum computers will be particularly transformational:

Simulation of physical systems for drug discovery and the design of new materials;

Solving complex optimisation problems in supply chain, logistics and finance;

Combination with artificial intelligence for the acceleration of machine learning;

Factorisation of integers, enabling the decryption of most commonly used cybersecurity protocols (e.g. RSA, an asymmetric encryption algorithm, used for secure data transmission).

Big technology companies like IBM, Google and Microsoft are racing for ‘quantum supremacy’, which is the point where a quantum computer succeeds in solving a problem that no classical computer could solve in any feasible amount of time.

In October 2019, Google claimed to have achieved quantum supremacy on its 53-qubit quantum computer. However, critics say that the problem solved in the Google experiment had no practical value and that therefore the race for quantum supremacy is still on.

Current quantum computers have around 60 qubits but further developments follow each other in rapid succession and ambitions are high. Last September, IBM announced a road map for the development of its quantum computers, including its goal to build a quantum computer with 1000 qubits by 2023 (source). Google has its own plan to build a million-qubit quantum computer by 2029 (source).

With 1000-qubit quantum computers, so-called Noisy Intermediate-Scale Quantum (NISQ) computers, we can already see some valuable practical applications in material design, drug discovery or logistics. The coming five to ten years therefore will be incredibly exciting for quantum computing.

Implications for defence and security


Quantum technologies have the potential to bring profound new capabilities, enabling us to sense the insensible, transforming cybersecurity, and enabling us to solve problems we have never been able to solve before.

In the defence and security environment, two applications will have particularly significant implications in the near- to mid-term.

Firstly, quantum sensing. Quantum sensors have some promising military applications. For example, quantum sensors could be used to detect submarines and stealth aircraft, and quantum sensors could be used for Position, Navigation and Timing (PNT). Such ‘quantum PNT devices’ could be used as reliable inertial navigation systems, which enable navigation without the need for external references such as GPS. This would be a game-changing capability for underwater navigation on submarines, for instance, but also as a back-up navigation system for above-water platforms in case of GPS signal loss.

The first quantum sensors are already commercially available, making it the most mature technology out of sensing, communications and computing. Moreover, for quantum communications and computing, the civilian sector is expected to drive developments forward, given the immense potential value they have for civil industry. However, for quantum sensing, potential applications such as quantum PNT and quantum radar are particularly interesting for the military. Therefore, it is up to the military to fund, support and guide research and development in this area to make these potential applications a reality.

Secondly, the ‘quantum threat’ posed by quantum computing. As mentioned in the previous section, the factorisation of integers is one type of problem that quantum computers can solve particularly efficiently. Most of our digital infrastructure and basically anything we do online – whether that is video conferencing, sending e-mails or accessing our online bank account – is encrypted through cryptographic protocols based on the difficulty of solving these kinds of integer factorisation problems (e.g. the RSA algorithm). While practically usable quantum computers still need to be developed, the quantum algorithm to solve these problems and decrypt our digital communication, i.e. Shor’s algorithm, was invented back in 1994 and is waiting for a quantum computer capable of running it.

To illustrate, the figure below is an example of an integer factorisation problem as used to secure potentially sensitive information.

Example of an integer factorisation problem, which forms the basis of our current cybersecurity systems. (source)

While you might think that any graphic calculator would be able to solve this seemingly simple mathematical problem, in fact, the world’s fastest supercomputer would take the whole lifetime of the universe to solve it. A quantum computer, however, would be able to solve it in a couple of minutes (source).

This is an urgent threat to society writ large but also specifically to the military, given the importance of secure communication and secure information for defence and security. To counter this threat, we will have to completely upgrade all our secure digital infrastructure using cryptography that is ‘quantum-resistant’, i.e. secure against both quantum and classical computers. One option would be to wait for quantum communication (QKD or quantum teleportation) to mature and use this quantum technology to protect against the other quantum technology. However, time is not on our side. Not only could quantum computing technology outpace quantum communication development, the threat is already present. With the prospect of future quantum computers, hackers could steal encrypted information today, store it and decrypt it in 10-15 years using a future quantum computer.

The better option is to implement ‘Post-Quantum Cryptography’ (PQC), new classical (i.e. non-quantum) cryptographic algorithms that even quantum computers will not be able to solve. Currently, the US National Institute of Standards and Technology (NIST) is leading an international competition to select the PQC algorithm(s) to be standardised and adopted across the globe. The process started in 2016 and in July 2020 the NIST announced it had seven final candidates.
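
Teams that want to experiment ahead of the final standards can already do so with open-source implementations of the NIST finalists. The sketch below assumes the liboqs-python bindings (the `oqs` package) and the Kyber key-encapsulation mechanism; the package, the algorithm name and the API are stated here as assumptions for illustration, not a recommendation of any particular library.

    import oqs  # liboqs-python bindings (assumed installed); algorithm names vary between versions

    kem_alg = "Kyber512"  # one of the NIST post-quantum KEM finalists

    with oqs.KeyEncapsulation(kem_alg) as receiver, oqs.KeyEncapsulation(kem_alg) as sender:
        public_key = receiver.generate_keypair()                         # receiver publishes this
        ciphertext, shared_secret_snd = sender.encap_secret(public_key)  # sender derives a shared secret
        shared_secret_rcv = receiver.decap_secret(ciphertext)            # receiver derives the same secret
        assert shared_secret_snd == shared_secret_rcv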

We can expect the NIST to make its final selection for standardisation by early 2022 and establish actual standards by 2024 (source). Decision-makers across industries and within the military should pencil these dates in their diaries, start preparing for a big cybersecurity upgrade and make sure we hit the ground running.

Way ahead


New advances in quantum technology research and development have the potential to bring exciting new capabilities to the military. Given the sizable interest and funding for quantum technologies coming from both civilian industry and governments, it is expected that the technology will mature and that new quantum applications will become available in the coming five to ten years. However, for Allied militaries to be able to actually reap the benefits of these new quantum technologies, it is essential that Allies proactively engage in this field and guide the development and adoption of the military applications of quantum technologies. This should include not just engaging with big technology companies, but specifically also with start-ups, universities and research institutes as these are vital for innovation in these new technologies.

Allied militaries could bring significant added value to existing efforts in industry and academia by providing testing & validation infrastructure (test centres) and access to end-user military operators. Early experimentation with these technologies not only contributes to their further development, but also enables the military to become familiar with these technologies and their capabilities, which helps facilitate future adoption. Moreover, active participation in the quantum ecosystem increases the military’s understanding of the potential risks associated with quantum technologies, specifically within the cyber domain.




Quantum Computing Resistant Encryption for Cyber Security


Post-Quantum Cryptography - Chris Peikert - 3/6/2022


Quantum Computing Resistant Encryption for Cyber Security - Quantum Fourier Transform

Quantum Computing: Random Number Generator & Quantum Safe Digital Certification


NIST Announces First Four Quantum-Resistant Cryptographic Algorithms

Introduction to quantum cryptography - Vadim Makarov


ChatGPT for Developers


 

ChatGPT for Developers and Code Completion

ChatGPT can write code. Now researchers say it's good at fixing bugs, too

A developer's new best friend? ChatGPT is up with the best when it comes to automatically debugging code. But whether it saves developers' time or creates more work remains to be seen.

OpenAI's ChatGPT chatbot can fix software bugs very well, but its key advantage over other methods and AI models is its unique ability for dialogue with humans that allows it to improve the correctness of an answer. 

Elon Musk Opinion On chatGPT

Researchers from Johannes Gutenberg University Mainz and University College London pitted OpenAI's ChatGPT against "standard automated program repair techniques" and two deep-learning approaches to program repairs: CoCoNut, from researchers at the University of Waterloo, Canada; and Codex, OpenAI's GPT-3-based model that underpins GitHub's Copilot paired programming auto code-completion service. 

Also: How to get started using ChatGPT

"We find that ChatGPT's bug fixing performance is competitive to the common deep learning approaches CoCoNut and Codex and notably better than the results reported for the standard program repair approaches," the researchers write in a new arXiv paper, first spotted by New Scientist.

It’s Time to Pay Attention to A.I. (ChatGPT and Beyond)

The best AI chatbots: ChatGPT and other interesting alternatives to try

AI chatbots and writers can help lighten your workload by writing emails and essays and even doing math. They use artificial intelligence to generate text or answer queries based on user input. ChatGPT is one popular example, but there are other noteworthy chatbots.

That ChatGPT can be used to solve coding problems isn't new, but the researchers highlight that its unique capacity for dialogue with humans gives it a potential edge over other approaches and models. 

The researchers tested ChatGPT's performance using the QuixBugs bug-fixing benchmark. The automated program repair (APR) systems appear to be at a disadvantage as they were developed prior to 2018. 

Also: The best AI art generators: DALL-E 2 and alternatives

ChatGPT is based on the transformer architecture, which, as Meta's AI chief Yann LeCun highlighted this week, was developed by Google. Codex, CodeBERT from Microsoft Research, and its predecessor BERT from Google are all based on Google's transformer method.

OpenAI highlights ChatGPT's dialogue capability in examples for debugging code where it can ask for clarifications, and receive hints from a person to arrive at a better answer. It trained the large language models behind ChatGPT (GPT-3 and GPT 3.5) using Reinforcement Learning from Human Feedback (RLHF).   

While ChatGPT's capacity for discussion can help it arrive at a more correct answer, the quality of its suggestions remains unclear, the researchers note. That's why they wanted to evaluate ChatGPT's bug-fixing performance.

The researchers tested ChatGPT on QuixBugs' 40 Python problems and then manually checked whether the suggested solution was correct. They repeated each query four times because there is some randomness in ChatGPT's answers, as a Wharton professor found out after putting the chatbot through an MBA-style exam.
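
A rough sketch of how such an experiment can be scripted against the OpenAI API (assuming the `openai` Python package in its v1+ client style, an `OPENAI_API_KEY` in the environment and a model name such as `gpt-3.5-turbo`; the buggy function is a made-up toy, not one of the QuixBugs tasks):

    from openai import OpenAI  # assumes the openai package (v1+) and OPENAI_API_KEY in the environment

    client = OpenAI()
    model = "gpt-3.5-turbo"

    buggy_code = """
    def is_even(n):
        return n % 2 == 1   # bug: should compare against 0
    """

    messages = [{"role": "user", "content": f"This Python function has a bug. Please fix it:\n{buggy_code}"}]
    first_reply = client.chat.completions.create(model=model, messages=messages)
    print(first_reply.choices[0].message.content)

    # The dialogue advantage: append a hint and ask again within the same conversation.
    messages += [
        {"role": "assistant", "content": first_reply.choices[0].message.content},
        {"role": "user", "content": "Hint: the function should return True exactly when n is divisible by 2."},
    ]
    second_reply = client.chat.completions.create(model=model, messages=messages)
    print(second_reply.choices[0].message.content)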

Also: The developer role is changing radically

How I Coded An Entire Website Using ChatGPT

ChatGPT solved 19 of the 40 Python bugs, putting it on par with CoCoNut (19) and Codex (21). But standard APR methods only solved seven of the issues.   

The researchers found that ChatGPT's success rate with follow-up interactions reached 77.5%. 

The implications for developers in terms of effort and productivity are ambiguous, though. Stack Overflow recently banned ChatGPT-generated answers because they were low quality but plausible sounding. The Wharton professor found that ChatGPT could be a great companion to MBA students as it can play a "smart consultant" -- one who produces elegant but oftentimes wrong answers -- and foster critical thinking.

"This shows that human input can be of much help to an automated APR system, with ChatGPT providing the means to do so," the researchers write.

"Despite its great performance, the question arises whether the mental cost required to verify ChatGPT answers outweighs the advantages that ChatGPT brings."   

Can ChatGPT replace Programmers? What are ChatGPT OpenAI limitations

Let’s see how capable this chat bot is and whether it can replace programmers. In addition, we will discuss its limitations and what type of tasks can be performed by ChatGPT.

Will ChatGPT Replace Software Engineers? (full analysis)

Working of ChatGPT

Let's briefly talk about how ChatGPT works. ChatGPT is a large language model based on GPT-3 and GPT-3.5. This AI tool applies machine learning algorithms to a massive corpus of text to respond to user requests using language that sounds surprisingly human.

According to OpenAI, ChatGPT increases its capability through reinforcement learning that relies on human feedback. The business hires human AI trainers to interact with the model, taking on the roles of both the user and the chatbot. Trainers compare responses provided by ChatGPT to human responses and rate their quality to reinforce human-like conversational approaches.

Can ChatGPT replace programmers?

Artificial intelligence (AI) researchers have been impressed by the skills of AlphaCode, an AI system that can often compete with humans at solving simple computer science problems. Google's sister company DeepMind, an AI lab based in London, released the tool in February and has now published its results in Science, showing that AlphaCode beat about half of human participants in coding competitions.

Social media users have been mesmerized by the ability of another chatbot, called ChatGPT, to produce sometimes meaningful-sounding (and sometimes sublimely ridiculous) mini-essays — including short computer programs — on demand. But these state-of-the-art AIs can only perform relatively limited tasks, and researchers say they are nowhere near being able to replace human programmers.

ChatGPT and AlphaCode are "large language models": neural network-based systems that learn to perform a task by digesting vast amounts of existing human-generated text. In fact, the two systems use "virtually the same architecture," says Zico Kolter, a computer scientist at Carnegie Mellon University in Pittsburgh, Pennsylvania. "And while there are of course minor differences in training and execution, the main difference, if any, is that they are simply trained on different data sets and therefore for different tasks."

This Is Better Than ChatGPT (With Prompting Guide)

While ChatGPT is a general-purpose conversation tool, AlphaCode is more specialized: it was trained exclusively on how people answered questions from software writing competitions. “AlphaCode was designed and trained specifically for competitive programming, not software engineering,” David Choi, a research engineer at DeepMind and co-author of the paper, told Nature in an email.

Limitations of ChatGPT

Wrong answers

ChatGPT is a large-scale language model that is constantly trained to improve response accuracy. However, since this is a completely new technology, the model has not yet undergone sufficient training. Therefore, the AI chatbot may provide wrong answers. Because of this, StackOverflow banned ChatGPT, saying: "Overall, because the average rate of getting correct answers from ChatGPT is too low, posting answers generated by ChatGPT is substantially harmful to our site and to users who ask or search for correct answers."

Training data limitations and bias issues

Like many AI models, ChatGPT has limitations in its training data. Both limitations in the training data and biases in the data can have a negative impact on model output. In fact, ChatGPT has shown bias with respect to minority groups that are under-represented in its training data. It is therefore important to improve the transparency of model data to reduce the bias of this technology.

Sustainability

There is a thread of conversation on Twitter about how many graphics processing units (GPUs) are needed to run ChatGPT. The bottom line is that ChatGPT is very expensive to run. Since ChatGPT is a free product, there are currently a lot of questions about how sustainable the technology is in the long term.

ChatGPT's OpenAI Projects to Reach $1 BILLION Revenue SOONER!

4 ways devs can use ChatGPT to be more productive

Four ways in which ChatGPT can maximize your productivity as a developer, as well as four reasons why it is not a be-all and end-all solution to writing code. Using ChatGPT to its fullest potential while avoiding its pitfalls will allow your productivity to skyrocket!

We’ll cover:

How ChatGPT can help devs write better code

1. Making coding more accessible

2. ChatGPT as a research assistant

3. Reducing tedium with ChatGPT

4. Using ChatGPT for natural language processing

4 areas where ChatGPT falls short

1. Human judgment is still required

2. ChatGPT can’t problem solve

3. ChatGPT doesn’t have multiple perspectives

4. ChatGPT can’t get you hired

Start leveraging AI for yourself

How ChatGPT can help devs write better code

ChatGPT is not the first machine learning tool to serve as a coding assistant.

How ChatGPT Works Technically For Beginners

Autocomplete and text-generation software has been helping us type code and even email faster for several years. We also have GitHub Copilot, which uses a production version of OpenAI’s GPT-3 to suggest improvements and flag potential problems in our code. As a coding assistant, ChatGPT distinguishes itself from Copilot with the ability to formulate detailed responses to conversational prompts instead of basic pre-programmed commands.

Here are four distinct ways using ChatGPT can make your life as a developer simpler.

1. Making coding more accessible

Throughout the history of computer science, we’ve seen technological advancements that have enabled many more people to become developers. Largely thanks to methods of abstraction, it has become easier for more and more people to leverage complex technologies once only understood by highly specialized engineers.

For instance, high-level programming languages, in tandem with compilers and IDEs, allow today’s engineers to write human-readable code without having to write machine code (which is in binary and not human-friendly). Similarly, the improvement of AI assistants like Copilot is a promising sign that we’re still moving toward making coding a more accessible and enjoyable experience for all.

Benefitting from abstraction doesn’t necessarily mean that developers would be any less skilled or knowledgeable. Similarly, not knowing how a car engine works doesn’t make you a bad driver, and using autocomplete doesn’t make you a bad engineer. We can still build beautiful applications while benefiting from high-level languages like Java or machine learning tools like ChatGPT.

How ChatGPT is Trained

2. ChatGPT as a research assistant

ChatGPT has been trained on over 45 terabytes of text data from various sources, including CommonCrawl, WebText2, and code in Python, HTML, JavaScript, and CSS.

ChatGPT generates responses based on this vast training dataset — and conveniently does so in response to human input. The ability to interpret human input can make ChatGPT a helpful research assistant. While its results still need validation, it can provide accurate results that can save us from scouring search engine results or StackOverflow. It can even offer further explanations that will aid coders in learning and understanding new concepts.

This benefit can help us streamline our search for new and relevant knowledge while coding. No developer knows everything, and questions are bound to pop up in your mind every now and then. Hopping over to the OpenAI tab and having ChatGPT answer questions can save a lot of time spent researching. You shouldn’t use ChatGPT to pull all of your information, but this is an excellent method to get an answer in a matter of seconds.

3. Reducing tedium with ChatGPT

ChatGPT will make coding more productive and bug-free. As it accommodates more complex requirements, we can look forward to it helping eliminate grunt work and accelerate productivity and testing.

As assistants such as ChatGPT evolve, many of the tedious tasks that have occupied developers could go away in the next decade, including:

Automating unit tests

Generating test cases based on parameters

Analyzing code to suggest security best practices

Automating QA

Another major benefit is automating the mundane task of creating documentation. ChatGPT can help developers generate documentation for their code, such as API and technical documentation. For example, ChatGPT can analyze your code and extract valuable information such as function and variable names, descriptions, and usage examples. It can then use this information to create detailed reports that are easy to navigate. This automation can save developer teams significant time and effort that would otherwise be dedicated to manually constructing the necessary documentation.
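
A hedged sketch of what such a documentation helper might look like (the `document()` helper, the prompt wording and the sample function are invented for illustration; the same `openai` client assumptions as in the earlier sketch apply):

    import inspect
    from openai import OpenAI

    client = OpenAI()

    def document(func):
        """Ask the model to draft reference documentation for a Python function."""
        prompt = (
            "Write concise API documentation (description, parameters, return value and "
            "one usage example) for the following function:\n\n" + inspect.getsource(func)
        )
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return reply.choices[0].message.content

    def moving_average(values, window):
        return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

    print(document(moving_average))  # always review the draft before committing it, as stressed above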

Other forms of documentation, such as user manuals, release notes, troubleshooting guides, and more, can also be expedited by ChatGPT. Although this chatbot cannot be a replacement for understanding your code, it is a great tool for efficiently maintaining proper documentation so that other teams (and new team members) can easily understand the dev team’s workflow.

Relieving developers of menial tasks can free them to think about more complex issues, optimizations, and higher-level concerns, such as an application’s implications for its users or business. Allowing this new AI chatbot to perform these actions, such as data processing, will open up your work schedule to focus on more critical and creative projects.

Some people think assistant tools make developers lazy. We strongly disagree. Once you’ve learned something, there’s no cognitive or productivity benefit to retyping the same line of code repeatedly. Why reinvent the wheel if the best code for a particular task has already been tried and tested? Besides, the problem you’re solving is probably more complicated than copying/pasting a few code snippets.

This benefit is analogous to how APIs simplified devs’ lives. For instance, the payment processing that Stripe now hides behind a single API once required developers to write 1,000 lines of code.

What Are The Top 10 Limitations Of ChatGPT?

4. Using ChatGPT for Natural Language Processing

Natural language processing (NLP) is a subset of machine learning that uses software to manipulate and produce natural languages, such as the text that appears when you ask ChatGPT a question or the speech you hear from an AI bot like Alexa or Siri. Tasks such as translating between languages, text analysis, speech recognition, and automatic text generation all fall under the umbrella of natural language processing.

Here are a few examples of how ChatGPT can aid developers with natural language processing.

Sentence parsing: ChatGPT can parse natural language inputs and extract the desired information, such as entities and actions. This information can be used to identify the necessary requirements.

Text classification: ChatGPT can classify natural language inputs into predefined categories such as functional requirements, non-functional requirements, or constraints.

Summarization: ChatGPT can summarize natural language inputs into a more concise and actionable form, which can help developers quickly understand the key requirements.

Dialogue-based: ChatGPT can assist in a dialogue-based approach, where developers can ask follow-up questions to gather more clarification on the requirements.

Using natural language processing techniques, ChatGPT can help devs gauge the requirements expressed in natural language. They can then transform this information into actionable requirements to guide development.

It’s important to note that these examples pertain to one natural language processing use case. Your approach will depend on the context and conditions of your project.
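
As one concrete illustration of the text-classification item above, a prompt can constrain the model to a fixed label set (the labels, prompt wording and helper name are invented for illustration; the same `openai` client assumptions apply):

    from openai import OpenAI

    client = OpenAI()

    LABELS = ["functional requirement", "non-functional requirement", "constraint"]

    def classify_requirement(text):
        prompt = (
            f"Classify the following requirement as one of {LABELS}. "
            f"Reply with the label only.\n\nRequirement: {text}"
        )
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return reply.choices[0].message.content.strip()

    print(classify_requirement("The system must answer search queries within 200 ms."))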

4 areas where ChatGPT falls short

ChatGPT is not magic. It looks at a massive corpus of data to generate what it considers the best responses based on existing code.

Accordingly, it definitely has its limitations. Be wary of these limitations while utilizing ChatGPT to your benefit.

Let's build GPT: from scratch, in code, spelled out.

1. Human judgment is still required

ChatGPT is a valuable tool, but it surely doesn’t replace human judgment. Its learning models are based on consuming existing content — some of which contain mistakes and errors.

No matter what code snippet is generated by ChatGPT, you still need to apply your judgment to ensure it’s working for your problem. ChatGPT generates snippets based on code written in the past, so there’s no guarantee that the generated code is suitable for your particular situation. As with any snippet you find on StackOverflow, you still have to ensure your intention was fully understood and that the code snippet is suitable for your program.

Ultimately, we can’t blindly copy/paste code snippets from ChatGPT, and the consequences of doing so could be severe.

2. ChatGPT can’t problem-solve

Problem-solving is an essential skill that developers need to have, which is why machine-learning and text-based tools won’t take over developer jobs anytime soon.

As a developer, your job involves understanding a problem, coming up with several potential solutions, then using a programming language to translate the optimal solution for a computer or compiler. While machine learning tools can help us type code faster, they can’t do problem-solving for us.

While ChatGPT can enable many people to become better, more efficient developers, it’s not capable of building large-scale applications for humans. In the end, we still need human judgment to discern between good and bad code. Even if we’re receiving help when writing code, we’re not running out of big problems to solve.

Relying on ChatGPT to solve your problems with plagiarized code is dangerous. For one thing, inserting code you copied into your applications indiscriminately introduces great security, legal, and ethical risks, even if you’re borrowing from a machine-learning tool. Besides, the tech industry still wants critical thinking developers, and you’re not going to convince anyone you have those attributes by stealing code.

Suffice it to say, plagiarism is definitely not the correct use of ChatGPT.

Microsoft: ChatGPT For Free - Join The Waitlist!

3. ChatGPT doesn’t have multiple perspectives

ChatGPT has a limited perspective. Its suggestions are based on the data it is trained with, which comes with many risks.

For one, if ChatGPT mistakes a highly repeated code snippet as a best practice, it can suggest and perpetuate a vulnerability or inefficiency.

ChatGPT is fully capable of generating incorrect answers, but like any other answer, it will do so with utmost confidence. Unfortunately, no metric helps manage expectations about the potential error in a response. This is a disadvantage against other sources we visit for guidance. Sites like StackOverflow or GitHub are capable of giving us more multidimensional data. We can validate others’ suggestions by looking at their context, responses, upvotes, etc. In this sense, other sources are better equipped to touch on the nuances of real-world problems.

ChatGPT’s limited perspective can make it something of an echo chamber, which can be very problematic. We’ve long known that machine learning algorithms can inherit bias, so AI is vulnerable to adopting harmful biases like racism, sexism, and xenophobia. Despite the guardrails that OpenAI has implemented, ChatGPT is also capable of inheriting bias. (If you’re interested, reporter Davey Alba discussed ChatGPT’s susceptibility to bias on Bloomberg.)

All in all, we have to take every ChatGPT response with a massive grain of salt — and sometimes, it might be easier just to write your code from scratch than to work backward to validate a generated code snippet.

Why OpenAI’s ChatGPT Is Such A Big Deal

4. ChatGPT can’t get you hired

Though it can generate a code snippet, ChatGPT is not the end of coding interviews. Besides, the majority of the coding interview consists of problem-solving — not writing code. Writing code will only take about 5-10 minutes of a 45-minute coding interview. If you’re curious about how to prepare for interviews efficiently, check out this full breakdown of the Amazon coding interview.

The rest of the coding interview requires you to give other hireable signals. You still need to ensure that you’re asking the right questions to articulate and understand your problem and narrating your thought process to demonstrate how you narrow your solution space. ChatGPT can’t help you with any of this. However, these critical thinking and problem-solving skills carry just as much weight in your hireability as your coding competency.

Don’t rely on ChatGPT too heavily. Instead, rely on your knowledge and skills to get the job!

Start leveraging AI for yourself

Generating Music with ChatGPT and MuseScore 4

Machine learning tools help us perform tasks more efficiently, but they don’t replace our need to think. Sometimes they’re right, and other times they’re incredibly (and hilariously) wrong. ChatGPT will help you immensely in your daily life, but it stops well short of doing your job for you.

While assistants like Siri and Alexa can help us with basic tasks, they can’t help us with complex efforts like making substantial life changes. Similarly, ChatGPT can’t help us with complex problems, nor can it replace innovation. But these technologies help alleviate menial tasks that distract us from tackling more ambitious issues (such as improving AI technologies).

As a developer, you shouldn’t stop investing in your learning or long-term coding career. If anything, be open to incorporating these tools into your life in the future. If you are interested, you can learn how to leverage AI for yourself.

With these resources, you’ll get up to speed quickly through hands-on work with ML techniques and tools, including deep learning, NLP, GPT-3, and integrating the OpenAI API.

Conclusion

The real question by the end of this topic is: can AI write blogs and replace Google or human interaction? The answer, for now, is clearly no, because much still depends on human intelligence working alongside this trending technology. Although the technology is impressive, the rate of bugs and inaccurate answers in ChatGPT (Generative Pre-trained Transformer) is still high. Google, on the other hand, works with a different mechanism whose metrics are based on websites. Still, the amount of attention this technology has received is remarkable, and we can hope for even brighter days ahead.

Frequently Asked Questions (FAQ)

1. Who built ChatGPT?

Ans: ChatGPT is an AI-based model launched by OpenAI in November 2022 that automates chatbot technology; it can be seen as an advanced version of traditional support chat systems, and it became a sensation overnight. OpenAI itself was founded back in 2015 by Elon Musk, Ilya Sutskever, Wojciech Zaremba, Greg Brockman and Sam Altman, among others. The company’s current CEO is Sam Altman.

2. Is ChatGPT free to use?

Ans: ChatGPT is still in a development phase and the team is continuously working to make it smoother and more interactive, so it is currently free to use and open to all. However, the company has not disclosed its future plans regarding usage or pricing.

3. Will ChatGPT replace Google?

Ans: The answer to this is "no". ChatGPT is an AI model trained to generate output based on human input, and it sometimes fails to give relevant answers. Google, on the other hand, is an internationally established brand that serves as the default search engine for a large share of the world's users, and it offers many other services that are more precise, broad and accurate. So ChatGPT is unlikely to replace Google completely.

4. Who owns Open AI?

Ans: OpenAI is an AI research lab composed of two different entities:

OpenAI LP

OpenAI INC

The organization was founded by Elon Musk, Ilya Sutskever, Wojciech Zaremba, Greg Brockman, and Sam Altman.

5. Will ChatGPT take jobs?

Ans: The concept behind ChatGPT is so compelling that it gained popularity in almost no time. However, it is still at an early stage, and a great deal of work is required before it can reliably generate quality output. So the answer is NO: it is not going to take jobs in the coming years.

6. What are ChatGPT Alternatives?

Coding Made Easy with ChatGPT - AI Webinar - Feb 22, 2023

Ans: Since its launch, many companies have come out with their own tools for AI-based content generation, giving users a growing range of alternatives to choose from.

ADVANTAGES OF CHATGPT:

Generates human-like text: ChatGPT is trained on a large corpus of text and can generate text that is difficult to distinguish from text written by a human.

Can be fine-tuned: ChatGPT can be fine-tuned on specific tasks such as question answering, dialogue systems, and text summarization.

Can handle a wide range of topics: ChatGPT has been trained on a diverse set of text, which means it has the ability to handle a wide range of topics.

Can generate long-form text: ChatGPT is capable of generating long pieces of text, such as articles or essays, which can be useful for content creation.

Can be integrated with other systems: ChatGPT can be integrated with other systems such as chatbots, virtual assistants, and language translation systems.

DISADVANTAGES OF CHATGPT:

Lack of context awareness: ChatGPT does not have the ability to understand the context of the conversation or the user’s intent, which can lead to irrelevant or nonsensical responses.

Can generate biased or offensive content: Since ChatGPT is trained on a dataset that reflects the biases of its source data, it can generate biased or offensive content if not properly monitored.

Lack of common-sense knowledge: ChatGPT does not possess common-sense knowledge, so it may struggle to understand and respond to certain types of questions or requests that involve common-sense reasoning.

Over-reliance on training data: ChatGPT’s performance is directly tied to the quality and quantity of the training data.

Can be computationally expensive: Running ChatGPT requires a significant amount of computational resources, which can be costly and impractical for some applications.

BENEFITS OF CHATGPT:

Automation of tasks: ChatGPT can automate tasks such as text generation, content creation, and customer service, which can save time and increase efficiency.

Improved customer service: ChatGPT can be integrated into customer service systems to provide quick and accurate responses to customers, improving the overall customer experience.

Increased productivity: ChatGPT can assist with writing, research, and data analysis, which can increase productivity in various industries.

Personalization: ChatGPT can be fine-tuned to understand the specific language and style of a given industry or company, providing a more personalized experience for customers.

Creative applications: ChatGPT can be used in creative applications such as writing fiction, poetry, or song lyrics, which can be beneficial for artists, writers and other creatives.

More Information:

https://openai.com/blog/chatgpt/

https://www.geeksforgeeks.org/what-is-chatgpt/

https://levelup.gitconnected.com/how-to-use-chatgpt-for-developers-4e7f354bbc02

https://medium.com/@tanyamarleytsui/coding-with-chatgpt-b50ab3fcb45f

https://platform.openai.com/docs/introduction/overview

https://www.scientificamerican.com/article/ai-platforms-like-chatgpt-are-easy-to-use-but-also-potentially-dangerous/

https://www.projectpro.io/article/chatgpt-application-examples/713


IBM Osprey Fastest Quantum Chip in the World


 


IBM Unveils 433-Qubit Osprey Chip 

Next year, in 2023, entanglement hits the kilo-scale with Big Blue’s 1,121-qubit Condor

At the end of 2022, IBM announced Osprey, a 400+ qubit quantum processor. IBM aims to achieve quantum systems with 4,000+ qubits by 2025, unlocking supercomputing capabilities and tackling increasingly complex computational problems.

IBM Osprey has the largest qubit count of any IBM quantum processor, more than tripling the 127 qubits on the IBM Eagle processor unveiled in 2021. This processor has the potential to run complex quantum computations well beyond the computational capability of any classical computer.

IBM has built the largest quantum computer yet. Dubbed Osprey, it has 433 qubits, or quantum bits, which is more than triple the size of the company's previously record-breaking 127-qubit computer and more than eight times larger than Google's 53-qubit computer Sycamore.

Exclusive—IBM Shares Details of Its 400+ Qubit Quantum Processor

At the IBM Quantum Summit in Nov. 2022, IBM announced Osprey, a 400+ qubit quantum processor. IBM aims to achieve quantum systems with 4,000+ qubits by 2025, unlocking supercomputing capabilities and tackling increasingly complex computational problems.

We spoke with Oliver Dial, physicist and chief quantum hardware architect at IBM, involved in developing the new 400+ qubit quantum processor. 

IBM Osprey - The World's Most Powerful Quantum Computer 

The 433-qubit IBM Osprey chip

The 433-qubit IBM Osprey chip. Image courtesy of Ryan Lavine/IBM



Why This Breakthrough Quantum Computer From IBM Will Change Computing Forever

Dial has significant experience developing high-frequency electronics, cryogenic systems, and semiconductor spin qubits. At IBM, he specializes in superconducting qubits, researching their underlying physics and collecting system-level metrics.

A Quantum Processor with 400+ Qubits

IBM’s new quantum processor contains 433 qubits known as transmons, which are essentially superconducting resonators that can store 0 or 1 microwave photons. These qubits can be manipulated by applying microwave pulses of different frequencies to them from outside the processor.  

“Our qubits are connected to each other with busses. Different qubits directly connected by busses have different frequencies, so we can control them independently,” Dial explained. “While transmons are a common qubit type, we use fixed-frequency transmons—meaning the frequency of microwaves we use to control them is determined when we make the device. We can’t tweak it during testing. This gives our devices great coherence times but puts a lot of emphasis on fabricating things accurately, so we can meet that frequency requirement.”

The researchers' device is supported by passive microwave circuitry, which does not deliberately absorb or emit microwave signals but redirects them. Examples of on-chip passive circuitry include microwave resonators that measure the state of the qubits, filters that protect the qubits from decaying out of a drive line, and transmission lines (in other words, wires) that deliver microwave signals to the qubits and to and from the readouts.

Presentation of the 433-qubit IBM Osprey chip

Dario Gil (IBM senior VP and director of research), Jay Gambetta (IBM fellow and VP of quantum computing), and Jerry Chow (IBM fellow and director of quantum infrastructure) presenting the 433-qubit IBM Osprey chip. Image courtesy of Ryan Lavine/IBM

“We build all this circuitry on chip with the qubits, using much of the same techniques as what’s called back-end-of-line wiring in traditional CMOS processes,” Dial said. “However, all these techniques must be modified to use superconducting metals.”

These multi-layered devices place qubits on a single chip, which is connected to a second chip known as the interposer through superconducting bonds. The interposer has readout resonators on its surface and multi-level wiring buried inside it, which delivers signals into and out of the devices.

IBM delivers its 433-qubit Osprey quantum processor

IBM delivers its 433-qubit Osprey quantum processor. It has the largest qubit count of any IBM quantum processor, more than tripling the 127 qubits on the IBM Eagle processor unveiled in 2021. Image courtesy of Connie Zhou/IBM

This unique design creates a clear separation between qubits, readout resonators, and other circuitry, reducing the microwave loss to which the qubits are very sensitive. Ultimately, this is what allowed the researchers to pack so many qubits onto a single chip while maintaining good coherence.

“We developed this general structure in Eagle, the 127-qubit processor that we built last year,” Dial said. “Eagle was the first integration of all these technologies, while Osprey proves that we can use them to make processors larger than anything we’ve made before. A lot of what’s new on Osprey isn’t what’s on the chip itself—which is a refinement of Eagle—but what surrounds it.”

IBM Quantum System Two

A More Sophisticated Design

IBM's new quantum processor operates at very low temperatures of approximately 0.02 kelvin (20 millikelvin). The team thus had to find a strategy to deliver hundreds of microwave signals into this low-temperature environment, given the limited cooling power of the processor's refrigerator (about 100 µW).

“The cables that deliver microwave signals to our processor are a particular problem, as most things that conduct electricity well also conduct heat and thus compromise the insulation of our refrigerator,” Dial explained. “To tackle this problem, our Eagle processor used over 600 cables going between different stages of the fridge, each assembled, wired, and tested by hand. In Osprey, we replace most of these cables with flexible ribbon cables created using standard printed circuit board techniques. Each one of these cables replaces many individual cables, connectors, and components—simplifying our design and thus increasing the processor’s reliability.”

The Osprey processor is supported by a new generation of control electronics, instruments outside of the refrigerator that create an interface between classical and quantum computing tools. These tools, which build on IBM’s previous work, generate microwave control signals for the new chip and interpret signals that come back.

IBM’s new processor

IBM’s new processor has the potential to run complex quantum circuits beyond what any classical computer would ever be capable of. For reference, the number of classical bits that would be necessary to represent a state on the IBM Osprey processor exceeds the total number of atoms in the known universe. Image courtesy of Connie Zhou/IBM

“We achieved a new and simpler design for generating the analog signals based on direct digital synthesis and water cooled to increase the density of the electronics—letting us reach a whopping 400 qubits of control per rack,” Dial said.

The Osprey processor is based on a platform that was refined over several years, with technologies that IBM already tested and implemented on its Falcon, Hummingbird, and Eagle processors. The primary advancements from these previous processors are the wiring and control systems outside of the chip, as well as the scaled-up software stack.

“We’re also incorporating some learning into how we tune the device (i.e., its gate times, powers, etc.), which we think will make large sections of the device have much better average fidelities than what we’ve typically managed in the past,” Dial said. “We think this will make it an ideal platform for studying error mitigation—running multiple copies of a circuit with slight variations to generate more accurate expectation values.” 

IBM Quantum State of the Union 2022

Approaching the Quantum-centric Supercomputing Era

The new processor created by Dial and his colleagues is another step toward the era of quantum-centric supercomputing (i.e., when quantum computers can solve arbitrarily-scaled problems).

“When we build a classical supercomputer, we don’t build a single fast processor, but we harness many processors working in parallel, which creates flexibility to solve one large problem, or many small problems at the same time,” Dial explained. “Similarly, we want to work toward a quantum architecture that can scale up and down, solving the parts of our users’ problems that are best solved on a quantum computer with a quantum computer, and solving the parts of their problems that are best solved on a classical with a classical computer.”

To allow users to harness the strengths of both quantum and classical computing technologies, IBM is working on a range of middleware and software tools that enable better communication between these different types of computing systems.  

“We use the example of circuit knitting a lot when explaining this idea,” Dial said. “Our goal here is to take a single quantum circuit that is too large to run on a single quantum processor and break it up into smaller pieces that can be run on multiple processors. If all we have is classical communication between processors, we can do this, but the overhead (number of extra times we need to run the circuit) is large. If we expand that classical communication to include real-time classical communication (the ability to measure a qubit on one processor, turn it into classical data, move it to another processor, and change what we do on that second quantum processor all within a few microseconds), new advanced knitting options become possible. This richer communication allows better scaling, but now the computers need to be close enough to make this high-speed communication possible—distances of meters, not miles.”

Dial and his colleagues are now working on a new technology known as I-couplers, set to be unveiled by 2024, which could make the overhead vanish entirely. I-couplers are microwave links between quantum processors that can be cooled down to the devices’ milli-Kelvin temperatures so that they can be literally frozen into a system when the processor is cooled down.

“The final, very long-term project we’re working on in this area is called transduction: moving quantum information with optical photons instead of microwaves,” Dial added. “This would allow us to make reconfigurable quantum networks, but it’s a much more difficult technology to master. Nobody has fully demonstrated this in our systems.”


Other Advances and Future Outlooks

At the IBM Quantum Summit 2022, IBM also unveiled the Quantum System Two update, a platform that supports the operation of larger processors and the diverse types of communication that would characterize a quantum-centric supercomputer. Combined with its new processor and other tools, this platform paves the way for yet another year of exciting quantum technology advancements.

“There are things we are continually working to improve: our qubit coherence times, our gate fidelities, the density and crosstalk of our devices,” Dial said. “For the next year or two, we will also focus on two big hardware-centric projects. One involves various types of communication between quantum processors: real-time classical, chip-to-chip quantum gates (quantum multi-chip-modules), and long-range quantum communication—the basic ingredients for the quantum-centric supercomputer. The other is the introduction of cryo-CMOS control to our production systems.”

Currently, IBM’s control hardware is based on field-programmable gate arrays (FPGAs), which increases its cost and limits attainable qubit densities. The team hopes that moving to CMOS-based control components integrated into the refrigerator will simplify wiring and signal delivery problems in quantum computers, bringing them closer to their goal of developing a system with a few thousand qubits.

“As we talk about tens of thousands of qubits, error correction becomes more important,” Dial noted. “We believe we can get more efficient error correcting codes, but this will require more complicated connections between our qubits than those we have today. Right now, our heavy-hex devices (and most devices people make) have 2D arrays of qubits. Each qubit is connected to other nearby qubits on the surface of the chip in some repeating pattern. We are beginning investigations into creating connections between distant qubits on the chip and crossovers between those connections, which could pave the way toward machines that can implement efficient fault tolerant codes.”

IBM Unveils 400 Qubit-Plus Quantum Processor and Next-Generation IBM Quantum System Two

Company Outlines Path Towards Quantum-Centric Supercomputing with New Hardware, Software, and System Breakthrough

IBM (NYSE: IBM) kicked off the IBM Quantum Summit 2022, announcing new breakthrough advancements in quantum hardware and software and outlining its pioneering vision for quantum-centric supercomputing. The annual IBM Quantum Summit showcases the company's broad quantum ecosystem of clients, partners and developers and their continued progress to bring useful quantum computing to the world.

"The new 433 qubit 'Osprey' processor brings us a step closer to the point where quantum computers will be used to tackle previously unsolvable problems," said Dr. Darío Gil, Senior Vice President, IBM and Director of Research. "We are continuously scaling up and advancing our quantum technology across hardware, software and classical integration to meet the biggest challenges of our time, in conjunction with our partners and clients worldwide. This work will prove foundational for the coming era of quantum-centric supercomputing."

At the Summit, the company unveiled the following new developments:

‘IBM Osprey’ - IBM’s new 433-quantum bit (qubit) processor

IBM Osprey has the largest qubit count of any IBM quantum processor, more than tripling the 127 qubits on the IBM Eagle processor unveiled in 2021. This processor has the potential to run complex quantum computations well beyond the computational capability of any classical computer. For reference, the number of classical bits that would be necessary to represent a state on the IBM Osprey processor far exceeds the total number of atoms in the known universe. For more about how IBM continues to improve the scale, quality, and speed of its quantum systems, read Quantum-Centric Supercomputing: Bringing the Next Wave of Computing to Life.

The Next Wave - IBM Quantum Summit 2022 Keynote

New quantum software addresses error correction and mitigation

Addressing noise in quantum computers continues to be an important factor in the adoption of this technology. To simplify this, IBM released a beta update to Qiskit Runtime that now lets a user trade speed for a reduced error count with a simple option in the API. By abstracting the complexity of these features into the software layer, it becomes easier for users to incorporate quantum computing into their workflows and to speed up the development of quantum applications. For more details read Introducing new Qiskit Runtime capabilities — and how our clients are integrating them into their use cases.
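As a rough illustration of what "a simple option in the API" means, the sketch below uses the qiskit-ibm-runtime package of that era, where an Options object exposes a resilience_level for error mitigation and an optimization_level for transpilation. The backend name, observable, and settings shown here are assumptions for illustration, not IBM's reference example.

from qiskit import QuantumCircuit
from qiskit.quantum_info import SparsePauliOp
from qiskit_ibm_runtime import QiskitRuntimeService, Session, Estimator, Options

service = QiskitRuntimeService()                 # assumes previously saved IBM Quantum credentials

# A small Bell-state circuit and a two-qubit observable to estimate
circuit = QuantumCircuit(2)
circuit.h(0)
circuit.cx(0, 1)
observable = SparsePauliOp("ZZ")

options = Options()
options.resilience_level = 1                     # more error mitigation, at the cost of speed
options.optimization_level = 3                   # heavier circuit optimization before running

with Session(service=service, backend="ibmq_qasm_simulator") as session:
    estimator = Estimator(session=session, options=options)
    job = estimator.run(circuits=circuit, observables=observable)
    print(job.result().values)                   # error-mitigated expectation value(s)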

IBM Quantum System Two update – IBM’s next-generation quantum system

As IBM Quantum systems scale up towards the stated goal of 4,000+ qubits by 2025 and beyond, they will go beyond the current capabilities of existing physical electronics. IBM updated the details of the new IBM Quantum System Two, a system designed to be modular and flexible, combining multiple processors into a single system with communication links. This system is targeted to be online by the end of 2023 and will be a building block of quantum-centric supercomputing — the next wave in quantum computing which scales by employing a modular architecture and quantum communication to increase its computational capacity, and which employs hybrid cloud middleware to seamlessly integrate quantum and classical workflows.

New IBM Quantum Safe technology: 

As quantum computers grow more powerful, it is crucial that technology providers take steps to protect their systems and data against a potential future quantum computer capable of decrypting today's security standards. From offering the z16 system with quantum safe technology, to contributing algorithms in connection with the National Institute of Standards and Technology's (NIST) goal for standardization by 2024, IBM offers technology and services with these security capabilities. At the Summit, IBM and Vodafone announced a collaboration to explore how to apply IBM's quantum-safe cryptography across Vodafone's technology infrastructure. 

Client & Ecosystem Expansion: Growth of IBM Quantum Network: IBM also announced today that German conglomerate Bosch has joined the IBM Quantum Network to explore a variety of quantum use cases. Other recent additions to the network include multinational telco Vodafone to explore quantum computing and quantum-safe cryptography, French bank Crédit Mutuel Alliance Fédérale to explore use cases in financial services, and Swiss innovation campus uptownBasel to boost skill development and promote leading innovation projects on quantum and high-performance computing technology. These organizations are joining more than 200 organizations — and more than 450,000 users — with access to the world's largest fleet of more than 20 quantum computers accessible over the cloud.

"The IBM Quantum Summit 2022 marks a pivotal moment in the evolution of the global quantum computing sector, as we advance along our quantum roadmap. As we continue to increase the scale of quantum systems and make them simpler to use, we will continue to see adoption and growth of the quantum industry," said Jay Gambetta, IBM Fellow and VP of IBM Quantum. "Our breakthroughs define the next wave in quantum, which we call quantum-centric supercomputing, where modularity, communication, and middleware will contribute to enhanced scaling computation capacity, and integration of quantum and classical workflows."

Statements regarding IBM's future direction and intent are subject to change or withdrawal without notice and represent goals and objectives only.

Quantum Computing: Now Widely Available!

IBM is a leading global hybrid cloud and AI, and business services provider, helping clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain the competitive edge in their industries. Nearly 3,800 government and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently, and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and business services deliver open and flexible options to our clients. All of this is backed by IBM's legendary commitment to trust, transparency, responsibility, inclusivity, and service. For more information, visit https://www.ibm.com/quantum.  

In 2021, IBM unveiled Eagle, the first quantum processor with more than 100 qubits. Now the company has debuted Osprey, which possesses more than three times as many qubits. The advances IBM made to triple the number of qubits on a chip in just one year suggest Big Blue is on track to deliver Condor, the world’s first universal quantum computer with more than 1,000 qubits, in 2023, the company says.

Quantum computers can theoretically find answers to problems that classical computers would take eons to solve. The more components known as qubits are quantum-mechanically linked or entangled together in a quantum computer, the more computations it can perform, in an exponential fashion.

The qubit numbers of IBM’s quantum computers have steadily grown over time. In 2016, the company put the first quantum computer on the cloud for anyone to experiment with—a device with 5 qubits, each a superconducting circuit cooled to near-absolute-zero temperatures of roughly 20 millikelvin (about -273 degrees C). In 2019, the company debuted the 27-qubit Falcon; in 2020, the 65-qubit Hummingbird; and in 2021, the 127-qubit Eagle.

“Each one of the improvements we used led to a little improvement in speed, but once we lost all these bottlenecks, we saw a major improvement in speed.”

—Oliver Dial, IBM

Next year, IBM aims to launch its 1,121-qubit Condor processor, which stands poised to become (barring any surprises in the meantime) the world’s largest general-purpose quantum processor. The new Osprey chip, unveiled at IBM’s Quantum Computing Summit on 9 November, reveals key steps the company is taking in order to scale up to this ambitious goal.

One strategy that began with Eagle and continues with Osprey is to separate the wires and other components needed for readout and control onto their own layers. This multi-level wiring helps protect infamously fragile qubits from disruption, helping the processor incorporate larger numbers of them.

“We probably didn’t need all that technology to deploy a 100-qubit device, but doing all that helped set up Osprey and Condor,” says Oliver Dial, IBM Quantum’s chief hardware architect. “We now have the technology in hand to go way beyond 100 qubits.”

Osprey possesses two major advantages over Eagle outside the chip, Dial notes. One is replacing the “quantum chandelier” of microwave cables IBM used with its previous quantum processors with flexible ribbon cables, the kind you might find that carries signals between, for instance, the motherboard and screen if you open up a cellphone or a laptop, he says.

IBM’s 433-qubit Osprey quantum processor more than triples the 127 qubits on the IBM Eagle processor unveiled in 2021. The accompanying chart shows IBM Falcon (27 qubits), IBM Hummingbird (65 qubits), IBM Eagle (127 qubits), and IBM Osprey (433 qubits). Image courtesy of Connie Zhou/IBM

“All these microwave cables, which get microwave signals in and out of the refrigerator where the qubits are stored, are not very scalable,” Dial says.

Osprey’s flexible ribbon cables are adapted to cryogenic environments. The electrical and thermal resistance of the cables are tailored to help microwave signals flow while not conducting too much heat that might interfere with the qubits. This led to a 77 percent increase in the number of connections leading to the chip—“basically, almost twice as many wires”—which will help IBM scale up its quantum computers, Dial says.

The other major advantage seen with Osprey is a new generation of the control electronics that send and receive microwave signals to and from the quantum processor. Whereas Dial says IBM’s first phase of control electronics (2019-’21) enjoyed a greater flexibility, Osprey’s control electronics “are more specialized, more tailored to quantum devices, to produce the exact signals we need, the frequencies we need, the power we need,” Dial says.

“We plan Osprey to get on the cloud the middle of next year.”

—Oliver Dial, IBM

Osprey: The World's Largest Quantum Computer

These improvements “have reduced cost, which is an important consideration as we scale up,” Dial says. “With our first generation of five and 20 qubit devices, we needed an entire rack of control electronics, and with Eagle we saw 40 qubits per rack. Now we can control more than 400 qubits with one rack of equipment.” He adds that qubit density has also increased with Osprey.

IBM’s new control electronics include a cryo-CMOS prototype controller chip implemented using 14-nanometer FinFET technology that runs at roughly 4 Kelvin (-269.15 degrees C). (According to IBM, it is expected to be implemented into future generations of their quantum computer control electronics.) The prototype chip uses an application-specific integrated circuit (ASIC) design that is less bulky and power-hungry than previous field-programmable gate array (FPGA) approaches. “Instead of about 100 watts per qubit like we needed before, we only need about 10 milliwatts, so we can fit far more qubits onto a chip,” Dial says.

In addition, recent advances in microwave signal generation and reception from the telecommunications and defense industries “means direct digital synthesis is finally affordable,” Dial says. “Instead of generating signals at a few hundred megahertz and mixing to 5 gigahertz, you can now directly generate at 5 gigahertz, which reduces the number of components and increases simplicity.”

These hardware improvements—along with other factors, such as better methods to handle quantum computing workloads, and faster device drivers—have led to a major boost in speed. Based on an IBM quantum computing speed metric known as circuit layer operations per second (CLOPS), the company has gone from 1,400 to 15,000 CLOPS with its best systems, Dial says. (Quantum programmers run quantum algorithms on quantum computers that are made up of quantum circuits, which describe sequences of elementary operations, called quantum gates, that are applied on a set of qubits. CLOPS is a measure of the speed at which a quantum computer runs quantum circuits.)

“Quantum chandelier” no more: IBM says its Osprey processor introduces high-density control signal delivery with flex wiring, pictured here. Image courtesy of Connie Zhou/IBM

“Each one of the improvements we used led to a little improvement in speed, but once we lost all these bottlenecks, we saw a major improvement in speed,” Dial says.

Before Osprey becomes widely available for use, IBM is spending extra time setting up its control electronics and calibrating the system. “We plan Osprey to get on the cloud the middle of next year,” Dial says.

IBM is also preparing to include optional error mitigation techniques within the cloud software for its quantum computers that can essentially trade speed for more accurate results. “Instead of pushing complexity onto the user, we’re building these capabilities in the back end to take care of these details,” Dial says. “By the end of 2024, we expect that error mitigation with multiple Heron chips running in parallel in our ‘100 by 100 initiative’ can lead to systems of 100 qubits wide by 100 gates deep, enabling capabilities way past those of classical computers.”

IBM also announced that it was partnering with communications technology company Vodafone to develop post-quantum cryptography that can defend against future quantum computers that could rapidly break modern cryptography. “We’re working on crypto-agility, the ability to move between cryptographic schemes to acknowledge how cryptography constantly changes and advances,” Dial says.

More Information:

https://newsroom.ibm.com/2022-11-09-IBM-Unveils-400-Qubit-Plus-Quantum-Processor-and-Next-Generation-IBM-Quantum-System-Two

https://www.ibm.com/quantum/summit

https://spectrum.ieee.org/ibm-quantum-computer-osprey

https://www.popsci.com/technology/ibm-quantum-summit-osprey/

https://www.siliconrepublic.com/machines/ibm-osprey-quantum-processor-computing-summit

https://www.allaboutcircuits.com/news/exclusive-ibm-shares-details-of-its-400-plus-qubit-quantum-processor/

https://www.hpcwire.com/2022/12/01/ibm-quantum-summit-osprey-flies-error-handling-progress-quantum-centric-supercomputing/









Neuralink: Merging Man and Machine


 Merging Man and Machine


NEURALINK LET’S BECOME CYBORGS

Many experts believe that even in the early stages of true AI, humans will be left behind by AI at specific tasks. I personally believe that AI will surpass us at some point, which poses a threat to humanity.

Neuralink is Elon Musk‘s answer to the threat artificial intelligence poses to the human race. In many ways, we are like cyborgs anyway because our smartphone is like an extension of us, like a limb. The problem is the communication bandwidth to the phone is too slow.

It’s a speed issue. We are getting the information to our brains too slowly and not sending it out fast enough. You need something artificial in your brain to help with this process. Neuralink connects human brains to computers through a chip smaller than a penny. You insert it by drilling a small hole in the skull, although in the future this could be as simple as a LASIK procedure.

A robot implants the device to ensure extreme precision. The chip is connected to the brain through roughly 1,000 tiny wires, each about ten times thinner than a human hair. The company says these wires are flexible enough not to damage brain tissue. This is key because other BMIs, such as the deep brain stimulation used to treat Parkinson’s, insert much larger wires that carry greater risks, including strokes.

Introducing Neuralink

So How Does The Neuralink Device Work, How Does It Enhance Your Brain?

Your brain is made up of neurons. 100 billion neurons. The neurons come together to form a large network through synapses. At these connection points, neurons talk to each other using chemical signals called neurotransmitters.

Everything you see, hear, taste, smell, feel – these are just neurons firing or what’s called an action potential. This is where the Neuralink chip comes in. The wires on the chip carry electrodes. The electrodes record the signals produced by the action potential and send them to an external device worn behind the ear.

An app controls the device, so the electrodes essentially read what is happening in the brain and repair the broken circuits. The electrodes stimulate as many neurons as possible, orders of magnitude more than what’s ever been done before, to increase the capabilities of our brain.

This will be life-changing for people with brain disorders. Neuralink says it will help people who are paralyzed to move, allow those who are blind to see, treat Alzheimer’s, and cure epilepsy.

When Elon appeared on a podcast, he said “Neuralink could in principle fix anything that is wrong with the brain and that the device could be implanted into someone as part of a trial within a year. It’s already been tested on monkeys. A monkey has been able to control the computer with his brain.”

Neuralink is building on the shoulders of other implant technologies. Matthew Nagle, who was paralyzed after being stabbed in the neck, was the first person to use a brain-machine interface in 2006 that allowed him to control a computer cursor. Even though Neuralink is focused on treating brain injuries, for now, that is just the starting point for the company.

There will be updated versions of Neuralink. Kind of like a software update for your computer where the latest version could be capable of so much more.

Neuralink: Merging Man and Machine

So What Are Some Things You Could Do With A Neuralink Device In The Future?

Elon says humans will be able to recall everything like a movie and play it back. We will have superhuman memory. If you travel to a foreign country you will be able to download a program and speak in their language. In fact, you would not need to speak at all. You could communicate with people without speaking if you both have an implant like telepathic communication.

Neuralink is just one of many companies thinking about brain-machine interfaces. Facebook is funding research on BMIs that will allow you to type with just your mind. Google’s parent company Alphabet has a research organization that wants to treat diseases with an implantable device. The ultimate goal of Neuralink is to achieve a symbiosis with artificial intelligence.

But merging with AI comes with dangers. When your computer gets hacked, that’s a problem. When it comes to your brain, that’s a far bigger issue. Scientists have been urging governments to address what data should be collected, how data will be kept safe, and how citizens can opt out of having their data shared by default.

But all the problems aside, it has the potential to change our lives, and I don’t say that lightly. It can change our lives in the way the internet has connected the world, in the way aeroplanes have allowed us to travel, in the way medicines have cured disease.

We may be looking at our world now and thinking that we are lucky to live in this day and age and that two decades ago, they couldn’t have imagined what would be possible. Yet two decades from now, people will look at us and think about how far behind we were.

Merging with AI: How to Make a Brain-Computer Interface to Communicate with Google using Keras and OpenBCI

Elon Musk and Neuralink want to build a Brain-Computer Interface that can act as the third layer of the brain, allowing humans to form a symbiotic relationship with Artificial Intelligence.

But what if you can already do that?

In a (very) limited form, you actually can.

Dendrites: Why Biological Neurons Are Deep Neural Networks

BACKGROUND

Brain-Computer Interface (BCI) broadly refers to any system that establishes a direct connection between the nervous system and an electronic device. These devices may be surgically implanted in the brain, or they may be external. Typical paradigms include allowing a user to control an actuator or keyboard, allowing a device to send sensory data to the user, or bilateral communication involving both sensory data and motor control (i.e. a prosthetic arm that receives motor control input and sends sensory data on pressure or temperature)

Historically, neuroprosthetics have been the primary motive for BCI research. These include artificial limbs for amputees, cochlear implants for the deaf, and deep brain stimulation for individuals suffering from seizures. Already, these devices have improved the lives of millions of people, and their widespread use demonstrates the benefits of achieving direct bilateral communication between the brain and electronic devices. However, the possible applications of the technology extend far beyond healthcare. Even within the realm of neuroprosthetics, we can imagine going beyond just repairing and consider augmenting our abilities beyond normal human levels. Artificial limbs may one day progress to the point where they are, by any objective criterion, superior to their natural counterparts. These limbs may look and feel just like normal limbs, but would be far stronger and more agile. Another example would be artificial eyes that are capable of far higher resolution than human eyes, an ability to zoom in or out, and the capacity to see in the UV or IR spectrum.

The Science Behind Elon Musk’s Neuralink Brain Chip | WIRED

The possibilities get even more interesting when considering cognition and skill formation. A recent study demonstrates that stimulating certain parts of the brain improves memory formation and recall. Other experiments have managed to artificially implant memories in animals. As an example, it may be possible to apply the methods of these studies to improve your ability to quickly learn an instrument. Or perhaps it may be possible to combine various neurostimulators and sensors to develop an “arithmetic processing unit” that can detect when particular areas of the brain associated with mathematical or logical reasoning are activated, and communicates with them to enhance abilities.

It is an extension of this cognitive augmentation that Elon Musk and Neuralink want to pursue. According to Musk and many leading AI theorists, a key barrier in humanity’s intellectual progress relative to AI is the bandwidth problem: although computers and AI are becoming ever faster and more capable of processing and generating knowledge, we face immediate and fundamental limitations in our ability to do the same. We acquire information primarily through our senses and ability to interpret language. In the time it takes your eyes and visual cortex to read and understand a single sentence, a computer can scan through thousands of pages of text. It’s conceivable that in a few decades time, we may have advanced AI running on specialized neuromorphic hardware with incredibly accurate models of how the world works and an ability to analyze and understand millions of documents in minutes, making decisions and inferences that are far beyond human comprehension. In a world increasingly dependent on AI driven decision making, humans may find themselves obsolete in all parts of the business, scientific, and political decision making process. Our brains did not evolve to play a game of chess with trillions of pieces or to comprehend calculated strategies that plan millions of moves ahead. It is a fear of this super-intelligent black box that motivates much of the current work at Neuralink, Kernel, and several other related organizations.

Consciousness in Artificial Intelligence | John Searle | Talks at Google

Most of the leading-edge research in BCI technology seeks to maximize the information bandwidth, typically through invasive methods that implant electrodes directly into the brain or nerves. However, non-invasive methods, specifically electroencephalography (EEG) and electromyography (EMG), are routinely used with considerable success. These involve placing electrodes on the surface of your head (EEG) or skin above muscles (EMG) to measure the cumulative electrical activity underneath. The granularity of this data is low, and it is a far cry from the level of precision and bandwidth that will ultimately be needed to realize the more ambitious goals of BCI research. Nevertheless, EEG/EMG-enabled BCIs have achieved incredible feats, like controlling drones, video games, and keyboards with thought, and they provide a small glimpse into the possibilities that further research may unlock. Furthermore, several companies like Cognixion and Neurable are exploring real-world applications of EEG-based BCIs and have received considerable funding and support, with many exciting projects underway.

OVERVIEW

In this project, we establish a direct connection between your nervous system and an external AI agent. This agent may be anything you can get an API for: Google Assistant, Siri, Alexa, Watson, etc. Services like Dictionary or YouTube would qualify as well, but these would limit applications to content queries rather than general purpose requests.

For the purposes of this project, we will query Google Search directly as it provides the most flexibility and is the easiest to set up. Upon completion, you should be able to query a handful of terms on Google simply by thinking about them.

The technique we use exploits the nerve signals generated by your brain in the process of subvocalization. This is the “internal monologue” that takes place inside your head when you slowly and deliberately read or think. You may have noticed yourself doing this when reading silently, sometimes to the point where you’re subtly moving your jaw and tongue without even realizing it. You may also have come across the concept when receiving tips for SAT, MCAT, GRE or other standardized test preparation. Test takers are advised to avoid subvocalization as it’s a bad habit that slows down reading speed.

We are able to exploit subvocalization because the brain sends signals to your larynx corresponding to words you think to say, even if you don’t intend to actually say them out loud. By placing electrodes on your face over the laryngeal and mandibular nerves, we can record signals corresponding to specific words and use them to train deep learning models that discern between different words. In other words (no pun intended), we can discern when you’re thinking about a particular word simply from the act of you thinking about it.

Brain Criticality - Optimizing Neural Computations


Brain and Laryngeal Nerves

This technology has its limitations, and it is by no means perfect or ready for practical use. However, since its first real world demonstration two years ago by the MIT Media Lab, it has been used successfully in devices that allow users to do math, make phone calls, order pizza, and even receive assistance while playing chess.

MIT Media Lab AlterEgo Headset

SETUP & MATERIALS

The primary hardware tool required is an OpenBCI Ganglion board. There are a variety of other hardware options available, but I found OpenBCI to have one of the largest communities of developers for support. It sets you back about $200, but it’s well worth it given the incredible stuff you can build with it.

OpenBCI Board and Electrodes

In addition to the board, you’ll need electrodes and wires. A set of gold cup electrodes and electrode gel should cost $50 and should work fine.

Ganglion Board

Electrodes

Electrode Gel

Alternatively, you can get a complete OpenBCI starter kit, which includes the board and multiple types of dry electrodes, as well as an electrode headband, for $465. It’s a bit pricey, so the gold cup setup is completely fine. Though, if you plan to experiment with other applications of BCI, like VR (tutorial with Unity VR coming soon!), the headband and dry electrodes make for a far better experience.

How Your Brain Organizes Information

Biosensing Starter Kit

OpenBCI also offers 8 and 16 channel boards. These will offer superior data quality, but the 4 channel Ganglion will be adequate for this project.

CONFIGURATION

On a Linux machine, check if you have Python 3.4 or higher. Open your terminal and type the following command:

python3 --version

If you don’t have Python, or if you have an older version, enter:

$ sudo apt-get update

$ sudo apt-get install python3.6

Now, download or clone the pyOpenBCI directory.

Change directory into the repository, and run the following command to install the prerequisite packages:

$ pip install numpy pyserial bitstring xmltodict requests bluepy

You’re now ready to install pyOpenBCI

$ pip install pyOpenBCI

To see some action, change directory to pyOpenBCI/Examples and find print_raw_example.py. Open this file with your favorite code editor, and make the following change on line 7:

board = OpenBCICyton(daisy = False)

Should be changed to:

board = OpenBCIGanglion(mac='*')

This allows pyOpenBCI to employ the appropriate modules for the particular board we are using.

Now, power on your board.

On your computer, from the Examples directory in your terminal, type the following command:

$ sudo python print_raw_example.py

Boom!! Your terminal should now be flooded with a stream of raw input data from the board.

How are memories stored in neural networks? | The Hopfield Network #SoME2

RECORD SIGNALS

Now that we can obtain raw signals, we can start designing and building the data pipeline. To start, we must first convert the raw data into an LSL stream. LSL refers to Lab Streaming Layer, and is a protocol developed at the Swartz Center for Computational Neuroscience at UC San Diego to facilitate the recording and analysis of live data streams. LSL will stream our EEG data onto the local host, from where it can be picked up by other applications or scripts.

Modify the lsl_example.py file in pyOpenBCI/Examples to remove the AUX stream, which we do not need, and add a markers stream:
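A minimal sketch of such a script is shown below, assuming pylsl and the pyOpenBCI Ganglion interface; the stream names, the 200 Hz nominal sample rate, and the per-sample channels_data attribute are assumptions based on the standard examples, not the exact file contents.

from pylsl import StreamInfo, StreamOutlet
from pyOpenBCI import OpenBCIGanglion

# EEG stream: 4 channels at the Ganglion's nominal 200 Hz
eeg_info = StreamInfo('OpenBCI_EEG', 'EEG', 4, 200, 'float32', 'ganglion-eeg')
eeg_outlet = StreamOutlet(eeg_info)

# Markers stream: irregular rate, one string channel for interval/word annotations
marker_info = StreamInfo('OpenBCI_Markers', 'Markers', 1, 0, 'string', 'ganglion-markers')
marker_outlet = StreamOutlet(marker_info)

def lsl_streamer(sample):
    # Forward each raw EEG sample onto the local network via LSL
    eeg_outlet.push_sample(sample.channels_data)

board = OpenBCIGanglion(mac='*')
board.start_stream(lsl_streamer)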

We must now define an experimental setup that will record the data in the form we want and store it for further use. We want the experiment to generate a data set of time series EEG data separated into intervals, with each interval corresponding to the subvocalization of a single word. To achieve this, we can execute an experiment that starts a recording session of N intervals, with each interval lasting T seconds. All samples within a given interval are annotated with the interval index and the particular word the user is instructed to subvocalize.

The lsl-record.py file from neurotech-berkeley serves as a good starting point. Modify the file in accordance with our defined setup:

You may adjust the termBank (line 64) to try various combinations of words in various contexts. You may also adjust the default duration (line 12) before each session.
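For orientation, a stripped-down sketch of such a recording script might look like the following; the term bank, 2-second interval length, file name, and column layout are illustrative assumptions rather than the actual neurotech-berkeley code.

import csv
import random
import time

from pylsl import StreamInlet, resolve_stream

TERM_BANK = ['left', 'right']          # words to subvocalize; adjust freely
INTERVAL_SECONDS = 2
SESSION_SECONDS = 120                  # default session duration

streams = resolve_stream('type', 'EEG')            # find the EEG stream published by lsl_example
inlet = StreamInlet(streams[0])

with open('session.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['interval', 'term', 'A', 'B', 'C', 'D'])
    interval, start = 0, time.time()
    while time.time() - start < SESSION_SECONDS:
        term = random.choice(TERM_BANK)
        print('Subvocalize:', term)
        interval_end = time.time() + INTERVAL_SECONDS
        while time.time() < interval_end:
            sample, _ = inlet.pull_sample(timeout=1.0)
            if sample is not None:
                writer.writerow([interval, term] + list(sample))
        interval += 1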

Now it’s time for the fun part! Plug the electrodes into your board:

Left 4 channels are EEG in, right 2 channels are ground

Tape them to your face in the following configuration:

Find a quiet place to sit, and enter the following lines into separate terminals:

// Terminal 1: converts raw data to LSL and streams it

$ sudo python lsl_example.py

// Terminal 2: reads LSL data stream and executes experiment

$ sudo python lsl_record.py

Note: we run as sudo to allow the script to detect the MAC address of the board

This should start a recording session of the specified duration. You will be prompted with a random word from your term bank to subvocalize over 2-second intervals. The recording sessions may get uncomfortable and sleep-inducing, so it’s better to do multiple small sessions with breaks in between. Additionally, our experimental setup may lead to poor data quality if frequent disturbances occur (e.g., abrupt movements or subvocalizing the incorrect word).

You may design and implement a more flexible setup with an option to hit a key that deletes the current and previous intervals upon noticing a disturbance. Another workaround is to do multiple small sessions and combine the data at the end, discarding sessions with excessive disturbances. Some noise is inevitable, and you don’t have to be too picky as the model becomes more resilient with increased sample count.

For optimal results, you should have at least 1000 high quality samples for every word in your word bank.

Elon Musk’s Neuralink Event: Everything Revealed in 10 Minutes


PROCESS SIGNALS

Once you have enough data, it’s time to prepare it for use in machine learning.

Combine and preprocess your data as appropriate so that it has the following format:

Example Data Table

The Word column holds interval indices from 1 to NumIntervals, where NumIntervals is the sum of SessionDuration/2 over all sessions

The Term column holds the word displayed during each interval

[A, B, C, D] are the four EEG channels

Each (word, term) combination corresponds to approximately 800 lines of data

Import your CSV files into Python using NumPy. You should have all your data loaded in a NumLines x 6 ndarray in your script.
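A minimal loading sketch, assuming the session CSV layout used above (interval, term, A, B, C, D) and that term labels are mapped to integer indices:

import glob
import numpy as np

TERM_BANK = ['left', 'right']                      # must match the recording script

rows = []
for path in sorted(glob.glob('session*.csv')):
    with open(path) as f:
        next(f)                                    # skip the header row
        for line in f:
            interval, term, a, b, c, d = line.strip().split(',')
            rows.append([float(interval), float(TERM_BANK.index(term)),
                         float(a), float(b), float(c), float(d)])

data = np.array(rows)                              # shape: (NumLines, 6)
print(data.shape)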

The first step is to filter the data to remove noise that is outside the frequencies we’re interested in. Informative EEG frequencies correspond to the following bands:

EEG Wave Frequencies

Filtering for frequencies between 4 Hz and 100 Hz may seem reasonable, but would fail because 60 Hz is the frequency of the power grid (may vary by country), which is bound to be a significant source of noise. For optimal results, we should filter between 4 Hz and 50 Hz.

We can use SciPy’s Butterworth filter to select the frequency range we want to keep. Define a filter with the following code:
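A minimal filter definition, assuming the Ganglion's roughly 200 Hz sampling rate; the filter order is an illustrative choice:

from scipy.signal import butter

FS = 200.0                                         # sampling rate in Hz (assumption)

def bandpass(low_hz=4.0, high_hz=50.0, order=4):
    # Normalize the cutoff frequencies to the Nyquist frequency, as butter() expects
    nyq = 0.5 * FS
    b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype='bandpass')
    return b, a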

Then, generate a timestamp column (as we combined multiple datasets and rendered the original timestamps invalid), and apply the filter to each channel:
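One way to do this, reusing the bandpass() helper and FS constant defined above (the column layout is the assumed one: interval, label, then four EEG channels):

import numpy as np
from scipy.signal import filtfilt

b, a = bandpass()

timestamps = np.arange(data.shape[0]) / FS          # synthetic, evenly spaced timestamps
data = np.column_stack([timestamps, data])          # columns: time, interval, label, A, B, C, D

for ch in range(3, 7):                              # zero-phase filter each EEG channel
    data[:, ch] = filtfilt(b, a, data[:, ch])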

Once filtered, use the following code to restructure your data into a three-dimensional ndarray with dimensions IntervalLength x ChannelCount x IntervalCount.
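A sketch of that restructuring, assuming the column layout above and roughly 800 rows per 2-second interval as reported earlier (adjust to your actual sample rate):

import numpy as np

INTERVAL_LENGTH = 800                               # rows per 2-second interval (assumption)
CHANNEL_COUNT = 4

intervals, labels = [], []
for idx in np.unique(data[:, 1]):                   # group rows by interval index
    block = data[data[:, 1] == idx]
    if block.shape[0] < INTERVAL_LENGTH:            # drop short or interrupted intervals
        continue
    intervals.append(block[:INTERVAL_LENGTH, 3:3 + CHANNEL_COUNT])
    labels.append(int(block[0, 2]))

X = np.stack(intervals, axis=-1)                    # (IntervalLength, ChannelCount, IntervalCount)
y = np.array(labels)                                # one label per interval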

What we’ve effectively done with the above code is convert time series data into image data. It might sound a bit weird, but you can think of each 2-second interval as an image, with each pixel corresponding to the signal value acquired at a particular (channelNumber, lineNumber) coordinate. In other words, we have a stack of IntervalCount images that are each IntervalLength x ChannelCount in size.

First 120 data points of an EEG interval

This technique, demonstrated by Justin Alvey in a similar project, is incredibly powerful because it allows us to treat time series data as if it were image data, allowing us to leverage the power of computer vision and Convolutional Neural Networks (CNNs). You can even visualize a particular subvocalization by plotting it as an image.

Neuralink Show and Tell, Fall 2022

Additionally, using CNNs allows us to skip Fourier transformations as various frequencies (emerging as patterns on each image) may be learned by the neural network without explicitly specifying what frequencies it should look for.

Now we’re ready to start building the CNN. Since we only have 1 color dimension, we can use a 1D CNN with input dimensions of IntervalLength and ChannelCount. You may experiment with different hyperparameters and architectures. I settled on a single convolution layer, two fully connected layers, and two pooling layers.
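A hedged sketch of one such architecture in Keras (one convolution, two pooling layers, two dense layers); the layer sizes, split, and training settings are illustrative choices rather than the exact model used:

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models, utils

num_classes = len(np.unique(y))
X_cnn = np.transpose(X, (2, 0, 1))                  # (IntervalCount, IntervalLength, ChannelCount)
y_cnn = utils.to_categorical(y, num_classes)

X_train, X_test, y_train, y_test = train_test_split(
    X_cnn, y_cnn, test_size=0.2, random_state=42)

model = models.Sequential([
    layers.Conv1D(64, kernel_size=7, activation='relu', input_shape=X_cnn.shape[1:]),
    layers.MaxPooling1D(pool_size=4),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation='relu'),
    layers.Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=30, batch_size=16, validation_split=0.1)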

For a more detailed analysis on single dimensional CNNs and how they are applied to time series data, refer to this article by Nils Ackermann.

We now have a model that should be able to match an interval of EEG data to a specific word in your word bank.

Let’s see how well it does. Apply the model to the test data, and compare the predicted results against the actual results.

# Test Model

y_predicted = model.predict(X_test)
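To turn the predicted probabilities into an accuracy figure (assuming the one-hot labels from the sketch above):

import numpy as np

predicted = np.argmax(y_predicted, axis=1)          # most likely class per test interval
actual = np.argmax(y_test, axis=1)
print('Test accuracy:', np.mean(predicted == actual))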

With two words in the term bank, I was able to achieve 90% accuracy. As expected, accuracy diminished slightly with additional words, with 86% for three-way and 81% for four-way classification.

Sample truth chart from two word classification. Left is Actual, Right is Predicted

One possible way to increase the size of the term bank without compromising accuracy is to create a hierarchical “term tree” with multiple-word queries. You may then perform a depth-first search on the tree, with each layer of words only being compared to others in that same layer of the same subtree, to find the best match.

GOOGLE SEARCH

We now have all the pieces necessary to query Google using your BCI. Define a mapping between specific subvocalizations and queries, and make the appropriate call:
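A minimal sketch of that mapping and call, using the standard-library webbrowser module; the query strings are illustrative assumptions:

import urllib.parse
import webbrowser

QUERY_MAP = {
    'left': 'weather today',
    'right': 'directions to the nearest gas station',
}

def search_google(term):
    # Map the decoded subvocalization to a canned query and open it in the default browser
    query = QUERY_MAP.get(term)
    if query is None:
        print('Unrecognized term:', term)
        return
    webbrowser.open('https://www.google.com/search?q=' + urllib.parse.quote(query))

search_google('left')                               # e.g. after classifying one interval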


To do live queries as you think of them, modify and import the lsl_record.py script as a module. You may then call it to read the LSL stream in response to user input for a single 2 second interval.

That’s it! You can now search Google without saying or typing a single word.

Webinar // Artificial intelligence in medical devices


CONCLUSION

You can’t do too much with a three or four word term-bank (barring implementation of the term-tree mentioned earlier). Going through all these steps to search for directions to your nearest gas station is slightly more complicated than just Googling it the normal way. Nevertheless, it’s important to consider where further developments in this technology could lead to. We can imagine an improved and less conspicuous version of this device, not too different from the one the MIT team already has, being used for navigation, web queries, text messaging, smart home management, or any number of routine tasks. When combined with the power of ever improving AI assistants capable of context dependent interpretation, the possibilities expand even further.

The applications of EEG based BCIs are a small subset of what could ultimately be made possible by the cutting edge research taking place at companies and university labs across the world. Telepathic communication, superhuman intelligence, additional senses, simulated experiences, digitization of human consciousness, and merging with artificial intelligence are all worth considering. If these possibilities are realized, they won’t just redefine our relationship with technology: they’ll redefine what it means to be human.


REFERENCES

The following are a list of resources and organizations I found helpful in completing this project and in learning about BCIs in general. I’d like to especially acknowledge the AlterEgo Team at the MIT Media Lab for being the original inspiration for this project, as well as Mr. Alvey and NeuroTech Berkeley for their prior code and tutorial contributions to the BCI community. Additionally, I’d like to thank faculty at the University of California, Davis, notably Dr. Ilias Tagkopoulos, Dr. Karen Moxon, and Dr. Erkin Seker, for ongoing assistance and support.

AlterEgo: A Personalized Wearable Silent Speech Interface

Using deep learning to “read your thoughts” — with Keras and EEG — Justin Alvey

Neurotech Berkeley Github

OpenBCI GitHub

Brain-Computer Interfacing: An Introduction — Rajesh Rao

Finally, I’d like to give a huge shout out to the growing BCI/ Neurotech communities which have provided endless support, resources, and enthusiasm for the future.

NeuroTechX

Reddit BCI

Reddit Neuralink

OpenBCI



How safe is our reliance on AI, and should we regulate it?

See this link for a Discussion





ChatGPT and the rise of large language models


 







ChatGPT and the rise of large language models

ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health

What are Large Language Models (LLMs)?

Large Language Models (LLMs) have recently gathered attention with the release of ChatGPT, a user-centered chatbot released by OpenAI. In this perspective article, we retrace the evolution of LLMs to understand the revolution brought by ChatGPT in the artificial intelligence (AI) field.

Introduction to Large Language Models

The opportunities offered by LLMs in supporting scientific research are multiple and various models have already been tested in Natural Language Processing (NLP) tasks in this domain.


The impact of ChatGPT has been huge for the general public and the research community, with many authors using the chatbot to write parts of their articles and some papers even listing ChatGPT as an author. Alarming ethical and practical challenges emerge from the use of LLMs, particularly in the medical field, because of their potential impact on public health. Infodemics are a trending topic in public health, and the ability of LLMs to rapidly produce vast amounts of text could amplify the spread of misinformation at an unprecedented scale. This could create an “AI-driven infodemic,” a novel public health threat. Policies to counter this phenomenon need to be elaborated rapidly, and the inability to accurately detect artificial-intelligence-produced text remains an unresolved issue.

1. Introduction

“ChatGPT” is a large language model (LLM) trained by OpenAI, an Artificial intelligence (AI) research and deployment company, released in a free research preview on November 30th 2022 to get users’ feedback and learn about its strengths and weaknesses (1). Previously developed LLMs were able to execute different natural language processing (NLP) tasks, but ChatGPT differs from its predecessors: it is an AI chatbot optimized for dialog, and especially good at interacting in a human-like conversation.

With the incredibly fast spread of ChatGPT, which reached over one million users within 5 days of its release (2), many have tried out this tool to answer complex questions or to generate short texts. It is a small leap to infer that ChatGPT could be a valuable tool for composing scientific articles and research projects. But can these generated texts be considered plagiarism? (3, 4).

It took a while to adopt systems at the editorial level to recognize potential plagiarism in scientific articles, but intercepting a product generated by ChatGPT would be much more complicated.

In addition, the impact that this tool may have on research is situated within a landscape that has been profoundly affected by the COVID-19 pandemic (5). In particular, health research has been strongly influenced by the mechanisms of dissemination of information regarding SARS-CoV-2 through preprint servers, which often allowed rapid media coverage with a consequent impact on individual health choices (6, 7).

Even more than scientific literature, social media have been the ground of health information dissemination during the COVID-19 pandemic, with the rise of a phenomenon known as infodemic (8).

Starting from a background on the evolution of LLMs and the existing evidence on their use to support medical research, we focus on ChatGPT and speculate about its future impact on research and public health. The objective of this paper is to promote a debate on ChatGPT’s space in medical research and the possible consequences in corroborating public health threats, introducing the novel concept of “AI-driven infodemic.”

Principles of Multi-Modal Learning


2. The evolution of pre-trained large language models

The LLMs’ evolution in the last 5 years has been exponential and their performance in a plethora of different tasks has become impressive.

Before 2017, most NLP models were trained using supervised learning for particular tasks and could be used only for the task they were trained on (9).

To overcome those issues, the self-attention network architecture, also known as Transformer, (10) was introduced in 2017 and was used to develop two game-changing models in 2018: Bidirectional Encoder Representations from Transformers (BERT) and Generative Pretrained Transformer (GPT) (11, 12).

Both models achieved superior generalization capabilities, thanks to their semi-supervised approach. Using a combination of unsupervised pre-training and supervised fine-tuning, these models can apply pre-trained language representations to downstream tasks.

GPT models rapidly evolved in different versions, being trained on a larger corpus of textual data and with a growing number of parameters.

The third version of GPT (GPT-3), with 175 billion parameters, is roughly 100 times larger than GPT-2 and has approximately twice as many parameters as there are neurons in the human brain (13).

GPT-3 can generate text that is appropriate for a wide range of contexts, but unfortunately, it often expresses unintended behaviors such as making up facts, generating biased text, or simply not following user instructions (14).

This can be explained by the fact that the objective of many LLMs, including GPT-3, is to predict the next element in a text, based on a large corpus of text data from the internet (15); as a result, LLMs learn to replicate the biases and stereotypes present in that data (16).

Here comes the major problem of alignment: the difficulty of ensuring that an LLM behaves in a way that is aligned with human values and ethical principles.

Addressing the alignment problem for LLMs is an ongoing area of research and OpenAI developed a moderation system, trained to detect a broad set of categories of undesired content, including sexual and hateful content, violence, and other controversial topics (17).

ChatGPT incorporates a moderation system, but the true innovation lies in its user-centered approach, which was used to fine-tune the model from GPT-3 to follow the user instructions “helpfully and safely” (14).

This process started from InstructGPT, an LLM with “only” 1.3 billion parameters trained using reinforcement learning from human feedback (RLHF), which combines supervised learning, to obtain human feedback, with reinforcement learning that uses human preferences as a reward signal.

RLHF is used for adapting the pre-trained model GPT-3 to the specific task of following users’ instructions. From the optimization of InstructGPT for dialog, ChatGPT was born.
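To make the RLHF idea more concrete, here is a deliberately simplified sketch, not OpenAI’s actual pipeline: a small causal language model (GPT-2 stands in for the policy) samples completions, a stand-in reward function plays the role of a reward model trained on human preference rankings, and a REINFORCE-style update nudges the policy toward higher-reward outputs. The model choice, reward function, and hyperparameters are illustrative assumptions.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
policy = GPT2LMHeadModel.from_pretrained("gpt2")      # stands in for the pre-trained LLM
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-5)

def reward_model(text: str) -> float:
    # Placeholder for a reward model learned from human preference rankings:
    # here we simply prefer answers that cite a source and stay short.
    return (1.0 if "according to" in text.lower() else 0.0) - 0.01 * len(text.split())

prompt = "Explain what an infodemic is."
inputs = tokenizer(prompt, return_tensors="pt")

for step in range(3):
    # 1) Sample a completion from the current policy.
    sample = policy.generate(**inputs, do_sample=True, max_new_tokens=40,
                             pad_token_id=tokenizer.eos_token_id)
    text = tokenizer.decode(sample[0], skip_special_tokens=True)

    # 2) Score it with the (stand-in) reward model.
    reward = reward_model(text)

    # 3) REINFORCE-style update: scale the sequence log-likelihood by the reward.
    out = policy(sample, labels=sample)              # out.loss is the mean negative log-likelihood
    log_likelihood = -out.loss * sample.shape[1]     # approximate total log-probability
    loss = -reward * log_likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: reward={reward:.2f}")

Real RLHF pipelines use a learned reward model and a more careful policy-optimization algorithm such as PPO, but the loop above captures the basic sample-score-update cycle described in the text.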


Despite these advancements, ChatGPT still sometimes writes plausible-sounding but incorrect or nonsensical answers, due to its inability to fact-check and its knowledge being limited to 2021 (1).

Lecture 9.1: Reinforcement Learning (Multimodal Machine Learning, Carnegie Mellon University)

Lecture 9.2: Multimodal RL (Multimodal Machine Learning, Carnegie Mellon University)

3. Large language models to support academic research

One potential application of LLMs is in support of academic research. The scientific literature, with around 2.5 million papers published every year (20), is already beyond human handling capabilities due to its sheer magnitude.

AI could be a solution to tame the scientific literature and support researchers in collecting the available evidence, (21) by generating summaries or recommendations of papers, which could make it easier for researchers to quickly get the key points of a scientific result. Overall, AI tools have the potential to make the discovery, consumption, and sharing of scientific results more convenient and personalized for scientists. The increasing demand for accurate biomedical text mining tools for extracting information from the literature led to the development of BioBERT, a domain-specific language representation model pre-trained on large-scale biomedical corpora (22).

BioBERT outperforms previous models on biomedical text mining tasks, including named entity recognition, relation extraction, and question answering.
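As an illustration of the kind of pipeline this enables, the short sketch below runs biomedical named entity recognition over an abstract with the Hugging Face pipeline API; the checkpoint name is a hypothetical placeholder standing in for any BioBERT variant fine-tuned for NER.

from transformers import pipeline

# "your-org/biobert-finetuned-ner" is a hypothetical placeholder; substitute a
# BioBERT checkpoint fine-tuned for biomedical named entity recognition.
ner = pipeline("token-classification",
               model="your-org/biobert-finetuned-ner",
               aggregation_strategy="simple")

abstract = ("Hydroxychloroquine was evaluated as a treatment for COVID-19 "
            "in several randomized controlled trials.")
for entity in ner(abstract):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))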

Another possible approach is the one of domain-specific foundation models, such as BioGPT and PubMedGPT 2.7B, (23, 24) that were trained exclusively on biomedical abstracts and papers and used for medical question answering and text generation.

Med-PaLM, a LLM trained using few-shot prompting, exceeds previous state-of-the-art models on MedQA, a medical question answering dataset consisting of United States Medical Licensing Exam (USMLE) style questions (25). The performance of ChatGPT on USMLE was recently evaluated and it achieved around 50–60% accuracy across all examinations, near the passing threshold, but still inferior to Med-PaLM (26).

GPT-4 exceeds the passing score on USMLE by over 20 points and outperforms earlier LLMs. Nevertheless, there is a large gap between competency and proficiency examinations and the successful use of LLMs in clinical applications (27).

In the NLP task of text summarization, GPT-2 was one of the best-performing models used for summarizing COVID-19 scientific research topics, using a database with over 500,000 research publications on COVID-19 (CORD-19) (28, 29).
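The cited work fine-tuned GPT-2 for this purpose; as a rough illustration of the interface involved (not the authors’ exact setup), the snippet below calls the Hugging Face summarization pipeline with its default model. The input file path is a placeholder.

from transformers import pipeline

summarizer = pipeline("summarization")                 # loads the library's default summarization model
paper_text = open("cord19_paper.txt").read()           # placeholder path to any plain-text article
summary = summarizer(paper_text[:3000],                # truncate to fit the model's input limit
                     max_length=120, min_length=40, do_sample=False)
print(summary[0]["summary_text"])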

CORD-19 was also used for the training of CoQUAD, a question-answering system, designed to find the most recent evidence and answer any related questions (30).

A web-based chatbot that produces high-quality responses to COVID-19-related questions was also developed; this user-friendly approach was chosen to make the LLM more accessible to a general audience (31).

LLMs have also been used for abstract screening for systematic reviews; this allows the use of unlabelled data in the initial step of scanning abstracts, saving researchers’ time and effort (32).
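One way to picture this, without claiming it is the cited study’s exact method, is zero-shot classification of each abstract against the review’s inclusion criteria, so that unlabelled records can be triaged before human screening:

from transformers import pipeline

screen = pipeline("zero-shot-classification")           # default NLI-based model from the hub
criteria = ["randomized controlled trial in humans",
            "animal study",
            "opinion or editorial piece"]

abstract = ("We conducted a double-blind randomized trial of drug X in 240 "
            "adult patients with type 2 diabetes.")
result = screen(abstract, candidate_labels=criteria)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")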

LLMs also facilitate the implementation of advanced code generation capabilities for statistics and data analytics. Two large-scale AI-powered code generation tools have recently come into the spotlight:

OpenAI Codex, a GPT language model fine-tuned on publicly available code from GitHub, (33) and DeepMind AlphaCode, designed to address the main challenges of competitive programming (34).
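Codex itself has since been retired, but the workflow it introduced survives in current hosted models. The sketch below assumes the OpenAI Python SDK (v1 interface) and an API key in the environment; the model name is just an example.

from openai import OpenAI

client = OpenAI()                     # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",            # example model name
    messages=[{"role": "user",
               "content": "Write an R snippet that fits a logistic regression of "
                          "outcome on age and sex and prints the odds ratios."}],
)
print(response.choices[0].message.content)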

On one hand, AI tools can make programmers’ jobs easier, aid in education and make programming more accessible (35).

On the other hand, the availability of AI-based code generation raises concerns: the main risk is users’ over-reliance on the generated outputs, as non-programmers in particular may quickly become accustomed to auto-suggested solutions (36).

The above-described deskilling issue is not limited to coding. If we imagine a scenario in which AI is extensively used for scientific production, we must consider the risk of deskilling in researchers’ writing abilities. Some have already raised concerns about the peril of the conduct of research being significantly shaped by AI, leading to a decline in authors’ ability to engage meaningfully and substantively with their objects of study (37).

Our reflections highlight a growing interest in the use of LLMs in academic research, and with the release of ChatGPT this interest has only increased (38).

Foundation models and the next era of AI

4. The revolution of ChatGPT and the potential impact on scientific literature production

The user-centered approach of ChatGPT is the paradigm shift that makes it different from previous LLMs. The revolutionary impact of ChatGPT does not lie in its technical content, which appears to be merely a different methodology for training, but in the different perspective that it is bringing. ChatGPT will probably be overtaken soon, but the idea of making AI accessible to the broader community and putting the user at the center will stand.

Google AI chief: The promise of multi-modal learning

The accessibility and user-friendly interface of ChatGPT could induce researchers to use it more extensively than previous LLMs. ChatGPT offers the opportunity to streamline the work of researchers, providing valuable support throughout the scientific process, from suggesting research questions to generating hypotheses. Its ability to write scripts in multiple programming languages and to provide clear explanations of how the code works makes it a useful asset for improving understanding and efficiency. When prompted, ChatGPT readily produces examples of these abilities: candidate research questions, hypotheses, scripts in several programming languages, and explanations of how the code works.


ChatGPT can also be used to suggest titles, write drafts, and help to express complex concepts in fluent and grammatically correct scientific English. This can be particularly useful for researchers who may not have a strong background in writing or who are not native English speakers. By supplementing the work of researchers, rather than replacing it, automating many of the repetitive tasks, ChatGPT may help researchers focus their efforts on the most impactful aspects of their work.


The high interest of the scientific community in this tool is demonstrated by the rapid increase in the number of papers published on this topic shortly after its release. The use of ChatGPT in scientific literature production has already become a reality: during the writing of this draft, many authors stated that they had used ChatGPT to write at least part of their papers (39). This underlines how ChatGPT has already been integrated into the research process, even before ethical concerns have been addressed and common rules discussed. For example, ChatGPT has been listed as the first author of four papers (26, 40, 41, 42), without considering the possibility of “involuntary plagiarism” or the intellectual property issues surrounding the output of the model.

The number of preprints produced using ChatGPT indicates that the use of this technology is inevitable and that a debate in the research community is a priority (43).


5. Navigating the threats of ChatGPT in public health: AI-driven infodemic and research integrity

A potential concern related to the emergence of LLMs is their submissiveness in following users’ instructions. Despite the limitations imposed by programmers, LLMs can be easily tricked into producing text on controversial topics, including misinforming content (44).

The ability of LLMs to generate texts similar to those composed by humans could be used to create fake news articles or other seemingly legitimate but actually fabricated or misleading content, (45, 46) without the reader realizing that the text is produced by AI (47).

In response to this threat, a counter-offensive is emerging: some authors highlight the importance of creating LLM detectors able to identify fake news (48), while others propose using LLMs to enhance detector performance (49). Commonly used GPT-2 detectors were flawed in recognizing AI-written text when it was generated by ChatGPT (50); new detectors were rapidly developed and released to address this gap, but these tools do not perform well in identifying GPT-4-generated text. This poses a continuous, unfair competition in which detectors must keep pace with LLMs’ rapid advancement, leaving a gap for malicious intent.
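For reference, this is how one of the commonly used GPT-2-era detectors is typically invoked (the checkpoint name is as published on the Hugging Face hub and may move; as noted above, its scores are unreliable on ChatGPT- and GPT-4-generated text):

from transformers import pipeline

# RoBERTa-based GPT-2 output detector released by OpenAI; hub location may change.
detector = pipeline("text-classification",
                    model="openai-community/roberta-base-openai-detector")
print(detector("The rapid dissemination of unverified health claims can "
               "undermine public trust in scientific institutions."))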


The absence of accurate detectors calls for precautionary measures: for example, the International Conference on Machine Learning (ICML) prohibited the use of LLMs such as ChatGPT in drafts submitted to its 2023 call for papers. However, ICML acknowledges that there is currently no tool to verify compliance with this rule, and is therefore relying on the discretion of participants while awaiting the development of shared policies within the scientific community.

Many scientific journals are grappling with the policy question, publishing editorials on the topic and updating their guidelines for authors (51).

For example, Springer Nature journals were the first to add rules to their guides for authors: to avoid accountability issues, LLMs cannot be listed as authors and their use should be documented in the methods or acknowledgments sections (3). Elsevier has also created guidelines on the use of AI-assisted writing for scientific production, confirming the rules imposed by Springer and requiring authors to specify the AI tools employed and give details on their use. Elsevier has declared that it is committed to monitoring developments around generative AI and to refining its policy if necessary (52).

The misuse of ChatGPT in scientific research could lead to the production of fake scientific abstracts, papers, and bibliographies. In earlier versions of ChatGPT (up to the December 15th version), when asked to cite references to support its statements, the output was a list of fabricated bibliographic references. (e.g., fabricated output reference: Li, X., & Kim, Y. (2020). Fake academic papers generated by AI: A threat to the integrity of research. PLOS ONE, 15 (3), e0231891.)

Talk: What Makes Multi-modal Learning Better than Single (Provably)

The use of real authors’ names, journals, and plausible titles makes fake references difficult to spot immediately. This calls for preventive actions, such as the mandatory use of the digital object identifier (DOI) system, which could be used to rapidly and accurately identify fake references.
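A minimal sketch of that idea: before accepting a manuscript, an editorial script could ask the public Crossref REST API whether each cited DOI resolves to a registered work; a fabricated DOI simply comes back as not found. The DOI queried below is deliberately made up.

import requests

def doi_is_registered(doi: str) -> bool:
    # Crossref returns HTTP 200 for registered DOIs and 404 for unknown ones.
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return resp.status_code == 200

print(doi_is_registered("10.1234/made-up-reference-2020"))   # False: fabricated DOI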

In fields where fake information can endanger people’s safety, such as medicine, journals may have to take a more rigorous approach to verifying that information is accurate (53). A combined evaluation by more up-to-date AI-output detectors and human reviewers is necessary to identify AI-generated scientific abstracts and papers, though this process may be time-consuming and imperfect. We therefore suggest adopting a “detectable-by-design” policy: the release of new generative AI models by the tech industry to the public should be permitted only if the output generated by the AI is detectable and can thus be unequivocally identified as AI-produced.

The impact that generating false and potentially mystifying texts can have on people’s health is huge. The issue of the dissemination of untruthful information has long been known: from the unforgettable Wakefield case and the false belief it generated that vaccines can cause autism (54), to the unsafe behaviors observed during the various phases of the COVID-19 pandemic (55). In this context, it has been more evident than ever that junk and manipulative research, through underperforming studies or study designs unfit to achieve the intended research objective, has had an impact on the behavior of the general population and, more worryingly, of health professionals (56).

The diffusion of misinformation through rapidly disseminating channels such as mass media and social networks can generate the phenomenon known as infodemic (57). The consequences for the scientific framework are considerable, with implications for healthcare choices that were already a determining factor in the recent pandemic (58). An infodemic can influence medical decision-making on treatment or preventive measures (59, 60); for example, some people used hydroxychloroquine as a treatment for COVID-19 based on false or unproven information endorsed by popular and influential people (61). The risk is that we may face a new health emergency in which new information can be rapidly produced using LLMs to generate human-like texts ready to spread even if incorrect or manipulated.

The concept of infodemic was introduced in 2003 by Rothkopf as an “epidemic of information” (62) and evolved in 2020 after the COVID-19 pandemic, integrating the element of rapid misinformation spreading (63). With the global diffusion of LLMs, the infodemic concept must evolve again into that of an “AI-driven infodemic.” Not only is it possible to rapidly disseminate misinformation via social media platforms and other outlets, but it is also possible to produce exponentially growing amounts of health-related information, regardless of one’s knowledge, skills, and intentions. Given the nature of social media content diffusion, LLMs could be used to create content specifically designed for target population groups, engineered to go viral and foster the spread of misinformation. We foresee a scenario in which human-like, AI-produced content will dramatically exacerbate every future health threat that can generate infodemics, which from now on will be AI-driven.

Social media and gray literature have already been the ground for infodemics (63), but scientific literature could become a new and powerful means of disinformation campaigns. The potential of LLMs, and in particular ChatGPT, to easily generate human-like texts could lead, without proper control, to excessive, low-quality scientific literature production in the health field. The abundance of predatory journals that accept articles for publication without performing quality checks for issues such as plagiarism or ethical approval (64) could allow the flooding of the scientific literature with AI-generated articles on an unprecedented scale. The consequences for the integrity of the scientific process and the credibility of the literature would be dreadful (65).


6. Discussion

Large language models have already shown hints of their potential in supporting scientific research, and in the coming months we expect a growing number of papers discussing the use of ChatGPT in this field.

The accessibility and astonishing abilities of ChatGPT have made it popular across the world and allowed it to achieve a milestone, taking AI conversational tools to the next level.

But soon after its release, possible threats emerged. ChatGPT’s ability to follow users’ instructions is a double-edged sword: on one hand, this approach makes it great at interacting with humans; on the other hand, being submissive ab origine exposes it to misuse, for example by generating convincing human-like misinformation.

The field of medical research may be a great source for both opportunities and threats coming from this novel approach.

Given that the scientific community has not yet determined the principles to follow for a helpful and safe use of this disruptive technology, the risks coming from the fraudulent and unethical use of LLMs in the health context cannot be ignored and should be assessed with a proactive approach.

We define the novel concept of “AI-driven infodemic”: a public health threat arising from the use of LLMs to produce vast amounts of scientific articles, fake news, and misinformative content. The “AI-driven infodemic” is a consequence of LLMs’ ability to write large amounts of human-like text in a short period of time, not only with malicious intent but, more generally, without any scientific grounding and support. Beyond text-based content, other AI tools, such as generative adversarial networks, can also generate audio and video deepfakes that could be used to disseminate misinformation, especially on social media (66). Political deepfakes have already contributed to generalized indeterminacy and disinformation (67).

Reinforcement Learning 5: Function Approximation and Deep Reinforcement Learning

To address this public health threat, it is important to raise awareness and rapidly develop policies through a multidisciplinary effort, updating the current WHO public health research agenda for managing infodemics (68). There is a need for policy action to ensure that the benefits of LLMs are not outweighed by the risks they pose. In this context, we propose the detectable-by-design approach, which involves building LLMs with features that make it easier to detect when they are being used to produce fake news or scientific articles. However, implementing this approach could slow down the development of LLMs, and for this reason it might not be readily accepted by AI companies.

The constitution of groups of experts inside international health agencies (e.g., WHO, ECDC) dedicated to monitoring the use of LLMs for the production of fake news and scientific articles is needed, as the scenario is rapidly evolving and the AI-driven infodemic threat is forthcoming. Such groups could work closely with AI companies to develop effective strategies for detecting and preventing the use of LLMs for nefarious purposes. Additionally, there might be a need for greater regulation and oversight of the AI industry to ensure that LLMs are developed and used responsibly. Recently, the President of the Italian Data Protection Authority (DPA) took action against OpenAI for serious breaches of European legislation on personal data processing and protection (69). The DPA imposed a temporary ban on ChatGPT in Italy due to the company’s failure to provide adequate privacy information to its users and its lack of a suitable legal basis for data collection. The absence of a suitable legal basis for data collection raises serious concerns about the ethical implications of using personal data without consent or an adequate legal framework.

In the WHO agenda, AI is considered a possible ally in fighting infodemics, allowing automatic monitoring for misinformation detection; but the rise of LLMs, and in particular ChatGPT, should raise concerns that AI could play the opposite role in this phenomenon.

LLMs will continue to improve and rapidly become precious allies for researchers, but the scientific community needs to ensure that the advances made possible by ChatGPT and other AI technologies are not overshadowed by the risks they pose. All stakeholders should foster the development and deployment of these technologies in alignment with the values and interests of society. It is crucial to increase understanding of the AI challenges of transparency, accountability, and fairness in order to develop effective policies. A science-driven debate to develop shared principles and legislation is necessary to shape a future in which AI has a positive impact on public health; not having such a conversation could result in a dangerous AI-fueled future (70).

Lecture 10.1: Fusion, co-learning, and new trend (Multimodal Machine Learning, CMU)

Introducing PaLM 2

When you look back at the biggest breakthroughs in AI over the last decade, Google has been at the forefront of so many of them. Our groundbreaking work in foundation models has become the bedrock for the industry and the AI-powered products that billions of people use daily. As we continue to responsibly advance these technologies, there’s great potential for transformational uses in areas as far-reaching as healthcare and human creativity.

Over the past decade of developing AI, we’ve learned that so much is possible as you scale up neural networks — in fact, we’ve already seen surprising and delightful capabilities emerge from larger sized models. But we’ve learned through our research that it’s not as simple as “bigger is better,” and that research creativity is key to building great models. More recent advances in how we architect and train models have taught us how to unlock multimodality, the importance of having human feedback in the loop, and how to build models more efficiently than ever. These are powerful building blocks as we continue to advance the state of the art in AI while building models that can bring real benefit to people in their daily lives.

Introducing PaLM 2

Building on this work, today we’re introducing PaLM 2, our next generation language model. PaLM 2 is a state-of-the-art language model with improved multilingual, reasoning and coding capabilities.

Multilinguality: PaLM 2 is more heavily trained on multilingual text, spanning more than 100 languages. This has significantly improved its ability to understand, generate and translate nuanced text — including idioms, poems and riddles — across a wide variety of languages, a hard problem to solve. PaLM 2 also passes advanced language proficiency exams at the “mastery” level.

Reasoning: PaLM 2’s wide-ranging dataset includes scientific papers and web pages that contain mathematical expressions. As a result, it demonstrates improved capabilities in logic, common sense reasoning, and mathematics.

Coding: PaLM 2 was pre-trained on a large quantity of publicly available source code datasets. This means that it excels at popular programming languages like Python and JavaScript, but can also generate specialized code in languages like Prolog, Fortran and Verilog.

A versatile family of models

Even as PaLM 2 is more capable, it’s also faster and more efficient than previous models — and it comes in a variety of sizes, which makes it easy to deploy for a wide range of use cases. We’ll be making PaLM 2 available in four sizes from smallest to largest: Gecko, Otter, Bison and Unicorn. Gecko is so lightweight that it can work on mobile devices and is fast enough for great interactive applications on-device, even when offline. This versatility means PaLM 2 can be fine-tuned to support entire classes of products in more ways, to help more people.

Powering over 25 Google products and features

At I/O today, we announced over 25 new products and features powered by PaLM 2. That means that PaLM 2 is bringing the latest in advanced AI capabilities directly into our products and to people — including consumers, developers, and enterprises of all sizes around the world. Here are some examples:

PaLM 2’s improved multilingual capabilities are allowing us to expand Bard to new languages, starting today. Plus, it’s powering our recently announced coding update.

Workspace features to help you write in Gmail and Google Docs, and help you organize in Google Sheets are all tapping into the capabilities of PaLM 2 at a speed that helps people get work done better, and faster.

Med-PaLM 2, trained by our health research teams with medical knowledge, can answer questions and summarize insights from a variety of dense medical texts. It achieves state-of-the-art results in medical competency, and was the first large language model to perform at “expert” level on U.S. Medical Licensing Exam-style questions. We're now adding multimodal capabilities to synthesize information like x-rays and mammograms to one day improve patient outcomes. Med-PaLM 2 will open up to a small group of Cloud customers for feedback later this summer to identify safe, helpful use cases.

Sec-PaLM is a specialized version of PaLM 2 trained on security use cases, and a potential leap for cybersecurity analysis. Available through Google Cloud, it uses AI to help analyze and explain the behavior of potentially malicious scripts, and better detect which scripts are actually threats to people and organizations in unprecedented time.

Since March, we've been previewing the PaLM API with a small group of developers. Starting today, developers can sign up to use the PaLM 2 model, or customers can use the model in Vertex AI with enterprise-grade privacy, security and governance. PaLM 2 is also powering Duet AI for Google Cloud, a generative AI collaborator designed to help users learn, build and operate faster than ever before.

Advancing the future of AI

PaLM 2 shows us the impact of highly capable models of various sizes and speeds — and that versatile AI models reap real benefits for everyone. Yet just as we’re committed to releasing the most helpful and responsible AI tools today, we’re also working to create the best foundation models yet for Google.

Our Brain and DeepMind research teams have achieved many defining moments in AI over the last decade, and we’re bringing together these two world-class teams into a single unit, to continue to accelerate our progress. Google DeepMind, backed by the computational resources of Google, will not only bring incredible new capabilities to the products you use every day, but responsibly pave the way for the next generation of AI models.

We’re already at work on Gemini — our next model created from the ground up to be multimodal, highly efficient at tool and API integrations, and built to enable future innovations, like memory and planning. Gemini is still in training, but it’s already exhibiting multimodal capabilities never before seen in prior models. Once fine-tuned and rigorously tested for safety, Gemini will be available at various sizes and capabilities, just like PaLM 2, to ensure it can be deployed across different products, applications, and devices for everyone’s benefit.

Massive Update of Chat GPT! - Artificial Intelligence

PaLM-E: An embodied multimodal language model

Recent years have seen tremendous advances across machine learning domains, from models that can explain jokes or answer visual questions in a variety of languages to those that can produce images based on text descriptions. Such innovations have been possible due to the increase in availability of large scale datasets along with novel advances that enable the training of models on these data. While scaling of robotics models has seen some success, it is outpaced by other domains due to a lack of datasets available on a scale comparable to large text corpora or image datasets.

Today we introduce PaLM-E, a new generalist robotics model that overcomes these issues by transferring knowledge from varied visual and language domains to a robotics system. We began with PaLM, a powerful large language model, and “embodied” it (the “E” in PaLM-E), by complementing it with sensor data from the robotic agent. This is the key difference from prior efforts to bring large language models to robotics — rather than relying on only textual input, with PaLM-E we train the language model to directly ingest raw streams of robot sensor data. The resulting model not only enables highly effective robot learning, but is also a state-of-the-art general-purpose visual-language model, while maintaining excellent language-only task capabilities.

How ChatGPT is Trained

An embodied language model, and also a visual-language generalist

On the one hand, PaLM-E was primarily developed to be a model for robotics, and it solves a variety of tasks on multiple types of robots and for multiple modalities (images, robot states, and neural scene representations). At the same time, PaLM-E is a generally-capable vision-and-language model. It can perform visual tasks, such as describing images, detecting objects, or classifying scenes, and is also proficient at language tasks, like quoting poetry, solving math equations or generating code.

PaLM-E combines our most recent large language model, PaLM, together with one of our most advanced vision models, ViT-22B. The largest instantiation of this approach, built on PaLM-540B, is called PaLM-E-562B and sets a new state of the art on the visual-language OK-VQA benchmark, without task-specific fine-tuning, and while retaining essentially the same general language performance as PaLM-540B.

How does PaLM-E work?

Technically, PaLM-E works by injecting observations into a pre-trained language model. This is realized by transforming sensor data, e.g., images, into a representation through a procedure that is comparable to how words of natural language are processed by a language model.

Language models rely on a mechanism to represent text mathematically in a way that neural networks can process. This is achieved by first splitting the text into so-called tokens that encode (sub)words, each of which is associated with a high-dimensional vector of numbers, the token embedding. The language model is then able to apply mathematical operations (e.g., matrix multiplication) on the resulting sequence of vectors to predict the next, most likely word token. By feeding the newly predicted word back to the input, the language model can iteratively generate a longer and longer text.
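A concrete, minimal illustration of this mechanism, using GPT-2 as a small stand-in model (the prompt and model choice are ours, for demonstration only):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "A robot asked to fetch a bag of chips should first"
ids = tokenizer(prompt, return_tensors="pt").input_ids        # (1, seq_len) token ids
embeddings = model.get_input_embeddings()(ids)                # (1, seq_len, 768) token embedding vectors

with torch.no_grad():
    logits = model(ids).logits                                # a score for every vocabulary token
next_id = int(logits[0, -1].argmax())                         # most likely next token
print(tokenizer.decode([next_id]))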

The inputs to PaLM-E are text and other modalities — images, robot states, scene embeddings, etc. — in an arbitrary order, which we call "multimodal sentences". For example, an input might look like, "What happened between <img_1> and <img_2>?", where <img_1> and <img_2> are two images. The output is text generated auto-regressively by PaLM-E, which could be an answer to a question, or a sequence of decisions in text form.

PaLM-E model architecture, showing how PaLM-E ingests different modalities (states and/or images) and addresses tasks through multimodal language modeling.

The idea of PaLM-E is to train encoders that convert a variety of inputs into the same space as the natural word token embeddings. These continuous inputs are mapped into something that resembles "words" (although they do not necessarily form discrete sets). Since both the word and image embeddings now have the same dimensionality, they can be fed into the language model.

We initialize PaLM-E for training with pre-trained models for both the language (PaLM) and vision components (Vision Transformer, a.k.a. ViT). All parameters of the model can be updated during training.
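The following toy sketch (ours, not Google’s implementation) shows the core mechanism: a learned linear encoder projects continuous sensor features, standing in for ViT image embeddings, into the language model’s token-embedding dimension, and the resulting “image tokens” are spliced into the sequence of word embeddings. The dimensions are assumptions chosen for illustration.

import torch
import torch.nn as nn

d_vision, d_model = 1024, 768          # assumed ViT feature size and LM embedding size

class SensorProjector(nn.Module):
    """Maps continuous sensor features into the LM's token-embedding space."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.proj = nn.Linear(d_in, d_out)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.proj(features)             # (n_patches, d_model)

projector = SensorProjector(d_vision, d_model)
image_features = torch.randn(16, d_vision)      # stand-in for ViT patch embeddings of <img_1>
image_tokens = projector(image_features)

word_embeddings = torch.randn(10, d_model)      # stand-in for embedded text tokens
multimodal_sentence = torch.cat(
    [word_embeddings[:4],                       # "What happened between"
     image_tokens,                              # <img_1>
     word_embeddings[4:]],                      # "and ... ?"
    dim=0)
print(multimodal_sentence.shape)                # the sequence that would be fed to the LM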

Transferring knowledge from large-scale training to robots

PaLM-E offers a new paradigm for training a generalist model, which is achieved by framing robot tasks and vision-language tasks together through a common representation: taking images and text as input, and outputting text. A key result is that PaLM-E attains significant positive knowledge transfer from both the vision and language domains, improving the effectiveness of robot learning.

Positive transfer of knowledge from general vision-language tasks results in more effective robot learning, shown for three different robot embodiments and domains.

Results show that PaLM-E can address a large set of robotics, vision and language tasks simultaneously without performance degradation compared to training individual models on individual tasks. Further, the visual-language data actually significantly improves the performance of the robot tasks. This transfer enables PaLM-E to learn robotics tasks efficiently in terms of the number of examples it requires to solve a task.

An introduction to Policy Gradient methods - Deep Reinforcement Learning

Results

We evaluate PaLM-E on three robotic environments, two of which involve real robots, as well as general vision-language tasks such as visual question answering (VQA), image captioning, and general language tasks. When PaLM-E is tasked with making decisions on a robot, we pair it with a low-level language-to-action policy to translate text into low-level robot actions.

In the first example below, a person asks a mobile robot to bring a bag of chips to them. To successfully complete the task, PaLM-E produces a plan to find the drawer and open it and then responds to changes in the world by updating its plan as it executes the task. In the second example, the robot is asked to grab a green block. Even though the block has not been seen by that robot, PaLM-E still generates a step-by-step plan that generalizes beyond the training data of that robot. 

PaLM-E controls a mobile robot operating in a kitchen environment. Left: The task is to get a chip bag. PaLM-E shows robustness against adversarial disturbances, such as putting the chip bag back into the drawer. Right: The final steps of executing a plan to retrieve a previously unseen block (green star). This capability is facilitated by transfer learning from the vision and language models.

In the second environment below, the same PaLM-E model solves very long-horizon, precise tasks, such as “sort the blocks by colors into corners,” on a different type of robot. It directly looks at the images and produces a sequence of shorter textually-represented actions — e.g., “Push the blue cube to the bottom right corner,” “Push the blue triangle there too.” — long-horizon tasks that were out of scope for autonomous completion, even in our own most recent models. We also demonstrate the ability to generalize to new tasks not seen during training time (zero-shot generalization), such as pushing red blocks to the coffee cup. 

PaLM-E controlling a tabletop robot to successfully complete long-horizon tasks.

The third robot environment is inspired by the field of task and motion planning (TAMP), which studies combinatorially challenging planning tasks (rearranging objects) that confront the robot with a very high number of possible action sequences. We show that with a modest amount of training data from an expert TAMP planner, PaLM-E is not only able to also solve these tasks, but it also leverages visual and language knowledge transfer in order to more effectively do so.

Deep RL Bootcamp Frontiers Lecture I: Recent Advances, Frontiers and Future of Deep RL


PaLM-E produces plans for a task and motion planning environment.

As a visual-language generalist, PaLM-E is a competitive model, even compared with the best vision-language-only models, including Flamingo and PaLI. In particular, PaLM-E-562B achieves the highest number ever reported on the challenging OK-VQA dataset, which requires not only visual understanding but also external knowledge of the world. Further, this result is reached with a generalist model, without fine-tuning specifically on only that task.

PaLM-E exhibits capabilities like visual chain-of-thought reasoning, in which the model breaks its answering process down into smaller steps, an ability that has so far only been demonstrated in the language-only domain. The model also demonstrates the ability to perform inference on multiple images despite being trained only on single-image prompts.

Conclusion

PaLM-E pushes the boundaries of how generally-capable models can be trained to simultaneously address vision, language and robotics while also being capable of transferring knowledge from vision and language to the robotics domain. There are additional topics investigated in further detail in the paper, such as how to leverage neural scene representations with PaLM-E and also the extent to which PaLM-E, with greater model scale, experiences less catastrophic forgetting of its language capabilities.

PaLM-E not only provides a path towards building more capable robots that benefit from other data sources, but might also be a key enabler to other broader applications using multimodal learning, including the ability to unify tasks that have so far seemed separate.

More Information:

https://wiki.pathmind.com/deep-reinforcement-learning

https://magazine.sebastianraschka.com/p/ahead-of-ai-7-large-language-models

https://primo.ai/index.php?title=Large_Language_Model_%28LLM%29

Data Processing Unit - How it Helps in Exascale Computing


Data Processing Unit: What the Heck is it?

Are you an IT Professional or technology enthusiast looking to learn more about data processing units (DPU)? Then this blog is for you!

In it, we’ll explore what a DPU is, how it is used in data processing, and its importance in the tech world. Learn more to stay abreast of the latest advancements in data processing technology.

WHAT IS A DATA PROCESSING UNIT (DPU)?

A DPU, or data processing unit, is a new class of programmable processor that combines an industry-standard, high-performance, software-programmable multi-core CPU, a high-performance network interface, and flexible and programmable acceleration engines. It is designed to move data around the data center and do data processing. DPUs are a new pillar of computing that joins CPUs and GPUs as one of the three processing units used in data centers.

The Data Processing Unit

WHAT IS THE DIFFERENCE BETWEEN A CPU, GPU AND A DPU?

A CPU, GPU, and DPU are all processing units used in data centers, but they differ in their functions.

A CPU is the central processing unit that executes instructions encompassing a computer program. A GPU is for accelerated computing and performs parallel operations rather than serial operations.

A DPU moves data around the data center and does data processing. As noted above, it is a new class of programmable processor that combines a software-programmable multi-core CPU, a high-performance network interface, and flexible, programmable acceleration engines. Here are the differences between them:

CPU:

Executes instructions encompassing a computer program

Flexible and responsive

The most ubiquitous processor

GPU:

Performs parallel operations rather than serial operations

Intended to process computer graphics but can handle other types of complex workloads such as supercomputing, AI, machine learning

Accelerates computing

DPU:

Moves data around the data center

Does data processing

Combines an industry-standard, high-performance, software-programmable multi-core CPU, a high-performance network interface, and flexible and programmable acceleration engines

Introduction to Developing Applications with NVIDIA DOCA on BlueField DPUs

WHAT ARE THE BENEFITS OF DPUs?

Data Processing Units (DPUs) offer numerous benefits that can elevate computing systems to perform at optimal levels and effectively manage diverse workloads. DPUs enhance system efficiency, scalability, security, and flexibility.

DPUs help drive a more efficient and accelerated data processing pace across the entire network.

They improve workload management by offloading CPU-intensive tasks to allow CPUs to handle other workloads.

DPUs enable faster data transfers as they alleviate network congestion while reducing latency.

DPUs enhance security by isolating essential services, improving threat detection, and isolation in case of a breach.

They offer greater flexibility in designing custom networks that cater to individual computing requirements due to their programmable nature.

DPUs accelerate machine learning operations and GPU-based analytics while reducing the overall costs of such operations through increased efficiency.

Moreover, DPUs eliminate network bottlenecks and improve data center performance reliability through distributed processing. By leveraging this advanced technology, corporations can realize digital transformation by executing critical workload management with remarkable innovation and effectiveness.

NOTE

A true fact: According to data released in July 2020 by Market Research Future (MRFR), the global DPU market is expected to grow at an outstanding rate during the forecast period of 2020–2027.

Why wait for the future when you can process data like a pro today, thanks to Data Processing Units (DPUs)!

WHAT ARE SOME POTENTIAL DRAWBACKS OR LIMITATIONS OF USING DPUs IN A DATA CENTER ENVIRONMENT?

DPUs have many benefits in modern data centers, but there are also some potential drawbacks and limitations to consider. Some of these include:

Cost: DPUs can be expensive, and organizations may need to invest in new hardware to support them. This can be a significant upfront cost that may not be feasible for all organizations.

Integration: DPUs need to be integrated into the existing data center infrastructure, which can be a complex process. This may require additional resources and expertise to ensure a smooth integration.

Compatibility: Not all applications and workloads are compatible with DPUs. Organizations need to carefully evaluate their use cases and determine whether DPUs are the right fit for their needs.

Security: While DPUs can improve security by offloading security tasks from the main CPU, they can also introduce new security risks if not properly configured and managed.

Scalability: DPUs can help improve scalability by offloading network and communication workloads from the CPU, but they may not be able to keep up with rapidly growing workloads in all cases.

Overall, while DPUs offer many benefits in terms of performance, security, and scalability, organizations need to carefully evaluate their use cases and consider the potential drawbacks before investing in this technology.

From Petascale to Exascale Computing


WHAT ARE SOME POTENTIAL SECURITY RISKS ASSOCIATED WITH USING DPUs?

DPUs can improve security by offloading security tasks from the main CPU, but they can also introduce new security risks if not properly configured and managed. Some potential security risks associated with using DPUs are:

Misconfiguration: DPUs need to be properly configured to ensure that they are secure. Misconfiguration can lead to vulnerabilities that can be exploited by attackers.

Lack of visibility: DPUs can make it more difficult to monitor and detect security threats because they operate independently of the main CPU. This can make it harder to identify and respond to security incidents.

Attack surface: DPUs can increase the attack surface of a system because they introduce new hardware and software components that need to be secured. This can make it more difficult to ensure that all components are secure.

Compatibility issues: Not all security tools and applications are compatible with DPUs. This can limit the ability of organizations to use their existing security tools and may require them to invest in new tools that are compatible with DPUs.

Complexity: DPUs can add complexity to a system, which can make it more difficult to manage and secure. This can increase the risk of human error and make it harder to ensure that all components are properly secured.

WHEN WILL DATA PROCESSING UNITS BE AVAILABLE?

Data processing units (DPUs) will be available in the market in the coming months. DPUs are programmable processors specifically designed to process large amounts of data efficiently. Due to their high processing power, they are increasingly being used in data centers, cloud computing, and AI applications to accelerate data transfer and improve overall system performance.

DPUs are becoming an essential component for companies managing large amounts of data due to their ability to handle a diverse range of workloads. As such, many technology firms have started developing their own DPUs.

However, industry experts predict that there will be a shortage of DPUs when they initially launch on the market. This shortage can be attributed to the ongoing semiconductor chip shortage that has impacted various industries worldwide. It is expected that by 2022, the availability of DPUs will increase as more semiconductor foundries produce these chips.

NOTE

According to a report by MarketsandMarkets™, the global DPU market size is projected to grow from USD 1.5 billion in 2020 to USD 4.9 billion by 2025 at a CAGR of 26.7%. This growth is fueled by increased demand for advanced computing capabilities driven by cloud computing adoption, shift towards AI workloads, and the rapid growth of Big Data analytics.

Ready to upgrade your data game? Here’s how to get your hands on the ultimate processing power tool.

DPUs, or Data Processing Units, are specialized processors designed to offload networking and security tasks from the main CPU of a computer or server. They are a new class of reprogrammable high-performance processors combined with high-performance network interfaces and flexible and programmable acceleration engines.

DPUs can improve performance, reduce latency in network applications, and support virtualization and containerization. They can also help organizations create architectures to support next-gen reliability, performance, and security requirements. DPUs are expected to play an increasingly important role in modern data centers and cloud computing environments as the demand for high-performance, secure, and scalable networks continues to grow.

While DPUs offer many benefits, they can also introduce new security risks if not properly configured and managed. Overall, DPUs are a promising technology that can help organizations optimize their network performance and security while reducing costs.

2021 ECP Annual Meeting: ADIOS Storage and in situ I/O


FREQUENTLY ASKED QUESTIONS

Q: What is a Data Processing Unit (DPU)?

A: A Data Processing Unit (DPU) is a specialized hardware component used for accelerating and optimizing data processing tasks in computer systems.

Q: How does a DPU work?

A: A DPU works by offloading compute-intensive and data-intensive tasks from the CPU to a specialized hardware accelerator, resulting in increased performance and efficiency.

Q: What are some common use cases of DPUs?

A: DPUs are commonly used in data center applications, including big data processing, machine learning, and artificial intelligence. They can also be found in edge computing devices, such as routers and gateways.

Q: What are the benefits of using a DPU?

A: The benefits of using a DPU include increased performance, reduced latency, improved energy efficiency, and enhanced security.

Q: Can a DPU be added to an existing computer system?

A: Yes, it is possible to add a DPU to an existing computer system, provided that the system is compatible with the DPU and has the necessary hardware expansion slots.

Q: What are some examples of DPUs on the market?

A: Examples of DPUs and closely related SmartNIC products on the market include NVIDIA’s BlueField DPUs, Intel’s FPGA-based SmartNICs, and Xilinx’s Alveo SmartNICs.

What Is A DPU (Data Processing Unit)? 

Data processing units, commonly known as DPUs, are a new class of reprogrammable high-performance processors combined with high-performance network interfaces that are optimized to perform and accelerate network and storage functions carried out by data center servers. DPUs plug into a server’s PCIe slot just as a GPU would, and they allow servers to offload network and storage functions from the CPU to the DPU, allowing the CPU to focus only on running the operating systems and system applications. DPUs often use a reprogrammable FPGA combined with a network interface card to accelerate network traffic, the same way that GPUs are being used to accelerate artificial intelligence (AI) applications by offloading mathematical operations from the CPU to the GPU. GPUs were originally designed to deliver rich, real-time graphics, but because they can process large amounts of data in parallel, they are also ideal for accelerating AI workloads such as machine learning and deep learning.

DPU Accelerated Servers will become extremely popular in the future thanks to their ability to offload network functions from the CPU to the DPU, freeing up precious CPU processing power, allowing the CPU to run more applications, and run the operating system as efficiently as possible without being bogged down by handling network activities. In fact, some experts claim that 30% of CPU processing power goes towards handling network and storage functions. Offloading storage and network functions to the DPU frees up precious CPU processing power for functions such as virtual or containerized workloads. Additionally, DPUs can be used to handle functions that include network security, firewall tasks, encryption, and infrastructure management. 

DPUs will become the third component in data center servers along with CPU (central processing units) and GPUs (graphics processing units) because of their ability to accelerate and perform network and storage functions. The CPU would be used for general-purpose computing. The GPU would be used to accelerate artificial intelligence applications. The DPU in a DPU equipped server would be used to process data and move data around the data center. 

Overall, DPUs have a bright future thanks to the ever-increasing amount of data stored in data centers, requiring a solution that can accelerate storage and networking functions performed by high-performance data center servers. DPUs can breathe new life into existing servers because they can reduce the CPU utilization of servers by offloading network and storage functions to the DPU. Estimates indicate that 30% of CPU utilization goes towards networking functions, so moving them to the DPU will provide you with extra CPU processing power. Thus, DPUs can extend the life of your servers for months or even years, depending on how much of your system’s resources are being used for network functions. 

WHAT IS A DPU?! | Server Factory Explains

What Are The Components Of A DPU? 

A DPU is a system on a chip that is made from three primary elements. First, data processing units typically have a multi-core CPU that is software programmable. The second element is a high-performance network interface that enables the DPU to parse, process, and efficiently move data through the network. The third element is a rich set of flexible, programmable acceleration engines that offload network and storage functions from the CPU to the DPU. DPUs are often integrated with smart NICs offering powerful network data processing. 

Nvidia is leading the way when it comes to DPUs, recently releasing the Nvidia Bluefield 2 DPU, which is the world’s first data infrastructure on chip architecture, optimized for modern data centers. The Bluefield 2 DPU allows data center servers to offload network and storage functions from the CPU to the DPU, allowing the DPU to handle mundane storage and network functions. 

Nvidia DPUs are accessible through the DOCA SDK, enabling a programmable API for DPU hardware. DOCA enables organizations to program DPUs to accelerate data processing for moving data in and out of servers, virtual machines, and containers. DPUs accelerate network functions and handle east-west traffic associated with VMs and containers and north-south traffic flowing in and out of data centers. That said, where DPUs shine is in moving data within a data center because they are optimized for data movement.  

Furthermore, Nvidia states that DPUs are capable of offloading and accelerating all data center security services. This is so because they include next-generation firewalls, micro-segmentation, data encryption capabilities, and intrusion detection. In the past, security was handled by software utilizing x86 CPUs; however, security can be offloaded to DPUs, freeing up CPU resources for other tasks. 

2021 ECP Annual Meeting - ADIOS User's BOF

What Are The Most Common Features Of DPUs? 

DPUs have a ton of features, but here are the most common features that are found on DPUs: 

  • High-speed connectivity via one or multiple 100 Gigabit to 200 Gigabit interfaces 
  • High-speed packet processing 
  • Multi-core processing via Arm- or MIPS-based CPUs (for example, eight 64-bit Arm cores)
  • Memory controllers offering support for DDR4 and DDR5 RAM 
  • Accelerators 
  • PCI Express Gen 4 Support 
  • Security features 
  • Custom operating system separated from the host system’s OS 

What Are Some Of The Most Common DPU Solutions? 

Nvidia has released two DPUs, the Nvidia Mellanox BlueField-2 DPU and the BlueField-2X DPU. The BlueField-2X has everything the BlueField-2 has, plus an additional Ampere GPU that enables artificial intelligence functionality on the DPU itself. Nvidia included a GPU on its DPU to handle security, network, and storage management; for example, machine learning or deep learning can run on the data processing unit and be used to identify and stop an attempted network breach. Furthermore, Nvidia has stated that it intends to launch BlueField-3 in 2022 and BlueField-4 in 2023.

Companies such as Intel and Xilinx are also introducing DPUs into the space, although some of their offerings are marketed as SmartNICs. SmartNICs from Xilinx and Intel utilize FPGAs to accelerate network and storage functions and work much like data processing units: they offload network and storage functions from the CPU to the SmartNIC, freeing up processing power by intelligently delegating that work to the card. Because FPGAs are reprogrammable, they bring parallelism and customization to the data path.

For example, Xilinx offers the ALVEO series of SmartNICs with various products, and Intel and its partners offer several FPGA-based SmartNIC solutions to accelerate data processing workloads in large data centers. Intel claims that its SmartNICs “boost data center performance levels by offloading switching, storage, and security functionality onto a single PCIe platform that combines both Intel FPGAs and Intel Xeon Processors.” A newer SmartNIC option, the Silicom FPGA SmartNIC N5010, combines an Intel Stratix 10 FPGA with an Intel Ethernet 800 Series Adapter, providing organizations with four 100 Gigabit Ethernet ports and plenty of bandwidth for data centers.

The U.S. Exascale Computing Project

Why Are DPUs Increasing In Popularity? 

We live in a digital information age in which enormous volumes of data are generated daily, especially as IoT devices, autonomous vehicles, connected homes, and connected workplaces come online and saturate data centers with data. There is therefore a need for solutions that help data centers cope with the ever-increasing amount of data moving into, out of, and through them.

DPUs contain a data movement system that accelerates data movement and processing, offloading networking functions from the server's processor to the DPU. They are a great way to extract more processing power from a server, especially now that Moore's Law has slowed, pushing organizations toward hardware accelerators. Because more performance can be extracted from existing hardware, a server can run more application workloads, which reduces an organization's total cost of ownership.

Data processing units and FPGA SmartNICs are gaining popularity, with Microsoft and Google exploring bringing them to their data centers to accelerate data processing and artificial intelligence workloads. Moreover, Nvidia has partnered with VMware to offload networking, security, and storage tasks to the DPU. 

What Are Some Other Performance Accelerators? 

We will now discuss some of the other performance accelerators that are often used in data centers: GPUs (graphics processing units), computational storage, and FPGAs (field-programmable gate arrays).

1. Graphics Processing Units (GPUs) 

Graphics processing units are often deployed in high-performance data center servers to accelerate workloads. A server will often offload complicated mathematical calculations to the GPU because the GPU can perform them faster: GPUs employ a parallel architecture built from many smaller cores, enabling them to handle many tasks at once and letting organizations extract more performance from their servers.

Source Credit (Nvidia) 

For example, the average CPU has anywhere between four to ten cores, while GPUs have hundreds or thousands of smaller cores that operate together to tackle complex calculations in parallel. As such, GPUs are different from CPUs, which have fewer cores and are more suitable for sequential data processing. GPU accelerated servers are great for high-resolution video editing, medical imaging, artificial intelligence, machine learning training, and deep learning training. 

GPUs installed in data center servers are great for accelerating deep learning and machine learning training, which require computational power that CPUs simply do not offer. GPUs perform artificial intelligence tasks more quickly than CPUs because they are equipped with HBM (high-bandwidth memory) and hundreds or thousands of cores that can perform floating-point arithmetic significantly faster than traditional CPUs.

For these reasons, organizations use GPUs to train deep learning and machine learning models. The larger the data set and the larger the neural network, the more likely an organization will need a GPU to accelerate the workload. Although CPUs can perform deep learning and machine learning training, they take far longer to complete complex computations: a training run that finishes in a few hours on a GPU may take days or even weeks on a CPU alone.

Moreover, adding GPUs to data center servers provides significantly better data throughput and offers the ability to process and analyze data with as little latency as possible. Latency refers to the amount of time required to complete a given task, and data throughput refers to the number of tasks completed per unit of time. 
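
A tiny numerical sketch may help keep the two terms apart; the batch size and timing below are made-up values, not benchmarks.

```python
# Latency: how long one task takes. Throughput: how many tasks finish per second.
# A GPU batching many requests can raise throughput even if per-request latency
# stays roughly the same.
batch_size = 1_000        # hypothetical requests processed together on a GPU
batch_time_s = 0.050      # hypothetical time to process the whole batch

latency_s = batch_time_s                    # each request waits ~50 ms for its batch
throughput = batch_size / batch_time_s      # 20,000 requests per second

print(f"latency: {latency_s * 1000:.0f} ms, throughput: {throughput:,.0f} req/s")
```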

2. Computational Storage Devices (CSD) 

Computational storage has made its way into data centers as a performance accelerator. Computational storage processes data at the storage device level, reducing the amount of data that must move between the CPU and the storage device. It enables real-time data analysis and improves a system's performance by reducing input/output bottlenecks. Computational storage devices look the same as regular storage devices, but they include a multi-core processor that performs functions such as indexing data as it enters the device and searching the device for specific entries.

Source Credit (AnandTech)

Computational storage devices are increasing in popularity due to the growing need to process and analyze data in real-time. Real-time data processing and analysis is possible because the data no longer has to move between the storage device and the CPU. Instead, the data is processed on the storage device itself. Bringing compute power to storage media at the exact location where the data is located enables real-time analysis and decision making.  
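
The idea can be sketched with a toy model in Python (a conceptual illustration, not any vendor's firmware): the drive indexes records as they arrive and answers queries locally, so only the matching rows cross the bus back to the host.

```python
# Toy computational-storage model: index on write, filter on the drive, return only matches.
class ComputationalDrive:
    def __init__(self):
        self.blocks = []    # raw stored records
        self.index = {}     # key value -> list of block numbers

    def write(self, record: dict):
        self.blocks.append(record)
        self.index.setdefault(record.get("key"), []).append(len(self.blocks) - 1)

    def query(self, key):
        # Runs on the drive's embedded cores; only matching records reach the host CPU.
        return [self.blocks[i] for i in self.index.get(key, [])]

drive = ComputationalDrive()
drive.write({"key": "sensor-7", "value": 42})
drive.write({"key": "sensor-9", "value": 17})
print(drive.query("sensor-7"))   # the host receives one record, not the whole dataset
```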

What is CPU,GPU and TPU? Understanding these 3 processing units using Artificial Neural Networks.

3. FPGA (Field Programmable Gate Array) 

Source Credit (Xilinx)

An FPGA is an integrated circuit made from logic blocks, I/O cells, and other resources that allow users to reprogram and reconfigure the chip according to the specific requirements of the workload it needs to perform. FPGAs are gaining popularity for deep learning and machine learning inference processing. Additionally, FPGA-based SmartNICs are being used for their ability to offload network and storage functions from the CPU to the SmartNIC. Network and storage functions can place a significant burden on a system's CPU, so offloading them to a SmartNIC frees up precious CPU processing power to run the OS and other critical applications. FPGA-based SmartNICs also let organizations optimize the card for the specific workload being offloaded, providing customizability that's difficult to find elsewhere.
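
At the heart of that reconfigurability is the lookup table (LUT): an FPGA logic block is "programmed" simply by loading a different truth table. The Python snippet below is only a software analogy of that idea, not FPGA tooling.

```python
# A 2-input, FPGA-style lookup table: the "program" is just the table contents,
# so the same block can be reconfigured into XOR, AND, or any other function.
def make_lut(truth_table):
    # truth_table maps an (a, b) pair of input bits to one output bit
    return lambda a, b: truth_table[(a, b)]

xor_lut = make_lut({(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0})
and_lut = make_lut({(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1})

print(xor_lut(1, 0), and_lut(1, 0))   # 1 0
```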

Bottom Line

At this point, it should come as no surprise that DPUs (data processing units) are gaining popularity in high-performance data center servers due to their ability to take over storage and network functions, allowing the processor to focus on running the operating system and revenue-generating applications. Premio offers a number of DPU servers that offload data processing, network functions, and storage functions from the CPU to the DPU. Nvidia claims that a single BlueField-2 data processing unit can handle the same data center services that would otherwise require 125 CPU cores, allowing DPU servers to work smarter, not harder. So, if you're interested in buying DPU servers, feel free to contact our DPU server professionals; they will be more than happy to help you choose or customize a solution that meets your specific requirements.

High Performance Storage at Exascale

What’s a DPU? 

Specialists in moving data in data centers, DPUs, or data processing units, are a new class of programmable processor and will join CPUs and GPUs as one of the three pillars of computing.

Of course, you’re probably already familiar with the central processing unit. Flexible and responsive, for many years CPUs were the sole programmable element in most computers.

More recently the GPU, or graphics processing unit, has taken a central role. Originally used to deliver rich, real-time graphics, their parallel processing capabilities make them ideal for accelerated computing tasks of all kinds. Thanks to these capabilities, GPUs are essential to artificial intelligence, deep learning and big data analytics applications.

Over the past decade, however, computing has broken out of the boxy confines of PCs and servers — with CPUs and GPUs powering sprawling new hyperscale data centers.

These data centers are knit together with a powerful new category of processors. The DPU has become the third member of the data-centric accelerated computing model.

“This is going to represent one of the three major pillars of computing going forward,” NVIDIA CEO Jensen Huang said during a talk earlier this month.

“The CPU is for general-purpose computing, the GPU is for accelerated computing, and the DPU, which moves data around the data center, does data processing.”

What's a DPU?

A DPU is a system on a chip that combines:

  • An industry-standard, high-performance, software-programmable multi-core CPU
  • A high-performance network interface
  • Flexible and programmable acceleration engines

CPU vs GPU vs DPU: What Makes a DPU Different?

A DPU is a new class of programmable processor: a system on a chip, or SoC, that combines three key elements:

An industry-standard, high-performance, software-programmable, multi-core CPU, typically based on the widely used Arm architecture, tightly coupled to the other SoC components.

A high-performance network interface capable of parsing, processing and efficiently transferring data at line rate, or the speed of the rest of the network, to GPUs and CPUs.

A rich set of flexible and programmable acceleration engines that offload and improve application performance for AI and machine learning, zero-trust security, telecommunications and storage, among others.

All these DPU capabilities are critical to enable an isolated, bare-metal, cloud-native computing platform that will define the next generation of cloud-scale computing.

DPU vs SmartNIC vs Exotic FPGAs A Guide to Differences and Current DPUs

DPUs Incorporated into SmartNICs

The DPU can be used as a stand-alone embedded processor. But it’s more often incorporated into a SmartNIC, a network interface controller used as a critical component in a next-generation server.

Other devices that claim to be DPUs miss significant elements of these three critical capabilities.

For example, some vendors use proprietary processors that don’t benefit from the broad Arm CPU ecosystem’s rich development and application infrastructure.

Others claim to have DPUs but make the mistake of focusing solely on the embedded CPU to perform data path processing.

A Focus on Data Processing

That approach isn’t competitive and doesn’t scale, because trying to beat the traditional x86 CPU with a brute force performance attack is a losing battle. If 100 Gigabit/sec packet processing brings an x86 to its knees, why would an embedded CPU perform better?

Instead, the network interface needs to be powerful and flexible enough to handle all network data path processing. The embedded CPU should be used for control path initialization and exception processing, nothing more.

At a minimum, there are 10 capabilities the network data path acceleration engines need to be able to deliver:

  1. Data packet parsing, matching and manipulation to implement an open virtual switch (OVS)
  2. RDMA data transport acceleration for Zero Touch RoCE
  3. GPUDirect accelerators to bypass the CPU and feed networked data directly to GPUs (both from storage and from other GPUs)
  4. TCP acceleration including RSS, LRO, checksum, etc.
  5. Network virtualization for VXLAN and Geneve overlays and VTEP offload
  6. Traffic shaping “packet pacing” accelerator to enable multimedia streaming, content distribution networks and the new 4K/8K Video over IP (RiverMax for ST 2110)
  7. Precision timing accelerators for telco cloud RAN such as 5T for 5G capabilities
  8. Crypto acceleration for IPSEC and TLS performed inline, so all other accelerations are still operational
  9. Virtualization support for SR-IOV, VirtIO and para-virtualization
  10. Secure Isolation: root of trust, secure boot, secure firmware upgrades, and authenticated containers and application lifecycle management

These are just 10 of the acceleration and hardware capabilities that are critical to being able to answer yes to the question: “What is a DPU?”
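
As a flavor of what capability 1 involves, here is a minimal match-action table in Python. It is only a sketch of the classification idea behind an OVS-style datapath; real DPU hardware matches on far more header fields and executes the actions in silicon. The addresses and action names are invented for illustration.

```python
# Minimal match-action classification: first matching rule wins.
FLOW_TABLE = [
    ({"dst_ip": "10.0.0.5", "dst_port": 443}, "forward:vf3"),  # specific rule
    ({"dst_ip": "10.0.0.5"},                  "forward:vf1"),  # broader rule
    ({},                                      "drop"),         # default rule
]

def classify(packet: dict) -> str:
    for match, action in FLOW_TABLE:
        if all(packet.get(field) == value for field, value in match.items()):
            return action
    return "drop"

print(classify({"src_ip": "10.0.0.9", "dst_ip": "10.0.0.5", "dst_port": 443}))  # forward:vf3
print(classify({"src_ip": "10.0.0.9", "dst_ip": "10.0.0.7", "dst_port": 80}))   # drop
```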

What is a DPU processor (Data Processing Unit)? What Use Cases ?

So what is a DPU? This is a DPU:

Pictured: a DPU, also known as a data processing unit.

Many so-called DPUs focus on delivering just one or two of these functions.

The worst try to offload the datapath in proprietary processors.

While good for prototyping, this is a fool’s errand because of the scale, scope and breadth of data centers.

Additional DPU-Related Resources

Defining the SmartNIC: What is a SmartNIC and How to Choose the Best One

Best Smart NICs for Building the Smart Cloud: PART I

Welcome to the DPU-Enabled Data Revolution Era

Accelerating Bare Metal Kubernetes Workloads, the Right Way

Mellanox Introduces Revolutionary SmartNICs for Making Secure Cloud Possible

Achieving a Cloud Scale Architecture with SmartNICs

Provision Bare-Metal Kubernetes Like a Cloud Giant!

Kernel of Truth Podcast: Demystifying the DPU and DOCA 

Learn more on the NVIDIA Technical Blog.

Securing Next Generation Apps over VMware Cloud Foundation with Bluefield-2 DPU

More Information:

https://www.hardwarezone.com.sg/tech-news-nvidias-new-bluefield-dpus-will-accelerate-data-center-infrastructure-operations

https://www.nvidia.com/en-us/networking/products/data-processing-unit/


IBM Frontiers of Science






Frontier, the world’s largest supercomputer

These are the world’s top 10 fastest supercomputers

The US has retaken the top spot in the race to build the world’s fastest supercomputer.

'Frontier' is capable of more than a billion, billion operations a second, making it the first exascale supercomputer.

Supercomputers have been used to discover more about diseases including COVID-19 and cancer.

Fun fact: there might be faster supercomputers out there whose operators didn’t submit their systems to be ranked.

A huge system called Frontier has put the US ahead in the race to build the world's fastest supercomputer. Frontier and the other speediest computers on the planet promise to transform our understanding of climate change, medicine and the sciences by processing vast amounts of data more quickly than would have been thought possible even a few years ago.

Leading the field in the TOP500 rankings, Frontier is also said to be the first exascale supercomputer. This means it is capable of more than a billion, billion operations a second (known as an Exaflop).

Frontier might be ahead, but it has plenty of rivals. Here are the 10 fastest supercomputers in the world today:

1. Frontier, the new number 1, is built by Hewlett Packard Enterprise (HPE) and housed at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA.

2. Fugaku, which previously held the top spot, is installed at the Riken Center for Computational Science in Kobe, Japan. It is three times faster than the next supercomputer in the top 10.

3. LUMI is another HPE system and the new number 3, crunching the numbers in Finland.

Frontier: The World's First Exascale Supercomputer Has Arrived

Have you read?

The Future of Jobs Report 2023

How to follow the Growth Summit 2023

4. Summit, an IBM-built supercomputer, is also at ORNL in Tennessee. Summit is used to tackle climate change, predict extreme weather and understand the genetic factors that influence opioid addiction.

5. Another US entry is Sierra, a system installed at the Lawrence Livermore National Laboratory in California, which is used for testing and maintaining the reliability of nuclear weapons.

6. China’s highest entry is the Sunway TaihuLight, a system developed by the National Research Center of Parallel Computer Engineering and Technology and installed in Wuxi, Jiangsu.

7. Perlmutter is another top 10 entry based on HPE technology.

8. Selene is a supercomputer currently running at AI multinational NVIDIA in the US.

9. Tianhe-2A is a system developed by China’s National University of Defence Technology and installed at the National Supercomputer Center in Guangzhou.

10. France’s Adastra is the second-fastest system in Europe and has been built using HPE and AMD technology.

Supercomputers are exceptionally high-performing computers able to process vast amounts of data very quickly and draw key insights from it. While a domestic or office computer might have just one central processing unit, supercomputers can contain thousands.

Put simply, they are bigger, more expensive and much faster than the humble personal computer. And Frontier - the fastest of the fast - has some impressive statistics.

Frontier, the world’s largest supercomputer, relies on a cooling system that uses 6,000 gallons of water a minute. Image: Oak Ridge National Laboratory/Hewlett Packard Enterprise

To achieve such formidable processing speeds, a supercomputer needs to be big. Each of Frontier’s 74 cabinets is as heavy as a pick-up truck, and the $600 million machine has to be cooled by 6,000 gallons of water a minute.

Celebrating 80 Years: Top-Secret Science

Developing vaccines

The speed of the latest generation of supercomputers can help solve some of the toughest global problems, playing a part in developing vaccines, testing car designs and modelling climate change.

In Japan, the Fugaku system was used to research COVID-19’s spike protein. Satoshi Matsuoka, director of Riken Center for Computational Science, says the calculations involved would have taken Fugaku’s predecessor system “days, weeks, multiple weeks”. It took Fugaku three hours.

Supercomputers are being used to support healthcare in the US, too. IBM says its systems support the search for new cancer treatments by quickly analysing huge amounts of detailed data about patients.

AMD-Powered Frontier Supercomputer Breaks the Exascale Barrier, Now Fastest in the World

AMD-powered systems now comprise five of the top ten fastest supercomputers

The AMD-powered Frontier supercomputer is now the first officially recognized exascale supercomputer in the world, topping 1.102 ExaFlop/s during a sustained Linpack run. That ranks first on the newly-released Top500 list of the world's fastest supercomputers as the number of AMD-powered systems on the list has expanded significantly this year. Frontier not only overtakes the previous leader, Japan's Fugaku, but blows it out of the water — in fact, Frontier is faster than the next seven supercomputers on the list, combined. Notably, while Frontier hit 1.1 ExaFlops during a sustained Linpack FP64 benchmark, the system delivers up to 1.69 ExaFlops in peak performance but has headroom to hit 2 ExaFlops after more tuning. For reference, one ExaFlop equals one quintillion floating point operations per second.  

Frontier also now ranks as the fastest AI system on the planet, dishing out 6.88 ExaFlops of mixed-precision performance in the HPL-AI benchmark. That equates to 68 million instructions per second for each of the 86 billion neurons in the brain, highlighting the sheer computational horsepower. It appears this system will compete for the AI leadership position with newly-announced AI-focused supercomputers powered by Nvidia's Arm-based Grace CPU Superchips.

AMD's Frontier Supercomputer Breaks the Exaflops Barrier

Additionally, the Frontier Test and Development System (Crusher) also placed first on the Green500, denoting that Frontier's architecture is now the most power-efficient supercomputing architecture in the world (the primary Frontier system ranks second on the Green500). The full system delivered 52.23 GFlops per watt while consuming 21.1 MW (megawatts) of power during the qualifying benchmark run. At peak utilization, Frontier consumes 29 MW.
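
Those efficiency figures are internally consistent: multiplying the reported efficiency by the reported power draw lands almost exactly on Frontier's Linpack score.

```python
# Sanity check: GFlops/W x watts should be close to the Linpack result.
gflops_per_watt = 52.23
power_watts = 21.1e6                              # 21.1 MW during the qualifying run
linpack_flops = gflops_per_watt * 1e9 * power_watts
print(f"{linpack_flops / 1e18:.3f} exaflops")     # ~1.102, matching the Top500 run
```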

The Frontier supercomputer's sheer scale is breathtaking, but is just one of many accomplishments for AMD in this year's Top500 list — AMD EPYC-powered systems now comprise five of the top ten supercomputers in the world, and ten of the top twenty. In fact, AMD's EPYC is now in 94 of the Top500 supercomputers in the world, marking a steady increase over the 73 systems listed in November 2021, and the 49 listed in June 2021. AMD also appears in more than half of the new systems on the list this year. As you can see in the above album, Intel CPUs still populate most systems on the Top500, while Nvidia GPUs also continue as the dominant accelerator. 

However, in terms of power efficiency, AMD reigns supreme in the latest Green500 list — the company powers the four most efficient systems in the world, and also has eight of the top ten and 17 of the top 20 spots. 

The Frontier supercomputer is built by HPE and is installed at the Department of Energy's (DOE) Oak Ridge National Laboratory (ORNL) in Tennessee. The system features 9,408 compute nodes, each with one 64-core AMD "Trento" CPU paired with 512 GB of DDR4 memory and four AMD Radeon Instinct MI250X GPUs. Those nodes are spread out among 74 HPE Cray EX cabinets, each weighing 8,000 pounds. All told, the system has 602,112 CPU cores tied to 4.6 petabytes of DDR4 memory.

Additionally, the 37,888 AMD MI250X GPUs feature 8,138,240 cores and have 4.6 petabytes of HBM memory (128GB per GPU). The CPUs and GPUs are tied together using the Ethernet-based HPE Cray Slingshot-11 networking fabric. The entire system uses direct water cooling to rein in heat, with 6,000 gallons of water moved through the system by 350-horsepower pumps; these pumps could fill an Olympic-sized swimming pool in 30 minutes. The water in the system runs at a balmy 85 degrees Fahrenheit, which helps power efficiency because the system doesn't use chillers to reduce the water temperature.

The entire system is connected to an insanely performant storage subsystem with 700 petabytes of capacity, 75 TB/s of throughput, and 15 billion IOPS of performance. A metadata tier is spread out over 480 NVMe SSDs that provide 10PB of the overall capacity, while 5,400 NVMe SSDs provide 11.5PB of capacity for the primary high-speed storage tier. Meanwhile, 47,700 PMR hard drives provide 679PB of capacity. 
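
The headline figures above hang together arithmetically. The quick Python check below uses only the numbers quoted in this article and treats the quoted gigabytes as binary (GiB) units for the rough conversion, which is an assumption.

```python
# Back-of-the-envelope checks on the Frontier figures quoted above.
nodes, cores_per_cpu = 9_408, 64
print(nodes * cores_per_cpu)              # 602,112 CPU cores, as stated

ddr4_pib = nodes * 512 / 1024**2          # 512 GiB of DDR4 per node
print(round(ddr4_pib, 1))                 # ~4.6 PiB of DDR4

gpus = 37_888                             # total MI250X count quoted above
hbm_pib = gpus * 128 / 1024**2            # 128 GiB of HBM per GPU
print(round(hbm_pib, 1))                  # ~4.6 PiB of HBM

# Storage tiers: metadata (10 PB) + NVMe (11.5 PB) + HDD (679 PB)
print(10 + 11.5 + 679)                    # ~700 PB total, as stated
```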

Assembling Frontier was a challenge unto itself, as ORNL had to source 60 million parts with 685 different part numbers to build the system. The chip shortage hit during construction, impacting 167 of those part numbers, so ORNL found itself short two million parts. AMD also ran into issues as 15 part numbers for its MI200 GPUs encountered shortages. To help circumvent the shortages, ORNL worked with the ASCR to get Defense Priorities and Allocation System (DPAS) ratings for those parts, meaning the US government invoked the Defense Production Act to procure them due to Frontier's importance to national defense.



Frontier (Image credit: ORNL)

Even though the system currently peaks at 29 MW of power, Frontier's mechanical plant can cool up to 40 MW of computational power, or the equivalent of 30,000 US homes. The plant can be expanded up to 70 MW, leaving room for future growth. 

While Frontier gets the nod for the first officially recognized exascale supercomputer in the world, China is widely thought to have two exascale supercomputers, the Tianhe-3 and OceanLight, that broke the barrier a year ago. Unfortunately, those systems haven't been submitted to the Top500 committee due to political tensions between the US and China. However, the lack of official submissions to the Top500 (a Gordon Bell submission was tendered as a proxy) has led to some doubt that these are true exascale systems, at least as measured with an FP64 workload.

For now, Frontier is officially the fastest supercomputer in the world and the first to officially break the exascale barrier. The nearly-mythical, oft-delayed Intel-powered Aurora is expected to come online later this year, or early next year, with up to 2 ExaFlops of performance, rivaling Frontier for the top spot in the supercomputing rankings.

Up next for AMD? El Capitan, a 2+ ExaFlop machine last known to be coming online in 2023. Upon completion, this Zen 4-powered supercomputer will vie with the Intel-powered Aurora for the title of the fastest supercomputer in the Top500.  

Soundbite: Lessons and Legacy - Oppenheimer and The Manhattan Project

IBM Unveils World's First 2 Nanometer Chip Technology, Opening a New Frontier for Semiconductors

New chip milestone to propel major leaps forward in performance and energy efficiency

May 6, 2021

ALBANY, N.Y., May 6, 2021 /PRNewswire/ -- IBM (NYSE: IBM) today unveiled a breakthrough in semiconductor design and process with the development of the world's first chip announced with 2 nanometer (nm) nanosheet technology. Semiconductors play critical roles in everything from computing, to appliances, to communication devices, transportation systems, and critical infrastructure.

"The IBM innovation reflected in this new 2 nm chip is essential to the entire semiconductor and IT industry."

Demand for increased chip performance and energy efficiency continues to rise, especially in the era of hybrid cloud, AI, and the Internet of Things. IBM's new 2 nm chip technology helps advance the state-of-the-art in the semiconductor industry, addressing this growing demand. It is projected to achieve 45 percent higher performance, or 75 percent lower energy use, than today's most advanced 7 nm node chips.

The potential benefits of these advanced 2 nm chips could include:

Quadrupling cell phone battery life, only requiring users to charge their devices every four days.

Slashing the carbon footprint of data centers, which account for one percent of global energy use. Changing all of their servers to 2 nm-based processors could potentially reduce that number significantly.

Drastically speeding up a laptop's functions, ranging from quicker processing in applications, to assisting in language translation more easily, to faster internet access.

Contributing to faster object detection and reaction time in autonomous vehicles like self-driving cars.

"The IBM innovation reflected in this new 2 nm chip is essential to the entire semiconductor and IT industry," said Darío Gil, SVP and Director of IBM Research. "It is the product of IBM's approach of taking on hard tech challenges and a demonstration of how breakthroughs can result from sustained investments and a collaborative R&D ecosystem approach."

IBM at the forefront of semiconductor innovation

Exacomm 2022: Frontier’s Exascale Architecture by Scott Atchley, Oak Ridge National Laboratory

This latest breakthrough builds on decades of IBM leadership in semiconductor innovation. The company's semiconductor development efforts are based at its research lab located at the Albany Nanotech Complex in Albany, NY, where IBM scientists work in close collaboration with public and private sector partners to push the boundaries of logic scaling and semiconductor capabilities.

This collaborative approach to innovation makes IBM Research Albany a world-leading ecosystem for semiconductor research and creates a strong innovation pipeline, helping to address manufacturing demands and accelerate the growth of the global chip industry.

IBM's legacy of semiconductor breakthroughs also includes the first implementation of 7 nm and 5 nm process technologies, single cell DRAM, the Dennard Scaling Laws, chemically amplified photoresists, copper interconnect wiring, Silicon on Insulator technology, multi core microprocessors, High-k gate dielectrics, embedded DRAM, and 3D chip stacking. IBM's first commercialized offering including IBM Research 7 nm advancements will debut later this year in IBM POWER10-based IBM Power Systems.

50 billion transistors on a fingernail-sized chip   

Increasing the number of transistors per chip can make them smaller, faster, more reliable, and more efficient. The 2 nm design demonstrates the advanced scaling of semiconductors using IBM's nanosheet technology. Its architecture is an industry first. Developed less than four years after IBM announced its milestone 5 nm design, this latest breakthrough will allow the 2 nm chip to fit up to 50 billion transistors on a chip the size of a fingernail.
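
Assuming a "fingernail-sized" die of roughly 150 mm² (an assumption for illustration, not an IBM specification), the implied density works out to about 333 million transistors per square millimetre:

```python
# Implied transistor density, under the assumed 150 mm^2 "fingernail" die area.
transistors = 50e9
die_area_mm2 = 150        # assumed area; the announcement does not pin this down
density = transistors / die_area_mm2
print(f"{density / 1e6:.0f} million transistors per mm^2")   # ~333
```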

More transistors on a chip also means processor designers have more options to infuse core-level innovations to improve capabilities for leading edge workloads like AI and cloud computing, as well as new pathways for hardware-enforced security and encryption. IBM is already implementing other innovative core-level enhancements in the latest generations of IBM hardware, like IBM POWER10 and IBM z15.

About IBM

IBM is a leading global hybrid cloud and AI, and business services provider, helping clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain the competitive edge in their industries. Nearly 3,000 government and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently, and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and business services deliver open and flexible options to our clients. All of this is backed by IBM's legendary commitment to trust, transparency, responsibility, inclusivity, and service.

Jack Wells, Oak Ridge National Laboratory

The Frontier supercomputer at the Department of Energy’s Oak Ridge National Laboratory earned the top ranking today as the world’s fastest on the 59th TOP500 list, with 1.1 exaflops of performance. The system is the first to achieve an unprecedented level of computing performance known as exascale, a threshold of a quintillion calculations per second.

Frontier features a theoretical peak performance of 2 exaflops, or two quintillion calculations per second, making it ten times more powerful than ORNL’s Summit system. The system leverages ORNL’s extensive expertise in accelerated computing and will enable scientists to develop critically needed technologies for the country’s energy, economic and national security, helping researchers address problems of national importance that were impossible to solve just five years ago.

“Frontier is ushering in a new era of exascale computing to solve the world’s biggest scientific challenges,” ORNL Director Thomas Zacharia said. “This milestone offers just a preview of Frontier’s unmatched capability as a tool for scientific discovery. It is the result of more than a decade of collaboration among the national laboratories, academia and private industry, including DOE’s Exascale Computing Project, which is deploying the applications, software technologies, hardware and integration necessary to ensure impact at the exascale.”

Rankings were announced at the International Supercomputing Conference 2022 in Hamburg, Germany, which gathers leaders from around the world in the field of high-performance computing, or HPC. Frontier’s speeds surpassed those of any other supercomputer in the world, including ORNL’s Summit, which is also housed at ORNL’s Oak Ridge Leadership Computing Facility, a DOE Office of Science user facility.

Frontier, an HPE Cray EX supercomputer, also claimed the number one spot on the Green500 list, which rates energy use and efficiency by commercially available supercomputing systems, with 62.68 gigaflops of performance per watt. Frontier rounded out the twice-yearly rankings with the top spot in a newer category, mixed-precision computing, that rates performance in formats commonly used for artificial intelligence, with a performance of 6.88 exaflops.

The work to deliver, install and test Frontier began during the COVID-19 pandemic, as shutdowns around the world strained international supply chains. More than 100 members of a public-private team worked around the clock, from sourcing millions of components to ensuring deliveries of system parts on deadline to carefully installing and testing 74 HPE Cray EX supercomputer cabinets, which include more than 9,400 AMD-powered nodes and 90 miles of networking cables.

“When researchers gain access to the fully operational Frontier system later this year, it will mark the culmination of work that began over three years ago involving hundreds of talented people across the Department of Energy and our industry partners at HPE and AMD,” ORNL Associate Lab Director for computing and computational sciences Jeff Nichols said. “Scientists and engineers from around the world will put these extraordinary computing speeds to work to solve some of the most challenging questions of our era, and many will begin their exploration on Day One.”

Frontier has arrived, and ORNL is preparing for science on Day One. Credit: Carlos Jones/ORNL, Dept. of Energy

ORNL’s investment in high-performance computing is critical to our ability to deliver on the lab’s and the Department of Energy’s mission. Credit: Carlos Jones/ORNL, Dept. of Energy

Teams of dedicated people overcame numerous hurdles, including pandemic-related supply chain issues, to complete Frontier’s installation. Despite these challenges, delivery of the system took place from September to November 2021. Credit: Carlos Jones/ORNL, Dept. of Energy

Next Generation Power Systems Modeling and Computations

Frontier’s overall performance of 1.1 exaflops translates to more than one quintillion floating point operations per second, or flops, as measured by the High-Performance Linpack Benchmark test. Each flop represents a possible calculation, such as addition, subtraction, multiplication or division.

Frontier’s early performance on the Linpack benchmark amounts to more than seven times that of Summit at 148.6 petaflops. Summit continues as an impressive, highly ranked workhorse machine for open science, listed at number four on the TOP500.
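
The "more than seven times" figure follows directly from the two Linpack scores:

```python
# Ratio of the two Linpack (Rmax) results, both expressed in petaflops.
frontier_rmax_pflops = 1_102    # 1.102 exaflops
summit_rmax_pflops = 148.6
print(round(frontier_rmax_pflops / summit_rmax_pflops, 1))   # ~7.4x
```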

Frontier’s mixed-precision computing performance clocked in at roughly 6.88 exaflops, or more than 6.8 quintillion flops per second, as measured by the High-Performance Linpack-Accelerator Introspection, or HPL-AI, test. The HPL-AI test measures calculation speeds in the computing formats typically used by the machine-learning methods that drive advances in artificial intelligence.

Detailed simulations relied on by traditional HPC users to model such phenomena as cancer cells, supernovas, the coronavirus or the atomic structure of elements require 64-bit precision, a computationally demanding form of computing accuracy. Machine-learning algorithms typically require much less precision — sometimes as little as 32-, 24- or 16-bit accuracy — and can take advantage of special hardware in the graphic processing units, or GPUs, relied on by machines like Frontier to reach even faster speeds.

ORNL and its partners continue to execute the bring-up of Frontier on schedule. Next steps include continued testing and validation of the system, which remains on track for final acceptance and early science access later in 2022 and open for full science at the beginning of 2023.

UT-Battelle manages ORNL for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.

 Frontier By The Numbers, Credit: Laddy Fields/ORNL, U.S. Dept. of Energy


FACTS ABOUT FRONTIER

The Frontier supercomputer’s exascale performance is enabled by some of the world’s most advanced pieces of technology from HPE and AMD:

Frontier has 74 HPE Cray EX supercomputer cabinets, which are purpose-built to support next-generation supercomputing performance and scale, once open for early science access.

Each node contains one optimized EPYC™ processor and four AMD Instinct™ accelerators, for a total of more than 9,400 CPUs and more than 37,000 GPUs in the entire system. The coherency enabled between the EPYC processors and Instinct accelerators makes these nodes easier for developers to program.

HPE Slingshot, the world's only high-performance Ethernet fabric designed for next-generation HPC and AI solutions, supports larger, data-intensive workloads and addresses demands for higher speed and congestion control so that applications run smoothly and performance is boosted.

An I/O subsystem from HPE that will come online this year to support Frontier and the OLCF. The I/O subsystem features an in-system storage layer and Orion, a Lustre-based enhanced center-wide file system that is also the world’s largest and fastest single parallel file system, based on the Cray ClusterStor E1000 storage system. The in-system storage layer will employ compute-node local storage devices connected via PCIe Gen4 links to provide peak read speeds of more than 75 terabytes per second, peak write speeds of more than 35 terabytes per second, and more than 15 billion random-read input/output operations per second. The Orion center-wide file system will provide around 700 petabytes of storage capacity and peak write speeds of 5 terabytes per second.

As a next-generation supercomputing system and the world’s fastest for open science, Frontier is also energy-efficient, due to its liquid-cooled capabilities. This cooling system promotes a quieter datacenter by removing the need for a noisier, air-cooled system.

Exascale: The New Frontier of Computing

IBM’s first quantum data center in Europe

We’re reimagining quantum computing on the cloud with IBM’s first quantum data center in Europe, based in Ehningen, Germany.

Today, we’re announcing that IBM will build its first European quantum data center, and first outside the US, at the IBM facility in Ehningen, Germany, available for cloud access in 2024.

As an entirely new computational paradigm, cloud-based quantum computing still carries some practical questions: How do you efficiently run a program across quantum and classical resources? What if those resources and data are spread across the world, governed by different data privacy laws and considerations? Routing quantum-classical compute workflows around the world while delivering a seamless experience to the end user is no small task.

So, alongside the data center launch, we’re developing a new software integration layer to help answer these questions. Once we bring the European quantum data center online, we will introduce the multichannel scheduler: a layer to manage access and resources across different regions and channels.

A channel can be a partner or institution that manages its users' access and/or data and can combine that with its own or third-party classical resources to develop and integrate quantum into its own advanced compute solutions.

With the European IBM Quantum data center, the client can ensure their data is handled and processed solely in Europe.

Today, institutions are exploring quantum computing in a variety of ways, generating their own specific needs. Some are hoping to extract business value by running quantum algorithms or building their own quantum software, or integrating quantum solutions into classical applications.

Others are interested in exploring the frontiers of science on the latest in quantum computing hardware. Some might be processing critical job data governed by data security policies or regional laws, while others are running experiments that simply require whichever resources can be provisioned soonest.

The multichannel scheduler



The multichannel scheduler is a layer that sits between the user, their cloud services, and the quantum data centers. It serves to facilitate user access to multi-region computing that uses the IBM Qiskit Runtime primitives to run quantum programs — with the advantage of incorporating quantum resources from different regions depending on their needs or constraints, such as data sovereignty.

This approach allows our clients and cloud providers in each of our quantum regions to employ classical services to run some of our middleware for quantum, such as circuit knitting (techniques that partition large quantum circuits into subcircuits that fit on smaller devices, then incorporate classical simulation to “knit” the results back together), and to access our newest innovations like dynamic circuits. The end user will still be able to run multi-cloud workflows incorporating other cloud service providers. The multichannel scheduler communicates with each of these channels while routing quantum workflows to the appropriate geographies.

The multichannel scheduler is especially important for users concerned about where their data is stored and processed. It starts the journey towards quantum computation as a stateless service, where the data ownership remains with our users. With the European IBM Quantum data center, the client can ensure their data is handled and processed solely in Europe.
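
A hypothetical sketch of that routing decision is shown below. It is not IBM's multichannel scheduler API or Qiskit code; the backend names, regions, and function are invented purely to illustrate keeping a job's quantum execution inside the region its data policy allows.

```python
# Hypothetical data-residency-aware backend selection (illustrative only).
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    region: str   # e.g. "eu" for a European data center, "us" for a US one

def select_backend(backends, allowed_regions):
    for backend in backends:
        if backend.region in allowed_regions:
            return backend
    raise RuntimeError("no backend satisfies the data-residency constraint")

backends = [Backend("eu_system_one", "eu"), Backend("us_system_two", "us")]
job_policy = {"allowed_regions": {"eu"}}   # e.g. data must stay in Europe

print(select_backend(backends, job_policy["allowed_regions"]).name)   # eu_system_one
```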

Pictured: The multichannel scheduler will connect the IBM Qiskit Runtime service across different regions with each channel.

The multichannel scheduler will allow users to run on IBM Quantum systems in both the US quantum data center and the new European quantum data center, regardless of where they submit code from. Users in Europe can continue exploring early prototype systems provided only in the US data center and, when ready, apply those lessons to Europe-only systems.

The scheduler is the central piece to our mission to bring useful quantum computing to the world. It will provide a quantum experience tailored to the needs of the more than 60 European organizations in the IBM Quantum Network, plus our Europe-based pay-as-you-go clients. And it will continue to connect organizations around the world to quantum computing resources on the cloud through our quantum computing centers in the US, Europe, and any centers we build in the future.

The 100,000 Qubit Quantum-Centric Supercomputer of 2033

In order to get there, we’ve set our sights on a key milestone: a 100,000-qubit system by 2033. And now, we’re sponsoring and partnering on targeted research with the University of Tokyo and the University of Chicago to develop a system that would be capable of addressing some of the world’s most pressing problems that even the most advanced supercomputers of today may never be able to solve.

But why 100,000? At last year's IBM Quantum Summit, where we debuted the 433-qubit IBM Quantum Osprey processor and the newest Qiskit Runtime capabilities that will accelerate research and development toward quantum-centric supercomputing, we demonstrated that we'd charted the paths forward to scaling quantum processors to thousands of qubits. Beyond that, however, the path is less clear.

Why? It’s a combination of footprint, cost, chip yield, energy, and supply chain challenges, to name a few. To ensure that these roadblocks don’t stop our progress, we must collaborate to do fundamental research across physics, engineering, and computer science.

Just as no single company is responsible for the current era of computing, now the world’s greatest institutions are coming together to tackle this problem to bring about this new era. We need the help of a broader quantum industry.

Scaling quantum computers

Last year, we released our answer for how we plan to scale quantum computers to a level where they can perform useful tasks. With that foundation set, we now see four key areas requiring further advancement in order to realize the 100,000-qubit supercomputer: Quantum communication, middleware for quantum, quantum algorithms and error correction (capable of using multiple quantum processors and quantum communication), and components with the necessary supply chain.

Pictured: A concept rendering of IBM Quantum’s 100,000-qubit quantum-centric supercomputer, expected to be deployed by 2033.

We’ll be sponsoring research at the University of Chicago and the University of Tokyo to advance each of these four areas.

HFIR: Leading the World in Isotopes and Science

The University of Tokyo will lead efforts to identify, scale, and run end-to-end demonstrations of quantum algorithms. They will also begin to develop and build the supply chain around new components required for such a large system including cryogenics, control electronics, and more. The University of Tokyo, too, has demonstrated leadership in these spaces; they helm the Quantum Innovation Initiative Consortium (QIIC), bringing together academia, government, and industry to develop quantum computing technology and building an ecosystem around it.

Through the IBM-UTokyo Lab, the university has already begun researching and developing algorithms and applications related to quantum computing, while laying the groundwork for the hardware and supply chain necessary to realize a computer at this scale.

Meanwhile, the University of Chicago will lead efforts to bring quantum communication to quantum computation, with classical and quantum parallelization plus quantum networks. They will also lead efforts to improve middleware for quantum, adding serverless quantum execution, circuit knitting (partitioning large quantum circuits into subcircuits that fit on smaller devices), and physics-informed error resilience so we can run programs across these systems.

The University of Chicago has already demonstrated a proven track record of leadership in quantum and quantum communication through the Chicago Quantum Exchange. The CQE operates a 124-mile quantum network over which to study long-range quantum communication. Additionally, many of the University of Chicago’s software techniques have helped provide structure to quantum software and influenced IBM’s and other industry middleware.

We recognize how challenging it will be to build a 100,000-qubit system. But we see the path before us — we have our list of known knowns and known unknowns. And if unforeseen challenges arise, we as an industry should be eager to take them on. We think that together with the University of Chicago and the University of Tokyo, 100,000 connected qubits is an achievable goal by 2033.

At IBM, we’ll continue following our development roadmap to realize quantum-centric supercomputing, while enabling the community to pursue progressive performance improvements. It means finding quantum advantages over classical processors, while treating quantum as one piece of a broader HPC paradigm with classical and quantum working as one computational unit. And with this holistic approach, plus our push toward the 100,000-qubit mark, we’re going to bring useful quantum computing to the world, together.

More Information:

https://www.olcf.ornl.gov/frontier/

http://www.phys.utk.edu/archives/colloquium/2022/10-03-messer.pdf

https://vimeo.com/720503659

https://en.wikipedia.org/wiki/Frontier_(supercomputer)

https://research.ibm.com/blog/europe-quantum-datacenter-software

https://research.ibm.com/blog

https://research.ibm.com/blog/ibm-pytorch-foundation




Enabling Quantum sensing technology with engineered diamond solution


 



Quantum Sensing for Energy Applications: Review and Perspective

SPINNING DIAMONDS FOR QUANTUM PRECISION

Quantum sensors are allowing us to see the tiniest parts of our world, but external magnetic fields can easily break their crucial quantum state. By introducing extra spin, scientists have discovered they can trick a quantum system into keeping that state going.

We live in a noisy world. Interference from light, vibrations, electromagnetic radiation and sound can be annoying; it messes with our sleep and can interfere with our electrical equipment.

For physicists who study the very small and the very far away, noise can be a deal-breaker. To reduce it, they often need to come up with large, expensive solutions.

They had to build the world’s largest and most powerful particle accelerator to see the tiny signal from the Higgs Boson particle, and the world’s longest and most sensitive ruler to see gravitational waves. Scientists have to send telescopes into space to avoid the noise of our atmosphere if they are to see the details of the most distant galaxies.

But the solution isn’t always on such a grand scale. In new research published in Nature Physics, a group of physicists from the University of Melbourne have found a way to reduce the noise experienced by quantum sensors just by spinning them.

Quantum sensors are highly sensitive and among their many promising applications they are ushering in a new era of MRI (Magnetic Resonance Imaging) that is making visible the tiny details inside cells and proteins.

A particularly promising quantum sensor is the nitrogen vacancy (NV) centre, found in diamonds. This is an atomic-level flaw, where a nitrogen atom replaces a carbon atom, trapping electrons in a quantum state.

“An electron is essentially a bar magnet,” says Dr Alexander Wood from the School of Physics at the University of Melbourne, who was first author on the Nature Physics paper.

“It has a north pole and a south pole. And if we put an electron in a magnetic field, it will spin very rapidly.”

But the electrons in NV centres aren’t the only magnets in a diamond.

“In a diamond you have two kinds of carbon. Most are what is called carbon-12, which is pretty boring,” says Dr Wood.

“However, about 1 in every 100 carbon atoms is a carbon-13. It has an extra neutron.

“Like electrons, the nucleus of each of these carbon-13 atoms is like a little bar magnet. And, like a bar magnet, if you put a carbon-13 nucleus in a magnetic field, it spins.”

Quantum states rely on a property called coherence, which is sensitive to environmental ‘noise’ that can lead to a loss of the quantum state, known as dephasing. Associate Professor Andy Martin, who led the Australian Research Council funded study, says that maintaining the quantum state of NV centres is hard.

“A quantum state is fragile. It’s fragile to the magnetic field in particular. If you have fluctuations in the magnetic field it will dephase the quantum sensor.”

Maintaining the quantum state is the key to using NV systems as quantum sensors of nano-scale environments

Professor Hollenberg, who leads a University of Melbourne research group on quantum sensors, likens the quantum state to a bubble.

“If your environment is prickly, then the quantum state won’t last very long. But if your environment is less prickly, that bubble will last a lot longer,” he says.

“This is the principle by which we can sense the environment around the NV centre at extremely small scales and high sensitivity.”

In the study, researchers sought to reduce the effect of dephasing by quickly rotating the whole system.

“The spinning atomic bar magnetics of the carbon-13 atoms create prickles in the magnetic field – they interact with the NV centres, affecting its coherence and ability to sense,” says Associate Professor Martin.

Minimising the noise from carbon-13 increases the sensitivity of quantum sensors, which should lead to greater insights into the nanoscale world.

This can be achieved using synthetically engineered and expensive isotopically pure carbon-12 diamonds, or by stopping the carbon-13 atoms from spinning. The problem with stopping the carbon-13 spinning is that the NV centre electrons would also stop spinning, and this spinning is crucial to how these quantum sensors work.

The solution is to trick the NV centre into thinking the atomic bar magnets of the carbon-13 atoms have stopped spinning.

To do this the team, working in the laboratory of Professor Robert Scholten, used a technique from classical physics. It involves rotating the whole diamond at high speeds.

For their experiments, the researchers mounted a diamond with quantum sensors (NV centres, in blue) on a rotating spindle. The grid pattern represents the atomic structure of the diamond. The diamond is mostly non-magnetic carbon-12 atoms, but contains a small number of magnetic carbon-13. A green laser is used to both create and read the quantum state. Picture: Supplied

“In the magnetic field that we typically use, atomic bar-magnets of the NV centres will spin about 2.8 billion times per second, whereas the carbon-13 will spin about 5,000 times per second,” says Dr Wood.

“Because it is already spinning so fast, if we rotate the entire diamond at 5,000 times per second, the atomic bar-magnet of the NV centre isn’t affected.

“But the carbon-13 atoms are affected. And because the NV centre and the carbon-13 are now in the same frame of reference, rotating at 5,000 times a second in the same direction as the carbon atoms spin, it means the NV centre sees the carbon-13 as essentially stationary.

“So you can effectively cancel the magnetic fields from the carbon-13 that these sensors see by putting your sensor and the carbon-13 inside the same rotating frame.”

“What we have here is an environment that when you are not rotating is quite spiky. And when you rotate it, it becomes less spiky, increasing the longevity of the quantum state,” says Associate Professor Martin.

Based on this, we would assume that optimum precision occurs when the diamond spins at exactly the same rate as the carbon-13. But the researchers found that this wasn’t the case.

“You would expect the quantum-ness of the sensor to go up and up until the carbon-13 spins are frozen in the rotating frame, but as we get closer to the frozen frame, the coherence starts to go down, because the carbon-13s start interacting with each other, adding noise back into the system,” says Dr Wood.

If the diamond rotates in the same direction (orange) as the carbon-13, the quantum sensor sees a slower spin (and lower pseudo magnetic field), while if the diamond rotates in the opposite direction (purple) the quantum sensor sees a faster spin (and larger pseudo magnetic field). Picture: Supplied

The researchers have determined the pseudo field that gives the greatest reduction in noise from the carbon-13 spins.

“The sweet spot seems to be in a total magnetic field – which is the combination of the normal field and the rotating frame pseudo field – of one Gauss, which equates to the sensor seeing the carbon spin about 1000 times per second,” says Dr Wood.

“The Gauss is a unit of magnetic flux density, or magnetic field strength. For example, a fridge magnet is about 100 Gauss and the Earth’s magnetic field strength is about half a Gauss.”
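
As a rough back-of-the-envelope check of these numbers (an illustrative sketch, not from the study itself; the gyromagnetic ratio used is the standard carbon-13 value), the precession rate at a given field and the pseudo-field produced by physically rotating the diamond can be estimated in a few lines of Python:

```python
# Rough estimate of carbon-13 precession and the rotation-induced pseudo-field.
# Uses the standard carbon-13 gyromagnetic ratio; not taken from the paper itself.

GAMMA_C13 = 10.705e6   # carbon-13 gyromagnetic ratio, Hz per tesla
GAUSS = 1e-4           # 1 gauss expressed in tesla

def larmor_frequency_hz(b_tesla, gamma=GAMMA_C13):
    """Precession ('spinning') rate of a carbon-13 nucleus in a field b_tesla."""
    return gamma * b_tesla

def pseudo_field_tesla(rotation_hz, gamma=GAMMA_C13):
    """Equivalent field seen in a frame rotating at rotation_hz (f = gamma * B)."""
    return rotation_hz / gamma

# Carbon-13 in a 1-gauss total field precesses roughly 1,000 times per second,
# consistent with the 'sweet spot' quoted above.
print(f"13C precession at 1 G: {larmor_frequency_hz(1 * GAUSS):.0f} Hz")

# Rotating the diamond at 5,000 revolutions per second corresponds to a
# pseudo-field of a few gauss, added or subtracted depending on rotation sense.
print(f"Pseudo-field for 5 kHz rotation: {pseudo_field_tesla(5000) / GAUSS:.1f} G")
```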

While this technique could soon be used to improve the precision of quantum MRI scanners, Associate Professor Martin says it may also help to answer some fundamental questions in physics.

“For example, quantum sensors could assist in answering such questions as: when does a fluid become a fluid?” he says.

“Take a water molecule, that’s not a fluid. Take two water molecules, that’s not a fluid either. At some point it becomes a fluid and it’s all to do with the scale at which you are probing. And you can only look at that if you can probe down to those scales.

“Now you have these sensors based on nitrogen defects in diamonds. They don’t have to be a big diamond like a diamond ring, they can be nano-crystals. They can be extremely small.

“So you begin to have these devices that can measure translational and, now, rotational movement. It gives you a probe on these very small scales, not just in terms of magnetic fields but in terms of the translational and rotational motion.”

Enabling quantum sensing technology with engineered diamond solutions | Quantum.Tech 2021

Abstract

On its revolutionary threshold, quantum sensing is creating potentially transformative opportunities to exploit intricate quantum mechanical phenomena in new ways to make ultrasensitive measurements of multiple parameters. Concurrently, growing interest in quantum sensing has created opportunities for its deployment to improve processes pertaining to energy production, distribution, and consumption. Safe and secure utilization of energy depends on addressing challenges related to material stability and function, secure monitoring of infrastructure, and accuracy in detection and measurement. A summary of well-established and emerging quantum sensing materials and techniques, as well as the corresponding sensing platforms developed for their deployment, is provided here. The focus is specifically on enhancing existing advanced sensing technologies with quantum methods and materials to realize unprecedented levels of sensitivity, with an emphasis on relevance to the energy industry. The review concludes with a discussion of high-value applications of quantum sensing to the energy sector, as well as remaining barriers to sensor deployment.

1 Overview

1.1 Basics of Quantum Information Science

Quantum Information Science (QIS) combines the intellectual foundations of quantum mechanical and information science theories, relying upon quantum mechanics and elements of mathematics, computer science, physical sciences, and engineering.[1, 2] Quantum mechanical theory includes a probabilistic description of matter and energy and can be used to naturally explore physical systems that are not well accounted for within the classical regime.[2, 3] Information theory defines information in terms of the entropy of a variable, enabling modern electronics and computing devices to perform more efficiently.[4] A proper blend of these two established theories has appeared as a new paradigm in science and technology and provides the foundation of QIS. Over the last three decades, QIS has become one of the most rapidly growing fields of research in areas such as physics, chemistry, and engineering.

QIS has already exhibited utility in both communication and computation, and applications are expected to increase in areas such as quantum sensing[5, 6] and quantum networking.[2, 7] Studies have been performed in different research institutions worldwide to explore the utility of QIS in areas including high energy physics, nuclear science and energy,[8] quantum chemistry,[9] economics,[10] communications,[11] and optimization.[12] QIS is expected to be particularly useful for the energy sector: early stage proposals seeking applications are being investigated in research institutions around the globe in areas such as nuclear energy, material sciences, and energy infrastructure optimizations. QIS is typically divided into four main pillars: quantum sensing and metrology, quantum networking, quantum simulations, and quantum computing. In Figure 1, we summarize the possible energy applications of each of the four pillars of QIS.

The four pillars of QIS (light green squares). The possible applications to different areas related to energy research and engineering (light blue rectangles).

In the short and intermediate term, quantum sensing has emerged as a potentially impactful and practical pillar of QIS that could have wide-ranging benefits for a range of economic sectors. Quantum sensing refers to the measurement of physical quantities using quantum objects, quantum coherence, or quantum entanglement to achieve sensitivities beyond the classical limit.[6] Examples of state-of-the-art commercial sensing technologies which utilize quantum mechanical laws to achieve an unprecedented level of accuracy in measurement include navigation,[6] atomic clocks,[13] gravimeters and gradiometers,[14-16] nuclear magnetic and paramagnetic resonances,[17] and electron microscopes.[18] The U.S. National Quantum Initiative Act, which was signed into law in late 2018, instructs three U.S. agencies—the National Institute of Standards and Technology (NIST), the National Science Foundation (NSF), and the Department of Energy (DOE)—to work with academic institutions and private industry to catalyze the growth of QIS.[19] According to the NSF, the next ten years will bring opportunities for quantum sensors in biotechnology and defense, next-generation positioning, navigation, and timing systems useful to military and commerce, while providing new opportunities to address complex problems in material science, chemistry, and physics. These applications have wide-ranging implications in important areas such as energy and security, impacting everyday life.


1.2 Quantum Sensing for Energy Applications

Ever-growing investment into quantum sensing-based research has created opportunities for quantum sensor deployment in a range of energy-relevant applications. Several exciting areas of application include renewable energy, nuclear energy and nuclear waste management, fossil energy, geothermal energy, electricity delivery and reliability, vehicle electrification, and energy efficiency. Common themes can be identified that span many of these various areas of application including:



1) Monitoring of energy infrastructure and advanced manufacturing

Quantum sensing can be used to realize unprecedented combinations of range, resolution, and sensitivity for measurements of critical parameters of interest. Data from sensor devices with high sensitivity and reliability that are capable of detecting early signs of equipment failure can be analyzed using predictive models[20] to obtain insight on future performance and assess operational state of health with a higher degree of confidence. Such enhanced monitoring systems enable condition-based asset maintenance rather than time-based maintenance, thereby lowering the overall cost and minimizing interruptions and failures. Unprecedented performance and cost trade-offs can be potentially achieved, allowing a broader commercial deployment. As one example, entangled photons can be integrated with an optical fiber sensor platform, which can be evaluated for its ability to maintain quantum coherence over application-relevant length scales. While classical sensing platforms are rapidly advancing, quantum technologies ultimately can push their performance beyond classical limits.

2) Electricity delivery transmission, distribution, and storage

Monitoring power losses during energy transmission, storage and distribution, and accurately calculating the techno-economic viability of power utilities requires electrical grids that are integrated with advanced optimization tools. Electrical grids are complex interconnected networks that transmit power from producers to consumers. Existing electrical grids can benefit from integrating “smart grid” technologies, including quantum security, networking, computing, and sensing features.[21] Sensors are often deployed to monitor the performance and integrity of the grid and to analyze parameters such as temperature and strain/stress in overhead power lines, towers, and transformers.[22, 23] If quantum sensing materials are developed that can function in extreme environments, they will improve power grid performance by providing improved sensitivity and/or faster response times when monitoring grid health.

3) Building energy efficiency

Smart buildings, which conserve energy by automating ventilation, heating, lighting, and other energy-consuming systems to minimize waste, can benefit from technology derived from QIS.[24] Energy usage is monitored and optimized using a series of sensors and actuators that track energy consumption throughout the day, enabling demand to be met more efficiently. Opportunities exist for quantum sensors to be integrated into smart buildings and smart grids, and for quantum computing and simulations to solve optimization problems for energy distribution.[24, 25] Together, these steps will ensure that resources are distributed such that energy consumption is minimized.

4) Nuclear energy

Deployment of quantum sensors in nuclear power plants has potential to aid in greater plant efficiency and safety. A recent Nuclear Science Advisory Committee (NSAC) QIS subcommittee's report on nuclear physics and QIS laid out recommendations for integrating QIS (particularly quantum sensing) for national nuclear security.[26] For example, an atom interferometric quantum sensor could be used for detecting isotopes in nuclear energy plants.[27] Highly sensitive QIS-enhanced devices can not only detect early stages of radiation breaches but also can provide avenues for remote monitoring of safety-related issues in the power plant.

The current status of quantum sensors in the fossil energy domain and perspectives on the potential applications of quantum sensing for subsurface surveying and scanning, natural gas infrastructure, and critical element detections using quantum enhanced sensing technologies are discussed in more detail in the next section.

1.3 Quantum Sensing for Fossil Energy Applications

In Figure 2, we present the Energy Information Administration's (EIA) projection of world energy growth from 2000 to 2050 for energy consumption by fuel type, and it shows an ever-increasing trend of fossil energy consumption within that time frame, at a growth rate similar to that of nuclear, renewables, and liquid fuels. Fossil energy remains the dominant resource for energy usage. The statistics showed that in 2018, global energy consumption grew at a rate of 2.9%, which was the fastest growth since 2010.[28] Global oil consumption grew by 2.2 million barrels per day, and coal consumption grew by 1.4%, roughly double its ten-year average growth rate. Meanwhile, carbon emissions rose by 2%, the fastest growth in seven years. According to recent data for the world's total primary energy supply, the shares of total energy from the three major sources, coal, oil, and natural gas, are 27%, 32%, and 22%, respectively, with the remaining energy derived mainly from nuclear, biofuels, wind, and solar power.[29] While alternative sources are expected to grow in coming years, fossil fuels are anticipated to be the primary source of global energy for decades to come. Fossil fuel-based electricity generators currently only exhibit efficiencies on the order of ≈30%, so increasing electricity generation efficiency with fossil energy and allowing for reliable resource recovery, extraction, transport, and utilization are key societal challenges, along with curbing greenhouse gas emissions.

EIA projection of world energy growth in consumption by fuel type with historical data up to 2018.[30]

Fossil energy application areas are commonly separated into the following: coal mining and recovery, CO2 utilization and coal beneficiation, carbon capture and sequestration, upstream oil and gas, midstream oil and gas, downstream oil and gas, electricity generation, and electricity transmission and distribution. The domain of fossil energy production can potentially exploit the opportunities that are offered by quantum technologies to provide higher levels of safety and security.

Specifically, quantum sensors can be used to remotely monitor oil and gas flows with higher confidence in the downstream, midstream, and upstream stages of oil and gas production. The sensitivity of existing distributed sensing systems designed for monitoring gas flows in pipelines and boreholes can be further enhanced by using entangled photons whose polarizations encode quantum information.[31, 32] In addition, quantum sensors can be used to scan the subsurface to detect oil, gas, and mineral deposits with higher accuracy than existing technologies. Sensors based on the nitrogen vacancy in nanodiamond[6, 33, 34] are highly sensitive and have potential applications in critical element detection and recovery (see Section 2.3.2). Detection of low levels of methane during natural gas infrastructure monitoring can enable improved quantification of emissions of this important greenhouse gas. The detection of low levels of CO2 will enable better monitoring, verification and accounting in carbon sequestration applications while also aiding in the early identification of impacts on water quality and wellbore integrity. By integrating concepts of QIS in existing traditional sensing technologies, unprecedented performance and cost trade-offs will potentially be achieved. Table 1 summarizes sensing needs that quantum sensors may address for each fossil energy area.

1.4 Scope of the Review

Here, we review current developments in quantum sensing and discuss potential areas in which quantum sensing technologies may be introduced into different energy sector domains. More specifically, we highlight mature and emerging quantum sensing techniques, materials, and sensor platforms, and discuss how quantum sensing may contribute to more efficient energy production, transport, storage, and consumption. While this review provides an overview of relevance to the energy sector in general, an emphasis is placed upon fossil energy related applications. Different areas within fossil energy such as oil and gas exploration, surface and subsurface characterization, and chemical sensing are all examples of applications where quantum technologies are potentially useful for enhancing safe, secure, and reliable extraction and utilization of energy resources.[14, 35] For example, highly sensitive quantum sensors such as gravimeters and magnetometers have already proven useful for fossil energy in exploring oil and gas resources and subsurface characterization.[14, 15] A brief discussion of commercially available quantum sensing technologies is included, emphasizing that quantum sensors are sufficiently mature to be deployed within the market. The review concludes with an overview of remaining barriers to and challenges for continued quantum sensor deployment and a discussion of future opportunities to introduce quantum technologies into the energy sector. Taken together, this review is intended to familiarize researchers within the energy sector with quantum sensing technologies, while providing experts in quantum sensing an overview of potential areas of need in the energy sector that could benefit from quantum sensors. Figure 3 highlights a general scope of this review.

Q2B 2021 | Quantum Computing Hardware – Recent Developments & the Road Ahead | William Oliver | MIT


2 Quantum Sensing

2.1 Overview of Quantum Sensing Principles

Quantum sensing is broadly defined as the use of quantum materials, quantum coherence, and/or quantum entanglement to measure physical quantities (i.e., temperature, electromagnetic fields, strain, etc.) and/or to enhance the sensitivity of classical analytical measurements.[6] Two prominent strategies may be exploited for quantum sensing. The first, photonic quantum sensing, exploits the quantum nature of light for a range of sensing applications, from remote target detection to the readout of optical memory.[36] Nonphotonic quantum sensors, which rely on spin qubits, trapped ions, and other materials with well-defined quantum states, have also been developed for applications including magnetometry, thermometry, and others.[6, 37] Both quantum sensor types may also be used to enhance the performance of classical sensor systems[6, 36] with the promise of reduced noise as compared to the so-called “shot-noise limit.”[38]

To construct a working quantum sensor with any candidate material, DiVincenzo[39] and Degen[6] outlined a set of three necessary conditions that must be met: i) the quantum system must have discrete resolvable energy levels (or an ensemble of two-level systems with a lower energy state |0⟩ and an upper energy state |1⟩) that are separated by a finite transition energy; ii) it must be possible to initialize the quantum sensor into a well-known state and to read out its state; iii) it must be possible to coherently manipulate the quantum sensor, typically with time-dependent fields. The interaction with these fields leads to a shift of the quantum sensor's energy levels or a change in the transition energy gap. Figure 4 shows a summary of publications on the topic of quantum sensors with a focus on materials, sensing, and devices. As one can see, the development of quantum sensing has accelerated rapidly over the past few decades.
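
As a toy illustration of these criteria (not from the review; the coupling constant and interrogation time below are arbitrary), a generic two-level sensor can be modelled as a transition frequency that shifts with the quantity being measured, with the accumulated phase read out as a population:

```python
import numpy as np

# Toy model of a generic two-level quantum sensor (illustrative only).
# The quantity being sensed, V, shifts the |0> -> |1> transition frequency by
# gamma * V; a Ramsey-style sequence converts the phase accumulated during
# free evolution into a measurable population.

def ramsey_signal(V, gamma, tau, f0=0.0):
    """Probability of finding the sensor in |0> after a pi/2 - wait(tau) - pi/2 sequence."""
    phase = 2 * np.pi * (f0 + gamma * V) * tau   # phase accumulated during free evolution
    return 0.5 * (1 + np.cos(phase))

# Example: a field-like quantity V with coupling gamma = 1 kHz per unit,
# interrogated for tau = 100 microseconds.
for V in (0.0, 1.0, 2.5):
    print(V, ramsey_signal(V, gamma=1e3, tau=100e-6))
```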

Summary of ten-year forecasts of quantum sensor markets by a) sensor type[48] and b) quantum sensor applications.

Quantum Sensing and Imaging with Diamond Spins - Ania Bleszynski Jayich

2.2 Quantum Sensing Techniques and Materials

Sensing materials play an important role in many sensor devices by providing a well-established, sensitive, and reversible response to key parameters of interest and generating a detectable signal (optical, electrical, mass, acoustic, etc.). A promising assortment of sensing materials[6, 37] have been developed that use quantum processes to probe environmental properties including temperature,[53-56] electric[57, 58] and magnetic[59-61] fields, pH,[62, 63] thermal conductivity,[64] strain,[65, 66] and force,[67] among others. Here, several prominent classes of materials used in quantum sensing applications are briefly discussed and summarized, including emerging materials such as quantum metal−organic frameworks (MOFs) that may exhibit promise as sensors with further research and development.

Materials suitable for use as quantum sensors must meet the DiVincenzo and Degen criteria described above. In addition, the sensor must be able to selectively interact with variables of interest (i.e., magnetic field, pH, temperature, etc.) in a way that predictably alters the material's quantum states, such as a shift in their respective energy levels.[6] In tandem, quantum materials may be used to enhance classical sensing performance. This section will introduce general techniques and definitions related to quantum sensor materials, followed by a discussion on specific quantum materials and examples of how they have been deployed for sensing applications.

2.2.1 Sensing Techniques Using Quantum Materials

The initialization, manipulation, and read-out of states for quantum materials may be probed using a variety of optical and magnetic techniques. This enables changes in the read-out to be determined as a function of environmental factors such as electromagnetic field or temperature.[6] The read-outs discussed here include the zero phonon line (ZPL), optically detected magnetic resonance (ODMR), spin relaxation times, optical charge conversion (OCC), and level anticrossings (LACs).

Zero phonon lines

Because quantum materials involve a single transition between two distinct, defined energy levels (or quantum states), single photon emission (SPE) may be induced. Here, optical excitation leads to a transition from the ground state to the excited state; during relaxation, single photons are released one at a time (i.e., the probability of multiple photons being released at the same time is zero).[68] While a distribution of emission energies may be observed from SPEs due to coupling between electronic and vibrational states, leading to a broadening of the emission spectrum, a certain percentage of the emitted photons do not involve vibrational coupling, and their emission energy is simply equal to the energy gap between the ground and excited energy levels. This “zero phonon line” emission band appears as a single narrow emission band in the emission spectra of some quantum materials.[69]

Sensing using the ZPL operates under the same principle as classical fluorescence-based sensors, in which changes in emission peak energy, intensity, and/or breadth in response to environmental changes may be exploited for sensing applications.[70] However, the intrinsic narrowness of the ZPL provides enhanced sensitivity and reproducibility for probing small changes in environmental variables such as temperature with high spatial resolution.[71, 72]
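
A minimal sketch of how such a ZPL readout might be implemented (assuming a Lorentzian line shape; the peak parameters and noise level are arbitrary illustrative values, not data from the cited experiments):

```python
import numpy as np
from scipy.optimize import curve_fit

# Sketch of a ZPL-based readout: fit a Lorentzian to an emission spectrum and
# track the fitted centre/width against a separately measured calibration.

def lorentzian(wl, amplitude, center, width, offset):
    return offset + amplitude * (width / 2) ** 2 / ((wl - center) ** 2 + (width / 2) ** 2)

# Simulated spectrum around a ~740 nm ZPL with photon-counting-like noise.
wavelengths = np.linspace(735.0, 745.0, 400)
rng = np.random.default_rng(0)
spectrum = lorentzian(wavelengths, 1000.0, 740.05, 0.6, 50.0) + rng.normal(0, 10, wavelengths.size)

popt, _ = curve_fit(lorentzian, wavelengths, spectrum, p0=[900.0, 740.0, 0.5, 40.0])
amplitude, center, width, offset = popt
print(f"fitted ZPL centre: {center:.3f} nm, width: {width:.3f} nm")

# In a real sensor, (center, width, amplitude) would then be converted to a
# temperature reading using a calibration curve.
```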

Optically detected magnetic resonance

Energy band diagram for the NV− center. The sinusoidal black arrows represent the radiative transition; black solid and dashed inclined arrows indicate strong and weak nonradiative decay via the singlet state. Optically active states for fluorescent measurement and spin splitting as shown in inset. Reproduced with permission.[34] Copyright 2017, Annual Reviews, Inc.

Briefly, optical excitation of NV− centers leads to a transition from the ground state |g⟩ to the triplet excited state |e⟩, and relaxation can occur via fluorescence near the ZPL or through an intersystem crossing to |s⟩ followed by the emission of a lower energy (near infrared) photon, leading to a ground state preferentially in the ms = 0 spin state. Electrons in the |e, ms = ±1⟩ state are more likely to undergo an intersystem crossing than those in the |e, ms = 0⟩ state, and, because spin is preserved during optical excitation, a difference in the luminescence lifetime is observed for electrons excited out of the ms = 0 state versus the ms = ±1 state.[34, 73] Pumping the system with microwave (MW) radiation resonant with the ground state triplet spin transition places the system in the degenerate ms = ±1 state. However, the presence of a magnetic field lifts the degeneracy of the ms = ±1 state, causing Zeeman splitting (Figure 6, inset) and a corresponding decrease in the fluorescence intensity at the resonant frequencies between ms = 0 and ms = −1 and between ms = 0 and ms = 1, due to partial population transfer to "dark" ms = ±1 states.[34, 73] The ODMR spectrum plots the luminescence intensity as a function of MW frequency (Figure 7a,b), and the frequency separation between the two spin transitions (where the intensity decreases) can be used to probe parameters such as temperature and magnetic field strength[74] (Figure 7c,d).
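
For reference, a minimal magnetometry sketch (not the authors' analysis code) showing how the separation between the two ODMR dips maps to field strength, assuming the standard NV− zero-field splitting and electron gyromagnetic ratio:

```python
# Minimal ODMR magnetometry sketch: for a field along the NV axis the two dips
# sit at roughly D - gamma_e * B and D + gamma_e * B, so their separation gives
# the field directly.  Constants are standard NV- values.

D_HZ = 2.870e9          # NV- ground-state zero-field splitting, Hz
GAMMA_E = 28.024e9      # electron gyromagnetic ratio, Hz per tesla

def field_from_odmr(f_minus_hz, f_plus_hz):
    """Axial magnetic field (tesla) estimated from the two ODMR dip frequencies."""
    return (f_plus_hz - f_minus_hz) / (2 * GAMMA_E)

# Example: dips located at 2.842 GHz and 2.898 GHz correspond to about 1 mT.
print(field_from_odmr(2.842e9, 2.898e9))
```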

Quantum Sensors: Interview with Prof. Dr. Dr. Oliver Ambacher

Spin relaxation lifetime

a) Photoluminescence (PL) spectrum of an ensemble of NV− centers upon excitation with a 532 nm laser. The NV0 zero phonon line is at 575 nm and the NV− zero phonon line is at 638 nm. b) Time-resolved photoluminescence of NV− excited from the ms = 0 (blue) state and ms = ±1 (red) spin states during a 2 µs laser pulse. c,d) Electron paramagnetic resonance spectrum of a single NV center at zero and nonzero magnetic field, recorded using the ODMR technique. Reproduced with permission.[34] Copyright 2017, Annual Reviews, Inc.

Spin relaxation techniques have been used with NV− centers in nanodiamonds to sense the presence of nearby nuclear spins. Here, changes in the quantum material's longitudinal spin relaxation time T1 are monitored by optically sampling the population difference in the quantum material's ground state. This difference is determined by random jumps due to a fluctuating magnetic field in the dynamic environment. Changes in initial fluorescence in the pulsed fluorescence measurements are monitored. This enables the detection of the presence of nuclear spins in the vicinity of the quantum material center down to a single atom level.[63, 75, 76] For example, Figure 8 illustrates the detection of magnetic noise from a Gd3+ ion near a shallow NV− center.
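
A simple sketch of the T1-relaxometry analysis step (illustrative values only, not measured data): fit the optically read decay to an exponential and compare the extracted T1 with and without a magnetic species nearby.

```python
import numpy as np
from scipy.optimize import curve_fit

# Sketch of T1 relaxometry: a shorter T1 signals magnetic noise (e.g. Gd3+)
# near the NV- center.  All numbers below are invented for illustration.

def decay(t, amplitude, t1, offset):
    return offset + amplitude * np.exp(-t / t1)

def fit_t1(delays_s, signal):
    popt, _ = curve_fit(decay, delays_s, signal, p0=[1.0, 1e-3, 0.0])
    return popt[1]

delays = np.linspace(0, 5e-3, 50)
rng = np.random.default_rng(1)
clean = decay(delays, 1.0, 2.0e-3, 0.0) + rng.normal(0, 0.02, delays.size)   # bare surface
noisy = decay(delays, 1.0, 0.4e-3, 0.0) + rng.normal(0, 0.02, delays.size)   # magnetic species nearby

print(f"T1 (clean surface): {fit_t1(delays, clean) * 1e3:.2f} ms")
print(f"T1 (with Gd3+):     {fit_t1(delays, noisy) * 1e3:.2f} ms")
```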

Optical charge conversion

Measurement of magnetic noise from a single Gd3+ molecule attached to a diamond surface using a single shallow NV− center. a) Schematic power spectrum of the fluctuating magnetic field due to relaxation of the Gd3+ electronic spin (inset: NV-center electronic excited and ground states, with ground-state spin sublevels). Fourier components of this spectrum near the frequency resonant with the NV center zero-magnetic-field splitting lead to an increase in the NV center spin-state population relaxation rate. b) Demonstration of NV magnetic sensing of a single Gd3+ molecule on the surface of bulk diamond. Measurements of the NV center spin-state population difference relaxation and exponential fits. Clean diamond surface: blue squares and blue line. Gd3+ molecules attached to the diamond surface: red circles and red line. Recleaned diamond surface: green triangles and green line. The scatter of the experimental data points is consistent with photon shot noise with total averaging time on the order of an hour (not including the time needed to correct for setup drifts). Inset: Pulse measurement scheme for measuring the NV center spin-state relaxation rate. An avalanche photodiode (APD) was used for NV-center red fluorescence detection. Reproduced with permission.[76] Copyright 2014, American Chemical Society.

OCC techniques can be exploited in quantum materials for sensing applications. Here, optical transitions to different vacancy center charge states can be monitored using photoluminescence spectroscopy. Optical pumping promotes excitation to luminescent “bright” charge states, and conversion to non-luminescent “dark” states can then be induced using a second lower-energy pulse. Sufficiently fast detectors can monitor the conversion from “bright” to “dark” states, and the rate of this conversion is highly sensitive to external microwave or radio-frequency electric fields.[57] An example of an OCC set-up and experimental data is shown in Figure 9, where silicon carbide (SiC) is used for electric field detection. Here, the reset illumination is either 365 or 405 nm, with a pump color of 976 nm. Illumination at 365 nm generates electron–hole (e–h) pairs that reset VV to the bright state VV0 in the steady state, illumination at 405 nm directly ionizes VV− (dark state) to VV0, and 976 nm excitation converts VV0 to VV− by direct two-photon ionization or indirectly by one-photon ionization of local traps. In addition, VV0 photoluminescence is provided through excitation at 976 nm. VV(0/−) is the transition energy level between the neutral and negatively charged states. VB and CB are the valence and conduction bands, respectively. An RF electric field (rms amplitude E, frequency fE) is applied across a coplanar capacitor with a 17 µm gap. VVs are created by carbon implantation immediately below the surface. A fast detector with 10 MHz bandwidth (BW) allows for direct detection of a full OCC transient signal in a single measurement. Figure 9b shows an example of data collected using this technique.
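
The stretched-exponential model used to describe such OCC transients (with the R = 42 kHz and n = 0.54 values quoted for the cited SiC experiment in the figure caption below; the time grid is arbitrary) can be written as a one-line function:

```python
import numpy as np

# Sketch of the stretched-exponential model for the OCC transient
# (bright -> dark charge conversion).  R and n are the values quoted for the
# cited SiC experiment; the time points are arbitrary.

def occ_transient(t_s, rate_hz=42e3, n=0.54, pl0=1.0):
    """Photoluminescence during charge conversion, normalised to its initial value."""
    return pl0 * np.exp(-(rate_hz * t_s) ** n)

t = np.linspace(0, 200e-6, 5)
print(occ_transient(t))
```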

Level anticrossing spectroscopy

a) Schematic of the optical setup for optical charge conversion of SiC. b) VV is first reset to its bright state (VV0) by 405 nm illumination, followed by PL detection (top) of the charge conversion toward the VV dark state by 976 nm excitation. Bottom shows the difference between conversion with and without applied electric field (10 MHz). Data are fitted to a stretched exponential function with R = 42 kHz and n = 0.54 obtained using a global fit for all electric field values. Reproduced with permission.[57] Copyright 2018, National Academy of Science.

Level anticrossing spectroscopy techniques that require only optical measurements, without the use of an external rf field, can also be deployed for sensing applications.[77] Here, ground state level anticrossings (GSLACs) between spin states, such as ms = −3/2 and ms = 1/2 or ms = −1/2, give rise to photoluminescence resonance peaks upon magnetic excitation. When an applied magnetic field is in resonance with a given GSLAC, the external magnetic field being sensed will lead to a deviation in the PL intensity, which can be measured using a lock-in in-phase photovoltage.[77] Figure 10 illustrates the level scheme for NV− centers in diamond. Here mS is the electron spin projection quantum number, Dg and De are the ground- and excited-state zero-magnetic-field splittings, ΩMW is the MW Rabi frequency, γg0 and γg±1 are the relaxation rates from the singlet state 1E to the triplet ground-state 3A2, and γe0 and γe±1 are the relaxation rates from the triplet excited state 3E to the singlet state 1A1. The GSLAC technique has been used to observe fluctuating magnetic fields at frequencies from 500 kHz to 8 MHz.[78]

a) Level scheme for an NV− center in diamond. b) Levels of the NV− centers' electron-spin magnetic sublevels in the ground state. c) Hyperfine level (|mS, mI⟩) anticrossing in the vicinity of the GSLAC. The degree of mixing near the GSLAC (denoted by the dashed ellipses) is indicated by the relative admixture of the colors in each curve; the lines corresponding to unmixed states do not change color. Reproduced with permission.[79] Copyright 2019, American Physical Society.

2.2.2 Quantum Sensing Materials

Atom/ion sensors

One of the more widely explored strategies for quantum sensing relies upon the atomic spins of neutral atoms, the electronic states and vibrational modes of trapped ions, and Rydberg states in Rydberg atoms, each of which can be initialized and read out using optical techniques.[6] Common neutral atoms are typically alkali metals, such as cesium or rubidium, which can be used as a thermal atomic vapor (typically in a cell with a buffer noble gas, such as argon or neon),[80] or they can be laser cooled in a vacuum chamber.[81] Atomic vapors are best known for their use in atomic clocks[82] and magnetometry,[83] while cold atom sensors are most famously used in commercially available gravimeters (see Section 2.4.1),[84] with applications in inertial sensing as well (e.g., navigation).[85]

A range of trapped ion sensors have been developed, using ions such as 88Sr+, 171Yb+, 24Mg+, and 9Be+, and have been studied as sensors for applied force,[86] spectroscopy,[87] electric and magnetic fields,[88] and as atomic clocks.[89] Another class of atom-based quantum sensors comprises Rydberg atoms, which are most frequently used to detect electric fields,[90] and in recent years have also been used in the detection of magnetic fields.[91] Atom and ion quantum sensors are among the most mature classes of quantum sensors, and are currently available commercially in atomic clocks and gravimeters/gradiometers (Section 2.4). However, barriers including the need for liquid helium temperatures (in the case of cold atoms) and/or vacuum conditions can hinder applications in the harsh environments (e.g., elevated temperature, pressure, corrosive conditions, etc.) that are often encountered for energy applications. Consequently, solid-state qubits have received increasing attention for quantum sensing, and these materials are discussed in subsequent sections.

Carbon nanodiamonds

Carbon nanodiamonds have emerged as an exciting quantum material with demonstrated sensing efficacy in the detection of temperature,[54-56, 71, 92] strain,[65, 66] pH,[62, 63] electric[93, 94] and magnetic[60, 61] fields, spin,[95] thermal conductivity,[64] and the phases of water molecules,[96] even under harsh conditions such as high pressure.[97] Quantum emission is enabled via defect sites within the nanodiamond; these “color centers” are lattice defects from carbon vacancies and atomic impurities such as nitrogen[34, 74, 98] (NV), silicon[99, 100] (SiV), tin[71, 101] (SnV), germanium[100, 102] (GeV), and other atoms.[103, 104] Luminescence from single defect sites in nanodiamond can be experimentally observed,[73] and changes in luminescent features such as the ZPL energy, width, or amplitude may respond to environmental factors such as temperature, enabling them to act as luminescent sensors.[55, 71] For example, nanodiamonds (diameter <250 nm) containing SiV centers exhibit a narrow emission band at their ZPL (≈740 nm) upon excitation with a 532 nm laser, with a broader, low-intensity phonon sideband (PSB) peak centered at ≈765 nm. Changing the temperature of the nanodiamond impacts the ZPL amplitude and both the energy and width of the PSB and ZPL emission peaks.[72] Consequently, Plakhotnik and co-workers used a multiparametric technique analyzing the SiV nanodiamond ZPL position, width, and amplitude in tandem as a function of temperature, which enabled the detection of temperature changes as low as 0.4 °C within 1 ms.[72] Similarly, Akimov's group used the ZPL position and peak width of GeV-center nanodiamonds to monitor temperature.[92] With increasing temperature, a red-shift and broadening of the ZPL peak was observed, which enabled temperature changes as small as 0.1 K to be detected within a range of 150 to 400 K, with the ability to operate over a total range of 4 to 860 K (Figure 11).[92] Similar fluorescence-based detection of temperature changes has also been accomplished using NV−[55, 105] and SnV[71] centers. The use of high-performance thermometers is particularly useful for fossil energy applications, including monitoring the health of power grids[23] (i.e., temperature in transformers[22] and transmission lines), oil refining processes, pipeline integrity,[106, 107] and coal mine safety,[108] among others. The widespread deployment of quantum thermometers requires the sensor platform to be made sufficiently economical to compete with existing technologies. Additionally, demonstrated superior sensitivity and the ability to function under harsh conditions (i.e., temperatures >250 °C, high pressure, corrosive environments, etc.) will be crucial for NV− center-based thermometers to be commercially viable.

a) Plot of GeV ZPL peak position shift in GHz as a function of temperature. b) Plot of the change in ZPL width in GHz as a function of temperature. c) Photoluminescence spectra of the GeV nanodiamonds upon 532 nm laser excitation, monitored at various temperatures. Reproduced with permission.[92] Copyright 2017, American Chemical Society.

In the case of the negatively charged NV− centers and the more recently characterized SiV− centers,[109] the diamond's fluorescence intensity is also sensitive to microwave radiation, providing an additional sensing mechanism via ODMR experiments (vide supra).[34] While NV0 and NV+ centers are known, their optical and magnetic activities are poorly understood[34, 110] and studies on these centers have thus been limited.[111] Consequently, quantum sensing experiments typically use only the NV− center. The sensitivity of the ODMR peak separation as a function of magnetic field strength has made NV− centers obvious candidates for use in magnetometry[112-114] applications. A simple example of this phenomenon is shown in Figure 12, in which the frequency separating two spin transitions in a single NV− center increases as a function of increasing magnetic field, demonstrating how NV− can be utilized to determine the strength of an external magnetic field with high spatial resolution (angstrom scale).[112] The ODMR readout of magnetic fields has also been extended for more advanced applications including time-resolved magnetometry,[61] nuclear magnetic resonance spectroscopy techniques,[34, 115] detection of electronic and nuclear spins,[116] and magnetic imaging.[112, 117]

ODMR spectrum illustrating the impact of external magnetic field strength on the splitting and frequency of spin transitions in a single NV− center. The frequency separating the transitions increases with increasing magnetic field strength. Reproduced with permission.[112] Copyright 2008, Springer Nature.

The diamond NV center consists of a nitrogen atom neighboring a vacancy and three carbon atoms. The defect has C3v symmetry, and its electronic states can be labeled by the irreducible representations of the C3v point group. In simplified form, the ground-state spin Hamiltonian of the NV center can be described as

H = D Sz^2 + γe B · S + ϵ E · S

The first term is the zero-field splitting term, with D = 2.87 GHz; the second term is the magnetic (Zeeman) component, where B is the vector magnetic field and γe the electron gyromagnetic ratio; and the third term describes the electric field component, where E denotes an electric field and ϵ denotes a coupling constant. The zero field splitting term D is sensitive to strain, pressure, and temperature.[34] Because each component of the Hamiltonian is sensitive to different parameters, it is possible to extend the sensing efficacy of NV− centers beyond magnetic sensing, particularly in the presence of weak external magnetic fields or if the experiment is designed to cancel the effects of external magnetic fields,[66] because the Zeeman effect is significantly stronger than other splitting effects (such as the Stark effect, vide infra) in the ground state.[34, 93, 94] Of course, this responsiveness to other parameters poses significant cross-sensitivity challenges because multiple variables may influence the sensor response, and additional research is needed to isolate variables of interest before NV− systems may be deployed in complex environments.
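
To make the first two terms concrete, the following numerical sketch (axial field only, electric-field term omitted, standard NV− constants; not code from the review) diagonalizes the spin-1 ground-state Hamiltonian and returns the two ODMR transition frequencies:

```python
import numpy as np

# Numerical sketch of the simplified NV- ground-state Hamiltonian
# H = D*Sz^2 + gamma_e*B*Sz (axial field only, electric-field term omitted),
# written in the S = 1 basis {|+1>, |0>, |-1>}.  Units: Hz and tesla.

D_HZ = 2.870e9
GAMMA_E = 28.024e9

SZ = np.diag([1.0, 0.0, -1.0])          # spin-1 Sz operator (in units of hbar)

def odmr_transitions(b_axial_tesla):
    """Frequencies of the |0> -> |-1> and |0> -> |+1> transitions, in Hz."""
    H = D_HZ * SZ @ SZ + GAMMA_E * b_axial_tesla * SZ
    energies = np.sort(np.linalg.eigvalsh(H))   # [E(0), E(-1), E(+1)] while gamma_e*B < D
    e0 = energies[0]
    return energies[1] - e0, energies[2] - e0

# At 1 mT the two transitions split symmetrically about D by ~28 MHz each.
print([f / 1e9 for f in odmr_transitions(1e-3)])
```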

Several strategies may be employed to mitigate or insulate the NV center from magnetic fields, enabling the ODMR technique to be applied to detect other experimental variables of interest.[118] One method involves the alignment of an external magnetic field in the nonaxial plane (Bz = 0), leading to the observation of only the central hyperfine transition in the ODMR spectrum, which is susceptible to splitting with electric field and strain.[93] Once the magnetic field is mitigated, shifts in the NV− ground triplet-state spin sub-levels resulting from an external electric field can be monitored in the ODMR spectrum due to the Stark effect, enabling electric field detection.[93, 94] It must be emphasized that electric field sensing using this technique may only occur when the internal electric or strain fields are significantly stronger than the magnetic field along the NV− axes.[94] Hence, careful alignment of the external magnetic field to the nonaxial plane is critical for isolating the electric field variable, and the use of a magnetically shielded experimental design may further improve the use of NV− centers in electrometer and strain sensing applications.[93] ODMR can also be used to provide a readout of lattice strain. For example, cantilever-based experiments have been used to apply longitudinal and axial strain on NV− nanodiamonds, producing both a shift in the frequency of the zero-field ODMR line and splitting.[65, 66, 119] Because lattice strain can also be thermally induced, external temperature may also be probed with high spatial resolution by monitoring the frequency of the NV− ODMR spectra as a function of temperature following strain-induced hyperfine splitting.[54] It has also been demonstrated that the frequency of the ODMR splitting parameter (D) will linearly shift to higher frequencies with increasing pressure, with a corresponding blue shift in the ZPL; hence, NV− nanodiamonds represent a versatile material for pressure sensing using optical techniques.[120] Thus, although the NV− system is versatile, a key barrier to the development of practical NV− sensors is that multiple variables can simultaneously influence the ODMR response, and additional strategies are needed to develop sensors that will only respond to single variables of interest.

Quantum enhanced sensing and imaging with single spins in diamond - Fedor Jelezko

In addition to ODMR, an emerging method for all-optical sensing using diamond is spin relaxometry. In the presence of nearby magnetic molecules such as Mn2+, O2, and Gd3+, the longitudinal relaxation rate R1 increases and the longitudinal relaxation time T1 decreases, enabling NV− nanodiamonds to be deployed as ion and molecule sensors (see, for instance, Figure 8).[63, 75, 76] Subsequent works have placed Gd3+ near the NV− center for relaxometry-based sensing of pH and redox potential. Here, pH or redox-active polymers were used to selectively and reversibly release Gd3+ from the nanodiamond surface into solution (i.e., via redox-induced disulfide cleaving) as a function of pH or redox potential, leading to a response in T1.[63] The Brownian motion of a Gd3+ complex in solutions of different viscosities was also monitored by probing the T1 of the nanodiamond NV− center.[121]

Another NV center approach gaining momentum is NV center NMR spectroscopy, which allows discrimination of atomic species at the nanoscale level. The basic notion is that the application of short pulses allows one to set up a frequency-dependent effective proton field in the vicinity of the NV center. Several methods such as pulse-echo and dynamical decoupling have been demonstrated for NMR sensing.[122, 123] Examples of the methods used for ODMR, spin relaxometry, and NMR are shown in Figure 13. Both bulk diamond and nanodiamond NMR find applications in various areas of applied research.[124] Research is ongoing to discover and characterize new color centers[104] and to integrate well-studied systems such as NV nanodiamonds into devices[125, 126] relevant for sensing applications, including portable systems,[50, 52, 114, 127] which represent exciting next steps in the integration of this quantum material into practical technologies.
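
A toy sketch of the frequency-matching idea behind such pulsed protocols (the 1/(2τ) selection rule and the gyromagnetic ratios are standard textbook values; this is not code from the cited works):

```python
# Sketch of the frequency-selection rule behind pulsed dynamical-decoupling
# NMR sensing: a sequence with inter-pulse spacing tau is most sensitive to
# signals oscillating at roughly 1/(2*tau).  Gyromagnetic ratios in Hz/T.

GYRO = {"1H": 42.577e6, "13C": 10.705e6, "19F": 40.078e6}

def larmor_hz(species, b_tesla):
    return GYRO[species] * b_tesla

def pulse_spacing_for(species, b_tesla):
    """Inter-pulse spacing tau (s) that tunes the sequence to that species' Larmor frequency."""
    return 1.0 / (2.0 * larmor_hz(species, b_tesla))

# Example: protons at a 10 mT bias field precess at ~426 kHz, so the pulses
# should be spaced roughly 1.2 microseconds apart.
b = 10e-3
print(f"1H Larmor: {larmor_hz('1H', b)/1e3:.0f} kHz, tau: {pulse_spacing_for('1H', b)*1e6:.2f} us")
```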

a) Optically detected magnetic resonance sensing algorithm. Reproduced with permission.[128] Copyright 2013, Materials Research Society. b) Spin relaxometry detection scheme. Reproduced with permission.[75] Copyright 2013, Springer Nature. c) NV center nuclear magnetic resonance detection algorithm. Reproduced with permission.[122] Copyright 2013, American Association for the Advancement of Science.

Color centers in SiC have emerged as intriguing quantum materials that exhibit similar properties to nanodiamonds (including readouts by ODMR) coupled with the scalability and processability of Si.[37, 53, 129] There are over 200 polytypes of SiC with a wide array of reported color centers, producing a rich space of potential properties.[37, 53] In particular, 6H-SiC and 4H-SiC, the most commonly studied SiC polytypes, exhibit near-infrared emission that minimizes absorption in optical fibers, facilitating integration into optical devices for sensing applications.[53, 130] Consequently, SiC-based quantum sensors have been explored for measuring magnetic fields,[131, 132] temperature (10–310 K),[133] strain,[134] and electric fields,[57, 58] among others. The silicon monovacancy (Vsi−) and the divacancy (VV, with missing adjacent Si and C atoms) are the two most well-studied SiC centers for quantum applications.[37, 98] The neutral divacancy VV has a spin quantum number of 1 (S = 1), with optical properties that are similar to NV− vacancies in diamond. The negatively charged silicon monovacancy Vsi− is an S = 3/2 system, leading to differences in electronic structure relative to VV (Figure 14).[98, 130, 135] Both centers exhibit ODMR (with vacancy-dependent differences in ODMR features) that can be exploited for sensing applications. Additionally, unlike the NV− center in diamond, Vsi− spins exist only in one orientation, which simplifies interpretations of magnetic resonance transitions.[136] The ODMR method has specifically been used for magnetometry,[53, 59, 77] thermometry,[53, 137] strain characterization,[131, 138, 139] and electrometry,[139] using similar techniques to those described above for NV− diamond centers.

Simplified electronic structure diagram of the S = 1 neutral divacancy VV (left) and S = 3/2 negatively charged silicon monovacancy Vsi− (right). Solid lines and dashed lines denote radiative and nonradiative transitions, respectively; the thick dashed arrow represents the most probable intersystem crossing. The number of filled circles represents the populations of the ground-state energy levels under optical pumping. Adapted with permission.[130] Copyright 2011, American Physical Society.

As with NV− centers, because SiC is responsive to a multitude of variables, care must be taken during experimental set-up to isolate the variables of interest. In addition to strategies discussed in the nanodiamond section (vide supra), one possible advantage in SiC is the multitude of addressable high spin centers that may exist in a single SiC crystal. It has been shown that certain centers exhibit low sensitivities to temperature, for example, rendering them excellent candidates for magnetometry applications, whereas other centers are highly sensitive to temperature changes and are less sensitive to magnetic fields, making these centers preferable for thermometry applications.[53] The ability to selectively engineer specific color centers for specific sensing targets would represent a critical step in advancing the viability of ODMR-based quantum sensors. Continued research and development of SiC centers that are sensitive to only one variable is therefore crucial.

OCC can also be exploited for sensing applications (see Section 3.2) using SiC. For example, Figure 9 demonstrates a set-up in which the OCC is used to image acoustic waves on piezoelectric surface-acoustic wave (SAW) devices.[57] OCC has also been used for strain sensing applications.[134] A recent update has improved the sensitivity of this technique by applying a reference electric field during the measurement.[58]

This high spin VSi− center can be further exploited for the detection of magnetic field, temperature, and strain using level anticrossing spectroscopy techniques (see Section 3.2 and Figure 10).[77] For example, Figure 15 illustrates the sensitivity of the Vsi− PL to temperature.[77] This technique has also been exploited for both magnetometry[77, 140] and thermometry[133] applications.

a) Spectrum of the lock-in detection of Vsi− PL variation taken at different temperatures, with a sharp luminescent resonance observed at the ground-state level anticrossing between ms = −3/2 and ms = +1/2 (GSLAC-2) at BG2 = 1.25 mT. b) Plot of the in-phase lock-in voltage component, Ux, and quadrature lock-in voltage component, Uy, at different magnetic fields, increased in sub-μT increments every 125 s. The horizontal line denotes the Uy mean value. Reproduced with permission.[77] Copyright 2016, American Physical Society.

SiC is quickly emerging as a versatile material for quantum sensing applications, with advantageous properties including optical fiber-relevant emission wavelengths, processability, and the ability to operate at room temperature in ambient conditions. Research focused on integrating SiC color centers into devices,[125, 141] improved material processing,[129, 132] and the discovery of new optically active vacancy centers[106, 142] should further expand the utility of SiC centers in energy-based sensing applications.

The line-like emission bands from lanthanide ion f-orbitals, which are shielded from the environment by their outermost filled s and p orbitals, have long been valued in luminescent applications such as biological imaging, lighting displays, and sensing.[143] Their narrow emission bands and high degree of shielding have led to recent exploration of lanthanide ions doped into solid crystals for quantum applications, and while these transitions are parity forbidden in free space, they become weakly allowed when the ions are embedded in a crystal field, particularly at cryogenic temperatures.[98, 144] Despite their relatively weak emission,[98] the long nuclear spin relaxation and coherence times of rare earth ions,[98] coupled with relatively facile synthetic strategies,[144] have spurred significant interest in developing rare earth-based quantum materials. While not extensively evaluated for sensing applications, a range of rare earth-ion doped solids have been produced, including europium and erbium-doped yttrium oxide,[145] europium-doped yttrium silicate (Y2SiO5),[146] samarium nickelate,[147, 148] cerium-[149] and praseodymium[150, 151]-doped yttrium aluminum garnet (YAG), thulium-doped lithium niobate,[152] and others.[144]

How to grow flawless/IF diamonds! And Quantum Computers

Similar to color centers in diamond and silicon carbide, rare earth-doped solids can exhibit ODMR (Figure 16), suggesting potential utility in sensing applications. Optical readouts of individual REEs have been demonstrated in Ce(III)-,[149] Pr(III)-,[150] and Tb(III)-doped YAG,[153] Nd(III)-doped niobium,[154] Eu(III)-,[155] and Yb(III)-doped Y2SiO5,[156] among others. Quantum sensing of external magnetic fields has been demonstrated for 151Eu3+ doped into Y2SiO5,[146] and the sensitivity of optically measured level anticrossings to external magnetic fields has been studied in holmium(III)-doped 7LiYF4,[157] providing early promise that rare earth ion-doped solids may have utility as quantum sensors.

ODMR signal of a single Ce3+ ion (in YAG) as the frequency of microwaves (MW) is swept across the spin resonance. Inset shows level structure of the ground-state and the optically excited spin doublets. The red line is the Lorentzian fit to the observed signal. Measurements were conducted at ≈3.5 K. Reproduced with permission.[149] Copyright 2014, Springer Nature.

Perovskite materials of the general formula ABO3, where A and B are metal atoms, are commonly used in a variety of classical sensing applications.[158] An interesting subclass of perovskites (such as metal nickelates) has unfilled d-orbitals in which a single electron influences the behavior of surrounding electrons. These strongly correlated quantum materials exhibit unique, complex properties that have utility for sensing applications, particularly in harsh conditions.[159] Sensing using these perovskites involves manipulating the material's electronic orbitals under applied bias, which can be explained by semiclassical theory. For example, biomolecules[148] and electric fields[147] can be detected by exploiting the strongly correlated electronic system of samarium nickelate (SNO). Under negative bias in the presence of protons, hydrogen intercalates into the SNO lattice (HSNO, Figure 17a), accompanied by electrons formed at the counter-electrode. The incoming electrons modify the Ni 3d orbital, creating half-filled eg orbitals with electrons that are localized due to strong Coulombic repulsion (Figure 17b), which in turn leads to an increase in resistivity in response to external electric fields (Figure 17d).[147] This work was extended to biological systems, where SNO was functionalized with a glucose oxidase enzyme. In the presence of glucose concentrations as low as ≈5 × 10−16 m, a spontaneous proton transfer in the presence of the enzyme took place, again increasing the resistivity of the HSNO.[148]

Quantum dots

a) Schematic illustrating electronic field sensing using the strongly correlated quantum perovskite SmNiO3 (SNO). Schematics of the electronic structure of Ni 3d orbitals in b) hydrogenated and c) pristine SNO. The electrons become localized in HSNO owing to the strong Coulomb repulsion in doubly occupied eg orbitals above the t2g orbitals. U represents the on-site electron–electron correlation. d) The experimentally observed change in the electrical resistance (ΔR) for bias potentials from 0.5 V to 5 mV, with error bars representing standard deviations. The dashed line is a linear extrapolation estimate of the resistance change beyond the experimentally measured window. This measurement range spans the bioelectric potentials generated by different maritime vessels and several marine animals, as marked. UUV denotes an underwater unmanned vehicle. Reproduced with permission.[147] Copyright 2019, Springer Nature.

Single photon emission has been reported for atomically thin transition metal dichalcogenides (TMDCs, e.g. MoSe2),[160] 2D materials including hexagonal boron nitride (hBN),[161, 162] colloidal quantum dots (such as ZnS[163] and InP/ZnSe[164]), and stacked, semiconductor quantum dots such as InAs.[165] While a variety of “quantum dots” with high-performance emission properties have been developed, for QIS applications the quantum dots must exhibit the controlled emission of a single photon per light pulse, produce photons in an indistinguishable quantum state, and emit with high efficiency.[166, 167] While single photon emission has been achieved with multiple quantum dot systems, epitaxially grown quantum dots such as InAs in GaAs have, to date, been the most widely used for single photon applications.[167] A lattice-mismatch strain-driven approach is used to spontaneously form monodisperse islands with quantum confinement in three dimensions, yielding single photon emission within the telecom band.[168] Together, single photon emitting-quantum dots represent another emerging quantum material with potential for sensing applications.[6, 37, 125] Similar to the rare earth-doped solids (vide supra), reports of quantum sensors based on quantum dots are limited. However, room temperature ODMR has been observed for boron vacancy sites in hBN with S = 1 spin, indicating that these materials may be deployed as optical sensors analogous to color centers in SiC and diamond.[161] Additionally, it has been demonstrated that the emission properties of hBN are sensitive to external magnetic fields, further reinforcing the promise of hBN as quantum magnetometers.[169] Further, the hBN single-photon emission is sensitive to both external electric fields[170] and strain from acoustic waves,[171] suggesting that this material may be suited for electrometry and strain sensing applications (Figure 18).

hBN single photon emission measurements (dots) and fits (curves) with (orange) and without (blue) an applied external field. A Stark shift of 5.9 ± 0.6 nm is recorded. Reproduced with permission.[170] Copyright 2019, American Physical Society.

In addition to hBN, a mechanical motion sensor based on InAs quantum dot single-photon emission has been developed. Here, charged InAs quantum dots are coupled to a mechanical resonator, such as a tuning fork. In the presence of a magnetic field, the InAs emission bands shift in response to strain induced by vibrations from the mechanical resonator.[165] In addition to mechanical force, shifts in the InAs optical features in response to an AC field have been reported,[172] as well as Zeeman effect splitting in response to an external magnetic field.[173] Greater control over the ability to both manipulate and monitor the quantum states of InAs quantum dots should therefore yield a new sensing platform for magnetometry and electrometry.

2.2.3 Emerging Quantum Material Platforms

While most quantum sensing studies have predominantly relied on vacancy sites in diamond and SiC, new material platforms are emerging that, once fully developed, may show utility for quantum sensing in energy-relevant applications. Qubits, including quantum dots,[174] nuclear spins,[175] and electronic spins,[176] have been studied in silicon matrices, which are used extensively in the production of electronics. However, the quantum properties of these materials have only been characterized at temperatures of ≈4 K or lower. Single photon emission has recently been characterized in carbon nanotubes, which are already frequently used in imaging and sensing applications and have telecommunications-relevant near-infrared emission properties.[125, 177] The integration of qubits into MOFs is also a promising avenue for spatially controlling the arrangement of qubit arrays. MOFs are highly organized, porous, crystalline structures composed of metal cluster centers connected by organic molecular linkers, and have been extensively used for sensing applications,[178] including in energy-relevant areas such as the detection of gases,[179] pH,[180] and rare earth ions.[181] Hence, the integration of qubits into the MOF structure may pave the way for the quantum sensing of analytes taken into the MOF pores, thereby enhancing existing MOF-based sensing technologies.[182] Several MOFs to date have been designed with highly ordered qubit arrays based on cobalt(II),[183] copper(II),[182, 184] and vanadyl[185] spins, representing an intriguing class of next-generation quantum materials (Figure 19). Beyond the development of new quantum materials, another emerging area of research couples quantum emitters with plasmonic materials (e.g., quantum plasmonic sensing), which promises significant improvement in measurement sensitivity.[186] Further advances in the development of both quantum sensing materials and strategies are anticipated to expand the accessibility and efficacy of quantum sensing technologies for energy-relevant applications.

Schematic illustrating an array of qubits embedded within an MOF. The organized MOF structure provides excellent spatial control of the qubit position. The qubits may interact with analytes of interest (gas, ions, etc.) entering the MOF pore. Reproduced with permission.[183] Copyright 2017, American Chemical Society.

2.2.4 Summary of Quantum Sensing Materials and Potential Energy Applications

The quantum materials described here can be envisioned for fossil energy applications, including the continuous measurement of variables such as pressure, temperature, and pH/corrosion around energy infrastructure (e.g., pipelines, storage areas, wells, etc.). Table 3 summarizes the readout techniques, materials, and sensing targets of the quantum sensing approaches discussed above, highlighting the exciting potential of quantum material-based sensors. For example, corrosion in oil and gas pipelines causes billions of dollars in losses annually, and is also a safety and environmental hazard.[187] Among many potential readouts, quantum sensors for pH and/or ions can provide an indirect early sign of corrosion, enabling corrective action to be taken prior to infrastructure degradation.[187] Electromagnetic sensors have also been employed to monitor corrosion within pipelines.[187] Highly sensitive electric and magnetic field sensing can be achieved using a wide range of quantum materials and techniques, creating a strategy for potential sensor development in this area. As discussed in Section 2.2.2 (ii), temperature sensors are used to monitor the performance of transformers and powerlines, as well as explosion risks in coal mines and oil processing facilities, which are all essential considerations for fossil energy.

The detection of magnetic ions may also be useful for monitoring corrosion in real time and in the sensing of critical metals such as rare earth elements (REEs), which are found in coal, fly ash, and acid mine drainage.[181] Commercially available quantum gravimeters and atomic clocks are already being used for oil and gas exploration, as discussed in Section 2.4. The integration of quantum materials into sensing-relevant platforms and devices (discussed in the next section) represents an exciting new frontier in the development of high-performance field-deployable sensors for energy applications and beyond.


2.3 Quantum Photonics Sensing Devices and Platforms

2.3.1 Quantum Photonics Platform

The quantum photonics field, while sharing similarities with electron-based quantum physics, has unique features arising from the photon nature of light. These features offer new and exciting avenues for manipulating fundamental physical processes such as photon absorption and emission. The electromagnetic wave itself is not a quantum wave, since a quantum wave must be complex. However, representing the field in second-quantized notation as $\hat{E} \propto (\hat{a}^{\dagger} + \hat{a})$, where the $\hat{a}^{\dagger}$ (creation) and $\hat{a}$ (annihilation) operators are complex quantities, allows us to access a vast array of nonlinear and light–matter coupled Hamiltonian treatments. As an example, the Jaynes–Cummings Hamiltonian

$$H = \hbar\omega\,\hat{a}^{\dagger}\hat{a} + \tfrac{1}{2}\hbar\omega_{0}\,\hat{\sigma}_{z} + \hbar g\,(\hat{a}\,\hat{\sigma}_{+} + \hat{a}^{\dagger}\hat{\sigma}_{-}),$$

where $\hat{\sigma}_{+}$ and $\hat{\sigma}_{-}$ are the atomic raising and lowering operators, leads to nontrivial light–matter coupling directly manifested via observed Rabi oscillations.
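To make the light–matter coupling above concrete, the following minimal sketch (Python with NumPy, all parameter values assumed for illustration) builds the Jaynes–Cummings Hamiltonian in a truncated Fock basis and evolves an initially excited emitter in an empty cavity, reproducing the vacuum Rabi oscillation P_e(t) = cos²(gt) on resonance.

```python
# Minimal sketch (assumed parameters): vacuum Rabi oscillations from the
# Jaynes-Cummings Hamiltonian built in a truncated Fock basis (hbar = 1).
import numpy as np

omega = 1.0      # cavity frequency (arbitrary units, assumed)
omega0 = 1.0     # emitter transition frequency (resonant case, assumed)
g = 0.05         # light-matter coupling strength (assumed)
n_fock = 5       # photon-number truncation

dim = 2 * n_fock
def idx(n, s):   # basis |n, s>: n photons, s = 0 (ground) or 1 (excited)
    return 2 * n + s

H = np.zeros((dim, dim))
for n in range(n_fock):
    H[idx(n, 0), idx(n, 0)] = omega * n - 0.5 * omega0
    H[idx(n, 1), idx(n, 1)] = omega * n + 0.5 * omega0
    if n + 1 < n_fock:                       # coupling g (a sigma+ + a^dag sigma-)
        H[idx(n, 1), idx(n + 1, 0)] = g * np.sqrt(n + 1)
        H[idx(n + 1, 0), idx(n, 1)] = g * np.sqrt(n + 1)

# Evolve |0 photons, excited emitter>; on resonance P_e(t) = cos^2(g t)
evals, evecs = np.linalg.eigh(H)
psi0 = np.zeros(dim); psi0[idx(0, 1)] = 1.0
for t in (0.0, np.pi / (4 * g), np.pi / (2 * g)):
    psi_t = evecs @ (np.exp(-1j * evals * t) * (evecs.T @ psi0))
    print(f"t = {t:7.2f}  P(emitter excited) = {abs(psi_t[idx(0, 1)])**2:.3f}")
```

On resonance the excitation is fully swapped into the cavity after a quarter Rabi period, which is the basic exchange exploited by the cavity-coupled platforms discussed below.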

Several methods for manipulating solid-state emission have emerged over the years. Each of these approaches has different critical properties, such as achievable enhancement, outcoupling efficiency, collection efficiency, and ease of fabrication, which can complement each other in the future for integrated quantum computing and sensing platforms.

One appealing platform for quantum sensing and computing is a photonic crystal. This device class mimics the solid-state system; however, unlike atomic crystals, the unit cells can be tailored according to specifications for optimized performance in desired applications. Specifically, Bloch's theorem, adapted for electromagnetic fields, leads to a reciprocal-space description and the appearance of a bandgap at the unit cell (Brillouin zone) boundaries. A number of geometries can be realized, such as simple dielectric layered structures for 1D crystals,[189] periodic rods or hole arrays for 2D geometries (Figure 20),[190] and inverse opal and Yablonovite structures for 3D realizations.

Energy level structure of a 2D air rod crystal. Reproduced with permission.[188] Copyright 2008, Princeton University Press.
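To illustrate how a bandgap arises in the simplest (1D) case, the hedged sketch below (Python, assumed quarter-wave layer parameters) evaluates the standard transfer-matrix dispersion relation for a two-layer periodic stack at normal incidence and reports the normalized frequency range over which propagation is forbidden.

```python
# Minimal sketch (assumed quarter-wave stack): locate the 1D photonic bandgap
# from the transfer-matrix dispersion relation
#   cos(K a) = cos(k1 d1) cos(k2 d2) - 0.5 (n1/n2 + n2/n1) sin(k1 d1) sin(k2 d2),
# where |RHS| > 1 means the Bloch wavevector K is complex (no propagation).
import numpy as np

n1, n2 = 1.5, 3.0                           # layer refractive indices (assumed)
lam0 = 1.55                                 # design wavelength, micrometers (assumed)
d1, d2 = lam0 / (4 * n1), lam0 / (4 * n2)   # quarter-wave layer thicknesses
a = d1 + d2                                 # lattice period

def rhs(norm_freq):
    """RHS of the dispersion relation at normalized frequency a / lambda."""
    k0 = 2 * np.pi * norm_freq / a
    return (np.cos(n1 * k0 * d1) * np.cos(n2 * k0 * d2)
            - 0.5 * (n1 / n2 + n2 / n1) * np.sin(n1 * k0 * d1) * np.sin(n2 * k0 * d2))

freqs = np.linspace(0.05, 0.45, 2000)       # scan around the lowest gap only
gap = freqs[np.abs([rhs(f) for f in freqs]) > 1.0]
print(f"lowest bandgap: a/lambda ~ {gap.min():.3f} to {gap.max():.3f}")
```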

The most attractive feature of photonic crystal devices is the ability to engineer the photonic bandgap, which physically implies the prohibition of light propagation for a specific band of frequencies. In this way, 2D and 3D cavities with a very high quality factor (Q) can be designed and optimized for a particular color center spectrum. Furthermore, a localized defect state with emission frequency within a bandgap can be created in the photonic crystal lattice. The combined effects of high Q and mode localization result in a large emission rate enhancement via the Purcell effect, with Purcell factor $F_{P} = \frac{3}{4\pi^{2}}\left(\frac{\lambda}{n}\right)^{3}\frac{Q}{V}$, where $\lambda/n$ is the wavelength in the medium and $V$ is the cavity mode volume.
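As a rough illustration of the scaling implied by the Purcell factor, the short snippet below (assumed cavity figures of merit, not values from any specific reference) evaluates F_P for a diamond cavity with a mode volume of about one cubic wavelength in the material.

```python
# Minimal sketch (assumed values): Purcell factor F_P = (3/4 pi^2) (lambda/n)^3 Q/V.
import numpy as np

lam_um = 0.637                 # NV zero-phonon line wavelength, micrometers (assumed target)
n = 2.4                        # refractive index of diamond
Q = 1.0e4                      # cavity quality factor (assumed)
V_um3 = (lam_um / n) ** 3      # mode volume ~ one cubic wavelength in the medium (assumed)

F_P = (3.0 / (4.0 * np.pi ** 2)) * (lam_um / n) ** 3 * Q / V_um3
print(f"Estimated Purcell factor: {F_P:.0f}")   # ~760 for these assumptions
```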

Several photonic crystal structures utilizing diamond color centers have been developed (Figure 21). Very high quality factor diamond cavities optimized using a reactive ion etching (RIE) plasma process have been demonstrated.[192] Additionally, a pathway to fabricating high Q cavities using monolithic diamond has gained significant research attention.[193] Alternatively, scanning of the cavity position relative to the emitter allows for evanescent coupling of the NV center to the photonic crystal.[194] Enhancement of emission by photonic crystal defect design and mode volume reduction has also been proposed.[195] Further, efficient integration with external waveguides as well as NV center coupling via photonic crystal waveguides has been realized,[196] and path entanglement using photonic crystal waveguide cavities in a diamond Si vacancy system has been demonstrated.[197]

Diamond photonic crystal cavities: a) Narrowband transmission of transverse magnetic (TM) and transverse electric (TE) cavity modes. Reproduced with permission.[192] Copyright 2014, Springer Nature. b) Energy density for fundamental photonic crystal cavity mode in cross section and c) in plane. Reproduced with permission.[193] Copyright 2010, American Chemical Society. d) Monocrystalline diamond photonic crystal cavity. Reproduced with permission.[194] Copyright 2012, American Physical Society.

Another avenue for integrated sensing modalities is through the use of solid dielectric high Q cavities. Several methods relying on whispering gallery modes, such as spherical, ring, and disk resonators, as well as Mie nanoparticle resonators, have been developed in the past decade.[256-259] The cavities allow for the creation of dressed states, $|\pm, n\rangle = (|e, n\rangle \pm |g, n+1\rangle)/\sqrt{2}$, coupled matter and photon number ($n$) states with highly attractive properties for quantum applications. Their interaction causes energy level splitting, defined as the vacuum Rabi splitting $2g\sqrt{n+1}$ (equal to $2g$ for $n = 0$).

One of the most attractive applications of these states is the potential to use them as single-photon controlled gates, which are required for quantum computing. Photonic cavities allow the direct achievement of strong coupling in the quantum electrodynamics (QED) regime, but ensuring high Q confinement, particularly at optical frequencies, is a major engineering challenge.

In the context of diamond photonics, NV center QED Rabi frequency splitting has been demonstrated using silica microsphere resonators with diamond nanocrystals (Figure 22).[198] In a separate work, microdisk resonators were shown to exhibit QED path interference effects with diamond crystals placed on the surface of the cavity.[199] In the microwave domain, Kubo et al.[200] have experimentally shown hybrid NV center/Josephson junction superconducting cavity quantum circuits that take advantage of the long coherence time of solid state qubits.

a) Geometry of the high-Q whispering gallery mode spherical cavity. b) Demonstrated Rabi frequency splitting. Reproduced with permission.[198] Copyright 2006, American Chemical Society. c) Microdisk coupled to external nanodiamond emitter. Reproduced with permission.[199] Copyright 2009, The Optical Society of America.

Another manifestation of strong photon–matter interactions, electromagnetically induced transparency (EIT), has shown great promise for applications in quantum sciences. Here, destructive interference induced by an external control field drives a multilevel solid-state system (a Λ system) into a quantum eigenstate with zero amplitude in one of the bare states, so the quantum mechanical interactions and the corresponding absorption associated with that state are canceled. For example, all-optical field sensing using EIT states in diamond has been demonstrated.[201]

"Quantum Sensing: Probing biological systems in a new light’", presented by Peter Maurer

2.3.2 Platforms for Engineering of Local Spontaneous Emission

Spontaneous emission, while deceptively simple and commonplace, is an intrinsically quantum process. Viewed through the Weisskopf–Wigner theory using the interaction Hamiltonian, spontaneous emission leads to an exponential decay of the excited-state population to the ground state. The possibility of having cascaded and coupled superradiant emission makes the physics behind it even more nontrivial.[202] Modification of the local density of states of spontaneous emission is a promising route to access and engineer this phenomenon.

Hyperbolic metamaterials (HMMs) offer a new and exciting way of manipulating nanoscale emission.[203] In these structures, emission enhancement comes from a divergence of the dispersion relation along selected k-vector directions. These materials can be described by anisotropic effective permittivity and permeability tensors, e.g., $\varepsilon = \mathrm{diag}(\varepsilon_{x}, \varepsilon_{x}, \varepsilon_{z})$. For TM-polarized waves the dispersion relation takes the form $\frac{k_{x}^{2} + k_{y}^{2}}{\varepsilon_{z}} + \frac{k_{z}^{2}}{\varepsilon_{x}} = \frac{\omega^{2}}{c^{2}}$, and the isofrequency surface becomes hyperbolic (diverges) when $\varepsilon_{x}$ and $\varepsilon_{z}$ are of opposite sign. Most commonly, the sign reversal is achieved using metal/dielectric layered materials or structures with metal rods embedded in a dielectric matrix (Figure 23).
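A hedged sketch of the layered-HMM case (Python, assumed material parameters) applies the standard effective-medium averages for a deeply subwavelength metal/dielectric multilayer and checks whether the in-plane and out-of-plane permittivities indeed acquire opposite signs.

```python
# Minimal sketch (assumed material parameters): effective-medium check of the
# hyperbolic regime for a metal/dielectric multilayer. The stack is hyperbolic
# when the in-plane and out-of-plane permittivities have opposite sign.
eps_metal = -10.0 + 0.4j    # metal permittivity near the design wavelength (assumed)
eps_diel = 2.25             # dielectric permittivity, silica-like (assumed)
fill = 0.3                  # metal fill fraction (assumed)

# Standard effective-medium averages for a deeply subwavelength multilayer
eps_par = fill * eps_metal + (1 - fill) * eps_diel            # in-plane (x, y)
eps_perp = 1.0 / (fill / eps_metal + (1 - fill) / eps_diel)   # out-of-plane (z)

hyperbolic = eps_par.real * eps_perp.real < 0
print(f"eps_parallel ~ {eps_par:.2f}, eps_perp ~ {eps_perp:.2f}")
print("Hyperbolic dispersion:", hyperbolic)
```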

Isofrequency surface of hyperbolic metamaterial: a) NV center on the surface of HMM. Reproduced with permission.[203] Copyright 2013, Springer Nature; b) reduction in emission lifetime on the surface of HMM. Reproduced with permission.[204] Copyright 2014, Wiley-VCH GmbH and Co. KGaA, Weinheim.

Emission from dipole emitters placed on HMM surfaces has been demonstrated.[205] Additionally, enhanced emission from nanodiamonds has been demonstrated in semi-infinite HMM slabs.[204] An 80-fold enhancement of the emission rate in patterned HMM structures has been observed.[206] Furthermore, HMMs permit the realization of super-resolution imaging, as the dispersion relation yields a nonvanishing tangential component of the wave vector at the near-field focus.[207]

Plasmonics, the study of combined matter–photon electromagnetic modes, has over the past few decades found numerous applications in applied science.[208] Primarily applicable to the visible and infrared parts of the electromagnetic spectrum, plasmonics is fundamentally linked to metal–dielectric interface interactions. More specifically, the difference in sign between the dielectric and metal permittivities leads to a resonant condition where the local electric field is amplified, yielding an enhancement of a variety of physical processes, such as spontaneous emission. Additionally, this creates a condition for confined plasmon polariton waves propagating along metal–dielectric interfaces. In relation to quantum science, both local plasmons and plasmon polaritons have shown promise as effective media for enhancing, manipulating, and modifying light.[208]

Particularly, enhanced coupling of radiative emission to plasmonic nanowires has been demonstrated both for quantum dots and nanodiamonds (Figure 24).[209, 212] In a separate work, manipulation of free space light propagation through plasmonic arrays has been shown.[213] Further, plasmon-enhanced quantum sensing beyond the classical limit has been experimentally observed.[214] Remote entanglement of quantum emitters mediated by surface plasmon polaritons has additionally been shown,[210, 211] and quadrature squeezing of the plasmon modes using an external electron paramagnetic resonance (EPR) source has been demonstrated.[215]

a) Quantum dot emission coupling to plasmon polariton modes on a silver wire: Reproduced with permission.[209] Copyright 2007, Springer Nature. b) Quantum plasmonic networks realized using distant thin silver films. Reproduced with permission.[210] Copyright 2016, The Optical Society (OSA). c) Remote entanglement using a grooved plasmonic waveguide. Reproduced with permission.[211] Copyright 2011, American Physical Society.

Another intriguing avenue for quantum sensing and single photon emission is the engineering of the density of states in the deep near field of a quantum emitter. The singularity of dipole emission offers the potential for drastic manipulation of spontaneous emission, which can be made extremely sensitive to minute perturbations of the local environment. Figure 25 illustrates an example of the perturbation of an electric dipole emitter near-field geometry used to enhance sensitivity to local permittivity changes.[216]

Snapshot of the logarithm of the magnitude of electric field components during emission with dielectric structure traced as a contour: a) Ez with dents for an out-of-plane dipole; b) Ez without dents for an out-of-plane dipole; c) Ex with dents for an in-plane dipole; d) logarithm of the ratio of the electric fields depicted in (a,b). Spatial dimensions are given in nanometers. Reproduced with permission.[216] Copyright 2018, Wiley-VCH GmbH, Weinheim.

Mika Prunnila, VTT, Finland: Quantum Technologies is Novel Microelectronics

2.4 Existing Commercial Quantum Sensors and Applications

2.4.1 Gradiometers

Gradiometers are among the most mature quantum technologies currently available for energy-based applications. Semiclassical/quantum microgravity sensors are being developed at lab scale[15, 217] and are being commercialized by companies[218] that deliver unprecedented levels of measurement reliability and accuracy for underground survey applications. Opportunities can also be realized by using atom interferometry in gravity survey applications.[14] This type of sensor is a highly sensitive quantum gravimeter capable of detecting gravitational fields through density contrasts in the subsurface material (Figure 26).[14, 16] It is portable for outdoor usage, and has an increased sensitivity level enabling higher resolution of either smaller or deeper features compared to existing instruments, such as Scintrex CG-5 or CG-6 gravimeters.[219] In geophysical surveying for mineral and petroleum exploration, most of the currently available sensors suffer severely from unwanted spatially and temporally varying noise that obscures the signal from the target of interest. Boddice et al.[220] analyzed data from commercially available instruments based on cold atom quantum technology in a gradiometer configuration and compared them with measurements obtained using classical gravimeters. They found that quantum gravimeters were 1.5 to 2 times more effective than classical gravimeters at detecting small buried features, and that quantum gravimeters could measure at greater depths.
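A toy numerical example (Python, entirely synthetic data) illustrates the common-mode rejection that underlies the gradiometer advantage reported by Boddice et al.: two vertically separated gravimeters share the same slowly drifting background, so their difference suppresses the drift while retaining the anomaly sensed more strongly by the lower instrument.

```python
# Minimal sketch (synthetic data): why a gradiometer configuration suppresses
# common-mode noise. Two vertically separated gravimeters see the same drifting
# background; differencing them cancels the drift while keeping the local anomaly.
import numpy as np

rng = np.random.default_rng(0)
n = 500
common_drift = np.cumsum(rng.normal(0, 0.05, n))   # shared low-frequency noise
anomaly = np.zeros(n); anomaly[200:300] = 1.0      # buried-feature signal (assumed shape)

upper = common_drift + 0.2 * anomaly + rng.normal(0, 0.02, n)   # sensor far from target
lower = common_drift + 1.0 * anomaly + rng.normal(0, 0.02, n)   # sensor close to target

gradiometer = lower - upper        # common-mode drift cancels in the difference
print(f"std of single-sensor background: {np.std(upper[:150]):.3f}")
print(f"std of gradiometer background:   {np.std(gradiometer[:150]):.3f}")
```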

The noise cancelling effect of the gradiometer configuration in comparison to a conventional gravimeter when measuring the same buried signal. Reproduced with permission.[14] Copyright 2017, The Royal Society.

2.4.2 Quantum Light Detection and Ranging (LiDAR)

Quantum LiDAR is another technique undergoing commercial development for fossil energy applications, including oil discovery, gas leak detection, and carbon emissions studies.[221, 222] During LiDAR operation, pulsed light is sent toward a target, and some portion of the impinging light is scattered back to the LiDAR detector. By recording the time between the laser pulse and detection across an array of detector pixels, it is possible to determine the distance the scattered light has travelled, enabling 3D mapping of the terrain. In quantum LiDAR, information is obtained by the detection of single photons (as opposed to conventional LiDAR, which requires detection of multiple photons), which allows for efficient, rapid 3D mapping with higher resolution relative to conventional techniques (Figure 27). Companies including SigmaSpace,[223] Quantum Light Metrology,[224] and ID Quantique,[225] among others, have developed commercially available high-resolution, single photon LiDAR sources for terrestrial mapping applications and the detection of small gas molecules, even in harsh environments such as subsea operations.
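The underlying ranging arithmetic is straightforward to sketch: the hypothetical Python example below histograms synthetic single-photon arrival times, locates the return peak above a uniform background, and converts the round-trip delay to distance via range = c·t/2.

```python
# Minimal sketch (synthetic photon timestamps): single-photon time-of-flight
# ranging. The arrival-time histogram peaks at the round-trip delay.
import numpy as np

c = 299_792_458.0                         # speed of light, m/s
true_range_m = 1500.0                     # assumed target distance
t_round_trip = 2 * true_range_m / c       # ~10 microseconds

rng = np.random.default_rng(1)
# Sparse single-photon returns with timing jitter, plus uniform background counts
signal_t = rng.normal(t_round_trip, 0.5e-9, size=200)
background_t = rng.uniform(0, 20e-6, size=2000)
arrivals = np.concatenate([signal_t, background_t])

counts, edges = np.histogram(arrivals, bins=2000, range=(0, 20e-6))
t_peak = 0.5 * (edges[np.argmax(counts)] + edges[np.argmax(counts) + 1])
print(f"estimated range: {c * t_peak / 2:.1f} m (true {true_range_m:.0f} m)")
```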

a) Countywide digital elevation model and b) canopy model from single-photon LiDAR at 2 and 1 m spatial resolution, respectively. The enlarged area shows 3D topographic details along a) river valleys and ridges and b) in forest structures. The grey patches represent nonforested landcover classes (e.g., agriculture, water bodies, and developed land). Vertical lines in the canopy height map represent tiles with processing errors. Figure produced using ESRI software (ArcMap and ArcScene v 10.1 www.esri.com). Reproduced with permission.[222] Copyright 2016, Springer Nature.

2.4.3 Chip Scale Atomic Clock

Oil and gas exploration in harsh environments may also benefit from highly accurate atomic clocks, which measure time using the frequencies of well-defined atomic transitions in elements such as cesium or rubidium (Figure 28a).[226] Atomic clocks can be made sufficiently small for integration into devices (volumes as low as 16 cm3 with power consumption less than 120 mW), dubbed “chip scale atomic clocks” (CSACs), and are used in a range of applications, from oil exploration to national defense.[226] For example, the company Microsemi has developed CSACs for underwater oil exploration using a technique known as ocean bottom seismometry (OBS). Here, underwater mapping is achieved by placing sensors onto the ocean floor. Sound waves are then sent from an above-water ship at various angles and the seismic activity resulting from these waves is detected by the sensors, which time stamp the seismic response. With highly accurate timekeeping, a 3D map of the underwater area may be obtained from the sensors recovered from the ocean floor. As a consequence, CSACs have emerged as an exciting commercial quantum sensor for oil and gas exploration.[226]

a) Example of the NIST atomic clock illustrating (left) the packaging arrangement, (center) the individual components used in the set-up, and (right) the chip-scale atomic clock. Reproduced with permission.[226] Copyright 2018, Institute of Electrical and Electronics Engineers. b) Schematic of a system designed to detect photosynthetically active radiation (PAR) from artificial lighting systems. Filters are used to isolate wavelength ranges of interest, and PAR is monitored using silicon photodiodes (SiPDs). Reproduced with permission.[227] Copyright 2018, Society of Photo-Optical Instrumentation Engineers (SPIE).

2.4.4 Photosynthetically Active Radiation Sensors

PAR sensors are a relatively mature quantum technology that measure the photon flux density in the visible light range (400–700 nm) suitable for photosynthesis, typically using a silicon photodiode detector (Figure 28b).[228] Several companies, including Li-COR, Skye Instruments, and Apogee Instruments manufacture commercially available PAR sensors. PAR sensors have been used in multiple bio-based CO2 mitigation studies, which is a topic of significant interest to the fossil energy industry as companies try to monitor and mitigate greenhouse gas release.[229]
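For reference, the minimal sketch below (Python, assuming an idealized flat spectrum) shows the conversion a PAR sensor effectively performs: integrating spectral irradiance over 400–700 nm, weighted by the photon energy hc/λ, to obtain the photosynthetic photon flux density in μmol photons m⁻² s⁻¹.

```python
# Minimal sketch (assumed flat spectrum): converting spectral irradiance in the
# 400-700 nm PAR band to photosynthetic photon flux density (PPFD).
# Photon energy is E = h c / lambda, so photon flux = irradiance * lambda / (h c).
import numpy as np

h = 6.626e-34          # Planck constant, J s
c = 2.998e8            # speed of light, m/s
N_A = 6.022e23         # Avogadro constant, 1/mol

wl_nm = np.linspace(400, 700, 301)                  # wavelength grid, nm
wl_m = wl_nm * 1e-9                                 # same grid in meters
irradiance = np.full_like(wl_nm, 1.0)               # W m^-2 nm^-1 (assumed flat)

photon_flux = irradiance * wl_m / (h * c)           # photons m^-2 s^-1 nm^-1
ppfd_umol = np.trapz(photon_flux, wl_nm) / N_A * 1e6
print(f"PPFD ~ {ppfd_umol:.0f} umol photons m^-2 s^-1")
```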

2.5 Quantum Enhanced Sensing Modalities for Fossil Energy Applications

Fossil energy applications are characterized by several constraints, including the requirement for distributed/remote measurements and harsh operating conditions. Particularly appealing for distributed quantum sensing applications are sensors based on the Hong–Ou–Mandel effect, since they can be readily integrated into an optical fiber platform. Additionally, the medium in the gap of the Hong–Ou–Mandel beamsplitter can be designed to be sensitive to a wide array of external physical and chemical stimuli that yield a corresponding refractive index change. Because the beam splitter must implement a unitary transformation, two indistinguishable photons simultaneously incident on the beam splitter inputs can only result in bunched photon outputs, i.e., the $|1,1\rangle$ coincidence amplitudes effectively cancel at the output. This phenomenon has recently led to the development of a range of fiber optic and integrated sensing device platforms.[230] The Hong–Ou–Mandel sensor readout method via coincident photon detection provides superior noise immunity, which is important in harsh environment sensing conditions. Separately, sensors based on nanodiamond fiber-optic integration have been developed and can be deployed in a range of sensing modalities.[231] Magnetic field sensing for subsurface applications as well as in-situ remote NMR are attractive possibilities for this type of sensor. Quantum sensors based on squeezed/nonclassical light for noise reduction also carry great potential to benefit the fossil energy industry, permitting sensing beyond classically imposed limits.[232]
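A compact sketch of the Hong–Ou–Mandel interference invoked above (Python, standard textbook beam-splitter convention): expanding the two-photon input through a lossless 50:50 beam splitter shows the coincidence amplitude cancelling exactly, which is precisely what a coincidence-counting readout exploits.

```python
# Minimal sketch: Hong-Ou-Mandel interference at a lossless 50:50 beam splitter.
# Output-mode convention (assumed): a_in^dag -> (c^dag + i d^dag)/sqrt(2),
#                                   b_in^dag -> (i c^dag + d^dag)/sqrt(2).
import numpy as np

# Expand a_in^dag b_in^dag |0,0> = (c^dag + i d^dag)(i c^dag + d^dag)/2 |0,0>
coeff_cc = 1j / 2                 # c^dag c^dag term
coeff_dd = 1j / 2                 # d^dag d^dag term
coeff_cd = (1 + 1j * 1j) / 2      # c^dag d^dag terms interfere destructively -> 0

p_20 = abs(coeff_cc * np.sqrt(2)) ** 2   # sqrt(2) from (c^dag)^2 |0> = sqrt(2) |2>
p_02 = abs(coeff_dd * np.sqrt(2)) ** 2
p_11 = abs(coeff_cd) ** 2                # coincidence probability

print(f"P(2,0) = {p_20:.2f}, P(0,2) = {p_02:.2f}, P(1,1) = {p_11:.2f}")
# -> P(2,0) = P(0,2) = 0.50 and coincidences vanish (the HOM dip)
```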

Conventional sensing techniques may similarly be enhanced by quantum processes and/or materials, and there is potential to deploy quantum-enhanced classical sensing techniques for fossil energy-relevant applications. One emerging area of interest is the use of quantum states of light to improve the sensitivity of plasmonic sensors, dubbed “quantum-enhanced plasmonic sensing” (see Section 2.3.2).[233] Plasmonic sensors have been developed for fossil energy-relevant applications to sense gases,[234] temperature,[235] and pH[236] under harsh conditions. The use of quantum-enhanced plasmonic sensing techniques is a path toward improving the performance of these existing sensors. Similarly, the sensitivity of Fabry–Perot interferometers, which have been used to sense a range of parameters[237] including temperature, gases, pressure, and others, can be enhanced using squeezed vacuum states.[238] The performance of conventional LiDAR,[239] which is used in oil and gas discovery and for monitoring pipeline integrity (Section 2.4.2), may also be enhanced.
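As a textbook-level illustration of the squeezed-vacuum enhancement mentioned above (not a description of any specific commercial device; parameter values assumed), the snippet below compares the shot-noise-limited phase uncertainty 1/√N with the roughly e^(−r)/√N uncertainty obtained when squeezed vacuum is injected into the interferometer's unused port.

```python
# Minimal sketch (textbook scaling, assumed parameters): squeezed-vacuum
# improvement of interferometric phase sensitivity from 1/sqrt(N) to ~exp(-r)/sqrt(N).
import numpy as np

N = 1e12                                  # mean photon number per measurement (assumed)
squeezing_dB = 6.0                        # available squeezing (assumed)
r = squeezing_dB * np.log(10) / 20        # convert dB (variance) to squeezing parameter r

dphi_sql = 1 / np.sqrt(N)                 # shot-noise (standard quantum) limit
dphi_sqz = np.exp(-r) / np.sqrt(N)        # with squeezed vacuum in the dark port
print(f"shot-noise limit:        {dphi_sql:.2e} rad")
print(f"with {squeezing_dB:.0f} dB squeezing:   {dphi_sqz:.2e} rad "
      f"({dphi_sql / dphi_sqz:.1f}x improvement)")
```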

Unbelievable: See How Diamonds Can Revolutionize Quantum Computing

3 Further Opportunities and Challenges of QIS in Energy Applications

3.1 Global Quantum Initiatives: New Opportunities

A number of research centers and their physics, chemistry, engineering, and materials science departments around the world are broadly pursuing QIS as a major research and development direction. This has significantly accelerated progress in the field and has also elicited, at the policy level, sustained financial support aimed at realistic goals for implementing QIS in high-priority areas such as security and safety, which should underpin future innovations in QIS and, by extension, the development of viable quantum sensors.

In September 2018, the U.S. National Science and Technology Council (NSTC) issued The National Strategic Overview for Quantum Information Science, which identified six main policy opportunities:[240] i) choose a science-first approach to QIS, ii) create a quantum-smart workforce for tomorrow, iii) deepen engagement with the quantum industry, iv) provide critical infrastructure, v) maintain national security and economic growth, and vi) increase international cooperation. A budget of $1.3B was announced for 2019–2023 to initiate and strengthen research addressing these opportunities. The UK National Quantum Technology Programme (NQTP) has taken a step toward developing that nation's capabilities for establishing a new sector in future QIS technologies.[241, 242] The UK government formed four multi-institutional, multi-investigator, challenge-led, and focused research programs, or “Hubs,” comprised of academic, industry, and government partners, investing £385M. This effort identified four areas in which quantum capabilities could have a significant impact: imaging, ultraprecise sensors, secure communications, and new quantum concepts for computers.[241] In 2018, the European Union (EU) announced a ten-year flagship-scale program combining education, science, engineering, and innovation across several EU member states to explore the potential of quantum technologies.[243] This initiative has an investment of €1.3B over ten years and has identified four main pillars: communication, sensors, simulation, and computers. Germany has invested €650M from 2018 to 2022 to stimulate quantum technology development and commercialization on top of the flagship-scale program. China has generated more than $987M in research funding from central and local governments over the past ten years.[244] In 2016, China launched the “Quantum Control and Quantum Information” National Key Research and Development (R&D) project, and within the past three years it invested $337M. Similarly, Japan launched a new initiative in 2018 called Q-LEAP with an initial funding of $200M.[245] This initiative focuses on developing three pillars (quantum simulation and computation, quantum sensing, and ultrashort pulse lasers) over ten years. Canada[246] and Australia[247] are also creating similar initiatives on QIS, with Canada investing $1B in quantum research.

In addition to these examples, government-level interest in QIS in other countries, such as Russia and India, is growing and significant effort has been devoted toward bringing quantum stakeholders, companies, and academia together along this line. These efforts are creating new frontiers in QIS and are making an impact on sensor, computer science, cryptography, and communication technologies. Taken together, these efforts in conjunction with industrial investments will provide physical and human capital for continued growth in the research, development, and deployment of quantum sensing technologies globally.

3.2 Future Directions for Quantum Sensors and Sensing Materials

As outlined in this review, QIS is poised to have transformative impacts within the energy sector, and near-term effects are anticipated with further developments in quantum sensing. Indeed, as outlined in Section 2.4, several mature commercial quantum sensors are already in use for fossil energy applications, and this trend is anticipated to continue. Specifically, quantum LiDAR is sufficiently developed to be deployed for monitoring gas leaks along pipelines, a key aspect of energy security. Quantum LiDAR has additional utility in fossil energy applications, including monitoring greenhouse gas emissions and even oil and gas discovery.[221, 222] Similarly, commercially available quantum gravimeters can be used in the field for both petroleum and mineral resource exploration, and may also be useful for monitoring mining integrity.[14, 220] Finally, highly sensitive chip-scale atomic clocks (CSACs) are well-suited for underwater terrain mapping, which opens up avenues for deep-sea oil and gas exploration.[226] Several of these technologies have recently been discussed in the National Energy Technology Laboratory's Fossil Energy Quantum Information Science and Technology (NETL FE QIST) Workshop report as short-term areas in which QIS will impact fossil energy.[248] Continued reduction in equipment costs and improvements in sensitivity should lead to the expanded use of existing quantum sensing technologies for fossil energy applications.[248]

Over the next three to ten years, the NETL FE QIST report[248] anticipates the integration of quantum materials and techniques with existing fiber optic sensor platforms, which are widely used for monitoring parameters such as strain and temperature in power plants, reactors, pipelines, transformers, and other energy infrastructure.[22, 249] Operation under harsh conditions (temperatures > 250 °C, pressures several orders of magnitude higher than atmospheric, etc.) is often required in such applications.[248] Quantum sensing materials, including NV− centers in nanodiamonds or SiC, may be integrated with optical sensing platforms (particularly optical fibers) and deployed to sense physical parameters such as temperature and strain to monitor infrastructure health, or in waste streams to aid in the recovery of critical elements. As current sensor technology approaches the fundamental limitations of classical physics, target areas in which QIS may improve the signal-to-noise ratio of existing fiber optic sensors include: 1) improved laser performance, 2) improved efficiency in optical detectors, 3) optimization of optical fiber sensors, and 4) advances in data collection.[248]

As with any emerging technology, a key challenge is to translate quantum technology from a well-controlled research lab into more complex environmental systems, where a variety of factors may hamper sensor performance. As outlined in Section 2.2.2, while quantum materials such as NV centers and SiC show promise for sensing a multitude of important variables (i.e., magnetic fields, strain, temperature, electric fields, etc.), the ability to practically isolate individual variables of interest is critical and remains a challenge for these materials, since they may be influenced by multiple variables simultaneously.[53] Another key step for sensor commercialization is the integration of quantum materials into rugged packaging capable of deployment in “real-world” systems. This is particularly true for energy applications, where harsh conditions such as elevated temperatures, high pressures, and high acidity are often encountered. Similarly, advances are needed on the materials science front to enable the mass production of high-performance quantum materials at the lowest possible cost. Finally, continued integration of quantum materials with existing sensor platforms such as optical fibers will accelerate the development of commercial sensing products. The use of quantum processes to enhance the performance of classical sensors for operation in harsh conditions represents another important opportunity for research and development. Future horizons of quantum innovation can be broadly split into the categories of materials science and device integration.

In terms of materials science innovations, silicon-based qubits have been considered as a replacement for conventional superconducting qubits, which would put quantum technologies a step closer to mass deployment via integration with conventional CMOS processes.[250] In this direction, silicon quantum dots have been gaining significant research attention as potential spin-cavity QED coupled devices similar to transmon qubits.[251] Alternatively, 2D materials such as hexagonal boron nitride[252] present unique advantages from the point of view of deterministic positioning of quantum emitters. Indeed, the atomic crystal lattice can, in theory, be accessed to induce spatially localized perturbations during growth and nucleation.[253] Apart from color centers in diamond and silicon carbide, rare-earth ions have been gaining momentum as potential bulk solid-state quantum centers. Recently, both optical emission and spin manipulation of a single Yb3+ ion have been demonstrated in photonic crystal cavities.[254] Although performed at cryogenic temperatures, this study represents an important proof of principle for rare-earth ions. Another emerging sensing platform could exploit topological photonic states, which are surface states arising at the interface between two media with different Chern numbers, occurring, for instance, in materials with broken time-reversal symmetry.[255] These types of states can be used to transmit quantum information and sensor signals over significant distances, such as in energy infrastructure, without suffering scattering losses due to material and surface imperfections. Topological designs using magneto-optic photonic crystals and coupled resonators have been experimentally demonstrated.[256] Currently, this is an active field of research with many emerging approaches realizing topological photonic states immune to scattering.

An alternative quantum photonic subfield, cavity QED, carries great potential for future novel quantum phenomena. By analogy with the Hong–Ou–Mandel effect, nonclassical cavity interactions with the environment can lead to new physics and applications that are highly relevant for energy purposes. As an example, the appearance of localized topological edge states in an array of coupled photonic resonators with interacting photon pairs has recently been reported,[257] as well as quantum gates developed using hybrid cavity/photon and NV center systems.[258] The multiparticle entanglement required for scalable quantum computing can also be readily produced with cavity-QED interactions.[259] Expanding the number of available materials and platforms for quantum sensing will facilitate the rational design of sensor technologies tailored to the specific environmental conditions encountered in energy-based sensing applications.

In addition to these emerging sensing materials and techniques, the integration of quantum technologies with nuclear magnetic resonance (NMR) techniques represents another emerging area of innovation. Solid state NMR has developed into a formidable technique for in-situ catalysis research, providing a wealth of information about catalytically active sites, reacting molecules, and their interactions.[260] The hyperpolarization of nuclear spins in NMR spectroscopy holds great promise for ultrasensitive NMR testing, which would require only minimal sample amounts and would enable measurements to be performed even in poor contrast environments.[261] NV center hyperpolarization has received significant research attention.[262] Indeed, normally nuclear spins are very weakly polarized, which determines the limit of detection of conventional NMR techniques. Converting controllable electronic spin polarization from the NV center to surrounding and external nuclear spins can dramatically enhance NMR spectroscopy performance.[263] Improved NMR sensitivity would, of course, have broad applications across many scientific disciplines and economic sectors, including fossil energy. A range of NMR techniques are extensively used for characterizing liquid fuels,[264] oil and gas exploration,[265] studying carbon capture and carbon storage materials and mechanisms,[266] and monitoring and analyzing catalytic conversions of carbon dioxide,[267] to name just a few important applications. Thus, the development of enhanced NMR techniques would have extraordinary benefits for carbon dioxide mitigation studies and resource discovery and characterization. Continued exploration of quantum technologies with NMR-based platforms, in tandem with on-going efforts to develop field-deployable instruments,[268] is therefore a desirable direction of next-generation quantum sensor development.

3.3 Quantum Sensor Networking for Fossil Energy Applications

With the advent of QIS, quantum networking opportunities for safe and secure energy production, processing and delivery will surpass the opportunities offered by existing networking systems. These opportunities include:

  • Assessing actual data on global CO2 emissions
  • Fossil energy infrastructure automation
  • Oil, gas, and electricity infrastructure build-out and planning[269]
  • Operational optimization of interdependent infrastructure
  • Modernization of the grid into a “quantum grid,” along with building and transportation infrastructure and operations, to support advanced energy supply
  • Quantum key distribution for the security and reliability of energy delivery

As a long-term opportunity, quantum networking can be used to assess global CO2 emissions. Sources of CO2 emissions such as coal-fired power plants, industrial facilities, ships, and vehicles fitted with quantum technologies can be connected to a complex communication network fed into powerful quantum processing units to monitor the level of CO2 emissions from individual sources. A complex network of quantum machines can be used to model CO2 emissions from different sources across the globe, monitor the emissions, and mitigate emissions in targeted areas using machine learning and artificial intelligence that can be significantly enhanced by quantum computing. In 2014, NASA launched its Orbiting Carbon Observatory-2 (OCO-2) to monitor CO2 in the Earth's atmosphere with a resolution of 1–3 km.[270] This was a computationally demanding problem, as achieving such a resolution scale required accurate and detailed models of landscapes and high-CO2-emission zones on the Earth's surface. With the advent of quantum computing, the landscape could be divided into cells, and each cell could be equipped with CO2-monitoring quantum technology, which could eventually be linked to the orbiter to assess overall global CO2 emission levels.

Quantum sensor networks require distributed nodes with discrete-variable and continuous-variable multipartite entangled states and complex sensing protocols. Quantum sensor networks exploit correlations between individual quantum sensors to enhance the sensing performance of the system for global parameter measurements as well as simultaneous multi-parameter measurements. An entangled quantum sensor network can enhance the measurement sensitivity and precision of multiple spatially distributed parameters. As demonstrated in Figure 29, quantum sensing and quantum networking can be integrated into both traditional and novel approaches to distributed optical fiber sensing to achieve unprecedented levels of performance and cost trade-offs, allowing for broader commercial deployment of distributed optical fiber sensing technologies for natural gas pipeline applications.[32] Entangled light distributed to remote locations via an installed fiber network can enable an increase in precision when estimating system parameters such as temperature, pressure, pipe corrosion, and gas concentrations. Through integration of quantum devices with advanced data analytics methodologies, the resiliency, reliability, security, and integrity of the wellbore for carbon storage and other subsurface applications can be improved.
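A hedged, highly idealized sketch of the scaling argument behind entangled sensor networks (Python, assumed photon numbers): estimating a single global parameter, such as an average phase along a pipeline, with M independent nodes follows the standard quantum limit, whereas an ideally entangled network of the same nodes can gain an additional factor of about √M.

```python
# Minimal sketch (idealized scaling, assumed numbers): precision of estimating a
# global parameter with M distributed sensor nodes. Independent nodes follow the
# standard quantum limit ~1/sqrt(M*N); an ideally entangled network approaches a
# Heisenberg-like ~1/(M*sqrt(N)) scaling in the number of nodes.
import numpy as np

N = 1e6                      # probe photons (or repetitions) per node (assumed)
for M in (4, 16, 64):        # number of networked sensor nodes
    sql = 1 / np.sqrt(M * N)             # unentangled, independent nodes
    ent = 1 / (M * np.sqrt(N))           # idealized entangled-node scaling
    print(f"M = {M:3d}: SQL ~ {sql:.2e},  entangled ~ {ent:.2e} "
          f"({sql / ent:.1f}x gain)")
```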

A schematic illustration depicting representative energy applications. Inset shows a scalable quantum fiber-optic network for natural gas pipeline safety and integrity monitoring, which is reproduced from ref. [32] with permission. Copyright 2019, American Institute of Physics (AIP).

Conclusions

The discovery, production, transportation, and consumption of energy impact nearly every aspect of society, and the effort to meet the world's ever-evolving energy needs has driven an unprecedented level of technological innovation. Hence, the energy sector will likely be among the first beneficiaries of the impending “quantum revolution,” as emerging QIS-enhanced technologies may be applied to ensure the safe, secure, and efficient use of energy resources. Indeed, quantum technologies such as quantum gravimeters, LiDAR, and atomic clocks are already commercially available and in use for gas and oil exploration. Yet a need still exists for advanced sensing instrumentation to ensure reliable, secure, and environmentally responsible fossil energy production and recovery through improved real-time monitoring of subsurface processes and the environment. Identifying high-resolution measurement and monitoring tools that are economical and portable is an additional major need in sustainable fossil energy development. Quantum sensing has the ability to facilitate resource discovery, monitor infrastructure integrity, and aid in greenhouse gas mitigation, which are all key concerns for the energy industry.

The potentially unprecedented sensitivity that may be obtained from various quantum sensor platforms may be deployed to quickly detect failures in a natural gas pipeline, for instance, preventing harmful gas leaks, or to monitor temperature in transformers to ensure their proper operation. Quantum sensing materials such as nanodiamonds can be used with optical fiber sensor platforms to obtain lower detection limits than the current state of the art. In addition, optical fiber sensor interrogation methodologies may gain new tools for optimizing performance in subsurface or natural gas pipeline sensing applications, where multi-kilometer range with sub-meter spatial resolution is desired.

While progress in QIS continues, several challenges exist for its implementation in energy technologies. Specifically, innovations in materials science are needed to enable the mass production of quantum materials, to develop materials sufficiently robust to function under “real-world” environmental conditions, and to integrate them into practical platforms for commercialization and deployment. In addition, a gap exists between the capabilities of current QIS stakeholders and the needs of the energy sector. Enhanced collaboration between researchers in the QIS and energy communities will help address specific needs in the energy sector using emerging quantum technologies. Hence, advances in QIS and energy sector performance are inextricably linked, and the multitude of potential benefits to the energy sector from QIS will help drive additional QIS-related research.

More Information:

https://onlinelibrary.wiley.com/doi/full/10.1002/qute.202100049#

https://spin.ethz.ch/research/diamond-lab.html

https://ethz.ch/content/dam/ethz/special-interest/phys/solid-state-physics/spin-dam/documents/publications/Publications2022/2022_eichler_matquanttech.pdf

https://arxiv.org/pdf/2306.02966.pdf

https://hongkunparklab.com/quantum-optoelectronics

https://www.quantum-machines.co/solutions/quantum-sensing/

https://www.bnl.gov/newsroom/news.php?a=221109

https://uwaterloo.ca/institute-for-quantum-computing/quantum-101/quantum-information-science-and-technology/quantum-sensors





