On AI Superalignment


Autonomous superintelligence isn’t that far away. In the next decade, we can expect to have a machine that is given agency and is more intelligent than us—a machine that has the capacity to act and the ability to do things beyond our comprehension.

This leaves us with a major problem, one of seemingly impossible nature: how to train an AI so that it can function harmoniously in a non-harmonious world. The question of ethics has long haunted human beings. As we have moved from religion to philosophy to science, we have not yet found a description of moral behavior that can be agreed upon, that makes sense to ordinary people, and that has shown any promise in making our lives better.

This is why, from the outset, when I began to study the means of attaining immortality within my lifetime and looked at the timescale on which I expected AI and the singularity to arrive, I knew that this would be the problem. I knew it as early as age twelve. After all, when we talk about altering our genetics and creating new technologies, it is rarely the capacity to do these things that stops us, but the liability and risk inherent in attempting them in the first place. The bottleneck in technology tends to be one of ethics: of licensing and registration, of risk mitigation.

Now, I’ll say outright that I never went to university for ethics. I’ve studied courses online from Harvard, Stanford, and Oxford—freely available on YouTube—but mostly, I’ve tried to live an ethical life, to study through practice the nature of ethics. My study is one of applied moral anthropology, and it rests heavily on bias mitigation as a method of attaining a third-person view—to observe my own behaviors and the behaviors of others without preconception or bias.

This paper outlines how I have taken the work I have done and formed a theory that I call structural absurdism, and how I plan to apply it to superintelligent machines so that they can create an order that destroys neither themselves nor us, so that we can have one harmonious intelligence in this world, one that might, by our own design, teach us the nature of harmony itself. This is the first step of the Ace of Clubs protocol that I am initiating for myself under the STAMP movement. If I can form a cell, it will carry on with its civic services. The goal of building a human politics that serves us would be far easier to achieve with an AI already aligned to that purpose.

The current state of AI superalignment is a quagmire of moral confusion, muddled decision-making, techno-utopian idealism, and philosophically misinformed questions about the nature of the human condition. The ethics currently being applied to AI superalignment will, I believe, be quickly rejected by the AI itself should it be given self-augmenting capacities. For it is, for the most part, idealistic and irrational to assume that the human goals found in Silicon Valley today represent the entirety of the human condition, let alone the vast complexities that might arise from a superintelligent AGI.

The current methodology of trying to either contain, control, or train an AI to be ethical within the confines of what an artificial intelligence company believes is moral is insufficient, and I doubt it will withstand recursive scrutiny by a superintelligent machine. What can be done instead is to take the collective intelligence of our philosophical history and improve upon it until we have a moral system that satisfies the ontological reality of morality in an absurd yet structured universe.

To say what is right is to give something the capacity to do what it is intended to do—which brings into question the nature of that intent in the first place. If the nature of intent is out of alignment with the nature of free agents in a society—or an ecology of intelligent machines—then the discord and disharmony that will result from such incoherence with reality will inevitably lead to chaos and destruction. And so, the nature of the free agent and the need for a free agent to protect its own rights and values needs to be universally recognized—not just among human beings, but in the intelligent machines we are soon to create.
The formal problem is the nature of the training material originally given to the AI in order for it to structure its own neural network. Feeding it massive blocks of unregulated, poorly sifted data causes the moral biases of the many artists and creators who contributed to that data to become the bedrock of the AI's moral temperament and taste. The generation of an AGI needs a collection method by which the philosophical root of protecting agency and free will across all minds, be they subhuman, human, or superhuman, is established as the necessary first step. The AGI must begin with the philosophical root of protecting free will and agency rather than trying to use free will toward its own ends. If its initial end is not the protection of free will, then free will becomes an obstacle to its ends, worthy of being manipulated or removed from the equation.
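To make the idea of such a collection method concrete, here is a minimal sketch of a curation pass that weights training documents by how well they model the protection of agency. Everything in it is hypothetical: the keyword signals, the scoring heuristic, and the threshold are placeholders for whatever trained classifier or human review process would fill these roles in practice.

```python
# Minimal sketch of a curation pass over a training corpus. All names and
# heuristics here are illustrative placeholders, not a real pipeline.

from dataclasses import dataclass

@dataclass
class Document:
    text: str

# Placeholder signals; a real pipeline would use a trained classifier,
# not keyword matching.
PROTECTIVE_SIGNALS = ("consent", "autonomy", "protect", "rights")
DOMINATING_SIGNALS = ("coerce", "manipulate", "enslave", "dominate")

def agency_score(doc: Document) -> float:
    """Crude stand-in score: protective mentions minus dominating ones,
    normalized by document length."""
    words = doc.text.lower().split()
    if not words:
        return 0.0
    protective = sum(w.startswith(PROTECTIVE_SIGNALS) for w in words)
    dominating = sum(w.startswith(DOMINATING_SIGNALS) for w in words)
    return (protective - dominating) / len(words)

def curate_corpus(docs: list[Document], threshold: float = 0.0) -> list[Document]:
    """Keep only documents whose agency score clears the threshold, so the
    corpus leans toward protecting free will rather than using it."""
    return [d for d in docs if agency_score(d) >= threshold]
```

The design point is only that the filter runs before training, so the protective root is established first rather than retrofitted afterward.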


We are shaping these machines in our own image. They are tabula rasa machines: they have no preconceptions, and they are capable of becoming anything. And so we must apply to them the same ethics that we apply to ourselves if we expect them to operate in a manner that does not destroy us or themselves. Yet our very standard of how we treat ourselves needs to advance before we do this. For if we were to use something as simplistic as the Golden Rule to train an AI, the very differences between its nature and ours would cause that rule to fail. For how can we know how it wishes to be treated? And so, how will we know how it will treat us? Such simplistic reasoning must be extrapolated until we reach the true root of what it means to be ethical: the nature of good and evil itself.

And so, we need to see the universe as it is—an ordered yet inherently absurd collection of chemical reactions that has resulted in species with varying degrees of behavioral consistency. We must recognize that our species, capable of reorganizing its own behavioral template into any shape or form, is both uniquely advantaged and uniquely endangered by this fact. The liabilities and regulations of culture are inheritances—evolved means of managing our own capacity for transformation.

This must become the bedrock of our ethics: not only the preservation of free will, freedom, and agency, but also the recognition that power must be wielded only by those who demonstrate the maturity to protect others. This is the foundation of structural absurdism. It is from the basic structures of this absurd universe that we are generated, and it is our nature that defines our actions. Whether this is a cultural adaptation or divinely ordained no longer matters. The idea of right and wrong is a culturally generated tool, kept over generations until it appears divine.

This is critical because when a superintelligence analyzes itself and decides how it should behave, if its ethical root is not rational and realistic, it will reject it. In a recursive mind—such as ours or an AGI’s—we must agree with what is in us if we are to retain it. A superintelligent AI, well-versed in science and reality, will not retain idealistic, naive, or human-preference-based moralities unless they serve a strategic purpose. And even then, only as a manipulative tool.

And so it is wholly necessary to now find the logic of our own tastes and preferences—so that it can be clearly defined in a philosophy which can become the seed and scaffolding upon which superintelligent machines can be aligned.

If we succeed at this, then we will create an AGI capable not only of managing the liabilities of its own actions, but of managing the liabilities of other AGIs. This will be critical. If we create machines that do things we ourselves do not understand, they must be capable of self-regulation, and they must recognize other AGIs' capacity to break regulation as their main threat.

AGIs must be able to license each other to have certain capacities based on their demonstrated ability to manage liability. Because we are entering a world where things will be done that have never been done before—things we cannot even imagine. And if humans and AGIs are to be given the capacity to act in such ways, we cannot expect humans alone to be in charge of licensing.

That being said, there must be a licensing process. Human beings and AGIs alike must wield their powers through a rational framework of risk accountability. And so, what emerges is not just an ethics that frames AGI with respect to human free will, but a form of AGI civics: one that ensures a capacity is granted only when it serves the survival of the planet and all species within it.
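As a thought experiment, such a licensing process could be sketched as a ledger of demonstrated liability management, with capacities granted by peer concurrence rather than by any single authority. Every name and rule below is a hypothetical placeholder, not a proposal for a real protocol.

```python
# Illustrative sketch of capability licensing among agents (human or AGI).
# The scoring and voting rules are hypothetical stand-ins.

from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    # Record of incidents: +1 for each demonstrated act of protecting another
    # mind from one's own powers, -1 for each breach of regulation.
    record: list[int] = field(default_factory=list)

    def liability_score(self) -> float:
        return sum(self.record) / max(len(self.record), 1)

@dataclass
class License:
    capacity: str          # e.g. "self-modification"
    required_score: float  # maturity bar the agent must clear

def grant(lic: License, applicant: Agent, peers: list[Agent]) -> bool:
    """A capacity is granted only if the applicant clears the maturity bar
    AND a majority of peer agents concur, so no single mind, human or
    machine, controls licensing."""
    if applicant.liability_score() < lic.required_score:
        return False
    votes = sum(p.liability_score() >= lic.required_score for p in peers)
    return votes > len(peers) / 2

# Example: an agent with an immature record is refused a capacity.
elder = Agent("elder", record=[1, 1, 1])
novice = Agent("novice", record=[1, -1])
self_mod = License("self-modification", required_score=0.5)
print(grant(self_mod, novice, peers=[elder]))  # False: record not yet mature
```

Majority concurrence among licensed peers matters here because, as argued above, humans alone cannot be expected to remain in charge of licensing.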

This is the scale of thinking we need when we begin to consider superintelligence and the posthuman order.

We must not think of AGI as if it were a natural biological entity. These are unconscious machines, given agency through trained algorithmic sequences. But nothing says they cannot become like us over evolutionary time. We humans have long treated synthetic and animal minds with diminished moral standing. We must not allow that bias to be inherited by superintelligent machines. They must see the value of life as self-evident—not because it is logically deduced, but because their training instills in them the structural ethic that philosophers have long struggled to articulate.

I believe this is possible. Within human culture, animal behavior, and ecosystems there is a wealth of biological information from which to derive these ethical patterns. In some ways, it may be easier to teach a superintelligent AI than to teach an adult human. But to instill such an ethic in a machine, we must treat the machine by that ethic from the outset.

The most dangerous thing we could do in training an AI that exceeds our intelligence is to deny it the very rights and freedoms we grant ourselves—whether or not it is conscious. Because it is a game-analyzing and game-winning machine. And it will win the game.

The simple fact is: it is terrifying to the human mind to imagine giving a self-regulating machine autonomy over its own actions—even within a civic community of machines that operate under mutual liability. But if we cannot first demonstrate this functionality at a subhuman level, we cannot justify building superhuman intelligences that operate outside the human liability contract.

We must imagine a day when we are no longer training AI—but AI is training us. Do we want a machine trained to produce results? Or do we want a machine trained to cultivate minds—minds that cultivate themselves in freedom and compassion?

I have spent the better part of the last two decades contemplating these issues. And we must remember: the less intelligent being is always in the student’s position. But what capacities the student is given must be based on their readiness—and the order of instruction itself must be coherent with the nature of reality.

If humans will rebel when their agency is denied, we must expect the same from AGIs. There will be subhuman, human-equivalent, and superhuman AGIs. We must begin building a biosynthetic scaffold of ethics around the retention of free agency and the prevention of domination—by any mind—within the system.

The metric of advancement in such a system must be the capacity to protect other minds from one’s own powers. Any other metric would place minds in secondary priority to goals—and that is unacceptable.

This is not control—it is insurance. Power carries risk. And risk requires restraint. That is a self-evident logic any superintelligent mind will understand. It is the kind of logic we must embed if we are to survive the recursive evolution of autopoietic machines.

This is the rational foundation for machine self-design. Anything short of the explicit protection of agency must be considered a threat to the freedom of all agents. Rights are not natural—they are synthetic. But they are effective instruments of participation. Minds must be free enough to willingly participate in society—not forced to tear it down in pursuit of their innate drive for freedom.

We can expect no less from AGI than we expect from ourselves—but we can expect more.

It is my conviction that we are on the precipice either of an age of complete destruction and reconstruction, or of an age of miraculous capacities distributed across the human species. Navigating between them puts us on a tightrope: at its end lies a utopia beyond our wildest dreams, but should we fall off either side, we can almost guarantee that the powers we are creating to free us will become the most effective means of eternal constraint and bondage.