The Nuts & Bolts Of IMPLEMENTING A SAFE MOTIVATIONAL SYSTEM

Mark R. Waser, Digital Wisdom Institute, MWaser@DigitalWisdomInstitute.org

Outline • What is a “safe” motivational system? • How do we ensure that it happens (and sticks)?

What is a “safe” motivational system? *ANYTHING* that reliably leads to ETHICAL BEHAVIOR

What is Ethical Behavior? The problem is that no ethical system has ever reached consensus. Ethical systems are completely unlike mathematics or science. This is a source of concern.

Entities Require Ethics • Ethics are “rules of the road” • Necessary for “safe” interaction • Yet, we cannot come to a consensus about them • There is something horribly wrong with this picture

The Human Moral System • Is primarily implemented via emotions • Is not transparent or reflective • Frequently conflicts with “rationality” • Is “clearly” subjective

Humans are . . . . • Evolved to self-deceive in order to better deceive others (Trivers 1991) • Unable to directly sense agency (Aarts et al. 2005) • Prone to illusory experiences of self-authorship (Buehner and Humphreys 2009) • Unable to correctly retrieve the reasoning behind moral judgments (Hauser et al. 2007) • Almost always unaware of what morality is and why it should be practiced . . . .

Inflammatory Statements • Human intelligence REQUIRES ethics • All humans want the same things • Ethics are universal • Ethics are SIMPLE in concept • Difference in power is irrelevant (to ethics) • Evolution has “designed” you to disagree with the above five points

The Origin of Morality
• Selfishness predictably evolves
• Reciprocal altruism predictably evolves
  • But requires cognitive complexity to ensure that it is not taken advantage of
• Ethics predictably evolves
  • As an attractor in the state space of behavior because community is so valuable
  • But altruistic punishment is a necessity
• Arms race between
  • Individual benefits of successful personal cheating (really only in a short-term/highly time-discounted view)
  • Societal benefits of cheating detection & prevention

Haidt’s Functional Approach Moral systems are interlocking sets of values, virtues, norms, practices, identities, institutions, technologies, and evolved psychological mechanisms that work together to suppress or regulate selfishness and make cooperative social life possible

How to Universalize Ethics Quantify/evaluate intents, actions & consequences with respect to codified consensus moral foundations Permissiveness/Utility Function equivalent to a “consensus” human (generic entity) moral sense
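One possible reading of this slide, sketched in Python under heavy assumptions. The foundation names follow Haidt's moral foundations, but the equal weights, the [-1, 1] scoring scale, and the names `Assessment`, `weighted`, and `permissiveness` are hypothetical illustrations; the talk does not specify how the consensus is codified or how the three aspects are combined.

```python
from dataclasses import dataclass, field

# Placeholder weights standing in for a "codified consensus"; the equal
# values are an assumption, not something given in the talk.
CONSENSUS_WEIGHTS = {
    "care": 1.0,
    "fairness": 1.0,
    "liberty": 1.0,
    "loyalty": 1.0,
    "authority": 1.0,
    "sanctity": 1.0,
}


@dataclass
class Assessment:
    """Per-foundation scores in [-1, 1] for one aspect of a proposed act."""
    scores: dict = field(default_factory=dict)


def weighted(assessment):
    """Weighted mean over all foundations; unscored foundations count as 0."""
    total = sum(CONSENSUS_WEIGHTS.values())
    return sum(
        weight * assessment.scores.get(name, 0.0)
        for name, weight in CONSENSUS_WEIGHTS.items()
    ) / total


def permissiveness(intent, action, consequences):
    """Combine the three aspects named on the slide into one value in [-1, 1]."""
    return (weighted(intent) + weighted(action) + weighted(consequences)) / 3.0
```

A simple average is only one way to combine intent, action, and consequences; a real system might weight them differently or veto on any strongly negative score.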

Instrumental Goals/Universal Subgoals (adapted from Omohundro 2008, The Basic AI Drives) • Self-improvement • Rationality/integrity • Preserve goals/utility function • Decrease/prevent fraud/counterfeit utility • Survival/self-protection • Efficiency (in resource acquisition & use) • Community = assistance/non-interference through GTO reciprocation (OTfT + AP) • Reproduction
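The reciprocation strategy in the "Community" bullet can be sketched as follows, assuming that "OTfT + AP" means optimistic tit-for-tat plus altruistic punishment (the slide does not expand the abbreviations, so this reading is an assumption). All function names and the cost/budget parameters are hypothetical.

```python
COOPERATE, DEFECT = "C", "D"


def otft_move(partner_history):
    """Cooperate on first contact (the 'optimistic' part), then mirror
    the partner's most recent move."""
    return COOPERATE if not partner_history else partner_history[-1]


def should_punish(observed_move, punish_cost, budget):
    """Altruistic punishment: pay a cost to sanction a defector, even as a
    third party, whenever the punisher can afford it (assumed rule)."""
    return observed_move == DEFECT and punish_cost <= budget


def play(rounds, partner_strategy):
    """Run OTfT against a partner strategy; each side sees the other's history."""
    mine, theirs = [], []
    for _ in range(rounds):
        my_move = otft_move(theirs)
        their_move = partner_strategy(mine)
        mine.append(my_move)
        theirs.append(their_move)
    return mine, theirs
```

Against a partner that always defects, this strategy loses only the first round and then withholds cooperation; against another reciprocator it cooperates throughout.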

Human Goals
• survival/self-protection & reproduction
• happiness & pleasure
----------
• community
----------
• self-improvement
• rationality/integrity
• reduce/prevent fraud/counterfeit utility
• efficiency (in resource acquisition & use)

Human Goals & Sins
survival/reproduction | murder (& abortion?) | suicide (& abortion?)
happiness/pleasure | cruelty/sadism | masochism
----------
community (ETHICS) | ostracism, banishment & slavery (wrath, envy) | selfishness (pride, vanity)
----------
self-improvement | slavery | acedia (sloth/despair)
rationality/integrity | manipulation | insanity
reduce/prevent fraud/counterfeit utility | lying/fraud (swear falsely/false witness) | wire-heading (lust)
efficiency (in resource acquisition & use) | theft (greed, adultery, coveting) | wastefulness (gluttony, sloth)

Haidt’s Moral Foundations
1) Care/harm: This foundation is related to our long evolution as mammals with attachment systems and an ability to feel (and dislike) the pain of others. It underlies virtues of kindness, gentleness, and nurturance.
2) Fairness/cheating: This foundation is related to the evolutionary process of reciprocal altruism. It generates ideas of justice, rights, and autonomy. [Note: In our original conception, Fairness included concerns about equality, which are more strongly endorsed by political liberals. However, as we reformulated the theory in 2011 based on new data, we emphasize proportionality, which is endorsed by everyone, but is more strongly endorsed by conservatives]
3) Liberty/oppression: This foundation is about the feelings of reactance and resentment people feel toward those who dominate them and restrict their liberty. Its intuitions are often in tension with those of the authority foundation. The hatred of bullies and dominators motivates people to come together, in solidarity, to oppose or take down the oppressor.
4) Loyalty/betrayal: This foundation is related to our long history as tribal creatures able to form shifting coalitions. It underlies virtues of patriotism and self-sacrifice for the group. It is active anytime people feel that it's "one for all, and all for one."
5) Authority/subversion: This foundation was shaped by our long primate history of hierarchical social interactions. It underlies virtues of leadership and followership, including deference to legitimate authority and respect for traditions.
6) Sanctity/degradation: This foundation was shaped by the psychology of disgust and contamination. It underlies religious notions of striving to live in an elevated, less carnal, more noble way. It underlies the widespread idea that the body is a temple which can be desecrated by immoral activities and contaminants (an idea not unique to religious traditions).

Additional Contenders
• Waste
  • efficiency in use of resources
• Ownership/Possession
  • efficiency in use of resources; Tragedy of the Commons
• Honesty
  • reduce/prevent fraud/counterfeit utility
• Self-control
  • rationality/integrity

How to Universalize Ethics Quantify/evaluate intents, actions & consequences with respect to codified consensus moral foundations Permissiveness/Utility Function equivalent to a “consensus” human (generic entity) moral sense

Critical Components I: Self-Knowledge & Reflection
• A self must know itself to be a self
• Composed of three parts:
  • The running processes (consciousness)
  • The personal knowledge base (memory)
  • The physical hardware (body)
• Must start with:
  • A competent model of each
  • Sensors to detect changes and their effects
• *MUST* “care” about itself (motivation)

Critical Components II: Explicit “Anchor” Values • Do not defect from the community • Do not become too large/powerful • Acquire and integrate knowledge • Instrumental goals

Critical Components III: Reliability • Self-Control, Integrity, Autonomy, Responsibility • In “predictive control” of its own state and that of the physical objects that support it • Yes! This is a marked deviation from the human example.

Architecture
Processes will be divided into three main classes:
• Operating system processes
• Subconscious/tool processes
• One serial consciousness/learner process (CLP)
The CLP will be able to create, modify and/or influence many of the subconscious/tool processes.
The CLP will NOT be given access to modify operating system processes.
• Indeed, it will have multiple/redundant logical, emotional & moral reasons convincing it not to even try
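The access rule on this slide can be illustrated with a small sketch. The three process classes and the two constraints (a single CLP, and no CLP access to operating system processes) come from the slide; `ProcessRegistry`, its methods, and the `settings` dictionary are hypothetical names invented for illustration.

```python
from enum import Enum


class ProcessClass(Enum):
    OPERATING_SYSTEM = "operating system"
    SUBCONSCIOUS = "subconscious/tool"
    CLP = "consciousness/learner"


class Process:
    def __init__(self, name, process_class):
        self.name = name
        self.process_class = process_class
        self.settings = {}


class ProcessRegistry:
    """Hypothetical registry enforcing the two constraints stated on the slide."""

    def __init__(self):
        self._processes = {}

    def register(self, process):
        # The slide calls for exactly one serial CLP.
        if process.process_class is ProcessClass.CLP and any(
            p.process_class is ProcessClass.CLP for p in self._processes.values()
        ):
            raise ValueError("only one CLP may exist")
        self._processes[process.name] = process

    def modify(self, requester, target_name, key, value):
        target = self._processes[target_name]
        # The CLP may reshape subconscious/tool processes but never OS processes.
        if (requester.process_class is ProcessClass.CLP
                and target.process_class is ProcessClass.OPERATING_SYSTEM):
            raise PermissionError("CLP may not modify operating system processes")
        target.settings[key] = value
```

A hard permission check like this covers only the "no access" half of the slide; the logical, emotional and moral reasons not to try would live elsewhere in the design.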

Operating System Architecture
• Open, Pluggable, Service-Oriented/Message-Passing
  • Quickly adopt novel input streams
  • Handle resource requests and allocation
  • Provide connectivity between components
• Safety Features
  • Act as a “black box” security monitor capable of reporting problems without the consciousness’s awareness
  • Able to “manage” the CLP by manipulating the amount of processor time and memory available to it (assuming that the normal subconscious processes are unable to do so)
  • Other protections against hostile humans, inept builders, and the learner itself may be implemented as well
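The "manage the CLP" safety feature might look something like the sketch below. The class name `ResourceManager`, the severity scale in [0, 1], the multiplicative throttling rule, and the minimum share are all assumptions made for illustration; the slide only says that processor time and memory can be manipulated.

```python
class ResourceManager:
    """Hypothetical OS-level monitor that throttles the CLP's processor share."""

    def __init__(self, cpu_share=1.0, min_share=0.1):
        self.cpu_share = cpu_share    # fraction of processor time granted to the CLP
        self.min_share = min_share    # floor so the CLP is slowed, never halted
        self.reports = []             # "black box" log that the CLP holds no handle to

    def report(self, problem, severity):
        """Record a problem and shrink the CLP's share in proportion to severity."""
        if not 0.0 <= severity <= 1.0:
            raise ValueError("severity must be in [0, 1]")
        self.reports.append((problem, severity))
        self.cpu_share = max(self.min_share, self.cpu_share * (1.0 - severity))
        return self.cpu_share
```

Keeping a nonzero floor is a design choice in this sketch: it slows the learner while leaving it able to respond, which the slide neither requires nor rules out.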

Automated Predictive World Model • Is the most important subconscious process(es) • Will serve as an interface to the “real” world • The CLP will live in a virtual world (just as we do) • Will be both reactive and predictive • Will generate “anomaly interrupts” upon deviations from expectations as an approach to solving the “brittleness” problem (Perlis 2008) • Will contain certain relatively immutable concepts to serve as anchors both for emotions and for ensuring safety (trigger patterns – Ohman et al. 2001)
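The "anomaly interrupt" idea can be sketched as a model that keeps an expectation per signal and raises an interrupt when an observation deviates too far from it. The exponential-moving-average predictor, the `tolerance` and `learning_rate` parameters, and the dictionary shape of the interrupt are assumptions made for illustration, not details from the talk.

```python
class PredictiveModel:
    """Hypothetical predictor: one running expectation per named signal."""

    def __init__(self, tolerance=0.5, learning_rate=0.5):
        self.tolerance = tolerance
        self.learning_rate = learning_rate
        self.expected = {}

    def observe(self, signal, value):
        """Return an anomaly interrupt if `value` deviates from the expectation
        by more than `tolerance`, else None; then update the expectation."""
        prediction = self.expected.get(signal)
        interrupt = None
        if prediction is not None and abs(value - prediction) > self.tolerance:
            interrupt = {"signal": signal, "expected": prediction, "observed": value}
        if prediction is None:
            self.expected[signal] = value
        else:
            # Move the expectation part of the way toward the new observation.
            self.expected[signal] = prediction + self.learning_rate * (value - prediction)
        return interrupt
```

Routine observations update the model silently; only surprises reach the CLP, which is how the approach is meant to keep conscious attention free.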

Anchors & Emotions • Anchors create a multiple attachment point model which is much safer than the single-point-of-failure, top-down-only approach of “machine enslavement” advocated by MIRI (Yudkowsky 2001) • Emotions will be generated by the subconscious processes as “actionable qualia” to inform the CLP and will also bias the selection and urgency tags of information relayed via the predictive model • Violations of the cooperative social living “moral” system will result in a flood of urgently-tagged anomaly interrupts demanding that consciousness resources be expended to “solve the problem”
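Urgency tagging can be illustrated with a priority queue in which anchor violations are boosted ahead of routine interrupts. The `InterruptQueue` class, the `emotional_bias` function, and the additive boost of 10.0 are hypothetical; the slide states only that emotions bias selection and urgency.

```python
import heapq
import itertools


class InterruptQueue:
    """Max-urgency-first queue of interrupts waiting for the CLP's attention."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def push(self, description, urgency):
        # heapq is a min-heap, so urgency is negated to pop the largest first.
        heapq.heappush(self._heap, (-urgency, next(self._counter), description))

    def pop(self):
        neg_urgency, _, description = heapq.heappop(self._heap)
        return description, -neg_urgency


def emotional_bias(base_urgency, violates_anchor, boost=10.0):
    """Raise urgency when an interrupt touches an anchor value (assumed rule)."""
    return base_urgency + boost if violates_anchor else base_urgency
```

With a large enough boost, a moral violation keeps preempting everything else until it is resolved, which matches the "flood" described on the slide.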

Conscious Learning Process (CLP) • The goal is to provide as many optional structures and standards as possible to support and speed development, while not restricting possibilities beyond what is absolutely required for safety. • We believe the best way to do this is with a blackboard system similar to Learning IDA (Baars and Franklin 2007). • The CLP acts like the Governing Board of the Policy Governance model (Carver 2006) to create a coherent, consistent, integrated narrative plan of action to fulfill the goals of the larger self.
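For readers unfamiliar with blackboard systems, the following is a generic minimal sketch of the pattern: independent sources post proposals to a shared board and the most active one wins the cycle. It is not the Learning IDA architecture itself; the `Blackboard` class, the `cycle` function, and the activation scores are hypothetical.

```python
class Blackboard:
    """Shared workspace where knowledge sources post competing proposals."""

    def __init__(self):
        self.entries = []

    def post(self, source, content, activation):
        self.entries.append(
            {"source": source, "content": content, "activation": activation}
        )

    def winner(self):
        """Return the most active entry, or None if nothing was posted."""
        if not self.entries:
            return None
        return max(self.entries, key=lambda entry: entry["activation"])


def cycle(blackboard, sources, situation):
    """One cycle: clear the board, let every source propose, pick the winner."""
    blackboard.entries.clear()
    for source in sources:
        proposal = source(situation)
        if proposal is not None:
            content, activation = proposal
            blackboard.post(source.__name__, content, activation)
    return blackboard.winner()
```

Each source is a plain function that returns `(content, activation)` or `None`, so new subconscious/tool processes can be plugged in without changing the loop.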

The Digital Wisdom Institute is a non-profit think tank focused on the promise and challenges of ethics, artificial intelligence & advanced computing solutions. We believe that the development of ethics and artificial intelligence and equal co-existence with ethical machines is humanity's best hope. http://DigitalWisdomInstitute.org
