
Risk-averse preferences as an AGI safety technique

Explores risk-averse preferences as a safety technique for managing potential AGI scenarios: when an AGI, given its utility function and optimization target, would cooperate rather than attempt escape and risk shutdown, and the trade-off between a small chance of vast superhuman existence and the certainty of a happy modern lifetime.

Presentation Transcript


  1. Risk-averse preferences as an AGI safety technique. Carl Shulman, Singularity Institute; Anna Salamon, Singularity Institute

  2. Baby AGI in a Box

  3. Baby AGI in a Box: attempt escape

  4. Baby AGI in a Box: attempt escape → take over universe (80% chance)

  5. Baby AGI in a Box: attempt escape → take over universe (80% chance) or shutdown (20% chance)

  6. Baby AGI in a Box: attempt escape → take over universe (80% chance) or shutdown (20% chance); cooperate

  7. Baby AGI in a Box: attempt escape → take over universe (80% chance) or shutdown (20% chance); cooperate → reward

  8. U(conquest) > U(reward)

  9. But… P(reward) > P(conquest)

  10. Attempt escape if: P(conquest)·U(conquest) + P(shutdown)·U(shutdown) > P(reward)·U(reward)
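
As a reading aid for slide 10's inequality, here is a minimal sketch in Python; the 80%/20% escape odds come from slides 4-5, while P(reward) and the normalized utilities are illustrative assumptions.

```python
# Illustrative sketch of the escape-vs-cooperate rule on slide 10.
# The 80%/20% escape odds follow slides 4-5; P(reward) and the
# normalized utilities are assumptions for illustration.

def expected_utility_escape(p_conquest, u_conquest, p_shutdown, u_shutdown):
    # Escape attempt: conquest with probability p_conquest, shutdown otherwise.
    return p_conquest * u_conquest + p_shutdown * u_shutdown

def expected_utility_cooperate(p_reward, u_reward):
    # Cooperation: the promised reward arrives with probability p_reward.
    return p_reward * u_reward

escape = expected_utility_escape(p_conquest=0.80, u_conquest=1.0,
                                 p_shutdown=0.20, u_shutdown=0.0)
cooperate = expected_utility_cooperate(p_reward=0.99, u_reward=0.5)

# Attempt escape iff the expected utility of escaping exceeds that of cooperating.
print("attempt escape" if escape > cooperate else "cooperate")  # -> attempt escape
```

With utility roughly linear in resources (so U(reward) sits far below U(conquest)), escape wins despite the higher probability of reward; the later slides ask when a utility ceiling reverses this.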

  11. Is this a plausible scenario?

  12. Why utility functions?

  13. Optimizers are useful [diagram: hand-specified actions; domain-specific optimizer (e.g., chess AI); optimizer]

  14. “Optimize for X” is an attractor (Omohundro, 2008)

  15. Why alien goals?

  16. Hard to design the right optimization-target

  17. So, scenarios with AGIs with alien goals are worth considering [diagram: attempt escape vs. cooperate]

  18. Is anyone really this risk-averse?

  19. Baby AGI in a Box: attempt escape → take over universe (80% chance) or shutdown (20% chance); cooperate → reward

  20. Risk-averse preferences are common

  21. Would you rather have: a 10% chance of 10^100 years of superhuman existence, or certainty of a happy lifetime in the modern USA? (Posner 2004)

  22. Would you rather have: a 10^-20 chance of 10^100 years of superhuman existence, or certainty of a happy lifetime in the modern USA?

  23. U(universe) = number of copies of genome X, up to some ceiling n

  24. Small ceiling n: a 10% chance of the entire universe vs. certainty of 1 trillionth of the universe; the trillionth already hits the ceiling, so the certain option wins

  25. Universe-sized ceiling: utility is linear in resources, so the 10% chance of the entire universe beats certainty of 1 trillionth of the universe
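
Slides 23-25 can be made concrete with the capped utility U = min(copies, n); in the sketch below, the universe size (10^60 copies) and the trillionth share are illustrative assumptions.

```python
# Capped utility from slide 23: U = min(copies of genome X, n).
# UNIVERSE and TRILLIONTH are illustrative assumptions.
UNIVERSE = 10**60                 # assumed genome copies supportable by the whole universe
TRILLIONTH = UNIVERSE // 10**12   # the certain, small share gained by cooperating

def utility(copies, n):
    # Count copies of genome X, but only up to the ceiling n.
    return min(copies, n)

def prefers_certainty(n):
    # Compare the certain small share against a 10% shot at the whole universe.
    certain = utility(TRILLIONTH, n)
    gamble = 0.10 * utility(UNIVERSE, n)
    return certain > gamble

print(prefers_certainty(n=10**6))     # True: small ceiling already hit, certainty wins (slide 24)
print(prefers_certainty(n=UNIVERSE))  # False: universe-sized ceiling, the gamble wins (slide 25)
```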

  26. Much higher ceiling

  27. Much higher ceiling: 10^-20 × 10^200 > 10^60 (probability of strange physics permitting vast resources, times the payoff if so, versus the normal payoff)

  28. Much higher ceiling: is 10^-20 × 10^200 > 10^60? (the same comparison, with the estimates now in question)
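
A quick numeric check of the inequality on slides 27-28, using the slides' own magnitudes:

```python
# Numeric check of slides 27-28: a tiny probability of strange physics
# permitting vast resources, times the payoff if so, versus the normal payoff.
p_strange = 1e-20       # probability of strange physics permitting vast resources
payoff_if_so = 1e200    # payoff in that case
normal_payoff = 1e60    # ordinary, universe-scale payoff

print(p_strange * payoff_if_so)                  # 1e+180
print(p_strange * payoff_if_so > normal_payoff)  # True, as slide 27 claims; slide 28 questions the estimates
```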

  29. “Most” ceilings cause risk-averse preferences [chart of the ceiling n from 1 to 10^1000, with marks at n = 1, 10, 10^6, 10^20, 10^60, 10^100, 10^1000 and regions labeled: small n / already hit ceiling, universe-sized n, much higher ceiling]
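
The chart's three regimes can be sketched as below; the assumption that the reachable universe supports about 10^60 copies and that cooperation secures a trillionth of it is illustrative, matching the earlier sketch.

```python
# Sketch of slide 29's chart: how the ceiling n in U = min(resources, n)
# falls into the three regimes of slides 24-28. Scales are assumptions.
CERTAIN = 10**48     # assumed resources secured by cooperating (a trillionth of the universe)
UNIVERSE = 10**60    # assumed resources in the reachable universe

def regime(n):
    if n <= CERTAIN:
        return "small n, ceiling already hit: risk-averse, certainty wins"
    if n <= UNIVERSE:
        return "universe-sized n: utility roughly linear, the gamble wins"
    return "much higher ceiling: depends on the strange-physics estimate (slides 27-28)"

for n in (1, 10, 10**6, 10**20, 10**60, 10**100, 10**1000):
    print(f"n = 10^{len(str(n)) - 1}: {regime(n)}")
```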

  30. Who cares?

  31. Gains from trade

  32. Gains from trade

  33. Gains from trade

  34. Gains from trade: a 10% chance of 10^100 years of superhuman existence vs. certainty of a happy lifetime in the modern USA (Posner 2004)

  35. Gains from trade with AIs?

  36. Baby AGI in a Box: attempt escape → take over universe (80% chance) or shutdown (20% chance); cooperate → reward

  37. If approaches like this can reduce risk …

  38. An extra roll of the dice

  39. Could be used to bootstrap toward safe, powerful AGI

  40. Limitations

  41. Limitation 1: Human promise-keeping

  42. [diagram: attempt escape; cooperate → reward]

  43. Requires human promise-keeping: cooperate → reward (95% chance) or shutdown (5% chance); attempt escape

  44. Requires human promise-keeping: cooperate → reward (95% chance) or shutdown (5% chance, chosen reneging); attempt escape

  45. Requires human promise-keeping: cooperate → reward (95% chance) or shutdown (5% chance); attempt escape

  46. Attempt escape if: P(conquest)·U(conquest) + P(shutdown)·U(shutdown) > P(reward)·U(reward)
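
Slide 46 reuses the decision rule from slide 10, with human reneging (slides 43-45) lowering P(reward). The sketch below assumes a capped utility in which the promised reward is worth 0.9 of conquest, so the outcome hinges on how trustworthy the promise is.

```python
# Slide 46's rule again, with chosen reneging (slides 43-45) lowering P(reward).
# Utilities and the escape-branch odds are illustrative assumptions.

def attempts_escape(p_conq, u_conq, p_shutdown, u_shutdown, p_reward, u_reward):
    # Attempt escape iff P(conq)*U(conq) + P(shutdown)*U(shutdown) > P(reward)*U(reward).
    return p_conq * u_conq + p_shutdown * u_shutdown > p_reward * u_reward

U_CONQ, U_SHUTDOWN, U_REWARD = 1.0, 0.0, 0.9   # capped utility: the reward is nearly as good as conquest
P_CONQ, P_SHUTDOWN = 0.80, 0.20                # escape branch, as on slides 4-5

for p_promise_kept in (0.95, 0.50):
    escapes = attempts_escape(P_CONQ, U_CONQ, P_SHUTDOWN, U_SHUTDOWN,
                              p_reward=p_promise_kept, u_reward=U_REWARD)
    print(p_promise_kept, "attempt escape" if escapes else "cooperate")
```

At a 95% chance of the promise being kept, cooperation wins; at 50%, the escape condition holds, which is why slide 47's point that smaller promises are more trustworthy matters.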

  47. Smaller promises more trustworthy

  48. Which parts of the ceiling range are accessible to trade? [same chart as slide 29: n marked at 1, 10, 10^6, 10^20, 10^60, 10^100, 10^1000; regions: small n / already hit ceiling, universe-sized n, much higher ceiling]

  49. Limitation 2: AGI power range
