Ensuring fairness and safety in artificial intelligence (AI) applications is considered by many to be the biggest challenge in the space. As AI systems match or surpass human intelligence in many areas, it is essential that we establish guidelines to align this new form of intelligence with human values. The challenge is that, as humans, we understand very little about how our values are represented in the brain, and we often can't even formulate specific rules to describe a given value. While AI operates in a data universe, human values are a byproduct of our evolution as social beings. We don't describe human values like fairness or justice in neuroscientific terms, but with arguments from social sciences such as psychology, ethics, or sociology. Last year, researchers from OpenAI published a paper describing the importance of the social sciences for improving the safety and fairness of AI algorithms in processes that require human intervention.

We often hear that we need to avoid bias in AI algorithms by using fair and balanced training datasets. While that is true in many scenarios, there are many instances in which fairness can't be described using simple data rules. A question as simple as "do you prefer A to B?" can have many answers depending on the specific context, human rationality, or emotion. Imagine the task of inferring a pattern of "happiness", "responsibility" or "loyalty" from a specific dataset. Can we describe those values using data alone? Extrapolating that lesson to AI systems tells us that, in order to align with human values, we need help from the disciplines that best understand human behavior.

AI Value Alignment: Learning by Asking the Right Questions

In their research paper, the OpenAI team introduced the notion of AI value alignment as _"the task of ensuring that artificial intelligence systems reliably do what humans want."_ AI value alignment requires a level of understanding of human values in a given context. However, we often can't express the reasoning behind a specific value judgment as a data rule. In those scenarios, the OpenAI team believes that the best way to understand human values is simply to ask questions.

Imagine a scenario in which we are trying to train a machine learning classifier to decide whether the outcome of a specific event is "better" or "worse". Is an increase in taxes better or worse? It might be better for government social programs and worse for your own economic plans. Would it be better or worse if it rains today? It might be better for the farmers and worse for the folks who were planning a biking trip. Questions about human values can have different subjective answers depending on the specific context. From that perspective, if we can get AI systems to ask specific questions, maybe they can learn to imitate human judgment in specific scenarios.
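To make the idea concrete, here is a minimal sketch of that kind of classifier: it learns "better"/"worse" judgments from answered questions, where the same event can receive opposite labels depending on who is answering. The dataset, feature names, and model choice are illustrative assumptions of mine, not anything described in the OpenAI paper.

```python
# A minimal sketch (hypothetical data): learning context-dependent
# "better"/"worse" judgments from answered questions.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each record is one answered question: the event being judged and the
# context of the person answering. Verdicts: 1 = "better", 0 = "worse".
answers = [
    {"event": "tax_increase", "context": "social_program_advocate"},
    {"event": "tax_increase", "context": "small_business_owner"},
    {"event": "rain_today",   "context": "farmer"},
    {"event": "rain_today",   "context": "cyclist"},
]
verdicts = [1, 0, 1, 0]  # the same event gets opposite labels in different contexts

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(answers)              # one-hot encode event + context
model = LogisticRegression().fit(X, verdicts)

# Ask a new question: is rain "better" from a farmer's point of view?
query = vec.transform([{"event": "rain_today", "context": "farmer"}])
print(model.predict_proba(query)[0, 1])     # probability the answer is "better"
```

The point of the sketch is only that the context has to be part of the input; without it, the same question carries contradictory labels and no classifier can recover a consistent notion of "better". In a real system the questions would be asked interactively rather than collected up front.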

Asking the right questions is an effective method for achieving AI value alignment. Unfortunately, this type of learning is vulnerable to three well-known limitations of human value judgment:

  1. Reflective equilibrium: In many cases, humans can't arrive at the right answer to a question related to value judgment. Cognitive or ethical biases, a lack of domain knowledge, or a fuzzy definition of "correctness" are factors that can introduce ambiguity into the answers. However, if we remove many of the contextual limitations of the question, a person might arrive at the "right answer". In philosophy this is known as "reflective equilibrium", and it is one of the mechanisms that any AI algorithm trying to learn about human values should imitate.
  2. Uncertainty: Even if we can achieve a reflective equilibrium for a given question, there might be many circumstances in which uncertainty or disagreement prevents humans from arriving at the right answer. Any activity related to future planning often entails uncertainty; one simple way to keep that uncertainty visible in training data is sketched after this list.
  3. Deception: Humans have a unique ability to provide answers that sound plausible but are wrong in some non-obvious way. Whether intentional or not, deceptive or misleading behavior often results in a misalignment between the outcome of a given event and the values of the parties involved. Recognizing deceptive behavior is a non-trivial challenge that needs to be solved to achieve AI value alignment.
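As a rough illustration of the uncertainty point above, one option is to keep disagreement in the data instead of forcing a single "right" answer: record every answer a question received and train against the resulting soft label. This is my own assumption about how one might represent it, not the paper's method; the question and answers below are hypothetical.

```python
# A minimal sketch (hypothetical data): preserving disagreement between
# answerers as a soft label instead of a forced 0/1 verdict.
from collections import Counter

def soft_label(raw_answers):
    """Turn a list of 'better'/'worse' answers into P(better)."""
    counts = Counter(raw_answers)
    return counts["better"] / sum(counts.values())

question = "Would it be better or worse if it rains today?"
raw_answers = ["better", "better", "worse", "better", "worse"]  # five hypothetical answers

p_better = soft_label(raw_answers)
print(f"{question} -> P(better) = {p_better:.2f}")  # 0.60: the disagreement is preserved
```

A model trained against such soft targets (for example, with a cross-entropy loss) inherits the disagreement rather than a falsely confident label, which is closer to how humans actually answer these questions.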
