Artificial Intelligence

when people talk about AI, they usually mean supervised learning

ethics

  • Mill’s utilitarianism: maximize overall benefit, counting everyone roughly equally

    bane:

    • conflict of interest
    • personal benefit
  • Kant’s formalism / duty ethics: unconditional commands that apply to every individual

    bane: universal principle may harm specific people

  • Locke’s rights ethics: individuals have rights simply by existing

    bane: what to do when two people’s rights conflict

  • Aristotle’s virtue ethics: objective goodness from human qualities

    bane: how to find the “golden mean”

the golden rule (which all of the above agree with)

Do unto others as you would have others do unto you

codes of ethics

statements of general principles, followed by instructions for specific conduct

defines the duties the professional owes to society/employers/clients/colleagues/subordinates/profession/self

engineering design process

  1. recognize problem/need, gather information
  2. define problem/goal
  3. generate/propose solution/method
  4. evaluate benefits & costs of alternatives

handling ethical issues

  1. correct the problem
  2. whistle blowing
  3. resign in protest

definition of artificial intelligence

  • humanly → measured against human performance (e.g. acting humanly: the Turing test)
  • rationally → measured against an ideal standard from math/theory

Turing test

  • NLP
  • knowledge representation
  • automated reasoning
  • ML

total Turing test

perceptual abilities

  • computer vision
  • robotics

thinking humanly

  • get inside the workings of the human mind
  • general problem solver
  • cognitive science

thinking rationally

  • correctness
  • logic
  • fact-check

knowledge-based system

  • general-purpose search
  • domain-specific knowledge
  • knowledge bottleneck

intelligent agent

rational agent

  1. prior knowledge of environment
  2. performable action
  3. performance measurement
  4. perception

task environment

PEAS: performance measure, environment, actuators, sensors

properties

  1. fully/partially observable
  2. single/multiple agent
  3. deterministic/stochastic
  4. episodic/sequential
  5. static/dynamic
  6. known/unknown

agent structure

  • function: perception → action
  • architecture: sensory → actuator

table-driven structure

  • simplest
  • e.g. industrial robot
  • number of cases explodes (table grows too large)

simple reflex agent

match input against condition-action rules, return an action; only works if the environment is fully observable (a minimal sketch follows this list)

more capable agent structures

  • model-based: maintains an internal model of the world, so it also handles partial observability
  • goal-based: searches for action sequences that reach a goal (e.g. breadth/depth first)
  • utility-based: maximize expected gain (utility)
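A minimal sketch of the rule-matching idea in Python; the vacuum-world percepts and rules are illustrative assumptions, not from the notes.

```python
# Simple reflex agent: match the current percept against condition-action
# rules and return an action (hypothetical two-square vacuum world).

RULES = {
    ("A", "dirty"): "suck",
    ("B", "dirty"): "suck",
    ("A", "clean"): "move_right",
    ("B", "clean"): "move_left",
}

def simple_reflex_agent(percept):
    """percept = (location, status); decides on the current percept only."""
    return RULES[percept]

print(simple_reflex_agent(("A", "dirty")))  # -> suck
```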

example application

collaborative perception

  • share raw data → huge overhead
  • share extracted features
  • share object positions

anomaly detection

search

  • bidirectional search
  • backtracking search

breadth-first search

  • complete: will find the shallowest goal if the branching factor is finite
  • optimal if path cost is a non-decreasing function of depth
  • time/space complexity $O(b^d)$, where $b$ is the branching factor and $d$ is the depth of the shallowest goal (sketch below)
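A short breadth-first search sketch matching the properties above; the adjacency-dict graph format is an assumption.

```python
from collections import deque

def breadth_first_search(graph, start, goal):
    """Return the shallowest path from start to goal, or None.

    graph: dict mapping a node to a list of neighbor nodes.
    Complete if the branching factor is finite; optimal when path cost
    is a non-decreasing function of depth (e.g. unit step costs).
    """
    frontier = deque([[start]])          # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nbr in graph.get(node, []):
            if nbr not in visited:
                visited.add(nbr)
                frontier.append(path + [nbr])
    return None
```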

uniform-cost search

  • expand the node with the least path cost $g(n)$
  • optimal; complete if every action cost is at least some $\epsilon > 0$

A* search

  • cost $f(n) = g(n) + h(n)$
    • $g(n)$: cost to reach node $n$
    • $h(n)$: estimated cost from $n$ to the goal (the heuristic)
  • optimal if $h$ is admissible in tree search
    • admissibility: $h$ never overestimates the true cost to the goal
  • optimal if $h$ is consistent in graph search
    • consistency: triangle inequality $h(n) \le c(n, a, n') + h(n')$
  • optimally efficient if $h$ is consistent: expands the fewest nodes among optimal algorithms (a sketch follows this list)
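A compact A* sketch for $f(n) = g(n) + h(n)$; the `neighbors`/`cost`/`h` callables are assumed interfaces, not from the notes.

```python
import heapq, itertools

def a_star(start, goal, neighbors, cost, h):
    """A* graph search: always expand the node with the least f(n) = g(n) + h(n).

    neighbors(n) -> iterable of successor nodes
    cost(n, n')  -> step cost
    h(n)         -> heuristic estimate of cost-to-goal
    """
    tie = itertools.count()                          # tie-breaker so nodes are never compared
    frontier = [(h(start), next(tie), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        f, _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g                           # optimal if h is admissible/consistent
        for nbr in neighbors(node):
            g2 = g + cost(node, nbr)
            if g2 < best_g.get(nbr, float("inf")):   # found a cheaper way to nbr
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h(nbr), next(tie), g2, nbr, path + [nbr]))
    return None, float("inf")
```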

reinforcement learning

  • control system
  • model-based vs model-free

dynamic programming

  • key idea: break the problem into overlapping sub-problems and reuse their solutions

example

  • Dijkstra’s algorithm: processes one node at a time (greedy node selection)
  • Bellman-Ford algorithm: relaxes all edges once per hop count (DP; a sketch follows)
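A Bellman-Ford sketch showing the per-hop DP structure; the edge-list format is an assumption.

```python
def bellman_ford(n, edges, source):
    """Shortest paths as DP over the number of hops.

    Conceptually dist[k][v] = cheapest path to v using at most k hops;
    the k dimension is rolled into one array updated once per pass.
    edges: list of (u, v, weight) for nodes 0..n-1.
    """
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0.0
    for _ in range(n - 1):                 # a shortest path uses at most n-1 hops
        for u, v, w in edges:
            if dist[u] + w < dist[v]:      # relax edge (u, v)
                dist[v] = dist[u] + w
    return dist
```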

discrete Markov decision process (discrete MDP)

finite tuple $(S, A, \{P_{sa}\}, \gamma, R)$

  • state space $S$

  • action set $A$

  • state transition probabilities $P_{sa}$

  • discount factor $\gamma \in [0, 1)$

  • reward function $R: S \to \mathbb{R}$: evaluation metric

  • total payoff $R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots$. maximize this

  • policy $\pi: S \to A$. find this

find optimal policy

optimal policy

  • value function $V^\pi: S \to \mathbb{R}$ maps a state to the expected total payoff from starting in that state and following $\pi$

    Bellman equation: $V^\pi(s) = R(s) + \gamma \sum_{s' \in S} P_{s\pi(s)}(s') V^\pi(s')$

  • optimal value function $V^*(s) = \max_\pi V^\pi(s)$; optimal policy $\pi^*(s) = \arg\max_{a \in A} \sum_{s' \in S} P_{sa}(s') V^*(s')$

value iteration
  1. Bellman update: $V(s) := R(s) + \max_{a \in A} \gamma \sum_{s' \in S} P_{sa}(s') V(s')$ (sketch after this list)

  • for a fixed policy the Bellman equation is a linear system in $V^\pi$ ($|S|$ equations, $|S|$ unknowns)
  • the update applies the Bellman backup operator $B$: $V := B(V)$
  • synchronous or asynchronous updates
  • $B$ is a contraction, which forces $V$ to converge to $V^*$ exponentially fast
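A minimal tabular value-iteration sketch of the Bellman update; the `(A, S, S)` transition-array layout and stopping tolerance are assumptions.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Synchronous Bellman updates V(s) := R(s) + gamma * max_a sum_s' P_sa(s') V(s').

    P: array of shape (A, S, S), P[a, s, s'] = transition probability
    R: array of shape (S,), state reward
    """
    V = np.zeros(R.shape[0])
    while True:
        Q = R[None, :] + gamma * (P @ V)      # Q[a, s] = R(s) + gamma * sum_s' P[a,s,s'] V[s']
        V_new = Q.max(axis=0)                 # greedy backup over actions
        if np.max(np.abs(V_new - V)) < tol:   # contraction => exponentially fast convergence
            return V_new
        V = V_new
```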
policy iteration
  1. initialize $\pi$ randomly

  2. repeat until convergence:

    1. policy evaluation: set $V := V^\pi$ by solving the Bellman equation (a linear system)
    2. policy improvement: $\pi(s) := \arg\max_{a \in A} \sum_{s'} P_{sa}(s') V(s')$

  • when it converges, the policy is guaranteed optimal ($V = V^*$, $\pi = \pi^*$); see the sketch below
  • high complexity: solves a linear system every step
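A policy-iteration sketch matching the steps above (exact evaluation via a linear solve, then greedy improvement), using the same assumed `P`/`R` layout as the value-iteration sketch.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """P: (A, S, S) transition probabilities, R: (S,) state rewards."""
    A, S, _ = P.shape
    pi = np.zeros(S, dtype=int)                 # 1. arbitrary initial policy
    while True:
        # 2a. policy evaluation: solve (I - gamma * P_pi) V = R, a linear system
        P_pi = P[pi, np.arange(S), :]           # (S, S): row s is P[pi[s], s, :]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R)
        # 2b. policy improvement: greedy with respect to V
        pi_new = (P @ V).argmax(axis=0)
        if np.array_equal(pi_new, pi):          # converged: pi is optimal
            return pi, V
        pi = pi_new
```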
exploration and exploitation
  • $\epsilon$-greedy

    • $\epsilon$ is small and decreases over time
  • softmax: choose action $a$ with probability proportional to $e^{Q(s,a)/\tau}$ (both shown in the sketch below)
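The two exploration rules as code; the temperature name `tau` is an assumption.

```python
import numpy as np

def epsilon_greedy(Q_s, eps):
    """With probability eps pick a random action, otherwise argmax_a Q(s, a)."""
    if np.random.random() < eps:                     # explore; eps is small and decays
        return np.random.randint(len(Q_s))
    return int(np.argmax(Q_s))                       # exploit

def softmax_action(Q_s, tau=1.0):
    """Boltzmann exploration: P(a) proportional to exp(Q(s, a) / tau)."""
    prefs = (np.asarray(Q_s) - np.max(Q_s)) / tau    # shift by max for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(np.random.choice(len(Q_s), p=probs))
```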

continuous Markov decision process (continuous MDP)

inverted pendulum

  • kinematic model, e.g. state $(x, \dot{x}, \theta, \dot{\theta})$

discretization

  • curse of dimensionality: with $n$ state dimensions and $k$ grid values per dimension, the discretized state space has $k^n$ states
  • bad for smooth functions (piecewise-constant approximation)
  • typically feasible only up to roughly 4 to 8 dimensions

value function approximation

approximate $V^*$ directly, without discretizing the state space

getting a model (for running trials)

  1. model/simulator: a black box that takes $(s_t, a_t)$ and outputs $s_{t+1}$ sampled from $P_{s_t a_t}$

    • assume the action space is discrete (the state space stays continuous)

  2. learn a model from data

    • run $m$ trials, each with $T$ time steps
    • supervised learning on the observed transitions
      • e.g. linear regression: $s_{t+1} = A s_t + B a_t$ (a fitting sketch follows this list)
      • deterministic or stochastic model
        • stochastic: add a noise term $\epsilon_t \sim \mathcal{N}(0, \Sigma)$
      • this is model-based reinforcement learning
      • then run fitted value iteration on the learned model
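A least-squares fit of $s_{t+1} \approx A s_t + B a_t$ from recorded trials, as a sketch; the array shapes are assumptions.

```python
import numpy as np

def fit_linear_model(states, actions, next_states):
    """Fit s_{t+1} ~= A s_t + B a_t by least squares over all recorded transitions.

    states, next_states: (N, n_s) arrays; actions: (N, n_a) array,
    stacked from m trials of T steps each.
    """
    X = np.hstack([states, actions])                      # (N, n_s + n_a)
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)   # (n_s + n_a, n_s)
    n_s = states.shape[1]
    A, B = W[:n_s].T, W[n_s:].T                           # so that s' = A s + B a
    # stochastic model: noise eps_t ~ N(0, Sigma), Sigma estimated from residuals
    resid = next_states - X @ W
    Sigma = np.cov(resid.T)
    return A, B, Sigma
```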
fitted value iteration

approximate $V^*(s)$ as a function of state features: $V(s) = \theta^T \phi(s)$

  1. trial: randomly sample $m$ states $s^{(1)}, \dots, s^{(m)} \in S$

  2. initialization: $\theta := 0$

  3. repeat for each sampled state $i = 1, \dots, m$:

    1. repeat for each action $a \in A$:

      sample $s'_1, \dots, s'_k \sim P_{s^{(i)} a}$ using the model; set $q(a) := R(s^{(i)}) + \gamma \frac{1}{k} \sum_{j=1}^{k} V(s'_j)$

      $y^{(i)} := \max_a q(a)$ is an estimation of $R(s^{(i)}) + \gamma \max_a \mathbb{E}_{s' \sim P_{s^{(i)} a}}[V(s')]$

fit $\theta$ with any regression model, e.g. linear regression: $\theta := \arg\min_{\theta} \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T \phi(s^{(i)}) - y^{(i)} \right)^2$ (condensed sketch below)

  • for a deterministic model, can set $k = 1$
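A condensed fitted-value-iteration sketch following the steps above; the `simulate`, `reward`, and `phi` interfaces are assumed.

```python
import numpy as np

def fitted_value_iteration(sample_states, actions, simulate, reward, phi,
                           gamma=0.99, k=10, n_iters=50):
    """V(s) ~= theta^T phi(s), fitted by repeated regression on Bellman backups.

    sample_states: randomly sampled states s^(1..m)
    simulate(s, a) -> one sampled next state s' ~ P_sa (the model/simulator)
    reward(s) -> R(s); phi(s) -> feature vector
    """
    Phi = np.array([phi(s) for s in sample_states])       # (m, d) design matrix
    theta = np.zeros(Phi.shape[1])                        # initialization
    for _ in range(n_iters):
        y = []
        for s in sample_states:
            # q(a) = R(s) + gamma * (1/k) * sum_j theta^T phi(s'_j)
            q = [reward(s) + gamma * np.mean(
                     [theta @ phi(simulate(s, a)) for _ in range(k)])
                 for a in actions]
            y.append(max(q))                              # y^(i) = max_a q(a)
        # fit theta by least squares: min ||Phi theta - y||^2
        theta, *_ = np.linalg.lstsq(Phi, np.array(y), rcond=None)
    return theta
```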

Mealy machine MDP

  • reward depends on the action as well as the state, $R(s, a)$ (like a Mealy machine, whose output depends on state and input)

Bellman equation: $V^*(s) = \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s'} P_{sa}(s') V^*(s') \right]$

finite horizon MDP

  • finite tuple $(S, A, \{P^{(t)}_{sa}\}, T, R^{(t)})$

  • time horizon $T$

  • maximize $\mathbb{E}\left[\sum_{t=0}^{T} R^{(t)}(s_t, a_t)\right]$ (no discount factor)

  • action depends on time: non-stationary policy $\pi^{(t)}: S \to A$

  • time-dependent dynamics: $s_{t+1} \sim P^{(t)}_{s_t a_t}$

    • solution by dynamic programming: work backwards from $T$, i.e. $V^*_T(s) = \max_a R^{(T)}(s, a)$, then for $t < T$: $V^*_t(s) = \max_a \left[ R^{(t)}(s, a) + \sum_{s'} P^{(t)}_{sa}(s') V^*_{t+1}(s') \right]$
linear quadratic regulation (LQR)
  • linear transition with noise: $s_{t+1} = A_t s_t + B_t a_t + w_t$, where $w_t \sim \mathcal{N}(0, \Sigma_w)$

  • negative quadratic reward that pushes the system back toward the origin: $R^{(t)}(s_t, a_t) = -\left( s_t^T U_t s_t + a_t^T W_t a_t \right)$ with $U_t, W_t \succeq 0$

policy search methods

  • stochastic policy $\pi_\theta$

  • $\pi_\theta(s, a)$: probability of taking action $a$ at state $s$

  • direct policy search: find a reasonable $\theta$

    • fixed initial state $s_0$
    • greedy stochastic gradient ascent on the expected total payoff
    • learning rate $\alpha$

repeat:

  1. sample a trajectory $s_0, a_0, s_1, a_1, \dots, s_T, a_T$ by executing $\pi_\theta$

  2. REINFORCE update: $\theta := \theta + \alpha \left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(s_t, a_t) \right] \cdot (\text{total payoff of the trajectory})$

reason it converges: by the product rule, the expected value of this update equals the gradient of the expected total payoff, so on average the update performs gradient ascent (sketch below)
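A minimal REINFORCE sketch with a softmax policy over per-action features; the environment interface (`reset`/`step`) and feature map `phi` are assumptions.

```python
import numpy as np

def softmax_policy(theta, phi_sa):
    """phi_sa: (n_actions, d) feature vectors for each action in the current state."""
    prefs = phi_sa @ theta
    prefs -= prefs.max()                       # numerical stability
    return np.exp(prefs) / np.exp(prefs).sum()

def reinforce(env, phi, theta, alpha=0.01, episodes=1000):
    """theta := theta + alpha * [sum_t grad log pi(s_t, a_t)] * (total payoff)."""
    for _ in range(episodes):
        s, done = env.reset(), False           # assumed env: reset() -> state
        grads, total_payoff = [], 0.0
        while not done:
            phi_sa = phi(s)                     # (n_actions, d)
            probs = softmax_policy(theta, phi_sa)
            a = np.random.choice(len(probs), p=probs)
            # grad_theta log pi(s, a) for a softmax policy:
            grads.append(phi_sa[a] - probs @ phi_sa)
            s, r, done = env.step(a)            # assumed env: step(a) -> (s', r, done)
            total_payoff += r
        theta += alpha * total_payoff * np.sum(grads, axis=0)
    return theta
```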

partially observable MDP (POMDP)

reinforce with baseline

  • subtract an arbitrary baseline $b$ from the payoff in the update to reduce variance
    • $b$ must be independent of the actions $a_t$, so the expected update is unchanged

Monte Carlo method

  • trial: run a complete episode under the current policy
  • wait until the end of the episode to compute the return, then update toward it
  • drawback: slow if episodes are long

temporal-difference learning (TD learning)

at time $t+1$, update $V(s_t) := V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]$

  • TD error: $\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$ (sketch below)
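The TD(0) update as code; the dict-based value table is an assumption.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One temporal-difference update of the state-value table V (a dict)."""
    td_error = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)   # delta_t
    V[s] = V.get(s, 0.0) + alpha * td_error                     # V(s_t) += alpha * delta_t
    return td_error
```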

on-policy TD (SARSA)

  • behavior policy: $\epsilon$-greedy with respect to $Q$
  1. initialize $Q(s, a)$ arbitrarily
  2. repeat for each episode:
    1. initialize $s$

    2. behavior policy: select $a$ for $s$ based on $Q$ ($\epsilon$-greedy)

    3. repeat for each step:

      1. take action $a$, observe $r$ and $s'$; select the potential next action $a'$ for $s'$ based on $Q$ ($\epsilon$-greedy)

      2. update $Q(s, a) := Q(s, a) + \alpha \left[ r + \gamma Q(s', a') - Q(s, a) \right]$, then $s := s'$, $a := a'$ (loop sketch below)

      until $s$ is terminal
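A tabular SARSA loop following the steps above, reusing the `epsilon_greedy` helper from the exploration sketch; the `env` interface is an assumption.

```python
import numpy as np
from collections import defaultdict

def sarsa(env, n_actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """On-policy TD control: the update target uses the action actually selected next."""
    Q = defaultdict(lambda: np.zeros(n_actions))   # Q(s, a), arbitrary init (zeros)
    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(Q[s], eps)              # select a for s based on Q (eps-greedy)
        done = False
        while not done:
            s2, r, done = env.step(a)              # take a, observe r and s'
            a2 = epsilon_greedy(Q[s2], eps)        # select a' for s' based on Q (eps-greedy)
            Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])
            s, a = s2, a2
    return Q
```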

off-policy TD: Q-learning

same as SARSA except the update target uses the greedy action:

  • $Q(s, a) := Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$ (snippet below)
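Relative to the SARSA loop above, only the update target changes (greedy max instead of the behavior policy's next action):

```python
# Inside the same loop as the SARSA sketch: the behavior action stays eps-greedy,
# but the target is greedy, which makes the method off-policy.
Q[s][a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s][a])
```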