- Artificial Intelligence
- ethics
- definition of artificial intelligence
- intelligent agent
- search
- example application
- reinforcement learning
Artificial Intelligence
when people talk about AI, they mostly mean supervised learning
ethics
Mill’s utilitarianism: maximize benefit for everyone, mostly equally
bane:
- conflict of interest
- personal benefit
Kant’s formalism / duty ethics: an unconditional (categorical) command binding on every individual
bane: universal principle may harm specific people
Locke’s rights ethics: individuals have rights simply by existing
bane: what to do when two people’s rights conflict
Aristotle’s virtue ethics: objective goodness from human qualities
bane: how to find the “golden mean”
the golden rule (all of the above agree with it)
Do unto others as you would have others do unto you
codes of ethics
statements of general principles, followed by instructions for specific conduct
defines duties the professional owes to society/employers/clients/colleagues/subordinates/profession/self
engineering design process
- recognize problem/need, gather information
- define problem/goal
- generate/propose solution/method
- evaluate benefit&cost of alternatives
handling ethical issues
- correct the problem
- whistle blowing
- resign in protest
definition of artificial intelligence
- humanly: act/think like a human
- rationally: act/think according to an ideal standard from math/theory
Turing test
- NLP
- knowledge representation
- automated reasoning
- ML
total Turing test
perceptual abilities
- computer vision
- robotics
thinking humanly
- get inside the workings of the human mind
- general problem solver
- cognitive science
thinking rationally
- correctness
- logic
- fact-check
knowledge-based system
- general-purpose search
- domain-specific knowledge
- knowledge bottleneck
intelligent agent
rational agent
- prior knowledge of environment
- performable action
- performance measurement
- perception
task environment
PEAS: performance measurement, environment, actuators, sensors
properties
- fully/partially observable
- single/multiple agent
- deterministic/stochastic
- episodic/sequential
- static/dynamic
- known/unknown
agent structure
- function: perception → action
- architecture: sensory → actuator
table-driven structure
- simplest
- e.g. industrial robot
- number of cases explodes (table becomes too large)
simple reflex agent
match the current percept against rules, return an action (minimal sketch after this list)
- works only if the environment is fully observable
- model-based: keeps an internal state/model, so it also handles partially observable environments
- goal-based
- utility-based: maximize expected utility
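A minimal sketch of the rule-matching idea, using a hypothetical two-square vacuum world; the rule table, percepts, and action names below are made up for illustration:

```python
# Simple reflex agent sketch: map the current percept directly to an action
# via a fixed rule table (hypothetical two-square vacuum world).

RULES = {
    ("A", "dirty"): "suck",
    ("B", "dirty"): "suck",
    ("A", "clean"): "move_right",
    ("B", "clean"): "move_left",
}

def simple_reflex_agent(percept):
    """Match the percept (location, status) against the rules, return an action."""
    return RULES[percept]

print(simple_reflex_agent(("A", "dirty")))  # -> suck
```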
search
- breadth/depth first
example application
collaborative perception
- share raw data → huge overhead
- extract feature
- share object position
anomaly detection
- bidirectional optimization
uninformed search
- backtracking search
breadth-first search
- complete: will find the shallowest goal if the branching factor is finite
- optimal if the path cost is a non-decreasing function of depth (e.g., all step costs equal)
- time/space complexity O(b^d), where b is the branching factor and d is the depth of the shallowest goal (sketch below)
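A minimal breadth-first search sketch, assuming the graph is given as a `neighbors` function; the small example graph at the bottom is made up for illustration:

```python
from collections import deque

def breadth_first_search(start, goal, neighbors):
    """Return a path with the fewest edges from start to goal, or None.

    neighbors(node) yields adjacent nodes; the branching factor is assumed finite.
    """
    frontier = deque([start])
    parent = {start: None}            # doubles as the explored set
    while frontier:
        node = frontier.popleft()
        if node == goal:              # goal found at the shallowest depth
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None

# tiny made-up example
graph = {"S": ["A", "B"], "A": ["G"], "B": ["A"], "G": []}
print(breadth_first_search("S", "G", lambda n: graph[n]))  # ['S', 'A', 'G']
```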
uniform-cost search
expand node with least path cost
- complete and optimal if every action costs at least some ε > 0
- complexity O(b^(1 + ⌊C*/ε⌋)), where C* is the cost of the optimal solution
informed search (heuristic search)
A* search (A-star search)
- evaluation function f(n) = g(n) + h(n)
- g(n): cost to reach node n
- h(n): estimated cost from n to the goal (heuristic)
- optimal if h is admissible in tree search
- admissibility: h never overestimates the true cost to the goal
- optimal if h is consistent in graph search
- consistency: h(n) ≤ c(n, a, n') + h(n') (triangle inequality)
- optimally efficient if h is consistent: expands the fewest nodes among optimal algorithms (sketch below)
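A minimal A* sketch, assuming a `successors(n)` function yielding `(neighbor, step_cost)` pairs and a heuristic `h(n)`; with `h = 0` it reduces to uniform-cost search. The example graph is made up for illustration:

```python
import heapq

def a_star(start, goal, successors, h):
    """Expand the frontier node with the smallest f(n) = g(n) + h(n)."""
    frontier = [(h(start), 0.0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0.0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue                                # stale queue entry
        for nxt, cost in successors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")

# tiny made-up weighted graph; h = 0 makes this uniform-cost search
graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)], "G": []}
print(a_star("S", "G", lambda n: graph[n], h=lambda n: 0))  # (['S', 'B', 'G'], 5.0)
```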
reinforcement learning
- control system
- model-based vs model-free
dynamic programming
- key: sub-problem
example
- Dijkstra’s algorithm: settles one node (the cheapest) at a time
- Bellman-Ford algorithm: relaxes all edges once per hop count
discrete Markov decision process (discrete MDP)
finite tuple (S, A, {P_sa}, γ, R)
state space S
action set A
state transition probabilities P_sa(s')
discount factor γ ∈ [0, 1)
reward function R(s): evaluation metric
total payoff R(s_0) + γ R(s_1) + γ² R(s_2) + …; maximize its expectation
policy π: S → A; find this
find optimal policy
value function V^π(s): maps a state to the expected total payoff from following π starting at s
optimal policy π*(s) = argmax_a Σ_s' P_sa(s') V*(s')
Bellman equation: V^π(s) = R(s) + γ Σ_s' P_sπ(s)(s') V^π(s')
value iteration
Bellman update: V(s) := R(s) + γ max_a Σ_s' P_sa(s') V(s')
- for a fixed policy the Bellman equation is a linear system in V^π; the max makes this update nonlinear
- Bellman backup operator
- sync/async update
- the discount γ < 1 forces convergence at an exponential (geometric) rate (sketch below)
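A minimal synchronous value-iteration sketch for a small tabular MDP; the array names and shapes (R per state, P indexed as P[a, s, s']) are conventions of this sketch, not from the notes:

```python
import numpy as np

def value_iteration(R, P, gamma=0.9, tol=1e-8):
    """R: rewards, shape (nS,); P: transitions, shape (nA, nS, nS)."""
    nA, nS, _ = P.shape
    V = np.zeros(nS)
    while True:
        # Bellman update: V(s) := R(s) + gamma * max_a sum_s' P_sa(s') V(s')
        Q = R[None, :] + gamma * (P @ V)        # shape (nA, nS)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:     # contraction => geometric convergence
            return V_new, Q.argmax(axis=0)      # V* and a greedy policy
        V = V_new
```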
policy iteration
initialize π randomly
repeat: solve the linear system for V^π (policy evaluation), then set π(s) := argmax_a Σ_s' P_sa(s') V^π(s') (policy improvement)
- when it converges, the policy is guaranteed optimal
- high complexity: solves a linear system every step (sketch below)
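A minimal policy-iteration sketch with exact policy evaluation via a linear solve, using the same array conventions as the value-iteration sketch above:

```python
import numpy as np

def policy_iteration(R, P, gamma=0.9):
    """R: rewards, shape (nS,); P: transitions, shape (nA, nS, nS)."""
    nA, nS, _ = P.shape
    pi = np.zeros(nS, dtype=int)                  # arbitrary initial policy
    while True:
        # policy evaluation: V^pi solves (I - gamma * P_pi) V = R
        P_pi = P[pi, np.arange(nS), :]            # row s is P_{s, pi(s)}
        V = np.linalg.solve(np.eye(nS) - gamma * P_pi, R)
        # policy improvement: act greedily with respect to V^pi
        pi_new = (R[None, :] + gamma * (P @ V)).argmax(axis=0)
        if np.array_equal(pi_new, pi):
            return V, pi                          # no change => optimal
        pi = pi_new
```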
exploration and exploitation
ε-greedy
- ε is small and decreases over time (both rules sketched below)
softmax
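A minimal sketch of both action-selection rules over a vector of Q-values; the temperature parameter of the softmax is an assumption of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon explore uniformly, otherwise pick the best action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_action(q_values, temperature=1.0):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    z = np.asarray(q_values, dtype=float) / temperature
    p = np.exp(z - z.max())                     # subtract max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(q_values), p=p))
```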
continuous Markov decision process (continuous MDP)
inverted pendulum
- kinematic model
discretization
- curse of dimensionality: the number of grid cells grows exponentially with the state dimension
- bad for smooth functions (piecewise-constant approximation)
- practical only up to roughly 4 ~ 8 dimensions
value function approximation
approximate the optimal value function V* directly
trial: run the system to collect state/action data
model/simulator: given (s_t, a_t), outputs a next state s_{t+1} ~ P_{s_t a_t}
- assume a small, discrete action set
learn from data
- m trials, each with T time steps
- supervised learning
- linear regression, e.g. s_{t+1} = A s_t + B a_t
- deterministic/stochastic model
- stochastic: add a noise term ε_t
- model-based reinforcement learning
- fitted value iteration
fitted value iteration
approximate V(s) ≈ θᵀφ(s) from sampled states, where φ(s) is a feature mapping
trial: randomly sample m states s^(1), …, s^(m)
initialization: θ := 0
repeat for i = 1, …, m:
repeat for each action a:
sample s'_1, …, s'_k ~ P_{s^(i) a}; set q(a) = (1/k) Σ_j [R(s^(i)) + γ V(s'_j)]
y^(i) = max_a q(a) is an estimate of R(s^(i)) + γ max_a E[V(s')]
fit θ with any regression model, e.g. linear regression: θ := argmin_θ Σ_i (θᵀφ(s^(i)) - y^(i))²
- for a deterministic model, can set k = 1 (sketch below)
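A minimal fitted value iteration sketch with a linear approximator V(s) ≈ θᵀφ(s); the `simulate`, `reward`, and `phi` callables stand in for the model/simulator, reward function, and feature map, and all names are conventions of this sketch:

```python
import numpy as np

def fitted_value_iteration(states, actions, simulate, reward, phi,
                           gamma=0.99, k=10, iters=50):
    """states: m sampled states; simulate(s, a) draws s' ~ P_sa; phi maps s -> features."""
    theta = np.zeros(len(phi(states[0])))
    for _ in range(iters):
        Phi, y = [], []
        for s in states:
            q = []
            for a in actions:
                # Monte Carlo estimate of E[V(s')] using k sampled successors
                v_next = np.mean([phi(simulate(s, a)) @ theta for _ in range(k)])
                q.append(reward(s) + gamma * v_next)
            Phi.append(phi(s))
            y.append(max(q))          # y ~ R(s) + gamma * max_a E[V(s')]
        # linear regression: theta := argmin sum_i (theta^T phi(s_i) - y_i)^2
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    return theta
```

Any regression model could replace the least-squares fit; linear regression just keeps the sketch short.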
Mealy machine MDP: reward depends on both state and action, R(s, a)
Bellman equation: V*(s) = max_a [ R(s, a) + γ Σ_s' P_sa(s') V*(s') ]
finite horizon MDP
finite tuple (S, A, {P_sa}, T, R)
time horizon T
maximize E[ R(s_0, a_0) + R(s_1, a_1) + … + R(s_T, a_T) ] (no discounting)
action depends on time: non-stationary policy π_t(s)
time-dependent dynamics P_sa^(t) and reward R^(t)
- solution by dynamic programming: compute the value at the horizon first, then work back from t = T to t = 0 (sketch below)
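A minimal backward-induction sketch for the finite-horizon case, assuming a state-action reward matrix R(s, a), a zero terminal value, and rewards collected for t = 0 … T-1 (the exact indexing convention is an assumption of this sketch):

```python
import numpy as np

def finite_horizon_dp(R, P, T):
    """R: rewards, shape (nS, nA); P: transitions, shape (nA, nS, nS); T: horizon."""
    nA, nS, _ = P.shape
    V = np.zeros((T + 1, nS))                 # V[T] is the (zero) terminal value
    pi = np.zeros((T, nS), dtype=int)
    for t in range(T - 1, -1, -1):            # work backwards from the horizon
        Q = R.T + P @ V[t + 1]                # Q[a, s] = R(s, a) + E[V_{t+1}(s')]
        V[t] = Q.max(axis=0)
        pi[t] = Q.argmax(axis=0)              # time-dependent (non-stationary) policy
    return V, pi
```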
linear quadratic regulation
linear transitions with noise: s_{t+1} = A_t s_t + B_t a_t + w_t
negative quadratic reward R(s_t, a_t) = -(s_tᵀ U_t s_t + a_tᵀ W_t a_t) to push the system back toward the origin
policy searching method
stochastic policy π_θ(s, a): probability of taking action a in state s
direct policy search: find a reasonable parameter θ directly
- fixed initial state s_0
- greedy stochastic gradient ascent
- learning rate α
repeat:
sample a trajectory s_0, a_0, s_1, a_1, …, s_T, a_T by executing π_θ
update θ := θ + α [ Σ_t ∇_θ log π_θ(s_t, a_t) ] · (R(s_0) + … + R(s_T))
this is the REINFORCE algorithm (sketch below)
reason it converges: by the product rule, the expected update equals ∇_θ E[payoff], so on average this is gradient ascent on the expected payoff
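A minimal REINFORCE sketch with a softmax policy over linear scores, assuming feature-vector states and an `env_step(s, a) -> (s_next, r, done)` interface (the interface and all names are assumptions of this sketch). The whole-episode return multiplies every step's log-gradient, matching the update above:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_policy(theta, s):
    """pi_theta(s, a): softmax over per-action scores theta @ s (theta: (nA, d), s: (d,))."""
    z = theta @ s
    p = np.exp(z - z.max())
    return p / p.sum()

def reinforce_episode(theta, env_step, s0, alpha=0.01, gamma=0.99, T=100):
    """Run one episode with pi_theta, then take one stochastic gradient ascent step."""
    nA = theta.shape[0]
    states, actions, rewards, s = [], [], [], s0
    for _ in range(T):
        p = softmax_policy(theta, s)
        a = int(rng.choice(nA, p=p))
        s_next, r, done = env_step(s, a)
        states.append(s); actions.append(a); rewards.append(r)
        s = s_next
        if done:
            break
    # discounted return of the whole trajectory (fixed initial state s0)
    G = sum(gamma ** t * r for t, r in enumerate(rewards))
    # theta := theta + alpha * G * sum_t grad log pi_theta(s_t, a_t)
    for s_t, a_t in zip(states, actions):
        p = softmax_policy(theta, s_t)
        grad_log = -np.outer(p, s_t)        # gradient of log softmax, all action rows
        grad_log[a_t] += s_t                # plus s_t on the chosen action's row
        theta = theta + alpha * G * grad_log
    return theta
```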
partially observable MDP (POMDP)
REINFORCE with baseline
- subtract an arbitrary baseline b from the payoff to reduce variance
- b must be independent of the action (it may depend on the state), so the gradient estimate stays unbiased
Monte Carlo method
- trial: run a whole episode s_0, a_0, r_1, …, s_T
- wait until the end of the episode to compute the return, then update
- drawback: slow if episodes are long
temporal-difference learning (TD learning)
at time t+1, update V(s_t) := V(s_t) + α [ r_{t+1} + γ V(s_{t+1}) - V(s_t) ]
TD error δ_t = r_{t+1} + γ V(s_{t+1}) - V(s_t) (sketch below)
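A one-step TD(0) update sketch, assuming `V` is indexable by state (dict or array):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """Move V(s) toward the bootstrapped target r + gamma * V(s_next)."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error
```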
on-policy TD (SARSA)
- ε-greedy
- initialize Q(s, a) arbitrarily
- repeat for each episode
initialize s
behavior policy: select a from s based on Q (ε-greedy)
repeat for each step
take action a, observe r and s'; select the next action a' for s' based on Q (ε-greedy)
update Q(s, a) := Q(s, a) + α [ r + γ Q(s', a') - Q(s, a) ]; s := s', a := a'
until s is terminal (sketch below)
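A minimal tabular SARSA sketch; `env_reset()` and `env_step(s, a) -> (s_next, r, done)` are assumed interfaces, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_greedy(Q, s, epsilon):
    """Behavior policy: explore with probability epsilon, else act greedily on Q."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def sarsa(env_reset, env_step, nS, nA, episodes=500,
          alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = np.zeros((nS, nA))                       # arbitrary initialization
    for _ in range(episodes):
        s = env_reset()
        a = eps_greedy(Q, s, epsilon)
        done = False
        while not done:
            s_next, r, done = env_step(s, a)
            a_next = eps_greedy(Q, s_next, epsilon)
            # on-policy target: uses the action a' that will actually be taken next
            Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] * (not done) - Q[s, a])
            s, a = s_next, a_next
    return Q
```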
off-policy TD: Q-learning
same as SARSA except the update target uses the greedy action
- update Q(s, a) := Q(s, a) + α [ r + γ max_{a'} Q(s', a') - Q(s, a) ]; the target policy is greedy while the behavior policy stays ε-greedy (sketch below)
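A minimal tabular Q-learning sketch using the same assumed environment interface as the SARSA sketch above; the only substantive change is the max over next actions in the update target:

```python
import numpy as np

rng = np.random.default_rng(0)

def q_learning(env_reset, env_step, nS, nA, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Off-policy TD control: behave epsilon-greedily, bootstrap from the greedy action."""
    Q = np.zeros((nS, nA))
    for _ in range(episodes):
        s, done = env_reset(), False
        while not done:
            # behavior policy: epsilon-greedy on Q
            a = int(rng.integers(nA)) if rng.random() < epsilon else int(np.argmax(Q[s]))
            s_next, r, done = env_step(s, a)
            # target policy: greedy -> max over next actions (the difference from SARSA)
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
            s = s_next
    return Q
```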