Cloud Computing
operational expense (OPEX) instead of capital expense (CAPEX) for the customer → centralize servers at the provider
tenant isolation: challenge for cloud provider
cloud provider classification: infrastructure/ platform/ software as a service (IaaS/ PaaS/ SaaS)
data center hierarchy: rack, row & aisle, pod (building unit)
top-of-rack (ToR) switch: 2 per rack, high capacity
- north-south traffic: between outside and inside the data center; needs load balancing
- east-west traffic: between servers inside the data center; needs high bandwidth
network topology
- fat tree. problem: requires extremely high link capacity near the root
- link aggregation: bond multiple links together
- leaf-spine topology (Clos): multiple spine switches, each connected to every rack's leaf (ToR) switch
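To make the leaf-spine idea concrete, here is a minimal sketch, assuming a two-tier fabric in which every leaf connects to every spine. The switch names and counts are made-up illustration values. It shows that any two racks are two hops apart and that each added spine adds one more parallel path, so capacity is spread across many ordinary links instead of one very fat link at the top.

```python
from itertools import product


def leaf_spine_links(num_leaves, num_spines):
    """Full mesh between the leaf tier and the spine tier (assumed layout)."""
    return {
        (f"leaf{l}", f"spine{s}")
        for l, s in product(range(num_leaves), range(num_spines))
    }


def paths_between(links, src_leaf, dst_leaf):
    """All leaf -> spine -> leaf paths between two different racks."""
    spines_of_src = {s for (l, s) in links if l == src_leaf}
    spines_of_dst = {s for (l, s) in links if l == dst_leaf}
    return sorted((src_leaf, s, dst_leaf) for s in spines_of_src & spines_of_dst)


links = leaf_spine_links(num_leaves=4, num_spines=2)
paths = paths_between(links, "leaf0", "leaf3")
for path in paths:
    print(" -> ".join(path))
```

With 4 leaves and 2 spines there are 8 links and 2 equal-length paths between any pair of racks.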
Internet redundancy: multiple ISPs
data storage
- classical approach: server with own HDD, data replication
- modern approach: separate compute/ storage unit, SSD, less data replication
Hadoop
- automatic MapReduce management
- HDFS: Hadoop distributed file system
- assume each server has own storage
- append-only, 128MB block
- NameNode holds the index (metadata), DataNodes store the blocks
- data locality: try to run the computation on the server that already holds the data
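The MapReduce flow that Hadoop manages automatically can be sketched in plain Python. This is a single-process illustration of the map, shuffle and reduce phases, not Hadoop's API. The function names are hypothetical, and each string in `blocks` stands in for one HDFS block.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit (word, 1) for every word in one input block."""
    for word in block.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(blocks):
    # In a real cluster each block would be mapped on the node storing it.
    pairs = [pair for block in blocks for pair in map_phase(block)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


blocks = ["the quick brown fox", "the lazy dog", "The end"]
counts = word_count(blocks)
print(counts)
```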
Spark
- multi-paradigm, high-performance
- building block: resilient distributed dataset (RDD)
- general, lazy, ephemeral, lineage, shareable
- aggressive caching with LRU
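The "lazy" and "lineage" properties of an RDD can be illustrated with a toy class. This is a sketch in plain Python, not the Spark API. `ToyRDD` and its methods are hypothetical names, and caching is left out.

```python
class ToyRDD:
    """Toy stand-in for an RDD: records transformations, computes on demand."""

    def __init__(self, source, lineage):
        self._source = source          # callable that produces the elements
        self.lineage = tuple(lineage)  # names of the steps that built this RDD

    @classmethod
    def from_list(cls, data):
        data = list(data)
        return cls(lambda: iter(data), ("from_list",))

    def map(self, fn):
        # Transformation: nothing runs yet, only the recipe is extended.
        return ToyRDD(lambda: (fn(x) for x in self._source()),
                      self.lineage + ("map",))

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._source() if pred(x)),
                      self.lineage + ("filter",))

    def collect(self):
        # Action: only now is the whole chain evaluated.
        return list(self._source())


calls = []


def square(x):
    calls.append(x)  # record every real invocation
    return x * x


rdd = ToyRDD.from_list(range(5)).map(square).filter(lambda x: x % 2 == 0)
calls_before = len(calls)   # no work has been done yet
result = rdd.collect()      # triggers the computation
print(calls_before, result, rdd.lineage)
```

Because the lineage is just the recipe, a lost result can be recomputed from the source instead of being restored from a replica.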
paravirtualization: modify guest OS kernel so it does not run in privileged mode; privileged operations become calls to the hypervisor (hypercalls)
- high performance
- hard to modify OS
full virtualization
- native performance
- need hardware support (e.g. ring -1 for hypervisor)
container
- isolate: network namespace, file system, process information, user
- Linux containers (LXC): closer to a full system than Docker
automation levels: deployment & configuration → monitoring & measurement → trends & prediction → root cause analysis → troubleshooting
- vendor-independent configuration: zero-touch provisioning & infrastructure as code
orchestration
- building blocks: ephemeral, can be replicated, replaced, composed → update, autoscaling, restore
- load balancing
- early error detection during rollout
Kubernetes
- pod: group of containers, smallest unit to be managed
- init container: runs to completion before the other containers in the pod start
microservice. pros: smaller scope & team, modularity, less complexity per service, language flexibility, test coverage, rapid deployment, fault isolation. cons: cascading errors, functionality & data duplication, larger attack surface
service mesh proxy: load balancing, convert conventional communication protocol
hysteresis: delay in response to change (e.g. an autoscaler reacting late to load)
- causes instability, oscillation
- mitigation: event-based controller
- mitigation: stabilization window
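How a stabilization window damps oscillation can be sketched as follows. This is a simplified model, assuming the controller only scales down to the highest replica count recommended within the last `window` ticks. The load values, the capacity per replica and the function names are made up for illustration.

```python
import math
from collections import deque


def desired_replicas(load, capacity_per_replica):
    """Replica count that would exactly cover the current load."""
    return max(1, math.ceil(load / capacity_per_replica))


def autoscale(loads, capacity_per_replica=100.0, window=3):
    """Return the replica count chosen at every tick.

    The controller follows the maximum recommendation seen in the last
    `window` ticks, so it scales up at once but scales down only after
    the load has stayed low for a full window.
    """
    recent = deque(maxlen=window)
    chosen = []
    for load in loads:
        recent.append(desired_replicas(load, capacity_per_replica))
        chosen.append(max(recent))
    return chosen


loads = [100, 300, 100, 300, 100, 100, 100, 100]
naive = autoscale(loads, window=1)   # no stabilization: follows every swing
stable = autoscale(loads, window=3)  # holds capacity through short dips
print(naive)
print(stable)
```

With `window=1` the replica count flips between 1 and 3 on every swing. With `window=3` it stays at 3 until the load has been low for three consecutive ticks.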
serverless/ FaaS: API redirection & runtime & DB by provider. event-driven, stateless, scales from zero to effectively unlimited
classic software development → agile → DevOps → site reliability engineering (SRE): merge development, quality assurance, operation
- version management
- extensive testing in continuous integration
- sandbox & canary deployment
- SRE: at least half of the time on development, at most half on operations; overflow operations work is redirected to the development team
- on-call shift
- post-mortem report, pager fatigue
- error budget: target < 100% availability, adjust development accordingly
- playbook: automate away human intervention
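The error budget idea comes down to simple arithmetic. The sketch below assumes availability is measured as uptime over a 30-day period. The 99.9% target and the function names are illustration values, not taken from the notes.

```python
def error_budget_minutes(slo, period_days=30):
    """Downtime allowed by an availability target below 100%."""
    return (1.0 - slo) * period_days * 24 * 60


def can_release(slo, downtime_minutes, period_days=30):
    """Risky rollouts are allowed only while budget remains."""
    return downtime_minutes < error_budget_minutes(slo, period_days)


budget = error_budget_minutes(0.999)  # about 43.2 minutes per 30 days
print(round(budget, 1))
print(can_release(0.999, downtime_minutes=10))
print(can_release(0.999, downtime_minutes=50))
```

Once the budget is used up, development shifts from shipping features to improving reliability until the next period.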
networking safety and isolation: harden north-south traffic; on east-west traffic, assume cooperation within a tenant & prevent traffic between tenants
- same tenant: overlay topology (virtual) & underlay (physical)
- different tenant: traffic like north-south
- implementation: VLAN, packets carry a tag (12-bit VLAN ID)
- VXLAN (virtual extensible LAN): encapsulates layer-2 frames in UDP, 24-bit network ID solves the 4096 VLAN limit
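The VLAN limit and the VXLAN fix can be sketched as follows. The 8-byte header follows the VXLAN layout (flags byte with the "VNI valid" bit, 24-bit network identifier, reserved bits zero). The outer Ethernet/IP/UDP headers are omitted for brevity, and the function names are hypothetical.

```python
import struct

VLAN_ID_BITS = 12    # 2**12 = 4096 possible VLAN IDs
VXLAN_VNI_BITS = 24  # 2**24 = 16777216 possible network IDs


def vxlan_header(vni):
    """Build the 8-byte VXLAN header for a given network identifier."""
    if not 0 <= vni < 2 ** VXLAN_VNI_BITS:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # "I" flag: the VNI field is valid
    # Word 0: flags (8 bits) + reserved (24 bits)
    # Word 1: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", flags << 24, vni << 8)


def encapsulate(inner_frame, vni):
    """Prefix the tenant's layer-2 frame with the VXLAN header."""
    return vxlan_header(vni) + inner_frame


def decapsulate(packet):
    """Return (vni, inner_frame) from a VXLAN payload."""
    word0, word1 = struct.unpack("!II", packet[:8])
    if not (word0 >> 24) & 0x08:
        raise ValueError("VNI flag not set")
    return word1 >> 8, packet[8:]


frame = b"tenant layer-2 frame"
packet = encapsulate(frame, vni=5000)  # 5000 would not fit in a VLAN ID
vni, inner = decapsulate(packet)
print(2 ** VLAN_ID_BITS, 2 ** VXLAN_VNI_BITS, vni, inner == frame)
```

The underlay only sees the outer packet. The tenant's overlay frame travels unchanged inside it.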
remote storage interface: file/ block interface (NFS/ virtual disk)
object storage: key-bytes pair
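The key-bytes model of object storage can be illustrated with a toy in-memory store. This is a sketch only. `ToyObjectStore` and its methods are hypothetical and do not correspond to any real service's API.

```python
class ToyObjectStore:
    """Flat namespace mapping a key to an opaque byte string."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # Objects are written whole; there is no partial in-place update.
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        self._objects.pop(key, None)

    def list_keys(self, prefix=""):
        # "Directories" are only a naming convention on the key.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("logs/2024/app.log", b"started\n")
store.put("logs/2024/db.log", b"ready\n")
store.put("images/cat.png", b"\x89PNG")
keys = store.list_keys("logs/")
print(keys, store.get("images/cat.png"))
```

In contrast to a file or block interface, the unit of access here is the whole object, addressed by its key.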