Cloud Computing

operational expense (OPEX) & capital expense (CAPEX) → motivation to centralize servers

tenant isolation: challenge for cloud provider

cloud provider classification: infrastructure/ platform/ software as a service (IaaS/ PaaS/ SaaS)

data center hierarchy: rack, row & aisle, pod (building unit)

top-of-rack (ToR) switch: 2 per rack, high capacity

  • north-south traffic: between outside and inside; needs load balancing
  • east-west traffic: between servers inside the data center; needs high bandwidth

network topology

  • fat tree. problem: requires very high link capacity near the root
    • link aggregation: bond multiple links together
  • leaf-spine topology (Clos): two layers, every leaf (rack) switch connected to every spine switch
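
The leaf-spine idea can be made concrete with a small model. This is a minimal sketch, assuming a two-tier fabric in which every leaf connects to every spine; the names `build_leaf_spine` and `paths_between` and the switch labels are made up for illustration.

```python
def build_leaf_spine(num_leaves: int, num_spines: int) -> set[tuple[str, str]]:
    """Return the links of a two-tier fabric: each leaf connects to each spine."""
    return {
        (f"leaf{leaf}", f"spine{spine}")
        for leaf in range(num_leaves)
        for spine in range(num_spines)
    }


def paths_between(links, leaf_a: str, leaf_b: str):
    """List the leaf -> spine -> leaf paths between two different leaves."""
    spines_a = {spine for (leaf, spine) in links if leaf == leaf_a}
    spines_b = {spine for (leaf, spine) in links if leaf == leaf_b}
    return sorted((leaf_a, spine, leaf_b) for spine in spines_a & spines_b)


links = build_leaf_spine(num_leaves=4, num_spines=2)
print(len(links))                              # 4 leaves x 2 spines = 8 links
print(paths_between(links, "leaf0", "leaf3"))  # one path per spine
```

Any two racks are exactly two hops apart, and the number of equal-cost paths equals the number of spines, so east-west capacity grows by adding spines rather than by making one top-level link faster.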

Internet redundancy: multiple ISPs

data storage

  • classical approach: server with own HDD, data replication
  • modern approach: separate compute/ storage unit, SSD, less data replication

Hadoop

  • automatic MapReduce management
  • HDFS: Hadoop distributed file system
    • assumes each server has its own storage
    • append-only, 128 MB blocks (default)
    • NameNode keeps the index (metadata), DataNodes store the blocks
    • data locality: try to run the computation on the node that holds the data
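
The MapReduce model that Hadoop manages can be sketched in plain Python. This is only an illustration of the three phases (map, shuffle, reduce) on a word count, not Hadoop's API; the two strings stand in for HDFS blocks, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(block: str):
    """Map: emit a (word, 1) pair for every word in one input block."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


blocks = ["the cat sat", "the dog sat"]  # stand-ins for two HDFS blocks
pairs = [pair for block in blocks for pair in map_phase(block)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

In a real cluster each `map_phase` call would ideally run on the node that already stores the block, which is the data-locality point in the list above.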

Spark

  • multi-paradigm, high-performance
  • building block: resilient distributed dataset (RDD)
    • general, lazy, ephemeral, lineage, shareable
    • aggressive caching with LRU
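
The "lazy" and "lineage" properties can be illustrated with a toy class. This is a sketch under assumptions, not Spark's API: `LazyDataset` is a hypothetical stand-in for an RDD, and it caches automatically on the first action, whereas the eviction policy (LRU) is left out.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records lineage, computes only on an action."""

    def __init__(self, source, lineage=()):
        self._source = source    # original input data
        self._lineage = lineage  # tuple of (kind, function) transformations
        self._cache = None       # filled in by the first action

    def map(self, fn):
        """Transformation: only extends the lineage, computes nothing."""
        return LazyDataset(self._source, self._lineage + (("map", fn),))

    def filter(self, fn):
        """Transformation: only extends the lineage, computes nothing."""
        return LazyDataset(self._source, self._lineage + (("filter", fn),))

    def collect(self):
        """Action: replay the lineage over the source, then cache the result."""
        if self._cache is None:
            data = list(self._source)
            for kind, fn in self._lineage:
                if kind == "map":
                    data = [fn(x) for x in data]
                else:
                    data = [x for x in data if fn(x)]
            self._cache = data
        return self._cache

    def evict(self):
        """Drop the cached result; the lineage is enough to rebuild it."""
        self._cache = None


calls = []  # records every invocation of square() to make laziness visible


def square(x):
    calls.append(x)
    return x * x


squares = LazyDataset(range(5)).map(square).filter(lambda x: x % 2 == 0)
print(calls)              # [] because nothing has been computed yet
print(squares.collect())  # [0, 4, 16]
squares.evict()
print(squares.collect())  # recomputed from lineage: [0, 4, 16]
```

Because the result can always be rebuilt from the lineage, cached data is ephemeral and can be evicted or lost without losing information.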

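paravirtualization: modify guest OS to call the hypervisor (hypercalls) instead of executing privileged instructions itself

  • high performance
  • guest OS must be modified, which is hard (or impossible for closed-source OS)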

full virtualization

  • unmodified guest OS, near-native performance
  • need hardware support (e.g. ring -1 for hypervisor)

container

  • isolate: network namespace, file system, process information, user
  • Linux containers (LXC): closer to a full system than Docker, which targets single applications

automation levels: deployment & configuration → monitoring & measurement → trends & prediction → root cause analysis → troubleshooting

  • vendor-independent configuration: zero-touch provisioning & infrastructure as code

orchestration

  • building blocks are ephemeral: can be replicated, replaced, composed → enables update, autoscaling, restore
  • load balancing
  • early error detection during rollout

Kubernetes

  • pod: group of containers, smallest unit to be managed
    • init container: runs to completion before the other containers start

microservice

  • advantages: smaller scope & team, modularity, less complexity, language flexibility, test coverage, rapid deployment, fault isolation
  • drawbacks: cascading errors, functionality & data duplication, larger attack surface

service mesh proxy: load balancing, conversion between conventional communication protocols

hysteresis: delay in response to change

  • causes instability, oscillation
  • mitigation: event-based controller
  • mitigation: stabilization window
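
The effect of a stabilization window can be shown with a small simulation. This is a minimal sketch loosely modelled on how the Kubernetes Horizontal Pod Autoscaler is documented to behave (proportional recommendation, scale-down limited to the highest recent recommendation); the load values, `target_per_replica`, and the window measured in samples rather than seconds are all assumptions.

```python
import math
from collections import deque


def recommend(total_load: float, target_per_replica: float) -> int:
    """Replicas needed so that each one carries at most the target load."""
    return max(1, math.ceil(total_load / target_per_replica))


def autoscale(loads, target_per_replica: float = 50.0, window: int = 1):
    """Return the replica count chosen after each load sample.

    The controller keeps the last `window` recommendations and applies the
    highest one, so scale-up is immediate but scale-down is delayed.
    """
    recent = deque(maxlen=window)
    history = []
    for load in loads:
        recent.append(recommend(load, target_per_replica))
        history.append(max(recent))
    return history


loads = [200, 90, 210, 80, 200]    # oscillating total load (hypothetical units)
print(autoscale(loads, window=1))  # [4, 2, 5, 2, 4]: replicas flap with the load
print(autoscale(loads, window=3))  # [4, 4, 5, 5, 5]: scale-down is smoothed out
```

With `window=1` the controller follows every dip and spike; with a window of three samples it keeps capacity through short dips, trading some idle replicas for stability.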

serverless/ FaaS: provider supplies API redirection, runtime & DB. event-driven, stateless, scales from zero to (in principle) unbounded

classic software development → agile → DevOps → site reliability engineering (SRE): merge development, quality assurance, operations

  • version management
  • extensive testing in continuous integration
  • sandbox & canary deployment
  • SRE: at least half of the time on development, at most half on operations; overflow is redirected to the development team
    • on-call shift
      • post-mortem report, pager fatigue
    • error budget: target < 100% availability, adjust development accordingly
    • playbook: automate away human intervention
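
The error budget mentioned above is simple arithmetic: whatever the availability target leaves below 100% is the allowed downtime. A minimal sketch, assuming a 30-day window and a hypothetical 99.9% target; the function names are illustrative only.

```python
def error_budget_minutes(slo: float, period_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability target such as 0.999."""
    return (1.0 - slo) * period_days * 24 * 60


def budget_remaining(slo: float, downtime_minutes: float, period_days: int = 30) -> float:
    """Minutes of budget left; a negative value means the budget is overspent."""
    return error_budget_minutes(slo, period_days) - downtime_minutes


print(round(error_budget_minutes(0.999), 1))    # 43.2 minutes per 30 days
print(round(budget_remaining(0.999, 30.0), 1))  # 13.2 minutes left after a 30-minute outage
```

A team with budget left can keep shipping features; once the budget is spent, development shifts toward reliability work until the window resets.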

networking safety and isolation: harden north-south traffic; on east-west traffic, assume cooperation within a tenant & prevent traffic between tenants

  • same tenant: virtual overlay topology on top of the physical underlay
  • different tenants: treat traffic like north-south
  • implementation: VLAN, packets carry a tag
    • VXLAN (Virtual Extensible LAN): encapsulates layer-2 frames in UDP, 24-bit ID lifts the 4096 VLAN limit
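
The encapsulation step can be sketched by building the 8-byte VXLAN header by hand. This follows the header layout in RFC 7348 (flags byte with the VNI-valid bit, 24-bit VNI, reserved fields zero); the outer Ethernet/IP/UDP headers are omitted, and the function names and the sample frame are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit network identifier."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: VNI in the top 24 bits, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prefix an inner layer-2 frame with the VXLAN header (UDP payload only)."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload: bytes) -> tuple[int, bytes]:
    """Split a VXLAN UDP payload into (VNI, inner frame)."""
    _flags_word, vni_word = struct.unpack("!II", payload[:8])
    return vni_word >> 8, payload[8:]


frame = b"\xaa" * 14  # placeholder for an inner Ethernet frame
packet = encapsulate(5000, frame)
print(vxlan_header(5000).hex())  # 0800000000138800
print(decapsulate(packet)[0])    # 5000
print(2**24)                     # 16777216 identifiers vs. 4096 VLAN IDs
```

The 12-bit VLAN tag allows 4096 IDs, while the 24-bit VNI allows about 16.7 million, which is what makes per-tenant overlays practical at data center scale.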

remote storage interface: file/ block interface (NFS/ virtual disk)

object storage: key-bytes pair
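
The key-bytes model can be sketched with an in-memory store. This is a toy illustration, not any provider's API: `ObjectStore` and its methods are hypothetical, and real services add metadata, versioning, and access control on top.

```python
class ObjectStore:
    """Toy object store: a flat namespace mapping string keys to byte blobs."""

    def __init__(self):
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        """Store or replace the whole object; there are no partial updates."""
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        """Return the whole object; raises KeyError if the key is unknown."""
        return self._objects[key]

    def delete(self, key: str) -> None:
        """Remove the object if it exists."""
        self._objects.pop(key, None)


store = ObjectStore()
store.put("logs/2024/app.log", b"hello")  # slashes are just part of the key
print(store.get("logs/2024/app.log"))     # b'hello'
```

Unlike the file and block interfaces, there are no directories or in-place writes: an object is read and written as a whole under its key.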