Steven Hé (Sīchàng)
Hello there, Steven here:
- Systems programmer, Rustacean, Pythonista, Elixirist, Vimmer.
- 1st-year Computer Science Ph.D. student in the Networked Systems Lab (NSL), University of Southern California (USC).
- Advisor: Harsha Madhyastha.
- Interests: the Web, Networking, SE, PL, HPC.
- For experiences, publications, etc., please see the CV linked below.
Link farm
2025 Spring News
- (2025-01-09) 🌐🤖 I sketched the DeGenTWeb project to detect and study LLM-generated text content on the Web.
2024 Fall News
(2024-12-13) 🤖💀 I posted a survey of generative AI problems and countermeasures, which showed that many business, cryptography, and computer vision problems remain in this field.
(2024-11-27) ⭕💀 We paused the JSphere project that classifies the use of JavaScript on the Web by browser API calls. Preliminary results are in a report for a course and a colorful poster.
Feel free to take over this project.
(2024-11-05) 💬🌐 I recorded a retroactive presentation for my RPSLyzer talk at IMC ’24.
(2024-11-05) 💬🌐 I presented the RPSLyzer paper at IMC ’24 in Madrid and met nice people at the conference. The conference has no recordings, unfortunately.
(2024-09-15) 🐍🥲 My Battlesnake “TinyViTMCTS” will go down on the leaderboard because Duke is shutting down the VM it runs on.
(2024-09-12) 📖🌐 I posted the camera-ready PDF version of the RPSLyzer paper. Download it from the Internet Route Verification repository’s Releases.
(2024-08-27) 📖💵 The IMC Student Travel Grant committee generously awarded me funds, so that I can go present my paper at the conference.
(2024-08-23) 🏠🔬 I moved to Los Angeles and joined the USC Networked Systems Lab as a Ph.D. student, working with Professor Harsha V. Madhyastha.
(2024-07-31) 📖🎉 ACM Internet Measurement Conference (IMC) 2024 accepted our paper, RPSLyzer: Characterization and Verification of Policies in Internet Routing Registries, by Sichang He, Italo Cunha, and Ethan Katz-Bassett. We will post drafts in the Internet Route Verification repository.
(2024-07-24) 🌐⭕ I sketched the JSphere project to study the overuse of JavaScript in websites.
(2024-07-03) 🐙🇬🇧 I started developing the Natural Syntax Language Server to highlight part-of-speech for ease of reading.
To comment on the news, please go to the news page.
About
Contact
- Email: See the address in my CV. I get notifications and reply.
- I block whoever sends spam immediately.
- Ping me if I do not reply in three days.
- GitHub: Ping me for code-related discussions. I get emails.
Name
The legal name and name used in publications is “Sichang He” (何思畅, or 何思暢). Steven is the preferred English name.
Name pronunciation
Google Translate pronounces it correctly (Click Listen). Serializing it into English is impossible; the best you can get is H-err? Sii↑ chung↓.
2024 Spring News
- (2024-03-15) 💬🌻 I gave 1/3 of a talk and presented a poster at Flower AI Summit 2024. The talk is FedCampus: A Privacy-preserving Smart Campus by Bing Luo and Sichang He.
- (2024-02-13) 🎬📱 IEEE INFOCOM 2024 accepted our demo paper, FedKit: Enabling Cross-Platform Federated Learning for Android and iOS, by Sichang He, Beilong Tang, Boyan Zhang, Jiaqi Shao, Xiaomin Ouyang, Daniel Nata Nugraha, and Bing Luo. The demo video is on YouTube.
Steven Hé (Sīchàng)’s Notes
Android
run android studio emulator without android studio:
~/Library/Android/sdk/emulator/emulator -avd <DEVICE_NAME>
Automation Software
Chatbot Prompt
general requirements:
I am a computer scientist working on systems. Follow Clear Writing Principles, avoid any repetition, be complete and succinct.
summarize paper w/ focus on FOCUS
What does this paper talk about in terms of FOCUS? Try your best to summarize every single aspect mentioned in the paper. For every single point you elaborate, always quote the exact original related phrase, sentence, or paragraph.
general requirements (old):
Requirements R: be professional and maximally succinct. Prefer verb over noun, adjective over adverb, active tense over passive tense. Avoid fluff or self-judgements. Do not lose any information. Do not praise or repeat any given content. Keep the original language, style, and meaning as much as possible. Make minimal changes for improvement.
To remind the bot the requirements:
Repeat Requirements R verbatim.
point out problem but not revise:
Criticize my draft and list all the problems. For each problem separately, quote my original words, reason about the problem, provide suggestions, and provide suggested change.
compress text:
Revise this part to make it more succinct, without losing any information. Stay close to the original language and make minimal changes.
refine prompt based on result I do not like:
I do not like how you PROBLEM.
What could I have said instead so that you would never ever say that?
Command-Line Utilities
wrap lines to a maximum width of 80 in FILE
fold -w 80 -s FILE
extract all images from PDF
pdfimages -all PDF_FILE "$(pwd)/"
convert PDF to HTML and preserve layout
docker run -t --rm -v $(pwd):/pdf -w /pdf pdf2htmlex/pdf2htmlex:0.18.8.rc2-master-20200820-alpine-3.12.0-x86_64 PDF_FILE
Hammerspoon
add stuff: open config, type in LaTeX, save, reload config
bind function to hotkey
hs.hotkey.bind({"functionKey", …}, "key",…, function()
…
end)
functionKey: cmd, alt, ctrl
key: X, \\, Left
the {} must stay even if it's empty
show alert message
hs.alert.show("message")
generate system notification with title and content
hs.notify.new({title="title", informativeText="content"}):send()
define the current window as win
local win = hs.window.focusedWindow()
define win’s window frame as f
local f = win:frame()
position of f is (f.x, f.y), its width and height are f.w, f.h
define the screen win is at as screen
local screen = win:screen()
define screen's screen frame as max
local max = screen:frame()
update the frame of win as f
win:setFrame(f)
hs.pathwatcher.new(.....):start()
reload config
hs.reload()
automatically reload config
function reloadConfig(files)
    local doReload = false
    for _, file in pairs(files) do
        if file:sub(-4) == ".lua" then
            doReload = true
        end
    end
    if doReload then
        hs.reload()
    end
end
myWatcher = hs.pathwatcher.new(os.getenv("HOME") .. "/.hammerspoon/", reloadConfig):start()
hs.alert.show("Config loaded")
get the home directory of the system
os.getenv("HOME")
concatenate strings to build a path
..
call function whenever files in directory change
hs.pathwatcher.new("directory", function):start()
Browsers
clear browser cache
browser DNS cache
Chrome:
chrome://net-internals/#dns
Firefox:
about:networking#dns
Computer Science
Discrete Mathematics for Computer Science
proof by induction
steps with example
- variable n
- property P(n)
- base case P(0)
- induction hypothesis
- weak induction: assume P(n−1) is true
- strong induction: assume P(k) is true ∀ k<n
- induction step: induction hypothesis ⇒ P(n) is true
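The steps above can be walked through on a small claim. The claim below is a standard textbook one chosen only for illustration (it is an assumption, not taken from these notes), proved with weak induction:

```latex
\text{Property } P(n):\quad \sum_{i=0}^{n} i = \frac{n(n+1)}{2}

\text{Base case } P(0):\quad \sum_{i=0}^{0} i = 0 = \frac{0 \cdot 1}{2}

\text{Induction hypothesis } P(n-1):\quad \sum_{i=0}^{n-1} i = \frac{(n-1)n}{2}

\text{Induction step:}\quad
\sum_{i=0}^{n} i = \frac{(n-1)n}{2} + n = \frac{n^2 - n + 2n}{2} = \frac{n(n+1)}{2}
\;\Rightarrow\; P(n)
```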
- Machine Learning
- polynomial data fitting
- k-nearest neighbors predictor
- gradient descent
- logistic-regression classifier
- support vector machine (SVM)
Machine Learning
general problem format
given input A, want output y∈Y
$$f: A \to Y, \quad y = f(a) \in Y \text{ for } a \in A$$
- traditional approach: hand-craft the function f
- machine learning: build another function λ and use it to generate an approximation of f
machine learning is about building a function
- training data L={(a1,y1),⋯,(an,yn)}
- class of allowed function F
use a predefined algorithm to compute f∈F with the goal:
y≈f(x)
simplification using feature vector
convert input into a 1d vector, making machine learning generally applicable
- feature vector x∈X⊆Rd
- ϕ:A→X
- training set Ta={(x1,y1),⋯,(xn,yn)}⊂Rd×Re
- hypothesis space H
- Y⊆Re
predefined algorithm produce h∈H so that
f(a)=h(ϕ(a))
loss
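estimation of the error of h
$$\ell: Y \times Y \to \mathbb{R}_+, \quad \ell(y_n, h(x_n))$$
zero-one loss
$$\ell(y, \hat{y}) := \begin{cases} 0 & y = \hat{y} \\ 1 & \text{otherwise} \end{cases}$$
quadratic loss
$$\ell(y, \hat{y}) := (y - \hat{y})^2$$
empirical risk
average loss based on the training set
$$L_T(h) := \frac{1}{N} \sum_{n=1}^{N} \ell(y_n, h(x_n))$$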
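A minimal sketch of the two losses and the empirical risk defined above. The toy training set and the `threshold` predictor are hypothetical, invented here only to exercise the formulas.

```python
def zero_one_loss(y, y_hat):
    """Return 0 if the prediction matches the label, else 1."""
    return 0 if y == y_hat else 1


def quadratic_loss(y, y_hat):
    """Return the squared difference between label and prediction."""
    return (y - y_hat) ** 2


def empirical_risk(h, training_set, loss):
    """Average the loss of predictor h over the (x, y) pairs of the training set."""
    return sum(loss(y, h(x)) for x, y in training_set) / len(training_set)


# Hypothetical toy data: the true label is 1 when x is positive.
toy_set = [(-2.0, 0), (-1.0, 0), (0.5, 1), (3.0, 1)]


def threshold(x):
    """Hypothetical predictor that misclassifies x = 0.5."""
    return 1 if x > 1.0 else 0


print(empirical_risk(threshold, toy_set, zero_one_loss))  # 0.25
```

One of four samples is misclassified, so the zero-one empirical risk is 1/4.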
three types of machine learning problems
classification problem
label given data
- classifier (predictor) h: the produced function
all possible training set
$$\mathcal{T} := 2^{X \times Y}$$
signature of machine learning function λ
$$\lambda: \mathcal{T} \to H \text{ such that } \lambda(T) = h$$
- λ learn (or train) classifier h from training set T
- inference: apply classifier h on any data
- testing: apply classifier h on unseen data
classifier h define a partition of X
regression problem
given data, return a vector
clustering problem
group given data
supervised/ unsupervised machine learning
supervised learning: classification, regression
unsupervised learning: clustering
polynomial data fitting
not machine learning but similar
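$$Ac = b, \quad A = \begin{bmatrix} 1 & x_1 & \cdots & x_1^k \\ \vdots & \vdots & & \vdots \\ 1 & x_N & \cdots & x_N^k \end{bmatrix}, \quad b = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}, \quad \hat{c} \in \arg\min_c \|Ac - b\|^2$$
- number of monomials $m(d, k) = \binom{d + k}{k}$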
interpolation in polynomial data fitting
achieve 0 loss
- k=N−1⇒ always interpolate
- overfitting
k-nearest neighbors predictor
remember the whole training set
T={(xi,yi)∣i=1,⋯,N}
return the average of the y_n corresponding to the k closest x_n to x
- useful for both classification and regression
- smaller k results in worse overfitting
- good interpolation and poor extrapolation
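The predictor described above can be sketched in a few lines. This is a brute-force version that assumes Euclidean distance and numeric labels; `toy_set` is hypothetical data.

```python
import math


def knn_predict(training_set, x, k):
    """Average the labels of the k training points closest to x."""
    by_distance = sorted(training_set, key=lambda pair: math.dist(pair[0], x))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical training set of (feature vector, label) pairs.
toy_set = [
    ((0.0, 0.0), 0.0),
    ((1.0, 0.0), 1.0),
    ((0.0, 1.0), 1.0),
    ((5.0, 5.0), 10.0),
]

print(knn_predict(toy_set, (0.1, 0.1), k=1))  # label of the single nearest point
print(knn_predict(toy_set, (0.1, 0.1), k=3))  # average over the three nearest points
```

With k=1 the predictor reproduces the nearest label exactly, which is the overfitting behavior noted in the list above.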
Voronoi diagram
gradient descent
stochastic gradient descent (SGD)
group the training set randomly into mini-batches, use the gradient from each mini-batch to descend
- mini-steps are in the right direction on average
- epoch: using all data once
step size
fixed
decreasing
momentum
$$v_{k+1} = \mu_k v_k - \alpha_k \nabla f(z_k)$$
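A minimal sketch of the momentum update, assuming a constant step size and momentum coefficient (the notes allow both to vary with k) and assuming the position update z ← z + v, which the notes do not spell out. The quadratic objective is a made-up test function.

```python
def momentum_descent(grad, z0, step=0.1, mu=0.9, iterations=200):
    """Minimize a 1-D function using v <- mu * v - step * grad(z), z <- z + v."""
    z, v = z0, 0.0
    for _ in range(iterations):
        v = mu * v - step * grad(z)
        z = z + v
    return z


# f(z) = (z - 3)^2 has gradient 2 * (z - 3) and its minimum at z = 3.
z_min = momentum_descent(lambda z: 2.0 * (z - 3.0), z0=0.0)
print(z_min)
```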
line search
subgradient
logistic-regression classifier
- score-based
score function for linear boundary
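$$s(x) = \sigma(a(x)) = \sigma(c + u^T x)$$
signed distance to hyperplane
hyperplane $\chi \in \mathbb{R}^d$
$$b + w^T x = 0, \quad w \neq 0$$
w is perpendicular to χ
distance of χ from origin
$$\beta := \frac{|b|}{\|w\|}$$
signed distance of x from χ
$$\Delta(x) := \frac{b + w^T x}{\|w\|}$$
logistic function
$$f(\Delta) := \frac{1}{1 + e^{-\Delta}}$$
then the score function is
$$s(x) := f(a(x)) = f(b + w^T x) = \frac{1}{1 + e^{-b - w^T x}}$$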
- activation a(x) is signed distance scaled
softmax function
softmax function in activation a
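$$s(x) = \begin{bmatrix} s_1(a) \\ \vdots \\ s_K(a) \end{bmatrix}, \quad a(x) = \begin{bmatrix} a_1(x) \\ \vdots \\ a_K(x) \end{bmatrix}, \quad s_k(x) = f(a_k(x)) = \frac{e^{a_k(x)}}{\sum_{j=1}^{K} e^{a_j(x)}}$$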
cross entropy loss for binary classification
cross entropy loss of assigning score p to point whose true label is y
$$\ell(y, p) := \begin{cases} -\log p & y = 1 \\ -\log(1 - p) & y = 0 \end{cases} = -y \log p - (1 - y) \log(1 - p)$$
- differentiable so we can do gradient descent
- base of log does not matter
- resulting risk function is weakly convex
cross entropy loss for K-class case
$$\ell(y, p) = -\log p_y = -\sum_{k=1}^{K} q_k \log p_k$$
- true label y
- prediction p
- one hot encoding q of y
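The softmax scores and the K-class cross entropy loss above can be sketched as follows. Labels are assumed to be integers 0..K-1, and the max-subtraction is a common numerical-stability trick that is not part of the notes.

```python
import math


def softmax(activations):
    """Map a list of K activations to K positive scores that sum to 1."""
    shift = max(activations)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(true_label, scores):
    """Return -log of the score assigned to the true class (labels are 0..K-1)."""
    return -math.log(scores[true_label])


scores = softmax([2.0, 1.0, 0.1])
print(scores)
print(cross_entropy(0, scores))  # small loss: class 0 has the highest score
```

Because the one-hot encoding q has a single 1 at the true class, the sum over k collapses to the single term used in `cross_entropy`.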
support vector machine (SVM)
binary support vector machine
separating hyperplane for binary support vector machine
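$$n^T x + c = 0, \quad n \in \mathbb{R}^d, \; \|n\| = 1, \; c \in \mathbb{R}$$
decision rule
$$h(x) = \operatorname{sign}(n^T x + c) \Rightarrow y(n^T x + c) \geq 0$$
margin for (x,y) with parameter v=(n,c)
$$\mu_v(x, y) := y(n^T x + c)$$
margin μv(T) of training set
$$\mu_v(T) := \min_{(x, y) \in T} \mu_v(x, y)$$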
linearly separable if μv(T)>0
hinge loss
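reference margin μ∗>0
$$\ell_v(x, y) := \frac{1}{\mu^*} \max\{0, \mu^* - \mu_v(x, y)\} = \max\{0, 1 - y(w^T x + b)\}$$
where
$$w := \frac{n}{\mu^*}, \quad b = \frac{c}{\mu^*}$$
- separating hyperplane wTx+b=0
empirical risk of binary support vector machine
$$L_T(w, b) := \frac{1}{2} \|w\|^2 + \frac{C_0}{N} \sum_{n=1}^{N} \ell_{(w,b)}(x_n, y_n)$$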
- bigger C0⇒ smaller μ∗
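A small sketch of the hinge loss and the regularized empirical risk of the binary SVM, written directly from the two formulas above. Labels are assumed to be +1 or -1, and the sample points in the usage lines are hypothetical.

```python
def hinge_loss(w, b, x, y):
    """max{0, 1 - y (w^T x + b)} for a label y in {+1, -1}."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * activation)


def svm_risk(w, b, training_set, c0):
    """0.5 * ||w||^2 plus (C0 / N) times the summed hinge loss."""
    n = len(training_set)
    regularizer = 0.5 * sum(wi * wi for wi in w)
    data_term = c0 / n * sum(hinge_loss(w, b, x, y) for x, y in training_set)
    return regularizer + data_term


# Hypothetical samples: the first is outside the margin, the second is inside it.
samples = [((2.0, 0.0), 1), ((0.5, 0.0), 1)]
print(svm_risk([1.0, 0.0], 0.0, samples, c0=2.0))  # 0.5 + 1.0 * 0.5 = 1.0
```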
soft linear support vector machine
(w∗,b∗)=ERMT(w,b)
subgradient of hinge function
$$\rho(z) = \max\{0, z\}, \quad \rho'(z) = \begin{cases} 1 & z > 0 \\ 0 & \text{otherwise} \end{cases}$$
support vector machine with kernel
representer theorem
∀ loss function in the form
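$$L_T(w, b) = R(\|w\|) + S(w^T x_1 + b, \cdots, w^T x_N + b)$$
where $R: \mathbb{R}_+ \to \mathbb{R}$ increasing, $S: \mathbb{R}^N \to \mathbb{R}$,
∃ β1,⋯βN s.t.
$$w^* = \sum_{n=1}^{N} \beta_n x_n$$
proof: by writing
$$w^* = \sum_{n=1}^{N} \beta_n x_n + u \quad \text{for some } u \perp \operatorname{span}\{x_i\}$$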
and proving u=0 by contradiction
support vector
sample that are misclassified or classified correctly with a margin not larger than μ∗
only support vector contribute to w∗,b∗
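$$h(x) = \operatorname{sign}\left(\sum_{n=1}^{N} \beta_n x_n^T x + b^*\right)$$
$$L_T(w, b) = \frac{1}{2} \sum_{m=1}^{N} \sum_{n=1}^{N} \beta_m \beta_n x_m^T x_n + \frac{C_0}{N} \sum_{n=1}^{N} \rho\left(1 - y_n \left(\sum_{m=1}^{N} \beta_m x_m^T x_n + b\right)\right)$$
kernel for support vector machine
$$h(x) = \operatorname{sign}\left(\sum_{n=1}^{N} \beta_n K(x_n, x) + b^*\right)$$
$$L_T(w, b) = \frac{1}{2} \sum_{m=1}^{N} \sum_{n=1}^{N} \beta_m \beta_n K(x_m, x_n) + \frac{C_0}{N} \sum_{n=1}^{N} \rho\left(1 - y_n \left(\sum_{m=1}^{N} \beta_m K(x_m, x_n) + b\right)\right)$$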
where kernel K
K(x,ξ):=φ(x)φ(ξ)
for some φ
- K(x,ξ)=K(ξ,x)
- $K^2(x, \xi) \leq K(x, x) K(\xi, \xi)$
K is a kernel of {xi}
⇔C≥0 where Cij=K(xi,xj)
⇔ Mercer’s condition: ∀ f:Rd→R s.t.
∫Rdf(x)dx
is finite,
∫Rd×RdK(x,ξ)f(x)f(ξ)dxdξ≥0
Gaussian kernel
$$K(x, \xi) = e^{-\frac{\|x - \xi\|^2}{\sigma^2}}$$
- radial basis function (RBF) SVM
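As a quick check of the kernel properties listed above, the sketch below evaluates the Gaussian kernel on two hypothetical points and verifies symmetry and the Cauchy-Schwarz-style bound numerically. The default `sigma=1.0` is an assumption.

```python
import math


def gaussian_kernel(x, xi, sigma=1.0):
    """exp(-||x - xi||^2 / sigma^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-squared_distance / sigma ** 2)


p, q = (0.0, 1.0), (2.0, -1.0)  # hypothetical points
print(gaussian_kernel(p, q) == gaussian_kernel(q, p))  # symmetry
print(gaussian_kernel(p, q) ** 2 <= gaussian_kernel(p, p) * gaussian_kernel(q, q))
```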
- Networking
- encoding/decoding
- framing
- reliable transmission
- Ethernet
- Wi-Fi
- network switch
- forwarding
- Ethernet switch device
- interconnect network
- interdomain routing
- user datagram protocol (UDP)
- transmission control protocol (TCP)
- hypertext transfer protocol (HTTP)
- remote procedure call (RPC)
- real-time transport protocol (RTP)
- simple mail transfer protocol (SMTP)
- congestion control
- domain name system (DNS)
- attack
- firewall
- virtual private network (VPN)
- overlay network
Networking
encoding/decoding
- non-return to zero (NRZ)
- baseline wander
- synchronization problem
- non-return to zero inverted (NRZI): invert signal on 1
- Manchester encoding: xor clock and bit
- only 50% efficiency
- 4b/5b encoding
- 64b/66b
- 128b/130b
- modulation
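Two of the encodings listed above, sketched on lists of 0/1 values. The starting signal level for NRZI and the clock convention for Manchester (low half then high half within each bit period) are assumptions; real links may use the opposite polarity.

```python
def nrzi_encode(bits, start_level=0):
    """Toggle the signal level on every 1 bit and hold it on every 0 bit."""
    level, signal = start_level, []
    for bit in bits:
        if bit == 1:
            level ^= 1
        signal.append(level)
    return signal


def manchester_encode(bits):
    """XOR each bit with a clock that is 0 then 1 within every bit period."""
    signal = []
    for bit in bits:
        signal.extend([bit ^ 0, bit ^ 1])
    return signal


print(nrzi_encode([0, 1, 1, 0, 1]))  # [0, 1, 0, 0, 1]
print(manchester_encode([1, 0]))     # [1, 0, 0, 1]
```

Manchester emits two signal values per bit, which is where the 50% efficiency comes from.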
framing
- sentinel-based framing
- BISYNC, PPP
- framing byte may appear in payload
- escape byte
- count-based framing: body length before body
- DDCMP
- bitwise sentinel framing
- HDLC
- bit stuffing: add extra 0 after 5 consecutive 1
- clock-based framing
- fixed size frame
- need precise clock & synchronization phase
- high efficiency
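Bit stuffing from the list above can be sketched as a pair of functions over lists of 0/1 values. The receiver side assumes it is handed only a well-formed stuffed payload, with frame delimiters already removed.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of five consecutive 1 bits."""
    stuffed, run = [], 0
    for bit in bits:
        stuffed.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 5:
            stuffed.append(0)
            run = 0
    return stuffed


def bit_unstuff(bits):
    """Drop the bit that follows every run of five consecutive 1 bits."""
    payload, run, skip_next = [], 0, False
    for bit in bits:
        if skip_next:  # this is the stuffed 0; discard it
            skip_next, run = False, 0
            continue
        payload.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 5:
            skip_next = True
    return payload


print(bit_stuff([1, 1, 1, 1, 1, 1]))  # [1, 1, 1, 1, 1, 0, 1]
```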
error handling for framing
parity bit: whether number of 1 is odd
e.g. server RAM
- fast, easy in hardware
- miss many error
- high overhead
checksum: add up all sequence of certain length
e.g. internet checksum
- easy in software
- miss many error
cyclic redundancy check (CRC): divide message as a polynomial by generator polynomial
e.g. BISYNC, DDCMP, HDLC, ethernet, Wi-Fi
- easy in hardware with shift register
- good implementation ensure catching
- all single-bit error
- double bit error
- odd number of error
- burst of error shorter than k bit
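As an illustration of the checksum entry above, here is a sketch of the 16-bit ones' complement Internet checksum in the style of RFC 1071. Padding odd-length input with a zero byte is the usual convention.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


message = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
checksum = internet_checksum(message)
print(hex(checksum))  # 0x220d
# A receiver recomputes over message + checksum and expects 0.
print(internet_checksum(message + checksum.to_bytes(2, "big")))  # 0
```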
reliable transmission
- acknowledgement (ACK)
- timeout
stop-and-wait transmission
if ACK arrive before timeout, send next frame, else, send same frame again
- timeout hard to choose
- need 1 bit for frame identifier and in ACK
- waste bandwidth
continuous transmission: sliding window algorithm
sender
- send window size (SWS)
- last ACK received (LAR)
- last frame sent (LFS)
receiver
- receive window size (RWS)
- largest frame acceptable (LFA)
- last frame received (LFR)
go-back-n
resend all frame since first lost frame
duplicate ACK
- sender: resend on duplicate ACK
- receiver: resend ACK for the last in-order frame when frame out of order
selective ACK
- sender: resend missing frame between last ACK and SACK
- receiver: send SACK for out-of-order frame
sliding window performance
utilize bandwidth
frame size f,
bandwidth b,
transmission time t,
round trip time r,
time to first ACK t0,
number of packet n
$$t = \frac{f}{b}, \quad t_0 = t + r, \quad n = \left\lceil \frac{t_0}{t} \right\rceil$$
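The formulas above give the number of frames that must be in flight to keep the link busy. A minimal sketch, with hypothetical link parameters in the usage line:

```python
import math


def frames_to_fill_pipe(frame_bits, bandwidth_bps, rtt_seconds):
    """Number of frames n the sender must have in flight to use the full bandwidth."""
    t = frame_bits / bandwidth_bps  # transmission time of one frame
    t0 = t + rtt_seconds            # time until the first ACK arrives
    return math.ceil(t0 / t)


# Hypothetical link: 8000-bit frames, 1 Mbit/s, 100 ms round trip time.
print(frames_to_fill_pipe(8000, 1_000_000, 0.1))  # 14
```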
sliding window frame identifier count
- smaller is better since overhead
- ≥ SWS + RWS: prevent overlap of sequence
Ethernet
- carrier sense: idle/busy
- multiple access: share medium and broadcast
- collision detection
- collision avoidance: exponential back-off
Ethernet switch
- usually point-to-point
- high collision
Ethernet address
48 bit printed in hex separated per byte by colon
- unique
- manufacturer are allocated 3-byte OUI
- broadcast address FF:FF:FF:FF:FF:FF
Wi-Fi
- carrier sense
- multiple access
- collision avoidance
- request to send (RTS): inform future send
- clear to send (CTS): reserve medium
Wi-Fi distribution system
- access point connected to Ethernet
- automatic handover
- support client mobility
- new access point inform old one
network switch
interconnect link of the same type
- many port
- can connect to each other
forwarding
- require unique address
packet forwarding (frame-based forwarding)
- each frame contain enough information of destination (overhead)
- no reachability information
- independent forwarding
- congestion
- network switch keep forwarding table
- map destination to outgoing port
circuit-based forwarding
- need establish circuit (stateful)
- virtual circuit established on demand
- circuit establishment request use frame forwarding
- network switch keep virtual circuit table
- input port, input id, output port, output id
source-based routing
- origin provide forwarding information
- specify each output port in switch
- intermediate switch shift array
- address of each switch
- origin need global view
- frame header size undefined
usage
- establish virtual circuit
- go around failure
- hide origin of packet for attack
- mostly disabled
Ethernet switch device
multiple Ethernet interface
- build forwarding table
- broadcast if destination not on forwarding table
- remove old entry
spanning tree algorithm
- disable some link to eliminate loop
- switch with lowest ID is the root
- each switch start with self as root and broadcast root and distance
- alternative: TRILL, shortest path bridging
reason
- broadcast storm: broadcast cycle
drawback
- waste bandwidth
- O(d) where d is diameter of network, slow to converge if d large
repeater
- increase collision
virtual local area network (VLAN)
- additional field in frame header
- block invalid address
- scoped broadcast
interconnect network
- failed attempt to convert packet
internet protocol (IP)
- service model
- no guarantee
- most basic requirement on link layer, work everywhere
- packet header
IP fragmentation
what to do when packet size exceed frame size
- reassemble packet at destination
- in frame header
- use same identifier
- indicate following fragment with flags
- help reassemble with offset
- alternative: drop packet
- default in IPv6
- tell sender maximum valid size
IP address
- hierarchical
- network identifier
- node identifier
network information center (NIC)
IPv4
- class A: big network, first byte < 128, 1 byte reserved
- class B: smaller, first byte 128 ~ 191, 2 byte reserved
- class C: small, first byte 192 ~ 223, 3 byte reserved
- private address: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
IPv6
- 16 byte
- written in hex, 2 byte between each colon, 0s omitted
subnet/CIDR
split address
- length: x.y.z.w/l
- bit not masked determine subnet size
- network smaller than /24 usually not allowed in routing table
- netmask: same as length, 1s followed by 0s
- can split large subnet/ aggregate small subnet and advertise big network
- routing table can have overlapping entry, specific one used
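Overlapping entries with the most specific one winning can be sketched with Python's standard `ipaddress` module. The table contents and port names below are hypothetical.

```python
import ipaddress


def longest_prefix_match(routing_table, address):
    """Return the port of the most specific network containing the address."""
    addr = ipaddress.ip_address(address)
    matches = [(net, port) for net, port in routing_table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)[1]


# Hypothetical routing table with overlapping entries and a default route.
table = [
    (ipaddress.ip_network("10.0.0.0/8"), "port-a"),
    (ipaddress.ip_network("10.1.0.0/16"), "port-b"),
    (ipaddress.ip_network("0.0.0.0/0"), "default"),
]

print(longest_prefix_match(table, "10.1.2.3"))  # port-b
print(ipaddress.ip_network("10.1.0.0/16").netmask)  # 255.255.0.0
```

A real router would use a trie rather than a linear scan; the scan keeps the sketch short.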
network address translation (NAT)
enable computer in network share same globally routed public address
- engineering solution (hack)
- connection mapping: NAT table, use local port and remote IP and port to multiplex packet
- IP no longer unique across device
- port forwarding (UPnP) so incoming connection arrive
IP forwarding by router
- if interface is connected to destination network, translate to link-layer address (e.g. MAC) and deliver over local network
- else, find a router nearer to destination and do the above to it
address translation
- build link-layer address from IP
- does not work, e.g. IPv4 address (4 byte) to MAC (6 byte)
- privacy leak, e.g. IPv6 EUI-64 deprecated
- address resolution protocol (ARP)
- source broadcast ARP request
- matching node send ARP response with link-layer address to requester
- neighbor discovery protocol (NDP) for IPv6
default route
just deliver to this route if not in my routing table
routing table
- can configure manually but usually automatically by routing protocol
- distilled into forwarding table
- map destination to potentially many next port
- additional information than forwarding table
distance vector protocol
- used by routing information protocol (RIPv2)
- each router periodically broadcast all its best route to each neighbor
- count to infinity problem
- solution: limit maximum distance to 15
- split horizon optimization: do not send route back to where it is from
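A simplified simulation of the distance vector exchange described above. It tracks only distances (no next hops), runs in synchronous rounds instead of periodic timers, and omits split horizon; the three-router topology is hypothetical.

```python
def distance_vector(links, max_distance=15):
    """Run synchronous rounds of route exchange until no table changes.

    links maps (a, b) to the cost of a bidirectional link.
    """
    neighbors = {}
    for (a, b), cost in links.items():
        neighbors.setdefault(a, {})[b] = cost
        neighbors.setdefault(b, {})[a] = cost
    # Each router starts out knowing only the route to itself.
    tables = {node: {node: 0} for node in neighbors}
    changed = True
    while changed:
        changed = False
        # Snapshot so every router advertises its table from the previous round.
        snapshot = {node: dict(table) for node, table in tables.items()}
        for node, adjacent in neighbors.items():
            for neighbor, cost in adjacent.items():
                for dest, dist in snapshot[neighbor].items():
                    candidate = cost + dist
                    known = tables[node].get(dest, max_distance + 1)
                    if candidate <= max_distance and candidate < known:
                        tables[node][dest] = candidate
                        changed = True
    return tables


# Hypothetical topology: going A -> B -> C (cost 3) beats the direct link (cost 5).
tables = distance_vector({("A", "B"): 1, ("B", "C"): 2, ("A", "C"): 5})
print(tables["A"]["C"])  # 3
```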
link-state protocol
- used by open shortest path first (OSPF)
- each router globally broadcast all its link at change
- reliable flooding
- costly
- send link-state packet (LSP)
- list of neighbor and metric (cost) of link
- sequence number: increasing
- guaranteed delivery via ACK
- time-to-live
- resend if no ACK
- receiver discard LSP with lower sequence number than seen
- receiver forward LSP
- receiver optimization: wait for other LSP for a while
- only needed when topology change
- metric
- usually fixed
- dynamic metric cause route oscillation and unpredictable performance
node configuration
- manual configuration: maintain IP allocation table
- autoconfiguration: DHCP
dynamic host configuration protocol (DHCP)
- client without IP address broadcast DHCP discover message
- DHCP server respond with configuration
- IP address
- DNS server
- default gateway
- lease duration
- DHCP server reserve IP prefix and give new device unallocated IP address
- DHCP relay allow multiple network to use 1 DHCP server
internet control message protocol (ICMP)
- report error, transmit metadata
- for troubleshoot
message type
0: echo reply
3: destination unreachable
- 0/1/2/3: network/host/protocol/port unreachable
- 4 fragmentation needed, for maximum transmission unit (MTU) discovery
8: echo request
interdomain routing
autonomous system (AS)
set of router following same policy, a.k.a. domain
- distinct operator
- transit traffic: traffic across two AS
- stub AS: on the border, no transit traffic
- transit AS: opposite; promise to route you anywhere
- multihoming: network with multiple provider
- peering point: host network equipment
border gateway protocol (BGP)
- de facto standard
- policy-based routing
- oblivious to performance
- goal: find loop-free policy-compliant path
- no obligation to accept or propagate route
- can withdraw route
- focus on scalability
- only propagate most preferred route
- only need to run on router that connect two AS
- configure BGP session manually
- sometimes require contract and service agreement
- small/regional network pay larger network for transit
routing relationship and policy
- whatever make them money
- provider-client/client-provider: client pay for transit
- provider export all route to client
- client export only its route to provider
- peer-to-peer: exchange route for free
- peer export client route to peer
user datagram protocol (UDP)
multiplex packet for different application
- demultiplexing key (network port) in packet
- add packet to queue for socket. packet stay until application call recv()
- datagram: packet
network port
- 2 byte, local to each node
- which port
- in the spec
- portmap
- well-known port: SSH: 22, DNS: 53, HTTP: 80, HTTPS: 443
transmission control protocol (TCP)
- demultiplexing
- guaranteed delivery
- dynamic retransmission timer
- in-order delivery
- dynamic sliding window
- bidirectional transmission
- flow control
- accommodate for slowest endpoint
- congestion control
- coordinate across nodes
end-to-end principle
functionality should be provided at a layer only if it can be complete there
- guarantee on each link does not guarantee end-to-end
- TCP/IP push reliable delivery to transport layer
TCP header
- number each byte: SequenceNum
- ACK for each byte: Acknowledgment
- control sliding window: AdvertisedWindow
TCP connection establishment
- three-way handshake
- 4 messages to tear down
TCP flow control
- receiver set AdvertisedWindow to remaining buffer space
- sender set sliding window size according to AdvertisedWindow
- block send() on buffer full
- sender send 1 byte periodically to check AdvertisedWindow
- prevent chicken-egg problem: hang on 0 AdvertisedWindow
- 32 bit sequence number, sliding window can have up to 2^16 byte
- prevent sequence number wrap around
- increase sequence number to 64 bit with option flag
- use all bandwidth
- AdvertisedWindow has 16 bit, maximum sliding window size is 64KiB
- option flag to multiply value in AdvertisedWindow
- Nagle’s algorithm: sender wait for ACK if cannot send full frame
- silly window syndrome: sender send tiny frame due to tiny AdvertisedWindow
- adaptive transmission: EstimatedRTT te, MeasuredRTT tm, Timeout to
Karn/Partridge algorithm with parameter a:
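$$t_e = a \cdot t_e + (1 - a) \cdot t_m, \quad t_o = 2 \cdot t_e$$
Jacobson/Karels algorithm with DeviationRTT σ, parameter δ:
$$t_e = t_e + \delta \cdot (t_m - t_e), \quad \sigma = \sigma + \delta \cdot (|t_m - t_e| - \sigma), \quad t_o = t_e + 4 \cdot \sigma$$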
ignore retransmitted packet
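A sketch of the Jacobson/Karels update from the formulas above. The default `delta=0.125`, the zero initial deviation, and seeding the estimate with the first sample are assumptions; the difference is computed against the estimate from before the update.

```python
class RttEstimator:
    """Adaptive retransmission timeout: t_o = t_e + 4 * sigma."""

    def __init__(self, first_sample, delta=0.125):
        self.estimated = first_sample  # EstimatedRTT t_e
        self.deviation = 0.0           # DeviationRTT sigma
        self.delta = delta

    def update(self, measured):
        """Fold one MeasuredRTT sample in and return the new timeout."""
        difference = measured - self.estimated
        self.estimated += self.delta * difference
        self.deviation += self.delta * (abs(difference) - self.deviation)
        return self.timeout()

    def timeout(self):
        return self.estimated + 4 * self.deviation


estimator = RttEstimator(first_sample=100.0)
print(estimator.update(180.0))  # 110 + 4 * 10 = 150.0
```

Per the last line of the notes, samples from retransmitted packets should not be passed to `update`.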
hypertext transfer protocol (HTTP)
- uniform resource locator (URL)
protocol:path
- for HTTP
http[s]://[user[:pass]@]host/path[?query][#fragment]
HTTP message
- command/status line
- request command: GET / HTTP/1.1
- response status line: HTTP/1.1 200 OK
- headers
- mandatory: Content-Length: 69
- empty line: \r\n
- content
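The message layout above (first line, headers, empty line, content) can be sketched with two small helpers. This is a deliberately naive parser that assumes ASCII headers of the form `Name: value` and a single `Content-Length` header; it is not a full HTTP implementation.

```python
def build_response(body: bytes, status="200 OK") -> bytes:
    """Assemble status line, Content-Length header, empty line, and content."""
    head = f"HTTP/1.1 {status}\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("ascii") + body


def parse_message(raw: bytes):
    """Split a message into (first line, header dict, content)."""
    head, _, content = raw.partition(b"\r\n\r\n")
    first_line, *header_lines = head.decode("ascii").split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return first_line, headers, content


raw = build_response(b"hello")
print(parse_message(raw))
```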
remote procedure call (RPC)
client request & server respond (minimum 2 transmission)
- message oriented
- lower overhead than TCP
- stub
- allow function call
- protocol
- encode high-level data
- acknowledgement model
- send ACK
- response as ACK
- ping/working
- sync/async
popular example: REST/gRPC
real-time transport protocol (RTP)
- latency is vital
- monitor path property
- let application handle packet drop
stream control transmission protocol (SCTP)
- media delivery
- built-in media stream sync
- not depend on low-layer functionality
- common: IP + UDP
- example: WebRTC
simple mail transfer protocol (SMTP)
- user send mail to domain server
- domain server send to other domain server
- other domain server push to other user
- other user retrieve message using IMAP or POP3
- plain text: encode binary to string
- optional authentication: usually only between user/server
congestion control
- router-based vs endhost-based
- a priori resource reservation vs transmission-time feedback
- window-based vs rate-based
- throughput vs latency: power = throughput / latency
traffic flow
- defined as (Source IP, Destination IP) or (Source IP, Source Port, Destination IP, Destination Port)
- aid router track soft state
resource allocation fairness
Jain’s fairness for n flow with throughput xi
$$f(x_1, \cdots, x_n) = \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n \sum_{i=1}^{n} x_i^2} : \mathbb{R}^n \to [0, 1]$$
- 1 is most fair
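Jain's fairness index is a one-liner. The throughput lists below are hypothetical and at least one flow is assumed to have nonzero throughput.

```python
def jain_fairness(throughputs):
    """(sum x_i)^2 / (n * sum x_i^2); 1 means all flows get equal throughput."""
    n = len(throughputs)
    return sum(throughputs) ** 2 / (n * sum(x * x for x in throughputs))


print(jain_fairness([5, 5, 5, 5]))   # 1.0, perfectly fair
print(jain_fairness([10, 0, 0, 0]))  # 0.25, one flow takes everything
```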
TCP queue discipline
- FIFO queue with tail-drop
- prevalent
- priority queuing
- absolute priority: higher priority always first
- reserve a fraction of bandwidth
- diffserv: buy priority
- fair queuing
- number of flow variable and large
- flow send packet of different size
- random early drop (RED): drop packet before queue fill
- no drop below minThreshold
- increasing drop probability between minThreshold and maxThreshold
classic TCP congestion control
- utilizing bandwidth & fairness
- introduced during “congestion collapse”
- implicit signal: drop packet
congestion window cwnd
limit number of unacknowledged byte
additive increase/ multiplicative decrease (AI/MD)
- packet dropped:
cwnd = max(cwnd / 2, packet_size)
- not dropped:
cwnd += packet_size
- mathematically proved to converge to same throughput
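The convergence claim can be illustrated with a toy simulation of two flows sharing one link. The model is a strong simplification and an assumption of this sketch: both flows see a drop in the same round whenever their combined window exceeds the capacity, and one step stands for one round trip.

```python
def aimd_step(cwnd, packet_size, dropped):
    """One AI/MD update: halve on a drop, otherwise add one packet."""
    if dropped:
        return max(cwnd / 2, packet_size)
    return cwnd + packet_size


def share_link(cwnd_a, cwnd_b, capacity, packet_size=1.0, rounds=200):
    """Run two AI/MD flows over a shared link and return their final windows."""
    for _ in range(rounds):
        dropped = cwnd_a + cwnd_b > capacity
        cwnd_a = aimd_step(cwnd_a, packet_size, dropped)
        cwnd_b = aimd_step(cwnd_b, packet_size, dropped)
    return cwnd_a, cwnd_b


# Hypothetical start: flow B holds almost the whole link.
final_a, final_b = share_link(1.0, 80.0, capacity=100.0)
print(final_a, final_b)
```

Additive increase leaves the gap between the two windows unchanged while each multiplicative decrease halves it, so the gap shrinks toward zero.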
TCP slow start
- exponentially increase cwnd at start: each ACK: cwnd += 1
- when dropped
- slow start threshold: SSthresh = cwnd / 2
- each RTT: cwnd = packet_size
- actual: each ACK: cwnd += packet_size / cwnd
- slow start while cwnd < SSthresh, do AI/MD afterwards
TCP fast retransmit
- receiver send duplicated ACK for each packet out of order
- sender retransmit on multiple duplicated ACK
TCP cubic
- default in Linux since 2006
- hybrid slow start
- identify ACK slowdown and do AI/MD
- stay at expected maximum cwnd, do AI after no drop for a while
domain name system (DNS)
- first try: “hosts” file /etc/hosts
- top-level domain by ICANN
- domain level hierarchy
- name resolution: ask name server from root up
- 13 root server each use different implementation
- distributed database
DNS entry
- canonical name
- nick name
- subdomain entry
attack
authenticity attack
- phishing: impersonation
- spoofing: hide source of attack
countermeasure: HTTPS
anonymity attack
- censorship
- persecution
- blackmail
countermeasure: VPN, TOR
availability attack
- DDoS
- ransom
firewall
- block incoming, usually allow outgoing
- software/hardware
- cheap, flexible vs expensive, opaque
- low vs high performance
- stateful/stateless
- have vs no memory
- cheap, easy vs expressive
firewall access policy
- accept, drop, reject
- default allow vs default deny
- IP, transport protocol, port, application data
demilitarized zone
- allow closely watched device to access more
intrusion detection system (IDS)
monitor software/hardware to detect malicious activity
- network intrusion detection system (NIDS)
- host intrusion detection system (HIDS)
virtual private network (VPN)
- tunneling protocol
- end-to-end encryption
usage
- remote access
- site-to-site
- bypass firewall/proxy
secure tunnel
- TLS, SSL
- IPsec
- authentication header (AH): after IP header
- encapsulating security payload (ESP) header: after AH, before encrypted TCP header
- wrap all above as payload into another IP-AH-ESP-encrypted payload
the onion router (Tor)
overlay network
- internet is overlay IP network over physical link
- VPN is overlay IP network over internet
example
- overlay multicast
peer-to-peer network
- unstructured peer-to-peer network: messy
- super node: big machines, store what file neighbor have
- coordination: maintain after super node leave
- tracker: broadcast what peer interested in same file
- peer exchange (PEX): list of known peer
BitTorrent
- dedicated tracker track peer and torrent
- direct connection between peer for data transfer
- select peer by speed (tit-for-tat)
- seeder: peer with file
- leecher: peer looking for file
- distributed hash table (DHT)
- no need for tracker
Data Science
Visualization
data type
item
discrete entry
attribute
measured property
attribute type
nominal (categorical)
with equality relationship
ordinal
with order
quantitative (numerical)
with algebra
- ratio: with meaningful 0
- interval
link
relationship between item
position
spatial location
grid
sampling strategy for continuous data
structured data
- table: item & attribute
- flat
- multi-dimensional
- network: item & attribute & link
- field: grid & attribute
- geometry: position
unstructured data
text, audio, graphics…
symbolic display
- parallel coordinate
- treemap
- field
- choropleth map
- star plot
graph
- framework
- scale
- content
- mark
- label
- title
- axis
mark
represent item, link
- point
- line
- area
channel
represent attribute
- position
- color
- hue
- saturation
- luminance
- shape
- glyph
- tilt
- size
- area
- texture
characteristic of channel
- selective
- associative
- quantitative
- order
- length
analysis framework
four level
- domain situation
- data/task abstraction
- visual encoding/interaction idiom
- algorithm
three question
- what
- why
- how
user task
- query
- retrieve value
- find extremum
- determine range
- search
- sort
- filter
- consume
- compute derived value
- characterize distribution
- find anomaly
- cluster
- correlate
graph theory
G(V,E)
- independent set: no edge
- clique: all possible edge
- path: all edge can be consecutively traversed
- tree: no cycle
- network
- unconnected graph
- biconnected graph: no articulation point
- articulation point: break the graph if deleted
- bipartite graph: vertices split into two sets, edges only between the two sets
centrality
quantity to measure vertex importance
degree of vertex
number of connected node
- in-degree/ out degree for directed graph
betweenness of vertex
sum, over each pair of vertices, of the number of shortest paths through the vertex divided by the number of shortest paths between the pair
$B(i)=\sum_{a,b}\frac{g_{aib}}{g_{ab}}$
closeness of vertex
sum of geodesic distance between the vertex and every other
$C(i)=\sum_{j}d(i,j)$
geodesic distance between vertex
number of edge between them on the shortest path
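A minimal Python sketch of degree, geodesic distance, and closeness as defined above, assuming an undirected, connected graph stored as an adjacency dict. The names `geodesic_distances`, `degree`, `closeness`, and the example graph `path` are illustrative, not from any library. With this definition (a sum of distances), a smaller closeness means a more central vertex.

```python
from collections import deque


def geodesic_distances(adj, source):
    """BFS: number of edges on a shortest path from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree(adj, i):
    """Number of vertices connected to i."""
    return len(adj[i])


def closeness(adj, i):
    """Sum of geodesic distances between i and every other vertex."""
    return sum(geodesic_distances(adj, i).values())


# Hypothetical example graph: the path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(degree(path, "b"), closeness(path, "a"), closeness(path, "b"))
```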
eigenvector of vertex
adjacency matrix dot vector of degree
clustering coefficient
time
Documentation
Pandoc
How to use Pandoc to produce a research paper
create paper.pdf
from paper.md
and paper.bib
pandoc --pdf-engine latexmk -C --bibliography=paper.bib \
-M reference-section-title=Bibliography \
-V classoption=twocolumn -V papersize=a4paper -V fontsize=12pt \
-V geometry:margin=0.1in -V mainfont=Times \
-s paper.md -o paper.pdf
deal with CJK
pandoc --pdf-engine xelatex -C --bibliography=paper.bib \
-M reference-section-title=Bibliography \
-V classoption=twocolumn -V papersize=a4paper -V fontsize=12pt \
-V geometry:margin=0.1in -V mainfont=Times -V CJKmainfont='Songti SC' \
-s paper.md -o paper.pdf
using HTML as intermediate format and wkhtmltopdf
to create PDF
pandoc -s paper.md -o paper.pdf -t html5 \
-V margin-top=0 -V margin-bottom=0 -V margin-left=0 -V margin-right=0
word count
pandoc --lua-filter ~/.config/helper.lua/wc.lua input
convert docx
to md
:
cat INPUT_FILENAME.docx | pandoc --from docx --to markdown_strict \
--extract-media=. -o OUTPUT_FILENAME.md &&
sed -i -e 's/\\\_/_/g' \
-e 's/\\\\\[/[/g' \
-e 's/\\\\\]/]/g' \
-e 's/\\\\\*/*/g' \
-e 's/\\\\left\\\\\\\\ /\\\\left\\\\{/g' \
-e 's/&gt;/>/g' \
-e 's/&lt;/</g' \
OUTPUT_FILENAME.md
Exams
Cheatsheet CSS
html body {
font-size: 12.5px;
}
* {
line-height: 1 !important;
margin-top: 0 !important;
margin-bottom: 0 !important;
padding-top: 0 !important;
padding-bottom: 0 !important;
}
h1,
h2,
h3,
h4,
h5,
h6 {
font-size: 1em !important;
}
main,
div,
section {
margin: 0 !important;
padding: 0 !important;
}
:not(span, code, pre) {
font-family: Arial Narrow !important;
}
main {
display: block !important;
}
a.anchor {
display: none;
}
section.markdown-body {
column-count: 3;
column-gap: 0;
}
html body ul,
html body ol {
padding-left: 1em;
}
TOEFL
reading
inference
- do not jump away from choice contradicting to article
summary
- do not choose choice that are not main point
speaking
do not add stuff not on notes
defend personal choice
- give example
campus-related passage + dialog
- summarize passage first
academic course passage + lecture
- directly go into lecture, mention passage along the way
academic course passage
- get example
- omit details, keep summary
Financial
Monero
p2pool mining setup using Monero GUI: add below to
Daemon startup flags
in node setting (from https://www.reddit.com/r/MoneroMining/comments/19dvhpg/unable_to_set_up_p2pool_node/):
--zmq-pub tcp://127.0.0.1:18083
Government
United States
- apply for visa outside US
Enter US checklist
- passport w/ > 6 months validity
- visa & all related document
macOS
get ID of app:
osascript -e 'id of app "SomeApp"'
stop mds_store
going crazy: disable Spotlight indexing
sudo mdutil -a -i off
# enable later
sudo mdutil -a -i on
add big folders to Spotlight privacy ignore list
flush DNS cache:
sudo dscacheutil -flushcache
sudo killall -HUP mDNSResponder
restart audio control, useful when audio is not working:
sudo killall coreaudiod
install xcode-select w/o using the garbage CLI that hangs: find the version for the corresponding Xcode version here https://developer.apple.com/download/all/?q=command%20line%20tools%20for%20Xcode, download the .dmg
file, and install
MagSafe not working: reboot
Sucklessfy
allow apps from anywhere
sudo spctl --master-disable
.DS_Store
remove all .DS_Store in subfolder
find . -name '.DS_Store' -type f -delete
key repeat
disable alternate character and enable key repeat for APP_NAME
defaults write APP_NAME ApplePressAndHoldEnabled -bool false
disable bloat daemon
cd /System/Library/LaunchDaemons
sudo launchctl unload com.apple.mobile.softwareupdated.plist
Mathematics
- Abstract Algebra
- integer
- relation
- group
- ring
Abstract Algebra
integer
Z:={⋯,−2,−1,0,1,2,⋯}
natural number
N:={0,1,2,⋯}
well-ordering axiom
∀ R⊆N,R≠∅,∃ a∈R s.t. min(R)=a
division algorithm
∀ a,b∈Z,b>0,∃! q,r∈Z, s.t.
a=bq+r,0≤r<b
divisibility
b∣a, or b divide a⇔∃ c∈Z, s.t. a=bc
- b∤a, or b does not divide a
- b∣a⇔b∣(−a)
- a≠0,b∣a⇒b≤∣a∣
greatest common divisor
a,b∈Z∖{0},∃! d:=gcd(a,b), s.t.
- d∣a,d∣b
- c∣a,c∣b⇒c≤d
or
- d∣a,d∣b
- c∣a,c∣b⇒c∣d
Euclidean algorithm
a,b∈Z∖{0}
let $r_{-2}=a,r_{-1}=b$
apply division algorithm on $r_{i-1},r_i$ until $r_{n+1}=0$
$r_{i-1}=q_{i+1}r_i+r_{i+1}$
$r_n=\gcd(a,b)$
- a,b,q,r∈Z,b>0,a=bq+r⇒gcd(a,b)=gcd(b,r)
Bézout’s identity
a,b∈Z∖{0},∃ u,v s.t.
gcd(a,b)=au+bv
- gcd(a,b)=min({ax+by∣x,y∈Z})
- x,y∈Z,c=ax+by⇒gcd(a,b)∣c
- find u,v: substitute gcd(a,b) from the last equation in Euclidean algorithm up
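A Python sketch of the Euclidean algorithm that also tracks Bézout coefficients, assuming positive integer inputs. It carries u, v forward through each division step, instead of substituting back from the last equation as described above; both approaches give a valid pair. The name `extended_gcd` is illustrative.

```python
def extended_gcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) = a*u + b*v, for positive a, b."""
    old_r, r = a, b    # consecutive remainders r_{i-1}, r_i
    old_u, u = 1, 0    # invariant: old_r = a*old_u + b*old_v
    old_v, v = 0, 1    # invariant: r     = a*u     + b*v
    while r != 0:
        q = old_r // r                  # division algorithm: old_r = q*r + remainder
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v


d, u, v = extended_gcd(240, 46)
print(d, u, v)  # 240*u + 46*v equals d
```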
relatively prime
a,b are relatively prime ⇔gcd(a,b)=1
- gcd(a,b)=1⇔∃ x,y∈Z,ax+by=1
- a∣bc,gcd(a,b)=1⇒a∣c
prime number
p∈Z, then p is prime ⇔p≠0,p≠±1, and the only divisors of p are ±1,±p
- composite number: not prime
Euclid’s lemma
p prime, p∣ab⇒p∣a or p∣b
- p∈Z∖{0,±1}, then p prime ⇔ if p∣ab, then p∣a or p∣b
- p prime, p∣a1a2⋯an⇒∃ i,p∣ai
fundamental theorem of arithmetic
∀ n∈Z∖{0,±1},n can be factored uniquely into product of primes
relation
relation R on A is a subset R⊆A×A
aRb or a is related to b⇔(a,b)∈R
- $a\,\not R\,b$, or a is not related to b
partition of set
A is a non-empty set, partition of A
P: {Ai∣i=1,2,⋯}
s.t.
- ∀ i≠j,Ai∩Aj=∅
- $A=\bigcup_i A_i$
- RP:={(a,b)∣∃ Ai,a∈Ai,b∈Ai} is equivalence relation
equivalence relation
- reflexive: aRa
- symmetric: aRb⇒bRa
- transitive: aRb,bRc⇒aRc
equivalence class
equivalence class of representative a
[a]:={x∈A∣xRa}
aRb⇔[a]=[b]
[a],[b] are either disjoint or identical
quotient set of A by R
A/R:={[a]∣a∈A}
is partition of A
modular arithmetic
a,b,n∈Z,n>0, then
a≡nb or a≡b (mod n) or a is congruent to b modulo n⇔n∣(a−b)
congruence modulo n
Z/n (Z mod n) or quotient set of Z by ≡n
- ≡n is equivalence relation on Z
congruence class
equivalence class of a modulo n
[a]n:={x∈Z∣x≡na}={a+nk∣k∈Z}
- there are n distinct congruence classes
addition and multiplication in modular arithmetic
[a]+[b]=[a+b]
[a][b]=[ab]
- addition and multiplication are well-defined
- ∀ [a]∈Z/n,∃! additive inverse
- [a]≠[0] either has a unique multiplicative inverse or has none
unit
∃ x∈Z/n,[a]x=[1]
- [a] is unit ⇔gcd(a,n)=1
zero divisor
∃ x≠[0],[a]x=[0]
p∈N∖{0,1}, then
p is prime
⇔∀ [a]≠[0], [a] is unit
⇔[b][c]=[0]⇒[b]=[0] or [c]=[0], or no zero divisor
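A small Python sketch that enumerates units and zero divisors of Z/n by brute force, assuming n ≥ 2 and representing [a] by a in {0,…,n−1}. The helper names `units`, `zero_divisors`, and `inverse` are illustrative. `pow(a, -1, n)` needs Python 3.8 or later and raises `ValueError` when [a] has no inverse.

```python
from math import gcd


def units(n):
    """Nonzero classes [a] in Z/n with gcd(a, n) = 1."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def zero_divisors(n):
    """Nonzero classes [a] with [a][x] = [0] for some nonzero [x]."""
    return [a for a in range(1, n)
            if any(a * x % n == 0 for x in range(1, n))]


def inverse(a, n):
    """Multiplicative inverse of [a] in Z/n."""
    return pow(a, -1, n)


print(units(12), zero_divisors(12), inverse(5, 12))
print(units(7), zero_divisors(7))  # prime modulus: every nonzero class is a unit
```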
group
ordered pair (G,∗)
- set G
- binary operation ∗
axiom group satisfy:
- ∗ is associative
- identity: ∃ e∈G, ∀ a∈G, a∗e=e∗a=a
- inverse: $\forall a\in G,\ \exists a^{-1}\in G,\ a*a^{-1}=a^{-1}*a=e$
property
- e is unique
- ∀ a∈G, a has unique inverse
- a∗x=b and y∗a=b has unique solution, cancellation law work
- $(a^{-1})^{-1}=a$
- $(a*b)^{-1}=b^{-1}*a^{-1}$
- value of $a_1*a_2*\cdots*a_n$ is independent of the bracketing
abelian group (commutative group)
∗ is commutative
- Z,Q,R,C,Z/n,Rm×n under +
- Q∖{0},R∖{0},C∖{0},(Z/n)× under ⋅
nonabelian group
- dihedral group of order 2n, D2n
- (GL(n,R),⋅)
general linear group of degree n over R
GL(n,R):={A∈Rn×n∣A is invertible}
special linear group of degree n over R
SL(n,R):={A∈GL(n,R)∣detA=1}
binary operation
∗:G×G→G
associative binary operation
(a∗b)∗c=a∗(b∗c)
commutative binary operation
a∗b=b∗a
order of group
∣G∣, or order of G, is the number of distinct elements in G
order of element in group
∣a∣, or order of a, is the smallest n∈Z+ s.t.
$a^n=e$
or ∞ if no such n exists
- $e,a,a^2,\cdots,a^{n-1}$ are distinct
- $a^m=e\Leftrightarrow n\mid m$
- $\forall m\in\mathbb{Z}^+,\ |a^m|=\frac{n}{\gcd(m,n)}$
Torsion group
G is a Torsion group, then ∀ a∈G,∣a∣ is finite
Torsion-free group: ∀ a∈G,a≠e,∣a∣=∞
Torsion subgroup of G: for abelian group G,
Tor(G):={a∈G∣∣a∣ is finite}
ab=ba,gcd(∣a∣,∣b∣)=1⇒∣ab∣=∣a∣∣b∣
G is abelian Torsion group, then c has largest order ⇒∀ a∈G, ∣a∣ divides ∣c∣
dihedral group D2n
rotation (r) and reflection (s) of n-gon
$\{1,s,r,sr,r^2,sr^2,\cdots,r^{n-1},sr^{n-1}\},\ n\ge 3$
$(r^i)(k)=i+k \bmod n$
- order 2n
- degree n
- ∣s∣=2
- ∣r∣=n
- $\forall\, 0\le i\ne j\le n-1,\ sr^i\ne sr^j$
- $\forall\, 0\le i\le n,\ r^is=sr^{-i}$
permutation of set
permutation of X: bijection
σ:X→X
symmetric group on set
set Ω≠∅
SΩ, the set of permutation of Ω
(SΩ,∘), symmetric group on Ω
- n≥1,Ω=Xn⇒(Sn,∘), symmetric group of degree n
- $|S_n|=n!$
- n≥3⇒(Sn,∘) nonabelian
array notation of permutation
α is permutation on Xn
$\alpha=\begin{pmatrix}1&2&\cdots&n\\\alpha(1)&\alpha(2)&\cdots&\alpha(n)\end{pmatrix}$
cycle notation of permutation
cyclically permute and fix the rest
$(a_1\ a_2\ \cdots\ a_m)=\begin{pmatrix}a_1&a_2&\cdots&a_m\\a_2&a_3&\cdots&a_1\end{pmatrix}$
- m-cycle (or cycle of length m)
transposition
2-cycle
- ∀ σ∈Sn,σ can be expressed as product of transposition
- not unique
- even permutation: the number of transpositions in such a product is even
- odd permutation
disjoint cycle
no common number
- α,β∈Sn disjoint ⇒αβ=βα
- ∀ π∈Sn,π≠1,π can be uniquely expressed as product of disjoint cycle of length at least 2
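A Python sketch of the disjoint cycle decomposition and the resulting parity, assuming the permutation is given as a dict from each element to its image. It uses the fact that an m-cycle is a product of m−1 transpositions. The names `disjoint_cycles`, `is_even`, and the example `sigma` are illustrative.

```python
def disjoint_cycles(perm):
    """Decompose a permutation into disjoint cycles of length at least 2."""
    seen = set()
    cycles = []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        x = perm[start]
        while x != start:       # follow start -> perm[start] -> ... back to start
            cycle.append(x)
            seen.add(x)
            x = perm[x]
        if len(cycle) >= 2:     # fixed points are omitted in cycle notation
            cycles.append(tuple(cycle))
    return cycles


def is_even(perm):
    """Parity via cycles: an m-cycle is a product of m - 1 transpositions."""
    return sum(len(c) - 1 for c in disjoint_cycles(perm)) % 2 == 0


# Hypothetical example in S_5: 1->3, 2->5, 3->4, 4->1, 5->2.
sigma = {1: 3, 2: 5, 3: 4, 4: 1, 5: 2}
print(disjoint_cycles(sigma), is_even(sigma))
```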
subgroup
H≤G, or H⊆G is a subgroup of G
- H≠∅
- ∀ x,y∈H,xy∈H
- $\forall x\in H,\ x^{-1}\in H$
- H<G, or H is a proper subgroup of G
subgroup test
- H≤G⇔
- H≠∅
- $\forall x,y\in H,\ xy^{-1}\in H$
- ∣G∣<∞,H≤G⇔
- H≠∅
- ∀ x,y∈H,xy∈H
centralizer of group
CG(A):={g∈G∣∀ a∈A,ga=ag}
- CG(a) if A={a}
center of group
Z(G):={g∈G∣∀ a∈G,ga=ag}
normalizer of group
$N_G(A):=\{g\in G\mid gAg^{-1}=A\}$
where
$gAg^{-1}:=\{gag^{-1}\mid a\in A\}$
for group G and subset A⊆G
- CG(A)≤NG(A)≤G
- G abelian ⇒Z(G)=G,CG(A)=NG(A)
cyclic group
G=⟨x⟩={xn∣n∈Z}
- generator x
- cyclic group are abelian
- ∣G∣=∣x∣
fundamental theorem of cyclic group
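- $H\le G,\ H\ne\{1\}\Rightarrow H=\langle x^k\rangle$, where k is the smallest positive integer with $x^k\in H$
- ∣G∣=n⇒
- ∀ a∈N,a∣n,∃! A≤G,∣A∣=a
- $A=\langle x^d\rangle$
- $d=\frac{n}{a}$
- $\forall m\in\mathbb{Z},\ \langle x^m\rangle=\langle x^{\gcd(m,n)}\rangle$
- $\forall i,j,\ x^i=x^j\Leftrightarrow i\equiv j \bmod n$
- $|G|=\infty\Rightarrow\forall m\in\mathbb{Z},\ \langle x^m\rangle=\langle x^{|m|}\rangle$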
homomorphism
(G,⋅),(H,⋅) are group
well-defined map φ:G→H is homomorphism ⇔
∀ a,b∈G,φ(ab)=φ(a)φ(b)
property
- φ(1G)=1H
- $\forall n\in\mathbb{Z},\ \varphi(a^n)=\varphi(a)^n$
- monomorphism: φ injective
- epimorphism: φ surjective
- ≅, or isomorphism: φ bijective
well-defined map
f:A→B is well-defined ⇔
∀ x=y∈A⇒f(x)=f(y)
automorphism
isomorphism from G to G
- inner automorphism of G, $\varphi_a(g)=aga^{-1}\ (a\in G)$
isomorphism
property
- ∣G∣=∣H∣
- G abelian ⇔H abelian
- ∀ g∈G,∣g∣=∣φ(g)∣
- ∀ n∈Z+,G,H have same number of elements of order n
- ∣G∣=p prime ⇒G≅Z/p
equivalence relation from isomorphism
G is set of groups
≅ is equivalence relation on G
cyclic isomorphism
G=⟨x⟩
- ∣x∣=∞⇒G≅Z
- ∣x∣=n⇒G≅Z/n
Cayley’s theorem
∀ group G, ∃ permutation group $H\le S_G$, s.t. G≅H
kernel and image of homomorphism
φ:G→H is homomorphism
h∈H
kernel of φ
Ker φ:={a∈G∣φ(a)=1}
- Ker φ≤G
- measure how much φ is not injective
- φ is injective ⇔Ker φ={1}
image of φ
Im φ=φ(G):={φ(a)∣a∈G}
- Im φ≤H
- φ is injective ⇔Im φ≅G
fiber of h under φ
$\varphi^{-1}(h):=\{a\in G\mid\varphi(a)=h\}$
- φ is surjective ⇔Im φ=H
coset
H≤(G,⋅), g,a,b∈G
left coset of H in G
gH:={gh∣h∈H}
- ∃ bijection between H and gH
- $aH=bH\Leftrightarrow b^{-1}a\in H$
- aH=bH or aH∩bH=∅
right coset of H in G
Hg:={hg∣h∈H}
- ∃ bijection between H and Hg
- $Ha=Hb\Leftrightarrow ab^{-1}\in H$
- Ha=Hb or Ha∩Hb=∅
g is representative
- left and right coset are not necessarily equal
- gH≤G⇔g∈H
- ∣H∣=∣gH∣=∣Hg∣
set of coset
set of left coset
G/H:={aH∣a∈G}
set of right coset
H∖G:={Ha∣a∈G}
- ∃ bijection between G/H and H∖G
index of subgroup in group
index of H in G
[G:H]
is number of distinct left (right) coset
Lagrange’s theorem
- ∣H∣ divides ∣G∣
- $[G:H]=\frac{|G|}{|H|}$
for finite G
- a∈G⇒∣a∣ divides ∣G∣
- $\forall a\in G,\ a^{|G|}=1$
normal subgroup
N⊴G, or N is normal in G⇔
∀ a∈G,aN=Na
normal subgroup test
- N⊴G
- $\Leftrightarrow\forall a\in G,\ aNa^{-1}\subseteq N$
- $\Leftrightarrow\forall a\in G,\ aNa^{-1}=N$
- ⇔NG(N)=G
- ⇔ if aN=bN,cN=dN, then (ac)N=(bd)N
quotient group
N⊴G
G/N=N∖G is the quotient group of G by N
under operation ⋅:
(aN)(bN):=(ab)N
- ∣G/N∣=[G:N]
natural projection
N⊴G
natural projection of G onto G/N
homomorphism π:G→G/N
π(a)=aN
- π is epimorphism
- Ker π=N
isomorphism theorem
first isomorphism theorem
φ:G→H homomorphism ⇒
G/Ker φ≅Im φ
under Φ:G/Ker φ→Im φ
Φ(gKer φ):=φ(g)
second isomorphism theorem (diamond theorem)
K≤G,N≤G,N⊴G⇒
- KN≤G
- K∩N ⊴K
- N⊴KN
- K/K∩N ≅ KN/N
- [KN:K]=[N:K∩N]
third isomorphism theorem
H⊴G,K⊴G,K≤H
- H/K ⊴ G/K
- (G/K)/(H/K) ≅ G/H
via the epimorphism Γ:G/K→G/H with kernel H/K
Γ(gK):=gH
fourth isomorphism theorem
N⊴G, natural projection π:G→G/N
∃ bijection Π
Π:{A≤G∣N⊆A}→{A≤G/N}
Π(A)=π(A)={π(a)∣a∈A}=A/N
A,B≤G,N≤A,B
property of Π
- A≤B⇔Π(A)≤Π(B)
- A≤B⇒[B:A]=[Π(B):Π(A)]
- Π(A∩B)=Π(A)∩Π(B)
- A⊴G⇔Π(A)⊴G/N
ring
a triple (R,+,⋅) s.t.
(R,+) is abelian group
⋅ is associative
∀ a,b,c∈R,(ab)c=a(bc)
distributive
∀ a,b,c∈R,a(b+c)=ab+ac
- ∀ a,b∈R,0a=a0=0
- ∀ a,b∈R,(−a)b=a(−b)=−(ab)
- ∀ a,b∈R,(−a)(−b)=ab
commutative ring
⇔∀ a,b∈R, ab=ba
ring with identity
⇔∃ 1∈R, ∀ a∈R, a1=1a=a
- 1 is unique
- ∀ a∈R,−a=(−1)a
division ring
⇔(R,+,⋅) is ring with identity and ∀ a∈R∖{0}, ∃ b, ab=ba=1
field
commutative division ring
zero divisor of ring
a∈R,a≠0
a is zero divisor ⇔
∃ b∈R,b≠0 s.t. ab=0 or ba=0
- ∀ a,b,c∈R,a is not zero divisor, ab=ac⇒a=0 or b=c
integral domain
commutative ring (R,+,⋅),1≠0
(R,+,⋅) is integral domain ⇔ no zero divisor:
∀ a,b∈R,ab=0⇒a=0 or b=0
a≠0,b≠0⇒ab≠0
- ∀a,b,c∈R,ab=ac⇒a=0 or b=c
- R is finite ⇒R is field
unit of ring
R is ring with identity, 1≠0,a∈R
a is unit ⇔
∃ b∈R s.t. ab=1=ba
- group of units of R, R×:={units in R}
- zero divisor cannot be unit
subring
S⊆R
S is subring of R⇔
- (S,+) is a subgroup of (R,+)
- S is closed under multiplication
relationship between identity
- $1_S$ is not necessarily equal to $1_R$
- 1R∈S⇒1S=1R, unit in S are unit in R
subring test
S is subring of R⇔
- S≠∅
- ∀ a,b∈S,a−b∈S
- ∀ a,b∈S,ab∈S
S is finite subring of R⇔
- S≠∅
- ∀ a,b∈S,a+b∈S
- ∀ a,b∈S,ab∈S
center of ring
R is ring with 1
Z(R):={z∈R∣∀ r∈R,zr=rz}
- is subring with 1
- R is division ring ⇒Z(R) is field
characteristic of ring
if ∃ n s.t.
$n:=\min\{m\in\mathbb{Z}^+\mid\forall a\in R,\ ma=0\}$
then char(R)=n, else char(R)=0
- $\operatorname{char}(R)>0\Leftrightarrow\operatorname{char}(R)=\min\{n\in\mathbb{Z}^+\mid n1=0\}$
- R is integral domain ⇒char(R) is 0 or prime
- R finite ⇒char(R) divides ∣R∣
nilpotent of commutative ring with identity
commutative ring R with 1
x is nilpotent ⇔
$\exists m\in\mathbb{Z}^+$ s.t. $x^m=0$
- x is 0 or zero divisor
- ∀ r∈R,rx is nilpotent
- 1+x is unit
- ∀ unit u,u+x is unit
polynomial ring
R is commutative ring with identity, x is indeterminate
ring of polynomial in x with coefficient in R
R[x]:={polynomial in x with coefficient in R}
- commutative
- identity 1
- R is subring of R[x]
- S is subring of R⇒S[x] is subring of R[x]
- R is integral domain, p(x),q(x)∈R[x]∖{0}⇒
- R[x] is integral domain
- deg(p(x)q(x))=deg(p(x))+deg(q(x))
- (R[x])×=R×
polynomial in x with coefficient in R
$n\in\mathbb{N},\ a_i\in R,\ a_n\ne 0$
$p(x)=a_nx^n+a_{n-1}x^{n-1}+\cdots+a_1x+a_0$
- degree deg(p(x))=n
matrix ring
R is ring, n∈Z+
matrix ring of degree n over R
Mn(R):=Rn×n
- R has identity ⇒1Mn(R)=In
n≥2⇒
- Mn(R) is not commutative
- Mn(R) has zero divisor
- set of scalar matrix in Mn(R) is a subring
- S is subring of R⇒Mn(S) is subring of Mn(R)
scalar matrix
$\forall i\ne j,\ a_{ii}=a,\ a_{ij}=0$
ring homomorphism
R,S are ring
well-defined map φ:R→S is ring homomorphism ⇔∀ r,s∈R,
- φ(r+s)=φ(r)+φ(s)
- φ(rs)=φ(r)φ(s)
- monomorphism, epimorphism, isomorphism, kernel, image
- Kerφ is subring
- Imφ is subring
- R≅S⇒ they have the same property (commutative, identity, inverse)
ideal of ring
subring I is ideal of R⇔
∀ r∈R,a∈I,ra∈I,ar∈I
- proper ideal: proper subset that is ideal
ideal test
subset I of ring R
I is ideal ⇔
- I≠∅
- ∀ a,b∈I,a−b∈I
- ∀ r∈R,a∈I,ra∈I,ar∈I
⇔
a+I=b+I,c+I=d+I⇒(ac)+I=(bd)+I
maximal ideal
ideal M of R is maximal ideal ⇔
- M⊊R
- I is ideal of R, M⊆I⊆R⇒I=M or I=R
- easily seen in lattice
- not necessarily exist
- 1∈R⇒ every proper ideal is contained in a maximal ideal
- R is commutative, 1∈R⇒(M is maximal ideal ⇔R/M is field)
prime ideal
ideal P of R is prime ideal ⇔
- P⊊R
- ∀ a,b∈R,ab∈P⇒a∈P or b∈P
- R is commutative, 1∈R⇒
- P is prime ideal ⇔R/P is integral domain
- R is integral domain ⇔{0} is prime ideal
- every maximal ideal is prime ideal
quotient ring
I is ideal of R
R/I=R∖I is quotient ring of R by I
1∈R⇒1R/I=1+I
R commutative ⇒R/I commutative
natural projection
π(r):=r+I:R→R/I
isomorphism theorem for ring
first isomorphism theorem for ring
φ:R→S is ring homomorphism ⇒
- Kerφ is ideal of R
- R/Kerφ≅Im φ
second isomorphism theorem for ring
A,B are subring of R, B is ideal ⇒
- A+B is subring of R
- B is ideal of A+B
- A∩B is ideal of A
- A/A∩B≅(A+B)/B
third isomorphism theorem for ring
I,J are ideal of R, I⊆J⇒
- J/I is ideal of R/I
- (R/I)/(J/I)≅R/J
fourth isomorphism theorem for ring
I is ideal of R, natural projection π:R→R/I
∃ bijection
Π:{S is subring of R∣S⊇I}→{S is subring of R/I}
Π(S)=π(S)=S/I
A,B are subring of R, I⊆A,B⇒
- A⊆B⇔A/I⊆B/I
- A⊆B⇒[B:A]=[B/I:A/I]
- (A∩B)/I=(A/I)∩(B/I)
- A is ideal of R⇔A/I is ideal of R/I
Arithmetics
$a^n-b^n=(a-b)(a^{n-1}+a^{n-2}b+\cdots+ab^{n-2}+b^{n-1})$
$a^3+b^3=(a+b)(a^2-ab+b^2)$
$\log xy=\log x+\log y$
$\log x^r=r\log x$
a2a2+b2−b=a2+b2+ba2
sigma notation
$\sum_{i=m}^{n}f(x_i)$
- linear
common sum
$\sum_{i=1}^{n}i=\frac{n(n+1)}{2}$
$\sum_{i=1}^{n}i^2=\frac{n(n+1)(2n+1)}{6}$
$\sum_{i=1}^{n}i^3=\left[\frac{n(n+1)}{2}\right]^2$
trigonometric transformation
$\sin A\cos B=\frac{1}{2}[\sin(A-B)+\sin(A+B)]$
$\sin A\sin B=\frac{1}{2}[\cos(A-B)-\cos(A+B)]$
$\cos A\cos B=\frac{1}{2}[\cos(A-B)+\cos(A+B)]$
⌊x⌋ greatest integer less than or equal to x
n→∞limB′A′=1⇒n→∞lim(1+Br)A=er
binomial coefficient
$\binom{k}{n}=\frac{k(k-1)\cdots(k-n+1)}{n!}$
triangle inequality
∣a+b∣≤∣a∣+∣b∣
reverse triangle inequality
∣∣x∣−∣y∣∣≤∣x−y∣
power when ∣z∣≪1
$(1+z)^n=1+nz+\cdots$
- Calculus
- function
- mathematical model
- limit
- derivative f′(a)
- extremum
- indeterminate form
- antiderivative
- integral
Calculus
function
piecewise defined function
different formula for different part of domain
- step function
composite function f∘g
mathematical model
empirical model
model based entirely on data
limit
f is defined on interval containing a except possibly a
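$\lim_{x\to a}f(x)=L\Leftrightarrow\forall\varepsilon>0,\ \exists\delta>0,\text{ if }0<|x-a|<\delta\text{ then }|f(x)-L|<\varepsilon$
limit exists iff limits from both sides exist and agree
$\lim_{x\to a}f(x)=L$ iff $\lim_{x\to a^-}f(x)=L$ and $\lim_{x\to a^+}f(x)=L$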
limit law
linear: sum law, difference law, constant multiplication law
product law, power law
quotient law
$\lim_{x\to a}\frac{f(x)}{g(x)}=\frac{\lim_{x\to a}f(x)}{\lim_{x\to a}g(x)},\quad\lim_{x\to a}g(x)\ne 0$
derivative rule
∀ r>0 rational
$\lim_{x\to\infty}\frac{1}{x^r}=0$
- limit calculation technique
divide numerator and denominator by highest power of x
- limit calculation technique
direct substitution property
polynomial limit can be pulled out
$\lim_{h\to 0}\frac{b^h-1}{h}=\ln b$
limit comparison
f(x)≤g(x) near a
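$\Rightarrow\lim_{x\to a}f(x)\le\lim_{x\to a}g(x)$
squeeze theorem (sandwich theorem)
f(x)≤g(x)≤h(x) near a,
$\lim_{x\to a}f(x)=\lim_{x\to a}h(x)=L\Rightarrow\lim_{x\to a}g(x)=L$
e.g.
$\lim_{h\to 0}\frac{\sin h}{h}=1$
continuity
f is continuous at a
$\Leftrightarrow\lim_{x\to a}f(x)=f(a)$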
- continuous from the left/ right
- continuous on an interval ⇔ continuous at every point in interval
- continuous function after operation in limit law are still continuous
- function continuous in domain
- polynomial
- rational
- (inverse) trigonometric
- exponential
- logarithmic
discontinuity
- removable
redefining f at a single point removes the discontinuity
- unremovable
- infinite discontinuity
- jump discontinuity
intermediate value theorem
f continuous on [a,b],f(a)≠f(b),
⇒∀ N∈(f(a),f(b)),∃ c∈(a,b),f(c)=N
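The intermediate value theorem with N=0 is what makes bisection work: if f is continuous and f(a), f(b) have opposite signs, some c in (a,b) has f(c)=0. A minimal Python sketch, where `bisect_root`, the tolerance, and the example cubic are illustrative choices:

```python
def bisect_root(f, a, b, tol=1e-10):
    """Find c in [a, b] with f(c) close to 0, given f(a) and f(b) of opposite sign."""
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        c = (a + b) / 2
        if f(a) * f(c) <= 0:   # sign change in [a, c], so a root lies there
            b = c
        else:                  # otherwise the sign change is in [c, b]
            a = c
    return (a + b) / 2


def cubic(x):
    return x**3 - x - 2        # cubic(1) = -2 < 0 < 4 = cubic(2)


root = bisect_root(cubic, 1.0, 2.0)
print(root, cubic(root))
```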
derivative f′(a)
derivative of f at a
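$f'(a)=\lim_{h\to 0}\frac{f(a+h)-f(a)}{h}=\lim_{x\to a}\frac{f(x)-f(a)}{x-a}$
derivative function f′(x)
$f'(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}=y'=\frac{dy}{dx}=\frac{df}{dx}=\frac{d}{dx}f(x)=Df(x)=D_xf(x)$
differential operator $D$, $\frac{d}{dx}$
$f'(a)=\left.\frac{dy}{dx}\right|_{x=a}=\left.\frac{dy}{dx}\right]_{x=a}$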
differentiability
f is differentiable at a
⇔f′(a) exists
f is differentiable on interval
⇔f is differentiable at every point in the interval
- f differentiable at a⇒f continuous at a
second derivative
$f''(x)=\frac{d}{dx}\left(\frac{dy}{dx}\right)=\frac{d^2y}{dx^2}$
derivative rule
linear: constant multiplication, sum, difference rule
definition of e
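$\lim_{h\to 0}\frac{e^h-1}{h}=1$
product rule
$\frac{d}{dx}(uv)=u\frac{dv}{dx}+v\frac{du}{dx}$
quotient rule
$\frac{d}{dx}\left(\frac{u}{v}\right)=\frac{v\frac{du}{dx}-u\frac{dv}{dx}}{v^2}$
chain rule
$\frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}$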
implicit differentiation
differentiate both side of equation
derivative of (inverse) trigonometric function
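$\frac{d}{dx}(\tan x)=\sec^2x$
$\frac{d}{dx}(\csc x)=-\csc x\cot x$
$\frac{d}{dx}(\sec x)=\sec x\tan x$
$\frac{d}{dx}(\cot x)=-\csc^2x$
$\frac{d}{dx}(\sin^{-1}x)=\frac{1}{\sqrt{1-x^2}}$
$\frac{d}{dx}(\cos^{-1}x)=-\frac{1}{\sqrt{1-x^2}}$
$\frac{d}{dx}(\tan^{-1}x)=\frac{1}{1+x^2}$
$\frac{d}{dx}(\cot^{-1}x)=-\frac{1}{1+x^2}$
$\frac{d}{dx}(\sec^{-1}x)=\frac{1}{x\sqrt{x^2-1}}$
$\frac{d}{dx}(\csc^{-1}x)=-\frac{1}{x\sqrt{x^2-1}}$
derivative of logarithmic function
$\frac{d}{dx}(\ln|u|)=\frac{1}{u}\frac{du}{dx}$
$\frac{d}{dx}\ln|x|=\frac{1}{x}$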
logarithmic differentiation
take the logarithm of both side of equation and differentiate
differential dy
$dy=\frac{dy}{dx}dx$
hyperbolic function
$\sinh x=\frac{e^x-e^{-x}}{2}$
$\cosh x=\frac{e^x+e^{-x}}{2}$
- hyperbolic identity similar to trigonometric identity
- inverse hyperbolic function composed of logarithm
- derivative of hyperbolic function compose of hyperbolic functions
- derivative of inverse hyperbolic function have similar form like that of trigonometric function
extremum
extreme value theorem
f continuous on closed interval
⇒f has absolute maximum and absolute minimum in the interval
Fermat’s theorem
f has extremum at a, f′(a) exist
⇒f′(a)=0
critical number a
f′(a)=0 or f′(a) DNE
closed interval method
find absolute extremum of continuous function f on closed interval by checking the value of the critical number and endpoint
mean value theorem
f continuous on [a,b], differentiable on (a,b)
$\Rightarrow\exists c\in(a,b),\ f'(c)=\frac{f(b)-f(a)}{b-a}$
- ∀ x∈(a,b),f′(x)=0⇒∀ x∈(a,b),f(x)=C
- ∀ x∈(a,b),f′(x)=g′(x)⇒∀ x∈(a,b),f(x)=g(x)+C
Rolle’s theorem
f continuous on [a,b], differentiable on (a,b), f(a)=f(b)
⇒∃ c∈(a,b),f′(c)=0
derivative test
first derivative test
- increasing/ decreasing
- extremum
second derivative test
concave upward/ downward
graph of f lie above/ below all its tangent
inflection point
indeterminate form
- $\frac{0}{0}$
- $\frac{\infty}{\infty}$
form that can transform to indeterminate form
- 0⋅∞
- ∞−∞
- $0^0$
- $\infty^0$
- $1^\infty$
L’Hospital’s rule
f,g differentiable, g′(x)≠0 near a (excluding a), $\lim_{x\to a}\frac{f(x)}{g(x)}$ is in indeterminate form
$\Rightarrow\lim_{x\to a}\frac{f(x)}{g(x)}=\lim_{x\to a}\frac{f'(x)}{g'(x)}$
antiderivative
F′=f
- F(x)+C are also antiderivative of f(x)
integral
definite integral
$\int_a^bf(x)\,dx=\lim_{n\to\infty}\sum_{i=1}^{n}f(x_i^*)\Delta x$
integrable
integral sign ∫
integrand f(x)
limit of integration
- lower limit a
- upper limit b
Riemann sum
$\sum_{i=1}^{n}f(x_i^*)\Delta x$
midpoint rule
choose $x_i^*$ to be the midpoint of the i-th subinterval
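A Python sketch of the Riemann sum with the midpoint rule, assuming n equal subintervals of width Δx=(b−a)/n. The name `midpoint_rule` and the test integrand are illustrative.

```python
def midpoint_rule(f, a, b, n):
    """Riemann sum of f on [a, b] with n equal subintervals, sampled at midpoints."""
    dx = (b - a) / n
    # The i-th subinterval is [a + i*dx, a + (i+1)*dx]; its midpoint is a + (i + 0.5)*dx.
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


approx = midpoint_rule(lambda x: x * x, 0.0, 1.0, 100)
print(approx)  # close to the exact value 1/3
```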
property of definite integral
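$\int_a^bf(x)\,dx=-\int_b^af(x)\,dx$
$\int_a^af(x)\,dx=0$
$\int_a^bf(x)\,dx+\int_b^cf(x)\,dx=\int_a^cf(x)\,dx$
$f(x)\ge 0\Rightarrow\int_a^bf(x)\,dx\ge 0$
$f(x)\ge g(x)\Rightarrow\int_a^bf(x)\,dx\ge\int_a^bg(x)\,dx$
$m\le f(x)\le M\Rightarrow m(b-a)\le\int_a^bf(x)\,dx\le M(b-a)$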
- linear
fundamental theorem of calculus
f continuous on [a,b]
$\Rightarrow g(x)=\int_a^xf(t)\,dt,\quad a\le x\le b$
is continuous on [a,b] and differentiable on (a,b)
g′(x)=f(x)
F is any antiderivative of f
$\Rightarrow\int_a^bf(x)\,dx=F(b)-F(a)$
indefinite integral
$F(x)=\int f(x)\,dx$
means
F′(x)=f(x)
net change theorem
$\int_a^bF'(x)\,dx=F(b)-F(a)$
substitution rule
$\int v(u(x))\frac{du}{dx}\,dx=\int v(u)\,du$
$\int_a^bv(u(x))\frac{du}{dx}\,dx=\int_{u(a)}^{u(b)}v(u)\,du$
⇐ chain rule
trigonometric integral and trigonometric substitution
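from $\sin^2x+\cos^2x=1$
$\sqrt{a^2-x^2}=a\cos\theta\quad(x=a\sin\theta,\ \theta\in[-\frac{\pi}{2},\frac{\pi}{2}])$
$\int\sin^{2k-1}x\cos^nx\,dx=-\int(1-\cos^2x)^{k-1}\cos^nx\,d(\cos x)$
$\int\sin^mx\cos^{2k-1}x\,dx=\int\sin^mx(1-\sin^2x)^{k-1}\,d(\sin x)$
from $\sec^2x-\tan^2x=1$
$\sqrt{x^2-a^2}=a\tan\theta\quad(x=a\sec\theta,\ \theta\in[0,\frac{\pi}{2})\cup[\pi,\frac{3\pi}{2}))$
$\int\tan^mx\sec^{2k}x\,dx=\int\tan^mx(\tan^2x+1)^{k-1}\,d(\tan x)$
$\int\tan^{2k-1}x\sec^nx\,dx=\int(\sec^2x-1)^{k-1}\sec^{n-1}x\,d(\sec x)$
$\int\tan x\,dx=\ln|\sec x|+C$
$\int\sec x\,dx=\ln|\sec x+\tan x|+C$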
rational function integral
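$\frac{a_1x^t+a_2x^{t-1}+\cdots+a_{t+1}}{(b_1x+c_1)^{m_1}\cdots(b_nx+c_n)^{m_n}(d_1x^2+h_1)^{p_1}\cdots(d_qx^2+h_q)^{p_q}}$
$=\frac{A_{11}}{b_1x+c_1}+\cdots+\frac{A_{1m_1}}{(b_1x+c_1)^{m_1}}+\cdots+\frac{A_{n1}}{b_nx+c_n}+\cdots+\frac{A_{nm_n}}{(b_nx+c_n)^{m_n}}$
$+\frac{C_{11}x+D_{11}}{d_1x^2+h_1}+\cdots+\frac{C_{1p_1}x+D_{1p_1}}{(d_1x^2+h_1)^{p_1}}+\cdots+\frac{C_{q1}x+D_{q1}}{d_qx^2+h_q}+\cdots+\frac{C_{qp_q}x+D_{qp_q}}{(d_qx^2+h_q)^{p_q}}$
$\int\frac{dx}{x^2+a^2}=\frac{1}{a}\tan^{-1}\left(\frac{x}{a}\right)+C$
$\int\frac{dx}{x^2-a^2}=\frac{1}{2a}\ln\left|\frac{x-a}{x+a}\right|+C$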
symmetric integral
$f(-x)=f(x)\Rightarrow\int_{-a}^{a}f(x)\,dx=2\int_0^af(x)\,dx$
$f(-x)=-f(x)\Rightarrow\int_{-a}^{a}f(x)\,dx=0$
volume from integral
from x=a to x=b, rotate y=f(x) and y=g(x) about x-axis, volume:
$\int_a^bA(x)\,dx=\int_a^b\pi\left[(f(x))^2-(g(x))^2\right]dx$
from x=a to x=b, rotate y=f(x) and y=g(x) about y-axis, volume:
$\int_a^bl(x)h(x)\,dx=\int_a^b2\pi x|f(x)-g(x)|\,dx$
arc length from integral
$\int_P^Qds=\int_P^Q\sqrt{(dx)^2+(dy)^2}=\int_a^b\sqrt{1+\left(\frac{df}{dx}\right)^2}\,dx$
surface area from integral
from x=a to x=b, rotate y=f(x) about x-axis, surface area:
$\int_a^bl(x)\,d(\text{arclength}(x))=\int_a^b2\pi f(x)\sqrt{1+\left(\frac{df}{dx}\right)^2}\,dx$
average of function
$f_{ave}=\frac{1}{b-a}\int_a^bf(x)\,dx$
mean value theorem for integral
f continuous on [a,b]⇒∃ c∈[a,b]
$f(c)=f_{ave}$
integration by part
∫u dv=uv−∫v du
⇐ product rule
improper integral
integral with infinite interval
- f continuous on [a,∞)
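- $\lim_{t\to\infty}\int_a^tf(x)\,dx$ exists
$\Rightarrow\int_a^\infty f(x)\,dx=\lim_{t\to\infty}\int_a^tf(x)\,dx$
- opposite for $\int_{-\infty}^af(x)\,dx$
- convergent if limit exists
- divergent if limit DNE
- both sides exist $\Rightarrow\int_{-\infty}^{\infty}f(x)\,dx$ is their sum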
discontinuous integral
- f continuous on [a,b), discontinuous at b
- $\lim_{t\to b^-}\int_a^tf(x)\,dx$ exists
$\Rightarrow\int_a^bf(x)\,dx=\lim_{t\to b^-}\int_a^tf(x)\,dx$
- opposite for when f continuous on (a,b], discontinuous at a
- convergent if limit exist
- divergent if limit DNE
- both side exist ⇒ can connect them
improper integral comparison
$f(x)\ge g(x)\ge 0$, $\int_a^\infty f(x)\,dx$ converges
$\Rightarrow\int_a^\infty g(x)\,dx$ converges
- Differential Equation
- first order equation
- below old class note
- ordinary differential equation ODE
- one-dimensional (smooth) dynamical system
- discrete dynamical system
- second-order linear equation
- higher order differential equation
- autonomous planar system of differential equation
- non-autonomous planar system
- chaos
- linear system
- matrix exponential
- planar linear system
- almost linear system
- energy method
- Lyapunov’s Method
- non-constant periodic solution
- bifurcation in one-dimensional system
- bifurcation in planar system
Differential Equation
an equation that contains an unknown function and one or more of its derivatives.
- order
highest derivative
- solution
function that satisfies the equation
- initial condition
- initial value problem
direction field
sketch lines as a grid with direction from derivative from the equation
Euler’s method
$y_{n+1}=y_n+hy_n'$
- h step size
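A Python sketch of Euler's method for y′=f(x,y), repeating the update above with a fixed step size h. The name `euler` and the test problem y′=y, y(0)=1 (exact solution e^x) are illustrative.

```python
import math


def euler(f, x0, y0, h, steps):
    """Advance y' = f(x, y) from (x0, y0) by `steps` Euler steps of size h."""
    x, y = x0, y0
    for _ in range(steps):
        y = y + h * f(x, y)   # y_{n+1} = y_n + h * y_n'
        x = x + h
    return x, y


# y' = y with y(0) = 1, integrated to x = 1; the exact value is e.
x_end, y_end = euler(lambda x, y: y, 0.0, 1.0, 0.001, 1000)
print(x_end, y_end, math.e)
```

Halving h roughly halves the error, since Euler's method is first-order accurate.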
first order equation
separable equation
below old class note
ordinary differential equation ODE
- general form F(x,y,y′,y′′,⋯,y(n))=0
order of differential equation
the order of the highest order derivative
solution
any function that satisfy the equation
general solution
collection of all solution
initial value problem
- initial condition y(x0)=a0,y′(x0)=a1,⋯at x0∈I
use the general solution and the initial condition
direction field
y′=f(x,y)
solving first-order differential equation
numerical approximation
- Euler’s method $y(x_0+h)\approx y_0+hf(x_0,y_0)$, example code:
f[x0_, y0_] := x0 y0
step[{x0_, y0_}] := {x0 + 0.01, y0 + 0.01 f[x0, y0]}
{x100, y100} = Nest[step, {1, 1}, 100]
separable equation
use separation of variable y′=g(x)p(y)
- consider p(y)=0
- integrate both sides $\int\frac{dy}{p(y)}=\int g(x)\,dx$
- explicit solution
- implicit solution relationship between variable
p(y)=y
$y'=yg(x)\Rightarrow y=Ae^{G(x)}$, where $G'(x)=g(x)$
linear equation
$a_1(x)y'+a_0(x)y=b(x)$ where $a_1(x),a_0(x),b(x)$ are continuous, $a_1(x)\ne 0$
standard form
$y'+P(x)y=Q(x)$
multiply both sides by $\mu(x)$ to construct the product rule form: let $\mu'(x)=\mu(x)P(x)\Leftarrow\mu(x)=Ae^{\int P(x)dx}$
$\Rightarrow y'\mu(x)+y\mu'(x)=(y\mu(x))'=\mu(x)Q(x)$
where $\mu(x)$ is the integrating factor
substitution
Bernoulli equation
$\frac{dy}{dx}+P(x)y=Q(x)y^n$
solution
substitution $v=y^{1-n}$ reduces the Bernoulli equation to the linear equation $\frac{dv}{dx}+(1-n)Pv=(1-n)Q$
Riccati equation
$\frac{dy}{dx}=P(x)y^2+Q(x)y+R(x)$
solution
- find one solution u(x)
- substitution $y=u+\frac{1}{v}$ reduces the Riccati equation to the linear equation $\frac{dv}{dx}+(2Pu+Q)v=-P$
existence and uniqueness theorem
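f(x,y) is continuous on $R=[x_0-a,x_0+a]\times[y_0-b,y_0+b]$ and satisfies the Lipschitz condition $\Rightarrow\exists\delta>0$ such that the initial value problem $y'=f(x,y),\ y(x_0)=y_0$ has a unique solution $y(x)$ for $x\in[x_0-\delta,x_0+\delta]$
Lipschitz condition
$\exists K>0,\ |f(x,y_1)-f(x,y_2)|\le K|y_1-y_2|\ \ \forall(x,y_1),(x,y_2)\in R$
$\Leftarrow\frac{\partial f}{\partial y}$ is continuous (therefore bounded) on the closed rectangle $R$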
metric space (X,d)
∀ g,h,k∈X
- d(g,h)≥0
- d(g,h)=d(h,g)
- d(g,h)≤d(g,k)+d(k,h)
- d denotes distance
- can be defined using $\|g-h\|=\max_{x\in[x_0-\delta,x_0+\delta]}|g(x)-h(x)|$
limit
sequence $\{g_n\}$ converges to limit y in X $\Leftrightarrow\lim_{n\to\infty}d(g_n,y)=0$
- the limit is unique
Cauchy sequence
$\forall\varepsilon>0,\ \exists N,\ \forall n,m\ge N,\ d(g_n,g_m)<\varepsilon$
complete metric space
every Cauchy sequence converge to some point in X
fixed point
point
stand for function for operator
initial value problem integral form
for $y'=f(x,y)$, let operator $T$: $T[g](x)=y_0+\int_{x_0}^{x}f(s,g(s))\,ds$
then $y(x)=T[y](x)\Rightarrow T[y]=y$
$\Rightarrow y$ is a fixed point of operator $T$
Banach fixed point theorem
X is complete metric space, T:X→X is a contraction ⇒T has a unique fixed point in X
- contraction: T:X→X is a contraction if ∃ K<1,∀ g,h∈X,d(T[g],T[h])≤Kd(g,h)
Picard iteration
sequence $\{g_n\}$ where $g_{n+1}=T[g_n]$ with $g_0(x)=y_0$
$\lim_{n\to\infty}g_n(x)=y(x)$
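A Python sketch of Picard iteration for the specific problem y′=y, y(0)=1, chosen so that every iterate is a polynomial and can be integrated exactly. Polynomials are stored as coefficient lists in increasing degree; `picard_step` is an illustrative name. The iterates are the Taylor partial sums of e^x, which is the unique solution.

```python
from fractions import Fraction


def picard_step(coeffs, y0=Fraction(1)):
    """Apply T[g](x) = y0 + integral from 0 to x of g(s) ds, for f(x, y) = y.

    coeffs[k] is the coefficient of x**k in the polynomial g.
    """
    # Integrating c * s**k from 0 to x gives c / (k + 1) * x**(k + 1).
    integral = [c / (k + 1) for k, c in enumerate(coeffs)]
    return [y0] + integral


g = [Fraction(1)]            # g_0(x) = y0 = 1
for _ in range(4):
    g = picard_step(g)       # g_{n+1} = T[g_n]
print(g)                     # coefficients 1, 1, 1/2, 1/6, 1/24
```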
one-dimensional (smooth) dynamical system
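continuously differentiable function $\phi(t,x):\mathbb{R}\times\mathbb{R}\to\mathbb{R}$
1. $\phi(0,x_0)=x_0\ \ \forall x_0\in\mathbb{R}$
2. $\phi(t+s,x_0)=\phi(t,\phi(s,x_0))=\phi(s,\phi(t,x_0))\ \ \forall t,s,x_0\in\mathbb{R}$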
- flow of dynamic system ϕ
autonomous equation
x′(t)=f(x) assume f satisfy condition of existence and uniqueness theorem
time-shift immunity
if the solution of x′=f(x),x(0)=x0 is x1(t) then the solution of x′=f(x),x(t0)=x0 is x2(t)=x1(t−t0)
equilibrium
xe∈R, f(xe)=0
- asymptotically stable x move towards xe from both side ⇐f′(xe)<0
- unstable x move away from xe from both side ⇐f′(xe)>0
phase line
draw the sign graph of f(x)
linearization (approximate linear dynamic)
let $\lambda=f'(x_e),\ y=x-x_e$
$y'=\lambda y\Rightarrow y=y_0e^{\lambda t}$
- λ>0⇒ blow up and unstable
- λ<0⇒ shrink down and asymptotically stable
- λ=0⇒ cannot linearize, need higher order
differential equation to dynamical system
ϕ(t,x0)=x(t) where x′=f(x)
dynamical system to differential equation
$f(x)=\frac{\partial\phi}{\partial t}(0,x)$ and convert to initial value problem with autonomous equation
discrete dynamical system
map p:R→R
$x_{k+1}=p(x_k)\Rightarrow x_{k+t}=p^k(x_t)$
invertible p
$x_{k-1}=p^{-1}(x_k)$
fixed point
does not change when p is applied: $p(x_0)=x_0$
stability
linearization
$q(y)=p(x_0+y)-x_0\approx\lambda y,\quad\lambda=p'(x_0)$
- dynamics $y_n=\lambda^ny_0$
- unstable ⇐∣λ∣>1
- asymptotically stable ⇐∣λ∣<1
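A Python sketch of the stability test for a fixed point of a map, using the logistic map p(x)=rx(1−x) as an assumed example (it is not from the notes). Its nonzero fixed point is 1−1/r with λ=p′(x₀)=2−r. The helpers `iterate`, `multiplier`, and `logistic` are illustrative, and λ is estimated by a central difference.

```python
def iterate(p, x0, n):
    """Return x_n for x_{k+1} = p(x_k)."""
    x = x0
    for _ in range(n):
        x = p(x)
    return x


def multiplier(p, x_fixed, eps=1e-6):
    """Estimate lambda = p'(x_fixed) with a central difference."""
    return (p(x_fixed + eps) - p(x_fixed - eps)) / (2 * eps)


def logistic(r):
    return lambda x: r * x * (1 - x)


stable, unstable = logistic(2.5), logistic(3.2)
lam_stable = multiplier(stable, 1 - 1 / 2.5)      # about -0.5, |lambda| < 1
lam_unstable = multiplier(unstable, 1 - 1 / 3.2)  # about -1.2, |lambda| > 1
print(lam_stable, iterate(stable, 0.2, 100))      # iterates approach 0.6
print(lam_unstable)
```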
Poincaré map
time-periodic non-autonomous system
x′=f(t,x) where f(t+T,x)=f(t,x)
periodic time-shift immunity
if the solution of x′=f(t,x),x(0)=x0 is x1(t) then the solution of x′=f(t,x),x(kT)=x0(k∈Z) is x2(t)=x1(t−kT)
Poincaré map for the system
∀ x0∈R, let x(t) be the solution to x′=f(t,x),x(0)=x0
define function P:R→R: $P(x_0)=x(T)\Rightarrow x(kT)=P^k(x_0)$
periodic solution
fixed point of Poincaré map P(x0)=x0
stability
- asymptotically stable ∣λ∣=∣P′(x0)∣<1
- unstable ∣λ∣=∣P′(x0)∣>1
example code
P[h_,x0_] := NDSolveValue[
x'[t] == x[t](1-x[t]) - h x[t] Sin[t] && x[0] == x0,
x[2 Pi], {t, 0, 2 Pi}]
second-order linear equation
a(x)y′′+b(x)y′+c(x)y=f(x)
second-order homogeneous linear equation with constant coefficient
ay′′+by′+cy=0
solution
a linear combination of two solutions $y_1(x),y_2(x)$ where $\frac{y_1(x)}{y_2(x)}$ is not constant (linearly independent)
$y(x)=c_1y_1(x)+c_2y_2(x)$
auxiliary equation
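under the hood: try $y=e^{rx}$
$ar^2+br+c=0$, with discriminant $\Delta=b^2-4ac$
- $\Delta>0$: two distinct real roots $r_1,r_2$, $y=c_1e^{r_1x}+c_2e^{r_2x}$
- $\Delta=0$: double real root $r_0$, two solutions $y_1=e^{r_0x},y_2=xe^{r_0x}$, $y=(c_1+c_2x)e^{r_0x}$
- $\Delta<0$: two complex roots $\alpha\pm i\beta$, $y=e^{\alpha x}(c_1\cos(\beta x)+c_2\sin(\beta x))$
- alternatively $y=Ae^{\alpha x}\cos(\beta x-\phi)$ where $A=\sqrt{c_1^2+c_2^2},\ \phi=\arg(c_1+ic_2)$
- alternatively: real-valued solution $y=ae^{(\alpha+i\beta)x}+\bar{a}e^{(\alpha-i\beta)x}\ (a\in\mathbb{C})$
- complex-valued solution $y=a_1e^{(\alpha+i\beta)x}+a_2e^{(\alpha-i\beta)x}\ (a_1,a_2\in\mathbb{C})$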
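A Python sketch that solves the auxiliary equation and reports which of the three cases applies, assuming real coefficients with a≠0. The function names and the string format of the general solution are illustrative only.

```python
import cmath


def auxiliary_roots(a, b, c):
    """Return (discriminant, r1, r2) for a*r**2 + b*r + c = 0."""
    disc = b * b - 4 * a * c
    sqrt_disc = cmath.sqrt(disc)       # complex square root handles disc < 0
    return disc, (-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a)


def describe_solution(a, b, c):
    """General solution of a*y'' + b*y' + c*y = 0 as a readable string."""
    disc, r1, r2 = auxiliary_roots(a, b, c)
    if disc > 0:     # two distinct real roots
        return f"y = c1*exp({r1.real:g}x) + c2*exp({r2.real:g}x)"
    if disc == 0:    # double real root
        return f"y = (c1 + c2*x)*exp({r1.real:g}x)"
    alpha, beta = r1.real, abs(r1.imag)   # complex roots alpha +/- i*beta
    return f"y = exp({alpha:g}x)*(c1*cos({beta:g}x) + c2*sin({beta:g}x))"


print(describe_solution(1, -3, 2))
print(describe_solution(1, 2, 1))
print(describe_solution(1, 2, 5))
```

Comparing the discriminant with zero exactly is only reliable for integer or otherwise exact coefficients; with floating-point input a tolerance would be needed.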
second-order non-homogeneous linear equation
ay′′+by′+cy=f(x)
solution
- let yh(x)=y(x)−yp(x) then yh(x) satisfy ay′′+by′+cy=0
- find one particular solution yp(x) therefore y(x)=c1y1(x)+c2y2(x)+yp(x)
linear differential operator
given g(x), L[g](x) L[g]=ag′′+bg′+cg
linear operator
apply operator L on function g(x) L[g](x)=ag(x)′′+bg(x)′+cg(x) L linear ⇐
- L[g1+g2]=L[g1]+L[g2]
- L[λg]=λL[g]
find a particular solution
polynomial f(x)
use the degree of f(x) as the degree of the particular solution
exponential f(x)
f(x)=P(x)eλx where P(x) is polynomial same as the polynomial method but multiply the same exponential
- when λ is one root of auxiliary equation, multiply the polynomial solution by x
- when λ is double root of auxiliary equation, multiply the polynomial solution by $x^2$
trigonometric f(x)
f(x)=P1(x)eλxcos(μx)+P2(x)eλxsin(μx) same as polynomial but degree needs to be the maximum between P1(x),P2(x) and need two polynomial
- when λ+iμ is one root of auxiliary equation, multiply the polynomial solution by x
combination of above case
split f(x) to f1(x),⋯ that match above case
higher order differential equation
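n-th order $a_n(x)y^{(n)}+\cdots+a_0(x)y=f(x)$ where $a_i(x),f(x)$ are continuous on $I\subseteq\mathbb{R}$
existence and uniqueness
above equation with $y(x_0)=y_0,\cdots,y^{(n-1)}(x_0)=y_{n-1}$ has unique solution $\Leftarrow\forall x\in I$, $a_i(x),f(x)$ are continuous, $a_n(x)\ne 0$
system in matrix form
$M[y_1,\cdots,y_n](x_0)\,\mathbf{c}=\mathbf{y}_0$
$\begin{pmatrix}y_1(x_0)&\cdots&y_n(x_0)\\\vdots&\ddots&\vdots\\y_1^{(n-1)}(x_0)&\cdots&y_n^{(n-1)}(x_0)\end{pmatrix}\begin{pmatrix}c_1\\\vdots\\c_n\end{pmatrix}=\begin{pmatrix}y_0\\\vdots\\y_{n-1}\end{pmatrix}$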
Wronskian
determinant W[y1,⋯,yn]=detM[y1,⋯,yn]
linear dependency and Wronskian
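solutions $y_1,\cdots,y_n$ linearly independent in I $\Leftrightarrow\exists x_0\in I,\ W[y_1,\cdots,y_n](x_0)\ne 0\Leftrightarrow\forall x\in I,\ W[y_1,\cdots,y_n](x)\ne 0$
- linearly dependent function: $y_1,\cdots,y_n$ linearly dependent in I $\Leftarrow\exists\lambda_1,\cdots,\lambda_n\in\mathbb{R}$ not all zero, $\forall x\in I,\ \lambda_1y_1(x)+\cdots+\lambda_ny_n(x)=0$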
homogeneous linear equation with constant coefficient
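$a_ny^{(n)}+\cdots+a_0y=0$
auxiliary equation
$a_nr^n+\cdots+a_0=0$, which has $n$ complex roots counting multiplicity
general solution
linear combination of $n$ solutions: each root $r$ contributes $e^{rx}$, multiplied by $x$ repeatedly ($xe^{rx},x^2e^{rx},\cdots$) if the root has multiplicity 2 or more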
autonomous planar system of differential equation
$x'=f(x,y)$, $y'=g(x,y)$, with derivatives taken with respect to $t$
initial value problem
$x(t_0)=x_0$, $y(t_0)=y_0$
- autonomous $\Rightarrow t$ can start anywhere, so let $t_0=0$
integral curve
parameterized curve of (x(t),y(t))
velocity vector
$\frac{d}{dt}(x(t),y(t))=(f(x,y),g(x,y))$
phase plane
xy-plane
- phase space higher dimension
phase portrait
phase plane with several solution
equilibrium (xe,ye)
f(xe,ye)=g(xe,ye)=0
- equilibrium solution (x(t),y(t))=(xe,ye)
solution
- convert to differential equation with one variable
- conserved quantity / integral of motion
- reduce to a non-autonomous first-order equation $\frac{dy}{dx}=\frac{y'}{x'}=\frac{g(x,y)}{f(x,y)}$
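A short worked example ties the last two approaches together. The system $x'=y$, $y'=-x$ is an assumption chosen for illustration; it is not taken from the notes.

```latex
\begin{aligned}
&x'=y,\qquad y'=-x \\
&\frac{dy}{dx}=\frac{y'}{x'}=-\frac{x}{y} \Rightarrow y\,dy=-x\,dx \\
&\tfrac12y^2+\tfrac12x^2=C
\end{aligned}
```

Here $x^2+y^2$ is a conserved quantity, so the integral curves are circles around the equilibrium $(0,0)$.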
non-autonomous planar system
$x'=f(x,y,t)$, $y'=g(x,y,t)$ where $f,g$ are periodic in $t$ with period $T$
periodic time-shift immunity
similar to periodic time-shift immunity in one dimension: if $(x_1(t),y_1(t))$ is the solution to $x'=f(x,y,t)$, $y'=g(x,y,t)$ with $(x(0),y(0))=(x_0,y_0)$, then $(x_2(t),y_2(t))=(x_1(t-kT),y_1(t-kT))$ is the solution to the same system with $(x(kT),y(kT))=(x_0,y_0)$
Poincaré map for non-autonomous periodic dynamical system
$P:\mathbb{R}^2\to\mathbb{R}^2$, $P(x_0,y_0)=(x(T),y(T))\Rightarrow P^k(x_0,y_0)=(x(kT),y(kT))$
- also known as stroboscopic map
example: mass on spring with external forcing
forced Duffing equation
$x'=y$, $y'=-by+x-x^3+F\sin(\gamma t)$, period $\frac{2\pi}{\gamma}$
Poincaré map for forced Duffing equation
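P[b_,F_,γ_][{x0_,y0_}]:=
NDSolveValue[
(* Duffing system with initial condition (x0, y0) at t = 0 *)
x'[t]==y[t]&&y'[t]==-b y[t]+x[t]-x[t]^3+F Sin[γ t]&&x[0]==x0&&y[0]==y0,
(* state after one forcing period 2π/γ *)
{x[2Pi/γ],y[2Pi/γ]},
{t,0,2Pi/γ}
]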
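For readers without Mathematica, the same stroboscopic map can be sketched in Python. This is an assumption-laden stand-in, not the notes' method: `duffing_poincare` is a made-up name, and it uses a fixed-step classical Runge-Kutta (RK4) integrator in place of the adaptive solver behind `NDSolveValue`, so accuracy depends on `steps`.

```python
import math

def duffing_poincare(b, F, gamma, state, steps=2000):
    """Advance (x, y) over one forcing period 2*pi/gamma with fixed-step RK4."""
    def rhs(t, x, y):
        # x' = y, y' = -b*y + x - x^3 + F*sin(gamma*t)
        return y, -b * y + x - x**3 + F * math.sin(gamma * t)

    period = 2 * math.pi / gamma
    h = period / steps
    x, y = state
    t = 0.0
    for _ in range(steps):
        k1x, k1y = rhs(t, x, y)
        k2x, k2y = rhs(t + h / 2, x + h / 2 * k1x, y + h / 2 * k1y)
        k3x, k3y = rhs(t + h / 2, x + h / 2 * k2x, y + h / 2 * k2y)
        k4x, k4y = rhs(t + h, x + h * k3x, y + h * k3y)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        t += h
    return x, y
```

Iterating the map, for example `state = duffing_poincare(b, F, gamma, state)` in a loop, gives the points $(x(kT),y(kT))$ of the stroboscopic picture. With $F=0$ the equilibria $(0,0)$ and $(\pm1,0)$ of the unforced system are fixed points of the map, which makes a convenient sanity check.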
chaos
sensitive dependence on initial condition
- two initial conditions close to each other deviate exponentially fast
- the exact state of the system is fundamentally unpredictable even though the system is deterministic
- the chaotic attractors reached from the two initial conditions look identical
Poincaré map for autonomous system
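Hénon-Heiles system
$x_1'=y_1$, $y_1'=-x_1-2x_1x_2$, $x_2'=y_2$, $y_2'=-x_1^2+x_2^2-x_2$
- conserved quantity $H=\frac12(y_1^2+y_2^2)+\frac12(x_1^2+x_2^2+2x_1^2x_2-\frac23x_2^3)$
- hyperplane $\Sigma\subset\mathbb{R}^4$, $\Sigma:x_1=0$; on $\Sigma$, $H=\frac12(y_1^2+y_2^2)+\frac12(x_2^2-\frac23x_2^3)$; consider an initial condition on $\Sigma$ corresponding to value $h$ for $H$: when the solution reaches $\Sigma$ again, $h$ is still conserved $\Rightarrow$ it is enough to know $(x_2,y_2)$ to know the rest $\Rightarrow$ define the Poincaré map for $(x_2,y_2)$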
poincareMap[h_][{x0_,y0_}]:=Module[{stopTime},
NDSolveValue[
x1'[t]==y1[t]
&&y1'[t]==-x1[t]-2x1[t] x2[t]
&&x2'[t]==y2[t]
&&y2'[t]==-x1[t]^2-x2[t]+x2[t]^2
&&x1[0]==0
(* y1[0] follows from H == h on the section x1 == 0 *)
&&y1[0]==Sqrt[2h-y0^2-x0^2+2/3 x0^3]
&&x2[0]==x0
&&y2[0]==y0
(* stop at the next crossing of x1 == 0 with y1 > 0 *)
&&WhenEvent[x1[t]==0&&y1[t]>0,stopTime=t;"StopIntegration"],
{x2[stopTime],y2[stopTime]},
{t,0,Infinity}
]
]
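The reduction to $(x_2,y_2)$ relies on $H$ staying constant along solutions, which can be checked numerically. The Python sketch below is an illustration only: all function names are made up here, the integrator is a fixed-step RK4 rather than `NDSolveValue`, and it checks energy conservation instead of detecting the section crossing.

```python
import math

def henon_heiles_rhs(s):
    """Right-hand side for the state s = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = s
    return (y1, -x1 - 2 * x1 * x2, y2, -x1**2 + x2**2 - x2)

def energy(s):
    """Conserved quantity H of the Henon-Heiles system."""
    x1, y1, x2, y2 = s
    return 0.5 * (y1**2 + y2**2) + 0.5 * (x1**2 + x2**2 + 2 * x1**2 * x2 - (2 / 3) * x2**3)

def initial_state_on_section(h, x0, y0):
    """State on the section x1 = 0 with H = h, (x2, y2) = (x0, y0) and y1 > 0."""
    y1 = math.sqrt(2 * h - y0**2 - x0**2 + (2 / 3) * x0**3)
    return (0.0, y1, x0, y0)

def rk4_step(f, s, dt):
    """One classical Runge-Kutta step for s' = f(s)."""
    k1 = f(s)
    k2 = f(tuple(a + dt / 2 * k for a, k in zip(s, k1)))
    k3 = f(tuple(a + dt / 2 * k for a, k in zip(s, k2)))
    k4 = f(tuple(a + dt * k for a, k in zip(s, k3)))
    return tuple(a + dt / 6 * (p + 2 * q + 2 * r + w)
                 for a, p, q, r, w in zip(s, k1, k2, k3, k4))

def integrate(s, dt, steps):
    """Apply rk4_step repeatedly and return the final state."""
    for _ in range(steps):
        s = rk4_step(henon_heiles_rhs, s, dt)
    return s
```

Starting from `initial_state_on_section(0.1, 0.1, 0.0)` and integrating with a small step, `energy` should stay close to `0.1` up to the integrator's truncation error.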
linear system
x′=A(t)x+f(t)
planar linear system
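$x_1'=a_{11}(t)x_1+a_{12}(t)x_2+f_1(t)$, $x_2'=a_{21}(t)x_1+a_{22}(t)x_2+f_2(t)$ where $a_{ij}(t),f_i(t)$, $i,j\in\{1,2\}$ are continuous; $x=\begin{bmatrix}x_1\\x_2\end{bmatrix}$, $f(t)=\begin{bmatrix}f_1(t)\\f_2(t)\end{bmatrix}$, $A(t)=\begin{bmatrix}a_{11}(t)&a_{12}(t)\\a_{21}(t)&a_{22}(t)\end{bmatrix}$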
homogeneity
- f(t)≡0⇒ homogeneous
- otherwise ⇒ non-homogeneous
convert linear differential equation to linear system
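linear differential equation $y^{(n)}+p_{n-1}(t)y^{(n-1)}+\cdots+p_0(t)y=g(t)$: define $x_1=y$, $x_2=y'$, $\cdots$, $x_n=y^{(n-1)}$ to get the linear system $x_1'=x_2$, $\cdots$, $x_{n-1}'=x_n$, $x_n'=-p_0(t)x_1-\cdots-p_{n-1}(t)x_n+g(t)$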
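The substitution $x_k=y^{(k-1)}$ produces a matrix with ones on the superdiagonal and the negated coefficients in the last row. A minimal sketch, assuming constant coefficients for simplicity (the notes allow $p_i(t)$ to depend on $t$); `companion_system` is a hypothetical helper name.

```python
def companion_system(p):
    """Matrix A of x' = A x + (0, ..., 0, g) for y^(n) + p[n-1] y^(n-1) + ... + p[0] y = g.

    p = [p0, p1, ..., p_{n-1}] are assumed to be constants here.
    """
    n = len(p)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1.0  # x_i' = x_{i+1}
    A[n - 1] = [-float(c) for c in p]  # x_n' = -p0 x_1 - ... - p_{n-1} x_n (+ g)
    return A
```

For $y''+3y'+2y=g(t)$, `companion_system([2, 3])` gives the rows `[0, 1]` and `[-2, -3]`, that is, $x_1'=x_2$ and $x_2'=-2x_1-3x_2+g(t)$.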
existence and uniqueness
If A(t), f(t) are continuous functions in an interval I⊆R and t0∈I then for any initial vector x0∈Rn there exists a unique solution of x′=A(t)x+f(t) in I that satisfies the initial condition x(t0)=x0.
linearity
linear combinations of solutions of the homogeneous system are also solutions
linear dependency
linearly dependent vectors: $\exists\ c_1,c_2,\cdots,c_m$, $\sum_{i=1}^mc_i^2\ne0$, $\forall\ t\in I$, $c_1x_1(t)+\cdots+c_mx_m(t)\equiv0$
Wronskian
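for $n$ solutions $x_1,\cdots,x_n$: $X(t)=[x_1\ \cdots\ x_n]=\begin{bmatrix}x_{1,1}&\cdots&x_{n,1}\\\vdots&\ddots&\vdots\\x_{1,n}&\cdots&x_{n,n}\end{bmatrix}$, $W[x_1,\cdots,x_n](t)=\det X(t)$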
linear dependency and Wronskian
- solutions $x_1(t),\cdots,x_n(t)$ are linearly dependent in $I$
- $\Leftrightarrow\forall\ t\in I$, $W[x_1,\cdots,x_n](t)=0$
- $\Leftrightarrow\exists\ t_0\in I$, $W[x_1,\cdots,x_n](t_0)=0$
fundamental solution
collection of n linearly independent solution x1,⋯,xn
fundamental matrix
corresponding matrix X(t)=[x1(t)⋯xn(t)]
- invertible
- general solution for the linear system $y(t)=c_1x_1(t)+\cdots+c_nx_n(t)=X(t)c$
- $X'(t)=A(t)X(t)$
- solution for the initial value problem with $x(t_0)=x_0$: $y(t)=X(t)X(t_0)^{-1}x_0$
- $\forall\ Y(t)$ that is another fundamental matrix, $\exists\ C$, $Y(t)=X(t)C$
- $\det C\ne0$
- $\forall\ C$, $\det C\ne0$, $X(t)C$ is a fundamental matrix
- let $Y(t)=X(t)X(t_0)^{-1}$, then $Y(t_0)=I$, $y(t)=Y(t)x_0$
linear system with constant coefficient
$x'=Ax$ where $A\in\mathbb{R}^{n\times n}$ is constant
find a solution
$x(t)=e^{rt}u$ where $u$ satisfies $(A-rI)u=0$
eigenvalue problem
$u$ is a solution to the eigenvalue equation: $u$ is a non-zero eigenvector corresponding to $r$ $\Leftarrow r$ is a root of the characteristic polynomial $p(r)=\det(A-rI)$
- $r$ is a complex eigenvalue $\Rightarrow\bar r$ is an eigenvalue
- $u$ is an eigenvector corresponding to $r\Rightarrow\bar u$ is an eigenvector corresponding to $\bar r$
linearly independent eigenvector and general solution
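$A\in\mathbb{R}^{n\times n}$ has linearly independent eigenvectors $u_1,\cdots,u_n$ corresponding to real eigenvalues $r_1,\cdots,r_n$ $\Rightarrow$ the general solution for $x'=Ax$ is $x(t)=c_1e^{r_1t}u_1+\cdots+c_ne^{r_nt}u_n$
real distinct eigenvalue
$r_1,\cdots,r_m$ $(1\le m\le n)$ are real distinct eigenvalues of $A\Rightarrow$ the corresponding eigenvectors $u_1,\cdots,u_m$ are linearly independent; when $m=n$, a fundamental matrix is $[e^{r_1t}u_1\ \cdots\ e^{r_nt}u_n]$
complex eigenvalue
eigenvalue $r=\alpha+i\beta$ with eigenvector $u=a+ib$, eigenvalue $\bar r$ with eigenvector $\bar u$ $\Rightarrow$ solutions $x_1(t)=e^{\alpha t}(\cos(\beta t)a-\sin(\beta t)b)$, $x_2(t)=e^{\alpha t}(\sin(\beta t)a+\cos(\beta t)b)$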
matrix exponential
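$e^A=I+A+\frac{1}{2!}A^2+\cdots=\sum_{k=0}^\infty\frac{1}{k!}A^k$
property
- $\forall\ A$, the series $e^A$ converges
- $e^O=I$
- $\det e^A=e^{\operatorname{tr}A}>0$
- trace of $A$: $\operatorname{tr}A=\sum_{i=1}^na_{ii}=\sum_{i=1}^n\lambda_i$
- $AB=BA\Rightarrow e^{A+B}=e^Ae^B=e^Be^A$
- $e^Ae^{-A}=I$
- $Au=ru\Rightarrow e^Au=e^ru$
- $e^{U^{-1}AU}=U^{-1}e^AU$
- $e^{It}=e^tI$
- $e^{A(t+s)}=e^{At}e^{As}$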
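The series definition can be evaluated directly for small matrices, which makes some of the properties above easy to spot-check. This is a sketch under stated assumptions: `mat_mul` and `mat_exp` are made-up helpers on nested lists, and truncating the series at a fixed number of terms is only adequate for matrices with small entries.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(A, terms=30):
    """Approximate e^A by the partial sum I + A + A^2/2! + ... with `terms` terms."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term A^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result
```

For example, `mat_exp([[0.0, 1.0], [-1.0, 0.0]])` approximates the rotation matrix with entries $\cos1$ and $\pm\sin1$, and for `[[1.0, 2.0], [0.0, 3.0]]` the determinant of the result approximates $e^{\operatorname{tr}A}=e^4$.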
diagonal matrix
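diagonal matrix $R=\begin{bmatrix}a&0&\cdots\\0&b&\cdots\\\vdots&\vdots&\ddots\end{bmatrix}\Rightarrow e^R=\begin{bmatrix}e^a&0&\cdots\\0&e^b&\cdots\\\vdots&\vdots&\ddots\end{bmatrix}$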
matrix exponential and linear system
the unique solution to the initial value problem $x'=Ax$ with $x(t_0)=x_0$ is $x(t)=e^{A(t-t_0)}x_0$
- derivative $\frac{d}{dt}(e^{At})=Ae^{At}=e^{At}A$
exponential matrix as fundamental matrix
linear system $x'=Ax$, $x(0)=x_0$ has fundamental matrix $X(t)$ $\Rightarrow x(t)=X(t)X(0)^{-1}x_0=e^{At}x_0\Rightarrow e^{At}=X(t)X(0)^{-1}$
generalized eigenvector
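a non-zero vector $u\in\mathbb{C}^n$ is a generalized eigenvector of $A$ associated with $r$ $\Leftarrow\exists\ r\in\mathbb{C}$, $m\in\mathbb{N}^+$, $(A-rI)^mu=0$
- $m=1\Rightarrow$ the generalized eigenvector is also a standard eigenvector
- $r$ is an eigenvalue of $A$, with eigenvector $(A-rI)^ku$ for the largest $k$ such that $(A-rI)^ku\ne0$
generalized eigenvector given characteristic polynomial
$A\in\mathbb{R}^{n\times n}$ has characteristic polynomial $p(r)=(r-r_1)^{m_1}\cdots(r-r_k)^{m_k}$ $(m_1+\cdots+m_k=n)$ $\Rightarrow\forall\ j=1,\cdots,k$, $\exists$ linearly independent generalized eigenvectors $u_{j,1},\cdots,u_{j,m_j}$, $\forall\ l=1,\cdots,m_j$, $(A-r_jI)^{m_j}u_{j,l}=0$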
compute eAt
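- find generalized eigenvectors $u_1,\cdots,u_n$
- compute the solutions $x_j(t)=e^{At}u_j$ $(j=1,\cdots,n)$; when $(A-rI)^mu_j=0$ the series terminates: $e^{At}u_j=e^{rt}\sum_{k=0}^{m-1}\frac{t^k}{k!}(A-rI)^ku_j$
- fundamental matrix $X(t)=[x_1(t)\ \cdots\ x_n(t)]$
- $e^{At}=X(t)X(0)^{-1}$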
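The four steps above can be followed by hand on a $2\times2$ Jordan block. The matrix below is an assumption chosen for illustration; it is not from the notes.

```latex
\begin{aligned}
&A=\begin{bmatrix}r_0&1\\0&r_0\end{bmatrix},\qquad p(r)=(r-r_0)^2 \\
&u_1=\begin{bmatrix}1\\0\end{bmatrix}:\ (A-r_0I)u_1=0,\qquad
 u_2=\begin{bmatrix}0\\1\end{bmatrix}:\ (A-r_0I)u_2=u_1,\ (A-r_0I)^2u_2=0 \\
&x_1(t)=e^{At}u_1=e^{r_0t}u_1,\qquad
 x_2(t)=e^{At}u_2=e^{r_0t}\bigl(u_2+t(A-r_0I)u_2\bigr)=e^{r_0t}\begin{bmatrix}t\\1\end{bmatrix} \\
&X(t)=e^{r_0t}\begin{bmatrix}1&t\\0&1\end{bmatrix},\qquad X(0)=I
 \Rightarrow e^{At}=X(t)X(0)^{-1}=e^{r_0t}\begin{bmatrix}1&t\\0&1\end{bmatrix}
\end{aligned}
```

The factor $t$ in the off-diagonal entry is the matrix counterpart of the $xe^{r_0x}$ solution for a double root of the auxiliary equation.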
planar linear system
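$x'=Ax$ where $A\in\mathbb{R}^{2\times2}$, $x=\begin{bmatrix}x_1\\x_2\end{bmatrix}$
characteristic polynomial
$p(r)=\det(A-rI)=r^2-r\operatorname{tr}A+\det A=r^2-Tr+D$, $\Delta=T^2-4D$
real distinct eigenvalue $\Delta>0$
$U=[u_1\ u_2]$, $R=\begin{bmatrix}r_1&0\\0&r_2\end{bmatrix}\Rightarrow AU=[u_1\ u_2]\begin{bmatrix}r_1&0\\0&r_2\end{bmatrix}=UR\Rightarrow A=URU^{-1}\Rightarrow e^{At}=Ue^{Rt}U^{-1}$
- real eigenvalues $\frac{T\pm\sqrt{T^2-4D}}{2}$
transformation
define $z$ by $x=Uz\Rightarrow z'=Rz\Rightarrow z_1'=r_1z_1$, $z_2'=r_2z_2\Rightarrow z_1(t)=z_1(0)e^{r_1t}$, $z_2(t)=z_2(0)e^{r_2t}$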
- case 1a. $0<r_1<r_2$ (or $D>0,T>0$): all arrows point away from the origin
- unstable node
- case 1b. $r_2<r_1<0$ (or $D>0,T<0$): all arrows point towards the origin
- stable node
- case 1c. $r_1<0<r_2$ (or $D<0$): arrows point towards the origin on the $z_1$-axis and away from the origin on the $z_2$-axis
- saddle
- case 1d. $r_1=0<r_2$ (or $D=0,T>0$): all arrows point away from the $z_1$-axis, parallel to the $z_2$-axis
- case 1e. $r_2<0=r_1$ (or $D=0,T<0$): all arrows point towards the $z_1$-axis, parallel to the $z_2$-axis
complex conjugate eigenvalue Δ<0
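conjugate eigenvalues $\alpha\pm i\beta$ with corresponding eigenvectors $a\pm ib$: $U=[a\ b]$, $R=\begin{bmatrix}\alpha&\beta\\-\beta&\alpha\end{bmatrix}\Rightarrow AU=UR\Rightarrow A=URU^{-1}\Rightarrow e^{At}=Ue^{Rt}U^{-1}$ where $e^{Rt}=e^{\alpha t}\begin{bmatrix}\cos(\beta t)&\sin(\beta t)\\-\sin(\beta t)&\cos(\beta t)\end{bmatrix}$
- complex eigenvalues $\frac{T\pm i\sqrt{4D-T^2}}{2}$
polar coordinate transformation
$x=Uz$, $r=\sqrt{z_1^2+z_2^2}$, $\theta=\arctan\left(\frac{z_2}{z_1}\right)\Rightarrow\theta'=-\beta$, $r'=\alpha r$
- case 2a. $\alpha>0$ (or $T>0$): arrows rotate around the origin while moving away
- unstable spiral
- case 2b. $\alpha<0$ (or $T<0$): arrows rotate around the origin while moving towards it
- stable spiral
- case 2c. $\alpha=0$ (or $T=0$): solutions are closed curves with period $\frac{2\pi}{\beta}$
- center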
rotation direction
- rotation direction on z1,z2-plane
- β>0 ⇒θ′<0, clockwise
- rotation direction on x1,x2-plane
- detU>0 same direction as on z1,z2-plane
real repeated eigenvalue Δ=0
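generalized eigenvectors of $A$: $U=[u_1\ u_2]$
- real eigenvalue $r_0=\frac{T}{2}$
- case c1. $A=r_0I$ (or both $u_1$ and $u_2$ are eigenvectors of $A$): $x=\begin{bmatrix}c_1e^{r_0t}\\c_2e^{r_0t}\end{bmatrix}$; for $r_0>0$ all arrows point away from the origin
- unstable node
- case c2. $A\ne r_0I$ (or only $u_1$ is an eigenvector of $A$): $x=\begin{bmatrix}(c_1+c_2t)e^{r_0t}\\c_2e^{r_0t}\end{bmatrix}$; for $r_0>0$ all arrows point away from the origin
- turn to the right with $c_2>0$
- turn to the left with $c_2<0$
- unstable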
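The cases for $\Delta>0$, $\Delta<0$ and $\Delta=0$ can be collected into one lookup on the trace $T$ and determinant $D$. The function below is a sketch, not part of the notes: `classify_planar` is a made-up name, the labels for $D=0$ and for a repeated negative eigenvalue are my own wording (the notes only name the unstable repeated case), and exact comparisons with zero are only meaningful for exactly representable inputs.

```python
def classify_planar(T, D):
    """Classify the origin of x' = A x from T = tr A and D = det A."""
    disc = T * T - 4 * D
    if D < 0:
        return "saddle"  # case 1c: real eigenvalues of opposite sign
    if D == 0:
        return "degenerate (zero eigenvalue)"  # cases 1d, 1e
    if disc > 0:
        return "unstable node" if T > 0 else "stable node"  # cases 1a, 1b
    if disc < 0:
        if T > 0:
            return "unstable spiral"  # case 2a
        if T < 0:
            return "stable spiral"  # case 2b
        return "center"  # case 2c
    # disc == 0: repeated eigenvalue r0 = T/2 (cases c1, c2)
    return "unstable (repeated eigenvalue)" if T > 0 else "stable (repeated eigenvalue)"
```

For example, $A=\begin{bmatrix}0&1\\-1&0\end{bmatrix}$ has $T=0$, $D=1$ and `classify_planar(0, 1)` returns `"center"`.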
almost linear system
- almost linear system x′=f(x) at xe
- xe is an equilibrium of x′=f(x)
- y′=g(y) is an almost linear system at the origin where y=x−xe,g(y):=f(xe+y)
- almost linear system x′=f(x) at the origin
- 0 is an equilibrium of x′=f(x)
- f is continuous around 0
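- $\frac{\|f(x)-Ax\|}{\|x\|}\to0$ as $\|x\|\to0$ $\Leftarrow\frac{\partial f_i}{\partial x_j}$ are continuous near $(0,0)$
- $A=Df(0)$
- $\det A\ne0$
- Jacobian of $f(x)=\begin{bmatrix}f_1(x_1,x_2)\\f_2(x_1,x_2)\end{bmatrix}$: $Df(x)=\begin{bmatrix}\frac{\partial f_1}{\partial x_1}(x_1,x_2)&\frac{\partial f_1}{\partial x_2}(x_1,x_2)\\\frac{\partial f_2}{\partial x_1}(x_1,x_2)&\frac{\partial f_2}{\partial x_2}(x_1,x_2)\end{bmatrix}$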
planar system
x′=f(x)
equilibrium
point xe,f(xe)=0
stability
- stable ∀ ε>0, ∃ δ>0, ∀ x(0)∈Bδ(xe),∀ t≥0, x(t)∈Bε(xe) in English, given a bigger disk, one can always find a smaller disk, so that if you start from the smaller disk, you don’t go out of the bigger disk
- asymptotically stable: stable and $\exists\ \eta>0$, $\forall\ x(t)$, $x(0)\in B_\eta(x_e)$, $\|x(t)-x_e\|\to0$ as $t\to\infty$
- unstable not stable
open disk
$B_\delta(x)=\{y\in\mathbb{R}^2:\|y-x\|_2<\delta\}$
linearization theorem (Hartman-Grobman theorem)
transform the dynamics of system x′=f(x) to the dynamics of system y′=Ay where A=Df(xe)
- system x′=f(x) is almost linear at xe
- xe is hyperbolic
- ⇒∃ coordinate transformation y=ϕ(x) near xe, ϕ(xe)=0
hyperbolic linear system
hyperbolic matrix
all eigenvalue have non-zero real part
hyperbolic equilibrium
equilibrium $x_e$ of $x'=f(x)$ such that $Df(x_e)$ is a hyperbolic matrix
energy method
mechanical system
$x'=y$, $y'=f(x)$
potential U(x)
antiderivative of −f(x)
- U′(x)=−f(x)
energy function
conserved quantity $E(x,y)=\frac12y^2+U(x)$
level set $E(x,y)=h$
fix $E(x,y)$ to $h$ $\Rightarrow y=\pm\sqrt{2(h-U(x))}$
- the graph of $y$ is only defined where $U(x)\le h$
- the two parts above and below the $x$-axis mirror each other
- intersections with the $x$-axis are where $U(x)=h$
- $U'(x_i)\ne0\Rightarrow$ the curve is vertical at the intersection $x_i$
- $\frac{d|y|}{d(h-U(x))}>0$
potential plane
x,U(x)-plane
equilibrium (xe,0)
f(xe)=0,y=0
- U′(xe)=0
linearization
$f(x),f'(x)$ are continuous near $x_e$ and $U''(x_e)\ne0$ $\Rightarrow$ the system is almost linear at the equilibrium $(x_e,0)$
- Jacobian corresponding to $F=\begin{bmatrix}y\\f(x)\end{bmatrix}$: $DF(x,y)=\begin{bmatrix}0&1\\-U''(x)&0\end{bmatrix}$
- $U''(x_e)>0$, minimum at $x_e$: the linear system is a center, so the linearization theorem does not apply; use the quadratic term of the Taylor series, $U(x)\approx U(x_e)+\frac12U''(x_e)(x-x_e)^2$, so the level curves are approximately ellipses
- $U''(x_e)<0$, maximum at $x_e$: the linear system is a saddle; by the linearization theorem, the level curves also form a saddle
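As a worked instance of the energy method, take $f(x)=x-x^3$. This choice is an assumption made for illustration; it corresponds to the Duffing equation without damping and forcing.

```latex
\begin{aligned}
&x'=y,\qquad y'=x-x^3,\qquad U(x)=-\tfrac12x^2+\tfrac14x^4,\qquad U'(x)=-x+x^3=-f(x) \\
&E(x,y)=\tfrac12y^2-\tfrac12x^2+\tfrac14x^4 \\
&U'(x_e)=0 \Rightarrow x_e\in\{-1,0,1\},\qquad U''(x)=-1+3x^2 \\
&U''(0)=-1<0 \Rightarrow (0,0) \text{ is a saddle},\qquad
 U''(\pm1)=2>0 \Rightarrow (\pm1,0) \text{ are surrounded by closed level curves} \\
&h=0:\quad y=\pm\sqrt{2(0-U(x))}=\pm x\sqrt{1-\tfrac12x^2}
\end{aligned}
```

The level set $h=0$ passes through the saddle and forms a figure-eight that separates the closed curves around $(\pm1,0)$ from the larger closed curves that enclose all three equilibria.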
Lyapunov’s Method
isolated equilibrium (xe,ye)
$\exists$ open disk $D=B_\delta(x_e,y_e)$ of radius $\delta>0$ centered at $(x_e,y_e)$ such that $D$ does not contain any other equilibrium
positive/ negative definite/ semidefinite function W(x,y) in D
D is open disk centered at (0,0) W(x,y) is continuous in D W(0,0)=0 D∗=D</span>{(0,0)} planar system x′=f(x) real-valued function V(x) V˙(x)=∂x1∂V(x)f1(x)+∂x2∂V(x)f2(x)=dtd[V(x(t))]=f(x)⋅∇V(x) planar system x′=f(x) (0,0) is an isolated equilibrium a function V(x) that satisfy the condition of either part of Lyapunov’s stability theorem planar system x′=f(x) (0,0) is an isolated equilibrium closed curve L={x(t):0≤t≤T} x′=x−y−xf(r)y′=x+y−yf(r) L enclose at least one equilibrium x′=f(a,x) a is changing parameter ∃ smooth curve a=h(x) near x=x0 f(h(x),x)=0h(x0)=a0h′(x0)=0h′′(x0)=−fa(a0,x0)fxx(a0,x0)=0 in English: curve of equilibria (bifurcation diagram) near (a0,x0) look like parabola s(x)=fx(h(x),x)s′(x)=fxa(h(x),x)h′(x)+fxx(h(x),x) check s′(x0) s′(x0)=fxx(a0,x0) x′=f(a,x,y)y′=g(a,x,y) a∈R2 is the parameter equilibrium persist ⇐det[fx(a0,x0,y0)gx(a0,x0,y0)fy(a0,x0,y0)gy(a0,x0,y0)]=0 where (x0,y0) is equilibrium when a=a0 v=AB cv a=⟨a1,a2,a3⟩ j=⟨1,0,0⟩k=⟨0,1,0⟩l=⟨0,0,1⟩ a⋅b=i∑aibi=∣a∣∣b∣cosθ cosθ=∣a∣∣b∣a⋅b a⊥b⇔a⋅b=0 cosα=∣a∣a1cosβ=∣a∣a2cosγ=∣a∣a3 relation cos2α+cos2β+cos2γ=1 conclusion ∣a∣a=⟨cosα,cosβ,cosγ⟩ compab=∣a∣a⋅bprojab=∣a∣2a⋅ba a×b=⟨a2b3−a3b2,a3b1−a1b3,a1b2−a2b1⟩=ia1b1ja2b2ka3b3 ∣a×b∣=∣a∣∣b∣sinθ a∥b⇔a×b=0 a⋅(b×c) volume of parallelepiped by a,b,c V=∣a⋅(b×c)∣ V=0⇔a,b,c coplanar a×(b×c) vector equation r=r0+tv parametric equation x=x0+aty=y0+btz=z0+ct where r=⟨x,y,z⟩r0=⟨x0,y0,z0⟩v=⟨a,b,c⟩ symmetric equation ax−x0=by−y0=cz−z0 vector equation n⋅(r−r0)=0 scalar equation a(x−x0)+b(y−y0)+c(z−z0)=0 where n=⟨a,b,c⟩ linear equation ax+by+cz+d=0 D=∣compnb∣=a2+b2+c2∣ax1+by1+cz1+d∣ consist of parallel ruling through a plane curve second degree equation in three variable obtained after translation and rotation Ax2+By2+Cz2+J=0 or Ax2+By2+Iz=0 r(t)=⟨f(t),g(t),h(t)⟩ where f(t),g(t),h(t) are component function of r take limit of its component function r continuous at a⇔ t→alimr(t)=r(a) set C of all point x=f(t)y=g(t)z=h(t) tangent vector 
dtdr=r′(t)=h→0limhr(t+h)−r(t)=⟨f′(t),g′(t),h′(t)⟩ T(t)=∣r′(t)∣r′(t) integrate each component function s(t)=∫atduds du=∫at∣r′(u)∣ du κ=dsdT=∣r′∣∣T′∣=∣r′∣3∣r′×r′′∣ r′ continuous and r′=0 N(t)=∣T′(t)∣T′(t) determined by T,N in osculating plane tangent the curve center towards N radius ρ=κ1 B(t)=T(t)×N(t) determined by N,B v=r′ v=∣v(t)∣=∣r′(t)∣=dtds a=v′=r′′=v′T+vT′=v′T+κv2N=aTT+aNN=∣r′(t)∣r′(t)⋅r′′(t)T+∣r′(t)∣∣r′(t)×r′′(t)∣N {f(x1,x2,⋯)∣(x1,x2,⋯)∈D} curve/ space with equation f(x1,x2,⋯)=k (x1,x2,⋯)→(a1,a2,⋯)limf(x1,x2,⋯)=L ⇔f(x1,x2,⋯) approach L as (x1,x2,⋯) approach (a1,a2,⋯) from any path within domain (x1,x2,⋯)→(a1,a2,⋯)limf(x1,x2,⋯)=f(a1,a2,⋯) ⇔f continuous ∂xn∂fx1=a1,⋯,xn=an,⋯=fxn(a1,a2,⋯)=g′(an) where g(xn)=f(a1,⋯,xn,an+1,⋯) (fxm)xn=fxmxn f defined on D containing (a,b), fxy,fyx continuous ⇒fxy(a,b)=fyx(a,b) z−z0=fx(x0,y0)(x−x0)+fy(x0,y0)(y−y0) f(x,y)≈fx(a,b)(x−a)+fy(a,b)(y−b)+f(a,b) Δz=f(a+Δx,b+Δy)−f(a,b) Δx→0,Δy→0limΔz=Δx→0,Δy→0lim(∂x∂z+ε1)Δx+(∂y∂z+ε2)Δy=Δx→0,Δy→0lim∂x∂zΔx+∂y∂zΔy for differentiable function dz=∂x∂zdx+∂y∂zdy F(x,y)=0 given take derivative of both side ∂x∂Fdxdx+∂y∂Fdxdy=0⇒dxdy=−∂y∂F∂x∂F ∇f(x1,⋯,xn)=⟨fx1,⋯,fxn⟩ u is unit vector Du=h→0limhf(x1+u1h,⋯,xn+unh)−f(x1,⋯,xn)=∇f(x1,⋯,xn)⋅u max(Duf)=∣∇f∣ only when u,∇f(x1,⋯,xn) have the same direction level surface F(x,y,z)=k line curve r(t) on the surface of the level surface ∇F⋅r′=0 ⇒∇F is normal vector of tangent plane Fx(x−x0)+Fy(y−y0)+Fz(z−z0)=0 point (a,b) {fx(a,b)=0fy(a,b)=0 ⇒ maximum or minimum or saddle point D(a,b)=fxx(a,b)fyy(a,b)−fxy2(a,b) constraint g(x,y,z)=k ∇f∥∇g⇒∇f(x,y,z)=λ∇g(x,y,z) for additional constraint h(x,y,z)=c two constraint’s intersection curve C ∇f,∇g,∇h⊥C⇒∇f(x,y,z)=λ∇g(x,y,z)+μ∇h(x,y,z) A(S)=∬D1+fx2+fy2 dA x=x(u,v),y=y(u,v) ∂(u,v)∂(x,y)=∂u∂x∂u∂y∂v∂x∂v∂y=∂u∂x∂v∂y−∂v∂x∂u∂y∬Rf(x,y)dx dy=∬Sf(x(u,v),y(u,v))∂(u,v)∂(x,y)du dv piecewise-smooth curve C ∫Cf ds=∫Cf(x(t),y(t))(dtdx)2+(dtdy)2dt line integral of f along C with respect to x ∫Cf ds=∫Cf(x(t),y(t))dtdxdt change orientation of C 
W(C)=∫CF⋅dr=∫CF⋅T ds ∫C∇f⋅dr=f(r(b))−f(r(a)) vector field F and ∃ f s.t. F=∇f on open connected region, F=∇f=⟨P(x,y),Q(x,y)⟩ ⇔ on simply-closed region, ∂y∂P=∂x∂Q ∀ C1,C2 with the same ends ∫C1F⋅dr=∫C2F⋅dr ⇔ closed path integral ∮CF⋅dr=0 ⇔F is conservative C positively oriented, piecewise-smooth, simple closed, ∫CPdx+Qdy=∬D(∂x∂Q−∂y∂P)dA ∇=⟨∂x∂,∂y∂,∂z∂⟩ ∇2=∇⋅∇ curl F=∇×F=i∂x∂Pj∂y∂Qk∂z∂R=⟨∂y∂R−∂z∂Q,∂z∂P−∂x∂R,∂x∂Q−∂y∂P⟩ div F=∇⋅F=∂x∂P+∂y∂Q+∂z∂R for 2-dimension ∮CF⋅dr=∬D(curl F)⋅dA∮CF⋅ndr=∬Ddiv F dA outward unit normal vector n=⟨∣r′(t)∣y′(t),−∣r′(t)∣x′(t)⟩ r(u,v)=⟨x(u,v),y(u,v),z(u,v)⟩ grid curve normal vector for tangent plane at (u0,v0), ru(u0,v0)×rv(u0,v0) smooth, ru×rv=0 surface integral ∬Sf dS=∬Df∣ru×rv∣dA area A(S)=∬S1 dS=∬D∣ru×rv∣dA for sphere, dS=a2sinϕ dϕ dθ normal vector field n=∣ru×rv∣ru×rv dS=n dS ∬SF⋅ dS=∬SF⋅ n dS=∬SF⋅∣ru×rv∣ru×rv dS=∬SF⋅(ru×rv) dA ∫CF⋅dr=∬Scurl F⋅dS=∬Scurl F⋅(ru×rv) dA ∬SF⋅n dS=∭EdivF dV x=f(t)y=g(t) dtdy=dxdy⋅dtdxdxdy=dtdxdtdy(dtdx=0) L=∫αβ ds=∫αβ(dtdx)2+(dtdy)2 dt x=rcosθy=rsinθr2=x2+y2tanθ=xy graph of polar equation F(r,θ)=0 A=∫ab21r2 dθ L=∫αβr2+(dθdr)2 dθ list of number in definite order n→∞liman=L ⇒{an} converge ⇔∀ ε>0, ∃ N, ∀ n>N ∣an−L∣<ε otherwise, {an} diverge sequence converge if corresponding function converge x→∞limf(x)=Lf(n)=an(n∈N)⇒n→∞liman=L n→∞liman=Lf continuous at L⇒x→∞limf(an)=f(L) n→∞limrn=⎩⎨⎧01DNEr∈(−1,1)r=1otherwise increasing sequence ∀ n≥1,an<an+1 {an} bounded above ⇔∃ M,∀ n≥1,an<M monotonic bounded sequence converge ∑an or n−1∑∞an sn=i=1∑n n→∞limsn=s ⇒∑an convergent an=arn−1sn=1−ra(1−rn)∑an={1−raDNE−1<r<1otherwise n=0∑∞xn=1−x1 an=n1 divergent n=1∑∞n1=∞ n=1∑∞np1{convergedivergep>1p≤1 n=1∑∞n21=6π2 ∑an converge ⇒n→∞liman=0 n→∞liman=0 or DNE⇒∑an diverge ∃ N≥1∈N+, f(n)=an continuous decreasing positive on [N,∞) ∫1∞f(x) dx converge ⇔∑an converge f decreasing on [n,∞) Rn=s−sn∫n+1∞f(x) dx≤Rn≤∫n∞f(x) dx ∑bn converge, ∀ n,an≤bn ∑bn diverge, ∀ n,bn≤an n→∞limbnan=⎩⎨⎧C=00,bn converge∞,bndiverge⇒an,bn have same convergence⇒an 
converge⇒an diverge ∀ n≥1∈N,an⋅an+1<0 alternating series an ∀ n≥1∈N,∣an∣≤∣an+1∣an→∞lim=0⎭⎬⎫⇒an converge an satisfy alternating series test ∣Rn∣=∣s−sn∣≤∣an+1∣ ∑∣an∣ converge ⇒∑an absolutely converge ∑an converge but not absolutely converge ⇒∑an conditionally converge n→∞limanan+1=L⎩⎨⎧<1⇒∑an absolutely converge>1 or ∞⇒∑an diverge n→∞limn∣an∣=L⎩⎨⎧<1⇒∑an absolutely converge>1 or ∞⇒∑an diverge power series centered at a (power series about a) n=0∑∞cn(x−a)n an converge when x is within R to a {a} R ⏎ equal to each term different or integrate power series ∑an=n=0∑∞cn(x−a)n has radius of convergence R ⇒ function f(x)=n=0∑∞cn(x−a)n differentiable on (a−R,a+R) derivative f′(x)=n=0∑∞dxd(cn(x−a)n)=n=1∑∞cnn(x−a)n−1 integral ∫f(x) dx=n=0∑∞∫cn(x−a)n dx=n=0∑∞cnx+1(x−a)n+1+C f(x)=n=0∑∞cn(x−a)n∣x−a∣<R ⇔ cn=n!f(n)(a) Taylor series of f at a (/ about a / centered at a) f(x)=n=0∑∞n!f(n)(a)(x−a)n Taylor series at 0 f(x)=n=0∑∞n!f(n)(0)xn n-th degree Taylor polynomial Tn of f at a Tn(x)=i=0∑nn!f(n)(a)(x−a)n Rn(x)=f(x)−Tn(x) for ∣x−a∣<R n→∞limRn(x)=0 ⇒f equal sum of its Taylor series ∣f(n+1)(x)∣≤M for ∣x−a∣≤d ⇒∣Rn(x)∣≤(n+1)!M∣x−a∣n+1 for ∣x−a∣≤d used to prove equivalence between function and sum of Taylor series a convergent sequence (x∈R) n→∞limn!xn=0 ex=n=0∑∞n!xn sinx=n=0∑∞(−1)n(2n+1)!x2n+1cosx=n=0∑∞(−1)n(2n)!x2n ∣x∣<1 (1+x)k=n=0∑∞(kn)xn z=(x,y) in complex plane where x,y∈R (x,y)=(x,0)+(0,y)=(x,0)+(0,1)⋅(y,0)=:x+iy where i:=(0,1) diagram in R2 given by x+iy↦xi+yj modulus, = Euclidean length in Argand diagram ∣z∣=x2+y2 triangle inequality hold zˉ=x−iy eiθ=cosθ+isinθ⇒x+iy=reiθ (cosθ+isinθ)n=einθ=cosnθ+isinnθ z0=r0eiθ0 nth root of z0 zn=z0⇒zk=nr0ei(nθ0+2πk),k∈{0,1,⋯,n−1} see context in Topology ε>0 open ε-neighborhood Uε(z0)={z∈C:∣z−z0∣<ε}∈Os nonempty connected set D⊆C is connected ⇔ ∀ z1,z2∈D,∃ polygonal line with finitely many segment between z1, z2 ∃ circle of finite radius containing the set x→∞ or y→∞ x2+y2+z2=1 intersection of the sphere with line through north poll and complex number 
z→z0limf(z)=w0 ⇔∀ ε>0, ∃ δ>0 s.t. ∣f(z)−w0∣<ε whenever ∣z−z0∣<δ f(z)=u(x,y)+iv(x,y) z→z0limf(z)=(x,y)→(x0,y0)limu(x,y)+i(x,y)→(x0,y0)limv(x,y) ⇔ limit on the RHS exist f is continuous at z0⇔ f:D→C,Uε(z0)⊆D f differentiable at z0⇔ z→z0limz−z0f(z)−f(z0) exist f′(z) exist at z0⇒ ux=vy,uy=−vxf′=ux+ivx f(r,θ)=u(r,θ)+iv(r,θ) rur=vθ,uθ=−rvrf′=e−iθ(ur+ivr) f defined around z0, ⇒f′=ux+ivx exist f:D→C holomorphic at z0⇔ f differentiable at each point in Uε(z0)⊆D holomorphic at every finite point f holomorphic, in D f is constant ⇔f′=0 ⇔fˉ holomorphic in D ⇔∣f∣ is constant complex-valued function C:[a,b]⊆R→C,t↦z(t)=x(t)+iy(t) x′(t),y′(t) continuous ⇒C differentiable, z′(t)=x′(t)+iy′(t) C has unit tangent vector in Argand diagram T(t)=∣z′(t)∣z′(t) ∫z(t)=∫x(t)+i∫y(t) t L(C)=∫abx′(t)2+y′(t)2 dt=∫ab∣z′(t)∣ dt piecewise smooth curve C joined from finite smooth parametric curve simple closed contour separate C into contour C:z(t):[a,b]→C, ∫Cf(z) dz=∫abf(z(t)) z′(t) dt independent of parametrization linearity ∫C(cf(z)+g(z)) dz=c∫Cf(z) dz+∫Cg(z) dz for contour C=C1∪C2 ∫Cf(z) dz=∫C1f(z) dz+∫C2f(z) dz traversed in opposite direction −C:t↦z(−t):[−b,−a]→C⇒∫−Cf(z) dz=−∫Cf(z) dz contour C of length L, ⇒∫Cf(z) dz≤ML proof using lemma ∫abz(t) dt≤∫ab∣z(t)∣ dt I:=∫abw(t) dt=rIeiθI lemma hold for I=0. for I=0 rI>0⇒rI=Re (e−iθI∫abw(t) dt)=∫abRe (e−iθIw(t)) dt≤∫abe−iθIw(t) dt=∫ab∣z(t)∣ dt f:D⊆C→C continuous in D ∃ antiderivative F s.t. 
∀ C in D from z1 to z2 ∫Cf(z) dz=F(z2)−F(z1) ⇔∀ C closed, ∮Cf(z) dz=0 on D⊆C, ⇒∫Cf(z) dz=0 same as Cauchy-Goursat theorem, except f′(z) continuous ∫Cf(z) dz=∫C(u+iv)(dx+idy)=∫C(u+iv)dx+(−v+iu)dy=∬D(∂x∂(−v+iu)−∂y∂(u+iv))dA=∬D(−vx+iux−uy+ivy)dA=0 on simply connected domain D⊆C, ⇒∫Cf(z) dz=0 on multiply connected domain D⊆C, ⇒∫Cf(z) dz+∫Cif(z) dz=0 for the following case, deformation of contour integral persist its value positively oriented contour C1,C2, ⇒∫C1f(z) dz=∫C2f(z) dz C positively oriented contour, ∀ z0 inside C f(z0)=2πi1∫Cz−z0f(z) dz derivative f′(z0)=2πi1∫C(z−z0)2f(z) dz at z0, f holomorphic ⇒∀ n∈N,f(n) exist and holomorphic f(n)(z0)=2πin!∫C(z−z0)(n+1)f(z) dz f holomorphic ⇒u,v have continuous partial derivative at all order ∫CRz−z0f(z)−f(z0) dz=0 on positively oriented contour C, evaluate integral I=∫Cg(z) dz find f(z),z0 s.t. z0 inside C and g(z)=(z−z0)nf(z) calculate f(n−1)(z0) apply Cauchy’s integral formula I=(n−1)!2πif(n−1)(z0) f continuous on D ∀ closed contour C ∫Cf(z) dz=0 ⇒f holomorphic throughout D bounded entire function is constant non-constant polynomial of degree n has n root non-constant holomorphic function f on open D f:Uε(z0)→C analytic ⇔ ∀ z∈Uε(z0),f(z)=n=0∑∞an(z−z0)n annual domain D, R0<∣z−z0∣<R1, f(z)=n=−∞∑∞cn(z−z0)ncn=2πi1∫C(z−z0)n+1f(z) dz res f(z0)=c−1=2πi1∫Cf(z) dz⇒∫Cf(z) dz=2πi res f(z0) positively oriented contour C, ∫Cf(z) dz=2πik=1∑nres f(zk) all singularity are inside negatively oriented contour C ∫Cf(z) dz=2πi res f(∞) Cauchy residue theorem can be rewritten as k=1∑nres f(zk)+res f(∞)=0 by replacing z with z1 res(z21f(z1))(0)=−res f(∞) z0 is singularity ⇔ f holomorphic in U^ε(z0) and not at z0 f analytic in deleted neighborhood U^(z0) z0 is isolated singularity ∃ m∈N+ s.t. ∀ n<−m, cn=0 z0 is essential isolated singularity Casorati-Weierstrass theorem ∀ ε,R>0,w∈C,∃ z∈∣z−z0∣<R s.t ∣f(z)−w∣<ε ∃ m∈N+ s.t. 
∀ n<−m, cn=0 pole of order m at z0 simple pole ⇐m=1 removable singularity ⇐m=0 res f(z0)={(m−1)!1dzm−1dm−1[(z−z0)mf(z)]}(z0) only usable for m=1,2 m=1 res f(z0)=[(z−z0)f(z)](z0) m=2 res f(z0)={dzd[(z−z0)2f(z)]}(z0) f analytic at z0, ∀ n<m,f(n)(z0)=0 p,q analytic, ⇒qp has pole of order m at z0 m=1⇒qp has simple pole at z0, res [qp](z0)=q′(z0)p(z0) CPV ∫−∞∞f(x) dx=R→∞lim∫−RRf(x) dx if RHS exist real, continuous, even, irreducible rational function f=qp, ⇒f has pole zk above the real axis, let CR be upper semicircle with missing side [−R,R], then ∫−RRf(x) dx+∫CRf(x) dx=2πik∑res f(zk) if limR→∞∫CRf(x) dx=0, then ∫−∞∞f(x) dx=2πik∑res f(zk) k>0 ∫−∞∞f(x)sin(kx) dxor∫−∞∞f(x)cos(kx) dx integrate instead ∫−RRf(x)eikx dx f(z) analytic above the imaginary axis outside ∣z∣<R0, ∣f(z)∣(CR)≤MR,R→∞limMR=0⇒∀ k>0, R→∞lim∫CRf(z)eikz dz=0 proof by parametric integral and Jordan’s inequality ∫0πe−kRsinθdθ<kRπ improper integral with singularity on real axis circumvent each singularity xi on real axis by upper semicircle Ci:∣z−xi∣=ri with ri→0 f analytic on 0<∣z−xi∣<r, ⇒ri→0lim∫Cif(z) dz=−iπ res f(xi) proof by integrating Laurent series of parametric line integral term by term lemma hold at simple pole can prove ∫−∞∞xsinxdx=π circular segment of Cr:z=reiθ,0≤θ≤θ1, z→0limzf(z)=0⇒r→0lim∫Crf(z) dz=0 proof by definition of limit and upper bound theorem ∫θ0θ1f(sinθ,cosθ) dθ define C:z=eiθ,θ0≤θ≤θ1 sinθ=2iz−z−1,cosθ=2z+z−1,dθ=izdz⇒∫θ0θ1f(sinθ,cosθ) dθ=∫Cf(2iz−z−1,2z+z−1)izdz Laplace transform of f:R0+→R f~(s)=∫0∞f(t)e−stdt:C→C inverse Laplace transform γ∈R to the right of all sinularity of f~(s), f(t)=2πi1R→∞lim∫ΓRf~(s)estds f is analytic except possibly for poles simple closed contour C, 2πi1∫Cf(z)f′(z)dz=Z−P closed contour C:z=z(t),a≤t≤b,z(a)=z(b)=z0, difference in argument ΔCargf=ϕ(b)−ϕ(a)=2πk winding number of Γ with respect to 0 ν(Γ,0)=2π1ΔCargf=k Z−P=2πi1∫Cf(z)f′(z)dz=2π1ΔCargf=ν(Γ,0) f,g analytic on and within C, ⇒f,f+g have the same number of zero counted with order inside C D:∣z∣≤1,int 
D:∣z∣<1, ⇒∃! z0∈D s.t. g(z0)=z0 (fixed point) closed curve Γ,Γ~, ⇔ν(Γ,w)=ν(Γ~,w) vector field V:z↦f(z), IV(Ci)=ν(Γ,0)=Z−P=i∑IV(Ci) zero or pole of f positively oriented contour Ci enclose singularity zi of V vector field index is integer number of rotation of V traversing Ci IV(Ci)=ν(Γi,0) T(z)=cz+daz+b:Cˉ→Cˉ,ad−bc=0 extended complex plane Cˉ:=C∪{∞} only singularity is simple pole −cd inverse T−1(w)=cw−ab−dw derivative non-zero T′(z)=(cz+d)2ad−bc=0 is composition of linear transformation and inversion f1(z)=cz+d,f2(z)=z1,f3(z)=ca−cad−bcz⇒T=f1∘f2∘f3 they are a group map circle to circle in Cˉ need 3 point to specify Azzˉ+(B−iC)z+(B+iC)zˉ+D=0A,B,C,D∈R map 3 distinct point z1,z2,z3 to w=T(z) by (w−w3)(w2−w1)(w−w1)(w2−w3)=(z−z3)(z2−z1)(z−z1)(z2−z3) f:D→C analytic f at z0, ⇒∃ U(z0),∃! f−1 analytic derivative (f−1)′(w)=f′(z)1 f conformal, ⇒f(C1),f(C2) intersect at f(z0) with acute angle ψ proof by taking f′(z0) through C1 and C2 and their equality proof by constructing f s.t. ϕ=Re f h:D⊆R2→R with continuous second partial derivative satisfy Laplace’s equation hxx+hyy=0 hormonic ϕ:R→R, simply connected R bounded by simple closed C, ⇒ϕ attain extremum on boundary proof by constructing f s.t. ϕ=Re f and using maximum modulus principle simply connected R bounded by simple closed C ⇒ϕ is unique proof by assuming both ϕ1,ϕ2 are solutions and arguing their difference is 0 using maximum principle logz=lnr+iθLog z=lnr+iΘ where Θ=Arg z introduce α s.t. 
r>0,α<θ<α+2θ dzdlogz=z1 ez=exeiy=excosy+iexsiny dzdcz=czLog c sinz=2ieiz−e−iz,cosz=2eiz+e−iz entire sin(iz)=isinhz,cos(iz)=coshz ⇒sinz=sinxcoshy+icoshxsiny⇒∣sinz∣2=sin2x+sinh2y⇒sinz=0⇔z=πk,k∈Z sinhz=2ez−e−z,coshz=2ez+e−z sinh(iz)=isinz ⇒∣sinhz∣2=∣isin(y−ix)∣=sinh2x+sin2y⇒sinhz=0⇔z=iπk,k∈Z sin−1z=−ilog(iz+1−z2)cos−1z=−ilog(z+i1−z2) dzdsin−1z=1−z21dzdcos−1z=−1−z21 ∫CRzn dz={02πin=−1n=−1 zn=xn+iyn,z=x+iy⇒ n→∞limzn=z⇔n→∞limxn=x and n→∞limyn=y ∑zn converge to S ⇔ partial sum SN converge to S zn=xn+iyn,S=X+iY⇒ ∑zn=S⇔∑xn=X and ∑yn=Y ρN=S−SN see also power series can integrate term by term within circle of convergence ∫Cg(z)S(z) dz=n=0∑∞an∫Cg(z)(z−z0)ndz S(z) holomorphic let C be any closed contour and set g(z)=1 ∫CS(z) dz=n=0∑∞an∫C(z−z0)ndz=n=0∑∞0=0 by Morera’s theorem, S(z) is holomorphic can differentiate S(z) term by term f holomorphic in ∣z−z0∣<R0 ⇒f analytic, f(z)=n=0∑∞an(z−z0)n,an=n!f(n)(z0) when z0=0 f(z)=2πi1∫Cs−z1f(s) ds=2πi1∫C(n=0∑N−1sn+11zn+(s−z)sNzN)f(s) ds=n=0∑N−1zn2πi1∫Csn+1f(s) ds+ρN2πizN∫C(s−z)sNf(s) ds=n=0∑N−1znn!f(n)(0)+ρN by the upper bound theorem, for r:=∣z∣, ∣ρN∣=(Rr)NR−r∥f∥∞R→0 as R→∞ in ∣z−z0∣<R, power series ∑n=0∞an(z−z0)n converge to f(z) ⇒ it is the Taylor series of f about z0 g(x):=2πi(z−z0)m+11==2πi1∫C(z−z0)m+1f(z) dzm!f(m)(z0)∫Cg(z)f(z) dz===n=0∑∞an2πi1∫C(z−z0)n−m−1 dzamn=0∑∞an∫Cg(z)(z−z0)n dz acbd=ad−bc followed from Riemann sum f∈c[a,b], partition evenly into N subinterval h:=xi−xi−1=Nb−a RN:=i=1∑Nf(xi∗)(xi−xi−1)=hi=1∑Nf(xi∗) ∣f′(x)∣≤M⇒ ∫abf(x) dx−RN≤Mh(b−a) Ti(x):=f(xi−1)+f′(xi−1)(xi−xi−1)TN:=i=1∑N∫xi−1xiTi(x) dx=hi=1∑Nf(xi−1)+2h2i=1∑Nf′(xi−1) ∣f′′(x)∣≤M⇒ ∫abf(x) dx−TN≤6Mh2(b−a) RN:=i=1∑Nf(xˉi)(xi−xi−1),xˉi=2xi−1+xi ∣f′′(x)∣≤M⇒ ∫abf(x) dx−RN≤24Mh2(b−a) P implies Q is equivalent to ∼Q implies ∼P assume ∼P is true, get a contradiction to prove (1)⇔(2)⇔(3), need only prove (1)⇒(2)⇒(3)⇒(1) S⊆X Sc:={x∈X∣x∈/S} ∣S∣=∣T∣⟺∃ f:S→T one-to-one correspondence empty or has cardinality of {1,2,⋯,n} for some n has cardinality of N a>0,b>0⇒∃ n∈Z,na>b 
function f from S to T is subset F⊆S×T t is the value of f at s: sft or t=f(s) for one-to-one function f, inverse function f−1 of f⟺ f(f−1(t))=t∀ t∈Dom(f)f−1(f(s))=s∀ s∈S {an}n=1∞ or {an} an:N→R see also Sequence and Series n→∞liman=aoran→a or {an} converge to limit a⇔ ∀ ε>0,∃ integer N(ε), so that ∀ n≥N, ∣an−a∣≤ε ∀ M,∃ N, so that ∀ n≥N, an≥M an is bounded iff ∃ M so that ∀ n ∣an∣≤M ∀ ε>0,∃ N so that, if n≥N,m≥N, then ∣an−am∣≤ε k↦n(k):N→N ∀ k∈N,n(k+1)>n(k) subsequence of {an}n=1∞, {an(k)}k=1∞ or {ank} limit point d of {an}⇔ ∀ ε>0,N∈Z,∃ n≥N, s.t. ∣an−d∣≤ε {an} bounded ⇒∃ {ank},d s.t. {ank}→d f is continuous at c∈Dom(f)⇔ ∀ sequence {xn},xn∈Dom(f),xn→c,n→∞limf(xn)=f(c) f is continuous at c∈Dom(f)⇔ ∀ ε>0,∃ δ>0, s.t. ∀ x∈Dom(f),∣x−c∣≤δ⇒∣f(x)−f(c)∣≤ε f is continuous on S⇒ f is bounded: ∃ B∈R, s.t. ∀ x∈S,∣f(x)∣≤B f continuous on [a,b]⇒ ∃ c,d∈[a,b] s.t.f(c)=[a,b]supf,f(d)=[a,b]inff and Ran(f)=[[a,b]inff,[a,b]supf] and f is uniformly continuous on [a,b] supremum Ssupf:=sup{f(x)∣x∈S} infimum Sinff:=inf{f(x)∣x∈S} f continuous on [a,b], f(a)=f(b), y∈R is between f(a),f(b)⇒ ∃ c∈(a,b) s.t. f(c)=y f is uniformly continuous ⇔ ∀ ε>0,∃ δ>0 s.t. ∀ x,c∈Dom(f)∣x−c∣≤δ⇒∣f(x)−f(c)∣≤ε f is Lipschitz continuous on S⇔ ∃ M s.t. ∀ x,c∈S,x=c,x−cf(x)−f(c)≤M partition P of [a,b] is any finite collection of point x0<x1<⋯<xN where x0=a,xN=b for subinterval [xi−1,xi] of partition P Mi:=xsup{f(x)∣xi−1≤x≤xi}mi:=xinf{f(x)∣xi−1≤x≤xi} lower sum LP(f):=i=1∑Nmi(xi−xi−1) upper sum UP(f):=i=1∑NMi(xi−xi−1) LP(f)≤UP(f) for any partition Q LP(f)≤UQ(f) for partition Q that contain all point of P and additional point LP(f)≤LQ(f),UQ(f)≤UP(f) f on [a,b] is Riemann integrable ⇔ Psup{LP(f)}=Pinf{UP(f)} ∀ ε>0,∃ partition P s.t. 
UP(f)−LP(f)≤ε ⇒f is Riemann integrable f continuous on [a,b]⇒f Riemann integrable f Riemann integrable on [a,b] Riemann integral ∫abf(x) dx:=inf{UP(f)} ∀ x∈[a,b], f(x)≤g(x)⇒ ∫abf(x) dx≤∫abg(x) dx f∈C[a,b]⇒ ∫abf(x) dx≤∫ab∣f(x)∣ dx∫abf(x) dx≤(b−a)[a,b]sup∣f(x)∣ i=1∑Nf(xi∗)(xi−xi−1) where xi∗∈[xi−1,xi] f continuous on [a,b], {Pk} is sequence of partition s.t. maximum length of subinterval →0 as k→∞, Sk is any Riemann sum corresponding to Pk⇒ Sk→∫abf(x) dxask→∞ f differentiable on [a,b] and f′ continuous on [a,b], or f∈C(1)[a,b] f∈C[a,b] differentiable on (a,b), f(a)=0=f(b)⇒ ∃ c∈(a,b), s.t. f′(c)=0 f∈C[a,b] differentiable on (a,b), ⇒ ∃ c∈(a,b), s.t. f′(c)=b−af(b)−f(a) f∈C(n)[a,b], f(n+1) exist on (a,b), x0∈[a,b]⇒ ∀ x∈[a,b],x=x0,∃ ξ between x,x0 s.t.f(x)=T(n)(x,x0)+(n+1)!f(n+1)(ξ)(x−x0)n+1 where T(n) is define in Sequence and Series proof fix x, let α s.t. f(x)=T(n)(x,x0)+(n+1)!f(n+1)(ξ)(x−x0)n+1+α(x−x0)n+1g(t)=f(t)−T(n)(t,x0)−α(t−x0)n+1 apply Rolle’s theorem n time to show ∃ ξ between x,x0 s.t. g(n+1)(xn+1)=0 {fn}, fn defined on E fn converge pointwise to limiting function f⇔ ∀ x∈E,n→∞limfn(x)=f(x) fn converge uniformly to limiting function f⇔ ∀ ε>0,∃ N s.t. ∀ n≥N∀ x∈E,∣fn(x)−f(x)∣≤ε ∀ n,fn∈C(E)⇒f∈C(E) ∀ n,fn∈C[a,b]⇒ n→∞lim∫abfn(x) dx=∫abf(x) dx sequence of function Fn, Fn∈C(1)[a,b], Fn′=fn, ∃ x0,{Fn(x0)} converge ⇒F∈C(1)[a,b], F′=f Q:=[a,b]×[c,d], f∈C(Q). ∀ x0∈[a,b],f(x0,⋅)∈C[c,d], fy∈C(Q). 
F on [c,d], F(y):=∫abf(x,y) dx ⇒F∈C(1)[c,d], F′(y)=∫abfy(x,y) dx f bounded on E supremum norm or sup norm of f ∥f∥∞:=Esup∣f(x)∣ {fn} on E converge in the sup norm to f⇔ n→∞lim∥fn−f∥∞=0 {fn} on E is Cauchy sequence in the sup norm ⇔ ∀ ε>0,∃ N s.t.∀ m,n≥N,∥fm−fn∥∞≤ε f∈C[a,b] Q:=[a,b]×[a,b], K∈C(Q), ∀ x,y∈Q,∣K(x,y)∣≤M ψ(x)=f(x)+λ∫abK(x,y)ψ(y) dy α:=∣λ∣M(b−a)<1⇒ψ has unique continuous solution proof of existence define any continuous ψ0 and ψn(x)=f(x)+λ∫abK(x,y)ψn−1(y) dy show that ψn∈C[a,b] by induction show that ∀ x∈[a,b] ∣ψn+1(x)−ψn(x)∣≤α∥ψn−ψn−1∥∞ show by iteration that ψn is Cauchy in sup norm proof of uniqueness by contradiction: subtract equation of two solution function function with domain containing function f is twice continuously differentiable functional of three variables J:=∫abf(x,y(x),y′(x)) dx y(x) is extremal for J⇒y(x) satisfy Euler equation fy−dxdfy′=0 (M,ρ) ρ:M×M→[0,∞] ∀ x,y,z∈M, {xn}, xn∈(M,ρ) {xn} converge to x∈M⇔ n→∞limρ(xn,x)=0 also denoted as n→∞limxn=x metric ρ,σ on M are equivalent ⇔ ∀ ε>0,x∈M,∃ δ>0 s.t. ∀ y∈Mρ(x,y)<δ⇒σ(x,y)<εσ(x,y)<δ⇒ρ(x,y)<ε T:M→M is contraction ⇔ ∃ α∈[0,1) s.t.∀ x,y∈M,ρ(T(x),T(y))≤αρ(x,y) T(xˉ)=xˉ stable ∃ δ>0 s.t. x0∈[xˉ−δ,xˉ+δ]⇒xn→xˉ (M,ρ) complete ⇒ ∑aj converge ⇔∀ ε>0,∃ N s.t. 
∀ n,m≥N,j=n∑maj≤ε ∑aj converge ⇒aj→0 linearity {an}, an∈R sˉN=sup{an∣n≥N}sN=inf{an∣n≥N} {an}→a⇔ −∞<limsupan≤liminfan<∞⇔limsupan=a=liminfan an>0⇒ liminfanan+1≤liminfnan≤limsupnan≤limsupanan+1 limsupan:=sˉ:=N→∞limsˉN liminfan:=s:=N→∞limsN partial sum Sn of series ∑j=1∞aj Sn:=j=1∑naj ∑∣aj∣ converge ⇒∑aj absolutely converge ∃ J,∀ j≥J,0≤aj≤bj≤cj⇒ α:=limsup∣aj∣j1 aj=0 {fj(x)},x∈E f(x)=j=1∑∞fj(x) ⇒∑j=1∞fj(x) converge uniformly ∑j=1∞fj(x) converge uniformly to f(x) on [a,b]⇒ ∀ x∈[a,b], ∫axf(t) dt=j=1∑∞∫axfj(t) dt ⇒ f(x):=j=0∑∞aj(x−x0)jρ:=limsup∣aj∣j1 radius of convergence of f(x) R=⎩⎨⎧∞0ρ1ρ=0ρ=∞otherwise DNE = doesn’t exist near a = ∃ b,c,b<a<c, on (b,c) NOS = need only show thm = theorem WLOG = without loss of generality WTS = want to show WHP = with high probability set M set O⊆P(M) with all its element open set M and choice of set of open subset O neighborhood of p: U∈O,p∈U S⊆M closure of S, Sˉ=S∪∂S point p ∃ U∋p s.t. U∩S=U point p ∃ U∋p s.t. U∩S=∅ neither interior nor exterior of S limit point p of S⇔ ∀ U^,U^∩S=∅ ∀ U∋p,∃ m∈N s.t. n>m⇒s(n)∈U topological space (M,OM),(N,ON) map f:M→N continuous ⇔ ∀ V∈OM,preimf(V)∈OM continuous g:D→D ⇒∃ at least one fixed point assign each fixed point index νi ⇒ Lefschetz number L=∑iνi is topological invariant install binary update all binaries installed search package install package update list installed packages show dependency tree of package export package and cask name to reproduce environment display information about package clean up junk no stop-the-world auto update install a_gem install all gem in project remove outdated gems find package: search online install prebuilt package install package update nix and all nix package check and fix nix store help node find global module (change the path if necessary) get the fastest mirror (France for example) set mirror resolve untrusted packages when updating always use upgrade pip3 update all check version remove cache The horrible LATEX package manager. 
find which package to download for a missing file install package force version of dependency incompatible w/ some other dependency (https://github.com/astral-sh/rye/issues/505) and the char array, null terminated in this case, same as declare before use empty declaration is fine near global variable (file only) file scope function number representing memory address struct with overlapped field explicitly typecasting has the same effect usually what you do compile and detect memory leak print Charlist as list of integer control sequence introducer need a dot to call git clone without downloading check what are the branches force sync from origin pull main without checking out main fetch only 1 commit sane config commit with message stage all changes and commit revert to the last commit or change the last commit create orphan branch delete all history of a certain file (deprecated) delete all history of a certain file using git-filter-repo ignore all symlink git commit sizes verify no change since commit remove uncommitted files remove ignored files go back and clean up history squash merge add repo as submodule pull every repo under the current folder fetch and status every repo under the current folder push all branch to all remote push main to all remote control sequence introducer 1B two’s complement, e.g. 
2B 4B 4B 8B 8B operation: abs, max, sin, asin, exp, log, pow, round, random, sqrt variable: PI, E 2B built-in reference type corresponding to primitive type an array of array at least do once access from same package access everywhere access from same package or subclass access from same class make the variable immutable make variable or function static, as opposed to instance support HTML throw the error if scanner: same method name as class include nothing have everything super class have a class that can be fed with different data type imports the method so that its prefix can be omitted don’t use can contain multiple types of data is just immutable array formatted string minimum rather dictionary is equivalent to defined inside of the object block use constructor enable type create Julia support real Unicode operator Julia support arbitrary comparison chaining Use double or triple quote for string literal index take raw bytes index start at 1 can use traditional syntax assignment form call with named function use with anonymous function use with do-block function can have different method for different type or is equivalent to is equivalent to example above apply declare nullable variable return assume not-null using non-null assertion operator return refer to existing function create anonymous function directly or refer to the only argument as call function using trailing lambda syntax can only call in a brainless scope scope of current function android UI component can use can be started in background using use context to choose prettry print only loop through map in order array is just map with key 1, 2, 3, … insert element into array in order sort array use a variable amount of argument if statement while statement for statement emulate ternary operator (neither option can be falsy) reload module eagerly (Lua does not read the file again by default at the second install package A/B delimiter multiline string or raw string formatted string literals 
(f-string) enclose variable name in format language format no data dictionary without value some method map to arithmetic operator but those only work on other set instead of any iterable function name are variable change something external to the function itself placeholder for variable automatically return exit loop Local ≤ Enclosing ≤ Global ≤ Built-in empty structure to contain data function in class object built from class mutable variable with initial value every object of this class has, defined right under the class structure variation (can be combined) a folder to group modules for the package to be recognized a package under a package automate install, update, remove of third-party package virtual environment is used to make sure things work after some update start from root start from cwd all level of directory from low to high remove subtree both can only interact with byte directly exactly like create file if not exist and parent exist kill the current tqdm progress bar: exact result by default, to get approximation imaginary unit vector are represented as list a list of list search help 1D array 2D array start with underscore to distinguish from function specify the language used in the first line of a program string literal can have new line call function simple string string considering match a string with regex and return index of match match string with regex and name each part same as in this map route and set up URL and path helper method in generate model in embedded Ruby in HTML run ruby code run ruby code and render the return value write comment that does not render in result product a link with some_text to redirect to show a model object render image link to the same page with different params partial has name starting with e.g. 
relative path without add argument render a collection or even longer format in controller for generate by default, instance variable in controller can be accessed in view need to query model object by only initialize new model object assign value to new model object and attempt to save it SQLite3 by default in open a rails irb console a function that is implemented on a type immutable by default new string enumeration with variant handle error and convert to convert error to value: using indicate successful run, contain result indicate failure, contain reason crash on represent instruction for action end with represent a value end without pass it a reference because it take ownership of argument inline comment both same name as package use the parent path use the whole path brings everything into scope empty method default implementation use default implementation—do nothing override implementation shorthand trait bound syntax anonymous function stored as variable that can capture its environment implement borrow by default implement adopt ownership using letter, choose switch user to root and do print working directory list file in change working directory to go up one directory go back to the last directory copy one file to a given directory [and name it as instructed (the new name cannot be a local folder’s name, or it will just copy it there)] copy recursively one folder to a given directory remove one file remove recursively one folder remove one empty folder make directory run the file display [human-readable (i.e., in MB, GB, etc.)] file system disk space usage disk usage for this directory free and used memory in the system [in megabytes] table of processes print all system information: name, kernel, etc. 
print version information for the Linux release add a new user assign password to the user manual for the command introduction to Linux command-line search in manual what is show command you input with numbers before them Ctrl + R search for command you input Ctrl + A/ Home move cursor to start of line Ctrl + E/ End move cursor to end of line Ctrl + B move cursor to beginning of word Ctrl + K delete to the end of line Ctrl + U delete to start of line Ctrl + W delete word Alt + B go back one word Alt + F go forward one word Alt + C capitalize the letter and move cursor to end of word control sequence introducer (CSI): clear screen and return to home: clear line and return to left: Silly ideas about how a language should have all its syntax trailing. Symbols can be names. The program is interpreted into a program that gets compiled. want: meta question: (code and discussion are taken private; please ask Steven for repo access) see also clone w/ use Rye to manage Python dependencies ( register Pre-commit Hook to run linters and formatters automatically before Git commit: run (citation in Quarto format w/ Text-based LLM detection cannot generalize to all webpages. Types of webpages from the viewpoint of such detection (enumeration based on DeepSeek): Single or cohesive narrative/text blocks. Blog Posts, News Articles, Research Publications, Tutorials/Guides, E-books/Whitepapers, Podcast Transcripts, Recipes, Interactive Stories, Archived Content. 👌 These can be treated as a single block of text and directly classified. Multi-Section Text Pages. Homepage, FAQ, Glossary, Forum (boards/posts), Directory, Wiki/Knowledge Base, Portfolio, Testimonials, Case Studies, Team Directory, Event Listings, Press Releases, User Profiles, Social Feeds, Q&A Platforms. 😰 These, when treated as a single block of text, introduce discontinuities in the text that degrade text detection performance. Segmenting them and classifying each segment separately may work. 
Boilerplate/Legal/Standardized Content. Privacy Policy, Terms of Service, Disclaimer, Product Pages (descriptions), Download Pages, Account Settings, Pricing Pages, Services Pages, Career Listings, API Documentation, Client Dashboards, Affiliate Pages. 🤷 These have standardized forms, so humans write them much as LLMs do, and detection makes little sense. Media/Non-Text Pages. Image Galleries, Video Pages, Audio Streams, 3D/Virtual Tours. ❌ Text detection is not applicable to these. Interactive/Functional Interfaces. Dashboards, Quizzes/Surveys, Calendars, Booking/Checkout Pages, Login/Registration Forms, Search Results, Advanced Filters, Live Chat, Calculators/Converters, Games, AR/VR Interfaces. Stock Tickers, Weather Forecasts, Order Tracking, Live Streams/Webinars, Auction/Bidding Pages, Real Estate Listings, Job Boards. Comparison Tools, Financial Calculators, Medical/Appointment Systems, Code Playgrounds, Maps. ❌ These are not really text-centric content, so text detection is not applicable. identify thousands of websites each week using AI-generated content See https://sichanghe.github.io/notes/research/web_user_facing.html#search-engine-optimization-seo. See https://sichanghe.github.io/notes/research/gen_ai.html. plan: script: result dumped to low-score page: lower-score page (above FPR-threshold but around F1-threshold): script: script: extracting & counting ads may help identify content farms to load all ads, scroll to the bottom and top, 20 times combined, each time wait for re-crawled all 0-ad pages after adjusting browser interaction w/ see Analyzing the (In)Accessibility of Online Advertisements most should be Google Ads Binoculars, and presumably other generated text detectors, perform well only when detecting articles such as blog posts. Non-articles, including lists of links on a homepage, login pages, dashboards of sports scores, and spec charts of products, may cause high false positive rates. 
Current method to filter out non-articles: Ideas for filtering: Unreliable ideas: Instead of only filtering out what we cannot handle (non-articles), we can also try to generalize our method to all webpages. If we could split a webpage into sections, we could apply Binoculars to each section. For example, we could segment a homepage full of links into the individual links it contains, and subsequently filter them out based on length; we could segment a forum page into individual comments. Importantly, we need to segment after main text extraction with Trafilatura. credible source personal website company website find note: each generator: covering: want: commercial: paper: testing: paper on AI-generated text: news: display info on search result/ webpage itself extension to automatically detect AI-generated text: why extension for human to vote DNE: Study why websites use JS. Specifically, what the JS is for: intention, use, application, etc. (spheres), that may or may not be replaceable by plain HTML/CSS or by moving the functionality to the backend. Spheres may overlap, producing a Venn diagram. I believe front-end JS is overused. Many websites ship multiple MiB of JS to their clients (e.g. Facebook), but JS is designed for scripting, not systems. The current mitigation for this issue is to compile from TS, etc., but JS is not designed as a compilation target; we introduced WASM for this. WASM cannot replace JS yet, but I hope it will in the future. WASM cannot replace JS because it lacks JS features: To mitigate the first problem, we need to understand what front-end JS is used for. Why do websites need JS in the first place? TODO: Implication: only open web, no login. Towards better user experiences online. Track the evolution of JS usage in websites. Where are we going? Are we getting more unnecessary JS? Are we on a slope to JS hell? run JS in JS. track user data flow, decide if sent browser add feature but not remove. 
browser feature: n:1 map to web standard; found in Firefox WebIDL file. browser extension for instrumentation and enough 5-round user interaction. HTML&DOM API most prominent; Beacon for tracking; hardware access, storage. dataset: Web API usage in the Alexa 10k instrument Blink&V8 via hook. record DOM change+navigation. reconstruct web attack, flowchart viz VisibleV8, maintained, instrument V8. Tracking JS w/ JS can be spotted. track user to analyze UX. poorly written. VV8 creates a log file per thread, roughly equivalent to a browser page we create plus some junk background workers. Each of Details in By manually inspecting the 678 most popular APIs that make up 90% of all API calls in the top 100 sites, we spot “anchor” APIs (list in Out of 678 most popular APIs, which takes up 90% of all API calls in the top 100 sites: We aggregated data for all API calls, filtering out user-defined functions and other irrelevant data as possible. Results are in Overall, data are tail-heavy, aligning with prior observations: From the CCDF of number of calls per API, we can see that 0.1% of APIs (18) are called over 1,000,000 times, massively outnumbering other APIs. There is also a 2-4 time gap between total API calls and API calls after interaction began, signalling a large number of API calls before interaction (precisely, 45,109,934 before vs 69,625,413 after). Note: this graph uses the To find how many APIs need to be investigated to cover 90% of all API calls, we plot the CDF of the fractions of API calls vs fractions of APIs: It takes 1.75% (318) APIs to cover 80% of all API calls, and 3.74% (678.0) APIs to cover 90%. As a rough look, we sample the top 20 API calls based on various metrics. The most called APIs are mainly DOM-related, including many “get” calls on on From the relatively low Unfortunately, the top 20s for Since “get” calls do not usually create side effects, we focus on function calls. 
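The coverage numbers in these notes (how many APIs account for 80% or 90% of all calls) amount to a cumulative sum over APIs sorted by call count. Below is a minimal Python sketch of that computation. It assumes a hypothetical mapping from API name to call count; the function name `apis_needed_for_coverage` and the sample numbers are illustrative only and are not taken from the project's scripts or data.

```python
def apis_needed_for_coverage(call_counts, target):
    """Return how many of the most-called APIs are needed so that their
    calls make up at least `target` (a fraction in [0, 1]) of all calls."""
    total = sum(call_counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk APIs from most-called to least-called, accumulating their calls.
    ranked = sorted(call_counts.values(), reverse=True)
    for rank, count in enumerate(ranked, start=1):
        covered += count
        if covered >= target * total:
            return rank
    return len(call_counts)


# Hypothetical sample data, for illustration only.
sample_counts = {
    "Node.appendChild": 700,
    "Document.querySelector": 200,
    "Navigator.sendBeacon": 60,
    "Storage.setItem": 40,
}
```

Dividing the returned rank by the number of distinct APIs gives the "fraction of APIs" axis of the CDF.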
The most popular function calls overall matches our expectations, mainly focusing on querying and interacting with the DOM. There are some interesting popular “set” calls as well, despite the many internal functions. Some APIs clearly indicate developers’ intent. We look at them case by case. Interestingly, most of these calls are done after interaction began, probably during YouTube’s navigation-less page switching. I.e., YouTube does not load a new page when you click a link, but instead swaps out the content of the current page and changes the URL. The script with context id Script with ID Each direct Direct In reality, the Hoisting: some constructs behave as if they are evaluated first before the script is run. Top-level functions. We simply Variable declarations. Functions cannot capture variables declared in adjacent We identify all variable declarations in the current scope, declare all of them first using (Deferred) Some assignments do not have a single variable identifier on their left-hand side (LHS), such as destructuring. The LHS can contain arbitrary expressions. We currently stick For each Functions declared in IIFEs do not leak out. We declare the function identifiers as variables first, then convert all function declarations to assignments of function expressions to those variables. (Deferred) This seems to break a few scripts, especially when they call functions defined in other scripts. No idea for solutions yet because the error messages did not reveal the cause. We put them at the top of the script. We keep these scripts as is. We do not rewrite with We Bloat: deeply nested We use Browser crash with stack size exceeded when nesting We limit the depth of Of the 40116 scripts we analyzed (3192.7MB, details in We crawled the top 1000 websites with the problem: steps: split photo into public and private part, encrypt private part proposal: compare photo crypto metadata time+location w/ social media post text before manipulative post go viral. 
good for news. need to balance privacy “Click”, “Capture Cam” and “ProofMode” in Peripheral Literature for C2PA. Photos taken: “Verify” page is happy w/ Click & Capture Cam photo, but show Accessible, scalable and performant Countermeasures Countermeasures Countermeasures [1] Balan, K., Agarwal, S., Jenni, S., Parsons, A., Gilbert, A. and Collomosse, J. 2023. EKILA: Synthetic media provenance and attribution for generative art. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (2023), 913–922. [2] Bureacă, E. and Aciobăniței, I. 2024. A blockchain blockchain-based framework for content provenance and authenticity. 2024 16th international conference on electronics, computers and artificial intelligence (ECAI) (2024), 1–5. [3] Bushey, J. 2023. AI-generated images as an emergent record format. 2023 IEEE international conference on big data (BigData) (2023), 2020–2031. [4] Collomosse, J. and Parsons, A. 2024. To authenticity, and beyond! Building safe and fair generative AI upon the three pillars of provenance. IEEE Computer Graphics and Applications. 44, 3 (2024), 82–90. [5] Fotos, N. and Delgado, J. 2023. Ensuring privacy in provenance information for images. 2023 24th international conference on digital signal processing (DSP) (2023), 1–5. [6] Kharvi, P.L. 2024. Understanding the impact of AI-generated deepfakes on public opinion, political discourse, and personal security in social media. IEEE Security & Privacy. (2024). [7] Mo, J., Kang, X., Hu, Z., ZHou, H., Li, T. and Gu, X. 2023. Towards trustworthy digital media in the aigc era: An introduction to the upcoming IsoJpegTrust standard. IEEE Communications Standards Magazine. 7, 4 (2023), 2–5. [8] Rainey, J., Elawady, M., Abhayartne, C. and Bhowmik, D. 2023. TRAIT: A trusted media distribution framework. 2023 24th international conference on digital signal processing (DSP) (2023), 1–5. [9] Romero-Moreno, F. Deepfake fraud detection: Safeguarding trust in generative ai. 
Available at SSRN 5031627. [10] Shoker, S., Reddie, A., Barrington, S., Booth, R., Brundage, M., Chahal, H., Depp, M., Drexel, B., Gupta, R., Favaro, M., et al. 2023. Confidence-building measures for artificial intelligence: Workshop proceedings. arXiv preprint arXiv:2308.00862. (2023). [11] Strickland, E. 2024. This election year, look for content credentials: Media organizations combat deepfakes and disinformation with digital manifests. IEEE Spectrum. 61, 01 (2024), 24–27. [12] Vilesov, A., Tian, Y., Sehatbakhsh, N. and Kadambi, A. 2024. Solutions to deepfakes: Can camera hardware, cryptography, and deep learning verify real images? arXiv preprint arXiv:2407.04169. (2024). right to be forgotten: EU & California law people search site Data Defense: Evaluating People-Search Site Removal Services, Yael Grauer, Victoria Kauffman, Leigh Honeywell, Consumer Reports, 2024 web atom: group of URL that change together with high probability They are also indicators of lower-quality, possibly mass-produced, or even AI-generated content. 
examples: ideas: set up server through SSH: https://remotedesktop.google.com/headless (need enable nfs add directory to export: edit export all specified directory mount on macOS automatically mount on macOS open add into the file: for pre-Catalina, it should be kitty ssh port forwarding general usage copy file from ssh server to client copy file from client to ssh server fund from UCCA Lab w/ corporate client give artist full control of space hint what’s inside at the museum doorway reuse old art aim → goal SMART goal: specific, measurable, achievable, relevant, timely doubly sand glass urge timely report GitHub linguist settings in Git large file system (Git LFS) settings in GitHub repository size: https://api.github.com/repos/git/git (from Haodong) places house vs apartment: house old, apartment expensive & small & have maintenance website: Zillow careful for: heat, floor noise PhD student are automatically silently enrolled in USC Aetna Student Health Insurance Plan (SHIP) (for automatically completing part of the “training”) Staying safe: Diversity, Equity and Inclusion for Students: same as by word but capitalize the last letter a WORD contains any consecutive non-whitespace character type a character in replace of capitalize the letter or capitalize first letter simple camelCase to snake_case note: replace variable starting with SSH into server with port forwarding, then run Tmux: alternatives to run Code Tunnels in Tmux on server. needed to get headless VSCode running on server download vscode cli and install on server: Download Visual Studio Code - Mac, Linux, Windows start code tunnel on server, log in, etc. follow instruction, get connect to this code tunnel from headless Chrome in Tmux on server, and start Live Share. needed to get persistent session download chrome and install on server: Google Chrome Web Browser. 
to do this w/o sudo on debian, only extract the start headless Chrome on server: connect headless Chrome from local machine with Chrome DevTools remote debugging. open open up chrome console and overwrite clipboard. needed to see Live Share link later because DevTools remote clipboard is fake: start Live Share from UI. open directory in UI, click Live Share button at the bottom to start it, choose log in. complete login from remote popup tab by going back to get Live Share link. return to UI tab, Live Share should be on, it should say link copied to clipboard (if not, click bottom Live Share button again to copy again). get link in console as printed out (optional) to manage use the Live Share link yourself and send it to others. you can close your local browser and SSH connection now. the session remains up as long as your Tmux session on the server is up Enabling discussion for feedback from people trying this out: https://medium.com/@jozott/format-on-save-xcode-swift-8133d049b3ac launch check restart Xcode when running GCC, if some of these are generated from old NP&P → decision problem on desicion problem → board game P space: ∃x1,∀x2,∃x3,⋯ s.t. f(x) where ∣x∣ is polynomial #P: account #solution to NP problem linear programming: proved P learning problem binary search [1:n] (go to middle) power of randomization: O(n2) but Θ(n) wrt randomness: ⇒ET(n)≤n+21ET(43n)+21ET(n)⇒ET(n)≤2n+ET(43n)⇒ET(n)∼O(n) ⇒ distance graph G=(V,E),E⊆V×V ℓ(e),e∈E binary search is graph w/ ℓ(e)≡1, each number node has edge w/ next number generalized undirected graph “binary search” for target t∈V from qi∈V claim: ∣Pt+1∣≤21∣Pt∣ proof: ΦPt(u)≤ΦPt(qt)−(∣Pt+1∣−∣Pt/Pt+1∣)−ℓ(qt,vt),ΦPt(u)≥ΦPt(qt)⇒−∣Pt+1∣+∣Pt/Pt+1∣≥ℓ(qt,vt)≥0⇒∣Pt+1∣≤21∣Pt∣ A General Framework for Robust Interactive Learning, Ehsan Emamjomeh-Zadeh, David Kempe given hypercube X, search space H, find h∈H s.t. 
h≤X cut: split graph V to S and Sˉ cut(S)={(u,v)∈E∣u∈S,v∈Sˉ} affinity: inverse distance, closeness traditional Kruskal reversed Kruskal given G=(V,E,d) w/ metric d, k, want (S1,⋯,Sk) s.t. given: input ground set V={1,⋯,n}, S1,⋯,Si⊂V w/ wi,i∈I=[1,⋯,m] want: T⊂I s.t. ∪i∈TSi=V and ∣T∣ minimized T⊂I∪i∈TSi=Vargmini∈T∑wi idea: minimize average cost ∣Si∣wi algorithm: not optimal when many large Si also cover small Sj claim: greedy is within logn of optimum, where n=maxj∣Sj∣ doing consistently better than logn of optimum is NP-hard greedy cost cG(it)=∣Sit∩Ut∣wit≤H(∣Sjt∣)wjt f:{S∣S⊆V}→R property: ∀S⊆T,∀v∈/T ∇fS(v)≥∇fT(v) problem input: submodular f:{S∣S⊆V}→R∗, k output: S⊂V, s.t. ∣S∣=k, maxf(S) oracle model: can get f(T) example: max cover: for mapping f:V→B choose k element from V to maximize ∣f(S)∣ greedy algorithm S′: at time t, add ut that maximize ∇fut(St−1′) theorem: ∀f monotone & submodular, ∀k, f(S′)≥(1−e1)f(S∗) proof: S∗=:{v1,⋯,vk};f(S∗)≤f(S∗∪St′)=f(St′)+[f(S∗∪St′)−f(St′)];f(S∗∪St′)−f(St′)=i=1∑k[f({v1,⋯,vi}∪St′)−f({v1,⋯,vi−1}∪St′)]=i=1∑k∇fvi({v1,⋯,vi−1}∪St′)≤i=1∑k∇fvi(St′)=k∇fvi(St′)≤k∇fut+1(St′)=k[f(St+1′)−f(St)]⇒δt:=f(S∗)−f(St′)≤k[f(St+1′)−f(St)]=k[(f(St+1′)−f(S∗))+(f(S∗)−f(St))]=k[−δt+1+δt]⇒δt+1≤kk−1δt≤⋯≤(1−k1)t+1δ0=(1−k1)t+1f(S∗)≤e−kt+1f(S∗)⇒f(Sk′)=f(S∗)−δk≥(1−e−kk)f(S∗) alternative definition for submodular function (equivalent): ∀A,B⊆V,f(A)+f(B)≥f(A∩B)+f(A∪B) Reach(S):={v∣can reach v from S} f(S)=∣Reach(S)∣ f submodular: ∇fS(v)=∣{x∣can reach x from v but not S}∣ input G=(V,E),pe;t=1,⋯,T each node has probability p to influence each node v has threshold θv to be influenced by neighbor input: P={p1,⋯,pn}⊆Rd (21−ε)-median algorithm, use 2-way tree of height h: proof: h+1Pr(x)=(23)hPr(x)2(1−hPr(x))+(23)hPr(x)2⇒h+1Pr(x)=3hPr(x)2−2hPr(x)3 P={p1,⋯,pn}⊆Rd construct k-NNG: divide and conquer (O(n(logn)2)) non-overlapping 2D ball set condition: if can dig n round lake on the spherical, then charge $1 for each lake on tour though great circle proof: assume the globe has 
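radius 1
each lake $i$ defines a belt of width $2r_i$ perpendicular to the great circles passing through it ⇒ expectation of charge: $\sum_i \frac{2\pi \cdot 2r_i}{4\pi \cdot 1^2} = \sum_i r_i$
lake area cannot exceed globe area: $\sum_i \pi r_i^2 \le 4\pi \cdot 1^2 \Rightarrow \sum_i r_i^2 \le 4$
clearly want equal $r_i = r_0$ by convexity $\Rightarrow \sum_i r_i \le n r_0 = \sqrt{n}\sqrt{n r_0^2} \le \sqrt{n}\sqrt{4} = 2\sqrt{n}$
max number of non-overlapping balls that touch one ball
a great circle divides the disks into $B_N, B_S, B_{ON}$
can have a conformal map s.t. the median of all disk centers is the center of the globe $\Rightarrow |B_N|, |B_S| \le \left(1 - \frac{1}{d+2}\right) n = \frac{3}{4} n$, $|B_{ON}| \le 2 k^{\frac{1}{d}} n^{1 - \frac{1}{d}} = 2\sqrt{n}$
can build a binary search tree by successive random splits through the median
in $d$ dimensions, with $\infty$ convex point sets, if every $d+1$ convex sets $C_{\pi_1}, \cdots, C_{\pi_{d+1}}$ intersect, then all $\infty$ of these sets intersect
in $d$ dimensions
proof: $\begin{cases} \sum_{i=1}^{d+2} a_i x_i = 0 \\ \sum_{i=1}^{d+2} a_i = 0 \end{cases}$ has $d+1$ linear equations in $d+2$ unknowns ⇒ has a non-trivial solution, i.e., some $a_i \ne 0$
$W := \sum_{i=1,\, a_i > 0}^{d+2} a_i = \sum_{i=1,\, a_i < 0}^{d+2} (-a_i) \Rightarrow \frac{a_i}{W} \in (0, 1]\ \forall a_i > 0 \Rightarrow p = \sum_{i=1,\, a_i > 0}^{d+2} \frac{a_i}{W} x_i = \sum_{i=1,\, a_i < 0}^{d+2} \frac{-a_i}{W} x_i$ is in the convex hulls of both $\{x_i \mid a_i > 0\}$ and $\{x_i \mid a_i < 0\}$
classic algorithm: $O(n^2)$, not scalable
can divide each number into high and low halves: $x \cdot y = x_H \cdot y_H \cdot 2^n + (x_H \cdot y_L + x_L \cdot y_H) \cdot 2^{\frac{n}{2}} + x_L \cdot y_L$
can save one smaller multiplication by: $x_H \cdot y_L + x_L \cdot y_H = (x_H + x_L)(y_H + y_L) - x_H \cdot y_H - x_L \cdot y_L$
FFT makes multiplication $O(n \log n)$, nearly as easy as addition
realizable data: recover from $(x_1, \ell_1), \cdots, (x_L, \ell_L)$ by solving $\begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_L & x_L^2 & \cdots & x_L^n \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} \ell_1 \\ \ell_2 \\ \vdots \\ \ell_L \end{bmatrix}$
easy to multiply $(x_i, f(x_i))$ and $(x_i, g(x_i))$: $O(1)$
need $2n+1$ data points to recover $f \cdot g$
active learning: can pick nice data points as wished
divide the unit circle evenly ⇒ all in the form $e^{i\theta}$ ⇒ sample $z_j = w_n^j := e^{i \frac{j\pi}{n}}$ for $j = 0, \cdots, 2n-1$
$\forall w \ne 1,\ w^n = 1$: $\sum_{i=0}^{n-1} w^i = \frac{1 - w^n}{1 - w} = 0$
can save computation by $w_n^{2j} = (w_n^j)^2$, etc.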
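To make the point-value idea concrete, here is a minimal Python sketch of polynomial multiplication through evaluation at roots of unity. It is an illustration under stated assumptions, not the notes' exact construction: it pads to a power-of-two length and samples at $e^{2\pi i k / \text{size}}$, uses a recursive even/odd coefficient split, and the names `fft` and `poly_multiply` are made up for this example. Coefficients are assumed to be integers so that rounding recovers the exact product.

```python
import cmath


def fft(coeffs, invert=False):
    """Evaluate a polynomial (coefficient list, length a power of two)
    at all len(coeffs)-th roots of unity using the even/odd split."""
    n = len(coeffs)
    if n == 1:
        return list(coeffs)
    even = fft(coeffs[0::2], invert)  # values of the even-index part at w^2
    odd = fft(coeffs[1::2], invert)   # values of the odd-index part at w^2
    sign = -1 if invert else 1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]  # uses w^(k + n/2) = -w^k
    return out


def poly_multiply(f, g):
    """Multiply two integer-coefficient polynomials (lowest degree first)."""
    result_len = len(f) + len(g) - 1
    size = 1
    while size < result_len:
        size *= 2
    fa = fft([complex(c) for c in f] + [0j] * (size - len(f)))
    ga = fft([complex(c) for c in g] + [0j] * (size - len(g)))
    # Pointwise products are samples of f*g; interpolate with the inverse transform.
    prod = fft([a * b for a, b in zip(fa, ga)], invert=True)
    return [round((v / size).real) for v in prod[:result_len]]
```

For example, `poly_multiply([1, 2, 3], [4, 5])` multiplies $1 + 2x + 3x^2$ by $4 + 5x$.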
divide recursively: $f(x) = \sum_{j=0}^{n-1} a_j x^j = \sum_{j=0}^{\frac{n}{2}} a_{2j} (x^2)^j + x \sum_{j=0}^{\frac{n}{2}} a_{2j+1} (x^2)^j$ ⇒ calculate all in $O(n \log n)$ $O(|V| + |E|)$ approximation: $P = \alpha \frac{1}{n} + (1 - \alpha) M^T P \Rightarrow (I - (1 - \alpha) M^T) P = \alpha \frac{1}{n} \Rightarrow P = (I - (1 - \alpha) M^T)^{-1} \alpha \frac{1}{n} = \left( \sum_{t=0}^{\infty} \alpha ((1 - \alpha) M^T)^t \right) \frac{1}{n}$, where $\sum_{t=0}^{\infty} \alpha (1 - \alpha)^t = 1$ ⇒ $(M^T)^t \frac{1}{n}$: start random and walk $t$ rounds flow: same source & destination (IP & port) & protocol NABC for research: (hook), need, approach, benefit, competition dynamic host configuration protocol (DHCP): automatic IP address assignment multiprotocol label switching (MPLS): show up on BGP external paper the first j keys but sorted convert a heap to a max-heap convert a heap to a max-heap from a max-heap that acts as a queue return largest ( pop max increase a key and keep the heap a max-heap sort integers known to be less than a constant sort according to the last digit, then the previous, until the first bounded above and below bounded above bounded below bounded above, not asymptotically tight bounded below, not asymptotically tight the $i$th smallest element in the set halfway point of the set $\lfloor (n+1)/2 \rfloor$ find an element in the set larger than exactly $i-1$ elements most supervised learning when ppl talk about Mill’s utilitarianism: maximize benefit for everyone, mostly equally bane: Kant’s formalism/ duty ethics: unconditional command for every individual bane: universal principle may harm specific people Locke’s Right Ethics: individual has right simply by existence bane: what to do when two people’s rights conflict Aristotle’s virtue ethics: objective goodness from human qualities bane: how to find the “golden mean” the golden rule (all above agree with) Do unto others as you would have others do unto you statements of general principles, followed by instructions for specific conduct defines duties the professional owes to society/employers/clients/colleagues/subordinates/profession/self perceptional abilities PEAS: performance measurement, environment, actuators, sensors properties $n_{\text{entry}} = \sum_{t=1}^{T} |P|^t$ match input against rule, return action expand node with least path cost 
g(n) example finite tuple {S,A,{Psa},γ,R} space S action A state transition probabilities Psa discount factor γ∈[0,1) reward function R: evaluation metric total payoff. maximize this V=i∑γiR(si) policy π:s↦a. find this optimal policy π∗ π∗=aargmaxs′∑Psa(s′)V∗(s′) mapping state to expected total payoff Vπ:s↦R Vπ(s)=E[i∑γiR(si)s0=s]=E[[R(s)]+γVπ(s′)]⇒Vπ(s)=R(s)+γs′∑Psπ(s)(s′)Vπ(s′) Bellman equation V(s):=0 Bellman update: ∀ s, V(s):=R(s)+amaxγs′∑Psa(s′)V(s′) random π repeat: V:=Vπby Bellman equationπ(s):=aargmaxs′∑Psa(s′)V(s′) ε-greedy a={argmaxVrandom a∈Aprobability 1−εprobability ε softmax V(s)=R(s)+γamaxEs′∼Psa[V(s′)] 4 ~ 8 dimension approximate V∗ V∗(s)=θTϕ(s) trial model/simulator: s′↦Psa s′=s+Δts˙ learn from data approximate V∗(s) from s(i),i∈{1,…,n} trial: ramdomly sample s(i),i∈{1,…,n} initialization: θ:=0 repeat for i∈{1,…,n}: repeat for 1∈A: sample si′∼Psa(i),i∈{1,…,k} q(a):=k1j=1∑k[R(s(i))+γV(sj′)] R(s)+γEs′∼Psa[V(s′)] y(i):=amaxq(a) any regression model, e.g. θ:=θargmini∑(θTϕ(s(i))−y(i))2 R:S×A→R Bellman equation: V∗(s)=amax[R(s,a)+γs′∑Psa(s′)V∗(s′)]⇒π∗=aargmax[R(s,a)+γs′∑Psa(s′)V∗(s′)] finite tuple {S,A,{Psa},T,R} time horizon T∈(0,+∞) maximize ∑t=0TR(st,at) action based on time πt∗ time-dependent dynamic Vt(s)=E[t′=t∑TR(st′,at′)]Vt∗(s)=⎩⎨⎧amax[R(t)(s,a)+Es′∼Psa(t)(Vt+1∗(s′))]amax[R(t)(s,a)]t∈{0,…,T−1}t=T S:Rn,A:Rd,n>d linear transition with noise Psa st+1=Ast+Bat+ωt negative quadratic reward to push system back R(st,at)=−stTutst−atTvtat stochastic policy πθ:S×A→R πθ(s,a): probability of taking action a at state s direct policy search: find reasonable θ θmaxE[t=0∑TR(st,at)πθ] repeat: sample st,at reinforce θ:=θ+αt∑[πθ(st,at)∇θπθ(st,at)]t∑R(st,at) reason it converge: product rule θ:=θ+αt∑[∇θlnπθ(st,at)](f(τ)−b(s)) at time t+1, update V(st) V(st):=V(st)+α(δt) TD error δt δt=R(st+1)+γV(st+1)−V(st) initialize S behavior policy: select at based on Q repeat for each step select potential a′ for s′ based on Q update Q(s,a) 
Q(st,at):=Q(st,at)+α[R(st)+γQ(st+1,a′)−Q(st,at)] s:=s′ until s is terminal same as SARSA except Q(st,at):=Q(st,at)+α[R(st)+γamaxQ(st+1,a)−Q(st,at)] operational expense (OPEX), capital expense (CAPEX) → centralize server tenant isolation: challenge for cloud provider cloud provider classification: infrastructure/ platform/ software as a service data center hierarchy: rack, row & aisle, pod (building unit) top-of-rack (ToR) switch: 2 per rack, high capacity network topology Internet redundancy: multiple ISP data storage Hadoop Spark paravirtualization: modify guest OS to not use kernel space full virtualization container automation level: deployment & configuration, → monitoring & measurement, → trends & prediction, → root cause analysis, → troubleshooting orchestration Kubernetes microservice: smaller scope & team, modularity, less complexity, language flexibility, test coverage, rapid deployment, fault isolation; cascading error, functionality & data duplication, attack surface service mesh proxy: load balancing, convert conventional communication protocol hysteresis: delay in response to change serverless/ FaaS: API redirection & runtime & DB by provider. 
event-driven, stateless, scale from zero to infinite classic software development → agile → DevOps → site reliability engineering (SRE): merge development, quality assurance, operation networking safety and isolation: harden north-south traffic, assume cooperation & prevent traffic among tenant on east-west traffic remote storage interface: file/ block interface (NFS/ virtual disk) object storage: key-bytes pair information: data with meaning knowledge: actionable information exploratory data analysis (EDA) relation ≈ table data definition language (DDL): for schema data manipulation language (DML): for query and CRUD DML & DDL both in SQL data model (DM): entity, attribute, relationship, constraint entity relationship diagram (E-R diagram) (ER diagram) unified modeling language (UML): for OODM relation schema R=(A1,…) relation instance r(R) attribute usually atomic superkey K: subset of row tuple, bijection with row candidate key = minimal superkey union compatibility/ type compatibility data control language (DCL) & transaction control language (TCL): management for where clause filter: like, between, some, all for where clause standalone: exists, unique union, intersect, except (all) boolean type: true, false, unknown domain constraint on single relation: referential integrity: named constraint: assertion function call user-defined type (UDT): distinct type/ structured data type create table extension stored function/ stored procedure: stored function: deterministic/ non-deterministic (default) procedural SQL: trigger: error handling: cursor: entity-relationship model (E-R model): entity set, relationship set, attribute lossy decomposition: r⊂ΠR1(r)⋈ΠR2(r) (lossless: =) functional dependency (FD) X→Y: can uniquely identify Y by X Boyce-Codd Normal Form (BCNF): α→β,α,β⊆R⇒β⊆α or α superkey mean time to failure MTTF RAID: data striping vs mirror fixed-length record: deletion: free list for deletion variable-length record: offset + length, null bitmap heap: 
free-space map (first/second-level) sequential file: linked list of sorted search key multi-table clustering file organization: faster some join & slower some other, cluster key table partitioning buffer manager: buffer replacement strategy (LRU/MRU), pin/unpin, shared/exclusive lock column-oriented storage: less IO, better caching & compression dense index: index for each search-key value sparse index: index for some search-key value deadlock prevention: recovery system: encryption one-time pad bitwise XOR: bitwise exclusive or ∧ caret (\wedge) Boolean algebra ⎩⎨⎧a∧a=0a∧0=a(a∧b)∧c=a∧(b∧c) pseudo-random pseudo random bits: linear feedback shift register LFSR bit, cell, register, seed, initial state [N, k] LFSR: N-bit register with taps at N and k seed cannot be 0, at most 2N − 1 fills in N-bit register k counts from 1… Java programming virtual terminal vs integrated development environment IDE Java program shell: public class program_name { public static void main(String[] args) {//begin body }//end } piping and redirecting—|, <, > | is two directional < and > take one’s output as the other one’s input, following the arrow for multiple piping, “parenthesis” are added as if convert to type: Type.parseType(String_needs_converting) cast: (type) what_needs_converting does not convert type when divide convert to the more precise type when operate between two types int Integer.MAX_VALUE + 1 = Integer.MIN_VALUE =-Integer.MIN_VALUE remainder % char—uses ASCII, can be operated as if int, cast to int if not assigned to a char a~z, A~Z, 0~9 are together in ASCII, this can give a range char array converts to String when called String.valueOf() StringBuilder—better way to concatenate strings concatenate strings: .append double special values: Infinity, NaN math library type operation(type variable, …) type: double, int, long, float operation: abs, max, sin, asin, exp, log, pow, round, random, sqrt variable: a, PI, E boolean operation and && or || not ! 
xor (!a && b) || (a && !b) (a && b) == (!(!a || !b)) naming camel case: moreThanOneWord constant: MORE_THAN_ONE_WORD loop if if (boolean) {execution1;} else {execution2;} while while (boolean) { execution1 } for for (initialization_statement1, …; boolean; increament_statement1, …) { execution } switch switch (variable) { case literal -> operation; … } do while do {execution1} while (boolean); at least does once array type[] name;//declare array name of type type new double[length];//an array of length length, all 0.0s name[index] = literal//refer to array name by index index {literal1,…}//an array .length copy: copy the length and every single value assign: refer to the same array two dimentional array double[][] a a[0].length decide all a[i].length sorting: Arrays.sort() output: System.out.println(output)//print output \n System.out.print(output)//print output System.out.printf(“string1%w1.p1c1… string2”,output1,…)//print string1 output1… string2 with field width w, precision .p, and conversion code c negative w counts from the right. for c, d: decimal, f: floating, e: scientific, s: string, b: boolean must have % and c this is the standard string format in Java https://docs.oracle.com/javase/tutorial/java/data/numberformat.html DecimalFormat df = new DecimalFormat(“##.##”); StdDraw line(,,,), point(,), setCanvasSize(w,h), setXscale(x0,x1), setPenRadius(), shape, filledShape text(x,y,String) enableDoubleBuffering, show, clear, pause, System.exit(0); makes the program stop input: command-line argument: args[No.] 
//input in modyfy run configuration - program argument StdIn: isEmpty, readType, readAllTypes, hasNextChar, hasNextLine*//Type = Int, Double, Boolean, String, or Line* scanner:import java.util.Scanner; //import scanner Scanner sc= new Scanner(System.in); //define a scanner Type variable = sc.nextType();//let variable be input make an explanation /* function static void can change content of array as they are mutable, cannot change value of primitive type as they are immutable import static imports the method so that its prefix can be ommited don’t use applications programming interface API private static long[] makes the array global call: ClassName.method(object) final makes the variable immutatble recursion recurrence relation T(n)={2T(n−1)+1,n>11,n=1=2n−1(n>0) Gray codes: show all n-digit binary numbers by only changing 1 digit at a time start from i = 0, change the ith digit from 0 to 1 and do everything done to the first i − 1 digits backwards recursive graph fractal Brownian bridge: random path to bridge to points midpoint displacement method: take the midpoint and Gaussianly-randomly move it, and set it as one of the end point volatility: initial variance of the Gaussian dist Hurst exponent H: divide the variance of the Gaussian dist by 22H H is ½ for Brownian bridge fractal dimension: 2 − H plasma clouds exponential waste: calculate the same thing again dynamic programming store calculated results in array top-down solution make global array and recursion bottom-up solution make local array and loop longest common subsequance LCS problem for x[i…m) and y[j…n), opt[i,j]=⎩⎨⎧0,i=mandj=n{opt[i+1,j+1]+1,x[i]=y[j]max([i+1,j],[i,j+1]),x[i]=y[j],otherwise percolation Monte Carlo simulation: apply randomness to estimate unknown qualities by simulation 2-dimentional n by n sites blocked site, open site; full site, empty site site vacancy probability p vertical percolation: go straight down object reference type vs primitive type create object: ClassName reference = 
new ClassName(argument1,argument2,…) Color import java.awt.Color; Color(r,g,b) col.getBlue() Luminance Y = 0.299r + 0.587g + 0.114b reference type aliasing—pass by value orphaned object: don’t have a reference instance method: data-type operation for reference type reference.method(argument) create data-type public class ClassName { … *// constructor, with the same name as the class instanceVariable1 = parameterVariable1; /* use short name for instance variables for convenience use full name for parameter variables as the client can see them immutability: final makes the value of primitive type and the identity of the object of reference type immutable defensive copy: copy each of the values of the parameter variables to avoid aliasing interface type public interface interfaceName { abstract methods: includes nothing implements public class ClassName implements interfaceName{ the class must have methods coresponding to all abstract methods in the interface interface enables a method name to call different methods according to the object functional interface: single method lambda expression (parameterVariable1,…) -> stuffToReturn; implementation inheritance subclass general equals method public boolean equals(Object x) general hashcode method public int hashCode() { wrapper type: built-in reference type corresponding to primitive type autoboxing and unboxing performance running time: time when finish minus time when start (in millisecond) System.currentTimeMillis() System.nanoTime() algorithm tilde notation ∼ f(n) order of growth empirical analysis doubling hypothesis: double the input size, study the running time mathematical analysis memory usage: bool—1; char—2; int, float, padding—4; long, double, object reference—8; object overhead—16; array overhead—24 (b) big-O notation O(g(n)): running time |f(n)| < c|g(n)| ∀ n > n0 binary search exception filter—determine whether a name is in a sorted array collection operations—create, insert, remove, test emptiness item 
(pushdown) stack last-in-first-out (LIFO) operations—push, pop, test emptiness queue first-in-first-out (FIFO) linked-list node—content + next node iterator resizing array—double the size if no enough space, halve it if less than ¼ used avoid loitering—set the variable to null generic type generic class—a class that can be fed with different data types public class GenericClass<typeParameter> type parameter—a name for the data type to be used in this class by the client constructor public GenericClass(typeParameter parameter**) … using the constructor GenericClass<String> variableName = new GenericClass<String>(“blah”)**; symbol table key-value pairs—cannot be null get(key)/ put(key,val) contains(key) hashing binary search tree BST put get regular expression minimum RE concatenate union—| or {…,…} closure—x* grouping—() java REs String.matches(regex) Pattern pattern = Pattern.compile(RE) Matcher matcher = pattern.matcher(input) matcher.find() matcher.matches() shorthands wildcard—. beginning of line—^ end of line—$ range—[char1-char2] negation—^x closure operator extension one or more—x+ zero or one—x? length—{num} between—{low, high} deterministic finite-state automata DFA finite number of states transitions tape reader universal virtual DFA input format—number of states \s alphabet \s one line of each state nondeterministic finite-state automata DFA pushdown automaton PDA Turing machine universality computability David Hilbert’s questions: mathematics is consistent, complete, decidable? 
Kurt Gödel: axiomatic system cannot be both consistent and complete
Alan Turing: mathematics is not decidable
halting problem: reductio ad absurdum
totality problem, equivalence problem
Henry Rice: determining whether a program has any given functional property is unsolvable
intractability: no algorithm exists to efficiently solve
polynomial-time algorithm: bounded above by $a n^b$, efficient
exponential-time algorithm: bounded below by $2^{a n^b}$
satisfiability problems
search problems: the problems with solutions that polynomial-time algorithms can check
NP: the set of all search problems
NP-complete: every search problem polynomial-time reduces to it
P: the set of all search problems that can be solved in poly-time
representation
binary: 0b
convert to binary and print: System.out.print(String.format("%16s", Integer.toBinaryString(number)).replace(' ', '0'));
hexadecimal: 0x
read hexadecimal: Integer.parseInt(input, 16)
twos-complement: −x = ~x + 1
~x: complement, flip all bits
arithmetic shift right: add 1 to begin if negative
logical shift right: always add 0
masking: get the bits wanted by bitwise and with a mask that only has 1s in the corresponding bits
TOY machine
16 bit
memory: memory location M[00] ~ M[FF]
memory dump
M[FF]: standard IO
arithmetic and logic unit ALU: computation engine
register: 16 words R[0] ~ R[F], connect ALU & memory, R[0] is always 0
program counter PC: 8 bit, memory address of next instruction
instruction register IR: current instruction
fetch-increment-execute cycle: fetch from memory to IR, increment PC, execute as IR says and change memory or PC or calculate
instruction format (16 bit)
type RR: opcode + 3 registers (destination + 2 sources)
arithmetic instruction: 1, 2 for addition, subtraction
logical instruction: 3~6
type A: opcode + register (destination) + memory address (8 bits)
memory address instruction: 7, A, B
memory instruction: 8, 9
flow of control instruction: C~F
branching
halt: 0
loop with branching
self-modifying code
TOY VM
booting: using a small
program, load input into M[00] ~ M[FF] to start the machine
assembler: run assembly-language
assembly-language: use symbolic name for operation and addresses
interpreter: directly execute instruction written in a programming language
compiler: transform source code into another computer language
bootstrapping: use one virtual machine to create more powerful machine
Boolean logic
Boolean function and notation
NOT(x): $\neg x$, $x'$
AND(x, y): $x \wedge y$, $xy$
OR(x, y): $x \vee y$, $x + y$
XOR(x, y): $x \oplus y$
truth table
Boolean arithmetic
absorption: x(x + y) = x + xy = x
DeMorgan's laws: (xy)' = x' + y', (x + y)' = x'y'
Boolean function of more variables
majority: MAJ(…) = 1 iff more 1 than 0
odd-parity: ODD(…) = 1 iff number of 1 is odd
sum-of-products representation: OR(AND(each of the cases that contribute to 1)…)
circuit: interconnected network of wires, power connections, and control switches that take value from input wire and output via output wire
control switch: relay (control line with electromagnet), vacuum tube, transistor
gates
recursive definition: a gate or a network of circuits that are connected with wires
combinational circuit: no loop, output depends only on input
decoder: tell which output wire to activate; n input, $2^n$ output
demultiplexer (demux): add another input to decoder, add AND gate to decoder output; act as 1-to-$2^n$ logical switch
multiplexer (mux): AND all corresponding input and OR all of those; $n + 2^n$ input wires, 1 output wire; act as $2^n$-to-1 logical switch
sum-of-product circuit
ALU
module: parts of the computer
bus: group of wires to transmit data between modules
sequential circuit
buzzer
flip-flop: set, reset
register
memory bit-slice design: same circuit for demux to write and mux to read
clock: tick-tock, clock speed
fetch and execute: complementary
run and halt input
CPU control: fetch, fetch write, execute, execute write
$P(B) = P((X, Y) \in B)$
$(X, Y) \in D$, $P((X, Y) \in C) = \frac{\mathrm{area}(C)}{\mathrm{area}(D)}$ for $C \subset D$: $(X, Y)$ has uniform joint distribution
use areas to find probability, always geometric method first
$f(x, y) \ge 0 \ \forall (x, y) \in \mathbb{R}^2$, $\iint_{\mathbb{R}^2} f(x, y)\,dx\,dy = 1$
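The area rule for a uniform joint distribution can be sanity-checked with a small simulation. The sketch below assumes $(X, Y)$ is uniform on the unit square $D = [0, 1]^2$ and picks $C = \{(x, y): x + y \le 1\}$, so that $\mathrm{area}(C)/\mathrm{area}(D) = 1/2$. The region, the function name `estimate_probability`, the sample size, and the seed are illustrative choices, not part of the notes.

```python
import random


def estimate_probability(in_region, n_samples=200_000, seed=0):
    """Estimate P((X, Y) in C) for (X, Y) uniform on the unit square.

    `in_region(x, y)` returns True when the point lies in C.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if in_region(x, y):
            hits += 1
    return hits / n_samples


# C = {(x, y): x + y <= 1} is a triangle of area 1/2 inside D = [0, 1]^2.
estimate = estimate_probability(lambda x, y: x + y <= 1)
exact = 0.5  # area(C) / area(D)
print(f"simulated: {estimate:.4f}, geometric: {exact}")
```

The geometric answer needs no sampling at all, which is why the notes suggest trying the area method first; the simulation is only a cross-check.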
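$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}$
$(S, A, P)$: $S$ sample space, $A$ events, $P$ probability
$P(Y = y) = \sum_x P(Y = y \mid X = x) P(X = x)$
$E(Y \mid A) = \sum_y y P(Y = y \mid A)$ for random variable $Y$ and event $A$
$E(X + Y \mid A) = E(X \mid A) + E(Y \mid A)$
$E(Y) = \sum_{i=1}^{n} E(Y \mid A_i) P(A_i)$ if $\{A_i\}$ is a partition of the outcome space ⇒ average conditional probability $P(B) = \sum_{i=1}^{n} P(B \mid A_i) P(A_i)$
$E(Y) = \sum_x E(Y \mid X = x) P(X = x)$, aka the formula for $E(Y)$ by conditioning on $X$
$E(Y \mid X = x)$ is a function $g(x)$ of $x$: $E(Y) = \sum_x E(Y \mid X = x) P(X = x) = \sum_x g(x) P(X = x) = E(g(X)) \Rightarrow E(Y) = E[E(Y \mid X)]$
e.g. $E(Y \mid X = x) = f(x) \Rightarrow E(Y \mid X) = f(X)$
$f_{X \mid Y}(x \mid Y = y) = \frac{f(x, y)}{\int_{\mathbb{R}} f(x, y)\,dx} = \frac{f(x, y)}{f_Y(y)}$ if $f_Y(y) > 0$ $\Rightarrow f(x, y) = f_{X \mid Y}(x \mid Y = y) f_Y(y)$
$P(X \in D \mid Y = y) = \int_D f_{X \mid Y}(x \mid Y = y)\,dx$
$P(A \mid X = x) = \frac{P(A, X \in dx)}{P(X \in dx)}$
$X \in dx$: $X$ falls in a small interval near $x$
$f_Y(y \mid X = x) = \frac{f(x, y)}{f_X(x)} \Rightarrow \int f_Y(y \mid X = x)\,dy = 1$
$z = f(x, y)$
$f_X(x) = \int f(x, y)\,dy$
$f(x, y) = f_X(x) f_Y(y \mid X = x)$
$P(A) = \int P(A \mid X = x) f_X(x)\,dx$
$E[g(Y) \mid X = x] = \int g(y) f_Y(y \mid X = x)\,dy$, $E[g(Y)] = \int E[g(Y) \mid X = x] f_X(x)\,dx$
$\mathrm{Var}(X \mid Y) \equiv E\left[(X - E[X \mid Y])^2 \mid Y\right] = E[X^2 \mid Y] - (E[X \mid Y])^2$
$\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])$
$E\left[\sum_{i=1}^{N} X_i\right] = E[X]\,E[N]$, $\mathrm{Var}\left(\sum_{i=1}^{N} X_i\right) = E[N]\,\mathrm{Var}(X) + (E[X])^2\,\mathrm{Var}(N)$
take the wanted parameter $\theta$ as a random variable with a prior distribution
take a sample and use it to update to the posterior distribution
want $\Theta$ in $X_i \sim \mathrm{distribution}(\Theta)$
let $\Theta \sim \mathrm{priorDistribution} \Rightarrow f_\Theta(\theta)$
with observations (measurements) $X = (X_1, \dots, X_n)$
know conditional distribution $f_{X \mid \Theta}(x \mid \theta)$
⇒ posterior distribution $f_{\Theta \mid X}(\theta \mid x) = \frac{f_{X \mid \Theta}(x \mid \theta) f_\Theta(\theta)}{f_X(x)} = \frac{f_{X \mid \Theta}(x \mid \theta) f_\Theta(\theta)}{\int f_{X \mid \Theta}(x \mid \theta') f_\Theta(\theta')\,d\theta'}$
i.e.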
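posterior = likelihood ⋅ prior / evidence
the posterior distribution is of the same type as the prior distribution
$1_A(x) = 1$ if $x \in A$, $0$ if $x \notin A$ $\Rightarrow E[X \mid A] = \frac{E[X \cdot 1_A(X)]}{P(A)}$
select $\hat\theta$ so that posterior probability $f_{\Theta \mid X}(\theta \mid x)$ is maximized ⇔ maximize $f_{X \mid \Theta}(x \mid \theta) f_\Theta(\theta)$
select point estimate $\hat\theta$ to represent best guess of $\Theta$
select an estimator/function with least squared error
for $\theta$, want $\hat\theta$ such that $E[(\theta - \hat\theta)^2]$ is minimized ⇒ choose $\hat\theta = E[\theta]$
with observations $X = x$, want $\hat\theta$ to minimize $E[(\theta - \hat\theta)^2 \mid X = x]$ ⇒ choose $\hat\theta = E[\theta \mid X = x]$
the linear form of LMS
denote $H_i$ as event $\{\Theta = \theta_i\}$
choose $H_i$ iff it has a larger posterior $P(\Theta = \theta_i \mid X = x)$ or equivalently $p_\Theta(\theta_i) f_{X \mid \Theta}(x \mid \theta_i)$
coin 1 gets head by $p_1$, coin 2 by $p_2 > p_1$
toss an unknown coin, coin $\theta$, $n$ times and get $k$ heads
choose $\theta = 1$ iff $P(\theta = 1 \mid X = k) \ge P(\theta = 2 \mid X = k)$
$\tilde\theta = \theta - \hat\theta$
$P(X_{n+1} = j \mid X_0 = a, \cdots, X_n = i) = P(X_{n+1} = j \mid X_n = i)$, which does not change with $n$
$\begin{pmatrix} p_{11} & \cdots & p_{1m} \\ \vdots & \ddots & \vdots \\ p_{m1} & \cdots & p_{mm} \end{pmatrix}$
$p^n_{ab} = P(X_{m+n} = b \mid X_m = a)$
$p^{n+m} = p^n \cdot p^m$ (matrix product) $\Rightarrow P(X_{m+n} = j \mid X_0 = i) = \sum_{k \in S} P(X_m = k \mid X_0 = i) P(X_{m+n} = j \mid X_m = k)$
$P(X_n = k) = \sum_x \pi_0(x) p^n_{xk}$
site $i$ leads to $j$, i.e. $i \longrightarrow j$: $\exists\, n \ge 0, P(X_n = j \mid X_0 = i) > 0$
$i \longrightarrow j, j \longrightarrow i \Rightarrow$ $i$ and $j$ are in the same class
can not go to another class: class $C$ ($C \subseteq S$) is closed, $i \in C, i \longrightarrow j \Rightarrow j \in C$
let $A \subseteq S$, hitting time of $A$, $T_A$: $T_A = \min\{n > 0: X_n \in A\}$ if $\exists\, n, X_n \in A$; $T_A = \infty$ if $\forall\, n, X_n \notin A$ $\Rightarrow p^n_{xy} = \sum_{m=1}^{n} P_x(T_y = m) p^{n-m}_{yy}$
$\rho_{xy} = P_x(T_y < \infty) = P(T_y < \infty \mid X_0 = x)$
site $y$ is recurrent if $\rho_{yy} = 1$, transient if $\rho_{yy} < 1$
$N(y) = \sum_{n=1}^{\infty} 1_y(X_n)$
$g(x, y) = E_x[N(y)] = \sum_{n=1}^{\infty} n P_x(N(y) = n) = \sum_{n=1}^{\infty} p^n_{xy}$
$S$ is finite ⇒ $\exists$ at least 1 recurrent state $y$
$y$ is recurrent, $y \longrightarrow z \Rightarrow z$ recurrent, $y, z \in$ the same closed class, $\rho_{yy} = \rho_{yz} = \rho_{zz} = \rho_{zy} = 1$
you always get absorbed in a closed class of recurrent states after moving around
$\rho_{\{x_1, \cdots, x_k\}}(x)$
$\mu_x = \sum_y (1 + \mu_y) p_{xy} = 1 + \sum_y \mu_y p_{xy}$
let $\{X_n\}$ be a Markov chain with state space $S$
a distribution $\pi$ on $S$ is said to be stationary if $\sum_{x \in S} p_{xy} \pi(x) = \pi(y) \ \forall\, y \in S$
⇒ the distribution stays the same: $\sum_{x \in S} p^n_{xy} \pi(x) = \pi(y) \ \forall\, y \in S, n \in \mathbb{N}^*$
$\forall$ initial distribution $\pi_0$, the distribution after infinite steps: $\lim_{n \to \infty} P(X_n = y) = \pi(y) \ \forall\, y \in S$
⇒ the distribution will be independent of $\pi_0$
build linear system with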
⎩⎨⎧π(1)=∑x∈Spx1π(x)π(2)=∑x∈Spx2π(x)⋮π(n−1)=∑x∈Spx(n−1)π(x)∑x∈Sπ(x)=1⇒p11−1p12⋮p1(n−1)1p21p22−1p2(n−1)⋯⋯⋯⋯pn1pn2pn(n−1)1π1π2⋮πn=0⋮01 n→∞limpxxn=0or=0 GCD of all n such that pxxn>0 x⟷y⇒dx=dy n→∞limpxyn=π(y) Nn(y)=m=1∑n1y(Xn)nNn(y)→Ey[Ty]1 my=Ey[Ty] π(y)=Ey[Ty]1 the stationary distribution is unique and n→∞limn∑m=1npxyn=π(y) consider a population of individual: P(Z=i)=pi n→∞limP(Xn=0)=? ΦX(t)=E[tX]=x∑P(X=x) tx⇒ΦX′(t)=x∑xP(X=x) tx−1⇒⎩⎨⎧μ=E[X]=ΦX′(1)ΦX(0)=p0ΦX(1)=1 ρ=ρ1,0=k∑P(Z=k)ρk,0=k∑P(Z=k)ρk=Φ(ρ) if μ=0, then p0=1⇒ρ=1 if 0<μ≤1,P(t=0)>0, then ρ=1 if μ>1, then ∃ 0<ρ<1 S={0,1,⋯,d}pxy=⎩⎨⎧qxrxpxy=x−1y=xy=x+1q0=0,qx>0pd=0,px>0 (0<x<d) with 0≤a<x<b, find u(x)=Px(Ta<Tb) γx=p1p2⋯pxq1q2⋯qxγ(0)=1u(x)=qxu(x−1)+rxu(x)+pxu(x+1)⇒γxu(x+1)−u(x)=γx−1u(x)−u(x−1)⇒u(x)=∑y=ab−1γx∑y=ab−1γy1−u(x)=∑y=ab−1γy∑y=ax−1γy u(x)=1−(pq)a−(pq)b(pq)a−(pq)x P1(T0<∞)=n→∞limP1(T0<n) because {T0<n} is non-decreasing ⇒P1(T0<∞)=n→∞lim(1−∑y=0n−1γy1) ⇒ a birth and death chain is recurrent iff ∑y=0∞γy=∞ if transient ρx0=∑y=0∞γy∑y=x∞γy=0 if recurrent π(k+1)=q1q2⋯qk+1p0p1⋯pkπ(0) stationary distribution exists iff ∑k=1∞q1q2⋯qk+1p0p1⋯pk<∞ ⇒ positive recurrent else, null recurrent ∣V∣ d(u) Pxy={d(x)10y is adjacent to xotherwise π(x)=αd(x) where α=x∑d(x) birth and death process S consist of all points in Rd with integer coordinate n!≈2πn nne−n when n is large pxx2n=(2nn)p2n(1−p)2n {=∞<∞p=21otherwise P(X(s+t)=j∣X(s)=i,X(u)=xu)=P(X(s+t)=j∣X(s)=i)∀ u, 0≤u≤s P(X(s+t)=j∣X(s)=i)=P(X(t)=j∣X(0)=i)=pij(t) pij(s+t)=k∈S∑pik(s)pkj(t) the time τi staying at state i before changing τi∼Exp(qi) prove P(τi>s+t∣ τi>s,X(0)=i)=P(τi>t∣X(0)=i) ⇒ the holding time is memory-less {Yi} consisting of the state the continuous Markov chain visit pij=P(Yn+1=j∣Yn=i)=P(Y1=j∣Y0=i) before jumping from i to j, holding time ∼Exp(qij) j∑qij=qipij=qiqij go to the earliest state p~=0q2q21⋮q1q120⋮⋯⋯⋱ derivative at 0 pij′(0)={qij−qii=ji=j pij′(t)=k=i∑qikpkj(t)−qipij(t)=k∑Qikpkj(t) pij′(t)=k=j∑pik(t)qkj−qjpij(t)=k∑pik(t)Qkj 
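$Q = \begin{pmatrix} -q_1 & q_{12} & q_{13} & \cdots \\ q_{21} & -q_2 & q_{23} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$, $Q_{ij} = p'_{ij}(0) = q_{ij}$ if $i \ne j$, $-q_i$ if $i = j$
square matrix $A$: $e^{tA} = \sum_{n=0}^{\infty} \frac{(tA)^n}{n!}$, $\frac{\partial}{\partial t} e^{tA} = A e^{tA}$
$p(t) = \begin{pmatrix} p_{11}(t) & \cdots & p_{1n}(t) \\ \vdots & \ddots & \vdots \\ p_{n1}(t) & \cdots & p_{nn}(t) \end{pmatrix}$, $p'(t) = Q p(t) = p(t) Q \Rightarrow p(t) = e^{tQ}$
for $Q = S D S^{-1}$, $\lambda_i$ the eigenvalues: $p(t) = S e^{tD} S^{-1}$, $(e^{tD})_{ij} = e^{t\lambda_i}$ if $i = j$, $0$ if $i \ne j$
$\pi = \pi p(t) \Leftrightarrow \sum_{x \in S} p_{xy}(t) \pi(x) = \pi(y) \ \forall\, y \in S, t$
$\pi Q = 0 \Leftrightarrow \sum_i \pi(i) Q_{ij} = 0 \ \forall\, j$
$\pi(y) = \lim_{t \to \infty} p_{xy}(t) \ \forall\, x$
let $h(j) = \pi(j) q_j$, $\Phi(j) = \frac{h(j)}{\sum_k h(k)}$
$\exists\, t > 0, \forall\, i, j, p_{ij}(t) > 0 \Leftrightarrow$ irreducible
if $\exists\, t_0 > 0, p_{ij}(t_0) > 0$, then $\forall\, t > t_0, p_{ij}(t) > 0$
if the chain is finite or positive recurrent,
if the chain is irreducible and $S$ is positive recurrent, then $\exists$ unique stationary distribution and limiting distribution $\pi$
$\lambda_x = q_{x,x+1}$, $\mu_x = q_{x,x-1}$
$\mu_x = 0$: $p_{xy}(t) = \lambda_{y-1} \int_0^t e^{-\lambda_y (t - s)} p_{x,y-1}(s)\,ds$
$p_{x,x+1}(t) = \frac{\lambda_x}{\lambda_{x+1} - \lambda_x}\left(e^{-\lambda_x t} - e^{-\lambda_{x+1} t}\right)$ if $\lambda_{x+1} \ne \lambda_x$; $\lambda_x t e^{-\lambda_x t}$ if $\lambda_{x+1} = \lambda_x$
$\sum_{x=1}^{\infty} \frac{\mu_1 \mu_2 \cdots \mu_x}{\lambda_1 \lambda_2 \cdots \lambda_x}$: $= \infty$ recurrent, $< \infty$ transient
continuous time stochastic process $\{N(t), t \ge 0\}$ with parameter (rate) $\lambda > 0$, and $N(t_1) - N(t_0), \cdots, N(t_n) - N(t_{n-1})$ independent where $t_0 < t_1 < \cdots < t_n$
⇒ for disjoint intervals, $N(t_1 + s_1) - N(s_1) + N(t_2 + s_2) - N(s_2) \sim \mathrm{Poisson}((t_1 + t_2)\lambda)$
the time between the occurrence of the $(i-1)$th event and the $i$th: $T_i \sim \mathrm{Exp}(\lambda)$
the time of the occurrence of the $i$th event: $S_i = \sum_{j=1}^{i} T_j$, $S_i \sim \mathrm{Gamma}(i, \lambda)$, sum of i.i.d. $\mathrm{Exp}(\lambda)$
$N_1(t) + N_2(t)$ is Poisson process with $\lambda_1 + \lambda_2$
$N(t) = N_1(t) + N_2(t)$ with every occurrence going to $N_1(t)$ with probability $p$ and to $N_2(t)$ with probability $(1 - p)$, then $N_1(t)$ is Poisson process with $p\lambda$, $N_2(t)$ is Poisson process with $(1 - p)\lambda$, the distribution between them is binomial
$P(S_1 < s \mid N(t) = 1) = \frac{s}{t}$, $s < t$
let $U_i$ be i.i.d. $\sim U(0, t)$
let $U_{(i)}$ denote them in ascending order
$f(S_1, S_2, \cdots, S_n) = \frac{n!}{t^n}$
let $S_i$ be occurrence time for $N(t)$ given $N(t) = n$
$P(s_1 < S_1 < s_1 + \delta, \cdots \mid N(t) = n) = \frac{P(s_1 < S_1 < s_1 + \delta, \cdots, N(t) = n)}{P(N(t) = n)} = \frac{P(N(t - n\delta) = 0)\,[P(N(\delta) = 1)]^n}{P(N(t) = n)} = \frac{n!\,\delta^n}{t^n}$
$f(S_1, \cdots) = \lim_{\delta \to 0} \frac{1}{\delta^n} \cdot \frac{n!\,\delta^n}{t^n} = \frac{n!}{t^n}$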
f(x) is o(h) if x→0limxf(x)=0 for Stochastic process X(t),(t≥0) μX(t)=E[X(t)] rX(t,s)=Cov(X(t),X(s)) all the property of covariance apply to covariance function ∀ τ, Y(t)=X(t+τ),⎩⎨⎧μY=μXrY(s,t)=rX(s,t)⇔⎩⎨⎧E[X(t)]=E[X(t+τ)]∀ τE[X(t)X(s)]=f(t−s) Stochastic process X(t),(t≥0) a1X(t1)+a2X(t2)+⋯+anX(tn)∼N∀ 0≤t1<t2<⋯<tn, a1,a2,⋯,an∈R, n=1,2,⋯⇒((X(t1),⋯)) is multivariate normalGaussian processes with the same μ,σ,r(s,t)⇔X(t)=dY(t) random variable X1,X2,⋯,Xk a1X1+a2X2+⋯∼Normal∀ a1,a2,⋯∈R f(X1,X2,⋯)=(2π)k∣detV∣1exp{−2(Xˉ−μˉ)TV−1(Xˉ−μˉ)} continuous time Stochastic process W(t),(t≥0) rW(s,t)=min{s,t} sum S(n) of n random walk X1,⋯,Xn S(n)=⎩⎨⎧0∑i=1nXin=0n>0 is discrete, but S standard Brownian motion ⇔ Gaussian process X(t) that standard Brownian motion W(t)⇒ the following are standard Brownian motion (t≥0) first time reaching a Ta=min{t:W(t)=a} X∼Beta(a,b) f(x)={B(α,β)xα−1(1−x)β−100<x<1otherwise B(α,β)=∫01θα−1(1−θ)β−1dθ=(α+β−1)!(α−1)!(β−1)! X∼Exp(λ) fX(x)=f(x)={λe−λx0x>0∫0∞λe−λxdx=1P(X≥t)=e−λtMX(t)=λ−tλE[X]=MX′(0)=λ1Var(X)=M′′(0)−(M′(0))2=λ21 P(X>t+s ∣ X>t)=P(X>s) fSn(x)=λe−λx(n−1)!(λx)n−1 proved by induction P(X1<X2)=λ1+λ2λ1 P(Xi=min)=∑iλiλi V∼Exp(i∑λi) X∼Gamma(α,λ) f(x)={Γ(α)λαxα−1e−λx0x≥0x<0E[X]=λαVar(X)=λ2α X∼Poisson(λ) pX(k)=e−λk!λkE[X]=Var(X)=λ μx,μy,σx>0,σy>0,−1<ρ<1f(x,y)=2πσxσy1−ρ21e−2(1−ρ2)1[(σxx−μx)2+(σyy−μy)2−2ρσxσy(x−μx)(y−μy)] X∼N(μx,σx2)Y∼N(μy,σy2) (X∣Y=y)∼N(μx+ρσyσx(y−μy),σ2=σx2(1−ρ2))(Y∣X=x)∼N(μy+ρσxσy(x−μx),σ2=σy2(1−ρ2))ρ=Corr(X,Y)=σxσyCov(X,Y)X,Y independent⇔Cov(X,Y)=0 ⌊x⌋= the greatest integer smaller than x n→∞lim(1+Br)A=erifn→∞limB′A′=1 sequence index variable explicit formulas recurrence relation arithmetic sequences geometric sequences bounded above/ below (un)bounded convergent ⇒ bounded bounded, monotone ⇒ convergent series convergent vs divergent series harmonic sum ∑n=1∞n1=∞ p series: ∑n=1∞np1⎩⎨⎧convergent,p>1divergent,p≤1 ∑n=1∞n21=6π2 ∑n=k∞an convergent ⇒ limn → ∞an = 0 limn → ∞an ≠ 0⇒ ∑n=k∞an divergent tests divergence test partial sum geometric: 
|q| < 1 integral: f, f(n) = an, continuous, ultimately decreasing 🡪 same comparison: limn→∞bnan=c⇒⎩⎨⎧same,c=0anconvergent,c=0,bnconvergentandivergent,c=∞,bndivergent scaling with ∑n=1∞np1(p>1) or ∑n=1∞n1 alternating: |an + 1| − |an| < 0, ∀ n > n0, limn → ∞an = 0⇒ convergent absolute convergence: ∑∣an∣ convergent conditional convergence: ∑an convergent, but ∑∣an∣ divergent ∑∣an∣ convergent ⇒∑an convergent ratio: limn→∞anan+1⎩⎨⎧<1⇒∑anabsolutelyconvergent>1⇒∑andivergent root: limn→∞n∣an∣{>1⇒divergent<1⇒convergent uniform convergence: fn(x) converge to f(x): fn(x) → f(x) as n → ∞ ∃N(ε) independent to x, ∀n > N, |fn(x) − f(x)| < ε⇔ uniform convergence: fn ⇉ f dominated convergence: fi(x) → f(x) as x → ∞, ∃g, ∀x, i, fi(x) < g(x) ⇒ limi → ∞∫abfi(x)d**x = ∫abf(x)d**x power series: ∑n=0∞cn(x−a)n center a, coefficient cn ⇒ x = a⇔ convergent or ∀x ∈ ℝ, convergent or ∃R > 0, |x − a| < R⇒ convergent, |x − a| > R⇒ divergent method: ρ=limn→∞cncn+1, radius of convergence R=ρ1 |x − a| < R⇒ absolutely convergent, |x − a| > R⇒ divergent from ratio test Bessel function of the first kind: J0=∑n=0∞22n(n!)2(−1)nx2n basic formula: 1−x1=∑n=0∞xnfor∣x∣<1 plug out terms, create (x − a), see something as 1 − x, do it, range Taylor and Maclaurin Series: f(x)=∑n=0∞n!f(n)(a)(x−a)n 1−x1=∑n=0∞xnfor∣x∣<1 ln(1+x)=∑n=0∞(−1)nn+1xn+1 for |x| < 1 −ln(1−x)=∑n=1∞nxn ex=∑n=0∞n!xn sinx=∑n=0∞(−1)n(2n+1)!x2n+1 cosx=∑n=0∞(−1)n(2n)!x2n tan−1x=∑n=0∞(−1)n2n+1x2n+1 for |x| < 1 (1+x)k=∑n=0∞(kn)xn (x+y)k=∑n=0∞(kn)xnyk−n=∑n=0k(kn)xnyk−n differentiation and integration: f′(x)=∑n=1∞n!f(n)(a)((x−a)n)′=∑n=1∞n!f(n)(a)n(x−a)n−1, ∫f(x)dx=∑n=0∞n!f(n)(a)∫(x−a)ndx=∑n=0∞n!f(n)(a)n+1(x−a)n+1+C may lose boundary of convergence Cauchy product: f(x)g(x)=∑n=0∞anxn∑n=0∞bnxn=∑n=0∞cnxn, cn=∑k=0∞akbn−k (kn)=n!(k−n)!k!=n!k(k−1)…(k−n+1) probability sample space S: nonempty set collection of events: subset of S: P(S) probability measure P(A) ∈ [0, 1], P(⌀) = 0, P(S) = 1 P(⋃i=1nAi)=∑i=1nP(Ai) if Ai… are mutually disjoint symmetric 
difference: AΔB = (A ∖ B) ∪ (B ∖ A) complement: Ac : = S ∖ A disjoint: A ∩ B = ⌀ ⇔ A ⊆ Bc ⇒ A ∖ B = A ∩ Bc De Morgan Laws: (A ∪ B)c = Ac ∩ Bc, (A ∩ B)c = Ac ∪ Bc A = B ⇔ A ⊆ B, B ⊆ A cardinality: |A|= # of elements in A inclusion-exclusion principle: |A ∪ B| = |A| + |B| − |A ∩ B|, |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C| partition of the sample: Bi, disjoint, S = B1 ∪ B2 ∪ … P(A) = P(A ∩ B1) + P(A ∩ B2) + … P(A ∖ B) = P(A ∩ Bc) = P(A) − P(B) not necessarily disjoint: P(A1 ∩ A2 ∩ …) ≤ P(A1) + P(A2) + … independent: P(A ∩ B) = P(A)P(B) for two events, and P(A1 ∩ … ∩ Ai) = P(A1)…P(Ai) for more events uniform probability: P(A)=∣S∣∣A∣ continuous uniform distribution: P([x,y])=b−ay−x, [x, y] ⊂ [a, b] combinatorics: length k, n symbols ⇒ nk ordered subset ⇒(n−k)!n!=nk binomial: (nk)=k!(n−k)!n!,or0ifk<0ork>n (nk)=(n−1k)+(n−1k−1) (2nn)=∑k=0n(nk)2 generating function: f(x)=∑n=0∞anxn exponential generating function: f(x)=∑n=0∞annxn Bayes’ theorem: P(B∣A)=P(A)P(B∩A)=P(A)P(B)P(A∣B) random variable X(bla) = 2 discrete: ∑x∈RP(X=x)=1 PDF probability mass/ density function: pX(x) = P(X = x) P(X=x)=∑iP(Ai)P(X=x∣Ai) law of total probability: P(A)=∑x∈RP(X=x)P(A∣X=x) Bernoulli: pX(0) = q, pX(1) = p Binomial: pX(k)=(nk)pkqn−k rX(t) = (t**p + q)n geometric: pX(k) = p**qk E(X)=pq,var(X)=p2q negative-binomial: pX(k)=(r+k−1k)prqk=(−1)k(−rk)prqk Poisson: pX(k)=e−λk!λk E(X) = var(X) = λ hypergeometric: pX(k)=(nN)(Mk)(N−Mn−k) continuous distribution: pX(k) = 0 ∀ x ∈ ℝ,∫−∞∞f(x)d**x = 1 uniform: f(x)={b−a1,a≤x≤b0 var(X)=12(b−a)2 exponential: f(x)={λe−λx,0≤x0,x<0 P(x ≥ t) = e−λ**t E(X)=λ1,var(X)=λ21 rX(t)=λ−lntλifλ−lnt>0 normal: f(x)=n(x;μ,σ2):=2πσ1e−2σ2(x−μ)2 Φ(x)=2π1∫−∞xe−2t2dt gamma: f(x)={Γ(α)λαxα−1e−λx,x≥00,x<0 Γ(x)=∫0∞tx−1e−tdt={(x−1)!,x∈N∗π,x=21 Γ(x + 1) = x**Γ(x) chi-squared: Χ2(n) is the distribution of Z = X12 + … + Xn2 with Xi ∼ N(0, 1) Z∼Γ(2n,21) E(Z) = n fZ(z)=2πz1e−2z for z ≥ 0 if n = 1 t (or student): t(n) is the distribution of Z=nX12+…+Xn2X with 
Xi ∼ N(0, 1) T=nSnX−μ∼t(n−1) CDF cumulative distribution fun FX(x)=P(X≤x)={∑yi≤xfX(yi),discrete∫−∞xfX(t)dt,continuous joint distributions Y = h(X) if X discrete, then PY(y)=∑x∈h−1yPX(x) if X continuous, h(x) monotone where fX(x) > 0, then fY(y)=∣h′(h−1(y))∣fX(h−1(y)) marginal distribution: FX(x) = limy → ∞FX, Y(x, y) joint probability fun marginal PDF: pX(x)=∑ypX,Y(x,y) table joint density fun: f(x, y), ∫−∞∞∫−∞∞f(x, y)dxd**y = 1 marginal PDF fX(x) = ∫−∞∞fX, Y(x, y)d**y independent P(A∣B)=P(B)P(A∩B) pY∣X(y∣x)=∑zpX,Y(x,z)pX,Y(x,y)=pX(x)pX,Y(x,y) or f continuous: P(a≤Y≤b,X=x)=∫abfX(x)fX,Y(x,y)dy law of total prob: P((X, Y) ∈ B) = ∬BfX(x)fY|X(y|x)dxd**y independent random variables: ∀B1, B2, P(X ∈ B1, Y ∈ B2) = P(X ∈ B1)P(Y ∈ B2) expectation: E(X**Y) = E(X)E(Y) if X, Y independent variance: var(X) = E((X − μx)2) = E(X2) − E2(X), var(X + Y) = var(X) + var(Y) + 2cov(X, Y) covariance: cov(X, Y) = E((X − μX)(Y − μY)) = E(X**Y) − E(X)E(Y) cov(a**X + b**Y, Z) = a cov(X, Z) + b cov(Y, Z) correlation: corr(X,Y)=Sd(X)Sd(Y)cov(X,Y) i.i.d. mutually independent and identically distributed X1, …, Xn ∼ Bernoulli(p) i. i. d. ⇒ X1 + … + Xn ∼ B(n, p) X1 ∼ B(n1, p), X2 ∼ B(n2, p) ⇒ X1 + X2 ∼ B(n1 + n2, p) X1 ∼ Poisson(λ1), X2 ∼ Poisson(λ2) ⇒ X1 + X2 ∼ Poisson(λ1 + λ2) generating fun: rX(t) = E(tX) rX + Y(t) = rX(t)rY(t) kth moment: E(Xk) kth central moment: E((X − μx)k) moment-generating fun: MX(s)=E(eXs)=rX(es)=∑n=0∞n!E(Xn)sn MX(k)(0) = E(Xk) MX + Y(s) = MX(s)MY(s) X, Y independent, work for rX(t) uniqueness: if ∃s0 > 0, MX(s) < ∞ ∀s ∈ (−s0, s0), MX(s) = MY(s), then X, Y have the same distribution also work for r characteristic fun: cX(s)= E(eiXs) inequalities Markov’s:P(X≥a)≤aE(X) X ≥ 0, a > 0 Chebychev’s: P(∣X−μX∣≥a)≤a2var(X) a > 0 Cauchy-Schwarz: ∣cov(X,Y)∣≤var(X)var(Y) =, iff X−μX=var(Y)cov(X,Y)(Y−μY) if var(Y) > 0 law of large number sample sum: Sn = X1 + … + Xn for {Xi} i. i. d. 
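sample average: $M_n = \bar{X} = \frac{1}{n}(X_1 + \dots + X_n) = \frac{S_n}{n}$
WLLN weak: $\lim_{n \to \infty} P(|M_n - \mu| \ge \varepsilon) = 0$ for $\{X_i\}$ same $\mu$, var ≤ $v$, $v < \infty$, $\varepsilon > 0$, not necessarily i.i.d.: $M_n \xrightarrow{P} \mu$
SLLN strong: $P\{\lim_{n \to \infty} M_n = \mu\} = 1$ for $\{X_i\}$ i.i.d.: $M_n \xrightarrow{a.s.} \mu$
central limit theorem: $\lim_{n \to \infty} P(Z_n \le x) = P(Z \le x)$ for $\{X_i\}$ i.i.d., $Z \sim N(0, 1)$, $Z_n = \frac{S_n - n\mu}{\sqrt{n}\,\sigma} = \sqrt{n}\left(\frac{M_n - \mu}{\sigma}\right)$
$\Phi(x) = \lim_{n \to \infty} P(Z_n \le x) = \lim_{n \to \infty} P(S_n \le n\mu + x\sqrt{n}\,\sigma) = \lim_{n \to \infty} P\left(M_n \le \mu + \frac{x\sigma}{\sqrt{n}}\right)$
$Z_n \xrightarrow{D} Z \sim N(0, 1)$
convolution: for $X$, $Y$ independent, $Z = X + Y$
discrete: $p_Z(z) = \sum_w p_X(z - w) p_Y(w)$
continuous: $f_Z(z) = \int_{-\infty}^{\infty} f_X(z - w) f_Y(w)\,dw$
different convergence
$X_n \xrightarrow{P} Y$: $\{X_n\}$ converges in probability to $Y$, if $\lim_{n \to \infty} P(|X_n - Y| \ge \varepsilon) = 0$ for every $\varepsilon > 0$
$X_n \xrightarrow{a.s.} Y$: $\{X_n\}$ converges with probability 1 (almost surely) to $Y$, if $P(\lim_{n \to \infty} X_n = Y) = 1$
$X_n \xrightarrow{D} Y$: $\{X_n\}$ converges in distribution to $Y$, if $\lim_{n \to \infty} P(X_n \le x) = P(Y \le x) \ \forall\, x \in \{x \mid P(Y = x) = 0\}$
$X_n \xrightarrow{a.s.} Y \Rightarrow X_n \xrightarrow{P} Y \Rightarrow X_n \xrightarrow{D} Y$
sample
sample variance: $S_n^2 = \frac{1}{n-1} \sum_{j=1}^{n} (X_j - \bar{X})^2$
kth sample moment: $\overline{x^k}$
point estimation
the method of moments: $\hat\mu_n = \bar{X}$, $\hat\sigma^2 = \overline{X^2} - \bar{X}^2$
MLE maximum likelihood estimation: make $L$ as big as possible
maximum likelihood fun: $L(\theta; x_1, \dots, x_n) = p(X_1 = x_1, \dots, X_n = x_n \mid \theta)$ or $f$
$x_1, \dots, x_n$ independent: $L(\theta; x_1, \dots, x_n) = \prod_{j=1}^{n} p(X_j = x_j \mid \theta)$
good point estimator
unbiased: $E(\hat\theta_n) = \theta$
consistent: $\hat\theta_n \xrightarrow{P} \theta$
mean squared error $E((\hat\theta_n - \theta)^2)$ is small
confidence interval: $P(\hat\theta_n^- \le \theta \le \hat\theta_n^+) \ge 1 - \alpha \Leftarrow P(a \le g(\theta) \le b) \ge 1 - \alpha$
$N$, $t$: let $-b = a$; $\chi^2$: let $a = \chi^2_{\alpha/2}(n)$, $b = \chi^2_{1 - \alpha/2}(n)$
deal with $X_i \sim N(\mu, \sigma^2)$
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
$\sigma = \sigma_0 \Rightarrow \frac{\bar{X} - \mu}{\sigma_0 / \sqrt{n}} \sim N(0, 1) \Rightarrow \mu$
$\frac{\bar{X} - \mu}{S_n / \sqrt{n}} \sim t(n - 1) \Rightarrow \mu$
$\mu = \mu_0$: $\frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu_0)^2 \sim \chi^2(n) \Rightarrow \sigma$
$\frac{(n - 1) S^2}{\sigma^2} \sim \chi^2(n - 1) \Rightarrow \sigma$
hypothesis testing
null hypothesis $H_0$
type I/II error: reject $H_0$ when it is true / fail to reject $H_0$ when it is false
level of significance $\alpha$: probability for type I error
power of the test $1 - \beta$, $\beta$: probability for type II error
p-value: probability, assuming $H_0$ is true, of observing a result at least as extreme as the one observed
reject if p-value < $\alpha$
for $\{X_i \sim N(\mu, \sigma_0^2)\}$ i.i.d., use $\frac{\bar{X} - \mu_0}{\sigma_0 / \sqrt{n}} \sim N(0, 1)$
for two samples, $T = \frac{(\bar{X}_1 - \mu_1) - (\bar{X}_2 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim t(df)$, $df = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{S_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{S_2^2}{n_2}\right)^2}$
linear regression
linear least squares: $m = \frac{\sum_{j=1}^{n} (x_j - \bar{x})(y_j - \bar{y})}{\sum_{j=1}^{n} (x_j - \bar{x})^2}$, $b = \bar{y} - m\bar{x}$
correlation coefficient: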
r=corr(X,Y)=[∑j=1nxj2−(∑j=1nxj)2][∑j=1nyj2−(∑j=1nyj)2]n∑j=1n(xjyj)−∑j=1nxj∑j=1nyj standard statistical model: Yj = β0 + β1xj + ej, j = 1, 2, …, n E(ej) = 0, var(ej) = σ2 intercept β0, slope β1, residual ej estimators ⎩⎨⎧β1=lXX1∑j=1n(xj−x)(yj−y)=lXX1(∑j=1nxjyj−nxy)β0=n∑j=1nxj2−(∑j=1nxj)2∑j=1nxj2∑j=1nyj−∑j=1nxj∑j=1nxjyj lXX=j=1∑n(xj−x)2=j=1∑nxj2−nx2 E(β̂0) = β0, E(β̂1) = β1 var(β0)=(n1+lXXx2)σ2,var(β1)=lXXσ2,cov(β0,β1)=−lXXxσ2 if ej ∼ N(0, σ2) i.i.d., then β̂0, β̂1 normal dist sum of squared errors SSE: Se2=∑j=1n(Yj−β0−β1xj)2 σ2=n−2Se2 E(σ2)=σ2,σ2Se2∼X2(n−2),σ2 independent of β̂0, β̂1 σknownσ/lXXβ1−β1∼N(0,1)⇒β1 σ/lXXβ1−β1∼t(n−2)⇒β1 σknownσ/n1+lXXx2β0−β0∼N(0,1)⇒β0 σ/n1+lXXx2β0−β0∼t(n−2)⇒β0 1 − α confidence interval for β̂0, β̂1 1 − α confidence interval for σ2 [X2α2(n−2)(n−2)σ2,X1−2α2(n−2)(n−2)σ2] y0∼N(β0+β1x0,(n1+lXX(x0−x)2)σ2) 1 − α confidence interval for y0 = E(ŷ0): [y0−t1−2α(n−2)σn1+lXX(x0−x)2,y0+t1−2α(n−2)σn1+lXX(x0−x)2] linear system: {a1x1+…anxn=b… equivalent ∼—same solution set consistent—have solution; inconsistent—no solutions coefficient matrix: [a1………an…] augmented matrix: [a1………an…b…] row operation replacement interchange scaling row equivalent—same after proper row operation ⇒ equivalent matrix entry—ai**j o**r ai, j o**r A[i, j]o**r Ai, j—the (i, j) entry of A leading entry—leftmost entry of a nonzero row (row) echelon form—step-like, leading entry going in to the right echelon matrix reduced echelon form—all leading entries are 1, entries below them are 0 an echelon form (REF) of… the reduced echelon form (RREF) of… only 1 for each matrix pivot position—position corresponding to the leading entry of its reduced echelon form pivot column—the column containing a pivot position pivot—nonzero number in a pivot position row reduction forward phase—make pivots the leading entries backward phase—eliminate non-pivots in pivot column partial pivoting—use greatest entry in a column as the pivot for programming solve linear system basic variable—corresponding to 
pivot column free variable—not so element(s): 0—inconsisten**t o**r 1—consisten**t o**r ∞ parametric description—⎩⎨⎧basicvariable1=f(freevariable1,…)… back-substitution—solve for the basic variables one by one dealing with echelon form existence and uniqueness theorem—consistent⎩⎨⎧1uniquesolution,ifnofreevariable∞solutions,ifatleastonefreevariable ⇔ no rightmost column as pivot column vector equation— x1a1 + … + xnan = b column vector solution for vector equation x1a1 + … + xnan = b equivalent to solution of [a1…anb] Spa**n{v1,⋯, vp}—subset of ℝn spanned (/generated) by v1,⋯, vp—the set of all linear combinations of v1,⋯, vp b ∈ spa**n{a1, ⋯, an} ⇔ A has a pivot in every row matrix equation Ax = b: Ax=[a1…an]x1⋯xn=x1a1+…+xnan =a1⋮amx=a1⋅x⋮am⋅x bi=∑k=1n(ai,k⋅xk) Ax = b solution equivalent to x1a1 + … + xnan = b existence of solutions—has solution iff b ∈ Spa**n{a1, ⋯, an} {v1, ⋯, vp} spans (/ generates) ℝm if Spa**n{v1, ⋯, vp} = ℝm identity matrix In, {Ij,j=1,j=1,⋯,nIj,k=0,j=k Inx = x ∀ x ∈ ℝn homogeneous linear system—can be writen as Ax = 0 opposite: heterogeneous at least 1 solution x = 0, a.k.a. 
trivial solution nontrivial solution iff ∃free variable parametric vector form— x**=su** + tv (s, t ∈ ℝ) parametric vector equation of the plane solution representation vector parameter vector form— x = p + tv (t ∈ ℝ) the equation of the line through p parallele to v span solution set of Ax = b— {w|w = p**+vh} i**f ∃solutio**n p ∀solutio**n o**f Ax** = 0 vh linear dependency of matrix columns a1⋯an of A are linearly independent ⇐ Ax = 0 has only the trivial solution dependent ⇐ ∃ c1⋯cn, c1a1 + ⋯ + cnan = 0 more vectors than entries in each vector ⇒ linearly dependent linearly dependent ⇔ at least one vector = linear combination of the others ⇒ ∃vj (j > i), vj is a linear combination of v1…vj − 1 if {v1…vp} is linearly dependent AND v1 ≠ 0 contains 0 ⇒ linearly dependent linear transformation transformation (/ function / mapping) T : ℝn → ℝm domain ℝn codomain ℝm image T(x) range: set of all images matrix transformation x ↦ Ax ⇔ ∀ u**,** v, c, d, T(cu + dv) = c**T(u) + d**T(v) kernal Nul A range Col A matrix operations diagonal entry—ai**i main diagonal diagonal matrix—square matrix with all non-diagonal entries 0 zero metrix 0 vector multiplication AB=A[b1⋯bp]=[Ab1⋯Abp] A(Bx)=(A**B)x row-column rule—(AB)ij=ai1b1j+⋯ainbnj=∑k=1naikbkj *AB=a1⋮am[b1⋯bp]=a1b1⋮amb1⋯⋱⋯a1bp⋮ambp A is right-multiplied by B, B is left-multiplied by A A, B commute—A**B = B**A power of matrix Ak=∏i=1kA A0 is the identity matrix transpose matrix AT (A**B)T = BTAT matrix inversion invertible—for n × n matrix A, ∃ n × n matrix C, C**A = In = A**C C: an inverse of A (non)singular matrix—(not) invertible 2 × 2 matrix A=[acbd]⇒A−1=detA1[d−c−ba] iff det A = a**d − b**c ≠ 0 ∀ invertible n × n matrix A, b ∈ ℝn, Ax = b ⇒ x**=**A−1b (A**B)−1 = B−1A−1 (AT)−1 = (A−1)T only product of invertible matrices is invertible for n × n matrix: A is invertible ⇔ row equivalent to In ⇔ linear transformation x**↦Ax** is one-to-one ⇔ ∀ b**∈ℝn, Ax** = b has solution ⇔ linear transformation x**↦Ax** maps ℝn ***o***nto ℝn 
⇔ ∃ n × n matrix C, C**A = I ⇔ ∃ n × n matri**x D, A**D = I ⇔ AT is invertible ⇔ det A ≠ 0 ⇔ columns of A forms a basis of ℝn ⇔ Col A = ℝn ⇔ 0 is not an eigenvalue of A ⇔ det A ≠ 0 invertible linear transformation T : ℝn**↦ℝn—∃ S : ℝn↦**ℝn, ∀ x ∈ ℝn, S(T(x)) = x, T(S(x)) = x S—the inverse T−1 elementary matrix—obtained by performing a single elementary row operation on identity matrix roo**p A = E**A, where E = roo**p Im, roo**p is the single elementary row operation elementary matrix E is invertible, E−1 is another elementary matrix obtained by reversed row operation n × n matrix A is invertible iff row equivalent to In A−1 = roo**p In if In = roo**p A find A−1—row reduce [AI] and get [IA−1] LU fractorization—A = L**U A : m × n matrix L : m × m unit lower triangular matrix lower triangular matrix—entries above the main diagonal are 0s U : m × n echelon matrix implementation—Ax=b⇔Ly=bUx=y calculation: p. 145 determinant detA=∑j=1n(−1)i+jaijdetAij A is quare matrix usually use i = 1 the first row origin—the last pivot in echelon form of the matrix obtained without division Ai**j—obtained by deleting the ith row and jth column in A (i, j)-cofactor Ci**j = (−1)i + jdet Ai**j cofactor expansion across the ith row of A—detA=∑j=1naijCij triangular matrix A—detA=∏i=1naii the entries on the main diagonal row/ column operation replacement—same interchange—opposite scaling—scales the same way echelon form U by row replacements and r interchanges detA=⎩⎨⎧AisinvertibleAisnotinvertible(−1)r∏uii0whenwhen det A = det AT det A**B = (det A)(det B) proved using elementary matrices detA−1=detA1 if A is invertible det k**A = kndet A if A has n rows linearity property—linear transformation T(x)=det[a1⋯aj−1xaj+1⋯an] where only x is a variable Cramer’s rule—the unique solution of Ax = b has entries xi=detAdetAi(b) A is invertible n × n matrix, b ∈ ℝ Ai(b)=[a1⋯b⋯an] Laplace transfrom—converts system of linear differential equations to system of linear algebraic equations a way to find 
A−1—A(A−1)j = (In)j (A−1)ij=detAdetAi(In)j)=detACji adjugate/ classical adjoint of A—adjA=C11⋮C1n⋯⋱⋯Cn1⋮Cnn determinant as area or volume area of parallelogram of 2 × 2 matrix A is |det A| volume of parallelepiped of 3 × 3 matrix A is |det A| linear transformation T : ℝ2 → ℝ2 (ℝ3 → ℝ3) determined by 2 × 2 (3 × 3) matrix A ⇒ {are**a (volum**e) o**f T(S)} = |det A| ⋅ {are**a (volum**e) o**f S} vector space—10 axioms subspace—3 axioms needed 0 ∈ H H is closed under vector addition—u + v ∈ H H is closed under multiplication by scalars—cu ∈ H Spa**n{vi} is a subset of vector space V if vi ∈ V null space of A, Nul A—solution set of Ax = 0 —is a subspace of ℝn explicit discription—with the vectors that span Nul A basis—solve and represent with free variables dim Nul A= number of free variables column space of A, Col A—Spa**n{a1, ⋯, an} —is a subspace of ℝm Col A = {b : b**=Ax**, x ∈ ℝn} Col A = ℝm iff ∀b ∈ ℝm, ∃x, Ax = b basis—pivot columns dim Col A= number of pivot columns row space of A, RowA—span{a1⋯an} —is a subspace of ℝn = Col AT basis—pivot rows kernal/ null space of linear transformation T—T(u) = 0 basis Β of subset H—linearly independent, span H standard basis for ℝn {e1, ⋯, en} spanning set A small AP linearly dependent set A big AP coordinate system the unique representation theorem Β-coordinates of x—scalars that map them Β-coordinate vector of x—[x]B=c1⋮cn coordinate mapping x ↦ [x]Β it is a one-to-one linear transformation—isomorphism x = PΒ[x]Β, where change of coordinate matrix PΒ = [b1, ⋯, bn] dimension finite-dimensional vector space—spanned by finite set 0 dimension—{0} infinite-dimensional dim ℙn = n + 1 rank—dim Col A rank thm—ran**k A + dim Nul A = n an eigenvector corresponding to λ—x ≠ 0, Ax = λx, A i**s n × n an eigenvalue of A—λ main diagonal entries on triangular matrix 0 is an eigenvalue of A iff A is not invertible find eigenvector x given eigenvalue λ: Ax = λx ⇒ (A − λ**I)x = 0 (A − λ**I)T = AT − λ**I eigenspace of A—Nul (A − λ**I) 
characteristic equation of A—det (A − λ**I) = 0 ⇔ λ is an eigenvalue of A characteristic polynomial of A (with degree n) multiplicity of eigenvalue λ—the multiplicity as root similarity—∃P invertible, P−1A**P = B PBP−1 = A A, B are similar similarity transformation—A → P−1A**P same characteristic polynomial and same eigenvalues diagonalize—A = PDP−1 where D is diagonal diagonalizable—∃P, D ⇔ A has n linearly independent eigenvectors ⇔ ∑dimensions of eigenspaces = n ⇔ characteristic polynomial factors completely into linear factors ⇔ dimension of eigenspace for each λ= multiplicity ⇔ basis of all eigenspaces form a basis of ℝn ⇐ A has n distinct eigenvalues diagonizing—find eigenvalues ⇒ D; find eigenspace basis ⇒ P; check A**P = P**D vector operation inner product (dot product)—u ⋅ v**=**uTv length—∥v∥=v⋅v normalizing—get unit vector in the same direction as nonzero vector distance dis**t(u**,** v) = ∥u − v∥ orthogonal relationship orthogonal—u ⋅ v**=**0 ⇔ ∥u + v∥ = ∥u∥ + ∥v∥ orthogonal complement W⊥—the set of all vectors orthogonal to subspace W is a subset of ℝn interactive—A⊥ = B ⇔ B⊥ = A (Row A)⊥ = Nul A (Col A)⊥ = Nul AT orthogonal set—ui**⋅**uj = 0 ∀ i ≠ j ⇒ linearly independent is a basis for the subspace orthogonal basis—basis that is orthogonal set weight—cj=uj⋅ujy⋅uj for y ∈ Spa**n{uj} y=∑cjuj orthogonal projection— projLy=y=u⋅uy⋅uu y=y+z where z**⊥**u subspace L spanned by u orthonormal set—unit vector orthogonal set orthonormal basis—basis that is orthonormal set Gram-Schmidt process—calculate orthonormal basis by subtracting projection QR factorization—orthonormal basis of linearly independent matrix Q, R = QTA orthogonal matrix—square matrix with orthonormal columns ⇔ U−1 = UT ⇔ orthonormal rows ∥Ux∥ = ∥x∥ (Ux) ⋅ (Uy) = x ⋅ y Ux⊥Uy ⇔ x**⊥**y product of orthogonal matrix is orthogonal matrix orthogonal projection—projWy=y=∑ui⋅uiy⋅uiui projWy=∑(y⋅ui)ui=UUTy for U of orthonormal basis ui best approximation theorem—y is the closest point in W to y 
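As a small illustration of the Gram-Schmidt process and the projection formula $\mathrm{proj}_W y = \sum_i (y \cdot u_i) u_i$ from the notes above, here is a minimal sketch in plain Python. The function names and the sample vectors are made up for illustration, and the loop subtracts projections from the running vector (the "modified" variant), which gives the same result in exact arithmetic.

```python
import math


def dot(u, v):
    """Inner product u . v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Return an orthonormal basis for the span of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # q is a unit vector, so the projection weight is simply w . q.
            coeff = dot(w, q)
            w = [a - coeff * b for a, b in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        basis.append([a / norm for a in w])
    return basis


def project_onto_subspace(y, orthonormal_basis):
    """proj_W y = sum_i (y . u_i) u_i for an orthonormal basis {u_i} of W."""
    result = [0.0] * len(y)
    for u in orthonormal_basis:
        coeff = dot(y, u)
        result = [r + coeff * a for r, a in zip(result, u)]
    return result


# Hypothetical example: W is spanned by two vectors in R^3.
spanning = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
basis = gram_schmidt(spanning)
y = [1.0, 2.0, 3.0]
y_hat = project_onto_subspace(y, basis)
# The residual z = y - y_hat is orthogonal to W.
residual = [a - b for a, b in zip(y, y_hat)]
```

Here `y_hat` is the closest point in `W` to `y`, and `residual` is orthogonal to both spanning vectors, which is the content of the best approximation theorem.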
general least-square problem—find x for smallest least-square error ∥b − Ax∥ Ax=b=projColAb ⇔ normal equation—ATAx = ATb ran**k ATA = ran**k A Ax = b has a unique least-square solution ∀ b ∈ ℝm ⇔ A is linearly independent ⇔ ATA is invertible ⇔x=(ATA)−1ATb ⇔Rx=QTb⇔x=R−1QTb Χβ**=**y parameter vector—β observation vector—y least-square line (line of regression)—y = β0 + β1x observed value—yi predicted value—β0 + β1xi residual—β0 + β1xi − yi Χβ = y where X=1x1⋮⋮1xn,β=[β0β1],y=y1⋮yn residual vector ϵ diagonalization of symmetric matrix symmetric matrix—AT = A eigenvectors from different eigenspaces are orthogonal orthogonally diagonalizable—could have orthogonal matrix P A = PDP−1 = PDPT ⇔ A is symmetric spectral decomposition—A=∑λiuiuiT where ui are the columns of P uiuiT is a projection matrix—pro**j{ui}x**=**(uiuiT)x single value decomposition SVD—A = UΣVT left singular vectors form U is m × m orthogonal Σ=[D0(m−r)0(n−r)T0(m−r)×(n−r)] is m × n D=σ10⋱0σr, σ1 ≥ ⋯ ≥ σr > 0 right singular vectors form V is n × n orthogonal ∀ m × n matrix A, ATA is symmetic its eigenvalues λ1 ≥ ⋯ ≥ λi ≥ ⋯ ≥ 0 single values—σ1,⋯,σn=λ1,⋯,λn for {v1, ⋯****,vn} with eigenvectors of ATA correponding to λ1 ≥ ⋯ ≥ λi, and A has r nonzero singular values, {Av1, ⋯****,Avr} is an orthogonal basis for Col A, ran**k A = r find single value decomposition find orthogonal diagonalization of ATA—P, D′ V = P to orthonomal = {vi} D and Σ U—ui=σiAvi broader speech processing online/streaming mode: continuous conversion omnidirectional/ cardioid/ hyper cardioid microphone sound sample rate μ-curve audio format: use PCM online endpointing format: push to talk, hit to talk, continuous listening spectrogram: energy distribution over frequency vs time preemphasize speech signal: boost high frequency spreemp(n)=s(n)−αs(n−1)α=0.95 spectrogram: from time series to frequency domain auditory perception: from frequency to Bark log Mel spectrum: log of integration of each filter bank Mel cepstrum discrete cosine 
transform (DCT) compression longest common subsequence: dynamic programming, dummy char padding, streaming, search trellis, sub-trellis, lexical tree DTW: dynamic time warping: disallow skipping (vertical on trellis), non-linear mapping (super diagonal) Mahalanobis distance d(x,mj)=(x−mj)TCj−1(x−mj) d(x,mj)=21log((2π)D∣Cj∣)+(x−mj)TCj−1(x−mj) covariance Σ=σ11σ21⋮σd1σ12σ22⋮σd2⋯⋯⋱⋯σ1dσ2d⋮σdd self-transition penalty: model phoneme duration MFCC (Mel frequency cepstrum coefficient) dynamic time warping (DTW) hidden Markov model for DTW expectation-maximization (EM) algorithm initialize with k-means clustering auxiliary function: conditional expectation of the complete data log likelihood Q(Θ,Θ(t))=y∑p(y∣X,Θt)log(p(X,y∣Θ)) evaluate Θt+1 Θt+1=ΘargmaxQ(Θ,Θt) iterate until convergence forward algorithm Baum Welch: soft state alignment log-domain math log(xy)=log(x)2log(y)log(x+y)=log(x)+log(1+2log(y)−log(x)) continuous text recognition: small-scale problem, e.g. voice command grammar: only focus on syntax not semantics training with continuous speech: bootstrap & iterate N-gram state-of-the-art system phoneme: 39 English, Mandarin with tone multiple pronunciation: multiple internal model + probability mono-phone/ context-independent (CI) model di-phone: model previous and current phoneme. 
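- problem: cross-word effect
- tri-phone: model multiple variants of the current phoneme based on the previous and next phonemes
- inexact search: run n-gram on (n-1)-gram model by applying word-transition probabilities
- SGD > GD for online learning
- SGD problem: mini batch too big
- gated recurrent unit (GRU): an RNN
- principal component analysis (PCA): autoencoder, dimensionality reduction
- loss $\lambda_{ik}$: take action $\alpha_i$ when the sample belongs to $C_k$
- goal: minimize risk $R(\alpha_i \mid X) = \sum_{k=1}^{K} \lambda_{ik} P(C_k \mid X)$
  - for 0-1 loss, $R(\alpha_i \mid X) = 1 - P(C_i \mid X)$
  - reject class: a $(K+1)$th class w/ fixed loss $\lambda \in (0, 1)$
  - or, maximize discriminant function $g_i(x), i = 1 \dots K$
- maximum likelihood estimator (MLE)
- parametric (distribution based on known parameters) vs non-parametric
- $X \in \mathbb{R}^{N \times P}$, $Y \in \mathbb{R}^{N \times 1}$
  $$L(W) = \lVert XW - Y \rVert^2 = W^T X^T X W - 2 W^T X^T Y + Y^T Y$$
  $$\underset{W}{\arg\min}\ L(W) \Rightarrow \frac{\partial L(W)}{\partial W} = 0 \Rightarrow 2 X^T X W - 2 X^T Y = 0 \Rightarrow \hat{W} = (X^T X)^{-1} X^T Y$$
  $$\underset{W}{\arg\min}\ [L(W) + \lambda P(W)] \Rightarrow \hat{W} = (X^T X + \lambda I)^{-1} X^T Y$$
  - assuming $P(W) = \lVert W \rVert^2$
  - $X^T X + \lambda I$ invertible, proof by positive definiteness
- need to try different $K$
- $y = -1, 1$
- objective, maximize minimum margin:
  $$\underset{W, b}{\arg\max}\ \min_{i = 1 \dots N} \frac{1}{\lVert W \rVert} \lvert W^T X_i + b \rvert \ \text{s.t. } y_i (W^T X_i + b) > 0$$
  $$\Rightarrow \underset{W, b}{\arg\max}\ \frac{1}{\lVert W \rVert} \cdot 1 \ \text{s.t. } y_i (W^T X_i + b) \ge 1$$
  $$\Rightarrow \underset{W, b}{\arg\min}\ \frac{1}{2} \lVert W \rVert^2 \ \text{s.t. } y_i (W^T X_i + b) \ge 1$$
  - by $\lvert W^T X_i + b \rvert = y_i (W^T X_i + b)$ and $\min_{i = 1 \dots N} \lvert W^T X_i + b \rvert := 1$
- apply Lagrange multiplier:
  $$L(W, b, \lambda) = \frac{1}{2} \lVert W \rVert^2 + \sum_{i=1}^{N} \lambda_i \left(1 - y_i (W^T X_i + b)\right)$$
  $$\Rightarrow \min_{W, b} \max_{\lambda_i \ge 0} L(W, b, \lambda) \Rightarrow \max_{\lambda_i \ge 0} \min_{W, b} L(W, b, \lambda)$$
  $$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial b} = 0 \Rightarrow \hat{W} = \sum_{i=1}^{N} \lambda_i y_i X_i,\ \sum_{i=1}^{N} \lambda_i y_i = 0$$
  $$\Rightarrow \hat{L} = -\frac{1}{2} \lVert \hat{W} \rVert^2 + \sum_{i=1}^{N} \lambda_i = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j y_i y_j X_i^T X_j + \sum_{i=1}^{N} \lambda_i$$
- final objective:
  $$\underset{\lambda_i \ge 0}{\arg\min}\ (-\hat{L}) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j y_i y_j X_i^T X_j - \sum_{i=1}^{N} \lambda_i$$
- solution: sequential minimal optimization (SMO)
- solve non-linear problem w/ linear classifier
- kernel, but w/o $\varphi$ restriction
- objective:
  $$\underset{\lambda_i \ge 0}{\arg\min}\ (-\hat{L}) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j y_i y_j K(X_i, X_j) - \sum_{i=1}^{N} \lambda_i$$
- a positive-definite kernel outputs a positive-definite matrix
- alternative definition of kernel:
- lossy transformation from $p$ to $q$ dimensions
- maximize variance: $\underset{u}{\arg\max}\ u^T S u$ s.t.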
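$u^T u = 1 \Rightarrow \underset{\lambda}{\arg\max}\ L(u, \lambda) = u^T S u + \lambda (1 - u^T u) \Rightarrow S u = \lambda u$

- minimize distance between full projection ($p$-dimensional) and principal component analysis (PCA, $q$-dimensional):
  - objective: $\underset{u_k}{\arg\min}\ \frac{1}{N} \sum_{i=1}^{N} \lVert \hat{X}_i - X_i' \rVert^2 = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=q+1}^{p} (X$
- for value $f$ following distribution with PDF $p$, do not know CDF, want expectation
  - define distribution $q(z)$ with known CDF
  - weight (importance) $\omega_i := \frac{p(z_i)}{q(z_i)}$
  $$E(f) := \int f(z) p(z)\,dz = \int f(z) \frac{p(z)}{q(z)} q(z)\,dz \approx \int \omega_i f(z) q(z)\,dz \approx \frac{1}{L} \sum_{i=1}^{L} \omega_i f(z_i)$$
- when seeking minimum, accept increase in $f$ with probability $e^{-\frac{\Delta f}{T}} < 1$
  - where temperature $T \in (0, 1)$, $T \leftarrow T\gamma$, $\gamma = 0.99 \in (0, 1)$
- Markov chain in stationary distribution if $\pi(x) p_{xy} = \pi(y) p_{yx}$
- want to sample target distribution $p$
  - design Markov chain w/ stationary distribution $\pi = p$:
- want to sample variable $x_i$ following different distribution
  - fix $x_{-i} := \{x_1 \dots x_{i-1}, x_{i+1} \dots\}$ to previous value when sampling $x_i$: $Q_{x, x^*} := p(x_i^* \mid x_{-i})$
  - a special case of the Metropolis-Hastings method $\Leftarrow p(x_{-i}^*) = p(x_{-i}) \Rightarrow$
  $$\alpha(x, x^*) = \frac{p(x^*)\, p(x_i \mid x_{-i}^*)}{p(x)\, p(x_i^* \mid x_{-i})} = \frac{p(x_i^* \mid x_{-i}^*)\, p(x_{-i}^*)\, p(x_i \mid x_{-i}^*)}{p(x_i \mid x_{-i})\, p(x_{-i})\, p(x_i^* \mid x_{-i})} = \frac{p(x_i^* \mid x_{-i})\, p(x_{-i})\, p(x_i \mid x_{-i})}{p(x_i \mid x_{-i})\, p(x_{-i})\, p(x_i^* \mid x_{-i})} = 1$$
- $\mathrm{sup} = \ln \frac{1}{p(x)} = -\ln p(x)$
- empirical risk $R_n(f)$ close to true risk $R(f)$
  - $P(\lvert R(f) - R_n(f) \rvert \ge \varepsilon) \to 0$ as $n \to \infty$
  - for finite class $F = \{f_i\}, i = 1, \dots, m$: $P(\lvert R(f) - R_n(f) \rvert \ge \varepsilon) \le 2m \exp(-2n\varepsilon^2)$
  - for infinite class $F$: $P\left(\sup_{f \in F} \lvert R(f) - R_n(f) \rvert > \varepsilon\right) \le 2 N(F, 2n) \exp\left(-\frac{n\varepsilon^2}{4}\right)$
  - proof:
  $$P(\lvert R(f) - R_n(f) \rvert \ge \varepsilon) \le P\left(\sup_{f \in F} \lvert R(f) - R_n(f) \rvert \ge \frac{\varepsilon}{2}\right) \le 2 P\left(\sup_{f \in F} \lvert R_n(f) - R_n'(f) \rvert \ge \frac{\varepsilon}{2}\right) \quad \text{by Symmetrization Lemma}$$
    - where $R_n'(f)$ is the empirical risk of another $n$ samples (ghost sample)
    - $\Rightarrow \exists c \le N(F, 2n)$ classes of $f$ for sample & ghost sample
  - problem: hard to compute shattering coefficient
- maximum number of $F_{X_1, \dots, X_n}$, functions we can get by restricting $F$ to $X_1, \dots, X_n$
- maximum $n$ s.t. $\exists f \in F, \forall X = \{X_1, \dots, X_n\}$, $f$ classifies $X$ completely correctly
- richness of function class $F$ in sample set $\{X_i\}$
  - sample label $\sigma_i = \pm 1, i = 1, \dots, n$
  $$\mathrm{Rad}_n(F) = E\left[\sup_{f \in F} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(X_i)\right]$$
  - solution: for each possible set of $\sigma_i$, find the function $f$ to maximize the inner sum, then take the weighted average
- to balance training error and model complexity

Segmentation fault (brain dumped).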
And really, what’s with the love for all those programming languages? Trying to set a record for indecisiveness?

According to GitHub, I used over 30 computer languages, making me a prime target for GitHub Roaster. Unfortunately, my lasting addiction to exploring different programming languages has not found me a perfect match. I have been tolerating Python and Rust for most of my projects, with Python for scripting and Rust for when the Python code becomes too complex or slow.

Sometimes, I wonder how my language learning journey influenced my current choices and preferences. Would I have ended up choosing Go if I had not started with Java? Maybe I would have been a Kotlin boy if I had not learned Rust? In this memo, I would like to recall my journey, including the dark history I do not bring up, and look at how I ended up stuck with the snake and the crab languages and said no to the rat and the girl languages.

I said I started with Java, but that is technically not true. When I was in 6th grade, I was left at a friend’s house for a few days. During the day, I was alone, and looked around their house for books to read, one of them being a C introductory book. I remember reading a few chapters about control flow, thinking that it was nothing nerdy or high-profile. In retrospect, those few hours probably had a significant impact later.

In the IT course in my middle school, we were taught a CAPITALIZED LANGUAGE for programming graphical interfaces. This was probably Visual Basic. I remember the absolute chaos in the computer labs, with everyone trying to get online and play video games and the teacher hardly having the patience to teach. Despite how bad I was at typing tests, I aced the programming exam, probably because of the C book I had read a year earlier.

I moved on to writing automation scripts to play video games in high school. Those programs were much more sophisticated than the BASIC ones because some of them were meant to automate finishing game levels.
That was probably when I learned to debug by running the program again and again, but I apparently did not gain a deep understanding—sometimes things just did not work without apparent reasons. Going into university, I swore I would not become a programmer because computers are so unreliable and annoying. Boy, was I wrong. I now blame that feeling on Microsoft Windows. The experience learning my first programming language fundamentally set my expectations for any programming languages and influenced my taste to this day. Entering university, my perception of programming was completely changed. I was shocked in Physics and Chemistry classes seeing people programming Python in the labs to run experiments. There were master’s students persuading me to learn computer science, saying that a computer science major can move to most other fields easily, but not the other way around. Suddenly, I wanted to try programming again and started watching Python video tutorials online. Unfortunately, I did not learn Python by watching videos, and computer science at my university sucked. My peers told me that the entry level computer science course on Python was a waste of time, and advised me to straight-up take the higher-level course on Java. I took their advice and started learning Java instead of Python. I should thank these people for changing my life if I remember who they were. Java, despite being the meme boilerplate-driven instant-legacy language, was a good first language, in retrospect. Unlike Python, which gives its learners false confidence about their abilities to program (despite often not being able to type out correct syntax), Java causes pain in all areas of its learners’ brains, greatly helping them remember the syntax. Although instructors like to teach programming like a philosophy, it is not. Getting started at programming is more like learning to play a musical instrument or sport—you need to build muscle memory so you can play at all! 
Then, you can learn to appreciate the philosophy. This is why I programmed more smoothly in languages with more syntax and later felt at home with Rust.

Another huge influence Java had on me was the development environment. I followed video instructions to learn IntelliJ IDEA. To this day, I still see JetBrains IDEs as absolute models in terms of auto-completion, refactoring, and code suggestions, despite how bloated and non-customizable they are. This is why I find VSCode lacking for Java and Kotlin, and why I did not like Python, Julia, or Ruby when I tried them: I got no method autocompletion or type hints in the editor. Python has since caught up with various language servers, but the lacking editor experience is still holding me back from Julia the girl language.

Like it or not, I was forced to learn Python again and use it for my first collaborative project. Another student was scraping our university websites and making a custom search engine. I did not grasp Python concurrency and had speed problems crawling the websites. I blamed this on Python being slow.

Since Python was so slow, my logical choice was to speed it up, but I ended up in a completely different rabbit hole, or rather a “crab” hole. The seemingly lazy methods like Cython and PyPy all showed major pitfalls quickly, so I started watching videos on how to write Python packages in C and C++, and disliked how they awkwardly incremented and decremented reference counts by hand. By pure chance, I heard one of the videos mention “you might as well use Rust”. “There is another language that is not C or C++ that you can write Python packages in??” I was surprised. As I dug deeper into what Rust was, it sounded too good.
It was as fast as C++; but unlike C++, it had Python-like syntax I could understand, straightforward high-level abstractions like Java, error messages that say what is wrong, and a package manager; and it did not

That was when I deviated from trying to write Python packages in Rust to directly writing Rust programs. I found Rust libraries for HTTP requests, and learned how to do basically the same things I did with the Python crawler in async Rust. Oh boy, was async Rust bait! I shot myself in the foot so many times: mutex guards across

The instructor advising the search engine project, however, found Rust too complicated. “I think it is beyond you,” he even suggested. I was pissed. That drove me to finish the Tokio book and later learn more advanced topics in Rust. One thing he said was true, though: having to wait for the code to compile sucks. However, these problems are tiny compared to the lack of an established ecosystem, which I gradually realized when trying to replace all the Python packages we used.

All in all, Rust was and has been a great experience for me. It requires thinking thoroughly about the task at hand and being explicit about all the types and sometimes the lifetimes; in exchange, it offers a rich development environment, robust compile-time checks, and top-tier speed. Rust also introduced me to pattern matching, algebraic data types, and the iterator pattern, which are the main reasons I say no to other imperative languages.

As I mentioned, I have never been fully satisfied with Python and Rust, and have peeked at a dozen other languages. I have wanted something that helps out with extensive compile-time checks like Rust but without the type gymnastics, and taps into a vast ecosystem like Python but without the atrocious performance and multicore solutions.

The languages that stand out are the ones I tried for the ecosystem argument: they have packages or frameworks for specific tasks, like Ruby on Rails.
I still place these languages as my go-to for specific tasks.

Unfortunately, since I was spoiled by some of Rust’s outstanding benefits, other languages I tried did not stick:

I realize it has become very difficult for me to adopt another language for general purposes. For any language, I would want a supportive development environment, “functional programming features” like pattern matching, potential for high performance, and a large ecosystem. I have been vendor-locked in…

Started 2024-12-27, finished 2025-01-03

I optimized my route analyzer to run 300× faster. It analyzed around 800 million routes within 3 hours.[1] The exact multiple of the speedup does not matter; rather, I find it valuable to discuss the data structure changes, profiling, and algorithm experiments, the major gains, and the wasted effort.

No, I did not rewrite a Python implementation to gain a trivial 50× speedup; I started with a multithreaded Rust implementation. I bet I would have lost all my hair if I had tried to squeeze out this much performance from some garbage-collected language!

Note that I aim this article at any programmer at the intermediate level or above. Despite the Rust-like pseudocode, you should feel at home if you are familiar with the syntax of Python or any C-family language. Let’s start!

The main bottleneck of the route analyzer lay in finding inexact matches of routes’ IP address prefixes in nested sets. In our research, this matching is a core functionality for matching routes on the Internet against public routing policies. The technical details of the research do not matter here. Instead, let’s look at the context from a programming perspective to understand the optimizations!

As the pseudocode below illustrates, our task is to

The example pseudocode below checks if the prefix

To understand the problem, I started with a straightforward implementation.

Explanations:

For starters, profiling and standard data structure changes were the lowest-hanging fruit.
Although I lost the exact performance record of early implementations, my benchmark code checked only 256 routes and clearly took minutes to run. Therefore, I realized that the naive implementation would not scale, and immediately started profiling the benchmark code to find where most of the time was spent. Surprisingly, Cargo-FlameGraph showed that most of the time was spent on checking if set names have been visited ( The other effective change was to replace the maps. The hash map being only slightly faster than the B-tree map was no surprise. The B-tree map is cache-efficient, and avoids the somewhat expensive string hashing. This means, for smaller maps, the B-tree may be faster. Though, it turned out that my maps are large enough and my string keys are small enough, such that the hash map won. These trivial changes turned out to be clear wins, and show some basics of performance optimization: Besides the bottleneck from This classic linear search naturally reminded me of binary search. Though, a regular binary search does not suffice because range operators make prefix matching inexact, therefore individual checks made sense. However, range operators do not permit arbitrary matches; they at most relax the matching to specific ranges. Meanwhile, sorting brings similar prefixes close to each other. Combining these two facts, I created an optimized algorithm based on binary search: Function Although this custom binary search is rather a hack out of intuition than a mathematically proven algorithm, I verified it against 26 million routes and got the same results as that of linear search. Hence, I deem it correct. Prof. Italo Cunha also suggested using a prefix trie in place of the vector, but benchmarks indicated it was only about half as fast as the binary search,4 so I stuck with what we have. In theory, the trie may be imbalanced, causing long cache-unfriendly traversals, especially for large prefix sets. 
Vectors are obviously the most cache-friendly… which brings up the motivation for another optimization.

Conceptually, our custom binary search saves us from checking the prefixes far from

At this point, the analyzer was processing 26 million routes in 48 minutes,[4] or 9.3 routes per millisecond, a tolerable speed. However, I was too impatient to wait more than one day to analyze 800 million routes.

After reading about Bloom filters, it was tempting for me to eliminate more bottlenecks from the

This proves expensive because of hashing and memory access. First,

With this analysis, I initially hypothesized that the repeated probing could be a major bottleneck. I knew that Rust’s standard library

Therefore, I implemented BloomHashSet as an insertion-optimized set. I reused the hash table from HashBrown’s “raw” module, and added a bit vector for the Bloom filter. Initially, I used a Bloom filter package that hashed each name four times, but soon realized hashing is slow. Thus, I only hashed once ($k = 1$) and reused the hash for both the Bloom filter and the hash table. To achieve a false positive rate of $\varepsilon = 6\%$, I gave the Bloom filters 16× the capacity of the hash table ($m/n = 16$). Experiments revealed that I needed to preallocate an astonishing capacity of $2^{14}$ for the hash tables to hold all the prefixes in large nested sets. Finally, I reused the same hash for both the lookup and the insertion.[6]

These efforts provided a speedup of around 10%, which was not worth it in my opinion. The preallocation and hash reuse, the trivial parts, probably provided most of the speedup, not the Bloom filter, the fancy part. After further reading HashBrown’s source code, I realized my initial probing hypothesis makes little sense:

Lessons learned:

We have come a long way in removing performance bottlenecks; however, the overall algorithm remains: we are still calling recursive functions in a loop:

Effectively, this recursion flattens each nested set on the fly. When talking to Prof.
Harsha Madhyastha, I explained how caching the flattened prefix sets is untenable because of their massive sizes. Indeed, attempting to flatten every nested set to its prefixes consumed all 300 GiB of RAM and caused our server to hang! Hence, further flattening prefix sets is clearly infeasible. Not a satisfying answer!

After being unhappy for a few days, I suddenly rediscovered a tenable caching strategy: flattening outside-in instead of inside-out. Flattening prefixes takes too much memory because each prefix (

Below are two of the few surviving flame graphs that show the differences in profiling.

With this caching optimization, the analyzer achieved its current performance of processing 779 million routes in 2 hours 49 minutes,[7] or 76.9 routes per millisecond. This speed is 8× that of the first version using binary search, and over 300× that of the initial version if you do the arithmetic.

Notice how most current optimizations are very basic:

So are the ideas behind them:

Last but not least, the moderate CPU, memory, and conceptual overhead of Rust made these optimizations somewhat easy.

I procrastinated on finishing this article for two whole months because it is quite non-trivial to write! My goal was to separate the optimization experience from the research context and Rust programming, but, as you can see, they are tightly coupled.

I started by trying to lay down a simplified explanation of the routing analysis problem with several mathematical definitions, and even drew a diagram to illustrate their relationships. However, I soon realized that few people would want to read such definition-heavy text. Therefore, I resorted to throwing away most of the routing context by renaming the variables and functions. For ease of reading, I converted the topic to a programming problem and explained it with pseudocode. I am moderately happy about the outcome of these efforts.

Started 2024-05-30, finished 2024-07-29.
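To make the Bloom filter parameters from the BloomHashSet experiment concrete, here is a rough Python sketch. It is not the Rust BloomHashSet: the class name `OneHashBloomSet`, the set name `"AS-EXAMPLE"`, and the tiny capacity are made up for illustration, and the rate uses the standard approximation $(1 - e^{-kn/m})^k$. With one hash ($k = 1$) and 16 bits per item ($m/n = 16$), that approximation gives about 6%.

```python
import math


def bloom_false_positive_rate(bits_per_item: float, num_hashes: int = 1) -> float:
    """Standard approximation (1 - e^{-k n / m})^k, with m / n = bits_per_item."""
    return (1.0 - math.exp(-num_hashes / bits_per_item)) ** num_hashes


class OneHashBloomSet:
    """Toy insertion-oriented set: a one-hash Bloom filter in front of a set.

    Python's built-in set hashes the key again internally, so this sketch only
    shows the control flow, not the hash reuse of the Rust version.
    """

    def __init__(self, capacity: int, bits_per_item: int = 16):
        self.num_bits = capacity * bits_per_item
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.items = set()

    def insert(self, name: str) -> bool:
        """Insert `name`; return True if it was not present before."""
        index = hash(name) % self.num_bits  # one hash picks one bit (k = 1)
        byte, mask = index // 8, 1 << (index % 8)
        # A clear bit proves absence, so the exact lookup can be skipped.
        if self.bits[byte] & mask and name in self.items:
            return False
        self.bits[byte] |= mask
        self.items.add(name)
        return True


rate = bloom_false_positive_rate(bits_per_item=16, num_hashes=1)  # about 0.06
visited = OneHashBloomSet(capacity=1024)
first = visited.insert("AS-EXAMPLE")   # True: newly inserted
second = visited.insert("AS-EXAMPLE")  # False: already present
```

The design point the sketch tries to capture is that a clear bit lets an insertion skip the exact membership check, while a set bit still requires it because of false positives.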
I ran the experiments on a server with dual EPYC 7763 64-Core processors with hyper-threading turned off to avoid IOMMU issues.

This is simplified Rust-like pseudocode based on my real code. I removed the inessential references (

In the real code, we record any missing entries instead of ignoring them.

This Pull Request shows the trie is slower: Replace

Here is the Issue that records the current performance: RIBs processing performance.

When reviewing the submission instructions for a conference, I was reminded that the figures in my paper need to be “accessible” when printed in grayscale. Usually, this would be done by adding patterns like triangles and circles in the plot. However, we were plotting stacked areas like the left one below, which contains dense and tiny vertical bars, so a grayscale gradient seemed to be our only option.

I found nice grayscale-gradient “sequential” Matplotlib colormaps like the right one above, but they make it impossible to hand-pick a distinct and pretty color for each category. “This is not a big deal,” I thought, “I’ll just use HSL (hue-saturation-lightness), set s to 1, l to a gradient, and hand-pick the h.” Except I was so wrong: lightness is not grayscale, and figuring everything out took me the whole morning.

After wasting an hour failing to find any implementation on the Internet, I rolled my own scripts to calculate RGB color values from hue and grayscale, which allowed hand-picking hues while still having a perfect grayscale gradient:

Below is the definition of these six colors. I hand-picked the hue values so each color looks natural, but the grayscale values are sampled linearly.

As a trade-off, I spent some time doing math on a whiteboard.

After some Wikipedia’ing, I extracted the formulas for RGB-HSL and RGB-grayscale conversions, and did the math on the whiteboard above.
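The scripts themselves are not reproduced here. As a rough idea of what such a conversion can look like, here is a minimal Python sketch that follows the derivation in the rest of this article, assuming saturation fixed at 1. All function and constant names are made up for illustration, and `srgb_to_linear` uses the standard inverse of the sRGB curve.

```python
# Grayscale weights for linear RGB (the constants R, G, B in the derivation).
WEIGHTS = {"r": 0.2126, "g": 0.7152, "b": 0.0722}
# Channel order (max, mid, min) for each 60-degree hue sextant, h' in [0, 6).
ORDERS = ("rgb", "grb", "gbr", "bgr", "brg", "rbg")


def srgb_to_linear(c):
    """Inverse of the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """sRGB transfer curve for one linear channel in [0, 1]."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def grayscale(r, g, b):
    """Grayscale p of linear RGB values."""
    return WEIGHTS["r"] * r + WEIGHTS["g"] * g + WEIGHTS["b"] * b


def hue_gray_to_linear_rgb(hue, gray):
    """Linear (r, g, b) with saturation 1, hue in degrees, and grayscale `gray`."""
    h = (hue % 360) / 60
    sextant = int(h) % 6
    frac = h - int(h)
    # k = (y - z) / (x - z); it alternates between h' - 2m and 2m + 2 - h'.
    k = frac if sextant % 2 == 0 else 1 - frac
    top, mid, low = ORDERS[sextant]
    kx, ky, kz = WEIGHTS[top], WEIGHTS[mid], WEIGHTS[low]
    # First case: z = 0, valid while x + z < 1.
    x = gray / (kx + ky * k)
    y, z = k * x, 0.0
    if x >= 1:
        # Second case: x = 1, valid when x + z >= 1.
        x = 1.0
        z = (gray - kx - ky * k) / (ky * (1 - k) + kz)
        y = (1 - k) * z + k
    channels = {top: x, mid: y, low: z}
    return channels["r"], channels["g"], channels["b"]


def hue_gray_to_srgb(hue, gray_srgb):
    """sRGB color whose grayscale, encoded in sRGB, equals `gray_srgb`."""
    linear = hue_gray_to_linear_rgb(hue, srgb_to_linear(gray_srgb))
    return tuple(linear_to_srgb(c) for c in linear)


# Hypothetical usage: six hand-picked hues on a linear sRGB grayscale ramp.
hues = [0, 30, 120, 200, 260, 320]
colors = [hue_gray_to_srgb(h, 0.2 + 0.12 * i) for i, h in enumerate(hues)]
```

The hues and the grayscale ramp in the usage lines are placeholders, not the six colors from the figure.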
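For a color with linear red, green, blue values $r, g, b \in [0, 1]$, hue $h \in [0, 360)$, saturation $s \in [0, 1]$, lightness $l \in [0, 1]$, and grayscale $p \in [0, 1]$:

$$
\begin{aligned}
x &:= \max\{r, g, b\} && \text{maximum RGB value} \\
y &:= \operatorname{mid}\{r, g, b\} && \text{middle RGB value} \\
z &:= \min\{r, g, b\} && \text{minimum RGB value} \\
c &:= x - z && \text{chroma} \\
h &=: h' \cdot 60° && \tfrac{1}{60} \text{ hue} \\
h' &= \begin{cases}
\frac{g - b}{c} \bmod 6 & \text{if } x = r \\
\frac{b - r}{c} + 2 & \text{if } x = g \\
\frac{r - g}{c} + 4 & \text{if } x = b
\end{cases} \\
l &= \frac{x + z}{2} \\
s &= \frac{c}{1 - |2l - 1|} && \text{if } l \ne 0, 1 \\
R &:= 0.2126,\ G := 0.7152,\ B := 0.0722 \\
p &= Rr + Gg + Bb
\end{aligned}
$$

Recall that we know $h$ and $p$, and want to solve for $r$, $g$, and $b$. The easiest bit is exploiting saturation $s = 1$ and the formula for it:

$$
1 = \frac{x - z}{1 - \left|2 \cdot \frac{x + z}{2} - 1\right|}
\Rightarrow
\begin{cases}
x = 1 & \text{if } x + z \ge 1 \\
z = 0 & \text{if } x + z < 1
\end{cases}
$$

The many conditions of the hue are a nightmare to simplify. Fortunately, I observed the formula for $h'$ and found out that only the order of the three RGB values matters if I introduce a new variable $k$:

$$
k := \begin{cases}
6 - h', & \text{if } h' \in [5, 6) \text{ (i.e. } r > b > g\text{)} \\
h', & \text{if } h' \in [0, 1) \text{ (i.e. } r > g > b\text{)} \\
2 - h', & \text{if } h' \in [1, 2) \text{ (i.e. } g > r > b\text{)} \\
h' - 2, & \text{if } h' \in [2, 3) \text{ (i.e. } g > b > r\text{)} \\
4 - h', & \text{if } h' \in [3, 4) \text{ (i.e. } b > g > r\text{)} \\
h' - 4, & \text{if } h' \in [4, 5) \text{ (i.e. } b > r > g\text{)}
\end{cases}
\Rightarrow k = \frac{y - z}{x - z}
$$

Since we can get the order of $r, g, b$ from $h$, we also know the grayscale coefficient $k_i$ for $i = x, y, z$. For example, if $r > g > b$, then we have:

$$
x = r,\ y = g,\ z = b, \quad k_x = R,\ k_y = G,\ k_z = B
\Rightarrow p = k_x x + k_y y + k_z z
$$

That is, we need to solve this linear system:

$$
\begin{cases}
x = 1 & \text{if } x + z \ge 1 \\
z = 0 & \text{if } x + z < 1
\end{cases}
\qquad k = \frac{y - z}{x - z}
\qquad p = k_x x + k_y y + k_z z
$$

$$
\Rightarrow
\begin{cases}
x = \frac{p}{k_x + k_y k},\ y = k \cdot x,\ z = 0 & \text{if } x + z < 1 \\
x = 1,\ y = (1 - k) z + k,\ z = \frac{p - k_x - k_y k}{k_y (1 - k) + k_z} & \text{if } x + z \ge 1
\end{cases}
$$

This is then trivial to program. In the code, I first compute a potential answer using the first case, then fall through to the second case if the resulting $x + z \ge 1$, and then, boom! We have the RGB values from hue and grayscale.

…except we don’t. RGB values computed from a linear grayscale gradient using the code above do not produce a grayscale gradient. In fact, I found some colors to have very similar grayscale. This is because the RGB values we computed are linear RGB values, and monitors use standard RGB (sRGB) instead, with the conversion:

$$
C_\text{sRGB} = \begin{cases}
12.92\, C_\text{linear} & \text{if } C_\text{linear} \le 0.0031308 \\
1.055\, C_\text{linear}^{1/2.4} - 0.055 & \text{if } C_\text{linear} > 0.0031308
\end{cases}
$$

Why apply a weakly concave curve to RGB values?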
It turns out that human eyes are less sensitive to changes in bright colors, and so the curved sRGB values better reflect the perceived brightness.

Applying this knowledge to creating a grayscale gradient colormap, we need to first generate the gradient in sRGB, then convert each grayscale to linear RGB and apply the hue-grayscale to linear RGB conversion, and finally convert the results back to sRGB. And, that is it! A perfect grayscale-gradient colormap with hand-picked hues.

We fixed $s \equiv 1$ above, but what if we want to be able to change the saturation? Can you solve the system for $r, g, b$ given $h, s, p$?

2024-05-25

What does mdBook-KaTeX do? It is an mdBook preprocessor that pre-renders math expressions. For example, if you have the following snippet in your book: mdBook-KaTeX would pre-render it as: and then feed it back to mdBook. All the HTML tags might look a bit scary, but this is what all HTML-based math renderers do: generate a load of nested tags. It enables the expressions to look nice in a browser:

Define $f(x)$:

$$
f(x) = x^2 \qquad x \in \mathbb{R}
$$

Most renderers just do this in the browser after the users load the webpage. mdBook-KaTeX lets you pre-render upfront, so the browser would not need to run any JavaScript.

In this article, however, I want to focus on the other side of mdBook-KaTeX instead: mdBook-KaTeX as an mdBook preprocessor. What does an mdBook preprocessor do? Well, in a nutshell, mdBook preprocessors are used to customize mdBook, the static site generator used to render The Rust Programming Language. Preprocessors read the loaded book data from mdBook, manipulate them, and pass them back to mdBook. This sounds abstract, though, so let’s dive into what mdBook-KaTeX does, with simplified code, as a concrete example.

mdBook-KaTeX is, firstly, a Command Line Interface (CLI) App written in Rust.
It uses Clap to parse the arguments passed in: As we can see, mdBook-KaTeX only takes one command— We don’t use the All preprocessors need to have this command so mdBook can check whether it supports a renderer. Usually, we use mdBook to render Markdown into HTML with the In this case, mdBook-KaTeX would just output nothing with status code 0 to indicate that it supports the

If no command is specified, mdBook-KaTeX should read the book data as JSON from StdIn. Here, we read the context So far, the process above is basically universal for any mdBook preprocessor. Yes, you can copy the code from the Simply stated, we just loop over all the chapters in the argument Above, we use the Below, we have a simplified version of Based on the types of the

If you have been following along, I hope you got the gist of how an mdBook preprocessor works and probably how to write one yourself! (If not, leave a question below.)

In reality, though, the code for mdBook-KaTeX is way more complicated due to: mdBook-KaTeX offers a wide range of options. We read these options from the Parallelism is more interesting. Since the In v0.5.0, we switched to Rayon for simplicity, but the basic ideas are the same. To spawn threads and get parallelism, each thread ideally needs to own its data. So, we need to scan for tasks first, save them in vectors, and then execute the tasks in the vectors in parallel. For example, this is how we actually parallelize processing each chapter:

In summary, we have walked through mdBook-KaTeX as an mdBook preprocessor example to show what a preprocessor does: The reality, however, is that, like many other projects, mdBook preprocessors get messy easily. We will talk about the mess mdBook-KaTeX was in next time.

2023-05-27, edited: 2023-06-11, 2024-05-24

This summer, my parents and I have been giving advice to more than a dozen relatives and friends in mainland China on their kids’ education.
I sense that Chinese parents are facing unprecedented challenges: Schools work the asses out of kids in the hope of higher scores. Kids have more homework than they can finish without going to sleep late, even in elementary schools. After school, most kids are sent to tutoring organizations by their parents. High school kids study at school Monday through Saturday, 7 am to 9 pm. A kid is almost treated as a mere number.

We all know a person is not measurable as a single number, but the important question is what to do under the environment’s pressure. I believe it is time we revisit first principles and rethink the purpose of education. My conclusion is we should borrow many ideas from machine learning for human learning, both for kids and adults.

A Rust conference talk recommended beginners to slap

A primary reason people get started with async Rust is to use certain libraries. This is most prominent if you try Rust for web-related endeavors. Consider the following scenario for a Python developer:

Having heard that Rust is fast like C++ without the SegFault, you recently learned some basic Rust. Soon, one of your Rust scripts needs to make HTTP requests, so you naturally search for a library akin to Python’s Requests. You find Reqwest, which uses this async/await syntax in its examples. “Okay,” you think, “guess I need to learn this new async await thing.” So, you search for how to use async await in Rust and find Tokio.

In fact, most developers do either Python or JavaScript. Then, consider this other scenario for a Node.js developer:

To test Rust as an alternative to Node.js, you start a new web backend in Rust. You search for Rust backend frameworks and find Actix and Axum, both of which use async/await. “Ha!” you think, “I know this. This is basically the same as async await in JavaScript.”

In either case, you get the impression that async/await is this special syntax that you stick on top of regular code to make it run “asynchronously”… whatever that means.
If you are a serious learner, you may read the Tokio tutorial and the Async Book instead of merely watching videos online. These valuable resources teach you additional important “caveats”, including: However, you probably also miss several fundamental ideas of async Rust, which would bite you in the future either in programming or performance.

As async Rust becomes more prominent, more and more people start doubting whether it brings more value than trouble. Common criticisms include: With all this overhead, one would naturally wonder if async Rust is worth it. Many even argue that going “asynchronous” is only worth it if you have extreme concurrency loads, and that operating system (OS) threads suit most applications better.

To understand these concerns, we need a clearer context for async Rust. The key point of async is yielding the control back to the runtime (

Now, to understand the nicety of these features, let’s consider Erlang’s preemptive scheduling, since async Rust shares many goals with it. Erlang powered massive-scale soft real-time systems such as telephone services and WhatsApp. It has OS-like preemptive scheduling over lightweight green threads called Erlang processes. Therefore, every Erlang process soon gets a chance to run. Bad Erlang processes never block the whole system. These guarantees enable services like telephone to function during massive overload periods, albeit slower. Additionally, you can terminate any Erlang process, and it would exit immediately unless it traps exit (in which case you can “brutal kill” it), providing the ultimate developer-friendly and reliable cancellation. In summary, to support massively concurrent real-time systems, Erlang’s OS-like preemptive scheduling optimizes for minimum latency and reliability.

Now, comparing Erlang’s scheduling to Rust without async, you would see the significant gap in between.
While Erlang is built from the ground up to achieve preemptive scheduling using a virtual machine, Rust cannot even afford a runtime in the language. Additionally, Rust offers heavyweight OS threads that cannot be terminated from other threads, a big sucker! Hence, to get the niceties of async scheduling, Rust’s users are bound to make some sacrifices in terms of ease of development.

If Rust fundamentally does not support preemptive scheduling or thread termination in the language, how can it achieve features similar to Erlang’s? This brings us to the core idea of async Rust: you, the programmer, bear the responsibility to make your program suitable for cooperative scheduling! That is the core of async Rust programming! The runtime cannot achieve these features if your code does not yield! This fact brings several often missing yet important lessons:

The async/await syntactic sugar, although it eases programming, obfuscates async Rust’s actual underlying mechanism for beginners. In handwritten structures that implement

The lack of understanding of async Rust among beginners causes widespread mistakes that visibly hurt their performance. Many beginners thoughtlessly slap

It also revealed why Rust would always suck compared to Erlang: cancellation is not free. This means any async function needs to be carefully written, with lots of yield points inserted (using

I finally got around to reading the old legendary book, Design Patterns: Elements of Reusable Object-Oriented Software. I find it depicting an OOP landscape and philosophy much different from the OOP we see today. More importantly, it claims that OOP fundamentally sacrifices control and performance in favor of manageability, a sign to me that OOP is unsuitable for systems programming, where control and performance are crucial.

A type is a name used to denote a particular interface.
We speak of an object as having the type “Window” if it accepts all requests for the operations defined in the interface named “Window.” An object may have many types, and widely different objects can share a type. Part of an object’s interface may be characterized by one type, and other parts by other types. Two objects of the same type need only share parts of their interfaces. Interfaces can contain other interfaces as subsets. We say that a type is a subtype of another if its interface contains the interface of its supertype. Often we speak of a subtype inheriting the interface of its supertype.

This is to say that OOP does not stress the importance of each object’s implementation, but the messages it accepts (methods). Conceptually, the problem is that implementation details often decide the suitability. Calling a costly method in a long loop is almost bound to lead to performance issues. And, modern multi-threaded systems even have additional thread-safety concerns for each function. When one opts into programming on the abstraction level of interfaces, they become agnostic to these concerns.

Concretely, this implementation-agnosticism forces inefficient implementations. Most OOP languages resort to boxing every object and doing run-time lookups to determine if an object implements an interface. Languages like Objective-C and C++ create fat structures that bundle the method pointers with them, and languages like Python and JavaScript rely on using dictionaries as objects for this dynamism. All of these options incur performance penalties from dynamic allocation, lookups, and pointer indirections.

The popular alternative, generic programming with monomorphization, shifts this cost to compile-time and to the programmer. This is notorious in C++ for its atrocious template function error messages, and in Rust for the complex However, more powerful compilers may be able to handle this for the programmer.
Roc seems to abstract automatic monomorphization away from the programmers. Zig works entirely around generic programming by evaluating part of the program at compile time. These efforts may be able to promote the idea of reuse from OOP without either run-time or human overhead, but they still fundamentally depend on knowing the data structure implementations, the “types”.

An object-oriented program’s run-time structure often bears little resemblance to its code structure. The code structure is frozen at compile-time; it consists of classes in fixed inheritance relationships. A program’s run-time structure consists of rapidly changing networks of communicating objects.

This is to say that programmers do not see what the system actually does when they read the code. Translated to today’s OOP, when you read a module, a class, or a method, you have little idea what domain-specific tasks it performs. This just makes it difficult to map real-world problems to the code, adding hurdles to reasoning about the system for any given task.

For a concrete example, the authors detail one aspect of why this indirection exists: the pervasive use of references (“acquaintance”). References connect objects into a graph, a costly task for a garbage collector, but more importantly, I believe, a constant thought exercise for programmers to track.

These entries are stored only for the history.

This was when I set up the blog in WordPress. Isn’t WordPress for non-technical people? Why would I use WordPress? Also, why am I starting a blog? Didn’t I say that I don’t need a blog? Well, if I wrote something moderately decent and long, I need a more decent place for it other than GitHub issues and Reddit. Of course, I could use GitHub Pages to make a static site, but people wouldn’t be able to comment on that.
Of course, I could set up a whole server with a JavaScript front-end library (probably Svelte) and a back-end with Rust, deploy it on AWS Lambda, and connect it to a PostgreSQL database. That would be really cool. But, then, I would probably need to find a domain for it and pay $20 a year… I don’t want to. Also, people don’t care about you knowing how to mess with AWS; they only appreciate the silly CSS tricks they see that make up the front-end. So, setting up a server manually and hosting a blog probably would not be worth it. For now, that is it. I’ll post random articles here. But, I will manually keep a Markdown copy on GitHub.

2023-05-26

Sichang He (Steven)
Ph.D. Student in Computer Science, University of Southern California
sichangh@usc.edu
GitHub & YouTube @SichangHe
Website

Education

University of Southern California (USC), Los Angeles, USA. Aug. 2024 – present
Ph.D. student in Computer Science
Advisor: Dr. Harsha V. Madhyastha
– Research focus: User-facing measurements and enhancements of the Web.
– Selected courses with projects:
∗ Advanced Computer Networking (A+): JSphere: measure webpages’ browser API calls to classify JavaScript.

Duke Kunshan University (DKU) & Duke University Dual Degree, Kunshan, China & Durham, NC. Aug. 2020 – May 2024
B.S. in Data Science (by DKU) & B.S. in Interdisciplinary Studies (Subplan: Data Science, by Duke)
– In addition to studies in Data Science, completed many courses in Computer Science and Mathematics, effectively emulating a triple major.
– Selected courses with projects:
∗ Computer Network Architecture (A+): async TCP server and client in Elixir and CLI REPL shell in Rust.
∗ Data Acquisition and Visualization (A): Poster using D3.js, Svelte, and TypeScript.
∗ Computer Organization and Programming (A+, at Duke): Binary search tree in MIPS assembly and ring buffer in C; fifth place in PizzaCalc assembly length optimization competition (68 lines).
∗ Numerical Analysis (A+); Intro Abstract Algebra (at Duke), Complex Analysis (A).
– Achieved full grade on all five courses at Duke University in the Fall 2022 semester.

Zhixin High School, Guangzhou, China. Aug. 2017 – Jun. 2020
– Chemistry Olympiad competition team; Biology Olympiad competition team.

Conference Publication

[1] Sichang He, Italo Cunha, and Ethan Katz-Bassett. “RPSLyzer: Characterization and Verification of Policies in Internet Routing Registries”. In: Proceedings of the 2024 ACM Internet Measurement Conference. Nov. 2024. url: https://github.com/SichangHe/internet_route_verification/releases/tag/imc-camera-ready.
[2] Sichang He, Beilong Tang, Boyan Zhang, Jiaqi Shao, Xiaomin Ouyang, Daniel Nata Nugraha, and Bing Luo. “FedKit: Enabling Cross-Platform Federated Learning for Android and iOS”. In: IEEE INFOCOM 2024-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). May 2024. url: http://www.arxiv.org/pdf/2402.10464.

Research Experience

Research Assistant, Networked Systems Lab, USC. Aug. 2024 – present
Advisor: Dr. Harsha V. Madhyastha
– Thoroughly surveyed systems research problems stemming from AI-generated text and images.

Independent Researcher on Internet Route Verification. Apr. 2023 – May 2024
Independent research, Federal University of Minas Gerais, Brazil (Remote)
Advisor: Dr. Italo Cunha
– Designed and implemented an efficient and comprehensive parser for the Routing Policy Specification Language (RPSL), a language used in the Internet Route Registry to document public interdomain routing policies, guide public peering, and aid routing issue troubleshooting.
∗ Studied and complied with RPSL semantics in RFCs, covering over 99% of all real-world RPSL use cases.
∗ Optimized the parser to interpret and verify hundreds of millions of routes per hour.
Sichang He (Steven) - University of Southern California Page 1 of 3

– Employed the RPSL parser to verify observed interdomain routes; analyzed verification reports and identified common RPSL usage patterns and usage mistakes.
∗ Provided tooling to verify routes against the RPSL, helping improve interdomain routing security.
∗ Identified and implemented checks for 9 potential reasons why routes fail to match the relevant RPSL.

Research Assistant for Mobile Federated Learning (FL) Project. Mar. 2023 – May 2024
The FedCampus Team, EdgeIntelligence Lab, DKU
Advisor: Dr. Bing Luo
– Authored FedKit, open-source SDKs to streamline real-world FL experiments across Android and iOS devices, enabling training shared ML models collaboratively without sharing private data on smartphones.
∗ Contributed to the Flower FL framework: revamped the Android example; helped correct the iOS example.
– Led and managed the systems development for the FedCampus Android/iOS app, leveraging personal health data from over 100 participants to conduct real-world FL and federated analytics experiments on DKU campus.
∗ Supervised and mentored four undergrads and one M. Eng. student in mobile and web development.
∗ Investigated and led core technology adoption, including Flower, Kotlin, and Flutter, facilitating development.

Research Assistant for Search Engine Research Project. Dec. 2021 – May 2023
The Search So Team, DKU
Advisor: Dr. Jiang Long
– Developed a feature-rich open source web scraper in async Rust to scrape DKU sites, intranet, and Duke sites.
– Improved backend HTML processing, frontend interface, and version control.

Presentation

[1] Bing Luo and Sichang He. FedCampus: a Privacy-preserving Data Platform for Smart Campus with Federated Learning and Analytics. Flower AI Summit, 2024. Mar. 2024. url: https://www.youtube.com/watch?v=K7yu2jvu_t8.

Teaching Experience

Teaching Assistant, Introduction to Programming, DKU Nov. 2021 – Mar.
2022
– Hosted weekly lab sessions; created the course’s first video tutorial on development environment setup.

Math & CompSci. Tutor, Academic Resource Center, DKU. May 2021 – May 2022
– Obtained CRLA’s International Tutor Training Program Certification, Level I.

Awards

• Senior Scholar-Athlete Award through the Running Club, DKU Athletics (Apr. 2024)
• Silver Medal, International Genetically Engineered Machine (iGEM) 2022 DKU Team (Oct. 2022)
Developed the team wiki independently; helped non-technical members adopt Git; validated protein designs with modeling software.
• Dean’s List (Spring 2021) & Dean’s List with Distinction (Fall 2021, Fall 2022, Spring 2023), DKU
• Chancellor’s Scholarship & UGRD Entrance Scholarship, DKU (merit-based, Fall 2020 – Spring 2024)

Side Projects

Open Source Developer & Maintainer of mdBook-KaTeX. Nov. 2022 – present
Math expression preprocessor for mdBook written in Rust; over 90,000 downloads on crates.io. GitHub
– Took over maintainership by publishing a fork when it was unmaintained.
– Fixed numerous bugs and developed new features, resolving more than 20 GitHub issues others had opened.
– Improved speed by over 10 times by adopting parallelism and avoiding repeated rendering.

Author of Open Source Web Forum Using Ruby on Rails. Jun. 2022 – Aug. 2022
– Featured infinitely nested comments; deployed on Heroku.

Outreach

• Chinese Editor & Translator at DKU Intersections Journal. Jun. 2021 – Aug. 2021
Co-translated three English articles into Chinese; reviewed and edited multiple articles and Intersections’ website.

Skills

Natural Language
Mandarin (native), Cantonese (fluent).

Programming Language
– Invested and proficient: Rust, Python, Elixir, JavaScript.
– Used in projects: Kotlin, Dart, Swift, Ruby, Java, Lua, TypeScript, Svelte, HTML, CSS, C, SQL, Racket, LaTeX.
– Familiar: C++, Fish, Bash, Julia, Go, Erlang, Gleam, VimScript, Elisp, Mathematica, MATLAB, Scala, R.

Computer Software
– Selected frameworks/libraries: Playwright, Django, Phoenix Framework, Tokio, Tailwindcss, D3.js.
– Selected development experience: SSH, Tmux, Neovim, Docker, Kubernetes, Evcxr, Arch Linux, Hackintosh.

Guests: don’t read this. This is private.

2022-03-17

I have this feeling that I am about to be unable to write. Having all sorts of math, stats, and CS courses, there is really not much writing to do. DKU provides few papers to write, specifically, few literature-ish things. All said, I decided to start writing diaries. Not for people to read, but more for me to stay sane on the writing side.

Some people have blogs. It’s a good idea to have your things on the web, so they last longer than your local file, or worse, a piece of paper made in the 2020s. But, I dislike the idea that you write something solely about yourself for others to read. That does not affect my adoption of the shell of blogs. I will put this diary on the web.

To post something on the web, you need HTML, and there are different ways to make HTML. The most popular way is WordPress.

43% of the web is built on WordPress. More bloggers, small businesses, and Fortune 500 companies use WordPress than all other options combined. Join the millions of people that call WordPress.com home.

What a great ad from WordPress themselves! I would have fallen for it if I did not know how bloated WordPress is. The DKU website itself runs WordPress. And, the performance is “insane”. You can open a small webpage in 5 seconds! Imagine? And also some webpages in 30 seconds! Amazing! The Intersection uses WordPress for its website (not published yet). I saw the source code it generates for a simple article: 7 layers of nested divs! And, it also has “insane” performance.
If I look back from the future and see WordPress’s dominance, what would I think? People were crazy? No, people are lazy and will not change if things work.

It is possible, but the HTML part probably would take much more time than the text part. Plus, the CSS part would be the most annoying.

Markdown is a lightweight markup language that looks nice as source code. It is rendered by converting to HTML. I feel fast and delightful writing Markdown for my notes. Also, I used Markdown to write presentation slides and it took just minimum effort. Without more trash talk, it is sure that I would choose Markdown if possible. But, what flavor? Markdown has a shit ton of different flavors. Some do not eat your What I need is: I ended up considering only Hugo and mdBook for fast and simple Markdown-HTML conversion.

I found Hugo by searching Markdown website (or something like that) for DKU iGEM Wiki (a future thing). It is marketed as an easy-to-use and fastest static site generator. It has many themes that the community made. Most of them look like PowerPoint templates (no offense). I spent way too much time looking around to find a nice-looking theme, and found hugo-book and Doks. Without a good theme, I have no

mdBook basically supports everything I mentioned out of the box. The “downside” is that it has only 5 themes (which you can try by clicking the brush top-left), and they all have the same structure. However, I don’t need a thousand custom themes that are crappy; instead, one good theme is good. It saves me the burden of fighting with CSS (great!). That is why you are seeing this diary in mdBook form.

I need a server to host the diary. I can host it on my machine, but a domain name is just a waste of money in my opinion. And, getting a paid server provider is the same. Then, I saw GitHub as a free service provider: GitHub Pages. And, it actually does the thing: host your website. It also is fast.
The downside is that GitHub is blocked from PRC using DNS pollution (or something). GitLab is not and it works in PRC. However, I do not like the GitLab interface, and GitLab is much slower and laggier. Eventually, I decided to choose GitHub. That will do.

I just “wasted” a whole morning on this thing. Now, I will go to lunch and then put this on GitHub.

2022-04-04

I switched from Visual Studio Code (VSCode) to VSCodium. I am writing this text using VSCodium. This may be a weird reason, but I switched from VSCode to VSCodium because of performance.

I was happy using VSCode. However, Saa told me VSCode was resource-intensive. She said her Lenovo laptop fans ramped up as soon as she opened VSCode. Being on an M1 MacBook Pro, I have never experienced ramping fans. However, I did experience a warm keyboard when I browsed the web with VSCode in the background, but I blamed it on the external monitor and too many browser tabs. I checked Activity Monitor: Visual Studio Code.app seemed to be using very little CPU and memory, but there were also several “renderers” using as many resources, and they combined to above 1 GB of memory and 1% or 2% CPU.

After some searching about VSCode being resource-heavy, I found out that VSCode uses Electron, which uses a Chromium engine to render everything, which means VSCode is essentially just an app running in its bundled browser. It turned out Electron used about 2% CPU and half a gig of memory. Using a browser at its core guarantees that VSCode is as resource-heavy as any browser. Thus, it is fundamentally flawed.

I spent some time looking for other text editors. Sublime Text seems to be the most popular alternative; however, it is strictly proprietary, and I heard that you need to buy it and join its developer version to get the latest version and support. Atom is automatically excluded from the possibilities because it started this “mini-browser app” shenanigan in the first place.
Neovim is a “pure,” performant text editor. I actually tried VimR and Neovide. They provide something far from a drop-in replacement for VSCode, though. I imagine I would need to spend days configuring everything.

Then, I read about VSCodium. I learned that although the VSCode project is open source under the MIT license, the Visual Studio Code app Microsoft ships is built using Microsoft’s custom configuration with their telemetry. This reminds me of how bloated Windows is… I decided that switching to VSCodium will probably disable a lot of trash processes Microsoft forces me to run when using VSCode. And it would make a drop-in replacement.

Brainlessly, I ran this to install VSCodium: I symlinked

Upon opening, VSCodium was glitchy. It opened deathly slowly and was far from as responsive as VSCode. I was wondering whether that was because of some magical configuration Microsoft does when they build VSCode. But, I soon recognized bigger problems.

Right out of the box, Remote SSH does not work anymore. I uninstalled it and tried to reinstall it from the marketplace, but it turned out that VSCodium uses an open-source marketplace instead of the one VSCode uses. This marketplace does not have anything from Microsoft. Installing the extension from a file downloaded using the browser shows it in the list of installed extensions, but it still does not work. It turned out that Remote SSH is strictly proprietary and only works on Microsoft’s Visual Studio Code build.

After some fiddling around, I found SSHFS works pretty nicely. It has all the features Remote SSH has, and it allows you to save the remote user’s password, and to enable it only in certain workspaces instead of forcing you to enable it globally. It actually seems better than what Microsoft offers. I wonder why there are so few downloads.
I also followed the instructions from VSCodium’s GitHub to enable VSCode’s marketplace by adding a I can tell easily that the VSCodium marketplace has way fewer extensions available, but I guess as long as the extensions I need are not “Visual-Studio-Code-only,” I can just install them via the browser and some commands.

Why is VSCodium so slow? Why? Why on earth? I opened Activity Monitor and found that it is “Intel”: the app from Homebrew works via Rosetta translation… I immediately went and searched for a macOS ARM build. I saw a GitHub issue, where they discussed not being able to build VSCodium for macOS ARM because the GitHub CI did not have a macOS ARM worker option. It turns out that they have not figured it out yet. However, there is a guy who has an M1 Mac Mini, and they built macOS ARM binaries and released up-to-date versions on GitHub. I downloaded their build and it works nicely. I checked Activity Monitor: VSCodium seems to use far fewer resources than VSCode while still using quite a lot…

2022-05-08

It is 7 in the morning. I woke up at 6:30 because of the noise on the train: people talking, watching videos while letting the sound out loud, and noise from plastic bags when someone grabs something to eat. I am on a “fast” train from Jinhua to Guangzhou. “Fast” is the type of train prefixed with K and it is ironically the slowest. The train goes about 100 km/h and it was considered “fast” when it came out. I am writing this in a bed with narrow space around me.

I have been in a lockdown in Kunshan for a month. During that time, we have been having courses fully online. The total places that we are allowed to go is the union of campus, the apartment, and the school shuttles that travel between them. We are also forbidden from ordering delivery food due to “potential contamination” while the school shut down all canteens but one.
Near the end of April, we received an email saying that some students had told the school they needed to leave to apply for visas because they were going abroad, and asking those who wanted to leave before May to fill in a questionnaire so that the school could send them out of Kunshan on a bus to Suzhou. I did not want to leave in just a few days because I had all my belongings to pack. There was also a catch: the school would throw away all the personal belongings left behind. On May 1, they sent a similar email “to collect information.” I calculated and decided to leave on May 7. I bought the plane ticket and filled in the questionnaire. If that had worked as expected, I would have arrived home by now. But, the flight was cancelled on May 4. After a lot of searching, the alternative was the “fast” train from Jinhua.

The questionnaire was a mystery. Nothing came back from the school. On May 5, because the flight was cancelled, I called Chinese Students Services (also CSS, ironically). They answered the phone after a few attempts. The answer was that the school was “collecting information,” and they had just sent the people leaving on May 5 to Suzhou. CSS told me that the information for us would come in a day or two. They also implied that the bus usually leaves around 9 or 10. My impression was basically: we don’t ensure anything will happen, so we have no responsibility.

Anyway, at 9 a.m. on May 7, I got on the bus and left Kunshan, the jail. The ride to Suzhou North Railway Station was smooth, with no questions asked on the way. This was because Suzhou had just been downgraded to a “low risk area,” and the travel code would be green, as opposed to green with an asterisk for a city with COVID cases. The station has two layers of gates, each with several guards who check people’s tickets and travel codes. Only those who have already bought tickets and have a green code can enter the station. After I entered the station gate, a man behind me commented: “So much stuff!
Are you moving house? Where are you going?” I told him Jinhua. He said he was going to Jinhua as well and asked me where I was from. “Hunan.” “That’s great! I am from Hunan as well, we can keep each other company,” said the young man wearing glasses with a large suitcase.

We entered the station, which has a large square in front of the building, enclosed by large fences. The people at the door of the building checked our tickets and told us that we could only enter after 1:30 p.m. because our trains were in the afternoon. It was still 10 a.m. The sun was shining above us and the weather was hot. We found a tree in the square, parked our luggage, and sat there.

The square was so crowded. It is about 100m wide with a lot of stone seats, but nearly every seat was occupied, and luggage was everywhere. The man told me that there was a wave of people going home. He said that now that Kunshan had a green travel code without the asterisk, people could finally leave after one month of lockdown. He showed me a few videos on his cellphone. One of them was people raiding a shop for food in April. Another was blood everywhere on the ground, and people standing around doing nothing. He told me that people in his factory had had a fight and the policemen had just entered. The leading policeman laughed in the video: “killed someone, huh?” as if he was watching a comedy show. The man complained to me that Jiangsu people are less kind than Hunan people. He said that Jiangsu people tend to put up a fight easily. Apparently, he was leaving this place and going home, and he must have quit his job in the factory.

With this new “friend” watching out for my luggage, I checked around the square for shops. It was quite a walk. The square spans a whole area and was full of people. There was no sign of open shops except for two shops at the edge of the building, but both of them were closed. The men’s and women’s rooms were outside of the station. There was an exit by the entrance of the station.
Two men sat there, one of them holding the fence gate handle. They told me to leave my National ID on the bench with the other people’s, and take it when I came back.

The real problem was lunch. There wasn’t even a place that sold water. Our “friend” spotted a man eating delivery food. I asked the eater, who told me that he received the delivery food across the fence. I ordered and ate delivery food for lunch, which was handed into the square over the fence. When I was eating my lunch by the tree, a few aged women came to water the tree. We had to move all our luggage and watch the women water the tree until everything around the flowerbed was wet. They laughed and watered more trees while yelling at people to leave.

At 1 p.m., our “friend” suggested that we join the queue to enter the station building. At that time there were already two dozen people in the queue, and everyone could only enter half an hour later. As time went by, the queue grew longer quickly. I took out the laptop and tried to read, but the laptop was so hot under the sun that I soon gave up and just stood still. After 25 minutes or so, the queue began to move forward and people started to sneak in. The sneakers either smiled and just jumped in the queue, or pretended to be dumb. A station staff member came to argue with some of the sneakers, but there were so many of them that the staff member did not see some. Nonetheless, I entered the station and enjoyed the air conditioning.

I jumped onto the train to Jinhua 2 hours later. The train was full of luggage and I had to put my luggage between the carriages. The situation was no better on the train from Jinhua to Guangzhou. So many people are running away from Jiangsu, and it makes total sense.

2022-05-15

It is midnight and I’m the only one up at home. I just had a shower and my hair is still half-dry. During the shower, I was thinking and I was feeling lost. I thought again about my Signature Work, which does not exist at the moment.
I wrote to a computer science professor asking them to recommend mentors. In that email, I stated that I wanted to deploy a server with Rust and WASM, but I felt that it would be impossible to accomplish anything alone, and that I was considering scientific computing as well. One move I think I got wrong was mentioning a project with another computer science professor, whom the professor I wrote to then recommended as a mentor without giving a reason. I guess they are just pushing me back to whatever I have.

The reason I asked another professor to recommend mentors is precisely that I sensed something was wrong with the way the current project runs. I feel like we are not doing anything special while getting excited; we are just reinventing the wheel and trying to enjoy it. Text searching and text processing are both things that countless people have done before, and our team is just importing Python libraries, smashing them into the project, writing glue code, and calling it progress.

A while ago there was a seminar where computer science professors and students showed what they had done. All of them used machine learning to treat some problems that may or may not have any meaning. I have this impression that people are machines learning for machine learning. It looks tempting, but I know that machine learning is extremely data-demanding and therefore hard. I decided that machine learning would not be a main theme in any research of mine unless I really have a huge amount of preprocessed data, which I probably won’t have.

I have this feeling that computer science people at DKU tend to play with little toys that people from a more academic background would discard. This could be similar for data science people. I am considering doing some mathematics studies… I thought about it and recalled that my mathematics is a huge pile of mess, having taken all these “applied” math classes at DKU.
I just took advanced linear algebra last session, and I had zero motivation to study it alongside CS301 with our lockdown and online classes. I don’t really know what the heck I learned in these courses. I know they have a lot of proofs that are important, but I don’t really understand why any of them is constructed like that. I also did horribly in CS301, which I confirmed today by checking the final exam grade.

I thought to myself that I don’t care about grades that much. I figured that I care more about my experiences. But, I do care about grades. The way this shows is that bad grades make me visibly unhappy, and good grades do not make me happy. It is just poisonous.

I am now thinking that I really should reread the books that I ought to have read when I instead attended “slides classes.” I think learning mathematics is not about taking those freaking classes. The classes that I did well in, I learned in advance from books. I am guessing that I actually learn much less effectively taking classes than reading books for the following classes. But, I have been too lazy to read all the books before taking all these classes. A lot of the time definitely went to YouTube.

Right now, I’m learning JavaScript, while also planning for more Rust, Python, and C exercises. But, I suspect that these are not as important for me as reading more mathematics books. I need to change…

20220806

After a long time, I decided that I should write something again. I’ve been sick multiple times during the months since coming back to Canton. I highly doubt that the pathogen density here is much higher than in Kunshan. Despite that, I have actually done quite a lot of stuff recently.

After fiddling with qemu and virt-manager on my M1 MBP, I gave up and went back to Parallels Desktop. I have used virt-manager with qemu on Arch Linux and Ubuntu Linux. On the M1 MBP, though, KVM does not exist, and virt-manager cannot connect to hvf in my experience.
Therefore, I was merely using qemu’s emulation and the performance was dreadful.

Having met obstacles using virt-manager, I decided to try out Parallels Desktop again. The problem was how to get it. At first, I searched for the good old Russian TNT. The search took a long time and the results were diminishing. I then looked for Chinese cracks. All of the Chinese cracks use PD runner, which I knew about and had also tried. Basically, it is a standalone program that calls functions from Parallels Desktop to launch VMs after the trial period has ended. Some sites even sell PD runner for quite an amount of money. I don’t like the idea of getting a caller program from some random place on the internet and using it just to launch an official program. I remembered vaguely that this thing was leaked from GitHub. And, that was the right place to search for it. The program was hidden behind a non-default branch for some reason, but I managed to download it. I simply got Parallels Desktop from their official website. It turned out later that the PD runner from GitHub works just fine.

Now, the problem became which distro I should use. A big thing about Linux distros is that not all of them support ARM that well. This also comes down to the actual reason why I need a Linux VM. There are certain tools, such as valgrind, that cannot be used on macOS. And, I want to be able to get my hands on some of them. Because that’s the reason, I decided that compatibility is the top requirement. Therefore, Alpine, which is very lightweight and “suckless,” is excluded from my choices. I use Arch Linux on my other laptop, but I believe ARM is not a first-class target for them. It usually comes down to Ubuntu as the choice for such VMs. But, I dislike distros like Ubuntu because they fiddle with the original repos (like Debian’s repos) and produce what is effectively a Frankenstein of repos. Their base, Debian, is good, though Debian stable provides packages that are on average one or two years old.
Luckily, I recently got to know that Debian testing is actually very usable. It is also a rolling-release distro. It seems like a very good choice for development, which is what I would do on the VM anyway.

I grabbed a weekly build ISO and installed it as a Parallels Desktop VM. The first time, I got it wrong. I used the TUI installer, partitioned the drive, and gave it 100MB for the boot partition and the rest for the home partition. At the end of the installation, I selected “install system utilities” and did not install any desktop environments. As the installation proceeded, I was watching the virtual disk growing larger and larger in Finder. It grew from under 1G to over 2.5G. I wanted a minimal installation, and that was not it.

I deleted the VM and tried a second time. This time, I did not select anything to preinstall during the installation process. However, I made another mistake. At the end of the installation, it prompted me to remove the installation media, which I did later by deleting the ISO. What happened was, it said that the VM did not have any OS installed! I reconnected the ISO and booted into rescue mode, but I didn’t know what to do to rescue it. So, I installed it again on the same virtual disk.

Upon fresh install, the VM was about 2G, which made me happy. However, I tried to install packages using sudo and found out that sudo wasn’t installed. So, I used su to install doas because I heard that doas is smaller and simpler to use. I began to understand why people just stick to Ubuntu. Most people would not be able to handle this. They would just go online and type in whatever command they see in a guide to install something. And, the guide would almost certainly tell them to use sudo.

I installed both clang and GCC, and each of them is about 0.5G, so the VM grew much bigger. It was still much smaller than it would be if I were to install a desktop environment. My previous VMs took around 5G to 7G even without the compilers.
I decided that I want to stick with the terminal for this VM. So, the text editor choice would naturally be Neovim. My main text editor is Code - Insiders at the moment. The reason why I switched from VSCodium was that the fantastic Pylance extension would only run on Microsoft’s builds. Also, the insider build is slightly faster in my experience. I also tried Emacs, but ended up not using it. That would be for another story.

Previously, I was using SpaceVim, which is a Vim distribution. It has numerous plugins built in and supports various languages out of the box. The problem, though, is that it is large and hard to install. The installation depends on a GitHub connection, but my VM is not guaranteed to have that. I decided to go vanilla on the VM. At the same time, I want to go full vanilla Neovim because that is how I can learn. I went ahead and read the neovim-lua guide on GitHub, and added the configurations I want to my pure-Lua config files. To keep things small, I found a theme on GitHub based on One Light and copied it to my repo.

After all that was done, I was pretty happy about my Neovim setup on the VM. But, I was less happy about my Neovim setup on my Mac. I tried to port my configurations to the macOS side, but apparently those conflict with SpaceVim settings. I finally decided to ditch SpaceVim, which also meant that I would need to have my own plugin manager. I landed on dein for its lazy-loading features for performance.

It took two days to fiddle with my Neovim setup and learn Lua. After using some language servers, I concluded that Neovim cannot replace VSCode for me. It is much more convenient to open up a file instantly from the terminal with Neovim. But, it lacks features like toggling comments with keyboard shortcuts. Although one could probably configure Neovim to do all that, it takes quite an amount of time. Also, the way VSCode is used is that you start it and let it stay; therefore, you get much faster startup speed when you open another file.
But, you start and close Neovim frequently, and each time, the language server or whatever extensions you have load for a while.

However, I am largely happy about how the configurations turned out. I got to fiddle with Lua, which is a different beast than the other scripting languages I have used. Lua’s performance is simply fantastic, though it sucks that it has no arrays. It also took me a while to learn Lua’s custom regex.

After gaining some familiarity with Lua, I decided to write the yabai helper in Lua. The helper takes the information about all of the windows in the current space and calculates the next window to switch to. In this way, I managed to let yabai switch windows in a circular fashion, including the windows yabai does not manage.

20230313

I have abandoned this diary for a long time, since I went to Duke last fall semester. This could be reasoned as follows:

Duke seemed like a real school. People walked around carrying books and talking to others. In all my classes (except perhaps CS 203), class activities were unmatched compared to DKU classes. The courses have loads of homework, almost as much homework as DKU courses each week. But, the courses are not as difficult as DKU courses in that you have the time and chance to “breathe.” Unlike crappy DKU seven-week courses, Duke courses last for 14 weeks, and give you one extra week for reviews and one extra week for final exams.

I took:

I do not have much motivation to talk about them in detail. In general, Duke CS classes taught me real stuff, unlike DKU CS classes…

The relationship ended with a special COVID-related drama that I could not foresee. What I did foresee, though, was its temporary nature. Back then, I wrote the relationship contract with a clause regarding breakups. What I think went wrong was that I petted them too much, and they behaved more and more like a pet: dependent, purposeless, and naughty.

I need a back end technology. First, I thought Ruby on Rails was the most suitable.
Now, I am turning to the Elixir Phoenix Framework.

Rails was said to be the most productive back end framework. I learned Ruby and walked through the Rails “getting started” document last summer. Then, I made a forum. Pain started to increase when I was making the forum:

Main reasons why Rails sucks:

- Too much “magic.”
- Backwards incompatibility.
- Hotwire was not fun to play with. It’s like playing a puzzle game, instead of helping me.

I have not actually started using Phoenix yet. When following the Functional Web Development book, I wanted to separate the Elixir project into several modules, so I created an umbrella project and initialized a Phoenix project inside it. The project straight-up could not build. I then initialized a Phoenix umbrella project. It builds and runs fine. But, it creates a weird project structure that intertwines with Phoenix. So, I guess I will just not use umbrellas when I want true separation.

20230315

As I progressed in developing the 2022 DKU iGEM Wiki using mdBook, I realized that the iGEM competition requires that all content be hosted on iGEM, so using MathJax from a CDN was technically against the rules. Instead of hosting MathJax ourselves, I turned to another option I found: compiling math expressions into HTML using KaTeX, provided by mdBook plugin

Advantages above MathJax:

It was November 2022, the owner of

The last stable release it had was 0.2.10; the CI started failing after that and never got fixed. Despite that, another contributor, Matthewacon, added many features. Therefore, the README suggested users to install

There were also several very old issues lying there, such as math blocks breaking tables and code blocks. Pull requests were made, but they got no response either. The option

I posted an issue asking whether the repo was being maintained. That issue got no response. I decided that I should fork the project and maintain it myself because I relied on it.
An issue with a simple fork would be that I could not

I made many attempts to fix the CI on my fork, creating an array of “CI failed” messages in my mailbox, and eventually got the CI to pass. I published the crate, and people could install it using just Cargo and no Git. So, I opened an issue introducing

To my surprise, that issue led to a speedy response from lzanini. The original author offered to add me as a collaborator so I could maintain the original repository. They argued that it would be more convenient for the users. It turned out quite well, in my opinion. I accepted the collaborator invitation and started to maintain

2023-03-28

Professor Nicholas Lane invited Professor Bing Luo to give a talk at Flower Summit 2024 about FedCampus. Since Professor Luo thought it to be a nice opportunity for me (or rather, he does not know how FedCampus works technically), he brought me in to co-talk. As a result, I begged DKU Undergraduate Studies and the Division of Natural and Applied Sciences (DNAS) to sponsor my trip to London, and got a whopping ¥3500 + ¥5000.

I stayed in London from the night of 03-13 to the morning of 03-16, brought a poster about Fed Kit, and gave 1/3 of a talk. The talk was live streamed on YouTube, together with all other Flower Summit talks. It was one of the only talks where the audience was laughing… But, the live stream did not catch that sound track.

On my flight back, however, I had some thermal cycles, and caught what I believed to be a bacterial infection. After I got back, I heard someone else also got sick on a plane. Man, flights are dangerous! Not because they crash, but because they are infectious.

Overall, I bought only 3 brunches at the inn, and figured out all other meals with unhealthy free food from the summit. On 03-16, my butt hurt, so I was standing and kneeling, got really tired in the afternoon, and started sleeping in the conference. I went to sleep immediately once I was back at the inn, at around 18:00.
And then, I suffered from a -2hr jet lag when I got back to DKU.

People asked me whether it was fun. I told them the conference was “fine”. The academics wanted to present their papers, so, yeah… I did talk to a few Ph.D. students and some Flower Labs people. I remember people being unsatisfied when I told them to find me on GitHub, and asking for my LinkedIn. I guess I have to use this atrocious ad platform now. There was a recently graduated dude from New Zealand, who added machine learning to his statistics research because everyone was doing machine learning, and a few people who basically told me that they “borrowed” our implementation to do on-smartphone federated learning. I mean, FedKit is open source, and so is the code we contributed to Flower, but getting 0 credit is less fun than anticipated. I now understand why people put those corporate-style authorship and copyright notices in every single one of their source files.

Started: 2024-11-10. Finished: 2024-11-29.

I am on an A380 flying back to Los Angeles at Mach 0.9. IMC ’24 just concluded yesterday. It was nice in that I heard several interesting talks, met enthusiastic people, and ate random Spanish food at the conference. I also gave a talk, which mostly went as planned. It was unfortunate in that I got sick midway through the trip and struggled with jet lag despite having stayed in UTC+1 for 4 days before the conference. Let me recall the mishaps I experienced leading up to the conference and on the journey in Europe.

The $2000 trip was originally not funded at all. After my paper was accepted, I asked three departments of DKU for travel funds, and got none. The main argument was that I had graduated. I made a proposition to the executive vice chancellor, offering to list DKU as my only institution in the paper. Apparently, they did not recognize the significance of having papers in conferences this prestigious, despite DKU not having papers in conferences at this level.
With that, DKU officially had zero direct contribution to my paper, so I decided not to list them as my institution. This was at one point a huge time sink and an upsetting moment. It was especially upsetting because I got a personal offer to partially fund my travel in exchange for DKU being listed as my institution, which would have looked very shameful. However, this is understandable because I know DKU is experiencing a decline in administrative efficiency. Italo also thought it looked bad, though, and listed DKU as his institution alongside UFMG.

Eventually, I followed Harsha’s advice to apply for a student travel grant from IMC, and fortunately got it. This twist, however, caused delays in my preparation for the conference, and had time implications I would discover later.

IMC ’24 was in Madrid, Spain, but I did not get a Spanish visa. This would normally have meant I would not be able to enter Spain at all, as was the case for several presenters. I was at the conference thanks to a workaround for Schengen visas.

I applied for the Spanish visa through their official agency, got rejected because of minor issues, and estimated that the timeline for rescheduling the visa appointment was unrealistic. Not only that, they appeared not to know what they were doing when I was at the visa center, and blocked my rescheduling with their government-grade service website. This is when I started to think about giving up on the Spanish.

I checked the rules for Schengen visas and the countries that process visas quickly, and eventually chose France. The visa went through smoothly and quickly. The only downside was that I needed to stay in France for the same number of nights as I did in Spain. So, I was forced to tour Paris for 4 days, a strange thing to say. This is not exactly visa shopping because my 4-night stay in Paris makes the visa application legitimate.

This is apparently a thing.
Later, in the community session at IMC, I heard Christophe talk about how one could get a visa appointment for Norway or some other Nordic countries when the visa topic was brought up.

The flight to Paris CDG went via Dallas DFW, and the DFW-CDG flight was an uncomfortable 8 hours. I got the worst seat, the middle one between the window and the aisle. No posture was comfortable. The food was also pathetic; both my neighbors left most of their meals untouched.

What was worse? American Airlines people had very little tolerance for passengers hanging around. To relieve the stress of these long-haul flights, I usually do small standing sessions of around ten minutes at the back of the plane. This plane, however, was configured to have no room at the back, so I hung around the food lounge and the emergency gates in the middle. Flight attendants would just go by and tell me to go back to my seat. One of them even stopped by to say they cannot have people camping around.

This could have been a cultural thing. There was a fat man in a suit sitting near one of the gates I hung around. I was standing there leaning on the jump seat drinking juice, and he stared at me. After a while, he got up, so I moved around to leave him some room to go to the lavatory. Instead, he stood in my face, and said “go somewhere else”. “This guy thinks he owns this place,” I thought. I stared at him, had some more juice, and walked to the gate on the other side. Apparently, fat Americans simply get offended when you appear near them and get face to face.

When I got to Paris in the morning, though, the chaos began. Since the Airbnb reservation started in the afternoon, I decided to hang around a park between the airport and the place I stayed at. It was a very rural area with wind blowing across. I did not bother to put my puffer jacket on, and got a bit cold. Maybe I should have taken it more seriously at the time.
The lunch at a random restaurant proved Paris’ food to be much cheaper than that in Los Angeles. This continued to be true throughout my stay, even in more central areas. What annoyed me was the lack of local food. Near the place I stayed at, it was all Pakistani, Indian, and Turkish food. Even near the Louvre, I walked through a huge block of all Asian food. I was like “what the heck is going on with this city? Where is the local food?”

On the second day, I toured the Musée du Louvre for 4hr. Unfortunately, it turned out the museum sells tickets with time slots, which is basically a reservation. Since I spent the time before the trip desperately preparing for my talk, I only booked the tickets on the day I arrived and got the late 13:30 entrance time. The museum closes at 18:00.

Having a late entrance to the museum did not bother to make the whole tour ultra long. I only woke up at around 10:00 due to jet lag, and walked around the famous piss river after my breakfast. Paris is really coherent in the styles of its roads and buildings. I kind of see why people like the city and find it romantic. In the museum, I saw beautiful NSFW oil paintings and took selfies with the tiny transgender Mona Lisa painting people line up to see.

On my way back, though, I got into trouble. I was taking the same metro line that I had taken to central Paris. It stopped at a station and never resumed. Eventually, the driver said something in the announcement and people looked upset, some of them getting off. After a while, everyone got off and started waiting on the platform. The driver got off the train, walked to the other end, and drove it back!

I assumed there was a problem with the train, and waited for the next one like many other passengers did. The next train came, and we got on. And, the announcement from the driver started… People were upset, and eventually got off. Some left the station. After a while, the driver said something to the crowd, and some people got on, so I followed them.
The train drove back to the previous station!

Okay, so I realized there was some problem with the line, and what the driver offered us was to drive us back to the previous station, which had transfer options. On the monitors that displayed information, I finally saw the English translation of the announcement. It said that the line was partially shut down due to an unattended bag, and that the service would resume after some time.

I went to the platform to wait for the service to resume, but got systematically kicked out by the people shutting down the line. Some of them knew English and explained to me that I needed to change my route. I had to take another metro line and transfer to a bus. The bus stop for the exact bus line I needed to take had also been moved, for some reason, and it took me a while to figure it out. With the rush and mishaps, I also went in the wrong direction once and missed a bus because I did not wave for it to stop.

The supposedly 50min trip took 2hr, and by the time I got to the Airbnb place it was already 22:30. I apparently did not learn the lesson that public transport in a language I have no knowledge of is a guaranteed disaster, and did it again later on my trip to the airport.

The next afternoon, I woke up and felt something stuck in my throat. I realized it was almost a cold, and it became dangerous to my talk because I lost my voice. I had had trouble speaking for extended periods of time in the days leading up to the trip when I practiced my talk, but this was finally the bomb detonating. I could hear my mumbling voice being very deep when listening to the recordings of my practice talks.

After a day of rest and not getting better, I decided it was time to do something other than hoping for self-healing. I asked my Airbnb host. She typed a line of text on my phone and told me to go to a medicine store nearby and show them this text.
I went there, got in line, saw the guy behind a counter, showed him the text, and told him in English that I had a throat inflammation. The guy replied in French, went to the back of the store, and came back with some medicine. So, I bought that random drug and took it according to my host’s instructions. I also aborted all my travel in Paris and stayed in bed until I left.

The CDG airport was super small, with no restaurants or shops in it. The counter only checked my boarding pass and simply let me onto the plane, not even checking my passport. This was how I got to Madrid.

Madrid was surprisingly super crowded. The MAD airport has a fitting name: it has many levels and a long and slow line connecting many terminals far apart. The metro was crowded, stations have direction signs only at the entrances but not at the platforms, and the platforms are connected by mazes of passageways. The conference was near Gran Via, meaning “great road”, and it was a super crowded central area with mountains of people, mostly Caucasian, walking around the streets by the various shops. It was also much more expensive.

The hostel I stayed at was quite a place. The staff were all young people who spoke English, and they continuously organized events like free food and tours around the city. I was not in the mood for any events because the talk still had some major problems, although it was only two days before I would give it. Italo wanted to show some

But, I could not practice my talk aloud. I was lucky to meet Ethan at the reception the day before the conference. He knew I was sick and reminded me that the most important thing to do was to get enough rest. That was kind of hard to do, though, because I had not overcome my jet lag at all. I would go to bed at around 22:00, think about the talk in my head, and become fully awake at around 01:00. My roommates were also nuts, and they would come back at 02:00 after who knows what.
The streets outside the hostel were also loud, and the earplugs I got from the front desk were kind of weak. Anyway, I managed to get up at around 08:00 every day and stayed awake for the conference. Maybe it was a good thing that I practiced my talk in my head. The midnight before I gave the talk, I thought of something nice. Italo always thought the introduction was a slow ramp and wanted something more catchy, as did Ethan. I essentially came up with a catchy clickbait intro at 01:00 after 3hr of thinking in bed. I got up, checked the Sao Paulo time, and realized I would not receive any feedback from Italo about this new intro before I gave the talk if I waited until morning, so I called him. He picked up, said he liked it, and, after finding out it was 01:00 for me, told me I was crazy and should go to sleep. The next day, I gave the talk, and saw people visibly looking up when I said the crazy intro. I think this last-minute idea from thinking in bed was a huge enhancement to my talk.
Anyway, the talk went mostly as planned. People seemed to be listening instead of focusing on their phones and laptops, and I said most of the things I planned to say. There were some minor details. During my dry run at NSL, people suggested that the projector might screw up the slides a bit and I needed more margins in each direction. At the conference, I saw that the projector was very good and we could see the whole slide, but the speaker would stand in front of the main screen and block the view. The stupid thing about the setup was that the lectern blocked part of the main screen, and it blocked even more once the speaker stood there and put their laptop on the lectern. I really wanted to avoid blocking more of the screen, although I did place a large margin at the bottom of each slide, so I essentially pressed down the laptop screen after plugging it in, which I believe made the experience much better for the audience at the center. 
However, I messed up with the clicker and clicked a lock-screen button near the end of my talk, although I fortunately restored the setup immediately thanks to not having speaker notes and only needing to fullscreen my Google Slides. Later, I talked to another speaker, and he was like “after I saw you did that, I was saying to myself ‘DO NOT press that button’ when I did the presentation”. I think I went over time, though, because my session chair Phillippa looked unmoved and said we only had time for one quick question, although she probably simply has that style and the conference had been running over time the whole way up to that point. Presentations are hard. I saw so many people committing “presentation crimes”: having walls of text, reading the slides or speaker notes, dumping information without giving the audience time to comprehend, having a final slide with only “thank you” in it, and standing still without eye contact. I am glad that I avoided those.
Jet lag made it unnecessarily hard to enjoy the conference. I could barely be excited, and I am surprised quite a few people decided to talk to me anyway during the breaks and when eating weird Spanish snacks. Most of them were PhD students. The professors were super busy doing stuff, and many of them were “chairs” of something. I later found out there were somewhere around 20 chairs… There were a few interesting talks. I forgot most of them, but I remember learning that QUIC is not unfair against CUBIC, much Web-related research is social, and crypto is encroaching into the IMC. The community session was quite an experience, where the chairs discussed the review process, how they separated long and short paper reviews this year, and what they should do to bring PhD students into the review process. I never knew about the existence of the shadow PC, which sounded stupid. They also touched on the visa issue. I talked to George Smaragdakis about letting people share their visa experiences; he has not replied yet. 
I also saw Dave Levin, who rejected my PhD application very late. It was kind of fun to see these people. At the end, I remembered to ask people for their contact information. It is quite difficult when you do not want to use social media to stay in touch. I gave in and added some people on LinkedIn, while only exchanging emails with others. It is obviously difficult to stay in the loop with emails, but I guess I will write people emails from time to time when I remember them. I really need to prioritize that over all the other dues!
I wanted to delete two empty log files, so I selected them in the NeoTree file explorer in Neovim, and pressed “This must be a display glitch,” I thought, but running Since Exxact is a server, I asked Haodong if it had any backups. The answer was negative. Haodong suggested I run some recovery tools, reasoning that the file system should have only marked the file content on the disk as unused rather than zeroing it out. I asked Perplexity and it first found Testdisk, but I tried it and it showed no files to undelete. I thought I was using the TUI wrong, so I asked ChatGPT how to use it and was advised to use Extundelete instead, but it also did nothing, even after unmounting the disk: I then followed suggestions from Testdisk and ChatGPT to try PhotoRec, which did recover 251591 files, but with all the file paths missing and made-up names. I tried Ext4magic, but it also did not produce a recovery with file paths: I now have a bunch of files I do not know where they belonged. Since I point to file paths in my database, these files are not very useful… While I reported this bug to NeoTree, I found more issues where people lost their whole work directory. I should not have trusted NeoTree for deleting files. Update: I gave up on recovering the files and crawled all data again. The recovered files have their path and name lost, and the contents are mangled, so they would have required much effort to sort out. 
So, I chose to rerun the scripts instead.
directional derivative of V(x) along vector field f(x) (derivative of V along the flow of f)
Lyapunov’s stability theorem
Lyapunov function
Lyapunov’s instability theorem
non-constant periodic solution
limit cycle L
example of limit cycle
limit cycle enclose equilibrium
Bendixson’s negative criterion
Poincaré-Bendixson Theorem
bifurcation in one-dimensional system
implicit function theorem
persistence of equilibrium
fold bifurcation (saddle-node bifurcation)
condition
stability
proof
bifurcation in planar system
persistence of equilibrium
fold bifurcation (saddle-node bifurcation)
Geometric Vector
vector
vector addition
scalar multiplication
component of vector
standard basis vector
dot product (scalar product, inner product)
angle between a,b, θ
perpendicular vector
direction angle
projection
cross product
magnitude of cross product
parallel vector
property of cross product
triple product
scalar triple product
vector triple product
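The vector operations listed above can be tried out with a small sketch. This is only an illustration in plain Python with 3D vectors as tuples; the helper names (dot, cross, norm, angle, projection, scalar_triple) are made up for this example and are not from any library.

```python
import math


def dot(a, b):
    """Dot product a⋅b of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product a × b of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def norm(a):
    """Magnitude |a|."""
    return math.sqrt(dot(a, a))


def angle(a, b):
    """Angle θ between a and b, from a⋅b = |a||b|cos θ."""
    return math.acos(dot(a, b) / (norm(a) * norm(b)))


def projection(a, b):
    """Vector projection of b onto a."""
    scale = dot(a, b) / dot(a, a)
    return tuple(scale * x for x in a)


def scalar_triple(a, b, c):
    """Scalar triple product a⋅(b × c)."""
    return dot(a, cross(b, c))


# Standard basis vectors.
i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
```

With these helpers, cross(i, j) gives k, and dot(a, b) == 0 is the test for perpendicular vectors.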
line L
plane
plane-plane relation
normal vector parallel
acute angle between normal vector
point-plane distance
cylinder
quadratic surface
standard form of quadratic surface
all traces are ellipses
horizontal traces are ellipses
vertical traces in the planes x=k and y=k are hyperbolas if k≠0 but are pairs of lines if k=0
horizontal traces are ellipses
vertical traces are parabolas
variable to the first power indicate axis of paraboloid
horizontal traces are ellipses
vertical traces are hyperbolas
axis of symmetry corresponds to variable with negative coefficient
horizontal traces are hyperbolas
vertical traces are parabolas
horizontal traces in z=k are ellipses if k>c or k<−c
vertical traces are hyperbolas
two minus signs indicate two sheets
vector function (vector-valued function)
limit of vector function
continuity of vector function
space curve
draw projection onto three plane
derivative of vector function
⇒ d/dt (r(t)⋅r(t)) = 0 ⇒ r(t)⋅r′(t) = 0
unit tangent vector
integral of vector function
arc length
curvature
smooth parametrized curve
(principal) unit normal vector
osculating plane
osculating circle (circle of curvature)
binormal vector
normal plane
velocity vector
speed (magnitude of velocity vector)
acceleration vector
Multivariate Calculus
multivariate function
level curve (contour curve)/ level space
limit of multivariate function
point out that the function approach different value from two different direction
continuity of multivariate function
partial derivative
higher partial derivative
Clairaut’s theorem
tangent plane
line approximation
increment
differentiable function
differential
implicit function
implicit function theorem
gradient
directional derivative
level surface
tangent plane of level surface
extrema
extrema subject to constraint
surface area
Jacobian
line integral
work
fundamental theorem of line integral
conservative vector field
independence of path of line integral
Green’s theorem
C=∂D is boundary of D,
P,Q have continuous partial derivative
nabla
Laplace operator
curl
divergence
Green’s theorem in vector form
parametric surface
oriented surface
flux
Stokes’ theorem
divergence theorem (Gauss’s theorem)
Parametric Equation
derivative
arc length
polar coordinates (r,θ)
transformation between Cartesian coordinates and polar coordinates
polar curve
area
arc length in polar coordinates
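As a rough sketch of the coordinate transformation and the polar area listed above: x = r cos θ, y = r sin θ, and the area swept by a polar curve is A = ∫ ½ r(θ)² dθ. The function names here are invented for illustration, and the area is only approximated numerically with a midpoint sum.

```python
import math


def to_polar(x, y):
    """Cartesian (x, y) to polar (r, θ), with θ in (-π, π]."""
    return math.hypot(x, y), math.atan2(y, x)


def to_cartesian(r, theta):
    """Polar (r, θ) to Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)


def polar_area(r_of_theta, a, b, n=10_000):
    """Approximate A = ∫_a^b ½ r(θ)² dθ with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(0.5 * r_of_theta(a + (i + 0.5) * h) ** 2 for i in range(n))


# The unit circle r(θ) = 1 should enclose an area close to π.
unit_circle_area = polar_area(lambda theta: 1.0, 0.0, 2 * math.pi)
```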
Sequence and Series
sequence {a_n}
sequence convergence
prove sequence convergence by function
sequence convergence preserve by chaining continuous function
convergence of {r^n}
sequence monotonicity
bounded sequence
monotonic sequence theorem
series (infinite series)
nth partial sum s_n
series convergence
example series
geometric series
power series ∑x^n
harmonic series
p-series
divergence test
integral test
remainder estimate for integral test
comparison test
⇒ ∑a_n converge
⇒ ∑a_n diverge
limit comparison test
alternating series
alternating series test
alternating series estimation theorem
absolute convergence
conditional convergence
ratio test
root test
convergence test strategy
⇒ (limit) comparison test
power series
radius of convergence R
converge iff x=a
converge for x∈R
converge if ∣x−a∣<R, diverge if ∣x−a∣>R
interval of convergence
find interval of convergence
differentiation or integration of power series
convergence and differentiability of power series
function f(x)’s power series expansion
Taylor series
Maclaurin series
n-th degree Taylor polynomial T_n
remainder of Taylor series R_n
Taylor’s inequality
Maclaurin series for e^x
Maclaurin series for trigonometric function
binomial series
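To make the Taylor polynomial and Taylor's inequality concrete, here is a small sketch for the Maclaurin series of e^x. It assumes the usual statement of Taylor's inequality, |R_n(x)| ≤ M |x|^(n+1) / (n+1)!, with M = e^|x| as a bound on the (n+1)-th derivative between 0 and x. The function names are made up for this example.

```python
import math


def maclaurin_exp(x, n):
    """n-th degree Taylor polynomial T_n of e^x about 0: sum of x^k / k!."""
    term = 1.0   # x^0 / 0!
    total = 1.0
    for k in range(1, n + 1):
        term *= x / k  # turn x^(k-1)/(k-1)! into x^k/k!
        total += term
    return total


def taylor_bound_exp(x, n):
    """Bound on |R_n(x)| from Taylor's inequality with M = e^|x|."""
    return math.exp(abs(x)) * abs(x) ** (n + 1) / math.factorial(n + 1)


approx_e = maclaurin_exp(1.0, 10)
actual_error = abs(approx_e - math.e)
error_bound = taylor_bound_exp(1.0, 10)
```

The actual error stays below the bound, and both shrink quickly as n grows because of the factorial in the denominator.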
Complex Analysis
complex number C
complex plane
binary operation of complex number
complex number short syntax
Argand diagram
complex conjugation
Euler’s formula
de Moivre’s formula
complex root
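Euler's formula and de Moivre's formula give the n complex roots of z directly: if z = r e^{iθ}, the roots are r^{1/n} e^{i(θ + 2πk)/n} for k = 0, …, n−1. The sketch below uses Python's standard cmath module; nth_roots is a hypothetical helper name chosen for this example.

```python
import cmath
import math


def nth_roots(z, n):
    """All n complex n-th roots of a non-zero z, via the polar form of z."""
    r, theta = cmath.polar(z)  # z = r * e^{iθ}
    return [
        cmath.rect(r ** (1 / n), (theta + 2 * math.pi * k) / n)
        for k in range(n)
    ]


# Euler's formula at θ = π: e^{iπ} + 1 should be (numerically) zero.
euler_identity_residual = abs(cmath.exp(1j * math.pi) + 1)

fourth_roots_of_unity = nth_roots(1, 4)   # approximately 1, i, -1, -i
cube_roots_of_minus_8 = nth_roots(-8, 3)
```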
standard topology Os
region
connectedness
bounded
complex infinity
Riemann sphere S
limit
connection between complex limit and real limit
continuity
differentiation
Cauchy-Riemann equations
Cauchy-Riemann equations in polar coordinate
differentiability from Cauchy-Riemann equations
at z0
holomorphicity
entire function
constancy from holomorphicity
parametric curve
parametric derivative
parametric integral
reparametrization
↦ϕ(t)=tC:z(t)=z(t) parametric curve length
contour
Jordan curve theorem
contour integral
function f(z) piecewise continuous on C
upper bound theorem for contour integral (ML theorem)
f(z) piecewise continuous on C,
f(z) bounded by M
proof of lemma
path independence of contour integral
Cauchy-Goursat theorem
simple closed contour C in D,
f:D→C holomorphic on and within C
Cauchy’s theorem
proof using Green's theorem and Cauchy-Riemann equations
Cauchy-Goursat theorem on simply connected domain
closed contour C in D,
f:D→C holomorphic on D
adoption of Cauchy-Goursat theorem on multiply connected domain
positively oriented contour C, negatively oriented simple closed contour Ci,
Ci inside of and disjoint from C,
f:D→C holomorphic on D and on the region between C,Ci
path deformation principle
f holomorphic between C1,C2
Cauchy’s integral formula (Cauchy’s formula)
f holomorphic inside and on C
proof
show using upper bound theorem
evaluating integral with Cauchy’s integral formula
Morera’s theorem
Liouville’s theorem
fundamental theorem of algebra
maximum modulus principle
⇒∣f∣ has no maximum on D
analyticity
Laurent series
any closed contour C in D,
function f holomorphic in D⇒
residue
Cauchy residue theorem
f analytic on and within C except on finite number of singularities zk,k∈{1,2,⋯,n}
residue at infinity
singularity
isolated singularity
essential isolated singularity
pole
residue at pole
zero of order m
f(z0)=0,
∃ m∈N+ s.t.
zero and pole
q has zero of order m at z0,
p(z0)≠0
improper integral
Cauchy principal value CPV
improper integral for even rational function
q(x) has finitely many zero zk above the real axis
Fourier integral
Jordan’s lemma
semicircle contour CR:∣z∣=R>R0, 0≤θ≤π
Jordan’s inequality
indented path
lemma for indented path
clockwise upper semicircle Ci: z=xi+ri e^{iθ}, π≥θ≥0, ri<r,
Laurent series of f about xi contain no even negative power
lemma for integral over branch point
f(z) continuous on Cr⇒
trigonometric integral
Laplace transform
Bromwich formula
line contour ΓR from γ−iR to γ+iR
meromorphic
theorem for meromorphic function
f non-zero and analytic on C, meromorphic within C
winding number
f(z)=ρe^{iϕ}≠0 on C,
Γ:f(z(t))
argument principle
Rouché’s theorem (dog on a leash theorem)
∣g∣<∣f∣ on C
⇒∣g∣<∣f∣ on C
Brouwer’s fixed point theorem special version
g:D→int D analytic
Hopf’s theorem
one can be continuously deformed into another without crossing point w
vector field index theorem
simple closed contour C enclose singularity zi,i=1,2,⋯ of V with index IV(Ci)⇒
vector field singularity
vector field index
Möbius transformation
circle in Argand diagram
cross-ratio
z1↦w1,z2↦w2,z3↦w3
conformal transformation
inverse function theorem
f′(z0)≠0
conformal transformation preserve angle
smooth C1,C2 at z0 and intersect with acute angle ψ
conformal transformation preserve harmonic
harmonic function
Dirichlet problem
prescribed ϕ(∂R),
find ϕ
maximum principle
ϕ harmonic on R
uniqueness of solution to Dirichlet problem
Complex function
logarithm
branch of logarithm
derivative of logarithm
exponential
derivative of exponential
trigonometric function
hyperbolic function
inverse trigonometric function
derivative of inverse trigonometric function
Contour integral of power function on circle
Complex Sequence and Series
sequence
sequence convergence
series
series convergence
remainder
power series
⇒ absolutely convergent in circle ∣z−z0∣<∣z1−z0∣
power series integration
proof
Taylor’s theorem
proof using Cauchy's integral formula
uniqueness of Taylor series
proof using power series integration and Cauchy's integral formula
Linear Algebra
determinant
Numerical Analysis
gradient descent
line search
gradient descent momentum
stochastic gradient descent (SGD)
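A minimal sketch of gradient descent with an optional momentum term, to go with the items above. It assumes a fixed learning rate (no line search) and the full gradient (no stochastic sampling); the objective f(x, y) = (x − 1)² + 2(y + 2)² and all names are invented for illustration.

```python
def gradient_descent(grad, x0, lr=0.1, momentum=0.0, steps=200):
    """Minimize a function given its gradient `grad`, starting from x0.

    velocity <- momentum * velocity - lr * gradient
    x        <- x + velocity
    With momentum=0.0 this is plain gradient descent.
    """
    x = list(x0)
    velocity = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
        x = [xi + vi for xi, vi in zip(x, velocity)]
    return x


def grad_f(x):
    """Gradient of the example objective f(x, y) = (x - 1)^2 + 2 (y + 2)^2."""
    return [2 * (x[0] - 1), 4 * (x[1] + 2)]


plain = gradient_descent(grad_f, [0.0, 0.0])
with_momentum = gradient_descent(grad_f, [0.0, 0.0], momentum=0.9, steps=500)
```

Both runs should approach the minimizer (1, −2). SGD would replace grad with a noisy estimate computed on a sampled subset of the data.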
integration
first-order method
second-order method
midpoint rule
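The midpoint rule approximates ∫ f over [a, b] by sampling f at the middle of each of n equal subintervals. The sketch below is one straightforward way to write it; the error-ratio check at the end illustrates the second-order behavior (halving the step size cuts the error by about 4) on the example integrand e^x, which is an arbitrary choice.

```python
import math


def midpoint_rule(f, a, b, n):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


exact = math.e - 1  # ∫_0^1 e^x dx
error_10 = abs(midpoint_rule(math.exp, 0.0, 1.0, 10) - exact)
error_20 = abs(midpoint_rule(math.exp, 0.0, 1.0, 20) - exact)
error_ratio = error_10 / error_20  # close to 4 for a second-order method
```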
Proof
proof by induction
direct proof
contrapositive proof
proof by contradiction
circular proof
Real Analysis
set
complement of set S
cardinality
finite set
infinite set
countable set
real number
every Cauchy sequence in R converge to R
Archimedean property
function
inverse function
sequence
sequence convergence
sequence diverge to infinity
bounded sequence
limit theorem
Cauchy sequence
Cauchy sequence in R converge to finite number in R
subsequence
limit point
Bolzano-Weierstrass Theorem
continuity
continuous function on closed interval
intermediate value theorem
uniform continuity
Lipschitz continuity
integral
partition
upper sum/ lower sum
Riemann integrability
Riemann integral
Riemann sum
differentiation
continuously differentiable
Rolle’s theorem
mean value theorem
Taylor’s theorem
sequence of function
sequence of function pointwise convergence
sequence of function uniform convergence
supremum norm
function converge in the sup norm
Cauchy sequence in the sup norm
integral equation
calculus of variation
functional
Euler equation
metric space
metric
sequence of point in metric space converge
equivalent metric
Cauchy sequence of point in metric space
complete metric space
contraction
fixed point of contraction
contraction mapping principle
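The contraction mapping principle says a contraction on a complete metric space has a unique fixed point, and iterating the map from any starting point converges to it. A small sketch on the real line, using cos on [0, 1] as the example contraction (it maps [0, 1] into itself and |cos′(x)| = |sin x| ≤ sin 1 < 1 there); fixed_point is a made-up helper name.

```python
import math


def fixed_point(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x <- T(x) until successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")


# Two different starting points in [0, 1] reach the same fixed point of cos.
from_one = fixed_point(math.cos, 1.0)
from_zero = fixed_point(math.cos, 0.0)
```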
series of function
limit superior and limit inferior
limit superior
limit inferior
partial sum
series convergence test
series absolute convergence
comparison test
root test
ratio test
series of function converge
Weierstrass M-test
integral of uniformly convergent series of function
derivative of uniformly convergent series of function
power series
convergence of power series
property of f(x) in power series
Shorthand
Topology
open
topological space (M,O)
neighborhood
closure
interior point
exterior point
boundary point
limit point
sequence convergence
continuity
Brouwer’s fixed point theorem
Package Manager
Cargo
cargo install <binary name>
cargo install-update --all
Homebrew
brew search <package>
brew install <package>
brew update --force \
&& brew upgrade --greedy \
&& brew cleanup \
&& brew autoremove
brew list
brew deps -t <package>
brew leaves > brew_leaves.txt
brew list --cask > brew_list_cask.txt
brew info <package>
brew cleanup && brew autoremove
export HOMEBREW_NO_AUTO_UPDATE=1
Gem
gem install a_gem
bundle
gem cleanup
Nix
nix-env -ibA <PACKAGE>
nix-env -iA <PACKAGE>
sudo nix-channel --update && nix-env -ibA nixpkgs.nix && nix-env -ub
sudo nix-store --verify
NPM
module.paths.push("/opt/homebrew/lib/node_modules");
Pacman
curl -s "https://archlinux.org/mirrorlist/?country=FR&protocol=https&use_mirror_status=on" | sed -e 's/^#Server/Server/' -e '/^#/d' | rankmirrors -n 5 -
sudo vim /etc/pacman.d/mirrorlist
sudo pacman -Sy archlinux-keyring
pip
python3 -m pip
instead of pip
python -m pip
on Windows
python3 -m pip install --upgrade pip
python3 -m pip list --outdated --format=json \
| python3 -c "import json, sys
print('\n'.join([x['name'] for x in json.load(sys.stdin)]))" \
| grep -v '^\-e' \
| cut -d = -f 1 \
| xargs -n1 python3 -m pip install -U
python3 -m pip -V
python3 -m pip cache purge
TeXLive Manager (tlmgr)
tlmgr search --file <name_of_missing_file> --global
tlmgr install <package_name>
Rye
[tool.uv]
override-dependencies = ["PACKAGE_NAME>=VERSION"]
Programming
C
type
primitive type
char
int
float
double
long
unsigned
variant
array
d0 × d1 × … × dn int array
int arrayName[d0][d1] /*…*/[dn] = {
{ /*…*/ },
/*…*/
};
d0
string
immutable string
char* name = "text";
mutable string
char name[] = "text";
char name[5] = "text";
string method
strlen(str)
length of str
strncmp(str0,str1,n)
compare str0
and str1
to at most n
strncat(dest, src, n)
concatenate src
to dest
for at most n
function
static
static variable
static function
pointer
explicitly create pointer
int value = 8;
int* ptr = &value; /* cannot take the address of a literal */
implicitly convert to pointer
char* ptr = "bla";
struct
declare struct
struct keyword
struct point {
int x;
int y;
};
struct point p;
p.x = 0;
p.y = 0;
typedef keyword
typedef struct {
int x;
int y;
} point;
point p;
recursive definition
typedef struct node_t {
int val;
struct node_t* next;
} node;
access struct’s field
.
->
for pointer struct
nested struct
typedef struct {
struct {
int x;
int y;
};
int z;
} nested;
union
typedef union {
int theInt;
char chars[4];
} intChar;
union for struct index
union Coins {
struct {
int quarter;
int dime;
int nickel;
int penny;
};
int coins[4];
};
dynamic allocation
person* p = malloc(sizeof(person));
malloc
return void pointer, implicitly converted
person* p = (person*) malloc(sizeof(person));
clean the dynamic allocation
free(p);
dynamically allocated array
int* arr = malloc(n * sizeof(int));
arr[i]
like ordinary array
pointer arithmetic
int*
skip by sizeof(int) byte (typically 4)
pointer function
int real_function(int n) {
return n * n;
}
int (*function_name)(int arg) = &real_function;
call pointer function
(function_name)(5);
array of pointer function
void f1();
void f2();
void (*function_name[2])() = {&f1, &f2};
clang
clang program.c -c -o program.o
clang program.o -o program
clang program.c \
-fsanitize=address,leak,undefined \
-o program
Elixir
[1, 2, 3] |> inspect(charlists: :as_lists) |> IO.puts()
\e
instead of \033[
function
anonymous function
f = fn arg0, arg1, … -> result end
f.(a0, a1, …)
capture syntax
&fn1/n_args
&(&1…)
&1
for the first argument, &2
for the second, etc.
Git
Checkout
git clone <repo> --no-checkout
git fetch && git branch
git reset --hard origin/main
git fetch origin main:main
git fetch --depth 1 git@github.com:username/repo.git FULL_SHA_FOR_COMMIT
Config
git config --global pull.rebase true # rebase on pull conflict
git config --global rebase.autoStash true # stash before rebase
git config --global submodule.recurse true # always recur submodule
Commit
git commit -m "<msg>"
git commit -am "<msg>"
git reset HEAD~
git revert …
--amend
git switch --orphan <branch>
Bookkeeping
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --index-filter \
'git rm -rf --cached --ignore-unmatch <path_to_file>' HEAD
git filter-repo --invert-paths --path '<path_to_file>' --use-base-name
find * -type l -not -exec grep -q "^{}$" .gitignore \; -print >> .gitignore
bash -c "git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest"
git diff --exit-code HEAD
git clean -f
git clean -fX
git rebase -i COMMIT_BEFORE_CHANGES
git merge --squash <branch>
Multiple repo
git submodule add <repo>
fd .git -H -t d -x git --git-dir={} pull
fd .git -H -t d -x git --git-dir={} fetch \; -x git --git-dir={} --work-tree {}/.. status
Multiple origin
git remote | xargs -L1 git push --all
git remote | xargs -L1 -I R git push R main
Gleam
\u{1b}[
instead of \033[
Java
built-in data type
primitive type
boolean
&&
||
!
assert (a && b) == (!(!a || !b))
number
Integer.MAX_VALUE + 1 = Integer.MIN_VALUE = -Integer.MIN_VALUE
short
int
float
long
double
Infinity
NaN
numerical operator
+ - *
/
integer division if both integer else float division
%
remainder
Math
library
char
convert to String
String.valueOf(charArray)
cast
(target_type) var_to_convert
wrapper type
String
concatenation
+
string1.append(string2)
StringBuilder
better way to concatenate string
array
{literal1,…}//an array
arr1.length//length attribute of array
Arrays.sort(arr1)//sort array arr1
declaration
type1[] arr1; //declare array name of type type1
initialization
new double[length] //an array of length length, all 0.0s
null
access & mutate
arr1[i] = literal; //refer to array arr1 by index i
two-dimensional array
basic syntax
conditioning
if
if (bool1) { execution1; }
else if (bool2) { execution2; }
// …
else { execution0; }
switch
switch (var1) {
case value1: execution1;
break;
// …
}
loop
while
while (bool1) { execution; }
do while
do { execution; } while (boolean);
for
for (initialization_statement1, …; bool1; increment_statement1, …) { execution; }
access modifier
default
public
public …;
protected
protected …;
private
private …;
mutability modifier
final … var1 …;
static vs instance
… static …;
comment
// inline
/* block */
docstring
/**
* doc
*/
assertion
assert bool1 : "error message";
bool1 == false
input
command-line argument
args[i]
0
stdin
import java.util.Scanner; //import scanner
Scanner sc= new Scanner(System.in); //define a scanner
Type1 var1 = sc.nextType1();//let variable be input
output
print
System.out.println(output)//print output \n
System.out.print(output)//print output
formatted print
System.out.printf("string1%w1.p1c1… string2",output1,…)
//print string1 output1… string2
// with field width w, precision .p, and conversion code c
w counts from the right
d: decimal
f: floating
e: scientific
s: string
b: boolean
% and c
pretty print object
Gson gson = new GsonBuilder().setPrettyPrinting().serializeNulls().create();
gson.toJson(object0);
class
class as function library
public class ProgramName {
public static void main(String[] args) {
// main function
}
}
global variable
public class ProgramName {
static type1 var1;
}
class as abstract data type
public class ClassName {
type1 ins1; // instance variable …
/*
* constructor
*/
public ClassName(arg…) {
// …
ins1 = …; // need to initialize all instance variable …
}
public type2 method1(arg…) {
// …
} // instance method …
public static void main(String[] args) {
// test
}
}
constructor
use constructor
ClassName var1 = new ClassName(arg…);
access instance variable
var1.ins1
use instance method
var1.method1(arg…);
common instance method
equals method
public boolean equals(Object x)
{
if (x == null) return false;
if (this.getClass() != x.getClass()) return false;
ClassName that = (ClassName) x; // cast before accessing the fields
return (this.ins1 == that.ins1) && …;
}
hashCode method
public int hashCode() {
return Objects.hash(ins1,…);
}
interface
public interface InterfaceName {
type1 VAR1 = …; // constant: interface fields are implicitly public static final …
public abstract type1 method1(arg…); // empty abstract method …
}
abstract method
implement
public class ClassName implements InterfaceName{
// …
}
implement Iterator
import java.util.Iterator;
class ClassName implements Iterator<Item> {
// …
// must-need method for Iterator
public boolean hasNext() {
// …
}
public Item next() {
// …
}
public void remove() {
// …
}
}
subclass
public class SubClassName extends ClassName {
// …
}
generic programming
generic class
public class GenericClass<typeParameter> {
// …
}
use constructor
GenericClass<type1> var1 = new GenericClass<type1>(var0);
functional programming
lambda expression
(arg…) -> stuffToReturn;
package
import
static import
import static package1
timing
System.currentTimeMillis()
System.nanoTime()
JavaScript
data type
??
give default value if the expression before is null
or undefined
?.
skip method chain and return undefined
if expression before is null or undefined
assignment
||
use the latter value if the former one is falsy
array
[…]
access data
array1[index1]
.length
.push()
.pop()
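work like stack
.unshift()
.shift()
work on the front of the array; .push() with .shift() work like queue
.splice(index,number,…)
remove number
items from index
and insert …
.reduce(fun1)
pass the accumulated result and each element into fun1, and carry the return value forward
[...arr1]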
[attr1, ...arr2] = arr1
loop array
for… of
loop
.forEach(callback)
string
`blah ${var1}…`
.toUpperCase()
.toLowerCase()
.trim()
.slice(start,end)
a.slice(1)
return substring excluding a[0]
a.slice(-1)
return a[a.length - 1]
function
function fun1(var1,…) {
// …
}
anonymous function
(var1,…) => {
// …
}
var1 => expression1
variadic function
arguments
magic variable: array-like object
(arg0, …, ...args)
: array
comparison
===
!==
does not convert type
==
!=
convert type
object
{
attr1: lit1,
// …
}
delete
attribute
.hasOwnProperty(attr)
check if has attr
{attr1, ...obj2} = obj1
obj2 = {...obj1, attr1: val1}
quick initialization
{
attr1,
attr2,
// …
}
{
attr1: attr1,
attr2: attr2,
// …
}
access attribute
.attr1
object1[attr1]
immutable object
Object.freeze(…)
method
class
class Class1{…}
constructor
constructor(…){…}
new
getter/setter
get var1() {…}
/set var1(…) {…}
MathJax
$…$
inline math
<script type="text/x-mathjax-config">
MathJax.Hub.Config({ tex2jax: {inlineMath: [["$","$"]]} })
</script>
Julia
use latex symbol
\…
and tab
built-in
constant
π
or pi = 3.1415926535897...
Inf
and NaN
i im
math function
sqrt()
variable
check type
typeof()
numerical type
extreme representable value
typemin(Type1)
/ typemax(Type1)
convert to bits
bitstring()
avoid overflow using
big()
BigInt
or BigFloat
or use big"…"
if the number is too big for Int or Float
machine epsilon
eps(Type1)
implicit numeric multiplication
2x^2x
is equivalent to 2 * x^(2 * x)
arithmetic operation
÷
integer division
^
power
.
make the next operator element-wise
e.g. .+
element-wise add
numeric comparison
isequal()
compare object
isfinite()
isinf()
isnan()
conversion
Type1(…)
pattern matching
_
can be assigned value
string
begin
and end
for index
r"…"
regex string
b"…"
byte string
v"…"
version literal
raw"…"
raw string
string concatenation
string(str1, str2, …)
combine multiple string
str1 * str2
concatenate them
"$str1 and $(str2)."
formatted string
tuple and list
(a, b, c)
tuple
[a, b, c]
list
match tuple or list of variable length
first, second, rest... = (a, b, c, d, e)
# rest = (c, d, e)
function
function func1(args)
# …
end
func1(args) = # …
nothing
is returned without a return value
function argument type annotation
func1(arg1::Type1, …)
optional argument
func1(arg1, opt_arg1=default_val1, …)
keyword argument
func1(arg1, …; keyword_arg1=default_val1, …)
;
when called
function taking function as argument
func1(f::Function, arg2, …)
func1(x -> …, val2, …)
func1(val2, …) do arg1_for_function_passed_in, …
# …
end
dispatch
anonymous function
(args…) -> # …
function (args)
# …
end
function composition
(f ∘ g)(args)
f(g(args))
function chaining (function piping)
args |> g |> f
use piping with broadcasting
.|>
apply piping element-wise
vectorize function
func1.(args)
apply func1
to each element in each argument, equivalent to
broadcast(func1, args)
Kotlin
pattern matching
when (var0) {
LITERAL -> // …
in 0..5 -> // …
is TYPE -> // …
else -> // …
}
nullability
var0: TYPE?
late-initialized variable
lateinit var var0: Type0
access field in nullable variable
null if var0 is null, using safe-call operator
var0?.field
var0!!.field
default_var0 if null, using Elvis operator
var0?.field ?: default_var0
anonymous function
fun fn0(arg0: TYPE0): RETURN_TYPE0 {
// …
}
val lambda0: (TYPE0) -> RETURN_TYPE0 = ::fn0
val lambda0: (TYPE0) -> RETURN_TYPE0 = { arg0 ->
// …
}
it
val lambda0 = {
doStuffWith(it) // …
}
fnUsingLambda(arg0) {
// Body of anonymous function.
}
suspend function
suspend fun fn0(arg0: TYPE0): RETURN_TYPE0 {
// …
}
CoroutineScope
, execute seemingly sequentially
GlobalScope
val scope = CoroutineScope(Job())
val scope = MainScope()
launch
or async
of a CoroutineScope
Dispatcher
:
withContext(Dispatchers.IO) {/*IO tasks*/}
withContext(Dispatchers.Default) {/*CPU tasks*/}
Misc
Array
s: .contentToString()
Lua
data type
nil
string
str1 .. str2
concatenation
string.format("literal and %s…", variables…)
formatted string
boolean
false
and nil
are falsy
map
map1 = {
key1 = val1,
-- …
}
for index, element in ipairs(map1) do
-- …
end
pairs
loop not necessarily in order
#map1
length
array
list = {}
for element in iterable1 do
table.insert(list, element)
end
table.sort(list)
arithmetic operation
~
bitwise not (Lua 5.3+); logical not is the keyword not
~=
not equal
variable
local
make variable local
function
function func1(args)
-- …
return …
end
{ ... }
put
function func1(...)
local args = { ... }
-- …
end
condition
if bool1 then
-- …
elseif bool2 then
-- …
else
-- …
end
while bool1 do
-- …
end
for element in iterable1 do
-- …
end
result = bool1 and option1 or option2
input/output
command line argument
local args = { ... }
get script file parent directory
debug.getinfo(1).source:match("@?(.*/)")
get home directory
os.getenv('HOME')
run command
function run_command(command, pattern)
-- keep handle and result local so they do not leak into globals
local handle = io.popen(command)
local result = handle:read(pattern or "*a")
handle:close()
return result
end
module
load a module
mod1 = require("mod1")
require
)
local function use(module)
package.loaded[module] = nil
return require(module)
end
create a module
Mod1 = {}
function Mod1.func1(args)
-- …
end
return Mod1
Perl
cpanm A::B
Python
environment
variable
operator
=
naming convention
PEP 8 guideline
inspect (in REPL)
unassign (delete)
del var1
data type
type()
string
str
len(string1)
string literal
"string literal"
""
or ''
"first and \
second"
"""first line
second line"""
r"raw string"
string operation
string1 + string2
need to manually make sure both are string
string1[index1]
→ character
negative index count from right
string1[index1:index2]
return string consisting of string1[index1]
to string1[index2 - 1]
leaving index1
or index2
empty means 0
or len(string1)
by default
exceeding string length return empty string ''
for those parts
string1 * n
repeat n
time
immutability
string method
.lower()
.upper()
.rstrip()
from the right
.lstrip()
from the left
.strip()
both side
.startswith()
.endswith()
→ boolean
.find(substring1)
→ int index of the first found; -1
if not found
.replace(string1,string2)
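A short runnable sketch of the string operations and methods above, using an arbitrary sample string. The variable names are only for illustration.

```python
text = "  Hello, World  "

stripped = text.strip()      # whitespace removed from both sides
left = text.lstrip()         # whitespace removed from the left only
right = text.rstrip()        # whitespace removed from the right only

first = stripped[0]          # indexing gives a 1-character str
last = stripped[-1]          # negative index counts from the right
middle = stripped[7:12]      # characters at index 7 through 11
head = stripped[:5]          # empty start means 0
tail = stripped[7:]          # empty end means len(stripped)
clipped = stripped[7:100]    # an out-of-range end is clipped, no error
empty = stripped[50:60]      # a fully out-of-range slice is ''

position = stripped.find("World")  # index of the first match
missing = stripped.find("xyz")     # -1 when not found
replaced = stripped.replace("World", "there")
repeated = "ab" * 3
starts = stripped.startswith("Hello")
lowered = stripped.lower()
```

Since str is immutable, every method here returns a new string and leaves text unchanged.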
input
input(prompt_string)
→ strstring conversion
str(var1)
output
print(string1, string2…)
each part is converted to string with str()
' '
are added between string
f"{string1}string_literal…"
→ str
{}
number
int
float
number literal
1
_
ignored (digit separator, e.g. 1_000)
1e2
→ float
inf
too large → inf / too small → -inf
arithmetic operation
+
subtraction -
multiplication *
→ int iff both operands are int, else float
/
→ float
//
→ int iff both operands are int, else float; rounds down
a zero denominator is not allowed regardless of type
**
→ int iff both operands are int and the exponent is non-negative, else float
%
→ int iff both operands are int, else float
number method
round(num1,n)
with n omitted (or None) → int
with int n → same type as num1, rounded to n decimal digits (negative n rounds to the left of the decimal point)
abs()
returns the same type
pow(base1,exponent1) = base1 ** exponent1
pow(base1,exponent1,mod1) = (base1 ** exponent1) % mod1
float1.is_integer()
→ boolean
number conversion
int(string1)
can only take a string in integer form
output
fixed-point number
f"{num1:format}"
.pt
precision p
type t
e.g. .2f
.pt%
in percentage form
,
separate digits with commas
complex number
format
n = num_real + num_imaginej, e.g. 1 + 2j
n.real
n.imag
→ float
n.conjugate()
*int and float also have these
None
NoneType
data structure
tuple
tuple literal
(elem1,…)
()
(element1,)
built-in create method
tuple(converted1)
convert iterable converted1
into tuple
length
len(tuple1)
indexing
tuple1[index1]
slicing
tuple1[index1:index2]
→ shallow copy
packing/ unpacking
tuple1 = elem1,…
var1,… = tuple1
var1,…,varN = literal1,…literalN
check contain
elem1 in tuple1
→ boolean
list
[]
create
list(iterable1)
string1.split(separator1)
[expression1 for elem in iterable1]
mutate
list1[index1] = elem1
list1[index1:index2] = list2
not necessarily same length
list1.insert(index1,elem1)
an index past the end appends to the end
list1.append(elem1)
append to the end, equivalent to list1.insert(len(list1),elem1)
list1.extend(iterable1)
append every element of an iterable
list1.pop(index1)
return and remove list1[index1]
list method
sum(list1)
iff all elements are numbers
min()
max()
list1[:]
or list1.copy()
list1.sort()
key list1.sort(key=func1)
sort by return value of func1(elem)
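A short sketch of the list mutations above, with made-up sample values. It shows that slice assignment may change the length and that `sort(key=…)` orders by the key function's return value.

```python
nums = [3, 1, 2]
nums[1:2] = [10, 20, 30]     # replace one element with three
nums.insert(100, 7)          # index past the end: appended at the end
nums.append(5)
nums.extend((8, 9))          # any iterable works
removed = nums.pop(0)        # return and remove nums[0]

words = "pear,fig,banana".split(",")
words.sort(key=len)          # sort by the length of each word

squares = [n * n for n in range(4)]
copy_of_nums = nums[:]       # shallow copy, same as nums.copy()
```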
dictionary
create dictionary
{key1:val1,…}
dict(((key1,val1),…))
{}
or dict()
access
dict1[key1]
→ value correspondingdict[index1]
dict1.items()
mutate dictionary
dict1[key1] = val1
del dict1[key1]
set
create set
set(a) # or
{a, b, c}
set method
.union(iter1)
≈ |
.difference(iter1)
≈ -
.symmetric_difference(iter1)
≈ ^
.issubset(set1)
≈ <=
.issuperset(set1)
≈ >=
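The method and operator pairs above can be checked side by side. This is a minimal sketch with arbitrary sample sets. One difference worth noting: the methods accept any iterable, while the operators require both sides to be sets.

```python
a = {1, 2, 3}
b = {3, 4}

union = a.union(b)                      # same as a | b
difference = a.difference(b)            # same as a - b
symmetric = a.symmetric_difference(b)   # same as a ^ b
is_subset = {1, 2}.issubset(a)          # same as {1, 2} <= a
is_superset = a.issuperset({1, 2})      # same as a >= {1, 2}

# Methods take any iterable; `a | [3, 4]` would raise TypeError.
union_from_list = a.union([3, 4])
```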
function
property
function are values
side effect
anatomy
def fun_name(parameter_list):
fun_signature
fun_body
parameter
fun_name(para1=val1,…)
return statement
return return_value
None
if no return statement
call
function_name(args,…)
built-in function
help(fun_name)
user-defined function
loop
while loop
while test_condition:
for loop
for membership_expression:
else:
executes if the loop did not break
membership expression
a in b
i in range(n)
break out
break
next iteration continue
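A sketch of the loop controls above. The function names are hypothetical. The `else` branch of a `for` loop runs only when the loop finishes without hitting `break`.

```python
def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            found = n
            break          # skips the else branch below
    else:
        found = None       # runs only if the loop never broke
    return found


def sum_of_odds(limit):
    """Sum the odd numbers in range(limit)."""
    total = 0
    for i in range(limit):
        if i % 2 == 0:
            continue       # jump to the next iteration
        total += i
    return total
```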
scope
LEGB rule
global var_name
let local access global
class
class Class1:
CamelCase
dunder method
.__method1__()
local method
_method1()
naming convention
(instance) method
.__str__()
instance
instantiate
Class1(para1,…)
attribute
class attribute
instance attribute
.__init__()
def __init__(self,para1,…):
access
object1.attribute1
dot notation
inheritance
class ChildClass1(ParentClass1):
object
isinstance(object1,Class1)
method inheritance
super().method1(arg1,…)
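A compact sketch tying the class notes together: class versus instance attributes, `__init__`, `__str__`, inheritance and `super()`. The `Animal` and `Dog` classes are invented for illustration.

```python
class Animal:
    kind = "animal"                    # class attribute, shared by instances

    def __init__(self, name):
        self.name = name               # instance attribute

    def __str__(self):
        return f"{self.name} the {self.kind}"

    def speak(self):
        return "..."


class Dog(Animal):                     # Dog inherits from Animal
    kind = "dog"                       # overrides the class attribute

    def speak(self):
        # Reuse the parent implementation, then extend it.
        return super().speak() + " Woof"


rex = Dog("Rex")
```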
module
calling module
import module1
import module1 as name1
from module1 import name1,…
namespace
module1
in module1.name1
run different code when the file is imported versus run directly
if __name__ == '__main__':
package
must-have module
__init__.py
import module from package
import package1.module1
from package1.module1 import name1
subpackage
package manager
de facto package manager
pip
file input/ output
path library module
pathlib
Path
object
path1.name
path1.stem
path1.suffix
path1.exists()
→ boolean
path1.is_file()
→ boolean
path1.is_dir()
→ boolean
create
Path(string1)
Path.home()
or Path.cwd()
path1 / string1
or path1 / path2
absolute/ relative path
absolute path
path1.is_absolute()
→ boolean
path1.anchor
relative path
path1.anchor
→ ''
path component
path1.parents
→ iterable
one level up path1.parent
→ Path
manipulation
path1.mkdir()
avoid error if the directory already exists: path1.mkdir(exist_ok=True)
create all missing parent directories: path1.mkdir(parents=True)
path1.touch()
touching a file that already exists keeps its content and only updates its modification time
path1.iterdir()
→ iterable
src_path1.replace(des_path1)
overwrites the destination by default
path1.unlink()
avoid error if the file does not exist: path1.unlink(missing_ok=True)
search for file in directory
path1.glob(pattern1)
→ iterable
*
any number of characters
?
1 character
[abc]
any included character
use **/
search in the subtree
or use path1.rglob(pattern1)
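A sketch of the `pathlib` operations above, run inside a temporary directory so nothing on disk is left behind. The function name and the file names are made up, and `missing_ok` assumes Python 3.8 or newer.

```python
import tempfile
from pathlib import Path


def demo_pathlib():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        nested = root / "a" / "b"                  # join with /
        nested.mkdir(parents=True, exist_ok=True)  # create a/ and a/b/
        note = nested / "note.txt"
        note.touch()
        (root / "top.txt").touch()

        direct = sorted(p.name for p in root.glob("*.txt"))      # this level only
        recursive = sorted(p.name for p in root.rglob("*.txt"))  # whole subtree
        info = (note.name, note.stem, note.suffix, note.is_file())

        note.unlink(missing_ok=True)
        return direct, recursive, info, note.exists()
```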
shutil
module
shutil.rmtree(path1)
os
module
pathlib
and shutil
are implemented using os
file
text file
character encoding
line ending
\r
\n
csv file
csv
module
write
csv.writer(file1)
writer1.writerow(list_of_str1)
add ','
between all items and a line terminator ('\r\n' by default)
at the end
writer1.writerows(list_of_list_of_str1)
csv.DictWriter(file1,fieldnames=list_of_str1)
DictWriter1.writeheader()
→ int of character written
DictWriter1.writerow(dict1)
DictWriter1.writerows(list_of_dict1)
it in fact works with any object with a write()
method
read
csv.reader(file1)
→ iterable of list of str
csv.DictReader(file1)
→ *iterable of dict of header:val
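A round-trip sketch of the `csv` writer and reader classes above. It uses an in-memory `io.StringIO` in place of a real file, which works because the writer only needs an object with a `write()` method. The field names and rows are invented.

```python
import csv
import io

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "score"])
writer.writeheader()
writer.writerow({"name": "Ada", "score": 95})
writer.writerows([{"name": "Bob", "score": 80}])

text = buffer.getvalue()

# Reading back: every value comes back as a string.
rows = list(csv.DictReader(io.StringIO(text, newline="")))
```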
binary file
file object
create
path1.open()
from pathlib
path1.open(mode=mode1)
"r"
read
"w"
write
"a"
append
"~b"
for binary file, with ~ being one of the modes above
path1.open(encoding=encoding1)
"ascii"
"utf-8"
open(string1)
built-in; same as path1.open()
except using a string
with statement
with … as file1:
close
file1.close()
read
file1.read()
→ str
file1.readlines()
→ list of str
write
write a string
file1.write(string1)
→ int of character written
write a list of string
file1.writelines(list1)
regular expression (regex)
meta-character
wild card
*
any number of the character left to it, greedy, excluding \n
.
a character, excluding \n
*?
any number of the character left to it, non-greedy, excluding \n
string method
re.findall(pattern_string1,checked_string1)
→ list of str
re.meth1(…,re.IGNORECASE)
re.search(pattern_string1,checked_string1)
→ MatchObject
match_object1.group()
→ the first greedy result
re.sub(pattern_string1,replace_string1,checked_string1)
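A sketch contrasting the greedy `*` with the non-greedy `*?`, followed by the three `re` functions above. The sample HTML string is made up.

```python
import re

html = "<b>one</b> and <b>two</b>"

greedy = re.findall(r"<b>.*</b>", html)    # one match spanning both tags
lazy = re.findall(r"<b>.*?</b>", html)     # one match per tag pair

match = re.search(r"ONE", html, re.IGNORECASE)
found = match.group() if match else None   # re.search gives None on no match

cleaned = re.sub(r"</?b>", "", html)       # drop the opening and closing tags
```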
mistake
error
try
try:
execution1
except ErrorClass as error_name:
execution2
finally:
execution3
except (error_name1,error_name2):
except
except:
catch any error
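A sketch of the `try` anatomy above, using a hypothetical `parse_int` helper. It catches a tuple of error classes, binds the error to a name, and shows that `finally` runs on both paths.

```python
def parse_int(text):
    """Return (value, events); value is None when text is not an integer."""
    events = []
    try:
        value = int(text)
    except (ValueError, TypeError) as error:
        events.append(type(error).__name__)   # which error class was caught
        value = None
    finally:
        events.append("done")                 # runs whether or not it failed
    return value, events
```

Catching specific classes is usually preferable to a bare `except:`, which also swallows `KeyboardInterrupt` and `SystemExit`.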
comment
# block comment
code # in-line comment
docstring """description of this program"""
tqdm
from tqdm import tqdm
tqdm._instances.pop().close()
Mathematica
approximation
….
N[]
complex number
I
variable assignment
Clear[]
data structure
list
list1[[index1]]
list1[[start1;;end1]]
FullForm[]
vector
.
or Dot[,]
esc
+ cross
+ esc
or Cross[,]
matrix
MatrixForm[]
hide output
;
force simplify
Simplify[]
math function
define
f[var1_]:=…
calculate
f[…]
integration
Integrate[fun1[var1],var1]
Integrate[fun1[var1],{var1,start1,end1}]
NIntegrate
equation
polynomial
SolveValues[]
gives exact solutions
NSolveValues[]
gives approximate solutions
non-polynomial
FindRoot[]
uses Newton’s method
plot
MATLAB
syntax
number
special variable
ans
eps
special constant
NaN
Inf
i
or j
pi
numerical operator
+
minus -
*
.*
^
.^
\
.\
/
./
sqrt()
character syntax
:
()
[]
.
...
,
_
._
semicolon
;
percentage sign
%
% comment
%{
comment
%}
save and load
.mat
file
save file_name
load file_name
variable
assign
null
typeans
var1 = command1 ...
command2
global
vector
[num1 …]
[row1;…]
matrix
[r1c1 r1c2 …; r2c1 …]
solve Ax=B
x = A\B
variable management
who
whos
clear
var1
clear var1
exist var1
session management
shell environment
matlab -nodisplay -nodesktop
quit
or exit
file system
cd
dir
delete
pwd
format
format arg1
format short
format long
format bank
format arg1 e
format rat
find help
help
lookfor
MIPS Assembly
register
$0: always 0
$v0, $v1: expression evaluation and function return value
$a0 ~ $a3: argument
$t0 ~ $t9: temporary variable saved by caller
$s0 ~ $s7: temporary variable saved by callee
$gp: global area pointer
$sp: stack pointer
$fp: frame pointer
$ra: return address
memory declaration
.data
var0: .word number0, number1, …
var1: .space number_of_bytes
var2: .byte character0, character1, …
var3: .asciiz string
var4: .float floating_point_number
.text
.align 2
.global main
main: …
function0: …
line label
label: instruction0
…
inner label
instruction format
opcode operands
6 bits 26 bits
R-type
name Op Rs Rt Rd Sh Func
bits 6 5 5 5 5 6
I-type
name Op Rs Rt Immed
bits 6 5 5 16
J-type
name Op Target
bits 6 26
system call
syscall
$v0
$a0
, $a1
, $f12
(float)syscall
$v0
, $f0
(float)syscall code
arguments: buffer, length
Racket
language type
#lang racket
data type
string
boolean
#t
#f
all values other than #f
are considered #t
variable
(define var_name expression)
function (procedure)
(define (func_name args) (output))
(func_name args)
special form
(if test1 do_if_true do_if_false)
(cond (test1 do_if_test1)
(test2 do_if_test2)
(else fallback_expr))
(begin first_thing_to_do second_to_do)
Ruby
string
string literal
'parsed as is'
\
"string where \nis parsed"
formatted string
str = "yes"
"say #{str}"
match string
str =~ regex
comp = /\[(?<comp1>.*)\](?<comp2>.*)/.match(str)
comp1 = comp[:comp1]
comp2 = comp[:comp2]
get all word from string
str1.split(/[^[[:word:]]]+/)
array
arr = [ "a", 1, true ]
arr[0] # => "a"
arr[1, 2] # => [ 1, true ]
array method
.length
.push(ele)
.pop
# loop through
arr.each do |ele|
# …
end
# join with `str`
arr.join(str)
# map
arr.map do |ele|
# …
end
# map with index
arr.map.with_index do |ele, index|
# …
end
string array
%w(this will be split) # => [ "this", "will", "be", "split" ]
hash
# old way
ha = {
'a' => 1,
'b' => 2
}
ha['a'] # => 1
ha['c'] # => nil
# new way
ha = {
'a': 1,
'b': 2
}
ha[:a] # => 1
block
{ |args|
# …
}
# or
do |args|
# …
end
call block
yield {|ele| puts ele}
control flow
if
if a
# …
elsif b
# …
else
# …
end
case
case a
when b
# …
when c
# …
end
while
while a
# …
end
statement modifier
do_something if a
function
yield
execute a block
eval
evaluate a string as code
define function
def func1(arg = default_value)
# …
# return last expression
end
main function
if __FILE__ == $0
# main function
end
class
class ClassName
# called with `new`
def initialize(arg)
# …
end
end
instance variable
@name
access instance variable
attr_accessor :name
method
# all instance method of class
ClassName.instance_methods
# instance method excluding inherited
ClassName.instance_methods(false)
inherited method
.respond_to?(str)
check if object has method str
.to_s
convert to string
.nil?
check if nil
static method
def self.static_method_name(arg)
# …
end
safe navigation syntax
&.
optional chaining
module
module ModuleName
def self.static_method1
# …
end
end
exception
class CustomError < RuntimeError
end
Rails
new rails project
rails new <project_name>
start rails server
rails s
routing
config/routes.rb
Rails.application.routes.draw do
# root route is special
root 'articles#index'
# non-root route like this
# trigger `index` action in `articles` when GET request on `articles`
get 'articles', to: 'articles#index'
# route with variable use `:`
# the variable is passed to controller in the hash `params`
# in this case `params[:id]`
get 'articles/:id', to: 'articles#show'
end
resourceful routing
Rails.application.routes.draw do
resources :articles
end
URI pattern controller#action
articles articles#index
articles/new articles#new
articles/:id articles#show
(POST) articles articles#create
articles/:id/edit articles#edit
(PATCH) articles/:id articles#update
(DELETE) articles/:id articles#destroy
method return
article_path "articles/#{article.id}"
model
app/models
generate model
Article
with two fields
rails g model Article title:string body:text
generate and save new model object
# generate new article object
article = Article.new(title: 'Example', body: 'Example text.')
# save to database
article.save # => true
update model object
article.update(title: 'New title') # => true
validation of model object when save or update
# declare an `article_params` method that permits only the listed fields
def article_params
params.require(:article).permit(:title, :author, :body, :status)
end
# use the method when try saving or updating
if @article.update(article_params)
redirect_to @article
else
render :edit, status: :unprocessable_entity
end
query model object from database
# query article by id
Article.find(1) # => the Article object; raises ActiveRecord::RecordNotFound if missing
# query all articles
Article.all # => an ActiveRecord::Relation object
view
app/views
html.erb
file
<% … %>
<%= … %>
<%# comment %>
access instance variable in view
<ul>
<% @articles.each do |article| %>
<li>
<%= article.title %>
</li>
<% end %>
</ul>
link to other page in view
article_path
<%= link_to some_text, article %>
redirect_to @article
redirect_to
to mutate database
<%= image_tag image_path %>
<%= link_to some_text,
url_for(params.permit!.merge(field_to_change: field_value)) %>
partial template
_
_form.html.erb
access partial template from another view
<%= render 'comments/form' %>
_
render 'form', article: @article
# or the longer format
render partial: 'form', locals: { article: @article }
# or even longer format
render partial: 'form', object: @article, as: 'article'
render @products
# or longer format
render partial: 'product', collection: @products
<% @products.each do |product| %>
<%= render partial: "product", locals: { product: product } %>
<% end %>
controller
app/controllers
articles
is articles_controller.rb
generate controller
ArticlesController
and its index
action
rails g controller Articles index
index
action of Articles
render app/views/articles/index.html.erb
instance variable in controller
# this will be accessed in `articles/index.html.erb`
def index
@articles = Article.all
end
controller method for resourceful routing
index
method in controller
show
method in controller
params[:id]
new
method in controller
create
method in controller
def create
@article = Article.new(title: '…', body: '…')
if @article.save
redirect_to @article
else
render :new, status: :unprocessable_entity
end
end
database
migration
db/migrate
rails db:migrate
console
rails console
Hotwire
Turbo
Rust
function
fn
main function
fn main(){}
associated function
type1::fun1()
method
instance1
.method1();
variable
declare
let
let mut
type
String
String::new()
str
"str literal" // str literal
r#"str
literal"# /* raw str
literal */
enumeration
variant
result
Result
Ok
or Err
Option
result.map_err(|e|/*…*/).ok()
result.map_or_else(|e|/*…*/, identity)
anyhow
, both Result
and Option
can be wrapped as Result
using .context
Ok
variant
Err
variant
.expect()
method
Err
and display argument
statement
;
expression
;
can be used as return statement
reference
&var1
&mut var1
input/ output
use std::io
input
io::stdin()
→ an instance of type std::io::Stdin
std::io::Stdin
macro
print
println!()
debug print
dbg!()
todo
todo!()
unreachable
unreachable!()
comment
//
package
crate
root convention
src/main.rs
src/lib.rs
import crate
use
crate::…
self::…
import function
import struct or enum
alias
use … as …
re-exporting
pub use …
nesting path
use common_path::{path1,path2…}
use common_path::{self,path1…}
glob operator
*
trait
define
pub trait Trait1 {…}
trait method
fn fun1(&self,…)->Type1;
fn fun1(&self,…)->Type1 {…}
implement trait on type
impl Trait1 for Type1 {…}
implement trait method
fn fun1(&self,…)->Type1 {…}
use trait as parameter
pub fn fun1(para1: &impl Trait1) {…}
pub fn fun1<T: Trait1>(para1: &T) {…}
multiple trait bound
+
pub fn fun1(para1: &(impl Trait1 + Trait2)) {…}
pub fn fun1<T: Trait1 + Trait2>(para1: &T) {…}
where
clause
fn fun1<T,U>(t: &T, u: &U)->Type1
where T: Trait1 + Trait2,
U: Trait3 + Trait4
{…}
return type implement trait
fn fun1(…)->impl Trait1 {…}
closure
let closure1 = |var1,…| {return_value};
FnOnce
trait
type annotation
|var1: type1,…| -> return_type {return_value};
storing closure
struct Store1<T>
where
T: Fn(type_for_var1,…) -> return_type,
{
attr1: T,
…
}
Option<…>
capture environment
Fn
trait, mutable as needed implement FnMut
trait
move
move |var1,…| {return_value}
R lang
naming
character allowed
.
, _
_
..
basic syntax
command separation
;
or \n
comment
#
lasts to the end of the line
assignment
=
assignee <- assigner
assigner -> assignee
function
execute from
.R
file
source("filename.R")
manipulate data storage
show all object
objects()
remove object
rm(obj1,obj2,…)
store to disk
y
when quit q()
stored as .RData
data structure
vector
c(ele1,ele2,…)
can also take vectors as elements, which are flattened
package managing
install package
install.packages("package_name")
use a vector install.packages(c("package1","package2",…))
Shell
sudo
file & directory
pwd
path
ls [path]
path
, by default to ~
cd [path]
cd ..
cd -
cp <original file path> <new copy’s parent folder path>[/<new copy’s name>]
if the new directory does not exist, the command will create it
cp -r <original folder path> <new copy’s parent folder path>
rm <file path>
rm -r <folder path>
rmdir <folder path>
mkdir
./<file name>
system information
df [-h]
du [path]
free [-m]
top
uname -a
lsb_release -a
user
adduser <new user’s name>
passwd <user name>
manual
man <command>
man intro
<string>
whatis -r <string>
history
history
cursor navigation
ANSI terminal escape sequence
\033[
\033[2J\033[H
\033[2K\r
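A sketch of how the two escape sequences above are commonly used: `\033[2J` erases the display and `\033[H` moves the cursor home, while `\033[2K` erases the current line and `\r` returns to its start. The constant and function names are illustrative, and the sketch assumes an ANSI-capable terminal.

```python
CSI = "\033["                                   # Control Sequence Introducer

CLEAR_SCREEN_AND_HOME = CSI + "2J" + CSI + "H"  # erase display, cursor to top-left
CLEAR_LINE_AND_RETURN = CSI + "2K" + "\r"       # erase line, cursor to column 0


def progress_line(done, total):
    """Build a string that overwrites the current terminal line in place."""
    return f"{CLEAR_LINE_AND_RETURN}{done}/{total}"


if __name__ == "__main__":
    import sys
    import time

    for step in range(1, 4):
        sys.stdout.write(progress_line(step, 3))
        sys.stdout.flush()
        time.sleep(0.2)
    sys.stdout.write("\n")
```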
The Trail Language
Syntax
a+b
is a single token; it cannot involve a procedure call.
=
and many other side-effects-only procedures return ( )
.
1 = a -- Assign `1` to `a` by calling procedure `=` at comptime.
true =M b -- Mutable variable.
a / 2.0 = a/2 -- Call procedure `/` with `a` and `2.0` and assign to `a/2`.
false =R b -- Reassign `b`.
0 .. 10 do { "Hey" print } -- Print "Hey" 10 times.
( 1 2 ) nth 0 assert_eq 1
Procedures
=>
is a comptime procedure that takes a tuple of identifiers and a block as arguments and returns a procedure.
call
to call it.
+doc
takes anything and a string as its two arguments, and applies the string as documentation to the first argument.
( a b ) => { a + b } = add -- Procedure `add`.
( ) => { "hell" ++ "o" } +doc "Return \"hello\"." = greet -- Procedure `greet`.
greet doc starts_with "Return" assert -- Assert the documentation of `greet`.
greet = also_greet
greet call = hello -- => "hello"
Names
( a b ) => { a pow b } = ** -- Procedure `**`.
Compile-time (comptime) computation
C
are calculated.
32 * 32 = TWO_TO_TEN -- Comptime.
( a b ) => {
b * b = b_square -- Run-time.
a * TWO_TO_TEN + b_square -- Run-time.
} = hyperbole -- Comptime.
( x ) => {
2 hyperbole 7 C = LUCKY_NUMBER -- Comptime.
LUCKY_NUMBER is_comptime assert -- Assert that `LUCKY_NUMBER` is comptime.
x + LUCKY_NUMBER -- Run-time.
} = add_luck -- Comptime.
2 =M TO_BE_CHANGED -- Comptime mutable.
0 .. 5 do { TO_BE_CHANGED * TO_BE_CHANGED =R TO_BE_CHANGED } -- Comptime
-- `TO_BE_CHANGED` = 4294967296
-- Assign a procedure type that take one argument of `Any` type and returns
-- an instance of `( )` to `equal_type`.
( Any ) proc_type ( ) = equal_type
= s_type assert_eq equal_type -- Assert `=`'s type.
Referral Links
Payment
Research Notes
Topic
Execution
arguments.md
and literature.md
Overall steps
[ ] find additional topics beyond blog posts to generate websites/ find tools to generates similar to existing Preliminary
preliminary_binoculars_eval.md
) edit: … and WikiHow search resultKeyword acquisition
wikihow.md
)[ ] from Google Trends ( google_trends.md
)get topic class from Google Trends[ ] human brainstorm related keywordsWeb searching and crawling
web_search.md
DOM Distiller Reading ModeGenerated text detection
[ ] filter out non-article ( rely on #tokens filtering & low probability of getting many title pagesfilter_non_article.md
)[ ] text cleaning cleaned by TrafilaturaCase studies
ad_extraction.md
)perhaps use blacklist used by RefinedWeb few matchesDevelopment
--recurse-submodules
and remember to update submodules on pullrye sync
, rye add
)nvidia-cuda-runtime-cu12
version for Binoculars is unfortunately hardcoded for Exxact; it needs to change if used on another machine.
. venv/bin/activate # if not in Rye's virtual env
pre-commit install
. static_checks.sh
before making pull requestArguments
reference.bib
for future writing)Significance
Crawling
Generated text detection
Content farm
Generalizing to non-article webpages
Literature
Content farm & scam
Search Engine Optimization (SEO)
Generative AI (GenAI)
Training data curation
Synthetic data
Google Trends
google_trends_all_cat.json
from https://github.com/pat310/google-trends-api/wiki/Google-Trends-Categorieshow to
→ how to train y
→ how to train your dragon
Preliminary Binoculars Evaluation
degentweb.common_crawl.classify_english
data/common_crawl/prelim_test/
; 5961 page; used 1h 20mManual inspection of low-score page among 1010 Common Crawl page
Case studies
Problem
❓ tested, unreliable<meta property="og:type" content="article">
but not every article has thisGoogle Bing 500 WikiHow articlesdegentweb.browser.bing_search
degentweb.classifying.google_prelim
degentweb.classifying.prelim_data_analysis
not running browser; 20 Bing search result per query; ~133 Google querytend to have seem not reliableog:type
be article
As an AI
og:type
or have it be webpage
Site crawling
degentweb.browser.visit_subdomains
Ad Extraction
Options for methods
lxml
or similar Browser operation
networkidle
(in degentweb.browser.save_page
)degentweb.browser.recrawl_no_ads
Ad classification
Filter out Non-Article
og:type
= article
. Unreliable because not every article has this tag, and content farms may set it to website
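For reference, the `og:type` check discussed above could be sketched with the standard-library HTML parser. This is an illustration under assumptions, not the project's implementation: the class and function names are hypothetical, and only the first `og:type` tag is read.

```python
from html.parser import HTMLParser


class OgTypeParser(HTMLParser):
    """Record the content of the first <meta property="og:type"> tag."""

    def __init__(self):
        super().__init__()
        self.og_type = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.og_type is not None:
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:type":
            self.og_type = attributes.get("content")


def og_type_of(html):
    """Return the og:type value of a page, or None if the tag is absent."""
    parser = OgTypeParser()
    parser.feed(html)
    return parser.og_type
```

A `None` result says nothing about whether the page is an article, which is the reason the notes call this signal unreliable.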
Segmentation for generalization
Webpage segmentation (WPS) tool
Baseline Websites
curl/seed_db.py
Human-written
EDGAR database → company name → search for website: many of them do not have a website
US Business Database? no website link
LinkedIn? forbids crawling
.*/blog/.*
URL in CommonCrawl?Machine-generated
curl/suggestions.py
) Training/test dataset
AI website generator
Provided example generated sites
/blog
; unlike most content farm foundWebsite generator capability
What category to cover
Classifying Websites
SVM website classifier
classifying/site_svm.py
Content-Type: text/html
filter_non_article.md
)baseline_sites.md
)Applying in the wild
classifying/full_site_svm.py
)Crawling websites for classification
browser/bing_search.py
Generative AI (GenAI)
AI-generated text detector
(7s per input on A100) Issues from AI-generated text
et al.
) lessHuman detection of AI-generated content
Browser extension/add-on
Sketch of JSphere the Project
Problems
Hypotheses
Preemptive loading. I don’t think so.Deeper motivation
JS functionality spheres
Survey methods
Questionnaire for developers—social science :(Analysis methods
—too much work. Presumably VisibleV8. Anticipated implications
Development
Literature around JS Monitoring
A Symbolic Execution Framework for JavaScript, 2010 S&P
An empirical study of privacy-violating information flows in JavaScript web applications, 2010 CCS
Modeling the HTML DOM and browser API in static analysis of JavaScript web applications, 2011 ESEC/FSE
🤷 JSFlow: Tracking Information Flow in JavaScript and its APIs, 2014 SAC
Online Tracking: A 1-million-site Measurement and Analysis, 2016 CCS
⭐ Browser Feature Usage on the Modern Web, 2016 IMC
🙅 JSgraph: Enabling Reconstruction of Web Attacks via Efficient Tracking of Live In-Browser JavaScript Executions, 2018 NDSS
⭐ VisibleV8: In-browser Monitoring of JavaScript in the Wild, 2019 IMC
Hiding in Plain Site: Detecting JavaScript Obfuscation through Concealed Browser API Usage, 2020 IMC
Jalangi: A Selective Record-Replay and Dynamic Analysis Framework for JavaScript, 2013 ESEC/FSE
👎 UXJs: Tracking and Analyzing Web Usage Information With a Javascript Oriented Approach, 2020 IEEE Access
FV8: A Forced Execution JavaScript Engine for Detecting Evasive Techniques, 2024 USENIX Security
Execution
Crawling
tests/README.md
. Need parser. They have post-processor/
but too specific. crawler.js
). ⇒ Let’s just use Puppeteer’s successor, Playwright. User-Agent
.headless_browser/target/
on host. sudo setenforce 0
for Fedora due to SELinux.encodeURIComponent(url)
, under which; user_data/
;N
out of 0~4) trial launch separate VV8 and write: $N/vv8-*.log
$N.har
reachable$N.json
eval
.Prevent navigation. Go back in browser history immediately when navigating. Browser bug: sometimes go back too much to Detect if page has horde on about:blank
.load
event, and reload if not.[ ] Fewer navigation when headless??www.
prefix, e.g., google.com.user_data/
after all trials.Analyze API call traces
popular_api_calls_analysis.md
. notable_apis.md
.HTMLDocument.createElement
before interaction is clearly DOM element generation.addEventListener
calls are frontend processing.notable_apis.md
.eval
per function chunk of 1kB code. Details in eval_trick.md
. Log file interpretation
$N/vv8-*.log
contains:Observations when manually inspecting aggregated logs for YouTube
youtube_scripts_api_calls_overview.md
.addEventListener
and appendChild
strongly indicate specific spheres.getting and setting custom attributes on window
, etc. are recorded, but they are not browser APIs.Function
s generally seem more useful because we can and do filter out user-defined ones. this
, 2 characters for attr
, at most 3 consecutive number).getting and setting from window
, calling Array
, etc. generally means nothing. API types (function, get, etc.) also seem useless once we consider this
and attr
.Classification heuristics
notable_apis.md
). See the classification results in classification_results.md
.Certain indicators
.*Event
, Location
(some attributes), HTML(Input|TextArea)Element.(value|checked)
addEventListener
, getBoundingClientRect
textContent
and anything on URLSearchParams
, DOMRect
, DOMRectReadOnly
createElement
, createElementNS
, createTextNode
, appendChild
, insertBefore
, CSSStyleDeclaration.setProperty
CSSStyleDeclaration
, style
removeAttribute
, matchMedia
, removeChild
, requestAnimationFrame
, cancelAnimationFrame
, FontFaceSet.load
, MediaQueryList.matches
hidden
, disabled
Performance
, PerformanceTiming
, PerformanceResourceTiming
, Navigator.sendBeacon
Intermediate indicators
XMLHttpRequest
(and Window.fetch
): send/fetch data from server, one of: SVGGraphicsElement
subclasses and canvas elements: graphics for UX enhancement, but you can render them and send SVG, so maybe DOM element generation?CSSStyleRule
, CSSRuleList
: UX enhancement or DOM element generation.Window.scrollY
: UX enhancement or frontend processing.Uncertain indicators
querySelector[All]
, getElement[s]By.*
: get a node, but then what?.*Element
’s contains
, matches
: search for a node or string, but then what?Storage
, HTMLDocument.cookie
: local storage, but then what?DOMTokenList
: store/retrieve info on node, but then what?IntersectionObserverEntry
: viewport and visibility, but then what?ShadowRoot
: web components, but then what?Crypto.getRandomValues
frames
: iframesDeferred
chrome.1
, chrome.2
and other such logs after their previous logs (chrome.0
) when analyzing to avoid unknown execution context ID.Notable APIs
(row index) api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
16281 Function HTMLDivElement matches 843 835 1115182 1064906 4.807023 6.144259 2.282807 2.758132 95.491678
3621 Get TransitionEvent target 326 326 884580 883460 3.460662 5.166294 5.826075 6.594800 99.873386
9483 Function HTMLDivElement querySelector 1871 1647 847881 808260 1.416274 2.079385 2.040353 1.344933 95.327057
3378 Function CustomElementRegistry get 101 71 828723 806370 12.442614 13.151448 5.779473 2.220260 97.302718
14414 Get Event target 2002 1903 824110 783286 1.179062 1.482671 4.650736 5.382586 95.046292
16594 Function Performance now 2764 2340 624855 498159 0.960439 1.144829 3.490532 3.096365 79.723936
13516 Get Window location 9423 7181 579071 319716 0.664399 0.615618 6.606676 4.230041 55.211882
16209 Function HTMLDivElement querySelectorAll 2294 2227 574049 442116 0.767773 0.923417 1.696780 1.783361 77.017119
5904 Get HTMLDivElement classList 2137 1910 435096 351005 0.811707 0.916675 2.908658 2.905556 80.673001
17431 Get HTMLDocument nodeType 1153 969 419956 348949 0.773276 0.978419 1.682256 1.831237 83.091800
11333 Function HTMLDivElement closest 1111 1109 415799 62022 1.477813 0.420875 2.660322 3.038853 14.916342
5142 Get Event keyCode 1147 1147 376969 361243 0.682080 0.925483 2.624004 9.471046 95.828304
14908 Get Event key 790 786 344674 329328 0.900894 1.128496 2.078672 4.175591 95.547677
4078 Function HTMLDivElement contains 542 535 303535 254118 0.702358 0.960562 1.509068 3.241954 83.719505
5498 Get HTMLDivElement className 904 870 299901 263160 0.837205 1.027364 2.385866 2.627550 87.748957
2457 Get Window getComputedStyle 1535 1310 295456 159215 0.592931 0.438401 1.151197 0.846097 53.887889
2014 Function HTMLDocument createElement 9320 6330 286107 122524 0.296257 0.219127 2.612151 0.967649 42.824538
9545 Get Event which 455 454 263921 248376 0.621834 0.818845 1.011871 1.339680 94.109980
7048 Function HTMLDivElement addEventListener 2514 2184 252068 107180 0.391858 0.234839 2.477123 1.046851 42.520272
9862 Get HTMLAnchorElement nodeType 1250 1203 250886 102360 0.416466 0.238107 0.954481 0.410062 40.799407
7499 Get MouseEvent target 2602 2602 243446 243446 0.353644 0.459806 2.971607 4.680082 100.000000
3346 Get DOMRect top 1266 1213 227425 144601 0.468570 0.443659 1.414995 1.919121 63.581840
5732 Function HTMLDocument querySelectorAll 4647 3765 223991 187374 0.332157 0.412884 1.507670 0.883315 83.652468
8406 Get Window performance 3257 2539 223517 126555 0.347078 0.293616 2.802016 1.382159 56.619854
4999 Get ImageData data 22 12 218417 17777 14.651612 16.882722 6.550062 2.413431 8.139018
16481 Get DOMRect left 1177 1134 212885 126981 0.441333 0.387885 1.458221 1.986245 59.647697
11541 Function DOMTokenList contains 1919 1724 210971 141308 0.421512 0.391151 2.304020 2.127485 66.979822
3329 Get Event ctrlKey 543 533 209527 208734 0.494948 0.654059 1.262094 3.066172 99.621528
14621 Get Event timeStamp 1033 944 203949 199608 0.405411 0.522149 0.740702 0.852084 97.871527
4138 Get Event changedTouches 412 408 195658 195448 0.507606 0.648649 1.343745 1.961513 99.892670
10053 Get HTMLSpanElement tagName 702 663 193630 81431 0.472738 0.266883 1.569983 0.315915 42.054950
2996 Get HTMLDocument cookie 5578 4147 172695 110544 0.324422 0.336975 7.789128 2.641675 64.011118
7105 Get Event defaultPrevented 714 690 171461 167435 0.374097 0.499074 0.737214 0.976114 97.651944
6714 Get HTMLSpanElement style 548 504 167025 11671 0.517220 0.051895 3.090145 1.050911 6.987577
968 Get HTMLAnchorElement href 2456 2328 145107 135169 0.502295 0.942538 1.435041 1.251417 93.151261
4177 Get Performance timing 1679 1332 143633 127130 0.302955 0.382869 2.240215 0.829030 88.510301
14823 Function Window getComputedStyle 1697 1384 143326 75800 0.299979 0.220471 1.873741 2.419006 52.886427
7305 Get HTMLMetaElement dataset 335 295 142890 130639 0.982850 3.175198 1.974473 0.559279 91.426272
12098 Get PerformanceTiming navigationStart 1421 1228 136193 124766 0.302260 0.397229 1.608191 0.654052 91.609701
14284 Get MutationRecord target 114 109 136183 124456 2.137970 2.359855 2.534340 4.659148 91.388793
1861 Get HTMLAnchorElement addEventListener 1370 1178 131340 29593 0.307599 0.098327 7.717213 0.833458 22.531597
10400 Get Event metaKey 528 518 128645 128019 0.305513 0.402858 1.198076 3.016747 99.513390
4750 Get Event shiftKey 491 481 126416 125790 0.300380 0.396070 0.798485 1.127302 99.504810
15693 Get Window history 1452 1223 123512 37555 0.255723 0.114506 0.727469 0.699028 30.405952
5048 Get Event altKey 556 551 123345 122726 0.288138 0.378780 1.216686 2.985656 99.498156
2211 Get Window addEventListener 7893 6191 122537 31256 0.140030 0.056707 2.405811 0.710843 25.507398
13441 Get IntersectionObserverEntry boundingClientRect 135 118 121778 115749 2.683412 2.996992 0.959645 1.765307 95.049188
15065 Get Event relatedTarget 354 340 120531 119802 0.354351 0.449621 0.857684 1.087789 99.395176
595 Function HTMLAnchorElement isSameNode 30 30 117855 117855 8.029145 12.948267 7.101826 11.841353 100.000000
911 Function HTMLElement getAttribute 902 860 117772 31042 0.206437 0.074248 0.942646 0.938351 26.357708
12704 Function Storage setItem 2530 2151 117494 100433 0.229494 0.285617 1.399357 0.855013 85.479259
4182 Get Event touches 104 104 117488 117488 1.273323 1.823082 4.985049 5.382803 100.000000
3222 Get Event button 237 228 115014 114410 0.373357 0.467717 1.213677 1.483833 99.474847
8221 Get Event pointerType 215 211 113740 113504 0.392012 0.482340 1.327292 1.639816 99.792509
18 Get Event charCode 234 234 111544 110954 0.343765 0.432244 1.156061 1.442831 99.471061
3947 Function HTMLAnchorElement setAttribute 1354 1154 110406 80822 0.276005 0.419018 1.176040 0.787740 73.204355
15997 Get HTMLDivElement dataset 1019 941 110241 58327 0.305932 0.209428 2.886125 1.568396 52.908627
7327 Get HTMLDocument createElementNS 797 672 110043 29138 0.713621 0.383323 1.456620 0.289284 26.478740
15619 Function HTMLDocument querySelector 5256 3878 109894 70588 0.143162 0.159666 5.420111 2.722504 64.232806
14249 Get HTMLElement contains 156 155 109845 6882 0.529747 0.077596 1.327992 0.179596 6.265192
16323 Get CSSStyleDeclaration position 341 315 106935 93720 0.345195 0.415523 0.474251 0.340467 87.642026
4579 Get PerformanceResourceTiming name 459 421 104968 48263 0.501391 0.351048 7.462511 6.226703 45.978774
11476 Function DocumentFragment appendChild 555 540 103875 94235 0.567692 0.809485 2.425293 3.739579 90.719615
17561 Get HTMLDivElement removeEventListener 269 261 102308 93036 0.367874 0.461578 0.433881 0.630124 90.937170
14811 Function HTMLInputElement matches 120 120 101906 101899 0.937924 1.010790 1.194562 1.553083 99.993131
2818 Get IntersectionObserverEntry isIntersecting 835 815 100920 74725 0.674087 0.814893 5.613972 7.372370 74.043797
10958 Get MutationRecord attributeName 89 81 100443 98200 1.816932 1.910443 13.742685 17.799135 97.766893
15809 Function HTMLAnchorElement removeAttribute 373 347 99233 96946 1.153120 1.474503 0.726152 0.792383 97.695323
8141 Get Event pointerId 203 199 98769 98559 0.355878 0.442541 1.309350 1.638151 99.787383
9253 Get Event state 203 199 98769 98559 0.355878 0.442541 1.309350 1.638151 99.787383
15723 Get Event pageX 337 333 95672 95055 0.270339 0.381892 0.936991 1.368982 99.355088
17872 Get Event pageY 337 333 95672 95055 0.270339 0.381892 0.936991 1.368982 99.355088
1914 Function HTMLBodyElement appendChild 2017 1641 95122 87854 0.192521 0.238969 2.019357 1.951309 92.359286
66 Get DOMRect bottom 718 690 94763 38693 0.584749 0.580383 1.550073 2.244249 40.831337
1215 Get HTMLDivElement textContent 815 784 93982 72828 0.622264 0.822394 1.921985 1.384557 77.491435
15126 Get Window Reflect 815 650 93695 985 0.227729 0.003498 1.616386 0.718477 1.051283
6681 Get HTMLDocument getElementsByTagName 4894 3152 91115 76951 0.158297 0.196486 2.076936 0.360218 84.454810
6007 Function HTMLDivElement appendChild 2328 2052 87416 23624 0.157654 0.065366 2.490404 1.678324 27.024801
14338 Get Event clientY 227 223 86343 85736 0.274052 0.346012 1.155634 1.443879 99.296990
7370 Get Event clientX 223 219 86333 85726 0.274112 0.346119 1.174707 1.468122 99.296908
4883 Get HTMLAnchorElement classList 871 842 84332 70097 0.201348 0.278285 2.291487 2.019763 83.120286
16115 Get MouseEvent type 755 755 83349 83349 0.168377 0.221242 1.248696 1.422853 100.000000
6772 Get HTMLDocument hidden 1603 1537 82147 47474 0.200568 0.175928 1.534644 2.676095 57.791520
8011 Function EventTarget dispatchEvent 110 110 81800 63914 0.311552 0.299519 0.360584 0.373508 78.134474
12711 Function HTMLDocument getElementById 5559 3690 81282 42177 0.114240 0.095513 6.735331 2.271519 51.889717
11100 Function HTMLDivElement getBoundingClientRect 1295 1148 81101 61989 0.178199 0.207279 1.526008 1.561731 76.434323
2723 Function Storage getItem 4445 3465 80359 53636 0.111631 0.125812 2.384386 1.275023 66.745480
17488 Get HTMLAnchorElement disabled 42 42 79569 79569 2.102839 2.148920 1.905932 1.917607 100.000000
8714 Construction Event None 641 543 79374 63053 0.198452 0.214734 1.748308 0.696653 79.437851
5616 Function HTMLAnchorElement addEventListener 1346 1174 77789 18943 0.184291 0.063770 6.796528 0.760201 24.351772
611 Get HTMLElement addEventListener 755 743 77573 2900 0.456644 0.026392 1.805401 1.324329 3.738414
17667 Get Performance timeOrigin 638 498 77460 75360 0.467205 0.675298 0.726314 0.243126 97.288923
9166 Get Window requestAnimationFrame 1735 1609 77177 30706 0.153209 0.090460 2.655344 3.087896 39.786465
2338 Get Event screenY 208 204 76925 76328 0.269005 0.342615 1.238773 1.549703 99.223919
9977 Get Event screenX 208 204 76925 76328 0.269005 0.342615 1.238773 1.549703 99.223919
2418 Function HTMLBodyElement removeChild 667 517 76466 75216 0.460771 0.746417 1.423947 0.993174 98.365287
10247 Get HTMLBodyElement constructor 37 37 75365 32587 2.267393 2.680637 2.514466 2.597025 43.238904
14409 Function DOMTokenList add 2140 1859 75038 50171 0.119709 0.129215 2.026993 1.599572 66.860791
1353 Get DOMRect width 898 766 73558 50213 0.191047 0.175964 1.478908 1.038934 68.263139
8987 Get DOMRect right 721 696 72803 18396 0.448942 0.269434 0.867479 1.108576 25.268189
17217 Get Window matchMedia 1611 1379 72404 20964 0.157922 0.062416 2.640280 2.069632 28.954201
9493 Set Event flow 121 121 72147 57173 0.262724 0.259411 0.292955 0.308683 79.245152
3085 Get Event offsetX 155 155 70920 70340 0.249699 0.317938 0.758509 0.929270 99.182177
4424 Get Event offsetY 155 155 70920 70340 0.249699 0.317938 0.758509 0.929270 99.182177
14697 Get HTMLLIElement querySelector 324 316 68679 57442 0.438484 0.410091 1.165981 0.755493 83.638376
4801 Get HTMLAnchorElement textContent 834 818 68587 22234 0.426822 0.236131 1.026544 0.598604 32.417222
568 Get Window removeEventListener 2851 2478 67894 22285 0.106554 0.051886 0.643697 0.698827 32.823224
12336 Get HTMLElement id 396 376 66643 6243 0.211400 0.026531 0.862097 0.147936 9.367826
16137 Get Location search 5601 4460 65008 34421 0.089337 0.081936 2.977207 2.503777 52.948868
10354 Get Location hash 1782 1382 64491 49026 0.136625 0.146484 1.765293 0.619653 76.019910
9312 Get HTMLDocument removeEventListener 1123 1024 64243 25996 0.226914 0.159079 0.461634 0.592738 40.465109
7772 Function Storage removeItem 1630 1325 62618 55509 0.145805 0.179347 1.016465 0.985584 88.647034
12900 Get IntersectionObserverEntry target 515 496 62370 50038 0.595738 0.825574 6.585918 9.794942 80.227674
13380 Get HTMLImageElement src 324 278 61949 55192 0.248963 0.279343 1.549264 0.817876 89.092641
13971 Function CSSStyleDeclaration getPropertyValue 877 850 60107 18237 0.150438 0.063671 0.519474 0.272330 30.340892
12778 Function HTMLDocument getElementsByTagName 4775 3038 60070 46347 0.104988 0.118652 2.101945 0.346168 77.154986
5032 Get MouseEvent clientY 504 504 59862 59862 0.150060 0.198041 0.564462 0.695423 100.000000
15651 Get MouseEvent clientX 502 502 59859 59859 0.150075 0.198069 0.566498 0.697981 100.000000
16479 Get HTMLScriptElement contains 50 50 59428 4725 0.497370 0.195014 0.440591 0.120095 7.950798
5017 Get HTMLDocument createEvent 946 769 58567 20951 0.453316 0.292947 1.299261 0.678529 35.772705
12149 Get Window cancelAnimationFrame 753 709 57477 16721 0.369145 0.239573 0.606008 0.692131 29.091637
18119 Function Window addEventListener 7166 5668 56988 12036 0.087308 0.031169 1.998809 0.587056 21.120236
1619 Function DOMTokenList remove 1455 1297 56450 33813 0.093652 0.088424 1.097433 1.080078 59.899026
17146 Get MouseEvent ctrlKey 568 568 54139 54139 0.134587 0.176825 0.225626 0.324791 100.000000
12805 Get HTMLLIElement classList 432 379 54046 24570 0.364803 0.213858 1.165796 0.336717 45.461274
6363 Get HTMLInputElement value 959 918 53080 43990 0.116852 0.140776 3.371986 3.226430 82.874906
4864 Function CSSStyleDeclaration setProperty 844 733 52840 21599 0.344912 0.211022 3.304432 2.357075 40.876230
13983 Get XMLHttpRequest readyState 1799 1460 52660 41546 0.117972 0.138164 2.416760 1.615187 78.894797
11621 Get HTMLScriptElement querySelectorAll 255 254 52491 7958 0.222868 0.065104 0.889439 0.412583 15.160694
17248 Get AnimationEvent target 156 154 51350 26463 0.946958 0.695481 1.293656 2.294568 51.534567
7970 Get HTMLBodyElement querySelectorAll 899 887 49554 47400 0.104660 0.162933 0.451424 0.528431 95.653227
16722 Get DOMRect height 848 728 48913 33326 0.132691 0.115341 0.943817 0.621586 68.133216
1007 Function HTMLHtmlElement contains 239 227 48804 19490 0.149274 0.077270 0.346510 0.126978 39.935251
1586 Function HTMLElement contains 156 155 48695 5210 0.234840 0.058744 0.499611 0.158083 10.699250
10761 Get HTMLLIElement matches 194 194 48675 38829 0.448194 0.417750 0.121070 0.148470 79.771957
13396 Get MouseEvent timeStamp 743 743 48631 48631 0.109854 0.142058 0.730755 1.041972 100.000000
8125 Get MouseEvent defaultPrevented 717 717 48346 48346 0.113253 0.151701 0.240732 0.434295 100.000000
5205 Function HTMLDivElement removeEventListener 260 252 47554 43494 0.170548 0.215572 0.371203 0.479496 91.462338
6411 Function HTMLAnchorElement getBoundingClientRect 315 308 45499 18954 0.178495 0.126211 0.733821 0.767323 41.658058
9289 Function HTMLLIElement querySelector 324 316 44104 34549 0.281584 0.246653 1.151900 0.741847 78.335298
10020 Set HTMLAnchorElement href 2781 2633 43452 29806 0.078498 0.084109 0.583468 0.409084 68.595232
10485 Function Performance get timing 1518 1263 42977 34680 0.093326 0.108573 2.111314 0.486240 80.694325
8749 Function HTMLBodyElement querySelectorAll 909 897 42696 40672 0.086075 0.131413 0.442494 0.518009 95.259509
10978 Function HTMLDivElement removeAttribute 960 905 42597 28256 0.141140 0.149708 0.946574 0.483958 66.333310
9128 Function Window requestAnimationFrame 1655 1422 42244 28852 0.092087 0.097062 1.884883 3.626479 68.298457
8299 Get CSSStyleDeclaration display 737 706 39290 23448 0.106862 0.085491 0.902827 1.160609 59.679308
14494 Function HTMLDocument addEventListener 5870 5040 38962 10686 0.045599 0.019454 1.434622 0.627379 27.426723
9252 Get HTMLDocument createTextNode 1377 1101 38610 13895 0.081862 0.041612 0.448151 0.128047 35.988086
1971 Get HTMLDivElement insertBefore 605 435 38586 20373 0.186160 0.143721 2.028594 0.351760 52.798943
12639 Get Location pathname 3308 3049 38557 22143 0.073938 0.063340 5.189143 6.232416 57.429261
8396 Get HTMLLIElement style 560 405 37454 20411 0.592363 0.288392 5.040646 1.269598 27.796764
14663 Function HTMLImageElement getBoundingClientRect 471 462 37376 26681 0.304719 0.304806 0.494410 0.521456 71.385381
9416 Get MouseEvent button 651 651 37270 37270 0.081775 0.105410 0.203340 0.320155 100.000000
6941 Get Text querySelectorAll 109 109 37203 9394 1.316163 0.468775 2.026148 0.726001 25.250652
2968 Set Text textContent 42 41 37094 23641 1.142186 0.933173 0.917371 0.731892 63.732679
12725 Get SVGGElement setAttribute 181 94 36915 34399 1.276700 1.692134 0.734407 0.935771 93.184342
17687 Get HTMLHeadingElement textContent 515 491 36591 22860 0.363449 0.418207 1.356123 0.986070 62.474379
1907 Get MouseEvent relatedTarget 551 551 36528 36528 0.090811 0.121204 0.181478 0.322789 100.000000
10573 Get HTMLSpanElement querySelectorAll 294 290 36423 3483 0.141920 0.025951 0.638319 0.257111 9.562639
10180 Get SVGAnimatedString animVal 141 141 35733 0 1.063953 0.000000 1.073527 0.000000 0.000000
14038 Function HTMLLIElement matches 212 212 35613 27566 0.307302 0.276173 0.174845 0.209069 77.404319
9460 Function HTMLScriptElement contains 50 50 34822 2789 0.291435 0.115110 0.290610 0.102494 8.009304
1408 Function IntersectionObserver observe 1207 1148 34729 10292 0.115662 0.055461 1.237426 0.623971 29.635175
8866 Get Event preventDefault 152 146 33383 32313 0.361916 0.420887 1.060576 2.027285 96.794776
6798 Get Event currentTarget 245 228 32970 31656 0.213604 0.257495 0.752055 1.005232 96.014559
231 Get HTMLImageElement addEventListener 593 543 32948 13187 0.164906 0.109288 0.794287 0.199816 40.023674
16132 Get MouseEvent shiftKey 568 568 32948 32948 0.081890 0.107583 0.210033 0.304548 100.000000
3772 Function HTMLElement addEventListener 750 739 32724 2545 0.183205 0.022435 1.564308 1.332341 7.777167
16218 Get DOMRectReadOnly width 171 152 32313 29120 0.920164 1.404839 1.141764 1.228964 90.118528
11062 Get Event composedPath 113 108 32060 31658 0.506599 0.618031 0.903666 1.284913 98.746101
4198 Get HTMLButtonElement addEventListener 1479 1202 31731 4233 0.076390 0.014662 1.316896 0.835040 13.340267
1721 Get HTMLFormElement nodeName 211 209 31631 25358 0.086438 0.085160 0.221656 0.229548 80.168189
3350 Get HTMLInputElement checked 346 334 31629 30890 0.325307 0.469699 0.799651 0.794651 97.663537
9717 Function HTMLDocument createTextNode 1347 1071 31621 10393 0.067451 0.031229 0.436073 0.127475 32.867398
16017 Function HTMLScriptElement querySelectorAll 265 264 31596 3200 0.122425 0.022725 0.449440 0.213413 10.127864
15776 Get Event eventPhase 314 303 31579 29878 0.163133 0.214638 0.534467 0.701922 94.613509
17500 Get HTMLDivElement scrollLeft 179 173 31301 15018 0.176875 0.124601 1.615388 0.642169 47.979298
3282 Get DOMRectReadOnly height 171 151 30922 28292 0.907513 1.425795 0.429273 0.414457 91.494729
719 Get Crypto getRandomValues 1541 1225 30435 3165 0.170737 0.033795 1.865329 0.867292 10.399211
15518 Get HTMLElement style 549 489 30418 25567 0.105885 0.125329 1.493825 0.611387 84.052206
8459 Get HTMLAnchorElement removeEventListener 163 163 30074 26586 0.212811 0.252762 10.830490 33.543656 88.401942
5743 Get Event isTrusted 296 286 29859 28557 0.231935 0.348898 0.525885 0.706710 95.639506
13864 Get HTMLDocument getElementsByClassName 1339 1087 29724 9286 0.081348 0.035449 1.980805 0.892173 31.240748
553 Function HTMLDivElement insertBefore 590 435 29696 15125 0.137121 0.100618 1.936143 0.295648 50.932786
11456 Set CSSStyleDeclaration display 2158 1870 29629 20599 0.057507 0.064276 1.980954 1.334224 69.523102
11506 Get Event composed 198 193 29307 28905 0.412240 0.484833 1.555166 2.268116 98.628314
13010 Set CSSStyleDeclaration height 1003 833 29251 24464 0.069698 0.081981 0.304656 0.070588 83.634748
12834 Get MouseEvent metaKey 568 568 28631 28631 0.071161 0.093487 0.206934 0.300433 100.000000
4164 Function URLSearchParams get 1828 1560 28238 17025 0.108924 0.145465 2.392487 1.587353 60.291097
13685 Get MouseEvent which 422 422 27894 27894 0.078999 0.110417 0.613775 0.847804 100.000000
12006 Get HTMLAnchorElement contains 78 77 27892 6093 0.214739 0.199062 0.406560 0.474514 21.844973
8914 Get MouseEvent altKey 541 541 27876 27876 0.069528 0.091297 0.188469 0.273704 100.000000
15714 Get HTMLScriptElement matches 13 13 27112 8121 0.447334 0.144384 0.204745 0.068769 29.953526
6813 Get HTMLScriptElement querySelector 5 5 27016 8111 0.470368 0.146231 0.487683 0.159742 30.022949
10796 Get HTMLHtmlElement scrollLeft 686 674 26980 24978 0.066215 0.087673 0.352541 0.608070 92.579689
5832 Get HTMLAnchorElement querySelectorAll 182 181 26790 7994 0.182795 0.191212 0.719168 0.296390 29.839492
5335 Get ShadowRoot host 27 23 26450 18481 3.353093 3.789404 2.990668 2.523576 69.871456
8265 Get MouseEvent screenX 432 431 26156 26155 0.074340 0.100463 0.187035 0.281187 99.996177
2495 Get MouseEvent screenY 430 430 26154 26154 0.074335 0.100459 0.182993 0.279588 100.000000
11802 Function HTMLDocument createElementNS 870 748 25562 7929 0.173085 0.106273 1.088253 0.170741 31.018700
1271 Function HTMLImageElement addEventListener 585 539 25288 11160 0.128254 0.094256 0.826761 0.479900 44.131604
8172 Set HTMLDocument cookie 2810 2376 25002 15393 0.057419 0.058849 5.065215 0.855618 61.567075
1117 Get HTMLLIElement querySelectorAll 428 283 24763 6135 0.117284 0.058253 0.788971 0.077840 24.774866
10647 Get PerformanceResourceTiming responseEnd 271 247 23874 7791 0.312687 0.302471 1.978660 1.709044 32.633828
4216 Get MessageEvent data 908 677 23826 9579 0.240720 0.147865 5.398752 5.537861 40.203979
1279 Function HTMLAnchorElement contains 78 77 23764 5635 0.182958 0.184099 0.389154 0.470281 23.712338
2717 Get MouseEvent pointerType 210 210 23241 23241 0.082687 0.102711 0.464080 0.477471 100.000000
14151 Get HTMLDocument scrollingElement 594 582 23040 21476 0.072746 0.086050 0.213598 0.306672 93.211806
6381 Get MouseEvent offsetX 199 199 22993 22993 0.081879 0.104333 0.313116 0.427567 100.000000
10351 Get MouseEvent offsetY 199 199 22993 22993 0.081879 0.104333 0.313116 0.427567 100.000000
12434 Get HTMLTextAreaElement value 142 138 22946 22587 0.084866 0.104089 1.192374 1.197698 98.435457
16255 Function HTMLBodyElement contains 370 207 22532 16616 0.078523 0.077264 0.180246 0.080905 73.744009
1345 Get MouseEvent keyCode 238 238 22474 22474 0.080439 0.102588 0.282488 0.310556 100.000000
13718 Get MouseEvent charCode 193 193 22411 22411 0.080467 0.102558 0.311932 0.327368 100.000000
699 Get HTMLElement textContent 404 394 22233 3757 0.205046 0.063767 0.561368 0.158516 16.898304
1513 Get Window origin 1289 1101 22169 14509 0.202095 0.240889 1.286647 1.650455 65.447246
7132 Get HTMLStyleElement querySelectorAll 72 71 22038 1311 0.120812 0.015140 0.152180 0.047534 5.948816
7770 Get MouseEvent changedTouches 195 195 21964 21964 0.084084 0.106469 0.323369 0.343555 100.000000
7877 Get DOMRectReadOnly top 113 103 21890 20279 0.536787 0.543799 0.366536 0.480785 92.640475
6873 Get MouseEvent state 182 182 21823 21823 0.085844 0.109151 0.329381 0.344476 100.000000
15423 Get MouseEvent key 182 182 21823 21823 0.085844 0.109151 0.329381 0.344476 100.000000
16195 Get MouseEvent pointerId 182 182 21823 21823 0.085844 0.109151 0.329381 0.344476 100.000000
13022 Get Window frames 1252 1002 21647 5991 0.324448 0.207435 1.437239 0.732157 27.675890
6517 Get HTMLElement querySelectorAll 838 778 21615 8622 0.042891 0.026701 0.216786 0.176318 39.888966
4622 Get CustomEvent detail 525 494 21581 15168 0.351925 0.305029 1.007954 1.372400 70.284046
8544 Get HTMLButtonElement removeAttribute 415 345 21581 17502 0.153523 0.185642 0.739158 0.096874 81.099115
9282 Function HTMLButtonElement addEventListener 1475 1215 21572 3449 0.048451 0.011562 1.259833 0.811528 15.988318
10098 Get Event bubbles 311 300 21248 19547 0.155065 0.231605 0.484210 0.652921 91.994541
9949 Get Event cancelable 304 293 21215 19514 0.155911 0.233246 0.495045 0.667662 91.982088
258 Set CSSStyleDeclaration position 1192 969 21109 2744 0.052430 0.009619 0.507048 0.181102 12.999195
16455 Get Window scrollY 748 732 21054 20005 0.074004 0.097788 2.101091 4.300151 95.017574
3841 Function HTMLAnchorElement matches 428 421 20814 16177 0.090471 0.146241 0.495322 0.483386 77.721726
382 Get DOMRectReadOnly left 20 10 20755 19340 1.656417 1.796056 1.046280 1.309103 93.182366
14538 Function HTMLAnchorElement querySelectorAll 192 191 20666 5916 0.122200 0.097969 0.640198 0.272638 28.626730
1555 Get HTMLLIElement addEventListener 394 377 20635 1059 0.206049 0.015210 0.370146 0.058123 5.132057
15370 Function SVGPathElement setAttribute 472 368 20615 7684 0.211360 0.153777 1.324606 0.119985 37.273830
17625 Get DOMRectReadOnly bottom 10 9 20314 19277 1.565912 1.689944 1.043303 1.630184 94.895146
5853 Get DOMRectReadOnly right 8 7 20299 19263 1.694730 1.794396 1.301227 2.033126 94.896300
12959 Set CSSStyleDeclaration left 565 532 20098 2004 0.078073 0.011633 0.940909 0.591199 9.971141
4017 Function HTMLSpanElement querySelectorAll 304 300 20086 1848 0.071940 0.012095 0.336692 0.147938 9.200438
13605 Function HTMLDivElement removeChild 937 886 19958 6750 0.065481 0.035759 0.432451 0.263485 33.821024
4899 Get CSSStyleRule selectorText 98 95 19820 117 0.075323 0.000551 0.413574 0.001181 0.590313
14550 Set CSSStyleDeclaration width 1292 1097 19640 8521 0.065403 0.042337 1.446305 0.382616 43.385947
1409 Get HTMLSpanElement matches 119 118 19621 13586 0.095171 0.135756 0.060798 0.057475 69.242139
8499 Get SVGGElement appendChild 185 98 19413 392 0.550612 0.014762 2.657822 0.131523 2.019265
6959 Function HTMLElement querySelectorAll 844 784 19393 6851 0.036904 0.020121 0.208905 0.170334 35.327180
8445 Function HTMLLIElement querySelectorAll 429 284 19386 4685 0.090873 0.043728 0.770231 0.074259 24.166925
10151 Get HTMLSpanElement querySelector 195 150 19255 13171 0.165161 0.129837 1.334592 0.095666 68.403012
4938 Get PerformanceResourceTiming transferSize 251 243 19084 7794 0.217156 0.208775 1.392484 0.983322 40.840495
10597 Get HTMLAnchorElement appendChild 439 317 18979 6710 0.149446 0.082790 0.773147 0.377680 35.354866
13911 Function CSSRuleList item 5 5 18806 18806 1.929088 1.929088 1.933220 1.933220 100.000000
2730 Get MediaQueryList matches 1294 1115 18660 6449 0.108481 0.056959 2.218405 1.270101 34.560557
1674 Get DocumentFragment appendChild 565 548 18600 8016 0.093284 0.063870 0.311641 0.220411 43.096774
2046 Get HTMLTextAreaElement style 98 98 18491 18281 0.078172 0.098089 1.132860 1.119545 98.864312
2755 Get XMLHttpRequest responseText 1804 1432 18385 15141 0.042540 0.051085 0.746252 0.516296 82.355181
16467 Get PerformanceResourceTiming duration 263 238 18315 8300 0.179272 0.172481 0.596822 0.566587 45.318045
6253 Get SVGPathElement attributes 198 197 18189 5025 0.211708 0.111967 0.309853 0.057502 27.626587
10388 Get HTMLLIElement appendChild 171 164 18164 5593 0.272864 0.173562 0.932245 0.341723 30.791676
1263 Get HTMLParagraphElement textContent 525 510 18148 8969 0.170347 0.173469 0.685390 0.522630 49.421424
13664 Get XMLHttpRequest status 1784 1454 18146 14144 0.037343 0.044292 1.521811 1.169566 77.945553
13766 Function Storage key 140 133 18102 15314 0.069092 0.073161 1.802729 0.917468 84.598387
Analysis of popular API calls
The API call data is in data/api_calls.csv and analyzed in data/src/data/api_calls.py.
Columns
- api_type: The type of API call: Get, Set, Function, or Construction.
- this: The object the API is called on; may be empty for static functions.
- attr: The attribute or method being called; may be empty.
- appear: How many scripts the API call appears in.
- appear_interact: How many scripts the API call appears in after interaction started.
- total: How many times the API call is made.
- interact: How many times the API call is made after interaction started.
- %total/total: The percentage out of all API calls in the scripts the API appears in.
- %interact/interact: The percentage out of all API calls after interaction in the scripts the API appears in.
- avg%total/script: The average percentage per script.
- avg%interact/script: The average percentage per script after interaction.
- %interact/total: The percentage of calls after interaction out of all calls.
In [2]: df.describe()
Out[2]:
appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
count 18120.000000 18120.000000 1.812000e+04 1.812000e+04 18120.000000 15662.000000 18120.000000 18120.000000 18120.000000
mean 83.331733 71.206733 6.331973e+03 3.842462e+03 0.927087 0.530690 1.367818 0.515940 39.876583
std 348.429012 279.559604 8.239731e+04 5.046730e+04 6.230344 3.780852 6.853030 2.822503 40.950548
min 1.000000 0.000000 1.000000e+00 0.000000e+00 0.000087 0.000000 0.000088 0.000000 0.000000
25% 4.000000 2.000000 5.000000e+00 0.000000e+00 0.007226 0.000000 0.016281 0.000000 0.000000
50% 7.000000 5.000000 2.300000e+01 5.000000e+00 0.036847 0.008031 0.095799 0.005917 23.913043
75% 35.000000 30.000000 2.650000e+02 8.225000e+01 0.178863 0.092942 0.512494 0.167178 83.026494
max 9957.000000 7181.000000 6.542023e+06 3.267748e+06 100.000000 100.000000 100.000000 100.000000 100.000000
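The column definitions above can be sketched in pandas. This is only an illustration under stated assumptions, not the code in data/src/data/api_calls.py: the per-script input table `calls`, its column names, and the toy numbers are hypothetical, and appear_interact is assumed to count scripts with at least one call after interaction started.

```python
import pandas as pd

# Hypothetical per-script call counts; the real input format is not shown here.
calls = pd.DataFrame(
    {
        "script": ["a.js", "a.js", "b.js", "b.js", "c.js"],
        "api": ["createElement", "location", "createElement", "cookie", "location"],
        "total": [6, 4, 2, 8, 10],
        "interact": [3, 1, 0, 4, 5],
    }
)

# All API calls made by each script, repeated on every row of that script.
calls["script_total"] = calls.groupby("script")["total"].transform("sum")
# Share of the script's calls that go to this API.
calls["pct_in_script"] = 100 * calls["total"] / calls["script_total"]
# Assumption: an API "appears after interaction" if it is called at least once then.
calls["seen_interact"] = calls["interact"] > 0

summary = calls.groupby("api").agg(
    appear=("script", "nunique"),
    appear_interact=("seen_interact", "sum"),
    total=("total", "sum"),
    interact=("interact", "sum"),
    scripts_total=("script_total", "sum"),
    avg_pct_total_per_script=("pct_in_script", "mean"),
)
# Percentage out of all API calls in the scripts the API appears in.
summary["%total/total"] = 100 * summary["total"] / summary["scripts_total"]
# Percentage of calls after interaction out of all calls.
summary["%interact/total"] = 100 * summary["interact"] / summary["total"]

print(summary)
```

The same relation can be checked against any row of the real tables: for Get HTMLDocument createElement, 160286 / 386183 gives the listed 41.505193% for %interact/total.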
Call distribution
[...] total and interact columns.
Top 20 API calls overall
[...] HTMLDocument or HTMLDivElement and a few functions. The results are similar for calls made after interaction began.
In [3]: df.nlargest(20, "appear")
Out[3]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
12035 Get HTMLDocument createElement 9957 6827 386183 160286 0.383050 0.272003 2.670769 0.975292 41.505193
13516 Get Window location 9423 7181 579071 319716 0.664399 0.615618 6.606676 4.230041 55.211882
2014 Function HTMLDocument createElement 9320 6330 286107 122524 0.296257 0.219127 2.612151 0.967649 42.824538
2211 Get Window addEventListener 7893 6191 122537 31256 0.140030 0.056707 2.405811 0.710843 25.507398
18119 Function Window addEventListener 7166 5668 56988 12036 0.087308 0.031169 1.998809 0.587056 21.120236
241 Get Navigator userAgent 6996 5088 142782 58271 0.173377 0.112728 5.145219 2.053937 40.811167
5125 Get Window document 6890 5156 280870 131302 0.349732 0.245301 2.341958 1.213727 46.748318
9201 Function Window setTimeout 6789 5622 774630 688808 0.847538 1.324614 3.571197 3.203018 88.920904
1990 Get HTMLDocument body 6479 5167 364548 234641 0.378641 0.399859 2.522424 1.552503 64.364912
2370 Get HTMLDocument addEventListener 6238 5288 106708 34235 0.119114 0.059167 1.832010 0.636335 32.082880
15595 Get Window navigator 6201 4558 268025 98575 0.384988 0.216489 4.785670 1.907738 36.778286
2129 Get HTMLDocument documentElement 6129 4566 348638 220531 0.365681 0.365297 3.225033 1.109748 63.255009
14494 Function HTMLDocument addEventListener 5870 5040 38962 10686 0.045599 0.019454 1.434622 0.627379 27.426723
7915 Get Location href 5824 4561 210402 135381 0.291802 0.297886 2.584136 2.093848 64.343970
16137 Get Location search 5601 4460 65008 34421 0.089337 0.081936 2.977207 2.503777 52.948868
14177 Get HTMLDocument getElementById 5600 3731 145667 67289 0.202675 0.151724 6.768832 2.289762 46.193716
2996 Get HTMLDocument cookie 5578 4147 172695 110544 0.324422 0.336975 7.789128 2.641675 64.011118
12711 Function HTMLDocument getElementById 5559 3690 81282 42177 0.114240 0.095513 6.735331 2.271519 51.889717
13966 Get HTMLDocument querySelector 5470 4069 214306 135242 0.261517 0.286668 5.343761 2.736037 63.106959
15619 Function HTMLDocument querySelector 5256 3878 109894 70588 0.143162 0.159666 5.420111 2.722504 64.232806
In [4]: df.nlargest(20, "appear_interact")
Out[4]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
13516 Get Window location 9423 7181 579071 319716 0.664399 0.615618 6.606676 4.230041 55.211882
12035 Get HTMLDocument createElement 9957 6827 386183 160286 0.383050 0.272003 2.670769 0.975292 41.505193
2014 Function HTMLDocument createElement 9320 6330 286107 122524 0.296257 0.219127 2.612151 0.967649 42.824538
2211 Get Window addEventListener 7893 6191 122537 31256 0.140030 0.056707 2.405811 0.710843 25.507398
18119 Function Window addEventListener 7166 5668 56988 12036 0.087308 0.031169 1.998809 0.587056 21.120236
9201 Function Window setTimeout 6789 5622 774630 688808 0.847538 1.324614 3.571197 3.203018 88.920904
2370 Get HTMLDocument addEventListener 6238 5288 106708 34235 0.119114 0.059167 1.832010 0.636335 32.082880
1990 Get HTMLDocument body 6479 5167 364548 234641 0.378641 0.399859 2.522424 1.552503 64.364912
5125 Get Window document 6890 5156 280870 131302 0.349732 0.245301 2.341958 1.213727 46.748318
241 Get Navigator userAgent 6996 5088 142782 58271 0.173377 0.112728 5.145219 2.053937 40.811167
14494 Function HTMLDocument addEventListener 5870 5040 38962 10686 0.045599 0.019454 1.434622 0.627379 27.426723
2129 Get HTMLDocument documentElement 6129 4566 348638 220531 0.365681 0.365297 3.225033 1.109748 63.255009
7915 Get Location href 5824 4561 210402 135381 0.291802 0.297886 2.584136 2.093848 64.343970
15595 Get Window navigator 6201 4558 268025 98575 0.384988 0.216489 4.785670 1.907738 36.778286
16137 Get Location search 5601 4460 65008 34421 0.089337 0.081936 2.977207 2.503777 52.948868
8169 Get Location hostname 4968 4176 136629 68568 0.429844 0.370961 4.515706 3.713880 50.185539
2996 Get HTMLDocument cookie 5578 4147 172695 110544 0.324422 0.336975 7.789128 2.641675 64.011118
13966 Get HTMLDocument querySelector 5470 4069 214306 135242 0.261517 0.286668 5.343761 2.736037 63.106959
7130 Get NodeList length 4664 4065 2382138 1072590 2.861152 1.943079 3.927897 1.756873 45.026359
15619 Function HTMLDocument querySelector 5256 3878 109894 70588 0.143162 0.159666 5.420111 2.722504 64.232806
In [5]: df.nlargest(20, "total")
Out[5]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
10159 Get HTMLDivElement parentNode 2965 2640 6542023 3267748 7.269719 6.034085 3.250519 3.723290 49.950115
10200 Get HTMLDivElement nodeType 2014 1800 4208131 3091512 5.603724 5.771592 3.311296 2.459127 73.465203
12998 Get HTMLBodyElement parentNode 1665 1501 3365475 1217366 4.883114 2.911233 1.474360 1.232955 36.172190
7130 Get NodeList length 4664 4065 2382138 1072590 2.861152 1.943079 3.927897 1.756873 45.026359
12987 Get HTMLDivElement getAttribute 2037 1865 2048964 771237 2.767731 1.546469 2.962922 2.441469 37.640339
17434 Get HTMLDivElement nodeName 1443 1384 1931993 1450429 2.994059 3.124005 4.088068 4.737432 75.074237
1340 Get HTMLDivElement hasAttribute 858 830 1766442 1005093 3.688406 2.609303 2.078833 1.705360 56.899292
14607 Get MutationRecord addedNodes 432 423 1509702 775421 5.358348 4.852424 7.431751 9.234233 51.362521
16368 Get HTMLDivElement contains 542 535 1143362 1053742 2.645657 3.983127 2.107908 6.223987 92.161713
14386 Get MutationRecord type 339 331 1130505 470080 4.330352 3.197764 2.887015 3.495311 41.581417
1488 Get HTMLDivElement parentElement 1652 1584 1119639 646139 1.859494 1.433435 3.000702 2.827354 57.709583
6767 Function HTMLDivElement getAttribute 2039 1867 1117674 508505 1.509619 1.019614 2.401499 2.032517 45.496719
16281 Function HTMLDivElement matches 843 835 1115182 1064906 4.807023 6.144259 2.282807 2.758132 95.491678
7543 Get Performance now 3207 2534 1071387 900360 1.621756 2.036715 4.229508 3.661958 84.036861
17164 Get HTMLDivElement tagName 2146 2058 1063408 740653 1.584709 1.610988 2.923359 2.710166 69.648996
9359 Get HTMLDivElement matches 715 707 1061215 1007261 5.876682 7.905329 2.152317 2.651707 94.915828
4046 Get CustomElementRegistry get 137 89 1051466 1011774 11.754951 12.657662 4.568745 1.805876 96.225080
9514 Get HTMLDivElement querySelector 1871 1647 1030328 988630 1.721028 2.543417 2.069165 1.374590 95.952939
16355 Get HTMLBodyElement nodeType 1094 926 969560 909957 1.680494 2.101329 0.870327 0.668667 93.852572
2513 Get MutationRecord removedNodes 108 107 947656 302503 5.658883 4.521003 5.596039 5.110085 31.921182
In [6]: df.nlargest(20, "interact")
Out[6]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
10159 Get HTMLDivElement parentNode 2965 2640 6542023 3267748 7.269719 6.034085 3.250519 3.723290 49.950115
10200 Get HTMLDivElement nodeType 2014 1800 4208131 3091512 5.603724 5.771592 3.311296 2.459127 73.465203
17434 Get HTMLDivElement nodeName 1443 1384 1931993 1450429 2.994059 3.124005 4.088068 4.737432 75.074237
12998 Get HTMLBodyElement parentNode 1665 1501 3365475 1217366 4.883114 2.911233 1.474360 1.232955 36.172190
7130 Get NodeList length 4664 4065 2382138 1072590 2.861152 1.943079 3.927897 1.756873 45.026359
16281 Function HTMLDivElement matches 843 835 1115182 1064906 4.807023 6.144259 2.282807 2.758132 95.491678
16368 Get HTMLDivElement contains 542 535 1143362 1053742 2.645657 3.983127 2.107908 6.223987 92.161713
4046 Get CustomElementRegistry get 137 89 1051466 1011774 11.754951 12.657662 4.568745 1.805876 96.225080
9359 Get HTMLDivElement matches 715 707 1061215 1007261 5.876682 7.905329 2.152317 2.651707 94.915828
1340 Get HTMLDivElement hasAttribute 858 830 1766442 1005093 3.688406 2.609303 2.078833 1.705360 56.899292
9514 Get HTMLDivElement querySelector 1871 1647 1030328 988630 1.721028 2.543417 2.069165 1.374590 95.952939
16355 Get HTMLBodyElement nodeType 1094 926 969560 909957 1.680494 2.101329 0.870327 0.668667 93.852572
7543 Get Performance now 3207 2534 1071387 900360 1.621756 2.036715 4.229508 3.661958 84.036861
3621 Get TransitionEvent target 326 326 884580 883460 3.460662 5.166294 5.826075 6.594800 99.873386
17442 Get HTMLBodyElement nodeName 1117 1092 905159 878087 1.588063 2.115985 0.662423 0.761367 97.009144
12774 Get HTMLHtmlElement nodeName 777 754 896397 854629 1.931199 2.493193 0.726602 0.848152 95.340457
9483 Function HTMLDivElement querySelector 1871 1647 847881 808260 1.416274 2.079385 2.040353 1.344933 95.327057
3378 Function CustomElementRegistry get 101 71 828723 806370 12.442614 13.151448 5.779473 2.220260 97.302718
14414 Get Event target 2002 1903 824110 783286 1.179062 1.482671 4.650736 5.382586 95.046292
14607 Get MutationRecord addedNodes 432 423 1509702 775421 5.358348 4.852424 7.431751 9.234233 51.362521
From the %interact/total of some of the most popular APIs (e.g., 43% for createElement), we can already see that a large number of DOM manipulations happen before any interaction.
The top entries by %total/total, %interact/interact, avg%total/script and avg%interact/script are mostly user-defined junk. This only shows that the custom attributes are highly popular within the scripts that use them.
Top 20 API function calls
In [7]: df[df["api_type"] == "Function"].nlargest(20, "appear") # type: ignore[reportArgumentType]
Out[7]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
2014 Function HTMLDocument createElement 9320 6330 286107 122524 0.296257 0.219127 2.612151 0.967649 42.824538
18119 Function Window addEventListener 7166 5668 56988 12036 0.087308 0.031169 1.998809 0.587056 21.120236
9201 Function Window setTimeout 6789 5622 774630 688808 0.847538 1.324614 3.571197 3.203018 88.920904
14494 Function HTMLDocument addEventListener 5870 5040 38962 10686 0.045599 0.019454 1.434622 0.627379 27.426723
12711 Function HTMLDocument getElementById 5559 3690 81282 42177 0.114240 0.095513 6.735331 2.271519 51.889717
15619 Function HTMLDocument querySelector 5256 3878 109894 70588 0.143162 0.159666 5.420111 2.722504 64.232806
12778 Function HTMLDocument getElementsByTagName 4775 3038 60070 46347 0.104988 0.118652 2.101945 0.346168 77.154986
5732 Function HTMLDocument querySelectorAll 4647 3765 223991 187374 0.332157 0.412884 1.507670 0.883315 83.652468
2723 Function Storage getItem 4445 3465 80359 53636 0.111631 0.125812 2.384386 1.275023 66.745480
1029 Function Window clearTimeout 4064 3570 122973 97912 0.181246 0.218990 1.177498 2.771284 79.620730
16594 Function Performance now 2764 2340 624855 498159 0.960439 1.144829 3.490532 3.096365 79.723936
8423 Function Window removeEventListener 2715 2396 12394 7457 0.020448 0.018160 0.505042 0.459469 60.166209
16575 Function Navigator get userAgentData 2575 2003 8612 3716 0.019608 0.011080 0.995214 0.463714 43.149094
17624 Function HTMLHeadElement appendChild 2559 1888 17051 7303 0.040659 0.022542 1.667455 0.622105 42.830333
16461 Function HTMLHtmlElement getAttribute 2550 2173 54549 13001 0.088869 0.032554 0.661737 0.118803 23.833617
12704 Function Storage setItem 2530 2151 117494 100433 0.229494 0.285617 1.399357 0.855013 85.479259
7048 Function HTMLDivElement addEventListener 2514 2184 252068 107180 0.391858 0.234839 2.477123 1.046851 42.520272
6007 Function HTMLDivElement appendChild 2328 2052 87416 23624 0.157654 0.065366 2.490404 1.678324 27.024801
16209 Function HTMLDivElement querySelectorAll 2294 2227 574049 442116 0.767773 0.923417 1.696780 1.783361 77.017119
14409 Function DOMTokenList add 2140 1859 75038 50171 0.119709 0.129215 2.026993 1.599572 66.860791
In [8]: df[df["api_type"] == "Function"].nlargest(20, "appear_interact") # type: ignore[reportArgumentType]
Out[8]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
2014 Function HTMLDocument createElement 9320 6330 286107 122524 0.296257 0.219127 2.612151 0.967649 42.824538
18119 Function Window addEventListener 7166 5668 56988 12036 0.087308 0.031169 1.998809 0.587056 21.120236
9201 Function Window setTimeout 6789 5622 774630 688808 0.847538 1.324614 3.571197 3.203018 88.920904
14494 Function HTMLDocument addEventListener 5870 5040 38962 10686 0.045599 0.019454 1.434622 0.627379 27.426723
15619 Function HTMLDocument querySelector 5256 3878 109894 70588 0.143162 0.159666 5.420111 2.722504 64.232806
5732 Function HTMLDocument querySelectorAll 4647 3765 223991 187374 0.332157 0.412884 1.507670 0.883315 83.652468
12711 Function HTMLDocument getElementById 5559 3690 81282 42177 0.114240 0.095513 6.735331 2.271519 51.889717
1029 Function Window clearTimeout 4064 3570 122973 97912 0.181246 0.218990 1.177498 2.771284 79.620730
2723 Function Storage getItem 4445 3465 80359 53636 0.111631 0.125812 2.384386 1.275023 66.745480
12778 Function HTMLDocument getElementsByTagName 4775 3038 60070 46347 0.104988 0.118652 2.101945 0.346168 77.154986
8423 Function Window removeEventListener 2715 2396 12394 7457 0.020448 0.018160 0.505042 0.459469 60.166209
16594 Function Performance now 2764 2340 624855 498159 0.960439 1.144829 3.490532 3.096365 79.723936
16209 Function HTMLDivElement querySelectorAll 2294 2227 574049 442116 0.767773 0.923417 1.696780 1.783361 77.017119
7048 Function HTMLDivElement addEventListener 2514 2184 252068 107180 0.391858 0.234839 2.477123 1.046851 42.520272
16461 Function HTMLHtmlElement getAttribute 2550 2173 54549 13001 0.088869 0.032554 0.661737 0.118803 23.833617
12704 Function Storage setItem 2530 2151 117494 100433 0.229494 0.285617 1.399357 0.855013 85.479259
6007 Function HTMLDivElement appendChild 2328 2052 87416 23624 0.157654 0.065366 2.490404 1.678324 27.024801
16575 Function Navigator get userAgentData 2575 2003 8612 3716 0.019608 0.011080 0.995214 0.463714 43.149094
17624 Function HTMLHeadElement appendChild 2559 1888 17051 7303 0.040659 0.022542 1.667455 0.622105 42.830333
6767 Function HTMLDivElement getAttribute 2039 1867 1117674 508505 1.509619 1.019614 2.401499 2.032517 45.496719
In [9]: df[df["api_type"] == "Function"].nlargest(20, "total") # type: ignore[reportArgumentType]
Out[9]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
6767 Function HTMLDivElement getAttribute 2039 1867 1117674 508505 1.509619 1.019614 2.401499 2.032517 45.496719
16281 Function HTMLDivElement matches 843 835 1115182 1064906 4.807023 6.144259 2.282807 2.758132 95.491678
9483 Function HTMLDivElement querySelector 1871 1647 847881 808260 1.416274 2.079385 2.040353 1.344933 95.327057
3378 Function CustomElementRegistry get 101 71 828723 806370 12.442614 13.151448 5.779473 2.220260 97.302718
9201 Function Window setTimeout 6789 5622 774630 688808 0.847538 1.324614 3.571197 3.203018 88.920904
16594 Function Performance now 2764 2340 624855 498159 0.960439 1.144829 3.490532 3.096365 79.723936
2831 Function HTMLDivElement hasAttribute 858 830 596653 412168 1.246269 1.070484 2.153890 2.032827 69.080018
16209 Function HTMLDivElement querySelectorAll 2294 2227 574049 442116 0.767773 0.923417 1.696780 1.783361 77.017119
11333 Function HTMLDivElement closest 1111 1109 415799 62022 1.477813 0.420875 2.660322 3.038853 14.916342
15923 Function HTMLAnchorElement getAttribute 1630 1553 353941 184154 0.464089 0.381401 1.700309 2.157078 52.029576
4078 Function HTMLDivElement contains 542 535 303535 254118 0.702358 0.960562 1.509068 3.241954 83.719505
2014 Function HTMLDocument createElement 9320 6330 286107 122524 0.296257 0.219127 2.612151 0.967649 42.824538
7048 Function HTMLDivElement addEventListener 2514 2184 252068 107180 0.391858 0.234839 2.477123 1.046851 42.520272
5732 Function HTMLDocument querySelectorAll 4647 3765 223991 187374 0.332157 0.412884 1.507670 0.883315 83.652468
11541 Function DOMTokenList contains 1919 1724 210971 141308 0.421512 0.391151 2.304020 2.127485 66.979822
2490 Function Text contains 51 51 187765 5018 1.569764 0.206079 1.265214 0.071572 2.672490
16718 Function HTMLScriptElement getAttribute 2065 1544 151950 50237 0.378190 0.168485 6.170167 2.245830 33.061533
12692 Function TreeWalker nextNode 139 124 144231 71645 1.507849 0.865876 4.913585 3.978987 49.673787
14823 Function Window getComputedStyle 1697 1384 143326 75800 0.299979 0.220471 1.873741 2.419006 52.886427
4797 Function HTMLDivElement setAttribute 2075 1803 124080 65231 0.200152 0.153838 0.799871 0.543422 52.571728
In [10]: df[df["api_type"] == "Function"].nlargest(20, "interact") # type: ignore[reportArgumentType]
Out[10]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
16281 Function HTMLDivElement matches 843 835 1115182 1064906 4.807023 6.144259 2.282807 2.758132 95.491678
9483 Function HTMLDivElement querySelector 1871 1647 847881 808260 1.416274 2.079385 2.040353 1.344933 95.327057
3378 Function CustomElementRegistry get 101 71 828723 806370 12.442614 13.151448 5.779473 2.220260 97.302718
9201 Function Window setTimeout 6789 5622 774630 688808 0.847538 1.324614 3.571197 3.203018 88.920904
6767 Function HTMLDivElement getAttribute 2039 1867 1117674 508505 1.509619 1.019614 2.401499 2.032517 45.496719
16594 Function Performance now 2764 2340 624855 498159 0.960439 1.144829 3.490532 3.096365 79.723936
16209 Function HTMLDivElement querySelectorAll 2294 2227 574049 442116 0.767773 0.923417 1.696780 1.783361 77.017119
2831 Function HTMLDivElement hasAttribute 858 830 596653 412168 1.246269 1.070484 2.153890 2.032827 69.080018
4078 Function HTMLDivElement contains 542 535 303535 254118 0.702358 0.960562 1.509068 3.241954 83.719505
5732 Function HTMLDocument querySelectorAll 4647 3765 223991 187374 0.332157 0.412884 1.507670 0.883315 83.652468
15923 Function HTMLAnchorElement getAttribute 1630 1553 353941 184154 0.464089 0.381401 1.700309 2.157078 52.029576
11541 Function DOMTokenList contains 1919 1724 210971 141308 0.421512 0.391151 2.304020 2.127485 66.979822
2014 Function HTMLDocument createElement 9320 6330 286107 122524 0.296257 0.219127 2.612151 0.967649 42.824538
595 Function HTMLAnchorElement isSameNode 30 30 117855 117855 8.029145 12.948267 7.101826 11.841353 100.000000
7048 Function HTMLDivElement addEventListener 2514 2184 252068 107180 0.391858 0.234839 2.477123 1.046851 42.520272
14811 Function HTMLInputElement matches 120 120 101906 101899 0.937924 1.010790 1.194562 1.553083 99.993131
12704 Function Storage setItem 2530 2151 117494 100433 0.229494 0.285617 1.399357 0.855013 85.479259
1029 Function Window clearTimeout 4064 3570 122973 97912 0.181246 0.218990 1.177498 2.771284 79.620730
15809 Function HTMLAnchorElement removeAttribute 373 347 99233 96946 1.153120 1.474503 0.726152 0.792383 97.695323
11476 Function DocumentFragment appendChild 555 540 103875 94235 0.567692 0.809485 2.425293 3.739579 90.719615
The top 20 function calls by the other 4 metrics are also less intuitive. They mostly come from internals, workers, canvas, etc., and they rarely appear.
In [12]: df[df["api_type"] == "Function"].nlargest(20, "%total/total") # type: ignore[reportArgumentType]
Out[12]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
4048 Function Object query 7 0 7 0 100.000000 NaN 100.000000 0.000000 0.000000
5873 Function DedicatedWorkerGlobalScope importScripts 15 0 15 0 100.000000 NaN 100.000000 0.000000 0.000000
10648 Function KeyboardLayoutMap Iterator next 1 0 50 0 94.339623 NaN 94.339623 0.000000 0.000000
8245 Function Navigator joinAdInterestGroup 22 0 80 0 25.559105 NaN 33.441558 0.000000 0.000000
978 Function HTMLDocument evaluate 5 2 790 0 17.665474 0.000000 17.901950 0.000000 0.000000
14861 Function FontFaceSet add 18 4 18 4 16.666667 16.666667 16.666667 3.703704 22.222222
15873 Function FontFace load 18 4 18 4 16.666667 16.666667 16.666667 3.703704 22.222222
16387 Function HTMLLinkElement insertAdjacentHTML 29 29 29 29 16.666667 16.666667 16.666667 16.666667 100.000000
2636 Function HTMLDivElement isEqualNode 4 4 350 245 14.583333 13.424658 10.373686 10.092331 70.000000
9888 Function HTMLBodyElement get dataset 11 0 11 0 14.285714 NaN 14.285714 0.000000 0.000000
11149 Function Window close 7 0 7 0 14.285714 NaN 14.285714 0.000000 0.000000
3378 Function CustomElementRegistry get 101 71 828723 806370 12.442614 13.151448 5.779473 2.220260 97.302718
6954 Function Window find 1 1 290 266 11.435331 10.901639 11.435331 10.901639 91.724138
7442 Function DedicatedWorkerGlobalScope btoa 5 0 5 0 10.000000 NaN 10.000000 0.000000 0.000000
13750 Function DOMException get name 4 4 5 5 9.433962 9.433962 9.318182 9.318182 100.000000
8856 Function DedicatedWorkerGlobalScope postMessage 53 0 998 0 9.239030 NaN 14.201290 0.000000 0.000000
595 Function HTMLAnchorElement isSameNode 30 30 117855 117855 8.029145 12.948267 7.101826 11.841353 100.000000
3325 Function CanvasRenderingContext2D save 5 5 4700 4252 7.365501 7.344330 8.110019 8.297614 90.468085
4235 Function CanvasRenderingContext2D restore 5 5 4700 4252 7.365501 7.344330 8.110019 8.297614 90.468085
2952 Function HTMLBodyElement getClientRects 12 0 12 0 6.666667 NaN 14.423077 0.000000 0.000000
In [13]: df[df["api_type"] == "Function"].nlargest(20, "%interact/interact") # type: ignore[reportArgumentType]
Out[13]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
4306 Function CanvasRenderingContext2D putImageData 7 7 1378 1146 0.600666 27.501800 7.636882 19.591959 83.164006
2788 Function BroadcastChannel removeEventListener 3 2 3 2 0.603622 25.000000 0.666398 16.666667 66.666667
5443 Function PerformanceObserverEntryList getEntriesByName 4 4 10 10 2.036660 23.255814 2.005391 28.875812 100.000000
14861 Function FontFaceSet add 18 4 18 4 16.666667 16.666667 16.666667 3.703704 22.222222
15873 Function FontFace load 18 4 18 4 16.666667 16.666667 16.666667 3.703704 22.222222
16387 Function HTMLLinkElement insertAdjacentHTML 29 29 29 29 16.666667 16.666667 16.666667 16.666667 100.000000
2636 Function HTMLDivElement isEqualNode 4 4 350 245 14.583333 13.424658 10.373686 10.092331 70.000000
3378 Function CustomElementRegistry get 101 71 828723 806370 12.442614 13.151448 5.779473 2.220260 97.302718
595 Function HTMLAnchorElement isSameNode 30 30 117855 117855 8.029145 12.948267 7.101826 11.841353 100.000000
6954 Function Window find 1 1 290 266 11.435331 10.901639 11.435331 10.901639 91.724138
4237 Function IDBTransaction removeEventListener 13 5 702 21 0.578583 9.589041 3.134371 3.592959 2.991453
1527 Function HTMLCanvasElement dispatchEvent 5 5 5 5 2.659574 9.433962 2.702381 10.952381 100.000000
13750 Function DOMException get name 4 4 5 5 9.433962 9.433962 9.318182 9.318182 100.000000
46 Function HTMLFormElement checkValidity 5 5 35 35 2.800000 9.333333 2.905555 13.967387 100.000000
3325 Function CanvasRenderingContext2D save 5 5 4700 4252 7.365501 7.344330 8.110019 8.297614 90.468085
4235 Function CanvasRenderingContext2D restore 5 5 4700 4252 7.365501 7.344330 8.110019 8.297614 90.468085
1661 Function URL get protocol 90 90 5352 1824 0.524881 6.625740 0.728693 0.645742 34.080717
16281 Function HTMLDivElement matches 843 835 1115182 1064906 4.807023 6.144259 2.282807 2.758132 95.491678
15075 Function MessageEvent stopImmediatePropagation 1 1 1 1 0.564972 5.882353 0.564972 5.882353 100.000000
7786 Function DialogHelperElement querySelector 5 5 540 495 4.846962 5.586277 4.874266 5.632438 91.666667
In [14]: df[df["api_type"] == "Function"].nlargest(20, "avg%total/script") # type: ignore[reportArgumentType]
Out[14]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
4048 Function Object query 7 0 7 0 100.000000 NaN 100.000000 0.000000 0.000000
5873 Function DedicatedWorkerGlobalScope importScripts 15 0 15 0 100.000000 NaN 100.000000 0.000000 0.000000
10648 Function KeyboardLayoutMap Iterator next 1 0 50 0 94.339623 NaN 94.339623 0.000000 0.000000
8245 Function Navigator joinAdInterestGroup 22 0 80 0 25.559105 NaN 33.441558 0.000000 0.000000
6982 Function FontFaceSet load 110 52 735 116 0.429920 2.183735 21.941940 5.717374 15.782313
13606 Function ServiceWorkerGlobalScope fetch 25 0 4307 0 3.182920 NaN 19.438611 0.000000 0.000000
12027 Function ServiceWorkerGlobalScope importScripts 11 0 36 0 0.944386 NaN 18.684029 0.000000 0.000000
978 Function HTMLDocument evaluate 5 2 790 0 17.665474 0.000000 17.901950 0.000000 0.000000
11792 Function Image setAttribute 175 85 628 204 0.101835 0.037405 17.618164 4.888760 32.484076
16387 Function HTMLLinkElement insertAdjacentHTML 29 29 29 29 16.666667 16.666667 16.666667 16.666667 100.000000
14861 Function FontFaceSet add 18 4 18 4 16.666667 16.666667 16.666667 3.703704 22.222222
15873 Function FontFace load 18 4 18 4 16.666667 16.666667 16.666667 3.703704 22.222222
17399 Function PerformanceObserverEntryList getEntriesByType 38 38 3650 3159 4.730062 5.344725 15.360827 19.455392 86.547945
2952 Function HTMLBodyElement getClientRects 12 0 12 0 6.666667 NaN 14.423077 0.000000 0.000000
11149 Function Window close 7 0 7 0 14.285714 NaN 14.285714 0.000000 0.000000
9888 Function HTMLBodyElement get dataset 11 0 11 0 14.285714 NaN 14.285714 0.000000 0.000000
8856 Function DedicatedWorkerGlobalScope postMessage 53 0 998 0 9.239030 NaN 14.201290 0.000000 0.000000
15223 Function TrustedTypePolicy createScript 225 166 5909 3701 0.046810 0.041957 12.566570 7.982096 62.633271
6954 Function Window find 1 1 290 266 11.435331 10.901639 11.435331 10.901639 91.724138
2636 Function HTMLDivElement isEqualNode 4 4 350 245 14.583333 13.424658 10.373686 10.092331 70.000000
In [15]: df[df["api_type"] == "Function"].nlargest(20, "avg%interact/script") # type: ignore[reportArgumentType]
Out[15]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
5443 Function PerformanceObserverEntryList getEntriesByName 4 4 10 10 2.036660 23.255814 2.005391 28.875812 100.000000
4306 Function CanvasRenderingContext2D putImageData 7 7 1378 1146 0.600666 27.501800 7.636882 19.591959 83.164006
17399 Function PerformanceObserverEntryList getEntriesByType 38 38 3650 3159 4.730062 5.344725 15.360827 19.455392 86.547945
3938 Function HTMLAnchorElement removeEventListener 163 163 13043 11362 0.092295 0.108022 5.412695 16.769294 87.111861
2788 Function BroadcastChannel removeEventListener 3 2 3 2 0.603622 25.000000 0.666398 16.666667 66.666667
16387 Function HTMLLinkElement insertAdjacentHTML 29 29 29 29 16.666667 16.666667 16.666667 16.666667 100.000000
46 Function HTMLFormElement checkValidity 5 5 35 35 2.800000 9.333333 2.905555 13.967387 100.000000
595 Function HTMLAnchorElement isSameNode 30 30 117855 117855 8.029145 12.948267 7.101826 11.841353 100.000000
1527 Function HTMLCanvasElement dispatchEvent 5 5 5 5 2.659574 9.433962 2.702381 10.952381 100.000000
6954 Function Window find 1 1 290 266 11.435331 10.901639 11.435331 10.901639 91.724138
1336 Function Window getSelection 103 103 8064 8064 0.517555 0.579214 9.106014 10.571244 100.000000
2636 Function HTMLDivElement isEqualNode 4 4 350 245 14.583333 13.424658 10.373686 10.092331 70.000000
2536 Function HTMLAnchorElement blur 16 16 17 17 0.001361 0.001822 9.398725 9.830911 100.000000
13871 Function Selection toString 95 95 4873 4873 0.350078 0.381274 8.209906 9.385576 100.000000
3469 Function DOMException get message 8 5 15 11 0.007231 0.020454 9.319476 9.318414 73.333333
13750 Function DOMException get name 4 4 5 5 9.433962 9.433962 9.318182 9.318182 100.000000
11893 Function BroadcastChannel close 6 4 8 2 0.045961 0.510204 0.348012 8.333333 25.000000
3325 Function CanvasRenderingContext2D save 5 5 4700 4252 7.365501 7.344330 8.110019 8.297614 90.468085
4235 Function CanvasRenderingContext2D restore 5 5 4700 4252 7.365501 7.344330 8.110019 8.297614 90.468085
15223 Function TrustedTypePolicy createScript 225 166 5909 3701 0.046810 0.041957 12.566570 7.982096 62.633271
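One way to keep rarely used APIs from dominating these ratio-based rankings is to require a minimum appear count before sorting, in the spirit of the appear > 100 filter used in In [22]. Below is a minimal sketch in plain Python rather than pandas, using a few rows copied from Out[12]. The MIN_APPEAR cutoff and the top_by_ratio helper are hypothetical names chosen for illustration and are not part of the original analysis.

```python
# Rows copied from the Out[12] table: (this, attr, appear, %total/total).
rows = [
    ("Object", "query", 7, 100.000000),
    ("DedicatedWorkerGlobalScope", "importScripts", 15, 100.000000),
    ("KeyboardLayoutMap Iterator", "next", 1, 94.339623),
    ("Navigator", "joinAdInterestGroup", 22, 25.559105),
    ("CustomElementRegistry", "get", 101, 12.442614),
    ("HTMLAnchorElement", "isSameNode", 30, 8.029145),
]

# Hypothetical cutoff, mirroring the appear > 100 filter in In [22].
MIN_APPEAR = 100


def top_by_ratio(rows, min_appear, n=20):
    """Rank APIs by %total/total, keeping only those with appear > min_appear."""
    kept = [row for row in rows if row[2] > min_appear]
    return sorted(kept, key=lambda row: row[3], reverse=True)[:n]


filtered = top_by_ratio(rows, MIN_APPEAR)
for this, attr, appear, pct in filtered:
    print(f"{this}.{attr}: appear={appear}, %total/total={pct:.2f}")
# prints "CustomElementRegistry.get: appear=101, %total/total=12.44"
```

With this sample, only CustomElementRegistry.get survives the cutoff, which suggests that most of the top ratio entries owe their rank to a handful of appearances.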
Top 20 API Set calls
In [16]: df[df["api_type"] == "Set"].nlargest(20, "appear") # type: ignore[reportArgumentType]
Out[16]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
15422 Set HTMLScriptElement src 4145 2406 15143 6913 0.033048 0.019939 2.485560 0.417693 45.651456
11866 Set HTMLScriptElement async 3037 1612 8652 2926 0.023194 0.010426 2.714181 0.277729 33.818770
8172 Set HTMLDocument cookie 2810 2376 25002 15393 0.057419 0.058849 5.065215 0.855618 61.567075
10020 Set HTMLAnchorElement href 2781 2633 43452 29806 0.078498 0.084109 0.583468 0.409084 68.595232
15256 Set HTMLScriptElement onload 2311 1711 15091 7425 0.035689 0.023283 1.407092 0.675300 49.201511
11456 Set CSSStyleDeclaration display 2158 1870 29629 20599 0.057507 0.064276 1.980954 1.334224 69.523102
3257 Set HTMLScriptElement type 1977 1171 4850 2048 0.059195 0.034339 2.318651 0.435811 42.226804
11781 Set HTMLDivElement innerHTML 1938 1811 16250 5709 0.036140 0.021380 0.966634 0.606579 35.132308
299 Set Window dataLayer 1859 1061 1911 405 0.028186 0.011752 9.510403 0.102686 21.193093
7094 Set XMLHttpRequest onreadystatechange 1780 1455 15562 11971 0.039432 0.041042 0.740359 0.471672 76.924560
12681 Set Image src 1693 1131 6466 3467 0.016076 0.011726 4.572729 1.848441 53.618930
13778 Set HTMLScriptElement onerror 1586 1291 12757 5854 0.033671 0.019983 0.744364 0.615878 45.888532
2347 Set XMLHttpRequest withCredentials 1564 1254 8258 6551 0.019977 0.023260 0.648240 0.361858 79.329135
14550 Set CSSStyleDeclaration width 1292 1097 19640 8521 0.065403 0.042337 1.446305 0.382616 43.385947
17552 Set HTMLImageElement src 1202 837 15598 5529 0.059533 0.037416 3.087841 1.070676 35.446852
258 Set CSSStyleDeclaration position 1192 969 21109 2744 0.052430 0.009619 0.507048 0.181102 12.999195
2148 Set HTMLIFrameElement src 1106 838 1383 543 0.011561 0.008270 0.710719 0.609099 39.262473
8985 Set HTMLDivElement id 1084 916 4409 709 0.013589 0.003192 0.643126 0.349976 16.080744
17444 Set MessagePort onmessage 1043 886 1228 133 0.003000 0.000449 0.391508 0.161973 10.830619
13010 Set CSSStyleDeclaration height 1003 833 29251 24464 0.069698 0.081981 0.304656 0.070588 83.634748
In [17]: df[df["api_type"] == "Set"].nlargest(20, "appear_interact") # type: ignore[reportArgumentType]
Out[17]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
10020 Set HTMLAnchorElement href 2781 2633 43452 29806 0.078498 0.084109 0.583468 0.409084 68.595232
15422 Set HTMLScriptElement src 4145 2406 15143 6913 0.033048 0.019939 2.485560 0.417693 45.651456
8172 Set HTMLDocument cookie 2810 2376 25002 15393 0.057419 0.058849 5.065215 0.855618 61.567075
11456 Set CSSStyleDeclaration display 2158 1870 29629 20599 0.057507 0.064276 1.980954 1.334224 69.523102
11781 Set HTMLDivElement innerHTML 1938 1811 16250 5709 0.036140 0.021380 0.966634 0.606579 35.132308
15256 Set HTMLScriptElement onload 2311 1711 15091 7425 0.035689 0.023283 1.407092 0.675300 49.201511
11866 Set HTMLScriptElement async 3037 1612 8652 2926 0.023194 0.010426 2.714181 0.277729 33.818770
7094 Set XMLHttpRequest onreadystatechange 1780 1455 15562 11971 0.039432 0.041042 0.740359 0.471672 76.924560
13778 Set HTMLScriptElement onerror 1586 1291 12757 5854 0.033671 0.019983 0.744364 0.615878 45.888532
2347 Set XMLHttpRequest withCredentials 1564 1254 8258 6551 0.019977 0.023260 0.648240 0.361858 79.329135
3257 Set HTMLScriptElement type 1977 1171 4850 2048 0.059195 0.034339 2.318651 0.435811 42.226804
12681 Set Image src 1693 1131 6466 3467 0.016076 0.011726 4.572729 1.848441 53.618930
14550 Set CSSStyleDeclaration width 1292 1097 19640 8521 0.065403 0.042337 1.446305 0.382616 43.385947
299 Set Window dataLayer 1859 1061 1911 405 0.028186 0.011752 9.510403 0.102686 21.193093
258 Set CSSStyleDeclaration position 1192 969 21109 2744 0.052430 0.009619 0.507048 0.181102 12.999195
8985 Set HTMLDivElement id 1084 916 4409 709 0.013589 0.003192 0.643126 0.349976 16.080744
17444 Set MessagePort onmessage 1043 886 1228 133 0.003000 0.000449 0.391508 0.161973 10.830619
2148 Set HTMLIFrameElement src 1106 838 1383 543 0.011561 0.008270 0.710719 0.609099 39.262473
17552 Set HTMLImageElement src 1202 837 15598 5529 0.059533 0.037416 3.087841 1.070676 35.446852
13010 Set CSSStyleDeclaration height 1003 833 29251 24464 0.069698 0.081981 0.304656 0.070588 83.634748
In [18]: df[df["api_type"] == "Set"].nlargest(20, "total") # type: ignore[reportArgumentType]
Out[18]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
9493 Set Event flow 121 121 72147 57173 0.262724 0.259411 0.292955 0.308683 79.245152
10020 Set HTMLAnchorElement href 2781 2633 43452 29806 0.078498 0.084109 0.583468 0.409084 68.595232
2968 Set Text textContent 42 41 37094 23641 1.142186 0.933173 0.917371 0.731892 63.732679
11456 Set CSSStyleDeclaration display 2158 1870 29629 20599 0.057507 0.064276 1.980954 1.334224 69.523102
9902 Set HTMLSpanElement textContent 497 463 29552 8465 0.262036 0.134479 1.362292 0.089505 28.644423
13010 Set CSSStyleDeclaration height 1003 833 29251 24464 0.069698 0.081981 0.304656 0.070588 83.634748
8172 Set HTMLDocument cookie 2810 2376 25002 15393 0.057419 0.058849 5.065215 0.855618 61.567075
4044 Set HTMLImageElement onload 510 440 21372 11883 0.361391 0.249478 0.810341 0.577726 55.600786
258 Set CSSStyleDeclaration position 1192 969 21109 2744 0.052430 0.009619 0.507048 0.181102 12.999195
12959 Set CSSStyleDeclaration left 565 532 20098 2004 0.078073 0.011633 0.940909 0.591199 9.971141
14550 Set CSSStyleDeclaration width 1292 1097 19640 8521 0.065403 0.042337 1.446305 0.382616 43.385947
16491 Set CSSStyleDeclaration fontFamily 143 116 16992 591 0.161403 0.007508 4.393393 0.188850 3.478107
13647 Set CSSStyleDeclaration transform 704 696 16935 14308 0.090705 0.128078 0.874879 0.462369 84.487747
11781 Set HTMLDivElement innerHTML 1938 1811 16250 5709 0.036140 0.021380 0.966634 0.606579 35.132308
5856 Set HTMLDivElement className 881 726 15672 4497 0.087585 0.033973 1.528997 1.046062 28.694487
131 Set Event isDefaultPrevented 191 191 15620 15617 0.351237 2.103407 0.222507 7.115688 99.980794
17552 Set HTMLImageElement src 1202 837 15598 5529 0.059533 0.037416 3.087841 1.070676 35.446852
7094 Set XMLHttpRequest onreadystatechange 1780 1455 15562 11971 0.039432 0.041042 0.740359 0.471672 76.924560
15422 Set HTMLScriptElement src 4145 2406 15143 6913 0.033048 0.019939 2.485560 0.417693 45.651456
15256 Set HTMLScriptElement onload 2311 1711 15091 7425 0.035689 0.023283 1.407092 0.675300 49.201511
In [19]: df[df["api_type"] == "Set"].nlargest(20, "interact") # type: ignore[reportArgumentType]
Out[19]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
9493 Set Event flow 121 121 72147 57173 0.262724 0.259411 0.292955 0.308683 79.245152
10020 Set HTMLAnchorElement href 2781 2633 43452 29806 0.078498 0.084109 0.583468 0.409084 68.595232
13010 Set CSSStyleDeclaration height 1003 833 29251 24464 0.069698 0.081981 0.304656 0.070588 83.634748
2968 Set Text textContent 42 41 37094 23641 1.142186 0.933173 0.917371 0.731892 63.732679
11456 Set CSSStyleDeclaration display 2158 1870 29629 20599 0.057507 0.064276 1.980954 1.334224 69.523102
131 Set Event isDefaultPrevented 191 191 15620 15617 0.351237 2.103407 0.222507 7.115688 99.980794
8172 Set HTMLDocument cookie 2810 2376 25002 15393 0.057419 0.058849 5.065215 0.855618 61.567075
13647 Set CSSStyleDeclaration transform 704 696 16935 14308 0.090705 0.128078 0.874879 0.462369 84.487747
7094 Set XMLHttpRequest onreadystatechange 1780 1455 15562 11971 0.039432 0.041042 0.740359 0.471672 76.924560
4044 Set HTMLImageElement onload 510 440 21372 11883 0.361391 0.249478 0.810341 0.577726 55.600786
204 Set Event isPropagationStopped 50 50 11723 11723 1.076866 1.692896 0.501635 3.173913 100.000000
11747 Set Event nativeEvent 50 50 11723 11723 1.076866 1.692896 0.501635 3.173913 100.000000
15818 Set Event persist 50 50 11723 11723 1.076866 1.692896 0.501635 3.173913 100.000000
14550 Set CSSStyleDeclaration width 1292 1097 19640 8521 0.065403 0.042337 1.446305 0.382616 43.385947
9902 Set HTMLSpanElement textContent 497 463 29552 8465 0.262036 0.134479 1.362292 0.089505 28.644423
15256 Set HTMLScriptElement onload 2311 1711 15091 7425 0.035689 0.023283 1.407092 0.675300 49.201511
15422 Set HTMLScriptElement src 4145 2406 15143 6913 0.033048 0.019939 2.485560 0.417693 45.651456
5704 Set CSSStyleDeclaration opacity 212 192 7607 6789 0.047285 0.070397 0.306024 0.537926 89.246746
935 Set HTMLAnchorElement onclick 287 274 13304 6744 0.126817 0.105791 0.135959 0.098332 50.691521
2347 Set XMLHttpRequest withCredentials 1564 1254 8258 6551 0.019977 0.023260 0.648240 0.361858 79.329135
Specific APIs of interest
addEventListener on various elements clearly indicates frontend processing, regardless of whether it is called after interaction started.
In [22]: df[(df["attr"] == "addEventListener") & (df["appear"] > 100) & (df["api_type"] == "Function")]
Out[22]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
170 Function HTMLHtmlElement addEventListener 435 403 10386 2790 0.042678 0.015690 1.014821 0.156781 26.863085
482 Function HTMLLinkElement addEventListener 357 203 6291 1129 0.165954 0.097912 1.344452 0.234178 17.946272
926 Function HTMLUListElement addEventListener 318 302 1973 73 0.020669 0.000985 0.300937 0.006706 3.699949
1271 Function HTMLImageElement addEventListener 585 539 25288 11160 0.128254 0.094256 0.826761 0.479900 44.131604
1593 Function HTMLSpanElement addEventListener 346 333 7552 270 0.049295 0.002749 0.135847 0.027511 3.575212
3005 Function HTMLParagraphElement addEventListener 101 101 2233 2 0.130516 0.000176 0.138290 0.000166 0.089566
3031 Function HTMLSelectElement addEventListener 236 217 1468 21 0.046463 0.001421 0.465725 0.017949 1.430518
3308 Function NetworkInformation addEventListener 106 103 119 18 0.021710 0.008249 0.167541 0.024107 15.126050
3772 Function HTMLElement addEventListener 750 739 32724 2545 0.183205 0.022435 1.564308 1.332341 7.777167
5077 Function MediaQueryList addEventListener 708 671 5206 2308 0.014862 0.008460 0.791159 0.285986 44.333461
5616 Function HTMLAnchorElement addEventListener 1346 1174 77789 18943 0.184291 0.063770 6.796528 0.760201 24.351772
7048 Function HTMLDivElement addEventListener 2514 2184 252068 107180 0.391858 0.234839 2.477123 1.046851 42.520272
7638 Function HTMLVideoElement addEventListener 233 193 13401 2890 0.273275 0.080812 2.216936 0.073431 21.565555
8196 Function AbortSignal addEventListener 104 104 8336 8160 0.026733 0.031255 0.176075 0.404642 97.888676
9282 Function HTMLButtonElement addEventListener 1475 1215 21572 3449 0.048451 0.011562 1.259833 0.811528 15.988318
11331 Function HTMLFormElement addEventListener 384 302 1962 97 0.050651 0.005543 4.696687 0.949741 4.943935
13049 Function XMLHttpRequest addEventListener 113 97 1176 715 0.010391 0.019337 0.309974 0.138416 60.799320
13077 Function HTMLScriptElement addEventListener 523 481 2340 1130 0.010101 0.013179 1.327969 0.154031 48.290598
13558 Function HTMLLIElement addEventListener 390 376 9582 737 0.096313 0.010782 0.262191 0.054946 7.691505
14483 Function HTMLBodyElement addEventListener 1021 964 10257 4482 0.027587 0.016387 0.603541 0.057784 43.696987
14494 Function HTMLDocument addEventListener 5870 5040 38962 10686 0.045599 0.019454 1.434622 0.627379 27.426723
16637 Function HTMLInputElement addEventListener 623 585 4813 1395 0.014064 0.005429 0.332024 0.059557 28.984002
18119 Function Window addEventListener 7166 5668 56988 12036 0.087308 0.031169 1.998809 0.587056 21.120236
HTMLDocument.createElement indicates either UX enhancement or DOM element generation. Since only 43% of the calls happen after interaction started, more than half of them are clearly DOM element generation.
In [24]: df[(df["attr"] == "createElement") & (df["api_type"] == "Function")]
Out[24]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
2014 Function HTMLDocument createElement 9320 6330 286107 122524 0.296257 0.219127 2.612151 0.967649 42.824538
Window.requestAnimationFrame is a clear indication of UX enhancement because it is strictly for animation. Most of its calls (68%) happen after interaction started, though calling it before interaction also makes sense because the page might be animating before the user interacts.
In [26]: df[(df["attr"] == "requestAnimationFrame") & (df["api_type"] == "Function")]
Out[26]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
9128 Function Window requestAnimationFrame 1655 1422 42244 28852 0.092087 0.097062 1.884883 3.626479 68.298457
Navigator.sendBeacon is a clear indication of extensional features (tracking). It seems to be called more after interaction started (77% of its calls).
In [27]: df[(df["attr"] == "sendBeacon") & (df["api_type"] == "Function")]
Out[27]:
api_type this attr appear appear_interact total interact %total/total %interact/interact avg%total/script avg%interact/script %interact/total
1989 Function Navigator sendBeacon 1132 1085 4685 3613 0.011954 0.012742 0.395376 3.096326 77.118463
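The last column of these tables, %interact/total, appears to be the interact count divided by the total count; that reading is inferred from the numbers, not stated in the notes. A minimal Python sketch that recomputes it for the three rows above (counts copied from the tables; the helper name is made up for illustration):

```python
# (total calls, calls after interaction started), copied from the rows above.
rows = {
    "HTMLDocument.createElement": (286107, 122524),
    "Window.requestAnimationFrame": (42244, 28852),
    "Navigator.sendBeacon": (4685, 3613),
}


def percent_after_interaction(total: int, interact: int) -> float:
    """Share of calls made after interaction started, in percent."""
    return 100.0 * interact / total


shares = {api: percent_after_interaction(*counts) for api, counts in rows.items()}
for api, share in shares.items():
    print(f"{api}: {share:.1f}% of calls happen after interaction started")
```

Under that reading, the recomputed shares line up with the %interact/total column, which is what the 43% figure for createElement refers to.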
Analysis of selected YouTube scripts API calls
Overview
let mut logs = read_logs("headless_browser/target/youtube.com/0").unwrap();
// Pick the log file with the most records.
let i_log = logs
    .iter()
    .enumerate()
    .max_by_key(|(_, LogFile { records, .. })| records.len())
    .map(|(i, _)| i)
    .unwrap();
// Aggregate API calls.
let mut aggregate = RecordAggregate::default();
for entry in &logs[i_log].records {
    let (line, record) = entry.clone();
    if let Err(err) = aggregate.add(line as u32, record) {
        println!("{line}: {err}");
    }
}
// Print a one-line summary per script, ordered by script ID.
let mut ids = aggregate.scripts.keys().copied().collect::<Vec<_>>();
ids.sort_unstable();
for id in &ids {
    let script = &aggregate.scripts[id];
    println!(
        "{id} line#{} source~{}kB used {} APIs {:?} {:?}",
        script.line,
        script.source.len() / 1024,
        script.api_calls.len(),
        script.injection_type,
        script.name,
    );
}
6 line#348 source~0kB used 0 APIs Not Url("https://www.youtube.com/")
7 line#351 source~0kB used 2 APIs Not Url("https://www.youtube.com/")
8 line#355 source~3kB used 6 APIs Not Url("https://www.youtube.com/")
9 line#363 source~0kB used 5 APIs Not Url("https://www.youtube.com/")
10 line#372 source~0kB used 0 APIs Not Url("https://www.youtube.com/")
11 line#374 source~2kB used 14 APIs Not Url("https://www.youtube.com/")
12 line#831 source~37kB used 44 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/spf.vflset/spf.js")
13 line#951 source~14kB used 19 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/network.vflset/network.js")
14 line#1112 source~8466kB used 1147 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/desktop_polymer.vflset/desktop_polymer.js")
15 line#413 source~50kB used 37 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/web-animations-next-lite.min.vflset/web-animations-next-lite.min.js")
16 line#591 source~5kB used 1 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/intersection-observer.min.vflset/intersection-observer.min.js")
17 line#782 source~5kB used 1 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/www-i18n-constants-en_US.vflset/www-i18n-constants.js")
18 line#796 source~11kB used 17 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/www-tampering.vflset/www-tampering.js")
19 line#717 source~9kB used 28 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/scheduler.vflset/scheduler.js")
20 line#467 source~2kB used 8 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/custom-elements-es5-adapter.vflset/custom-elements-es5-adapter.js")
21 line#482 source~77kB used 168 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/webcomponents-sd.vflset/webcomponents-sd.js")
22 line#595 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
23 line#607 source~445kB used 6 APIs Not Url("https://www.youtube.com/")
24 line#705 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
25 line#947 source~0kB used 0 APIs Not Url("https://www.youtube.com/")
26 line#986 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
27 line#998 source~0kB used 3 APIs Not Url("https://www.youtube.com/")
28 line#1009 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
30 line#1021 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
31 line#1036 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
32 line#1048 source~0kB used 2 APIs Not Url("https://www.youtube.com/")
33 line#1054 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
34 line#1066 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
35 line#1078 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
36 line#1090 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
37 line#1101 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
41 line#89054 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
42 line#89187 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
43 line#89498 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
44 line#89581 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
45 line#89653 source~0kB used 4 APIs Not Url("https://www.youtube.com/")
46 line#89728 source~0kB used 14 APIs Not Url("https://www.youtube.com/")
47 line#89775 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
48 line#89858 source~0kB used 0 APIs Not Url("https://www.youtube.com/")
49 line#89863 source~0kB used 1 APIs Not Url("https://www.youtube.com/")
50 line#92474 source~11kB used 5 APIs Injected Empty
55 line#94972 source~0kB used 0 APIs Injected Empty
56 line#94977 source~0kB used 4 APIs Injected Eval { parent_script_id: 50 }
58 line#94996 source~0kB used 4 APIs Injected Eval { parent_script_id: 50 }
61 line#95117 source~227kB used 126 APIs Injected Eval { parent_script_id: 50 }
64 line#95189 source~0kB used 0 APIs Interaction Eval { parent_script_id: 50 }
65 line#260337 source~277kB used 202 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/www-searchbox.vflset/www-searchbox.js")
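One way to skim a listing like this is to parse the lines and rank scripts by the number of distinct APIs they use. The Python sketch below does that for a few lines copied from the output above; the regular expression and function name are assumptions made for illustration, and this ranking is not necessarily how the scripts inspected next were chosen.

```python
import re

# Matches lines such as:
# 18 line#796 source~11kB used 17 APIs Not Url("https://...")
LINE = re.compile(
    r"^(?P<id>\d+) line#(?P<line>\d+) source~(?P<kb>\d+)kB "
    r"used (?P<apis>\d+) APIs (?P<injection>\w+) (?P<name>.+)$"
)

listing = """\
14 line#1112 source~8466kB used 1147 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/desktop_polymer.vflset/desktop_polymer.js")
18 line#796 source~11kB used 17 APIs Not Url("https://www.youtube.com/s/desktop/72b8c307/jsbin/www-tampering.vflset/www-tampering.js")
23 line#607 source~445kB used 6 APIs Not Url("https://www.youtube.com/")
61 line#95117 source~227kB used 126 APIs Injected Eval { parent_script_id: 50 }
"""


def parse_listing(text):
    """Turn each summary line into a dict; lines that do not match are skipped."""
    scripts = []
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if m:
            scripts.append({
                "id": int(m["id"]),
                "kb": int(m["kb"]),
                "apis": int(m["apis"]),
                "injection": m["injection"],
                "name": m["name"],
            })
    return scripts


scripts = parse_listing(listing)
by_apis = sorted(scripts, key=lambda s: s["apis"], reverse=True)
for s in by_apis:
    print(f'{s["id"]}: {s["apis"]} APIs, ~{s["kb"]}kB, {s["injection"]}')
```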
Selected scripts inspection
https://www.youtube.com/s/desktop/72b8c307/jsbin/www-tampering.vflset/www-tampering.js is part of the Closure Library, meant to detect whether the page has been tampered with. Its API calls are clearly mainly for extensional features, gathering information from userAgent and doing math:
for (api_call, lines) in &aggregate.scripts[&18].api_calls {
    println!(
        "{}/{} times: {api_call:?}",
        lines.n_may_interact(),
        lines.len()
    );
}
0/2 times: ApiCall { api_type: Get, this: "Window", attr: Some("Symbol") }
0/4 times: ApiCall { api_type: Get, this: "Window", attr: Some("yt") }
0/1 times: ApiCall { api_type: Get, this: "Navigator", attr: Some("userAgentData") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("Object") }
0/6 times: ApiCall { api_type: Get, this: "Navigator", attr: Some("userAgent") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("execScript") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("String") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("window") }
0/7 times: ApiCall { api_type: Get, this: "Window", attr: Some("navigator") }
0/1 times: ApiCall { api_type: Function, this: "Navigator", attr: Some("get userAgentData") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("WeakMap") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("Array") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("Set") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("Math") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("ytbin") }
0/1 times: ApiCall { api_type: Set, this: "Window", attr: Some("ytbin") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("Map") }
https://www.youtube.com/s/desktop/72b8c307/jsbin/desktop_polymer.vflset/desktop_polymer.js contains the Polymer code that generates HTML. The calls it makes more than 1000 times are mainly event-related APIs for frontend processing and APIs for DOM element generation (perhaps a bit of UX enhancement as well):
for (api_call, lines) in &aggregate.scripts[&14].api_calls {
    if lines.len() > 1000 {
        println!(
            "{}/{} times: {api_call:?}",
            lines.n_may_interact(),
            lines.len()
        );
    }
}
21747/22409 times: ApiCall { api_type: Get, this: "HTMLDivElement", attr: Some("usePatchedLifecycles") }
21740/21792 times: ApiCall { api_type: Function, this: "HTMLDivElement", attr: Some("querySelectorAll") }
10185/10187 times: ApiCall { api_type: Function, this: "DocumentFragment", attr: Some("appendChild") }
2433/2434 times: ApiCall { api_type: Get, this: "Event", attr: Some("type") }
43621/45032 times: ApiCall { api_type: Get, this: "HTMLDivElement", attr: Some("tagName") }
7139/7143 times: ApiCall { api_type: Function, this: "HTMLBodyElement", attr: Some("appendChild") }
10185/10226 times: ApiCall { api_type: Get, this: "DocumentFragment", attr: Some("isConnected") }
21754/21780 times: ApiCall { api_type: Get, this: "HTMLDivElement", attr: Some("isConnected") }
6303/6449 times: ApiCall { api_type: Get, this: "DocumentFragment", attr: Some("children") }
3786/3803 times: ApiCall { api_type: Function, this: "DocumentFragment", attr: Some("get children") }
2478/2480 times: ApiCall { api_type: Get, this: "Event", attr: Some("pageY") }
7139/7143 times: ApiCall { api_type: Get, this: "HTMLBodyElement", attr: Some("isConnected") }
0/8817 times: ApiCall { api_type: Get, this: "Window", attr: Some("Reflect") }
2478/2480 times: ApiCall { api_type: Get, this: "Event", attr: Some("target") }
2478/2480 times: ApiCall { api_type: Get, this: "Event", attr: Some("pageX") }
2064/2065 times: ApiCall { api_type: Get, this: "Event", attr: Some("keyCode") }
10717/10717 times: ApiCall { api_type: Function, this: "HTMLBodyElement", attr: Some("removeChild") }
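To compare these lines at a glance, the two counts can be turned into a share per API. The Python sketch below parses a few lines copied from the output above; it assumes the first number (from n_may_interact()) counts calls that may have happened after interaction started, which is a guess from the method name, and the regular expression and function name are made up for illustration.

```python
import re

# Matches lines such as:
# 0/8817 times: ApiCall { api_type: Get, this: "Window", attr: Some("Reflect") }
CALL = re.compile(
    r'^(?P<interact>\d+)/(?P<total>\d+) times: ApiCall \{ api_type: (?P<type>\w+), '
    r'this: "(?P<this>[^"]+)", attr: Some\("(?P<attr>[^"]+)"\) \}$'
)

output = """\
21740/21792 times: ApiCall { api_type: Function, this: "HTMLDivElement", attr: Some("querySelectorAll") }
0/8817 times: ApiCall { api_type: Get, this: "Window", attr: Some("Reflect") }
10717/10717 times: ApiCall { api_type: Function, this: "HTMLBodyElement", attr: Some("removeChild") }
"""


def interaction_shares(text):
    """Map 'This.attr' to the fraction of its calls that may follow interaction."""
    result = {}
    for raw in text.splitlines():
        m = CALL.match(raw.strip())
        if m:
            key = f'{m["this"]}.{m["attr"]}'
            result[key] = int(m["interact"]) / int(m["total"])
    return result


shares = interaction_shares(output)
for api, share in sorted(shares.items(), key=lambda kv: kv[1]):
    print(f"{api}: {share:.1%}")
```

In the sampled lines, Window.Reflect stands out as never being called after interaction, while the DOM calls almost always are.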
Script 23 on line #607, embedded in the HTML, only sets a large 440kB JSON of string templates for the UI (e.g., Downloading 1 video...), so it loosely counts as DOM element generation:
for (api_call, lines) in &aggregate.scripts[&23].api_calls {
    println!(
        "{}/{} times: {api_call:?}",
        lines.n_may_interact(),
        lines.len()
    );
}
0/1 times: ApiCall { api_type: Set, this: "Window", attr: Some("ytcfg") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("innerWidth") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("innerHeight") }
0/1 times: ApiCall { api_type: Set, this: "Window", attr: Some("ytplayer") }
0/2 times: ApiCall { api_type: Get, this: "Window", attr: Some("ytcfg") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("yt") }
https://www.youtube.com/s/desktop/72b8c307/jsbin/network.vflset/network.js only calls a few APIs, but hits the ones we are interested in for frontend processing, DOM element generation, and extensional features. It seems to contain a portion of the structured page fragments (SPF) library used for dynamic content fetching and rendering.
for (api_call, lines) in &aggregate.scripts[&13].api_calls {
    println!(
        "{}/{} times: {api_call:?}",
        lines.n_may_interact(),
        lines.len()
    );
}
0/2 times: ApiCall { api_type: Function, this: "Window", attr: Some("addEventListener") }
0/2 times: ApiCall { api_type: Get, this: "Window", attr: Some("Symbol") }
0/1 times: ApiCall { api_type: Function, this: "Window", attr: Some("postMessage") }
0/4 times: ApiCall { api_type: Get, this: "Window", attr: Some("removeEventListener") }
0/2 times: ApiCall { api_type: Get, this: "Window", attr: Some("spf") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("Array") }
0/2 times: ApiCall { api_type: Get, this: "Window", attr: Some("postMessage") }
0/1 times: ApiCall { api_type: Get, this: "HTMLDivElement", attr: Some("style") }
0/1 times: ApiCall { api_type: Get, this: "HTMLDocument", attr: Some("createElement") }
0/2 times: ApiCall { api_type: Function, this: "Window", attr: Some("removeEventListener") }
0/1 times: ApiCall { api_type: Function, this: "HTMLDocument", attr: Some("createElement") }
0/4 times: ApiCall { api_type: Get, this: "Window", attr: Some("addEventListener") }
0/1 times: ApiCall { api_type: Get, this: "Performance", attr: Some("timing") }
0/1 times: ApiCall { api_type: Function, this: "Performance", attr: Some("get timing") }
0/1 times: ApiCall { api_type: Get, this: "Window", attr: Some("Math") }
0/1 times: ApiCall { api_type: Set, this: "Window", attr: Some("spf") }
0/2 times: ApiCall { api_type: Get, this: "MessageEvent", attr: Some("data") }
0/1 times: ApiCall { api_type: Get, this: "Performance", attr: Some("now") }
0/3 times: ApiCall { api_type: Get, this: "Window", attr: Some("performance") }
Script 27 on line #998 only loads fonts (UX enhancement).
for (api_call, lines) in &aggregate.scripts[&27].api_calls {
    println!(
        "{}/{} times: {api_call:?}",
        lines.n_may_interact(),
        lines.len()
    );
}
0/2 times: ApiCall { api_type: Function, this: "FontFaceSet", attr: Some("load") }
0/3 times: ApiCall { api_type: Get, this: "FontFaceSet", attr: Some("load") }
0/4 times: ApiCall { api_type: Get, this: "HTMLDocument", attr: Some("fonts") }
println!("{}", &aggregate.scripts[&27].source);
if (document.fonts && document.fonts.load) {document.fonts.load("400 10pt Roboto", ""); document.fonts.load("500 10pt Roboto", "");}
The eval Trick
eval creates a separate browser execution context, so the idea is to use it to break down scripts without including additional scripts. The high-level idea is to split each script, and subsequently each block recursively, into blocks of eval calls smaller than 1kB (an arbitrary number), and somehow have the whole script behave as before.
Why it should work
Direct eval calls are executed in the same scope as the caller, with read-write access to all of the surrounding variables.
Challenges and workarounds
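The naive form of this rewrite, before any workaround, can be sketched in Python as a generator of JavaScript source. The sketch assumes the script has already been split into top-level statements by some parser (not shown), groups consecutive statements greedily into blocks of at most 1kB, and wraps each block in a direct eval call using String.raw, which these notes mention for avoiding backslash escaping. All function names are made up for illustration; the sketch does not recurse into oversized statements and simply rejects blocks containing backticks or ${.

```python
LIMIT_BYTES = 1024  # the notes call the 1kB threshold arbitrary


def split_into_blocks(statements, limit=LIMIT_BYTES):
    """Greedily group consecutive statements into blocks of at most `limit` bytes.

    A single statement larger than `limit` ends up alone in its own block.
    """
    blocks, current, size = [], [], 0
    for stmt in statements:
        stmt_size = len(stmt.encode("utf-8")) + 1  # +1 for the joining newline
        if current and size + stmt_size > limit:
            blocks.append("\n".join(current))
            current, size = [], 0
        current.append(stmt)
        size += stmt_size
    if current:
        blocks.append("\n".join(current))
    return blocks


def wrap_in_eval(block):
    """Emit one direct eval call whose argument is a String.raw template."""
    if "`" in block or "${" in block:
        raise ValueError("this sketch does not handle backticks or ${ in a block")
    return "eval(String.raw`" + block + "`);"


def rewrite(statements, limit=LIMIT_BYTES):
    """Rewrite a list of JavaScript statements into a sequence of eval calls."""
    return "\n".join(wrap_in_eval(b) for b in split_into_blocks(statements, limit))


print(rewrite(["a = 1;", "b = a + 1;"]))
```

Because each generated eval is a direct call, the blocks still read and write the caller's variables; what this naive version ignores is exactly the set of quirks listed next.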
The eval trick is a giant hack due to the quirks of eval.
- […] eval them first.
- […] evals, and would capture the variable in the outer scope with the same name instead, causing errors. Variables created in eval do not leak out unless they are declared with var in non-strict mode.
  - […] var or let, then make sure they are not declared again in the eval calls on the same level.
  - […] var to their beginning so most of them work, but the assigned variables do not leak out. This is usually fine because most such assignments are for local variables anyway.
- return statements cannot return from inside eval.
  - […] eval block that contains return statements not in nested functions or classes, we wrap it in an immediately invoked function expression (IIFE) so that the returns are valid. We call the IIFE with .call(this) to preserve this. We then check the return value of eval and return it early if it is not undefined.
- import statements are not allowed inside eval.
- export statements are not allowed inside eval.
- break, continue and yield statements may not be able to reach the correct outer scopes inside eval.
  - […] eval in loops or generators until we hit a function or class boundary.
- await does not work inside eval.
  - […] await on the eval and use an async IIFE if an IIFE is used.
- […] evals cause mountains of backslashes.
  - […] String.raw to avoid escaping backslashes and nest nested eval calls in functions to escape the backticks and ${ at runtime.
- […] eval to over depth 8.
  - […] eval nesting to 8 and inline the deeper blocks.
Inherent limitations
- eval disables the JIT and forces slow variable lookups.
Classification results
Initial results on top 100 websites
[…] (script_features.py):

| Feature | Count | Percentage (%) | Size (MB) | Size Percentage (%) |
| --- | --- | --- | --- | --- |
| Total Scripts | 40116 | - | 3192.7 | - |
| Frontend Processing | 14129 | 35.22 | 2864.3 | 89.72 |
| DOM Element Generation | 8196 | 20.43 | 2248.9 | 70.44 |
| UX Enhancement | 4496 | 11.21 | 1840.7 | 57.65 |
| Extensional Features | 4915 | 12.25 | 1888.1 | 59.14 |
| Silent Scripts | 10260 | 25.58 | 28.1 | 0.88 |
| Has Request | 4205 | 10.48 | 1432.1 | 44.86 |
| Queries Element | 13640 | 34.00 | 2731.6 | 85.56 |
| Uses Storage | 4571 | 11.39 | 1641.0 | 51.40 |
| No Sure Category | 13148 | 32.77 | 221.1 | 6.93 |
| No Category | 10178 | 25.37 | 179.7 | 5.63 |

| Feature Combination | Frontend Processing | DOM Element Generation | UX Enhancement | Extensional Features | Has Request | Queries Element |
| --- | --- | --- | --- | --- | --- | --- |
| DOM Element Generation | 6602 (16.46%), 2197.1MB (68.82%) | - | - | - | - | - |
| UX Enhancement | 3840 (9.57%), 1821.5MB (57.05%) | 3125 (7.79%), 1613.2MB (50.53%) | - | - | - | - |
| Extensional Features | 4229 (10.54%), 1841.1MB (57.67%) | 2703 (6.74%), 1562.9MB (48.95%) | 1844 (4.60%), 1440.5MB (45.12%) | - | - | - |
| Has Request | 3981 (9.92%), 1428.9MB (44.76%) | 2338 (5.83%), 1192.9MB (37.36%) | 1459 (3.64%), 1162.6MB (36.42%) | 1755 (4.37%), 1226.1MB (38.40%) | - | - |
| Queries Element | 9379 (23.38%), 2648.1MB (82.94%) | 6846 (17.07%), 2152.1MB (67.41%) | 3754 (9.36%), 1773.7MB (55.55%) | 3508 (8.74%), 1800.3MB (56.39%) | 3627 (9.04%), 1410.1MB (44.17%) | - |
| Uses Storage | 4335 (10.81%), 1633.2MB (51.15%) | 2728 (6.80%), 1362.7MB (42.68%) | 1669 (4.16%), 1278.9MB (40.06%) | 2093 (5.22%), 1372.2MB (42.98%) | 2424 (6.04%), 1106.1MB (34.64%) | 3722 (9.28%), 1596.9MB (50.02%) |

| Feature Combination | Scripts Count (%) | Size (MB) (%) |
| --- | --- | --- |
| Frontend Processing & DOM Element Generation & UX Enhancement | 2813 (7.01%) | 1604.9MB (50.27%) |
| Frontend Processing & DOM Element Generation & Extensional Features | 2679 (6.68%) | 1533.0MB (48.02%) |
| Frontend Processing & UX Enhancement & Extensional Features | 1814 (4.52%) | 1439.2MB (45.08%) |
| DOM Element Generation & UX Enhancement & Extensional Features | 1482 (3.69%) | 1296.6MB (40.61%) |

Results with the eval trick on top 1000 websites
[…] eval trick applied to scripts. The eval trick rewrites each script into smaller blocks of eval calls, creating V8 execution contexts that map to smaller blocks of source code. The unit here is the execution context, not the script anymore.
[…] else branch, etc.

| Feature | Count | Percentage (%) | Size (MB) | Size Percentage (%) |
| --- | --- | --- | --- | --- |
| Total Contexts | 21027954 | - | 26163.2 | - |
| Silent Contexts | 17572814 | 83.57 | 12850.4 | 49.12 |
| Frontend Processing | 338273 | 1.61 | 9469.6 | 36.19 |
| DOM Element Generation | 72286 | 0.34 | 6250.1 | 23.89 |
| UX Enhancement | 26536 | 0.13 | 5392.0 | 20.61 |
| Extensional Features | 32647 | 0.16 | 5617.2 | 21.47 |
| Has Request | 16821 | 0.08 | 4318.8 | 16.51 |
| Queries Element | 120844 | 0.57 | 7719.9 | 29.51 |
| Uses Storage | 26081 | 0.12 | 5142.6 | 19.66 |

| Feature Combination | Frontend Processing | DOM Element Generation | UX Enhancement | Extensional Features | Has Request | Queries Element |
| --- | --- | --- | --- | --- | --- | --- |
| DOM Element Generation | 20763 (0.10%), 5924.2MB (22.64%) | - | - | - | - | - |
| UX Enhancement | 15544 (0.07%), 5229.6MB (19.99%) | 12910 (0.06%), 4537.7MB (17.34%) | - | - | - | - |
| Extensional Features | 13557 (0.06%), 5540.2MB (21.18%) | 6207 (0.03%), 4006.9MB (15.31%) | 5369 (0.03%), 3667.5MB (14.02%) | - | - | - |
| Has Request | 10514 (0.05%), 4276.4MB (16.35%) | 5694 (0.03%), 3081.3MB (11.78%) | 4519 (0.02%), 3041.1MB (11.62%) | 4926 (0.02%), 3572.3MB (13.65%) | - | - |
| Queries Element | 33064 (0.16%), 7174.1MB (27.42%) | 29577 (0.14%), 5575.6MB (21.31%) | 15820 (0.08%), 4875.0MB (18.63%) | 7871 (0.04%), 4812.3MB (18.39%) | 7730 (0.04%), 3857.2MB (14.74%) | - |
| Uses Storage | 13332 (0.06%), 5067.7MB (19.37%) | 7018 (0.03%), 3569.3MB (13.64%) | 5109 (0.02%), 3400.2MB (13.00%) | 5215 (0.02%), 3815.2MB (14.58%) | 6573 (0.03%), 3583.0MB (13.69%) | 9930 (0.05%), 4640.1MB (17.74%) |

| Feature Combination | Scripts Count (%) | Size (MB) (%) |
| --- | --- | --- |
| Frontend Processing & DOM Element Generation & UX Enhancement | 10291 (0.05%) | 4483.3MB (17.14%) |
| Frontend Processing & DOM Element Generation & Extensional Features | 6022 (0.03%) | 3975.0MB (15.19%) |
| Frontend Processing & UX Enhancement & Extensional Features | 5241 (0.02%) | 3664.5MB (14.01%) |
| DOM Element Generation & UX Enhancement & Extensional Features | 4078 (0.02%) | 3235.9MB (12.37%) |

Photo Crypto Auth
what about photo without signature? don’t care
literature
P3: Toward Privacy-Preserving Photo Sharing, Moo-Ryong Ra, Ramesh Govindan, Antonio Ortega, NSDI 13
Mitigating Image-based Misinformation Campaigns, Calvin Ardi, Harsha V. Madhyastha
C2PA
PKI & blockchain
idea
difficulties implementing C2PA on mobile
C2PA-Related Papers
Introduce C2PA
Enhance or extend C2PA
Applying C2PA
Alternatives or complements to C2PA
Scaling PKI
Fake image detection
Anti-training
Adoption of C2PA-like standards
Peripheral Literature for C2PA
C2PA use case
C2PA implementation
Image metadata
Phone keystore security
C2PA Camera Apps Investigation
| Click | Capture Cam | ProofMode |
| --- | --- | --- |
| 1733499304 | bafybeiaq7b5fyime2l3lb6niajrmksfipe26r2rycxius3twog7vcarjma | IMG_4717 |

Signers and public key
Click issued a certificate through ContentSign for me, presumably by adding a reference on the blockchain.
$ c2patool --certs 1733499304.jpg | openssl x509 -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 7991087401548139538 (0x6ee60ba732604c12)
Signature Algorithm: ecdsa-with-SHA256
Issuer: CN=Click Camera, O=ContentSign by Nodle
Validity
Not Before: Dec 6 15:31:26 2024 GMT
Not After : Dec 6 15:31:26 2025 GMT
Subject: CN=0xF0Ed23507BC598b1eA0FdB7c31517B26538829F9, O=Click Camera
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:ad:da:20:bd:92:b7:95:e9:7c:82:0f:51:b6:79:
6b:35:4c:44:27:d1:04:83:f8:fa:10:84:40:8c:b1:
73:35:75:f8:59:1b:d9:f6:7a:ab:be:7d:90:21:4d:
c0:98:40:fd:5c:3a:ab:bf:48:5b:08:7d:2b:f7:3e:
87:3a:7b:f2:63
ASN1 OID: prime256v1
NIST CURVE: P-256
X509v3 extensions:
X509v3 Basic Constraints:
CA:FALSE
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Client Authentication, E-mail Protection
X509v3 Subject Key Identifier:
13:5D:2D:9D:71:F0:A5:D5:17:41:0E:79:29:C5:2C:35:13:1F:46:DC
X509v3 Authority Key Identifier:
keyid:4C:F9:B3:FD:D5:E6:6B:C3:0D:EB:2C:0E:53:AF:2B:22:30:B5:DA:0F
DirName:/O=Click App/CN=Click App ContentSign CA 1
serial:8C:DF:B9:CC:33:D1:F8:58:97:28:5C:95:14:C5:38:2D:CF:80:63
Signature Algorithm: ecdsa-with-SHA256
Signature Value:
30:46:02:21:00:8a:72:fe:22:a4:67:84:20:b8:90:e7:b9:2a:
36:b2:4a:bf:5d:f8:1a:56:f8:89:30:82:b8:0f:ab:6a:4a:ae:
c2:02:21:00:92:e4:ef:6f:2f:41:a0:54:81:19:f9:89:f1:3c:
ea:02:ef:11:4f:4b:0d:02:a6:68:6e:e3:72:14:63:fe:6e:81
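The same certificate appears under different notations in the outputs on this page: OpenSSL prints the serial number in decimal and hex, the c2patool trace log further down prints it in uppercase hex, and the signing time shows up there as a Unix epoch (signing_time_epoc=Some(1733499285)). A small Python sketch, using only values copied from those outputs, converts between the notations. Comparing the signing time against the validity window is an illustrative check chosen here, not a description of what c2patool does internally.

```python
from datetime import datetime, timezone

# Serial number as printed in decimal by OpenSSL for the Click certificate.
serial_dec = 7991087401548139538
serial_hex = format(serial_dec, "X")  # uppercase hex, as in the trace log

# Signing time as logged by c2patool, in seconds since the Unix epoch.
signing_time = datetime.fromtimestamp(1733499285, tz=timezone.utc)

# Validity window copied from the certificate dump above (GMT).
not_before = datetime(2024, 12, 6, 15, 31, 26, tzinfo=timezone.utc)
not_after = datetime(2025, 12, 6, 15, 31, 26, tzinfo=timezone.utc)
within = not_before <= signing_time <= not_after

print(serial_hex)
print(signing_time.isoformat())
print("signed within validity window:", within)
```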
Capture Cam seems to reuse its own pubkey w/ certificate issued by DigiCert.
$ c2patool --certs bafybeiaq7b5fyime2l3lb6niajrmksfipe26r2rycxius3twog7vcarjma.jpg | openssl x509 -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
0c:59:4a:96:f5:77:2e:e9:55:f3:02:4b:21:98:82:86
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=US, O=DigiCert Inc, OU=www.digicert.com, CN=DigiCert Assured ID Client CA G2
Validity
Not Before: Feb 26 00:00:00 2024 GMT
Not After : Feb 26 23:59:59 2025 GMT
Subject: organizationIdentifier=NTRTW-82885990, C=TW, ST=Taipei City, O=Numbers Co., Ltd., SN=Chen, GN=Bofu, CN=Bofu Chen
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (384 bit)
pub:
04:c2:4e:fc:1a:84:99:0c:4e:43:03:40:87:ed:86:
96:a6:ed:14:22:99:dd:b6:86:e5:98:a1:30:6f:e0:
25:7d:71:08:78:86:ff:7e:68:d3:1d:a3:ac:43:67:
90:3e:dd:de:49:d1:5f:64:0e:92:7e:48:17:d2:ce:
cb:f2:a2:b1:e8:fd:08:81:78:6e:49:2c:b6:45:0b:
e5:a7:16:d4:87:ef:39:fd:e0:c3:1b:de:61:73:7b:
81:58:ec:bc:f5:d6:65
ASN1 OID: secp384r1
NIST CURVE: P-384
X509v3 extensions:
X509v3 Authority Key Identifier:
A5:62:20:50:DC:BB:5B:57:97:AD:23:8F:35:E2:54:6C:A9:7E:F9:4E
X509v3 Subject Key Identifier:
93:E5:50:21:D4:38:61:0C:C0:15:E3:17:BF:19:52:F6:96:FE:7F:D8
X509v3 Subject Alternative Name:
email:hi@numbersprotocol.io
X509v3 Certificate Policies:
Policy: 2.23.140.1.5.3.1
X509v3 Key Usage: critical
Digital Signature, Key Agreement
X509v3 Extended Key Usage:
E-mail Protection
X509v3 CRL Distribution Points:
Full Name:
URI:http://crl3.digicert.com/DigiCertAssuredIDClientCAG2.crl
Full Name:
URI:http://crl4.digicert.com/DigiCertAssuredIDClientCAG2.crl
Authority Information Access:
OCSP - URI:http://ocsp.digicert.com
CA Issuers - URI:http://cacerts.digicert.com/DigiCertAssuredIDClientCAG2.crt
Signature Algorithm: sha256WithRSAEncryption
Signature Value:
84:f6:67:53:cf:04:ee:05:8c:78:26:e1:2a:b4:66:39:e5:86:
60:0e:b8:dc:61:a3:83:86:2f:37:c1:39:26:df:3c:27:69:f0:
73:e2:0b:14:18:9c:16:78:58:75:10:c2:1e:71:2a:e1:41:73:
5c:04:2f:65:ec:a8:d7:d6:d5:7e:42:fa:9b:07:0a:5e:df:06:
14:50:52:8e:73:6f:12:58:ea:e9:10:2d:3c:93:ca:dd:2d:a7:
36:ff:1d:1a:4a:ab:01:98:97:de:37:a8:e4:58:78:d6:ea:77:
be:9f:00:92:1b:da:4d:e6:1e:c5:88:72:5e:9b:61:26:10:3f:
3d:67:26:35:4f:10:ef:e4:29:3d:5a:e2:72:d1:74:17:b3:a4:
21:b7:a9:87:79:1c:c8:cb:a2:7c:f4:2c:43:45:fe:67:f3:06:
a2:66:1d:c0:72:1f:2e:88:90:f2:5d:a8:29:73:9b:04:57:a4:
b3:5d:ba:da:f1:ea:5d:cb:99:28:8b:72:22:5c:93:8f:f3:18:
a5:09:5d:1a:06:2f:47:cb:c9:4d:b7:70:b2:98:6a:92:19:6c:
94:27:24:10:c7:1a:a1:ff:93:e8:f3:73:75:09:24:bc:98:41:
f7:c1:85:d1:76:45:fc:71:1e:bd:0e:3f:fa:32:0e:c6:b8:b5:
9e:fc:b2:cb
ProofMode issued a certificate for me.
$ c2patool --certs IMG_4717.JPG | openssl x509 -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
f7:13:e2:f6:79:17:45:63:f2:e5:ba:bd:75:b3:90:c8:37:9f:9e:cf
Signature Algorithm: ecdsa-with-SHA512
Issuer: CN=ProofMode-Root Offline Root CA, O=ProofMode-Root
Validity
Not Before: Dec 11 16:39:28 2024 GMT
Not After : Dec 11 16:39:28 2025 GMT
Subject: CN=ProofMode-User Content Credentials, O=ProofMode-User
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:ae:7a:c2:b2:1a:81:58:d6:d6:fd:30:32:56:d9:
5b:f1:69:ce:a9:48:c1:98:bf:99:78:97:00:ac:25:
3d:26:d4:fe:38:17:35:8a:98:38:f1:c2:f9:20:dc:
d3:b7:72:ca:a6:d4:dd:cd:b9:ce:46:c6:87:be:c1:
d1:a6:62:ef:4b
ASN1 OID: prime256v1
NIST CURVE: P-256
X509v3 extensions:
X509v3 Basic Constraints:
CA:FALSE
X509v3 Key Usage:
Digital Signature
X509v3 Subject Key Identifier:
8C:96:6B:28:0A:89:16:C1:2F:6C:32:98:12:23:99:3D:0C:8B:4E:42
X509v3 Authority Key Identifier:
8C:96:6B:28:0A:89:16:C1:2F:6C:32:98:12:23:99:3D:0C:8B:4E:42
X509v3 Extended Key Usage:
E-mail Protection
Signature Algorithm: ecdsa-with-SHA512
Signature Value:
30:45:02:21:00:99:cf:eb:9a:f1:76:c6:65:5c:44:b9:ae:e2:
a1:a0:69:d3:9d:57:d8:e8:7c:ef:e6:97:8e:be:dd:9d:5e:95:
57:02:20:54:10:b2:e7:9d:a1:f0:a4:43:a7:f1:5a:43:bd:9e:
3e:af:f2:85:cc:45:88:93:35:4d:28:80:3e:8a:b6:63:c0
Verify
- "The Content Credential issuer couldn’t be recognized. This file may not come from where it claims to." for ProofMode photo
- c2patool by default is happy w/ all photos
- c2patool is not happy w/ Click & ProofMode photos after supplying the Verify known certificate trust list, but is happy w/ Capture Cam photo
Command output w/ my fork w/ select extra logging:
$ export C2PATOOL_TRUST_ANCHORS='https://contentcredentials.org/trust/anchors.pem'
$ export C2PATOOL_ALLOWED_LIST='https://contentcredentials.org/trust/allowed.sha256.txt'
$ export C2PATOOL_TRUST_CONFIG='https://contentcredentials.org/trust/store.cfg'
$ RUST_LOG=trace cargo r -- 1733499304.jpg trust
[2024-12-10T23:33:09Z DEBUG c2patool] Using trust anchors from Url(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("contentcredentials.org")), port: None, path: "/trust/anchors.pem", query: None, fragment: None })
[2024-12-10T23:33:09Z DEBUG c2patool] Using allowed list from Url(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("contentcredentials.org")), port: None, path: "/trust/allowed.sha256.txt", query: None, fragment: None })
[2024-12-10T23:33:09Z DEBUG c2patool] Using trust config from Url(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("contentcredentials.org")), port: None, path: "/trust/store.cfg", query: None, fragment: None })
[2024-12-10T23:33:09Z TRACE c2pa::cose_validator] verify_cose: cert_check=true
[2024-12-10T23:33:09Z TRACE c2pa::cose_validator] check_cert: Extended key usage=ExtendedKeyUsage { any: false, server_auth: false, client_auth: true, code_signing: false, email_protection: true, time_stamping: false, ocsp_signing: false, other: [] }
[2024-12-10T23:33:09Z TRACE c2pa::cose_validator] check_trust: signing_time_epoc=Some(1733499285)
[2024-12-10T23:33:09Z TRACE c2pa::openssl::openssl_trust_handler] verify_trust: allowed_list={"igPZ4cQ+ElHb3Jagyp4o4LMVwkBicDWpm+oVuT9ctx0=", "QyrjnZlBhV+F2xAeNsb30KgVqD4QAQ+00rUtIfdXIzo=", "Z2O9JqjrmFVXI61XVcNlsodMzG1466HYQVyf+BkfWD8=", "5Obdx/KBgUxCpnrN1FVKm8TxKCGVUeJaFN7jh09SnOA=", "xXs/IWgBBAatf7AEbsXZNwgiV2zViVZFqbtqFgc0uXA=", "9heOFnvHjLz/iSNA37kqvnA40LDfSnr0UnyqZUECx7w=", "ad8hM2xz531vByVp8Hovpqv78qY0zV9zrLnRTYYdxB0=", "K9mtOEK4IaaomtzodP9jNUKxeWXb/VmZnYg7wCGG7r0=", "9qobPTVKRcOkldyUHSqN47xQN/V8tWrnW15DlUEco14=", "MFcAt2slSc/tA2sbjMNYRu26khsUmjqZ4xaQ+pfw4eI=", "SwJmKrvd0a561mMCdmrEhSGIj9tda8HjaTsTy8CCjRo=", "U2h3hfYy4GRIk7KCpu819C3nLko6A/NRn5xND2jYIng=", "vLyqdFQcIV1wOdjZvMD9rAQnvcrtOgjRRjxOF7HcyIw=", "fPAhO8QbeH8RUyk2673iIIoNFor6tZi8Fsshu07ieHI=", "AVVAhHuItD0yCby1WTAOFQOGPwiXwsgvTVbc/vlpSK0=", "rAnk74zT9+yxSkgHRFbaMC7mi/FprjBvnaeFGAroSac=", "o6uXNf1TKvSiNUL15CTvrGpfKEjCxpYrawNpYrfpkxc=", "J3TPotkZdPCd2iOdX4wmkudA821oz621eT/CAT2Zasc=", "UkQdwe7X8cO+Pj1Sb+VXxqkozHwXgC5YEqeiM42eA9w=", "czGRID6tAPpW8H+BtsxRZ38Y1cSRqebzg+MCjpBT1f4=", "OVkKb9nszYa6YpVdVfOEv481CrHaeaRnTbXM7zQC5KA=", "mfYmsZ1px3Bo+ZUvcttATyzVa9Pj8nNtHzsj99JgLwE=", "jko53/VSR3DMMb1nWmNx2+C+eh4U6CMV3BCw6tOgEVE=", "Jcv4yjJRVoH3zT76I7PzNY4cGDsT+jKNfRwu+mDLrpA=", "r8Ozpwn+U+j6fjWGiikhChvVXhGFZuMjyxgJq2S53EM=", "h72o9aSy/NjDN4dD8xdtlX28dm4c0ERupBgVuKCCltw=", "qskM23c5D9ZOelxRn8ZHfBPJVMPcpDmp53TAXUTGlVY=", "En07oVKGfm1psGJNDXqzHouLz3sjQ22uehYhHfLztQc=", "RYiv7AH1j4iSgkRM0m9CyFnvXPDZQbUTQW5C2RWlX7k=", "CaJnmoY4M0Rc4VO/v/v5diZCx/JlohwsDLm9RHOG6JI=", "28qM81MvRsS8Il4OhoYfzSQ1dnSIgepF9/553j2+MTc=", "eS7xBJoBUu4SJmCjg3gY4h1mlzwNGECqcBkaAMmlAGc=", "6wtSrm8Zm4xkIqkI9GB1lMhW90dzJVZPhOdSUL4lRTk=", "XCYKS7pr8jrDLX7NeUXrldi1pAsDm6aqovGCO4iPY0Q=", "opQNg+Qgg7MwDSY7PEcBCMpP5V9qJkF3BZp97MENFcQ=", "lVK0M1Sn1rq0KUvDqIo8/Py5MWBpb6t/T0SWgUWyWHE=", "xay/SRpiM24PoQ0V12PB8NmSdFt0X78ummtUmiUca7E=", "oAhDShFP/R0lcjRXxIaZLfLd9FrSLCmBe76XCbfssjE=", "9xnJg+oMcadqQCUtdEhBAt22rBInCbqyaaK3V3jaJmc=", 
"ONSIhEwHVB2K7a9RcPnBdcw2l+h4QHQLCA1OlSd7zFQ=", "QSa4RsH9d5KTNMx0WlXh+VHj5NAhcj6tHtluLmNySdM=", "I7jqB0noPFAMx3l69LFaThD+G2/WVivV8N4Z6EW/NFM="}
[2024-12-10T23:33:09Z TRACE c2pa::openssl::openssl_trust_handler] verify_trust: chain=[X509 { serial_number: "6EE60BA732604C12", signature_algorithm: ecdsa-with-SHA256, issuer: [commonName = "Click Camera", organizationName = "ContentSign by Nodle"], subject: [commonName = "0xF0Ed23507BC598b1eA0FdB7c31517B26538829F9", organizationName = "Click Camera"], not_before: Dec 6 15:31:26 2024 GMT, not_after: Dec 6 15:31:26 2025 GMT, public_key: PKey { algorithm: "EC" } }, X509 { serial_number: "8CDFB9CC33D1F85897285C9514C5382DCF8063", signature_algorithm: ecdsa-with-SHA256, issuer: [organizationName = "Click App", commonName = "Click App ContentSign CA 1"], subject: [countryName = "US", stateOrProvinceName = "California", organizationName = "ContentSign by Nodle"], not_before: Nov 7 19:26:02 2024 GMT, not_after: Nov 6 19:26:01 2029 GMT, public_key: PKey { algorithm: "EC" } }, X509 { serial_number: "A1B561309558D8ACD67233A634D6C610E18C4A", signature_algorithm: ecdsa-with-SHA384, issuer: [organizationName = "ContentSign by Nodle", commonName = "ContentSign Root CA"], subject: [organizationName = "Click App", commonName = "Click App ContentSign CA 1"], not_before: Mar 8 23:59:51 2024 GMT, not_after: Nov 17 23:23:31 2033 GMT, public_key: PKey { algorithm: "EC" } }, X509 { serial_number: "1241A148F08412DA61BDB3A67C94491AFFDFDD", signature_algorithm: ecdsa-with-SHA384, issuer: [organizationName = "ContentSign by Nodle", commonName = "ContentSign Root CA"], subject: [organizationName = "ContentSign by Nodle", commonName = "ContentSign Root CA"], not_before: Nov 20 23:23:32 2023 GMT, not_after: Nov 17 23:23:31 2033 GMT, public_key: PKey { algorithm: "EC" } }]
{
"active_manifest": "intergalactic network inc:urn:uuid:cc50e95f-5031-44b6-917b-7c51172e67e3",
"manifests": {
"intergalactic network inc:urn:uuid:cc50e95f-5031-44b6-917b-7c51172e67e3": {
"claim_generator": "Click App c2pa-rs/0.28.5",
"claim_generator_info": [
{
"name": "c2pa-mini-rs",
"version": "0.1.0"
}
],
"title": "Authentic Content",
"format": "image/jpeg",
"instance_id": "xmp:iid:62f2721e-af32-4b92-95e5-366032f779b3",
"ingredients": [],
"assertions": [
{
"label": "stds.schema-org.CreativeWork",
"data": {
"@context": "http://schema.org/",
"@type": "CreativeWork",
"author": [
{
"@type": "Person",
"name": "0xF0Ed23507BC598b1eA0FdB7c31517B26538829F9"
},
{
"@id": "https://clickapp.com/zk/account/0xF0Ed23507BC598b1eA0FdB7c31517B26538829F9",
"@type": "Person",
"name": "Click"
},
{
"@id": "https://era.zksync.network/address/0xF0Ed23507BC598b1eA0FdB7c31517B26538829F9",
"@type": "Person",
"name": "Blockchain Record"
}
]
},
"kind": "Json"
},
{
"label": "com.nodle.chain",
"data": {
"address": "0xF0Ed23507BC598b1eA0FdB7c31517B26538829F9",
"block_header": "0x7eff56d6f32e1dcb2e62702705e5f632fe37a2515d2265dcc80549b129796ac0"
}
},
{
"label": "stds.exif",
"data": {
"@context": {
"dc": "http://purl.org/dc/elements/1.1/",
"exif": "http://ns.adobe.com/exif/1.0/",
"exifEX": "http://cipa.jp/exif/2.32/",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"tiff": "http://ns.adobe.com/tiff/1.0/",
"xmp": "http://ns.adobe.com/xap/1.0/"
},
"exif:GPSLatitude": "33 deg 59' 21.47 N",
"exif:ApertureValue": "1.6",
"exif:GPSLongitude": "118 deg 21' 16.58 W",
"tiff:Make": "Apple",
"exif:GPSAltitude": "0.00",
"exif:GPSRadius": "approximate",
"exif:FileType": "JPEG",
"exif:DateTimeOriginal": "2024-12-06T07:34:45.0-08:00",
"exif:FocalLength": "5.1",
"exifEX:LensMake": "Apple",
"exif:FileSize": "1.31 MB",
"exif:ImageSize": "4032 x 3024",
"exif:ExposureTime": "1/60",
"exif:Megapixels": 12.2,
"exif:ExposureBiasValue": "0",
"exifEX:LensModel": "iPhone 13 mini back camera 5.1mm f/1.6",
"tiff:Model": "iPhone 13 mini",
"exif:ExposureProgram": 2.0
},
"kind": "Json"
}
],
"signature_info": {
"alg": "Es256",
"issuer": "Click Camera",
"cert_serial_number": "7991087401548139538",
"time": "2024-12-06T15:34:45+00:00"
},
"label": "intergalactic network inc:urn:uuid:cc50e95f-5031-44b6-917b-7c51172e67e3"
}
},
"validation_status": [
{
"code": "signingCredential.untrusted",
"url": "Cose_Sign1",
"explanation": "signing certificate untrusted"
},
{
"code": "general.error",
"url": "self#jumbf=/c2pa/intergalactic network inc:urn:uuid:cc50e95f-5031-44b6-917b-7c51172e67e3/c2pa.signature",
"explanation": "claim signature is not valid"
}
]
}
$ RUST_LOG=trace cargo r -- bafybeiaq7b5fyime2l3lb6niajrmksfipe26r2rycxius3twog7vcarjma.jpg trust
[2024-12-10T23:33:26Z DEBUG c2patool] Using trust anchors from Url(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("contentcredentials.org")), port: None, path: "/trust/anchors.pem", query: None, fragment: None })
[2024-12-10T23:33:26Z DEBUG c2patool] Using allowed list from Url(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("contentcredentials.org")), port: None, path: "/trust/allowed.sha256.txt", query: None, fragment: None })
[2024-12-10T23:33:26Z DEBUG c2patool] Using trust config from Url(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("contentcredentials.org")), port: None, path: "/trust/store.cfg", query: None, fragment: None })
[2024-12-10T23:33:26Z TRACE c2pa::cose_validator] verify_cose: cert_check=true
[2024-12-10T23:33:26Z TRACE c2pa::cose_validator] check_cert: Extended key usage=ExtendedKeyUsage { any: false, server_auth: false, client_auth: false, code_signing: false, email_protection: true, time_stamping: false, ocsp_signing: false, other: [] }
[2024-12-10T23:33:26Z TRACE c2pa::cose_validator] check_trust: signing_time_epoc=Some(1733680070)
[2024-12-10T23:33:26Z TRACE c2pa::openssl::openssl_trust_handler] verify_trust: allowed_list={"RYiv7AH1j4iSgkRM0m9CyFnvXPDZQbUTQW5C2RWlX7k=", "h72o9aSy/NjDN4dD8xdtlX28dm4c0ERupBgVuKCCltw=", "AVVAhHuItD0yCby1WTAOFQOGPwiXwsgvTVbc/vlpSK0=", "En07oVKGfm1psGJNDXqzHouLz3sjQ22uehYhHfLztQc=", "K9mtOEK4IaaomtzodP9jNUKxeWXb/VmZnYg7wCGG7r0=", "igPZ4cQ+ElHb3Jagyp4o4LMVwkBicDWpm+oVuT9ctx0=", "jko53/VSR3DMMb1nWmNx2+C+eh4U6CMV3BCw6tOgEVE=", "I7jqB0noPFAMx3l69LFaThD+G2/WVivV8N4Z6EW/NFM=", "MFcAt2slSc/tA2sbjMNYRu26khsUmjqZ4xaQ+pfw4eI=", "r8Ozpwn+U+j6fjWGiikhChvVXhGFZuMjyxgJq2S53EM=", "o6uXNf1TKvSiNUL15CTvrGpfKEjCxpYrawNpYrfpkxc=", "9qobPTVKRcOkldyUHSqN47xQN/V8tWrnW15DlUEco14=", "5Obdx/KBgUxCpnrN1FVKm8TxKCGVUeJaFN7jh09SnOA=", "ONSIhEwHVB2K7a9RcPnBdcw2l+h4QHQLCA1OlSd7zFQ=", "6wtSrm8Zm4xkIqkI9GB1lMhW90dzJVZPhOdSUL4lRTk=", "OVkKb9nszYa6YpVdVfOEv481CrHaeaRnTbXM7zQC5KA=", "QyrjnZlBhV+F2xAeNsb30KgVqD4QAQ+00rUtIfdXIzo=", "xay/SRpiM24PoQ0V12PB8NmSdFt0X78ummtUmiUca7E=", "U2h3hfYy4GRIk7KCpu819C3nLko6A/NRn5xND2jYIng=", "QSa4RsH9d5KTNMx0WlXh+VHj5NAhcj6tHtluLmNySdM=", "9xnJg+oMcadqQCUtdEhBAt22rBInCbqyaaK3V3jaJmc=", "eS7xBJoBUu4SJmCjg3gY4h1mlzwNGECqcBkaAMmlAGc=", "Z2O9JqjrmFVXI61XVcNlsodMzG1466HYQVyf+BkfWD8=", "xXs/IWgBBAatf7AEbsXZNwgiV2zViVZFqbtqFgc0uXA=", "9heOFnvHjLz/iSNA37kqvnA40LDfSnr0UnyqZUECx7w=", "Jcv4yjJRVoH3zT76I7PzNY4cGDsT+jKNfRwu+mDLrpA=", "rAnk74zT9+yxSkgHRFbaMC7mi/FprjBvnaeFGAroSac=", "CaJnmoY4M0Rc4VO/v/v5diZCx/JlohwsDLm9RHOG6JI=", "28qM81MvRsS8Il4OhoYfzSQ1dnSIgepF9/553j2+MTc=", "lVK0M1Sn1rq0KUvDqIo8/Py5MWBpb6t/T0SWgUWyWHE=", "fPAhO8QbeH8RUyk2673iIIoNFor6tZi8Fsshu07ieHI=", "vLyqdFQcIV1wOdjZvMD9rAQnvcrtOgjRRjxOF7HcyIw=", "UkQdwe7X8cO+Pj1Sb+VXxqkozHwXgC5YEqeiM42eA9w=", "XCYKS7pr8jrDLX7NeUXrldi1pAsDm6aqovGCO4iPY0Q=", "opQNg+Qgg7MwDSY7PEcBCMpP5V9qJkF3BZp97MENFcQ=", "J3TPotkZdPCd2iOdX4wmkudA821oz621eT/CAT2Zasc=", "ad8hM2xz531vByVp8Hovpqv78qY0zV9zrLnRTYYdxB0=", "SwJmKrvd0a561mMCdmrEhSGIj9tda8HjaTsTy8CCjRo=", "czGRID6tAPpW8H+BtsxRZ38Y1cSRqebzg+MCjpBT1f4=", 
"oAhDShFP/R0lcjRXxIaZLfLd9FrSLCmBe76XCbfssjE=", "mfYmsZ1px3Bo+ZUvcttATyzVa9Pj8nNtHzsj99JgLwE=", "qskM23c5D9ZOelxRn8ZHfBPJVMPcpDmp53TAXUTGlVY="}
[2024-12-10T23:33:26Z TRACE c2pa::openssl::openssl_trust_handler] verify_trust: allowed_list contains cert RYiv7AH1j4iSgkRM0m9CyFnvXPDZQbUTQW5C2RWlX7k=
{
"active_manifest": "numbersprotocol:urn:uuid:92583583-b457-4c09-bb89-756ee7951cc8",
"manifests": {
"numbersprotocol:urn:uuid:92583583-b457-4c09-bb89-756ee7951cc8": {
"claim_generator": "Numbers_Protocol c2patool/0.9.9 c2pa-rs/0.35.1",
"title": "bafybeicbxmbnzynhvoxr3x5mfly4ptc3rpeqbm7qokm4s3rbjpq6cmijsm",
"format": "image/jpeg",
"instance_id": "xmp:iid:bca14d74-857d-45f0-9c64-a01996492e43",
"thumbnail": {
"format": "image/jpeg",
"identifier": "self#jumbf=/c2pa/numbersprotocol:urn:uuid:92583583-b457-4c09-bb89-756ee7951cc8/c2pa.assertions/c2pa.thumbnail.claim.jpeg"
},
"ingredients": [],
"assertions": [
{
"label": "stds.schema-org.CreativeWork",
"data": {
"@context": "https://schema.org",
"@type": "CreativeWork",
"author": [
{
"@type": "Person",
"name": "ra25"
}
],
"identifier": "bafybeicbxmbnzynhvoxr3x5mfly4ptc3rpeqbm7qokm4s3rbjpq6cmijsm",
"url": "https://verify.numbersprotocol.io/asset-profile/bafybeicbxmbnzynhvoxr3x5mfly4ptc3rpeqbm7qokm4s3rbjpq6cmijsm",
"locationCreated": "34.021771068073, -118.290588268337",
"dateCreated": "2024-12-08T17:46:21Z"
},
"kind": "Json"
},
{
"label": "c2pa.actions",
"data": {
"actions": [
{
"action": "c2pa.opened",
"digitalSourceType": "http://cv.iptc.org/newscodes/digitalsourcetype/digitalCapture"
}
]
}
},
{
"label": "numbers.assetTree",
"data": {
"assetTreeCid": "bafkreifitnkbzucemuguagkc4zxg3ywvenaqbheqo4joepwxonc35ckphe",
"assetTreeSha256": "a89b541cd044650d401942e66e6de2d52341009c907712e23ed77345be894f39",
"assetTreeSignature": "0xb95262c7947c87b5de6b26d4233eb66ca325634dec7df4729e758172231ad21b303c64c2d9073fbc1b4960a23dda9f7600ecab7b66d91548a99622e3c631c6331b",
"committer": "0x51130dB91B91377A24d6Ebeb2a5fC02748b53ce1"
}
},
{
"label": "numbers.integrity.json",
"data": {
"nid": "bafybeicbxmbnzynhvoxr3x5mfly4ptc3rpeqbm7qokm4s3rbjpq6cmijsm",
"publicKey": "ra25",
"mediaHash": "f05977bbc4454dc596d9a1bb6b83261992fa545e26dde1716ccbcae39c85c007",
"captureTimestamp": 1733679964
}
},
{
"label": "stds.exif",
"data": {
"@context": {
"EXIF": "http://ns.adobe.com/EXIF/1.0/",
"EXIFEX": "http://cipa.jp/EXIF/2.32/",
"dc": "http://purl.org/dc/elements/1.1/",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"tiff": "http://ns.adobe.com/tiff/1.0/",
"xmp": "http://ns.adobe.com/xap/1.0/"
},
"EXIF:DateTimeOriginal": "2024-12-08T17:46:04Z",
"EXIF:GPSTimeStamp": "2024-12-08T17:46:04Z",
"EXIF:GPSLongitude": "-118.290588268337",
"EXIF:GPSLatitude": "34.021771068073"
},
"kind": "Json"
}
],
"signature_info": {
"alg": "Es384",
"issuer": "Numbers Co., Ltd.",
"cert_serial_number": "16414363228331548608953470346493198982",
"time": "2024-12-08T17:47:50+00:00"
},
"label": "numbersprotocol:urn:uuid:92583583-b457-4c09-bb89-756ee7951cc8"
}
}
}
$ RUST_LOG=trace cargo r -- IMG_4717.JPG trust
[2024-12-11T18:10:15Z DEBUG c2patool] Using trust anchors from Url(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("contentcredentials.org")), port: None, path: "/trust/anchors.pem", query: None, fragment: None })
[2024-12-11T18:10:16Z DEBUG c2patool] Using allowed list from Url(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("contentcredentials.org")), port: None, path: "/trust/allowed.sha256.txt", query: None, fragment: None })
[2024-12-11T18:10:16Z TRACE c2pa::cose_validator] verify_cose: cert_check=true
[2024-12-11T18:10:16Z TRACE c2pa::cose_validator] check_cert: Extended key usage=ExtendedKeyUsage { any: false, server_auth: false, client_auth: false, code_signing: false, email_protection: true, time_stamping: false, ocsp_signing: false, other: [] }
[2024-12-11T18:10:16Z TRACE c2pa::cose_validator] check_trust: signing_time_epoc=Some(1733935458)
[2024-12-11T18:10:16Z TRACE c2pa::openssl::openssl_trust_handler] verify_trust: allowed_list={"eS7xBJoBUu4SJmCjg3gY4h1mlzwNGECqcBkaAMmlAGc=", "RYiv7AH1j4iSgkRM0m9CyFnvXPDZQbUTQW5C2RWlX7k=", "czGRID6tAPpW8H+BtsxRZ38Y1cSRqebzg+MCjpBT1f4=", "MFcAt2slSc/tA2sbjMNYRu26khsUmjqZ4xaQ+pfw4eI=", "igPZ4cQ+ElHb3Jagyp4o4LMVwkBicDWpm+oVuT9ctx0=", "U2h3hfYy4GRIk7KCpu819C3nLko6A/NRn5xND2jYIng=", "5Obdx/KBgUxCpnrN1FVKm8TxKCGVUeJaFN7jh09SnOA=", "SwJmKrvd0a561mMCdmrEhSGIj9tda8HjaTsTy8CCjRo=", "J3TPotkZdPCd2iOdX4wmkudA821oz621eT/CAT2Zasc=", "qskM23c5D9ZOelxRn8ZHfBPJVMPcpDmp53TAXUTGlVY=", "Z2O9JqjrmFVXI61XVcNlsodMzG1466HYQVyf+BkfWD8=", "28qM81MvRsS8Il4OhoYfzSQ1dnSIgepF9/553j2+MTc=", "oAhDShFP/R0lcjRXxIaZLfLd9FrSLCmBe76XCbfssjE=", "lVK0M1Sn1rq0KUvDqIo8/Py5MWBpb6t/T0SWgUWyWHE=", "ad8hM2xz531vByVp8Hovpqv78qY0zV9zrLnRTYYdxB0=", "r8Ozpwn+U+j6fjWGiikhChvVXhGFZuMjyxgJq2S53EM=", "h72o9aSy/NjDN4dD8xdtlX28dm4c0ERupBgVuKCCltw=", "vLyqdFQcIV1wOdjZvMD9rAQnvcrtOgjRRjxOF7HcyIw=", "XCYKS7pr8jrDLX7NeUXrldi1pAsDm6aqovGCO4iPY0Q=", "fPAhO8QbeH8RUyk2673iIIoNFor6tZi8Fsshu07ieHI=", "xXs/IWgBBAatf7AEbsXZNwgiV2zViVZFqbtqFgc0uXA=", "9qobPTVKRcOkldyUHSqN47xQN/V8tWrnW15DlUEco14=", "En07oVKGfm1psGJNDXqzHouLz3sjQ22uehYhHfLztQc=", "QSa4RsH9d5KTNMx0WlXh+VHj5NAhcj6tHtluLmNySdM=", "I7jqB0noPFAMx3l69LFaThD+G2/WVivV8N4Z6EW/NFM=", "UkQdwe7X8cO+Pj1Sb+VXxqkozHwXgC5YEqeiM42eA9w=", "OVkKb9nszYa6YpVdVfOEv481CrHaeaRnTbXM7zQC5KA=", "CaJnmoY4M0Rc4VO/v/v5diZCx/JlohwsDLm9RHOG6JI=", "K9mtOEK4IaaomtzodP9jNUKxeWXb/VmZnYg7wCGG7r0=", "Jcv4yjJRVoH3zT76I7PzNY4cGDsT+jKNfRwu+mDLrpA=", "xay/SRpiM24PoQ0V12PB8NmSdFt0X78ummtUmiUca7E=", "AVVAhHuItD0yCby1WTAOFQOGPwiXwsgvTVbc/vlpSK0=", "rAnk74zT9+yxSkgHRFbaMC7mi/FprjBvnaeFGAroSac=", "6wtSrm8Zm4xkIqkI9GB1lMhW90dzJVZPhOdSUL4lRTk=", "jko53/VSR3DMMb1nWmNx2+C+eh4U6CMV3BCw6tOgEVE=", "9xnJg+oMcadqQCUtdEhBAt22rBInCbqyaaK3V3jaJmc=", "o6uXNf1TKvSiNUL15CTvrGpfKEjCxpYrawNpYrfpkxc=", "ONSIhEwHVB2K7a9RcPnBdcw2l+h4QHQLCA1OlSd7zFQ=", "mfYmsZ1px3Bo+ZUvcttATyzVa9Pj8nNtHzsj99JgLwE=", 
"9heOFnvHjLz/iSNA37kqvnA40LDfSnr0UnyqZUECx7w=", "opQNg+Qgg7MwDSY7PEcBCMpP5V9qJkF3BZp97MENFcQ=", "QyrjnZlBhV+F2xAeNsb30KgVqD4QAQ+00rUtIfdXIzo="}
[2024-12-11T18:10:16Z TRACE c2pa::openssl::openssl_trust_handler] verify_trust: chain=[]
{
"active_manifest": "urn:uuid:c24dd69c-8c9b-4a0e-8469-b93ceb93543f",
"manifests": {
"urn:uuid:c24dd69c-8c9b-4a0e-8469-b93ceb93543f": {
"claim_generator": "ProofMode/43 c2pa-rs/0.28.4",
"title": "1733935453751_c2pa.jpg",
"format": "image/jpeg",
"instance_id": "xmp:iid:00fd7d9d-77d5-4180-88eb-2077b985d477",
"thumbnail": {
"format": "image/jpeg",
"identifier": "self#jumbf=/c2pa/urn:uuid:c24dd69c-8c9b-4a0e-8469-b93ceb93543f/c2pa.assertions/c2pa.thumbnail.claim.jpeg"
},
"ingredients": [
{
"title": "1733935453751.jpg",
"format": "image/jpeg",
"instance_id": "xmp:iid:c5ed606e-8b9f-4bd0-8eff-46a201f5d68f",
"thumbnail": {
"format": "image/jpeg",
"identifier": "self#jumbf=c2pa.assertions/c2pa.thumbnail.ingredient.jpeg"
},
"relationship": "parentOf"
}
],
"assertions": [
{
"label": "c2pa.actions",
"data": {
"actions": [
{
"action": "c2pa.created"
}
]
}
},
{
"label": "c2pa.ai_training",
"data": {
"constraintInfo": null,
"use": "notAllowed"
}
},
{
"label": "c2pa.ai_generative_training",
"data": {
"constraintInfo": null,
"use": "notAllowed"
}
},
{
"label": "c2pa.data_mining",
"data": {
"constraintInfo": null,
"use": "notAllowed"
}
},
{
"label": "c2pa.inference",
"data": {
"constraintInfo": null,
"use": "notAllowed"
}
},
{
"label": "stds.schema-org.CreativeWork",
"data": {
"@context": "https://schema.org",
"@type": "CreativeWork",
"author": [
{
"@context": "https://schema.org",
"@id": "https://keys.openpgp.org",
"@type": "Person",
"identifier": "ae1a4c380d43bfb2",
"name": "ae1a4c380d43bfb2"
}
]
},
"kind": "Json"
}
],
"signature_info": {
"alg": "Es256",
"issuer": "ProofMode-User",
"cert_serial_number": "1410564205799300702414624826233507026152462458575",
"time": "2024-12-11T16:44:18+00:00"
},
"label": "urn:uuid:c24dd69c-8c9b-4a0e-8469-b93ceb93543f"
}
},
"validation_status": [
{
"code": "signingCredential.untrusted",
"url": "Cose_Sign1",
"explanation": "signing certificate untrusted"
},
{
"code": "general.error",
"url": "self#jumbf=/c2pa/urn:uuid:c24dd69c-8c9b-4a0e-8469-b93ceb93543f/c2pa.signature",
"explanation": "claim signature is not valid"
}
]
}
GenAI: Problems Introduced and Countermeasures
Deepfake
Disinformation
Cybercrime
Generation not for faking
General countermeasures
References
Removal Services
Web Atoms
Web Crawling
Web User-Facing
Search Engine Optimization (SEO)
best <category>, where <category> is in GS1 Global Product Classification/Google Product Taxonomy
site: for owned domain
Server
Google Chrome Remote Desktop
sudo access)
NFS
systemctl enable nfs-server
/etc/exports
directory_to_export ip_to_allow(rw,insecure)
sudo exportfs -arv
sudo mount -t nfs -v source target
/etc/fstab with sudo vifs
source /System/Volumes/Data/../Data/Volumes/target_name nfs rw,nolockd,resvport,hard,bg,intr,tcp,nfc,rsize=65536,wsize=65536
source /../Volumes/target_name nfs rw,nolockd,resvport,hard,bg,intr,tcp,nfc,rsize=65536,wsize=65536
SSH
kitty +kitten ssh …
ssh -L 8080:127.0.0.1:8080 username@host
rsync
-P: give a progress bar
rsync -P source destination
rsync -P username@host:dir local_dir
rsync -P local_dir username@host:dir
Soft Skill
Art and Business
UCCA talk
Management
basis & direction
meeting
meeting procedure
meeting follow-up
Paper Presentation
Storage
GitHub
.gitattributes (reference):
<directory to exclude from stats>/** linguist-vendored
<directory to exclude from stats and hide from diffs>/** linguist-generated
<directory to forcibly include>/** linguist-detectable
.gitattributes:
<directory to be included in Git LFS> filter=lfs diff=lfs merge=lfs -text
University of Southern California Living
Banking
plan | regst. | mon. | US wire in | US wire out | intl. in | intl. out | ATM w/draw | Zelle
USCCU student | $9 | $0 | $5 | $20 | -2/yr, $10 | $40 | Citibank branch, -2non-shared/mon | ✅
BoA student | $0 | $5- | $15 | $30 | $15 | $0 if not USD, $45 | BoA, $2.5 (5) if non-BoA (intl.) | ✅
Chase college | $0 | $12- | $15 | $25 | $15 | $5 if not USD, $40 | Chase, $3 (5) if non-Chase (intl.) | ✅
Housing
JavaScript highlight keyword on page
// Wrap every match of `regex` in text nodes with a <span> carrying `classes`.
function highlight(regex, classes) {
  let regExp = new RegExp(regex, "gi");
  let openTag = `<span class="${classes.join(' ')}">`;
  function processNode(node) {
    if (node.nodeType === Node.TEXT_NODE) {
      let parent = node.parentNode;
      // Skip text inside <script> and <style>.
      if (parent && parent.nodeName !== "SCRIPT" && parent.nodeName !== "STYLE") {
        let html = node.nodeValue;
        if (regExp.test(html)) {
          let newHtml = html.replace(regExp, function(match) {
            return `${openTag}${match}</span>`;
          });
          let span = document.createElement("span");
          span.innerHTML = newHtml;
          parent.replaceChild(span, node);
        }
      }
    } else {
      // Copy the child list first because processing replaces nodes.
      let childNodes = Array.from(node.childNodes);
      childNodes.forEach(child => processNode(child));
    }
  }
  processNode(document.body);
}
highlight(String.raw`\b(?:\$[\d,]+|(?:\d,?)?\d{3})\b`, ["highlight"]);
highlight(String.raw`\b(?:female|girls?|wom[ea]n)`, ["highlight-neg"]);
let style = document.createElement("style");
style.innerHTML = `
  .highlight {
    background-color: yellow;
  }
  .highlight-neg {
    color: red;
  }
`;
document.head.appendChild(style);
Health insurance
PEMS
function clickAll() {
  const down = document.getElementsByClassName("lesson-nav-link__link")[1];
  if (down) {
    console.log(down);
    down.click();
  }
  const cont = document.getElementsByClassName("continue-btn")[0];
  if (cont) {
    console.log(cont);
    cont.click();
  }
  for (const expand of document.getElementsByClassName("blocks-accordion__header")) {
    if (expand.ariaExpanded === "false") {
      expand.click();
    }
  }
  for (const select of document.getElementsByClassName("blocks-tabs__header-item--after-active")) {
    if (select.ariaSelected === "false") {
      select.click();
    }
  }
  // Poll again shortly so newly loaded elements also get clicked.
  setTimeout(clickAll, 200);
}
clickAll();
function clickAll() {
  const next = document.getElementById("next");
  if (next) {
    console.log(next);
    next.click();
  }
  const cont = document.getElementsByClassName("ng-binding")[0];
  if (cont) {
    console.log(cont);
    cont.click();
  }
  for (const expand of document.getElementsByClassName("accordian-btn")) {
    if (expand.ariaExpanded === "false") {
      expand.click();
    }
  }
  // Poll again shortly so newly loaded elements also get clicked.
  setTimeout(clickAll, 200);
}
clickAll();
Text Editor
Vim
movement
arrow equivalent
↑
← h j k l →
↓
by word
to start of word
<–b– ○ –w–>
word word word
to end of word
ㅤㅤ<–ge–○ –e–>
word word word
by WORD
search
search character w below
move to result
○–fw–>
word word
move to left of result
○–tw–>
word sword
backward
search expression
/… + enter: search forward
?… + enter: search backward
repeat
n: next match
N: last match
expression
end of line
the end character
<–0– ○ –$–>
ㅤㅤlineㅤㅤ
the end non-whitespace character
ㅤㅤ<–^–– ○ ––g_–>
ㅤㅤnot whitespaceㅤㅤ
by chunk
by paragraph
paragraph 1
stuff
<–––– { ––––-+
paragraph 2 ○
more stuff |
<–––– } ––––-+
paragraph 3
by page
half page
^u
^d
whole page
^b
^f
to some position in this page
H
M
L
zt
zz
zb
to past position
``
''
letter
: `
+ letter
`<
`>
`[
`]
g;
g,
change
s
S
ciw
> (<)
=
gU (gu)
~
count
number + keys is the same as pressing keys for number times
go to line
number + G
number + gg
: + number
visual mode
o
visual mode navigation
vip
vi" (not only quotes, others are similar)
vit
va" (not only quotes, others are similar)
undo and redo
u
^r
U
fold
zi
za
spelling
z=
zg
zw
buffer
next buffer :bn, previous buffer :bp
<C-o>
quit
ZZ
write as root
:w !sudo tee %
regex replace
() [] are escaped by \ if used as regex
\{-} is *?
\<…\> is \b…\b
\1, etc.
s/\<./\u&/g
s/\l\zs\u/_\l&/g
\( … \) to $…$
s/\\( \(.\{-}\) \\)/$\1$/g
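For comparison, the three substitutions above can be sketched with Python's `re` module. This is only an approximate translation: the function names are made up for illustration, and Python's `\b\w` and `[a-z]`/`[A-Z]` classes are assumed to be close enough to Vim's `\<.`, `\l`, and `\u` for ASCII text.

```python
import re


def capitalize_words(text: str) -> str:
    """Rough equivalent of s/\\<./\\u&/g: uppercase the first character of each word."""
    return re.sub(r"\b\w", lambda m: m.group().upper(), text)


def camel_to_snake(text: str) -> str:
    """Rough equivalent of s/\\l\\zs\\u/_\\l&/g: an uppercase letter that follows
    a lowercase letter becomes an underscore plus its lowercase form."""
    return re.sub(r"(?<=[a-z])[A-Z]", lambda m: "_" + m.group().lower(), text)


def latex_inline_to_dollar(text: str) -> str:
    """Rough equivalent of the last substitution: rewrite \\( … \\) as $…$.
    The non-greedy (.*?) plays the role of Vim's \\{-}."""
    return re.sub(r"\\\( (.*?) \\\)", r"$\1$", text)


if __name__ == "__main__":
    print(capitalize_words("hello world"))
    print(camel_to_snake("camelCaseName"))
    print(latex_inline_to_dollar(r"see \( x+y \) and \( z \)"))
```

The lookbehind `(?<=[a-z])` stands in for `\zs`: the lowercase letter must be present but is not part of the replaced text.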
Visual Studio Code (VSCode)
$ with actual value
running Live Share on server for collaboration
ssh -L $FORWARD_PORT:localhost:$FORWARD_PORT $USER@$SERVER
tmux new -s $SESSION_NAME
screen would work
code tunnel
$URL of tunnel
.deb to somewhere you own and symlink into local bin on PATH, something like:
mkdir $LOCAL_BIN/chrome
dpkg -x $CHROME_DEB $LOCAL_BIN/chrome
cd $LOCAL_BIN
ln -s chrome/opt/google/chrome/google-chrome google-chrome
google-chrome --headless=new --remote-debugging-port=$FORWARD_PORT $URL
chrome://inspect/#devices in local Chromium-based browser, click Configure…, put in localhost:$FORWARD_PORT. wait a bit. tab of $URL should appear on this page, click inspect below it to directly interact with web VSCode UI tab
_writeText = navigator.clipboard.writeText;
navigator.clipboard.writeText = console.log;
chrome://inspect/#devices. (wait a bit). find login tab, complete login
Shared Terminals, in the UI, open the Live Share panel in the sidebar. needed for participants to use terminal
Xcode
Shortcuts
⌘ + ⇧ + O - Open Quickly
Format
brew install --cask swiftformat-for-xcode
SwiftFormat for Xcode.app
Extensions -> Xcode Source Editor -> SwiftFormat
Unstructured Bug Notes
as is llvm-as, error is below. solution: align CC w/ as
as: Unknown command line argument '-I'. Try: 'as --help'
as: Did you mean '-h'?
as: Unknown command line argument '--64'. Try: 'as --help'
as: Did you mean '-h'?
as: Too many positional arguments specified!
Can specify at most 1 positional arguments: See: as --help
Unstructured Class Notes
docx file using Pandoc
Advanced Analysis of Algorithms
theory overview
&[u1]
(language)
⇒ search problem (harder, but may reduce to decision problem)
⇒ optimization
binary search to graph search
⇒ quick selection (broader notion of middle; randomize):
$\frac{1}{2}$ of the time we pick an element ranked $\frac{n}{4} \sim \frac{3n}{4}$, can throw away $\frac{n}{4}$ of the list
interactive learning
greedy algorithm
Huffman Codes
minimum spanning tree (MST)
Kruskal’s algorithm
Prim’s algorithm
clustering
⇒ by greedy algorithm, $\ell(e) \le d_{k-1}$, the $(k-1)$th longest edge in MST
approximation algorithm
set cover
set function
submodular function
reachability
network influence
independent cascade (IC)
threshold model
iterative algorithm
polynomial local search (PLS)
Lloyd’s algorithm (K-means clustering)
want: cluster into $P_1, \cdots, P_k$
divide and conquer in geometric space
1-dimensional space
approximate median
median in 2-dimensional space
nearest neighbor
disk packing
kissing number $\tau_d$
3-dimensional binary search via disk
d-dimensional convex geometry
Helly’s theorem (projection lemma)
Radon theorem: median in convex geometry definition
$\forall$ $d+2$ points $X = \{x_1, \cdots, x_{d+2}\} \subseteq \mathbb{R}^d$,
$X$ can be divided into 2 sets $X_1, X_2$ s.t. the convex hulls of $X_1$ and $X_2$ intersect
Lipton-Tarjan separator theorem for planar graph
Alan George nested dissection
fast Fourier transform (FFT)
integer multiplication
polynomial multiplication
linear programming (LP)
network flow
randomization
Markov chain
PageRank
spectral graph theory
Laplacian matrix $L = D - A$
min cut
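As a small illustration of the Laplacian $L = D - A$ from the spectral graph theory notes, the sketch below builds $L$ from an adjacency matrix. It assumes an undirected, unweighted graph given as a list of lists; the function name `laplacian` is made up for this example.

```python
def laplacian(adjacency):
    """Return L = D - A, where D is the diagonal degree matrix of `adjacency`."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    return [
        [(degrees[i] if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Path graph on 3 vertices: 0 - 1 - 2.
    path = [
        [0, 1, 0],
        [1, 0, 1],
        [0, 1, 0],
    ]
    for row in laplacian(path):
        print(row)
```

Every row of $L$ sums to zero, because the degree on the diagonal cancels the adjacency entries of that row.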
Advanced Computer Networking
Misc
Some Reflections on Innovation and Invention, George H. Heilmeier
The design philosophy of the DARPA Internet protocols, David D. Clark
On the Naming and Binding of Network Destinations, Jerome H. Saltzer
End-to-end arguments in system design, J. H. Saltzer, D. P. Reed, D. D. Clark
Tussle in Cyberspace: Defining Tomorrow’s Internet, David D. Clark, John Wroclawski, Karen R. Sollins, Robert Braden
Congestion Avoidance and Control, Van Jacobson, Michael J. Karels, SIGCOMM 1988
delaycwnd
instead of slow start
A Binary Feedback Scheme for Congestion Avoidance in Computer Networks, Kadangode K. Ramakrishnan, Raj Jain, SIGCOMM 1988
BBR: Congestion-Based Congestion Control, Neal Cardwell, Yuchung Cheng, C. Stephen Gunn, Soheil Hassas Yeganeh, Van Jacobson
BGP Routing Policies in ISP Networks, Matthew Caesar, Jennifer Rexford
How the Great Firewall of China detects and blocks fully encrypted traffic, Mingshi Wu, Jackson Sippe, Danesh Sivakumar, Jack Burg, Peter Anderson, Xiaokang Wang, Kevin Bock, Amir Houmansadr, Dave Levin, Eric Wustrow
[0x20,0x7e]
): in first 6 byte of packet, or > 50%, or > 20
Routing Stability in Congested Networks: Experimentation and Analysis, Aman Shaikh, Anujan Varma, Lampros Kalampoukas, Rohit Dube
Why Is It Taking So Long to Secure Internet Routing?, Sharon Goldberg
In Search of the Elusive Ground Truth: The Internet’s AS-level Connectivity Structure, Ricardo Oliveira, Dan Pei, Walter Willinger, Beichuan Zhang, Lixia Zhang
Trinocular: Understanding Internet Reliability Through Adaptive Probing, Lin Quan, John Heidemann, Yuri Pradkin
How the Internet reacted to Covid-19 – A perspective from Facebook’s Edge Network, Timm Böttger, Ghida Ibrahim, Ben Vallis
Analysis and Simulation of a Fair Queueing Algorithm, Alan Demers, Srinivasan Keshav, Scott Shenker
Controlling Queue Delay, Kathleen Nichols, Van Jacobson
Congestion Control for High Bandwidth-Delay Product Networks, Dina Katabi, Mark Handley, Charlie Rohrs
cwnd & RTT
Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks, Chalermek Intanagonwiwat, Ramesh Govindan, Deborah Estrin
An In-depth Study of LTE: Effect of Network Protocol and Application Behavior on Performance, Junxian Huang, Feng Qian, Yihua Guo, Yuanyuan Zhou, Subhabrata Sen, Qiang Xu, Z. Morley Mao, Oliver Spatscheck
A Variegated Look at 5G in the Wild: Performance, Power, and QoE Implications, Arvind Narayanan, Xumiao Zhang, Ruiyang Zhu, Ahmad Hassan, Shuowei Jin, Xiao Zhu, Xiaoxuan Zhang, Denis Rybkin, Zhengxuan Yang, Z. Morley Mao, Feng Qian, Zhi-Li Zhang
On the Self-Similar Nature of Ethernet Traffic, Will E. Leland, Murad S. Taqqu, Walter Willinger, Daniel V. Wilson
Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes, Mark E. Crovella, Azer Bestavros
Internet Inter-Domain Traffic, Craig Labovitz, Scott Iekel-Johnson, Danny McPherson, Jon Oberheide, Farnam Jahanian
MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean, Sanjay Ghemawat
The Tail at Scale, Jeffrey Dean, Luiz André Barroso
Rethinking Enterprise Network Control, Martín Casado, Michael J. Freedman, Justin Pettit, Jianying Luo, Natasha Gude, Nick McKeown, Scott Shenker
P4: Programming Protocol-Independent Packet Processors, Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, David Walker
VL2: A Scalable and Flexible Data Center Network, Albert Greenberg, Srikanth Kandula, David A. Maltz, James R. Hamilton, Changhoon Kim, Parveen Patel, Navendu Jain, Parantap Lahiri, Sudipta Sengupta
Engineering Egress with Edge Fabric: Steering Oceans of Content to the World, Brandon Schlinker, Hyojeong Kim, Timothy Cui, Ethan Katz-Bassett, Harsha V. Madhyastha, Italo Cunha, James Quinn, Saif Hasan, Petr Lapukhov, Hongyi Zeng
B4: Experience with a Globally-Deployed Software Defined WAN, Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, Min Zhu, Jonathan Zolla, Urs Hölzle, Stephen Stuart, Amin Vahdat
Networking Named Content, Van Jacobson, Diana K. Smetters, James D. Thornton, Michael Plass, Nick Briggs, Rebecca Braynard
Fundamental Design Issues for the Future Internet, Scott Shenker
Click Trajectories: End-to-End Analysis of the Spam Value Chain, Kirill Levchenko, Andreas Pitsillidis, Neha Chachra, Brandon Enright, Mark Felegyhazi, Chris Grier, Tristan Halvorson, Chris Kanich, Christian Kreibich, He Liu, Damon McCoy, Nicholas Weaver, Vern Paxson, Geoffrey M. Voelker, Stefan Savage
Investigating Large Scale HTTPS Interception in Kazakhstan, Ram Sundara Raman, Leonid Evdokimov, Eric Wustrow, J. Alex Halderman, Roya Ensafi
The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research, David Dittrich, Erin Kenneally
Tor: The Second-Generation Onion Router, Roger Dingledine, Nick Mathewson, Paul Syverson
Advanced Operating Systems
How to read a paper, Srinivasan Keshav, SIGCOMM CCR, 2007
An Evaluation of The Ninth SOSP Submissions or How (and How Not) to Write A Good Systems Paper, Roy Levin, David D. Redell
RAID: High-Performance, Reliable Secondary Storage, Peter M. Chen, Edward K. Lee, Garth A. Gibson, Randy H. Katz, David A. Patterson, ACM Computing Surveys, 1994
The Recovery Manager of the System R Database Manager, Jim Gray, Paul McJones, Mike Blasgen, Bruce Lindsay, Raymond Lorie, Tom Price, Franco Putzolu, Irving Traiger, ACM Computing Surveys, 1981
Disconnected Operation in the Coda File System, James J. Kistler, M. Satyanarayanan, ACM Transactions on Computer Systems, 1992
Eraser: A Dynamic Data Race Detector for Multithreaded Programs, Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, Thomas Anderson, ACM Transactions on Computer Systems, 1997
Improving the reliability of commodity operating systems, Michael M. Swift, Brian N. Bershad, Henry M. Levy, SOSP, 2003
Cores that don’t count, Peter H. Hochschild, Paul Turner, Jeffrey C. Mogul, Rama Govindaraju, Parthasarathy Ranganathan, David E. Culler, Amin Vahdat, HotOS, 2021
The Design and Implementation of a Log-Structured File System, Mendel Rosenblum, John K. Ousterhout, ACM Transactions on Computer Systems, 1992
Rethink the sync, Edmund B. Nightingale, Kaushik Veeraraghavan, Peter M. Chen, Jason Flinn, TOCS, 2008
Caching in the Sprite Network File System, Michael N. Nelson, Brent B. Welch, John K. Ousterhout, ACM Transactions on Computer Systems, 1988
Outatime: Using Speculation to Enable Low-Latency Continuous Interaction for Mobile Cloud Gaming, Kyungmin Lee, David Chu, Eduardo Cuervo, Johannes Kopf, Yury Degtyarev, Sergey Grizan, Alec Wolman, Jason Flinn, MobiSys, 2015
A low-bandwidth network file system, Athicha Muthitacharoen, Benjie Chen, David Mazières, SOSP, 2001
Coz: finding code that counts with causal profiling, Charlie Curtsinger, Emery D. Berger, SOSP, 2015
Horcrux: Automatic JavaScript Parallelism for Resource-Efficient Web Computation, Shaghayegh Mardani, Ayush Goel, Ronny Ko, Harsha V. Madhyastha, Ravi Netravali, OSDI, 2021
Shielding Applications from an Untrusted Cloud with Haven, Andrew Baumann, Marcus Peinado, Galen Hunt, OSDI, 2014
Difference Engine: Harnessing Memory Redundancy in Virtual Machines, Diwaker Gupta, Sangmin Lee, Michael Vrable, Stefan Savage, Alex C. Snoeren, George Varghese, Geoffrey M. Voelker, Amin Vahdat, Communications of the ACM, 2010
Deciding when to forget in the Elephant file system, Douglas S. Santry, Michael J. Feeley, Norman C. Hutchinson, Alistair C. Veitch, Ross W. Carton, Jacob Ofir, SOSP, 1999
Energy-aware adaptation for mobile applications, Jason Flinn, M. Satyanarayanan, SOSP, 1999
MAUI: Making Smartphones Last Longer with Code Offload, Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, Paramvir Bahl, MobiSys, 2010
Wide-area cooperative storage with CFS, Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, SIGOPS OSR, 2001
Cluster-Based Scalable Network Services, Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer, Paul Gauthier, SOSP, 1997
Improving MapReduce Performance in Heterogeneous Environments, Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, Ion Stoica, OSDI, 2008
Efficient Memory Disaggregation with Infiniswap, Juncheng Gu, Youngmoon Lee, Yiwen Zhang, Mosharaf Chowdhury, Kang G. Shin, NSDI, 2017
Hints for Computer System Design, Butler W. Lampson, SOSP, 1983
On system design, Jim Waldo, SIGPLAN Notices, 2006
Triangulating Python Performance Issues with Scalene, Emery D. Berger, Sam Stern, Juan Altmayer Pizzorno, OSDI, 2023
Algorithms and Databases
sort
insertion sort
loop invariant
heap sort
max-heap
A
size
A[0]
A[(n - 1)//2]
A[n]
2n + 1 < size
A[2n + 1]
2n + 2 < size
A[2n + 2]
build max-heap
A[parent(size - 1)] to A[0], max-heapify the heap
max-heapify
A[i] provided that both left and right are max-heap
A[l] among A[i], left, right
A[i] is not the maximum, swap them and max-heapify from A[l]
priority queue
max
A[0])
extract max
A[0] and A[size]
A[size]
A[size] and decrement size
0
increase key
A[i] with A[parent(i)] while i > 0 and A[parent(i)] < A[i]
insert
size
A[size - 1] $= -\infty$
A[size - 1]
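The heap notes above can be put together as a minimal Python sketch. It assumes the 0-based indexing used in the notes (parent `(n - 1)//2`, children `2n + 1` and `2n + 2`) and a plain list as storage. The function names are chosen for illustration, and the last element is taken to be `A[size - 1]`.

```python
import math


def parent(i):
    return (i - 1) // 2


def max_heapify(a, i, size):
    """Sink a[i] until the subtree rooted at i is a max-heap.
    Assumes both child subtrees are already max-heaps."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Max-heapify from A[parent(size - 1)] down to A[0]."""
    for i in range(parent(len(a) - 1), -1, -1):
        max_heapify(a, i, len(a))


def heap_sort(a):
    """Sort the list in place, ascending."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]  # move the current max behind the heap
        max_heapify(a, 0, end)


def extract_max(a):
    """Remove and return the maximum of a non-empty max-heap."""
    a[0], a[-1] = a[-1], a[0]
    maximum = a.pop()  # drops the last slot, i.e. decrements size
    max_heapify(a, 0, len(a))
    return maximum


def increase_key(a, i, key):
    """Raise a[i] to key, then swap upwards while the parent is smaller."""
    if key < a[i]:
        raise ValueError("new key is smaller than the current key")
    a[i] = key
    while i > 0 and a[parent(i)] < a[i]:
        a[i], a[parent(i)] = a[parent(i)], a[i]
        i = parent(i)


def insert(a, key):
    """Append -infinity as a placeholder, then increase it to key."""
    a.append(-math.inf)
    increase_key(a, len(a) - 1, key)


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    heap_sort(data)
    print(data)
```

`max_heapify` is written as a loop rather than recursion, which keeps it safe for long lists without changing what it does.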
quicksort
counting sort
radix sort
growth of function
asymptotic notation
Θ notation
O notation
Ω notation
o notation
ω notation
median & order statistics
ith order statistic
(lower) median
selection problem
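One common way to solve the selection problem is randomized quickselect. The sketch below is only an illustration: it uses a 0-based rank `i` (so `i = 0` is the minimum), copies the input instead of partitioning in place, and the names `quickselect` and `lower_median` are made up for this example.

```python
import random


def quickselect(items, i):
    """Return the element that would be at index i if items were sorted."""
    if not 0 <= i < len(items):
        raise IndexError("rank out of range")
    candidates = list(items)
    while True:
        pivot = random.choice(candidates)
        smaller = [x for x in candidates if x < pivot]
        equal = [x for x in candidates if x == pivot]
        larger = [x for x in candidates if x > pivot]
        if i < len(smaller):
            candidates = smaller
        elif i < len(smaller) + len(equal):
            return pivot
        else:
            # Discard everything up to and including the pivot's copies.
            i -= len(smaller) + len(equal)
            candidates = larger


def lower_median(items):
    """Lower median: rank (n - 1) // 2 with 0-based ranks."""
    return quickselect(items, (len(items) - 1) // 2)


if __name__ == "__main__":
    print(quickselect([7, 2, 9, 4, 1], 0))
    print(lower_median([7, 2, 9, 4]))
```

Each round keeps only the side that contains the wanted rank, so a reasonably central random pivot discards a constant fraction of the list.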
Artificial Intelligence
ethics
codes of ethics
engineering design process
handling ethical issues
definition of artificial intelligence
Turing test
total Turing test
thinking humanly
thinking rationally
knowledge-based system
intelligent agent
rational agent
task environment
agent structure
table-driven structure
simple reflex agent
search
example application
collaborative perception
anomaly detection
uninformed search
breadth-first search
uniform-cost search
informed search (heuristic search)
A* search (A-star search)
reinforcement learning
dynamic programming
discrete Markov decision process (discrete MDP)
find optimal policy
value iteration
policy iteration
exploration and exploitation
continuous Markov decision process (continuous MDP)
inverted pendulum
discretization
value function approximation
fitted value iteration
is estimation of
Mealy machine MDP
finite horizon MDP
linear quadratic regulation
$u_t: \mathbb{R}^{u \times n}, v_t: \mathbb{R}^{n \times n}, u_t, v_t \geq 0$
policy searching method
partially observable MDP
REINFORCE with baseline
Monte Carlo method
temporal-difference learning (TD-learning)
on-policy TD (SARSA)
off-policy TD: Q-learning
Cloud Computing
Introduction to Databases
`like`: any substring `%`, any char `_`
select x, fn(y) from _ group by x having …
select _ from _ where x in y;
-- lateral clause:
select _ from x, lateral (/* access outside variables */) …;
-- with clause:
with alias_name as (…) select …;
delete from _ where _;
insert into table_name (attributes, …) values (…,), (…,), …
update table_name set attributes = value where …;
-- join using
select _ from relation1 join relation2 using (attr1, …);
-- view/ materialized view
create view view_name as select …;
create materialized view view_name as select …;
`not null`, `unique`, `check (…)`
foreign key (attr1) references table_name on delete …
attr1 type, constraint constr_name check (…)
alter table table_name drop constraint constr_name;
create assertion assert_name check (…);
coalesce(attr1, default_value_in_place_of_null)
cast(attr1 as type)
date_format(value, 'format string')
if(predicate, value_when_true, value_when_false) -- or `decode` in Oracle
create type type_name as …;
create domain domain_name as …;
create table table2 like table1;
create table table1 as (select …) with data;
delimiter $$
create function fn_name(arg1 type1, …) returns type_out begin
-- declare local variable, set to NULL by default
declare local_var var_type default default_val;
-- mutate local variable
set local_var = …;
select _ into local_var from …;
return …;
end $$
-- `type_out` can be a `table (…)`—table function
create procedure proc_name(
in arg_input type1, out arg_output type2, inout arg_mutate type3, …
) begin
-- …
end $$
-- no returning for procedure
delimiter ;
-- call procedure
call proc_name(args…, @outside_var);
-- expression with pattern matching
case -- optionally with value here
when _ then _
…
else _
end;
-- loop
label1: loop
iterate label1; -- continue
leave label1; -- break
end loop;
while predicate1 do
-- …
end while;
repeat
-- …
until predicate1 end repeat;
for each_row as table_value1 do
-- …
end for;
create trigger trigger_name after insert on table_name -- or `before`, `delete`
referencing new row as row_name for each row when ( -- or `old row`.
-- or without renaming `new` or `old`.
-- …
)
begin -- compound statement
rollback
end;
-- multiple trigger
create trigger trigger_name before update on table_name
for each row follows another_trigger_name begin
-- …
end $$
declare continue handler for sqlstate 'err_no' begin -- or for `not found`
-- …
end; -- or `set _ = …`
declare cur_name cursor for select _ from _;
open cur_name;
fetch cur_name into var1, …;
close cur_name;
Introduction to Programming and Data Structures
// instance variable inside the reference type (not final if the value needs to change)
private final type instanceVariable1;
public ClassName(type parameterVariable1, …) {
type localVariable = value;
…
}
public double methodName(type argument1, …) {
…
return something;
}
public abstract type methodName(type parameterVariable);
}
public int methodName(type parameterVariable) {
}
}
public boolean equals(Object x) {
if (x == null) return false;
if (this.getClass() != x.getClass()) return false;
Class that = (Class) x;
return (this.instanceVariable1 == that.instanceVariable1) && …;
}
public int hashCode() {
return Objects.hash(instanceVariable1, …);
}
{
para = parameter;
}
Probability, Random Variables, and Stochastic Processes
joint distribution: discrete
uniform distribution
independent uniform variables
joint distribution: continuous
conditional probability: discrete
probability model
rule of average conditional probability
conditional expectation: discrete
properties
conditioning: continuous case
probability over a region
infinitesimal conditioning formula
conditional density of Y given X=x
joint density surface
marginal density of X
multiplication rule for density
averaging conditional probability
integral conditioning formula
conditional expectation
conditional variance
! another way to calculate variance
! for $X_i$ i.i.d. with number $N > 0$ independent of $X_i$
Bayesian estimation
general case
conjugate prior
indicator
maximum a posteriori probability (MAP) rule
point estimation
estimator $\hat{\Theta} = g(X)$
least mean squares (LMS) estimation
base case
general case
linear least mean squares estimation
hypothesis testing
MAP rule for hypothesis testing
general case—two coins
property
discrete Markov chain
one-step probability $P(X_{n+1} = j \mid X_n = i) = P(i, j) = p_{ij}$
transition probability matrix
initial distribution $\pi_0$
n-step transition matrix
class structure
closed class
absorbing state
transient & recurrent
hitting time
transient and recurrent state
number of times pass y
theorem
result
starting from $x$, the probability to go in class $\{x_1, \cdots, x_k\}$
starting from $x$, the expected steps to be absorbed
stationary distribution
find stationary distribution
null recurrent and positive recurrent
period of a state $d_x$
states in the same class have the same period
a chain is irreducible if S is a closed class
if every state of a Markov chain has period 1 we say the chain is aperiodic
suppose a Markov chain is irreducible and aperiodic and has a stationary distribution π
suppose a Markov chain is irreducible and all states are recurrent
suppose a Markov chain is irreducible and has a stationary distribution π
Stochastic process
branching process
question of interest
generating function
fixed point of ρ
birth and death process
gambler’s ruin
irreducible and stationary distribution
random walk
random walk on finite graph G
random walk on positive integer
random walk on integer $\mathbb{Z}^d$
Stirling’s formula
random walk on $\mathbb{Z}^1$
continuous Markov chain
time-homogeneous: $s$ does nothing
Chapman–Kolmogorov Equations
holding time at state i
proof
embedded discrete Markov chain
another representation of $p_{ij}(t)$
recover $p_{ij}(t)$ using $Q$
matrix exponential
long time behavior of continuous Markov chain
stationary distribution
relationship with stationary distribution of embedded chain Φ
irreducible
fundamental theorem
continuous birth and death process
pure birth
recurrence
Poisson process
inter-arrival time $T_i$
arrival time $S_i$
alternative definition using exponential distribution
combine independent Poisson process
splitting Poisson process
1 occurrence time spot
many occurrence time spot
proof
definition of o(h)
alternative definition
proof
property of Stochastic process
second order stationary process
Gaussian process
multivariate normal distribution
Brownian motion
standard Brownian motion
covariance function
connection between Brownian motion and discrete random walk
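$\bar{S}(t) = \begin{cases} S(t) & t \in \mathbb{N}^* \\ S(\lfloor t \rfloor) + X_{\lfloor t \rfloor + 1}(t - \lfloor t \rfloor) & t \notin \mathbb{N}^* \end{cases}$ is continuous, can approximate Brownian motion after scaling and limiting using $S_t^{(n)} = \frac{S(nt)}{\sqrt{n}} \to \cdots W(t)$
relation between Gaussian process and Brownian motion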
transformation of Brownian motion
hitting time
some distributions
Beta random variables
Beta function
exponential random variable
memory-less property
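sum of $n$ i.i.d. exponential random variables follows a gamma distribution
comparison between $X_1 \sim \mathrm{Exp}(\lambda_1)$ and $X_2 \sim \mathrm{Exp}(\lambda_2)$
comparison between $X_1, X_2, \cdots \sim \mathrm{Exp}$ with parameters $\lambda_1, \lambda_2, \cdots$
minimum of $n$ exponential random variables with $\lambda_i$’s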
Gamma distribution
Poisson distribution
bivariate normal distribution
joint density
marginals
conditionals
Probability and Statistics
Linear Algebra
solution set
design matrix $X$
Speech Recognition
.wav
feature extraction
`<s>`, `</s>`
Statistical Machine Learning
Bayesian decision theory
linear regression
regularization in lasso
K-nearest neighbors (KNN)
support vector machine (SVM)
hard-margin binary SVM
soft-margin binary SVM
kernel SVM
positive-definite kernel
dimensionality reduction by principal component analysis (PCA)
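$\frac{1}{N} \sum_{i=1}^{N} \left\| \sum_{k=q+1}^{p} (X_i^T u_k) u_k \right\|^2 = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=q+1}^{p} (X_i^T u_k)^2 \Rightarrow \arg\min_{u_k} \sum_{k=q+1}^{p} u_k^T S u_k \text{ s.t. } u_k^T u_k = 1 \Rightarrow S u_k = \lambda u_k$
Monte Carlo sampling method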
transformation method
rejection sampling
importance sampling
sampling-importance-resampling
simulated annealing
Markov chain
detailed balance
Markov chain Monte Carlo (MCMC)
Metropolis–Hastings algorithm (MH algorithm)
Gibbs sampling
entropy
surprise
decision tree based on entropy
statistical learning theory
consistency with respect to $\mathcal{F}$ and $P$
generalization bound
shattering coefficient $\mathcal{N}(\mathcal{F}, n)$
Vapnik–Chervonenkis (VC) dimension
Rademacher complexity
structural risk minimization (SRM)
Unstructured Reading Notes
comparison: memcpy 64 KiB took 3 µs, Goroutine switching took 170 ns
`ulimit`
(VM), ~8 KiB resident (RSS) without touching stack
Steven Hé (Sīchàng)’s Blog
My Programming Language Journey: How I Ended Up Using Rust and Python Most After Trying A Dozen Languages
Before the beginning
My first programming languages
Falling into the crab hole 🦀🕳️
`Segmentation Fault. Core dumped`. It was 2022, the peak of the Rust hype; there were lots of conference talks introducing Rust. From these talks, I started getting a feel for the basic syntax and concepts. Ownership, the `Result` enumeration type, and lifetimes somehow all made perfect sense to me, perhaps because I was already used to analytical thinking from learning too much math. So, I read the first half of the book The Rust Programming Language and translated some Python programs to Rust. Everything made sense and the programs got 40x faster with basically the same structures. I was sold.
`.await` causing errors, async timeouts not timing out, and deadlocks. Rust is definitely not a language you can hack around; you need to understand the concepts behind what you are doing.
What about…
Array.map
cost performance and it only really works well in heavy JetBrains IDEs.
~300× Speed Up in Rust: Finding Inexact Matches in Nested Sets
Simplified background
`impl`ement the `match_nested_set` method (`fn`) on the `Matcher` data `struct`ure.[2] This method should output whether the Matcher’s (IP address) prefix (`self.prefix` of type `IpNet`) matches a nested set of prefixes called `name`. Additionally, it should accept a Range Operator `op` that can modify how the prefix matching is done.
struct Matcher {
prefix: IpNet,
sets: BTreeMap<u32, Vec<IpNet>>,
nested_sets: BTreeMap<String, NestedSet>,
}
impl Matcher {
fn match_nested_set(self, name: String, op: RangeOperator) -> bool {
unimplemented!()
}
}
`193.254.30.0/24` matches the nested set `CUSTOMERS` when we apply the Range Operator `Plus`. The nested set `CUSTOMERS` has a member `69` that contains the prefix `193.254.0.0/16`. Since `193.254.0.0/16` contains `193.254.30.0/24`, and `Plus` specifies the “contains” relationship, the method should return `true`.
let matcher = Matcher {
prefix: 193.254.30.0/24,
sets: BTreeMap(69: 193.254.0.0/16),
nested_sets: BTreeMap(
"CUSTOMERS": NestedSet { sets: [69], nested_sets: [] }
),
};
matcher.match_nested_set("CUSTOMERS", RangeOperator::Plus)
// => true
Naive implementation
impl Matcher {
fn match_nested_set(
self,
name: String,
op: RangeOperator,
visited: Vec<String>,
) -> bool {
if visited.contains(name) {
return false;
}
visited.push(name);
let Some(nested_set) = self.nested_sets.get(name) else {
return false;
};
for set in nested_set.sets {
if self.match_set(set, op) {
return true;
}
}
for nested2_set in nested_set.nested_sets {
if self.match_nested_set(nested2_set, op, visited) {
return true;
}
}
false
}
fn match_set(
self,
num: u32,
op: RangeOperator,
) -> bool {
let Some(set) = self.sets.get(num) else {
return false;
};
set.iter()
.any(|prefix| prefix_and_op_matches(prefix, op, self.prefix))
}
}
`visited` tracks the nested set names we have checked to prevent infinite recursions from nested sets that contain each other.
`let else` syntax, we look up `name` in the nested sets, and either bind the result to the variable, or return `false` early if `name` is not found.[3] We apply the same trick to `num`.
`set.iter().any` expression uses a common declarative pattern. It loops through the `prefix` elements in `set`, calls `prefix_and_op_matches` on each of them, and returns `true` if and only if at least one of the calls returns `true`.
`prefix_and_op_matches` checks if `prefix` matches Range Operator `op` and another prefix.
Standard data structure changes
`visited.contains`). Since `visited` was a vector (a.k.a. array list), its linear search compounded with recursion to a cubic growth. I chose to use vectors because I thought most nested sets would have fewer than ten members, in which case linear search would be fine, but my assumption was clearly wrong. I changed `visited` to use `HashSet` and boosted the speed by about 10×.
`Matcher` contains two B-tree maps, which are amazing sorted search trees with O(log n) lookup time complexity. However, hash maps have expected O(1) lookup. Therefore, it was a no-brainer to try replacing `BTreeMap` with `HashMap`. I recall this find-replace change causing the `get` calls to occupy seemingly larger areas in the flame graph! However, I also saw a 30% faster benchmark, so the hash map was indeed faster.
Avoiding redundant work
`visited`, further profiling showed our recursion calls all ended up at a slow expression:
set.iter()
.any(|prefix| prefix_and_op_matches(prefix, op, self.prefix))
fn match_ips(prefix: IpNet, prefixs: [IpNet], op: RangeOperator) -> bool {
let center = prefixs.binary_search(prefix).map_or_else(identity, identity);
// Check center.
if let Some(value) = prefixs.get(center) {
if prefix_and_op_matches(value, op, prefix) {
return true;
}
}
// Check right.
for value in prefixs[(center + 1).min(prefixs.len())..] {
if prefix_and_op_matches(value, op, prefix) {
return true;
}
if !prefix.is_sibling(value) {
break;
}
}
// Check left.
for value in prefixs[..(center.saturating_sub(1)).min(prefixs.len())]
.iter()
.rev()
{
if prefix_and_op_matches(value, op, prefix) {
return true;
}
if !prefix.is_sibling(value) {
break;
}
}
false
}
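To make the `binary_search(…).map_or_else(identity, identity)` step in the code above concrete, here is a small self-contained sketch on plain integers instead of IP prefixes. The names `search_center` and `any_match_near` and the two closure parameters are hypothetical stand-ins for the article's `center`, `is_sibling`, and `prefix_and_op_matches`. Unlike `match_ips`, this sketch scans rightward starting at `center` itself rather than checking it separately.

```rust
use std::convert::identity;

/// Index where the search around `target` starts in a sorted slice:
/// the index of an exact match, or the insertion point if there is none.
fn search_center(sorted: &[u32], target: u32) -> usize {
    // `binary_search` returns `Ok(index)` or `Err(insertion_point)`;
    // both carry a `usize`, so `identity` unwraps either case.
    sorted.binary_search(&target).map_or_else(identity, identity)
}

/// Scan outward from the center, stopping each direction at the first
/// value that is no longer related to `target`.
fn any_match_near(
    sorted: &[u32],
    target: u32,
    is_related: impl Fn(u32, u32) -> bool,
    is_match: impl Fn(u32, u32) -> bool,
) -> bool {
    let center = search_center(sorted, target);
    // Rightward, including `center` (empty if `center == sorted.len()`).
    for &value in &sorted[center..] {
        if is_match(value, target) {
            return true;
        }
        if !is_related(value, target) {
            break;
        }
    }
    // Leftward, starting just before `center`.
    for &value in sorted[..center].iter().rev() {
        if is_match(value, target) {
            return true;
        }
        if !is_related(value, target) {
            break;
        }
    }
    false
}
```

The early `break` is only sound when related values sit contiguously around the center in sort order, which is the property the article relies on for sibling prefixes.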
`match_ips` first obtains a starting point `center` using a regular binary search on the sorted `prefixs`; the prefix at `center` is the most similar to `prefix`. From there, we search rightward and then leftward in the prefix list, until they are no longer siblings with `prefix`, a necessary condition for matches.
`center`, thus we should gain more efficiency with larger, flattened prefix sets. However, flattening nested prefix sets results in large vectors and duplicated prefixes, which bloats memory and hurts cache efficiency. After empirical testing, I chose to flatten the nested prefix sets once, which yielded the best speedup, 2×.[5]
Custom data structure
`visited` set. Recall that `visited` was a `HashSet` that stores the seen sets’ names to prevent infinite recursion:
fn match_nested_set(
self,
name: String,
op: RangeOperator,
visited: HashSet<String>,
) -> bool {
if visited.contains(name) {
return false;
}
visited.insert(name);
// …
}
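As a side note, and not what this article ends up doing: the `contains` check followed by `insert` in the snippet above looks `name` up twice. The standard library's `HashSet::insert` already reports whether the value was new, so the two calls can be folded into one. A minimal sketch, where the helper name `first_visit` is my own:

```rust
use std::collections::HashSet;

/// Record `name` as visited; returns `true` only the first time a name is seen.
fn first_visit(visited: &mut HashSet<String>, name: &str) -> bool {
    // `HashSet::insert` returns `false` when the value was already present,
    // so no separate `contains` call is needed beforehand.
    visited.insert(name.to_owned())
}
```

With this helper, the guard becomes `if !first_visit(&mut visited, name) { return false; }`. The trade-off is that `to_owned` allocates a `String` even for names that were already seen. I have not benchmarked it against the approaches described here.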
`visited.contains` hashes `name` and tries to look for it in the hash table, which would probably fail because nested sets containing each other are rare. Then, `visited.insert` hashes `name` again and finds a bucket in the hash table to store it.
`HashSet` does quadratic probing, i.e., when a hash collision occurs during lookup or insertion, it hops around the hash table until it finds an empty slot. As we encounter numerous names, I guessed that `visited` was often somewhat full when we looked up unseen names, causing it to probe around until it found an empty slot, which seems inefficient. Thus, I thought a Bloom filter could help by providing an early rejection when a name is not in `visited`.
Tenable caching
fn match_nested_set(/* … */) -> bool {
// …
for set in nested_set.sets {
if self.match_set(set, op) {
return true;
}
}
for nested2_set in nested_set.nested_sets {
if self.match_nested_set(nested2_set, op, visited) {
return true;
}
}
false
}
`IpNet`) is 18 bytes and they are numerous, but sets are represented with 32-bit numbers and are much fewer. Although flattening nested sets to their set `num`bers does not leverage our custom binary search, it eliminates the recursion calls, along with the major overhead of tracking `visited` (R.I.P. `BloomHashSet`), thus achieving the effect of partially caching the set flattening process.
`match_nested_set` is called `filter_as_set`. The monolithic bar on the left of each graph is data loading, and clearly shows the left graph took much more time besides data loading.
About this article
`&` or `&mut`) and parsing code to be friendly to readers unfamiliar with Rust. I also changed the names to avoid explaining the research context.
`Vec<IpNet>` with trie `IpTrie` to improve the `filter_as_set` bottleneck. I ran these experiments on the same server but with hyper-threading turned on, which may yield higher speed.
Creating Perfect Grayscale-Gradient Colormaps
Result: hand-picked hues and perfect grayscale gradient
COLORS6: Final = tuple(
hue_grayscale_to_srgb(hue, grayscale)
for hue, grayscale in zip(
[60, 180, 120, 240, 0, 300],
np.linspace(0.96, 0.2, 6),
)
)
Math: calculating linear RGB from hue and grayscale
Linear RGB vs standard RGB
Exercise for the reader
What an mdBook Preprocessor Does—Code Walk-through of mdBook-KaTeX
Define $f(x)$:
$$
f(x)=x^2\\
x\in\R
$$
Define <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span>:
<span class="katex-display"><span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.8641em;"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span><span class="mspace newline"></span><span class="base"><span class="strut" style="height:0.5782em;vertical-align:-0.0391em;"></span><span class="mord mathnormal">x</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6889em;"></span><span class="mord mathbb">R</span></span></span></span></span>
Topic: what an mdBook preprocessor does
mdBook-KaTeX as a CLI App
$ mdbook-katex --help
A preprocessor that renders KaTex equations to HTML.
Usage: mdbook-katex [COMMAND]
Commands:
supports Check whether a renderer is supported by this preprocessor
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
-V, --version Print version
`supports`. But, if no command is specified, it reads from StdIn:
$ mdbook-katex
# (Nothing happens).
# (Press control + D).
Error: Unable to parse the input
Caused by:
EOF while parsing a value at line 1 column 0
`mdbook-katex` command directly. Instead, mdBook would invoke it when it builds the book.
The `supports` command
`html` renderer. So, after loading the book from disk, mdBook invokes mdBook-KaTeX like this:
mdbook-katex supports html
`html` renderer.
Reading and processing the book data from StdIn
let pre = KatexProcessor;
let (ctx, book) = CmdPreprocessor::parse_input(io::stdin())?;
let processed_book = pre.run(&ctx, book)?;
serde_json::to_writer(io::stdout(), &processed_book)?;
`ctx: PreprocessorContext` and the book data `book: Book` from StdIn using `mdbook::preprocess::CmdPreprocessor`. We then run it through our preprocessor `pre` and get the `processed_book: Book`. Finally, we print the book data back to StdOut, where mdBook would catch it and use it for the next steps.
`main.rs` of mdBook-KaTeX and start your own preprocessor; the only change would be replacing `KatexProcessor` with another `struct` that implements `mdbook::preprocess::Preprocessor`:
pub trait Preprocessor {
fn name(&self) -> &str;
fn run(&self, ctx: &PreprocessorContext, book: Book) -> Result<Book>;
fn supports_renderer(&self, renderer: &str) -> bool;
}
`name` and `supports_renderer` are trivial, but `run` is where the fun lives. For `KatexProcessor`, it finds the math expressions in each chapter of `book` and renders them.
Processing `book`
`book` above. For each chapter, we loop over its bytes, find the math expressions, and replace them with rendered HTML. Then, we stick these chapters back into `book`.
fn run(&self, ctx: &PreprocessorContext, mut book: Book) -> Result<Book> {
// …
book.for_each_mut(|item| {
if let BookItem::Chapter(chapter) = item {
chapter.content = process_chapter(&chapter.content, /* … */)
}
});
Ok(book)
}
`for_each_mut` method on `book` to iterate over its items and mutate them. We pick out the `item: &mut BookItem`s that are `BookItem::Chapter`. We then call `process_chapter` on their `content: String` and assign the results back.
`process_chapter`. `scan: Scan` is a custom scanner that scans through each byte in `raw_content` and produces `Event`s that indicate the beginnings and ends of blocks.
fn process_chapter(raw_content: String, /* other args */) -> String {
let scan = Scan::new(&raw_content, /* … */);
let mut rendered = Vec::new();
let mut checkpoint = 0;
for event in scan {
match event {
Event::TextEnd(end) => rendered.push((&raw_content[checkpoint..end]).into()),
Event::InlineEnd(end) => {
rendered.push(render(&raw_content[checkpoint..end], /* … */));
checkpoint = end;
}
// …
}
}
// …
rendered.join("")
}
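As a rough, self-contained illustration of the same find-and-replace idea, here is a sketch that handles only inline `$…$` spans. It is not mdBook-KaTeX's actual scanner: `process_inline_math` and `render_stub` are hypothetical names, and `render_stub` merely wraps the math source in a tag instead of calling KaTeX.

```rust
/// Stand-in for the real renderer: wraps the math source in a tag
/// instead of calling KaTeX.
fn render_stub(math: &str) -> String {
    format!("<span class=\"math\">{}</span>", math)
}

/// Replace every inline `$…$` span in `raw_content` with its rendered form.
fn process_inline_math(raw_content: &str) -> String {
    let mut rendered = String::with_capacity(raw_content.len());
    let mut rest = raw_content;
    while let Some(open) = rest.find('$') {
        let after_open = &rest[open + 1..];
        let close = match after_open.find('$') {
            Some(close) => close,
            // Unmatched delimiter: leave the remainder as plain text.
            None => break,
        };
        // Text block before the math.
        rendered.push_str(&rest[..open]);
        // Math block between the delimiters.
        rendered.push_str(&render_stub(&after_open[..close]));
        rest = &after_open[close + 1..];
    }
    rendered.push_str(rest);
    rendered
}
```

The sketch ignores display math, escaped dollar signs, and code blocks, which is exactly the bookkeeping a dedicated scanner with events exists to handle.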
`Event`, we identify text blocks and math blocks, and apply the `render` function to the math blocks. The `render` function then uses the `katex` crate to render HTML. Finally, we `join` all the strings in `rendered: Vec<String>` into the new content of the chapter.
What next
Configuration options
`ctx: &PreprocessorContext` argument passed into the `run` method. Then, we further parse the configurations and pass them around.
Parallelism
`katex` crate uses QuickJS to render KaTeX, which is ironically slow, KaTeX rendering has been the performance bottleneck. Initially, by manually scheduling rendering tasks using Tokio, I was able to get a 5x speedup on an M1 Mac, from 10 sec to 2 sec rendering my 30-thousand-word notes.
let mut chapters = Vec::with_capacity(book.sections.len());
book.for_each_mut(|item| {
if let BookItem::Chapter(chapter) = item {
chapters.push(chapter.content.clone());
}
});
let mut contents: Vec<_> = chapters
.into_par_iter()
.rev()
.map(|raw_content| process_chapter(raw_content, /* … */))
.collect();
book.for_each_mut(|item| {
if let BookItem::Chapter(chapter) = item {
chapter.content = contents.pop().expect("Chapter number mismatch.");
}
});
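The reverse-then-`pop` ordering trick in the code above can be tried without mdBook or Rayon. The following is a minimal sketch under my own assumptions: it uses `std::thread::scope` with one thread per chapter instead of Rayon's thread pool, and `process_stub` and `process_all` are hypothetical names, with `process_stub` standing in for `process_chapter`.

```rust
use std::thread;

/// Stand-in for `process_chapter`: any pure per-chapter transformation works.
fn process_stub(raw_content: String) -> String {
    raw_content.to_uppercase()
}

/// Process all chapters on separate threads, then write the results back
/// in the original order.
fn process_all(chapters: &mut [String]) {
    // Clone the contents so that each worker thread owns its chapter.
    let owned: Vec<String> = chapters.to_vec();
    let mut contents: Vec<String> = thread::scope(|scope| {
        // Spawn one worker per chapter; `collect` starts them all.
        let handles: Vec<_> = owned
            .into_iter()
            .map(|raw| scope.spawn(move || process_stub(raw)))
            .collect();
        // Gather in reverse so the first chapter's result ends up last.
        handles
            .into_iter()
            .rev()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    });
    for chapter in chapters.iter_mut() {
        // `pop` takes from the end, which yields the original order.
        *chapter = contents.pop().expect("Chapter number mismatch.");
    }
}
```

Spawning one thread per chapter is a simplification that would not scale to a large book; a work-stealing pool such as Rayon's avoids that cost.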
`chapters` for each thread to own the chapter they process.
`into_par_iter` method for `Vec` and the `map` method, which Rayon provides, to process the chapters in parallel.
`for_each_mut` even when gathering the chapters because `iter` unfortunately iterates them in a different order.
`rev` to iterate the chapters in reverse order, so when we put the rendered chapters back into the book, we can simply call `pop` on the contents to get them in the correct order.
Conclusion and preview
`supports` command.
Learn Like Machine Learning Models
We are Learning Async Rust Wrong
`async` on every function and `.await` on every function call when programming async Rust. I believe such suggestions reveal that many of us are stuck at a surface-level understanding of async Rust. However, this phenomenon has to do with the way we get introduced to async Rust: many of us were fooled too much when we started learning async, and we patched our understanding as we hit confusing walls. Instead, I believe we should adequately explain to newcomers the important bits of async Rust in the first place.
Lies: a common learning path
`Send + Sync + 'static`).
The missing context for async
E.g., lots of tricky trait bounds are needed; the error messages are contrived.
E.g., arbitrary functions can bottleneck the system, and it is hard to reason about.
E.g., confusing panics when running blocking code in async context.
`Future` returning `Poll::Pending` when polled). Although yielding is an overhead, it enables two superb features:
One task cannot block for too long; all tasks get a chance to run soon.
When the runtime gets back the control, it can then apply cancellation, check other branches of `select!`, etc.
The missing lessons
`Future` trait, you specify tasks that can be suspended.
`Future` provides a common interface for tasks that can be suspended, so that your code can yield control back to its caller, usually a runtime; only then, the runtime retains control for cooperative scheduling, suspension, and cancellation.
To emulate preemptive scheduling and immediate cancellation, each yield point must be close to each other. Thus, you need to meticulously insert yield points in suitable places of your async code, a daunting task that demands experience.
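To make yield points tangible, here is a minimal sketch of a hand-written `Future` that does a bounded chunk of work per poll and then returns `Poll::Pending`. It uses only the standard library; `ChunkedSum`, `NoopWake`, and `run_counting_polls` are hypothetical names of mine, and the busy-polling driver is a toy standing in for a real runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Sums `0..limit`, yielding to the caller after every `chunk` additions.
/// `chunk` must be greater than zero, or the task never makes progress.
struct ChunkedSum {
    next: u64,
    limit: u64,
    total: u64,
    chunk: u64,
}

impl Future for ChunkedSum {
    type Output = u64;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        let this = self.get_mut();
        let stop = this.limit.min(this.next + this.chunk);
        while this.next < stop {
            this.total += this.next;
            this.next += 1;
        }
        if this.next == this.limit {
            return Poll::Ready(this.total);
        }
        // Yield point: ask to be polled again, then return control.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// A waker that does nothing; the toy driver below polls in a loop anyway.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll `future` to completion on the current thread and count the polls.
fn run_counting_polls<F: Future + Unpin>(mut future: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(output) = Pin::new(&mut future).poll(&mut cx) {
            return (output, polls);
        }
    }
}
```

Every return of `Poll::Pending` is a moment where a real runtime could run another task or cancel this one. A smaller `chunk` means more such moments at the cost of more polling overhead.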
If you have a sucker async function that runs for a whole millisecond before it yields, it surely will eat a whole millisecond as soon as it starts.
`Future`, yield points are obvious: they are when you return `Poll::Pending`. In contrast, the async/await syntax hides yield points from programmers. Consequently, most beginners have no idea what a yield point is; even those who know the concept often hold the mistaken belief that calling `.await` creates a yield point, presumably from JavaScript knowledge.
`async` on their functions, and call them like regular functions but with `.await`, just like the aforementioned conference talk depicts.
`yield_now()`), and all heavy sync functions need to be wrapped in either `spawn_blocking` or `block_in_place` plus a `yield_now()` (I have tested, `block_in_place` does not yield).
Design Patterns Reveal OOP is Inherently Unsuitable for Systems
Lack of emphasis on object implementation
`where` clause. To know that a data structure implements an interface, some work seems to be required.
Disconnection between code and reality
Deprecated
How did we get here? WordPress?
WTF
The reality
What I get with WordPress
Conclusion
Diary
start
choice for the diary
my take on blogs
generate HTML
WordPress
hand write
Markdown
Markdown flavor
\n
, some support \(\TeX\), and some support PHP.
Hugo
`hugo-book` just looks like a crappy version of GitBook, might as well use GitBook. But, GitBook is “converted” from open source to “proprietary services” (what a shame). Might as well use mdBook which is still open source.
`Doks` looks nice, but it shocked me by taking 500 MB on my disk (how does some HTML and CSS text take so much space?). It is absolutely not anything lightweight, and it reminds me of WordPress…
`Nice look` nor `Simplicity` and I need to implement Math support and Code highlighting myself (which is just adding some `<link>` on top of the file, but still, annoying). Also, some themes I tried are buggy and do not even work if I add the math support.
mdBook
web server
coda
VSCodium
the reason to switch
the problem with VSCode
searching for an alternative
installation via Homebrew
brew install vscodium
`~/.vscode-oss` to `~/.vscode` and `~/Library/Application Support/VSCodium` to `~/Library/Application Support/Code`. I also deleted everything that I did not recognize as my configuration files or extensions.
the extension problem with VSCodium
`product.json`. This change broke the Python extension and did not give me the VSCode marketplace.
performance and macOS ARM support
The Run
The lockdown
Permission to leave
A crowded square
Enter the station
The rest
Feeling Lost
Linux VM And Neovim
Linux VM on M1 MBP
virt-manager and qemu
Parallels Desktop
The distro for the VM
Installing the VM
Neovim
Lua
Back from Duke
Duke semester
Ex-relationship
Elixir Phoenix
Leaving Rails
Learning Elixir
Learning Phoenix
mdbook-katex
What and why I use it
`mdbook-katex`. Previously, my notes had already been using it, so I was confident that it worked well.
`\\(` delimiters, just use `$` and `$$`.
`mdbook serve`.
The project was in a bad condition
`mdbook-katex`, lzanini, had not made any updates on the project for a year.
`mdbook-katex` via Cargo and Git:
cargo install --git "https://github.com/lzanini/mdbook-katex"
`static-css` was extremely slow for me, taking around 20 seconds on my slow network. I would later find out that I was downloading all the fonts twice every time the book got rebuilt.
mdbook-katex2
`cargo install` it. Cargo has a “first come, first served” policy so that new packages can never take old packages’ names. I still wanted to have my fork on crates.io so I changed the package name to `mdbook-katex2`.
`mdbook-katex2` and thanking the original author.
`mdbook-katex`.
Spring 2024 Ideas
Rethinking The Web
Browser space
`#include`, only the bad `<iframe>`.
Rust
Threads & Async & Coroutines & Lunatics
Spring 2024 Activities
Flower “AI” Summit 2024, London
IMC ’24
Funding issues
Visa shopping
Paris and how I got sick
Madrid and the talk
`whois` queries about Spanish autonomous systems and gave me some materials last-minute. Lying on my belly on the twin bed in my 4-person shared room, I managed to figure out where I could sneak that part into my talk.
The conference overall
Recovering Files on Exxact
`d` for “delete” and `y` for “yes”. It lagged for a while, then, to my horror, only one folder was shown in the file explorer.
`ls` confirmed that everything was deleted except the file Chromium was writing to. That is, the data collected over roughly a month were all gone.
% sudo extundelete /dev/nvme0n1 --restore-all
NOTICE: Extended attributes are not restored.
WARNING: EXT3_FEATURE_INCOMPAT_RECOVER is set.
The partition should be unmounted to undelete any files without further data loss.
If the partition is not currently mounted, this message indicates
it was improperly unmounted, and you should run fsck before continuing.
If you decide to continue, extundelete may overwrite some of the deleted
files and make recovering those files impossible. You should unmount the
file system and check it with fsck before using extundelete.
Would you like to continue? (y/n)
n
% sudo umount /dev/nvme0n1
% sudo extundelete /dev/nvme0n1 --restore-all
NOTICE: Extended attributes are not restored.
Loading filesystem metadata ... 28616 groups loaded.
Loading journal descriptors ... 0 descriptors loaded.
Searching for recoverable inodes in directory / ...
0 recoverable inodes found.
Looking through the directory structure for deleted files ...
0 recoverable inodes still lost.
No files were undeleted.
% sudo fsck -f /dev/nvme0n1
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
ssd1: 27/234422272 files (3.7% non-contiguous), 15003479/937684566 blocks
% sudo extundelete /dev/nvme0n1 --restore-all
NOTICE: Extended attributes are not restored.
Loading filesystem metadata ... 28616 groups loaded.
Loading journal descriptors ... 0 descriptors loaded.
Searching for recoverable inodes in directory / ...
0 recoverable inodes found.
Looking through the directory structure for deleted files ...
0 recoverable inodes still lost.
No files were undeleted.
% sudo debugfs -R "dump <8> /hdd1/nvme0n1.journal" /dev/nvme0n1
% cd /hdd1
% du -sh nvme0n1.journal
1.1G nvme0n1.journal
% sudo apt-get install ext4magic
% sudo ext4magic -M -j /hdd1/nvme0n1.journal -d /hdd1/DeGenTWeb_recovery/ /dev/nvme0n1