Execution

Crawling

  • Test VV8.
    • Docker just works.
    • Log format is documented in tests/README.md. We need a parser; their post-processor/ is too specific to reuse.
      • Write a log parser library in Rust (see the log-splitting sketch under “Log file interpretation” below).
    • Does https://github.com/wspr-ncsu/visiblev8-crawler work?
      • Over-engineered. Would prefer a simple single script over this monstrosity of SQLite, Celery, Mongo, and PostgreSQL.
      • Runs Puppeteer eventually (crawler.js). ⇒ Let’s just use Puppeteer’s successor, Playwright.
        • Figure out Playwright.
    • Prevent being blocked by USC OIT: ask John / add a link to the research description page in the User-Agent.
  • Make Playwright and monkey testing work.
    • Mounting: write directly to headless_browser/target/ on host.
      • Need the SYS_ADMIN capability & root in Docker to run Playwright & create the directory; otherwise we get a spurious error.
      • Need sudo setenforce 0 for Fedora due to SELinux.
    • File management: each site has its own directory named by encodeURIComponent(url), under which:
      • share browser cache in user_data/;
      • each trial (N in 0–4) launches a separate VV8 and writes:
        • $N/vv8-*.log
        • $N.har
        • reachable$N.json
    • Disable content security policy (CSP) for eval.
    • Prevent navigation. Go back in browser history immediately when navigating.
      • Browser bug: sometimes it goes back too far, to about:blank. Detect whether the page has the horde on the load event, and reload if not.
      • [ ] Fewer navigations when headless??
      • Some sites (e.g., YouTube) change URL w/o navigation; cannot do anything about them.
    • Visit 3 + 9 clicked pages like Snyder did.
    • Some secondary URLs’ host names vary by the www. prefix, e.g., google.com.
    • Split each visit into a separate browser page, so that each VV8 log can be split at the point where gremlins is injected, into a “loading” part and an “interacting” part (see the crawler sketch after this list).
    • Save space: remove user_data/ after all trials.
    • Crawl only the first 100 sites for now.
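
A minimal sketch of the per-site, per-trial crawl loop described above, in TypeScript with Playwright. The VV8 executable path, output root, gremlins bundle path, and the interaction budget are assumptions; collecting reachable$N.json and moving the vv8-*.log files (which the browser emits in its working directory) are omitted.

```typescript
// crawl_sketch.ts — illustrative only; paths and option values are assumptions.
import { chromium } from 'playwright';
import * as fs from 'fs';
import * as path from 'path';

const VV8_PATH = '/opt/vv8/chrome';          // assumed location of the VV8 Chromium build
const OUT_DIR = 'headless_browser/target';   // per-site output root (mounted on the host)
const TRIALS = 5;                            // trials N = 0..4

async function crawlSite(url: string): Promise<void> {
  const siteDir = path.join(OUT_DIR, encodeURIComponent(url));
  const userDataDir = path.join(siteDir, 'user_data'); // browser cache shared across trials

  for (let n = 0; n < TRIALS; n++) {
    // $N/ holds this trial's vv8-*.log files (collected after the run).
    fs.mkdirSync(path.join(siteDir, String(n)), { recursive: true });

    // Each trial launches a separate VV8 instance.
    const context = await chromium.launchPersistentContext(userDataDir, {
      executablePath: VV8_PATH,
      headless: true,
      bypassCSP: true,                                      // disable CSP so injected gremlins can eval
      recordHar: { path: path.join(siteDir, `${n}.har`) },  // $N.har
    });
    const page = await context.newPage();

    // Prevent navigation: go back whenever the main frame leaves the target URL.
    // Best effort; see the about:blank caveat and the www./redirect notes above.
    page.on('framenavigated', frame => {
      if (frame === page.mainFrame() && frame.url() !== url) {
        page.goBack().catch(() => {});
      }
    });

    await page.goto(url, { waitUntil: 'load' });

    // Inject gremlins only after load, so the log splits into "loading" vs "interacting".
    await page.addScriptTag({ path: 'gremlins.min.js' });   // assumed local copy of gremlins.js
    await page.evaluate(() => {
      // Start the horde without awaiting it, so control returns immediately.
      void (window as any).gremlins.createHorde().unleash();
    });
    await page.waitForTimeout(30_000);                      // interaction budget per trial (assumed)

    await context.close();                                  // flushes the HAR file
  }
  // Save space: remove user_data/ after all trials.
  fs.rmSync(userDataDir, { recursive: true, force: true });
}
```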

Analyze API call traces

  • Separate site load & interaction
    • Make the single gremlins injection split each VV8 log into a part w/o interaction and a part w/ interaction: use a separate browser page for each load.
    • When aggregating records, split at the gremlins injection point in the VV8 log.
  • Find anchor APIs, the most popular APIs overall and per script.
    • Filter out most internal, user-defined, and injected calls.
    • Analysis of API popularity in popular_api_calls_analysis.md.
      • Tail-heavy distribution: it takes 1.75% (318) of APIs to cover 80% of all API calls, and 3.74% (678) to cover 90% (figure: api_calls_cdf; a coverage-cutoff sketch follows this list).
      • Many calls happen before interaction begins.
      • DOM & event APIs dominate absolute counts.
      • Popularity per script is useless.
      • APIs called out in the proposal are somewhat popular.
    • Manually pick from the 678 APIs that make up 90% of calls; details in notable_apis.md.
  • Figure out frontend interaction / DOM element generation API classification
    • HTMLDocument.createElement before interaction is clearly DOM element generation.
    • Various addEventListener calls are frontend processing.
    • More potential heuristics in notable_apis.md.
    • We only somewhat know what spheres a script belongs to, but how do we know it does not belong to another sphere?
      • We can probably only claim we detect which sphere.
  • Split scripts into fine-grained units!!
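
A quick sketch, in TypeScript, of the coverage computation behind the 80%/90% cutoffs above; the ApiRecord shape is hypothetical and stands in for one filtered API-call record.

```typescript
// api_coverage_sketch.ts — counts API popularity and finds the coverage cutoff.
interface ApiRecord { api: string; }   // e.g. "Node.appendChild", already filtered

// Number of most-popular APIs needed to cover `target` fraction of all calls.
function coverageCutoff(records: ApiRecord[], target: number): number {
  const counts = new Map<string, number>();
  for (const r of records) counts.set(r.api, (counts.get(r.api) ?? 0) + 1);

  const sorted = [...counts.values()].sort((a, b) => b - a);
  const total = sorted.reduce((sum, c) => sum + c, 0);

  let covered = 0;
  for (let i = 0; i < sorted.length; i++) {
    covered += sorted[i];
    if (covered / total >= target) return i + 1;   // e.g. 318 APIs at 0.8, 678 at 0.9
  }
  return sorted.length;
}
```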

Log file interpretation

VV8 creates a log file per thread, roughly corresponding to a browser page we create, plus some junk background workers. Each $N/vv8-*.log file contains:

  • Before gremlins injection:
    • JS contexts created & their source code.
    • API calls in each context.
      • Guaranteed not for interactions.
  • After gremlins injection:
    • All of the above, but may be for interactions.
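
A sketch of how one vv8-*.log could be split at the gremlins injection. The record-type character and the GREMLINS_MARKER string are assumptions for illustration; the authoritative log format is the one documented in tests/README.md.

```typescript
// vv8_log_sketch.ts — split one vv8-*.log into "loading" vs "interacting" parts.
// Assumes the injected gremlins bundle carries a recognizable marker string, and that
// script-source records start with '$' (check tests/README.md for the real format).
import * as fs from 'fs';

const GREMLINS_MARKER = '__gremlins_injected__';   // hypothetical marker embedded in our bundle

interface SplitLog { loading: string[]; interacting: string[]; }

function splitByGremlins(logPath: string): SplitLog {
  const lines = fs.readFileSync(logPath, 'utf8').split('\n');
  const out: SplitLog = { loading: [], interacting: [] };
  let injected = false;

  for (const line of lines) {
    if (line.length === 0) continue;
    // Once the gremlins source shows up, later records may be interaction-driven.
    if (line[0] === '$' && line.includes(GREMLINS_MARKER)) injected = true;
    (injected ? out.interacting : out.loading).push(line);
  }
  return out;
}
```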

Observations when manually inspecting aggregated logs for YouTube

Details in youtube_scripts_api_calls_overview.md.

  • Strong indicators: popular APIs like addEventListener and appendChild strongly indicate specific spheres.
  • API pollution: getting and setting custom attributes on window, etc. are recorded, but they are not browser APIs. Functions generally seem more useful because we can and do filter out user-defined ones.
    • Largely dealt with by filtering on API names (alphanumeric or space only, at least 3 characters for this, 2 characters for attr, at most 3 consecutive digits); see the filter sketch after this list.
  • Useless information: getting and setting from window, calling Array, etc. generally mean nothing. API types (function, get, etc.) also seem useless once we consider this and attr.
    • Just track anchor APIs, and pick Function over Get for them.
  • Difficult scripts: some scripts only call a few APIs, so they are difficult to classify.
    • Do we care about every script or just big ones or just ones that call many APIs?
    • Many scripts are inline in the HTML, so how do we aggregate their stats over the 5 trials?
  • Aggregate multiple runs of the same scripts.
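
A sketch of the name filter described above, with the thresholds from that note; the function name and example inputs are illustrative.

```typescript
// api_name_filter_sketch.ts — drop API records whose names look user-defined or injected.
function looksLikeBrowserApi(thisName: string, attrName: string): boolean {
  const alnumSpace = /^[A-Za-z0-9 ]+$/;   // alphanumeric or space only
  const longDigitRun = /\d{4,}/;          // reject more than 3 consecutive digits
  return (
    alnumSpace.test(thisName) && thisName.length >= 3 && !longDigitRun.test(thisName) &&
    alnumSpace.test(attrName) && attrName.length >= 2 && !longDigitRun.test(attrName)
  );
}

// e.g. looksLikeBrowserApi('HTMLDocument', 'createElement') === true
// e.g. looksLikeBrowserApi('Window', 'ytCfg20240101') === false  (long digit run)
```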

Classification heuristics

By manually inspecting the 678 most popular APIs, which make up 90% of all API calls across the top 100 sites, we spot “anchor” APIs (list in notable_apis.md). See the classification results in classification_results.md; a scoring sketch follows the indicator lists below.

Certain indicators

  • Frontend processing
    • Get .*Event, Location (some attributes), HTML(Input|TextArea)Element.(value|checked)
    • Function addEventListener, getBoundingClientRect
      • These can also be used to trigger DOM element generation?
    • Set textContent and anything on URLSearchParams, DOMRect, DOMRectReadOnly
  • DOM element generation, before interaction begins
    • Function createElement, createElementNS, createTextNode, appendChild, insertBefore, CSSStyleDeclaration.setProperty
    • Set CSSStyleDeclaration, style
  • UX enhancement
    • Function removeAttribute, matchMedia, removeChild, requestAnimationFrame, cancelAnimationFrame, FontFaceSet.load, MediaQueryList.matches
    • Set hidden, disabled
  • Extensional features
    • Performance, PerformanceTiming, PerformanceResourceTiming, Navigator.sendBeacon

Intermediate indicators

  • XMLHttpRequest (and Window.fetch): send/fetch data from server, one of:
    • Form submission, CRUD → frontend processing.
    • Auth, tracking, telemetry → extensional features.
    • Load data onto page → DOM element generation (but will be detected through other API calls)?
  • SVGGraphicsElement subclasses and canvas elements: graphics for UX enhancement, but you can render them and send SVG, so maybe DOM element generation?
  • CSSStyleRule, CSSRuleList: UX enhancement or DOM element generation.
  • Window.scrollY: UX enhancement or frontend processing.

Uncertain indicators

  • querySelector[All], getElement[s]By.*: get a node, but then what?
  • .*Element’s contains, matches: search for a node or string, but then what?
  • Storage, HTMLDocument.cookie: local storage, but then what?
  • DOMTokenList: store/retrieve info on node, but then what?
  • IntersectionObserverEntry: viewport and visibility, but then what?
  • ShadowRoot: web components, but then what?
  • Crypto.getRandomValues
  • frames: iframes
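
A sketch of scoring one script against the “certain” anchor APIs above. The sphere names and the representative anchors come from this document and notable_apis.md; the shapes and scoring are illustrative, and per the note above this only claims which spheres a script touches, not which it avoids.

```typescript
// sphere_classifier_sketch.ts — count anchor-API hits per sphere for one script.
type Sphere =
  | 'frontend processing'
  | 'DOM element generation'
  | 'UX enhancement'
  | 'extensional features';

// A few representative anchors per sphere (full lists in notable_apis.md).
const ANCHORS: Record<Sphere, RegExp[]> = {
  'frontend processing': [/addEventListener$/, /getBoundingClientRect$/, /Event\./],
  'DOM element generation': [/createElement(NS)?$/, /createTextNode$/, /appendChild$/, /insertBefore$/],
  'UX enhancement': [/removeAttribute$/, /matchMedia$/, /requestAnimationFrame$/, /removeChild$/],
  'extensional features': [/^Performance/, /sendBeacon$/],
};

// apiCounts: aggregated API-call counts for one script (API name → count).
function classify(apiCounts: Map<string, number>): Record<Sphere, number> {
  const scores = {
    'frontend processing': 0,
    'DOM element generation': 0,
    'UX enhancement': 0,
    'extensional features': 0,
  } as Record<Sphere, number>;

  for (const [api, count] of apiCounts) {
    for (const sphere of Object.keys(ANCHORS) as Sphere[]) {
      if (ANCHORS[sphere].some(re => re.test(api))) scores[sphere] += count;
    }
  }
  return scores;
}
```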

Deferred

  • Would like
    • Clean up the APIs better.
    • Separate out the 5 trials.
    • Save space: compress logs.
    • Proper logging.
    • Checkpointing and resuming.
    • Concatenate chrome.1, chrome.2, and other such logs after their preceding logs (chrome.0) when analyzing, to avoid unknown execution context IDs.
  • Just thoughts
    • If top 1000 sites yield poor results, try sampling other sites.
    • Targeted event listener tests instead of chaos testing?