About

What this is

A research observatory for the passive fingerprints a server can compute about connecting clients — the same signal families commercial bot-detection and device-intelligence products use. Fingerprints are computed at the network edge from raw packets and protocol behaviour, never client-side from anything the page asks your browser to compute.

Signal families

TCP SYN (p0f): the p0f v3 raw_sig from the TCP handshake (TTL, MSS, window, option layout, quirks). TCP only; QUIC connections have none.
TCP SYN (JA4T): JA4T fingerprint rendered at the edge from the raw TCP SYN record: TCP window, ordered option kinds, MSS and window scale.
QUIC transport params: the set of QUIC transport-parameter IDs the client offers (sorted, GREASE-folded). This is the QUIC transport-layer analogue of the TCP SYN signature. QUIC only.
TLS ClientHello (JA4): JA4 fingerprint of the TLS ClientHello with the raw ja4_r variant (full cipher / extension / signature-algorithm lists). The transport that carried it is the fingerprint's own first character: t for TCP, q for QUIC.
HTTP/2 frames: Akamai-style fingerprint of initial SETTINGS order, WINDOW_UPDATE, PRIORITY and pseudo-header order. Only present on HTTP/2-over-TCP connections; availability is reported in coverage stats.
HTTP request (JA4H): method, version, header count, cookie and language attributes of the request, with public ja4h_r variants preserving ordered header names and sorted cookie names where available. Cookie values stay represented by the cookie-value hash in stored and displayed population data.
User-Agent: the self-declared identity, interesting mostly when it disagrees with the layers below it.
Origin (country / ASN): not a wire fingerprint, but coarse IP metadata derived at ingest, browsable like any other family.
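
As a rough illustration of three of the families above, the sketch below joins SYN fields in the JA4T layout (window, option kinds in wire order, MSS, window scale), folds QUIC transport-parameter IDs into a sorted GREASE-free list, and reads the transport from a JA4's first character. The function names and the "grease" token are assumptions made for this example, not the site's code. The GREASE test relies on RFC 9000 reserving IDs of the form 31*N + 27, and the JA4T helper assumes every field was present in the SYN.

```python
def render_ja4t(window: int, option_kinds: list[int], mss: int, window_scale: int) -> str:
    """Join SYN fields as window_options_mss_wscale; option kinds keep wire order."""
    options = "-".join(str(kind) for kind in option_kinds)
    return f"{window}_{options}_{mss}_{window_scale}"


def fold_quic_transport_params(param_ids: list[int]) -> list[str]:
    """Sort the offered parameter IDs and collapse all GREASE IDs into one token."""
    # RFC 9000 reserves IDs of the form 31*N + 27 for greasing.
    real_ids = sorted({pid for pid in param_ids if pid % 31 != 27})
    tokens = [format(pid, "x") for pid in real_ids]
    if any(pid % 31 == 27 for pid in param_ids):
        tokens.append("grease")  # hypothetical placeholder token
    return tokens


def ja4_transport(ja4: str) -> str:
    """Map a JA4's first character to the transport that carried the ClientHello."""
    return {"t": "TCP", "q": "QUIC"}.get(ja4[:1], "unknown")


if __name__ == "__main__":
    print(render_ja4t(64240, [2, 4, 8, 1, 3], 1460, 7))
    print(fold_quic_transport_params([0x04, 0x01, 27, 0x0F, 58]))
    print(ja4_transport("q13d0000h3_aaaaaaaaaaaa_bbbbbbbbbbbb"))  # made-up JA4
```

Folding GREASE before comparison matters because clients pick a fresh reserved ID per connection; without it, one client would appear as many distinct parameter sets.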

How counting works

Observations are recorded in three ways: passively on page views (deduplicated per client+stack within a short window, so counts approximate visits), actively by a browser beacon that probes both transports under a shared correlation id, and through an authenticated ingest API for other collection points. The site's own monitoring and operator traffic is excluded. Hourly per-fingerprint counts and pairwise co-occurrence counts are rolled up continuously; all observation data is pseudonymised before it is stored.
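
The counting scheme can be sketched as follows. This is a minimal in-memory model, not the site's pipeline: the Rollup class, the 30-minute window and the shape of the stack dictionary are all assumptions for illustration, and client_key is assumed to be pseudonymised before it reaches this code.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from itertools import combinations

DEDUP_WINDOW = timedelta(minutes=30)  # assumed; the real window length is not stated


class Rollup:
    """Hypothetical in-memory rollup of hourly and pairwise fingerprint counts."""

    def __init__(self) -> None:
        self.last_counted = {}   # (client_key, stack) -> time of last counted observation
        self.hourly = Counter()  # (hour, family, fingerprint) -> count
        self.pairs = Counter()   # ((family, fp), (family, fp)) -> count

    def record(self, client_key: str, stack: dict, when: datetime) -> bool:
        """Count one observation; return False if it was deduplicated."""
        stack_key = tuple(sorted(stack.items()))
        dedup_key = (client_key, stack_key)
        previous = self.last_counted.get(dedup_key)
        if previous is not None and when - previous < DEDUP_WINDOW:
            return False  # same client and same stack inside the window
        self.last_counted[dedup_key] = when
        hour = when.replace(minute=0, second=0, microsecond=0)
        for family, fingerprint in stack_key:
            self.hourly[(hour, family, fingerprint)] += 1
        # Every unordered pair of fingerprints seen together co-occurs once.
        for left, right in combinations(stack_key, 2):
            self.pairs[(left, right)] += 1
        return True


if __name__ == "__main__":
    rollup = Rollup()
    start = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
    stack = {"ja4": "A", "ja4t": "B"}
    print(rollup.record("client-1", stack, start))
    print(rollup.record("client-1", stack, start + timedelta(minutes=5)))
    print(rollup.record("client-1", stack, start + timedelta(minutes=45)))
```

Keying deduplication on the client together with its whole stack means a client that changes any layer (for example, by switching transports) is counted again, which is what lets page views approximate visits per stack.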

Bot claim checks

When a User-Agent claims a known crawler or user-triggered fetcher, Thumbprint compares the source IP with the ranges that operator publishes. The current checks cover Googlebot, Bingbot, GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, CCBot, Applebot, DuckDuckBot and AhrefsBot.
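
A claim check of this kind might look like the sketch below. The bot token, the CIDR blocks (reserved documentation ranges) and the result labels are placeholders invented for this example; a real check would load each operator's published list rather than hard-code it.

```python
import ipaddress

# Placeholder data: "examplebot" and these documentation ranges are not real
# operator data; real ranges come from each operator's published list.
PUBLISHED_RANGES = {
    "examplebot": ["192.0.2.0/24", "2001:db8::/32"],
}


def check_bot_claim(user_agent: str, source_ip: str, published=PUBLISHED_RANGES) -> str:
    """Return 'verified', 'mismatch' or 'no-claim' for one request."""
    address = ipaddress.ip_address(source_ip)
    lowered = user_agent.lower()
    for token, cidrs in published.items():
        if token in lowered:
            # An address never matches a network of the other IP version.
            in_range = any(address in ipaddress.ip_network(cidr) for cidr in cidrs)
            return "verified" if in_range else "mismatch"
    return "no-claim"


if __name__ == "__main__":
    claimed = "Mozilla/5.0 (compatible; ExampleBot/1.0)"
    print(check_bot_claim(claimed, "192.0.2.10"))
    print(check_bot_claim(claimed, "198.51.100.7"))
    print(check_bot_claim("Mozilla/5.0 (X11; Linux x86_64)", "192.0.2.10"))
```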

Privacy

Fingerprint labels

Where shown, the application a JA4 is "identified as" comes from the community JA4+ database by FoxIO, used here under the BSD-3-Clause licence covering JA4 TLS client fingerprinting. JA4 fingerprints a client's TLS stack, not its identity, so different applications built on the same stack (notably the many Chromium-based browsers) legitimately share one JA4 — labels are therefore shown as a distribution, not a verdict. Labels are sourced only for the JA4 family; the snapshot is updated manually.
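
To make "a distribution, not a verdict" concrete, the sketch below turns the labels recorded against one JA4 into shares. The function name and the sample labels are hypothetical; the actual JA4+ database schema is not assumed here.

```python
from collections import Counter


def label_distribution(labels: list[str]) -> list[tuple[str, float]]:
    """Return (label, share) pairs for one fingerprint, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, count / total) for label, count in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical labels attached to a single JA4 value.
    for label, share in label_distribution(["Chromium", "Chromium", "Chromium", "Edge"]):
        print(f"{label}: {share:.0%}")
```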

Tag colour marks provenance throughout the site: violet tags are JA4+ database inferences, green tags are this site's own measured catalog data, and blue tags are values derived directly from the wire.

Controlled-client catalog

The catalog upgrades labels from inference to measurement: known clients (browsers, CLI tools, HTTP libraries, automation engines) are driven through this site's own edge in pre-registered sessions, so each entry records exactly which client produced which multi-signal stack. Sessions are registered through an authenticated API before the run and captured via a single-use URL, so nobody can label their own traffic after the fact. Catalog traffic never enters the population statistics. Where a fingerprint page shows "matches controlled captures", that is this dataset speaking. The catalog is self-generated and exportable (JSON, CSV). Two caveats apply. Linux container captures share the capture host's TCP signature and network. Automation-bundled engines (e.g. Playwright's) are recorded as distinct from branded builds, because that gap is itself research signal.
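
The register-then-capture flow can be sketched as a single-use token registry. Everything here (the SessionRegistry class, the token length, the entry shape) is an assumption for illustration; authentication and URL routing are left out.

```python
import secrets


class SessionRegistry:
    """Hypothetical registry: the client is declared before any traffic is captured."""

    def __init__(self) -> None:
        self._pending = {}  # single-use token -> declared client

    def register(self, declared_client: str) -> str:
        """Pre-register a session and return the token for its single-use URL."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = declared_client
        return token

    def capture(self, token: str, observed_stack: dict):
        """Bind the observed stack to the declared client, consuming the token."""
        declared_client = self._pending.pop(token, None)
        if declared_client is None:
            return None  # unknown or already-used token: nothing is labelled
        return {"client": declared_client, "stack": observed_stack}


if __name__ == "__main__":
    registry = SessionRegistry()
    token = registry.register("example-cli 1.0")  # made-up client name
    print(registry.capture(token, {"ja4": "t13..."}))
    print(registry.capture(token, {"ja4": "t13..."}))  # second use is rejected
```

Because the label is fixed at registration and the token is consumed on first capture, traffic that arrives without a fresh token cannot be attached to a client name afterwards.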

Probe origins

Browser catalog sessions use named probe origins so one declared client can be measured across transports with the same correlation id.

The resumption probes deliberately issue tickets and close connections so a second request can expose resumed ClientHellos. Ordinary observatory hosts keep automatic TLS 1.3 tickets disabled, so those server-induced pre_shared_key variants stay isolated to the resumption probes.

Prior art

This project is inspired by tlsfingerprint.io (Frolov & Wustrow, "The use of TLS in Censorship Circumvention", NDSS 2019), generalised from TLS ClientHellos to the full transport/application stack, with JA4-suite fingerprint formats by FoxIO.