Math prerequisites for Bitcoin (side-quest)

Contents · 5 sections

Modular arithmetic: the wrap-around clock
Hash functions: the fingerprint machine
Elliptic curves: the one-way lock
Law of large numbers: what “on average” really means
Going further

You’re here because you clicked “math prerequisites” from an article in the Inside the gears of Bitcoin series. This article isn’t part of the series itself — it sits to the side, like a side-quest in a video game: you stop by to pick up what you’re missing, then you go back where you were.

We’ll cover four building blocks: modular arithmetic, hash functions, the intuition of an elliptic curve, and the law of large numbers. Not a course — just the idea of each one, and why Bitcoin needs it. If you already know one of the four, skip the section.

Modular arithmetic: the wrap-around clock

You know how to read a clock. Set the hour hand on 11, move forward by 3 hours: the hand lands on 2. Not on “14” — the dial only has twelve slots, and after 12, it starts over from zero. You just did modular arithmetic mod 12, without realizing it: (11 + 3) mod 12 = 2. The clock forgets the complete laps, it only keeps the remainder.

Formally: a mod n is the remainder when a is divided by n. The clock does mod 12, but you can do mod any positive integer: 27 mod 12 = 3, 100 mod 7 = 2, 42 mod 5 = 2. Same “clock” idea, just with a dial of 7 slots, 5 slots, or a number of slots that takes 78 decimal digits to write down.

In this world, addition and multiplication keep working the way you’re used to. (8 + 9) mod 12 = 17 mod 12 = 5 — the clock points to 5. And (4 × 5) mod 12 = 20 mod 12 = 8. You can add, multiply, raise to powers, everything stays consistent inside the “12 slots” of the clock.
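To check the clock examples yourself, here is a minimal Python sketch. The helper names clock_add and clock_mul are made up for this illustration; the % operator and the three-argument pow are built into the language.

```python
def clock_add(a, b, n=12):
    """Add two numbers on a dial with n slots."""
    return (a + b) % n

def clock_mul(a, b, n=12):
    """Multiply two numbers on a dial with n slots."""
    return (a * b) % n

print(clock_add(11, 3))  # 11 o'clock plus 3 hours
print(clock_add(8, 9))   # 17 mod 12
print(clock_mul(4, 5))   # 20 mod 12
print(pow(2, 10, 12))    # 2**10 mod 12, reduced as it goes
```

The three-argument pow matters later: with 78-digit numbers, reducing at every step is what keeps powers computable at all.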

What changes is division. In the normal world, 1 / 3 = 0.333…. In a modular world, you have to find the number that, multiplied by 3, gives 1 modulo n. Sometimes it exists: modulo 7 it is 5, because 3 × 5 = 15 and 15 mod 7 = 1. Sometimes it doesn’t: modulo 12, the multiples of 3 only ever land on 0, 3, 6, or 9, never on 1. It’s more subtle, and that’s exactly the subtlety Bitcoin exploits.
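A small sketch of modular “division”, assuming Python 3.8 or later, where pow(a, -1, n) returns the modular inverse and raises ValueError when none exists. The wrapper name mod_inverse is hypothetical.

```python
def mod_inverse(a, n):
    """Return x with (a * x) % n == 1, or None when no such x exists."""
    try:
        return pow(a, -1, n)  # built-in modular inverse (Python 3.8+)
    except ValueError:
        return None           # a and n share a common factor

print(mod_inverse(3, 7))   # an inverse exists modulo 7
print(mod_inverse(3, 12))  # no inverse modulo 12
```

An inverse exists exactly when a and n have no common factor, which is one reason cryptographic systems like to work modulo a prime.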

Why Bitcoin: Bitcoin’s key and signature calculations happen in a universe modulo a huge number p, roughly 2²⁵⁶, that is 78 decimal digits. At that size, guessing where you are by trial and error becomes physically impossible: there are more positions to try than there are atoms in the solar system. That size is what the security rests on.
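To get a feel for that size: the article only says “roughly 2²⁵⁶”, so the exact value below is an addition, taken from the published secp256k1 parameters. Python handles integers this large natively.

```python
# Field prime of secp256k1 (value from the curve's published parameters).
p = 2**256 - 2**32 - 977

print(p)               # the full number
print(len(str(p)))     # how many decimal digits it has
print(p.bit_length())  # how many bits it has
```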

Hash functions: the fingerprint machine

A hash function takes anything as input — a word, a book, a 10 GB file — and outputs a number of fixed size. The function Bitcoin uses (SHA-256) always outputs 256 bits, whether you give it “hello” or “In Search of Lost Time” in its entirety.

Three properties make SHA-256 a useful cryptographic tool, not just a data compressor:

Deterministic: the same input always produces the same fingerprint. Always.
Irreversible in practice (one-way): from the fingerprint alone, no known method finds a matching input faster than guessing, and each guess has a 1 in 2²⁵⁶ chance of being right. Expect on the order of 2²⁵⁶ attempts, meaning never.
Collision-resistant: finding two different inputs that yield the same fingerprint is also infeasible. Collisions must exist (unlimited inputs, only 2²⁵⁶ outputs), but the best generic search still needs about 2¹²⁸ attempts. That’s what prevents an attacker from manufacturing two blocks with the same fingerprint to swap one for the other.
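The first two ideas (fixed output size, determinism) are easy to observe with Python’s standard hashlib module. This is only a sketch; the helper name fingerprint is made up here.

```python
import hashlib

def fingerprint(data):
    """SHA-256 of some bytes, as 64 hexadecimal characters (256 bits)."""
    return hashlib.sha256(data).hexdigest()

short = fingerprint(b"hello")
long_ = fingerprint(b"hello" * 1_000_000)  # about 5 MB of input

print(short)
print(long_)
print(len(short) * 4, len(long_) * 4)  # output size in bits, same for both
print(fingerprint(b"hello") == short)  # same input, same fingerprint
```

One-wayness and collision resistance cannot be demonstrated by running code: they are claims about what nobody has managed to do.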

One telling detail: a single bit changed in the input flips roughly half of the bits in the output. Hash “hello”, hash “Hello”, and the two fingerprints have practically nothing in common. This is called the avalanche effect, and it’s what prevents anyone from “tweaking” the input to aim at a specific output.
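You can measure the avalanche effect directly. “hello” and “Hello” differ by a single bit in ASCII, so counting the differing output bits is a fair test. A sketch with a hypothetical helper, differing_bits:

```python
import hashlib

def differing_bits(a, b):
    """Count the bit positions where SHA-256(a) and SHA-256(b) differ."""
    ha = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hb = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(ha ^ hb).count("1")

print(differing_bits(b"hello", b"Hello"), "of 256 bits differ")
```

Expect a number in the neighborhood of 128, not exactly 128: each output bit behaves like an independent coin flip.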

Why Bitcoin: at least three things. (1) Proof of work consists in finding an input whose hash falls below a target, which in practice means it starts with many zeros. No trick possible, you have to try (cf. BTC-A-03 mining, coming). (2) Each block contains the fingerprint of the previous one, which links blocks to one another: modifying an old block changes its fingerprint and breaks the whole chain after it. (3) Classic Bitcoin addresses are hashes of public keys: you share the fingerprint, you keep the key.
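Point (1) can be tried at toy scale. The sketch below is a simplification, not Bitcoin’s actual procedure: real mining applies SHA-256 twice to an 80-byte block header, while this version hashes arbitrary bytes once. The function name mine and the 16-bit difficulty are choices made for this example.

```python
import hashlib

def mine(data, zero_bits):
    """Try nonces 0, 1, 2, ... until SHA-256(data + nonce) has zero_bits leading zero bits."""
    target = 1 << (256 - zero_bits)  # any hash below this value qualifies
    nonce = 0
    while True:
        digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
        nonce += 1

nonce, digest = mine(b"toy block", zero_bits=16)
print(nonce, digest)  # the digest starts with four hexadecimal zeros
```

With 16 zero bits, about 2¹⁶ tries are needed on average. Each extra zero bit doubles the work, and checking the result still takes a single hash.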

Elliptic curves: the one-way lock

Now we hit the most exotic building block. Don’t panic: we’ll stay at the intuition level.

An elliptic curve is a curve in a plane, defined by an equation of the form y² = x³ + ax + b. Bitcoin uses a specific curve called secp256k1, with the equation y² = x³ + 7. Visually, it looks like a parabola lying on its side, opening to the right, with a smooth rounded tip on the left (an elliptic curve never has a cusp):

The curve y² = x³ + 7 over the reals. G is the generator point fixed by the secp256k1 standard; the construction shows how 2G = G + G is computed via the tangent method.

What makes these curves special is that you can define an operation called “point addition” on them. It is not coordinate-wise addition but a geometric operation. The diagram shows the case that matters directly for Bitcoin: adding a point to itself. Start from the point G (the generator, fixed by the secp256k1 standard, public coordinates, identical for everyone), draw the tangent to the curve at G, look at where that tangent meets the curve elsewhere (the point T), and reflect that point across the horizontal axis: you get a new point called 2G = G + G. It’s strange, it looks arbitrary, but it’s well-defined: given G, the resulting 2G is unique.

The rule generalizes for two distinct points: instead of a tangent, you draw the secant (the line through both), take the 3rd intersection with the curve, reflect it. Tangent = the special case where the two points coincide. In both cases you get a third point on the curve — that’s what we call “adding” on the curve.
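Here is what the tangent and secant rules look like as code. Two assumptions to flag: the arithmetic is done modulo a prime, as Bitcoin does, rather than over the reals as in the drawing, and the prime is a toy value (17) so the numbers stay readable. The point G = (15, 13) and the function names are choices made for this sketch, not part of secp256k1.

```python
P_MOD = 17  # toy prime, for illustration only

def on_curve(pt):
    """True if pt satisfies y^2 = x^3 + 7 (mod P_MOD). None is the point at infinity."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - x**3 - 7) % P_MOD == 0

def add(p1, p2):
    """Add two curve points with the tangent/secant rule, all arithmetic mod P_MOD."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # vertical line: the two points cancel out
    if p1 == p2:
        slope = 3 * x1 * x1 * pow((2 * y1) % P_MOD, -1, P_MOD)  # tangent
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)   # secant
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD  # the reflection is folded in here
    return (x3, y3)

G = (15, 13)
two_g = add(G, G)        # tangent case
three_g = add(two_g, G)  # secant case
print(two_g, three_g, on_curve(two_g), on_curve(three_g))
```

The “division” in each slope is the modular inverse from the first section: that is where the two building blocks meet.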

Once you can add, you can iterate: 3G = 2G + G, 4G = 3G + G (or more cleverly: 4G = 2G + 2G), and more generally, for any positive integer k, the point k · G is defined as the sum of k copies of G. For astronomical k (around 10⁷⁷), nobody adds one G at a time: repeated doubling reaches k · G in a few hundred steps. The idea is still that we walk the curve, jumping from point to point.
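The “doubling cleverly” trick is usually called double-and-add. A sketch on the same kind of toy curve (prime 17, point (15, 13), both chosen for illustration), with the result checked against plain repeated addition. Function names are hypothetical.

```python
P_MOD = 17  # toy prime, for illustration only

def add(p1, p2):
    """Point addition on y^2 = x^3 + 7 (mod P_MOD). None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        slope = 3 * x1 * x1 * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    x3 = (slope * slope - x1 - x2) % P_MOD
    return (x3, (slope * (x1 - x3) - y1) % P_MOD)

def multiply(k, point):
    """k * point by double-and-add: one doubling per bit of k."""
    result = None   # start from the "zero" of point addition
    addend = point  # holds point, 2*point, 4*point, ...
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result

def multiply_naive(k, point):
    """k * point by adding the point k times (fine only for tiny k)."""
    result = None
    for _ in range(k):
        result = add(result, point)
    return result

G = (15, 13)
print(all(multiply(k, G) == multiply_naive(k, G) for k in range(1, 40)))
```

For a 256-bit k, the loop in multiply runs 256 times. The naive version would need about 10⁷⁷ additions.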

And here’s the trick that grounds Bitcoin. If you pick a large random number k — typically 256 bits, so ≈ 10⁷⁷ — and compute P = k · G, you get a new point P on the curve, among the ≈ 2²⁵⁶ possible points.

Going from k to P is easy. Well under a millisecond on a phone.
Going from P back to k is impossible in practice. No known shortcut comes anywhere close to feasible. Trying every possible k means ≈ 2²⁵⁶ attempts, more than the number of atoms in the galaxy, and even the best known generic attacks still need about 2¹²⁸ curve operations, far beyond any computation ever carried out.
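Some back-of-the-envelope arithmetic makes the asymmetry concrete. The machine speed below (10¹⁸ curve operations per second) is a purely hypothetical figure picked for the example, and “square-root attack” refers to generic methods such as Pollard’s rho, which need about the square root of the number of possible keys.

```python
BITS = 256
OPS_PER_SECOND = 10**18  # hypothetical attacker speed, not a measured figure
SECONDS_PER_YEAR = 365 * 24 * 3600

forward_steps = 2 * BITS                # at most one doubling and one addition per bit of k
try_every_key = 2**BITS                 # naive search over all k
square_root_attack = 2**(BITS // 2)     # generic attacks such as Pollard's rho

years = square_root_attack / OPS_PER_SECOND / SECONDS_PER_YEAR

print(forward_steps)             # work to go from k to P
print(f"{try_every_key:.2e}")    # work for the naive way back
print(f"{years:.2e} years")      # best generic attack at the assumed speed
```

A few hundred steps one way, about ten trillion years the other way even under a generous assumption.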

This problem is called the elliptic curve discrete logarithm problem (ECDLP). If someone tomorrow found a trick to solve it fast on ordinary computers, Bitcoin would be broken, along with much of HTTPS traffic, biometric passports, and a large share of modern cryptographic systems. Nobody has found one in 40 years.

Why Bitcoin: k is your private key, P is your public key. You can share P with the whole world (it’s the base of your address), and nobody can work back from it to k. And you can prove you know k without revealing it: that’s what a digital signature is. The whole edifice of ownership on Bitcoin rests on this easy/impossible asymmetry.
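For the curious, here is the same computation at full size: a random k and the matching P = k · G on secp256k1. The constants are the curve’s published parameters, not something stated in this article. Treat the code as a teaching sketch only: it makes no attempt at the protections real wallet software needs, so don’t use it for real funds.

```python
import secrets

# secp256k1 published parameters: field prime, group order, generator point.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def on_curve(pt):
    """True if pt satisfies y^2 = x^3 + 7 (mod P)."""
    x, y = pt
    return (y * y - x**3 - 7) % P == 0

def add(p1, p2):
    """Point addition mod P. None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        slope = 3 * x1 * x1 * pow((2 * y1) % P, -1, P)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P)
    x3 = (slope * slope - x1 - x2) % P
    return (x3, (slope * (x1 - x3) - y1) % P)

def multiply(k, point):
    """k * point by double-and-add."""
    result, addend = None, point
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result

private_key = secrets.randbelow(N - 1) + 1  # random k between 1 and N - 1
public_key = multiply(private_key, G)       # P = k * G

print(hex(private_key))
print(hex(public_key[0]), hex(public_key[1]))
```

Run it twice and you get two unrelated key pairs. Going from the printed public key back to the private key is the ECDLP.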

Law of large numbers: what “on average” really means

Last building block, softer. When you flip a coin once, you get heads or tails: the “expectation” of 0.5 is not something a single flip can show. When you flip 10, you often get 6 heads and 4 tails, or 4 and 6: close to half, without being there. When you flip 10,000, you’ll be very close to 50% heads, typically within one percentage point.

That’s the law of large numbers: the more you repeat a random experiment, the more the observed mean converges to the theoretical mean (the expectation).

Coin flip simulation: the measured heads frequency converges to 0.5 as the number of flips grows. Three independent trajectories, same asymptotic behavior.
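A simulation of this kind takes a few lines with Python’s standard random module. The seed value and the function name below are arbitrary choices for this sketch, so the exact numbers will not match the figure’s three trajectories.

```python
import random

def heads_frequency(flips, rng):
    """Flip a fair coin `flips` times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

rng = random.Random(2024)  # fixed seed so reruns print the same numbers
for n in (10, 100, 10_000, 1_000_000):
    print(n, heads_frequency(n, rng))
```

Change the seed and the small runs jump around, while the million-flip run barely moves.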

Three things to keep in mind:

The result stabilizes. Beyond a certain number of trials, the mean stops moving significantly.
But it almost never lands exactly on it. Even at 1 million flips, you won’t get exactly 500,000 heads: you’ll typically be off by a few hundred. The gap doesn’t disappear. In absolute terms it even grows, roughly like the square root of the number of flips; it just becomes small relative to the number of trials.
Early deviations don’t “compensate.” If you have 60 heads in the first 100 flips, the next 100 won’t produce 40 heads to “balance things out” — they’ll produce ~50, like all the others. It’s just that in the larger average, those 10 excess heads become statistically insignificant.
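The last two points can be put in numbers. For a fair coin, the typical gap between the heads count and half the flips is the standard deviation, √n / 2. The sketch below uses that textbook formula plus simple expected-value arithmetic; the function names are made up here.

```python
import math

def typical_gap(flips):
    """Standard deviation of the number of heads in `flips` fair coin flips."""
    return math.sqrt(flips) / 2

for n in (100, 10_000, 1_000_000):
    gap = typical_gap(n)
    print(n, gap, gap / n)  # the absolute gap grows, the relative gap shrinks

def expected_share(early_heads, early_flips, total_flips):
    """Expected share of heads after total_flips, given a lopsided start."""
    later_flips = total_flips - early_flips
    return (early_heads + 0.5 * later_flips) / total_flips

print(expected_share(60, 100, 200))     # the next 100 flips bring about 50 heads
print(expected_share(60, 100, 10_000))  # the 10 extra heads get diluted
```

Nothing “compensates”: the 10 extra heads are still there after 10,000 flips, they just weigh 0.1% of the total.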

Why Bitcoin: at least two places.

Mining and difficulty (BTC-A-03, coming). Each miner attempt is a draw: is the hash below the target? Yes or no, almost always no. A single block can show up after one minute or after forty, but averaged over many blocks the time per block settles near its expected value: that is the law of large numbers at work (at 10²⁴ attempts per block in 2026). That’s why the protocol adjusts difficulty only every 2,016 blocks: over a window that long, the measured average is a reliable gauge of the network’s brute power, and the target can be reset so the average stays at 10 minutes per block.
Probability of double-spend (BTC-A-05, coming). When people say “waiting for 6 confirmations makes a transaction practically irreversible,” that “practically” is quantified: Satoshi computed this probability in the 2008 paper. For an attacker with less than half of the network’s computing power, it drops exponentially with the number of blocks: the longer the race, the less a lucky streak can make up for a slower pace, which is the law of large numbers in action.
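The double-spend figure can be recomputed. The sketch below is a Python port of the calculation given in section 11 of the 2008 paper, where q is the attacker’s share of the computing power and z the number of confirmations. The function name is made up here, and the 10% share is just an example value.

```python
import math

def attacker_success(q, z):
    """Probability that an attacker with share q ever catches up from z blocks behind."""
    p = 1.0 - q
    lam = z * (q / p)  # expected attacker progress while z honest blocks are found
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        total -= poisson * (1 - (q / p) ** (z - k))
    return total

for z in range(7):
    print(z, attacker_success(0.10, z))
```

Each extra confirmation divides the attacker’s chances by a roughly constant factor, which is what “drops exponentially” means here. The formula only applies while q is below one half.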

A useful idea to carry over: a block found after 10 attempts and a block found after 10²⁴ attempts don’t say the same thing about the amount of computation behind them. That difference in scale is what secures the network.
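The mining case can be simulated at toy scale. Assumptions for this sketch: every attempt succeeds independently with probability 1 in 1,000 (real mining odds are astronomically smaller), and the number of attempts per block is drawn directly from the matching geometric distribution instead of hashing. Names and the seed are arbitrary.

```python
import math
import random

def attempts_until_block(success_prob, rng):
    """Number of independent tries up to and including the first success."""
    u = rng.random()
    return int(math.log(1.0 - u) / math.log(1.0 - success_prob)) + 1

rng = random.Random(7)  # fixed seed so reruns print the same numbers
p = 1 / 1000            # toy success probability per attempt

few_blocks = [attempts_until_block(p, rng) for _ in range(5)]
window = [attempts_until_block(p, rng) for _ in range(2016)]

print(few_blocks)                 # individual blocks: all over the place
print(sum(window) / len(window))  # average over 2,016 blocks: close to 1,000
```

Individual blocks are wildly irregular; the average over a 2,016-block window is steady enough to measure how much computing power is out there.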

Going further

You have enough to keep going. Head back to the article in the series you were reading — the link is in the next tab. And if a new word ever blocks you, come back here: these four building blocks cover ≈ 90% of the prerequisites for the series.

This article is not investment advice.
