Detection As Information Gain

You do not know which world you are in. Breached or not breached. Endogenous or exogenous. Normal system behavior or adversarial system behavior. Every alert is evidence. Most alerts are weak evidence. The job is to ask questions that collapse uncertainty fastest.

TL;DR

  • The real question: Has the network we are responsible for defending been breached?
  • The model: A detection is valuable in proportion to the number of plausible worlds it eliminates.
  • The failure mode: Most alerts are weak evidence. They leave too many possible worlds alive and force the analyst to become the reasoning layer.
  • The better question: Does this communication belong to the system, or was it imposed into the system by an adversary?

Every security program is organized around a question it almost never says out loud.

Has the network we are responsible for defending been breached?

That is the real question. Everything else is instrumentation around it.

The SIEM sees events. The EDR sees processes. The identity provider sees logins. The firewall sees sessions. Each surface emits fragments.

But the defender is not trying to classify fragments.

The defender is trying to infer the state of the system.

The frame is Bayesian without needing math.

  1. State: The network is in one of many possible worlds.
  2. Evidence: Each alert, flow, login, and session changes the distribution.
  3. Update: Useful evidence rules out plausible worlds.
  4. Question: The next move should collapse uncertainty faster.

That is detection. Not alerting. Inference.
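
For readers who do want the math, the state, evidence, and update steps can be sketched in a few lines. This is a minimal illustration, not a production model: the world names are borrowed from this article, and every probability below is a hypothetical number chosen only to show the mechanics.

```python
def update(prior, likelihood):
    """Return the posterior over worlds after one observation.

    prior:      world -> P(world)
    likelihood: world -> P(observation | world)
    """
    unnormalized = {w: prior[w] * likelihood[w] for w in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation is impossible under every world")
    return {w: p / total for w, p in unnormalized.items()}


# Hypothetical prior: most of the time nothing adversarial is happening.
prior = {"user error": 0.90, "credential attack": 0.08, "C2": 0.02}

# Hypothetical likelihoods: a failed VPN login is common under user error
# too, so it can only shift belief modestly.
failed_login = {"user error": 0.30, "credential attack": 0.60, "C2": 0.05}

posterior = update(prior, failed_login)
for world, p in posterior.items():
    print(f"{world}: {p:.3f}")
```

With these made-up numbers, "user error" remains by far the most probable world after the alert. That is what weak evidence looks like in this frame.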

Draw the unknown as a black square.

The square is not empty. It contains every world still consistent with what you know.

One world contains a compromised workstation. One contains a user mistyping a password. One contains a misconfigured backup job. One contains an attacker maintaining command and control through legitimate cloud infrastructure. One contains normal business traffic. One contains data theft.

Before evidence arrives, too many worlds fit.

[Figure: the prior, an unknown state. No observation has arrived and 64 of 64 worlds remain, among them user error, normal workflow, credential attack, C2, and data theft. Caption: Before evidence arrives, the square contains every world still consistent with what you know.]

Mental model: A detection is valuable in proportion to the number of plausible worlds it eliminates.

Bad detections leave the square almost unchanged.

Good detections collapse it.

Consider a common alert:

Invalid login attempt from a VPN user.

This is evidence.

It may be password spraying. It may be credential testing. It may be an attacker probing the enterprise edge.

It may also be a user typing the wrong password. A stale mobile client. An expired contractor account. A cached VPN profile on an old laptop. An authentication retry from a system that already healed.

Many worlds survive.

[Figure: a weak update with low information gain. Observation: invalid login attempt from a VPN user. A failed VPN login removes little: 46 of 64 worlds remain, among them user typo, stale client, password spray, credential test, and travel. Caption: Weak evidence changes the square, but leaves many benign and adversarial worlds alive.]

The alert tells us an authentication attempt failed. It does not tell us whether an adversarial system is present. It does not tell us whether anything is under external control. It does not tell us whether command is being maintained, whether movement is occurring, whether data is being staged, or whether the network is producing behavior it would not have produced on its own.

The analyst now has to ask the real questions.

  • Was this one failure or many?
  • Is the account privileged?
  • Did the source later succeed?
  • Has this source appeared before?
  • Did the user authenticate from another geography?
  • Did a host begin doing anything new after the login?

Each pivot is an attempt to gain information the original alert did not carry.
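
Several of these pivots are mechanical enough for a machine to answer before an analyst sees the alert. The sketch below assumes a hypothetical `AuthEvent` record and an in-memory history list. Real log schemas differ, and the last pivot (new host behavior after the login) needs host or network telemetry that is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthEvent:
    # Hypothetical record shape, for illustration only.
    user: str
    source_ip: str
    country: str
    success: bool
    timestamp: float  # seconds since epoch


def pivot(alert, history, privileged_users):
    """Answer the first five pivot questions for one failed-login alert.

    `history` holds other authentication events and excludes the alert itself.
    """
    same_source = [e for e in history if e.source_ip == alert.source_ip]
    same_user = [e for e in history if e.user == alert.user]
    return {
        # One failure or many? Count the alert plus other failures from the source.
        "failures_from_source": 1 + sum(1 for e in same_source if not e.success),
        "account_privileged": alert.user in privileged_users,
        "source_later_succeeded": any(
            e.success and e.timestamp > alert.timestamp for e in same_source
        ),
        "source_seen_before": any(e.timestamp < alert.timestamp for e in same_source),
        "other_countries": sorted(
            {e.country for e in same_user if e.country != alert.country}
        ),
    }


alert = AuthEvent("alice", "203.0.113.7", "BR", False, 1000.0)
history = [
    AuthEvent("alice", "198.51.100.2", "US", True, 500.0),
    AuthEvent("bob", "203.0.113.7", "BR", False, 900.0),
    AuthEvent("alice", "203.0.113.7", "BR", True, 1100.0),
]
answers = pivot(alert, history, privileged_users={"alice"})
print(answers)
```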

The machine emits a fragment. The human has to discover whether the fragment matters.

Now consider a different observation.

An internal host that normally talks to a small set of internal services begins maintaining a machine-timed HTTPS channel to an external service it has never used before. The channel appears after business hours. It is bidirectional. It persists across interruptions. Its timing is regular. The destination is a legitimate cloud service, but this host's role has no history of producing that communication.

This is also evidence.

But it is more informative evidence.

Fewer worlds survive.

Many benign worlds can produce a failed VPN login. Fewer benign worlds produce a new, persistent, machine-timed external channel from a host whose role does not explain it.
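
One rough way to make "machine-timed" concrete is to measure how regular the gaps between connections are. The sketch below uses the coefficient of variation of inter-connection gaps. The function names and both thresholds are assumptions for illustration, not a description of any product's detector, and a channel with deliberately heavy jitter would slip past a check this simple.

```python
import statistics


def interval_cv(timestamps):
    """Coefficient of variation of the gaps between connection start times."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None  # too few connections to judge
    mean = statistics.mean(gaps)
    if mean == 0:
        return None  # all connections share one timestamp
    return statistics.pstdev(gaps) / mean


def looks_machine_timed(timestamps, max_cv=0.1, min_connections=10):
    """True when connections are numerous and their spacing is very regular.

    max_cv and min_connections are illustrative thresholds, not tuned values.
    """
    if len(timestamps) < min_connections:
        return False
    cv = interval_cv(timestamps)
    return cv is not None and cv <= max_cv


# A check-in every 60 seconds with a small alternating jitter.
beacon = [i * 60 + (1 if i % 2 else -1) for i in range(20)]
# Bursty, irregular activity of the kind a person tends to produce.
human = [0, 5, 7, 300, 310, 2000, 2003, 2500, 9000, 9001, 9500]

print(looks_machine_timed(beacon), looks_machine_timed(human))
```

Regular timing alone proves little, because update checkers and monitoring agents are machine-timed too. It gains weight when it coincides with the other facts in the observation: a new destination, an unusual hour, and a host role that does not explain the channel.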

The observation is closer to the mechanics of post-breach operation.

If an attacker is inside an enterprise, the attacker has to preserve coherence. They need to coordinate activity, receive instructions, maintain access, move through the environment, stage actions, and adapt.

Coordination requires communication.

Command and control is not merely a technique. It is the nervous system of a post-breach operation.

The destination can be legitimate. The protocol can be normal. The certificate can be valid. The payload can be encrypted. The event can look ordinary in isolation.

The question that survives all of that is: does this communication belong?

Modern security tooling is biased toward what is easy to observe.

Authentication failures are easy to emit. Process launches are easy to log. Known malicious hashes are easy to match. Blocklisted destinations are easy to score. Rule matches are easy to count.

None of that makes them maximally informative.

This is the category error underneath a lot of detection work. We confuse what is observable with what is useful. We inherit the shape of our sensors as the shape of our thinking.

If the tool emits events, the SOC becomes event-shaped.

If the tool emits alerts, the SOC becomes alert-shaped.

If the tool emits fragments, the SOC becomes a fragment-reconciliation machine.

But the attacker is not a fragment.

A mature intrusion is a distributed adversarial system.

Its events are partial observations.

Its coherence lives in communication.

Design question: Which observations eliminate the most plausible worlds?
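
One standard way to compare candidate questions is expected information gain. The sketch below makes two simplifying assumptions that this article does not: every remaining world is equally likely, and each question has a definite answer in each world. Under those assumptions the expected gain equals the entropy of the answer distribution. The eight numbered worlds and both questions are placeholders.

```python
import math
from collections import Counter


def expected_bits(worlds, answer_of):
    """Expected information gain, in bits, of asking one question.

    worlds:    the worlds still alive, assumed equally likely
    answer_of: maps a world to the answer the question would get there
    """
    counts = Counter(answer_of(w) for w in worlds)
    n = len(worlds)
    # Each answer occurs with probability c/n and leaves c worlds alive,
    # which is a gain of log2(n / c) bits.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


worlds = list(range(8))  # placeholder worlds

uneven = expected_bits(worlds, lambda w: w == 0)  # splits the worlds 1 / 7
even = expected_bits(worlds, lambda w: w < 4)     # splits the worlds 4 / 4

print(f"uneven split: {uneven:.2f} bits, even split: {even:.2f} bits")
```

A question whose answer is nearly the same in every surviving world carries little expected information, however easy it is to ask.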

An intrusion has a path.

Actors differ. Tooling differs. Infrastructure differs. Initial access differs. But the broad shape is stable enough to reason about:

  1. entry
  2. command
  3. privilege expansion
  4. lateral movement
  5. staging
  6. data theft
  7. leverage

Initial access matters, but it is often a low-information region of the problem. The internet produces failed logins, phishing clicks, suspicious emails, exposed services, and endpoint oddities all day. Many are attempts that failed. Many are user error. Many are ambiguous until more evidence is pulled in.

Post-breach communication is different.

Once an attacker is operating inside the enterprise, the adversarial system has to communicate to remain a system. It has to coordinate across hosts, identities, applications, cloud services, and external infrastructure. It has to move through the substrate of the enterprise while not being native to that substrate.

That creates high-information questions:

  • Did a stable peer set expand into systems it has never touched before?
  • Did communication cross a boundary the network's own dynamics do not cross?
  • Did a host begin communicating with destinations its role would not produce?
  • Did an internal system maintain a machine-timed channel to an external service?
  • Did data leave through a legitimate application in a shape the enterprise does not produce on its own?
  • Did destination, timing, direction, volume, session shape, protocol behavior, and peer context change together?
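
The first and last questions in this list reduce to set and dictionary comparisons once a baseline exists. The sketch below assumes the baseline has already been learned and that each dimension has been summarized into a coarse label. Both are large assumptions, and the addresses and labels are made up.

```python
def new_peers(baseline_peers, current_peers):
    """Peers the host talks to now that it never touched in the baseline."""
    return set(current_peers) - set(baseline_peers)


def dimensions_changed(baseline, current):
    """Names of the dimensions whose summary differs from the baseline."""
    return sorted(k for k in baseline if current.get(k) != baseline[k])


# Hypothetical peer sets for one host.
baseline_peers = {"10.0.0.5", "10.0.0.9"}
current_peers = {"10.0.0.5", "10.0.0.9", "10.0.4.17", "100.64.0.3"}
added = new_peers(baseline_peers, current_peers)

# Hypothetical coarse summaries of the host's communication shape.
baseline = {
    "destination": "internal only",
    "timing": "business hours",
    "direction": "mostly inbound",
    "volume": "low",
}
current = {
    "destination": "external cloud",
    "timing": "after hours",
    "direction": "mostly outbound",
    "volume": "low",
}
changed = dimensions_changed(baseline, current)

print(sorted(added), changed)
```

A single changed dimension is common in benign operation. Several dimensions changing together on one host is the pattern the last question is asking about.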

These are not merely anomaly questions.

They are system questions.

They ask whether the observed communication is part of the enterprise's own dynamics or evidence of another system using the enterprise as substrate.

Take two observations.

Observation one (weak update)

A VPN user had three failed login attempts from an unfamiliar IP address.

Many worlds survive.

Observation two (high-information update)

A developer workstation uploaded 14.9 GB to Slack after hours, using pre-stored credentials, with upload tooling statistically incompatible with the host's prior legitimate Slack session, while new Tailscale peers appeared during the upload window.

Far fewer worlds survive.

The first observation may matter.

But many worlds survive.

The second observation still does not give omniscience. It could be an authorized red-team exercise. It could be a compromised token. Flow telemetry alone may not reveal the destination Slack workspace, the actor identity, the token provenance, or the upload binary.

But far fewer worlds survive.

[Figure: a case file with high information gain. The square collapses. Observation: off-hours Slack upload, new peers, incompatible tooling. 8 of 64 worlds remain, among them red team, token reuse, Tailscale peer, and Slack exfil. Open questions: workspace? actor? binary? Caption: A high-information case does not claim omniscience. It leaves a smaller set of worlds and the next questions that matter.]
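
The world counts quoted in this article (64 before any evidence, 46 after the failed login, 8 after the Slack case) can be read as information gain in bits. That reading assumes the worlds are equally likely, which is a simplification the article does not claim.

```python
import math


def bits_gained(worlds_before, worlds_after):
    """Information gained when equally likely worlds are narrowed down."""
    return math.log2(worlds_before / worlds_after)


weak = bits_gained(64, 46)   # failed VPN login: about 0.48 bits
strong = bits_gained(64, 8)  # off-hours Slack case: 3 bits

print(f"weak update: {weak:.2f} bits, case file: {strong:.2f} bits")
```

Under that assumption the case file carries roughly six times the information of the failed login.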

The second observation describes a communication system:

  • high-value host
  • off-hours transfer
  • legitimate cloud service
  • pre-authenticated use
  • tooling different from prior legitimate behavior
  • behavior isolated to a single host in the fleet
  • possible external operator presence in the same window (the new Tailscale peers)
  • clear outside enrichments that would collapse the remaining uncertainty

That is what detection should produce.

Not certainty theater.

Not a raw alert.

A high-information case file.

The right output says: here is what the network shows, here is what it rules out, here is what it cannot know from this sensor layer, and here is the next question that will collapse uncertainty fastest.
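
That four-part output maps naturally onto a small record type. The sketch below is one hypothetical shape for such a case file, not a description of any product's format. The `shows` and `cannot_know` entries come from the Slack example in this article. The `rules_out` entry and the choice of next question are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseFile:
    shows: List[str]        # what the network shows
    rules_out: List[str]    # worlds the evidence eliminates
    cannot_know: List[str]  # limits of this sensor layer
    next_question: str      # the enrichment to request first

    def render(self) -> str:
        sections = [
            ("What the network shows", self.shows),
            ("What it rules out", self.rules_out),
            ("What this sensor layer cannot know", self.cannot_know),
        ]
        lines = []
        for title, items in sections:
            lines.append(title + ":")
            lines.extend("  - " + item for item in items)
        lines.append("Next question: " + self.next_question)
        return "\n".join(lines)


case = CaseFile(
    shows=[
        "14.9 GB uploaded to Slack after hours from a developer workstation",
        "upload tooling incompatible with the host's prior Slack session",
        "new Tailscale peers during the upload window",
    ],
    rules_out=["routine interactive Slack use (illustrative)"],
    cannot_know=[
        "destination Slack workspace",
        "actor identity",
        "token provenance",
        "upload binary",
    ],
    next_question="Which Slack workspace received the upload?",
)
print(case.render())
```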

Every SOC has finite attention.

If the detection system emits low-information fragments, analysts become the compression algorithm. They absorb thousands of weak measurements and try to turn them into a small number of meaningful beliefs about the state of the network.

That is bad system design.

The machine should do more of the compression before the human enters the loop.

It should pivot through related communication. It should compare the host to its own history. It should compare the host to the fleet. It should test benign explanations. It should ask what changed together. It should identify what the current telemetry can prove, what it can disprove, and what it cannot resolve without enrichment.
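
Comparing a host to its own history and to the fleet can start with something as plain as a robust z-score. The sketch below uses the median and the median absolute deviation, with the conventional 0.6745 scaling of the modified z-score. The 14.9 GB value comes from this article's example. The baseline volumes are hypothetical.

```python
import statistics


def robust_z(value, reference):
    """Modified z-score of `value` against a reference sample (median and MAD)."""
    med = statistics.median(reference)
    mad = statistics.median([abs(x - med) for x in reference])
    if mad == 0:
        # No spread in the reference: any departure at all is extreme.
        return 0.0 if value == med else float("inf")
    return 0.6745 * (value - med) / mad


# Hypothetical nightly upload volumes in GB.
own_history = [0.2, 0.3, 0.1, 0.4, 0.2]      # this host, previous nights
fleet_tonight = [0.3, 0.2, 0.5, 0.4, 0.1, 0.3]  # peer hosts, same night

upload_gb = 14.9
vs_history = robust_z(upload_gb, own_history)
vs_fleet = robust_z(upload_gb, fleet_tonight)

print(f"vs own history: {vs_history:.0f}, vs fleet: {vs_fleet:.0f}")
```

A value that is extreme against both references is harder to explain as "this host is always like that" or "everyone did this tonight", which are two of the benign explanations the machine should test before a human has to.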

If detection is inference, architecture has to follow the questions we want to ask.

You cannot ask whether communication belongs to the network's own dynamics if you only see isolated events.

You need communication shape:

  • timing
  • direction
  • peer sets
  • protocol metadata
  • session behavior
  • destination context
  • historical role
  • fleet comparison
  • cross-boundary relationships
  • changes across multiple dimensions at once

A failed login is a fragment. A process launch is a fragment. A DNS lookup is a fragment. A flow is a fragment.

The breach is often visible only when the fragments become a system.

That is why network communication matters.

An attacker operating inside an enterprise has to communicate over the network. Network communication has observable shape. A model that learns what a network produces on its own can ask whether new communication belongs to that system or whether something else has been imposed inside it.

The black square never disappears completely.

Security rarely gets perfect knowledge. The network can show communication shape, but not every actor intention. Flow telemetry can show a Slack upload, but not the destination workspace. It can show a machine-timed channel, but not the process hash unless host telemetry is available. It can show a new peer path, but not always whether the change was authorized.

That is fine.

The goal is not omniscience. The goal is to ask questions that reduce uncertainty faster than the adversary can exploit it.

Not: does this event look suspicious?

But: does this communication belong?
