Secret scanning

A single credential can do a lot of damage. It's the value that lets an application talk to a database, deploy to the cloud, charge a customer's card, or post into a chat workspace. Most of the time, it sits where it belongs — inside a vault, an environment variable injected at runtime, a properly scoped CI/CD secret. Sometimes, though, it ends up somewhere it shouldn't: pasted into a configuration file, hardcoded for "just one quick test," dropped into a thread to unblock a teammate, or echoed into a build log nobody will ever read. That's where secret scanning earns its place in an application security program.

What is secret scanning?

Secret scanning is the automated practice of looking through code, configuration, infrastructure, pipelines, and other places where teams store and discuss software, in order to find exposed credentials. A credible program goes beyond running a regex over a repository: it tells you what was found, whether the value is still active, where it appears, who needs to act on it, and how fast.

Secrets are not the same as sensitive data

It's worth getting this distinction straight up front, because it shapes how teams think about both detection and response. Sensitive data (e.g., a customer's credit card number, a national identifier, a medical record) is information about people that needs to be protected. Secrets, on the other hand, are credentials or keys that grant access to systems. An API key for a payment provider is a secret; the card number that key helps process is sensitive data. Both deserve protection, but they have different threat models and call for different security management approaches.

The values secret scanning looks for tend to fall into a familiar set: passwords, API keys, cloud access keys, SSH private keys, OAuth tokens, database connection strings, encryption keys, security certificates, and webhook URLs that often grant enough access to merit the same treatment. Many have recognizable shapes; for example, a Stripe key starts with a known prefix, an AWS access key follows a fixed format, and a private key sits between predictable header and footer lines.

For instance, an SSH private key may look like:

-----BEGIN OPENSSH PRIVATE KEY-----
[long encoded key content]
-----END OPENSSH PRIVATE KEY-----

Those patterns help, but they aren't enough on their own; modern detection has to handle a lot more variety.

Why do exposed secrets happen at all?

The usual story is mundane: a developer hardcoded a token to test an integration locally and meant to remove it before pushing. A team running short on time decided, "We'll move it to the vault next sprint." A new engineer didn't know the vault existed. An AI coding assistant produced a working API integration example with a placeholder credential; the developer inserted a real key to make it run, and the file was committed.

Then the value spreads. Once a credential lands in a repository's history, removing it from the latest version of a file doesn't undo the exposure. The string lives on in earlier commits, in branches that haven't merged, in clones on other people's laptops, in mirrored forks. If the repository is public, automated scrapers will likely find it within minutes — search platforms, threat actors, and security researchers all run their own scanners over public Git hosts looking for exactly this. If the repository is private, anyone with read access, including past collaborators, can still recover it.

Code is far from the only source. Configuration files, CI/CD pipelines, container images, Kubernetes manifests, Terraform state, Helm charts, deployment scripts, runtime logs, observability platforms like Datadog, Slack channels, Jira tickets, Confluence pages, and onboarding documentation can all hold credentials that were never meant to live there.

Here's a small, fairly common example — a backup job written as a quick shell script:

#!/bin/bash
# Nightly backup of the orders DB to S3
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

pg_dump -h db.internal -U backup_user -d orders \
    | gzip \
    | aws s3 cp - "s3://acme-backups/orders/$(date +%F).sql.gz"

Nothing about this script looks unusual at first glance — it does the job. But anyone with access to the file (or the build artifact it lands in, or the log of the run) gets two AWS credentials and a path into the production backup bucket. A scanner looking at the file flags both keys: the access key by its AKIA prefix and the secret access key by its high-entropy value following an AWS_SECRET_ACCESS_KEY assignment.

Why does this matter at the business level?

Attackers look for exposed credentials because they work. A leaked cloud key can spin up resources, exfiltrate data, or run cryptominers on someone else's bill. A leaked database password can lead to a breach disclosure. A leaked Git platform token can let an attacker plant code in a repository or tamper with a CI pipeline (a supply chain incident in the making). A leaked webhook URL can be enough to trigger internal workflows or read messages not meant for outside eyes.

The downstream consequences are familiar to anyone who has been near an incident: account takeover, unauthorized cloud usage, data breaches, lateral movement, supply chain compromise, service disruption, and the financial costs that come with all of the above. Regulatory frameworks add another layer: PCI DSS, GDPR, HIPAA, SOX, FISMA, and CCPA all impose requirements around access controls and the handling of credentials and customer data. A recurring pattern of exposed secrets is hard to defend against during an audit and even harder after a breach.

For executives, the calculus is straightforward. Secret scanning is comparatively cheap to run; the cost of a credential-driven incident (i.e., forensics, notification, remediation, lost trust, potential fines) usually isn't.

How does secret scanning work?

Often, several techniques are used together. Each has strengths and known failure modes, which is why no serious tool relies on just one.

Regular expressions (aka regex) are the oldest approach. A rule describes the shape of a particular credential type (e.g., a Stripe key, an AWS access key, a private key block, a JWT) and the scanner looks for matches. This works well when providers publish stable formats and badly when they don't, and it produces false positives whenever a string happens to look like a credential without being one.
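
To make the regex layer concrete, here is a minimal sketch in Python. The AWS and Stripe patterns follow those providers' published key prefixes; a real scanner ships hundreds of rules like these, and the rule names here are invented for illustration.

import re

# Three illustrative rules; production scanners maintain large, versioned catalogs.
RULES = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Stripe live secret keys start with "sk_live_"
    "stripe-live-key": re.compile(r"\bsk_live_[0-9a-zA-Z]{24,}\b"),
    # PEM-style private key headers (OpenSSH, RSA, EC)
    "private-key-block": re.compile(r"-----BEGIN (?:OPENSSH|RSA|EC) PRIVATE KEY-----"),
}

def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_value) for every rule that fires."""
    return [
        (name, match.group(0))
        for name, pattern in RULES.items()
        for match in pattern.finditer(text)
    ]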

Entropy analysis asks the opposite question: not "does this match a known format" but "does this look random." Secrets generated by a CSPRNG have measurably high Shannon entropy. So do hashes, identifiers, and base64-encoded blobs — which is the catch.
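
Shannon entropy itself takes only a few lines to compute. In this sketch, the 4.5 bits-per-character cutoff is a common heuristic for base64-like alphabets, not a universal constant; every tool tunes its own thresholds.

import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from observed character frequencies."""
    if not s:
        return 0.0
    total = len(s)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(s).values()
    )

# A CSPRNG-generated base64 token scores well above 4.5 bits/char, while
# ordinary prose sits around 4. But a base64-encoded blob or random-looking
# identifier scores just as high - exactly the false-positive catch above.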

Dictionary and pattern libraries lift some of that ambiguity by comparing values and surrounding text against catalogs of known credential types and naming conventions. A high-entropy string assigned to a variable called password in a production deployment file is treated very differently from the same string in a test fixture.

Contextual analysis goes further still, examining file paths, comments, commit messages, and surrounding code to determine how much weight to assign to a finding. Machine learning, trained on labeled examples, can pick up subtler signals that fixed rules miss (e.g., odd token shapes, unusual placements, and patterns in how a particular team tends to leak).
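
A toy version of that contextual weighting, with signal names and weights invented purely for illustration (real tools learn or tune these):

def confidence(value: str, file_path: str, var_name: str, entropy: float) -> float:
    """Fold simple contextual signals into a 0-1 confidence score.
    Every weight here is made up for the sake of the sketch."""
    score = 0.2  # baseline prior for any candidate
    if entropy > 4.5:
        score += 0.3
    if var_name.lower() in {"password", "secret", "token", "api_key", "auth_token"}:
        score += 0.3
    if any(hint in file_path for hint in ("prod", "deploy", "config")):
        score += 0.2
    # Test fixtures, samples, and obvious placeholders are downweighted
    if any(hint in file_path for hint in ("test", "fixture", "example")):
        score -= 0.4
    if "EXAMPLE" in value or value.startswith("your_"):
        score -= 0.3
    return min(1.0, max(0.0, score))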

Consider what a layered scan might surface in a piece of Python code:

import openai
from twilio.rest import Client

# TODO: move these to env vars before merging
openai.api_key = "sk-proj-aB3xQ9pD7eL5kN2mR8tVfWxYzC1H4jK6oP3sU0vZ"
account_sid = "AC1234567890abcdef1234567890abcdef"
auth_token  = "your_auth_token_3a8f5e2d1c9b7a6e4d2f0b8a"

client = Client(account_sid, auth_token)

def notify(user_phone, msg):
    return client.messages.create(
        body=msg,
        from_="+15555550123",
        to=user_phone,
    )

Three credentials in a few lines: an OpenAI API key with the sk-proj- prefix (regex match), a Twilio SID with its known shape (regex plus dictionary), and an auth token (entropy plus a variable name in the dictionary). The TODO comment is the kind of contextual signal that bumps confidence further — humans leave footprints when they intend to clean up later and forget.

Strong tools layer these methods together. Hybrid scanning runs regex, entropy, dictionaries, context, and ML in concert, using each method's strengths to compensate for the others' weaknesses. This is also where an obfuscated or split secret (a token reassembled from two variables, say, or a key encoded in base64) is more likely to be caught.
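
One simple trick hybrid scanners use against encoded values is to decode base64-looking spans and rescan the plaintext. A simplified sketch, where scan is any text-scanning function (such as the regex scan_text sketched earlier):

import base64
import re

B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")

def rescan_decoded(text: str, scan) -> list:
    """Decode plausible base64 spans and rerun the scanner on the result,
    so an encoded AKIA... key is still caught."""
    findings = []
    for candidate in B64_CANDIDATE.findall(text):
        try:
            decoded = base64.b64decode(candidate, validate=True).decode("utf-8")
        except (ValueError, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
        findings.extend(scan(decoded))
    return findings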

Some tools take a further step: live validation. After flagging a candidate, the scanner makes a safe call to the relevant provider to determine whether the credential is still active. The point isn't to confirm every hit; it's to triage. An active production AWS key with admin permissions belongs at the top of the queue; an expired test token from two years ago doesn't.
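
For AWS keys, one widely used safe check is STS GetCallerIdentity: it requires no permissions, cannot be denied by IAM policy, and changes nothing. A sketch with boto3 (assuming the library is installed; a production scanner would also rate-limit and audit these probes):

import boto3
from botocore.exceptions import ClientError

def aws_key_is_active(access_key_id: str, secret_access_key: str) -> bool:
    """Probe a candidate AWS key pair without touching any resource."""
    sts = boto3.client(
        "sts",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    try:
        identity = sts.get_caller_identity()
        # An active key also tells you whose it is - useful for routing
        print("Active key belongs to:", identity["Arn"])
        return True
    except ClientError:
        # InvalidClientTokenId or SignatureDoesNotMatch: revoked or bogus
        return False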

Where to scan for secrets?

Source code is the obvious starting point, but a program that stops there will miss most of the exposure surface. Scanning should reach all branches, not just the default one, and should examine the commit history because a secret removed from the current file may still exist in earlier commits, clones, and forks.

Configuration and deployment assets are the next priority. .env files, YAML configurations, Kubernetes manifests, Helm charts, Terraform, Dockerfiles, and CI/CD pipeline definitions all routinely accumulate credentials. Container images store them in layers that survive long after the source has been cleaned. A simple Kubernetes manifest is illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-service
spec:
  template:
    spec:
      containers:
      - name: orders
        image: acme/orders:1.4.2
        env:
        - name: DATABASE_URL
          value: "postgres://orders_app:[email protected]:5432/orders"
        - name: STRIPE_API_KEY
          value: "sk_live_4eC39HqLyjWDarjtT1zdp7dcEXAMPLE"

Two leaks in fifteen lines (a database connection string and a Stripe live key), both committed to a manifest that probably ends up in version control and a container registry. A scanner catches the Stripe key by its sk_live_ prefix and the database URL by its structure.

Beyond infrastructure, secrets show up in collaboration tools (e.g., Slack and Teams messages, Jira tickets, Confluence pages, support platforms) and in observability systems, where stack traces and debug logs sometimes include authorization headers or environment dumps. Artifact repositories, backups, and data stores can hold older builds and snapshots with credentials no one remembered to rotate.

Two scanning modes cover this terrain. At-rest scans focus on accumulated risk in static and historical locations; real-time scans examine new commits, pull requests, CI runs, tickets, and chat messages as they occur. Both matter: the historical baseline tells you where you stand, and the real-time controls keep things from getting worse.

Putting scanning into the development workflow

Secret scanning works best as a continuous control, not a quarterly audit; the earlier a credential is caught, the less it costs to remediate.

Pre-commit hooks run on a developer's machine and warn before a secret leaves the workstation. Push protection sits on the platform side and blocks pushes that contain high-confidence findings. Pull request checks flag (or block) credentials before they merge, often with inline comments. CI/CD integration scans builds, deployment scripts, and artifacts as part of the pipeline. IDE plugins can give feedback as code is being written.
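
Because a Git hook can be any executable, even a short script gives a feel for the mechanism. A minimal sketch, saved as .git/hooks/pre-commit and made executable (dedicated tools do this far more thoroughly):

#!/usr/bin/env python3
"""Block commits whose staged changes add credential-shaped strings."""
import re
import subprocess
import sys

PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                 # AWS access key ID
    re.compile(r"sk_live_[0-9a-zA-Z]{24,}"),         # Stripe live key
    re.compile(r"-----BEGIN .+ PRIVATE KEY-----"),   # PEM private key
]

# Only inspect lines being added in this commit
diff = subprocess.run(
    ["git", "diff", "--cached", "--unified=0"],
    capture_output=True, text=True, check=True,
).stdout

hits = [
    line for line in diff.splitlines()
    if line.startswith("+")
    and not line.startswith("+++")
    and any(p.search(line) for p in PATTERNS)
]

if hits:
    print("Possible secrets in staged changes; commit blocked:")
    for line in hits:
        print("  " + line[:100])
    sys.exit(1)  # any non-zero exit aborts the commit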

Findings that get past those controls or surface from at-rest scans of historical material need a triage process. Categorizing by credential type, validity, location, age, permissions, environment, and business impact lets security teams route work efficiently. Automation handles routing; people handle ambiguous cases and recurring patterns that suggest something deeper than a one-off mistake.

Responding when a secret is found

The first question is what the credential actually grants. A read-only analytics token, an internal service password, a cloud admin key, and a long-expired test value sit at very different points on the risk curve. Identify the service, environment, permissions, and time of exposure, as well as whether the value was publicly visible or restricted to specific systems.

Then revoke or rotate. In most cases, rotation is the cleaner path: invalidate the old value, generate a new one, and update the consumers to use it. If the secret was ever public, assume it was copied. Removing the text from a file is not enough; the exposure has already happened.
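
For an IAM user's access key, those steps can be scripted with boto3. A hedged outline, where user_name is whatever your inventory says owns the key and the consumer-update step is deliberately left abstract:

import boto3

def rotate_iam_access_key(user_name: str, exposed_key_id: str) -> dict:
    """Create a replacement key, then deactivate the exposed one.
    Delete the old key only after consumers are confirmed on the new one."""
    iam = boto3.client("iam")

    # 1. Create the new key (IAM allows at most two keys per user)
    new_key = iam.create_access_key(UserName=user_name)["AccessKey"]

    # 2. ...update the vault and every consumer with new_key here...

    # 3. Deactivate the exposed key immediately
    iam.update_access_key(
        UserName=user_name, AccessKeyId=exposed_key_id, Status="Inactive"
    )
    # 4. After a grace period:
    # iam.delete_access_key(UserName=user_name, AccessKeyId=exposed_key_id)
    return new_key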

Investigation runs in parallel. Logs, audit trails, and provider-side activity records can show whether the credential was used in unexpected ways, from unusual locations, or against unusual resources. If permissions were broad, the review needs to extend to downstream systems. Least privilege limits the blast radius, but only when the exposed credential had limited privileges to begin with.

Cleanup follows. Remove the value from code, configuration, documentation, logs, and anywhere else it appears. If it landed in Git history, rewriting history may be warranted, along with invalidating caches and communicating with anyone who may have cloned the repository. Then document the incident (what happened, when it was detected, what was done, what should change) for audit purposes and for the team's own learning.

Scanning is one layer; secrets management is the foundation

Worth restating because it often gets blurred: secret scanning is a detection layer, not a replacement for secret management. Credentials should live in dedicated secrets managers (e.g., HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, Google Secret Manager) and be retrieved at runtime via APIs, rather than committed alongside the code that uses them.
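
In code, the difference is fetching the value at runtime instead of committing it. A sketch against AWS Secrets Manager, where the secret name prod/orders/database-url is hypothetical (Vault, Azure Key Vault, and Google Secret Manager offer equivalent client calls):

import boto3

def get_database_url() -> str:
    """Fetch the connection string at runtime; nothing sensitive in the repo."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId="prod/orders/database-url")
    return response["SecretString"]

# The service now needs only IAM permission to read one named secret -
# a credential that is rotatable, auditable, and revocable in one place.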

Environment variables are usually safer than hardcoded values, but they aren't a finish line. They can still leak through logs, error messages, debug endpoints, artifact dumps, and .env files that wander into commits. Local secrets files belong in .gitignore, and developers benefit from clear written guidance on safe local setup.

Least privilege is the other half of the picture. A credential should grant only what it needs, expire when it can, rotate on a schedule, and be logged when used. A policy that codifies these expectations (what counts as a secret, where scanning runs, how findings are triaged, who owns remediation, and which timelines apply by severity) turns ad hoc practice into something an organization can audit and improve.

Education closes the loop. Developers and platform engineers benefit from understanding why hardcoding is risky, how Git history actually works, and what to do when they slip up. Manual audits and penetration testing fill gaps that automated tools miss.

Limitations worth being honest about

Secret scanning produces false positives. Random-looking identifiers, hashes, sample data, and placeholder values get flagged. Too many false alarms lead to alert fatigue, which in turn leads to ignored alerts. It also produces false negatives: a scanner can miss a credential that's split across files, encoded, in an unfamiliar format, or buried in a data structure the parser doesn't understand well.

Historical exposure is harder still. A secret committed once and "removed" later still lives in commit history, clones, forks, backups, and build artifacts. That's why blocking a leak at the point of commit is worth more than detecting it afterward.
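
Checking whether a value still lurks in a repository's history is one place automation helps directly. A small wrapper around Git's pickaxe search (-S), which searches diffs across all branches rather than just file tips:

import subprocess

def commits_containing(secret: str, repo_path: str = ".") -> list[str]:
    """List every commit, on any branch, where the string was added or removed."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "--oneline", "-S", secret],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

# An empty result is necessary but not sufficient: clones, forks, and
# backups outside this repository can still hold the value.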

Accuracy is a balance of precision (most flagged values are real secrets) and recall (few real secrets are missed). Most organizations would rather investigate some false positives than miss a credential that grants production access. Tuning the rules to the organization's stack (its actual cloud providers, internal token formats, test data conventions) is what brings both numbers up at once.

Choosing a tool, and where it fits in AppSec

Four dimensions matter when picking a scanner: coverage, integration, detection quality, and reporting.

Coverage means scanning the places where secrets actually appear: repositories, IaC, CI/CD, containers, artifacts, collaboration tools, and logs. Integration means working with the source platforms, build systems, IDEs, and security tooling already in use, along with developer-friendly feedback paths (IDE warnings, CLI scans, PR comments, pre-commit hooks) and security-team workflows (dashboards, assignment, status, evidence, audit trails).

Detection quality comes from layered methods, customization for internal patterns, and, where appropriate, ML and live validation. Reporting should produce the artifacts a compliance audit will ask for and the metrics a security leader needs to track program health.

Secret scanning sits alongside other AppSec practices rather than replacing any of them. SAST looks for insecure coding patterns; SCA finds known vulnerabilities and license issues in dependencies; DAST and penetration testing assess running systems. Secret scanning addresses a separate class of risk—the credentials that connect software to everything around it—and the categories complement each other. For instance, a SAST tool may flag insecure authentication code in a service, while secret scanning finds the live token hardcoded in the same file.

Conclusion

Modern software depends on credentials, and credentials may end up in places they shouldn't. Secret scanning catches them—early when possible, historically when necessary—and feeds the findings into a process that revokes, rotates, investigates, and cleans up. It works best with broad coverage, real-time controls that block leaks before they spread, accurate multi-method detection, safe validation to prioritize active credentials, and a real secrets management foundation underneath. For a security or development leader, the question is whether the current setup catches credentials before they spread, whether it covers where leaks actually happen, and whether the organization has a safer place for those credentials to live.

Start your 21-day free trial

Discover the benefits of our Continuous Hacking solution, which organizations of all sizes are already enjoying.
