Ethical Hacking Lesson 6 – Footprinting & Reconnaissance | Dataplexa
Foundations & Hacking Mindset · Lesson 6

Footprinting & Reconnaissance

Before a single scan runs, a professional pen tester already knows a significant amount about their target. That knowledge doesn't come from guessing — it comes from a disciplined process of gathering publicly available information. This lesson covers exactly how that process works and why it matters so much.

Reconnaissance is the foundation everything else is built on

Think about how a skilled burglar operates. They don't walk up to a random house and try the door. They watch the neighbourhood first. They note when people leave for work, whether there's a dog, whether the side gate is locked, what kind of alarm system is visible. By the time they act, they already know more about that property than most of the people who live nearby.

Attackers work exactly the same way. The reconnaissance phase is where they build their picture of the target — what systems are running, who the key people are, what technologies the company uses, and where the soft spots in the perimeter might be. All of this happens before any attack tool touches the target's infrastructure.

Footprinting vs reconnaissance — the distinction worth knowing

These two terms are often used interchangeably, and in most practical contexts that is fine. Strictly speaking, they describe slightly different activities. Reconnaissance is the broader phase of gathering information about a target before an attack. Footprinting is the part of that phase that maps out an organisation's digital presence: its IP ranges, domains, technologies, and people.

In practice, you will hear both used to describe the same first phase of an engagement. What matters is understanding the split between passive and active approaches — because that split has significant legal and operational implications.

Passive vs active — two very different risk levels

Every piece of information gathered during reconnaissance comes from one of two categories. Understanding the difference between them is critical — not just technically, but legally.

PASSIVE vs ACTIVE RECONNAISSANCE
PASSIVE

You gather information without making any direct contact with the target's systems. You are reading data that is already publicly available — anyone with an internet connection could access the same information.

Examples: WHOIS lookups, searching LinkedIn, reading job postings, checking Google, reviewing the company's public website, searching historical data in archives.

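Risk level: Minimal. Lookups against third-party sources such as WHOIS, search engines, and archives send no packets to the target and create no logs on its systems. Visiting the company's public website does reach its servers, but it looks the same as ordinary visitor traffic.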

ACTIVE

You interact directly with the target's systems to gather information. Packets are sent, responses are received, and your activity may appear in the target's logs. This requires explicit authorisation before you begin.

Examples: Port scanning, DNS enumeration against live servers, banner grabbing, sending probes to discover open services.

Risk level: Detectable. Your IP address may appear in the target's logs. Without written authorisation, this is illegal in many jurisdictions.

Professional pen testers always start with passive reconnaissance — even when they have full written authorisation for active testing. Passive first means you go into the active phase with a much clearer picture of what you are looking at, which makes the active scanning more efficient and less noisy.

The information categories a pen tester builds during recon

Reconnaissance isn't random browsing. A skilled pen tester works through specific categories of information, building a complete picture of the target's digital footprint. Here is what that picture actually contains by the time a proper recon phase is complete.

Network information

IP address ranges owned by the company, DNS records, mail server details, network blocks registered under the organisation's name. This tells you the size of the target's internet-facing footprint before a single scan runs.
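As a rough illustration of sizing that footprint, the sketch below totals the addresses in a set of network blocks using Python's standard ipaddress module. The blocks are reserved documentation ranges standing in for whatever a registry lookup returns, and the function name is hypothetical.

```python
import ipaddress

def footprint_size(cidr_blocks):
    """Return the total number of addresses across the given CIDR blocks."""
    networks = [ipaddress.ip_network(block) for block in cidr_blocks]
    # collapse_addresses merges overlapping or adjacent blocks,
    # so no address is counted twice
    merged = ipaddress.collapse_addresses(networks)
    return sum(net.num_addresses for net in merged)

# Hypothetical blocks (RFC 5737 documentation ranges), not real target data
blocks = ["192.0.2.0/24", "198.51.100.0/25", "198.51.100.128/25"]
print(footprint_size(blocks))
```

Merging before counting matters because registry records for one organisation often overlap or sit side by side.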

Technology stack

The web server software, CMS platforms, programming languages, and cloud providers in use. Often revealed by HTTP response headers, job postings mentioning specific skills, and technology detection tools that read publicly visible signatures.
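A minimal sketch of reading those signatures from response headers follows. Requesting headers yourself touches the target's web server, so the example works on headers that have already been captured. The sample values and the function name are assumptions for illustration.

```python
def tech_hints(headers):
    """Pick out response headers that commonly disclose server-side technology."""
    revealing = ("server", "x-powered-by", "x-generator", "via")
    # Header names are case-insensitive, so normalise before comparing
    return {name.lower(): value for name, value in headers.items()
            if name.lower() in revealing}

# Hypothetical headers, as they might appear in a saved response
sample = {
    "Server": "nginx/1.18.0",
    "X-Powered-By": "PHP/7.4.3",
    "Content-Type": "text/html; charset=UTF-8",
}
for name, value in tech_hints(sample).items():
    print(f"{name}: {value}")
```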

People and organisational structure

Key employees, their roles, email formats, and which ones have high system access. LinkedIn is a goldmine for this. A CFO or IT administrator with a weak password is a far more valuable target than a junior marketing intern.
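To show why the email format matters, here is a small sketch that expands one employee name into candidate addresses under a few common naming conventions. The conventions listed, the name, and the domain are assumptions. On a real engagement you would confirm the actual format from public sources, not guess it.

```python
def candidate_emails(full_name, domain):
    """Return likely addresses for one person under common naming conventions."""
    parts = full_name.lower().split()
    first, last = parts[0], parts[-1]
    local_parts = [
        f"{first}.{last}",    # firstname.lastname
        f"{first}{last}",     # firstnamelastname
        f"{first[0]}{last}",  # first initial + lastname
        f"{first}_{last}",    # firstname_lastname
    ]
    return [f"{local}@{domain}" for local in local_parts]

# Hypothetical name and domain
for address in candidate_emails("Jane Doe", "example.com"):
    print(address)
```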

Domains and subdomains

All domains and subdomains registered to the organisation. Companies often forget about old subdomains — staging environments, developer portals, legacy applications — that are still accessible and often far less secured than the main site.

Leaked credentials and data

Email addresses and passwords exposed in previous data breaches. Tools and services allow testers to check whether company email addresses appear in known breach databases — a fast way to find accounts that may still be using compromised passwords.

Physical and geographic details

Office locations, data centre providers, and physical security setup. For a full red team engagement that includes physical access testing, knowing where the building is, how many entrances it has, and whether there is visible security infrastructure is part of the intelligence picture.

A recon profile — what it looks like when it comes together

All of that information gets compiled into a target profile before the active phase begins. Here is a realistic example of what a pen tester's recon profile looks like for a mid-sized e-commerce company at the end of the passive reconnaissance phase.

TARGET RECON PROFILE: Passive Phase Complete (passive only, no active contact made)

Organisation: Meridian Retail Ltd. (UK-based e-commerce, ~200 employees)
Primary domain: meridianretail.co.uk (registered 2011, expires 2026, registrar: Namecheap)
Subdomains found: shop.meridianretail.co.uk  |  staging.meridianretail.co.uk  |  admin.meridianretail.co.uk
Technology stack: WordPress 6.1 (confirmed via response headers)  |  PHP  |  nginx 1.18  |  Cloudflare CDN
Email format: firstname.lastname@meridianretail.co.uk (confirmed via LinkedIn)
Key personnel: IT Manager: James Hollis  |  CTO: Sarah Meade  |  DevOps Engineer: Tom Barker
Breach exposure: 14 company email addresses found in breach databases
Priority findings: Staging subdomain appears publicly accessible with no authentication. WordPress version is 3 major releases behind. Two IT staff emails found in 2022 LinkedIn breach.

Notice the staging subdomain in the priority findings. That is a classic finding from passive recon: a forgotten environment that developers built for testing purposes, left publicly accessible, and quite possibly running older and less hardened software than the production site. This kind of discovery happens in passive reconnaissance, before a single active scan has run.

The breach exposure finding is equally significant. Fourteen email addresses from breach databases means fourteen potential credential stuffing attempts — checking whether those leaked passwords still work on the company's login pages. Again, discovered passively, before the active phase begins.
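One way to keep a profile like this consistent across engagements is to hold it in a small data structure. The sketch below uses a Python dataclass filled with the Meridian Retail example values from the profile above. The class and field names are assumptions, not a standard format.

```python
from dataclasses import dataclass, field

@dataclass
class ReconProfile:
    """Container for passive recon findings on one target (hypothetical layout)."""
    organisation: str
    primary_domain: str
    subdomains: list = field(default_factory=list)
    technologies: list = field(default_factory=list)
    email_format: str = ""
    key_personnel: dict = field(default_factory=dict)
    breached_emails: int = 0
    priority_findings: list = field(default_factory=list)

    def summary(self):
        """One-line overview suitable for the top of a working document."""
        return (f"{self.organisation}: {len(self.subdomains)} subdomains, "
                f"{self.breached_emails} breached addresses, "
                f"{len(self.priority_findings)} priority findings")

profile = ReconProfile(
    organisation="Meridian Retail Ltd.",
    primary_domain="meridianretail.co.uk",
    subdomains=["shop", "staging", "admin"],
    technologies=["WordPress 6.1", "PHP", "nginx 1.18", "Cloudflare CDN"],
    email_format="firstname.lastname@meridianretail.co.uk",
    key_personnel={"IT Manager": "James Hollis", "CTO": "Sarah Meade",
                   "DevOps Engineer": "Tom Barker"},
    breached_emails=14,
    priority_findings=[
        "Staging subdomain appears publicly accessible with no authentication",
        "WordPress version is behind the current release",
        "Two IT staff emails found in breach data",
    ],
)
print(profile.summary())
```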

OSINT — the discipline behind passive reconnaissance

The formal name for gathering publicly available information is OSINT — Open Source Intelligence. It is a field in its own right, used by security researchers, journalists, law enforcement, and pen testers alike. The "open source" part refers to the fact that the information is publicly available — not that it involves open source software.

OSINT practitioners develop a methodical approach to searching: knowing which sources to check first, how to cross-reference information between sources, and how to avoid confirmation bias when building a target profile. The tools you will use later in this course, including Shodan, theHarvester, and Maltego, are all OSINT tools at their core. But the methodology matters more than the tool. A tester with no tools but a sharp methodology will outperform a tester with every tool but no process.

Sources pen testers check during passive recon

Public records and databases
WHOIS, ARIN/RIPE, Companies House, SEC filings

Search engines
Google, Bing, and specialised search operators that surface hidden pages and exposed files

Social media
LinkedIn, Twitter/X, GitHub — employees often reveal technology stacks and internal tooling without realising it

Historical data
Wayback Machine archives of old website versions that may reveal removed but still-accessible content

Breach databases
Have I Been Pwned and similar services to check corporate email exposure

Job postings
Surprisingly revealing: a posting for a "Senior AWS Engineer" strongly suggests the company runs AWS infrastructure
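The specialised search operators mentioned above can be combined into repeatable queries. The sketch below builds a few domain-scoped queries and turns each into a search URL. The particular operator combinations and the function names are illustrative assumptions, and example.com is a placeholder.

```python
from urllib.parse import quote_plus

def search_queries(domain):
    """Build search-engine queries scoped to one domain."""
    return [
        f"site:{domain} filetype:pdf",          # published documents
        f"site:{domain} inurl:login",           # login pages
        f'site:{domain} intitle:"index of"',    # open directory listings
        f"site:*.{domain} -site:www.{domain}",  # subdomains other than www
    ]

def as_url(query):
    """Encode a query string into a search URL."""
    return "https://www.google.com/search?q=" + quote_plus(query)

# Hypothetical domain
for query in search_queries("example.com"):
    print(as_url(query))
```

Keeping the queries in one function makes it easy to run the same set against every domain in scope.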

Recon in the context of a live engagement

On a real engagement, reconnaissance doesn't stop after the passive phase. As the test progresses and you gain more access, new information surfaces — internal hostnames, internal IP ranges, employee credentials — that feeds back into your understanding of the target. Recon is an ongoing process throughout the engagement, not a box you tick on day one.

The scenario: You have been engaged to test a regional logistics company. It is day one and your active scanning phase doesn't start until tomorrow — the client's IT team needs to whitelist your IP address first. In the meantime, your team lead asks you to spend today on passive recon and build a target profile. You are working entirely from public sources. Nothing touches the client's systems today.

# whois is a command that looks up public registration details for a domain
# It queries a public registry database — NOT the target's own servers
# This means zero risk of detection — no packets ever reach the target
# You can run this on any public domain without any authorisation

whois targetlogistics.com
# Replace targetlogistics.com with the actual domain you are investigating

Reading this output — three immediate findings:

Creation Date: 2008. A domain that has been registered since 2008 often points to legacy infrastructure. Systems first deployed that long ago may still run software that has not been updated in years. Flag it before you even scan a single port.

Name Server: CLOUDFLARE. The domain's DNS is hosted by Cloudflare, which usually means the real origin server is hidden behind Cloudflare's proxy. If you scan the domain's public IP, you will likely hit Cloudflare's infrastructure, not the actual target server. Finding the real IP is a separate task you will need to complete before scanning begins.

DNSSEC: unsigned. This means DNS responses for the domain carry no cryptographic signature, so resolvers cannot verify that they are genuine. An attacker able to spoof or tamper with those responses could potentially redirect traffic to a fake version of the site. Low severity, but it goes in the report.
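Fields like these can be pulled out of saved WHOIS output with a few lines of code. The sketch below assumes the common "Key: value" layout. The sample record is made up to match the three findings above and is not real registry output, and the function name is hypothetical.

```python
def parse_whois(text):
    """Collect 'Key: value' lines from WHOIS output; repeated keys keep every value."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "%#>":
            continue  # skip blank lines and registry notices
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and value.strip():
            record.setdefault(key.strip(), []).append(value.strip())
    return record

# Made-up sample for illustration only
SAMPLE = """\
Domain Name: TARGETLOGISTICS.COM
Creation Date: 2008-03-14T00:00:00Z
Name Server: ADA.NS.CLOUDFLARE.COM
Name Server: BOB.NS.CLOUDFLARE.COM
DNSSEC: unsigned
"""

record = parse_whois(SAMPLE)
for key in ("Creation Date", "Name Server", "DNSSEC"):
    print(key, "->", record.get(key))
```

WHOIS layouts vary between registries, so treat this as a starting point, not a general-purpose parser.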

That is three useful findings from a single passive command that took under a second to run. Now check what subdomains are publicly visible.

# crt.sh is a public search site for certificate transparency logs
# Publicly trusted certificate authorities log the SSL/TLS certificates they issue
# We query it to list the names that have appeared in certificates for the domain
# The target's own servers are never contacted, so this is passive

# curl fetches the crt.sh results in JSON format
# -s means silent mode: it suppresses progress output so only data is returned
# &output=json tells crt.sh to return structured data instead of a webpage
# The pipe at the end of the curl line sends the JSON output into Python
curl -s "https://crt.sh/?q=targetlogistics.com&output=json" |
  python3 -c "
import sys, json                          # import tools for reading input and parsing JSON
data = json.load(sys.stdin)               # load the JSON data from curl's output
names = set()                             # a set drops duplicate names automatically
for d in data:                            # each entry describes one logged certificate
    names.update(d['name_value'].splitlines())  # one entry can list several names, one per line
for n in sorted(names):                   # loop through each unique name in sorted order
    print(n)                              # print it to the screen, one per line
"

Reading this output — seven subdomains returned from a public certificate log:

Each line is a subdomain that has had an SSL certificate issued for it at some point. This list came entirely from a public registry — not from any contact with the company's servers. Three of these immediately stand out: vpn (a VPN gateway — high value target), portal (likely an employee login page), and staging (a development environment that is almost certainly less secured than production). The priority table below maps these findings into an action plan for the active phase.
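Certificate log results usually need a little cleaning before they are useful. The sketch below assumes the crt.sh JSON layout, in which each entry has a name_value field that may hold several names separated by newlines and may include wildcard entries. The sample entries and the function name are hypothetical.

```python
def names_from_ct(entries, domain):
    """Return unique, in-scope hostnames from certificate log entries."""
    names = set()
    for entry in entries:
        for name in entry["name_value"].splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # a wildcard entry only tells us about the parent name
            # keep the domain itself and its subdomains, discard anything else
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)

# Hypothetical entries shaped like crt.sh JSON output
sample = [
    {"name_value": "www.targetlogistics.com\nmail.targetlogistics.com"},
    {"name_value": "*.targetlogistics.com"},
    {"name_value": "VPN.targetlogistics.com"},
    {"name_value": "unrelated.example.org"},
]
for name in names_from_ct(sample, "targetlogistics.com"):
    print(name)
```

The scope check matters because a single certificate can cover names from several unrelated domains.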

Those seven subdomains are raw data. The next step is turning them into a prioritised list — deciding which ones are worth investigating further in the active phase and why. This is what that analysis looks like written up as a working document during the engagement.

RECON FINDINGS: Day 1 Passive Phase

Subdomain | Interest level | Reason
www.targetlogistics.com | Low | Main public website, likely hardened and monitored
mail.targetlogistics.com | Medium | Mail server, check for exposed OWA or webmail login
vpn.targetlogistics.com | High | VPN gateway, check software version and authentication method
portal.targetlogistics.com | High | Employee or customer portal, likely contains authenticated functionality
staging.targetlogistics.com | Critical | Staging environment, often publicly accessible, unpatched, no WAF
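The same triage can be sketched as a simple keyword lookup. The rules below mirror the interest levels in the findings table. The rule list and function name are assumptions, and a real engagement would adjust them to the target.

```python
# Keyword rules, checked in order; the first match wins
RULES = [
    ("staging", "Critical"),
    ("vpn", "High"),
    ("portal", "High"),
    ("mail", "Medium"),
]

def interest_level(hostname):
    """Assign an interest level from the leftmost label of a hostname."""
    label = hostname.split(".")[0].lower()
    for keyword, level in RULES:
        if keyword in label:
            return level
    return "Low"  # default for names with no notable keyword

hosts = [
    "www.targetlogistics.com",
    "mail.targetlogistics.com",
    "vpn.targetlogistics.com",
    "portal.targetlogistics.com",
    "staging.targetlogistics.com",
]
for host in hosts:
    print(f"{host}: {interest_level(host)}")
```

A lookup like this only produces a first pass. The written reasons in the table still come from the tester's judgement.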

Breaking it down:

Certificate transparency logs (crt.sh)
SSL/TLS certificates issued by publicly trusted certificate authorities are recorded in public logs. This was introduced as a security measure to detect rogue certificates, but it also means pen testers can query the logs to find subdomains that have had a certificate issued. The lookup is passive, relies only on public data, and is remarkably effective.

The staging subdomain
Staging environments are built for developers to test changes before they go live. They often run older software, skip security controls like WAFs, and are accessible to anyone who knows the URL. Finding one in passive recon is one of the most common high-value discoveries a pen tester makes.

The VPN subdomain
A publicly visible VPN gateway tells you the company allows remote access. The next question, answered in the active phase, is what software is running and what version. VPN software has historically been a rich source of critical vulnerabilities.

Teacher's Note: The amount of information available about any organisation through passive reconnaissance alone is consistently surprising — even to experienced pen testers. Always complete a thorough passive phase before any active scanning. The picture you build here directly determines the quality of everything that follows.

Practice questions

Scenario:

A pen tester spends the first day of an engagement searching LinkedIn for employee names, reading the company's public job postings to identify their technology stack, looking up their domain registration details on a public WHOIS service, and checking their corporate email addresses against a public breach database. No tools are run against the company's servers. No packets are sent to their infrastructure. What type of reconnaissance is this?


Scenario:

During passive recon on a target company, a pen tester queries a public service that records every SSL certificate ever issued for a domain. The query returns seven subdomains — including a staging environment and a VPN gateway — that were not visible on the company's public website. No request was made to any of the company's own servers. What public data source did the tester use to discover these subdomains?


Scenario:

A senior pen tester tells a junior colleague: "Before we start scanning, I want you to spend two hours doing some solid gathering on this target — check their company registration, search their socials, look up their DNS records, find their employee email format, and see if anything comes up in breach databases. All public sources only." The junior asks what this discipline is formally called. What is the answer?


Quiz

Scenario:

A pen tester with written authorisation for an upcoming engagement runs a query against crt.sh to discover subdomains, checks the company's WHOIS record, and searches LinkedIn for employee names — all before the engagement officially starts tomorrow. A colleague questions whether this is allowed without the active testing phase beginning. Is this activity within bounds, and what type of reconnaissance is it?

Scenario:

Passive recon on a retail company surfaces four subdomains: www, mail, api, and staging. Your team needs to prioritise which one to investigate first during the active phase tomorrow. Based purely on what these subdomains typically represent in terms of security posture, which one should be at the top of your list and why?

Scenario:

A pen tester is building a passive recon profile on a target company. They have checked WHOIS, LinkedIn, and certificate transparency logs. Their team lead suggests one more source that regularly reveals the company's internal technology stack, cloud providers, and infrastructure tools — without the company even realising they are sharing it. Which source is the team lead referring to?
