Ethical Hacking Lesson 6 – Footprinting & Reconnaissance | Dataplexa

Foundations & Hacking Mindset · Lesson 6

Footprinting & Reconnaissance

Before a single scan runs, a professional pen tester already knows a significant amount about their target. That knowledge doesn't come from guessing — it comes from a disciplined process of gathering publicly available information. This lesson covers exactly how that process works and why it matters so much.

Reconnaissance is the foundation everything else is built on

Think about how a skilled burglar operates. They don't walk up to a random house and try the door. They watch the neighbourhood first. They note when people leave for work, whether there's a dog, whether the side gate is locked, what kind of alarm system is visible. By the time they act, they already know more about that property than most of the people who live nearby.

Attackers work exactly the same way. The reconnaissance phase is where they build their picture of the target — what systems are running, who the key people are, what technologies the company uses, and where the soft spots in the perimeter might be. All of this happens before any attack tool touches the target's infrastructure.

For an ethical hacker, reconnaissance serves the same purpose. A thorough recon phase means your scanning is focused, your exploitation attempts are targeted, and your report covers the actual attack surface — not just the parts that were easy to find. Rushing past it is one of the most common mistakes junior pen testers make, and it consistently leads to missed vulnerabilities.

Footprinting vs reconnaissance — the distinction worth knowing

These two terms are often used interchangeably, and in most practical contexts that is fine. Strictly speaking, though, they describe slightly different things. Reconnaissance is the broader phase of gathering information to build an operational picture of the target before an attack. Footprinting is the part of that phase that maps out an organisation's digital presence: its IP ranges, domains, technologies, and people.

In practice, you will hear both used to describe the same first phase of an engagement. What matters is understanding the split between passive and active approaches — because that split has significant legal and operational implications.

Passive vs active — two very different risk levels

Every piece of information gathered during reconnaissance comes from one of two categories. Understanding the difference between them is critical — not just technically, but legally.

PASSIVE vs ACTIVE RECONNAISSANCE

PASSIVE

You gather information without making any direct contact with the target's systems. You are reading data that is already publicly available — anyone with an internet connection could access the same information.

Examples: WHOIS lookups, searching LinkedIn, reading job postings, checking Google, reviewing the company's public website, searching historical data in archives.

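Risk level: Minimal. Lookups against third-party sources (WHOIS, search engines, archives) send no packets to the target and leave nothing in its logs. Browsing the company's public website does reach its web server, but it looks the same as any ordinary visitor's traffic.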

ACTIVE

You interact directly with the target's systems to gather information. Packets are sent, responses are received, and your activity may appear in the target's logs. This requires explicit authorisation before you begin.

Examples: Port scanning, DNS enumeration against live servers, banner grabbing, sending probes to discover open services.

Risk level: Detectable. Your IP address may appear in the target's logs. Without written authorisation, this can be illegal in many jurisdictions.

Professional pen testers always start with passive reconnaissance — even when they have full written authorisation for active testing. Passive first means you go into the active phase with a much clearer picture of what you are looking at, which makes the active scanning more efficient and less noisy.

The information categories a pen tester builds during recon

Reconnaissance isn't random browsing. A skilled pen tester works through specific categories of information, building a complete picture of the target's digital footprint. Here is what that picture actually contains by the time a proper recon phase is complete.

Network information

IP address ranges owned by the company, DNS records, mail server details, network blocks registered under the organisation's name. This tells you the size of the target's internet-facing footprint before a single scan runs.

Technology stack

The web server software, CMS platforms, programming languages, and cloud providers in use. Often revealed by HTTP response headers, job postings mentioning specific skills, and technology detection tools that read publicly visible signatures.

People and organisational structure

Key employees, their roles, email formats, and which ones have high system access. LinkedIn is a goldmine for this. A CFO or IT administrator with a weak password is a far more valuable target than a junior marketing intern.

Domains and subdomains

All domains and subdomains registered to the organisation. Companies often forget about old subdomains — staging environments, developer portals, legacy applications — that are still accessible and often far less secured than the main site.

Leaked credentials and data

Email addresses and passwords exposed in previous data breaches. Tools and services allow testers to check whether company email addresses appear in known breach databases — a fast way to find accounts that may still be using compromised passwords.

Physical and geographic details

Office locations, data centre providers, and physical security setup. For a full red team engagement that includes physical access testing, knowing where the building is, how many entrances it has, and whether there is visible security infrastructure is part of the intelligence picture.
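As a rough illustration of the technology stack category, the sketch below turns a set of HTTP response headers into technology hints. The `SIGNATURES` table and the `guess_stack` function are hypothetical names invented for this example, and the mappings are a small illustrative sample rather than a real fingerprint database. The function only parses headers you already have; collecting them from a live server is a separate step that should stay inside your authorised scope.

```python
# Hypothetical signature table: (header name, substring to look for, label).
# These mappings are illustrative assumptions, not a complete fingerprint set.
SIGNATURES = [
    ("server", "nginx", "nginx web server"),
    ("server", "apache", "Apache web server"),
    ("server", "cloudflare", "Cloudflare proxy/CDN"),
    ("x-powered-by", "php", "PHP"),
    ("x-powered-by", "express", "Node.js (Express)"),
]


def guess_stack(headers):
    """Return a sorted list of technology hints found in HTTP response headers."""
    # Header names are case-insensitive, so normalise names and values first.
    normalised = {name.lower(): value.lower() for name, value in headers.items()}
    hints = set()
    for header, needle, label in SIGNATURES:
        if needle in normalised.get(header, ""):
            hints.add(label)
    return sorted(hints)


if __name__ == "__main__":
    sample = {"Server": "nginx/1.18.0", "X-Powered-By": "PHP/8.1.2"}
    print(guess_stack(sample))
```

Treat the output as a hint to verify later, since headers can be removed or deliberately falsified.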

A recon profile — what it looks like when it comes together

All of that information gets compiled into a target profile before the active phase begins. Here is a realistic example of what a pen tester's recon profile looks like for a mid-sized e-commerce company at the end of the passive reconnaissance phase.

TARGET RECON PROFILE: Passive Phase Complete
Status: passive only, no active contact made

Organisation	Meridian Retail Ltd. (UK-based e-commerce, ~200 employees)
Primary domain	meridianretail.co.uk (registered 2011, expires 2026, registrar: Namecheap)
Subdomains found	shop.meridianretail.co.uk | staging.meridianretail.co.uk | admin.meridianretail.co.uk
Technology stack	WordPress 6.1 (confirmed via response headers) | PHP | nginx 1.18 | Cloudflare CDN
Email format	firstname.lastname@meridianretail.co.uk (confirmed via LinkedIn)
Key personnel	IT Manager: James Hollis | CTO: Sarah Meade | DevOps Engineer: Tom Barker
Breach exposure	14 company email addresses found in breach databases
Priority findings	Staging subdomain appears publicly accessible with no authentication. WordPress version is 3 major releases behind. Two IT staff emails found in 2022 LinkedIn breach.

Notice the staging subdomain in the priority findings. That is a classic finding from passive recon: a forgotten environment that developers built for testing purposes, left publicly accessible, and likely running older and less hardened software than the production site. This kind of discovery happens in passive reconnaissance, before a single active scan has run.

The breach exposure finding is equally significant. Fourteen email addresses in breach databases means fourteen accounts whose leaked passwords may still work on the company's login pages. Trying those passwords (credential stuffing) is an active step that needs authorisation, but the exposure itself was discovered passively, before the active phase begins.
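The email format and key personnel rows of a profile like this combine naturally into a list of candidate addresses to check against breach databases. The sketch below assumes a simple pattern string; `candidate_emails` and its `pattern` placeholders (`{first}`, `{last}`, `{f}`) are hypothetical names chosen for this illustration.

```python
def candidate_emails(full_names, domain, pattern="{first}.{last}"):
    """Build candidate addresses from names using an observed email pattern.

    Supported placeholders (an assumption of this sketch):
    {first} = first name, {last} = last name, {f} = first initial.
    """
    emails = []
    for full_name in full_names:
        parts = full_name.lower().split()
        if len(parts) < 2:
            continue  # skip names that cannot be split into first and last
        local = pattern.format(first=parts[0], last=parts[-1], f=parts[0][0])
        emails.append(f"{local}@{domain}")
    return emails


if __name__ == "__main__":
    staff = ["James Hollis", "Sarah Meade", "Tom Barker"]
    for address in candidate_emails(staff, "meridianretail.co.uk"):
        print(address)
```

These are guesses until confirmed, so a real profile would mark each address as unverified.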

OSINT — the discipline behind passive reconnaissance

The formal name for gathering publicly available information is OSINT — Open Source Intelligence. It is a field in its own right, used by security researchers, journalists, law enforcement, and pen testers alike. The "open source" part refers to the fact that the information is publicly available — not that it involves open source software.

OSINT practitioners develop a methodical approach to searching — knowing which sources to check first, how to cross-reference information between sources, and how to avoid confirmation bias when building a target profile. The tools you will use later in this course — Shodan, theHarvester, Maltego, and others — are all OSINT tools at their core. But the methodology matters more than the tool. A tester with no tool but a sharp methodology will outperform a tester with every tool but no process.

Sources pen testers check during passive recon

Public records and databases
WHOIS, ARIN/RIPE, Companies House, SEC filings

Search engines
Google, Bing, and specialised search operators that surface hidden pages and exposed files

Social media
LinkedIn, Twitter/X, GitHub — employees often reveal technology stacks and internal tooling without realising it

Historical data
Wayback Machine archives of old website versions that may reveal removed but still-accessible content

Breach databases
Have I Been Pwned and similar services to check corporate email exposure

Job postings
Surprisingly revealing — a posting for a "Senior AWS Engineer" tells you the company runs AWS infrastructure
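The job postings source lends itself to a small sketch: scan the text of a posting for technology keywords. The `TECH_KEYWORDS` map and the `technologies_mentioned` function are hypothetical and deliberately tiny; a real engagement would maintain a much longer list.

```python
import re

# Hypothetical keyword map used only for this illustration.
TECH_KEYWORDS = {
    "aws": "Amazon Web Services",
    "azure": "Microsoft Azure",
    "kubernetes": "Kubernetes",
    "wordpress": "WordPress",
    "postgresql": "PostgreSQL",
}


def technologies_mentioned(posting_text):
    """Return technologies whose keyword appears as a whole word in the text."""
    # Splitting into whole words avoids false hits such as "laws" matching "aws".
    words = set(re.findall(r"[a-z0-9]+", posting_text.lower()))
    return sorted(label for keyword, label in TECH_KEYWORDS.items() if keyword in words)


if __name__ == "__main__":
    posting = "Senior AWS Engineer - Kubernetes experience required"
    print(technologies_mentioned(posting))
```

A mention in a posting suggests the technology is in use somewhere in the company; it does not show where or in which version.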

Recon in the context of a live engagement

On a real engagement, reconnaissance doesn't stop after the passive phase. As the test progresses and you gain more access, new information surfaces — internal hostnames, internal IP ranges, employee credentials — that feeds back into your understanding of the target. Recon is an ongoing process throughout the engagement, not a box you tick on day one.

The scenario: You have been engaged to test a regional logistics company. It is day one and your active scanning phase doesn't start until tomorrow — the client's IT team needs to whitelist your IP address first. In the meantime, your team lead asks you to spend today on passive recon and build a target profile. You are working entirely from public sources. Nothing touches the client's systems today.

# whois is a command that looks up public registration details for a domain
# It queries a public registry database — NOT the target's own servers
# This means zero risk of detection — no packets ever reach the target
# You can run this on any public domain without any authorisation

whois targetlogistics.com
# Replace targetlogistics.com with the actual domain you are investigating

Domain Name: TARGETLOGISTICS.COM
Registrar: GoDaddy.com, LLC
Creation Date: 2008-03-12T09:14:22Z
Expiry Date: 2026-03-12T09:14:22Z
Registrant Organization: Target Logistics Ltd.
Registrant Country: GB
Name Server: NS1.CLOUDFLARE.COM
Name Server: NS2.CLOUDFLARE.COM
DNSSEC: unsigned

Reading this output — three immediate findings:

Creation Date: 2008. A domain registered 16 years ago often points to legacy infrastructure. Systems built that long ago may still run software that hasn't been updated in years. Flag it as something to check before you scan a single port.

Name Server: CLOUDFLARE. The domain uses Cloudflare for DNS, so the real origin server is probably hidden behind Cloudflare's proxy. If you scan the domain's public IP, you will likely hit Cloudflare's infrastructure, not the actual target server. Finding the real IP is a separate task you will need to complete before scanning begins.

DNSSEC: unsigned — This means DNS responses for the domain carry no cryptographic signature. An attacker positioned between a user and their DNS resolver could potentially forge responses and redirect traffic to a fake version of the site. Low severity, but it goes in the report.
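The three checks above can be sketched as a small parser over saved WHOIS output. This is a minimal illustration that assumes the simple "Key: value" layout shown in the sample; real WHOIS output varies by registry, and `parse_whois` and `whois_findings` are hypothetical helper names.

```python
SAMPLE = """Domain Name: TARGETLOGISTICS.COM
Registrar: GoDaddy.com, LLC
Creation Date: 2008-03-12T09:14:22Z
Expiry Date: 2026-03-12T09:14:22Z
Registrant Organization: Target Logistics Ltd.
Registrant Country: GB
Name Server: NS1.CLOUDFLARE.COM
Name Server: NS2.CLOUDFLARE.COM
DNSSEC: unsigned"""


def parse_whois(text):
    """Parse 'Key: value' lines into a dict of lists (keys can repeat)."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # skip lines that are not 'Key: value' pairs
        record.setdefault(key.strip().lower(), []).append(value.strip())
    return record


def whois_findings(record):
    """Return short notes for the three checks discussed in the text."""
    notes = []
    created = record.get("creation date", [""])[0]
    if created[:4].isdigit():
        notes.append(f"Domain registered in {created[:4]}: check for legacy systems")
    if any("cloudflare" in ns.lower() for ns in record.get("name server", [])):
        notes.append("Cloudflare name servers: origin IP is probably behind a proxy")
    if "unsigned" in [value.lower() for value in record.get("dnssec", [])]:
        notes.append("DNSSEC unsigned: DNS responses carry no signature")
    return notes


if __name__ == "__main__":
    for note in whois_findings(parse_whois(SAMPLE)):
        print(note)
```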

That is three useful findings from a single passive command that took under a second to run. Now check what subdomains are publicly visible.

# crt.sh is a public search service for certificate transparency logs
# Publicly trusted SSL/TLS certificates are recorded in these logs when they are issued
# We query crt.sh, not the target, so this lookup is completely passive

# curl fetches the crt.sh results in JSON format
# -s means silent mode, which suppresses progress output so only data is returned
# &output=json tells crt.sh to return structured data instead of a webpage
# The | at the end of the curl line pipes the JSON output into Python for processing
curl -s "https://crt.sh/?q=targetlogistics.com&output=json" | python3 -c "
import sys, json                          # import tools for reading input and parsing JSON
data = json.load(sys.stdin)               # load the JSON data from curl's output
names = set()                             # a set keeps each name only once
for d in data:                            # each entry describes one logged certificate
    names.update(d['name_value'].splitlines())  # an entry can list several names, one per line
for n in sorted(names):                   # loop through the unique names in sorted order
    print(n)                              # print it to the screen, one per line
"

mail.targetlogistics.com
portal.targetlogistics.com
staging.targetlogistics.com
targetlogistics.com
track.targetlogistics.com
vpn.targetlogistics.com
www.targetlogistics.com

Reading this output: seven hostnames returned from a public certificate log.

Each line is a hostname that has appeared on an SSL certificate at some point: the main domain plus six subdomains. This list came entirely from a public log, not from any contact with the company's servers. Three of these immediately stand out: vpn (a VPN gateway and a high-value target), portal (likely an employee login page), and staging (a development environment that is often less secured than production). The priority table below maps the key findings into an action plan for the active phase.

Those seven hostnames are raw data. The next step is turning them into a prioritised list by deciding which ones are worth investigating further in the active phase and why. This is what that analysis looks like written up as a working document during the engagement.

RECON FINDINGS — Day 1 Passive Phase

Subdomain	Interest level	Reason
www.targetlogistics.com	Low	Main public website — likely hardened and monitored
mail.targetlogistics.com	Medium	Mail server — check for exposed OWA or webmail login
vpn.targetlogistics.com	High	VPN gateway — check software version and authentication method
portal.targetlogistics.com	High	Employee or customer portal — likely contains authenticated functionality
staging.targetlogistics.com	Critical	Staging environment — often publicly accessible, unpatched, no WAF
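A first-pass version of this triage can be automated by looking at the leftmost label of each hostname. The sketch below is one possible approach under that assumption; the `RULES` mapping simply mirrors the table, and both it and the `triage` function are hypothetical. Anything without a rule is marked "Unrated" so it still gets a manual look.

```python
# Hypothetical label-to-interest rules that mirror the table above.
RULES = {
    "staging": "Critical",
    "vpn": "High",
    "portal": "High",
    "mail": "Medium",
    "www": "Low",
}
ORDER = ["Critical", "High", "Medium", "Low", "Unrated"]


def triage(hostnames):
    """Return (hostname, interest) pairs, most interesting first."""
    rated = []
    for host in hostnames:
        label = host.split(".")[0].lower()  # leftmost DNS label, e.g. "vpn"
        rated.append((host, RULES.get(label, "Unrated")))
    # Sort by interest level first, then alphabetically within a level.
    return sorted(rated, key=lambda pair: (ORDER.index(pair[1]), pair[0]))


if __name__ == "__main__":
    found = [
        "mail.targetlogistics.com",
        "portal.targetlogistics.com",
        "staging.targetlogistics.com",
        "targetlogistics.com",
        "track.targetlogistics.com",
        "vpn.targetlogistics.com",
        "www.targetlogistics.com",
    ]
    for host, interest in triage(found):
        print(f"{interest:<9}{host}")
```

Keyword rules only give a starting order. The reasoning in the table, such as why a staging host matters, still has to come from the tester.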

Breaking it down:

Certificate transparency logs (crt.sh)
Publicly trusted SSL certificates are recorded in public certificate transparency logs when they are issued. This was introduced as a security measure to detect rogue certificates, but it also means pen testers can query the logs to find subdomains that have appeared on a certificate. Hosts covered only by a wildcard certificate will not show up by name. The lookup is completely passive, relies only on public data, and is remarkably effective.

The staging subdomain
Staging environments are built for developers to test changes before they go live. They often run older software, skip security controls like WAFs, and are accessible to anyone who knows the URL. Finding one in passive recon is one of the most common high-value discoveries a pen tester makes.

The VPN subdomain
A publicly visible VPN gateway tells you the company allows remote access. The next question — answered in the active phase — is what software is running and what version. VPN software has historically been a rich source of critical vulnerabilities.

Teacher's Note: The amount of information available about any organisation through passive reconnaissance alone is consistently surprising — even to experienced pen testers. Always complete a thorough passive phase before any active scanning. The picture you build here directly determines the quality of everything that follows.

Practice questions

Scenario:

A pen tester spends the first day of an engagement searching LinkedIn for employee names, reading the company's public job postings to identify their technology stack, looking up their domain registration details on a public WHOIS service, and checking their corporate email addresses against a public breach database. No tools are run against the company's servers. No packets are sent to their infrastructure. What type of reconnaissance is this?

Scenario:

During passive recon on a target company, a pen tester queries a public service that records every SSL certificate ever issued for a domain. The query returns seven subdomains — including a staging environment and a VPN gateway — that were not visible on the company's public website. No request was made to any of the company's own servers. What public data source did the tester use to discover these subdomains?

Scenario:

A senior pen tester tells a junior colleague: "Before we start scanning, I want you to spend two hours doing some solid gathering on this target — check their company registration, search their socials, look up their DNS records, find their employee email format, and see if anything comes up in breach databases. All public sources only." The junior asks what this discipline is formally called. What is the answer?
