Ethical Hacking
Information Gathering Tools
The recon phase is only as good as the tools behind it. This lesson walks through the specific tools professionals reach for at each stage of information gathering — what each one does, when to use it, and what the output actually tells you.
Tools don't replace thinking — they accelerate it
A common mistake beginners make is treating recon tools like a magic button. Run the tool, get the answer, move on. That approach misses the point entirely. Tools surface raw data — it is the pen tester's job to interpret that data, spot the anomalies, connect the dots, and decide what is worth pursuing further.
A port scanner that finds 47 open services on a server is not useful on its own. A pen tester who looks at those 47 services, notices that one of them is running software three major versions behind, cross-references it against a CVE database, and realises it is exploitable without authentication — that is useful.
With that framing in place, here are the tools you will use most frequently during information gathering — all taught here strictly for authorised engagements.
The core toolkit — seven tools every pen tester knows
These are not obscure specialist tools. They appear in almost every professional engagement at some point. Some are passive, some are active — the table below maps each one so you know exactly what category it falls into before you reach for it.
| Tool | Primary use | Type | Needs authorisation? |
|---|---|---|---|
| theHarvester | Collect emails, subdomains, IPs and employee names from public sources | Passive | No |
| Shodan | Search engine for internet-facing devices — find open ports and services without scanning | Passive | No |
| Maltego | Visual intelligence mapping — connects people, domains, IPs and organisations into a relationship graph | Passive | No |
| Nmap | Port scanning, service detection and OS fingerprinting against live targets | Active | Yes |
| dig / nslookup | DNS record lookups — find A, MX, TXT, NS and CNAME records for a domain | Passive | No |
| Recon-ng | Modular recon framework — automates multiple intelligence gathering tasks in a structured workflow | Passive | No |
| WHOIS | Domain registration records — registrar, owner organisation, creation date, nameservers | Passive | No |
Six of the seven tools above are passive — they gather information without touching the target. Only Nmap is active. That ratio reflects how a real engagement is run. The majority of useful intelligence is gathered before a single active packet leaves your machine.
theHarvester — gathering emails and names from public sources
theHarvester is one of the first tools a pen tester runs. It queries search engines, LinkedIn, DNS records, and other public sources to pull together a list of email addresses, employee names, subdomains, and IP addresses associated with a target domain. All passive — it never contacts the target's systems directly.
The email addresses it finds are particularly valuable. Cross-reference them against a breach database and you may find that several employees' credentials were exposed in an earlier, unrelated breach. If any of those passwords are still in use, that is a credential stuffing risk, and it goes straight into the report.
The scenario: You are on day one of a black box engagement against a UK-based law firm. Your team lead asks you to run theHarvester to build an initial picture of the target's email footprint and any subdomains visible from public sources. Nothing in this step touches the firm's servers.
# theHarvester gathers public intelligence about a target domain
# It searches multiple sources simultaneously — no packets hit the target
# -d specifies the target domain to investigate
# -b specifies which data sources to search
# "all" tells it to query every available source at once
# You can also specify individual sources like google, bing, linkedin
# -l 200 limits the results to 200 entries so the output stays manageable
theHarvester -d targetlawfirm.co.uk -b all -l 200
Output:
*******************************************************************
* theHarvester 4.4.0
*******************************************************************
[*] Target: targetlawfirm.co.uk
[*] Searching across: google, bing, linkedin, dnsdumpster, crtsh...
[*] Emails found: 12
--------------------------------------------------
j.morrison@targetlawfirm.co.uk
s.chen@targetlawfirm.co.uk
admin@targetlawfirm.co.uk
hr@targetlawfirm.co.uk
info@targetlawfirm.co.uk
[*] Subdomains found: 5
--------------------------------------------------
mail.targetlawfirm.co.uk
portal.targetlawfirm.co.uk
staging.targetlawfirm.co.uk
vpn.targetlawfirm.co.uk
www.targetlawfirm.co.uk
[*] IPs found: 3
--------------------------------------------------
89.44.12.201
104.21.14.82
185.220.101.9
Breaking it down:
-d targetlawfirm.co.uk: The target domain. theHarvester builds all its searches around this. It looks for email addresses at this domain, subdomains under it, and IP addresses associated with it.
-b all: Tells theHarvester to search every available source, including Google, Bing, LinkedIn, DNS databases, and certificate logs. Using "all" is slower but more thorough. On time-limited engagements, you might specify only the most productive sources, such as google,linkedin,crtsh.
admin@ and hr@: Generic role-based addresses like these are worth checking against breach databases immediately. They are often shared credentials, meaning several people know the password, which makes them both more likely to have been leaked and harder to change without disrupting operations.
staging.targetlawfirm.co.uk: A staging subdomain appears again, as it does in almost every engagement. A law firm's staging environment is even more sensitive than a retailer's, because it likely contains draft legal documents and client data being tested against a less-secured system.
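As a rough illustration of the triage described above, the sketch below splits a saved list of harvested addresses into role-based and personal mailboxes. The function names and the list of role prefixes are assumptions made for this example; they are not part of theHarvester, and the sample addresses are copied from the output earlier in this section.

```shell
#!/usr/bin/env bash
# Sketch: separate role-based mailboxes from personal ones.
# ROLE_PATTERN is an illustrative assumption, not an official list.
ROLE_PATTERN='^(admin|hr|info|support|sales|accounts|postmaster)@'

# Reads one address per line on stdin, prints only role-based addresses.
role_addresses() {
  grep -Ei "$ROLE_PATTERN" | sort -u
}

# Reads one address per line on stdin, prints everything that is not role-based.
personal_addresses() {
  grep -Eiv "$ROLE_PATTERN" | sort -u
}

# Sample input taken from the lesson's output; in practice, read a saved file.
emails='j.morrison@targetlawfirm.co.uk
s.chen@targetlawfirm.co.uk
admin@targetlawfirm.co.uk
hr@targetlawfirm.co.uk
info@targetlawfirm.co.uk'

echo "Role-based:"
printf '%s\n' "$emails" | role_addresses
echo "Personal:"
printf '%s\n' "$emails" | personal_addresses
```

Keeping the two groups apart makes it easier to prioritise: shared mailboxes go to the breach-database check first, while named individuals feed the people side of the profile.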
Shodan — the search engine for internet-connected devices
Shodan is unlike any other search engine you have used. Where Google indexes websites, Shodan indexes devices: servers, routers, webcams, industrial control systems, and IoT devices exposed to the internet. It continuously scans publicly reachable addresses and stores what it finds: open ports, running services, software versions, and banner information.
For a pen tester, Shodan means you can often discover what software a target's server is running before making any direct contact with it. That is significant — it turns active scanning information into passive intelligence.
The scenario: theHarvester returned an IP address — 89.44.12.201 — associated with the law firm's VPN subdomain. Before running any active scans against it, you check Shodan to see if it has already indexed this IP and knows what is running on it.
# Shodan CLI tool — queries Shodan's database for a specific IP address
# This is completely passive — no packets reach 89.44.12.201
# Shodan already scanned this IP previously and stored the results
# We are just reading their index, not contacting the server ourselves
# shodan host performs a lookup on a specific IP address
# Replace the IP with the one you found during theHarvester scan
shodan host 89.44.12.201
Output:
89.44.12.201
City: London
Country: United Kingdom
Organisation: Fasthosts Internet Ltd
Updated: 2024-11-03
Ports:
  443/tcp - HTTPS
  8443/tcp - Fortinet SSL-VPN 6.4.2
  22/tcp - OpenSSH 7.4
Vulnerabilities:
  CVE-2022-40684 - Fortinet SSL-VPN auth bypass (CVSS 9.6) - CRITICAL
  CVE-2022-42475 - Fortinet SSL-VPN RCE (CVSS 9.8) - CRITICAL
Breaking it down:
8443/tcp, Fortinet SSL-VPN 6.4.2: Shodan identified the exact software and version running on this port from the banner it returned during a previous scan. That version number is everything. It is what you cross-reference against vulnerability databases.
CVE-2022-40684 (CVSS 9.6): A CVE (Common Vulnerabilities and Exposures) is a publicly documented security flaw with a unique ID. A CVSS score of 9.6 out of 10 is critical severity. This CVE is an authentication bypass: an attacker can act on the Fortinet appliance without any valid credentials. Shodan infers vulnerabilities from the software version it recorded, so treat the match as a lead to verify during the authorised active phase. Even so, it was found passively, before a single active scan.
22/tcp, OpenSSH 7.4: This release dates from 2016 and is significantly outdated. While not as immediately critical as the Fortinet finding, it suggests a pattern of neglected updates on this server that may extend to other services.
This is a critical finding. A CVSS 9.6 authentication bypass on the target's VPN gateway — found passively in under a minute, before the active phase even started. This goes straight into the report as a priority one finding and gets escalated to the client immediately rather than waiting until the end of the engagement.
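To show how flagged findings might be prioritised, here is a minimal sketch that pulls CVE IDs at or above a chosen CVSS threshold out of saved lookup text. It assumes each finding sits on one line containing a CVE ID and a "(CVSS x.y)" score, as in this lesson's sample; the real Shodan CLI lays its output out differently, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list CVE IDs whose CVSS score meets a threshold.
# Assumes one finding per line with "CVE-YYYY-NNNN" and "(CVSS x.y)" on it,
# matching the lesson's sample text rather than real Shodan CLI output.
high_severity_cves() {
  local threshold="$1"
  # Reduce each matching line to "ID score", then compare the score numerically.
  sed -nE 's/.*(CVE-[0-9]{4}-[0-9]+).*\(CVSS ([0-9]+\.[0-9]+)\).*/\1 \2/p' |
    awk -v t="$threshold" '$2 + 0 >= t + 0 { print $1 }'
}

# Sample text based on the lesson's output; in practice, read a saved file.
report='Ports:
443/tcp HTTPS
8443/tcp Fortinet SSL-VPN 6.4.2
22/tcp OpenSSH 7.4
Vulnerabilities:
CVE-2022-40684 Fortinet SSL-VPN auth bypass (CVSS 9.6) CRITICAL
CVE-2022-42475 Fortinet SSL-VPN RCE (CVSS 9.8) CRITICAL'

printf '%s\n' "$report" | high_severity_cves 9.0
```

A filter like this only orders the leads. Each ID still needs to be checked against the vendor advisory and confirmed within scope before it is reported as exploitable.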
dig — reading DNS records directly
dig stands for Domain Information Groper. It is a command-line tool for querying DNS, the public system that maps domain names to IP addresses and other technical details. Every domain has a set of DNS records, and reading them carefully often reveals infrastructure details the company did not intend to expose.
DNS records come in several types. The most useful ones for recon are MX records (which mail server handles email for this domain), TXT records (which often contain SPF and DMARC security configurations), and NS records (which reveal the DNS provider). Each tells a different part of the story.
The scenario: You want to understand the law firm's email infrastructure before considering a phishing simulation. A dig query against their MX records will tell you exactly which mail server handles their incoming email — and whether it has any security configurations in place that might affect delivery of a test phishing email.
# dig queries the public DNS system for records about a domain
# This is passive — it asks public DNS servers, not the target's systems
# MX records tell us which mail server handles email for this domain
# The priority number before the server name controls which server is tried first
dig MX targetlawfirm.co.uk
# TXT records often contain SPF rules — which servers are allowed to send email
# as this domain. Missing or weak SPF makes phishing simulation easier.
dig TXT targetlawfirm.co.uk
# +short gives a cleaner, shorter output — useful when you just need the values
dig MX targetlawfirm.co.uk +short
;; ANSWER SECTION:
targetlawfirm.co.uk. 300 IN MX 10 mail.targetlawfirm.co.uk.
targetlawfirm.co.uk. 300 IN MX 20 mail2.targetlawfirm.co.uk.
;; ANSWER SECTION (TXT):
targetlawfirm.co.uk. 300 IN TXT "v=spf1 include:mailgun.org ~all"
targetlawfirm.co.uk. 300 IN TXT "MS=ms48291047"
+short output:
10 mail.targetlawfirm.co.uk.
20 mail2.targetlawfirm.co.uk.
Breaking it down:
MX 10 and MX 20: The number before the mail server is its priority. Lower number = higher priority. Email gets sent to the MX 10 server first. If that fails, it falls back to MX 20. Two mail servers means redundancy, and two targets to investigate.
"v=spf1 include:mailgun.org ~all": This is the SPF record. It declares which servers are authorised to send email claiming to be from this domain. The ~all at the end means "soft fail": emails from unauthorised servers are marked as suspicious but not rejected outright. A stricter -all asks receivers to reject them. With this configuration, phishing emails that fail the SPF check will typically still be delivered, just potentially flagged, although the final decision rests with the receiving mail server.
+short: Without +short, dig returns the full DNS response including timing data, TTL values, and query metadata. With +short, you get just the answer, which is much cleaner when you need to quickly read a result or pipe it into another command.
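The reading of the MX and SPF records above can be sketched as two small helpers: one classifies the "all" mechanism of an SPF string, the other picks the highest-priority server from `dig MX ... +short` style output. Both function names are hypothetical, and the inputs are the sample values from this lesson rather than live lookups.

```shell
#!/usr/bin/env bash
# Sketch: classify how an SPF record treats senders that match nothing else.
spf_policy() {
  local record="$1"
  case "$record" in
    *" -all"*) echo "hardfail" ;;          # unauthorised senders should be rejected
    *" ~all"*) echo "softfail" ;;          # accepted but marked as suspicious
    *" ?all"*) echo "neutral" ;;           # no statement either way
    *" +all"*|*" all"*) echo "pass-all" ;; # any sender is authorised
    *) echo "no-all-mechanism" ;;
  esac
}

# Reads "priority hostname" lines on stdin and prints the preferred mail server.
primary_mx() {
  sort -n | head -n 1 | awk '{ print $2 }'
}

spf_policy 'v=spf1 include:mailgun.org ~all'
printf '%s\n' '20 mail2.targetlawfirm.co.uk.' '10 mail.targetlawfirm.co.uk.' | primary_mx
```

Against a live, in-scope domain the same helpers could be fed from dig, for example by piping `dig MX targetlawfirm.co.uk +short` into `primary_mx`.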
Putting the tools together — a complete recon workflow
In practice, these tools are not used in isolation. A professional passive recon phase runs them in sequence — each one adding a layer to the picture. Here is how that workflow maps out from first command to completed intelligence profile.
1. WHOIS: domain registration and ownership
Start here. Get the domain age, registrar, nameservers, and organisation name. Sets the baseline for everything that follows.
2. theHarvester: emails, subdomains, and people
Run against the primary domain. Builds the email footprint and surfaces subdomains you may not have known existed.
3. dig: DNS records and email security posture
Pull MX, TXT, NS and A records. Check SPF and DMARC configuration. Note any missing or weak records.
4. Shodan: known services and vulnerabilities on discovered IPs
Take every IP address found so far and check it in Shodan. Look for exposed services, software versions, and any CVEs already flagged against them.
5. Compile the profile: prioritise and document
Bring everything together into a structured document. Prioritise findings by risk. Identify the highest-value targets for the active phase. Log every source and timestamp.
That workflow can be completed in two to three hours for a standard target. The output — a structured intelligence profile with prioritised findings — is what the active scanning phase is built on. A weak passive phase means a weak active phase.
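One way to keep that sequence repeatable is to generate the command list before running anything. The sketch below only prints the commands in workflow order; nothing is executed. The function name `recon_plan` is hypothetical, the DMARC lookup assumes the policy is published as a TXT record under the `_dmarc` subdomain (the standard location), and the IP addresses are the sample values from this lesson.

```shell
#!/usr/bin/env bash
# Sketch: print the passive recon commands for a domain, in workflow order.
# This is a dry run. Review the plan, then run each step within an authorised scope.
recon_plan() {
  local domain="$1"
  shift
  echo "whois $domain"
  echo "theHarvester -d $domain -b all -l 200"
  local rtype
  for rtype in MX TXT NS A; do
    echo "dig $rtype $domain +short"
  done
  # DMARC policy lives in a TXT record on the _dmarc subdomain.
  echo "dig TXT _dmarc.$domain +short"
  # Remaining arguments are IP addresses discovered in the earlier steps.
  local ip
  for ip in "$@"; do
    echo "shodan host $ip"
  done
}

recon_plan targetlawfirm.co.uk 89.44.12.201 104.21.14.82 185.220.101.9
```

Printing the plan first also gives you a ready-made log of what was run and in what order, which supports the final step of documenting every source.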
Teacher's Note: The tools in this lesson ship with Kali Linux or are available from its package repositories, so setup is minimal. The one manual step to expect is the Shodan CLI, which needs an API key (configured with shodan init) before it returns results. When you reach the lab setup lesson, you will have a working environment where you can run all of these commands against practice targets in a completely legal, isolated setting.