Ethical Hacking Lesson 13 – Google Dorking | Dataplexa

Reconnaissance & Scanning · Lesson 13

Google Dorking

Search engines index the web indiscriminately. They do not distinguish between a company's polished homepage and a misconfigured server dumping its directory listing publicly. Google Dorking is the art of using search operators to surface the second kind — the things that were never supposed to be findable.

Indexed by accident, exposed by design

Every day, Google's crawlers visit billions of web pages and index everything they can read. They follow links, read directory listings, pull text out of uploaded documents, and log everything into a searchable database. The crawlers are not malicious — they are just thorough. The problem is that the things they index include content that administrators never intended to be public.

A developer who uploads a configuration file to a web server with a misconfigured directory listing. A company that puts a test login page on a publicly accessible subdomain. An IT team that leaves a network management interface reachable from the internet. In each case, Google's crawler finds it, indexes it, and from that moment it is searchable by anyone who knows the right query to use.

Google Dorking, also known as Google Hacking, is the practice of using advanced search operators to find exactly these kinds of exposures. It requires no special software and no network access to the target, and it produces zero logs on any of the target's systems. It is entirely passive. The only interaction is between you and Google's search engine.

The operators that make it work

Google's standard search box accepts a set of operators — special keywords and syntax that modify what the search engine looks for and where. Most people have used at least one without knowing it (wrapping a phrase in quotes to search for the exact string). The operators that matter for reconnaissance go considerably further than that.

GOOGLE SEARCH OPERATORS — recon reference

Operator	Purpose	Example
site:	Restrict results to a specific domain or subdomain	site:targetcompany.com
filetype:	Search for specific file extensions — PDFs, spreadsheets, config files	filetype:pdf site:targetcompany.com
intitle:	Find pages with a specific word or phrase in the page title	intitle:"index of" site:targetcompany.com
inurl:	Find pages with a specific word or string in the URL	inurl:admin site:targetcompany.com
intext:	Search for a specific word or phrase within the body text of a page	intext:"password" filetype:log
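cache:	View Google's cached copy of a page. Google retired cached pages and this operator in 2024, so the Internet Archive's Wayback Machine is now the usual substitute for seeing a page that has changed or been taken down	cache:targetcompany.com/login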
-	Exclude a term from results — useful for narrowing down noisy searches	site:targetcompany.com -www

The real power comes from combining these operators. A single operator finds things. Two or three chained together find things that are specifically interesting to a pen tester: exposed admin panels on a specific domain, configuration files uploaded to a public directory, login pages that were never meant to be indexed.
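To see how chaining works mechanically, here is a minimal Python sketch that assembles a query string from whichever operators you pass in. The function name `build_dork` and its parameters are hypothetical, chosen for illustration; Google offers no such API, and the output is simply text you would paste into the search box.

```python
def _quoted(value: str) -> str:
    """Wrap a phrase in double quotes unless the caller already did."""
    if value.startswith('"') and value.endswith('"'):
        return value
    return f'"{value}"'


def build_dork(site=None, filetype=None, intitle=None, inurl=None,
               intext=None, exclude=()):
    """Chain Google search operators into one query string.

    Only the operators you pass are included. intitle and intext values are
    quoted as exact phrases; inurl, filetype and site are used as given.
    Each term in `exclude` is prefixed with a minus sign.
    """
    parts = []
    if intitle:
        parts.append(f"intitle:{_quoted(intitle)}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if intext:
        parts.append(f"intext:{_quoted(intext)}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


print(build_dork(intitle="index of", site="targetcompany.com"))
# intitle:"index of" site:targetcompany.com
print(build_dork(intext="password", filetype="log"))
# intext:"password" filetype:log
```

Each call reproduces one of the example queries from the operator table, which makes it easy to generate consistent variations without retyping the quoting by hand.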

Dorks that show up in real engagements

The security community maintains a public database of effective dorks called the Google Hacking Database — GHDB — which catalogues searches that have found genuinely sensitive exposures across real systems. Below are the categories of findings that come up most consistently in professional pen tests, along with the dork patterns used to find them.

Exposed directory listings

High value

When a web server has directory listing enabled and no index page exists in a folder, the server displays the folder contents like a file browser. Anything in that directory — configuration files, backups, scripts, database dumps — becomes downloadable by anyone who finds it.

intitle:"index of" site:targetcompany.com
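Because an auto-generated listing title follows a fixed pattern, you can triage result titles you have already copied out of Google without visiting any of the pages. The sketch below assumes you have collected (URL, title) pairs by hand; the function name and the sample data are illustrative.

```python
import re

# Default title used by Apache and nginx auto-generated listings,
# e.g. "Index of /uploads". Case-insensitive because result titles vary.
LISTING_TITLE = re.compile(r"^\s*index of\s+/", re.IGNORECASE)


def looks_like_directory_listing(title: str) -> bool:
    """Return True if a search-result title matches the open-listing pattern."""
    return bool(LISTING_TITLE.match(title))


# (URL, title) pairs copied from a results page; no request is made to them.
results = [
    ("staging.targetfirm.com/uploads/", "Index of /uploads"),
    ("www.targetfirm.com/about", "About Us | Target Firm"),
]
for url, title in results:
    if looks_like_directory_listing(title):
        print("open listing:", url)
# open listing: staging.targetfirm.com/uploads/
```

The pattern requires a slash after "index of" because `intitle:"index of"` also matches ordinary pages whose titles merely contain the phrase; anchoring on the path cuts those false positives.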

Exposed login and admin panels

Critical

Admin panels, CMS login pages, and network management interfaces that are publicly accessible and indexed are extremely common findings. A WordPress admin login page exposed to the internet and showing up in Google search results is practically an invitation for a brute-force attack.

inurl:wp-admin site:targetcompany.com
inurl:/admin/login site:targetcompany.com
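When a results page returns dozens of URLs, a short script can tag the ones whose paths match well-known admin locations. This is a sketch, not a fingerprinting tool: the path-to-software mapping is a small illustrative sample (WordPress uses /wp-admin and /wp-login.php, Joomla uses /administrator), and a match is only a hint about what is running. The function name `admin_hint` is hypothetical.

```python
from urllib.parse import urlparse

# Illustrative mapping of admin paths to the software that commonly uses them.
# A path match is a hint, not a confirmed fingerprint.
ADMIN_PATH_HINTS = {
    "/wp-admin": "WordPress",
    "/wp-login.php": "WordPress",
    "/administrator": "Joomla",
    "/admin": "generic admin panel",
}


def admin_hint(url: str):
    """Return a hint string if the URL path matches a known admin path, else None."""
    # Search results are often shown without a scheme; add one so urlparse
    # puts the hostname in netloc rather than in the path.
    if "://" not in url:
        url = "https://" + url
    path = urlparse(url).path.lower()
    for prefix, hint in ADMIN_PATH_HINTS.items():
        # Match the whole segment so "/administrator" is not read as "/admin".
        if path == prefix or path.startswith(prefix + "/"):
            return hint
    return None


for url in ["portal.targetfirm.com/admin/login", "old.targetfirm.com/administrator"]:
    print(url, "->", admin_hint(url))
# portal.targetfirm.com/admin/login -> generic admin panel
# old.targetfirm.com/administrator -> Joomla
```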

Sensitive file types indexed publicly

High value

Log files, SQL database exports, Excel spreadsheets with customer data, and environment configuration files containing API keys and database credentials all get uploaded to web servers and forgotten. If the directory is indexed, they are one search away from public exposure.

filetype:log site:targetcompany.com
filetype:sql site:targetcompany.com
filetype:env site:targetcompany.com
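The same triage idea applies to file types. The sketch below filters a list of result URLs down to the extensions named in this section; the extension set and the function names are assumptions for illustration, and the URLs are only parsed as text, never requested.

```python
from urllib.parse import urlparse

# Extensions from this section's examples, plus the Excel formats it mentions.
# Extend the set to suit the engagement.
SENSITIVE_EXTENSIONS = {"log", "sql", "env", "xls", "xlsx"}


def file_extension(url: str) -> str:
    """Return the lower-cased extension of the last path segment, or ''."""
    if "://" not in url:
        url = "https://" + url
    name = urlparse(url).path.rsplit("/", 1)[-1]
    # rsplit rather than os.path.splitext: splitext treats ".env" as having
    # no extension, but for a dotfile the name itself is what matters here.
    if "." not in name:
        return ""
    return name.rsplit(".", 1)[-1].lower()


def sensitive_results(urls):
    """Keep only result URLs whose file extension is in SENSITIVE_EXTENSIONS."""
    return [u for u in urls if file_extension(u) in SENSITIVE_EXTENSIONS]


print(sensitive_results([
    "staging.targetfirm.com/uploads/",
    "staging.targetfirm.com/uploads/clients_backup_2023.sql",
]))
# ['staging.targetfirm.com/uploads/clients_backup_2023.sql']
```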

Credentials and keys in indexed content

Critical

Developers sometimes upload .env files, configuration files, or script output containing passwords, API keys, and connection strings to publicly accessible locations. When a site:-restricted search finds the word "password" in an indexed text file, it often means something was uploaded that should not have been.

intext:"password" filetype:txt site:targetcompany.com
intext:"DB_PASSWORD" filetype:env
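Search result snippets often show a fragment of the indexed file, which can be enough to tell whether a credential was exposed without opening the file itself. This hypothetical helper reports only the names of secret-looking keys that appear as assignments, never their values, so your notes do not become a second copy of the leak. The key list is an assumption, not a complete catalogue, and the sample snippet is made up.

```python
import re

# Hypothetical key names that commonly hold secrets in .env and config files.
# DB_PASSWORD comes from the dork above; the rest are common conventions.
SECRET_KEY_PATTERN = re.compile(
    r"\b(DB_PASSWORD|PASSWORD|API_KEY|SECRET_KEY|AWS_SECRET_ACCESS_KEY)\s*[=:]",
    re.IGNORECASE,
)


def secret_keys_in_snippet(snippet: str):
    """Return the sorted, de-duplicated key names that appear as assignments."""
    # Only the key name (group 1) is kept; the value after = or : is ignored.
    return sorted({m.group(1).upper() for m in SECRET_KEY_PATTERN.finditer(snippet)})


print(secret_keys_in_snippet("APP_DEBUG=true DB_PASSWORD=s3cret API_KEY: abc"))
# ['API_KEY', 'DB_PASSWORD']
```

Requiring `=` or `:` after the key keeps ordinary prose such as "Forgot your password?" from being flagged.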

A real dork session — building queries step by step

The most effective dork sessions start broad and get narrower. Beginning with a wide site: query shows you everything Google has indexed for the target domain. From there, you layer additional operators to filter down to specific types of content. Each query builds on what the previous one revealed.
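The broad-to-narrow progression can be written down as an ordered plan, so every session against a domain runs the same queries in the same order. A minimal sketch, assuming the five query types used in this lesson's scenario; `dork_session` is an illustrative name, and the script only prints queries for you to run by hand in a browser.

```python
def dork_session(domain: str):
    """Return five dork queries for a domain, ordered from broad to narrow.

    Every query keeps the site: filter so results stay on the in-scope domain.
    """
    scope = f"site:{domain}"
    return [
        scope,                          # everything Google has indexed
        f"{scope} -www",                # drop most main-site pages
        f'intitle:"index of" {scope}',  # open directory listings
        f"filetype:sql {scope}",        # database exports
        f"inurl:admin {scope}",         # admin and login panels
    ]


for step, query in enumerate(dork_session("targetfirm.com"), start=1):
    print(f"Step {step}: {query}")
# Step 1: site:targetfirm.com
# Step 2: site:targetfirm.com -www
# Step 3: intitle:"index of" site:targetfirm.com
# Step 4: filetype:sql site:targetfirm.com
# Step 5: inurl:admin site:targetfirm.com
```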

The scenario: You are in the passive recon phase of an engagement against a professional services firm. You have the domain confirmed in scope. Before running any active scans, you spend thirty minutes working through a structured dork session to see what Google has indexed from their infrastructure. Everything here is passive — no connection to the target, no logs generated anywhere except Google's own servers.

# These are Google search queries — type them directly into Google search
# They are not terminal commands — no tool required, no installation needed
# Everything runs inside a standard web browser, completely passively

# Step 1 — start broad: see everything Google has indexed for this domain
# This gives you the full picture of what is publicly accessible
site:targetfirm.com

# Step 2 — look for subdomains Google knows about that do not appear on the main site
# The minus sign drops results containing the term "www", which filters out most main-site pages and leaves the other subdomains
site:targetfirm.com -www

# Step 3 — look for directory listings that expose server file contents
# "index of" in the page title is the signature of an open directory listing
intitle:"index of" site:targetfirm.com

# Step 4 — look for sensitive file types Google has indexed from this domain
# SQL files are particularly alarming — they may contain database exports
filetype:sql site:targetfirm.com

# Step 5 — look for admin and login panels indexed by Google
# inurl looks for the word "admin" anywhere in the page URL
inurl:admin site:targetfirm.com

site:targetfirm.com
--- About 2,840 results ---
www.targetfirm.com
careers.targetfirm.com
portal.targetfirm.com
mail.targetfirm.com

site:targetfirm.com -www
--- Subdomains found ---
staging.targetfirm.com
dev.targetfirm.com
old.targetfirm.com

intitle:"index of" site:targetfirm.com
--- 1 result ---
staging.targetfirm.com/uploads/
Index of /uploads — Apache/2.4.29

filetype:sql site:targetfirm.com
--- 1 result ---
staging.targetfirm.com/uploads/clients_backup_2023.sql

inurl:admin site:targetfirm.com
--- 2 results ---
portal.targetfirm.com/admin/login
old.targetfirm.com/administrator

Breaking it down:

staging.targetfirm.com appearing in three separate queries
The staging subdomain showed up when searching for non-www subdomains, when searching for directory listings, and when looking for exposed SQL files. Three separate signals pointing at the same host. When different query types all converge on the same target, that is where you start the active phase.

clients_backup_2023.sql
A database backup file sitting in a publicly accessible directory, indexed by Google. The filename alone tells you what is likely inside it — client records from 2023. This is an immediate critical finding. The client needs to be notified before the end of the business day, not at the end of the engagement.

old.targetfirm.com/administrator
An administrator panel on a subdomain called "old" — almost certainly a legacy system the organisation forgot was still running. Legacy systems tend to run outdated CMS versions, carry default credentials, and receive zero security patching. The word "administrator" in the URL suggests a Joomla CMS login page.

Apache/2.4.29 in the directory listing footer
The server version is displayed in the directory listing's footer (Apache's server signature line), which is a classic information disclosure finding in its own right. Apache 2.4.29 was released in 2017. Cross-referencing against CVE databases would almost certainly surface multiple critical vulnerabilities for this version.
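Comparing a banner against a known release is easy to get wrong by hand, because version strings do not sort as plain text: as strings, "2.4.29" sorts before "2.4.5". The sketch below parses the version into a tuple of integers before comparing. The reference version is an input you supply after checking Apache's own release information; the sketch does not know which release is current, and the function names are illustrative.

```python
import re

BANNER = re.compile(r"Apache/(\d+)\.(\d+)\.(\d+)")


def apache_version(banner: str):
    """Parse 'Apache/X.Y.Z' out of a server signature into a tuple of ints."""
    m = BANNER.search(banner)
    return tuple(int(part) for part in m.groups()) if m else None


def older_than(banner: str, reference: str) -> bool:
    """True if the banner's Apache version sorts before the reference version.

    `reference` is a version string you choose, e.g. "2.4.50".
    """
    found = apache_version(banner)
    wanted = tuple(int(p) for p in reference.split("."))
    # Tuples compare element by element, so (2, 4, 29) < (2, 4, 50).
    return found is not None and found < wanted


print(apache_version("Apache/2.4.29"))  # (2, 4, 29)
print(older_than("Apache/2.4.29", "2.4.50"))  # True
```

A True result is a lead rather than a confirmed finding, because Linux distributions such as Ubuntu often backport security fixes without changing the version shown in the banner.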

Five queries. Thirty minutes of work. One exposed database backup, two admin panels (one of them on a forgotten legacy subdomain), an outdated Apache version, and a staging environment that is effectively public. All of it passive. None of it requiring a single packet to reach the target's servers.
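The convergence signal that made staging.targetfirm.com stand out can be computed rather than eyeballed once a session produces more than a handful of results. A minimal sketch, assuming you record each query's result URLs in a dictionary; the function names are hypothetical, and the sample data is the result set shown for steps 2 to 5 of the session.

```python
from collections import Counter
from urllib.parse import urlparse


def host_of(result: str) -> str:
    """Extract the hostname from a result URL that may lack a scheme."""
    if "://" not in result:
        result = "https://" + result
    return urlparse(result).hostname or ""


def converging_hosts(results_by_query, min_queries=2):
    """Return (host, query_count) pairs for hosts surfaced by several queries.

    results_by_query maps each query string to the result URLs it returned.
    Hosts seen by at least min_queries distinct queries come back most-seen first.
    """
    counts = Counter()
    for urls in results_by_query.values():
        # A set, so several hits on one host within one query count once.
        counts.update({host_of(u) for u in urls})
    return [(h, n) for h, n in counts.most_common() if n >= min_queries]


session = {
    "site:targetfirm.com -www": [
        "staging.targetfirm.com", "dev.targetfirm.com", "old.targetfirm.com"],
    'intitle:"index of" site:targetfirm.com': ["staging.targetfirm.com/uploads/"],
    "filetype:sql site:targetfirm.com": [
        "staging.targetfirm.com/uploads/clients_backup_2023.sql"],
    "inurl:admin site:targetfirm.com": [
        "portal.targetfirm.com/admin/login", "old.targetfirm.com/administrator"],
}
print(converging_hosts(session))
# [('staging.targetfirm.com', 3), ('old.targetfirm.com', 2)]
```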

The clients_backup_2023.sql finding does not wait for the final report. It gets escalated immediately — a phone call to the engagement contact, not an email, because email takes time and that file is sitting publicly accessible right now. How you handle an in-engagement discovery like this is a mark of professional maturity.

The Google Hacking Database — a catalogue of proven dorks

The GHDB is maintained by Offensive Security and contains thousands of documented dorks that have been used to find real exposures across real systems. It is categorised by finding type — files containing passwords, exposed web server configurations, sensitive online shopping information, vulnerable servers — and each entry includes the exact search query used.

For a pen tester, the GHDB serves as a starting point rather than a complete playbook. The most effective approach is to browse the categories relevant to your target's technology stack, pull the dorks that match, and adapt them by adding a site: operator to focus the search on your specific target domain. Broad GHDB queries without a site: filter return results from random organisations worldwide and have no place in a professional engagement.
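Adding the site: filter is mechanical enough to script when you pull a batch of dorks from the GHDB. This sketch assumes each dork is a plain query string. If a dork already carries its own positive site: filter (some entries deliberately search third-party sites), it is returned unchanged for manual review rather than silently overwritten. The function name is illustrative.

```python
import re

# A positive site: operator at the start of the query or after whitespace.
# "-site:" (an exclusion) deliberately does not count as a scope.
POSITIVE_SITE = re.compile(r"(?:^|\s)site:\S+")


def scope_dork(dork: str, domain: str) -> str:
    """Restrict a GHDB-style dork to one in-scope domain."""
    dork = dork.strip()
    if POSITIVE_SITE.search(dork):
        return dork  # already scoped to something; review by hand
    return f"{dork} site:{domain}"


print(scope_dork('intext:"DB_PASSWORD" filetype:env', "targetcompany.com"))
# intext:"DB_PASSWORD" filetype:env site:targetcompany.com
```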

Dorking is passive — but acting on findings requires authorisation

This distinction is important and gets blurred by beginners. The search queries above are passive — they interact with Google, not with the target. Clicking through to an exposed admin panel, downloading an indexed SQL file, or attempting to log in with default credentials against a discovered login page — all of those cross into active territory and require explicit authorisation.

A common real-world mistake: a researcher finds an exposed directory listing through Google, clicks through to browse the files out of curiosity, and downloads one to verify its contents. That download is unauthorised access to the target's system, even though Google surfaced it, the file was publicly reachable, and the researcher never exploited anything. Passive reconnaissance ends the moment you interact with any resource the target controls.

The rule: Dorking itself is passive. Visiting, downloading, or interacting with anything you find through dorking is active — and requires the same authorisation as any other active technique in your engagement. Document what you find through search queries, then raise it with the client before touching anything directly.
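If you log dork findings in a script or spreadsheet, it helps to mark which results point at infrastructure the target controls, since opening those is the step that needs authorisation. A minimal sketch, assuming a list of in-scope domains taken from the engagement's rules; it checks hostnames only, so target-owned resources hosted on third-party domains still need a manual call. The function name is hypothetical.

```python
from urllib.parse import urlparse


def is_target_controlled(url: str, target_domains) -> bool:
    """True if the URL's host is one of the target's domains or a subdomain.

    Opening such a URL is active interaction and should wait for authorisation;
    the search query that surfaced it was not.
    """
    if "://" not in url:
        url = "https://" + url
    host = (urlparse(url).hostname or "").lower().rstrip(".")
    for domain in target_domains:
        domain = domain.lower().rstrip(".")
        # Exact match or a true subdomain; "eviltargetfirm.com" does not count.
        if host == domain or host.endswith("." + domain):
            return True
    return False


print(is_target_controlled(
    "staging.targetfirm.com/uploads/clients_backup_2023.sql", ["targetfirm.com"]))
# True
```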

Teacher's Note: The clients_backup_2023.sql scenario above is not hypothetical — database backups left in public directories are a finding that comes up more often than it should. The moment you find something like that, the engagement timeline changes. Your job is to protect the client's data, not just document it for a report that arrives two weeks later.

Practice questions

Scenario:

During passive reconnaissance on a financial company, a pen tester wants to check whether Google has indexed any Excel spreadsheet files from the company's domain — the kind that might contain client financial records or internal salary information. They already have the site: operator in their query. Which additional Google search operator do they need to add to restrict results to .xlsx and .xls files specifically?

Scenario:

A pen tester wants to find web server directories that have been left open and are displaying their file contents publicly through Google's index. When a web server has directory listing enabled and no index page is present, the page title typically contains a specific phrase that identifies it as an open directory. Which Google operator and search term combination targets exactly these pages?

Scenario:

While running a dork session during passive reconnaissance on day one of an engagement, a pen tester discovers a Google result linking to a file called employee_salaries_2024.xlsx sitting in an open directory on the target company's staging subdomain. The file appears to be publicly accessible. The active phase of the engagement has not yet started — the client is still setting up IP whitelisting. What is the correct immediate response?


Up Next · Lesson 14

Nmap Scanning

The recon phase is done. Now the scanning begins — learning to use the most powerful and widely used network scanner in the field.
