Ethical Hacking
Google Dorking
Search engines index the web indiscriminately. They do not distinguish between a company's polished homepage and a misconfigured server dumping its directory listing publicly. Google Dorking is the art of using search operators to surface the second kind — the things that were never supposed to be findable.
Indexed by accident, exposed by design
Every day, Google's crawlers visit billions of web pages and index everything they can read. They follow links, read directory listings, pull text out of uploaded documents, and log everything into a searchable database. The crawlers are not malicious — they are just thorough. The problem is that the things they index include content that administrators never intended to be public.
A developer who uploads a configuration file to a web server with a misconfigured directory listing. A company that puts a test login page on a publicly accessible subdomain. An IT team that leaves a network management interface reachable from the internet. In each case, Google's crawler finds it, indexes it, and from that moment it is searchable by anyone who knows the right query to use.
Google Dorking, also known as Google Hacking, is the practice of using advanced search operators to find exactly these kinds of exposures. It requires no special software and no network access to the target, and it leaves no logs on any of the target's systems. It is entirely passive. The only interaction is between you and Google's search engine.
The operators that make it work
Google's standard search box accepts a set of operators — special keywords and syntax that modify what the search engine looks for and where. Most people have used at least one without knowing it (wrapping a phrase in quotes to search for the exact string). The operators that matter for reconnaissance go considerably further than that.
| Operator | Purpose | Example |
|---|---|---|
| site: | Restrict results to a specific domain or subdomain | site:targetcompany.com |
| filetype: | Search for specific file extensions — PDFs, spreadsheets, config files | filetype:pdf site:targetcompany.com |
| intitle: | Find pages with a specific word or phrase in the page title | intitle:"index of" site:targetcompany.com |
| inurl: | Find pages with a specific word or string in the URL | inurl:admin site:targetcompany.com |
| intext: | Search for a specific word or phrase within the body text of a page | intext:"password" filetype:log |
| cache: | View Google's cached version of a page, useful if the live page has changed or been taken down. Google has since retired its public cache, so this operator may no longer return results | cache:targetcompany.com/login |
| - | Exclude a term from results — useful for narrowing down noisy searches | site:targetcompany.com -www |
The real power comes from combining these operators. A single operator finds things. Two or three chained together find things that are specifically interesting to a pen tester: exposed admin panels on a specific domain, configuration files uploaded to a public directory, login pages that were never meant to be indexed.
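Chaining operators is plain string composition, which the following minimal Python sketch illustrates. The helper name `build_dork` is hypothetical, not part of any tool; it only builds the query text, which you still paste into Google by hand.

```python
def build_dork(terms=(), **operators):
    """Join free-text terms and operator:value pairs into one query string.

    `build_dork` is an illustrative helper, not a real library function.
    """
    parts = list(terms)
    for name, value in operators.items():
        # Quote multi-word values so Google treats them as one exact phrase
        if " " in value and not value.startswith('"'):
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    return " ".join(parts)


# One operator finds things; chaining two narrows the search to one domain
query = build_dork(intitle="index of", site="targetcompany.com")
print(query)  # intitle:"index of" site:targetcompany.com
```

Keyword arguments keep their order, so the operators appear in the query in the order you pass them.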
Dorks that show up in real engagements
The security community maintains a public database of effective dorks called the Google Hacking Database — GHDB — which catalogues searches that have found genuinely sensitive exposures across real systems. Below are the categories of findings that come up most consistently in professional pen tests, along with the dork patterns used to find them.
Exposed directory listings
High value
When a web server has directory listing enabled and no index page exists in a folder, the server displays the folder contents like a file browser. Anything in that directory (configuration files, backups, scripts, database dumps) becomes downloadable by anyone who finds it.
intitle:"index of" site:targetcompany.com
Exposed login and admin panels
Critical
Admin panels, CMS login pages, and network management interfaces that are publicly accessible and indexed are extremely common findings. A WordPress admin login page exposed to the internet and showing up in Google search results is practically an invitation for a brute-force attack.
inurl:/admin/login site:targetcompany.com
Sensitive file types indexed publicly
High value
Log files, SQL database exports, Excel spreadsheets with customer data, and environment configuration files containing API keys and database credentials all get uploaded to web servers and forgotten. If the directory is indexed, they are one search away from public exposure.
filetype:sql site:targetcompany.com
filetype:env site:targetcompany.com
Credentials and keys in indexed content
Critical
Developers sometimes commit .env files, configuration files, or script output containing passwords, API keys, and connection strings to publicly accessible locations. When Google indexes a page containing the word "password" alongside a domain name, it often means something was uploaded that should not have been.
intext:"DB_PASSWORD" filetype:env
A real dork session — building queries step by step
The most effective dork sessions start broad and get narrower. Beginning with a wide site: query shows you everything Google has indexed for the target domain. From there, you layer additional operators to filter down to specific types of content. Each query builds on what the previous one revealed.
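As a rough sketch of that broad-to-narrow ordering, the Python below generates a list of queries for one domain and turns each into a search URL to open by hand in a browser. The function names `dork_session` and `search_url` are hypothetical, and the script deliberately sends nothing to Google itself.

```python
from urllib.parse import quote_plus


def dork_session(domain):
    """Return (label, query) pairs ordered from broadest to narrowest."""
    scope = f"site:{domain}"
    return [
        ("everything indexed", scope),
        ("non-www hosts", f"{scope} -www"),
        ("directory listings", f'intitle:"index of" {scope}'),
        ("SQL files", f"filetype:sql {scope}"),
        ("admin URLs", f"inurl:admin {scope}"),
    ]


def search_url(query):
    # Build a Google search URL; nothing is fetched, you open it manually
    return "https://www.google.com/search?q=" + quote_plus(query)


for label, query in dork_session("targetcompany.com"):
    print(f"{label:20} {query}")
    print(f"{'':20} {search_url(query)}")
```

Keeping the queries in an ordered list also gives you a record of exactly what was searched, which is useful when writing up the passive recon phase.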
The scenario: You are in the passive recon phase of an engagement against a professional services firm. You have the domain confirmed in scope. Before running any active scans, you spend thirty minutes working through a structured dork session to see what Google has indexed from their infrastructure. Everything here is passive — no connection to the target, no logs generated anywhere except Google's own servers.
# These are Google search queries — type them directly into Google search
# They are not terminal commands — no tool required, no installation needed
# Everything runs inside a standard web browser, completely passively
# Step 1 — start broad: see everything Google has indexed for this domain
# This gives you the full picture of what is publicly accessible
site:targetfirm.com
# Step 2 — look for subdomains Google knows about that do not appear on the main site
# The minus sign excludes www results so you only see non-standard subdomains
site:targetfirm.com -www
# Step 3 — look for directory listings that expose server file contents
# "index of" in the page title is the signature of an open directory listing
intitle:"index of" site:targetfirm.com
# Step 4 — look for sensitive file types Google has indexed from this domain
# SQL files are particularly alarming — they may contain database exports
filetype:sql site:targetfirm.com
# Step 5 — look for admin and login panels indexed by Google
# inurl looks for the word "admin" anywhere in the page URL
inurl:admin site:targetfirm.com
site:targetfirm.com
--- About 2,840 results ---
www.targetfirm.com
careers.targetfirm.com
portal.targetfirm.com
mail.targetfirm.com

site:targetfirm.com -www
--- Subdomains found ---
staging.targetfirm.com
dev.targetfirm.com
old.targetfirm.com

intitle:"index of" site:targetfirm.com
--- 1 result ---
staging.targetfirm.com/uploads/
Index of /uploads - Apache/2.4.29

filetype:sql site:targetfirm.com
--- 1 result ---
staging.targetfirm.com/uploads/clients_backup_2023.sql

inurl:admin site:targetfirm.com
--- 2 results ---
portal.targetfirm.com/admin/login
old.targetfirm.com/administrator
Breaking it down:
The staging subdomain showed up when searching for non-www subdomains, when searching for directory listings, and when looking for exposed SQL files. Three separate signals pointing at the same host. When different query types all converge on the same target, that is where you start the active phase.
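The convergence idea can be made concrete by counting how many distinct query types point at each host. This is a minimal sketch using the results from the example session, copied in by hand; the variable and function names are illustrative assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Results copied by hand from the example session, keyed by query type
results = {
    "non-www hosts": [
        "staging.targetfirm.com",
        "dev.targetfirm.com",
        "old.targetfirm.com",
    ],
    "directory listings": ["staging.targetfirm.com/uploads/"],
    "SQL files": ["staging.targetfirm.com/uploads/clients_backup_2023.sql"],
    "admin URLs": [
        "portal.targetfirm.com/admin/login",
        "old.targetfirm.com/administrator",
    ],
}


def host_of(result):
    # The copied results carry no scheme, so add one before parsing
    return urlsplit("https://" + result).hostname


# Map each host to the set of query types that surfaced it
signals = defaultdict(set)
for query_type, urls in results.items():
    for url in urls:
        signals[host_of(url)].add(query_type)

# Hosts hit by the most query types come first
ranked = sorted(signals.items(), key=lambda item: len(item[1]), reverse=True)
for host, kinds in ranked:
    print(f"{host:28} {len(kinds)} signal(s): {', '.join(sorted(kinds))}")
```

With this data the staging host ranks first with three signals, matching the reasoning above.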
A database backup file sitting in a publicly accessible directory, indexed by Google. The filename alone tells you what is likely inside it — client records from 2023. This is an immediate critical finding. The client needs to be notified before the end of the business day, not at the end of the engagement.
An administrator panel on a subdomain called "old" — almost certainly a legacy system the organisation forgot was still running. Legacy systems tend to run outdated CMS versions, carry default credentials, and receive zero security patching. The word "administrator" in the URL suggests a Joomla CMS login page.
The server version is displayed in the directory listing page header — a classic information disclosure finding in its own right. Apache 2.4.29 was released in 2017. Cross-referencing against CVE databases would almost certainly surface multiple critical vulnerabilities for this version.
Five queries. Thirty minutes of work. One exposed database backup, two admin panels (one of them on a forgotten subdomain), an outdated Apache version, and a staging environment that is effectively public. All of it passive. None of it requiring a single packet to reach the target's servers.
The clients_backup_2023.sql finding does not wait for the final report. It gets escalated immediately — a phone call to the engagement contact, not an email, because email takes time and that file is sitting publicly accessible right now. How you handle an in-engagement discovery like this is a mark of professional maturity.
The Google Hacking Database — a catalogue of proven dorks
The GHDB is maintained by Offensive Security and contains thousands of documented dorks that have been used to find real exposures across real systems. It is categorised by finding type — files containing passwords, exposed web server configurations, sensitive online shopping information, vulnerable servers — and each entry includes the exact search query used.
For a pen tester, the GHDB serves as a starting point rather than a complete playbook. The most effective approach is to browse the categories relevant to your target's technology stack, pull the dorks that match, and adapt them by adding a site: operator to focus the search on your specific target domain. Broad GHDB queries without a site: filter return results from random organisations worldwide and have no place in a professional engagement.
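Adapting a published dork to a single in-scope domain can be sketched as below. The helper `scope_dork` and the sample dork strings are hypothetical illustrations, not real GHDB entries; the sketch assumes any positive `site:` filter already in the dork should be replaced, while exclusions such as `-site:` are kept.

```python
import re

# Matches a positive site: filter, but not an exclusion written as -site:
SITE_FILTER = re.compile(r"(?<!\S)site:\S+")


def scope_dork(dork, domain):
    """Return the dork restricted to one in-scope domain."""
    # Drop any site: filter the published dork already carries
    stripped = SITE_FILTER.sub("", dork)
    # Collapse the whitespace left behind by the removal
    stripped = " ".join(stripped.split())
    return f"{stripped} site:{domain}".strip()


# Illustrative dorks only, written in the style of catalogue entries
print(scope_dork('intitle:"index of" "backup"', "targetcompany.com"))
print(scope_dork("site:example.org filetype:env", "targetcompany.com"))
```

Running every borrowed dork through one scoping step makes it harder to send an unscoped query by accident.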
Dorking is passive — but acting on findings requires authorisation
This distinction is important and gets blurred by beginners. The search queries above are passive — they interact with Google, not with the target. Clicking through to an exposed admin panel, downloading an indexed SQL file, or attempting to log in with default credentials against a discovered login page — all of those cross into active territory and require explicit authorisation.
A common real-world mistake: a researcher finds an exposed directory listing through Google, clicks through to browse the files out of curiosity, and downloads one to verify its contents. That download is an unauthorised access of the target's system — even though Google surfaced it, even though the file was publicly reachable, and even though the researcher never exploited anything. Passive reconnaissance ends the moment you interact with any resource the target controls.
The rule: Dorking itself is passive. Visiting, downloading, or interacting with anything you find through dorking is active — and requires the same authorisation as any other active technique in your engagement. Document what you find through search queries, then raise it with the client before touching anything directly.
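One way to keep that separation visible is to record each finding with an explicit flag for whether the resource has been touched. The following is a minimal sketch under that assumption; the `DorkFinding` class and its field names are hypothetical, not a standard reporting format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DorkFinding:
    query: str        # the exact search query that surfaced the result
    result_url: str   # URL as shown in the search results, not visited
    severity: str     # the tester's own rating, e.g. "critical"
    note: str = ""
    visited: bool = False  # stays False until the client authorises access
    found_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


finding = DorkFinding(
    query="filetype:sql site:targetfirm.com",
    result_url="staging.targetfirm.com/uploads/clients_backup_2023.sql",
    severity="critical",
    note="Indexed database backup; escalate by phone, do not download.",
)

# Serialise the record so it can be attached to the engagement notes
print(json.dumps(asdict(finding), indent=2))
```

The `visited` field defaults to `False`, so a record only shows interaction with the target when someone sets it deliberately.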
Teacher's Note: The clients_backup_2023.sql scenario above is not hypothetical — database backups left in public directories are a finding that comes up more often than it should. The moment you find something like that, the engagement timeline changes. Your job is to protect the client's data, not just document it for a report that arrives two weeks later.