OSINT: What Attackers Know Before They Strike

Every engagement I run starts the same way. Before I touch a client's network — before I run a single scan, before I send a single packet — I spend time doing exactly what a real attacker would do first: I look. I use publicly available tools and databases to build a picture of the target that is, in almost every case, far more detailed than the business owner ever imagined possible.

This phase is called passive reconnaissance, or OSINT — Open Source Intelligence. "Passive" means it leaves no fingerprint. No firewall log will show it. No IDS will alert on it. You can be fully, comprehensively profiled from the internet without your systems ever knowing it happened. The attacker then uses everything they've found to craft a targeted, personalised attack — a spear phishing email that references real people, real systems, real job titles — that is dramatically more convincing than a generic scam.

This article walks through every major OSINT source a real attacker uses. For each one I'll show you what they find and, more importantly, what you can do to shrink your exposure.

91% of cyberattacks begin with a phishing or spear phishing email

Zero network traces left by passive OSINT: it is completely silent

~2 hrs to build a full company profile using free tools

Zero cost to an attacker: every tool listed here has a free tier

🎯

HOW TO READ THIS ARTICLE

Each OSINT source section shows you what the attacker learns and rates the impact. A later section synthesises everything into a full spear phishing build, so you can see how these fragments combine into something genuinely dangerous. The final sections cover how to fight back, with one concrete defensive action per exposure.

Category 01 — Your Internet-Facing Infrastructure

The first thing an attacker does is answer one question: what is this business actually running, and where is it exposed? They don't need to touch your systems to answer this. Multiple search engines have already indexed everything visible on the internet — they just need to query them.

Shodan

INTERNET DEVICE SEARCH ENGINE

Often called "the search engine for hackers." Shodan continuously crawls the internet and catalogues every internet-connected device — servers, routers, cameras, printers, SCADA systems. For your domain it will reveal open ports, running services, software banners, SSL certificate details, and in many cases the exact software version. A search for your IP range returns a complete picture of what you're exposing. Shodan also flags known vulnerabilities against what it finds — so if your Fortinet or Citrix is on an unpatched version, it will say so.

ATTACKER IMPACT: CRITICAL

Censys

ATTACK SURFACE INTELLIGENCE

Similar to Shodan but with deeper TLS/certificate analysis and stronger organisational correlation. Censys maps your entire internet-facing attack surface — every IP, every exposed service, every certificate. It can enumerate all hosts associated with your organisation even if you don't know those hosts exist. Particularly dangerous for finding forgotten subdomains or staging servers that nobody is monitoring.

ATTACKER IMPACT: CRITICAL

FOFA / ZoomEye

CHINESE THREAT-ACTOR FAVOURITES

Chinese equivalents of Shodan with massive crawl databases — FOFA indexes over 3.5 billion assets globally. These are actively used by threat actors operating out of East Asia and increasingly by RaaS affiliates who have purchased access. Your business is indexed here too. They excel at finding industrial control systems and healthcare infrastructure that Western crawlers sometimes miss.

ATTACKER IMPACT: HIGH

Nmap / Masscan (Active)

ACTIVE PORT SCANNING

Once the passive picture is built, an attacker may confirm it with a lightweight active scan. Masscan can scan your entire internet-facing IP range in under a minute. Nmap then interrogates specific ports to confirm service versions. Unlike the passive tools above, this does leave traces in your firewall logs — but most small businesses don't review those logs, so it rarely matters.

ATTACKER IMPACT: CRITICAL
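If you do want to review those firewall logs, the signature of a port sweep is simple: one source address touching many different ports in a short time. The sketch below shows the idea. The `(source_ip, destination_port)` tuple shape and the threshold of 20 ports are my assumptions for illustration, so the parsing step would need adapting to whatever your firewall actually writes.

```python
from collections import defaultdict


def find_port_scanners(events, min_ports=20):
    """Return {source_ip: distinct_port_count} for sources that probed many ports.

    `events` is an iterable of (source_ip, destination_port) tuples. That shape
    is hypothetical: parse your own firewall log format into it first.
    """
    ports_by_source = defaultdict(set)
    for source_ip, dest_port in events:
        ports_by_source[source_ip].add(dest_port)
    return {
        ip: len(ports)
        for ip, ports in ports_by_source.items()
        if len(ports) >= min_ports
    }


if __name__ == "__main__":
    # Synthetic data: one source sweeps 100 ports, another only uses 443.
    sample = [("203.0.113.7", port) for port in range(1, 101)]
    sample += [("198.51.100.4", 443)] * 50
    print(find_port_scanners(sample))
```

Counting distinct ports rather than raw connections keeps a busy but legitimate client (many hits on one port) from being flagged.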

SHODAN CLI — SAMPLE OUTPUT FOR greyhatdemo.co.za

$ shodan host 196.22.xxx.xxx
IP:         196.22.xxx.xxx
Hostnames:  mail.greyhatdemo.co.za, vpn.greyhatdemo.co.za
Country:    South Africa
Org:        Internet Solutions (Pty) Ltd

Open Ports:
  25/tcp   SMTP  Microsoft Exchange smtpd
  443/tcp  HTTPS Fortinet FortiGate SSL-VPN 7.4.1  ← CVE-2024-21762 UNPATCHED
  3389/tcp RDP   Windows Server 2019 ← EXPOSED TO INTERNET
  8443/tcp HTTPS Synology DiskStation DSM 7.1

Vulnerabilities:
  CVE-2024-21762  FortiOS RCE, CVSS 9.8, public PoC available
  CVE-2024-3400   PAN-OS command injection, CVSS 10.0

🔴

THE ATTACKER DIDN'T TOUCH YOUR NETWORK TO LEARN ALL OF THIS

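Every piece of data in that Shodan output was collected by Shodan's crawlers, not by the attacker. The crawlers did connect to your systems, but their probes look like the routine internet-wide scanning every public IP receives, and nothing ties them to the person who later runs the search. The attacker simply queried a search engine. Your firewall never logged that query. Your SIEM never saw it. You had no idea it happened. And now the attacker knows your unpatched FortiGate has a public PoC exploit with a CVSS score of 9.8.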

Category 02 — DNS, Subdomains & Certificate Transparency

Your DNS records are public by design — they have to be for email delivery and website routing to work. But most businesses don't realise how much infrastructure topology is inadvertently disclosed through DNS, or that every SSL certificate ever issued for their domain is logged in a publicly queryable database.

crt.sh

CERTIFICATE TRANSPARENCY LOGS

Every SSL/TLS certificate issued by a trusted Certificate Authority is logged in public Certificate Transparency logs — a security measure designed to prevent certificate fraud. crt.sh queries these logs. A search for %.yourdomain.co.za returns every certificate ever issued for your domain — including internal staging servers, development environments, VPN portals, and admin panels that were never intended to be public-facing. This is one of the most reliable ways to enumerate hidden subdomains.

ATTACKER IMPACT: CRITICAL

Amass / Subfinder

SUBDOMAIN ENUMERATION

Automated tools that combine dozens of sources (certificate logs, search engine results, VirusTotal, SecurityTrails, and more, plus optional DNS brute forcing, which is active rather than passive) to build a comprehensive map of all subdomains associated with a domain. A typical run against a mid-sized South African professional firm returns 15–40 subdomains, many of which the IT department has forgotten exist. Forgotten systems are rarely patched.

ATTACKER IMPACT: HIGH

MXToolbox / DNSdumpster

MX, SPF, DKIM & DMARC ANALYSIS

MXToolbox reveals your mail server software, your MX record priority, and whether you have SPF, DKIM, and DMARC configured. Missing or misconfigured DMARC is extremely common and means an attacker can send spoofed email that appears to come from your own domain — a critical enabler for spear phishing. DNSdumpster maps your full DNS topology visually, including TXT records that sometimes inadvertently disclose internal infrastructure details.

ATTACKER IMPACT: CRITICAL

SecurityTrails / ViewDNS

HISTORICAL DNS RECORDS

These tools reveal historical DNS records — what your domain pointed to in the past. This matters because businesses sometimes migrate hosting and forget about old subdomains that still resolve to legacy infrastructure. They also reveal your real hosting provider's IP even if you're behind Cloudflare — a common "hiding behind a WAF" mistake that exposes your origin server to direct attack.

ATTACKER IMPACT: HIGH

📋

THE DMARC PROBLEM IS BIGGER THAN YOU THINK

In South Africa, fewer than 30% of .co.za domains have a properly enforced DMARC policy (p=reject or p=quarantine). The rest are either missing DMARC entirely or have it set to p=none, which is monitoring-only and does nothing to prevent spoofing. DMARC is published as a TXT record, so an attacker who checks your DNS records and finds no DMARC enforcement can send email from [email protected] that passes most mail client authenticity checks.
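To make the difference between a published policy and an enforced one concrete, here is a minimal sketch that classifies a DMARC record string. It assumes you have already fetched the TXT record at `_dmarc.yourdomain.co.za` (for example with `dig +short TXT _dmarc.yourdomain.co.za`). Treating a `pct` value below 100 as "not enforced" is my own conservative choice, not part of the article's definition.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a lowercase tag -> value dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_is_enforced(record):
    """True only when the policy is quarantine or reject and applies to all mail."""
    tags = parse_dmarc(record)
    if tags.get("v", "").upper() != "DMARC1":
        return False  # missing or malformed record
    policy = tags.get("p", "none").lower()
    try:
        pct = int(tags.get("pct", "100"))
    except ValueError:
        return False  # unparseable pct: assume the worst
    return policy in {"quarantine", "reject"} and pct == 100


if __name__ == "__main__":
    for sample in ("v=DMARC1; p=none; rua=mailto:reports@example.co.za",
                   "v=DMARC1; p=reject"):
        print(sample, "->", dmarc_is_enforced(sample))
```

An empty string (no record at all) and `p=none` both come back as not enforced, which matches how an attacker reads them.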

Category 03 — People, Email Addresses & Org Structure

Infrastructure enumeration tells the attacker what systems you're running. Personnel enumeration tells them who to target and how to reach them. These are the building blocks of a spear phishing campaign — the attacker needs real names, real job titles, and real email addresses to make their lure convincing.

LinkedIn + Google Dorking

ORG CHART & PERSONNEL RECON

LinkedIn is the attacker's org chart. A search for your company name surfaces every staff member who has listed it as their employer: names, job titles, seniority, how long they've been there, previous employers. Combined with Google Dorks (site:linkedin.com "YourCompany" "Director"), an attacker can build your complete hierarchy: who approves payments, who manages IT, who is new (and therefore less likely to question an unusual request). Much of this is visible through search engine results without logging in to LinkedIn, and none of it leaves a trace on your systems.

ATTACKER IMPACT: CRITICAL

theHarvester

EMAIL & SUBDOMAIN HARVESTING

theHarvester is an open-source tool pre-installed in Kali Linux that aggregates email addresses, subdomains, hosts, and employee names from dozens of public sources simultaneously — Google, Bing, LinkedIn, Twitter/X, VirusTotal, Hunter.io, and more. A single command against your domain typically returns 10–30 verified staff email addresses within minutes. Every email address recovered is a potential phishing target and, if leaked in a breach, a credential stuffing vector.

ATTACKER IMPACT: CRITICAL

Hunter.io / Snov.io

EMAIL PATTERN DISCOVERY

These tools identify your email address format — whether your business uses firstname@, f.lastname@, firstname.lastname@, or another pattern — by analysing known email addresses already indexed from your domain. Once the pattern is known, the attacker can derive valid email addresses for any staff member they identify on LinkedIn without ever needing to confirm them. Hunter.io also provides a confidence score and sources for each address.

ATTACKER IMPACT: HIGH
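To see how little work the derivation step takes, the sketch below expands a name into the three address formats mentioned above. The pattern labels, the example name, and the example.co.za domain are placeholders I have made up. Running it against your own staff list is a quick way to learn which addresses are guessable.

```python
# Templates for the three formats named in the text; the keys are just labels.
PATTERNS = {
    "firstname": "{first}",
    "f.lastname": "{f}.{last}",
    "firstname.lastname": "{first}.{last}",
}


def candidate_addresses(first, last, domain, patterns=PATTERNS):
    """Return {pattern_label: address} for one person at one domain."""
    first = first.strip().lower()
    last = last.strip().lower()
    fields = {"first": first, "last": last, "f": first[:1]}
    return {
        label: f"{template.format(**fields)}@{domain}"
        for label, template in patterns.items()
    }


if __name__ == "__main__":
    # Hypothetical person and domain, for illustration only.
    for label, address in candidate_addresses("Thandi", "Nkosi", "example.co.za").items():
        print(f"{label:20} {address}")
```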

Maltego

LINK ANALYSIS & GRAPH MAPPING

Maltego is a visual intelligence platform that connects the dots across all these data sources — domains, IPs, email addresses, social profiles, phone numbers, company registrations — into an interactive relationship graph. It shows who knows whom, which email addresses are connected to which domains, how your company relates to other entities. Used by professional threat actors and red teamers alike, it transforms raw OSINT fragments into an actionable attack map.

ATTACKER IMPACT: HIGH

theHarvester — SAMPLE RUN AGAINST demo-practice.co.za

$ theHarvester -d demo-practice.co.za -b all

[ Emails Found ]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected] ← IT contact = high value target

[ Hosts Found ]
mail.demo-practice.co.za 196.22.x.x
vpn.demo-practice.co.za 196.22.x.x
backup.demo-practice.co.za 196.22.x.x ← forgotten, unmonitored

[ LinkedIn Personnel ]
Dr David S. Principal · 8 years · posted last week
Sarah M. Practice Manager · 3 years
Kyle T. IT Administrator · 4 months ← NEW — higher social engineering risk

Category 04 — Breached Credentials & the Dark Web

Data breaches happen constantly. Millions of credential pairs — email addresses and their corresponding passwords — are stolen from services your staff use every day and end up on dark web marketplaces and freely-shared breach databases. An attacker doesn't need to hack you directly if your staff member used the same password on LinkedIn that they use on your VPN.

HaveIBeenPwned (HIBP)

BREACH NOTIFICATION DATABASE

Troy Hunt's HIBP database contains over 13 billion compromised accounts from 700+ documented breaches. An attacker queries it with your staff email addresses to find which ones appear in known breach datasets, and which breaches they appear in. Knowing someone was in the 2021 LinkedIn breach, the RockYou2021 compilation, and the 2023 23andMe breach tells the attacker a lot about likely reuse patterns and which password lists to try first.

ATTACKER IMPACT: CRITICAL

DeHashed / SnusBase

PLAINTEXT CREDENTIAL SEARCH

Unlike HIBP which only confirms presence in a breach, DeHashed and SnusBase return the actual leaked data — email address, username, plaintext or hashed password, name, phone number, physical address — for a small subscription fee. For a threat actor, finding that your practice manager's email appeared in a breach with a recoverable password hash is the difference between spending weeks on brute-force and spending five minutes on a credential-stuffing script.

ATTACKER IMPACT: CRITICAL

Dark Web Marketplaces

STEALER LOG MARKETS

Infostealer malware — deployed via phishing or malvertising — captures credentials, cookies, and session tokens from infected machines and sells them in bulk on dark web markets like Russian Market and 2easy. These "logs" are searchable by domain. An attacker can query whether any machine that has ever visited your business domain had active session cookies or stored credentials harvested by a stealer. This is how attackers bypass MFA — they steal the session token after authentication, not the password before it.

ATTACKER IMPACT: CRITICAL

IntelligenceX / Grep.app

DARK WEB & PASTE SEARCH

IntelligenceX indexes Tor sites, data dumps, and Pastebin-style sites. A search for your domain or IP range will surface any mentions — leaked credential lists, attacker reconnaissance notes shared in forums, or dumped data from previous breaches you may not have known about. Grep.app indexes public code repositories and can surface hardcoded credentials or API keys in code referencing your domain.

ATTACKER IMPACT: HIGH

Category 05 — GitHub Leaks, Metadata & Job Posts

Three sources that most businesses have never considered as OSINT risks — but which consistently deliver some of the highest-value intelligence an attacker can find.

GitHub / GitLab / Bitbucket

ACCIDENTAL SECRET EXPOSURE

Developers frequently commit secrets to public code repositories by mistake: API keys, database connection strings, AWS credentials, internal IP addresses, SMTP passwords. Tools like TruffleHog, Gitleaks, and Gitrob automate scanning repositories for high-entropy strings and known credential patterns. Even deleted commits are often recoverable. A single leaked database connection string from your developer's side project might contain the credentials to your production server if password reuse is in play.

ATTACKER IMPACT: CRITICAL
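The two detection ideas mentioned above, high-entropy strings and known credential patterns, fit in a few lines. This is a simplified sketch of the technique, not how any of the named tools is actually implemented. The 20-character minimum token length and the 4.0-bit entropy threshold are arbitrary values I chose for illustration. The `AKIA` prefix followed by 16 uppercase characters or digits is the usual shape of an AWS access key ID.

```python
import math
import re
from collections import Counter

AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")  # long unbroken runs worth scoring


def shannon_entropy(text):
    """Bits of entropy per character; 0.0 for empty or single-symbol strings."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_suspect_strings(source, threshold=4.0):
    """Return (line_number, reason, text) tuples for likely secrets in `source`."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            findings.append((line_no, "aws-access-key-id", match.group()))
        for match in TOKEN.finditer(line):
            if shannon_entropy(match.group()) >= threshold:
                findings.append((line_no, "high-entropy", match.group()))
    return findings


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    print(find_suspect_strings('aws_key = "AKIAIOSFODNN7EXAMPLE"'))
```

Pattern matching catches well-known key formats that entropy alone can miss, while the entropy check catches random-looking tokens with no recognisable prefix. Real scanners also walk the full commit history, which this sketch does not attempt.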

Document Metadata (ExifTool)

HIDDEN DATA IN PUBLISHED FILES

PDFs, Word documents, and images published on your website contain embedded metadata — author name, organisation name, software version, internal file paths, sometimes GPS coordinates from mobile photos. ExifTool extracts all of this in seconds. Knowing your staff use Microsoft Word 2016 tells an attacker which unpatched macro vulnerabilities to target. An internal file path like \\PRACTICE-SERVER\accounts\invoices\ in a PDF's metadata reveals your server name and share structure without ever accessing it.

ATTACKER IMPACT: HIGH

Job Adverts (LinkedIn / Indeed / Careers Pages)

TECH STACK DISCLOSURE

Your IT job advertisements are a free inventory of your internal technology stack. "Must have experience with Sophos XG Firewall, Veeam Backup, and Microsoft Exchange 2019" tells an attacker your exact software list — and they can immediately cross-reference against the CVE database to find your unpatched vulnerabilities. This is a well-documented attacker technique that requires zero technical skill to exploit. Hiring managers who write detailed job specs are unknowingly publishing your internal tech audit.

ATTACKER IMPACT: HIGH

Wayback Machine / web.archive.org

HISTORICAL WEBSITE SNAPSHOTS

The Internet Archive has been crawling and storing copies of web pages since 1996. If your website previously had a login portal, an admin panel, an old e-commerce system, or a development environment, it's likely archived. Old pages sometimes reveal staff names, previous email formats, now-deprecated software, and infrastructure paths that no longer exist — but whose associated accounts or passwords may still be active on your current systems due to credential reuse.

ATTACKER IMPACT: MEDIUM

Category 06 — Google Dorks: The Free Vulnerability Scanner

Google's search operators can be chained together to create highly targeted queries — known as "Google Dorks" — that surface sensitive files, exposed admin panels, login pages, and confidential documents that have been inadvertently indexed. This requires nothing beyond a web browser and takes seconds per query.

GOOGLE DORK EXAMPLES — substitute yourdomain.co.za

# Exposed login portals and admin panels
site:yourdomain.co.za inurl:admin OR inurl:login OR inurl:portal

# Sensitive file types indexed by Google
site:yourdomain.co.za filetype:pdf OR filetype:xlsx OR filetype:docx

# Configuration and environment files
site:yourdomain.co.za filetype:env OR filetype:cfg OR filetype:ini

# Error pages disclosing stack traces / software versions
site:yourdomain.co.za intext:"sql syntax" OR intext:"stack trace"

# Exposed backup files
site:yourdomain.co.za filetype:bak OR filetype:sql OR filetype:zip

# Staff email enumeration via indexed documents
site:yourdomain.co.za intext:"@yourdomain.co.za" filetype:pdf

The GHDB — Google Hacking Database, maintained at exploit-db.com — catalogues thousands of proven Google Dork queries organised by vulnerability category. An attacker searches the GHDB for queries relevant to your software stack and runs them against your domain in minutes. The results have been weaponised in real attacks and documented repeatedly in incident response reports.
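If you want to repeat the six queries above against several domains you own, a small helper saves retyping them. This sketch only builds the query strings and matching search URLs for pasting into a browser. The template labels are my own, and it deliberately sends no requests, since automated querying of a search engine may breach its terms of service.

```python
from urllib.parse import quote_plus

# The six dork queries from this article, with the domain left as a placeholder.
DORK_TEMPLATES = {
    "login_portals": 'site:{domain} inurl:admin OR inurl:login OR inurl:portal',
    "documents": 'site:{domain} filetype:pdf OR filetype:xlsx OR filetype:docx',
    "config_files": 'site:{domain} filetype:env OR filetype:cfg OR filetype:ini',
    "error_pages": 'site:{domain} intext:"sql syntax" OR intext:"stack trace"',
    "backups": 'site:{domain} filetype:bak OR filetype:sql OR filetype:zip',
    "staff_emails": 'site:{domain} intext:"@{domain}" filetype:pdf',
}


def build_dorks(domain):
    """Return {label: query} with the domain substituted into every template."""
    return {
        label: template.format(domain=domain)
        for label, template in DORK_TEMPLATES.items()
    }


def search_url(query):
    """URL-encode a query so it can be opened directly in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    for label, query in build_dorks("yourdomain.co.za").items():
        print(f"{label}: {search_url(query)}")
```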

Putting It Together: How a Spear Phishing Attack Is Built

Here is a realistic composite of how everything above combines. This is drawn from documented attack patterns and our own red team engagements. The target is a fictional three-dentist practice — but the techniques apply to any small South African business.

ATTACKER'S DOSSIER COMPILED IN 97 MINUTES · FULLY PASSIVE

TARGET: Highveld Family Dental (Pty) Ltd · Centurion · 4 staff
PRINCIPALS: Dr Sarah K. (Owner, 12yr) · Dr James P. (Associate, 2yr) · Nompumelelo D. (Practice Mgr, 6mo, NEW)
EMAIL FORMAT: [email protected] · confirmed via Hunter.io (3 hits)
EXPOSED PORTS: 3389 RDP (internet-exposed) · 443 (Fortinet 7.0.9) · 25 SMTP (Exchange 2019)
KNOWN VULNS: CVE-2024-21762 CVSS 9.8 (FortiGate, unpatched) · CVE-2024-49113 (Windows Server LDAP, unpatched)
BREACHED CREDS: [email protected] found in RockYou2021 + LinkedIn 2021 breach · password hash recoverable
DMARC STATUS: p=none · spoofing ENABLED, can send as any @highvelddental.co.za address
JOB POSTING: Indeed post (Nov 2025): "experience with Genie PMS, Carestream CS3600, Windows Server 2019" ← full tech stack disclosed
PLANNED ATTACK: Spear phish Nompumelelo (new, less suspicious) · Send as [email protected] · Subject: "IT system password reset required: action by COB"

That dossier was built entirely with free tools in under two hours. No network access. No hacking. The attack hasn't started yet. The attacker now sends a single email to a new staff member, appearing to come from the owner's own address, referencing the practice management software by name, with a link to a fake login page. The email is convincing because every detail in it is real.

⚠️

THIS IS EXACTLY WHAT WE DO IN A RED TEAM ENGAGEMENT

Before we touch anything in a client's network, we run every source in this article against their domain. The dossier we build is usually more complete than what we've shown here. When we then send a simulated spear phishing email to their staff — crafted with all of this intelligence — the click rate on these targeted, contextual emails is typically 3–5× higher than generic phishing simulations. The intelligence gap is real, and it is routinely exploited.

How to Fight Back: Reducing Your OSINT Attack Surface

You cannot make yourself completely invisible. Some of what OSINT reveals — your MX records, your SSL certificates, your public-facing services — are necessary to run a business. But you can significantly reduce the signal quality an attacker gets, close the most dangerous exposures, and make targeted attacks dramatically harder to construct.

Run a Google Dork Audit on Yourself — Right Now

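Spend 20 minutes running the Google Dorks from Category 06 against your own domain. If Google has indexed sensitive files, an admin panel, or a login portal that shouldn't be public, you'll find it. Take the resource offline or put it behind authentication, then request removal via Google Search Console. A robots.txt disallow rule on its own is not a fix: it does not restrict access, and because the file itself is public it advertises the very paths you list. This takes under an hour and closes something an attacker could find in 30 seconds.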

Enforce DMARC at p=reject — Today

Enforcing DMARC prevents attackers from sending spoofed email as your domain, the most dangerous capability enabled by passive recon. It takes three steps: add an SPF record, configure DKIM signing on your mail server, then add a DMARC TXT record with p=reject. Your IT support should be able to do this in under an hour. Use MXToolbox to verify once live. Without this, your own domain can be weaponised against your staff.

Check Your Staff Emails on HIBP — and Force Password Changes

Go to haveibeenpwned.com and check every staff email address. For every address that appears in a breach, that password must be changed immediately and MFA must be enabled. Assume that if an email appeared in any breach, the password associated with it has been tried against your VPN, your Microsoft 365, and your email platform. Credential stuffing is automated and runs constantly against every known business domain.
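A related check you can automate is whether a specific password already appears in breach corpora. HIBP's Pwned Passwords service uses a k-anonymity scheme: you send only the first five characters of the password's SHA-1 hash to `https://api.pwnedpasswords.com/range/<prefix>` and compare the returned suffixes locally, so the password never leaves your machine. The sketch below covers only the hashing and the response parsing. The network call is left out, and the response body in the demo is synthetic. Email address lookups go through a separate HIBP API that needs an API key and are not shown here.

```python
import hashlib


def split_sha1(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest, split at 5 chars."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(suffix, body):
    """Look up `suffix` in a range response made of SUFFIX:COUNT lines."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0  # suffix absent: not in the corpus


if __name__ == "__main__":
    prefix, suffix = split_sha1("password")
    print("Would request range for prefix:", prefix)
    # Synthetic response body for demonstration; real counts will differ.
    fake_body = f"{'0' * 35}:3\r\n{suffix}:42\r\n"
    print("Occurrences:", count_in_range_response(suffix, fake_body))
```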

Find Your Shodan Exposure Before an Attacker Does

Search your public IP address and domain on Shodan (shodan.io). Everything it returns is visible to every attacker who searches it. Pay particular attention to any exposed RDP, unrecognised open ports, and any flagged CVEs. If RDP appears, take it off the internet immediately — put it behind a VPN. If Shodan flags unpatched software, apply the patch that day. Greyhat4Hire's free Hacker's Dossier tool runs a live Shodan query against your domain in one click — it's the fastest starting point.
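Once you have a Shodan result for your own IP, the triage is mechanical: flag remote-access ports that should never face the internet, then list anything with a CVE attached. The sketch below works on a plain dictionary. The field names (`data`, `port`, `transport`, `product`, `vulns`) are my assumption about the shape of Shodan's host JSON, so check them against a real response before relying on this. The list of risky ports is a starting point of my own, not an official one.

```python
# Ports that rarely have a good reason to be reachable from the internet.
RISKY_PORTS = {23: "Telnet", 445: "SMB", 3389: "RDP", 5900: "VNC"}


def triage_host(host):
    """Return human-readable findings for one host record.

    `host` is assumed to look like {"data": [{"port": ..., "product": ...,
    "vulns": {...}}, ...]}; missing keys are tolerated.
    """
    findings = []
    for banner in host.get("data", []):
        port = banner.get("port")
        transport = banner.get("transport", "tcp")
        product = banner.get("product", "unknown service")
        if port in RISKY_PORTS:
            findings.append(
                f"{port}/{transport} {RISKY_PORTS[port]} exposed to the internet"
            )
        # Iterating works whether vulns is a dict keyed by CVE or a plain list.
        for cve in sorted(banner.get("vulns", [])):
            findings.append(f"{port}/{transport} {product} flagged for {cve}")
    return findings


if __name__ == "__main__":
    # Hand-written sample mirroring the demo output earlier in this article.
    sample = {"data": [
        {"port": 3389, "product": "Remote Desktop Protocol"},
        {"port": 443, "product": "Fortinet FortiGate", "vulns": {"CVE-2024-21762": {}}},
    ]}
    for finding in triage_host(sample):
        print(finding)
```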

Sanitise Document Metadata Before Publishing

Before publishing any PDF, Word document, or image to your website, strip the metadata. In Microsoft Word: File → Info → Check for Issues → Inspect Document → Remove All. For PDFs, use Adobe Acrobat's Sanitize Document function, or the free ExifTool command exiftool -all= document.pdf. Be aware that ExifTool's PDF edits are reversible, because the original metadata stays inside the file until another tool rewrites it, so prefer a full sanitise for sensitive documents. Internal file paths, author names, and software versions disclosed in document metadata are a gift to an attacker building a targeted lure.
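To check what a Word document would give away before you publish it, you can read its core properties directly: a .docx file is a ZIP archive, and the author fields live in `docProps/core.xml`. This standard-library sketch reads only three fields that I picked for illustration and is no substitute for ExifTool's full extraction. The demo builds a tiny stand-in archive in memory instead of using a real document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# Output label -> namespaced tag inside docProps/core.xml.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "title": "dc:title",
}


def docx_core_properties(path_or_file):
    """Return the non-empty core properties found in a .docx file."""
    with zipfile.ZipFile(path_or_file) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read("docProps/core.xml"))
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found


if __name__ == "__main__":
    # Minimal stand-in for a .docx: only the metadata part is present.
    core = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:creator>Practice Manager</dc:creator>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
    buffer.seek(0)
    print(docx_core_properties(buffer))
```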

Audit Your Subdomains via crt.sh

Go to crt.sh and search %.yourdomain.co.za. Every subdomain ever issued an SSL certificate will appear. For each one, check whether it still resolves and whether it's actively maintained. Forgotten staging servers, old client portals, and deprecated admin panels are almost never patched and often run outdated software. Either redirect them, take them offline, or confirm they're secured. Make this a quarterly housekeeping task.
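For the quarterly check, the raw crt.sh results are easier to work with once they are reduced to a unique, sorted list of hostnames. The sketch below assumes you have saved crt.sh's JSON output (the `output=json` query option) and that each entry carries a `name_value` field containing one or more newline-separated names. Both are my understanding of the format rather than a documented guarantee, so inspect a real response first.

```python
def subdomains_from_ct(entries, domain):
    """Return the sorted unique hostnames under `domain` found in CT entries.

    `entries` is the parsed JSON list; each item is expected to have a
    `name_value` string. Wildcard names are reduced to their base name.
    """
    base = domain.lower().rstrip(".")
    suffix = "." + base
    names = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # "*.example.co.za" -> "example.co.za"
            if name == base or name.endswith(suffix):
                names.add(name)
    return sorted(names)


if __name__ == "__main__":
    # Hand-written sample entries; real data would come from json.load().
    sample = [
        {"name_value": "mail.example.co.za\n*.example.co.za"},
        {"name_value": "VPN.example.co.za"},
        {"name_value": "staging.example.co.za"},
    ]
    for host in subdomains_from_ct(sample, "example.co.za"):
        print(host)
```

Comparing this list against the previous quarter's output shows which hostnames are new and which have quietly lingered.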

Audit Your Job Posts — Remove Tech Stack Details

Review your current and archived job adverts. Remove specific software product names, version numbers, and infrastructure details. "Proficient in enterprise backup and virtualisation tools" conveys the same hiring signal as "experience with Veeam Backup & Replication v12 and VMware ESXi 7.0" — without advertising your exact software stack to every threat actor who runs a job board scraper. This is a zero-cost change that closes a real intelligence gap.
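Reviewing adverts by eye gets tedious, so a small script can flag product names before a post goes out. This is a rough sketch. The watchlist holds only products named in this article and would need extending with whatever you actually run, and the version-matching pattern is a simple heuristic of mine.

```python
import re

# Product names taken from the examples in this article; extend for your own stack.
WATCHLIST = ["Sophos XG", "Veeam", "Microsoft Exchange", "VMware ESXi", "Windows Server"]

# Optional trailing version such as " 2019", " 7.0" or " v12".
VERSION_SUFFIX = r"(?:\s+v?\d[\w.]*)?"


def find_stack_disclosures(advert, products=WATCHLIST):
    """Return each product mention (with version, if given) found in the advert."""
    hits = []
    for product in products:
        pattern = re.compile(re.escape(product) + VERSION_SUFFIX, re.IGNORECASE)
        for match in pattern.finditer(advert):
            hits.append(match.group().rstrip("."))
    return hits


if __name__ == "__main__":
    advert = ("Must have experience with Sophos XG Firewall, Veeam Backup, "
              "and Microsoft Exchange 2019")
    print(find_stack_disclosures(advert))
```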

Train New Staff Specifically on Spear Phishing

New employees are the highest-risk target in any OSINT-driven attack. They are less familiar with internal processes, more likely to follow instructions from apparent authority figures without questioning them, and their names are freshly visible on LinkedIn. Run a specific spear phishing awareness session with every new hire within their first two weeks — before an attacker identifies them as a target. Show them a sample of the dossier an attacker could build on the company. The reality check is more effective than any generic security training slide deck.

The Uncomfortable Truth About Open Source Intelligence

Everything in this article — every tool, every technique, every piece of intelligence described — is legal, publicly available, and being used right now by attackers scanning for their next target. No laws were broken to compile that dossier. No systems were accessed without permission. The data was simply found, because your business put it there — in DNS records, job posts, LinkedIn profiles, and certificate logs — as part of normal operations.

The appropriate response is not to become invisible — that's not achievable. It's to understand what you're disclosing and make deliberate choices about it. Close the exposures that provide direct attack paths (RDP, DMARC, unpatched CVEs). Reduce the signal quality of everything else (metadata, job post details, subdomain hygiene). And make sure your staff — especially new staff — understand that a convincing email doesn't mean a legitimate one.

If you want to know exactly what an attacker finds when they look at your business, Greyhat4Hire runs a full OSINT reconnaissance engagement as part of every penetration test — and we can run a standalone OSINT report if you want the picture without the full test. We show you the dossier. Then we help you close it.

🦷

Dr David Sykes

Dentist · Penetration Tester · Founder, Greyhat4Hire · South Africa

Dr Sykes runs a dental practice and a cybersecurity consultancy out of the same brain. His focus is on translating enterprise-grade threat intelligence into actionable security for South African small businesses that the industry has largely ignored.

