The first step of an external penetration test or red team engagement is to gather as much relevant information as possible about the target. This information, often referred to as “Open Source Intelligence” (OSINT), comes from a wide variety of sources and helps an attacker learn important details about an organization. We’ve recently been exploring different data sources and methods for performing OSINT using entirely offline data. Offline OSINT has some interesting advantages (and disadvantages) that we’ll look at here.
"Open-source intelligence (OSINT) is data collected from publicly available sources to be used in an intelligence context. In the intelligence community, the term "open" refers to overt, publicly available sources…"
OSINT activities are often overlooked as less valuable than technical testing. They are extremely valuable! A robust set of target data before beginning an engagement can be the key to success.
We typically gather two distinct types of information during the OSINT phase of an engagement: employee details and internal/external network details. It’s been helpful for us to split out our efforts for gathering this data into two distinct portions of the OSINT process.
The employee portion is all about gathering the most complete picture possible of who works at the target organization. We attempt to get lists of employee names, emails, usernames, phone numbers, and potentially additional information.
We are not always able to find names, email addresses and usernames that match or that originate from the same data source. However, by identifying username or email formats, we can often use name lists to generate the corresponding username and email data we desire. This can be really effective at building a large list of targets but does result in some false positives.
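The name-to-username step described above can be sketched in a few lines of Python. This is only an illustration: the function name `generate_candidates`, the format placeholders (`{first}`, `{last}`, `{f}`, `{l}`), and the sample names are all assumptions made for the example, not part of any particular tool.

```python
def generate_candidates(names, fmt, domain):
    """Apply a username format to each full name; return (username, email) pairs.

    fmt is a hypothetical template using {first}, {last}, {f} (first initial)
    and {l} (last initial), e.g. "{f}{last}" or "{first}.{last}".
    """
    results = []
    for full_name in names:
        parts = full_name.lower().split()
        if len(parts) < 2:
            continue  # need at least a first and a last name
        first, last = parts[0], parts[-1]  # middle names/initials are ignored
        username = fmt.format(first=first, last=last, f=first[0], l=last[0])
        results.append((username, f"{username}@{domain}"))
    return results


# Made-up names; "Cher" is skipped because it has no last name.
names = ["Alice Smith", "Bob K. Jones", "Cher"]
for username, email in generate_candidates(names, "{f}{last}", "target.com"):
    print(username, email)
```

Every generated address is only a guess until it is validated, which is where the false positives mentioned above come from.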
Employee data helps us succeed in a few scenarios. The primary attack scenario most people associate employee data with is phishing campaigns. A large, accurate list of users and emails greatly increases the chances our campaigns will be successful. In addition, usernames identified during this process can be used for password guessing attacks against external portals. This can be key in establishing internal network access.
Internal/External Network Information
The focus of this phase is gathering technical details about the target organization. These include internal and external hostnames and IPs, internal Active Directory domains, and even publicly exposed sensitive data such as secret keys or credentials.
Popular Data Sources
LinkedIn – A popular source for finding specific details about targeted organizations. This can also be used in trying to identify certain technologies in use at an organization.
Data.com – This site is a great source for gathering large amounts of data (name, phone number, position, etc.) on employees. You can search by company, which is ideal.
Technical Forums – These are chock full of sensitive internal data that gets inadvertently posted. Common forums include Microsoft TechNet and Experts Exchange.
Social Media (Facebook, Twitter, Instagram, etc.) – These are good for gathering more personal details on individuals, and can be useful for guessing security questions, etc.
Google – Your general catch-all of the internet. Google is used for anything from harvesting emails to identifying publicly accessible sensitive data and portals.
Tools for Automation
I was going to write a lot more about the various tools out there designed to help automate the OSINT process. Then this post got a little long-winded and out of hand. So, I’ll settle for providing a list of popular OSINT tools that I’ve used to gather data. These are all free and/or open source, so you should be able to download them and check them out without much trouble.
TheHarvester – This tool leverages several search engines to gather general data about the provided organization. Gathered data includes usernames, subdomains, emails, etc.
Metagoofil – Metagoofil leverages Google to pull metadata from common sources such as Word and PDF documents. This can be useful for discovering username structure as well as gathering a number of usernames.
Raven – Raven uses Google to scrape information from LinkedIn. It is also available on GitHub.
Shodan – Shodan is a public search engine that indexes internet-connected hosts and stores information on their open ports and vulnerable services.
DNSdumpster – This tool is great for finding subdomains and other related domains.
Robtex – Website for looking up general information about hostnames, IPs and routes.
FOCA – Gathers metadata from documents found through various search engines, based on the input parameters given to the tool.
When we refer to “offline” OSINT, we mean performing all of our standard OSINT activities without having to touch the target's or any third-party websites. The only thing we need an internet connection for is to download the initial data sets! This strategy provides a number of advantages (and some downsides) over traditional OSINT.
Stealth – While many traditional OSINT methods avoid touching the target directly, offline OSINT methods don't even disclose your target to third-party organizations (think search engines, ISPs, etc.). While this type of stealth may not be relevant during pentests, it could be attractive for longer-term engagements or attackers looking to avoid a legal trail.
Speed – Parsing static data sets can be an incredibly fast way to gather information on a target. For example, as opposed to making tens or hundreds of web searches and API calls, we can simply parse through some local databases to retrieve the same information.
Available Tools – While there are many existing tools for online OSINT, you’ll most likely need to develop some of your own tools and methodology to efficiently perform offline OSINT.
Data sources are the key to success in performing offline OSINT. The more complete and recent the data source, the more likely you are to get accurate data about your target. There are too many possible sources out there to cover, so here are just a few of our favorites we've found so far.
DNS datasets contain the results of DNS queries. One great example is the Forward DNS Database from Rapid7's Project Sonar, which consists of the responses to DNS requests for all domain names known by Project Sonar.
One of the simplest things to query this database for is all of the subdomains of a target domain. This is particularly helpful when coming up with an initial target list for a pentest or red team engagement, and a plain grep over the data is enough to do it.
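A minimal Python sketch of the subdomain lookup follows. It assumes the dataset is a file of JSON records, one per line, each with `name`, `type`, and `value` keys; that layout, the function name `find_subdomains`, and the sample records are assumptions for illustration, so check the format of the file you actually download.

```python
import json


def find_subdomains(lines, domain):
    """Return the sorted, de-duplicated names equal to or under the domain."""
    domain = domain.lower()
    suffix = "." + domain
    found = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the whole scan
        if not isinstance(record, dict):
            continue
        name = str(record.get("name", "")).lower().rstrip(".")
        # Match on the dot boundary so "notexample.com" is not a false hit.
        if name == domain or name.endswith(suffix):
            found.add(name)
    return sorted(found)


# Made-up sample records using documentation-reserved addresses.
sample = [
    '{"timestamp":"1500000000","name":"www.example.com","type":"a","value":"192.0.2.10"}',
    '{"timestamp":"1500000000","name":"mail.example.com","type":"a","value":"192.0.2.25"}',
    '{"timestamp":"1500000000","name":"notexample.com","type":"a","value":"198.51.100.7"}',
]
print(find_subdomains(sample, "example.com"))
# → ['mail.example.com', 'www.example.com']
```

Because the function accepts any iterable of lines, a compressed download can be streamed through it by passing a file object opened with `gzip.open(path, "rt")` instead of decompressing the whole file first.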
We can also search for hosts based on an IP address range, which is again helpful for identifying potential targets prior to an engagement. Apple's /8 range makes a good example search.
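The range search can be sketched with Python's standard `ipaddress` module. As before, the one-JSON-record-per-line layout with `name` and `value` keys is an assumption about the dataset, `hosts_in_range` is a hypothetical helper, and the sample records are invented. Apple's /8 is 17.0.0.0/8.

```python
import ipaddress
import json


def hosts_in_range(lines, cidr):
    """Return (name, address) pairs whose value is an IP inside the CIDR block."""
    network = ipaddress.ip_network(cidr)
    matches = []
    for line in lines:
        try:
            record = json.loads(line)
            address = ipaddress.ip_address(record["value"])
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, a missing key, or a non-IP value such as a CNAME.
            continue
        if address in network:
            matches.append((record.get("name", ""), str(address)))
    return matches


# Made-up sample records.
sample = [
    '{"name":"host1.example.com","type":"a","value":"17.0.0.1"}',
    '{"name":"host2.example.com","type":"a","value":"192.0.2.10"}',
    '{"name":"alias.example.com","type":"cname","value":"host1.example.com"}',
]
print(hosts_in_range(sample, "17.0.0.0/8"))
# → [('host1.example.com', '17.0.0.1')]
```

Parsing the addresses properly, rather than matching on the text prefix `17.`, also makes the same function work for ranges that do not fall on an octet boundary.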
If you're interested in other examples using this particular dataset, we've had a couple recent blog posts use it as a primary source: Hijacking 8000+ CloudFront Domains, How I Identified 93k Domain-Frontable CloudFront Domains.
Whois databases contain the ownership records for IP addresses. There are five Regional Internet Registries that maintain these databases for their respective regions. It should be possible to download the database from any of these organizations, though some may require you to jump through more hoops than others... For example, here is ARIN's (US & Canada) procedure: https://www.arin.net/resources/request/bulkwhois.html. This database contains every record that is searchable through the online ARIN API.
RIPE's (Europe) Whois database is easily downloaded here: ripe.db.gz. We can search by any field we want; for example, searching for an @apple.com email address identifies some of the associated IP ranges.
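A rough Python sketch of that kind of field search is below. It assumes the dump is plain text made of `attribute: value` lines grouped into objects separated by blank lines, with the object type given by its first attribute; which attributes (including email addresses) are actually present depends on the dump. The helper names and the sample data are made up for the example.

```python
def iter_objects(lines):
    """Yield each Whois object as a list of (attribute, value) pairs."""
    current = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current object.
            if current:
                yield current
                current = []
            continue
        if line.startswith(("#", "%")):
            continue  # comment lines
        if line[0] in " \t+" and current:
            # Continuation line: append to the previous attribute's value.
            key, value = current[-1]
            current[-1] = (key, (value + " " + line[1:].strip()).strip())
            continue
        key, sep, value = line.partition(":")
        if sep:
            current.append((key.strip().lower(), value.strip()))
    if current:
        yield current


def ranges_mentioning(lines, needle):
    """Return inetnum ranges whose object contains needle in any value."""
    needle = needle.lower()
    hits = []
    for obj in iter_objects(lines):
        if obj[0][0] != "inetnum":
            continue  # only IPv4 range objects in this sketch
        if any(needle in value.lower() for _, value in obj):
            hits.append(obj[0][1])
    return hits


# Made-up sample objects using documentation-reserved ranges.
sample = """\
inetnum:        192.0.2.0 - 192.0.2.255
netname:        EXAMPLE-NET
notify:         netops@example.com

inetnum:        198.51.100.0 - 198.51.100.255
netname:        OTHER-NET
notify:         admin@other.example
""".splitlines()

print(ranges_mentioning(sample, "@example.com"))
# → ['192.0.2.0 - 192.0.2.255']
```

The same function handles a company-name search by passing the name as `needle`, since it matches against every attribute value in the object.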
Apple isn't exactly the most unique company name to search for, so an email search is very useful. For an organization with a more distinctive name, we can simply search on the name itself.
It's also worth mentioning Internet Routing Registry (IRR) databases. These are often easier to find and download (and smaller files). Here is the one for ARIN: arin.db. There is a list of many more here: http://www.irr.net/docs/list.html. While the IRRs don't contain nearly as much info as the full Whois database, there can still be useful information that is quickly parseable.
These are the data dumps that result from website breaches. For most of the high profile data breaches you've heard about, there is a data dump somewhere that you can actually download and search. These are excellent sources for finding lots of valid email addresses in preparation for a spearphish or password spray attack.
There are also passwords (or password hashes) associated with many of the email addresses in these dumps. We are usually targeting corporate portals with stronger password policies than the breached website (and forced password rotation), so most users won't have reused the same password. But it happens: the 2012 Dropbox breach was caused by an employee's reused password that had been exposed in an earlier LinkedIn breach.
As an example, the combined dump Exploit.in can be searched to harvest target.com email addresses.
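A minimal sketch of harvesting addresses for one domain from a dump could look like the following. It assumes one `email:password` pair per line, which is a common layout for combined lists but not guaranteed for any given dump; the function name and sample lines are invented, and the sketch deliberately discards the password field.

```python
def harvest_emails(lines, domain):
    """Return the sorted, de-duplicated email addresses at the given domain."""
    suffix = "@" + domain.lower()
    emails = set()
    for line in lines:
        # Split on the first colon only; the password may contain colons too.
        email, sep, _password = line.strip().partition(":")
        email = email.lower()
        if sep and email.endswith(suffix):
            emails.add(email)
    return sorted(emails)


# Made-up sample lines with placeholder passwords.
sample = [
    "alice@target.com:REDACTED",
    "Bob@Target.com:REDACTED",
    "carol@elsewhere.example:REDACTED",
    "alice@target.com:REDACTED2",
    "garbage line without separator",
]
print(harvest_emails(sample, "target.com"))
# → ['alice@target.com', 'bob@target.com']
```

Lowercasing and de-duplicating matters here because combined dumps often repeat the same address with different casing or different passwords.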
While we are going to avoid linking to sensitive information directly, we can assure you these dumps are easy to find and download.
We have only touched the tip of the iceberg of available data and its uses. There are so many more databases, dumps, and other sources of information available. There is also a lot of room to come up with better tools and methods for searching and using all of this data. While what we are calling "Offline OSINT" can by no means replace the more established methods, we encourage you to explore some of these data sources and incorporate them into your engagements if you find them useful.