description: >- The OSINT Knowledge Base (OKB) is a collection of OSINT tools, products, and data sources. Its goal is to provide curated content to help people along their OSINT journeys. cover: https://w.wallhaven.cc/full/xl/wallhaven-xl1ydl.jpg coverY: 0
The OKB
The Talk
{% content-ref url="broken-reference" %} Broken link {% endcontent-ref %}
Other Great OSINT Resources
{% embed url="https://metaosint.github.io/" %} TONS of links. The interactive graph is cool. {% endembed %}
{% embed url="https://inteltechniques.com/tools/index.html" %} Michael Bazzell's IntelTechniques Site is Excellent {% endembed %}
{% embed url="https://sector035.nl/links" %} Links Broken Down By Category {% endembed %}
{% embed url="https://osintframework.com" %} A fun graphical tool with links for a bunch of categories {% endembed %}
description: Evaluating OSINT Data sources and Tooling - Build vs Buy cover: >- https://images.unsplash.com/photo-1565551223391-be988013ee6d?crop=entropy&cs=tinysrgb&fm=jpg&ixid=MnwxOTcwMjR8MHwxfHNlYXJjaHwzfHxicm9rZW58ZW58MHx8fHwxNjY1NjM2Mzc5&ixlib=rb-1.2.1&q=80 coverY: 0
<overly complex title>
"An OSINT Talk"
by: Corey Ham
description: Welcome!
Introduction
What is this talk?
This is a talk where I introduce a resource to help you make decisions about what tools, sites, or techniques to use next time you use OSINT.
I'll also talk about my approach to OSINT, and provide some generalized recommendations based on my experience.
Who am I?
- Corey Ham | cham423
- A tester at BHIS
- mostly red team and adversary sim experience
- client-facing tester since 2013
- previously Optiv, Hurricane Labs
- OSINT Enjoyer
- Have used OSINT during offensive engagements over the years out of necessity
- Also do OSINT for "fun" (data hoarding)
Disclaimers
- this talk contains opinions, biases, assumptions, and YMMV
- there are infinite OSINT use cases, and I can't cover them all
The Idea
OSINT is easy! But there's so much out there...
- nearly infinite amount of data available
- difficult to judge quality of data
- lots of tasks require human brains, manual effort
Existing OSINT resources lack curation
- huge lists of links (guilty)
- difficult to see return data types/capabilities ahead of time
- for beginners, difficult to map tools to specific workflows
- example: business email enumeration
Time wasters:
- For free products, getting tools running can take time
- keys, deployment, proxying, etc
- Hosted tools often have limitations
- For paid products, cost/benefit
We also have to decide between building it ourselves, using existing tools, or just paying for a commercial product.
The Solution (hopefully)
- operators build their list of "go-to" tools over time based on experience
- why not share this experience?
This site!! Have you noticed my slides are on GitBook? It's public.
- Please contribute (more info on that later)
Focus on metrics, fallback to opinions where metrics don't make sense
- for example, can compare certain tools by number of results and accuracy of results
- cannot review training this way
Addressing Build vs. Buy
It was in the title of the talk...
Build Everything: $1500 Sandwich in 6 Months
The YouTube channel “How to Make Everything” demonstrated how taking the “build” mentality to the absolute extreme can lead to ridiculous outcomes. The host's process involved harvesting and milling his own wheat to make the bread, killing his own chicken, and planting a garden to grow ingredients. In the end, the sandwich was “not bad”, cost $1500, and consumed 6 months of his life.

Buy Everything: budget error 406 not acceptable
- lots of overlapping data
- data you have no use for
- you might get fired
- purchased tools still need to be integrated
Balanced Approach
- some "assembly" is always required
- integrating paid tools
- deploying free tools
- automation component
- Mix build and buy to meet your OSINT goals
Easy Examples:
- Buy
- internet-sized attack surface data -- building this would involve scanning billions of hosts, cost a lot in hosting, and require lots of development to enrich the data and make it usable. Shodan, by comparison, is quite affordable
- Build
- profile enumeration by username (demo) -- there are tons of ways to do this, and it's relatively straightforward
- anything that doesn't exist already -- you have no other options...
Site Walkthrough
Reviewed Content
Workflows
- a specific task, along with inputs and outputs
Workflow Example
{% content-ref url="../reviewed-content/workflows/profile-discovery-by-username.md" %} profile-discovery-by-username.md {% endcontent-ref %}
Categories
- generic OSINT sources/tasks that do not have specific inputs and outputs
Example
{% content-ref url="../reviewed-content/categories/public-breach-databases.md" %} public-breach-databases.md {% endcontent-ref %}
General Knowledge
- basically just huge lists of links (like I complained about at the start of the talk)
Contribute!
New Workflow?
- consider your testing procedure, make sure it is repeatable
Have you used any of this before?
- add your experience/opinion to their descriptions
Have something new?
- add it, along with any supporting information you have
{% content-ref url="../about/how-to-contribute.md" %} how-to-contribute.md {% endcontent-ref %}
Some General OSINT Thoughts
Consider OPSEC before you start
- your intelligence gathering is part of someone else's intelligence
- use a clean image if creating admissible evidence
- Michael Bazzell has an excellent guide for this in his OSINT Techniques book
Automate anything you can
- build a simple API wrapper script
- consider storing it in jupyter
- at the very least, build a checklist/standard procedure to follow
Always keep your output
- I prefer JSON/structured output formats, but unstructured output is still worth keeping since you can parse it later
- Consider storing everything in elastic/ELK
- Newer versions of ES are really good at auto-mapping
curl -XPOST http://elk:9200/osint_dump_2022-09-15/_doc -H "Content-Type: application/json" -d @amass_out.json
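If a tool writes one JSON object per line, the single-document request above can be swapped for the Elasticsearch `_bulk` endpoint. This is a minimal Python sketch under that assumption. The index name and the `elk:9200` host come from the example above; the function names and sample records are made up for illustration.

```python
import json
import urllib.request


def build_bulk_body(json_lines, index):
    """Turn an iterable of JSON strings (one object per line) into a _bulk request body."""
    action = json.dumps({"index": {"_index": index}})
    parts = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)  # fail early on malformed records
        parts.append(action)
        parts.append(json.dumps(doc))
    # the _bulk API expects a newline after the final line
    return "\n".join(parts) + "\n" if parts else ""


def post_bulk(body, base_url="http://elk:9200"):
    """Send the body to Elasticsearch (assumed host; not called in this sketch)."""
    req = urllib.request.Request(
        base_url + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# made-up sample records standing in for real tool output
sample = ['{"name": "a.example.com"}', "", '{"name": "b.example.com"}']
body = build_bulk_body(sample, "osint_dump_2022-09-15")
print(body, end="")
```

Building the body separately from sending it makes it easy to inspect what would be indexed before anything touches the cluster.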
Wrap-up / Questions
- i really hope people use it :)
- more content is coming!
- any questions?
Workflows
description: 'inputs: username | outputs: list of profiles associated with that username'
Profile Discovery by Username
Testing Methodology
- check the same username (e.g. cham423) using each tool
- count the number of profiles discovered
- count number of false positives (nonexistent profiles, not mismatches)
- rate difficulty of deployment
- rate quality of output
- rate development health
- rate automation potential
{% hint style="info" %} see the template at the bottom of the page if you are adding a new tool. {% endhint %}
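The counting steps in this methodology can be sketched in a few lines of Python. The totals and false positive counts are the ones reported for Sherlock, WhatsMyName, and CheckUsernames further down this page. The `ToolResult` class is a hypothetical helper, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    name: str
    total: int            # profiles reported by the tool
    false_positives: int  # reported profiles that do not exist

    @property
    def true_hits(self) -> int:
        return self.total - self.false_positives

    @property
    def fp_rate(self) -> float:
        return self.false_positives / self.total if self.total else 0.0


results = [
    ToolResult("Sherlock", 38, 0),
    ToolResult("WhatsMyName", 114, 75),
    ToolResult("CheckUsernames", 42, 27),
]

# lowest false positive rate first
for r in sorted(results, key=lambda r: r.fp_rate):
    print(f"{r.name}: {r.true_hits} real profiles, {r.fp_rate:.0%} false positives")
```

Comparing real hits instead of raw totals matters here: a tool with a big result count can still surface fewer real profiles once false positives are subtracted.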
Build Pick
Sherlock
{% embed url="https://github.com/sherlock-project/sherlock" %}
Example Results
python3 sherlock.py cham423 -o cham423.out
- 38 total results
- 0 false positives!
- 7 results were not the target, but were valid accounts
Deployment
- python (medium difficulty)
- has a docker container as a fallback
Output
- basically stdout, no JSON output option -- a list of discovered profile urls
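Since the output is a plain list of URLs, a small script can turn it into JSON records for storage. This is a sketch, assuming each discovered profile is an `http(s)` URL on its own line and that every other line can be ignored. The sample text and the `urls_to_records` name are made up.

```python
import json
from urllib.parse import urlparse


def urls_to_records(text, username, tool="sherlock"):
    """Convert a text file of profile URLs into a list of JSON-ready dicts."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(("http://", "https://")):
            continue  # ignore banners, summaries, and blank lines
        records.append(
            {
                "tool": tool,
                "username": username,
                "site": urlparse(line).netloc,
                "url": line,
            }
        )
    return records


# made-up sample; real output will contain real profile URLs
sample_output = """https://example.com/cham423
https://social.example.org/u/cham423
some summary line that is not a URL
"""

for record in urls_to_records(sample_output, "cham423"):
    print(json.dumps(record))
```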
Development Health
- very healthy, recent commits, large number of contributors, long history
Automation Potential
- very high, command line tool that supports input lists
- <todo> not tested at scale. from a single host, would likely encounter rate limiting when enumerating large numbers of profiles
Other Tools
WhatsMyName
{% embed url="https://github.com/WebBreacher/WhatsMyName" %}
Example Results
python3 whats_my_name.py -u cham423 -o cham423.out
- 114 total results
- 75 false positives! (about 66% false positive rate)
Deployment
- python (medium difficulty)
Output
- file output was broken at time of testing (2022-09-18)
- stdout is printed in table form with lots of extra spaces/characters, and some profile links are cut off
Development Health
- OK, some recent contributions
Automation Potential
- medium -- many other projects use this project's JSON list of profile URLs, which can be imported to check usernames automatically. However, the false positive rate is alarming
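Importing a site list like this might look roughly like the Python sketch below. The field names (`uri_check`, `e_code`, `e_string`, `m_code`, `m_string`) are my assumption about the list's schema and should be checked against the current file; the single entry is made up. No requests are sent: `classify` only judges a response that some other code has already fetched.

```python
import json

# Made-up entry in the style of a username site list; field names are an assumption.
SITES_JSON = """
{"sites": [
  {"name": "ExampleSite",
   "uri_check": "https://example.com/users/{account}",
   "e_code": 200, "e_string": "Member since",
   "m_code": 404, "m_string": "User not found"}
]}
"""


def candidate_urls(sites, username):
    """Map each site name to the profile URL to request for this username."""
    return {s["name"]: s["uri_check"].replace("{account}", username) for s in sites}


def classify(site, status, body):
    """Return 'found', 'missing', or 'unknown' for one HTTP response."""
    # require BOTH the expected status and the expected string to count as a hit
    if status == site["e_code"] and site["e_string"] in body:
        return "found"
    if status == site["m_code"] or site["m_string"] in body:
        return "missing"
    return "unknown"


sites = json.loads(SITES_JSON)["sites"]
print(candidate_urls(sites, "cham423"))
print(classify(sites[0], 200, "<p>Member since 2013</p>"))
print(classify(sites[0], 200, "<h1>Please log in</h1>"))
```

Treating a bare `200` as "unknown" instead of "found" is one possible way to cut down false positives from sites that return a login or landing page for every username.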
Namechk (web)
{% embed url="https://namechk.com" %}
Example Results
- 25 total results
- 1 false positive
Deployment
- web hosted (easy difficulty)
- 1-5 clicks to full results
Output
- No file output
- Profile links point to signup/about pages, not valid profiles
Development Health
- unknown (third party)
Automation Potential
- very limited: captcha + cloudflare would prevent easy scraping
namecheckr
{% embed url="https://www.namecheckr.com/" %}
Example Results
- 12 total results
- 1 false positive
Output
- no file output
- site links point to root page, not profile link
Development Health
- unknown (third party)
Automation Potential
- no CAPTCHA, so could be scraped
UserSearch.org
{% embed url="https://usersearch.org/" %}
Example Results
- 15 total results
- 1 false positive
Deployment
- web hosted (easy difficulty)
- requires 10+ clicks to see all sites, no single page with all results
Output
- no file output
- profile links available in search results, but not on single page
Development Health
- unknown (third party)
Automation Potential
- no CAPTCHA, could potentially be scraped
NameCheckup
{% embed url="https://namecheckup.com/" %}
Example Results
- 14 total results
- 1 false positive
Deployment
- web hosted (easy difficulty)
- 1-5 clicks to results
Output
- no file output
- results link to specific profiles
Development Health
- unknown (third party)
Automation Potential
- no CAPTCHA, could potentially be scraped
CheckUsernames
{% embed url="https://checkusernames.com/" %}
Example Results
- 42 total results
- 27 false positives
- 79 errors checking target site
Deployment
- web hosted (easy difficulty)
- 1-5 clicks to results
Output
- no file output
- profile links point to the individual profile rather than the root page, which is nice
Development Health
- unknown (third party)
Automation Potential
- no CAPTCHA, could potentially be scraped
Namechk (script)
{% hint style="danger" %} This tool is not currently functional as of 2022-09-18, and returns all false positives due to Cloudflare bot protection {% endhint %}
{% embed url="https://github.com/GONZOsint/Namechk" %}
Example Output
Todo
- https://knowem.com/
- https://instantusername.com/#/
- namevine
- socialsearcher
Template
<link to tool>
Example Results
- n total results
- n false positives
Deployment
Output
- <describe output options>
Development Health
- <describe project health>
Automation Potential
- <describe automation potential>
description: 'input: domains | output: list of emails for input domains'
Email Enumeration by Domain
Testing Methodology
- check the same domain (e.g. microsoft.com) using each tool
- count the number of results
- rate difficulty of deployment
- rate quality of output
- rate development health
- rate automation potential
Phonebook.cz
- requires IntelX account to use (free signup)
- 39,601 results for microsoft.com
Hunter
{% embed url="https://hunter.io/search" %}
Example Results
- 35,830 results
- also shows email format (e.g. {f}.{last}@domain.com)
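Once the email format is known, candidate addresses can be generated from a list of names. This is a minimal Python sketch, assuming the pattern uses `{f}`, `{l}`, `{first}`, and `{last}` placeholders; only `{f}` and `{last}` appear above, and the rest are my assumption. The names are placeholders.

```python
def render_email(pattern, first, last):
    """Fill a pattern such as '{f}.{last}@domain.com' with a person's name."""
    first, last = first.strip().lower(), last.strip().lower()
    return pattern.format(first=first, last=last, f=first[:1], l=last[:1])


names = [("Jane", "Doe"), ("John", "Smith")]  # placeholder names
for first, last in names:
    print(render_email("{f}.{last}@domain.com", first, last))
```

Generated addresses are only guesses until they are verified against another source.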
Deployment Difficulty
- web hosted (easy difficulty)
- has API
Output
- can export subset of records from web interface
- can export full records using API (if credits available in account)
Pricing
- $50-400/month depending on plan
- free plan available, limited quota per month
Infoga
<todo>
{% embed url="https://github.com/m4ll0k/infoga" %}
- "Infoga is a tool gathering email accounts informations (ip,hostname,country,...) from different public source (search engines, pgp key servers and shodan) and check if emails was leaked using haveibeenpwned.com API"
- Language: python
skymem.info
{% hint style="danger" %} Note: I have not confirmed that this is a legitimate site {% endhint %}
{% embed url="http://www.skymem.info/" %}
Example Results
- 36,631 results
Deployment Difficulty
- web hosted (easy difficulty)
- No API
Pricing
- varies by results size, not transparent
- large lists get into $2000-3000 range!!!
Categories
Public Breach Databases
Build Pick: Custom ES Cluster
- go to https://breached.to/Announcement-Database-Index
- download as many as you can
- parse them and import to ElasticSearch
- highly recommend using Jupyter + Spark to do this (https://hub.docker.com/r/jupyter/all-spark-notebook/)
- elastic connector for spark
- even on modest hardware, can process/import over 2 million records per hour
- Total size will cap out at around 10 billion records
- I am working to release this to the public
Buy Pick: Dehashed
- $15.50/month, or $179 annually
- separate pricing for API credits and web access
- API pricing: $3 for 100 credits
- Contains passwords
- certain sensitive data is not accessible to normal accounts (e.g. SSNs)
- offensive focus, but could be useful for defenders to determine what is exposed/change passwords
Also Worth It: Have I Been Pwned (HIBP)
- worth it, $3.50 per month subscription fee
- highly effective data breach notification service
- passwords accessible in certain circumstances
IntelX
- pricing unclear, but can use https://phonebook.cz for free to gather certain attributes (i.e. email addresses)
Live Host Data
Buy Pick: Shodan
Details
- $69 - $1099 per month depending on number of results per month needed
- 1 million, 20 million, or unlimited results options
- 137,390,036 results for port:443
Usage Tips
- use `net:127.0.0.1/8` for searching for results in a CIDR network
- supports JARM searches with the `ssl.jarm` parameter
- can stack queries, e.g. `ssl.jarm:"2ad2ad16d2ad2ad22c42d42d000000dc2b105e4dda975fa70719c0cae5d0ce" net:203.97.69.74/24`
- supports organization filters, e.g. `org:"Verizon Business"`
- supports SSL cert CN filters, e.g. `ssl.cert.subject.cn:"*.google.com"`
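Stacked filters like the ones above are easy to assemble in code. This Python sketch is a hypothetical helper that only builds the query string; it does not call Shodan. Mapping underscores in keyword names to dots is my own convention, so a filter whose real name contains an underscore would need separate handling.

```python
def shodan_query(**filters):
    """Join filter=value pairs into one space-separated search string."""
    parts = []
    for name, value in filters.items():
        key = name.replace("_", ".")  # ssl_jarm -> ssl.jarm (convention of this sketch)
        value = str(value)
        if " " in value or "*" in value:
            value = f'"{value}"'  # quote values containing spaces or wildcards
        parts.append(f"{key}:{value}")
    return " ".join(parts)


query = shodan_query(org="Verizon Business", net="203.97.69.74/24", port=443)
print(query)
```

The resulting string can be pasted into the web search box or passed to whichever API client is in use.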
BinaryEdge
- free account available, 250 requests per month
- also gathers torrent/DHT data
- $10-500 a month for higher query limits
- 163,691,756 results for port:443
LeakIX
- gray hat scanning platform
- targeted at identifying vulnerable services, but can be used for other things
- has "reporting" capability where vulnerabilities can be reported to their owners
RiskIQ Community
- free account with business email has the following monthly quotas:
  - Web Searches: 350
  - API Searches: 3,500
  - Community (public) Projects: 1,000
  - Team & Analyst (private) Projects: 1
  - Basic Monitors: 5,000
  - Keyword Monitors: 10
  - Monitor Frequency: Weekly
  - Quota Duration: Monthly
  - Last Reset: 2022-10-01 00:00:00
  - Next Reset: 2022-11-01 00:00:00
- recently acquired by Microsoft and integrated into Defender Threat Intelligence, so the future of the community edition is uncertain
Greynoise
- "GreyNoise is a cybersecurity platform that collects and analyzes Internet-wide scan and attack traffic"
- does not scan the internet, but collects data on what is scanning/attacking from the internet
Censys
- 96,166,688 results for port:443
Netlas
Onyphe
https://www.onyphe.io/pricing/
- 59 Euros for search access
- 100-1000 Euro plans for business
- unknown data size
Build Pick: N/A
- most users should not try to scan the whole internet
- if you are the person that needs to build this, hopefully you know what you're doing
description: >- Multipurpose tools gather multiple pieces of information in one go, as opposed to a single type of data
Multipurpose Reconnaissance Tools
Recon-ng
{% embed url="https://github.com/lanmaster53/recon-ng" %}
Maltego
{% embed url="https://www.maltego.com" %}
- long standing general-purpose OSINT tool with graphical interface
- Free community edition available, with data gathering limitations
- $1000/year professional license
TheHarvester
{% embed url="https://github.com/laramies/theHarvester" %}
- performs open source intelligence (OSINT) gathering to help determine a domain's external threat landscape.
- gathers names, emails, IPs, subdomains, and URLs by using multiple public resources
- Language: python
Spiderfoot
{% embed url="https://github.com/smicallef/spiderfoot" %}
Identity Data
Free/Open Source
Commercial
Pipl
- built primarily for identity verification
description: If you're trying to get paid for OSINT, this will be required
Documentation
description: Screenshots or it didn't happen
Screen Capture
Hunchly
{% embed url="https://www.hunch.ly" %}
- Browser extension specifically designed for OSINT investigations
- automatically captures webpages you visit
- free 30 day trial, $130/year
Greenshot
{% embed url="https://getgreenshot.org/" %}
- Free on Windows
- $2 on Mac
Snagit
{% embed url="https://www.techsmith.com/screen-capture.html" %}
- $63 license, perpetual but no updates/new features after 1 year
- Windows/Mac
- Video capture + webcam, scrolling captures, built-in shapes/censoring, much more
Maps
ZeeMaps
{% embed url="https://www.zeemaps.com/" %}
BatchGeo
{% embed url="https://batchgeo.com/" %}
CreePy
{% embed url="https://github.com/ilektrojohn/creepy" %}
- No active development, last update 2014
Threat Intelligence
Free/Open Source
ThreatMiner
{% embed url="https://www.threatminer.org" %}
AlienVault OTX
{% embed url="https://otx.alienvault.com/" %}
BotScout
{% embed url="http://botscout.com/" %}
Blueliv Threat Exchange
{% embed url="https://community.blueliv.com/#!/discover" %}
X-Force Exchange
{% embed url="https://exchange.xforce.ibmcloud.com/" %}
Project Honeypot
{% embed url="https://www.projecthoneypot.org/index.php" %}
HoneyDB
{% embed url="https://honeydb.io/" %}
MISP
{% embed url="https://www.misp-project.org/" %}
Commercial
VirusTotal Intelligence
{% embed url="https://www.virustotal.com/gui/intelligence-overview" %}
- trial available for request
SkopeNow
{% embed url="https://www.skopenow.com" %}
SocialLinks
{% embed url="https://sociallinks.io" %}
NexusXplore
{% embed url="https://www.nexusxplore.com" %}
Recorded Future
{% embed url="https://www.recordedfuture.com" %}
Dataminr
{% embed url="https://www.dataminr.com/" %}
- pricing not publicly advertised - probably expensive
- claims "AI" driven real time intelligence
Echosec
{% embed url="https://www.echosec.net/" %}
- pricing not publicly advertised - probably expensive
- Hosted platform
- API - "delivers direct access to a unique range of fringe social media networks, imageboards, and forums, and allows you to integrate and leverage this data for your own products, systems, and tooling"
Nexvision
{% embed url="https://www.nexvisionlab.com" %}
Code Repositories
Grep.app
- searches 500k+ git repositories
- filter by multiple attributes
Example Results

Searchcode
- searches GitHub, Bitbucket, GitLab, etc
- filter by language/repo only
Example Results
Malware Analysis
Online Sandboxes
sandbox.pikker.ee
https://sandbox.pikker.ee/dashboard/
- cuckoo cluster
- unknown affiliation
Antiscan.me
JoeSandbox
https://www.joesandbox.com/#windows
Hybrid Analysis
https://www.hybrid-analysis.com
- provided by CrowdStrike, shares data with Falcon
- cheaper than VT
Android
Koodous
- mobile application with AV capabilities
- analysis tools
- requires login/install
Blockchain
Dune Analytics
https://dune.com/browse/dashboards
- User-created dashboards
- Extra useful for DeFi
Etherscan
- ethereum focused
- has wallet labels, transaction search, smart contracts
Google BigQuery
https://cloud.google.com/bigquery/
- Supports most major chains including Bitcoin, Ethereum, Zcash, Dogecoin, etc
- SQL syntax
- very useful for graphs
CoinGecko
- FREE API, with no keys
https://www.coingecko.com/en/api
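A keyless price lookup could look like the Python sketch below. The `/api/v3/simple/price` endpoint with its `ids` and `vs_currencies` parameters reflects my understanding of the public API; rate limits and key requirements may have changed, so treat the details as assumptions. Only the URL builder runs here, and `fetch_prices` is never called.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.coingecko.com/api/v3"  # assumed public base URL


def price_url(coin_ids, vs_currency="usd"):
    """Build the request URL for current prices of the given coin ids."""
    query = urllib.parse.urlencode(
        {"ids": ",".join(coin_ids), "vs_currencies": vs_currency}
    )
    return f"{API_BASE}/simple/price?{query}"


def fetch_prices(coin_ids, vs_currency="usd"):
    """Perform the request (network call; not executed in this sketch)."""
    with urllib.request.urlopen(price_url(coin_ids, vs_currency), timeout=10) as resp:
        return json.load(resp)


print(price_url(["bitcoin", "ethereum"]))
```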
Web Scraping
Scraping
description: Paste sites are internet dumpsters, but dumpster diving can be rewarding
Paste Sites
Tools
Pastehunter
{% embed url="https://github.com/kevthehermit/PasteHunter" %}
Site Info
Pastebin (https://pastebin.com)
- 8 character alphanumeric ID
- Provides scraping API with feeds, but requires grandfathered pro account
# example
https://pastebin.com/Vqai8sBA
rentry.co
- 5 character alphanumeric ID
- no feed
# example
https://rentry.co/uvq4v
zerobin (https://zerobin.net)
# example
https://zerobin.net/?98703897a0cdceef#uTVX3KoxQbNrDJE+efmO9sexX3jT6Zt0ed4glZj295U=
justpaste.it
- no recent feed
- no API
# example paste
https://justpaste.it/3spa1
twitlonger.com / tl.gd
- oauth from twitter only
- no recent feed
- api is for getting/updating posts only, auth required
- 9 char id (alphanumeric with symbols)
# example link
https://www.twitlonger.com/show/n_1s1vre6
http://tl.gd/n_1s1vre6
jsfiddle.net
- 8 char id (alphanumeric)
# example
https://jsfiddle.net/k092mfae/
gist.github.com
- requires username
- 32 char id (hex)
https://gist.github.com/cham423/308cba152d57d10d91a1dbd614768024
Textbin (https://textbin.net/)
- recent paste list, HTML only (would have to be scraped/extracted from the homepage)
- list is embedded in the homepage: no API, no dedicated HTML page
# example
https://textbin.net/1q09pepj0k
ideone (ideone.com)
- IDE (sandbox code exec)
- has recent feed HTML (https://ideone.com/recent)
# example
https://ideone.com/Ok8iWL
pastesite (https://pastesite.org/lists)
- very low volume
- has recent feed HTML (https://pastesite.org/lists)
# example
https://pastesite.org/view/19d5c69e
controlC (https://controlc.com)
- popular
- no recent feed
- 8 character hex ID
# example
<https://controlc.com/ff1435ac>
ghostbin
- defunct
sources
https://mediasonar.com/2020/09/09/pastebins-darkwebmarketplaces-osint/
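The ID formats noted for each site can be used to recognize paste links in collected text. This Python sketch uses regular expressions derived only from the notes and example links above, so the patterns are assumptions, not documented formats. The `classify_paste` name is made up.

```python
import re

# Patterns inferred from the ID notes and example links on this page (assumptions).
PASTE_PATTERNS = {
    "pastebin": re.compile(r"^https?://pastebin\.com/([A-Za-z0-9]{8})$"),
    "rentry": re.compile(r"^https?://rentry\.co/([A-Za-z0-9]{5})$"),
    "jsfiddle": re.compile(r"^https?://jsfiddle\.net/([a-z0-9]{8})/?$"),
    "controlc": re.compile(r"^https?://controlc\.com/([0-9a-f]{8})$"),
    "gist": re.compile(r"^https?://gist\.github\.com/[^/]+/([0-9a-f]{32})$"),
}


def classify_paste(url):
    """Return (site, paste_id) if the URL matches a known pattern, else None."""
    url = url.strip()
    for site, pattern in PASTE_PATTERNS.items():
        match = pattern.match(url)
        if match:
            return site, match.group(1)
    return None


for link in ["https://pastebin.com/Vqai8sBA", "https://rentry.co/uvq4v", "https://example.com/x"]:
    print(link, "->", classify_paste(link))
```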
Yalis
https://github.com/EatonChips/yalis
- simple, golang binary
- auto-generates emails based on email format flag
Web Scanning Datasets
PublicWWW
- search HTML source code on websites
URLScan
- can execute live scan of a potentially malicious URL
- can also search previous scan results
- results and scanned URLs are public #opsec
File Hosting Sites
- github.com
- jira.com
- dropbox.com
- sendspace.com
- mediafire.com
- onedrive.com
- teknik.io
- drive.google.com
- box.com
- evernote.com
- gofile.io
- dood.so
- anonfiles.com
- pcloud.com
- edocr.com
- 4shared.com
- files.fm
- mega.nz
- jmp.sh
- pixeldrain.com
- opendrive.com
- orangedox.com
WiFi
Wigle
- huge datastore of wardriving data
- can search for wireless networks in a specific area, or locate a physical location if a unique wireless SSID is known
Training
todo, i need to make all of these into individual sub pages
https://academy.tcm-sec.com/p/osint-fundamentals
https://inteltechniques.com/training.html
https://academy.osintcombine.com/courses
https://www.aware-online.com/en/live-webinar-osint-training-i-beginner/
https://www.intelligencewithsteve.com/osint
https://training.csilinux.com/
https://www.udemy.com/course/osint-open-source-intelligence/
https://www.sans.org/cyber-security-courses/open-source-intelligence-gathering/
https://www.sans.org/cyber-security-courses/advanced-open-source-intelligence-gathering-analysis/
description: Joe offers paid OSINT training at a fair price. See links and reviews below.
Joe Gray / The OSINTion
{% hint style="danger" %} This is found content, and lacks a user review. Please submit a review if you have taken any of this training. {% endhint %}
{% embed url="https://www.theosintion.com/courses/" %}
Offerings
- Paid Training (scheduled)
Pricing
- Fair ($100-500 USD)
Reviews
- submit yours! one sentence or paragraph is plenty.
How to Contribute
GitHub
Editing on github is better if you're adding content to an existing page or making minor changes
- click "edit on github" in right panel, or go to https://github.com/cham423/okb
- find the page you want to edit (search or manual)
- make changes (markdown)
- github browser editor
- command line
- make pull request
- submit!
GitBook
GitBook editing will be required for bigger organization changes, adding new pages, or other "curator" tasks
- sign up for a gitbook account if you don't have one
- contact me (discord cham423#2790), provide your GitBook account's email address
- i will invite you to the space
- make your changes in an edit
- submit edit for review, along with any pertinent information
Rules
- todo
Contributors
Corey Ham (@cham423)
- red teamer