description: >- The OSINT Knowledge Base (OKB) is a collection of OSINT tools, products, and data sources. Its goal is to provide curated content to help people along their OSINT journeys. cover: https://w.wallhaven.cc/full/xl/wallhaven-xl1ydl.jpg coverY: 0

The OKB

The Talk

{% content-ref url="broken-reference" %} Broken link {% endcontent-ref %}

Other Great OSINT Resources

{% embed url="https://metaosint.github.io/" %} TONS of links. The interactive graph is cool. {% endembed %}

{% embed url="https://inteltechniques.com/tools/index.html" %} Michael Bazzell's IntelTechniques Site is Excellent {% endembed %}

{% embed url="https://sector035.nl/links" %} Links Broken Down By Category {% endembed %}

{% embed url="https://osintframework.com" %} A fun graphical tool with links for a bunch of categories {% endembed %}


description: Evaluating OSINT Data sources and Tooling - Build vs Buy cover: >- https://images.unsplash.com/photo-1565551223391-be988013ee6d?crop=entropy&cs=tinysrgb&fm=jpg&ixid=MnwxOTcwMjR8MHwxfHNlYXJjaHwzfHxicm9rZW58ZW58MHx8fHwxNjY1NjM2Mzc5&ixlib=rb-1.2.1&q=80 coverY: 0

<overly complex title>

"An OSINT Talk"

by: Corey Ham


description: Welcome!

Introduction

What is this talk?

This is a talk where I introduce a resource to help you make decisions about what tools, sites, or techniques to use next time you use OSINT.

I'll also talk about my approach to OSINT, and provide some generalized recommendations based on my experience.

Who am I?

  • Corey Ham | cham423
  • A tester at BHIS
    • mostly red team and adversary sim experience
    • client-facing tester since 2013
    • previously Optiv, Hurricane Labs
  • OSINT Enjoyer
    • Have used OSINT during offensive engagements over the years out of necessity
    • Also do OSINT for "fun" (data hoarding)

Disclaimers

  • this talk contains opinions, biases, assumptions, and YMMV
  • there are infinite OSINT use cases, and I can't cover them all

The Idea

OSINT is easy! But there's so much out there...

  • nearly infinite amount of data available
  • difficult to judge quality of data
  • lots of tasks require human brains, manual effort

Existing OSINT resources lack curation

  • huge lists of links (guilty)
  • difficult to see return data types/capabilities ahead of time
  • for beginners, difficult to map tools to specific workflows
    • example: business email enumeration

Time wasters:

  • For free products, getting tools running can take time
    • keys, deployment, proxying, etc
  • Hosted tools often have limitations
  • For paid products, cost/benefit

We also have to decide between building it ourselves, using existing tools, or just paying for a commercial product.

The Solution (hopefully)

  • operators build their list of "go-to" tools over time based on experience
  • why not share this experience?

This site!! Have you noticed my slides are on GitBook? It's public.

https://okb.goldmine.sh

  • Please contribute (more info on that later)

Focus on metrics, fallback to opinions where metrics don't make sense

  • for example, can compare certain tools by number of results and accuracy of results
  • cannot review training this way

Addressing Build vs. Buy

It was in the title of the talk...

Build Everything: $1500 Sandwich in 6 Months

The YouTube channel “How to Make Everything” demonstrated how taking the “build” mentality to the absolute extreme can lead to ridiculous outcomes. His process involved harvesting and milling his own wheat to make the bread, killing his own chicken, and planting a garden to grow ingredients. In the end, the sandwich was “not bad” and consumed 6 months of his life.

Buy Everything: budget error 406 not acceptable

  • lots of overlapping data
  • data you have no use for
  • you might get fired
  • purchased tools still need to be integrated

Balanced Approach

  • some "assembly" is always required
    • integrating paid tools
    • deploying free tools
    • automation component
  • Mix build and buy to meet your OSINT goals

Easy Examples:

  • Buy
    • internet-sized attack surface data -- building this would involve scanning billions of hosts, cost a lot in hosting, and require lots of development to enrich the data enough to make it usable. By comparison, Shodan is quite affordable
  • Build
    • profile enumeration by username (demo) -- there are tons of ways to do this, and it's relatively straightforward
    • anything that doesn't exist already -- you have no other options...
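To show why the "build" option is straightforward here, the following is a minimal sketch of username enumeration, not how any particular tool works. The `SITES` dictionary, its two URL templates, and the "HTTP 200 means the profile exists" rule are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Hypothetical site list for illustration; real tools ship hundreds of entries.
SITES: Dict[str, str] = {
    "GitHub": "https://github.com/{username}",
    "Reddit": "https://www.reddit.com/user/{username}",
}


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request, or 0 on network failure."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 404, 403, etc.
    except urllib.error.URLError:
        return 0  # DNS failure, timeout, refused connection


def find_profiles(
    username: str,
    get_status: Callable[[str], int] = http_status,
    sites: Dict[str, str] = SITES,
) -> List[str]:
    """Return profile URLs whose status code suggests the account exists."""
    found = []
    for template in sites.values():
        url = template.format(username=username)
        # Assumption: 200 means "profile exists". Many sites break this rule,
        # which is one common source of false positives.
        if get_status(url) == 200:
            found.append(url)
    return found
```

Passing `get_status` as a parameter keeps the matching logic separate from the network call, so the check can be swapped for a proxy-aware or rate-limited version later.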

Site Walkthrough

Reviewed Content

Workflows

  • a specific task, along with inputs and outputs

Workflow Example

{% content-ref url="../reviewed-content/workflows/profile-discovery-by-username.md" %} profile-discovery-by-username.md {% endcontent-ref %}

Categories

  • generic OSINT sources/tasks that do not have specific inputs and outputs

Example

{% content-ref url="../reviewed-content/categories/public-breach-databases.md" %} public-breach-databases.md {% endcontent-ref %}

General Knowledge

  • basically just huge lists of links (like I complained about at the start of the talk)

Contribute!

New Workflow?

  • consider your testing procedure, make sure it is repeatable

Have you used any of this before?

  • add your experience/opinion to their descriptions

Have something new?

  • add it, along with any supporting information you have

{% content-ref url="../about/how-to-contribute.md" %} how-to-contribute.md {% endcontent-ref %}

Some General OSINT Thoughts

Consider OPSEC before you start

  • your intelligence gathering is part of someone else's intelligence
  • use a clean image if creating admissible evidence
    • Michael Bazzell has an excellent guide for this in his OSINT Techniques book

Automate anything you can

  • build a simple API wrapper script
  • consider storing it in jupyter
  • at the very least, build a checklist/standard procedure to follow
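As one possible shape for a "simple API wrapper script", here is a sketch that runs any lookup function and writes the result to a timestamped JSON file. The function name `run_and_save`, the record fields, and the `osint_output` directory are hypothetical choices, not part of any existing tool.

```python
import json
import re
import time
from pathlib import Path
from typing import Any, Callable


def run_and_save(
    tool: str,
    target: str,
    fetch: Callable[[str], Any],
    out_dir: str = "osint_output",
) -> Path:
    """Call fetch(target) and write the result to a timestamped JSON file."""
    result = fetch(target)  # must return something json.dumps can serialize
    record = {
        "tool": tool,
        "target": target,
        "collected_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "result": result,
    }
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Replace characters that are unsafe in file names (slashes, colons, ...).
    safe_target = re.sub(r"[^A-Za-z0-9._-]", "_", target)
    path = directory / f"{tool}_{safe_target}_{int(time.time())}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```

Any function that takes a target string and returns JSON-serializable data can be plugged in as `fetch`, so the same wrapper covers several data sources.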

Always keep your output

  • I prefer JSON/structured output formats, but unstructured output can always be parsed later
  • Consider storing everything in elastic/ELK
    • Newer versions of ES are really good at auto-mapping
    • curl -XPOST http://elk:9200/osint_dump_2022-09-15/_doc -H "Content-Type: application/json" -d @amass_out.json
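The curl command above posts a single JSON document. When a tool writes one JSON object per line (an assumption about the output file, so check yours), a sketch like the following can convert it into a body for Elasticsearch's `_bulk` endpoint instead. The function name `to_bulk_body` is hypothetical.

```python
import json
from typing import Iterable


def to_bulk_body(lines: Iterable[str], index: str) -> str:
    """Convert newline-delimited JSON records into an Elasticsearch _bulk body."""
    parts = []
    # Each document is preceded by an action line naming the target index.
    action = json.dumps({"index": {"_index": index}})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # raises ValueError on malformed input
        parts.append(action)
        parts.append(json.dumps(record))
    # The bulk API requires the body to end with a newline.
    return "\n".join(parts) + "\n" if parts else ""
```

The resulting string would be sent as a POST to the cluster's `/_bulk` endpoint with the `Content-Type: application/x-ndjson` header.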

Wrap-up / Questions

  • I really hope people use it :)
  • more content is coming!
  • any questions?

Workflows


description: 'inputs: username | outputs: list of profiles associated with that username'

Profile Discovery by Username

Testing Methodology

  1. check the same username (e.g. cham423) using each tool
    • count the number of profiles discovered
    • count number of false positives (nonexistent profiles, not mismatches)
  2. rate difficulty of deployment
  3. rate quality of output
  4. rate development health
  5. rate automation potential
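The counting in step 1 boils down to one ratio. As a small sketch, the false positive rate can be computed like this, using counts reported for two of the tools on this page; the function name is a hypothetical helper, not part of any tool.

```python
def false_positive_rate(total_results: int, false_positives: int) -> float:
    """Return the share of reported profiles that do not actually exist."""
    if total_results <= 0:
        raise ValueError("total_results must be positive")
    if not 0 <= false_positives <= total_results:
        raise ValueError("false_positives must be between 0 and total_results")
    return false_positives / total_results


# (total results, false positives) as recorded on this page
counts = {"Sherlock": (38, 0), "WhatsMyName": (114, 75)}
for tool, (total, fp) in counts.items():
    print(f"{tool}: {false_positive_rate(total, fp):.0%} false positives")
```

Valid accounts that belong to someone other than the target are not counted as false positives here, matching the methodology above.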

{% hint style="info" %} see the template at the bottom of the page if you are adding a new tool. {% endhint %}

Build Pick

Sherlock

{% embed url="https://github.com/sherlock-project/sherlock" %}

Example Results

python3 sherlock.py cham423 -o cham423.out
  • 38 total results
  • 0 false positives!
  • 7 results were not the target, but were valid accounts

Deployment

  • python (medium difficulty)
  • has a docker container as a fallback

Output

  • basically stdout, no JSON output option -- a list of discovered profile urls

Development Health

  • very healthy, recent commits, large number of contributors, long history

Automation Potential

  • very high, command line tool that supports input lists
  • <todo> not tested at scale. from a single host, would likely encounter rate limiting when enumerating large numbers of profiles

Other Tools

WhatsMyName

{% embed url="https://github.com/WebBreacher/WhatsMyName" %}

Example Results

 python3 whats_my_name.py -u cham423 -o cham423.out
  • 114 total results
  • 75 false positives! (66% false positive rate)

Deployment

  • python (medium difficulty)

Output

  • file output was broken at time of testing (2022-09-18)
  • stdout is printed in table form with lots of extra spaces/characters, and some profile links are cut off

Development Health

  • OK, some recent contributions

Automation Potential

  • medium -- many other projects reuse its JSON list of profile URLs, which can be imported to check usernames automatically. However, the false positive rate is alarming

Namechk (web)

{% embed url="https://namechk.com" %}

Example Results

  • 25 total results
  • 1 false positive

Deployment

  • web hosted (easy difficulty)
  • 1-5 clicks to full results

Output

  • No file output
  • Profile links to signup/about pages, not valid profiles

Development Health

  • unknown (third party)

Automation Potential

  • very limited: captcha + cloudflare would prevent easy scraping

namecheckr

{% embed url="https://www.namecheckr.com/" %}

Example Results

  • 12 total results
  • 1 false positive

Output

  • no file output
  • site links point to root page, not profile link

Development Health

  • unknown (third party)

Automation Potential

  • no CAPTCHA, so could be scraped

UserSearch.org

{% embed url="https://usersearch.org/" %}

Example Results

  • 15 total results
  • 1 false positive

Deployment

  • web hosted (easy difficulty)
  • requires 10+ clicks to see all sites, no single page with all results

Output

  • no file output
  • profile links available in search results, but not on single page

Development Health

  • unknown (third party)

Automation Potential

  • no CAPTCHA, could potentially be scraped

NameCheckup

{% embed url="https://namecheckup.com/" %}

Example Results

  • 14 total results
  • 1 false positive

Deployment

  • web hosted (easy difficulty)
  • 1-5 clicks to results

Output

  • no file output
  • results link to specific profiles

Development Health

  • unknown (third party)

Automation Potential

  • no CAPTCHA, could potentially be scraped

CheckUsernames

{% embed url="https://checkusernames.com/" %}

Example Results

  • 42 total results
  • 27 false positives
  • 79 errors checking target site

Deployment

  • web hosted (easy difficulty)
  • 1-5 clicks to results

Output

  • no file output
  • profile links point to the individual profile rather than the site's root page, which is nice

Development Health

  • unknown (third party)

Automation Potential

  • no CAPTCHA, could potentially be scraped

Namechk (script)

{% hint style="danger" %} This tool is not currently functional as of 2022-09-18, and returns all false positives due to Cloudflare bot protection {% endhint %}

{% embed url="https://github.com/GONZOsint/Namechk" %}

Example Output

Todo

Template

<link to tool>

Example Results

  • n total results
  • n false positives

Deployment

Output

  • <describe output options>

Development Health

  • <describe project health>

Automation Potential

  • <describe automation potential>

description: 'input: domains | output: list of emails for input domains'

Email Enumeration by Domain

Testing Methodology

  1. check the same domain (e.g. microsoft.com) using each tool
    • count the number of results
  2. rate difficulty of deployment
  3. rate quality of output
  4. rate development health
  5. rate automation potential

Phonebook.cz

https://phonebook.cz/

  • requires IntelX account to use (free signup)
  • 39,601 results for microsoft.com

Hunter

{% embed url="https://hunter.io/search" %}

Example Results

  • 35,830 results
  • also shows email format (e.g. {f}.{last}@domain.com)

Deployment Difficulty

  • web hosted (easy difficulty)
  • has API

Output

  • can export subset of records from web interface
  • can export full records using API (if credits available in account)

Pricing

  • $50-400/month depending on plan
  • free plan available, limited quota per month

Infoga

<todo>

{% embed url="https://github.com/m4ll0k/infoga" %}

  • "Infoga is a tool gathering email accounts informations (ip,hostname,country,...) from different public source (search engines, pgp key servers and shodan) and check if emails was leaked using haveibeenpwned.com API"
  • Language: python

skymem.info

{% hint style="danger" %} Note: I have not confirmed this is a legitimate site {% endhint %}

{% embed url="http://www.skymem.info/" %}

Example Results

  • 36,631 results

Deployment Difficulty

  • web hosted (easy difficulty)
  • No API

Pricing

  • varies by results size, not transparent
  • large lists get into $2000-3000 range!!!

Categories

Public Breach Databases

Build Pick: Custom ES Cluster

Buy Pick: Dehashed

  • $15.50/month, or $179 annually
  • separate pricing for API credits and web access
    • API pricing: $3 for 100 credits
  • Contains passwords
  • certain sensitive data is not accessible to normal accounts (e.g. SSNs)
  • offensive focus, but could be useful for defenders to determine what is exposed/change passwords

Also Worth It: Have I Been Pwned (HIBP)

  • worth it, $3.50 per month subscription fee
  • highly effective data breach notification service
  • passwords accessible in certain circumstances

IntelX

  • pricing unclear, but can use https://phonebook.cz for free to gather certain attributes (e.g. email addresses)

https://intelx.io/

Live Host Data

Buy Pick: Shodan

https://www.shodan.io

Details

  • $69 - $1099 per month depending on number of results per month needed
  • 1 million, 20 million, or unlimited results options
  • 137,390,036 results for port:443

Usage Tips

  • use net:127.0.0.1/8 for searching for results in a CIDR network
  • supports JARM searches with ssl.jarm parameter
  • can stack queries, e.g. ssl.jarm:"2ad2ad16d2ad2ad22c42d42d000000dc2b105e4dda975fa70719c0cae5d0ce" net:203.97.69.74/24
  • supports organization filters, e.g. org:"Verizon Business"
  • supports SSL cert CN filters, e.g. ssl.cert.subject.cn:"*.google.com"
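Stacked queries like the ones above are just space-separated `filter:value` terms, so they are easy to build in a script. This is a minimal sketch; the `build_query` helper and its double-underscore convention are assumptions of this example, not part of Shodan's tooling.

```python
def build_query(**filters: str) -> str:
    """Join filter=value pairs into a single Shodan search string."""
    terms = []
    for name, value in filters.items():
        # Python keyword arguments cannot contain dots, so "ssl__jarm"
        # stands in for the "ssl.jarm" filter (a convention of this sketch).
        name = name.replace("__", ".")
        # Quote values containing whitespace so they stay a single term.
        if any(ch.isspace() for ch in value):
            value = f'"{value}"'
        terms.append(f"{name}:{value}")
    return " ".join(terms)


print(build_query(org="Verizon Business", net="203.97.69.74/24"))
```

The resulting string can be pasted into the web search box or handed to whichever client you use to query the API.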

BinaryEdge

https://www.binaryedge.io/

  • free account available, 250 requests per month
  • also gathers torrent/DHT data
  • $10-500 a month for higher query limits
  • 163,691,756 results for port:443

LeakIX

https://leakix.net/

  • gray hat scanning platform
  • targeted at identifying vulnerable services, but can be used for other things
  • has "reporting" capability where vulnerabilities can be reported to their owners

RiskIQ Community

https://community.riskiq.com

  • free account with business email has the following monthly quotas:
    • Web Searches: 350
    • API Searches: 3,500
    • Community (public) Projects: 1,000
    • Team & Analyst (private) Projects: 1
    • Basic Monitors: 5,000
    • Keyword Monitors: 10
    • Monitor Frequency: Weekly
    • Quota Duration: Monthly
    • Last Reset: 2022-10-01 00:00:00
    • Next Reset: 2022-11-01 00:00:00
  • recently acquired by Microsoft and integrated into Defender Threat Intelligence, so its future is uncertain

Greynoise

https://viz.greynoise.io/

  • "GreyNoise is a cybersecurity platform that collects and analyzes Internet-wide scan and attack traffic"
  • does not scan the internet, but collects data on what is scanning/attacking from the internet

Censys

https://search.censys.io/

  • 96,166,688 results for port:443

Netlas

https://netlas.io

Onyphe

https://www.onyphe.io/pricing/

  • 59 Euros for search access
  • 100-1000 Euro plans for business
  • unknown data size

Build Pick: N/A

  • most users should not try to scan the whole internet
  • if you are the person that needs to build this, hopefully you know what you're doing

description: >- Multipurpose tools gather multiple pieces of information in one go, as opposed to a single type of data

Multipurpose Reconnaissance Tools

Recon-ng

{% embed url="https://github.com/lanmaster53/recon-ng" %}

Maltego

{% embed url="https://www.maltego.com" %}

  • long standing general-purpose OSINT tool with graphical interface
  • Free community edition available, with data gathering limitations
  • $1000/year professional license

TheHarvester

{% embed url="https://github.com/laramies/theHarvester" %}

  • performs open source intelligence (OSINT) gathering to help determine a domain's external threat landscape.
  • gathers names, emails, IPs, subdomains, and URLs by using multiple public resources
  • Language: python

Spiderfoot

{% embed url="https://github.com/smicallef/spiderfoot" %}

Identity Data

Free/Open Source

Commercial

Pipl

https://pipl.com

  • built primarily for identity verification

description: If you're trying to get paid for OSINT, this will be required

Documentation


description: Screenshots or it didn't happen

Screen Capture

Hunchly

{% embed url="https://www.hunch.ly" %}

  • Browser extension specifically designed for OSINT investigations
  • automatically captures webpages you visit
  • free 30 day trial, $130/year

Greenshot

{% embed url="https://getgreenshot.org/" %}

  • Free on Windows
  • $2 on Mac

Snagit

{% embed url="https://www.techsmith.com/screen-capture.html" %}

  • $63 license, perpetual but no updates/new features after 1 year
  • Windows/Mac
  • Video capture + webcam, scrolling captures, built-in shapes/censoring, much more

Maps

ZeeMaps

{% embed url="https://www.zeemaps.com/" %}

BatchGeo

{% embed url="https://batchgeo.com/" %}

CreePy

{% embed url="https://github.com/ilektrojohn/creepy" %}

  • No active development, last update 2014

Threat Intelligence

Free/Open Source

ThreatMiner

{% embed url="https://www.threatminer.org" %}

AlienVault OTX

{% embed url="https://otx.alienvault.com/" %}

BotScout

{% embed url="http://botscout.com/" %}

Blueliv Threat Exchange

{% embed url="https://community.blueliv.com/#!/discover" %}

X-Force Exchange

{% embed url="https://exchange.xforce.ibmcloud.com/" %}

Project Honeypot

{% embed url="https://www.projecthoneypot.org/index.php" %}

HoneyDB

{% embed url="https://honeydb.io/" %}

MISP

{% embed url="https://www.misp-project.org/" %}

Commercial

VirusTotal Intelligence

{% embed url="https://www.virustotal.com/gui/intelligence-overview" %}

  • trial available for request

SkopeNow

{% embed url="https://www.skopenow.com" %}

Social Links

{% embed url="https://sociallinks.io" %}

NexusXplore

{% embed url="https://www.nexusxplore.com" %}

Recorded Future

{% embed url="https://www.recordedfuture.com" %}

Dataminr

{% embed url="https://www.dataminr.com/" %}

  • pricing not publicly advertised - probably expensive
  • claims "AI" driven real time intelligence

Echosec

{% embed url="https://www.echosec.net/" %}

  • pricing not publicly advertised - probably expensive
  • Hosted platform
  • API - "delivers direct access to a unique range of fringe social media networks, imageboards, and forums, and allows you to integrate and leverage this data for your own products, systems, and tooling"

Nexvision

{% embed url="https://www.nexvisionlab.com" %}

Code Repositories

Grep.app

https://grep.app

  • searches 500k+ git repositories
  • filter by multiple attributes

Example Results

Searchcode

https://searchcode.com/

  • searches GitHub, Bitbucket, GitLab, etc
  • filter by language/repo only

example results

Malware Analysis

Online Sandboxes

sandbox.pikker.ee

https://sandbox.pikker.ee/dashboard/

  • cuckoo cluster
  • unknown affiliation

Antiscan.me

https://antiscan.me

JoeSandbox

https://www.joesandbox.com/#windows

Hybrid Analysis

https://www.hybrid-analysis.com

  • provided by CrowdStrike, shares data with Falcon
  • cheaper than VT

Android

Koodous

https://koodous.com

  • mobile application with AV capabilities
  • analysis tools
  • requires login/install

Blockchain

Dune Analytics

https://dune.com/browse/dashboards

  • User-created dashboards
  • Extra useful for DeFi

Etherscan

https://etherscan.io

  • ethereum focused
  • has wallet labels, transaction search, smart contracts

Google BigQuery

https://cloud.google.com/bigquery/

  • Supports most major chains including Bitcoin, Ethereum, Zcash, doge, etc
  • SQL syntax
  • very useful for graphs

CoinGecko

  • FREE API, with no keys

https://www.coingecko.com/en/api

Web Scraping

Scraping


description: Paste sites are internet dumpsters, but dumpster diving can be rewarding

Paste Sites

Tools

Pastehunter

{% embed url="https://github.com/kevthehermit/PasteHunter" %}

Site Info

Pastebin (https://pastebin.com)

  • 8 byte alphanumeric ID
  • Provides scraping API with feeds, but requires grandfathered pro account
# example
https://pastebin.com/Vqai8sBA

rentry.co

  • 5 byte alphanumeric ID
  • no feed
# example
https://rentry.co/uvq4v

zerobin (https://zerobin.net)

# example
https://zerobin.net/?98703897a0cdceef#uTVX3KoxQbNrDJE+efmO9sexX3jT6Zt0ed4glZj295U=

justpaste.it

  • no recent feed
  • no API
# example paste
https://justpaste.it/3spa1

twitlonger.com / tl.gd

  • oauth from twitter only
  • no recent feed
  • api is for getting/updating posts only, auth required
  • 9 char id (alphanumeric with symbols)
# example link
https://www.twitlonger.com/show/n_1s1vre6
http://tl.gd/n_1s1vre6

jsfiddle.net

  • 8 char id (alphanumeric)
# example
https://jsfiddle.net/k092mfae/

gist.github.com

  • requires username
  • 32 char id (hex)
https://gist.github.com/cham423/308cba152d57d10d91a1dbd614768024

Textbin (https://textbin.net/)

  • recent paste list, HTML only (would have to be scraped, extracted from homepage)
    • embedded in homepage, no API, no dedicated HTML page
# example
https://textbin.net/1q09pepj0k

ideone (ideone.com)

# example
https://ideone.com/Ok8iWL

pastesite (https://pastesite.org/lists)

# example
https://pastesite.org/view/19d5c69e

controlC (https://controlc.com)

  • popular
  • no recent feed
  • 8 byte hex ID
# example
https://controlc.com/ff1435ac

ghostbin

https://ghostbin.co

  • defunct
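The ID formats noted above can be turned into simple URL matchers, which helps when triaging a pile of links. This is a sketch only: the patterns are inferred from the ID lengths and example links listed on this page, not from any site's documentation, and the names `PASTE_PATTERNS` and `classify_paste_url` are hypothetical.

```python
import re
from typing import Optional

# Patterns inferred from the example links on this page (assumptions).
PASTE_PATTERNS = {
    "pastebin": re.compile(r"^https?://pastebin\.com/[A-Za-z0-9]{8}$"),
    "rentry": re.compile(r"^https?://rentry\.co/[A-Za-z0-9]{5}$"),
    "jsfiddle": re.compile(r"^https?://jsfiddle\.net/[A-Za-z0-9]{8}/?$"),
    "gist": re.compile(r"^https?://gist\.github\.com/[^/]+/[0-9a-f]{32}$"),
    "controlc": re.compile(r"^https?://controlc\.com/[0-9a-f]{8}$"),
}


def classify_paste_url(url: str) -> Optional[str]:
    """Return the paste site a URL appears to belong to, or None."""
    candidate = url.strip()
    for site, pattern in PASTE_PATTERNS.items():
        if pattern.match(candidate):
            return site
    return None


print(classify_paste_url("https://pastebin.com/Vqai8sBA"))
```

Sites may use other URL shapes as well (raw views, custom slugs), so treat a `None` result as "unknown" rather than "not a paste".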

sources

https://mediasonar.com/2020/09/09/pastebins-darkwebmarketplaces-osint/

LinkedIn

Yalis

https://github.com/EatonChips/yalis

  • simple, golang binary
  • auto-generates emails based on email format flag
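Generating addresses from a known format is simple enough to sketch. The example below is not Yalis's implementation or flag syntax; it assumes a Hunter-style pattern such as `{f}.{last}`, and the extra placeholders `{first}` and `{l}` as well as the function name are hypothetical.

```python
from typing import Iterable, List, Tuple


def generate_emails(
    names: Iterable[Tuple[str, str]], pattern: str, domain: str
) -> List[str]:
    """Build email addresses from (first, last) names and a format pattern."""
    emails = []
    for first, last in names:
        first, last = first.strip().lower(), last.strip().lower()
        if not first or not last:
            continue  # skip incomplete names
        local_part = pattern.format(
            f=first[0],   # first initial
            first=first,  # full first name
            l=last[0],    # last initial
            last=last,    # full last name
        )
        emails.append(f"{local_part}@{domain}")
    return emails


print(generate_emails([("Jane", "Doe")], "{f}.{last}", "example.com"))
```

Real names need more cleanup than this (hyphens, spaces, accented characters), so expect to normalize the input before trusting the output.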

Web Scanning Datasets

PublicWWW

https://publicwww.com/

  • search HTML source code on websites

URLScan

https://urlscan.io/

  • can execute live scan of a potentially malicious URL
  • can also search previous scan results
  • results and scanned URLs are public #opsec

File Hosting Sites

  • github.com
  • jira.com
  • dropbox.com
  • sendspace.com
  • mediafire.com
  • onedrive.com
  • teknik.io
  • drive.google.com
  • box.com
  • evernote.com
  • gofile.io
  • dood.so
  • anonfiles.com
  • pcloud.com
  • edocr.com
  • 4shared.com
  • files.fm
  • mega.nz
  • jmp.sh
  • pixeldrain.com
  • opendrive.com
  • orangedox.com

WiFi

Wigle

https://wigle.net/

  • huge datastore of wardriving data
  • can search for wireless networks in a specific area, or locate a physical location if a unique wireless SSID is known

Training

todo, i need to make all of these into individual sub pages


https://academy.tcm-sec.com/p/osint-fundamentals
https://inteltechniques.com/training.html
https://academy.osintcombine.com/courses

https://www.aware-online.com/en/live-webinar-osint-training-i-beginner/

https://www.intelligencewithsteve.com/osint

https://training.csilinux.com/

https://www.udemy.com/course/osint-open-source-intelligence/

https://www.sans.org/cyber-security-courses/open-source-intelligence-gathering/

https://www.sans.org/cyber-security-courses/advanced-open-source-intelligence-gathering-analysis/


Joe Gray / The OSINTion

{% hint style="danger" %} This is found content, and lacks a user review. Please submit a review if you have taken any of this training. {% endhint %}

{% embed url="https://www.theosintion.com/courses/" %}

Offerings

  • Paid Training (scheduled)

Pricing

  • Fair ($100-500 USD)

Reviews

  • submit yours! one sentence or paragraph is plenty.

How to Contribute

GitHub

Editing on github is better if you're adding content to an existing page or making minor changes

  • click "edit on github" in right panel, or go to https://github.com/cham423/okb
  • find the page you want to edit (search or manual)
  • make changes (markdown)
    • github browser editor
    • command line
  • make pull request
  • submit!

GitBook

GitBook editing will be required for bigger organization changes, adding new pages, or other "curator" tasks

  • sign up for a gitbook account if you don't have one
  • contact me (discord cham423#2790), provide your GitBook account's email address
  • I will invite you to the space
  • make your changes in an edit
  • submit edit for review, along with any pertinent information

Rules

  • todo

Contributors

Corey Ham (@cham423)

  • red teamer