Find_SSNs - Search files for U.S. Social Security or Credit Card Numbers
Try our standalone number validator. It's written in C++ and is very
fast and efficient. It works well with PowerShell and egrep.
Caution!
Find_SSNs is not a silver bullet against identity theft. It helps
individuals and organizations find sensitive numbers in files on
computers. It does not secure the files it discovers. It may
produce false positives and false negatives. It may miss some files
altogether. Use it as part of a larger plan to identify and protect
sensitive data stored on computers. Do not rely solely on it. To be 100% certain that sensitive data does not exist in files, humans should manually examine the files.
Preventing sensitive data disclosures is a process. Organizations
should have ongoing, recurring efforts in place to locate and
secure sensitive data before a break-in occurs. You should also
note that Find_SSNs is a tool. Like any tool, it can be used for
good or bad purposes. For example, it can just as easily be used by
'bad guys' to find your sensitive data before you do.
Please remember to securely delete all of the Find_SSNs report files
after you are finished using the program. The report files are road
maps to potentially sensitive information. Do not store these as plain
text. Do not email them. If you want to store or email the reports,
encrypt them using GPG or TrueCrypt software.
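As a minimal sketch of the GPG option above, the following wraps the `gpg` command line in Python. It assumes `gpg` is installed and on the PATH; the function names and the report filename are hypothetical and are not part of Find_SSNs. Note that gpg leaves the plain-text original in place, so it still has to be securely deleted afterwards.

```python
import subprocess
from pathlib import Path


def gpg_encrypt_command(report: Path) -> list[str]:
    """Build a gpg command that encrypts `report` to `<report>.gpg`
    using passphrase-based (symmetric) encryption."""
    encrypted = report.with_name(report.name + ".gpg")
    return [
        "gpg", "--symmetric", "--cipher-algo", "AES256",
        "--output", str(encrypted), str(report),
    ]


def encrypt_report(report: Path) -> Path:
    """Run gpg (it prompts for a passphrase) and return the encrypted path.
    The plain-text original is NOT removed by this function."""
    subprocess.run(gpg_encrypt_command(report), check=True)
    return report.with_name(report.name + ".gpg")
```

For example, `encrypt_report(Path("report.txt"))` would write `report.txt.gpg` next to the original, where `report.txt` stands in for whatever name your report file has.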
There are other programs that offer similar functionality. If
Find_SSNs isn't right for you, try Senf
or Spider.
Commercial products are available as well.
Downloads - Latest Release October 8th, 2008
Please use Firefox to
download the 'Compiled Windows Executable'. According to our
hosting provider, IE often corrupts .exe downloads.
Compile Instructions - How to build your
own custom Windows binary from the source code.
More Program Information
Find_SSNs can search *most files for sensitive numbers.
Searchable file formats include Microsoft Word, Excel and Access as
well as file formats that store data in plain text. The OASIS Open
Document XML format (Open Office 2) and the Microsoft Office 2007
Open XML format are also supported. Adobe PDF files are supported, but PDF search is not enabled by default. See the notes in the source code about enabling it. The program searches for
sensitive numbers such as these:
9 digit U.S. Social Security Numbers
13 digit Visa
14 digit Diners Club (International and Carte Blanche)
15 digit American Express
15 digit JCB
16 digit Visa
16 digit MasterCard
16 digit Discover Card
16 digit JCB
16 digit Diners Club (U.S. and Canada)
Find_SSNs is meant to be used by anyone, not just IT Professionals.
On Windows, no software needs to be installed prior to running the
program. Just download the software and run it. It's also designed
to be as accurate as possible when searching files so as to reduce
the number of false positives. However, there will always be false
positives, because valid sensitive numbers often appear in other
contexts. For example, 123246789 is a valid SSN, and because it's
in this HTML page, Find_SSNs would identify this web page as a
suspect file. So, always verify the results.
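To illustrate why a bare nine-digit number in any text gets flagged, here is a minimal sketch of candidate detection with a regular expression. The pattern and the function name are assumptions made for illustration; the actual patterns used by Find_SSNs are in its source code and may differ.

```python
import re

# Nine digits, optionally grouped 3-2-4 with hyphens or spaces, and not
# embedded in a longer run of digits. (Assumed pattern, for illustration.)
SSN_CANDIDATE = re.compile(r"(?<!\d)(\d{3})[- ]?(\d{2})[- ]?(\d{4})(?!\d)")


def find_ssn_candidates(text: str) -> list[str]:
    """Return every nine-digit candidate in `text`, normalized to digits only."""
    return ["".join(m.groups()) for m in SSN_CANDIDATE.finditer(text)]
```

A match here is only a candidate. The pattern knows nothing about context, which is why the results always need to be verified.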
How is Find_SSNs Different from Other Sensitive Data Discovery Tools?
Many sensitive data discovery programs that search for Social
Security numbers simply discard illegal area numbers (the first
three digits). In our experience, applying this method to 1 million
randomly generated nine digit numbers leaves roughly 720,000
suspect numbers. Unlike these programs, Find_SSNs uses data from
the Social Security
Administration to validate area number and group number
relationships. This validation reduces the pool of suspect numbers
to about 445,000. If Find_SSNs had access to the Social Security
Administration's Death Master File, those 445,000 could potentially
be reduced by an additional 20% to approximately 356,000 suspect
numbers.
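The area and group check described above might look roughly like the sketch below. It relies on the order in which the SSA issued group numbers within an area under the pre-2011 scheme (odd 01-09, then even 10-98, then even 02-08, then odd 11-99). The `high_groups` table, which maps an area number to the highest group issued so far, is a hypothetical stand-in for the SSA data, and the function names are not taken from the Find_SSNs source.

```python
def group_issue_order() -> list[int]:
    """Order in which group numbers were issued within an area."""
    return (
        list(range(1, 10, 2))       # odd 01-09
        + list(range(10, 99, 2))    # even 10-98
        + list(range(2, 9, 2))      # even 02-08
        + list(range(11, 100, 2))   # odd 11-99
    )


# Position of each group number in the issuance order.
_RANK = {group: i for i, group in enumerate(group_issue_order())}


def is_plausible_ssn(ssn: str, high_groups: dict[int, int]) -> bool:
    """Check a nine-digit string against an area -> highest-issued-group table."""
    if len(ssn) != 9 or not (ssn.isascii() and ssn.isdigit()):
        return False
    area, group, serial = int(ssn[:3]), int(ssn[3:5]), int(ssn[5:])
    if group == 0 or serial == 0:   # group 00 and serial 0000 are never issued
        return False
    high = high_groups.get(area)
    if high is None:                # area absent from the table: treat as illegal
        return False
    return _RANK[group] <= _RANK[high]
```

With a made-up table such as `{123: 98, 456: 5}`, the number 456031234 passes (group 03 comes before 05 in the issuance order) while 456071234 does not.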
Going from a 100% problem with an unknown scope (the locations of
the suspect files that contain sensitive data) to a 45% problem
with a known scope is very good, but not ideal for end-users. In
our opinion, no other numerical validation methods can be applied
to today's U.S. social security number format that will further
reduce false positives. Context determination, which attempts to
guess whether the suspect number is being used as an SSN (e.g. by
finding surnames near the numbers), or logic that attempts to
grade the context, may further reduce false positives, but will
increase the potential for false negatives as well.
Credit card numbers are a different story. Out of 1 million
randomly generated 15- and 16-digit numbers (potential AmEx, Visa,
MasterCard, Discover and JCB), only approximately 100,000 will pass
Luhn validation.
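For readers unfamiliar with the Luhn check, a minimal sketch follows. The function name is an assumption and this is not code from Find_SSNs. Because the final digit acts as a check digit, exactly one of the ten possible last digits makes any given prefix valid, which is where the roughly 1-in-10 figure comes from.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk right to left, doubling every second digit; a doubled value
    # above 9 has 9 subtracted (same as summing its two digits).
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

The widely published test number 4111111111111111 passes this check, while changing its last digit to 2 makes it fail.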
Find_SSNs applies these three additional validations:
This reduces the 100,000 Luhn-validated numbers to approximately
25,000 numbers. Applying Bank Identification Number (BIN) or Issuer
Identification Number (IIN) validation would further reduce this,
although this may not be entirely possible as the American Bankers
Association (ABA) is rather protective of BINs. However, partial BIN
lists may be found online.
In our opinion, outside of these three additional validation steps
there are no other validation methods to further reduce false
positives when searching for credit card numbers in files. In the
case of credit card numbers, we had a 100% problem with an unknown
scope that Find_SSNs reduces to a 2.5% problem with a known
scope.
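The percentages quoted in this section follow directly from the counts given in the text. The short sketch below reproduces the arithmetic, using only those counts and a hypothetical helper named `share`.

```python
TOTAL = 1_000_000  # randomly generated numbers in each experiment


def share(remaining: int, total: int = TOTAL) -> float:
    """Percentage of the original pool that is still suspect."""
    return 100 * remaining / total


ssn_area_only = share(720_000)            # area-number filtering alone
ssn_area_group = share(445_000)           # area and group validation
ssn_with_deaths = 445_000 * 80 // 100     # a further 20% reduction
cc_luhn = share(100_000)                  # Luhn validation alone
cc_all_checks = share(25_000)             # Luhn plus the additional checks

print(ssn_area_only, ssn_area_group, ssn_with_deaths, cc_luhn, cc_all_checks)
# prints: 72.0 44.5 356000 10.0 2.5
```

The 44.5% figure is what the text rounds to "a 45% problem".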
* PDF (Adobe Portable Document Format) files are not searched by default, but users may enable this feature. Encrypted files cannot be
searched. Nested zip archives are not searched. By default, files
larger than 100 megabytes are not searched, but users may adjust this limit.
System files and multimedia files are not searched. Read the source code for a complete list of files that are not searched.
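The skip rules in this note could be expressed along the lines of the sketch below. Everything in it is an assumption made for illustration: the function and parameter names, the reading of "100 Megabytes" as 100 * 1024 * 1024 bytes, and the handful of extensions, which merely stand in for the full list in the source code.

```python
import os

MAX_BYTES = 100 * 1024 * 1024   # assumed meaning of "100 Megabytes"
# Hypothetical examples of multimedia and system file extensions.
SKIPPED_EXTENSIONS = {".mp3", ".avi", ".dll", ".sys"}


def should_search(path: str, max_bytes: int = MAX_BYTES,
                  search_pdf: bool = False) -> bool:
    """Decide whether a file should be searched, based on type and size."""
    ext = os.path.splitext(path)[1].lower()
    if ext in SKIPPED_EXTENSIONS:
        return False
    if ext == ".pdf" and not search_pdf:   # PDF search is off by default
        return False
    return os.path.getsize(path) <= max_bytes
```

Raising `max_bytes` or passing `search_pdf=True` corresponds to the two user-adjustable settings mentioned above.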