Publicly accessible content auditing

Discovery Exercise

Are you showing things to the world that you shouldn’t be? Simple search operators revealed the above example. Is this YOUR document? Should anyone on the Internet be able to download it? Make a #DataPrivacyWeek impact for your business with this simple discovery exercise.

Search engines are constantly crawling and indexing websites. They don’t just look at web pages (HTML files). They index a wide variety of file types, including PDFs, Word documents (.doc, .docx), spreadsheets (.xls, .xlsx), presentations (.ppt, .pptx), and more.

You can use some simple search operators to see what your own website is publishing. Visit your search engine of choice, and try the following operators:

site:name-of-your-website[.]com filetype:pdf

Anything interesting? What if instead of pdf, you try docx, doc, xlsx, pptx, rtf, and so on? Try other search engines as well.
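
If you would rather not type each variation by hand, here is a minimal sketch that prints one query per file type. The function name build_filetype_queries, the default extension list, and the example.com domain are illustrative assumptions, not part of any search engine's API. The output is plain text to paste into a search box.

```python
# Illustrative helper: builds one "site: filetype:" query per extension.
# The extension list mirrors the file types mentioned above.
DEFAULT_EXTENSIONS = ["pdf", "docx", "doc", "xlsx", "pptx", "rtf"]


def build_filetype_queries(domain, extensions=DEFAULT_EXTENSIONS):
    """Return a list of search queries, one per file extension."""
    return [f"site:{domain} filetype:{ext}" for ext in extensions]


if __name__ == "__main__":
    # example.com is a placeholder; substitute your own domain.
    for query in build_filetype_queries("example.com"):
        print(query)
```

Run it against your own domain and work through the list one query at a time.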

Found a TON of results? Use the intext: operator and/or the - (minus) operator to search for specific words or filter out certain words.
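
As a rough sketch of how those two operators combine with a base query, the hypothetical helper below appends intext: filters and minus-prefixed exclusions. The function name narrow_query and the sample terms are assumptions for illustration, and operator support varies between search engines.

```python
def narrow_query(base_query, include_terms=(), exclude_terms=()):
    """Append intext: filters and minus-prefixed exclusions to a query."""
    parts = [base_query]
    for term in include_terms:
        # Quote multi-word phrases so they are treated as a unit.
        text = f'"{term}"' if " " in term else term
        parts.append(f"intext:{text}")
    for term in exclude_terms:
        text = f'"{term}"' if " " in term else term
        parts.append(f"-{text}")
    return " ".join(parts)


if __name__ == "__main__":
    # Placeholder domain and generic terms only.
    print(narrow_query("site:example.com filetype:pdf",
                       include_terms=["confidential"],
                       exclude_terms=["brochure"]))
```

Keep the terms generic. Do not put anything sensitive into a query.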

Things to note:

1️⃣ – If you have never seen a defanged URL before, you have now! Fangs removed…as in, it can no longer bite! The square brackets prevent automatic conversion into an otherwise live URL by your browser, LinkedIn’s ‘Post’ function, or wherever the URL shows up and is processed.

2️⃣ – When you do your search, you will need to use your real website’s name, with no square brackets, no https://, and no www.

3️⃣ – This is not the end of the story. With only the above operators, you will be limiting your search to a specific website and a specific file type. Just because there are no results doesn’t mean you are squared away. There are tons of places where files you produce can exist. Which systems do you use? Do you have a common title or header?

4️⃣ – Additional operators can assist you in searching for specific things or excluding certain things. Get creative, but do not include anything sensitive in a search operator.

5️⃣ – This is #DataPrivacyWeek after all. Know that accessing systems that do not belong to you without permission may be a legal issue.

That said, want to get further in the weeds? Check out the Google Hacking Database (GHDB).
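
For notes 1️⃣ and 2️⃣ above, here is a small sketch of turning a defanged or full URL into the bare domain that the site: operator expects. The function name to_search_domain is hypothetical, and the cleanup rules (refang the brackets, drop the scheme, the path, and a leading www.) are simply one reading of those notes.

```python
import re


def to_search_domain(url):
    """Turn a (possibly defanged) URL into a bare domain for site: queries."""
    cleaned = url.strip().replace("[.]", ".")  # refang: [.] becomes .
    # Drop a leading scheme such as https://
    cleaned = re.sub(r"^[a-z][a-z0-9+.-]*://", "", cleaned, flags=re.IGNORECASE)
    cleaned = cleaned.split("/", 1)[0]  # drop any path after the host
    cleaned = re.sub(r"^www\.", "", cleaned, flags=re.IGNORECASE)
    return cleaned.lower()


if __name__ == "__main__":
    # The placeholder from the example query, written as a full URL.
    print(to_search_domain("https://www.name-of-your-website[.]com/"))
```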

Applicable Security Controls

Does any of this seem familiar? You might recognize the implications of publicly accessible content if you are familiar with NIST SP 800-171. Specifically, Access Control 3.1.22 (CMMC Practice AC.L2-3.1.22) states:

Control CUI posted or processed on publicly accessible systems.

With a bit of manipulation of search operators, it is entirely possible to find Controlled Unclassified Information (CUI) on publicly accessible systems.

Happy searching!