Following the release of Emerald Sage’s whitepaper exploring how OSINT can help manage risk for critical infrastructure, the OSINT Combine Training and Tradecraft team has been exploring hands-on techniques and tools that can be used by security teams working in the sector to improve the security posture of their organisations.
In our last blog, we looked at how OSINT tradecraft can help with monitoring and vetting personnel and organisational footprints online. In this blog, we will continue to explore how open-source information and OSINT tradecraft can help critical infrastructure providers monitor and maintain awareness of misuse of cyber assets.
Searching Code Repositories
Code repositories are storage spaces for source code and related files. They are used by software developers for version control, collaboration with other developers, and code management. While organisations should always ensure that sensitive or proprietary code is stored safely in private, access-controlled repositories, many organisations rely on and contribute to open-source projects. Sensitive code can also be copied, stolen or misused by employees, or by bad actors who have gained access to project source code (for example, via Github access tokens).
In June 2024, it was reported that source code belonging to the New York Times had been leaked on the 4chan imageboard, after an exposed Github token had allegedly provided access to repositories.
Reporting also suggests that a vast amount of sensitive credential and authentication information continues to be accidentally leaked in code stored on Github, and there have been earlier warnings for IT security teams to become more aware of, and proactive about, Github security risks.
Searching code repositories can assist in identifying organisational misuse of cyber assets and uncover sensitive information that has been accidentally or maliciously published in code. This might include passwords and password hashes, API keys, private encryption keys, and information about security software and systems.
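As a rough illustration of what such an audit looks for, the Python sketch below scans a block of text for a few well-known secret formats. The rule names, the sample text and the `find_secrets` helper are our own assumptions for illustration; dedicated secret-scanning tools use far larger rule sets and will catch much more.

```python
import re

# Illustrative rules only (assumed for this sketch, not an exhaustive set).
SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 upper-case letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Classic GitHub personal access tokens: "ghp_" plus 36 alphanumerics.
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    # PEM private key headers, e.g. "-----BEGIN RSA PRIVATE KEY-----".
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    # Hard-coded passwords such as: password = "something"
    "password_assignment": re.compile(
        r"\b(?:password|passwd|pwd)\b\s*[:=]\s*['\"][^'\"]{4,}['\"]",
        re.IGNORECASE,
    ),
}


def find_secrets(text):
    """Return (line_number, rule_name) for every rule that matches a line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


# Hypothetical file contents; the AWS value is a documentation-style dummy.
sample = "\n".join([
    "db_host = 'db.widget.com'",
    "password = 'hunter2-example'",
    "aws_key = 'AKIAIOSFODNN7EXAMPLE'",
    "-----BEGIN RSA PRIVATE KEY-----",
])

for line_number, rule_name in find_secrets(sample):
    print(f"line {line_number}: possible {rule_name}")
```

Pattern matching like this produces false positives, so any hit should be treated as a prompt for manual review rather than proof of a leak.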
Code search engines can help organisations conduct regular audits of their online assets and information. Github is the best-known code hosting platform, and it includes advanced search fields that can be used to target keywords and pieces of code, authors, locations and languages.
Github Advanced Search: https://github.com/search/advanced
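For teams that want to repeat the same searches regularly, the search URL can be generated rather than typed. The sketch below is a minimal example that combines keywords with the `org:`, `language:` and `path:` search qualifiers; the helper name `github_code_search_url` and the example values are our own assumptions, and Github may require you to be signed in before it shows code results.

```python
from urllib.parse import urlencode


def github_code_search_url(terms, org=None, language=None, path=None):
    """Build a Github code search URL from keywords and optional qualifiers."""
    # Quote multi-word terms so they are searched as an exact phrase.
    parts = [f'"{term}"' if " " in term else term for term in terms]
    if org:
        parts.append(f"org:{org}")
    if language:
        parts.append(f"language:{language}")
    if path:
        parts.append(f"path:{path}")
    query = " ".join(parts)
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


# Hypothetical example: Python files mentioning the domain and "password".
print(github_code_search_url(["widget.com", "password"], language="python"))
```

Opening the printed URL in a browser runs the search, which makes it easy to keep a saved list of queries for periodic review.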
Searchcode (https://searchcode.com) has one of the largest indexes of public code on the web. It searches across multiple sites and repositories and supports advanced search syntax as well.
Conducting domain and keyword searches can identify mentions of an organisation in indexed code—perhaps code snippets, author information, or references to connected websites and email addresses.
Even conducting cursory searches to better understand the information available online about your organisation’s infrastructure and software is a proactive approach to maintaining security.
Searching Cloud Storage Drives
Employees and contractors may also use cloud storage drives to upload and share information with others. This might include source code, and documents that include specific information about technology and infrastructure. Publicly accessible cloud drives are indexed by search engines, but because the files are not stored on an organisational domain, users may assume that they won’t be found.
Use the site: operator to target the domains used by Azure, AWS and Google drives (see an example of this in the image below).
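To make the site: approach concrete, the sketch below generates one search string per cloud storage service for a given keyword. The hostnames are ones commonly associated with these services and are our assumption of a reasonable starting list, not an exhaustive inventory; the `cloud_storage_queries` helper is hypothetical.

```python
# Hostnames commonly used by public cloud storage services (assumed list).
CLOUD_STORAGE_HOSTS = {
    "Azure Blob Storage": "blob.core.windows.net",
    "AWS S3": "s3.amazonaws.com",
    "Google Drive": "drive.google.com",
    "Google Cloud Storage": "storage.googleapis.com",
}


def cloud_storage_queries(keyword):
    """Return (service, search string) pairs using the site: operator."""
    return [
        (service, f'site:{host} "{keyword}"')
        for service, host in CLOUD_STORAGE_HOSTS.items()
    ]


# Hypothetical company phrase used as the search keyword.
for service, query in cloud_storage_queries("Widget Confidential"):
    print(f"{service}: {query}")
```

Each printed string can be pasted directly into a search engine. Quoting the keyword keeps multi-word phrases together.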
Developing a Keyword List
To monitor public code repositories, cloud storage drives and paste sites for sensitive information, security teams can develop a list of organisation and technology-specific keywords for searching. Examples might include:
Domain and email names, e.g. ‘widget.com’ and ‘@widget.com’.
Sensitive keywords, e.g. ‘password’, ‘apiKey’, ‘credentials’, ‘secret’, ‘ssh_key’.
Configuration files, e.g. ‘config’, ‘settings.py’.
Company-specific keywords and phrases, e.g. ‘Widget Internal’, ‘Widget Confidential’, ‘Commercial In Confidence’.
Combining sensitive terms with company names or domains and searching across code repositories and cloud storage drives can identify where sensitive information has been leaked. A security team with a thorough understanding of an organisation’s business, terminology (including specific acronyms), and systems will be better able to develop targeted searches for exposed information.
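The combination step described above is easy to automate. The following sketch pairs every organisation term with every sensitive term to produce a list of search strings; the term lists reuse the ‘widget’ examples from this blog, and the helper names are our own assumptions.

```python
from itertools import product

# Example terms taken from the keyword list above; replace with your own.
ORG_TERMS = ["widget.com", "@widget.com", "Widget Internal"]
SENSITIVE_TERMS = ["password", "apiKey", "credentials", "secret", "ssh_key"]


def quote_if_needed(term):
    """Wrap multi-word terms in quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_queries(org_terms, sensitive_terms):
    """Pair every organisation term with every sensitive term."""
    return [
        f"{quote_if_needed(org)} {quote_if_needed(sensitive)}"
        for org, sensitive in product(org_terms, sensitive_terms)
    ]


queries = build_queries(ORG_TERMS, SENSITIVE_TERMS)
print(len(queries), "queries generated")
for query in queries[:3]:
    print(query)
```

Three organisation terms and five sensitive terms give fifteen queries, so it is worth keeping the lists focused; a long list of generic terms quickly becomes unmanageable to review.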
Exploit DB’s Google Hacking Database is an excellent resource for understanding how attackers (and defensive security teams) can use targeted search strings to identify sensitive information with Google. It contains user-contributed examples of search strings targeting many kinds of exposed information. Below, you can see some examples of search strings (or Google ‘dorks’) to identify password-related phrases:
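As an illustration of how such strings are put together, the sketch below assembles dorks from the standard Google operators site:, filetype:, inurl: and intext:. The three example strings are hypothetical ones written for this blog's ‘widget.com’ domain, not entries copied from the Google Hacking Database, and `build_dork` is our own helper.

```python
def build_dork(site=None, filetype=None, inurl=None, intext=None):
    """Assemble a Google search string from common search operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if intext:
        parts.append(f'intext:"{intext}"')
    return " ".join(parts)


# Hypothetical password-related searches scoped to your own domain.
examples = [
    build_dork(site="widget.com", filetype="env", intext="DB_PASSWORD"),
    build_dork(site="widget.com", filetype="log", intext="password"),
    build_dork(site="widget.com", inurl="config", intext="apiKey"),
]

for dork in examples:
    print(dork)
```

Scoping each search to a domain you are responsible for keeps the results relevant to your own audit.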
Dark Web Monitoring
Security teams may also want to monitor dark web forums and breach data sharing sites to identify when exposed information is requested, discussed or offered for sale. Breach data forums, where users discuss and share leaked information, can provide information about the scale, recency and severity of leaks. Access can be difficult, however. Sites such as BreachForums have been subject to law enforcement seizures, and to view data, users must register with a whitelisted Dark Web email address, which may be prohibitive for some teams and organisations. Once registered, though, users can search for keywords across the forum posts, and view discussions.
Monitoring Dark Web hidden services and breach data forums requires careful management of attribution and systems, so always keep your operational security in mind when conducting investigations of malicious cyber actors. Security teams might consider a tool like NexusXplore, which provides a ready-made, non-attributable environment for accessing the Dark Web.
For more information about accessing Tor hidden services safely, see our previous blog on setting up a virtual environment for Dark Web access.
Summary
This blog has explored some of the tools and tradecraft for identifying where sensitive information and source code may have been revealed online. Security teams can use code search engines to conduct keyword searches across repositories to identify leaked credentials, auth tokens, configuration files, and more. Some teams may also conduct monitoring of Dark Web and breach data forums to identify leaked information about cyber assets.
As always, it is important to remember that any of these OSINT techniques can be used by nefarious actors, and so it is imperative for organisations to incorporate open-source information monitoring proactively as well as in response to incidents.
To support your OSINT collection and analytical capability uplift, contact us at training@osintcombine.com to learn about our off-the-shelf and bespoke training offerings.