Public Repository Discovery

Public Repository Discovery is a sub-technique of Public Source Code and Artifacts Analysis in the Reconnaissance phase, in which threat actors search for and identify publicly accessible code repositories associated with a target organization. Adversaries systematically locate repositories on platforms such as GitHub, GitLab, Bitbucket, and other code hosting services to gather intelligence about the target's development practices, technology stack, and potential security vulnerabilities. These repositories often contain sensitive information, including API keys, credentials, internal endpoints, proprietary algorithms, and architectural details, that can be leveraged in subsequent attack phases. Sophisticated attackers use automated tools to identify repositories not only by organization name but also by examining contributor accounts and commit histories and by searching for code patterns that indicate a connection to the target. Because this reconnaissance relies on legitimately public interfaces, it leaves minimal footprints and is difficult to detect. The information gathered enables adversaries to develop more targeted and effective attacks against the organization's digital assets.
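Because exposed credentials are among the most valuable findings in a public repository, defenders can run the same kind of pattern search against their own code before an adversary does. The following is a minimal sketch under stated assumptions: the two regular expressions (the documented AWS access key ID shape and a generic quoted key or password assignment) are illustrative only, and the names `SECRET_PATTERNS` and `find_secret_candidates` are hypothetical. Real scanners use far larger rule sets and additional checks.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not a complete rule set).
SECRET_PATTERNS = {
    # Access key IDs beginning with AKIA or ASIA followed by 16 uppercase/digit characters.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # A quoted value of 8+ characters assigned to a suspiciously named variable.
    "generic_assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"]([^'\"]{8,})['\"]"
    ),
}


def find_secret_candidates(text):
    """Return (line_number, rule_name) pairs for lines matching any pattern."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
    print(find_secret_candidates(sample))  # [(2, 'aws_access_key_id')]
```

Matches are candidates for human review, not confirmed leaks; pattern-only matching produces both false positives and misses.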

Procedure Examples

ID	Name	Description
AC-0001	ByBit $1.5B Crypto Heist	Discovery of Safe{Wallet} developer accounts and organizational repositories to understand development processes and identify potential social engineering targets with AWS access.

Mitigations

ID	Mitigation	Description
M1018	User Account Management	Implement access controls and monitoring for organizational repositories.
M1021	Restrict Web-Based Content	Limit public access to sensitive repositories and implement private repository policies.
M1017	User Training	Train developers on repository security and the risks of exposing organizational information.
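
Limiting public access presupposes a current inventory of which repositories are public. The sketch below assumes repository metadata has already been exported as a list of dictionaries; the `name` and `private` keys mirror the repository objects returned by GitHub's REST API, while the allowlist and the function name `unexpected_public_repos` are hypothetical.

```python
def unexpected_public_repos(repos, approved_public):
    """Return sorted names of repos that are public but not on the allowlist."""
    approved = {name.lower() for name in approved_public}
    flagged = []
    for repo in repos:
        # Treat a missing "private" flag as public so that gaps fail toward review.
        is_public = not repo.get("private", False)
        if is_public and repo["name"].lower() not in approved:
            flagged.append(repo["name"])
    return sorted(flagged)


if __name__ == "__main__":
    inventory = [
        {"name": "website", "private": False},
        {"name": "internal-tools", "private": False},
        {"name": "payments", "private": True},
    ]
    print(unexpected_public_repos(inventory, approved_public=["website"]))
```

Running a check like this on a schedule turns the private repository policy into something that can be verified rather than assumed.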

Detection

Most Public Repository Discovery takes place entirely on public code-hosting platforms or via unauthenticated Internet search, putting the activity outside the defender’s collection boundary. Because adversaries enumerate repositories with standard web interfaces and unauthenticated API calls, there is virtually no host- or network-level telemetry inside the target enterprise that exposes the reconnaissance in real time. Visibility is usually achieved only through:

- Platform-native audit logs (e.g., GitHub/GitLab telemetry) when the organisation owns the repository and can review access events.
- Third-party threat-intelligence feeds that report large-scale enumeration of organisation-specific terms or leak-hunting tool fingerprints.
- Downstream effects, such as phishing or tailored exploits that reference information harvested from the public code.
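
Where platform audit logs are available, one simple review is to look for a single actor touching many distinct repositories. The sketch below is a hedged illustration: it assumes events have been exported as dictionaries with `actor`, `action`, and `repo` keys, and that clone and fetch activity appears under the action names `git.clone` and `git.fetch`. The schema, the threshold, and the function name are assumptions and should be adjusted to whatever the platform actually emits. Anonymous access to public repositories generally will not appear in such logs at all.

```python
from collections import defaultdict


def actors_enumerating_repos(events, min_distinct_repos=3,
                             actions=("git.clone", "git.fetch")):
    """Map each actor to the number of distinct repos accessed, if at or above the threshold."""
    repos_by_actor = defaultdict(set)
    for event in events:
        # Only count the access-type actions of interest (assumed names).
        if event.get("action") in actions:
            repos_by_actor[event.get("actor", "unknown")].add(event.get("repo"))
    return {
        actor: len(repos)
        for actor, repos in repos_by_actor.items()
        if len(repos) >= min_distinct_repos
    }


if __name__ == "__main__":
    sample_events = [
        {"actor": "alice", "action": "git.clone", "repo": "org/a"},
        {"actor": "bob", "action": "git.clone", "repo": "org/a"},
        {"actor": "bob", "action": "git.clone", "repo": "org/b"},
        {"actor": "bob", "action": "git.fetch", "repo": "org/c"},
    ]
    print(actors_enumerating_repos(sample_events))  # {'bob': 3}
```

A fixed threshold is a deliberately crude choice; build systems and mirrors will exceed it routinely and would need to be excluded or baselined.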