Package Manifest Scraping

Package Manifest Scraping is a reconnaissance technique in which attackers analyze application dependency files (such as package.json, requirements.txt, pom.xml, Gemfile, or composer.json) to identify software components, their versions, and potential vulnerabilities. During the Application Dependencies Mapping phase of reconnaissance, attackers extract these manifests from accessible repositories, websites, or exposed configuration files to build a detailed picture of the target application's technology stack. From these files, adversaries can pinpoint outdated libraries with known vulnerabilities (published CVEs), determine framework versions that may contain exploitable flaws, and identify dependencies that could be targeted in supply chain attacks. The technique requires little or no interaction with the target system and often relies on publicly available information. This makes it difficult to detect, while still giving attackers valuable input for planning later phases of their campaign.
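The following is a minimal Python sketch of the inventory step described above. It assumes only two manifest formats: it reads dependency names and version specs from a package.json, and exact `==` pins from a requirements.txt. It then matches them against an advisory table. The function names and the `advisories` mapping are hypothetical placeholders, not a real vulnerability feed. Defenders can run the same kind of parsing, as part of software composition analysis, to see what their own manifests reveal.

```python
import json
import re

# Matches exact pins such as "example-lib==1.2.3". Assumption: ranges,
# extras, environment markers, and "-r" includes are ignored here.
_REQ_PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def parse_package_json(text):
    """Return {name: version_spec} from an npm package.json document."""
    data = json.loads(text)
    deps = {}
    for section in ("dependencies", "devDependencies"):
        deps.update(data.get(section, {}))
    return deps


def parse_requirements_txt(text):
    """Return {lowercased_name: version} for exact '==' pins."""
    deps = {}
    for line in text.splitlines():
        match = _REQ_PIN.match(line)
        if match:
            deps[match.group(1).lower()] = match.group(2)
    return deps


def flag_known_vulnerable(deps, advisories):
    """Return sorted (name, version) pairs whose version string appears in a
    hypothetical advisory table mapping package name -> affected versions."""
    return sorted(
        (name, version)
        for name, version in deps.items()
        if version in advisories.get(name, set())
    )


if __name__ == "__main__":
    manifest = '{"dependencies": {"example-lib": "1.0.0"}, "devDependencies": {"test-kit": "^2.0.0"}}'
    requirements = "Sample-Utils==0.9.1\nother-pkg>=2.0\n"
    # Placeholder advisory data for illustration only.
    advisories = {"example-lib": {"1.0.0"}, "sample-utils": {"0.9.1"}}
    print(flag_known_vulnerable(parse_package_json(manifest), advisories))
    print(flag_known_vulnerable(parse_requirements_txt(requirements), advisories))
```

This sketch compares version ranges such as `^2.0.0` literally. A fuller tool would first resolve ranges against a lockfile or the registry, and only then match them against advisories.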

Data Sources

Public Repositories: Dependency manifests in GitHub, GitLab, and other version control systems
Package Registries: Package metadata from npm, PyPI, Maven Central, and other registries
API Logs: Access logs from repository and package registry APIs
Network Traffic: HTTP requests to package management endpoints

Mitigations

ID	Mitigation	Description
M1013	Application Developer Guidance	Limit sensitive information in publicly accessible dependency manifests
M1021	Restrict Web-Based Content	Implement access controls on repositories containing sensitive dependency information
M1017	User Training	Train developers on risks of exposing detailed dependency information publicly
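One way to apply M1013 is a pre-publish check on package.json. The sketch below flags two kinds of dependency. The first is a dependency whose name reveals a private scope. The second is one whose version field points at a git, HTTP, SSH, or file location instead of a version. It assumes an internal npm scope (`@acme-internal/`) and internal host suffixes, both of which are hypothetical placeholders to be replaced with real values.

```python
import json

# Hypothetical values for illustration: replace with the organisation's
# actual internal npm scope(s) and internal host suffixes.
INTERNAL_SCOPES = ("@acme-internal/",)
INTERNAL_HOST_HINTS = (".corp.example.com", ".internal")
# Version-spec prefixes that embed a location rather than a version number.
LOCATION_PREFIXES = ("git+", "git:", "http:", "https:", "file:", "ssh:")


def audit_manifest(text):
    """Return sorted (dependency, reason) findings for a package.json document."""
    data = json.loads(text)
    findings = []
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        for name, spec in data.get(section, {}).items():
            # A private scope name in a public manifest discloses internal packages.
            if name.startswith(INTERNAL_SCOPES):
                findings.append((name, "internal scope"))
            # Location-style specs can expose internal hosts or directory layout.
            if spec.startswith(LOCATION_PREFIXES):
                reason = "location reference"
                if any(hint in spec for hint in INTERNAL_HOST_HINTS):
                    reason = "internal host reference"
                findings.append((name, reason))
    return sorted(findings)
```

Running such a check in CI before publishing or open-sourcing a repository keeps its output tied to the manifest that actually ships.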

Detection

Package manifest scraping usually occurs on public SCM platforms or registry APIs that an organisation does not control, so real-time detection from enterprise telemetry is limited.

Potential visibility avenues include:

Hosted-platform audit logs (GitHub, GitLab, Bitbucket) that show an unusual volume of clones or raw-file downloads, where the repository is under the organisation’s control.
External threat intelligence that flags large-scale scraping of project manifests associated with the organisation’s brand.
Post-reconnaissance indicators such as phishing or exploit payloads referencing library versions harvested from the manifests.
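For repositories the organisation controls, the first avenue above can be approximated offline against an exported access or audit log. This sketch assumes a hypothetical record format of `(client_id, request_path)` pairs and an arbitrary threshold. It counts the distinct manifest paths fetched per client and reports the clients that reach the threshold.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Manifest filenames named in this technique description.
MANIFEST_NAMES = frozenset(
    {"package.json", "requirements.txt", "pom.xml", "Gemfile", "composer.json"}
)


def flag_manifest_scrapers(records, threshold=20):
    """Return {client_id: distinct_manifest_count} for clients at or above
    `threshold` (an arbitrary default; tune it to the observed baseline).

    `records` is an iterable of (client_id, request_path) pairs, a
    hypothetical shape for an exported access or audit log."""
    seen = defaultdict(set)
    for client, path in records:
        clean = path.split("?", 1)[0]  # drop any query string
        if PurePosixPath(clean).name in MANIFEST_NAMES:
            seen[client].add(clean)
    return {
        client: len(paths)
        for client, paths in seen.items()
        if len(paths) >= threshold
    }
```

The sketch counts distinct paths rather than raw requests. This keeps repeated polling of a single manifest, for example by a CI job, from inflating a client's count.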