bug(sbom): Duplicate SBOM packages for multi-module pom.xml files #7824

Open
DmitriyLewen opened this issue Oct 30, 2024 Discussed in #7795 · 7 comments · May be fixed by #7879

@DmitriyLewen
Contributor

Description

mvn handles modules separately.
Trivy uses the same logic:

// Modules should be handled separately so that they can have independent dependencies.
// It means multi-module allows for duplicate dependencies.

But the SPDX format doesn't allow duplicate SPDXIDs - https://spdx.github.io/spdx-spec/v2.3/package-information/#72-package-spdx-identifier-field

Same for CycloneDX - https://cyclonedx.org/docs/1.6/json/#components

Solutions

  1. We will add a workspace relationship for Maven modules (see bug(java): dependOn contains extra dependencies for pom.xml files with modules when using SBOM formats #7802). After these changes, Trivy will use rootPkg -> workspace -> directDeps -> IndirectDeps logic.
    This logic differs from the mvn logic, so we may want to remove duplicates in the parser.
  2. We will remove duplicates when converting the Report into a BOM.

Example

Test project:

➜  cat pom.xml 
    <groupId>com.example</groupId>
    <artifactId>root</artifactId>
    <version>1.0.0</version>

    <modules>
        <module>module1</module>
        <module>module2</module>
    </modules>

➜  cat module1/pom.xml 
    <groupId>com.example</groupId>
    <artifactId>module1</artifactId>
    <version>1.0.0</version>

    <dependencies>
        <dependency>
            <groupId>org.example</groupId>
            <artifactId>example-api</artifactId>
            <version>1.1.1</version>
        </dependency>
    </dependencies>

➜  cat module2/pom.xml
    <groupId>com.example</groupId>
    <artifactId>module2</artifactId>
    <version>2.0.0</version>

    <dependencies>
        <dependency>
            <groupId>org.example</groupId>
            <artifactId>example-api</artifactId>
            <version>1.1.1</version>
        </dependency>
    </dependencies>

mvn output:

➜  mvn dependency:tree
[INFO] ------------------------< com.example:module1 >-------------------------
[INFO] Building module1 1.0.0                                             [1/3]
[INFO]   from module1/pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[WARNING] The POM for org.example:example-api:jar:1.1.1 is missing, no dependency information available
[INFO] 
[INFO] --- dependency:3.7.0:tree (default-cli) @ module1 ---
[INFO] com.example:module1:jar:1.0.0
[INFO] \- org.example:example-api:jar:1.1.1:compile
[INFO] 
[INFO] ------------------------< com.example:module2 >-------------------------
[INFO] Building module2 2.0.0                                             [2/3]
[INFO]   from module2/pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- dependency:3.7.0:tree (default-cli) @ module2 ---
[INFO] com.example:module2:jar:2.0.0
[INFO] \- org.example:example-api:jar:1.1.1:compile
[INFO] 
[INFO] --------------------------< com.example:root >--------------------------
[INFO] Building root 1.0.0                                                [3/3]
[INFO]   from pom.xml
[INFO] --------------------------------[ pom ]---------------------------------
[INFO] 
[INFO] --- dependency:3.7.0:tree (default-cli) @ root ---
[INFO] com.example:root:pom:1.0.0
[INFO] ------------------------------------------------------------------------

trivy outputs:

➜  trivy -q fs ./pom.xml -f json --list-all-pkgs | grep ID
...
          "ID": "org.example:example-api:1.1.1",
            "UID": "e574f6e703187373"
          "ID": "org.example:example-api:1.1.1",
            "UID": "e574f6e703187373"

➜  trivy -q fs ./pom.xml -f spdx-json | grep SPDXID -B 1
...
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-a9813b377fc4bc80",
--
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-a9813b377fc4bc80",
...
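
The duplicated SPDXID above can be caught mechanically. Here is a minimal sketch of such a uniqueness check; the pkg type and duplicateSPDXIDs helper are hypothetical and are not part of Trivy, and the root SPDXID is a made-up placeholder:

```go
package main

import "fmt"

// pkg is a hypothetical, minimal stand-in for an SPDX package entry.
type pkg struct {
	Name   string
	SPDXID string
}

// duplicateSPDXIDs returns every SPDXID that occurs more than once,
// in the order in which the second occurrence is seen.
func duplicateSPDXIDs(pkgs []pkg) []string {
	seen := make(map[string]int)
	var dups []string
	for _, p := range pkgs {
		seen[p.SPDXID]++
		if seen[p.SPDXID] == 2 { // report each duplicated ID only once
			dups = append(dups, p.SPDXID)
		}
	}
	return dups
}

func main() {
	pkgs := []pkg{
		// Placeholder ID for the root package (assumption, not real output).
		{Name: "com.example:root", SPDXID: "SPDXRef-Package-root"},
		// The two entries from the spdx-json output above.
		{Name: "org.example:example-api", SPDXID: "SPDXRef-Package-a9813b377fc4bc80"},
		{Name: "org.example:example-api", SPDXID: "SPDXRef-Package-a9813b377fc4bc80"},
	}
	fmt.Println(duplicateSPDXIDs(pkgs)) // prints [SPDXRef-Package-a9813b377fc4bc80]
}
```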

Discussed in #7795

@DmitriyLewen DmitriyLewen added the kind/bug Categorizes issue or PR as related to a bug. label Oct 30, 2024
@DmitriyLewen DmitriyLewen self-assigned this Oct 30, 2024
@DmitriyLewen
Contributor Author

I am not sure if Trivy reports should contain duplicates.
That's why I voted for the 1st solution.

@knqyf263 wdyt? You added this logic, maybe I missed something.

@DmitriyLewen DmitriyLewen added this to the v0.58.0 milestone Oct 30, 2024
@knqyf263
Collaborator

knqyf263 commented Oct 30, 2024

Even if the component has the same name and version, the dependency of the component could be different.
#6694 (reply in thread)

graph LR;
  pomRoot(com.example:root v1.0.0)
  mod1(com.example:module1 v1.0.0)
  mod2(com.example:module2 v2.0.0)
  pomC(org.example:example-api v1.1.1)
  pomE(POM E v2.0.0)

  pomRoot-->mod1
  pomRoot-->mod2
  mod1-->pomC
  pomC-->pomE

  pomC'(org.example:example-api v1.1.1)
  pomD'(POM D v1.0.0)
  pomE'(POM E v2.1.0)

  mod2-->pomC'
  mod2-->pomD'
  pomC'-->pomE'
  pomD'-->pomE'

org.example:example-api:v1.1.1 looks identical, but the child dependency can be different for various reasons (e.g. dependencyManagement). Therefore, I'd say they are really really similar, but different components.
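
To illustrate the point, here is a rough sketch in which the identity of a component covers its direct children as well as its name and version. The component type, the fingerprint helper, the use of FNV hashing and the pom-e IDs are assumptions made for this example; this is not how Trivy actually computes UIDs or SPDXIDs.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// component is a hypothetical, simplified package node.
type component struct {
	ID        string   // name:version
	DependsOn []string // IDs of direct children
}

// fingerprint hashes the component ID together with its sorted child IDs,
// so two components with the same name and version but different children
// get different fingerprints.
func fingerprint(c component) string {
	children := append([]string(nil), c.DependsOn...) // copy before sorting
	sort.Strings(children)

	h := fnv.New64a()
	h.Write([]byte(c.ID))
	h.Write([]byte{0}) // separator between the ID and the child list
	h.Write([]byte(strings.Join(children, ",")))
	return fmt.Sprintf("%016x", h.Sum64())
}

func main() {
	// example-api as resolved for module1 and for module2 in the graph above.
	inModule1 := component{ID: "org.example:example-api:1.1.1", DependsOn: []string{"pom-e:2.0.0"}}
	inModule2 := component{ID: "org.example:example-api:1.1.1", DependsOn: []string{"pom-e:2.1.0"}}

	fmt.Println(inModule1.ID == inModule2.ID)                   // prints true
	fmt.Println(fingerprint(inModule1) == fingerprint(inModule2)) // prints false
}
```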

@DmitriyLewen
Contributor Author

DmitriyLewen commented Oct 30, 2024

Hmm... you're right. I missed that.
I'll take a look and update our logic for creating SPDXIDs.

@DmitriyLewen
Contributor Author

I updated the logic for SPDXIDs (#7837).
It removes the duplicate IDs (compare the output of trivy with the patched ./trivy):

➜  trivy -q fs ./pom.xml -f spdx-json | grep '"org.example:example-api"' -A 1 
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-a9813b377fc4bc80",
--
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-a9813b377fc4bc80",
➜  ./trivy -q fs ./pom.xml -f spdx-json | grep '"org.example:example-api"' -A 1
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-a5527f408fa64d61",
--
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-b1e7f5814081cb0e",

But I found another problem: we can't correctly choose the child component.

There are 2 components with the same pkgID, so when we parse dependsOn, we take the first component found for all components:

{
    ...
    {
      "name": "com.example:root",
      "SPDXID": "SPDXRef-Package-beb5534e91f2fc01",
      "versionInfo": "1.0.0",
     ...
    },
    {
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-36a9eeebfbc737b0",
      "versionInfo": "1.1.1",
      ...
    },
    {
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-7d0aa2ced54119b2",
      "versionInfo": "1.1.1",
      ...
    },
    ...
  ],
  "relationships": [
    ...
    {
      "spdxElementId": "SPDXRef-Package-beb5534e91f2fc01",
      "relatedSpdxElement": "SPDXRef-Package-7d0aa2ced54119b2",
      "relationshipType": "DEPENDS_ON"
    },
    {
      "spdxElementId": "SPDXRef-Package-beb5534e91f2fc01",
      "relatedSpdxElement": "SPDXRef-Package-7d0aa2ced54119b2",
      "relationshipType": "DEPENDS_ON"
    },
    ...
  ]
}         
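
The effect can be reproduced with a small sketch. It assumes that children are looked up by pkgID alone and that the first match wins; the types and the resolveByPkgID helper are hypothetical, the parent IDs are placeholders, and the real Trivy code may pick a different one of the two components.

```go
package main

import "fmt"

// component is a hypothetical SBOM component with a non-unique pkgID.
type component struct {
	PkgID  string
	SPDXID string
}

// dep says that a parent (by SPDXID) depends on a child known only by pkgID.
type dep struct {
	ParentSPDXID string
	ChildPkgID   string
}

type edge struct{ From, To string }

// resolveByPkgID resolves children through an index keyed by pkgID only.
// With duplicate pkgIDs, every parent ends up pointing at the same component.
func resolveByPkgID(comps []component, deps []dep) []edge {
	index := make(map[string]string)
	for _, c := range comps {
		if _, ok := index[c.PkgID]; !ok {
			index[c.PkgID] = c.SPDXID // first match wins, later duplicates are ignored
		}
	}
	edges := make([]edge, 0, len(deps))
	for _, d := range deps {
		if id, ok := index[d.ChildPkgID]; ok {
			edges = append(edges, edge{From: d.ParentSPDXID, To: id})
		}
	}
	return edges
}

func main() {
	comps := []component{
		{PkgID: "org.example:example-api:1.1.1", SPDXID: "SPDXRef-Package-36a9eeebfbc737b0"},
		{PkgID: "org.example:example-api:1.1.1", SPDXID: "SPDXRef-Package-7d0aa2ced54119b2"},
	}
	deps := []dep{
		{ParentSPDXID: "SPDXRef-Package-module1", ChildPkgID: "org.example:example-api:1.1.1"},
		{ParentSPDXID: "SPDXRef-Package-module2", ChildPkgID: "org.example:example-api:1.1.1"},
	}
	// Both edges point at the same example-api component.
	for _, e := range resolveByPkgID(comps, deps) {
		fmt.Println(e.From, "DEPENDS_ON", e.To)
	}
}
```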

I thought about it and have some ideas:

  1. We could use UID for child dependencies (the dependsOn slice). But in this case we need to build the UID in each parser...
    We also need to add a new field to distinguish the same pkgIDs from different modules (e.g. add the module name in this map).
  2. Use a separate Result for each Maven module. This is similar to the mvn logic, and in this case a result will not contain duplicates. But we would need to wrap our pom.xml parser (to return []ftypes.Package and []ftypes.Dependency for each module).
  3. Add info about the root module name to Package. We will use this field to build the UID, find module/workspace dependencies, etc.
    3.1. We can add a module field to the root of Package (next to Dev, Arch, etc.).
    3.2. We will add a workspace relationship (see bug(java): dependOn contains extra dependencies for pom.xml files with modules when using SBOM formats #7802). We can expand the relationship field so that it includes the relationship + the related/root element, e.g.:
    • RelationshipRoot, ""
    • RelationshipWorkspace, "root/pom.xml" // this module comes from the root pom.xml
    • RelationshipWorkspace, "module1" (or RelationshipWorkspace, "module1<separator>root/pom.xml") // nested modules (root->module1->module2). This is the case for module2.
    • RelationshipDirect, "module1" // dependency comes from module1
    • RelationshipIndirect, "" // dependency from the root pom.xml
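
As a rough illustration of idea 3, the lookup could be keyed by module name plus pkgID instead of pkgID alone. The pkgKey type, its Module field and the buildIndex helper are hypothetical names for this sketch, not existing Trivy types:

```go
package main

import "fmt"

// pkgKey identifies a package by the module it was resolved for plus its pkgID.
// Module is a hypothetical field, as proposed in idea 3.1.
type pkgKey struct {
	Module string
	PkgID  string
}

type component struct {
	Key    pkgKey
	SPDXID string
}

// buildIndex maps each (module, pkgID) pair to its SPDXID and fails
// if the composite key is still not unique.
func buildIndex(comps []component) (map[pkgKey]string, error) {
	index := make(map[pkgKey]string, len(comps))
	for _, c := range comps {
		if prev, ok := index[c.Key]; ok {
			return nil, fmt.Errorf("duplicate key %+v: %s and %s", c.Key, prev, c.SPDXID)
		}
		index[c.Key] = c.SPDXID
	}
	return index, nil
}

func main() {
	const id = "org.example:example-api:1.1.1"
	comps := []component{
		{Key: pkgKey{Module: "module1", PkgID: id}, SPDXID: "SPDXRef-Package-36a9eeebfbc737b0"},
		{Key: pkgKey{Module: "module2", PkgID: id}, SPDXID: "SPDXRef-Package-7d0aa2ced54119b2"},
	}
	index, err := buildIndex(comps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Each module now resolves its own copy of example-api.
	fmt.Println(index[pkgKey{Module: "module1", PkgID: id}]) // prints SPDXRef-Package-36a9eeebfbc737b0
	fmt.Println(index[pkgKey{Module: "module2", PkgID: id}]) // prints SPDXRef-Package-7d0aa2ced54119b2
}
```

The trade-off is the one mentioned in the list: every parser that can produce duplicates would have to fill in the module name.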

@knqyf263 Can you take a look? Perhaps you will be able to see another way.

@knqyf263
Collaborator

knqyf263 commented Nov 5, 2024

The current package ID (name@version) was implemented based on the assumption that the identical packages don't exist in the same application. If that's not the case, we need to use another ID. Actually, we already faced that when implementing Julia and used UUID.

pkgID := manifestDep.UUID
pkg := ftypes.Package{
    ID:      pkgID,
    Name:    name,
    Version: version,
}

So, can we use UUID or something like that only in Maven? We don't have to re-implement all parsers.

@DmitriyLewen
Contributor Author

> We don't have to re-implement all parsers.

We might need to add similar logic for npm and cargo (we talked about adding a workspace field to the relationship), but I'm not sure if there could be duplicates for them.

But in general you are right. We can use UUID only for specific parsers.

> So, can we use UUID or something like that only in Maven?

Hm... I think it is possible. I will take a look.

@DmitriyLewen DmitriyLewen linked a pull request Nov 6, 2024 that will close this issue
@knqyf263 knqyf263 modified the milestones: v0.58.0, v0.59.0 Nov 28, 2024
@DmitriyLewen
Contributor Author

A user found a similar case for dpkg - #8273

But this is a strange case: there are 2 status dirs (libssl1 and libssl1.1) with the same name/version/etc. (see #8273 (comment)).

This looks like an error in the image construction, but on the other hand there are no restrictions against such cases, so we should solve this problem in Trivy.
@knqyf263 wdyt?
