Skip to content

add command line flag for system test result dumps#2967

Open
efd6 wants to merge 2 commits intoelastic:mainfrom
efd6:flag_dump
Open

add command line flag for system test result dumps#2967
efd6 wants to merge 2 commits intoelastic:mainfrom
efd6:flag_dump

Conversation

@efd6
Copy link
Copy Markdown
Contributor

@efd6 efd6 commented Sep 30, 2025

Please take a look.

@efd6 efd6 self-assigned this Sep 30, 2025
@efd6 efd6 requested a review from a team September 30, 2025 22:41
Comment thread internal/cobraext/flags.go Outdated
Comment on lines +212 to +214
TestDumpPrefixFlagName = "dump"
TestDumpPrefixFlagDescription = "prefix for system test dump file"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding just one flag it forces to always set a prefix to the user, but probably they often need just a default value for that.

What about adding two flags to manage this feature?

  • One flag to enable or not dumping the documents (--dump-docs) without defining any path or prefix.
    • The default filename could be the same one used when the environment variable is set.
    • Should this file be saved in the same location where elastic-package runs ? Take into account that this could change with the parameter -C.
      elastic-package -C test/packages/parallel/nginx test system -v
  • Another flag to set a custom prefix (--dump-docs-prefix) or even it could be set the whole path (--dump-docs-output) leaving to the user the choice to where save the output file. I think I'd prefer the latter.
# enable dumping documents and using the default filename for the path
elastic-package test system -v --dump-docs-

# enable dumping documents and using a custom path
elastic-package test system -v --dump-docs --dump-docs-output /path/to/dump.json

Doing it like this, it would be similar on how coverage is set in elastic-package:

  • --test-coverage to enable the feature,
  • --coverage-format to choose which format is required.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comes down to a philosophical position; I think having a default location for things like this is a significant misfeature. IMO actions should require explicit requests. Including things like this in the config add complexity to the user experience and is also IMO a misfeature.

I'm not entirely sure what the gain is for users in adding this additional complexity.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO I think most users (from my POV) would like just to download/dump the documents without really requiring setting any specific path or prefix. That's why I would say that the most common usage would be (not requiring to set any prefix there):

elastic-package test system -v --dump-docs

Moreover, doing it like this, it would enable this feature and it could be coded to be equivalent to set the ELASTIC_PACKAGE_TEST_DUMP_SCENARIO_DOCS environment variable. So users setting the environment variable and setting that flag can expect the same exact behavior of elastic-package.

For the other flag I proposed, I thought there could be users that require some specific known file, for instance they are running tests with different packages and they want those files to be easily identified. That's my reasoning to add that new --dump-docs-output or --dump-docs-prefix flag, so they can decide how to save those documents.

# allow a full path
elastic-package test system -v --dump-docs --dump-docs-output /path/store/documents.json

# or as you proposed adding a prefix and save them in the same location where elastic-package runs
elastic-package test system -v --dump-docs --dump-docs-prefix "package-test"

The concern I have of writing those new documents in the same location as elastic-package runs is that this could lead to users committing and pushing those files in the repository inadvertently.

Copy link
Copy Markdown
Contributor Author

@efd6 efd6 Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO I think most users (from my POV) would like just to download/dump the documents without really requiring setting any specific path or prefix.

This is certainly not what I'd want; doing this means that I cannot script obtaining the data.

In terms of work, specifying a location is easier than digging through logs to find the name that the program chose. In my ideal world, the output would not be to a file whose name was under the control of the program at all. IMO this should be entirely under the control of the user, but I made a concession to the current behaviour on the basis that it was still relatively easy to obtain the files if the prefix was known.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, got it.

Could it be changed what the value would indicate for this flag? Instead of setting a prefix for files, this flag could set the path to a folder.

I'm trying to be consistent with other commands/parameters used in elastic-package. For instance, in these commands --output is a path to a folder:

  • dump stack logs: elastic-package stack dump --output <path>
  • dump assets: elastic-package dump agent-policies --output <path>

That folder should be checked (and created if needed) by elastic-package. And it would be the folder where the JSON files are stored.
Files stored in that folder could follow the same naming patterns as it was set when setting the environment variable:

    dumpPath = filepath.Join(dumpFolderBase, fmt.Sprintf("elastic-package-test-docs-dump-%s.json", time.Now().Format("20060102150405")))

So, elastic-package would be like this:

elastic-package test system -v --dump-logs "docs_folder_package/"

Probably, doing it like this, the parameter could be renamed as --dump-logs-to.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd also prefer this flag to specify a folder instead of a prefix.

Which behaviour would you prefer.

  1. mkdir -p ${outdir}
    echo ${result} > ${outdir}/${prefix_slug}-$(date).ndjson
    
  2. mkdir ${outdir}
    echo ${result} > ${outdir}/${prefix_slug}-$(date).ndjson
    

(where ${prefix_slug} is determined without user input and $(data) is the current timestamp suffix logic).

I prefer the first behavior, as this is more convenient for users.

This is also what elastic-package does at the moment in most of the cases, so it would be consistent with current behaviors:

.../elastic-package$ git grep Mkdir | wc -l
65
.../elastic-package$ git grep MkdirAll | wc -l
60
.../elastic-package$ git grep MkdirTemp | wc -l
4

Though it is also true that maybe scripts don't always try to create parent directories, so for this case both options would work for me if you prefer the second behavior.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer the second; explicit is better than implicit.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok!
So then, IIUC with that option, the command would be like:

elastic-package test system --dump-docs <folder>

And elastic-package would create the (last) folder and use the same prefix (prefix_slug as mentioned above) for the filename as set with the environment variable (elastic-package-test-docs-dump-%s.json). Is that right ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, but only if there is a path to the location where the directory would be created.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer 1. If it's making any directories, why not make them all?

If I were doing this work in a script, I'd write mkdir -p, because in addition to wanting make sure I can write to the requested destination, I don't want to fail if the destination directory already exists.

Comment thread internal/cobraext/flags.go Outdated
efd6 added 2 commits October 27, 2025 14:41
Also fix case of failing tests without validation in the case that the
dump fails.
@elasticmachine
Copy link
Copy Markdown
Collaborator

💚 Build Succeeded

History

cc @efd6

err := dumpScenarioDocs(scenario.docs, dumpPath)
if err != nil {
return nil, fmt.Errorf("failed to dump scenario docs: %w", err)
dumpErr = fmt.Errorf("failed to dump scenario docs: %w", err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for understanding, why we change the behavior, so now the error is not thrown but joined to the result of validateTestScenario. Does it make sense to run r.validateTestScenario if this fails?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it be relevant to the success of the test if some aspect of the dump fails? As an example, if the dump fails because of an fs error, do we now not care that the validation is not valid?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is the question i asked in order to understand the change. before, the test was failing if dump failed, isn't it? i am just asking to understand, lacking context here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think that behaviour was unhelpful. (see this for context)

Comment thread internal/testrunner/runners/system/tester.go
Copy link
Copy Markdown
Contributor

@chrisberkhout chrisberkhout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best story here would be:

I can run elastic-package test system -h and see an option for dumping documents ingested during the run of a system test scenario. I run with that option, possibly with a value, and know exactly where the documents will be written.

Several things were/are blocking that:

  • (Old env var implementation) A temp dir has an unpredictable name.
  • If you give a name prefix you still need to know the suffix.
  • If the suffix includes a timestamp, you need to know the time it was run (may be okay if you only want the latest).
  • If your system test run covers multiple scenarios, you need to find the file (timestamp) that corresponds to the run you're interested in (or if you want all, you still need to know which is which).

I think having a default is good for the manual dev workflow, and having a destination parameter is good for scripting.

If the option took a pattern and had a default like ./system-test-docs-%DATASTREAM-%SCENARIO-%TIMESTAMP.json, I think that would be getting pretty close. The default value also serves as compact documentation of the manual option.

System test scenarios don't have names, but the scenario files need to match ^test-[a-z0-9_.-]+-config\.yml$, so I guess we'd say the scenario defined in test-cel-bad-creds-config.yml is called cel-bad-creds (unique for a data stream).

If the user asks for an output location that doesn't exist, then it's probably good to create it for them. But I think no directories or all directories are both okay options.

It might be worth doing something to avoid the confusion of an earlier scenario's dump being overwritten by a later scenario in the same run. Possibly appending rather than overwriting and possibly having a wrapper with some metadata, or logging when a file exists and possibly adding a -COUNTER suffix.


Philosophical position:

Simplicity is what's important, especially simplicity for the user. UNIX does a version of it well.

An option to say "write docs here" isn't simple if you still have to look at the code or the logging or all recently created files to know where the docs are.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants