### Terraform Version

```shell
1.12.2
```

### Terraform Configuration Files

```terraform
terraform {
  backend "s3" {
    bucket       = "your-terraform-state-bucket"
    key          = "path/to/your/statefile.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true
  }
}
```

### Debug Output

I've only reproduced this in a production pipeline when triggering multiple concurrent builds, where it's not that simple to get debug logs. I can try to reproduce this in a separate environment if truly necessary to get debug logs, but I think the issue is understandable just from the info logs.

```
Initializing the backend...
Initializing modules...
Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
- Installing hashicorp/aws v5.80.0...
- Installed hashicorp/aws v5.80.0 (unauthenticated)

Terraform has been successfully initialized!
Acquiring state lock. This may take a few moments...

Error: Error acquiring the state lock

Error message: operation error S3: PutObject, https response error StatusCode: 409, RequestID: , HostID: , api error ConditionalRequestConflict: The conditional request cannot succeed due to a conflicting operation against this resource.
unable to retrieve file from S3 bucket 'example-terraform-state-bucket' with key 'clusters/staging/cluster-name/us1/security-groups/terraform.tfstate.tflock': operation error S3: GetObject, https response error StatusCode: 404, RequestID: , HostID: , NoSuchKey:

Terraform acquires a state lock to protect the state from being written
by multiple users at the same time. Please resolve the issue above and try
again. For most commands, you can disable locking with the "-lock=false"
flag, but this is not recommended.
```

### Expected Behavior

After failing to acquire the lock, Terraform should retry acquiring the lock until the lock timeout.

### Actual Behavior

When the lockfile cannot be found, the command fails immediately without any retries.
### Steps to Reproduce

Run `terraform plan -lock-timeout=30m -input=false` several times concurrently. Higher concurrency increases the chance of it happening. Likely the plan also needs to complete very quickly.

### Additional Context

From reading the code that implements the state locking for S3, I believe this is what is happening:

- Invocation A successfully acquires the lock
- Invocation B fails to acquire the lock
- Before invocation B does the GetObject to get information about the lock, invocation A releases the lock, [deleting the lockfile](https://github.com/hashicorp/terraform/blob/v1.12.2/internal/backend/remote-state/s3/client.go#L543-L546).
- Invocation B now [fails to get information about the lock](https://github.com/hashicorp/terraform/blob/v1.12.2/internal/backend/remote-state/s3/client.go#L680), which [causes the retrier to bail out without performing any retries](https://github.com/hashicorp/terraform/blob/main/internal/states/statemgr/locker.go#L90-L94).

In our case, this fails pretty consistently in CI when several pull requests are opened concurrently (typically via automation) and several plans are done within the pipeline.

### References

_No response_

### Generative AI / LLM assisted development?

_No response_
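
### Sketch of the Suspected Race

The interleaving described under Additional Context can be replayed deterministically with a small in-memory model. This is only a sketch under stated assumptions, not Terraform's implementation: `FakeBucket`, `try_lock`, `lock_with_retry`, `simulate`, and the `retry_on_missing` switch are all hypothetical names, the attempt counter stands in for `-lock-timeout`, and the retry policy is a simplified reading of the linked code (retry only when information about the current lock holder could be read).

```python
class Conflict(Exception):
    """Stand-in for the 409/412 a conditional PutObject returns."""

class NoSuchKey(Exception):
    """Stand-in for the 404 a GetObject returns for a missing key."""

class LockHeld(Exception):
    """Lock is held and the holder's info could be read (retryable)."""

class FakeBucket:
    """Hypothetical in-memory model of a bucket with conditional writes."""
    def __init__(self):
        self.objects = {}

    def put_if_absent(self, key, body):
        if key in self.objects:
            raise Conflict(key)
        self.objects[key] = body

    def get(self, key):
        if key not in self.objects:
            raise NoSuchKey(key)
        return self.objects[key]

    def delete(self, key):
        self.objects.pop(key, None)

def try_lock(bucket, key, owner, before_get=lambda: None):
    """One attempt: conditional put, then read the holder's info on conflict."""
    try:
        bucket.put_if_absent(key, owner)
    except Conflict:
        before_get()  # hook used to inject the other invocation's release
        raise LockHeld(bucket.get(key))  # get() raises NoSuchKey if released

def lock_with_retry(bucket, key, owner, max_attempts, retry_on_missing,
                    before_get=lambda: None):
    """Retry on LockHeld; retry on NoSuchKey only if retry_on_missing is set."""
    for attempt in range(1, max_attempts + 1):
        try:
            try_lock(bucket, key, owner, before_get)
            return attempt
        except LockHeld:
            continue
        except NoSuchKey:
            if not retry_on_missing:
                raise  # models the immediate failure reported in this issue
    raise TimeoutError("lock not acquired")

def simulate(retry_on_missing):
    bucket = FakeBucket()
    key = "terraform.tfstate.tflock"
    bucket.put_if_absent(key, "A")  # invocation A holds the lock
    release_a = lambda: bucket.delete(key)  # A releases before B's get
    try:
        n = lock_with_retry(bucket, key, "B", 5, retry_on_missing, release_a)
        return f"acquired after {n} attempt(s)"
    except NoSuchKey:
        return "failed immediately: lock file not found"

print(simulate(retry_on_missing=False))
print(simulate(retry_on_missing=True))
```

With `retry_on_missing=False` the model reproduces the reported behavior: invocation B fails on its first attempt even though the lock is already free. With `retry_on_missing=True`, B's second attempt succeeds, which matches the expected behavior. Whether a missing lockfile should count as retryable in Terraform itself is a design decision for the maintainers.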