ENT-14140: psql_wrapper.sh: retry psql commands on transient failures #3165
craigcomstock left a comment:
This doesn't feel quite right to me. It seems we need a sequence of actions rather than a class/gate situation: the restart must finish, and then superhub_schema() should run. With this solution superhub_schema() would run at the next agent interval, which is OK but maybe not ideal. Instead of gating on a recent restart, could we gate on PostgreSQL being up and ready, in the hope that superhub_schema() could run in the same agent run?
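The sketch below shows what "gate on PostgreSQL up and ready" could look like in shell, to match psql_wrapper.sh. It is an illustration of the suggestion and not code from this PR. The helper name `wait_until_ready` and its parameters are hypothetical. The readiness probe is passed in as a command, so a real check such as `pg_isready -q` can be plugged in. `pg_isready` is the PostgreSQL client tool that exits 0 once the server accepts connections.

```sh
# Hypothetical helper: poll a readiness probe until it succeeds or the
# attempt budget runs out, so the caller never waits forever.
wait_until_ready() {
    max_attempts=$1   # how many times to run the probe
    delay=$2          # seconds to sleep between probes
    shift 2           # remaining arguments form the probe command
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0  # probe succeeded: the service is ready
        fi
        # Only sleep if another probe will follow.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1          # still not ready after max_attempts probes
}
```

A call might look like `wait_until_ready 10 3 pg_isready -q && run_schema_step`. Here `run_schema_step` is a hypothetical stand-in for whatever applies the schema. Because the number of attempts is bounded, the added delay is at most (attempts - 1) × delay seconds of sleep plus the time the probes themselves take.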
@craigcomstock, @nickanderson, what if we have the
Yeah, I think that would be better.
The only thing, @nickanderson, is that it will cause the agent to hang while it bootstraps. Or should it perhaps run these commands in the background?
Retry psql commands on transient failures, e.g., when postgres is being restarted due to a config change.

Ticket: ENT-14140
Changelog: psql commands are now retried on transient errors in federated reporting
Signed-off-by: Lars Erik Wik <lars.erik.wik@northern.tech>
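The commit message above describes retrying psql on transient failures. One way to express that in POSIX shell is sketched below. `retry_transient` and its parameters are hypothetical. Treating only exit status 2 as transient is an assumption based on psql's documented exit codes, where 2 means the connection to the server failed or went bad. It is not necessarily what psql_wrapper.sh does.

```sh
# Hypothetical sketch, not the actual psql_wrapper.sh from this PR.
# Runs a command and retries it only when it exits with status 2, which psql
# uses when the connection to the server failed or went bad (for example while
# PostgreSQL is restarting). Any other status is returned immediately.
retry_transient() {
    max_attempts=$1   # total number of tries
    delay=$2          # seconds to wait between tries
    shift 2           # remaining arguments: the command, e.g. psql ...
    attempt=1
    while :; do
        status=0
        "$@" || status=$?
        # Stop on success, on a non-transient error, or when out of attempts.
        if [ "$status" -ne 2 ] || [ "$attempt" -ge "$max_attempts" ]; then
            return "$status"
        fi
        echo "attempt $attempt/$max_attempts failed with exit status 2, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, `retry_transient 10 3 psql -c 'SELECT 1'` tries up to 10 times with 3 seconds between tries. With those illustrative numbers the loop sleeps at most 9 × 3 = 27 seconds, on top of the time psql itself takes. Retrying only on status 2 avoids re-running statements that failed for real reasons, such as a script error reported as status 3 when ON_ERROR_STOP is set.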
Hang permanently, or just be slow? A permanent hang must be avoided. From reading the code, it looks like it might hang for up to 30 seconds while retrying. Also, this hang would be limited to the hub bootstrapping to itself, is that right?
No, every agent run on the super hub is affected. Feeders are not, as far as I can see (due to
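On the concern about a permanent hang: a retry loop bounds the number of attempts, but a single psql call can still block. The following is a minimal sketch of a hard wall-clock cap on one command. It assumes GNU coreutils `timeout` is available, which exits with status 124 when the limit is reached. `run_bounded` is a hypothetical name.

```sh
# Hypothetical helper: run a command with a hard wall-clock limit.
# GNU coreutils `timeout` sends SIGTERM when the limit expires and then exits
# with status 124; otherwise the command's own exit status is passed through.
# The command must be an executable, not a shell function, because `timeout`
# starts it as a separate process.
run_bounded() {
    limit=$1   # maximum run time in seconds
    shift      # remaining arguments form the command to run
    timeout "$limit" "$@"
}
```

As a usage example, `export PGCONNECT_TIMEOUT=5` followed by `run_bounded 30 psql -c 'SELECT 1'` combines two limits. PGCONNECT_TIMEOUT is the libpq environment variable that caps each connection attempt, and `timeout` catches anything that stalls after connecting.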
@cf-bottom Jenkins please :)
Alright, I triggered a build:
Jenkins: https://ci.cfengine.com/job/pr-pipeline/13880/
Packages: http://buildcache.cfengine.com/packages/testing-pr/jenkins-pr-pipeline-13880/
Observed a race condition in CI where `bundle agent superhub_schema` interacts with postgres shortly after a service restart. Fixed by gating `superhub_schema`, `ensure_feeders`, and `imported_data` on a persistent class set by the cf-postgres restart.

Ticket: ENT-14140
Backported to: