🐛(ZENKO-5288) wrap kafka-server-start.sh to propagate broker exit code #2428
DarkIsDude wants to merge 1 commit into
    { \
        echo '#!/bin/bash'; \
        echo '"$(dirname "$0")/kafka-server-start-real.sh" "$@"'; \
        echo 'KAFKA_EXIT=$?'; \
        echo 'printf "%d" "$KAFKA_EXIT" > /var/run/kafka-exit/code'; \
        echo 'exit "$KAFKA_EXIT"'; \
    } > ${KAFKA_HOME}/bin/kafka-server-start.sh && \
    chmod +x ${KAFKA_HOME}/bin/kafka-server-start.sh
Instead of generating the script at build time, it would be best to add it to the folder and use the ADD command.
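As a rough illustration of this suggestion, the wrapper could live in the repository as a plain file with the same logic as the generated one. The sketch below writes that file into a temporary sandbox and runs it against a stand-in broker script so it can be tried without Kafka. The `EXIT_CODE_FILE` override, the sandbox layout, and the stand-in script that fails with status 3 are assumptions made for the demo, not part of this PR.

```shell
#!/bin/bash
# Sandbox: a temp dir stands in for ${KAFKA_HOME}/bin and /var/run/kafka-exit.
wrapper_demo_dir="$(mktemp -d)"
mkdir -p "$wrapper_demo_dir/bin" "$wrapper_demo_dir/kafka-exit"

# Stand-in for the renamed broker start script (hypothetical: always fails with 3).
cat > "$wrapper_demo_dir/bin/kafka-server-start-real.sh" <<'EOF'
#!/bin/bash
exit 3
EOF

# The wrapper as a standalone file. EXIT_CODE_FILE is a hypothetical override
# so the demo can run unprivileged; it defaults to the path used in this PR.
cat > "$wrapper_demo_dir/bin/kafka-server-start.sh" <<'EOF'
#!/bin/bash
EXIT_CODE_FILE="${EXIT_CODE_FILE:-/var/run/kafka-exit/code}"
"$(dirname "$0")/kafka-server-start-real.sh" "$@"
KAFKA_EXIT=$?
printf "%d" "$KAFKA_EXIT" > "$EXIT_CODE_FILE"
exit "$KAFKA_EXIT"
EOF
chmod +x "$wrapper_demo_dir"/bin/*.sh

# Run the wrapper and collect both the exit status and the recorded code.
wrapper_status=0
EXIT_CODE_FILE="$wrapper_demo_dir/kafka-exit/code" \
    "$wrapper_demo_dir/bin/kafka-server-start.sh" || wrapper_status=$?
recorded_code="$(cat "$wrapper_demo_dir/kafka-exit/code")"
echo "wrapper exited with $wrapper_status, recorded $recorded_code"
```

With the script checked in as a file, the Dockerfile would only need to copy it into `${KAFKA_HOME}/bin` and make it executable, which also keeps the quoting readable compared with a chain of `echo` calls.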
    RUN chmod a+x ${KAFKA_HOME}/bin/*.sh
    RUN mv ${KAFKA_HOME}/bin/kafka-server-start.sh ${KAFKA_HOME}/bin/kafka-server-start-real.sh && \
I think it would be better to keep the existing file and just create another one next to it.
We can then either use our own script as the ENTRYPOINT or change the CMD to point to our own script.
Koperator's entrypoint calls kafka-server-start.sh
It is not clear what this means precisely: is Koperator setting a custom ENTRYPOINT, overriding the COMMAND, or something else?
        echo '#!/bin/bash'; \
        echo '"$(dirname "$0")/kafka-server-start-real.sh" "$@"'; \
        echo 'KAFKA_EXIT=$?'; \
        echo 'printf "%d" "$KAFKA_EXIT" > /var/run/kafka-exit/code'; \
nit: instead of a wrapper, we may be able to add a trap command at the beginning of the script to store the exit code in /var/run/kafka-exit/code before returning.
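A minimal sketch of the trap idea, for comparison with the wrapper. It uses two hypothetical stand-in scripts in a temporary directory (`start-with-trap.sh` and `start-with-exec.sh`, neither of which exists in this PR) and a hypothetical `EXIT_CODE_FILE` variable in place of the hard-coded path. The second script illustrates a caveat: an EXIT trap does not run once the shell has replaced itself with `exec`, so this approach only works if the start script does not hand off to the broker that way.

```shell
#!/bin/bash
trap_demo_dir="$(mktemp -d)"

# Hypothetical start script that records its own status through an EXIT trap.
cat > "$trap_demo_dir/start-with-trap.sh" <<'EOF'
#!/bin/bash
# Inside an EXIT trap, $? holds the status the script is exiting with.
trap 'printf "%d" "$?" > "$EXIT_CODE_FILE"' EXIT
# Stand-in for the broker process; "$1" is the status it fails with.
bash -c "exit $1"
EOF

# Same script, but the broker stand-in is started with exec.
cat > "$trap_demo_dir/start-with-exec.sh" <<'EOF'
#!/bin/bash
trap 'printf "%d" "$?" > "$EXIT_CODE_FILE"' EXIT
# exec replaces the shell, so the trap above never gets a chance to run.
exec bash -c "exit $1"
EOF
chmod +x "$trap_demo_dir"/*.sh

trap_status=0
EXIT_CODE_FILE="$trap_demo_dir/code" \
    "$trap_demo_dir/start-with-trap.sh" 7 || trap_status=$?
trap_recorded="$(cat "$trap_demo_dir/code")"
echo "trap variant exited with $trap_status, recorded $trap_recorded"

exec_status=0
EXIT_CODE_FILE="$trap_demo_dir/code-exec" \
    "$trap_demo_dir/start-with-exec.sh" 7 || exec_status=$?
if [ -e "$trap_demo_dir/code-exec" ]; then
    echo "exec variant recorded a code"
else
    echo "exec variant exited with $exec_status but recorded nothing"
fi
```

If the upstream start script ends with an `exec` call (worth checking in the image), the trap would have to be paired with removing that `exec`, which makes the separate wrapper the less invasive option.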
Did you find how it got there? Seems really weird that
Summary
- Rename kafka-server-start.sh to kafka-server-start-real.sh at image build time
- Add a kafka-server-start.sh wrapper that captures Kafka's exit code and writes it to /var/run/kafka-exit/code before returning
- kafka-scripts-patcher init container in the operator (see scality/zenko-operator#620)

Context
Koperator's entrypoint calls kafka-server-start.sh, then unconditionally runs rm /var/run/wait/do-not-exit-yet (exit 0), masking any non-zero exit code from the broker. By baking the wrapper into the image, we capture the exit code before that masking occurs, allowing the exit-code-propagator sidecar in the operator to set the pod phase to Failed when Kafka crashes.

Test plan
- kafka-server-start-real.sh and the new kafka-server-start.sh wrapper are present
- /var/run/kafka-exit/code
- exit-code-propagator sidecar exits with the correct code and pod phase is Failed

I also opened an upstream fix: adobe/koperator#260. IMO we should ship our fix and remove it later. Without this fix, it is hard for the CS team to debug issues.
Issue: ZENKO-5288
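
For reviewers who want to see the masking described in Context in isolation, here is a small sketch. The `entrypoint.sh` below is a hypothetical stand-in, not Koperator's actual entrypoint: it runs a command that fails, then removes a marker file, and the script's final status is that of the `rm`.

```shell
#!/bin/bash
mask_demo_dir="$(mktemp -d)"
# Stand-in for /var/run/wait/do-not-exit-yet.
touch "$mask_demo_dir/do-not-exit-yet"

# Hypothetical entrypoint: start the broker, then always remove the marker.
cat > "$mask_demo_dir/entrypoint.sh" <<'EOF'
#!/bin/bash
# Stand-in for kafka-server-start.sh crashing with a non-zero status.
bash -c 'exit 1'
# Last command in the script: its status (0) becomes the script's status.
rm "$(dirname "$0")/do-not-exit-yet"
EOF
chmod +x "$mask_demo_dir/entrypoint.sh"

masked_status=0
"$mask_demo_dir/entrypoint.sh" || masked_status=$?
echo "broker stand-in failed with 1, entrypoint exited with $masked_status"
```

Because the container exits with 0 in this situation, the non-zero status has to be recorded before the `rm` runs, which is what the wrapper in this PR does.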