12 Commits

Author SHA1 Message Date
tusuii
c6bb1ac9b4 fix: make MetalLB IP pool apply resilient to broken webhook state
Some checks failed
scrum-manager/pipeline/head There was a failure building this commit
Wait for the MetalLB controller deployment to be ready before applying
the IPAddressPool/L2Advertisement resources. If the webhook service has no ready
endpoints (stale ClusterIP from a previously removed controller), delete
the ValidatingWebhookConfiguration so the apply is not blocked. This
prevents the 'connection refused' webhook failure seen when a duplicate
MetalLB install left behind a broken webhook service endpoint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 00:38:40 +05:30
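
The readiness wait and webhook fallback described in this commit could be sketched roughly as below. This is not the pipeline's actual code. The resource names (`controller`, `metallb-webhook-service`, `metallb-webhook-configuration`) are assumed from the upstream metallb-native manifest, and a Helm-based install names them differently. `POOL_FILE` and the 120s timeout are hypothetical.

```shell
# Sketch only: every name below is an assumption; override for your install.
NS="${NS:-metallb-system}"
CONTROLLER="${CONTROLLER:-controller}"
WEBHOOK_SVC="${WEBHOOK_SVC:-metallb-webhook-service}"
WEBHOOK_CFG="${WEBHOOK_CFG:-metallb-webhook-configuration}"
POOL_FILE="${POOL_FILE:-k8s/metallb-pool.yaml}"   # hypothetical path

# Succeeds only if the webhook Service has at least one ready endpoint address.
webhook_has_endpoints() {
  ips=$(kubectl get endpoints "$WEBHOOK_SVC" -n "$NS" \
        -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)
  [ -n "$ips" ]
}

apply_metallb_pool() {
  # The controller serves the validating webhook, so wait for it first.
  kubectl rollout status "deployment/$CONTROLLER" -n "$NS" --timeout=120s || return 1

  # No ready endpoints means the webhook call would be refused and block the
  # apply, so fall back to removing the webhook configuration.
  if ! webhook_has_endpoints; then
    echo "WARNING: $WEBHOOK_SVC has no ready endpoints; deleting $WEBHOOK_CFG"
    kubectl delete validatingwebhookconfiguration "$WEBHOOK_CFG" --ignore-not-found
  fi

  kubectl apply -f "$POOL_FILE"
}
```

Deleting the webhook configuration also disables validation of later MetalLB resources until it is recreated, so this fallback trades safety for an unblocked deploy.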
tusuii
d067dbfc44 fix: stop reinstalling MetalLB — cluster already has it running
Some checks failed
scrum-manager/pipeline/head There was a failure building this commit
MetalLB was already installed (metallb-speaker-* / metallb-controller-*)
32 days ago. Applying metallb-native.yaml created duplicate controller and
speaker resources. The new speaker pods could not schedule because the
existing metallb-speaker-* pods already occupy the host ports (7472, 7946)
on all 3 nodes: "1 node(s) didn't have free ports for the requested pod ports"

Fix: remove the kubectl apply for metallb-native.yaml and apply only the
IPAddressPool and L2Advertisement configs, which are all we need.

Manual cluster cleanup required (one-time):
  kubectl delete deployment controller -n metallb-system
  kubectl delete daemonset speaker -n metallb-system

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 00:31:01 +05:30
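
The commit above fixes the duplicate install by dropping the install step entirely. A different approach, shown here only as a sketch, keeps the step but guards it by checking whether MetalLB's CRDs are already registered. `METALLB_MANIFEST` and `POOL_FILE` are hypothetical variables and their default paths are placeholders.

```shell
# Hypothetical paths; not taken from the repository.
METALLB_MANIFEST="${METALLB_MANIFEST:-metallb-native.yaml}"
POOL_FILE="${POOL_FILE:-k8s/metallb-pool.yaml}"

# Succeeds when the IPAddressPool CRD exists, which indicates that some
# MetalLB install (manifest- or Helm-based) is already present.
metallb_already_installed() {
  kubectl get crd ipaddresspools.metallb.io >/dev/null 2>&1
}

setup_metallb() {
  if metallb_already_installed; then
    echo "MetalLB already present; skipping install"
  else
    kubectl apply -f "$METALLB_MANIFEST"
  fi
  # The pool and advertisement are applied in both cases.
  kubectl apply -f "$POOL_FILE"
}
```

The CRD check does not depend on workload names, so it would catch an existing install whose resources are named metallb-controller-* and metallb-speaker-* as well as one named controller and speaker.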
tusuii
57c3c14b48 fix: make MetalLB speaker rollout non-blocking with diagnostics
Speaker DaemonSet on CPU-constrained cluster takes >180s to start all 3 pods.
Don't fail the entire pipeline — warn and print speaker pod status instead.
Controller must still be ready (it handles IP assignment) before continuing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 00:27:37 +05:30
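
A minimal sketch of the "warn, don't fail" behaviour this commit describes. The workload names (`speaker`, `controller`), the `component=speaker` label, and the 120s controller timeout are assumptions based on the upstream manifest. The real Jenkinsfile stage may look different.

```shell
NS="${NS:-metallb-system}"

# Non-blocking: a slow speaker rollout prints a warning plus pod status,
# but the function always returns success so the pipeline continues.
wait_for_speaker() {
  if ! kubectl rollout status daemonset/speaker -n "$NS" --timeout=180s; then
    echo "WARNING: speaker rollout not finished after 180s; continuing"
    kubectl get pods -n "$NS" -l component=speaker -o wide || true
  fi
  return 0
}

# Blocking: the controller handles IP assignment, so a failure here
# propagates to the caller.
wait_for_controller() {
  kubectl rollout status deployment/controller -n "$NS" --timeout=120s
}
```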
tusuii
245301450c fix: use maxSurge=0 rolling update to avoid CPU pressure on small cluster
Some checks failed
scrum-manager/pipeline/head There was a failure building this commit
During rolling updates with the default maxSurge (25%, which rounds up to 1
pod for 2 replicas), an extra surge pod was created temporarily (3 pods
instead of 2), causing all 3 nodes to report "Insufficient CPU" and delaying
scheduling past the Jenkins rollout timeout.

With maxSurge=0 / maxUnavailable=1, one old pod terminates before its
replacement starts, so the pod count never exceeds 2 and no extra CPU is needed.

Also increase Jenkins rollout timeout from 300s to 600s as a safety net
for CPU-constrained nodes that may still need extra scheduling time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 00:10:04 +05:30
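
The pod-count arithmetic behind this change can be checked with a small sketch. The helper names are hypothetical. The `kubectl patch` call is one possible way to set the strategy; the commit most likely edits the Deployment manifest instead.

```shell
# Peak number of pods that can exist at once: replicas + maxSurge.
peak_pods() { echo $(( $1 + $2 )); }

# Minimum number of pods kept available: replicas - maxUnavailable.
min_available() { echo $(( $1 - $2 )); }

# Apply maxSurge=0 / maxUnavailable=1 to a Deployment.
# Args: deployment name, namespace (both supplied by the caller).
set_rolling_strategy() {
  kubectl patch deployment "$1" -n "$2" --type merge -p \
    '{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxSurge":0,"maxUnavailable":1}}}}'
}
```

For 2 replicas, `peak_pods 2 1` gives 3 and `peak_pods 2 0` gives 2, while `min_available 2 1` gives 1. The update therefore runs briefly on a single pod in exchange for never needing CPU for a third.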
tusuii
7900114303 fix: increase MetalLB speaker daemonset rollout timeout to 180s
Speaker runs on all 3 nodes and needs image pull + startup time per node.
90s was too tight — bumped to 180s to handle slow node startups.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 00:07:55 +05:30
tusuii
69f7b4a93d feat: add MetalLB for on-premise LoadBalancer support
Some checks failed
scrum-manager/pipeline/head There was a failure building this commit
- Add MetalLB IPAddressPool (192.168.108.213/32) and L2Advertisement
  so the frontend gets a stable external IP on the LAN
- Change frontend service type: NodePort → LoadBalancer
- Add 'Setup MetalLB' stage in Jenkinsfile that installs MetalLB v0.14.8
  (idempotent) and applies the IP pool config before each deploy

After deploy: kubectl get svc frontend -n scrum-manager
should show EXTERNAL-IP: 192.168.108.213
App accessible at: http://192.168.108.213

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 00:00:04 +05:30
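
For illustration, the pool configuration described in this commit might look like the manifest emitted below. The resource names `frontend-pool` and `frontend-l2` are hypothetical. The address, the `frontend` service and the `scrum-manager` namespace come from the commit message.

```shell
# Prints an IPAddressPool with a single /32 address and an L2Advertisement
# that announces it on the LAN.
render_metallb_pool() {
  cat <<'EOF'
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: frontend-pool        # hypothetical name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.108.213/32     # single address from the commit message
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: frontend-l2          # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - frontend-pool
EOF
}

# Prints the external IP assigned to the frontend Service, if any.
frontend_external_ip() {
  kubectl get svc frontend -n scrum-manager \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

One way to use it is `render_metallb_pool | kubectl apply -f -`, followed by `frontend_external_ip` to confirm the address was assigned.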
tusuii
55287c6f1d fix: increase backend memory limit and add rollout failure diagnostics
Some checks failed
scrum-manager/pipeline/head There was a failure building this commit
Backend was OOMKilled during rolling update startup (Node.js + Socket.io +
MySQL pool exceeds 256Mi). Raised limit to 512Mi and request to 256Mi.

Jenkinsfile: show kubectl get pods immediately after apply so pod state
is visible in build logs. Added full diagnostics (describe + logs) in
post.failure block so the root cause of any future rollout failure is
visible without needing to SSH into the cluster.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 23:24:19 +05:30
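
A diagnostics step like the one added to the post.failure block could be sketched as a shell function. The `app=backend` selector is an assumption, and the Jenkinsfile's real commands are not shown in the commit message.

```shell
NS="${NS:-scrum-manager}"
APP_LABEL="${APP_LABEL:-app=backend}"   # hypothetical selector

# Prints pod state, events and recent logs. Each step tolerates failure so
# one missing pod does not hide the rest of the output.
dump_rollout_diagnostics() {
  echo "=== pods ==="
  kubectl get pods -n "$NS" -l "$APP_LABEL" -o wide || true
  echo "=== describe ==="
  kubectl describe pods -n "$NS" -l "$APP_LABEL" || true
  echo "=== logs ==="
  for pod in $(kubectl get pods -n "$NS" -l "$APP_LABEL" -o name); do
    # Prefer the previous container's logs (useful after an OOMKill),
    # falling back to the current container.
    kubectl logs "$pod" -n "$NS" --tail=100 --previous 2>/dev/null \
      || kubectl logs "$pod" -n "$NS" --tail=100 || true
  done
}
```

For an OOMKilled container, the `describe` output is usually where the termination reason shows up, which is why it is printed before the logs.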
tusuii
5ed8d0bbdc fix: remove PVC patch that broke kubectl apply on bound claims
Some checks failed
scrum-manager/pipeline/head There was a failure building this commit
The mysql-data-pvc was already dynamically provisioned by the cluster's
'local-path' StorageClass. The overlay patch tried to change storageClassName
to 'manual' and volumeName on an already-bound PVC, which Kubernetes forbids:
  "spec is immutable after creation except resources.requests"

Fixes:
- Remove mysql-pvc-patch from kustomization.yaml (PVC left as-is)
- Remove mysql-pv.yaml resource (not needed with dynamic provisioner)
- Add comment explaining when manual PV/PVC is needed vs not

Jenkinsfile: add --timeout and FQDN to smoke test curl; add comments
explaining MySQL Recreate strategy startup timing expectations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 23:02:54 +05:30
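
The sketch below shows a smoke test with a time limit and a fully qualified service name, plus a check of the PVC's state. The helper names are hypothetical, and the default cluster domain `cluster.local` is assumed. curl itself has no `--timeout` option; its time limits are set with `--connect-timeout` and `--max-time`, which is assumed to be what the commit means.

```shell
# Builds the in-cluster FQDN for a Service. Args: service name, namespace.
svc_fqdn() { echo "$1.$2.svc.cluster.local"; }

# Fails on HTTP errors, connection failures, or if the request exceeds
# the time limits. Arg: URL.
smoke_test() {
  curl --fail --silent --show-error --connect-timeout 5 --max-time 15 "$1" >/dev/null
}

# Prints the phase of a PVC (for example Bound or Pending).
# Args: claim name, namespace.
pvc_phase() {
  kubectl get pvc "$1" -n "$2" -o jsonpath='{.status.phase}'
}
```

A call such as `smoke_test "http://$(svc_fqdn backend scrum-manager):3000/health"` uses a made-up service name, port and path. `pvc_phase mysql-data-pvc scrum-manager` reports whether the claim is already bound, which is the state in which the overlay patch was rejected.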
748ce24e87 Update Jenkinsfile
All checks were successful
scrum-manager/pipeline/head This commit looks good
2026-02-22 12:24:41 +00:00
e23bb94660 jenkinsfile
Some checks failed
scrum-manager/pipeline/head There was a failure building this commit
2026-02-22 11:07:30 +00:00
ad65ab824e jenkinsfile
Some checks failed
scrum-manager/pipeline/head There was a failure building this commit
2026-02-22 11:06:21 +00:00
606eeed4c3 jenkinsfile
Some checks failed
scrum-manager/pipeline/head There was a failure building this commit
2026-02-22 10:48:45 +00:00