We switched the cluster to Consensus leasing, hoping the setup would be more robust. However, after a network reconfiguration some of the servers' IPs were left undefined, and the cluster broke:
<[ACTIVE] ExecuteThread: '37' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <f56960fb001bca18:-33f939f3:1413c1039a1:-8000-0000000000065ea0> <1380619475997> <BEA-000802> <ExecuteRequest failed
java.lang.AssertionError: Invalid state transition from failed to stable_leader.
java.lang.AssertionError: Invalid state transition from failed to stable_leader
    at weblogic.cluster.leasing.databaseless.ClusterState.setState(ClusterState.java:100)
    at weblogic.cluster.leasing.databaseless.ClusterState.setState(ClusterState.java:59)
    at weblogic.cluster.leasing.databaseless.ClusterFormationServiceImpl.leaderInitialization(ClusterFormationServiceImpl.java:318)
    at weblogic.cluster.leasing.databaseless.ClusterFormationServiceImpl.formClusterInternal(ClusterFormationServiceImpl.java:148)
    at weblogic.cluster.leasing.databaseless.ClusterFormationServiceImpl.timerExpired(ClusterFormationServiceImpl.java:339)
    at weblogic.timers.internal.TimerImpl.run(TimerImpl.java:273)
    at weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:528)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:178)
>
The problem is that once the network was fixed, the cluster did not recover, and we had to restart the servers. However, this may simply be because restarting a server re-adds its Virtual IP to the NIC (/sbin/ifconfig -addif). Instead of restarting the servers, I should have tried adding the IPs manually first. One should really monitor the availability of those IPs continuously.
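As a rough idea of what such monitoring could look like, here is a minimal Python sketch run on each server host. It is only an illustration, not something we had in place: the addresses in EXPECTED_VIPS are hypothetical placeholders, and the check relies on the assumption that binding a socket to an address fails when that address is not assigned to a local interface (this does not hold if the Linux net.ipv4.ip_nonlocal_bind setting is enabled).

```python
import socket


def is_ip_bound_locally(ip: str) -> bool:
    """Return True if `ip` can be bound on this host.

    Assumption: the OS refuses to bind an address that is not assigned
    to one of its interfaces (not true with ip_nonlocal_bind=1 on Linux).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, 0))  # port 0 lets the OS pick any free port
            return True
        except OSError:
            return False


def missing_ips(expected_ips):
    """Return the subset of expected_ips that is not bound on this host."""
    return [ip for ip in expected_ips if not is_ip_bound_locally(ip)]


if __name__ == "__main__":
    # Hypothetical Virtual IPs expected on this host; replace with your own.
    EXPECTED_VIPS = ["192.0.2.10", "192.0.2.11"]
    for ip in missing_ips(EXPECTED_VIPS):
        print(f"WARNING: virtual IP {ip} is not assigned to any local interface")
```

Run periodically (from cron, for example), a check like this would at least flag a missing Virtual IP before the cluster ends up in a failed state.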