Topic: Communication break causing us to manually restart CMOD service

bruce.mchendry

  • Guest
Communication break causing us to manually restart CMOD service
« on: September 17, 2014, 06:14:51 AM »
We're running CMOD V9.0.0.3 on Windows Server 2012, using an SQL DB on another server. Our DBA folks say they see nothing on the SQL side. Our server guys think it may be an issue with something called the McAfee Mini Firewall; we do deploy the McAfee suite everywhere. We've bumped up both disk space and memory all around to ensure capacity "should" be okay. When I check the event logs on the server, the CMOD app keeps generating the same errors, which point to a connection loss that shows up as a drop to the SQL server. It will say things like:
"An existing connection was forcibly closed by the remote host.   -- SQLSTATE=08S01, SQLCODE=10054, File=arsseg.c, Line=6509"
"DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509"
Typically this was happening between 1 and 2 a.m., so I asked various teams whether there were overnight scans, backups, updates, etc. that could affect us, but nothing so far. And now this week it's been happening during the day too. Has anyone else seen this? Any and all help is always appreciated!
Cheers,
Bruce McHendry
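For anyone triaging the same messages: SQLSTATE 08S01 is the ODBC "communication link failure" state, and SQLCODE=10054 matches the Winsock code WSAECONNRESET, whose text is exactly "An existing connection was forcibly closed by the remote host." The minimal Python sketch below pulls those fields out of a copied event message so entries can be grouped by source file and line. It assumes the messages have been exported as plain text; the regex, `DbError` and `parse_cmod_db_error` are hypothetical helpers, not anything shipped with CMOD.

```python
import re
from typing import NamedTuple, Optional


class DbError(NamedTuple):
    sqlstate: str
    sqlcode: int
    source_file: str
    line: int


# Matches the trailing "-- SQLSTATE=..., SQLCODE=..., File=..., Line=..." part
# of the CMOD event text (format assumed from the messages posted in this thread).
_PATTERN = re.compile(
    r"SQLSTATE=(?P<state>\w+),\s*SQLCODE=(?P<code>-?\d+),\s*"
    r"File=(?P<file>[\w.]+),\s*Line=(?P<line>\d+)"
)

# Winsock code that appears in the "TCP Provider" variant of the message.
WINSOCK_CODES = {10054: "WSAECONNRESET (connection reset by peer)"}


def parse_cmod_db_error(message: str) -> Optional[DbError]:
    """Return the SQLSTATE/SQLCODE/File/Line fields, or None if they are absent."""
    m = _PATTERN.search(message)
    if m is None:
        return None
    return DbError(m["state"], int(m["code"]), m["file"], int(m["line"]))


if __name__ == "__main__":
    msg = ("DB Error: [Microsoft][SQL Server Native Client 11.0]TCP Provider: "
           "An existing connection was forcibly closed by the remote host.   "
           "-- SQLSTATE=08S01, SQLCODE=10054, File=arsseg.c, Line=6509")
    err = parse_cmod_db_error(msg)
    print(err)
    print(WINSOCK_CODES.get(err.sqlcode, "no Winsock mapping"))
```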

kbsiva

  • Guest
Re: Communication break causing us to manually restart CMOD service
« Reply #1 on: September 17, 2014, 09:27:00 AM »
When we had something like this, it was usually because a) a firewall module was resetting the connection, b) a security scan was taking place, or c) the database was not responding. The harder part is getting the respective teams to agree that there is an issue they need to resolve. I am not sure whether there is a timeout that can be configured in CMOD to keep retrying the database connection if it loses it. Eventually we put CMOD and the DB on the same server without a firewall and, lo and behold, this has happened only once in a year or so, and that was because the database went down. My two cents,


Thanks,
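Whether CMOD itself exposes a reconnect setting isn't established in this thread, but for a custom client or monitoring script the usual pattern is to retry the connection with exponential backoff rather than give up on the first reset. A minimal Python sketch under that assumption: `with_retries` is a hypothetical helper, `connect` stands in for whatever opens your connection, and a real driver raises its own exception type (pyodbc, for instance, raises `pyodbc.Error`), so pass that in via `retry_on`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, doubling the wait (capped) after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_connect() -> str:
        # Simulates two connection resets (like SQLCODE=10054) before succeeding.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionResetError("simulated reset")
        return "connected"

    delays = []
    print(with_retries(flaky_connect, sleep=delays.append), delays)
```

Injecting `sleep` keeps the sketch testable without real waits; in production you would leave the default `time.sleep`.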

bruce.mchendry

  • Guest
Re: Communication break causing us to manually restart CMOD service
« Reply #2 on: September 17, 2014, 10:01:43 AM »
Thanks for the response! I'd almost think you work here. I spent way too much time going between a few teams trying to get someone to take ownership and participate in this. Finally got that worked out, and that's how we arrived at the server guys disabling the McAfee Mini Firewall. They will actually be doing that tonight, so I'm hopeful. I was also glad to see you suggest the firewall, since it lines up with what we are trying. Thanks again for the fast reply and information.
Cheers

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2231
  • CMOD Guru for hire...
    • Tenacious Consulting
Re: Communication break causing us to manually restart CMOD service
« Reply #3 on: September 18, 2014, 03:59:23 AM »
For the record, while IBM will support a remote database with CMOD, it's not something that they recommend for a variety of reasons.  :)

CMOD & DB2 are tightly integrated -- almost all configuration data & logging info are stored in DB2, so when the database connection disappears, there's nothing that CMOD can do.

-JD.
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

bruce.mchendry

  • Guest
Re: Communication break causing us to manually restart CMOD service
« Reply #4 on: September 18, 2014, 05:38:27 AM »
Thanks, I'm going to add this info to the agenda for this morning's project meeting. Appreciate it!

bruce.mchendry

  • Guest
Re: Communication break causing us to manually restart CMOD service
« Reply #5 on: September 18, 2014, 09:51:19 AM »
So we had a meeting, and due to some architecture here we can't have our DB and app on the same server. I'm asking the server guys to roll back the McAfee Mini Firewall change and asking IBM to get the app logs over to the CMOD experts. We've had the app down 3 times in 5 hours, so not much fun here today. We keep seeing the same events in the server's application event log. The SQL DBAs still say there's nothing on their side, and the network group has just started monitoring.
Level   Date and Time   Source   Event ID   Task Category   Error
Error   9/18/2014 8:59:59 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/18/2014 8:59:59 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/18/2014 8:59:59 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]TCP Provider: An existing connection was forcibly closed by the remote host.   -- SQLSTATE=08S01, SQLCODE=10054, File=arsseg.c, Line=6509
Error   9/18/2014 7:09:51 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/18/2014 7:09:51 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/18/2014 7:09:51 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]TCP Provider: An existing connection was forcibly closed by the remote host.   -- SQLSTATE=08S01, SQLCODE=10054, File=arsseg.c, Line=6509
Error   9/17/2014 9:42:38 PM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]TCP Provider: An existing connection was forcibly closed by the remote host.   -- SQLSTATE=08S01, SQLCODE=10054, File=arsseg.c, Line=6509
Error   9/17/2014 9:32:40 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 9:32:40 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]TCP Provider: An existing connection was forcibly closed by the remote host.   -- SQLSTATE=08S01, SQLCODE=10054, File=arsfol.c, Line=3072
Error   9/17/2014 9:32:39 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 9:16:53 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 9:16:53 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 8:46:53 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 8:46:53 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 8:42:18 PM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 8:42:18 PM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 8:42:18 PM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]TCP Provider: An existing connection was forcibly closed by the remote host.   -- SQLSTATE=08S01, SQLCODE=10054, File=arssm.c, Line=1047
Error   9/17/2014 8:16:52 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 8:16:52 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 7:46:53 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 7:46:53 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]TCP Provider: An existing connection was forcibly closed by the remote host.   -- SQLSTATE=08S01, SQLCODE=10054, File=arsseg.c, Line=6509
Error   9/17/2014 7:16:53 AM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]TCP Provider: An existing connection was forcibly closed by the remote host.   -- SQLSTATE=08S01, SQLCODE=10054, File=arsseg.c, Line=6509
Error   9/17/2014 6:17:23 PM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 6:17:23 PM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 6:17:23 PM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
Error   9/17/2014 6:17:23 PM   OnDemand for Windows   13   None   DB Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure -- SQLSTATE=08S01, SQLCODE=0, File=arsseg.c, Line=6509
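Looking only at the 9/17 morning entries in the log above, the error bursts at 7:16:53, 7:46:53, 8:16:52, 8:46:53 and 9:16:53 are almost exactly 30 minutes apart, which might point at something scheduled or an idle timeout on the path between the servers; that is a guess from a small sample, not a diagnosis. A small Python sketch to collapse near-simultaneous entries into bursts and measure the gaps, assuming the timestamps are copied in the M/D/YYYY h:mm:ss AM/PM form shown; `burst_starts`, `gaps_in_minutes` and the 5-second window are illustrative choices.

```python
from datetime import datetime, timedelta
from typing import List

FMT = "%m/%d/%Y %I:%M:%S %p"  # e.g. "9/17/2014 7:16:53 AM"

# The 9/17 morning timestamps copied from the event log in this post.
MORNING_9_17 = [
    "9/17/2014 9:32:40 AM", "9/17/2014 9:32:40 AM", "9/17/2014 9:32:39 AM",
    "9/17/2014 9:16:53 AM", "9/17/2014 9:16:53 AM",
    "9/17/2014 8:46:53 AM", "9/17/2014 8:46:53 AM",
    "9/17/2014 8:16:52 AM", "9/17/2014 8:16:52 AM",
    "9/17/2014 7:46:53 AM", "9/17/2014 7:46:53 AM",
    "9/17/2014 7:16:53 AM",
]


def burst_starts(stamps: List[str],
                 window: timedelta = timedelta(seconds=5)) -> List[datetime]:
    """Sort timestamps and keep only the first entry of each burst.

    An entry within `window` of the previous entry is treated as part of the same burst.
    """
    times = sorted(datetime.strptime(s, FMT) for s in stamps)
    bursts: List[datetime] = []
    previous = None
    for t in times:
        if previous is None or t - previous > window:
            bursts.append(t)
        previous = t
    return bursts


def gaps_in_minutes(bursts: List[datetime]) -> List[float]:
    """Minutes between consecutive burst starts, rounded to one decimal place."""
    return [round((b - a).total_seconds() / 60, 1) for a, b in zip(bursts, bursts[1:])]


if __name__ == "__main__":
    starts = burst_starts(MORNING_9_17)
    for t in starts:
        print(t.strftime("%H:%M:%S"))
    print(gaps_in_minutes(starts))
```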


Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2231
  • CMOD Guru for hire...
    • Tenacious Consulting
Re: Communication break causing us to manually restart CMOD service
« Reply #6 on: September 19, 2014, 12:45:03 PM »
FYI, if it's for ISO compliance, my understanding of the requirements is that the database and the application can co-exist, as long as they're moved to a separate network segment which has its own set of restrictions (firewall rules, etc.).

Not knowing what your network infrastructure looks like, I can't recommend anything other than moving the rules to a physical router, instead of a local software firewall.

Good luck!

-JD.

bruce.mchendry

  • Guest
Re: Communication break causing us to manually restart CMOD service
« Reply #7 on: September 22, 2014, 06:12:16 AM »
Hi,
Thanks, Justin. I'm trying to gather some more data from our logs and have a ticket open with our Network and Server Ops guys to review the firewalls, antivirus (updates and scans), backups, updates, etc. Is there any internal logging in the CMOD app itself that I can access or enable that might tell me something? I'm also opening tickets with IBM and CMOD. We're just in Dev currently but are about to go to Stage/UAT, and now I've been sideswiped by finding out there will be folks in India testing. That means if it quits in the middle of the night it impacts them, and I sure don't want to get called at 1 a.m. just to restart a service. I appreciate the help and time so far with this. I really have to get to root cause as soon as I can.
Cheers

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2231
  • CMOD Guru for hire...
    • Tenacious Consulting
Re: Communication break causing us to manually restart CMOD service
« Reply #8 on: September 23, 2014, 03:57:49 AM »
You can turn on tracing...  I'm not sure how that's handled in Windows, but on the UNIX platforms, you change the level of logging in trace.settings, where you can tune and tweak what gets output.

http://www-01.ibm.com/support/docview.wss?uid=swg21330810

and

http://www-01.ibm.com/support/docview.wss?uid=swg21639908

Good luck!

-JD.

bruce.mchendry

  • Guest
Re: Communication break causing us to manually restart CMOD service
« Reply #9 on: October 09, 2014, 05:04:11 AM »
Hi,
So, as hard as this is to believe, we may have found the issue. Due to requirements here that I can't change, we have the SQL DB sitting on a remote server. A few days ago we noticed that Content Navigator was spitting out errors that it couldn't connect to the DB. Turns out that some DBA changed the port number. Ironically, the change was not a scheduled one, none of us knew it happened, and no one in the DBA area is 'fessing up, BUT all of a sudden everything has quieted down and our connection has stayed solid for 2 days, 3 if we get through today. My best guess (and I'm no expert on this) is that there was some kind of contention going on that the port change rectified. It's kind of goofy, but I'll take it and be happy that we have some stability again.
Cheers
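Since an unannounced port change turned out to matter here, a quick TCP reachability check against the SQL Server host and port that CMOD is configured to use can catch that kind of drift early. A minimal Python sketch using only the standard library; the host and port are placeholders (1433 is merely SQL Server's default for a default instance, and named instances or changed ports will differ), and `port_is_open` is a hypothetical helper. A successful connect only proves something is listening, not that it is the right SQL Server instance.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failure, ...
        return False


if __name__ == "__main__":
    # Placeholder values: replace with your DB host and the port CMOD is configured for.
    host, port = "127.0.0.1", 1433
    print(f"{host}:{port} reachable: {port_is_open(host, port)}")
```

Run on a schedule (Task Scheduler or similar), a check like this would flag a port change before users do.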

kbsiva

  • Guest
Re: Communication break causing us to manually restart CMOD service
« Reply #10 on: October 10, 2014, 11:56:37 AM »
Glad to know that you are closer to finding a root cause and, I am guessing from your posts, maybe catching up on some lost sleep as well :)


bruce.mchendry

  • Guest
Re: Communication break causing us to manually restart CMOD service
« Reply #11 on: October 15, 2014, 05:16:19 AM »
Hi,
Yes. As much as an unscheduled port change is not supposed to happen, it seems to be the fix. I'm not sure we'll ever really get to root cause, but at least we don't have daily outages and unhappy test and QA people. The next phase is to start moving from Dev to the Stage/Test/QA phase in that new environment, so we'll see how that goes. Thanks for the help, ideas and comments.  ;D