OnDemand User Group

Support Forums => MP Server => Topic started by: jsquizz on June 08, 2021, 05:00:42 AM

Title: Upgrade from Redhat 6-7, CMOD 9.5-10.5, Latency Issues
Post by: jsquizz on June 08, 2021, 05:00:42 AM
A little background on our environment: one library server and one object server, RHEL + DB2. DB2 storage is on a SAN Veritas cluster, and the CMOD cache is also on the Veritas cluster.

We just upgraded from RHEL 6.8 to RHEL 7.9 (Maipo). After a few days we noticed slow response times from our APIs that call CMOD through ODWEK (WebLogic). These requests would time out after around 20 seconds, impacting our customers. We also see instances where loading is slow; for example, 12.8 seconds to load a 500 KB file with 11 documents using the Generic indexer (user-defined HTML file).

We opened a case with IBM, and the first suggestion was to upgrade CMOD to 10.5. We did that, and the CMOD upgrade went perfectly fine, but the issue persisted. We checked all of our DB2 (V11.1 FP4) logs and don't see anything. CMOD support says they can't see anything wrong, and I agree with them. The only thing they are seeing is a pause in the System Log: a 40-50 second delay with absolutely no activity, whereas there is usually something going on throughout the day. After that pause, there are some 6-second queries in the 226 records.

We also pulled in DB2 support, and we are sending them updates, db2mon output, and DB2 logs, but I don't feel like we are getting anywhere.

Veritas is looking at our system, along with Redhat support.

We've engaged several of our DBAs, storage engineers, and system admins to address the issues.

One of our SAs suggested rolling back the RHEL kernel to 7.8, due to some major features added in 7.9. We did that overnight, and sure enough when I logged on this morning, no bueno.

The biggest issue is that the problem is intermittent. Our next step is to run everything on the library server alone instead of splitting library and object servers, to rule out any kind of network latency. I also think it might be worth a shot moving the DB2 volumes off Veritas to local disk, or something similar.
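Before moving the DB2 volumes, it may help to put numbers on the Veritas-versus-local-disk question. The sketch below is a rough, hypothetical probe in Python, not an IBM or Veritas tool: it times small `write` + `fsync` calls in a directory on each volume, which only approximates the synchronous I/O that DB2 logging does. The mount points `/db2data/probe` and `/var/tmp` are placeholders; substitute your own.

```python
import os
import statistics
import tempfile
import time

# Hypothetical mount points: replace with a directory on the Veritas volume
# that holds the DB2 containers and a directory on local disk.
CANDIDATES = [("veritas", "/db2data/probe"), ("local", "/var/tmp")]


def fsync_latencies(directory: str, samples: int = 20, block_size: int = 4096) -> list[float]:
    """Time small synchronous writes (write + fsync) in `directory`.

    Returns one latency in milliseconds per sample. A scratch file is
    created in `directory` and removed afterwards.
    """
    payload = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync_probe_")
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data through to the storage device
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(label: str, latencies: list[float]) -> str:
    """Format min/median/max latency for one volume."""
    return (f"{label}: min={min(latencies):.2f} ms "
            f"median={statistics.median(latencies):.2f} ms "
            f"max={max(latencies):.2f} ms")


if __name__ == "__main__":
    for label, directory in CANDIDATES:
        if os.path.isdir(directory) and os.access(directory, os.W_OK):
            print(summarize(label, fsync_latencies(directory)))
        else:
            print(f"{label}: {directory} missing or not writable, skipped")
```

Because the problem is intermittent, running the probe several times, including during a slow period, and comparing the max column is likely more telling than a single median.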

I'm open to any feedback, especially if anyone has run into issues going from RHEL 6 to RHEL 7.

Title: Re: Upgrade from Redhat 6-7, CMOD 9.5-10.5, Latency Issues
Post by: Ed_Arnold on June 08, 2021, 07:58:48 AM
> One of our SAs suggested rolling back the RHEL kernel to 7.8, due to some major features added in 7.9. We did that overnight, and sure enough when I logged on this morning, no bueno.

That means that didn't help?  That you're seeing the same symptoms?

Ed
Title: Re: Upgrade from Redhat 6-7, CMOD 9.5-10.5, Latency Issues
Post by: jsquizz on June 08, 2021, 08:34:51 AM
Yeah, same symptoms. They seemed adamant that it would resolve our issues.

I'm wondering if some setting was changed at the OS level that's being missed, something causing us tons of grief that everyone is overlooking.
Title: Re: Upgrade from Redhat 6-7, CMOD 9.5-10.5, Latency Issues
Post by: Ed_Arnold on June 08, 2021, 01:14:00 PM
Confessing that I know next to nothing about Linux, I was curious, so I did a quick Google search on Red Hat 7 performance and found this:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/pdf/performance_tuning_guide/red_hat_enterprise_linux-7-performance_tuning_guide-en-us.pdf

Whew!  Too many ways to monitor or knobs to turn for me!

Ed
Title: Re: Upgrade from Redhat 6-7, CMOD 9.5-10.5, Latency Issues
Post by: jsquizz on June 09, 2021, 11:37:37 AM
So we worked on this quite a bit over the past few days, and we've traced it back to an issue with our storage: slow response times. The disks are being upgraded and mirrored as we speak; hopefully that works well. The last time I supported something like this was years ago. Here's where we first saw an indication of the problem:

00:46:45.195162: ARS4342I Storage Node >CACHE_OBJEC
00:46:45.195212: ARS4312I Loading started, --UNKNOW
00:46:46.705322: ARS1144I OnDemand Load Id = >5041-
00:47:01.442199: ARS1146I Loaded 2 rows into the database
00:47:03.847714: ARS1175I Document compression type used = OD77
00:47:03.847789: ARS4310I Loading completed

23 seconds for a 241 KB document to load, on a relatively new RHEL box with 64 GB of memory. It's a Generic-indexed PDF. For comparison, another document of the same size loaded in 0.39 seconds.

Need to optimize that as well.
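Reading the gaps in a trace like the log excerpt above by hand is easy for six lines but tedious for a full System Log export. A minimal Python sketch, assuming each line starts with an `HH:MM:SS.ffffff:` timestamp followed by the message ID, as in the pasted excerpt (that layout is taken from this post, not from CMOD documentation). The function name `step_durations` and the 5-second threshold are illustrative choices.

```python
import re

# Matches "HH:MM:SS.ffffff: MSGID text" as in the pasted excerpt.
LINE_RE = re.compile(r"^\s*(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?):\s*(\S+)\s*(.*)$")


def step_durations(lines, threshold=5.0):
    """Return (seconds_since_previous_line, msg_id, text, is_slow) tuples."""
    results = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines without a timestamp
        hours, minutes, seconds, msg_id, text = match.groups()
        current = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        if previous is not None:
            delta = current - previous
            if delta < 0:  # the trace crossed midnight
                delta += 24 * 3600
            results.append((delta, msg_id, text, delta >= threshold))
        previous = current
    return results


TRACE = """\
00:46:45.195162: ARS4342I Storage Node >CACHE_OBJEC
00:46:45.195212: ARS4312I Loading started, --UNKNOW
00:46:46.705322: ARS1144I OnDemand Load Id = >5041-
00:47:01.442199: ARS1146I Loaded 2 rows into the database
00:47:03.847714: ARS1175I Document compression type used = OD77
00:47:03.847789: ARS4310I Loading completed
""".splitlines()

if __name__ == "__main__":
    for delta, msg_id, text, slow in step_durations(TRACE):
        flag = "  <-- slow" if slow else ""
        print(f"{delta:8.3f} s before {msg_id} {text}{flag}")
```

On the excerpt, the only gap flagged is about 14.7 seconds, between ARS1144I (load ID assigned) and ARS1146I (rows loaded into the database). The same function can be pointed at a whole day's System Log lines to find silent stretches like the 40-50 second pause described earlier.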

Will update this when and if our disk changes work :D
Title: Re: Upgrade from Redhat 6-7, CMOD 9.5-10.5, Latency Issues
Post by: Justin Derrick on June 09, 2021, 01:20:57 PM
CMOD performance tuning is a bit of an art since there are so many moving parts. I'd turn on maximum logging for that application group (AG), then check all the records in the System Log to see which steps are taking the longest. It seems like you've done that, and storage is a good place to start, especially because cloud storage is way slower than cache / TSM.

You might be able to test your storage backend with a tool like 'curl' to access the S3 API, and that will help you narrow it down.
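If you want repeatable numbers rather than one-off curl runs, the same check can be scripted. A minimal standard-library Python sketch, assuming you already have a presigned GET URL for a single stored object (a hypothetical input here; an unsigned request to a private bucket will simply be rejected). It measures whole-request wall-clock time, which is only a rough stand-in for what the object server experiences.

```python
import statistics
import sys
import time
import urllib.request


def time_gets(url: str, count: int = 5, timeout: float = 30.0) -> list[float]:
    """Fetch `url` `count` times; return seconds to read each full response."""
    timings = []
    for _ in range(count):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # include transfer time, not just time to headers
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Hypothetical usage: pass a presigned GET URL for one object.
    if len(sys.argv) == 2:
        results = time_gets(sys.argv[1])
        print(f"min={min(results):.3f}s median={statistics.median(results):.3f}s "
              f"max={max(results):.3f}s")
    else:
        print(f"usage: {sys.argv[0]} PRESIGNED_URL")
```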

-JD.
Title: Re: Upgrade from Redhat 6-7, CMOD 9.5-10.5, Latency Issues
Post by: jsquizz on June 09, 2021, 03:41:15 PM
That's exactly what we did.

We turned on query logging and were able to identify a gap in the CMOD System Log where there was ZERO activity for about 40 seconds, some kind of lag or delay.

So far, our disk changes worked and loads are working much better. Maybe I can sleep this evening.