OnDemand User Group

Support Forums => MP Server => Topic started by: DDP021 on January 28, 2019, 08:48:28 AM

Title: Unable to store object Failure
Post by: DDP021 on January 28, 2019, 08:48:28 AM
I may have posted something on this subject before, but it has started rearing its ugly head again. We periodically get this type of failure. In most cases we can rename the .failed file, and when we do, the file always loads successfully (at least up till now). Our issue is that we still have reports coming from the mainframe to CMOD which use OS/390. When one of those files fails, the mainframe job itself has to be rerun manually, and we risk losing data if the job runs multiple times and the data from the failed run gets overwritten. Our engineering group contacted ECS storage; here was their response:

"This represents a failure rate of 0.02 for clip writes and 0.01 for blob writes.
We generally advertise ECS availability of 99.9%, which you are well within.
There is no concern here. The application will automatically retry these failures, which will succeed upon retry."

The main issue here is that ERR doesn't support automatic retry. Our options are to manually rename the .failed file or manually rerun the mainframe job.
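If the load daemon really does pick a file up again once the .failed suffix is removed, as described above, the manual rename can at least be scripted (this does nothing for the mainframe-rerun case). A minimal sketch in Python, assuming (not confirmed in this thread) that a failed input keeps its original name with ".failed" appended and sits in the directory the daemon watches; `requeue_failed` and its arguments are hypothetical names.

```python
from pathlib import Path


def requeue_failed(load_dir: str, suffix: str = ".failed") -> list:
    """Rename every '<name><suffix>' file in load_dir back to '<name>'.

    Hypothetical helper: assumes a failed input keeps its original name with
    `suffix` appended, and that removing the suffix lets the load daemon
    pick the file up again. Returns the paths that were renamed.
    """
    requeued = []
    for failed in sorted(Path(load_dir).glob("*" + suffix)):
        original = failed.name[: -len(suffix)]
        if not failed.is_file() or not original:
            continue  # skip directories and a file literally named ".failed"
        target = failed.with_name(original)
        if target.exists():
            continue  # never overwrite a newer file that has the same name
        failed.rename(target)
        requeued.append(target)
    return requeued
```

Running it from cron against the daemon's input directory would replace the manual rename, but a file that keeps failing would be retried forever, so a cap on attempts is worth adding before relying on it.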

Is anyone else using ECS storage having any issues?

Appreciate any feedback

Dave
Title: Re: Unable to store object Failure
Post by: Justin Derrick on January 28, 2019, 12:32:02 PM
Heh, sounds like you need to get someone to promise 5-nines reliability. :)

It's kind of a crappy reply from your vendor. A 0.02 failure rate on a million writes is 20,000 failures. I don't think that's acceptable. There are plenty of CMOD systems that might write a million object files every week.
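The 20,000 figure reads the vendor's 0.02 as a fraction of writes. The vendor's note doesn't state a unit, and if it were meant as a percentage the count would be 100 times smaller. A quick sketch of the arithmetic under both readings (`expected_failures` is just an illustrative helper):

```python
def expected_failures(writes: int, rate: float, rate_is_percent: bool = False) -> int:
    """Expected number of failed writes for a given failure rate."""
    fraction = rate / 100 if rate_is_percent else rate
    return round(writes * fraction)


writes = 1_000_000
print(expected_failures(writes, 0.02))                        # 20000 (rate read as a fraction)
print(expected_failures(writes, 0.02, rate_is_percent=True))  # 200 (rate read as a percentage)
```

Either way the number is not zero, which is what matters when a single failure means rerunning a mainframe job by hand.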

From a CMOD perspective, I can only suggest writing an enhancement request here on the forums for a 'number of retries' parameter, so that CMOD makes multiple attempts to write an object, but that's a real kludge.

-JD.
Title: Re: Unable to store object Failure
Post by: DDP021 on January 29, 2019, 03:46:12 AM
Justin,

I agree. It sounded like a canned response to pretty much wash their hands of any responsibility, lol. It's funny how these errors hit; they seem to come in spurts. The last one we had was 01/26, and before that 01/21, but on 01/19 we had 16. Our main concern is if they hit during off hours, when there is a potential for the mainframe jobs to rerun and overwrite old data that didn't get loaded. I'll pass your info on to our engineering group and see how they want to pursue it. As always, I appreciate the response!

Take care

Dave
Title: Re: Unable to store object Failure
Post by: Alessandro Perucchi on January 31, 2019, 01:19:08 PM
Thank you for the feedback. We are planning to migrate from Centera to the new ECS in the next few months, so what you say is quite frightening!
Since we don't use the arsload daemon feature, we can modify our scripts to check for such errors and retry the load x times.
But I still really don't like that. I hope this is only a "firmware" problem with the ECS storage...
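For script-driven loads like these, the "try to reload x times" idea can be a small wrapper around whatever load command the scripts already run. A minimal sketch in Python, assuming the load command exits with a non-zero status when the load fails (check that against your own scripts, since matching the specific "unable to store object" error may be safer); `retry`, `run_load`, and their parameters are hypothetical names, and no arsload options are implied.

```python
import subprocess
import time
from typing import Callable, Sequence


def retry(action: Callable[[], bool], attempts: int = 3, delay_s: float = 30.0) -> int:
    """Call action() until it returns True or the attempts run out.

    Returns the 1-based attempt number that succeeded, or 0 if every attempt failed.
    Sleeps delay_s seconds between attempts (not after the last one).
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            time.sleep(delay_s)
    return 0


def run_load(cmd: Sequence[str]) -> bool:
    """Run one load command and treat exit status 0 as success (an assumption)."""
    return subprocess.run(list(cmd), check=False).returncode == 0
```

Usage would look like `retry(lambda: run_load(load_cmd), attempts=5, delay_s=60)`, where `load_cmd` is the existing arsload command line from your scripts. A return value of 0 is the point at which to alert someone instead of retrying silently.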
Title: Re: Unable to store object Failure
Post by: DDP021 on February 01, 2019, 03:16:57 AM
Well, we just took another "hit" on 01/31. This time it was 7 reports that received the "unable to store" error. As always, after rerunning them they all loaded fine. The last update we had was that they opened an SR with Dell. Apparently, with it being so intermittent, it's hard to diagnose, even though we give them specific times and errors every time. Hopefully they come back with some resolution. Hoping you don't encounter the same issues! Take care

Title: Re: Unable to store object Failure
Post by: Norbert Novotny on February 20, 2019, 02:03:07 AM
Hi guys, yes I can confirm this issue as well.

Our setup is CMOD on AIX with TSM on AIX accessing ECS via CAS API.

02/20/19   08:10:19      ANR2547E A Centera device (5) reported error "Unknown error (transid='sb002017/9008600/WRITE_BLOB'),FP_SERVER_-ERR" during command BlobWrite. (SESSION: 702829)
02/20/19   08:10:19      ANR0523W Transaction failed for session 702829 for node NODE01 (CMOD) - error on output storage device. (SESSION: 702829)
02/20/19   08:10:19      ANR0514I Session 702829 closed volume 00003C41.CNT. (SESSION: 702829)
02/20/19   08:10:19      ANR0403I Session 702829 ended for node NODE01 (CMOD). (SESSION: 702829)
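To catch these before someone notices a missing report, a script could scan the TSM activity log for the two failure message IDs in the excerpt above (ANR2547E and ANR0523W) and list the affected sessions. A minimal sketch in Python, assuming the log has already been exported as text lines laid out like the excerpt (date, time, message ID, text, ending in "(SESSION: n)"); how you export it, for example via the administrative client's QUERY ACTLOG, is left out, and `failed_sessions` is a hypothetical name.

```python
import re

# Failure message IDs taken from the activity-log excerpt in this post.
FAILURE_IDS = {"ANR2547E", "ANR0523W"}
SESSION_RE = re.compile(r"\(SESSION: (\d+)\)")


def failed_sessions(log_lines):
    """Return the set of TSM session numbers that logged one of FAILURE_IDS."""
    sessions = set()
    for line in log_lines:
        fields = line.split()
        # Assumed layout: date, time, message ID, then the message text.
        if len(fields) >= 3 and fields[2] in FAILURE_IDS:
            match = SESSION_RE.search(line)
            if match:
                sessions.add(int(match.group(1)))
    return sessions
```

Matching the session numbers and timestamps back to specific CMOD loads still has to be done on the CMOD side; this only tells you that, and when, the storage write failed.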


After a resubmit, everything worked well.
Cheers,
 N.
Title: Re: Unable to store object Failure
Post by: Alessandro Perucchi on February 20, 2019, 04:23:03 AM
What a cr@p... So apparently ECS may not be such a nice replacement for Centera after all...

Are there other solutions that "work"?