Author Topic: Alternative delete methods  (Read 4061 times)

Lars Bencze

  • Full Member
  • ***
  • Posts: 116
  • CMOD Expert at Skandia
    • View Profile
    • INACTIVE - Bezland Consulting
Alternative delete methods
« on: January 02, 2018, 02:07:37 AM »
With the GDPR coming into effect in May 2018, many companies need better ways to delete their documents, especially from OnDemand.
Most regulatory agencies don't accept the "lazy delete" done with arsdoc delete and by the ODWEK API, where only the database record pointing to the document is removed, while the document data itself is left intact on disk (and other storage).
(Most of you guys here on the forum would succeed in restoring such a "deleted" document, if you had access to the database and the files on disk.)

Are there any other options?
If we do an export + reload (after removing the document that is to be deleted) and then unload the original load, a new (minor) problem appears: the segmentation order is disrupted. Example: say that a given Application Group has 100 or more segment tables, and that they were created sequentially by daily printouts.
Then you unload and reload a batch, which for this example is 5 years old. When you reload it, it ends up in the CURRENT segment table, and the START_DT (Start date) column for the current table will be set to a much earlier date. If you repeat this, the segmentation will eventually become really messed up and searches will be slower, since a lot of (unnecessary) tables will be searched.
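To make the segmentation problem concrete, here is a minimal Python sketch. It is not how CMOD is implemented; the `Segment` class, the segment names and the dates are all made up for illustration. It only assumes that a search visits every segment table whose date range overlaps the query range.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Segment:
    """Hypothetical stand-in for one segment table and its date range."""
    name: str
    start: date
    stop: date


def segments_to_search(segments, query_from, query_to):
    """Return the names of segments whose date range overlaps the query."""
    return [s.name for s in segments
            if s.start <= query_to and s.stop >= query_from]


# Three segments created sequentially; the last one is the current, open one.
segments = [
    Segment("SEG1", date(2013, 1, 1), date(2013, 12, 31)),
    Segment("SEG2", date(2014, 1, 1), date(2014, 12, 31)),
    Segment("SEG3", date(2018, 1, 1), date(2018, 1, 2)),
]

query = (date(2013, 3, 1), date(2013, 3, 31))
before = segments_to_search(segments, *query)

# Reloading a 2013 batch lands in the current segment and pulls its
# start date back to the oldest document it now contains.
segments[2].start = date(2013, 3, 15)
after = segments_to_search(segments, *query)

print(before)  # ['SEG1']
print(after)   # ['SEG1', 'SEG3']
```

After the reload, any query for dates between March 2013 and today also has to visit the current segment, which is the slowdown described above.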

Are there any better methods out there to delete data from the FAA* files?
Has anyone been bold/crazy enough to investigate a solution which overwrites part of the data file on disk? (NOT recommended!)
Can you "re-open" a segment table for writing, temporarily? (As far as I know, you can only close a table and that automatically creates a new one. Of course, you could close the current table, reload the old data into a new table, and then close that table too. But that would create a whole lot of new tables over time.)
Can you forcefully move a batch of documents from one segment table to another?
Any other solution?

Please share your thoughts and solutions here. Also, if you happen to know that IBM has a solution for this up its sleeve, I'd like to know.
OnDemand for MP expert. #Multiplatforms #Admin #Scripts #Performance #Support #Architecture #PDFIndexing #TSM/SP #DB2 #CustomSolutions #Integration #UserExits #Migrations #Workflow #ECM #Cloud #ODApi

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2230
  • CMOD Guru for hire...
    • View Profile
    • Tenacious Consulting
Re: Alternative delete methods
« Reply #1 on: January 02, 2018, 01:24:30 PM »
Enhanced Retention Management.  It reloads-under-the-covers, and keeps table segmentation intact.  This works for *most* situations, except where the back-end storage is WORM, since the media needs to be destroyed in order to be considered 'deleted'.

Happy New Year!

-JD.
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

Nolan

  • Full Member
  • ***
  • Posts: 152
    • View Profile
Re: Alternative delete methods
« Reply #2 on: January 02, 2018, 04:39:11 PM »
I don't believe Enhanced Retention Management is a solution to the real-world problem. Enhanced Retention Management is good for situations where you need a legal hold on a few documents, where 90% or more of the documents in the load will follow the regular delete cycle. I think IBM needs to try again and create a supported solution that deletes a document from the repository without unloading/reloading, and that ties into a Records Management system.

Note that Enhanced Retention Management is only the OnDemand side; you still need to write/build/customize integration to manage the hold on the documents or, worse, leave it up to the users!

J.

#zOS #AIX #Windows #Multiplatforms
#DB2 #TSM #ODF #zODF #ODWEK
#CapacityPlanning #AFP #ReportDistribution
#Finance #ICN

Justin Derrick

Re: Alternative delete methods
« Reply #3 on: January 02, 2018, 06:16:38 PM »
The snag here is how CMOD compresses and bundles objects together, and I'm fairly certain that having the ability to 'wipe' individual documents would break the compression / bundling.

Maybe add it as an enhancement thread if we can't find a good solution here.

I guess the big question is, what's the range of dates in the documents you're loading all at once?  If a batch of documents is within, say, a 7 or 30 day window, I don't see this as a big issue, as documents will expire and be deleted in short order after their official expiration date.  If we're talking about one-off deletions, like deleting an individual document, then maybe there needs to be a way to specify that an object is replaced and re-written without that index record -- but the problem is that your 'archive' system is suddenly editable -- and that would likely threaten the credibility of a document produced from that system.

However, if the document is a 'bad' one from a batch of loaded files, then that's a data quality issue that should be addressed at the source.

It's a very big question, and something I've thought about for years but never really talked about before now...  :)

-JD.

Nolan

Re: Alternative delete methods
« Reply #4 on: January 03, 2018, 08:22:34 AM »
Agreed, compression and bundling create a serious challenge here. I think a hybrid solution which "might" appease the auditors would be to make it impossible to recreate a deleted record in the table after the lazy delete has been issued: use a key GUID to validate all the indexed rows, and then, when a "lazy" delete is done, rebuild all the remaining keys so that going back is impossible.



J.


Justin Derrick

Re: Alternative delete methods
« Reply #5 on: January 03, 2018, 08:51:29 AM »
Actually, this provides for an interesting solution.  CMOD v10.1 supports encryption.  It would be nice to provide an encryption key for each individual document.  Yes, it would slaughter performance, but it would make each individual document irrecoverable when the row is deleted.  It would also be possible to detect any tampering with an individual load by hashing all of the keys together and storing that hash in the arsload table...

In this scenario, an individual document row could be deleted in the database, eliminating the key, making the document irrecoverable.  A change to an existing document on disk would be impossible without the key from the database.  Any change to the file on disk would break the encryption.  Any row deleted would cause a 'verification' of the load to fail, since the missing key would break the hash in the arsload table.
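A minimal Python sketch of the key-hashing part of this idea follows. Nothing here is an existing CMOD feature: the document ids, the `load_fingerprint` helper and the choice of SHA-256 are assumptions for illustration, and the actual encryption of the document bytes is left out (it would need a crypto library outside the standard library).

```python
import hashlib
import secrets


def load_fingerprint(keys):
    """Hash all document keys of one load together, in a fixed order."""
    digest = hashlib.sha256()
    for doc_id in sorted(keys):
        digest.update(doc_id.encode("utf-8"))
        digest.update(keys[doc_id])
    return digest.hexdigest()


# Hypothetical index rows: document id -> per-document 256-bit key.
doc_keys = {f"doc-{n}": secrets.token_bytes(32) for n in range(5)}

# Computed once at load time and stored with the load record.
stored_fingerprint = load_fingerprint(doc_keys)
intact = load_fingerprint(doc_keys) == stored_fingerprint

# "Deleting" a document means deleting its row, and with it the only
# copy of the key needed to decrypt that document's bytes on disk.
del doc_keys["doc-3"]
still_verifies = load_fingerprint(doc_keys) == stored_fingerprint

print(intact)          # True
print(still_verifies)  # False
```

The second result shows the trade-off described above: once a row and its key are gone, the load no longer verifies against the stored hash, so a legitimate delete would need some way to record that the change was authorized.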

Anyone care to expand on that idea?

-JD.

Nolan

Re: Alternative delete methods
« Reply #6 on: January 03, 2018, 09:51:25 AM »
That is along the lines of what I was thinking.  Now to get IBM to build it  :D

J.


Lars Bencze

Re: Alternative delete methods
« Reply #7 on: January 08, 2018, 04:06:33 AM »
Hi, very interesting thoughts.
According to another source I have (I have not verified this yet due to a lack of time), the ERM does NOT keep the segmentation intact.
Do you have a source where I can verify that this is indeed the case?
From my tests with ERM, it does not delete or reload jack, unless you run arsmaint -D 100, but to my understanding that is not part of ERM but of base CMOD.
(Running "arsmaint -D 100 ... -G AppGroup" is another thing I have not verified yet. During my last attempt, it seemed to try to reload every single LoadID in the Application Group, which is NOT an option, as you understand... :) )

Justin Derrick

Re: Alternative delete methods
« Reply #8 on: January 08, 2018, 07:21:25 AM »
The info came from IBM in a presentation.  I'll try and get a written source for you.  :)

Edit:
Here's the IBM CMOD Enhanced Retention Management documentation:  https://www.ibm.com/support/knowledgecenter/SSEPCD_10.1.0/com.ibm.ondemand.erm.doc/doder200.htm -- and no, it doesn't contain any evidence of my assertion about how it works.  I've sent a note to the developer for confirmation.

-JD.
« Last Edit: May 24, 2018, 09:43:14 AM by Justin Derrick »

Nolan

Re: Alternative delete methods
« Reply #9 on: January 08, 2018, 10:12:37 AM »
Lars, perhaps you need to adjust your settings for it to unload jack :)

From the document shared by Ed/Justin.

To help you control how often Content Manager OnDemand reloads a load, include the -D flag when you run the arsmaint and arsadmin unload commands as part of your expiration process. The -D flag indicates that Content Manager OnDemand should reload a load when the number of documents with a hold in an application group changes by a specified percentage from the previous time the application group was loaded.
When Content Manager OnDemand needs to reload an application group, it does the following tasks:

a. Extracts all the documents that have holds applied and their related index data.
b. Loads all the held documents and their related index data into a new load.
c. Deletes the original load (all files from cache and the index data from the OnDemand databases).
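As a rough illustration of the quoted steps, here is a small Python sketch. It is only one reading of the documentation, not IBM's implementation: the function names, the in-memory dictionaries, and the assumption that the percentage is measured relative to the previous hold count (and compared with >=) are all hypothetical.

```python
def hold_change_percent(previous_holds, current_holds):
    """Percentage change in held documents since the previous load."""
    if previous_holds == 0:
        return 100.0 if current_holds else 0.0
    return abs(current_holds - previous_holds) * 100.0 / previous_holds


def reload_load(documents, held_ids):
    """Steps a-c: keep only held documents in a new load, drop the old one."""
    # a + b: extract the held documents and put them into a new load.
    new_load = {doc_id: data for doc_id, data in documents.items()
                if doc_id in held_ids}
    # c: delete the original load.
    documents.clear()
    return new_load


old_load = {"A": b"...", "B": b"...", "C": b"...", "D": b"..."}
held = {"B"}
threshold = 50  # hypothetical percentage, standing in for the -D value

new_load = None
if hold_change_percent(previous_holds=2, current_holds=len(held)) >= threshold:
    new_load = reload_load(old_load, held)

print(new_load)  # {'B': b'...'}
print(old_load)  # {}
```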
J.


Justin Derrick

Re: Alternative delete methods
« Reply #10 on: January 08, 2018, 02:22:52 PM »
Okay, confirmed by the developer -- it doesn't work the way I thought: loads go into the open database segment.  I foresee an Enhancement Request in my future.  :)

-JD.

Lars Bencze

Re: Alternative delete methods
« Reply #11 on: January 09, 2018, 07:31:55 AM »
Thank you both Justin and Nolan for your help.
Yes, I read the same documentation and noted that it could be reloading it into the old table, but it was unspecified.

PS: I will go searching for the "jack" setting when I have some spare time.... ;) Or maybe we will write an add-on for that too.

Stephen McNulty

  • Jr. Member
  • **
  • Posts: 57
    • View Profile
Re: Alternative delete methods
« Reply #12 on: January 23, 2018, 07:32:33 AM »
> Actually, this provides for an interesting solution.  CMOD v10.1 supports encryption.  It would be nice to provide an encryption key for each individual document.  Yes, it would slaughter performance, but it would make each individual document irrecoverable when the row is deleted.  It would also be possible to detect any tampering with an individual load by hashing all of the keys together and storing that hash in the arsload table...
>
> In this scenario, an individual document row could be deleted in the database, eliminating the key, making the document irrecoverable.  A change to an existing document on disk would be impossible without the key from the database.  Any change to the file on disk would break the encryption.  Any row deleted would cause a 'verification' of the load to fail, since the missing key would break the hash in the arsload table.
>
> Anyone care to expand on that idea?
>
> -JD.

Perhaps, along this line of thinking: during the arsdoc delete, we know the object location, offset and length of the compressed document within the storage object, so we could overwrite those bytes.
#ISERIES #ODWEK #XML

Justin Derrick

Re: Alternative delete methods
« Reply #13 on: January 23, 2018, 08:05:42 AM »
> Perhaps, along this line of thinking: during the arsdoc delete, we know the object location, offset and length of the compressed document within the storage object, so we could overwrite those bytes.

I think this breaks the compression in the file.  I'm under the impression that there are compressed blocks aggregated into stored objects, and inside those compressed blocks are multiple documents, so blanking out one would break the rest of the objects in that compressed block.
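The effect can be shown with a small Python sketch. It uses the standard `zlib` module purely as a stand-in; that CMOD's compressed blocks behave like a zlib stream is an assumption, and the sample documents are invented.

```python
import zlib

# Fifty small "documents" bundled into one compressed block.
documents = [f"Statement {n}: account {n * 7919}, balance {n * n}\n".encode()
             for n in range(50)]
bundle = b"".join(documents)
block = zlib.compress(bundle)

# Wipe 16 bytes in the middle of the compressed block, as an in-place
# overwrite of "one document's bytes" would do.
start = len(block) // 2
wiped = block[:start] + b"\x00" * 16 + block[start + 16:]


def block_is_damaged(data, expected):
    """True if the block no longer decompresses to the original bundle."""
    try:
        return zlib.decompress(data) != expected
    except zlib.error:
        return True


print(block_is_damaged(block, bundle))  # False
print(block_is_damaged(wiped, bundle))  # True
```

With a stream compressor like this, everything after the wiped bytes depends on what came before, so the overwrite takes out the neighbouring documents as well, not just the targeted one.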

-JD.