Topic: Document size on disk in AFP format

Johan Dahlgren

« on: January 25, 2022, 06:24:14 AM »
Hello,

I am having trouble calculating the average size of the documents stored in OnDemand folders and application groups. I have been manually comparing the various size columns in the tables with the actual size of the AFP file when it is written to disk outside of OnDemand.

We are working to convert our documents from AFP to PDF/A-2b and want an accurate way to estimate the time needed to do this. We fetch the documents from OnDemand and write them to disk in AFP format, then convert them using the AFP2PDF converter and finally store them in a new storage.

I have been looking at DOC_LEN in the tables named after the application group (AGID_NAME in ARSAG) and combining that with the DECOMP_SIZE of the corresponding resource from the ARSRES table. For the most part this seems to give me the correct size of the file, but when the resource DECOMP_SIZE is null the numbers are off. Examples below.

If I have a file with a DOC_LEN of 2000 and a RESOURCE with a DECOMP_SIZE of 4000, the document size on disk will be 6000 bytes.

But if I have a file with a DOC_LEN of 2000 and a RESOURCE with a DECOMP_SIZE of '-', the document size is not the expected 2000 bytes but often much bigger, even though there is no DECOMP_SIZE.

Am I way off in what I am trying to do? Is there a better way to calculate the expected number of bytes a document will have once it is written to disk?

Kind regards
Johan Dahlgren

Justin Derrick

« Reply #1 on: January 26, 2022, 10:36:11 AM »
The easiest way is to calculate an average compression ratio for an Application by using the input and output sizes in the System Load table, then calculate the average size of each document, then multiply it out by the number of documents.
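As a rough illustration of that arithmetic, here is a minimal Python sketch. It assumes you have already pulled, for each load of the Application, the input size, the stored (output) size and the number of documents; the `LoadRecord` fields and the `estimate_from_loads` function are hypothetical names for this example, not System Load column names.

```python
from dataclasses import dataclass


@dataclass
class LoadRecord:
    """One load of an Application. Field names are hypothetical."""
    input_bytes: int   # size of the input before compression
    output_bytes: int  # size stored after compression
    documents: int     # number of documents in the load


def estimate_from_loads(loads, total_documents):
    """Return (compression ratio, average document size, estimated total size)."""
    total_in = sum(load.input_bytes for load in loads)
    total_out = sum(load.output_bytes for load in loads)
    total_docs = sum(load.documents for load in loads)
    if total_out == 0 or total_docs == 0:
        raise ValueError("no usable load records")
    ratio = total_in / total_out          # average compression ratio
    avg_doc_bytes = total_in / total_docs  # average uncompressed document size
    return ratio, avg_doc_bytes, avg_doc_bytes * total_documents


# Made-up numbers, only to show the shape of the calculation.
loads = [LoadRecord(10000, 2500, 4), LoadRecord(6000, 1500, 4)]
ratio, avg_doc_bytes, total_bytes = estimate_from_loads(loads, 1000)
```

Summing over all loads before dividing weights large loads more heavily than small ones, which is usually what you want for a size estimate.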

I'd also be interested to know more about your project and how you're handling the conversion.  Can you share a little more information on your process?

-JD.

Johan Dahlgren

« Reply #2 on: January 28, 2022, 07:26:46 AM »
The System Load table, is that the one called SA in the database? In our case it is named SA2. I am not sure why there is no SA1, but it could be that it has been deleted, since it is set to expire after 3650 days.

I got a bit of help from our Database Admin with an SQL statement to get a decent number calculated. Here is an example of one such SQL query:
Code:
-- Average on-disk size (DOC_LEN + resource DECOMP_SIZE) for application group QLA.
-- Rows whose resource has a NULL DECOMP_SIZE are excluded from the average.
select x.NAME, x.AGID, x.AGID_NAME,
       AVG(BIGINT(x.DECOMP_SIZE + x.DOC_LEN)) as AVG
from (
    select AG.NAME, AG.AGID, AG.AGID_NAME, RES.DECOMP_SIZE, DOC.DOC_LEN
    from ARSRES as RES,
         ARSAG as AG,
         (
             -- all data tables of the application group
             select DOC_LEN, RESOURCE from QLA1 UNION ALL
             select DOC_LEN, RESOURCE from QLA2 UNION ALL
             select DOC_LEN, RESOURCE from QLA3 UNION ALL
             select DOC_LEN, RESOURCE from QLA4 UNION ALL
             select DOC_LEN, RESOURCE from QLA5
         ) as DOC
    where AG.AGID = RES.AGID
      and RES.RID = DOC.RESOURCE
      and AG.AGID_NAME = 'QLA'
) x
where x.DECOMP_SIZE is not null
group by x.NAME, x.AGID, x.AGID_NAME

It works well enough for us, we think. Combined with a select count(*), this gives us the approximate size of the application group.
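To turn the average and the count into a time estimate, something like the following Python sketch could be used. The function name and the throughput figure are assumptions for illustration; the throughput would have to be measured on a sample conversion run, and the result is only approximate because the average leaves out documents whose resource has no DECOMP_SIZE.

```python
def estimate_conversion(avg_doc_bytes, doc_count, bytes_per_second):
    """Return (estimated total bytes, estimated conversion time in seconds).

    avg_doc_bytes    -- result of the AVG query above
    doc_count        -- result of select count(*)
    bytes_per_second -- measured converter throughput (an assumed input)
    """
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    total_bytes = avg_doc_bytes * doc_count
    return total_bytes, total_bytes / bytes_per_second


# Made-up numbers: 6000-byte average, one million documents, 2 MB/s throughput.
total_bytes, seconds = estimate_conversion(6000, 1_000_000, 2_000_000)
hours = seconds / 3600
```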

I think I can try to describe an overview of our conversion process. Unfortunately no details, since the data is sensitive.

Basically we have a CM OnDemand that has been running for many years, I think since the early 2000s. Anyway, it was old when I started 11 years ago. :)
It has been receiving documents of various kinds over these years, slowly adding and changing with no regard for GDPR or for the fact that the data would eventually have to be removed or preserved for future statistics.

Start of our project, we were tasked with making a new document storage as well as moving and converting the old one.

What we are doing to migrate the documents:

1. We segment a folder search, usually into one year's worth of documents per folder, depending on the size of the folder (number of documents).

2. We write the AFP files (so far we have only worked with the AFP format, but we will have to migrate some TIF as well in the future) to disk along with the metadata about each document. We also give each document a unique name for when we put it in the new storage, and we include all the available resources, because they will be needed later in the conversion to PDF/A-1b.

3. Now that we have all the information we can get, we run two processes:
    A. Convert the file. This is done with the AFP2PDF converter. It takes a bit of fiddling with resources, such as font mapping and replacing missing pieces of information with other resources, but we got it to work fairly well.
    B. Update the metadata about the document. This part is a bit sensitive, but basically we correct dates that are wrong or missing, replace customer personal IDs with internal customer IDs, and apply rules for when a document should be erased or preserved.

4. Now most of the work is done, and we package the updated metadata about the documents with the new PDF files and send it to the new server, which inserts it all into its database and storage.
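The segmentation in step 1 could be sketched like this in Python. It only builds one date range per calendar year; the function name is hypothetical, and how each range is passed to the folder search is not shown because that depends on the tooling used.

```python
from datetime import date


def yearly_segments(first_year, last_year):
    """Return one (start, end) date pair per calendar year, inclusive."""
    if first_year > last_year:
        raise ValueError("first_year must not be after last_year")
    return [
        (date(year, 1, 1), date(year, 12, 31))
        for year in range(first_year, last_year + 1)
    ]


# Each pair would drive one folder search and one batch of AFP files.
segments = yearly_segments(2019, 2021)
```

For very large folders the same idea works with shorter ranges, for example one segment per month.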

That is the basic process, if you have any questions about some part of it I'll try to answer if I can.

EDIT: Unfortunately it seems that our SA table does not contain loads for all our application groups.
Code:
db2 "select count(*) from SA2 where LOAD_AGID=5863"

1
-----------
          0

  1 record(s) selected.
And I have resources in ARSRES that have the AGID 5863, but alas no DECOMP_SIZE.
« Last Edit: January 28, 2022, 09:47:59 AM by Johan Dahlgren »

Darrell Bryant

« Reply #3 on: January 31, 2022, 06:38:20 AM »
The System Load facility was first available about 2010 as a part of the 8.4.1.x server version. On Multi-platforms the creation of the System Load facility was optional and required customer action.
The first System Load table is SA2, there was never an SA1 (just as the first System Log table was SL2).

Johan Dahlgren

« Reply #4 on: February 02, 2022, 04:59:50 AM »
I see, that explains why it starts at SA2. Regarding System Load, the oldest entry I can find is from 2017. Does that mean that we chose to activate it in 2017 and anything before that was never logged? Once it is activated, can I trust that all loads are saved in the System Load facility, or does it depend on which load was run?

For example, say we have 2 folders, folder1 and folder2, and each of these has its own application groups and applications connected. If the System Load feature is activated, will it store all loads for both folders/application groups, or do you have to activate it for each folder/application group?

Darrell Bryant

« Reply #5 on: February 02, 2022, 06:29:29 AM »
Once the System Load facility is created / activated it is used by all Application Groups. If your System Load entries start in 2017, then yes, that is when someone in your organization ran the steps to create the facility.
Note that the System Load facility does not get updated if data is unloaded.