OnDemand User Group

Tips and Tricks => Tips and Tricks => Topic started by: frasert on March 04, 2011, 12:59:43 PM

Title: Script to show doc counts for a list of app groups
Post by: frasert on March 04, 2011, 12:59:43 PM
This ksh script will produce output in csv format that shows the total # of documents for a given list of app groups.  First, create a file called ag_list that is a list of app groups, one per line.  Then run this script from the same dir.

Output looks like this:

appgroup1,GBP,0
appgroup2,VUK,2010
appgroup3,UUK,167
appgroup4,VVI,0
appgroup5,NVN,8
appgroup6,YTK,930

You could then save this as a .csv file so it can be opened directly in MS Excel.


** Run at your own discretion!  This does eat up resources on the server, so better to schedule this off hours! **


Code: [Select]
#!/bin/ksh93

function db2getagidname
{
  db2 +o connect to archive1
  db2 -x "select agid_name from arsag where name='$1'"
  db2 +o terminate
}

function db2gettables
{
  db2 +o connect to archive1
  db2 list tables | grep -E "^$1[0-9]+" | awk '{print $1}'
  db2 +o terminate
}

function db2gettblcnt
{
  db2 +o connect to archive1
  db2 -x "select count(*) from $1"
  db2 +o terminate
}

while read agname
do

  agidname=$(db2getagidname $agname)
  agidname=${agidname# *}
  agidname=${agidname%% *}
  tblnames=$(db2gettables $agidname)

  printf "$agname,$agidname,"

  let tblcnt=0
  for tbl in ${tblnames[@]}
  do
    typeset -i t=$(db2gettblcnt $tbl)
    let tblcnt=tblcnt+t
  done

  printf "$tblcnt\n"

done < ag_list
Title: Re: Script to show doc counts for a list of app groups
Post by: Alessandro Perucchi on March 06, 2011, 04:00:27 AM
Hi,

Nice script, but wow... so many connect - disconnect from DB2 is a little bit too much for my eyes!
I would enhanced like that:

Code: [Select]
#!/bin/ksh93

function db2getagidname
{
  db2 -x "select agid,agid_name from arsag where name='$1'"
}

function db2gettables
{
  db2 -x "select table_name from arsseg where agid=$1"
}

function db2gettblcnt
{
  db2 -x "select count(*) from $1"
}

db2 +o connect to archive1
while read agname
do

  agidname=$(db2getagidname $agname)
  agid=$(echo $agidname | awk '{print $1}')
  agidname=$(echo $agidname | awk '{print $2}')
  tblnames=$(db2gettables $agid)

  printf "$agname,$agidname,"

  let tblcnt=0
  for tbl in ${tblnames[@]}
  do
    typeset -i t=$(db2gettblcnt $tbl)
    let tblcnt=tblcnt+t
  done

  print "$tblcnt"

done < ag_list

db2 +o terminate

You can see also that I've simplified your DB2 queries, so no need to use too much piping to grep and awk...

You could have spared the

Code: [Select]
db2 list tables | grep -E "^$1[0-9]+" | awk '{print $1}'
by doing

Code: [Select]
db2 list tables | awk '/^'$1'[0-9]+/ {print $1}'
Which does exactly the same thing, and it's more efficient.

Well, in that kind of script efficiency doesn't matter too much :-)


And concerning the function db2gettables, I prefere to look directly at what CMOD will look at, and it is what is written in the ARSSEG segment.
If you have a table in which is called "ABA129_backup", then your script will see it, and include it in your count.
Which is maybe not what you wanted.
So by reading the ARSSEG table, you avoid such problem.

Cheers,
Alessandro
Title: Re: Script to show doc counts for a list of app groups
Post by: frasert on March 07, 2011, 07:54:11 AM
Nice!  Exactly the kind of input I was hoping for :)

Yes, now that you mention it, there were way too many db2 connects going on in that script.  Nice use of awk there as well.  And using arsseg is definitely better than using list tables.
Title: Re: Script to show doc counts for a list of app groups
Post by: j4jackycheng on May 02, 2012, 06:08:52 AM
BIG THANKS to both of you! @frasert and @AlessandroPerucchi.

@frasert, i have used your script as skeleton and wrote a few customized scripts to do more then counting and get the stats i wanted.
And @AlessandroPerucchi, thanks for your input in moving out the connect and terminate parts.
I have been using my script for more then half a year now.  Owe you a BIG THANKS. :)
Title: Re: Script to show doc counts for a list of app groups
Post by: dr_te_z on May 03, 2012, 12:48:45 AM
You can create a userview which gives you the same info. I use this:
Code: [Select]
   select varchar (
        case rownumber() over(order by a.table_name asc )
         when 1  then 'create view cmod_tablecount as ('
                 else 'union all '
         end
    || 'select char('
    || chr(39) 
    || lcase(rtrim(a.table_name))
    || chr(39)
    || ') as tabnam'
|| ', count (*) as tabcnt from '
    || lcase(rtrim(a.table_schema))
    || '.'
    || lcase(rtrim(a.table_name))
    || case rownumber() over(order by a.table_name desc)
       when 1  then ');'
               else space(1)
       end,128
   ) as view_definition
   from sysibm.tables   a
   where                a.table_type       = ucase('base table')
     and  length(rtrim( a.table_name  ))   =  4
     and  substr      ( a.table_name,3,1 ) = ucase('a')
     and  substr      ( a.table_name,4,1 ) between '1' and '9'
   order by             a.table_name asc
;                                                                                               
the output of this query must be captured in a file and fed back into db2
Title: Re: Script to show doc counts for a list of app groups
Post by: Alessandro Perucchi on May 03, 2012, 04:11:02 AM
You can create a userview which gives you the same info. I use this:
Code: [Select]
   select varchar (
        case rownumber() over(order by a.table_name asc )
         when 1  then 'create view cmod_tablecount as ('
                 else 'union all '
         end
    || 'select char('
    || chr(39) 
    || lcase(rtrim(a.table_name))
    || chr(39)
    || ') as tabnam'
|| ', count (*) as tabcnt from '
    || lcase(rtrim(a.table_schema))
    || '.'
    || lcase(rtrim(a.table_name))
    || case rownumber() over(order by a.table_name desc)
       when 1  then ');'
               else space(1)
       end,128
   ) as view_definition
   from sysibm.tables   a
   where                a.table_type       = ucase('base table')
     and  length(rtrim( a.table_name  ))   =  4
     and  substr      ( a.table_name,3,1 ) = ucase('a')
     and  substr      ( a.table_name,4,1 ) between '1' and '9'
   order by             a.table_name asc
;                                                                                               
the output of this query must be captured in a file and fed back into db2

Hello,

nice sql.

Nevertheless, my input is still true, if you want to be really exact, you must do your checks based on the content of the list of tables in the CMOD ARSSEG tables, otherwise you might end up counting wrong tables.

In your case, for example, you might not find the tables SL2, SL3, SL4, ... or tables with ABA1234, since you look per default only for tables with a length of 4 characters.

maybe it would be more useful to have something like that:

Code: [Select]
   select varchar (
        case rownumber() over(order by a.table_name asc )
         when 1  then 'create view cmod_tablecount as ('
                 else 'union all '
         end
    || 'select char('
    || chr(39) 
    || lcase(rtrim(a.table_name))
    || chr(39)
    || ') as tabnam'
|| ', count (*) as tabcnt from ' 
    || lcase(rtrim(a.table_name))
    || case rownumber() over(order by a.table_name desc)
       when 1  then ');'
               else space(1)
       end,128
   ) as view_definition
   from arsseg a order by a.table_name asc
;

Of course in order this to work, you might need to be the OnDemand instance owner :-)

Sincerely yours,
Alessandro

Title: Re: Script to show doc counts for a list of app groups
Post by: dr_te_z on May 03, 2012, 05:36:17 AM
maybe it would be more useful to have something like that:
Your version is better, thanks.
Title: Re: Script to show doc counts for a list of app groups
Post by: Lars Bencze on November 02, 2016, 06:28:22 AM
I'd like to up the challenge level a bit, and change the question a bit. Hopefully someone already knows the answer to this.

When you send AFP documents to OnDemand, it is possible to create more than one DB record per physical document. I.e. let's assume that you create an AFP file with 10 letters in it (10 documents), but each of these are related to two different persons. So you add two metadata records per document, i.e. 20 records/rows get added to the DB, but only 10 documents.

So the scripts above don't really count the number of DOCUMENTS in OnDemand, but rather the number of document RECORDS (or "rows") in the database.

Technically, it would be possible to run a SQL query against each table, and then group the result on the field that we know is duplicated. (In the example above, it could be "CustName" or "SocSecNo" or something similar.)

But assume that we do not always know which field is the one that is varied in non-unique records, how can we count the actual number of DOCUMENTS in OnDemand? Across quite a few Application Groups, which have a fairly large number of tables each, and different sets of metadata.

In the example above, let's say that Alessandros excellent scripts tell us that there are 1.342.917 records in one particular application group, and we happen to know that there have only been 642.387 documents loaded. What we want to find is that number - "642.387". How can we do that count?
 8) :o ;)

(Yes, I have a real-life request for this particular info.)

EDIT: For the record, I already know of and make use of this info:
http://www.odusergroup.org/forums/index.php?topic=2086.msg7928#msg7928

Title: Re: Script to show doc counts for a list of app groups
Post by: Justin Derrick on November 02, 2016, 09:31:50 AM
Unless you can provide the logic that determines which field to select as being the one to count/group, I don't think you'll find a generic SQL solution that meets your needs.

Of course, you could 'brute force' this problem by selecting/counting/grouping each AG Field column, and letting the user decide which number to use.

-JD.
Title: Re: Script to show doc counts for a list of app groups
Post by: Alessandro Perucchi on January 19, 2017, 12:59:55 PM
Easy peasy... not really a challenge...

in order to have really the number of documents and not the number of rows, then you must use such select :

Code: [Select]
select count(distinct doc_name,doc_off, doc_len, comp_off, comp_len, pri_nid, sec_nid) from <segment table name>
Because the combination of these 7 fields makes a entry to a single document in the storage manager (cache, tsm, HDFS, OAM, etc...)
To be really exact you need to do a count distinct from the union of all the segment tables, since it is possible that you are the same document reference in different segment table.
But I let that as an exercise to the people who wants to challenge themselves!! :-D

Regards,
Alessandro