Backup Performance

A while ago, I was asked to look at the performance of a Save 21 (‘Save the entire system’).  The backup had been taking a little longer as time went by, because both the size and the number of objects were growing.  However, on a couple of days the backup took much longer.

At first I looked at PDI (‘Performance Data Investigator’) in Navigator for i.  I couldn’t find any noticeable differences between a normal backup and a slow one.

So I resorted to looking at the History Logs and the backup Job Logs.  While the elapsed time of the slow backup was 25% longer, the number of jobs running during that period was 41% higher.  This might be an unfair comparison, because the backup took longer and so was now running alongside overnight jobs that were expected to run after the backup.

So I compared the time difference at each library save, using the following SQL statement, to establish whether the change was gradual or not.  The statement picks up the Save messages for saves to a device or to a *SAVF.  The resulting graph is above (once imported into Excel).

-- Save completion messages from the first night
with first as (
  select MESSAGE_TIMESTAMP, MESSAGE_ID,
         -- job name is the part of FROM_JOB after the second '/'
         regexp_substr(FROM_JOB, '(?<=\/)\w*', 9) as Job_Name,
         case when MESSAGE_ID in ('CPC3701', 'CPC3722') then regexp_substr(MESSAGE_TEXT, '(?<=\slibrary\s)\w+')
              when MESSAGE_ID in ('CPC3732', 'CPC3733') then 'security'
              when MESSAGE_ID in ('CPC3736', 'CPC3737') then 'config'
              when MESSAGE_ID in ('CPC9410', 'CPC9063') then 'document'
              when MESSAGE_ID in ('CPF3837', 'CPF3838') then 'IFS'
         end as SAVED
  from table(qsys2.history_log_info(
         START_TIME => '2021-10-06-22.00.00.000000',
         END_TIME   => '2021-10-07-06.00.00.000000')) a
  where MESSAGE_ID in ('CPC3701', 'CPC9410', 'CPC3732', 'CPC3736', 'CPF3837',
                       'CPC3722', 'CPC3733', 'CPC3737', 'CPC9063', 'CPF3838')
),
-- The same messages from the second night
second as (
  select MESSAGE_TIMESTAMP, MESSAGE_ID,
         regexp_substr(FROM_JOB, '(?<=\/)\w*', 9) as Job_Name,
         case when MESSAGE_ID in ('CPC3701', 'CPC3722') then regexp_substr(MESSAGE_TEXT, '(?<=\slibrary\s)\w+')
              when MESSAGE_ID in ('CPC3732', 'CPC3733') then 'security'
              when MESSAGE_ID in ('CPC3736', 'CPC3737') then 'config'
              when MESSAGE_ID in ('CPC9410', 'CPC9063') then 'document'
              when MESSAGE_ID in ('CPF3837', 'CPF3838') then 'IFS'
         end as SAVED
  from table(qsys2.history_log_info(
         START_TIME => '2021-10-07-22.00.00.000000',
         END_TIME   => '2021-10-08-06.00.00.000000')) b
  where MESSAGE_ID in ('CPC3701', 'CPC9410', 'CPC3732', 'CPC3736', 'CPF3837',
                       'CPC3722', 'CPC3733', 'CPC3737', 'CPC9063', 'CPF3838')
)
-- Pair each save on the first night with the same save on the second night
select f.MESSAGE_TIMESTAMP as First_Time,
       s.MESSAGE_TIMESTAMP as Second_Time,
       -- MINUTE() returns only the minutes part of the timestamp duration;
       -- the days and hours parts are not included
       minute(s.MESSAGE_TIMESTAMP - f.MESSAGE_TIMESTAMP) as Minute_Diff,
       f.SAVED, f.MESSAGE_ID,
       f.JOB_NAME as First_Job, s.JOB_NAME as Second_Job
from first f, second s
where f.MESSAGE_ID = s.MESSAGE_ID
  and f.SAVED = s.SAVED;

If your backup runs between 10pm and 6am, you only need to change the date in 4 places, but be careful not to change the format.
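Because the timestamp format is easy to get wrong when editing by hand, here is a minimal Python sketch that builds the START_TIME and END_TIME strings for a given night. It assumes the 10pm to 6am window used above; the function name `backup_window` and its parameters are my own invention for illustration, not part of any IBM tooling.

```python
from datetime import date, datetime, time, timedelta


def backup_window(night: date, start_hour: int = 22, end_hour: int = 6) -> tuple[str, str]:
    """Return (START_TIME, END_TIME) strings for the backup starting on `night`.

    Assumes the window starts on `night` and ends the following morning.
    The format matches the literals in the query: YYYY-MM-DD-HH.MM.SS.ffffff
    """
    fmt = "%Y-%m-%d-%H.%M.%S.%f"  # %f always yields six fractional digits
    start = datetime.combine(night, time(hour=start_hour))
    end = datetime.combine(night + timedelta(days=1), time(hour=end_hour))
    return start.strftime(fmt), end.strftime(fmt)


if __name__ == "__main__":
    # The two nights compared in the query above
    for night in (date(2021, 10, 6), date(2021, 10, 7)):
        start, end = backup_window(night)
        print(f"START_TIME => '{start}', END_TIME => '{end}'")
```

The four values it prints can be pasted over the four timestamp literals in the statement.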

The next thing was to establish what was happening at the four points in time where there was a marked difference.  For most jobs, the number of jobs and their elapsed times were the same on both nights.  I did establish that there were a lot of ‘QZDASOINIT’ jobs.  These are ODBC or JDBC SQL jobs accessing IBM i databases.  The problem with these jobs is that they either stay active for a long period of time, or come and go within seconds.  Luckily there are CPIAD09 “User USER from client 192.1.1.1 connected to job 123456/QUSER/QZDASOINIT” messages in the History Log too.  This message also reports other jobs, such as QZDASSINIT, which is QZDASOINIT using SSL, and QPWFSERVSO, which is a connection to the IFS (not a mapped drive).
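If you export the CPIAD09 message text rather than the message tokens, the pieces can be pulled out with a regular expression. The sketch below is written in Python and assumes the message wording is exactly as quoted above; the real message may carry extra text after the job name, which the pattern ignores. The names `Connection` and `parse_cpiad09` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Connection(NamedTuple):
    user: str        # profile that connected
    client: str      # client IP address
    job_number: str  # six-digit job number
    job_user: str    # e.g. QUSER
    job_name: str    # e.g. QZDASOINIT, QZDASSINIT, QPWFSERVSO


# Assumed wording: "User <user> from client <ip> connected to job <nbr>/<user>/<name>"
CPIAD09 = re.compile(
    r"User (?P<user>\S+) from client (?P<client>\S+) "
    r"connected to job (?P<number>\d{6})/(?P<job_user>[^/\s]+)/(?P<job_name>[^/\s]+)"
)


def parse_cpiad09(text: str) -> Optional[Connection]:
    """Return the parsed connection, or None if the text does not match."""
    m = CPIAD09.search(text)
    if m is None:
        return None
    return Connection(m["user"], m["client"], m["number"], m["job_user"], m["job_name"])


if __name__ == "__main__":
    sample = "User USER from client 192.1.1.1 connected to job 123456/QUSER/QZDASOINIT"
    print(parse_cpiad09(sample))
```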

The following SQL will show you who is making these connections, to which job, and from where.  If you are clever, you can combine two copies of it to compare two nights, as I did above, changing the dates and times as necessary.

WITH Connections AS (
  SELECT from_user,
         -- positions 1-10 of the CPIAD09 message tokens are read as the job name,
         -- positions 64-78 as the client IP address
         SUBSTR(CAST(message_tokens AS VARCHAR(80) CCSID 37), 1, 10) AS job,
         SUBSTR(CAST(message_tokens AS VARCHAR(80) CCSID 37), 64, 15) AS ip
  FROM TABLE(qsys2.history_log_info(
         START_TIME => '2021-10-06-22.00.00.000000',
         END_TIME   => '2021-10-07-06.00.00.000000'))
  WHERE message_id = 'CPIAD09')
SELECT ip, from_user AS user, job, COUNT(*) AS connections
FROM Connections
GROUP BY ip, from_user, job;
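One way to compare two nights is to run the statement once per night, export both results, and compare them outside the database. The Python sketch below shows the idea. It assumes each exported row is an (ip, user, job, connections) tuple in the column order of the query; the function names and the sample rows are made up purely for illustration and are not taken from the system described in this post.

```python
from collections import Counter

Row = tuple[str, str, str, int]  # (ip, user, job, connections)


def percent_change(before: int, after: int) -> float:
    """Percentage change from `before` to `after`."""
    if before == 0:
        raise ValueError("no connections on the first night to compare against")
    return (after - before) / before * 100.0


def compare_nights(first_rows: list[Row], second_rows: list[Row]):
    """Return (overall % change, per-connection deltas sorted largest first)."""
    first: Counter = Counter()
    second: Counter = Counter()
    for ip, user, job, n in first_rows:
        first[(ip, user, job)] += n
    for ip, user, job, n in second_rows:
        second[(ip, user, job)] += n
    # A Counter returns 0 for a key seen on only one of the nights
    deltas = {key: second[key] - first[key] for key in set(first) | set(second)}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return percent_change(sum(first.values()), sum(second.values())), ranked


if __name__ == "__main__":
    # Invented sample rows, for illustration only
    night_1 = [("192.1.1.1", "USER", "QZDASOINIT", 30), ("192.1.1.2", "USER2", "QZDASSINIT", 10)]
    night_2 = [("192.1.1.1", "USER", "QZDASOINIT", 30), ("192.1.1.2", "USER2", "QZDASSINIT", 14),
               ("192.1.1.3", "USER3", "QPWFSERVSO", 6)]
    change, ranked = compare_nights(night_1, night_2)
    print(f"connections changed by {change:.0f}%")
    for key, delta in ranked:
        print(key, delta)
```

Sorting by the largest increase puts the users and clients that account for the extra connections at the top of the list.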

P.S. I recommend using ‘New Navigator for i’ for PDI and much more; it is much faster and uses fewer resources.  You need to be up to date with PTFs, but you only need to install it on a single partition that can link to the others.  That’s right: you only need one partition with V7R4 TR5 to use New Navigator for i for any of your partitions.  It won’t work very well when linked to partitions at 7.2 or earlier.  It uses the standard Host Servers on IBM i.

For PDI you can save and restore the Collection data, to analyse it from any other partition without impact to the original partition.  If you have a suitable partition, use it to try out New Navigator.

Comments

4 responses to “Backup Performance”

  1. Please note, if you copy the text, the last character of ‘CPC3722′ is not the right apostrophe. You have to change it twice.
    Also, the character in s.MESSAGE_TIMESTAMP – f.MESSAGE_TIMESTAMP is not a minus sign. Replace it.

  2. And the reason for the slow backups was??

    Terry Bartlett

    David,
    Basically, the identifiable jobs that ended during the backup were more or less the same on both nights. While lots of QZDASOINIT jobs had ended, quite a few had not. This made me look at how I could measure what they were doing. When I used the Connections SQL, I found that there were 65% more connections on the day of the longer backup. The actual problem was probably the memory used by these jobs. It is also possible for jobs to lock files that are being backed up, but that was not the problem in this case. Backups run better if they have lots of memory available. If possible, running backups in a restricted state, or a near-restricted state, is advisable.

    Terry Bartlett

    Juergen,
    I think I have fixed it. Thanks.
