Hi,
My customer is experiencing strange backup failures on certain large filer volumes. The problem is that they are quite inconsistent. If anyone can share some insight, it will be greatly appreciated.
Environment: NBU 7.1.03 with an HP MS6000 library and 3 active drives: two HP LTO-3 drives daisy-chained with robotic control to a media server, and one HP LTO-4 drive directly attached to the NetApp filer.
Often, but not always, backups of large filer volumes to the directly attached LTO-4 drive fail with status 84. The same backups succeed when directed to the remotely attached LTO-3 drives.
In short, the error shows as:
07/11/2012 15:19:48 - Error ndmpagent(pid=5216) MOVER_HALTED media write error - reason = 5 (NDMP_MOVER_HALT_MEDIA_ERROR)
07/11/2012 15:19:48 - Error ndmpagent(pid=5216) NDMP backup failed, path = /vol/notes_vol
07/11/2012 15:19:49 - Error bptm(pid=5452) cannot write image to media id B246L4, drive index 3
07/11/2012 15:23:10 - Info bptm(pid=5452) EXITING with status 84 <----------
Has anyone seen something similar? Many thanks!
If anyone is interested, there are more details and logs below:
============
On the filer:
The ndmpd log (debug level 50) shows: "Wed Nov 7 15:23:20 GMT [tape.cmd.chkCondErr:error]: Tape device 2a.3: Check Condition: SCSI Op code Write(06)) (CDB 0x0a: 0x010000 bytes) hard error - Write error (0x3 - 0xc 0x0x0)."
/etc/system is hardly helpful: "DUMP: Media Error on tape write"
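For anyone reading the filer message above, here is a minimal sketch of how I read the numbers in it, using the standard SCSI sense key names. It is only an interpretation aid: the helper `decode_sense` is hypothetical, the ASC/ASCQ table holds just the one pair relevant here, and I am assuming the garbled "0x0x0" at the end of the message means ASCQ 0x00.

```python
# Standard SCSI sense key names (4-bit field in the sense data).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
}

# Only the additional sense code pair seen in this log; not a full table.
ASC_ASCQ = {
    (0x0C, 0x00): "WRITE ERROR",
}


def decode_sense(key, asc, ascq):
    """Return (sense key name, ASC/ASCQ description) for the given values."""
    key_name = SENSE_KEYS.get(key, "UNKNOWN SENSE KEY")
    detail = ASC_ASCQ.get((asc, ascq), "UNKNOWN ASC/ASCQ")
    return key_name, detail


# Values from the filer message: "(0x3 - 0xc 0x0x0)".
# ASSUMPTION: the trailing "0x0x0" is a garbled ASCQ of 0x00.
key_name, detail = decode_sense(0x3, 0x0C, 0x00)
print(key_name, "/", detail)

# The CDB in the same message reports a transfer of 0x010000 bytes.
transfer_bytes = int("0x010000", 16)
print(transfer_bytes)
```

So the drive itself reports a medium error on a Write(6) of 65536 bytes, which matches the "using 65536 data buffer size" line in the job details further down.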
NBU logs are better:
All Log Entries:
07/11/2012 15:15:20 hreuknetflow Info 0 General cleaning media DB(s)
07/11/2012 15:19:48 hreuknetflow hreuk3210 Error 7515 Backup MOVER_HALTED media write error - reason = 5 (NDMP_MOVER_HALT_MEDIA_ERROR)
07/11/2012 15:19:48 hreuknetflow hreuk3210 Error 7515 Backup NDMP backup failed, path = /vol/notes_vol
07/11/2012 15:19:49 hreuknetflow hreuk3210 Error 7515 Media Device cannot write image to media id B246L4, drive index 3
07/11/2012 15:23:10 hreuknetflow hreuk3210 Info 7515 Backup Status media write error
07/11/2012 15:23:10 hreuknetflow hreuk3210 Error 7515 Backup backup of client hreuk3210 exited with status 84 (media write error)
07/11/2012 15:23:55 hreuknetflow hreuk3210 Info 7516 Backup started backup job for client hreuk3210, policy test-hreukfs1-main, schedule Full on storage unit HREUKNETFLOW-HCART4-ROBOT-TLD-1-HREUK3210
07/11/2012 15:23:55 hreuknetflow hreuk3210 Info 7516 Backup client hreuk3210 handling path /vol/vol0
07/11/2012 15:25:01 hreuknetflow hreuk3210 Info 7516 Media Device begin writing backup id hreuk3210_1352301835, copy 1, fragment 1, to media id B246L4 on drive HP.ULTRIUM4-SCSI.000 (index 3)
07/11/2012 15:25:20 hreuknetflow Info 0 General cleaning media DB(s)
job detailed status:
07/11/2012 12:33:34 - Info nbjm(pid=3528) starting backup job (jobid=7515) for client hreuk3210, policy test-hreukfs1-main, schedule Full
07/11/2012 12:33:35 - estimated 0 Kbytes needed
07/11/2012 12:33:35 - Info nbjm(pid=3528) started backup job for client hreuk3210, policy test-hreukfs1-main, schedule Full on storage unit HREUKNETFLOW-HCART4-ROBOT-TLD-1-HREUK3210
07/11/2012 12:33:36 - Info bpbrm(pid=4708) hreuk3210 is the host to backup data from
07/11/2012 12:33:36 - Info bpbrm(pid=4708) reading file list from client
07/11/2012 12:33:36 - started process bpbrm (4708)
07/11/2012 12:33:36 - connecting
07/11/2012 12:33:37 - Info bpbrm(pid=4708) starting ndmpagent on client
07/11/2012 12:33:37 - Info ndmpagent(pid=5216) Backup started
07/11/2012 12:33:37 - connected; connect time: 00:00:01
07/11/2012 12:33:37 - Info bptm(pid=5452) start
07/11/2012 12:33:38 - Info bptm(pid=5452) using 30 data buffers
07/11/2012 12:33:38 - Info bptm(pid=5452) using 65536 data buffer size
07/11/2012 12:33:38 - Info bptm(pid=5452) start backup
07/11/2012 12:33:38 - Info bptm(pid=5452) Waiting for mount of media id B246L4 (copy 1) on server hreuknetflow.
07/11/2012 12:33:38 - mounting B246L4
07/11/2012 12:34:29 - Info bptm(pid=5452) media id B246L4 mounted on drive index 3, drivepath nrst1a, drivename HP.ULTRIUM4-SCSI.000, copy 1
07/11/2012 12:34:29 - mounted; mount time: 00:00:51
07/11/2012 12:34:35 - positioning B246L4 to file 1
07/11/2012 12:34:51 - positioned B246L4; position time: 00:00:16
07/11/2012 12:34:51 - begin writing
07/11/2012 15:19:48 - Error ndmpagent(pid=5216) MOVER_HALTED media write error - reason = 5 (NDMP_MOVER_HALT_MEDIA_ERROR)
07/11/2012 15:19:48 - Error ndmpagent(pid=5216) NDMP backup failed, path = /vol/notes_vol
07/11/2012 15:19:49 - Error bptm(pid=5452) cannot write image to media id B246L4, drive index 3
07/11/2012 15:23:10 - Info bptm(pid=5452) EXITING with status 84 <----------
07/11/2012 15:23:10 - end writing; write time: 02:48:19
07/11/2012 15:23:15 - Info ndmpagent(pid=0) done. status: 84: media write error
media write error(84)
07/11/2012 15:33:10 - Info nbjm(pid=3528) starting backup job (jobid=7515) for client hreuk3210, policy test-hreukfs1-main, schedule Full
bptm log:
15:19:49.843 [5452.3576] <2> NdmpAgentSession[0]: [332] Received 10 (MEDIA_ERROR) ""
15:19:49.843 [5452.3576] <2> NdmpAgentSession[0]: [332] Replying error = 0
15:19:49.843 [5452.3576] <16> check_and_process_ndmpagent_backup_tasks: ndmpagent[0] reports media write error
15:19:49.843 [5452.3576] <2> NdmpAgentSession_close_by_index[0]: Saving ndmpagent session (still needed for ndmp media session, index 0)
15:19:49.843 [5452.3576] <2> write_data: ndmp_task = 1
15:19:49.843 [5452.3576] <2> write_data: status_to_return = -8, total_frag_kbytes 5306624
15:19:49.843 [5452.3576] <16> write_data: cannot write image to media id B246L4, drive index 3
15:19:49.859 [5452.3576] <2> send_MDS_msg: DEVICE_STATUS 1 2658 hreuknetflow B246L4 4000503 HP.ULTRIUM4-SCSI.000 2000056 WRITE_ERROR 0 0
15:19:49.921 [5452.3576] <2> log_media_error: successfully wrote to error file - 11/07/12 15:19:49 B246L4 3 WRITE_ERROR HP.ULTRIUM4-SCSI.000
15:19:49.921 [5452.3576] <2> write_backup: write_data() returned, exit_status = 84, CINDEX = 0, TWIN_INDEX = 0, backup_status = -8
15:19:49.921 [5452.3576] <2> write_backup: tp = 1302490921, stp = 1293098234, et = 9392687, mpx_total_kbytes[TWIN_INDEX = 0] = 5306624
15:19:49.937 [5452.3576] <2> io_terminate_tape: writing empty backup header, drive index 3, copy 1
15:19:49.937 [5452.3576] <2> io_terminate_tape: reposition to previous tapemark and rewrite header
15:19:49.937 [5452.3576] <2> io_ioctl: command (2)MTBSF 1 0x0 from (bptm.c.8774) on drive index 3
15:22:41.875 [5452.3576] <2> io_ioctl: command (0)MTWEOF 1 0x1 from (bptm.c.8856) on drive index 3
15:22:57.187 [5452.3576] <2> io_terminate_tape: absolute block position prior to writing empty header is 2, copy 1
15:22:57.187 [5452.3576] <2> io_write_back_header: drive index 3, empty_file, file num = 1, mpx_headers = 0, copy 1
15:22:57.406 [5452.3576] <2> io_write_block: ndmp_tape_write_func returned 1024
15:22:57.406 [5452.3576] <2> send_MDS_msg: MEDIADB 1 2658 B246L4 4000503 *NULL* 6 1352291615 1352291615 1352334815 0 0 0 0 24 5 0 0 1024 0 2 0
15:22:57.421 [5452.3576] <2> io_ioctl: command (2)MTBSF 1 0x0 from (bptm.c.9122) on drive index 3
15:23:04.625 [5452.3576] <2> io_ioctl: command (1)MTFSF 1 0x0 from (bptm.c.9124) on drive index 3
15:23:05.062 [5452.3576] <2> io_close: closing E:\Program Files\VERITAS\NetBackup\db\media\tpreq\drive_HP.ULTRIUM4-SCSI.000, from bptm.c.9152
15:23:05.281 [5452.3576] <2> NdmpMediaSession_close_public_and_ndmpagent[0]: closing public session, force = 0, agent_session_index = 0
15:23:05.281 [5452.3576] <2> NdmpMediaSession[0]: ndmp_public_session_destory: destroying session 0x014C7FE8
15:23:05.281 [5452.3576] <2> NdmpAgentSession[0]: [333] Received 16 (CONNECT_CLOSE_REPLY) ""
15:23:05.281 [5452.3576] <2> NdmpAgentSession[0]: [333] Replying error = 0
15:23:10.281 [5452.3576] <2> NdmpAgentSession_close_by_index[0]: closing ndmpagent session 014B4CA8
15:23:10.281 [5452.3576] <2> process_tapealert: TapeAlert returned 0x00000000 0x00000000 (from io_terminate_tape)
15:23:10.281 [5452.3576] <2> send_brm_msg: EXIT hreuk3210_1352291615 84
15:23:10.281 [5452.3576] <2> bptm: EXITING with status 84 <----------
15:23:11.562 [5428.4636] <2> bptm: instance - 1303197343
15:23:11.562 [5428.4636] <2> bptm: INITIATING (VERBOSE = 5): -unload -dn HP.ULTRIUM4-SCSI.000 -dp nrst1a -dk 2000056 -m B246L4 -mk 4000503 -mds 0 -alocid 2658 -nh hreuk3210 -nu root -nk 2b900d52288612d7 -np 764f88d4ba3b3a741959e2b5c58c9e4b59473dba4d130d3ce85c49c9a997fe7a5c695968576bd19fb6d556344b572fff0b9a3d289b695b07349651727b8a9150 -...
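As a rough cross-check of the logs so far, the sketch below works out how long the job wrote before the first error and the average write rate over that period. The timestamps come from the job details ("begin writing" and the first ndmpagent error) and the size from `total_frag_kbytes` in the bptm log. The function name `average_rate_kb_per_s` is just illustrative, and I am assuming `total_frag_kbytes` reflects the data written to tape up to the failure.

```python
from datetime import datetime

# Job details use day/month/year (07/11/2012 is Wed 7 Nov 2012).
FMT = "%d/%m/%Y %H:%M:%S"

begin_writing = datetime.strptime("07/11/2012 12:34:51", FMT)
first_error = datetime.strptime("07/11/2012 15:19:48", FMT)

# From the bptm log line: "status_to_return = -8, total_frag_kbytes 5306624".
# ASSUMPTION: this is the amount written to tape before the failure.
total_frag_kbytes = 5306624


def average_rate_kb_per_s(kbytes, start, end):
    """Average transfer rate in KB/s between two timestamps."""
    seconds = (end - start).total_seconds()
    if seconds <= 0:
        raise ValueError("end must be after start")
    return kbytes / seconds


elapsed = first_error - begin_writing
rate = average_rate_kb_per_s(total_frag_kbytes, begin_writing, first_error)

print(f"elapsed before failure: {elapsed}")   # 2:44:57
print(f"average rate: {rate:.0f} KB/s")       # 536 KB/s
```

If those figures are right, about 5 GB went to tape in roughly 2 h 45 min before the write error, which may be worth comparing with the rate seen when the same volume is sent to the LTO-3 drives.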
And finally, vxlogview:
E:\Program Files\VERITAS\NetBackup\bin>vxlogview -p 51216 -o 134 -b "07/11/12 12:00:00" -e "07/11/12 16:10:00"
07/11/2012 15:18:14.296 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 DUMP: Wed Nov 7 15:22:01 2012 : We have written 5305862 KB.
07/11/2012 15:19:32.781 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 DUMP: Media error on tape write.
07/11/2012 15:19:32.781 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 DUMP: DUMP IS ABORTED
07/11/2012 15:19:33.968 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 DUMP: Deleting "/vol/notes_vol/../snapshot_for_backup.21437" snapshot.
07/11/2012 15:19:48.515 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 Connection or IO Error.
07/11/2012 15:19:48.515 [Error] V-134-36 MOVER_HALTED media write error - reason = 5 (NDMP_MOVER_HALT_MEDIA_ERROR)
07/11/2012 15:19:48.562 V-134-19 [NdmpAgent::SetErrorAndHalt] NdmpBackupManager.cpp(1173) - error code 84 (media write error)
07/11/2012 15:19:48.562 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 MoveletOutput: Media Error.
07/11/2012 15:19:48.953 [Error] V-134-32 NDMP backup failed, path = /vol/notes_vol
07/11/2012 15:23:10.296 [ConnectionToBrm::SendExitStatus] Sending EXIT STATUS 84: media write error
07/11/2012 15:23:56.484 [MainStartup] ==================== STARTUP ====================
07/11/2012 15:23:56.484 [Info] V-134-3 started process ndmpagent (pid=5824.4864)
07/11/2012 15:23:56.484 [Warning] V-134-255 Unrecognized parameter -stream_count
07/11/2012 15:23:56.484 [Warning] V-134-255 Unrecognized parameter 8
07/11/2012 15:23:56.484 [Warning] V-134-255 Unrecognized parameter -stream_number
07/11/2012 15:23:56.484 [Warning] V-134-255 Unrecognized parameter 2
07/11/2012 15:23:56.484 [Warning] V-134-255 Unrecognized parameter -blks_per_buffer
07/11/2012 15:23:56.484 [Warning] V-134-255 Unrecognized parameter 128
07/11/2012 15:23:56.484 [Warning] V-134-255 Unrecognized parameter -use_otm
07/11/2012 15:23:56.484 [Warning] V-134-255 Unrecognized parameter -fso
07/11/2012 15:23:56.484 [Warning] V-134-255 Unrecognized parameter -kl
07/11/2012 15:23:56.484 [Warning] V-134-255 Unrecognized parameter 28
07/11/2012 15:23:56.484 [Warning] V-134-255 Unrecognized parameter -use_ofb
07/11/2012 15:23:56.484 [NdmpAgent::CheckLogLevels] DiagLevel = 1 DebugLevel = 1
07/11/2012 15:23:57.140 [NdmpGlueLogTraceCb] ndmp_connect_to_server: hostname = hreuk3210, portname = 10000