[Eisfair] Error message when using rsync - NEW

Jürgen Witt j-witt at web.de
Wed Mar 13 04:33:00 CET 2019


Hello NG,

a few days ago I already posted here about the topic in the subject
line and was hopeful that the matter was settled. Unfortunately, it is
not.

The rsync of the data worked exactly once, namely after I renamed the
folder in which the abort occurred (TDAMP) to TDAMP- on the
target/backup server and then created a new TDAMP folder. The following
evening it failed again with the same error message. Even after today's
replacement of the defective hard disk (1 pending sector), the rsync
backup aborted again at the same place/file.

Email from the backup server eis with the subject "fcron <root at eis>
/usr/bin/server-sichern":

rsync: connection unexpectedly closed (2266062707 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [receiver=3.1.3]
rsync: connection unexpectedly closed (32617488 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [generator=3.1.3]
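
The "(code 12)" in the messages above is rsync's exit status. As a minimal sketch, assuming a bash shell, the following helper maps a few of the exit values listed in the rsync man page to their descriptions. The function name rsync_exit_text is hypothetical and not part of rsync, and the list is deliberately incomplete.

```shell
#!/bin/bash
# rsync_exit_text STATUS
# Hypothetical helper: prints the rsync man page description for a
# selection of exit values. Only a subset of the documented values is
# covered here; anything else is reported as unknown.
rsync_exit_text() {
    case "$1" in
        0)  echo "Success" ;;
        10) echo "Error in socket I/O" ;;
        11) echo "Error in file I/O" ;;
        12) echo "Error in rsync protocol data stream" ;;
        23) echo "Partial transfer due to error" ;;
        24) echo "Partial transfer due to vanished source files" ;;
        30) echo "Timeout in data send/receive" ;;
        *)  echo "Unknown exit status $1" ;;
    esac
}
```

Called as `rsync_exit_text 12`, it prints the same description that appears in the error mail. It only names the category of failure; it does not say which side closed the connection.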

New disk /dev/sdb:

eis # smartctl -a /dev/sdb
smartctl 7.0 2018-12-30 r4883 [i686-linux-3.16.62-eisfair-1-PAE] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD20EFRX-68EUZN0
Serial Number:    WD-WCC4M6NCH89F
LU WWN Device Id: 5 0014ee 2bb0ff2cc
Firmware Version: 82.00A82
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Mar 13 02:22:50 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       17
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1
194 Temperature_Celsius     0x0022   118   116   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        17         -
# 2  Short offline       Completed without error       00%        17         -

SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Existing disk /dev/sda, which forms a software RAID 1 together with /dev/sdb:

eis # smartctl -a /dev/sda
smartctl 7.0 2018-12-30 r4883 [i686-linux-3.16.62-eisfair-1-PAE] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD20EFRX-68EUZN0
Serial Number:    WD-WCC4M4FF0JJZ
LU WWN Device Id: 5 0014ee 2b92310de
Firmware Version: 82.00A82
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Mar 13 02:26:24 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   177   177   021    Pre-fail  Always       -       4125
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       13892
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       292
194 Temperature_Celsius     0x0022   116   108   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     13892         -
# 2  Short offline       Completed without error       00%     13892         -
# 3  Short offline       Completed without error       00%     13869         -
# 4  Short offline       Completed without error       00%     13845         -
# 5  Short offline       Completed without error       00%     13821         -
# 6  Extended offline    Completed without error       00%     13803         -
# 7  Short offline       Completed without error       00%     13797         -
# 8  Short offline       Completed without error       00%     13773         -
# 9  Short offline       Completed without error       00%     13749         -
#10  Short offline       Completed without error       00%     13725         -
#11  Short offline       Completed without error       00%     13701         -
#12  Short offline       Completed without error       00%     13677         -
#13  Short offline       Completed without error       00%     13653         -
#14  Extended offline    Completed without error       00%     13635         -
#15  Short offline       Completed without error       00%     13629         -
#16  Short offline       Completed without error       00%     13605         -
#17  Short offline       Completed without error       00%     13581         -
#18  Short offline       Completed without error       00%     13557         -
#19  Short offline       Completed without error       00%     13533         -
#20  Short offline       Completed without error       00%     13437         -
#21  Short offline       Completed without error       00%     13414         -

SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
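
Both listings above are long, and only a few attributes are usually looked at first when a disk or its cabling is suspected. A minimal sketch, assuming bash and awk, that reduces attribute output to the raw values of IDs 5, 197, 198 and 199. The function name smart_key_attrs and the choice of these four IDs are assumptions for illustration, not something smartctl provides.

```shell
#!/bin/bash
# smart_key_attrs
# Hypothetical filter: reads a smartctl attribute table on stdin and
# prints NAME=RAW_VALUE for attribute IDs 5, 197, 198 and 199.
# It relies on the usual column order, where the ID is field 1, the
# attribute name is field 2 and the raw value starts at field 10.
smart_key_attrs() {
    awk '$1 ~ /^(5|197|198|199)$/ { print $2 "=" $10 }'
}

# Usage (assumes smartctl is installed and the device exists):
#   smartctl -A /dev/sda | smart_key_attrs
```

For the two disks shown here, every one of these raw values is 0, which matches the "PASSED" self-assessment in both listings.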

Every day at 23:30, cron starts the script server-sichern on the server
eis (10.20.3.100).

eis # cat /usr/bin/server-sichern
#! /bin/bash
ssh 10.20.3.101 touch /public/TDAMP/DS/end.txt
ssh 10.20.3.101 sleep 45
rsync -avzu --delete 10.20.3.101:/home /data >/var/log/rsync-home_log
rsync -avzu --delete 10.20.3.101:/public /data >/var/log/rsync-public_log
rsync -avzu --delete 10.20.3.101:/var/www/htdocs/webdav /var/www/htdocs >/var/log/rsync-webdav_log
ssh 10.20.3.101 rm /public/TDAMP/DS/end.txt
rm -f /public/TDAMP/DS/end.txt
/usr/bin/server-sichern-report
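
The script above redirects only standard output into the log files, so the rsync error messages go to standard error and end up in the fcron mail instead of the log, and the exit status of each rsync call is not recorded. A minimal sketch of a wrapper that keeps both in the log file, assuming bash. The function name run_logged and the "exit status:" line format are hypothetical.

```shell
#!/bin/bash
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical wrapper: runs COMMAND, writes its stdout and stderr to
# LOGFILE (overwriting it, like the ">" redirections in the script),
# appends one line with the exit status, and returns that status.
run_logged() {
    local logfile=$1
    shift
    "$@" >"$logfile" 2>&1
    local status=$?
    echo "exit status: $status" >>"$logfile"
    return "$status"
}

# Usage in the style of the script above:
#   run_logged /var/log/rsync-public_log \
#       rsync -avzu --delete 10.20.3.101:/public /data
```

With this, an aborted transfer would leave both the rsync error text and a line such as "exit status: 12" at the end of the corresponding log, right after the last file name.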

This script produces the report that is delivered by email.

eis # cat /usr/bin/server-sichern-report
#!/bin/sh
( echo "To: root"
   echo "Subject: Praxis Borgweg Abgleich Server + Backup-Server"
   echo "Groesse und Datum der log-Dateien"
   ls -l /var/log/rsync-*
   echo "----------------------------------------"
   echo "Sicherung von /home"
   # cat /var/log/rsync-home_log
   echo "-----------------------------------------"
   echo "Zeitpunkt der Sicherung"
   ls -la /var/log/rsync-home_log
   echo "-----------------------------------------"
   echo "Sicherung von /public"
   cat /var/log/rsync-public_log
   echo "-----------------------------------------"
   echo "Zeitpunkt der Sicherung"
   ls -la /var/log/rsync-public_log
   echo "-----------------------------------------"

   ) | /usr/lib/sendmail root

Result of an incomplete backup:

Beginning of the report that was sent:

Groesse und Datum der log-Dateien
-rw-r--r-- 1 root root   3314 Mar 12 23:30 /var/log/rsync-home_log
-rw-r--r-- 1 root root 186499 Mar 13 00:33 /var/log/rsync-public_log
-rw-r--r-- 1 root root    174 Mar 13 00:33 /var/log/rsync-webdav_log

...
public/TDAMP/DS/daten/prax1/PSI.DBF
public/TDAMP/DS/daten/prax1/QMETA.DBF
public/TDAMP/DS/daten/prax1/QMETA.FPT
-----------------------------------------
Zeitpunkt der Sicherung
-rw-r--r-- 1 root root 186499 Mar 13 00:33 /var/log/rsync-public_log

The report ends here.

The report that was sent again ends with the same file, QMETA.FPT, in a
subfolder of public. This file is 2700 MB in size. The report of the
third rsync process (webdav) is not sent, even though it exists and
that process only runs after the second rsync process (public) has
aborted.
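
Looking at server-sichern-report as quoted, only rsync-public_log is printed with cat: the cat for rsync-home_log is commented out and there is no section for rsync-webdav_log at all, which would explain why the webdav log never shows up in the mail. A minimal sketch of a report body that covers every log file passed to it, assuming bash. The function name print_log_sections and the limit of 20 lines per log are hypothetical choices.

```shell
#!/bin/bash
# print_log_sections FILE...
# Hypothetical helper: prints a separator, the file name and the last
# 20 lines of each given log file, so that no log is left out of the
# report and the point where a transfer stopped stays visible.
print_log_sections() {
    local f
    for f in "$@"; do
        echo "-----------------------------------------"
        echo "Log: $f"
        if [ -r "$f" ]; then
            tail -n 20 "$f"
        else
            echo "(log file missing or unreadable)"
        fi
    done
}

# Usage inside the report script, in place of the individual cat calls:
#   print_log_sections /var/log/rsync-*_log
```

Printing only the tail keeps the mail short for a log like rsync-public_log, which is about 186 KB in the run shown above.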

Excerpt from /var/log/messages on the production server debian (10.20.3.101):

Mar 12 23:30:01 debian sshd[30074]: Accepted publickey for root from 10.20.3.100 port 48907 ssh2: RSA SHA256:We20RCnHw5WERqLEu5nbmonNrhYB/AgkeIT+yGXBFLY
Mar 12 23:30:01 debian sshd[30074]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 12 23:30:01 debian sshd[30074]: Received disconnect from 10.20.3.100 port 48907:11: disconnected by user
Mar 12 23:30:01 debian sshd[30074]: Disconnected from user root 10.20.3.100 port 48907
Mar 12 23:30:01 debian sshd[30074]: pam_unix(sshd:session): session closed for user root
Mar 12 23:30:01 debian sshd[30233]: Accepted publickey for root from 10.20.3.100 port 48908 ssh2: RSA SHA256:We20RCnHw5WERqLEu5nbmonNrhYB/AgkeIT+yGXBFLY
Mar 12 23:30:01 debian sshd[30233]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 12 23:30:46 debian sshd[30233]: Received disconnect from 10.20.3.100 port 48908:11: disconnected by user
Mar 12 23:30:46 debian sshd[30233]: Disconnected from user root 10.20.3.100 port 48908
Mar 12 23:30:46 debian sshd[30233]: pam_unix(sshd:session): session closed for user root
Mar 12 23:30:47 debian sshd[31049]: Accepted publickey for root from 10.20.3.100 port 48910 ssh2: RSA SHA256:We20RCnHw5WERqLEu5nbmonNrhYB/AgkeIT+yGXBFLY
Mar 12 23:30:47 debian sshd[31049]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 12 23:30:53 debian sshd[31049]: Received disconnect from 10.20.3.100 port 48910:11: disconnected by user
Mar 12 23:30:53 debian sshd[31049]: Disconnected from user root 10.20.3.100 port 48910
Mar 12 23:30:53 debian sshd[31049]: pam_unix(sshd:session): session closed for user root
Mar 12 23:30:53 debian sshd[31109]: Accepted publickey for root from 10.20.3.100 port 48911 ssh2: RSA SHA256:We20RCnHw5WERqLEu5nbmonNrhYB/AgkeIT+yGXBFLY
Mar 12 23:30:53 debian sshd[31109]: pam_unix(sshd:session): session opened for user root by (uid=0)

Mar 13 00:33:12 debian sshd[31109]: Received disconnect from 10.20.3.100 port 48911:11: disconnected by user
Mar 13 00:33:12 debian sshd[31109]: Disconnected from user root 10.20.3.100 port 48911
Mar 13 00:33:12 debian sshd[31109]: pam_unix(sshd:session): session closed for user root
Mar 13 00:33:14 debian sshd[25188]: Accepted publickey for root from 10.20.3.100 port 48976 ssh2: RSA SHA256:We20RCnHw5WERqLEu5nbmonNrhYB/AgkeIT+yGXBFLY
Mar 13 00:33:14 debian sshd[25188]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 13 00:33:14 debian sshd[25188]: Received disconnect from 10.20.3.100 port 48976:11: disconnected by user
Mar 13 00:33:14 debian sshd[25188]: Disconnected from user root 10.20.3.100 port 48976
Mar 13 00:33:14 debian sshd[25188]: pam_unix(sshd:session): session closed for user root
Mar 13 00:33:15 debian sshd[25242]: Accepted publickey for root from 10.20.3.100 port 48977 ssh2: RSA SHA256:We20RCnHw5WERqLEu5nbmonNrhYB/AgkeIT+yGXBFLY
Mar 13 00:33:15 debian sshd[25242]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 13 00:33:15 debian sshd[25242]: Received disconnect from 10.20.3.100 port 48977:11: disconnected by user
Mar 13 00:33:15 debian sshd[25242]: Disconnected from user root 10.20.3.100 port 48977
Mar 13 00:33:15 debian sshd[25242]: pam_unix(sshd:session): session closed for user root
Mar 13 00:39:32 debian su: pam_unix(su:session): session closed for user root

Excerpt from /var/log/messages on the backup server eis (10.20.3.100):

Mar 12 23:30:00 eis fcron[9517]: Job '/usr/bin/server-sichern' started for user root (pid 9526)
Mar 13 00:05:00 eis fcron[11911]: Job '/var/install/bin/smartmon-plot' started for user root (pid 119
Mar 13 00:05:10 eis fcron[11911]: Job '/var/install/bin/smartmon-plot' completed
Mar 13 00:25:02 eis su: (to root) root on none
Mar 13 00:25:02 eis su: pam_unix(su:session): session opened for user root by (uid=0)
Mar 13 00:33:16 eis fcron[9517]: Job '/usr/bin/server-sichern' completed (mailing output)

While writing this, I have once again renamed the folder TDAMP to
TDAMP- on the backup server, then created TDAMP anew and restarted my
script server-sichern. Now just under 86000 MB (that is the size of the
TDAMP folder on the production server) have to be fetched to the backup
server via rsync again. So far 57000 MB have been transferred, and the
file QMETA.FPT (the last line in the rsync report whenever it has
failed so far) is among them. It looks as if it could work this time.

What I do not understand, though, is why the rsync backup did not work
even with healthy hard disks in the backup server.

Does anyone have an idea about this?