Recently, one of our Ubuntu 18.04.1 LTS servers went into read-only (RO) mode. I have searched all over and tried various things to no avail, so I am turning here for help.
I copied a folder (roughly 24 MB) on the server using: cp -r /opt/3.5.0 /opt/4.0.0
Right after that, the system went into read-only mode.
The disk is an mSATA SSD. It has 16 GB of 29 GB available, so a full disk should not be the cause.
The syslog:
Mar 3 10:08:11 kernel: [6633028.028328] sd 1:0:0:0: [sda] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Mar 3 10:08:11 kernel: [6633028.028337] sd 1:0:0:0: [sda] tag#21 CDB: Write(10) 2a 00 02 d8 13 a0 00 00 b0 00
Mar 3 10:08:11 kernel: [6633028.028341] print_req_error: I/O error, dev sda, sector 47715232
Mar 3 10:08:11 kernel: [6633028.028371] sd 1:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Mar 3 10:08:11 kernel: [6633028.028377] sd 1:0:0:0: [sda] tag#17 CDB: Write(10) 2a 00 02 d8 0e 60 00 05 40 00
Mar 3 10:08:11 kernel: [6633028.028381] print_req_error: I/O error, dev sda, sector 47713888
Mar 3 10:08:11 kernel: [6633028.028393] EXT4-fs warning (device sda2): ext4_end_bio:323: I/O error 10 writing to inode 1442150 (offset 0 size 778240 starting block 5964426)
Mar 3 10:08:11 kernel: [6633028.028399] Buffer I/O error on device sda2, logical block 5832908
Mar 3 10:08:11 kernel: [6633028.028408] Buffer I/O error on device sda2, logical block 5832909
Mar 3 10:08:11 kernel: [6633028.028414] Buffer I/O error on device sda2, logical block 5832910
Mar 3 10:08:11 kernel: [6633028.028420] Buffer I/O error on device sda2, logical block 5832911
Mar 3 10:08:11 kernel: [6633028.028426] Buffer I/O error on device sda2, logical block 5832912
Mar 3 10:08:11 kernel: [6633028.028432] Buffer I/O error on device sda2, logical block 5832913
Mar 3 10:08:11 kernel: [6633028.028438] Buffer I/O error on device sda2, logical block 5832914
Mar 3 10:08:11 kernel: [6633028.028444] Buffer I/O error on device sda2, logical block 5832915
Mar 3 10:08:11 kernel: [6633028.028449] Buffer I/O error on device sda2, logical block 5832916
Mar 3 10:08:11 kernel: [6633028.028455] Buffer I/O error on device sda2, logical block 5832917
Mar 3 10:08:11 kernel: [6633028.028666] sd 1:0:0:0: [sda] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Mar 3 10:08:11 kernel: [6633028.028672] sd 1:0:0:0: [sda] tag#16 CDB: Write(10) 2a 00 02 d5 62 c0 00 02 b0 00
Mar 3 10:08:11 kernel: [6633028.028675] print_req_error: I/O error, dev sda, sector 47538880
Mar 3 10:08:11 kernel: [6633028.028684] EXT4-fs warning (device sda2): ext4_end_bio:323: I/O error 10 writing to inode 1442161 (offset 0 size 180224 starting block 5942404)
Mar 3 10:08:11 kernel: [6633028.028738] EXT4-fs warning (device sda2): ext4_end_bio:323: I/O error 10 writing to inode 1442165 (offset 0 size 172032 starting block 5942446)
Mar 3 10:08:11 kernel: [6633028.028802] sd 1:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Mar 3 10:08:11 kernel: [6633028.028808] sd 1:0:0:0: [sda] tag#7 CDB: Write(10) 2a 00 02 d5 5d e8 00 04 d8 00
This continues for another second and then no more logs are written.
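As a cross-check on the log above, the CDB bytes can be decoded by hand. The sketch below assumes the standard SCSI Write(10) layout (opcode 0x2a, a 4-byte big-endian start LBA in bytes 2 to 5, a 2-byte block count in bytes 7 to 8); decode_write10 is a hypothetical helper name, not an existing tool.

```shell
# Decode a SCSI Write(10) CDB, given as 10 space-separated hex bytes,
# into its start LBA and transfer length in blocks.
decode_write10() (
    set -- $1    # split the CDB string into positional parameters
    [ "$1" = "2a" ] || { echo "not a Write(10) CDB" >&2; exit 1; }
    lba=$((0x$3$4$5$6))     # bytes 2-5: logical block address, big-endian
    blocks=$((0x$8$9))      # bytes 7-8: number of blocks to write
    echo "lba=$lba blocks=$blocks"
)

# First failed write from the syslog excerpt
decode_write10 "2a 00 02 d8 13 a0 00 00 b0 00"
# prints: lba=47715232 blocks=176
```

The decoded LBA is the same number as the sector in the print_req_error line that follows each CDB, so these are plain write requests to sda that timed out, not malformed commands.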
Whenever I try to do anything on the system (for example, tab completion), I get a read-only error such as: cannot create temp file for here-document: Read-only file system
I have rebooted the system in recovery mode and run fsck. Right away I see an error message. When I look at the status of systemd-remount-fs.service, I see:
systemd-remount-fs[1343]: mount: /: mount point not mounted or bad option
systemd-remount-fs[1343]: /bin/mount for / exited with exit status 32
I have also tried using sudo mount -o remount,rw /dev/sda2 /
but with the same error message as a result.
My /etc/fstab file:
UUID=xxx / ext4 defaults 0 0
UUID=xxx /boot/efi vfat defaults 0 0
/swap.img none swap sw 0 0
From various answers online, I understand I should set the fsck order (the last field of each entry), but since the file system is read-only I am not able to update the file.
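For reference, the fsck order is the sixth field of each fstab entry (fs_passno in fstab(5)): 0 or a missing field means the file system is never checked at boot, 1 is meant for the root file system, and 2 for the others. Under that reading, the root entry would become UUID=xxx / ext4 defaults 0 1. The sketch below only reads the file and lists the current values; fstab_passno is a hypothetical helper name.

```shell
# Print the mount point and fsck pass number (6th field) of each fstab entry.
# Blank lines and comment lines are skipped; a missing 6th field counts as 0.
fstab_passno() {
    awk 'NF > 0 && $1 !~ /^#/ { print $2, ($6 == "" ? 0 : $6) }' "$@"
}

# Usage (read-only, so it also works while / is mounted ro):
#   fstab_passno /etc/fstab
```

With the fstab shown above, every entry would be listed with 0, which means no file system is checked automatically at boot.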
When I run mount -l
I see (among other things, but I don't think the rest is relevant):
/dev/sda2 on / type ext4 (ro,relatime,data=ordered)
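The same state can be read without scanning the full mount listing. This is a minimal sketch that assumes the /proc/mounts format (device, mount point, type, options, two numeric fields); root_mount_mode is a hypothetical helper name.

```shell
# Report whether / is mounted "ro" or "rw", reading a file in /proc/mounts format.
# If / is listed more than once, the last entry wins.
root_mount_mode() {
    awk '$2 == "/" {
             n = split($4, opt, ",")
             for (i = 1; i <= n; i++)
                 if (opt[i] == "ro" || opt[i] == "rw") mode = opt[i]
         }
         END { print mode }' "${1:-/proc/mounts}"
}

# Usage:
#   root_mount_mode          # reads /proc/mounts
```

On this system it would be expected to print ro. findmnt -no OPTIONS / shows the full option string if more detail is needed.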
Not sure if it is relevant, but when I go to system-summary in recovery mode, I see under LVM state:
Physical Volumes: not ok (BAD)
Volume groups: ok (good)
So I have a few questions. Apologies if I am asking anything obvious; I am quite new to all of this.
- First of all, do you have any further advice on which steps to take to get out of RO mode?
- Is this more likely a software or a hardware issue?
- As we have more systems with the same setup, I would like to understand why my copy action could have triggered the switch to RO. I have done this before on other systems without any problems. The folder contained a .NET application that was running at the time of the copy. Could this be the cause?
Thanks for your help!