[blfs-support] kernel panic when booting with connected external disk

Michael Shell list1 at michaelshell.org
Sun Mar 17 21:37:25 PDT 2013


On Sun, 17 Mar 2013 16:16:44 +0100
"Dr.-Ing. Edgar Alwers" <edgaralwers at gmx.de> wrote:

> 2.) Rootdelay of, let me say, 50 in the grub command, as suggested by Kenno 
> Han, does not change the behaviour. But: if the Toshiba disk is connected 
> _during_ the delay time, the boot process finish succesfull.


The idea behind the rootdelay approach is to give the USB system time to
"stabilize" so that all the devices are seen before continuing. It would
be worth trying if the Toshiba external drive were not being seen
("in time").

Your problem, however, seems to be that it *is* being seen and that,
when it is, it alters the device numbering of the internal *boot* drive.
I guess it has to be seen really early to screw up the boot device.

Readers of this thread should be aware that Dr. Alwers' problem does
*not* have anything to do with trying to boot from the external USB drive,
but rather that its *presence* during the early part of the boot process
prevents the LFS system from booting normally from the
internal HD.


> 2.) fdisk -l ( see file ) delivers a little comment which I cannot judge: 
> "Partition 4 does not start on physical sector boundary"
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1   *        2048     1011711      504832   83  Linux
> /dev/sda2         3598336    35962879    16182272   83  Linux
> /dev/sda3         1011712     3598335     1293312   82  Linux swap / Solaris
> /dev/sda4        35964926   422031359   193033217    5  Extended
> Partition 4 does not start on physical sector boundary.
> /dev/sda5        71630848   122830847    25600000   83  Linux
> /dev/sda6       122832896   163792895    20480000   83  Linux
> /dev/sda7       163794944   214994943    25600000   83  Linux
> /dev/sda8       214996992   317396991    51200000   83  Linux
> /dev/sda9       317399040   422031359    52316160   83  Linux
> /dev/sda10       35964928    71628799    17831936   83  Linux


The warning is harmless; its only potential consequence is that
access to the drive may be a tad slower because a logical start
sector does not align with the device's hardware sector boundaries.
(And in fact, in this specific case there won't even be a slowdown,
because the contained sda5-sda10 partitions *do* start on hardware
boundaries; see below.)

The drive uses a 4096-byte hardware sector size. Thus, each partition
should start at a *byte* number that is a multiple of 4096. There is a
complication in that the logical sector size is 512 bytes, which means
the starting byte number for a given starting sector number
(the first sector is number 0) is:

start_sector_number * 512

(thus sector 1 starts on byte 512, and sector 0 contains 512 bytes,
numbered 0 - 511)

which we also want to be divisible by 4096.

For example, for sda5:

 71630848 * 512 / 4096 = 8953856

Observing that 512/4096 = 1/8, we want the starting sector
numbers to all be multiples of 8. Thus,

 71630848 / 8 = 8953856

However, for sda4 it is:

35964926 / 8 = 4495615.75

And that noninteger result is what flags the warning.
For the warning to go away, sda4 should start at sector
35964928 because

35964928 / 8 = 4495616
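
The same divisible-by-8 check can be run over the whole table at once.
Below is a minimal Python sketch, assuming the 512-byte logical and
4096-byte physical sector sizes that fdisk reported; the start sectors
are copied from the listing above, and the names STARTS, is_aligned and
next_aligned are just illustrative helpers, not part of any tool.

```python
# Start sectors (512-byte logical sectors) copied from the fdisk -l listing.
STARTS = {
    "sda1": 2048, "sda2": 3598336, "sda3": 1011712, "sda4": 35964926,
    "sda5": 71630848, "sda6": 122832896, "sda7": 163794944,
    "sda8": 214996992, "sda9": 317399040, "sda10": 35964928,
}

LOGICAL_SECTOR = 512    # bytes per logical sector, as reported by fdisk
PHYSICAL_SECTOR = 4096  # bytes per hardware sector on this drive
RATIO = PHYSICAL_SECTOR // LOGICAL_SECTOR  # 8 logical sectors per hardware sector


def is_aligned(start_sector: int) -> bool:
    """True if the partition's first byte falls on a 4096-byte boundary."""
    return (start_sector * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0


def next_aligned(start_sector: int) -> int:
    """Smallest sector >= start_sector that is a multiple of RATIO."""
    return -(-start_sector // RATIO) * RATIO  # ceiling division, then scale back


if __name__ == "__main__":
    for name, start in STARTS.items():
        if is_aligned(start):
            print(f"{name}: {start} aligned (hardware sector {start // RATIO})")
        else:
            print(f"{name}: {start} NOT aligned, next aligned start {next_aligned(start)}")
```

Against this table, only sda4 should come out misaligned, with 35964928
as the next aligned start, which agrees with the fdisk warning and the
arithmetic above.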

I strongly recommend that all data on that disk be backed
up before any partition table changes are attempted. However,
given that only the start value of the extended partition
is being changed, you might be able to get by with just
a simple change to that. Note that sda10 (sda5-10 are contained
in sda4) is already set to begin at the correct sector 35964928
so no changes to sda5-sda10 are even needed. I believe you
can simply change sda4's (partition number 4, extended) starting
sector to 35964928 via the "b" command of fdisk, available after
enabling "expert" mode via "x". But, I've never had to do that myself.

Given that none of sda4's contained logical partitions need to be
changed, it might be a quick and totally painless fdisk tweak. gparted
is a full-blown repartitioning utility, but in this case it may not
even be needed, as none of the actual filesystems are moved at all.

 
------

Anyway, back to the problem at hand...

> http://dl.dropbox.com/u/8734485/Kernel_Panic
 
> dmesg when external disk ( toshiba ) is NOT connected:
> .
> .
> 4.232817] VFS: Mounted root (reiserfs filesystem) readonly on device 8:5.
>
>
> dmesg additions after connecting the USB Toshiba after the login:


That helps in knowing how the system is *supposed* to work, but the
information we really need now is which device names the Toshiba and
internal drive partitions get when the Toshiba drive is present
during boot, and what becomes of sda5, because your error message is:

> VFS Mount root on devices 8:5

Something happens to 8:5 (sda5) when the Toshiba drive is seen during
boot.
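
For reference, a device number like 8:5 follows the kernel's static
numbering for sd disks: block major 8 covers sda through sdp, with 16
minors per disk (minor 0 is the whole disk, minors 1-15 its partitions).
So 8:5 means "partition 5 of the first sd disk registered", not "the
internal drive". A small sketch of that arithmetic, using a hypothetical
sd_name helper:

```python
SD_MAJOR = 8          # first block major used by the sd (SCSI/SATA/USB disk) driver
MINORS_PER_DISK = 16  # minor 0 = whole disk, minors 1..15 = its partitions


def sd_name(major: int, minor: int) -> str:
    """Translate a major:minor pair in block major 8 to an sdXN name."""
    if major != SD_MAJOR or not 0 <= minor < 16 * MINORS_PER_DISK:
        raise ValueError("this sketch only handles major 8, i.e. sda..sdp")
    disk_index, partition = divmod(minor, MINORS_PER_DISK)
    disk = "sd" + chr(ord("a") + disk_index)
    return disk if partition == 0 else f"{disk}{partition}"


if __name__ == "__main__":
    print("8:5  ->", sd_name(8, 5))   # partition 5 of the first sd disk registered
    print("8:21 ->", sd_name(8, 21))  # partition 5 of the second sd disk registered
```

If the internal drive were registered second, its partition 5 would be
sdb5 (8:21), and 8:5 would refer to partition 5 of whatever disk was
registered first. Whether that is what actually happens here is exactly
what the boot messages should show.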

I'd also like to see the relevant part of Ubuntu's boot messages with
and without the Toshiba USB drive connected during boot, so we
can see the resulting drive ordering/naming under Ubuntu.
Maybe we will see the same change there, and how Ubuntu avoids
being affected by it.

I did notice that the partition numbering is out of order: sda10 has
a lower starting sector than sda5. This *might* have something
to do with the problem. Also, the fact that LFS is on a logical
partition (sda5) inside the extended partition rather than on a primary
partition may have something to do with the problem (which partition
is Ubuntu on, primary sda1?).
There is an article on how to reorder partition numbers using fdisk:

http://journalxtra.com/linuxsanity/how-to-reorder-linux-drive-partition-numbers-2768/

There is also some information there about device registration order, which
is related to the problem at hand.

Be forewarned that fixing the name ordering means that everything that
refers to sda5-sda10 (grub, fstab, etc.) will have to be changed to
match the new names as well (sda10 becomes sda5, sda5 becomes sda6,
etc.).
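
The renaming in that parenthetical follows from numbering the logical
partitions in on-disk order. A rough Python sketch, assuming logical
partitions are numbered from 5 and using start sectors copied from the
fdisk listing; renumbering and part_number are hypothetical helpers that
only do the name arithmetic and never touch the disk:

```python
# Logical partitions inside extended sda4, with start sectors from the fdisk listing.
LOGICAL = {
    "sda5": 71630848, "sda6": 122832896, "sda7": 163794944,
    "sda8": 214996992, "sda9": 317399040, "sda10": 35964928,
}
FIRST_LOGICAL = 5  # on an MBR (DOS) disk, logical partitions are numbered from 5


def renumbering(logical: dict[str, int], disk: str = "sda") -> dict[str, str]:
    """Map each current name to the name it would get if numbered in on-disk order."""
    by_start = sorted(logical, key=lambda name: logical[name])
    return {old: f"{disk}{FIRST_LOGICAL + i}" for i, old in enumerate(by_start)}


def part_number(name: str, disk: str = "sda") -> int:
    """Partition number encoded in a name such as 'sda10'."""
    return int(name[len(disk):])


if __name__ == "__main__":
    mapping = renumbering(LOGICAL)
    # List only the names that would change, i.e. the ones grub/fstab must follow.
    for old in sorted(mapping, key=part_number):
        if mapping[old] != old:
            print(f"{old} -> {mapping[old]}")
```

Every logical partition changes name here, which is why all references
to sda5-sda10 would need updating, not just the one for sda10.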

Anyway, before you do anything, what we want to see is which names the
Toshiba and sda partitions are given when the Toshiba drive is
already connected during boot.

However, there is a big problem with getting that crucial information,
isn't there? Namely, you can't just get it from dmesg because
the system will fail to boot. Compounding the problem is that today's
machines are so fast that the information flies by on the screen
faster than it can be read.

The simplest approach is to use the "scroll lock" key to pause the
boot messages. This, of course, assumes that the messages appear slowly
enough for a human to react in time with the keyboard. You don't
have to copy down all the info for us; just find out what is happening
to the partitions on the ATA ST320LT007-9ZV14 drive as well
as on the Toshiba.

Another approach is to enable some debug options in the kernel
(in the kernel source configuration under "kernel hacking")
such as "Early printk via EHCI debug port". There also is
a "Remote debugging over FireWire early on boot". But I've
never used any of them before.

There was also a boot delay patch by Randy Dunlap that slows down the
printing of all boot messages:

https://build.opensuse.org/package/view_file?file=linux-2.6-debug-boot-delay.patch&package=kernel&project=home%3Asteve-beattie&rev=57fbfa7d9c9498aa121a5447beff5e5e

However, I don't know how well it could be applied to modern kernels
and I don't see any kernel option like it in the kernel configuration
for 3.2.1. IMHO, there really is a need for such an option.

Anyone else have some better advice how to capture boot log info before
the filesystem is available?


  Cheers,

  Mike Shell




