Failed Repository Integrity Check

Last week I was presented with the following error on one of the Solaris 10 boxes:

svc.configd: smf(5) database integrity check of:

/etc/svc/repository.db

failed. The database might be damaged or a media error might have
prevented it from being verified. Additional information useful to
your service provider is in:

/etc/svc/volatile/db_errors

The system will not be able to boot until you have restored a working
database. svc.startd(1M) will provide a sulogin(1M) prompt for recovery
purposes. The command:

/lib/svc/bin/restore_repository

can be run to restore a backup version of your repository. See
http://sun.com/msg/SMF-8000-MY for more information.

Having never seen this error, I was thinking: “this is gonna be interesting…”. Thankfully the error was pretty verbose, so I started to dissect it section by section. The service repository had gotten hosed somehow, and I could potentially find some useful info in /etc/svc/volatile/db_errors. Unfortunately, there was nothing of use in there.

The restore_repository script mentioned in the message gave me a little more hope. I also went and checked out the URL from the message. After reading the page I decided to go ahead and try to restore the service repository.

I logged in to the box in single-user mode and took a look at the restore script to get an idea of what it might do. Then I ran it. Fortunately, the script was pretty good at doing checks and told me that I could not proceed any further because the / filesystem was mounted read-only. To fix this I was asked to run:

bash-3.00# /lib/svc/method/fs-root
bash-3.00# /lib/svc/method/fs-usr

Once the filesystems were fixed up I ran the restore_repository script again. I was asked which backup copy I wanted to restore and that was it. The system rebooted and came back up fine. This turned out to be a pretty good learning experience, and http://www.sun.com/msg/SMF-8000-MY is well worth reading.
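
For reference, the sequence above can be put into a small wrapper. This is only a sketch: the three paths come from the error message and the restore script's own instructions, while the run helper and the DRY_RUN switch are my additions so the steps can be reviewed before anything is executed.

```shell
#!/bin/sh
# Sketch of the recovery sequence, meant for the sulogin prompt.
# DRY_RUN=1 (the default here, an assumption for safety) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

restore_smf_repository() {
    # The restore script asked for these two because / was mounted read-only.
    run /lib/svc/method/fs-root || return 1
    run /lib/svc/method/fs-usr || return 1
    # Interactive: asks which backup copy to restore.
    run /lib/svc/bin/restore_repository
}
```

Calling restore_smf_repository prints the three steps; setting DRY_RUN=0 first would actually run them.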

Finding out length of a UTP cable using Cisco IOS

Well, this is just cool. Everyone has been there: sitting at a Cisco switch console, wondering how long the unmarked UTP cable plugged into port 17 is… Thankfully Cisco IOS might be able to tell you:

core# test cable-diagnostics tdr interface gigabitethernet0/17

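Unfortunately it’s not available on all switches. Once the test has finished, the results can be displayed with show cable-diagnostics tdr interface gigabitethernet0/17. I dug this up on the Cisco site, which also has the command reference.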

Quick and dirty SVM cheatsheet

This list focuses mostly on mirror operations. I use Solaris Volume Manager quite a bit when mirroring internal drives. There are tons of additional features and commands if you use SVM for things other than mirroring; in that case you might want to check out the Solaris Volume Manager Administration Guide.

Create database replicas:
metadb -f -a -c [number_of_replicas] [device]
metadb -f -a -c 3 c0t0d0s7

Delete all database replicas from device:
metadb -d [device]
metadb -d c0t0d0s7

Display status of database replicas:
metadb -i
metadb -i

Display metadevice status:
metastat
metastat

Create simple concat/stripe metadevice:
metainit -f [concat_metadevice] 1 1 [device]
metainit -f d21 1 1 c0t0d0s1

Create a mirror with one submirror:
metainit [mirror_metadevice] -m [submirror_metadevice]
metainit d20 -m d21

Attach a submirror to a one-sided mirror:
metattach [mirror_metadevice] [submirror_metadevice]
metattach d20 d22

Detach a submirror from a mirror:
metadetach [mirror_metadevice] [submirror_metadevice]
metadetach d20 d22

Clear a metadevice:
metaclear [metadevice]
metaclear d22

Offline a submirror:
metaoffline [mirror_metadevice] [submirror_metadevice]
metaoffline d20 d22

Online a submirror:
metaonline [mirror_metadevice] [submirror_metadevice]
metaonline d20 d22

Enable a failed component:
metareplace -e [metadevice] [device]
metareplace -e d21 c0t0d0s1

Rename a metadevice:
metarename [old_metadevice] [new_metadevice]
metarename d20 d30

Switch metadevice names:
metarename -x [metadevice_1] [metadevice_2]
metarename -x d20 d30

Configure system for root metadevice:
metaroot [metadevice]
metaroot d10
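
As an illustration of how these fit together, here is a sketch that prints a typical root-mirroring sequence built only from the commands above. The disk names, the d10/d11/d12 numbering, and the use of slice 0 for root and slice 7 for replicas are assumptions; adjust them to your own layout.

```shell
#!/bin/sh
# Prints a root-mirroring command sequence built from the cheatsheet entries.
# Nothing is executed; device and metadevice names are examples only.
plan_root_mirror() {
    primary=$1      # e.g. c0t0d0 (disk currently holding /)
    secondary=$2    # e.g. c0t1d0 (disk that will become the second submirror)
    echo "metadb -f -a -c 3 ${primary}s7"
    echo "metadb -a -c 3 ${secondary}s7"
    echo "metainit -f d11 1 1 ${primary}s0"
    echo "metainit d12 1 1 ${secondary}s0"
    echo "metainit d10 -m d11"
    echo "metaroot d10"
    echo "# reboot here so / comes up on d10, then:"
    echo "metattach d10 d12"
}
```

Running plan_root_mirror c0t0d0 c0t1d0 prints the list for review. The second submirror is attached only after the reboot, so the system is already running on the one-sided mirror when the resync starts.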

Linux multipathing

I use MPxIO in Solaris quite often and it works very well for me. This time I needed to test out I/O multipathing in Red Hat. What I really needed to do: have a server with two HBAs manage a mirror which has submirrors on separate SANs, so that the server has multiple paths to each submirror. That way, if an HBA fails, the server still has a connection to both submirrors through the remaining HBA.

Gear used in this “experiment”:

  • Dell PowerEdge server.
  • Two QLogic QLA2310 HBAs.
  • RHEL Server 5.3 x86.
  • Two SANs presenting one LUN each.

Rough steps I took to get this working:

  1. Make sure the device mapper package is installed.
  2. Present two LUNs from two SANs.
  3. Probe the HBAs for the presented LUNs.
  4. Configure multipathing.

First and foremost, make sure the qla2xxx driver is loaded. You also have to make sure you have the device-mapper-multipath package installed (I used device-mapper-multipath-0.4.7-23.el5). Next, configure the multipathing daemon so that it starts on boot:

[root@carbon ~]# chkconfig multipathd on
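
The driver check can be scripted if you want it in a preflight step. A minimal sketch, assuming the usual /proc/modules layout where the module name is the first field; the module_loaded helper and its optional file argument are my own additions.

```shell
#!/bin/sh
# module_loaded NAME [FILE]: succeed if NAME is listed in /proc/modules.
# FILE can be overridden so the check can be tried against a saved copy.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

Used as: module_loaded qla2xxx || echo "qla2xxx is not loaded". The package itself can be checked with rpm -q device-mapper-multipath.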

When that’s done you need to make the system aware of the presented LUNs. One way to do so is to reboot the server. Another option is to force an HBA scan (shown here for host1; repeat for the other HBA’s host entry):

[root@carbon ~]# echo "- - -" > /sys/class/scsi_host/host1/scan
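
With more than one HBA it can be handy to loop over every host entry instead of typing each one. A small sketch of that idea; the rescan_scsi_hosts name and the optional base-directory argument are mine, the "- - -" wildcard is the same one used above.

```shell
#!/bin/sh
# rescan_scsi_hosts [BASE]: write the "- - -" wildcard (all channels, targets
# and LUNs) to the scan file of every host found under BASE.
rescan_scsi_hosts() {
    base=${1:-/sys/class/scsi_host}
    for host in "$base"/host*; do
        [ -e "$host/scan" ] || continue
        echo "scanning ${host##*/}"
        echo "- - -" > "$host/scan"
    done
}
```

Run rescan_scsi_hosts as root with no arguments to hit every HBA. Note that it also rescans hosts that belong to internal controllers.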

While doing this, watch /var/log/messages to see if your LUNs are detected. When done, make multipathd aware of the LUNs:

[root@carbon ~]# multipath -v2 -d

The above command is a “dry run”: no device map changes are committed, and you are only shown the changes that would be made. To commit the device map changes run:

[root@carbon ~]# multipath -v2

Once this is done you can see what multipathd sees:

[root@carbon ~]# multipath -ll
mpath2 (3600508d311100a300000f00001a90000) dm-3 COMPAQ,HSV111 (C)COMPAQ
[size=15G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=100][enabled]
 \_ 1:0:3:1 sde 8:64 [active][ready]
 \_ 2:0:3:1 sdh 8:112 [active][ready]
\_ round-robin 0 [prio=20][enabled]
 \_ 1:0:2:1 sdd 8:48 [active][ready]
 \_ 2:0:2:1 sdg 8:96 [active][ready]
mpath1 (3600508c362d0a1250000900001490000) dm-2 COMPAQ,HSV111 (C)COMPAQ
[size=15G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=100][enabled]
 \_ 1:0:0:1 sdb 8:16 [active][ready]
 \_ 2:0:4:1 sdi 8:128 [active][ready]
\_ round-robin 0 [prio=20][enabled]
 \_ 1:0:1:1 sdc 8:32 [active][ready]
 \_ 2:0:1:1 sdf 8:80 [active][ready]

If everything looks good, you can create a configuration file for multipathd. You will need to edit /etc/multipath.conf and, depending on your environment, add or modify some parameters. The configuration file contains enough comments and examples to figure out what the different parameters mean. When in doubt, consult the man pages.

First, add a blacklist section, which exempts certain devices from multipathing. I have my internal drives listed in the blacklist section:

blacklist {
        devnode "^sd[a-b].*"
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z]"
}
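
Before committing a blacklist it is worth checking which device names the patterns really catch. The sketch below uses grep -E as a stand-in for multipath's own regex matching, which is an approximation on my part; the patterns are copied from the section above and the device names in the loop are just samples.

```shell
#!/bin/sh
# is_blacklisted NAME: succeed if NAME matches one of the devnode patterns above.
# grep -E only approximates how multipath evaluates these expressions.
is_blacklisted() {
    printf '%s\n' "$1" | grep -Eq \
        -e '^sd[a-b].*' \
        -e '^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*' \
        -e '^hd[a-z]'
}

for dev in sda sdb sdc sde sdaa dm-3 hda; do
    if is_blacklisted "$dev"; then
        echo "$dev: blacklisted"
    else
        echo "$dev: multipath candidate"
    fi
done
```

One thing this makes visible: because of the trailing .*, the first pattern also matches names such as sdaa, so on a host with many LUNs a tighter expression may be needed.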

Next, you are going to need a device section. This is going to be specific to your SAN. The one below is for the EVA5000. I got the parameters from HP’s device mapper package:

device {
        vendor                  "HP|COMPAQ"
        product                 "HSV1[01]1 \(C\)COMPAQ|HSV[2][01]0|HSV300"
        path_grouping_policy    group_by_prio
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        path_checker            tur
        path_selector           "round-robin 0"
        prio_callout            "/sbin/mpath_prio_alua /dev/%n"
        rr_weight               uniform
        failback                immediate
        hardware_handler        "0"
        no_path_retry           12
        rr_min_io               100
}

You should also look at the defaults section to make sure it is configured for your setup. Again, the parameters in mine are specific to the EVA5000:

defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    failover
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            "/bin/true"
        path_checker            tur
        rr_min_io               100
        rr_weight               uniform
        failback                immediate
        no_path_retry           12
        user_friendly_names     yes
        bindings_file           "/var/lib/multipath/bindings"
}

Finally, you will need to specify the configuration for the presented LUNs. This goes in the multipaths section of the multipath.conf file:

multipath {
        wwid                    3600508b4001031250000900001490000
        alias                   san1data
}
multipath {
        wwid                    3600508b400011c300000f00001a90000
        alias                   san2data
}
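
If you have more than a couple of LUNs, the entries can be generated rather than typed. A rough sketch, assuming a plain list of "WWID alias" pairs on stdin; the function names are hypothetical, and the output simply wraps the individual entries in the enclosing multipaths { } section.

```shell
#!/bin/sh
# multipath_stanza WWID ALIAS: print one multipath { } entry in the layout used above.
multipath_stanza() {
    printf 'multipath {\n'
    printf '        wwid                    %s\n' "$1"
    printf '        alias                   %s\n' "$2"
    printf '}\n'
}

# print_multipaths: read "WWID ALIAS" pairs from stdin and wrap the resulting
# entries in the enclosing multipaths { } section.
print_multipaths() {
    echo "multipaths {"
    while read -r wwid alias; do
        [ -n "$wwid" ] || continue
        multipath_stanza "$wwid" "$alias" | sed 's/^/        /'
    done
    echo "}"
}
```

For example: echo "3600508b4001031250000900001490000 san1data" | print_multipaths. The WWID for a given path can be read with the same scsi_id call used as getuid_callout above.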

After you are done, restart multipathd and check the output of the multipath -ll command:

[root@carbon ~]# multipath -ll
san2data (3600508d311100a300000f00001a90000) dm-3 COMPAQ,HSV111 (C)COMPAQ
[size=15G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=100][active]
 \_ 1:0:3:1 sde 8:64 [active][ready]
 \_ 2:0:3:1 sdh 8:112 [active][ready]
\_ round-robin 0 [prio=20][enabled]
 \_ 1:0:2:1 sdd 8:48 [active][ready]
 \_ 2:0:2:1 sdg 8:96 [active][ready]
san1data (3600508c362d0a1250000900001490000) dm-2 COMPAQ,HSV111 (C)COMPAQ
[size=15G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=50][enabled]
 \_ 1:0:0:1 sdb 8:16 [active][ready]
 \_ 2:0:4:1 sdi 8:128 [active][ready]
\_ round-robin 0 [prio=20][enabled]
 \_ 1:0:1:1 sdc 8:32 [active][ready]
 \_ 2:0:1:1 sdf 8:80 [active][ready]

That should be it. You should test the setup by disabling paths to see if your LUNs stay up.
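
While pulling paths, a quick per-map count of healthy paths is easier to read than the full listing. The sketch below parses multipath -ll output in the format shown above (0.4.7 with aliases or user-friendly names); other versions print a different layout, so treat the field positions as assumptions.

```shell
#!/bin/sh
# summarize_paths: read `multipath -ll` output on stdin and print
# "<map> <ready paths>/<total paths>" for each map.
summarize_paths() {
    awk '
        # Map header: first field is the alias, second is the "(wwid)".
        $2 ~ /^\(/ { map = $1; order[++n] = map; next }
        # Path line: second field is an H:C:T:L tuple.
        $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ {
            total[map]++
            if ($0 ~ /\[active\]\[ready\]/) ready[map]++
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s %d/%d\n", order[i], ready[order[i]], total[order[i]]
        }
    '
}
```

Used as: multipath -ll | summarize_paths. Running it before and after disabling a path shows at a glance whether every map still has ready paths left.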

Moving Solaris Container to a different host

Cloning a Solaris Container is pretty straightforward. But what if you want to have an identical container on another host? In a nutshell:

  1. Make a clone of an existing container on host A
  2. Detach the clone
  3. Compress it and move it to host B
  4. Create configuration for the moved container
  5. Decompress the container on host B
  6. Attach the decompressed container

I have done this on Solaris 10 8/07. Before going any further, it is important that both host A and host B are running the same release of Solaris and are at the same patch level. Otherwise, you will almost certainly run into a situation where the container refuses to attach to the new host.

I had created a cloned container called mx2. First, I detached the container:

bash-3.00# zoneadm -z mx2 detach

Then I archived the container directory so I could move it to host B. It does not really matter which tool you use to archive the directory. Just make sure you preserve permissions, ownership and ACLs. For some reason, tar had a bit of an ordeal archiving the container directory:

bash-3.00# cd /export/home/zones
bash-3.00# tar cf mx2.tar mx2
tar: mx2/root/usr/jdk/instances/jdk1.5.0/jre/lib/sparc/cpu/sparcv9+vis/sparcv9/libclib_jiio.so: symbolic link too long
tar: mx2/root/usr/jdk/instances/jdk1.5.0/jre/lib/sparc/cpu/sparcv9+vis2/sparcv9/libclib_jiio.so: symbolic link too long
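
These errors presumably come from the 100-byte link-name field in the classic tar header. To see up front which links will be skipped, something like the following sketch can be run before archiving. It assumes a readlink utility is available, which may not be the case on a stock Solaris 10 install, and the long_symlinks name is my own.

```shell
#!/bin/sh
# long_symlinks DIR [LIMIT]: list symlinks under DIR whose target is longer than
# LIMIT characters (default 100, the link-name field size in the tar header).
long_symlinks() {
    limit=${2:-100}
    find "$1" -type l | while IFS= read -r link; do
        target=$(readlink "$link") || continue
        if [ "${#target}" -gt "$limit" ]; then
            printf '%s -> %s\n' "$link" "$target"
        fi
    done
}
```

Running long_symlinks mx2 from the zones directory gives a list of links to recreate by hand on the other side.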

Once I had the directory archived I copied it to host B with scp. Then I unpacked it in the zones directory:

bash-3.00# cd /export/home/zones
bash-3.00# tar xf mx2.tar

I recreated the problematic symlinks mentioned above manually. If you are having trouble attaching the container, check that the container configurations are the same on both systems.

Before you can attach a container, you need to have the container configuration in place on the new host. Without it you will not be able to attach the container.

Make sure your configuration is correct. I was attaching a full root container and the system complained that the container being attached was missing some packages. In reality, my container configuration was for a sparse root container. It turned out that when I imported the container configuration, some inherit-pkg-dir statements were added, which caused the attach operation to fail. I had to remove those manually.

So, again, before attaching the container make sure your container configuration is good, you are inheriting the correct package directories, etc. Once you have that right, you can attach the new container:

bash-3.00# zoneadm -z mx2 attach

That’s it. Depending on your needs you might want to change the hostname, IP address and so on.
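
Putting the whole move together, here is a dry-run sketch that only prints the commands. The zone name and zones directory are the ones used above; hostb is a made-up name for the target host, and using zonecfg's create -a to build the configuration from the detached zone is one possible way to do that step, not necessarily how I did it at the time.

```shell
#!/bin/sh
# Dry-run sketch of the move; prints each command instead of running it.
ZONE=mx2                       # zone name from the post
ZONEROOT=/export/home/zones    # zones directory from the post
HOST_B=hostb                   # hypothetical name for the target host

plan_zone_move() {
    echo "# on host A"
    echo "zoneadm -z $ZONE detach"
    echo "cd $ZONEROOT && tar cf $ZONE.tar $ZONE"
    echo "scp $ZONEROOT/$ZONE.tar $HOST_B:$ZONEROOT/"
    echo "# on host B"
    echo "cd $ZONEROOT && tar xf $ZONE.tar"
    echo "zonecfg -z $ZONE 'create -a $ZONEROOT/$ZONE'"
    echo "zonecfg -z $ZONE info"
    echo "zoneadm -z $ZONE attach"
}

plan_zone_move
```

In this variant the archive is unpacked before the configuration is created, because create -a reads the detached zone's files from that path. The zonecfg info step is there to review the inherit-pkg-dir entries before attaching.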

Some interesting linkage:

Solaris Containers replication