Sunday, January 31, 2010

Odds and Ends - 01/31/10

  • The Hot Aisle has an article showing the mathematical inevitability of storage arrays moving to Flash and SATA (AKA Flash and Trash).  While SSD adoption was slow initially, almost every vendor now offers it in some fashion.  I agree that to reap the full benefits, Flash will eventually have to stop looking like a standard "spindle."
  • Storagezilla had a nice post on Oracle's declaration of war on NetApp.  It is the second time in recent memory that Oracle has declared war on an established vendor, the first being its release of a rebranded Red Hat Enterprise Linux.  That doesn't appear to have affected Red Hat in the long term, and I doubt this will affect NetApp much.  In a storage purchase, you're relying on the vendor's ability to deliver as much as on what it's delivering, and it will be some time before Oracle has proven itself in the storage realm.
  • EMC obliterates the competition in SPECsfs_cifs and posts extremely competitive numbers in SPECsfs_nfs.  The CIFS result originally looked like the work of a bored engineer in an EMC lab trying to see how thoroughly he could destroy the existing rankings: the benchmark was run entirely on SSDs (apart from 4 FC disks for Celerra information).  I wonder if this will prompt other vendors to post updated CIFS numbers.  Storagezilla claims they won't because of how bad their implementations are.  It could also be because few vendors can offer that amount of SSD storage.  I have to ask, though: does CIFS really make sense for this type of workload?
  • Storagebod posted on the cloud angle of Apple's iPad announcement. I thought something very similar when I saw the announcement, with a few differences.  First of all, the bulk of storage on most consumers' computers is media.  iTunes already has most of that content available, so pushing that storage into the iTunes cloud is more a matter of scaling IO/access than of 'having sufficient storage.'  In fact, if Apple could talk Big Content into letting it detect non-iTunes media and offer the equivalent iTunes media free of charge, this would be even easier.  Pirates won't buy songs they already 'have', so there isn't a lot of money left on the table, AND it reduces the availability of completely wide-open music/movie files.
  • Cleversafe posted a good primer on silent errors.  This is the main reason why details matter when it comes to RAID implementation, and why you need more than 1 piece of parity for large drives.
  • If you have the chance to try it, New Belgium's Spring Seasonal, the Mighty Arrow, is quite tasty.  A nice pale ale that is light on hops and extremely drinkable.

Thursday, January 28, 2010

vBlock and Private Cloud

I'll be honest... when EMC announced the vBlock architecture alongside the VCE initiative, I didn't quite get its importance.  In my mind, there was very little benefit to these preconfigured stacks, especially at the price points that I heard rumored.

After a few weeks, I think I've got a handle on it and why this is potentially a big deal.

When technologies first come out, implementations are fairly complex and require quite a bit of trial and error alongside a broad range of skillsets.  VMware was no different: it required people who understood virtualization technology, networking, storage, Windows/guest OSes, and security.  As time passed, implementations became easier as the toolsets improved and knowledge became more widely available.

After that, the difficulty shifted.  Beyond the politics of getting signoff to virtualize as much of your environment as possible, the next challenge was taking the architecture and scaling it big.  As the technology progressed, this became easier (it is still not what I'd call "simple"), and the focus started shifting towards "how do we back up and recover these large environments, and how do we leverage the technologies in play for Disaster Recovery/Business Continuity?"

What strikes me, though, is that as the challenges grow, a good fundamental implementation remains just as important.  Compatibility matrices still need to be kept up to date, documented, and tested.  New server hardware and processor models need to be researched, and any documentation/procedures that change need to be updated as new VM farms are built on the new hardware.

The real kicker, though, is that the people who originally brought in VMware are typically still working at that level, making sure the implementations are solid rather than spending their time on the more difficult "next big thing."


So how does vBlock fit into this?  Simple.  If your organization has a large virtualized environment and you aren't allowed headcount increases, vBlock offers an opportunity: namely, to take some of your best technical employees and reposition them where they can provide the most value.  Let's face it, an architect with 4+ years of experience is wasted on validating firmware levels.  Similarly, in the large environments where vBlock makes sense, churning VMware farms to stay supported isn't a great use of highly skilled people's time.

Notice that the vBlock architecture does not cover the current cutting edge: leveraging the virtual infrastructure as much as possible to benefit DR/BC.

How does this play into Private Clouds?  Simple.  There are a lot of private cloud definitions floating around, but for the purpose of this, I'm going to drastically simplify it.

A private cloud is "self service IT."

A lot of people get edgy when you start discussing private clouds.  The foremost argument typically runs along these lines:  A private cloud, implemented properly, greatly reduces the time to deploy a server/system, increases the accuracy of the build process, and dramatically reduces the "friction" of implementation.  By "friction," what I'm really getting at are the things that turn a 1-day server build into a 2-week process.  Because reducing that friction lowers the barrier to entry, the argument goes, total cost will go up (easier/quicker build process = more systems being built = more money going to VMware, Microsoft, and your storage vendor).

Not quite.  A good private cloud still requires strong processes.  Systems have to be sized, priced, signed off on, and someone with a budget has to agree to fund it.  None of that really changes (granted, depending on many factors, sizing exercises may be reduced in many cases).  All that changes is, once all of the appropriate approvals are attained, systems can be deployed in days instead of weeks.  Money still should not be spent without a good business and cost justification - but in either model, if someone gets the appropriate signoffs and funding, the environment is going to get built.

The second argument is fairly simple:  A good private cloud takes automation, standardization, and virtualization, and if it becomes extremely simple to deploy systems, people could lose their jobs.  Let's be honest with ourselves... if the only value IT resources provide is the ability to install a server, then their days of employment are probably numbered anyway.  If their only function is to ask "small, medium, large, or super-sized server?", they are probably in the wrong profession.

The final bit is this.  The vBlock provides a few things:
  • A standardized, compliant "plug and play" architecture for virtualized environments
  • The ability to free up valuable time to work on areas that provide true business value.
  • A decent building block for private clouds, alongside software to (supposedly) streamline the administration of the cloud architecture and increase the number of systems a given admin can support.
VMware/EMC/Cisco were first to market with these preconfigured building blocks.  NetApp recently announced something similar, and I assume Oracle will come out with a competitor eventually.  A good systems administrator automates as much as possible.  Fundamentally, all this does is take automation to the next (massive, cross-vendor) level.

Monday, January 25, 2010

XIV Disk Fault Questions [Updated]

UPDATE [8/6/2010]:  If you're interested in more current XIV information, I recommend reading Tony Pearson's recent posts on the subject.  He also provided additional information in the comments on one of my posts.
--
Today, I came across an XIV RAID-X post by IBMer KT Stevenson: RAID in the 21st Century.  It is a good overview of the XIV disk layout/RAID algorithm.  I have limited my questions in this post to ones raised by KT’s post since this post is already a bit lengthy.
"In fact, DS8000 Intelligent Write Caching makes RAID-6 arrays on the DS8000 perform almost as well as pre-Intelligent Write Caching RAID-5 arrays."
Any array that caches all incoming writes should be able to claim the same (from a host perspective).  For large writes where the entire stripe is resident in memory for the parity computation, there should be almost NO performance degradation.  It is great that the DS8000 performs well with RAID-6, but that is rapidly becoming "table stakes," if it isn't already.
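To put rough numbers on the full-stripe point, here is a small bash sketch using the textbook back-end I/O counts for a small host write (2 for RAID-10, 4 for RAID-5, 6 for RAID-6) and, for a full stripe already assembled in cache, only the extra parity writes. These are generic rule-of-thumb figures, not DS8000 (or any vendor's) measured behavior, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# small_write_ios LEVEL: back-end I/Os for one sub-stripe host write
# (classic read-modify-write; ignores cache hits and coalescing).
small_write_ios() {
  case "$1" in
    raid10) echo 2 ;;   # write both mirror copies
    raid5)  echo 4 ;;   # read data + P, write data + P
    raid6)  echo 6 ;;   # read data + P + Q, write data + P + Q
    *) echo "unknown level: $1" >&2; return 1 ;;
  esac
}

# full_stripe_overhead_pct LEVEL DATA_DISKS: extra writes, as a percentage of
# the data writes, when the whole stripe is in cache and parity is computed
# in memory (no reads at all).
full_stripe_overhead_pct() {
  local level=$1 k=$2 parity
  case "$level" in
    raid5) parity=1 ;;
    raid6) parity=2 ;;
    *) echo "unknown level: $level" >&2; return 1 ;;
  esac
  echo $(( parity * 100 / k ))
}
```

For example, `small_write_ios raid6` prints 6, while `full_stripe_overhead_pct raid6 8` prints 25: an 8+2 stripe that arrives whole needs two parity writes per eight data writes and no reads, which is why a write cache that assembles full stripes hides most of the RAID-6 penalty.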
"When data comes in to XIV, it is divided into 1 MB ‘partitions’.  Each partition is allocated to a drive using a pseudo-random distribution algorithm.  A duplicate copy of each partition is written to another drive with the requirement that the copy not reside within the same module as the original.  This protects against a module failure.  A global distribution table tracks the location of each partition and its associated duplicate.
[..]
The most common ways to mitigate the risk of data loss are to decrease the probability that a critical failure combination can occur, and/or decrease the window of time where there is insufficient redundancy to protect against a second failure.  RAID-6 takes the former approach.  RAID-10 and RAID-X take the combination approach.  Both RAID-10 and RAID-X reduce the probability that a critical combination of failures will occur by keeping two copies of each bit of data.  Unlike RAID-5, where the failure of any two drives in the array can cause data loss, RAID-10 and RAID-X require that a specific pair of drives fail.  In the case of RAID-X, there is a much larger population of drives than a typical RAID-10 array can handle, so the probability of having the specific pair of failures required to lose data is even lower than RAID-10."
At first this paragraph made perfect sense to me, but something about it just didn't sit right.  Namely, this portion:
"In the case of RAID-X, there is a much larger population of drives than a typical RAID-10 array can handle, so the probability of having the specific pair of failures required to lose data is even lower than RAID-10."

The following is based on what I've read online: allocations are divided up into 1 MB partitions, which are then distributed across the frame.  For the purpose of this question, I will assume that 100% of the disks are available for distribution (which is untrue, but it is the absolute best-case scenario) and that the data is perfectly evenly distributed.
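The placement rule KT describes (each 1 MB partition on a pseudo-randomly chosen drive, its duplicate on a drive in a different module) can be sketched in bash as below. This is an illustrative model only, not XIV's real algorithm: it assumes 15 modules of 12 disks numbered 0..179 and uses bash's `$RANDOM`, which is only roughly even. That unevenness is why the math that follows assumes a perfect distribution as the best case.

```shell
#!/usr/bin/env bash
# Illustrative model only: 180 disks numbered 0..179, 12 per module, so the
# module of disk d is d / 12. This is not XIV's actual distribution code.
DISKS=180
PER_MODULE=12

# place_partition: print "primary mirror" for one 1 MB partition. Both
# positions come from $RANDOM (slightly biased, fine for a demo); the mirror
# is re-drawn until it lands in a different module than the primary.
place_partition() {
  local primary mirror
  primary=$(( RANDOM % DISKS ))
  while :; do
    mirror=$(( RANDOM % DISKS ))
    if (( mirror / PER_MODULE != primary / PER_MODULE )); then
      break
    fi
  done
  echo "$primary $mirror"
}

# Place a handful of partitions and show where each copy landed.
for i in 1 2 3 4 5; do
  read -r p m <<< "$(place_partition)"
  echo "partition $i: disk $p (module $(( p / PER_MODULE ))), mirror disk $m (module $(( m / PER_MODULE )))"
done
```

A real system would record each placement in something like the global distribution table KT mentions; the sketch just prints them.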

In a fully loaded XIV frame, there are 180 physical disks.  What I'm interested in is the number of chunks that can be mirrored among the 180 disks without repeating a pair; once a 'unique' pair is repeated, you are vulnerable to a double disk failure with every allocation past that point.  So, C(180,2) = 180 × 179 / 2 = 16,110 unique pairs.  At 1 MB per chunk, that is roughly 16 GB.  From an array perspective, you run out of uniqueness after about 16 GB of utilization.  From an allocation perspective, any allocation larger than 16 GB would be impacted by any double disk fault.  I assume XIV doesn't "double up" on 1 MB allocations (going for a less wide stripe to reduce the chances of a double fault) simply because I've always heard that hotspots aren't an issue.  This is the best case, assuming a perfect distribution, as near as I can reason; I'm sure any XIVers out there will correct this if I'm making an invalid assumption.

If you look closer, though, it's a little worse than that.  Not every disk is a candidate mirror target: as noted above, XIV does not mirror data within a module.  With 15 modules of 12 disks each in a 180-disk system, that means for each mirror position there are 11 disks (the other disks in the same module) that cannot be used.  The math gets beyond me at this point, so if anyone wants to comment on what that actually equates to, I'd be interested.
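For anyone who wants to check the counting, here is a bash sketch of the best-case numbers under the same assumptions (every disk usable, 15 modules of 12 disks, 1 MB chunks, decimal GB). Excluding same-module pairs is simply the total pair count minus the C(12,2) pairs inside each of the 15 modules; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# choose2 N: number of unordered pairs among N disks, N*(N-1)/2.
choose2() { echo $(( $1 * ($1 - 1) / 2 )); }

# cross_module_pairs DISKS MODULES: pairs whose two disks sit in different
# modules = all pairs minus the same-module pairs inside every module.
cross_module_pairs() {
  local disks=$1 modules=$2
  local per_module=$(( disks / modules ))
  echo $(( $(choose2 "$disks") - modules * $(choose2 "$per_module") ))
}

all=$(choose2 180)                   # 16110
cross=$(cross_module_pairs 180 15)   # 15120
# At 1 MB per chunk, pairs / 1000 is roughly the decimal GB before a pair repeats.
echo "any two disks:          $all pairs (~$(( all / 1000 )) GB)"
echo "different modules only: $cross pairs (~$(( cross / 1000 )) GB)"
```

Under this model the module rule trims the pool from 16,110 to 15,120 distinct pairs (equivalently 180 × 168 / 2, since each disk has 179 - 11 = 168 eligible partners), so uniqueness runs out at roughly 15 GB instead of 16 GB. This is still the idealized best case; with pseudo-random placement, pairs start repeating well before every pair has been used.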

  1. What is the “blast radius” of a double drive fault with the two drives on different modules?  Is it just the duplicate 1MB chunks that are shared between the two drives, or does it have broader impacts (at the array level)?
  2. At what size of allocation does a double drive fault guarantee data loss (computed as roughly 16GB above)?
  3. What is the impact of a read error during a rebuild of a faulted disk?  How isolated is it?
  4. Does XIV varyoff the volumes that are affected by data loss incurred by a double drive issue, or is everything portrayed as “ok” until the invalid bits get read?
  5. If there is data loss due to a double drive issue, are there reports that can identify which volumes were affected?
Update (01/26/2010):
I realize that the math part of this post is a little hard to understand, especially with 180 spindles in play, so I went ahead and drew it out with only 5 spindles (5 C 2 = 10).

This shows the 5 spindle example half utilized.  In this diagram it is possible to lose 2 spindles without data loss: for example, you can lose spindle 2 and spindle 4, since no mirror has both of its copies on those two spindles.



This shows the 5 spindle example with all unique positions utilized.  In this diagram, it is impossible to lose 2 spindles without losing both sides of one of the mirrors.

The 16GB number quoted above is based on a 1MB chunk size, which is what has been documented online.  If the chunk size were larger, the amount of data before guaranteed loss would be higher.  Of course, if you lose the wrong two drives before reaching 16GB, you'll still lose data; the chance of data loss increases as you get closer to 16GB.
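The diagrams don't survive here, so below is a bash sketch that brute-forces the same check: given a mirror layout written as spindle pairs, it tries every two-spindle failure and lists the ones that take out both copies of some chunk. The half-utilized layout in the usage line is a hypothetical one chosen to match the description (spindles 2 and 4 share no mirror), not necessarily the exact layout from the diagram, and `lost_combos` is a made-up name.

```shell
#!/usr/bin/env bash
# lost_combos SPINDLES PAIR...: PAIRs are "a-b" mirror placements, one per
# chunk, with spindles numbered 1..SPINDLES. Prints each two-spindle failure
# that takes out both copies of some chunk, then a summary line.
lost_combos() {
  local n=$1
  shift
  local pairs=" $* " a b lost=0 total=0
  for (( a = 1; a <= n; a++ )); do
    for (( b = a + 1; b <= n; b++ )); do
      total=$(( total + 1 ))
      # Both copies of a chunk are gone if it was mirrored on exactly a and b.
      if [[ $pairs == *" $a-$b "* || $pairs == *" $b-$a "* ]]; then
        echo "lose $a+$b"
        lost=$(( lost + 1 ))
      fi
    done
  done
  echo "$lost of $total two-spindle failures lose data"
}
```

`lost_combos 5 1-2 3-4 1-5 2-3 4-5` reports 5 of 10 failures losing data (a 50% chance for a random double failure), and listing all ten pairs reports 10 of 10, matching the fully utilized diagram. The lost/total ratio is one way to put a number on the chance of data loss rising as utilization approaches the point where every pair is in use.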

I know KT is working on a response to this; I'm looking forward to being shown where the logic above is faulty (or where my assumptions went south).

Tuesday, January 19, 2010

Storage Costs - Cost per GB? Not Always

Recently, using "cost per GB" as a metric to rank the acquisition cost for storage platforms has come up quite a bit... starting with a conversation amongst the storage fanatics on Twitter ('bod, myself, sysadmjim, and peglarr) followed by a post from tommyt (Xiotech) and then Storagebod.

First off, the general consensus seems to be that "cost per raw GB" is not ideal, simply because architectural differences (e.g. DIBs), RAID overhead, and other factors make the amount of usable capacity per "raw GB" vary greatly.  Most people have settled on "cost per usable GB."  Over at Storagebod's blog, an unnamed vendor (but obviously one that has dedup, etc.) claims that "cost per used GB" is a better, albeit harder-to-measure, metric.

When migrating from a legacy environment, it is difficult to quantify how much primary storage dedup can save you.  From a career standpoint, it is a little risky to assume that you can purchase 80% of your currently allocated storage to meet your current and future needs based on what a vendor claims you can squeeze out of it.  While there are tools and benchmarks to help with this, all things being equal, I think it's safe to say people are more comfortable recommending enough spinning rust to meet their capacity requirements than relying on dedup/thin provisioning (TP).  While dedup/TP can save money (especially on future orders, after you've seen how it behaves in your particular environment), for the initial outlay it is a little risky.  Additionally, you need to be careful that you're not spending more on dedup licenses than you would have on disk (after maintenance, etc.).

Another issue with cost per GB is that it can fluctuate based on which components need to be upgraded with any given order.  With the upfront costs of the cabinets, storage processors, cache, and connectivity being fairly significant, the incremental cost of adding disk can be higher (or lower) depending on what you need to "scale" on the array.

In short, cost per GB is not always a good metric.  While it works when capacity is the primary requirement, it falls short when specific IOPS requirements outstrip what the required capacity can deliver.  For the sake of argument, let's assume a given corporation has standardized on 3 tiers of storage:
  • Tier 1:  High performance requirements - such that the capacity required does not fulfill the IOPS requirement (typical architecture:  RAID 10, 15k drives or SSDs).
  • Tier 2:  General storage requirements - generally, the capacity required will fulfill the IOPS requirement, especially when spread over enough hosts and disks (typical architecture:  RAID 5, 10k drives).
  • Tier 3:  Archive storage requirements - generally, IOPS is not a major consideration, and fat cheap disks are more important than speed (typical architecture:  RAID 6, 1-2TB SATA drives).
With these tiers, cost per GB is a fairly good metric for Tiers 2 and 3, but it falls down in Tier 1, where additional drives are allocated to meet a performance requirement after the capacity requirement is met.  The use of SSDs can reduce the cost of Tier 1, but in the past it has been difficult to lay out the allocations optimally to get the best use out of them.  Technologies such as FAST (EMC) and PAM (NetApp) help with the layout issue.  Wide striping can help performance in all Tiers, but in Tier 1 you have to be careful if you need to guarantee response time.
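A quick way to see where cost per GB falls down is to size a tier for both capacity and IOPS, then compare the cost per usable GB with the cost per GB actually requested. The bash sketch below does that. The drive sizes, per-drive IOPS (about 180 for 15k and 140 for 10k, common rules of thumb), prices, and workloads in the examples are all hypothetical, and IOPS are treated as back-end IOPS with the RAID write penalty already applied.

```shell
#!/usr/bin/env bash
# size_tier NEED_GB NEED_IOPS DRIVE_GB DRIVE_IOPS DATA_DRIVES GROUP_DRIVES PRICE
#   NEED_IOPS     back-end IOPS (RAID write penalty assumed already applied)
#   DATA_DRIVES / GROUP_DRIVES   RAID group shape: 1/2 = RAID 10 mirror,
#                 4/5 = RAID 5 (4+1), 6/8 = RAID 6 (6+2)
#   PRICE         hypothetical price per drive, whole dollars
# Buys whole RAID groups until both capacity and IOPS are met, then prints the
# cost per usable GB (what the array offers) and per requested GB (what you need).
size_tier() {
  local need_gb=$1 need_iops=$2 drive_gb=$3 drive_iops=$4
  local data=$5 group=$6 price=$7
  local group_gb=$(( data * drive_gb ))
  # Ceiling divisions: groups for capacity, drives (then groups) for IOPS.
  local groups_cap=$(( (need_gb + group_gb - 1) / group_gb ))
  local drives_iops=$(( (need_iops + drive_iops - 1) / drive_iops ))
  local groups_iops=$(( (drives_iops + group - 1) / group ))
  local groups=$groups_cap bound=capacity
  if (( groups_iops > groups_cap )); then
    groups=$groups_iops bound=IOPS
  fi
  local drives=$(( groups * group ))
  local usable_gb=$(( groups * group_gb ))
  local per_usable=$(( drives * price * 100 / usable_gb ))   # cents, truncated
  local per_needed=$(( drives * price * 100 / need_gb ))     # cents, truncated
  printf '%d drives (%s-bound): $%d.%02d per usable GB, $%d.%02d per requested GB\n' \
    "$drives" "$bound" $(( per_usable / 100 )) $(( per_usable % 100 )) \
    $(( per_needed / 100 )) $(( per_needed % 100 ))
}
```

`size_tier 2000 20000 300 180 1 2 1000` (a hypothetical Tier 1 workload on RAID 10) prints `112 drives (IOPS-bound): $6.66 per usable GB, $56.00 per requested GB`, while `size_tier 2000 500 600 140 4 5 600` (Tier 2, RAID 5 4+1) prints `5 drives (capacity-bound): $1.25 per usable GB, $1.50 per requested GB`. In the IOPS-bound case, the cheap-looking per-usable-GB figure hides the fact that most of the capacity was bought for spindles rather than space.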

FAST v2 is an interesting technology.  Basically, EMC is betting that, with extremely granular intelligent data moves, most customers can eliminate the Tier 2 requirement, spread all the allocations over a lot more 'Tier 3' spindles, and handle the hot spots by migrating hot blocks to SSDs.  This will make internal chargeback extremely difficult, since placement is dynamic and self-tuning.  It will also make performance testing on non-production servers difficult since, to my knowledge, there isn't a good way to "apply" a given layout to a different environment.

All of which basically says, going forward, straight "Cost per usable GB" is going to become less important for determining the total cost of a storage environment.  My recommendations?
  • Work with all vendors very closely to make sure that they (and you) have a good understanding of the requirements of the proposed environment.  Make sure the estimates are realistic, and in the cases of dedup and TP, make sure that the vendor will stand by the ratios that the solution depends on.
  • Make sure that you understand how the storage environment will grow - namely maintenance costs, stair step upgrades (when purchasing additional disks require cache, directors, additional licensing, etc).  Make sure it is all in the contract, and pre-negotiate as much as possible.
  • Maintain competition between vendors.
Typically, someone will bring up using virtualization as a mechanism to ensure fair competition among vendors for capacity growth.  While there is some validity to that for existing environments where multiple vendors are already on the floor and under maintenance, for new environments the "startup costs" of new arrays tend to negate any "negotiating benefit" you could get from virtualizing the environment in my opinion.

Friday, January 15, 2010

Giving the Right Answers

Storagebod has an interesting post up about the questions customers ask vendors.  It is definitely a good read, and the comments are good food for thought.  The main question I think should be asked during RFPs is "give me customer references that have gone through a migration very similar to the one we would face if we chose your solution."  While customer references are always glowing and vetted, these types of references can (and have) surfaced common gotchas.

What I'd like to touch on is vendor responses to questions.  I'd like to see blunter answers that actually help the customer with the implementation.  Answers such as:
  • "That is a supported configuration.  However, we'd recommend that you do it this way..." - Too often, I've seen responses that just indicate whether or not a given configuration is supported... not if it makes sense with the equipment in question.
  • "While that type of drive configuration made sense with your current architecture... in the proposed solution, it does not, and this is why." - No two storage platforms behave the same... questions are driven by past experience with whichever vendor is currently in house.  The responses should guide the customer to better solutions... not simply forklift what is currently installed into a differently badged environment.  Of course, explain why there are deviations, but don't allow for a tray of 15 10k 73GB drives if it isn't optimal on the suggested architecture, for example.
  • "No.  We do not provide that functionality.  We actually do not plan on providing that functionality due to." - If there is functionality that the proposed solution does not provide, and there isn't a concrete date for implementation, don't even respond with "it's coming."  One or two missing "nice to haves" probably isn't going to make a difference during vendor selection (cost will come into play typically before that).
Basically, I'd rather see blunt responses that indicate where the customer is being stupid/misguided than a glowing RFP response that doesn't quite paint an accurate picture.

Sunday, January 10, 2010

Perl Script to Add or Remove Colons from WWNs

Some days it seems like a large portion of my job is pasting WWNs from Cisco MDS switches into SYMCLI (and back).  MDS switches require the WWNs to have colons in them, SYMCLI requires no colons.

I wrote a quick and dirty script in Perl (with a little help from Aaron on the backreference) to add or remove colons from what is currently on the clipboard. 

Requires Microsoft Windows and ActiveState Perl.
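use strict;
use warnings;
use Win32::Clipboard;
# Grab whatever text is currently on the Windows clipboard
my $clip = Win32::Clipboard();
my $wwn  = $clip->Get();
# Trim surrounding whitespace/newlines so a trailing CR/LF can't leave a stray colon
$wwn =~ s/^\s+|\s+$//g;
if ($wwn =~ /:/) {
    # Colon-separated (MDS style): strip the colons for SYMCLI
    $wwn =~ s/://g;
}
else {
    # Bare hex (SYMCLI style): add a colon after every 2 characters for MDS
    $wwn =~ s/(.{2})/$1:/g;
    $wwn =~ s/:$//;
}
$clip->Set($wwn);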
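If you're working from a Unix admin host instead of Windows, the same toggle can be sketched in bash with sed. There's no clipboard handling here (the WWN is passed as an argument), and `toggle_wwn` is just an illustrative name.

```shell
#!/usr/bin/env bash
# toggle_wwn WWN: strip colons if present (MDS -> SYMCLI), otherwise insert
# one after every two hex characters (SYMCLI -> MDS).
toggle_wwn() {
  local wwn=$1
  if [[ $wwn == *:* ]]; then
    printf '%s\n' "${wwn//:/}"
  else
    # Add ':' after each pair of characters, then drop the trailing one.
    printf '%s\n' "$wwn" | sed 's/../&:/g; s/:$//'
  fi
}
```

With a made-up WWN, `toggle_wwn 10000000c9123456` prints `10:00:00:00:c9:12:34:56`, and feeding that back in strips the colons again.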

Thursday, January 7, 2010

Ripping DVDs to Other Formats in Windows 7 64-Bit

After I migrated my main laptop to Windows 7 64-bit, I noticed that most DVD ripping applications no longer worked.  After trying several replacements, I finally found a free app that rips DVDs into MKV format, which HandBrake can then transcode into an iPhone-compatible format.  There are several tutorials online if you need to convert the MKV files back to DVDs, but for my purposes that was unnecessary.

MakeMKV - "MakeMKV is your one-click solution to convert video that you own into free and patents-unencumbered format that can be played everywhere. MakeMKV is a format converter, otherwise called "transcoder". It converts the video clips from proprietary (and usually encrypted) disc into a set of MKV files, preserving most information but not changing it in any way. The MKV format can store multiple video/audio tracks with all meta-information and preserve chapters. There are many players that can play MKV files nearly on all platforms, and there are tools to convert MKV files to many formats, including DVD and Blu-ray discs."

HandBrake - "HandBrake is an open-source, GPL-licensed, multiplatform, multithreaded video transcoder, available for MacOS X, Linux and Windows."

Hot Migrate Root Volumes in AIX

AIX is one of the few OSes that, out of the 'box', can replace boot disks while the OS is running, without an outage.  Of course, the standard disclaimers apply: test this on a non-production LPAR and make sure you have backups.  This is extremely helpful for storage array migrations (bringing an LPAR under an SVC, for example) and general maintenance.

If you’re running under VIO Servers, it even allows complete cleanup to occur online.  I recommend a reboot at the end to be safe, but it really isn't necessary.

Assumptions:  All the target disks are currently configured on the LPAR and are not in any volume groups.

Step 0: MAKE SURE TO HAVE A CURRENT MKSYSB AND BACKUP.

Step 1: Replace an old root hdisk with the new one. If this fails because the destination disk is smaller, go to the alternate instructions below.
$ replacepv OLDDISK1 NEWDISK1
0516-1011 replacepv: Logical volume hd5 is labeled as a boot logical volume.
0516-1232 replacepv:
NOTE: If this program is terminated before the completion due to
a system crash or ctrl-C, and you want to continue afterwards
execute the following command
replacepv -R /tmp/replacepv385038
0516-1011 replacepv: Logical volume hd5 is labeled as a boot logical
volume.
Step 2: Verify that the old disk is not defined to any volume groups:
$ lspv
OLDDISK1 00007690a14xxxxx None
NEWDISK1 00007690a14xxxxx rootvg  active
OLDDISK2 00007690913xxxxx rootvg  active
Step 3: Add the boot image to the new disk:
$ bosboot -ad NEWDISK1
bosboot: Boot image is 30441 512 byte blocks.
Step 4: Repeat steps 1-3 for the second root disk (if replacing both root disks).

Step 5: Adjust the bootlist
$ bootlist -om normal  NEWDISK1 NEWDISK2
$ bootlist -om service NEWDISK1 NEWDISK2
Step 6: Remove the old hdisks (repeat for each old disk):
$ rmdev -dl OLDDISK1
Step 7: Remove the old disk mappings from the VIO Server if applicable.
$ rmdev -dev OLDMAPPING
Step 8: Run savebase
$ savebase
Alternate Instructions

Step A1: Place the replacement hdisk into the volume group:
$ extendvg rootvg NEWDISK
Step A2: Migrate the disk (NEWDISK must have enough free PPs to hold everything on OLDDISK):
$ migratepv OLDDISK NEWDISK
Step A3: Validate that there is no data left on the old disk:
$ lspv -l OLDDISK
Step A4: Remove OLDDISK from the volume group:
$ reducevg rootvg OLDDISK
Step A5: Add the boot image to the new disk:
$ bosboot -ad NEWDISK
Step A6: Repeat steps A1-A5 for the second root disk.

Step A7: Continue with step 5 above (namely, adjust the bootlist, clean up the leftover disks, and run savebase).
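If you do this often, the procedure can be wrapped in a few bash functions. This is a sketch, not a tested tool: the function names and the DRY_RUN switch are hypothetical, it assumes bash is available on the LPAR and that rootvg is mirrored across two disks, and it only strings together the commands shown above. Run it with DRY_RUN=1 first to review the exact commands, and take the mksysb before anything else.

```shell
#!/usr/bin/env bash
# run CMD...: execute a command, or just print it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps 1 and 3: replacepv, then add a boot image to the new disk.
replace_root_disk() {
  local old=$1 new=$2
  if ! run replacepv "$old" "$new"; then
    echo "replacepv failed; use the alternate instructions" >&2
    return 1
  fi
  run bosboot -ad "$new"
}

# Steps A1-A5: the alternate path when the new disk is smaller.
migrate_root_disk() {
  local old=$1 new=$2 answer
  run extendvg rootvg "$new" || return 1
  run migratepv "$old" "$new" || return 1
  run lspv -l "$old"                # step A3: should list no logical volumes
  if [[ ${DRY_RUN:-0} != 1 ]]; then
    read -r -p "Is $old empty? Type yes to continue: " answer
    [[ $answer == yes ]] || return 1
  fi
  run reducevg rootvg "$old" || return 1
  run bosboot -ad "$new"
}

# Bootlist, old-disk removal and savebase, for a two-disk rootvg.
# Usage: finish_root_migration NEW1 NEW2 OLD1 [OLD2 ...]
finish_root_migration() {
  local new1=$1 new2=$2 old
  shift 2
  run bootlist -om normal "$new1" "$new2" || return 1
  run bootlist -om service "$new1" "$new2" || return 1
  for old in "$@"; do
    run rmdev -dl "$old" || return 1
  done
  run savebase
}
```

A typical order would be `DRY_RUN=1 replace_root_disk OLDDISK1 NEWDISK1` to review, the real run for each root disk, then `finish_root_migration NEWDISK1 NEWDISK2 OLDDISK1 OLDDISK2`. The old disk mappings still have to be removed on the VIO Server itself.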

Monday, January 4, 2010

SSH over VPN on the iPhone - Why Not?

Recently, Nigel Poulton tweeted a YouTube video showcasing an application to manage Xsigo I/O Directors from the iPhone... I responded that I had once done something 'similar' using SYMCLI and SSH.  He posted a follow-up discussing whether an iPhone admin function is something that Enterprise customers would be comfortable with:
"While I think the idea is cool, I’m not sure how interested companies would be –> management and configuration changes to production kit from an iPhone ….. sounds a bit ahead of its time to me.   Cool, yes.  But is cool what major companies and managers of large Data Centres are looking for?  Remember that Xsigo kit is pretty squarely pitched at enterprise customers.  Would such applications cause more worries and concerns than they would solve problems?"
I don't think anyone would argue that administering any sort of production kit primarily using an iPhone is a good idea.  But certainly most IT folks have had production situations arise where they're away from a computer and just need to check a few things out quickly.  This type of software is perfect for that.  In any case, it is optional software, so if a given customer has an issue with providing this type of access then they can simply not deploy this interface.
"Think about it this way……. Matt Davis pinged me back saying that he had once done ‘symcli over ssh over VPN ….. via my iPhone’ to administer a Symmetrix DMX!!  Not sure what your initial thoughts are on hearing that, but mine were trepidation.  Sure, that’s pretty damn cool, but pretty flipping scary too!  Kudos to Matt, but more scary than cool in my books"
More scary than cool?  In my opinion, not really.
  1. GNU Screen provides protection against connection hiccups.  If the VPN or SSH connection drops in the middle, I can re-attach to the session exactly as I left it.
  2. I've written perl scripts around the majority of changes... as part of these scripts, they generate 'undo scripts' that can easily revert any changes to the way they were previously.
  3. I'm extremely familiar with SYMCLI (to the point that I tend to know more than the support people I work with), and I would only ever run processes I was comfortable with over this type of connection.  I'm enough of a geek that I have the entire Solutions Enabler PDF collection synced to my iPhone via Dropbox (along with Cisco documentation and other array documentation).
  4. I would never run any procedure that would generate a lock on the array or take a long time to run.  But some symdev or symmaskdb queries?  Readying a device or kicking off a symrcopy command?  Why not?
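Here is a minimal bash sketch of the undo-script idea from point 2, with placeholder commands rather than real SYMCLI syntax (the actual Perl scripts aren't shown, so `change` and `UNDO_FILE` are hypothetical names). Starting it inside a named GNU Screen session (`screen -S symcli`, then `screen -r symcli` after a dropped connection) covers point 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the "generate an undo script as you go" pattern.
# UNDO_FILE collects the inverse of every successful change, newest first.
UNDO_FILE=${UNDO_FILE:-./undo.sh}

# change CMD UNDO_CMD: run CMD (or print it when DRY_RUN=1); only if it
# succeeds, prepend UNDO_CMD so the undo file reverts changes in reverse order.
change() {
  local cmd=$1 undo=$2
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "+ $cmd"
  else
    bash -c "$cmd" || return 1
  fi
  {
    printf '%s\n' "$undo"
    if [[ -f $UNDO_FILE ]]; then cat "$UNDO_FILE"; fi
  } > "$UNDO_FILE.tmp" && mv -f "$UNDO_FILE.tmp" "$UNDO_FILE"
}
```

For example, `change 'echo make-change' 'echo revert-change'` records the revert, and `bash undo.sh` replays the reverts newest first.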
I could argue that this method is more stable than most Web interfaces since it isn't subject to JVM crashes and browser hangs.  As with most CLIs, you need to know exactly what you are doing though.  A little knowledge and root access is a dangerous thing.

Use NIM to Change Root's Password

Many large AIX environments use IBM’s Network Installation Manager (NIM) to deploy and maintain AIX LPARs. If you ever need to change a "forgotten" root password in AIX and have a NIM environment available, the following procedure will allow you to access and change the password.

This requires an outage, but can be easier than booting off of media. Of course, try this in a test environment first to make sure it works as expected and I offer no warranties/etc if something goes horribly wrong (please make sure you use >> and not > below).

I’m not going to claim it is elegant or the best way of doing it, but it works. If anyone has a better way, please post it in the comments.

On the NIM server, run the following command:
nim -o maint_boot -a spot=SPOTNAME LPARNAME
On the AIX box that is having the “root password opportunity”, reboot and enter SMS mode. Make sure that the NIM server IP address is set as the boot server and the LPAR’s network information is configured properly.

The LPAR will perform a network boot using the SPOT. You will have to go through prompts to set up the current terminal and preferred language.  Following that, there will be an option to either install the BOS or go into a limited maintenance mode. Go into the limited maintenance mode.

You will be booted into a semi-functional AIX environment. Use lspv to see which physical volumes are available, then type the following to import the hdisk that originally held rootvg:
importvg hdisk#
Create a temporary mount point and mount the root filesystem:
mkdir /test
mount /dev/hd4 /test
You do not have access to a good portion of the command line tools (including vi) in this environment. Run the following command to add a new account to the passwd file:
echo tempuser::0:0::/:/usr/bin/ksh >> /test/etc/passwd
MAKE SURE THAT YOU USE TWO “>” SYMBOLS. Otherwise, you will overwrite the entire passwd file. Run the following commands to sync the file system and prepare the LPAR for the reboot:
sync
cd /
umount /test
Reboot the LPAR. When the LPAR comes up, it should boot to the proper hdisk. At this point, you can log in locally as the user you created above without a password. Run “passwd root” to change the root password. Be sure to remove the entry you made in /etc/passwd after verifying that the password has been changed.
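Once `passwd root` has succeeded, the temporary entry needs to come back out. One cautious way to do that, sketched in bash (assuming bash is available on the rebooted LPAR; `remove_passwd_entry` is a hypothetical helper), is to filter the line out and only write the file back if exactly one line disappeared, keeping a .bak copy:

```shell
#!/usr/bin/env bash
# remove_passwd_entry FILE USER: delete USER's line from a passwd-format file.
# Writes nothing unless exactly one line disappears; keeps FILE.bak as a copy.
remove_passwd_entry() {
  local file=$1 user=$2 before after
  before=$(wc -l < "$file")
  grep -v "^${user}:" "$file" > "$file.new" || { rm -f "$file.new"; return 1; }
  after=$(wc -l < "$file.new")
  if (( before - after != 1 )); then
    echo "expected exactly one '$user' entry; leaving $file untouched" >&2
    rm -f "$file.new"
    return 1
  fi
  # Copy over the original (rather than mv) so its owner and mode are kept.
  cp -p "$file" "$file.bak" && cat "$file.new" > "$file" && rm -f "$file.new"
}
```

Usage would be `remove_passwd_entry /etc/passwd tempuser`; a second run refuses to touch the file because no matching line remains.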

Sunday, January 3, 2010

Consolidating Filesystems in AIX

Have you needed to consolidate and migrate a filesystem that is spread over 2 physical disks onto 1 physical disk? You can easily do this in AIX without even unmounting the FS.

mjd@techmute mjd $ lspv
hdisk0          00007690a14d9fee                    rootvg
hdisk4          00007690a14cae39                    None
hdisk5          0000769091324b51                    rootvg

There is a FS on testlv that resides in rootvg but has active storage on both hdisk0 and hdisk5. To consolidate it and move it to hdisk4, it takes just three commands:

extendvg rootvg hdisk4
migratepv -l testlv hdisk5 hdisk4

At this point, half of the testlv is on hdisk4, and half is on hdisk0:

mjd@techmute mjd $ lspv -l hdisk4
hdisk4:
LV NAME               LPs   PPs   DISTRIBUTION          MOUNT POINT
testlv                100   100   00..67..33..00..00    /test
mjd@techmute mjd $ lspv -l hdisk0
hdisk0:
LV NAME               LPs   PPs   DISTRIBUTION          MOUNT POINT
[...]
testlv                100   100   39..61..00..00..00    /test
[...]

To finish the consolidation/move, migrate the last half.

mjd@techmute mjd $ migratepv -l testlv hdisk0 hdisk4
mjd@techmute mjd $ lspv -l hdisk4
hdisk4:
LV NAME               LPs   PPs   DISTRIBUTION          MOUNT POINT
testlv                200   200   66..67..67..00..00    /test

This is easier than trying to use cplv or restoring a backup onto the new disk.
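With more than two source disks, it can help to let the system tell you where the LV lives. The bash sketch below reads `lslv -l` to find every disk holding part of the LV other than the target, then runs `migratepv -l` once per source disk. It assumes the usual `lslv -l` layout (an `LV:mountpoint` line followed by a column-heading line, then one line per PV), so check that against your system; the function names are hypothetical, and `extendvg` still has to be run first.

```shell
#!/usr/bin/env bash
# lv_source_disks TARGET: read `lslv -l LV` output on stdin and print every PV
# except TARGET (skips the two header lines and any blank lines).
lv_source_disks() {
  awk -v target="$1" 'NR > 2 && NF && $1 != target { print $1 }'
}

# consolidate_lv LV TARGET: migrate all of LV onto TARGET, one source disk at
# a time. DRY_RUN=1 prints the migratepv commands instead of running them.
consolidate_lv() {
  local lv=$1 target=$2 src
  for src in $(lslv -l "$lv" | lv_source_disks "$target"); do
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo "+ migratepv -l $lv $src $target"
    else
      migratepv -l "$lv" "$src" "$target" || return 1
    fi
  done
}
```

For the example above, that would be `extendvg rootvg hdisk4`, then `DRY_RUN=1 consolidate_lv testlv hdisk4` to preview one migratepv per source disk before running it for real.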