Tuesday, March 29, 2011

Adventures in Bare Metal Recovery

I've worked my way through several Bare Metal Recoveries before, including physical-to-virtual conversions, but this is the first time this has happened to me.  The fact that the server has been unable to contact the domain may have been a pre-existing condition which was further exacerbated by a nasty malware infection.

This is a handy tool, if your server in question is a Domain Controller:
netdom resetpwd /server:server_name /userd:domain_name\administrator /passwordd:administrator_password
But if it's not, a tiny note on this page explains that:
Note: This method only works for DC. If it’s member server, we have to disjoin and rejoin domain.
And I was stuck with a member server.  Thankfully its services recovered after rejoining the domain without too much hacking.  A few updates later and we were back on track.

Monday, March 21, 2011

Weekend Warrior Wireless Project - Phase II

After this last weekend's project, I ended up with more on my plate to fix than originally planned.

Timing Really is Everything

As I noted in my last post, I love the Buffalo WHR-HP-G54, because it's so well supported!  By setting my local physical NIC to 192.168.11.2/24, I was able to ping my device when it started up by using the 5/5/5 method:
ping -t -w 10 192.168.11.1
I found the Windows-based GUI TFTP client listed here to be the best for restartable TFTP sessions.  First I'd get a few errors, then at least 5 pings before it timed out.  Setting the timeout to 10, then starting the TFTP attempts prior to the 5/5/5 maintenance mode boot, I was able to send successfully!
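The "start sending before the bootloader is listening, and keep retrying" trick can be sketched in a few lines of Python.  This is only an illustration of a plain RFC 1350 write request, not the GUI client I actually used; the function names and the firmware.bin file name are made up for the example, and the retry counts are guesses rather than tested values.

```python
import socket
import struct
import time

BLOCK_SIZE = 512  # data block size fixed by RFC 1350


def wrq_packet(filename):
    """Build a write request: opcode 2, file name, then the 'octet' mode."""
    return struct.pack("!H", 2) + filename.encode("ascii") + b"\0octet\0"


def data_packet(block, payload):
    """Build a DATA packet: opcode 3, 16-bit block number, payload."""
    return struct.pack("!HH", 3, block & 0xFFFF) + payload


def split_blocks(image):
    """Split an image into 512-byte blocks; a short (possibly empty) block ends the transfer."""
    blocks = [image[i:i + BLOCK_SIZE] for i in range(0, len(image), BLOCK_SIZE)]
    if not blocks or len(blocks[-1]) == BLOCK_SIZE:
        blocks.append(b"")
    return blocks


def tftp_put(host, filename, image, timeout=1.0, attempts=60):
    """Keep sending the write request until the device answers, then upload the image."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        peer = None
        for _ in range(attempts):
            sock.sendto(wrq_packet(filename), (host, 69))
            try:
                reply, peer = sock.recvfrom(516)
            except socket.timeout:
                continue  # nobody listening yet; try again
            except OSError:
                time.sleep(timeout)  # e.g. port unreachable while the device boots
                continue
            if reply[:4] == struct.pack("!HH", 4, 0):
                break  # ACK for block 0: the device accepted the request
            peer = None
        if peer is None:
            return False
        for number, payload in enumerate(split_blocks(image), start=1):
            expected_ack = struct.pack("!HH", 4, number & 0xFFFF)
            for _ in range(5):
                sock.sendto(data_packet(number, payload), peer)
                try:
                    reply, _ = sock.recvfrom(516)
                except OSError:
                    continue  # lost packet or timeout; resend this block
                if reply[:4] == expected_ack:
                    break
            else:
                return False  # block never acknowledged
        return True
```

Something like tftp_put("192.168.11.1", "firmware.bin", open("firmware.bin", "rb").read()) would be started just before powering the router on, so the requests are already in flight when the short maintenance-mode window opens.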

Selecting options for a successful TFTP

Choosing the Right Firmware (Again)

Firmware was indeed what it came down to.  I chose a firmware that differed in feature set, kernel, and wireless driver all at once.  Essentially, it was too much for the nvram to handle.  Even during that initial upload, I was amazed at just how fast that firmware uploaded.  Oh, wait, it actually didn't.  So back to my last "good" firmware, the 14929 vpn K24 sp2 build from Brainslayer.

From what research I have gathered, when making the jump to the NEWD-2 firmware, it includes a jump to the K26 series.  When I've got more time on my hands and as the builds mature, I'll jump back in and try out the latest and greatest.  Meanwhile the trusty 14929 will have to do.  At least if I did actually brick my device, I've got another one to spare.

Monday, March 14, 2011

Weekend Wireless Warrior Project

Since my father just retired the router I set up for him in favor of re-aligning his IP schema with the generic defaults of the 2WIRE device included with his AT&T U-Verse subscription, I re-inherited one of my favorite toys, a pre-lawsuit Buffalo WHR-HP-G54.  I already had one in my possession (originally desiring to set up a PTP OpenVPN, but never configured it).  I recently swapped mine out to put in my old Apple AirPort Express so we could stream music to the home theater system.  So, now that I've got two good, hackable routers, it was high time to upgrade the firmware using DD-WRT.

Preparing For The Upgrade

After reading the latest documentation on the Peacock Thread, I prepared my system by running a 30/30/30 Reset.  This helps to completely clear the nvram and prep for what's essentially a "schema" upgrade for the settings in the nvram.  Ultimately, bricking can occur if this isn't done as settings can get jumbled between firmware revisions. (A situation I was bound to experience.)

Researching the Right Firmware

Choosing the most up-to-date, feature-complete, and outright compatible firmware is not as easy as visiting the latest router database.  Originally, I thought my device was running build 13064 (10/10/09) using the VPN feature-set.  This build has since been known to be rather unstable, in spite of still being recommended by the router database.

Much to my surprise, after performing the 30/30/30 Reset, my router reported it was running firmware 10011 (07/27/08) vpn!  That build runs the following kernel and wl driver:
Linux DD-WRT 2.4.36 #310 Sun Jul 27 16:25:32 CEST 2008 mips
wl0: Jul 27 2008 08:31:25 version 4.150.10.5
Yes, it's still running Linux kernel 2.4.  Do a little research, and you'll find significant issues with the K26 (2.6 series kernel) builds, as well as with the later wireless drivers (NEWD and NEWD-2).  The key point is that "NEWD won't work only on corerev=4 radios. You can run it on 5 and 7, just virtual wireless interfaces won't work in AP mode. All the rest work."

After copious amounts of research, I settled on the v24-sp2 build 14929 vpn firmware.  And the initial flash was successful!  Here are the kernel and wireless driver versions for that build, a minor upgrade at the least:
Linux DD-WRT 2.4.37 #13291 Thu Aug 12 02:58:35 CEST 2010 mips
wl0: Aug 11 2010 05:22:07 version 4.150.10.31

Cut By The Bleeding Edge

But it wasn't enough; I had to run the 2.6 kernel for improved memory management and a better scheduler!  After all, I've only got 200MHz of MIPS power, pulling in a whopping 199.0 BogoMIPS!  According to this forum post, my device is compatible with the nokaid build of the K24 NEWD-2 firmware.  This is one of the very reasons I chose this model router back in the day.  It's simply got the best support, recoverability, and mod-ability.

Well, I chose the NEWD-2 K26 firmware build 14929 std nokaid.  Notice anything wrong?  Yes, I chose to jump from K24 to K26 with NEWD-2.  And I bricked it.  Thankfully, I backed up my CFE for best recoverability.  At this point, I need to get my device into maintenance mode and TFTP the last good firmware onto it.  I really don't want to solder on a JTAG.
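In hindsight, a pre-flight check for exactly this mistake is easy to sketch.  The snippet below only reads the kernel series out of the "Linux DD-WRT ..." banner lines quoted earlier and flags a flash that would cross from one series to another; the function names are invented for illustration, and the 2.6 banner in the usage note is a made-up example rather than output from a real build.

```python
import re

# Matches the start of a banner such as
# "Linux DD-WRT 2.4.37 #13291 Thu Aug 12 02:58:35 CEST 2010 mips"
BANNER = re.compile(r"^Linux\s+\S+\s+(\d+)\.(\d+)\.\d+")


def kernel_series(banner):
    """Return the kernel series in DD-WRT shorthand, e.g. 'K24' or 'K26'."""
    match = BANNER.match(banner)
    if match is None:
        raise ValueError("unrecognized banner: %r" % banner)
    return "K" + match.group(1) + match.group(2)


def crosses_kernel_series(current_banner, target_banner):
    """True when the target firmware runs a different kernel series than the current one."""
    return kernel_series(current_banner) != kernel_series(target_banner)
```

Comparing the two K24 banners from this post returns False, while comparing the 2.4.37 banner against a hypothetical "Linux DD-WRT 2.6.24 ..." banner returns True, which is the cue to stop and read the forum threads again before flashing.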

And now my weekend warrior project is locked into my desk at the office for a little lunchtime project.

Monday, March 7, 2011

The Trouble with Replication

For those of you hoping to read about SQL Server, I'm sorry to disappoint, but the meat of this article is not for you.  But please everyone, remember:
The worst part about replication is that sometimes it fails, and no one knows.
I've been facilitating an Exchange 2003 to Exchange 2010 SP1 migration, and while checking the prerequisites we found a few problems.  The largest was replication of the SYSVOL share, which is critical for high availability of Group Policy objects and, ultimately, the availability of Active Directory services as a whole.

Getting ready for Exchange

During the expansion of the AD schema, the setup application couldn't find a domain controller.  All DNS tests showed up healthy, but a quick glance at the File Replication Service log revealed that FRS was not replicating the SYSVOL share, inhibiting the domain controller from responding to requests.  Oh, and that domain controller held all five FSMO roles.  Since this domain has both 2003 and 2008 servers, SYSVOL replication is still handled by FRS.  We transferred all of the FSMO roles to the 2003 controller, which was believed to be in a healthy state.

Restarting Replication via a Non-Authoritative Restore

A System State backup was taken on the 2003 server, which supposedly held an authoritative replica.  Since the 2008 controller needed a non-authoritative restore, the backup had to be restored on the 2003 box to a redirected folder, as NT Backup and Windows Backup aren't directly compatible.  After this it was a matter of stopping FRS on the 2008 server, copying the files across hosts, setting a registry key and starting FRS.  However, FRS still failed to replicate, blaming DNS, FRS on the authoritative host, or the lack of convergence in Active Directory's topology information for FRS.  After proving DNS resolution and full replication, the only recourse was to check on FRS on the authoritative host.
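For reference, the stop, set a registry key, start sequence can be written down as a small plan builder.  The post above doesn't name the key; this sketch assumes it is the BurFlags value that Microsoft's FRS guidance describes (D2 for a non-authoritative restore, D4 for an authoritative one).  The function name is invented, the file copy step is left out, and the code only returns the command strings rather than running anything, so treat it as a checklist to verify against the documentation and not as a tested procedure.

```python
# Registry location of BurFlags as described in Microsoft's FRS guidance
# (an assumption here; the original steps do not name the key).
BURFLAGS_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs"
    r"\Parameters\Backup/Restore\Process at Startup"
)
NON_AUTHORITATIVE = 0xD2  # pull a fresh copy of the replica from a partner
AUTHORITATIVE = 0xD4      # treat this replica as the source of truth


def frs_restore_plan(authoritative=False):
    """Return the commands for an FRS restore, in the order they would be run."""
    flag = AUTHORITATIVE if authoritative else NON_AUTHORITATIVE
    return [
        "net stop ntfrs",
        'reg add "%s" /v BurFlags /t REG_DWORD /d %d /f' % (BURFLAGS_KEY, flag),
        "net start ntfrs",
    ]
```

Printing frs_restore_plan() gives the non-authoritative variant for a secondary controller, and frs_restore_plan(authoritative=True) gives the variant for the one replica chosen as the source.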

A Turn for the Worse

A quick restart of FRS on the "authoritative" host revealed that the SYSVOL share was in a JRNL_WRAP_ERROR state.  Typically, this is resolved by enabling a registry key that effectively truncates the FRS database logs and requests a fresh copy from the next authoritative source.  But not so when the second replica has errors!  Now every Domain Controller in the forest is inhibited from bringing Active Directory online.  At this point no one is able to authenticate.

Breathing Life into a Dead System

Quickly performing an authoritative restore from the earlier System State backup brought the 2003 controller, which held all of the FSMO roles, back online.  After that, performing the non-authoritative restore on the secondary controller brought FRS back online.  Then a transfer of the FSMO roles back to the 2008 controller tidied things up and got us ready to introduce a second 2008 controller so we can finally complete a migration to a fully 2008 Active Directory domain and forest.  But even then, an important item of note is that SYSVOL replication must be manually upgraded to take advantage of DFS based replication.

Post-mortem

Thankfully nothing did actually die, so with a good backup prior to starting work and a strong understanding of the fundamentals both of the design of Active Directory and FRS and of the limitations of the environment I was working in, I was able to keep the service outage to a minimum.  Aside from a rogue GPO that was setting the Windows Time service to disabled, this was the best challenge yet that Active Directory has given me.  I'm thankful it wasn't an irregular error like the aforementioned GPO trick, but with Microsoft's excellent documentation (Go v-dashes in the Technet/MSDN teams!), I was able to detect and resolve this issue in under an hour.