Unfortunately only fixing Mac OS X backup, not the Tardis 🙁 … but, nonetheless, critical.
What bit of software do you really need to be reliable? If anything else goes really wrong you have the backup — but if the backup fails you really are lost.
And Mac OS X Time Machine, while it does have a very pretty interface, is inclined to get stuck sometimes.
This is my own story of how it goes wrong … and how to put it right.
… and throughout I’ve dropped in a few lessons for anyone implementing critical system software; maybe the odd Apple engineer is reading.
how to tell when things are wrong
Occasionally Time Machine seems to be stuck, but isn’t really. When you first do a backup, or when you haven’t backed up to a particular disk for ages (perhaps if you have been away on a trip), it can spend several hours ‘preparing’. You can tell it is ‘preparing’ because when you open the Time Machine preferences there is the little barber’s pole saying ‘preparing’ 😉
This is when it is running over the disk working out what it needs to backup, and it always seems to be the lengthiest operation; actually backing up the disk is often quite fast. Yet, for some reason, there is no indication of how far through the ‘preparing’ process it has got.
Lesson 1: make sure you include progress indicators for anything that can take a while, not just the obvious ‘slow’ things.
So, when you see ‘preparing’, just be patient!
However, at least half-a-dozen times over the last year, my Time Machine has got completely stuck. I have seen this happen in three ways:
(i) it is still saying ‘preparing’ after leaving it overnight!
(ii) it starts to transfer to disk, but then gets stuck part way:
(iii) if you look in the Time Machine preferences it says the backup has failed
This last time in fact the first sign was (iii), but it doesn’t actually tell you (if you don’t look) until it has failed for ten days, by which time I was travelling. In the days before Time Machine I always did a manual backup before travelling as I knew that was when things were most likely to go wrong, but now-a-days I have got used to relying on it and forget to check it is working OK … so if you are paranoid about your data, do peek occasionally at Time Machine to check it is still working!
When I got home I told Time Machine to backup to the Time Capsule here rather than my office disk (why can’t it remember that I have two backup disks??). Then (after being very very patient while it was ‘preparing’ for four hours), I saw it get stuck in step (ii) at 1.4 GB or 4.2 GB. Of course progress indicators are never very good for very slow operations; when transferring several GB of data there may be several minutes before the bar even moves a pixel … but I was very very patient and it definitely did not move!
Lesson 2: for very long processes supplement the progress indicator with some other indicator to show things are still working, in this case perhaps amount transferred in last minute
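To make Lesson 2 concrete, here is a minimal sketch in Python of the kind of "amount transferred in the last minute" indicator I have in mind. The class and method names are my own invention for illustration and have nothing to do with how Time Machine is actually built; the clock is passed in only so the idea can be tried out without waiting a real minute.

```python
import time
from collections import deque


class TransferMonitor:
    """Tracks how many bytes were transferred within a sliding time window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock        # injectable so the class can be tested
        self.samples = deque()    # (timestamp, byte_count) pairs, oldest first
        self.total = 0            # bytes transferred since the start

    def record(self, byte_count):
        """Call this each time a chunk has been written."""
        self.total += byte_count
        self.samples.append((self.clock(), byte_count))

    def bytes_in_window(self):
        """Bytes transferred during the last window_seconds."""
        cutoff = self.clock() - self.window_seconds
        # Drop samples that have fallen out of the window.
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()
        return sum(count for _, count in self.samples)

    def is_stalled(self):
        """True if nothing at all arrived during the window."""
        return self.bytes_in_window() == 0
```

A user interface could show `bytes_in_window()` next to the progress bar, and flag the backup as possibly hung when `is_stalled()` stays true for several windows in a row.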
At this point I did the normal things, turn Time Machine On/Off, restart machine a couple of times, etc., but when it persists then you know something is deeply wrong.
so why does it go wrong?
In fact Fiona@lovefibre has found Time Machine flawless for her desktop machine backing up to exactly the same Time Capsule. I am guessing the problem I have is because I use a laptop so possible reasons:
- it may go to sleep occasionally, breaking connection to the Time Capsule
- maybe the WiFi aerial on a laptop is not as good as the desktop
However, if every laptop failed as often surely Apple would have fixed it by now. So guessing there is an additional factor:
- my disk has 196 GB of data, much of it in smaller document files (Word docs, code files, etc.), not just a few giant movies.
The software will be designed to withstand a certain amount of external failure, especially when connecting to disks over WiFi as the Time Capsule is designed to do. However, I imagine that there are places in the code where there are race conditions, or critical portions where external failure really makes a difference. If the external connections are reliable and the backup is quite fast the likelihood of hitting one of the nasty spots in the code is low. However, if you have a lot of data to check and then transfer and the external failures more frequent, then the likelihood of hitting one increases and things start to go wrong.
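A rough way to see why this matters: if each file operation has some small chance p of hitting a nasty spot, and we assume (a simplification) that operations are independent, then the chance of hitting at least one in n operations is 1 - (1 - p)^n. The sketch below just evaluates that formula; the numbers in the usage note are purely illustrative, not measurements of Time Machine.

```python
def chance_of_hitting_flaw(p_per_operation, operations):
    """Probability of at least one failure in `operations` independent tries.

    Assumes every operation fails independently with the same probability,
    which is a simplification of any real backup.
    """
    return 1.0 - (1.0 - p_per_operation) ** operations
```

With an illustrative one-in-a-million chance per operation, ten thousand operations give roughly a 1% chance of trouble, while a million operations give roughly 63%. More files and a flakier connection push both n and p up together.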
I see similar problems with other software, Dreamweaver in particular, which has got better, but still can crash if the Internet connection is poor (see also “Why software need never hang“). What happens is that during testing, the test machines often have minimal data, little software (maybe just the operating system and what is being tested), and operate in perfect situations. In such circumstances these hidden flaws never become apparent.
Lesson 3: make sure your test machine is fully loaded with data and applications, and operates in an unreliable environment, so that testing is realistic
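One cheap way to get an "unreliable environment" in testing is to inject failures deliberately. The following Python sketch wraps any file-like object so that a chosen fraction of writes fail, and pairs it with a simple retry loop that the test can then exercise. `FlakyWriter` and `write_with_retry` are hypothetical names of my own, not part of any real backup tool.

```python
import random


class FlakyWriter:
    """Wraps a file-like object and makes some writes fail, as a poor network might."""

    def __init__(self, target, failure_rate, seed=None):
        self.target = target
        self.failure_rate = failure_rate   # 0.0 = never fail, 1.0 = always fail
        self.rng = random.Random(seed)     # seedable so test runs are repeatable

    def write(self, data):
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return self.target.write(data)


def write_with_retry(writer, data, attempts=5):
    """Try the write up to `attempts` times before giving up."""
    for _ in range(attempts):
        try:
            return writer.write(data)
        except ConnectionError:
            continue
    raise ConnectionError(f"gave up after {attempts} attempts")
```

Running the normal test suite with the failure rate turned up is a way of making the rare nasty spots show up on the test machine rather than on a user's laptop.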
However, this is not like Word crashing and losing your most recent edits to one document. When Time Machine fails it seems to occasionally leave something corrupt in the backup disk so that subsequent attempts to backup also fail. There is no excuse for this, the techniques for dealing with potential disk-writing failures are well established in both databases and low-level disk management. For example, one can save a timestamp file at the end of successful operations so that, when returning to the data, if the timestamp file is not there the software knows something went wrong last time.
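The timestamp-file idea can be sketched in a few lines of Python. The marker name `backup.complete` and the three function names are assumptions made up for this example; I do not know what, if anything, Time Machine does internally.

```python
import time
from pathlib import Path

MARKER_NAME = "backup.complete"   # hypothetical marker file name


def start_backup(backup_dir):
    """Remove the marker before touching anything else."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    marker = backup_dir / MARKER_NAME
    if marker.exists():
        marker.unlink()


def finish_backup(backup_dir):
    """Write the marker only after every other write has succeeded."""
    (Path(backup_dir) / MARKER_NAME).write_text(str(time.time()))


def last_backup_completed(backup_dir):
    """If the marker is missing, the previous run did not finish cleanly."""
    return (Path(backup_dir) / MARKER_NAME).exists()
```

On starting up, the software checks `last_backup_completed()` first; if it returns false it knows not to trust whatever the interrupted run left behind.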
Maybe Time Machine is trying to be too clever, picking up where it left off when, for example, connection to the disk is broken. If so it clearly needs some additional mechanism to notice “I’ve tried this several times and it keeps going wrong, maybe I need to back off to the last successful state”. Perhaps not something to worry about in less critical software, but not difficult to get right when it is really needed … as in backups!
Lesson 4: build critical software defensively in layers so that errors in one part do not affect the whole; and if saving to disk ensure there is some sort of atomic transaction
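As a sketch of what "some sort of atomic transaction" might look like for a folder-based backup: build the new snapshot in a temporary folder, and only rename it into place once everything has been written. A rename within one filesystem is atomic on POSIX systems, so a reader sees either the complete snapshot or none at all. The function name and the `.partial` suffix are hypothetical, chosen for this example only.

```python
import os
import shutil
import tempfile


def atomic_snapshot(parent_dir, final_name, write_files):
    """Build a snapshot in a temporary folder, then rename it into place.

    `write_files` is a callable that receives the temporary folder path
    and writes the snapshot contents into it.
    """
    os.makedirs(parent_dir, exist_ok=True)
    work_dir = tempfile.mkdtemp(prefix=final_name + ".", suffix=".partial",
                                dir=parent_dir)
    try:
        write_files(work_dir)
        final_path = os.path.join(parent_dir, final_name)
        os.rename(work_dir, final_path)   # the single "commit" step
        return final_path
    except BaseException:
        # Any failure: throw away the partial work, leaving older
        # snapshots exactly as they were.
        shutil.rmtree(work_dir, ignore_errors=True)
        raise
```

Because a failed attempt never touches the earlier snapshots, "backing off to the last successful state" comes for free: the last successful state is simply whatever is already there.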
The aim during testing should be what I call “fail-fast programming” trying to make sure that failures happen during testing not real use!
One thing I found particularly disturbing about my most recent Time Machine hang is that when I looked at the system console it had regular spats of “unknown SIGSEGV” several times a minute … in the kernel! If you don’t know UNIX internals the ‘kernel’ is the heart of the operating system of the Mac, where all the lowest level work is done and where if something goes wrong everything fails. SIGSEGV means that some bit of software is trying to access a memory location that doesn’t exist. In fact while this is caught it is not so bad, the greater worry is that if it is trying to access non-existent memory, then it may corrupt other memory … and the kernel has access to everything – not good.
Please, please Apple if you cannot get Time Machine to work properly, do not let it affect the kernel!
how to put it right
One might hope that even if Time Machine cannot notice itself there is something wrong at least there would be an option to say “restart yourself”. One might hope, but there is not. However, you can do it yourself by digging a little into the backup disk itself.
First problem is to stop the Time Machine backup if it has hung.
In the Time Machine control panel, you can simply slide the OFF-ON button to OFF. The status should change to ‘stopping’ and after a while stop. Then you can restart the machine and try to fix things.
This is the ideal thing to do, but I find that when Time Machine is really hung this rarely works. I do turn it to OFF, but either it never changes to ‘stopping’ and stays ‘preparing’, or it changes to ‘stopping’, but never does. If this happens the system restart typically doesn’t restart the system as Time Machine won’t stop running. Then, always with much trepidation, I reach for the on/off button on the Mac itself :-/
After doing a hard on/off like this, I usually do another restart from the Apple menu … not sure if this is necessary, but just to be on the safe side!
Occasionally I skip to the next step before the hard restart.
Then you can start to fix the problem properly.
Find the backup disk. If it is not obvious in the Finder use the ‘Go’ menu and select “Computer”; it shows all the locally connected disks (or it may simply appear in the left hand favourites pane in each Finder window).
If you skipped the restart stage (or if you just peek now to see what it is like when it hasn’t gone wrong), you will see something like “Backup of Alan Dix’s MacBook Pro” (obviously for you it will not be “Alan Dix’s MacBook Pro”!). This is the Time Machine backup. However, if you have restarted the machine with Time Machine off you will have to find the actual disk that you chose as your backup disk and on it look for a file called something like “Alan Dix’s MacBook Pro_0039fc56f8a2.sparsebundle”. This is some form of compressed disk image. In the older versions of Time Machine there was simply a folder with all the backups in it; I felt much more secure. Now this is a single opaque file and I worry what happens if one day it gets corrupted :-/
Having found the ‘sparsebundle’ double click it and it will display a little pop-up window that says ‘checking volumes’. I keep meaning to see if this ever stops, but I am not patient enough and press the button that says to skip this state and then (after a while) it mounts the disk image and the disk “Backup of Alan Dix’s MacBook Pro” appears.
Double click “Backup of Alan Dix’s MacBook Pro” and look inside and then inside the folder “Backups_backupd” and you find loads of dated folders, which are the actual backups of your system that you can browse if you prefer instead of using the Time Machine interface. In addition there may be one file ending “.inProgress”, which is some sort of internal file created while it is in the middle of doing the backup.
Delete the “.inProgress” file.
In addition, I usually delete the last of the dated folders (sort by “Date Modified” to get the last one). However, if you don’t want to lose the last backup you can try just deleting the “.inProgress” file and only delete the last dated backup if Time Machine still gets stuck.
Important: only delete the latest of the dated backup folders (e.g. “2010-06-09-225547” in the screen shot above), NOT the entire “Alan Dix’s Macbook Pro” folder. If you do that you lose all your backups!
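If you would rather check what is there before deleting anything by hand, the following read-only Python sketch lists any “.inProgress” items and names the most recent dated folder. It assumes the dated folders are named like “2010-06-09-225547”, so that sorting by name is the same as sorting by date (rather than by “Date Modified” as in the Finder); the function name and the path you pass in are yours to supply. It changes nothing on the disk.

```python
import re
from pathlib import Path

# Folder names such as 2010-06-09-225547 (year-month-day-time).
DATED = re.compile(r"\d{4}-\d{2}-\d{2}-\d{6}")


def inspect_backups(machine_dir):
    """Return (in_progress_names, latest_dated_folder_or_None).

    Read-only: nothing is deleted or modified.
    """
    entries = list(Path(machine_dir).iterdir())
    in_progress = sorted(p.name for p in entries
                         if p.name.endswith(".inProgress"))
    dated = sorted(p.name for p in entries
                   if p.is_dir() and DATED.fullmatch(p.name))
    latest = dated[-1] if dated else None
    return in_progress, latest
```

You would point it at the folder holding the dated backups inside the mounted backup disk, and then do the actual deleting yourself in the Finder once you are sure of what it reports.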
I recall doing this all with extreme trepidation the first time, but had got to the point when I couldn’t do backups or access them anyway so had nothing to lose. Actually it seems pretty OK getting in here and doing this sort of thing, the nice thing about Time Machine is that it uses ordinary folder structures that you can peek around in and see are there all secure. I am much happier with this than the kind of backup where you only know if it is working the day you try to restore something! At least half the times I have used such backups over the years I’ve found the backup is in some way corrupt or incomplete. So actually one up for Time Machine 🙂
Now reboot again (for luck). Turn Time Machine back on in the control panel and wait … a long time … it will start ‘preparing’ as if for the first backup … and several hours later hopefully all will be well.
But do remember to set the power save options not to go to sleep in the middle!
In fact the above has always worked for me except for this last time when, for some reason (maybe I missed something on the way?), it hung again and I had to go through the whole process again. This time I waited until yesterday evening before turning Time Machine back on so that I could leave it to do the long 4 hour ‘preparing’ stage without me doing anything else.
In its slight defence, Time Machine works far better when backing up to a local drive… back when I only had a 100Gb internal drive, I used to carry around a portable 250Gb drive for this purpose, and it worked great. Does seem pretty flaky over wifi, though.
Nowadays, I do the Time Machine over wifi thing as well, but I also do a clone (using SuperDuper) once a week to that portable 250Gb drive– it’s incremental too, so it only takes about 20 minutes after the initial big dump. Most Mac users I know recommend doing this… apart from the fact it means you’re not stuck if Time Machine decides your backup is corrupted just when you need it, you can also boot direct from a clone to get up and running again immediately in an emergency.
I have a fixed drive in my Office and have once had similar problems with that, so not entirely reliable with that, but certainly seems better.
Fiona used to use SuperDuper and that does seem a good belt-and-braces solution, especially as you say for the complete world has ended type situations … but it says something about Time Machine as backup software (*) that one feels one needs backup for the backup!
* Apple says about Time Machine “Never again worry about losing your digital files.” and about Time Capsule: “It automatically backs up everything, so you never have to worry about losing your important files.” … yea like :-/
Yeah, Time Machine could be much better alright. I suppose I’ve just never really considered it to be a proper backup solution anyway (but maybe that’s just because it doesn’t work as well as it ought to), more of a file-by-file, “darn, I wish I could get back that thing I deleted last week” sort of safety net.
I have 2 internal hard drives in my G-5 desktop; the boot drive failed and I replaced it, reformatted it and did a time machine restore transfer. When I formatted the new drive I forgot to name it exactly what the previous one had been named and accidentally named it very similarly to the other internal hard drive’s name. It looks like it transferred the exact same data onto the new one that was on the other internal drive. I get an error message that disk utility was unable to repair the drive that was intact and that I now need to erase and reformat that one as well.
My question is should I erase both drives and let Time machine restore transfer the data back on to the respective drives, or should I do one at a time or what?
Just don’t want to lose any data so I’ve disconnected the external time machine backup because I don’t want it backing up the incorrect, or duplicate sets of data. What should I do?
Thanks for these hints. I might give this a try. I’ve had problems where my attempt to use time machine failed in a similar fashion (got about half way and then stopped). After letting it run for three days, I finally gave up. I am using a combination with SilverKeeper (free SW with a drive purchase) and doing simple disk copies of my user-folder over to a second backup disk. It’s a little “clunky” but at least key files are being backed up. My plea(s) to Apple – unanswered. I like your comment about designing SW so that if it is going to die, it does it in real time.
Excellent article. Thanks!
I was wondering if you have any suggestions for a Time Machine problem that I’ve been having and which Apple has been unable to help.
I have my internal hard drive divided into 2 partitions. The main partition is Macintosh HD. The 2nd partition is Macintosh HD 2. When I first set-up Time Machine, both partitions were being backed up. For the past few months, only my main partition, Macintosh HD is being backed up. My Macintosh HD 2 partition is listed in the Time Machine Finder Sidebar window, but when I open it, no files or folders appear. Yet, the Time Machine Log states that it Copied 2207 files (2.0 GB) from volume Macintosh HD and 2208 files (2.0 GB) from volume Macintosh HD 2.
I have Time Machine set up on 3 machines backing up to a Western Digital NAS.
Most of the time it works pretty well, but I have had issues with corruption occurring as a result of network problems.
First I had a Gigabit network originally with a Netgear switch. The switch went senile and started causing badly framed packets unless I ran it at 100Mbit. This caused repeated Time Machine failures eventually resulting in Time Machine refusing to backup without “resetting” the backup (wiping out all the history and starting fresh).
Having replaced the switch, I have had a similar problem if I let Time Machine backup my MacBook Pro via WiFi. It is fine over a cable. I have a copy of the corrupted backup sitting on a spare disk, until I can find some way of re-integrating the history.
The sparsebundle is not actually a file, but a directory containing “bundles” (directories of files). As with all bundles (e.g. .app), Mac OS shows the directory as a file.
I am given to understand that this bundled disk image mechanism is more reliable than a sparse image file, and copes with file systems that do not support large files.
Since this post I had the kind of catastrophic failure you had where Time Machine started afresh deleting all backups! For a short while, before Time Machine deleted it, the old bundle appeared in its raw glory, just as you describe it, if I recall in 4.8Gb lumps. The fact that a ‘backup’ can fail so dramatically betrays a level of incompetence that is unfathomable. I believe that, as in your case, the origins of Time Machine’s faults lie in its lack of resilience to network failures, but given Apple sell the Time Capsule as a network backup drive, this is doubly damnable!
Thanks Alan, this was very helpful and helped me understand why I too was having issues with time machine. (I’ve got one setup on a Buffalo NAS.)
I have another (time machine) situation that you might be able to help me with.
Essentially I lost my entire time machine on the NAS, not sure after which software update on the MBP or NAS was the culprit. So I simply started from scratch, i.e. a new time machine.
It was only a few days later after the mist had settled that I realised that I also backup the NAS and that I should thus have a copy of the original time machine which I should have simply restored :-/
Now that I understand the inner workings of time machine a little better I’m starting to think that I can still recover from my momentary lack of reason…
Surely if I make a copy of the current time machine, then restore the other one from the backup and disable time machine backups. Then I should be able to copy all the (dated) directories from the new time machine into the old time machine.
When I switch on the time machine again it should merely continue with its backups, but I would have gained the possibility to recover anything that would have only existed in the old time machine, right?
I suspect it is not that simple as I would need to isolate the initial full backup in the second newer time machine in order for this to work…
Have you played around with this yet?
Your article implies that sparsebundle files are created by newer versions of Time Machine. My understanding is that whether or not TM creates sparsebundle files depends on how you back up. Time Machine only uses the sparsebundle format when you back up over a network.
thanks John I’d not realised this as I usually backup over network to Timecapsule.
I assume the sparse bundle is also used when the file system is not Apple’s own as the non-standard structure would cause problems with other OS.
Originally this didn’t use sparse bundle, but the modified Unix file system with DAG rather than strict hierarchy of folders even over network.
When it changed I was not happy as the sparse bundle just felt it was likely to be a more fragile alternative … and as I later found out (see comment 9 above), it is!
Recently, when I have wanted to retrieve a file from time machine I have often just opened the sparse bundle directly and found the file (to read not modify!!), as Time Machine interface itself is too slow to be usable. … and on my newest machine (Air), I have never managed to get Time Machine to initiate its first backup successfully, so it is all becoming moot. Instead all my most recent files are in Dropbox and replicated over multiple machines … although one day Dropbox will have a bug :-/
You are brilliant! I tried everything to get my Time Machine “unstuck” and this worked perfectly. Thank you!
Thanks for the hints, deleting the in progress and the last backup… did the trick.
Alan, thank you for posting this. Time Machine was stuck for hours but after a restart and deleting the .inProgress file and the most recent backup, everything is back to normal.
I tried to back-up my husband’s mac book for the first time and it stayed on “preparing to back-up” for a long time and the rainbow wheel was on and locked it up. I thought I could back it up on a second computer because that’s what I was told at the mac store the other day, but I’m worried it’s not going to work. Please help.
The solution I mailed about doesn’t help for the first backup, which seems interminably slow. However, as for you, interminably can mean literally never terminates – indeed my current macbook air, which was cloned from my old machine using a timemachine backup (it is useful for some things!), has *never* managed to backup itself using time machine!
The good news is if you kill the failing timemachine backup, it has never (for me!) caused any problems.
The bottom line anyway is that Time Machine is NOT a reliable backup. I use it (when it works) to retrieve the odd individual file that I accidentally deleted or updated badly, but I do not trust it with the security of my data. As I mention somewhere in the comment threads, I once had TimeMachine give up entirely and delete its own complete history 🙁
There are a number of backup solutions, but have never had time to work out what is best. Now-a-days I use a bit of a hotch potch. All my older files are simply duplicated over two machines and I have a copy on an external hard drive. Everything new is in a (BIG) dropbox folder, which (so far!) seems very reliable even though it is doing everything over the internet and living on an island this is sometimes a little flaky! I am considering trying out SugarSync, another cloud solution.
I also periodically (but not really often enough) just copy everything onto a secondary hard disk (including system preferences). Big disks are now cheap compared with the costs of the computer, and get cheaper all the time.
Finally (and I don’t do this, but keep meaning to!) – do make sure you have a copy of all the licence keys etc. for your software (maybe printed on paper!) – if total disaster happens, and you need to recover things, you may need these.
Having trouble with the time on my computer: every time I take the battery out of the back of my laptop and switch it back on, it goes back to 2006, and I have done a boot.
this is a very different kind of problem, so no deep advice, but … and I know this is a bit like “have you plugged it in?” … have you tried just resetting the time in date and time preferences (Apple>Preferences or drop down from time display in the top bar)
I am ready to throw my wife’s iMAC in the garbage, it’s so damn frustrating trying to find an answer for the problem, I have almost given up. For some reason the “time machine” starts and locks everything up, I have gone to preferences and moved the slider to “off” but the damn thing persists Can’t seem to trash it, if it were a pc, I could remove it, but MAC doesn’t allow me that luxury. Anyone know how to kill this annoying feature…she doesn’t need a backup.
Thanks for the help; but when I try to delete the in progress file I get a message:
The operation can’t be completed because some items had to be skipped. For each item, choose File > Get Info, make sure “Locked” is deselected, and then check the Sharing & Permissions section. When you are sure the items are unlocked and not designated as Read Only or No Access, try again.
I’ve checked, and “locked” is deselected; and I can’t change the “sharing and permissions” settings for “staff” and “everyone” which are set to “read only”. Any suggestions..?
Thanks for the really good, clear and helpful advice here. You might get a job offer to help the Apple team in their fruit department.
I’ve used Time Machine on all my macs over the last 10 years but when I really needed it a few days ago it failed me miserably.
I bought a new 3TB Sata3 drive for my iMac and (after making sure Time Machine was up to date), following instructions from Youtube, installed it correctly.
Next, I booted off the OSX disc and did a Restore from Time Machine to transfer the original 500GB drive’s contents to the new Drive.
When it was finished, all looked normal, until I tried to open Mail – it insisted on importing about 45,000 emails, which crashed at 26,000 ish. When I tried to open Word I had to re-validate my license with the install DVD code. Next, to my horror, my iPhoto library of about 12,000 photos was missing all this year’s photos.
I’ve tried searching through TM’s backup folder but they just aren’t there.
That’s it for Time Machine. I don’t trust it any more.
I still have the original drive, outside the iMac now, so now I want to clone it exactly to the new internal sata drive inside the iMac, replacing all the corrupt data on it. Is there any way I can do this without having to physically swap the drives again?
I have found that this “Time Machine hangup” happens when the backup disk is almost exactly 1/2 full. My solution is to retire the current backup disk and start with a newer (bigger) backup disk. Label the old one “before xx/xx/xxxx” and label the new one “after xx/xx/xxxx”. This always works!