Do you sometimes feel uneasy about the way you handle and manage your valuable music (and video and image) collection?
Do you really know whether all your audio files are still intact, after all those years of moving them here and there?
Many of us have spent hundreds of hours shaping our collections: ripping, ripping again,
getting the tags right, finding the right cover art, and so on. (I know there are people out there who select their tracks by filenames such as 1.wav. They can stop reading at this point ;) )
Once you're finished with your collection (if that ever happens - a collection is a living animal), you do a backup.
Most people simply copy and paste the files from one disk to another using Windows Explorer.
Over time that'll be done many times. You keep copying your data back and forth.
You trust the OS (operating system) to take care of consistency and not to mess with the data.
Others run RAID systems and feel safe doing so. (It hardly gets worse than that: RAID protects against a failed disk, not against corrupted files.)
The more sophisticated users use backup tools and/or secure copy tools such as TeraCopy.
Most users believe that this is the end of the story.
The fact is that none of the above gives you a watertight way of protecting your audio treasure.
Time is your enemy.
(As well as laziness and ignorance.)
You should always ask yourself "Are the files still OK?"
And you'd better do something about it.
Ignoring that question might lead to loss of more data over time than you might expect.
I used to think along these lines -
"My OS backup tools take care of integrity checks etc. It'll work out. Keep your fingers crossed."
I know NOW that this attitude is more than inappropriate.
You really need to actively verify your assumption/belief that everything is going to be fine.
Last weekend I ran an analysis and found that 30 files had become corrupted. Out of a couple of thousand tracks that's not much. Still, we're talking about 30 affected CDs.
It really surprised me. All along I thought I was doing things right.
Obviously the way I handled my collection must have had some weak points.
I really underestimated the situation.
30 files. And guess what: no two of them belonged to the same album.
Obviously I had to rip those CDs again. That's possible if you still own the CDs.
What if you downloaded your tracks??
Bottom line: I concluded that I need to improve my data handling process.
The goal is to make 100% (or close to it) sure that no data loss can occur.
The whole process must become an automated process.
Because laziness and ignorance creep in sooner or later. I can't avoid it.
And I'm sure other people out there are not any different.
Perhaps those of you who have read this far are asking yourselves
"How did he realize that data got corrupted???"
Data can get corrupted on many occasions. The typical copy/paste and backup routines try to avoid it. They are supposed to control overall integrity and usually issue error or deviation messages if a problem is detected.
It's well known that plain copy/paste is not a safe mechanism. No professional administrator would
use such a method. Backups made with better backup tools are quite safe. Still, over time
it can happen that you overwrite clean backups with corrupted ones.
(Error) messages need to be looked at. I'm pretty sure that 98% of all people ignore the messages issued by backup, copy or converter tools with - "I'll look at it later - no time now". Time and ignorance are our enemies.
And this way more and more corruption finds its way into your collection over time.
Don't forget: corruption of data can also occur very silently, with aging or worn-out hard disks, or with odd circumstances such as a power outage at the wrong moment, and so forth.
Look at e.g. Amazon and elsewhere. People report HDDs dying after 6 months. I wouldn't trust an HDD that's older than 2-3 years.
What to do about it?
It's pretty simple - once you are aware of the problem. And now you are aware of it.
We just need to add some minor steps to the backup process - assuming you've got that one in place.
Besides running your standard backups with rather sophisticated backup tools - and please, once more, a simple copy/paste won't cut it - you'd need, on top of your backup process,
1. to run an initial verification and cleanup project first
Make sure that the existing master and backup media are OK.
2. to continuously check the integrity on a per-file basis on your master HDD
and on your backup media - before and after a backup.
3. to make a second backup - including integrity check
To accomplish the above we need an audio data format which comes with
an embedded integrity mechanism - a checksum.
flac would be the preferred option.
.wav files, for example, are a no-go: they carry no embedded checksum.
Let's go on with flacs.
Flacs come with an embedded MD5 checksum. That checksum is generated over the audio data only. Which is good. A little problem or a change in the tag area is not critical and won't have an impact on the checksum mechanism. The checksum is written into the flac file.
It is calculated and written again whenever you re-encode or transcode the flac data.
If the flac decoder detects a mismatch between the MD5 checksum and the decoded audio data, it will issue a corruption message and the decode fails!
That corruption message is KEY. You need to look for it and you have to fix the affected track on either of your storage devices.
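One caveat: the MD5 check can only work if a checksum is actually stored in the file. Some encoders leave the field empty, and it then reads as 32 zeros. The following is a minimal sketch, assuming the metaflac tool from the flac package is installed, that lists such files so they can be re-encoded with the flac encoder. The names METAFLAC_BIN and list_unset_md5 are made up for this example.

```shell
#!/bin/sh
# Sketch only: list FLAC files whose STREAMINFO block carries no MD5 signature.
# METAFLAC_BIN and list_unset_md5 are illustrative names, not part of flac.
METAFLAC_BIN="${METAFLAC_BIN:-metaflac}"
UNSET_MD5=00000000000000000000000000000000   # 32 zeros = no signature stored

# list_unset_md5 DIR: prints one path per line.
list_unset_md5() {
    find "$1" -type f -iname '*.flac' -exec sh -c '
        bin=$1; unset_md5=$2; shift 2
        for f in "$@"; do
            # --show-md5sum prints the signature stored in the file header
            md5=$("$bin" --show-md5sum "$f" 2>/dev/null)
            [ "$md5" = "$unset_md5" ] && printf "%s\n" "$f"
        done
        exit 0
    ' sh "$METAFLAC_BIN" "$UNSET_MD5" {} +
}

# Example: list_unset_md5 /media/music
```

An empty result means every file carries a checksum that the test-decoding can compare against.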
How do we build this into our "backup" AND, not to forget, our "restore" procedure?
(We don't want to restore a corrupted collection!)
The flac(.exe) executable (you'll find it for Windows and Linux) can identify that checksum mismatch and issues error messages.
What you do is run a test-decoding ( flac -t <file> ) on all files on your master HDD and backup media before and after any of your backups. (That check can easily take an hour on e.g. 4,500-5,000 tracks.)
During the test-decoding, flac(.exe) will issue an error message if there's a problem with the checksum. You can write a batch file under Windows or a script ( see Appendix II) under Linux.
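The before-and-after check can be wrapped around the copy step itself. Below is a rough sketch under a few assumptions: flac and rsync are installed, and rsync -a --checksum is just one possible copy command. The names FLAC_BIN, COPY_CMD, check_flacs and guarded_backup are invented for this example. The point is the order of the steps: a master that fails the check is never copied over a clean backup.

```shell
#!/bin/sh
# Sketch only: verify the master, copy, then verify the backup.
# FLAC_BIN, COPY_CMD, check_flacs and guarded_backup are illustrative names.
FLAC_BIN="${FLAC_BIN:-flac}"
COPY_CMD="${COPY_CMD:-rsync -a --checksum}"

# check_flacs DIR: succeeds only if every *.flac below DIR passes "flac -t".
check_flacs() {
    bad=$(find "$1" -type f -iname '*.flac' -exec sh -c '
        bin=$1; shift
        for f in "$@"; do
            "$bin" -s -t "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh "$FLAC_BIN" {} +)
    [ -z "$bad" ] && return 0
    printf '%s\n' "$bad" | sed 's/^/corrupted: /' >&2
    return 1
}

# guarded_backup SRC DST: returns 1, 2 or 3 depending on which step failed.
guarded_backup() {
    src=$1
    dst=$2
    # Step 1: never copy a damaged master over a clean backup.
    check_flacs "$src" || { echo "master failed the check, nothing copied" >&2; return 1; }
    # Step 2: the copy itself (COPY_CMD is split into words on purpose).
    $COPY_CMD "$src/" "$dst/" || return 2
    # Step 3: make sure the files on the backup medium still decode.
    check_flacs "$dst" || { echo "backup failed the check" >&2; return 3; }
}

# Example: guarded_backup /media/music /media/backup/music
```

For a restore you would swap the two directories, so the backup is checked before anything is copied back.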
There's also another tool you might use:
The dBpoweramp Reference converter tool offers a function called "Test Conversion". It basically does what the flac binary does when run with the -t option. You can also use that one. The dBpoweramp Batch Converter tool lets you check your entire disk at once. That's most convenient. dBpoweramp also offers a feature called "Destination File on Error" that automatically moves the corrupted data to a predefined target directory.
That's basically all we need to do.
Yep. That's about it.
A pretty simple measure - though very powerful. I'm sure my collection is kept much safer now.
My guess is that 99.9% of all collections out there have a problem. You might consider doing something about it.
If you follow my advice, I'd be really interested to see you report back your corruption rate (and associated backup strategy). ;)
1. Don't run simple copy/paste backups, use backup tools - and look for the recommended settings of these tools!!!
Usually you'll find numerous parameters which won't mean much to you in the beginning. Later on you
usually realize why these parameters were introduced. Spend a little time on the subject.
2. Introduce incremental backups, which keep the original data as long as possible and store the "delta" data
at a different place on the same disk.
This way you save quite some space. You can have several backup cycles on one disk.
3. Use at least 2 backup disks - stored at different locations.
4. Buy disks which are used in the professional area. Cheap consumer stuff is not recommended.
5. You don't need the fastest disks (they wear out faster). You need the most reliable ones.
6. Don't use old disks (which e.g. were just replaced by your brand-new SSD) as backup media.
7. Check the data integrity separately - see above.
8. Automate the process as much as possible and keep the logfiles.
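Points 2 and 8 can be combined in a small script. The following is only a sketch of one way to do it, assuming rsync is available: each run writes a dated snapshot directory, and files that have not changed become hard links to the previous snapshot, so several backup cycles fit on one disk. The function name snapshot_backup and the "latest" symlink are conventions invented for this example.

```shell
#!/bin/sh
# Sketch only: dated snapshots with rsync. Unchanged files are hard links into
# the previous snapshot, so only new or changed files take up extra space.
# snapshot_backup and the "latest" symlink are illustrative conventions.

# snapshot_backup SRC DEST_ROOT [STAMP]
# DEST_ROOT should be an absolute path, because rsync resolves a relative
# --link-dest against the destination directory.
snapshot_backup() {
    src=$1
    dest_root=$2
    stamp=${3:-$(date +%Y-%m-%d_%H%M%S)}
    mkdir -p "$dest_root"
    if [ -d "$dest_root/latest" ]; then
        rsync -a --link-dest="$dest_root/latest" "$src/" "$dest_root/$stamp/"
    else
        # First run: nothing to link against yet.
        rsync -a "$src/" "$dest_root/$stamp/"
    fi || return 1
    # Repoint "latest" at the snapshot that was just written.
    rm -f "$dest_root/latest"
    ln -s "$stamp" "$dest_root/latest"
}

# Example: snapshot_backup /media/music /media/backup/music-snapshots \
#              >> /var/tmp/music-backup.log 2>&1
```

Old snapshots can be deleted individually without touching the newer ones. Redirecting the output to a file, as in the example line, keeps the logfile that point 8 asks for.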
Full Backup ( all partitions and bootsector)
Windows 7 - Backup and Restore
Acronis - The Free WD Version
dd (Linux - commandline)
Windows 7 - Backup and Restore
rsync (Linux and Windows - commandline - also remote via network! (ssh))
Note: The W7 tools are by now able to compete with commercial software.
IMO there's no need to go for Norton or Acronis.
Secure Copy ( with CRC check and error logging on failed transfers)
rsync (Linux and Windows - commandline)
Advice: It's always recommended to use a defragmented disk.
Copying data back and forth all the time, or any other jobs you run on your data, such as adding RG tags or similar, gets you a lot of fragmentation on the disk. That fragmentation is gone if you back up your disk to another disk. If you can easily swap your master disk with the backup disk, use your backup disk as the master.
This way you always run a rather defragmented disk, without running an annoying defragmentation process -
which puts a lot of pressure on your HDD.
Since I'm at home in the Linux world, I've written a simple one-liner that runs the flac integrity check over your entire hard disk(s):
Open a terminal first.
Copy/paste the command below as a single line:
find / -type f -iname "*.flac" -print0 | while IFS= read -r -d '' j ; do flac -s -t "$j" 2>>/tmp/flac-integrity.log ; done
That'll take some time (hours) at 1 to 2 s per file.
You can replace "find /" with e.g. "find /media/music" to limit the scan to a specific music directory.
You'll find the scan result in /tmp/flac-integrity.log
To test the command, you can copy a CD to /tmp first and then replace "find /" with e.g. "find /tmp".
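You can run:
grep -ci "error" /tmp/flac-integrity.log
That'll tell you whether any problems were found, and roughly how many: it counts the log lines that contain "error" in upper or lower case, and a damaged file can produce more than one line.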
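If you'd rather get a plain list of the damaged files than a count, here is a variation on the one-liner above. It is a sketch that relies on flac returning a non-zero exit status when the test-decoding fails, instead of searching the log. The names FLAC_BIN and verify_tree are made up for this example.

```shell
#!/bin/sh
# Sketch only: test-decode every FLAC file below a directory and list the failures.
# FLAC_BIN and verify_tree are illustrative names, not part of flac itself.
FLAC_BIN="${FLAC_BIN:-flac}"

# verify_tree DIR FAILED_LIST
# Writes the path of every file that fails "flac -t" to FAILED_LIST
# and returns non-zero if at least one file failed.
verify_tree() {
    dir=$1
    failed_list=$2
    : > "$failed_list"
    find "$dir" -type f -iname '*.flac' -exec sh -c '
        bin=$1; shift
        for f in "$@"; do
            # -s: silent, -t: test (decode without writing any output file)
            "$bin" -s -t "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh "$FLAC_BIN" {} + >> "$failed_list"
    # An empty list means every file decoded cleanly.
    [ ! -s "$failed_list" ]
}

# Example: verify_tree /media/music /tmp/flac-failed.txt || cat /tmp/flac-failed.txt
```

The resulting list is exactly the set of tracks you need to re-rip or restore from a clean backup.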