The silent death...

(Latest Update: Jun/2020)

Does it make you feel uneasy (once in a while) not being in control of something?
Have you ever experienced that such an uneasy feeling grows on you over time?

OR... 

Are you the type of person who takes it easy - "What I don't know, I don't care to know..."?


Whatever type of person you consider yourself to be, I'd suggest you spend 10 minutes to find out what the heck this guy is talking about.




A music data collection - and that's what this is all about - can be considered quite a treasure.

It's not about the terabytes or the number of tracks, and it's not necessarily all the money that went into such a collection.
 
To me it's all the passion, all the hours and work and enthusiasm to build and shape your very own and personal audio collection that makes it a treasure, your very own treasure. 

Therefore protecting such a treasure should be of highest priority to all of us, all of us who actually care about it. 

To do that properly - protecting that treasure - you need to get on top of things. To get on top of things requires ambition, knowledge and actions  - yep - I'm talking about work, it's your job to get it done!

Why am I bringing all this up? 

One day I realized that I had lost valuable audio files and that several files had got corrupted. And I also realized that I had been quietly accumulating these losses and corruptions for quite some time without ever noticing.

It was time to get my act together.


What's been happening?

Many of us spend hundreds of hours to shape and maintain our audio collections.
We rip it. We rip it a 2nd time. We get the tags right. We find the right cover art.

We re-encode or transcode the data. Some of us add replay-gain tags. We change filenames and directory structures. We copy the data from a to b to c. We do this or that.
Yes. "We" includes me.

And yes. Many of the folks out there run backups, well aware that hundreds of hours shouldn't be wasted on a catastrophic hardware or software failure or, much more likely, a simple user error.

Over years and even decades you live side by side with your collection. During all these years you trust your hardware, OS (operating system) and integrated tools to take care of the integrity of your data. You basically assume that nobody messes with your valuable data.

And here it is. The reason everybody should be feeling uneasy about all this!

You trust, You assume...

...it'll be all right.


I'm now telling you -- it's not gonna be all right!


Because by coincidence I realized that there's a tool and a feature that can prove your assumption right or wrong.  

Over the last weekend I ran a time-consuming data analysis. Surprise, surprise!

I figured out that 30 audio files had got corrupted. 30 files. And guess what. No two of the corrupted files belonged to the same album. That means about 30 albums were incomplete or, rather, corrupted. I hear you. Who cares about 30 files out of thousands? I do. I tried to outline the why earlier.

Anyhow. That result really surprised me. I thought I'd been doing things right.
Obviously the way I've been handling my collection must have had some weak spots. I simply underestimated the situation.

I of course re-ripped these incomplete CDs. Lucky me. I still own many of the CDs.
And it was "just" 3 hours of work. What if you gave away your CDs, bought your tracks online or simply inherited them??? What if you lost 100s of tracks?

This experience kept me thinking, and I concluded: I need to improve my data management process. I need to understand what can go wrong and what actually went wrong.

I issued a new goal: 

Make 100% (or, more realistically, close to 100%) sure that data loss or data corruption gets detected. You won't be able to avoid it occurring at some point. But at least you should know about it.
The entire process must become a pretty much fully automated and continuously running process. Laziness and ignorance will creep in sooner or later. It simply can't be avoided. That automated process needs to raise a flag as soon as something goes wrong or an error gets detected. All of that should be put in place in a structured way asap.


How did/does it all happen?


Perhaps those of you still reading this article ask 

"How the heck did he realize that data got corrupted???"

The vast majority of my audio collection is flac. I also own some mp3s and a bit of DSD material.

flac, and flac only (at least among the formats I'm aware of), offers a built-in integrity check feature as part of the codec. You already get the idea where this is going...

What this integrity check does is compare two MD5 checksums: a reference checksum and a freshly calculated one. The reference checksum gets generated and stored inside the flac header (the STREAMINFO block) during the encoding process. That checksum covers the audio data chunk of the flac file only, not the header with its tags or images that are also stored inside the flac file.
To get the fresh checksum, the flac gets decoded and a new MD5 gets generated from the decoded audio data.
If the reference MD5 and the freshly generated MD5 do not match, the file is corrupt and an error gets reported. Such a track is
basically useless! A single flipped bit in the audio data is enough to trigger the mismatch.
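
To see this mechanism for yourself on a single file, here's a minimal sketch on the Linux command line. The path is just a placeholder; metaflac ships with the flac package:

# show the reference MD5 stored in the flac header (STREAMINFO) at encoding time
metaflac --show-md5sum "/media/music/Artist/Album/01 - Track.flac"

# run the test decoding: decode the audio and compare it against the stored MD5
flac -t "/media/music/Artist/Album/01 - Track.flac" && echo "OK" || echo "CORRUPT!"

If the second command reports an error, that track is one of the candidates you'll be hunting for later on.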

And that's how I found out. I ran a bulk integrity check over my entire collection, which can take hours to complete.

And you might guess. I wasn't happy to see my error logs not being empty.

While thinking about it, it occurred to me that the integrity check is actually not the end of the story.

What I had just done was identify the corrupted files only! What I didn't know at that stage was whether any files were actually missing - completely gone!?!? Hmmh. Let's keep this in mind.

At that point I had no way to tell when and how the corruption happened. I didn't have any logs at hand.
And all my backups had the same issues. At some point in time I had overwritten my backups, even my long-term backups, with corrupted files.

And I didn't know if any track was missing, because I didn't know how many tracks I had actually been maintaining. On one occasion I accidentally stumbled over an album starting at track 2. Here we go. Track 1 was missing!

That made me realize that I had to do a bit of brainstorming and research into at what points these kinds of things can theoretically happen.

Data can get lost or corrupted on many occasions if you think about it: 

  • The typical copy/paste and backup routines just "try", or rather "pretend", to cope with it. These routines are supposed to control overall integrity and usually issue error or deviation messages if a problem occurs. Folks, copy/paste is not a safe mechanism!!! No professional administrator I know would use this method. (A sketch of a verified copy follows right after this list.)
  • Harddisk and SSD failures. Data corruption can occur very silently on aging/worn-out harddisks. Look at e.g. Amazon reviews or elsewhere. People report dying HDDs or SSDs after periods as short as 6 months. That's not the norm of course. I personally wouldn't trust an HDD that's older than 2-3 years.
  • Weird operating conditions, such as a power outage at the wrong moment and so forth, can cause issues.
  • Software flaws (no SW is flawless) - the OS and apps (e.g. bulk conversion tools) can cause data loss or corruption.
  • System overload conditions - during overload conditions weird things can happen.
  • Simple user faults - pushing the wrong button.
  • Cyberattacks/Malware - might be causing any kind of weird stuff.
  • and I am pretty sure there's more...
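
Coming back to the first bullet: a hedged sketch of what a "verified copy" could look like instead of plain copy/paste, using rsync under Linux (all paths are made-up examples):

# --checksum compares file contents instead of just size and timestamp,
# --itemize-changes plus --log-file document exactly what got touched
rsync -a --checksum --itemize-changes --log-file=/tmp/music-copy.log  /media/music/  /media/backup/music/

# afterwards the log is your evidence - check it for trouble
grep -i "error" /tmp/music-copy.log

The point is not that rsync is the one and only answer - just that tools with built-in verification and logging exist.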

With all this in mind it should be more than obvious that our valuable data can get corrupted or lost at one point in time.


A closer look


The main challenge with all this is that many of these corruptions and losses occur under the radar. You simply don't see it happen. "The silent death".

Copy/paste doesn't generate logs. How would you know that anything happened during  - sometimes - hours of copying?

With tools that provide log options you could see what's happening. However. I am pretty sure the vast majority of folks out there simply don't ever look at logfiles.  

Most HDDs wear down slowly... 

And all this is exactly why more and more corruption and data loss creeps into our collections over time - and remains undiscovered.

It's not enough to be aware of the challenges though.


What to do about it?


Once you are aware of the problem it's actually quite simple. 
And lucky you - now you are aware of it! ;) Let's do something about it.

We need to establish a backup process that includes a data consistency check. And that process has to be doable and reliable. And it has to be automated. I'm almost 100% sure that if you don't automate the process it'll fail over time.

Before we can establish such a thing we need to get the current data base under control.
  1. Analyse your current situation.
  2. Review your choice of file formats. flac is the only format that I am aware of that
    offers the integrity check function.
    And NO, there's no "audiophile" reason to stay with .wav!
    You should convert your .wav files to .flac with a bulk converter.
  3. Run a bulk integrity check. Several audio tools, or the flac binary itself, offer this option.
    Get yourself a clean base first.
  4. By simply counting (with a tool of course) all your audio files, you'll generate
    your 100% file-count reference. Next time you compare that reference with the as-is value and you'll know if a file has simply vanished. (See the first sketch after this list.)
  5. What remains is to find out which file actually vanished. You could check
    all your track numbers (tag) on a per-album basis. But that requires that you have a consistent handling of track numbers in place - a challenge of its own.
    You could also save all your filenames to a reference file and compare that to the
    as-is situation. (The same sketch covers this as well.)
     
  6. Review your storage media. E.g. run filesystem checks or check the drive's hours of operation; if you hear strange noises - act immediately. If needed, replace your HDDs. 2TB HDDs go for around 70$/€ nowadays. (See the second sketch after this list.)
    Don't ever use aged and/or phased-out drives as backup media!

    What I do. After 3 years of operation max, I buy myself a new master drive.
  7. Make sure that your existing backups are OK.
    Many backup tools offer test-run options, where the backup gets simulated.
    That's a pretty handy function. You can run test runs in both directions for test
    purposes. And afterwards you can analyze the logs.


    Note: 
    A file with an equal timestamp and an equal size doesn't have to be identical!

    A byte can flip and your backup tool (usually) won't recognize it. You can address this though by running checksum tests over the entire file. Many backup tools even offer that option. The catch: a checksum-based backup (test) is very time-consuming.
    Most people therefore don't use it.
  8. If you start with a data base review and cleanup exercise - make sure you have one extra healthy backup.
  9. If you own just one backup disk, you'd better introduce a 2nd (long-term) backup medium.
  10. You need to introduce logs. Each backup and consistency check needs a log file. Make sure it gets generated. Store these logs on the backup media. You then always know where to find them! And of course you'll be able to analyse the situation when things started going south. 
  11. And then you need to analyze the logs!!!
    Look for errors or changes that don't make sense.
  12. Introduce incremental backups! You might know the expression "restore points" from
    e.g. Windows.
    Here it means the initial backup remains untouched. Only the deltas that come with each new backup get stored in the following backup directories.
    Incremental backups are a very important safety net against overwriting your clean backups.
    They let you roll back in time for quite a period.
    They require a bit more space on the backup medium though. This works well with
    minor collection updates here and there.
    If you run major data changes, e.g. bulk conversions, you'd run out of disk space with incremental backups.
    At that point you have to re-establish a new data base and the related backups! (Annex 1 sketches one way to wire this up.)
  13. Trigger the incremental backups by using timer daemons, such as Windows Task Scheduler or Linux cron.
  14. This automated backup or consistency check needs to generate a notification if an error is detected!
  15. More ?? Perhaps.
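
As announced in items 4 and 5, here's a hedged sketch of the file-count and filename references under Linux (/media/music and the reference file locations are just examples):

# item 4: the 100% file-count reference
find /media/music -iname "*.flac" | wc -l > /tmp/flac-count.ref

# item 5: a sorted list of all filenames as a second reference
find /media/music -iname "*.flac" | sort > /tmp/flac-list.ref

# next time: rebuild the list and compare - any vanished (or new) file shows up right here
find /media/music -iname "*.flac" | sort > /tmp/flac-list.now
diff /tmp/flac-list.ref /tmp/flac-list.now || echo "WARNING: the file list has changed!"

In real life you'd of course store these reference files on your backup media, next to the logs (see item 10).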
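
And for item 6, a hedged example of how to peek at a drive's operating hours and health under Linux. It assumes the smartmontools package is installed and that /dev/sda (a placeholder) is your music drive:

# overall SMART health verdict
sudo smartctl -H /dev/sda

# operating hours and the classic early-warning attributes
sudo smartctl -A /dev/sda | grep -Ei "power_on_hours|reallocated|pending"

If the reallocated or pending sector counts start climbing, that's your cue to replace the drive.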

That's pretty much your ToDo list. That's a lot. Yep. That's pretty much what every sysadmin out there calls "my job".


Of course. There's a shortcut. Sign Up for Spotify or any other streaming service. ;)

The file format - Flac



As mentioned several times by now, to accomplish the above audio-file-related safety net
we need to go for an audio data format which comes with an embedded integrity mechanism - a checksum.

Therefore flac is without doubt the preferred option. flac is lossless btw. It can basically
be converted into any other format if at some point in time you decide to change formats.

As already mentioned, flacs come with an embedded MD5 checksum. That checksum is generated over the audio data chunk only. The header of the flac, with its tags and so forth, is therefore not protected! And that means that if the header hypothetically got corrupted,
you could lose the info about the track, or even the checksum itself could become corrupted.

You realize. There's simply no 100% safety. But the probability that something slips through unnoticed gets very low if you run a solid process.

Keep also in mind: the flac reference checksum gets renewed as soon as you re-encode or transcode the flac file. That'd result in a completely new data base, which would require you to rerun the consistency checks and to create new reference logs and backups.
Changing your base data is always a high-risk task. And it will usually force you to let go of your old database at some point in time! I've done it once or twice. Always with a bad feeling in the stomach.

Now. If the flac codec detects a checksum mismatch while decoding, it will issue a corruption error!!!
That corruption message is KEY. You need to look for it, and you have to fix the affected track on every one of your storage devices.

The way the integrity test is done is simply to run a so-called test decoding of the flac file.
You do that on a per-track basis.

flac --test  <filename>.flac 

(see Annex 2 which explains how to run a bulk check incl. Log under Linux)
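
If you just want to check a single album folder instead of the whole collection, a small hedged example (the directory is a placeholder; flac returns a non-zero exit code if any of the tested files fails):

cd "/media/music/Artist/Album" && flac -s -t ./*.flac && echo "album OK" || echo "at least one corrupted track in this album!"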

For those of you who prefer applications with user interface:

E.g. dbPoweramp offers a function called "Test Conversion". It basically does what the flac binary would be doing. dbPoweramp offers a batch converter tool that allows you to check your entire collection at once. dbPoweramp also offers a feature called "Move Destination File on Error" that moves corrupted files automatically to a predefined target directory.

Another tool would be Foobar2000 with its "File Integrity Verifier" plugin. It basically  does the same job as dbpa.



Before running your new data management process over your entire database, I'd suggest starting with a few test files to set up and test the process thoroughly. Make sure you've got your backups done before starting the journey.



Wrap UP


That'd be basically it. I am well aware that I am not providing a solution that covers it all.
I am simply giving directions.

I hope that with this article I could make you all aware of a potential issue you're facing without being aware of it.

I hope I could give you some inspiration to keep your treasure a bit safer from now on.

To be honest. Without setting up and running an automated process, even I, as an IT professional, have the problem that laziness creeps in over time. ;) Make sure
you get that sorted out.

Setting all this up is a lot of work and gets annoying at times. But once you have it all working, you'll have learned a lot and you'll never look back.

Good luck on the journey.

Enjoy.

#########################################################################
#########################################################################


Annex 1:


Short Summary - Backup:

  1. Go for the flac format. 
  2. Don't run simple copy/paste backups. Use backup tools - and look up the recommended settings of these tools!!!
    Usually you'll find numerous parameters which don't mean much to you in the beginning. Later on you realize why these parameters were introduced. Spend a little time on the subject.
  3. Introduce incremental backups, which keep the original data as long as possible and store the "delta" data in a different place on the same disk.
    This way you save quite some space. You can have several backup cycles on one disk.
  4. Use at least 2 backup disks - stored at different locations.
  5. Buy disks which are used in the professional area. Rather cheap consumer stuff is not recommended.
  6. You don't need the fastest disks (they wear down faster). You need the most reliable ones.
  7. Don't use old disks (e.g. the one you just replaced with your brand-new SSD) as backup media.
  8. Check the data integrity separately - see above
  9. Automate the process as much as possible and keep the logfiles.

Tools:

Full Backup ( all partitions and bootsector)
Windows 10 Backup - covers incremental backups!
dd (Linux - commandline)

and more

Incremental Backup
rsync (Linux and Windows - commandline - also remote via network! (ssh))

and more

Note: The W10 tools are IMO able to compete with other commercial software.
There's IMO no need to look for other software.
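
For the incremental part (items 12 to 14 of the ToDo list above), here's a hedged sketch of how rsync and cron could be wired up under Linux. All paths, file names and the schedule are made up for illustration:

# music-backup.sh - each run creates a new dated snapshot; unchanged files get
# hard-linked against the previous snapshot, so only the "delta" costs disk space
TODAY=$(date +%F)
rsync -a --link-dest=/media/backup/music-latest --log-file=/media/backup/logs/music-$TODAY.log  /media/music/  /media/backup/music-$TODAY/
ln -sfn /media/backup/music-$TODAY /media/backup/music-latest

# crontab entry (crontab -e): run the script every night at 02:30
# 30 2 * * * /home/me/bin/music-backup.sh

On the very first run the --link-dest target doesn't exist yet; rsync will complain but simply do a full copy, which is exactly what you want as the initial backup.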



Advice:   It's always recommended to use a defragmented harddisk or a trimmed SSD.
Copying data back and forth all the time, or any other jobs you run on your data, such as adding replay-gain tags or alike, gets you a lot of fragmentation on the disk.
That fragmentation is gone once you back up your disk to another disk. If you can swap your master disk with the backup disk easily, use your backup disk as the master.

This way you always run a rather defragmented disk, without running an annoying defragmentation process - which puts a lot of stress on your HDD.
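
On the SSD side, "trimmed" simply means that TRIM runs regularly. A hedged Linux example, assuming a systemd-based distribution and a music SSD mounted at /media/music (a placeholder):

# trigger a manual TRIM on that filesystem
sudo fstrim -v /media/music

# or check whether the periodic trim job is already active
systemctl status fstrim.timer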


###############################################

Annex 2:


Since I'm at home in the Linux world, I've written a simple one-liner that accomplishes the flac integrity check over your entire harddisk(s):

Open a terminal first.

****** copy/paste the command below as one line ******

find / -iname "*.flac" -print0 | while IFS= read -r -d '' j ; do flac -s -t "$j" 2>>/tmp/flac-integrity.log ; done

************************************

That'll take some time (hours) - 1 to 2 s per file.

You can replace "find /" with e.g. "find /media/music" to specify a specific music directory.

You'll find the scan result in /tmp/flac-integrity.log

To test the above, you might copy a CD to /tmp first and then replace "find /" with e.g. "find /tmp".

You can run a:

grep -i "error"  /tmp/flac-integrity.log | wc -l

That'll tell you if any, and how many, problems were found.

6 comments:

  1. Hey Klaus,

    thanks for the heads-up! Just checked my collection (203Gb, 14k files, 70% flac, 30% mp3) with the dBpoweramp test. Only 3 files corrupted, 1 of which I no longer need and 2 that I can re-rip.

    Now I have to buy an external hdd so that I can do the backup^^. At the moment I just put a 2nd disk in my NAS, let it do its mirror job and take the 2nd back out...

    I stumbled upon your blog via diyaudio. Funny thing is I run a Squeezebox Touch as well and just 5 days ago ordered the DDX320v2 :D

    So you will be hearing from me in the near future about your mods, which I am looking forward to doing, and specifically about how you have hooked up your SBT to the DDX.

    Best regards
    Ragnar

  2. Great bit of info there — I wasn't aware that flacs had the md5 sum embedded and were so easy to test. Monthly cron job duly set up. Cheers!

    James

    Hiya, cheers for the above. Somewhat of a beginner at PC stuff so will read through again and give it a bash :-) as I am slowly realizing the source file is rather important!

    I have always mirrored my master every day music hdd to back it up, so all the bits stay the same (or so one thought)

    Would, say, a RAID5 backup play with the bits of a FLAC file, or would it stay the same and just spread out more?

    cheers for any thoughts
    thanks again
    Mark

  4. Hi,
    I got curious about this and tried it for myself with dbamp. Out of 39'000 flac files, found 4 corrupted.
    Re-ripped one of them as flac and tested it ok with dbamp.
    I then compared the wav version of the corrupted file with the wav of re-ripped file (both obtained with flac frontend) and EAC (compare wav) found them to be identical. Does this mean that corruption originated from the flac tags of the data?


  5. The wav should have the same size as a decoded flac.

    To compare two files though, the md5sums of both files have to be compared.

    The files can have the same size but can differ on the content.

    If a flac is broken, there's a mismatch between the embedded, stored md5sum and the freshly generated md5sum of
    the audio content.

  6. Hi Klaus,
    after reading this blog, I am now AWARE of the corruption problem as well, but what can I do if all my musical treasure is ripped (via dbPoweramp) to aif, 'cause I am living in the Apple world?

    By the way, thanx for all that absorbing blog stuff and DETAILED explanations, always fun to read.

    Holger
