CD - RIP

(Latest Update: Oct-2020)

Anybody out there still ripping CDs? Or is it just me? ;)




Recently I noticed that some tracks were missing. No idea when or why this happened. OK. Quick decision: re-rip the affected CDs. Yep, I still keep my CDs in the attic.

Gee whiz... Re-ripping a track "quickly" turned into a major exercise (nightmare).

This article gives some background info and directions related to CD ripping and CD tagging.


Intro

I've used dBPoweramp (dBP) for (mass) audio extraction in the past. It used to be my reference, and I think it's still the reference extraction tool out there. Good old EAC comes close to dBP, of course, and it's still (2020) free.

I haven't been ripping CDs very often in recent years. When I was about to start this project, I realized I was running an outdated dBPoweramp version. Sh.. . It had happened before. Every time I want to use dBP once in a while, I have to pay rather big $ for the next upgrade. Grrr. Nope. I'm not upgrading anymore. I just need a couple of tracks.

I also realized that dBP wouldn't work properly within a virtual environment (VirtualBox) anyhow. I run my W10 installation under VirtualBox, and the CD-drive parameters required for Accurate Rip to work aren't accessible in a virtual environment. Therefore I didn't feel motivated to give EAC a try either: it would face the same VirtualBox problem as dBP. In the end I'd need a full (native) Windows 10 installation to run either of these tools. Nope. Not an option for me.

What a mess! A dead-end.


I kept going.


Let's have a look at Linux - my actual home-turf.

How is the situation in the Linux CD extraction field nowadays? It has never been great; that's why I've been using Windows-based tools for the task, pretty much the only Windows tools I use. I wasn't very optimistic that the Linux CD extraction situation had evolved a lot in recent years. I actually expected the opposite, considering the ongoing extinction of CD media.

Spoiler alert: I wasn't wrong with that prediction...

Over more than two decades there have been several Linux CD extraction projects, some of them quite promising, such as

ABCDE, ripperX, Rubyripper, Morituri, Whipper, Sound Juicer, Asunder, Brasero, K3b, and some more...

After a closer look, which took me another 2 hours, I found that many of these projects are outdated (simply not maintained, thus pretty much dead) or lack certain features! The most advanced of the bunch would probably have been ABCDE. However, ABCDE, like most of the other still usable tools, doesn't offer Accurate Rip support. To me AR support is the key feature. Without it I simply don't know whether my rips are correct.


Again. What a mess!!! I considered the Linux CD extraction application arena another dead-end.


I kept going.

Let's think.

A Perfect Rip - Requirements


What actually are the crucial parameters to get a close to perfect copy of an audio CD?
  • a reliable HQ extraction tool
  • an intact CD - flawed, scratched and dirty CDs just waste your time and energy
  • a reference database - you need to know if your extracted data is correct
  • a widely supported and secure data format - flac and flac only !
  • a reliable tag database - however - you won't ever get around manually editing your tags anyway!
  • a source for album art - Google will be your best friend on that one
That looks manageable. 

I finally decided to set up my own extraction process.
The idea: I'd like to be able to handle and maintain it myself for years to come with minimum effort. That's actually one of the reasons I'm writing this article; it also serves as a guideline for myself.


Let's see how far we get.



CD Extraction process

CD-Drive and Accurate Rip

The key feature for me is Accurate Rip support. Therefore the CD drive of choice needs to be listed in the Accurate Rip drive database. This database lists each drive and its drive-offset-correction parameter. The offset correction parameter is key: applying the right offset correction during extraction is a must to end up with an "Accurate Rip" rip.
Accurate Rip is the only audio extraction reference method I'm aware of that produces repeatable and reliable results across different drives.
Using AR is the only way to verify your rip results, each and every track, against a common reference built from other people's drives and rips.
The simple logic: if other people with different drives end up with the same rip result (for the same CD release!), you can be sure that your rip is OK.
Bottom line: not using AR prevents you from verifying your rips against a common reference. You'd basically never know what you just put on disk. AR is therefore IMO a must if the highest extraction quality matters to you.

How does AR work!? A checksum over the audio data is generated on a per-track basis. This checksum is compared against the checksum stored in the AR database.
The tracks are tied to the disc ID, the identifier that every CD has. Btw, the same album can have different disc IDs (reprints etc.)!
The Accurate Rip database holds the results of millions of rips. If the checksum of the just-extracted track matches the AR database reference for that track, the rip can be considered "Accurate Rip" accurate.
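To make the checksum idea concrete, here is a minimal Python sketch of the AccurateRip v1 track checksum as I understand it from open-source implementations. Treat it as an illustration, not a reference: the function names are hypothetical, and the skipping of samples at the very start and end of the disc is taken from those implementations. Each stereo sample is read as one little-endian 32-bit word (left channel in the low 16 bits), multiplied by its 1-based position in the track, and the products are summed modulo 2^32. AccurateRip later added a v2 checksum, which this sketch does not cover.

```python
import struct

SAMPLES_PER_SECTOR = 588           # one CD sector holds 588 stereo samples (2352 bytes)
SKIP = 5 * SAMPLES_PER_SECTOR      # 2940: boundary used at the start/end of the disc


def samples_from_pcm(raw: bytes) -> list[int]:
    """Turn raw 16-bit stereo little-endian PCM into 32-bit words (L low, R high)."""
    return [word for (word,) in struct.iter_unpack("<I", raw)]


def accuraterip_v1(samples, is_first_track=False, is_last_track=False) -> int:
    """Sketch of the AccurateRip v1 checksum for one track."""
    # First track: positions below 2940 are ignored (about the first 5 sectors).
    first_pos = SKIP if is_first_track else 1
    # Last track: the final 2940 samples (5 sectors) are ignored.
    last_pos = len(samples) - SKIP if is_last_track else len(samples)
    crc = 0
    for pos, word in enumerate(samples, start=1):           # positions are 1-based
        if first_pos <= pos <= last_pos:
            crc = (crc + pos * word) & 0xFFFFFFFF           # wrap like 32-bit unsigned math
    return crc


def is_accurate(local_crc: int, reference_crcs: set[int]) -> bool:
    """A track counts as 'Accurate Rip' if its checksum equals one stored in the database."""
    return local_crc in reference_crcs
```

Note how a single sample shifted by the drive offset changes its position multiplier, which is why the offset correction matters so much for matching checksums.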
 

There's a big BUT!!!


Can "Accurate Rip" rip results be trusted?


YES and NO !


I'm not kidding. This question turned out to be a very good and valid question!!!


A little history.


The very first Accurate Rip drive offset reference was derived from a single "patient-0" CD drive. The offset corrections of all further CD drives were simply adjusted accordingly: the applied (guessed) offset was tuned on each new drive until its rip results matched the "patient-0" result. The new drive and its drive-offset parameter then went into the Accurate Rip drive database.

How's the offset used? During the actual CD extraction, the extraction tool shifts its input data stream by the number of samples given in the drive-offset-correction parameter. This effectively redefines the track borders (moving to earlier or later samples) so that the result is the same as with "patient-0". The offset that needs to be applied differs from drive to drive, which means that without correction every audio file would come out slightly different depending on the drive used. With AR-compliant ripping, your extracted files are adjusted to look like "patient-0" tracks. They will all look the same.
Therefore the question of which drive to use is basically irrelevant, as long as you know its drive offset.
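Conceptually, the offset correction just moves the window that is cut out of the drive's sample stream. The following Python sketch illustrates this with made-up numbers. The function name, the sign convention (a positive correction reads later samples) and the zero-filling outside the readable area are assumptions for illustration; real drives and tools differ in how they handle reads into the lead-in and lead-out.

```python
def extract_track(raw, track_start, track_length, correction):
    """Cut one track out of the drive's sample stream, shifted by `correction` samples.

    raw          -- every sample the drive delivered, in order
    track_start  -- first sample of the track according to the TOC
    correction   -- offset correction in samples (assumed: positive = read later samples)
    """
    begin = track_start + correction
    # Samples that would lie before the start or after the end of the stream are
    # replaced with silence (0) in this sketch.
    return [raw[i] if 0 <= i < len(raw) else 0
            for i in range(begin, begin + track_length)]


# Two hypothetical drives deliver the same audio, but drive B delivers it 6 samples later.
audio = list(range(100, 140))
drive_a = audio                      # reference drive, no offset
drive_b = [0] * 6 + audio            # drive B: same data, shifted by 6 samples

track_a = extract_track(drive_a, 10, 5, 0)
track_b = extract_track(drive_b, 10, 5, 6)
assert track_a == track_b            # identical rips once the correction is applied
```

Both calls return [110, 111, 112, 113, 114]: with the right correction, two different drives produce the same track.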


Now comes the (old) news. 

The "Accurate Rip" rips are actually NOT accurate. What !?!??

The "Accurate Rip" results are first of all 100% identical!! Identical to the rip as if it had been done on "patient-0", the reference. AR can be trusted on that one!

You might already guess where this is going.

A developer did some research some time ago and proved the patient-0 reference to be wrong!!!!!! (meanwhile confirmed by the Accurate Rip designer!) It turned out the Accurate Rip drive offset reference is off by 30 samples !

(I'm sitting here and shake my head every time I reread this or think about this debacle.)


Can you imagine!?!? All the millions of rips done with properly applied Accurate Rip offsets are flawed. The entire AR track database is flawed. The AR drive offset database is flawed. So you simply can't trust Accurate Rip to deliver accurate rips, as its name suggests.


What a disaster!


When confronted with it, the AR designer responded with something like: sorry, it can't be changed anymore. Of course not; a new database would be required. He didn't seem to consider starting over.

However, how bad is all this? Opinions differ. In my opinion it's bad: you simply can't call these tracks accurate anymore. Should I forget about AR now? Good question. I'd say having no reference at all might be worse than having this reference with its 30-sample flaw. At least I'll end up with identical rips. Of course I could run two extractions: one with the AR offset applied, just to see whether the CD extracts fine, and a second one with the corrected offset to end up with a truly accurate rip. Do I want that? Do I really need it? For now I'll stay with the flawed but at least identical rips.
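To put the 30-sample error into perspective, a quick calculation (plain arithmetic, nothing AccurateRip-specific): CD audio runs at 44,100 stereo samples per second, with 4 bytes per stereo sample.

```python
SAMPLE_RATE = 44_100       # CD audio: stereo samples per second
BYTES_PER_SAMPLE = 4       # 16-bit left + 16-bit right


def offset_in_time_and_bytes(samples: int) -> tuple[float, int]:
    """Return (milliseconds, bytes) for a shift of `samples` stereo samples."""
    return samples / SAMPLE_RATE * 1000, samples * BYTES_PER_SAMPLE


ms, nbytes = offset_in_time_and_bytes(30)
print(f"30 samples = {ms:.2f} ms = {nbytes} bytes")   # 30 samples = 0.68 ms = 120 bytes
```

Inaudible in itself, but every track boundary moves by those 120 bytes, so the files are not bit-identical to an offset-exact rip.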

Just to make it clear. This is not just a Linux issue. This issue of course affects all tools on all platforms using Accurate Rip and that includes dBPoweramp and EAC under Windows!


Finally, let's plan the extraction.


Extraction Planning



What do we need?


Extraction CD drive


I had a look at Amazon and the Accurate Rip database, and read some discussions here and there.

A reasonable device for the job seemed to be a Lite-ON eBAU108 drive (image above).
It sells at around 25$/€.

Its drive-offset "correction" is listed as "+6" in the Accurate Rip database. That means the actual drive offset is "-6". Keep that in mind!!!


Check whether your extraction tool of choice expects the "offset" or the "offset correction" as its parameter.

That'll do.  
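Since the sign flip is easy to get wrong, here is a tiny Python sketch that turns a drive offset into the correction value and builds a cdparanoia argument list from it (cdparanoia's -O/--sample-offset option takes the correction). The function names are hypothetical; the -6/+6 values are the Lite-ON eBAU108 numbers from above.

```python
def offset_correction(drive_offset: int) -> int:
    """AccurateRip lists the correction, which is the drive offset with the sign flipped."""
    return -drive_offset


def cdparanoia_args(correction: int) -> list[str]:
    """Arguments for ripping all tracks (-B, batch mode) to WAV (-w) with the given correction."""
    return ["cdparanoia", "-Bw", "--sample-offset", str(correction)]


correction = offset_correction(-6)      # Lite-ON eBAU108: drive offset -6
print(cdparanoia_args(correction))
# ['cdparanoia', '-Bw', '--sample-offset', '6']
```

You could pass that list to subprocess.run() from inside the album directory.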


Extraction tool

 
Pretty much all Linux CD extraction tools use a low-level extraction tool called cdparanoia. (It hasn't been maintained since 2008!)
There's a newer, maintained fork called libcdio-paranoia, which also provides all the tools required.
Unfortunately not all distros ship that newer version properly (e.g. Raspberry Pi OS).

cdparanoia still works and is available on pretty much every Linux operating system out there. It is a very basic command-line tool offering a wide range of features for a reliable, high-quality CD extraction job. It extracts the plain audio data and saves it in .wav format! You can pass your CD drive's offset correction, which is required for identical rips in line with Accurate Rip.


We'll end up with non-tagged .wav files after this step.
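Before converting, you can sanity-check that cdparanoia produced plain CD audio WAVs: 44.1 kHz, 16-bit, stereo, and (for a normal rip) a whole number of 588-sample sectors. A minimal sketch using Python's standard wave module; the function names and the sector check are my own illustration.

```python
import wave


def cdda_problems(channels: int, sample_width: int, frame_rate: int, frames: int) -> list[str]:
    """Return a list of reasons why these WAV parameters do not look like CD audio."""
    problems = []
    if channels != 2:
        problems.append(f"expected 2 channels, got {channels}")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, got {8 * sample_width}-bit")
    if frame_rate != 44100:
        problems.append(f"expected 44100 Hz, got {frame_rate} Hz")
    if frames % 588:
        problems.append(f"{frames} samples is not a whole number of CD sectors")
    return problems


def check_wav(path: str) -> list[str]:
    """Read the header of a WAV file and check it against the CD audio format."""
    with wave.open(path, "rb") as w:
        return cdda_problems(w.getnchannels(), w.getsampwidth(),
                             w.getframerate(), w.getnframes())
```

Run check_wav() on each trackNN.cdda.wav file; an empty list means the header looks like CD audio.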

WAV to FLAC conversion

With a simple one-line command we can convert the generated .wav to .flac. We'll use the
"flac" commandline tool for it.

After this step we end up with non AR verified, non tagged flacs.
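One reason flac beats wav here: every FLAC file stores an MD5 checksum of the decoded audio in its mandatory STREAMINFO block, which `flac -t` uses when testing a file. The Python sketch below only reads that header (sample rate, channels, bit depth, sample count and MD5) without decoding anything. It assumes the file starts directly with the "fLaC" marker, as files written by the flac tool do; the function names are hypothetical.

```python
def parse_streaminfo(header: bytes) -> dict:
    """Decode the STREAMINFO block that follows the 4-byte 'fLaC' marker."""
    if len(header) < 42:
        raise ValueError("too short: need 4 + 4 + 34 bytes")
    if header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    block_type = header[4] & 0x7F                   # low 7 bits: metadata block type
    length = int.from_bytes(header[5:8], "big")     # 24-bit block length
    if block_type != 0 or length != 34:
        raise ValueError("first metadata block is not STREAMINFO")
    info = header[8:42]
    # Bytes 10..17: sample rate (20 bits), channels-1 (3), bits per sample-1 (5), total samples (36)
    packed = int.from_bytes(info[10:18], "big")
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & 0xFFFFFFFFF,
        "md5": info[18:34].hex(),                   # all zeros = no MD5 stored
    }


def read_streaminfo(path: str) -> dict:
    with open(path, "rb") as f:
        return parse_streaminfo(f.read(42))         # marker + block header + STREAMINFO
```

For CD rips you would expect 44100 / 2 / 16 and a non-zero MD5.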

Accurate Rip verification


That's been a tricky one. Luckily a fellow called Cerebus from hydrogenaud.io wrote a standalone Perl script called ARFlac.pl that does the Accurate Rip verification on flac files. There's also a C port, or rather an adaptation. It's much better than the Perl script; I prefer the C version, IMO it's the most advanced of the tools out there.





In my test the confidence was greater than 1 and pretty high @ 200, telling us there have been 200 others with the same checksum.

After this step we'll know the tracks are "Accurate Rip" accurate. I am working on getting easier access to that C version.

Tagging

My favorite Linux tagging tool is Puddletag. It gives us access to well-known tag databases like gnudb (CDDB) or MusicBrainz. It is pretty much as powerful as mp3tag under Windows and also allows bulk tagging and so on. Further, it lets us create filenames and directories based on the chosen tags.









Now simply load the freshly ripped, untagged and AR-verified flacs into Puddletag and let it fetch the tags for you (mark all tracks, open the tag sources window and look up the freedb tags).
Of course you could also add the tags manually.

Note:
Puddletag has finally been ported to Python 3 (by the community; I contributed some stuff). The ported version was recently (August 2020) merged into the original puddletag repo. That's great. It has now officially been bumped to version 2.0. I don't know whether any of the major distributions have picked up the new code yet. v2.0 can easily be installed from source; on GitHub you'll find a HowTo (a 5-minute exercise for the experienced fellows around).



CoverArts

Google image search will be your best friend. I was never really satisfied using this or
that cover art search tool.





Wrap Up


That'll be it. We've got all the tools required at hand, and it doesn't look that complicated.

Sounds like a plan. Let's go for it.

Now the fun part starts.

The Extraction - Prep Stage

I'd like to list some basic generic topics that should be considered while doing the preparations for a CD extraction project - no matter what tool or platform is being used:

  • I suggest using flac as the target format, with compression level 0 (see my flac articles). Forget wav, no-compression flacs, or lossy mp3s.
    wav tagging is not widely supported, and wav does not support file integrity checks.
    Hint: Don't go for no-compression flacs. They are much slower than flac C-0 from a decoding perspective. I tested it!
  • Make sure your flac encoder uses the latest flac code!
  • You should look for highest quality images as cover-arts via Google image search.
  • Look for clean images with a minimum resolution of 500x500 "square" pixel dimensions.
  • Make your choice for the coverart filename - and then keep that name for all your CDs.
    I recommend using "folder.jpg" for all of them.
  • I do not embed coverarts into the files btw!
  • Usually you can't or don't want to use the default tag and file structures offered by whatever tools you'll be using. Please have a closer look at that!!! Use mp3tag under Windows or puddletag under Linux to edit and/or add your tags.
  • Have a closer look at the "genre" tag. To me this is a very important tag for dealing with my quite large collection. In 99% of all cases where I'm not looking for a specific album, I first select the genre and then the album underneath. Very often the genre tag is not set at all, or not set properly, by the online tag databases!
  • Folder/file structure. Below you'll find my preferred structure: 

     /music/folk/Norah Jones-Come Away With Me-2002/02-Norah Jones-Come Away With Me - Seven Years.flac
        
     
    Most tools give you by default something like e.g.

    /music/Norah Jones/Come away with me/02-Seven Years.flac
      
    As you can see, the actual flac filename wouldn't tell you much. In the long run you'll appreciate having more info in the filename.
  • The filenames and directories are derived from the tag fields below. These are all the fields needed:
    • Tracknumber
    • Artist
    • Album
    • Title
    • Date
    • Genre
  • Classical music tagging. A challenge, and it'll also be a challenge with the above tag structure. But that's the structure supported by most of the tools and players out there, so you'd better stick to it! You actually can't get around it; otherwise you might end up with compatibility issues depending on which player app you use. CDDB or MusicBrainz won't give you proper or consistent tags for classical albums, so there is no way around manually editing classical tags! Just a hint, the way I do it: I put the (to me) key artist (soloist, conductor or orchestra, depending on the album) into the artist tag. The conductor/orchestra/composer then usually goes into the album tag. Most important: make sure you have a great cover art that pretty much explains that classical CD! You'll appreciate it later on.
  • Other things to think of in terms of tagging, file and directory naming. You might need to add more info to the files, such as 
    • various artists/samplers 
    • different CD dates (first release/remaster1/remaster2) 
    • sample rates (e.g. add "-2496" to the Album tag) 
    • CD sets ( add CD1/CD2 to the Album tag ) 
    • ... 
Above list gives you an idea. You need to have that tag engineering done before you start the project! Otherwise you either do it all again or never again and end up with a messy database. I've met several people who switched to web based streaming services simply because the tagging situation on their own collection was totally messed up.
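The naming scheme above is easy to express as a function, which is handy for checking what a given set of tags will produce before you commit to it. This Python sketch only illustrates my structure; it is not puddletag's own syntax. The function names and the replacement of "/" in tag values (think "AC/DC") are my assumptions.

```python
from pathlib import PurePosixPath


def clean(value: str) -> str:
    """Tag values must not contain '/', or they would create extra directories."""
    return value.strip().replace("/", "-")


def album_path(tags: dict, root: str = "/music") -> PurePosixPath:
    """Build <root>/<genre>/<artist>-<album>-<date>/<nn>-<artist>-<album> - <title>.flac"""
    artist, album = clean(tags["artist"]), clean(tags["album"])
    track = int(str(tags["tracknumber"]).split("/")[0])   # accepts "2" as well as "2/14"
    folder = f"{artist}-{album}-{clean(tags['date'])}"
    filename = f"{track:02d}-{artist}-{album} - {clean(tags['title'])}.flac"
    return PurePosixPath(root) / clean(tags["genre"]) / folder / filename


tags = {"tracknumber": "2/14", "artist": "Norah Jones", "album": "Come Away With Me",
        "title": "Seven Years", "date": "2002", "genre": "folk"}
print(album_path(tags))
# /music/folk/Norah Jones-Come Away With Me-2002/02-Norah Jones-Come Away With Me - Seven Years.flac
```

In puddletag you would set up the same pattern in its tag-to-filename function.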

Do a test run with a couple of different album/genre rips and tools to make sure you can handle the whole thing properly and you like what you see. It takes a while. I know. ReRipping or ReTagging takes a lot more time though. You don't want to re-rip or re-tag hundreds of albums!

If your process and tools are well prepared, you should still budget about 10 minutes of effort per properly ripped, tagged (edited) and stored CD.
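Those 10 minutes add up quickly. A back-of-the-envelope calculation for a hypothetical collection size:

```python
def project_hours(cd_count: int, minutes_per_cd: float = 10) -> float:
    """Total effort in hours for ripping, tagging and storing `cd_count` CDs."""
    return cd_count * minutes_per_cd / 60


print(project_hours(600))   # 100.0 -> 600 CDs are about 100 hours of work
```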


Don't forget to introduce a safe backup strategy! 1 original and 2 backup disks. Run backups, preferably incremental backups, during the rip project!



The Extraction  - The Linux Way



Due to the lack of features and/or quality issues of most all-in-one (GUI-based) Linux tools, as outlined earlier, this CD-RIP exercise became a command-line exercise. Don't be afraid! We're talking about just a few commands to get the job done.

I'll roughly outline the extraction process to give you an idea.

1. Tool Installation

First we need to install the required tools.

Debian based systems
sudo apt-get install cdparanoia flac 

Fedora
sudo dnf install cdparanoia flac

2. Attach your CD drive and insert the CD


3. Open a terminal


Use your own drive offset correction (looked up @ the Accurate Rip drive database) and directory names in the example below!

You might have to become root to execute the tasks.

Note: cdparanoia expects the drive offset "correction" as its parameter!


CD Extraction - HowTo

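TARGET='/path/to/flac/album'   # album directory, adjust
OFFSET='6'                     # drive offset correction from the AR database (Lite-ON eBAU108: +6)
CLEVEL='0'                     # flac compression level
mkdir -p "$TARGET" && cd "$TARGET"          # rip straight into the album directory
cdparanoia -Bw --sample-offset "$OFFSET"    # use cd-paranoia if you installed libcdio-paranoia
flac --compression-level-"$CLEVEL" --delete-input-file *.wav
ARFlac.pl "$TARGET"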


Replace the target directory, the drive offset correction and the flac compression level with your own values.
That'll be it. Just a few commands and you've got your AR-verified flacs on disk.

4. Adding tags


Now you can open puddletag and add your tags and change the filenames (tags>>filename).


5. Add your downloaded and renamed cover art


6. Change folder and file permissions, if needed




And that'll be it.  You've got your album well extracted, structured and tagged.


Wrap Up

You'd think ripping and tagging a single CD "properly" should be an easy task. It wasn't. Under Linux in particular it's been a challenge, especially if you're looking for Accurate Rip support.

Ripping a whole collection would then be a hell of a project (nightmare) under Linux.
I'd say if you have to rip a huge number of CDs, better go for EAC on a Windows machine. It'll get the task done much more easily. I won't recommend dBPoweramp anymore, because I'm not happy with its license fee handling and because of its involvement in the Accurate Rip disaster. As a matter of fact, you'll be asked to pay for non-accurate rips. If you don't care, go ahead.

The Windows tools do not create better results, quality-wise, than the Linux-based approach outlined above! That's why I will stick to the Linux way.

One thing you might consider before starting an exhausting rip or a collection overhaul project.
Do the math for signing up with a music streaming service. That'll save a lot of work and time.
Shaping up a collection, buying and managing hard-discs or SSDs and backup disks will also cost you several hundred $/€ and endless hours of work over the years!
If you're lucky you might still be able to sell your CD collection. That'd give you a nice head-start into the web streaming world.


I wish you good luck with your project.

Next time - I hope - ripping a CD will take me just 20 minutes - which would include re-reading this article to refresh my memory about the subject. ;)

Enjoy.

######################################################################

ANNEX


Below is an example output of the ARFlac.pl program.


*************************************************

$ ARFlac.pl /tmp/Norah_Jones-Come_Away_With_Me-2002


/tmp/music/:8209656:6406260:9640260:7822164:8741796:10444644:6826680:8206716:11172000:7145964:8190840:11122020:7267680:8280216:
Checking AccurateRip database


Track Ripping Status [Disc ID: 0016f82c-b90a950e]
1 Accurately Ripped (confidence -56) [6ba01a43]
2 Accurately Ripped (confidence -56) [2a28e248]
3 Accurately Ripped (confidence -56) [7529437d]
4 Accurately Ripped (confidence -56) [822a4da0]
5 Accurately Ripped (confidence -56) [ebc715e6]
6 Accurately Ripped (confidence -56) [7d0e8bd8]
7 Accurately Ripped (confidence -56) [1ebdee8b]
8 Accurately Ripped (confidence -56) [7c54d45a]
9 Accurately Ripped (confidence -56) [17d8439c]
10 Accurately Ripped (confidence -56) [d6a501b3]
11 Accurately Ripped (confidence -56) [81b6a780]
12 Accurately Ripped (confidence -56) [e768a4cf]
13 Accurately Ripped (confidence -56) [25fab0d9]
14 Accurately Ripped (confidence -56) [d92fd059]
All Tracks Accurately Ripped.


*************************************************
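The colon-separated numbers in the output above are the track lengths in samples (each one a whole number of 588-sample sectors), and both halves of the Disc ID can be reproduced from them: the first is the AccurateRip disc ID 1 (sum of all track start sectors plus the lead-out sector), the second the classic freedb/CDDB disc ID. The Python sketch below follows the formulas used by open-source rippers as I understand them; it assumes an audio-only CD whose first track starts at sector 0, and its function names are hypothetical. AccurateRip's second ID, also needed for a database lookup, weights each start sector with its track number.

```python
SAMPLES_PER_SECTOR = 588   # stereo samples per CD sector


def toc_from_lengths(track_samples, first_start=0):
    """Derive track start sectors and the lead-out sector from per-track sample counts.

    Assumes an audio-only CD whose first track starts at sector `first_start`.
    """
    starts, pos = [], first_start
    for samples in track_samples:
        if samples % SAMPLES_PER_SECTOR:
            raise ValueError(f"{samples} is not a whole number of sectors")
        starts.append(pos)
        pos += samples // SAMPLES_PER_SECTOR
    return starts, pos                     # pos is now the lead-out sector


def accuraterip_ids(starts, leadout):
    """AccurateRip disc IDs 1 and 2 (32-bit), as computed by open-source rippers."""
    id1 = sum(starts) + leadout
    id2 = sum(max(s, 1) * n for n, s in enumerate(starts, start=1)) + leadout * (len(starts) + 1)
    return id1 & 0xFFFFFFFF, id2 & 0xFFFFFFFF


def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))


def freedb_id(starts, leadout):
    """Classic CDDB/freedb disc ID; 150 sectors (2 s) of pregap are added, 75 sectors = 1 s."""
    checksum = sum(digit_sum((s + 150) // 75) for s in starts)
    seconds = (leadout + 150) // 75 - (starts[0] + 150) // 75
    return ((checksum % 255) << 24) | (seconds << 8) | len(starts)


lengths = [8209656, 6406260, 9640260, 7822164, 8741796, 10444644, 6826680,
           8206716, 11172000, 7145964, 8190840, 11122020, 7267680, 8280216]
starts, leadout = toc_from_lengths(lengths)
id1, id2 = accuraterip_ids(starts, leadout)
print(f"{id1:08x}-{freedb_id(starts, leadout):08x}")   # 0016f82c-b90a950e
```

The printed value matches the Disc ID line of the ARFlac.pl output above. For discs whose first track does not start at sector 0, or that contain data tracks, the start sectors must come from the disc's table of contents instead.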

You could also use ARFlac.pl to check any of your already ripped CDs against AR.
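If you check many albums this way, it helps to evaluate the output automatically. A minimal Python sketch that parses per-track lines in the format shown above. The regular expression only covers that format (I'm assuming lines for tracks that fail may look different), and the function names are hypothetical. The confidence is kept exactly as printed, including the negative value in the example above.

```python
import re

# "<track> <status> (confidence <n>) [<crc>]", as in the example output above
TRACK_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s+\(confidence\s+(-?\d+)\)\s+\[([0-9a-fA-F]{8})\]\s*$")


def parse_arflac(output: str) -> list[dict]:
    """Collect one dict per track line; header and summary lines are ignored."""
    tracks = []
    for line in output.splitlines():
        match = TRACK_LINE.match(line)
        if match:
            tracks.append({
                "track": int(match.group(1)),
                "status": match.group(2),
                "confidence": int(match.group(3)),
                "crc": match.group(4).lower(),
            })
    return tracks


def all_accurate(tracks: list[dict]) -> bool:
    """True only if at least one track was found and every track reports 'Accurately Ripped'."""
    return bool(tracks) and all(t["status"] == "Accurately Ripped" for t in tracks)
```

The text to parse could come from subprocess.run(["ARFlac.pl", album_dir], capture_output=True, text=True).stdout.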