High Efficiency Compression & Automation

Results

Generally it’s best to compress directly from source material like Blu-ray or DVD, or at least from something lossless like FLAC. Those sources compress significantly better than files that have already been lossily compressed. I use MakeMKV to rip discs to files without recompressing them.

The compression script has a flag for quality. New movies that I haven’t watched get compressed with the “high” flag. For archival material, or footage that was only DV quality to begin with, I don’t fuss and use the “low” flag. The high flag uses “slow” compression, so the result tends to be a bit crisper. For audio I always use high, because size is basically never a factor: it’s always going to be smaller than even a 128 kbps MP3.
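As a rough illustration of how a quality flag like this could map onto FFmpeg arguments, here is a minimal Python sketch. It is not the actual script. The codec choices (libx265 and libopus), the CRF values, the audio bitrate, and the names `VIDEO_SETTINGS` and `build_ffmpeg_args` are all assumptions made for the example. Only the “high” flag using the “slow” preset and audio using one fixed setting come from the description above.

```python
# Hypothetical settings table. The CRF numbers and the "medium" preset for
# "low" are assumptions; only "high" -> "slow" comes from the post.
VIDEO_SETTINGS = {
    "high": {"preset": "slow", "crf": "20"},
    "low": {"preset": "medium", "crf": "26"},
}


def build_ffmpeg_args(src, dst, quality="low"):
    """Return an ffmpeg argument list for the given quality flag."""
    if quality not in VIDEO_SETTINGS:
        raise ValueError(f"unknown quality flag: {quality!r}")
    video = VIDEO_SETTINGS[quality]
    return [
        "ffmpeg", "-i", src,
        # Video: HEVC via libx265 (assumed codec), preset chosen by the flag.
        "-c:v", "libx265", "-preset", video["preset"], "-crf", video["crf"],
        # Audio: same setting regardless of flag (assumed Opus at 96 kbps).
        "-c:a", "libopus", "-b:a", "96k",
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_args("movie.mkv", "movie.hevc.mkv", "high")))
```

The function only builds the argument list, so the command can be inspected before anything is run.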

Generally, files came out between half and 1/12th of their original size after the script had run. I was adding a lot of movies to the collection during this period from Redbox and Netflix, but I still reduced my 2 TB collection to 1 TB over the course of two weeks. If you’re wondering how I did it: I used four 4-core i7 machines, rented a 4-core high-CPU node from DigitalOcean for about a week, VPNed it into my home network, and compressed over CIFS.

Bottlenecks

Generally I was able to run two or three concurrent compressions on a 4790S without losing too many FPS. The main bottleneck I noticed was that FFmpeg wouldn’t fully saturate my CPU without multiple concurrent jobs. That was true both on 8-thread machines and on 20-thread Xeons. Large 4K footage or higher-bitrate source material slowed things down drastically. HD TV rips would usually crawl along at around 24 fps, even on low.
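Running a few jobs side by side can be sketched in Python with a small worker pool. This is an illustration under assumptions, not the original script: `run_job` and `run_concurrently` are hypothetical names, and the default of three workers simply mirrors the “two or three” figure above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_job(args):
    """Run one command (for example an ffmpeg argument list), return its exit code."""
    return subprocess.run(args).returncode


def run_concurrently(jobs, max_jobs=3, runner=run_job):
    """Run every job, keeping at most max_jobs running at the same time.

    Threads are enough here because each worker just waits on an external
    process. Results come back in the same order as the input jobs.
    """
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(runner, jobs))
```

Passing a different `runner` makes it easy to do a dry run, for example one that only prints each command instead of executing it.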

Is it worth it?

The script was fun to create, and I learned a lot about piping scripts and processing with grep and sed. The investment needed to get through 2 TB of media was pretty substantial, though. People with a simple 4-core or 2-core machine and more than a few hundred GB are going to be stuck in a hard place. I consistently used four 4c8t machines for about two weeks straight. You pretty much need more than 4 threads to make this at all efficient. Hard drive wear and tear should be a consideration too. I staged all of my compression on an SSD, so I only spun my RAIDs up a few times during the process.
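The staging idea can be sketched in Python as: copy the source to the SSD once, do all the encoding work there, then move only the finished file back. This is a minimal sketch under assumptions. The function name `compress_via_staging`, the directory layout, and the `compress` callback are hypothetical and not taken from the original script.

```python
import shutil
from pathlib import Path


def compress_via_staging(src, staging_dir, dest_dir, compress):
    """Copy src to a staging directory, compress it there, move the result out.

    compress is any callable taking (input_path, output_path). The array is
    touched once for the read and once for the final write.
    """
    src = Path(src)
    staging = Path(staging_dir)
    dest = Path(dest_dir)
    staging.mkdir(parents=True, exist_ok=True)
    dest.mkdir(parents=True, exist_ok=True)

    staged_in = staging / src.name
    staged_out = staging / (src.stem + ".out" + src.suffix)
    shutil.copy2(src, staged_in)  # single read from the slow disks
    try:
        compress(staged_in, staged_out)  # all heavy I/O happens on the SSD
        final = dest / src.name
        shutil.move(str(staged_out), str(final))  # single write back
    finally:
        staged_in.unlink(missing_ok=True)  # always clean up the staged copy
    return final
```

Note that if `dest_dir` is the same directory as the source, the result takes the source’s file name, so a separate output directory is the safer choice.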