Tue, 17 May 2022

Linux Command Line Magic

— SjG @ 12:24 pm

In day-to-day operations, cirumstances often arise where you need simple answers to fairly complicated situations. In the best scenario, the information is available to you in some structured way, like in a database, and you can come up with a query (e.g., “what percentage of our customers in January spent more than $7.50 on two consecutive Wednesdays” is something you could probably query). In other scenarios, the information is not as readily available or not in a structured format.

One nice thing about Linux and Unix-like operating systems, is that the filesystem can be interrogated by chaining various tools to make it cough up information you need.

For example, I needed to copy the assets from a digital asset management (DAM) system to a staging server to test a major code change. The wrinkle is that the DAM is located on a server with limited monthly bandwidth. So my challenge: what was the right number of files to copy down without exceeding the bandwidth cap?

So, to start out with, I use some simple commands to determine what I’m dealing with:

$ ls -1 asset_storage | wc -l

$ du -hs asset_storage
409G	asset_storage

So that first command lists all the files in the “asset_storage” directory, with the -1 flag saying to list one file per line, which is then piped into the word-count command with the -l flag which say to count lines. The second command tells me the storage requirement, with the -h flag asking for human-readble units.

I’ve got a problem. Over 10,000 files totalling over 400G of storage, and say my data cap is 5G. The first instinct is to say, “well, the average file size is 40M, so I may only be able to copy 125 files.” However, we know that’s wrong. There are some big video files and many small image thumbnails in there. So what if I only copy the smaller files?

$ find asset_storage -size -10M -print0 | xargs -0 du -hc | tail -n1
630M	total

Look at that beautiful sequence. Just look at it! The find command looks in the asset_storage directory for files smaller than 10M. The list it creates gets passed into the disk usage command via the super-useful xargs command. xargs takes a list that’s output from some command and uses that list as input parameters to another command. To be safe with weird characters (i.e., things that could cause trouble by being interpreted by the shell, like single quotes or parens or dollar signs) we use the -print0 flag from find (which forces it to use null terminators after each result output) and the -0 flag on xargs, which tells it to expect the null terminators. This takes the list of small files, passes them to the disk usage command with the -h (human-readable) and -c (cumulative) flags. The du command gives output for each file and for the sum total, but we only want the sum, so we pipe it into the tail command to just give us that last value.

So if we only include files under 10M, we can transfer them without getting close to our data cap. But what percentage of the files will be included?

$ find asset_storage -size -10M -print | wc -l

Again, the find command looks in the asset_storage directory for files smaller than 10M and each line is passed into the word count as before. So if we include only files smaller than 10M, we get 7,708 of the 10,384 files, or just under 75% of them! Hooray!

But when I started to create the tar file to transfer the files, something was wrong! The tar file was 2G and growing! Control C! Control C! What’s going on here?

What was wrong? Well, this is where it gets into the weeds a bit. It took me longer than I’d like to admit to track down. The shell command buffer has limitations on its length, and xargs has its own limitations. If the list it receives exceeds those limits, xargs splits the input and invokes the destination command multiple times, each with a chunk of the list. So in my example above, the find command was overwhelming the xargs buffer and the du command was called multiple times:

$ find asset_storage -size -10M -print0 | xargs -0 du -hc | grep -i total
6.1G	total
630M	total

My tail command was seeing that second total, and missing the first one! To make the computation work the way I’d wanted, I had to allocate more command line length to xargs (the size you can set is system dependent, and can be found with xargs --show-limits):

$ find asset_storage -size -10M -print0 | xargs -0 -s2000000 du -hc | grep -i total
6.6G	total

Playing with the file size threshold, I was finally able to determine that my ideal target was files under 5M, which still gave me 68% of the files and kept the final transfer down to about 3G.

In summary, do it this way:

$ find asset_storage -size -5M -print0 | xargs -0 -s2000000 du -hc | tail -n1
2.9G	total

$ find asset_storage -size -5M -print | wc -l

$ find asset_storage -size -5M -print0 | xargs -0 -s2000000 tar cf dam_image_backup.tgz

Thu, 2 Sep 2021

Character Encoding

— SjG @ 4:25 pm

Hm. Looks like the server upgrade from Ubuntu 18.04.5 LTS to Ubuntu 20.04.3 LTS changed some character encoding defaults somewhere. I’ll have to track down if it’s in the database (most likely), the database connection (possible), PHP, or somewhere else.

In the mean time, please forgive the occasional bad unicode character.

Filed in:

Fri, 9 Jul 2021

Virtual Apache SSL Hosts in a Docker Container

— SjG @ 1:40 pm

You might want to “containerize” your development hosting environment so you can easily migrate it from machine to machine. As a Docker noob, I had a bunch of issues getting this set up the first time, and wanted to share a working configuration. This example assumes you have Docker installed and operating. You can also skip reading this, and just download the files at GitHub.

First, we’ll need to create some directories. I create one for the Apache configurations, and one for the code projects I’ll be working on.
mkdir apache-php
mkdir project

Within project, you can check out the code for your various projects into subdirectories. For simplicity, I’ve created project1 and project2 in the sample code. The Apache web server within the container will serve content from these directories.

We’re going to use a fictional top-level domain (TLD) for our development environment. This way, the URL you use to access your sites will be the same every time you spin up a new dev environment, without having to worry about name servers. You do, however, have to worry about your /etc/hosts file (or your platform’s equivalent). Choose a TLD that will be easy to remember. For my example, I’m using “mylocal”. One thing to avoid: start the TLD with a letter rather than a number. Don’t include characters like hyphens. Please learn from my mistakes.

Edit your /etc/hosts, and add the lines: project1.mylocal project2.mylocal

Next, we’ll want to create an SSL certificate for use in development. The easiest way to do this is with mkcert. Once you have mkcert installed and working, you’ll create a wildcard certificate for your TLD:
cd php-apache
mkcert -install mylocal "*.mylocal"

Next, we create a compose.yaml file:

version: "3.9"
container_name: ApachePHPVirtual
- apache
context: .
dockerfile: PhpApacheDockerfile
- "./project:/var/www"
- 80:80
- 443:443
- "project1.mylocal:" # remember to add " project1.mylocal" to your /etc/hosts file or equivalent
- "project2.mylocal:" # remember to add " project2.mylocal" to your /etc/hosts file or equivalent
hostname: project1.mylocal # default
domainname: mylocal
tty: true # if you want to debug

This is pretty straightforward. We’re creating a container which we’ll call “ApachePHPVirtual,” it will have a network we call “apache” if we want to connect using other containers, and it links our top level project directory to /var/www in the container. We map ports 80 and 443 on our host machine to those same ports in the container. The extra_hosts directive adds our project names to the container’s /etc/hosts. We set up the container’s hostname to match our first project, and set the default domain to our “mylocal” TLD.

We then want to create configurations for each of the Apache virtual hosts. In the php-apache directory, we create config files for each project. These are just standard virtual host declarations, e.g.:

<VirtualHost *:80>
    ServerName project1.mylocal
    Redirect permanent / https://project1.mylocal/
<VirtualHost *:443>
    ServerName project1.mylocal
    DocumentRoot /var/www/project1
    ErrorLog ${APACHE_LOG_DIR}/project1-error.log
    CustomLog ${APACHE_LOG_DIR}/project1-access.log combined
    DirectoryIndex index.php

    <Directory "/var/www/project1">
        Options -Indexes +FollowSymLinks
        AllowOverride all
        Order allow,deny
        Allow from all

    SSLEngine On
    SSLCertificateFile    /etc/apache2/ssl/cert.pem
    SSLCertificateKeyFile /etc/apache2/ssl/cert-key.pem

You’ll need to create a similar configuration for each project. Note that the Document Root points at the mapped host directory. That means you won’t need to rebuild the container to see project changes.

The actual image for the Apache/PHP container is created and configured in our next file, “PhpApacheDockerfile”. So we create that:

FROM php:8.0.8-apache-buster

# add some packages
RUN docker-php-ext-install curl gd iconv pdo pdo_mysql soap zip

# Apache Config
COPY php-apache/project1.conf /etc/apache2/sites-available/project1.conf
COPY php-apache/project2.conf /etc/apache2/sites-available/project2.conf
COPY php-apache/mylocal+1-key.pem /etc/apache2/ssl/cert-key.pem
COPY php-apache/mylocal+1.pem /etc/apache2/ssl/cert.pem

# mod rewrite! SSL!
RUN a2enmod rewrite
RUN a2enmod ssl

# enable sites
RUN a2ensite project1.conf
RUN a2ensite project2.conf
RUN service apache2 restart

This pulls the php-8.0.8 image from DockerHub, adds in some PHP extensions, copies over our SSL certificate and key, copies our virtual host configuration files over, enables the projects, and restarts the Apache server.

Now, all that remains to do is build it and power it up:

docker-compose build && docker-compose up -d

You can now visit the project URls in your browser, e.g., https://project1.mylocal/

Tue, 7 Apr 2020

One-liner to get a directory’s worth of video times

— SjG @ 10:58 am

The ffmpeg family of programs is incredibly arcane and powerful for handling video and video information. I needed to get the run times of a collection of videos. Here’s a handy one-liner that creates output suitable for import into a spreadsheet:

for i in *.mp4; do q=ffprobe -i "$i" -show_entries format=duration -v quiet -of csv="p=0";echo "$i, $q"; done

Sample run:
$ cd ~/work/training_videos
$ for i in *.mp4; do q=ffprobe -i "$i" -show_entries format=duration -v quiet -of csv="p=0";echo "$i, $q"; done
First_steps.mp4, 70.868000
Create_a_project.mp4, 134.320000
Loading_libraries.mp4, 45.442000

Wed, 8 Aug 2018

Building Kannel 1.4.5 under CentOS 7.5

— SjG @ 1:05 pm

Well, it’s been a few years since I’ve had to build Kannel (see Compiling Kannel under CentOS 7.0), and a server migration required that I figure it out again.

You can find discussions on why Bison v.3.0 or later, standard with modern versions of Linux, prevents compilation from succeeding. Some discussions provide workarounds that I couldn’t make work.

Here’s how I downgraded Bison and built a working Kannel 1.4.5 binary on CentOS 7.5:

$ cd /usr/local/kannel
$ wget --no-check-certificate
$ tar xzvf gateway-1.4.5.tar.gz
$ cd gateway-1.4.5
$ sudo yum downgrade bison
$ sudo ./configure --prefix=/usr/local/kannel --disable-wap --enable-start-stop-daemon
$ sudo make
$ sudo make install

In that downgrade step, Bison was rolled back to version 2.7-4.el7. If you’re doing this at some indeterminate time in the future, the downgrade may be to a 3.x version, in which case you’ll want to downgrade until you’re at a 2.7.x version (also, if you’re at some indeterminate time in the future, simply check your watch and it will be roughly determinate again).