fogbound.net




Tue, 17 May 2022

Linux Command Line Magic

— SjG @ 12:24 pm

In day-to-day operations, circumstances often arise where you need simple answers to fairly complicated situations. In the best scenario, the information is available to you in some structured way, like in a database, and you can come up with a query (e.g., “what percentage of our customers in January spent more than $7.50 on two consecutive Wednesdays” is something you could probably query). In other scenarios, the information is not as readily available or not in a structured format.

One nice thing about Linux and Unix-like operating systems is that the filesystem can be interrogated by chaining various tools to make it cough up the information you need.

For example, I needed to copy the assets from a digital asset management (DAM) system to a staging server to test a major code change. The wrinkle is that the DAM is located on a server with limited monthly bandwidth. So my challenge: what was the right number of files to copy down without exceeding the bandwidth cap?

So, to start out with, I use some simple commands to determine what I’m dealing with:

$ ls -1 asset_storage | wc -l
10384

$ du -hs asset_storage
409G	asset_storage

So that first command lists all the files in the “asset_storage” directory, with the -1 flag saying to list one file per line, which is then piped into the word-count command with the -l flag, which says to count lines. The second command tells me the storage requirement, with the -h flag asking for human-readable units and the -s flag asking for a single summary line.

I’ve got a problem. Over 10,000 files totalling over 400G of storage, and say my data cap is 5G. The first instinct is to say, “well, the average file size is 40M, so I may only be able to copy 125 files.” However, we know that’s wrong. There are some big video files and many small image thumbnails in there. So what if I only copy the smaller files?

$ find asset_storage -size -10M -print0 | xargs -0 du -hc | tail -n1
630M	total

Look at that beautiful sequence. Just look at it! The find command looks in the asset_storage directory for files smaller than 10M. The list it creates gets passed into the disk usage command via the super-useful xargs command. xargs takes a list that’s output from some command and uses that list as input parameters to another command. To be safe with weird characters in file names (i.e., things xargs would otherwise split on or misinterpret, like spaces, quotes, or backslashes) we use the -print0 flag from find (which makes it put a null terminator after each result) and the -0 flag on xargs, which tells it to expect the null terminators. This takes the list of small files and passes them to the disk usage command with the -h (human-readable) and -c (grand total) flags. The du command gives output for each file and for the sum total, but we only want the sum, so we pipe it into the tail command to just give us that last line.
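To see why the null terminators matter, here is a minimal sketch using a throwaway directory. The directory and file names are made up for illustration, and it assumes the temporary path itself contains no spaces. It counts how many arguments xargs actually hands to the next command.

```shell
# Scratch directory with one ordinary name and one name containing a space
# (hypothetical files, created only for this demonstration).
demo_dir=$(mktemp -d)
printf 'abc' > "$demo_dir/plain.txt"
printf 'abc' > "$demo_dir/two words.txt"

# Null-terminated: each file name arrives as exactly one argument.
safe_count=$(find "$demo_dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

# Newline-separated: xargs also splits on the space, so "two words.txt"
# turns into two bogus arguments.
naive_count=$(find "$demo_dir" -type f | xargs sh -c 'echo "$#"' sh)

echo "with -print0/-0: $safe_count arguments"
echo "without:         $naive_count arguments"

rm -r "$demo_dir"
```

Two files go in, but the second pipeline reports three arguments, which is exactly the kind of quiet breakage the -print0 and -0 pair avoids.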

So if we only include files under 10M, we can transfer them without getting close to our data cap. But what percentage of the files will be included?

$ find asset_storage -size -10M -print | wc -l
7708

Again, the find command looks in the asset_storage directory for files smaller than 10M and each line is passed into the word count as before. So if we include only files smaller than 10M, we get 7,708 of the 10,384 files, or just under 75% of them! Hooray!

But when I started to create the tar file to transfer the files, something was wrong! The tar file was 2G and growing! Control C! Control C! What’s going on here?

What was wrong? Well, this is where it gets into the weeds a bit. It took me longer than I’d like to admit to track down. The shell command buffer has limitations on its length, and xargs has its own limitations. If the list it receives exceeds those limits, xargs splits the input and invokes the destination command multiple times, each with a chunk of the list. So in my example above, the find command was overwhelming the xargs buffer and the du command was called multiple times:

$ find asset_storage -size -10M -print0 | xargs -0 du -hc | grep -i total
6.1G	total
630M	total

My tail command was seeing that second total, and missing the first one! To make the computation work the way I’d wanted, I had to allocate more command line length to xargs (the size you can set is system dependent, and can be found with xargs --show-limits):

$ find asset_storage -size -10M -print0 | xargs -0 -s2000000 du -hc | grep -i total
6.6G	total
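Another way around the problem is to avoid relying on a single du total at all. The following is a sketch, not what I originally ran, and it assumes GNU find (for -printf). The scratch files and the variable names are invented so the example can run anywhere. It prints each file's size in bytes and lets awk add them up, so it does not matter how long the list gets.

```shell
# Scratch data standing in for asset_storage (hypothetical files).
asset_dir=$(mktemp -d)
head -c 2048 /dev/zero > "$asset_dir/thumb_a.jpg"
head -c 4096 /dev/zero > "$asset_dir/thumb_b.jpg"
head -c 3145728 /dev/zero > "$asset_dir/video.mp4"   # 3 MiB, too big to match

# -type f skips directories, which du would otherwise count recursively.
# GNU find rounds sizes UP to whole units before comparing, so "-size -2M"
# matches files of 1 MiB or less.
# -printf '%s\n' prints each matching file's size in bytes, one per line.
small_bytes=$(find "$asset_dir" -type f -size -2M -printf '%s\n' |
    awk '{sum += $1} END {print sum + 0}')

echo "bytes in small files: $small_bytes"

rm -r "$asset_dir"
```

Because awk sees every line, there is no "last total" to pick by mistake. The rounding behaviour of -size is worth keeping in mind when choosing a threshold for the real data.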

Playing with the file size threshold, I was finally able to determine that my ideal target was files under 5M, which still gave me 68% of the files and kept the final transfer down to about 3G.

In summary, do it this way:

$ find asset_storage -size -5M -print0 | xargs -0 -s2000000 du -hc | tail -n1
2.9G	total

$ find asset_storage -size -5M -print | wc -l
7094

$ find asset_storage -size -5M -print0 | xargs -0 -s2000000 tar czf dam_image_backup.tgz
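One caveat with the tar step: if xargs ever does split the list, tar runs more than once and each run overwrites the archive from the one before. A minimal sketch of an alternative, assuming GNU tar, is to let tar read the null-terminated list itself so it only ever runs once. The scratch directories and file names below are hypothetical stand-ins for the real data.

```shell
# Scratch source and output directories (made up for this demonstration).
src_dir=$(mktemp -d)
out_dir=$(mktemp -d)
mkdir "$src_dir/asset_storage"
printf 'small' > "$src_dir/asset_storage/thumb one.png"
printf 'small' > "$src_dir/asset_storage/thumb_two.png"

# find writes null-terminated names to stdin; "--null -T -" tells tar to read
# its file list from there. -type f keeps directories out of the list, since
# tar would otherwise archive a directory's entire contents.
(cd "$src_dir" && find asset_storage -type f -size -5M -print0 |
    tar czf "$out_dir/dam_image_backup.tgz" --null -T -)

# List the archive to confirm what went in.
archived_count=$(tar tzf "$out_dir/dam_image_backup.tgz" | wc -l | tr -d ' ')
echo "files in archive: $archived_count"

rm -r "$src_dir" "$out_dir"
```

With this form there is no command-line length limit to tune, because the file names never appear on tar's command line.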


Sun, 13 Mar 2022

The Programming Curse

— SjG @ 10:28 am

Programming is fun. You can be off doing some chore and get this idea … “hey, wouldn’t it be cool if I could just have the computer help me with this …”

So, you come up with an idea, and you think through the first few steps, and throw together a script. Then you play with it, and you get excited. It works, sort of, but you can see ways to make it better. You make changes and discover a better approach to the problem — so you implement that, and before you know it, you’ve spent an evening or an afternoon. It’s exciting to watch your ideas turn into something.

But computers get ever more complex, and interfacing with them gets both more powerful and more complicated. Keeping pace with that increased complexity are more and more powerful development tools. This is a double-edged sword: you can easily do amazing things that would have once been very difficult, but getting set up can be more challenging and when things go wrong, it’s harder to figure out why.

For some ideas, getting to the point of coding is still as easy as entering php -a or python and starting to type. For other ideas, though, there is the dreaded setup problem. I call this phenomenon “The Programming Curse.”

For example, I had an idea for a phone app that I wanted to prototype. In the old days, I’d have had to break out Xcode and learn Swift and all of the iOS libraries. Today, however, I can use more familiar (to me) web technologies, and build an app using the Ionic Framework. Now I have a toolchain that includes at least nodejs, the Ionic framework, Ruby Gems, and Xcode. I know very little about any of these things’ internals, and I really don’t want to know a lot about them. I just want to explore my code idea!

Sadly, I have to learn something about the internals. My first attempt to install the toolchain failed deep inside a nodejs package setup. After extensive googling, I found that it’s because one of the components is not the latest version (but there’s a reason for that [1]).

Maybe I’ve just gotten old, or maybe I’m just lazy. I’m certainly not the first to gripe about this phenomenon [2]. It just dampens the fun when, during that excited “wouldn’t it be cool” phase, I have to spend hours getting a functional development environment together instead of actually getting to write code.

[1] The problem is that I support a phone app that was written in an earlier version of the Ionic framework, and it depends on a Cordova plug-in that’s no longer supported. The plug-in still works, but I can’t update my development environment for my new project because the dependencies would clobber my ability to ship new builds of my old project. Could that be resolved by selectively holding back some packages to previous versions? Maybe. Three or four hours’ worth of effort in that direction didn’t get me anywhere, other than dependency hell.
For my web-only projects, I use products like Docker to keep a fully isolated development environment per project. Since Ionic depends on nodejs which installs globally (and since I need Xcode to perform the final build), I haven’t found a way to do that. I guess if I made some macOS virtual machines, I could, but it seems like a lot of overhead.

[2] Fifteen years ago, David Brin wrote an article, Why Johnny Can’t Code, extolling the virtues of BASIC. I find myself grudgingly agreeing: not about his specific language objections (I don’t know why he felt Perl or Python are any further from the metal than BASIC), but about how and why it should be easier to write small programs.


Fri, 11 Mar 2022

Of House Mountains and AR

— SjG @ 12:01 pm

Many, many years ago, a Swiss exchange student introduced us to the concept of a “house mountain.” It’s sort of the landscape view equivalent of home base: the mountain that you see from wherever “home” is.

Separately, I just came across a discussion of augmented reality applications, which reminded me of the outstanding PeakFinder web site and mobile app. I first encountered PeakFinder in 2013 when I was loading up my first iPhone. It was one of the two applications that showed me the enormous promise of augmented reality (the other being the original Star Walk). I was able to install PeakFinder on my phone, and identify peaks when hiking in the Sierra Nevada, on a trip to the Atacama desert in Chile, from a ferry crossing Horseshoe Bay in British Columbia, and many other places.

In general, I find I use PeakFinder without the AR mode. I just point my phone around the horizon and see the peaks labeled, recognizing them by the basic shape. But if you want to know what’s that peak in a picture you took last year, PeakFinder has a neat feature where you can import your photo and then overlay the data. It requires that GPS coordinates were embedded in the picture or that you can find the spot where you took the picture on a map. Tilting the camera off level and/or lens distortion make the overlay approximate, but it’s almost always good enough!

Shot from the train in Alberta, Canada as we approached Jasper
Volcanos and Laguna Miscanti, Chile

So, marrying the concepts of AR and house mountains, PeakFinder lets you generate the view from any arbitrary point and even keep it as a “favorite.” So even if you aren’t in a place or don’t have a picture from that location, you can see your house mountain, like this view of my childhood house mountain.

AR House Mountain


Fri, 17 Dec 2021

Manipulating SVG in PHP, part 2.

— SjG @ 1:06 pm

As mentioned in Part 1, the purpose of this code is to help automate file format conversions and presentation for posting vector designs to Etsy. One kind of design I make is “shadow theaters,” which are essentially little layered dioramas that look nice when backlit.

Sample 8-Layer Shadow Theater

The individual layers of these designs comprise an outline, scoring lines for folding the paper, and the design itself. However, creating a thumbnail to represent the design has a unique challenge: the layers are negatives. The design file fills the areas that will be cut out, but this is where the greatest illumination will come through. I want the thumbnail to look like the photograph above. Furthermore, I can’t simply composite all the layers, as they are opaque, and the result would only show the top layer.

So the task becomes to invert the layers. But even that isn’t so simple. In each of the layers, I want to take the filled area and make it transparent, and take the unfilled area and fill it. To really make it work, the filled areas should not be a uniform color, but should vary by how far “down” in the stack the layer is.

However, this isn’t as simple as removing layers like I did in Part 1. What I have to do is take the outermost layer, fill it, and subtract the geometry of the inner design from it. Fortunately, that’s not so difficult to do in SVG. If you take a path in SVG and close it, you can continue to add sub-paths. When those sub-paths are outside the original path, they are added to it. If the sub-paths are inside the original path, they are subtracted from it. If they overlap, they’re both added and subtracted. This is probably better explained with pictures.

Disclaimer: for the purpose of this discussion, I’m assuming two SVG settings: fill-rule and clip-rule are set to “evenodd”. There are other options, but they aren’t helpful for what I want to do.

What do these paths and sub-paths look like? If you look at the example from Part 1, you can see that a path has a d attribute, which is a string of commands and coordinates. Rather than go through those here, I’ll just link the relevant Mozilla SVG tutorial page which I found helpful. The key thing is that a complete, closed path ends with a Z command. Any path details after that are a sub-path.

Looking at an SVG file of one of my shadow theater layers, what I want to do is convert the “Design” layer to a fully-contained sub-path of the “PaperOutline” layer. Based on what we know of SVG paths, that means I can just append the design path to the outline path. The design path will become a sub-path which will be subtracted from the outline path! I then can set the fill color of this composite path, and I will effectively have filled the outline and subtracted my design.
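To make the sub-path idea concrete without my real files, here is a tiny sketch that writes a hand-made SVG from the shell. The file name, sizes, and colors are all invented for illustration. It holds one path: an outer square, closed, followed immediately by a smaller inner square as a sub-path. With the evenodd fill rule, a viewer should render a filled square with a square hole in the middle.

```shell
# Write a minimal hand-made SVG (hypothetical demo file, not one of my layers).
svg_file="$(mktemp -d)/evenodd_demo.svg"
cat > "$svg_file" <<'EOF'
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100"
     style="fill-rule:evenodd;clip-rule:evenodd;">
    <!-- Outer square, closed, followed by an inner square as a sub-path. -->
    <path d="M0,0L100,0L100,100L0,100Z M25,25L75,25L75,75L25,75Z" style="fill:rgb(64,64,64);"/>
</svg>
EOF

# Each close-path command ends one closed shape: the outline, then the cut-out.
subpath_count=$(grep -o 'Z' "$svg_file" | wc -l | tr -d ' ')
echo "closed shapes in the path: $subpath_count"
echo "open $svg_file in a browser to see the hole"
```

Appending the “Design” path data to the “PaperOutline” path data produces the same kind of d attribute, just with a much more intricate hole.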

I again take advantage of the fact that the layer names are following my standard. I search through the file for the “Design” layer, and copy its path into the $subtract_path variable. Then I loop through the groups again, find the “PaperOutline” layer, and append the path. While I’m at it, I set the fill color with a shade I calculate based on the layer number.

<?php
$total_layer_count = 1;
for ($this_layer_index = 1; $this_layer_index <= $total_layer_count; $this_layer_index++)
{
    $doc = new DOMDocument();
    $doc->load("layer-{$this_layer_index}.svg");
    // Round to a whole number so the rgb() value below is a valid integer channel.
    $shade = (int) round(255 * ($total_layer_count - $this_layer_index / 1.5) / $total_layer_count);
    $subtract_path = '';
    $rm = [];
    $groups = $doc->getElementsByTagName('g');
    foreach ($groups as $this_group)
    {
        $layer_name = $this_group->getAttribute('id');
        if (preg_match('/design/i', $layer_name))
        {
            foreach ($this_group->childNodes as $node)
            {
                if ($node instanceof DOMElement) // skip whitespace text nodes
                    $subtract_path .= $node->getAttribute('d');
            }
            array_push($rm, $this_group);
        }
    }
    foreach ($groups as $this_group)
    {
        $layer_name = $this_group->getAttribute('id');
        if (preg_match('/paperoutline/i', $layer_name))
        {
            foreach ($this_group->childNodes as $node)
            {
                if ($node instanceof DOMElement) // skip whitespace text nodes
                {
                    $opath = $node->getAttribute('d');
                    $node->setAttribute('d', $opath . $subtract_path);
                    $style = $node->getAttribute('style');
                    // Swap only the fill declaration, so a stroke that happens to
                    // share the same color value is left alone.
                    $style = preg_replace('/fill:[^;]+/', "fill:rgb($shade,$shade,$shade)", $style);
                    $node->setAttribute('style', $style);
                }
            }
        }
    }
    foreach ($rm as $trm)
    {
        $trm->parentNode->removeChild($trm);
    }
    $res = $doc->saveXML();
    $out = fopen("layer-{$this_layer_index}-inverted.svg", 'w');
    fwrite($out, $res);
    fclose($out);
}

It’s not the most elegant code in the world, and it will certainly fail on SVG files that don’t match my naming and layout conventions, but it illustrates the process.

Once I’ve created these inverted SVGs, I convert them into PNGs, composite the stack into a single image, and come up with what I consider a handsome representation for my design thumbnail.

Composited, complete

Once I have the ability to take layers and change the transparent areas and the fill color, I can do all sorts of other manipulations. For example, I can create striking negative images, or feed the layers into other scripts to create “exploded” views of the shadow theaters.


Tue, 7 Dec 2021

Manipulating SVG files in PHP, Part 1.

— SjG @ 9:10 am

Now, before I even start, I have to admit this is a weird idea. I’m sure there are better languages and libraries for this stuff. But I’ll explain the rationale and then go into the details.

I’ve been creating a lot of designs for cards and other things which I cut out of paper with a Silhouette Cameo or out of other materials using the laser cutter at CrashSpace. I sell some of these designs at Etsy.com. To streamline the process of making files available in a variety of formats and posting them for sale at Etsy, I’ve written myself some PHP scripts that manipulate files, build thumbnails, and bundle stuff together and submit them for sale via Etsy’s API. Why PHP? The short answer is that it’s the language I use most on a day-to-day basis for work, and I have a lot of experience using it to manipulate images and interact with APIs. Are there languages that would be better suited? Probably, but the point of these scripts is not to be elegant. They just have to be something I can maintain, extend, and use to get the job done.

Anyway, I don’t know exactly how people will use the design files or what software they’ll use, so to make things flexible for them, I export the different layers of a file separately and in combination, e.g., a file with the outline, a file with the scoring lines, a file with the design, and a file combining all of the elements. To do this, I have a naming convention for the layers that I create in Affinity Designer, and use those names to manipulate the layers.

So, for example, start with a card with three layers: Outline, Score, and Design. I export it as an SVG file from Designer. There are a few settings that make the file easier to manipulate later, especially “flatten transforms” and “add line breaks”.

SVG export settings

SVG files are just fancy text files. Consider the following sample image:

Sample card with outline, score, and design layers

If you open up this SVG file in a text editor, you’ll see the source. It’s got a header with some file details like viewport size and defaults for rendering lines and shapes. Each layer from Designer is represented in the file as a group (the <g> tag), and each group contains paths with the actual geometry:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="3300px" height="2550px" version="1.1" xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink" xml:space="preserve"
     xmlns:serif="http://www.serif.com/"
     style="fill-rule:evenodd;clip-rule:evenodd;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:1.5;">
    <g id="Score">
        <path d="M1650.09,152.011L1650.09,2246.78" style="fill:none;stroke:black;stroke-width:0.42px;stroke-dasharray:1.25,4.17,0,0;"/>
    </g>
    <g id="Outline">
        <path d="M150,150L3150,150" style="fill:none;stroke:black;stroke-width:4.63px;"/>
        <path d="M3150,150L3150,2250" style="fill:none;stroke:black;stroke-width:4.63px;"/>
        <path d="M150,150L150,2250" style="fill:none;stroke:black;stroke-width:4.63px;"/>
        <path d="M150.124,2250.07L3150.12,2250.07" style="fill:none;stroke:black;stroke-width:4.63px;"/>
    </g>
    <g id="Design">
        <path d="M2594.13,912.644L2594.13,524.494L2087.86,524.494L2087.86,1030.77L2387.83,1030.77L2387.83,1362.85L2838.03,1362.85L2838.03,912.644L2594.13,912.644Z" style="fill:rgb(128,128,128);stroke:black;stroke-width:1.46px;"/>
    </g>
</svg>

One of the things my scripts do is create “hairlined” versions of the file. This makes sure all shapes are unfilled and have very thin border strokes. Since the SVG file is just text, I can manipulate it with the same kinds of tools I’d use to manipulate any text file. If you’re as demented as I am, this includes regular expressions, which is what I use for the hairlining:

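// Drop the trailing newlines so the implode("\n") below doesn't double them.
$svg = file('original.svg', FILE_IGNORE_NEW_LINES);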
foreach ($svg as $tlidx => $tl)
{
   if (preg_match_all('/fill:((?!none)[^";]+)/', $svg[$tlidx], $matches))
   {
      foreach ($matches[1] as $tm)
         $svg[$tlidx] = str_replace('fill:' . $tm, 'fill:none', $svg[$tlidx]);
   }
   if (preg_match_all('/stroke-width:([^";]+)/', $svg[$tlidx], $matches))
   {
      foreach ($matches[1] as $tm)
         $svg[$tlidx] = str_replace('stroke-width:' . $tm, 'stroke-width:1px', $svg[$tlidx]);
   }
}
file_put_contents('converted.svg',implode("\n",$svg));

This is crude but effective. It replaces every fill style with “none,” and converts every stroke-width to a single pixel width. Using regular expressions to edit markup is generally a fool’s errand, but this particular set seems to work pretty reliably.

One of the other things that my scripts do is take the source file with all three layers, and save versions with fewer layers. For example, one file contains only layers that will be cut rather than scored. This is where the specifics of the SVG format get interesting. It turns out that the SVG file format is based on XML, so we have a wealth of tools at our disposal to process it. In this case, we will use the power of PHP’s dreaded DOMDocument to manipulate the SVG file.

Here’s how to go through and remove the “Score” layer from the file.

$remove = [];
$dom = new DOMDocument();
$dom->load('original.svg');
$groups = $dom->getElementsByTagName('g');
foreach ($groups as $this_group)
{
    $layer_name = $this_group->getAttribute('id');
    if (!strcasecmp($layer_name, 'Score'))
    {
        array_push($remove, $this_group);
    }
}
foreach ($remove as $this_removal)
{
    $this_removal->parentNode->removeChild($this_removal);
}
$res = $dom->saveXML();
$out = fopen("scoreless.svg", 'w');
fwrite($out, $res);
fclose($out);

In the SVG file, the layer name is stored in the group’s id attribute, so the code iterates through the groups in the document and uses that attribute to identify the layer it wishes to remove.

In the next post on this subject, I’ll discuss combining DOMDocument manipulations and string manipulation to do other fancy stuff like subtracting shapes from one another.