fogbound.net





Fri, 14 Jul 2017

Surveillance, Big Data, and Big Stupidity

— SjG @ 4:21 pm

(This post was started in March of ’16, revised later.)

Recently, a friend I’ll call Cassie was on a trip abroad to a country I’ll call Absurdia. She went to access her Google mail account, and was promptly locked out by the clever security system. It had determined that someone was accessing the account from overseas. Presumably, she was asked one or more security questions that she couldn’t answer (“When did you first create this account?”) along with one or another of her own security questions. OK, bad on her, you might say, for not remembering the answers to her security questions, and hooray for adaptive security that protected her account from unauthorized access!

But let’s examine that for a moment. Adaptive security recognized that the access was from a new place: not merely a different computer or IP address, but a different country. Great, makes a lot of sense. But if we step back to the weeks before her departure, Cassie was being served ads for hotels around Absurdia. She was being served ads for taxi companies in Absurdia, airline bargains for nonstop flights to Absurdia, and online language courses in Absurdese. You see, Google processes Gmail messages, and extracts keywords and knowledge in order to serve ads that the user will find interesting [1]. When Cassie emailed people about her upcoming trip to Absurdia, Google’s algorithms understood enough to start serving travel-related ads for the place. Google “knew” that Cassie was going to Absurdia. But this knowledge was not propagated beyond the ad-serving system.

Back in the 80s, my sister did a semester abroad in Rostock, in what was then the German Democratic Republic — East Germany. There was a very limited exchange program between Brown University and the GDR, and she was one of a handful of American students who took advantage of it. We have some family history in Rostock. A great-aunt had lived there, and my sister wanted to do some research on what had become of her. This great-aunt had been elderly by the time of the Second World War, and my sister wanted to know if she had died of natural causes (sadly, it turns out that she had not).

Now, the reason I’m telling this seemingly unrelated story involves something that happened years later. After the reunification of Germany, and as part of the national reconciliation process, people could request their Stasi files. That’s the collection of data that had been accumulated by the Staatssicherheitsdienst — the secret police — gathered via informants, phone taps, reading mail, and so forth. Naturally, during the tense Cold-War Reagan years, the East German security apparatus assumed that any American who would study there was a CIA agent, so my sister’s file was extensive.

Her file was also slightly ridiculous: pages and pages of hand-written notes, filled with scuttlebutt and rumor. What was particularly enlightening was just how far off base the operatives had been. They missed critical details, and misinterpreted others. My sister’s attempts to track down our great-aunt became, in their notes, a frustrated attempt to make contact with a hitherto unknown agent. With all the data they gathered, with all the information they accumulated, there was no actual gain in knowledge. In fact, there could have been even greater costs: the incorrect assumptions and misunderstanding could have resulted in the agency siphoning off resources to pursue this phantom.

Now, you might suggest that I’m the one who is missing the point here. Perhaps, you could argue, this is the nature of bureaucracy. The agents monitoring my sister were obligated to report to their superiors, so they grasped at whatever straws were available, and willfully ignored clues that would get in the way of a narrative that would please the authorities.

But in a way, that is the point. Surveillance generally finds what it’s seeking and only utilizes it for the purpose at hand.

In this day when Big Data is a tech industry buzzword, we continually see articles on “business intelligence” and adaptive systems. More data gathering will solve all kinds of business problems. We read that credit card companies can predict divorce, that Target stores predict pregnancies, and so on [2].

And there are other successes. In the last year, there was a fascinating article on how a programmer helped discover cheating in the crossword puzzle world. “I guess that’s the nature of any data set. You might find things you’d rather not see,” said one of the people who contributed data to the collection that ended up confirming the plagiarism.

But Amazon still serves me ads for, say, umbrellas for weeks after I actually buy an umbrella from them. Maybe they set the flag when I look at the products, but don’t unset it when I buy one. I do work on the MarriageToGo.com site from a new computer, and suddenly I’m being served wedding ads. Ads are scattershot, and the only penalty for throwing stuff at the wall to see what sticks is that the lower-value ads could crowd out the higher-value ads.

This kind of bad data processing is annoying, but not harmful. The same is not true with crime-prediction, voter targeting, insurance assessment, and other tasks upon which “deep learning” is being brought to bear. If the AI is built with bad assumptions, it can have serious effects on people. Training AI with “real world data” that’s been filtered by the status quo is equally dangerous. I think it’s obvious what happens. You can become un-insurable, denied loans, put on a no-fly list, and worse. “I do assure you, Mrs. Buttle, the Ministry is very scrupulous about following up and eradicating any error.” [3]

Whenever I fuck up something spectacularly in a complicated piece of code, I think of the Donald Fagen lyric:

A just machine to make big decisions
Programmed by fellows with compassion and vision

Unfortunately, as we see time and again, both of those attributes are often lacking. Stressed or overworked programmers, get-rich-quick VC and startup culture, bad assumptions, and a lack of examining the biases built into data sets all contribute to the failure of our machines to live up to that ideal.

[1] Google issued a blog post at the end of June 2017 saying this practice would stop.

[2] Interestingly, in the update to that article, Visa indignantly claims they do not track marital status, nor offer a service to predict divorces. Maybe the protest is carefully worded to hide their capabilities, or maybe it’s straightforward and honest. The fact remains that credit card companies know an enormous amount about their customers.

[3] As Terry Gilliam, Tom Stoppard, and Charles McKeown captured so deftly in Brazil.


Tue, 9 May 2017

This too shall pass

— SjG @ 9:49 pm

Back in October of 2015, I started writing the following, and never finished or published it:

I upgraded the Mac to Yosemite a year or so ago. Yesterday, I wanted to do some development on a project that I’d been idly thinking about. Unfortunately, it required a dependency in a package I’d installed via Mac Ports. I tried to upgrade it, but got an error that I was compiling for the wrong Darwin version. This means I haven’t actually updated any of my Ports since upgrading to Yosemite! For shame.

Rather than fix Mac Ports for Yosemite, and then again when I upgrade to El Capitan, I decided it was time to do that upgrade and then fix it. I also thought … hey, there’re all these neat new container technologies and configuration tools. Maybe I should look into some of those, and save myself the agony next time around.

So I dove into some articles, and pretty soon had become a seething mass of quivering rage.

To set up my environment in Docker, I need Docker, and a VM. I could set it up using Vagrant, or, as some people recommend, Vagrant running Chef or Solo. Then, of course, I need to set up some replacement for vboxsf so I can access my files in the virtual environment. Each of these requires its own configuration, of course.

Today, I was struggling with something similar. I’m building an iOS app. Years ago, I’d built a few native iOS apps, but I’ve forgotten everything I ever knew about Objective C, and I don’t know Swift. Plus, I need to publish for Android too. So, six months ago, when I started this process, I decided I’d be using Ionic Framework. It had the advantage that it was based on AngularJS, and I’ve done some work in Angular.

Now that I’m starting, I discover that Ionic 2 is the way to go (oh wait, no, Ionic 3 was just released!). And my AngularJS experience is ancient v1.2.x, knowledge which is largely obsolete. I’d be learning Angular 2 (no, we’re up to Angular 4 now), so better get cracking on that.

I remember, many, many years ago, how excited I was when there was a new version of Windows. I couldn’t wait to get all those 3.5″ floppies home so I could upgrade my machine to the latest and greatest. Now, I dread each year when a new version of Mac OS comes out, and I need to upgrade and track down all the things that broke, and rebuild my ports and and and… Not to mention when I installed a recent Linux on a VM to host some sites, and discovered to my chagrin that systemd has replaced all manner of things Unixy that I’ve been doing mostly-the-same for thirty years.

Well shit. There it is. I’ve become the grumpy old software guy. “Why are they changing things? Why can’t they just leave them alone?” The fact is, some of these changes are indisputably improvements. But so many of them seem to be changes for the sake of change. We have to have “new, improved!” all the time, even if it’s just changing the syntax (why, oh why, is *ngFor so much better than ng-repeat !?).

Part of this is struggling with obsolescence in general. It’s hard being middle-aged in tech. You can’t help seeing that look in the eyes of the youngsters: that old guy is so backwards. But it goes beyond that. My neighborhood is changing around me. Younger families are moving in, and suddenly I’m that guy who’s been in the neighborhood for a long time. I find myself navigating by past landmarks — it’s across from the Burger King, er, those condos, right by the Foster’s Freeze, er, Dunkin’ Donuts. The world is changing around me rapidly. The political world I grew up in has shifted. Every year, I see more obituaries for people I know, or whose names I know. Things that were true when I was a child are no longer true.

I have vague memories of hearing these thoughts expressed when I was younger by people I thought were old. I didn’t understand them then. I’m beginning to understand them now.

I just have to remind myself that change is constant, and not all bad. When I was a kid, there were no known exoplanets. When I was in my twenties, I’d come home from a night out, and I’d be stinking of second-hand cigarette smoke. We had to struggle with card catalogs to find books in the library. When trying to reach my friends, I’d have to leave messages on their home answering machines, and I’d have to call from a pay phone where I’d enter in a multidigit phone card number. If you were interested in obscure music or books, you’d have to read tiny ads in the back of magazines to track down sources or information. LGBTQ people were all but invisible, and same-sex marriage was barely even in the realm of speculative fiction.

So, that being said, I’d like change to slow down a bit. Could I please just finish a project before all the constituent languages, libraries, and frameworks have a major version increment?


Mon, 16 Jan 2017

Slow Reality TV

— SjG @ 10:45 am

In the garden, we have a variety of highly aggressive, imperialistic vines that compete for space.

In one hotly-contested piece of real estate, we have creeping fig (Ficus pumila), pink jasmine (Jasminum polyanthum), asparagus fern (I think we have both Asparagus setaceus and Asparagus aethiopicus), fo ti (Fallopia multiflora), and red trumpet vine (Campsis radicans). The soil there is just a solid mat of roots and runners.

I propose a very very slow reality TV series. We get a big pentagon of fertile soil, light it evenly from all sides, water it gently on a regular basis, and plant one of these vines in each corner. After a few years, we’ll see which is the victorious species!


Thu, 22 Sep 2016

Checking Solr index with nagios: obsolete versions

— SjG @ 12:33 pm

I needed to check that the index process that populates the Solr index succeeded and didn’t die during the night, leaving an empty index.

To make things more complicated, the versions of Solr and nagios in use are probably not the latest.

The check_solr -o numdocs command doesn’t work with our Solr configuration. But the internet tells me that the Solr query http://localhost:8983/solr/select/?debug=query&q=*:* includes the size of the result set. Testing it, I found this to be true:

<response>
   <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
      <lst name="params">
         <str name="q">*:*</str>
         <str name="debug">query</str>
      </lst>
   </lst>
   <result name="response" numFound="9832" start="0">
      <doc>
...

I want to use nagios to check that numFound is never zero (or too small). I thought I’d just be able to use a nagios regex:

check_http -H localhost -p 8983 -u "/solr/select/?debug=query&q=*:*" -lr 'numFound=\"\d{2+}"'

It didn’t work. To make a long story short, there’s regex and then there’s regex. The kind that works for nagios is:

check_http -H localhost -p 8983 -u "/solr/select/?debug=query&q=*:*" -lr 'numFound=\"[1-9][0-9][0-9]'

This guarantees at least a hundred docs are in the index.
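
For what it's worth, the first pattern probably failed for two separate reasons: `\d` is a Perl-style shorthand that POSIX-flavored regex engines (which, as far as I can tell, is what check_http uses) generally don't support, and `{2+}` isn't a valid quantifier in any common flavor (the open-ended form is `{2,}`). Below is a minimal JavaScript sketch, separate from the nagios setup, that tries the working character-class pattern against a few sample responses and shows a numeric threshold check as an alternative. The sample strings and the `hasEnoughDocs` helper are made up for illustration.

```javascript
// Sketch only: the sample data and helper name are hypothetical.

// The character-class pattern from the working check_http command.
const atLeastHundred = /numFound="[1-9][0-9][0-9]/;

// Alternative: pull the number out and compare it to a threshold.
function hasEnoughDocs(xml, minDocs) {
  const match = /numFound="([0-9]+)"/.exec(xml);
  if (match === null) {
    return false; // no numFound attribute at all: treat as a failure
  }
  return parseInt(match[1], 10) >= minDocs;
}

const healthy = '<result name="response" numFound="9832" start="0">';
const empty = '<result name="response" numFound="0" start="0">';
const small = '<result name="response" numFound="42" start="0">';

console.log(atLeastHundred.test(healthy)); // true
console.log(atLeastHundred.test(empty));   // false
console.log(atLeastHundred.test(small));   // false
console.log(hasEnoughDocs(small, 10));     // true
```

The pattern approach only works for thresholds that happen to be powers of ten; extracting the number is more flexible, but would need a custom check script rather than plain check_http.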


Tue, 7 Jun 2016

JavaScript compares things weirdly

— SjG @ 2:52 pm

We’ve already established that PHP compares things weirdly.

It shouldn’t surprise us that JavaScript does too.

Consider the following:

> var k=['hello'];
undefined
> (k=='hello'?'Equals':'Nope');
Equals

Now, purists will point out that that’s an “equals” operator not an “identity” operator, but I mean seriously? We’re just going to pretend that


> ['hello']=='hello'
true
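
The mechanism behind this: when `==` compares an object with a string, the object is first converted to a primitive, and for an array that conversion ends up as `join(',')`. A one-element array therefore turns into its only element. Here is a small sketch of that behavior, with a hypothetical `isExactlyString` helper showing one way to sidestep the conversion.

```javascript
// Sketch: why the loose comparison succeeds.
const k = ['hello'];

// == converts the array to a primitive first; Array's toString() is join(',').
console.log(String(k));            // 'hello'
console.log(k == 'hello');         // true
console.log(k === 'hello');        // false: different types, no conversion
console.log(['a', 'b'] == 'a,b');  // true, for the same reason

// Hypothetical helper: compare without any implicit conversion.
function isExactlyString(value, expected) {
  return typeof value === 'string' && value === expected;
}

console.log(isExactlyString(k, 'hello'));       // false
console.log(isExactlyString('hello', 'hello')); // true
```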

I think I’ll just go and rewrite all my client side code in C now.

