fogbound.net




Wed, 13 Jun 2012

Building the Direct IO Library for PHP with MacPorts

— SjG @ 11:47 am

Say you’re developing on a Mac and want to test some PHP code that calls the Direct IO (dio) extension. You may not actually have a physical serial port, but your unit tests will fail in the wrong way if the extension isn’t present. You want the unit tests to fail in the right way!

If you do the expected thing, you’ll find that dio is currently beta:

root# pecl install dio
Failed to download pecl/dio within preferred state "stable", latest release is version 0.0.5, stability "beta", use "channel://pecl.php.net/dio-0.0.5" to install
install failed

So you have to try to do it the hard way:

samuel:~ root# pecl install channel://pecl.php.net/dio-0.0.5
downloading dio-0.0.5.tgz ...

Easy-peasy, eh? Not so fast!


/private/tmp/pear/temp/dio/dio.c: In function 'zif_dio_fdopen':
/private/tmp/pear/temp/dio/dio.c:138: error: 'EBADFD' undeclared (first use in this function)
/private/tmp/pear/temp/dio/dio.c:138: error: (Each undeclared identifier is reported only once
/private/tmp/pear/temp/dio/dio.c:138: error: for each function it appears in.)
make: *** [dio.lo] Error 1
ERROR: `make' failed

Fortunately, via this bug report, we can see what to do:

# pecl download channel://pecl.php.net/dio-0.0.5
# tar xzvf dio-0.0.5.tgz
# cd dio-0.0.5
# phpize
# ./configure

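Then edit dio.c and change line 138 to the following. The original line refers to EBADFD, a Linux-specific errno value that Mac OS X doesn’t define; EBADF is the portable POSIX error that fcntl() sets when it is handed an invalid file descriptor: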

if ((fcntl(fd, F_GETFL, 0) == -1) && (errno == EBADF)) {

Then, finish up:

# make
# make install

Then, create a file called “dio.ini” in /opt/local/var/db/php5/ containing:

extension=dio.so

Now you can run your tests!
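Before running the suite, it can be worth confirming that the PHP binary your tests actually use picked up the new extension (a Mac may have both Apple’s bundled PHP and the MacPorts one). Here is a minimal sketch using PHP’s built-in extension_loaded() and function_exists(); dioAvailable() is a hypothetical helper name, not part of dio:

```php
<?php
// Hypothetical sanity check: is the dio extension loaded into the PHP binary
// that runs the tests? (dioAvailable() is an illustrative name, not a dio API.)
function dioAvailable()
{
    // dio_open() is one of the functions the dio extension provides.
    return extension_loaded('dio') && function_exists('dio_open');
}

if (!dioAvailable()) {
    // Complain once, clearly, instead of letting each dio-dependent test
    // die with a "Call to undefined function" error.
    fwrite(STDERR, "dio extension not loaded; check dio.ini and the output of php --ini\n");
}
```

Putting a check like this in a test bootstrap file turns a missing extension into one clear message rather than a pile of confusing failures.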


Tue, 12 Jun 2012

Spelling Suggestions from Solr using Yii-solr

— SjG @ 7:48 pm

So a Yii app wants to query Solr using the Yii-solr extension, and wants to provide spelling suggestions for the provided search term. We need the suggestions to be unique, i.e., no repeated suggestions, and we don’t want negative suggestions, i.e., if the user’s search wanted to exclude “foo”, we don’t want to suggest “foo” as a possible correction. There’s probably a better way to do all this, but I couldn’t figure it out. Here’s what I did instead (as always, forgive the code style/formatting):

I created a subclass of ASolrConnection that switches to using the “spell” Solr servlet:


class SolrSpellingConnection extends ASolrConnection {
    public $servlet_type;
    public $servlet_path;
    public function resetClient()
    {
        parent::resetClient();
        // Point the underlying SolrClient at the "spell" handler instead of the default search servlet.
        if (isset($this->servlet_type) && isset($this->servlet_path))
        {
            $this->_client->setServlet($this->servlet_type,$this->servlet_path);
        }
    }
}

This is instantiated in the app’s config file:

'solr' => array(
    'class'=>'app.components.SolrSpellingConnection',
    'clientOptions'=>array(
        'hostname'=>'localhost',
        'port'=>8983,
    ),
    'servlet_type'=>SolrClient::SEARCH_SERVLET_TYPE,
    'servlet_path'=>'spell'
),

Then, my Solr query data-provider looks something like:


$criteria = new ASolrCriteria;
$criteria->query = $queryterm;
$criteria->set('spellcheck','true');
$pages=array('pageSize'=>15);
$dataprovider = new ASolrDataProvider(ASolrDocument::model(),array('criteria'=>$criteria,'pagination'=>$pages));
return $dataprovider;

In my controller, I add some code to process the suggestions:

public function buildSuggestionList($dataprovider, $originalQueryString)
{
    $suggestions = array();
    $terms = array(); // keyed by term, so repeated suggestions collapse
    if ($dataprovider !== null)
    {
        $resp = $dataprovider->getSolrQueryResponse()->getSolrObject();
        if (isset($resp['spellcheck']))
        {
            if (isset($resp['spellcheck']['suggestions']))
            {
                foreach ($resp['spellcheck']['suggestions'] as $thisSuggest)
                {
                    if ($thisSuggest instanceof SolrObject)
                    {
                        if (isset($thisSuggest['suggestion']) && is_array($thisSuggest['suggestion']))
                        {
                            foreach ($thisSuggest['suggestion'] as $thisTerm)
                            {
                                $terms[$thisTerm] = 1;
                            }
                        }
                    }
                }
            }
        }
    }
    foreach ($terms as $termKey => $val)
    {
        // Solr adds negated or added terms to suggestions if they don't match case,
        // so drop any term already in the query. preg_quote() keeps regex
        // metacharacters in a term from breaking the pattern.
        if (!preg_match('/\b'.preg_quote($termKey, '/').'\b/i', $originalQueryString))
        {
            // CHtml::link() doesn't encode its label, so encode it here.
            $suggestions[] = CHtml::link(CHtml::encode($termKey), '/index.php/mysearchcontroller?q='.urlencode($termKey));
        }
    }
    return $suggestions;
}
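Because the de-duplication and negation filtering don’t actually depend on Solr, they can be pulled out into a plain function and unit-tested without a running server. A minimal sketch, assuming the raw suggestion strings have already been extracted from the response; filterSuggestionTerms() is a hypothetical name, not part of Yii-solr:

```php
<?php
// Hypothetical helper: de-duplicate raw spelling suggestions and drop any that
// already appear (case-insensitively, as whole words) in the original query,
// e.g. a term the user negated with "-foo".
function filterSuggestionTerms(array $rawTerms, $originalQueryString)
{
    $unique = array();
    foreach ($rawTerms as $term) {
        $unique[(string) $term] = true; // array keys collapse duplicates
    }

    $result = array();
    foreach (array_keys($unique) as $term) {
        $term = (string) $term; // numeric-string keys come back as ints
        $pattern = '/\b' . preg_quote($term, '/') . '\b/i';
        if (!preg_match($pattern, $originalQueryString)) {
            $result[] = $term;
        }
    }
    return $result;
}

$kept = filterSuggestionTerms(array('food', 'fool', 'food', 'Foo'), 'bar -foo');
// $kept is array('food', 'fool'): the duplicate "food" collapses and "Foo" matches "-foo".
```

The controller method can then hand its collected terms to a function like this and only worry about building links.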

I can then display the main collection of results using a CWidget. For displaying suggestions, however, a bit of specific code is required:


<?php
$suggestions = $this->buildSuggestionList($dataprovider, $originalQueryString);
if (count($suggestions) > 0)
{
    echo '<p>Did you mean '.implode(' or ',$suggestions).'?</p>';
}
?>

When this is all in place, you should get decent suggestions from Solr. Of course, you will need to have Solr build a spelling index first! The easiest way to do that is simply to connect to Solr’s web interface and tell it to build the index:

http://localhost:8983/solr/spell?spellcheck.build=true


Tue, 17 Jan 2012

Sign of the Times

— SjG @ 2:58 pm

I think it’s a sad, sad sign of the times that most Linux distros not only omit figlet from their standard installations, but often don’t even offer it in their package managers.

       _           _                   _                      _ 
  __ _| | __ _ ___| |   __ _ _ __   __| | __      _____   ___| |
 / _` | |/ _` / __| |  / _` | '_ \ / _` | \ \ /\ / / _ \ / _ \ |
| (_| | | (_| \__ \_| | (_| | | | | (_| |  \ V  V / (_) |  __/_|
 \__,_|_|\__,_|___(_)  \__,_|_| |_|\__,_|   \_/\_/ \___/ \___(_)

Thu, 29 Dec 2011

Farewell to CMS Made Simple

— SjG @ 3:52 pm

I have been getting a fair number of CMS Made Simple requests lately, and it appears that people are not aware that I am no longer associated with that project. I feel it’s necessary to write this post to clarify the record, and explain why I will not be participating in any future CMS Made Simple related development, attending the Geek Moots, or writing any more books on the subject.

In short, I was losing confidence in the direction of the project. So, on November 14th, when I was asked by the current project leader to resign from the Core Development Team (as an alternative to being thrown off for my lack of participation¹), and after 7 years of Core Team involvement, creation of 20-some-odd modules, attendance at four Geek Moots, and writing one of the three books published about the project, I resigned.

The above is all that is important in this post. For my own amusement, I will present some opinions on the future prospects of CMS Made Simple, and, for the people who are asking me about alternative F/OSS CMS software, I’ll provide a few thoughts.

Outlook

At the time I left, the leadership of the CMS Made Simple Core Team had been involved in creating a non-profit organization to own, fund, and shepherd the project forward. There have been various efforts to expand corporate sponsorship and exclusive promotional relationships with hosting companies. These are all positive developments for the project, and provide the possibility to take it to “the next level.” This organization will still have a few hurdles to overcome for CMSMS to be successful. Here’s what CMSMS will need to do in order to achieve its potential²:

– Steady funding. The nonprofit organization and promotional relationships will likely solve this problem. This will enable the project to have more than one full-time person employed.

– Audience definition. For at least five years, there has been an ongoing battle over what the desired target demographic is for CMSMS. Originally, it was targeted at the small, Mom’n’Pop shop or web designer. The current thinking is that it’s for “experienced web developers.” Similarly, there has been ongoing debate as to how the multilingual community will be best served, and there have been several shifts in direction. For CMSMS to succeed, it will need to have a broad enough target to sustain interest, and a narrow enough target that the team can implement that target’s requirements.

– Overcome architectural limitations. The core architecture of CMSMS is a blend of the “old school” procedural code and weak objects that reveal its roots in PHP versions before 4.3, along with more modern PHP 5 constructs. The limitations that this blend imposes led to the desire to make a clean break, and build a Version 2 with a fully modern architecture. The CMSMS 2.0 project was eventually scrapped, however, as the leadership was unwilling to freeze development and stop adding features to the 1.x version, thereby starving 2.0 of developer resources.
The current approach is an incremental one, with each major dot-release making structural improvements (while altering the API and breaking backwards compatibility). The problem with this approach is that it requires exponentially more work on the part of module authors and the people who maintain sites. For CMSMS to support a modern feature set, however, the older code will need to be rewritten.

– Re-establish and adhere to a plan. At one time there was a road-map laying out feature sets that would be available in subsequent releases. Such a plan guides development decisions and helps ensure that the resources for a project are allocated wisely. This prevents the kind of problem-solving that addresses one immediate problem but creates subsequent problems.
A new road-map would be a positive step toward solving the documentation problem. It’s just not possible to adequately document a system that goes through changes on an ad hoc basis. With a clear plan, APIs can serve as intended, as a contract with the programmers, and can be documented in a way that’s helpful to both experienced and new users.

– Cement the community. The current leadership of CMSMS has very strong opinions about control of not only the code and the branding, but the conversation about the code and branding. People are regularly banned from the forums or IRC channel for violating rules that forbid talking about core modifications (“mods”) or for failing to follow protocol in asking questions. This comes across, fairly or not, as the Core Team having an adversarial relationship with the end users.
While it’s not visible to outsiders, the tenor of the developers’ channel on IRC was often one of open contempt for end users: there were frequent nominations for “idiot of the day” awards, for example, and remarks impugning the character of entire nationalities. This attitude is also reflected in slightly more oblique terms on public forums.
To be fair, any Open Source endeavor encounters some definite problems: end users who feel entitled to try to dictate features or demand free work, troublemakers and trolls who will try to tear down a project out of sheer perversity, and people who will complain about any decision that’s made. Projects must find ways to determine which problematic people to ignore and which to engage or eject. A more tolerant CMSMS environment would attract a larger community, which would dilute the power of difficult users (one troll in a small community is a lot more influential than the same troll in a crowd of hundreds).

Alternatives to CMS Made Simple

I’ve had a fair number of people ask me what I’m using for projects, if I’m not using CMS Made Simple code any more. While I can’t go into specifics of projects, I can comment on a few systems that I’ve explored and/or used recently.

Concrete5 is an MIT-licensed content management system with a modern core. It has all of the technical buzzwords I look for, like OO, MVC, database abstraction, data objects, user/group permissions, and jQuery. It has some nice features like a rich core API, safely overridable core files, content versioning, in-context or admin-side content editing, and more. They have a store on the project site, enabling developers to sell add-ons or themes directly through the main page, as well as smart approaches for rewarding community members and contributors of both code and documentation. The Concrete5 team appears to have thought about many of the issues that I consider problematic with CMSMS, and, in many cases, come up with what seem like good solutions.

The Yii framework is not a Content Management System — it’s a development framework. I often used CMS Made Simple as a starting-point framework for a project, since it provided me with a lot of features such as templating, database abstraction, and an authentication system — the content-management aspect was just an added bonus. For heavy development projects, I find the Yii framework to be very well thought out. It’s a modern, MVC-based architecture that has database abstraction, a performant ActiveRecord implementation, authentication and role-based access control, support for AJAXY interfaces, and includes scaffolding and testing facilities to get projects up to speed quickly. Every time I find myself thinking “It’d be great if there was a way to do X,” I discover that Yii has it built in, or has it available as an extension.

Conclusion
The CMSMS project was a significant part of my life for the better part of seven years, during which I met a lot of great people, made some lasting friendships, and learned a great deal about coding and writing. I hope that this posting is taken in the spirit in which it is written. I offer best wishes for the project and my friends there; I hope that the CMSMS team is successful and the project flourishes.

1 Actually, I was told that the leadership “would rather that you gracefully resign than our having to have you removed via a public vote.” It might have been entertaining to force a vote, but not entertaining enough for me to actually make the effort.

2 These are, of course, purely my own opinions and should not be construed as anything more. I imagine the current team has identified its own goals and milestones.


Tue, 20 Dec 2011

Kerning Pairs

— SjG @ 11:22 pm

I’ve been playing around with font creation for a couple of projects (more on that will be posted here at some point). One of the more surprising aspects of computer typography is its sheer complexity. I may once have naively thought that it was just a matter of splatting characters … er … glyphs out to some display device based on simple shapes, but I was sadly mistaken. In fact, TrueType and its successor OpenType not only use complex mathematical equations for creating the curves that define font outlines, but they also contain rules for scaling, hints for rendering these “mathematically perfect” curves on a bit-mapped display, and metrics for spacing character combinations. OpenType has its own internal language for doing such complex tasks as replacing some glyph pairs with ligatures, or doing fancy substitutions of glyphs depending on the surrounding glyphs or other rules. This allows ambitious font designers to do such things as imitate handwriting or handle non-Roman languages naturally (for example, in Semitic languages, the same letter may be written quite differently if it’s at the beginning or end of a word, and sometimes also depending on where it is in the sentence).

There’s a lifetime of complexity in typography, and, as yet, I’ve only been swimming in the shallow end. Still, I was deep enough to be playing with kerning pairs. Kerning involves moving letters so they fit together nicely. For a visual demonstration and nice game, take a look here. This does more to explain kerning than anything I could write.

The program I’m using for font creation has a facility for creating kerning pair metrics. You can type in a pair of letters, and then adjust the spacing for that particular pair. Of course, you can’t really go through and tune them all¹: consider the case where you only have upper case letters and digits from zero through nine. Neglecting accented characters, we’re talking 36 glyphs, or 1,296 ordered pairs (36 × 36; order matters, since “AV” and “VA” are kerned separately). Now throw in lower case, punctuation, etc., and you have an enormous list of possible combinations to tune.

But think about it for a moment. There are character combinations that will want tuning in just about every kind of Roman-character-based font, like “VA” or “To” or “ij”. Equally, depending on your language, there are character combinations that will almost never occur. For example, in English, you’ll almost never see a lowercase letter followed immediately by an uppercase one, or combinations like “Yq” or “Td” or “zn”.

So in the interest of selecting kerning pairs intelligently, I wrote a script to analyze character combinations. My target audience is English-speakers, so for my source data, I used English-language texts. But which English texts to use? Being an absurdist, I selected Emma by Jane Austen, At The Mountains of Madness by H. P. Lovecraft, The Adventures of Tom Sawyer by Mark Twain, An Inquiry into the Nature and Causes of the Wealth of Nations by Adam Smith, Alice, or The Mysteries, Complete by Edward Bulwer-Lytton, Tales of the Jazz Age by F. Scott Fitzgerald, Tarzan of the Apes by Edgar Rice Burroughs, An Unsocial Socialist by George Bernard Shaw, the collected writings of Thomas Jefferson, the complete works of William Shakespeare, the Project Gutenberg license text, and the Unix English word list that lives in /usr/share/dict/words.

To analyze the data, I loaded up the text and stripped out everything but letters, digits, spaces, and the following punctuation: period, single quote, double quote, exclamation mark, question mark, comma, semicolon, colon, left parenthesis, and right parenthesis². I took all of the two-character combinations, filtered out every pair in which either character was a space, and then simply counted the number of instances of each pair.
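A minimal sketch of that procedure in PHP follows. It is not the original script, which isn’t reproduced here; countPairs() is a hypothetical name, and the sketch assumes whitespace is collapsed to single spaces before pairing so that word boundaries survive the stripping step:

```php
<?php
// Sketch of the pair count described above; countPairs() is a hypothetical name.
// Assumes plain ASCII input: bytes outside the allowed set (including any UTF-8
// multibyte characters) are simply deleted.
function countPairs($text)
{
    // Normalize runs of whitespace (newlines, tabs) to a single space.
    $text = preg_replace('/\s+/', ' ', $text);
    // Keep letters, digits, spaces and the listed punctuation; strip everything else.
    $text = preg_replace('/[^A-Za-z0-9 .\'"!?,;:()]/', '', $text);

    $counts = array();
    $len = strlen($text);
    for ($i = 0; $i < $len - 1; $i++) {
        $pair = substr($text, $i, 2);
        if (strpos($pair, ' ') !== false) {
            continue; // ignore any pair that includes a space
        }
        $counts[$pair] = isset($counts[$pair]) ? $counts[$pair] + 1 : 1;
    }
    arsort($counts); // most frequent pairs first
    return $counts;
}
```

For example, countPairs("To be, or not to be.") counts “be” twice and each of the other seven pairs (such as “To”, “e,” and “e.”) once. Note that PHP turns purely numeric keys like “12” into integers, which doesn’t affect the counts.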

Of course, the statistical analysis doesn’t match the experience of reading. While combinations of an uppercase character followed by a lowercase character are relatively infrequent, they are arguably more important than combinations of lowercase characters. After all, they begin every sentence and are very visually prominent. Additionally, the shapes of capital letters make these combinations more likely to need kerning adjustments. With these thoughts in mind, I generated a second file of statistics from the same texts, based solely on combinations containing an uppercase character.
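Generating the caps-only statistics amounts to filtering the same counts. A sketch, assuming an array that maps each two-character pair to its count; pairsWithCaps() is a hypothetical name:

```php
<?php
// Hypothetical helper: given an array mapping two-character pairs to counts,
// keep only pairs that contain at least one uppercase ASCII letter.
function pairsWithCaps(array $counts)
{
    $caps = array();
    foreach ($counts as $pair => $n) {
        // Purely numeric pairs such as "12" arrive as integer keys; cast back.
        if (preg_match('/[A-Z]/', (string) $pair)) {
            $caps[$pair] = $n;
        }
    }
    arsort($caps); // most frequent first
    return $caps;
}
```

For instance, pairsWithCaps(array('th' => 90, 'To' => 5)) keeps only “To”, since “th” has no capital letter.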

You can download the lists for your own nefarious purposes. Here’s the complete list, and here’s the list containing caps. The complete list contains what appears to be bad data. Keep in mind that the texts contained such things as Roman numeral chapter headers, older-style numeric abbreviations (e.g., “3dly” and “23d”), some currency abbreviations (e.g., “1s.6d” or “1/6d”, both of which stand for 1 shilling and sixpence), and poetic contractions (e.g., “oer,” “stol’n,” or “capdv’d”). I also see what I suspect are errors due to imperfect OCR of the original texts.

Last, but not least, I have two files which are my collection of The 128 Vitally Important Kerning Pairs and The 255 Important Kerning Pairs With One Repeat which comprise the most common combinations from the other two files as a single text for examination when testing a font.

1 Ideally, the way you define the spacing of the glyphs themselves saves you from having to tune all combinations. Most should start out looking pretty good. But you do, of course, want your font to lay out perfectly, hence the rest of this discussion.

2 This was admittedly an arbitrary choice of allowable punctuation. I also excluded accented characters like ü and à, which would obviously need to be taken into consideration for many European languages. Since my focus was on English, I deemed them rare enough to ignore.