Saturday, June 8, 2013

Beware of the gcc 'ext' directory?

Profiling, I found myself confronted with a C++ program that was dying the death of one thousand calls to malloc and free, courtesy of STL. I set out to see if I could use the STL allocator model to improve the situation.

Off I went to the GCC documentation: http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt04ch11.html. Here I found a pool allocator, but it is documented to have miserable multithreaded performance. In any case, for one of the cases I had in mind, the even cheaper array_allocator seemed to be just the ticket. I could state a reasonable upper bound for the space I needed, and this would just about eliminate overhead.

The documentation here is, to say the least, minimal. I was bemused to discover that googling didn't yield much else -- just a bug report from my old friend Scott McKay. I didn't need to do what he did to hit the bug, so I proceeded, still puzzled by the lack of discussion.

Problem 1: the array allocator doesn't work with basic_string. The basic_string code assumes that the allocator can be respecialized to return a 'char *'.  The array allocator won't do that; it results in a compilation error.

As it happened, the string in question was only used inside the private methods of a class, and the class didn't need many string functions at all. So I switched it to being a vector.

That led me to problem 2: the very first attempt to insert into the vector threw a bad_alloc, due to some other bug. At this point, I threw up my hands, and changed over to the simple use of arrays.

So, there seem to be two morals of the story. If no one is posting anything about 'X' on the internets, it perhaps implies that no one is using 'X'. And if no one is using 'X', there is probably a good reason.



Saturday, November 12, 2011

How to avoid (DY)LD_LIBRARY_PATH with JNI

Some of us are stuck with JNI. We've got a heap of code in C++. Sometimes, we have a heap of code with enough floating point computation in it that the speed advantage of native code is inescapable.

In the simple case, JNI isn't too bad. You make the header, you build the code, and you have a shared library. You can use java.library.path and System.loadLibrary, or you can use System.load and avoid any extra settings when launching java.
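As a minimal sketch of the System.load route, the helper below builds an absolute path from a directory and a short library name, so that java.library.path never comes into play. The class name NativeLoader, the directory 'native', and the library name 'example' are made up for illustration; the only real JDK calls here are System.mapLibraryName and System.load.

```java
import java.io.File;

public class NativeLoader {
    /**
     * Returns the absolute path at which the platform-specific file for
     * libName would live inside dir. Nothing is loaded here.
     */
    public static String resolve(File dir, String libName) {
        // mapLibraryName adds the platform's prefix and suffix,
        // e.g. "example" becomes "libexample.so" on Linux.
        return new File(dir, System.mapLibraryName(libName)).getAbsolutePath();
    }

    /** Loads the library by absolute path; java.library.path is not consulted. */
    public static void load(File dir, String libName) {
        System.load(resolve(dir, libName));
    }

    public static void main(String[] args) {
        // Hypothetical directory; pass a real one as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "native");
        System.out.println("Would load: " + resolve(dir, "example"));
    }
}
```

Note that this only takes care of finding the JNI library itself. It does nothing for the libraries that the JNI library depends on.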

Things are not so nice, however, if your C++ code has dependencies on other shared libraries. Listing the directories containing those libraries in java.library.path is not enough. You'll still get exceptions claiming that your JNI library cannot be loaded. To get rid of those exceptions, you have to modify PATH, LD_LIBRARY_PATH, or DYLD_LIBRARY_PATH (on Windows, Linux, or OSX respectively). This leads to a world of hurt, particularly when people want to run your code inside a container such as Tomcat.

For Windows, there's a solution involving a Win32 hairball called 'delayed loading'. That's not what this posting will help you with. Perhaps I'll do a writeup some day. At Basis, we worked that out years ago.

Until now, however, we've suffered with LD_LIBRARY_PATH and DYLD_LIBRARY_PATH.

Well, we're not going to suffer any longer. The solution to these problems leaked, finally, into my consciousness, and I've built a testbed to show it off. Here it is on github:

https://github.com/bimargulies/jni-origin-testbed

The code in here shows off the existence of linker options and tools that avoid the need to set those environment variables. On Linux, the critical feature of 'ld' is '-rpath $ORIGIN'. Watch out; it takes some care to actually get the characters '$' 'O' 'R' ... into the ELF file.

On MacOSX, the situation is more complex. MacOSX has this idea that every shared library has a single, proper, installation location, called the 'install path.' Things linked to shared libraries pull that path from the Mach object file, and store it for use at runtime. If you are willing to structure your code as a Framework that follows Apple's conventions for a fixed installation, this all works great.

If not, then there turns out to be a solution. A command, 'install_name_tool', allows you to patch the location where one library (your JNI library) looks for another (its dependencies). The special token '@loader_path' expands to the location of the library itself. Thus, you can express the location of the dependencies by relative path. So long as the JNI libraries live in a fixed location relative to their dependencies, all is well.

Thursday, July 7, 2011

A maven seder

The other day, Stephen Connolly of the Maven PMC was moved to remark, 'Meh! there's a lot of maven haters out there...' This led me to the following musings, which might be thought of as 'A Maven Seder, or, the Four Children of Maven.'


Maven may be one tool, but developers encounter it under very different circumstances. Here's a way to look at those circumstances under four headings, and perhaps reveal something about why developers end up with such wildly different attitudes.


The Developer Who Does Not Know How To Ask


Some developers find Maven by adding themselves to a project that already uses it in some sane and stable fashion. To them, it just works. They type 'mvn'. Their tests run, their jar files appear. Maybe they even follow a recipe to push a release. Likely, if they have this experience, the project they are joining has a Maven-friendly shape: sources in the standard layout, lots of dependencies, no need for complex and exotic scripting. These developers may not fall in love, but they may wonder what all the fuss is about.


The Simple Developer


The simple developer actually sets up a Maven build, but has no particular problem in doing so. His or her inputs are some java sources and some 'in classpath' files. The dependencies she or he needs are all sitting out there on Maven Central. The output of the process is a jar file, or, at most, a relatively straightforward release package. He or she copies a simple pom.xml file, makes a few tweaks, and all is well. A copy of Jenkins produces instant automated builds, and a copy of Nexus or Artifactory speeds up the process. No giant stress, no giant strain. So long, of course, as nothing goes wrong. The moment this person has to graduate from 'mvn dependency:tree' to 'mvn -X', they are at risk of becoming ...


The Rebellious Developer


These are the folks who fill blogs and mailing lists with warnings to give Maven a wide berth. How do they get this way? Here are some of the ways:


  • They are given the job of taking a complex build with some other build system and adapting it to Maven. This is a hard job at best. It's made harder and more annoying when the Maven-mavens insist on replying to all complaints and questions with variations on, "You should just restructure your entire system to fit in with the Maven way of doing things," instead of "Well, it can be a hard job to adapt a complex, highly-scripted build to Maven. Are you sure you have to? Maybe you just want to learn to publish some artifacts."
  • They have a build that has some incompressible complexity in it. They need to create multiple slight variations for different targets, or complex JNI, or interactions with large datasets. 
  • They run into a problem. Maven works great when it works. When it doesn't work, often the diagnostic process leaves a great deal to be desired. Things seem to happen 'by magic.' If the ordinary log messages aren't informative, the alternative is -X, which spews a gigantic amount of content. At best, this is a needle-in-haystack situation.

The Wise Developer

At the other extreme, we have the developer who has become completely assimilated. He or she has internalized the lifecycle, and so knows exactly what to expect. Little is mysterious or surprising. Chances are, she or he has written a plugin or three, having figured out that writing plugins is often much easier to do (and debug) than convincing the more obscure options on the more obscure plugins to cooperate with each other.

This person is, of course, in danger of turning into one of the voices that ticks off the Rebellious category. Isn't that what family dynamics are like?

Lessons for the Maven Community

In my view, there are some simple lessons here. Maven evangelists should be mindful of what is realistic. Better documentation, error messages, and log messages never hurt.

It's easy to write these sentiments, of course. Acting on them is another story. The Maven ecosystem is now a gigantic stack of code, and the committers for it sometimes seem like short-order cooks in a very busy diner.









Sunday, November 15, 2009

How to get a reputation on StackOverflow

A little while ago, I noticed that people were posting questions about Apache CXF on stackoverflow.com. It's easy enough to answer questions, but I wanted more. I wanted to be able to herd the sheep: clean up tags, edit confused questions, and generally improve the quality of the resulting knowledge base. This would require a significant store of reputation, in StackOverflow parlance, so I set out to accumulate it.

It doesn't take much observation of the site to learn that reputation comes from answering questions. Sure, if you ask really good questions, people will vote them up and your reputation will advance. However, most questions don't get upvoted, and who has time to sit around thinking of questions?

But it's not enough to answer questions. Adding the 3rd or 4th answer to a question is not going to get you votes, and votes are what you need. You have to get in there and be one of the first two to post a concise, helpful, answer.

Visiting the site, I found that it was not so easy to take a timely snipe at questions I could answer. The good candidates were buried in the giant mass of questions on subjects where I knew nothing, cared less, or both.

The first solution leapt out at me: set up some 'interesting tags'. Then click on the tab that shows only unanswered questions in those tags. Seemed simple.

It didn't work. That tab is nearly entirely populated by questions that have three or four answers. They are still there because no one voted for, or accepted, any of them. In theory, that means that these are mediocre answers, and I could add a superior answer and get some votes.

No such luck. Usually, these are perfectly sensible answers. It seems as if there are not enough site users who bother to vote, and nowhere near enough question-posters who bother to accept answers. The result is usually an impenetrable clutter.

That led me to my second strategy. I started collecting a very large set of 'ignored tags'. To filter out what I didn't want to see, I had to add many, many, tags. Why? StackOverflow uses a flat taxonomy. To avoid seeing Visual Studio questions, you need to exclude about 10 different tags for 10 different versions or aspects of Visual Studio. Repeat this for all the other subjects of disinterest, and soon enough you, like I, will have a long list of ignored tags.

The reward for this labor is that the 'recent' tab under the unanswered questions is suddenly useful. Now, the most recent questions of possible interest appear at the top of the page, just waiting for a quick answer.

Sure enough, this allowed me to pile up over 500 points of reputation in a few days of visiting the site at spare moments.

My points are nothing like evenly distributed over my answers. I posted plenty of answers that collected no votes, and thus no reputation. Many got a vote or two. The big winners were two very simple answers to simple questions: a brief lesson on logarithms and a quick reminder of an option to the Linux mkdir command.

Go figure!

Anyhow, I've now got the privilege of fixing bad tags, and if this keeps up, I can look forward to permission to edit other people's questions.

One final hint: stay away from meta.stackoverflow.com, unless you are suffering from insomnia or have an urge to count the angels dancing on Zippy.

Thursday, January 29, 2009

Mysteries of Maven

Maven is a very interesting beast. It seems well-established as the successor to ant as the best practice solution for java builds. It has the cardinal virtue of doing simple things simply. Very large ant build files can collapse to very small maven pom.xml files.

It does simple things simply, and the complexity curve is not far from linear as you try more complex things. However, there's a knee in that curve, and I spent this morning with it firmly planted in a sensitive part of my anatomy.

Consider the maven-antrun-plugin. It's typical of maven that it handles the special capabilities of its competitors by absorption rather than religious argumentation. The antrun plugin allows you to embed ant build tasks in your maven build. If you don't feel like figuring out how to re-express them in purely maven terms, you can just keep them as-is.

Now, I happened to have a multi-directory structure to move to maven, and there happened to be two places in the structure where antrun seemed the better part of valor. The first one worked fine. The second one did not. After much hair-pulling, I was able to demonstrate to myself that the dependencies that I had listed in the second case were ignored.

"Dependencies?" you ask? Maven plugins, like maven-everything-else, come with a list of dependencies that are added to their classpath. If you want to use ant facilities that aren't in the core of ant, or custom tasks, you have to package them up as maven artifacts and list them as dependencies.

Skipping to the end, there's a bug in core maven that causes all but the first set of dependencies on a plugin to be ignored when multiple projects are evaluated together. Fair enough. Things have bugs. What seemed interesting to me was how wildly opaque this bug turned out to be.

The maven-dependency-plugin didn't reveal much, since it doesn't seem interested in plugin dependencies, only module dependencies. The debug log (-X) didn't add anything to the situation. Like many debug logs, it is a compendium of things that seemed interesting to a developer chasing a particular issue at a particular time. I found myself wondering, "what would fail-soft consist of in this case? Could there be some other medium-level debug output more adapted to people debugging their POMs than to people debugging the inside of maven itself?"

Debugging is an understudied problem.

And there turn out to be two open bug reports on this against the plugin, even though it isn't the plugin's fault.

Saturday, January 17, 2009

Another Voyage Through Eclipse

I spend a significant fraction of my working time staring at Eclipse. I fear that I'm a lifer. Even though rumors reach my ears that Netbeans has become usable, I just can't motivate myself to consider an alternative.

Some years back, I did some development in Eclipse. I was able to convince a client of ours that the best way to set up a linguistic workbench was Eclipse. We built a set of plugins. We submitted a raft of bug reports. Even a few patches. It worked. Sadly, the client decided that they preferred .NET, so off they went. Given the problems we had getting Arabic and Hebrew text to work right in Eclipse, and my recent discoveries, I'm not entirely sure that they were crazy.

All that is background. In the last week, I set out to build a simple tool for linguistic annotation. That is, display some text, and give the user very convenient, keyboard-based, tools to mark words.

I found http://www.vogella.de/articles/RichClientPlatform/article.html, which looked to be a good solution to the fact that the books can never keep up with the ongoing redesign.

And then the fun began.

The fundamental thing I wanted in my program was an editor. Historically, Eclipse editors have tended to be more entangled with the full IDE than other kinds of components. This problem turned out not to be solved in 3.4.

The tutorial explains how to use the most recent Eclipse mechanisms for setting up the File menu and all of that. Unfortunately, the tutorial does not mention that as soon as you add an editor, the simple prescription fails.

To add an editor to an RCP application, you add an extension under org.eclipse.ui.editors. And, the next thing you know, you have a file menu, whether you want one or not. Before you can add the File Open that you want, you have to get rid of the old one. All this is complicated by the transition from 'actions' to 'commands'. Under the 'new administration', things are supposed to be organized around commands, but there remains, not surprisingly, a mountain of code, examples, and documentation from the old universe.

Thankfully, a bit-o-googling revealed a recipe for this.

Next? Make Arabic look right.

When I last worked in this environment, the top of my lap was a Windows box. (I've since spent some time chained to a copy of Ubuntu and moved on to a Macbook Pro.) There were some problems with Arabic; the SWT people hadn't thought all that much about an application running with an overall English locale that needed to do a clean job of displaying some RTL text. However, we got reasonable results on Windows.

Mac OS X? Uh, oh. SWT, it seems, has some fundamental assumptions about how systems will support RTL languages. Even though RTL languages (in general) work fine on Mac OS, there is an impedance mismatch with SWT. Net result: there is no simple way to set up a StyledText component to display Arabic or Hebrew correctly. Luckily for me, I don't absolutely need this tool to look beautiful on Mac OS.

Friday, December 12, 2008

Maven, Eclipse, Checkstyle, PMD, Oy, Vey.

Code quality is a wonderful thing. You can run PMD and Checkstyle to enforce consistency and check for problems.

If you're like me, however, you really hate to edit for two hours and then get handed a long list of complaints. Or, worse, to ask your IDE to neaten up the code and find out that it's been 'neatened' to the tune of 100 Checkstyle complaints.

I use Eclipse. Eclipse has plugins for Checkstyle and PMD, and a highly configurable set of tools for formatting and code cleanup. Configurable is nice, but it's a tiresome process to establish the right configuration for all my projects and workspaces.

I wish that there was some automatic way to decorate an Eclipse workspace with all the right settings to match up with how some project uses Checkstyle and PMD.

Luckily for me, Dan Kulp of the Apache CXF Project did a vast job of reverse engineering to figure out how all the Eclipse configuration works. The Maven POMs for CXF arrange for the same Checkstyle and PMD configurations to be used for Maven and Eclipse, and set up the Eclipse formatting and cleanup options to match.

I did a bit of reverse engineering of Dan's reverse engineering here.

Aside from the goodness of this when working on CXF, I was able in an hour or so of work to adapt it to my day job.
