May, 2004


4
May 04

Mon, 03 May 2004

I really sympathize with the RH guys: they are trying to create Debian, from scratch, with their Fedora stuff. That’s a noble task, an important task for the company, and an incredibly !@##ing hard task. It seems like this is the best summary I’ve seen of the progress so far, though.


3
May 04

Mon, 03 May 2004

  • Got stymied over the weekend by !@@!# perl, but did figure out the basics for a very fast stack trace duplicate finder. Not perfect, but should mostly be isolated enough to not matter.
  • Lessig’s new book rocks so far. Factoid: I knew Disney lifted all the Brothers Grimm tales, but I did not know that even the first Mickey Mouse talkie (Steamboat Willie) was a parody of Buster Keaton’s Steamboat Bill, Jr., which had come out only months before, in the same year (1928). Live and learn.
  • “We’re losing the ‘war’ on terrorism b/c it’s a horrible way to define the scope of a conflict. Common nouns like ‘terrorism’ (or drugs or crime or poverty, for that matter) can’t surrender and promise never to attack again the way that proper nouns like Germany or Japan can.” –Paul Sholtz.
    Brilliant summation of a whole lot of failures.

1
May 04

Sat, 01 May 2004

The first step in my plan to take over the world is complete. I whipped up a little script that (after much pain) dumped the first five function names (more or less) from all crashes in bugzilla (more or less) into a new table, where I can glean experience and data from them.

The idea long term is that you’d be able to submit a bug and, instead of the current time-consuming manual stack matching after submission or a very slow match on the bodies of the 400K comments, a very quick query on this new table at submit time could say ‘actually, we think this is a duplicate of bug XXXXX- look familiar?’ Less spam for everyone, less work for bugsquad, more accurate data on what things are reported most often.
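To make the idea concrete, here is a minimal sketch in Python. It is not the actual script: the GDB-style frame format, the `crash_signatures` table, and every function name below are assumptions made up for illustration, and SQLite stands in for whatever database bugzilla really runs on. It pulls the first five function names out of a stack trace, stores them as one signature per bug, and looks up exact matches at submit time.

```python
import re
import sqlite3

# Assumed frame format (GDB-style), e.g.
#   "#3  0x0805a1b2 in nautilus_main (argc=1) at main.c:42"
#   "#0  g_free (mem=0x0) at gmem.c:10"
# Frames with no symbol ("?? ()") or "<signal handler called>" do not match
# and are skipped. C++ names containing "::" are not handled by this sketch.
FRAME_RE = re.compile(
    r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(",
    re.MULTILINE,
)


def top_functions(trace, depth=5):
    """Return the first `depth` function names found in a stack trace."""
    return FRAME_RE.findall(trace)[:depth]


def create_table(conn):
    """Create the (hypothetical) separate table holding one signature per bug."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crash_signatures ("
        "bug_id INTEGER PRIMARY KEY, signature TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_signature "
        "ON crash_signatures (signature)"
    )


def record_crash(conn, bug_id, trace):
    """Store the signature for a bug; bugs with no usable frames are skipped."""
    funcs = top_functions(trace)
    if funcs:
        conn.execute(
            "INSERT OR REPLACE INTO crash_signatures VALUES (?, ?)",
            (bug_id, "\n".join(funcs)),
        )


def likely_duplicates(conn, trace):
    """Return ids of existing bugs whose signature matches this trace exactly."""
    funcs = top_functions(trace)
    if not funcs:
        return []
    rows = conn.execute(
        "SELECT bug_id FROM crash_signatures WHERE signature = ? ORDER BY bug_id",
        ("\n".join(funcs),),
    )
    return [row[0] for row in rows]
```

Because the lookup is a single indexed equality query, it avoids scanning comment bodies entirely. The price is that exact matching misses traces that differ in even one of the five frames.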

Short-term there are three things I should do: first is to resurrect and clean up the old simple-dup-finder, which ran on the same principle as this experiment, so worst case the new one is exactly as accurate as the old one, and a hell of a lot faster. Second is to figure out exactly how accurate it is- I think I should be able to do some queries that will be pretty revealing of how this auto-matching compares with human matching. Just have to figure out exactly how to do the comparison to get meaningful data. Third is to think about how to make this permanent. I think since it is all in a separate table, the upgrade risk is low- even if the table gets nuked in an upgrade, we could just re-run the script that created the table in the first place. More complex are the UI implications and how they are handled. At least at first these scripts will all live in a separate directory and not interact with the rest of the UI, so for the time being it isn’t a big issue. We’ll see, of course.
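One possible way to do the accuracy comparison, sketched in Python under loud assumptions: both inputs are plain dictionaries with made-up shapes (a map from bug id to its signature, and a map from each human-marked duplicate to the bug it was marked a duplicate of), and `agreement` is a hypothetical helper, not anything that exists in bugzilla. It counts how often the auto-matcher would have grouped a duplicate with the same bug a human chose.

```python
def agreement(signatures, human_dupes):
    """Compare signature matching against human duplicate marking.

    signatures:  dict of bug_id -> signature string (assumed shape)
    human_dupes: dict of duplicate bug_id -> bug_id it was marked a dupe of

    Returns (agree, comparable): the number of human-marked pairs whose
    signatures are identical, and the number of pairs where both bugs
    have a signature at all.
    """
    comparable = [
        (dupe, original)
        for dupe, original in human_dupes.items()
        if dupe in signatures and original in signatures
    ]
    agree = sum(
        1 for dupe, original in comparable
        if signatures[dupe] == signatures[original]
    )
    return agree, len(comparable)


# Toy data: bug 2 matches its original, bug 3 does not, and bug 5 has no
# signature, so it is left out of the comparison.
sigs = {1: "a\nb", 2: "a\nb", 3: "c\nd", 4: "e"}
dupes = {2: 1, 3: 1, 5: 4}
print(agree_ratio := agreement(sigs, dupes))  # prints (1, 2)
```

This only measures one direction, whether the auto-matcher finds what humans found. Pairs that share a signature but were never linked by a human would need a separate query, and could be either false positives or duplicates the humans missed.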


This work by Luis Villa is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States License.