Sunday, December 04, 2005

Advanced search, PostScript, and improving search

Last night, I was trying to find something pretty specific, a PostScript program that generates random mazes when you send it down to your printer. I was having a hard time finding it with quick searches for "postscript maze" and the like, so I switched to advanced search.

What I decided to do was a search limited by filetype to PostScript (.ps) files with the word "maze" in the filename (or URL).

I was surprised to find that only Google supports this query (e.g. [allinurl: maze filetype:ps]). I think AltaVista used to be able to do it, but can't now that it is owned by Yahoo. Neither Yahoo, MSN Search, Ask, nor any of the other engines can do this particular query.
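
For illustration, here is a minimal Python sketch of how the query above could be assembled into a search URL. The helper name build_query_url is hypothetical, and the sketch assumes Google's usual https://www.google.com/search?q=... URL form; it only builds the URL and does not fetch anything.

```python
from urllib.parse import urlencode


def build_query_url(words, filetype, base="https://www.google.com/search"):
    """Build a URL for the query [allinurl: <words> filetype:<ext>].

    build_query_url is a hypothetical helper, for illustration only.
    """
    # allinurl: requires every word to appear in the URL;
    # filetype: restricts results to files with the given extension.
    query = "allinurl: " + " ".join(words) + " filetype:" + filetype
    return base + "?" + urlencode({"q": query})


url = build_query_url(["maze"], "ps")
print(url)
```

Swapping the operators (say, intitle: for allinurl:) only changes the query string, which is part of why it is easy to try the same idea against several engines.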

There is a debate right now about whether search can be improved by giving people more powerful tools (advanced search, MSN Search's "Search Builder", Clusty's clustering, A9's "columns") or whether search just needs to do the right thing (question answering, personalized search).

While I'm not a huge believer in improving search with more powerful tools -- I don't think the mainstream will bother with them -- I'm surprised that advanced search isn't getting more attention. I was amazed that only Google supported this particular search.

By the way, PostScript is a full programming language, though a rather bizarre one, so it really is possible to write very short programs that generate mazes, fractals, and other goodies when you send them down to your printer.

I did a few of these back when I was in undergrad, but I've misplaced the files now, so I went searching to see what other people did. If you want to check out what I found, here are two ([1] [2]) of my favorites. They're PostScript files, so you'll need a PostScript printer or GSView to see them.
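
To give a feel for what such a file contains, here is a rough Python sketch that carves a random maze and writes it out as PostScript drawing commands. It is not any of the programs mentioned above: those do the random generation in PostScript itself, on the printer, while this sketch does the maze logic in Python (a randomized depth-first search, chosen here as an assumption) and uses PostScript only to draw lines. The function name maze_postscript and its parameters are hypothetical.

```python
import random


def maze_postscript(cols=10, rows=10, cell=20, seed=None):
    """Return PostScript source that draws a random maze.

    Hypothetical helper: the maze is carved in Python with a randomized
    depth-first search; PostScript is used only for line drawing.
    """
    rng = random.Random(seed)
    # ((x, y), "E") is the wall east of cell (x, y); "N" is the wall north of it.
    walls = {((x, y), d) for x in range(cols) for y in range(rows) for d in "EN"}
    visited = {(0, 0)}
    stack = [(0, 0)]
    while stack:
        x, y = stack[-1]
        neighbors = [
            (nx, ny)
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
            if 0 <= nx < cols and 0 <= ny < rows and (nx, ny) not in visited
        ]
        if not neighbors:
            stack.pop()  # dead end: backtrack
            continue
        nx, ny = rng.choice(neighbors)
        # Knock down the wall between the current cell and the chosen neighbor.
        if nx != x:
            walls.discard(((min(x, nx), y), "E"))
        else:
            walls.discard(((x, min(y, ny)), "N"))
        visited.add((nx, ny))
        stack.append((nx, ny))

    width, height = cols * cell, rows * cell
    lines = ["%!PS", "72 72 translate", "1 setlinewidth"]
    # Bottom and left borders; the remaining E/N walls cover the right and top.
    lines.append(f"0 0 moveto {width} 0 lineto stroke")
    lines.append(f"0 0 moveto 0 {height} lineto stroke")
    for (x, y), d in sorted(walls):
        if d == "E":
            lines.append(f"{(x + 1) * cell} {y * cell} moveto 0 {cell} rlineto stroke")
        else:
            lines.append(f"{x * cell} {(y + 1) * cell} moveto {cell} 0 rlineto stroke")
    lines.append("showpage")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(maze_postscript(seed=2005))
```

Redirecting the output to a .ps file gives something a PostScript printer or viewer can render. The border is fully closed in this sketch, so there is no marked entrance or exit.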


Michael Fagan said...

For one thing, since 'maze' is the only word used, I would use intitle: rather than allintitle:

so the query becomes
[title:maze filetype:ps]

Now, in discussing filetype handling, think about this: almost all search engines that can handle this use MIME type, whereas Google uses the file extension.

MIME type is usually much better than file extension, in my experience.
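
A small Python sketch of the difference between the two tests, for illustration only. It does not describe how any of these engines actually implement filetype restriction; the function names and example URLs are hypothetical. The only outside fact it relies on is that application/postscript is the registered MIME type for PostScript.

```python
import posixpath
from urllib.parse import urlparse

POSTSCRIPT_MIME = "application/postscript"


def is_ps_by_extension(url):
    """Extension test: look only at the suffix of the URL's path."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower() == ".ps"


def is_ps_by_mime(content_type):
    """MIME test: look only at the server's Content-Type header value."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == POSTSCRIPT_MIME


# A PostScript file served from a script URL: the extension test misses it.
print(is_ps_by_extension("http://example.com/download.cgi?id=7"))  # False
print(is_ps_by_mime("application/postscript"))                     # True

# A URL ending in .ps that actually serves HTML: the extension test is fooled.
print(is_ps_by_extension("http://example.com/maze.ps"))            # True
print(is_ps_by_mime("text/html; charset=utf-8"))                   # False
```

The catch with the MIME test is that it depends on the server sending a correct Content-Type header, which the extension test never needs.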

Yahoo using file extension:
[intitle:maze originurlextension:ps]
(note title: works just as well)

Yahoo using MIME type:
append "&vf=[filetype]" to the URL... except PostScript is not supported

MSN using MIME type:
[intitle:maze filetype:ps] ... but wait ... it looks like they don't support PostScript?

Gigablast using MIME type:
[title:maze filetype:ps]
note that type: works too

Exalead using MIME type:
[intitle:maze filetype:ps] ... but they don't support PostScript

All that being said, restricting a search to titles works fine for web pages, but most other file formats do not really have title metadata.

Note to self: I really must update

Greg Linden said...

Thanks, Michael. But the queries you provided for Yahoo [intitle:maze originurlextension:ps] and Gigablast [title:maze filetype:ps] don't return any results.

It's true that a variant [inurl:maze originalurlextension:ps] works well on Yahoo. And a variant [suburl:maze filetype:ps] does return two results on Gigablast.

But, originalurlextension is undocumented functionality in Yahoo Search -- it is not mentioned in Yahoo's advanced search or any of their help pages -- so it's not clear that it is supported or will continue to be supported.

In any case, the point here is that, if providing more powerful tools really is a good way to improve search, it is a little surprising that more search engines aren't competing on advanced search features.

Greg Linden said...

Professor Kim Border at Caltech managed to dig up a copy of the original PostScript file I wrote way back in 1992. Thanks, Kim!

If you are interested in playing with this old file, you can download it.