December 2008 - Posts
I checked to see how CoasterBuzz's search
index was going today, and was pretty shocked to see that it now has
about 15 million rows. Can't say that I've personally written anything
that generated that much data on its own before.
Once upon a time, POP Forums
used SQL's FullText service to index posts. The frustration that came
out of that black box was two-fold: It was grinding the disk and CPU,
and frankly the results sucked. Long before I ever got v8 into a usable
state, I prototyped the current search engine against data from the old
Building something like that is one of those exercises
where you worry about all kinds of scalability issues instead of just
trying to build something and refactor when you fail the first time.
That's why it took me so long to just try something. Eventually I
passed that brain block and wrote something, which predictably sucked.
I kept refactoring until I had a workable solution.
The indexing went like this. Find all of the words in a topic, toss out junk words
like "the" and other things people won't search for, and score them
based on frequency. Bonus points if it lived in the topic's title. Then
save the words, along with their topic ID and score.
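
Here's a rough sketch of that indexing pass in Python, just to make the idea concrete. It is not the actual POP Forums code. The junk word list, the size of the title bonus and the names (`JUNK_WORDS`, `TITLE_BONUS`, `score_topic`) are all assumptions made up for the illustration.

```python
import re
from collections import Counter

# Hypothetical junk-word list and title bonus; the post specifies neither.
JUNK_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}
TITLE_BONUS = 5


def score_topic(topic_id, title, posts):
    """Return (word, topic_id, score) rows for one topic."""

    def words(text):
        # Lowercase, split into word-like tokens, and drop the junk words.
        return [w for w in re.findall(r"[a-z0-9']+", text.lower())
                if w not in JUNK_WORDS]

    scores = Counter()
    for post in posts:
        scores.update(words(post))      # one point per occurrence
    for word in set(words(title)):
        scores[word] += TITLE_BONUS     # bonus for living in the title
    # These rows are what would be saved to the index table.
    return [(word, topic_id, score) for word, score in sorted(scores.items())]


rows = score_topic(42, "Best wooden coaster",
                   ["The best coaster is a wooden coaster."])
for row in rows:
    print(row)
```

Each returned row pairs a word with its topic ID and score, which is the shape of data described above.
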
To search, simply find the topics that rank highest by averaging the rank for each
word. There is an absolutely horrible query built ad-hoc in the data
query that does it. It's partly ugly because it has to page the data,
so there are some weird common table expressions being formed. It's so
hard to read that I'm not even sure where I'd start to refactor it! But
despite this, it works surprisingly well.
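
For illustration, the search side might look something like the sketch below. It uses SQLite through Python's `sqlite3` module rather than SQL Server, so `LIMIT`/`OFFSET` stands in for whatever paging the real query does inside its common table expressions. The table name `search_words`, the `search` function and the rule that a topic must contain every search word are assumptions, not details from the real query.

```python
import sqlite3


def search(conn, words, page=1, page_size=20):
    """Return one page of (topic_id, avg_score) rows, best match first."""
    words = sorted({w.lower() for w in words})
    if not words:
        return []
    # Ad-hoc query: one placeholder per search word.
    placeholders = ", ".join("?" for _ in words)
    sql = f"""
        WITH ranked AS (
            SELECT topic_id, AVG(score) AS avg_score
            FROM search_words
            WHERE word IN ({placeholders})
            GROUP BY topic_id
            -- Assumption: a topic only matches if it has every word.
            HAVING COUNT(DISTINCT word) = ?
        )
        SELECT topic_id, avg_score
        FROM ranked
        ORDER BY avg_score DESC, topic_id
        LIMIT ? OFFSET ?
    """
    params = [*words, len(words), page_size, (page - 1) * page_size]
    return conn.execute(sql, params).fetchall()


# Tiny in-memory index with made-up scores.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_words (word TEXT, topic_id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO search_words VALUES (?, ?, ?)", [
    ("wooden", 1, 6), ("coaster", 1, 7),
    ("wooden", 2, 2), ("coaster", 2, 4),
    ("coaster", 3, 9),
])
results = search(conn, ["wooden", "coaster"])
print(results)
```

Topic 3 scores high for "coaster" but never mentions "wooden", so under the all-words assumption it drops out of the results.
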
I think the one thing
I'd tweak is the scoring, but aside from that the searching part works
pretty well. I'm sure that I'm not the first to think of it. The joy
comes from the fact that SQL Server is fast enough to get the work
done. One of these days I'll see if I can get a guru to look harder at
it and see how it can be made even faster.
Last week I started a new job in an architecture role with a small company that does mostly marketing Web sites for some brands you absolutely know. I'm not sure yet if I should say who they are because I don't want to presume to speak for them in any way. Regardless, I've been looking for a position like this for a very long time.
I'm in a place where hands-on coding will not likely be something I do frequently. I'm responsible for developing standards, mentoring young Jedis, transforming the company from a project culture to a platform culture and looking for opportunities to realize certain business efficiencies when it comes to helping the company grow.
I willingly took a pay cut for this gig. I know I could get more doing some crappy hourly nonsense. But I very much wanted to engage in something that was interesting and challenging to me. I think this is that job. It's interesting how just having the responsibility seems to change the way you think. I was thinking a little about an issue in one of my own projects, and the answers were obvious because of the way the job forces me to think.
This Internet thing continues to lead me on a great many adventures. I wonder what this one will bring.
I have a real love-hate thing with Jason Calacanis. His solution to our woes is that we all work more. For what, I don't even know. Sometimes I think he's a smart guy with the right view on the world, other times I think he's someone who got lucky in the right place at the right time. This is one of those times I think he's got it wrong.
I think that we can agree that over-consumption across socioeconomic lines contributed to the financial mess our country is in. The jerk across the street in the neighboring McMansion driving a Hummer and buying his trophy wife a Coach bag is a disgusting display of excess and vanity, and I know this because my neighborhood might be new, but it isn't high end. But what Calacanis doesn't understand is that not everyone is an ADD-prone entrepreneur who gets off on working most of the day, every day. I don't know anyone personally who lives around Silicon Valley, but I suspect most people who are like that in the rest of the world over-work to support their over-consumption in the first place. The two are related. My neighbor is not like Calacanis.
Remember, the US is the country that sucks at taking vacations. So what is the reason we work at the expense of our non-work lives? Obviously it's the consumption addiction. People work more hoping to make more to feed the need.
My suggestion is that we need to work smarter, not more. Balance your life with work, play and family. Make sure the work component is rational and supports a rational lifestyle. Calacanis is wrong about Google. When we're in a grind to deliver a feature or make just one more sales call or whatever, we're using finite time resources that could otherwise be used to solve problems or create opportunities in novel ways. Innovation doesn't happen when you're in a constant grind.
Have I taken work home or stayed late when something was on fire? Yeah, of course I have. I understand that particularly in technology jobs, sometimes you just gotta put in the time. But in my last job in particular (the last long-term one), the one recurring theme about those instances was that we would figure out how to prevent them next time. We worked smarter, not more, and that's sustainable execution in business.
Calacanis points to a collective "sloth" and then starts waving the flag and talking about how awesome we are. Well, which is it? The truth lies somewhere in the middle. I know librarians having to sell their cars to cover health care costs for their kids and professionals being cut to part time. "Work more" is about the most asinine advice I can think of when unemployment nears 8%, as it does here in Ohio.
Massive changes in consumer behavior have already begun. Savings are suddenly up and consumer spending is getting kicked in the nuts right now. Go to the mall tomorrow and see how uncharacteristically non-crowded it is for December. I think people are doing the right things, and doing what they've got to do. Not all of us can be dotcom millionaires and pontificate about what we should be doing. We can grit our teeth and try to get through it the best we can, not by working more, but by working smarter.
I openly admit that perhaps I'm not the best blogger when it comes to technology, because I have a very skeptical "why should I care about this" attitude toward everything new. And specifically for programming, because I'm not very computer science oriented.
So when it comes to the various data frameworks of the .NET world, the ORMs, LINQ to SQL and the Entity Framework, I often find myself asking why I need to care about any of them. With the recent debates about the alleged death of LINQ to SQL and flaws in the EF, I only scratch my head more. The reason for my apathy toward these subjects is that I tend to rely on the "smarter" portions of the community to solve problems for me so I can get to the business of building stuff.
I think the people that I look up to for answers and guidance often get disconnected from the problems that are out there. Frameworks of all kinds often grow into these giant things that do too much and obscure the original problem (see Vista for the biggest case study of all ;)). Or perhaps there is some amount of mismatched expectations when these tools are developed.
For example, the thing that I really want out of any of these data frameworks is less code to do simple things. If I want to insert a row into a table, I want to do it in as little code as possible. Having some kind of object representation is nice, I suppose, but only in certain circumstances (some small line-of-business app: good, some component of an enormous system with its own entities: bad). LINQ to SQL does a pretty decent job in this role, but sometimes it makes me do things I don't want to (like create primary keys where they aren't necessary), in the name of a feature I don't care about in that instance, like concurrency checking. Then when I look into it more, I start to wonder if I'm really gaining efficiency anyway, since my "few lines of code" can't work without a ton of generated code, or code that I write.
At the end of the day, I find myself going back to little code libraries I've used previously, or even pieces of the old enterprise application block or whatever it was called. It's pretty hard to beat a method that takes a stored procedure name and an array of parameters!