David Stone's Blog

I'm open to suggestions for a subtitle here! (Really!)
Vint Cerf and the 700MHz Spectrum

Dare asks:

Google just pledged to spending up to $4.6 billion to license the 700MHz wireless spectrum in what the company has described as the most significant auction of wireless spectrum in history by the U.S. federal government. Why is this auction so significant and what kind of services can we expect from Google if it wins the auction? 

Here's a quick recap for those of you who are just now hearing about the 700MHz auction.

Essentially, this is the spectrum that's been used for broadcast television. Since the FCC has ruled that all television broadcasters must go digital next year, it's auctioning off this chunk of the airwaves to corporations. What makes the spectrum so valuable is that it easily penetrates walls (as you know if you've ever dealt with rabbit-ear antennas on a TV). That makes it the ideal piece of the airwaves for wireless broadband service. And it's generating quite a bit of controversy, which essentially boils down to Telecom vs. The People.

The telecom industry wants the spectrum because it would tighten their stranglehold on broadband internet access in the United States. By controlling it, they could keep broadband speeds at their current levels and provide minimal service to the people using broadband over that chunk of the spectrum.

The people want it to be open so that more competitors can come in and provide wireless broadband services. As we've seen in areas where Verizon has rolled out its FiOS service, cable companies respond to competitive pressure by increasing their customers' bandwidth. Increasing the number of companies providing broadband access in a given area will drive prices down and bandwidth up, giving consumers more choice and more control over their internet access.

That's why Google has stepped in and said "We'll only throw our chips in the pot if you do things a bit differently." And considering that the main goal for the FCC in this auction is to raise billions of dollars, and Google has billions of dollars, I think they might listen. A few of the FCC commissioners have spoken out in favor of opening up the rules on the auction and the draft of the auction rules has indicated that they're willing to change.

If you want to read more about it, I highly recommend reading David Isenberg's blog. He posts quite frequently on the subject and other matters of internet freedom (like net neutrality).

Vint Cerf came to UCSD a few months ago and I had the opportunity to sit in on the latter half of his presentation. If I got the chance again, I'd definitely ask him what he thinks about the 700MHz auction and what opportunities it presents for Google. I know they have their Google WiFi in Mountain View. Would they consider moving to a nationwide Google Broadband service? And would they remember that they're trying not to be evil?

So I was thinking about attending Defcon this year, and I wanted to take a roll call of people I know who might go. UCSD people: we might think about sharing a room/carpooling. (Although I suppose flights are cheap enough...especially with gas prices here in SoCal.)  Anyway...if you're going, or thinking about going, leave a comment.
More Compiler Construction stuff...

I realize this will bore those of you who don't go to UCSD or aren't interested in how one would go about writing an XQuery interpreter. Sorry. I promise I'll come up with some interesting posts about LINQ and such in the coming weeks. I've got finals coming up, though, so they're not likely to be long.

Anyway. Compilers people. Read on.

Decisions, decisions...

For the interpreter, you really only have two choices:

  1. Build an interpreter that goes through your AST and evaluates it. (Something like the visitor pattern. It's just a class that walks the tree...TreeWalker?)
  2. Define evaluate methods on your AST classes that evaluate their children and return a result.

What I Did

For my interpreter, I did 2. At the time, it seemed like a brilliant idea. I mean, all I had to do was make it so that each class overrode the evaluate method and returned the appropriate result. So, for instance, my Range class had an evaluate method that looked like so (without the type checking or semantic checks):

public XSequence evaluate(Context context) throws Exception {
    BigInteger one = BigInteger.ONE;
    BigInteger from = new BigInteger(args[0].toString());
    BigInteger to = new BigInteger(args[1].toString());
    // Build a sequence of the integers from..to, inclusive
    XSequence s = new XSequence();
    while (from.compareTo(to) <= 0) {
        s.add(new XInteger(from));
        from = from.add(one);
    }
    return s;
}

Now this is great. When I have a Range object, it just gets evaluated (in the given context) and returns an XSequence of XIntegers, right? Fantastic. For the most part, this worked.

What I Should Have Done

Honestly, looking back, it makes more sense to write a tree-walker class that goes through your AST and does stuff based on the type of node it finds. This is how ANTLR (a very popular and fully featured compiler-construction toolkit) does things, and it makes more sense in certain cases, especially when it comes to error checking in your function-call code and type checking in your context code.
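To make the tree-walker idea concrete, here's a rough sketch. The node classes (AstNode, LiteralNode, RangeNode) and the TreeWalker are names I made up for illustration; none of this is the actual assignment code, just the pattern of dispatching on node type from one central class:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST node types, invented for this sketch.
abstract class AstNode {}

class LiteralNode extends AstNode {
    final int value;
    LiteralNode(int value) { this.value = value; }
}

class RangeNode extends AstNode {
    final AstNode from, to;
    RangeNode(AstNode from, AstNode to) { this.from = from; this.to = to; }
}

class TreeWalker {
    // One method that dispatches on the concrete node type,
    // instead of an evaluate() method on every node class.
    List<Integer> walk(AstNode node) {
        if (node instanceof LiteralNode) {
            List<Integer> result = new ArrayList<>();
            result.add(((LiteralNode) node).value);
            return result;
        } else if (node instanceof RangeNode) {
            RangeNode r = (RangeNode) node;
            int from = walk(r.from).get(0);
            int to = walk(r.to).get(0);
            List<Integer> result = new ArrayList<>();
            for (int i = from; i <= to; i++) result.add(i);
            return result;
        }
        throw new IllegalArgumentException("Unknown node: " + node);
    }
}
```

The nice part is that all the evaluation (and error-checking) logic lives in one class, so adding a cross-cutting concern like type checking doesn't mean touching every AST class.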

Universal Truths

This class is not context free...

Regardless of what you choose to do, you can rest assured that you'll need a Context class. At least, that's what I called it; you could also call it the environment or something like that. In my case, the Context class contained all the global variables, all the global functions, all the namespaces, etc. It had methods to add and remove variables (bind and unbind, essentially, so that you could deal with function parameters that share a name with global variables). I also threw a few static methods in there to deal with type checking for the application, so that I always had access to those methods whenever I needed them. It's also handy for type checking function calls and return types.
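Here's a minimal sketch of what the bind/unbind part of such a Context might look like. The design (a stack of bindings per name, so a parameter can shadow a global) is my guess at a typical implementation, not the actual assignment code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical Context sketch: each variable name maps to a stack of
// bindings, so a function parameter can shadow a global variable with
// the same name and unbind restores whatever it shadowed.
class Context {
    private final Map<String, Deque<Object>> variables = new HashMap<>();

    // bind: push a new value for this name (possibly shadowing an old one)
    void bind(String name, Object value) {
        variables.computeIfAbsent(name, k -> new ArrayDeque<>()).push(value);
    }

    // unbind: pop the most recent binding
    void unbind(String name) {
        Deque<Object> stack = variables.get(name);
        if (stack == null || stack.isEmpty())
            throw new IllegalStateException("Not bound: " + name);
        stack.pop();
    }

    Object getVariable(String name) {
        Deque<Object> stack = variables.get(name);
        if (stack == null || stack.isEmpty())
            throw new IllegalStateException("Not bound: " + name);
        return stack.peek();
    }
}
```

Binding on function entry and unbinding on exit gives you the parameter-shadowing behavior without ever losing the global value.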

Typing is important

Speaking of types: build the types first. I swear, you must do this. And define as many of the built-in functions as you can in your type classes. For instance, you need to implement the set-theoretic operators (union, intersect, except) on sequences, right? How about defining:

public XSequence union(XSequence other);
public XSequence intersect(XSequence other);
public XSequence except(XSequence other);

That way you make each type responsible for its share of the built-in functions. And it makes your built-in function code much, much cleaner.

Another nice thing about abstracting your types like this and shoving as much functionality as possible into them is that you can more easily write unit tests for your own type classes than you can for the AST classes. That way you can ensure that your set theoretic operators are working and that all those other built-ins are working just fine. After all, those are the building blocks of every XQuery program.
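To show the idea, here's a toy version of those three operators on a sequence type. This "XSequence" is a stand-in I wrote over plain ints for illustration; the real class would hold XQuery items, and the real operators follow XQuery's sequence semantics:

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an XQuery sequence type, holding ints for simplicity.
class XSequence {
    final List<Integer> items = new ArrayList<>();

    XSequence(int... values) {
        for (int v : values) items.add(v);
    }

    // union: everything in this sequence, plus items from other
    // that aren't already present
    XSequence union(XSequence other) {
        XSequence result = new XSequence();
        result.items.addAll(items);
        for (int v : other.items)
            if (!result.items.contains(v)) result.items.add(v);
        return result;
    }

    // intersect: items present in both sequences
    XSequence intersect(XSequence other) {
        XSequence result = new XSequence();
        for (int v : items)
            if (other.items.contains(v) && !result.items.contains(v))
                result.items.add(v);
        return result;
    }

    // except: items in this sequence but not in other
    XSequence except(XSequence other) {
        XSequence result = new XSequence();
        for (int v : items)
            if (!other.items.contains(v) && !result.items.contains(v))
                result.items.add(v);
        return result;
    }
}
```

Because these are plain methods on a plain class, you can unit test them in isolation, exactly the point made above.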

Built In Functions

I also had a class called BuiltInFunctions. This was pretty cool. So in XQuery, there's a FunctionDeclaration, right? You probably have a node that serves this purpose in your AST. You'll also have a FunctionCall that asks the Context class for a matching function declaration, hands over the arguments, and executes the ExpressionElement that is the body of the FunctionDeclaration. Well, since I was using the evaluate method I described earlier, I was able to use Java's anonymous classes for the built-in functions. Like so:

    new FunctionDeclaration("op:intersect", "sequence",
        new ExpressionElement("intersectExpression") {
            public XSequence evaluate(Context context) throws Exception {
                XSequence set1 = (XSequence) context.getVariable("set1").evaluate(context);
                XSequence set2 = (XSequence) context.getVariable("set2").evaluate(context);
                return set1.intersect(set2);
            }
        },
        new ArgumentDeclaration("set1", "sequence"),
        new ArgumentDeclaration("set2", "sequence"));

So now my global functions variable has a function declaration for the built-in op:intersect function, and my new intersectExpression element has an evaluate method that will return the set intersection of the two arguments. Two things to note here: 1) It's clean because of the methods I implemented on my types. 2) It saves me from writing a ton of classes, one for each built-in function.

If you do decide to go the tree-walker route, you'll probably do something similar. I'm too tired to decide exactly how I'd structure it, but probably every time I saw a built-in function being called, I'd call the method in my tree-walker class that executes that built-in.

Start Early...

My last comment would be to start this one early. It will take lots of time.

You guys all know my phone number. (If not, Scott has it.) And I'm in the channel enough. So if you've got questions, feel free to ask.

Some GPA bragging...

Just noticed this while looking at my degree audit to see how to successfully arrange my schedule for the next year:

Upper Division Summary - Not Complete       
(UC trfr not calculated in audit GPA)
20.0 ATTEMPTED HOURS    62.8 POINTS    3.14 GPA

Of course the important part here is that my upper division GPA is pi. And yes, I do value that more than a perfect 4.0 :-p

Stuff to do this week:

  1. Homework homework homework
  2. Talk to Rhodes to see if he'll waive the 101 or 141 pre-reqs on his summer 120 class. I'd love to take it with him without having to take 141. I'd be willing to suffer through 101 next quarter.
  3. Register for classes on Friday.
  4. Get my numerical analysis midterm back. This means going to the TA's office hours. I've never met the TA...because I don't ever go to section. This should be sufficiently awkward.
  5. More homework

Anyway...it's late. I'm off to bed.

Posted: Feb 12 2007, 01:57 AM by David Stone | with 9 comment(s)
David's Compilers Survival Guide

Taking CSE 131A at UCSD means two things:

  1. Your life will be consumed by Java.
  2. Your life will be consumed by XQuery.

It's non-negotiable. I didn't completely implement the interpreter last quarter. (The only thing I was missing was the order by interpretation for query comprehensions. It came down to the half hour before the assignment was due and I gave up on it, reverted to the last revision in Subversion, and turned in all my code.) But even without the order by implementation, my code came out to around 10K+ lines. Divide that by the 9 weeks you get to write it all and you're talking over a thousand lines of code a week.

So, first things first. You do not want to do this class in vim. I don't care how hardcore you think you are, doing the entire class in vim will kill your productivity instantly.

Step 1: Get Eclipse

Unless you have the money to shell out for IntelliJ, Eclipse is the best IDE out there for Java. I'm always a fan of using the latest and greatest (aka, the bleeding edge) in the milestone builds. They're stable enough that the IDE won't crash on you while you're editing code and they allow some really cool features. Currently, 3.3M4 is the latest milestone. So go download that.

One of Eclipse's strong points is JUnit integration, which you'll definitely want to take advantage of in this class.

Step 2: Get Subclipse

I shouldn't have to explain that using version control is a necessary part of any software project. And while Eclipse comes with CVS support built in, I prefer using Subversion. So get Subclipse. You're going to want to set up your project and check it into your repository. Don't bother naming it A1, since you'll be editing the same project for the entire quarter. I set it up like so:

  • /cs131a
    • /a0
    • /[Your Project Name Goes Here. I recommend picking something cool as a codename]
      • Main.java
      • /lexer
        • Lexer.java
        • /tests

Eventually you're going to have a parser directory, a directory for your abstract syntax tree classes, etc. But for right now, all you'll need is your lexer directory.

Step 3: Get CUP/LEX

You'll be using JFlex and CUP for the lexer and parser in this course. You don't have to, of course, but it's recommended. And if you do, it'll be a lot easier if you have this plugin. It automatically re-runs JFlex and CUP on your .lex and .cup files when you save them, which in turn generates your Java files. It's slick. Use it.

Step 4: Get EclEmma

Emma is a free (and pretty darn good) code coverage tool for Java. Surprisingly, the Java code coverage tool space is dominated by commercial offerings, with a very sparse (and very new) set of open source tools cropping up to complement the rest of the open source stack (Eclipse, JUnit, Java itself now, etc.) The reason I chose Emma is that it had a very well integrated plugin for Eclipse: EclEmma. EclEmma is great because it integrates with your JUnit tests runs and provides coverage information for each line in the IDE and coverage statistics per-package, per-file, and per-class.

Step 5: Use JUnit to automate your testing

Eclipse has built-in JUnit integration. Use it. Kube provided us with XQuery files and their XML output from the reference program. This is nice, except for the fact that there's no convenient way to write JUnit code around it. What I ended up doing was writing a helper function for my JUnit classes that read in the XQuery file, ran the (Lexer|Parser|Interpreter) on it, and diffed the result against the reference result file that Kube gave us. I'll be uploading my test code to my ieng6 account sometime this weekend, when I have time to prep it for everybody. (I've also got a diff utility.)
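Here's roughly what such a helper might look like. The Interpreter interface and the normalization step are my own placeholders, not my actual test code; wire in your real lexer/parser/interpreter entry point instead:

```java
import java.nio.file.Files;
import java.nio.file.Paths;

class ReferenceTestHelper {
    // Placeholder for whatever entry point your interpreter exposes.
    interface Interpreter {
        String run(String xquerySource) throws Exception;
    }

    // Run the interpreter on the query file and diff its output
    // against the reference result file.
    static void assertMatchesReference(Interpreter interp,
                                       String queryPath,
                                       String referencePath) throws Exception {
        String query = new String(Files.readAllBytes(Paths.get(queryPath)));
        String expected = new String(Files.readAllBytes(Paths.get(referencePath)));
        String actual = interp.run(query);
        // Trim before comparing, since reference files may differ
        // only in trailing whitespace or line endings.
        if (!expected.trim().equals(actual.trim()))
            throw new AssertionError("Mismatch for " + queryPath);
    }
}
```

Each JUnit test then becomes a one-liner: point the helper at a query file and its reference output.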

If you use JUnit in conjunction with EclEmma, you'll get a nice green bar telling you that all your tests passed, and a nice green bar telling you how much of the code you wrote was actually covered. This will be especially useful for the lexer assignment.

Step 6: Start Early

I know everybody always says this. And nobody ever follows it. I started writing A1 (the lexer) about two days before it was due, same with A2 (the parser). With A3 (the interpreter), I started a whole 8 days in advance. That's a grand total of 12 days. I probably went through $100+ worth of Starbucks/Pepsi/etc. during that time, with almost no sleep and absolutely no free time. And I wrote nearly 11 thousand lines of code. Don't be stupid. Start early.

There you go. That's the advice I've got. If you've got any more questions about the class or about the tools and how I've got them set up, go ahead and e-mail me.


Another Quarter Done

Whew. Crazy quarter. Glad that's over. I just found out that I got an A in my programming languages course. That's really good, because the final was killer. It had a bunch of "what's the value of this ML/Python/Prolog program" questions, some scoping stuff, and the real fun part: write an algorithm in Prolog to type check ML programs. (Everybody was thinking they failed the class.) The professor used the "I don't like failing people" grading curve, where everybody above a 50% in the class at least passed.

My compiler construction final was significantly different from what I thought it would be. The True/False questions Kube always puts on his tests didn't seem too bad. A fair amount of them were either totally obvious (T/F: A Text Node in the XML DOM can never have child objects.) or things that I had on my cheat sheet. There was a whole section on "Give the value of this XQuery program" (that's what we were doing all quarter, after all), and since I've been staring at XQuery these last two weeks, that wasn't hard. The hardest parts were generating an SLR(1) parse table, using an SLR(1) parse table to parse a simple string (giving the steps as it parsed), and generating first- and follow-sets from CFGs. Crazy stuff. Those were hard, but I had them on my cheat sheet as well. So I think I pretty much rocked the final. We'll see though. I hear we'll get grades by Monday.

Math 109 is a class I at least passed. I had a 140% on the first midterm and a 50% on the second (together worth 50% of my grade), I only turned in about half of the homework (~15%), and then there was the final (which I think I did really well on), worth 35% of my grade. I'm hoping for a B. It'd be pretty hard for me to outright fail the class.

Today's agenda: Help out a friend with studying for her math placement exam. Get a haircut. Maybe fix up some stuff in CPhog that I've been meaning to look at. About a month ago, Firefox trunk builds were changed so that child nodes created in one document must be imported into the document you want to use them in. This is valid, according to the spec, but a lot of JavaScript totally ignores it because, up till now, Firefox has let you just use them together. Well, not anymore. It's broken a few things in CPhog for me.

CSE 130 Sample Final Answers

  1. Ocaml Types
    1. ans:int = 804
    2. type error
    3. ans:int = 100
    4. ans:int = 20
  2. Function Type / Write reverse
    1. 'a -> 'a * 'a -> bool * 'a -> 'a
    2. let reverse l =
      1. let f (c,b) = match c with [] -> ([], b) | h::t -> (t, h::b) in
      2. let g (l, acc) = match l with [] -> false | h::t -> true in
      3. let base = (l, []) in
  3. Semantic Equivalence
    1. Semantically Equivalent because addition is commutative
    2. Not Semantically Equivalent because g is not defined in e2
    3. Semantically Equivalent because multiplication is commutative
  4. OCaml Module
    1. Part A
      1. Signature B
      2. Stack.top [];;
      3. Because signature A says that 'a stack = 'a list, you can call any function on 'a stack on an 'a list instead. This allows Stack.top to be called on [], which will throw the EmptyStack exception. When using signature B, you must call Stack.make, and it will never give you back an empty list.
    2. Part B
      1. Signature A
      2. Inferred Type: 'a stk -> 'a list
    3. Part C
      1. Doesn't work
  5. Boolean Formulas
    1. type boolexp = Var of int | Not of boolexp | Or of boolexp * boolexp | And of boolexp * boolexp
      And(Or(Var(0), Not(Var(1))), Or(Var(1), Not(Var(2))))
    2. let rec eval (list, expr) =
          let rec find (list, index) =
              if index = 0 then List.hd list
              else find (List.tl list, index - 1) in
          match expr with
              Var(i) -> find (list, i)
              | Not(v) -> not (eval (list, v))
              | And(a1, a2) -> eval (list, a1) && eval (list, a2)
              | Or(a1, a2) -> eval (list, a1) || eval (list, a2)
Toorcon 8

I had an awesome time at Toorcon this last weekend. Some friends from UCSD convinced me to pay the last-minute registration and attend. Totally worth the $120. Here's a quick rundown of my weekend. I've included all my notes from Cory Doctorow's keynote because it was just so dang good.

Keynote - Cory Doctorow
The industry has moved from the mainframe, a giant, specialized network that restricts the user's abilities based on a central policy authority, to a general-purpose network that provides an open environment for the user to explore, learn, and share. He said that today, we're seeing a backlash. We have IT departments that are trying to turn our general-purpose networks and machines back into specialized, locked-down, dumb terminals again.

We can see this in the era of the EULA. Everybody is including these "binding agreements" that you agree to by opening the box. By "agreeing" to these EULAs, we've agreed to give up all our rights to the manufacturer. And we're seeing this in DVDs, TVs, Consoles, Cars, Software, etc.

We can see this in the era of DRM or "copy protection". (As Cory pointed out, have you ever protected something by not copying it?!) DRM only keeps honest users honest. If I have the choice of being restricted by buying something legally (off the iTMS or URGE or whatnot), or being completely unrestricted by downloading something illegally, which am I going to be more likely to choose? If I download music off BitTorrent, I don't get a rootkit. If I buy the music from Sony, I do. Rootkits are enforcing policy from a central policy authority that treats the user of the hardware as an attacker. Why are we designing computers and software that can enforce remote policy on the user? That's like building a rocketship with a self-destruct button.

The best security is living in a civilization that is based on freedom. We have to reform our business models so that we treat the fact that the internet makes copying fast and easy as a virtue, not a vice. The history of copyright and trademark law shows that yesterday's pirate becomes today's admiral. And that these admirals point at the pirates of today, screaming "What we did was progress! What these guys are doing is theft!" Who revolutionized the entertainment industry by introducing the VCR? Sony. Who put copy protection rootkits on their CDs? Sony. We need the pirates of today to legitimize the way in which we operate so that the pirates of tomorrow can succeed in their own fights.

Keynote - Simple Nomad
Simple gave a very interesting talk about the "State of the Enemy of the State". His current prediction (and he's made a few very accurate predictions) is that reverse engineering is soon to become illegal. He cited the Microsoft vs. Viodentia case as an example of how reverse engineering can go away.

Black Ops 2006 - Dan Kaminsky
Dan's talk was pretty cool (He was also very hungover :-p). He showed an idea of using RTP/RTCP as a way of monitoring networks for net neutrality. The basic idea is that because RTP floods TCP links without regard for quality, packet loss, etc and then waits for the client to give back the RTCP data on jitter, latency, etc, we can use this information to see how network links are faring. And this data can be used to monitor if/how the carriers are shaping the traffic. (Cory Doctorow also touched on this a bit...in a bit less technical detail.)

Since Dan knows more about DNS than practically anybody on the planet, he also showed how you can stop DNS leakage on VPNs over SSH. Basically, DNS requests can (and do) get leaked to the local LAN because they're UDP packets rather than TCP (SSH only moves TCP packets). Rather than fixing all the client side code or putting a layer in to wrap DNS requests up in TCP packets, Dan proposed that the fix is actually much simpler. Put up a local DNS server that always responds by setting the truncate bit to one. This tells the local DNS system making the request to ask over TCP instead. Pretty slick.
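For context on why that works: the truncate (TC) bit lives in the flags field of the 12-byte DNS header, and a client that sees it is supposed to retry the query over TCP. Here's a small illustration of where that bit sits; a real server would also copy the question section and handle the rest of the message, and this sketch is mine, not Dan's code:

```java
// Builds just the 12-byte DNS header of a response with the
// truncate (TC) bit set, to show where the bit lives.
class DnsHeader {
    static byte[] truncatedResponseHeader(int queryId) {
        byte[] header = new byte[12];
        // Bytes 0-1: the query ID, echoed back to the client.
        header[0] = (byte) (queryId >> 8);
        header[1] = (byte) queryId;
        // Bytes 2-3: flags. QR (0x8000) marks a response; TC (0x0200)
        // says "truncated, retry over TCP".
        int flags = 0x8000 | 0x0200;
        header[2] = (byte) (flags >> 8);
        header[3] = (byte) flags;
        return header;
    }
}
```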

Also, Dan showed the coolest application that visualized binary formats as a dotplot. Now, this isn't new to image formats and the like, but he showed dotplots of .NET assemblies, Java class files, Win32 PEs, etc. This is useful for fuzzing file formats. Also, for examining them for changes. His last slide was titled Visual BinDiff and showed the differences, in dotplot format between msvcr70.dll and msvcr71.dll. It was cool.

More to come...

That was just the rundown of the keynote sessions. Pretty cool stuff. I'll post more over the next week or so (as homework allows) on the actual session I attended.

prep cs130; prep cs131a

Another school year, more CSE classes. My class schedule for fall quarter looks like:

  • CSE 130 - Programming Languages. Looks to be the most interesting course. We'll be studying OCaml for 4 weeks. Comparing it to Java for 1. Studying Python for 3 weeks. And finally we'll have a quick look at Prolog (yay!) for 1 week.
  • CSE 131A - Compiler Construction 1. Building an XQuery interpreter in Java. This should be fun. (Except for the Java part) This'll make me appreciate LINQ all the more.
  • Math 109 - Mathematical Reasoning. Upper division Discrete Math. Standard proof stuff. No biggie.

CSE 131A necessitates that I venture back into the realm of Eclipse. Since everything at this school is in Java, I've resigned myself to a life of saying "I wish this worked more like Visual Studio." Of course, there are some things that I like about Eclipse where I think the opposite. But those are few and far between.

So since I have to use Eclipse for CSE 131A, I was thinking it'd be really slick if I could do all the OCaml work I have to do in CSE 130 in Eclipse as well. I'll see about Python and Prolog later.

So I stumbled across EclipseFP, which is a plugin for functional language support in Eclipse. This is fantastic. I'll be able to do OCaml development in the same environment as I develop my XQuery interpreter.


Paul asked if one could easily add jQuery to pages everywhere. He mentioned that you might be able to do it using GreaseMonkey. The answer is: of course you can. :)

var theScript = document.createElement("script");
theScript.src = "http://jquery.com/src/latest.js";
theScript.type = "text/javascript";
document.body.insertBefore(theScript, document.body.firstChild);

This is basically the same trick Josh and I use in CPhog. You use the user.js file to inject JavaScript into the page itself, so that rather than running in the GreaseMonkey sandbox, the script is running in the page. You can even use Firebug to run jQuery statements against pages now. Pretty spiffy, eh? 
