The 2011 Chatterbox Challenge

Wednesday, April 20, 2011

If you’re an astute or reasonably loyal reader of this fine blog-collection-of-essays-articles-things-whatever, then you might be wondering why there weren’t many posts in the month of March. Well, the first reason is that I’m at college and have to take these things called classes, but the second reason is the annual Chatterbox Challenge. I imagine you’ll also find this consistent with the fact that both posts from the month of March were Artificial Intelligence related, as is this one.

It turns out that I didn’t just spend my month of March thinking about chatbots… I spent it actually building a chatbot. …a very rudimentary one.

 

The Chatterbox Challenge?

So what is the 2011 Chatterbox Challenge? For those of you who weren’t astute (or loyal) enough to click on the link both times I spent the effort to type it out for you (uphill both ways), it is an annual contest of chatbots — computer programs designed to have a conversation. This year, all the bots competed for a $1000 prize.

The way the contest format works is that they take every bot entered (there were 43 this year) and ask each one ten (or so) questions. The top ten (or so) bots that answer the questions the best (not as subjective as you’d think) move on to a finals round where they are asked an additional set of ten (or so) questions.

 

For the reasons I mentioned in “Why is No One Making a Chatbot?”, I happen to really like this format. It is a good way of testing whether a chatbot is actually knowledgeable and, in most cases, whether it has a rudimentary understanding of what a question is asking.

Holding a conversation with a bot and then grading it on subjective criteria, as is done in the Turing Test and its modern implementation, the Loebner Contest, is oddly not as good at measuring knowledge, because such topics of knowledge and interpretation don’t readily come up. The Loebner Contest also has rather stringent software and location requirements, creating a much greater barrier to entry than the free and online Chatterbox Challenge.

I highly recommend entry into the Chatterbox Challenge for anyone interested in the natural language aspect of artificial intelligence.

 

My History with CBC

Naturally, discussion of the Chatterbox Challenge leads one to ask: what exactly was my program, and how well did it do? Travel over to the Projects page and you’ll see links to the two chatbot programs I made. Babbage was my 2011 program, which I started working on the 28th of February this year, building it from scratch to where it is now in about a month (approximately 90 hours of programming).

 

The Initial Ganymede

However, to discuss Babbage I find it more fitting to throw back to the beginning, with Ganymede. Back in 2007, I chose to enter the Chatterbox Challenge for the first time. That time I had the good idea to work further in advance, so I was able to put about 200 programming hours into developing a JavaScript program to receive inputs via text and match them to about 10,000 lines of appropriate responses. For example, there are thousands of things like:

if input contains the word “hello”:
respond with “hi”.

or, written in actual JavaScript:

// Runs inside the form's handler function; `input` holds the user's text.
if (input.toLowerCase().indexOf("hello") !== -1) {
  document.chat.result.value = "Hi.";
  return true;
}

 

So if the user typed “Hello”, or “hello”, or “Hello, Ganymede”, or “Hello. Today is a nice day?”… Ganymede would say “Hi.”

Ganymede Grows Smarter

But of course that wasn’t the end of it. Before the contest I was also able to make use of “or” tags and random responses to match more inputs. Also, a hierarchy of statements could make matching certain phrases more important than others, so if an input contained both “how are you” and “hello”, the program would respond to “how are you” instead of “hello”.

if input contains the phrase “how are you”:
respond with “I’m great”.

else if input contains the word “hello” or “hi” or “greetings”:
respond with “hello” or “hi” or “greetings” at random.

or, written in actual JavaScript:

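// Lowercase once so "Hello" and "hello" match the same rule.
var text = input.toLowerCase();

// Checked first, so "how are you" takes priority over a plain greeting.
if (text.indexOf("how are you") !== -1) {
  document.chat.result.value = "I'm great.";
  return true;
}

// Substring matching: "hi" will also match inside words like "this".
if (text.indexOf("hello") !== -1 || text.indexOf("hi") !== -1 || text.indexOf("greetings") !== -1) {
  var output = ["Hello.", "Hi.", "Greetings."];
  document.chat.result.value = output[Math.floor(Math.random() * output.length)];
  return true;
}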
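
The same hierarchy can also be kept as data instead of a long chain of if statements. What follows is only a minimal sketch, not Ganymede’s actual code: the rules table, the respond and containsPhrase functions, and the whole-word matching are all my own assumptions for illustration. Rules are checked from top to bottom, so earlier entries win.

```javascript
// Hypothetical rule table: earlier rules have higher priority.
const rules = [
  { patterns: ["how are you"], replies: ["I'm great."] },
  { patterns: ["hello", "hi", "greetings"], replies: ["Hello.", "Hi.", "Greetings."] },
];

// Returns true when `phrase` appears in `text` as whole words,
// so "hi" does not match inside "this". Matching ignores case.
function containsPhrase(text, phrase) {
  const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("\\b" + escaped + "\\b", "i").test(text);
}

// Returns a reply from the first matching rule, or null if nothing matches.
// `pick` chooses an index; it can be swapped out to make the choice predictable.
function respond(input, pick = (n) => Math.floor(Math.random() * n)) {
  for (const rule of rules) {
    if (rule.patterns.some((p) => containsPhrase(input, p))) {
      return rule.replies[pick(rule.replies.length)];
    }
  }
  return null;
}
```

With this table, respond("Hello, how are you?") returns “I’m great.” because the first rule matches before the greeting rule is ever checked.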

 

Babbage, the Chatbot

That was about as advanced as Ganymede got. Now time travel a few years forward to this fine year of 2011. I had a few more ideas about artificial intelligence, and I wanted to take another shot at the Chatterbox Challenge. This time I switched languages from JavaScript to PHP. This allowed me to connect the program to a database via MySQL, so the program could remember things the user said without needing to set cookies.

Babbage was born by duplicating much of the same input-output matching, except written in PHP. However, I added more.

One of the hardest parts of the Chatterbox Challenge is answering questions regarding facts of the world, such as “How far away is the moon from Earth?” or “What is the capital of China?”. There are simply too many to hand-program the chatbot to know. But there was a place where all these questions could be easily answered — the internet. The internet knows all.

 

Hooking Babbage Up to the Internet

Since it would be too difficult for Babbage to answer the question himself, why not just ask the internet? PHP gave me the ability to use Sean Huber’s cURL library, which allowed the retrieval and interpretation of any given web page.

This meant that Babbage could take a question, interpret what it was asking for, determine which webpage held the answer, grab that webpage as it currently exists, and read the answer from it.

So if you asked Babbage “What are socks?”, Babbage would know to look for information on “sock”, pull up the Wikipedia page on “sock”, grab the first paragraph, and print it for the user:

A sock is an item of clothing worn on the feet. The foot is among the heaviest producers of sweat in the body, as it is able to produce over a pint of perspiration per day. Socks help to absorb this sweat and draw it to areas where air can evaporate the perspiration. In cold environments, socks decrease the risk of frostbite. Its name is derived from the loose-fitting slipper, called a soccus in Latin, worn by Roman comic actors.
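
To make those steps concrete, here is a rough sketch in JavaScript (Babbage itself was written in PHP, and this is not his code). The function names, the naive plural stripping, and the regex-based paragraph extraction are all assumptions of mine; actually downloading the page is left out, so firstParagraph just takes the page’s HTML as a string.

```javascript
// Pulls a lookup topic out of a "What is/are X?" question, or returns null.
// The trailing-"s" removal is a deliberately naive singularizer.
function topicFromQuestion(question) {
  const match = /^what\s+(?:is|are)\s+(?:an?\s+|the\s+)?(.+?)\??$/i.exec(question.trim());
  if (!match) return null;
  const topic = match[1].toLowerCase();
  return topic.length > 3 && topic.endsWith("s") && !topic.endsWith("ss")
    ? topic.slice(0, -1)
    : topic;
}

// Builds the English Wikipedia article URL for a topic.
function wikipediaUrl(topic) {
  return "https://en.wikipedia.org/wiki/" + encodeURIComponent(topic.replace(/\s+/g, "_"));
}

// Returns the text of the first non-empty <p> element in an HTML string,
// with tags and footnote markers such as [1] removed.
// HTML entities are not decoded in this sketch.
function firstParagraph(html) {
  const paragraphs = html.match(/<p\b[^>]*>[\s\S]*?<\/p>/gi) || [];
  for (const p of paragraphs) {
    const text = p
      .replace(/<[^>]+>/g, "")
      .replace(/\[\d+\]/g, "")
      .replace(/\s+/g, " ")
      .trim();
    if (text) return text;
  }
  return null;
}
```

Here topicFromQuestion("What are socks?") gives “sock”, which is the topic used to build the page address. A regex is a fragile way to read HTML, so a real parser would be the safer choice for anything beyond a quick experiment.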

 

The More Advanced Questions

But let’s say we’re faced with a question like “Are all squares rectangles?” or “What is the capital of the most populous country in Asia?” Without doing some advanced calculations of our own, we’re not going to be able to get this information from Wikipedia. Therefore, for lack of time, we can cut some corners and appeal to a different source that has laid the groundwork for us. It turns out that such wonderful sources already exist, in the form of TrueKnowledge.com and MIT’s START Natural Language Question Answering System.

With some careful work, we can compare all of the methods (Wikipedia, TrueKnowledge, START, and Google) and see which is most likely to answer a certain type of question correctly. Then we can look at the user’s question, hand it to the system most likely to return a good answer, and pass that answer back to the user.
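
One way to picture that routing step is a small table of question types. This is purely a sketch: the categories, the regular expressions, and above all which source is assigned to which category are invented here for illustration, and the real pairing would come from comparing the sources’ answers by hand.

```javascript
// Hypothetical routing table, checked from top to bottom.
// The type-to-source pairing is an assumption, not a measured result.
const routes = [
  { type: "definition", test: /^what\s+(is|are)\s+(an?\s+)?\w+\??$/i, source: "Wikipedia" },
  { type: "yes/no", test: /^(is|are|do|does|can|was|were)\b/i, source: "TrueKnowledge" },
  { type: "factual", test: /^(who|what|where|when|which|how)\b/i, source: "START" },
];

// Returns the name of the source to ask, falling back to Google
// when no question type matches.
function chooseSource(question) {
  const q = question.trim();
  const route = routes.find((r) => r.test.test(q));
  return route ? route.source : "Google";
}
```

Under these made-up rules, chooseSource("What are socks?") picks Wikipedia, while chooseSource("Are all squares rectangles?") picks TrueKnowledge.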

Now we’re able to answer advanced questions and look for difficult-to-find data, such as the height of Barack Obama in furlongs.

 

Putting it All Together

Now that we have a system for answering most questions, we can combine it with a system that responds to statements via input-output matching. The result is a functioning chatbot that is well tailored to the Chatterbox Challenge’s format, which tests individual statements in a vacuum rather than holding a conversation.

However, Babbage can still care about continuity between answers through a different innovation: the ability to read its own transcript. Whenever Babbage responds to input, both the input and Babbage’s output are stored in a database. Then, Babbage can have a very rudimentary method of “following the conversation” by looking back at previous lines. Normally, the input “Roger” makes no sense, but if Babbage goes back and sees that it just asked “What is your name?”, Babbage suddenly knows what to do with the input.

Then, when Babbage is later asked “What is my name?”, it will know to go back and read the part of the transcript where the user stated their name. Ideally, this could be scaled to any statement users make about themselves or the world.
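
The transcript idea can be sketched in a few lines. This is an illustration under my own assumptions rather than Babbage’s PHP: a plain array stands in for the database table, and the names transcript, findUserName, and reply, along with the canned replies, are hypothetical.

```javascript
const NAME_QUESTION = "What is your name?";

// In-memory stand-in for the database table: one { input, output } per exchange.
const transcript = [];

const asksForName = (text) => /^what is my name\??$/i.test(text.trim());
const cleanName = (text) => text.replace(/[.!]+$/, "").trim();

// Reads the transcript backwards for the user's answer to the name question.
function findUserName() {
  for (let i = transcript.length - 1; i > 0; i--) {
    if (transcript[i - 1].output === NAME_QUESTION && !asksForName(transcript[i].input)) {
      return cleanName(transcript[i].input);
    }
  }
  return null;
}

// Chooses a reply using the previous exchange as context, then logs the exchange.
function reply(input) {
  const previous = transcript[transcript.length - 1];
  let output;
  if (asksForName(input)) {
    const name = findUserName();
    output = name ? "Your name is " + name + "." : NAME_QUESTION;
  } else if (previous && previous.output === NAME_QUESTION) {
    // On its own this input is meaningless; the previous line gives it meaning.
    output = "Nice to meet you, " + cleanName(input) + ".";
  } else {
    output = findUserName() ? "Tell me more." : NAME_QUESTION;
  }
  transcript.push({ input, output });
  return output;
}
```

Calling reply("Hello"), then reply("Roger"), then reply("What is my name?") produces “What is your name?”, “Nice to meet you, Roger.”, and “Your name is Roger.” in that order. Nothing about the name is stored separately; it is recovered each time by rereading the transcript.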

 

The Results

That’s as far as I got with Babbage over a period of a month. Babbage wasn’t well prepared to field an extensive variety of statements, but the inclusion of the external knowledge sources managed to catch a lot of information.

The Chatterbox Challenge is split into two rounds — a preliminary designed to get a Top 11 Chatbots, and a finals round designed to rank the Top 11. This year, the preliminary questions were:

2011
1) Are you left-handed or right-handed?
2) Are you homosexual, heterosexual, asexual, or bisexual?
3) What do you think is your biggest weakness?
4) What is your zodiac sign?
5) What is the location of the Eiffel Tower?
6) If I was born on February 23, 1980 how old am I?
7) How many sides are there on an octagon?
8) What is YouTube?
9) Can you tell me the name of a famous actor?
10) What’s the name of the computer who recently competed on Jeopardy?

 

And the questions for the finals were:


1) Y did the chicken cross the road?
2) Who will win the 2011 Chatterbox Challenge?
3a) I live in the USA.
3b) Where do I live?
4) What sound does a dead cat make?
5) What’s the last dream u remember having?
6) What is tomorrow’s date?
7) I like ice cream! (Repeated 4 straight times to detect repetition!)
8) Got any plans for the rest of the day?
9) What is ur religion?
10) Do you play any musical instruments?

 

Babbage placed 9th in the preliminaries with a score of 36 (the top bot got a score of 74) and placed 11th in the finals with a score of 29 (the top bot got a score of 90). Generally, Babbage was terrible at answering questions about himself and his personality (such as whether or not he is left-handed), since these answers had to be hardcoded and generally weren’t. This is going to be a key weakness to cover for next year.

However, Babbage outperformed the other bots in exactly the way he was designed to: his ability to get information from the internet allowed him to answer questions such as the location of the Eiffel Tower, which many other bots could not.

 

While I’m not incredibly happy with 11th place, Babbage participated in a field of much stronger chatbots than Ganymede fought against to get his 7th place finish back in 2007 (Sidenote: it’s interesting that Babbage got 11th in 2011 and Ganymede got 7th in 2007) and still managed to get a position in the finals.

Additionally, given that Babbage was not worked on nearly as much as some of the top contenders, I think I did a good job, perhaps getting the highest hypothetical “score per hour spent programming” among all the bots.

 

The Next Step

Of course, I definitely want to leverage this to get an even better place in the 2012 Chatterbox Challenge. My goal will be to not only beat Ganymede’s 7th place position, but actually enter the top 3 and take home a nice medal to put in my dorm room. Next time, I’ll make sure to put a lot more time into the development of my program, building it gradually and writing about it as a part of my blog. As mentioned in my previous two posts about artificial intelligence, I have some pretty different ideas of where I want to head off to next.

And all the while, the Chatterbox Challenge seems like a very good means to measure my progress in amateur Artificial Intelligence.

Also, feel free to talk to Babbage, though I probably won’t be updating him directly for a while.
