How difficult is it to port code from Google App engine to Amazon EC2?

EC2 and AppEngine are very different. EC2 is IaaS (infrastructure as a service), providing on-demand VMs with full root access. It is up to you to set up these VMs and to install and manage your applications on them.

AppEngine is PaaS (platform as a service) wherein you deploy your software (written in their supported languages – Python or Java) to Google's infrastructure and they host and manage it for you.

AppEngine is a lot more hands-off from a management standpoint, but also much less flexible in terms of what you can do with it. AppEngine also doesn't offer some features that may be essential, such as SSL on custom domains (this is a feature they'll be adding soon, though), and it limits what you can do within its environment to a whitelisted subset of the Java/Python APIs (socket APIs, for example, are not available). If your software can conform to AppEngine's more restrictive environment and features, it is an excellent choice.

Where can I learn more about computer science methods and algorithms that match and summarize different pieces of text?

Foundations of Statistical Natural Language Processing

There are a number of books, but I like this one for starting out. Towards the end of the book there is some information on automatic summarization systems. The field is changing rapidly, and nothing is as up to date as the literature and the open-source software frameworks, but this book lays a good foundation for learning the names of the algorithms you need to perform natural language processing tasks.
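To give a feel for what the simplest automatic summarization systems do, here is a sketch of a classic frequency-based extractive approach: score each sentence by how frequent its words are across the whole text and keep the top-scoring sentences in their original order. This is an illustration using only the standard library, not code from the book; the tiny stopword list, the regex sentence splitter, and the `summarize` helper are assumptions made for the example.

```python
import re
from collections import Counter
from typing import List

# Assumed, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that", "for", "on", "with"}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, num_sentences: int = 2) -> List[str]:
    """Return the num_sentences highest-scoring sentences, in original order."""
    sentences = split_sentences(text)
    freqs = Counter(tokenize(text))  # word frequencies over the whole document

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average frequency, so long sentences are not automatically favoured.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in keep]


doc = ("Cats sleep a lot. Cats also chase mice. "
       "Dogs bark at the mailman. Cats and mice are old rivals.")
print(summarize(doc, num_sentences=2))
# ['Cats sleep a lot.', 'Cats also chase mice.']
```

Real summarizers replace the raw frequency score with better sentence representations, but the select-and-rank structure is the same.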

Many people, however, prefer this book for some reason:
Speech and Language Processing

I have not read it personally, but I have been told that it is denser, covers more topics in depth, and has the code samples you need to implement working systems. I am not sure whether any of these things are true.

Natural Language Processing with Python

I enjoyed reading this book and learned a little Python and some linguistics; however, I am not sure I learned how to do anything useful. It's a good book, though.

Programming Collective Intelligence: Building Smart Web 2.0 Applications

This is another great book on Python. It has some natural language processing material in it, but it is mostly limited to bag-of-words models and matrix factorization methods such as non-negative matrix factorization. It has many code samples and is a joy to read. I often copy and paste code from this book to get RSS feed scraping and NLP stuff up and running very quickly.
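The bag-of-words idea is already enough for simple text matching: treat each document as a vector of word counts, ignoring word order, and compare two vectors by the cosine of the angle between them. This is a minimal standard-library sketch, not code from the book; the tokenizer regex and the `best_match` helper are illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    """Count lowercase word tokens; word order is discarded."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 to 1.0)."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document matches nothing
    return dot / (norm_a * norm_b)


def best_match(query: str, candidates: List[str]) -> str:
    """Return the candidate whose bag of words is closest to the query's."""
    q = bag_of_words(query)
    return max(candidates, key=lambda c: cosine_similarity(q, bag_of_words(c)))


candidates = [
    "python web scraping of rss feeds",
    "matrix factorization for topic discovery",
    "cooking pasta at home",
]
print(best_match("scraping rss feeds with python", candidates))
# python web scraping of rss feeds
```

Normalizing by vector length is what keeps a long document from matching everything simply because it contains more words; weighting schemes such as TF-IDF refine the raw counts further.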

How secure is MD5?

I don't expect to see MD5 preimages in the next 6 to 12 months. The cryptographers I know who would have the best expertise at creating preimages are focused on finding collisions in SHA-1, which appears to be "this close." Furthermore, the community has a viable strategy for making progress on SHA-1 collisions, building on Wang's seminal 2004 work on collisions in MD4 and MD5 and on more recent work on collisions in SHA-0.

The few who aren't focused on SHA-1 collisions are busy with the SHA-3 competition, so there isn't a lot of research bandwidth in the community for preimages in the near future. That's my take, anyway.

That being said, Wang came "out of nowhere" in 2004, so it is always dangerous to make these kinds of bets. If you have MD5 in your application, you should start moving away from it now. Even though we don't know how to find preimages yet, we do know how to find chosen-prefix collisions, where we can specify the prefixes of the two colliding messages. Code for finding them has been released and keeps getting faster.

Chosen-prefix collisions can pose problems that are not obvious at first. For example, our team was able to use chosen-prefix collisions to obtain a "rogue CA certificate" from RapidSSL:…

Bottom line: I don't expect preimages in MD5 for at least 6 to 12 months, but you should still move away from MD5 today.
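In practice, moving away from MD5 usually means computing all new digests with a stronger hash while still recognizing existing MD5 values until they have been replaced. The sketch below shows one way to do that with Python's `hashlib`, assuming digests are stored in a hypothetical "algorithm:hexdigest" format; SHA-256 is one reasonable replacement, not the only one. It is aimed at integrity fingerprints; password storage needs a dedicated, deliberately slow key-derivation function instead.

```python
import hashlib
import hmac

# Hypothetical storage format for this sketch: "<algorithm>:<hex digest>".
PREFERRED = "sha256"
LEGACY = "md5"


def fingerprint(data: bytes) -> str:
    """Compute a new fingerprint with SHA-256 instead of MD5."""
    return f"{PREFERRED}:{hashlib.sha256(data).hexdigest()}"


def verify(data: bytes, stored: str) -> bool:
    """Check data against a stored fingerprint, still accepting legacy MD5 entries."""
    algorithm, _, expected = stored.partition(":")
    if algorithm not in (PREFERRED, LEGACY):
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    actual = hashlib.new(algorithm, data).hexdigest()
    # Constant-time comparison on bytes, so any stored string is safe to compare.
    return hmac.compare_digest(actual.encode("utf-8"), expected.encode("utf-8"))


def needs_upgrade(stored: str) -> bool:
    """True for entries still hashed with MD5; re-fingerprint these when you can."""
    return stored.partition(":")[0] == LEGACY
```

Keeping the algorithm name next to each digest is what makes a gradual migration possible: old entries keep verifying while `needs_upgrade` tells you which ones to rewrite.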

What is the best resource for learning bash scripting?

I'd recommend Bash shell: An innovative way to learn data!

It’s also available as an interactive course (with live playgrounds). Besides learning the basics, you’ll get to apply the concepts on four projects.

How much of what secretaries and admin assistants typically do is automatable?

Scheduling meetings:
 – mostly automatable. You can set up calendar software, but if you want to shuffle things around, it usually takes contacting multiple parties to work it out.
Taking messages:
 – mostly automatable. A recording is even better than a transcribed Post-it note. A human is better at blocking solicitors and spam, if you're big enough to have to worry about that.
Travel arrangements:
 – not automatable. Unless you're scheduling limo pickups, restaurant reservations, or other concierge-type services, you're paying someone to click on webpages for you.
 – not quite automatable yet.
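The mechanical half of scheduling meetings, finding a time when everyone is free, is the part calendar software automates; the human part is renegotiating when no slot works. A minimal sketch of that mechanical half, assuming each attendee's busy times are already available as (start, end) datetime pairs. The function name and data shape are hypothetical, not any calendar product's API.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Interval = Tuple[datetime, datetime]


def first_common_slot(
    busy_by_person: List[List[Interval]],
    day_start: datetime,
    day_end: datetime,
    duration: timedelta,
) -> Optional[Interval]:
    """Earliest slot of `duration` within [day_start, day_end] when nobody is busy."""
    # Pool everyone's busy intervals and walk them in start-time order.
    busy = sorted(iv for person in busy_by_person for iv in person)
    cursor = day_start  # earliest moment not yet known to be busy
    for start, end in busy:
        if start - cursor >= duration:
            break  # the gap before this busy block is long enough
        cursor = max(cursor, end)
    if cursor + duration <= day_end:
        return (cursor, cursor + duration)
    return None  # no slot today: time for a human to negotiate


day = datetime(2024, 1, 8)  # arbitrary example date
alice = [(day.replace(hour=9), day.replace(hour=10)),
         (day.replace(hour=13), day.replace(hour=14))]
bob = [(day.replace(hour=9, minute=30), day.replace(hour=11))]
slot = first_common_slot([alice, bob], day.replace(hour=9), day.replace(hour=17),
                         timedelta(minutes=30))
print(slot[0].strftime("%H:%M"), slot[1].strftime("%H:%M"))  # 11:00 11:30
```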

Why are Macs claimed to be virus-free?

Those claims are the result of either…

A.) Misinformation/ignorance.

B.) Out of practicality.

There are "viruses" that target Mac OS X. They needn't be worried about, though, because they aren't actively spreading. That is the case because Unix is well designed for security, Apple has done a fair job of making it difficult for inexperienced crackers to infiltrate, and there is a much easier target for malware available.

What are the principles of "code that documents itself"?

One answer to my own question, because I encountered it this morning.

While it's obvious that class/function/variable names should indicate what they do or contain, this shouldn't take precedence over the "form" of the code. For example:

def get_measure_savings_valid_field_lists_dict():
    ...  # etc

This is silly, IMHO; "def get_measure_savings():" would have sufficed. More importantly, when this function name gets mixed in with the rest of the code, it makes for long, ungainly lines that are very difficult to scan.
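Here is a sketch of what the shorter name might look like in practice, with the detail the long name was trying to carry moved into a docstring and type hints, where it doesn't bloat every call site. Only the name `get_measure_savings` comes from the text above; the body, the input shape, and the notion of a "valid" field (one with a non-None value) are hypothetical, invented for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical input shape: measure name -> {savings field: value, or None if missing}.
Measures = Dict[str, Dict[str, Optional[float]]]


def get_measure_savings(measures: Measures) -> Dict[str, List[str]]:
    """Map each measure to the sorted names of its fields that have a value.

    What the long name spelled out ("valid field lists", "dict") is carried
    here by the docstring and the type hints instead of the identifier.
    """
    return {
        name: sorted(field for field, value in fields.items() if value is not None)
        for name, fields in measures.items()
    }


# The call site stays short enough to scan at a glance.
measures = {"lighting": {"kwh": 120.0, "therms": None}, "hvac": {"therms": 40.0}}
valid_fields = get_measure_savings(measures)
```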

What are the best monospace or fixed-width fonts to use for programming?

I'm still a huge fan of the classic ProFont, originally a Mac-only font that has now been cloned on several other platforms.…

ProFont originally offered 9-point and 12-point versions designed to match the spacing of Monaco exactly, but with clear differentiation of similar characters such as "0" and "O" or "l" and "1". It is incredibly clear, even at 9 points.

How much of what sysadmins typically do is automatable?

While many routine tasks can be automated, any reasonably sized deployment can show a lot of interesting and unexpected behavior under unexpected circumstances. A good sysadmin needs to both automate routine tasks and be able to deal with the unexpected circumstances that come up (and automate the fixes if they happen often enough).
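As a small example of the routine kind, here is a sketch of a check a sysadmin might otherwise do by hand: flag filesystems that are nearly full. It relies only on the standard library's `shutil.disk_usage`; the 90% threshold, the list of paths, and running it from a scheduler such as cron are assumptions for illustration.

```python
import shutil
from typing import List

# Hypothetical threshold; tune it for your own deployment.
DEFAULT_THRESHOLD = 0.90


def check_disk_usage(paths: List[str], threshold: float = DEFAULT_THRESHOLD) -> List[str]:
    """Return a warning line for every path whose filesystem is above the threshold."""
    warnings = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
        if usage.total == 0:
            continue  # pseudo-filesystems can report zero size; nothing to measure
        fraction_used = usage.used / usage.total
        if fraction_used > threshold:
            warnings.append(f"{path}: {fraction_used:.0%} used")
    return warnings


if __name__ == "__main__":
    # Example paths; a scheduled job could mail or page on any output.
    for line in check_disk_usage(["/"]):
        print(line)
```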