Thursday, April 11, 2013

Out of This World Images

Oh, you thought I was talking about cats? No, when I say "out of this world", I mean it literally.

Through the Hubble Legacy program, NASA makes available the raw data captured by one of human kinds greatest achievements. (Well, I think the Hubble Space Telescope is one of our greatest achievements anyway.)

Unfortunately, you won't find JPEG files on the site, but rather a somewhat exotic format called "fits". I'll spare you the details on what exactly a .fits file is, but suffice it to say, it stores a lot of information. Just one channel of the image of galaxy M51, the galaxy used as an example in this article, is almost three hundred megabytes.

Here, I want to take a brief look at working with these images, and how you can create original spacescapes with real data. It's a great way to spruce up any space themed design, plus, you get the geek cred that comes with being able to say you've messed with the same raw data as NASA scientists.

If you'd like to follow along, head on over to the Hubble Legacy Archive, and download an image of your choice. I recommend looking for galaxy M51, because it's easy to find some very high quality images.

You'll also need software appropriate for handling .fits files. You'll need to research what is good for your platform, I'm a Linux user, so I'll be using ImageMagick. Now, some common programs such as GIMP can actually open the .fits files directly, but they can't handle the ultra-high bit depth of the files. I recommend ImageMagick because it properly supports arbitrary bit depth file operations. (Yes, arbitrary bit depth.)

When you first open the .fits file, it is likely to appear black. That's OK, it's just because there is very low light data. The sensor on the Hubble is designed to be sensitive enough to pick up light from distant stars, and also robust enough to view much nearer objects like, say, the sun.

In ImageMagick, I use the Stretch Contrast function to fix the exposure into something that makes visual sense. Once you're happy with the exposure, save the image out to a file.

Tuesday, February 26, 2013

How to set up a LAMP

LAMP is the common acronym for Linux, Apache, MySQL, PHP/Perl. It reflects one of the most common web server setups in use today. I've set up LAMP servers enough times now that I've mostly got it down to a science, and I'd like to share with you how I set up a LAMP.

Choosing the "L"

I use the latest Ubuntu LTS, or Long-Term-Service release. This ensures 6+ years of compatibility, stability, bug fixes, and security fixes. CentOS, Red Hat Enterprise Linux, and Debian Stable are other common choices, but for the purposes of this document, I will assume Ubuntu 12.04.

Basic "A"

The "A" in LAMP is for the Apache web server, and there's really not much choice in that. Make sure you're running a current version, though. You may also choose which modules to enable or install. I try to keep the server light, and I disable CGI and and deflate. You can enable CGI if you need Perl, or enable Deflate to save bandwidth, but I tend to target script performance and load handling over bandwidth. I also use mod_userdir, which you should install if given the option. On Ubuntu, all of these mods come along with the basic apache2 package.

"M" and "M" (and occasionally another "P")

Although PostgreSQL is also a fantastic option, MySQL is still the de facto standard for LAMP stacks. Future versions of Red Hat Enterprise Linux and Ubuntu LTS will switch to the completely compatible fork, MariaDB, and will keep the acronym neatly in tact. MySQL's InnoDB database engine is nearly ACID compliant, fast, and featureful. For this reason, I install MySQL 5.5 or higher for now, and will use MariaDB when it is widely available. I also often install PHPMyAdmin along with MySQL.

Mind your "P" (or "R" or "L" or...)

The last letter(s) in LAMP are the most flexible. PHP is a common choice, but so was Perl (thankfully, not so any more, though you might still need it with CGI), and Python, Ruby, Lisp, and a few others are also gaining in popularity. I'll focus on PHP, since it's what I mostly develop in, and it is what many common software projects such as WordPress, Drupal, and PHPBB are built on. PHP has a lot of libraries built in, but I recommend adding GD, CURL, and SQLite support which are often packaged separately. Most distributions package the Suhosin (security hardening) patch by default, if not, I recommend you install it as well.

Sending eMail

On Linux, getting PHP configured to send eMail is easy. Simply install Postfix, and you're on your way. During installation, you'll be asked for SMTP configuration, so make sure you have that handy.

Accessing the Server

The standard methods of interacting with a web server are SSH, FTP, and SFTP. I recommend ProFTPd and of course the standard SSH server. ProFTPd is my FTP server of choice because it is very easy to set up. You log in with your system account, and your permissions are determined by what they are set as on the server. This makes it secure and easy to configure all in one.

The Magic Command

One of the things I love about setting up Linux servers is that you can get it down to just a few commands. As my parting thought, here's a magical command for Ubuntu flavored servers that will get you up and running in one go.

sudo apt-get update; sudo apt-get install apache2 php5 mysql-server openssh-server postfix proftpd-basic phpmyadmin phpsysinfo php5-gd php5-curl php5-suhosin php5-sqlite; sudo a2enmod userdir; sudo a2dismod cgi deflate;

Tune in next time for how to set up user accounts, access control, per-user websites, DNS integration, and performance tuning for Apache!

Sunday, February 10, 2013

The Importance of Normalcy

In most cases, I champion being unique, and in most cases, that means not being normal. In the case of databases, however, that isn’t the case. In the lingo of database land, being Normal is to be Unique.

Early Databases


In the earliest times, the idea of a database was primitive. If you even had something called a database, it didn’t look much like our databases today. For the most part, programs stored their information in clever flat-file structures. Let’s imagine that you wanted a list of employees at a company. Some are developers, others are managers, some are in the Android department, some in the iOS department, some in the design department, and some in the web development department. An early application might have treated this data as a fancy sort of CSV -- merely separating each of the employee’s fields with a delimiter.

While this works fine for smaller numbers of people, you begin to encounter severe performance problems when you want to manipulate the data based on certain fields. At first, this isn’t a problem. Finding all the people who work under a manager as as simple as matching the “manager” field to the manager’s name. Of course, the wacky new employee (Moreena) who decided to enter their manager’s name as “Kymberlie” instead of “Kimberly” isn’t going to show up. It would make things simpler if the system instead showed options for existing managers so that Moreena can simply select her manager from a list. Generating a list of all existing managers now requires scanning every existing record in the document, a time consuming operation.

The Relational Revolution


Eventually, a new concept, the relational database, was born. Enter: Normalcy. A relational database was so called because it took these spreadsheet like structures, called them tables, and allowed you to specify relationships between them. Along with this came the concept of normalization, meaning that the database was structured in such a way that data was repeated as little as possible, and organized in a way that was as efficient as possible.

If we return to our example earlier, we would no longer store each employee as a single record with all of the information; at least, not exactly. Since multiple people work for the same manager, that manager’s name is duplicated data. We then take that data and “Normalize” it, by putting it in a separate table. The same is true of departments. Each table gets a special column called a primary key. This key is a unique internal way to identify each item. By convention, this is called the “id”. What’s important is that the unique key is internal to the database, and is linked but not equivalent to the actual text. To associate an employee with a manager and with a department, we define two fields for each employee which are a special type called a foreign key which points to the identification column in the table with the information we want to associate. There is now a simple, concise, table which lists the company’s managers, one of whom is Kimberly. When Moreena goes to select a manager, the database no longer needs to scan an entire document. Instead, it simply lists the small list of managers, and when she selects Kimberly from the list, inserts the unique identifier into her manager field as a foreign key, directly linking her to her manager in an efficient way.

Being Normal


Keeping your databases normalized is absolutely important both because it improves the efficiency of your queries, and also because it maintains the integrity of your data. Given our example, if Kimberly decides that she actually likes the way Moreena spelled her name, and gets it legally changed, it would be difficult to apply this to a monolithic structure. It would require the equivalent of a massive find-and-replace operation. Of course, that may result in Kimberly-the-housekeeper who takes out the trash getting her name changed as well. In a normalized database, since only the unique internal identifier links the manager to the employee, you can simply change the manager’s name, and leave the id the same. Thus, the next time it lists your manager, it shows the new name, as referenced by the id. Additionally, modern databases provide functions that further speed up queries on normalized data, allowing even a moderately powerful server to handle literally millions of data elements and return complex queries in a fraction of a second.