12 May 2013

Quercus on Google App Engine - 2.0



The following article originally appeared on JavaAdvent 2012.

It has been 3.5 years to the day since I last wrote about running Quercus on Google App Engine (GAE). The article detailed a crazy experiment to get a PHP application, Wordpress, running on GAE. I still remember it being a painful experience because of the number of changes I had to make to Wordpress to get it working. It was all due to the GAE Java environment being so drastically different from what Java developers were used to. It was heavily sandboxed and had no SQL support. You couldn't launch new Threads. You couldn't write to the file system. But those were the tradeoffs you had to live with if you wanted to deploy your application to the all-wonderful Cloud.
Fast-forward to 2012 and GAE has come a long way. GAE now allows you to spawn new Threads (currently in beta). You still cannot write to the file system, but for what it's worth, there is a new GAE Files API that gives you file-like concepts. And you can run your very own MySQL instances on Google's infrastructure (albeit Google's customized version of MySQL).
What hasn't changed over the years is that you're still expected to hit major roadblocks as you migrate your existing web applications over to GAE. And you'll have to make heavy modifications to your application to 1) make it work and 2) be performant on GAE.
So this is where Quercus comes into the picture. At Caucho, we spent a lot of time getting Quercus to work seamlessly with GAE. Our goal was to abstract the GAE details away so that developers don't have to worry about the fact that the application is running on GAE. Things just work transparently behind the scenes for PHP applications. For example,
  • PHP file_*() functions work just like they did before (including writing to files!)
  • PHP mysql_*() functions and PDO work just like they did before

What this means is that we can run existing PHP applications on GAE without any modifications! Say hello to "Wordpress on Google App Engine" 2.0! But first, let's start with some formalities.

I. Introduction to Quercus

Quercus is Caucho's 100% Java implementation of the PHP language runtime and libraries. Currently, Quercus supports PHP 5.3 language features and contains a multitude of libraries developed in-house including but not limited to apc, bcmath, curl, date, dom, filter, gettext, json, mail, mbstring, mcrypt, pdf, pdo, postgres, reflection, regexp, spl, zip, and zlib.
Quercus can run as a servlet or on the command line. For our case, we're interested in running it as a servlet:
  1. Download the Resin Java Application Server. It comes with Quercus enabled by default and ready to go.
  2. Uncompress the archive and start up Resin with:
        ./bin/resin.sh console
  3. Then create a webapps/ROOT/test.php file with the following contents:
    <?php
    
      phpinfo();
    
    ?>
  4. Browse to http://localhost:8080/test.php and you should see the PHP info page.
Congrats! You have PHP running natively on a Java server.

II. Running Quercus on Tomcat

Okay, so you want to use Tomcat instead of Resin? No worries.
  1. Copy lib/resin.jar from the Resin download above to Tomcat's webapps/ROOT/WEB-INF/lib directory.
  2. Add QuercusServlet to Tomcat's webapps/ROOT/WEB-INF/web.xml file:
    <web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
      <servlet>
        <servlet-name>Quercus Servlet</servlet-name>
        <servlet-class>com.caucho.quercus.servlet.QuercusServlet</servlet-class>
        <init-param>
          <param-name>ini-file</param-name>
          <param-value>WEB-INF/php.ini</param-value>
        </init-param>
      </servlet>
    
      <servlet-mapping>
        <servlet-name>Quercus Servlet</servlet-name>
        <url-pattern>*.php</url-pattern>
      </servlet-mapping>
    
      <welcome-file-list>
        <welcome-file>index.php</welcome-file>
      </welcome-file-list>
    </web-app>
    
Quercus should now be handling all HTTP requests for .php files.

III. Running Quercus on GAE

Quercus runs on top of the GAE Java SDK, so we configure it just like any other servlet. Before continuing, please read and work through Google's Getting Started: Java – Google App Engine.
  1. Create a new GAE project by creating the following directories on your local drive:
        quercusproject
        quercusproject/src
        quercusproject/war
        quercusproject/war/WEB-INF
        quercusproject/war/WEB-INF/lib
    
  2. Copy lib/resin.jar from the Resin download above to quercusproject/war/WEB-INF/lib. Note: I've created a special pre-release Quercus snapshot just for this article. Please download it here on GitHub and use the included jars instead. The snapshot exposes functionality unique to GAE that we originally had hidden.
  3. Create a quercusproject/war/WEB-INF/web.xml file for GoogleQuercusServlet:
    <web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
      <servlet>
        <servlet-name>Quercus Servlet</servlet-name>
        <servlet-class>com.caucho.quercus.servlet.GoogleQuercusServlet</servlet-class>
    
        <init-param>
          <param-name>ini-file</param-name>
          <param-value>WEB-INF/php.ini</param-value>
        </init-param>
      </servlet>
    
      <servlet-mapping>
        <servlet-name>Quercus Servlet</servlet-name>
        <url-pattern>*.php</url-pattern>
      </servlet-mapping>
    
      <welcome-file-list>
        <welcome-file>index.php</welcome-file>
      </welcome-file-list>
    </web-app>
  4. Create a quercusproject/war/WEB-INF/appengine-web.xml file:
    <?xml version="1.0" encoding="utf-8"?>
    <appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
      <application>your_app_id_here</application>
      <version>1</version>
    
      <threadsafe>true</threadsafe>
    
      <static-files>
        <exclude path="/**.php" />
      </static-files>
    </appengine-web-app>
    Note: remember to change “your_app_id_here” to your actual appspot application ID when deploying to GAE.
  5. Create a quercusproject/war/test.php file:
    <?php
    
      phpinfo();
    
    ?>
  6. Start up the GAE development server with Eclipse or with ant:
        ant runserver
  7. Browse to http://localhost:8080/test.php and you should reach the PHP info page titled “Quercus”.
At this point, the test project should be complete and deployable to GAE.

IV. Writing to files on GAE

GAE does not allow applications to write to the local file system. However, Google recently added a file-like API (com.google.appengine.api.files) that applications can use to write streams to the GAE Blobstore or Google Cloud Storage. Quercus leverages this API to grant write capability to PHP file functions. In addition, Quercus goes a step further and merges the local and network file systems together to create a unified view. For example, consider the following script:

<pre>
<?php

  $filename = 'test.txt';
  var_dump(file_get_contents($filename));

  file_put_contents($filename, 'modified at time ' . date('Y-m-d H:i:s O'));
  var_dump(file_get_contents($filename));

?>

Suppose the file test.txt exists on the local file system. The first file_get_contents() call retrieves it. Then we try to overwrite that file. But since GAE does not allow writes to the local file system, Quercus writes to Google Cloud Storage instead. The next file_get_contents() call retrieves the newly written file with the new data, not the old local file. And if the network file is cached in memcache, Quercus will serve that instead. This is all done transparently behind the scenes, and at no point is the application aware that it's not reading from or writing to the local file system.
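
The lookup order described above can be pictured with a small sketch. This is not Quercus's actual implementation: the class name LayeredFileView and its methods are hypothetical, and plain PHP arrays stand in for memcache, the Cloud Storage bucket, and the deployed local files. Dropping the cached copy on every write is also an assumption made only for this sketch.

```php
<?php

// Hypothetical illustration only; each array stands in for one storage layer.
final class LayeredFileView
{
    private $cache = array();  // stands in for memcache
    private $cloud = array();  // stands in for a Cloud Storage bucket
    private $local;            // stands in for the read-only deployed files

    public function __construct(array $localFiles)
    {
        $this->local = $localFiles;
    }

    // Writes never touch the local layer; they land in the cloud layer.
    public function put($name, $data)
    {
        $this->cloud[$name] = $data;
        unset($this->cache[$name]);  // assumption: drop any stale cached copy
    }

    // Reads prefer the cache, then the cloud copy, then the local file.
    public function get($name)
    {
        if (array_key_exists($name, $this->cache)) {
            return $this->cache[$name];
        }
        if (array_key_exists($name, $this->cloud)) {
            $this->cache[$name] = $this->cloud[$name];  // warm the cache
            return $this->cloud[$name];
        }
        // false mirrors what file_get_contents() returns on failure
        return array_key_exists($name, $this->local) ? $this->local[$name] : false;
    }
}

$view = new LayeredFileView(array('test.txt' => 'original local contents'));
var_dump($view->get('test.txt'));   // comes from the local layer
$view->put('test.txt', 'modified');
var_dump($view->get('test.txt'));   // comes from the newer cloud copy
```

The point of the sketch is only the ordering: once a file has been written, the local copy is shadowed and never consulted again for that name.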

A. Setting up Quercus to use Google Cloud Storage

Google Cloud Storage is structured around uniquely-named buckets, which are very similar to mounted disk drives. As with regular drives, buckets can hold a hierarchical list of files and directories. To start things off, we need to enable Google Cloud Storage, create a bucket, and grant our application access to Google Cloud Storage.
  1. Activate Google Cloud Storage and enable billing.
    Google Cloud Storage is free for the first 5GB of storage.
  2. Go to your Google API Console => Google Cloud Storage => Google Cloud Storage Manager. Create a new bucket named “quercusbucket0”.
    If you cannot create a new bucket, then it's very likely your billing is not enabled or is still pending.
  3. Go to your Google App Engine Overview => select or create your application => Application Settings. Make a note of the application's “Service Account Name”. Then go to Google API Console => Team. Create a new teammate with the “Service Account Name”.
    Note: if you have just created a new application, remember to update your quercusproject/war/WEB-INF/appengine-web.xml with the actual application ID.
  4. Your GAE application should now have access to most of your Google APIs including Google Cloud Storage (but not Google Cloud SQL unfortunately). We now need to give Quercus the name of the bucket we just created. We do so by setting a servlet init-param in the web.xml file:
    <web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
      <servlet>
        <servlet-name>Quercus Servlet</servlet-name>
        <servlet-class>com.caucho.quercus.servlet.GoogleQuercusServlet</servlet-class>
    
        <init-param>
          <param-name>ini-file</param-name>
          <param-value>WEB-INF/php.ini</param-value>
        </init-param>
    
        <init-param>
          <param-name>cloud-storage-bucket</param-name>
          <param-value>quercusbucket0</param-value>
        </init-param>
      </servlet>
    
      <servlet-mapping>
        <servlet-name>Quercus Servlet</servlet-name>
        <url-pattern>*.php</url-pattern>
      </servlet-mapping>
    
      <welcome-file-list>
        <welcome-file>index.php</welcome-file>
      </welcome-file-list>
    </web-app>
    Note: the bucket name can also be set in a php.ini file, but that capability is broken at the moment.
    Note: you do not need to enable Google Cloud Storage and billing in order to test on the development server, but you do need to set the cloud-storage-bucket init-param.
  5. Create a quercusproject/war/vfs.php file with the following: 
    <pre>
    <?php
    
      $filename = 'test.txt';
      var_dump(file_get_contents($filename));
    
      file_put_contents($filename, 'modified at time ' . date('Y-m-d H:i:s O'));
      var_dump(file_get_contents($filename));
    
    ?>
    
  6. Launch the development server or deploy to GAE. Browse to the vfs.php page. Refresh a few times. You should see something like the following:
    string(42) "modified at time 2012-12-05 22:04:21 +0000"
    string(42) "modified at time 2012-12-05 22:25:28 +0000"
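
As a quick sanity check on that output, the 42 reported by var_dump() is simply the byte length of the string vfs.php writes. The short sketch below uses only standard PHP functions; the variable names are chosen here for illustration. On the very first request, the first line would instead show whatever test.txt held before any write.

```php
<?php

// The same pieces vfs.php concatenates before writing the file.
$prefix = 'modified at time ';
$stamp  = date('Y-m-d H:i:s O');    // e.g. 2012-12-05 22:04:21 +0000

var_dump(strlen($prefix));            // int(17)
var_dump(strlen($stamp));             // int(25)
var_dump(strlen($prefix . $stamp));   // int(42)
```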
    

V. Executing SQL queries on GAE

For the longest time, Google's distributed key-value store was GAE's only persistent store option. However, not all use cases can be boxed into key-values, so Google Cloud SQL was a welcome addition to GAE. It does come at a price because Cloud SQL has no free quota and you're forced to pay from the start.
Update: as of 2012-11-08, Google is offering a free 6-month trial for a tiny D0 instance.
As with the PHP file functions, Quercus transparently maps PHP mysql and PDO functions to Cloud SQL. The only caveat is that the host url must begin with "jdbc:google:rdbms://" instead of localhost or a remote IP or domain name. Quercus will recognize the following and connect to Cloud SQL appropriately:

<?php

  $db = mysql_connect('jdbc:google:rdbms://foo:bar');

?>
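
Because the prefix is easy to mistype, it can help to build and check the host string in one place. The following is a minimal sketch, not part of Quercus: the constant CLOUD_SQL_PREFIX and both helper functions are hypothetical names, and the project:instance layout is assumed from the project_name:instance_name example later in this article.

```php
<?php

// Prefix quoted in the article; the constant name itself is hypothetical.
define('CLOUD_SQL_PREFIX', 'jdbc:google:rdbms://');

// Hypothetical helper: build the host string from project and instance names.
function cloud_sql_host($project, $instance)
{
    return CLOUD_SQL_PREFIX . $project . ':' . $instance;
}

// Hypothetical helper: true only when the host starts with the exact prefix.
function is_cloud_sql_host($host)
{
    return strncmp($host, CLOUD_SQL_PREFIX, strlen(CLOUD_SQL_PREFIX)) === 0;
}

$host = cloud_sql_host('project_name', 'instance_name');
var_dump($host);
var_dump(is_cloud_sql_host($host));                          // bool(true)
var_dump(is_cloud_sql_host('localhost'));                    // bool(false)
var_dump(is_cloud_sql_host('jdbc:google:rdbms//foo:bar'));   // bool(false): colon missing
```

A check like this catches a dropped colon before the string ever reaches mysql_connect().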

A. Setting up the Development Server to use local MySQL

To do testing, we need to set up the development server to use the local MySQL instance.
  1. Download the MySQL JDBC driver at http://dev.mysql.com/downloads/connector/j/. Uncompress mysql-connector-java-5.1.22.tar.gz and place mysql-connector-java-5.1.22-bin.jar into your <appengine-sdk>/lib/impl directory.
  2. Follow the instructions at Google's Using a Local MySQL Instance During Development.
    If you're using ant, then all you need to do is add the following to your build.xml: 
      <target name="runserver" depends="datanucleusenhance"
          description="Starts the development server.">
        <dev_appserver war="war">
          <options>
            <arg value="--jvm_flag=-Drdbms.server=local"/>
            <arg value="--jvm_flag=-Drdbms.driver=com.mysql.jdbc.Driver"/>
            <arg value="--jvm_flag=-Drdbms.url=jdbc:mysql://127.0.0.1:3306/wordpress0?user=root"/>
          </options>
        </dev_appserver>
      </target>

    For the development server only, any dummy JDBC url string that begins with jdbc:google:rdbms:// gets redirected to the local MySQL instance. But you still need to pass in the correct credentials to the mysql functions for both the development server and GAE:
      // $user and $pass must be valid
      // Google Cloud SQL default user is root with an empty password
      $db = mysql_connect('jdbc:google:rdbms://foo:bar', $user, $pass);

B. Setting up Google Cloud SQL

If we wish to deploy to GAE, we also need to enable Google Cloud SQL and billing.
  1. Activate Google Cloud SQL and turn on billing. Create a new Cloud SQL instance.
    Google Cloud SQL is priced at two levels: a per-month or a per-use plan. There is currently no free trial quota.
  2. Grant your application access to the newly-created Cloud SQL instance.
  3. Create a quercusproject/war/mysql.php file with the following: 
    <pre>
    <?php
    
      $jdbc_url = 'jdbc:google:rdbms://project_name:instance_name';
      $db = mysql_connect($jdbc_url);
    
      $result = mysql_query('SHOW DATABASES', $db);
    
      while (($row = mysql_fetch_assoc($result))) {
        var_dump($row);
      }
    
    ?>
    Note: remember to update $jdbc_url with the actual JDBC url for your MySQL instance.
    Note: Google Cloud SQL's root user has an empty default password.
  4. Deploy the application to GAE and browse to mysql.php. You should see a list of database names from your MySQL instance.

VI. Putting it all together – Running Wordpress on GAE

We now have all the pieces to install a stock Wordpress onto GAE.
  1. Download Wordpress at http://wordpress.org/download/.
  2. Uncompress the archive into your quercusproject/war directory.
  3. Make sure the bucket name is set in web.xml.
  4. Create a new MySQL database for Wordpress:
        CREATE DATABASE wordpress0;
    
  5. Deploy your application to GAE.
  6. Browse to wordpress/ and follow the prompts to install Wordpress. The default database user should be root with an empty password. Your Google Cloud SQL instance's base JDBC URL should go into the “Database Host” field. Wordpress will automatically create a custom config file and populate the database with initial data.

  7. Wordpress will then prompt you to configure an admin account.
  8. And then you're done!
Congrats! Your blog is installed and lives in the Cloud.
Here is a sample Wordpress site running on GAE: http://wordpress-on-quercus-2.appspot.com/
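
For reference, the database settings that the installer writes into its config file should look roughly like the sketch below. DB_NAME, DB_USER, DB_PASSWORD and DB_HOST are the standard Wordpress constants; the values are assumptions taken from the steps above (the wordpress0 database, the root user with an empty password, and a placeholder JDBC URL), so substitute your own project and instance names.

```php
<?php

// Assumed values from the steps above; the installer generates the real file.
define('DB_NAME', 'wordpress0');
define('DB_USER', 'root');
define('DB_PASSWORD', '');   // Cloud SQL's root user has an empty default password
define('DB_HOST', 'jdbc:google:rdbms://project_name:instance_name');   // placeholder names
```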

VII. Source Code

The source code for the examples is available on GitHub.
Author: Nam Nguyen is a Software Engineer at Caucho Technology in San Francisco/San Diego, CA. He is currently the lead for the Quercus project.
Meta: this post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on! Want to write for the blog? We are looking for contributors to fill all 24 slots and would love to have your contribution! Contact Attila Balazs to contribute!
