Open Source Part of Speech Taggers
I'm on a new project at work that requires part-of-speech tagging. To that end, I'll be evaluating some of the existing open source NLP/POS taggers. I'll post results once I'm done, but for now here is a list of current open source POS taggers.
- Stanford (GPL)
- LBJ/POS (BSD-like)
- Berkeley Parser (GPL)
- XTAG/Spinal (GPL)
- MorphAdorner (NCSA/BSD-like)
- OpenNLP (Apache)
I'll be concentrating on MorphAdorner and OpenNLP, as they seem to have the most favorable licensing and most activity.
Accessing Facebook Fan Page Video tagging settings
If you run a fan page, you probably already know you can allow your fans to tag photos you post (and also control whether they can post pictures to your page). But did you know you can do the same for videos? You can, but the setting is hidden.
Steps to access your video settings:
1) Go to your Facebook Page
2) Click the "Edit Info" link near the top of the page
3) Click the "Apps" item on the left hand menu
4) Click "Go To App" under the "Photos" section
5) (This is the trick.) Look at your URL bar: you should see a section that says "aid=2305272732". Replace it with "aid=2392950137" and hit enter. Now you can manage whether your fans can tag and post videos.
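If you'd rather not edit the URL by hand, the swap in step 5 can be sketched in Python. This is only an illustration: the example URL is a made-up placeholder (the real settings URL will look different), and the function name is my own. Only the two "aid" values come from the steps above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

PHOTOS_AID = "2305272732"  # value seen in the URL bar in step 5
VIDEOS_AID = "2392950137"  # value to replace it with

def swap_app_id(url, old=PHOTOS_AID, new=VIDEOS_AID):
    """Return url with aid=<old> replaced by aid=<new>; other parameters are kept."""
    parts = urlsplit(url)
    query = [(key, new if (key == "aid" and value == old) else value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)]
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical URL, used only to show the substitution.
example = "https://www.facebook.com/pages/edit/?id=12345&aid=2305272732"
print(swap_app_id(example))
```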
New WordPress plugin published
I've just released a small WordPress plugin for showing PayPal funds received versus a target amount (with time periods, so you can track a monthly goal, for instance).
You can find it here
Installing Mono and ASP.Net on Bluehost (and other shared hosting providers)
I got a request from a friend the other day to get Mindtouch working on Bluehost. It's not working yet, but Mono, XSP, and mod_mono are all fully working, and I thought I'd share the process. It's pretty easy to get everything compiled and running, and it only takes a few modifications to the build files.
details after the break...
Flex SWC class parser
Quick tool that prints out all the available classes exported by a SWC file.
Usage: swfinfo.py <swcfile>
#!/usr/bin/python
# Filename: swc-info.pl
# Author : Lokkju Brennr <lokkju@lokkju.com>
# License : GPL
# Copyright 2010
import sys
import zipfile
from xml.etree import ElementTree

def usage():
    print("%s <swcfile>" % (sys.argv[0]))
    print(" Prints all classes exported by the provided swc(s). Supports wildcard globbing")
    sys.exit(2)

def main():
    # parse command line options
    if len(sys.argv) < 2:
        usage()
    files = sys.argv[1:]
    for swcfile in files:
        # a SWC is a zip archive; catalog.xml lists what it exports
        z = zipfile.ZipFile(swcfile, 'r')
        catalog = z.open("catalog.xml")
        catalogxml = catalog.read()
        catalog.close()
        z.close()
        tree = ElementTree.XML(catalogxml)
        defs = tree.findall(".//{http://www.adobe.com/flash/swccatalog/9}def")
        for d in defs:
            # named def_id/def_type so the id() and type() builtins are not shadowed
            def_id = d.attrib.get("id")
            def_type = d.attrib.get("type")
            # only print definitions that carry no type attribute
            if def_type is None:
                print("%s:%s" % (swcfile, def_id))

if __name__ == "__main__":
    main()
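To try the same lookup without a real SWC at hand, here is a minimal sketch that builds a fake archive in memory and runs the same query against it. The catalog.xml content is a made-up, heavily simplified stand-in (a real catalog has more elements and attributes); only the namespace and the "def" lookup come from the script above. The function names are my own.

```python
import io
import zipfile
from xml.etree import ElementTree

SWC_NS = "{http://www.adobe.com/flash/swccatalog/9}"

def list_defs(swc_file):
    """Return the id of every <def> in catalog.xml that has no type attribute."""
    with zipfile.ZipFile(swc_file, "r") as z:
        tree = ElementTree.XML(z.read("catalog.xml"))
    return [d.attrib.get("id") for d in tree.iter(SWC_NS + "def")
            if d.attrib.get("type") is None]

def make_fake_swc():
    """Build an in-memory zip holding a simplified, invented catalog.xml."""
    catalog = (
        '<swc xmlns="http://www.adobe.com/flash/swccatalog/9">'
        '<libraries><library path="library.swf">'
        '<script name="custom/MyButton">'
        '<def id="custom:MyButton"/>'
        '<def id="custom:MyLabel"/>'
        '</script></library></libraries></swc>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("catalog.xml", catalog)
    buf.seek(0)
    return buf

print(list_defs(make_fake_swc()))
```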
Supporting dynamic FlexFileSets in Flex’s compc and mxmlc Ant tasks
If you are using the mxmlc and compc tasks in Ant to compile Flex code, there is no documented way to make the fileset-like children accept a dynamic include pattern, that is, one you set based on conditionals.
In my case, I need to keep a list of included libraries in my build.properties and include only those in my compc task. The solution is to use patternsets and a custom Ant macro. See the source below, but essentially you do the following:
- Create a new patternset, assigning it an id
- Use the append.to.patternset macro to add a new patternset for each pattern in your list of library patterns (as defined in your build.properties, or dynamically, or...)
- Assign that patternset to the library-path of compc
Important: All the patterns must descend from the same root directory as set in the library-path. If you need multiple root directories, you must use multiple library-path directives and multiple patternset refids.
build.xml:
<?xml version="1.0" encoding="utf-8"?>
<project name="My Component Builder" basedir=".">
    <taskdef resource="flexTasks.tasks" classpath="${basedir}/flexTasks/lib/flexTasks.jar" />
    <property file="build.properties"/>
    <property name="FLEX_HOME" value="C:/flex/sdk"/>
    <property name="DEPLOY_DIR" value="c:/jrun4/servers/default/default-war"/>
    <property name="COMPONENT_ROOT" value="components"/>
    <!-- Rebuilds the patternset named by @{patternset} as (old contents + nested patterns) -->
    <macrodef name="append.to.patternset">
        <attribute name="patternset"/>
        <element name="nested" optional="yes" implicit="true"/>
        <sequential>
            <patternset id="tmp">
                <patternset refid="@{patternset}"/>
                <nested/>
            </patternset>
            <patternset id="@{patternset}"><patternset refid="tmp"/></patternset>
            <patternset id="tmp"/>
        </sequential>
    </macrodef>
    <patternset id="compc.library-path" />
    <!-- Note: the for task is not part of core Ant; it comes from ant-contrib, which needs its own taskdef -->
    <for list="${compc.libraries}" param="lib">
        <sequential>
            <append.to.patternset patternset="compc.library-path">
                <patternset>
                    <include name="@{lib}" />
                </patternset>
            </append.to.patternset>
        </sequential>
    </for>
    <target name="main">
        <compc
            output="${DEPLOY_DIR}/MyComps.swc"
            include-classes="custom.MyButton custom.MyLabel">
            <source-path path-element="${basedir}/components"/>
            <include-file name="f1-1.jpg" path="assets/images/f1-1.jpg"/>
            <include-file name="main.css" path="assets/css/main.css"/>
            <library-path append="true" dir="${compc.libdir}">
                <patternset refid="compc.library-path" />
            </library-path>
        </compc>
    </target>
    <target name="clean">
        <delete>
            <fileset dir="${DEPLOY_DIR}" includes="MyComps.swc"/>
        </delete>
    </target>
</project>
build.properties:
compc.libdir=${rootdir}/libs/
compc.libraries=lib1.swc,plib*.swc,**/type.swc
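To make the loop's effect concrete, here is a small Python sketch (not part of the Ant build) that splits the compc.libraries value the same way and prints a flattened equivalent of the patternset the build accumulates. The real build nests patternsets inside each other rather than producing one flat list, and the function name is my own, so treat this as an illustration only.

```python
from xml.etree import ElementTree as ET

def build_patternset(libraries, refid="compc.library-path"):
    """Turn a comma-separated pattern list into a <patternset> element."""
    patterns = [p.strip() for p in libraries.split(",") if p.strip()]
    patternset = ET.Element("patternset", {"id": refid})
    for pattern in patterns:
        # one <include> per entry, like one append.to.patternset call per loop pass
        ET.SubElement(patternset, "include", {"name": pattern})
    return patternset

# Value copied from the build.properties above.
ps = build_patternset("lib1.swc,plib*.swc,**/type.swc")
print(ET.tostring(ps, encoding="unicode"))
```

Every pattern ends up relative to the single dir set on library-path, which is why all of them must share one root directory.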
Adding functionality to HTSQL v2
HTSQL is a very cool open source product that gives you a RESTful interface to multiple database backends. Because it uses its own simple but very powerful syntax, you avoid most of the risks involved in passing in SQL. Currently it supports SQLite and PostgreSQL, but for my current project, I need to support Geometry columns.
For the first draft, I just wanted to add support for Spatialite, a spatially enabled version of SQLite. Since SQLite is already supported, this turned out to be relatively easy - though in hindsight, I may not even have implemented it in the easiest way possible - but I'll get to that in another post.
So, each database backend is in a namespace called htsql_[name], and they all subclass files in the main htsql namespace. I started off by cloning the htsql_sqlite tree into a new namespace called htsql_spatialite, and ripped out most of the code, leaving me with a basic structure. I then subclassed any SQLite classes I wanted to override - most importantly:
- Changed connect.py to import pyspatialite instead of pysqlite2.
- Added my own Column and Data types (Domains) in domain.py
- Modified introspect.py to handle my custom Domains, as well as to handle the blank Column type sometimes given for Geometry columns
I also, and here is where most of my functionality was added, overrode a few classes in tr/serializer.py:
- A new class, SpatialiteSerializeLeafReference (subclassing SerializeLeafReference), tests whether I am selecting a Geometry column and, if so, wraps it in the "AsText" function to return WKT.
- A new Adaptor, FormatGeometry, handles the representation of the WKT when it is returned to the client. Right now, only HTML is supported, but JSON, CSV, and the rest are easy to add in the same way.
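The idea behind the serializer override can be sketched outside of HTSQL. The snippet below is not HTSQL code and uses none of its classes: the function name and the list of geometry type names are assumptions made for illustration. It only shows the decision the override makes, which is to wrap a column in AsText when its declared type is a geometry type.

```python
# Assumed set of declared type names to treat as geometry; adjust as needed.
GEOMETRY_TYPES = {"GEOMETRY", "POINT", "LINESTRING", "POLYGON"}

def select_expression(column_name, declared_type):
    """Return the SQL fragment used to select a column, as WKT if it is geometry."""
    if (declared_type or "").upper() in GEOMETRY_TYPES:
        return 'AsText("%s")' % column_name
    return '"%s"' % column_name

# Hypothetical table with one plain column and one geometry column.
columns = [("name", "TEXT"), ("geom", "POLYGON")]
sql = 'SELECT %s FROM "places"' % ", ".join(
    select_expression(name, ctype) for name, ctype in columns)
print(sql)
```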
The last thing you have to do is add a line in setup.py to point to your new database engine's entry point - it is in a list called ENTRY_POINTS.
Interestingly, I think I could make better use of the plugin architecture, but as I'm just discovering HTSQL, and there aren't that many samples, nor much documentation, I'm pretty happy with what I accomplished.
You can see the full source of my additions at https://bitbucket.org/lokkju/htsql-appengine/src/3d7b7d8e8580/
Converting from Spatialite to PostGIS
A quick one-liner using ogr2ogr to convert from Spatialite to PostGIS:
ogr2ogr -f PostgreSQL PG:"host=host_ip user=username password=password dbname=database" -lco LAUNDER="YES" sqlite.sqlitefile -skipfailures
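If you need to run the conversion from a script, the same command can be assembled as an argument list. This is a minimal sketch: the function name and the sample values are placeholders of my own, and the flags are exactly the ones in the one-liner above. The shell quotes in the one-liner disappear here because each list item is already a single argument.

```python
import shlex

def ogr2ogr_args(sqlite_path, host, user, password, dbname):
    """Build the argument list for the Spatialite to PostGIS one-liner."""
    conn = "PG:host=%s user=%s password=%s dbname=%s" % (host, user, password, dbname)
    return ["ogr2ogr", "-f", "PostgreSQL", conn,
            "-lco", "LAUNDER=YES", sqlite_path, "-skipfailures"]

# Placeholder values, as in the one-liner.
args = ogr2ogr_args("sqlite.sqlitefile", "host_ip", "username", "password", "database")

# Print a shell-safe rendering; to actually run it, pass the list to
# subprocess.run(args, check=True) on a machine with GDAL installed.
print(" ".join(shlex.quote(a) for a in args))
```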
Setting up PostGIS 1.5 on PostgreSQL 8.4.1 (on Debian)
I found that setting up the template database for PostGIS was somewhat poorly documented, so here are the steps:
First, create a role that will own the tables within the template database:
psql -c "CREATE ROLE gisgroup;"
Second, create and populate the template database:
createdb -E UNICODE template_postgis
createlang -d template_postgis plpgsql
psql -d template_postgis < /usr/share/postgresql/8.4/contrib/postgis-1.5/postgis.sql
psql -d template_postgis < /usr/share/postgresql/8.4/contrib/postgis-1.5/spatial_ref_sys.sql
psql -d template_postgis < /usr/share/postgresql/8.4/contrib/postgis_comments.sql
Third, set the ownership to the role you created:
psql -c "ALTER TABLE geometry_columns OWNER TO gisgroup;" template_postgis
psql -c "ALTER TABLE spatial_ref_sys OWNER TO gisgroup;" template_postgis
Fourth, we create the user for our database:
psql -c "CREATE USER yourgisuser WITH PASSWORD 'yourpassword';"
psql -c "GRANT gisgroup TO yourgisuser;"
Fifth, and last, we create a new PostGIS-enabled database:
createdb -T template_postgis -O yourgisuser your_new_postgis_database
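Once the template exists, steps four and five are the only ones you repeat for each new database. Here is a minimal sketch that produces those commands as argument lists for a given user and database name. The function name is my own, and the values are interpolated without any escaping, so it assumes trusted input.

```python
def new_gis_database_commands(user, password, dbname,
                              group="gisgroup", template="template_postgis"):
    """Return the psql/createdb invocations for steps four and five."""
    return [
        ["psql", "-c", "CREATE USER %s WITH PASSWORD '%s';" % (user, password)],
        ["psql", "-c", "GRANT %s TO %s;" % (group, user)],
        ["createdb", "-T", template, "-O", user, dbname],
    ]

# Dry run with the placeholder names used above; each list could be handed
# to subprocess.run(cmd, check=True) by a user allowed to create roles.
for cmd in new_gis_database_commands("yourgisuser", "yourpassword",
                                     "your_new_postgis_database"):
    print(cmd)
```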
Using Office Automation on IIS7
Though there are a lot of articles out there on Office automation in .NET (most of them telling you not to do it), very few cover how to get Office automation up and running under IIS7 on a 64-bit machine. It is possible.
I needed to do this recently, and found one hint on how to get it working at http://forums.asp.net/t/1328690.aspx . The key is to use Process to launch the application you need, then attach to it.
Sample code in C#:
// Requires: using System; using System.Diagnostics; using Microsoft.Office.Interop.Word;
// WORD_PATH (path to the Word executable) and filename (document to open) must be defined by the caller.
Microsoft.Office.Interop.Word.Application app = null;
Process proc = null;
Document doc = null;
try
{
    // Launch Word ourselves, hidden, instead of letting COM start it
    ProcessStartInfo procinfo = new ProcessStartInfo(WORD_PATH, "");
    procinfo.WorkingDirectory = AppDomain.CurrentDomain.BaseDirectory;
    procinfo.CreateNoWindow = true;
    procinfo.WindowStyle = ProcessWindowStyle.Hidden;
    proc = Process.Start(procinfo);
    proc.WaitForInputIdle();
    // Attach to the running Word instance
    app = (Microsoft.Office.Interop.Word.Application)System.Runtime.InteropServices.Marshal.GetActiveObject("Word.Application");
    if (app == null) { throw new Exception("Word not found"); }
    app.Visible = false;
    app.DisplayAlerts = WdAlertLevel.wdAlertsNone;
    #region Declare Params
    object fileName = filename;
    object ConfirmConversions = false;
    object ReadOnly = true;
    object AddToRecentFiles = Type.Missing;
    object PasswordDocument = Type.Missing;
    object PasswordTemplate = Type.Missing;
    object Revert = Type.Missing;
    object WritePasswordDocument = Type.Missing;
    object WritePasswordTemplate = Type.Missing;
    object Format = Type.Missing;
    object Encoding = Type.Missing;
    object Visible = false;
    object OpenAndRepair = Type.Missing;
    object DocumentDirection = Type.Missing;
    object NoEncodingDialog = Type.Missing;
    object XMLTransform = Type.Missing;
    #endregion
    doc = app.Documents.Open(ref fileName, ref ConfirmConversions, ref ReadOnly, ref AddToRecentFiles, ref PasswordDocument, ref PasswordTemplate, ref Revert, ref WritePasswordDocument, ref WritePasswordTemplate, ref Format, ref Encoding, ref Visible, ref OpenAndRepair, ref DocumentDirection, ref NoEncodingDialog, ref XMLTransform);
    // DO WHATEVER YOU NEED TO HERE
}
finally
{
    // Clean up in order: document, application, process. Each step swallows
    // its own errors so that the later steps still run.
    object saveChanges = WdSaveOptions.wdDoNotSaveChanges;
    object OriginalFormat = Type.Missing;
    object RouteDocument = Type.Missing;
    try
    {
        ((_Document)doc).Close(ref saveChanges, ref OriginalFormat, ref RouteDocument);
        doc = null;
    }
    catch { }
    try
    {
        ((Microsoft.Office.Interop.Word._Application)app).Quit(ref saveChanges, ref OriginalFormat, ref RouteDocument);
        app = null;
    }
    catch { }
    try
    {
        proc.Kill();
    }
    catch { }
}
