Determining content type with Apache Tika

About Apache Tika

Apache Tika is hosted by the Apache Software Foundation and detects a wide range of file and content types. The project's list of supported formats includes many document formats, e.g. text/plain, text/xml, the proprietary Microsoft OOXML and the office standard OpenDocument. Tika also recognizes images (image/gif, image/jpeg, image/bmp or image/tiff), videos (video/avi, video/mpeg or video/mp4) and audio (audio/ogg, audio/x-wav or audio/mpeg). Even feeds (application/rss+xml, application/atom+xml) can be recognized. And many, many more …

Different modes for H2 Database Engine for testing

When testing software that needs a database to persist data, there are different ways to provide one. One way is to use a real database engine instance for testing purposes only, but setting up a full-scale database server is oversized for testing. A dedicated, simple-to-use database is a better fit. Furthermore, the data does not need to be persisted, since it is no longer needed once the test has finished. For performance reasons, a non-persistent database engine is preferable. There are many solutions for in-memory database testing; one of them is H2. Let’s have a look at how H2 can be configured for this purpose.
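
As a rough illustration of the non-persistent setup described above, the following sketch builds the JDBC URLs for H2's in-memory modes. The class `H2TestUrls` and its method names are hypothetical helpers invented for this example and are not part of H2. Only the URL prefix `jdbc:h2:mem:` and the `DB_CLOSE_DELAY=-1` setting come from H2 itself. The sketch only assembles strings, so it runs without the H2 driver on the classpath.

```java
// Hypothetical helper: class and method names are illustrative, not part of H2.
final class H2TestUrls {

    private H2TestUrls() {
        // utility class, no instances
    }

    /** Private in-memory database: every connection gets its own empty database. */
    static String privateInMemory() {
        return "jdbc:h2:mem:";
    }

    /** Named in-memory database: connections in the same JVM that use this name share it. */
    static String namedInMemory(String name) {
        return "jdbc:h2:mem:" + name;
    }

    /**
     * Named in-memory database whose content is kept until the JVM exits,
     * even after the last connection has been closed.
     */
    static String namedInMemoryKeptOpen(String name) {
        return namedInMemory(name) + ";DB_CLOSE_DELAY=-1";
    }

    public static void main(String[] args) {
        // "testdb" is an arbitrary example name.
        System.out.println(privateInMemory());
        System.out.println(namedInMemory("testdb"));
        System.out.println(namedInMemoryKeptOpen("testdb"));
    }
}
```

With the H2 driver on the test classpath, one of these URLs could be passed to `java.sql.DriverManager.getConnection(url, user, password)`. Without `DB_CLOSE_DELAY=-1`, a named in-memory database is discarded as soon as its last connection closes, which may or may not be what a given test wants.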