Determining content type with Apache Tika

About Apache Tika

Apache Tika is a project hosted by the Apache Software Foundation. It detects a wide range of file and content types, and the project publishes a full list of supported formats. Many document formats appear on that list, e.g. text/plain, text/xml, the proprietary Microsoft OOXML, or the office standard OpenDocument. Tika can also recognize images (image/gif, image/jpeg, image/bmp or image/tiff), videos (video/avi, video/mpeg or video/mp4) and audio files (audio/ogg, audio/x-wav or audio/mpeg). Even feeds (application/rss+xml, application/atom+xml) can be recognized, and many, many more … Continue reading

Different modes of the H2 Database Engine for testing

When testing software that persists data in a database, there are different ways to provide that database. One way is to use a real database engine instance for testing purposes only, but setting up a full-scale database server is oversized for that. A dedicated, simple-to-use database is a better fit. Furthermore, the data does not need to be persisted, because it is no longer needed once the test has finished. For performance reasons, a non-persistent database engine would be ideal. There are many solutions for in-memory database testing. One of them is H2. Let’s have a look at how H2 can be configured for this purpose. Continue reading
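As a rough illustration of what "different modes" can mean in practice, the sketch below builds JDBC URLs for three common H2 setups: in-memory, embedded file, and TCP server. The class name `H2JdbcUrls` and its helper methods are hypothetical and only assemble strings. Actually connecting would require the H2 driver on the classpath and a call such as `DriverManager.getConnection(url)`, which is not shown here.

```java
// Hypothetical helper, not part of H2: it only assembles JDBC URL strings.
final class H2JdbcUrls {

    private H2JdbcUrls() {
    }

    // Named in-memory database. The content is gone once the JVM exits.
    // DB_CLOSE_DELAY=-1 keeps the content for as long as the JVM is alive,
    // even after the last connection has been closed.
    static String inMemory(String name) {
        return "jdbc:h2:mem:" + name + ";DB_CLOSE_DELAY=-1";
    }

    // Embedded mode: the database lives in a local file and survives restarts.
    static String embeddedFile(String path) {
        return "jdbc:h2:file:" + path;
    }

    // Server mode: connects over TCP to a separately started H2 server.
    static String server(String host, String path) {
        return "jdbc:h2:tcp://" + host + "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(inMemory("testdb"));
        System.out.println(embeddedFile("./data/testdb"));
        System.out.println(server("localhost", "~/testdb"));
    }
}
```

For unit tests, the in-memory URL is usually the interesting one, since nothing is written to disk and every test run starts from an empty database.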

Running JUnit tests in a specific order

When running tests with JUnit, the test execution order is deterministic by default, but it is not predictable. Sometimes it is useful to enforce a specific order, e.g. when one of your tests fails intermittently and you have no idea why. Maybe your production code is not thread-safe, or some other side effect influences your test without you noticing, e.g. a cache that was filled with results by a test that ran earlier. Continue reading
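To make "deterministic but not predictable" a bit more concrete, here is a small sketch in plain Java with no JUnit dependency. It assumes, to the best of my knowledge, that the default sorter in JUnit 4.11 and later orders test methods by the hash code of their names, and that `@FixMethodOrder(MethodSorters.NAME_ASCENDING)` switches to alphabetical order. The class `TestOrderSketch` and its members are hypothetical and only mimic these two orderings on a list of method names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not JUnit code: it sorts plain method names.
final class TestOrderSketch {

    private TestOrderSketch() {
    }

    // Alphabetical order: easy to predict by looking at the method names.
    static final Comparator<String> BY_NAME = Comparator.naturalOrder();

    // Hash-based order: the same on every run, but hard to guess from the names.
    // Names with equal hash codes fall back to alphabetical order.
    static final Comparator<String> BY_HASH_THEN_NAME =
            Comparator.<String>comparingInt(String::hashCode)
                    .thenComparing(Comparator.naturalOrder());

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sorted(List<String> names, Comparator<String> order) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = List.of("createsUser", "findsUser", "deletesUser");
        System.out.println("by name: " + sorted(names, BY_NAME));
        System.out.println("by hash: " + sorted(names, BY_HASH_THEN_NAME));
    }
}
```

Both orderings give the same result on every run. Only the alphabetical one can be read off the method names, which is why fixing the order by name helps when hunting for order-dependent failures.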