Determining content type with Apache Tika

About Apache Tika

The project is hosted by the Apache Software Foundation and detects a wide range of file and content types. A full list of the supported formats is available, and many of its entries are document formats, e.g. text/plain, text/xml, the proprietary Microsoft OOXML formats or the OpenDocument office standard. Furthermore, Tika can recognize images (image/gif, image/jpeg, image/bmp or image/tiff), videos (video/avi, video/mpeg or video/mp4) and audio files (audio/ogg, audio/x-wav or audio/mpeg). Even feeds (application/rss+xml, application/atom+xml) can be recognized. And many, many more … Continue reading
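
With Tika on the classpath, detection itself is short: the org.apache.tika.Tika facade offers detect methods that return the media type as a string, e.g. new Tika().detect(file). One of the techniques behind such detection is comparing a file's leading "magic" bytes with known signatures. The plain-Java sketch below illustrates only that idea for a few of the formats listed above. The class MagicSniffer is hypothetical and is not how Tika is implemented; Tika additionally considers file names and can look inside ZIP containers to tell OOXML or OpenDocument files apart from plain archives, which this sketch does not attempt.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified magic-byte detector (not Tika's implementation).
final class MagicSniffer {

    // A known file signature ("magic number") and the media type it indicates.
    private static final class Signature {
        final byte[] magic;
        final String mediaType;

        Signature(byte[] magic, String mediaType) {
            this.magic = magic;
            this.mediaType = mediaType;
        }
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // A few well-known signatures; far from complete.
    private static final Signature[] SIGNATURES = {
        new Signature(ascii("GIF87a"), "image/gif"),
        new Signature(ascii("GIF89a"), "image/gif"),
        new Signature(new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}, "image/jpeg"),
        new Signature(new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A}, "image/png"),
        new Signature(ascii("BM"), "image/bmp"),
        new Signature(new byte[] {0x49, 0x49, 0x2A, 0x00}, "image/tiff"), // little-endian TIFF
        new Signature(new byte[] {0x4D, 0x4D, 0x00, 0x2A}, "image/tiff"), // big-endian TIFF
        new Signature(ascii("%PDF-"), "application/pdf"),
        // OOXML and OpenDocument files are ZIP containers and start with these same bytes.
        new Signature(new byte[] {0x50, 0x4B, 0x03, 0x04}, "application/zip"),
    };

    // Returns the media type whose signature prefixes the data,
    // or application/octet-stream when nothing matches.
    static String detect(byte[] data) {
        for (Signature s : SIGNATURES) {
            if (data.length >= s.magic.length
                    && Arrays.equals(Arrays.copyOf(data, s.magic.length), s.magic)) {
                return s.mediaType;
            }
        }
        return "application/octet-stream";
    }
}
```

Note that a .docx or .odt file is only reported as application/zip here; distinguishing those requires looking at the container's content, as a full detector does.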

Different modes of the H2 Database Engine for testing

When testing software that persists data in a database, there are different ways to provide one. One option is a real database engine instance used for testing only. However, setting up a full-scale database server just for testing is overkill, so a dedicated, easy-to-use database is a better fit. Furthermore, the data does not need to be persisted, because it is no longer needed once the test has finished. For performance reasons, a non-persistent database engine is ideal. There are many solutions for in-memory database testing, and H2 is one of them. Let's have a look at how H2 can be configured for this purpose. Continue reading
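
In H2, the mode is selected through the JDBC URL. The sketch below uses a hypothetical helper class H2Urls to build URLs for three common modes: a named in-memory database, an embedded database stored in files, and server mode over TCP. The URL formats are taken from the H2 documentation as we understand it; double-check them against the H2 version you use. Appending DB_CLOSE_DELAY=-1 keeps an in-memory database alive until the JVM exits instead of dropping it when the last connection closes.

```java
// Hypothetical helper that builds H2 JDBC URLs for different modes.
// It only creates strings; connecting requires the H2 driver on the classpath.
final class H2Urls {

    private H2Urls() {
    }

    // In-memory mode: nothing is written to disk.
    // keepAlive appends DB_CLOSE_DELAY=-1 so the data survives until the JVM exits.
    static String inMemory(String name, boolean keepAlive) {
        String url = "jdbc:h2:mem:" + name;
        return keepAlive ? url + ";DB_CLOSE_DELAY=-1" : url;
    }

    // Embedded mode: the database runs inside the application and is persisted to files at path.
    static String embeddedFile(String path) {
        return "jdbc:h2:file:" + path;
    }

    // Server mode: connects over TCP to a separately started H2 server (default port 9092).
    static String server(String host, int port, String dbPath) {
        return "jdbc:h2:tcp://" + host + ":" + port + "/" + dbPath;
    }
}
```

With the H2 driver on the classpath, a test could then open a connection via DriverManager.getConnection(H2Urls.inMemory("testdb", true), "sa", ""); the database name testdb and the user sa are only examples.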

How to use regex groups in Java with ease

When searching for a pattern or a group of data while processing text in Java, regular expressions are a useful instrument. With regular expressions, it is possible to locate specific data within a larger body of text. When a pattern contains many groups, or while a regular expression is still being developed, it helps not to depend on the numeric position of each group within the expression. For this purpose there is a feature called named capturing groups. Continue reading
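
Since Java 7, a named capturing group is declared with the syntax (?<name>X) and read back with Matcher.group(String name); in a replacement string it is referenced as ${name}. The sketch below shows both with a hypothetical date pattern and a hypothetical class NamedGroups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedGroups {

    // Three named groups instead of positional groups 1, 2 and 3.
    private static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Returns the month of the first yyyy-mm-dd date in the text, or null if there is none.
    static String month(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group("month") : null;
    }

    // Rewrites every yyyy-mm-dd date as dd.mm.yyyy using named references in the replacement.
    static String toDayMonthYear(String text) {
        return DATE.matcher(text).replaceAll("${day}.${month}.${year}");
    }
}
```

For example, month("Released on 2014-03-18.") returns "03". Because the code asks for group("month") rather than group(2), adding another group in front of the date would not break it.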

Redirecting the embedded Jetty log to log4j2

When Jetty is embedded in an application, it does not support log4j2 by default. Using slf4j, the logging can be redirected to log4j2, but without such a redirection Jetty cannot log to log4j2 directly. Of course there is a way around this: all it takes is a small piece of adapter code. Continue reading

Running JUnit tests in a specific order

By default, JUnit executes tests in a deterministic order, but that order is not predictable. Sometimes, however, it is useful to enforce a specific order, e.g. when one of your tests fails intermittently and you have no idea why. Perhaps your production code is not thread-safe, or other side effects influence your test without your knowledge, such as a cache filled with results by a test that ran earlier. Continue reading
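
Assuming JUnit 4.11 or later, the built-in way to get a predictable order is annotating the test class with @FixMethodOrder(MethodSorters.NAME_ASCENDING); JUnit 5 offers @TestMethodOrder for the same purpose. To illustrate what name-ascending ordering means without a JUnit dependency, here is a minimal plain-Java sketch. NameOrderRunner and SampleTests are hypothetical and do not reflect how JUnit is implemented.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NameOrderRunner {

    // Hypothetical test class; each "test" records its name when it runs.
    static final class SampleTests {
        final List<String> log = new ArrayList<>();

        public void testB() { log.add("testB"); }
        public void testA() { log.add("testA"); }
        public void testC() { log.add("testC"); }
    }

    // Runs all public no-argument methods starting with "test", sorted by name,
    // and returns the order in which they were executed.
    static List<String> run() {
        SampleTests instance = new SampleTests();
        // getMethods() returns the methods in no particular order.
        Method[] methods = SampleTests.class.getMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        for (Method m : methods) {
            if (m.getName().startsWith("test") && m.getParameterCount() == 0) {
                try {
                    m.invoke(instance);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("test " + m.getName() + " failed", e);
                }
            }
        }
        return instance.log;
    }
}
```

Here run() executes testA, testB and testC in that order, regardless of declaration order. Without the sort, the result would depend on the order returned by getMethods(), which the Java API leaves unspecified.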

A new scope for Contexts and Dependency Injection – CDI

Dependency Injection is a great approach for decoupling software components. Furthermore, it makes it possible to inject the current context (state) of a software component. Java EE 6 provides four scopes, and Java EE 7 added a new one. This post describes how to implement an additional scope with its own lifecycle.
Continue reading
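
In CDI, a custom scope is typically declared as an annotation marked with @NormalScope, backed by an implementation of javax.enterprise.context.spi.Context, and registered from a portable extension by calling AfterBeanDiscovery.addContext. The sketch below leaves out the CDI API entirely and only models the bookkeeping such a context performs: one instance per bean while the scope is active, created on first access and discarded when the lifecycle ends. CustomScopeStore, its methods and the string keys are hypothetical stand-ins for Contextual and CreationalContext.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical storage behind a custom scope (not the CDI API).
final class CustomScopeStore {

    private final Map<String, Object> instances = new HashMap<>();
    private boolean active;

    // Starts a new lifecycle of the scope.
    void begin() {
        active = true;
    }

    // Similar in spirit to Context.get(contextual, creationalContext):
    // returns the existing instance for the key, or creates one on first access.
    <T> T get(String key, Supplier<T> factory) {
        if (!active) {
            // A real CDI context would throw ContextNotActiveException here.
            throw new IllegalStateException("scope is not active");
        }
        @SuppressWarnings("unchecked")
        T instance = (T) instances.computeIfAbsent(key, k -> factory.get());
        return instance;
    }

    // Ends the lifecycle and discards all instances
    // (a real CDI context would also destroy them via Contextual.destroy).
    void end() {
        instances.clear();
        active = false;
    }

    boolean isActive() {
        return active;
    }
}
```

Within one lifecycle, repeated calls to get with the same key return the same object; after end() and a new begin(), a fresh instance is created.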