Determining content type with Apache Tika

About Apache Tika

The project is hosted by the Apache Software Foundation. It detects a wide range of file and content types, and the project publishes a full list of supported formats. That list includes many document formats, e.g. text/plain, text/xml, the proprietary Microsoft OOXML and the office standard OpenDocument. Tika can also recognize images (image/gif, image/jpeg, image/bmp or image/tiff), videos (video/avi, video/mpeg or video/mp4) and audio (audio/ogg, audio/x-wav or audio/mpeg). Even feeds (application/rss+xml, application/atom+xml) can be recognized, and many, many more …
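
To give an idea of what detection looks like in code, here is a minimal sketch using the `Tika` facade class from tika-core. It assumes tika-core is on the classpath; the class name `TikaDetectSketch`, the helper `detectType` and the sample file names are made up for illustration only.

```java
import org.apache.tika.Tika;

import java.nio.charset.StandardCharsets;

public class TikaDetectSketch {

    // Hypothetical helper: asks Tika for a media type based on the
    // first bytes of the content plus a file name used as a hint.
    static String detectType(byte[] prefix, String fileName) {
        Tika tika = new Tika();
        return tika.detect(prefix, fileName);
    }

    public static void main(String[] args) {
        // The first bytes of a GIF file are the ASCII signature "GIF89a".
        byte[] gifHeader = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectType(gifHeader, "picture.gif"));

        // Detection by file name only, without looking at any content.
        System.out.println(new Tika().detect("notes.txt"));
    }
}
```

The facade also offers `detect` overloads for a `File` or an `InputStream`, which is usually more convenient than passing raw bytes when the content lives on disk.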