
False-positives for the `msi` detection #162

fazouane-marouane posted on GitHub


It seems that .doc files (in fact, all CFB-based files: msi, doc, xls, ppt, oft, ...) are recognized as "msi" files. Example of a doc file:

Checking the code, I found the following:

    if (check([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])) {
        return {
            ext: 'msi',
            mime: 'application/x-msi'
        };
    }

The issue here is that `0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1` is the header signature of the CFB format in general, not of msi specifically.

Can we please remove this entry and discuss other ways of recognizing such files? Thanks
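
To illustrate the collision described above, here is a minimal sketch. `startsWith` and `detectBySignatureOnly` are hypothetical stand-ins written for this example, not the library's actual helpers; they only mimic the quoted branch.

```javascript
// The 8-byte signature shared by every Compound File Binary (CFB) container.
const CFB_SIGNATURE = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

// Hypothetical stand-in for the `check` helper in the quoted code:
// true when `buffer` starts with the given byte sequence.
function startsWith(buffer, bytes) {
  if (buffer.length < bytes.length) {
    return false;
  }
  return bytes.every((byte, index) => buffer[index] === byte);
}

// Mimics the quoted branch: anything starting with the CFB signature is reported as msi.
function detectBySignatureOnly(buffer) {
  if (startsWith(buffer, CFB_SIGNATURE)) {
    return {ext: 'msi', mime: 'application/x-msi'};
  }
  return null;
}

// A .doc, .xls, .ppt, or .msi file all begin with these same 8 bytes,
// so this synthetic prefix could stand for any of them.
const anyCfbFile = Buffer.concat([Buffer.from(CFB_SIGNATURE), Buffer.alloc(504)]);
console.log(detectBySignatureOnly(anyCfbFile));
```

Because the signature is the only input to the decision, the function cannot tell an installer from a Word document.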

posted by karlhiramoto over 6 years ago

mmmagic detects .doc files correctly as application/msword

posted by thehappycoder almost 6 years ago

I have something that detects doc/xls/ppt/msi..., but it needs to parse the whole CFB container file and doesn't use magic values, which is not ideal for most people (+1 kB gzipped, if my memory serves me right).
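
A rough sketch of what container-based detection could look like, for illustration only (this is not the code mentioned above). It assumes the MS-CFB layout: sector shift at header offset 30, first directory sector at offset 48, and the root entry's CLSID at offset 80 of the first 128-byte directory entry. The MSI CLSID below is the commonly documented value and should be treated as an assumption to verify; `detectMsiByRootClsid` is a hypothetical name. Note that it reads only the header and the root directory entry rather than the whole container.

```javascript
// CFB signature (header bytes 0-7).
const CFB_MAGIC = Buffer.from([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1]);

// Assumed root-storage CLSID of Windows Installer packages; verify before relying on it.
const MSI_CLSID = '000c1084-0000-0000-c000-000000000046';

// Format the 16-byte CLSID at `offset` as a GUID string.
// The first three fields are stored little-endian, the rest byte by byte.
function readClsid(buffer, offset) {
  const hex = (start, length, reverse) => {
    const bytes = [...buffer.subarray(offset + start, offset + start + length)];
    if (reverse) {
      bytes.reverse();
    }
    return bytes.map(b => b.toString(16).padStart(2, '0')).join('');
  };
  return [hex(0, 4, true), hex(4, 2, true), hex(6, 2, true), hex(8, 2, false), hex(10, 6, false)].join('-');
}

function detectMsiByRootClsid(buffer) {
  if (buffer.length < 512 || !buffer.subarray(0, 8).equals(CFB_MAGIC)) {
    return null;
  }
  const sectorShift = buffer.readUInt16LE(30); // 9 => 512-byte sectors, 12 => 4096-byte sectors
  if (sectorShift !== 9 && sectorShift !== 12) {
    return null;
  }
  const firstDirSector = buffer.readUInt32LE(48);
  // Sector n starts at (n + 1) * sectorSize because the header occupies the first sector-sized slot.
  const rootEntryOffset = (firstDirSector + 1) * (2 ** sectorShift);
  if (rootEntryOffset + 128 > buffer.length) {
    return null; // not enough data to read the root directory entry
  }
  const clsid = readClsid(buffer, rootEntryOffset + 80);
  return clsid === MSI_CLSID ? {ext: 'msi', mime: 'application/x-msi'} : null;
}
```

The trade-off is that the root directory entry can sit anywhere in the file, so a detector that only sees the first few kilobytes may not be able to reach it.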

posted by fazouane-marouane almost 6 years ago

I have the same issue.

Wrong (using "file-type-cli"):

    $ node cli.js my.doc
    msi
    application/x-msi

Correct (using Ubuntu's "file" command):

    $ file --mime-type -b my.doc
    application/msword

posted by ilsundal almost 6 years ago

> I have something that detects doc/xls/ppt/msi..., but it needs to parse the whole CFB container file and doesn't use magic values, which is not ideal for most people (+1 kB gzipped, if my memory serves me right).

If this is the only way then so be it :)

Note that Ubuntu's "file" command/utility also works nicely (and super-fast). One option could be to investigate how it does it and then do the same in the "file-type" library.

posted by ilsundal almost 6 years ago
posted by PhantomSophia almost 6 years ago

The fix would not be to add doc support, which I'm not interested in, but rather to improve the msi detection.

posted by sindresorhus almost 6 years ago

@issuehunt has funded $40.00 to this issue.

posted by issuehunt-app[bot] almost 6 years ago

Hi, is anyone able to provide true positive .msi files?

Using the following check, I've been able to detect the msi files I've found without a false positive on .doc, .xls, or .ppt files.

Full disclosure: I started with the commented out magic bytes from but ended up just reading the byte stream of msi files.

check([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x3E, 0x00, 0x04, 0x00, 0xFE, 0xFF, 0x0C, 0x00, 0x06])
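
For anyone wondering what the extra bytes in this check correspond to, here is a small sketch that decodes them using the MS-CFB header layout (offsets 24 to 31). `decodeCfbHeader` is a hypothetical helper written for this example. Read this way, the longer prefix seems to pin down a zero header CLSID, major version 4, and 4096-byte sectors rather than anything msi-specific, so a version 3 msi might be missed and a version 4 non-msi container might still match. That is an interpretation of the bytes, not something established in this thread.

```javascript
// The prefix from the check above, rebuilt field by field.
const proposedPrefix = Buffer.concat([
  Buffer.from([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1]), // bytes 0-7: CFB signature
  Buffer.alloc(16),          // bytes 8-23: header CLSID, all zero
  Buffer.from([0x3E, 0x00]), // bytes 24-25: minor version
  Buffer.from([0x04, 0x00]), // bytes 26-27: major version
  Buffer.from([0xFE, 0xFF]), // bytes 28-29: byte-order mark (little-endian)
  Buffer.from([0x0C, 0x00]), // bytes 30-31: sector shift
  Buffer.from([0x06])        // byte 32: low byte of the mini sector shift
]);

// Decode the fixed header fields that the longer prefix covers.
function decodeCfbHeader(buffer) {
  return {
    minorVersion: buffer.readUInt16LE(24),
    majorVersion: buffer.readUInt16LE(26),
    byteOrder: buffer.readUInt16LE(28),
    sectorSize: 2 ** buffer.readUInt16LE(30)
  };
}

console.log(decodeCfbHeader(proposedPrefix));
```
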
posted by HugoDF over 5 years ago

@sindresorhus has rewarded $36.00 to @hugodf.

  • Total deposit: $40.00
  • Repository reward (0%): $0.00
  • Service fee (10%): $4.00
posted by issuehunt-app[bot] over 5 years ago

Recent activities

hugodf was rewarded by sindresorhus for sindresorhus/file-type#162 (over 5 years ago)
hugodf submitted an output to sindresorhus/file-type#162 (over 5 years ago)