Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request help, Two same code behaves differently, leading to incorrect decompression results. #330

Open
digitalgust opened this issue Dec 28, 2024 · 3 comments

Comments

@digitalgust
Copy link

digitalgust commented Dec 28, 2024

I am using miniz in a project M, and it works great, but I encountered an issue that confuses me. The problem is that when I try to decompress a file in project M, it fails. However, when I place this code snippet in a separate test code T, it works perfectly. When this code is placed in project M, the decompression CRC32 check fails. I have debugged both projects simultaneously, trying to trace the code step by step and compare the execution results. I found that at one point, the return values differ between the test code T and the project M code, which eventually leads to incorrect decompressed data.
Below are several screenshots. The first three images show the environment variable values in both IDEs before executing the following line of code:

    mz_int64 cur_ofs = MZ_FTELL64(pZip->m_pState->m_pFile);

After executing this line of code, cur_ofs = 31 in my project code, while cur_ofs = 30 in the test code.
The last two images show that during execution, the test code T performs a SEEK operation, whereas in my project M, because file_ofs equals 31, no SEEK operation is performed. Consequently, the final results are different. In project M, the bytes read are off by one byte, as if the reading position is shifted forward by one byte compared to the correct result.
In project M, this function is frequently used to read some class files or resource files from a JAR.

the caller function, same as Project M and Project T

s32 zip_loadfile_to_mem(char const *jarpath, char const *filename, c8 *buf, s64 bufsize) {
    int file_index = 0;
    mz_zip_archive zipArchive = {0};
    mz_zip_archive_file_stat file_stat = {0};

    //skip the first '/'
    if (filename && filename[0] == '/') {
        filename += 1;
    }

    if (filename[0] == '0' && filename[1] == '\0') {
        s32 debug = 1;
    }
    int ret = 0;
    if (mz_zip_reader_init_file(&zipArchive, jarpath, 0) == MZ_FALSE) {//
        ret = -1;
    } else {

        file_index = mz_zip_reader_locate_file(&zipArchive, filename, NULL, 0);//
        if (!mz_zip_reader_file_stat(&zipArchive, file_index, &file_stat)) {
            ret = -1;
        } else {
            size_t uncompressed_size = (size_t) file_stat.m_uncomp_size;
            mz_bool op = mz_zip_reader_extract_file_to_mem(&zipArchive, file_stat.m_filename, buf, bufsize, MZ_ZIP_FLAG_CASE_SENSITIVE);
            if (!op) {
                ret = -1;
            }
        }
        mz_zip_reader_end(&zipArchive);
    }
    return ret;
}

1
2left_state
2right_state
3seeked
4result

@ell1e
Copy link
Contributor

ell1e commented Jan 1, 2025

I'm not a code contributor, and I'm assuming you already tested this, but since I recently had a similar problem caused by something external in the end:

  1. I would recommend you check the two files are bit-wise really the same. You might want to do the same for the miniz headers.

  2. Then I would do a memory check on your machine with memtest just in case. Faulty memory is known to affect decompression code in particular if you get unlucky.

  3. You might also want to check in Microsoft's app verifier, or if you have a Linux build then on a Linux machine or inside a Linux VM or perhaps inside WSL via valgrind, to see if your other code is perhaps corrupting any memory in a way that might affect miniz.

If miniz version and archive really are the same, other code causing memory corruption that spills over seems like not an unlikely culprit. I'm not saying it can't be miniz's fault, but you would perhaps wanna rule out such other causes if you haven't yet.

Sorry if this isn't helpful.

@digitalgust
Copy link
Author

ell1e, Thank you very much for your suggestions.

1,The two projects I opened are using the same file, so the file is consistent.
2,The issue with the machine's memory can be ruled out because this bug is reproducible, and it always occurs with a difference of one byte.
3,When I run the same program on a MAC, I do not encounter this error.

From my preliminary analysis, this error might be related to our compilation environment or multithreading. In my project M, if I call the function in the start of main(), miniz can run normally and produce the expected results. However, if I call it during the program's execution, this issue always occurs. Therefore, I suspect that it might be related to multithreading. However, even after adding synchronization to these functions, the error persists.

When I run this program on macOS, I have never encountered this error. Therefore, I also suspect that the issue might be related to the compilation environment. On Windows 10, I used MinGW-W64-builds-4.3.5 64-bit version (gcc (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0), and later I tried the 32-bit version of MinGW-W64-builds-4.3.5, which resulted in the same error. Then, I used the MinGW version bundled with JetBrains (gcc (GCC) 11.2.0), and the program worked correctly after compilation. Therefore, I am unsure whether attributing this issue to the compilation environment is correct.

@digitalgust
Copy link
Author

The problem is NOT produced by miniz.

This is the conclusion of my new test , MinGW take the blame for the error,
I used the following code to perform multi-threading tests in MinGW (gcc (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0). This code can reproduce the error, which is identical to the one I initially encountered. When running with multiple threads, the content read out would randomly be shifted forward by one byte compared to the correct content. After switching to a different compiler (GCC 11.2.0), repeated testing showed no issues.
I hope that anyone who encounters a similar problem can find this information helpful.

Detailed Explanation
Environment: MinGW (gcc (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0)
Issue: Multi-threaded operations cause the content read from the ZIP file to be randomly shifted forward by one byte.
Solution: Switching to GCC 11.2.0 resolved the issue, as the problem did not occur during repeated testing with the newer compiler.
Recommendation: If you encounter a similar issue, consider updating your compiler or trying a different version of GCC.
This experience might help others facing similar problems with multi-threaded applications and specific compiler versions.

Test code:


#include "miniz.h"
#include "stdint.h"
#include "stdlib.h"
#include "sys/types.h"
#include <stdio.h>
#include <pthread.h>

typedef unsigned char u8;
typedef signed char s8;
typedef char c8;
typedef unsigned short int u16;
typedef signed short int s16;
typedef unsigned int u32;
typedef signed int s32;
typedef float f32;
typedef double f64;
typedef unsigned long long int u64;
typedef signed long long int s64;
typedef void *__refer;
typedef void *__returnaddress;

s64 zip_get_file_unzip_size(char const *jarpath, char const *filename)
{
    int file_index = 0;
    mz_zip_archive zipArchive = { 0 };
    mz_zip_archive_file_stat file_stat = { 0 };

    if (filename[0] == '0' && filename[1] == '\0')
    {
        s32 debug = 1;
    }
    // skit the first '/'
    if (filename && filename[0] == '/')
    {
        filename += 1;
    }

    s64 ret = 0;
    if (mz_zip_reader_init_file(&zipArchive, jarpath, 0) == MZ_FALSE)
    { //
        ret = -1;
    }
    else
    {

        file_index = mz_zip_reader_locate_file(&zipArchive, filename, NULL, 0); //
        if (!mz_zip_reader_file_stat(&zipArchive, file_index, &file_stat))
        {
            ret = -1;
        }
        else
        {
            size_t uncompressed_size = (size_t)file_stat.m_uncomp_size;
            ret = uncompressed_size;
        }
    }
    mz_zip_reader_end(&zipArchive);
    return ret;
}

int zip_loadfile_to_mem(char const *jarpath, char const *filename, char *buf, s64 bufsize)
{
    int file_index = 0;
    mz_zip_archive zipArchive = { 0 };
    mz_zip_archive_file_stat file_stat = { 0 };

    // skit the first '/'
    if (filename && filename[0] == '/')
    {
        filename += 1;
    }

    if (filename[0] == '0' && filename[1] == '\0')
    {
        int debug = 1;
    }
    int ret = 0;
    if (mz_zip_reader_init_file(&zipArchive, jarpath, 0) == MZ_FALSE)
    { //
        ret = -1;
    }
    else
    {

        file_index = mz_zip_reader_locate_file(&zipArchive, filename, NULL, 0); //
        if (!mz_zip_reader_file_stat(&zipArchive, file_index, &file_stat))
        {
            ret = -1;
        }
        else
        {
            size_t uncompressed_size = (size_t)file_stat.m_uncomp_size;
            //            mz_bool op = mz_zip_reader_extract_file_to_mem(&zipArchive, file_stat.m_filename, buf, bufsize, MZ_ZIP_FLAG_CASE_SENSITIVE);
            mz_bool op = mz_zip_reader_extract_to_mem(&zipArchive, file_index, buf, bufsize, MZ_ZIP_FLAG_CASE_SENSITIVE);
            if (!op)
            {
                ret = -1;
            }
        }
    }
    mz_zip_reader_end(&zipArchive);
    return ret;
}

void load_process(void *para)
{
    //    c8 *jarpath = "D:\\tmp\\test.jar";
    c8 *jarpath = "D:\\GitHub\\miniJVM\\binary\\1.jar";
    c8 *filename = "0";

    for (int i = 0; i < 10000; i++)
    {
        s64 size = zip_get_file_unzip_size(jarpath, filename);
        if (size < 0)
        {
            continue;
        }
        char *buf = (char *)calloc(size, 1);
        s32 len = zip_loadfile_to_mem(jarpath, filename, buf, size);
        if (len < 0)
        {
            printf("load fail %d\n", i);
        }
        else
        {
            printf("load ok %d\n", i);
        }
    }
}

int main(int argc, char *argv[])
{
    pthread_t t; // 
    if (pthread_create(&t, NULL, load_process, NULL) != 0) // 
    {
        perror("Failed to create thread");
        return -1;
    }
    pthread_t t1; // 
    if (pthread_create(&t1, NULL, load_process, NULL) != 0) // 
    {
        perror("Failed to create thread");
        return -1;
    }
    pthread_t t2; // 
    if (pthread_create(&t2, NULL, load_process, NULL) != 0) // 
    {
        perror("Failed to create thread");
        return -1;
    }
    pthread_join(t, NULL); // 
    return 0;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants