2

I'm refreshing my C skills (been decades) and ran across something I don't quite understand. I'm working on some code that involves a lot of bit shifting, masking, etc. I have one function that creates a string with the binary representation of a number. I've found it's a lot more complicated than it needs to be so I've adjusted it, but here is the original code below.

Note: The caller is responsible for freeing memory, so I've taken care of that. This is running on Linux x64 in Eclipse C/C++ IDE.

char* get_bit_str(int num, int num_bits) {

    // XXX: This no worky worky...
    // When having multiple malloc'd string it eventually
    // get garbage.  Not too sure why...

    num_bits = num_bits > 0 ? num_bits : sizeof(int);

    char* buf = (char*)malloc(num_bits + 1);

    if (!buf) {
        printf("\nMemory allocation failed !\n");
        return "";
    }

    // Start at least significant bit.
    for (int i = num_bits - 1; i >= 0; i--) {
        char* bit = (num & (1 << i)) ? "1" : "0";

//      strncat(buf, bit, 1);
        strcat(buf, bit);
    }
//  strncat(buf, "\0", 1);
//  strcat(buf, "\n");

    return buf;
}

This works fine most of the time, but for one specific example, I get garbage. I've tried multiple approaches like overwriting the same char* or assigning a new one, but the result is always same. I also tried:

char* buf = (char*)malloc((num_bits + 1) * sizeof(char));

I've used strcpy and strncpy as well, but they produce the same issue. Both strcpy and strncpy append a null terminator, which might be an issue. Normally, the output is ok when I:

  1. Call the function

  2. Print the result

  3. Free the char* pointer

However, if I:

  1. Call the function with two different values (e.g., 1 and then 2)

  2. Print out both char* pointers

  3. Free both char* pointers

I get garbage for the second one using this test code (Like I say, I've tried to assign new variables but here I'm reusing the same "c" and "d"):

void test_get_bit_str() {
    char* c = get_bit_str(0, 4);
    printf("c (0) %s\n", c);

    char* d = get_bit_str(1, 4);
    printf("d (1) %s\n", d);

    free(c);
    free(d);

    c = get_bit_str(1, 4);
    printf("c (1) %s\n", c);

    d = get_bit_str(2, 4);
    printf("d (2) %s\n", d);

    free(c);
    free(d);
}

The output is:

c (0) 0000
d (1) 0001
c (1) ���D�U0001
d (2) 0010

You can see on the second c is garbage. I printed the pointer addresses, and it is reusing the same memory addresses after I free everything, which is fine, I think.

I really don't understand why this works when malloc and free is done for individual invocations but when I have two pointers defined (malloc twice, print out results, free both pointers), it's doing this.

Now, having stated all of that, I've made it a lot less complex with the following, which does work:

char* get_bit_str(int num, int num_bits) {
    num_bits = num_bits > 0 ? num_bits : sizeof(int);

    // Allocate enough memory for the number of bits including room for NULL.
    char* buf = (char*)malloc(num_bits + 1);

    if (!buf) {
        printf("\nMemory allocation failed !\n");
        return NULL;
    }

    *buf = 0;

    for (int i = 0; i < num_bits; i++) {
        char bit = (num & (1 << i)) ? '1' : '0';;

        // Here we want to but the least significant bit at the other end of
        // the array.  This is effectively swapping the array positions.
        buf[num_bits - 1 - i] = bit;
    }

    // Remember to append the NULL char.
    buf[num_bits] = '\0';

    return buf;
}

Any help understanding what's going on would be greatly appreciated.

9
  • 5
    That's because you should run your programs with clear UB symptoms through valgrind and similar tools. s/malloc/calloc - strcat scans its left operand until it finds a null terminator, and malloc returns uninitialized memory, so it can easily overrun the buffer. (and you certainly don't want to use strcat here, do not turn a simple O(N) operation into O(n^2)) Commented Nov 11 at 16:28
  • after char* buf = (char*)malloc(num_bits + 1); add *buf=0; Commented Nov 11 at 16:29
  • Gotcha. Yea, this was more for me understanding bit shifting with all of the bitwise operators so I'm not worried about performance. But thanks for the heads up! Commented Nov 11 at 18:14
  • After posting this I saw that I was returning a "" so I'm returning NULL instead and checking for that so that I did try to free a "". Also, I noticed that the revised function was not printing the bits in correct order (was printing 1 as 1000 instead of 0001) so I changed it to: buf[num_bits - 1 - i] = bit; instead of just buf[i] = bit; I also did what someone suggested and after the malloc deference it and set it to zero: *buf = 0; Commented Nov 11 at 18:31
  • 1
    You can't char* buf = ... and then *buf = 0; and then if (!buf) { ... because you are assigning to the contents of buf and then testing if buf has a value. If it didn't have a value the assignment of the contents would cause problems. You need to assign the contents AFTER checking if buf has a value. Commented Nov 12 at 0:59

2 Answers 2

5

Problems include :

buf is not certainly pointing to a string

On the first call, strcat(buf, bit); is undefined behavior (UB) as pointer buf does not yet point to a string. In C, a string is a sequence of characters up to an including the null character. After the malloc() call, the memory pointed to by buf is not yet assigned.

Assign buf[0] first to '\0'.

Default assignment insufficient

With, num_bits = num_bits > 0 ? num_bits : sizeof(int);, should num_bits <= 0, the number of bits used is the number s of bytes of an int.

Consider num_bits = num_bits > 0 ? num_bits : sizeof(int)*CHAR_BIT;

Unneeded code

With char* buf = (char*)malloc((num_bits + 1) * sizeof(char));, both the cast and * sizeof(char) serve no purpose.

Better as char* buf = malloc(num_bits + 1u);.

1u avoid UB and works better with pathological input when num_bits == INT_MAX.

Of course any num_bits > sizeof(int)*CHAR_BIT deserves detection to prevent later UB in the shifting.

Weak algorithm

for (int i = num_bits - 1; i >= 0; i--) { ... strcat(buf, bit); } is a Shlemiel The Painter code.


Minor: num & (1 << i) is UB when i = 31

When i = 31 or whatever the bit width of int minus 1, sifting by that much or more is UB.

Consider using unsigned math 1u << i. Note this is still UB when i > 32.

Minor: return ""; on out-of-memory is problematic

get_bit_str(num, 0) both return a pointer to "" should malloc() fail or succeed.

Consider returning NULL on failure.

Minor: Errors on strerr

The usual suggested practice to to output errors on stderr, rather than stdout.

// printf("\nMemory allocation failed !\n");
fprintf(stderr, "\nMemory allocation failed !\n");

Note that various simplifications possible.

With C23, we could use snprintf(buf, num_bits + 1, "%0*b", num_bits, (unsigned) (num & ((1llu << num_bits) - 1)); to handle many cases.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks so much for the details. I've adapted your points to my local test code. I'm keeping the first rendition around commented out so I know what does NOT work but your answers provide my WHY it does not work and that's awesome. I'll admit, I took some shortcuts (like i being bigger than 30 and whatnot) because this is only test code but your insights are great. I'll probably never write production code with C (use to decades ago but can't remember hardly anything from it). That's the problem with using higher level languages for decades, ya forget the basics!
As a side note, when I was thinking about how this algorithm would work, I know how to do it in Java (been dong that for 25 years). When I was researching this, I found a very similar algorithm here on SO that did the same kind of for loop (except it was printing the bits). Even that AI generated garbage at the top of search results has it that way (yet another reason I don't trust AI crap). I've never heard of Shlemiel The Painter so I'm going to research that. Thanks again, this is EXACTLY the kind of answer I was looking for!
3

As the commenters have suggested, strcat requires the first argument to be a NUL-terminated string. When you malloc for the buffer, it can contain garbage values, none of which are the NUL character. So, calling strcat with this buffer will cause the result to be garbage and even overrun the allocated memory. Since others have already identified the inefficiency with strcat, I'll just add that a solution that uses strcat would have to set *buf = '\0' immediately after the call to malloc, or alternatively, use calloc.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.