Splitting a string into tokens and putting tokens into array - strtok











up vote
1
down vote

favorite












char** token_arr(char* str, int n_tokens)
{
char**arr = malloc((n_tokens+1)*sizeof(char*));
char str2[n_tokens + 1];
strcpy(str2,str);
int i = 0;
char *p = strtok (str2, " ");

while (p != NULL)
{
arr[i] = p;
//printf("%sn", arr[i]);
p = strtok (NULL, " ");
i++;
}
return arr;
}


The purpose of token_arr is to get a string and a number of tokens, then put the tokens into an array. The array of tokens is returned.



int main(void) {
char*str1 = "( 8 + ( 41 - 12 ) )";
char**expression = token_arr(str1, 9);
for(int i = 0; i < 9; i++)
printf("expression[%d] = %cn", i, *expression2[i]);
return 0;
}


Output:



expression2[0] = (
expression2[1] =
expression2[2] =
expression2[3] =
expression2[4] =
expression2[5] =
expression2[6] =
expression2[7] =
expression2[8] =


Why is only the first value being printed? What's wrong with my code?










share|improve this question


















  • 1




    You are allocating the array of pointers, but the pointers point to a local variable, str2, in the function, and that variable is out of scope once you're back in main(), so you're accessing invalid data that has been reused and presumably zeroed. You'll need to allocate the string where you copy the string literal, and you'll need to find a way to pass both allocated pointers back to the main() so you can free the allocated memory accurately. As it stands, you have 'undefined behaviour'. That means that what you see is a valid response, as are many other possible behaviours.
    – Jonathan Leffler
    Nov 11 at 5:47












  • Unless you are using calloc or memset to preserve a sentinel NULL at the end of arr, there is no need for +1 in char**arr = malloc((n_tokens+1)*sizeof(char*));. arr is simply the number of pointers you are allocating. If you only have n_tokens you only need that number of pointers unless you are ensuring you have the final pointer NULL to allow iteration with a while loop -- but in that case you need to set ALL pointers NULL to begin with.
    – David C. Rankin
    Nov 11 at 5:57










  • You also need to avoid writing to more pointers than you have. So you need to ensure i never exceeds n_tokens+1 as you currently have it, e.g. while (i < n_tokens + 1 && p != NULL). You can allocate for each token with strdup, e.g. arr[i] = strdup (p);. Now the contents of arr[n] will survive the return.
    – David C. Rankin
    Nov 11 at 6:02















up vote
1
down vote

favorite












char** token_arr(char* str, int n_tokens)
{
char**arr = malloc((n_tokens+1)*sizeof(char*));
char str2[n_tokens + 1];
strcpy(str2,str);
int i = 0;
char *p = strtok (str2, " ");

while (p != NULL)
{
arr[i] = p;
//printf("%sn", arr[i]);
p = strtok (NULL, " ");
i++;
}
return arr;
}


The purpose of token_arr is to get a string and a number of tokens, then put the tokens into an array. The array of tokens is returned.



int main(void) {
char*str1 = "( 8 + ( 41 - 12 ) )";
char**expression = token_arr(str1, 9);
for(int i = 0; i < 9; i++)
printf("expression[%d] = %cn", i, *expression2[i]);
return 0;
}


Output:



expression2[0] = (
expression2[1] =
expression2[2] =
expression2[3] =
expression2[4] =
expression2[5] =
expression2[6] =
expression2[7] =
expression2[8] =


Why is only the first value being printed? What's wrong with my code?










share|improve this question


















  • 1




    You are allocating the array of pointers, but the pointers point to a local variable, str2, in the function, and that variable is out of scope once you're back in main(), so you're accessing invalid data that has been reused and presumably zeroed. You'll need to allocate the string where you copy the string literal, and you'll need to find a way to pass both allocated pointers back to the main() so you can free the allocated memory accurately. As it stands, you have 'undefined behaviour'. That means that what you see is a valid response, as are many other possible behaviours.
    – Jonathan Leffler
    Nov 11 at 5:47












  • Unless you are using calloc or memset to preserve a sentinel NULL at the end of arr, there is no need for +1 in char**arr = malloc((n_tokens+1)*sizeof(char*));. arr is simply the number of pointers you are allocating. If you only have n_tokens you only need that number of pointers unless you are ensuring you have the final pointer NULL to allow iteration with a while loop -- but in that case you need to set ALL pointers NULL to begin with.
    – David C. Rankin
    Nov 11 at 5:57










  • You also need to avoid writing to more pointers than you have. So you need to ensure i never exceeds n_tokens+1 as you currently have it, e.g. while (i < n_tokens + 1 && p != NULL). You can allocate for each token with strdup, e.g. arr[i] = strdup (p);. Now the contents of arr[n] will survive the return.
    – David C. Rankin
    Nov 11 at 6:02













up vote
1
down vote

favorite









up vote
1
down vote

favorite











char** token_arr(char* str, int n_tokens)
{
char**arr = malloc((n_tokens+1)*sizeof(char*));
char str2[n_tokens + 1];
strcpy(str2,str);
int i = 0;
char *p = strtok (str2, " ");

while (p != NULL)
{
arr[i] = p;
//printf("%sn", arr[i]);
p = strtok (NULL, " ");
i++;
}
return arr;
}


The purpose of token_arr is to get a string and a number of tokens, then put the tokens into an array. The array of tokens is returned.



int main(void) {
char*str1 = "( 8 + ( 41 - 12 ) )";
char**expression = token_arr(str1, 9);
for(int i = 0; i < 9; i++)
printf("expression[%d] = %cn", i, *expression2[i]);
return 0;
}


Output:



expression2[0] = (
expression2[1] =
expression2[2] =
expression2[3] =
expression2[4] =
expression2[5] =
expression2[6] =
expression2[7] =
expression2[8] =


Why is only the first value being printed? What's wrong with my code?










share|improve this question













char** token_arr(char* str, int n_tokens)
{
char**arr = malloc((n_tokens+1)*sizeof(char*));
char str2[n_tokens + 1];
strcpy(str2,str);
int i = 0;
char *p = strtok (str2, " ");

while (p != NULL)
{
arr[i] = p;
//printf("%sn", arr[i]);
p = strtok (NULL, " ");
i++;
}
return arr;
}


The purpose of token_arr is to get a string and a number of tokens, then put the tokens into an array. The array of tokens is returned.



int main(void) {
char*str1 = "( 8 + ( 41 - 12 ) )";
char**expression = token_arr(str1, 9);
for(int i = 0; i < 9; i++)
printf("expression[%d] = %cn", i, *expression2[i]);
return 0;
}


Output:



expression2[0] = (
expression2[1] =
expression2[2] =
expression2[3] =
expression2[4] =
expression2[5] =
expression2[6] =
expression2[7] =
expression2[8] =


Why is only the first value being printed? What's wrong with my code?







c arrays output strtok






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 11 at 5:37









Treavor

85




85








  • 1




    You are allocating the array of pointers, but the pointers point to a local variable, str2, in the function, and that variable is out of scope once you're back in main(), so you're accessing invalid data that has been reused and presumably zeroed. You'll need to allocate the string where you copy the string literal, and you'll need to find a way to pass both allocated pointers back to the main() so you can free the allocated memory accurately. As it stands, you have 'undefined behaviour'. That means that what you see is a valid response, as are many other possible behaviours.
    – Jonathan Leffler
    Nov 11 at 5:47












  • Unless you are using calloc or memset to preserve a sentinel NULL at the end of arr, there is no need for +1 in char**arr = malloc((n_tokens+1)*sizeof(char*));. arr is simply the number of pointers you are allocating. If you only have n_tokens you only need that number of pointers unless you are ensuring you have the final pointer NULL to allow iteration with a while loop -- but in that case you need to set ALL pointers NULL to begin with.
    – David C. Rankin
    Nov 11 at 5:57










  • You also need to avoid writing to more pointers than you have. So you need to ensure i never exceeds n_tokens+1 as you currently have it, e.g. while (i < n_tokens + 1 && p != NULL). You can allocate for each token with strdup, e.g. arr[i] = strdup (p);. Now the contents of arr[n] will survive the return.
    – David C. Rankin
    Nov 11 at 6:02














  • 1




    You are allocating the array of pointers, but the pointers point to a local variable, str2, in the function, and that variable is out of scope once you're back in main(), so you're accessing invalid data that has been reused and presumably zeroed. You'll need to allocate the string where you copy the string literal, and you'll need to find a way to pass both allocated pointers back to the main() so you can free the allocated memory accurately. As it stands, you have 'undefined behaviour'. That means that what you see is a valid response, as are many other possible behaviours.
    – Jonathan Leffler
    Nov 11 at 5:47












  • Unless you are using calloc or memset to preserve a sentinel NULL at the end of arr, there is no need for +1 in char**arr = malloc((n_tokens+1)*sizeof(char*));. arr is simply the number of pointers you are allocating. If you only have n_tokens you only need that number of pointers unless you are ensuring you have the final pointer NULL to allow iteration with a while loop -- but in that case you need to set ALL pointers NULL to begin with.
    – David C. Rankin
    Nov 11 at 5:57










  • You also need to avoid writing to more pointers than you have. So you need to ensure i never exceeds n_tokens+1 as you currently have it, e.g. while (i < n_tokens + 1 && p != NULL). You can allocate for each token with strdup, e.g. arr[i] = strdup (p);. Now the contents of arr[n] will survive the return.
    – David C. Rankin
    Nov 11 at 6:02








1




1




You are allocating the array of pointers, but the pointers point to a local variable, str2, in the function, and that variable is out of scope once you're back in main(), so you're accessing invalid data that has been reused and presumably zeroed. You'll need to allocate the string where you copy the string literal, and you'll need to find a way to pass both allocated pointers back to the main() so you can free the allocated memory accurately. As it stands, you have 'undefined behaviour'. That means that what you see is a valid response, as are many other possible behaviours.
– Jonathan Leffler
Nov 11 at 5:47






You are allocating the array of pointers, but the pointers point to a local variable, str2, in the function, and that variable is out of scope once you're back in main(), so you're accessing invalid data that has been reused and presumably zeroed. You'll need to allocate the string where you copy the string literal, and you'll need to find a way to pass both allocated pointers back to the main() so you can free the allocated memory accurately. As it stands, you have 'undefined behaviour'. That means that what you see is a valid response, as are many other possible behaviours.
– Jonathan Leffler
Nov 11 at 5:47














Unless you are using calloc or memset to preserve a sentinel NULL at the end of arr, there is no need for +1 in char**arr = malloc((n_tokens+1)*sizeof(char*));. arr is simply the number of pointers you are allocating. If you only have n_tokens you only need that number of pointers unless you are ensuring you have the final pointer NULL to allow iteration with a while loop -- but in that case you need to set ALL pointers NULL to begin with.
– David C. Rankin
Nov 11 at 5:57




Unless you are using calloc or memset to preserve a sentinel NULL at the end of arr, there is no need for +1 in char**arr = malloc((n_tokens+1)*sizeof(char*));. arr is simply the number of pointers you are allocating. If you only have n_tokens you only need that number of pointers unless you are ensuring you have the final pointer NULL to allow iteration with a while loop -- but in that case you need to set ALL pointers NULL to begin with.
– David C. Rankin
Nov 11 at 5:57












You also need to avoid writing to more pointers than you have. So you need to ensure i never exceeds n_tokens+1 as you currently have it, e.g. while (i < n_tokens + 1 && p != NULL). You can allocate for each token with strdup, e.g. arr[i] = strdup (p);. Now the contents of arr[n] will survive the return.
– David C. Rankin
Nov 11 at 6:02




You also need to avoid writing to more pointers than you have. So you need to ensure i never exceeds n_tokens+1 as you currently have it, e.g. while (i < n_tokens + 1 && p != NULL). You can allocate for each token with strdup, e.g. arr[i] = strdup (p);. Now the contents of arr[n] will survive the return.
– David C. Rankin
Nov 11 at 6:02












1 Answer
1






active

oldest

votes

















up vote
3
down vote



accepted










While I think you have probably got most of the issues sorted based on the comments, let's look at a way to address both the validation/return of expressions and a way to return the number of tokens to protect against an error in tokenization resulting in less than n_tokens being found.



As you have learned, when you declare str2 local to token_arr, it has automatic storage duration and is only valid within the scope where it is declared. When token_arr returns, the memory holding str2 is released for re-use and any attempt to reference that memory back in main() invokes Undefined Behavior.



What are your options? (1) use strdup to dynamically allocate storage for each token, copy the token to the new memory allocated, and then assign the starting address for the new block of memory containing the token to arr[i], e.g.



        arr[i] = strdup (p);


or (2) do the same thing manually using strlen, malloc & memcpy, e.g.



        size_t len = strlen(p);
arr[i] = malloc (len + 1);
/* validate - here */
memcpy (arr[i], p, len + 1);


Now each arr[i] points to a block of memory having allocated storage duration which remains valid until free is called on that block -- or the program ends.



What If Less Than n_tokens Are Found?



If less than n_tokens are found within token_arr and you attempt to use n_tokens through expressions back in main() you will likely invoke Undefined Behavior again. To ensure you only use the tokens found in token_arr and made available in main() by the assignment to expression -- Pass A Pointer To n_tokens as the second parameter and update it will the value of i before you return arr;, e.g.



char **token_arr (const char *str, int *n_tokens)
{
char **arr = malloc(*n_tokens * sizeof *arr);
...
i++;
}
*n_tokens = i; /* assign i to make tokes assigned available */

return arr;
}


Now n_tokens back in main() contains only the number of tokens actually found and allocated for and assigned to arr[i] in token_arr.



Validate Every Allocation



It is critical that you validate every call to malloc, calloc, realloc, strdup or any other function that allocates memory for you. Allocation can, and does, fail. When it does, it let's you know by returning NULL instead of a pointer containing the beginning address for the new block of memory. Check every allocation.



Putting it altogether, you could do something like:



#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char **token_arr (const char *str, int *n_tokens)
{
char **arr = malloc(*n_tokens * sizeof *arr);
char str2 [strlen(str) + 1];
int i = 0;

if (!arr) { /* validate every allocation */
perror ("malloc-n_tokens");
return NULL;
}

strcpy (str2, str);

char *p = strtok (str2, " ");

while (i < *n_tokens && p != NULL) { /* check used pointers */
arr[i] = strdup (p);
if (!arr[i]) { /* strdup allocates -> you must validate */
perror ("strdup-arr[i]");
if (i) /* if tokens stored, break an return */
break;
else { /* if no tokes stored, free pointers */
free (arr);
return NULL;
}
}
p = strtok (NULL, " ");
i++;
}
*n_tokens = i; /* assign i to make tokes assigned available */

return arr;
}

int main (void) {

char *str1 = "( 8 + ( 41 - 12 ) )";
int n_tokens = 9;
char **expression = token_arr (str1, &n_tokens);

if (expression) { /* validate token_arr succeeded */
for (int i = 0; i < n_tokens; i++) { /* n_tokens times */
printf ("expression[%d] = %sn", i, expression[i]);
free (expression[i]); /* free mem allocated by strdup */
}
free (expression);
}

return 0;
}


(note: likewise check the return of token_arr before making use of the return)



Example Use/Output



$ ./bin/token_arr
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )


Memory Use/Error Check



In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.



It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.



For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.



$ valgrind ./bin/token_arr
==8420== Memcheck, a memory error detector
==8420== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8420== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==8420== Command: ./bin/token_arr
==8420==
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )
==8420==
==8420== HEAP SUMMARY:
==8420== in use at exit: 0 bytes in 0 blocks
==8420== total heap usage: 10 allocs, 10 frees, 92 bytes allocated
==8420==
==8420== All heap blocks were freed -- no leaks are possible
==8420==
==8420== For counts of detected and suppressed errors, rerun with: -v
==8420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


Always confirm that you have freed all memory you have allocated and that there are no memory errors.



Look things over and let me know if you have further questions.






share|improve this answer





















    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














     

    draft saved


    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53246140%2fsplitting-a-string-into-tokens-and-putting-tokens-into-array-strtok%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    3
    down vote



    accepted










    While I think you have probably got most of the issues sorted based on the comments, let's look at a way to address both the validation/return of expressions and a way to return the number of tokens to protect against an error in tokenization resulting in less than n_tokens being found.



    As you have learned, when you declare str2 local to token_arr, it has automatic storage duration and is only valid within the scope where it is declared. When token_arr returns, the memory holding str2 is released for re-use and any attempt to reference that memory back in main() invokes Undefined Behavior.



    What are your options? (1) use strdup to dynamically allocate storage for each token, copy the token to the new memory allocated, and then assign the starting address for the new block of memory containing the token to arr[i], e.g.



            arr[i] = strdup (p);


    or (2) do the same thing manually using strlen, malloc & memcpy, e.g.



            size_t len = strlen(p);
    arr[i] = malloc (len + 1);
    /* validate - here */
    memcpy (arr[i], p, len + 1);


    Now each arr[i] points to a block of memory having allocated storage duration which remains valid until free is called on that block -- or the program ends.



    What If Less Than n_tokens Are Found?



    If less than n_tokens are found within token_arr and you attempt to use n_tokens through expressions back in main() you will likely invoke Undefined Behavior again. To ensure you only use the tokens found in token_arr and made available in main() by the assignment to expression -- Pass A Pointer To n_tokens as the second parameter and update it will the value of i before you return arr;, e.g.



    char **token_arr (const char *str, int *n_tokens)
    {
    char **arr = malloc(*n_tokens * sizeof *arr);
    ...
    i++;
    }
    *n_tokens = i; /* assign i to make tokes assigned available */

    return arr;
    }


    Now n_tokens back in main() contains only the number of tokens actually found and allocated for and assigned to arr[i] in token_arr.



    Validate Every Allocation



    It is critical that you validate every call to malloc, calloc, realloc, strdup or any other function that allocates memory for you. Allocation can, and does, fail. When it does, it let's you know by returning NULL instead of a pointer containing the beginning address for the new block of memory. Check every allocation.



    Putting it altogether, you could do something like:



    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    char **token_arr (const char *str, int *n_tokens)
    {
    char **arr = malloc(*n_tokens * sizeof *arr);
    char str2 [strlen(str) + 1];
    int i = 0;

    if (!arr) { /* validate every allocation */
    perror ("malloc-n_tokens");
    return NULL;
    }

    strcpy (str2, str);

    char *p = strtok (str2, " ");

    while (i < *n_tokens && p != NULL) { /* check used pointers */
    arr[i] = strdup (p);
    if (!arr[i]) { /* strdup allocates -> you must validate */
    perror ("strdup-arr[i]");
    if (i) /* if tokens stored, break an return */
    break;
    else { /* if no tokes stored, free pointers */
    free (arr);
    return NULL;
    }
    }
    p = strtok (NULL, " ");
    i++;
    }
    *n_tokens = i; /* assign i to make tokes assigned available */

    return arr;
    }

    int main (void) {

    char *str1 = "( 8 + ( 41 - 12 ) )";
    int n_tokens = 9;
    char **expression = token_arr (str1, &n_tokens);

    if (expression) { /* validate token_arr succeeded */
    for (int i = 0; i < n_tokens; i++) { /* n_tokens times */
    printf ("expression[%d] = %sn", i, expression[i]);
    free (expression[i]); /* free mem allocated by strdup */
    }
    free (expression);
    }

    return 0;
    }


    (note: likewise check the return of token_arr before making use of the return)



    Example Use/Output



    $ ./bin/token_arr
    expression[0] = (
    expression[1] = 8
    expression[2] = +
    expression[3] = (
    expression[4] = 41
    expression[5] = -
    expression[6] = 12
    expression[7] = )
    expression[8] = )


    Memory Use/Error Check



    In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.



    It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.



    For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.



    $ valgrind ./bin/token_arr
    ==8420== Memcheck, a memory error detector
    ==8420== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
    ==8420== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
    ==8420== Command: ./bin/token_arr
    ==8420==
    expression[0] = (
    expression[1] = 8
    expression[2] = +
    expression[3] = (
    expression[4] = 41
    expression[5] = -
    expression[6] = 12
    expression[7] = )
    expression[8] = )
    ==8420==
    ==8420== HEAP SUMMARY:
    ==8420== in use at exit: 0 bytes in 0 blocks
    ==8420== total heap usage: 10 allocs, 10 frees, 92 bytes allocated
    ==8420==
    ==8420== All heap blocks were freed -- no leaks are possible
    ==8420==
    ==8420== For counts of detected and suppressed errors, rerun with: -v
    ==8420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


    Always confirm that you have freed all memory you have allocated and that there are no memory errors.



    Look things over and let me know if you have further questions.






    share|improve this answer

























      up vote
      3
      down vote



      accepted










      While I think you have probably got most of the issues sorted based on the comments, let's look at a way to address both the validation/return of expressions and a way to return the number of tokens to protect against an error in tokenization resulting in less than n_tokens being found.



      As you have learned, when you declare str2 local to token_arr, it has automatic storage duration and is only valid within the scope where it is declared. When token_arr returns, the memory holding str2 is released for re-use and any attempt to reference that memory back in main() invokes Undefined Behavior.



      What are your options? (1) use strdup to dynamically allocate storage for each token, copy the token to the new memory allocated, and then assign the starting address for the new block of memory containing the token to arr[i], e.g.



              arr[i] = strdup (p);


      or (2) do the same thing manually using strlen, malloc & memcpy, e.g.



              size_t len = strlen(p);
      arr[i] = malloc (len + 1);
      /* validate - here */
      memcpy (arr[i], p, len + 1);


      Now each arr[i] points to a block of memory having allocated storage duration which remains valid until free is called on that block -- or the program ends.



      What If Less Than n_tokens Are Found?



      If less than n_tokens are found within token_arr and you attempt to use n_tokens through expressions back in main() you will likely invoke Undefined Behavior again. To ensure you only use the tokens found in token_arr and made available in main() by the assignment to expression -- Pass A Pointer To n_tokens as the second parameter and update it will the value of i before you return arr;, e.g.



      char **token_arr (const char *str, int *n_tokens)
      {
      char **arr = malloc(*n_tokens * sizeof *arr);
      ...
      i++;
      }
      *n_tokens = i; /* assign i to make tokes assigned available */

      return arr;
      }


      Now n_tokens back in main() contains only the number of tokens actually found and allocated for and assigned to arr[i] in token_arr.



      Validate Every Allocation



      It is critical that you validate every call to malloc, calloc, realloc, strdup or any other function that allocates memory for you. Allocation can, and does, fail. When it does, it let's you know by returning NULL instead of a pointer containing the beginning address for the new block of memory. Check every allocation.



      Putting it altogether, you could do something like:



      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      char **token_arr (const char *str, int *n_tokens)
      {
      char **arr = malloc(*n_tokens * sizeof *arr);
      char str2 [strlen(str) + 1];
      int i = 0;

      if (!arr) { /* validate every allocation */
      perror ("malloc-n_tokens");
      return NULL;
      }

      strcpy (str2, str);

      char *p = strtok (str2, " ");

      while (i < *n_tokens && p != NULL) { /* check used pointers */
      arr[i] = strdup (p);
      if (!arr[i]) { /* strdup allocates -> you must validate */
      perror ("strdup-arr[i]");
      if (i) /* if tokens stored, break an return */
      break;
      else { /* if no tokes stored, free pointers */
      free (arr);
      return NULL;
      }
      }
      p = strtok (NULL, " ");
      i++;
      }
      *n_tokens = i; /* assign i to make tokes assigned available */

      return arr;
      }

      int main (void) {

      char *str1 = "( 8 + ( 41 - 12 ) )";
      int n_tokens = 9;
      char **expression = token_arr (str1, &n_tokens);

      if (expression) { /* validate token_arr succeeded */
      for (int i = 0; i < n_tokens; i++) { /* n_tokens times */
      printf ("expression[%d] = %sn", i, expression[i]);
      free (expression[i]); /* free mem allocated by strdup */
      }
      free (expression);
      }

      return 0;
      }


      (note: likewise check the return of token_arr before making use of the return)



      Example Use/Output



      $ ./bin/token_arr
      expression[0] = (
      expression[1] = 8
      expression[2] = +
      expression[3] = (
      expression[4] = 41
      expression[5] = -
      expression[6] = 12
      expression[7] = )
      expression[8] = )


      Memory Use/Error Check



      In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.



      It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.



      For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.



      $ valgrind ./bin/token_arr
      ==8420== Memcheck, a memory error detector
      ==8420== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
      ==8420== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
      ==8420== Command: ./bin/token_arr
      ==8420==
      expression[0] = (
      expression[1] = 8
      expression[2] = +
      expression[3] = (
      expression[4] = 41
      expression[5] = -
      expression[6] = 12
      expression[7] = )
      expression[8] = )
      ==8420==
      ==8420== HEAP SUMMARY:
      ==8420== in use at exit: 0 bytes in 0 blocks
      ==8420== total heap usage: 10 allocs, 10 frees, 92 bytes allocated
      ==8420==
      ==8420== All heap blocks were freed -- no leaks are possible
      ==8420==
      ==8420== For counts of detected and suppressed errors, rerun with: -v
      ==8420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


      Always confirm that you have freed all memory you have allocated and that there are no memory errors.



      Look things over and let me know if you have further questions.






      share|improve this answer























        up vote
        3
        down vote



        accepted







        up vote
        3
        down vote



        accepted






        While I think you have probably got most of the issues sorted based on the comments, let's look at a way to address both the validation/return of expressions and a way to return the number of tokens to protect against an error in tokenization resulting in less than n_tokens being found.



        As you have learned, when you declare str2 local to token_arr, it has automatic storage duration and is only valid within the scope where it is declared. When token_arr returns, the memory holding str2 is released for re-use and any attempt to reference that memory back in main() invokes Undefined Behavior.



        What are your options? (1) use strdup to dynamically allocate storage for each token, copy the token to the new memory allocated, and then assign the starting address for the new block of memory containing the token to arr[i], e.g.



                arr[i] = strdup (p);


        or (2) do the same thing manually using strlen, malloc & memcpy, e.g.



                size_t len = strlen(p);
        arr[i] = malloc (len + 1);
        /* validate - here */
        memcpy (arr[i], p, len + 1);


        Now each arr[i] points to a block of memory having allocated storage duration which remains valid until free is called on that block -- or the program ends.



        What If Less Than n_tokens Are Found?



        If less than n_tokens are found within token_arr and you attempt to use n_tokens through expressions back in main() you will likely invoke Undefined Behavior again. To ensure you only use the tokens found in token_arr and made available in main() by the assignment to expression -- Pass A Pointer To n_tokens as the second parameter and update it will the value of i before you return arr;, e.g.



        char **token_arr (const char *str, int *n_tokens)
        {
        char **arr = malloc(*n_tokens * sizeof *arr);
        ...
        i++;
        }
        *n_tokens = i; /* assign i to make tokes assigned available */

        return arr;
        }


        Now n_tokens back in main() contains only the number of tokens actually found and allocated for and assigned to arr[i] in token_arr.



        Validate Every Allocation



        It is critical that you validate every call to malloc, calloc, realloc, strdup or any other function that allocates memory for you. Allocation can, and does, fail. When it does, it let's you know by returning NULL instead of a pointer containing the beginning address for the new block of memory. Check every allocation.



        Putting it altogether, you could do something like:



        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        char **token_arr (const char *str, int *n_tokens)
        {
        char **arr = malloc(*n_tokens * sizeof *arr);
        char str2 [strlen(str) + 1];
        int i = 0;

        if (!arr) { /* validate every allocation */
        perror ("malloc-n_tokens");
        return NULL;
        }

        strcpy (str2, str);

        char *p = strtok (str2, " ");

        while (i < *n_tokens && p != NULL) { /* check used pointers */
        arr[i] = strdup (p);
        if (!arr[i]) { /* strdup allocates -> you must validate */
        perror ("strdup-arr[i]");
        if (i) /* if tokens stored, break an return */
        break;
        else { /* if no tokes stored, free pointers */
        free (arr);
        return NULL;
        }
        }
        p = strtok (NULL, " ");
        i++;
        }
        *n_tokens = i; /* assign i to make tokes assigned available */

        return arr;
        }

        int main (void) {

        char *str1 = "( 8 + ( 41 - 12 ) )";
        int n_tokens = 9;
        char **expression = token_arr (str1, &n_tokens);

        if (expression) { /* validate token_arr succeeded */
        for (int i = 0; i < n_tokens; i++) { /* n_tokens times */
        printf ("expression[%d] = %sn", i, expression[i]);
        free (expression[i]); /* free mem allocated by strdup */
        }
        free (expression);
        }

        return 0;
        }


        (note: likewise check the return of token_arr before making use of the return)



        Example Use/Output



        $ ./bin/token_arr
        expression[0] = (
        expression[1] = 8
        expression[2] = +
        expression[3] = (
        expression[4] = 41
        expression[5] = -
        expression[6] = 12
        expression[7] = )
        expression[8] = )


        Memory Use/Error Check



        In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.



        It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.



        For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.



        $ valgrind ./bin/token_arr
        ==8420== Memcheck, a memory error detector
        ==8420== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
        ==8420== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
        ==8420== Command: ./bin/token_arr
        ==8420==
        expression[0] = (
        expression[1] = 8
        expression[2] = +
        expression[3] = (
        expression[4] = 41
        expression[5] = -
        expression[6] = 12
        expression[7] = )
        expression[8] = )
        ==8420==
        ==8420== HEAP SUMMARY:
        ==8420== in use at exit: 0 bytes in 0 blocks
        ==8420== total heap usage: 10 allocs, 10 frees, 92 bytes allocated
        ==8420==
        ==8420== All heap blocks were freed -- no leaks are possible
        ==8420==
        ==8420== For counts of detected and suppressed errors, rerun with: -v
        ==8420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


        Always confirm that you have freed all memory you have allocated and that there are no memory errors.



        Look things over and let me know if you have further questions.






        share|improve this answer












        While I think you have probably got most of the issues sorted based on the comments, let's look at a way to address both the validation/return of expressions and a way to return the number of tokens to protect against an error in tokenization resulting in less than n_tokens being found.



        As you have learned, when you declare str2 local to token_arr, it has automatic storage duration and is only valid within the scope where it is declared. When token_arr returns, the memory holding str2 is released for re-use and any attempt to reference that memory back in main() invokes Undefined Behavior.



        What are your options? (1) use strdup to dynamically allocate storage for each token, copy the token to the new memory allocated, and then assign the starting address for the new block of memory containing the token to arr[i], e.g.



                arr[i] = strdup (p);


        or (2) do the same thing manually using strlen, malloc & memcpy, e.g.



                size_t len = strlen(p);
        arr[i] = malloc (len + 1);
        /* validate - here */
        memcpy (arr[i], p, len + 1);


        Now each arr[i] points to a block of memory having allocated storage duration which remains valid until free is called on that block -- or the program ends.



        What If Less Than n_tokens Are Found?



        If less than n_tokens are found within token_arr and you attempt to use n_tokens through expressions back in main() you will likely invoke Undefined Behavior again. To ensure you only use the tokens found in token_arr and made available in main() by the assignment to expression -- Pass A Pointer To n_tokens as the second parameter and update it will the value of i before you return arr;, e.g.



        char **token_arr (const char *str, int *n_tokens)
        {
        char **arr = malloc(*n_tokens * sizeof *arr);
        ...
        i++;
        }
        *n_tokens = i; /* assign i to make tokes assigned available */

        return arr;
        }


        Now n_tokens back in main() contains only the number of tokens actually found and allocated for and assigned to arr[i] in token_arr.



        Validate Every Allocation



        It is critical that you validate every call to malloc, calloc, realloc, strdup or any other function that allocates memory for you. Allocation can, and does, fail. When it does, it let's you know by returning NULL instead of a pointer containing the beginning address for the new block of memory. Check every allocation.



        Putting it altogether, you could do something like:



        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        char **token_arr (const char *str, int *n_tokens)
        {
        char **arr = malloc(*n_tokens * sizeof *arr);
        char str2 [strlen(str) + 1];
        int i = 0;

        if (!arr) { /* validate every allocation */
        perror ("malloc-n_tokens");
        return NULL;
        }

        strcpy (str2, str);

        char *p = strtok (str2, " ");

        while (i < *n_tokens && p != NULL) { /* check used pointers */
        arr[i] = strdup (p);
        if (!arr[i]) { /* strdup allocates -> you must validate */
        perror ("strdup-arr[i]");
        if (i) /* if tokens stored, break an return */
        break;
        else { /* if no tokes stored, free pointers */
        free (arr);
        return NULL;
        }
        }
        p = strtok (NULL, " ");
        i++;
        }
        *n_tokens = i; /* assign i to make tokes assigned available */

        return arr;
        }

        int main (void) {

        char *str1 = "( 8 + ( 41 - 12 ) )";
        int n_tokens = 9;
        char **expression = token_arr (str1, &n_tokens);

        if (expression) { /* validate token_arr succeeded */
        for (int i = 0; i < n_tokens; i++) { /* n_tokens times */
        printf ("expression[%d] = %sn", i, expression[i]);
        free (expression[i]); /* free mem allocated by strdup */
        }
        free (expression);
        }

        return 0;
        }


        (note: likewise check the return of token_arr before making use of the return)



        Example Use/Output



        $ ./bin/token_arr
        expression[0] = (
        expression[1] = 8
        expression[2] = +
        expression[3] = (
        expression[4] = 41
        expression[5] = -
        expression[6] = 12
        expression[7] = )
        expression[8] = )


        Memory Use/Error Check



        In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.



        It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.



        For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.



        $ valgrind ./bin/token_arr
        ==8420== Memcheck, a memory error detector
        ==8420== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
        ==8420== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
        ==8420== Command: ./bin/token_arr
        ==8420==
        expression[0] = (
        expression[1] = 8
        expression[2] = +
        expression[3] = (
        expression[4] = 41
        expression[5] = -
        expression[6] = 12
        expression[7] = )
        expression[8] = )
        ==8420==
        ==8420== HEAP SUMMARY:
        ==8420== in use at exit: 0 bytes in 0 blocks
        ==8420== total heap usage: 10 allocs, 10 frees, 92 bytes allocated
        ==8420==
        ==8420== All heap blocks were freed -- no leaks are possible
        ==8420==
        ==8420== For counts of detected and suppressed errors, rerun with: -v
        ==8420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


        Always confirm that you have freed all memory you have allocated and that there are no memory errors.



        Look things over and let me know if you have further questions.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 11 at 6:37









        David C. Rankin

        39.2k32546




        39.2k32546






























             

            draft saved


            draft discarded



















































             


            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53246140%2fsplitting-a-string-into-tokens-and-putting-tokens-into-array-strtok%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Full-time equivalent

            Bicuculline

            さくらももこ