Splitting a string into tokens and putting tokens into array - strtok
up vote
1
down vote
favorite
char** token_arr(char* str, int n_tokens)
{
char**arr = malloc((n_tokens+1)*sizeof(char*));
char str2[n_tokens + 1];
strcpy(str2,str);
int i = 0;
char *p = strtok (str2, " ");
while (p != NULL)
{
arr[i] = p;
//printf("%sn", arr[i]);
p = strtok (NULL, " ");
i++;
}
return arr;
}
The purpose of token_arr is to get a string and a number of tokens, then put the tokens into an array. The array of tokens is returned.
int main(void) {
char*str1 = "( 8 + ( 41 - 12 ) )";
char**expression = token_arr(str1, 9);
for(int i = 0; i < 9; i++)
printf("expression[%d] = %cn", i, *expression2[i]);
return 0;
}
Output:
expression2[0] = (
expression2[1] =
expression2[2] =
expression2[3] =
expression2[4] =
expression2[5] =
expression2[6] =
expression2[7] =
expression2[8] =
Why is only the first value being printed? What's wrong with my code?
c arrays output strtok
add a comment |
up vote
1
down vote
favorite
char** token_arr(char* str, int n_tokens)
{
char**arr = malloc((n_tokens+1)*sizeof(char*));
char str2[n_tokens + 1];
strcpy(str2,str);
int i = 0;
char *p = strtok (str2, " ");
while (p != NULL)
{
arr[i] = p;
//printf("%sn", arr[i]);
p = strtok (NULL, " ");
i++;
}
return arr;
}
The purpose of token_arr is to get a string and a number of tokens, then put the tokens into an array. The array of tokens is returned.
int main(void) {
char*str1 = "( 8 + ( 41 - 12 ) )";
char**expression = token_arr(str1, 9);
for(int i = 0; i < 9; i++)
printf("expression[%d] = %cn", i, *expression2[i]);
return 0;
}
Output:
expression2[0] = (
expression2[1] =
expression2[2] =
expression2[3] =
expression2[4] =
expression2[5] =
expression2[6] =
expression2[7] =
expression2[8] =
Why is only the first value being printed? What's wrong with my code?
c arrays output strtok
1
You are allocating the array of pointers, but the pointers point to a local variable,str2
, in the function, and that variable is out of scope once you're back inmain()
, so you're accessing invalid data that has been reused and presumably zeroed. You'll need to allocate the string where you copy the string literal, and you'll need to find a way to pass both allocated pointers back to themain()
so you can free the allocated memory accurately. As it stands, you have 'undefined behaviour'. That means that what you see is a valid response, as are many other possible behaviours.
– Jonathan Leffler
Nov 11 at 5:47
Unless you are usingcalloc
ormemset
to preserve a sentinel NULL at the end ofarr
, there is no need for+1
inchar**arr = malloc((n_tokens+1)*sizeof(char*));
.arr
is simply the number of pointers you are allocating. If you only haven_tokens
you only need that number of pointers unless you are ensuring you have the final pointerNULL
to allow iteration with awhile
loop -- but in that case you need to set ALL pointersNULL
to begin with.
– David C. Rankin
Nov 11 at 5:57
You also need to avoid writing to more pointers than you have. So you need to ensurei
never exceedsn_tokens+1
as you currently have it, e.g.while (i < n_tokens + 1 && p != NULL)
. You can allocate for each token withstrdup
, e.g.arr[i] = strdup (p);
. Now the contents ofarr[n]
will survive the return.
– David C. Rankin
Nov 11 at 6:02
add a comment |
up vote
1
down vote
favorite
up vote
1
down vote
favorite
char** token_arr(char* str, int n_tokens)
{
char**arr = malloc((n_tokens+1)*sizeof(char*));
char str2[n_tokens + 1];
strcpy(str2,str);
int i = 0;
char *p = strtok (str2, " ");
while (p != NULL)
{
arr[i] = p;
//printf("%sn", arr[i]);
p = strtok (NULL, " ");
i++;
}
return arr;
}
The purpose of token_arr is to get a string and a number of tokens, then put the tokens into an array. The array of tokens is returned.
int main(void) {
char*str1 = "( 8 + ( 41 - 12 ) )";
char**expression = token_arr(str1, 9);
for(int i = 0; i < 9; i++)
printf("expression[%d] = %cn", i, *expression2[i]);
return 0;
}
Output:
expression2[0] = (
expression2[1] =
expression2[2] =
expression2[3] =
expression2[4] =
expression2[5] =
expression2[6] =
expression2[7] =
expression2[8] =
Why is only the first value being printed? What's wrong with my code?
c arrays output strtok
char** token_arr(char* str, int n_tokens)
{
char**arr = malloc((n_tokens+1)*sizeof(char*));
char str2[n_tokens + 1];
strcpy(str2,str);
int i = 0;
char *p = strtok (str2, " ");
while (p != NULL)
{
arr[i] = p;
//printf("%sn", arr[i]);
p = strtok (NULL, " ");
i++;
}
return arr;
}
The purpose of token_arr is to get a string and a number of tokens, then put the tokens into an array. The array of tokens is returned.
int main(void) {
char*str1 = "( 8 + ( 41 - 12 ) )";
char**expression = token_arr(str1, 9);
for(int i = 0; i < 9; i++)
printf("expression[%d] = %cn", i, *expression2[i]);
return 0;
}
Output:
expression2[0] = (
expression2[1] =
expression2[2] =
expression2[3] =
expression2[4] =
expression2[5] =
expression2[6] =
expression2[7] =
expression2[8] =
Why is only the first value being printed? What's wrong with my code?
c arrays output strtok
c arrays output strtok
asked Nov 11 at 5:37
Treavor
85
85
1
You are allocating the array of pointers, but the pointers point to a local variable,str2
, in the function, and that variable is out of scope once you're back inmain()
, so you're accessing invalid data that has been reused and presumably zeroed. You'll need to allocate the string where you copy the string literal, and you'll need to find a way to pass both allocated pointers back to themain()
so you can free the allocated memory accurately. As it stands, you have 'undefined behaviour'. That means that what you see is a valid response, as are many other possible behaviours.
– Jonathan Leffler
Nov 11 at 5:47
Unless you are usingcalloc
ormemset
to preserve a sentinel NULL at the end ofarr
, there is no need for+1
inchar**arr = malloc((n_tokens+1)*sizeof(char*));
.arr
is simply the number of pointers you are allocating. If you only haven_tokens
you only need that number of pointers unless you are ensuring you have the final pointerNULL
to allow iteration with awhile
loop -- but in that case you need to set ALL pointersNULL
to begin with.
– David C. Rankin
Nov 11 at 5:57
You also need to avoid writing to more pointers than you have. So you need to ensurei
never exceedsn_tokens+1
as you currently have it, e.g.while (i < n_tokens + 1 && p != NULL)
. You can allocate for each token withstrdup
, e.g.arr[i] = strdup (p);
. Now the contents ofarr[n]
will survive the return.
– David C. Rankin
Nov 11 at 6:02
add a comment |
1
You are allocating the array of pointers, but the pointers point to a local variable,str2
, in the function, and that variable is out of scope once you're back inmain()
, so you're accessing invalid data that has been reused and presumably zeroed. You'll need to allocate the string where you copy the string literal, and you'll need to find a way to pass both allocated pointers back to themain()
so you can free the allocated memory accurately. As it stands, you have 'undefined behaviour'. That means that what you see is a valid response, as are many other possible behaviours.
– Jonathan Leffler
Nov 11 at 5:47
Unless you are usingcalloc
ormemset
to preserve a sentinel NULL at the end ofarr
, there is no need for+1
inchar**arr = malloc((n_tokens+1)*sizeof(char*));
.arr
is simply the number of pointers you are allocating. If you only haven_tokens
you only need that number of pointers unless you are ensuring you have the final pointerNULL
to allow iteration with awhile
loop -- but in that case you need to set ALL pointersNULL
to begin with.
– David C. Rankin
Nov 11 at 5:57
You also need to avoid writing to more pointers than you have. So you need to ensurei
never exceedsn_tokens+1
as you currently have it, e.g.while (i < n_tokens + 1 && p != NULL)
. You can allocate for each token withstrdup
, e.g.arr[i] = strdup (p);
. Now the contents ofarr[n]
will survive the return.
– David C. Rankin
Nov 11 at 6:02
1
1
You are allocating the array of pointers, but the pointers point to a local variable,
str2
, in the function, and that variable is out of scope once you're back in main()
, so you're accessing invalid data that has been reused and presumably zeroed. You'll need to allocate the string where you copy the string literal, and you'll need to find a way to pass both allocated pointers back to the main()
so you can free the allocated memory accurately. As it stands, you have 'undefined behaviour'. That means that what you see is a valid response, as are many other possible behaviours.– Jonathan Leffler
Nov 11 at 5:47
You are allocating the array of pointers, but the pointers point to a local variable,
str2
, in the function, and that variable is out of scope once you're back in main()
, so you're accessing invalid data that has been reused and presumably zeroed. You'll need to allocate the string where you copy the string literal, and you'll need to find a way to pass both allocated pointers back to the main()
so you can free the allocated memory accurately. As it stands, you have 'undefined behaviour'. That means that what you see is a valid response, as are many other possible behaviours.– Jonathan Leffler
Nov 11 at 5:47
Unless you are using
calloc
or memset
to preserve a sentinel NULL at the end of arr
, there is no need for +1
in char**arr = malloc((n_tokens+1)*sizeof(char*));
. arr
is simply the number of pointers you are allocating. If you only have n_tokens
you only need that number of pointers unless you are ensuring you have the final pointer NULL
to allow iteration with a while
loop -- but in that case you need to set ALL pointers NULL
to begin with.– David C. Rankin
Nov 11 at 5:57
Unless you are using
calloc
or memset
to preserve a sentinel NULL at the end of arr
, there is no need for +1
in char**arr = malloc((n_tokens+1)*sizeof(char*));
. arr
is simply the number of pointers you are allocating. If you only have n_tokens
you only need that number of pointers unless you are ensuring you have the final pointer NULL
to allow iteration with a while
loop -- but in that case you need to set ALL pointers NULL
to begin with.– David C. Rankin
Nov 11 at 5:57
You also need to avoid writing to more pointers than you have. So you need to ensure
i
never exceeds n_tokens+1
as you currently have it, e.g. while (i < n_tokens + 1 && p != NULL)
. You can allocate for each token with strdup
, e.g. arr[i] = strdup (p);
. Now the contents of arr[n]
will survive the return.– David C. Rankin
Nov 11 at 6:02
You also need to avoid writing to more pointers than you have. So you need to ensure
i
never exceeds n_tokens+1
as you currently have it, e.g. while (i < n_tokens + 1 && p != NULL)
. You can allocate for each token with strdup
, e.g. arr[i] = strdup (p);
. Now the contents of arr[n]
will survive the return.– David C. Rankin
Nov 11 at 6:02
add a comment |
1 Answer
1
active
oldest
votes
up vote
3
down vote
accepted
While I think you have probably got most of the issues sorted based on the comments, let's look at a way to address both the validation/return of expressions
and a way to return the number of tokens to protect against an error in tokenization resulting in less than n_tokens
being found.
As you have learned, when you declare str2
local to token_arr
, it has automatic storage duration and is only valid within the scope where it is declared. When token_arr
returns, the memory holding str2
is released for re-use and any attempt to reference that memory back in main()
invokes Undefined Behavior.
What are your options? (1) use strdup
to dynamically allocate storage for each token, copy the token to the new memory allocated, and then assign the starting address for the new block of memory containing the token to arr[i]
, e.g.
arr[i] = strdup (p);
or (2) do the same thing manually using strlen, malloc & memcpy
, e.g.
size_t len = strlen(p);
arr[i] = malloc (len + 1);
/* validate - here */
memcpy (arr[i], p, len + 1);
Now each arr[i]
points to a block of memory having allocated storage duration which remains valid until free
is called on that block -- or the program ends.
What If Less Than n_tokens
Are Found?
If less than n_tokens
are found within token_arr
and you attempt to use n_tokens
through expressions
back in main()
you will likely invoke Undefined Behavior again. To ensure you only use the tokens found in token_arr
and made available in main()
by the assignment to expression
-- Pass A Pointer To n_tokens
as the second parameter and update it will the value of i
before you return arr;
, e.g.
char **token_arr (const char *str, int *n_tokens)
{
char **arr = malloc(*n_tokens * sizeof *arr);
...
i++;
}
*n_tokens = i; /* assign i to make tokes assigned available */
return arr;
}
Now n_tokens
back in main()
contains only the number of tokens actually found and allocated for and assigned to arr[i]
in token_arr
.
Validate Every Allocation
It is critical that you validate every call to malloc, calloc, realloc, strdup
or any other function that allocates memory for you. Allocation can, and does, fail. When it does, it let's you know by returning NULL
instead of a pointer containing the beginning address for the new block of memory. Check every allocation.
Putting it altogether, you could do something like:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **token_arr (const char *str, int *n_tokens)
{
char **arr = malloc(*n_tokens * sizeof *arr);
char str2 [strlen(str) + 1];
int i = 0;
if (!arr) { /* validate every allocation */
perror ("malloc-n_tokens");
return NULL;
}
strcpy (str2, str);
char *p = strtok (str2, " ");
while (i < *n_tokens && p != NULL) { /* check used pointers */
arr[i] = strdup (p);
if (!arr[i]) { /* strdup allocates -> you must validate */
perror ("strdup-arr[i]");
if (i) /* if tokens stored, break an return */
break;
else { /* if no tokes stored, free pointers */
free (arr);
return NULL;
}
}
p = strtok (NULL, " ");
i++;
}
*n_tokens = i; /* assign i to make tokes assigned available */
return arr;
}
int main (void) {
char *str1 = "( 8 + ( 41 - 12 ) )";
int n_tokens = 9;
char **expression = token_arr (str1, &n_tokens);
if (expression) { /* validate token_arr succeeded */
for (int i = 0; i < n_tokens; i++) { /* n_tokens times */
printf ("expression[%d] = %sn", i, expression[i]);
free (expression[i]); /* free mem allocated by strdup */
}
free (expression);
}
return 0;
}
(note: likewise check the return of token_arr
before making use of the return)
Example Use/Output
$ ./bin/token_arr
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind
is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/token_arr
==8420== Memcheck, a memory error detector
==8420== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8420== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==8420== Command: ./bin/token_arr
==8420==
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )
==8420==
==8420== HEAP SUMMARY:
==8420== in use at exit: 0 bytes in 0 blocks
==8420== total heap usage: 10 allocs, 10 frees, 92 bytes allocated
==8420==
==8420== All heap blocks were freed -- no leaks are possible
==8420==
==8420== For counts of detected and suppressed errors, rerun with: -v
==8420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
3
down vote
accepted
While I think you have probably got most of the issues sorted based on the comments, let's look at a way to address both the validation/return of expressions
and a way to return the number of tokens to protect against an error in tokenization resulting in less than n_tokens
being found.
As you have learned, when you declare str2
local to token_arr
, it has automatic storage duration and is only valid within the scope where it is declared. When token_arr
returns, the memory holding str2
is released for re-use and any attempt to reference that memory back in main()
invokes Undefined Behavior.
What are your options? (1) use strdup
to dynamically allocate storage for each token, copy the token to the new memory allocated, and then assign the starting address for the new block of memory containing the token to arr[i]
, e.g.
arr[i] = strdup (p);
or (2) do the same thing manually using strlen, malloc & memcpy
, e.g.
size_t len = strlen(p);
arr[i] = malloc (len + 1);
/* validate - here */
memcpy (arr[i], p, len + 1);
Now each arr[i]
points to a block of memory having allocated storage duration which remains valid until free
is called on that block -- or the program ends.
What If Less Than n_tokens
Are Found?
If less than n_tokens
are found within token_arr
and you attempt to use n_tokens
through expressions
back in main()
you will likely invoke Undefined Behavior again. To ensure you only use the tokens found in token_arr
and made available in main()
by the assignment to expression
-- Pass A Pointer To n_tokens
as the second parameter and update it will the value of i
before you return arr;
, e.g.
char **token_arr (const char *str, int *n_tokens)
{
char **arr = malloc(*n_tokens * sizeof *arr);
...
i++;
}
*n_tokens = i; /* assign i to make tokes assigned available */
return arr;
}
Now n_tokens
back in main()
contains only the number of tokens actually found and allocated for and assigned to arr[i]
in token_arr
.
Validate Every Allocation
It is critical that you validate every call to malloc, calloc, realloc, strdup
or any other function that allocates memory for you. Allocation can, and does, fail. When it does, it let's you know by returning NULL
instead of a pointer containing the beginning address for the new block of memory. Check every allocation.
Putting it altogether, you could do something like:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **token_arr (const char *str, int *n_tokens)
{
char **arr = malloc(*n_tokens * sizeof *arr);
char str2 [strlen(str) + 1];
int i = 0;
if (!arr) { /* validate every allocation */
perror ("malloc-n_tokens");
return NULL;
}
strcpy (str2, str);
char *p = strtok (str2, " ");
while (i < *n_tokens && p != NULL) { /* check used pointers */
arr[i] = strdup (p);
if (!arr[i]) { /* strdup allocates -> you must validate */
perror ("strdup-arr[i]");
if (i) /* if tokens stored, break an return */
break;
else { /* if no tokes stored, free pointers */
free (arr);
return NULL;
}
}
p = strtok (NULL, " ");
i++;
}
*n_tokens = i; /* assign i to make tokes assigned available */
return arr;
}
int main (void) {
char *str1 = "( 8 + ( 41 - 12 ) )";
int n_tokens = 9;
char **expression = token_arr (str1, &n_tokens);
if (expression) { /* validate token_arr succeeded */
for (int i = 0; i < n_tokens; i++) { /* n_tokens times */
printf ("expression[%d] = %sn", i, expression[i]);
free (expression[i]); /* free mem allocated by strdup */
}
free (expression);
}
return 0;
}
(note: likewise check the return of token_arr
before making use of the return)
Example Use/Output
$ ./bin/token_arr
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind
is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/token_arr
==8420== Memcheck, a memory error detector
==8420== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8420== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==8420== Command: ./bin/token_arr
==8420==
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )
==8420==
==8420== HEAP SUMMARY:
==8420== in use at exit: 0 bytes in 0 blocks
==8420== total heap usage: 10 allocs, 10 frees, 92 bytes allocated
==8420==
==8420== All heap blocks were freed -- no leaks are possible
==8420==
==8420== For counts of detected and suppressed errors, rerun with: -v
==8420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.
add a comment |
up vote
3
down vote
accepted
While I think you have probably got most of the issues sorted based on the comments, let's look at a way to address both the validation/return of expressions
and a way to return the number of tokens to protect against an error in tokenization resulting in less than n_tokens
being found.
As you have learned, when you declare str2
local to token_arr
, it has automatic storage duration and is only valid within the scope where it is declared. When token_arr
returns, the memory holding str2
is released for re-use and any attempt to reference that memory back in main()
invokes Undefined Behavior.
What are your options? (1) use strdup
to dynamically allocate storage for each token, copy the token to the new memory allocated, and then assign the starting address for the new block of memory containing the token to arr[i]
, e.g.
arr[i] = strdup (p);
or (2) do the same thing manually using strlen, malloc & memcpy
, e.g.
size_t len = strlen(p);
arr[i] = malloc (len + 1);
/* validate - here */
memcpy (arr[i], p, len + 1);
Now each arr[i]
points to a block of memory having allocated storage duration which remains valid until free
is called on that block -- or the program ends.
What If Less Than n_tokens
Are Found?
If less than n_tokens
are found within token_arr
and you attempt to use n_tokens
through expressions
back in main()
you will likely invoke Undefined Behavior again. To ensure you only use the tokens found in token_arr
and made available in main()
by the assignment to expression
-- Pass A Pointer To n_tokens
as the second parameter and update it will the value of i
before you return arr;
, e.g.
char **token_arr (const char *str, int *n_tokens)
{
char **arr = malloc(*n_tokens * sizeof *arr);
...
i++;
}
*n_tokens = i; /* assign i to make tokes assigned available */
return arr;
}
Now n_tokens
back in main()
contains only the number of tokens actually found and allocated for and assigned to arr[i]
in token_arr
.
Validate Every Allocation
It is critical that you validate every call to malloc, calloc, realloc, strdup
or any other function that allocates memory for you. Allocation can, and does, fail. When it does, it let's you know by returning NULL
instead of a pointer containing the beginning address for the new block of memory. Check every allocation.
Putting it altogether, you could do something like:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **token_arr (const char *str, int *n_tokens)
{
char **arr = malloc(*n_tokens * sizeof *arr);
char str2 [strlen(str) + 1];
int i = 0;
if (!arr) { /* validate every allocation */
perror ("malloc-n_tokens");
return NULL;
}
strcpy (str2, str);
char *p = strtok (str2, " ");
while (i < *n_tokens && p != NULL) { /* check used pointers */
arr[i] = strdup (p);
if (!arr[i]) { /* strdup allocates -> you must validate */
perror ("strdup-arr[i]");
if (i) /* if tokens stored, break an return */
break;
else { /* if no tokes stored, free pointers */
free (arr);
return NULL;
}
}
p = strtok (NULL, " ");
i++;
}
*n_tokens = i; /* assign i to make tokes assigned available */
return arr;
}
int main (void) {
char *str1 = "( 8 + ( 41 - 12 ) )";
int n_tokens = 9;
char **expression = token_arr (str1, &n_tokens);
if (expression) { /* validate token_arr succeeded */
for (int i = 0; i < n_tokens; i++) { /* n_tokens times */
printf ("expression[%d] = %sn", i, expression[i]);
free (expression[i]); /* free mem allocated by strdup */
}
free (expression);
}
return 0;
}
(note: likewise check the return of token_arr
before making use of the return)
Example Use/Output
$ ./bin/token_arr
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind
is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/token_arr
==8420== Memcheck, a memory error detector
==8420== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8420== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==8420== Command: ./bin/token_arr
==8420==
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )
==8420==
==8420== HEAP SUMMARY:
==8420== in use at exit: 0 bytes in 0 blocks
==8420== total heap usage: 10 allocs, 10 frees, 92 bytes allocated
==8420==
==8420== All heap blocks were freed -- no leaks are possible
==8420==
==8420== For counts of detected and suppressed errors, rerun with: -v
==8420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.
add a comment |
up vote
3
down vote
accepted
up vote
3
down vote
accepted
While I think you have probably got most of the issues sorted based on the comments, let's look at a way to address both the validation/return of expressions
and a way to return the number of tokens to protect against an error in tokenization resulting in less than n_tokens
being found.
As you have learned, when you declare str2
local to token_arr
, it has automatic storage duration and is only valid within the scope where it is declared. When token_arr
returns, the memory holding str2
is released for re-use and any attempt to reference that memory back in main()
invokes Undefined Behavior.
What are your options? (1) use strdup
to dynamically allocate storage for each token, copy the token to the new memory allocated, and then assign the starting address for the new block of memory containing the token to arr[i]
, e.g.
arr[i] = strdup (p);
or (2) do the same thing manually using strlen, malloc & memcpy
, e.g.
size_t len = strlen(p);
arr[i] = malloc (len + 1);
/* validate - here */
memcpy (arr[i], p, len + 1);
Now each arr[i]
points to a block of memory having allocated storage duration which remains valid until free
is called on that block -- or the program ends.
What If Less Than n_tokens
Are Found?
If less than n_tokens
are found within token_arr
and you attempt to use n_tokens
through expressions
back in main()
you will likely invoke Undefined Behavior again. To ensure you only use the tokens found in token_arr
and made available in main()
by the assignment to expression
-- Pass A Pointer To n_tokens
as the second parameter and update it will the value of i
before you return arr;
, e.g.
char **token_arr (const char *str, int *n_tokens)
{
char **arr = malloc(*n_tokens * sizeof *arr);
...
i++;
}
*n_tokens = i; /* assign i to make tokes assigned available */
return arr;
}
Now n_tokens
back in main()
contains only the number of tokens actually found and allocated for and assigned to arr[i]
in token_arr
.
Validate Every Allocation
It is critical that you validate every call to malloc, calloc, realloc, strdup
or any other function that allocates memory for you. Allocation can, and does, fail. When it does, it let's you know by returning NULL
instead of a pointer containing the beginning address for the new block of memory. Check every allocation.
Putting it altogether, you could do something like:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **token_arr (const char *str, int *n_tokens)
{
char **arr = malloc(*n_tokens * sizeof *arr);
char str2 [strlen(str) + 1];
int i = 0;
if (!arr) { /* validate every allocation */
perror ("malloc-n_tokens");
return NULL;
}
strcpy (str2, str);
char *p = strtok (str2, " ");
while (i < *n_tokens && p != NULL) { /* check used pointers */
arr[i] = strdup (p);
if (!arr[i]) { /* strdup allocates -> you must validate */
perror ("strdup-arr[i]");
if (i) /* if tokens stored, break an return */
break;
else { /* if no tokes stored, free pointers */
free (arr);
return NULL;
}
}
p = strtok (NULL, " ");
i++;
}
*n_tokens = i; /* assign i to make tokes assigned available */
return arr;
}
int main (void) {
char *str1 = "( 8 + ( 41 - 12 ) )";
int n_tokens = 9;
char **expression = token_arr (str1, &n_tokens);
if (expression) { /* validate token_arr succeeded */
for (int i = 0; i < n_tokens; i++) { /* n_tokens times */
printf ("expression[%d] = %sn", i, expression[i]);
free (expression[i]); /* free mem allocated by strdup */
}
free (expression);
}
return 0;
}
(note: likewise check the return of token_arr
before making use of the return)
Example Use/Output
$ ./bin/token_arr
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind
is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/token_arr
==8420== Memcheck, a memory error detector
==8420== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8420== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==8420== Command: ./bin/token_arr
==8420==
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )
==8420==
==8420== HEAP SUMMARY:
==8420== in use at exit: 0 bytes in 0 blocks
==8420== total heap usage: 10 allocs, 10 frees, 92 bytes allocated
==8420==
==8420== All heap blocks were freed -- no leaks are possible
==8420==
==8420== For counts of detected and suppressed errors, rerun with: -v
==8420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.
While I think you have probably got most of the issues sorted based on the comments, let's look at a way to address both the validation/return of expressions
and a way to return the number of tokens to protect against an error in tokenization resulting in less than n_tokens
being found.
As you have learned, when you declare str2
local to token_arr
, it has automatic storage duration and is only valid within the scope where it is declared. When token_arr
returns, the memory holding str2
is released for re-use and any attempt to reference that memory back in main()
invokes Undefined Behavior.
What are your options? (1) use strdup
to dynamically allocate storage for each token, copy the token to the new memory allocated, and then assign the starting address for the new block of memory containing the token to arr[i]
, e.g.
arr[i] = strdup (p);
or (2) do the same thing manually using strlen, malloc & memcpy
, e.g.
size_t len = strlen(p);
arr[i] = malloc (len + 1);
/* validate - here */
memcpy (arr[i], p, len + 1);
Now each arr[i]
points to a block of memory having allocated storage duration which remains valid until free
is called on that block -- or the program ends.
What If Less Than n_tokens
Are Found?
If less than n_tokens
are found within token_arr
and you attempt to use n_tokens
through expressions
back in main()
you will likely invoke Undefined Behavior again. To ensure you only use the tokens found in token_arr
and made available in main()
by the assignment to expression
-- Pass A Pointer To n_tokens
as the second parameter and update it will the value of i
before you return arr;
, e.g.
char **token_arr (const char *str, int *n_tokens)
{
char **arr = malloc(*n_tokens * sizeof *arr);
...
i++;
}
*n_tokens = i; /* assign i to make tokes assigned available */
return arr;
}
Now n_tokens
back in main()
contains only the number of tokens actually found and allocated for and assigned to arr[i]
in token_arr
.
Validate Every Allocation
It is critical that you validate every call to malloc, calloc, realloc, strdup
or any other function that allocates memory for you. Allocation can, and does, fail. When it does, it let's you know by returning NULL
instead of a pointer containing the beginning address for the new block of memory. Check every allocation.
Putting it altogether, you could do something like:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **token_arr (const char *str, int *n_tokens)
{
char **arr = malloc(*n_tokens * sizeof *arr);
char str2 [strlen(str) + 1];
int i = 0;
if (!arr) { /* validate every allocation */
perror ("malloc-n_tokens");
return NULL;
}
strcpy (str2, str);
char *p = strtok (str2, " ");
while (i < *n_tokens && p != NULL) { /* check used pointers */
arr[i] = strdup (p);
if (!arr[i]) { /* strdup allocates -> you must validate */
perror ("strdup-arr[i]");
if (i) /* if tokens stored, break an return */
break;
else { /* if no tokes stored, free pointers */
free (arr);
return NULL;
}
}
p = strtok (NULL, " ");
i++;
}
*n_tokens = i; /* assign i to make tokes assigned available */
return arr;
}
int main (void) {
char *str1 = "( 8 + ( 41 - 12 ) )";
int n_tokens = 9;
char **expression = token_arr (str1, &n_tokens);
if (expression) { /* validate token_arr succeeded */
for (int i = 0; i < n_tokens; i++) { /* n_tokens times */
printf ("expression[%d] = %sn", i, expression[i]);
free (expression[i]); /* free mem allocated by strdup */
}
free (expression);
}
return 0;
}
(note: likewise check the return of token_arr
before making use of the return)
Example Use/Output
$ ./bin/token_arr
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind
is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/token_arr
==8420== Memcheck, a memory error detector
==8420== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8420== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==8420== Command: ./bin/token_arr
==8420==
expression[0] = (
expression[1] = 8
expression[2] = +
expression[3] = (
expression[4] = 41
expression[5] = -
expression[6] = 12
expression[7] = )
expression[8] = )
==8420==
==8420== HEAP SUMMARY:
==8420== in use at exit: 0 bytes in 0 blocks
==8420== total heap usage: 10 allocs, 10 frees, 92 bytes allocated
==8420==
==8420== All heap blocks were freed -- no leaks are possible
==8420==
==8420== For counts of detected and suppressed errors, rerun with: -v
==8420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.
answered Nov 11 at 6:37
David C. Rankin
39.2k32546
39.2k32546
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53246140%2fsplitting-a-string-into-tokens-and-putting-tokens-into-array-strtok%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
You are allocating the array of pointers, but the pointers point to a local variable,
str2
, in the function, and that variable is out of scope once you're back inmain()
, so you're accessing invalid data that has been reused and presumably zeroed. You'll need to allocate the string where you copy the string literal, and you'll need to find a way to pass both allocated pointers back to themain()
so you can free the allocated memory accurately. As it stands, you have 'undefined behaviour'. That means that what you see is a valid response, as are many other possible behaviours.– Jonathan Leffler
Nov 11 at 5:47
Unless you are using
calloc
ormemset
to preserve a sentinel NULL at the end ofarr
, there is no need for+1
inchar**arr = malloc((n_tokens+1)*sizeof(char*));
.arr
is simply the number of pointers you are allocating. If you only haven_tokens
you only need that number of pointers unless you are ensuring you have the final pointerNULL
to allow iteration with awhile
loop -- but in that case you need to set ALL pointersNULL
to begin with.– David C. Rankin
Nov 11 at 5:57
You also need to avoid writing to more pointers than you have. So you need to ensure
i
never exceedsn_tokens+1
as you currently have it, e.g.while (i < n_tokens + 1 && p != NULL)
. You can allocate for each token withstrdup
, e.g.arr[i] = strdup (p);
. Now the contents ofarr[n]
will survive the return.– David C. Rankin
Nov 11 at 6:02