Splitting a delimited string into pieces

Problem

You want to split a comma- or tab-delimited string into a vector containing strings or numbers.

Solution

There are two general methods. The first is simpler and handles a single string using only basic MATLAB functions. The second uses the textscan() function; it is more powerful and flexible, but it may be slower.

First method

This handles input with mixed types (text and numbers) and returns a cell array containing the values. Text fields remain strings; fields that contain numbers are converted to numbers. If none of the input is numeric, the part that attempts and tests the number conversion can be removed.

string    = 'one,two,,four';
delimiter = ',';    % For tabs, use: delimiter = sprintf('\t');

% Find the delimiters
delimIdx = find(string == delimiter);

% Pretend there are delimiters at the beginning and end, for the loop below
delimIdx = [0  delimIdx  length(string)+1];

% Preallocate cell array to hold substrings
subStrings = cell(1, length(delimIdx) - 1);

% Process each element
for i = 1:length(subStrings)

    % Find the text between the delimiters
    % (don't include the delimiters)
    startOffset = delimIdx(i)   + 1;
    endOffset   = delimIdx(i+1) - 1;

    % Get the element
    txt = string(startOffset:endOffset);

    % Attempt conversion to number
    num = sscanf(txt, '%f');

    % sscanf returns an empty result (not an error) when the text is not a number
    if isempty(num)
        subStrings{i} = txt;
    else
        subStrings{i} = num;
    end

end

% Print out the strings
subStrings
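
One caveat with the conversion test above: sscanf() accepts a partial match, so a field such as '12abc' is converted to the number 12. If that matters for your data, a stricter check is possible with str2double(), which returns NaN unless the whole field is a number. The following is a minimal sketch of the difference, not a replacement for the loop above; the variable names loose, strict, and value are made up for illustration.

```matlab
txt = '12abc';

% sscanf() stops at the first character it cannot parse,
% so the leading digits are still converted
loose = sscanf(txt, '%f');      % 12

% str2double() requires the whole text to be a number
strict = str2double(txt);       % NaN

% A stricter version of the test used in the loop
if isnan(strict)
    value = txt;                % keep the field as text
else
    value = strict;             % keep the field as a number
end
```

Note that with this stricter test, a field that literally contains the text NaN is also kept as text.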

If you know that your string contains only numbers, it may be more convenient to return a normal (non-cell) vector with the values. In the version below, empty or non-numeric fields become NaN.

string    = '1,2,,4';
delimiter = ',';

% Find the delimiters
delimIdx = find(string == delimiter);

% Pretend there are delimiters at the beginning and end, for the loop below
delimIdx = [0  delimIdx  length(string)+1];

% Preallocate an array to hold values
values = zeros(1, length(delimIdx) - 1);

% Process each element
for i = 1:length(values)

    % Find the text between the delimiters
    % (don't include the delimiters)
    startOffset = delimIdx(i)   + 1;
    endOffset   = delimIdx(i+1) - 1;

    % Get the element
    txt = string(startOffset:endOffset);

    % Attempt conversion to number
    num = sscanf(txt, '%f');

    % If the field is empty or not a number, assign NaN; otherwise assign the number
    if isempty(num)
        values(i) = NaN;
    else
        values(i) = num;
    end

end

% Print out the values
values

Second method

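The textscan() function converts a string to a cell array with one cell per field in the format specification. A cell for a numeric field holds a numeric array, and a cell for a string field holds another cell array of strings. Note that this nesting is different from a two-dimensional cell array.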

str = 'asdf,35,4,w,2';
values = textscan(str, '%s%f%f%s%f', 'delimiter', ',');

values{2}(1)  % returns 35
values{2}     % also returns [35], which is equivalent to 35, since it's a 1x1 matrix
values{1}{1}  % returns 'asdf'
values{1}     % returns a 1x1 cell array: { 'asdf' }
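
To make the nesting concrete, the following sketch inspects the size and class of what textscan() returns for the same input. It only uses the standard size(), class(), and iscell() functions; the variable names outerSize, firstClass, and secondClass are chosen for illustration.

```matlab
str = 'asdf,35,4,w,2';
values = textscan(str, '%s%f%f%s%f', 'delimiter', ',');

% The outer cell array has one cell per field in the format string
outerSize = size(values);        % [1 5]

% A %s field is stored as a nested cell array of strings
firstClass = class(values{1});   % 'cell'

% A %f field is stored as an ordinary numeric array
secondClass = class(values{2});  % 'double'
```

This is why string fields need two levels of curly braces, as in values{1}{1}, while numeric fields use curly braces followed by parentheses, as in values{2}(1).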

In the format specification string, %s reads a field as a string and %f reads a field as a floating point number.

textscan() returns these nested cell arrays because it is designed to operate on multi-line strings: each cell holds one column of the input, with one entry per line.

% This string has two lines:
% asdf,35,4,qwerty,2
% foo,56,32,bar,5
str = sprintf('asdf,35,4,qwerty,2\nfoo,56,32,bar,5');
values = textscan(str, '%s%f%f%s%f', 'delimiter', ',');

values{2}(1)  % returns 35
values{2}(2)  % returns 56
values{2}     % returns [35; 56]

values{1}{1}  % returns 'asdf'
values{1}{2}  % returns 'foo'
values{1}     % returns a 2x1 cell array: { 'asdf'; 'foo' }

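If the input is all numbers, the result can be converted from a cell array to a regular array with cell2mat(). This function requires that the numbers are all the same type, which means the format should use the same specifier throughout, such as %f%f%f or %d%d%d. If the input mixes integers and floating point numbers, read all of the fields as floating point (%f). Empty fields are read as NaN.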

str = '4,3,,56.32';
values = textscan(str, '%f%f%f%f', 'delimiter', ',');
cell2mat(values)
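
The same-type requirement can be checked by looking at the class of each cell. The sketch below is an illustration under the assumption that %d produces int32 values and %f produces double values, which is the documented default behavior of textscan(); the variable names mixed and allFloat are made up for this example.

```matlab
str = '4,3.5';

% Mixing %d and %f gives cells of different classes,
% which cell2mat() is not expected to accept
mixed = textscan(str, '%d%f', 'delimiter', ',');
class(mixed{1})    % int32
class(mixed{2})    % double

% Reading every field as %f gives cells of the same class
allFloat = textscan(str, '%f%f', 'delimiter', ',');
result = cell2mat(allFloat);    % [4 3.5]
```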

For a tab-delimited string, use sprintf() to generate the tab character for the delimiter.

values = textscan(str, '%f%f%f%f', 'delimiter', sprintf('\t'));

Notes

Another way to split a string on delimiters is the strtok() function. However, strtok() skips over leading delimiters each time it is called, so consecutive delimiters are collapsed into one. This may cause problems if you have any empty values.
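
A short sketch of that behavior, using the same input as the first method. The loop and the variable names remainder and tokens are written for this illustration only.

```matlab
str = 'one,two,,four';

remainder = str;
tokens = {};

% Repeatedly take the next token off the front of the remaining text
while ~isempty(remainder)
    [tok, remainder] = strtok(remainder, ',');
    if ~isempty(tok)
        tokens{end+1} = tok;
    end
end

% Only three tokens come back; the empty field
% between 'two' and 'four' has been lost
tokens
```

The first method and textscan() both keep the empty field, so they are the better choice when the position of each value matters.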