How to tokenize special characters depending on whitespace (< > | & etc.)

I found a project from a few years ago (http://web.archive.org/web/20121030075237/http://bbgen.net/blog/2011/06/string-to-argc-argv) that does some simple command-line parsing. While I really like its functionality, it does not support parsing special characters such as <, >, and &. I attempted to add support for these characters by adding the same kind of conditions that the existing code uses to detect whitespace, escape characters, and quotes:

bool _isQuote(char c) {
    if (c == '\"')
        return true;
    else if (c == '\'')
        return true;

    return false;
}

bool _isEscape(char c) {
    if (c == '\\')
        return true;

    return false;
}

bool _isWhitespace(char c) {
    if (c == ' ')
        return true;
    else if (c == '\t')
        return true;

    return false;
}
// ...

What I added:

bool _isLeftCarrot(char c) {
    if (c == '<')
        return true;

    return false;
}

bool _isRightCarrot(char c) {
    if (c == '>')
        return true;

    return false;
}

and so on for the rest of the special characters.

I also tried the same approach as the existing code in the parse method:

std::list<std::string> parse(const std::string& args) {

std::stringstream ain(args);            // iterates over the input string
ain >> std::noskipws;                   // ensures not to skip whitespace
std::list<std::string> oargs;           // list of strings where we will store the tokens

std::stringstream currentArg("");
currentArg >> std::noskipws;

// current state
enum State {
        InArg,          // scanning the string currently
        InArgQuote,     // scanning the string that started with a quote currently 
        OutOfArg        // not scanning the string currently
};
State currentState = OutOfArg;

char currentQuoteChar = '\0';   // used to differentiate between ' and "
                                // ex. "sample'text" 

char c;
std::stringstream ss;
std::string s;
// iterate character by character through input string
while(!ain.eof() && (ain >> c)) {

        // if current character is a quote
        if(_isQuote(c)) {
                switch(currentState) {
                        case OutOfArg:
                                currentArg.str(std::string());
                        case InArg:
                                currentState = InArgQuote;
                                currentQuoteChar = c;
                                break;
                        case InArgQuote:
                                if (c == currentQuoteChar)
                                        currentState = InArg;
                                else
                                        currentArg << c;
                                break;
                }
        }
        // if current character is whitespace
        else if (_isWhitespace(c)) {
                    switch(currentState) {
                        case InArg:
                                oargs.push_back(currentArg.str());
                                currentState = OutOfArg;
                                break;
                        case InArgQuote:
                                currentArg << c;
                                break;
                        case OutOfArg:
                                // nothing
                                break;
                }
        }
        // if current character is escape character
        else if (_isEscape(c)) {
                switch(currentState) {
                        case OutOfArg:
                                currentArg.str(std::string());
                                currentState = InArg;
                        case InArg:
                        case InArgQuote:
                                // eof() only becomes true after a read fails, so peek
                                // to detect an escape character at the end of the input
                                if (ain.peek() == std::char_traits<char>::eof())
                                {
                                        currentArg << c;
                                        throw(std::runtime_error("Found Escape Character at end of file."));
                                }
                                else {
                                        char c1 = c;
                                        ain >> c;
                                        if (c != '\"')
                                                currentArg << c1;
                                        ain.unget();
                                        ain >> c;
                                        currentArg << c;
                                }
                                break;
                }
        }

What I added in the parse method:

        // if current character is left angle bracket (<)
        else if(_isLeftCarrot(c)) {
                // convert from char to string and push onto list
                ss << c;
                ss >> s;
                oargs.push_back(s);
        }
        // if current character is right angle bracket (>)
        else if(_isRightCarrot(c)) {
                ss << c;
                ss >> s;
                oargs.push_back(s);
        }
        // ... (and so on for the other special characters)
        else {
                switch(currentState) {
                        case InArg:
                        case InArgQuote:
                                currentArg << c;
                                break;
                        case OutOfArg:
                                currentArg.str(std::string());
                                currentArg << c;
                                currentState = InArg;
                                break;
                }
        }
}

if (currentState == InArg) {
        oargs.push_back(currentArg.str());
        s.clear();
}
else if (currentState == InArgQuote)
        throw(std::runtime_error("Starting quote has no ending quote."));

return oargs;

}

parse returns the tokens as a list of strings.

However, I am running into an issue when a special character is attached to the end of the input. For example, the input

foo-bar&

will return this list: [{&},{foo-bar}] instead of what I want: [{foo-bar},{&}]

I'm struggling to fix this issue. I am new to C++, so any advice along with some explanation would be a great help.
