How to tokenize special characters depending on whitespace (< > | & etc.)

I found a project from a few years ago (http://web.archive.org/web/20121030075237/http://bbgen.net/blog/2011/06/string-to-argc-argv) that does some simple command line parsing. While I really like its functionality, it does not support parsing special characters such as <, >, &, etc. I attempted to add support for these characters by adding conditions modeled on the ones the existing code uses to detect whitespace, escape characters, and quotes:

bool _isQuote(char c) {
    if (c == '\"')
        return true;
    else if (c == '\'')
        return true;

    return false;
}

bool _isEscape(char c) {
    if (c == '\\')    // a backslash must itself be escaped in a char literal
        return true;

    return false;
}

bool _isWhitespace(char c) {
    if (c == ' ')
        return true;
    else if (c == '\t')
        return true;

    return false;
}
. . .

What I added:

bool _isLeftCarrot(char c) {
    if (c == '<')
        return true;

    return false;
}

bool _isRightCarrot(char c) {
    if (c == '>')
        return true;

    return false;
}

and so on for the rest of the special characters.
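
Since each of these helpers tests for exactly one character, they could also be folded into a single predicate. The following is only a minimal sketch: the name isSpecialChar is hypothetical (it is not part of the original project), and the set of special characters is assumed to be <, >, | and &.

```cpp
#include <string>

// Hypothetical helper (not from the original project): returns true if c is
// one of the special characters that should become a token of its own.
// The set "<>|&" is an assumption; extend it as needed.
bool isSpecialChar(char c) {
    static const std::string specials = "<>|&";
    return specials.find(c) != std::string::npos;
}
```

With a predicate like this, the parse method would need only one extra branch instead of one per character.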

I also tried the same approach as the existing code in the parse method:

std::list<std::string> parse(const std::string& args) {

std::stringstream ain(args);            // iterates over the input string
ain >> std::noskipws;                   // ensures not to skip whitespace
std::list<std::string> oargs;           // list of strings where we will store the tokens

std::stringstream currentArg("");
currentArg >> std::noskipws;

// current state
enum State {
        InArg,          // scanning the string currently
        InArgQuote,     // scanning the string that started with a quote currently 
        OutOfArg        // not scanning the string currently
};
State currentState = OutOfArg;

char currentQuoteChar = '\0';   // used to differentiate between ' and "
                                // ex. "sample'text" 

char c;
std::stringstream ss;
std::string s;
// iterate character by character through input string
while(!ain.eof() && (ain >> c)) {

        // if current character is a quote
        if(_isQuote(c)) {
                switch(currentState) {
                        case OutOfArg:
                                currentArg.str(std::string());
                        case InArg:
                                currentState = InArgQuote;
                                currentQuoteChar = c;
                                break;
                        case InArgQuote:
                                if (c == currentQuoteChar)
                                        currentState = InArg;
                                else
                                        currentArg << c;
                                break;
                }
        }
        // if current character is whitespace
        else if (_isWhitespace(c)) {
                switch(currentState) {
                        case InArg:
                                oargs.push_back(currentArg.str());
                                currentState = OutOfArg;
                                break;
                        case InArgQuote:
                                currentArg << c;
                                break;
                        case OutOfArg:
                                // nothing
                                break;
                }
        }
        // if current character is escape character
        else if (_isEscape(c)) {
                switch(currentState) {
                        case OutOfArg:
                                currentArg.str(std::string());
                                currentState = InArg;
                        case InArg:
                        case InArgQuote:
                                if (ain.eof())
                                {
                                        currentArg << c;
                                        throw(std::runtime_error("Found Escape Character at end of file."));
                                }
                                else {
                                        char c1 = c;
                                        ain >> c;
                                        if (c != '\"')
                                                currentArg << c1;
                                        ain.unget();
                                        ain >> c;
                                        currentArg << c;
                                }
                                break;
                }
        }

What I added in the parse method:

            // if current character is left carrot (<)
            else if(_isLeftCarrot(c)) {
                    // convert from char to string and push onto list
                    ss << c;
                    ss >> s;
                    oargs.push_back(s);
            }
            // if current character is right carrot (>)
            else if(_isRightCarrot(c)) {
                    ss << c;
                    ss >> s;
                    oargs.push_back(s);
            }
.
.
.
            else {
                    switch(currentState) {
                            case InArg:
                            case InArgQuote:
                                    currentArg << c;
                                    break;
                            case OutOfArg:
                                    currentArg.str(std::string());
                                    currentArg << c;
                                    currentState = InArg;
                                    break;
                    }
            }
    }

if (currentState == InArg) {
        oargs.push_back(currentArg.str());
        s.clear();
}
else if (currentState == InArgQuote)
        throw(std::runtime_error("Starting quote has no ending quote."));

return oargs;

}

parse returns the tokens as a list of strings.

However, I am running into issues with a specific test case when the special character is attached to the end of the input. For example, the input

foo-bar&

will return this list: [{&},{foo-bar}] instead of what I want: [{foo-bar},{&}]

I'm struggling to fix this issue. I am new to C++, so any advice along with some explanation would be a great help.
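
The reordering appears to come from the special-character branches: they push their token onto oargs immediately, while "foo-bar" is still sitting in currentArg and is only pushed after the loop ends. One possible fix is to push the pending argument first and only then the special character. The sketch below shows that idea in a deliberately stripped-down tokenizer. It is not the original parse: quotes and escapes are left out, and the names tokenize and isShellSpecial are hypothetical. It also builds the one-character token with std::string(1, c) rather than going through the shared ss/s pair, because after the first ss >> s the stream has reached end of input and later insertions into it fail.

```cpp
#include <list>
#include <string>

// Hypothetical helper; the set of special characters is an assumption.
static bool isShellSpecial(char c) {
    return c == '<' || c == '>' || c == '|' || c == '&';
}

// Simplified tokenizer sketch: splits on spaces/tabs and turns each
// special character into its own token. Quotes and escapes are NOT handled.
std::list<std::string> tokenize(const std::string& input) {
    std::list<std::string> tokens;
    std::string current;              // argument being built, if any

    // Push the pending argument (if non-empty) and reset it.
    auto flush = [&]() {
        if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    };

    for (char c : input) {
        if (c == ' ' || c == '\t') {
            flush();                              // whitespace ends an argument
        } else if (isShellSpecial(c)) {
            flush();                              // emit "foo-bar" first ...
            tokens.push_back(std::string(1, c));  // ... then "&" as its own token
        } else {
            current += c;
        }
    }
    flush();                                      // argument still pending at end of input
    return tokens;
}
```

In the original parse, the equivalent change would presumably be for each special-character branch to check currentState first: if it is InArg, push currentArg.str() and set currentState to OutOfArg before pushing the special character. If the state is InArgQuote, appending the character to currentArg instead is probably what is wanted, although that is an assumption about the intended behavior.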

c++
