How to efficiently get a `string_view` for a substring of `std::string`

sehe · Sep 4, 2017 · Viewed 9.5k times

Using http://en.cppreference.com/w/cpp/string/basic_string_view as a reference, I see no way to do this more elegantly than:

std::string s = "hello world!";
std::string_view v = s;
v = v.substr(6, 5); // "world"

Worse, the naive approach is a pitfall and leaves v a dangling reference to a temporary:

std::string s = "hello world!";
std::string_view v(s.substr(6, 5)); // OOPS!

I seem to remember a possible addition to the standard library that would return a substring as a view, something like:

auto v(s.substr_view(6, 5));

I can think of the following workarounds:

std::string_view(s).substr(6, 5);
std::string_view(s.data()+6, 5);
// or even "worse":
std::string_view w(s); w.remove_prefix(6); w.remove_suffix(1); // both return void, so they cannot be chained

Frankly, I don't think any of these are very nice. Right now the best thing I can think of is using aliases to simply make things less verbose.

using sv = std::string_view;
sv(s).substr(6, 5);
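
For what it's worth, the workarounds above all produce the same view; they differ mostly in how much checking they do and in whether they fit in one expression. A minimal sketch, assuming C++17; the function name `workarounds_agree` is made up for illustration and is not from any library:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical check (illustrative name): builds a 5-character view starting
// at offset 6 in three ways and reports whether they alias the same bytes of s.
inline bool workarounds_agree(std::string const& s)
{
    assert(s.size() >= 11 && "sketch assumes a string at least as long as \"hello world\"");

    // 1. substr on a view: throws std::out_of_range if pos > size(), clamps the count.
    std::string_view a = std::string_view(s).substr(6, 5);

    // 2. pointer + length: no checking at all, the caller must get the arithmetic right.
    std::string_view b(s.data() + 6, 5);

    // 3. remove_prefix/remove_suffix modify the view in place and return void,
    //    so they need a named view and separate statements.
    std::string_view c(s);
    c.remove_prefix(6);
    c.remove_suffix(c.size() - 5);

    return a == b && b == c && a.data() == s.data() + 6 && c.data() == a.data();
}
```

None of the three copies any characters, so the choice is about readability and bounds checking rather than speed.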

Answer

Richard Hodges · Sep 4, 2017

There's the free-function route, but unless you also provide overloads for std::string it's a snake-pit.

#include <string>
#include <string_view>

std::string_view sub_string(
  std::string_view s, 
  std::size_t p, 
  std::size_t n = std::string_view::npos)
{
  return s.substr(p, n);
}

int main()
{
  using namespace std::literals;

  auto source = "foobar"s;

  // this is fine and elegant...
  auto bar = sub_string(source, 3);

  // but uh-oh: the temporary std::string is destroyed at the end of this statement, so bar dangles
  bar = sub_string("foobar"s, 3);
}

IMHO the whole design of string_view is a horror show which will take us back to a world of segfaults and angry customers.

Update

Even adding overloads for std::string is a horror show. See if you can spot the subtle segfault timebomb...

#include <string>
#include <string_view>

std::string_view sub_string(std::string_view s, 
  std::size_t p, 
  std::size_t n = std::string_view::npos)
{
  return s.substr(p, n);
}

std::string sub_string(std::string&& s, 
  std::size_t p, 
  std::size_t n = std::string::npos)
{
  return s.substr(p, n);
}

std::string sub_string(std::string const& s, 
  std::size_t p, 
  std::size_t n = std::string::npos)
{
  return s.substr(p, n);
}

int main()
{
  using namespace std::literals;

  auto source = "foobar"s;
  auto bar = sub_string(std::string_view(source), 3);

  // but uh-oh...
  bar = sub_string("foobar"s, 3);
}

The compiler found nothing to warn about here. I am certain that a code review would not either.
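
To make the timebomb easier to see, here is a sketch that only inspects types, so nothing in it ever touches a dangling view. It repeats the three overloads and uses `static_assert` to show which one a temporary picks and why the assignment to `bar` still compiles. It assumes C++17; the helper `view_aliases_source` is a made-up name for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// The three overloads from the example above, with the same behaviour.
inline std::string_view sub_string(std::string_view s, std::size_t p,
                                   std::size_t n = std::string_view::npos)
{ return s.substr(p, n); }

inline std::string sub_string(std::string&& s, std::size_t p,
                              std::size_t n = std::string::npos)
{ return s.substr(p, n); }

inline std::string sub_string(std::string const& s, std::size_t p,
                              std::size_t n = std::string::npos)
{ return s.substr(p, n); }

// A temporary std::string selects the && overload, which returns an owning std::string...
static_assert(std::is_same_v<
    decltype(sub_string(std::declval<std::string>(), std::size_t{3})), std::string>);

// ...and a std::string temporary can be assigned to an existing std::string_view
// through the implicit conversion. The view then outlives the string it points into.
static_assert(std::is_assignable_v<std::string_view&, std::string>);

// Hypothetical helper: the safe half of the example, where the view
// aliases a named string that outlives it.
inline bool view_aliases_source(std::string const& source)
{
    assert(source.size() >= 3);
    std::string_view bar = sub_string(std::string_view(source), 3);
    return bar.data() == source.data() + 3;
}
```

So `bar` was deduced as a `std::string_view` on the first call, and the second call quietly hands it a `std::string` that is gone by the end of the statement.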

I've said it before and I'll say it again, in case anyone on the C++ committee is watching: allowing implicit conversions from std::string to std::string_view is a terrible error which will only serve to bring C++ into disrepute.

Update

Having raised this (to me) rather alarming property of string_view on the cpporg message board, my concerns have been met with indifference.

The consensus of advice from this group is that std::string_view must never be returned from a function, which means that my first offering above is bad form.

There is of course no compiler help to catch times when this happens by accident (for example through template expansion).

As a result, std::string_view should be used with the utmost care, because from a memory management point of view it is equivalent to a copyable pointer pointing into the state of another object, which may no longer exist. However, it looks and behaves in all other respects like a value type.
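
A small sketch of that "pointer that looks like a value" behaviour, assuming C++17; the function name `copy_still_points_into` is invented for the example:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical demonstration: a view is copied like a value but owns nothing.
inline bool copy_still_points_into(std::string const& owner)
{
    std::string_view first(owner);
    std::string_view second = first;        // copies a pointer and a length only

    assert(second == first);                // compares equal like a value type...
    return second.data() == owner.data()    // ...yet both refer to owner's buffer
        && second.size() == owner.size();
}
```

Both views stay valid only for as long as `owner` is alive and its buffer is not reallocated.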

Thus code like this:

auto s = get_something().get_suffix();

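is safe when get_suffix() returns a std::string (either by value or by reference), because auto deduces std::string and makes a copy, but is UB as soon as s is read if get_something() returns a temporary and get_suffix() is ever refactored to return a std::string_view into it.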

In my humble view, this means that any user code that stores returned strings using auto will break if the libraries it calls are ever refactored to return std::string_view in place of std::string const&.

So from now on, at least for me, "almost always auto" will have to become "almost always auto, except when it's strings".
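
A sketch of what "except when it's strings" could look like in practice. The two structs are hypothetical stand-ins for a library type before and after such a refactoring, and `owned_suffix` and `check_owned_suffix` are made-up names; C++17 is assumed. The `static_assert`s show what `auto` would deduce in each case without ever creating a dangling view:

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Hypothetical library type before the refactoring: hands out a reference.
struct before_refactor {
    std::string suffix = "world!";
    std::string const& get_suffix() const { return suffix; }
};

// The same type after the refactoring: hands out a view instead.
struct after_refactor {
    std::string suffix = "world!";
    std::string_view get_suffix() const { return suffix; }
};

// `auto s = expr;` deduces the decayed type of expr.
template <class T>
using auto_deduced_t = std::decay_t<decltype(std::declval<T>().get_suffix())>;

static_assert(std::is_same_v<auto_deduced_t<before_refactor>, std::string>);      // owning copy
static_assert(std::is_same_v<auto_deduced_t<after_refactor>, std::string_view>);  // non-owning view

// Naming std::string forces a copy while the temporary Source{} is still alive,
// whichever of the two return types the library uses.
template <class Source>
std::string owned_suffix()
{
    std::string s(Source{}.get_suffix());
    return s;
}

// Usage: the caller gets the same owning string before and after the refactoring.
inline void check_owned_suffix()
{
    assert(owned_suffix<before_refactor>() == "world!");
    assert(owned_suffix<after_refactor>() == "world!");
}
```

Spelling out `std::string` costs a copy, but the caller's code keeps working whichever way the library goes.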