% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/substr2.R
\name{substr_ctl}
\alias{substr_ctl}
\alias{substr2_ctl}
\alias{substr_sgr}
\alias{substr2_sgr}
\title{ANSI Control Sequence Aware Version of substr}
\usage{
substr_ctl(x, start, stop, warn = getOption("fansi.warn"),
  term.cap = getOption("fansi.term.cap"), ctl = "all")

substr2_ctl(x, start, stop, type = "chars", round = "start",
  tabs.as.spaces = getOption("fansi.tabs.as.spaces"),
  tab.stops = getOption("fansi.tab.stops"),
  warn = getOption("fansi.warn"),
  term.cap = getOption("fansi.term.cap"), ctl = "all")

substr_sgr(x, start, stop, warn = getOption("fansi.warn"),
  term.cap = getOption("fansi.term.cap"))

substr2_sgr(x, start, stop, type = "chars", round = "start",
  tabs.as.spaces = getOption("fansi.tabs.as.spaces"),
  tab.stops = getOption("fansi.tab.stops"),
  warn = getOption("fansi.warn"),
  term.cap = getOption("fansi.term.cap"))
}
\arguments{
\item{x}{a character vector or object that can be coerced to character.}

\item{start}{integer.  The first element to be extracted.}

\item{stop}{integer.  The last element to be extracted.}

\item{warn}{TRUE (default) or FALSE, whether to warn when potentially
problematic \emph{Control Sequences} are encountered.  These could cause the
assumptions \code{fansi} makes about how strings are rendered on your display
to be incorrect, for example by moving the cursor (see \link{fansi}).}

\item{term.cap}{character, a vector of the capabilities of the terminal; can
be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes
starting with "38;5" or "48;5"), and "truecolor" (SGR codes starting with
"38;2" or "48;2"). Changing this parameter changes how \code{fansi} interprets
escape sequences, so you should ensure that it matches your terminal's
capabilities. See \link{term_cap_test} for details.}

\item{ctl}{character, which \emph{Control Sequences} should be treated
specially. See the "_ctl vs. _sgr" section for details.
\itemize{
\item "nl": newlines.
\item "c0": all other "C0" control characters (i.e. 0x01-0x1F, 0x7F), except
for newlines and the actual ESC (0x1B) character.
\item "sgr": ANSI CSI SGR sequences.
\item "csi": all non-SGR ANSI CSI sequences.
\item "esc": all other escape sequences.
\item "all": all of the above, except when used in combination with any of the
above, in which case it means "all but".
}}

\item{type}{character(1L) partial matching \code{c("chars", "width")}, although
\code{type="width"} only works correctly with R >= 3.2.2.}

\item{round}{character(1L) partial matching
\code{c("start", "stop", "both", "neither")}, controls how to resolve
ambiguities when a \code{start} or \code{stop} value in "width" \code{type} mode falls
within a multi-byte character or a wide display character.  See details.}

\item{tabs.as.spaces}{FALSE (default) or TRUE, whether to convert tabs to
spaces.  This can only be set to TRUE if \code{strip.spaces} is FALSE.}

\item{tab.stops}{integer(1:n) indicating position of tab stops to use
when converting tabs to spaces.  If there are more tabs in a line than
defined tab stops the last tab stop is re-used.  For the purposes of
applying tab stops, each input line is considered a line and the character
count begins from the beginning of the input line.}
}
\description{
\code{substr_ctl} is a drop-in replacement for \code{substr}.  Performance is
slightly slower than \code{substr}.  ANSI CSI SGR sequences will be included in
the substrings to reflect the format of the substring when it was embedded in
the source string.  Additionally, other \emph{Control Sequences} specified in
\code{ctl} are treated as zero-width.
}
\details{
\code{substr2_ctl} and \code{substr2_sgr} add the ability to retrieve substrings based
on display width and byte width, in addition to the normal character width.
Both also provide the option to convert tabs to spaces with
\link{tabs_as_spaces} prior to taking substrings.

Because exact substrings on anything other than character width cannot be
guaranteed (e.g. as a result of multi-byte encodings, or double display-width
characters), \code{substr2_ctl} must make assumptions about how to resolve
\code{start}/\code{stop} values that are infeasible, and does so via the \code{round}
parameter.

If we use "start" as the \code{round} value, then whenever the \code{start}
value falls in the middle of a multi-byte or wide character, that character
is included in the substring, while any character only partially covered by
\code{stop} is left out.  The converse is true if we use "stop" as the
\code{round} value.  "neither" would cause all partial characters to be
dropped irrespective of whether they correspond to \code{start} or
\code{stop}, and "both" would cause all of them to be included.

These functions map string lengths accounting for ANSI CSI SGR sequence
semantics to the naive length calculations, and then use the mapping in
conjunction with \code{\link[base:substr]{base::substr()}} to extract the string.  This concept is
borrowed directly from Gábor Csárdi's \code{crayon} package, although the
implementation of the calculation is different.
}
\note{
Non-ASCII strings are converted to and returned in UTF-8 encoding.
}
\section{_ctl vs. _sgr}{


The \code{*_ctl} versions of the functions treat all \emph{Control Sequences} specially
by default.  Special treatment is context dependent, and may include
detecting them and/or computing their display/character width as zero.  For
the SGR subset of the ANSI CSI sequences, \code{fansi} will also parse, interpret,
and reapply the text styles they encode if needed.  You can modify whether a
\emph{Control Sequence} is treated specially with the \code{ctl} parameter.  You can
exclude a type of \emph{Control Sequence} from special treatment by combining
"all" with that type of sequence (e.g. \code{ctl=c("all", "nl")} for special
treatment of all \emph{Control Sequences} \strong{but} newlines).  The \code{*_sgr} versions
only treat ANSI CSI SGR sequences specially, and are equivalent to the
\code{*_ctl} versions with the \code{ctl} parameter set to "sgr".
}

\examples{
substr_ctl("\\033[42mhello\\033[m world", 1, 9)
substr_ctl("\\033[42mhello\\033[m world", 3, 9)
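
## Illustrative addition, not part of the original examples:
## `substr_ctl` is described as a drop-in replacement for
## `substr`, so for input without any Control Sequences we
## would expect the two calls below to agree.  `plain.string`
## is a name introduced here for illustration only.

plain.string <- "hello world"
substr_ctl(plain.string, 3, 9)
substr(plain.string, 3, 9)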

## Width 2 and 3 are in the middle of an ideogram as
## start and stop positions respectively, so we control
## what we get with `round`

cn.string <- paste0("\\033[42m", "\\u4E00\\u4E01\\u4E03", "\\033[m")

substr2_ctl(cn.string, 2, 3, type='width')
substr2_ctl(cn.string, 2, 3, type='width', round='both')
substr2_ctl(cn.string, 2, 3, type='width', round='start')
substr2_ctl(cn.string, 2, 3, type='width', round='stop')
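
## Illustrative addition: "neither" is the remaining `round`
## value; per the details section it should drop partial
## characters at both the start and stop positions

substr2_ctl(cn.string, 2, 3, type='width', round='neither')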

## the _sgr variety only treats CSI SGR as special;
## compare the following:

substr_sgr("\\033[31mhello\\tworld", 1, 6)
substr_ctl("\\033[31mhello\\tworld", 1, 6)
substr_ctl("\\033[31mhello\\tworld", 1, 6, ctl=c('all', 'c0'))
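
## Illustrative additions, not part of the original examples.
## Per the "_ctl vs. _sgr" section, the _sgr functions should
## match the _ctl ones with ctl='sgr', so we would expect the
## comparison below to hold.  `sgr.string` is a name
## introduced here for illustration only.

sgr.string <- "\\033[31mhello\\tworld"
identical(
  substr_sgr(sgr.string, 1, 6),
  substr_ctl(sgr.string, 1, 6, ctl='sgr')
)

## The substr2_* functions can convert tabs to spaces before
## taking the substring, using the default tab stops

substr2_ctl(sgr.string, 1, 8, tabs.as.spaces=TRUE)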
}
\seealso{
\link{fansi} for details on how \emph{Control Sequences} are
interpreted, particularly if you are getting unexpected results.
}
