Release Notes
v2.5.0 series come with new and improved sequence, structure, and dynamics analysis features. See release notes for details.
How to Cite
Bakan A, Meireles LM, Bahar I ProDy: Protein Dynamics Inferred from Theory and Experiments
Bioinformatics 2011 27(11):1575-1577.
Bakan A, Dutta A, Mao W, Liu Y, Chennubhotla C, Lezon TR, Bahar I Evol and ProDy for Bridging Protein Sequence Evolution and Structural Dynamics
Bioinformatics 2014 30(18):2681-2683.
Zhang S, Krieger JM, Zhang Y, Kaya C, Kaynak B, Mikulska-Ruminska K, Doruker P, Li H, Bahar I ProDy 2.0: Increased scale and scope after 10 years of protein dynamics modelling with Python
Bioinformatics 2021 37(20):3657-3659.
Source code for prody.atomic.select

# -*- coding: utf-8 -*-
"""This module defines a class for selecting subsets of atoms.  You can read
this page in interactive sessions using ``help(select)``.

ProDy offers a fast and powerful atom selection class, :class:`.Select`.
Selection features, grammar, and keywords are similar to those of VMD.
Small differences, that is described below, should not affect most practical
uses of atom selections.  With added flexibility of Python, ProDy selection
engine can also be used to identify intermolecular contacts.  You may see
this and other usage examples in :ref:`contacts` and
:ref:`selection-operations`.

First, we import everything from ProDy and parse a protein-DNA-ligand
complex structure:

.. ipython:: python

   from prody import *
   p = parsePDB('3mht')

:func:`.parsePDB` returns :class:`.AtomGroup` instances, ``p`` in this case,
that stores all atomic data in the file.  We can count different types of
atoms using :ref:`flags` and :meth:`~.AtomGroup.numAtoms` method as follows:

.. ipython:: python

   p.numAtoms('protein')
   p.numAtoms('nucleic')
   p.numAtoms('hetero')
   p.numAtoms('water')

Last two counts suggest that ligand has 26 atoms, i.e. number of :term:`hetero`
atoms less the number of :term:`water` atoms.


Atom flags
-------------------------------------------------------------------------------

We select subset of atoms by using :meth:`.AtomGroup.select` method.
All :ref:`flags` can be input arguments to this methods as follows:

.. ipython:: python

   p.select('protein')
   p.select('water')

This operation returns :class:`.Selection` instances, which can be an input
to functions that accepts an *atoms* argument.


Logical operators
-------------------------------------------------------------------------------

Flags can be combined using ``'and'`` and ``'or'`` operators:

.. ipython:: python

   p.select('protein and water')

``'protein and water'`` did not result in selection of :term:`protein` and
:term:`water` atoms.  This is because, no atom is flagged as a protein and a
water atom at the same time.

.. note::

   **Interpreting selection strings**

   You may think as if a selection string, such as ``'protein and water'``, is
   evaluated on a per atom basis and an atom is selected if it satisfies the
   given criterion.  To select both water and protein atoms, ``'or'`` logical
   operator should be used instead.  A protein or a water atom would satisfy
   ``'protein or water'`` criterion.

.. ipython:: python

   p.select('protein or water')

We can also use ``'not'`` operator to negate an atom flag.  For example,
the following selection will only select ligand atoms:

.. ipython:: python

   p.select('not water and hetero')

If you omit the ``'and'`` operator, you will get the same result:

.. ipython:: python

   p.select('not water hetero')

.. note::

   **Default operator** between two flags, or other selection tokens that will
   be discussed later, is ``'and'``.  For example, ``'not water hetero'``
   is equivalent to ``'not water and hetero'``.

We can select Cα atoms of acidic residues by omitting the default logical
operator as follows:

.. ipython:: python

   sel = p.select('acidic calpha')
   sel
   set(sel.getResnames())


Quick selections
-------------------------------------------------------------------------------

For simple selections, such as shown above, following may be preferable over
the :meth:`~.AtomGroup.select` method:

.. ipython:: python

   p.acidic_calpha

The result is the same as using ``p.select('acidic calpha')``.  Underscore,
``_``, is considered as a whitespace.  The limitation of this approach is that
special characters cannot be used.


Atom data fields
-------------------------------------------------------------------------------

In addition to :ref:`flags`, :ref:`fields` can be used in atom selections
when combined with some values.  For example, we can select Cα and Cβ atoms
of alanine residues as follows:

.. ipython:: python

   p.select('resname ALA name CA CB')

Note that we omitted the default ``'and'`` operator.

.. note::

   **Whitespace** or **empty string** can be specified using an ``'_'``.
   Atoms with string data fields empty, such as those with no a chain
   identifiers or alternate location identifiers, can be selected using
   an underscore.


.. ipython:: python

   p.select('chain _')  # chain identifiers of all atoms are specified in 3mht
   p.select('altloc _')  # altloc identifiers for all atoms are empty

Numeric data fields can also be used to make selections:

.. ipython:: python

   p.select('ca resnum 1 2 3 4')


A special case for residues is having insertion codes.  Residue numbers and
insertion codes can be specified together as follows:

  * ``'resnum 5'`` selects residue 5 (all insertion codes)
  * ``'resnum 5A'`` selects residue 5 with insertion code A
  * ``'resnum 5_'`` selects residue 5 with no insertion code


Number ranges
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A range of numbers using ``'to'`` or Python style slicing with ``':'``:

.. ipython:: python

   p.select('ca resnum 1to4')
   p.select('ca resnum 1:4')
   p.select('ca resnum 1:4:2')

.. note::

   **Number ranges** specify continuous intervals:

     * ``'to'`` is all inclusive, e.g. ``'resnum 1 to 4'`` means
       ``'1 <= resnum <= 4'``

     * ``':'`` is left inclusive, e.g. ``'resnum 1:4'`` means
       ``'1 <= resnum < 4'``

   Consecutive use of ``':'``, however, specifies a discrete range of numbers,
   e.g. ``'resnum 1:4:2'`` means ``'resnum 1 3'``


Special characters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Following characters can be specified when using :ref:`fields` for atom
selections::

  abcdefghijklmnopqrstuvwxyz
  ABCDEFGHIJKLMNOPQRSTUVWXYZ
  0123456789
  ~@#$.:;_',

For example, ``"name C' N` O~ C$ C#"`` is a valid selection string.

.. note::

   **Special characters** (``~!@#$%^&*()-_=+[{}]\|;:,<>./?()'"``) must be
   escaped using grave accent characters (``````).


Negative numbers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Negative numbers and number ranges must also be escaped using grave accent
characters, since negative sign ``'-'`` is considered a special character
unless it indicates subtraction operation (see below).

.. ipython:: python

   p.select('x `-25 to 25`')
   p.select('x `-22.542`')


Omitting the grave accent character will cause a :exc:`.SelectionError`.


Regular expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, you can specify regular expressions to select atoms based on
data fields with type string.  Following will select residues whose names
start with capital letter A

.. ipython:: python

   sel = p.select('resname "A.*"')
   set(sel.getResnames())


.. note::

   **Regular expressions** can be specified using double quotes, ``"..."``.
   For more information on regular expressions see :mod:`re`.


Numerical comparisons
-------------------------------------------------------------------------------

:ref:`fields` with numeric types can be used as operands in numerical
comparisons:

.. ipython:: python

   p.select('x < 0')
   p.select('occupancy = 1')

==========  =================================
Comparison  Description
==========  =================================
   <        less than
   >        greater than
   <=       less than or equal
   >=       greater than or equal
   ==       equal
   =        equal
   !=       not equal
==========  =================================

It is also possible to chain comparison statements as follows:

.. ipython:: python

   p.select('-10 <= x < 0')

This would be the same as the following selection:

.. ipython:: python

   p.select('-10 <= x and x < 0') == p.select('-10 <= x < 0')

Furthermore, numerical comparisons may involve the following operations:

=========  ==================================
Operation  Description
=========  ==================================
x ** y     x to the power y
x ^ y      x to the power y
x * y      x times y
x / y      x divided by y
x // y     x divided by y (floor division)
x % y      x modulo y
x + y      x plus y
x - y      x minus y
=========  ==================================

These operations must be used with a numerical comparison, e.g.

.. ipython:: python

   p.select('x ** 2 < 10')
   p.select('x ** 2 ** 2 < 10')

Finally, following functions can be used in numerical comparisons:

========  ===================================
Function  Description
========  ===================================
abs(x)    absolute value of x
acos(x)   arccos of x
asin(x)   arcsin of x
atan(x)   arctan of x
ceil(x)   smallest integer not less than x
cos(x)    cosine of x
cosh(x)   hyperbolic cosine of x
floor(x)  largest integer not greater than x
exp(x)    e to the power x
log(x)    natural logarithm of x
log10(x)  base 10 logarithm of x
sin(x)    sine of x
sinh(x)   hyperbolic sine of x
sq(x)     square of x
sqrt(x)   square-root of x
tan(x)    tangent of x
tanh(x)   hyperbolic tangent of x
========  ===================================

.. ipython:: python

   p.select('sqrt(sq(x) + sq(y) + sq(z)) < 100')  # within 100 A of origin


Distance based selections
-------------------------------------------------------------------------------

Atoms within a user specified distance (A) from a set of user specified atoms
can be selected using ``'within . of .'`` keyword, e.g. ``'within 5 of water'``
selects atoms that are within 5 A of water molecules. This setting will
results selecting water atoms as well.

User can avoid selecting specified atoms using ``exwithin . of ..`` setting,
e.g. ``'exwithin 5 of water'`` will not select water molecules and is
equivalent to ``'within 5 of water and not water'``

.. ipython:: python

   p.select('exwithin 5 of water') == p.select('not water within 5 of water')


Sequence selections
-------------------------------------------------------------------------------

One-letter amino acid sequences can be used to make atom selections.
``'sequence SAR'`` will select **SER-ALA-ARG** residues in a chain.  Note
that the selection does not consider connectivity within a chain.  Regular
expressions can also be used to make selections: ``'sequence "MI.*KQ"'`` will
select **MET-ILE-(XXX)n-ASP-LYS-GLN** pattern, if present.

.. ipython:: python

   sel = p.select('ca sequence "MI.*DKQ"')
   sel
   sel.getResnames()


Expanding selections
-------------------------------------------------------------------------------

A selection can be expanded to include the atoms in the same *residue*,
*chain*, or *segment* using ``same .. as ..`` setting, e.g.
``'same residue as exwithin 4 of water'`` will select residues that have
at least an atom within 4 A of any water molecule.

.. ipython:: python

   p.select('same residue as exwithin 4 of water')

Additionally, a selection may be expanded to the immediately bonded atoms using
``bonded [n] to ...`` setting, e.g. ``bonded 1 to calpha`` will select atoms
bonded to Cα atoms.  For this setting to work, bonds must be set by the user
using the :meth:`.AtomGroup.setBonds` or :meth:`.AtomGroup.inferBonds` method.  
It is also possible to select bonded atoms by excluding the originating atoms 
using ``exbonded [n] to ...`` setting.  Number ``'[n]'`` indicates number of 
bonds to consider from the originating selection.


Selection macros
-------------------------------------------------------------------------------

ProDy allows you to define a macro for any valid selection string.  Below
functions are for manipulating selection macros:

  * :func:`defSelectionMacro`
  * :func:`delSelectionMacro`
  * :func:`getSelectionMacro`
  * :func:`isSelectionMacro`

.. ipython:: python

   defSelectionMacro('alanine', 'resname ALA')
   p.select('alanine') == p.select('resname ALA')

You can also use this macro as follows:

.. ipython:: python

   p.alanine

Macros are stored in ProDy configuration file permanently.  You can delete
them if you wish as follows:

.. ipython:: python

   delSelectionMacro('alanine')


Keyword arguments
-------------------------------------------------------------------------------

:meth:`~.Select.select` method also accepts keyword arguments that can simplify
some selections.  Consider the following case where you want to select some
protein atoms that are close to its center:

.. ipython:: python

   protein = p.protein
   calcCenter(protein).round(2)
   sel1 = protein.select('sqrt(sq(x--21.17) + sq(y-35.86) + sq(z-79.97)) < 5')
   sel1

Instead, you could pass a keyword argument and use the keyword in the
selection string:

.. ipython:: python

   sel2 = protein.select('within 5 of center', center=calcCenter(protein))
   sel2
   sel1 == sel2

Note that selection string for *sel2* lists indices of atoms.  This
substitution is performed automatically to ensure reproducibility of the
selection without the keyword *center*.

Keywords cannot be reserved words (see :func:`.listReservedWords`) and must be
all alphanumeric characters."""

import sys
from re import compile as re_compile
try:
   # for python>=3.3
   from collections.abc import Iterable
except ImportError:
   # for python<3.3
   from collections import Iterable

import numpy as np
from numpy import array, ndarray, ones, zeros, arange
from numpy import invert, unique, concatenate, all, any
from numpy import logical_and, logical_or, floor, ceil, where

import pyparsing as pp
try:
    from pyparsing import operatorPrecedence as operatorPrecedence
except ImportError:
    from pyparsing import infixNotation as operatorPrecedence

from prody import LOGGER, SETTINGS, PY2K

from .atomic import Atomic
from .fields import ATOMIC_FIELDS
from .flags import PLANTERS as FLAG_PLANTERS

from .atomgroup import AtomGroup
from .chain import Chain, getSequence
from .atomic import AAMAP
from .pointer import AtomPointer
from .selection import Selection
from .segment import Segment
from .atommap import AtomMap

from prody.utilities import rangeString
from prody.kdtree import KDTree

if PY2K:
    range = xrange

DEBUG = 0
NUMB = 0 # Select instance will not really evaluate string for the atoms
TIMER = 0

def debug(sel, loc, *args):

    if DEBUG:
        print('')
        if args:
            print(args[0], args[1:])
        print(repr(sel))
        print(' ' * (loc + 1) + '^')

__all__ = ['Select', 'SelectionError', 'SelectionWarning',
           'defSelectionMacro', 'delSelectionMacro', 'getSelectionMacro',
           'isSelectionMacro']

ATOMGROUP = None

MACROS = SETTINGS.get('selection_macros', {})
MACROS_REGEX = None


[docs]def isSelectionMacro(word):
    """Returns **True** if *word* is a user defined selection macro."""

    try:
        return word in MACROS
    except:
        return False


[docs]def defSelectionMacro(name, selstr):
    """Define selection macro *selstr* with name *name*.  Both *name* and
    *selstr* must be string.  An existing keyword cannot be used as a macro
    name. If a macro with given *name* exists, it will be overwritten.

    .. ipython:: python

       defSelectionMacro('cbeta', 'name CB and protein')"""

    if not isinstance(name, str) or not isinstance(selstr, str):
        raise TypeError('both name and selstr must be strings')
    elif isReserved(name):
        raise ValueError('{0} is a reserved word and cannot be used as a '
                         'macro name'.format(repr(name)))
    elif not (name.isalpha() and name.islower()):
        raise ValueError('macro names must be all lower case letters, {0} '
                         'is not a valid macro name'.format(repr(name)))

    LOGGER.info('Testing validity of selection string.')
    try:
        ATOMGROUP.select(selstr)
    except SelectionError:
        LOGGER.warn('{0} is not a valid selection string, macro {1} is not'
                    'defined.'.format(repr(selstr), repr(name)))
    else:
        LOGGER.info("Macro {0} is defined as {1}."
                    .format(repr(name), repr(selstr)))
        MACROS[name] = selstr
        SETTINGS['selection_macros'] = MACROS
        SETTINGS.save()


[docs]def delSelectionMacro(name):
    """Delete the macro *name*.

    .. ipython:: python

       delSelectionMacro('cbeta')"""

    try:
        MACROS.pop(name)
    except:
        LOGGER.warn("Macro {0} is not found.".format(repr(name)))
    else:
        if MACROS_REGEX is not None: MACROS_REGEX.pop(name, None)
        LOGGER.info("Macro {0} is deleted.".format(repr(name)))
        SETTINGS['selection_macros'] = MACROS
        SETTINGS.save()


[docs]def getSelectionMacro(name=None):
    """Returns the definition of the macro *name*.  If *name* is not given,
    returns a copy of the selection macros dictionary."""

    if name is None:
        return MACROS.copy()
    try:
        return MACROS[name]
    except KeyError:
        LOGGER.info("{0} is not a user defined macro name."
                    .format(repr(name)))


def replaceMacros(selstr):

    global MACROS_REGEX
    if MACROS_REGEX is None: MACROS_REGEX = {}
    selstr = ' ' + selstr + ' '
    if MACROS:
        for name, macro in MACROS.items(): # PY3K: OK
            re = MACROS_REGEX.setdefault(name,
                re_compile('[( )]' + name + '[( )]'))

        for match in re.finditer(selstr):
            start, end = match.start(), match.end()
            match = selstr[start:end]
            selstr = (selstr[:start] + match[0] + '(' + macro+ ')' +
                      match[-1] + selstr[end:])
    return selstr[1:-1]


def checkSelstr(selstr, what, error=ValueError):
    """Check *selstr* if it satisfies a selected condition.  For now, only
    whether coordinate/distance based selection are checked.  If *error* is
    a subclass of :exc:`Exception`, an exception will be raised, otherwise
    return **True** or **False** will be returned."""

    selstr = selstr.replace('(', ' ( ')
    selstr = selstr.replace(')', ' ) ')

    if what in set(['dist']):
        for item in selstr.split():
            if item in XYZDIST:
                if issubclass(error, Exception):
                    raise error('invalid selection {0}, coordinate '
                                'based selections are not accepted'
                                .format(repr(selstr)))
                else:
                    return False


[docs]class SelectionError(Exception):

    """Exception raised when there are errors in the selection string."""

    def __init__(self, sel, loc=0, msg='', tkns=None):

        if tkns:
            for tkn in tkns:
                tkn = str(tkn)
                if sel.count(tkn, loc) == 1: loc = sel.index(tkn, loc)

        msg = ("An invalid selection string is encountered:\n{0}\n"
               .format(repr(sel)) +
               ' ' * (loc + 1) + '^ ' + msg)
        Exception.__init__(self, msg)


[docs]class SelectionWarning(Warning):

    """A class used for issuing warning messages when potential typos are
    detected in a selection string.  Warnings are issued to ``sys.stderr``
    via ProDy package logger.  Use :func:`.confProDy` to selection warnings
    *on* or *off*, e.g. ``confProDy(selection_warning=False)``."""

    def __init__(self, sel='', loc=0, msg='', tkns=None):

        if SETTINGS.get('selection_warning', True):

            if tkns:
                for tkn in tkns:
                    tkn = str(tkn)
                    if sel.count(tkn, loc) == 1: loc = sel.index(tkn, loc)

            msg = ('Selection string contains typo(s):\n'
                   '{0}\n '.format(repr(sel)) +
                   ' ' * loc + '^ ' + msg)
            LOGGER.warn(msg)

FIELDS_SYNONYMS = {'chid': 'chain',
                   'fragment': 'fragindex',
                   'resid': 'resnum',
                   'secstr': 'secondary',
                   'segname': 'segment'}


XYZ2INDEX = {'x': 0, 'y': 1, 'z': 2}

FUNCTIONS = {
    'sqrt'  : np.sqrt,
    'sq'    : lambda num: np.power(num, 2),
    'abs'   : np.abs,
    'floor' : np.floor,
    'ceil'  : np.ceil,
    'sin'   : np.sin,
    'cos'   : np.cos,
    'tan'   : np.tan,
    'asin'  : np.arcsin,
    'acos'  : np.arccos,
    'atan'  : np.arctan,
    'sinh'  : np.sinh,
    'cosh'  : np.cosh,
    'tahn'  : np.tanh,
    'exp'   : np.exp,
    'log'   : np.log,
    'log10' : np.log10,
}

OPERATORS = {
    '+'  : np.add,
    '-'  : np.subtract,
    '*'  : np.multiply,
    '/'  : np.divide,
    '%'  : np.remainder,
    '>'  : np.greater,
    '<'  : np.less,
    '>=' : np.greater_equal,
    '<=' : np.less_equal,
    '='  : np.equal,
    '==' : np.equal,
    '!=' : np.not_equal,
}

SAMEAS_MAP = {'residue': 'resindex', 'chain': 'chindex',
              'segment': 'segindex', 'fragment': 'fragindex'}

# maybe a value to a field/data label or field label
XYZ = set(['x', 'y', 'z'])
XYZDIST = set(['x', 'y', 'z', 'within', 'exwithin'])

OR = pp.Keyword('or')
AND = pp.Keyword('and')
WORD = pp.Word(pp.alphanums + '''~@#$.:;_'`,''')

_ = list(FUNCTIONS)
FUNCNAMES = set(_)

kwfunc = pp.Keyword(_[0])
FUNCNAMES_OPLIST = kwfunc
FUNCNAMES_EXPR = ~kwfunc
for func in _[1:]:
    kwfunc = pp.Keyword(func)
    FUNCNAMES_OPLIST = FUNCNAMES_OPLIST | kwfunc
    FUNCNAMES_EXPR += ~kwfunc

RE_SCHARS = re_compile(r'`[\w\W]*?`')
PP_SCHARS = pp.Regex(RE_SCHARS)


def specialCharsParseAction(sel, loc, token):

    token = token[0][1:-1]
    if not token:
        raise SelectionError(sel, loc, '`` is invalid, no special characters')
    if ':' in token or 'to' in token:
        try:
            token = PP_NRANGE.parseString(token)[0]
        except pp.ParseException:
            pass
    return token

PP_SCHARS.setParseAction(specialCharsParseAction)


RE_REGEXP = re_compile(r'"[\w\W]*"')
PP_REGEXP = pp.Regex(RE_REGEXP.pattern)


def regularExpParseAction(sel, loc, token):

    token = token[0]
    if token == '""':
        raise SelectionError(sel, loc, '"" is invalid, no regular expression')
    try:
        regexp = re_compile(token[1:-1])
    except:
        raise SelectionError(sel, loc, 'failed to compile regular '
                             'expression {0}'.format(repr(token)))
    else:
        return regexp

PP_REGEXP.setParseAction(regularExpParseAction)

_ = r'[-+]?\d+(\.\d*)?([eE]\d+)?'
RE_NRANGE = re_compile(_ + '\ *(to|:)\ *' + _)

PP_NRANGE = pp.Group(pp.Regex(RE_NRANGE.pattern) +
                     pp.Optional(pp.Regex(r'(\ *:\ *' + _ + ')')))


def rangeParseAction(sel, loc, tokens):

    tokens = tokens[0]
    debug(sel, loc, '_nrange', tokens)

    token = tokens[0]
    sep = ':' if ':' in token else 'to'
    first, last = token.split(sep)
    try:
        start = int(first)
    except ValueError:
        start = float(first)
    try:
        stop = int(last)
    except ValueError:
        stop = float(last)

    if start > stop:
        raise SelectionError(sel, loc, 'range start value ({0}) is greater '
                             'than stop value ({1})'
                             .format(repr(start), repr(stop)))
    elif start == stop:
        if sep == ':':
            raise SelectionError(sel, loc, 'range start value ({0}) is greater '
                             'than or equal to stop value ({1})'
                             .format(repr(start), repr(stop)))
        else:
            return first

    if sep == 'to':
        comp = '<='
    elif len(tokens) == 1:
        comp = '<'
    else:
        try:
            comp = int(tokens[1][1:])
        except ValueError:
            comp = float(tokens[1][1:])
    return 'range', start, stop, comp

PP_NRANGE.setParseAction(rangeParseAction)


UNARY = set(['not', 'bonded', 'exbonded', 'within', 'exwithin', 'same'])


[docs]class Select(object):

    """Select subsets of atoms based on a selection string.
    See :mod:`~.atomic.select` module documentation for selection grammar
    and examples.  This class makes use of pyparsing_ module."""

    def __init__(self):

        self._ag = None
        self._atoms = None
        self._indices = None
        self._n_atoms = None
        self._selstr = None

        self._coords = None
        self._kwargs  = None
        # set True when selection string alone cannot reproduce the selection
        self._ss2idx = False
        self._data = dict()
        self._replace = False

        self._parsers = {}


        self._evalmap = {'resnum': self._resnum, 'resid': self._resnum,
            'serial': self._serial, 'index': self._index,
            'x': self._generic, 'y': self._generic, 'z': self._generic,
            'chid': self._generic, 'secstr': self._generic,
            'fragment': self._generic, 'fragindex': self._generic,
            'segment': self._generic, 'segname': self._generic,
            'sequence': self._sequence, }


    def _reset(self):

        self._ag = None
        self._atoms = None
        self._indices = None
        self._n_atoms = None
        self._coords = None
        self._data.clear()

    def _evalAtoms(self, atoms):

        self._atoms = atoms
        try:
            self._ag = atoms.getAtomGroup()
        except AttributeError:
            self._ag = atoms
            self._indices = None
        else:
            self._indices = atoms._getIndices()
            if len(self._indices) == 1:
                try:
                    index = atoms.getIndex()
                except AttributeError:
                    pass
                else:
                    self._atoms = Selection(self._ag, array([index]),
                                    'index ' + str(index), atoms.getACSIndex())

[docs]    def select(self, atoms, selstr, **kwargs):
        """Returns a :class:`.Selection` of atoms matching *selstr*, or
        **None**, if selection string does not match any atoms.

        :arg atoms: atoms to be evaluated
        :type atoms: :class:`.Atomic`

        :arg selstr: selection string
        :type selstr: str

        Note that, if *atoms* is an :class:`.AtomMap` instance, an
        :class:`.AtomMap` is returned, instead of a a :class:`.Selection`.

        .. note:

            * A special case for making atom selections is passing an
              :class:`.AtomMap` instance as *atoms* argument.  Dummy
              atoms will not be included in the result, but the order
              of atoms will be preserved."""

        self._ss2idx = False
        self._replace = False

        self._selstr = selstr
        indices = self.getIndices(atoms, selstr, **kwargs)

        self._kwargs = None

        if len(indices) == 0:
            return None

        if not isinstance(atoms, AtomGroup):
            indices = self._indices[indices]

        ag = self._ag

        try:
            dummies = atoms.numDummies()
        except AttributeError:
            if self._ss2idx:
                selstr = 'index {0}'.format(rangeString(indices))
            else:
                if self._replace:
                    for key, value in kwargs.items(): # PY3K: OK
                        if value in ag and key in selstr:
                            if value == ag:
                                ss = 'all'
                            else:
                                ss = value.getSelstr()
                            selstr = selstr.replace(key, '(' + ss + ')')
                if isinstance(atoms, AtomPointer):
                    selstr = '({0}) and ({1})'.format(selstr,
                                                      atoms.getSelstr())

            return Selection(ag, indices, selstr, atoms.getACSIndex(),
                             unique=True)
        else:
            return AtomMap(ag, indices, atoms.getACSIndex(), dummies=dummies,
               title='Selection {0} from '.format(repr(selstr)) + str(atoms))

[docs]    def getIndices(self, atoms, selstr, **kwargs):
        """Returns indices of atoms matching *selstr*.  Indices correspond to
        the order in *atoms* argument.  If *atoms* is a subset of atoms, they
        should not be used for indexing the corresponding :class:`.AtomGroup`
        instance."""

        ss = selstr.strip()
        if (len(ss.split()) == 1 and ss.isalnum() and ss not in MACROS):
            self._evalAtoms(atoms)
            if ss == 'none':
                return array([])
            elif ss == 'all':
                return arange(atoms.numAtoms())
            elif atoms.isFlagLabel(ss):
                return atoms._getFlags(ss).nonzero()[0]
            elif atoms.isDataLabel(ss) or ss in self._evalmap:
                raise SelectionError(selstr, 0, 'must be followed by values',
                                     [ss])
            else:
                raise SelectionError(selstr, 0, 'is not a valid selection '
                                     'string', [ss])
        else:
            torf = self.getBoolArray(atoms, selstr, **kwargs)
            return torf.nonzero()[0]

[docs]    def getBoolArray(self, atoms, selstr, **kwargs):
        """Returns a boolean array with **True** values for *atoms* matching
        *selstr*.  The length of the boolean :class:`numpy.ndarray` will be
        equal to the length of *atoms* argument."""

        if not isinstance(atoms, Atomic):
            raise TypeError('atoms must be an Atomic instance, not {0}'
                            .format(type(atoms)))

        self._reset()

        for key in kwargs:
            if not key.isalnum():
                raise TypeError('{0} is not a valid keyword argument, '
                                  'keywords must be all alpha numeric '
                                  'characters'.format(repr(key)))
            if isReserved(key):
                loc = selstr.find(key)
                if loc > -1:
                    raise SelectionError(selstr, loc, '{0} is a reserved '
                                'word and cannot be used as a keyword argument'
                                .format(repr(key)))

        self._n_atoms = atoms.numAtoms()
        self._selstr = selstr
        self._kwargs = kwargs

        if DEBUG: print('getBoolArray', selstr)

        self._evalAtoms(atoms)

        selstr = selstr.strip()
        if (len(selstr.split()) == 1 and selstr.isalnum() and
            selstr not in MACROS):
            if selstr == 'none':
                return zeros(atoms.numAtoms(), bool)
            elif selstr == 'all':
                return ones(atoms.numAtoms(), bool)
            elif atoms.isFlagLabel(selstr):
                return atoms.getFlags(selstr)
            elif atoms.isDataLabel(selstr):
                raise SelectionError(selstr, 0, 'must be followed by values')
            else:
                raise SelectionError(selstr, 0, 'is not a valid selection or '
                                     'user data label')

        selstr = replaceMacros(selstr)
        try:
            parser = self._getParser(selstr)
            tokens = parser(selstr, parseAll=True)
        except pp.ParseException as err:
            self._parsers.pop(self._parser, None)
            pass
            which = selstr.rfind(' ', 0, err.column)
            if which > -1:
                if selstr[which + 1] == '(':
                    msg = ('an arithmetic, comparison, or logical operator '
                           'must precede the opening parenthesis')
                elif selstr[which - 1] == ')':
                    msg = ('an arithmetic, comparison, or logical operator '
                           'must follow the closing parenthesis')
                else:
                    msg = 'parsing failed here'
            else:
                msg = 'parsing failed here'

            raise SelectionError(selstr, err.column, msg + '\n' + str(err))
        else:
            if DEBUG: print('_evalSelstr', tokens)
            torf = tokens[0]

        if not isinstance(torf, ndarray):
            if DEBUG: print(torf)
            raise SelectionError(selstr)
        elif torf.dtype != bool:
            if DEBUG:
                print('_select torf.dtype', torf.dtype, isinstance(torf.dtype,
                                                                   bool))
            raise SelectionError(selstr)
        if DEBUG:
            print('_select', torf)
        return torf

    def _getParser(self, selstr):
        """Returns an efficient parser that can handle *selstr*."""

        alnum = selstr
        alpha = selstr
        for ch in selstr:
            if not ch.isalnum(): alnum = alnum.replace(ch, ' ')
            if not ch.isalpha(): alpha = alpha.replace(ch, ' ')
        items = set(alnum.split())
        chars = set(selstr)


        funcs = 4 if items.intersection(FUNCNAMES) else 0
        opers = 2 if chars.intersection(OPERATORS) else 0
        logic = 1 if 'or' in items or '(' in chars else 0

        schars = 8 if '`' in chars and RE_SCHARS.search(selstr) else 0
        regexp = 16 if '"' in chars and RE_REGEXP.search(selstr) else 0
        nrange = 32 if ((':' in chars or ' to ' in alpha) and
                        RE_NRANGE.search(selstr)) else 0

        self._parser = key = (logic + opers + funcs,
                              logic + funcs + schars + regexp + nrange)

        if key == (0, 0):
            return self._noParser

        try:
            return self._parsers[key][0].parseString
        except KeyError:
            pass


        word = ~AND + ~OR

        oplist = []
        if funcs:
            oplist.append((FUNCNAMES_OPLIST, 1, pp.opAssoc.RIGHT, self._func))
            # following causes 20% slow down
            #word += FUNCNAMES_EXPR

        if funcs or opers:
            oplist.extend([
                (pp.oneOf('+ -'), 1, pp.opAssoc.RIGHT, self._sign),
                (pp.oneOf('** ^'), 2, pp.opAssoc.LEFT, self._pow),
                (pp.oneOf('* / %'), 2, pp.opAssoc.LEFT, self._binop),
                (pp.oneOf('+ -'), 2, pp.opAssoc.LEFT, self._binop),
                (pp.oneOf('< > <= >= == = !='), 2, pp.opAssoc.LEFT,
                 self._comp)])

        oplist.extend([
          (pp.Optional(AND), 2, pp.opAssoc.LEFT, self._and),
          (OR, 2, pp.opAssoc.LEFT, self._or)])

        word += WORD

        expr = word
        if schars: expr = PP_SCHARS | expr
        if regexp: expr = PP_REGEXP | expr
        if nrange: expr = PP_NRANGE | expr

        parser = operatorPrecedence(expr, oplist)
        parser.setParseAction(self._default)
        parser.leaveWhitespace()
        parser.enablePackrat()
        self._parsers[key] = parser, expr, oplist
        return parser.parseString

    def _noParser(self, selstr, parseAll=True):

        debug(selstr, 0, ['_noParser'])
        return [self._default(selstr, 0, selstr.split())]

    def _getZeros(self, subset=None):
        """Returns a bool array with zero elements."""

        if subset is None:
            return zeros(self._atoms.numAtoms(), bool)
        else:
            return zeros(len(subset), bool)

    def _default(self, sel, loc, tokens):

        debug(sel, loc, '_default', tokens)
        if NUMB: return

        if len(tokens) == 1:
            torf, err = self._eval(sel, loc, tokens)
        else:
            torf, err = self._and2(sel, loc, tokens)
        if err: raise err
        return torf


    def _eval(self, sel, loc, tokens, subset=None):

        debug(sel, loc, '_eval', tokens)
        if NUMB: return
        #if isinstance(tokens, ndarray):
        #    return tokens

        if len(tokens) == 1:

            token = tokens[0]
            try:
                dtype = token.dtype
            except AttributeError:
                pass
            else:
                return token, False

            if token == 'none':
                return zeros(self._n_atoms if subset is None else len(subset),
                              bool), False
            elif token == 'all':
                return ones(self._n_atoms if subset is None else len(subset),
                            bool), False

            elif self._atoms.isFlagLabel(token):
                return self._getFlags(token, subset), False

            elif self._atoms.isDataLabel(token):
                data, err = self._getData(sel, loc, token)
                if subset is None:
                    return data, False
                else:
                    return None, SelectionError(sel, loc, 'subset??')
                    return data[subset], False

            try:
                arg = self._kwargs[token]
            except KeyError:
                try:
                    return float(token), False
                except ValueError:

                    data, err = self._getData(sel, loc, token)
                    if data is not None:
                        return data, False

                    if token in ATOMIC_FIELDS:
                        return None, SelectionError(sel, loc, '{0} data is '
                            'not found'.format(repr(token)), [token])
                    else:
                        return None, SelectionError(sel, loc, '{0} could '
                            'not be evaluated'.format(repr(token)), [token])
            else:
                if arg in self._ag:
                    try:
                        dummies = arg.numDummies()
                    except AttributeError:
                        try:
                            indices = arg._getIndices()
                        except AttributeError:
                            indices = arange(self._ag.numAtoms())
                    else:
                        if dummies:
                            indices = arg._getIndices()[arg.getFlags('mapped')]
                        else:
                            indices = arg._getIndices()

                    torf = zeros(self._ag.numAtoms(), bool)
                    torf[indices] = True
                    if self._indices is not None:
                        torf = torf[self._indices]
                    self._replace = True
                    return torf, False
                return token, False
        else:
            return self._evalmap.get(tokens[0], self._generic)(
                                            sel, loc, tokens, subset)

    def _getFlags(self, label, subset):

        if subset is None:
            # get a copy to avoid alterations
            return self._atoms.getFlags(label)
        else:
            return self._atoms._getFlags(label)[subset]

    def _or(self, sel, loc, tokens):

        debug(sel, loc, '_or', tokens)
        tokens = tokens[0]
        if NUMB: return

        flags = []
        torfs = []
        evals = []
        atoms = self._atoms
        isFlagLabel = atoms.isFlagLabel
        isDataLabel = atoms.isDataLabel
        for token in tokens:
            # check whether token is an array to avoid array == str comparison
            try:
                dtype = token.dtype
            except AttributeError:
                if token == 'or':
                    continue
                elif isFlagLabel(token):
                    flags.append(token)
                elif isDataLabel(token) or token in self._evalmap:
                    evals.append([])
                    evals[-1].append(token)
                else:
                    try:
                        evals[-1].append(token)
                    except IndexError:
                        raise SelectionError(sel, loc)
            else:
                if dtype == bool:
                    torfs.append(token)
                else:
                    try:
                        evals[-1].append(token)
                    except IndexError:
                        raise SelectionError(sel, loc)

        torf = None
        if torfs:
            torf = torfs.pop(0)
            while torfs:
                ss = where(torf == 0)[0]
                if len(ss) == 0: return torf
                torf[ss] = torfs.pop(0)[ss]

        if flags:
            if torf is None:
                torf = atoms.getFlags(flags.pop(0))
            while flags:
                ss = where(torf == 0)[0]
                if len(ss) == 0: return torf
                torf[ss] = atoms._getFlags(flags.pop(0))[ss]

        if evals:
            if torf is None:
                tokens = evals.pop(0)
                first = str(tokens[0])
                torf, err = self._eval(sel, loc, tokens)
                if err: raise err
                try:
                    dtype = torf.dtype
                except AttributeError:
                    raise SelectionError(sel, loc, 'a problem '
                        'occurred when evaluating token {0}'
                        .format(repr(first)), [first])

                else:
                    if dtype != bool:
                        raise SelectionError(sel, loc, 'a problem '
                            'occurred when evaluating token {0}'
                            .format(repr(first)), [first])
            while evals:
                ss = where(torf == 0)[0]
                if len(ss) == 0: return torf
                tokens = evals.pop(0)
                first = str(tokens[0])
                arr, err = self._eval(sel, loc, tokens, subset=ss)
                if err: raise err

                try:
                    dtype = arr.dtype
                except AttributeError:
                    raise SelectionError(sel, loc, 'a problem '
                        'occurred when evaluating token {0}'
                        .format(repr(first)), [first])

                else:
                    if dtype != bool:
                        raise SelectionError(sel, loc, 'a problem '
                            'occurred when evaluating token {0}'
                            .format(repr(first)), [first])

                torf[ss] = arr

        return torf

    def _and(self, sel, loc, tokens):

        debug(sel, loc, '_and', tokens)
        tokens = tokens[0]
        torf, err = self._and2(sel, loc, tokens)
        if err: raise err
        return torf

    def _and2(self, sel, loc, tokens, subset=None):

        debug(sel, loc, '_and2', tokens)
        if NUMB: return
        
        firsttoken = tokens[0] if not isinstance(tokens[0], Iterable) else list(tokens[0])
        lasttoken = tokens[-1] if not isinstance(tokens[-1], Iterable) else list(tokens[-1])
        if firsttoken == 'and' or lasttoken == 'and':
            return None, SelectionError(sel, loc, '{0} operator must be '
                'surrounded with arguments'.format(repr('and')), [tokens[0]])

        flags = []
        torfs = []
        evals = []
        unary = []
        atoms = self._atoms
        isFlagLabel = atoms.isFlagLabel
        isDataLabel = atoms.isDataLabel
        append = None
        wasand = False
        wasdata = False
        while tokens:
            # check whether token is an array to avoid array == str comparison
            token = tokens.pop(0)
            try:
                dtype = token.dtype
            except AttributeError:
                if token == 'and':
                    if wasand:
                        return None, SelectionError(sel, loc, 'incorrect use '
                            'of `and` operator, expected {0}'
                            .format(repr('and ... and')), ['and', 'and'])
                    append = None
                    wasand = True
                    wasdata = False
                    continue

                elif isFlagLabel(token) and not wasdata:
                    flags.append(token)
                    append = None

                elif (isDataLabel(token) or token in self._evalmap or
                      token in ATOMIC_FIELDS):
                    # not evals, must start a new list
                    # last evals list must have more than one values
                    #
                    if not evals:
                        evals.append([])
                        append = evals[-1].append
                    elif len(evals[-1]) == 1:
                        if token in XYZ:
                            pass
                        else:
                            return None, SelectionError(sel, loc, '{0} '
                                'must be followed by values'
                                .format(evals[-1][0]), [evals[-1][0]])
                    elif token in XYZ:
                        if (not tokens or tokens[0] in self._evalmap or
                              tokens[0] in ATOMIC_FIELDS or
                              isDataLabel(tokens[0]) or tokens[0] == 'and'):
                            pass
                        else:
                            try:
                                float(tokens[0])
                            except ValueError as err:
                                pass
                            else:
                                evals.append([])
                                append = evals[-1].append
                    elif not tokens:
                        return None, SelectionError(sel, loc, '{0} '
                            'must be followed by values'
                            .format(token), [token])
                    else:
                        evals.append([])
                        append = evals[-1].append
                    append(token)
                    wasdata = True

                elif token in UNARY:
                    unary.append([])
                    append = unary[-1].append
                    wasdata = False

                    if token == 'not':
                        append((token,))

                    elif token == 'same':
                        if len(tokens) < 3 or tokens[1] != 'as':
                            return None, SelectionError(sel, loc, 'incorrect '
                                'use of `same as` statement, expected {0}'
                                .format('same entity as ...'), [token])
                        append((token, tokens.pop(0), tokens.pop(0)))

                    elif token.endswith('within'):
                        if len(tokens) < 3 or tokens[1] != 'of':
                            return None, SelectionError(sel, loc, 'incorrect '
                                'use of `within` statement, expected {0}'
                                .format('[ex]within x.y of ...'), [token])
                        append((token, tokens.pop(0), tokens.pop(0)))

                    elif token.endswith('bonded'):
                        token2 = tokens.pop(0)
                        if len(tokens) < (1 + int(token2 == 'to')):
                            return None, SelectionError(sel, loc, 'incorrect '
                                'use of `bonded` statement, expected {0}'
                                .format('[ex]bonded [n] to ...'), [token2])
                        if token2 == 'to':
                            append((token, 'to'))
                        else:
                            append((token, token2, tokens.pop(0)))

                    anyargs = False
                    while tokens:
                        next = tokens[0]
                        try:
                            dtype = next.dtype
                        except AttributeError:
                            append(tokens.pop(0))
                            if next == 'and' or isFlagLabel(next):
                                break
                            if isDataLabel(next) or next in self._evalmap:
                                if anyargs:
                                    break
                                else:
                                    anyargs = True
                        else:
                            append(tokens.pop(0))
                            break
                else:
                    try:
                        append(token)
                    except TypeError:
                        return None, SelectionError(sel, loc, 'a problem '
                                    'occurred when evaluation token {0}'
                                    .format(repr(token)), [token])
            else:
                if dtype == bool:
                    torfs.append(token)
                else:
                    return None, SelectionError(sel, loc, 'a problem '
                                'occurred when evaluation token {0}'
                                .format(repr(token)), [token])
            wasand = False
        torf = None

        if torfs:
            torf = torfs.pop(0)

            while torfs:
                ss = torf.nonzero()[0]
                if len(ss) == 0: return torf, False
                torf[ss] = torfs.pop(0)[ss]

        if flags:
            if torf is None:
                torf = atoms.getFlags(flags.pop(0))

            while flags:
                ss = torf.nonzero()[0]
                if len(ss) == 0: return torf, False
                torf[ss] = atoms._getFlags(flags.pop(0))[ss]

        if unary:
            if torf is None:
                torf, err = self._unary(sel, loc, unary.pop(0))
                if err: return None, err

            while unary:
                ss = torf.nonzero()[0]
                if len(ss) == 0: return torf, False
                arr, err = self._unary(sel, loc, unary.pop(0))
                if err: return None, err
                torf[ss] = arr[ss]

        if evals:
            if torf is None:
                tokens = evals.pop(0)
                first = str(tokens[0])
                torf, err = self._eval(sel, loc, tokens, subset=subset)
                if err: return None, err
                try:
                    dtype = torf.dtype
                except AttributeError:
                    return None, SelectionError(sel, loc, 'a problem '
                        'occurred when evaluating token {0}'
                        .format(repr(first)), [first])

                else:
                    if dtype != bool:
                        return None, SelectionError(sel, loc, 'a problem '
                            'occurred when evaluating token {0}'
                            .format(repr(first)), [first])
            while evals:
                ss = torf.nonzero()[0]
                if len(ss) == 0: return torf, False
                tokens = evals.pop(0)
                first = str(tokens[0])
                arr, err = self._eval(sel, loc, tokens, subset=ss)
                if err: return None, err

                try:
                    dtype = arr.dtype
                except AttributeError:
                    return None, SelectionError(sel, loc, 'a problem '
                        'occurred when evaluating token {0}'
                        .format(repr(first)), [first])

                else:
                    if dtype != bool:
                        return None, SelectionError(sel, loc, 'a problem '
                            'occurred when evaluating token {0}'
                            .format(repr(first)), [first])

                torf[ss] = arr

        # ?? check torf.shape/ndim
        if subset is None:
            return torf, False
        else:
            return torf[subset], False

    def _unary(self, sel, loc, tokens):

        debug(sel, loc, '_unary', tokens)
        if NUMB: return

        what = tokens[0]
        which = tokens[1:]

        if not which:
            return None, SelectionError(sel, loc, '{0} must be followed by '
                .format(repr(' '.join(what))), what)
        if len(which) == 1:
            which, err = self._eval(sel, loc, which)
        else:
            which, err = self._and2(sel, loc, which)
        if err: raise err

        tokens = [what, which]
        if what[0] == 'not':
            return self._not(sel, loc, tokens)
        elif what[0] == 'same':
            return self._sameas(sel, loc, tokens)
        elif what[-1] == 'to':
            return self._bondedto(sel, loc, tokens)
        else:
            return self._within(sel, loc, tokens)

    def _not(self, sel, loc, tokens):
        """Negate selection."""

        debug(sel, loc, '_not', tokens)
        label, torf = tokens
        return invert(torf, torf), False

    def _within(self, sel, loc, tokens):
        """Perform distance based selection."""

        if DEBUG: print('_within', tokens)
        label, which = tokens
        within = label[1]
        label = ' '.join(label)
        try:
            within = float(within)
        except Exception as err:
            return None, SelectionError('could not convert {0} in {1} to '
                'float ({2})'.format(within, repr(label), str(err)),
                [label, within])
        exclude = label.startswith('ex')
        other = False
        try:
            dtype = which.dtype
        except AttributeError:

            if which in self._kwargs:
                coords = self._kwargs[which]
                try:
                    ndim, shape = coords.ndim, coords.shape
                except AttributeError:
                    try:
                        coords = coords._getCoords()
                    except AttributeError:
                        try:
                            coords = coords.getCoords()
                        except AttributeError:
                            return None, SelectionError(sel, loc,
                                '{0} must be a coordinate array or have '
                                '`getCoords` method'.format(repr(which)),
                                [label, which])
                    if coords is None:
                        return None, SelectionError(sel, loc,
                            'coordinates are not set for {0} ({1})'
                            .format(repr(which), repr(self._kwargs[which])),
                            [label, which])
                    else:
                        ndim, shape = coords.ndim, coords.shape
                if ndim == 1 and shape[0] == 3:
                    coords = array([coords])
                elif not (ndim == 2 and shape[1] == 3):
                    return None, SelectionError(sel, loc,
                        '{0} must be a coordinate array or have '
                        '`getCoords` method'.format(repr(which)),
                        [label, which])
                exclude=False
                self._ss2idx = True
                which = arange(len(coords))
                other = True
        else:
            if dtype == bool:
                which = which.nonzero()[0]
                coords = self._getCoords()
                if coords is None:
                    return None, SelectionError(sel, loc, 'coordinates are '
                                                'not set')
            else:
                return None, SelectionError(sel, loc, 'not understood')

        if other or len(which) < 20:
            kdtree = self._atoms._getKDTree()
            get_indices = kdtree.getIndices
            search = kdtree.search
            get_count = kdtree.getCount
            torf = zeros(self._ag.numAtoms(), bool)
            for index in which:
                search(within, coords[index])
                if get_count():
                    torf[get_indices()] = True
            if self._indices is not None:
                torf = torf[self._indices]
            if exclude:
                torf[which] = False

        else:
            n_atoms = self._atoms.numAtoms()
            torf = ones(n_atoms, bool)
            torf[which] = False
            check = torf.nonzero()[0]
            torf = zeros(n_atoms, bool)

            cxyz = coords[check]
            kdtree = KDTree(coords[which])
            search = kdtree.search
            get_count = kdtree.getCount
            select = []
            append = select.append
            for i, xyz in enumerate(cxyz):
                search(within, xyz)
                if get_count():
                    append(i)

            torf[check[select]] = True
            if not exclude:
                torf[which] = True

        return torf, False

    def _sameas(self, sel, loc, tokens):
        """Evaluate ``'same entity as ...'`` expression."""

        debug(self, loc, '_sameas', tokens)
        label, which = tokens
        what = label[1]
        label = ' '.join(label)
        index = SAMEAS_MAP.get(what)
        if index is None:
            return None, SelectionError(sel, loc, 'entity in "same ... as" '
                'must be one of "chain", "residue", "segment", or "fragment",'
                ' not {0}'.format(repr(what)), [label])

        indices, err = self._getData(sel, loc, index)
        iset = set(indices[which])
        torf = array([i in iset for i in indices], bool)

        return torf, False

    def _bondedto(self, sel, loc, tokens):
        """Expand selection to immediately bonded atoms."""

        debug(sel, loc, '_bondedto', tokens)
        label, torf = tokens
        token = label[1]
        label = ' '.join(label)
        if token == 'to':
            repeat = 1
        else:
            try:
                repeat = int(token)
            except TypeError:
                return None, SelectionError(sel, loc, '{0} in {0} could not '
                    'be converted to an integer'.format(token, repr(label)),
                    [label])
            else:
                if float(token) != repeat:
                    SelectionWarning(sel, loc, 'number in {0} should be an '
                        'integer'.format(repr(label)), [label])

            if repeat <= 0:
                SelectionWarning(sel, loc, 'number in {0} should be a '
                    'positive integer'.format(repr(label)), [label])
                return zeros(self._atoms.numAtoms(), bool), False

        bmap = self._ag._bmap
        if bmap is None:
            return None, SelectionError(sel, loc, 'bonds are not set',
                                        [label])
        which = torf.nonzero()[0]
        if not len(which):
            return torf, False

        indices = self._indices
        if indices is not None:
            bmap = bmap[indices]
        n_atoms = self._ag.numAtoms()
        for i in range(repeat):
            torf = zeros(n_atoms, bool)
            bonded = unique(bmap[which])
            if bonded[0] == -1:
                torf[bonded[1:]] = True
            else:
                torf[bonded] = True
            if indices is not None:
                torf = torf[indices]
            if label.startswith('ex'):
                torf[which] = False
            else:
                torf[which] = True
            if i + 1 < repeat:
                which = torf.nonzero()[0]

        return torf, False

    def _getNumeric(self, sel, loc, arg, copy=False):
        """Returns numeric data or a number."""

        debug(sel, loc, '_getNumeric', arg)

        # arg may be an array, a string, or a regular expression
        try:
            dtype, ndim = arg.dtype, arg.ndim
        except AttributeError:
            pass
        else:
            # i don't expect that a string array may show up
            if dtype == bool:
                return None, SelectionError(sel, loc, 'operands and function '
                    'arguments must be numbers or numeric data labels')
            else:
                return arg, False

        # no regular expressions
        try:
            pattern = arg.pattern
        except AttributeError:
            pass
        else:
            return None, SelectionError(sel, sel.index(pattern, loc),
               'operands and function arguments cannot be regular expressions')

        if arg in XYZ2INDEX:
            coords = self._getCoords() # how about atoms._getCoords() ?
            if coords is None:
                return None, SelectionError(sel, loc,
                    'coordinates are not set')
            else:
                if copy:
                    return coords[:, XYZ2INDEX[arg]].copy(), False
                else:
                    return coords[:, XYZ2INDEX[arg]], False
        arg = FIELDS_SYNONYMS.get(arg, arg)
        try:
            if copy:
                data = self._atoms.getData(arg)
            else:
                data = self._atoms._getData(arg)
        except Exception as err:
            return None, SelectionError(sel, loc, 'following exception '
                        'occurred when evaluating {0}: {1}'
                        .format(repr(arg), str(err)))

        if data is not None:
            if data.dtype.char in 'US':
                return None, SelectionError(sel, loc, '{0} is not a numeric '
                        'data label'.format(repr(arg)))
            else:
                return data, False

        if arg == 'index':
            try:
                if copy:
                    return self._atoms.getIndices(), False
                else:
                    return self._atoms._getIndices(), False
            except AttributeError:
                return arange(self._atoms.numAtoms()), False

        try:
            return float(arg), False
        except Exception as err:
            return None, SelectionError(sel, loc, '{0} is not a number or a '
                    'numeric data label'
                    .format(repr(arg), ))

    def _comp(self, sel, loc, tokens):
        """Perform comparison."""

        tokens = tokens[0]
        debug(sel, loc, '_comp', tokens)
        if NUMB: return

        if len(tokens) >= 3 and len(tokens) % 2 != 1:
            raise SelectionError(sel, loc,
                                 'invalid number of operators and operands')
        left, err = self._getNumeric(sel, loc, tokens.pop(0))
        if err: raise err

        torf = None
        while tokens:
            try:
                binop = OPERATORS[tokens.pop(0)]
            except KeyError:
                raise SelectionError(sel, loc, 'invalid operator encountered')

            right, err = self._getNumeric(sel, loc, tokens.pop(0))
            if err: raise err

            if torf is None:
                torf = binop(left, right)
            else:
                logical_and(binop(left, right), torf, torf)
            left = right
        # check whether atomic data was contained in comparison
        # i.e. len(atoms) == len(torf)
        try:
            ndim, shape = torf.ndim, torf.shape
        except AttributeError:
            raise SelectionError(sel, loc,
                                'comparison must contain atomic data')
        else:
            if ndim != 1 or shape[0] != self._atoms.numAtoms():
                raise SelectionError(sel, loc,
                                    'comparison must contain atomic data')
            else:
                return torf

    def _binop(self, sel, loc, tokens):
        """Perform binary operation."""

        tokens = tokens[0]
        debug(sel, loc, '_binop', tokens)
        if NUMB: return

        if len(tokens) >= 3 and len(tokens) % 2 != 1:
            raise SelectionError(sel, loc, 'invalid number of items')
        left, err = self._getNumeric(sel, loc, tokens.pop(0), copy=True)
        if err: raise err

        while tokens:
            binop = tokens.pop(0)
            if binop not in OPERATORS:
                raise SelectionError(sel, loc, 'invalid operator encountered')

            right, err = self._getNumeric(sel, loc, tokens.pop(0))
            if err: raise err

            if DEBUG: print(binop, left, right)
            if binop == '/' and any(right == 0.0):
                raise SelectionError(sel, loc, 'zero division error')
            binop = OPERATORS[binop]

            try:
                ndim = left.ndim
            except:
                left = binop(left, right)
            else:
                # ndim must not be zero for in place operation
                if ndim:
                    binop(left, right, left)
                else:
                    left = binop(left, right)
        return left

    def _pow(self, sel, loc, tokens):
        """Perform power operation. Expected operands are numbers
        and numeric atom attributes."""

        tokens = tokens[0]
        debug(sel, loc, '_pow', tokens)
        if NUMB: return

        base, err = self._getNumeric(sel, loc, tokens.pop(0))
        if err: raise err
        power, err = self._getNumeric(sel, loc, tokens.pop())
        if err: raise err

        if tokens.pop() not in ('^', '**'):
            raise SelectionError(sel, loc, 'invalid power operator')
        while tokens:
            number, err = self._getNumeric(sel, loc,   tokens.pop())
            if err: raise err
            power = number ** power
            if tokens.pop() not in ('^', '**'):
                raise SelectionError(sel, loc, 'invalid power operator')

        return base ** power

    def _sign(self, sel, loc, tokens):
        """Change the sign of a selection argument."""

        tokens = tokens[0]
        debug(sel, loc, '_sign', tokens)
        if NUMB: return

        if len(tokens) != 2:
            raise SelectionError(sel, loc,
                                 'sign operators (+/-) must be followed '
                                 'by single keyword, e.g. "-x", "-beta"')
        arg, err = self._getNumeric(sel, loc, tokens[1])
        if err: raise err

        if tokens[0] == '-':
            arg =  -arg
        return arg

    def _func(self, sel, loc, tokens):
        """Evaluate functions used in selection strings."""

        tokens = list(tokens[0])
        debug(sel, loc, '_func', tokens)
        if NUMB: return

        if len(tokens) != 2:
            raise SelectionError(sel, loc, '{0} accepts a single numeric '
                             'argument, e.g. {0}(x)'.format(repr(tokens[0])))
        arg, err = self._getNumeric(sel, loc, tokens[1])
        if err: raise err
        debug(sel, loc, tokens[0], arg)
        return FUNCTIONS[tokens[0]](arg)

    def _generic(self, sel, loc, tokens, subset=None):

        debug(sel, loc, '_generic', tokens)

        label = tokens.pop(0)
        data, err = self._getData(sel, loc, label)
        if err: return None, err

        if subset is not None:
            data = data[subset]
            subset = None
        dtype = data.dtype
        type_ = dtype.type
        isstr = dtype.char == 'S' or dtype.char == 'U'
        if isstr:
            maxlen = int(dtype.str[2:])
        torf = None
        regexp = []
        values = []
        ranges = []

        valset = True
        for token in tokens:

            # check for regular expressions which are only compatible with
            # string data type
            try:
                token.pattern
            except AttributeError:
                pass
            else:
                if not isstr:
                    ptrn = '"{0}"'.format(token.pattern)
                    SelectionWarning(sel, loc, '{0} is a regular '
                        'expression and is not evaluated for {1}'
                        .format(ptrn, repr(label)), [label, ptrn])
                else:
                    regexp.append(token)
                continue


            # check for ranges
            if token[0] == 'range':
                if isstr:
                    SelectionWarning(sel, loc, 'number ranges '
                        'are not evaluated with data type of {0}'
                        .format(repr(label)), [label, token[1]])
                else:
                    if token[-1] in OPERATORS:
                        ranges.append(token)
                    else:
                        nrange = arange(*token[1:])
                        # if dtypes are not the same, don't use set method
                        if nrange.dtype != dtype: valset = False
                        values.extend(nrange)
                continue

            if isstr:
                if token == '_':
                    values.append('')
                    values.append(' ')
                else:
                    if len(token) > maxlen:
                        SelectionWarning(sel, loc, '{0} is longer than the '
                            'maximum characters for data field {1}'
                            .format(repr(token), repr(label)), [label, token])
                    values.append(token)
            else:
                try:
                    value = type_(token)
                except Exception as err:
                    SelectionWarning(sel, loc, '{0} could not be '
                        'converted to type of {1} ({2})'
                        .format(repr(token), repr(label), str(err)),
                        [label, token])
                    continue
                try:
                    val2 = float(token)
                except:
                    pass
                else:
                    if val2 != value:
                        SelectionWarning(sel, loc, '{0} has a different '
                            'values when converted to a float and to type of '
                            '{1}'.format(repr(token), repr(label)),
                            [label, token])
                values.append(value)

        if values:
            # use first option only if values and data array has the same dtype
            if valset and len(values) > 10:
                valset = set(values)
                torf = array([val in values for val in data], bool)
            else:
                torf = data == values.pop(0)
                for val in values:
                    subset = (torf == False).nonzero()[0]
                    if len(subset) == 0: return torf, False
                    torf[subset] = data[subset] == val

        if ranges:
            while ranges:
                _, start, stop, comp = ranges.pop(0)
                if torf is None:
                    torf = start <= data
                else:
                    subset = (torf == False).nonzero()[0]
                    if len(subset) == 0: return torf, False
                    subdata = data[subset]
                    torf[subset] = start <= subdata
                if subset is None:
                    torf = logical_and(torf, OPERATORS[comp](data, stop), torf)
                else:
                    torf[subset] = logical_and(torf[subset],
                                       OPERATORS[comp](subdata, stop))

        if regexp:
            for re in regexp:
                if torf is None:
                    torf = array([re.match(val) is not None
                                  for val in data], bool)
                else:
                    subset = (torf == False).nonzero()[0]
                    if len(subset) == 0: return torf, False
                    torf[subset] = [re.match(val) is not None
                                    for val in data[subset]]
        if torf is None:
            torf = self._getZeros(subset)
        return torf, False

    def _index(self, sel, loc, tokens, subset=None):

        debug(sel, loc, '_index', tokens)
        label = tokens.pop(0)
        torf = zeros(self._ag.numAtoms(), bool)

        for token in tokens:
            try:
                remainder = token % 1.
            except TypeError:
                pass
            else:
                return None, SelectionError(sel, loc, 'it is a number, index')
                if remainder == 0:
                    try:
                        torf[token] = True
                    except IndexError:
                        pass
                else:
                    SelectionWarning(sel, loc, '{0} must be followed by '
                        'integers and/or number ranges'.format(repr(label)),
                        [label, token])
                continue

            try:
                token.pattern
            except AttributeError:
                pass
            else:
                ptrn = '"{0}"'.format(token.pattern)
                SelectionWarning(sel, loc, '{0} is a regular '
                    'expression and is not evaluated for {1}'
                    .format(ptrn, repr(label)), [label, ptrn])
                continue

            if token[0] == 'range':
                _, start, stop, step = token
                if step in OPERATORS:
                    if step == '<=':
                        stop += 1
                    step = 1
                if start % 1.0 != 0 or stop % 1.0 != 0 or step % 1.0 != 0:
                    SelectionWarning(sel, loc, '{0} number ranges should be '
                        'specified by integers'.format(repr(label)),
                        [label, str(start)])
                torf[start:stop:step] = True
                continue

            try:
                val = float(token)
            except TypeError:
                return None, SelectionError(sel, loc, '{0} must be '
                    'followed by integers and/or number ranges',
                    [label, token])
            else:
                if val % 1.0 == 0:
                    try:
                        torf[int(val)] = True
                    except IndexError:
                        pass
                else:
                    SelectionWarning(sel, loc, '{0} must be followed by '
                        'integers and/or number ranges'.format(repr(label)),
                        [label, token])

        try:
            indices = self._atoms._getIndices()
        except AttributeError:
            pass
        else:
            torf = torf[indices]

        if subset is not None:
            torf = torf[subset]
        return torf, False

    def _serial(self, sel, loc, tokens, subset=None):

        debug(sel, loc, '_serial', tokens)
        label = tokens.pop(0)
        sn2i = self._ag._getSN2I()
        if sn2i is None:
            return None, SelectionError(sel, loc, 'serial numbers are not set',
                                        ['serial'])
        torf = zeros(len(sn2i), bool)
        for token in tokens:
            try:
                remainder = token % 1.
            except TypeError:
                pass
            else:
                return None, SelectionError(sel, loc, '??? it is a number, serial')
                if remainder == 0:
                    try:
                        torf[token] = True
                    except IndexError:
                        pass
                else:
                    SelectionWarning(sel, loc, '{0} must be followed by '
                        'integers and/or number ranges'.format(repr(label)),
                        [label, token])
                continue

            try:
                pattern = token.pattern
            except AttributeError:
                pass
            else:
                ptrn = '"{0}"'.format(pattern)
                SelectionWarning(sel, loc, '{0} is a regular '
                    'expression and is not evaluated for {1}'
                    .format(ptrn, repr(label)), [label, ptrn])
                continue

            if token[0] == 'range':
                _, start, stop, step = token
                if step in OPERATORS:
                    if step == '<=':
                        stop += 1
                    step = 1
                if start % 1.0 != 0 or stop % 1.0 != 0 or step % 1.0 != 0:
                    SelectionWarning(sel, loc, '{0} number ranges should be '
                        'specified by integers'.format(repr(label)),
                        [label, str(start)])
                torf[start:stop:step] = True
                continue

            try:
                val = float(token)
            except TypeError:
                return None, SelectionError(sel, loc, '{0} must be '
                    'followed by integers and/or number ranges',
                    [label, token])
            else:
                if val % 1.0 == 0:
                    try:
                        torf[int(val)] = True
                    except IndexError:
                        pass
                else:
                    SelectionWarning(sel, loc, '{0} must be followed by '
                        'integers and/or number ranges'.format(repr(label)),
                        [label, token])

        indices = sn2i[torf]
        indices = indices[where(indices != -1)[0]]
        if len(indices) == 0:
            return self._getZeros(subset), False
        torf = zeros(self._ag.numAtoms(), bool)
        torf[indices] = True
        try:
            indices = self._atoms._getIndices()
        except AttributeError:
            pass
        else:
            torf = torf[indices]

        if subset is not None:
            torf = torf[subset]
        return torf, False

    def _resnum(self, sel, loc, tokens, subset=None):

        debug(sel, loc, '_resnum', tokens)
        label = tokens.pop(0)

        resnums, err = self._getData(sel, loc, 'resnum')
        if err: return None, err
        wicode = set([])
        values = ['resnum']
        for token in tokens:

            try:
                pattern = token.pattern
            except AttributeError:
                pass
            else:
                ptrn = '"{0}"'.format(pattern)
                SelectionWarning(sel, loc, '{0} is a regular expression and '
                                 'its use with {1} is not recommended'
                                 .format(ptrn, repr(label)), [label, ptrn])
                continue

            if token[0] == 'range':
                values.append(token)
                continue

            try:
                value = float(token)
            except (TypeError, ValueError):
                icode = token[-1]
                value = token[:-1]
                try:
                    value = int(value)
                except:
                    SelectionWarning(sel, loc, '{0} must be followed by '
                        'integers, number ranges, or integer and insertion '
                        'code combinations, e.g. {1}'
                        .format(repr(label), repr('10A')),
                            [label, token])
                else:
                    wicode.add((value, '' if icode == '_' else icode))
            else:
                if value != int(token):
                    SelectionWarning(sel, loc, '{0} must be followed by '
                        'integers and/or number ranges'.format(repr(label)),
                        [label, token])
                values.append(token)
        torf = None

        if len(values) > 1:
            torf, _ = self._generic(sel, loc, values, subset)

        if wicode:
            icode, err = self._getData(sel, loc, 'icode')
            if err: return None, err
            if subset is None:
                rnic = zip(resnums, icode) # PY3K: OK
            else:
                rnic = zip(resnums[subset], icode[subset]) # PY3K: OK

            if torf is None:
                torf = array([val in wicode for val in rnic], bool)
            else:
                torf = logical_or(torf, array([val in wicode for val in rnic],
                                               bool), torf)
        return torf, False

    def _sequence(self, sel, loc, tokens, subset=None):

        debug(sel, loc, '_sequence', tokens)
        label = tokens.pop(0)

        regexp = []
        for token in tokens:

            try:
                token.pattern
            except AttributeError:
                if not token.isalpha() or not token.isupper():
                    SelectionWarning(sel, loc, '{0} does not look like a '
                        'valid sequence'.format(repr(token)),
                        [label, token])
                try:
                    token = re_compile(token)
                except Exception as err:
                    SelectionWarning(sel, loc, '{0} could not be compiled '
                        'as a regular expression for sequence evaluation'
                        .format(repr(token)), [label, token])
                else:
                    regexp.append(token)
            else:
                regexp.append(token)

        if not regexp:
            return self._getZeros(subset), False

        calpha = self._atoms.calpha
        if calpha is None:
            return self._getZeros(subset), False

        matches = []
        for chain in iter(HierView(calpha)):
            sequence = chain.getSequence()
            indices = chain._getIndices()
            for re in regexp:
                for match in re.finditer(sequence):
                    matches.extend(indices[match.start():match.end()])

        if matches:
            torf = zeros(self._ag.numAtoms(), bool)
            torf[matches] = True
            if self._indices is not None:
                torf = torf[self._indices]
            torf = self._sameas(sel, loc, [('same', 'residue', 'as'), torf])[0]
            if subset is None:
                return torf, False
            else:
                return torf[subset], False
        else:
            return self._getZeros(subset), False

    def _getData(self, sel, loc, keyword):
        """Returns atomic data."""

        data = self._data.get(keyword)
        if data is not None:
            return data, False

        try:
            idx = XYZ2INDEX[keyword]
        except KeyError:
            pass
        else:
            data = self._getCoords()
            if data is not None:
                data = data[:,idx]
                self._data[keyword] = data
            return data, False

        if keyword == 'index':
            try:
                data = self._atoms._getIndices()
            except AttributeError:
                data = arange(self._atoms.numAtoms())
            self._data['index'] = data
            return data, False

        field = ATOMIC_FIELDS.get(FIELDS_SYNONYMS.get(keyword, keyword))
        if field is None:
            data = self._atoms._getData(keyword)
            if data is None:
                return None, SelectionError(sel, loc, '{0} is not a valid '
                                       'data label'.format(repr(keyword)))
            elif not isinstance(data, ndarray) and data.ndim == 1:
                return None, SelectionError(sel, loc, '{0} is not a 1d '
                                      'array'.format(repr(keyword)))
        else:
            try:
                data = getattr(self._atoms, '_get' + field.meth_pl)()
            except Exception as err:
                return None, SelectionError(sel, loc, str(err))
            if data is None:
                return None, SelectionError(sel, loc, '{0} is not set'
                                                       .format(repr(keyword)))
        self._data[keyword] = data
        return data, False

    def _getCoords(self):
        """Returns coordinates of atoms."""

        if self._coords is None:
            self._coords = self._atoms._getCoords()
        return self._coords

    def _getAnisous(self):
        """Returns anisotropic temperature factors of atoms."""

        if self._anisous is None:
            self._anisous = self._atoms._getAnisous()
        return self._anisous