scanpy.tl.umap



scanpy.tl.umap(adata, *, min_dist=0.5, spread=1.0, n_components=2, maxiter=None, alpha=1.0, gamma=1.0, negative_sample_rate=5, init_pos='spectral', random_state=0, a=None, b=None, copy=False, method='umap', neighbors_key=None)

Embed the neighborhood graph using UMAP [McInnes18].

UMAP (Uniform Manifold Approximation and Projection) is a manifold learning technique suitable for visualizing high-dimensional data. Besides tending to be faster than tSNE, it optimizes the embedding such that it best reflects the topology of the data, which we represent throughout Scanpy using a neighborhood graph. tSNE, by contrast, optimizes the distribution of nearest-neighbor distances in the embedding such that these best match the distribution of distances in the high-dimensional space. We use the implementation of umap-learn [McInnes18]. For a few comparisons of UMAP with tSNE, see this preprint.
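To make the contrast with tSNE concrete, here is a small NumPy sketch of the two objectives on a toy graph. UMAP compares each graph weight w_ij with an embedding similarity v_ij = 1 / (1 + a·d_ij^(2b)) pair by pair (a fuzzy-set cross-entropy), whereas tSNE first normalizes similarities over all pairs and then compares the resulting distributions with a KL divergence. The function names are hypothetical, a = b = 1 is assumed for simplicity, and the tSNE input P is simply the normalized graph weights rather than tSNE's perplexity-calibrated affinities. umap-learn never evaluates the full sum shown here; it approximates it by sampling edges and negative samples.

```python
import numpy as np

def sq_dists(Y):
    # Squared Euclidean distances between all rows of the embedding Y
    diff = Y[:, None, :] - Y[None, :, :]
    return (diff ** 2).sum(axis=-1)

def umap_cross_entropy(W, Y, a=1.0, b=1.0, eps=1e-12):
    """Fuzzy-set cross-entropy between graph weights W and embedding similarities."""
    V = 1.0 / (1.0 + a * sq_dists(Y) ** b)  # 1 / (1 + a * d**(2b)), unnormalized
    iu = np.triu_indices(len(Y), k=1)        # every unordered pair once
    w, v = W[iu], np.clip(V[iu], eps, 1.0 - eps)
    attract = w * np.log(np.maximum(w, eps) / v)                       # edges: want v near 1
    repel = (1.0 - w) * np.log(np.maximum(1.0 - w, eps) / (1.0 - v))   # non-edges: want v near 0
    return float((attract + repel).sum())

def tsne_kl(P, Y, eps=1e-12):
    """KL(P || Q) with Student-t similarities normalized over all pairs, as in tSNE."""
    Q = 1.0 / (1.0 + sq_dists(Y))
    np.fill_diagonal(Q, 0.0)
    Q = Q / Q.sum()  # global normalization: the step UMAP does not take
    mask = P > 0
    return float((P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))).sum())

# A 4-node chain graph 0-1-2-3 and two 1-D layouts: one preserves the chain, one scrambles it
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
good = np.array([[0.0], [1.0], [2.0], [3.0]])
bad = np.array([[0.0], [3.0], [1.0], [2.0]])
print(umap_cross_entropy(W, good), umap_cross_entropy(W, bad))
print(tsne_kl(W / W.sum(), good), tsne_kl(W / W.sum(), bad))
```

Because UMAP's loss is a sum of independent per-pair terms, it can be optimized by sampling pairs, which is one reason it tends to scale better than tSNE.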

Parameters:
adata AnnData

Annotated data matrix.

min_dist float (default: 0.5)

The effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out. The default in the umap-learn package is 0.1.

spread float (default: 1.0)

The effective scale of embedded points. In combination with min_dist this determines how clustered/clumped the embedded points are.

n_components int (default: 2)

The number of dimensions of the embedding.

maxiter int | None (default: None)

The number of iterations (epochs) of the optimization. Called n_epochs in the original UMAP.

alpha float (default: 1.0)

The initial learning rate for the embedding optimization.

gamma float (default: 1.0)

Weighting applied to negative samples in low-dimensional embedding optimization. Values higher than one will result in greater weight being given to negative samples.

negative_sample_rate int (default: 5)

The number of negative edge/1-simplex samples to use per positive edge/1-simplex sample in optimizing the low-dimensional embedding.

init_pos Union[Literal['paga', 'spectral', 'random'], ndarray, None] (default: 'spectral')

How to initialize the low dimensional embedding. Called init in the original UMAP. Options are:

  • Any key for adata.obsm.

  • 'paga': positions from paga().

  • 'spectral': use a spectral embedding of the graph.

  • 'random': assign initial embedding positions at random.

  • A numpy array of initial embedding positions.

random_state Union[int, RandomState, None] (default: 0)

If int, random_state is the seed used by the random number generator; if RandomState or Generator, random_state is the random number generator itself; if None, the random number generator is the RandomState instance used by np.random.

a float | None (default: None)

More specific parameters controlling the embedding. If None, these values are set automatically as determined by min_dist and spread.

b float | None (default: None)

More specific parameters controlling the embedding. If None, these values are set automatically as determined by min_dist and spread.

copy bool (default: False)

Return a copy instead of writing to adata.

method Literal['umap', 'rapids'] (default: 'umap')

Chosen implementation.

'umap'

UMAP's simplicial set embedding.

'rapids'

GPU accelerated implementation.

Deprecated since version 1.10.0: Use rapids_singlecell.tl.umap() instead.

neighbors_key str | None (default: None)

If not specified, umap looks in .uns['neighbors'] for neighbors settings and in .obsp['connectivities'] for connectivities (the default storage places used by pp.neighbors). If specified, umap looks in .uns[neighbors_key] for neighbors settings and in .obsp[.uns[neighbors_key]['connectivities_key']] for connectivities.
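The a and b parameters are tied to min_dist and spread: when left as None, they are obtained by fitting the embedding similarity curve 1 / (1 + a·d^(2b)) to a target that stays at 1 up to min_dist and then decays as exp(-(d - min_dist) / spread). umap-learn performs this fit with scipy.optimize.curve_fit (in its find_ab_params helper). The sketch below is an assumption-laden stand-in that keeps the same least-squares objective but swaps in a coarse NumPy grid search so it has no SciPy dependency; fit_ab is a hypothetical name.

```python
import numpy as np

def fit_ab(min_dist, spread=1.0):
    """Least-squares fit of a, b in 1 / (1 + a * d**(2*b)) via a coarse grid search."""
    d = np.linspace(0.0, 3.0 * spread, 300)
    # Target: flat at 1 below min_dist, exponential decay with scale `spread` above it
    target = np.where(d < min_dist, 1.0, np.exp(-(d - min_dist) / spread))

    a_grid = np.linspace(0.01, 5.0, 400)[:, None]  # column vector broadcasts against d
    best_sse, best_a, best_b = np.inf, None, None
    for b in np.linspace(0.1, 3.0, 200):
        curve = 1.0 / (1.0 + a_grid * d ** (2.0 * b))  # shape (400, 300): one row per a
        sse = ((curve - target) ** 2).sum(axis=1)      # squared error for each candidate a
        i = int(np.argmin(sse))
        if sse[i] < best_sse:
            best_sse, best_a, best_b = sse[i], float(a_grid[i, 0]), float(b)
    return best_a, best_b

# Scanpy's default min_dist versus umap-learn's default
print(fit_ab(min_dist=0.5, spread=1.0))
print(fit_ab(min_dist=0.1, spread=1.0))
```

A larger min_dist keeps the target curve at 1 for longer, so the fitted a comes out smaller, which is how min_dist ends up spreading points more evenly.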
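The alpha, gamma, negative_sample_rate and maxiter parameters all act on the stochastic gradient descent that lays out the graph. The sketch below is a simplified rendering, modeled on umap-learn's Euclidean layout optimizer: each directed graph edge gives an attractive step, each attractive step is followed by negative_sample_rate repulsive steps against randomly drawn points, weighted by gamma, gradient components are clipped to [-4, 4], and the learning rate decays linearly from alpha over n_epochs (maxiter in Scanpy). Unlike umap-learn, it visits every edge once per epoch instead of sampling edges in proportion to their weights, and it simply skips coincident points. The function names are hypothetical.

```python
import numpy as np

def _clip(g):
    # Each gradient component is clipped to [-4, 4]
    return np.clip(g, -4.0, 4.0)

def sgd_epoch(Y, edges, a, b, lr, gamma, negative_sample_rate, rng):
    """One simplified epoch; edges are directed (head, tail) pairs; Y is updated in place."""
    n = Y.shape[0]
    for i, j in edges:
        # Attractive step: pull the head and tail of a graph edge together
        diff = Y[i] - Y[j]
        d2 = float(diff @ diff)
        if d2 > 0.0:
            coeff = -2.0 * a * b * d2 ** (b - 1.0) / (1.0 + a * d2 ** b)
            g = _clip(coeff * diff)
            Y[i] += lr * g
            Y[j] -= lr * g
        # Repulsive steps: push the head away from randomly drawn points
        for k in rng.integers(0, n, size=negative_sample_rate):
            diff = Y[i] - Y[k]
            d2 = float(diff @ diff)
            if d2 == 0.0:
                continue  # k == i or coincident points: skipped in this sketch
            coeff = 2.0 * gamma * b / ((0.001 + d2) * (1.0 + a * d2 ** b))
            Y[i] += lr * _clip(coeff * diff)

def optimize_layout(Y, edges, a, b, n_epochs=200, alpha=1.0, gamma=1.0,
                    negative_sample_rate=5, seed=0):
    rng = np.random.default_rng(seed)
    for epoch in range(n_epochs):
        lr = alpha * (1.0 - epoch / n_epochs)  # linear decay from the initial rate alpha
        sgd_epoch(Y, edges, a, b, lr, gamma, negative_sample_rate, rng)
    return Y

# Two triangles with no edges between them; a symmetric graph lists both directions
und = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
edges = und + [(j, i) for i, j in und]
Y = optimize_layout(np.random.default_rng(0).normal(size=(6, 2)), edges, a=1.0, b=1.0)
print(Y)
```

Raising gamma or negative_sample_rate strengthens the repulsive terms relative to the attractive ones, while a larger alpha lets early epochs take bigger steps before the decay.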
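With init_pos='spectral', the starting layout is a spectral embedding of the neighborhood graph. A minimal dense NumPy sketch, assuming a small, connected, symmetric weight matrix W: take the eigenvectors of the symmetric normalized Laplacian L = I - D^(-1/2) W D^(-1/2) with the smallest eigenvalues, skipping the trivial one. umap-learn uses a sparse eigensolver, lays out disconnected components separately and rescales the result, none of which is reproduced here; spectral_init is a hypothetical name.

```python
import numpy as np

def spectral_init(W, n_components=2):
    """Spectral embedding of a connected, symmetric weighted graph (dense sketch)."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Column 0 belongs to eigenvalue 0 and carries no layout information
    return evecs[:, 1 : n_components + 1]

# Two triangles joined by one weak edge (2-3)
W = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
print(spectral_init(W))
```

The first returned coordinate separates the two weakly connected groups, which is why a spectral start usually preserves global structure better than init_pos='random'.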
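The neighbors_key lookup is an indirection through .uns: the settings stored under that key name the .obsp entry holding the connectivities. The hypothetical helper below mimics the lookup with plain dictionaries standing in for adata.uns and adata.obsp, and strings standing in for the connectivity matrices; it is an illustration, not Scanpy's internal code. The key 'pca10' is likewise made up.

```python
def resolve_neighbors(uns, obsp, neighbors_key=None):
    """Hypothetical helper returning (neighbors settings, connectivities)."""
    if neighbors_key is None:
        # Default storage places written by pp.neighbors
        return uns["neighbors"], obsp["connectivities"]
    settings = uns[neighbors_key]
    # The connectivities are found indirectly, via the key name stored in the settings
    return settings, obsp[settings["connectivities_key"]]

uns = {
    "neighbors": {"connectivities_key": "connectivities"},
    "pca10": {"connectivities_key": "pca10_connectivities"},
}
obsp = {"connectivities": "default graph", "pca10_connectivities": "pca10 graph"}
print(resolve_neighbors(uns, obsp)[1])           # default graph
print(resolve_neighbors(uns, obsp, "pca10")[1])  # pca10 graph
```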

Return type:

AnnData | None

Returns:

Returns None if copy=False, else returns an AnnData object. Sets the following fields:

adata.obsm['X_umap'] numpy.ndarray (dtype float)

UMAP coordinates of data.

adata.uns['umap'] dict

UMAP parameters.