Python numpy.random.choice in C# with a non-uniform probability distribution

I am trying to write code that does the same thing as Python's numpy.random.choice.

The critical part is the p parameter, the probabilities:

The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.

Some test code:

    import numpy as np

    n = 5
    vocab_size = 3
    p = np.array([[0.65278451], [0.0868038725], [0.2604116175]])
    print('Sum: ', repr(sum(p)))
    for t in range(n):
        x = np.random.choice(range(vocab_size), p=p.ravel())
        print('x: %s x[x]: %s' % (x, p.ravel()[x]))
    print(p.ravel())

This gives an output of:

    Sum:  array([ 1.])
    x: 0 x[x]: 0.65278451
    x: 0 x[x]: 0.65278451
    x: 0 x[x]: 0.65278451
    x: 0 x[x]: 0.65278451
    x: 0 x[x]: 0.65278451
    [ 0.65278451  0.08680387  0.26041162]

Sometimes, that is; the output varies from run to run.

There is a distribution here; it is partly random, but there is structure to it as well.

I would like to implement this in C#, and to be honest I am not sure of an efficient way to do it.
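For context, the standard technique here (and what numpy does internally) is inverse-CDF sampling: build a running total of the probabilities, draw one uniform number, and return the first index whose cumulative total exceeds it. A minimal sketch in Python; the same few lines port directly to C# with System.Random, where Array.BinarySearch (taking the bitwise complement when the value is not found) plays the role of bisect_right:

```python
import bisect
import random

def weighted_choice(p, rng=random):
    """Return index i with probability p[i] (p holds non-negative weights)."""
    # Running totals, e.g. [0.65, 0.74, 1.0] for the p above.
    cdf = []
    total = 0.0
    for w in p:
        total += w
        cdf.append(total)
    u = rng.random() * total            # uniform draw in [0, total)
    return bisect.bisect_right(cdf, u)  # first index with cdf[i] > u

p = [0.65278451, 0.0868038725, 0.2604116175]
print(weighted_choice(p))
```

Building the cumulative array is O(n) once; each draw is then O(log n) via binary search, which is why repeated draws against the same p stay cheap.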

There was a good question about 4 years ago: Emulate Python's random.choice in .NET

Since that is quite old now and also does not really go into depth on non-uniform probability distributions, I thought I would ask for an elaboration.

Times have changed and code changes, so I think there may now be a better way to implement a .NET Random.Choice() method.

    public static int Choice(Vector sequence, int a = 0, int size = 0, bool replace = false)
    {
        // F(x) for a uniform distribution on [a, b]
        var Fx = 1 / (b - a);                    // b, xmax, xmin are not defined yet
        var p = (xmax - xmin) * Fx;              // p is computed but never used
        return random.Next(0, sequence.Length);  // uniform pick, ignores p
    }

Vector is just a double[].

How would I go about randomly choosing an index according to a probability vector like this:

    p = np.array([[0.01313731], [0.01315883], [0.01312814], [0.01316345],
                  [0.01316839], [0.01314225], [0.01317578], [0.01312916],
                  [0.01316344], [0.01317046], [0.01314973], [0.01314432],
                  [0.01317042], [0.01314846], [0.01315124], [0.01316694],
                  [0.0131816],  [0.01315033], [0.0131645],  [0.01314199],
                  [0.01315199], [0.01314431], [0.01314458], [0.01314999],
                  [0.01315409], [0.01316245], [0.01315008], [0.01314104],
                  [0.01315215], [0.01317024], [0.01315993], [0.01318789],
                  [0.0131677],  [0.01316761], [0.01315658], [0.01315902],
                  [0.01314266], [0.0131637],  [0.01315702], [0.01315776],
                  [0.01316194], [0.01316246], [0.01314769], [0.01315608],
                  [0.01315487], [0.01316117], [0.01315083], [0.01315836],
                  [0.0131665],  [0.01314706], [0.01314923], [0.01317971],
                  [0.01316373], [0.01314863], [0.01315498], [0.01315732],
                  [0.01318195], [0.01315505], [0.01315979], [0.01315992],
                  [0.01316072], [0.01314744], [0.0131638],  [0.01315642],
                  [0.01314933], [0.01316188], [0.01315458], [0.01315551],
                  [0.01317907], [0.01316296], [0.01317765], [0.01316863],
                  [0.01316804], [0.01314882], [0.01316548], [0.01315487]])

The output in Python is:

    Sum:  array([ 1.])
    x: 21 x[x]: 0.01314431
    x: 30 x[x]: 0.01315993
    x: 54 x[x]: 0.01315498
    x: 31 x[x]: 0.01318789
    x: 27 x[x]: 0.01314104

Sometimes, again; this too varies from run to run.

EDIT: After coffee and sleep, some more insight. The documentation explains:

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

    >>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
    array([2, 3, 0])
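For the replace=False case, one straightforward (if not the fastest) emulation is to draw one index at a time and zero out the weight of anything already drawn, which is essentially what the sklearn backport quoted further down does. A rough sketch, which again ports directly to C#:

```python
import bisect
import random

def weighted_sample_without_replacement(p, size, rng=random):
    """Draw `size` distinct indices, each with probability proportional
    to its remaining weight in p."""
    weights = list(p)
    picked = []
    for _ in range(size):
        # Rebuild the running totals over the weights still in the pool.
        cdf = []
        total = 0.0
        for w in weights:
            total += w
            cdf.append(total)
        i = bisect.bisect_right(cdf, rng.random() * total)
        picked.append(i)
        weights[i] = 0.0  # remove the drawn index from the pool
    return picked

print(weighted_sample_without_replacement([0.1, 0, 0.3, 0.6, 0], 3))
```

Zero-weight entries are never returned, because bisect_right can only land on an index whose cumulative total strictly increased.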

The p parameter introduces a non-uniform distribution over the sequence of choices.

The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.

So I suspect, given:

    static int[] a = new int[] {
         0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
        32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
        48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
        64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75 };

    static double[] p = new double[] {
        0.01313731, 0.01315883, 0.01312814, 0.01316345, 0.01316839, 0.01314225,
        0.01317578, 0.01312916, 0.01316344, 0.01317046, 0.01314973, 0.01314432,
        0.01317042, 0.01314846, 0.01315124, 0.01316694, 0.0131816,  0.01315033,
        0.0131645,  0.01314199, 0.01315199, 0.01314431, 0.01314458, 0.01314999,
        0.01315409, 0.01316245, 0.01315008, 0.01314104, 0.01315215, 0.01317024,
        0.01315993, 0.01318789, 0.0131677,  0.01316761, 0.01315658, 0.01315902,
        0.01314266, 0.0131637,  0.01315702, 0.01315776, 0.01316194, 0.01316246,
        0.01314769, 0.01315608, 0.01315487, 0.01316117, 0.01315083, 0.01315836,
        0.0131665,  0.01314706, 0.01314923, 0.01317971, 0.01316373, 0.01314863,
        0.01315498, 0.01315732, 0.01318195, 0.01315505, 0.01315979, 0.01315992,
        0.01316072, 0.01314744, 0.0131638,  0.01315642, 0.01314933, 0.01316188,
        0.01315458, 0.01315551, 0.01317907, 0.01316296, 0.01317765, 0.01316863,
        0.01316804, 0.01314882, 0.01316548, 0.01315487 };

How would I sample from this distribution efficiently?
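When many draws come from the same fixed p, the usual answer for efficiency is the alias method (Walker's method, in Vose's variant): O(n) table construction once, then O(1) per draw with just two random numbers and no search at all. A sketch in Python whose logic ports to C# unchanged:

```python
import random

def build_alias(p):
    """Vose's alias method: preprocess weights into (prob, alias) tables."""
    n = len(p)
    total = sum(p)
    scaled = [w * n / total for w in p]   # mean of the scaled weights is 1.0
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, w in enumerate(scaled) if w < 1.0]
    large = [i for i, w in enumerate(scaled) if w >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l  # s keeps scaled[s], borrows the rest from l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:               # leftovers equal 1.0 up to rounding
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias, rng=random):
    """O(1) weighted draw: pick a column, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

The tables can be built once per p and reused for every draw; for 76-entry vectors like the ones above that is what makes repeated sampling cheap.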

EDIT:

While the above p parameter does not have a clear distribution:

(image: plot of the p values above)

this p parameter does:

    p = np.array([[3.09571694e-03], [6.62372261e-04], [2.52917874e-04], [6.93371978e-04],
                  [2.22301291e-04], [3.53796717e-02], [2.36204398e-04], [2.41100042e-04],
                  [1.59093166e-02], [5.17099025e-04], [2.72037896e-04], [1.29918769e-03],
                  [2.68077696e-02], [5.68696611e-04], [5.32142704e-04], [5.88432463e-05],
                  [2.53700138e-02], [2.51216588e-03], [4.72895541e-04], [4.20276848e-03],
                  [5.65701874e-05], [1.84972048e-03], [8.46515331e-03], [8.02505743e-02],
                  [5.34274983e-04], [5.18868535e-04], [2.22580377e-04], [2.50133462e-02],
                  [3.70997917e-02], [5.84941482e-05], [6.49978323e-04], [4.18675536e-01],
                  [6.16371962e-02], [3.82260752e-04], [6.09901544e-04], [2.54540201e-03],
                  [2.46758824e-04], [4.13621365e-04], [5.23495532e-04], [6.40675685e-03],
                  [1.14165332e-03], [1.89148994e-04], [8.41715724e-04], [8.65699032e-04],
                  [6.71368283e-04], [2.14908596e-03], [5.80679210e-02], [1.11176616e-02],
                  [6.58134137e-05], [2.38992622e-02], [2.91388753e-04], [1.93989753e-03],
                  [1.82157325e-03], [3.33691627e-03], [5.69157244e-03], [1.11033592e-04],
                  [2.42448034e-04], [8.42765356e-05], [1.31656056e-02], [1.68779684e-02],
                  [2.72298244e-02], [8.19056613e-04], [1.14640462e-02], [6.21846308e-05],
                  [9.24618073e-04], [3.63659515e-02], [7.17286486e-05], [6.24008652e-04],
                  [2.59900890e-03], [1.57848651e-04], [5.71378707e-05], [7.62828929e-04],
                  [2.91648042e-04], [1.67612579e-04], [1.65455262e-04], [1.01981563e-02]])

(image: plot of these p values)

Something like a Gaussian distribution with a leftward skew. This video from PoyserMath is excellent: Stats: Finding Probability Using a Normal Distribution Table. It explains why p must sum to 1.0.
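That sum-to-1.0 requirement is also the first thing a port has to handle: the backport below raises ValueError("probabilities do not sum to 1") otherwise, so arbitrary non-negative weights need to be normalized first. For example:

```python
w = [3.0, 1.0, 1.0]          # arbitrary non-negative weights
total = sum(w)
p = [x / total for x in w]   # valid probability vector
print(p)                     # [0.6, 0.2, 0.2]
```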

EDIT 12.04.17: I finally found the Python file behind this!

    # Author: Hamzeh Alsalhi <ha258@cornell.edu>
    #
    # License: BSD 3 clause
    from __future__ import division
    import numpy as np
    import scipy.sparse as sp
    import operator
    import array

    from sklearn.utils import check_random_state
    from sklearn.utils.fixes import astype
    from ._random import sample_without_replacement

    __all__ = ['sample_without_replacement', 'choice']


    # This is a backport of np.random.choice from numpy 1.7
    # The function can be removed when we bump the requirements to >=1.7
    def choice(a, size=None, replace=True, p=None, random_state=None):
        """
        choice(a, size=None, replace=True, p=None)

        Generates a random sample from a given 1-D array

        .. versionadded:: 1.7.0

        Parameters
        ----------
        a : 1-D array-like or int
            If an ndarray, a random sample is generated from its elements.
            If an int, the random sample is generated as if a was np.arange(n)

        size : int or tuple of ints, optional
            Output shape. Default is None, in which case a single value is
            returned.

        replace : boolean, optional
            Whether the sample is with or without replacement.

        p : 1-D array-like, optional
            The probabilities associated with each entry in a.
            If not given the sample assumes a uniform distribution over all
            entries in a.

        random_state : int, RandomState instance or None, optional (default=None)
            If int, random_state is the seed used by the random number generator;
            If RandomState instance, random_state is the random number generator;
            If None, the random number generator is the RandomState instance
            used by `np.random`.

        Returns
        -------
        samples : 1-D ndarray, shape (size,)
            The generated random samples

        Raises
        ------
        ValueError
            If a is an int and less than zero, if a or p are not 1-dimensional,
            if a is an array-like of size 0, if p is not a vector of
            probabilities, if a and p have different lengths, or if
            replace=False and the sample size is greater than the population
            size

        See Also
        --------
        randint, shuffle, permutation

        Examples
        --------
        Generate a uniform random sample from np.arange(5) of size 3:

        >>> np.random.choice(5, 3)  # doctest: +SKIP
        array([0, 3, 4])
        >>> # This is equivalent to np.random.randint(0,5,3)

        Generate a non-uniform random sample from np.arange(5) of size 3:

        >>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])  # doctest: +SKIP
        array([3, 3, 0])

        Generate a uniform random sample from np.arange(5) of size 3 without
        replacement:

        >>> np.random.choice(5, 3, replace=False)  # doctest: +SKIP
        array([3,1,0])
        >>> # This is equivalent to np.random.shuffle(np.arange(5))[:3]

        Generate a non-uniform random sample from np.arange(5) of size 3
        without replacement:

        >>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
        ... # doctest: +SKIP
        array([2, 3, 0])

        Any of the above can be repeated with an arbitrary array-like instead
        of just integers. For instance:

        >>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
        >>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
        ... # doctest: +SKIP
        array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], dtype='|S11')

        """
        random_state = check_random_state(random_state)

        # Format and Verify input
        a = np.array(a, copy=False)
        if a.ndim == 0:
            try:
                # __index__ must return an integer by python rules.
                pop_size = operator.index(a.item())
            except TypeError:
                raise ValueError("a must be 1-dimensional or an integer")
            if pop_size <= 0:
                raise ValueError("a must be greater than 0")
        elif a.ndim != 1:
            raise ValueError("a must be 1-dimensional")
        else:
            pop_size = a.shape[0]
            if pop_size is 0:
                raise ValueError("a must be non-empty")

        if p is not None:
            p = np.array(p, dtype=np.double, ndmin=1, copy=False)
            if p.ndim != 1:
                raise ValueError("p must be 1-dimensional")
            if p.size != pop_size:
                raise ValueError("a and p must have same size")
            if np.any(p < 0):
                raise ValueError("probabilities are not non-negative")
            if not np.allclose(p.sum(), 1):
                raise ValueError("probabilities do not sum to 1")

        shape = size
        if shape is not None:
            size = np.prod(shape, dtype=np.intp)
        else:
            size = 1

        # Actual sampling
        if replace:
            if p is not None:
                cdf = p.cumsum()
                cdf /= cdf[-1]
                uniform_samples = random_state.random_sample(shape)
                idx = cdf.searchsorted(uniform_samples, side='right')
                # searchsorted returns a scalar
                idx = np.array(idx, copy=False)
            else:
                idx = random_state.randint(0, pop_size, size=shape)
        else:
            if size > pop_size:
                raise ValueError("Cannot take a larger sample than "
                                 "population when 'replace=False'")

            if p is not None:
                if np.sum(p > 0) < size:
                    raise ValueError("Fewer non-zero entries in p than size")
                n_uniq = 0
                p = p.copy()
                found = np.zeros(shape, dtype=np.int)
                flat_found = found.ravel()
                while n_uniq < size:
                    x = random_state.rand(size - n_uniq)
                    if n_uniq > 0:
                        p[flat_found[0:n_uniq]] = 0
                    cdf = np.cumsum(p)
                    cdf /= cdf[-1]
                    new = cdf.searchsorted(x, side='right')
                    _, unique_indices = np.unique(new, return_index=True)
                    unique_indices.sort()
                    new = new.take(unique_indices)
                    flat_found[n_uniq:n_uniq + new.size] = new
                    n_uniq += new.size
                idx = found
            else:
                idx = random_state.permutation(pop_size)[:size]

        if shape is not None:
            idx.shape = shape

        if shape is None and isinstance(idx, np.ndarray):
            # In most cases a scalar will have been made an array
            idx = idx.item(0)

        # Use samples as indices for a if a is array-like
        if a.ndim == 0:
            return idx

        if shape is not None and idx.ndim == 0:
            # If size == () then the user requested a 0-d array as opposed to
            # a scalar object when size is None. However a[idx] is always a
            # scalar and not an array. So this makes sure the result is an
            # array, taking into account that np.array(item) may not work
            # for object arrays.
            res = np.empty((), dtype=a.dtype)
            res[()] = a[idx]
            return res

        return a[idx]


    def random_choice_csc(n_samples, classes, class_probability=None,
                          random_state=None):
        """Generate a sparse random matrix given column class distributions

        Parameters
        ----------
        n_samples : int,
            Number of samples to draw in each column.

        classes : list of size n_outputs of arrays of size (n_classes,)
            List of classes for each column.

        class_probability : list of size n_outputs of arrays of size (n_classes,)
            Optional (default=None). Class distribution of each column. If
            None the uniform distribution is assumed.

        random_state : int, RandomState instance or None, optional (default=None)
            If int, random_state is the seed used by the random number generator;
            If RandomState instance, random_state is the random number generator;
            If None, the random number generator is the RandomState instance
            used by `np.random`.

        Returns
        -------
        random_matrix : sparse csc matrix of size (n_samples, n_outputs)

        """
        data = array.array('i')
        indices = array.array('i')
        indptr = array.array('i', [0])

        for j in range(len(classes)):
            classes[j] = np.asarray(classes[j])
            if classes[j].dtype.kind != 'i':
                raise ValueError("class dtype %s is not supported" %
                                 classes[j].dtype)
            classes[j] = astype(classes[j], np.int64, copy=False)

            # use uniform distribution if no class_probability is given
            if class_probability is None:
                class_prob_j = np.empty(shape=classes[j].shape[0])
                class_prob_j.fill(1 / classes[j].shape[0])
            else:
                class_prob_j = np.asarray(class_probability[j])

            if np.sum(class_prob_j) != 1.0:
                raise ValueError("Probability array at index {0} does not sum to "
                                 "one".format(j))

            if class_prob_j.shape[0] != classes[j].shape[0]:
                raise ValueError("classes[{0}] (length {1}) and "
                                 "class_probability[{0}] (length {2}) have "
                                 "different length.".format(j,
                                                            classes[j].shape[0],
                                                            class_prob_j.shape[0]))

            # If 0 is not present in the classes insert it with a probability 0.0
            if 0 not in classes[j]:
                classes[j] = np.insert(classes[j], 0, 0)
                class_prob_j = np.insert(class_prob_j, 0, 0.0)

            # If there are nonzero classes choose randomly using class_probability
            rng = check_random_state(random_state)
            if classes[j].shape[0] > 1:
                p_nonzero = 1 - class_prob_j[classes[j] == 0]
                nnz = int(n_samples * p_nonzero)
                ind_sample = sample_without_replacement(n_population=n_samples,
                                                        n_samples=nnz,
                                                        random_state=random_state)
                indices.extend(ind_sample)

                # Normalize probabilites for the nonzero elements
                classes_j_nonzero = classes[j] != 0
                class_probability_nz = class_prob_j[classes_j_nonzero]
                class_probability_nz_norm = (class_probability_nz /
                                             np.sum(class_probability_nz))
                classes_ind = np.searchsorted(class_probability_nz_norm.cumsum(),
                                              rng.rand(nnz))
                data.extend(classes[j][classes_j_nonzero][classes_ind])
            indptr.append(len(indices))

        return sp.csc_matrix((data, indices, indptr),
                             (n_samples, len(classes)),
                             dtype=int)
