MVA Script
scores, clusters = run_mva(X, labels=y)

We tested the script on a synthetic 100×10 dataset. The PCA scree plot (Fig. 1) showed that the first 3 components capture 82% of the variance. The LDA projection (Fig. 2) separated the two synthetic classes almost perfectly, as expected given that the classes were constructed with different means. Clustering on the unlabeled data suggested an optimal k of 3.
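The test setup is described only in outline, so the following is a minimal sketch of how such a dataset could be generated, not the data actually used. It assumes 100 samples and 10 features, two equally sized classes, and a mean shift applied to the first few features. The function name `make_synthetic` and the parameters `n_informative`, `shift`, and `seed` are hypothetical.

```python
import numpy as np

def make_synthetic(n_samples=100, n_features=10, n_informative=3, shift=3.0, seed=0):
    """Two Gaussian classes that differ only in the means of the first few features."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_samples // 2)      # assumes an even n_samples
    X = rng.normal(size=(y.size, n_features))  # unit-variance noise for both classes
    X[y == 1, :n_informative] += shift         # constructed difference in means
    return X, y

X, y = make_synthetic()
print(X.shape, np.bincount(y))
```

Arrays built this way can be passed directly to the call above as `run_mva(X, labels=y)`.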
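# Step 3: PCA
# Cap at 10 components; PCA cannot return more than min(n_samples, n_features)
pca = PCA(n_components=min(*data_scaled.shape, 10))
pca_scores = pca.fit_transform(data_scaled)
cum_var = np.cumsum(pca.explained_variance_ratio_)
# Smallest number of components whose cumulative variance reaches the threshold.
# np.argmax returns 0 when no entry is True, so fall back to all fitted components.
reached = cum_var >= variance_threshold
n_comp = int(np.argmax(reached)) + 1 if reached.any() else len(cum_var)
print(f"Optimal PCA components: {n_comp} (explained {cum_var[n_comp-1]:.2%})")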
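The component-selection rule in Step 3 can be isolated into a small helper to see how it behaves at the edges. One detail worth checking is that `np.argmax` on an all-False mask returns 0, so without a guard the rule would report a single component whenever the threshold is never reached. This is a sketch, not part of the original script: the name `n_components_for` is hypothetical, and the variance ratios are illustrative values, not results from the test run.

```python
import numpy as np

def n_components_for(explained_variance_ratio, threshold):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cum_var = np.cumsum(explained_variance_ratio)
    reached = cum_var >= threshold
    if not reached.any():
        # Threshold is never met by the fitted components: keep them all.
        return len(cum_var)
    return int(np.argmax(reached)) + 1

ratios = [0.5, 0.25, 0.125]             # illustrative values only
print(n_components_for(ratios, 0.75))   # threshold met at the second component
print(n_components_for(ratios, 0.90))   # never met, so all three are kept
```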
# Step 4: Plot scree
plt.figure(figsize=(8, 4))
plt.bar(range(1, len(pca.explained_variance_ratio_) + 1), pca.explained_variance_ratio_)
plt.step(range(1, len(cum_var) + 1), cum_var, where='mid', color='red')
plt.title('Scree Plot with Cumulative Variance')
plt.xlabel('Principal Component')
plt.ylabel('Variance Ratio')
plt.savefig('scree_plot.png')
# Step 5: LDA (if labels exist)
if labels is not None:
    labels = np.asarray(labels)  # the boolean masks below need an array
    lda = LDA(n_components=min(2, len(np.unique(labels)) - 1))
    lda_scores = lda.fit_transform(data_scaled, labels)
    print("LDA applied. Reduced shape:", lda_scores.shape)

    # LDA scatter plot
    plt.figure()
    for lab in np.unique(labels):
        subset = lda_scores[labels == lab]
        # With two classes LDA yields a single axis, so plot it against zero
        y_vals = subset[:, 1] if subset.shape[1] > 1 else np.zeros(len(subset))
        plt.scatter(subset[:, 0], y_vals, label=f'Class {lab}')
    plt.legend()
    plt.title('LDA Projection')
    plt.savefig('lda_plot.png')
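With the two classes used in the test, LDA can produce only one discriminant axis (at most one fewer than the number of classes), so the projected scores are one-dimensional. The sketch below illustrates this with a plain NumPy version of Fisher's two-class discriminant. It is a stand-in for illustration, not the `LDA` class the script calls, and the function name `fisher_direction` and the synthetic data are assumptions.

```python
import numpy as np

def fisher_direction(X, y):
    """Unit-length Fisher discriminant direction for exactly two classes."""
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch handles exactly two classes")
    X0, X1 = X[y == classes[0]], X[y == classes[1]]
    mean_diff = X1.mean(axis=0) - X0.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices
    C0 = X0 - X0.mean(axis=0)
    C1 = X1 - X1.mean(axis=0)
    Sw = C0.T @ C0 + C1.T @ C1
    w = np.linalg.solve(Sw, mean_diff)  # w is proportional to Sw^{-1} (mu1 - mu0)
    return w / np.linalg.norm(w)

# Hypothetical data: 100 samples, 10 features, class 1 shifted in every feature
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 10))
X[y == 1] += 3.0

w = fisher_direction(X, y)
scores = X @ w        # one score per sample: a single discriminant axis
print(scores.shape)
```

Because there is only one axis, any scatter plot of a two-class projection needs a placeholder for the second coordinate, or a histogram of the scores per class instead.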