### Transcription

Locally-biased analytics

You have BIG data and want to analyze a small part of it:
- Solution 1: Cut out the small part and use traditional methods. Challenge: cutting out may be difficult a priori.
- Solution 2: Develop locally-biased methods for data analysis. Challenge: most data-analysis tools (implicitly or explicitly) make strong local-global assumptions, e.g., spectral partitioning "wants" to find 50-50 clusters; recursive partitioning is of interest only if the recursion depth isn't too deep; eigenvectors optimize global objectives; etc.

Locally-biased analytics

- Locally-biased community identification: find a "community" around an exogenously-specified seed node.
- Locally-biased image segmentation: find a small tiger in the middle of a big picture.
- Locally-biased neural connectivity analysis: find neurons that are temporally correlated with a local stimulus.
- Locally-biased inference, semi-supervised learning, etc.: do machine learning with a "seed set" of ground-truth nodes, i.e., make predictions that "draw strength" from local information.

Global spectral methods DO work well

(1) Construct a graph from the data.
(2) Use the second eigenvalue/eigenvector of the Laplacian to do clustering, community detection, image segmentation, parallel computing, semi-supervised/transductive learning, etc.

Why is it useful?
- Connections with random walks and sparse cuts.
- Isoperimetric structure gives control on capacity/inference.
- Relatively easy to compute.
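The two-step recipe above can be sketched in a few lines of Python; the toy barbell graph and all variable names here are illustrative, not from the talk:

```python
import numpy as np

# Toy graph: two 4-node cliques joined by a single bridge edge,
# so the sparsest cut is the planted 50-50 split.
n = 8
A = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0  # bridge edge

D = np.diag(A.sum(axis=1))
L = D - A  # combinatorial graph Laplacian

# Second-smallest eigenvector (the Fiedler vector) of L.
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]

# Sweep cut: order vertices by their Fiedler value and split.
order = np.argsort(fiedler)
cut = set(order[:n // 2].tolist())
print(cut)  # one of the two cliques
```

On this graph the Fiedler vector is positive on one clique and negative on the other, so the sweep recovers the planted 50-50 cut, which is exactly the "global bias" the talk is pointing at.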

Global spectral methods DON'T work well

(1) The leading nontrivial eigenvalue/eigenvector are inherently global quantities.
(2) They may NOT be sensitive to local information:
- Sparse cuts may be poorly correlated with the second/all eigenvectors.
- An interesting local region may be hidden from global eigenvectors that are dominated by an exact orthogonality constraint.

QUESTION: Can we find a locally-biased analogue of the usual global eigenvectors that comes with the good properties of the global eigenvectors?
- Connections with random walks and sparse cuts.
- This gives control on capacity/inference.
- Relatively easy to compute.

Outline

- Locally-biased eigenvectors: a methodology to construct a locally-biased analogue of the leading nontrivial eigenvector of the graph Laplacian.
- Implicit regularization: in early-stopped iterations and teleported PageRank computations.
- Semi-supervised eigenvectors: extend locally-biased eigenvectors to compute multiple locally-biased eigenvectors, i.e., locally-biased SPSD kernels.
- Implicit regularization: in truncated diffusions and push-based approximations to PageRank; connections to strongly-local spectral methods and scalable computation.

Recall spectral graph partitioning

The basic optimization problem is a relaxation of the combinatorial conductance problem; it is solvable via an eigenvalue problem, and a sweep cut of the second eigenvector yields a cut of guaranteed (Cheeger) quality.
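The equations referred to on this slide are the standard ones; schematically (with L the graph Laplacian and D the degree matrix):

```latex
% Spectral relaxation of conductance minimization:
\min_{x \in \mathbb{R}^n} \; x^{T} L x
\quad \text{s.t.} \quad x^{T} D x = 1, \;\; x^{T} D \mathbf{1} = 0 .
% Solved by the generalized eigenproblem L v = \lambda D v; a sweep cut
% over the second eigenvector v_2 gives, by Cheeger's inequality,
% a cut of conductance at most \sqrt{2 \lambda_2}.
```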

Geometric correlation and generalized PageRank vectors

Given a cut T, define the vector s_T; this can be used to define a geometric notion of correlation between cuts.
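In the MOV setup these objects take, schematically, the following form (the normalization is an assumption here, not transcribed from the slide):

```latex
% Seed vector associated with a cut (T, \bar{T}), normalized so that
% s_T^T D s_T = 1 and s_T^T D \mathbf{1} = 0:
s_T \;\propto\; \frac{\mathbf{1}_T}{\mathrm{vol}(T)}
  \;-\; \frac{\mathbf{1}_{\bar{T}}}{\mathrm{vol}(\bar{T})} ,
% and the geometric correlation between two cuts T and T' is
\langle s_T, s_{T'} \rangle_D^2 \;=\; \bigl( s_T^{T} D \, s_{T'} \bigr)^2 .
```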

Local spectral partitioning ansatz
Mahoney, Orecchia, and Vishnoi (2010)

Primal program; dual program.

Interpretation of the primal: find a cut well-correlated with the seed vector s. If s is a single node, this relaxes the combinatorial problem.
Interpretation of the dual: embedding a combination of a scaled complete graph Kn and complete graphs on T and T̄ (K_T and K_T̄), where the latter encourage cuts near (T, T̄).

Main results (1 of 2)
Mahoney, Orecchia, and Vishnoi (2010)

Theorem: If x* is an optimal solution to LocalSpectral, it is a GPPR vector for parameter α, and it can be computed as the solution to a set of linear equations.

Proof:
(1) Relax the non-convex problem to a convex SDP.
(2) Strong duality holds for this SDP.
(3) The solution to the SDP is rank one (from complementary slackness).
(4) The rank-one solution is a GPPR vector.
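A minimal numerical sketch of the linear-equation characterization, assuming the MOV-style form x* ∝ (L − γD)⁺ D s with γ < λ₂(G); the graph, seed, and parameter value are illustrative:

```python
import numpy as np

def gppr(A, s, gamma):
    """Generalized personalized PageRank vector: solve (L - gamma*D) x = D s,
    then D-normalize. For gamma < 0 this is the standard PPR regime and the
    system is positive definite; gamma may range over (-inf, lambda_2(G))."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    x = np.linalg.solve(L - gamma * D, D @ s)
    return x / np.sqrt(x @ D @ x)  # D-normalize

# Two 3-node triangles joined by one edge; seed on node 0.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
s = np.zeros(6)
s[0] = 1.0
x = gppr(A, s, gamma=-0.5)
# The solution is positive and concentrates on the seed's triangle.
```

The point of the theorem is exactly this: no SDP solver is needed at run time, only one linear solve.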

Main results (2 of 2)
Mahoney, Orecchia, and Vishnoi (2010)

Theorem: If x* is an optimal solution to LocalSpectral(G, s, κ), one can find a cut of conductance ≤ √(8 λ(G,s,κ)) in time O(n log n) with a sweep cut of x*. (Upper bound, as usual, from sweep cut and Cheeger.)

Theorem: Let s be a seed vector and κ a correlation parameter. For all sets of nodes T with κ' := ⟨s, s_T⟩²_D, we have φ(T) ≥ λ(G,s,κ) if κ ≤ κ', and φ(T) ≥ (κ'/κ) λ(G,s,κ) if κ' ≤ κ. (Lower bound: a spectral version of flow-improvement algorithms.)

Illustration on small graphs

- Similar results if we do local random walks, truncated PageRank, and heat-kernel diffusions.
- The linear-equation formulation is more "powerful" than diffusions, i.e., it can access all parameter values α ∈ (−∞, λ₂(G)).

Illustration with general seeds

The seed vector doesn't need to correspond to cuts. It can be any vector on the nodes; e.g., one can find a cut "near" low-degree vertices with s_i = −(d_i − d_av), i ∈ [n].

New methods are useful more generally

Maji, Vishnoi, and Malik (2011) applied Mahoney, Orecchia, and Vishnoi (2010):
- Cannot find the tiger with global eigenvectors.
- Can find the tiger with the LocalSpectral method!

Outline

- Locally-biased eigenvectors: a methodology to construct a locally-biased analogue of the leading nontrivial eigenvector of the graph Laplacian.
- Implicit regularization: in early-stopped iterations and teleported PageRank computations.
- Semi-supervised eigenvectors: extend locally-biased eigenvectors to compute multiple locally-biased eigenvectors, i.e., locally-biased SPSD kernels.
- Implicit regularization: in truncated diffusions and push-based approximations to PageRank; connections to strongly-local spectral methods and scalable computation.

PageRank and implicit regularization

Recall the usual characterization of PPR; compare with our definition of GPPR.

Question: Can we formalize that PageRank is a regularized version of the leading nontrivial eigenvector of the Laplacian?
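Schematically, the two characterizations being compared are (with β the teleportation parameter; the exact normalizations are assumptions here):

```latex
% Personalized PageRank: the solution of a linear system with
% teleportation parameter \beta and seed distribution s:
\pi_{\beta,s} \;=\; \beta \bigl( I - (1-\beta)\, A D^{-1} \bigr)^{-1} s ,
% Generalized PPR (MOV): the solution of
x^{\ast} \;\propto\; (L - \gamma D)^{+} D\, s ,
\qquad \gamma \in (-\infty, \lambda_2(G)),
% where the PPR regime corresponds to \gamma < 0.
```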

Two versions of spectral partitioning

VP: the usual vector program. R-VP: a regularized version of the vector program.

Two versions of spectral partitioning

VP and its SDP relaxation; R-VP and its regularized SDP relaxation (R-SDP).

A simple theorem
Mahoney and Orecchia (2010)

Modification of the usual SDP form of spectral partitioning to include regularization (but on the matrix X, not the vector x).

Corollary

If F(X) = −log det(X) (i.e., log-determinant regularization), then this gives a scaled PageRank matrix, with teleportation parameter depending on η.

I.e., PageRank does two things:
- It approximately computes the Fiedler vector.
- It exactly computes a regularized version of the Fiedler vector, implicitly!

(Similarly, generalized entropy regularization is implicit in heat-kernel computations, and matrix p-norm regularization is implicit in the power iteration.)

Semi-supervised eigenvectors
Hansen and Mahoney (NIPS 2013, JMLR 2014)

Eigenvectors are inherently global quantities, and the leading ones may therefore fail at modeling relevant local structures.
- The generalized eigenvalue problem: its solution is given by the second-smallest eigenvector, and yields a "Normalized Cut".
- The locally-biased analogue of the second-smallest eigenvector: the optimal solution is a generalization of personalized PageRank and can be computed in nearly-linear time [MOV2012].
- The semi-supervised eigenvector generalization of [HM2013]: this objective incorporates a general orthogonality constraint, allowing us to compute a sequence of "localized eigenvectors".

Semi-supervised eigenvectors are efficient to compute and inherit many of the nice properties that characterize the global eigenvectors of a graph.
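The objective in the third box can be written, schematically, as (with κ the correlation parameter; the notation is assumed, not transcribed):

```latex
% k-th semi-supervised eigenvector (L Laplacian, D degrees,
% s the D-normalized seed vector, \kappa \in [0,1]):
\min_{x}\; x^{T} L x
\quad \text{s.t.}\quad
x^{T} D x = 1, \quad
(x^{T} D s)^{2} \ge \kappa, \quad
x^{T} D x_j = 0 \;\; \text{for } j < k .
% \kappa = 0 recovers the global generalized eigenvalue problem;
% \kappa = 1 returns the seed set itself.
```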

Semi-supervised eigenvectors
Hansen and Mahoney (NIPS 2013, JMLR 2014)

This interpolates between very localized solutions and the global eigenvectors of the graph Laplacian:
- For κ = 0, this is the usual global generalized eigenvalue problem.
- For κ = 1, this returns the local seed set.

(Slide labels: norm constraint, orthogonality constraint, locality constraint, seed vector, projection operator, leading solution, general solution; γ determines the locality of the solution.)

For γ < 0, one can compute the first semi-supervised eigenvector using local graph diffusions, i.e., personalized PageRank:
- Approximate the solution using the Push algorithm [ACL06].
- Implicit regularization characterizations by [MO10] and [GM14].

Semi-supervised eigenvectors

Small-world example: the eigenvectors having the smallest eigenvalues capture the slowest modes of variation. (Figure: global eigenvectors for varying probability of random edges.)

Semi-supervised eigenvectors

Small-world example: the eigenvectors having the smallest eigenvalues capture the slowest modes of variation. (Figure: semi-supervised eigenvectors at low and high correlation with the seed node.)

Semi-supervised eigenvectors
Hansen and Mahoney (NIPS 2013, JMLR 2014)

(Figure: one seed per class; ten labeled samples per class used in a downstream classifier; semi-supervised eigenvector scatter plots.)

Many "real" applications:
- A spatially-guided "searchlight" technique that, compared to [Kriegeskorte2006], accounts for spatially distributed signal representations.
- Large/small-scale structure in DNA SNP data in population genetics.
- Local structure in astronomical data.

Code is available at: https://sites.google.com/site/tokejansenhansen/

Local structure in SDSS spectra
Lawlor, Budavari, and Mahoney (2014)

Data: x ∈ R^3841, N ≈ 500k photon fluxes in 10 Å bins; preprocessing corrects for redshift and gappy regions; normalized by the median flux at certain wavelengths. (Figure: red galaxy vs. blue galaxy spectra.)

Local structure in SDSS spectra
Lawlor, Budavari, and Mahoney (2014)

(Figures: galaxies along the bridge and bridge spectra; ROC curves for classifying AGN spectra on the top four global eigenvectors (left) and the top four semi-supervised eigenvectors (right).)

Push Algorithm for PageRank

The Push Method!
- Proposed (in a variant) in ACL06 (also M0x, JW03) for personalized PageRank.
- Strongly related to Gauss-Seidel (see Gleich's talk at Simons for this).
- Derived to show improved runtime for balanced solvers.
- Applied to graphs with 10M nodes and 1B edges.
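A minimal sketch of a (non-lazy) push variant, assuming the standard ACL bookkeeping of an approximation vector p and a residual vector r; the code, graph, and parameter values are illustrative, not from the talk:

```python
import numpy as np
from collections import deque

def push_ppr(A, seed, alpha=0.15, eps=1e-4):
    """Approximate the PPR vector seeded at `seed` by repeatedly
    'pushing' residual mass at nodes whose residual exceeds eps*degree.
    Total mass p.sum() + r.sum() stays 1 (up to float error), and only
    nodes near the seed are ever touched, so the output is sparse."""
    d = A.sum(axis=1)
    p = np.zeros(len(d))   # sparse approximate PPR vector
    r = np.zeros(len(d))   # residual mass, initially all at the seed
    r[seed] = 1.0
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if r[u] < eps * d[u]:
            continue               # stale queue entry
        ru, r[u] = r[u], 0.0
        p[u] += alpha * ru         # keep an alpha fraction at u ...
        for v in np.nonzero(A[u])[0]:
            r[v] += (1 - alpha) * ru * A[u, v] / d[u]   # ... spread the rest
            if r[v] >= eps * d[v]:
                queue.append(v)
    return p, r

# Two 3-node triangles joined by an edge; seed at node 0.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
p, r = push_ppr(A, seed=0)
# p is concentrated on the seed's triangle.
```

Note the locality: the algorithm never needs a global matrix-vector product, only the adjacency rows of the (few) nodes it actually pushes.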

Why do we care about "push"?

1. Widely used for empirical studies of "communities".
2. Used for "fast PageRank" approximation.

It produces sparse approximations to PageRank! (Example: Newman's netscience graph, 379 vertices, 1828 nonzeros; the seed vector has a single one, and the push solution is "zero" on most of the nodes.) Why does the "push method" have such empirical utility?

How might an algorithm be good?

Two ways this algorithm might be good:
- Theorem 1 [ACL06]: The ACL push procedure returns a vector that is ε-worse than the exact PPR, and much faster.
- Theorem 2 [GM14]: The ACL push procedure returns a vector that exactly solves an L1-regularized version of the PPR objective.

I.e., the Push Method does two things:
- It approximately computes the PPR vector.
- It exactly computes a regularized version of the PPR vector, implicitly!

The s-t min-cut problem

(With B the unweighted incidence matrix and C the diagonal capacity matrix.) Consider L2 variants of this objective and show how the Push Method and other diffusion-based ML algorithms implicitly regularize.
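Schematically, with the notation above, the objective and its L2 variant are (a sketch; the exact constraints are assumptions here):

```latex
% s-t min-cut as an optimization problem over node potentials x
% (B the edge-node incidence matrix, C the diagonal capacity matrix):
\min_{x}\; \| C B x \|_{1}
\quad \text{s.t.} \quad x_s = 1,\; x_t = 0 .
% Replacing the 1-norm with a squared 2-norm, \| C^{1/2} B x \|_2^2,
% gives the "electrical flow" s-t problem, whose solution is a
% linear system in the (weighted) graph Laplacian.
```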

The localized cut graph
Gleich and Mahoney (2014)

Augment the graph with a source s attached to the seed set and a sink t attached to the rest, then solve the s-t min-cut on this localized cut graph.

s-t min-cut → PageRank
Gleich and Mahoney (2014)

Changing the objective from L1 to L2 changes the s-t min-cut into an "electrical flow" s-t min-cut, whose solution is a PageRank-style linear system.

Back to the push method
Gleich and Mahoney (2014)

In the regularized objective, one term reflects the need for normalization, and an L1 regularization term yields sparsity.

Conclusions

- Locally-biased and semi-supervised eigenvectors: local versions of the usual global eigenvectors that come with the good properties of global eigenvectors; strong algorithmic and statistical theory, and good initial results in several applications.
- Novel connections between approximate computation and implicit regularization.
- Special cases already scaled up to LARGE data.
