Experiment/Principal Components - Unsupervised Learning

brojesus111 · Nov 4, 2013

Homework Statement

A researcher collects expression measurements for 1,000 genes in 100 tissue samples. The data can be written as a 1,000 × 100 matrix, which we call X, in which each row represents a gene and each column a tissue sample. Each tissue sample was processed on a different day, and the columns of X are ordered so that the samples that were processed earliest are on the left, and the samples that were processed later are on the right. The tissue samples belong to two groups: control (C) and treatment (T). The C and T samples were processed in a random order across the days. The researcher wishes to determine whether each gene’s expression measurements differ between the treatment and control groups.

As a pre-analysis (before comparing T versus C), the researcher performs a principal component analysis of the data, and finds that the first principal component (a vector of length 100) has a strong linear trend from left to right, and explains 10% of the variation. The researcher now remembers that each patient sample was run on one of two machines, A and B, and machine A was used more often in the earlier times while B was used more often later. The researcher has a record of which sample was run on which machine.

(a) The researcher decides to replace the (i, j)th element of X with

x_ij − z_i1 φ_j1

where z_i1 is the ith score, and φ_j1 is the jth loading, for the first principal component. He will then perform a two-sample t-test on each gene in this new data set in order to determine whether its expression differs between the two conditions. Critique this idea, and suggest a better approach.

(b) Design and run a small simulation experiment to demonstrate the superiority of your idea.

The Attempt at a Solution

I'm just not sure what's going on in this problem. I'm pretty sure there's something wrong with how he decides to replace the (i,j)th element of X, but I'm not sure what. What is he accomplishing with his subtraction?

I'm assuming my simulation should be based on my approach from part (a), but does that mean I have to make up some fake data? Will any fake data work?

I appreciate any help.

brojesus111 · Nov 5, 2013

Anyone? :/
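
For anyone landing here later, a rough sketch of what a part (b) simulation could look like. The subtraction in part (a) removes the rank-one approximation of X given by the first principal component. That component only roughly tracks the machine effect and can also carry some treatment signal, so one commonly suggested alternative is to use the recorded machine labels directly as a covariate in a per-gene regression. Everything below is an assumption made for illustration and not part of the problem statement: the function names, the 200-gene size, the effect sizes, the noise model, and the choice to center each gene before the SVD. Which method comes out ahead, and by how much, depends on those settings, so treat this as a starting point rather than the intended answer.

```python
import numpy as np
from scipy import stats


def simulate(n_genes=200, n_samples=100, n_true=50, effect=1.0, seed=0):
    """Fake data: genes in rows, samples in columns ordered by processing day."""
    rng = np.random.default_rng(seed)
    # Treatment (1) and control (0) assigned in random order, balanced.
    treat = rng.permutation(np.repeat([0, 1], n_samples // 2))
    # Machine B (1) becomes more likely for later samples (assumed 10% -> 90%).
    machine = (rng.random(n_samples) < np.linspace(0.1, 0.9, n_samples)).astype(int)
    machine_shift = rng.normal(0.0, 1.0, size=n_genes)  # per-gene machine effect
    treat_shift = np.zeros(n_genes)
    treat_shift[:n_true] = effect  # only the first n_true genes truly differ
    X = (rng.normal(size=(n_genes, n_samples))
         + np.outer(machine_shift, machine)
         + np.outer(treat_shift, treat))
    return X, treat, machine, treat_shift != 0


def pvals_after_removing_pc1(X, treat):
    """Researcher's idea: subtract z_i1 * phi_j1, then a two-sample t-test per gene."""
    Xc = X - X.mean(axis=1, keepdims=True)  # center each gene (an assumption)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    X_adj = Xc - s[0] * np.outer(U[:, 0], Vt[0])  # remove rank-one PC1 part
    return stats.ttest_ind(X_adj[:, treat == 1], X_adj[:, treat == 0], axis=1).pvalue


def pvals_with_machine_covariate(X, treat, machine):
    """Alternative: regress each gene on treatment + machine, test the treatment term."""
    n = X.shape[1]
    D = np.column_stack([np.ones(n), treat, machine])  # intercept, T/C, A/B
    beta, *_ = np.linalg.lstsq(D, X.T, rcond=None)     # shape (3, n_genes)
    resid = X.T - D @ beta
    dof = n - D.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof
    se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[1, 1])
    return 2 * stats.t.sf(np.abs(beta[1] / se), dof)


def summarize(pvals, is_true, alpha=0.05):
    """Fraction of truly different genes detected, and of null genes flagged."""
    hits = pvals < alpha
    return {"power": hits[is_true].mean(),
            "false_positive_rate": hits[~is_true].mean()}


X, treat, machine, is_true = simulate()
print("remove PC1:", summarize(pvals_after_removing_pc1(X, treat), is_true))
print("covariate :", summarize(pvals_with_machine_covariate(X, treat, machine), is_true))
```

To make the comparison more convincing, it would probably help to repeat this over many seeds and to vary how strongly the treatment effect loads onto the first principal component, since that is where removing PC1 is most likely to throw away real signal.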
