Friday, July 22, 2022

Week 9: Experiment on Conditional Expectation with ML

 1 Conditional Expectation with ML


This week I mainly focused on running experiments on conditional expectation with ML. The purpose of the experiments is to compare the performance of the ML method against the originally implemented methods (the Discrete, J, and U methods, which utilize RKHS), and to compare different ML algorithms against each other.

1.1 Correctness of implementation


The first experiment tests the correctness of the implementation, i.e., whether it can calculate conditional expectations correctly.

The data model used and the experimental results are shown below:



The program tests the ACE of IVA on IVC, which should be -3. We can see from the results that the ML method computes it correctly and with the best accuracy.
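As a rough illustration of the kind of check being run here (the actual data model is the one shown above and is not reproduced; the generative equation below is only an assumption for illustration), one can estimate E[IVC | IVA = a] with a regressor and read off the effect as the change in the predicted conditional mean per unit change of IVA:

# Minimal sketch, not the project's actual data model: assume IVC depends
# linearly on IVA with coefficient -3, so the effect of IVA on IVC should be -3.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 10_000
iva = rng.normal(size=n)
ivc = -3.0 * iva + rng.normal(scale=1.0, size=n)   # hypothetical structural equation

# Estimate E[IVC | IVA] with an ML regressor.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(iva.reshape(-1, 1), ivc)

# Effect ~ difference of predicted conditional means per unit change of IVA.
a0, a1 = 0.0, 1.0
effect = model.predict([[a1]])[0] - model.predict([[a0]])[0]
print(f"estimated effect of IVA on IVC: {effect:.2f}")   # should be close to -3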

1.2 Performance when increasing dimensions


The hardest part of calculating conditional expectation is that when the number of dimensions grows, the curse of dimensionality kicks in due to the sparsity of the data. In high dimensions, the traditional discrete way of computing the expectation becomes infeasible, since there is not enough data to give an accurate estimate.
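To make the sparsity argument concrete, a quick back-of-the-envelope calculation (the bin count is an illustrative choice, not the project's actual discretization settings) shows how fast the cells of a discrete estimator empty out:

# Minimal sketch: with b bins per axis, a discrete conditional-expectation
# estimator needs data in b**d cells; samples per cell collapse quickly.
n_samples = 1000
bins_per_dim = 10
for d in (1, 2, 4, 6):
    cells = bins_per_dim ** d
    print(f"d={d}: {cells:>9} cells, ~{n_samples / cells:.4f} samples per cell")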

This set of experiments shows how the performance changes as the number of conditioning dimensions increases. The results are shown below:



Here we used Kernel Ridge Regression with Random Fourier Features as the machine learning method. From the results above, we can observe that the average R² (indicating how much of the total variance the method explains) decreases rapidly as the number of dimensions grows. When conditioning on 6 dimensions, the machine learning method seems unable to give correct results, while the J-prob method still performs well even with only 1000 samples.
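A minimal sketch of this setup, assuming an RBF kernel approximated with scikit-learn's RBFSampler followed by ridge regression (the project's own implementation, data, and hyperparameters may differ):

# Minimal sketch: kernel ridge regression approximated with Random Fourier
# Features (RBF kernel), scored by R^2. Data and hyperparameters are assumptions.
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
d = 6                                   # number of conditioning variables
X = rng.normal(size=(1000, d))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)

rff_krr = make_pipeline(
    RBFSampler(gamma=0.5, n_components=300, random_state=0),  # random Fourier features
    Ridge(alpha=1.0),                                          # linear ridge on the features
)
r2 = cross_val_score(rff_krr, X, y, cv=5, scoring="r2")
print(f"average R^2 over 5 folds: {r2.mean():.3f}")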

1.3 Experiments with different ML algorithms


Since there are many ML methods for regression, each with different properties, it is worth testing them to see whether they give similar results on this task. The experimental results are shown below:



It turns out that different ML methods vary a lot in performance. K-Nearest Neighbors, SVM, Random Forest, and Neural Network all outperform the J-Prob method. Random Forest in particular outperforms the other methods significantly with both 1,000 and 10,000 samples, and with very good runtimes.
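A minimal sketch of such a comparison, using scikit-learn versions of the regressors mentioned above (the data-generating process and hyperparameters here are assumptions, not the benchmark actually used in the experiments):

# Minimal sketch: compare several regressors on the same synthetic task
# by cross-validated R^2.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = X[:, 0] * X[:, 1] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=1000)

models = {
    "KNN": KNeighborsRegressor(n_neighbors=10),
    "SVM": SVR(C=10.0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Neural Network": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:>14}: mean R^2 = {r2.mean():.3f}")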

Therefore, additional experiments were run for the RF algorithm to tune its main hyperparameter: the number of trees. The results are shown below:


From the experiments we can observe that increasing the number of trees consistently improves performance, at the price of increased runtime.
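A minimal sketch of this kind of sweep (the grid of tree counts and the data are illustrative assumptions, not the experiment's actual settings):

# Minimal sketch: sweep the number of trees and record test R^2 and fit time.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 4))
y = X[:, 0] * X[:, 1] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=10_000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for n_trees in (10, 50, 100, 200, 500):
    rf = RandomForestRegressor(n_estimators=n_trees, n_jobs=-1, random_state=0)
    start = time.perf_counter()
    rf.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - start
    r2 = r2_score(y_te, rf.predict(X_te))
    print(f"{n_trees:>4} trees: R^2 = {r2:.3f}, fit time = {elapsed:.2f}s")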

2 Plan for Next Week


  • Extend the conditional expectation with ML to discrete cases.
