Accurate prediction of fault-prone modules during software development enables effective discovery and identification of defects. Such prediction models are especially valuable for large-scale systems, where verification experts need to focus their attention and resources on problem areas in the system under development. This chapter presents a methodology for predicting fault-prone modules using a modified random forests algorithm. Random forests improve classification accuracy by growing an ensemble of trees and letting them vote on the classification decision. We applied the methodology to five NASA public domain defect datasets. These datasets vary in size, but all typically contain only a small number of defect samples. If maximizing overall accuracy is the goal, then learning from such imbalanced data usually results in a classifier biased toward the majority (non-defective) class. To obtain better prediction of fault-proneness, two strategies are investigated: a proper sampling technique for constructing the tree classifiers, and threshold adjustment for determining the “winning” class. Both are found to be effective in accurately predicting fault-prone modules. In addition, the chapter presents a thorough and statistically sound comparison of these methods against many other classifiers frequently used in the literature.
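The two strategies named above can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the abstract does not specify the sampling scheme, so a class-balanced bootstrap is assumed here as one plausible reading, and the function names `balanced_bootstrap` and `vote_with_cutoff`, the label encoding (1 = fault-prone, 0 = not fault-prone), and the cutoff values are all hypothetical.

```python
import random


def balanced_bootstrap(labels, rng):
    """Draw bootstrap indices with equal counts per class (assumed scheme).

    labels: list of 0/1 class labels, 1 = fault-prone (assumed encoding).
    Each class is sampled with replacement, n times, where n is the size
    of the smaller class, so every tree would see a balanced sample.
    """
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(minority), len(majority))
    return ([rng.choice(minority) for _ in range(n)]
            + [rng.choice(majority) for _ in range(n)])


def vote_with_cutoff(tree_votes, cutoff=0.5):
    """Return 1 when the fraction of trees voting 1 reaches the cutoff.

    cutoff=0.5 corresponds to plain majority voting; a lower cutoff lets
    the fault-prone class "win" with fewer votes.
    """
    fraction = sum(tree_votes) / len(tree_votes)
    return 1 if fraction >= cutoff else 0


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [0] * 90 + [1] * 10          # imbalanced toy data
    sample = balanced_bootstrap(labels, rng)
    print(len(sample), sum(labels[i] for i in sample))

    votes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # 3 of 10 trees vote fault-prone
    print(vote_with_cutoff(votes), vote_with_cutoff(votes, cutoff=0.25))
```

With majority voting the toy module above is labeled not fault-prone, while a cutoff of 0.25 flags it. Lowering the cutoff trades more false alarms for fewer missed defects, which is the kind of trade-off threshold adjustment is meant to control.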