Automatic methods for hate speech detection are still in their infancy, and mostly employ standard supervised text classification tools, which are essentially highly sophisticated statistical machines for finding correlations between textual patterns and the predicted label (hate speech or not). Two major methodological issues limit the applicability of such methods. The first is that supervised methods assume a fixed linguistic characterization of hate speech that can be approximated by a learning algorithm, and do not take into account that the definition of hate speech may be fluid, shaped by a rapidly changing policy and social context. The second is the tendency of such simple correlation machines to latch onto superficial patterns of the text, leading to grave errors. For example, recent studies have found that such statistical methods are unjustly biased against members of the very protected groups they are employed to monitor hate speech against. In my talk, I will outline a technical approach for addressing both issues, and present recent research efforts to implement it.
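The bias problem described above can be illustrated with a toy sketch (the corpus, the identity term "groupX", and the scoring function are all invented for illustration, not taken from any real system): when an identity term happens to co-occur mostly with hateful training examples, a purely correlational word-level model learns the term itself as a hate signal, and then flags benign sentences that merely mention the group.

```python
from collections import Counter

# Toy training corpus: the (hypothetical) identity term "groupX" appears
# only in the hateful examples, so word-label correlations conflate the
# group name with hatefulness.
train = [
    ("i hate groupX", 1),
    ("groupX should leave", 1),
    ("groupX people everywhere", 1),
    ("what a lovely day", 0),
    ("great game last night", 0),
    ("see you tomorrow", 0),
]

word_label = Counter()  # how often each word appears in a hateful example
word_count = Counter()  # how often each word appears overall
for text, label in train:
    for w in set(text.split()):
        word_count[w] += 1
        word_label[w] += label

def hate_score(text):
    """Average P(hate | word) over known words -- a crude correlation machine."""
    words = text.split()
    return sum(word_label[w] / word_count[w] for w in words if w in word_count) / len(words)

# A benign, self-referential sentence by a member of the targeted group
# scores well above neutral text, purely because it contains "groupX".
print(hate_score("i am proud to be groupX"))  # positive score despite benign content
print(hate_score("what a lovely day"))        # 0.0
```

Real classifiers are far more sophisticated than this word-averaging score, but studies of the bias phenomenon (e.g., on identity-term and dialect bias) show they can exhibit essentially the same failure mode at scale.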