1. Introduction
Water pollution is a serious problem that threatens seas and inland waters. Industrial and household waste discharged into rivers and streams disrupts the balance of the ecosystem and, by degrading water quality, creates significant public health problems[1]. Surface and dam waters are particularly sensitive to pollution; because they cannot purify themselves, they pose even greater health risks than other water sources[2]. The quality of surface water is strongly affected both by natural processes driven by hydrological, geological, and climatic factors and by anthropogenic impacts (agricultural, urban, and industrial discharges)[3-4]. Rigorous environmental monitoring of changes in pollution level is necessary to ensure the safety of this ecosystem. Of all the parameters needed to determine the state of surface water, turbidity can be considered one of the most important. High values of this parameter normally reflect high values of other pollution-related parameters such as chemical oxygen demand, total suspended solids, nitrate, ammonium and sulphate[5]. The measurement of turbidity is an effective means of determining the optical quality of water; its magnitude is indicative of probable water pollution, which could be hazardous to human health[6]. Furthermore, high levels of turbidity during the treatment of raw water can limit the effectiveness of the filtration and chlorination processes designed to remove dangerous bacteria and parasites such as Cryptosporidium[7].
Water quality monitoring generates complex, high-dimensional data that are generally analyzed and evaluated with statistical techniques[8]. Techniques such as cluster analysis (CA), factor analysis (FA), discriminant analysis (DA), analysis of variance (ANOVA) and the water quality index (WQI) have proven very helpful in understanding spatial and temporal variations in water quality data. Other approaches, including multiple linear regression (MLR), principal component analysis (PCA), artificial neural networks (ANNs), multivariate receptor models (MRMs) and several simulation and forecasting methods, have also been applied successfully in recent studies[9-14]. These methods have been shown to reduce data dimensions and to highlight the significant variables that explain changes in water quality. They also make it possible to assess the correlations among variables and to develop predictive models for selected ones.
PCA is a dimensionality reduction technique that retains most of the useful information in a dataset while reducing its dimensions. According to the literature, PCA has been the most frequently used method for water quality assessment over the past thirty years[15]. Its benefit is that it allows the results to be linked to environmental factors, processes and contamination sources in water ecosystems[15]. It has been applied to evaluate spatial and temporal variations in surface water and groundwater quality[16-18]. PCA was also used to identify the principal sources of pollution in the Xin’anjiang River (China); the results revealed that nutrient and organic pollutants were the principal factors affecting the water quality of that river[19]. Additionally, PCA has been successfully applied to optimize water quality monitoring networks[20]. In some cases PCA has been combined with other statistical tools for data analysis, and the combined models proved effective and robust in assessing, monitoring and predicting water quality[21,22]. Multiple linear regression is a statistical tool that establishes linear relationships between a response variable and several explanatory variables[23]. Used for predictive purposes, MLR in combination with PCA has proven effective in identifying the most significant parameters contributing to the variation in water quality[23,24].
In Algeria, a sharp population increase has caused a rapid expansion of agricultural land use and industrial development[25]. For these reasons, the Algerian authorities implemented a major plan for the construction of dams and reservoirs to overcome the water deficit. The first signs of water pollution appeared as soon as the work was completed. The wadis (streams), which are the dam’s main source of water supply, may also be the main cause of pollution[26]. The present work was carried out in this context. Its objective is to demonstrate, by applying statistical methods, the importance of monitoring turbidity (Turb) as an indicator of surface water pollution. The first technique used is principal component analysis (PCA), which allows us to extract the elements correlated with Turb and the potential sources of pollution. Multiple linear regression (MLR) was then applied to predict turbidity. MLR models have been successfully employed to study the behavior of natural systems and have demonstrated high performance and accuracy.
2. Materials and methods
2.1. Study area
The Cheurfa dam (35°23'29''N/0°16'22''W), currently named Cheurfa II, is one of the most important dams in North-West Algeria and is located in the large Macta watershed. The dam regulates the waters of the Mabtouh Wadi (35°21'20''N/0°19'1''W) (Figure 1), which is the extension of the Mekerra Wadi (35°12'05''N/0°36'18''W). Thus, the Cheurfa dam is mainly fed by the Mekerra Wadi. It was built upstream of the old Cheurfas dam (Cheurfa I) and was commissioned in 1992. The theoretical storage capacity of the dam is 83 hm3, with an annual regulated volume of 45 hm3, of which 20 hm3 are allocated to irrigation[27]. It supplies drinking water to the densely populated urban areas of Ain Adden, Boujebha El Borj, Oued Mabtouh, Chorfa and Douar Rehailia, as well as to the industrial zone of Sig[28].
Climatically, the watershed of the Cheurfa dam is subject to a semi-arid climate with irregular rainfall, characterized by intense autumn showers that cause major floods. The monthly average temperature is around 27.21 °C, with a cold winter (average January temperature of about 2.45 °C) and a hot, dry summer (36.12 °C in July)[27]. The average annual rainfall during the analyzed period was 230.34 ± 87.816 mm; it varied between a minimum of 125.3 mm/year and a maximum of 368.9 mm/year. The highest precipitation was recorded in January 2016 (147.9 mm). The main characteristics of the dam are summarized in Table 1.
The Cheurfa dam is affected by various sources of pollution. Significant quantities of wastewater are discharged into the Oued Mekerra-Mebtouh, and approximately 8000 m3/year reach the Cheurfa dam[29]. Other sources are also involved: pollution of agricultural origin (mainly poultry farming), which accounts for 1.68 T/d; urban pollution from the surrounding urban areas; and industrial pollution, estimated at 1542 m3/d of discharges[29]. Moreover, the main industrial activities in the Mekerra basin are located in the northwestern part of Sidi Bel Abbes city, an area known for its large dairy production and food processing units. These units discharge wastewater into the Oued El Maleh (a tributary of the Oued Mekerra) without any prior treatment, contributing to the pollution of the Cheurfa dam[30].
2.2. Sampling and analytical methods
To evaluate the effect of anthropogenic pollution on water quality and to show the importance of measuring turbidity in surface water analysis, raw water was sampled monthly from 2014 to 2018 and analyzed for twelve (12) physico-chemical variables: temperature (T, °C), pH, electrical conductivity (EC), turbidity (Turb), dissolved oxygen (DO), ammonium (N-NH4+), nitrate (N-NO3-), nitrite (N-NO2-), orthophosphate (P-PO43-), total suspended solids (TSS), biochemical oxygen demand (BOD5) and chemical oxygen demand (COD). The water samples were collected in one-liter polyethylene bottles at a depth of 0.50 m and at 3 m from the edge of the dam's dike, according to Rodier et al.[31], and then transported to the laboratory in a cooler to maintain the temperature at 4 °C.
Water temperature, pH, electrical conductivity and dissolved oxygen were measured in situ using a mercury thermometer, an OHARU-ST10 pH meter, a HANNA conductivity meter and a HANNA oximeter, respectively. Turbidity, in nephelometric turbidity units (NTU), was measured with a portable AL450T-IR turbidimeter. The remaining parameters were analyzed in the laboratory using the standard methods for water and wastewater. TSS was determined by the filtration method (NFT90-105), and BOD5 (mg/L O2) was measured using a manometric method. N-NO3- (mg/L), N-NO2- (mg/L) and N-NH4+ (mg/L) were determined by spectrophotometric methods: ISO 7890-3, NFT90-013 and NFT90-015, respectively. P-PO43- (mg/L) and COD (mg/L O2) were analyzed by colorimetry, using the molybdate method (DR/820) and the Manganese III method (8048/10067), respectively.
2.3. Statistical analysis
Chemometric techniques are very useful for describing the many variables of an analytical system and for determining possible relationships between them. Interpreting the water quality status of an aquatic system is difficult and complicated. Principal component analysis (PCA) is one of the most important methods used to reduce the dimensionality of a data matrix while retaining most of the original information[15-16,32-33] and to better assess the effect of human activities on water quality. The data matrix used here contains 12 variables (the parameters analyzed: T, pH, EC, Turb, TSS, DO, COD, BOD5, NO3-, NO2-, NH4+ and PO43-) and 60 samples (individuals). Analyses were carried out with the software “R 3.6.1”, available from: (https://cran.r-project.org/bin/windows/base/old/3.6.1/).
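As a minimal sketch of the PCA workflow described above (standardize each variable, build the correlation matrix, extract the leading components and their share of the total variance), the steps can be written in plain Python. This is not the authors' R 3.6.1 script: the toy data, the function names and the power-iteration eigen-solver are illustrative assumptions chosen only to keep the example self-contained.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of a samples-by-variables table."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    # Standardize so that variables with different units weigh equally.
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_components(corr, k, iters=500, seed=0):
    """Top-k (eigenvalue, unit eigenvector) pairs by power iteration with deflation."""
    p = len(corr)
    m = [row[:] for row in corr]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(0.5, 1.5) for _ in range(p)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        mv = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = sum(v[i] * mv[i] for i in range(p))  # Rayleigh quotient
        components.append((lam, v))
        # Deflation: remove the component just found before the next search.
        for i in range(p):
            for j in range(p):
                m[i][j] -= lam * v[i] * v[j]
    return components


# Hypothetical toy data (NOT the study's measurements): columns Turb, TSS, T.
toy = [
    [5.0, 4.0, 28.0],
    [20.0, 18.0, 24.0],
    [40.0, 35.0, 20.0],
    [60.0, 50.0, 17.0],
    [80.0, 70.0, 14.0],
    [105.0, 90.0, 11.0],
]
corr = correlation_matrix(toy)
components = leading_components(corr, k=2)
# With a correlation matrix, each eigenvalue divided by the number of
# variables is that component's share of the total variance.
shares = [lam / len(corr) for lam, _ in components]
lam1, v1 = components[0]
# Loadings: correlations between each variable and the first component.
loadings_f1 = [x * math.sqrt(lam1) for x in v1]
print(shares, loadings_f1)
```

In this toy table Turb and TSS rise together while T falls, so the first component carries almost all the variance and the temperature loading has the opposite sign to the other two. The sign of an eigenvector is arbitrary; only the relative signs of the loadings are meaningful.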
The multiple linear regression (MLR) model explains an indicator parameter of surface water pollution, here Turb (y, the dependent response), as a function of the other physicochemical parameters (X1, X2, X3, …, X11), which are the independent variables (T°, pH, EC, TSS, COD, BOD5, NO3-, NO2-, NH4+, PO43- and DO). This is the principle of analysis when, in a statistical series of p dimensions, a relationship is established between one of the quantitative variables and the other variables[34]. The equation for Turb as a function of the physicochemical parameters is as follows (Equation 1):
$$ y = A_0 + A_1 X_1 + A_2 X_2 + \dots + A_k X_k + \varepsilon \qquad (1) $$
where y is the value of the response variable (Turb); A0 is the intercept and A1, A2, …, Ak are the regression coefficients associated with the independent variables X1, X2, …, Xk, respectively; and ε is the random error.
The statistical model was processed with the software “MINITAB16”, which can be downloaded from: (https://minitab.informer.com/16.2/). Analysis of variance (ANOVA) was applied to assess the fit and significance of the regression model.
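The regression model described above (response y, coefficients A0 to Ak, predictors X1 to Xk) and the ANOVA quantities used to judge it can be sketched as follows. This is an illustration in plain Python under stated assumptions, not a reproduction of the MINITAB16 procedure: the helper names, the normal-equations solver and the two-predictor toy data are all hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_mlr(xs, y):
    """Least-squares coefficients [A0, A1, ..., Ak] of y = A0 + sum(Ai * Xi)."""
    design = [[1.0] + list(row) for row in xs]  # leading 1.0 carries the intercept
    k1 = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k1)] for i in range(k1)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k1)]
    return solve(xtx, xty)  # normal equations (X'X) A = X'y


def anova_summary(xs, y, coef):
    """Return (R2, F statistic) for a fitted model with an intercept."""
    n, k = len(y), len(coef) - 1
    fitted = [coef[0] + sum(c * x for c, x in zip(coef[1:], row)) for row in xs]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    f_stat = ((ss_tot - ss_res) / k) / (ss_res / (n - k - 1))
    return r2, f_stat


# Hypothetical toy data: y is close to 2 + 3*X1 - 1*X2 plus small noise.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [3.2, 6.9, 6.8, 11.3, 10.9, 14.9]
coef = fit_mlr(xs, y)
r2, f_stat = anova_summary(xs, y, coef)
print(coef, r2, f_stat)
```

A large F statistic relative to its reference distribution, with k and n - k - 1 degrees of freedom, corresponds to the small ANOVA p-value used to declare a model significant. Solving the normal equations directly is adequate for a sketch; statistical packages generally rely on more numerically stable decompositions.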
3. Results and discussion
3.1. Surface water quality parameter
The temporal variations of the physico-chemical parameters of the waters sampled at the Cheurfa dam during the period 2014-2018, and their summary, are presented in Tables 2 and 3. Box plots of the water quality data are shown in Figure 2. The average T° values vary between 17.9 °C and 21 °C, with a maximum of 30 °C (Figure 2); this maximum exceeds the 25 °C standard[35]. The average pH values (Table 2) range from 7.54 to 8.12, indicating slightly to moderately alkaline water; these values comply with the Algerian standard for the quality of surface water intended for drinking water supply[35], which sets the range 6.5 ≤ pH ≤ 9. Equally important is EC, which reflects the overall degree of mineralization and provides information on salinity[36]. The average values obtained fluctuate between 1974.92 μS/cm and 2568.33 μS/cm and indicate highly mineralized water that is difficult to use in irrigated areas according to Rodier et al.[31]. Turb depends on the presence of suspended solids in the water, such as organic debris, clays and microscopic organisms, so quantifying these suspended solids gives a measure of its degree[37]. Monitoring of this parameter shows a maximum of 105 NTU and a minimum of 4.18 NTU (Table 3); the highest value was observed during heavy precipitation (53.7 mm) in winter (January 2014). Indeed, after heavy precipitation, Turb can exceed 100 and even 200 NTU[31]. High turbidity hinders the effectiveness of microbial decontamination treatment, even when free residual chlorine is maintained for more than an hour[31]. The indicative value set by decree n°11-125-03/2011[38] for the quality of drinking water is 5 NTU. Regarding TSS, Table 3 shows a maximum of 90 mg/L and a minimum of 1.5 mg/L, with an annual average of 27.80 mg/L.
The TSS measurements obtained during the rainy period exceed the limit value of 25 mg/L[35]. The DO concentration provides information on the level of pollution and consequently on the degree of self-purification of the water source[39]. Table 3 shows a maximum DO concentration of 13.1 mg/L and a minimum of 4.1 mg/L, with an annual average of 8.23 mg/L. The highest and lowest DO levels were observed in the wet and dry periods, respectively; this is consistent with the conclusions of Hébert and Légaré[40], who indicated that water at low temperature contains more dissolved oxygen than water at high temperature. COD varied between 19 mg/L and 118 mg/L with an average of 59.4 mg/L, and BOD5 varied between 4.5 mg/L and 16.1 mg/L with an average of 9.6 mg/L (Table 3); these high values exceed the standards of 30 mg/L and 7 mg/L, respectively[35]. According to the ANRH[41] normative grid, these waters belong to category 3 (highly polluted waters). For nutrients, Table 3 gives extremes of 2 mg/L and 41 mg/L for NO3- and of 0.007 mg/L and 0.8 mg/L for NO2-, with annual averages of 16.10 mg/L and 0.27 mg/L, respectively. These results indicate acceptable conditions, below the upper limits set by decree n°11-219-06/2011[35] for NO3- and NO2-. The high concentrations in specific periods of the year are probably related to fertilization practices in the area and to fertilizer runoff caused by seasonal rainfall. For NH4+, the maximum values were recorded in the wet season (1.62 mg/L) and the lowest (0.01 mg/L) in summer, with an average of 0.55 mg/L; the maximum and average values are above the upper limit of the acceptable water quality range (0.01 mg/L to 0.1 mg/L). Rezak et al.[27] and Papin et al.[42] found similar results in surface water samples used for drinking water production.
The PO43- concentrations varied between a minimum of 0.1 mg/L and a maximum of 2.01 mg/L. These results remain much lower than those reported by Akatumbila et al.[43], who found maximum PO43- concentrations on the order of 39.48 mg/L. Nevertheless, they are higher than those obtained by Allalgua et al.[44] in the Foum El-Khanga dam (eastern Algeria), where the maximum phosphate level found was only 0.13 mg/L. According to the ANRH[41], this water can be classified in the 3rd category (poor quality water, from 0.1 mg/L to 3 mg/L of PO43-).
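The comparisons with regulatory limits made throughout this section amount to simple ratios of observed maxima to limit values. The sketch below collects only figures quoted in the text; the dictionary keys and the function name are illustrative assumptions. Note that the 5 NTU figure is the indicative value for drinking water, so comparing it with raw dam water is indicative only.

```python
# Limit values quoted in this section (degC, NTU, mg/L).
LIMITS = {"T": 25.0, "Turb": 5.0, "TSS": 25.0, "COD": 30.0, "BOD5": 7.0, "NH4+": 0.1}

# Maximum values reported in this section for 2014-2018.
OBSERVED_MAX = {"T": 30.0, "Turb": 105.0, "TSS": 90.0, "COD": 118.0,
                "BOD5": 16.1, "NH4+": 1.62}


def exceedance_ratios(observed, limits):
    """Ratio of each observed value to its limit; a ratio above 1 means the limit is exceeded."""
    return {name: observed[name] / limits[name] for name in limits if name in observed}


ratios = exceedance_ratios(OBSERVED_MAX, LIMITS)
exceeded = sorted(name for name, ratio in ratios.items() if ratio > 1.0)
print(ratios)
print(exceeded)
```

Expressing each parameter as a multiple of its limit makes the relative severity easier to compare across parameters measured in different units.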
3.2. Principal component analysis (PCA)
The principal component analysis results are shown in Table 5 and Figures 3 and 4. The correlation matrix (Table 4) gives a first insight into the associations between the studied parameters and points to the common origin of the studied elements. The linear correlations observed between the parameters measured in our study are shown in bold in Table 4. The analysis elucidates the relationships between the physicochemical parameters and extracts the variables most strongly correlated with turbidity. The principal sources of pollution are determined from the contribution of each element to the formation of the three main factors (Table 5). Our findings reveal three axes that express 75.5% of the information contained in the matrix of input variables, of which factor 1 (F1), factor 2 (F2) and factor 3 (F3) account for 36.5%, 26.7% and 12.26%, respectively. Projection of the variables on the F1-F2 plane (Figure 3) makes it possible to distinguish groups of variables that behave similarly. The first group, best explained by the positive part of F1, consists of Turb, TSS, NO2- and NO3-, which are highly correlated with each other (Table 4); their evolution is inverse to that of T° (negatively correlated with F1). Similar results were reported by Soltani et al.[26], who found a negative association of temperature with the F2 axis, which explained 9.25% of the total variance, and a positive association with nutrient loads, which occur more intensely in winter due to rainfall-runoff. The second group consists of DO, which contributes to the positive part of F1, in opposition to the group formed by BOD5 and COD, which are negatively correlated with F1. This reflects the degradation of organic matter by chemical and biochemical processes, which consumes dissolved oxygen.
The F1 axis confirms the increase in turbidity in the wet season at low temperatures and shows that this quality parameter indicates the presence of mineral and organic particles suspended in the water. From a health point of view, an increase in turbidity influences the microbiological and chemical characteristics of the water through the adsorption of microorganisms or chemical particles on the suspended matter, and consequently makes disinfection of this water difficult[31]. F2 is defined by NH4+, PO43-, COD, BOD5, Turb and TSS in its positive part and by DO in its negative part, indicating mineral and organic pollution of domestic origin.
Nitrogen and phosphorus are the major nutrients of plants (algae and phytoplankton), and their presence in excess causes eutrophication of the aquatic environment. The results show that point sources (urban and industrial effluents) and nonpoint sources (agricultural runoff) are the main contributors to organic and nutrient parameters. The result is a real degradation: the increased opacity of the water limits the amount of incident light available for photosynthesis, which subsequently decreases the amount of dissolved oxygen and increases COD and BOD5. PO43- contributes most to the formation of the F2 axis, with r = 0.88; this is indicative of the extent of reservoir water eutrophication, as reported by Bouzid-Lagha and Djelita[45] in their study of the eutrophication of the Hammam Boughrara Reservoir (northwest Algeria) using PCA. The two variables EC and pH show weak correlations with the F1 and F2 axes and are defined by principal component 3 (F3), which reflects mineralization.
Therefore, the PCA indicates two pollution axes, F1 and F2, of anthropogenic origin (domestic, agricultural and industrial), which made it possible to estimate the load of the water samples in nutrients and TSS and their DO level, in accordance with previous observations reported by Jurado et al.[46].
The projection on the F2-F3 plane confirms that turbidity is the parameter most correlated with all the other parameters analyzed, except EC and pH. Thus, developing a predictive relationship for turbidity using mathematical simulation models is crucial for understanding variations in water quality across seasons and for identifying solutions and effective management practices.
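The variance shares reported for F1 to F3 in this section can be checked with a short worked calculation. The share formula is the standard one for PCA. The implied eigenvalues rest on an assumption not stated explicitly in the text, namely that the PCA was run on the correlation matrix of the p = 12 standardized variables, so that the eigenvalues sum to 12.

```latex
\text{share}_i = \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j},
\qquad
\text{share}_1 + \text{share}_2 + \text{share}_3
  = 36.5\% + 26.7\% + 12.26\% = 75.46\% \approx 75.5\%

% Under the assumption \sum_j \lambda_j = p = 12:
\lambda_1 \approx 0.365 \times 12 = 4.38, \quad
\lambda_2 \approx 0.267 \times 12 = 3.20, \quad
\lambda_3 \approx 0.1226 \times 12 = 1.47
```

Under that assumption all three retained factors have eigenvalues above 1, which is the usual Kaiser criterion for keeping a component.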
3.3. Turbidity prediction results using the MLR model
To model the influence of the investigated water quality parameters on turbidity and their correlation with it, different combinations of water quality parameters were used in the MLR prediction model: first the 4 parameters that are significantly correlated with turbidity according to the correlation matrix (Equation 2 in Table 6), then all the water quality parameters (Equation 3 in Table 6), and finally only the statistically significant variables confirmed by the ANOVA tests (Equation 4). The models are summarized in Table 6.
The validity of the MLR formulations was tested using the p-value at the significance threshold α = 0.05, the coefficient of determination R2 and the plots of residual values. As can be seen from Table 6, the best predictions of Turb are given by Equation 3 (model 2) and Equation 4 (model 3), with high R2 values (93.20% and 92.20%, respectively), indicating that both models fit the data well. In addition, the associated adjusted R2 values (91.64% and 91.31% for models 2 and 3, respectively) were close to R2, confirming the good agreement between the response (Turb) and the fitted models. However, model 3 presented the highest predicted R2 (89.94%), which means that turbidity can be predicted well by this model. The very low p-value (p = 0.000000 as reported, i.e. p < 0.001), confirmed by the ANOVA tests (Table 7), demonstrates that model 3 is highly significant and is the best MLR model to use. Therefore, the variables that contributed most to the prediction of turbidity in the Cheurfa dam and were found to be statistically significant (smallest p-values in Table 7) are DO, TSS, COD, EC, NO2- and NO3-. The significant predictive equation is stated below:
Our results are similar to those of Miljojkovic et al.[5], who showed that total suspended solids and dissolved oxygen saturation have the greatest effect on Turb prediction, with high precision. Ayanshola et al.[47] adopted an MLR model to predict treated water turbidity from rainfall, coagulant dosage, retention time and raw water turbidity, and achieved their best results with an R2 value of 0.731. Amanda et al.[48] applied simple linear regression to predict total suspended solids (TSS) as a function of turbidity; analysis of variance (ANOVA) showed that turbidity has a significant linear relationship with TSS concentration (p-value ≤ 0.01). García et al.[49] confirmed that turbidity can be successfully predicted from ammonium, conductivity, dissolved oxygen, pH and temperature values using an MLR model. According to Lee et al.[50], Turb is not a concentration of contaminants but a property that represents the “sum” of all other contaminants, with the advantage that it is easier to implement and measure than COD, NO2-, NO3-, DO and EC. The MLR method seeks to explain and predict the phenomenon (y) as a function of the explanatory variables (x), based on their effect on Turb.
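The adjusted R2 values quoted for models 2 and 3 can be checked from the reported R2 with the standard adjustment formula. The sketch below assumes that all 60 samples entered each regression and that models 2 and 3 use 11 and 6 predictors, respectively, as the text suggests; the function and variable names are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for a model with k predictors plus an intercept, fitted to n samples."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


N_SAMPLES = 60  # assumption: every monthly sample was used in each fit

model2_adj = adjusted_r2(0.9320, N_SAMPLES, 11)  # all 11 explanatory variables
model3_adj = adjusted_r2(0.9220, N_SAMPLES, 6)   # the 6 significant variables
print(round(100 * model2_adj, 2), round(100 * model3_adj, 2))
```

Under these assumptions the formula gives about 91.64% for model 2 and 91.32% for model 3, against reported values of 91.64% and 91.31%; the small gap for model 3 is consistent with R2 having been rounded before publication. The adjustment penalizes model 2 more heavily because it spends five additional degrees of freedom for a one-point gain in R2.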
3.4. Model adequacy checking: Residual analysis
In a linear regression model, the diagnostic examination of residual plots should not be neglected. The residuals-versus-fits plot shows the estimated residuals against the values predicted by the model; it is very important to check that the residuals are centered on zero. This plot is based on all the samples included in the analysis of variance; the residuals should have the same distribution throughout and show no structure along the ordinate axis or any particular shape. According to Figure 5, the residuals are homogeneously distributed around zero. The normality of the residuals is verified by examining their distribution on a Henry's line (normal probability plot). According to this plot, which is linear, the residuals are also normally distributed around zero.
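The two checks described above (residuals centered on zero, points of the normal probability plot falling on a straight line) can be sketched numerically. The residuals below are hypothetical, not those of Figure 5, and the plotting positions (i - 0.375) / (n + 0.25) are one common convention among several; statistical packages may use a different one.

```python
from statistics import NormalDist, mean


def normal_plot_points(residuals):
    """(theoretical normal quantile, ordered residual) pairs for a normal probability plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def straightness(points):
    """Pearson correlation of the plotted points; values near 1 are compatible with normality."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical residuals (NOT the study's), chosen to sum to zero.
toy_residuals = [0.3, -1.9, 1.0, -0.2, 1.8, -1.1, 0.6, 0.1, -0.6]
points = normal_plot_points(toy_residuals)
print(mean(toy_residuals), straightness(points))
```

A visual inspection of the plot remains the primary diagnostic; the correlation is only a compact summary of how closely the points follow Henry's line.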
4. Conclusions
Within the framework of ecosystem protection, the current study assessed and predicted the water quality of the Cheurfa dam. It has been shown that the surface waters exhibit strong mineralization and low DO contents in summer. This phenomenon is, on the other hand, rare during the rainy season at the time of an algal bloom. Recorded PO43- and NH4+ contents increase especially during the sowing season (use of agricultural fertilizers) and as a result of domestic discharges containing detergents. These nutrients contribute largely to the eutrophication of the aquatic environment by affecting the transparency of the water, which was confirmed by the high Turb values recorded. In the present study, the analysis of turbidity gives concrete insight into the pollution of the dam water; high Turb indicates strong health risks. At the same time, the statistical study brings out the parameters responsible for water quality changes and gives insights for controlling the various sources of pollution. Using principal component analysis (PCA), the 12 studied variables were reduced to a single one, Turb, the controlling element of the two pollution axes F1 and F2. Then, a new equation with only 6 relevant parameters is proposed to simulate the turbidity of the Cheurfa dam through MLR, with a high coefficient of determination (R2 = 0.9220), a significant p-value (<< 0.05) and a good distribution of residuals around the mean.