
Using Models to Detect Outliers in Refinery Data



Presentation Transcript


    2. Outline
       - Objectives of this study
       - The bigger picture: effects of prices on demand
       - How does EIA define demand?
       - Refinery inputs and outputs
       - Analysis of refinery operations
       - Questions for the Committee

    3. Objectives
       - To review the refinery input and output data behind reported demand for petroleum products
       - To identify possible data inconsistencies
       - To ultimately develop a methodology to detect outliers and improve data quality
       - To obtain your input on our approach

    4. The Bigger Picture
       - Reported refinery production determines product supplied and can affect inventories
       - In the commodity market, gasoline inventory is used as an indicator of potential shortages, which can have profound effects on the oil spot and futures markets
       - We will look at both demand and supply to assess the overall quality of EIA petroleum supply data

    5. What Does BEA Say?
       - Source: Bureau of Economic Analysis (US Dept. of Commerce)
       - Gasoline and Oil (Chain-type Price Indexes): 2000 is the base (100)
       - Real PCE (personal consumption expenditures adjusted to remove price changes) = (Gasoline & Oil PCE / Chain-type price index) * 100
       BEA data indicate that PCE on gasoline and oil declined from 2004 to 2005. EIA data, however, show that gasoline consumption increased over the same period. We would like to study the patterns and relationships between refinery inputs and outputs and, ideally, identify an approach to flag and edit questionable data points.
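
The real-PCE adjustment on this slide can be sketched as follows. This is a minimal illustration that assumes the formula (nominal PCE / chain-type price index) * 100 with the base year set to 100; the function name and the input figures are hypothetical and are not BEA data.

```python
def real_pce(nominal_pce: float, price_index: float) -> float:
    """Deflate nominal PCE by a chain-type price index (base year = 100)."""
    if price_index <= 0:
        raise ValueError("price index must be positive")
    return nominal_pce / price_index * 100.0


# Hypothetical figures for illustration only (not BEA data):
# nominal spending rises, but prices rise faster, so real PCE falls.
year_1 = real_pce(nominal_pce=200.0, price_index=125.0)
year_2 = real_pce(nominal_pce=240.0, price_index=160.0)
print(year_1, year_2)
```

The example shows how nominal spending can rise while real (price-adjusted) spending declines, which is the kind of pattern the BEA comparison points to.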

    6. How does EIA Define Demand?
       - EIA defines demand as product supplied, which measures the disappearance of products from primary sources: refineries, natural gas processing plants, blending plants, pipelines, and bulk terminals
       - In general, product supplied of each refined product in any given period is computed as follows:
         Product supplied = field production + refinery & blender production + imports - stock change - refinery inputs - exports
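
The product-supplied identity can be written as a small function. This is a sketch only: it takes the stock term to be the change in stocks over the period (positive for a build, negative for a draw), and the function name, argument names, and numbers are hypothetical.

```python
def product_supplied(field_production, refinery_blender_production, imports,
                     stock_change, refinery_inputs, exports):
    """Product supplied for one product and one period (all in the same units).

    stock_change is assumed to be ending stocks minus beginning stocks,
    so a stock draw (negative value) raises product supplied.
    """
    return (field_production + refinery_blender_production + imports
            - stock_change - refinery_inputs - exports)


# Hypothetical volumes for illustration only (not EIA data).
supplied = product_supplied(
    field_production=0,
    refinery_blender_production=9000,
    imports=1000,
    stock_change=-100,   # a draw of 100 from inventories
    refinery_inputs=0,
    exports=200,
)
print(supplied)
```

Because reported refinery production enters the identity directly, an error in that one term carries through one-for-one to the demand figure.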

    7. Refinery Inputs and Outputs
       - Major inputs: crude oil, natural gas liquids, other liquids
       - Major outputs: motor gasoline, diesel fuel, jet fuel, residual fuel, liquefied refinery gases, other
       - Natural gas liquids include pentanes plus and liquefied petroleum gases (ethane, propane, normal butane, and isobutane)
       - Other liquids include oxygenates (fuel ethanol), unfinished oils (heavy gas oils; kerosene and light gas oils), and motor gasoline blending components (MGBC)
       - Other outputs include aviation gasoline, kerosene, and asphalt and road oil

    8. Crude oil inputs peak in the summer, during the peak driving season. There is a sharp decline in September and then an increase in November and December at the start of the winter heating season.

    9. Unlike crude oil inputs, motor gasoline production peaks in December. The pattern seems normal from January through October, but production then increases rapidly during the last two months of the year.

    10. Refinery processing gain is the volumetric amount by which total output is greater than input for a given period of time. This difference is due to the processing of crude oil into products which, in total, have a lower specific gravity than the crude oil processed. There is generally no seasonal pattern in the processing gain. Its peak, like that of motor gasoline production, is in December (except in 2005).
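
A small worked example may help show why lower specific gravity produces a volumetric gain. The sketch below makes a strong simplifying assumption that is not in the slides: mass is conserved with no losses or fuel use, so only density changes. All names and figures are hypothetical.

```python
def processing_gain(total_output_volume, total_input_volume):
    """Volumetric gain: how much total output exceeds total input."""
    return total_output_volume - total_input_volume


def output_volume_at_constant_mass(input_volume, input_sg, output_sg):
    """Output volume if mass is conserved and only density changes.

    Simplifying assumption for illustration: no losses and no fuel use.
    """
    return input_volume * input_sg / output_sg


# Hypothetical figures (not EIA data): 1,000 barrels of crude with specific
# gravity 0.88 processed into products averaging specific gravity 0.80.
volume_in = 1000.0
volume_out = output_volume_at_constant_mass(volume_in, input_sg=0.88, output_sg=0.80)
gain = processing_gain(volume_out, volume_in)
print(round(volume_out, 1), round(gain, 1))
```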

    11. Analysis of Refinery Operations
       - Refinery gain definition
       - Determinants of refinery gain
       - Examples of regression analysis
       - What can we say about production of refined products?

    12. Refinery Gain Definition
       - Yield represents the percent of finished product produced from input of crude oil and net input of unfinished oils
       - Before calculating the yield for finished motor gasoline, the input of natural gas liquids, other hydrocarbons and oxygenates, and net input of motor gasoline blending components must be subtracted from the net production of finished motor gasoline
       - Before calculating the yield for finished aviation gasoline, input of aviation gasoline blending components must be subtracted from the net production of finished aviation gasoline
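
The adjustment described for finished motor gasoline can be sketched as below. This assumes the yield is the adjusted net production divided by the sum of crude oil input and net input of unfinished oils, expressed as a percentage; the function name, parameter names, and figures are hypothetical.

```python
def motor_gasoline_yield_pct(net_gasoline_production,
                             ngl_input,
                             other_hydrocarbons_oxygenates_input,
                             net_blending_components_input,
                             crude_input,
                             net_unfinished_oils_input):
    """Percent yield of finished motor gasoline after subtracting blended-in inputs."""
    adjusted_production = (net_gasoline_production
                           - ngl_input
                           - other_hydrocarbons_oxygenates_input
                           - net_blending_components_input)
    denominator = crude_input + net_unfinished_oils_input
    if denominator <= 0:
        raise ValueError("crude plus net unfinished oils input must be positive")
    return adjusted_production / denominator * 100.0


# Hypothetical volumes for illustration only (not EIA data).
yield_pct = motor_gasoline_yield_pct(
    net_gasoline_production=8000,
    ngl_input=200,
    other_hydrocarbons_oxygenates_input=300,
    net_blending_components_input=500,
    crude_input=13500,
    net_unfinished_oils_input=500,
)
print(yield_pct)
```

Subtracting the blended-in volumes first keeps the yield from being inflated by material that did not come from crude oil or unfinished oils.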

    13. Determinants of Refinery Gain
       - Average gasoline yield from a barrel of crude oil is below 15%
       - Refiners can increase production of gasoline by cracking the heavy end of the barrel
       - Refiners use catalytic crackers, hydrocrackers, and cokers to convert the heavy end of a barrel to lighter products
       - Heavy crude oils contain a smaller share of straight-run gasoline and therefore require more cracking to increase gasoline production
       Catalytic cracking: The refining process of breaking down the larger, heavier, and more complex hydrocarbon molecules into simpler and lighter molecules. Catalytic cracking is accomplished by the use of a catalytic agent and is an effective process for increasing the yield of gasoline from crude oil.
       Catalytic hydrocracking: A refining process that uses hydrogen and catalysts with relatively low temperatures and high pressures for converting middle boiling or residual material to high-octane gasoline, reformer charge stock, jet fuel, and/or high grade fuel oil. The process uses one or more catalysts, depending upon product output, and can handle high sulfur feedstocks without prior desulfurization.
       Coking: Thermal refining processes used to produce fuel gas, gasoline blendstocks, distillates, and petroleum coke from the heavier products of atmospheric and vacuum distillation.

    14. Examples of Regression Analysis
       f(refinery gain) = c + crude inputs + API gravity + sulfur content + coking + cracking + hydrocracking + gasoline production
       f(gasoline production) = c + crude inputs + API gravity + coking + hydrocracking + cracking
       - Refinery gain: we are looking at volumetric refinery gain, as opposed to the percentage
       - Crude inputs: Refinery and Blender Net Input of Crude Oil
       - Gasoline production: Refinery and Blender Net Production of Finished Motor Gasoline
       - API gravity and sulfur content are the crude oil input qualities that affect the processing complexity and product characteristics
       The weighted average API gravity is between 30 and 32 degrees. Using the average hurt the regression because most of the API gravity distribution falls below the average, so the percentage of total imported crude oil by API gravity interval (less than 35 degrees) is used instead.
       API gravity: American Petroleum Institute measure of specific gravity of crude oil or condensate in degrees. An arbitrary scale expressing the gravity or density of liquid petroleum products. The higher the API gravity, the lighter the compound. Light crudes generally exceed 38 degrees API, and heavy crudes are commonly labeled as all crudes with an API gravity of 22 degrees or below. Intermediate crudes fall in the range of 22 degrees to 38 degrees API gravity.
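
The API gravity scale and the light/intermediate/heavy bands quoted above can be illustrated with a short sketch. The conversion uses the standard API gravity formula (141.5 divided by specific gravity at 60 degrees F, minus 131.5), which the slides do not state; the function names are hypothetical, and the band edges follow the text.

```python
def api_gravity(specific_gravity: float) -> float:
    """Convert specific gravity (at 60 degrees F) to degrees API."""
    if specific_gravity <= 0:
        raise ValueError("specific gravity must be positive")
    return 141.5 / specific_gravity - 131.5


def classify_crude(api_degrees: float) -> str:
    """Bands as described in the text: light > 38, heavy <= 22, else intermediate."""
    if api_degrees > 38:
        return "light"
    if api_degrees <= 22:
        return "heavy"
    return "intermediate"


# Water (specific gravity 1.0) sits at 10 degrees API; lighter liquids score higher.
for sg in (1.0, 0.87, 0.82):
    degrees = api_gravity(sg)
    print(sg, round(degrees, 1), classify_crude(degrees))
```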

    15. Regression Time Periods
       - First trial: 1993-2005
       - Second trial: 1993-2002
       - Third trial: 2002-May 2006
       For the 1993-2002 trial, the 2003-2005 data were forecast using the estimated results.
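
The fit-then-forecast procedure (estimate on an earlier window, then predict later periods) can be sketched in plain Python. The study itself used EViews; the code below is a stand-in that solves the least-squares normal equations directly, and it runs on synthetic data constructed for illustration, not on EIA series. All function and variable names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [c, b1, b2, ...] via the normal equations."""
    x = [[1.0] + list(r) for r in rows]  # prepend the constant term c
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Fitted or forecast value for one observation."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))


# Synthetic data for illustration only: y = 2 + 3*x1 - 1*x2 exactly.
train_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
train_y = [2 + 3 * x1 - x2 for x1, x2 in train_rows]

coefs = fit_ols(train_rows, train_y)          # estimate on the earlier window
holdout_row, holdout_actual = (7, 6), 17.0    # a later, out-of-sample point
forecast = predict(coefs, holdout_row)
forecast_error = holdout_actual - forecast    # large errors are candidate outliers
print([round(c, 6) for c in coefs], round(forecast_error, 6))
```

With real data the forecast errors would not be zero; a reported value that lands far from its forecast is the kind of observation the procedure is meant to surface.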

    16. Regression Analysis Results: Refinery Gain
       - Gasoline production and coking correlate highly with refinery gain
       - Crude oil input and API gravity correlate weakly with refinery gain
       - Less successful in predicting outliers over shorter time periods

                              93-02     93-05     02-06
       Crude input           0.0077     0.003    -0.039
       Mogas production        0.12     0.087      0.05
       Under 35               0.517      2.58      9.67
       Coking                 0.076     0.179      0.29
       Hydrocracking         -0.127    -0.136    -0.111
       Cracking              -0.039     0.016     0.147
       Sulfur content         19507     14260     36527
       R-squared              0.791     0.746     0.552

       Mogas production is becoming less of a determinant of refinery gain over time, while under 35, coking, and cracking are becoming more important. Interestingly, hydrocracking seems to be relatively stable. Note that the coefficient on cracking was negative from 93-02 but is positive and significant in 02-06.

    17. Standardized Residuals for Refinery Gain
       For normally distributed values, approximately 95% lie within two standard deviations of the mean.
       - Outliers from 1993-2002 (4): Dec 1993, Mar 2000, Jun 2001, Aug 2001
       - Outliers for (93-02): Dec 2003, Feb 2004, Aug 2004, Nov 2004, Dec 2004, May 2005, Oct 2005
       - Outliers for (93-05): Feb 2004, Aug 2004, Dec 2004, Jul 2005
       - Outliers for (02-06): Feb 2004, Dec 2004
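
The two-standard-deviation rule can be sketched as a simple flagging routine. This is an illustration under stated assumptions: residuals are standardized here as plain z-scores (residual minus mean, divided by the sample standard deviation), which may differ from the exact standardization the authors' software applied. The labels and residual values are made up for the example.

```python
from statistics import mean, stdev


def flag_outliers(labels, residuals, threshold=2.0):
    """Return (label, z-score) pairs whose |z| exceeds the threshold."""
    mu = mean(residuals)
    sigma = stdev(residuals)  # sample standard deviation
    flagged = []
    for label, e in zip(labels, residuals):
        z = (e - mu) / sigma
        if abs(z) > threshold:
            flagged.append((label, round(z, 2)))
    return flagged


# Hypothetical monthly residuals (not from the study); one value is clearly unusual.
labels = [f"m{i:02d}" for i in range(1, 13)]
residuals = [1.0, -2.0, 0.5, -1.0, 1.5, 12.0, -0.5, 0.0, -1.5, 2.0, 1.0, -1.0]
flagged = flag_outliers(labels, residuals)
print(flagged)
```

One design caveat: a large outlier inflates the standard deviation used to judge it, so with short series a robust scale estimate may be worth considering.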

    18. Regression Analysis Results: Gasoline Production
       - Crude input, coking, and cracking correlate highly with gasoline production
       - Hydrocracking and API gravity correlate weakly with gasoline production
       - Similarly successful in predicting outliers over shorter time periods

                              93-02     93-05     02-06
       Crude input             0.12      0.13      0.21
       Under 35                2.49      10.7      18.1
       Coking                  0.88      0.81      0.35
       Hydrocracking          0.245      0.23    -0.045
       Cracking               0.635      0.45      0.16
       R-squared              0.844     0.842     0.718

       Under 35 is growing in significance, but coking etc. are decreasing in significance.

    19. Standardized Residuals for Gasoline Production
       For normally distributed values, approximately 95% lie within two standard deviations of the mean.
       - Outliers from 1993-2002 (3): Mar 1993, Mar 1994, Nov 1994
       - Outlier for (93-02) and (93-05): Oct 2005
       - Outlier for (02-06): Mar 2003

    20. Conclusions
       - There are several indicators that can help us identify outliers: refinery gain, gasoline production, and share of outputs for each product
       - A full refinery model could help us verify the relationship between refinery gain and gasoline production
       - A more reliable identification method can then be developed

    21. Questions for the Committee
       - Do you think this approach can be useful to identify outliers?
       - The residual terms show serial correlation. Do we need to make any corrections before we use these equations in our data editing routine?
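
One common way to quantify the serial correlation mentioned in the second question is the Durbin-Watson statistic. The slides do not say which diagnostic was used, so this is offered only as an illustrative sketch; the function name and the residual series are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences / sum of squares.

    Values near 2 suggest little first-order autocorrelation; values toward 0
    suggest positive autocorrelation and values toward 4 suggest negative.
    """
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    return numerator / denominator


# Hypothetical residual series for illustration only.
persistent = [1.0, 2.0, 3.0, 4.0]       # neighbours move together
alternating = [1.0, -1.0, 1.0, -1.0]    # neighbours flip sign
print(durbin_watson(persistent), durbin_watson(alternating))
```

If a statistic like this confirms positive serial correlation, residual-based outlier flags tend to cluster in runs of adjacent months, which is worth keeping in mind when reading the outlier lists.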

    23. Appendix: Regression Analysis Equation Outputs

    24. Refinery Gain Equation
       We are looking at monthly aggregate-level data from the Petroleum Navigator on the EIA website. The historical values of refinery processing gain and finished refinery yield data are from 1993-2005.
       - Program used: EViews
       - Basic regression analysis: least squares method
       - Std. Error: reports the estimated standard errors of the coefficient estimates
       - t-Statistic: computed as the ratio of an estimated coefficient to its standard error; used to test the hypothesis that a coefficient is equal to zero
       - Probability: shows the probability of drawing a t-statistic as extreme as the one observed if the coefficient were in fact zero
       - R-squared: measures the success of the regression in predicting the values of the dependent variable within the sample

    25. Refinery Gain (1993-2002)
       Dependent Variable: REF_GAIN
       Sample: 1993M01 2002M12
       Included observations: 120

       Variable          Coefficient   Std. Error   t-Statistic    Prob.
       C                   -283.6663     137.2078     -2.067421   0.0410
       CRUDE_INPUT          0.007740     0.014957      0.517506   0.6058
       GASOLINE             0.120012     0.023734      5.056535   0.0000
       API_GRAVITY          0.517493     1.678401      0.308325   0.7584
       COKING               0.076027     0.058537      1.298793   0.1967
       HYDROCRACKING       -0.126848     0.064745     -1.959186   0.0526
       CRACKING            -0.039760     0.037531     -1.059398   0.2917
       SULFUR_CONTENT       19507.57     8381.549      2.327442   0.0217
       R-squared            0.746060

       Monthly aggregate-level data from the Petroleum Navigator on the EIA website; the historical values of refinery processing gain and finished refinery yield data are from 1993-2002.
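
As a quick consistency check on this output, the t-statistics can be recomputed as coefficient divided by standard error, following the definition given in the appendix. The sketch below uses the coefficient and standard-error values reported on this slide; the dictionary layout and variable names are only for illustration.

```python
# (coefficient, standard error) pairs as reported for REF_GAIN, 1993M01-2002M12.
reported = {
    "C": (-283.6663, 137.2078),
    "CRUDE_INPUT": (0.007740, 0.014957),
    "GASOLINE": (0.120012, 0.023734),
    "API_GRAVITY": (0.517493, 1.678401),
    "COKING": (0.076027, 0.058537),
    "HYDROCRACKING": (-0.126848, 0.064745),
    "CRACKING": (-0.039760, 0.037531),
    "SULFUR_CONTENT": (19507.57, 8381.549),
}

# t-statistic = estimated coefficient / its standard error.
t_stats = {name: coef / se for name, (coef, se) in reported.items()}

for name, t in t_stats.items():
    # |t| of roughly 2 or more corresponds to significance near the 5% level
    # for a sample of this size.
    marker = "*" if abs(t) >= 2 else " "
    print(f"{name:15s} {t:10.4f} {marker}")
```

The recomputed ratios agree with the reported t-statistics up to rounding of the printed inputs, which is a cheap way to confirm that a flattened table has been read back correctly.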

    26. Refinery Gain (2002-2006)
       Dependent Variable: REF_GAIN
       Sample: 2002M01 2006M05
       Included observations: 53

       Variable          Coefficient   Std. Error   t-Statistic    Prob.
       C                   -1270.908     407.1756      -3.12127   0.0031
       CRUDE               -0.039259     0.030793     -1.274930   0.2089
       GASOLINE             0.050160     0.054609      0.918538   0.3632
       API_GRAVITY          9.675334     3.795627      2.549074   0.0143
       COKING               0.290206     0.115907      2.503783   0.0160
       HYDROCRACKING       -0.111373     0.105438     -1.056289   0.2965
       CRACKING             0.147183     0.060132      2.447678   0.0183
       SULFUR_CONTENT       36527.59     24614.91      1.483962   0.1448
       R-squared            0.552489

       Monthly aggregate-level data from the Petroleum Navigator on the EIA website; the historical values of refinery processing gain and finished refinery yield data are from 2002 to May 2006. This regression is estimated using a shorter series. Results are not as clear, and the coefficients have changed.

    27. Gasoline Production Equation
       We are looking at monthly aggregate-level data from the Petroleum Navigator on the EIA website. The historical values of refinery processing gain and finished refinery yield data are from 1993-2005. The program (EViews), estimation method (least squares), and output definitions are the same as for the refinery gain equation.

    28. Gasoline Production (1993-2002)
       Monthly aggregate-level data from the Petroleum Navigator on the EIA website; the historical values of refinery processing gain and finished refinery yield data are from 1993-2002.

    29. Gasoline Production (2002-2006)
       Dependent Variable: GASOLINE
       Sample: 2002M01 2006M05
       Included observations: 53

       Variable          Coefficient   Std. Error   t-Statistic    Prob.
       C                    1885.102     880.2708      2.141503   0.0374
       CRUDE_INPUT          0.206992     0.078673      2.631032   0.0115
       API_GRAVITY          18.10115     9.191265      1.969386   0.0548
       COKING               0.351424     0.314254      1.118279   0.2691
       HYDROCRACKING       -0.045074     0.289841     -0.155512   0.8771
       CRACKING             0.157862     0.164145      0.961721   0.3411
       R-squared            0.718402

       Monthly aggregate-level data from the Petroleum Navigator on the EIA website; the historical values of refinery processing gain and finished refinery yield data are from 2002 to May 2006.
