Reconstruction of shrimp catches in Brazil based on generalized linear models

Catch data are essential for assessing the status of fisheries; however, they are not always available. A modeling approach based on generalized linear models was used to reconstruct catch data with the support of environmental variables. Catch information on pink (Farfantepenaeus subtilis, F. brasiliensis, and F. paulensis), white (Litopenaeus schmitti), and seabob shrimp (Xiphopenaeus kroyeri) was obtained from fisheries' statistical bulletins. Sea surface temperature and rainfall information were collected from open-access databases maintained by meteorological agencies. Due to low species discrimination over time, a general shrimp catch category was added to the models to help disaggregate quantities for each species. The general category was the most relevant variable, whereas temperature indices were associated with reductions in catches over time, which may indicate likely effects of temperature increase on shrimp fisheries. In addition, extreme peaks and falls identified through residual analysis indicate low reliability, mainly in reports from the 1970s and 1980s. Information gain varied according to the discrimination ability. States that took longer to discriminate the species presented predictions far from the reports, so information gains were greater than 100%. Accordingly, reconstructions can be an alternative to restore outdated or missing information and help judge the reliability of official data.


INTRODUCTION
Fishing activity has stood out as a source of animal protein, contributing significantly to human food security over time (Oynlola et al., 2018). Since the law of the sea was enacted in 1982 (UNCLOS, 1982), countries have agreed to maintain exploitation levels of stocks so that they could reach maximum sustainable yield and, consequently, introduced monitoring systems to support assessments in several regions worldwide (EC, 2001; MSA, 2006). However, estimating such reference points may not be a simple task, as it requires catch data, which, among other variables, are not always available for most exploited resources (Berkson and Thorson, 2015). This issue is aggravated mainly in developing countries, where political instability leads to a lack of public policies and investments. Hence, data from these countries become inconsistent, showing temporal and spatial gaps, and sometimes they may not even exist (Zeller et al., 2016).
Catch series are considered one of the most important pieces of information about a fishery (Pauly and Zeller, 2003). When catch data are unavailable, inconsistent, or incomplete, reconstruction techniques can be used to overcome these problems. One of the goals of reconstruction is to estimate values to fill gaps or replace invalid data. Time series reconstructions can be performed in several ways, including by applying statistical interpolation techniques in linear models (polynomial regressions of degree n) or non-linear models (Freire et al., 2021; Pitcher, 2005). Variables such as world consumption per capita, international fish trade (IOTC, 2015), or the relative contributions of sectors (industrial and artisanal) can be added to these models. Oceanographic variables such as temperature, wind direction, and rainfall can also be useful to fill gaps and to understand their effects on catches (Tesfamichael and Pauly, 2011).
Reconstructed catch series play an important role in assessing the accuracy of historical data and, at the same time, in judging whether reports from official bodies are reliable. This correction is necessary because the resulting data can be applied to stock assessment models, which are crucial for developing effective environmental policies and should therefore be as accurate and reliable as possible (Nash et al., 2017).
Shrimp fishing is an important part of the supply of marine protein. It is mainly carried out in areas of strong environmental gradients, such as estuarine regions, which are influenced by several seasonal environmental variables, such as rainfall and temperature (Behzadi et al., 2019). Temperature acts as a trigger for reproductive maturation, and rainfall brings continental sediment, which increases the organic matter content available to feed the juveniles. Thus, an increase in rainfall favors shrimp recruitment and maintains the longevity of the fishery (Pratiwi and Sukardjo, 2018).
According to reconstruction studies, bottom trawl fisheries contributed 23% of the world's catches from 1950 to 2014 (Cashion et al., 2018). This activity expanded to the Western Atlantic regions from the 1950s onwards, and Brazil is one of the countries where shrimp fishing is carried out along the entire coast and has historical, cultural, social, and economic relevance (Branco, 2005). Despite its importance, fishery monitoring in the country remains deficient; historical records are limited and were interrupted in 2011 (MPA, 2012). It is estimated that 38,729 t of penaeid shrimp were landed in 2011, with seabob shrimp Xiphopenaeus kroyeri (Heller, 1862) accounting for 40% of this amount. Species generically classified as pink shrimp, Farfantepenaeus subtilis (Pérez-Farfante, 1967), F. brasiliensis (Latreille, 1817), and F. paulensis (Pérez-Farfante, 1967), corresponded to 27% altogether. White shrimp Litopenaeus schmitti (Burkenroad, 1936) accounted for approximately 10% of the total shrimp caught in Brazil (MPA, 2012).
The limitation of shrimp data in Brazil has often been highlighted by other authors (D'Incao et al., 2002; Vasconcellos et al., 2011). Some efforts were made to mitigate this situation by reconstructing general catch information about fish, crustaceans, and shrimp countrywide (Freire et al., 2015; 2021). However, these efforts did not focus on shrimp fishing. Therefore, they did not take into consideration the influence of environmental variables essential for these fisheries, nor the separation of species catches from broader shrimp-fishing categories. They did not judge the reliability of the reported information either.
Therefore, the present study aimed to fill these information gaps in the shrimp catch series in Brazil using generalized linear models (GLMs). These models were used predictively to estimate catch values to fill the gaps and to replace available data of low credibility. Reconstructed series are alternative input data for stock assessment analyses, since they can provide support for shrimp fishery management.

Database
Shrimp catch information from several sources was provided by government fisheries' statistical bulletins. All this information is filed at the Chico Mendes Institute of Biodiversity Conservation (Instituto Chico Mendes de Conservação da Biodiversidade-ICMBio) (MMA; IBAMA, 1990). Several catch data were [...]
Sea surface temperature (SST) near each coastal state was provided by the Physical Sciences Laboratory (PSL) of the National Oceanic and Atmospheric Administration (NOAA). Although they have a low spatial resolution (1° lat/lon), these data cover the temporal range of the shrimp catch data, back to the 1940s (PSL; NOAA, 2020).
Rainfall information was collected from the public meteorological database of the National Institute of Meteorology (Instituto Nacional de Meteorologia-INMET, 2020), which gathers meteorological information from several meteorological stations throughout Brazil.
Both SST and rainfall data refer to locations close to estuarine regions in each state of Brazil (Barioto et al., 2017) (Fig. 1).

Shrimp catch series
The amount of shrimp in fishing reports was not available in detail over the years. For a long time, all species were reported together without discrimination, under the category of marine shrimp. In more recent decades, the amounts of pink, white, and seabob shrimp started being reported separately. To account for the years when shrimp catches were poorly discriminated or not reported, a general category of shrimp fisheries was created to comprise all reported amounts of shrimp over time, including marine shrimp and the catches of pink, white, and seabob shrimp. This was done to extract the amount of each species in years when detailed reporting was not available.
Total marine catch and crustaceans were used as percentages relative to the general shrimp category to fill data gaps when no shrimp information was available. This was typically necessary for the 1940s, 1950s, and after 2008.
In states with incipient discrimination of species, species quantities were initially filled based on the percentage of each species relative to the general shrimp category. This generated quantities for a few years that could serve as input data for the generalized linear models. In this way, starting catch quantities for species with a high scarcity of discriminated data could be disaggregated from the higher-order categories using the real proportions of each one of them (Bultel et al., 2015).
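The proportional disaggregation described above can be sketched as follows. This is a minimal illustration only: the catch values are hypothetical, the function names are introduced here, and pooling the shares over all discriminated years is an assumption, since the text does not state how the percentages were averaged.

```python
# Hypothetical illustration: all numbers are made up, not taken from the bulletins.

def species_shares(discriminated):
    """Pooled share of each species in the general shrimp category,
    computed over the years in which species were reported separately."""
    totals = {}
    grand_total = 0.0
    for year_catches in discriminated.values():
        for species, tonnes in year_catches.items():
            totals[species] = totals.get(species, 0.0) + tonnes
            grand_total += tonnes
    return {species: t / grand_total for species, t in totals.items()}


def disaggregate(general_catch, shares):
    """Split a general-category catch (t) into species using fixed shares."""
    return {species: general_catch * share for species, share in shares.items()}


# Years with species-level reports (tonnes per species).
discriminated = {
    1980: {"pink": 300.0, "white": 100.0, "seabob": 600.0},
    1981: {"pink": 200.0, "white": 100.0, "seabob": 700.0},
}
shares = species_shares(discriminated)

# A year in which only the general category (800 t) was reported.
initial_1975 = disaggregate(800.0, shares)
print(shares)
print(initial_1975)
```

The species quantities obtained this way always add up to the general-category catch, which is why initial estimates derived from the same general series tend to resemble each other.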

Sea surface temperature and rainfall
Indices of means, amplitude, and coefficient of variation were created based on rainfall and SST data. These indices were used as explanatory variables to summarize annual variations into a single value that could be associated with the corresponding catch information. Monthly rainfall and SST data were analyzed separately for the dry and rainy seasons to take into account seasonal variations that occur across the states. This approach was based on studies of atmospheric systems and the influence of oceans on climate, as described in Minuzzi et al. (2007).
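As an illustration of how such seasonal indices could be computed from monthly values, consider the sketch below. The monthly rainfall values and the month ranges of each season are hypothetical, and two definitions are assumptions not stated in the text: amplitude is taken as maximum minus minimum, and the coefficient of variation uses the population standard deviation.

```python
from statistics import mean, pstdev


def seasonal_indices(monthly_values, season_months):
    """Mean, amplitude (max - min, assumed definition) and coefficient of
    variation for the months of one season within a single year."""
    values = [monthly_values[m] for m in season_months]
    avg = mean(values)
    return {
        "avg": avg,
        "amp": max(values) - min(values),
        "cv": pstdev(values) / avg,
    }


# Hypothetical monthly rainfall (mm) for one year, keyed by month number.
rain = {1: 50, 2: 80, 3: 200, 4: 300, 5: 320, 6: 280,
        7: 220, 8: 100, 9: 40, 10: 20, 11: 20, 12: 30}
RAINY = [3, 4, 5, 6, 7]   # assumed rainy-season months
DRY = [9, 10, 11, 12]     # assumed dry-season months

rainy_idx = seasonal_indices(rain, RAINY)
dry_idx = seasonal_indices(rain, DRY)
print(rainy_idx)
print(dry_idx)
```

Each year and state would then contribute one value per index and season, which can be paired with that year's catch.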

Generalized linear models
The structure of the GLMs used herein is expressed by Eq. 1:
$$g(E(Y)) = X\beta \qquad (1)$$
in which E(Y) is the expectation (mean) of the response variable Y; X is the matrix of explanatory variables; β is the vector of parameters to be estimated; and g is a link function (McCullagh and Nelder, 1989).
Two continuous probability distributions, gamma and inverse Gaussian, were tested for the response variables. Both distributions are continuous, asymmetric, and suitable for modeling a positive-valued random variable, such as the catch data. The identity and logarithmic functions were tested as link functions for both distributions.
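For reference, the two link functions and the standard mean-variance relationships of the two distributions can be written out as below. This is a textbook summary rather than a statement from the source; the symbols μ = E(Y) and φ (dispersion parameter) are introduced here.

```latex
\begin{align}
\text{identity link:}\quad & E(Y) = X\beta \\
\text{log link:}\quad & \log E(Y) = X\beta \;\Longleftrightarrow\; E(Y) = \exp(X\beta) \\
\text{gamma:}\quad & \operatorname{Var}(Y) = \phi\,\mu^{2} \\
\text{inverse Gaussian:}\quad & \operatorname{Var}(Y) = \phi\,\mu^{3}
\end{align}
```

With the log link, effects of the explanatory variables are multiplicative on the expected catch, whereas with the identity link they are additive.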
Unlike the models commonly used in time series analysis, which account for temporal dependence among observations, GLMs assume that observations are independent. Including time (years) as an explanatory variable helps address this issue; its association with catches was modeled in quadratic and sinusoidal forms to account for the non-linear relationship between time and catches.
All variables adopted herein were continuous and were incorporated into the models using a square root transformation due to the positive asymmetry of the catches and the high magnitude of the rainfall information. Accordingly, they were less likely to generate outliers in the models.
The Akaike information criterion (AIC) (Akaike, 1974) was used for term selection, starting with the model containing only the intercept and then adding variables one by one until reaching the final model with the lowest AIC value. This process determined which variables were included in the models.
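A minimal sketch of how the time terms and the square root transformation might be built is given below. The function names, the use of years counted from the start of the series, and the `period` of the sine term are all assumptions made for illustration; the text does not specify them.

```python
import math


def year_terms(year, first_year, period):
    """Linear, quadratic and sinusoidal terms for a calendar year.
    `period` (years per cycle) is an assumed value, not given in the source."""
    t = year - first_year
    return {
        "year": t,
        "year2": t ** 2,
        "sin_year": math.sin(2 * math.pi * t / period),
    }


def sqrt_transform(value):
    """Square root transformation for a non-negative continuous variable."""
    if value < 0:
        raise ValueError("square root transformation needs a non-negative value")
    return math.sqrt(value)


terms_1950 = year_terms(1950, first_year=1946, period=20)
rain_sqrt = sqrt_transform(1600.0)  # e.g. 1600 mm of rainfall
print(terms_1950, rain_sqrt)
```

The square root compresses large values such as annual rainfall totals while leaving small values comparatively unchanged, which reduces the pull of extreme observations.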
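The forward selection procedure can be sketched as follows. To keep the example self-contained, an ordinary least-squares fit with a Gaussian AIC stands in for the gamma and inverse Gaussian GLMs of the study, so this shows the selection loop only, not the authors' fitting code. The data and variable names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gaussian_aic(y, columns):
    """AIC of a least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2 for i in range(n))
    # Gaussian log-likelihood at the maximum; p coefficients plus one variance.
    loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return 2 * (p + 1) - 2 * loglik


def forward_select(y, candidates):
    """Start from the intercept-only model and add, one at a time, the term
    that lowers AIC the most; stop when no addition lowers it."""
    selected = []
    best = gaussian_aic(y, [])
    while True:
        trials = {name: gaussian_aic(y, [candidates[s] for s in selected + [name]])
                  for name in candidates if name not in selected}
        if not trials:
            break
        name = min(trials, key=trials.get)
        if trials[name] >= best:
            break
        selected.append(name)
        best = trials[name]
    return selected, best


# Hypothetical data: species catch driven mostly by the general category.
general = [10, 12, 15, 11, 18, 20, 17, 25, 22, 30]
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.1, 0.4, -0.5]
catch = [2 * g + 5 + e for g, e in zip(general, noise)]
candidates = {"general": general, "other": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]}

null_aic = gaussian_aic(catch, [])
selected, best_aic = forward_select(catch, candidates)
print(selected, best_aic)
```

In this toy data the general category enters first because it explains almost all of the variation in the response; whether the second variable is kept depends on whether it reduces the AIC further.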
Table 1. Response and explanatory variables used to formulate the models. "C" represents the catch, "sst" is sea surface temperature, "rain" is rainfall, "avg" is average, "cv" is coefficient of variation, "sin" is sine, "rain" is rainy period, "dry" is dry period and "amp" represents the amplitude.
Comparisons were made using the Akaike criterion and residual analysis to select the model with the best distribution and link function (McCullagh and Nelder, 1989; Coelho et al., 2020). Graphical analysis was performed to identify biases in the models, such as curvature in Pearson's standardized residuals. Formal tests, namely the Breusch-Pagan and Shapiro-Wilk tests (significance level of 0.05), were conducted to assess the homogeneity of variance and the normality of residuals, respectively. In addition, leverage and Cook's distances were calculated, with influential points identified by a Cook's distance greater than 0.5 (Dobson, 2002). Reported catches that caused leverage or a significant discrepancy, as described above, were classified as "low credibility" or "unreliable" information; therefore, they were excluded from the final model.
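The influence screening can be illustrated with a deliberately simplified case. The sketch below computes Cook's distance for a least-squares line with a single explanatory variable, which is a Gaussian stand-in for the GLM diagnostics used in the study; the data are hypothetical, and only the 0.5 threshold comes from the text.

```python
def cooks_distances(x, y):
    """Cook's distance of each observation for the least-squares line y = a + b x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # estimated coefficients: intercept and slope
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance
    distances = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage of this observation
        distances.append(e ** 2 / (p * s2) * h / (1 - h) ** 2)
    return distances


# Hypothetical series: general-category catch (x) and one species (y), in tonnes.
years = list(range(1980, 1988))
general = [10, 20, 30, 40, 50, 60, 70, 80]
species = [5, 10, 15, 20, 25, 30, 35, 120]  # the last report is an extreme peak

d = cooks_distances(general, species)
flagged = [year for year, di in zip(years, d) if di > 0.5]
print(flagged)
```

A report flagged in this way would be treated as low-credibility information and left out before refitting, which is the role the 0.5 threshold plays in the text.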

All analyses in the present study were carried out with the "stats" package of the R statistical computing language, using the RStudio interface (R Core Team, 2022).

Input series
The input time series comprised information from the bulletin reports, but in some cases estimated information was needed to compose the initial input due to the low availability of reports. Overall, the Northeastern region (MA, PI, CE, RN, PB, PE, AL, SE, and BA) reported smaller amounts of the assessed species than states in Southeastern and Southern Brazil, a fact that required more initial estimates. General reports started being issued in the 1940s, but species discrimination started around 1970, in very small numbers. About 60% of the 17 assessed states did not have any data available for the period from 1991 to 1994; these missing data were replaced by the arithmetic average of catches reported in bulletins from 1986 to 1989. The only states in which this procedure was not used were Pará, Piauí, Paraíba, Pernambuco, Alagoas, and Sergipe; for these six states, gaps or non-existent data were kept in the records. Besides that, recorded catches increased over time for different species, but fishery monitoring in Brazil was interrupted in 2011 (Fig. 2).
Reports started being issued in the 1980s in the coastal states forming the Northern region of Brazil (AP and PA); they were mostly related to pink shrimp. Catches recorded for the Northeastern region date back to the late 1970s, and white and seabob shrimp were the most common species in the reports. It is also important to point out that Piauí, Ceará, Rio Grande do Norte, and Paraíba states had less information available; therefore, more initial estimates were needed, mainly after 1996. By the late 1970s, considerable information about the species had already been detailed in bulletins in the Southeastern region. Seabob shrimp dominated catches in Southeastern states. Seabob and pink shrimp were also more often found in the Southern reports, which have shown the highest peaks over time (Fig. 2).
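The gap-filling rule for 1991 to 1994 amounts to the short procedure below. The catch values are hypothetical and the function name is introduced here for illustration; only the year ranges come from the text.

```python
from statistics import mean


def fill_gap_with_mean(series, gap_years, reference_years):
    """Fill missing years with the arithmetic mean of the reference years.
    Years already present in `series` are left untouched."""
    reference = mean(series[y] for y in reference_years)
    filled = dict(series)
    for year in gap_years:
        filled.setdefault(year, reference)
    return filled


# Hypothetical reported catches (t); 1991 to 1994 are missing.
catches = {1986: 100.0, 1987: 120.0, 1988: 90.0, 1989: 110.0,
           1990: 105.0, 1995: 130.0}
filled = fill_gap_with_mean(catches, range(1991, 1995), range(1986, 1990))
print(filled)
```

Because the four filled years share a single value, they carry no year-to-year information of their own, which is consistent with the arithmetic mean being left out of the plotted input series.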

Model selection and diagnosis
Overall, models with the inverse Gaussian distribution had some difficulty in converging on the dispersion parameters because the response variable (catch) presented low variance in some cases. The gamma distribution did not show such difficulties; it stood out in most models for showing the best fits and the lowest AIC values. Models were not bias-free; in most cases, some input data caused significant residuals or some leverage in the models (Cook's distance ≈ 0.5). This outcome was more evident in cases of low data availability, in which any single point ends up having a great influence on the models. Therefore, any exclusion could cause strong discrepancies in residuals.
The tests used to evaluate the normality and homoscedasticity of residuals showed that almost all models lacked enough evidence to reject the normality and homoscedasticity hypotheses. Probability values (p-values) were greater than 0.05 in most models.
Figure 2. Input time series in the models, excluding the arithmetic mean used between 1991 and 1994, showing the catches reported in statistical bulletins between 1946 and 2011 for pink shrimp (solid red line), white shrimp (solid blue line), and seabob shrimp (solid green line). Initial estimates for each species are represented by dashed lines in the same colors as the reported amounts. States and their respective regions are represented by their abbreviated names.
Concerning pink shrimp models, the general shrimp category showed the greatest explanatory power among the tested variables; therefore, it was incorporated into all models. The quadratic form showed a strong correlation with catches in cases in which the year was relevant, causing positive effects in models and creating relationships stronger than the general category in some situations. Environmental indices expressing variations in the dry and rainy seasons seem to be more closely correlated with catches in the Northeastern region. These environmental indices had both positive and negative effects on the response variables, but precipitation indices, whether based on the dry or the rainy period, had slight negative effects on catches. The gamma distribution with the log link function stood out in most fitted models, accounting for the lowest AIC.
The best models fitted for white shrimp showed similar results. The general category was relevant and induced positive effects on the response variables. Year, in its quadratic form, was also the most relevant variable; environmental indices were equally present in all states, except for Santa Catarina, where catches were only correlated with precipitation. It is worth mentioning that slight negative effects were observed for the rainy season indices (mainly those based on rainfall); overall, the ones based on the dry seasons showed positive effects on the response variable. As for probability distributions and the tested link functions, the gamma distribution stood out in almost all cases, presenting the lowest AIC values, without prevalence of either link function.
Models for seabob shrimp were quite similar to those for white shrimp. Differences were observed for the variable year, which was relevant in both the quadratic and sinusoidal forms. Environmental indices were equally present for all states, except for Santa Catarina, where variables were again based on rainfall. As for the effects, a pattern was observed for SST indices based on the dry season, which had a negative effect on the response variables; this finding suggests that an increase in temperature amplitude would be correlated with a decrease in catches. The gamma distribution was also much more relevant, in combination with the identity link function.

Model predictions
Pink shrimp reconstructions in the Northern region showed low catch amounts at the beginning of the series; the largest amounts were observed from the 1980s onwards. There was little information about pink shrimp, and some reports had to be removed from the models. The maximum of 7,596 t caught in Pará state in 2006 was replaced by the prediction of 23,026 t (approximately 200% higher); this finding indicates quantities above the reported production.
The Northeastern region showed a small increase in catches from the 1970s onwards. In some cases, such as Maranhão, Piauí, Ceará, and Paraíba states, catches rose at first and dropped in the following decades; in the other states, catches showed an upward trend over time. It is worth mentioning the extreme catch of 1,320 t reported in 1999 in Alagoas state (omitted in the graph), which was removed from the analysis and replaced by the estimate of 138 t (89% smaller). This scenario indicates lower catches in comparison to the extreme peak reported.
Reconstructions for the Southeastern region showed a significant increase in catches since the 1960s. There were more reports on pink shrimp in the Southeastern region than in the Northeastern region, but several outliers were removed from the analyses, for example, the extreme peak of 2,530 t in 1985 in São Paulo state, which was replaced by the 1,755 t prediction (30% smaller).
Wide catch variations were observed in the Southern region, with the greatest catch increases observed between the 1960s and 1980s. The extreme peak of 627 t (omitted in the graph) reported in 1982 in Paraná state, which caused strong leverage in the model, stood out and was replaced by the 10 t estimate (98% smaller) (Fig. 3).
Just one report was available for white shrimp catch in Pará state (4 t, in 1986), but it was replaced by the 410 t estimate, which is much higher than what was reported in that same year (Fig. 4).
Predictions for the Northeastern region showed a growth trend over the years, except for Ceará, Piauí, and Paraíba states, which started at low catches and recorded an increase between the 1960s and 1980s and a decline afterward. Rio Grande do Norte state stood out with extreme catches of around 140 t in 1981 and 124 t in 1982, but these numbers were replaced by the 47 t (66% smaller) and 46 t (62% smaller) predictions, respectively.
Reconstructions for seabob shrimp in Northern Brazil showed situations similar to those of white shrimp, namely a greater increase in the 1970s and 1980s, and a decrease in the 1990s. There was little information about the species, particularly in Pará state, where isolated reports of 12 t and 2 t, in 1983 and 1984, were replaced by the 303 t and 753 t estimates, which were much higher than those indicated in the bulletins.

[...] for Ceará state, in 1986, was excluded and replaced by the 11 t estimate (63% smaller).
Catches in the Southeastern region showed a significant increase in the 1970s, followed by significant depletion in the late 1980s. Records of 30 t, in 1999, in Espírito Santo state, and of 1,854 t, in 1990, in São Paulo state, were excluded and replaced by the 77 t (156% higher) and 2,639 t (42% higher) estimates; these were not too different (in absolute terms) from what was reported, despite the visible difference in percentages.
The minimum report of 17 t, in 1985, in Rio Grande do Sul state, stands out, but it was excluded and replaced by a peak of 1,470 t (8,500% higher). These numbers indicate amounts much higher than those reported in the Southern bulletins (Fig. 5).
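The percentages quoted in this section follow from the relative difference between the prediction and the report it replaces. The short check below recomputes three of them from the tonnages given in the text; the function name is introduced here, and the text reports these figures rounded or truncated (about 200% higher, 89% smaller, and 8,500% higher).

```python
def percent_change(reported, predicted):
    """Relative difference of the prediction with respect to the report, in %."""
    return (predicted - reported) / reported * 100


# (reported t, predicted t) pairs taken from the text.
cases = {
    "Pará, 2006": (7596, 23026),
    "Alagoas, 1999": (1320, 138),
    "Rio Grande do Sul, 1985": (17, 1470),
}
changes = {label: percent_change(r, p) for label, (r, p) in cases.items()}
for label, value in changes.items():
    print(f"{label}: {value:+.1f}%")
```

A decrease can never exceed 100%, whereas an increase is unbounded, which is why replacing very small reports, such as 17 t, produces percentages in the thousands even when the absolute difference is moderate.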

DISCUSSION
The first official fishery monitoring information in Brazil dates to 1946. More detailed information, such as registered fishermen and vessels, was added to these bulletins over the years, mainly between 1970 and 1980. During this period, shrimp catch was included in the bulletins as marine shrimp. Later, this category was divided into pink, white, and seabob shrimp, which meant an update in monitoring systems (IBGE, 1980).
The models built herein were an attempt to overcome the aforementioned problems. They were composed of different distributions and link functions because catch data can be derived from distinct sources and statistical collection programs, showing distinct probability forms. The tests indicated that the gamma distribution was the most appropriate for the behavior of the catch data in the present study.
These findings offer alternatives for modeling catch data and dealing with data reconstruction, and give more flexibility to solve problems by taking into account distinct distributions, link functions, and non-linear relationships between catches and the explanatory variables. Several reconstruction studies are restricted to using linear estimates between catch and time (Freire et al., 2021; Ralston et al., 2011; Tesfamichael and Pauly, 2011; Zeller et al., 2016). Such assumptions restrict the catch distribution to a Gaussian type, which may not hold given the asymmetric characteristics of catch data. Additionally, linear relationships between response and explanatory variables may not reveal some real non-linear patterns in the data. Besides that, variables such as SST, rainfall, wind, or chlorophyll are important to take into account in environmentally driven fisheries.
Rainfall was the main variable that influenced catches and was the most common variable in the fitted models, but SST also showed some explanatory power, and its indices, mainly those based on dry seasons, stood out with significant negative effects on the response variable. This finding may indicate that an increase in temperature during the less rainy seasons may be related to a decrease in catches over time. Several studies have already reported the impacts of climate change on fisheries worldwide (Marengo et al., 2011; Sumaila and Lam, 2020). Equatorial Brazilian regions may be the most affected. A sharp drop in rainfall of approximately 40% and a significant increase in air temperature (higher than 3°C) are expected to occur between 2070 and 2100 (Marengo et al., 2011). These changes tend to modify stock abundance in shrimp fisheries, since high temperatures may exceed the thermal tolerance limits of shrimp species, and rainfall reduction could mean a lower supply of nutrients in estuarine areas to feed juvenile shrimp (Lopes et al., 2018).
In the 1960s, 1970s, and 1980s, there were improvements in data collection systems (IBGE, 1980), which resulted in strong distinctions between discriminated quantities (since the discrimination system was still in its early stages) and non-discriminated catches (encompassed by the general category). As the models included the general category, the predictions were driven by this category and were quite different from the incipient discriminated quantities. Therefore, the catch disaggregation among species made by the models indicated that the amounts of pink, white, and seabob shrimp would be much higher than those reported in the 1960s, 1970s, and 1980s. The distinctions found in these decades also translated into the gain of information, which exceeded 100% for several states due to the delay in advancing catch discrimination.
Significant catch increases were observed due to government subsidies that helped develop fisheries in Brazil around the 1970s (Santos, 2007; SUDEPE-PDP, 1987). Thereafter, a strong decline was observed in shrimp fisheries, likely due to stock depletion, which led to the creation of closed-season measures in the Northern region (Aragão et al., 2015) and to the prohibition of bottom trawling inside bays, fishing licenses, closed seasons, and the creation of conservation units in Southeastern and Southern Brazil (Begossi et al., 2006). However, this pattern was not observed in the Northeastern region. Some catch time series have indicated an upward trend there, without a subsequent strong decline, which may point to low depletion in these fisheries over time (Lopes et al., 2014).
Species predominance in data reports for some states probably results from their higher occurrence and consequent economic relevance, such as pink shrimp (F. subtilis) in the North (Aragão et al., 2015), seabob shrimp (X. kroyeri) in the Northeast (MPA, 2012), and seabob shrimp and pink shrimp (F. brasiliensis and F. paulensis) in the Southeast (Branco, 2005). The other species have lower occurrence; hence, less attention was given to their amounts. Initial catch estimates were more necessary for some states, such as Pará and Amapá (except for pink shrimp), and for Piauí, Ceará, Rio Grande do Norte, and Paraíba states. This outcome resulted from the low development of these fisheries over time, which deal with lower and mainly artisanal catches, and with scarce information about this activity (Santos et al., 2016). This makes reconstructions more difficult, given that no report was available; therefore, more initial estimates were needed to help build the models, making the predictions very similar to one another (since, within a state, they derive from the same series of general shrimp catches) and the reconstructions less reliable. Hence, further uses of these data should be made with caution.

CONCLUSIONS
Modeling approaches to reconstruction problems offer an alternative to fill the large gap in data about shrimp and other fisheries. Besides that, they enable judgment about the reliability of data reported in official fishery statistics. Thus, the estimates can be used, carefully, to assess stock status and to support better fishery management practices that help maintain the sustainability of shrimp fishing, although new studies including reconstructions after 2011 are also necessary.

Figure 1. Spatial distribution of sea surface temperature (SST) estimates (in red) and rainfall data from meteorological stations (in blue). The scale of gray indicates the coastal states included in the analysis and their respective regions.
Figure 3. Reconstructed pink shrimp series. The prediction is represented in black, with intervals of approximately 95% of the estimates (in red). Reported information incorporated into the model is represented by solid blue lines, and information excluded from the analysis is represented by a circle. States and their respective regions are represented by their abbreviated names.
[...] approximately 170 t (46% smaller), and it indicates a catch decrease in this time interval.

Figure 4. Reconstructed white shrimp series. The prediction is represented in black, with intervals of approximately 95% of the estimates (in light blue). Reported information incorporated into the models is represented by a solid blue line; excluded information is represented by a circle. States and their respective regions are represented by their abbreviated names.

Figure 5. Reconstructed seabob shrimp series. The prediction is represented in black, with intervals of approximately 95% of the estimates (in green). Reported information incorporated into the model is represented by a solid blue line; excluded information is represented by a circle. States and their respective regions are represented by their abbreviated names.
Among the residual tests, only the pink shrimp model for São Paulo state presented sufficient evidence to reject the hypothesis of homoscedastic residuals, and only the white shrimp model for Santa Catarina state presented evidence to reject the hypothesis of normality of residuals.