Anti-Scraping (Part 4)

Site 4: Bypassing the Anti-Scraping Protection on the Penalty Information Pages

When you request the list page in Chrome and watch the network panel while refreshing, you can see that the browser is first sent to an intermediate URL, http://www.pbc.gov.cn/WZWSREL3poZW5nd3Vnb25na2FpLzEyNzkyNC8xMjgwNDEvMjE2MTQyMS9pbmRleC5odG1s?wzwschallenge=V1pXU19DT05GSVJNX1BSRUZJWF9MQUJFTDUxODU2NjI=, and only then redirected back to the list page.

Let's look at what happens in between:

  1. Request the list page http://www.pbc.gov.cn/zhengwugongkai/127924/128041/2161421/index.html

    The request headers carry no Cookie. The response sets the following Cookie: wzws_cid=6fe01577596a409d77f565c225c8c5a664a89799db7d0985a62413f32f6ba6e2d15be123681d18238c4216ea84a3b705e2a6611466562bf4c66a22358bff5b241873a84dab388b5da80fc2ecf6a2acf41f764692145f8bb5a57762e2fe08c49c; path=/; expires=Thu, 19 Sep 2019 08:41:13 GMT

    The response body is an HTML file carrying a large amount of obfuscated JS;

  2. The browser then automatically requests http://www.pbc.gov.cn/WZWSREL3poZW5nd3Vnb25na2FpLzEyNzkyNC8xMjgwNDEvMjE2MTQyMS9pbmRleC5odG1s?wzwschallenge=V1pXU19DT05GSVJNX1BSRUZJWF9MQUJFTDUxODU2NjI=

    The request headers carry the Cookie:

    wzws_cid=6fe01577596a409d77f565c225c8c5a664a89799db7d0985a62413f32f6ba6e2d15be123681d18238c4216ea84a3b705e2a6611466562bf4c66a22358bff5b241873a84dab388b5da80fc2ecf6a2acf41f764692145f8bb5a57762e2fe08c49c

    The response sets the Cookie:

    wzws_cid=6fe01577596a409d77f565c225c8c5a664a89799db7d0985a62413f32f6ba6e2d15be123681d18238c4216ea84a3b705b81cba9208ba6c30f9c7a9ed07941d8e; path=/; expires=Thu, 19 Sep 2019 08:41:13 GMT

    The response this time is a 302 redirect;

  3. The browser automatically requests the list page again: http://www.pbc.gov.cn/zhengwugongkai/127924/128041/2161421/index.html

    The request headers carry the Cookie:

    wzws_cid=6fe01577596a409d77f565c225c8c5a664a89799db7d0985a62413f32f6ba6e2d15be123681d18238c4216ea84a3b705b81cba9208ba6c30f9c7a9ed07941d8e

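    The response sets no Cookie;

    but it does return the list page content we are after.

From these observations we can conclude that step 2 exists to hand us the real Cookie: the list page can only be fetched successfully with the Cookie returned by step 2. So each time we only need to perform step 2 once to obtain the cookie. But how is the step 2 URL generated?

I made three requests and copied the step 2 URL from each (host prefix omitted):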

/WZWSREL3poZW5nd3Vnb25na2FpLzEyNzkyNC8xMjgwNDEvMjE2MTQyMS9pbmRleC5odG1s?wzwschallenge=V1pXU19DT05GSVJNX1BSRUZJWF9MQUJFTDg2MzAxMQ==

/WZWSREL3poZW5nd3Vnb25na2FpLzEyNzkyNC8xMjgwNDEvMjE2MTQyMS9pbmRleC5odG1s?wzwschallenge=V1pXU19DT05GSVJNX1BSRUZJWF9MQUJFTDIwMjU3MjM=

/WZWSREL3poZW5nd3Vnb25na2FpLzEyNzkyNC8xMjgwNDEvMjE2MTQyMS9pbmRleC5odG1s?wzwschallenge=V1pXU19DT05GSVJNX1BSRUZJWF9MQUJFTDYxNjI1MzU=

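Looking closely, what changes is the value of the wzwschallenge parameter. More precisely, its leading part V1pXU19DT05GSVJNX1BSRUZJWF9MQUJFTD never changes; only the last eight or nine characters (not counting the = padding) differ.

The value of wzwschallenge looks a lot like Base64, so let's decode all three: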

>>> import base64
>>> base64.b64decode('V1pXU19DT05GSVJNX1BSRUZJWF9MQUJFTDg2MzAxMQ==')
b'WZWS_CONFIRM_PREFIX_LABEL863011'

>>> base64.b64decode('V1pXU19DT05GSVJNX1BSRUZJWF9MQUJFTDIwMjU3MjM=')
b'WZWS_CONFIRM_PREFIX_LABEL2025723'

>>> base64.b64decode('V1pXU19DT05GSVJNX1BSRUZJWF9MQUJFTDYxNjI1MzU=')
b'WZWS_CONFIRM_PREFIX_LABEL6162535'

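Comparing the decoded values shows that only the trailing number changes (six or seven digits in these samples). The question now is how to obtain the wzwschallenge value that changes on every request, or more precisely that trailing number.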
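The comparison above can also be automated. The following is a minimal sketch: the helper name `challenge_number` is my own, and it assumes the decoded value always has the form `WZWS_CONFIRM_PREFIX_LABEL` followed by digits, which is all that the three samples show.

```python
import base64
import re

# The three step 2 URLs recorded above (host prefix omitted).
PATH = "/WZWSREL3poZW5nd3Vnb25na2FpLzEyNzkyNC8xMjgwNDEvMjE2MTQyMS9pbmRleC5odG1s"
SAMPLES = [
    PATH + "?wzwschallenge=V1pXU19DT05GSVJNX1BSRUZJWF9MQUJFTDg2MzAxMQ==",
    PATH + "?wzwschallenge=V1pXU19DT05GSVJNX1BSRUZJWF9MQUJFTDIwMjU3MjM=",
    PATH + "?wzwschallenge=V1pXU19DT05GSVJNX1BSRUZJWF9MQUJFTDYxNjI1MzU=",
]


def challenge_number(url):
    """Return the number hidden in a step 2 URL (hypothetical helper)."""
    # Everything after "wzwschallenge=" is the Base64 payload.
    encoded = url.split("wzwschallenge=", 1)[1]
    decoded = base64.b64decode(encoded).decode("ascii")
    # Assumed format: fixed label followed by digits only.
    match = re.fullmatch(r"WZWS_CONFIRM_PREFIX_LABEL(\d+)", decoded)
    if match is None:
        raise ValueError("unexpected challenge format: %r" % decoded)
    return int(match.group(1))


for sample in SAMPLES:
    print(challenge_number(sample))
```

Running it prints the three trailing numbers, which makes it easy to confirm that nothing else in the URL varies between requests.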

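That brings us back to the JS returned by the first request: it is almost certainly this code that generates the wzwschallenge value.

In the Sources panel of Chrome DevTools, set a Script event listener breakpoint on the page, so that execution pauses when this JS runs and we can inspect it, as shown in the figure:

Note: if refreshing does not pause at this JS, the cause is caching. Clear the cache and force a refresh with Ctrl+F5.

We can then see the complete JS code: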

eval(function(p, a, c, k, e, r) {
e = function(c) {
return c.toString(a)
}
;
if (!''.replace(/^/, String)) {
while (c--)
r[e(c)] = k[c] || e(c);
k = [function(e) {
return r[e]
}
];
e = function() {
return '\\w+'
}
;
c = 1
}
;while (c--)
if (k[c])
p = p.replace(new RegExp('\\b' + e(c) + '\\b','g'), k[c]);
return p
}('0 1="2";0 3="4";0 5="6";0 7="8";0 9="a";', 11, 11, 'var|dynamicurl|/WZWSREL3poZW5nd3Vnb25na2FpLzEyNzkyNC8xMjgwNDEvMjE2MTQyMS9pbmRleC5odG1s|wzwsquestion|;YSl^]a|wzwsfactor|2963|wzwsmethod|WZWS_METHOD|wzwsparams|WZWS_PARAMS'.split('|'), 0, {}));

var encode_version = 'sojson.v5'
, jezoh = '__0x3fb5e'
, __0x3fb5e = ['dcK9wotew5nCu2wvw6nCmsOvQcOONsOk', 'K8Kow4fDhzDDqwdh', 'UAATJSU=', 'wr8gw5HCqWw=', 'G8KzKhLDkA==', 'wrLDisOUw4HDiTTCnsKnwqHCg8O2w7XClg==', 'LmrDog4=', 'e8Ora13Dow==', 'wodfacKQw5o=', 'w74Sw5FreA==', 'wr94w6LDhMOgw4E=', 'wpkHw53DgsKKwrHDhcKbQ8Kpwp8=', 'dWPDons=', 'w7kbw7vDgMKb', 'w6DDkFFwwp/Cq3jCjUXDsW8=', 'TBIbBAfDtw==', 'wok5w7/ChDZV', 'wq3CvlzCtw==', 'wrHDsgzClQ==', 'IcORUmfDlcOPDsOSwr06fMKgBMKcTQ==', 'CgjDpSkw', 'w5oWw5vDhMKk', 'CcK4wpLDlEnCjnXClg==', 'w7zDhsKwTMOW', 'w7jDpFXCvcKm', 'wrTDlsOUw6rDtA==', 'w4bDn8KcXsOQVVHDkw==', 'bMOAwr3CsVzDksKTcAc=', 'wodAb8KKw4HCrDBoaA==', 'wrDDlMOUw5LDiQ==', '5Lm26ICj5Yu/6ZmGw5zDgxHCnMOywpZDM8KD', 'VQPChSVsbsOvWMODRMOlwqBAWMKz', 'U8K8TsOnHsKOWMOpb11CwpjDkcOJZTTChAbCixvDtcO0wplFwoZdwrswWcOiwq1sJsOnw50VHhfDgwXDoMKmDBTClsOkJ1RBKkc3YzQYw4zDuEUgY0xEX8KXwrU=', 'wpZYQsKjw7Y=', 'wqLDiMOdw4nDiQ==', 'wpLCqMO8wpPCiQ==', 'w54Aw5bDqsKD', 'K23DsRErcg==', 'eETDt1Nj', 'w7XDhsOhwpfCrg==', 'LBMPdFk=', 'woXDmQ3Cu8Kl', 'eGrDrmxAdA==', 'bcKBwpPDmz3Clw==', 'KmzDghcL', 'w6vDu8K7MQY=', 'wr4+w6fCph0=', 'WsKSWMO7EQ==', 'WMKNwp7Dojg=', 'wosyw6rCnxhP', 'w7LDhcK8FTI=', 'wqDCuVPCoFbCgcOgbm3Dkw==', 'wq4Hw6HCoTDDlg==', 'TgxLwqPDtg==', 'wqXCoVfClmY=', 'wptsw7vDj8OB', 'wqVow4bDt8O5', 'QcOma8O2w4U=', 'D8Ohwo/DvjRK', 'wpZLX8Kyw5o=', 'NcKjDgPDjw==', 'E8K1RcO1w4U1', 'XA8YGy3DrMKqw49/w5A=', 'woctw4bChj0=', 'CMKXw6jDoA7Dmi1Ow4xUGMK4YMKwacO4SMKKc3ldwrQDw4RG'];
(function(_0x5bc68b, _0x259158) {
var _0x102152 = function(_0x1797a6) {
while (--_0x1797a6) {
_0x5bc68b['push'](_0x5bc68b['shift']());
}
};
_0x102152(++_0x259158);
}(__0x3fb5e, 0x123));
var _0x56ae = function(_0xca96c7, _0x241ea9) {
_0xca96c7 = _0xca96c7 - 0x0;
var _0x57cca1 = __0x3fb5e[_0xca96c7];
if (_0x56ae['initialized'] === undefined) {
(function() {
var _0x228394 = typeof window !== 'undefined' ? window : typeof process === 'object' && typeof require === 'function' && typeof global === 'object' ? global : this;
var _0x356c10 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';
_0x228394['atob'] || (_0x228394['atob'] = function(_0x16460d) {
var _0x4e207e = String(_0x16460d)['replace'](/=+$/, '');
for (var _0x15f638 = 0x0, _0x2abf93, _0x3df9f8, _0x479e2a = 0x0, _0x411a0f = ''; _0x3df9f8 = _0x4e207e['charAt'](_0x479e2a++); ~_0x3df9f8 && (_0x2abf93 = _0x15f638 % 0x4 ? _0x2abf93 * 0x40 + _0x3df9f8 : _0x3df9f8,
_0x15f638++ % 0x4) ? _0x411a0f += String['fromCharCode'](0xff & _0x2abf93 >> (-0x2 * _0x15f638 & 0x6)) : 0x0) {
_0x3df9f8 = _0x356c10['indexOf'](_0x3df9f8);
}
return _0x411a0f;
}
);
}());
var _0x172d34 = function(_0xa28d48, _0x346449) {
var _0x55c23f = [], _0x3809ab = 0x0, _0x5298ee, _0x3c825f = '', _0x8b8e9a = '';
_0xa28d48 = atob(_0xa28d48);
for (var _0xee1bef = 0x0, _0x3023b5 = _0xa28d48['length']; _0xee1bef < _0x3023b5; _0xee1bef++) {
_0x8b8e9a += '%' + ('00' + _0xa28d48['charCodeAt'](_0xee1bef)['toString'](0x10))['slice'](-0x2);
}
_0xa28d48 = decodeURIComponent(_0x8b8e9a);
for (var _0x308939 = 0x0; _0x308939 < 0x100; _0x308939++) {
_0x55c23f[_0x308939] = _0x308939;
}
for (_0x308939 = 0x0; _0x308939 < 0x100; _0x308939++) {
_0x3809ab = (_0x3809ab + _0x55c23f[_0x308939] + _0x346449['charCodeAt'](_0x308939 % _0x346449['length'])) % 0x100;
_0x5298ee = _0x55c23f[_0x308939];
_0x55c23f[_0x308939] = _0x55c23f[_0x3809ab];
_0x55c23f[_0x3809ab] = _0x5298ee;
}
_0x308939 = 0x0;
_0x3809ab = 0x0;
for (var _0x66c563 = 0x0; _0x66c563 < _0xa28d48['length']; _0x66c563++) {
_0x308939 = (_0x308939 + 0x1) % 0x100;
_0x3809ab = (_0x3809ab + _0x55c23f[_0x308939]) % 0x100;
_0x5298ee = _0x55c23f[_0x308939];
_0x55c23f[_0x308939] = _0x55c23f[_0x3809ab];
_0x55c23f[_0x3809ab] = _0x5298ee;
_0x3c825f += String['fromCharCode'](_0xa28d48['charCodeAt'](_0x66c563) ^ _0x55c23f[(_0x55c23f[_0x308939] + _0x55c23f[_0x3809ab]) % 0x100]);
}
return _0x3c825f;
};
_0x56ae['rc4'] = _0x172d34;
_0x56ae['data'] = {};
_0x56ae['initialized'] = !![];
}
var _0x190c72 = _0x56ae['data'][_0xca96c7];
if (_0x190c72 === undefined) {
if (_0x56ae['once'] === undefined) {
_0x56ae['once'] = !![];
}
_0x57cca1 = _0x56ae['rc4'](_0x57cca1, _0x241ea9);
_0x56ae['data'][_0xca96c7] = _0x57cca1;
} else {
_0x57cca1 = _0x190c72;
}
return _0x57cca1;
};
function _0x412a72(_0x2a28c0) {
var _0x4257c9 = {
'bwGZX': _0x56ae('0x0', 'jo5I'),
'mGirf': function _0x2eb028(_0x5ab0bc, _0x5505f4) {
return _0x5ab0bc < _0x5505f4;
},
'hOkXt': function _0x16449b(_0x22286c, _0x41c8cd) {
return _0x22286c & _0x41c8cd;
},
'RJeYY': function _0x24beb6(_0x59303b, _0x576d3b) {
return _0x59303b == _0x576d3b;
},
'cFxMb': function _0x45b03c(_0xadce3d, _0x5416a9) {
return _0xadce3d >> _0x5416a9;
},
'spzgJ': function _0x3c313d(_0x19fd11, _0xcacabb) {
return _0x19fd11 << _0xcacabb;
},
'VdlKD': function _0x2427d5(_0x23b25b, _0x23b39e) {
return _0x23b25b & _0x23b39e;
},
'VDeWo': function _0x1ef1b0(_0x476993, _0x40dd2a) {
return _0x476993 == _0x40dd2a;
},
'gHLRp': function _0x16afb3(_0x4bdebb, _0x1065a7) {
return _0x4bdebb >> _0x1065a7;
},
'biRta': function _0x301047(_0x2ada60, _0x1c4232) {
return _0x2ada60 | _0x1c4232;
},
'oKMpY': function _0x1d0b02(_0x547e37, _0x500868) {
return _0x547e37 << _0x500868;
},
'HlUXJ': function _0x21902c(_0x16ae1a, _0x466bbf) {
return _0x16ae1a >> _0x466bbf;
},
'vuJTm': function _0x2fea95(_0x34f7b5, _0x59e46f) {
return _0x34f7b5 << _0x59e46f;
},
'lHuwG': function _0x1339d0(_0x3c775a, _0x3450ae) {
return _0x3c775a >> _0x3450ae;
},
'fpeDs': function _0x52b661(_0x318fc3, _0x59aa7b) {
return _0x318fc3 & _0x59aa7b;
},
'HqwlU': function _0x2144ca(_0x4799d4, _0x25b745) {
return _0x4799d4 | _0x25b745;
},
'nPBKx': function _0x42b833(_0xe339b1, _0x5c500c) {
return _0xe339b1 & _0x5c500c;
},
'ZRhVT': function _0xc9529d(_0x5ed560, _0x4383da) {
return _0x5ed560 & _0x4383da;
},
'bdZKt': _0x56ae('0x1', '5jBa')
};
var _0x6c47cd = _0x4257c9[_0x56ae('0x2', 'LFWf')][_0x56ae('0x3', 'Q@8l')]('|')
, _0x3a5836 = 0x0;
while (!![]) {
switch (_0x6c47cd[_0x3a5836++]) {
case '0':
_0x27d1f5 = '';
continue;
case '1':
var _0x27d1f5, _0x4262d0, _0xc876d4;
continue;
case '2':
_0x4262d0 = 0x0;
continue;
case '3':
while (_0x4257c9[_0x56ae('0x4', '*h#g')](_0x4262d0, _0xc876d4)) {
_0x5526a7 = _0x4257c9[_0x56ae('0x5', 'a6w(')](_0x2a28c0['charCodeAt'](_0x4262d0++), 0xff);
if (_0x4257c9['RJeYY'](_0x4262d0, _0xc876d4)) {
_0x27d1f5 += _0x2097d8[_0x56ae('0x6', ')Z%%')](_0x4257c9[_0x56ae('0x7', 'iAGA')](_0x5526a7, 0x2));
_0x27d1f5 += _0x2097d8['charAt'](_0x4257c9[_0x56ae('0x8', 'IM$w')](_0x4257c9[_0x56ae('0x9', 'Dk(l')](_0x5526a7, 0x3), 0x4));
_0x27d1f5 += '==';
break;
}
_0x138cf5 = _0x2a28c0['charCodeAt'](_0x4262d0++);
if (_0x4257c9[_0x56ae('0xa', 'HLR(')](_0x4262d0, _0xc876d4)) {
_0x27d1f5 += _0x2097d8[_0x56ae('0xb', 'iAGA')](_0x4257c9['gHLRp'](_0x5526a7, 0x2));
_0x27d1f5 += _0x2097d8[_0x56ae('0xc', 'j%QO')](_0x4257c9[_0x56ae('0xd', ')Z%%')](_0x4257c9[_0x56ae('0xe', 'L6ge')](_0x4257c9[_0x56ae('0xf', '02EH')](_0x5526a7, 0x3), 0x4), _0x4257c9[_0x56ae('0x10', '5jBa')](_0x4257c9[_0x56ae('0x11', 'j%QO')](_0x138cf5, 0xf0), 0x4)));
_0x27d1f5 += _0x2097d8[_0x56ae('0x12', '02EH')](_0x4257c9[_0x56ae('0x13', 'L6ge')](_0x4257c9['VdlKD'](_0x138cf5, 0xf), 0x2));
_0x27d1f5 += '=';
break;
}
_0x4093e6 = _0x2a28c0[_0x56ae('0x14', '%FZJ')](_0x4262d0++);
_0x27d1f5 += _0x2097d8[_0x56ae('0x15', 'd2rH')](_0x4257c9['lHuwG'](_0x5526a7, 0x2));
_0x27d1f5 += _0x2097d8['charAt'](_0x4257c9[_0x56ae('0x16', 'Zp5!')](_0x4257c9['VdlKD'](_0x5526a7, 0x3) << 0x4, _0x4257c9[_0x56ae('0x17', '%FZJ')](_0x138cf5, 0xf0) >> 0x4));
_0x27d1f5 += _0x2097d8[_0x56ae('0x12', '02EH')](_0x4257c9[_0x56ae('0x18', '*FHt')](_0x4257c9[_0x56ae('0x19', '*FHt')](_0x4257c9['nPBKx'](_0x138cf5, 0xf), 0x2), _0x4257c9[_0x56ae('0x1a', 'scqQ')](_0x4093e6, 0xc0) >> 0x6));
_0x27d1f5 += _0x2097d8[_0x56ae('0x1b', 'eygr')](_0x4257c9['ZRhVT'](_0x4093e6, 0x3f));
}
continue;
case '4':
return _0x27d1f5;
case '5':
_0xc876d4 = _0x2a28c0['length'];
continue;
case '6':
var _0x5526a7, _0x138cf5, _0x4093e6;
continue;
case '7':
var _0x2097d8 = _0x4257c9[_0x56ae('0x1c', 'LFWf')];
continue;
}
break;
}
}
function _0x344cd4() {
var _0x53d9fc = {
'GjCbS': function _0x1a0314(_0x33da81, _0xe25eb5) {
return _0x33da81 < _0xe25eb5;
},
'JBFUL': function _0x1af799(_0x51aa2f, _0x2e4887) {
return _0x51aa2f + _0x2e4887;
}
};
var _0x3c9135 = 0x0;
var _0x43beea = 0x0;
for (_0x43beea = 0x0; _0x53d9fc[_0x56ae('0x1d', 'uGC9')](_0x43beea, wzwsquestion[_0x56ae('0x1e', 'V2r4')]); _0x43beea++) {
_0x3c9135 += wzwsquestion[_0x56ae('0x1f', '!2cw')](_0x43beea);
}
_0x3c9135 *= wzwsfactor;
_0x3c9135 += 0x1b207;
return _0x53d9fc[_0x56ae('0x20', 'd2rH')](_0x56ae('0x21', 'Rau%'), _0x3c9135);
}
function _0x2ff265(_0x26b826, _0xea8bd1) {
var _0x253f74 = {
'ogjLK': _0x56ae('0x22', 'Qy14'),
'izgsL': 'post',
'eMCME': function _0x3b581c(_0xd2391, _0x1a9ef1) {
return _0xd2391 != _0x1a9ef1;
},
'aCWaI': function _0x5c65fc(_0x1402c7, _0x41e446) {
return _0x1402c7 < _0x41e446;
},
'OTFrl': _0x56ae('0x23', 'Rau%')
};
var _0x370b5e = _0x253f74[_0x56ae('0x24', '!2cw')][_0x56ae('0x25', 'i[Ts')]('|')
, _0x1ba457 = 0x0;
while (!![]) {
switch (_0x370b5e[_0x1ba457++]) {
case '0':
_0x15a9ed['method'] = _0x253f74[_0x56ae('0x26', 'uGC9')];
continue;
case '1':
return _0x15a9ed;
case '2':
var _0x15a9ed = document[_0x56ae('0x27', 'Q@8l')](_0x56ae('0x28', ')Z%%'));
continue;
case '3':
if (_0x253f74[_0x56ae('0x29', 'YXCs')](_0xea8bd1['search']('='), -0x1)) {
var _0x573df6 = _0xea8bd1[_0x56ae('0x2a', 'LFWf')]('&');
for (var _0x426cb4 = 0x0; _0x253f74[_0x56ae('0x2b', '57vf')](_0x426cb4, _0x573df6[_0x56ae('0x2c', '*FHt')]); _0x426cb4++) {
var _0x3ddbc7 = _0x56ae('0x2d', 'V]Be')['split']('|')
, _0x1fdb10 = 0x0;
while (!![]) {
switch (_0x3ddbc7[_0x1fdb10++]) {
case '0':
_0x2a293f[_0x56ae('0x2e', 'iAGA')] = _0x422f0a[0x0];
continue;
case '1':
var _0x2a293f = document['createElement'](_0x253f74[_0x56ae('0x2f', 'a6w(')]);
continue;
case '2':
var _0x422f0a = _0x8ad1c0['split']('=');
continue;
case '3':
var _0x8ad1c0 = _0x573df6[_0x426cb4];
continue;
case '4':
_0x15a9ed[_0x56ae('0x30', 'WuNj')](_0x2a293f);
continue;
case '5':
_0x2a293f['value'] = _0x422f0a[0x1];
continue;
}
break;
}
}
}
continue;
case '4':
_0x15a9ed[_0x56ae('0x31', '!2cw')]();
continue;
case '5':
_0x15a9ed[_0x56ae('0x32', '02EH')] = _0x26b826;
continue;
case '6':
_0x15a9ed['style']['display'] = _0x56ae('0x33', '%FZJ');
continue;
case '7':
document[_0x56ae('0x34', 'HLR(')]['appendChild'](_0x15a9ed);
continue;
}
break;
}
}
function _0x33f22a() {
var _0x532424 = {
'hwQpj': function _0x3b4af9(_0x2ff2ab) {
return _0x2ff2ab();
},
'lYfvS': function _0x242f23(_0x57f673, _0x33b4b3) {
return _0x57f673(_0x33b4b3);
},
'VvOsr': function _0x33a26c(_0xb8a476, _0x580dd6) {
return _0xb8a476 + _0x580dd6;
},
'vOmWg': _0x56ae('0x35', 'YXCs'),
'LaaBO': function _0x1b637c(_0x5c57e1, _0x41b90a) {
return _0x5c57e1 == _0x41b90a;
},
'eneJI': 'post'
};
var _0xb14971 = _0x532424[_0x56ae('0x36', 'jo5I')](_0x344cd4);
var _0x10ace8 = _0x532424[_0x56ae('0x37', 'a6w(')](_0x412a72, _0xb14971[_0x56ae('0x38', '*8t[')]());
var _0x35ace3 = _0x532424[_0x56ae('0x39', ')9A&')](dynamicurl, _0x532424[_0x56ae('0x3a', 'N&Yh')]) + _0x10ace8;
if (_0x532424['LaaBO'](wzwsmethod, _0x532424[_0x56ae('0x3b', 'Q@8l')])) {
_0x2ff265(_0x35ace3, wzwsparams);
} else {
window[_0x56ae('0x3c', ')9A&')] = _0x35ace3;
}
}
_0x33f22a();
;if (!(typeof encode_version !== _0x56ae('0x3d', 'QE(m') && encode_version === _0x56ae('0x3e', 'LFWf'))) {
window[_0x56ae('0x3f', 'Q@8l')](_0x56ae('0x40', 'YtnB'));
}
;encode_version = 'sojson.v5';

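After three long seconds spent calming down, I started studying this thoroughly unreadable JS code.

First we need to find the entry point. Reading from top to bottom, eval(function(p, a, c, k, e, r)... is executed first, and after that execution starts from the function _0x33f22a().

Set a breakpoint inside _0x33f22a() and inspect the variable values, as shown in the figure below:

The variable _0xb14971 has the value WZWS_CONFIRM_PREFIX_LABEL4914759. This looks familiar: it has the same form as the values we obtained earlier by Base64 decoding, and Base64-encoding it yields the wzwschallenge value.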
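The relationship between the label and the step 2 URL can be checked in Python. This is only a sketch: the function names are mine, and the `?wzwschallenge=` separator is inferred from the URLs observed earlier rather than decoded from the obfuscated string table.

```python
import base64

PREFIX = "WZWS_CONFIRM_PREFIX_LABEL"


def encode_challenge(number):
    """Base64-encode the label plus number, as the page script appears to do."""
    raw = (PREFIX + str(number)).encode("ascii")
    return base64.b64encode(raw).decode("ascii")


def build_challenge_url(dynamicurl, number):
    """Join the dynamicurl path and the encoded challenge (assumed URL layout)."""
    return dynamicurl + "?wzwschallenge=" + encode_challenge(number)


path = "/WZWSREL3poZW5nd3Vnb25na2FpLzEyNzkyNC8xMjgwNDEvMjE2MTQyMS9pbmRleC5odG1s"
print(build_challenge_url(path, 863011))
```

With the number 863011 this reproduces the first of the three URLs recorded earlier, which supports the idea that the number is the only unknown.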

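Next, how is _0xb14971 computed? It is the return value of _0x344cd4, so we step into that function and set a breakpoint to inspect its variables:

The variable _0x3c9135 holds a seven-digit number, and it is followed by return _0x53d9fc[_0x56ae('0x20', 'd2rH')](_0x56ae('0x21', 'Rau%'), _0x3c9135). Selecting that expression with the mouse shows that it evaluates to WZWS_CONFIRM_PREFIX_LABEL8455744. Since we concluded above that only the trailing number has to change to produce each new wzwschallenge value, all we need to do is extract this core function: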

function _0x344cd4() {
var _0x53d9fc = {
'GjCbS': function _0x1a0314(_0x33da81, _0xe25eb5) {
return _0x33da81 < _0xe25eb5;
},
'JBFUL': function _0x1af799(_0x51aa2f, _0x2e4887) {
return _0x51aa2f + _0x2e4887;
}
};
var _0x3c9135 = 0x0;
var _0x43beea = 0x0;
for (_0x43beea = 0x0; _0x53d9fc[_0x56ae('0x1d', 'uGC9')](_0x43beea, wzwsquestion[_0x56ae('0x1e', 'V2r4')]); _0x43beea++) {
_0x3c9135 += wzwsquestion[_0x56ae('0x1f', '!2cw')](_0x43beea);
}
_0x3c9135 *= wzwsfactor;
_0x3c9135 += 0x1b207;
return _0x53d9fc[_0x56ae('0x20', 'd2rH')](_0x56ae('0x21', 'Rau%'), _0x3c9135);
}

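Our goal is to run this function standalone, so the code needs some changes. Everything before the for loop only declares variables and can stay as it is. In the loop condition, _0x56ae('0x1d', 'uGC9') is actually "GjCbS" and _0x56ae('0x1e', 'V2r4') is "length"; likewise, _0x56ae('0x1f', '!2cw') in the loop body is "charCodeAt".

Tip: paste these expressions into the Chrome console and press Enter to see their values; running copy(_0x56ae('0x1e', 'V2r4')) in the console also copies the value straight to the clipboard.

Two important variables remain, wzwsquestion and wzwsfactor. Where do they come from? Recall that an eval() call ran at the very beginning. Let's execute it and see:

Conveniently, once that code has run, exactly these two variables, wzwsquestion and wzwsfactor, have been initialized.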
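To see why, here is a minimal Python sketch of what this kind of packer does when it unpacks its template. The names `to_base` and `unpack` are mine; the template and keyword list are copied from the eval call in the listing above. It only covers the path in which every word token is looked up in a table, so it should not be read as a general unpacker.

```python
import re

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base(n, base):
    """Mimic JavaScript's toString(base) for non-negative integers."""
    if n == 0:
        return "0"
    out = ""
    while n:
        n, rem = divmod(n, base)
        out = DIGITS[rem] + out
    return out


def unpack(template, base, count, keywords):
    """Replace each word token in the template with its keyword."""
    table = {}
    for i in range(count):
        token = to_base(i, base)
        word = keywords[i] if i < len(keywords) else ""
        # An empty keyword means "keep the token itself", like k[c] || e(c).
        table[token] = word or token
    return re.sub(r"\b\w+\b",
                  lambda m: table.get(m.group(0), m.group(0)),
                  template)


TEMPLATE = '0 1="2";0 3="4";0 5="6";0 7="8";0 9="a";'
KEYWORDS = ("var|dynamicurl|/WZWSREL3poZW5nd3Vnb25na2FpLzEyNzkyNC8xMjgwNDEv"
            "MjE2MTQyMS9pbmRleC5odG1s|wzwsquestion|;YSl^]a|wzwsfactor|2963|"
            "wzwsmethod|WZWS_METHOD|wzwsparams|WZWS_PARAMS").split("|")

print(unpack(TEMPLATE, 11, 11, KEYWORDS))
```

The output is a run of plain `var name="value";` statements, including the assignments to wzwsquestion and wzwsfactor.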

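In fact, a close look at the arguments passed to the eval'd function shows that both values are hidden in there; all the function does is split the original string and substitute the pieces.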

'var|dynamicurl|/WZWSREL2Z6aHNoYW5naGFpLzExMzU3Ny8xMTQ4MzIvMTE0OTE4L2luZGV4Lmh0bWw=|wzwsquestion|GT#rs1mW}J{x,GghI6|wzwsfactor|1574|wzwsmethod|WZWS_METHOD|wzwsparams|WZWS_PARAMS'

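This is what the raw string looks like (this sample was captured from a different page, so its values differ from the listing above). A regular expression is enough to pull the two values out, so in the end the only thing we have to rewrite is _0x344cd4.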
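As a quick illustration, the extraction can be tried on the sample string itself. This is a sketch: `extract_params` is a hypothetical helper, the dynamicurl pattern is my addition, and the patterns assume that none of the values contains a `|` character.

```python
import re

RAW = ("var|dynamicurl|/WZWSREL2Z6aHNoYW5naGFpLzExMzU3Ny8xMTQ4MzIvMTE0OTE4"
       "L2luZGV4Lmh0bWw=|wzwsquestion|GT#rs1mW}J{x,GghI6|wzwsfactor|1574|"
       "wzwsmethod|WZWS_METHOD|wzwsparams|WZWS_PARAMS")


def extract_params(text):
    """Pull dynamicurl, wzwsquestion and wzwsfactor out of the packed string."""
    dynamicurl = re.search(r"dynamicurl\|(.*?)\|", text)
    question = re.search(r"wzwsquestion\|(.*?)\|", text)
    factor = re.search(r"wzwsfactor\|(\d+)\|", text)
    if not (dynamicurl and question and factor):
        raise ValueError("challenge parameters not found")
    return {
        "dynamicurl": dynamicurl.group(1),
        "question": question.group(1),
        "factor": int(factor.group(1)),  # the script multiplies by this value
    }


print(extract_params(RAW))
```

Extracting dynamicurl as well avoids hard-coding the step 2 path, which may matter if the same approach is applied to other pages on the site.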

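One more thing to note: the original JS contains the statement _0x3c9135 += 0x1b207;. Leaving it as it is works fine, but it is worth pointing out that 0x1b207 is a hexadecimal literal equal to 111111 in decimal; do not confuse it with the _0x... variable names used throughout this code.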
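Since the function only sums character codes, multiplies, and adds a constant, the same arithmetic can be written directly in Python. The sketch below is my own port, not the author's code; it assumes wzwsquestion contains only characters for which Python's `ord` and JavaScript's `charCodeAt` agree (true for the ASCII samples seen here).

```python
def confirm_number(question, factor):
    """Python port of the arithmetic in _0x344cd4 (a sketch)."""
    total = sum(ord(ch) for ch in question)  # sum of character codes
    total *= factor                          # multiply by wzwsfactor
    total += 0x1B207                         # hexadecimal literal, 111111 in decimal
    return total


# The hexadecimal constant and its decimal value are the same number.
assert 0x1B207 == 111111

# Values taken from the eval call in the full listing above.
print(confirm_number(";YSl^]a", 2963))  # 623 * 2963 + 111111 = 1957060
```

A pure-Python version like this would remove the need for a JS runtime, at the cost of having to re-check it whenever the site changes its script.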

Below is the complete scraping code, including the rewritten _0x344cd4:

# _*_ coding:utf-8 _*_
# @Time :2019/9/18 10:53
import requests
import execjs
import base64
import re


class BankOfChina(object):
    Request_Headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                                     "(KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36",
                       "Host": "www.pbc.gov.cn",
                       "Referer": "http://www.pbc.gov.cn/zhengwugongkai/127924/128041/2161421/index.html",
                       "Upgrade-Insecure-Requests": '1',
                       }

    def __init__(self):
        self._url = 'http://www.pbc.gov.cn/zhengwugongkai/127924/128041/2161421/index.html'
        self._url1 = 'http://www.pbc.gov.cn/WZWSREL3poZW5nd3Vnb25na2FpLzEyNzkyNC8xMjgwNDEvMjE2MTQyMS9pbmRleC5odG1s?wzwschallenge={}'
        # Use a session so that cookie changes are handled for us
        self._session = requests.session()
        self._factor = None
        self._question = None

    def request_data_page(self):
        # Step 1: fetch the list page and read the challenge parameters from the returned JS
        for _ in range(3):
            try:
                response = self._session.get(self._url, headers=self.Request_Headers)
            except Exception as e:
                print(e)
            else:
                if response.status_code == 200:
                    self._factor = int(re.findall(r'wzwsfactor\|(\d+)\|', response.text)[0])
                    self._question = re.findall(r'wzwsquestion\|(.*?)\|', response.text)[0]
                    break

    @staticmethod
    def get_js():
        # Rewritten _0x344cd4: takes the two values as arguments and returns only the number
        js_str = """
            function _0x344cd4(factor, wzwsquestion) {
                var _0x53d9fc = {
                    'GjCbS': function _0x1a0314(_0x33da81, _0xe25eb5) {
                        return _0x33da81 < _0xe25eb5;
                    },
                    'JBFUL': function _0x1af799(_0x51aa2f, _0x2e4887) {
                        return _0x51aa2f + _0x2e4887;
                    }
                };
                var _0x3c9135 = 0x0;
                var _0x43beea = 0x0;
                for (_0x43beea = 0x0; _0x53d9fc['GjCbS'](_0x43beea, wzwsquestion['length']); _0x43beea++) {
                    _0x3c9135 += wzwsquestion['charCodeAt'](_0x43beea);
                }
                _0x3c9135 *= factor;
                return _0x3c9135+111111
            }
        """
        return js_str

    def decrypt_fucntion(self):
        # Compute the number, prepend the fixed label, and Base64-encode it into the step 2 URL
        js_str = self.get_js()
        ctx = execjs.compile(js_str)
        b = ctx.call('_0x344cd4', self._factor, self._question)
        a = base64.b64encode('WZWS_CONFIRM_PREFIX_LABEL{}'.format(b).encode()).decode()
        self._url1 = self._url1.format(a)

    def request_page(self):
        # Step 2 (and step 3 via the redirect): the session carries the refreshed cookie
        for _ in range(3):
            try:
                response = self._session.get(self._url1, headers=self.Request_Headers, allow_redirects=True)
            except Exception as e:
                print(e)
            else:
                if response.status_code == 200:
                    response.encoding = 'utf-8'
                    print(response.text)
                    break


if __name__ == "__main__":
    bank = BankOfChina()
    bank.request_data_page()
    bank.decrypt_fucntion()
    bank.request_page()
    # bank.request_data_page()

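In the request_page method above you can also disable redirects. The response status code should then be 302; we only need to take the cookie it returns and use it to request the list page again.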
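A minimal sketch of that variant follows. The function name `fetch_list_page` is hypothetical, and it assumes `session` behaves like a `requests.Session` that has already made the step 1 request, so that cookies set by the 302 response are kept in the session and sent again on the next request.

```python
def fetch_list_page(session, challenge_url, list_url, headers=None):
    """Perform step 2 without following the redirect, then step 3 by hand."""
    # Step 2: request the challenge URL but stop at the 302.
    first = session.get(challenge_url, headers=headers, allow_redirects=False)
    if first.status_code != 302:
        raise RuntimeError(
            "expected a 302 from the challenge URL, got %s" % first.status_code)
    # Step 3: the session is assumed to resend the refreshed wzws_cid cookie.
    second = session.get(list_url, headers=headers)
    return second.text
```

In the class above this would be called with `self._session`, `self._url1` (after formatting) and `self._url`. Handling the redirect by hand mostly helps with debugging, because the status code and cookies of each step can be inspected separately.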